Z. Bai and J. Demmel. Design of a parallel nonsymmetric eigenroutine toolbox, Part I. In Proceedings of the Sixth SIAM Conference on Parallel Processing for Scientific Computing. SIAM, 1993. Long version available as UC Berkeley Computer Science report all.ps.Z via anonymous ftp from toe.cs.berkeley.edu, directory pub/tech-reports/cs/csd-92-718.
....architectures particularly hierarchical memory architectures. This is so much the case that some high performance algorithms work around the slow but reliable QR algorithm by using faster but less reliable methods to tear or split off smaller submatrices on which to apply the QR algorithm [2, 5, 6, 7, 23, 33]. An exception is the successful high performance pipelined Householder QZ algorithm in [18]. Although this paper is not directly concerned with distributed memory computation, it is worth noting that there are distributed memory implementations of the QR algorithm [31, 45, 48, 50]. Readers of ....
....best. Many manufacturers supply hand tuned, extraordinarily efficient implementations of the BLAS. Automatically tuned versions of the BLAS [61] also perform well. It is the ability to exploit matrix-matrix multiplies that makes spectral splitting methods attractive competitors to the QR algorithm [2, 5, 6, 7]. The small bulge multishift QR algorithm which we propose attains much of its efficiency through the level 3 BLAS. 1.1.2. Notation. Throughout this paper we use the following notation and definitions. 1. We will use the colon notation to denote submatrices: $H_{i:j,k:l}$ is the submatrix of matrix ....
Z. Bai and J. Demmel, Design of a Parallel Nonsymmetric Eigenroutine Toolbox, Part II, Tech. Report 95-11, Department of Mathematics, University of California, Berkeley, 1995.
....architectures particularly hierarchical memory architectures. This is so much the case that some high performance algorithms work around the slow but reliable QR algorithm by using faster but less reliable methods to tear or split off smaller submatrices on which to apply the QR algorithm [2, 5, 6, 7, 23, 33]. An exception is the successful high performance pipelined Householder QZ algorithm in [18]. Although this paper is not directly concerned with distributed memory computation, it is worth noting that there are distributed memory implementations of the QR algorithm [31, 45, 48, 50]. Readers of ....
....best. Many manufacturers supply hand tuned, extraordinarily efficient implementations of the BLAS. Automatically tuned versions of the BLAS [61] also perform well. It is the ability to exploit matrix-matrix multiplies that makes spectral splitting methods attractive competitors to the QR algorithm [2, 5, 6, 7]. The small bulge multishift QR algorithm which we propose attains much of its efficiency through the level 3 BLAS. 1.1.2. Notation. Throughout this paper we use the following notation and definitions. 1. We will use the colon notation to denote submatrices: $H_{i:j,k:l}$ is the submatrix of matrix ....
Z. Bai and J. Demmel, Design of a parallel nonsymmetric eigenroutine toolbox, part I, in Proceedings of the Sixth SIAM Conference on Parallel Processing for Scientific Computing, Vol. 1, SIAM, Philadelphia, 1993, pp. 391--398.
....sign function for finding all the eigenvalues in a region of the complex plane and the corresponding invariant subspace. The building blocks are BLAS, QR and LU factorizations and the matrix sign function. They also develop supporting perturbation theory, stability analysis and refinement schemes [9]. Tools other than the sign function can be used to obtain the desired invariant subspace $Q_1$. One alternative is an algorithm involving no matrix inversions that is explored by Bai, Demmel and Gu [10] and is based on original algorithms of Bulgakov, Godunov and Malyshev. Other schemes have been ....
....based on original algorithms of Bulgakov, Godunov and Malyshev. Other schemes have been suggested; see the references in [10]. The spectral divide and conquer approach can be extended to compute deflating subspaces of a matrix pencil $A - \lambda B$ and hence to solve the generalized eigenproblem; see [9], [10]. Condition numbers and error bounds for the generalized eigenproblem and algorithms for reordering the generalized Schur form are given by Kågström and Poromaa [95]. A comprehensive survey of existing parallel eigenroutines and their limitations is given in [39]. 6 The Symmetric ....
Zhaojun Bai and James W. Demmel. Design of a parallel nonsymmetric eigenroutine toolbox, part II. Computer Science Division Report UCB/CSD-94-???, University of California, Berkeley, CA, USA, ??? 1994.
....the spectrum has been divided along the imaginary axis. By computing the QR factorization of the sign function of $\alpha A + \beta I$ for complex $\alpha$ and $\beta$, or of $(A + \beta I)^2 + \alpha I$ for real $\alpha$ and $\beta$, we can divide the spectrum along other lines in the complex plane, retaining real arithmetic if $A$ is real [7], [88], [122]. Using this approach, we can determine the eigenvalues lying within quite general regions of the complex plane. The matrix sign function can be computed using the Newton iteration $A_{i+1} = \tfrac{1}{2}(A_i + A_i^{-1})$, $A_0 = A$, which converges globally and quadratically to sign(A) whenever ....
....complex plane. The matrix sign function can be computed using the Newton iteration $A_{i+1} = \tfrac{1}{2}(A_i + A_i^{-1})$, $A_0 = A$, which converges globally and quadratically to sign(A) whenever sign(A) is defined. Other iterations, some with more natural parallelism, are available too [98]. Bai and Demmel [7] develop a toolbox of routines based on the matrix sign function for finding all the eigenvalues in a region of the complex plane and the corresponding invariant subspace. The building blocks are BLAS, QR and LU factorizations and the matrix sign function. They also develop supporting perturbation ....
Zhaojun Bai and James W. Demmel. Design of a parallel nonsymmetric eigenroutine toolbox, Part I. In Proceedings of the Sixth SIAM Conference on Parallel Processing for Scientific Computing, Volume I, Richard F. Sincovec, David E. Keyes, Michael R. Leuze, Linda R. Petzold, and Daniel A. Reed, editors, Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 1993, pages 391--398.
....and suitability for parallel implementation [43]. Numerical stability is a critical issue in every proposed numerical algorithm. However, a theoretical analysis of stability of algorithms we propose for M/G/1 and G/M/1 type Markov chains is beyond the scope of the current paper. We refer to [3] and [4] on the numerical stability of matrix sign iterations based on which the algorithms are developed. The organization of the paper is as follows. In Section 2, mathematical preliminaries necessary for the development of the paper are presented, including polynomial matrices, polynomial ....
....) $T\,\mathrm{sgn}(M)\,T^{-1}$ for any nonsingular $T$. By property 2, an orthogonal basis for the left invariant subspace of $M$ which has a dimension $r$ is given by the first $r$ columns of the orthogonal matrix in a rank revealing QR decomposition of $(Z - I)$ [11]. Alternative methods are presented in [3] and [4]. The above definition for the matrix sign does not lend itself to an efficient computation but there are several ways of evaluating the matrix sign function. The simplest iteration scheme is Newton's method applied to $\mathrm{sgn}(M)^2 = I$: $Z_0 = M$, $Z_{k+1} = \tfrac{1}{2}(Z_k + Z_k^{-1})$ (17). Then ....
Z. Bai and J. Demmel. Design of a parallel nonsymmetric eigenroutine toolbox: part I. Computer Science Division Report CSD-92-718, University of California at Berkeley, Dec. 1992.
....scheme $Z_{k+1} = \frac{1}{2c_k}(Z_k + c_k^2 Z_k^{-1})$, $c_k = |\det Z_k|^{1/m}$ (11), where $c_k$ is an appropriately chosen scalar and $m$ is the matrix size. Also note that the determinant $\det Z_k$ can be calculated from the LU or QR factors that we use to perform the matrix inversion at each step [3], so that this kind of scaling does not introduce additional complexity to the classical scheme (10). The stopping criterion we will use is the one experimented with in [3]: $\|Z_{k+1} - Z_k\|_1 \le \epsilon \|Z_k\|_1$ (12), where $\epsilon$ is a small, user specified error bound. More general iteration schemes ....
.... that the determinant $\det Z_k$ can be calculated from the LU or QR factors that we use to perform the matrix inversion at each step [3], so that this kind of scaling does not introduce additional complexity to the classical scheme (10). The stopping criterion we will use is the one experimented with in [3]: $\|Z_{k+1} - Z_k\|_1 \le \epsilon \|Z_k\|_1$ (12), where $\epsilon$ is a small, user specified error bound. More general iteration schemes for computing the matrix sign function are based on Padé approximations [13]. In this case, the convergence rate is improved at the expense of more computational load ....
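The scaled Newton iteration and relative stopping criterion quoted in the excerpts above can be sketched in a few lines. The following is a minimal illustration in pure Python, not the citing authors' implementation: the helper names (`inverse_and_det`, `norm1`, `matrix_sign`) are hypothetical, the comparison in the stopping test is assumed to be "less than or equal" because the operator is lost in the excerpt, and a plain Gauss-Jordan elimination stands in for the LU or QR factorization a practical code would take from a LAPACK-backed library. It does show how the determinant used for scaling comes for free from the same elimination that produces the inverse.

```python
# Illustrative sketch only: scaled Newton iteration for the matrix sign function.
# All helper names are hypothetical and not taken from any cited paper.

def inverse_and_det(a):
    """Gauss-Jordan elimination with partial pivoting; returns (A^-1, det A)."""
    n = len(a)
    # Augment A with the identity: [A | I].
    aug = [list(map(float, row)) + [float(i == j) for j in range(n)]
           for i, row in enumerate(a)]
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if aug[pivot][col] == 0.0:
            raise ValueError("matrix is singular")
        if pivot != col:
            aug[col], aug[pivot] = aug[pivot], aug[col]
            det = -det  # a row swap flips the sign of the determinant
        p = aug[col][col]
        det *= p  # the determinant is the signed product of the pivots
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0.0:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug], det


def norm1(a):
    """Matrix 1-norm: maximum absolute column sum."""
    return max(sum(abs(row[j]) for row in a) for j in range(len(a[0])))


def matrix_sign(m, eps=1e-12, max_iter=100):
    """Iterate Z_{k+1} = (Z_k + c_k^2 Z_k^{-1}) / (2 c_k), c_k = |det Z_k|^(1/n)."""
    n = len(m)
    z = [list(map(float, row)) for row in m]
    for _ in range(max_iter):
        z_inv, det = inverse_and_det(z)
        c = abs(det) ** (1.0 / n)  # determinantal scaling factor
        z_next = [[(z[i][j] + c * c * z_inv[i][j]) / (2.0 * c)
                   for j in range(n)] for i in range(n)]
        diff = [[z_next[i][j] - z[i][j] for j in range(n)] for i in range(n)]
        # Relative stopping test in the 1-norm (assumed form: <= eps * ||Z_k||_1).
        converged = norm1(diff) <= eps * norm1(z)
        z = z_next
        if converged:
            break
    return z
```

For a small test matrix with eigenvalues 2 and -3, `matrix_sign([[2.0, 1.0], [0.0, -3.0]])` should approach the matrix with rows (1, 0.4) and (0, -1), which squares to the identity and commutes with the input.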
Z. Bai and J. Demmel. Design of a parallel nonsymmetric eigenroutine toolbox: part I. Computer Science Division Report CSD-92-718, University of California at Berkeley, Dec. 1992.
.... for solving the ARE (5). As our focus is on high performance computing and particularly on parallel algorithms for distributed memory architectures, our methods for solving Lyapunov equations will be based on the sign function method, which is known to be easily and efficiently parallelizable; see [4, 5, 10, 20]. A sign function based Lyapunov solver is then also used in the inner loop of Newton's method for solving AREs; we use the same procedure to solve (5). 3.1 Solving stable Lyapunov equations with the sign function method. In this section we describe Lyapunov equation solvers based on the matrix ....
....of the Newton iteration is globally quadratic, the initial convergence may be slow. Acceleration is possible, e.g. via determinantal scaling [15]: $Z_k \leftarrow c_k Z_k$, $c_k = |\det(Z_k)|^{-1/n}$, where $\det(Z_k)$ denotes the determinant of $Z_k$. Other acceleration schemes can be employed; see [4] for a comparison of these schemes. Roberts [40] was the first to use the matrix sign function for solving Lyapunov (and Riccati) equations. In the proposed method, the solution of the stable Lyapunov equation $F^T X + XF + Q = 0$ (20) is computed by applying the Newton iteration (19) to the ....
Z. Bai and J. Demmel. Design of a parallel nonsymmetric eigenroutine toolbox, Part I. In R.F. Sincovec et al., editor, Proceedings of the Sixth SIAM Conference on Parallel Processing for Scientific Computing, pages 391-398. SIAM, Philadelphia, PA, 1993. See also: Tech. Report CSD-92-718, Computer Science Division, University of California, Berkeley, CA 94720.
....eigenvalue problem have been elusive. There are several matrix multiply based methods currently being studied. Auslander and Tsao [2] and Lederman, Tsao, and Turnbull [27] have a matrix multiply-based parallel algorithm, which uses a polynomial mapping of the eigenvalues. Bai and Demmel [4] have another parallel algorithm based on bisection with the matrix sign function. Matrix tearing methods for finding the eigensystem of an unsymmetric Hessenberg matrix have been proposed by Dongarra and Sidani [11]. These involve doing a rank one change to the Hessenberg matrix to make two ....
....implementation of this method on a high performance parallel computer shows that in practice, parallelism can be extracted as well. We caution the reader against misinterpreting the results in this paper as largely supporting the notion that new methods like those developed by Bai and Demmel [4] and Dongarra and Sidani [11] must be pursued if nonsymmetric eigenvalue problems are to be solved on massively parallel computers. Let us address some of the arguments that can be made to support such an interpretation and how these arguments are somewhat unsatisfactory. No parallelism exists in ....
Bai, Z., Demmel, J., Design of a Parallel Nonsymmetric Eigenroutine Toolbox, Part I, Parallel Processing for Scientific Computing, Editors R. Sincovec, D. Keyes, M. Leuze, L. Petzold, and D. Reed, pp. 391--398, SIAM Publications, Philadelphia, PA, 1993
....is highly parallel. Auslander and Tsao [2] and Lederman, Tsao, and Turnbull [33] use multiply based parallel algorithms based on matrix polynomials to split the spectrum. Bai and Demmel [4] use similar matrix multiply techniques using the matrix sign function to split the spectrum (see also [6, 10, 5, 7]). Dongarra and Sidani [17] introduced tearing methods based on doing rank one updates to an unsymmetric Hessenberg matrix, resulting in two smaller problems, which are solved independently and then glued back together with a Newton iteration. This tends to suffer from stability problems since ....
Bai, Z., Demmel J., Design of a Parallel Nonsymmetric Eigenroutine Toolbox, Part II, University of California at Berkeley Technical Report in Progress: 1/96
....attention recently have been based on matrix multiplication. The reason is clear: large matrix multiplication is highly parallel. Auslander and Tsao [2] and Lederman, Tsao, and Turnbull [33] use multiply based parallel algorithms based on matrix polynomials to split the spectrum. Bai and Demmel [4] use similar matrix multiply techniques using the matrix sign function to split the spectrum (see also [6, 10, 5, 7]). Dongarra and Sidani [17] introduced tearing methods based on doing rank one updates to an unsymmetric Hessenberg matrix, resulting in two smaller problems, which are solved ....
.... problem [30]. In situations where more than just a few of the eigenvalues (and perhaps eigenvectors as well) are needed, the most competitive serial algorithm is the QR algorithm [20, 1]. Matrix multiply methods tend to require many more flops, as well as sometimes encountering accuracy problems [4]. Although matrix tearing methods may have lower flop counts, they require finding all the eigenvectors and hence are only useful when all the eigenvectors are required. Furthermore, there are instances where they simply fail [30]. Jacobi methods [23] have notoriously high flop counts. There are ....
Bai, Z., Demmel, J., Design of a Parallel Nonsymmetric Eigenroutine Toolbox, Part I, Parallel Processing for Scientific Computing, Editors R. Sincovec, D. Keyes, M. Leuze, L. Petzold, and D. Reed, pp. 391--398, SIAM Publications, Philadelphia, PA, 1993
....considerable freedom in implementing SYISDA, in particular with respect to choosing the polynomials $p_i$ as well as the method for computing the invariant subspaces. We also mention that any other method that produces invariant subspaces, such as approximation methods for the matrix sign function [12, 13, 20, 2], could be used in the Eigenvalue Smoothing step as well. As in [26], we use predominantly the first incomplete beta function $3x^2 - 2x^3$ in our implementation. The experiments in [26] also confirm the numerical robustness of SYISDA. While the SYISDA algorithm can be used to compute a ....
Bai, Z., & J. Demmel, Design of parallel nonsymmetric eigenroutine toolbox, Part I, Research report 92-09, University of Kentucky (Dec. 1992), (also PRISM Working Note #5).
.... Jacobi methods [29, 7, 10, 30] and homotopy methods [25]. Parallelizable algorithms for dense nonsymmetric matrices that have been investigated include the QR algorithm [3, 32], Jacobi-like methods [31], homotopy methods [24], and the matrix sign function approach to computing invariant subspaces [6, 11, 12, 19, 26, 4]. The purpose of this paper is to present preliminary research results on a new algorithm for finding all the eigenvalues and eigenvectors of a real diagonalizable matrix with real eigenvalues. Although this class of matrices is not completely general, it includes the important class of real ....
....feel that the beta function approach promises more robust, scalable performance than the matrix sign approach for the matrices we are considering. However, for the general nonsymmetric eigenvalue problem where the matrices may have complex eigenvalues, the matrix sign approach is quite promising [6, 11, 12, 19, 26, 4]. 3. Test cases. Testing of the algorithm described was performed on both nonsymmetric and symmetric matrices. Even though the code performs dense computations and does not take advantage of sparsity, we tested our algorithm on both dense and upper Hessenberg matrices, since the reduction to ....
Z. BAI AND J. DEMMEL, Design of Parallel Nonsymmetric Eigenroutine Toolbox, Part I, Research report 92-09, University of Kentucky, Lexington, KY, December 1992.
....though some of the proposed algorithms are of dubious computational merit. Recently there has been a resurgence of interest in the matrix sign function because of its suitability for constructing parallel algorithms [9], [14], [15], particularly in the context of the nonsymmetric eigenproblem [3], [37]. 3 The polar decomposition is much older than the matrix sign function. It was introduced by Autonne in 1902 [2]. It is the decomposition $A = UH$ of $A \in \mathbb{C}^{m \times n}$ ($m \ge n$), where $U \in \mathbb{C}^{m \times n}$ has orthonormal columns and $H \in \mathbb{C}^{n \times n}$ is Hermitian and positive semidefinite. If A ....
Zhaojun Bai and James W. Demmel. Design of a parallel nonsymmetric eigenroutine toolbox, Part I. In Richard F. Sincovec, David E.
....of Kansas, Lawrence, KS 66045, USA. 1 subspaces the solution of many classes of matrix equations can be obtained. These include linear and quadratic matrix equations as well as some rational and higher order polynomial matrix equations like the matrix $m$th root and $m$th sector function, see [4, 24, 26, 37]. 2 Preliminaries. By $\mathbb{C}^{n \times n}$ and $\mathbb{R}^{n \times n}$ we denote the sets of complex or real $n \times n$ matrices, respectively. By $I_n$ and $0_n$ we denote the $n \times n$ identity matrix and zero matrix, respectively, and we set $J_n = \begin{bmatrix} 0_n & -I_n \\ I_n & 0_n \end{bmatrix}$. We omit the subscript $n$, if the sizes are clear ....
Z. Bai and J. Demmel. Design of a parallel nonsymmetric eigenroutine toolbox, Part I. In R.F. Sincovec et al, editor, Proceedings of the Sixth SIAM Conference on Parallel Processing for Scientific Computing, 1993. See also: Tech. Report CSD-92-718, Computer Science Division, University of California, Berkeley, CA 94720.
....nonsymmetric algebraic eigenvalue problem is highly challenging. The QR algorithm, which is by far the most efficient and robust serial algorithm, offers limited potential for parallelism [8]. Efficient, stable and highly scalable algorithms for today's massively parallel computers are still being sought [1, 4, 5, 11]. Different versions of the homotopy continuation method for this problem have been reported [12, 13, 17]. The algorithm has the advantage of natural parallelism and scalability. However, those early versions do not have satisfactory stability or efficiency for matrices arising in applications. In this ....
Z. Bai and J. Demmel, Design of a parallel nonsymmetric eigenroutine toolbox, part I, Parallel Processing for Scientific Computing, R. Sincovec, et al eds, pp 391--398, SIAM, Philadelphia, 1993
....only. We show that from these deflating subspaces the solution of many classes of matrix equations can be obtained. These include linear and quadratic matrix equations as well as some rational and higher order polynomial matrix equations like the matrix $m$th root and $m$th sector function, see [5, 33, 35, 49]. 1 All authors were partially supported by Deutsche Forschungsgemeinschaft, Sonderforschungsbereich 393, Numerische Simulation auf massiv parallelen Rechnern. 2 This author was partially supported by National Science Foundation awards CCR 9732671, MRI9977352, and by the NSF EPSCoR K STAR ....
Z. Bai and J. Demmel. Design of a parallel nonsymmetric eigenroutine toolbox, Part I. In R.F. Sincovec et al, editor, Proceedings of the Sixth SIAM Conference on Parallel Processing for Scientific Computing, 1993. See also: Tech. Report CSD-92-718, Computer Science Division, University of California, Berkeley, CA 94720.
....a random 10 by 10 matrix. Only one path of a complex conjugate pair of eigenpaths is shown. and Sidani [6], Saad [25], Shroff [28], Sorensen [29], Ruhe [24], and Bai et al. [2]. The classic reference for the eigenvalue problem is the treatise by Wilkinson [30]. See also Saad [26] and Bai and Demmel [4] and references therein. Except for some of the numerical results, the work in this paper had been completed in Lui [22]. In the paper of Li, Zeng and Cong [20], they prove Lemma A.1 5 (which they attribute to an unpublished work of H. B. Keller) which gives a necessary condition for a certain ....
Z. Bai and J. Demmel. Design of a parallel nonsymmetric eigenroutine toolbox, Part I. Technical Report UCB/CSD-92-718, 1992.
.... by Kenney and Laub [28]. Unless otherwise stated, we will focus on the numerical experimentation of the scaling scheme (21) since it is widely accepted [33]. Also note that the determinant $\det Z_k$ can be calculated from the LU or QR factors that we use to perform the matrix inversion at each step [6], so that this kind of scaling does not introduce additional complexity to the classical scheme (19). The stopping criterion we will use is the one experimented with in [6]: $\|Z_{k+1} - Z_k\|_1 \le \epsilon \|Z_k\|_1$ (22), where $\epsilon$ is a small, user specified error bound. More general iteration schemes ....
.... that the determinant $\det Z_k$ can be calculated from the LU or QR factors that we use to perform the matrix inversion at each step [6], so that this kind of scaling does not introduce additional complexity to the classical scheme (19). The stopping criterion we will use is the one experimented with in [6]: $\|Z_{k+1} - Z_k\|_1 \le \epsilon \|Z_k\|_1$ (22), where $\epsilon$ is a small, user specified error bound. More general iteration schemes for computing the matrix sign function are based on Padé approximations [27]. Below we give a brief summary of these schemes based on [27]. We first seek rational ....
Z. Bai and J. Demmel. Design of a parallel nonsymmetric eigenroutine toolbox: part I. Computer Science Division Report CSD-92-718, University of California at Berkeley, Dec. 1992. N. Akar and K. Sohraby, On Computational Aspects of the Invariant Subspace Approach 38
....the scope of this paper. Furthermore, iterations with higher convergence rates (e.g. cubic convergence) have also been proposed in [11] at the expense of more computational load at each iteration, compared with the iteration given in (43). We propose the following stopping criterion based on [5]: $\|Z_{k+1} - Z_k\|_1 \le \epsilon \|Z_k\|_1$ (44), where $\epsilon$ is a small, user specified error bound. 4 Numerical Algorithm. We now summarize all the steps for finding the steady state distribution for the finite QBD process with the probability transition matrix $P$ given as in (1) when the traffic ....
.... time efficient algorithms (e.g. the popular iteration (43)) with quadratic (or higher) convergence rates are available in the linear algebra literature [11]. Different matrix sign iterations or other invariant subspace computation methods (e.g. Schur decomposition methods) can also be employed [5]. 5 Extension to $\gamma > 0$. When the traffic parameter $\gamma$ in (3) is greater than zero, the form of stationary probabilities is still the same as in (11) but the properties of the matrix geometric factors $R_1$ and $R_2$ are a little different. Based on [7], $\Delta_1(z)$ now has exactly $m - 1$ roots ....
Z. Bai and J. Demmel. Design of a parallel nonsymmetric eigenroutine toolbox: part I. Computer Science Division Report CSD-92-718, University of California at Berkeley, Dec. 1992.
.... Kenney and Laub [24]. Furthermore, iterations with higher convergence rates (e.g. cubic convergence) have also been proposed in [23] at the expense of increased computational load at each iteration as compared to the iteration given in (58). We propose the following stopping criterion based on [6]: $\|Z_{k+1} - Z_k\|_1 \le \epsilon \|Z_k\|_1$ (59), where $\epsilon$ is a small, user specified error bound. Recalling equations (51) and (53), all above discussion suggests that the matrix geometric factors $R_1$ and $R_2$ can be computed by performing two rank revealing QR decompositions after finding the ....
Z. Bai and J. Demmel. Design of a parallel nonsymmetric eigenroutine toolbox: part I. Computer Science Division Report CSD-92-718, University of California at Berkeley, Dec. 1992.
....axis eigenvalues is (see [7] and [14]): $Z_0 = M$, $Z_{k+1} = \frac{1}{2c_k}(Z_k + c_k^2 Z_k^{-1})$, $c_k = |\det(Z_k)|^{1/m}$ (47). Then $\lim_{k \to \infty} Z_k = Z = \mathrm{sgn}(M)$, where $\mathrm{sgn}(M)$ denotes the matrix sign of $M$, and convergence is quadratic. The stopping criterion we use is the one proposed in [5]: $\|Z_{k+1} - Z_k\|_1 \le \epsilon \|Z_k\|_1$ (48). The most important property of matrix sign is that $\mathrm{Im}(Z - I)$ ($\mathrm{Im}(Z + I)$) is equal to the left (right) invariant subspace of $M$ [32]. Then find $S = \mathrm{sgn}(W_e)$ (49) through the matrix sign function iterations (47). Recall that there are m u ....
Z. Bai and J. Demmel. Design of a parallel nonsymmetric eigenroutine toolbox: Part I. Computer Science Division Report CSD-92-718, University of California at Berkeley, Dec. 1992.
....and suitability for parallel implementation [43]. Numerical stability is a critical issue in every proposed numerical algorithm. However, a theoretical analysis of stability of algorithms we propose for M/G/1 and G/M/1 type Markov chains is beyond the scope of the current paper. We refer to [3] and [4] on the numerical stability of matrix sign iterations based on which the algorithms are developed. The organization of the paper is as follows. In Section 2, mathematical preliminaries necessary for the development of the paper are presented, including polynomial matrices, polynomial ....
....) $T\,\mathrm{sgn}(M)\,T^{-1}$ for any nonsingular $T$. By property 2, an orthogonal basis for the left invariant subspace of $M$ which has a dimension $r$ is given by the first $r$ columns of the orthogonal matrix in a rank revealing QR decomposition of $(Z - I)$ [11]. Alternative methods are presented in [3] and [4]. The above definition for the matrix sign does not lend itself to an efficient computation but there are several ways of evaluating the matrix sign function. The simplest iteration scheme is Newton's method applied to $\mathrm{sgn}(M)^2 = I$: $Z_0 = M$, $Z_{k+1} = \tfrac{1}{2}(Z_k + Z_k^{-1})$ (17). Then ....
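The excerpt above states that the basic sign iteration is Newton's method applied to the equation that the matrix sign satisfies. A short derivation, given here as a standard textbook argument rather than as the citing authors' own text, makes the connection explicit. It assumes every iterate $Z_k$ is nonsingular; the symbols $F$ and $E_k$ are introduced only for this illustration. Because $Z_0 = M$, every iterate is a rational function of $M$, so the Newton correction $E_k$ commutes with $Z_k$ and the linearized equation can be solved in closed form:

```latex
\begin{align*}
F(Z) &= Z^2 - I, \qquad F'(Z_k)[E_k] = Z_k E_k + E_k Z_k = -F(Z_k),\\
E_k &= -\tfrac{1}{2}\, Z_k^{-1}\,(Z_k^2 - I) = -\tfrac{1}{2}\,(Z_k - Z_k^{-1}),\\
Z_{k+1} &= Z_k + E_k = \tfrac{1}{2}\,(Z_k + Z_k^{-1}), \qquad Z_0 = M.
\end{align*}
```

In the scalar case the same recurrence maps any nonzero real $z_0$ to iterates that converge to $+1$ or $-1$ according to the sign of $z_0$, which is the behaviour the matrix iteration reproduces on each eigenvalue.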
Z. Bai and J. Demmel. Design of a parallel nonsymmetric eigenroutine toolbox: part I. Computer Science Division Report CSD-92-718, University of California at Berkeley, Dec. 1992.
.... for example, in geodesy [17], computer aided design [19], nonlinear least-squares problems [25], the solution of integral equations [15], and in the calculation of splines [18]. Other applications arise in beam forming [8], spectral estimation [23], regularization [21, 29], and eigenproblems [3]. Algorithms for the reliable computation of rank revealing factorizations have recently received considerable attention (see, for example, [6, 7, 10, 11, 20, 26, 27]). However, the most common approach to computing such an RRQRF is the column pivoting procedure suggested by Businger and Golub [9] ....
Z. Bai and J. Demmel, Design of a parallel nonsymmetric eigenroutine toolbox, Part I, in Proceedings of the Sixth SIAM Conference on Parallel Processing for Scientific Computing, R. F. Sincovec et al., eds., SIAM, 1993, pp. 391--398.
....considerable freedom in implementing SYISDA, in particular with respect to choosing the polynomials $p_i$ as well as the method for computing the invariant subspaces. We also mention that any other method that produces invariant subspaces, such as approximation methods for the matrix sign function [12, 13, 20, 2], could be used in the Eigenvalue Smoothing step as well. As in [26], we use predominantly the first incomplete beta function $3x^2 - 2x^3$ in our implementation. The experiments in [26] also confirm the numerical robustness of SYISDA. While the SYISDA algorithm can be used to compute a ....
Bai, Z. & J. Demmel, Design of parallel nonsymmetric eigenroutine toolbox, Part I, Research report 92-09, University of Kentucky (Dec. 1992), (also PRISM Working Note #5).
....we describe several modifications to the generalized Newton iteration and the subspace extraction stage that allow a reduction to 25% in the computational cost and a remarkable improvement in the numerical accuracy. First, in order to obtain the basis for the left and right deflating subspaces, in [3] it is proposed to perform two independent generalized Newton iterations, with initial matrix pairs $(A, B)$ and $(A^H, B^H)$, followed by a subspace extraction procedure which involves the converged matrices. Following our results in [28], we show that a single iteration provides a basis for ....
....$B)$ and $(A^H, B^H)$, followed by a subspace extraction procedure which involves the converged matrices. Following our results in [28], we show that a single iteration provides a basis for both the left and right deflating subspaces. Thus, we reduce the number of iterations of the approach in [3] by half. Secondly, we propose an iteration on an equivalent matrix pair, where one of the matrices is reduced to a very simple form (bidiagonal form). This reduction is carried out before the iteration is started and cuts the computational cost of each iteration by half. Finally, we describe a ....
Z. Bai and J. W. Demmel, Design of a parallel nonsymmetric eigenroutine toolbox, part II, tech. report, Department of Mathematics, University of Kentucky, 1996.
....$(E_1 \backslash A_1)\}$. The inverse relation $(E \backslash A)^{-1} = (A \backslash E)$ has the property that $(x, y) \in (E \backslash A)^{-1}$ if and only if $(y, x) \in (E \backslash A)$. If $E^T A$ is nonsingular, then $(E \backslash A)(E \backslash A)^{-1} = (I \backslash I)$ is the identity relation. However, this does not hold for general $E$ and $A$. For example, $([1] \backslash [0]) = \{(y, z) \in \mathbb{R} \times \mathbb{R} \mid z = 0\} \subset \mathbb{R} \times \mathbb{R}$ and $([0] \backslash [1]) = \{(x, y) \in \mathbb{R} \times \mathbb{R} \mid x = 0\} \subset \mathbb{R} \times \mathbb{R}$ are inverse relations, but their products are $([1] \backslash [0])([0] \backslash [1]) = \{(0, 0)\}$ and $([0] \backslash [1])([1] \backslash [0]) = \mathbb{R} \times \mathbb{R}$. Note also that the product relation may require a matrix ....
....has the property that $(x, y) \in (E \backslash A)^{-1}$ if and only if $(y, x) \in (E \backslash A)$. If $E^T A$ is nonsingular, then $(E \backslash A)(E \backslash A)^{-1} = (I \backslash I)$ is the identity relation. However, this does not hold for general $E$ and $A$. For example, $([1] \backslash [0]) = \{(y, z) \in \mathbb{R} \times \mathbb{R} \mid z = 0\} \subset \mathbb{R} \times \mathbb{R}$ and $([0] \backslash [1]) = \{(x, y) \in \mathbb{R} \times \mathbb{R} \mid x = 0\} \subset \mathbb{R} \times \mathbb{R}$ are inverse relations, but their products are $([1] \backslash [0])([0] \backslash [1]) = \{(0, 0)\}$ and $([0] \backslash [1])([1] \backslash [0]) = \mathbb{R} \times \mathbb{R}$. Note also that the product relation may require a matrix representation with a different number of rows than the factors. For ....
Z. Bai and J. Demmel, Design of a parallel nonsymmetric eigenroutine toolbox, Part I, in Proceedings of the Sixth SIAM Conference on Parallel Processing for Scientific Computing, SIAM Publications, Philadelphia, PA, 1993.
.... Jacobi methods [29, 7, 10, 30] and homotopy methods [25]. Parallelizable algorithms for dense nonsymmetric matrices that have been investigated include the QR algorithm [3, 32], Jacobi-like methods [31], homotopy methods [24], and the matrix sign function approach to computing invariant subspaces [6, 11, 12, 19, 26, 4]. The purpose of this paper is to present preliminary research results on a new algorithm for finding all the eigenvalues and eigenvectors of a real diagonalizable matrix with real eigenvalues. Although this class of matrices is not completely general, it includes the important class of real ....
....feel that the Beta function approach promises more robust, scalable performance than the matrix sign approach for the matrices we are considering. However, for the general nonsymmetric eigenvalue problem where the matrices may have complex eigenvalues, the matrix sign approach is quite promising [6, 11, 12, 19, 26, 4]. 3. Test cases. Testing of the algorithm described was performed on both nonsymmetric and symmetric matrices. Even though the code performs dense computations and does not take advantage of sparsity, we tested our algorithm on both dense and upper Hessenberg matrices, since the reduction to ....
Bai, Z. & J. Demmel, Design of parallel nonsymmetric eigenroutine toolbox, Part I, Research report 92-09, University of Kentucky (Dec. 1992), (also PRISM Working Note #5).
.... needed, for example, in geodesy [17], computer-aided design [19], nonlinear least-squares problems [25], the solution of integral equations [15], and the calculation of splines [18]. Other applications arise in beam forming [8], spectral estimation [23], regularization [21, 29], and eigenproblems [3]. Algorithms for the reliable computation of rank-revealing factorizations have recently received considerable attention (see, for example, [6, 7, 10, 11, 20, 26, 27]). However, the most common approach to computing such an RRQRF is the column-pivoting procedure suggested by Businger and Golub [9] ....
Z. Bai and J. Demmel, Design of a parallel nonsymmetric eigenroutine toolbox, Part I, in Proceedings of the Sixth SIAM Conference on Parallel Processing for Scientific Computing, R. F. Sincovec et al., ed., SIAM, 1993, pp. 391--398.
....invariant subspaces, parallel algorithm. .... dense nonsymmetric matrices that have been investigated include the QR algorithm [3, 32], Jacobi-like methods [31], homotopy methods [24], and the matrix sign function approach to computing invariant subspaces [6, 11, 12, 19, 26, 4]. The purpose of this paper is to present preliminary research results on a new algorithm for finding all the eigenvalues and eigenvectors of a real diagonalizable matrix with real eigenvalues. Although this class of matrices is not completely general, it includes the important class of real ....
....feel that the Beta function approach promises more robust, scalable performance than the matrix sign approach for the matrices we are considering. However, for the general nonsymmetric eigenvalue problem where the matrices may have complex eigenvalues, the matrix sign approach is quite promising [6, 11, 12, 19, 26, 4]. 3. Test cases. Testing of the algorithm described was performed on both nonsymmetric and symmetric matrices. Even though the code performs dense computations and does not take advantage of sparsity, we tested our algorithm on both dense and upper Hessenberg matrices, since the reduction to upper ....
Bai, Z. & J. Demmel, Design of parallel nonsymmetric eigenroutine toolbox, Part I, Research report 92-09, University of Kentucky (Dec. 1992), (also PRISM Working Note #5).
.... 7] where the scalar $c_k$ in iteration (7) is set to $c_k = |\det(Y^{-1} Z_k)|^{-1/r}$ (if $Z, Y \in \mathbb{R}^{r \times r}$). From the definition of $Z$ and $Y$ in (9) we have for iteration (14) $c_k = |\det(E^{-1} A_k)|^{-1/n}$. Other choices of $c_k$ are possible; for a comparison see [2]. From (10) and (12) we obtain $A_\infty = -E$. This suggests the stopping criterion $\|A_k + E\| \le \mathrm{tol} \cdot \|E\|$ for a suitable norm and a user-defined tolerance tol. This criterion is easy to check and does not require additional computations or workspace. As suggested in [3] in our ....
Z. Bai and J. Demmel, Design of a parallel nonsymmetric eigenroutine toolbox, Part I, in Proceedings of the Sixth SIAM Conference on Parallel Processing for Scientific Computing, R. S. et al, ed., 1993.
.... for example, in geodesy [17], computer-aided design [19], nonlinear least-squares problems [25], the solution of integral equations [15], and in the calculation of splines [18]. Other applications arise in beam forming [8], spectral estimation [23], regularization [21, 29], and eigenproblems [3]. Algorithms for the reliable computation of rank-revealing factorizations have recently received considerable attention (see, for example, [6, 7, 10, 11, 20, 26, 27]). However, the most common approach to computing such an RRQRF is the column-pivoting procedure suggested by Businger and Golub [9] ....
Z. Bai and J. Demmel, Design of a parallel nonsymmetric eigenroutine toolbox, Part I, in Proceedings of the Sixth SIAM Conference on Parallel Processing for Scientific Computing, R. F. S. et al, ed., SIAM, 1993, pp. 391--398.
No context found.
Z. Bai and J. Demmel. Design of a parallel nonsymmetric eigenroutine toolbox, Part I. In Proceedings of the Sixth SIAM Conference on Parallel Processing for Scientific Computing. SIAM, 1993. Long version available as UC Berkeley Computer Science report all.ps.Z via anonymous ftp from toe.cs.berkeley.edu, directory pub/tech-reports/cs/csd-92-718. 32
....Grants CDA 8722788 and CDA 9401156. .... However, the numerical accuracy and stability of the matrix sign function and divide and conquer algorithms based on it are poorly understood. In this paper, we will address these issues. Much of this work also appears in [3]. Let us first restate some of the basic definitions and ideas to establish notation. The matrix sign function of a matrix $A$ is defined as follows [30]: let $A = X \operatorname{diag}(J_+, J_-) X^{-1}$ be the Jordan canonical form of a matrix $A \in \mathbb{C}^{n \times n}$, where the eigenvalues of $J_+$ lie in the open right half plane ....
Z. Bai and J. Demmel, Design of a Parallel Nonsymmetric Eigenroutine Toolbox, Part II, Department of Mathematics Research Report 95-11, University of Kentucky, Lexington, KY, 1995.
No context found.
Z. Bai and J. Demmel, Design of a Parallel Nonsymmetric Eigenroutine Toolbox, Computer Science Tech. Report UCB/CSD-92-718, U.C.Berkeley, 1992.
....the time is spent in these routines. The QR algorithm computes all the eigenvalues of the given matrix. It is difficult to restrict it to computing eigenvalues in the domain of interest without any heuristics, although some progress on large matrices on parallel machines has recently been made [BD93]. The order of the matrix, say p, corresponds to the product of the degrees of the two curves, and the number of eigenvalues is equal to the order of the matrix. The running time of the algorithm is a cubic function of p. However, eigenvalue algorithms have good convergence. Each iteration of the ....
Z. Bai and J. Demmel. Design of a parallel nonsymmetric eigenroutine toolbox, Part I. In Proceedings of the Sixth SIAM Conference on Parallel Processing for Scientific Computing. SIAM, 1993. Long version available as UC Berkeley Computer Science report UCB/CSD/92-718.
....a spectral divide and conquer (SDC) algorithm for finding eigenvalues and invariant subspaces of nonsymmetric matrices on distributed memory parallel computers. The algorithm recursively divides the matrix into smaller submatrices, each of which has a subset of the original eigenvalues as its own [5, 28, 3]. On a 256 processor Intel Touchstone Delta system, the SDC algorithm reached 31% efficiency with respect to the underlying matrix multiplication (PUMMA [13]) for matrices of order 4000 and 82% efficiency with respect to the underlying ScaLAPACK 1.0 matrix inversion. On a 32 processor Thinking ....
....the ways to compute the spectral projector P is to use the matrix sign function. The matrix sign function was introduced by Roberts [31] for solving the algebraic Riccati equation. However, it was soon extended to solving the spectral decomposition problem [5]. More recent studies may be found in [28, 3, 23]. The matrix sign function, sign(A), of a matrix A with no eigenvalues on the imaginary axis can be defined via the Jordan canonical form of A (1), where the eigenvalues of $J_+$ are in the open right half-plane $D_+$, and the eigenvalues of $J_-$ are in the open left half-plane $D_-$. Then sign(A) is ....
Z. BAI AND J. DEMMEL, Design of a parallel nonsymmetric eigenroutine toolbox, Part I, in Proceedings of the Sixth SIAM Conference on Parallel Processing for Scientific Computing, R. F. Sincovec et al., ed., SIAM, Philadelphia, PA, 1993.
....79, 78, 21, 45, 37, 82, 81, 75]. 2. Reduction to nonsymmetric tridiagonal form [46, 32, 43, 44]. 3. Jacobi's method [38, 39, 74, 61, 69, 65, 80]. 4. Divide and conquer based on Newton's method or homotopy continuation [16, 17, 83, 57, 58, 34]. 5. Divide and conquer based on the matrix sign function [59, 7, 60]. In contrast to the symmetric problem or SVD, no guaranteed stable and highly parallel algorithm for the nonsymmetric problem exists. Reduction to Hessenberg form (the prerequisite to methods (1) and (4) above) can be done efficiently [33, 36], but Hessenberg QR is hard to parallelize, and the ....
....the operating system, which is quite slow), overflow can cause a slowdown of several orders of magnitude. For the generalized nonsymmetric eigenproblem $A - \lambda B$ we do not even know how to perform generalized Hessenberg reduction using more than the Level 1 BLAS. The sign function and related techniques [60, 7] promise to be helpful here. 3 Recommendations for Floating Point Arithmetic. We summarize the recommendations we have made in previous sections regarding floating point arithmetic support to mitigate the tradeoff between parallelism (or speed) and stability: accurate rounding, support for ....
Z. Bai and J. Demmel. Design of a parallel nonsymmetric eigenroutine toolbox. Computer Science Dept. preprint, University of California, Berkeley, CA, 1992.
No context found.
Z. Bai and J. Demmel, Design of a Parallel Nonsymmetric Eigenroutine Toolbox, Computer Science Tech. Report UCB/CSD-92-718, U.C.Berkeley, 1992.
....in many applications. In adaptive mesh refinement (AMR) algorithms, there is task parallelism between meshes and data parallelism within a mesh [2]. In computing eigenvalues of nonsymmetric matrices, the sign function algorithm does divide and conquer with matrix factorizations at each division [3]. In timing-level circuit simulation there is parallelism between separate subcircuits and parallelism within the model evaluation of each subcircuit [26]. In sparse matrix factorization, multifrontal algorithms expose task parallelism between separate dense submatrices and data parallelism within ....
....the running times of full divide and conquer trees using these three kinds of parallelism and apply the results to our examples. 4.1.1 Applications. Eigenvalue algorithms. Eigenvalue algorithms exhibit mixed parallelism. For example, a recent implementation of a dense nonsymmetric algorithm [3] proceeds by successively separating the matrix into two submatrices, the union of whose eigenvalues are the eigenvalues of the original matrix. The root node has size $N = n^2$. If the separation is perfect, each child is of size $\frac{n}{2} \times \frac{n}{2}$, or $N/4$. Performing this separation requires ....
Z. Bai and J. Demmel. Design of a parallel nonsymmetric eigenroutine toolbox, Part I. In Proceedings of the Sixth SIAM Conference on Parallel Processing for Scientific Computing. SIAM, 1993.
....memory parallel computers. These algorithms perform spectral divide and conquer, i.e., they recursively divide the matrix into smaller submatrices, each of which has a subset of the original eigenvalues as its own. One algorithm uses the matrix sign function evaluated with Newton iteration [8, 42, 6, 4]. The other algorithm avoids the matrix inverse required by Newton iteration, and so is called the inverse free algorithm [30, 10, 44, 7]. Both algorithms are simply constructed from a small set of highly parallelizable building blocks, including matrix multiplication, QR decomposition and matrix ....
....Newton iteration with global convergence still need to compute the inverse of a matrix explicitly in one form or another. Dealing with ill conditioned matrices and instability in the Newton iteration for computing the matrix sign function and the subsequent spectral decomposition is discussed in [11, 6, 4] and the references therein. 2.2 The SDC algorithm with inverse free iteration The above algorithm needs an explicit matrix inverse. This could cause numerical instability when the matrix is ill conditioned. The following algorithm, originally due to Godunov, Bulgakov and Malyshev [30, 10, 44] ....
Z. Bai and J. Demmel. Design of a parallel nonsymmetric eigenroutine toolbox, Part II. in preparation.
....differing very slightly from the polynomial in the bottom row [47, 48]. In most applications we are only interested in eigenvalues in a given domain. Until recently, there were no reliable algorithms other than QR, which must find all the eigenvalues, wanted or not. These recent algorithms [42] generally take more floating point operations than QR, but have the advantage of being easy to parallelize, which QR is not. Another class of parallel eigenvalue algorithm applies homotopy continuation to Determinant(L(s)) [43, 44]. These algorithms, while parallelizable and often accurate, ....
Z. Bai and J. Demmel. Design of a parallel nonsymmetric eigenroutine toolbox, Part I. In Proceedings of the Sixth SIAM Conference on Parallel Processing for Scientific Computing, SIAM, 1993.
No context found.
Z. Bai, J. W. Demmel, Design of a Parallel Nonsymmetric Eigenroutine Toolbox, Part I, Technical Report RR 92-09, Department of Mathematics, University of Kentucky, 1992.
No context found.
Z. Bai and J. Demmel, Design of a parallel nonsymmetric eigenroutine toolbox, Part I, in Proceedings of the Sixth SIAM Conference on Parallel Processing for Scientific Computing, R. F. S. et al, ed., SIAM, 1993, pp. 391--398.
No context found.
Z. Bai and J.W. Demmel. Design of a parallel nonsymmetric eigenroutine toolbox, Part II. Technical report, Dept. of Mathematics, University of California, Berkeley, Berkeley, CA, 1994.
No context found.
Z. Bai and J.W. Demmel. Design of a parallel nonsymmetric eigenroutine toolbox. In Proc. of the 6th SIAM Conf. on Parallel Processing for Scientific Computing, Philadelphia, 1993. SIAM. Long version available from Dept. of Mathematics, University of California, Berkeley.
No context found.
Z. Bai and J.W. Demmel. Design of a parallel nonsymmetric eigenroutine toolbox, Part I. Technical Report CSD-92-718, Dept. of Mathematics, University of California, Berkeley, Berkeley, CA, 1992.
No context found.
Z. Bai and J. Demmel, "Design of a parallel nonsymmetric eigenroutine toolbox, Part I", in Proceeding of the sixth SIAM conference on parallel processing for scientific computing, 1993.
No context found.
Z. BAI AND J. DEMMEL, Design of a parallel nonsymmetric eigenroutine toolbox, Part I, in Proceedings of the Sixth SIAM Conference on Parallel Processing for Scientific Computing, R. F. S. et al, ed., SIAM, 1993, pp. 391-398.
No context found.
Z. Bai and J. Demmel. Design of a parallel nonsymmetric eigenroutine toolbox, Part I. In Proceedings of the Sixth SIAM Conference on Parallel Processing for Scientific Computing. SIAM, 1993.
No context found.
Z. Bai and J. Demmel, Design of a parallel nonsymmetric eigenroutine toolbox, Part I, in Proceedings of the Sixth SIAM Conference on Parallel Processing for Scientific Computing, R. F. S. et al, ed., SIAM, 1993, pp. 391--398.