41 citations found.
Z. Bai and J. Demmel. On a block implementation of Hessenberg multishift QR iteration. International Journal of High Speed Computing, 1(1):97--112, 1989. (also LAPACK Working Note #8 http://www.netlib.org/lapack/lawns/lawn8.ps).

This paper is cited in the following contexts:

Recent Developments in Dense Numerical Linear Algebra - Higham (2000)

....and LAPACK 2.0) fails to converge, both in exact arithmetic and in floating-point arithmetic [16, 31]; heuristic remedies are proposed by Day [31]. LAPACK includes an implementation of the QR algorithm that uses a multishift strategy to enhance the performance on high-performance machines [5]. In the Hessenberg QR iteration, instead of using a single or double shift and chasing the resulting bulge of dimension 1 or 2 a column at a time down the matrix, $k$ simultaneous shifts are used and the $k \times k$ bulge is chased $p$ columns at a time. Here, $k$ and $p$ are implementation dependent ....

Zhaojun Bai and James W. Demmel. On a block implementation of Hessenberg multishift QR iteration. Int. J. High Speed Computing, 1(1):97--112, 1989.
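The Higham excerpt says that $k$ simultaneous shifts produce a bulge whose size grows with $k$. One way to see why, sketched below in pure Python as an illustration rather than LAPACK's actual code: an implicit multishift step is driven by $x = (H - \sigma_1 I)(H - \sigma_2 I)\cdots(H - \sigma_k I)e_1$, and for upper Hessenberg $H$ each factor can extend the nonzero part of $x$ by only one entry, so just the first $k+1$ entries of $x$ can be nonzero. The reflector built from $x$ therefore touches $k+1$ rows, which is the bulge that is then chased. The function name `first_column_of_shift_polynomial` is hypothetical.

```python
from typing import List, Sequence


def first_column_of_shift_polynomial(H: Sequence[Sequence[float]],
                                     shifts: Sequence[complex]) -> List[complex]:
    """Return x = (H - s_1 I)(H - s_2 I)...(H - s_k I) e_1.

    The factors commute, so the order of the shifts does not matter.
    Illustrative sketch only (hypothetical helper, not a LAPACK routine).
    """
    n = len(H)
    x: List[complex] = [1.0] + [0.0] * (n - 1)  # e_1
    for s in shifts:
        # x <- (H - s I) x: one matrix-vector product per shift
        x = [sum(H[i][j] * x[j] for j in range(n)) - s * x[i] for i in range(n)]
    return x


# A 6-by-6 upper Hessenberg test matrix (zero below the first subdiagonal).
H = [[float(i + 2 * j + 1) if i <= j + 1 else 0.0 for j in range(6)] for i in range(6)]
x = first_column_of_shift_polynomial(H, [0.5, 1.5, 2.5])  # k = 3 shifts
print(x)  # entries x[4] and x[5] are exactly zero
```

With $k = 3$ shifts only $x_0, \dots, x_3$ survive; adding more shifts widens the nonzero head of $x$, and with it the bulge.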


Parallelizing The QR Algorithm For The Unsymmetric Algebraic.. - Henry, van de Geijn (1994)   (20 citations)

.... parallel) implementations of the QR algorithm use a blocked version of the Francis double implicit shifted algorithm [15] or a variant thereof [23]. There have also been attempts at improving data reuse by increasing the number of shifts, either by using a multi-implicit-shift QR algorithm [3] or by pipelining several double shifts simultaneously [34, 35]. A number of attempts at parallelizing the QR algorithm have been made (see Boley and Maier [9], Geist et al. [17, 16], and Stewart [30]). Distributing the work evenly amongst the processors has proven difficult for conventional ....

Bai, Z., Demmel, J., On a Block Implementation of Hessenberg Multishift QR Iteration, International Journal of High Speed Computing, Vol. 1, pp. 97--112, 1989


A Distributed Memory Implementation of the Nonsymmetric.. - Dongarra, Henry, Watkins (1996)

....[31], and it has data reuse similar to Level 1 operations (it does $O(n)$ flops on $O(n)$ data [23]). This imposes an upper limit to how fast it can run on high-performance computers with a memory hierarchy. One attempt to rectify this problem was the multishift QR algorithm of Bai and Demmel [3], which we mentioned earlier. The idea was to generate a large number $M$ of shifts and use them to chase a large bulge. This allowed a GEMM-based (Level 3 BLAS) algorithm to be used [3]. Unfortunately, this requires too many additional flops, and the GEMM itself has two of the three required dimensions very small [28]. However, even if a multishift QR algorithm is used without the additional matrix multiply (as was implemented in LAPACK [1]), the algorithm has convergence problems ....

Bai, Z., Demmel, J., On a Block Implementation of Hessenberg Multishift QR Iteration, Argonne National Laboratory Technical Report ANL-MCS-TM-127, 1989, and International Journal of High Speed Computing, Vol. 1, 1989, pp. 97--112
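The Dongarra, Henry, and Watkins excerpt objects that the multishift GEMM "has two of the three required dimensions very small." A common back-of-the-envelope way to quantify this is flops per matrix entry moved. The sketch below uses a simplified model (an assumption, not taken from the excerpt): a GEMM $C \leftarrow C + AB$ with $A$ of size $m \times k$ and $B$ of size $k \times n$ does $2mnk$ flops and touches each entry of $A$, $B$, and $C$ once. The function name is hypothetical.

```python
def gemm_flops_per_entry(m: int, n: int, k: int) -> float:
    """Flops per matrix entry touched by C <- C + A B, with A m-by-k and B k-by-n.

    Simplified model (an assumption): 2*m*n*k flops, and every entry of A, B
    and C is moved between memory and cache exactly once.
    """
    flops = 2 * m * n * k
    entries = m * k + k * n + m * n
    return flops / entries


print(gemm_flops_per_entry(1000, 1000, 1000))  # all dimensions large
print(gemm_flops_per_entry(1000, 10, 10))      # two dimensions small
```

Under this model the square case performs several hundred flops per entry moved, while the case with two dimensions equal to 10 performs only about ten, which is why such a GEMM gains little from a memory hierarchy.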


A Parallelizable Eigensolver for Real Diagonalizable.. - Huss-Lederman, Tsao.. (1997)   (20 citations)

.... include bisection/multisection followed by inverse iteration [21, 22, 20], Cuppen's divide-and-conquer algorithm [9, 14, 28], Jacobi methods [29, 7, 10, 30], and homotopy methods [25]. Parallelizable algorithms for dense nonsymmetric matrices that have been investigated include the QR algorithm [3, 32], Jacobi-like methods [31], homotopy methods [24], and the matrix sign function approach to computing invariant subspaces [6, 11, 12, 19, 26, 4]. The purpose of this paper is to present preliminary research results on a new algorithm for finding all the eigenvalues and eigenvectors of a real ....

Z. BAI AND J. DEMMEL, On a block implementation of Hessenberg multishift QR iteration, Internat. J. High Speed Comput., 1 (1989), pp. 97--112.


QR-like Algorithms for Eigenvalue Problems - Watkins (2000)   (1 citation)

....$f(z) = (z - \sigma)(z - \tau)$ gives a double GR step with shifts $\sigma$ and $\tau$. A double step is worth two single steps. The standard QR codes for real matrices (dating back to Francis [29]) take double steps with either real $\sigma$ and $\tau$ or complex $\tau = \bar{\sigma}$. This keeps the computations real. The multishift QR algorithm [3] takes $f(z) = (z - \sigma_1)(z - \sigma_2)\cdots(z - \sigma_p)$, where $p$ can be as big as one pleases in principle. In practice, roundoff errors cause problems if $p$ is taken much bigger than six. A more exotic choice would be a rational function such as $f(z) = \ldots$ ....

....and 2 were the only choices of $p$ that were used. The structure of certain types of matrices [17] causes their eigenvalues to come in sets of four (e.g. $\lambda, -\lambda, \bar{\lambda}, -\bar{\lambda}$). For these matrices the choice $p = 4$ is obviously in order. The use of large values of $p$ was first advocated by Bai and Demmel [3]. This seemed like an excellent idea. If one gets, say, thirty shifts from the lower right-hand $30 \times 30$ submatrix and uses them for a QR step of degree $p = 30$, then one has to chase a $30 \times 30$ bulge. This is like doing 30 steps at a time, and it entails a lot of arithmetic. Since the ....

Z. Bai and J. Demmel, On a block implementation of the Hessenberg multishift QR iteration, Internat. J. High Speed Comput., 1 (1989), pp. 97--112.
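Watkins notes that a double step with complex $\tau = \bar{\sigma}$ "keeps the computations real", and that a multishift step uses $f(z) = (z - \sigma_1)\cdots(z - \sigma_p)$. A minimal sketch of the underlying arithmetic, assuming shifts are supplied as Python numbers: expanding the product shows that whenever complex shifts come in conjugate pairs, every coefficient of $f$ is real, for example $(z - \sigma)(z - \bar{\sigma}) = z^2 - 2\,\mathrm{Re}(\sigma)\,z + |\sigma|^2$. The function name is hypothetical.

```python
from typing import List


def shift_polynomial_coeffs(shifts: List[complex]) -> List[complex]:
    """Coefficients of f(z) = (z - s_1)(z - s_2)...(z - s_p), highest degree first."""
    coeffs: List[complex] = [1]
    for s in shifts:
        # Multiply the current polynomial by (z - s).
        nxt = coeffs + [0]
        for i, c in enumerate(coeffs):
            nxt[i + 1] -= s * c
        coeffs = nxt
    return coeffs


print(shift_polynomial_coeffs([1 + 2j, 1 - 2j]))  # conjugate pair: real coefficients
print(shift_polynomial_coeffs([1, 2, 3]))         # p = 3 real shifts
```

The first call yields the coefficients of $z^2 - 2z + 5$ (stored as complex numbers with zero imaginary part), which is the kind of real quadratic a double-shift code works with instead of the complex shifts themselves.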


A High Performance Parallelization Scheme for the.. - Suda, Nishida, Oyanagi

....overheads are a little worse than the block Hankel-wrapped storage scheme. Henry and van de Geijn [4] reported an implementation of a parallel Hessenberg double-shift QR algorithm with the block Hankel-wrapped storage scheme. The performance was much better than the preceding research results [1–3, 9], but the parallel efficiency was not satisfactory. The major overhead that determines the parallel efficiency of their implementation was the idle time waiting for the computations of the look-ahead steps (computing transformations and rotating rows on diagonal blocks) and the broadcasts of the ....

....in some blocks near the diagonal, but a quarter (4 2 ) still remains. The column rotations in a processor are executed from bottom to top, because the next processor uses results of the computations of the half block at the bottom in the fourth quarter of the same block transformation. Therefore, the latency of ....
[Fig. 5. The whole story of the task scheduling (p = 2): quarters of row rotations and quarters of column rotations, with blocks mapped to proc 0 and proc 1.]

Z. Bai and J. Demmel, On a block implementation of Hessenberg multishift QR iteration, International Journal of High Speed Computing, Vol. 1, No. 1 (1989) 97--112.


Homotopy Method For The Large Sparse Real Nonsymmetric.. - Lui, Keller, Kwok (1996)   (3 citations)

....entire matrix. This may pose a problem if the matrix is so large that not all its entries can be accommodated within the main memory of the computer. A second drawback is that it is inherently a sequential algorithm, because Givens rotations must be applied sequentially. Bai and Demmel [3] have partly circumvented the second problem by performing a block version of the QR algorithm. This improved version seems to work well on vector machines. We now describe a homotopy method to compute the eigenpairs of a given matrix $A_1$. From the eigenpairs of some real matrix $A_0$, we ....

Z. Bai and J. Demmel. On a block implementation of Hessenberg multishift QR iteration. Int. J. High Speed Computing, 1(1):97--112, 1989.


How the QR algorithm fails to converge and how to fix it - Day (1996)   (6 citations)

....with polynomial Y k p k ( 3 Property (1) shows that an ideal shift is a polynomial that annihilates an eigenvalue of $H$. Towards this end, $p(\cdot)$ is usually the characteristic polynomial of a South East (SE) submatrix of $H$ at each step. This is called a generalized Rayleigh quotient shift [1]. The Rayleigh quotient shift, translation by the SE element of $H$, is never a complex number and thus a poor shift for nonsymmetric matrices. QRF (see equation 3) uses the next simplest shift, the generalized Rayleigh quotient shift for the 2-by-2 SE submatrix. The exceptional shift ....

....) IF( ABS(H33) ABS(H44) GT. ZERO )THEN H33 = H33 H44 H43H34 20 H44 = H33 ( SIGN( DISC, AVE ) AVE ) ELSE H44 = SIGN( DISC, AVE ) AVE END IF H33 = H44 H43H34 = ZERO END If. Section 6, Multishift QR Fails: We take as the definition of multishift QR the subroutine HSEQR from version 2.0 of LAPACK [1, 2]. To compute shifts, HSEQR uses the LAPACK implementation of QR, LAHQR. In the examples at hand, LAHQR frequently terminates without computing eigenvalues. For this reason we substituted our version of LAHQR, modified as described in the next section to converge in all known cases. HSEQR terminates ....

Z. Bai and J. Demmel (1989). On a block implementation of the Hessenberg multishift QR iteration, Int. J. High-speed Comp. 1, 97-112.
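Day's excerpt describes the generalized Rayleigh quotient shift for the 2-by-2 SE submatrix: the two shifts are the eigenvalues of the trailing 2-by-2 block of $H$. A minimal sketch via the quadratic formula, assuming a dense list-of-lists $H$; the function name is hypothetical, and production codes (for example LAPACK's DLANV2) treat the 2-by-2 case more carefully than this naive formula. A double-shift step needs only the trace and determinant of the block, which stay real even when the shifts are a complex-conjugate pair.

```python
import cmath
from typing import List, Tuple


def trailing_2x2_shifts(H: List[List[float]]) -> Tuple[complex, complex]:
    """Eigenvalues of the 2-by-2 south-east (trailing) submatrix of H."""
    a, b = H[-2][-2], H[-2][-1]
    c, d = H[-1][-2], H[-1][-1]
    tr = a + d               # sum of the two shifts (real)
    det = a * d - b * c      # product of the two shifts (real)
    disc = cmath.sqrt(tr * tr / 4 - det)
    return tr / 2 + disc, tr / 2 - disc


print(trailing_2x2_shifts([[1.0, 2.0], [-2.0, 1.0]]))           # complex pair
print(trailing_2x2_shifts([[9.0, 9.0, 9.0],
                           [1.0, 4.0, 1.0],
                           [0.0, 2.0, 3.0]]))                    # real pair
```

For the first block the shifts are $1 \pm 2i$; for the trailing block $\begin{pmatrix} 4 & 1 \\ 2 & 3 \end{pmatrix}$ of the second matrix they are 5 and 2.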


Convergence of Algorithms of Decomposition Type for the.. - Watkins, Elsner (1995)   (22 citations)

....attention on single and double steps, i.e. steps of multiplicity one and two, respectively. In recent years it has been recognized that steps of higher multiplicity are sometimes useful. For example, in the SR algorithm [9] it is natural to use steps of multiplicity four. Recently Bai and Demmel [1] have experimented with QR steps of multiplicity as high as 20, the objective being to improve the opportunities for parallelism in a QR step. Since we allow steps of any multiplicity, our theory covers all of these cases. In §3 we show that every GR algorithm is a form of nested subspace ....

Z. Bai and J. Demmel, On a block implementation of Hessenberg multishift QR iteration, LAPACK Working Note No. 8, Argonne National Laboratory MCS-TM-127, January 1989.


The Transmission of Shifts and Shift Blurring in the QR Algorithm - Watkins (1992)   (3 citations)

....attempts to parallelize the QR algorithm have been mostly unsatisfactory. However, the work of Henry and van de Geijn [10, 11] is recent good news. One attempt at parallelization that appeared at first to have great promise was to use the multishift QR algorithm with high multiplicities [4, 5]. A multishift iteration of multiplicity $m$ amounts to $m$ iterations of the ordinary QR algorithm performed at once. Unfortunately, the multishift algorithm turned out to have serious convergence difficulties caused by roundoff errors when large multiplicities were used [6, 17]. The intent ....

Z. Bai and J. Demmel, On a block implementation of the Hessenberg multishift QR iteration, Internat. J. High Speed Comput., 1:97-112 (1989).


QR-like Algorithms - An Overview of Convergence Theory and Practice - Watkins (1996)

....$f(z) = (z - \sigma)(z - \tau)$ gives a double GR step with shifts $\sigma$ and $\tau$. A double step is worth two single steps. The standard QR codes for real matrices (dating back to Francis [18]) take double steps with either real $\sigma$ and $\tau$ or complex $\tau = \bar{\sigma}$. This keeps the computations real. The multishift QR algorithm [2] takes $f(z) = (z - \sigma_1)(z - \sigma_2)\cdots(z - \sigma_p)$, where $p$ can be as big as one pleases in principle. In practice, roundoff errors cause problems if $p$ is taken much bigger than six. A more exotic choice would be a rational function such as $f(z) = \ldots$ ....

....the shifts are ill defined, blurred so to speak. In view of these findings it is no wonder that multishift QR performs poorly when $p$ is large. Many more details are given in [35]. The motive for using large $p$ is to improve the prospects for parallelizing the QR algorithm [2]. A potential route to parallelism that circumvents the shift-blurring problem is now under investigation. If one has, say, 30 shifts and wishes to perform a GR step of degree 30, one can instead use the 30 shifts to perform fifteen double steps. Thus, rather than chasing one big bulge, one can ....

Z. Bai and J. Demmel, On a block implementation of the Hessenberg multishift QR iteration, Internat. J. High Speed Comput. 1 (1989), 97--112.
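Watkins suggests using 30 shifts to perform fifteen double steps instead of one step of degree 30. Each double step must stay in real arithmetic, so its two shifts must be either both real or a complex-conjugate pair. The sketch below shows one simple way to form such pairs; the pairing rule for real shifts (adjacent in sorted order) and the tolerance are assumptions for illustration, not the procedure of any cited paper, and the function name is hypothetical.

```python
from typing import List, Tuple


def pair_shifts_for_double_steps(shifts: List[complex],
                                 tol: float = 1e-12) -> List[Tuple[complex, complex]]:
    """Group shifts into pairs that each define a real double step.

    Assumes complex shifts occur in conjugate pairs. Each complex shift with
    positive imaginary part is paired with its conjugate; the real shifts are
    sorted and paired with their neighbours (an illustrative choice).
    """
    real = sorted(s.real for s in shifts if abs(s.imag) <= tol)
    upper = [s for s in shifts if s.imag > tol]  # one member of each conjugate pair
    if len(real) % 2 or 2 * len(upper) + len(real) != len(shifts):
        raise ValueError("need an even number of real shifts and conjugate complex pairs")
    pairs: List[Tuple[complex, complex]] = [(s, s.conjugate()) for s in upper]
    pairs += [(real[i], real[i + 1]) for i in range(0, len(real), 2)]
    return pairs


print(pair_shifts_for_double_steps([1 + 2j, 1 - 2j, 3, 4, -1 + 0.5j, -1 - 0.5j]))
```

Six shifts give three double steps; each pair has a real sum and a real product, so each step can be carried out with the real quadratic $z^2 - (\sigma + \tau)z + \sigma\tau$.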


Preliminary LAPACK Users' Guide - Anderson, Bai, Bischof, Demmel.. (1991)   Self-citation (Bai Demmel)

....matrix, or in computing the singular values and vectors of a bidiagonal matrix. However, for computing the eigenvalues and eigenvectors of a Hessenberg matrix, or rather for computing its Schur factorization, yet another flavour of block algorithm has been developed: a multishift QR iteration [3]. Whereas the traditional EISPACK routine HQR uses a double shift (and the corresponding complex routine COMQR uses a single shift), the multishift algorithm uses block shifts of higher order. It has been found that the total number of operations decreases as the order of shift is increased until ....

Z. Bai and J. Demmel. On a block implementation of Hessenberg multishift QR iteration. Int. J. High Speed Comput., 1:97--112, 1989. (Also LAPACK Working Note #8.)


Block LU Factorization - Demmel, al. (1995)   (5 citations)  Self-citation (Demmel)

.... block algorithm can cause confusion, and so we do not recommend this terminology. Note that in the particular case of matrix multiplication, partitioned and block algorithms are equivalent. LAPACK contains only partitioned algorithms. A possible exception is the multishift Hessenberg QR iteration [2], which could be regarded as a block algorithm, even though it does not work with a block Hessenberg form. As this example indicates, not all algorithms fit neatly into one class or the other, so our definitions should not be interpreted too strictly. Block LU factorization is one of the few block ....

Zhaojun Bai and James W. Demmel, On a block implementation of Hessenberg multishift QR iteration, Int. J. High Speed Computing, 1 (1989), pp. 97--112.


Stability of Block Algorithms with Fast Level 3 BLAS - Demmel, Higham (1992)   (11 citations)  Self-citation (Demmel)

....and then $B$ is updated according to $B \leftarrow (I - W_r Y_r^T)B = B - W_r(Y_r^T B)$, which involves only BLAS3 operations. The process is now repeated on the last $m - r$ rows of $B$. An alternative form of accumulation is proposed in [14] for $r = 2$, extended to general $r$ in [13] and used in [2]. In the context of orthogonal similarity reduction to Hessenberg form, the technique involves expressing $P_r P_{r-1} \cdots P_1 A P_1 \cdots P_{r-1} P_r = A - U_r V_r^T - W_r U_r^T$ (3.1), where $U_r, V_r, W_r \in \mathbb{R}^{n \times r}$. We refer the reader to [13] for details of how to ....

Z. Bai and J.W. Demmel, On a block implementation of Hessenberg multishift QR iteration, Int. J. High Speed Computing, 1 (1989), pp. 97--112.
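The Demmel and Higham excerpt applies an accumulated block of Householder reflectors as $B \leftarrow (I - W_r Y_r^T)B = B - W_r(Y_r^T B)$, so the update costs two matrix-matrix products. A minimal pure-Python sketch, assuming the standard WY form in which $P_1 P_2 \cdots P_r = I - WY^T$ is built one reflector at a time via $w_j = \beta_j (v_j - W(Y^T v_j))$. It does not implement the different accumulation (3.1) used for Hessenberg reduction, and all function names are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(A: Matrix, B: Matrix) -> Matrix:
    return [[sum(A[i][t] * B[t][j] for t in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A: Matrix) -> Matrix:
    return [list(row) for row in zip(*A)]


def householder(v: List[float]) -> Matrix:
    """Explicit reflector P = I - beta v v^T with beta = 2 / (v^T v)."""
    beta = 2.0 / sum(x * x for x in v)
    n = len(v)
    return [[(1.0 if i == j else 0.0) - beta * v[i] * v[j] for j in range(n)]
            for i in range(n)]


def accumulate_wy(vs: List[List[float]]) -> Tuple[Matrix, Matrix]:
    """Return n-by-r W and Y with P_1 P_2 ... P_r = I - W Y^T (P_j built from vs[j-1])."""
    n = len(vs[0])
    W_cols: List[List[float]] = []
    Y_cols: List[List[float]] = []
    for v in vs:
        beta = 2.0 / sum(x * x for x in v)
        ytv = [sum(y[i] * v[i] for i in range(n)) for y in Y_cols]  # Y^T v
        # New column w = beta * (v - W (Y^T v)), i.e. beta * (P_1 ... P_{j-1}) v.
        w = [beta * (v[i] - sum(W_cols[t][i] * ytv[t] for t in range(len(W_cols))))
             for i in range(n)]
        W_cols.append(w)
        Y_cols.append(list(v))
    return transpose(W_cols), transpose(Y_cols)


def apply_wy(W: Matrix, Y: Matrix, B: Matrix) -> Matrix:
    """B <- (I - W Y^T) B, evaluated as B - W (Y^T B): two matrix-matrix products."""
    WYtB = matmul(W, matmul(transpose(Y), B))
    return [[B[i][j] - WYtB[i][j] for j in range(len(B[0]))] for i in range(len(B))]


vs = [[1.0, 2.0, 0.0, 1.0], [0.0, 1.0, -1.0, 3.0]]
B = [[1.0, 0.0], [2.0, 1.0], [0.0, 3.0], [1.0, 1.0]]
W, Y = accumulate_wy(vs)
blocked = apply_wy(W, Y, B)
explicit = matmul(matmul(householder(vs[0]), householder(vs[1])), B)
print(max(abs(blocked[i][j] - explicit[i][j]) for i in range(4) for j in range(2)))
```

The printed difference between the blocked update and applying the explicit reflectors one after the other should be at the level of rounding error.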


On Designing Portable High Performance . . . - Demmel, al. (1992)   Self-citation (Demmel)

....these methods are currently implemented only as serial codes. We intend to supply parallel versions in future releases. Third, the Hessenberg eigenvalue algorithm has proven quite difficult to parallelize. We have a partially blocked implementation of the QR algorithm but the speedup is modest [5]. There has been quite recent progress [19] but it remains an open problem to produce a highly parallel and reliably stable and convergent algorithm for this problem and for the generalized Hessenberg eigenvalue problem. Fourth is the issue of performance tuning, in particular choosing the block ....

Z. Bai and J. Demmel. On a block implementation of Hessenberg multishift QR iteration. International Journal of High Speed Computing, 1(1):97--112, 1989. (also LAPACK Working Note #8).


On Designing Portable High Performance . . . - Demmel, al. (1991)   Self-citation (Demmel)

....these methods are currently implemented only as serial codes. We intend to supply parallel versions in future releases. Third, the Hessenberg eigenvalue algorithm has proven quite difficult to parallelize. We have a partially blocked implementation of the QR algorithm but the speedup is modest [5]. There has been quite recent progress [19] but it remains an open problem to produce a highly parallel and reliably stable and convergent algorithm for this problem and for the generalized Hessenberg eigenvalue problem. Fourth is the issue of performance tuning, in particular choosing the block ....

Z. Bai and J. Demmel. On a block implementation of Hessenberg multishift QR iteration. International Journal of High Speed Computing, 1(1):97--112, 1989. (also LAPACK Working Note #8).


Trading Off Parallelism and Numerical Stability - Demmel (1992)   (9 citations)  Self-citation (Demmel)

....algorithm, it took several years of research to discover this, so the price paid for poorly rounded floating point was several years of delay. 2.5 The nonsymmetric eigenproblem. Five kinds of parallel methods for the nonsymmetric eigenproblem have been investigated: 1. Hessenberg QR iteration [6, 79, 78, 21, 45, 37, 82, 81, 75], 2. Reduction to nonsymmetric tridiagonal form [46, 32, 43, 44], 3. Jacobi's method [38, 39, 74, 61, 69, 65, 80], 4. Divide and conquer based on Newton's method or homotopy continuation [16, 17, 83, 57, 58, 34], 5. Divide and conquer based on the matrix sign function [59, 7, 60]. In contrast to the ....

Z. Bai and J. Demmel. On a block implementation of Hessenberg multishift QR iteration. International Journal of High Speed Computing, 1(1):97--112, 1989. (also LAPACK Working Note #8).


Execution Time of Symmetric Eigensolvers - Stanley (1997)   (7 citations)

No context found.

Z. Bai and J. Demmel. On a block implementation of Hessenberg multishift QR iteration. International Journal of High Speed Computing, 1(1):97--112, 1989. (also LAPACK Working Note #8 http://www.netlib.org/lapack/lawns/lawn8.ps).


Parallel Adaptive Wavefront Algorithms Solving Lyapunov.. - Claver, Hernandez (1997)

No context found.

Z. Bai and J. Demmel, On a block implementation of Hessenberg multishift QR iteration, Int. Journal of High Speed Computing, Vol. 1, (1989), 97-112.


Cyclic Wavefront Algorithms For Solving Semidefinite Lyapunov .. - Jose Claver And

No context found.

Z. Bai and J. Demmel, On a block implementation of Hessenberg multishift QR iteration, Int. Journal of High Speed Computing, Vol. 1, (1989), 97-112.


Informe Técnico (Technical Report) - Di Block-Oriented Implementations

No context found.

Z. Bai, J. Demmel, On a Block Implementation of Hessenberg Multishift QR Iteration, International Journal of High Speed Computing, 1 (1989), pp. 97--112.


Parallel Wavefront Algorithms Solving Lyapunov Equations for .. - Claver, Hernandez (1997)

No context found.

Z. Bai and J. Demmel, On a block implementation of Hessenberg multishift QR iteration, Int. Journal of High Speed Computing, Vol. 1, (1989), 97-112.


IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 41, NO. 12.. - Andras Varga (1996)

No context found.

Z. Bai and J. W. Demmel, "On a block implementation of the Hessenberg multishift QR iteration," Int. J. High-Speed Comp., vol. 1, pp. 97--112, 1989.


Multishift Algorithm For Pole Assignment Of Single-Input Systems - Varga (1995)

No context found.

Z. Bai and J. W. Demmel. On a block implementation of the Hessenberg multishift QR iteration. Int. J. High-Speed Comp., 1:97--112, 1989.


Improving the Unsymmetric Parallel QR Algorithm on Vector Machines - Henry (1993)

No context found.

Bai, Z., Demmel, J., On a Block Implementation of Hessenberg Multishift QR Iteration, International Journal of High Speed Computing, Vol. 1, 1989, pp. 97-112, Formerly: ANL, Technical Report MCS-TM-127, LAPACK Working Note 8, January 1989.

CiteSeer.IST - Copyright Penn State and NEC