Results 1-10 of 23
Parallel Numerical Linear Algebra, 1993
Abstract

Cited by 773 (23 self)
We survey general techniques and open problems in numerical linear algebra on parallel architectures. We first discuss basic principles of parallel processing, describing the costs of basic operations on parallel machines, including general principles for constructing efficient algorithms. We illustrate these principles using current architectures and software systems, and by showing how one would implement matrix multiplication. Then, we present direct and iterative algorithms for solving linear systems of equations, linear least squares problems, the symmetric eigenvalue problem, the nonsymmetric eigenvalue problem, and the singular value decomposition. We consider dense, band and sparse matrices.
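The survey's matrix-multiplication illustration comes down to blocking: partition the matrices so that each block product is an independent unit of work with good data locality. A minimal serial sketch of that blocking (the block size `b` and the function name are mine, not the survey's):

```python
import numpy as np

def blocked_matmul(A, B, b=64):
    """Blocked matrix multiply: each (i, j) block of C depends only on a
    block row of A and a block column of B, so the (i, j) loop iterations
    are independent units of work for a parallel machine."""
    n, m, p = A.shape[0], A.shape[1], B.shape[1]
    C = np.zeros((n, p))
    for i in range(0, n, b):
        for j in range(0, p, b):
            for k in range(0, m, b):
                C[i:i+b, j:j+b] += A[i:i+b, k:k+b] @ B[k:k+b, j:j+b]
    return C
```

On a real distributed-memory machine the block loops become a processor grid and the `+=` a reduction; the serial loop order shown here only fixes which blocks are combined, not where they live.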
An Analytical Constant Modulus Algorithm, 1996
Abstract

Cited by 166 (35 self)
Iterative constant modulus algorithms such as Godard and CMA have been used to blindly separate a superposition of co-channel constant modulus (CM) signals impinging on an antenna array. These algorithms have certain deficiencies regarding convergence to local minima and the retrieval of all individual CM signals present in the channel. In this paper, we show that the underlying constant modulus factorization problem is, in fact, a generalized eigenvalue problem, and may be solved via a simultaneous diagonalization of a set of matrices. With this new, analytical approach, it is possible to detect the number of CM signals present in the channel and to retrieve all of them exactly, rejecting other, non-CM signals. Only a modest number of samples is required. The algorithm is robust in the presence of noise and is tested on measured data collected from an experimental setup.
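The key observation, that the factorization reduces to a generalized eigenvalue problem, can be seen in miniature: two matrices sharing an eigenvector basis are diagonalized simultaneously by the eigenvectors of the generalized problem M1 x = λ M2 x. This is a toy illustration of simultaneous diagonalization, not the paper's ACMA itself; all names and the test matrices are mine.

```python
import numpy as np
from scipy.linalg import eig

rng = np.random.default_rng(0)
V = rng.standard_normal((4, 4))                 # shared eigenvector basis
M1 = V @ np.diag([1.0, 2.0, 3.0, 4.0]) @ np.linalg.inv(V)
M2 = V @ np.diag([4.0, 1.0, 2.0, 3.0]) @ np.linalg.inv(V)

# Solve M1 x = lambda M2 x: its eigenvectors diagonalize both matrices.
_, X = eig(M1, M2)
Xi = np.linalg.inv(X)
for M in (M1, M2):
    D = Xi @ M @ X
    assert np.allclose(D, np.diag(np.diag(D)), atol=1e-6)
```

The generalized eigenvalues (the ratios of the two diagonals) must be distinct for the common diagonalizer to be recovered uniquely, which is the analogue of the signal-separability condition.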
Joint Angle and Delay Estimation Using Shift-Invariance Properties, 1997
Abstract

Cited by 78 (15 self)
Assuming a multipath propagation scenario, we derive a closed-form, subspace-based method for the simultaneous estimation of arrival angles and path delays from measured channel impulse responses, using knowledge of the transmitted pulse shape function and assuming a uniform linear array and uniform sampling. The algorithm uses a 2-D ESPRIT-like shift-invariance technique to separate and estimate the phase shifts due to delay and direction of incidence, with automatic pairing of the two parameter sets. A straightforward extension to the multiuser case makes it possible to connect rays to users as well.
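The shift-invariance idea is easiest to see in one dimension: the signal subspace of a uniform linear array satisfies a shift relation whose eigenvalues encode the spatial phases. Below is a noiseless 1-D sketch; the paper's 2-D method pairs two such structures (angle and delay) with automatic matching. Function and variable names are mine.

```python
import numpy as np

def esprit_angles(X, d):
    """Toy 1-D ESPRIT. X: snapshots from an M-element uniform linear
    array, d: number of sources. Returns the estimated spatial phase
    shifts mu, where source i contributes a steering vector e^{j*mu_i*m}."""
    # Signal subspace from the SVD of the data matrix
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    Us = U[:, :d]
    # Shift invariance: rows 0..M-2 of Us map to rows 1..M-1 via Phi
    Phi = np.linalg.lstsq(Us[:-1], Us[1:], rcond=None)[0]
    # Eigenvalues of Phi are e^{j*mu_i}; their angles are the phases
    return np.angle(np.linalg.eigvals(Phi))
```

With noiseless data the least-squares step is exact and the phases are recovered to machine precision; with noise it becomes the usual LS-ESPRIT estimate.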
An inverse-free parallel spectral divide and conquer algorithm for nonsymmetric eigenproblems, 1997
Abstract

Cited by 68 (10 self)
We discuss an inverse-free, highly parallel, spectral divide and conquer algorithm. It can compute either an invariant subspace of a nonsymmetric matrix A, or a pair of left and right deflating subspaces of a regular matrix pencil A − λB. This algorithm is based on earlier ones of Bulgakov, Godunov, and Malyshev, but improves on them in several ways. It uses only easily parallelizable linear algebra building blocks, matrix multiplication and QR decomposition, but not matrix inversion. Similar parallel algorithms for the nonsymmetric eigenproblem use the matrix sign function, which requires matrix inversion; they are faster but can be less stable than the new algorithm.
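A single step of an inverse-free iteration of this kind can be written with just one QR factorization and two matrix products: it implicitly squares B⁻¹A without inverting anything. A sketch under my own naming, without the deflation machinery of the full algorithm:

```python
import numpy as np

def inverse_free_step(A, B):
    """One inverse-free iteration step. Factor [B; -A] = Q R with Q a
    full 2n-by-2n orthogonal matrix; the identity Q12^T B = Q22^T A then
    gives B1^{-1} A1 = (B^{-1} A)^2 for A1 = Q12^T A, B1 = Q22^T B,
    using only QR and matrix multiplication."""
    n = A.shape[0]
    Q, _ = np.linalg.qr(np.vstack([B, -A]), mode="complete")
    Q12 = Q[:n, n:]
    Q22 = Q[n:, n:]
    return Q12.conj().T @ A, Q22.conj().T @ B
```

Repeating the step drives the implicit quotient toward a spectral projector, from which the invariant or deflating subspaces are extracted; that extraction (and the convergence test) is where the paper's refinements live.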
Design of a Parallel Nonsymmetric Eigenroutine Toolbox, Part I, 1993
Abstract

Cited by 65 (12 self)
The dense nonsymmetric eigenproblem is one of the hardest linear algebra problems to solve effectively on massively parallel machines. Rather than trying to design a "black box" eigenroutine in the spirit of EISPACK or LAPACK, we propose building a toolbox for this problem. The tools are meant to be used in different combinations on different problems and architectures. In this paper, we describe these tools, which include basic block matrix computations, the matrix sign function, two-dimensional bisection, and spectral divide and conquer using the matrix sign function to find selected eigenvalues. We also outline how we deal with ill-conditioning and potential instability. Numerical examples are included. A future paper will discuss error analysis in detail and extensions to the generalized eigenproblem.
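One of the tools named here, the matrix sign function, is computable by the Newton iteration X ← (X + X⁻¹)/2, and (I − sign(A))/2 is then the spectral projector for the left-half-plane eigenvalues. A minimal sketch; the iteration cap and tolerance are my choices, and a production version would add scaling and breakdown checks:

```python
import numpy as np

def matrix_sign(A, iters=50, tol=1e-12):
    """Newton iteration for the matrix sign function. Requires A to have
    no purely imaginary eigenvalues; converges quadratically near the
    limit, where sign(A) has eigenvalues +/-1 on A's invariant subspaces."""
    X = A.astype(complex)
    for _ in range(iters):
        Xn = 0.5 * (X + np.linalg.inv(X))
        if np.linalg.norm(Xn - X, 1) <= tol * np.linalg.norm(Xn, 1):
            return Xn
        X = Xn
    return X
```

For example, `matrix_sign(np.diag([-2.0, -1.0, 3.0]))` is (numerically) `diag(-1, -1, 1)`, so `(I - sign(A)) / 2` projects onto the two stable eigendirections.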
A Parallel Implementation of the Nonsymmetric QR Algorithm for Distributed Memory Architectures, SIAM J. Sci. Comput., 2002
Abstract

Cited by 37 (3 self)
One approach to solving the nonsymmetric eigenvalue problem in parallel is to parallelize the QR algorithm. Not long ago, this was widely considered a hopeless task. Recent efforts have led to significant advances, although the methods proposed up to now have suffered from scalability problems. This paper discusses an approach to parallelizing the QR algorithm that greatly improves scalability. A theoretical analysis indicates that the algorithm is ultimately not scalable, but the nonscalability does not become evident until the matrix dimension is enormous. Experiments on the Intel Paragon system, the IBM SP2 supercomputer, the SGI Origin 2000, and the Intel ASCI Option Red supercomputer are reported.
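The serial kernel being parallelized is the QR iteration on a Hessenberg form. The production algorithm adds shifts, deflation, and multishift bulge chasing, but the unshifted skeleton below (a toy sketch, my naming) shows the factor-multiply loop whose data movement has to be distributed:

```python
import numpy as np
from scipy.linalg import hessenberg

def qr_iteration(A, iters=200):
    """Unshifted QR iteration: reduce A to Hessenberg form, then repeat
    H = R Q from the factorization H = Q R. For eigenvalues of distinct
    moduli, H approaches upper triangular form with the eigenvalues on
    its diagonal (shifted, deflating variants converge far faster)."""
    H = hessenberg(A)
    for _ in range(iters):
        Q, R = np.linalg.qr(H)
        H = R @ Q
    return H
```

Each sweep is a similarity transform (R Q = Qᵀ H Q), which is why the eigenvalues are preserved while the subdiagonal decays.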
The spectral decomposition of nonsymmetric matrices on distributed memory parallel computers, SIAM J. Sci. Comput., 1997
Abstract

Cited by 34 (10 self)
The implementation and performance of a class of divide-and-conquer algorithms for computing the spectral decomposition of nonsymmetric matrices on distributed memory parallel computers are studied in this paper. After presenting a general framework, we focus on a spectral divide-and-conquer (SDC) algorithm with Newton iteration. Although the algorithm requires several times as many floating point operations as the best serial QR algorithm, it can be constructed simply from a small set of highly parallelizable matrix building blocks within the Level 3 basic linear algebra subroutines (BLAS). Efficient implementations of these building blocks are available on a wide range of machines. In some ill-conditioned cases, the algorithm may lose numerical stability, but this can easily be detected and compensated for. The algorithm reached 31% efficiency with respect to the underlying PUMMA matrix multiplication and 82% efficiency with respect to the underlying ScaLAPACK matrix inversion on a 256-processor Intel Touchstone Delta system, and 41% efficiency with respect to the matrix multiplication in CMSSL on a 32-node Thinking Machines CM-5 with vector units. Our performance model predicts the performance reasonably accurately. To take advantage of the geometric nature of SDC algorithms, we have designed a graphical user interface that lets the user choose the spectral decomposition according to specified regions in the complex plane.
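A sign-function SDC step can indeed be assembled from exactly the building blocks named: matrix multiplication, inversion, and QR. Compute sign(A) by Newton iteration, form the spectral projector, and take a rank-revealing QR of it to obtain an orthonormal basis for the stable invariant subspace. A sketch under my naming, without the instability detection and compensation the paper describes:

```python
import numpy as np
from scipy.linalg import qr

def split_spectrum(A, iters=60):
    """Split the spectrum of A at the imaginary axis. Returns an
    orthonormal basis Q1 for the invariant subspace of the left-half-plane
    eigenvalues and its dimension k. Building blocks: matmul, inv, QR."""
    X = A.astype(complex)
    for _ in range(iters):                    # Newton for sign(A)
        X = 0.5 * (X + np.linalg.inv(X))
    P = 0.5 * (np.eye(A.shape[0]) - X)        # spectral projector
    k = int(round(np.real(np.trace(P))))      # subspace dimension
    Q, _, _ = qr(P, pivoting=True)            # rank-revealing QR of P
    return Q[:, :k], k
```

Dividing then proceeds recursively on the two k-by-k and (n−k)-by-(n−k) restrictions; shifting and scaling A moves the splitting line to other regions of the complex plane, which is what the paper's region-selection interface exposes.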
Trading off Parallelism and Numerical Stability, 1992
Abstract

Cited by 14 (5 self)
The fastest parallel algorithm for a problem may be significantly less stable numerically than the fastest serial algorithm. We illustrate this phenomenon by a series of examples drawn from numerical linear algebra. We also show how some of these instabilities may be mitigated by better floating point arithmetic.
Numerical methods for palindromic eigenvalue problems: Computing the anti-triangular Schur form, Numer. Linear Algebra Appl.
Abstract

Cited by 14 (1 self)
We present structure-preserving numerical methods for the eigenvalue problem of complex palindromic pencils. Such problems arise in control theory, as well as from palindromic linearizations of higher-degree palindromic matrix polynomials. A key ingredient of these methods is the development of an appropriate condensed form, the anti-triangular Schur form. Ill-conditioned problems with eigenvalues near the unit circle, in particular near ±1, are discussed. We show how a combination of unstructured methods followed by a structured refinement can be used to solve such problems accurately.
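The symmetry the condensed form preserves is visible already at the eigenvalue level: for a T-palindromic pencil Z + λZᵀ, if λ is an eigenvalue then so is 1/λ. A two-by-two check (the example matrix is mine):

```python
import numpy as np
from scipy.linalg import eig

# (Z + lambda Z^T) x = 0  is the generalized problem  Z x = lambda (-Z^T) x
Z = np.array([[0.0, 1.0],
              [2.0, 0.0]])
mu = eig(Z, -Z.T, right=False)
# det(Z + lambda Z^T) = -(1 + 2 lambda)(2 + lambda): roots -1/2 and -2,
# a reciprocal pair, as the palindromic structure demands.
assert np.isclose(np.prod(mu), 1.0)
assert np.allclose(np.sort(mu.real), [-2.0, -0.5], atol=1e-10)
```

The eigenvalues near ±1 that the abstract singles out are exactly the self-paired points of this λ ↔ 1/λ symmetry, which is why they are the delicate case.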
A Jacobi-Like Method for Solving Algebraic Riccati Equations on Parallel Computers
Abstract

Cited by 8 (0 self)
An algorithm to solve continuous-time algebraic Riccati equations through the Hamiltonian Schur form is developed. It is an adaptation for Hamiltonian matrices of an unsymmetric Jacobi method of Eberlein [15]. It uses unitary symplectic similarity transformations and preserves the Hamiltonian structure of the matrix. Each iteration step needs only local information about the current matrix, thus admitting efficient parallel implementations on certain parallel architectures. Convergence performance of the algorithm is compared with the Hamiltonian-Jacobi algorithm of Byers [12]. The numerical experiments suggest that the method presented here converges considerably faster for non-Hermitian Hamiltonian matrices than Byers' Hamiltonian-Jacobi algorithm. They also suggest that the number of iterations needed for convergence can be predicted by a simple function of the matrix size. Key words. Jacobi-like method, Hamiltonian matri...
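The structure at stake: H is Hamiltonian exactly when JH is symmetric, with J the standard symplectic form, and symplectic-orthogonal similarities (the rotations a Jacobi-like sweep applies) keep it that way. A small structure check in the real case; all names are mine:

```python
import numpy as np

def is_hamiltonian(H):
    """H is Hamiltonian iff J @ H is symmetric, J = [[0, I], [-I, 0]]."""
    n = H.shape[0] // 2
    J = np.block([[np.zeros((n, n)), np.eye(n)],
                  [-np.eye(n), np.zeros((n, n))]])
    return np.allclose(J @ H, (J @ H).T)

rng = np.random.default_rng(5)
A = rng.standard_normal((3, 3))
G = rng.standard_normal((3, 3)); G = G + G.T   # symmetric
Q = rng.standard_normal((3, 3)); Q = Q + Q.T   # symmetric
H = np.block([[A, G], [Q, -A.T]])              # Hamiltonian by construction

# A symplectic-orthogonal rotation preserves the structure:
c, s = np.cos(0.3), np.sin(0.3)
U = np.block([[c * np.eye(3), s * np.eye(3)],
              [-s * np.eye(3), c * np.eye(3)]])
assert is_hamiltonian(H) and is_hamiltonian(U.T @ H @ U)
```

Because the structure survives every sweep, eigenvalue symmetries (λ, −λ) are enforced exactly throughout the iteration rather than recovered only at convergence, which is the point of working with symplectic transformations.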