Incomplete Cholesky Factorizations With Limited Memory
SIAM J. Sci. Comput., 1999
Cited by 23 (4 self)
We propose an incomplete Cholesky factorization for the solution of large-scale trust region subproblems and positive definite systems of linear equations. This factorization depends on a parameter p that specifies the amount of additional memory (in multiples of n, the dimension of the problem) that is available; there is no need to specify a drop tolerance. Our numerical results show that the number of conjugate gradient iterations and the computing time are reduced dramatically for small values of p. We also show that in contrast with drop tolerance strategies, the new approach is more stable in terms of number of iterations and memory requirements.
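The memory rule described in this abstract can be illustrated with a short Python sketch. It rests on our own reading of the abstract and is not the authors' code. Column j keeps its diagonal plus the n_j + p largest-magnitude off-diagonal entries, where n_j counts the nonzeros below the diagonal in column j of A. This bounds total storage by the lower triangle of A plus p·n entries, and no drop tolerance is needed. On a nonpositive pivot, the sketch restarts with a diagonal shift alpha that doubles on each retry. The function name, `alpha_init`, the shift schedule and the dense input are hypothetical choices made for clarity.

```python
import math


def icf_limited_memory(A, p, alpha_init=1e-3):
    """Sketch of a limited-memory incomplete Cholesky factorization.

    Computes lower-triangular L with L L^T ~ A + alpha*I, keeping in column j
    the diagonal plus the n_j + p largest off-diagonal entries, where n_j is
    the number of nonzeros below the diagonal in column j of A.
    A is a dense symmetric matrix (list of lists), used only for clarity.
    alpha_init and the doubling shift schedule are illustrative assumptions.
    Returns (L, alpha); L[j] is a dict {row i >= j: value} for column j.
    """
    n = len(A)
    alpha = 0.0
    while True:
        L = [dict() for _ in range(n)]
        breakdown = False
        for j in range(n):
            # Left-looking update: col[i] = a_ij - sum_{k<j} l_ik * l_jk.
            col = {}
            for i in range(j, n):
                s = A[i][j] + (alpha if i == j else 0.0)
                for k in range(j):
                    lik, ljk = L[k].get(i), L[k].get(j)
                    if lik is not None and ljk is not None:
                        s -= lik * ljk
                col[i] = s
            if col[j] <= 0.0:
                breakdown = True  # nonpositive pivot: retry with a larger shift
                break
            ljj = math.sqrt(col[j])
            L[j][j] = ljj
            # Memory budget for column j: n_j + p off-diagonal entries.
            n_j = sum(1 for i in range(j + 1, n) if A[i][j] != 0.0)
            cand = [(i, col[i] / ljj) for i in range(j + 1, n) if col[i] != 0.0]
            cand.sort(key=lambda t: abs(t[1]), reverse=True)
            for i, v in cand[: n_j + p]:
                L[j][i] = v
        if not breakdown:
            return L, alpha
        alpha = max(2.0 * alpha, alpha_init)
```

For a tridiagonal matrix the exact Cholesky factor has no fill, so `icf_limited_memory(A, p=0)` reproduces it exactly. Larger `p` lets each column retain more of the fill that the exact factorization would create.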
A Domain Decomposition Preconditioner for a Parallel Finite Element Solver on Distributed Unstructured Grids
Parallel Computing, 1995
Cited by 13 (9 self)
We consider a number of practical issues associated with the parallel distributed memory solution of elliptic partial differential equations using unstructured meshes in two dimensions. The first part of the paper describes a parallel mesh generation algorithm designed both for efficiency and to produce a well-partitioned, distributed mesh suitable for the efficient parallel solution of an elliptic p.d.e. The second part of the paper concentrates on parallel domain decomposition preconditioning for the linear algebra problems that arise when solving such a p.d.e. on the unstructured meshes that we generate. It is demonstrated that by allowing the mesh generator and the p.d.e. solver to share a certain coarse grid structure we are able to obtain efficient parallel solutions to a number of large problems. Although the work is presented here in a finite element context, the issues of mesh generation and domain decomposition are not, of course, strictly dependent upon this particu...
A Parallel Preconditioned Conjugate Gradient Package for Solving Sparse Linear Systems on a Cray Y-MP
Appl. Numer. Math., 1991
Cited by 13 (0 self)
In this paper we discuss current activities at Cray Research to develop general-purpose, production-quality software for the efficient solution of sparse linear systems.
The Multigrid Preconditioned Conjugate Gradient Method
1993
Cited by 13 (2 self)
This paper considers an efficient preconditioner and proposes a multigrid preconditioned conjugate gradient method (MGCG method), which is the conjugate gradient method with the multigrid method as a preconditioner. The combination of the multigrid method and the conjugate gradient method has been considered before. Kettler and Meijerink [7] and Kettler [8] treated the multigrid method as a preconditioner of the conjugate gradient method. However, this paper formulates the MGCG method more generally than these works and studies the requirements on the multigrid preconditioner. On the other hand, Bank and Douglas [2] treated the conjugate gradient method as a relaxation method within the multigrid method. Braess [3] considered both combinations and reported that the conjugate gradient method with multigrid preconditioning is effective for elasticity problems. We study the requirements for a valid multigrid preconditioner and evaluate this preconditioner through numerical experiments and eigenvalue analysis. In particular, eigenvalue analysis is a more direct and more reasonable criterion than the convergence rate, since the number of conjugate gradient iterations until convergence depends on the eigenvalue distribution of the preconditioned matrix. In Sections 2 and 3, the preconditioned conjugate gradient method and the multigrid method, which are the basis of this paper, are briefly explained. Section 4 discusses the requirements for a valid two-grid preconditioner for the conjugate gradient method. Section 5 then extends these to the requirements for the multigrid preconditioner. In Section 7, numerical experiments show that the MGCG method converges in very few iterations even for ill-conditioned problems. In Section 8, eigenvalue analysis is performed, and it is realize...
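The MGCG idea can be sketched in Python under several assumptions that are ours, not the paper's. The model problem is the 1D Poisson matrix A = tridiag(-1, 2, -1). The preconditioner is a single two-grid cycle with damped Jacobi smoothing (ω = 2/3), linear interpolation, full-weighting restriction and an exact solve with the Galerkin coarse operator. These choices make the preconditioner symmetric positive definite, which CG needs. Pre- and post-smoothing are identical, and restriction is a scaled transpose of prolongation. All function names are hypothetical.

```python
import math


def apply_A(x):
    """Model problem (our assumption): A = tridiag(-1, 2, -1), Dirichlet ends."""
    n = len(x)
    return [2.0 * x[i] - (x[i - 1] if i > 0 else 0.0)
            - (x[i + 1] if i < n - 1 else 0.0) for i in range(n)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def prolong(ec, n):
    """Linear interpolation P from nc coarse points to n = 2*nc + 1 fine points."""
    x = [0.0] * n
    for i, v in enumerate(ec):
        x[2 * i] += 0.5 * v
        x[2 * i + 1] += v
        x[2 * i + 2] += 0.5 * v
    return x


def restrict(r):
    """Full weighting R = P^T / 2, so the coarse correction stays symmetric."""
    nc = (len(r) - 1) // 2
    return [0.25 * r[2 * i] + 0.5 * r[2 * i + 1] + 0.25 * r[2 * i + 2]
            for i in range(nc)]


def solve_coarse(rhs):
    """Thomas algorithm for the Galerkin operator R A P = tridiag(-1, 2, -1) / 4."""
    diag, off, m = 0.5, -0.25, len(rhs)
    c, d = [0.0] * m, [0.0] * m
    c[0], d[0] = off / diag, rhs[0] / diag
    for i in range(1, m):
        piv = diag - off * c[i - 1]
        c[i], d[i] = off / piv, (rhs[i] - off * d[i - 1]) / piv
    x = [0.0] * m
    x[-1] = d[-1]
    for i in range(m - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def two_grid(r, nu=1, omega=2.0 / 3.0):
    """z = B r: one symmetric two-grid cycle from a zero initial guess."""
    n = len(r)

    def smooth(x):  # damped Jacobi with D = 2I
        Ax = apply_A(x)
        return [x[i] + omega * (r[i] - Ax[i]) / 2.0 for i in range(n)]

    x = [0.0] * n
    for _ in range(nu):
        x = smooth(x)
    Ax = apply_A(x)
    ec = solve_coarse(restrict([r[i] - Ax[i] for i in range(n)]))
    x = [a + b for a, b in zip(x, prolong(ec, n))]
    for _ in range(nu):
        x = smooth(x)
    return x


def pcg(b, precond, tol=1e-10, maxit=500):
    """Preconditioned CG for A x = b; returns (x, iterations)."""
    x, r = [0.0] * len(b), list(b)
    z = precond(r)
    p, rz = list(z), dot(r, z)
    bnorm = math.sqrt(dot(b, b))
    for k in range(1, maxit + 1):
        Ap = apply_A(p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, Ap)]
        if math.sqrt(dot(r, r)) <= tol * bnorm:
            return x, k
        z = precond(r)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, maxit
```

Calling `pcg(b, lambda r: list(r))` runs plain CG, and `pcg(b, two_grid)` runs the two-grid-preconditioned variant. On this model problem with n = 127 the preconditioned run needs far fewer iterations, consistent with the eigenvalue argument in the abstract.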
SOR as a Preconditioner
Appl. Numer. Math., 1995
Cited by 12 (1 self)
Introduction. It is well known (see, e.g., [2] and [4]) that using red/black or multicolor orderings to parallelize SSOR or ILU preconditioning may seriously degrade the rate of convergence of the conjugate gradient method compared with the natural ordering. The SOR iteration itself, however, does not suffer this degradation. Indeed, if the coefficient matrix has property A and is consistently ordered, the asymptotic rates of convergence of the natural and red/black orderings are identical (Young [9]); moreover, in practice one quite often sees faster convergence in the red/black ordering than in the natural ordering. This suggests the possible use of SOR as a parallel preconditioner. It cannot be a preconditioner for the conjugate gradient method on symmetric positive definite systems, since the corresponding preconditioned matrix is not symmetric. But this restriction does not apply to nonsymmetric systems and conjugate-gradient-type methods such as GMRES
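Both observations in this introduction can be checked on a small model problem. The Python sketch below assumes the 1D Laplacian A = tridiag(-1, 2, -1) in red/black ordering, with even indices red and odd indices black. It applies the SOR preconditioner z = M^{-1} r with M = D/ω + L, which amounts to one forward SOR sweep from a zero initial guess. Within each color phase no point depends on another point of the same color, so each phase could run in parallel. M is lower triangular in this ordering and therefore not symmetric, which is why it cannot precondition CG. The helper names are hypothetical, and the GMRES outer iteration is not shown.

```python
def sor_rb_apply(r, omega=1.0):
    """Return z = M^{-1} r for the red/black-ordered 1D Laplacian, M = D/omega + L.

    Hypothetical helper: D = 2I, and L holds the couplings from each black
    (odd) point to its red (even) neighbours.
    """
    n = len(r)
    z = [0.0] * n
    # Red phase: all neighbours of a red point are black and come later in the
    # ordering, so red rows of M are purely diagonal. Fully parallel.
    for i in range(0, n, 2):
        z[i] = omega * r[i] / 2.0
    # Black phase: each black point depends only on red values computed above.
    for i in range(1, n, 2):
        s = r[i] + z[i - 1] + (z[i + 1] if i + 1 < n else 0.0)
        z[i] = omega * s / 2.0
    return z


def sor_rb_matvec(z, omega=1.0):
    """Return M z, used to check that sor_rb_apply really inverts M."""
    n = len(z)
    out = [2.0 * zi / omega for zi in z]
    for i in range(1, n, 2):
        out[i] -= z[i - 1] + (z[i + 1] if i + 1 < n else 0.0)
    return out
```

With n = 3 and ω = 1, entry (1, 0) of M^{-1} is 0.25 while entry (0, 1) is 0, so M^{-1} (and hence M^{-1}A) is not symmetric.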
Crout versions of ILU for general sparse matrices
2002
Cited by 12 (6 self)
This paper presents an efficient implementation of incomplete LU (ILU) factorizations that are derived from the Crout version of Gaussian elimination (GE). At step k of the elimination, the k-th row of U and the k-th column of L are computed using previously computed rows of U and columns of L. The data structure and implementation borrow from already known techniques used in developing both sparse direct solution codes and incomplete Cholesky factorizations. It is shown that this version of ILU has many practical advantages. In particular, its data structure allows efficient implementation of more rigorous and effective dropping strategies. Numerical tests show that the method is far more efficient than standard threshold-based ILU factorizations computed row-wise or column-wise.
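The Crout ordering can be sketched in Python. At step k the code forms row k of U and column k of L from previously computed rows of U and columns of L, then applies dropping. Several parts are our own simplifications: the dense input, the dict-based storage, and the plain threshold rule, which drops an entry smaller than `tau` times the 2-norm of the corresponding row or column of A. The paper's data structures and its more rigorous dropping strategies are not reproduced, and the function name is hypothetical.

```python
import math


def iluc(A, tau):
    """Crout-style ILU sketch: A ~ L U with unit lower-triangular L.

    A: square matrix as a list of lists (dense input, for clarity only).
    tau: drop tolerance (simple threshold rule, our assumption).
    Returns L (column dicts holding rows > k) and U (row dicts holding cols >= k).
    """
    n = len(A)
    L = [dict() for _ in range(n)]
    U = [dict() for _ in range(n)]
    for k in range(n):
        # Row k of U: u_kj = a_kj - sum_{i<k} l_ki * u_ij, for j >= k.
        row = {j: A[k][j] for j in range(k, n) if A[k][j] != 0.0}
        for i in range(k):
            lki = L[i].get(k, 0.0)
            if lki != 0.0:
                for j, uij in U[i].items():
                    if j >= k:
                        row[j] = row.get(j, 0.0) - lki * uij
        # Column k of L before scaling: a_ik - sum_{j<k} l_ij * u_jk, for i > k.
        col = {i: A[i][k] for i in range(k + 1, n) if A[i][k] != 0.0}
        for j in range(k):
            ujk = U[j].get(k, 0.0)
            if ujk != 0.0:
                for i, lij in L[j].items():
                    if i > k:
                        col[i] = col.get(i, 0.0) - lij * ujk
        ukk = row.get(k, 0.0)
        if ukk == 0.0:
            raise ZeroDivisionError(f"zero pivot at step {k}")
        rnorm = math.sqrt(sum(v * v for v in A[k]))
        cnorm = math.sqrt(sum(A[i][k] ** 2 for i in range(n)))
        # Threshold dropping; the diagonal of U is always kept.
        U[k] = {j: v for j, v in row.items() if j == k or abs(v) >= tau * rnorm}
        L[k] = {i: v / ukk for i, v in col.items() if abs(v) >= tau * cnorm}
    return L, U
```

With `tau = 0` nothing is dropped, and the result is the exact LU factorization whenever it exists without pivoting, which makes a convenient correctness check. The linear scans over all earlier rows and columns are the kind of cost that a production implementation would avoid with more careful bookkeeping.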
A Parallel-Vector Algorithm for Rapid Structural Analysis on High-Performance Computers
1990
Cited by 10 (3 self)
A fast, accurate Choleski method for the solution of symmetric systems of linear equations is presented. This direct method is based on a variable-band storage scheme and takes advantage of column heights to reduce the number of operations in the Choleski factorization. The method employs parallel computation in the outermost DO-loop and vector computation via the "loop unrolling" technique in the innermost DO-loop. The method avoids computations with zeros outside the column heights and, as an option, with zeros inside the band. The close relationship between the Choleski and Gauss elimination methods is examined. The minor changes required to convert the Choleski code to a Gauss code to solve non-positive-definite symmetric systems of equations are identified. The results for two large-scale structural analyses performed on supercomputers demonstrate the accuracy and speed of the method. Nomenclature: e_a, error norm for solution residuals; e_s, strain energy error norm; {f}, load vector; hpm har...
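The variable-band (skyline) idea can be sketched serially in Python. Each column j of the upper triangle is stored from its first nonzero row, the top of the column height, down to the diagonal. Every inner product starts at the lower of the two column tops, so no work is done on zeros above the column heights. The names `from_dense`, `skyline_cholesky` and `skyline_solve` are hypothetical. The paper's parallel outer loop, its loop unrolling and its option to skip zeros inside the band are not shown; zeros inside the band are stored and processed here.

```python
import math


def from_dense(A):
    """Return (cols, first): cols[j][i - first[j]] = a_ij for first[j] <= i <= j."""
    n = len(A)
    first = [next(i for i in range(j + 1) if A[i][j] != 0.0 or i == j)
             for j in range(n)]
    cols = [[A[i][j] for i in range(first[j], j + 1)] for j in range(n)]
    return cols, first


def skyline_cholesky(cols, first):
    """Column-wise Cholesky A = U^T U inside the skyline profile."""
    n = len(cols)
    U = [list(c) for c in cols]
    for j in range(n):
        fj = first[j]
        for i in range(fj, j + 1):
            fi = first[i]
            k0 = max(fi, fj)  # above k0, column i or column j is zero: skip
            s = U[j][i - fj]
            for k in range(k0, i):
                s -= U[i][k - fi] * U[j][k - fj]
            if i < j:
                U[j][i - fj] = s / U[i][i - fi]
            else:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                U[j][j - fj] = math.sqrt(s)
    return U


def skyline_solve(U, first, b):
    """Solve U^T U x = b by forward then backward substitution."""
    n = len(U)
    y = list(b)
    for i in range(n):  # U^T y = b; column i of U is row i of U^T
        fi = first[i]
        s = y[i]
        for k in range(fi, i):
            s -= U[i][k - fi] * y[k]
        y[i] = s / U[i][i - fi]
    x = y
    for j in range(n - 1, -1, -1):  # U x = y, column-oriented
        fj = first[j]
        x[j] /= U[j][j - fj]
        for k in range(fj, j):
            x[k] -= U[j][k - fj] * x[j]
    return x
```

Fill can occur only between a column's top and the diagonal. This is why the column heights fix the storage and operation count before the factorization starts.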
A new class of asynchronous iterative algorithms with order intervals
Mathematics of Computation, 1998
Cited by 10 (2 self)
This paper deals with a new class of parallel asynchronous iterative algorithms for the solution of nonlinear systems of equations. The main feature of the new class of methods presented here is the possibility of flexible communication between processors. In particular, partial updates can be exchanged. Approximation of the associated fixed-point mapping is also considered. A detailed convergence study is presented. A connection with the Schwarz alternating method is made for the solution of nonlinear boundary value problems. Computational results on an IBM 3090 shared-memory multiprocessor are briefly presented.
The Design and Analysis of Bulk-Synchronous Parallel Algorithms
1998
Cited by 9 (1 self)
The model of bulk-synchronous parallel (BSP) computation is an emerging paradigm of general-purpose parallel computing. This thesis presents a systematic approach to the design and analysis of BSP algorithms. We introduce an extension of the BSP model, called BSPRAM, which reconciles shared-memory style programming with efficient exploitation of data locality. The BSPRAM model can be optimally simulated by a BSP computer for a broad range of algorithms possessing certain characteristic properties: obliviousness, slackness, granularity. We use BSPRAM to design BSP algorithms for problems from three large, partially overlapping domains: combinatorial computation, dense matrix computation, graph computation. Some of the presented algorithms are adapted from known BSP algorithms (butterfly dag computation, cube dag computation, matrix multiplication). Other algorithms are obtained by application of established non-BSP techniques (sorting, randomised list contraction, Gaussian elimination without pivoting and with column pivoting, algebraic path computation), or use original techniques specific to the BSP model (deterministic list contraction, Gaussian elimination with nested block pivoting, communication-efficient multiplication of Boolean matrices, synchronisation-efficient shortest paths computation). The asymptotic BSP cost of each algorithm is established, along with its BSPRAM characteristics. We conclude by outlining some directions for future research.
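For readers new to the model, the cost accounting behind "the asymptotic BSP cost of each algorithm" is usually written as follows. This is the standard BSP cost model, given as general background and not taken from this thesis, whose notation may differ. A BSP computation is a sequence of S supersteps, each consisting of local computation, communication and a barrier synchronisation:

```latex
T_{\text{superstep}} = w + h\,g + l,
\qquad
T_{\text{total}} = \sum_{s=1}^{S} \left( w_s + h_s\,g + l \right) = W + H\,g + S\,l,
\quad W = \sum_{s=1}^{S} w_s,\; H = \sum_{s=1}^{S} h_s
```

Here w_s is the largest amount of local computation on any processor in superstep s, and h_s is the largest number of words any processor sends or receives in that superstep. The parameter g is the communication cost per word and l is the cost of a barrier synchronisation. An asymptotic BSP cost is then a bound on W, H and S as functions of the problem size and the number of processors.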
A New Hessian Preconditioning Method Applied to Variational Data Assimilation Experiments Using NASA General Circulation Models
1996
Cited by 9 (6 self)
An analysis is provided to show that the method of Courtier et al. for estimating the Hessian preconditioning is not applicable to important categories of cases involving nonlinearity. An extension of the method to cases with higher nonlinearity is proposed in the present paper by designing an algorithm that reduces errors in Hessian estimation induced by lack of validity of the tangent linear approximation. The new preconditioning method was numerically tested in the framework of variational data assimilation experiments using both the National Aeronautics and Space Administration (NASA) semi-Lagrangian semi-implicit global shallow-water equations model and the adiabatic version of the NASA/Data Assimilation Office (DAO) Goddard Earth Observing System Version 1 (GEOS-1) general circulation model. The authors' results show that the new preconditioning method speeds up the convergence of the minimization when applied to variational data assimilation cases characterized by strong nonlinearity.

