Results 1–10 of 129
Preconditioning techniques for large linear systems: A survey
J. Comput. Phys., 2002
"... This article surveys preconditioning techniques for the iterative solution of large linear systems, with a focus on algebraic methods suitable for general sparse matrices. Covered topics include progress in incomplete factorization methods, sparse approximate inverses, reorderings, parallelization i ..."
Abstract

Cited by 192 (5 self)
This article surveys preconditioning techniques for the iterative solution of large linear systems, with a focus on algebraic methods suitable for general sparse matrices. Covered topics include progress in incomplete factorization methods, sparse approximate inverses, reorderings, parallelization issues, and block and multilevel extensions. Some of the challenges ahead are also discussed. An extensive bibliography completes the paper.
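As a concrete illustration of the simplest technique in the algebraic family the survey covers, the sketch below applies diagonal (Jacobi) preconditioning inside the conjugate gradient method on a small tridiagonal SPD system. This is a minimal pure-Python sketch; the list-of-lists matrix format and the test problem are illustrative choices, not from the survey.

```python
# Minimal sketch of preconditioned conjugate gradients (PCG) with the
# simplest algebraic preconditioner: M = diag(A), i.e. Jacobi scaling.
# Matrices are plain lists of lists so the example is self-contained.

def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def pcg(A, b, tol=1e-10, maxit=100):
    n = len(b)
    minv = [1.0 / A[i][i] for i in range(n)]   # M^{-1} = diag(A)^{-1}
    x = [0.0] * n
    r = b[:]                                   # r = b - A*0 = b
    z = [mi * ri for mi, ri in zip(minv, r)]   # preconditioned residual
    p = z[:]
    rz = dot(r, z)
    for it in range(maxit):
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        if dot(r, r) ** 0.5 < tol:
            return x, it + 1
        z = [mi * ri for mi, ri in zip(minv, r)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, maxit

# Tridiagonal SPD test system (1D Poisson-like)
n = 8
A = [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0 for j in range(n)]
     for i in range(n)]
b = [1.0] * n
x, iters = pcg(A, b)
residual = max(abs(bi - axi) for bi, axi in zip(b, matvec(A, x)))
```

Diagonal scaling is the weakest member of the family; the incomplete factorizations and sparse approximate inverses the survey discusses replace `minv` with a richer approximation of A^{-1} while keeping the same PCG skeleton.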
hypre: a Library of High Performance Preconditioners
Lecture Notes in Computer Science, 2002
"... hypre is a software library for the solution of large, sparse linear systems on massively parallel computers. Its emphasis is on modern powerful and scalable preconditioners. hypre provides various conceptual interfaces to enable application users to access the library in the way they naturally ..."
Abstract

Cited by 90 (5 self)
hypre is a software library for the solution of large, sparse linear systems on massively parallel computers. Its emphasis is on modern, powerful, and scalable preconditioners. hypre provides various conceptual interfaces to enable application users to access the library in the way they naturally think about their problems. This paper presents the conceptual interfaces in hypre. An overview of the preconditioners that are available in hypre is given, including some numerical results that show the efficiency of the library.
Parallel multigrid smoothing: polynomial versus Gauss-Seidel
J. Comput. Phys., 2003
"... Abstract. GaussSeidel method is often the smoother of choice within multigrid applications. In the context of unstructured meshes, however, maintaining good parallel efficiency is difficult with multiplicative iterative methods such as GaussSeidel. This leads us to consider alternative smoothers. ..."
Abstract

Cited by 46 (13 self)
Abstract. The Gauss-Seidel method is often the smoother of choice within multigrid applications. In the context of unstructured meshes, however, maintaining good parallel efficiency is difficult with multiplicative iterative methods such as Gauss-Seidel. This leads us to consider alternative smoothers. We discuss the computational advantages of polynomial smoothers within parallel multigrid algorithms for symmetric positive definite systems. Two particular polynomials are considered: Chebyshev and a multilevel-specific polynomial. The advantages of polynomial smoothing over traditional smoothers such as Gauss-Seidel are illustrated on several applications: Poisson’s equation, thin-body elasticity, and eddy current approximations to Maxwell’s equations. While parallelizing the Gauss-Seidel method typically involves a compromise between a scalable convergence rate and maintaining high flop rates, polynomial smoothers achieve parallel scalable multigrid convergence rates without sacrificing flop rates. We show that, although parallel computers are the main motivation, polynomial smoothers are often surprisingly competitive with Gauss-Seidel smoothers on serial machines.
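To make the contrast concrete, here is a minimal sketch (not the paper's code): one multiplicative Gauss-Seidel sweep, whose updates are inherently sequential, next to a damped-Jacobi sweep, which can be read as the degree-1 case of a polynomial smoother and consists only of parallel-friendly matrix-vector work. The 1D Poisson matrix, damping factor, and sweep count are illustrative assumptions.

```python
# Gauss-Seidel vs. a degree-1 "polynomial" smoother (damped Jacobi) on a
# tiny 1D Poisson system with exact solution x = 0, starting from all ones.

def poisson1d(n):
    return [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0
             for j in range(n)] for i in range(n)]

def gauss_seidel_sweep(A, x, b):
    n = len(b)
    for i in range(n):            # each update uses fresh values: sequential
        s = sum(A[i][j] * x[j] for j in range(n) if j != i)
        x[i] = (b[i] - s) / A[i][i]
    return x

def damped_jacobi_sweep(A, x, b, omega=2.0 / 3.0):
    n = len(b)
    r = [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
    # all updates independent: one matvec plus a diagonal scaling
    return [x[i] + omega * r[i] / A[i][i] for i in range(n)]

def err(x):
    return max(abs(v) for v in x)

n = 8
A = poisson1d(n)
b = [0.0] * n
x_gs = [1.0] * n
x_pj = [1.0] * n
for _ in range(6):
    x_gs = gauss_seidel_sweep(A, x_gs, b)
    x_pj = damped_jacobi_sweep(A, x_pj, b)
```

Per sweep, Gauss-Seidel damps the error faster, which is the trade-off the abstract describes: the polynomial smoother gives up some per-sweep convergence in exchange for updates that parallelize without coloring or ordering constraints.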
The design and implementation of hypre, a library of parallel high performance preconditioners
Numerical Solution of Partial Differential Equations on Parallel Computers, Lect. Notes Comput. Sci. Eng., 2006
"... Summary. The hypre software library provides high performance preconditioners and solvers for the solution of large, sparse linear systems on massively parallel computers. One of its attractive features is the provision of conceptual interfaces. These interfaces give application users a more natura ..."
Abstract

Cited by 37 (2 self)
Summary. The hypre software library provides high performance preconditioners and solvers for the solution of large, sparse linear systems on massively parallel computers. One of its attractive features is the provision of conceptual interfaces. These interfaces give application users a more natural means for describing their linear systems, and provide access to methods such as geometric multigrid which require additional information beyond just the matrix. This chapter discusses the design of the conceptual interfaces in hypre and illustrates their use with various examples. We discuss the data structures and parallel implementation of these interfaces. A brief overview of the solvers and preconditioners available through the interfaces is also given.
Reducing complexity in parallel algebraic multigrid preconditioners
SIAM J. Matrix Anal. Appl., 2006
"... Abstract. Algebraic multigrid (AMG) is a very efficient iterative solver and preconditioner for large unstructured sparse linear systems. Traditional coarsening schemes for AMG can, however, lead to computational complexity growth as problem size increases, resulting in increased memory use and exec ..."
Abstract

Cited by 33 (8 self)
Abstract. Algebraic multigrid (AMG) is a very efficient iterative solver and preconditioner for large unstructured sparse linear systems. Traditional coarsening schemes for AMG can, however, lead to computational complexity growth as problem size increases, resulting in increased memory use and execution time, and diminished scalability. Two new parallel AMG coarsening schemes are proposed that are based solely on enforcing a maximum independent set property, resulting in sparser coarse grids. The new coarsening techniques remedy memory and execution time complexity growth for various large three-dimensional (3D) problems. If used within AMG as a preconditioner for Krylov subspace methods, the resulting iterative methods tend to converge fast. This paper discusses complexity issues that can arise in AMG, describes the new coarsening schemes, and examines the performance of the new preconditioners for various large 3D problems.
Key words: parallel coarsening algorithms, algebraic multigrid, complexities, preconditioners
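The independent-set property at the heart of such coarsening schemes can be sketched in a few lines: greedily pick coarse points so that no two are adjacent in the matrix graph, and every fine point has a coarse neighbour. The graph, the fixed visiting order, and the greedy rule below are illustrative stand-ins, not the paper's actual parallel algorithms.

```python
# Greedy maximal independent set on the graph of a matrix: a toy version of
# the coarse-point selection idea used by independent-set AMG coarsening.

def greedy_mis(adj):
    """adj: dict node -> set of neighbours. Returns a maximal independent set."""
    coarse = set()
    excluded = set()
    for v in sorted(adj):     # fixed order stands in for parallel tie-breaking
        if v not in excluded:
            coarse.add(v)           # v becomes a coarse point
            excluded.add(v)
            excluded |= adj[v]      # its neighbours stay fine
    return coarse

# 4x4 grid graph: nodes 0..15, edges between horizontal/vertical neighbours
adj = {i: set() for i in range(16)}
for i in range(16):
    r, c = divmod(i, 4)
    if c < 3:
        adj[i].add(i + 1); adj[i + 1].add(i)
    if r < 3:
        adj[i].add(i + 4); adj[i + 4].add(i)

C = greedy_mis(adj)
```

On this structured grid the greedy rule recovers the familiar red-black checkerboard; on unstructured graphs the set is sparser than classical coarsenings, which is precisely the complexity reduction the abstract targets.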
Geometric optimization of the evaluation of finite element matrices
SIAM J. Sci. Comput.
"... Abstract. Assembling stiffness matrices represents a significant cost in many finite element computations. We address the question of optimizing the evaluation of these matrices. By finding redundant computations, we are able to significantly reduce the cost of building local stiffness matrices for ..."
Abstract

Cited by 29 (17 self)
Abstract. Assembling stiffness matrices represents a significant cost in many finite element computations. We address the question of optimizing the evaluation of these matrices. By finding redundant computations, we are able to significantly reduce the cost of building local stiffness matrices for the Laplace operator and for the trilinear form for Navier-Stokes. For the Laplace operator in two space dimensions, we have developed a heuristic graph algorithm that searches for such redundancies and generates code for computing the local stiffness matrices. Up to cubics, we are able to build the stiffness matrix on any triangle in less than one multiply-add pair per entry. Up to sixth degree, we can do it in less than about two. Preliminary low-degree results for Poisson and Navier-Stokes operators in three dimensions are also promising.
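For reference, the quantity being optimized above is small but evaluated once per element. A direct (unoptimized) evaluation for the simplest case, linear (P1) elements and the Laplace operator, looks like this; the formula K_ij = area · ∇φ_i · ∇φ_j with constant hat-function gradients is standard, while the function name and layout are illustrative.

```python
# Local stiffness matrix of the Laplace operator on one triangle, P1 elements.
# The gradients of the three barycentric hat functions are constant per
# triangle and expressible via the opposite edge vectors.

def p1_local_stiffness(v0, v1, v2):
    (x0, y0), (x1, y1), (x2, y2) = v0, v1, v2
    det = (x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0)  # 2 * signed area
    area = abs(det) / 2.0
    grads = [((y1 - y2) / det, (x2 - x1) / det),   # grad phi_0
             ((y2 - y0) / det, (x0 - x2) / det),   # grad phi_1
             ((y0 - y1) / det, (x1 - x0) / det)]   # grad phi_2
    return [[area * (gi[0] * gj[0] + gi[1] * gj[1]) for gj in grads]
            for gi in grads]

# unit right triangle as a reference element
K = p1_local_stiffness((0.0, 0.0), (1.0, 0.0), (0.0, 1.0))
```

Even here the redundancy the paper exploits is visible: the matrix is symmetric and its rows sum to zero (constants are in the kernel of the Laplacian), so only a fraction of the entries need independent computation.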
Characterizing the Influence of System Noise on LargeScale Applications by Simulation
In International Conference for High Performance Computing, Networking, Storage and Analysis (SC’10), 2010
"... Abstract—This paper presents an indepth analysis of the impact of system noise on largescale parallel application performance in realistic settings. Our analytical model shows that not only collective operations but also pointtopoint communications influence the application’s sensitivity to nois ..."
Abstract

Cited by 29 (8 self)
Abstract—This paper presents an in-depth analysis of the impact of system noise on large-scale parallel application performance in realistic settings. Our analytical model shows that not only collective operations but also point-to-point communications influence the application’s sensitivity to noise. We present a simulation toolchain that injects noise delays from traces gathered on common large-scale architectures into a LogGPS simulation and allows new insights into the scaling of applications in noisy environments. We investigate collective operations with up to 1 million processes and three applications (Sweep3D, AMG, and POP) with up to 32,000 processes. We show that the scale at which noise becomes a bottleneck is system-specific and depends on the structure of the noise. Simulations with different network speeds show that a 10x faster network does not improve application scalability. We quantify noise and conclude that our tools can be utilized to tune the noise signatures of a specific system.
I. MOTIVATION AND BACKGROUND
The performance impact of operating system and architectural overheads (system noise) at massive scale is increasingly of concern. Even small local delays on compute nodes, which can be caused by interrupts, operating system daemons, or even cache or page misses, can affect global application performance significantly [1]. Such local delays often cause less than 1% overhead per process, but severe performance losses can occur if noise is propagated (amplified) through communication or global synchronization. Previous analyses generally assume that the performance impact of system noise grows at scale, and Tsafrir et al. [2] even suggest that the impact of very low frequency noise scales linearly with the system size.
A. Related Work
Petrini, Kerbyson, and Pakin [1] report that the parallel performance of SAGE on a fixed number of ASCI Q nodes was highest when SAGE used only three of the four CPUs per node. It turned out that “resonance” between the application’s collective communication and the misconfigured system caused delays during each iteration. Jones, Brenner, and Fier
Combining Performance Aspects of Irregular Gauss-Seidel via Sparse Tiling
In 15th Workshop on Languages and Compilers for Parallel Computing (LCPC), 2002
"... Finite Element problems are often solved using multigrid techniques. The most time consuming part of multigrid is the iterative smoother, such as GaussSeidel. To improve performance, iterative smoothers can exploit parallelism, intraiteration data reuse, and interiteration data reuse. Current met ..."
Abstract

Cited by 25 (12 self)
Finite element problems are often solved using multigrid techniques. The most time consuming part of multigrid is the iterative smoother, such as Gauss-Seidel. To improve performance, iterative smoothers can exploit parallelism, intra-iteration data reuse, and inter-iteration data reuse. Current methods for parallelizing Gauss-Seidel on irregular grids, such as multicoloring and owner-computes based techniques, exploit parallelism and possibly intra-iteration data reuse but not inter-iteration data reuse. Sparse tiling techniques were developed to improve intra-iteration and inter-iteration data locality in iterative smoothers. This paper describes how sparse tiling can additionally provide parallelism. Our results show the effectiveness of Gauss-Seidel parallelized with sparse tiling techniques on shared memory machines, specifically compared to owner-computes based Gauss-Seidel methods. The latter employ only parallelism and intra-iteration locality. Our results support the premise that better performance occurs when all three performance aspects (parallelism, intra-iteration, and inter-iteration data locality) are combined.
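The multicoloring baseline the abstract contrasts with sparse tiling can be sketched briefly: partition unknowns into colors with no couplings inside a color, then update each color as one independent batch. Below is a toy two-color (red-black) version on a 1D Poisson matrix; the system and the serial simulation of per-color parallelism are illustrative assumptions.

```python
# Red-black Gauss-Seidel: even and odd unknowns of a tridiagonal system
# are mutually uncoupled, so each colour's updates are independent and the
# two half-sweeps together form one Gauss-Seidel iteration.

def red_black_sweep(A, x, b):
    n = len(b)
    for color in (0, 1):
        updates = {}                        # all same-colour updates computed
        for i in range(color, n, 2):        # from the current x: parallelizable
            s = sum(A[i][j] * x[j] for j in range(n) if j != i)
            updates[i] = (b[i] - s) / A[i][i]
        for i, v in updates.items():        # write-back per colour
            x[i] = v
    return x

n = 8
A = [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0 for j in range(n)]
     for i in range(n)]
b = [1.0] * n
x = [0.0] * n
for _ in range(200):
    x = red_black_sweep(A, x, b)
resid = max(abs(b[i] - sum(A[i][j] * x[j] for j in range(n))) for i in range(n))
```

This exposes parallelism within one sweep but, as the abstract notes, each sweep still streams the whole matrix, so inter-iteration data reuse is lost; sparse tiling reorders work across sweeps to recover it.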
A Distributed Memory Unstructured Gauss-Seidel Algorithm for Multigrid Smoothers
2001
"... GaussSeidel is a popular multigrid smoother as it is provably optimal on structured grids and exhibits superior performance on unstructured grids. GaussSeidel is not used to our knowledge on distributed memory machines as it is not obvious how to parallelize it effectively. We, among others, have ..."
Abstract

Cited by 23 (5 self)
Gauss-Seidel is a popular multigrid smoother as it is provably optimal on structured grids and exhibits superior performance on unstructured grids. Gauss-Seidel is not, to our knowledge, used on distributed memory machines, as it is not obvious how to parallelize it effectively. We, among others, have found that Krylov solvers preconditioned with Jacobi, block Jacobi, or overlapped Schwarz are effective on unstructured problems. Gauss-Seidel does, however, have some attractive properties, namely: fast convergence, no global communication (i.e., no dot products), and fewer flops per iteration, as one can incorporate an initial guess naturally. This paper discusses an algorithm for parallelizing Gauss-Seidel for distributed memory computers for use as a multigrid smoother and compares its performance with preconditioned conjugate gradients on unstructured linear elasticity problems with up to 76 million degrees of freedom.
Coarse grid classification: a parallel coarsening scheme for algebraic multigrid methods
Numerical Linear Algebra with Applications 13(2–3), 2006
"... In this paper we present a new approach to the parallelization of algebraic multigrid (AMG), i.e., to the parallel coarse grid selection in AMG. Our approach does not involve any special treatment of processor subdomain boundaries and hence avoids a number of drawbacks of other AMG parallelization t ..."
Abstract

Cited by 21 (4 self)
In this paper we present a new approach to the parallelization of algebraic multigrid (AMG), i.e., to the parallel coarse grid selection in AMG. Our approach does not involve any special treatment of processor subdomain boundaries and hence avoids a number of drawbacks of other AMG parallelization techniques. The key idea is to select an appropriate (local) coarse grid on each processor from all admissible grids such that the composed coarse grid forms a suitable coarse grid for the whole domain, i.e., there is no need for any boundary treatment. To this end, we first construct multiple equivalent coarse grids on each processor subdomain. In a second step we then select exactly one grid per processor by a graph clustering technique. The results of our numerical experiments clearly indicate that this approach results in coarse grids of high quality which are very close to those obtained with sequential AMG. Furthermore, the operator and grid complexities of our parallel AMG are mostly smaller than those obtained by other parallel AMG methods, whereas the scale-up behavior of the proposed algorithm is similar to that of other parallel AMG techniques. However, a significant improvement with respect to the speedup performance is achieved.