Results 1–10 of 129
Preconditioning techniques for large linear systems: A survey
J. Comput. Phys., 2002
"... This article surveys preconditioning techniques for the iterative solution of large linear systems, with a focus on algebraic methods suitable for general sparse matrices. Covered topics include progress in incomplete factorization methods, sparse approximate inverses, reorderings, parallelization i ..."
Abstract

Cited by 192 (5 self)
This article surveys preconditioning techniques for the iterative solution of large linear systems, with a focus on algebraic methods suitable for general sparse matrices. Covered topics include progress in incomplete factorization methods, sparse approximate inverses, reorderings, parallelization issues, and block and multilevel extensions. Some of the challenges ahead are also discussed. An extensive bibliography completes the paper.
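As a concrete illustration of the simplest technique in the algebraic family the survey covers, the sketch below applies diagonal (Jacobi) preconditioning inside the conjugate gradient method on a small tridiagonal SPD system. This is a minimal pure-Python sketch; the list-of-lists matrix format and the test problem are illustrative choices, not from the survey.

```python
# Minimal sketch of preconditioned conjugate gradients (PCG) with the
# simplest algebraic preconditioner: M = diag(A), i.e. Jacobi scaling.
# Matrices are plain lists of lists so the example is self-contained.

def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def pcg(A, b, tol=1e-10, maxit=100):
    n = len(b)
    minv = [1.0 / A[i][i] for i in range(n)]   # M^{-1} = diag(A)^{-1}
    x = [0.0] * n
    r = b[:]                                   # r = b - A*0 = b
    z = [mi * ri for mi, ri in zip(minv, r)]   # preconditioned residual
    p = z[:]
    rz = dot(r, z)
    for it in range(maxit):
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        if dot(r, r) ** 0.5 < tol:
            return x, it + 1
        z = [mi * ri for mi, ri in zip(minv, r)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, maxit

# Tridiagonal SPD test system (1D Poisson-like)
n = 8
A = [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0 for j in range(n)]
     for i in range(n)]
b = [1.0] * n
x, iters = pcg(A, b)
residual = max(abs(bi - axi) for bi, axi in zip(b, matvec(A, x)))
```

Diagonal scaling is the weakest member of the family; the incomplete factorizations and sparse approximate inverses the survey discusses replace `minv` with a richer approximation of A^{-1} while keeping the same PCG skeleton.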
hypre: a Library of High Performance Preconditioners
Lecture Notes in Computer Science, 2002
"... hypre is a software library for the solution of large, sparse linear systems on massively parallel computers. Its emphasis is on modern powerful and scalable preconditioners. hypre provides various conceptual interfaces to enable application users to access the library in the way they naturally ..."
Abstract

Cited by 90 (5 self)
hypre is a software library for the solution of large, sparse linear systems on massively parallel computers. Its emphasis is on modern, powerful, and scalable preconditioners. hypre provides various conceptual interfaces to enable application users to access the library in the way they naturally think about their problems. This paper presents the conceptual interfaces in hypre. An overview of the preconditioners that are available in hypre is given, including some numerical results that show the efficiency of the library.
Parallel multigrid smoothing: polynomial versus Gauss-Seidel
J. Comput. Phys., 2003
"... Abstract. GaussSeidel method is often the smoother of choice within multigrid applications. In the context of unstructured meshes, however, maintaining good parallel efficiency is difficult with multiplicative iterative methods such as GaussSeidel. This leads us to consider alternative smoothers. ..."
Abstract

Cited by 46 (13 self)
Abstract. The Gauss-Seidel method is often the smoother of choice within multigrid applications. In the context of unstructured meshes, however, maintaining good parallel efficiency is difficult with multiplicative iterative methods such as Gauss-Seidel. This leads us to consider alternative smoothers. We discuss the computational advantages of polynomial smoothers within parallel multigrid algorithms for symmetric positive definite systems. Two particular polynomials are considered: Chebyshev and a multilevel-specific polynomial. The advantages of polynomial smoothing over traditional smoothers such as Gauss-Seidel are illustrated on several applications: Poisson’s equation, thin-body elasticity, and eddy current approximations to Maxwell’s equations. While parallelizing the Gauss-Seidel method typically involves a compromise between a scalable convergence rate and maintaining high flop rates, polynomial smoothers achieve parallel scalable multigrid convergence rates without sacrificing flop rates. We show that, although parallel computers are the main motivation, polynomial smoothers are often surprisingly competitive with Gauss-Seidel smoothers on serial machines.
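To make the contrast concrete, here is a minimal sketch (not the paper's code): one multiplicative Gauss-Seidel sweep, whose updates are inherently sequential, next to a damped-Jacobi sweep, which can be read as the degree-1 case of a polynomial smoother and consists only of parallel-friendly matrix-vector work. The 1D Poisson matrix, damping factor, and sweep count are illustrative assumptions.

```python
# Gauss-Seidel vs. a degree-1 "polynomial" smoother (damped Jacobi) on a
# tiny 1D Poisson system with exact solution x = 0, starting from all ones.

def poisson1d(n):
    return [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0
             for j in range(n)] for i in range(n)]

def gauss_seidel_sweep(A, x, b):
    n = len(b)
    for i in range(n):            # each update uses fresh values: sequential
        s = sum(A[i][j] * x[j] for j in range(n) if j != i)
        x[i] = (b[i] - s) / A[i][i]
    return x

def damped_jacobi_sweep(A, x, b, omega=2.0 / 3.0):
    n = len(b)
    r = [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
    # all updates independent: one matvec plus a diagonal scaling
    return [x[i] + omega * r[i] / A[i][i] for i in range(n)]

def err(x):
    return max(abs(v) for v in x)

n = 8
A = poisson1d(n)
b = [0.0] * n
x_gs = [1.0] * n
x_pj = [1.0] * n
for _ in range(6):
    x_gs = gauss_seidel_sweep(A, x_gs, b)
    x_pj = damped_jacobi_sweep(A, x_pj, b)
```

Per sweep, Gauss-Seidel damps the error faster, which is the trade-off the abstract describes: the polynomial smoother gives up some per-sweep convergence in exchange for updates that parallelize without coloring or ordering constraints.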
The design and implementation of hypre, a library of parallel high performance preconditioners
Numerical Solution of Partial Differential Equations on Parallel Computers, Lect. Notes Comput. Sci. Eng., 2006
"... Summary. The hypre software library provides high performance preconditioners and solvers for the solution of large, sparse linear systems on massively parallel computers. One of its attractive features is the provision of conceptual interfaces. These interfaces give application users a more natura ..."
Abstract

Cited by 37 (2 self)
Summary. The hypre software library provides high performance preconditioners and solvers for the solution of large, sparse linear systems on massively parallel computers. One of its attractive features is the provision of conceptual interfaces. These interfaces give application users a more natural means for describing their linear systems, and provide access to methods such as geometric multigrid which require additional information beyond just the matrix. This chapter discusses the design of the conceptual interfaces in hypre and illustrates their use with various examples. We discuss the data structures and parallel implementation of these interfaces. A brief overview of the solvers and preconditioners available through the interfaces is also given.
Reducing complexity in parallel algebraic multigrid preconditioners
SIAM J. Matrix Anal. Appl., 2006
"... Abstract. Algebraic multigrid (AMG) is a very efficient iterative solver and preconditioner for large unstructured sparse linear systems. Traditional coarsening schemes for AMG can, however, lead to computational complexity growth as problem size increases, resulting in increased memory use and exec ..."
Abstract

Cited by 33 (8 self)
Abstract. Algebraic multigrid (AMG) is a very efficient iterative solver and preconditioner for large unstructured sparse linear systems. Traditional coarsening schemes for AMG can, however, lead to computational complexity growth as problem size increases, resulting in increased memory use and execution time, and diminished scalability. Two new parallel AMG coarsening schemes are proposed that are based solely on enforcing a maximum independent set property, resulting in sparser coarse grids. The new coarsening techniques remedy memory and execution time complexity growth for various large three-dimensional (3D) problems. If used within AMG as a preconditioner for Krylov subspace methods, the resulting iterative methods tend to converge fast. This paper discusses complexity issues that can arise in AMG, describes the new coarsening schemes, and examines the performance of the new preconditioners for various large 3D problems.
Key words: parallel coarsening algorithms, algebraic multigrid, complexities, preconditioners
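The independent-set property at the heart of such coarsening schemes can be sketched in a few lines: greedily pick coarse points so that no two are adjacent in the matrix graph, and every fine point has a coarse neighbour. The graph, the fixed visiting order, and the greedy rule below are illustrative stand-ins, not the paper's actual parallel algorithms.

```python
# Greedy maximal independent set on the graph of a matrix: a toy version of
# the coarse-point selection idea used by independent-set AMG coarsening.

def greedy_mis(adj):
    """adj: dict node -> set of neighbours. Returns a maximal independent set."""
    coarse = set()
    excluded = set()
    for v in sorted(adj):     # fixed order stands in for parallel tie-breaking
        if v not in excluded:
            coarse.add(v)           # v becomes a coarse point
            excluded.add(v)
            excluded |= adj[v]      # its neighbours stay fine
    return coarse

# 4x4 grid graph: nodes 0..15, edges between horizontal/vertical neighbours
adj = {i: set() for i in range(16)}
for i in range(16):
    r, c = divmod(i, 4)
    if c < 3:
        adj[i].add(i + 1); adj[i + 1].add(i)
    if r < 3:
        adj[i].add(i + 4); adj[i + 4].add(i)

C = greedy_mis(adj)
```

On this structured grid the greedy rule recovers the familiar red-black checkerboard; on unstructured graphs the set is sparser than classical coarsenings, which is precisely the complexity reduction the abstract targets.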
Geometric optimization of the evaluation of finite element matrices
SIAM J. Sci. Comput.
"... Abstract. Assembling stiffness matrices represents a significant cost in many finite element computations. We address the question of optimizing the evaluation of these matrices. By finding redundant computations, we are able to significantly reduce the cost of building local stiffness matrices for ..."
Abstract

Cited by 29 (17 self)
Abstract. Assembling stiffness matrices represents a significant cost in many finite element computations. We address the question of optimizing the evaluation of these matrices. By finding redundant computations, we are able to significantly reduce the cost of building local stiffness matrices for the Laplace operator and for the trilinear form for Navier-Stokes. For the Laplace operator in two space dimensions, we have developed a heuristic graph algorithm that searches for such redundancies and generates code for computing the local stiffness matrices. Up to cubics, we are able to build the stiffness matrix on any triangle in less than one multiply-add pair per entry. Up to sixth degree, we can do it in less than about two. Preliminary low-degree results for Poisson and Navier-Stokes operators in three dimensions are also promising.
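For reference, the quantity being optimized above is small but evaluated once per element. A direct (unoptimized) evaluation for the simplest case, linear (P1) elements and the Laplace operator, looks like this; the formula K_ij = area · ∇φ_i · ∇φ_j with constant hat-function gradients is standard, while the function name and layout are illustrative.

```python
# Local stiffness matrix of the Laplace operator on one triangle, P1 elements.
# The gradients of the three barycentric hat functions are constant per
# triangle and expressible via the opposite edge vectors.

def p1_local_stiffness(v0, v1, v2):
    (x0, y0), (x1, y1), (x2, y2) = v0, v1, v2
    det = (x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0)  # 2 * signed area
    area = abs(det) / 2.0
    grads = [((y1 - y2) / det, (x2 - x1) / det),   # grad phi_0
             ((y2 - y0) / det, (x0 - x2) / det),   # grad phi_1
             ((y0 - y1) / det, (x1 - x0) / det)]   # grad phi_2
    return [[area * (gi[0] * gj[0] + gi[1] * gj[1]) for gj in grads]
            for gi in grads]

# unit right triangle as a reference element
K = p1_local_stiffness((0.0, 0.0), (1.0, 0.0), (0.0, 1.0))
```

Even here the redundancy the paper exploits is visible: the matrix is symmetric and its rows sum to zero (constants are in the kernel of the Laplacian), so only a fraction of the entries need independent computation.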
Characterizing the Influence of System Noise on LargeScale Applications by Simulation
In International Conference for High Performance Computing, Networking, Storage and Analysis (SC’10), 2010
"... Abstract—This paper presents an indepth analysis of the impact of system noise on largescale parallel application performance in realistic settings. Our analytical model shows that not only collective operations but also pointtopoint communications influence the application’s sensitivity to nois ..."
Abstract

Cited by 29 (8 self)
Abstract—This paper presents an in-depth analysis of the impact of system noise on large-scale parallel application performance in realistic settings. Our analytical model shows that not only collective operations but also point-to-point communications influence the application’s sensitivity to noise. We present a simulation toolchain that injects noise delays from traces gathered on common large-scale architectures into a LogGPS simulation and allows new insights into the scaling of applications in noisy environments. We investigate collective operations with up to 1 million processes and three applications (Sweep3D, AMG, and POP) with up to 32,000 processes. We show that the scale at which noise becomes a bottleneck is system-specific and depends on the structure of the noise. Simulations with different network speeds show that a 10x faster network does not improve application scalability. We quantify noise and conclude that our tools can be utilized to tune the noise signatures of a specific system.
I. MOTIVATION AND BACKGROUND
The performance impact of operating system and architectural overheads (system noise) at massive scale is increasingly of concern. Even small local delays on compute nodes, which can be caused by interrupts, operating system daemons, or even cache or page misses, can affect global application performance significantly [1]. Such local delays often cause less than 1% overhead per process, but severe performance losses can occur if noise is propagated (amplified) through communication or global synchronization. Previous analyses generally assume that the performance impact of system noise grows at scale, and Tsafrir et al. [2] even suggest that the impact of very low frequency noise scales linearly with the system size.
A. Related Work
Petrini, Kerbyson, and Pakin [1] report that the parallel performance of SAGE on a fixed number of ASCI Q nodes was highest when SAGE used only three of the four CPUs per node. It turned out that “resonance” between the application’s collective communication and the misconfigured system caused delays during each iteration. Jones, Brenner, and Fier
Combining Performance Aspects of Irregular Gauss-Seidel via Sparse Tiling
In 15th Workshop on Languages and Compilers for Parallel Computing (LCPC), 2002
"... Finite Element problems are often solved using multigrid techniques. The most time consuming part of multigrid is the iterative smoother, such as GaussSeidel. To improve performance, iterative smoothers can exploit parallelism, intraiteration data reuse, and interiteration data reuse. Current met ..."
Abstract

Cited by 25 (12 self)
Finite element problems are often solved using multigrid techniques. The most time consuming part of multigrid is the iterative smoother, such as Gauss-Seidel. To improve performance, iterative smoothers can exploit parallelism, intra-iteration data reuse, and inter-iteration data reuse. Current methods for parallelizing Gauss-Seidel on irregular grids, such as multicoloring and owner-computes based techniques, exploit parallelism and possibly intra-iteration data reuse but not inter-iteration data reuse. Sparse tiling techniques were developed to improve intra-iteration and inter-iteration data locality in iterative smoothers. This paper describes how sparse tiling can additionally provide parallelism. Our results show the effectiveness of Gauss-Seidel parallelized with sparse tiling techniques on shared memory machines, specifically compared to owner-computes based Gauss-Seidel methods. The latter employ only parallelism and intra-iteration locality. Our results support the premise that better performance occurs when all three performance aspects (parallelism, intra-iteration, and inter-iteration data locality) are combined.
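The multicoloring baseline the abstract contrasts with sparse tiling can be sketched briefly: partition unknowns into colors with no couplings inside a color, then update each color as one independent batch. Below is a toy two-color (red-black) version on a 1D Poisson matrix; the system and the serial simulation of per-color parallelism are illustrative assumptions.

```python
# Red-black Gauss-Seidel: even and odd unknowns of a tridiagonal system
# are mutually uncoupled, so each colour's updates are independent and the
# two half-sweeps together form one Gauss-Seidel iteration.

def red_black_sweep(A, x, b):
    n = len(b)
    for color in (0, 1):
        updates = {}                        # all same-colour updates computed
        for i in range(color, n, 2):        # from the current x: parallelizable
            s = sum(A[i][j] * x[j] for j in range(n) if j != i)
            updates[i] = (b[i] - s) / A[i][i]
        for i, v in updates.items():        # write-back per colour
            x[i] = v
    return x

n = 8
A = [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0 for j in range(n)]
     for i in range(n)]
b = [1.0] * n
x = [0.0] * n
for _ in range(200):
    x = red_black_sweep(A, x, b)
resid = max(abs(b[i] - sum(A[i][j] * x[j] for j in range(n))) for i in range(n))
```

This exposes parallelism within one sweep but, as the abstract notes, each sweep still streams the whole matrix, so inter-iteration data reuse is lost; sparse tiling reorders work across sweeps to recover it.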
A Distributed Memory Unstructured Gauss-Seidel Algorithm for Multigrid Smoothers
2001
"... GaussSeidel is a popular multigrid smoother as it is provably optimal on structured grids and exhibits superior performance on unstructured grids. GaussSeidel is not used to our knowledge on distributed memory machines as it is not obvious how to parallelize it effectively. We, among others, have ..."
Abstract

Cited by 23 (5 self)
Gauss-Seidel is a popular multigrid smoother as it is provably optimal on structured grids and exhibits superior performance on unstructured grids. Gauss-Seidel is not, to our knowledge, used on distributed memory machines, as it is not obvious how to parallelize it effectively. We, among others, have found that Krylov solvers preconditioned with Jacobi, block Jacobi, or overlapped Schwarz are effective on unstructured problems. Gauss-Seidel does, however, have some attractive properties, namely: fast convergence, no global communication (i.e., no dot products), and fewer flops per iteration, as one can incorporate an initial guess naturally. This paper discusses an algorithm for parallelizing Gauss-Seidel for distributed memory computers for use as a multigrid smoother and compares its performance with preconditioned conjugate gradients on unstructured linear elasticity problems with up to 76 million degrees of freedom.
Coarse grid classification: a parallel coarsening scheme for algebraic multigrid methods
Numerical Linear Algebra with Applications 13(2–3), 2006
"... In this paper we present a new approach to the parallelization of algebraic multigrid (AMG), i.e., to the parallel coarse grid selection in AMG. Our approach does not involve any special treatment of processor subdomain boundaries and hence avoids a number of drawbacks of other AMG parallelization t ..."
Abstract

Cited by 21 (4 self)
In this paper we present a new approach to the parallelization of algebraic multigrid (AMG), i.e., to the parallel coarse grid selection in AMG. Our approach does not involve any special treatment of processor subdomain boundaries and hence avoids a number of drawbacks of other AMG parallelization techniques. The key idea is to select an appropriate (local) coarse grid on each processor from all admissible grids such that the composed coarse grid forms a suitable coarse grid for the whole domain, i.e., there is no need for any boundary treatment. To this end, we first construct multiple equivalent coarse grids on each processor subdomain. In a second step we then select exactly one grid per processor by a graph clustering technique. The results of our numerical experiments clearly indicate that this approach results in coarse grids of high quality which are very close to those obtained with sequential AMG. Furthermore, the operator and grid complexities of our parallel AMG are mostly smaller than those obtained by other parallel AMG methods, whereas the scale-up behavior of the proposed algorithm is similar to that of other parallel AMG techniques. However, a significant improvement with respect to the speedup performance is achieved.