Results 1 - 10 of 11
ILUM: A Multi-Elimination ILU Preconditioner For General Sparse Matrices
SIAM J. Sci. Comput., 1999
Cited by 49 (9 self)
Standard preconditioning techniques based on incomplete LU (ILU) factorizations offer, in general, a limited degree of parallelism. A few of the alternatives advocated so far consist of either using some form of polynomial preconditioning, or applying the usual ILU factorization to a matrix obtained from a multicolor ordering. In this paper we present an incomplete factorization technique based on independent set orderings and multicoloring. We note that in order to improve robustness, it is necessary to allow the preconditioner to have an arbitrarily high accuracy, as is done with ILUs based on threshold techniques. The ILUM factorization described in this paper is in this category. It can be viewed as a multifrontal version of a Gaussian elimination procedure with threshold dropping, which has a high degree of potential parallelism. The emphasis is on methods that deal specifically with general unstructured sparse matrices such as those arising from finite element methods on un...
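ILUM's parallelism comes from independent set orderings: unknowns in an independent set are not coupled to one another, so once they are permuted to the front, the leading block of the matrix is diagonal and all of them can be eliminated at once. The following is a minimal sketch of a greedy independent-set ordering on the sparsity graph. It assumes a natural-order traversal and a row-pattern input format, which are illustrative choices and not necessarily the paper's heuristics. The function name `independent_set_ordering` is hypothetical.

```python
def independent_set_ordering(n, pattern):
    """Greedy independent-set ordering for a sparse matrix (hypothetical helper).

    pattern[i] lists the column indices j with a_ij != 0.
    Returns (indep, perm): the independent set and a permutation that puts it first.
    """
    # Symmetrize the pattern (graph of A + A^T) and drop the diagonal.
    adj = [set() for _ in range(n)]
    for i in range(n):
        for j in pattern[i]:
            if i != j:
                adj[i].add(j)
                adj[j].add(i)

    marked = [False] * n
    indep = []
    for i in range(n):  # natural order; other traversal orders are possible
        if not marked[i]:
            indep.append(i)  # i joins the independent set ...
            marked[i] = True
            for j in adj[i]:  # ... and its neighbours are excluded from it
                marked[j] = True

    in_set = set(indep)
    rest = [i for i in range(n) if i not in in_set]
    return indep, indep + rest


if __name__ == "__main__":
    # Tridiagonal pattern (1D Laplacian) with 6 unknowns.
    n = 6
    pattern = [[j for j in (i - 1, i, i + 1) if 0 <= j < n] for i in range(n)]
    indep, perm = independent_set_ordering(n, pattern)
    print(indep, perm)  # [0, 2, 4] [0, 2, 4, 1, 3, 5]
```

In a multi-elimination scheme, the unknowns left over after the independent set is eliminated form a reduced system, and the same step is applied to it recursively. That recursion and the threshold dropping are not shown here.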
Overlapping domain decomposition algorithms for general sparse matrices
1996
Cited by 29 (13 self)
Domain decomposition methods for finite element problems using a partition based on the underlying finite element mesh have been extensively studied. In this paper, we discuss algebraic extensions of the class of overlapping domain decomposition algorithms for general sparse matrices. The subproblems are created with an overlapping partition of the graph corresponding to the sparsity structure of the matrix. These algebraic domain decomposition methods are especially useful for unstructured mesh problems. We also discuss some difficulties encountered in the algebraic extension, particularly the issues related to the coarse solver. Key words. Sparse matrix, iterative methods, preconditioning, graph partitioning, domain decomposition.
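In an algebraic method, subdomains are built from the graph of the sparsity pattern rather than from the mesh. The following is a minimal sketch of one common way to add overlap. It assumes a non-overlapping partition has already been computed, for example by a graph partitioner. Each level of overlap adds the graph neighbours of the current subdomain. The helper name `add_overlap` is hypothetical, and the coarse solver discussed in the paper is not shown.

```python
def add_overlap(parts, adj, levels=1):
    """Extend a non-overlapping partition of the matrix graph by `levels` layers.

    parts: list of sets of unknowns (one set per subdomain).
    adj:   adj[i] is an iterable of the neighbours of unknown i, i.e. the j
           with a_ij != 0 or a_ji != 0.
    """
    overlapped = []
    for part in parts:
        sub = set(part)
        for _ in range(levels):
            # Add every unknown directly coupled to the current subdomain.
            sub |= {j for i in sub for j in adj[i]}
        overlapped.append(sub)
    return overlapped


if __name__ == "__main__":
    # Path graph 0-1-2-...-7 (pattern of a tridiagonal matrix), split in two.
    adj = {i: [j for j in (i - 1, i + 1) if 0 <= j < 8] for i in range(8)}
    parts = [set(range(0, 4)), set(range(4, 8))]
    print(add_overlap(parts, adj))  # [{0, 1, 2, 3, 4}, {3, 4, 5, 6, 7}]
```

Each subproblem would then be formed from the rows and columns of the matrix indexed by one overlapped set.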
Multigrid Solution of the Convection-Diffusion Equation with High-Reynolds Number
in Preliminary Proceedings of 1996 Copper Mountain Conference on Iterative Methods, Copper Mountain, 1996
Cited by 7 (6 self)
A fourth-order compact finite difference scheme is employed with the multigrid technique to solve the variable-coefficient convection-diffusion equation with high Reynolds number. Scaled inter-grid transfer operators and the potential for vectorization and parallelization are discussed. The high-order multigrid method is unconditionally stable and produces solutions of fourth-order accuracy. Numerical experiments are included. Key words: Multigrid method, high-order discretization, scaled residual transfer operator, convection-diffusion equation.
1 Introduction
Numerical simulation of the convection-diffusion equation plays a very important role in modern large-scale scientific computation, especially in computational fluid dynamics. The general convection-diffusion equation with Dirichlet boundary conditions is of the form

$$\begin{cases} u_{xx}(x,y) + u_{yy}(x,y) + p(x,y)\,u_x(x,y) + q(x,y)\,u_y(x,y) = f(x,y), & (x,y) \in \Omega, \\ u(x,y) = g(x,y), & (x,y) \in \partial\Omega, \end{cases} \tag{1}$$

where p(x, y) and q(x, y) ar...
The Parallel U-Cycle Multigrid Method
in Proceedings of the 8th Copper Mountain Conference on Multigrid Methods, 1997
Cited by 4 (0 self)
A simple way to avoid idle processors when implementing the multigrid method on a parallel computer is to select a suitable finer grid as the new coarsest grid. For clarity, the variant of the V-cycle generated by this approach is called the U-cycle in this paper. It is proved that the U-cycle with a finer coarsest grid can have a faster convergence rate, and that the coarsest-grid equations of the U-cycle can be solved approximately without increasing the total number of U-cycle iterations over what would be required using exact coarsest-grid solutions. Then, a parallel U-cycle is defined by using domain partitioning techniques, which can be implemented on a MIMD multiprocessor computer without any idle processors. An analysis of the time complexity of the parallel U-cycle shows that the parallel U-cycle is fully scalable and can have super-linear speed-up in comparison to the original V-cycle. Further, the scaled efficiency of the parallel U-cycle in the memory-constrained case is discussed...
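The sequential idea behind the U-cycle can be shown on a 1D model problem. The following is a V-cycle whose recursion stops at a chosen grid size `n_coarsest` instead of at the coarsest possible grid, and that grid is solved only approximately by a few Gauss-Seidel sweeps. This is a minimal sketch, assuming $-u'' = f$ on $(0,1)$ with zero Dirichlet boundary values, full-weighting restriction and linear interpolation. All function names and parameter values are illustrative, and the domain-partitioned parallel version analyzed in the paper is not shown.

```python
import numpy as np


def residual(u, f, h):
    # r = f - A u, where A = (1/h^2) tridiag(-1, 2, -1) with zero boundary values
    up = np.pad(u, 1)
    return f - (2.0 * up[1:-1] - up[:-2] - up[2:]) / h**2


def gauss_seidel(u, f, h, sweeps):
    # Lexicographic Gauss-Seidel sweeps, updating u in place
    n = len(u)
    for _ in range(sweeps):
        for i in range(n):
            left = u[i - 1] if i > 0 else 0.0
            right = u[i + 1] if i < n - 1 else 0.0
            u[i] = 0.5 * (left + right + h * h * f[i])
    return u


def restrict(r):
    # Full weighting: coarse point j coincides with fine point 2j + 1
    return 0.25 * (r[:-2:2] + 2.0 * r[1:-1:2] + r[2::2])


def prolong(e, n_fine):
    # Linear interpolation back to the fine grid (zero boundary values)
    ef = np.zeros(n_fine)
    ef[1::2] = e
    ep = np.pad(e, 1)
    ef[0::2] = 0.5 * (ep[:-1] + ep[1:])
    return ef


def u_cycle(u, f, h, n_coarsest, nu=2, coarse_sweeps=30):
    n = len(u)
    if n <= n_coarsest:
        # Stop coarsening here and solve the "coarsest" grid only approximately
        return gauss_seidel(u, f, h, coarse_sweeps)
    gauss_seidel(u, f, h, nu)  # pre-smoothing
    rc = restrict(residual(u, f, h))
    ec = u_cycle(np.zeros_like(rc), rc, 2.0 * h, n_coarsest, nu, coarse_sweeps)
    u += prolong(ec, n)  # coarse-grid correction
    return gauss_seidel(u, f, h, nu)  # post-smoothing


if __name__ == "__main__":
    n = 127  # 2**7 - 1 interior points, h = 1/128
    h = 1.0 / (n + 1)
    x = np.linspace(h, 1.0 - h, n)
    f = np.pi**2 * np.sin(np.pi * x)
    u = np.zeros(n)
    r0 = np.linalg.norm(residual(u, f, h))
    for k in range(10):
        u = u_cycle(u, f, h, n_coarsest=7)
        print(k, np.linalg.norm(residual(u, f, h)) / r0)
```

Raising `n_coarsest` corresponds to the U-cycle's finer coarsest grid. In the parallel setting the abstract motivates this choice as a way to keep every processor busy. With a finer coarsest grid, more `coarse_sweeps` may be needed for the approximate coarse solve to stay effective.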
New Parallel SOR Method by Domain Partitioning
Local and Global Invariants of Linear Differential-Algebraic Equations and their Relation, 1995
Cited by 4 (0 self)
In this paper, we propose and analyze a new parallel SOR method, the PSOR method, formulated by using domain partitioning and interprocessor data communication techniques. We prove that the PSOR method has the same asymptotic rate of convergence as the Red/Black (R/B) SOR method for the 5-point stencil on both strip and block partitions, and as the four-color (R/B/G/O) SOR method for the 9-point stencil on strip partitions. We also demonstrate the parallel performance of the PSOR method on four different MIMD multiprocessors (a KSR1, the Intel Delta, a Paragon and an IBM SP2). Finally, we compare the parallel performance of PSOR, R/B SOR and R/B/G/O SOR. Numerical results on the Paragon indicate that PSOR is more efficient than R/B SOR and R/B/G/O SOR in both computation and interprocessor data communication. Key words. parallel computing, SOR, multicolor SOR, JSOR, PSOR, convergence analysis, nonmigratory permutation AMS subject classifications. Primary 65Y05; Secondary 65F10.
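For reference, the Red/Black SOR baseline that PSOR is compared against can be sketched for the 5-point discretization of $-\Delta u = f$ on the unit square with zero boundary values. This is an illustrative sketch of R/B SOR only, not of the PSOR method itself, whose domain-partitioned ordering and communication scheme are described in the paper. The function name and the vectorized NumPy formulation are assumptions. Within one colour no point depends on another point of the same colour, which is what makes each half-sweep parallel.

```python
import numpy as np


def red_black_sor(f, h, omega, iters):
    """R/B SOR for the 5-point stencil of -Laplace(u) = f, u = 0 on the boundary.

    f is an (n+2) x (n+2) array that includes the boundary ring.
    """
    u = np.zeros_like(f)
    i, j = np.indices(f.shape)
    interior = np.zeros(f.shape, dtype=bool)
    interior[1:-1, 1:-1] = True
    red = interior & ((i + j) % 2 == 0)
    black = interior & ((i + j) % 2 == 1)
    for _ in range(iters):
        for mask in (red, black):
            # Gauss-Seidel value from the four neighbours, recomputed after the
            # other colour has been updated
            gs = np.zeros_like(u)
            gs[1:-1, 1:-1] = (u[:-2, 1:-1] + u[2:, 1:-1] + u[1:-1, :-2]
                              + u[1:-1, 2:] + h * h * f[1:-1, 1:-1]) / 4.0
            # Over-relaxed update of all points of this colour at once
            u[mask] += omega * (gs[mask] - u[mask])
    return u


if __name__ == "__main__":
    n = 31
    h = 1.0 / (n + 1)
    x = np.linspace(0.0, 1.0, n + 2)
    X, Y = np.meshgrid(x, x, indexing="ij")
    f = 2.0 * np.pi**2 * np.sin(np.pi * X) * np.sin(np.pi * Y)
    omega = 2.0 / (1.0 + np.sin(np.pi * h))  # optimal omega for this model problem
    u = red_black_sor(f, h, omega, 200)
    print(np.max(np.abs(u - np.sin(np.pi * X) * np.sin(np.pi * Y))))
```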
Multigrid and Gauss-Seidel smoothers revisited: parallelization on chip multiprocessors
in: ICS ’06: Proc. 20th Annual International Conference on Supercomputing, 2006
Cited by 3 (0 self)
Efficient solutions of partial differential equations require a match between the algorithm and the underlying architecture. The new chip multiprocessors, CMPs (a.k.a. multicore), feature low intrachip communication cost and smaller per-thread caches compared to previous systems. From an algorithmic point of view this means that data locality issues become more important than communication overheads, which may require re-evaluation of many existing algorithms. We have investigated parallel implementations of multigrid methods using a temporally blocked, naturally ordered smoother implementation. Compared with the standard multigrid solution based on the two-color red-black algorithm, we often improve the data locality by as much as ten times, while our use of a fine-grained locking scheme keeps the parallel efficiency high. While our algorithm was initially inspired by CMPs, it was surprising to see our OpenMP multigrid implementation run up to 40 percent faster than the standard red-black algorithm on an 8-way SMP system. Studied in isolation, the smoother part of the algorithm often performs two iterations in the same time as a single iteration of an ordinary red-black smoother. Running our smoother on a 32-thread UltraSPARC T1 (Niagara) SMT/CMP and on a simulated 32-way CMP demonstrates that the communication cost of our algorithm is low on such architectures.
Language Concepts for Distributed Processing of Large Arrays
Principles of Distributed Computing, 1982
Cited by 2 (1 self)
A large array is an array whose storage is distributed among primary and secondary storage and whose processing may be distributed among several tasks in a distributed system. This paper presents a semantic model (set of language concepts) for representing large arrays in a distributed system in such a way that the performance realities inherent in the distributed storage and processing can be adequately represented. An implementation of the large array concept as an ADA package (abstract data type) is described, as well as a particular tailoring of the concept for the NASA Finite Element Machine. An example application program using the package is also described.
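One ingredient of any distributed-array model is the map from a global index to the task that owns it and to the position in that task's local storage. The following is a minimal sketch of a simple block distribution, assuming equal-sized contiguous blocks. The class name `BlockDistribution` is hypothetical, and this is not the paper's semantic model or its Ada package. The split between primary and secondary storage is also not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockDistribution:
    n: int  # global array length
    p: int  # number of tasks sharing the array

    @property
    def block(self):
        # ceil(n / p): size of every block except possibly the trailing ones
        return -(-self.n // self.p)

    def owner(self, g):
        # Task that stores global element g
        return g // self.block

    def local_index(self, g):
        # Position of global element g inside its owner's local block
        return g % self.block

    def global_index(self, task, local):
        return task * self.block + local

    def local_range(self, task):
        # Global indices stored by `task` (trailing tasks may get a short or empty block)
        lo = min(task * self.block, self.n)
        hi = min((task + 1) * self.block, self.n)
        return range(lo, hi)


if __name__ == "__main__":
    d = BlockDistribution(n=10, p=3)
    print([list(d.local_range(t)) for t in range(3)])  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
    print(d.owner(9), d.local_index(9))  # 2 1
```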
Multigrid Method and Fourth Order Compact Difference Scheme for 2D Poisson Equation with Unequal Meshsize Discretization
2001
Cited by 2 (1 self)
A fourth-order compact difference scheme with unequal mesh sizes in different coordinate directions is employed to discretize the two-dimensional Poisson equation in a rectangular domain. Multigrid methods using a partial semicoarsening strategy and line Gauss-Seidel relaxation are designed to solve the resulting sparse linear systems. Numerical experiments are conducted to test the accuracy of the fourth-order compact difference scheme and to compare it with the standard second-order difference scheme. The convergence behavior of the partial semicoarsening and line Gauss-Seidel relaxation multigrid methods is examined experimentally. Key words: Poisson equation, fourth order compact scheme, unequal meshsize, multigrid method, semicoarsening. Mathematics Subject Classification: 65F10, 65N06, 65N22, 65N55, 76D07.
1 Introduction
We are interested in the high-accuracy numerical solution of the two-dimensional (2D) Poisson equation of the form

$$u_{xx}(x,y) + u_{yy}(x,y) = f(x,y), \qquad (x,y) \in \Omega$$

...
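With equal mesh size $h$ in both directions, the classical fourth-order compact (9-point) scheme for $u_{xx} + u_{yy} = f$ reads $\frac{1}{6h^2}\left[4(u_E+u_W+u_N+u_S) + u_{NE}+u_{NW}+u_{SE}+u_{SW} - 20u_P\right] = \frac{1}{12}\left[8f_P + f_E+f_W+f_N+f_S\right]$. The paper's scheme generalizes this to unequal mesh sizes. The sketch below covers only the equal-mesh case and checks numerically that the local truncation error drops by about $2^4 = 16$ when $h$ is halved. The test function $u = \sin x \sin y$ and the function name are illustrative choices.

```python
import math


def compact_truncation_error(u, f, x, y, h):
    """Local truncation error of the equal-mesh 9-point compact scheme
    for u_xx + u_yy = f at the point (x, y)."""
    edges = u(x + h, y) + u(x - h, y) + u(x, y + h) + u(x, y - h)
    corners = u(x + h, y + h) + u(x - h, y + h) + u(x + h, y - h) + u(x - h, y - h)
    lhs = (4.0 * edges + corners - 20.0 * u(x, y)) / (6.0 * h * h)
    # Right-hand side uses a weighted 5-point average of f instead of f alone
    rhs = (8.0 * f(x, y) + f(x + h, y) + f(x - h, y) + f(x, y + h) + f(x, y - h)) / 12.0
    return lhs - rhs


if __name__ == "__main__":
    u = lambda x, y: math.sin(x) * math.sin(y)
    f = lambda x, y: -2.0 * math.sin(x) * math.sin(y)  # u_xx + u_yy for this u
    e1 = compact_truncation_error(u, f, 0.3, 0.7, 0.1)
    e2 = compact_truncation_error(u, f, 0.3, 0.7, 0.05)
    print(e1, e2, e1 / e2)  # ratio close to 16, i.e. fourth order
```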
Preconditioning Free Multigrid Method for Convection-Diffusion Equations with Variable Coefficients
1995
Cited by 2 (2 self)
A high-order compact finite difference scheme is employed in conjunction with the multigrid algorithm to solve convection-diffusion equations with variable coefficients. Special treatments for accelerating the convergence for both small and large Reynolds number problems, such as restriction on the coarsest grid and a residual injection scaling factor, are discussed. A heuristic residual analysis is given to obtain a cost-effective residual injection operator for diffusion-dominated problems. The multigrid method requires neither a preconditioner nor added dissipation terms for high-Reynolds-number problems. Numerical experiments are employed to test the stability and efficiency of the proposed method. Key words: Multigrid method, high-order discretization, residual transfer, convection-diffusion equation. AMS subject classifications: 65F10, 65N22, 65N55.
1 Introduction
Numerical simulation of convection-diffusion equations plays a very important role in modern large-scale scientific...
Reordering Iterations in Runtime Loop Parallelization
Proceedings of the 1996 Hawaii International Conference on System Sciences (HICSS-29)
Cited by 1 (0 self)
When a loop in a sequential program is parallelized, it is normally guaranteed that all flow dependencies and anti-dependencies are respected so that the result of parallel execution is always the same as sequential execution. In some cases, however, the algorithm implemented by the loop allows the iterations to be executed in a different sequential order than the one specified in the program. This opportunity can be exploited to expose parallelism that exists in the algorithm but is obscured by its sequential program implementation. In this paper, we show how parallelization of this kind of loop can be integrated into the runtime parallelization scheme of Saltz et al. [17, 18]. Runtime parallelization is a general technique appropriate for loops whose dependency structures cannot be determined at compile time. The compiler generates two pieces of code: the inspector examines dependencies at run time and computes a parallel schedule; the executor executes iterations in parallel accordi...
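The inspector/executor split can be sketched for a loop whose iteration i reads and writes array elements that are known only at run time. This minimal sketch assumes each iteration's read and write sets are available as index sets. The inspector assigns each iteration a wavefront number that places it after every earlier iteration with which it has a flow, anti or output dependence. The executor then runs the wavefronts in order, and iterations inside one wavefront could run in parallel. This illustrates the baseline scheme that respects all dependencies, not the iteration-reordering extension the paper proposes. The function names are hypothetical.

```python
from collections import defaultdict


def inspector(reads, writes):
    """Return a wavefront number for every iteration.

    reads[i] / writes[i]: sets of array indices read / written by iteration i.
    """
    last_write = {}  # element -> wavefront of the latest writer
    max_read = {}  # element -> largest wavefront of any reader so far
    wave = []
    for r, w in zip(reads, writes):
        level = 0
        for e in r:  # flow dependence: run after the latest writer of e
            level = max(level, last_write.get(e, -1) + 1)
        for e in w:  # anti and output dependences
            level = max(level, max_read.get(e, -1) + 1, last_write.get(e, -1) + 1)
        wave.append(level)
        for e in r:
            max_read[e] = max(max_read.get(e, -1), level)
        for e in w:
            last_write[e] = level
    return wave


def executor(wave, body):
    schedule = defaultdict(list)
    for i, level in enumerate(wave):
        schedule[level].append(i)
    for level in sorted(schedule):
        # Iterations in one wavefront are independent; they run in reverse
        # order here only to show that their relative order does not matter.
        for i in reversed(schedule[level]):
            body(i)


if __name__ == "__main__":
    import random

    random.seed(1)
    n, m = 40, 8
    src = [random.randrange(m) for _ in range(n)]  # indirection arrays, known
    dst = [random.randrange(m) for _ in range(n)]  # only at run time

    # Sequential loop: x[dst[i]] = 2 * x[src[i]] + i
    x_seq = list(range(m))
    for i in range(n):
        x_seq[dst[i]] = 2 * x_seq[src[i]] + i

    x_par = list(range(m))

    def body(i):
        x_par[dst[i]] = 2 * x_par[src[i]] + i

    wave = inspector([{s} for s in src], [{d} for d in dst])
    executor(wave, body)
    print(x_par == x_seq, max(wave) + 1, "wavefronts")
```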

