16 citations found.
E. G. Ng and B. W. Peyton, A supernodal Cholesky factorization algorithm for shared-memory multiprocessors, SIAM J. Sci. Comput., 14 (1993), pp. 761--769.


This paper is cited in the following contexts:
Multiple-Rank Modifications of a Sparse Cholesky Factorization - Davis, Hager (1999)

....utilized in the computations. This reordering improves the numerical update/downdate algorithm by placing all columns of W that affect any given subpath next to each other, eliminating an indexing operation. Reordering the columns of a sparse matrix prior to Cholesky factorization is very common [3, 22, 23, 25]. It improves data locality and simplifies the algorithm, just as it does for reordering W in a multiple-rank update/downdate. The depth-first ordering of the tree changes as the elimination tree changes, so columns of W must be ordered for each update or downdate. To illustrate this reordering, ....

E. G. Ng and B. W. Peyton, A supernodal Cholesky factorization algorithm for shared-memory multiprocessors, SIAM J. Sci. Comput., 14 (1993), pp. 761--769.


Analysis, Tuning and Comparison of Two General Sparse .. - Amestoy, Duff.. (2000)

....[2] MA49 Multifrontal QR RECT www.cse.clrc.ac.uk/Activity/HSL [5] PARDISO Left-right looking UNS Schenk [29] PSLDLT Left looking SPD SGI product [28] PSLDU Left looking UNS SGI product [28] SuperLU Left looking UNS www.nersc.gov/~xiaoye/SuperLU [10] PanelLLT Left looking SPD Ng [26] Table 16: Shared memory codes. Acknowledgments. We want to thank James Demmel, Jacko Koster and Rich Vuduc for very helpful discussions. We are grateful to Chiara Puglisi for her comments on an early version of this report and her help with the presentation. 9 Appendix. The complete set of ....

E. G. Ng and B. W. Peyton. A supernodal Cholesky factorization algorithm for shared-memory multiprocessors. SIAM Journal on Scientific and Statistical Computing, 14:761--769, 1993.


Analysis and Comparison of Two General Sparse Solvers .. - Amestoy, Duff.. (2000)

....Only object code for IBM is available. No numerical pivoting performed.

Code      Technique           Scope  Availability                     Ref
GSPAR     Interpretative      UNS    Grund                            [9]
MA41      Multifrontal        UNS    www.cse.clrc.ac.uk/Activity/HSL  [2]
MA49      Multifrontal QR     RECT   www.cse.clrc.ac.uk/Activity/HSL  [5]
PanelLLT  Left looking        SPD    Ng                               [28]
PARDISO   Left-right looking  UNS    Schenk                           [31]
PSLDLT†   Left looking        SPD    SGI product                      [30]
PSLDU†    Left looking        UNS    SGI product                      [30]
SuperLU   Left looking        UNS    www.nersc.gov/~xiaoye/SuperLU    [11]

Table 17: Shared memory codes
† Only object code for SGI is available

Acknowledgments. We want to ....

E. G. Ng and B. W. Peyton. A supernodal Cholesky factorization algorithm for shared-memory multiprocessors. SIAM Journal on Scientific and Statistical Computing, 14:761--769, 1993.


Analysis, Tuning and Comparison of Two General Sparse .. - Amestoy, Duff.. (2000)

....[2] MA49 Multifrontal QR RECT www.cse.clrc.ac.uk/Activity/HSL [5] PARDISO Left-right looking UNS Schenk [29] PSLDLT Left looking SPD SGI product [28] PSLDU Left looking UNS SGI product [28] SuperLU Left looking UNS www.nersc.gov/~xiaoye/SuperLU [10] PanelLLT Left looking SPD Ng [26] Table 16: Shared memory codes. Acknowledgments. We want to thank James Demmel, Jacko Koster and Rich Vuduc for very helpful discussions. We are grateful to Chiara Puglisi for her comments on an early version of this report and her help with the presentation. 9 Appendix. The complete set of ....

E. G. Ng and B. W. Peyton. A supernodal Cholesky factorization algorithm for shared-memory multiprocessors. SIAM Journal on Scientific and Statistical Computing, 14:761--769, 1993.


Scalable Parallel Sparse Factorization with Left-Right.. - Schenk, Gärtner.. (1999)

....that are already factorized. Although a pipelining approach is difficult to realize in sparse direct solver packages, the pipelining parallelism is essential to achieve higher concurrency. Our parallel formulation of the sparse numerical factorization is based on the general framework described in [2, 10]. It successively computes fractions of supernodes called panels, which are introduced to increase load balancing and pipelining parallelism in the sequential part of the elimination tree. The algorithm with reduced synchronization is shown in Figure 1. First, the scheduler is initialized with ....

....iterative solver failed during the simulation. The main memory requirement of DESSIS GammaI SE with the parallel direct solver was about 4 Gbytes on the SGI Origin 2000 for the largest example with 105 237 vertices. 5 Summary and outlook. The implementation features state-of-the-art techniques [1, 10, 11]. A very high level of efficiency has been achieved on typical shared memory parallel servers and supercomputers. The proposed left-right looking supernode algorithm reduces the synchronization events required to manage the factorization to O(n). This results in a parallel efficiency of ....

E. Ng and B. Peyton, A supernodal Cholesky factorization algorithm for shared-memory multiprocessors, SIAM Journal on Scientific Computing, 14 (1993), pp. 761--769.


Application of Parallel Sparse Direct Methods in.. - Schenk, Gärtner.. (1999)

....Although a pipelining approach is difficult to realize in sparse direct solver packages, the pipelining parallelism is essential to achieve higher concurrency. In the PARDISO package, fractions of supernodes are owned by processors and spawned for OpenMP threads. Following Ng and Peyton [9], we build a pool of tasks containing the list of tasks that can be performed by available processors. This pool is initialized with all leaves of the supernode elimination tree. One important consequence of the left looking supernode algorithm is that the factorization of a supernode S consists ....

E.G. Ng and B.W. Peyton. A supernodal Cholesky factorization algorithm for shared-memory multiprocessors. SIAM Journal on Scientific Computing, 14(4):761--769, 1993.


Scalable Parallel Sparse LU Factorization with a.. - Schenk, Fichtner.. (2000)

....well balanced task distribution requires dynamic job scheduling. Processors are dynamically assigned to supernodes in the elimination tree in such a way that new tasks are given to processors when the previous tasks are finished. Dynamic job scheduling is implemented by a pool-of-tasks approach [2, 4, 8, 10]. The pool contains the list of tasks that can be performed by available processors. Therefore, the queue is initialized with all leaves of the elimination tree (line 5). Then, a group of p processes is created. Each process asks the queue for a new task until all supernodes have been factorized. ....

E.G. Ng and B.W. Peyton. A supernodal Cholesky factorization algorithm for shared-memory multiprocessors. SIAM Journal on Scientific Computing, 14(4):761--769, 1993.
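
Illustrative note (not part of the cited papers): the pool-of-tasks scheduling described in the context above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the PARDISO or Ng-Peyton implementation. The function name `schedule_tree`, the `parent` array encoding of the elimination tree, and the use of Python threads are all hypothetical choices made for illustration; the numerical factorization of a supernode is replaced by a placeholder.

```python
import queue
import threading


def schedule_tree(parent, num_workers=2):
    """Complete every tree node only after all of its children.

    parent[i] is the parent of node i, or -1 for a root. This encoding of
    the (supernode) elimination tree is an assumption of this sketch.
    Returns the order in which nodes were completed.
    """
    n = len(parent)
    if n == 0:
        return []

    pending = [0] * n                  # children not yet finished, per node
    for p in parent:
        if p >= 0:
            pending[p] += 1

    pool = queue.Queue()               # shared pool of ready tasks
    for node in range(n):
        if pending[node] == 0:         # the pool starts with all leaves
            pool.put(node)

    lock = threading.Lock()
    order = []

    def worker():
        while True:
            node = pool.get()          # ask the pool for a new task
            if node is None:           # sentinel: everything is finished
                return
            # Placeholder: the numerical work for this node would go here.
            with lock:
                order.append(node)
                finished = len(order) == n
                p = parent[node]
                ready = False
                if p >= 0:
                    pending[p] -= 1
                    ready = pending[p] == 0
            if ready:
                pool.put(p)            # last child done, parent is ready
            if finished:
                for _ in range(num_workers):
                    pool.put(None)     # wake every worker so it can exit

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return order
```

For example, `schedule_tree([2, 2, 4, 4, -1], num_workers=3)` returns some ordering of the five nodes in which nodes 0 and 1 precede node 2, and nodes 2 and 3 precede the root 4; the exact order depends on thread timing.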


A Parallel Interior-Point Algorithm for Linear Programming.. - Andersen, Andersen (1998)   (2 citations)

....speedup was obtained for the BLAS3 subroutines i.e. the matrix with matrix products. For the computationally cheap BLAS1 and BLAS2 operations, the speed up was insignificant due to the parallel overhead. The above approach is known as automatic parallelization of the Cholesky decomposition in [26], where it also is reported to give poor results. In summary it was necessary to modify the algorithm for computation of the Cholesky decomposition to achieve a significant speed up. The modified Cholesky decomposition is still partly based on the push Cholesky, because it leads to efficient use ....

E. Ng and B. W. Peyton. A supernodal Cholesky factorization algorithm for shared-memory multiprocessors. SIAM J. Sci. Statist. Comput., 14(4):761--769, 1993.


Developments and Trends in the Parallel Solution of Linear.. - Duff, van der Vorst (1999)   (1 citation)

....Cholesky factorizations, either the fan-in or the fan-out algorithm. The original codes just used a column-column formulation of the algorithm as in the fan-in algorithm of [94] but it was soon apparent that better efficiency could be obtained using a supernode-column fan-in approach as in [156]. Some of this early work on parallel algorithms for distributed memory computers is reviewed by Heath, Ng, and Peyton [117]. For distributed memory machines, processors can be assigned work corresponding to subtrees, but this requires quite balanced trees. A breadth-first search strategy can be ....

E. G. Ng and B. W. Peyton. A supernodal Cholesky factorization algorithm for shared-memory multiprocessors. SIAM J. Scientific Computing, 14:761--769, 1993.


Two-dimensional Block Partitionings for the.. - Dumitrescu.. (1997)   (3 citations)

....Thus, supernodes (groups of consecutive columns with the same row structure) were used instead of columns. The columns of a supernode can be factored together as for a dense matrix, allowing block operations. This approach, named block column (or 1D), was used, to give only a few examples, in [16] or [8] but goes back to [4] and even earlier. Since the supernode structure is specific to each matrix, there are two ways to adapt supernode sizes. Amalgamation as proposed by Ashcraft and Grimes [3] allows the grouping of several supernodes into a greater supernode, with the sacrifice of ....

E. Ng and B.W. Peyton. A Supernodal Cholesky Factorization Algorithm for Shared-memory Multiprocessors. SIAM J. Sci. Comput., 14(4):761--769, July 1993.
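
Illustrative note (not part of the cited papers): the notion of a supernode in the context above, a group of consecutive columns with the same row structure, can be sketched roughly as follows. This is a minimal sketch under one common convention, assumed here: column j+1 joins the supernode of column j when the structure of column j, with its own diagonal entry removed, equals the structure of column j+1. The function name `find_supernodes` and the list-of-row-indices input format are hypothetical.

```python
def find_supernodes(col_struct):
    """Group consecutive columns of a Cholesky factor L into supernodes.

    col_struct[j] lists the row indices of the nonzeros in column j of L,
    diagonal included (an input format assumed for this sketch).
    Returns a list of (first_column, last_column) pairs.
    """
    supernodes = []
    start = 0
    n = len(col_struct)
    for j in range(1, n + 1):
        # Column j continues the current supernode if its structure matches
        # the previous column's structure below that column's diagonal.
        same = j < n and set(col_struct[j - 1]) - {j - 1} == set(col_struct[j])
        if not same:
            supernodes.append((start, j - 1))
            start = j
    return supernodes


# Small 6-column example: columns 0-2 and columns 3-5 each form a supernode.
example = [[0, 1, 2, 5], [1, 2, 5], [2, 5], [3, 4, 5], [4, 5], [5]]
print(find_supernodes(example))  # prints [(0, 2), (3, 5)]
```

Within each returned range, the columns share one row structure, so they can be stored and updated as a dense block, which is what makes block operations possible.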


Implementation of Interior Point Methods for Large.. - Andersen, Gondzio.. (1996)   (45 citations)

.... the cache memory on several computer architectures [52]. The effect of the supernodal methods is highly hardware dependent and several results can be found in the literature: the efficiency of the supernodal decomposition on the shared memory multiprocessors is discussed by Esmond and Peyton [69], the exploitation of the cache memory on high performance workstations is studied by Rothberg and Gupta [72] in the framework of the right-looking factorization, while the case of the left-looking factorization was investigated by Mészáros [64]. Block Cholesky factorization. Another ....

E. Ng and B. W. Peyton. A supernodal Cholesky factorization algorithm for shared-memory multiprocessors. SIAM J. Sci. Statist. Comput., 14(4):761--769, 1993.


Sparse Numerical Linear Algebra: Direct Methods and Preconditioning - Duff (1996)   (9 citations)

.... sparse factorization. 3 Orderings for symmetric problems. 4 Solution of sets of sparse unsymmetric equations. 5 Solution of indefinite symmetric systems. 6 Sparse least squares. 7 Parallel computing. 8 Preconditioning. 9 Towards a sparse problem solving environment. 10 Concluding remarks. 1 Introduction. In common with many of my co-authors in this volume, my starting point was to read the review on this subject given at the last State of the Art meeting in Birmingham. The article Sparse Matrices was authored by John Reid, and it is perhaps significant that it covered both ....

....for iterative methods. Undoubtedly, the kernel that gets closest to peak performance on modern computers is a dense matrix matrix multiply. A standard version of this kernel is provided by subroutine GEMM in the Level 3 Basic Linear Algebra Subprograms or BLAS (Dongarra, Du Croz, Duff and Hammarling 1990) and is supported by many vendors of high performance computers. Most sparse direct codes use this and allied kernels to achieve high performance, and we discuss how they are able to do this in Section 2. Ten years ago, one might have been forgiven for thinking that the problem of ordering a ....

[Article contains additional citation context not shown here]

Ng, E. G. and Peyton, B. W. (1993b), `A supernodal Cholesky factorization algorithm for shared-memory multiprocessors', SIAM J. Scientific Computing 14, 761--769.


Sparse Gaussian Elimination on High Performance Computers - Li (1996)   (19 citations)

....DEC AlphaServer 8400 [46] and Cray C90 J90 [110, 111]. In addition to demonstrating the efficiency of our parallel algorithm on these machines, we also study the (theoretical) upper bound on performance of this algorithm. Several methods have been proposed to perform sparse Cholesky factorization [49, 73, 90] and sparse LU factorization [6, 57, 65] on shared memory machines. A common practice is to organize the program as a self-scheduling loop, interacting with a global pool of tasks that are ready to be executed. Each processor repeatedly takes a task from the pool, executes it, and puts new ready ....

Esmond G. Ng and Barry W. Peyton. A supernodal Cholesky factorization algorithm for shared-memory multiprocessors. SIAM J. Sci. Comput., 14(4):761--769, July 1993.


Multiple-Rank Modifications of a Sparse Cholesky Factorization - Davis, Hager (2001)

No context found.

E. G. Ng and B. W. Peyton, A supernodal Cholesky factorization algorithm for shared-memory multiprocessors, SIAM J. Sci. Comput., 14 (1993), pp. 761--769.


PARDISO: A High-Performance Serial and Parallel.. - Schenk, Gärtner.. (2000)

No context found.

E. Ng and B. Peyton, A supernodal Cholesky factorization algorithm for shared-memory multiprocessors, SIAM Journal on Scientific Computing, 14 (1993), pp. 761--769.


Task Scheduling Using a Block Dependency DAG for.. - Lee, Kim, Hong, Lee (2000)

No context found.

E. G. Ng and B. W. Peyton. A supernodal Cholesky factorization algorithm for shared-memory multiprocessors. SIAM J. Sci. Comput., 14(4):761--769, 1993.


CiteSeer.IST - Copyright Penn State and NEC