24 citations found.
John R. Gilbert and Robert Schreiber. Highly parallel sparse Cholesky factorization. SIAM Journal on Scientific and Statistical Computing, 13:1151--1172, 1992.

This paper is cited in the following contexts:
A Highly Scalable Parallel Algorithm for Sparse Matrix.. - Gupta, Karypis, Kumar (1995)   (39 citations)  (Correct)

....factorization used extensively in practice, their use for solving large sparse systems has been mostly confined to big vector supercomputers due to its high time and memory requirements. As a result, parallelization of sparse Cholesky factorization has been the subject of intensive research [26, 55, 12, 15, 14, 18, 54, 40, 41, 3, 49, 50, 57, 9, 28, 26, 27, 51, 2, 1, 44, 58, 16, 55, 43, 33, 5, 42, 4, 59]. We have developed highly scalable formulations of sparse Cholesky factorization that substantially improve the state of the art in parallel direct solution of sparse linear systems both in terms of scalability and overall performance. It is well known that dense matrix factorization can be ....

.... $\Theta(p^3)$ due to concurrency constraints because the algorithm cannot effectively utilize more than $O(\sqrt{N})$ processors for matrices arising from two dimensional constant node degree graphs. Recently, a number of schemes with two dimensional partitioning of the matrix have been proposed [18, 53, 52, 18, 54, 1, 44, 58, 16, 33, 42, 4, 59]. The least total communication volume in most of these schemes is $O(N\sqrt{p}\log p)$. Most researchers so far have analyzed parallel sparse matrix factorization in terms of the total communication volume. It is noteworthy that, on any parallel architecture, the total communication ....

John R. Gilbert and Robert Schreiber. Highly parallel sparse Cholesky factorization. SIAM Journal on Scientific and Statistical Computing, 13:1151--1172, 1992.


Scalable Parallel Algorithms for Solving Sparse Systems of Linear.. - Gupta   (Correct)

....due to their high time and memory requirements. Parallel processing offers the potential to tackle both these problems; however, despite intensive research, only limited success had been achieved until recently in developing scalable parallel formulations of sparse matrix factorization [22, 51, 13, 16, 15, 36, 37, 4, 46, 47, 53, 10, 25, 22, 23, 48, 2, 1, 39, 54, 17, 51, 38]. We have developed a highly parallel sparse Cholesky factorization algorithm that substantially improves the state of the art in parallel direct solution of sparse linear systems both in terms of scalability and overall performance. We show that our algorithm is just as scalable as dense matrix ....

....algorithm is still $\Theta(p^3)$ due to concurrency constraints because the algorithm cannot effectively utilize more than $O(\sqrt{N})$ processors for matrices arising from two dimensional constant node degree graphs. A few schemes with two dimensional partitioning of the matrix have been proposed [50, 49, 1, 39, 54, 17], and the total communication volume in the best of these schemes [50, 49] is $\Theta(N\sqrt{p}\log p)$. Most researchers so far have analyzed parallel sparse matrix factorization in terms of the total communication volume. It is noteworthy that, on any parallel architecture, the total communication ....

John R. Gilbert and Robert Schreiber. Highly parallel sparse Cholesky factorization. SIAM Journal on Scientific and Statistical Computing, 13:1151--1172, 1992.


Efficient sparse Cholesky factorization on a massively.. - Manne, Hafsteinsson (1994)   (2 citations)  (Correct)

....are mostly for hypercube architectures, such as the Intel iPSC [11], [26], [30]. Until recently the conventional wisdom has been that parallel computers based on the SIMD model are better suited for iterative rather than for direct algorithms for solving sparse linear systems. Gilbert and Schreiber [15] attempted to refute that belief by implementing a supernodal, multifrontal algorithm to compute the Cholesky factorization on the Connection Machine CM-2. Despite a disappointing Mflop rate they managed to show the feasibility of direct sparse methods on SIMD computers. Kratzer [19] demonstrated ....

J. R. Gilbert and R. Schreiber, Highly parallel sparse Cholesky factorization, SIAM J. Sci. Statist. Comput., 13 (1992), pp. 1151--1172.


A High Performance Sparse Cholesky Factorization Algorithm.. - Karypis, Kumar (1994)   (5 citations)  (Correct)

....it uses the highly scalable two dimensional grid partitioning for dense matrix factorization for each supernodal computation in the multifrontal algorithm. As a result, the communication overhead of this scheme is the lowest of all other known parallel formulations for sparse matrix factorization [24, 25, 1, 31, 32, 39, 8, 38, 17, 33, 37, 3, 6, 18, 15, 40, 26, 12, 36, 35]. In fact, asymptotically, the isoefficiency of this scheme is $O(p^{1.5})$ for sparse matrices arising out of two and three dimensional finite element problems on a wide variety of architectures such as hypercube, mesh, fat tree, and three dimensional torus. Note that the isoefficiency of the ....

John R. Gilbert and Robert Schreiber. Highly Parallel Sparse Cholesky Factorization. SIAM Journal on Scientific and Statistical Computing, 13:1151--1172, 1992.


A High Performance Sparse Cholesky Factorization Algorithm.. - Karypis, Kumar (1994)   (5 citations)  (Correct)

....it uses the highly scalable two dimensional grid partitioning for dense matrix factorization for each supernodal computation in the multifrontal algorithm. As a result, the communication overhead of this scheme is the lowest of all other known parallel formulations for sparse matrix factorization [1, 3, 6, 9, 12, 14, 24, 25, 26, 27, 28, 19, 29]. In fact, asymptotically, the isoefficiency of this scheme is $O(p^{1.5})$ for sparse matrices arising out of two and three-dimensional finite element problems on a wide variety of architectures such as hypercube, mesh, fat tree, and three-dimensional torus. Note that the isoefficiency of the best ....

John R. Gilbert and Robert Schreiber. Highly Parallel Sparse Cholesky Factorization. SIAM Journal on Scientific and Statistical Computing, 13:1151--1172, 1992.


Highly Scalable Parallel Algorithms for Sparse Matrix.. - Gupta, Karypis, Kumar (1995)   (39 citations)  (Correct)

....factorization used extensively in practice, their use for solving large sparse systems has been mostly confined to big vector supercomputers due to its high time and memory requirements. As a result, parallelization of sparse Cholesky factorization has been the subject of intensive research [27, 59, 12, 15, 14, 18, 58, 41, 42, 3, 53, 54, 61, 9, 29, 27, 28, 55, 2, 1, 45, 62, 16, 59, 44, 34, 5, 43, 4, 63]. We have developed highly scalable formulations of sparse Cholesky factorization that substantially improve the state of the art in parallel direct solution of sparse linear systems both in terms of scalability and overall performance. It is well known that dense matrix factorization can be ....

....is still $\Theta(p^3)$ due to concurrency constraints because the algorithm cannot effectively utilize more than $O(\sqrt{N})$ processors for matrices arising from two dimensional constant node degree graphs. Recently, a number of schemes with two dimensional partitioning of the matrix have been proposed [18, 57, 56, 18, 58, 1, 45, 62, 16, 34, 43, 4, 63]. The least total communication volume in most of these schemes is $O(N\sqrt{p}\log p)$. Most researchers so far have analyzed parallel sparse matrix factorization in terms of the total communication volume. It is noteworthy that, on any parallel architecture, the total communication volume ....

John R. Gilbert and Robert Schreiber. Highly parallel sparse Cholesky factorization. SIAM Journal on Scientific and Statistical Computing, 13:1151--1172, 1992.


Parallel Direct Solution of Large Sparse Systems in Finite Element.. - Lin (1993)   (5 citations)  (Correct)

....system, since elements of a frontal matrix have to be transferred and redistributed before and after the assembly. The multi frontal method is better suited for shared memory (super)computers with a small number of (vector) processors, but less suited to distributed memory architectures. In [6] Gilbert and Schreiber consider fine grain parallel factorization of large sparse matrices. In their scheme, the unit of operation to be distributed across the processors is a single multiplication or addition of two scalar numbers. This approach aims at maximally utilizing the parallelism ....

J.R. Gilbert and R. Schreiber, "Highly parallel sparse Cholesky factorization", SIAM J. Sci. Stat. Comput., Vol. 13, No. 5, 1992, pp.


Analysis and Design of Scalable Parallel Algorithms for Scientific .. - Gupta (1995)   (2 citations)  (Correct)

....due to their high time and memory requirements. Parallel processing offers the potential to tackle both these problems; however, despite intensive research, only limited success had been achieved until recently in developing scalable parallel formulations of sparse matrix factorization [63, 123, 44, 47, 46, 95, 96, 10, 114, 115, 127, 41, 69, 63, 64, 117, 8, 7, 104, 140, 49, 123, 103]. We have developed a highly parallel sparse Cholesky factorization algorithm that substantially improves the state of the art in parallel direct solution of sparse linear systems both in terms of scalability and overall performance. We show that our algorithm is just as scalable as dense matrix ....

....algorithm is still $\Theta(p^3)$ due to concurrency constraints because the algorithm cannot effectively utilize more than $O(\sqrt{N})$ processors for matrices arising from two dimensional constant node degree graphs. A few schemes with two dimensional partitioning of the matrix have been proposed [120, 119, 7, 104, 140, 49], and the total communication volume in the best of these schemes [120, 119] is $\Theta(N\sqrt{p}\log p)$. Most researchers so far have analyzed parallel sparse matrix factorization in terms of the total communication volume. It is noteworthy that, on any parallel architecture, the total ....

John R. Gilbert and Robert Schreiber. Highly parallel sparse Cholesky factorization. SIAM Journal on Scientific and Statistical Computing, 13:1151--1172, 1992.


A Scalable Parallel Algorithm for Sparse Matrix Factorization - Gupta, Kumar (1994)   (7 citations)  (Correct)

.... very poorly as the number of processors is increased [47, 46]. In [2] Ashcraft proposes a fan-both family of parallel Cholesky factorization algorithms that have a total communication volume of $O(N\sqrt{p}\log N)$. A few schemes with two dimensional partitioning of the matrix have been proposed [46, 45, 1, 37, 50, 18], and the total communication volume in the best of these schemes [46, 45] is $O(N\sqrt{p}\log p)$. Most researchers so far have analyzed parallel sparse matrix factorization in terms of the total communication volume. It is noteworthy that, on any parallel architecture, the total ....

John R. Gilbert and Robert Schreiber. Highly parallel sparse Cholesky factorization. SIAM Journal on Scientific and Statistical Computing, 13:1151--1172, 1992.


Distributed Sparse Gaussian Elimination And Orthogonal.. - Raghavan (1995)   (6 citations)  (Correct)

....low communication are desired in either case, there are substantial differences in the implementations. For SIMD machines, a data transfer ties up all processors even if only one is active, whereas, in MIMD machines a processor can transfer data while others are computing. Gilbert and Schreiber [21] and Kratzer [28, 29] have studied numeric factorization on SIMD machines. Gilbert and Schreiber develop a method to map to processors a set of dense matrix operations in a single level across the task tree of parallel multifrontal LU factorization for symmetric A. Kratzer [29] considers the same ....

....of the machine. Kratzer [28] also considers sparse orthogonal factorization using a column oriented approach where the machine is treated as a pipeline to restrict communication to nearest neighbors. This column oriented approach is substantially different from the earlier multifrontal approach [21, 29] which is conceptually similar to some methods for MIMD machines. Early research on sparse factorization on MIMD machines, reviewed by Heath et al. [23], considers column oriented methods for sparse Cholesky factorization. A key idea that resulted is the subtree-to-subcube map of George et al. ....

J. R. Gilbert and R. Schreiber, Highly parallel sparse Cholesky factorization, SIAM J. Sci. Stat. Comput., 13 (1992), pp. 1151--1172.


On The LU Factorization Of Sequences Of Identically Structured.. - Hadfield (1994)   (5 citations)  (Correct)

.... Cholesky factorization) are the fan-in methods [9, 117, 88]. These methods accumulate contributions to the updating of the active submatrix and send fewer but larger messages [9]. An increasingly popular approach to sparse matrix factorization are the symmetric-pattern, multifrontal methods [105, 76, 51, 121, 122, 77]. Large grain parallelism is available in these methods via independent subtrees in the assembly trees. However, as subtrees combine and parallelism at that level decreases, a switch is made to exploit parallelism at a finer grain within the factorization of a particular frontal matrix. This is ....

.... structure and within the distinct pivot steps has been found to be critical to the success of all of these methods [76, 51, 121]. Recently, fine grain parallelism has been investigated by Gilbert and Schreiber for a variety of sparse Cholesky methods on a SIMD distributed memory environment [77]. The multifrontal approach has demonstrated the greatest potential for exploiting this level of parallelism. A critical subissue in any parallel sparse matrix factorization routine is how to efficiently schedule the component tasks. This issue will be discussed in the next section. 2.7 ....

J. R. Gilbert and R. Schreiber. Highly parallel sparse Cholesky factorization. SIAM J. Sci. Comput., 13(5):1151--1172, 1992.


Task Scheduling Using Block Dependency DAG of.. - Lee, Kim, Hong, Lee (1999)   (Correct)

....simulation. Much work has been done on parallelizing the sparse Cholesky factorization for solving large sparse systems of linear equations. Consequently, several different approaches have been proposed for parallel sparse Cholesky factorization based on supernodal or multifrontal approaches [1, 2, 3, 4, 5]. Great advances have been made with multifrontal methods that organize the sparse matrix into a sequence of partial factorization on smaller, dense submatrices [1, 2, 3]. From the fact that multiple independent fronts can be processed independently, much successful work has been done using the ....

.... methods that organize the sparse matrix into a sequence of partial factorization on smaller, dense submatrices [1, 2, 3]. From the fact that multiple independent fronts can be processed independently, much successful work has been done using the supernodal multifrontal method in parallel systems [4, 5]. The recent advanced methods for sparse matrix factorization decompose the sparse matrix into sub-blocks and process nonzero blocks using Level 3 Basic Linear Algebra Subprograms (BLAS) [6, 7]. Such a 2-D decomposition is more scalable than a 1-D decomposition with an increased degree of ....

J. R. Gilbert and R. Schreiber, "Highly parallel sparse Cholesky factorization," SIAM J. Sci. Stat. Comput., vol. 13, pp. 1151--1172, Sept. 1992.


A Parallel Formulation of Interior Point Algorithms - Karypis, Gupta, Kumar (1994)   (10 citations)  (Correct)

.... factorization algorithm has been analyzed in [19]. It is shown there that, for matrices whose corresponding graphs are planar, the time spent by each processor for communication is $\Theta(m/\sqrt{p})$. This overhead is smaller than the overheads of other schemes for parallel Cholesky factorization [39, 40, 1, 54, 55, 62, 15, 14, 23, 20, 59, 65, 47, 18, 57]. The experimental results in [19] suggest that our scheme is superior to other existing schemes even for nonplanar graphs. But the communication overhead for nonplanar graphs should be somewhat higher than $\Theta(m/\sqrt{p})$. Hence $\Theta(m/\sqrt{p})$ can be taken as a lower bound. Recall from ....

J. R. Gilbert and R. Schreiber. Highly Parallel Sparse Cholesky Factorization. SIAM Journal on Scientific and Statistical Computing, 13:1151--1172, 1992.


Developments and Trends in the Parallel Solution of Linear.. - Duff, van der Vorst (1999)   (1 citation)  (Correct)

....thus getting a high degree of reuse of data and a performance similar to the Level 3 BLAS. We now examine some of the history of attempts to parallelize sparse direct codes. Historic survey of techniques to exploit parallelism: Early experiments on the massively parallel SIMD Connection Machine [103] were a little disappointing although they did indicate the possibility of using massive parallelism in sparse factorization. The experiments did show that a grid distributed multifrontal implementation substantially outperformed an algorithm based on a left looking approach that exploited the ....

J. R. Gilbert and R. Schreiber. Highly parallel sparse Cholesky factorization. SIAM J. Scientific and Statistical Computing, 13:1151--1172, 1992.


Parallel Direct Methods For Sparse Linear Systems - Heath (1997)   (1 citation)  (Correct)

....shared memory and distributed memory architectures. Early parallel implementations included those of (Duff, 1986), (Ashcraft, Grimes, Lewis, Peyton, and Simon, 1987), (Benner, Montry, and Weigand, 1987), and (Lucas, Blank, and Tieman, 1987). More recent parallel implementations include those of (Gilbert and Schreiber, 1992), (Heath and Raghavan, 1994a), (Conroy, Kratzer, and Lucas, 1994), (Rothberg, 1994), (Gupta and Kumar, 1994), and (Karypis and Kumar, 1994). To summarize, at the current state of the art, the principal ingredients in an efficient parallel algorithm for sparse Cholesky factorization are: an ....

Gilbert, J., and Schreiber, R., 1992. "Highly parallel sparse Cholesky factorization," SIAM J. Sci. Stat. Comput. 13, pp. 1151--1172.


Efficient sparse Cholesky factorization on a parallel SIMD.. - Manne, Hafsteinsson (1995)   (1 citation)  (Correct)

....are mostly for hypercube architectures, such as the Intel iPSC [11], [25], [29]. Until recently the conventional wisdom has been that parallel computers based on the SIMD model are better suited for iterative rather than for direct algorithms for solving sparse linear systems. Gilbert and Schreiber [15] attempted to refute that belief by implementing a supernodal, multifrontal algorithm to compute the Cholesky factorization on the Connection Machine CM-2. Despite a disappointing Mflop rate they managed to show the feasibility of direct sparse methods on SIMD computers. Kratzer [18] demonstrated ....

J. R. Gilbert and R. Schreiber, Highly parallel sparse Cholesky factorization, SIAM J. Sci. Statist. Comput., 13 (1992), pp. 1151--1172.


An Algorithm Implementing a New Torus-Like Mapping for Parallel.. - Cleary   (Correct)

....a column oriented algorithm that would lower the communication volume to $O(p^{1/2} k^2)$ but at the cost of increasing the memory requirement by a factor of $p^{1/2} / \log_2 k$. Most previous work considers only column methods, though there have been attempts to map the data in two dimensions [GS92, NP87, SL90, CK93]. However, none of these methods utilizes the sparse structure of the matrix in a systematic enough way to attain the volume savings achieved here. We observe that from the original volume of $O(p k^2 \log_2 k)$ use of the torus mapping alone as in [SL90] reduces the p term from p to p ....

J. Gilbert and R. Schreiber, Highly parallel sparse Cholesky factorization, To appear in SISSC (1992).


Sparse Numerical Linear Algebra: Direct Methods and Preconditioning - Duff (1996)   (9 citations)  (Correct)

.... original matrix is first permuted to block triangular form (Pothen and Fan 1990). A column ordering of the rectangular matrix A is obtained using a symmetric ordering on the structure of the normal equations, although the ordering can be obtained from A without requiring the formation of $A^T A$ (Gilbert et al. 1992). The column ordering can then be used to construct a computational tree to drive the numerical factorization, which is performed using a QR factorization (for example, Raghavan 1995a, Amestoy, Duff and Puglisi 1996, Matstoms 1995). This can be implemented using a supernodal or multifrontal ....

....the parallel processing. Johnson and Davis (1992) examine aspects of a parallel buddy memory system, while Amestoy and Duff (1993) discuss and compare several approaches and recommend a hybrid scheme with different memory management in different subregions of storage. The original experiments of Gilbert and Schreiber (1992) on the massively parallel SIMD Connection machine were a little disappointing although they did indicate the possibility of using massive parallelism in sparse factorization. The experiments did show that a grid distributed multifrontal implementation substantially outperformed a Router ....

[Article contains additional citation context not shown here]

Gilbert, J. R. and Schreiber, R. (1992), `Highly parallel sparse Cholesky factorization', SIAM J. Scientific and Statistical Computing 13, 1151--1172.


Spectral Nested Dissection - Pothen, Simon, Wang (1992)   (8 citations)  (Correct)

....graph (viewed as a rooted tree with an appropriate choice of a root clique). In [16] a more restrictive definition of a clique tree is given, which will not be repeated here. Clique trees have been applied in various contexts in sparse matrix algorithms: in addition to the above paper, see [4, 13, 26, 27]. 3. The spectral nested dissection algorithm. In this section, we first describe a spectral algorithm to compute vertex separators that was designed in [25] and then discuss how the separator algorithm can be recursively used to compute elimination orderings. The vertex separator algorithm is ....

J. R. Gilbert and R. Schreiber, Highly parallel sparse Cholesky factorization, Tech. Rep. CSL-90-7, Xerox Palo Alto Research Center, Palo Alto, CA, 1990.


Parallel Numerical Linear Algebra - Demmel, Heath, van der Vorst (1993)   (53 citations)  (Correct)

....of the front matrices is relatively complicated. As a consequence, multifrontal methods are difficult to specify succinctly, so we will not attempt to do so here, but note that multifrontal methods have been implemented for both shared memory (e.g. [19, 68]) and distributed memory (e.g. [94, 140]) parallel computers, and are among the most effective methods known for sparse factorization in all types of computational environments. For a unified description and comparison of parallel fan-in, fan-out, and multifrontal methods, see [7]. In this brief section on parallel direct methods for ....

J. Gilbert and R. Schreiber. Highly parallel sparse Cholesky factorization. SIAM J. Sci. Stat. Comput., 13:1151--1172, 1992.


Improved Load Distribution in Parallel Sparse Cholesky.. - Rothberg, Schreiber (1994)   (40 citations)  Self-citation (Schreiber)   (Correct)

....block on the diagonal, and an identical set of non zeroes for each column below the diagonal block. Supernodes arise in any sparse factor, and they are typically quite large. The regularity in the sparse matrix captured by this supernodal structure can be exploited for a variety of purposes [3, 13, 17, 19]. We exploit it to simplify the internal non zero structure of blocks of the matrix. Specifically, we create block columns whose member columns belong to the same supernode. As a result, we obtain blocks whose rows are either completely zero or are dense. This regular structure allows the block ....

Gilbert, J., and Schreiber, R., "Highly parallel sparse Cholesky factorization", SIAM Journal on Scientific and Statistical Computing, 13: 1151-1172, 1992.


Highly Scalable Parallel Algorithms for Sparse Matrix.. - Gupta, Karypis, Kumar (1995)   (39 citations)  (Correct)

No context found.

John R. Gilbert and Robert Schreiber. Highly parallel sparse Cholesky factorization. SIAM Journal on Scientific and Statistical Computing, 13:1151--1172, 1992.


WSMP: A High-Performance Shared- and Distributed-Memory.. - Gupta, Joshi (2001)   (Correct)

No context found.

John R. Gilbert and Robert Schreiber. Highly parallel sparse Cholesky factorization. SIAM Journal on Scientific and Statistical Computing, 13:1151--1172, 1992.


WSMP: A High-Performance Shared- and Distributed-Memory.. - Gupta, Joshi (2001)   (Correct)

No context found.

John R. Gilbert and Robert Schreiber. Highly parallel sparse Cholesky factorization. SIAM Journal on Scientific and Statistical Computing, 13:1151--1172, 1992.

CiteSeer.IST - Copyright Penn State and NEC