15 citations found.
A. George, J. W.-H. Liu, and E. G.-Y. Ng. Communication reduction in parallel sparse Cholesky factorization on a hypercube. In M. T. Heath, editor, Hypercube Multiprocessors 1987.

This paper is cited in the following contexts:
Graph Partitioning Based Sparse Matrix Orderings for.. - Gupta (1996)   (1 citation)

....which WGPP produced orderings significantly better than the best MD based heuristics. In addition, graph bisection yields two submatrices of the original matrix that can be factored independently in parallel. This is the basis of many efficient parallel algorithms for sparse matrix factorization [16, 21, 11]. A key step in our ordering algorithm is finding a small node bisector of a graph. Recent research [3, 18, 22, 14] has shown multilevel algorithms to be fast and effective in computing graph partitions. A typical multilevel graph partitioning algorithm has four components, namely coarsening, ....

Alan George, Joseph W.-H. Liu, and Esmond G.-Y. Ng. Communication reduction in parallel sparse Cholesky factorization on a hypercube. In M. T. Heath, editor, Hypercube Multiprocessors 1987.


Molecular Structure Computation from Multiple Data Sources - Chen (2000)

....be derived using information computed in the preprocessing steps. Consequently, most partitioning methods discussed in the literature are based on static analysis of the input matrix. An early processor mapping algorithm which attempts to reduce inter process communication is subtree to subcube [34, 36], which works well for balanced tree topologies. Later research efforts have improved upon the load balancing aspects for more general elimination trees [29, 32, 50, 55, 75, 76, 81]. Our static partitioning heuristic shares similarities with several of these methods. In less predictable ....

A. George, J. Liu, and E. Ng, "Communication Reduction in Parallel Sparse Cholesky Factorization on a Hypercube", pp. 576-586, in Proc. Second Conference on Hypercube Multiprocessors, Knoxville, TN, 1986.


Parallel Algorithms for Forward and Back Substitution in.. - Gupta, Kumar (1995)   (1 citation)

....a two dimensional partitioning among groups of processors. However, this distribution is not suitable for the triangular solvers, which are scalable only with a one dimensional partitioning of the supernodal blocks of L. We show that if the supernodes are distributed in a subtree to subcube manner [2], then the cost of converting the two dimensional distribution to a one dimensional distribution is only a constant times the cost of solving the triangular systems. From our experiments, we observed that this constant is fairly small on the Cray T3D, at most 0.9 for a single right hand side ....

....and fill ins are denoted by the symbol # . 2.1 Forward elimination The basic approach to forward elimination is very similar to that of multifrontal numerical factorization [12] guided by an elimination tree [13, 8] with the distribution of computation determined by a subtree to subcube mapping [2]. A symmetric sparse matrix, its lower triangular Cholesky factor, and the corresponding elimination tree with subtree to subcube mapping onto 8 processors is shown in Figure 1. The computation in forward elimination starts with the leaf supernodes of the elimination tree and progresses upwards to ....

A. George, J. W.-H. Liu, and E. G.-Y. Ng. Communication reduction in parallel sparse Cholesky factorization on a hypercube. In M. T. Heath, editor, Hypercube Multiprocessors


A Highly Scalable Parallel Algorithm for Sparse Matrix.. - Gupta, Karypis, Kumar (1995)   (39 citations)

....factorization used extensively in practice, their use for solving large sparse systems has been mostly confined to big vector supercomputers due to its high time and memory requirements. As a result, parallelization of sparse Cholesky factorization has been the subject of intensive research [26, 55, 12, 15, 14, 18, 54, 40, 41, 3, 49, 50, 57, 9, 28, 26, 27, 51, 2, 1, 44, 58, 16, 55, 43, 33, 5, 42, 4, 59]. We have developed highly scalable formulations of sparse Cholesky factorization that substantially improve the state of the art in parallel direct solution of sparse linear systems both in terms of scalability and overall performance. It is well known that dense matrix factorization can be ....

....Box D represents our algorithm, which is a significant improvement over other known classes of algorithms for this problem. .... The communication volume of the column based schemes represented in box A has been improved using smarter ways of mapping the matrix columns onto processors, such as, the subtree to subcube mapping [14] (box B). A number of column based parallel factorization algorithms [40, 41, 3, 49, 50, 57, 12, 9, 28, 26, 55, 43, 5] have a lower bound of O(Np) on the total communication volume [15]. Since the overall computation is only O(N^1.5) [13], the ratio of communication to computation of ....

Alan George, Joseph W.-H. Liu, and Esmond G.-Y. Ng. Communication reduction in parallel sparse Cholesky factorization on a hypercube. In M. T. Heath, editor, Hypercube Multiprocessors


A Reordering and Mapping Algorithm for Parallel.. - Kumar, Eswar.. (1994)   (1 citation)

....algorithm [8] is used to generate a low fill ordering. An equivalent ordering is then produced that preserves the fill structure, but is more suitable for parallel factorization [12, 14, 17, 18]. Mapping algorithms attempt to map the nodes in an elimination tree to a finite number of processors [2, 4, 9, 20]. Independent subtrees of the elimination tree correspond to independent subproblems that can be solved in parallel without incurring any communication between them. The objective of the mapping algorithms is to attempt to achieve load balance and minimize communication costs. Previous approaches ....

....same number of operations to compute the Cholesky factor. In early work on the mapping problem, nodes were ordered level by level in the elimination tree and mapped cyclically onto the processors. This resulted in good load balancing, but also in unnecessarily high communication. George et al. [9] introduced the subtree to subcube mapping that is effective in reducing communication while maintaining good load balance for model grid problems. Geist et al. proposed the bin packing algorithm for mapping arbitrary trees [4]. Pothen et al. [20] introduced the proportional mapping algorithm that ....

A. George, J.W.-H. Liu, and E.G.-Y. Ng. Communication Reduction in Parallel Sparse Cholesky Factorization on a Hypercube. In Hypercube Multiprocessors, pages 576--586, 1987.
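
The subtree to subcube idea mentioned in the excerpts above can be illustrated with a minimal sketch. This is not code from any of the cited papers. It assumes a binary elimination tree stored as a plain dictionary, a processor count that is a power of two, and a hypothetical function name `subtree_to_subcube`. The root is shared by all processors, each of its two subtrees receives half of them, and the split repeats until a subtree is owned by a single processor. Nodes that do not have exactly two children simply pass their processor group down unchanged, which is a simplification.

```python
def subtree_to_subcube(children, root, procs):
    """Assign a processor group to every node of an elimination tree.

    children: dict mapping a node to the list of its children (hypothetical layout)
    root:     root node of the tree
    procs:    sequence of processor ids, ideally a power of two in length
    """
    mapping = {}
    stack = [(root, tuple(procs))]
    while stack:
        node, group = stack.pop()
        mapping[node] = group
        kids = children.get(node, [])
        if len(kids) == 2 and len(group) > 1:
            # Two independent subtrees: give each one half of the group (a subcube).
            half = len(group) // 2
            stack.append((kids[0], group[:half]))
            stack.append((kids[1], group[half:]))
        else:
            # Single processor, chain node, or leaf: children inherit the group unchanged.
            for kid in kids:
                stack.append((kid, group))
    return mapping


# Balanced 7-node tree: root 6, internal nodes 2 and 5, leaves 0, 1, 3, 4.
tree = {6: [2, 5], 2: [0, 1], 5: [3, 4]}
mapping = subtree_to_subcube(tree, root=6, procs=range(4))
for node in sorted(mapping):
    print(node, mapping[node])
```

In this example the two subtrees below the root are owned by disjoint processor groups, so work inside them needs no communication between the groups, which is the property the excerpts attribute to the mapping.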


Scalable Parallel Algorithms for Solving Sparse Systems of Linear.. - Gupta

....due to their high time and memory requirements. Parallel processing offers the potential to tackle both these problems; however, despite intensive research, only limited success had been achieved until recently in developing scalable parallel formulations of sparse matrix factorization [22, 51, 13, 16, 15, 36, 37, 4, 46, 47, 53, 10, 25, 22, 23, 48, 2, 1, 39, 54, 17, 51, 38]. We have developed a highly parallel sparse Cholesky factorization algorithm that substantially improves the state of the art in parallel direct solution of sparse linear systems both in terms of scalability and overall performance. We show that our algorithm is just as scalable as dense matrix ....

.... of this type on p processors results in an #(Np log N) total communication volume [16] (box A). The communication volume of the column based schemes represented in box A has been improved using smarter ways of mapping the matrix columns onto processors, such as, the subtree to subcube mapping [15] (box B). A number of column based parallel factorization algorithms [36, 37, 4, 46, 47, 53, 13, 16, 10, 25, 22, 23, 48, 51, 38] have a lower bound of #(Np) on the total communication volume [16]. Since the overall computation is only #(N^1.5) [14], the ratio of communication to computation of ....

[Article contains additional citation context not shown here]

A. George, J. W.-H. Liu, and E. G.-Y. Ng. Communication reduction in parallel sparse Cholesky factorization on a hypercube. In M. T. Heath, editor, Hypercube Multiprocessors 1987, pages 576--586. SIAM, Philadelphia, PA, 1987.


Scheduling Strategies For Sparse Cholesky Factorization On.. - Hahad, Erhel, Priol (1994)

....and so, a large grain model can be used. For distributed memory machines, the first experiments on the Fan In algorithm [1] were carried out on matrices arising from nine point finite difference operators on rectangular grids. The nested dissection ordering [6] and the subtree to subcube mapping [7] are used, achieving low fill in and good load balance for the problems under consideration. Thus, the performances obtained are better than for the distributed Fan Out algorithm. Indeed, the latter is used to solve finite elements problems arising from L shaped triangular meshes. A simple ....

Alan GEORGE, Joseph LIU, and Esmond NG. Communication reduction in parallel sparse cholesky factorization on a hypercube. pages 576--586, Hypercube, 1987.


A New Approach to Parallel Sparse Cholesky Factorization on .. - Hahad, Erhel, Priol (1993)

....and so, a large grain model can be used. For distributed memory machines, the first experiments on the Fan In algorithm [1] were carried out on matrices arising from nine point finite difference operators on rectangular grids. The nested dissection ordering [6] and the subtree to subcube mapping [7] are used, achieving low fill in and good load balance for the problems under consideration. Thus, the performances obtained are better than for the distributed Fan Out algorithm. Indeed, the latter is used to solve finite elements problems arising from L shaped triangular meshes. A simple ....

Alan GEORGE, Joseph LIU, and Esmond NG. Communication reduction in parallel sparse cholesky factorization on a hypercube. pages 576--586, Hypercube 1987.


Highly Scalable Parallel Algorithms for Sparse Matrix.. - Gupta, Karypis, Kumar (1995)   (39 citations)

....factorization used extensively in practice, their use for solving large sparse systems has been mostly confined to big vector supercomputers due to its high time and memory requirements. As a result, parallelization of sparse Cholesky factorization has been the subject of intensive research [27, 59, 12, 15, 14, 18, 58, 41, 42, 3, 53, 54, 61, 9, 29, 27, 28, 55, 2, 1, 45, 62, 16, 59, 44, 34, 5, 43, 4, 63]. We have developed highly scalable formulations of sparse Cholesky factorization that substantially improve the state of the art in parallel direct solution of sparse linear systems both in terms of scalability and overall performance. It is well known that dense matrix factorization can be ....

.... of this type on p processors results in an O(Np log N) total communication volume [15] (box A). The communication volume of the column based schemes represented in box A has been improved using smarter ways of mapping the matrix columns onto processors, such as, the subtree to subcube mapping [14] (box B). A number of column based parallel factorization algorithms [41, 42, 3, 53, 54, 61, 12, 9, 29, 27, 59, 44, 5] .... 1 In [48] Pan and Reif describe a parallel sparse matrix factorization algorithm for a PRAM type architecture. This algorithm is not cost optimal (i.e. the processor time ....

A. George, J. W.-H. Liu, and E. G.-Y. Ng. Communication reduction in parallel sparse Cholesky factorization on a hypercube. In M. T. Heath, editor, Hypercube Multiprocessors 1987, pages 576--586. SIAM, Philadelphia, PA, 1987.


Analysis and Design of Scalable Parallel Algorithms for Scientific .. - Gupta (1995)   (2 citations)

....due to their high time and memory requirements. Parallel processing offers the potential to tackle both these problems; however, despite intensive research, only limited success had been achieved until recently in developing scalable parallel formulations of sparse matrix factorization [63, 123, 44, 47, 46, 95, 96, 10, 114, 115, 127, 41, 69, 63, 64, 117, 8, 7, 104, 140, 49, 123, 103]. We have developed a highly parallel sparse Cholesky factorization algorithm that substantially improves the state of the art in parallel direct solution of sparse linear systems both in terms of scalability and overall performance. We show that our algorithm is just as scalable as dense matrix ....

....overview of the performance and scalability of parallel algorithms for factorization of sparse matrices resulting from two dimensional N node grid graphs. Box D represents our algorithm, which is a significant improvement over other known classes of algorithms for this problem. .... to subcube mapping [46] (box B). A number of column based parallel factorization algorithms [95, 96, 10, 114, 115, 127, 44, 47, 41, 69, 63, 64, 117, 123, 103] have a lower bound of #(Np) on the total communication volume [47]. Since the overall computation is only #(N^1.5) [45], the ratio of communication to ....

[Article contains additional citation context not shown here]

A. George, J. W.-H. Liu, and E. G.-Y. Ng. Communication reduction in parallel sparse Cholesky factorization on a hypercube. In M. T. Heath, editor, Hypercube Multiprocessors 1987, pages 576--586. SIAM, Philadelphia, PA, 1987.


A Scalable Parallel Algorithm for Sparse Matrix Factorization - Gupta, Kumar (1994)   (7 citations)

.... of this type on p processors results in an O(Np log N) total communication volume [17] (box A). The communication volume of the column based schemes represented in box A has been improved using smarter ways of mapping the matrix columns onto processors, such as, the subtree to subcube mapping [16] (box B). A number of column based parallel factorization algorithms [32, 33, 4, 43, 44, 49, 14, 11, 24, 23, 47, 36] have a lower bound of O(Np) on the total communication volume. Since the overall computation is only O(N^1.5) [15], the ratio of communication to computation of column based ....

.... of rows and columns of the matrix among the processors must follow some sort of cyclic order to ensure proper load balance, and (3) communication must be localized among as few processors as possible at every stage of elimination by following a subtree to subcube type work distribution strategy [16]. Ours is the only implementation we know of that satisfies all of the above conditions. Through a preliminary implementation on nCUBE2, we have demonstrated the feasibility of using highly parallel computers for numerical factorization of sparse matrices. In [25] we have applied our algorithm to ....

A. George, J. W.-H. Liu, and E. G.-Y. Ng. Communication reduction in parallel sparse Cholesky factorization on a hypercube. In M. T. Heath, editor, Hypercube Multiprocessors 1987, pages 576--586. SIAM, Philadelphia, PA, 1987.


Parallel Algorithms for Forward Elimination and Backward.. - Gupta, Kumar (1995)   (6 citations)

....two dimensional partitioning among groups of processors. However, this distribution is not suitable for the triangular solvers, which are scalable only with a one dimensional partitioning of the supernodal blocks of L. We show that if the supernodes are distributed in a subtree to subcube manner [2], then the cost of converting the two dimensional distribution to a one dimensional distribution is only a constant times the cost of solving the triangular systems. From our experiments, we observed that this constant is fairly small on the Cray T3D, at most 0.9 for a single righthand side vector ....

....by the corresponding matrix operations. 2.1 Forward elimination The basic approach to forward elimination is very similar to that of multifrontal numerical factorization [12] guided by an elimination tree [13, 8] with the distribution of computation determined by a subtree to subcube mapping [2]. A symmetric sparse matrix, its lower triangular Cholesky factor, and the corresponding elimination tree with subtree to subcube mapping onto 8 processors is shown in Figure 1. The computation in forward elimination starts with the leaf supernodes of the elimination tree and progresses upwards to ....

A. George, J. W.-H. Liu, and E. G.-Y. Ng. Communication reduction in parallel sparse Cholesky factorization on a hypercube. In M. T. Heath, editor, Hypercube Multiprocessors 1987, pages 576--586. SIAM, Philadelphia, PA, 1987.


Fast and Effective Algorithms for Graph Partitioning and Sparse.. - Gupta (1996)   (30 citations)

....works quite well in practice even for matrices with arbitrary sparsity patterns. In addition, graph bisection yields two submatrices of the original matrix that can be factored independently in parallel. This is the basis of many efficient parallel algorithms for sparse matrix factorization [15, 16, 28]. 4.1 Graph bisection A key step in our ordering algorithm is finding a small node bisector of a graph. This can be accomplished by the heuristics described in Section 3 with some modifications to the coarsening and refinement strategies. The heavy and heaviest edge coarsening strategies ....

Alan George, Joseph W.-H. Liu, and Esmond G.-Y. Ng. Communication reduction in parallel sparse Cholesky factorization on a hypercube. In M. T. Heath, editor, Hypercube Multiprocessors 1987, pages 576--586. SIAM, Philadelphia, PA, 1987.


Highly Scalable Parallel Algorithms for Sparse Matrix.. - Gupta, Karypis, Kumar (1995)   (39 citations)

No context found.

A. George, J. W.-H. Liu, and E. G.-Y. Ng. Communication reduction in parallel sparse Cholesky factorization on a hypercube. In M. T. Heath, editor, Hypercube Multiprocessors 1987.


A Clustering Algorithm For Parallel Sparse Cholesky.. - Kumar, Eswar..

No context found.

A. George, J.W.-H. Liu, and E.G.-Y. Ng. Communication reduction in parallel sparse cholesky factorization on a hypercube. In Hypercube Multiprocessors, pages 576--586, 1987.

CiteSeer.IST - Copyright Penn State and NEC