| Alex Pothen and Chunguang Sun. Distributed multifrontal factorization using clique trees. In Jack Dongarra, Ken Kennedy, Paul Messina, Danny C. Sorensen, and Robert G. Voigt, editors, Proceedings of the Fifth SIAM Conference on Parallel Processing for Scientific Computing, Houston, Texas, USA, March 25-27, 1991. |
....of the input matrix. An early processor-mapping algorithm that attempts to reduce interprocess communication is subtree-to-subcube mapping [34, 36], which works well for balanced tree topologies. Later research efforts have improved upon the load-balancing aspects for more general elimination trees [29, 32, 50, 55, 75, 76, 81]. Our static partitioning heuristic shares similarities with several of these methods. In less predictable environments than a dedicated homogeneous system, or even in less efficient communication architectures where contention induces large communication imbalances (e.g. software shared memory ....
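The subtree-to-subcube idea mentioned above can be sketched in a few lines of Python. This is a minimal illustration, not the method of [34, 36] or of any cited paper: it assumes a binary elimination tree (as produced by nested dissection) and, for the "subcube" interpretation, a power-of-two processor count numbered so that each contiguous half is a subcube of a hypercube. The names `Node` and `subtree_to_subcube` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """A node of a hypothetical elimination tree."""
    name: str
    children: List["Node"] = field(default_factory=list)


def subtree_to_subcube(node: Node, procs: List[int],
                       mapping: Optional[Dict[str, List[int]]] = None) -> Dict[str, List[int]]:
    """Assign each tree node the processor group that works on it.

    The root gets all processors; at every binary split the group is halved
    and each half recursively handles one child subtree. Once a group has a
    single processor, that processor owns the whole remaining subtree.
    """
    if mapping is None:
        mapping = {}
    mapping[node.name] = procs
    if not node.children:
        return mapping
    if len(procs) == 1 or len(node.children) == 1:
        # Nothing to split: the child subtree(s) inherit the same group.
        for child in node.children:
            subtree_to_subcube(child, procs, mapping)
        return mapping
    if len(node.children) != 2:
        raise ValueError("this sketch assumes a binary elimination tree")
    half = len(procs) // 2
    left, right = node.children
    subtree_to_subcube(left, procs[:half], mapping)   # lower subcube
    subtree_to_subcube(right, procs[half:], mapping)  # upper subcube
    return mapping


# A balanced 4-leaf tree, as produced by two levels of nested dissection.
tree = Node("root", [
    Node("s1", [Node("a"), Node("b")]),
    Node("s2", [Node("c"), Node("d")]),
])
m = subtree_to_subcube(tree, [0, 1, 2, 3])
print(m["root"], m["s1"], m["c"])  # [0, 1, 2, 3] [0, 1] [2]
```

Because each split simply halves the processor group regardless of subtree size, the load is balanced only when sibling subtrees carry similar work, which is why the scheme suits balanced trees.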
A. Pothen and C. Sun, "Distributed Multifrontal Factorization Using Clique Trees", pp. 34--40, in Proc. Fifth SIAM Conference on Parallel Processing for Scientific Computing, Houston, TX, 1991.
....factorization used extensively in practice, their use for solving large sparse systems has been mostly confined to big vector supercomputers due to its high time and memory requirements. As a result, parallelization of sparse Cholesky factorization has been the subject of intensive research [26, 55, 12, 15, 14, 18, 54, 40, 41, 3, 49, 50, 57, 9, 28, 26, 27, 51, 2, 1, 44, 58, 16, 55, 43, 33, 5, 42, 4, 59]. We have developed highly scalable formulations of sparse Cholesky factorization that substantially improve the state of the art in parallel direct solution of sparse linear systems both in terms of scalability and overall performance. It is well known that dense matrix factorization can be ....
....other known classes of algorithms for this problem. [The communication volume of] the column-based schemes represented in box A has been improved using smarter ways of mapping the matrix columns onto processors, such as the subtree-to-subcube mapping [14] (box B). A number of column-based parallel factorization algorithms [40, 41, 3, 49, 50, 57, 12, 9, 28, 26, 55, 43, 5] have a lower bound of $O(Np)$ on the total communication volume [15]. Since the overall computation is only $O(N^{1.5})$ [13], the ratio of communication to computation of column-based schemes is quite high. As a result, these column-based schemes scale very poorly as the number of processors is ....
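The high ratio follows directly from the two bounds quoted above. As a worked sketch, assuming the total communication volume is $\Theta(Np)$, the serial work is $W = \Theta(N^{1.5})$ (two-dimensional $N$-node grid), and communication volume is the dominant parallel overhead $T_o$ (message start-up costs ignored):

```latex
\frac{\text{communication}}{\text{computation}}
  = \frac{\Theta(Np)}{\Theta(N^{1.5})}
  = \Theta\!\left(\frac{p}{\sqrt{N}}\right),
\qquad
W \propto T_o:\; N^{1.5} \propto Np
  \;\Longrightarrow\; N \propto p^{2}
  \;\Longrightarrow\; W \propto p^{3}.
```

Under these assumptions the work must grow at least as fast as $p^3$ to hold efficiency constant, so for a fixed problem the ratio, and with it the efficiency loss, rises quickly as processors are added.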
Alex Pothen and Chunguang Sun. Distributed multifrontal factorization using clique trees. In Proceedings of the Fifth SIAM Conference on Parallel Processing for Scientific Computing, pages 34--40, 1991.
....due to their high time and memory requirements. Parallel processing offers the potential to tackle both these problems; however, despite intensive research, only limited success had been achieved until recently in developing scalable parallel formulations of sparse matrix factorization [22, 51, 13, 16, 15, 36, 37, 4, 46, 47, 53, 10, 25, 22, 23, 48, 2, 1, 39, 54, 17, 51, 38]. We have developed a highly parallel sparse Cholesky factorization algorithm that substantially improves the state of the art in parallel direct solution of sparse linear systems both in terms of scalability and overall performance. We show that our algorithm is just as scalable as dense matrix ....
.... volume [16] (box A). The communication volume of the column-based schemes represented in box A has been improved using smarter ways of mapping the matrix columns onto processors, such as the subtree-to-subcube mapping [15] (box B). A number of column-based parallel factorization algorithms [36, 37, 4, 46, 47, 53, 13, 16, 10, 25, 22, 23, 48, 51, 38] have a lower bound of $\Theta(Np)$ on the total communication volume [16]. Since the overall computation is only $\Theta(N^{1.5})$ [14], the ratio of communication to computation of column-based schemes is quite high. As a result, these column-based schemes scale very poorly as the number of processors is ....
[Article contains additional citation context not shown here]
Alex Pothen and Chunguang Sun. Distributed multifrontal factorization using clique trees. In Proceedings of the Fifth SIAM Conference on Parallel Processing for Scientific Computing, pages 34--40, 1991.
....it uses the highly scalable two-dimensional grid partitioning for dense matrix factorization for each supernodal computation in the multifrontal algorithm. As a result, the communication overhead of this scheme is lower than that of all other known parallel formulations for sparse matrix factorization [24, 25, 1, 31, 32, 39, 8, 38, 17, 33, 37, 3, 6, 18, 15, 40, 26, 12, 36, 35]. In fact, asymptotically, the isoefficiency of this scheme is $O(p^{1.5})$ for sparse matrices arising out of two- and three-dimensional finite element problems on a wide variety of architectures such as hypercube, mesh, fat tree, and three-dimensional torus. Note that the isoefficiency of the ....
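A worked check of the $O(p^{1.5})$ figure for the two-dimensional case, offered as a sketch: it assumes (this is not stated in the excerpt above) that the total communication overhead of the scheme on an $N$-node two-dimensional grid problem is $T_o = \Theta(N\sqrt{p})$, and that the serial work is $W = \Theta(N^{1.5})$. Constant efficiency requires $W$ to grow in proportion to $T_o$:

```latex
W \propto T_o:\quad N^{1.5} \propto N\sqrt{p}
\;\Longrightarrow\; \sqrt{N} \propto \sqrt{p}
\;\Longrightarrow\; N \propto p
\;\Longrightarrow\; W = \Theta(N^{1.5}) \propto p^{1.5}.
```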
Alex Pothen and Chunguang Sun. Distributed multifrontal factorization using clique trees. In Proceedings of the Fifth SIAM Conference on Parallel Processing for Scientific Computing, pages 34--40, 1991.
....it uses the highly scalable two-dimensional grid partitioning for dense matrix factorization for each supernodal computation in the multifrontal algorithm. As a result, the communication overhead of this scheme is lower than that of all other known parallel formulations for sparse matrix factorization [1, 3, 6, 9, 12, 14, 24, 25, 26, 27, 28, 19, 29]. In fact, asymptotically, the isoefficiency of this scheme is $O(p^{1.5})$ for sparse matrices arising out of two- and three-dimensional finite element problems on a wide variety of architectures such as hypercube, mesh, fat tree, and three-dimensional torus. Note that the isoefficiency of the best ....
Alex Pothen and Chunguang Sun. Distributed multifrontal factorization using clique trees. In Proceedings of the Fifth SIAM Conference on Parallel Processing for Scientific Computing, pages 34--40, 1991.
....factorization used extensively in practice, their use for solving large sparse systems has been mostly confined to big vector supercomputers due to its high time and memory requirements. As a result, parallelization of sparse Cholesky factorization has been the subject of intensive research [27, 59, 12, 15, 14, 18, 58, 41, 42, 3, 53, 54, 61, 9, 29, 27, 28, 55, 2, 1, 45, 62, 16, 59, 44, 34, 5, 43, 4, 63]. We have developed highly scalable formulations of sparse Cholesky factorization that substantially improve the state of the art in parallel direct solution of sparse linear systems both in terms of scalability and overall performance. It is well known that dense matrix factorization can be ....
.... volume [15] (box A). The communication volume of the column-based schemes represented in box A has been improved using smarter ways of mapping the matrix columns onto processors, such as the subtree-to-subcube mapping [14] (box B). A number of column-based parallel factorization algorithms [41, 42, 3, 53, 54, 61, 12, 9, 29, 27, 59, 44, 5] ....
Footnote 1: In [48], Pan and Reif describe a parallel sparse matrix factorization algorithm for a PRAM-type architecture. This algorithm is not cost-optimal (i.e., the processor-time product exceeds the serial complexity of sparse matrix factorization) and is not included in the classification given in ....
Alex Pothen and Chunguang Sun. Distributed multifrontal factorization using clique trees. In Proceedings of the Fifth SIAM Conference on Parallel Processing for Scientific Computing, pages 34--40, 1991.
....due to their high time and memory requirements. Parallel processing offers the potential to tackle both these problems; however, despite intensive research, only limited success had been achieved until recently in developing scalable parallel formulations of sparse matrix factorization [63, 123, 44, 47, 46, 95, 96, 10, 114, 115, 127, 41, 69, 63, 64, 117, 8, 7, 104, 140, 49, 123, 103]. We have developed a highly parallel sparse Cholesky factorization algorithm that substantially improves the state of the art in parallel direct solution of sparse linear systems both in terms of scalability and overall performance. We show that our algorithm is just as scalable as dense matrix ....
....of sparse matrices resulting from two-dimensional $N$-node grid graphs. Box D represents our algorithm, which is a significant improvement over other known classes of algorithms for this problem. .... [subtree-]to-subcube mapping [46] (box B). A number of column-based parallel factorization algorithms [95, 96, 10, 114, 115, 127, 44, 47, 41, 69, 63, 64, 117, 123, 103] have a lower bound of $\Theta(Np)$ on the total communication volume [47]. Since the overall computation is only $\Theta(N^{1.5})$ [45], the ratio of communication to computation of column-based schemes is quite high. As a result, these column-based schemes scale very poorly as the number of processors is ....
[Article contains additional citation context not shown here]
Alex Pothen and Chunguang Sun. Distributed multifrontal factorization using clique trees. In Proceedings of the Fifth SIAM Conference on Parallel Processing for Scientific Computing, pages 34--40, 1991.
.... volume [17] (box A). The communication volume of the column-based schemes represented in box A has been improved using smarter ways of mapping the matrix columns onto processors, such as the subtree-to-subcube mapping [16] (box B). A number of column-based parallel factorization algorithms [32, 33, 4, 43, 44, 49, 14, 11, 24, 23, 47, 36] have a lower bound of $O(Np)$ on the total communication volume. Since the overall computation is only $O(N^{1.5})$ [15], the ratio of communication to computation of column-based schemes is quite high. As a result, these column-based schemes scale very poorly as the number of processors is ....
....problem sizes. These speedups are computed with respect to a very efficient serial implementation of the multifrontal algorithm. To lend credibility to our speedup figures, we compared the run times of our program on a single processor with the single-processor run times given for the iPSC/2 in [43] and [49]. The nCUBE2 processors are about 2 to 3 times faster than iPSC/2 processors, and our serial implementation, with respect to which the speedups are computed, is 4 to 5 times faster than the one in [43] and [49]. Our single-processor run times are four times less than the single-processor ....
[Article contains additional citation context not shown here]
Alex Pothen and Chunguang Sun. Distributed multifrontal factorization using clique trees. In Proceedings of the Fifth SIAM Conference on Parallel Processing for Scientific Computing, pages 34--40, 1991.
....the irregular sparse structure of the matrices makes it difficult to partition and map the sparse matrix to a distributed architecture in a way that minimizes communication costs and minimizes the total execution time of the parallel computation. This issue has been addressed by Pothen and Sun [5], who have developed a distributed algorithm for the multifrontal method that uses a proportional mapping scheme for assigning tasks in a clique tree to processors. In this paper we study the performance characteristics of a partitioning and mapping strategy, also based on general tree and graph ....
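A proportional mapping gives each subtree a share of the processors proportional to the work in that subtree, recursively from the root. The Python below is a simplified sketch of that idea, not Pothen and Sun's implementation: `TreeNode`, `subtree_work`, and `proportional_map` are hypothetical names, per-node work is an assumed integer estimate, and a processor on the boundary between two siblings' shares may end up serving both subtrees (a simplifying choice of this sketch).

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TreeNode:
    """Node of a hypothetical clique tree with an assumed integer work estimate."""
    name: str
    work: int
    children: List["TreeNode"] = field(default_factory=list)


def subtree_work(node: TreeNode) -> int:
    """Total work in the subtree rooted at `node` (recomputed for simplicity)."""
    return node.work + sum(subtree_work(c) for c in node.children)


def proportional_map(node: TreeNode, procs: List[int],
                     mapping: Dict[str, List[int]]) -> Dict[str, List[int]]:
    """Recursively give each child a contiguous slice of `procs` sized by its work."""
    mapping[node.name] = procs
    if not node.children:
        return mapping
    sizes = [subtree_work(c) for c in node.children]
    total = sum(sizes)
    p = len(procs)
    start = 0
    for child, w in zip(node.children, sizes):
        end = start + w
        if total == 0:
            lo, hi = 0, p                          # no work information: share the group
        else:
            lo = min((start * p) // total, p - 1)  # floor(start/total * p), kept in range
            hi = -((-end * p) // total)            # ceil(end/total * p)
            hi = max(hi, lo + 1)                   # every child gets at least one processor
        proportional_map(child, procs[lo:hi], mapping)
        start = end
    return mapping


tree = TreeNode("root", 4, [
    TreeNode("A", 2, [TreeNode("A1", 3), TreeNode("A2", 3)]),  # subtree work 8
    TreeNode("B", 1, [TreeNode("B1", 1), TreeNode("B2", 2)]),  # subtree work 4
])
mapping = proportional_map(tree, list(range(6)), {})
print(mapping["A"], mapping["B"])  # [0, 1, 2, 3] [4, 5]
```

Subtree `A` holds two thirds of the work and receives four of the six processors. Inside `B`, the rounding leaves `B1` with processor 4 and `B2` with processors 4 and 5, so processor 4 is shared.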
A. Pothen and C. Sun, Distributed multifrontal factorization using clique trees, in Proceedings of 5th SIAM Conference on Parallel Processing for Scientific Computing, J. Dongarra, K. Kennedy, P. Messina, D. C. Sorensen, and R. G. Voigt, eds., SIAM, Philadelphia, 1992, pp. 34--40.
.... factorization algorithm has been analyzed in [19]. It is shown there that, for matrices whose corresponding graphs are planar, the time spent by each processor on communication is $\Theta(m/\sqrt{p})$. This overhead is smaller than the overheads of other schemes for parallel Cholesky factorization [39, 40, 1, 54, 55, 62, 15, 14, 23, 20, 59, 65, 47, 18, 57]. The experimental results in [19] suggest that our scheme is superior to other existing schemes even for non-planar graphs, but the communication overhead for non-planar graphs should be somewhat higher than $\Theta(m/\sqrt{p})$. Hence $\Theta(m/\sqrt{p})$ can be taken as a lower bound. Recall from ....
A. Pothen and C. Sun. Distributed multifrontal factorization using clique trees. In Proceedings of the Fifth SIAM Conference on Parallel Processing for Scientific Computing, pages 34--40, 1991.
.... volume [11] (box A). The communication volume of the column-based schemes represented in box A has been improved using smarter ways of mapping the matrix columns onto processors, such as the subtree-to-subcube mapping [11] (box B). A number of column-based parallel factorization algorithms [21, 3, 25, 9, 8, 15, 14, 28] have a lower bound of $O(Np)$ on the total communication volume. In [1], Ashcraft proposes a fan-both family of parallel Cholesky factorization algorithms that have a total communication volume of $O(N\sqrt{p}\log N)$. A few schemes with two-dimensional partitioning of the matrix have been proposed ....
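To make the difference in growth rates concrete, the hypothetical helper below evaluates a leading-order communication-to-computation ratio for the $O(Np)$ column-based bound and the $O(N\sqrt{p}\log N)$ fan-both volume quoted above, against $W = N^{1.5}$ work for a two-dimensional grid. All hidden constants are set to 1 and the logarithm base is an arbitrary choice, so only the trends, not the absolute values, are meaningful.

```python
import math


def comm_to_comp_ratio(scheme: str, n: int, p: int) -> float:
    """Leading-order ratio of total communication volume to factorization work.

    Assumes a 2-D grid problem with n nodes (work W = n**1.5) and sets every
    hidden constant factor to 1. The fan-both log is taken as natural log,
    which only changes a constant factor.
    """
    work = n ** 1.5
    if scheme == "column":        # O(N p) communication volume
        volume = n * p
    elif scheme == "fan-both":    # O(N sqrt(p) log N) communication volume
        volume = n * math.sqrt(p) * math.log(n)
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    return volume / work


for p in (16, 64, 256):
    print(p,
          round(comm_to_comp_ratio("column", 10_000, p), 3),
          round(comm_to_comp_ratio("fan-both", 10_000, p), 3))
```

For a fixed problem size, quadrupling $p$ quadruples the column-based ratio but only doubles the fan-both ratio.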
....and with the $O(p^{1.5}(\log p)^3)$ curve (the lower bound on the isoefficiency function of the best known parallel sparse factorization algorithm until now). The four data points on the curves correspond to the matrices GRID63x63, GRID103x95, GRID175x127, and GRID223x207. .... given for the iPSC/2 in [25]. The nCUBE2 processors are about 2 to 3 times faster than iPSC/2 processors, and our serial implementation, with respect to which the speedups are computed, is 4 to 5 times faster than the one in [25]. Our single-processor run times are four times less than the single-processor run times on the iPSC/2 ....
[Article contains additional citation context not shown here]
CiteSeer.IST - Copyright Penn State and NEC