Cleve Ashcraft. The fan-both family of column-based distributed Cholesky factorization algorithms. In A. George, John R. Gilbert, and J. W.-H. Liu, editors, Graph Theory and Sparse Matrix Computations. Springer-Verlag, New York, NY, 1993.
....factorization used extensively in practice, their use for solving large sparse systems has been mostly confined to big vector supercomputers due to their high time and memory requirements. As a result, parallelization of sparse Cholesky factorization has been the subject of intensive research [26, 55, 12, 15, 14, 18, 54, 40, 41, 3, 49, 50, 57, 9, 28, 26, 27, 51, 2, 1, 44, 58, 16, 55, 43, 33, 5, 42, 4, 59]. We have developed highly scalable formulations of sparse Cholesky factorization that substantially improve the state of the art in parallel direct solution of sparse linear systems both in terms of scalability and overall performance. It is well known that dense matrix factorization can be ....
....of $O(Np)$ on the total communication volume [15]. Since the overall computation is only $O(N^{1.5})$ [13], the ratio of communication to computation of column-based schemes is quite high. As a result, these column-based schemes scale very poorly as the number of processors is increased [55, 53]. In [2], Ashcraft proposes a fan-both family of parallel Cholesky factorization algorithms that have a total communication volume of $\Theta(N\sqrt{p}\log N)$. Although this communication volume is less than that of the other column-based partitioning schemes, the isoefficiency function of Ashcraft's algorithm is ....
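To make the scaling argument concrete, the sketch below plugs numbers into the growth rates quoted above. It rests on two assumptions: all constant factors are dropped, although the real constants differ per algorithm, and the logarithm is taken to base 2. The function names are hypothetical.

```python
import math

def column_based_ratio(N: int, p: int) -> float:
    """Communication/computation ratio for a column-based scheme:
    volume ~ N*p (the lower bound), work ~ N**1.5; constants ignored."""
    return (N * p) / (N * math.sqrt(N))

def fan_both_ratio(N: int, p: int) -> float:
    """Same ratio for the fan-both volume ~ N*sqrt(p)*log2(N); constants ignored."""
    return (N * math.sqrt(p) * math.log2(N)) / (N * math.sqrt(N))

if __name__ == "__main__":
    N = 2 ** 20  # illustrative problem size (number of unknowns)
    for p in (64, 1024, 4096):
        print(p, column_based_ratio(N, p), fan_both_ratio(N, p))
```

With $N = 2^{20}$, the column-based ratio is $p/1024$ and the fan-both ratio is $20\sqrt{p}/1024$. In this constant-free model, fan-both is therefore ahead only once $\sqrt{p} > \log_2 N = 20$, that is, beyond about 400 processors. Both ratios still grow with $p$.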
Cleve Ashcraft. The fan-both family of column-based distributed Cholesky factorization algorithms. In Alan George, John R. Gilbert, and Joseph W.-H. Liu, editors, Graph Theory and Sparse Matrix Computations. Springer-Verlag, New York, NY, 1993.
.... $L(i{:}n, i) \leftarrow L(i{:}n, i) - \sum_{j=1}^{i-1} l_{ij}\, L(i{:}n, j)$; $L(i{:}n, i) \leftarrow \frac{1}{\sqrt{l_{ii}}}\, L(i{:}n, i)$; enddo (a). do $i = 1, n$: do $j \in \mathrm{Struct}(L_i)$: cmod$(i, j)$ enddo; cdiv$(i)$ enddo (b). Figure 1: Column Cholesky Factorization. However, the elimination tree produced using the minimum degree ordering is usually skewed [1]. The central concept in one-way dissection is the removal of a small set of nodes, termed the separator, from the graph of the sparse matrix, which leaves the remaining graph in two or more components. The nodes in the components are numbered first, followed by the nodes in the separator. This ....
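The left-looking column algorithm of Figure 1 can be sketched in Python as follows. This is an illustration under stated assumptions, not the cited paper's code. The matrix is held densely as a list of rows, and $\mathrm{Struct}(L_i)$ is approximated by the columns $j < i$ whose entry $l_{ij}$ is nonzero, so cmod is skipped for zero entries. `column_cholesky` is a hypothetical name; `cmod` and `cdiv` follow the figure.

```python
import math

def cmod(L, i, j):
    """cmod(i, j): subtract l_ij * L(i:n, j) from column i (rows i..n-1)."""
    lij = L[i][j]
    for k in range(i, len(L)):
        L[k][i] -= L[k][j] * lij

def cdiv(L, i):
    """cdiv(i): scale column i by 1/sqrt(l_ii), so the diagonal becomes sqrt(l_ii)."""
    d = math.sqrt(L[i][i])
    for k in range(i, len(L)):
        L[k][i] /= d

def column_cholesky(A):
    """Left-looking column Cholesky of a symmetric positive definite A (list of rows).
    Returns lower-triangular L with A = L L^T."""
    n = len(A)
    # Copy the lower triangle of A; the factor overwrites it column by column.
    L = [[float(A[r][c]) if c <= r else 0.0 for c in range(n)] for r in range(n)]
    for i in range(n):
        for j in range(i):
            if L[i][j] != 0.0:  # j in Struct(L_i): only nonzero l_ij contribute
                cmod(L, i, j)
        cdiv(L, i)
    return L

A = [[4, 0, 2], [0, 9, 3], [2, 3, 11]]
print(column_cholesky(A))  # [[2.0, 0.0, 0.0], [0.0, 3.0, 0.0], [1.0, 1.0, 3.0]]
```

The guard on `L[i][j]` is what makes the loop sparse. In the example $l_{10} = 0$, so column 1 is never modified by column 0.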
C. Ashcraft. The Fan-Both Family of Column-Based Distributed Cholesky Factorization Algorithms. Technical report, Boeing Computer Services, Dec. 1992.
....due to their high time and memory requirements. Parallel processing offers the potential to tackle both these problems; however, despite intensive research, only limited success had been achieved until recently in developing scalable parallel formulations of sparse matrix factorization [22, 51, 13, 16, 15, 36, 37, 4, 46, 47, 53, 10, 25, 22, 23, 48, 2, 1, 39, 54, 17, 51, 38]. We have developed a highly parallel sparse Cholesky factorization algorithm that substantially improves the state of the art in parallel direct solution of sparse linear systems both in terms of scalability and overall performance. We show that our algorithm is just as scalable as dense matrix ....
....of $\Theta(Np)$ on the total communication volume [16]. Since the overall computation is only $\Theta(N^{1.5})$ [14], the ratio of communication to computation of column-based schemes is quite high. As a result, these column-based schemes scale very poorly as the number of processors is increased [51, 50]. In [2], Ashcraft proposes a fan-both family of parallel Cholesky factorization algorithms that have a total communication volume of $\Theta(N\sqrt{p}\log N)$. Although this communication volume is less than that of the other column-based partitioning schemes, the isoefficiency function of Ashcraft's algorithm is still $\Theta(p$ ....
Cleve Ashcraft. The fan-both family of column-based distributed Cholesky factorization algorithms. In A. George, John R. Gilbert, and J. W.-H. Liu, editors, Graph Theory and Sparse Matrix Computations. Springer-Verlag, New York, NY, 1993.
....consumption due to the storage of aggregates, these structures are allocated only when needed and are deallocated just after being sent. If memory is a critical issue, an aggregate update column block (AUCB) can be sent with partial aggregation to free memory space; this is close to the Fan-Both scheme [2]. 3 Notations. For a given $k$, $1 \le k \le N$: $r_k$: number of contributions yet to be subtracted from column block $k$; $r^{0}_k$: initial value of $r_k$; $s_k$: number of contributions yet to be added to AUCB$_k$; $s^{0}_k$: initial value of $s_k$; symbol … means $\forall j \in \mathrm{BStruct}(L_k)$, let $j$ … $k$; ....
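One way to read the counters above is as countdowns that drive a message-passing schedule. The sketch below goes beyond what the text states by assuming two things. First, column block $k$ is ready to be factored once $r_k$ reaches zero. Second, AUCB$_k$ is complete, and can be sent and then freed, once $s_k$ reaches zero. The class and method names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class ColumnBlockCounters:
    """Hypothetical per-column-block bookkeeping; names are illustrative."""
    r0: int  # r_k^0: initial number of contributions to subtract from block k
    s0: int  # s_k^0: initial number of contributions to add to AUCB_k
    r: int = field(init=False)
    s: int = field(init=False)

    def __post_init__(self) -> None:
        # Countdowns start at their initial values.
        self.r = self.r0
        self.s = self.s0

    def subtract_contribution(self) -> bool:
        """Record one contribution subtracted from column block k.
        Returns True when none remain (block assumed ready to factor)."""
        if self.r == 0:
            raise ValueError("no contributions left to subtract")
        self.r -= 1
        return self.r == 0

    def add_to_aucb(self) -> bool:
        """Record one contribution added to AUCB_k.
        Returns True when the aggregate is complete (assumed ready to send)."""
        if self.s == 0:
            raise ValueError("AUCB already complete")
        self.s -= 1
        return self.s == 0
```

Sending AUCB$_k$ before `s` reaches zero would correspond to the partial aggregation mentioned above. The sketch does not model that case.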
C. Ashcraft. The fan-both family of column-based distributed Cholesky factorization algorithms. Graph Theory and Sparse Matrix Computation, IMA, Springer-Verlag, 56:159-190, 1993.
....versions of distributed-memory parallel solvers are based on a static mapping of the tasks and of the data and allow neither numerical pivoting nor task migration during numerical factorization. Among the other work on distributed-memory sparse direct solvers of which we are aware [7, 10, 11, 21, 22, 23], we do not know of any with the same capabilities as the MUMPS solver. The current version of our package provides a large range of options (assembled, assembled distributed, and elemental input formats, determination of a null space basis and rank deficiency, return of a Schur complement matrix, and ....
....is simply due to the fact that the frontal matrix structure contains, by definition, all the variables adjacent to any fully summed variable of the front. As a consequence, element matrices need not be split during the assembly process. Note that, for classical fan-in and fan-out approaches [7], this property does not hold, since the positions of the element matrices to be assembled are not restricted to fully summed rows and columns. The main modifications that we had to make to our implementation for assembled matrices lie in the analysis, the distribution of the matrix, and the ....
C. Ashcraft. The fan-both family of column-based distributed Cholesky factorization algorithms. In J. R. Gilbert and J. W.-H. Liu, editors, Graph Theory and Sparse Matrix Computations, pages 159-190. Springer-Verlag, NY, 1993.
....locally in block structures. This scheme is close to the Fan-In algorithm [4], as processors communicate using only aggregated update blocks. If memory is a critical issue, an aggregated update block can be sent with partial aggregation to free memory space; this is close to the Fan-Both scheme [2]. So, a block $j$ in column block $k$ will receive an aggregated update block only from each processor in the set $\mathrm{Procs}(L_{jk}) = \{\mathrm{map}(j, i) \mid i \in \mathrm{BStruct}(L_k) \text{ and } j \in \mathrm{BStruct}(L_i)\}$, where the $\mathrm{map}(\cdot)$ operator is the 2D block mapping function. These aggregated update blocks, denoted in the following ....
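The set $\mathrm{Procs}(L_{jk})$ can be computed directly from the block structure. The sketch below reads the two BStruct sets in a particular way, and that reading is an assumption. `col_struct[i]` holds the row blocks $j > i$ with $L_{ji} \neq 0$. $\mathrm{BStruct}(L_k)$ in the formula is taken as the column blocks $i$ with $L_{ki} \neq 0$, derived from `col_struct`. The 2D block-cyclic `block_cyclic_map` on a $2 \times 2$ processor grid is a hypothetical stand-in for the paper's map function.

```python
def block_cyclic_map(j: int, i: int, pr: int = 2, pc: int = 2) -> int:
    """Hypothetical 2D block-cyclic map: block (j, i) -> processor id."""
    return (j % pr) * pc + (i % pc)

def row_structure(col_struct):
    """Invert the column structure: rows[k] = {i : k in col_struct[i]}."""
    rows = {}
    for i, js in col_struct.items():
        for j in js:
            rows.setdefault(j, set()).add(i)
    return rows

def procs(j, k, col_struct, map_fn=block_cyclic_map):
    """Processors sending an aggregated update block to block L_jk:
    { map(j, i) | L_ki != 0 and L_ji != 0 }."""
    rows = row_structure(col_struct)
    return {map_fn(j, i) for i in rows.get(k, set()) if j in col_struct.get(i, set())}

# Illustrative block pattern: column block 0 has nonzero row blocks 1, 2 and 3, etc.
col_struct = {0: {1, 2, 3}, 1: {2, 3}, 2: {3}, 3: set()}
print(procs(3, 2, col_struct))  # {2, 3}
```

Block $L_{32}$ is updated by column blocks 0 and 1. Their contributions are computed where $L_{30}$ and $L_{31}$ live, on processors 2 and 3 of the assumed grid.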
C. Ashcraft. The fan-both family of column-based distributed Cholesky factorization algorithms. Graph Theory and Sparse Matrix Computation, IMA, Springer-Verlag, 56:159-190, 1993.
....factorization used extensively in practice, their use for solving large sparse systems has been mostly confined to big vector supercomputers due to their high time and memory requirements. As a result, parallelization of sparse Cholesky factorization has been the subject of intensive research [27, 59, 12, 15, 14, 18, 58, 41, 42, 3, 53, 54, 61, 9, 29, 27, 28, 55, 2, 1, 45, 62, 16, 59, 44, 34, 5, 43, 4, 63]. We have developed highly scalable formulations of sparse Cholesky factorization that substantially improve the state of the art in parallel direct solution of sparse linear systems both in terms of scalability and overall performance. It is well known that dense matrix factorization can be ....
....of $O(Np)$ on the total communication volume [15]. Since the overall computation is only $O(N^{1.5})$ [13], the ratio of communication to computation of column-based schemes is quite high. As a result, these column-based schemes scale very poorly as the number of processors is increased [59, 57]. In [2], Ashcraft proposes a fan-both family of parallel Cholesky factorization algorithms that have a total communication volume of $\Theta(N\sqrt{p}\log N)$. Although this communication volume is less than that of the other column-based partitioning schemes, the isoefficiency function of Ashcraft's algorithm is still $\Theta(p$ ....
Cleve Ashcraft. The fan-both family of column-based distributed Cholesky factorization algorithms. In A. George, John R. Gilbert, and J. W.-H. Liu, editors, Graph Theory and Sparse Matrix Computations. Springer-Verlag, New York, NY, 1993.
....due to their high time and memory requirements. Parallel processing offers the potential to tackle both these problems; however, despite intensive research, only limited success had been achieved until recently in developing scalable parallel formulations of sparse matrix factorization [63, 123, 44, 47, 46, 95, 96, 10, 114, 115, 127, 41, 69, 63, 64, 117, 8, 7, 104, 140, 49, 123, 103]. We have developed a highly parallel sparse Cholesky factorization algorithm that substantially improves the state of the art in parallel direct solution of sparse linear systems both in terms of scalability and overall performance. We show that our algorithm is just as scalable as dense matrix ....
....of $\Theta(Np)$ on the total communication volume [47]. Since the overall computation is only $\Theta(N^{1.5})$ [45], the ratio of communication to computation of column-based schemes is quite high. As a result, these column-based schemes scale very poorly as the number of processors is increased [123, 120]. In [8], Ashcraft proposes a fan-both family of parallel Cholesky factorization algorithms that have a total communication volume of $\Theta(N\sqrt{p}\log N)$. Although this communication volume is less than that of the other column-based partitioning schemes, the isoefficiency function of Ashcraft's algorithm is still $\Theta(p$ ....
Cleve Ashcraft. The fan-both family of column-based distributed Cholesky factorization algorithms. In A. George, John R. Gilbert, and J. W.-H. Liu, editors, Graph Theory and Sparse Matrix Computations. Springer-Verlag, New York, NY, 1993.
....bound of $O(Np)$ on the total communication volume. Since the overall computation is only $O(N^{1.5})$ [15], the ratio of communication to computation of column-based schemes is quite high. As a result, these column-based schemes scale very poorly as the number of processors is increased [47, 46]. In [2], Ashcraft proposes a fan-both family of parallel Cholesky factorization algorithms that have a total communication volume of $O(N\sqrt{p}\log N)$. A few schemes with two-dimensional partitioning of the matrix have been proposed [46, 45, 1, 37, 50, 18], and the total communication volume in the best ....
Cleve Ashcraft. The fan-both family of column-based distributed Cholesky factorization algorithms. In A. George, John R. Gilbert, and J. W.-H. Liu, editors, Graph Theory and Sparse Matrix Computations. Springer-Verlag, New York, NY, 1993.
....for the multifrontal algorithm into the unifying framework as described below. The block fan-out algorithm presented in [17] corresponds to an index-based mapping scheme where $P_A$ is unity, i.e., it is an $F \times S$ scheme, using the notation of this paper. The fan-both algorithm presented in [2] corresponds to a mapping scheme where $P_F$ is unity. Nested mapping is not used in [2], but can easily be used as shown here. In a multifrontal algorithm, each node in the elimination tree is associated with a dense lower triangular matrix called its front. The front consists of the column itself ....
....The block fan-out algorithm presented in [17] corresponds to an index-based mapping scheme where $P_A$ is unity, i.e., it is an $F \times S$ scheme, using the notation of this paper. The fan-both algorithm presented in [2] corresponds to a mapping scheme where $P_F$ is unity. Nested mapping is not used in [2], but can easily be used as shown here. In a multifrontal algorithm, each node in the elimination tree is associated with a dense lower triangular matrix called its front. The front consists of the column itself and one column for each of its target columns. The latter columns are said to form an ....
C. Ashcraft, "The fan-both family of column-based distributed Cholesky factorization algorithms," Boeing Computer Services, Technical Report No. MEA-TR-208, 1992.
....basis for rank-deficient matrices, and can return a Schur complement matrix. It contains classical pre- and postprocessing facilities; for example, matrix scaling, iterative refinement, and error analysis. Among the other work on distributed-memory sparse direct solvers of which we are aware [7, 10, 12, 22, 23, 24], we do not know of any with the same capabilities as the MUMPS solver. Because of the difficulty of handling dynamic data structures efficiently, most distributed-memory approaches do not perform numerical pivoting during the factorization phase. Instead, they are based on a static mapping of the ....
....process. This is due to the fact that the frontal matrix structure contains, by definition, all the variables adjacent to any fully summed variable of the front. As a consequence, element matrices need not be split during the assembly process. Note that, for classical fan-in and fan-out approaches [7], this property does not hold, since the positions of the element matrices to be assembled are not restricted to fully summed rows and columns. The main modifications that we had to make to our algorithms for assembled matrices to accommodate unassembled matrices lie in the analysis, the ....
C. Ashcraft. The fan-both family of column-based distributed Cholesky factorization algorithms. In J. R. Gilbert and J. W.-H. Liu, editors, Graph Theory and Sparse Matrix Computations, pages 159-190. Springer-Verlag, NY, 1993.
....merged as necessary throughout the algorithm, before ultimately being incorporated into each affected target column. Before stating the multifrontal algorithm in detail, however, we need to introduce a bit more machinery. A different type of hybrid between fan-out and fan-in algorithms is given in (Ashcraft, 1992). Elimination Tree. To help in analyzing the sparse factorization process, we introduce the concept of an elimination tree, which is defined by the following parent relationship: $\mathrm{parent}(j) = \begin{cases} \min\{\, i \in \mathrm{Struct}(L_j) \,\}, & \text{if } \mathrm{Struct}(L_j) \neq \emptyset, \\ j, & \text{otherwise.} \end{cases}$ Thus, parent(j) is the row index of the ....
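The parent relationship translates directly into code. The sketch assumes the factor is available as a dense lower-triangular list of rows, so that $\mathrm{Struct}(L_j)$ can be read off as the nonzero row indices below the diagonal. Following the definition above, a root is marked by parent(j) = j. Function names are hypothetical.

```python
def column_structures(L, tol=0.0):
    """Struct(L_j) for each column j: row indices i > j with |l_ij| > tol."""
    n = len(L)
    return [{i for i in range(j + 1, n) if abs(L[i][j]) > tol} for j in range(n)]

def elimination_tree(struct):
    """parent(j) = min Struct(L_j) if Struct(L_j) is nonempty, else j (a root)."""
    return [min(s) if s else j for j, s in enumerate(struct)]

# Illustrative 5x5 lower-triangular nonzero pattern (1 = nonzero).
L_example = [
    [1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [1, 0, 0, 1, 0],
    [0, 1, 1, 1, 1],
]
print(elimination_tree(column_structures(L_example)))  # [3, 2, 4, 4, 4]
```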
Ashcraft, C., 1992. "The fan-both family of column-based distributed Cholesky factorization algorithms," Tech. Rept. MEA-TR-208, Boeing Computer Services, Seattle, WA.
....box A has been improved using smarter ways of mapping the matrix columns onto processors, such as the subtree-to-subcube mapping [11] (box B). A number of column-based parallel factorization algorithms [21, 3, 25, 9, 8, 15, 14, 28] have a lower bound of $O(Np)$ on the total communication volume. In [1], Ashcraft proposes a fan-both family of parallel Cholesky factorization algorithms that have a total communication volume of $O(N\sqrt{p}\log N)$. A few schemes with two-dimensional partitioning of the matrix have been proposed [22, 29, 27, 26], and the total communication volume in the best of ....
Cleve Ashcraft. The fan-both family of column-based distributed Cholesky factorization algorithms. In A. George, John R. Gilbert, and J. W.-H. Liu, editors, Graph Theory and Sparse Matrix Computations. Springer-Verlag, New York, NY, 1993.
....2
34 1800 323 110 44 19 10 6 4 3
40 2525 466 154 61 27 13 7 5 4
48 3690 679 224 88 39 18 10 6 4
56 5076 932 308 121 53 25 12 7 5
There is another method that maintains the 1-D decomposition for the fronts (which is crucial for pivoting) but requires $O(\sqrt{p}\,n^4)$ communication. The fan-both method [2] communicates $T^{q}_{J}$ aggregate update matrices, but also $U_{J,\bar{J}}$ and $L_{\bar{J},J}$ factor submatrices. It is parameterized via the relation $p = p_1 p_2$, where $p$ is the number of processors. The fan-in method is a special case where $p_1 = 1$. The fan-out method [11] is a special case where $p_2 = 1$. ....
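The parameterization $p = p_1 p_2$ can be enumerated mechanically. The sketch below lists every split of $p$ processors and labels the two extreme cases as the text does. Calling the intermediate splits "hybrid" is an illustrative label, not the paper's terminology, and how $p_1$ and $p_2$ shape the actual communication is not modelled. `fan_both_variants` is a hypothetical name.

```python
def fan_both_variants(p: int):
    """List (p1, p2, label) for every factorization p = p1 * p2.
    Labels follow the text: p1 == 1 is fan-in, p2 == 1 is fan-out;
    other splits get the illustrative label 'hybrid'."""
    if p < 1:
        raise ValueError("p must be a positive processor count")
    variants = []
    for p1 in range(1, p + 1):
        if p % p1:
            continue  # p1 must divide p exactly
        p2 = p // p1
        if p1 == 1:
            label = "fan-in"  # for p == 1 this also coincides with fan-out
        elif p2 == 1:
            label = "fan-out"
        else:
            label = "hybrid"
        variants.append((p1, p2, label))
    return variants

print(fan_both_variants(4))  # [(1, 4, 'fan-in'), (2, 2, 'hybrid'), (4, 1, 'fan-out')]
```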
C. Ashcraft, The fan-both family of column-based distributed Cholesky factorization algorithms, in Graph Theory and Sparse Matrix Computation, Springer-Verlag, 1993, pp. 159-190.
No context found.
Cleve Ashcraft. The fan-both family of column-based distributed Cholesky factorization algorithms. In A. George, John R. Gilbert, and J. W.-H. Liu, editors, Graph Theory and Sparse Matrix Computations. Springer-Verlag, New York, NY, 1993.
No context found.
Cleve Ashcraft. The fan-both family of column-based distributed Cholesky factorization algorithms. In Alan George, John R. Gilbert, and Joseph W.-H. Liu, editors, Graph Theory and Sparse Matrix Computations. Springer-Verlag, New York, NY, 1993.
No context found.
Cleve Ashcraft. The fan-both family of column-based distributed Cholesky factorization algorithms. In Alan George, John R. Gilbert, and Joseph W.-H. Liu, editors, Graph Theory and Sparse Matrix Computations. Springer-Verlag, New York, NY, 1993.
CiteSeer.IST - Copyright Penn State and NEC