A. George, M. Heath, J. Liu and E. Ng, Solution of sparse positive definite systems on a shared memory multiprocessor, Int. J. Parallel Programming, 15, pp. 309-325, 1986.
....[4] for a distributed memory system. We adapted this algorithm for the modified Cholesky factorization on shared memory systems. We will refer to this algorithm as RLA (right looking algorithm). A parallel version of the left looking Cholesky factorization on a shared memory system can be found in [3]. In this version a linked list called link is used to store the columns that are ready to modify other columns. As we showed in [11], the linked list does not extract all the existing parallelism. We propose the use of a set of n queues instead. In parallel implementations, a task queue that ....
A. George, M. Heath, J. Liu and E. Ng, Solution of sparse positive definite systems on a shared memory multiprocessor, Int. J. Parallel Programming, 15, pp.309-325, 1986.
....than for the distributed Fan Out algorithm. Indeed, the latter is used to solve finite element problems arising from L-shaped triangular meshes. A simple wrap-around task assignment is used after a nested dissection ordering of the test matrices [5]. For shared memory machines, George et al. [4] straightforwardly use the large grain task model in the Fan In algorithm. However, they do not enhance the parallelism by a postordering of the elimination tree. In a quite different way, Rothberg et al. [18] take advantage of a larger task grain size: a task is the completion of a set of ....
Alan GEORGE, Michael HEATH, Joseph LIU, and Esmond NG. Solution of sparse positive definite systems on a shared memory multiprocessor. International journal of parallel programming, 309--325, 1986.
....matrix kernels [6, 7] to factorize E_k: the Level 2 BLAS for simple nodes (g_k = 1) or the Level 3 BLAS for supernodes (g_k > 1). The classical multifrontal method is not the only method based on the elimination tree or its variants. In the sparse column Cholesky factorization of George et al. [19], the work at node k in the elimination tree is the computation of column k of L. The work at node k modifies column k with columns corresponding to a subset of the descendants of node k in the elimination tree. This is in contrast to the classical multifrontal method, in which data is assembled ....
A. George, M. T. Heath, J. W. H. Liu, and E. Ng. Solution of sparse positive definite systems on a shared-memory multiprocessor. International Journal of Parallel Programming, 15(4):309--325, 1986.
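The column-oriented scheme described in the excerpt above can be illustrated with a small sketch. This is a minimal Python illustration under stated assumptions, not the implementation of George et al.: the matrix is stored densely, an exact nonzero test on L[k][j] stands in for a symbolic factorization, and the names `column_cholesky`, `elimination_tree`, and `mods` are hypothetical.

```python
import math

def column_cholesky(A):
    """Left-looking column Cholesky of a symmetric positive definite matrix.

    Returns (L, mods), where mods[k] lists the columns j < k that
    modified column k. Dense storage is an assumption made for brevity.
    """
    n = len(A)
    # Start from the lower triangle of A.
    L = [[float(A[i][j]) if i >= j else 0.0 for j in range(n)] for i in range(n)]
    mods = [[] for _ in range(n)]
    for k in range(n):
        # cmod(k, j): column k is modified by every earlier column j
        # that has a nonzero in row k.
        for j in range(k):
            if L[k][j] != 0.0:
                mods[k].append(j)
                for i in range(k, n):
                    L[i][k] -= L[i][j] * L[k][j]
        # cdiv(k): scale column k by the square root of its diagonal.
        d = math.sqrt(L[k][k])
        for i in range(k, n):
            L[i][k] /= d
    return L, mods

def elimination_tree(L):
    """parent[j] is the row of the first off-diagonal nonzero in column j."""
    n = len(L)
    parent = [-1] * n
    for j in range(n):
        for i in range(j + 1, n):
            if L[i][j] != 0.0:
                parent[j] = i
                break
    return parent

# Small illustrative matrix (chosen for this sketch, not taken from the source).
A = [[4, 0, 2, 0],
     [0, 4, 2, 0],
     [2, 2, 6, 1],
     [0, 0, 1, 3]]
L, mods = column_cholesky(A)
parent = elimination_tree(L)
```

In this example column 2 is modified by columns 0 and 1, and column 3 only by column 2, although columns 0 and 1 are also descendants of node 3 in the elimination tree. The modifying columns are a subset of the descendants, as the excerpt states.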
....than for the distributed Fan Out algorithm. Indeed, the latter is used to solve finite element problems arising from L-shaped triangular meshes. A simple wrap-around task assignment is used after a nested dissection ordering of the test matrices [5]. For shared memory machines, George et al. [4] straightforwardly use the large grain task model in the Fan In algorithm. However, they do not enhance the parallelism by a postordering of the elimination tree. In a quite different way, Rothberg et al. [16] take advantage of a larger task grain size: a task is the completion of a set of ....
Alan GEORGE, Michael HEATH, Joseph LIU, and Esmond NG. Solution of sparse positive definite systems on a shared memory multiprocessor. International journal of parallel programming, 309--325, 1986.
....Finally, we know of only one application that specifically requires the row counts rather than the column counts. The row counts are the numbers of column modifications (sparse SAXPYs) required to complete each column in sparse Cholesky factorization algorithms. Some parallel implementations [13, 14] need the row counts to tell when all the modifications have arrived for each column. 1.2 Previous work. Like many combinatorial algorithms in sparse matrix factorization, all the efficient algorithms for row and column counts begin by computing the elimination tree of the matrix (defined in the ....
A. George, M. T. Heath, J. W-H. Liu, and E. G-Y. Ng. Solution of sparse positive definite systems on a shared memory multiprocessor. Internat. J. Parallel Programming, 15:309--325, 1986.
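The role of row counts described in the excerpt above can be sketched as a set of per-column counters. The following Python fragment is an illustrative assumption rather than code from any of the cited implementations: `lower_pattern`, `row_counts`, and `completion_order` are hypothetical names, and the structure of L is taken as given.

```python
from collections import deque

def row_counts(lower_pattern):
    """lower_pattern[k] is the set of columns j < k with L[k][j] nonzero.

    The row count of k is the number of column modifications that
    column k must receive before it can be completed.
    """
    return [len(cols) for cols in lower_pattern]

def completion_order(lower_pattern):
    """Complete columns as soon as their counters reach zero."""
    n = len(lower_pattern)
    remaining = row_counts(lower_pattern)
    # targets[j] lists the columns that column j modifies.
    targets = [[] for _ in range(n)]
    for k, cols in enumerate(lower_pattern):
        for j in sorted(cols):
            targets[j].append(k)
    ready = deque(k for k in range(n) if remaining[k] == 0)
    order = []
    while ready:
        j = ready.popleft()
        order.append(j)              # column j is complete
        for k in targets[j]:
            remaining[k] -= 1        # one modification has arrived at column k
            if remaining[k] == 0:
                ready.append(k)      # all modifications for column k have arrived
    return order

# Illustrative pattern: column 2 needs columns 0 and 1, column 3 needs column 2.
pattern = [set(), set(), {0, 1}, {2}]
counts = row_counts(pattern)
order = completion_order(pattern)
```

A column enters the ready queue only when its counter reaches zero, which is the condition the excerpt describes as knowing that all modifications have arrived.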
....dense matrix counterparts [20]. Parallel sparse matrix solver performance generally is less than similar dense matrix solvers even though there is more inherent parallelism in sparse matrix algorithms than dense matrix algorithms. This additional parallelism is often described by elimination trees [14, 15, 16, 20, 37, 38, 39, 40, 41, 45], graphs that illustrate the dependencies in the calculations. Parallel sparse linear solvers can simultaneously factor entire groups of mutually independent contiguous blocks of columns or rows without communications; meanwhile, dense linear solvers can only update blocks of contiguous columns or ....
....parallel calculations with no additional parallel communications overhead. 1.2 Block Diagonal Bordered Direct Linear Solvers. Block diagonal bordered sparse matrix algorithms require modifications to the normal preprocessing phase described in numerous papers on parallel Choleski factorization [14, 15, 16, 20, 37, 38, 39, 40, 41, 45]. Each of the numerous papers referenced above uses the paradigm to order the sparse matrix and then perform symbolic factorization in order to determine the locations of all fill-in values so that static data structures can be utilized for maximum efficiency when performing numerical factorization. ....
[Article contains additional citation context not shown here]
A. George, M. T. Heath, J. Liu, and E. Ng. Solution of Sparse Positive Definite Systems on a Shared-Memory Multiprocessor. International Journal of Parallel Programming, 15(4):309--328, August 1986.
....performance generally is less than similar dense matrix solvers even though there is more inherent parallelism in sparse matrix algorithms than dense matrix algorithms. This additional parallelism is often described by elimination trees, graphs that illustrate the dependencies in the calculations [19, 20, 21, 29, 55, 56, 57, 58, 64]. Parallel sparse linear solvers can simultaneously factor entire groups of mutually independent contiguous blocks of columns or rows without communications; meanwhile, dense linear solvers can only update blocks of contiguous columns or rows during each pipelined communication cycle. The limited ....
.... been reported in the power systems community journals to solve the special very sparse irregular power systems network matrices, there has been significant research into efficient general sparse linear solvers for general matrices, always larger and less sparse than power systems network matrices [8, 9, 10, 19, 20, 21, 25, 29, 46, 47, 55, 56, 57, 58, 64]. In the research presented in this thesis, we have developed specialized, efficient parallel sparse linear solvers for linear systems derived from power systems networks. The performance of our parallel linear solvers is significantly better than the performance of linear solvers reported in the ....
[Article contains additional citation context not shown here]
A. George, M. T. Heath, J. Liu, and E. Ng. Solution of Sparse Positive Definite Systems on a Shared-Memory Multiprocessor. International Journal of Parallel Programming, 15(4):309--328, August 1986.
....to processors. The earliest work on parallelizing sparse codes for distributed memory machines was based on column oriented Cholesky factorizations, either the fan in or the fan out algorithm. The original codes just used a column-column formulation of the algorithm as in the fan in algorithm of [94], but it was soon apparent that better efficiency could be obtained using a supernode column fan in approach as in [156]. Some of this early work on parallel algorithms for distributed memory computers is reviewed by Heath, Ng, and Peyton [117]. For distributed memory machines, processors can be ....
A. George, M. T. Heath, J. W. H. Liu, and E. Ng. Solution of sparse positive-definite systems on a shared memory multiprocessor. Int J. Parallel Programming, 15:309--325, 1986.
....is the solution of simultaneous systems of sparse linear equations. There is extensive research to illustrate that it is feasible to obtain reasonable parallelism and speedup when solving general sparse matrices on state of the art distributed memory multiprocessors by using direct techniques [17, 18, 19, 23, 26, 37, 43, 48], as well as by using iterative techniques [4, 11, 29, 36]. Meanwhile, there appears to be significant structure in sparse matrices encountered in power system transient stability simulations; that structure can be exploited for additional parallelism in both the portions of the matrices that ....
....can be efficient only for matrices with block bordered diagonal form. Independent processors can work on separate portions of the matrix, with communications required only for elements in the borders. Forward reduction and backward substitution are generally considered to be sequential processes [17, 18, 19]; however, the significant parallelism in these stages for transient stability simulations is a direct result of exploiting the matrix structure. There are other research areas available in the parallel direct solution of systems of linear equations. After a matrix is reordered, there still is ....
A. George, M. T. Heath, J. Liu, and E. Ng. Solution of sparse positive definite systems on a shared-memory multiprocessor. International Journal of Parallel Programming, 15(4):309--328, August 1986.
....column j from another processor and subtract u from col(j) Endwhile cdiv(j) Endif Endif Endfor. Figure 16: Fan In algorithm for a distributed memory machine. 2.3.2 Distribution of control. In 1986, George et al. developed an algorithm for shared memory parallel machines [4]. This algorithm is an almost direct adaptation of the sequential Fan In algorithm for sparse matrices (Figure 8). It is reproduced in Figure 18. Performance tests carried out with this algorithm on a Sequent machine are very promising. However, a small number of ....
Alan GEORGE, Michael HEATH, Joseph LIU, and Esmond NG. Solution of sparse positive definite systems on a shared memory multiprocessor. International journal of parallel programming, 1986.
....takes advantage of the dense matrix kernels [6, 7] to factorize E_t: the Level 2 BLAS (outer product) if g_t = 1 or the Level 3 BLAS if g_t > 1. The classical multifrontal method is not the only method based on the elimination tree. In the sparse column Cholesky factorization of George et al. [17], the work at node k in the elimination tree is the computation of column k of L. The work at node k modifies column k with columns corresponding to a subset of the descendants of node k in the elimination tree. The data flow graph is not a tree, but it is at least spanned by the control flow ....
A. George, M. T. Heath, J. W. H. Liu, and E. Ng, Solution of sparse positive definite systems on a shared-memory multiprocessor, Internat. J. Parallel Programming, 15 (1986), pp. 309-325.
....to processors. The earliest work on parallelizing sparse codes for distributed memory machines was based on column oriented Cholesky factorizations, either the fan in or the fan out algorithm. The original codes just used a column-column formulation of the algorithm as in the fan in algorithm of George, Heath, Liu and Ng (1986), but it was soon apparent that better efficiency could be obtained as in the supernode column fan in approach of Ng and Peyton (1993b). Some of this early work on parallel algorithms for distributed memory computers is reviewed by Heath, Ng and Peyton (1991). For distributed memory machines, ....
George, A., Heath, M. T., Liu, J. W. H. and Ng, E. (1986), `Solution of sparse positive-definite systems on a shared memory multiprocessor', Int J. Parallel Programming 15, 309--325.
....and submatrix Cholesky algorithms have important implications for parallel implementations. For further details on the performance implications of the various rearrangements of Gaussian elimination in various architectural contexts, see, for example, (Dongarra, Gustavson, and Karp, 1984), (George, Heath, and Liu, 1986), (Ortega, 1988), (Robert, 1990), (Dongarra, Duff, Sorensen, and van der Vorst, 1991), (Demmel, Heath, and van der Vorst, 1993). Data Dependence. The data dependences in column oriented Cholesky factorization are shown in Figure 3. All of the modifications to column k must be completed before ....
....a common pool of tasks from which available processors claim work to do. This approach has the additional advantage of providing automatic load balancing to whatever degree is permitted by the chosen task granularity. An implementation of this approach for parallel sparse factorization is given in (George, Heath, Liu, and Ng, 1986). In a distributed memory environment, communication costs often prohibit dynamic task assignment or load balancing, and thus we seek a static mapping of tasks to processors. In the case of column oriented factorization algorithms, this amounts to assigning the columns of the matrix to processors ....
George, A., Heath, M., Liu, J., and Ng, E., 1986. "Solution of sparse positive definite systems on a shared memory multiprocessor," Internat. J. Parallel Programming 15, pp. 309--325.
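The common pool of tasks mentioned in the excerpt above can be sketched with a shared queue from which worker threads claim work. This is a generic Python illustration, not the scheme of George, Heath, Liu, and Ng: the tasks here are assumed to be independent, and `run_pool`, `work`, and `num_workers` are hypothetical names.

```python
import queue
import threading

def run_pool(tasks, work, num_workers=3):
    """Process independent tasks from a shared pool.

    Each idle worker claims the next available task, so faster workers
    simply claim more tasks. Returns {task: (worker_id, result)}.
    """
    pool = queue.Queue()
    for t in tasks:
        pool.put(t)
    results = {}
    lock = threading.Lock()

    def worker(wid):
        while True:
            try:
                t = pool.get_nowait()   # claim a task from the pool
            except queue.Empty:
                return                  # pool exhausted, worker stops
            value = work(t)
            with lock:                  # protect the shared result table
                results[t] = (wid, value)

    threads = [threading.Thread(target=worker, args=(w,)) for w in range(num_workers)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return results

# Illustrative use: eight column indices as tasks, squaring as stand-in work.
results = run_pool(range(8), lambda t: t * t)
```

Which worker handles which task is decided at run time, which is the dynamic load balancing the excerpt attributes to this approach. A real sparse factorization would also have to release a column task only after its modifications are complete.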
....a common pool of tasks from which available processors claim work to do. This approach has the additional advantage of providing automatic load balancing to whatever degree is permitted by the chosen task granularity. An implementation of this approach for parallel sparse factorization is given in [88]. In a distributed memory environment, communication costs often prohibit dynamic task assignment or load balancing, and thus we seek a static mapping of tasks to processors. In the case of column oriented factorization algorithms, this amounts to assigning the columns of the matrix to processors ....
A. George, M. Heath, J. Liu, and E. Ng. Solution of sparse positive definite systems on a shared memory multiprocessor. Internat. J. Parallel Programming, 15:309--325, 1986.