A. George and E. Ng, Parallel sparse Gaussian elimination with partial pivoting, Annals of Operations Research, Vol. 22, 1990, pp. 219-240.
....to obtain a different solution vector x in each case. Thus, in problems which involve solution of multiple b vectors, the time taken by repeated execution of the substitution phase dominates the overall solution time. Although efficient parallel algorithms exist for the numerical factorization phase [2, 1, 23, 6, 4, 10, 15], not much progress has been made in the case of the substitution phase [6, 11, 14] due to the limited amount of parallelism inherent in this phase. In Part I of this paper, we developed a bidirectional algorithm that is suitable for the solution of sparse symmetric linear systems with multiple ....
A.George and E.Ng, Parallel sparse Gaussian elimination with partial pivoting, Annals of Operations Research, Vol. 22, 1990, pp. 219-240.
....edges incident on only one node in the subgraph become incident on the resulting supernode. The process repeats until the desired assembly tree is obtained. Elimination and assembly trees are typically defined only for symmetric patterned LU factors, with the exception of partial pivoting methods [21, 22]. A more general graph is needed for the unsymmetric pattern multifrontal method. The classical multifrontal method [2, 9, 10, 15, 16] is based on the assembly tree. It has a similar formulation as the general frontal matrix formulation described in the previous section, except that the analysis ....
A. George and E. Ng. Parallel sparse gaussian elimination with partial pivoting. Annals of Operation Research, 22:219--240, 1990.
....to introduce some extra BLAS 3 operations in redesigning the LU algorithm so that the new algorithm is easily parallelized but the sequential performance of this code is still competitive with the current best sequential code. We use the static symbolic factorization technique first proposed in [16, 17] to predict the worst possible structures of the L and U factors without knowing the actual numerical values, then we develop a nonsymmetric L/U supernode partitioning technique to identify the dense structures in both the L and U factors, and maximize the use of level 3 BLAS subroutines ....
....numerical computation. This method overestimates fill-ins and allocates memory space to avoid dynamic memory management. It has been shown that the static approach is competitive with the dynamic approach in terms of performance as well as memory requirements for a broad range of linear systems [7, 11, 13, 17] and it is much easier to parallelize on distributed memory machines. Static symbolic factorization is proposed in [17] to identify the worst case nonzero patterns without knowing numerical values of elements. The basic idea is to statically consider all the possible pivoting choices at each ....
A. George and E. Ng. Parallel Sparse Gaussian Elimination with Partial Pivoting. Annals of Operations Research, 22:219--240, 1990.
....[7] MA47 [5] MF, LDL^T, BLAS 1, Com, HSL; s.p.d., Ng Peyton [16], LL, BLAS 3, Pub, Author. Shared Memory Algorithms: nonsym., SuperLU, LL, partial, BLAS 2.5, Pub, UCB; nonsym., PARASPAR [19, 20], RL, Markowitz, BLAS 1, SD, Res, Author; sym pattern, MUPS [2], MF, threshold, BLAS 3, Res, Author; nonsym., George Ng [9], RL, partial, BLAS 1, Res, Author; s.p.d., Gupta, Rothberg, Ng Peyton [11], LL, BLAS 3, Com/Pub, SGI/Author; s.p.d., SPLASH [13], RL, 2-D block, BLAS 3, Pub, Stanford. Distributed Memory Algorithms: sym., van der Stappen [18], RL, Markowitz, Scalar, Res, Author; sym, Lucas et al. [15], MF, no pivoting, BLAS 1 ....
Alan George and Esmond Ng. Parallel sparse Gaussian elimination with partial pivoting. Annals of Operation Research, 22:219--240, 1990.
....operations dynamically change computation and communication patterns during the elimination process, and cause severe cache misses and load imbalance on modern computers with memory hierarchies. The previous work has addressed parallelization using shared memory platforms or restricted pivoting [3, 12, 13, 16]. Most notably, the recent shared memory implementation of SuperLU [3, 4, 18] has achieved up to 2.58 GFLOPS on 8 Cray C90 nodes. For distributed memory machines, in [10] we proposed a novel approach called S that integrates three key strategies together in parallelizing this algorithm: 1) adopt ....
....implementation of SuperLU [3, 4, 18] has achieved up to 2.58 GFLOPS on 8 Cray C90 nodes. For distributed memory machines, in [10] we proposed a novel approach called S that integrates three key strategies together in parallelizing this algorithm: 1) adopt a static symbolic factorization scheme [13] to eliminate the data structure variation caused by dynamic pivoting; 2) identify data regularity from the sparse structure obtained by the symbolic factorization so that efficient dense operations can be used to perform most of the computation; 3) make use of graph scheduling techniques and ....
A. George and E. Ng. Parallel Sparse Gaussian Elimination with Partial Pivoting. Annals of Operations Research, 22:219--240, 1990.
....levels. It is implemented on Illinois Cedar multiprocessors based on Alliant shared memory clusters. This paper focuses on parallelization issues for a given matrix ordering with partial pivoting to maintain numerical stability. Parallelization of sparse LU with partial pivoting is also studied in [18] on a shared memory machine. Their approaches overestimate the nonzero fill-ins by using a static symbolic LU factorization so that the dynamic variation of LU data structures is avoided. They have obtained good speedups for up to 6 processors on a Sequent machine and further work is needed to ....
....introduce some extra BLAS 3 operations in redesigning the LU algorithm so that the new algorithm is easy to parallelize but the sequential performance of this code is still competitive with the current best sequential code. We use the static symbolic factorization technique first proposed in [17, 18] to predict the worst possible structures of L and U factors without knowing the actual numerical values, then we develop a 2-D L/U supernode partitioning technique to identify dense structures in both L and U factors, and maximize the use of level 3 BLAS subroutines for these dense structures. We ....
A. George and E. Ng. Parallel Sparse Gaussian Elimination with Partial Pivoting. Annals of Operations Research, 22:219--240, 1990.
....cache misses and load imbalance on modern architectures with memory hierarchies. The previous work such as SuperLU [5] has addressed parallelization using shared memory platforms. For distributed memory machines, in [9, 10] we proposed an approach that adopts a static symbolic factorization scheme [12] to avoid data structure variation, identifies data regularity to maximize the use of BLAS 3 operations, and utilizes graph scheduling techniques and efficient run-time support [11] to exploit irregular parallelism. Recently [16] we have further studied the properties of elimination forests to ....
....two space optimization strategies. Section 4 describes the experimental results of sequential performance. Section 5 presents the experimental results of parallel performance. Section 6 concludes the paper. 2 Background. Static symbolic factorization. Static symbolic factorization is proposed in [12] to identify the worst case nonzero patterns without knowing numerical values of elements. The basic idea is to statically consider all the possible pivoting choices at each elimination step, and space is allocated for all the possible nonzero entries. Using an efficient implementation of the ....
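The idea quoted above (consider every possible pivot row at each elimination step and reserve space for every entry that could become nonzero) can be illustrated with a small row-merge sketch. This is only an illustration under stated assumptions, not the cited authors' implementation: the function name `static_symbolic_factorization`, the set-based data layout, and the assumption of a structurally zero-free diagonal are all choices made for this example.

```python
def static_symbolic_factorization(n, nonzeros):
    """Over-approximate the nonzero patterns of L and U for any row pivoting.

    n        : matrix dimension
    nonzeros : iterable of (row, col) positions of structural nonzeros
    Assumes (for this sketch) that every diagonal position is a nonzero.
    Returns (l_cols, u_rows): l_cols[k] is the set of row positions below
    the diagonal that may be nonzero in column k of L, and u_rows[k] is the
    set of column positions that may be nonzero in row k of U.
    """
    rows = [set() for _ in range(n)]
    for i, j in nonzeros:
        rows[i].add(j)

    l_cols, u_rows = [], []
    for k in range(n):
        # Every not-yet-eliminated row with a structural nonzero in
        # column k is a candidate pivot row at step k.
        candidates = [i for i in range(k, n) if k in rows[i]]

        # Any candidate may become the pivot row, so the worst case is the
        # union of their patterns restricted to columns k..n-1.
        merged = set()
        for i in candidates:
            merged |= {j for j in rows[i] if j >= k}

        # Give every candidate the merged pattern; a later row interchange
        # among candidates then cannot change the structure.
        for i in candidates:
            rows[i] |= merged

        l_cols.append({i for i in candidates if i > k})
        u_rows.append(merged)
    return l_cols, u_rows


# Small 3 x 3 example pattern (hypothetical data).
pattern = [(0, 0), (0, 2), (1, 0), (1, 1), (2, 1), (2, 2)]
l_cols, u_rows = static_symbolic_factorization(3, pattern)
print(l_cols)
print(u_rows)
```

Because the patterns are fixed before any numerical values are seen, storage can be allocated once, at the cost of reserving some positions that stay zero for the pivots actually chosen.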
A. George and E. Ng. Parallel Sparse Gaussian Elimination with Partial Pivoting. Annals of Operations Research, 22:219--240, 1990.
....based on Alliant shared memory clusters. Our paper focuses on the parallelization issues for a given matrix ordering with a commonly used pivoting strategy (column partial pivoting) to maintain numerical stability. Parallelization of sparse LU with partial column pivoting is also studied in [11, 12] on a shared memory machine. Their approaches overestimate the nonzero fill-ins by using a static symbolic LU factorization so that the dynamic variation of LU data structures is avoided. They have obtained good speedups for up to 6 processors on a Sequent machine and further work is needed to ....
....introduce some extra BLAS 3 operations in redesigning the LU algorithm so that the new algorithm is easy to parallelize but the sequential performance of this code is still competitive with the current best sequential code. We use the static symbolic factorization technique first proposed in [11, 12] to predict the worst possible structures of the L and U factors without knowing the actual numerical values, then we develop a nonsymmetric L/U supernode partitioning technique to identify the dense structures in both L and U factors, and maximize the use of level 3 BLAS subroutines (matrix-matrix ....
A. George and E. Ng. Parallel Sparse Gaussian Elimination with Partial Pivoting. Annals of Operations Research, 22:219--240, 1990.
....based on Alliant shared memory clusters. Our paper focuses on the parallelization issues for a given matrix ordering with a commonly used pivoting strategy (column partial pivoting) to maintain numerical stability. Parallelization of sparse LU with partial column pivoting is also studied in [12, 13] on a shared memory machine. Their approaches overestimate the nonzero fill-ins by using a static symbolic LU factorization so that the dynamic variation of LU data structures is avoided. They have obtained good speedups for up to 6 processors on a Sequent machine and further work is needed to ....
....introduce some extra BLAS 3 operations in redesigning the LU algorithm so that the new algorithm is easy to parallelize but the sequential performance of this code is still competitive with the current best sequential code. We use the static symbolic factorization technique first proposed in [12, 13] to predict the worst possible structures of the L and U factors without knowing the actual numerical values, then we develop a nonsymmetric L/U supernode partitioning technique to identify the dense structures in both L and U factors, and maximize the use of level 3 BLAS subroutines (matrix-matrix ....
A. George and E. Ng. Parallel Sparse Gaussian Elimination with Partial Pivoting. Annals of Operations Research, 22:219--240, 1990.
....vectorization within each frontal matrix. However, this method is based on an assumption of a symmetric nonzero pattern, and so performs poorly on matrices whose patterns are very unsymmetric. None of the previous parallel methods for unsymmetric patterned matrices use dense matrix kernels [2, 10, 11], with the exception of a multifrontal QR factorization algorithm [15] (which will be compared later on with the algorithms we develop). [Footnote: available via anonymous ftp to cis.ufl.edu as cis tech reports tr92 tr92 014.ps.Z] This paper presents a new unsymmetric pattern multifrontal method that ....
A. George and E. Ng. Parallel sparse Gaussian elimination with partial pivoting. Technical report, Oak Ridge National Laboratory, 1988.
....[110, 111] In addition to demonstrating the efficiency of our parallel algorithm on these machines, we also study the (theoretical) upper bound on performance of this algorithm. Several methods have been proposed to perform sparse Cholesky factorization [49, 73, 90] and sparse LU factorization [6, 57, 65] on shared memory machines. A common practice is to organize the program as a self scheduling loop, interacting with a global pool of tasks that are ready to be executed. Each processor repeatedly takes a task from the pool, executes it, and puts new ready task(s) in the pool. This pool of tasks ....
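The self-scheduling loop described in the passage above can be sketched with Python's standard `threading` and `queue` modules. This is a minimal illustration, not code from any of the cited papers: the function name `run_task_pool`, the `deps` dictionary format, and the use of `None` as a shutdown sentinel are assumptions made for this example, and the dependency graph is assumed to be acyclic with every prerequisite also listed as a key.

```python
import queue
import threading


def run_task_pool(deps, work, num_workers=4):
    """Run tasks with a shared pool of ready tasks.

    deps : dict mapping each task to the set of tasks it depends on
    work : callable executed once per task
    Returns the order in which tasks completed.
    """
    if not deps:
        return []

    remaining = {t: len(d) for t, d in deps.items()}
    dependents = {t: [] for t in deps}
    for t, d in deps.items():
        for p in d:
            dependents[p].append(t)

    pool = queue.Queue()          # global pool of ready tasks
    for t, count in remaining.items():
        if count == 0:
            pool.put(t)

    lock = threading.Lock()
    order = []

    def worker():
        while True:
            task = pool.get()     # take a ready task from the pool
            if task is None:      # sentinel: all work is finished
                return
            work(task)            # execute it
            with lock:
                order.append(task)
                finished = len(order) == len(deps)
                # Put newly ready tasks back into the pool.
                for s in dependents[task]:
                    remaining[s] -= 1
                    if remaining[s] == 0:
                        pool.put(s)
            if finished:
                for _ in range(num_workers):
                    pool.put(None)

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return order


# Hypothetical four-task dependency graph: b and c need a, d needs b and c.
deps = {"a": set(), "b": {"a"}, "c": {"a"}, "d": {"b", "c"}}
order = run_task_pool(deps, lambda task: None, num_workers=3)
print(order)
```

In a sparse factorization the tasks would typically be column or supernode updates and the dependencies would come from the elimination structure; here they are left abstract.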
....At best, the same semantics of equivalent reordering may be used but applied to the Cholesky factor L_c of A^T A. This only says that the upper bounds of the fills and arithmetic on L and U are the same (Theorem 5 in Section 5.4.3) with no guarantees for L and U themselves. George and Ng [57] employed this technique in their parallel sparse Gaussian elimination algorithm. Their implementation makes use of the static data structure L and U obtained from a symbolic row merge algorithm. In structure, L and U are identical to H and R respectively, by Theorem 6 in Section 5.4.3, ....
Alan George and Esmond Ng. Parallel sparse Gaussian elimination with partial pivoting. Annals of Operation Research, 22:219--240, 1990.
....has been implemented on Illinois Cedar multiprocessors based on Alliant shared memory clusters. This paper focuses on parallelization issues for a given column ordering with row interchanges to maintain numerical stability. Parallelization of sparse LU with partial pivoting is also studied in [21] on a shared memory machine by using static symbolic LU factorization to overestimate nonzero fill-ins and avoid dynamic variation of LU data structures. This approach leads to good speedups for up to 6 processors on a Sequent machine and further work is needed to assess the performance of the ....
....to introduce some extra BLAS 3 operations in redesigning the LU algorithm so that the new algorithm is easy to parallelize but the sequential performance of this code is still competitive with the current best sequential code. We use the static symbolic factorization technique first proposed in [20, 21] to predict the worst possible structures of L and U factors without knowing the actual numerical values, then we develop a 2-D L/U supernode partitioning technique to identify dense structures in both L and U factors, and maximize the use of level 3 BLAS subroutines for these dense structures. We ....
A. George and E. Ng. Parallel Sparse Gaussian Elimination with Partial Pivoting. Annals of Operations Research, 22:219--240, 1990.
....parallelization problem has been relatively well solved [3, 8, 12, 13], the sparse LU factorization is much harder to parallelize due to its dynamic nature caused by pivoting operations. The previous work has addressed parallelization issues using shared memory platforms or restricted pivoting [6, 7, 9]. In [4] we proposed a novel approach that integrates three key strategies together in parallelizing this algorithm on distributed memory machines: 1) adopt a static symbolic factorization scheme [7] to eliminate the data structure variation caused by dynamic pivoting; 2) identify data regularity ....
.... has addressed parallelization issues using shared memory platforms or restricted pivoting [6, 7, 9]. In [4] we proposed a novel approach that integrates three key strategies together in parallelizing this algorithm on distributed memory machines: 1) adopt a static symbolic factorization scheme [7] to eliminate the data structure variation caused by dynamic pivoting; 2) identify data regularity from the sparse structure obtained by the symbolic factorization so that efficient dense operations can be used to perform most of the computation; 3) make use of graph scheduling techniques and ....
A. George and E. Ng, Parallel Sparse Gaussian Elimination with Partial Pivoting, Annals of Operations Research, 22 (1990), pp. 219--240.
No context found.
A.George and E.Ng, Parallel sparse Gaussian elimination with partial pivoting, Annals of Operations Research, Vol. 22, 1990, pp. 219-240.
No context found.
A. George and E. Ng. Parallel Sparse Gaussian Elimination with Partial Pivoting. Annals of Operations Research, 22:219--240, 1990.
No context found.
A. George and E. G.-Y. Ng. Parallel sparse Gaussian elimination with partial pivoting. Annals of Operations Research, 22:219--240, 1990.