
## SuperLU_DIST: A scalable distributed-memory sparse direct solver for unsymmetric linear systems (2003)


Venue: ACM Trans. Mathematical Software

Citations: 144 (18 self)

### Citations

2072 | GMRES: a generalized minimal residual algorithm for solving nonsymmetric linear systems
- Saad, Schultz
- 1986
Citation Context ...e, when large pivot growth still occurs, there are inexpensive methods to tolerate and compensate for the growth, such as iterative methods preconditioned by the computed LU factors, of which GMRES [55] and iterative refinement are two examples. This observation led us to design a static pivoting factorization algorithm, called GESP [46]. We demonstrated that GESP works well for practical matrices. I... |
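The excerpt's pairing of LU factors with cheap iterative refinement can be sketched with SciPy's SuperLU wrapper. This is an illustrative stand-in, not the GESP code: the 3x3 matrix is invented, and the factors here are exact, whereas GESP's static pivoting may perturb them.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

# Toy system standing in for a statically pivoted factorization.
A = sp.csc_matrix(np.array([[4.0, 1.0, 0.0],
                            [1.0, 3.0, 1.0],
                            [0.0, 1.0, 2.0]]))
b = np.array([1.0, 2.0, 3.0])

lu = spla.splu(A)        # computed LU factors (exact here; GESP's may be perturbed)
x = lu.solve(b)          # initial solution
for _ in range(3):       # iterative refinement sweeps
    r = b - A @ x        # residual in working precision
    x = x + lu.solve(r)  # correction reuses the same factors

print(np.linalg.norm(b - A @ x))
```

The point of the technique is that refinement reuses the factorization, so recovering accuracy lost to static pivoting costs only a few extra triangular solves.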

773 | Applied Numerical Linear Algebra
- Demmel
- 1997
Citation Context ...gnitude than the sum of magnitudes of the off-diagonal entries in its row (∑_{j≠i} |a_ij|) or column (∑_{j≠i} |a_ji|). It is known that choosing diagonal pivots ensures stability for such matrices [18, 33]. We therefore expect that if each diagonal entry can somehow be made larger relative to the off-diagonals in its row or column, then diagonal pivoting will be more stable. The purpose of step (1) is t... |
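The row-wise diagonal-dominance condition in the excerpt is easy to check directly. The helper name and the test matrices below are invented for illustration.

```python
import numpy as np

def is_row_diagonally_dominant(A):
    """True if |a_ii| >= sum over j != i of |a_ij| for every row i."""
    A = np.asarray(A, dtype=float)
    diag = np.abs(np.diag(A))
    off = np.abs(A).sum(axis=1) - diag   # row sums of off-diagonal magnitudes
    return bool(np.all(diag >= off))

print(is_row_diagonally_dominant([[4, 1, 1], [0, 3, 2], [1, 1, 5]]))  # True
print(is_row_diagonally_dominant([[1, 2], [2, 1]]))                   # False
```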

658 |
Direct Methods for Sparse Matrices
- Duff, Erisman, et al.
- 1986
Citation Context ...ted than Cholesky for at least two reasons. First and foremost, some kind of numerical pivoting is necessary for stability. Classical partial pivoting [33] or the sparse variant of threshold pivoting [23] typically cause the fill-ins and workload to be generated dynamically during factorization. Therefore, we must either design dynamic data structures and algorithms to accommodate these fill-ins [3], or e... |

610 | Computer Solution of Large Sparse Positive Definite Systems - George, Liu - 1981 |

535 | University of Florida sparse matrix collection, http://www.cise.ufl.edu/research/sparse/matrices
- Davis
- 2002
Citation Context ...variety of applications. The application domains of the matrices are given in Table 1. Most of them, except for wu, can be obtained from the Harwell-Boeing Collection [24] and the collection of Davis [16]. Matrix wu was provided by Yushu Wu from the Earth Sciences Division of Lawrence Berkeley National Laboratory. Figure 2 plots the dimension, nnz(A), and nnz(L+U) (i.e., the fill-ins, after the minimum ... |

411 |
ScaLAPACK Users’ Guide.
- Blackford, Choi, et al.
- 1997
Citation Context ... 2D layouts were demonstrated to be more scalable in the implementations for dense matrices [13] and sparse Cholesky factorization [37, 54]. We now describe the distributed data structures to store local submatrices. In the 2D blocking, each block column of L resides on more than one process, na... |

399 | MeTiS -- A Software Package for Partitioning Unstructured Graphs, Partitioning Meshes, and Computing Fill-Reducing Orderings of Sparse Matrices, Version 4.0
- Karypis, Kumar
- 1998
Citation Context ... processors. For all these matrices, the algorithm can efficiently use 128 processors. Beyond 128 processors, not all matrices can benefit from the additional processor power. Only bbmat with ND ordering [43] and ecl32 with AMD [2] can benefit from using 512 processors. Our lack of other large unsymmetric systems gives us few data points in this regime. To further analyse the scalability of our solvers, we... |

316 | An approximate minimum degree ordering algorithm
- Amestoy, Davis, et al.
- 1996
Citation Context ...f. The scheme is similar to the Markowitz scheme [49] but limits the pivot search to the entries on the main diagonal. The efficient implementation is similar to that of approximate minimum degree (AMD) [2], but it generalizes the (symmetric) quotient graph to the bipartite quotient graph to model the unsymmetric node elimination. The preliminary results show that the new ordering method reduces the amo... |

314 | SPARSKIT: a basic tool kit for sparse matrix computations
- Saad
- 1990
Citation Context ...), (2), (6), and the diagonal perturbation in step (4)). Now we turn to the second configuration of our algorithm, in which restarted GMRES [55] was used in step (6) (we used the version from SPARSKIT [56]). The restart value is 50. Here, our LU factorization is used in preconditioning for GMRES. The convergence test is based on residual norm: ||r_i||_2 ≤ rtol · ||r_0||_2 + atol, where the relative t... |
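A rough SciPy analogue of this configuration (not SPARSKIT or the paper's code): restarted GMRES preconditioned by sparse LU factors, with the excerpt's stopping rule checked explicitly afterwards. The matrix, its size, and the tolerances below are made up.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

n = 50
A = sp.diags([-1.0, 2.5, -1.0], [-1, 0, 1], shape=(n, n), format="csc")
b = np.ones(n)
rtol, atol = 1e-8, 1e-12

lu = spla.splu(A)                                 # LU factors as preconditioner
M = spla.LinearOperator((n, n), matvec=lu.solve)  # M approximates A^{-1}

x, info = spla.gmres(A, b, M=M, restart=50)       # restart value from the excerpt
r = b - A @ x
print(info, np.linalg.norm(r) <= rtol * np.linalg.norm(b) + atol)
```

With factors this accurate, GMRES converges in essentially one iteration; the iterative layer matters when the factorization is perturbed, as in GESP.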

266 | A fully asynchronous multifrontal solver using distributed dynamic scheduling
- Amestoy, Duff, et al.
- 2001
Citation Context ... data structures and communication pattern. Researchers have been quite successful in achieving "scalable" performance for sparse Cholesky factorization; available codes include CAPSS [38], MUMPS-SYM [3], PaStix [40], PSLDLT [54], and PSPACES [36]. In contrast, for nonsymmetric or indefinite systems, few distributed-memory codes exist. They are more complicated than Cholesky for at least two reasons. ... |

266 | Nested dissection of a regular finite element mesh. - George - 1973 |

264 | User’s Guide for the Harwell-Boeing Sparse Matrix Collection.
- Duff, Grimes, et al.
- 1992
Citation Context ...etric matrices drawn from a wide variety of applications. The application domains of the matrices are given in Table 1. Most of them, except for wu, can be obtained from the Harwell-Boeing Collection [24] and the collection of Davis [16]. Matrix wu was provided by Yushu Wu from the Earth Sciences Division of Lawrence Berkeley National Laboratory. Figure 2 plots the dimension, nnz(A), and nnz(L+U) (i.e... |

261 | A supernodal approach to sparse partial pivoting
- Demmel, Eisenstat, et al.
- 1999
Citation Context ... Despite these difficulties, researchers have been addressing these issues successfully for sequential and shared memory machines; available codes include MA41 [6, 5], PARDISO [57], SPOOLES [9], SuperLU [19], SuperLU_MT [20], UMFPACK/MA38 [15], and WSMP [34]. In our earlier codes SuperLU (serial) and SuperLU_MT (shared-memory), we devised efficient "symbolic" factorization algorithms to accommodate the dy... |

200 |
The role of elimination trees in sparse factorization
- Liu
- 1990
Citation Context ... structurally different yet closely related to each other in the filled pattern. Unlike the Cholesky factor whose minimum graph representation is a tree (called the elimination tree, or etree for short) [48], the minimum graph representations of the L and U factors are directed acyclic graphs (called elimination DAGs, or edags for short) [31, 32]. Despite these difficulties, researchers have been addressing... |

159 |
Modification of the minimum-degree algorithm by multiple elimination
- Liu
- 1985
Citation Context ... A in step (1). Step (2) is standard in sparse direct solvers. The column permutation P_c can be obtained from any fill-reducing heuristic. In our code, we provide the minimum degree ordering algorithm [47] on the structure of A^T + A. The code can also take as input an ordering based on some other algorithm, such as the nested dissection on A^T + A [27, 39, ...]. Figure 1: The outline of the GESP algorithm.... |

129 | Highly scalable parallel algorithms for sparse matrix factorization
- Gupta, Karypis, et al.
- 1997
Citation Context ... Researchers have been quite successful in achieving "scalable" performance for sparse Cholesky factorization; available codes include CAPSS [38], MUMPS-SYM [3], PaStix [40], PSLDLT [54], and PSPACES [36]. In contrast, for nonsymmetric or indefinite systems, few distributed-memory codes exist. They are more complicated than Cholesky for at least two reasons. First and foremost, some kind of numerical p... |

117 |
The elimination form of the inverse and its application to linear programming
- Markowitz
- 1957
Citation Context ...roposed a new symmetric ordering scheme that does not require any symmetrization of the underlying matrix, that is, it works directly on matrix A itself. The scheme is similar to the Markowitz scheme [49] but limits the pivot search to the entries on the main diagonal. The efficient implementation is similar to that of approximate minimum degree (AMD) [2], but it generalizes the (symmetric) quotient grap... |

114 | A combined unifrontal/multifrontal method for unsymmetric sparse matrices,
- Davis, Duff
- 1999
Citation Context ...s have been addressing these issues successfully for sequential and shared memory machines; available codes include MA41 [6, 5], PARDISO [57], SPOOLES [9], SuperLU [19], SuperLU_MT [20], UMFPACK/MA38 [15], and WSMP [34]. In our earlier codes SuperLU (serial) and SuperLU_MT (shared-memory), we devised efficient "symbolic" factorization algorithms to accommodate the dynamically generated fill-ins due to pa... |

95 | An asynchronous parallel supernodal algorithm for sparse Gaussian elimination.
- Gilbert, Demmel, et al.
- 1999
Citation Context ...culties, researchers have been addressing these issues successfully for sequential and shared memory machines; available codes include MA41 [6, 5], PARDISO [57], SPOOLES [9], SuperLU [19], SuperLU_MT [20], UMFPACK/MA38 [15], and WSMP [34]. In our earlier codes SuperLU (serial) and SuperLU_MT (shared-memory), we devised efficient "symbolic" factorization algorithms to accommodate the dynamically generat... |

92 |
Solving sparse linear systems with sparse backward error
- Arioli, Demmel, et al.
- 1989
Citation Context ...of an iterative method like iterative refinement (shown) or GMRES [55] if the solution from step (5) is not accurate enough. The termination criterion is based on the componentwise backward error berr [8, 18]. The condition berr ≤ ε means that the computed solution is the exact solution of a slightly different sparse linear system (A + δA)x = b + δb where δA changes only each nonzero entry a_ij by at most one... |
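The componentwise backward error can be evaluated directly from the standard formula berr = max_i |b - Ax|_i / (|A| |x| + |b|)_i; the helper name and the 2x2 system below are invented for illustration.

```python
import numpy as np

def componentwise_backward_error(A, x, b):
    """berr = max_i |b - A x|_i / (|A| |x| + |b|)_i."""
    A, x, b = np.asarray(A), np.asarray(x), np.asarray(b)
    r = np.abs(b - A @ x)                      # componentwise residual
    denom = np.abs(A) @ np.abs(x) + np.abs(b)  # componentwise scaling
    return float(np.max(r / denom))

A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x = np.linalg.solve(A, b)
print(componentwise_backward_error(A, x, b))  # roughly machine epsilon
```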

90 | The design and use of algorithms for permuting large entries to the diagonal of sparse matrices - Duff, Koster - 1999 |

86 | Introduction to Parallel Computing, The Benjamin/Cummings Publishing Company - Kumar, Grama, et al. - 2003 |

85 | Computer Solution of Linear Algebraic Systems, - Forsythe, Moler - 1967 |

73 | The Chaco User's Guide, Version 1.0," - Hendrickson, Leland - 1993 |

73 |
A collection of Fortran codes for large scale scientific computation. http://www.cse.clrc.ac.uk/Activity/HSL
- HSL
Citation Context ...g matrices D_r and D_c, and the permutation P_r to make each a_ii larger in this sense. We have experimented with a number of heuristic algorithms implemented in the routine MC64 (available from HSL [41]) [22]. All depend on the following graph representation of an n × n sparse matrix A: it is represented as an undirected weighted bipartite graph with one vertex for each row, one vertex for each colu... |

70 | Design, implementation and testing of extended and mixed precision BLAS
- Li, Demmel, et al.
- 2002
Citation Context ...gorithms will probably depend on the number of right-hand sides. Improve numerical robustness. More techniques can be used; these include performing iterative refinement with extra precise residuals [45] and using dynamic precision during the factorization, see Appendix A. Acknowledgments We would like to thank Patrick Amestoy, Iain Duff, Jean-Yves L'Excellent and Rich Vuduc for very helpful discussio... |

57 | Preconditioning highly indefinite and nonsymmetric matrices - Benzi, Haws, et al. |

50 | Robust ordering of sparse matrices using multisection:
- Ashcraft, Liu
- 1998
Citation Context ...e been developed. Comparing SuperLU_DIST with those solvers remains future work. SPOOLES is a supernodal, left-up-looking solver [9]. The fill-reducing ordering is a hybrid approach called multisection [10], which is applied to the structure of A^T + A. It performs threshold rook pivoting with both row and column interchanges. The task dependency graph is the elimination tree of A^T + A. S+ is a supernod... |

50 | Predicting structure in sparse matrix computations
- Gilbert
- 1984
Citation Context ...on is a tree (called the elimination tree, or etree for short) [48], the minimum graph representations of the L and U factors are directed acyclic graphs (called elimination DAGs, or edags for short) [31, 32]. Despite these difficulties, researchers have been addressing these issues successfully for sequential and shared memory machines; available codes include MA41 [6, 5], PARDISO [57], SPOOLES [9], SuperLU... |

48 | A new pivoting strategy for Gaussian elimination
- Olschowka, Neumaier
- 1996
Citation Context ...D_r and D_c simultaneously so that each diagonal entry of P_r D_r A D_c is ±1, and each off-diagonal entry is bounded by 1 in magnitude. The implementation is based on the algorithm by Olschowka and Neumaier [50]. We report results for this algorithm only. The worst case serial complexity of this algorithm is O(n · nnz(A) · log n), where nnz(A) is the number of nonzeros in A. In practice it is much faster; th... |
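The permutation part of this preprocessing can be imitated in a few lines (an illustrative stand-in, not MC64): maximizing the product of diagonal magnitudes is an assignment problem on the costs -log|a_ij|, solvable with SciPy's `linear_sum_assignment`. The matrix is invented; the scaling step that makes the diagonal ±1 is omitted.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

A = np.array([[0.1, 5.0, 0.0],
              [4.0, 0.2, 0.3],
              [0.0, 0.1, 2.0]])

with np.errstate(divide="ignore"):
    cost = -np.log(np.abs(A))      # -log(0) = +inf forbids structural zeros
rows, cols = linear_sum_assignment(cost)   # min-cost = max-product matching

perm = np.empty_like(rows)
perm[cols] = rows                  # new row j is the old row matched to column j
B = A[perm]                        # row-permuted matrix with a large diagonal
print(np.diag(B))                  # [4. 5. 2.]
```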

44 |
Performance of panel and block approaches to sparse Cholesky factorization on the iPSC/860 and Paragon systems
- Rothberg
- 1994
Citation Context ...unication pattern. Researchers have been quite successful in achieving "scalable" performance for sparse Cholesky factorization; available codes include CAPSS [38], MUMPS-SYM [3], PaStix [40], PSLDLT [54], and PSPACES [36]. In contrast, for nonsymmetric or indefinite systems, few distributed-memory codes exist. They are more complicated than Cholesky for at least two reasons. First and foremost, some k... |

43 |
Symbolic factorization for sparse Gaussian elimination with partial pivoting
- George, Ng
- 1987
Citation Context ... row interchanges during the factorization depend on the numerical values. However, for any row interchanges, the structures of L and U are subsets of the structures of H (or R^T) and R respectively [28, 30]. Therefore, a good symmetric ordering P_c on A^T A (either based on minimum degree or nested dissection) that preserves the sparsity of R can be applied to the columns of A, forming A P_c^T, so that ... |

43 |
Efficient sparse LU factorization with left-right looking strategy on shared memory multiprocessors
- Schenk, Gärtner, et al.
Citation Context ... edags for short) [31, 32]. Despite these difficulties, researchers have been addressing these issues successfully for sequential and shared memory machines; available codes include MA41 [6, 5], PARDISO [57], SPOOLES [9], SuperLU [19], SuperLU_MT [20], UMFPACK/MA38 [15], and WSMP [34]. In our earlier codes SuperLU (serial) and SuperLU_MT (shared-memory), we devised efficient "symbolic" factorization algor... |

42 | Elimination structures for unsymmetric sparse LU factors
- Gilbert, Liu
- 1993
Citation Context ...on is a tree (called the elimination tree, or etree for short) [48], the minimum graph representations of the L and U factors are directed acyclic graphs (called elimination DAGs, or edags for short) [31, 32]. Despite these difficulties, researchers have been addressing these issues successfully for sequential and shared memory machines; available codes include MA41 [6, 5], PARDISO [57], SPOOLES [9], SuperLU... |

42 | Scalable iterative solution of sparse linear systems
- Jones, Plassmann
- 1991
Citation Context ...the orderings that give wide and bushy elimination DAGs, such as nested dissection. To speed up the triangular solve, we may apply some graph coloring heuristic to reduce the number of parallel steps [42]. There are also alternative algorithms other than substitutions, such as those based on partitioned inversion [1] or selective inversion [51]. However, these algorithms usually require preprocessing ... |
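Level scheduling is one standard way to expose the parallel steps of a triangular solve: rows in the same level depend only on earlier levels and can be solved concurrently. A small sketch (invented helper name and toy matrix, not the paper's code):

```python
import numpy as np
import scipy.sparse as sp

def levels_lower(L):
    """level[i] = 1 + max(level[j] for j < i with L[i, j] != 0), else 0."""
    L = sp.csr_matrix(L)
    n = L.shape[0]
    level = np.zeros(n, dtype=int)
    for i in range(n):
        start, end = L.indptr[i], L.indptr[i + 1]
        deps = [j for j in L.indices[start:end] if j < i]  # below-diagonal entries
        if deps:
            level[i] = 1 + max(level[j] for j in deps)
    return level

L = np.array([[1, 0, 0, 0],
              [1, 1, 0, 0],
              [0, 0, 1, 0],
              [0, 1, 1, 1]])
print(levels_lower(L))  # rows 0 and 2 at level 0; row 1 at level 1; row 3 at level 2
```

The number of distinct levels bounds the number of sequential steps, which is what orderings like nested dissection tend to keep small.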

42 | Making sparse Gaussian elimination scalable by static pivoting
- Li, Demmel
- 1998
Citation Context ...SuperLU_DIST which is targeted for large-scale distributed-memory machines, we use a static pivoting approach, called GESP (Gaussian Elimination with Static Pivoting), proposed earlier by the authors [46]. We parallelized the GESP algorithm using MPI. Our parallelization strategies center around the scalability concern. We use a 2D block-cyclic mapping of a sparse matrix to the processors, and designe... |
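The owner computation behind a 2D block-cyclic mapping is one line: block (I, J) lives on process (I mod Pr, J mod Pc) of a Pr x Pc process grid. A minimal sketch with made-up grid sizes:

```python
def owner(I, J, Pr, Pc):
    """Process coordinates owning block (I, J) in a 2D block-cyclic layout."""
    return (I % Pr, J % Pc)

Pr, Pc = 2, 3               # a 2 x 3 process grid (invented sizes)
print(owner(0, 0, Pr, Pc))  # (0, 0)
print(owner(5, 7, Pr, Pc))  # (1, 1)
```

Cycling both block rows and block columns over the grid is what balances load and bounds the number of processes any block row or column touches.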

31 | Efficient sparse LU factorization with partial pivoting on distributed memory architectures
- Fu, Jiao, et al.
- 1998
Citation Context ...actorization. Therefore, we must either design dynamic data structures and algorithms to accommodate these fill-ins [3], or else use static data structures which can grossly overestimate the true fill-in [26, 35]. The second complication is the need to handle two factored matrices L and U, which are structurally different yet closely related to each other in the filled pattern. Unlike the Cholesky factor whose ... |

30 |
Memory management issues in sparse multifrontal methods on multiprocessors
- Amestoy, Duff
- 1993
Citation Context ...ination DAGs, or edags for short) [31, 32]. Despite these difficulties, researchers have been addressing these issues successfully for sequential and shared memory machines; available codes include MA41 [6, 5], PARDISO [57], SPOOLES [9], SuperLU [19], SuperLU_MT [20], UMFPACK/MA38 [15], and WSMP [34]. In our earlier codes SuperLU (serial) and SuperLU_MT (shared-memory), we devised efficient "symbolic" facto... |

29 |
Computer solution of large sparse positive definite systems
- George, Liu
- 1981
Citation Context ...nsider the 3D cubic grid problem using the standard nested dissection ordering, the fill in the factored matrix is O(N^{4/3}) and the number of floating-point operations to factorize the matrix is O(N^2) [29]. Let the P processors be arranged as a square process grid. In our parallel algorithm (Figure 12), each nonzero element is sent to at most √P processors. The total communication overhead is O(N^{4/3}... |
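The superlinear fill quoted for 3D grids is easy to observe empirically. The sketch below (my own, not from the paper) factors 7-point Laplacians of growing size with SciPy's SuperLU; its default COLAMD ordering is not nested dissection, so the constants differ, but fill per unknown still grows.

```python
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def lap3d(k):
    """7-point Laplacian on a k x k x k grid (N = k^3 unknowns)."""
    d1 = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(k, k))
    I = sp.identity(k)
    return (sp.kron(sp.kron(d1, I), I)
            + sp.kron(sp.kron(I, d1), I)
            + sp.kron(sp.kron(I, I), d1)).tocsc()

ratios = []
for k in (6, 8, 10):
    A = lap3d(k)
    lu = spla.splu(A)
    fill = lu.L.nnz + lu.U.nnz       # nnz(L + U)
    ratios.append(fill / A.shape[0]) # fill per unknown
print(ratios)                        # increases with problem size
```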

27 | Highly parallel sparse triangular solution
- Alvarado, Pothen, et al.
- 1993
Citation Context ...e, we may apply some graph coloring heuristic to reduce the number of parallel steps [42]. There are also alternative algorithms other than substitutions, such as those based on partitioned inversion [1] or selective inversion [51]. However, these algorithms usually require preprocessing or different matrix distributions than the one used in our factorization. Whether the preprocessing and redistribut... |

24 | An unsymmetrized multifrontal LU factorization
- Amestoy, Puglisi
- 2000
Citation Context ...ination DAGs, or edags for short) [31, 32]. Despite these difficulties, researchers have been addressing these issues successfully for sequential and shared memory machines; available codes include MA41 [6, 5], PARDISO [57], SPOOLES [9], SuperLU [19], SuperLU_MT [20], UMFPACK/MA38 [15], and WSMP [34]. In our earlier codes SuperLU (serial) and SuperLU_MT (shared-memory), we devised efficient "symbolic" facto... |

23 | Analysis and comparison of two general sparse solvers for distributed memory computers
- Amestoy, Duff, et al.
Citation Context ...process P_{K+1} on the critical path. This could happen if the sender and receiver are required to handshake before proceeding, as is the case with large messages that exceed the MPI internal buffer size [7]. That is, process P_{K+1} posts its MPI send long before processes PROC_r(K) post the matching receive, and the sender must be blocked to wait for the receive. To avoid this synchronization cost, we introduc... |

23 | Improved symbolic and numerical factorization algorithms for unsymmetric sparse matrices,”
- Gupta
- 2002
Citation Context ...actorization. Therefore, we must either design dynamic data structures and algorithms to accommodate these fill-ins [3], or else use static data structures which can grossly overestimate the true fill-in [26, 35]. The second complication is the need to handle two factored matrices L and U, which are structurally different yet closely related to each other in the filled pattern. Unlike the Cholesky factor whose ... |

22 |
The design and use of algorithms for permuting large entries to the diagonal of sparse matrices
- Duff, Koster
- 1999
Citation Context ...column permutation algorithms in steps (1) and (2) (computing P_r and P_c) are not easy to parallelize (their parallelization is future work). Fortunately, their memory requirement is just O(nnz(A)) [17, 21], as opposed to the superlinear memory requirement for the L and U factors, so in the meantime we can run the ordering algorithms on a single processor. Figure 7 shows the times spent in the other steps o... |

20 | Performance of a fully parallel sparse solver
- Heath, Raghavan
- 1997
Citation Context ...(2D) distributed data structures and communication pattern. Researchers have been quite successful in achieving "scalable" performance for sparse Cholesky factorization; available codes include CAPSS [38], MUMPS-SYM [3], PaStix [40], PSLDLT [54], and PSPACES [36]. In contrast, for nonsymmetric or indefinite systems, few distributed-memory codes exist. They are more complicated than Cholesky for at leas... |

15 |
Optimally scalable parallel sparse Cholesky factorization
- Gupta, Kumar
- 1995
Citation Context ... 2D layouts were demonstrated to be more scalable in the implementations for dense matrices [13] and sparse Cholesky factorization [37, 54]. We now describe the distributed data structures to store local submatrices. In the 2D blocking, each block column of L resides on more than one process, namely, a column of processes. For example, i... |

14 |
A data structure for sparse QR and LU factorizations.
- George, Liu, et al.
- 1988
Citation Context ... row interchanges during the factorization depend on the numerical values. However, for any row interchanges, the structures of L and U are subsets of the structures of H (or R^T) and R respectively [28, 30]. Therefore, a good symmetric ordering P_c on A^T A (either based on minimum degree or nested dissection) that preserves the sparsity of R can be applied to the columns of A, forming A P_c^T, so that ... |

13 | Nested dissection of a regular finite element mesh - George - 1973 |

13 |
Efficient parallel sparse triangular solution using selective inversion
- Raghavan
- 1998
Citation Context ...coloring heuristic to reduce the number of parallel steps [42]. There are also alternative algorithms other than substitutions, such as those based on partitioned inversion [1] or selective inversion [51]. However, these algorithms usually require preprocessing or different matrix distributions than the one used in our factorization. Whether the preprocessing and redistribution will offset the benefit o... |

13 |
Collisional breakup in a quantum system of three charged particles,
- Rescigno, Baertschy, et al.
- 1999
Citation Context ... to be done repeatedly in each preconditioning step). The total execution time is about 1 hour. See [11] for more details. The scientific breakthrough result was reported in a cover article of Science [52]. More recently, we have been collaborating with researchers at the Stanford Linear Accelerator Center to develop alternative eigensolvers for Omega3P, a widely used electromagnetics code in accelerat... |

11 | A mapping and scheduling algorithm for parallel sparse fan-in numerical factorization
- Henon, Ramet, et al.
- 1999
Citation Context ...ures and communication pattern. Researchers have been quite successful in achieving "scalable" performance for sparse Cholesky factorization; available codes include CAPSS [38], MUMPS-SYM [3], PaStix [40], PSLDLT [54], and PSPACES [36]. In contrast, for nonsymmetric or indefinite systems, few distributed-memory codes exist. They are more complicated than Cholesky for at least two reasons. First and for... |

10 | WSMP: The Watson Sparse Matrix Package
- Gupta
- 1997
Citation Context ...ressing these issues successfully for sequential and shared memory machines; available codes include MA41 [6, 5], PARDISO [57], SPOOLES [9], SuperLU [19], SuperLU_MT [20], UMFPACK/MA38 [15], and WSMP [34]. In our earlier codes SuperLU (serial) and SuperLU_MT (shared-memory), we devised efficient "symbolic" factorization algorithms to accommodate the dynamically generated fill-ins due to partial pivoting.... |

8 | Diagonal Markowitz scheme with local symmetrization
- Amestoy, Li, et al.
Citation Context ...n A^T + A may destroy the sparsity of matrix A, particularly when A is highly unsymmetric. Recently, motivated by the GESP algorithm and an unsymmetrized multifrontal method [5], Amestoy, Li and Ng [4] proposed a new symmetric ordering scheme that does not require any symmetrization of the underlying matrix, that is, it works directly on matrix A itself. The scheme is similar to the Markowitz schem... |

6 |
SPOOLES: An object-oriented sparse matrix library
- Ashcraft, Grimes
- 1999
Citation Context ...rt) [31, 32]. Despite these difficulties, researchers have been addressing these issues successfully for sequential and shared memory machines; available codes include MA41 [6, 5], PARDISO [57], SPOOLES [9], SuperLU [19], SuperLU_MT [20], UMFPACK/MA38 [15], and WSMP [34]. In our earlier codes SuperLU (serial) and SuperLU_MT (shared-memory), we devised efficient "symbolic" factorization algorithms to acco... |

5 |
Preconditioning highly indefinite and nonsymmetric matrices
- Benzi, Haws, et al.
Citation Context ... is substantially improved in many cases when the large-diagonal permutation is employed. Benzi, Haws and Tuma conducted more extensive experiments on the effect of MC64 on preconditioning strategies [12]. Chen [14] also considered using MC64 to avoid pivoting as much as possible in the ILU methods. Amestoy et al. developed a distributed-memory multifrontal solver, called MUMPS [3]. It is based on the... |

3 |
Solution of a three-body problem in quantum mechanics
- Baertschy, Li
- 2001
Citation Context ... NERSC (this is done only once), and it takes 26 seconds to perform triangular solutions (this needs to be done repeatedly in each preconditioning step). The total execution time is about 1 hour. See [11] for more details. The scientific breakthrough result was reported in a cover article of Science [52]. More recently, we have been collaborating with researchers at the Stanford Linear Accelerator Cent... |

3 |
Preconditioning Sparse Matrices for Computing Eigenvalues and Solving Linear Systems of Equations
- Chen
- 2001
Citation Context ...tially improved in many cases when the large-diagonal permutation is employed. Benzi, Haws and Tuma conducted more extensive experiments on the effect of MC64 on preconditioning strategies [12]. Chen [14] also considered using MC64 to avoid pivoting as much as possible in the ILU methods. Amestoy et al. developed a distributed-memory multifrontal solver, called MUMPS [3]. It is based on the symmetric ... |

2 |
Parallel ARPACK. http://www.caam.rice.edu/~kristyn/parpack_home.html
- Lehoucq, Maschhoff, et al.
Citation Context ...agnetics code in accelerator design. In this application the interior eigenvalues and eigenvectors of a large sparse generalized eigenvalue problem are needed. We integrated SuperLU_DIST with PARPACK [44], a parallel Lanczos code, to construct a shift-and-invert eigensolver. For a system of order 1.3 million, PARPACK needs about 4.5 solves for each eigenpair. For each solve, SuperLU_DIST takes 39 seco... |

2 |
Parallel Bipartite Matching for Sparse Matrix Computation
- Riedy, Demmel
Citation Context ...lel. Although it takes very little time, its parallelization would enhance memory scalability, and will be our future work. There is ongoing work by Riedy on a parallel bipartite matching algorithm [53]. We will use it in place of MC64 in the future. For now, we start with a copy of the entire matrix on each processor, and run steps (1) through (3) independently on each processor. The third column o... |

2 | A new pivoting strategy for Gaussian elimination - Olschowka, Neumaier - 1996 |

1 | SuperLU_DIST: A Scalable Sparse Direct Solver - Demmel, Gilbert, Li - 1999 |