F. L. Alvarado and R. Schreiber. Optimal parallel solution of sparse triangular systems. SIAM Journal on Scientific Computing, 14(2):446--460, March 1993.
....SuperLU [14], MUMPS [2], UMFPACK [13], PSPASES [24], and SPOOLES [5], among others). These libraries have focused primarily on speeding up the factorization step, and employ sophisticated methods for creating dense structure. Efforts to speed up the triangular solve step in these and other work [33, 32, 25, 18, 26, 36, 19, 1, 30] have focused on improving parallel scalability, whereas we address uniprocessor tuning exclusively here. For dense algorithms, a variety of sophisticated static models for selecting transformations and tuning parameters have been developed [12, 17, 28, 11, 42]. However, it is difficult to apply ....
F. L. Alvarado and R. Schreiber. Optimal parallel solution of sparse triangular systems. SIAM Journal on Scientific Computing, 14(2):446--460, March 1993.
....is proposed for shared memory architectures based on blocking techniques. We have been inspired by this work (originally intended for shared memory systems) in the development of the second algorithm presented in this paper. There are other more recent methods based on other ideas. Thus, in [1] they start from the representation of the triangular matrix L as a product of a few sparse factors, replacing the triangular solutions by a set of matrix vector products. Gupta and Kumar [7] start from the elimination tree of a Cholesky matrix in order to solve a triangular matrix based on the ....
Fernando L. Alvarado and Robert Schreiber. Optimal parallel solution of sparse triangular systems. SIAM J. Sci. Comput., 14(2):446--460, March 1993.
....is proposed for shared memory architectures based on blocking techniques. We have been inspired by this work (originally intended for shared memory systems) in the development of the second algorithm presented in this paper. There are other more recent methods based on other ideas. Thus, in [1] they start from the representation of the triangular matrix L as a product of a few sparse factors, replacing the triangular solutions by a set of matrix vector products. This idea is particularized in [13] to the triangular system that results from the Cholesky factorization ($A = LDL^T$) ....
Fernando L. Alvarado and Robert Schreiber. Optimal parallel solution of sparse triangular systems. SIAM J. Sci. Comput., 14(2):446--460, March 1993.
....triangular factor. This algorithm will later be used for testing our norm estimator. The inversion of sparse matrices is additionally associated with the problem of fill in. However, in the case of triangular matrices fill in can be avoided by storing the inverse in factored form as proposed by Alvarado and Schreiber (1993). We describe the details of this approach in Section 4.1 and illustrate problems that can occur when we try to detect ill conditioning from the factored form. We show the reliability of our incremental norm estimator in Section 5, by presenting results obtained from a variety of dense and sparse ....
....form, that is the ill conditioning is hidden by this implicit representation of the inverse. From this example, we conclude that we need to calculate the inverse explicitly to avoid hiding the ill conditioning. For most matrices, it is not possible to do this without fill in, however, in Alvarado and Schreiber (1993), it is shown how the number of factors in the sparse factored form of the inverse can be reduced while still avoiding fill in so long as the matrix satisfies a certain condition. The original intention of Alvarado and Schreiber (1993) was to enhance parallelism in the solution of triangular ....
Alvarado, F. L. and Schreiber, R. (1993), `Optimal parallel solution of sparse triangular systems', SIAM J. Scientific Computing 14, 446--460.
.... in the sequential execution of the ETMSP power grid code [ETM93]. We would like to note that in applications where the solve is repeated several times, it may be more efficient to invert the triangular matrix, converting the solve into an easily parallelizable matrix vector multiplication [AS92]. It is possible to eliminate fill in during the inversion [PA92]. We have not investigated this technique in this paper. (Footnote: We estimated speedups assuming a sequential solve speed similar to that of uncached DAXPY on a single iPSC/860 node [Moy91].) 3 Experimental Platform. In this section, we ....
Fernando Alvarado and Robert Schreiber. Optimal parallel solution of sparse triangular systems. SIAM Journal of Scientific and Statistical Computation, 1992. to appear.
....and of SI. Section 4 contains concluding remarks. The rest of this section contains a brief survey of related research. Related Research. Improving the efficiency of parallel substitution schemes has been the subject of considerable research for dense [9, 12, 15, 16] and sparse linear systems [1, 2, 5, 18, 21]. For dense systems, this resulted in sophisticated pipelined schemes that do indeed show high efficiency for sufficiently large matrix sizes. In the sparse case, several researchers have considered inversion and matrix vector multiplication as an alternative to substitution schemes. We now ....
....for sufficiently large matrix sizes. In the sparse case, several researchers have considered inversion and matrix vector multiplication as an alternative to substitution schemes. We now provide a brief survey of the work related to parallel sparse triangular solution. Alvarado and Schreiber [1] consider solving in parallel a sparse triangular system for multiple right hand sides. The sparse system to be solved need not be associated with sparse matrix factors. Their partitioned inverse scheme (PI) replaces a triangular matrix $T$ by a representation of $T^{-1}$ that has no more ....
F. L. Alvarado and R. Schreiber, Optimal parallel solution of sparse triangular systems, SIAM J. Sci. Comput., 14 (1993), pp. 446--460.
....on substitution are surveyed by Gallivan et al. [9, Sec. 3.5], Heller [13], and Ortega and Voigt [16]. A new method has been developed recently for the parallel solution of sparse triangular systems with many right hand sides when these vectors are not necessarily available at the same time [1], [2], [3], [8]. The method involves representing the inverse of the coefficient matrix as a product of sparse factors, and can be explained as follows. If $L \in \mathbb{R}^{n \times n}$ is lower triangular we can write $L = L_1 L_2 \cdots L_n$, where $L_k$ differs from the identity matrix only in the kth column: $L_k$ ....
....since m is the number of serial steps in the parallel evaluation of x. Since we are assuming that many right hand sides are to be processed, we can afford to spend some computational effort in constructing the partition (1.2). Algorithms for finding a best no-fill partition (1.2) are described in [1, 2, 3]; such a partition has the smallest possible number of factors (the minimum value of m) subject to the requirement that each $G_k$ is invertible in place. A matrix $X$ is invertible in place if $(X^{-1})_{ij} = 0$ whenever $x_{ij} = 0$, for any assignment of (nonzero) numerical values to the nonzeros in ....
F. L. Alvarado and R. S. Schreiber, Optimal parallel solution of sparse triangular systems, SIAM J. Sci. Comput., 14 (1993), pp. 446--460.
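The elementary-factor representation quoted in the excerpt above can be illustrated with a minimal Python sketch. It assumes a small dense list-of-lists matrix with a nonzero diagonal, and the helper names (`elementary_factors`, `invert_elementary`, `solve_by_factors`) are hypothetical, not taken from any of the cited papers. Each inverse factor keeps the nonzero pattern of its factor, so applying the inverses one after another solves the system using matrix-vector products only.

```python
def elementary_factors(L):
    """Return [L_1, ..., L_n], where L_k is the identity except that its
    k-th column (diagonal and below) is copied from lower-triangular L."""
    n = len(L)
    factors = []
    for k in range(n):
        Lk = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
        for i in range(k, n):
            Lk[i][k] = float(L[i][k])
        factors.append(Lk)
    return factors


def invert_elementary(Lk, k):
    """Invert an elementary factor; the result has the same nonzero pattern."""
    n = len(Lk)
    inv = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    pivot = Lk[k][k]
    inv[k][k] = 1.0 / pivot
    for i in range(k + 1, n):
        if Lk[i][k] != 0.0:
            inv[i][k] = -Lk[i][k] / pivot
    return inv


def matvec(A, x):
    """Dense matrix-vector product."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def matmul(A, B):
    """Dense matrix-matrix product (used only to check L = L_1 ... L_n)."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def solve_by_factors(L, b):
    """Solve L x = b as x = L_n^{-1} ... L_1^{-1} b."""
    x = [float(v) for v in b]
    for k, Lk in enumerate(elementary_factors(L)):
        x = matvec(invert_elementary(Lk, k), x)
    return x


# Illustrative data (an assumption, not from the cited papers).
L = [[2.0, 0.0, 0.0],
     [1.0, 1.0, 0.0],
     [0.0, 3.0, 4.0]]
b = [2.0, 3.0, 14.0]
x = solve_by_factors(L, b)
```

With one factor per column this is ordinary forward substitution written as n serial steps; the partitioning algorithms mentioned in the excerpt aim to merge consecutive factors so that fewer serial steps are needed while each merged factor stays invertible in place.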
....An implementation based on this approach and computational results are given in [10]. Partitioned inverse. One can determine a product decomposition of L; for example, $L = \prod_{i=1} L_i$ (4.2), where the nonzero structure, $S$, of the product elements satisfies $S(L_i) = S(L_i^{-1})$ [1, 2]. The inversion of L can be performed with matrix products once the partitioned inverse is formed. We note that this can always be done with a pointwise coloring, where is the number of colors used. It has been observed by Robert Schreiber [18] that the partitioned inverse approach can reduce the ....
F. L. Alvarado and R. Schreiber, Optimal parallel solution of sparse triangular systems, SIAM Journal on Scientific and Statistical Computing, (to appear).
....on highly parallel machines like the Connection Machine CM2. It has been recognized that the triangular matrix can be symmetrically permuted to minimize the number of factors, and hence several strategies for minimizing t over appropriate permutations of L have been considered in previous work [2, 11]. Minimizing t over all symmetric permutations of L for which the permuted matrix remains lower triangular gives rise to a directed acyclic graph (DAG) partitioning problem [2]. After introducing some notation, we discuss this problem in some detail, after which we proceed with a description of ....
.... hence several strategies for minimizing t over appropriate permutations of L have been considered in previous work [2, 11]. Minimizing t over all symmetric permutations of L for which the permuted matrix remains lower triangular gives rise to a directed acyclic graph (DAG) partitioning problem [2]. After introducing some notation, we discuss this problem in some detail, after which we proceed with a description of the closely related partitioning problem addressed in this paper. Let $G_d = (V, F)$ be the directed graph of the matrix L with vertices $V = \{1, \ldots, n\}$ corresponding to the ....
F. L. Alvarado and R. S. Schreiber, Optimal parallel solution of sparse triangular systems. SIAM J. Sci. Stat. Comput., to appear, 1992.
....induced by each R i is transitively closed; and 3. t is minimum over partitions of all DAGs obtained from PEOs of G. Problem 1 and a simpler DAG partitioning problem arose in the design of algorithms for solving sparse triangular systems of equations on highly parallel computers. The papers [2, 12, 18, 20] discuss various aspects of this problem, and a survey is provided in [1]. An algorithm for solving this partitioning problem in time and space O(jV j jEj) has been described in [18]. This greedy algorithm eliminates all vertices that are eligible for elimination at each step; hence the set ....
F. L. Alvarado and R. S. Schreiber, Optimal parallel solution of sparse triangular systems, SIAM J. Sci. Comput., 14 (1993), pp. 446--460.
.... $L^{-1}$, or better a partitioned form of this to avoid some of the fill in that would be associated with forming $L^{-1}$ explicitly [5]. Various schemes for this partitioning have been proposed to balance the parallelism (limited by the number of partitions) with the fill in (for example, [3, 4, 75]) and, more recently, the selective inversion of submatrices produced by a multifrontal factorization algorithm has been proposed [78]. 6 Current situation. There is no question that direct sparse matrix algorithms and codes based on them have come of age. Gone are the days when the only sparse ....
F. L. Alvarado and R. Schreiber. Optimal parallel solution of sparse triangular systems. SIAM J. Scientific Computing, 14:446--460, 1993.
....a nonzero in column j. Element i of the right hand side vector, b, is also kept with DAG node i, as is element i of the result vector x. A.4 Partitioned Inverses. Partitioned inverses is an alternative method of sparse triangular solution that has been shown to greatly enhance parallelism [AS93], [APS93], [CS95]. In general, the system $Lx = b$ can be solved through the computation $x = L^{-1} b$. Unfortunately, as Figure A.2 shows, the calculation of $L^{-1}$ can lead to fill, nonzeros that were not in the original sparse structure of L. Fill occurs because $L^{-1}$ must have ....
....transitive closure. $L_1^{-1} L_0^{-1} b$. Figure A.4 shows how we represent this calculation as a DAG. In general, $Lx = b$ may be solved by computing $x = \prod_i L_i^{-1} b$, where the $L_i$'s are chosen to have transitively closed column subgraphs. We use the rp2 algorithm given in [AS93] to choose our partitions. Note that rp2 may also reorder L into other lower triangular matrices in an attempt to find better partitions. A.5 Substitution vs. Partitioned Inverses. The key issue in choosing a technique is fill. Triangular factors used in direct solution contain much fill, which ....
Fernando Alvarado and Robert Schreiber. Optimal parallel solution of sparse triangular systems. SIAM Journal of Scientific and Statistical Computation, 14:446--460, 1993.
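The contrast drawn in the excerpt above, fill in the explicit inverse versus no fill in a partitioned inverse, can be seen on a toy example. The following Python sketch is illustrative only: the 4 x 4 matrix, the two-factor partition (chosen by hand here, not by the rp2 algorithm), and all helper names are assumptions introduced for this example.

```python
def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def matvec(A, x):
    """Dense matrix-vector product."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def lower_inverse(L):
    """Invert a nonsingular lower-triangular matrix, column by column,
    by forward substitution on the columns of the identity."""
    n = len(L)
    inv = [[0.0] * n for _ in range(n)]
    for c in range(n):
        for i in range(c, n):
            rhs = 1.0 if i == c else 0.0
            s = sum(L[i][j] * inv[j][c] for j in range(c, i))
            inv[i][c] = (rhs - s) / L[i][i]
    return inv


def fill_positions(L, Linv):
    """Positions that are nonzero in Linv but zero in L."""
    n = len(L)
    return [(i, j) for i in range(n) for j in range(n)
            if Linv[i][j] != 0.0 and L[i][j] == 0.0]


def column_block(L, cols):
    """Identity with the listed columns replaced by those of L. For a run of
    consecutive columns of a lower-triangular L this equals the product of
    the corresponding elementary column factors."""
    P = identity(len(L))
    for j in cols:
        for i in range(len(L)):
            P[i][j] = L[i][j]
    return P


# Toy unit lower-triangular matrix with DAG edges 0->2, 1->2, 2->3.
L = [[1.0, 0.0, 0.0, 0.0],
     [0.0, 1.0, 0.0, 0.0],
     [2.0, 3.0, 1.0, 0.0],
     [0.0, 0.0, 5.0, 1.0]]
b = [1.0, 1.0, 1.0, 1.0]

# The explicit inverse fills in positions (3, 0) and (3, 1).
fill_full = fill_positions(L, lower_inverse(L))

# Hand-picked partition L = P0 * P1; each factor inverts without fill.
P0 = column_block(L, [0, 1])
P1 = column_block(L, [2, 3])
fill_parts = (fill_positions(P0, lower_inverse(P0))
              + fill_positions(P1, lower_inverse(P1)))

# x = P1^{-1} P0^{-1} b: two matrix-vector products instead of substitution.
x_partitioned = matvec(lower_inverse(P1), matvec(lower_inverse(P0), b))
x_direct = matvec(lower_inverse(L), b)
```

In this toy case two factors suffice, whereas substitution proceeds column by column; how many factors a real matrix needs depends on its structure and on the partitioning algorithm used.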
....operation (flop) as $\alpha = \frac{1}{2}(\mathrm{Mult} + \mathrm{Add})$, the peak speed of a single processor element is 0.1412 Mflops for 64 bit arithmetic. A 16K processor machine would thus have the peak performance of 2314 Mflops. Comparing the speed of arithmetic to communication on the MP-2, we obtain the ratio $\mathrm{Xnet}[1] / \alpha = 0.8$. Thus floating point arithmetic is actually more expensive (by 20%) than sending the value to the nearest neighbor in the processor array. 5.2 The algorithm. Let A be an $M \times N$ sparse matrix and x a vector of length N. We have implemented an algorithm for calculating the matrix vector ....
....that the same structural matrix is multiplied many times. If this is so then the indices have only to be calculated before the first time. The complete algorithm is shown in Figure 3. In the algorithm the variables iy and ix contain the row and column index of each processor. The statement sendE[1].z = z is the use of the Xnet primitive, where E gives the direction of sending (in this case east). 5.3 Load balancing. The expected execution time of the program in Figure 3 is given by the following formula: $\gamma_1 m \lceil N/(mn) \rceil + \gamma_2 \max_{i,j} |A_{ij}| + \gamma_3 n \lceil M/(mn) \rceil$ (17). The first term ....
F. L. Alvarado and R. Schreiber, Optimal parallel solution of sparse triangular systems, SIAM J. Sci. Statist. Comput., 14 (1993).
.... $L^{-1}$, or better a partitioned form of this to avoid some of the fill in that would be associated with forming $L^{-1}$ explicitly [6]. Various schemes for this partitioning have been proposed to balance the parallelism (limited by number of partitions) with the fill in (for example, [4, 5, 161]) and, more recently, the selective inversion of submatrices produced by a multifrontal factorization algorithm has been proposed [167]. Effect of ordering schemes. We now turn to the crucial issue of ordering the rows and columns (choosing the pivots) to preserve sparsity and to exploit ....
.... matrices that arise from problems in electromagnetism [2]. For some classes of problems, it may be attractive to construct the explicit inverses of the LU factors, even if these are considerably less sparse than the factors L and U, because such a factorization can be more efficient in parallel [5]. An incomplete form of this factorization for use as a preconditioner was proposed in [3]. Preconditioning by Blocks or Domains. Other preconditioners that use direct methods are those where the direct method, or an incomplete version of it, is used to solve a subproblem of the original ....
F. L. Alvarado and R. Schreiber. Optimal parallel solution of sparse triangular systems. SIAM J. Scientific Computing, 14:446--460, 1993.
....$U_k$ is the product of a set of consecutive elementary matrices $U_k = (I - e_i u_i^T) \cdots (I - e_j u_j^T)$ such that there is no or little fill in. When U is not so sparse, e.g. when U stems from a complete factorization, the number of terms in (5.1) can be quite moderate (see [2]). Unfortunately this is not true when U is as sparse as it is when an incomplete factorization was used. Three algorithms for discarding fill (sparsification) are presented: 1) numerical dropping on exact value (SW x ILU i ) 2) positional dropping on fill level (IW j ILU i ) 3) ....
F.L. Alvarado and R. Schreiber. Optimal parallel solution of sparse triangular systems. SIAM J. Sci. Comput., 14(2):446--460, March 1993.
....substitution. This was accomplished by representing the computation as a directed acyclic graph (DAG) that was prescheduled by dominant sequence clustering (DSC) [2]. Unfortunately, when L is a complete factor, parallelism is extremely limited. This study uses the method of partitioned inverses [3], [4] to increase available parallelism over substitution, commonly by an order of magnitude. We also represent our partitioned inverses computation as a DAG and preschedule using DSC. Our preliminary results exhibit dramatically improved speedups on the CM5 and CM5E. Contact: ftchong lcs.mit.edu, ....
....to any extra arcs added to our DAG representation by transitive closure. Extra arcs are shown as dashed arrows and fill is shown as empty circles. 3 Partitioned Inverses. Partitioned inverses is an alternative method of sparse triangular solution that has been shown to greatly enhance parallelism [3], [4]. In general, the system $Lx = b$ can be solved through the computation $x = L^{-1} b$. Unfortunately, as we can see in Figure 2, the calculation of $L^{-1}$ can lead to fill, nonzeros that were not in the original sparse structure of L. Fill occurs because $L^{-1}$ must have the ....
F. Alvarado and R. Schreiber, "Optimal parallel solution of sparse triangular systems," SIAM Journal of Scientific and Statistical Computation, vol. 14, pp. 446--460, 1993.
....of r ensures that inclusion of the vertex r in the current factor $P_k$ will continue to make $G(P_k)$ transitively closed, and thus $P_k$ will be invertible in place. Alvarado, Yu, and Betancourt did not consider the issue of optimality, but later it was proved by Alvarado and Schreiber [2] that Algorithm P1 solves problem (Pr1). Best reordered partitions. Now we describe Algorithm RP1 that solves the reordered partitioning problem (Pr2). A vertex v in the DAG G(L) is a source if there are no edges directed into v: i.e. there are no edges (u, v). The level of a vertex v is the ....
....an appropriate symmetric permutation Q to minimize the number of factors. Conditions 1a and 1b in the algorithm ensure that the first condition of problem (Pr2) is satisfied; similarly condition 2 ensures that the column subgraphs of the factors are transitively closed. Alvarado and Schreiber [2] proved that Algorithm RP1 finds a best reordered partition. The time complexity of the algorithm is dominated by the checking of condition 2: in the worst case, this cost is $\sum_{v \in V} d_I(v) d_O(v)$, where $d_I(v)$ is the indegree and $d_O(v)$ is the outdegree of v. Since $d_I(v) \le n - 1$, and ....
F. L. Alvarado and R. Schreiber, Optimal parallel solution of sparse triangular systems. SIAM J. Sci. Stat. Comput., to appear, 1992.