| L.F. Romero and E.L. Zapata. Data Distributions for Sparse Matrix Vector Multiplication. J. Parallel Computing, 21(4):583--605, April 1995. |
....a good balance in the computational work. Recursive bisection is a well known optimisation technique, which has been used for instance in parallel circuit simulation, see Fox et al. [17, Chapt. 22]. This technique can also be used to partition matrices, as has been done by Romero and Zapata [35] to achieve good load balance in sparse matrix vector multiplication. In the present work, we bring the techniques discussed above together, hoping to obtain a more efficient sparse matrix vector multiplication. Our primary focus is the general case of a sparse rectangular matrix with input and ....
L. F. Romero and E. L. Zapata, Data distributions for sparse matrix vector multiplication, Parallel Computing, 21 (1995), pp. 583--605.
....we have decided to concentrate basically on its application to preconditioned iterative methods. The main operation of these methods is the sparse matrix vector product. This has forced us to use an adequate data distribution for this operation, such as the Block Row Scatter (BRS) distribution [10]. This distribution obtains very good results in this operation, making optimal use of the memory resources and providing a good computational load balance. 2 Description of the methods We consider the solution of a lower triangular system Lx = b, where L is the coefficient matrix of the linear ....
L. F. Romero and E. L. Zapata. Data distributions for sparse matrix vector multiplication. Parallel Computing, (21):583--605, April 1995.
....to the rest of the operations to avoid data redistribution between different steps. The main operation of Krylov subspace methods is the sparse matrix vector product. This has forced us to use an adequate data distribution for this operation, such as the Block Column Scatter (BCS) distribution [11]. This distribution yields good results in this operation, making optimal use of the memory resources and providing a good computational load balance. It employs a cyclic projection of the matrix onto P × Q processors. The matrix is subdivided according to a template of size P × Q, and ....
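The cyclic projection mentioned in the context above can be illustrated with a minimal Python sketch. It assumes the P × Q template is applied element by element, so that entry (i, j) goes to processor (i mod P, j mod Q); the quoted text is truncated and does not confirm this detail. The names `cyclic_owner` and `distribute_nonzeros` are hypothetical and do not come from the cited paper.

```python
from collections import defaultdict

def cyclic_owner(i, j, P, Q):
    """Processor coordinates owning entry (i, j) under an assumed
    element-wise P x Q cyclic template."""
    return (i % P, j % Q)

def distribute_nonzeros(entries, P, Q):
    """Group (i, j, value) nonzeros by the processor that owns them."""
    local = defaultdict(list)
    for i, j, v in entries:
        local[cyclic_owner(i, j, P, Q)].append((i, j, v))
    return dict(local)

# Illustrative 4 x 4 matrix given as coordinate triples (made-up data).
entries = [(0, 0, 4.0), (0, 3, 1.0), (1, 1, 5.0), (2, 2, 6.0), (3, 0, 2.0)]
parts = distribute_nonzeros(entries, P=2, Q=2)
```

Because ownership is a closed-form function of the indices, any processor can locate a non-local entry without a translation table.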
L. F. Romero and E. L. Zapata. Data distributions for sparse matrix vector multiplication. Parallel Computing, (21):583--605, April 1995.
....we have decided to concentrate basically on its application to preconditioned iterative methods. The main operation of these methods is the sparse matrix vector product. This has forced us to use an adequate data distribution for this operation, such as the Block Row Scatter (BRS) distribution [14]. This distribution obtains very good results in this operation, making optimal use of the memory resources and providing a good computational load balance. 2 Description of the methods We consider the solution of a lower triangular system Lx = b, where L is the coefficient matrix of the linear ....
L. F. Romero and E. L. Zapata. Data distributions for sparse matrix vector multiplication. Parallel Computing, (21):583--605, April 1995.
....of large sparse matrices on a two dimensional mesh processor array; J. Anderson et al. proposed a modification using a less compact data structure and a heuristic scheduling procedure that enabled them to exploit more parallelism and reduce the amount of interprocessor communication [2]. In [11], a sparse matrix is decomposed into submatrices with an equal number of non zero elements (multiple recursive decomposition); however, the different sizes (in general) of the individual submatrices make this method unsuitable for SIMD computers. A randomized packing algorithm proposed in [10] ....
.... A randomized packing algorithm proposed in [10] divides the matrix into submatrices of equal size (except, possibly, the ones on the boundaries of the matrix). This algorithm is conceptually similar to (in the sense that it divides the matrix into submatrices) the block row scatter method [11], and is designed for a SIMD machine. It is evaluated in Section 5, along with the segmented scan method [5] and the snake like method [8], [11]; these three algorithms were found to be the most competitive for the matrices associated with the intended application. 3. SPARSE MATRIX VECTOR ....
[Article contains additional citation context not shown here]
L. F. Romero and E. L. Zapata, Data Distributions for Sparse Matrix Vector Multiplication, Technical report, Department of Computer Architecture, University of Malaga, Spain, 1993; also published in the Proceedings of the Fourth International Workshop on Compilers for Parallel Computers, (1993).
....A which is assumed to be known in advance. This approach is justified if one has to compute many products involving the same matrix, or involving matrices with the same nonzero structure. The development of efficient partition schemes is a very active area of research; for some recent results see [3, 4, 5, 14, 15]. In this paper we prove lower bounds which also hold for algorithms which partition the components of x and y on the basis of the nonzero structure of A. First, we introduce four probability measures on the class of n × n sparse matrices with Θ(n) nonzero elements. Then, we show that ....
L. Romero and E. Zapata. Data distributions for sparse matrix vector multiplication. Parallel Computing 21:583--605, 1995.
....A which is assumed to be known in advance. This approach is justified if one has to compute many products involving the same matrix, or involving matrices with the same nonzero structure. The development of efficient partition schemes is a very active area of research; for some recent results see [3, 4, 5, 14, 15]. In this paper we prove lower bounds which also hold for algorithms which partition the components of x and y on the basis of the nonzero structure of A. First, we introduce four probability measures on the class of n × n sparse matrices with Θ(n) nonzero elements. Then, we show that with high ....
L. Romero and E. Zapata. Data distributions for sparse matrix vector multiplication. Parallel Computing 21:583--605, 1995.
....because, in order to preserve load balancing, the data set has to be partitioned proportionally to the computational power of each node. Such a partitioning problem presents some similarities to that of mapping the data set of irregular computations onto homogeneous platforms (e.g. MRD strategy [25], irreg distribution in CHAOS [24]). However, distribution methods for sparse matrix computations on homogeneous platforms require data structures, such as index translation tables, that are not suitable for dense computations on heterogeneous systems. A comparative analysis between the solutions ....
....the ADD partitioning algorithm with the purpose of supporting computations the amount of work of which is not proportional to the number of elements in their data set. We can now compare ADD against two known supports for irregular and sparse computations, that are the CHAOS library [24] and MRD [25]. The CHAOS project aims to provide a library of primitives and runtime supports by which the HPF supercompiler can deal with unstructured computations. With respect to ADD, which effectively supports irregular data distributions for heterogeneous and variable platforms, CHAOS has much wider ....
[Article contains additional citation context not shown here]
L.F. Romero and E.L. Zapata, Data distribution for sparse matrix vector multiplication,
No context found.
L.F. Romero and E.L. Zapata. Data Distributions for Sparse Matrix Vector Multiplication. J. Parallel Computing, 21(4):583--605, April 1995.
No context found.
L.F.Romero and E.L.Zapata. Data Distributions for Sparse Matrix Vector Multiplication. Parallel Computing, 21(4):583--605, 1995.
No context found.
Romero, L.F., Zapata, E.L.: Data Distributions for Sparse Matrix Vector Multiplication. Parallel Computing, 21(4) (1995) 583-605
No context found.
L.F. Romero and E.L. Zapata. Data Distributions for Sparse Matrix Vector Multiplication. Parallel Computing, 21(4):583--605, 1995.
.... MVPRODUCT is a set of basic sparse matrix operations including sparse matrix vector multiplication and the product and sum of two sparse matrices [3, 12]. The representation of the sparse matrices employs two different schemes: compressed row storage (CRS) and compressed column storage (CCS) [22]. The access pattern is demonstrated by the following code abstract:
    do i=1,anr
      do k=1,bnc
        do ja=ar(i),ar(i+1)-1
          do jb=bc(k),bc(k+1)-1
            if (ac(ja).eq.br(jb)) then
              c(i,k) = c(i,k) + ad(ja)*bd(jb)
            endif
          enddo
        enddo
      enddo
    enddo
Here indirection occurs on the right hand side of the ....
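The access pattern quoted above, a CRS matrix multiplied by a CCS matrix, can be sketched in Python as follows. This is a hedged reading of the extracted loop: the operators lost in extraction are assumed to be the usual ones, indices are shifted to 0-based, and the function name `crs_times_ccs` is hypothetical. The array names follow the quoted snippet.

```python
def crs_times_ccs(anr, bnc, ar, ac, ad, bc, br, bd):
    """Dense result C = A * B, with A in CRS (row pointers ar, column
    indices ac, values ad) and B in CCS (column pointers bc, row
    indices br, values bd). All indices are 0-based here."""
    c = [[0.0] * bnc for _ in range(anr)]
    for i in range(anr):                          # rows of A
        for k in range(bnc):                      # columns of B
            for ja in range(ar[i], ar[i + 1]):    # nonzeros in row i of A
                for jb in range(bc[k], bc[k + 1]):  # nonzeros in column k of B
                    if ac[ja] == br[jb]:          # matching inner index
                        c[i][k] += ad[ja] * bd[jb]
    return c

# A = [[1, 0], [2, 3]] in CRS, B = [[4, 0], [0, 5]] in CCS (made-up data).
result = crs_times_ccs(
    anr=2, bnc=2,
    ar=[0, 1, 3], ac=[0, 0, 1], ad=[1.0, 2.0, 3.0],
    bc=[0, 1, 2], br=[0, 1], bd=[4.0, 5.0],
)
```

The indirection through `ac` and `br` is what makes the memory access pattern data dependent.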
L.F. Romero and E.L. Zapata. Data Distributions for Sparse Matrix Vector Multiplication. J. Parallel Computing, 21(4):583--605, April 1995.
....minimize memory usage. In regard with the data distribution, a stripped ordering results in a thin banded diagonal which will produce the parallel distribution with less communication expenses. The Multiple Recursive Distribution (MRD) has higher locality, which gives rise to a better cache usage [2]. Both have been used in this work with good results, but the choice of the method will depend of the scenery and the computational platform. 4 Solving the System: Preconditioned Conjugate Gradient In the Solve System procedure, the unknowns v are computed. As stated above, implicit integration ....
Romero, L.F., Zapata E.L.: Data Distribution for Sparse Matrix Vector Multiplication. Parallel Computing Vol. 21, pp. 583-605, 1995.
....of the matrix is used in order to minimize memory usage. A striped ordering results in a thin banded diagonal which will produce the parallel distribution with less communication expenses. The Multiple Recursive Distribution (MRD) has higher locality, which will result in a better cache usage [2]. Both have been used in this work with good results, but the choice of the method will depend of the scenery and the computational platform. 3.1 Forces In the core of the forces evaluation stage, loops like the one presented below are found. for (i=0; i<NumForces; i++) partic0 = ....
Romero, L.F., Zapata E.L.: Data Distribution for Sparse Matrix Vector Multiplication. Parallel Computing Vol. 21, pp. 583--605, 1995.
....to the sparse matrix itself [30]. However, having such directive permit to define sparse data distributions with the property of, on the one hand, exploiting data locality and, on the other hand, simplifying the location of non local data. We have proposed the directive SPARSE to fill this gap [4,46]. For instance, for the example shown in figure 1 (a) we would write !HPF$ SPARSE(CRS(Data,Col,Row)) :: A(N,N) to express that Data, Col and Row are the arrays associated with the CRS representation of a sparse matrix. In addition, the sparse matrix is giving a name, A in the example shown. ....
....both nonzero and zero elements are considered for distribution. As a consequence of this specification, the arrays Data, Col and Row will be mapped to the processors in such a way that the above distribute directive is obeyed. This kind of sparse distribution was called BRS (Block Row Scatter) in [46,4]. For mapping purposes, the sparse matrix is dealt as a dense matrix. Hence the workload is balanced using similar techniques than if the code were dense. Besides the compiler can derive simple formulas to determine the location of each sparse data entry. Communication schedules are easy to ....
[Article contains additional citation context not shown here]
L.F. Romero and E.L. Zapata, Data Distributions for Sparse Matrix Vector Multiplication, J. Parallel Computing , 21 (4) (April 1995) 583-605.
....is assigned the non zeros that lie in its region. Thus, processors may be assigned regions of unequal sizes but the indexes of each region can be described in a regular manner. There are many distributions that satisfy this condition. Two of these are: • Multiple Recursive Decomposition (MRD) [7] recursively decomposes the sparse matrix over P processors using horizontal and vertical partitions, until the matrix has been decomposed into P1 × P2 rectangular submatrices. At each stage of the partitioning process, the non zeros in the submatrix of that stage are divided as evenly as ....
....using horizontal and vertical partitions, until the matrix has been decomposed into P1 × P2 rectangular submatrices. At each stage of the partitioning process, the non zeros in the submatrix of that stage are divided as evenly as possible (see Figure 2). • Block Row Scatter (BRS) [7] uses a cyclic mapping of the matrix among P processors. The matrix is subdivided using a stencil of size P1 , and each processor gets the non zero elements matching its position in the stencil (see Figure 3). This is similar to scatter decomposition distribution schemes and is useful in situations ....
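The recursive decomposition described in the context above can be sketched in Python. This is a minimal illustration under stated assumptions, not the cited authors' algorithm: it alternates horizontal and vertical cuts, performs a fixed number of bisection levels (so the number of regions is a power of two), and picks each cut by exhaustive search. The names `best_cut` and `mrd` are hypothetical.

```python
def best_cut(coords, lo, hi):
    """Cut position c in (lo, hi) that splits coords into the parts
    < c and >= c with counts as even as possible (first best wins)."""
    best_c, best_diff = lo + 1, None
    for c in range(lo + 1, hi):
        left = sum(1 for x in coords if x < c)
        diff = abs(len(coords) - 2 * left)
        if best_diff is None or diff < best_diff:
            best_c, best_diff = c, diff
    return best_c

def mrd(entries, rows, cols, depth, axis=0):
    """Recursively bisect the half-open region rows x cols, alternating
    axis 0 (horizontal cut) and axis 1 (vertical cut).
    Returns a list of (rows, cols, entries) leaf regions."""
    lo, hi = rows if axis == 0 else cols
    if depth == 0 or hi - lo < 2:
        return [(rows, cols, entries)]
    cut = best_cut([e[axis] for e in entries], lo, hi)
    first = [e for e in entries if e[axis] < cut]
    second = [e for e in entries if e[axis] >= cut]
    if axis == 0:
        halves = [((lo, cut), cols, first), ((cut, hi), cols, second)]
    else:
        halves = [(rows, (lo, cut), first), (rows, (cut, hi), second)]
    out = []
    for r, c, part in halves:
        out.extend(mrd(part, r, c, depth - 1, 1 - axis))
    return out

# Illustrative 4 x 4 matrix with 8 nonzeros as (row, col, value) triples.
entries = [(0, 0, 1.0), (0, 1, 1.0), (0, 2, 1.0), (1, 1, 1.0),
           (2, 2, 1.0), (3, 0, 1.0), (3, 3, 1.0), (1, 3, 1.0)]
regions = mrd(entries, (0, 4), (0, 4), depth=2)
counts = [len(part) for _, _, part in regions]
```

The regions end up with unequal sizes, yet each one is a rectangle described by two index ranges, which matches the "regular manner" condition stated in the quoted text.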
L.F. Romero and E.L. Zapata, Data distributions for sparse matrix vector multiplication solvers, Parallel Computing, Vol. 21, no. 4, pp. 583-605, April 1995.
No context found.
L.F. Romero and E.L. Zapata, Data distributions for sparse matrix vector multiplication solvers, J. Parallel Computing, Vol. 21, pp. 583-605, 1995.
....distributions presented (MRD and BRS/BCS) were developed in our department. We have tested them mainly with sparse matrix algebra problems like Sparse Matrix Vector product and LU decomposition. A complete explanation with demonstration of their statistical properties can be found in [3, 4]. 2 Multiple Recursive Decomposition distribution Berger and Bokhari proposed the Binary Recursive decomposition (BRD) in [5] for partitioning irregular problems. With that strategy the matrix is recursively split, bisecting successively by each dimension of the matrix, until we have as many ....
L. F. Romero and E. L. Zapata. Data distributions for sparse matrix vector multiplication solvers. Parallel Computing, 1994. To appear.
.... scheme the combination of these two aspects (data structure data distribution). We have developed and extensively tested a number of pseudo regular distribution schemes for sparse problems, which combines natural extensions of regular data distributions with compressed data storages [2], [4], [23], [25], [26]. These distribution schemes can be incorporated to a data parallel language (HPF) in a simple way. The programmer can use them easily and obtain high efficiencies from the parallelization of irregular codes. The above mentioned distribution schemes are faced to static sparse problems ....
....defines an array (or two) of pointers to the above items. Once storage schemes have been defined, we can use the SPARSE directive to specify that a sparse matrix (or sparse array) is stored using a particular linked list scheme. This directive was previously introduced, for instance in [2] and in [23], in the context of static sparse applications. Fig. 5 shows the BNF syntax for the dynamic SPARSE directive. The first two data structures, LLRS and LLCS, are defined by two arrays of pointers ( pointer array name ) which point to the beginning and to the end, respectively, of each row (or ....
[Article contains additional citation context not shown here]
L.F. Romero and E.L. Zapata. Data Distributions for Sparse Matrix Vector Multiplication. Parallel Computing, 21(4):583--605, 1995.
No context found.
L.F. Romero and E.L. Zapata. Data distributions for sparse matrix vector multiplication solvers. Journal of Parallel Computing (To appear).
....of each region can be described in a regular manner. Each region is again a sparse matrix which is locally represented on each processor using the very same format than in the global case. Distribution strategies fulfilling such conditions are: • The Multiple Recursive Decomposition (MRD) [1, 15] recursively decomposes the sparse matrix over P processors using horizontal and vertical partitions, until the matrix has been de .... [Figure 3: Main data structure for the ....]
....At each stage of the partitioning process, the nonzeros in the submatrix of that stage are divided as evenly as possible (see Figure 4). For a CRS local representation, the MRD CRS distribution scheme is originated, and similarly for MRD CCS with respect to CCS. • The Block Row Scatter (BRS) [1, 15] uses a cyclic mapping of the matrix represented by CRS among P processors. The matrix is subdivided using a stencil of size P 1 , and each processor gets the non zero elements matching its position in the stencil. For situations where the matrix is represented by CCS, the Block Column Scatter ....
L.F. Romero and E.L. Zapata. Data distributions for sparse matrix vector multiplication solvers. Journal of Parallel Computing vol. 21, no. 4, April 1995, pp. 583-605.
....dimension of matrix M. We find parallel Givens algorithms in [2] and [14]. Matrix M is distributed onto a mesh with m × n PEs. Each PE is identified by coordinates (idx,idy) with 0 ≤ idx < n and 0 ≤ idy < m. Nonzero elements of M are mapped over PEs using a Block Column Scatter (BCS) scheme [10], but these elements are stored in doubly linked lists instead of vectors. This distribution provides data and load balancing. The algorithm requires access both by rows and by columns; a data structure such as a two dimensional doubly linked list (used in [11] for a LU factorization) would be ....
L.F.Romero and E.L.Zapata. Data Distributions for Sparse Matrix Vector Multiplication. Parallel Computing, 21(4):583--605, 1995.
No context found.
L.F. Romero and E.L. Zapata. Data Distributions for Sparse Matrix Vector Multiplication. J. Parallel Computing, 21(4):583--605, April 1995.
No context found.
L.F. Romero and E.L. Zapata. Data Distributions for Sparse Matrix Vector Multiplication. J. Parallel Computing, 21(4):583--605, April 1995.