11 citations found.
R.C. Agarwal, F.G. Gustavson, and M. Zubair. A high performance algorithm using pre-processing for the sparse matrix-vector multiplication. In Supercomputing '92, pages 32--41, Minneapolis, Minnesota, November 16--21, 1992.

This paper is cited in the following contexts:
Self-adapting Numerical Software for Next Generation.. - Dongarra, Eijkhout (2002)   (2 citations)

....manner. Intermediately sparse storages will require development of hybrid kernels, whose development should springboard directly from the dense work. The second approach to exploiting input data structure involves having the software recognize sparsity patterns without user intervention, as in [37, 2]. Recognizing contiguous pieces of data in the original user's operands, such as subdiagonals or rectangular blocks, allows the computational kernel to maximize the data reuse at the highest levels of the memory hierarchy. In this way, recognizing patterns in a sparse matrix can lead to large ....

.... coupled with dense code to generate sparse codes, as in [9, 10, 11, 39, 38]. Work that will be more immediately usable involves generating optimal kernels based on known storage sparsity patterns, as in [30, 11, 37, 45]. Finally, some work has also been done on performing sparsity analysis, as in [37, 2]. Just as in our work on dense kernels, we will explore these research results within the SANS framework in order to discover how to make the performance gains they lead to portable. Historically, we have found that this involves extending many of these areas of research. In particular, research ....

R. Agarwal, F. Gustavson, and M. Zubair. A High-Performance Algorithm Using Preprocessing for the Sparse Matrix-Vector Multiplication. In Proceedings of the International Conference on SuperComputing, pages 32--41, 1992.


Level 3 Basic Linear Algebra Subprograms for Sparse.. - Duff, Marrone.. (1997)   (3 citations)

....on the number of right hand sides or the number of columns in the full matrix (K, say). The entries of the sparse matrix are used K times, therefore as K increases the cost of indirectly addressing them reduces. When K is very large, a full vector model of computation gives high performance (Agarwal, Gustavson & Zubair 1992). Similarly, it can be helpful to provide CSDP with information on the number of subsequent calls that will be made using the output structure. If the computation is to be repeated a large number of times, it is more worthwhile to spend extra time in the data preprocessing phase in order to obtain ....

....depending on the storage format. For example, see results of Erhel (1990) or the Stripped Jagged Diagonals scheme (Paolini & Radicati di Brozolo 1989) that has proven efficient on an IBM 3090 VF computer. A good example of the influence of data structures on machine performance is given by Agarwal et al. (1992). 6 Permutations In order to avoid permutations on each vector algebra operation, we allow permutations on the data structures outside the loop of the iterative algorithm. We believe that this can be accomplished using only a column permutation of a sparse matrix and a row permutation of a full ....

Agarwal, R. C., Gustavson, F. G. & Zubair, M. (1992), A high performance algorithm using pre-processing for the sparse matrix-vector multiplication, in ACM/IEEE, ed., `Proceedings of Supercomputing '92, Minneapolis, MN. Nov 16-20, 1992.', IEEE Computer Society Press, Los Alamitos, California, pp. 32--41.
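
Illustrative note on the first context above: it observes that with K right hand sides each sparse entry is used K times, so the cost of indirect addressing is spread over K updates. The following is a minimal Python sketch of that idea only, assuming a compressed sparse row layout with 0-based indices. The function name `csr_matmat` and the array names `val`, `indx`, and `pntr` are hypothetical choices for this example and are not taken from the cited papers.

```python
def csr_matmat(n_rows, val, indx, pntr, X):
    """Return Y = A X, with A in CSR form and X a dense list of rows."""
    k = len(X[0]) if X else 0
    Y = [[0.0] * k for _ in range(n_rows)]
    for i in range(n_rows):
        out_row = Y[i]
        for p in range(pntr[i], pntr[i + 1]):
            a = val[p]            # sparse entry fetched once
            x_row = X[indx[p]]    # one indirect lookup serves all k columns
            for j in range(k):    # the same entry is reused k times
                out_row[j] += a * x_row[j]
    return Y


# A = [[2, 0, 1],
#      [0, 3, 0]]
val = [2.0, 1.0, 3.0]
indx = [0, 2, 1]
pntr = [0, 2, 3]
X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]   # K = 2 right hand sides
Y = csr_matmat(2, val, indx, pntr, X)
```

The inner loop over `j` is the part that benefits from larger K: it runs over contiguous data and needs no further index lookups.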


Improving Performance of Sparse Matrix-Vector Multiplication - Pinar, Heath (1999)   (5 citations)

....memory requirement, because only one index per block is required. Moreover, the entry of the right hand side vector x can be used multiple times as opposed to once in the conventional scheme after a load operation. A similar problem has been studied in the context of vector processors [1], but those efforts concentrate on finding fewer blocks of larger size, whereas we are interested in much smaller blocks. 2.2 Blocked Compressed Row Storage In this section we propose a new sparse matrix storage scheme designed to reduce the number of load operations. The idea of this scheme is ....

R. C. Agarwal, F. G. Gustavson, and M. Zubair, "A high performance algorithm using pre-processing for sparse matrix vector multiplication", Proceedings of Supercomputing '92, pp. 32--41.
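
Illustrative note on the context above: the two claimed benefits of blocking (one index per block, and reuse of a loaded entry of x) can be seen in a generic fixed-size block layout. The sketch below assumes dense 2x2 blocks and is not the storage scheme proposed by Pinar and Heath or the algorithm of Agarwal, Gustavson and Zubair. The names `block_matvec_2x2`, `blocks`, `bcol`, and `bptr` are hypothetical.

```python
def block_matvec_2x2(n_block_rows, blocks, bcol, bptr, x):
    """Return y = A x where A is stored as dense 2x2 blocks, row-major."""
    y = [0.0] * (2 * n_block_rows)
    for bi in range(n_block_rows):
        y0 = y1 = 0.0
        for p in range(bptr[bi], bptr[bi + 1]):
            a00, a01, a10, a11 = blocks[p]
            c = 2 * bcol[p]            # single column index for four entries
            x0, x1 = x[c], x[c + 1]    # each loaded x entry is used twice
            y0 += a00 * x0 + a01 * x1
            y1 += a10 * x0 + a11 * x1
        y[2 * bi] = y0
        y[2 * bi + 1] = y1
    return y


# A = [[1, 2, 0, 0],
#      [3, 4, 0, 0],
#      [0, 0, 5, 6],
#      [0, 0, 7, 8]]
blocks = [(1.0, 2.0, 3.0, 4.0), (5.0, 6.0, 7.0, 8.0)]
bcol = [0, 1]       # block-column index of each stored block
bptr = [0, 1, 2]    # start of each block row within `blocks`
y = block_matvec_2x2(2, blocks, bcol, bptr, [1.0, 2.0, 3.0, 4.0])
```

Compared with one column index per nonzero, this layout stores one index per four values, at the cost of storing explicit zeros whenever a block is not completely dense.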


Parallelizing Unstructured Sparse Matrix Computations On.. - Venugopal (1993)

....solved for several right hand side vectors b, so that the run time preprocessing cost is amortized over the iterations of the solves. For sparse matrix vector multiplication, we sketch how the methodology can be applied to parallelize an algorithm recently proposed by Agarwal, Gustavson and Zubair [1]. Their algorithm is based on preprocessing the sparse matrix to extract a block based structure, and using the extracted structure to perform several iterations of the multiplication. 8.1 Block sparse triangular systems The partitioning of the matrix L implicitly defines a partitioning of ....

....multiplication, y = Ax, is an important component in a variety of engineering and scientific applications, and typically arises in iterative solution methods. One of the most recent algorithms for sparse matrix vector multiplication is the one proposed by Agarwal, Gustavson and Zubair [1]. The main idea of their algorithm is to exploit any regular block structures or features in the sparse matrix A, which could lead to better exploitation of cache and vector processing or floating point units, and therefore result in high performance. The algorithm is called Feature Based ....

[Article contains additional citation context not shown here]

R. C. Agarwal, F. G. Gustavson, and M. Zubair. A high performance algorithm using pre-processing for the sparse matrix-vector multiplication. In Proceedings of Supercomputing '92, pages 32--41, 1992.


Techniques for the Interactive Development of Numerical Linear.. - Marsolf (1997)   (3 citations)

.... primitives for the multiplication of a sparse matrix and a vector, Agarwal, Gustavson and Zubair have advocated an analysis procedure that decomposes a sparse matrix into a sum of several sparse matrices, each of which possess a sparsity structure that can be exploited to enhance performance [AGZ92]. In that work, the extraction and exploitation of sparsity structure was limited to the scope of the matrix vector product primitive. When placed in the context of, say, one iteration of a sparse linear system solver, it is clear that the structural information can be propagated to the other ....

R. C. Agarwal, F. G. Gustavson, and M. Zubair. A High Performance Algorithm Using Pre-Processing for the Sparse Matrix-Vector Multiplication. In Proceedings of Supercomputing '92, pages 32--41, 1992.


Improving Memory-System Performance of Sparse Matrix-Vector.. - Sivan Toledo (1997)   (24 citations)

....sparse matrix vector multiplication. The floating point units are therefore underutilized. We present below two techniques that address this issue, including blocking to reduce the number of load instructions. Blocking in sparse matrix-vector multiplication was used in somewhat different forms in [1, 4]. Although the techniques that we propose can be applied separately, they are most effective when they are combined. In particular, reordering the matrix can enhance or degrade the effect of blocking. Also, without the reduction in the number of cache misses on x that reordering yields, our ....

....is different from previous algorithms, mainly in that our algorithm attempts to find many small completely dense blocks. Other researchers have proposed algorithms that attempt to find larger (and hence fewer) dense blocks and/or blocks that are not completely dense. Agarwal, Gustavson and Zubair [1] describe an algorithm designed for vector processors. Their algorithm tried to find few large relatively dense blocks. They divide the rows of the matrix into blocks, and attempt to find one fairly dense rectangular block in every block of rows. The other nonzeros in the block of rows remain ....

[Article contains additional citation context not shown here]

R. C. Agarwal, F. G. Gustavson, and M. Zubair. A high performance algorithm using preprocessing for sparse matrix-vector multiplication. In Proceedings of Supercomputing '92, pages 32--41, November 1992.


The Generation of Optimized Codes using Nonzero.. - Gallivan, Marsolf.. (1995)   (2 citations)

.... primitives for the multiplication of a sparse matrix and a vector, Agarwal, Gustavson and Zubair have advocated an analysis procedure that decomposes a sparse matrix into a sum of several sparse matrices, each of which possess a sparsity structure that can be exploited to enhance performance [1]. In that work, the extraction and exploitation of sparsity structure was limited to the scope of the matrix vector product primitive. When placed in the context of, say, one iteration of a sparse linear system solver, it is clear that the structural information can be propagated to the other ....

R. C. Agarwal, F. G. Gustavson, and M. Zubair. A High Performance Algorithm Using Pre-Processing for the Sparse Matrix-Vector Multiplication. In Proceedings of the International Conference on Supercomputing, pages 32--41, 1992.


Segmented Operations for Sparse Matrix Computation on.. - Blelloch, Heroux, Zagha (1993)   (4 citations)

....20] Feature Extraction: Another approach to optimizing sparse matrix operations is to decompose the matrix into additive submatrices and store each submatrix in a separate data structure. For example, the Feature Extraction Based Algorithm (FEBA) presented by Agarwal, Gustavson, and Zubair [1] recursively extracts structural features from a given sparse matrix structure and uses the additive property of matrix multiplication to compute the matrix vector product as a sequence of operations. The FEBA scheme first attempts to extract nearly dense blocks. From the remainder of entries, it ....

....in cases where the sparse matrix has no known regular sparsity pattern or where a general purpose data structure is needed in order to handle a variety of sparse matrix patterns. There are many general purpose sparse data structures, but we only discuss a few of them here. For more examples, see [1, 16, 18, 19, 20, 22, 27, 30, 31, 32, 33, 34]. Compressed Sparse Row: One of the most commonly used data structures is the compressed sparse row (CSR) format. The CSR format stores the entries of the matrix row by row in a scalar valued array VAL. A corresponding integer array INDX holds the column index of each entry, and another integer ....

R. C. Agarwal, F. G. Gustavson, and M. Zubair. A high performance algorithm using pre-processing for the sparse matrix-vector multiplication. In Proceedings Supercomputing '92, pages 32--41, Nov. 1992.
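
Illustrative note on the two contexts above: they describe splitting a matrix into additive pieces, each in its own data structure, and computing the product as a sequence of operations, with the leftover entries held in a general format such as CSR. The Python sketch below shows only that additive step, y = Bx + Rx for A = B + R. It is not FEBA: the feature extraction itself is not shown and the split is chosen by hand. `VAL` and `INDX` follow the quoted text in lowercase; the row-pointer array `pntr`, the 0-based indexing, and both function names are assumptions made for this example.

```python
def csr_matvec(n_rows, val, indx, pntr, x):
    """Return y = R x for a matrix R held in CSR form (0-based indices)."""
    y = [0.0] * n_rows
    for i in range(n_rows):
        for p in range(pntr[i], pntr[i + 1]):
            y[i] += val[p] * x[indx[p]]   # indirect access into x
    return y


def add_dense_block(block, row0, col0, x, y):
    """Accumulate block * x[col0:...] into y[row0:...] with direct indexing."""
    for r, block_row in enumerate(block):
        acc = 0.0
        for c, a in enumerate(block_row):
            acc += a * x[col0 + c]
        y[row0 + r] += acc


# A = [[1, 2, 6],
#      [3, 4, 0],
#      [0, 0, 5]]  split by hand into a dense 2x2 block B plus a remainder R.
B = [[1.0, 2.0], [3.0, 4.0]]    # dense block anchored at row 0, column 0
val = [6.0, 5.0]                # nonzeros of the remainder R, row by row
indx = [2, 2]                   # column index of each remainder entry
pntr = [0, 1, 1, 2]             # start of each row of R within val

x = [1.0, 2.0, 3.0]
y = csr_matvec(3, val, indx, pntr, x)   # y = R x
add_dense_block(B, 0, 0, x, y)          # y += B x, so y = (B + R) x = A x
```

Only the remainder pays for index lookups; the dense block is traversed with plain offsets, which is the kind of regular access the quoted contexts associate with better vector and cache behavior.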


Improving Memory-System Performance of Sparse Matrix-Vector.. - Toledo (1997)   (24 citations)

....units are often the bottleneck in sparse matrix vector multiplication. The floating point units are therefore underutilized. We present below a blocking technique that reduces the number of load instructions. Blocking in sparse matrix vector multiplication was used in somewhat different forms in [1, 3]. Another inessential but still important factor that limits the performance of the code in Figure 1 is the fact that the column index colind(jp) must be converted from an integer index to a byte offset from the beginning of x. This conversion, which is required on most processors for indirect ....

....small completely dense blocks. Other researchers have proposed algorithms that attempt to find larger (and hence fewer) dense blocks and/or blocks that are not completely dense. The full paper contains detailed comparisons between our blocking strategy and those of Agarwal, Gustavson and Zubair [1] and of Balay, Gropp, McInnes and Smith [3]. Prefetching in Irregular Loops. Traditionally, prefetching was considered to be a technique for hiding latency, in the sense that prefetching can prevent memory access latency from degrading performance, as long as memory bandwidth is sufficient to ....

[Article contains additional citation context not shown here]

R. C. Agarwal, F. G. Gustavson, and M. Zubair, A high performance algorithm using pre-processing for sparse matrix-vector multiplication, in Proceedings of Supercomputing '92, Nov. 1992, pp. 32--41.


A Relational Approach To The - Automatic Generation Of

No context found.

R.C. Agarwal, F.G. Gustavson, and M. Zubair. A high performance algorithm using pre-processing for the sparse matrix-vector multiplication. In Supercomputing '92, pages 32--41, Minneapolis, Minnesota, November 16--21, 1992.


Optimizing Sparse Matrix-Vector Product Computations.. - Mellor-Crummey, Garvin (2003)

No context found.

R. Agarwal, F. Gustavson, and M. Zubair. A high performance algorithm using pre-processing for the sparse matrix vector multiplication. In Proceedings of Supercomputing '92, Minneapolis, MN, Nov. 1992.
