23 citations found.
E.-J. Im. Optimizing the performance of sparse matrix-vector multiplication. PhD thesis, University of California, Berkeley, May 2000.

This paper is cited in the following contexts:
Combining Performance Aspects of Irregular.. - Strout, Carter.. (2002)   (1 citation)

.... rate depends on the number of processors and degrades as this number increases [15]. There has also been work on run-time techniques for improving the intra-iteration locality for irregular grids which applies a data reordering and computation rescheduling within a single convergence iteration [25, 26, 11, 19, 16]. Some of these techniques do not apply to Gauss-Seidel because it has data dependences within the convergence iteration. We use graph partitioning of the submatrices to give our owner-computes simulation better intra-iteration locality. Work which looks at inter-iteration locality on regular ....

Eun-Jin Im. Optimizing the Performance of Sparse Matrix-Vector Multiply. Ph.D. thesis, University of California, Berkeley, May 2000.


Automatic Performance Tuning and Analysis of Sparse .. - Vuduc, Kamil, Hsu, .. (2002)

....sparse data structures. Consequently, it is not unusual to see kernels such as SpTS run at under 10% of peak uniprocessor floating point performance. Our approach to automatic tuning of SpTS builds on prior experience with building tuning systems for sparse matrix-vector multiply (SpMV) [21, 22, 40], and dense matrix kernels [8, 41]. In particular, we adopt the two-step methodology of previous approaches: (1) we identify and generate a set of reasonable candidate implementations, and (2) search this set for the fastest implementation by some combination of performance modeling and actually ....

....Therefore, we consider both algorithmic and data structure reorganizations which partition the solve into a sparse phase and a dense phase. To the sparse phase, we adapt the register blocking optimization, previously proposed for sparse matrix-vector multiply (SpMV) in the Sparsity system [21, 22], to the SpTS kernel; to the dense phase, we make judicious use of highly tuned BLAS routines by switching to a dense implementation (switch-to-dense optimization). We describe fully automatic hybrid off-line/on-line heuristics for selecting the key tuning parameters: the register block size and ....

E.-J. Im. Optimizing the performance of sparse matrix-vector multiplication. PhD thesis, University of California, Berkeley, May 2000.


Using Sparse Tiling with Symmetric Multigrid - Strout, Carter, Ferrante (2002)

.... which looks at inter-iteration locality on regular grids includes [2], [20], and [19]. There has also been work on run-time techniques for improving the intra-iteration locality for irregular meshes which applies a data reordering and computation rescheduling within a single convergence iteration [16, 17, 4, 11, 8]. However, many of these techniques do not apply to Gauss-Seidel because it has data dependences within the convergence iteration. The only other technique to our knowledge which handles inter-iteration locality for irregular meshes is unstructured cache blocking by Douglas et al. [5]. We have ....

Eun-Jin Im. Optimizing the Performance of Sparse Matrix-Vector Multiply. Ph.D. thesis, University of California, Berkeley, May 2000.


Better Tiling and Array Contraction for Compiling Scientific.. - Pike, Hilfinger (2002)

....6.3 Parameter Searching. A compilation system that does not expose parameters for tuning is necessarily suboptimal, because no compiler that takes finite time can always guess the best parameters for all programs. Projects that use parameter searching include PHiPAC [1], ATLAS [29], Sparsity [10], [11], FFTW [7], and OCEANS [14], [15], [21], [16]. PHiPAC and ATLAS automatically generate numerous variants of matrix multiply or other kernels in an attempt to select the best one for a particular task on a particular machine. PHiPAC and ATLAS use hand-crafted templates for handling edge cases, ....

Eun-Jin Im. Optimizing the Performance of Sparse Matrix-Vector Multiplication. Ph.D. dissertation, University of California, Berkeley, 2000.


Performance Optimizations and Bounds for Sparse.. - Vuduc, Demmel.. (2002)   (4 citations)

....the matrix structure is given at run time, we must be careful not to spend too much time either generating the set of candidate algorithms or searching. By contrast, all the algorithm generation and search can be done off-line for the dense BLAS. In prior work on the Sparsity system (Version 1.0) [17, 16], Im developed an algorithm generator and search strategy for SpMV that was quite effective in practice. The Sparsity generators employed a variety of performance optimization techniques, including register blocking, cache blocking, and multiplication by multiple vectors. In this paper, we focus on ....

....and performance results on key dense kernels. Latency estimates were obtained from published sources and confirmed experimentally using the memory system microbenchmark due to Saavedra-Barrera [27]. Matrices. We evaluate the SpMV implementations on the matrix benchmark suite used by Im [16]. Table 2 summarizes the size and source of each matrix. Most of the matrices are available from either of the collections at NIST (MatrixMarket [5]) and the University of Florida [9]. The matrices in Table 2 are arranged in roughly four groups. Matrix 1 is a dense matrix stored in sparse format; ....

[Article contains additional citation context not shown here]

E.-J. Im. Optimizing the performance of sparse matrix-vector multiplication. PhD thesis, University of California, Berkeley, May 2000.


Reordering and Storage Optimizations for Scientific Programs - Pike (2002)   (4 citations)

....2.4 Parameter Searching. A compilation system that does not expose parameters for tuning is necessarily suboptimal, because no compiler that takes finite time can always guess the best parameters for all programs. Projects that use parameter searching include PHiPAC [7], ATLAS [36], Sparsity (Im [17]; Im and Yelick [18]), BeBOP [6], [16], FFTW [12], and iterative compilation (Kisuki et al. [21]; Kisuki et al. [22]; O'Boyle et al. [31]). PHiPAC and ATLAS automatically generate numerous variants of matrix multiply or other kernels in an attempt to select the best one for a particular task on ....

Eun-Jin Im. Optimizing the Performance of Sparse Matrix-Vector Multiplication. Ph.D. thesis, University of California, Berkeley, 2000.


Proof of Correctness for Sparse Tiling of Gauss-Seidel - Strout, Carter, Ferrante

....the tiling and array padding factors has not been solved for all cases. Rivera and Tseng [15] look more specifically at how to do tiling and array padding for 3D regular meshes. There has also been work on run-time techniques for improving the intra-iteration locality for irregular meshes [8, 14, 2, 7, 13]. Mitchell et al. [14] describe a compiler optimization which operates on non-affine array references in loops. Sparse matrix data structures require indirect array references, which are a type of non-affine array reference. Also, Im and Yelick [8, 9] describe a code generator called SPARSITY which ....

....locality for irregular meshes [8, 14, 2, 7, 13]. Mitchell et al. [14] describe a compiler optimization which operates on non-affine array references in loops. Sparse matrix data structures require indirect array references, which are a type of non-affine array reference. Also, Im and Yelick [8, 9] describe a code generator called SPARSITY which generates blocked sparse matrix-vector multiply. Both Mitchell and Im's techniques improve spatial and temporal locality on the vectors u and f when dealing with the system Au = f . Therefore, when applied to an iterative algorithm such as ....

Eun-Jin Im. Optimizing the Performance of Sparse Matrix-Vector Multiply. Ph.D. thesis, University of California, Berkeley, May 2000.


Rescheduling for Locality in Sparse Matrix Computations - Strout, Carter, Ferrante (2001)   (2 citations)

....structures to do the Gauss-Seidel computation. Since the structure isn't known until runtime, any rescheduling of the computation or data rearrangement must occur at runtime as well. 2 Previous Work. There has been a lot of previous work on techniques for improving the intra-iteration locality [4, 7, 1, 3, 6]. We will be presenting a technique for improving both the intra- and inter-iteration locality. The only other technique we are aware of which accomplishes this for iterative algorithms on irregular meshes is cache blocking by Douglas et al. [2]. On the left is an Iteration Space Graph which has ....

Eun-Jin Im. Optimizing the Performance of Sparse Matrix-Vector Multiply. Ph.D. thesis, University of California, Berkeley, May 2000.


Rescheduling for Locality in Sparse Matrix Computations - Strout, Carter, Ferrante

....transformations which we will refer to as sparse tiling. Mitchell et al. [14] describe a compiler optimization which operates on non-affine array references in code. The use of sparse data structures causes indirect array references which are a type of non-affine array reference. Also, Eun-Jin Im [10] describes a code generator called SPARSITY which generates cache-blocked sparse matrix-vector multiply. Both of these techniques improve spatial and temporal locality on the vectors u and f when dealing with the system Au = f . However, they do not improve the temporal locality on the sparse ....

Eun-Jin Im. Optimizing the Performance of Sparse Matrix-Vector Multiply. Ph.D. thesis, University of California, Berkeley, May 2000.


Optimizing Sparse Matrix Computations for Register Reuse in.. - Im, Yelick   (7 citations)  Self-citation (Im)

....to know the details of their machine's memory hierarchy or how their particular matrix structure will be mapped onto that hierarchy. Sparsity performs several optimizations, including register blocking, cache blocking, loop unrolling, matrix reordering, and reorganization for multiple vectors [Im00]. The optimizations involve both code and data structure transformations, which can be quite expensive. Fortunately, sparse matrix-vector multiplication is often used in iterative solvers or other settings where the same matrix is multiplied by several different vectors, or matrices with different ....

....probably a reflection of the more expensive memory system on the R10000. 5 Performance of Register Optimizations. We have generated register-blocked codes for varying sizes of register blocks and varying numbers of vectors using Sparsity, and have measured their performance on several machines [Im00]. In this paper we will present the results for a set of 39 matrices on the UltraSPARC I and MIPS R10000. The matrices in the set are taken from fluid dynamics, structural modeling, chemistry, economics, circuit simulation and device simulation, and we include one dense matrix in sparse format ....

Eun-Jin Im. Optimizing the Performance of Sparse Matrix-Vector Multiplication. PhD thesis, University of California at Berkeley, May 2000.


Optimization of Sparse Matrix Kernels for Data Mining - Im, Yelick (2000)   (1 citation)  Self-citation (Im)

....to worsen as the relative speed of processors and memory continues to diverge and the size of data sets to be mined increases. We have developed a system called Sparsity to automatically generate an optimized sparse matrix-vector multiplication routine for a given matrix structure and machine [Im00]. The optimizations include register-level blocking, cache blocking, and blocking across multiple vectors when they exist in the higher-level algorithm. The absolute performance as well as the relative speedup of each optimization is highly dependent on the matrix structure, which in turn depends on ....

....value. This example matrix has 4, 4 4 blocks. The row start array points to the beginning of each row of blocks, while the block ptr array keeps pointers to the beginnings of individual rows inside those blocks. overhead, it incurs significantly more runtime overhead than static cache blocking [Im00] so use static cache blocking. The practical implication of this decision is that the matrix storage should be used either throughout an entire application or at least during an iterative solver to amortize the cost of reorganization. In static cache blocking, the sparse matrix is reorganized by ....

[Article contains additional citation context not shown here]

Eun-Jin Im. Optimizing the Performance of Sparse Matrix-Vector Multiplication. PhD thesis, University of California at Berkeley, May 2000.


Performance Modeling and Analysis of Cache Blocking.. - Nishtala, Vuduc.. (2004)

No context found.

E.-J. Im. Optimizing the performance of sparse matrix-vector multiplication. PhD thesis, University of California, Berkeley, May 2000.


A New Algorithm for Continuation and Bifurcation Analysis of.. - Castillo (2004)

No context found.

E. Im. Optimizing the performance of sparse matrix-vector multiplication. PhD thesis, U.C. Berkeley, 2000.


Automatic Performance Tuning and Analysis of Sparse .. - Vuduc, Kamil, Hsu, .. (2002)

No context found.

E.-J. Im. Optimizing the performance of sparse matrix-vector multiplication. PhD thesis, University of California, Berkeley, May 2000.


Performance Tuning and Analysis of Sparse Triangular Solve .. - Richie Bebop Computer (2002)

No context found.

Eun-Jin Im. Optimizing the performance of sparse matrix-vector multiplication. PhD thesis, University of California, Berkeley, May 2000.


Performance Optimizations and Bounds for Sparse.. - Vuduc, Demmel, Yelick (2002)   (4 citations)

No context found.

Eun-Jin Im. Optimizing the performance of sparse matrix-vector multiplication. PhD thesis, University of California, Berkeley, May 2000.


Memory Hierarchy Optimizations and Performance Bounds.. - Vuduc, Gyulassy.. (2003)

No context found.

Eun-Jin Im. Optimizing the performance of sparse matrix-vector multiplication. PhD thesis, University of California, Berkeley, May 2000.


Optimizing Sparse Matrix-Vector Product Computations.. - Mellor-Crummey, Garvin (2003)

No context found.

E.-J. Im. Optimizing the Performance of Sparse Matrix-Vector Multiplication. PhD thesis, University of California Berkeley, May 2000.


Performance Modeling and Analysis of Cache Blocking.. - Nishtala, Vuduc.. (2004)

No context found.

E.-J. Im. Optimizing the performance of sparse matrix-vector multiplication. PhD thesis, University of California, Berkeley, May 2000.


When Cache Blocking of Sparse Matrix Vector Multiply.. - Nishtala, Vuduc..

No context found.

E.-J. Im. Optimizing the performance of sparse matrix-vector multiplication. PhD thesis, University of California, Berkeley, May 2000.


Algorithms + Data Structures + Transformations = Portable Program .. - Strout (2000)

No context found.

Eun-Jin Im. Optimizing the Performance of Sparse Matrix-Vector Multiply. Ph.D. thesis, University of California, Berkeley, May 2000. URL: http://www.cs.berkeley.edu/~ejim/publication/.


Combining Performance Aspects of Irregular.. - Strout, Carter.. (2002)   (1 citation)

No context found.

Eun-Jin Im. Optimizing the Performance of Sparse Matrix-Vector Multiply. Ph.D. thesis, University of California, Berkeley, May 2000.


Proof of Correctness for Sparse Tiling of Gauss-Seidel - Strout, Carter, Ferrante (2003)

No context found.

Eun-Jin Im. Optimizing the Performance of Sparse Matrix-Vector Multiply. Ph.D. thesis, University of California, Berkeley, May 2000.

CiteSeer.IST - Copyright Penn State and NEC