Results 1 - 7 of 7
Rescheduling for Locality in Sparse Matrix Computations
- Proceedings of the 2001 International Conference on Computational Science, Lecture Notes in Computer Science
Cited by 10 (5 self)
In modern computer architectures, the use of memory hierarchies causes a program's data locality to directly affect performance. Data locality occurs when a piece of data is still in a cache upon reuse. For dense matrix computations, loop transformations can be used to improve data locality. However, sparse matrix computations have non-affine loop bounds and indirect memory references, which prohibit the use of compile-time loop transformations. This paper describes an algorithm for tiling at runtime, called serial sparse tiling. We test a runtime-tiled version of sparse Gauss-Seidel on four different architectures, where it exhibits speedups of up to 2.7. The paper also gives a static model for determining tile size and outlines how overhead affects the overall speedup.
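To make the phrase "non-affine loop bounds and indirect memory references" concrete, here is a minimal sketch of one untiled Gauss-Seidel sweep over a matrix in compressed sparse row (CSR) form. It is not the serial sparse tiling algorithm from the paper; the function name and the CSR array names (`row_ptr`, `col_idx`, `vals`) are assumptions chosen for illustration.

```python
def gauss_seidel_sweep(row_ptr, col_idx, vals, b, x):
    """One in-place Gauss-Seidel sweep over a CSR matrix (illustrative only)."""
    n = len(b)
    for i in range(n):
        diag = 0.0
        acc = b[i]
        # The inner loop bounds are read from row_ptr, so they depend on
        # input data and are not affine functions of i.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            j = col_idx[k]  # indirect reference: x is indexed through col_idx
            if j == i:
                diag = vals[k]
            else:
                acc -= vals[k] * x[j]
        x[i] = acc / diag  # assumes a nonzero diagonal entry in every row
    return x


# Example: the 2x2 system [[4, 1], [1, 3]] x = [1, 2].
row_ptr = [0, 2, 4]
col_idx = [0, 1, 0, 1]
vals = [4.0, 1.0, 1.0, 3.0]
x = gauss_seidel_sweep(row_ptr, col_idx, vals, [1.0, 2.0], [0.0, 0.0])
```

Because both the loop bounds and the addresses touched in `x` are known only once the matrix is loaded, a compiler cannot reorder these iterations statically, which is the motivation the abstract gives for tiling at runtime.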
Exploiting Locality in the Run-Time Parallelization of Irregular Loops
Cited by 7 (3 self)
The goal of this work is the efficient parallel execution of loops with indirect array accesses, so that it can be embedded in a parallelizing compiler framework. In this kind of loop pattern, dependences cannot always be determined at compile time because, in many cases, they involve input data that are only known at run time and/or the access pattern is too complex to be analyzed. In this paper we propose run-time strategies for the parallelization of these loops. Our approaches focus not only on extracting parallelism among iterations of the loop, but also on exploiting data access locality to improve memory hierarchy behavior and, thus, the overall program speedup. Two strategies are proposed: one based on graph partitioning techniques and the other based on a block-cyclic distribution. Experimental results show that both strategies are complementary and that the choice of the best alternative depends on some features of the loop pattern.
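As a rough illustration of the second strategy's building block, the sketch below assigns loop iterations to processors with a generic block-cyclic distribution. The abstract does not say what exactly is distributed or how block sizes are chosen, so the function name, its parameters, and the choice to distribute iteration indices are all assumptions.

```python
def block_cyclic(n_iters, block, nprocs):
    """Assign iterations 0..n_iters-1 to processors in a block-cyclic pattern."""
    owned = [[] for _ in range(nprocs)]
    for i in range(n_iters):
        # Consecutive blocks of `block` iterations go to processors 0, 1, ...,
        # wrapping around after nprocs blocks.
        owned[(i // block) % nprocs].append(i)
    return owned


# Ten iterations, blocks of two, two processors.
owned = block_cyclic(10, 2, 2)
```

A larger `block` keeps neighboring iterations together on one processor, which tends to favor locality, while a smaller one spreads work more evenly.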
Sparse Tiling for Stationary Iterative Methods
- International Journal of High Performance Computing Applications
Cited by 5 (0 self)
In modern computers, a program’s data locality can affect performance significantly. This paper details full sparse tiling, a run-time reordering transformation that improves the data locality for stationary iterative methods such as Gauss–Seidel operating on sparse matrices. In scientific applications such as finite element analysis, these iterative methods dominate the execution time. Full sparse tiling chooses a permutation of the rows and columns of the sparse matrix, and then an order of execution that achieves better data locality. We prove that full sparse tiled Gauss–Seidel generates a solution that is bitwise identical to traditional Gauss–Seidel on the permuted matrix. We also present measurements of the performance improvements and the overheads of full sparse tiling and of cache blocking for irregular grids, a related technique developed by Douglas et al.
Key words: tiling, iterative algorithms, static and dynamic analysis, irregular grids, data locality, sparse matrix, computer architecture
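The sketch below shows only what "a permutation of the rows and columns" means mechanically: applying one given permutation to both dimensions of a sparse matrix stored as coordinate triples. How full sparse tiling chooses the permutation is not described in the abstract and is not attempted here; the function names and the `perm[old_index] = new_index` convention are assumptions.

```python
def permute_symmetric(entries, perm):
    """Relabel rows and columns of (row, col, value) triples with one permutation."""
    # perm[old_index] = new_index, applied to rows and columns alike.
    return sorted((perm[i], perm[j], v) for (i, j, v) in entries)


def permute_vector(b, perm):
    """Move b[old_index] to position perm[old_index]."""
    out = [0.0] * len(b)
    for old, new in enumerate(perm):
        out[new] = b[old]
    return out


# A 3x3 matrix with nonzeros on the diagonal and at (0, 2) and (2, 0).
entries = [(0, 0, 4.0), (0, 2, 1.0), (2, 0, 1.0), (1, 1, 3.0), (2, 2, 5.0)]
perm = [2, 0, 1]
permuted = permute_symmetric(entries, perm)
rhs = permute_vector([10.0, 20.0, 30.0], perm)
```

Relabeling rows and columns together keeps diagonal entries on the diagonal, which a Gauss–Seidel sweep relies on.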
An inspector-executor algorithm for irregular assignment parallelization
- In Proc. of the 2nd International Symposium on Parallel and Distributed Processing and Applications (ISPA
, 2004
Cited by 4 (0 self)
A loop with irregular assignment computations contains loop-carried output data dependences that can only be detected at run time. In this paper, a load-balanced method based on the inspector-executor model is proposed to parallelize this loop pattern. The basic idea lies in splitting the iteration space of the sequential loop into sets of conflict-free iterations that can be executed concurrently on different processors. As will be demonstrated, this method outperforms existing techniques. Irregular access patterns with different load-balancing and reusability properties are considered in the experiments.
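The general inspector-executor idea can be sketched as follows for a loop of the form `a[f[i]] = rhs(i)`. This is a simple level-based grouping, not the load-balanced method the paper proposes; the function names and the grouping rule are assumptions. The inspector scans the index array once and puts each iteration into a set in which no other iteration writes the same element; the executor then runs the sets in order.

```python
def inspector(write_idx):
    """Group iterations into ordered sets with no write conflicts inside a set."""
    writes_so_far = {}  # array element -> number of earlier iterations writing it
    sets = []
    for i, target in enumerate(write_idx):
        level = writes_so_far.get(target, 0)
        writes_so_far[target] = level + 1
        if level == len(sets):
            sets.append([])
        sets[level].append(i)
    return sets


def executor(sets, write_idx, rhs, a):
    """Run the sets in order; iterations within one set could run concurrently."""
    for group in sets:
        for i in group:  # conflict-free: every i in group writes a distinct element
            a[write_idx[i]] = rhs(i)
    return a


f = [0, 1, 0, 2, 1, 0]
sets = inspector(f)
result = executor(sets, f, lambda i: i * 10, [None, None, None])
```

Running the sets in order preserves the output dependences, so each element ends up holding the value from the last iteration that writes it, as in the sequential loop.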
A Compiler Framework to Detect Parallelism in Irregular Codes
- In 14th International Workshop on Languages and Compilers for Parallel Computing, LCPC 2001
, 2001
Cited by 1 (1 self)
This paper describes a compiler framework that enhances the detection of parallelism in loops with complex irregular computations.
Improving Parallel Irregular Reductions Using Partial Array Expansion
Cited by 1 (0 self)
Much effort has been devoted recently to efficiently parallelizing irregular reductions. In this paper, parallelization techniques for these computations are analyzed in terms of three performance aspects: parallelism, data locality and memory overhead. These aspects have a strong influence on the overall performance and scalability of the parallel code. We discuss how parallelization techniques usually try to optimize some of these aspects while neglecting the others. We show that by combining complementary techniques we can improve the overall performance and scalability of the parallel irregular reduction, obtaining an effective solution for large problems on large machines. Specifically, a combination of array expansion and a locality-oriented method (DWA-LIP), named partial array expansion, is introduced. An implementation of the proposed method is discussed, showing that the transformation that the compiler must apply to the irregular reduction code is not excessively complex. Finally, the method is analyzed and experimentally evaluated.
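For orientation, the sketch below shows plain (full) array expansion for an irregular reduction of the form `a[idx[i]] += vals[i]`, simulated sequentially. It does not implement DWA-LIP or the paper's partial array expansion; the function name, the contiguous chunking of iterations, and the worker count are assumptions.

```python
def expanded_reduction(idx, vals, size, nworkers):
    """Irregular reduction using one private copy of the array per worker."""
    n = len(idx)
    chunk = (n + nworkers - 1) // nworkers  # iterations per worker, rounded up
    # Array expansion: every worker gets its own full-size accumulator, so no
    # two workers ever update the same memory location.
    private = [[0.0] * size for _ in range(nworkers)]
    for w in range(nworkers):  # each pass of this loop could run in parallel
        for i in range(w * chunk, min((w + 1) * chunk, n)):
            private[w][idx[i]] += vals[i]
    # Final combine step: sum the private copies element by element.
    return [sum(private[w][k] for w in range(nworkers)) for k in range(size)]


total = expanded_reduction([0, 2, 2, 1, 0], [1.0, 2.0, 3.0, 4.0, 5.0], 3, 2)
```

The private copies cost `nworkers * size` elements of extra storage, which is the kind of memory overhead the abstract lists alongside parallelism and data locality.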
Partial Array Expansion for Irregular Reductions
, 2001
Irregular reductions are common operations at the core of a large class of scientific/engineering applications.

