Precise Data Locality Optimization of Nested Loops
J. SUPERCOMPUT, 2002
Cited by 22 (3 self)
Abstract
A significant source for enhancing application performance and for reducing power consumption in embedded processor applications is improving the usage of the memory hierarchy. In this paper, a temporal and spatial locality optimization framework for nested loops is proposed, driven by parameterized cost functions. The considered loops can be imperfectly nested. New data layouts are propagated through the connected references and through the loop nests as constraints for optimizing the next connected reference, in the same nest or in other nests. Unlike many existing methods, special attention is paid to TLB (Translation Lookaside Buffer) effectiveness, since TLB misses can cost from tens to hundreds of processor cycles. Our approach considers only active data, that is, array elements that are actually accessed by a loop, in order to avoid useless memory loads and to take advantage of storage compression and temporal locality. Moreover, the same data transformation is not necessarily applied to a whole array. Depending on the referenced data subsets, the transformation can result in different data layouts for the same array. This can significantly improve performance, since references that are a priori incompatible can be optimized simultaneously. Finally, the process considers not only the innermost loop level but all loop levels. Hence, large strides when control returns to an enclosing loop are avoided in several cases, and better optimization is achieved when the innermost loop has a small index range.
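To make the idea of choosing a data layout from loop-level strides concrete, here is a minimal sketch, assuming a dense array, affine subscripts given as a coefficient matrix, and a cost that simply compares absolute strides, innermost loop first. It only picks one dimension ordering for a whole array; it does not model the paper's parameterized cost functions, TLB costs, active-data compression, or per-subset layouts. All function names (`layout_strides`, `loop_strides`, `best_layout`) are hypothetical.

```python
from itertools import permutations


def layout_strides(dims, order):
    """Element stride of each array dimension for a given storage order.

    `order` lists dimensions from slowest- to fastest-varying, so for a
    2-D array order=(0, 1) is row-major and order=(1, 0) is column-major.
    """
    strides = [0] * len(dims)
    step = 1
    for d in reversed(order):
        strides[d] = step
        step *= dims[d]
    return strides


def loop_strides(access, strides):
    """Memory stride (in elements) induced by each loop index.

    access[d][k] is the coefficient of loop index k (0 = outermost)
    in array subscript d of an affine reference.
    """
    n_loops = len(access[0])
    return [sum(strides[d] * access[d][k] for d in range(len(strides)))
            for k in range(n_loops)]


def best_layout(dims, access):
    """Choose the storage order with the smallest strides, innermost loop first.

    Comparing every loop level (not just the innermost one) breaks ties in
    favour of small strides when control returns to an enclosing loop.
    """
    def cost(order):
        s = loop_strides(access, layout_strides(dims, order))
        return [abs(x) for x in reversed(s)]  # innermost level compared first
    return min(permutations(range(len(dims))), key=cost)


# Loop nest: for i: for j: ... A[j][i] ..., with A of shape (100, 200).
# Subscript 0 is j -> coefficients [0, 1]; subscript 1 is i -> [1, 0].
print(best_layout((100, 200), [[0, 1], [1, 0]]))  # column-major order
```

For the transposed reference `A[j][i]`, row-major storage gives the inner loop a stride of 200 elements, while column-major storage gives it a stride of 1, so the sketch selects the column-major order `(1, 0)`.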
Increasing Parallelism of Loops with the Loop Distribution Technique
1995
Cited by 1 (0 self)
Abstract
Parallelism in a loop is poor when the statements in the loop body are involved in a data-dependence cycle. Breaking data-dependence cycles is therefore the key to increasing the parallelism of loop execution. In this paper, we consider the data-dependence relation from the viewpoint of statements. We propose two new methods, the modified index shift method and the statement substitution-shift method, which in general achieve better parallelism and performance than the index shift method. The modified index shift method is obtained by modifying the index shift method and combining it with the loop distribution method. The statement substitution-shift method is obtained by combining the statement substitution method, the index shift method, and the unimodular transformation method with the loop distribution method. Moreover, topological sorting can be applied to determine the parallel execution order of statements.
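The loop-distribution and topological-sort steps mentioned above can be sketched as follows, assuming the statement-level dependence graph is already known (no dependence testing is performed). Statements in the same dependence cycle form a strongly connected component and must stay in one loop; every other component can be split into its own loop, and a topological order of the components gives a legal execution order. This is only the standard distribution step, not the paper's modified index shift or statement substitution-shift methods. The function names are hypothetical, and a self-edge is assumed to denote a loop-carried dependence of a statement on itself.

```python
from collections import defaultdict


def strongly_connected_components(nodes, edges):
    """Tarjan's algorithm; returns SCCs in reverse topological order."""
    succ = defaultdict(list)
    for u, v in edges:
        succ[u].append(v)
    index, low, stack, on_stack, sccs = {}, {}, [], set(), []
    counter = 0

    def visit(u):
        nonlocal counter
        index[u] = low[u] = counter
        counter += 1
        stack.append(u)
        on_stack.add(u)
        for v in succ[u]:
            if v not in index:
                visit(v)
                low[u] = min(low[u], low[v])
            elif v in on_stack:
                low[u] = min(low[u], index[v])
        if low[u] == index[u]:  # u is the root of an SCC
            comp = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.append(w)
                if w == u:
                    break
            sccs.append(sorted(comp))

    for n in nodes:
        if n not in index:
            visit(n)
    return sccs


def distribute(statements, deps):
    """Split a loop body into distributed loops in topological order.

    Returns (statements_in_loop, is_parallel) pairs. A loop is marked
    parallel only if it holds a single statement with no self-dependence.
    """
    deps = list(deps)
    self_dep = {u for u, v in deps if u == v}
    order = list(reversed(strongly_connected_components(statements, deps)))
    return [(comp, len(comp) == 1 and comp[0] not in self_dep)
            for comp in order]


# S2 and S3 form a dependence cycle; S1 and S4 do not.
deps = [("S1", "S2"), ("S2", "S3"), ("S3", "S2"), ("S3", "S4")]
for stmts, parallel in distribute(["S1", "S2", "S3", "S4"], deps):
    print(stmts, "parallel" if parallel else "sequential")
```

Here S1 and S4 each end up in a loop of their own that can run in parallel, while the cycle {S2, S3} remains a sequential loop. Breaking that cycle is exactly the problem the index shift family of methods addresses.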

