M.E. Wolf and M.S. Lam, A Data Locality Optimization Algorithm, in Proceedings of the ACM SIGPLAN'91 Conference on Programming Language Design and Implementation, Toronto, Ontario, Canada, June 26--28, 1991.
....during the execution of a code, as a result of locality (spatial and temporal) a piece of data loaded before in the level of the memory hierarchy under study is accessed without necessity of accessing the lower level. These definitions of locality and reuse are interchanged by some authors [23,10]. Most of the codes presenting regular accesses exhibit a considerable amount of locality in their execution. The accesses follow an easy to predict pattern and therefore the locality can be exploited by the compiler and the programmer to speed up the execution. Dense codes in matrix algebra ....
....an analytical model. This feature allows the model to be used in the preprocessing or the compilation step to drive optimizations of the code. Most of the work devoted to analytical modelling focuses on predicting the number of cache misses, and does so for codes with regular accesses. In particular, [1,23,10] present analytical models which analyze perfectly nested loops. For the case of irregular codes, the number of articles on modelling is small as a result of the difficulty of finding general models describing their characteristics. Even so, in this group we can emphasize [8] where the authors ....
M. E. Wolf and M. S. Lam. A data locality optimization algorithm. In Proc. SIGPLAN'91 Conf. on Programming Language Design and Implementation, June 1991.
.... Partitioning steps are performed so as to obtain the best trade-off between optimizing parallelism and optimizing locality [27, 28]. Locality optimizations result in reduced interprocessor communication regardless of whether the target is a shared memory or a distributed memory multiprocessor [31, 2]. We assume that the number of virtual processors equals the number of physical processors so that the output of Computation Partitioning can capture the trade-off between locality and parallelism as accurately as possible (footnote 2). However, note that interprocessor communication is still .... (Footnote 2: Usually, ....)
Michael E. Wolf and Monica S. Lam. A Data Locality Optimization Algorithm. Proceedings of the ACM SIGPLAN Symposium on Programming Language Design and Implementation, June 1991.
....of C and C programs for comparison, this paper compares C and C programs that execute the same task. One approach to improving spatial locality of a program is to make algorithmic changes to the program such that the reference behavior better suits the way the data is laid out in the memory [GJG88, WL91, CMT94]. Another approach is to change the data layout to match the program's reference behavior [SD94, CKJA98, CDL99]. Recently Ding and Kennedy [DK99] proposed a hybrid approach of reorganizing both computation and data at run time to improve cache performance. Chilimbi et al. [CDL99] proposed semi ....
M. E. Wolf and M. S. Lam. A data locality optimization algorithm. In PLDI, pages 30-44, 1991.
....often relies on an approximate cost model to evaluate the cache performance of a loop nest. Most of these techniques analyze a loop nest to find out the amount of reuse present at various levels and then apply loop or data transformations to improve the reuse, thereby enhancing data locality [12, 25, 23, 17, 5]. The primary drawback of these methods is their imprecision. For example, they are unable to characterize conflict misses, which result in severe performance degradation for some practical cache organizations. Also there is no integrated framework to precisely detect whether the problem is due to ....
....Tiling. The familiar loop optimization tiling is a mechanism often used to eliminate capacity misses by reordering accesses, so that accesses to reused data are closer together in the iteration space. It is a combination of strip mining and loop interchange to realize reuse carried at outer loops [4, 24, 12, 23]. We look for configurations in the CME Table that suggest possible benefit from exploiting reuse vectors beyond the shortest (leftmost) reuse vector of a reference. Tiling can help when the CME Table indicates many self and/or cross reference interferences for a longer reuse vector of a reference ....
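The description of tiling as strip mining followed by loop interchange can be illustrated with a minimal sketch. It is not taken from any of the cited papers; the function names `matmul_naive` and `matmul_tiled` and the `tile` parameter are hypothetical, and matrix multiplication is used only as a familiar loop nest.

```python
def matmul_naive(A, B, n):
    """Reference i-j-k loop nest computing C = A * B for n x n lists of lists."""
    C = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                C[i][j] += A[i][k] * B[k][j]
    return C


def matmul_tiled(A, B, n, tile=2):
    """Same computation with the j and k loops tiled.

    Step 1 (strip mining): j becomes (jj, j) and k becomes (kk, k),
    where jj and kk step through the range in chunks of size `tile`.
    Step 2 (interchange): the chunk loops jj and kk are moved outside
    the i loop, so a tile x tile block of B is reused across all i
    before the next block is touched.
    """
    C = [[0] * n for _ in range(n)]
    for jj in range(0, n, tile):
        for kk in range(0, n, tile):
            for i in range(n):
                for j in range(jj, min(jj + tile, n)):
                    for k in range(kk, min(kk + tile, n)):
                        C[i][j] += A[i][k] * B[k][j]
    return C
```

Both functions perform the same multiply-add operations; only the order of the iterations differs. In pure Python the reordering does not show a cache benefit, so the sketch illustrates the transformation rather than its performance effect.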
M. E. Wolf and M. S. Lam. A data locality optimization algorithm. In Proceedings of the SIGPLAN'91 Conference on Programming Language Design and Implementation, June 1991.
....which can be imprecise. In this article we present a precise mathematical framework that can be used to guide a range of memory optimizations. There has been extensive research on improving the cache performance of numerical programs [McKinley et al. 1996; Ferrante et al. 1991; Lam et al. 1991; Wolf and Lam 1991; Wolfe 1989]. Most of this work targets loop nests with predictable and regular data accesses. Loop optimization plays a significant role in compiler optimization, as scientific programs spend a considerable amount of time processing large arrays within loops. Tiling, strip mining, loop ....
....[Bacon et al. 1994; Lam et al. 1991] and hence we need more precise characterization to understand the underlying cause behind such conflict misses. Most previous compiler techniques to optimize loop nests either use simple cost models to guide loop transformations [McKinley et al. 1996; Wolf and Lam 1991] or are targeted toward some specific optimization [Bacon et al. 1994; Lam et al. 1991; Rivera and Tseng 1998]. There has also been some initial work on estimating the number of cache misses in numerical code [Ferrante et al. 1991; Temam et al. 1994]. Though the strategies given in previous papers ....
Wolf, M. E. and Lam, M. S. 1991. A data locality optimization algorithm. In Proceedings of the SIGPLAN `91 Conference on Programming Language Design and Implementation.
....available at load time. These optimizations cannot currently be performed on the Java virtual machine architecture because reconstructing data flow and control flow information from byte codes and performing these optimizations at the time of method activation is too time intensive. Cache blocking [WL91] and loop unrolling are two examples of these techniques. Analyzing and recognizing access patterns, as well as having precise information about important cache parameters (e.g. cache size, line size) is a prerequisite for these optimizations. While the former can be accomplished at compile time, ....
M. Wolf, M. Lam. A Data Locality Optimization Algorithm. In Proceedings of the SIGPLAN `91 Conference on Programming Language Design and Implementation, pp 30--44, Published as SIGPLAN Notices 26(6), June 1991.
....misses are highly sensitive to slight variations in problem size and base addresses; hence we need precise characterizations to understand the underlying cause behind such conflict misses. Most previous compiler techniques to optimize loop nests use ad hoc cost models to guide loop transformations [WL91, CMT94]. We generate a set of equations called the Cache Miss Equations (or CM equations) representing all the cache misses in a loop nest. This simple, precise characterization allows one to better understand the cause behind such misses, and helps one reduce cache misses in a methodical way. We ....
....example given in Figure 1 as our running example to illustrate our algorithm. In order to describe our analysis steps in a concise mathematical form we represent a loop nest of depth n as a finite convex polyhedron of the n-dimensional iteration space Z^n, bounded by the loop bounds [WL91]. Each iteration in the loop corresponds to a node in the polyhedron and is called an iteration point. Every iteration point is identified by its index vector i = (i_1, i_2, ..., i_n), where i_l is the loop index of the l-th loop in the nest with the outermost loop ....
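To make the iteration-space terminology concrete, here is a small sketch that enumerates the iteration points of a loop nest as index vectors. It assumes constant, inclusive loop bounds, so the polyhedron is a simple box; this is a special case chosen for brevity, and the function name `iteration_space` is hypothetical.

```python
from itertools import product


def iteration_space(bounds):
    """Enumerate index vectors (i_1, ..., i_n) of a rectangular loop nest.

    `bounds` is a list of (lower, upper) pairs, inclusive, with the
    outermost loop first. itertools.product varies the last component
    fastest, which matches the execution order of the nested loops.
    """
    ranges = [range(lo, hi + 1) for lo, hi in bounds]
    return list(product(*ranges))


# Depth-2 nest: for i1 in 1..2, for i2 in 1..3
points = iteration_space([(1, 2), (1, 3)])
```

Each tuple in `points` is one iteration point. A nest whose inner bounds depend on outer indices (for example a triangular nest) would need bounds expressed as functions of the outer indices, which this sketch does not cover.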
Michael E. Wolf and Monica S. Lam. A data locality optimization algorithm. In Proc. ACM SIGPLAN '91 Conf. on Programming Language Design and Implementation, volume 26(6), pages 30-44, June 1991.
....can be slow, or on compiler heuristics which can be imprecise. In this paper we present an analysis technique that is more precise than many existing compiler heuristics and that is faster than simulation. There has been extensive research on improving the cache performance of numerical programs [WL91, KM88, CM95, CMT94, FST91, LRW91]. Most of this work targets loop nests with predictable and regular data accesses. Loop optimization plays a significant role in compiler optimization as scientific programs spend a considerable amount of time processing large arrays within loops. Tiling, strip mining, interchanging, skewing and ....
....highly sensitive to slight variations in problem size and base addresses and hence we need more precise characterization to understand the underlying cause behind such conflict misses. Most previous compiler techniques to optimize loop nests use ad hoc cost models to guide loop transformations [WL91, CMT94]. Though the strategies given in previous papers help in reducing cache misses, they give little insight about the causes of such misses. This paper attempts to fill this gap by finding precise relationships among the loop indices, array sizes and base addresses, and the cache parameters for the ....
Michael E. Wolf and Monica S. Lam. A data locality optimization algorithm. In Proceedings of the ACM SIGPLAN '91 Conference on Programming Language Design and Implementation, volume 26(6), pages 30-44, June 1991.
....which can be imprecise. In this paper we present a precise mathematical framework that can be used to guide a range of memory optimizations. There has been extensive research on improving the cache performance of numerical programs [McKinley et al. 1996; Ferrante et al. 1991; Lam et al. 1991; Wolf and Lam 1991; Wolfe 1989]. Most of this work targets loop nests with predictable and regular data accesses. Loop optimization plays a significant role in compiler optimization as scientific programs spend a considerable amount of time processing large arrays within loops. Tiling, strip mining, loop ....
....addresses [Bacon et al. 1994; Lam et al. 1991] and hence we need more precise characterization to understand the underlying cause behind such conflict misses. Most previous compiler techniques to optimize loop nests either use simple cost models to guide loop transformations [McKinley et al. 1996; Wolf and Lam 1991] or are targeted towards some specific optimization [Bacon et al. 1994; Lam et al. 1991; Rivera and Tseng 1998]. There has also been some initial work on estimating the number of cache misses in numerical code [Ferrante et al. 1991; Temam et al. 1994]. Though the strategies given in previous ....
Wolf, M. E. and Lam, M. S. 1991. A data locality optimization algorithm. In Proc. SIGPLAN `91 Conf. on Programming Language Design and Implementation (June).
....improving the data locality of loop-oriented programs: loop nest restructuring and data layout optimizations. Restructuring loop optimizations (e.g. permutation, tiling, and fusion) are mechanisms widely used to reorder the access pattern in a loop nest for better temporal and spatial locality [8, 12, 15, 23, 24]. The key issues here are determining appropriate analyses and policies for determining when to apply these optimizations. In the past, such analyses have primarily considered capacity misses, but loops can also suffer heavily from conflict misses, particularly in caches with low associativity ....
....identically, so the model is of a write-allocate cache with fetch-on-write. 2.4 Terminology. Our work with CMEs draws on the substantial body of research in which iteration spaces and reuse vectors are used to analyze memory reference behavior for dependence analysis [18], locality optimizations [23], or prefetching algorithms [17]. We build on these approaches and adopt more precise mechanisms for using them. Iteration Space: Every iteration of a loop nest is viewed as a single entity termed an iteration point in the set of all iteration points known as the iteration space. Formally, we ....
M. E. Wolf and M. S. Lam. A data locality optimization algorithm. In Proc. SIGPLAN `91 Conf. on Programming Language Design and Implementation, June 1991.
....is more precise than many existing compiler heuristics and that is faster than simulation. To appear in the Proceedings of the 11th ACM International Conference on Supercomputing, Vienna, Austria, July 1997. There has been extensive research on improving the cache performance of numerical programs [4, 7, 9, 16, 17]. Most of this work targets loop nests with predictable and regular data accesses. Loop optimization plays a significant role in compiler optimization as scientific programs spend a considerable amount of time processing large arrays within loops. Tiling, strip mining, interchanging, skewing and ....
....to slight variations in problem size and base addresses [1, 9] and hence we need more precise characterization to understand the underlying cause behind such conflict misses. Most previous compiler techniques to optimize loop nests either use ad hoc cost models to guide loop transformations [4, 16] or are targeted towards some specific optimization [1, 9]. There has also been some initial work on estimating the number of cache misses in numerical code [7, 14]. Though the strategies given in previous papers help in reducing cache misses, they give little insight about the causes of such ....
M. E. Wolf and M. S. Lam. A data locality optimization algorithm. In Proc. SIGPLAN `91 Conf. on Programming Language Design and Implementation, June 1991.
....In recent years, compilers have begun to address this issue. For example, techniques have been developed to mask memory latency by fetching data ahead of time [MLG92] and program transformations such as cache blocking, loop skewing, and loop tiling have been invented to increase data locality [WL91]. All of these optimizations are particularly effective in the domain of scientific computing, in which programs operate extensively on arrays. Unfortunately, they fare considerably worse in application domains in which most data structures are dynamically allocated and accessed via pointers. ....
....and dynamic code generation. 8 Related Work. Cache optimizations aim to reduce the gap between memory and processor speeds. For example, data locality can be increased in scientific, array-based programs by applying techniques such as loop reversal, loop tiling, loop skewing, and cache blocking [WL91]. They change algorithmic behavior by reordering the execution sequence of iterations and by changing the shape of a loop's iteration space and iteration depth. Rivera and Tseng's algorithm [RT98] inserts inter-variable and intra-variable padding to control the placement of arrays in memory and to ....
M. Wolf and M. Lam. "A Data Locality Optimization Algorithm". In Proceedings of the SIGPLAN `91 Conference on Programming Language Design and Implementation, June 1991.
.... 38, 26] • automatic data partitioning for distributed memory machines [25] • array privatization [21, 40, 45] 6 Their original work incorrectly reported 2 additional loops in ssifa as being parallel without loop distribution [41] • analysis of locality to benefit cache performance [23, 53]. Determining the effectiveness and efficiency of these applications of fida. Acknowledgements The authors thank Fran Allen for her encouragement and support of this work. The ptran system served as a useful vehicle for our experiments. We wish to thank all members, past and present, of the ....
Michael E. Wolf and Monica S. Lam. A Data Locality Optimization Algorithm. In SIGPLAN '91 Conference on Programming Language Design and Implementation, pages 30--44, June 1991. SIGPLAN Notices 26(6).
....weaker consistency models were successful in hiding the latency of memory update operations, thus increasing the scalability of the architecture. In the compiler world, much effort has been devoted to better manage memory accesses by taking into account the hierarchical structure of the memory [WL91]. Methods that improve locality and adjust to the dynamic pattern of the memory references were found to be effective not only in shortening the latency, but also in reducing the bandwidth requirements of each processor, thereby reducing contention in the memory system. Hudak and Abraham [HA92] ....
....5: T_l - g, T_o - g, and T_f - g versus g processors attempt to access a single hot memory module. 5 Benefits of Replication. The performance evaluation of parallel programs indicates that good locality of reference is crucial for scaling up the performance of parallel code [LHE92, WL91]. Replicating selected data structures in various parts of memory is one way of increasing locality by reducing the number of remote references. In this section we assess the potential advantage of replication. We consider here two-way replication. Multiple-way replication is analogous, but with ....
M. Wolf and M. Lam. A data locality optimization algorithm. In 4th Intl. Conference on Architectural Support for Programming Languages and Operating Systems, pages 34--44. ACM, October 1991.
....applications or collections of loops as in stream based applications. As a result, being able to handle loops efficiently is of fundamental importance. A lot of the past work in optimizing the performance of loops has focused on individual loop nests rather than on collections of loop nests [2, 11, 3, 24, 22, 23, 20]. This paper examines the weighted loop fusion problem. Each pair of loop nests has an associated non negative weight which is the cost savings that would be obtained if the two loop nests were fused. The weight values depend on the target hardware; contributions to the weights can arise from ....
.... performing collective loop fusion is a Loop Dependence Graph (LDG). A node in the LDG represents a perfect loop nest, i.e. a set of perfectly nested loops [24]. We assume that, prior to loop fusion, suitable iteration-reordering loop transformations have been performed on the individual loop nests [2, 11, 3, 24, 22, 23, 20, 19] and that individual loops have been identified as being parallel or serial (either by programmer input or by automatic parallelization). To simplify the discussion in this paper, we assume that the loop dependence graph represents a set of k adjacent conformable and identically control dependent ....
Michael E. Wolf and Monica S. Lam. A Data Locality Optimization Algorithm. Proceedings of the ACM SIGPLAN Symposium on Programming Language Design and Implementation, pages 30--44, June 1991.
....DA for a single array reference contained in two loops. These bounds provide an estimate of the number of cache misses incurred by a given loop nest. They also showed how to analyze set conflicts in the case of direct mapped and set associative caches with non unit line sizes. Wolf and Lam [18] propose an algorithm that improves the locality of a loop nest by transforming the code via interchange, reversal, skewing and tiling based on a mathematical formulation of reuse and locality, and a loop transformation theory that unifies the various transforms as unimodular transformations. ....
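The idea of treating interchange, reversal and skewing as unimodular transformations can be sketched for a depth-2 loop nest as follows. This is an illustrative sketch only, not the algorithm from the cited paper; the helper names and matrix constants are hypothetical. Each transformation is an integer matrix with determinant +1 or -1 that maps an index vector to its position in the transformed nest, and composing transformations is matrix multiplication.

```python
def mat_vec(T, v):
    """Apply the transformation matrix T to the index vector v."""
    return tuple(sum(T[r][c] * v[c] for c in range(len(v))) for r in range(len(T)))


def mat_mul(A, B):
    """Compose two transformations: the result applies B first, then A."""
    n = len(A)
    return [[sum(A[r][k] * B[k][c] for k in range(n)) for c in range(n)]
            for r in range(n)]


def det2(T):
    """Determinant of a 2 x 2 matrix; a unimodular matrix has det +1 or -1."""
    return T[0][0] * T[1][1] - T[0][1] * T[1][0]


# Elementary transformations of a depth-2 nest with index vector (i, j).
INTERCHANGE = [[0, 1], [1, 0]]     # (i, j) -> (j, i)
REVERSE_INNER = [[1, 0], [0, -1]]  # (i, j) -> (i, -j)
SKEW_INNER = [[1, 0], [1, 1]]      # (i, j) -> (i, i + j)

# Skew the inner loop, then interchange the two loops.
SKEW_THEN_INTERCHANGE = mat_mul(INTERCHANGE, SKEW_INNER)
```

For example, `mat_vec(SKEW_THEN_INTERCHANGE, (1, 2))` maps the point (1, 2) to (3, 1). The sketch omits the legality test: a transformation is only valid if the transformed dependence vectors remain lexicographically positive, which requires dependence information not modelled here.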
Michael E. Wolf and Monica S. Lam. A Data Locality Optimization Algorithm. Proceedings of the ACM SIGPLAN Symposium on Programming Language Design and Implementation, pages 30--44, June 1991.
.... The techniques that have been proposed in the past for reducing false sharing fall into one of the following approaches [18, 8, 5]. Changing loop structures: Transform program loops, e.g. by blocking, alignment, or peeling, so that iterations in a parallel loop access disjoint cache lines [19, 8, 7]. Changing data structures: Change the layout of data structures, e.g. by array alignment and padding [18, 1]. Array alignment is the insertion of dummy space so as to change the starting address of an array variable. Array padding is an increase in the allocated dimension size of an array ....
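One simple way to picture array padding is to round an array dimension up so that each row occupies a whole number of cache lines, which keeps rows owned by different processors from sharing a line. The sketch below is an assumption-laden illustration, not the method of any cited paper: the function name `padded_dimension`, the 64-byte line size and the 8-byte element size are all hypothetical defaults, and the line size is assumed to be a multiple of the element size.

```python
def padded_dimension(n_elems, elem_size=8, line_size=64):
    """Return the smallest dimension >= n_elems whose byte length is a
    multiple of the cache line size.

    Assumes line_size is a multiple of elem_size (checked below).
    """
    if line_size % elem_size != 0:
        raise ValueError("line_size must be a multiple of elem_size")
    elems_per_line = line_size // elem_size
    # Round up to the next multiple of elems_per_line.
    lines_needed = (n_elems + elems_per_line - 1) // elems_per_line
    return lines_needed * elems_per_line
```

With these defaults a row of 10 elements would be allocated as 16 elements, while a row of 8 elements needs no padding. Padding alone does not guarantee line-aligned rows; the array's starting address must also be aligned, which corresponds to the array alignment step described above.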
Michael E. Wolf and Monica S. Lam. A Data Locality Optimization Algorithm. Proceedings of the ACM SIGPLAN Symposium on Programming Language Design and Implementation, pages 30--44, June 1991.
M. E. Wolf and M. S. Lam. A data locality optimization algorithm. In Proceedings of the SIGPLAN '91 Conference on Programming Language Design and Implementation, pages 30-44, Toronto Canada, June 1991.