30 citations found.
J. Mellor-Crummey, D. Whalley, and K. Kennedy. Improving memory hierarchy performance for irregular applications. In Proceedings of the

This paper is cited in the following contexts:

Evaluating the Impact of Memory System Performance on Software.. - Badawy, al. (2001)   (5 citations)

....cause data to be accessed in an irregular manner, making spatial locality (reuse of data on a cache line) unlikely when the data is larger than the cache. Researchers have discovered recently that run time data and computation transformations can improve the locality of irregular computations [1, 13, 28, 29]. Because computations are typically commutative, loop iterations can be safely reordered to bring accesses to the same data closer together in time. Data layout can also be transformed so that data accesses are more likely to be to the same cache line. These compiler and run time transformations ....

J. Mellor-Crummey, D. Whalley, and K. Kennedy. Improving memory hierarchy performance for irregular applications. In Proceedings of the


Scaling Irregular Parallel Codes with Minimal.. - Nikolopoulos.. (2001)

....paradigm to perform on par with implementations of the same applications with message passing remains an open problem. There are some important steps taken in this direction, encompassing techniques such as manual data placement, data reordering and dynamic subdivision of the problem space [6, 12, 19]. Unfortunately, most, if not all, of these techniques are non-portable and require complex code and data transformations, hence significant programming effort. The lack of a systematic methodology for applying these transformations is also a concern. It is a major research challenge to scale ....

J. Mellor-Crummey, D. Whalley, and K. Kennedy. Improving Memory Hierarchy Performance for Irregular Applications. In Proc. of the 13th ACM International Conference on Supercomputing (ICS'99), pages 425--433, Rhodes, Greece, 1999.


Proof of Correctness for Sparse Tiling of Gauss-Seidel - Strout, Carter, Ferrante

....the tiling and array padding factors has not been solved for all cases. Rivera and Tseng [15] look more specifically at how to do tiling and array padding for 3D regular meshes. There has also been work on run time techniques for improving the intra iteration locality for irregular meshes [8, 14, 2, 7, 13]. Mitchell et al. [14] describe a compiler optimization which operates on non-affine array references in loops. Sparse matrix data structures require indirect array references, which are a type of non-affine array reference. Also, Im and Yelick [8, 9] describe a code generator called SPARSITY which ....

John Mellor-Crummey, David Whalley, and Ken Kennedy. Improving memory hierarchy performance for irregular applications. In Proceedings of the


Bridging Processor and Memory Performance in ILP Processors via .. - Palem, al. (2001)   (2 citations)

....This motivates the need for bandwidth ameliorating strategies. have significantly improved the performance of applications with predictable access patterns. Unfortunately, these optimizations fare poorly when applied to important pointer intensive scientific and dynamic real world applications [26, 31, 22]. We propose a data remapping strategy to enhance locality, specifically for pointer intensive programs with extensive dynamic object allocations. We implement our locality enhancing algorithms (LEA) in the Trimaran [28] EPIC compiler and the GNU C compiler. We subsequently detail simulations and ....

....that for the faster Pentium III processor, despite a factor of 8 reduction in the second level cache size, we attain 20% improvement compared to the UltraSparc. 4 Related Work Previous work has attacked the processor memory gap by computation reordering to increase spatial and temporal locality [5, 8, 7, 32, 18, 22]. Most recently, Crummey et al. [22] explore a coordinated data and computation reordering .... [Figure: normalized execution time and speedup for ART, DM, Field, Health, Perimeter, TSP, and TreeAdd on Pentium III, Pentium II, and UltraSparc]

[Article contains additional citation context not shown here]

J. Mellor-Crummey, D. Whalley, and K. Kennedy. "Improving memory hierarchy performance for irregular applications using data and computation reordering". In Proceedings of the ACM International Conference on Supercomputing, pages 425-433, June 1999.


Compiler Generated Multithreading to Alleviate Memory Latency - Beyls, D'Hollander (2000)

....to reduce the number of capacity misses, but also other loop transformations can be used. Examples of such loop transformations can be found in [Yamada et al. 1994, McKinley et al. 1996]. Recently, some program transformations to shorten the reuse distance in irregular applications have been proposed [Mellor-Crummey et al. 1999, Ding et al. 1999]. It needs to be investigated how cache remapping can be used in these applications to remove the remaining conflict misses. Software controlled cache placement gives rise to a number of interesting future research directions. As the compiler is able to control the caching ....

John Mellor-Crummey, David Whalley, and Ken Kennedy. Improving memory hierarchy performance for irregular applications. In Proceedings of the 1999 Conference on Supercomputing, pages 425-433, June 20-25 1999.


Rescheduling for Locality in Sparse Matrix Computations - Strout, Carter, Ferrante (2001)   (2 citations)

....structures to do the Gauss-Seidel computation. Since the structure isn't known until runtime, any rescheduling of the computation or data rearrangement must occur at runtime as well. 2 Previous Work There has been a lot of previous work on techniques for improving the intra iteration locality [4, 7, 1, 3, 6]. We will be presenting a technique for improving both the intra and inter iteration locality. The only other technique we are aware of which accomplishes this for iterative algorithms on irregular meshes is cache blocking by Douglas et al. [2]. On the left is an Iteration Space Graph which has ....

John Mellor-Crummey, David Whalley, and Ken Kennedy. Improving memory hierarchy performance for irregular applications. In Proceedings of the 1999 Conference on Supercomputing, ACM SIGARCH, pages 425--433, N.Y., June 20--25 1999. ACM Press.


Rescheduling for Locality in Sparse Matrix Computations - Strout, Carter, Ferrante

....locality on the sparse matrix, because in their rescheduled code the entire sparse matrix is traversed each convergence iteration. Other work which looks at runtime data reorganization and rescheduling includes Demmel et al. [2], Han and Tseng [8], Ding and Kennedy [3], and Mellor-Crummey et al. [13]. 5 Conclusion Runtime tiling is possible with unstructured iteration spaces, and we show it can improve the data locality and therefore the performance of Gauss-Seidel. Specifically we present an algorithm for generating a serial sparse tiling for Gauss-Seidel. We also describe a simple static ....

John Mellor-Crummey, David Whalley, and Ken Kennedy. Improving memory hierarchy performance for irregular applications. In Proceedings of the 1999 Conference on Supercomputing, ACM SIGARCH, pages 425--433, N.Y., June 20--25 1999. ACM Press.


Evaluating the Impact of Memory System Performance on.. - Badawy, Aggarwal.. (2001)   (5 citations)

....arrays cause data to be accessed in an irregular manner, making spatial locality (reuse of data on a cache line) unlikely when the data is larger than the cache. Recently, researchers have discovered run time data and computation transformations can improve the locality of irregular computations [1, 13, 28, 29]. Because computations are typically commutative, loop iterations can be safely reordered to bring accesses to the same data closer together in time. Data layout can also be transformed so that data accesses are more likely to be to the same cache line. These compiler and run time transformations ....

J. Mellor-Crummey, D. Whalley, and K. Kennedy. Improving memory hierarchy performance for irregular applications. In Proceedings of the 1999 ACM International Conference on Supercomputing, Rhodes, Greece, June 1999.


A Comparison of Parallelization Techniques for Irregular.. - Han, Tseng (2001)   (2 citations)

.... of our applications, we applied recursive coordinate bisection (RCB) to recursively divide data nodes into partitions based on their geometric coordinates [2]. RCB, space filling curves, and graph partitioning algorithms have all been shown to improve locality in irregular scientific codes [1, 15, 10]. For our input data sets RCB improves the percentage of interior iterations to 74-90%. Figure 11 compares 16 processor speedups for baseline input and preprocessed input with RCB. The y axis displays speedups and the x axis shows locality measured in the percentage of interior iterations. Each graph ....

.... system [4, 14]. Compiler techniques were developed to automatically generate calls to the run time routines [11]. Researchers have also developed techniques to improve the data locality of irregular computations, using dynamic copying of data elements [5] or partitioning computation and data [1, 15]. We earlier developed LOCALWRITE for use on a software DSM running on a message passing multiprocessor [9]. We compared it to REPLICATEBUFS and explicit message passing using CHAOS. Results showed LOCALWRITE consistently outperformed REPLICATEBUFS, but was less efficient than explicit messages ....

J. Mellor-Crummey, D. Whalley, and K. Kennedy. Improving memory hierarchy performance for irregular applications. In Proceedings of the 1999 ACM International Conference on Supercomputing, Rhodes, Greece, June 1999.


Locality Optimizations For Adaptive Irregular Scientific Codes - Han, Tseng (2000)   (2 citations)

....is relatively little information at compile time concerning the locality properties of irregular programs. Researchers have demonstrated that the performance of irregular programs can be improved by applying a combination of computation and data layout transformations on irregular computations [5, 1, 7, 14, 15]. Results have been promising, but a number of issues have not been examined. Our paper makes the following contributions: • Design a compiler run time system for either automatically applying graph based locality optimizations, or using programmer annotations to guide coordinate based locality ....

....improved with low overhead [5, 1]. Both CPACK and RCM are traversal methods which reorder based on a single pass over the computation. Space filling curve (MORTON) Space filling curves (e.g. Morton) are continuous, non smooth curves that pass through every point in a finite k dimensional space [11, 14]. Because interactions tend to be local, arranging data using space filling curves reduces the distance (in memory) between two geometrically close points in space, yielding better locality [14]. When geometric coordinate information is available, sorting data and computation using space filling ....

[Article contains additional citation context not shown here]

J. Mellor-Crummey, D. Whalley, and K. Kennedy. Improving memory hierarchy performance for irregular applications. In Proceedings of the 1999 ACM International Conference on Supercomputing, Rhodes, Greece, June 1999.
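
The excerpt above describes arranging data along a space filling curve so that points close in space land close in memory. A minimal Python sketch of that idea for 2D points follows. It is an illustration only, not the cited authors' implementation: the function names, the grid resolution `cells`, and the bit width are all assumptions made here.

```python
def interleave_bits(x: int, y: int, bits: int = 16) -> int:
    """Return the Morton (Z-order) key for non-negative integer grid coordinates."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)       # x bits go to even positions
        key |= ((y >> i) & 1) << (2 * i + 1)   # y bits go to odd positions
    return key


def morton_order(points, cells=1024):
    """Return the indices of `points` sorted along a 2D Morton curve.

    `points` is a sequence of (x, y) floats. `cells` is an assumed grid
    resolution used to quantize coordinates before interleaving.
    """
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x0, y0 = min(xs), min(ys)
    span = max(max(xs) - x0, max(ys) - y0) or 1.0  # avoid division by zero

    def key(i):
        gx = min(int((points[i][0] - x0) / span * cells), cells - 1)
        gy = min(int((points[i][1] - y0) / span * cells), cells - 1)
        return interleave_bits(gx, gy)

    return sorted(range(len(points)), key=key)


if __name__ == "__main__":
    pts = [(1.0, 1.0), (0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
    order = morton_order(pts)
    print([pts[i] for i in order])  # data laid out in curve order
```

The returned permutation can be used to copy the data array into its new layout; any index arrays that refer to the data would then need to be remapped through the same permutation.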


Software Support For Improving Locality in Advanced Scientific Codes - Tseng (2000)

.... transformations (e.g. loop permutation, fusion, tiling) for sequential dense matrix codes with regular memory access patterns has proven useful [19, 27, 48, 49, 57, 55, 71, 76, 87, 88]. Data layout optimizations (e.g. transpose, padding) also help [2, 3, 13, 18, 39, 69, 70], even for irregular [1, 22, 58] and pointer based programs [8, 17]. Despite the major advances made in providing software support for improving locality for both sequential and parallel programs, more work remains for advanced scientific computations. 2 Advanced Scientific Applications We begin by presenting three types of ....

....both stability and improved performance for 3D stencil codes. 3.2 Run time Data and Computation Transformations 3D multigrid codes are amenable to compile time analysis. Recently, researchers found data and computation transformations can also improve the locality of both irregular computations [1, 20, 22, 58, 59] and even programs with pointer based data structures [8, 16, 17]. These transformations may be applied because irregular scientific computations are .... [Figure 4: Overview of Data and Computation Transformations]

[Article contains additional citation context not shown here]

J. Mellor-Crummey, D. Whalley, and K. Kennedy. Improving memory hierarchy performance for irregular applications. In Proceedings of the 1999 ACM International Conference on Supercomputing, Rhodes, Greece, June 1999.


A Comparison of Locality Transformations for Irregular Codes - Han, Tseng (2000)   (8 citations)

....computations have poor temporal and spatial locality because they do not repeatedly access data in memory with small constant strides. Researchers recently showed locality can be improved for irregular codes using compiler and run time data layout and computation reordering transformations [1, 8, 27, 28]. In this paper, we form a framework for applying locality optimizations to irregular codes and experimentally evaluate the impact of a number of optimization techniques. The contributions of this paper are: This research was supported in part by NSF CAREER Development Award #ASC9625531 in New ....

....data in irregular computations, by viewing iterations as graph edges. Performance is improved with low overhead [1, 6]. Recursive Coordinate Bisection (rcb) Space filling curves (e.g. Hilbert, Morton) are continuous, non smooth curves that pass through every point in a finite k dimensional space [15, 27, 32]. Because interactions tend to be local, arranging data using space filling curves reduces the distance (in memory) between two geometrically close points in space, yielding better locality [27]. When geometric coordinate information is available, sorting data and computation using space filling ....

[Article contains additional citation context not shown here]

J. Mellor-Crummey, D. Whalley, and K. Kennedy. Improving memory hierarchy performance for irregular applications. In Proceedings of the 1999 ACM International Conference on Supercomputing, Rhodes, Greece, June 1999.


Locality Optimizations For Adaptive Irregular Scientific Codes - Han, Tseng (2000)   (2 citations)

....is relatively little information at compile time concerning the locality properties of irregular programs. Researchers have demonstrated that the performance of irregular programs can be improved by applying a combination of computation and data layout transformations on irregular computations [5, 1, 7, 14, 15]. Results have been promising, but a number of issues have not been examined. Our paper makes the following contributions: • Design a compiler language run time system for either automatically applying graph based locality optimizations, or using programmer annotations to guide coordinate based ....

....improved with low overhead [5, 1]. Both CPACK and RCM are traversal methods which reorder based on a single pass over the computation. Space filling curve (MORTON) Space filling curves (e.g. Morton) are continuous, non smooth curves that pass through every point in a finite k dimensional space [11, 14]. Because interactions tend to be local, arranging data using space filling curves reduces the distance (in memory) between two geometrically close points in space, yielding better locality [14]. When geometric coordinate information is available, sorting data and computation using space filling ....

[Article contains additional citation context not shown here]

J. Mellor-Crummey, D. Whalley, and K. Kennedy. Improving memory hierarchy performance for irregular applications. In Proceedings of the 1999 ACM International Conference on Supercomputing, Rhodes, Greece, June 1999.


Guiding Program Transformations with Modal Performance Models - Mitchell (2000)   (2 citations)

....[33] have applied the original inspector executor technique to increase locality. A direct application of inspector executor cannot localize non-affine references, as it remaps A. Remapping A would perform as badly as the original computation. For work done contemporaneously with this thesis, see [79] and [53]. Thus, localizing non-affine references requires a general iteration permutation, as well as a non linear data remapping. To generate a permutation and remap the data, we must introduce a computation to do so at run time. However, this new computation must also perform well. In the ....

John Mellor-Crummey, David Whalley, and Ken Kennedy. Improving memory hierarchy performance for irregular applications. In International Conference on Supercomputing, pages 425--433, June 1999.


Improving Locality For Adaptive Irregular Scientific Codes - Han, Tseng (1999)   (4 citations)

....is relatively little information at compile time concerning the locality properties of irregular programs. Researchers have demonstrated that the performance of irregular programs can be improved by applying a combination of computation and data layout transformations on irregular computations [9, 1, 11, 33, 35] and even programs with pointer based data structures [5, 7]. Results have been promising, but a number of issues have not been examined. Our paper makes the following contributions: • Experimentally determine effective parameters for locality optimization algorithms that balance overhead ....

....Sort 2.3 Locality optimization algorithms When geometric coordinate information is available, sorting data and computation using space filling curves is the locality optimization of choice, since it achieves good locality with very low overhead simply using a multidimensional bucket sort [33]. However, geometric coordinate information is not necessarily available, and will probably require user intervention to identify. We briefly review a number of other locality optimization algorithms which are easier to automate as compiler directed transformations. Lexicographical sorting One ....

[Article contains additional citation context not shown here]

J. Mellor-Crummey, D. Whalley, and K. Kennedy. Improving memory hierarchy performance for irregular applications. In Proceedings of the 1999 ACM International Conference on Supercomputing, Rhodes, Greece, June 1999.


Improving Fine-Grained Irregular Shared-Memory Benchmarks.. - Hu, Cox, Zwaenepoel (2000)   (9 citations)

....modern uniprocessor architectures. There has been a large body of research on loop transformations for dense matrix codes, e.g. [14, 32, 7, 31]. Recently, several works have focused on array and pointer based data structures in irregular applications. Ding and Kennedy [12] and Mellor-Crummey et al. [24] looked at irregular applications which perform irregular accesses of array elements via interaction (indirection) arrays. Such applications fall into our Category 2. The basic idea in Ding and Kennedy [12] is to examine the contents of indirection array and generate a new ordering for the array ....

....on the access affinity implied by the indirection array. They then reorder data in memory and adjust the contents of the indirection arrays. The technique does not require the geometric coordinates of the physical quantities that array elements are modeling. Like our work, Mellor-Crummey et al. [24] use space filling curves to reorder data and/or computation, but they focus on the uniprocessor memory hierarchy. There have been several semi automatic data placement approaches to improving cache performance on uniprocessors for heap objects. Calder et al. proposed cache conscious data ....

J. Mellor-Crummey, D. Whalley, and K. Kennedy. Improving memory hierarchy performance for irregular applications. In Proceedings of the 1999 International Conference on Supercomputing, June 1999.
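
The excerpts above describe generating a new data ordering from the contents of an indirection array, moving the data, and then adjusting the indirection array to match. A minimal Python sketch of that workflow follows. The ordering rule used here is a simple first-touch packing chosen for illustration; it is an assumption of this sketch and is not claimed to be the exact algorithm of any cited paper. The function names are hypothetical.

```python
def first_touch_order(index_array, n):
    """Return new_pos, where new_pos[old] is the packed slot of element `old`.

    Elements are assigned consecutive slots in the order the indirection
    array first references them; unreferenced elements go to the end.
    """
    new_pos = [-1] * n
    next_slot = 0
    for old in index_array:
        if new_pos[old] == -1:
            new_pos[old] = next_slot
            next_slot += 1
    for old in range(n):
        if new_pos[old] == -1:
            new_pos[old] = next_slot
            next_slot += 1
    return new_pos


def apply_reordering(data, index_array, new_pos):
    """Move data into its new layout and rewrite the indirection array."""
    packed = [None] * len(data)
    for old, new in enumerate(new_pos):
        packed[new] = data[old]
    remapped = [new_pos[i] for i in index_array]
    return packed, remapped


if __name__ == "__main__":
    data = ["a", "b", "c", "d", "e"]
    idx = [3, 0, 3, 4, 0]
    new_pos = first_touch_order(idx, len(data))
    packed, remapped = apply_reordering(data, idx, new_pos)
    # The reordered program reads exactly the same values as the original.
    assert [packed[i] for i in remapped] == [data[i] for i in idx]
    print(packed, remapped)
```

After the transformation the loop walks `remapped` instead of `idx`, and consecutive iterations touch neighbouring slots of `packed` the first time each element is used.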


Evaluating the Impact of Memory System Performance on .. - Aggarwal, Badawy.. (2000)   (5 citations)

....on the values in the index array. If data is accessed in an irregular manner, spatial locality is unlikely to be obtained if the data is larger than the cache. Recently, researchers have discovered run time data and computation transformations can improve the locality of irregular computations [1, 14, 29, 30]. Because computations are typically commutative, loop iterations can be safely reordered to bring accesses to the same data closer together in time. Data layout can also be transformed so that data accesses are more likely to be to the same cache line. These compiler and run time transformations ....

....interactions for a pair of data. Such interactions tend to occur between nearby data items. Partitioning data based on either geometric coordinate data or the underlying interaction graph can thus increase the probability accesses will be made to data within the partition, increasing cache hits [1, 29, 18]. In particular, partitions based on geometric coordinates yield good locality if applicable. To improve data locality of indexed array codes, we thus apply recursive coordinate bisection (RCB) a partitioning technique based on geometric coordinate information. RCB works by recursively splitting ....

[Article contains additional citation context not shown here]

J. Mellor-Crummey, D. Whalley, and K. Kennedy. Improving memory hierarchy performance for irregular applications. In Proceedings of the 1999 ACM International Conference on Supercomputing, Rhodes, Greece, June 1999.
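
The excerpt above introduces recursive coordinate bisection (RCB) as a partitioning technique based on geometric coordinates, but the description is cut off. A minimal Python sketch of one common formulation follows, assuming a median split along the coordinate axis with the widest extent. Those choices, the `leaf_size` parameter, and the function name are assumptions of this sketch rather than details taken from the cited work.

```python
def rcb(points, ids=None, leaf_size=2):
    """Return point indices ordered so that each RCB partition is contiguous.

    `points` is a sequence of equal-length coordinate tuples. Recursion stops
    once a partition holds at most `leaf_size` points (an assumed cutoff).
    """
    if ids is None:
        ids = list(range(len(points)))
    if len(ids) <= leaf_size:
        return list(ids)

    dims = len(points[0])

    def extent(d):
        values = [points[i][d] for i in ids]
        return max(values) - min(values)

    axis = max(range(dims), key=extent)            # widest coordinate axis
    ordered = sorted(ids, key=lambda i: points[i][axis])
    mid = len(ordered) // 2                        # median split
    return (rcb(points, ordered[:mid], leaf_size)
            + rcb(points, ordered[mid:], leaf_size))


if __name__ == "__main__":
    pts = [(0.0, 0.0), (10.0, 0.0), (0.0, 1.0), (10.0, 1.0)]
    print(rcb(pts))  # nearby points end up adjacent in the ordering
```

Laying data out in the returned order keeps each partition in one contiguous block of memory, which is the property the excerpt associates with more cache hits.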


Code Transformations to Improve Memory Parallelism - Pai, Adve (1999)   (6 citations)

....Mp3d is an irregular, asynchronous, communication intensive SPLASH code [23]. To eliminate false sharing, key data structures were padded to a multiple of the cache line size. To reduce true sharing and improve locality, the data elements were sorted by position in the modeled physical world [24]. Mp3d has no recurrences, but sees poor miss clustering because of large loop bodies. Thus, inner loop unrolling and aggressive scheduling can provide clustering here, as discussed in Section 3.3. We assumed that the dominant move loop was explicitly marked parallel. Despite these ....

J. Mellor-Crummey, D. Whalley, and K. Kennedy, "Improving Memory Hierarchy Performance for Irregular Applications," in Proceedings of the 13th ACM-SIGARCH International Conference on Supercomputing, pp. 425--433, June 1999.


Code Transformations to Improve Memory Parallelism - Pai, Adve (1999)   (6 citations)

....[Table 2. Data set sizes and number of processors for experiments on simulated and real systems.] key data structures were padded to a multiple of the cache line size. To reduce true sharing and improve locality, the data elements were sorted by position in the modeled physical world [12]. Mp3d has no recurrences, but sees poor miss clustering because of large loop bodies. Thus, inner loop unrolling and aggressive scheduling can provide clustering here, as discussed in Section 3.3. We assumed that the dominant move loop was explicitly marked parallel. Despite these ....

J. Mellor-Crummey, D. Whalley, and K. Kennedy. Improving Memory Hierarchy Performance for Irregular Applications. In Proceedings of the 13th ACM-SIGARCH International Conference on Supercomputing, pages 425--433, June 1999.


Evaluating Locality Optimizations For Adaptive Irregular.. - Han, Tseng

....is relatively little information at compile time concerning the locality properties of irregular programs. Researchers have demonstrated that the performance of irregular programs can be improved by applying a combination of computation and data layout transformations on irregular computations [9, 1, 11, 34, 36] and even programs with pointer based data structures [5, 7]. Results have been promising, but a number of issues have not been examined. Our paper makes the following contributions: • Experimentally evaluate the effectiveness of several locality optimizations for a range of input data and ....

....in irregular computations, by viewing iterations as graph edges. Performance is improved with low overhead [9, 1]. Recursive Coordinate Bisection (RCB) Space filling curves (e.g. Hilbert, Morton) are continuous, non smooth curves that pass through every point in a finite k dimensional space [22, 34, 45]. Because interactions tend to be local, arranging data using space filling curves reduces the distance (in memory) between two geometrically close points in space, yielding better locality [34]. When geometric coordinate information is available, sorting data and computation using space filling ....

[Article contains additional citation context not shown here]

J. Mellor-Crummey, D. Whalley, and K. Kennedy. Improving memory hierarchy performance for irregular applications. In Proceedings of the 1999 ACM International Conference on Supercomputing, Rhodes, Greece, June 1999.


Code Transformations to Improve Memory Parallelism - Pai, Adve (1999)   (6 citations)

....Mp3d is an irregular, asynchronous, communication intensive SPLASH code [16]. To eliminate false sharing, key data structures were padded to a multiple of the cache line size. To reduce true sharing and improve locality, the data elements were sorted by position in the modeled physical world [9]. Mp3d has no recurrences, but sees poor miss clustering because of large loop bodies. Thus, inner loop unrolling and aggressive scheduling can provide clustering here, as discussed in Section 3.3. We assumed that the dominant move loop was explicitly marked parallel. 5. Experimental Results ....

J. Mellor-Crummey et al. Improving Memory Hierarchy Performance for Irregular Applications. In Proc. of the 13th Int'l Conf. on Supercomputing, 1999.


Efficient Compiler and Run-Time Support for Parallel Irregular.. - Han, Tseng (2000)   (4 citations)

....data elements based on loop traversal order, and show major improvements in performance [6]. They were able to automate most of their transformations in a compiler. Mellor-Crummey et al. use a geometric partitioning algorithm based on space filling curves to map multidimensional data to memory [32]. In comparison, we improve locality characteristics of parallel reductions, a key component of irregular computations. We previously introduced the LocalWrite algorithm in the context of compiling for software DSMs [15, 14]. This paper extends and refines our algorithm in greater detail. Two other ....

J. Mellor-Crummey, D. Whalley, and K. Kennedy. Improving memory hierarchy performance for irregular applications. In Proceedings of the 1999 ACM International Conference on Supercomputing, Rhodes, Greece, June 1999.


A Comparison of Locality Transformations for Irregular Codes - Hwansoo Han Chau-Wen (2000)   (8 citations)

....computations have poor temporal and spatial locality because they do not repeatedly access data in memory with small constant strides. Researchers recently showed locality can be improved for irregular codes using compiler and run time data layout and computation reordering transformations [1, 8, 25, 26]. In this paper, we form a framework for applying locality optimizations to irregular codes and experimentally evaluate the impact of a number of optimization techniques. The contributions of this paper are: • Experimentally evaluate the effectiveness of several locality optimizations for a ....

....data in irregular computations, by viewing iterations as graph edges. Performance is improved with low overhead [1, 6]. Recursive Coordinate Bisection (RCB) Space filling curves (e.g. Hilbert, Morton) are continuous, non smooth curves that pass through every point in a finite k dimensional space [13, 25, 30]. Because interactions tend to be local, arranging data using space filling curves reduces the distance (in memory) between two geometrically close points in space, yielding better locality [25]. However, space filling curves may not work well for unevenly distributed nodes, due to the fixed size ....

[Article contains additional citation context not shown here]

J. Mellor-Crummey, D. Whalley, and K. Kennedy. Improving memory hierarchy performance for irregular applications. In Proceedings of the 1999 ACM International Conference on Supercomputing, Rhodes, Greece, June 1999.


Exploiting Instruction-Level Parallelism for Memory System.. - Pai (2000)   Self-citation (Kennedy)

....eliminate false sharing in the irregular, asynchronous, and communication intensive Mp3d application, key data structures were padded to a multiple of the cache line size. To reduce true sharing and improve locality in Mp3d, the data elements were sorted by position in the modeled physical world [MWK99]. Erlebacher is a shared memory port of a code by Thomas Eidson at the Institute for Computer Applications in Science and Engineering (ICASE). Like FFT, LU, and Ocean, Erlebacher is also a regular array based code dominated by loop nests. Em3d is a shared memory adaptation of a Split C ....

John Mellor-Crummey, David Whalley, and Ken Kennedy. Improving Memory Hierarchy Performance for Irregular Applications. In Proceedings of the 13th ACM-SIGARCH International Conference on Supercomputing, pages 425--433, June 1999.


Transforming Loops to Recursion for Multi-Level Memory.. - Yi, Adve, Kennedy (2000)   (10 citations)  Self-citation (Kennedy)

....slicing in our recursion transformation. The loop fusion effect we obtain follows directly from this use of iteration space slicing. Finally, the automatic recursion transformation can play an important complementary role to several recursive data organizing techniques that have been proposed [7, 20]. For example, Chatterjee et al. show that recursive reordering of data produces significant performance benefits on modern memory hierarchies, and they argue that recursive control structures may be needed to fully exploit their potential. Conversely, we believe that our work can specially benefit ....

J. Mellor-Crummey, D. Whalley, and K. Kennedy. Improving Memory Hierarchy Performance For Irregular Applications. In Proc. 13th ACM Int'l Conference on Supercomputing, Rhodes, Greece, 1999.

CiteSeer.IST - Copyright Penn State and NEC