Abstract: Modern architectural trends in instruction-level parallelism
(ILP) are significantly increasing the computational power of
microprocessors. As a result, the demands on
memory have increased. Unfortunately, memory systems
have not kept pace. Even hierarchical cache structures are
ineffective if programs do not exhibit cache locality. Because
of this, compilers need to be concerned not only with
finding ILP to utilize machine resources effectively, but also
with ensuring that the resulting code...
Context of citations to this paper:
... disadvantages of larger loop nests in selecting an optimal degree of unrolling and is similar in spirit to the technique discussed in [26]. There have been numerous studies showing the effectiveness of these optimizations on performance (e.g. [27, 28]) their impact on...
Cited by:
Improving Register Allocation for Subscripted Variables - Callahan, Carr, Kennedy (1990)
Evaluating Integrated Hardware-Software.. - Vijaykrishnan.. (2003)
A Quantitative Analysis of Tile Size Selection Algorithms - Hsu, Kremer
Similar documents (at the sentence level):
14.3%: Combining Optimization for Cache and Instruction-Level Parallelism - Carr (1996)
5.7%: Improving Software Pipelining with Hardware Support for.. - Carr, Sweany (1998)
Active bibliography (related documents):
0.1: Improving the Ratio of Memory Operations to Floating-Point.. - Carr, Kennedy (1994)
0.1: Optimizing Fortran90D/HPF for Distributed-Memory Computers - Roth (1997)
0.1: A General Stencil Compilation Strategy for.. - Roth, Carr.. (1996)
Similar documents based on text:
0.3: Optimizing Loop Performance for Clustered VLIW Architectures - Qian, Carr, Sweany (2002)
0.3: Automatic Data Partitioning for the Agere Payload Plus Network .. - Carr, Sweany (2004)
0.2: Optimizing Sparse Matrix-Vector Product Computations.. - Mellor-Crummey, Garvin (2003)
Related documents from co-citation:
7: Combining loop transformations considering caches and scheduling - Wolf, Maydan et al. - 1996
6: High Performance Compilers for Parallel Computing - Wolfe - 1995
5: The cache performance and Optimizations of Blocked Algorithms - Lam, Rothberg et al. - 1991
BibTeX entry:
S. Carr and Y. Guan. Unroll-and-jam using uniformly generated sets. In Proceedings of the 30th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-97), pages 349--357, Los Alamitos, December 1--3, 1997. IEEE Computer Society. http://citeseer.ist.psu.edu/132810.html
@inproceedings{ carr97unrolljam,
author = "Steve Carr and Yiping Guan",
title = "Unroll-and-Jam Using Uniformly Generated Sets",
booktitle = "International Symposium on Microarchitecture",
pages = "349--357",
year = "1997",
url = "citeseer.ist.psu.edu/132810.html" }
Citations (may not include all citations):
474: A data locality optimizing algorithm - Wolf, Lam - 1991
344: Design and evaluation of a compiler algorithm for prefetchin.. - Mowry, Lam et al. - 1992
216: Strategies for cache and local memory management by global p.. - Gannon, Jalby et al. - 1987
162: Improving data locality with loop transformations - McKinley, Carr et al. - 1996
158: Improving register allocation for subscripted variables - Callahan, Carr et al. - 1990
110: Practical dependence testing - Goff, Kennedy et al. - 1991
79: Combining loop transformations considering caches and schedul.. - Wolf, Maydan et al. - 1996
72: A catalogue of optimizing transformations - Allen, Cocke - 1972
69: Estimating interlock and improving balance for pipelined mac.. - Callahan, Cocke et al. - 1988
66: ParaScope: A parallel programming environment - Callahan, Cooper et al. - 1987
51: Improving the ratio of memory operations to floating-point o.. - Carr, Kennedy - 1994
49: Memory-Hierarchy Management - Carr - 1992
47: Scalar replacement in the presence of conditional control fl.. - Carr, Kennedy - 1994
26: Combining optimization for cache and instruction-level parall.. - Carr - 1996
13: Loop quantization: An analysis and algorithm - Aiken, Nicolau - 1987
1: Unroll-and-jam guided by a linear-algebra-based reuse model - Guan - 1995
Documents on the same site (http://www.cs.mtu.edu/~carr/Publications.html):
Using Value Cloning To Improve Code Generation For Software.. - Kuras (1998)
The Performance of Scalar Replacement on the HP 715/50 - Steve Carr (1995)
Using Genetic Algorithms to Fine-Tune.. - Beaty, Colcord, Sweany (1996)
CiteSeer.IST - Copyright Penn State and NEC