
Unroll-and-Jam Using Uniformly Generated Sets (1997)  (9 citations)
Steve Carr, Department of Computer Science, Michigan Technological University...
International Symposium on Microarchitecture



View or download:
mtu.edu/pub/carr/LAUnroll.ps.gz

From:  mtu.edu/~carr/Publications

Abstract: Modern architectural trends in instruction-level parallelism (ILP) are to increase the computational power of microprocessors significantly. As a result, the demands on memory have increased. Unfortunately, memory systems have not kept pace. Even hierarchical cache structures are ineffective if programs do not exhibit cache locality. Because of this, compilers need to be concerned not only with finding ILP to utilize machine resources effectively, but also with ensuring that the resulting code...

Context of citations to this paper:

.... disadvantages of larger loop nests in selecting an optimal degree of unrolling and is similar in spirit to the technique discussed in [26]. There have been numerous studies showing the effectiveness of these optimizations on performance (e.g. [27, 28]) their impact on...

Cited by:
Improving Register Allocation for Subscripted Variables - Callahan, Carr, Kennedy (1990)
Evaluating Integrated Hardware-Software.. - Vijaykrishnan.. (2003)
A Quantitative Analysis of Tile Size Selection Algorithms - Hsu, Kremer

Similar documents (at the sentence level):
14.3%:   Combining Optimization for Cache and Instruction-Level Parallelism - Carr (1996)
5.7%:   Improving Software Pipelining with Hardware Support for.. - Carr, Sweany (1998)

Active bibliography (related documents):
0.1:   Improving the Ratio of Memory Operations to Floating-Point.. - Carr, Kennedy (1994)
0.1:   Optimizing Fortran90D/HPF for Distributed-Memory Computers - Roth (1997)
0.1:   A General Stencil Compilation Strategy for.. - Roth, Carr.. (1996)

Similar documents based on text:
0.3:   Optimizing Loop Performance for Clustered VLIW Architectures - Qian, Carr, Sweany (2002)
0.3:   Automatic Data Partitioning for the Agere Payload Plus Network .. - Carr, Sweany (2004)
0.2:   Optimizing Sparse Matrix-Vector Product Computations.. - Mellor-Crummey, Garvin (2003)

Related documents from co-citation:
7:   Combining loop transformations considering caches and scheduling - Wolf, Maydan et al. - 1996
6:   High Performance Compilers for Parallel Computing - Wolfe - 1995
5:   The Cache Performance and Optimizations of Blocked Algorithms - Lam, Rothberg et al. - 1991

BibTeX entry:

S. Carr and Y. Guan. Unroll-and-jam using uniformly generated sets. In Proceedings of the 30th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-97), pages 349--357, Los Alamitos, December 1--3, 1997. IEEE Computer Society. http://citeseer.ist.psu.edu/132810.html

@inproceedings{ carr97unrolljam,
    author = "Steve Carr and Yiping Guan",
    title = "Unroll-and-Jam Using Uniformly Generated Sets",
    booktitle = "International Symposium on Microarchitecture",
    pages = "349--357",
    year = "1997",
    url = "citeseer.ist.psu.edu/132810.html" }
Citations (may not include all citations):
474   A data locality optimizing algorithm - Wolf, Lam - 1991
344   Design and evaluation of a compiler algorithm for prefetchin.. - Mowry, Lam et al. - 1992
216   Strategies for cache and local memory management by global p.. - Gannon, Jalby et al. - 1987
162   Improving data locality with loop transformations - McKinley, Carr et al. - 1996
158   Improving register allocation for subscripted variables - Callahan, Carr et al. - 1990
110   Practical dependence testing - Goff, Kennedy et al. - 1991
79   Combining loop transformations considering caches and schedul.. - Wolf, Maydan et al. - 1996
72   A catalogue of optimizing transformations - Allen, Cocke - 1972
69   Estimating interlock and improving balance for pipelined mac.. - Callahan, Cocke et al. - 1988
66   ParaScope: A parallel programming environment - Callahan, Cooper et al. - 1987
51   Improving the ratio of memory operations to floating-point o.. - Carr, Kennedy - 1994
49   Memory-Hierarchy Management - Carr - 1992
47   Scalar replacement in the presence of conditional control fl.. - Carr, Kennedy - 1994
26   Combining optimization for cache and instruction-level parall.. - Carr - 1996
13   Loop quantization: An analysis and algorithm - Aiken, Nicolau - 1987
1   Unroll-and-jam guided by a linear-algebra-based reuse model - Guan - 1995





Documents on the same site (http://www.cs.mtu.edu/~carr/Publications.html):
Using Value Cloning To Improve Code Generation For Software.. - Kuras (1998)
The Performance of Scalar Replacement on the HP 715/50 - Steve Carr (1995)
Using Genetic Algorithms to Fine-Tune.. - Beaty, Colcord, Sweany (1996)
