Abstract: Exploiting parallelism at both the multiprocessor level and the instruction level is an effective means for supercomputers to achieve high performance. The amount of instruction-level parallelism available to superscalar or VLIW node processors can be limited, however, with conventional compiler optimization techniques. In this paper, a set of compiler transformations designed to increase instruction-level parallelism is described. The effectiveness of these transformations is evaluated using...
Context of citations to this paper:
...with other code optimizations, can increase instruction level parallelism and improve memory hierarchy locality [Alex93, Baco94, Davi94, Davi96, Mahl92], as described in chapter four and chapter five. A detailed example of loop unrolling has already been presented in...
...by the compiler and discusses the effect of loop unrolling on instruction size buffer and register pressure within the loop. Mahlke [19] studies optimizations which can increase instruction level parallelism for supercomputer. Loop unrolling is one of them. By analyzing...
Cited by:
The Impact SC140 Code Generator - Shannon (2002)
Runtime Predictability of Loops - de Alba, Kaeli (2001)
Compiler and Microarchitecture Mechanisms for Exploiting.. - Postiff (2001)
Active bibliography (related documents):
0.5: Hyperblock Performance Optimizations For ILP Processors - August (1996)
0.2: Using Profile Information to Assist Advanced.. - Chen, Mahlke.. (1992)
0.2: Three Architectural Models for Compiler-Controlled.. - Chang, Warter.. (1995)
Similar documents based on text:
0.2: Acceleration of First and Higher Order Recurrences on.. - Schlansker, Kathail (1993)
0.2: Enhancing Instruction Level Parallelism Through.. - Bringmann (1995)
0.2: Tolerating Data Access Latency with Register Preloading - William Chen (1992)
Related documents from co-citation:
6: Trace Scheduling: A Technique for Global Microcode Compaction - Fisher - 1981
6: IMPACT: An architectural framework for multiple-instruction-issue processors - Chang, Mahlke et al. - 1991
6: Dependence Graphs and Compiler Optimization - Kuck - 1981
BibTeX entry:
S. A. Mahlke, W. Y. Chen, J. C. Gyllenhaal, W. W. Hwu, P. P. Chang, and T. Kiyohara, "Compiler Code Transformations for Superscalar-Based High-Performance Systems," in Proceedings of Supercomputing '92, Nov. 1992. http://citeseer.ist.psu.edu/mahlke92compiler.html
@inproceedings{mahlke92compiler,
  author = "Scott A. Mahlke and William Y. Chen and John C. Gyllenhaal and Wen-mei W. Hwu and Pohua P. Chang and Tokuzo Kiyohara",
  title = "Compiler Code Transformations for Superscalar-Based High-Performance Systems",
  booktitle = "Proceedings of Supercomputing '92",
  publisher = "IEEE",
  address = "Minneapolis, MN",
  month = nov,
  pages = "808--817",
  year = "1992",
  url = "http://citeseer.ist.psu.edu/mahlke92compiler.html"
}
Citations (may not include all citations):
1399: Compilers: Principles - Aho, Sethi et al. - 1986
407: Trace scheduling: A technique for global microcode compactio.. - Fisher - 1981
353: Software pipelining: An effective scheduling technique for V.. - Lam - 1988
296: Free Software Foundation - Stallman, porting - 1989
217: The PERFECT club benchmarks: Effective performance evaluatio.. - Berry - 1989
173: Bulldog: A Compiler for VLIW Architectures - Ellis - 1985
171: Dependence graphs and compiler optimizations - Kuck, Kuhn et al. - 1981
160: IMPACT: An architectural framework for multiple-instruction-.. - Chang, Mahlke et al. - 1991
130: A VLIW architecture for a trace scheduling compiler - Colwell, Nix et al. - 1987
104: The Structure of Computers and Computations - Kuck - 1978
90: Optimal loop parallelization - Aiken, Nicolau - 1988
37: The Cydra 5 departmental supercomputer - Rau, Yen et al. - 1989
33: KAP User's Guide - Associates, Champaign - 1988
31: The superblock: An effective structure for VLIW and supersca.. - Hwu - 1992
25: Instruction-level parallel processing - Fisher, Rau - 1991
6: Compilation of arithmetic expressions for parallel computati.. - Baer, Bovet - 1968
5: Combining as a compilation technique for VLIW architectures - Nakatani, Ebcioglu - 1989
5: Code compaction for parallel architectures - Anantha, Long - 1990
2: An instruction-level performance analysis of the Multiflow TR.. - Schuette, Shen - 1991
Documents on the same site (http://www.crhc.uiuc.edu/Impact/people/graduated/mahlke/mahlke_pubs.html):
Design And Implementation Of A Portable Global Code Optimizer - Mahlke (1992)
The Importance of Prepass Code Scheduling for.. - Chang, Lavery.. (1994)
Using Profile Information to Assist Classic Code Optimizations - Chang (1991)
CiteSeer.IST - Copyright Penn State and NEC