Abstract: This report describes parallelization techniques for accelerating a broad class of recurrences on
processors with instruction-level parallelism. We introduce a new technique, called blocked
back-substitution, which has a lower operation count and higher performance than previous
methods. The blocked back-substitution technique requires unrolling and non-symmetric
optimization of innermost loop iterations. We present metrics to characterize the performance
of software-pipelined loops and compare...
Context of citations to this paper:
...by the optimization. Recently, transformations have been proposed which require that the loop be unrolled. Blocked backsubstitution [17] unrolls the loop b times and reduces the RecMII by a factor of b. Control recurrences within loops can also be accelerated by a factor of b...
...using a variety of transformations. These include expression reassociation, tree height reduction [11] and blocked back substitution [17]. Although ILP compilers may aggressively restructure computation, they typically preserve the program's original control structure. This...
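The abstract and the first citation context describe the technique only in outline: the loop is unrolled b times, the optimization is non-symmetric across the unrolled iterations, and RecMII drops by a factor of b. The following is a minimal sketch of that idea, not the authors' algorithm. It assumes a first-order linear recurrence x[i] = a[i] * x[i-1] + c[i]; the function names and the parameters `a`, `c`, `x0`, and `block` are hypothetical and chosen only for illustration.

```python
def recurrence_naive(a, c, x0):
    """Reference loop: x[i] = a[i] * x[i-1] + c[i], one step at a time."""
    xs = []
    x = x0
    for ai, ci in zip(a, c):
        x = ai * x + ci
        xs.append(x)
    return xs


def recurrence_blocked(a, c, x0, block=4):
    """Same recurrence, unrolled `block` times (illustrative sketch).

    Only the value carried into the next block is back-substituted,
    so the loop-carried chain is one multiply-add per block.
    """
    n = len(a)
    xs = [0] * n
    x = x0  # the only loop-carried value
    i = 0
    while i + block <= n:
        # Fold the block into (A, C) with x[i+block-1] = A * x[i-1] + C.
        # This depends only on a and c, so it is off the recurrence path.
        A, C = 1, 0
        for k in range(i, i + block):
            A, C = a[k] * A, a[k] * C + c[k]

        # Intermediate values are computed the ordinary way from the
        # block-entry value; nothing in the next block waits on them.
        t = x
        for k in range(i, i + block - 1):
            t = a[k] * t + c[k]
            xs[k] = t

        # The recurrence path: a single multiply-add per block.
        x = A * x + C
        xs[i + block - 1] = x
        i += block

    # Leftover iterations that do not fill a whole block.
    for k in range(i, n):
        x = a[k] * x + c[k]
        xs[k] = x
    return xs
```

In this sketch only the last iteration of each block pays for back-substitution, which is one way to read "non-symmetric optimization". With floating-point inputs the blocked version reassociates arithmetic, so its results can differ from the plain loop by rounding.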
Cited by:
Iterative Modulo Scheduling: An Algorithm for Software Pipelining.. - Rau (1994)
Height Reduction of Control Recurrences for ILP Processors - Michael Schlansker Vinod (1994)
Modulo Scheduling, Machine Representations, and.. - Eichenberger (1997)
Active bibliography (related documents):
0.7: Solving Linear Recurrences with Loop Raking - Guy Blelloch School (1992)
0.5: Control CPR: A Branch Height Reduction Optimization for .. - Schlansker, Mahlke.. (1999)
0.3: Loop Optimization Techniques On Multi-Issue Architectures - Kaiser
Similar documents based on text:
1.1: Parallelization of Control Recurrences for ILP Processors - Schlansker, Kathail, Anik (1994)
0.2: Automatic architectural synthesis of VLIW and EPIC processors - Aditya, Rau, Kathail
0.2: Compiler Code Transformations for Superscalar-Based.. - Mahlke, Chen.. (1992)
Related documents from co-citation:
4: Some scheduling techniques and an easily schedulable horizontal architecture for.. - Rau, Glaeser - 1981
4: Trace Scheduling: A Technique for Global Microcode Compaction - Fisher - 1981
4: Parallelization of loops with exits on pipelined architectures - Tirumalai, Lee et al. - 1990
BibTeX entry:
M. Schlansker and V. Kathail, "Acceleration of first and higher order recurrences on processors with instruction level parallelism," in Proceedings of Languages and Compilers for Parallel Computing, 6th International Workshop, August 1993. http://citeseer.ist.psu.edu/schlansker93acceleration.html
@inproceedings{ schlansker93acceleration,
author = "Michael S. Schlansker and Vinod Kathail",
title = "Acceleration of First and Higher Order Recurrences on Processors with Instruction Level Parallelism",
booktitle = "Languages and Compilers for Parallel Computing",
pages = "406-429",
year = "1993",
url = "citeseer.ist.psu.edu/schlansker93acceleration.html" }
Citations (may not include all citations):
407: Trace Scheduling: A Technique for Global Microcode Compactio.. - Fisher - 1981
176: Some Scheduling Techniques and an Easily Schedulable Horizon.. - Rau, Glaeser - 1981
164: The Superblock: An Effective Technique for VLIW and Supersca.. - Hwu - 1993
156: The Multiflow Trace Scheduling Compiler - Lowney - 1993
104: The structure of Computers and Computations - Kuck - 1978
66: A Systolic Array Optimizing Compiler - Lam - 1987
46: The Journal of Supercomputing - Dehnert, Towle et al. - 1993
25: Recognizing and Parallelizing Bounded Recurrences - Callahan - 1991
24: ACM Transactions on Mathematical Software - Wang, Method et al. - 1981
21: Practical Parallel Band Triangular System Solvers - Chen, Kuck et al. - 1978
14: Some Aspects of the Cyclic Reduction Algorithm for Block Tri.. - Heller - 1976
13: Solving Triangular Systems on a Parallel Computer - Sameh, Brent - 1977
11: Data Flow and Dependence Analysis for Instruction Level Para.. - Rau - 1992
10: Parallel Tridiagonal Equation Solvers - Stone - 1975
10: Compiling Techniques for First-Order Linear Recurrences on a.. - Tanaka - 1988
7: Time and Parallel Processor Bounds for Linear Recurrence Sys.. - Chen, Kuck - 1975
4: Code Generation Schemas for Modulo Scheduled DO-Loops and WH.. - Rau, Schlansker et al. - 1992
2: Vectorization of Linear Recurrence Relations - Van Der Vorst, Dekker - 1989
2: Acceleration of Algebraic Recurrences on Processors with Ins.. - Schlansker, Kathail - 1993
Documents on the same site (http://www.hpl.hp.com/research/itc/car/papers/):
Code Size Minimization and Retargetable Assembly for custom .. - Aditya, Mahlke, Rau (2000)
Automatic architectural synthesis of VLIW and EPIC processors - Aditya, Rau, Kathail
Parallelization of Control Recurrences for ILP Processors - Schlansker, Kathail, Anik (1994)