Results 1 - 10
of
100
Iterative modulo scheduling: An algorithm for software pipelining loops
- In Proceedings of the 27th Annual International Symposium on Microarchitecture
, 1994
"... Modulo scheduling is a framework within which a wide variety of algorithms and heuristics may be defined for software pipelining innermost loops. This paper presents a practical algorithm, iterative modulo scheduling, that is capable of dealing with realistic machine models. This paper also characte ..."
Abstract
-
Cited by 263 (2 self)
- Add to MetaCart
Modulo scheduling is a framework within which a wide variety of algorithms and heuristics may be defined for software pipelining innermost loops. This paper presents a practical algorithm, iterative modulo scheduling, that is capable of dealing with realistic machine models. This paper also characterizes the algorithm in terms of the quality of the generated schedules as well the computational expense incurred.
Minimizing Register Requirements under Resource-Constrained Rate-Optimal Software Pipelining
, 1995
"... The rapid advances in high-performance computer architecture and compilation techniques provide both challenges and opportunities to exploit the rich solution space of software pipelined loop schedules. In this paper, we develop a framework to construct a software pipelined loop schedule which runs ..."
Abstract
-
Cited by 73 (13 self)
- Add to MetaCart
The rapid advances in high-performance computer architecture and compilation techniques provide both challenges and opportunities to exploit the rich solution space of software pipelined loop schedules. In this paper, we develop a framework to construct a software pipelined loop schedule which runs on the given architecture (with a fixed number of processor resources) at the maximum possible iteration rate (`a la rate-optimal) while minimizing the number of buffers --- a close approximation to minimizing the number of registers. The main contributions of this paper are: ffl First, we demonstrate that such problem can be described by a simple mathematical formulation with precise optimization objectives under a periodic linear scheduling framework. The mathematical formulation provides a clear picture which permits one to visualize the overall solution space (for rate-optimal schedules) under different sets of constraints. ffl Secondly, we show that a precise mathematical formulation...
Stage Scheduling: A Technique to Reduce the Register Requirements of a Modulo Schedule
- IN PROC. OF THE 28TH ANNUAL INT. SYMP. ON MICROARCHITECTURE (MICRO-28
, 1995
"... Modulo scheduling is an efficient technique for exploiting instruction level parallelism in a variety of loops, resulting in high performance code but increased register requirements. We present a set of low computational complexity stage-scheduling heuristics that reduce the register requirements o ..."
Abstract
-
Cited by 57 (5 self)
- Add to MetaCart
Modulo scheduling is an efficient technique for exploiting instruction level parallelism in a variety of loops, resulting in high performance code but increased register requirements. We present a set of low computational complexity stage-scheduling heuristics that reduce the register requirements of a given modulo schedule by shifting operations by multiples of II cycles. Measurements on a benchmark suite of 1289 loops from the Perfect Club, SPEC-89, and the Livermore Fortran Kernels shows that our best heuristic achieves on average 99% of the decrease in register requirements obtained by an optimal stage scheduler.
Software pipelining showdown: Optimal vs. heuristic methods in a production compiler
- In Proc. of the ACM SIGPLAN'96 Conf. on Programming Languages Design and Implementation
, 1996
"... This paper is a scientific comparison of two code generation tech-niques with identical goals — generation of the best possible soft-ware pipelined code for computers with instruction level parallelism. Both are variants of modulo scheduling, a framework for generation of soflware pipelines pioneere ..."
Abstract
-
Cited by 53 (9 self)
- Add to MetaCart
This paper is a scientific comparison of two code generation tech-niques with identical goals — generation of the best possible soft-ware pipelined code for computers with instruction level parallelism. Both are variants of modulo scheduling, a framework for generation of soflware pipelines pioneered by Rau and Glaser [RaG181], but are otherwise quite dissimilar. One technique was developed at Silicon Graphics and is used in the MIPSpro compiler. This is the production compiler for SG1’S systems which are based on the MIPS R8000 processor [Hsu94]. It is essentially a branch-and-bound enumeration of possible sched-ules with extensive pruning. This method is heuristic becaus(s of the way it prunes and also because of the interaction between reg-ister allocation and scheduling. The second technique aims to produce optimal results by formulat-
Hypernode Reduction Modulo Scheduling
- IN PROC. OF THE 28TH ANNUAL INT. SYMP. ON MICROARCHITECTURE (MICRO28
, 1995
"... Software Pipelining is a loop scheduling technique that extracts parallelism from loops by overlapping the execution of several consecutive iterations. Most prior scheduling research has focused on achieving minimum execution time, without regarding register requirements. Most strategies tend to str ..."
Abstract
-
Cited by 53 (22 self)
- Add to MetaCart
Software Pipelining is a loop scheduling technique that extracts parallelism from loops by overlapping the execution of several consecutive iterations. Most prior scheduling research has focused on achieving minimum execution time, without regarding register requirements. Most strategies tend to stretch operand lifetimes because they schedule some operations too early or too late. The paper presents a novel strategy that simultaneously schedules some operations late and other operations early, minimizing all the stretchable dependencies and therefore reducing the registers required by the loop. The key of this strategy is a pre-ordering phase that selects the order in which the operations will be scheduled. The results show that the method described in this paper performs better than other heuristic methods and almost as well as a linear programming method but requiring much less time to produce the schedules.
Resource-Constrained Software Pipelining
- Advances in Languages and Compilers for Parallel Processing, Res. Monographs in Parallel and Distrib. Computing, chapter 14
, 1995
"... This paper presents a software pipelining algorithm for the automatic extraction of fine-grain parallelism in general loops. The algorithm accounts for machine resource constraints in a way that smoothly integrates the management of resource constraints with software pipelining. Furthermore, general ..."
Abstract
-
Cited by 38 (2 self)
- Add to MetaCart
This paper presents a software pipelining algorithm for the automatic extraction of fine-grain parallelism in general loops. The algorithm accounts for machine resource constraints in a way that smoothly integrates the management of resource constraints with software pipelining. Furthermore, generality in the software pipelining algorithm is not sacrificed to handle resource constraints, and scheduling choices are made with truly global information. Proofs of correctness and the results of experiments with an implementation are also presented. 1 Introduction Recently there has been considerable interest in a class of compiler parallelization techniques known collectively as software pipelining. Software pipelining algorithms compute a static parallel schedule overlapping the operations of a loop body in much the same way that a hardware pipeline overlaps operations in a dynamic instruction stream. The schedule computed by a software pipelining algorithm is suitable for execution on a ...
Two-level Hierarchical Register File Organization for VLIW Processors
- In International Symposium on Microarchitecture
, 2000
"... High-performance microprocessors are currently designed to exploit the inherent instruction level parallelism (ILP) available in most applications. The techniques used in their design and the aggressive scheduling techniques used to exploit this ILP tend to increase the register requirements of the ..."
Abstract
-
Cited by 36 (2 self)
- Add to MetaCart
High-performance microprocessors are currently designed to exploit the inherent instruction level parallelism (ILP) available in most applications. The techniques used in their design and the aggressive scheduling techniques used to exploit this ILP tend to increase the register requirements of the loops. If more registers than those available in the architecture are required, some actions (such as spill code insertion) have to be applied to reduce this pressure, at the expense of some performance degradation. This degradation could be avoided if a high--capacity register file were included without causing a negative impact on the cycle time of the processor. In this paper we propose a two-level hierarchical register file organization for VLIW architectures that combines high capacity and low access time. For the configuration proposed in this paper, the new organization achieves a speed--up of 10--14% over a monolithic organization with 64 registers; it is obtained with a 43% (40%) reduction in area (peak power dissipation). Compared to a monolithic file with 32 registers, the speed--up is as much as 38% with just a 14% (4%) increase in area (peak power dissipation). 1.
Minimum register requirements for a modulo schedule
- In Proc. of the 27th Ann. Intl. Symp. on Microarchitecture
, 1994
"... Modulo scheduling is an e cient technique for exploiting instruction level parallelism in a variety of loops, resulting in high performance code but increased register requirements. We present a combined approach that schedules the loop operations for minimum register requirements, given a modulo re ..."
Abstract
-
Cited by 35 (6 self)
- Add to MetaCart
Modulo scheduling is an e cient technique for exploiting instruction level parallelism in a variety of loops, resulting in high performance code but increased register requirements. We present a combined approach that schedules the loop operations for minimum register requirements, given a modulo reservation table. Our method determines optimal register requirements for machines with nite resources and for general dependence graphs. This method demonstrates the potential of lifetime-sensitive modulo scheduling and is useful in evaluating the performance of lifetime-sensitive modulo scheduling heuristics. 1
Software Pipelining
, 1995
"... Utilizing parallelism at the instruction level is an important way to improve performance. Since the time spent in loop execution dominates total execution time, a large body of optimizations focus on decreasing the time to execute each iteration. Software pipelining is a technique that reforms t ..."
Abstract
-
Cited by 35 (1 self)
- Add to MetaCart
Utilizing parallelism at the instruction level is an important way to improve performance. Since the time spent in loop execution dominates total execution time, a large body of optimizations focus on decreasing the time to execute each iteration. Software pipelining is a technique that reforms the loop so that a faster execution rate is realized. Iterations are executed in overlapped fashion to increase parallelism. 1 Let --ABC n represent a loop containing operations A, B, C that is executed n times. Although the operations of a single iteration can be parallelized, more parallelism may be achievable if the entire loop is considered rather than a single iteration. The software pipelining transformation utilizes the fact that a loop --ABC n is equivalent to A--BCA n\Gamma1 BC. Although the operations contained in the loop do not change, the operations are from different iterations of the original loop. Various algorithms for software pipelining exist. A comparison of ...
Unrolling-Based Optimizations for Modulo Scheduling
- in Proceedings of the 28th International Symposium on Microarchitecture
, 1995
"... Modulo scheduling is a method for overlapping successive iterations of a loop in order to find sufficient instruction-level parallelism to fully utilize high-issue-rate processors. The achieved throughput modulo scheduled loop depends on the resource requirements, the dependence pattern, and the reg ..."
Abstract
-
Cited by 33 (3 self)
- Add to MetaCart
Modulo scheduling is a method for overlapping successive iterations of a loop in order to find sufficient instruction-level parallelism to fully utilize high-issue-rate processors. The achieved throughput modulo scheduled loop depends on the resource requirements, the dependence pattern, and the register requirements of the computation in the loop body. Traditionally, unrolling followed by acyclic scheduling of the unrolled body has been an alternative to modulo scheduling. However, there are benefits to unrolling even if the loop is to be modulo scheduled. Unrolling can improve the throughput by allowing a smaller non-integral effective initiation interval to be achieved. After unrolling, optimizations can be applied to the loop that reduce both the resource requirements and the height of the critical paths. Together, unrolling and unrolling-based optimizations can enable the completion of multiple iterations per cycle in some cases. This paper describes the benefits of unrolling and ...

