Results 1 - 10
of
43
Iterative modulo scheduling: An algorithm for software pipelining loops
- In Proceedings of the 27th Annual International Symposium on Microarchitecture
, 1994
"... Modulo scheduling is a framework within which a wide variety of algorithms and heuristics may be defined for software pipelining innermost loops. This paper presents a practical algorithm, iterative modulo scheduling, that is capable of dealing with realistic machine models. This paper also characte ..."
Abstract
-
Cited by 263 (2 self)
- Add to MetaCart
Modulo scheduling is a framework within which a wide variety of algorithms and heuristics may be defined for software pipelining innermost loops. This paper presents a practical algorithm, iterative modulo scheduling, that is capable of dealing with realistic machine models. This paper also characterizes the algorithm in terms of the quality of the generated schedules as well the computational expense incurred.
Minimizing Register Requirements under Resource-Constrained Rate-Optimal Software Pipelining
, 1995
"... The rapid advances in high-performance computer architecture and compilation techniques provide both challenges and opportunities to exploit the rich solution space of software pipelined loop schedules. In this paper, we develop a framework to construct a software pipelined loop schedule which runs ..."
Abstract
-
Cited by 73 (13 self)
- Add to MetaCart
The rapid advances in high-performance computer architecture and compilation techniques provide both challenges and opportunities to exploit the rich solution space of software pipelined loop schedules. In this paper, we develop a framework to construct a software pipelined loop schedule which runs on the given architecture (with a fixed number of processor resources) at the maximum possible iteration rate (`a la rate-optimal) while minimizing the number of buffers --- a close approximation to minimizing the number of registers. The main contributions of this paper are: ffl First, we demonstrate that such problem can be described by a simple mathematical formulation with precise optimization objectives under a periodic linear scheduling framework. The mathematical formulation provides a clear picture which permits one to visualize the overall solution space (for rate-optimal schedules) under different sets of constraints. ffl Secondly, we show that a precise mathematical formulation...
A Framework for Balancing Control Flow and Predication
- IN PROCEEDINGS OF THE 30TH INTERNATIONAL SYMPOSIUM ON MICROARCHITECTURE
, 1997
"... Predicated execution is a promising architectural feature for exploiting instruction-level parallelism in the presence of control flow. Compiling for predicated execution involves converting program control flow into conditional, or predicated, instructions. This process is known as if-conversion. I ..."
Abstract
-
Cited by 47 (4 self)
- Add to MetaCart
Predicated execution is a promising architectural feature for exploiting instruction-level parallelism in the presence of control flow. Compiling for predicated execution involves converting program control flow into conditional, or predicated, instructions. This process is known as if-conversion. In order to effectively apply if-conversion, one must address two major issues: what should be if-converted and when the if-conversion should be applied. A compiler's use of predication as a representation is most effective when large amounts of code are if-converted and if-conversion is performed early in the compilation procedure. On the other hand, the final code generated for a processor with predicated execution requires a delicate balance between control flow and predication to achieve efficient execution. The appropriate balance is tightly coupled with scheduling decisions and detailed processor characteristics. This paper presents an effective compilation framework that allows the compiler to maximize the benefits of predication as a compiler representation while delaying the final balancing of control flow and predication to schedule time.
Resource-Constrained Software Pipelining
- Advances in Languages and Compilers for Parallel Processing, Res. Monographs in Parallel and Distrib. Computing, chapter 14
, 1995
"... This paper presents a software pipelining algorithm for the automatic extraction of fine-grain parallelism in general loops. The algorithm accounts for machine resource constraints in a way that smoothly integrates the management of resource constraints with software pipelining. Furthermore, general ..."
Abstract
-
Cited by 38 (2 self)
- Add to MetaCart
This paper presents a software pipelining algorithm for the automatic extraction of fine-grain parallelism in general loops. The algorithm accounts for machine resource constraints in a way that smoothly integrates the management of resource constraints with software pipelining. Furthermore, generality in the software pipelining algorithm is not sacrificed to handle resource constraints, and scheduling choices are made with truly global information. Proofs of correctness and the results of experiments with an implementation are also presented. 1 Introduction Recently there has been considerable interest in a class of compiler parallelization techniques known collectively as software pipelining. Software pipelining algorithms compute a static parallel schedule overlapping the operations of a loop body in much the same way that a hardware pipeline overlaps operations in a dynamic instruction stream. The schedule computed by a software pipelining algorithm is suitable for execution on a ...
Exploiting Instruction Level Parallelism in the Presence of Conditional Branches
, 1996
"... Wide issue superscalar and VLIW processors utilize instruction-level parallelism (ILP) to achieve high performance. However, if insufficient ILP is found, the performance potential of these processors suffers dramatically. Branch instructions, which are one of the major limitations to exploiting ILP ..."
Abstract
-
Cited by 37 (3 self)
- Add to MetaCart
Wide issue superscalar and VLIW processors utilize instruction-level parallelism (ILP) to achieve high performance. However, if insufficient ILP is found, the performance potential of these processors suffers dramatically. Branch instructions, which are one of the major limitations to exploiting ILP, enforce strict ordering conditions in programs to ensure correct execution. Therefore, it is difficult to achieve the desired overlap of instruction execution with branches in the instruction stream. To effectively exploit ILP in the presence of branches requires efficient handling of branches and the dependences they impose. This dissertation investigates two techniques for exposing and enhancing ILP in the presence of branches, speculative execution and predicated execution. Speculative execution enables an ILP compiler to remove dependences between instructions and prior branches. In this manner, the execution of instructions and predicted future instructions may be overlapped. Compiler-controlled speculative execution is employed using an efficient structure called the superblock. The formation and optimization of superblocks increase ILP along important execution paths by systematically removing constraints due to unimportant paths. In conjunction with superblock optimizations, speculative execution is utilized to remove control dependences in the superblock
Software Pipelining
, 1995
"... Utilizing parallelism at the instruction level is an important way to improve performance. Since the time spent in loop execution dominates total execution time, a large body of optimizations focus on decreasing the time to execute each iteration. Software pipelining is a technique that reforms t ..."
Abstract
-
Cited by 35 (1 self)
- Add to MetaCart
Utilizing parallelism at the instruction level is an important way to improve performance. Since the time spent in loop execution dominates total execution time, a large body of optimizations focus on decreasing the time to execute each iteration. Software pipelining is a technique that reforms the loop so that a faster execution rate is realized. Iterations are executed in overlapped fashion to increase parallelism. 1 Let --ABC n represent a loop containing operations A, B, C that is executed n times. Although the operations of a single iteration can be parallelized, more parallelism may be achievable if the entire loop is considered rather than a single iteration. The software pipelining transformation utilizes the fact that a loop --ABC n is equivalent to A--BCA n\Gamma1 BC. Although the operations contained in the loop do not change, the operations are from different iterations of the original loop. Various algorithms for software pipelining exist. A comparison of ...
Global Predicate Analysis and its Application to Register Allocation
- in Proceedings of the 29th International Symposium on Microarchitecture
, 1996
"... To fully utilize the wide machine resources in modern high-performance microprocessors it is necessary to exploit parallelism beyond individual basic blocks. Architectural support for predicated execution increases the degree of instruction level parallelism by allowing instructions from different b ..."
Abstract
-
Cited by 34 (0 self)
- Add to MetaCart
To fully utilize the wide machine resources in modern high-performance microprocessors it is necessary to exploit parallelism beyond individual basic blocks. Architectural support for predicated execution increases the degree of instruction level parallelism by allowing instructions from different basic blocks to be converted to straight-line code guarded by boolean predicates. However, predicated execution also presents significant challenges to an optimizing compiler. For example, in live range analysis, a predicated definition does not necessarily end the live range of a virtual register. This paper describes techniques to analyze the relations among predicates in order to improve the precision and effectiveness of various compiler analysis and transformation phases in the presence of predicated code. Our predicate analysis operates globally to obtain relations among predicates. Moreover, we analyze control flow and predication in a single unified framework. The result can be queri...
Guarded Execution and Branch Prediction in Dynamic ILP Processors
- IN PROCEEDINGS OF THE 21ST INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE
, 1994
"... In this paper we evaluate the effects of guarded (or conditional, or predicated) execution on the performance of an instruction level parallel processor employing dynamic branch prediction. First, we assess the utility of guarded execution, both qualitatively and quantitatively, using a variety of a ..."
Abstract
-
Cited by 31 (3 self)
- Add to MetaCart
In this paper we evaluate the effects of guarded (or conditional, or predicated) execution on the performance of an instruction level parallel processor employing dynamic branch prediction. First, we assess the utility of guarded execution, both qualitatively and quantitatively, using a variety of application programs. Our assessment shows that guarded execution significantly increases the opportunities for both a compiler, and dynamic hardware, to extract and exploit parallelism. However, existing methods of specifying guarded execution have several drawbacks that limit its use. Second, we study the interaction of guarding and dynamic branch prediction. No clear trends emerge regarding the ability of guarding to uniformly eliminate branches with poor predictability. In some cases guarding eliminates branches with a poor prediction accuracy, in other cases it eliminates branches with good predictability. However, the use of guarding results in a significant increase in the dynamic wind...
Partial dead code elimination using slicing transformations
- In Proceedings of the ACM SIGPLAN '97 Conference on Programming Language Design and Implementation
, 1997
"... ..."
Predicated Static Single Assignment
- IN PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON PARALLEL ARCHITECTURES AND COMPILATION TECHNIQUES
, 1999
"... Increases in instruction level parallelism are needed to exploit the potential parallelism available in future wide issue architectures. Predicated execution is an architectural mechanism that increases instruction level parallelism by removing branches and allowing simultaneous execution of multipl ..."
Abstract
-
Cited by 25 (2 self)
- Add to MetaCart
Increases in instruction level parallelism are needed to exploit the potential parallelism available in future wide issue architectures. Predicated execution is an architectural mechanism that increases instruction level parallelism by removing branches and allowing simultaneous execution of multiple paths of control, only committing instructions from the correct path. In order for the compiler to expose such parallelism, traditional compiler data-flow analysis needs to be extended to predicated code. In this paper, we present

