107 citations found.
Bernstein, D. and Rodeh, M. 1991. Global instruction scheduling for superscalar machines. In Proceedings of the SIGPLAN '91 Conference on Programming Language Design and Implementation.


This paper is cited in the following contexts:


Non-Local Instruction Scheduling with Limited Code Growth - Keith Cooper Philip (1998)   (6 citations)

....unique entry point. This method can lead to better runtime performance than trace scheduling, but the block duplication can increase code size. Several other techniques that benefit from code replication or growth have been used. These include Bernstein and Rodeh's Global Instruction Scheduling [4, 2], and Ebcioglu and Nakatani's Enhanced Percolation Scheduling [6]. 3 The Two Techniques. In this section we look at two non-local scheduling techniques specifically designed to avoid increasing code size, namely dominator path scheduling (dps) and extended basic block scheduling (ebbs). We ....

David Bernstein and Michael Rodeh. Global instruction scheduling for superscalar machines. SIGPLAN Notices, 26(6):241--255, June 1991. Proceedings of the ACM SIGPLAN '91 Conference on Programming Language Design and Implementation.


Non-Local Instruction Scheduling with Limited Code Growth - Cooper, Schielke (1998)   (6 citations)

....unique entry point. This method can lead to better runtime performance than trace scheduling, but the block duplication can increase code size. Several other techniques that benefit from code replication or growth have been used. These include Bernstein and Rodeh's Global Instruction Scheduling [4, 2], Ebcioglu and Nakatani's Enhanced Percolation Scheduling [6], and Gupta and Soffa's Region Scheduling [12]. 3 The Two Techniques. In this section we look at two non-local scheduling techniques specifically designed to avoid increasing code size, namely dominator path scheduling (dps) and ....

David Bernstein and Michael Rodeh. Global instruction scheduling for superscalar machines. SIGPLAN Notices, 26(6):241--255, June 1991. Proceedings of the ACM SIGPLAN '91 Conference on Programming Language Design and Implementation.


Meld Scheduling: A Technique for Relaxing Scheduling.. - Abraham, Kathail, Deitrich (1998)   (1 citation)

....length. Thus, the superscalar meld scheduler does not cause any increase in schedule length while attempting to reduce stalls due to inter-region dangles. 5 Related work. There is a substantial body of work in the area of instruction scheduling for instruction-level parallel (ILP) machines [1, 2, 4-7, 14, 15]. Most of the work, however, is directed at two related areas. The first area is the type of scheduling region, e.g. trace, superblock, hyperblock, general DAG, innermost loops. The motivation here is either to enlarge the scope of scheduling or to simplify compiler engineering. The second area ....

....loops. The motivation here is either to enlarge the scope of scheduling or to simplify compiler engineering. The second area is the actual scheduling algorithm and heuristics used within a region. Many of these scheduling techniques are developed in the context of superscalar machines [1, 4, 6]. Thus, they accurately model resource usage and latencies within a region in order to get the best performance. But they ignore the constraints at region boundaries in the hope that any required runtime stalls will not affect the performance significantly. In contrast, both the Multiflow Trace ....

D. Bernstein and M. Rodeh, "Global instruction scheduling for superscalar machines," presented at SIGPLAN '91 Conference on Programming Language Design and Implementation, 1991.


Issues in Instruction Scheduling - Schielke (1998)

....have executed. This scheduling method prohibits moving an operation between basic blocks, if that move would require the operation to be copied to another basic block. Bernstein, et al. use control dependence information (via the Program Dependence Graph or PDG) to guide the scheduling process [2]. A set of nodes with the same control dependences are scheduled together. This set of nodes has the interesting property that if one of the nodes execute, all of them do. In this work we will be looking at various scheduling techniques. We would like to evaluate these techniques in both a local ....

David Bernstein and Michael Rodeh. Global instruction scheduling for superscalar machines. SIGPLAN Notices, 26(6):241--255, June 1991. Proceedings of the ACM SIGPLAN '91 Conference on Programming Language Design and Implementation.


A Study of Control Independence with a Single Flow of Control - Rotenberg, Jacobson, Smith

....in dynamic program order that is control independent of the branch in block 2, is referred to as the immediate post dominator of block 2. More formally, the immediate post dominator of a branch is the basic block nearest the branch which lies on every path between the branch and the CFG exit block [8,9]. The first instruction in block 5 is called the reconvergent point with respect to the branch in block 2. For the specific misprediction in the example above, block 3b is the incorrect control dependent block and block 4b is the correct control dependent block. 1.2 Why study control ....

D. Bernstein and M. Rodeh. Global instruction scheduling for superscalar machines. Proc. ACM Conference on Programming Language Design and Implementation, June 1991.


Compiler Optimization on Instruction Scheduling for Low Power - Jenq (2000)   (10 citations)

....high performance and low power consumption. To such a high performance embedded system, we need to address both the issues of high performance computing and low power consumption. Much work on aggressive compiler and software optimization has put emphasis on delivering high performance computing [2, 3, 4, 5]. However, less attention was paid to reducing power during compiler optimization. For that reason, we will study the aspect of compiler transformations to reduce power consumptions for such a system. In CMOS circuits, power is dissipated in a gate when the gate output changes from 0 to 1 or from ....

....In this work, list scheduling algorithm [2] will be used in the first phase for performance optimization. List scheduling programs are easy to write, and can compact original microinstructions approximately as fast as linear analysis [2]. However, any conventional VLIW instruction scheduler [2, 3, 4, 5] can be used. Since list scheduling algorithm is well documented, we will focus on algorithms as presented in the following subsections to re-schedule instructions for power optimization. 3.1 Horizontal Scheduling. We first propose a horizontal scheduling algorithm to re-schedule instruction ....

D. Bernstein, and M. Rodeh, "Global Instruction Scheduling for Superscalar Machines", Proceedings of SIGPLAN '91 Conf. on Programming Language Design and Implementation, June 1991.


General-Purpose Architecture Instruction Scheduling Techniques - De Sutter (1998)

....average somewhat longer schedules. [Figure: The CSPDG of a sample program and its corresponding forward control subgraph.] 3.1.4 A Program Dependence Graph Approach. In [12], Bernstein and Rodeh propose another global scheduling algorithm, based on the Program Dependency Graph (PDG). Compared to trace scheduling, their algorithm is not based on the assumption that a main trace exists, and the algorithm defines a framework to distinguish between useful and speculative ....

Bernstein, D., and Rodeh, M. Global instruction scheduling for superscalar machines. In Proceedings of the ACM SIGPLAN'91 Conference on Programming Language Design and Implementation (Toronto, Ontario, Canada, June 1991), pp. 241-255.


Adaptive Explicitly Parallel Instruction Computing - Talla (2000)   (4 citations)

....Two of the proposals for split points are splitting around loops, at dominance frontiers or at reverse dominance frontiers. Hansoo [87] proposed a frequency-based live range splitting algorithm which attempts to split along the least frequent edges in the control flow graph. Bernstein et al. [14] proposed live range selection heuristics. The heuristics give an estimate of a live range's contribution to the total resource pressure. Callahan and Koblenz [24] proposed a hierarchical method to heuristically prune the interference graph. We propose a pruning technique based on the ....

D. Bernstein and M. Rodeh. Global instruction scheduling for superscalar machines, 1991.


Efficient Superscalar Performance Through Boosting - Michael Smith Mark (1992)   (48 citations)

....Scheduling [20] which describes a complete set of semantics-preserving transformations for moving any operation between adjacent blocks. The next step taken by global scheduling researchers was to extend the neighborhood to include conditional pairs [27] (also called equivalent basic blocks [2]). Two basic blocks are equivalent if and only if the execution of one block implies the execution of the other block; equivalence is simply a combination of the move-op and unification transformations of Percolation Scheduling for control independent basic blocks. We refer to these types of ....

....the best schedule, and finally invoke the global transformations to safely move the requested instructions to the current scheduling point. The key difference between schedulers is how the available instruction set is generated, and the types of global transformation used. Bernstein and Rodeh [2] describe a scheduling algorithm that looks in neighbor and peer basic blocks for available instructions. Neighbor and peer basic blocks are only a small set of the blocks from which instructions are available, and thus, this decision greatly limits the size of the available set. The ....

D. Bernstein and M. Rodeh. Global Instruction Scheduling for Superscalar Machines. In Proc. ACM SIGPLAN '91 Conf. on Programming Language Design and Implementation, pp. 241--255, June 1991.


A New Framework for Integrated Global Local Scheduling - Mantripragada, Jain, Dehnert (1998)   (7 citations)

....global scheduling algorithms as either profile or structure driven. A structure driven scheduler attempts to identify parallelism along all execution paths within a region, with priorities based on program structure. Region scheduling [3], percolation scheduling [11], and global instruction scheduling [1] all fall in this category. These approaches attempt to increase parallelism by moving operations between basic blocks without considering execution frequency. As a result, their decisions may not always be profitable. Recently, there has been more focus on feedback-directed compilation. Profile ....

....and where the execution costs of predication are low. In the MIPS architecture there is limited support for predicated execution (in the form of conditional moves). Emulating the effect of full predication is expensive and the approach is not viable. Among structure driven approaches, Bernstein [1] used the program dependence graph for global scheduling. Their global scheduling approach included reordering of instructions among equivalent blocks or 1-level speculative blocks within a given loop body. Region scheduling [3] also used the PDG and provided a more powerful set of code ....

[Article contains additional citation context not shown here]

D. Bernstein and M. Rodeh. Global Instruction Scheduling for Superscalar Machines. In Conference Record of SIGPLAN Programming Language Design and Implementation, pages 241--255, 1991.


Data Dependence Analysis of Assembly Code - Amme, Braun, Zehendner, Thomasset   (5 citations)

....Most of today's instruction schedulers only determine data dependences between register accesses and consider memory to be one cell, so that every two memory accesses must be assumed as data dependent. Thus, analyzing memory accesses becomes more important while doing global instruction scheduling [3]. In this paper, we describe an intraprocedural value-based data dependence analysis (see Maslov [14] for details about address-based and value-based data dependences) implemented in the context of the SALTO tool [19]. SALTO is a framework to develop optimization and transformation techniques ....

D. Bernstein and M. Rodeh. Global instruction scheduling for superscalar machines. In Proceedings of the ACM SIGPLAN '91 Conference on Programming Language Design and Implementation, pages 241--255, Toronto, Canada, June 1991.


Modulo Scheduling, Machine Representations, and.. - Eichenberger (1997)

.... may not schedule operations in cycle order, focusing initially on operations along critical paths [27][30][43][49][77][95], they may backtrack to reverse poor scheduling decisions [27][49][59][77][83][94], and they may hide long latencies by speculating operations across branches and basic blocks [12][20][27][30][59][67]. High performance compilers have also used precisely detailed machine models [19][27][42][45][59][78] to better utilize the machine resources of current processors with increasingly wider issue mechanisms, deeper pipelines, and more heterogeneous functional units. Precise ....

.... architectures such as X86, PA-RISC, and SPARC to research architectures such as PlayDoh [54]. With the recent emphasis on exploiting instruction level parallelism, compile time is increasingly spent in the contention query module as several cycles of a schedule, possibly from several basic blocks [12][67], are queried per operation in order to achieve good schedules. Optimizing contention query modules therefore has a significant impact on the overall performance of a compiler, as these time-consuming queries are issued in the innermost loop of the scheduler. For example, the high performance ....

[Article contains additional citation context not shown here]

D. Bernstein and M. Rodeh. Global instruction scheduling for superscalar machines. In Proceedings of the ACM SIGPLAN'91 Conference on Programming Language Design and Implementation, pages 241--255, June 1991.


Path-Selection Heuristics for Dominator-Path Scheduling - Huber (1995)   (4 citations)

....copies. On machines with low levels of ILP, it might not be possible to place these copies in existing holes, requiring more instructions. In a tight loop, this can severely hurt performance. Several authors have presented algorithms to try to minimize the harmful effects of compensation copies [2, 8, 10]. 2.3 Dominator Path Scheduling. Dominator path scheduling (DPS) avoids trace scheduling's potential for code explosion by avoiding compensation copies entirely. Sweany, concerned that compensation copies might not be profitable in all cases, wanted a global scheduler which would not allow ....

D. Bernstein and M. Rodeh. Global instruction scheduling for superscalar machines. In Conference on Programming Language Design and Implementation, pages 241--255, Toronto, June 1991. SIGPLAN '91.


Access Ordering Algorithms for a Multicopy Memory - Moyer (1992)   (1 citation)

....to here as access scheduling. Essentially, access scheduling techniques attempt to separate the execution of a load/store instruction from the execution of the instruction which consumes/produces its operand, reducing the time the processor spends delayed on memory requests. Bernstein and Rodeh [BeRo91] present an algorithm for scheduling intra-loop instructions on superscalar architectures that accommodates load delay. Lam [Lam88] presents a technique referred to as software pipelining that structures code such that a given loop iteration loads the data for a later iteration, stores results ....

Bernstein-D, Rodeh-M, "Global Instruction Scheduling for Superscalar Machines", Proc. SIGPLAN'91 Conf. Prog. Lang. Design and Implementation, 1991, pp. 241-255.


Utilising Parallel Resources by Speculation - Unger, Zehendner, Ungerer

.... execution, predicated execution [8], numerous techniques to reduce instruction penalties [14], and the concurrent execution of more than one thread of control [9][13]. Current improvements in the field of scheduling techniques are: an enlargement of the program sections treated by the algorithms [1][4][6][17], the improvement of the used heuristics [5], the enhancement of the information made available by dataflow analysis [11], and a better exploitation of the processor properties [21]. Neither the above mentioned improvements of scheduling techniques nor hardware techniques that implement ....

....decreases. Conditional branches seriously prevent the scheduling techniques from moving instructions to unused instruction slots. Whether a conditional branch is taken or not cannot be decided during compile time. Global scheduling techniques, as for instance Trace Scheduling [6], PDG Scheduling [1], Dominator Path Scheduling [17], or Selective Scheduling [10], use various approaches to overcome this problem. However, investigations have shown that the conditions that must be fulfilled to safely move an instruction across the basic block boundaries are very restrictive. Therefore the ....

[Article contains additional citation context not shown here]

D. Bernstein and M. Rodeh. Global instruction scheduling for superscalar machines. In B. Hailpern, editor, Proceedings of the ACM SIGPLAN '91 Conference on Programming Language Design and Implementation, pages 241--255, Toronto, ON, Canada, June 1991.


Understanding and Improving Register Assignment - Norris, Fenwick, Jr.

....strategies on a subsequent scheduling phase is explored. A new register assignment strategy and experimental results are presented. 1 Introduction An important phase of a compiler for a pipelined, superscalar, or VLIW machine is scheduling to increase available instruction level parallelism [1, 14, 2, 20]. Scheduling increases run time performance by rearranging the code to overlap the execution of low level machine instructions such as memory loads and stores, and integer and floating point operations to hide latencies and reduce possible run time delays. Unfortunately, scheduling interferes with ....

David Bernstein and Michael Rodeh. Global instruction scheduling for superscalar machines. In Proceedings of the SIGPLAN '91 Conference on Programming Language Design and Implementation, Toronto, CANADA, June 1991.


Access Ordering Algorithms for a Single Module Memory - Moyer (1992)   (3 citations)

....to here as access scheduling. Essentially, access scheduling techniques attempt to separate the execution of a load/store instruction from the execution of the instruction which consumes/produces its operand, reducing the time the processor spends delayed on memory requests. Bernstein and Rodeh [BeRo91] present an algorithm for scheduling intra-loop instructions on superscalar architectures that accommodates load delay. Lam [Lam88] presents a technique referred to as software pipelining that structures code such that a given loop iteration loads the data for a later iteration, stores results ....

Bernstein-D, Rodeh-M, "Global Instruction Scheduling for Superscalar Machines", Proc. SIGPLAN'91 Conf. Prog. Lang. Design and Implementation, 1991, pp. 241-255.


Efficient Computation of Interprocedural Control Dependence - Ezick, Bilardi, Pingali

....both control and data dependence to allow programs to be sliced at a point p with respect to a variable x defined or used at p. In restructuring and optimizing compilers, control dependence is used in scheduling instructions across basic block boundaries for speculative or predicated execution [2, 9, 22], in merging program versions [14], and in automatic parallelization [1, 8, 30]. In some applications such as code scheduling, it is necessary to know which nodes have the same control dependences as a given node. This information is useful in code scheduling because basic blocks with the same ....

D. Bernstein and M. Rodeh. Global instruction scheduling for superscalar machines. In Proceedings of the SIGPLAN '91 Conference on Programming Language Design and Implementation, pages 241--255, Toronto, Ontario, June 26--28, 1991.


Value Dependence Graphs: Representation Without Taxation - Weise, Crew, Ernst.. (1994)   (43 citations)

....This framework simplifies transformations and improves upon several published results. For example, it enables more powerful code motion than [CLZ86, FOW87], eliminates as many redundancies as [AWZ88, RWZ88] (except for redundant loops), and provides important information to the code scheduler [BR91]. We exhibit a one-pass method for elimination of partial redundancies that never performs redundant code motion [KRS92, DS93] and is simpler than the classical [MR79, Dha91] or SSA [RWZ88] methods. These results accrue from eliminating the CFG from the analysis/transformation phases and using ....

....end. Analysis and transformation is simpler to implement, understand, and express formally, and frequently faster, when using a VDG rather than a CFG. While simple minded code generation from the VDG may result in poor code, the representation provides important information to the code scheduler [BR91] which makes code movement and scheduling considerably easier. We believe the code generator is the right place for the complexity of generating sequential code, rather than taxing every analysis and transformation with the burden of maintaining structures that are irrelevant to the operation ....

David Bernstein and Michael Rodeh. Global instruction scheduling for superscalar machines. In Proceedings of the SIGPLAN '91 Conference on Programming Language Design and Implementation, pages 241--255, June 1991.


and the Roman Chariots Problem - Keshav Pingali Cornell (1997)

No context found.

Bernstein, D. and Rodeh, M. 1991. Global instruction scheduling for superscalar machines. In Proceedings of the SIGPLAN '91 Conference on Programming Language Design and Implementation.


Adaptive Explicitly Parallel Instruction Computing - Surendranath Talla Of (2000)   (4 citations)

No context found.

D. Bernstein and M. Rodeh. Global instruction scheduling for superscalar machines, 1991.


Efficient Modeling of Itanium® Architecture during.. - Chen, Liu, Ju, al. (2004)

No context found.

D. Bernstein, M. Rodeh, "Global Instruction Scheduling for Superscalar Machines," in Proceedings of SIGPLAN'91 Conference on Programming Language Design and Implementation, pp. 241-255, June 1991.


Register Allocation With Instruction - Scheduling New Approach

No context found.

D. Bernstein and M. Rodeh. Global instruction scheduling for superscalar machines. In SIGPLAN'91 Conference on Programming Language Design and Implementation, pp. 241--255. ACM, June 1991.


Analysis of Profiling Information for Cache Sensitive Scheduling - Lindenmaier (1999)

No context found.

David Bernstein and Michael Rodeh, "Global Instruction Scheduling for Superscalar Machines", Proceedings of the ACM SIGPLAN '91, pp. 241-255, Jun. 1991


Scheduling Time-Constrained Instructions on Pipelined.. - Leung, Palem, Pnueli

No context found.

Bernstein, D., and Rodeh, M. Global instruction scheduling for superscalar machines. Proceedings of SIGPLAN'91 Conference on Programming Language Design and Implementation (1991).


CiteSeer.IST - Copyright Penn State and NEC