D. R. Kerns and S. Eggers. Balanced scheduling: Instruction scheduling when memory latency is uncertain. In Proceedings of the SIGPLAN '93 Conference on Programming Language Design and Implementation, pages 278--289, June 1993.
....is nothing to do about the latencies of them, whether they hit or miss the caches. That is, not during the scheduling phase, see also section 4.3. What we can however do is to try to increase the processor utilization during these latencies. This is precisely what Balanced List Scheduling (BLS) [48] is meant for. This algorithm does not [Footnote 9: These two loads have load level parallelism 3. That is, 1 (the cycle in which the load is issued) + 4/2 (since the four operations that can be executed in parallel with the loads have to be divided over the two loads).] Both loads in this DDD have a ....
....This number can differ for different loads in the same basic block, if the DDD is not symmetrical as in the depicted example. By using this number as a heuristic for the priority function, the schedules generated are optimized for the program, rather than for the architecture. Results discussed in [48] show a speedup of 3% to 18% compared to traditional LS. When BLS is combined with other ILP optimizations such as loop unrolling, trace scheduling (see section 3.1.1) and cache locality analysis, speedups are achieved in the range of 15% to 40% [52]. 2.3.2 Stochastic Algorithms Though LS is simple ....
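The load level parallelism figure quoted in the excerpt above (two loads, four operations that can run in parallel with them, giving 3 for each load) can be reproduced with a small sketch. This is a simplified illustration and not the algorithm published in the cited paper: it assumes the dependence graph is given as a successor map, treats any non-load operation with no dependence path to or from a load as available to hide that load, and splits each such operation evenly over the loads it could hide. The names `reachable`, `load_weights`, and the example graph are hypothetical.

```python
from fractions import Fraction


def reachable(succs, start):
    """Return every node reachable from `start` by following edges in `succs`."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for nxt in succs.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def load_weights(succs, loads):
    """Simplified load level parallelism: 1 plus an equal share of each independent op."""
    nodes = set(succs) | {n for targets in succs.values() for n in targets}
    # Two nodes are dependent if either one can reach the other in the DAG.
    desc = {n: reachable(succs, n) for n in nodes}

    def independent(a, b):
        return a != b and b not in desc[a] and a not in desc[b]

    # Assumption: every load starts at 1, the cycle in which it is issued.
    weights = {load: Fraction(1) for load in loads}
    for op in nodes - set(loads):
        overlapping = [load for load in loads if independent(op, load)]
        for load in overlapping:
            # Each non-load op is divided evenly over the loads it could hide.
            weights[load] += Fraction(1, len(overlapping))
    return weights


if __name__ == "__main__":
    # Hypothetical graph: two loads and four independent ops feed one consumer.
    example = {
        "ld1": ["use"], "ld2": ["use"],
        "a": ["use"], "b": ["use"], "c": ["use"], "d": ["use"],
    }
    # Each load gets 1 + 4 * (1/2) = 3.
    print(load_weights(example, ["ld1", "ld2"]))
```

In an asymmetric graph the same routine would give different weights to different loads, which matches the excerpt's remark that the number can differ within one basic block.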
Kerns, D., and Eggers, J. Balanced scheduling: Instruction scheduling when memory latency is uncertain. In Proceedings of the ACM SIGPLAN'93 Conference on Programming Language Design and Implementation (June 1993), pp. 278-289.
....can be exploited by the target at any one time. Since costs are based on worst-case values rather than typical ones, the traditional list scheduling heuristics tend to overly migrate independent instructions to the top of the schedule, leaving insufficient parallelism for later. Kerns and Eggers [18] proposed a code scheduling algorithm called balanced scheduling for synchronous architectures which is similar in concept. Their algorithm is specifically designed to tolerate a wide range of variance in load latency, e.g. cache misses/hits, global and Algorithm 1 : The MAP scheduler (generate ....
D. R. Kerns and S. J. Eggers. Balanced scheduling: Instruction scheduling when memory latency is uncertain. SIGPLAN Notices, 28(6):278--289, June 1993. Proceedings of the ACM Conference on Programming Language Design and Implementation.
....larger than a single instruction window (possibly because of unroll and jam or inner loop unrolling). In such cases, the instruction scheduler should pack independent miss references in the loop body close to each other. The technique of balanced scheduling can provide some of these benefits [KE93, LE95] but may also miss some opportunities since it does not explicitly consider window size. Nevertheless, this heuristic worked well for the code sequences we examined. More appropriate local scheduling algorithms remain the subject of future research. 4.3 Measuring the Impact of ....
....at the loop nest level, but also discusses the possible interaction between clustering and basic block scheduling. We have not yet dealt with clustered codes that are limited by basic block size and not amenable to previously understood local scheduling techniques such as balanced scheduling [KE93, LE95]. In such situations, all of the independent misses exposed by the transformation will not actually issue to the memory system together, limiting the system's latency tolerance ability. To improve latency tolerance, the instruction scheduler can reschedule independent misses to ensure ....
Daniel R. Kerns and Susan J. Eggers. Balanced Scheduling: Instruction Scheduling When Memory Latency is Uncertain. In Proceedings of the ACM SIGPLAN '93 Conference on Programming Language Design and Implementation, pages 278--289, June 1993.
....an instruction will be scheduled which is most likely to cause interlocks with instructions after it. The complexity in the absence of any lookahead in the instructions is O(n^2), where n is the number of instructions in a basic block. 2.2 The Balanced scheduler The Balanced scheduler [3] was devised to take account of unpredictable memory access latencies. The idea is to compute weights for load instructions based on the number of available independent instructions. The instructions are scheduled as in a traditional list scheduler with independent instructions being distributed ....
.... schedule has at most n - 1 consecutive dependencies (a pure sequential code), giving a complexity of O(n^2), and the best case is O(n). The linear time complexity for the PTD scheduler is better than the O(n^2) for the list scheduler [2] and O(n^2 α(n)) [footnote 1] for the balanced scheduler [3]. 4 Results We next compare the quality of schedules produced by the Balanced, Gibbons and Muchnick (GM) and the PTD schedulers for a range of benchmarks which [Footnote 1: α is the inverse of the Ackermann function.] represent both loop intensive (Livermore loops) and control intensive categories of ....
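The excerpts above describe the balanced scheduler as computing a weight per load and then scheduling "as in a traditional list scheduler". A minimal sketch of that second step follows, under stated assumptions: a single-issue machine, a dependence graph given as a successor map, a default latency of 1 for every operation, and a priority equal to the longest latency-weighted path to a sink. The function name `list_schedule`, the tie-breaking rule, and the example weights are hypothetical; the cited papers may use different priority functions.

```python
def list_schedule(succs, latency):
    """Single-issue list scheduling; returns {node: issue cycle}."""
    nodes = set(succs) | {n for targets in succs.values() for n in targets}
    preds = {n: [] for n in nodes}
    for src, targets in succs.items():
        for dst in targets:
            preds[dst].append(src)

    # Priority = longest latency-weighted path from the node to any sink.
    priority = {}

    def height(n):
        if n not in priority:
            below = max((height(s) for s in succs.get(n, ())), default=0)
            priority[n] = latency.get(n, 1) + below
        return priority[n]

    for n in nodes:
        height(n)

    issue, cycle = {}, 0
    while len(issue) < len(nodes):
        # An op is ready once every predecessor has issued and its latency has elapsed.
        ready = [
            n for n in nodes
            if n not in issue
            and all(p in issue and issue[p] + latency.get(p, 1) <= cycle
                    for p in preds[n])
        ]
        if ready:
            # Highest priority first; the name breaks ties deterministically.
            pick = min(ready, key=lambda n: (-priority[n], n))
            issue[pick] = cycle
        cycle += 1  # an empty ready list models a stall cycle
    return issue


if __name__ == "__main__":
    graph = {
        "ld1": ["use"], "ld2": ["use"],
        "a": ["use"], "b": ["use"], "c": ["use"], "d": ["use"],
    }
    # Assumed load weights of 3 place both loads ahead of the independent ops.
    print(list_schedule(graph, {"ld1": 3, "ld2": 3}))
```

With the assumed weights, both loads issue first and the four independent operations fill the cycles before the consumer, so the consumer does not wait on either load as long as the real latency stays within the weight.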
D. R. Kerns and S. J. Eggers. Balanced scheduling: Instruction scheduling when memory latency is uncertain. In ACM SIGPLAN
....is our solution, where we use average latencies of load instructions as an integral part of the rank function to properly control the scheduling of load instructions. Our algorithm is based on greedy list scheduling with static priorities given to each operation via our rank function [1]. All operations are given a priority using this rank function; however, the rank function treats loads and other longer latency operations differently than other operations. Longer latency operations that affect the greatest part of the program will be given priority over other operations. In ....
Daniel R. Kerns and Susan J. Eggers, Balanced Scheduling: Instruction Scheduling When Memory Latency is Uncertain , in , 1993
....optimizations are mostly inadequate. The majority of strategies advocated to address the memory bottleneck either attempt to hide long access latencies or enhance data locality. Examples of latency masking optimizations include prefetching[24, 20, 6] and load sensitive scheduling algorithms[15, 30]. However, such strategies are vulnerable to unpredictable memory reference patterns and may degrade performance. Specifically, prefetch strategies waste bandwidth and pollute caches when data is unnecessarily requested. Similarly, poor or pessimistic operation characterization during scheduling ....
D. Kerns, and S. Eggers. "Balanced scheduling: instruction scheduling when memory latency is uncertain". In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, June 1993.
....defined, and model optimistic program behavior for instructions with variable real latencies. This works well when the assumptions match real runtime latencies, but events such as branch misprediction, cache misses, etc. can render the compiler's schedule quite inaccurate. Balanced scheduling [6] is an example of a structure directed approach which attempts to address some of the issues at the basic block level. We believe that the best results will be obtained by balancing profile and structure driven techniques, while modifying the traditional dependence graph model to better reflect ....
....[Figure residue: side-by-side assembly listings (instructions such as addiu, bne, addu, lw, lhu, lbu, beq, jr, with bracketed per-instruction numbers and labels such as L.2.40, L.2.41, BB36, BB42); the column layout is not recoverable.] ....
[Article contains additional citation context not shown here]
D. R. Kerns and S. J. Eggers. Balanced scheduling: Instruction scheduling when memory latency is uncertain. In Proceedings of the SIGPLAN'93 Conference on Programming Language Design and Implementation, June, 1993.
....in order to minimize the execution time of a software pipelined loop. Finally, we show that schemes based on binding prefetch are more effective than those based on nonbinding prefetch for software pipelined schedules. The use of binding and nonbinding prefetching has been previously studied in [13][1] and [4][9][14][18][3] respectively among others. However, there are very few works analyzing the interactions of these prefetching schemes with software pipelining techniques. The selective scheduling [1] schedules some operations with cache hit latency and others with cache miss latency, ....
D.R. Kerns and S.J. Eggers, "Balanced Scheduling: Instruction Scheduling When Memory Latency is Uncertain", in Procs. of PLDI 93, pp.278-289, 1993
....currently available are the following: register renaming: renames local registers in each basic block. This aims at removing false dependences. loop unrolling: unrolls loop bodies. local superblock scheduling: this transformation performs the scheduling of basic blocks or of superblocks [16, 11, 12]. superblock construction: gathers a set of basic blocks into a superblock [11]. guard insertion: adds guards to instructions to remove jumps and thus allows scheduling across jumps [10]. software pipeline: generates a modulo scheduling of the loop body. Registers are renamed to achieve low ....
Daniel R. Kerns and Susan J. Eggers. Balanced scheduling: Instruction scheduling when memory latency is uncertain. In Conference on Programming Language Design and Implementation, pages 278--289, 1993.
....latency with the measure of load level parallelism and then schedules instructions normally. Previous work has demonstrated that when memory latency is uncertain, balanced schedules for the Perfect Club benchmark suite show speedups of between 3% and 18% for different architectural models [Kerns93]. Since its success depends on the amount of instruction level parallelism, balanced scheduling should perform better when more parallelism is available. In this study, we try to improve the performance of the balanced scheduler by applying two compiler optimizations aimed at increasing ....
....suite of scientific benchmarks for several architectural models, including caches with various hit rates and latencies, networks with varying amounts of congestion, and sophistication of non blocking hardware. More details of the balanced scheduling algorithm and these results are presented in [Kerns93]. 3.0 Compiler Optimizations Load level parallelism is required to hide the additional load latencies exposed by non blocking architectures. By applying techniques to increase load level parallelism, the balanced scheduler should be able to generate schedules that are even more tolerant of ....
[Article contains additional citation context not shown here]
D. R. Kerns and S. J. Eggers. Balanced Scheduling: Instruction Scheduling When Memory Latency Is Uncertain. In SIGPLAN Conference on Programming Language Design and Implementation, June 1993.
....larger than a single instruction window (possibly because of unroll and jam or inner loop unrolling). In such cases, the instruction scheduler should pack independent miss references in the loop body close to each other. The technique of balanced scheduling can provide some of these benefits [12, 13], but may also miss some opportunities since it does not explicitly consider window size. Nevertheless, this heuristic worked well for the code sequences we examined. More appropriate local scheduling algorithms remain the subject of future research. 4. Experimental Methodology 4.1 Evaluation ....
D. R. Kerns and S. J. Eggers, "Balanced Scheduling: Instruction Scheduling When Memory Latency is Uncertain, " in Proceedings of the ACM SIGPLAN '93 Conference on Programming Language Design and Implementation, pp. 278--289, June 1993.
....larger than a single instruction window (possibly because of unroll and jam or inner loop unrolling). In such cases, the instruction scheduler should pack independent miss references in the loop body close to each other. The technique of balanced scheduling can provide some of these benefits [9, 10], but may also miss some opportunities since it does not explicitly consider window size. Nevertheless, this heuristic worked well for the code sequences we examined. More appropriate local scheduling algorithms remain the subject of future research. 6 Processor parameters Clock rate 500 MHz ....
D. R. Kerns and S. J. Eggers. Balanced Scheduling: Instruction Scheduling When Memory Latency is Uncertain. In Proceedings of the ACM SIGPLAN '93 Conference on Programming Language Design and Implementation, pages 278--289, June 1993.
....in each basic block. Superblock construction merges a set of basic blocks into a superblock [14]. Guard insertion adds guards to instructions to remove jumps and thus allow scheduling across jumps [13]. Loop unrolling (also available at the high level). Local superblock scheduling [14, 15]. Software pipeline generates a modulo scheduling of the loop body. The implementation is based on the tools PiLo [19] and LoRa [11]. PiLo and LoRa are optimisation kernels based on periodic scheduling and graph colouring algorithms. Sea sends data dependencies between instructions, the ....
D. R. Kerns and S. J. Eggers. Balanced Scheduling: Instruction Scheduling When Memory Latency is Uncertain. Conference on Programming Language Design and Implementation, 1993, pp. 278--289.
....larger than a single instruction window (possibly because of unroll and jam or inner loop unrolling). In such cases, the instruction scheduler should pack independent miss references in the loop body close to each other. The technique of balanced scheduling can provide some of these benefits [6, 7], but may also miss some opportunities since it does not explicitly consider window size. Nevertheless, this heuristic worked well for the code sequences we examined. More appropriate local scheduling algorithms remain the subject of future research. 4. Experimental Methodology 4.1. Evaluation ....
D. R. Kerns and S. J. Eggers. Balanced Scheduling: Instruction Scheduling When Memory Latency is Uncertain. In Proc. of the Conf. on Programming Language Design and Implementation, 1993.
....profiles to sharpen constant propagation [2]. Our work is unique in that it uses information at the instruction level, and integrates it into a scheduler. Previous work on using instruction level parallelism (ILP) to hide latencies for nonblocking caches has two major differences from this work [4, 6, 8, 10, 12]. First, previous work uses static locality analysis which works very well for regular array accesses. Secondly, these schedulers only differentiate between a hit or a miss. Since we use performance counters, we can improve the schedules of pointer based codes that compilers have difficulty ....
....[Fig. 2. Simulated number of loads and comparison of heuristics to simulation. Plot axis: Percentage Hit in First Level Cache; series: strict heuristic, generous heuristic.] 4.1 Balanced Scheduling We use the Multiflow compiler [7, 11] with the Balanced Scheduling algorithm [8, 10], and additional optimizations, e.g. unrolling, to generate ILP and traces of instructions that combine basic blocks. Below we first briefly describe Balanced scheduling and then we present our modifications. Balanced scheduling first creates an acyclic scheduling data dependency graph (DAG) ....
[Article contains additional citation context not shown here]
D. R. Kerns and S. Eggers. Balanced scheduling: Instruction scheduling when memory latency is uncertain. In Proceedings of the SIGPLAN '93 Conference on Programming Language Design and Implementation, pages 278--289, Albuquerque, NM, June 1993.
....these run time interactions. These predictions are often performed with the help of a model of the processor, with some models assuming constant latencies for all operations (e.g. [Lowney 1993]) while some others seek to account for variances caused by data cache misses (e.g. [Abraham 1993, Kerns 1993]). Typically, the interaction predictions are only made in relation to instructions that belong to a given basic block because the issue order of these instructions can be determined with 100% accuracy. To take into account the interactions between instructions belonging to different basic ....
Kerns, D. R. and Eggers, S. J. (1993). Balanced Scheduling: Instruction Scheduling When Memory Latency is Uncertain. In the Proceedings of the Conference on Programming Language Design and Implementation, pages 278--289.
....needs the loaded value. This means that it is good if the compiler can schedule instructions for the latency of a cache miss. These latencies are however so large (around ten cycles and up) that they cannot meaningfully be used as labels in the dependence graph. Therefore, Kerns and Eggers [20] propose using a latency based on the parallelism actually available in the code, a method they refer to as balanced scheduling. The edge latencies depend on both the machine specific operation latencies and on the kind of dependence they encode. Typical values are the following: • Superscalar ....
Daniel R. Kerns and Susan J. Eggers. Balanced scheduling: Instruction scheduling when memory latency is uncertain. SIGPLAN Notices, 28(6):278--289, June 1993. Proceedings of the ACM SIGPLAN '93 Conference on Programming Language Design and Implementation.
....the memory bus traffic increases. 4 Discussion 4.1 Software techniques This study has compared various hardware techniques for tolerating memory latency. There has also been work on software techniques for tolerating memory latency, such as prefetching [MLG92, TE95] and balanced scheduling [KE93]. Hardware and software techniques are compared in [BP92], [CCMH91] and [CB94]. In general, it appears that software and hardware techniques are complementary. Compile time optimizations for memory latency tolerance can include large scale code motion, such as loop transformations, that are ....
D. R. Kerns and S. J. Eggers. Balanced scheduling: Instruction scheduling when memory latency is uncertain. In ACM SIGPLAN '93 Conference on Programming Language Design and Implementation, pages 278--89, June 1993.
No context found.
D. R. Kerns and S. Eggers. Balanced scheduling: Instruction scheduling when memory latency is uncertain. In Proceedings of the SIGPLAN '93 Conference on Programming Language Design and Implementation, pages 278--289, June 1993.
No context found.
D. R. Kerns and S. Eggers. Balanced scheduling: Instruction scheduling when memory latency is uncertain. In Proceedings of the SIGPLAN '93 Conference on Programming Language Design and Implementation, pages 278--289, June 1993.
No context found.
D.R. Kerns and S.J. Eggers, "Balanced Scheduling: Instruction Scheduling When Memory Latency is Uncertain", in Procs. of PLDI 93, pp.278-289, 1993
No context found.
D. R. Kerns and S. J. Eggers. Balanced scheduling: Instruction scheduling when memory latency is uncertain. In Proceedings of the SIGPLAN'93 Conference on Programming Language Design and Implementation, pages 278--289, June 1993.
No context found.
D.R. Kerns and S.J. Eggers. Balanced scheduling: Instruction scheduling when memory latency is uncertain. In ACM SIGPLAN '93 Conference on Programming Language Design and Implementation, pages 278--89, June 1993.
No context found.
D. R. Kerns and S. J. Eggers. Balanced scheduling: Instruction scheduling when memory latency is uncertain. In Proceedings of the SIGPLAN '93 Conference on Programming Language Design and Implementation, pages 278-289, Albuquerque, NM USA, June 1993.
No context found.
Daniel R. Kerns and Susan Eggers, "Balanced Scheduling: Instruction scheduling when memory latency is uncertain", Conference on Programming Language Design and Implementation, Jun. 1993