GOOSSENS, G., VANDEWALLE, J., AND DE MAN, H. 1989. Loop optimization in register-transfer scheduling for DSP-systems. In Proceedings of the 26th ACM/IEEE Conference on Design Automation (DAC '89, Las Vegas, NV, June 25-29), D. E. Thomas, Ed. ACM Press, New York, NY, 826-831.
.... as key to exploiting much larger amounts of parallelism in program codes than is possible by looking at just a single iteration of the loop body [7, 8, 9, 10]. In contrast, uses of loop transformations for high-level synthesis have been few, and the work has focused mostly on data flow graphs [11, 12, 13, 14]. Furthermore, these works have not analyzed the hardware performance trade-offs of these transformations. Straightforward application of compiler transformations does not work for high-level synthesis. This is because the increases in control, interconnect (multiplexing) and area costs are ....
....of the loop body into pieces that then form repeating pipeline stages. Rotation scheduling [14] moves or rotates operations in a DFG from one iteration of the loop body to the next. Percolation-based synthesis [13] applies the perfect pipelining approach to high-level synthesis. Cathedral II [12] applies loop folding to overlap successive iterations of a loop body in a data flow graph. Holtmann and Ernst [17] apply loop pipelining to designs with conditional branches. They schedule operations on the most probable path through the loop body by deferring operations on other paths. ....
G. Goossens, J. Vandewalle, and H. De Man. Loop optimization in register-transfer scheduling for DSP-systems. In Design Automation Conference, 1989.
.... either assume that all input and delay node samples are available at the same time (all phases are zero) [33], [34], [40] or indirectly assign values to the phases by using schedulers that incorporate techniques such as overlapped scheduling and software pipelining to generate complex time shapes [12], [25], [38]. However, only recently has some limited work been done on relaxing the assumption that all phases are zero and explicitly manipulating the phases. In one such effort, Iqbal et al. [20] proposed an algebraic speed-up algorithm that satisfies an arbitrary set of timing constraints on ....
....of the iteration bound and therefore improvement of the throughput. Rephasing targets the time loop. If the target is changed to the control loop, a new transformation, software rephasing, is developed. There is a significant level of similarity and conceptual equivalence between software pipelining [12], [25] and software rephasing; they are like different views of the same object. However, an advantage of the rephasing approach is that it, unlike software pipelining, keeps scheduling decoupled from transformation. Therefore, rephasing can be easily combined with other transformations such as ....
G. Goossens, J. Vandewalle, and H. De Man, "Loop optimization in register-transfer scheduling for DSP-systems," in Proc. DAC-89, 1989, pp. 826--831.
....codesign [16]. Sanchez [10] and Chao et al. [13] use retiming techniques for pipelining under resource constraints. Results show that the UNRET technique proposed in [10] performs better than the well-known pipelining schemes like percolation-based scheduling [4], SEHWA [17], and others [18, 19, 20]. The Rotation Scheduling (RS) proposed by Chao et al. in [13] also shows promising results. Retiming, introduced by Leiserson and Saxe [8], is a circuit transformation in which registers can be redistributed while maintaining functional equivalence. In [8] and This work was supported by the ARPA ....
G. Goossens, J. Vandewalle, H. De Man. "Loop optimization in register-transfer scheduling for DSP systems". In Proc. Design Automation Conference, pages 826--831, 1989.
....software pipelining to generate complex time shapes [Goo89, Pot94, Lam88]. However, only recently has some limited work been done on relaxing the assumption that all phases are zero and explicitly manipulating the phases. Perhaps the first effort at directly manipulating the phases as part of an algorithm transformation was described by [Sri94], who applied it to ....
G. Goossens, J. Vandewalle, H. De Man: "Loop Optimization in register-transfer scheduling for DSP-systems", DAC-89, pp. 826-831, 1989.
.... path, several algorithms designed by Leiserson and Saxe provide an optimal solution [19]. When the goal is minimal area or power, the problem has been proven to be NP-complete [27]. There is one more important relationship between two transformations: pipelining and software pipelining [32], [11]. As described in [30], those two transformations represent the same computation structure alternation in two different computational models. While pipelining is applied on a semi-infinite stream of incoming data along the time loop, software pipelining is applied on a finite stream of data along ....
G. Goossens, J. Vandewalle, and H. De Man, "Loop optimization in register-transfer scheduling for DSP-systems," presented at the 26th Design Automation Conf., Las Vegas, NV, 1989.
.... algorithms designed by Leiserson and Saxe provide an optimal solution in polynomial time [Lei91]. When the goal is minimal area or power, the problem has been proven to be NP-complete [Pot91]. There is one more important relationship between two transformations: pipelining and software pipelining [Rau82, Goo89]. As described in [Pot92], those two transformations represent the same computation structure alternation in two different computational models. While pipelining is applied on a semi-infinite stream of incoming data along the time loop, software pipelining is applied on a finite stream of data ....
G. Goossens, J. Vandewalle, H. De Man: "Loop optimization in register-transfer scheduling for DSP-systems", 26th Design Automation Conference, pp. 826-831, Las Vegas, NV, 1989.
....solution. 5. Previous Work. This section briefly surveys previous work in retiming and software pipelining, starting with algorithms aiming at centralized machines only. Examples of software pipelining algorithms that are based on some variation of list scheduling include [10], [2], [11], and [12]. In [13], a software pipelining algorithm that can handle conditionals on the loop body is proposed. The retiming algorithm proposed in [14] compacts a given valid schedule by applying a phased iterative retiming and scheduling. The method proposed in [15] uses a probabilistic rejectionless ....
G. Goossens, J. Vandewalle, H. De Man, "Loop optimization in register-transfer scheduling for DSP-systems", in proc. of the ACM/IEEE Design Automation Conf., 1989.
....value, in an attempt to find a solution with lower register requirements. 5. Previous Work. This section briefly surveys previous work in retiming and software pipelining. Examples of software pipelining algorithms that are based on some variation of list scheduling include [12], [3], [13], and [14]. In [15], a resource-constrained software pipelining algorithm that can handle conditionals on the loop body is proposed. The retiming algorithm proposed in [16] compacts a given valid schedule by applying a phased iterative retiming and scheduling. The method proposed in [17] uses a probabilistic ....
G. Goossens, J. Vandewalle, H. De Man, "Loop optimization in register-transfer scheduling for DSP-systems", Proc. of the ACM/IEEE Design Automation Conference, pp. 826-831, 1989.
....first phase, the input graph is scheduled assuming unlimited resources. Resource conflicts are then solved by delaying selected operations. If a valid schedule cannot be generated, the number of pipe stages is incremented and the algorithm is repeated. The software pipelining algorithm proposed in [12] also performs an initial scheduling assuming unlimited resources. The resulting retimed graph is then scheduled for minimum latency under resource constraints, using list scheduling. If the latency of the scheduled graph is larger than the target latency, the nodes at the tail of the graph are ....
G. Goossens, J. Vandewalle, H. De Man, "Loop optimization in register-transfer scheduling for DSP-systems", Proceedings of the ACM/IEEE Design Automation Conference, pages 826-831, 1989.
....interval (II), reducing the number of registers, and finding efficient schedules with resource constraints are common. Techniques such as modulo scheduling [1], perfect pipelining [2], or Lam's algorithm [3], among others, have been proposed for parallel architectures, while loop folding [4], retiming [5], loop winding [6], or functional pipelining [7] have been proposed for HLS. This paper presents UNRET (unrolling and retiming), a new approach for software pipelining with resource constraints. UNRET works as follows: first, a lower bound of the minimum initiation interval (MII) for ....
....significantly faster than [10] and more efficiently than [11], since the search space is more exhaustively explored). 1.1 Contributions. This paper presents the following new contributions with regard to previous approaches: • Most current techniques work with a single iteration of the loop [4, 5]. Other approaches compute the optimal unrolling degree [11] and use an unrolled loop as a new loop for scheduling [8, 12]. In both cases, throughput is only explored in one dimension: once the target loop is obtained (unrolled or not), the expected II is increased when no schedule is found, ....
[Article contains additional citation context not shown here]
G. Goossens, J. Vandewalle, and H. De Man. Loop optimization in register-transfer scheduling for DSP systems. In Proc. of the 26th Design Automation Conf., pages 826--831, 1989.
....focus of this paper is on register binding. In our work, we also consider pipelined schedules: In a loop construct the loop body is executed a number of times. In a traditional schedule, iteration i + 1 of the loop body is executed strictly after the execution of the i-th iteration. Goossens [11] demonstrates a practical way to overlap the executions of different loop body iterations, thus obtaining potentially much more efficient schedules. The pipelined schedule is executed periodically, where the period is called the initiation interval II. III. Problem statement and global approach In ....
G. Goossens, J. Vandewalle, and H. De Man, "Loop optimization in register-transfer scheduling for DSP-systems," in Proceedings of the 26th ACM/IEEE Design Automation Conference, Las Vegas, June 1989, ACM and IEEE Computer Society, pp. 826--831.
....[3]. Loop pipelining techniques transform a sequential loop into a loop with parallelism across multiple iterations extracted while preserving the program's semantics. Since in general the problem of scheduling with resource constraints is NP-complete, heuristic-based loop pipelining techniques [7, 9] have been developed to compact loops with given resource constraints. Percolation-based loop pipelining techniques [21] first compact a loop into its optimal parallel counterpart and then apply resource constraints on the parallel version. Static scheduling via optimum unfolding [20] in DSP ....
G. Goossens, J. Vandewalle and H. De Man, "Loop optimizations in register-transfer scheduling for DSP systems", Proceedings of the ACM/IEEE 26th Design Automation Conference, 1989.
....rates for the resources, it is useful to let executions of different loop iterations overlap. When using a counter-based controller, like in the Phideo compiler [4], this can be done very efficiently. In a micro-coded controller, overlapping executions are realized by software pipelining [3], also called loop pipelining or loop folding. Such methods assume that the scheduler is able to handle resource conflicts between operations belonging to different loop iterations. The difficulty in handling these inter-iteration conflicts is illustrated with a small example in Figure 1. In this ....
....in clock cycles at the horizontal axis. The operations are enumerated vertically. The white area represents the reductions obtained by both BSG and CAS analysis. For example, the execution interval of operation r0, based on ASAP and ALAP, is [1, 5]. BSG and CAS are able to reduce this interval to [3, 4]. The grey area is the reduction obtained by CAS, that BSG .... [Table 1. Average schedule freedom for radix-2 butterfly (columns: ASAP/ALAP, BSG, CAS): non-folded 1.2, .7, .7; folded 1.2, .5, .1.] [Figure: schedule of operations r0, r1, m1, m2, a0, a1, w0, w1; resource binding: a0 on ACU, a1 on ACU; II = 6, latency = 8] ....
G. Goossens, J. Vandewalle and H. De Man, "Loop optimization in register-transfer scheduling for DSP-systems", Proc. 26th DAC, pp. 826-831, 1989
....acyclic data flow graph in a preprocessing phase for pipelining. A graph with long critical paths is partitioned into several subgraphs with shorter critical paths. Each subgraph corresponds to a pipeline stage. A few systems have been designed to pipeline loops with inter iteration dependencies [31, 33, 50, 68, 77]. ALPS [33] formulates the scheduling problem in an integer linear programming form. The objective of the integer linear programming formulation can be either minimizing the delay for a given number of resources or maximizing the throughput for a set of functional units. Other systems use ....
....to acyclic DFGs, cycles in a DFG provide bounds on the improvement we can achieve by pipelining. In this chapter, we propose a generic technique to optimize a cyclic DFG under resource constraints. Previous work on loop pipelining for loops with cyclic dependencies appears in several systems [50, 31, 68, 77]. Percolation based scheduling [58, 68] unfolds (unwinds) the loop incrementally to find a repeating pattern in the schedule of the unfolded loop without resource constraints, and then schedules it with resources. The size of the pipeline schedule cannot be predicted until several incremental ....
[Article contains additional citation context not shown here]
Goossens, G., Vandewalle, J., and De Man, H. Loop optimization in register-transfer scheduling for DSP-systems. In Proceedings of the ACM/IEEE Design Automation Conference (1989), pp. 826--831.
....by the Ministry of Education and Science of Spain, under contract CICYT TIC 95 0419. ....efficient schedules with resource constraints are common. Techniques such as modulo scheduling [17] or Lam's algorithm [13], among others, have been proposed for parallel architectures, while loop folding [7], loop winding [5], or functional pipelining [10] have been devised for HLS. This paper presents UNRET (unrolling and retiming), a new approach for software pipelining with resource constraints. The ideas behind UNRET have been used in [22] to propose a software pipelining approach with timing ....
....increasing the II. • Software pipelining is reduced to the interleaved combination of two decoupled techniques: retiming and scheduling. Several configurations are explored by retiming the loop, and scheduling is done for each configuration. Most approaches perform both tasks simultaneously [7, 10, 17], obtaining inferior results because only one configuration of the loop is considered and some scheduling decisions are taken much too soon. UNRET obtains optimal schedules with shorter CPU times in most cases, as shown in Section 6. • Decoupling retiming and scheduling also results in ....
G. Goossens, J. Vandewalle, and H. De Man. Loop optimization in register-transfer scheduling for DSP systems. In Proc. of the 26th Design Automation Conf. (DAC), pages 826--831, 1989.
....prediction and error correction (MBP SC) is employed as SC technique with low circuit overhead. Given a loop with profiling information, MBP SC predicts the path through the loop body with the highest probability to be taken. This path is predicted and scheduled using loop pipelining (LoopP) [12], whereby operations belonging to other paths are deferred. If this path is predicted incorrectly, execution switches to a restore phase (prediction error correction). The body itself is always predicted to continue rather than being terminated. HW overhead due to prediction error correction ....
G. Goossens, J. Vandewalle, H. DeMan, "Loop optimization in register-transfer scheduling for
.... that all input and delay node samples are available at the same time (all phases are zero) [McF88, Not91, Rab91, Wal91] or indirectly assign values to the phases by using schedulers that incorporate techniques such as overlapped scheduling and software pipelining to generate complex time shapes [Goo89, Lee92, Pot94, Lam88]. However, only recently has some limited work been done on relaxing the assumption that all phases are zero and explicitly manipulating the phases. In one such effort, Iqbal et al. [Iqb93] proposed an algebraic speed-up algorithm that satisfies an arbitrary set of timing constraints on inputs and ....
....the reduction of the iteration bound, and therefore improvement of the throughput. 6.3 Software Rephasing. The importance of merging compiler and high-level synthesis techniques has been emphasized in some recent work [Mar93a, Mar93b]. Software pipelining is a popular compiler transformation [Lam88, Goo89, Lee92] that integrates scheduling and functional pipelining for optimization of control loops. It is widely used in compilers for throughput optimization under resource constraints [Lam88]. Rephasing targets the time loop. If the target is changed to the control loop, a new transformation, software rephasing, is ....
G. Goossens, J. Vandewalle, H. De Man: "Loop Optimization in register-transfer scheduling for DSP-systems", DAC-89, pp. 826-831, 1989.
....approaches such as simulated annealing, and exact approaches such as integer linear programming. Greedy heuristics attempt to minimize resource costs but do not guarantee that an optimal schedule will be found. Examples of greedy approaches include fast, simple heuristics such as list scheduling [8, 9, 10, 11, 12] and more complex (and more effective) heuristics such as force-directed scheduling [5]. Greedy heuristics suffer the shortcoming that they can be trapped in local minima in the cost function and so may not find the globally best schedule. Transformational approaches alter an existing schedule ....
....drawn from layout compaction [21] can be used to find an ASAP-like schedule that meets timing constraints. This approach was used by Borriello [22] in the synthesis of asynchronous interface transducers. A similar approach has also been used to satisfy constraints between loop iterations [12]. More recently, Ku and De Micheli [23] have applied constraint solution to the problem of scheduling with timing constraints after allocation has been performed. This technique, called relative scheduling, has the added feature that it can guarantee constraint satisfaction in the presence of ....
G. Goossens, J. Vandewalle, and H. De Man, "Loop optimization in register-transfer scheduling for DSP-systems", Proceedings 26th DAC, pp. 826-831, June 1989.
....the data paths [3]. Loop pipelining techniques transform a sequential loop into a loop with parallelism across multiple iterations extracted while preserving the program's semantics. Since scheduling with resource constraints in general is NP-complete, heuristic-based loop pipelining techniques [5, 19, 7] have been developed to compact loops with given resource constraints. Percolation-based loop pipelining techniques [18] first compact a loop into its optimal parallel counterpart and then apply resource constraints on the parallel version. Static scheduling via optimum unfolding [16] in DSP ....
....[Extraction residue of a numbered, step-by-step schedule listing over the arrays t, a, c, and x; contents not recoverable] ....
[Article contains additional citation context not shown here]
G. Goossens, J. Vandewalle and H. De Man, "Loop optimizations in register-transfer scheduling for DSP systems", Proceedings of the ACM/IEEE 26th Design Automation Conference, 1989.
....interval (II), reducing the number of registers, and finding efficient schedules with resource constraints are common. Techniques such as modulo scheduling [1], perfect pipelining [2], or Lam's algorithm [3], among others, have been proposed for parallel architectures, while loop folding [4], retiming [5], loop winding [6], or functional pipelining [7] have been proposed for HLS. This paper presents UNRET (unrolling and retiming), a new approach for software pipelining with resource constraints. UNRET works as follows: first, a lower bound of the minimum initiation interval (MII) for ....
....while maintaining the throughput. This work was supported by CYCYT TIC 91 1036. 1.1 Contributions. This paper presents the following new contributions with regard to previous approaches: • Most current techniques do not perform previous unrolling of the loop, working with a single iteration [4, 5]. Other approaches compute the optimal unrolling degree [9] and use an unrolled loop as a new loop [10, 8]. In both cases, throughput is only explored in one dimension: once the target loop is obtained (unrolled or not), only the expected II is changed when no schedule is found, whereas the ....
[Article contains additional citation context not shown here]
G. Goossens, J. Vandewalle, and H. De Man. Loop optimization in register-transfer scheduling for DSP systems. In Proc. of the 26th Design Automation Conf., pages 826--831, 1989.
....moves, and those that incorporate software pipelining in a single global scheduling algorithm. Modulo scheduling, presented in [91], first converts conditional branches in the loop into straight-line code and subsequently applies a local scheduling algorithm that pipelines the loop. Loop folding [92] is an iterative approach to software pipelining. In every step of the algorithm, a local list schedule is computed for the loop body. Based on this schedule, partial instructions are selected and moved between loop iterations. A similar strategy has been added to the global code motion tool of ....
G. Goossens et al., "Loop optimization in register-transfer scheduling for DSP-systems," in Proc. 26th IEEE/ACM Design Autom. Conf., June 1989.
....pipelining have been presented as well [1]. The main disadvantage of functional pipelining is that it does not take loop-carried data dependences and cyclic signal flow graphs into account. A heuristic, iterative technique for the software pipelining of cyclic signal flow graphs was proposed in [5]. This approach cannot guarantee optimality, but has proven to perform quite well, with small CPU times. Another approach is proposed in [12]; here, software pipelining (or retiming) is performed as a graph transformation for a better resource utilization, as part of a global search strategy. The ....
....$p_2 + d_{1,2} \cdot C = p_2 + (d^{init}_{1,2} + r_2 - r_1) \cdot C$ (4) where $C$ is the number of time steps required to execute one loop iteration. Hence, the projected timing relation between $p_1$ and $p_2$ is: $p_2 \geq p_1 + \delta_{1,2} - (d^{init}_{1,2} + r_2 - r_1) \cdot C$ (5). The projection idea was described in [5], and is extended here for software pipelining. Most important is the fact that edges between different loop iterations and cyclic signal flow graphs are modeled. They are crucial for correctly modeling software pipelining, and are absent in many published techniques for pipeline scheduling [9] ....
[Article contains additional citation context not shown here]
G. Goossens, J. Vandewalle, and H. De Man. Loop optimization in register-transfer scheduling for DSP-systems. In Proc. of the 26th ACM/IEEE Design Automation Conference, June 1989.
....moves, and those that incorporate software pipelining in a single global scheduling algorithm. Modulo scheduling, presented in [91], first converts conditional branches in the loop into straight-line code and subsequently applies a local scheduling algorithm that pipelines the loop. Loop folding [92] is an iterative approach to software pipelining. In every step of the algorithm, a local list schedule is computed for the loop body. Based on this schedule, partial instructions are selected and moved between loop iterations. A similar strategy has been added to the global code motion tool of ....
G. Goossens et al., "Loop optimization in register-transfer scheduling for DSP-systems," Proc. 26th IEEE/ACM Design Autom. Conf., June 1989.
No context found.
GOOSSENS, G., VANDEWALLE, J., AND DE MAN, H. 1989. Loop optimization in register-transfer scheduling for DSP-systems. In Proceedings of the 26th ACM/IEEE Conference on Design Automation (DAC '89, Las Vegas, NV, June 25-29), D. E. Thomas, Ed. ACM Press, New York, NY, 826-831.
No context found.
G. Goossens, et al., "Loop optimization in register-transfer scheduling for DSP-systems," Proc. of 26th DAC, pp. 826-831.