28 citations found.
G. Goossens, J. Rabaey, J. Vandewalle, and H. De Man, "An efficient microcode compiler for application specific DSP processors," IEEE Trans. Computer-Aided Design, vol. 9, pp. 925--937, Sept. 1990.

This paper is cited in the following contexts:

Time-Constrained Failure Diagnosis in Distributed Embedded .. - Nagarajan Kandasamy John (2002)

....such that successive iterations execute in a pipelined or overlapped fashion. Since graph folding operates on a single iteration, it introduces no control jitter. Graph folding has been previously applied at the instruction or microcode level to improve software performance on a single CPU [11], [12]. Our scheduling approach adapts this concept to handle a more general situation where tasks are preemptible, have deadlines, and incur communication delays. 3.2 Graph Folding We use the small graph G 2 in Fig. 5(a) to illustrate some basic graph folding concepts which can then be applied ....

....that they fall within the frame duration. If T i has , then this release time is modified to fit within the time interval [0, #] of the frame. The task is now considered to have been folded with respect to the original graph. A more detailed and formal discussion of graph folding is provided in [11]. The valid scheduling range for T j within the frame is given by [r j , d j ] where r j and d j denote its release time and deadline, respectively. In addition to the task folding discussed earlier, these parameters are also determined by a combination of precedence and performance constraints as ....

[Article contains additional citation context not shown here]

G. Goossens et al., "An Efficient Microcode Compiler for Application Specific DSP Processors," IEEE Trans. Comput. -Aided Design, vol. 9, no. 9, pp. 925-937, 1990.


List Scheduling for Iterative Data-Flow Graphs - Koster, Gerez (1995)

....other objective functions [23]. In high level synthesis systems that perform retiming, retiming is a step preceding scheduling [7, 23]. The scheduling method presented here considers all essentially equivalent networks [4] of the IDFG simultaneously making explicit retiming unnecessary. In [6] a method called loop folding for the production of overlapped schedules is mentioned. It works by explicitly moving a number of operations from one iteration to another and then scheduling all operations within an interval of length T 0 . In the method presented here, overlapped schedules are ....

G. Goossens, J. Rabaey, J. Vandewalle, and H. De Man. An efficient microcode compiler for application specific DSP processors. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 9(9):925-937, September 1990.


High-Level Synthesis of Control and Memory Intensive Applications - Ellervee (2000)

....description of applying the pre packing on a real life design is discussed in the section 7.2. 4.2. Related work Almost all published techniques for dealing with the allocation of storage units have been scalar oriented and they employ a scheduling directed view (see e.g. [KuPa87], [BMB88], [GRV90], [SSP92]) where the control steps of production consumption for each individual signal are determined beforehand. This applies also for memory register estimation techniques (see e.g. [KuPa87], [GDW93], [DePa96] and their references). This strategy is mainly due to the fact that applications ....

G. Goossens, J. Rabaey, J. Vandewalle, H. De Man, "An efficient microcode compiler for application-specific DSP processors", IEEE Trans. on Comp.-aided Design, Vol.9, No.9, pp.925-937, Sep. 1990.


Exploiting Data Transfer Locality in Memory Mapping - Ellervee, Miranda.. (1999)   (2 citations)

....Finally, the results are presented of applying the approach to ATM cell processing applications. 2. Related work Until recently, almost all published techniques for dealing with the allocation of storage units have been scalar-oriented and they employ a scheduling directed view (see e.g. [8, 1, 7, 12]) where the control steps of production consumption for each individual signal are determined beforehand. This applies also for memory register estimation techniques (see e.g. [8, 6, 5] and their references). This strategy is mainly due to the fact that applications targeted in conventional ....

G. Goossens, J. Rabaey, J. Vandewalle, H. De Man, "An efficient microcode compiler for application-specific DSP processors", IEEE Trans. on Comp.-aided Design,Vol.9, No.9, pp.925-937, Sep. 1990.


A Mathematical Formulation of the Loop Pipelining Problem - Cortadella, Badia, Sanchez (1995)   (5 citations)

....is crucial to obtain high quality architectures. The techniques proposed for this problem attempt to overlap the execution of different loop iterations to reduce the cycle count (initiation interval or II) per iteration. Different methods have been proposed with such a goal: loop folding [12, 23], functional pipelining [17], loop winding [11], rotation scheduling [4] and percolation based synthesis [29], among others. The area of fixed rate DSP has also drawn the attention of other authors to propose techniques for loop pipelining with timing constraints [3, 6, 34]. Similar (if not ....

....For simplicity in the definition of data dependence constraints, the auxiliary variable $c_u$ will be used to denote the cycle at which $u$ is scheduled. Hence, $c_u = \sum_{i \in C} i \cdot s_{u,i}, \; \forall u \in V$ (5). The constraint that guarantees data dependences to be honored in the schedule is the following [5, 12, 18, 23]: $c_v \ge c_u + T(u) - II \cdot \delta_{u,v}, \; \forall (u,v) \in E$ (6). Figure 8 illustrates how $\delta_{u,v}$ influences the schedule of $v$. If $\delta_{u,v} = 0$ (ILD) then $v$ must be scheduled after the completion of $u$ ($c_u + T(u)$), as shown in Figure 8(a). By increasing the fold from u i to u i m (Figures 8(b) and ....

G. Goossens, J. Rabaey, J. Vandewalle, and H. De Man. An efficient microcode compiler for application specific DSP processors. IEEE Trans. Computer-Aided Design, 9(9):925--937, September 1989.


CHESS: Retargetable Code Generation For Embedded DSP Processors - Lanneer, al. (1995)   (37 citations)

....etc. A global optimisation of the control flow of the application during scheduling is essential, in order to produce machine code of high quality. Traditional schedulers are restricted to basic blocks in the application. Only recently, control flow optimisations like software pipelining [134, 188, 88] or code hoisting [194, 202, 236] have gained attention. Chess is using a list scheduling algorithm [88, 232] that has been extended for the above requirements [114]. A variety of microcoded controller architectures is currently supported, with different behaviour of the sequencing logic (e.g. ....

....to produce machine code of high quality. Traditional schedulers are restricted to basic blocks in the application. Only recently, control flow optimisations like software pipelining [134, 188, 88] or code hoisting [194, 202, 236] have gained attention. Chess is using a list scheduling algorithm [88, 232] that has been extended for the above requirements [114]. A variety of microcoded controller architectures is currently supported, with different behaviour of the sequencing logic (e.g. multi way branching or two way branching, delayed branch or not, etc.). The scheduler supports software ....

G. Goossens, J. Rabaey, J. Vandewalle, H. De Man, "An efficient microcode compiler for application-specific DSP-processors", IEEE Trans. on Comp.- Aided Design, Vol. 9, No. 9, Sept. 1990, pp. 925--937.


Synthesis of Low Power Folded Programmable Coefficient FIR.. - Sundararajan, Parhi (2000)

....overhead. Depending on the unfolding factor employed the average power consumed in a multiplier is seen to reduce anywhere from 54.75 to 81.73 when transpose FIR filters are synthesized as opposed to synthesizing direct form FIR filters with no unfolding. 1. INTRODUCTION Folding [1], [2], [3], [4], [5] or time multiplexing is a technique for efficient resource sharing for area constrained behavioral synthesis from a data flow graph (DFG). The throughput requirement in folded architectures is met by pipelining the hardware functional units to a relevant number of levels. In this way folded ....

G. Goossens, J. Rabaey, J. Vandwalle, and H. De Man, "An efficient Microcode Compiler for Application Specific DSP Processors," IEEE Transactions on Computer-Aided Design of Integrated Circuits, vol. 9, pp. 925--937, September 1990.


Memory Data Organization for Improved Cache Performance in .. - Panda, Dutt, Nicolau (1997)   (18 citations)

....to the embedded processor environment, and which have not been addressed by traditional compilers (or have been addressed only partially), largely due to restrictions on compilation times permitted. Generation of efficient code for embedded processors has been the subject of recent investigation [Goosens et al. 1990; Paulin et al. 1995; Araujo et al. 1995]. Optimization techniques that improve the performance of application programs by exploiting the irregular architectures of some embedded DSP processors and other application specific processors have been reported [Liao et al. 1995; Sudarsanam and Malik ....

.... Paulin et al. 1995; Araujo et al. 1995]. Optimization techniques that improve the performance of application programs by exploiting the irregular architectures of some embedded DSP processors and other application specific processors have been reported [Liao et al. 1995; Sudarsanam and Malik 1995; Goosens et al. 1990; Liem et al. 1994]. Research efforts have also focussed on retargetable code generation, with an attempt to generate code from the same behavioral specification, into different target embedded processors, using a suitable processor model [Lanneer et al. 1995; Schenk 1995]. An important ....

Goosens, G., Rabaey, J., Vandewalle, J., and Man, H. D. 1990. An efficient microcode compiler for application specific dsp processors. IEEE Transactions on CAD/ICAS 9, 9 (Sept.), 925--937.


Optimization of Memory Organization and Hierarchy.. - Nachtergaele.. (1995)   (4 citations)

.... outcome of the background memory estimation or memory mapping tools (e.g. for memory allocation, memory assignment, address generation). Note that this high level memory management stage is fully complementary to the traditional high level synthesis step known as register allocation assignment [24, 13, 2, 11, 1, 22] which deals with individual storage places for scalars in registers or register files, after scheduling. Manual transforming the specification during the early system level to explore the cost measures for several alternatives is tedious and error prone. To remove this design time bottle neck, in ....

G. Goossens, J. Rabaey, J. Vandewalle, and H. De Man. An efficient microcode compiler for application-specific dsp processors. IEEE Trans. on Comp.-Aided Design, 9:925--937, Sep. 1990.


Recent Developments in High-Level Synthesis - Lin (1997)   (15 citations)

.... to decrease the height of a long expression chain, and exposes the potential parallelism within a complicated data flow graph [31, 67]. Pipelining is another frequently applied transformation in HLS [71]. Other commonly used transformations include loop folding [23, 88], software pipelining [26, 77], and retiming [76]. Hardware specific transformations at the logic, RT and system levels can be applied to the intermediate representation. In general, these are local transformations that use properties of hardware at different design levels to optimize the intermediate representation. For ....

G. Goossens, J. Rabaey, J. Vandewalle, and H. De Man, "An Efficient Microcode Compiler for Application Specific DSP Processors," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 9, No. 9, pp. 925-937, September 1990.


A Genetic Approach to the Overlapped Scheduling of Iterative.. - Erwin Bonsma (1997)

.... if all operations belonging to the same iteration have to finish before the operations of the next iteration can start; if operations belonging to different iterations can execute simultaneously, the schedule is called overlapped [3]. Overlapped scheduling is sometimes also called loop folding [4, 5]. An overlapped schedule can be characterized by two entities: the iteration period T 0 , the time between the invocation of the same operation in subsequent iterations, and the latency, the time that passes between the consumption of the first input and the production of the last output in the ....

G. Goossens, J. Rabaey, J. Vandewalle, and H. De Man, "An efficient microcode compiler for application specific DSP processors, " IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 9, no. 9, pp. 925--937, September 1990.


ILP Based Cost-Optimal DSP Synthesis with Module Selection.. - Ito, Lucke, Parhi (1999)   (3 citations)

....[7], [10], [11], [19], [39]. These schedulers schedule a single iteration of the DFG but allow subsequent iterations to overlap the first. This is sometimes referred to as loop unrolling or functional pipelining [7], [39] and has also been used in design of high performance compilers [37], [16]. An overlapping schedule automatically supports retiming and functional pipelining. The minimum possible iteration period for an overlapping scheduler is limited by the longest execution time of a single node or the iteration bound, whichever is largest. Moreover, even when it is possible to ....

G. Goossens, J. Rabaey, J. Vandewalle, and H. J. De Man, "An efficient microcode compiler for application specific DSP processors," IEEE Trans. Computer-Aided Design, vol. CAD-9, no. 6, pp. 925--937, June 1990.


Overlapped Scheduling of Fine-Grain Iterative Data-Flow Graphs .. - Erwin Bonsma

.... all operations belonging to the same iteration have to finish before the operations of the next iteration can start; if operations belonging to different iterations can execute simultaneously, the schedule is called overlapped [Par91]. Overlapped scheduling is sometimes also called loop folding [Goo90, Lee94]. An overlapped schedule is characterized by two entities: the iteration period T 0 , the time between the invocation of the same operation in subsequent iterations, and the latency, the time that passes between the consumption of the first input and the production of the last output in the same ....

G. Goossens, J. Rabaey, J. Vandewalle, and H. De Man. An Efficient Microcode Compiler for Application Specific DSP Processors. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 9(9):925--937, September 1990.


Optimal Code Placement of Embedded Software for Instruction.. - Hiroyuki Tomiyama (1996)   (5 citations)

....Reducing the instruction count to be executed is one of very effective approaches to make the high performance compatible with the low cost and the low power. Recently, a lot of code generation techniques for embedded processors have been proposed to minimize the number of executed instructions [4, 7]. Most of them do not require extra hardware cost, and some techniques target implementation of low power systems [11]. In this paper, we focus on another approach for performance improvement, reduction of cache misses. Let's consider an ideal processor whose CPI is 1 and the cache miss penalty is ....

G. Goossens, J. Rabaey, J. Vandewalle, and H. De Man. "An Efficient Microcode Compiler for Application Specific DSP Processors". IEEE Trans. CAD/ICAS, 9(9):925--937, September 1990.


Co-Synthesis of Instruction Sets and Microarchitectures - Huang (1994)   (3 citations)

....and allocation at the cost of higher complexity [61]. These two systems do not work with applications which exhibit loop carried dependencies. PLS and CATHEDRAL II are capable of working on applications with LCDs by means of an iterative folding algorithm [46] and loop folding algorithm [27], respectively. Both systems adopt an iterative approach to find the MAL. This MAL then serves as the upper bound of the performance of the synthesized designs. The major application domain of the above synthesis systems is digital signal processors (DSP). The System Architect's Workbench (SAW) ....

Gert Goossens, Jan Rabaey, Joos Vandewalle and Hugo De Man, "An Efficient Microcode Compiler for Application Specific DSP Processors," IEEE Trans. on Computer-Aided Design, Vol. 9, No. 9, September 163


Dataflow-driven Memory Allocation for Multi-dimensional Signal.. - Balasa, al. (1994)   (10 citations)  Self-citation (De Man)

....should include a decision on: the number and type of memory units, the signal to memory binding for M-D signals, and the detailed internal organization of the memory units. Note that this is fully complementary to the traditional high level synthesis step known as register allocation assignment [26, 13, 10, 16, 24] which deals with individual storage locations for scalars. A major drawback of such an approach is that the loop structure has to be destroyed by unrolling. This is unfeasible in video and image processing applications, and the like. Little work has been performed in the HLMM domain within a ....

....far are substantiated in Section 6, followed by conclusions and our future directions of research in Section 7. 2 Background memory allocation and M-D signal assignment To our knowledge, almost all techniques tackling the storage allocation problem employ a scheduling driven scalar oriented view [26, 13, 2, 10, 1, 22, 24] where the control steps of production and consumption are assumed to be known for each individual scalar signal. This strategy is mainly due to the fact that applications targeted in conventional high-level synthesis contain a relatively small number of signals (at most of the order of 10^3 of ....

G.Goossens, J.Rabaey, J.Vandewalle, H.De Man, "An efficient microcode compiler for application-specific DSP processors", IEEE Trans. on Comp.-aided Design, Vol.9, No.9, pp.925-937, Sep. 1990.


Integration of Medium-Throughput Signal Processing .. - Goossens.. (1994)   (6 citations)  Self-citation (Goossens)

....reduced below a user specified bound [39]. Tron can be used in conjunction with existing schedulers. The operations in the DSFG are clustered based on a graph metric which models the register cost. Next each cluster is scheduled internally with an existing scheduler (e.g. the Smart list scheduler [40][41]). Finally, the scheduled clusters are replaced by macronodes, and the resulting description is scheduled again at the macronode level. This approach reduces the operation mobility as a result of the clustering step, which leads to well balanced schedules. Scheduling the offset filter with ....

G. Goossens et al., "An efficient microcode compiler for application-specific DSP-processors," IEEE Trans. on CAD/ICAS , vol. CAD-9, no. 9, Sept. 1990, pp. 925--937.


Background Memory Area Estimation for Multi-dimensional.. - Balasa, Catthoor, De Man (1995)   (7 citations)  Self-citation (De Man)

....high level storage organization for the multi dimensional signals in our CATHEDRAL script [36]. Note that this high level memory management stage is fully complementary to the traditional high level synthesis step known as register allocation assignment [31, 19, 15, 1, 29] which deals with individual storage places for scalars, after scheduling. Part of this effort is also needed in the CATHEDRAL context, but this decision on scalar memory management is postponed to our low level data path mapping stage [15]. Three main partly conflicting objectives can be ....

....known as register allocation assignment [31, 19, 15, 1, 29] which deals with individual storage places for scalars, after scheduling. Part of this effort is also needed in the CATHEDRAL context, but this decision on scalar memory management is postponed to our low level data path mapping stage [15]. Three main partly conflicting objectives can be identified during high level memory management, solvable within different steps: 1) optimizing the memory access by allocating a number of memory units of specific types and by distributing and organizing the signals efficiently over a set of ....

[Article contains additional citation context not shown here]

G. Goossens, J. Rabaey, J. Vandewalle, H. De Man, "An efficient microcode compiler for application-specific DSP processors," IEEE Trans. on Comp.-Aided Design, vol. 9, pp. 925-937, Sep. 1990.


Array Placement for Storage Size Reduction in Embedded.. - De Greef, Catthoor, De .. (1997)   (4 citations)  Self-citation (De Man)

....parameter to fix in each of the address equations, namely the base address or offset of the array in memory. However, especially for the dynamic approaches, making a good choice for these offsets is not trivial. In the scalar context, (binary) ILP formulations [18, 19], (iterative) line packing [20, 21], graph coloring [22] or clique partitioning [23] techniques have provided satisfactory results for register(-file) allocation and signal to register assignment. Unfortunately, these techniques are not feasible when the number of scalars becomes too large, which is the case in data dominated ....

G. Goossens, J. Rabaey, J. Vandewalle, and H. De Man. An efficient microcode compiler for application-specific dsp processors. IEEE Trans. on Comp.-aided Design, 9(9):925--937, Sep. 1990.


System-Level Memory Management for Weakly Parallel Image.. - Danckaert, al. (1996)   (1 citation)  Self-citation (De Man)

....detailed mapping of scalars and delayed signals to foreground storage locations and the detailed organization of the bus and multiplexer network between the processor data paths should still be decided. This is a complex enough problem as such, but feasible even for large realistic applications [13, 25]. It should be emphasized too that decisions made at the SLMM level do translate into constraints on the M D signal access, which directly influence the search space of the subsequent parallelization and processor mapping tasks. This is for instance true due to the restrictions on loop ordering ....

G.Goossens, J.Rabaey, J.Vandewalle, H.De Man, "An efficient microcode compiler for application-specific DSP processors", IEEE Trans. on Comp.-aided Design, Vol.9, No.9, pp.925-937, Sep. 1990.


Approaches to Low-Power Implementations of DSP Systems - Parhi (2001)

No context found.

G. Goossens, J. Rabaey, J. Vandwalle, and H. De Man, "An efficient microcode compiler for application specific DSP processors," IEEE Trans. Computer-Aided Design , vol. 9, pp. 925--937, Sept. 1990.


TLS: A Tabu Search Based Scheduling Algorithm for.. - Ahmad, Dhodhi, Ali

No context found.

Goossens, G., Rabaey, J., Vandewalle, J. and Man, H. D. (1990) An efficient microcode compiler for application specific DSP processors. IEEE Trans. Computer-Aided Design Integrated Circuits Syst., 9, 925--937.

CiteSeer.IST - Copyright Penn State and NEC