P. Dewilde, E. Deprettere, R. Nouta, "Parallel and Pipelined VLSI Implementations of Signal Processing Algorithms" in S. Y. Kung, H. J. Whitehouse, T. Kailath, "VLSI and Modern Signal Processing", Prentice Hall, 1985
....(7) C3 c(r) C m mA z number of evaluation calls in total (PROC 1) number of available component types number of selected component types number of overall iterations (PROC 1) 5 Examples and results From the literature two filter examples were selected. The 5th order filter [DeDN85] contains 26 additions and 8 multiplications. The control flow consists of one single loop without mutual exclusive operations. The AR filter [JaPP88] contains 12 additions and 16 multiplications with also a single loop as control flow. Table 1 contains the available component types for the ....
P. Dewilde, E. Deprettere, R. Nouta, "Parallel and Pipelined VLSI Implementations of Signal Processing Algorithms" in S. Y. Kung, H. J. Whitehouse, T. Kailath, "VLSI and Modern Signal Processing", Prentice Hall, 1985
....can be estimated as a component load larger than one (5). The result is an estimated delay of 0.33 clock cycles. V_{k,t} = max(a_{k,t} - 1, 0) (5), where a_{k,t} is the load of type k in clock cycle t and V_{k,t} is the delay of type k in clock cycle t. Table 3 Component distribution 6. Examples and results The filter example [DeDN85] is a benchmark for the high-level synthesis workshop series. It contains 26 additions and 8 multiplications. The control flow consists of one single loop without mutual exclusive operations. Table 4 Maximum between branches T Add ALU Add ALU 2 0.67 0.67 0.75 0.75 3 0.92 0.92 1 1 4 0 0.5 ....
P. Dewilde, E. Deprettere, R. Nouta, "Parallel and Pipelined VLSI Implementations of Signal Processing Algorithms" in S. Y. Kung, H. J. Whitehouse, T. Kailath, "VLSI and Modern Signal Processing", Prentice Hall, 1985
....the algorithm terminates normally all constraints are met, the operations have status done asap, and tstart i is the result of the ASAP scheduling. The ALAP scheduling is the dual problem and is solved with the corresponding algorithm. 5. Results As example the 5th order elliptical wave filter [DeDN85] is used. It contains 26 additions and 8 multiplications in one single loop. Tab. 2 contains the selected component types for the additions and the multiplications. To allow a design space exploration component types with different speed are available. The properties were estimated on the base of ....
P. Dewilde, E. Deprettere, R. Nouta, "Parallel and Pipelined VLSI Implementations of Signal Processing Algorithms" in S. Y. Kung, H. J. Whitehouse, T. Kailath, "VLSI and Modern Signal Processing", Prentice Hall, 1985
....3 describes how the compatibility graph is built and edge weights calculated. Section 4 reviews the weight directed clique partitioning algorithm. Section 5 shows some results obtained with the algorithm. Conclusions and future work are presented in section 6. Results for the Elliptic Filter [19] #mult #add #registers #mux 2:1 #bus #bus inputs Solution 1 1 2 11 20 7 28 Solution 2 1 3 12 9 4 17 2. PREVIOUS WORK The existing algorithms for unit binding can be classified according to three main tendencies: control step based algorithms: All binding tasks are executed for each control ....
....benchmarks. Results obtained with simple examples like the Differential equation [18] or the FACET example [3] are comparable or better than the obtained by other systems. These results are not reported due to the small size of the problems. Results obtained for the Fifth-order Wave Elliptic Filter [19] are compared with results obtained from HAL [18] SPLICER [20] MABAL [7] ARYL LYRA [12] SALSA [13] ELF [14] SPAID [21] and SCHALLOC [22] Table 2 reports the results obtained using 1 multiplier (two cycles of latency) and 2 adders in a 21 cycle schedule. GLASS improves the results obtained ....
P. Dewilde and E. Deprettere and R. Nouta. Parallel and Pipelined VLSI Implementation of Signal Processing Algorithms. In VLSI and Modern Signal Processing, pages 258-264, ed. T. Kailath, 1985.
....has received the name ListRange, was tested with several benchmark graphs and compared to other schedulers. Here, only the most interesting benchmark results will be presented (the interested reader is referred to [13] for many more results) the fifth order wave digital filter due to Nouta [3]. The IDFG is given in Figure 6 in order to give an impression of its complexity. Note that the multiplication constants that provide the second input of each multiplier have not been drawn. Following the specifications of the benchmark, the duration of an addition is 1 TU and of a Theda. Fold ....
P. Dewilde, E. Deprettere, and R. Nouta. Parallel and pipelined VLSI implementations of signal processing algorithms. In S.Y. Kung, H.J. Whitehouse, and T. Kailath, editors, VLSI and Modern Signal Processing, pages 257-276. Prentice Hall, Englewood Cliffs, New Jersey, 1985.
....[22] Most of these examples have been rewritten in HardwareC and synthesized to logic level implementations. Three widely used and compared examples are the example used in Facet (Tseng) [23], the differential equation solver (Diffeq) [24] and the 5th order elliptic waveform filter (Elliptic) [25]. Although these examples do not contain detailed synchronization and timing constraints, they serve to demonstrate the use of our approach on general synchronous designs. The statistics on the SIF models of these three benchmark designs are given in Table 2. The size of the multiplication is ....
P. Dewilde, E. Deprettere, and R. Nouta, "Parallel and pipelined VLSI implementation of signal processing algorithms," in VLSI and Modern Signal Processing (Kung and Whitehouse, eds.), pp. 258-264, Prentice Hall, 1985.
....for producing area and power competitive designs. We will use a 5th order wave digital elliptical filter as an illustrative example to introduce both the problem formulation and the algorithm for solving this new optimization task. Figure 11 shows the computational graph of this popular benchmark [Dew85]. There are 8 inputs for one iteration of the filter (7 delays and one primary input i 1 ) and 8 outputs (seven delays and one primary output o 9 ) The entries in Table 1 show the functional dependences between outputs and inputs (assuming that constant weighting factors for multiplications 5, ....
P. Dewilde, E. Deprettere, R. Nouta: "Parallel and Pipelined VLSI Implementation of Signal Processing Algorithms", in "VLSI and Modern Signal Processing", ed. by S.Y. Kung, H.J. Whitehouse, T. Kailath, pp. 257-264, Prentice Hall, Englewood Cliffs, N.J., 1985
....used here to demonstrate that the minimisation technique produces (very) good results also for this type of functions. Example wdf3 i is a 3rd order wave digital filter, with the constants taken as inputs. It is constructed from three stages of the well known 5th order wave digital filter [20]. The final SFG of wdf3 i is shown in figure 6. The minimisation technique has managed to arrive at a slightly better result than the initial SFG by first normalising the initial 4 multiplications and 13 additions into 52 multiplications and in39 i2 i1 in2 i1 C9 i2 in18 i2 i2 C40 i2 in1 ....
P. Dewilde, E. Deprettere, R. Nouta, "Parallel and Pipelined VLSI Implementation of Signal Processing Algorithms," in VLSI and Modern Signal Processing, edited by: S. Y. Kung, H. J. Whitehouse, T. Kailath, Prentice-Hall Information and System Sciences Series, 1985, pp. 258--264.
....to run time efficiency improvements. A public domain solver [Berk94] has been used to solve the IP problems. The solver uses the simplex algorithm with sparse matrix methods together with branch and bound techniques for the integer part of the problem. In table 1 and 2, results for WDELF from [DeWi85] and FDCT are given. The tables show all the area versus time points generated by the approach described in [Timm93c] so not only the optimal points but also the points for lower bound estimates which were not exact and for which no schedule exists. The last column of the tables shows two ....
....NEAT scheduler is very small, making it a very run time efficient exact scheduler. The only exception is the infeasible module set for FDCT with 10 cycles: the search space for that example is large for any kind of scheduler and can not be traversed efficiently. Table 1: Scheduling results WDELF [DeWi85]. C multipliers (d=dii=2) adders (d = 1) CPU times IP scheduler (sec) CPU times NEAT (sec) # infeasible branches NEAT 17 3 3 1.6 0.5 0 0 18 2 2 1.8 0.4 0 0 19 2 2 2.2 0.5 0 0 20 2 2 2.9 0.8 0 0 21 1 2 1.8 0.9 0 0 22 1 2 3.8 1.0 0 0 23 1 2 5.3 1.1 0 0 ....
P. DeWilde, E. Deprettere and R. Nouta, "Parallel and Pipelined VLSI Implementations of Signal Processing Algorithms", in S.Y. Kung, H.J. Whitehouse and T. Kailath, VLSI and Modern Signal Processing, Prentice Hall, pp. 258--264, 1985.
.... The RTL descriptions are obtained using our high level synthesis tool [7] All the designs have a constant bitwidth (equal to 8 in the presented experiments) EX1 to EX5 are different versions of the HLS benchmark example borrowed from [8] EX6, EX7 and EX8 are elliptical filters borrowed from [9]. The RTL designs have been extended to gate level using Synopsys [10] In examples EX1 to EX5, the only testability bottlenecks are not due to loops or reconvergence paths thus, the Algorithm 1 for scan pre selection do not apply. Designs EX6 to EX8 present all kinds of testability problems. ....
P. Dewilde, E. Deprettere, R. Nouta, "Parallel and pipelined VLSI implementation of signal processing algorithms", In VLSI and Modern Signal Processing. S.Y.Kung et al. Editors. Prentice Hall. pp: 257-260.
....test synthesis method has been applied to four behavioral synthesis benchmarks. Their characteristics before register allocation are given in Table 4. Tseng s example is borrowed from [26] the differential equation example from [23] the AR filter from [27] and the elliptical filter (EW) from [28]. Columns 2 to 7 show respectively the number of control steps, f.u.s, variables, constants and primary inputs outputs. Columns 8 and 9 give the number of controllable and observable variables. Table 4: Benchmarks characteristics Example # steps # f.u. # var # constants # Primary Inputs # ....
P. Dewilde, E. Deprettere, R. Nouta, "Parallel and pipelined VLSI implementation of signal processing algorithms", In VLSI and Modern Signal Processing. S.Y.Kung, H.J.Whitehouse, T.Kailath Editors. Prentice Hall. pp: 257-260.
....We have applied our test synthesis method to four behavioral synthesis benchmarks. Their characteristics before register allocation are given in table 4. Tseng s example is borrowed from [11] the differential equation example from [12] the AR filter from [13] and the elliptical filter (EW) from [14]. TABLE 4 steps f.u. var c. var. o. var. roms PI PO tseng 13 2 ,2 11 5 3 0 2 1 differential 9 1 ,1 ,2 12 10 6 2 3 1 ar filter 21 1 ,2 20 4 6 2 4 2 ew filter 18 3 ,3 39 2 1 3 1 1 For illustration purpose, the substractors are supposed to be neither C transparent nor O transparent The ....
P. Dewilde, E. Deprettere, R. Nouta, "Parallel and pipelined VLSI implementation of signal processing algorithms", In VLSI and Modern Signal Processing. S.Y.Kung, H.J.Whitehouse, T.Kailath Editors. Prentice Hall. pp: 257--260.
....HLS for Testability methodology. Finally, we compare the proposed strategy with a gate level DFT technique. 4.1 Design examples We first applied the presented algorithm on classical design examples: the differential equation example (difeq) from [14] the elliptical wave filter (ewfil) from [15] and the Tseng s example (tseng) from [16] The only testability bottlenecks encountered in these High Level Synthesis benchmarks are loops which are well known to have an impact on ATPG process complexity. It is difficult to compare our results with those presented in the literature [9] [10] [6] ....
P.Dewilde, E.Deprettere, R.Nouta "Parallel and Pipelined VLSI Implementation of Signal Processing Algorithms", in VLSI and Modern Signal Processing, S.Y.Kung, H.J.Whitehouse - T.Kailath Editors. Prentice Hall. pp.257-260, 1985.
....behavioral synthesis benchmarks. The RTL descriptions are obtained using our high level synthesis tool [12] All the designs have a constant ( 8) bitwidth. EX1 to EX5 are different versions of the HLS benchmark example borrowed from [9] EX6, EX7 and EX8 are elliptical filters borrowed from [10]. The RTL designs have been extended to gate level using Synopsys . In examples EX1 to EX5, the only testability bottlenecks are due to transparency problems thus, the exhaustive search sections (phase 2 and 3) in the algorithm for scan pre selection do not apply. Designs EX6 to EX8 present all ....
P. Dewilde, E. Deprettere, R. Nouta, "Parallel and pipelined VLSI implementation of signal processing algorithms", In VLSI and Modern Signal Processing. S.Y.Kung et al. Editors. Prentice Hall. pp: 257-260.
....an SATPG. Fig.8 High level synthesis for easy testability VI Results We present here experiments conducted with four behavioral synthesis benchmarks. Tseng s example is borrowed from [9] the differential equation example from [10] the AR filter from [11] and the elliptical filter (EW) from [12]. Unfortunately, in these HLS examples, the operations are only additions, subtractions and multiplications which are not demonstrative for our purpose (all transparency coefficients are equal to 1) For illustration purpose, we replaced some of the operations by shift operation implemented by ....
P. Dewilde, E. Deprettere, R. Nouta, "Parallel and pipelined VLSI implementation of signal processing algorithms", In VLSI and Modern Signal Processing. S.Y.Kung, H.J.Whitehouse, T.Kailath Editors. Prentice Hall. pp: 257--260.
....on the results of heuristic schedulers and on the run time efficiency of schedulers is shown. 2. Execution interval analysis 2.1. Bipartite graph matching formulation The EI analysis is based on a bipartite graph matching formulation. In figure 1a, the fifth order wave digital filter from [DeWi85] is presented with a time constraint of 21 cycles and a set of resources consisting of 1 multiplier and 2 adders. First the usual operation execution intervals (OEIs) based on the ASAP and ALAP values under the assumption of unlimited resources are determined. These intervals are called the initial ....
....are 8 (non overlapping) MEIs of 2 cycles to perform the multiplications. Figure 1c shows these MEIs, which start every 2 cycles from cycle step 4 onward. Digest of technical papers of the ICCAD, pp. 454-459, Santa Clara, CA, November 7-11, 1993. Fig 1: Fifth order wave digital filter from [DeWi85] with bipartite matching. Time constraint: 21 cycles. Resources: 1 multiplier (d=dii=2), 1 slow adder (d=dii=2), 1 fast adder (d=1). Initial bipartite graph matching formulation for the multiplications: [4,5] [6,7] [8,9] [10,11] [12,13] [14,15] [16,17] [18,19] 15 [4,9] 25 [4,9] 18 [8,13] 28 [8,13] ....
[Article contains additional citation context not shown here]
P. DeWilde, E. Deprettere and R. Nouta, "Parallel and Pipelined VLSI Implementations of Signal Processing Algorithms", in S.Y. Kung, H.J. Whitehouse and T. Kailath, VLSI and Modern Signal Processing, Prentice Hall, pp. 258--264, 1985.
.... DEFINITION 2 [CAPACITY OF A MODULE TYPE] Let l Q L and e Q [T, T] cap: L x [T, T] # is the function which describes the number of operations a module of type l can execute in an interval e: cap(l, e) e # d(l) # dii(l) dii(l) Figure 3a shows the fifth order wave digital filter from [DeWi85]. During the execution of the fifth addition (operation 14) no other operation can be executing. Such operations can be identified by the fact that their initial OEIs do not overlap with the initial OEI of any other operation in case the time constraint is as tight as possible. As soon as the ....
....bound turns out to be incorrect, then the bound has to be incremented until it can possibly be correct. 4.2. Bipartite graph matching formulation The remaining lower cycle bound estimation is based on a bipartite graph matching formulation. In figure 3a, the fifth order wave digital filter from [DeWi85] is presented with a lower cycle bound of 21 cycles and a set of resources consisting of 1 multiplier and 2 adders. For the multiplications the initial OEIs under the assumption of unlimited resources are given in figure 3c. There are 8 multiplications, so there are exactly 8 MEIs in which an ....
[Article contains additional citation context not shown here]
P. DeWilde, E. Deprettere and R. Nouta, "Parallel and Pipelined VLSI Implementations of Signal Processing Algorithms", in S.Y. Kung, H.J. Whitehouse and T. Kailath, VLSI and Modern Signal Processing, Prentice Hall, pp. 258--264, 1985.
....divisions. However, they do occur for some designs, and some process divisions, as demonstrated by the examples in the next section. For further details, see [16] V. AN EXAMPLE OF PROCESS CREATION A 5TH ORDER ELLIPTIC WAVE FILTER This example, a fifth order digital elliptic wave filter from [4], was one of the examples from the ACM IEEE 1988 Workshop on High Level Synthesis [2] The VT for the filter is shown in Fig. 8. WALKER, et al.: BEHAVIORAL TRANSFORMATION FOR ALGORITHMIC LEVEL IC DESIGN 7 Section A describes the transformations required to split the VT horizontally, ....
P. Dewilde, E. Deprettere, and R. Nouta. Parallel and Pipelined VLSI Implementation of Signal Processing Algorithms. In S.Y. Kung, H.J. Whitehouse, and T. Kailath (editors), VLSI and Modern Signal Processing, chapter 15, pages 257--264. Prentice-Hall, 1985.
....method, we compared the synthesis and test generation results of using partitioning scheme with those using non scan, partial scan and full scan schemes. Designs used for experiments are four high level synthesis benchmarks and are specified in behavior VHDL. They are EllipticFilter benchmark [3], the Tseng benchmark [14] the Square Root benchmark [13] and the Diffeq benchmark [10] From the experiments, the fault coverage after partitioning is improved for these benchmarks compared with non scan and partial scan schemes. ATPG time and test application time get different improvement with ....
P. Dewilde, E. Deprettere, and R. Nouta. Parallel and pipelined VLSI implementation of signal processing algorithms. In S. Y. Kung et al., editor, VLSI and Modern Signal Processing, pages 257--264. Prentice-Hall, 1985.
....value lifetimes are taken from the specific scheduler used and not seen as a separate benchmark. We have applied the minimal grouping algorithm to the set of storage values generated by an in house scheduling and assignment tool [17] for the well known fifth order wave digital filter benchmark [5], a benchmark not considered by Aloqeely and Chen due to its irregularities . The specific instance of the problem had 20 storage values for which it can be shown that an implementation with common RAMs would require at least 4 memories. 100 different runs of the program (to be less dependent on ....
P. Dewilde, E. Deprettere, and R. Nouta. Parallel and pipelined VLSI implementations of signal processing algorithms. In S.Y. Kung, H.J. Whitehouse, and T. Kailath, editors, VLSI and Modern Signal Processing, pages 257--276. Prentice Hall, Englewood Cliffs, New Jersey, 1985.
....[5] 8] They are true cascaded orthogonal filters. These filters realize the transfer function as a cascade inter connection of 4 terminal orthogonal sections, where each section consists of only Givens rotations and delay elements which can be mapped onto CORDIC arithmetic based processors [9] [19] Since each orthogonal section implements a real or a pair of complex conjugate transfer zeroes, these zeroes can be tuned to the desired location to achieve low sensitivity in filter stop band [6] 7] The ODR lattice filters are not true cascaded orthogonal filters by this criteria, since ....
P. Dewilde, E. Deprettere, and R. Nouta, "Parallel and pipelined VLSI implementation of signal processing algorithms", in VLSI and Modern Signal Processing, (S. Kung, H. Whitehouse, and T. Kailath, eds.), Englewood Cliffs, NJ: Prentice Hall, Inc., pp. 257--276, 1984.
....and ODR lattice filters have recently been pipelined in [2] The cordic based IIR digital filters [4] discussed here are true cascaded orthogonal filters. These filters consist of a cascade of 4 terminal orthogonal sections, with each section implemented by an efficient CORDIC algorithm [5]. The ODR lattice filters are not true cascaded orthogonal filters by this criteria, since they have 6 terminals for each cascaded section. Because of this, it is expected that the ODR lattice filters have much higher sensitivity in filter stop band compared to the cordic IIR filters [4] Similar ....
....go with the first row of the matrix. Here, fOE i g are filter parameters, and can be determined through the synthesis routine. Notice that the whole filter consists of only the planar rotation blocks and storage elements. These rotation blocks can be implemented using an efficient CORDIC processor [5]. A degree 1 section implements one zero of the transfer function, and a degree 2 section implements a pair of complex conjugate zeros, since for real coefficient transfer functions, if a zero is complex, its complex conjugate must also be a zero. The filter synthesis algorithm is based on a ....
P. Dewilde, E. Deprettere, and R. Nouta, "Parallel and pipelined VLSI implementation of signal processing algorithms", VLSI and Modern Signal Processing,(S. Kung, H. Whitehouse, and T. Kailath, eds.), Englewood Cliffs, NJ: Prentice Hall, Inc., 1984.
....Minneapolis, MN 55455, U.S.A. E mail: fjunma,parhig ee.umn.edu . E. F. Deprettere is with the Department of Electrical Engineering, Delft University of Technology, 2628 CD Delft, The Netherlands. E mail: ed cas.et.tudelft.nl . tions, with each section implemented by an efficient CORDIC algorithm [5]. The ODR lattice filters are not true cascaded orthogonal filters by this criteria, since they have 6 terminals for each cascaded section. Because of this, it is expected that the ODR lattice filters have much higher sensitivity in filter stop band compared to the cordic IIR filters [4] Similar ....
....the first row of the matrix. Here, fOE i g are filter parameters, and can be determined through the synthesis routine. Notice that the whole filter consists of only the planar rotation blocks and storage elements. These rotation 2 blocks can be implemented using an efficient CORDIC processor [5]. A degree 1 section implements one zero of the transfer function, and a degree 2 section implements a pair of complex conjugate zeros, since for real coefficient transfer functions, if a zero is complex, its complex conjugate must also be a zero. Z 1 Z 1 z ( 1 U z ( 1 U z ( U 2 z ( 1 ....
P. Dewilde, E. Deprettere, and R. Nouta, "Parallel and pipelined VLSI implementation of signal processing algorithms", VLSI and Modern Signal Processing, (S. Kung, H. Whitehouse, and T. Kailath, eds.), Englewood Cliffs, NJ: Prentice Hall, Inc., 1984.
No context found.
P. Dewilde, E. Deprettere, R. Nouta, "Parallel and pipelined VLSI implementation of signal processing algorithms", in : S.Y. Kung, H.J. Whitehouse, T. Kailath (ed.), "VLSI and modern signal processing", Prentice-Hall, 1985.
No context found.
P. DeWilde et.al., "Parallel and pipelined VLSI implementation of signal processing algorithms," VLSI and Modern Signal Processing, Kung, Whitehouse, and Kailath, Eds., Englewood Cliffs, NJ: PrenticeHall, 1985.