K. K. Parhi, "Algorithm transformation techniques for concurrent processors," Proc. IEEE, vol. 77, pp. 1879--1895, Dec. 1989.
....in these handsets. There have been many digital hardware design strategies proposed for power reduction including: reduction of supply voltage, reduction of clock speed and data rate, parallelization and pipelining of operations, using sign magnitude arithmetic, and differential encoding of data [2], [3]. Another technique, which is the springboard for this paper, is the reduction of the number of bits used to represent the data and control variables in the digital circuit [4]. The bit width reduction strategy is very highly leveraged since it reduces the power dissipation everywhere in the ....
K. K. Parhi, "Algorithm transformation techniques for concurrent processors," IEEE Proceedings, vol. 77, pp. 1879--1895, Dec. 1989.
....during an iteration: it consumes a single data sample from each of its incoming data edges and produces a single sample on each of its output edges. However, single rate CDFGs are not sufficient to represent the complexity of many present day DSP designs, which often require multirate CDFGs [28] [35]. In multirate systems, the computation loop may require different nodes to be executed a different number of times in a single computation iteration. While we do not develop the theoretical framework underlying rephasing of multirate CDFGs, in this section, we show that rephasing is indeed ....
....not develop the theoretical framework underlying rephasing of multirate CDFGs, in this section, we show that rephasing is indeed applicable to such CDFGs and carries similar benefits as in rephasing of single rate CDFGs. Fig. 9(a) shows an example of a multirate CDFG. Following the notation of [35], the numbers at the inputs of a node represent the number of samples consumed by it from that input on each invocation of the node. Similarly, the numbers at the outputs of a node represent the number of samples produced by it on that output on each invocation. In this example, node A fires twice ....
[Article contains additional citation context not shown here]
K. K. Parhi, "Algorithm transformation techniques for concurrent processors," Proc. IEEE, vol. 77, pp. 1879--1895, Dec. 1989.
....of the edge is x at the n th iteration, then x at the (n-1) st iteration is in the output side of the arc: $x(n) \xrightarrow{D} x(n-1)$. The tabu set, T, is supposed to have a constant number of elements and it equals the tabu tenure, t. All the closed paths in figure 1 have a simple unit delay, so, as argued in [13], no unfolding process is necessary at this abstraction level to improve the parallel degree of the system. Surveying the data dependences, we can indicate the executive temporal sequence of the algorithm exploiting the precedence graph shown in figure 2. The generation of the CL is the heavier ....
K.K. Parhi, "Algorithm Transformation Techniques for Concurrent Processors", Proc. of IEEE, vol. 77, n.12, Dec. 1989, pp.1879-1895.
....tools based on such a paradigm will be necessary to realize complex VLSI systems for signal processing and communications. The present design trend (see Fig. 1(b)) is to incorporate VLSI issues as constraints into the algorithm design phase. In particular, algorithm transformation techniques [8] were proposed as an intermediary step in the translation to VLSI hardware. These techniques were originally developed for high throughput applications. However, they have found applications in low power design as well [9]. Algorithm transformation techniques modify the algorithm structure ....
....it by removing arcs with non zero delays. Thus, the iteration period of the DFG in Fig. 3 is 30 time units. The critical path of a DFG is a path p such that d(p) = IP. The goal of most algorithm transformation techniques is to reduce the delay of the critical path. The iteration period bound (IPB) [8] for a DFG is defined as follows: $\mathrm{IPB} = \max_{L} \frac{\sum_{v \in L} d(v)}{\sum_{e \in L} w(e)}$ (7) where L is a loop in the DFG, where a loop is defined as a path p whose source and destination nodes are identical. Note that IP can be altered via the application of various algorithm transformation techniques. However, the IP will always ....
[Article contains additional citation context not shown here]
K. K. Parhi, Algorithm transformation techniques for concurrent processors, Proceedings of the IEEE, vol. 77, pp. 1879-1895, Dec. 1989.
....been proposed at all levels of the design hierarchy beginning with algorithms and architectures and ending with circuits and technological innovations. Existing techniques include those at the algorithmic level (such as reduced complexity algorithms [6]), architectural level (such as pipelining [12,25] and parallel processing), logic (logic minimization [31] and precomputation [1]), circuit (reduced voltage swing [21], adiabatic logic [3]) and technological level [8]. It is now well recognized that an astute algorithmic and architectural design can have a large impact on the final power ....
K. K. Parhi, "Algorithm transformation techniques for concurrent processors," Proceedings of the IEEE, vol. 77, pp. 1879-1895, December 1989.
....adaptive equalizer architectures. Traditionally, the focus in algorithm design has been to obtain performance in terms of better signal to noise ratios (SNR) and/or bit error rates (BER). The present trend is to trade off a small amount of performance via algorithm transformation techniques [31] for a much superior VLSI architecture. Algorithm transformation techniques [6,31] such as look ahead [32], relaxed look ahead [37], block processing [33], associativity [36], unfolding [15,34], folding [35], and retiming [21] have all been employed to design high speed algorithms and architectures. ....
....has been to obtain performance in terms of better signal to noise ratios (SNR) and/or bit error rates (BER). The present trend is to trade off a small amount of performance via algorithm transformation techniques [31] for a much superior VLSI architecture. Algorithm transformation techniques [6,31] such as look ahead [32], relaxed look ahead [37], block processing [33], associativity [36], unfolding [15,34], folding [35], and retiming [21] have all been employed to design high speed algorithms and architectures. Low power operation was then achieved by trading off excess speed with power. Of ....
K. K. Parhi, "Algorithm transformation techniques for concurrent processors," Proceedings of the IEEE, vol. 77, pp. 1879-1895, Dec. 1989.
....based on such a paradigm will be necessary to realize complex VLSI systems for signal processing and communications. One way to integrate algorithmic concerns (such as SNR) and implementation issues such as area, power dissipation and throughput is to employ algorithm transformation techniques [27] such as pipelining [25,28,31], parallel processing [28], unfolding [16], folding [29], retiming [22], etc. Employed traditionally for high speed applications, pipelined algorithms have found use in low power applications as well. Furthermore, by combining pipelining with folding, it is possible ....
K. K. Parhi, "Algorithm transformation techniques for concurrent processors," Proceedings of the IEEE, vol. 77, pp. 1879-1895, Dec. 1989.
....switching activity, achievable bounds, CMOS circuits, information theory, busses I. INTRODUCTION Power dissipation has become a critical VLSI design concern in recent years [3] and a substantial amount of research is being conducted at the algorithmic [3], architectural (such as pipelining [13] and parallel processing), logic [9, 18] and circuit [4, 8] levels in order to develop power reduction techniques. Most of these efforts focus upon reducing the on chip dynamic power dissipation of CMOS circuits, which at a node is given by $P_D = T C_L V_{dd}^2 f$ (1.1), where T is the transition ....
K. K. Parhi, "Algorithm transformation techniques for concurrent processors," Proceedings of the IEEE, vol. 77, pp. 1879-1895, December 1989.
....techniques is retiming [16, 17] where delays are redistributed among the edges so that the application's function remains the same while the execution time decreases. Despite its usefulness when applied to HDFGs, the application of retiming to SDFGs was explored only marginally prior to 1994 [11, 18] before being studied by Zivojnovic et al. primarily as a way to minimize the delay count of a SDFG [25,27]. In this section we intend to review the basics of retiming, explore some of the pitfalls which arise when studying retiming of SDFGs, demonstrate the effectiveness of retiming, and propose ....
....we have described here will prove just as valuable despite this logical gap. 5 Examples In this section, we illustrate our methods further by applying them to various SDFGs found in the literature. 5.1 First Example Consider the SDFG in Figure 15(a), a variation on the example from [18] with a BRV of q = 2 1 2 . In our example, nodes A and B take 1 time unit to execute and C takes 2; thus we will attempt to retime it to have an optimal clock period of 2. There are four edges in the SDFG, so the first condition of Theorem 4.2 gives us an initial set of four inequalities: ....
K.K. Parhi. Algorithm transformation techniques for concurrent processors. Proceedings of the IEEE, 77:1879--1895, 1989.
....in these handsets. Many digital hardware design strategies have been proposed for power reduction including: reduction of supply voltage, reduction of clock speed and data rate, parallelization and pipelining of operations, using sign magnitude arithmetic, and differential encoding of data [16, 51]. Another technique is the reduction of the number of bits (wordlength) used to represent the data and control variables in the digital circuit [52]. The wordlength reduction strategy is very highly leveraged since it reduces the power dissipation everywhere in the data and control flow paths. This ....
K. K. Parhi, "Algorithm Transformation Techniques for Concurrent Processors," IEEE Proceedings, vol. 77, pp. 1879-1895, Dec. 1989.
....techniques is retiming [15, 16] where delays are redistributed among the edges so that the application's function remains the same while the execution time decreases. Despite its usefulness when applied to HDFGs, the application of retiming to SDFGs was explored only marginally prior to 1994 [10,17] before being studied by Zivojnovic et al. primarily as a way to minimize the delay count of a SDFG [24,26]. In this section we intend to review the basics of retiming, explore some of the pitfalls which arise when studying retiming of SDFGs, demonstrate the effectiveness of retiming, and propose ....
.... of algorithm; b) Its EHG [Figure 14: a) Figure 11(a) retimed; b) Its EHG] 5 A Simple Example To illustrate our method further, consider the SDFG in Figure 15, a variation on the example from [17] with a BRV of T . In our example, nodes and take time unit to execute and takes . We will attempt to retime it to have a clock period of . Our algorithm requires three passes to complete. At the outset, we compute the longest path lengths and find that T , ....
K.K. Parhi. Algorithm transformation techniques for concurrent processors. Proceedings of the IEEE, 77:1879--1895, 1989.
....analyse the structure of the DAG for scheduling. New approaches exist that take genetic algorithms into account [24, 25]. Apart from the DAG algorithms, algorithms based on the ITG are being implemented. For this graph model unfolding, retiming and software pipelining are popular techniques [26, 27, 11]. Some of these algorithms utilise again DAG scheduling algorithms for partially unfolded ITGs. To benefit from regular structures of graphs, especially from graphs derived from equations, techniques known from the VLSI processor design [20] are employed. These techniques use the regular structure ....
Keshab K. Parhi. Algorithm transformation techniques for concurrent processors. Proceedings of the IEEE, 77(12):1879-1895, December 1989.
....in Section 2 the architecture of the reconfigurable equalizer is presented, while simulation results are shown in Section 3. 1.1 Dynamic algorithm transformations (DAT) Traditionally, signal processing systems have been designed for low power operation by applying certain algorithm transforms [2, 3] in order to optimize the architecture. For example, pipelining [4] may be used to reduce the critical path of a design, thereby allowing the supply voltage to be reduced. Once the algorithm is sufficiently optimized, custom circuits are designed which provide the necessary balance between power ....
K. K. Parhi, "Algorithm transformation techniques for concurrent processors," Proceedings of the IEEE, vol. 77, no. 12, pp. 1879--1895, Dec. 1989.
....) variations from 7dB 10dB. On an average 55 energy savings are achieved. 1. INTRODUCTION Power reduction techniques have been proposed at all levels of VLSI design hierarchy ranging from the circuits to algorithms. Of particular interest in this paper are algorithm transformation techniques [1]. [Figure: transceiver block diagram with a fixed outer transceiver and a reconfigurable inner transceiver; plot of BER against E N (dB)] ....
K. K. Parhi, "Algorithm transformation techniques for concurrent processors," Proceedings of the IEEE, vol. 77, no. 12, pp. 1879--1895, Dec. 1989.
....in [ScBa 86] utilize exhaustive search to generate cyclo static schedules, which may reduce the iteration periods most of the time. To exploit the hidden parallelism available in the DFP, transformation techniques such as unfolding and retiming have been applied to the corresponding DFG [PaKK 89]. The retiming technique minimizes the critical path length of a DFG but does not guarantee a critical path time less than a specified iteration period. In fact, the DFG tasks need to be scheduled optimally to minimize the iteration period, which was not given adequate focus previously. Moreover, ....
K.K.Parhi, "Algorithm transformation techniques for concurrent processors", Proceedings of the IEEE, vol. 77, no. 12, Dec. 1989.
....was introduced as a technique to optimize hardware circuits by redistributing registers without affecting functionality [1]. Retiming is also useful for DSP software design. It changes precedence constraints among instructions or tasks, and can improve single processor [2] and multiprocessor [3,4] schedules. In both cases, hardware and software design, a marked graph can be used as an appropriate model of computation, and retiming is a transformation changing the distribution of tokens on arcs. This paper extends retiming principles to non ordinary marked graphs, characterized by nodes ....
....token conservation theorem is not valid anymore, limiting the applicability of numerous useful results developed for the ordinary case. In the past retiming was treated mostly as ordinary (unitrate) retiming. Only marginal treatment of non ordinary (multirate) retiming can be found (e.g. in [3]). The focus of this paper is on reachability of non ordinary marked graphs. It continues along the work of Teruel et al. [8] and provides new reachability results useful for retiming of multirate DSP algorithms. After the introduction, we revise the background and introduce the notation. In ....
[Article contains additional citation context not shown here]
K. Parhi, "Algorithm transformation techniques for concurrent processors," Proceedings of the IEEE, vol. 77, pp. 1879--1895, Dec. 1989.
....propose a novel algorithm for faster determination of the iteration bound of the MRDFG. 1 Introduction Digital signal processing algorithms are repetitive in nature. These algorithms are described by iterative data flow graphs (DFGs) where nodes represent tasks and edges represent communication [1, 2]. Execution of all nodes of the DFG once completes an iteration. Successive iterations of any node are executed with a time displacement referred 1 This research was supported by the Advanced Research Projects Agency and monitored by Wright Patterson AFB under contract number F33615 93 C 1309. ....
....Moreover, different nodes may be invoked for a different number of times in an iteration. In other words, one node is invoked at a different rate from another node in MRDFGs. The definition of the edge in MRDFGs also differs from that in SRDFGs. An MRDFG can be expanded into the equivalent SRDFG [1]. The equivalence means that the MRDFG and its expanded SRDFG express an identical signal processing algorithm. In this section we describe a method to expand an MRDFG into its equivalent SRDFG which is similar to unfolding an SRDFG [5]. 5.1 The number of invocations of node In SRDFGs, it is ....
K. K. Parhi, "Algorithm Transformation Techniques for Concurrent Processors," Proc. of the IEEE, vol. 77, pp. 1879--1895, Dec. 1989.
No context found.
K. K. Parhi, "Algorithm transformation techniques for concurrent processors," Proc. IEEE, vol. 77, pp. 1879--1895, Dec. 1989.
No context found.
K. K. Parhi, "Algorithm transformation techniques for concurrent processors," Proc. IEEE, vol. 77, pp. 1879--1895, Dec. 1989.
No context found.
K. K. Parhi, "Algorithm transformation techniques for concurrent processors," in Proc. IEEE, vol. 77, pp. 1879-1895, Dec. 1989.
No context found.
K. Parhi, "Algorithm transformation techniques for concurrent processors," Proceedings of the IEEE, vol. 77, pp. 1879-1895, December 1989.
No context found.
K. Parhi, "Algorithm transformation techniques for concurrent processors," Proceedings of the IEEE, vol. 77, pp. 1879-1895, Dec. 1989.
No context found.
K. K. Parhi, "Algorithm transformation techniques for concurrent processors," IEEE Proceedings, Vol. 77, 1989, pp. 1879-1895.