16 citations found.
V. G. Oklobdzija, D. Villeger, and S. S. Liu, "A Method for Speed Optimized Partial Product Reduction and Generation of Fast Parallel Multipliers Using an Algorithmic Approach," IEEE Trans. Computers, vol. 45, no. 3, pp. 294--305, March 1996.

This paper is cited in the following contexts:
Layout-Aware Synthesis of Arithmetic Circuits - Um, Kim (2002)

....circuits, especially multipliers (e.g., [19]), using bit-level FAs as basic implementation components. An extensive survey can be found in [20, 21]. The main issue of the bit-level arithmetic optimization is implementing fast Partial Product Reduction Trees (PPRTs) using FAs. Oklobdzija et al. [11] proposed an optimal algorithm, called three greedy algorithm, for constructing a minimum delay PPRT using FAs under a restricted FA timing model. Further, Stelling et al. [12] proposed an algorithm ....

V. G. Oklobdzija, D.Villeger, and S.S. Liu, "A Method for Speed Optimized Partial Product Reduction and Generation of Fast Parallel Multipliers using an Algorithmic Approach", IEEE Trans. on Computers, Vol. 45, No. 3, 1996.
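The snippet above describes building a minimum-delay PPRT from full adders by greedy, timing-driven choices. Below is a minimal sketch of that idea under stated assumptions. It is not the published TDM or three-greedy algorithm. The delay constants AND_DELAY, FA_SUM_DELAY and FA_CARRY_DELAY are hypothetical illustrative values. Every full-adder input is assumed to have the same delay to each output, which ignores the differing input-to-output delays that the cited work exploits.

In each column, the three earliest-arriving bits are repeatedly fed to one full adder until at most two bits remain. The sum stays in the column and the carry moves to the next column.

```python
import heapq
from collections import defaultdict

# Hypothetical delays in arbitrary units; not taken from the cited papers.
AND_DELAY = 1.0       # partial-product (AND gate) generation
FA_SUM_DELAY = 2.0    # full-adder input -> sum output
FA_CARRY_DELAY = 1.5  # full-adder input -> carry output


def reduce_columns(a: int, b: int, n: int):
    """Reduce the n x n AND-array of a * b to at most two bits per column.

    Each column is a min-heap of (arrival_time, bit). While a column holds
    three or more bits, its three earliest-arriving bits feed one full adder:
    the sum re-enters the same column, the carry goes to the next column.
    Returns (columns, latest_arrival).
    """
    columns = defaultdict(list)
    for i in range(n):
        for j in range(n):
            bit = ((a >> i) & 1) & ((b >> j) & 1)
            columns[i + j].append((AND_DELAY, bit))

    reduced = {}
    col = 0
    while columns and col <= max(columns):
        heap = columns[col]
        heapq.heapify(heap)  # carries from col - 1 were appended unsorted
        while len(heap) >= 3:
            (t0, x), (t1, y), (t2, z) = [heapq.heappop(heap) for _ in range(3)]
            t_in = max(t0, t1, t2)
            s = x ^ y ^ z                    # weight 2**col
            c = (x & y) | (x & z) | (y & z)  # weight 2**(col + 1)
            heapq.heappush(heap, (t_in + FA_SUM_DELAY, s))
            columns[col + 1].append((t_in + FA_CARRY_DELAY, c))
        reduced[col] = heap
        col += 1

    latest = max((t for bits in reduced.values() for t, _ in bits), default=0.0)
    return reduced, latest


def two_rows(reduced):
    """Split the reduced columns into the two operands of the final adder."""
    row0 = row1 = 0
    for col, bits in reduced.items():
        if len(bits) > 0:
            row0 |= bits[0][1] << col
        if len(bits) > 1:
            row1 |= bits[1][1] << col
    return row0, row1


reduced, latest = reduce_columns(13, 11, 4)
r0, r1 = two_rows(reduced)
print(r0 + r1)  # 143
```

Each full adder preserves the weighted sum, because x + y + z = s + 2c. The two rows therefore always add up to the product, and only the arrival times depend on the greedy choice. A final carry-propagate adder would consume r0 and r1.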


Circuit Implementation Of A Very-High Radix Cordic.. - Piñeiro, Bruguera (1999)

....2a, a, and 2a. The eight partial products generated are totaled using a tree with two levels of 4:2 carry save adders (CSAs) implemented taking advantage of the existence of fast inputs and fast outputs, and performing the appropriate interconnections with the object of minimising their delay [6]. In expressions (4) to (7) it is shown that the accumulation operation is only needed during the micro rotations. As only 6 of the 8 partial products available in the first level CSA tree of the MACs are used in these kind of iterations, accumulation is incorporated in the multiplier introducing ....

V.G. Oklobdzija, D. Villeger, and S.S. Liu, "A method for speed optimized partial product reduction and generation of fast parallel multipliers using an algorithmic approach," IEEE Transactions On Computers, vol. 45, no. 3, pp. 294--305, 1996.
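The snippet above reduces eight partial products with two levels of 4:2 carry save adders. Below is a minimal functional sketch of that arrangement. It builds each 4:2 compressor from two chained full adders, which is one common construction and is assumed here. It does not model the fast-input/fast-output wiring or any timing from the cited design. The names compressor_4_2, add4_carry_save and reduce_eight are hypothetical.

```python
import random


def full_adder(a: int, b: int, c: int):
    """3:2 counter: returns (sum, carry) with a + b + c == sum + 2 * carry."""
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)


def compressor_4_2(x1, x2, x3, x4, cin):
    """4:2 compressor built from two chained full adders.

    x1 + x2 + x3 + x4 + cin == s + 2 * (carry + cout), and cout depends only
    on x1..x3, so it never waits for cin.
    """
    s1, cout = full_adder(x1, x2, x3)
    s, carry = full_adder(s1, x4, cin)
    return s, carry, cout


def add4_carry_save(w, x, y, z, width):
    """Reduce four width-bit operands to two numbers with the same total."""
    sum_row = carry_row = 0
    cin = 0
    for i in range(width):
        bits = [(v >> i) & 1 for v in (w, x, y, z)]
        s, carry, cout = compressor_4_2(*bits, cin)
        sum_row |= s << i
        carry_row += carry << (i + 1)
        cin = cout              # lateral carry into the next bit position
    carry_row += cin << width   # last lateral carry has weight 2**width
    return sum_row, carry_row


def reduce_eight(ops, width):
    """Two levels of 4:2 compressors: eight operands -> two."""
    a0, a1 = add4_carry_save(*ops[:4], width)
    b0, b1 = add4_carry_save(*ops[4:], width)
    # Level-1 outputs are below 2**(width + 2), so widen the second level.
    return add4_carry_save(a0, a1, b0, b1, width + 2)


ops = [random.randrange(256) for _ in range(8)]
s, c = reduce_eight(ops, 8)
print(s + c == sum(ops))  # True
```

The lateral carry cout is independent of cin, so it does not ripple along a row of compressors. This is why a level of 4:2 compressors has a fixed depth regardless of operand width.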


Practical Experiences with Standard-Cell Based Datapath.. - Ienne, Grießing (1998)   (1 citation)

.... multipliers (some not shown on the table) have been generated combining the following elements: 1) Partial product generation is built in the customary way using either a Booth 2 encoding scheme or no encoding [5]; 2) Column compressors are generated with an algorithm similar to the one published in [4], and use the available timing information on the signals to add and on the standard cells to be used (several combinations have been tried); 3) Two types of final adders have been used, either with a fixed 8 bit group carry select carry lookahead structure or with a generalized structure ....

V. G. Oklobdzija, D. Villeger, and S. S. Liu. A method for speed optimized partial product reduction and generation of fast parallel multipliers using an algorithmic approach. IEEE Transactions on Computers, C-45(3):294--306, Mar. 1996.


The SNAP Project: Design of Floating Point Arithmetic Units - Oberman, Al-Twaijry, Flynn (1997)   (19 citations)

....trivial. Therefore, an algorithmic approach to the design, using a sophisticated delay model that takes into account the interconnect delay due to counter placement and the different path delays, is extremely useful. We have implemented such an algorithm, based upon the approach of Oklobdzija [15]. The algorithm is essentially the same as that proposed by Oklobdzija, but it also takes into account interconnect delay due to counter placement and the different path delays. Our algorithm uses a complex delay model for the (3,2) counter, and it is further constrained by the availability ....

V. G. Oklobdzija, D. Villeger and S. S. Liu, "A method for speed optimized partial product reduction and generation of fast parallel multipliers using an algorithmic approach," IEEE Trans. Computers, vol. C-45, no.3, pp. 294--305, Mar. 1996.


A Multiplier with Redundant Operands - Ferguson, Ercegovac (1999)

....allow for an unbiased comparison between the redundant multiplier and conventional, the algorithm for designing the array has been chosen in such a way as to be functional and uniform, but is not completely optimal. The method employed is the Three Dimensional Minimization (TDM) method proposed in [4]. The array is considered in column-wise slices, known as vertical compressor slices (VCS). Each VCS is composed of the outputs of the multiple generators for some particular array output bit column and the outputs of the next lower order VCS. The delays of inputs to the columns are not at all ....

V. G. Oklobdzija, et al., "A Method for Speed Optimized Partial Product Reduction and Generation of Fast Parallel Multipliers Using an Algorithmic Approach," IEEE Transactions on Computers, Vol. 45, No. 3, March 1996.


On The Complexity Of Booth Recoding - Paul, Seidel

....We list just a few: • One can systematically study where drivers should be placed in order to make nets smaller and hence speed up signal propagation. • One can analyze hybrid layouts (arrays of small trees) and the layouts of many other multiplication designs ([1], [5], [8], [14], [17], [20], [22], [23], [25], [30], [32]). • The layouts of various adder and shifter designs can be analyzed in a quite realistic way. • It is common practice to fold layouts of addition trees into more square layouts. But the formula wire(T) w(T) 2 2h(T) suggests that trivial folding of layouts ....

V.G. Oklobdzija, D. Villeger, and S. S. Liu. A method for speed optimized partial product reduction and generation of fast parallel multipliers using an algorithmic approach. IEEE Transactions on Computers, 45(3):294--306, March 1996.


A comparison of three rounding algorithms for IEEE.. - Even, Seidel (1998)   (2 citations)  (Correct)

....instructions consists of FP multiplications. For example, Oberman reports that FP multiplications account for 37 percent of the FP instructions in benchmark applications [17]. A lot of research has been devoted to optimizing the latency of adding the partial products to produce the product, e.g. [1, 2, 6, 9, 15, 16, 18, 19, 20, 21, 26, 28, 29, 30]. More recently, work on rounding the product according to the IEEE 754 Standard has been published [4, 7, 10, 22, 23, 24, 25, 31, 33, 34]. Assuming that the multiplier outputs a carry save encoded digit string representing the exact product, the following natural question arises: What is the ....

V.G. Oklobdzija, D. Villeger, and S. S. Liu. A method for speed optimized partial product reduction and generation of fast parallel multipliers using an algorithmic approach. IEEE Transactions on Computers, 45(3):294-306, March 1996.


High-Speed MARS Hardware - Satoh, Ooba, Takano, D'Avignon (2000)   (4 citations)

....proposed [6], [7] to optimize the critical path of this tree, but these compressors basically consist of 3:2 compressors. Booth encoding [8] is widely used to reduce the number of partial products, but it is a kind of 4:2 compression technique and does not change the tree structure. Oklobdzija et al. [9] suggested that not all inputs and outputs from a compressor contribute equally to the delay, and the difference in using 4:2 and higher order compressors is not in the structure of the compressor but in the way they are interconnected. ....

V. G. Oklobdzija, D. Villeger and S. S. Liu: "A Method for Speed Optimized Partial Product Reduction and Generation of Fast Parallel Multipliers Using an Algorithmic Approach," IEEE Trans. on Comp., vol. 35, no. 3, pp. 294-305, Mar. 1996.
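The snippet above notes that Booth encoding reduces the number of partial products. Below is a minimal sketch of radix-4 (modified) Booth recoding, assuming an unsigned n-bit multiplier. The snippet does not say which variant the MARS design uses. The function names are hypothetical. Each digit takes a value in {-2, -1, 0, 1, 2}, so each partial product is 0, ±a or ±2a shifted by 2i bits. The count drops from n to n//2 + 1.

```python
def booth_radix4_digits(b: int, n: int):
    """Radix-4 (modified) Booth recoding of an unsigned n-bit multiplier b.

    Digit i is -2*b[2i+1] + b[2i] + b[2i-1] (with b[-1] = 0 and bits at or
    above n equal to 0), so every digit lies in {-2, -1, 0, 1, 2} and
    sum(d_i * 4**i) == b.
    """
    def bit(k: int) -> int:
        return (b >> k) & 1 if k >= 0 else 0

    return [-2 * bit(2 * i + 1) + bit(2 * i) + bit(2 * i - 1)
            for i in range(n // 2 + 1)]


def booth_partial_products(a: int, b: int, n: int):
    """One partial product per Booth digit: d_i * a, shifted by 2*i bits."""
    return [d * a * 4 ** i for i, d in enumerate(booth_radix4_digits(b, n))]


pps = booth_partial_products(200, 173, 8)
print(len(pps), sum(pps) == 200 * 173)  # 5 True
```

For an 8-bit multiplier this produces 5 partial products instead of the 8 rows of a plain AND array. The extra digit handles the unsigned case. A two's-complement multiplier of even width would need only n/2 digits.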


Constant Delay Linear Size Adder under Left-to-Right Input.. - Takagi, Horiyama (1997)

.... For example, this addition appears in the conversion of quotient in an unfolded implementation of shift and subtract type division, such as SRT division, and also of shift and subtract type square rooting [4]. Another example is the upper part of the final adder in the tree type multipliers [5]. Our concern is to implement an adder as a combinational circuit under left to right input arrival condition. We measure the complexity of a combinational circuit by its delay and size. The delay is the computation time after the arrival of the final input bits. An n bit carry lookahead adder ....

....on the on-the-fly conversion algorithm (OTFA) has O(1) delay and O(n^2) size [1, 3]. Although its delay is small, a large amount of hardware is required. A carry select adder (CSA) with O(log n) blocks has O(log n) delay and O(n) size when the number of input bits for each block is optimized [5]. Although the amount of hardware for the CLA or the CSA is smaller than that of the OTFA, its delay is not negligible. In this paper, we propose an adder with constant delay and linear size. We call it CDLS adder. Its delay is a very small constant as the OTFA. Its size is O(n) as the CLA and ....

V.G. Oklobdzija, D. Villeger and S.S. Liu, "A Method for Speed Optimized Partial Product Reduction and Generation of Fast Parallel Multipliers Using an Algorithmic Approach," IEEE Trans. Computers, vol.45, no.3, pp.294--305, 1996.
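The snippet above cites the carry select adder, in which each block of bits is added twice ahead of time. Below is a minimal functional sketch of that principle. The block sizes in the usage line are hypothetical. Python's `+` stands in for the per-block ripple adder. The sketch does not reproduce the O(log n) block organization or any delay figures from [5]. It only shows that each block's result is chosen by the incoming carry through a 2:1 selection.

```python
import random


def carry_select_add(x: int, y: int, block_sizes):
    """Carry-select addition of two unsigned integers.

    Each block precomputes its sum for carry-in 0 and carry-in 1; the actual
    carry from the previous block only drives a 2:1 selection. x and y must
    fit in sum(block_sizes) bits.
    """
    result = 0
    carry = 0
    pos = 0
    for size in block_sizes:
        mask = (1 << size) - 1
        xs, ys = (x >> pos) & mask, (y >> pos) & mask
        sum0 = xs + ys                    # candidate assuming carry-in = 0
        sum1 = xs + ys + 1                # candidate assuming carry-in = 1
        chosen = sum1 if carry else sum0  # the 2:1 multiplexer
        result |= (chosen & mask) << pos
        carry = chosen >> size            # block carry-out selects the next block
        pos += size
    return result | (carry << pos)


blocks = [2, 3, 4, 7]  # hypothetical, growing block sizes covering 16 bits
print(carry_select_add(40000, 30000, blocks) == 70000)  # True
```

Growing the block sizes toward the most significant end is a common choice. A later block has more time to finish its two candidate sums before its select carry arrives.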


Technology Scaling Effects on Multipliers - Al-Twaijry, Flynn

....not trivial. Therefore, an algorithmic approach to the design, using a sophisticated delay model that takes into account the interconnect delay due to counter placement and the different path delays, is extremely useful. We have implemented such an algorithm, based upon the approach of Oklobdzija [10]. The algorithm extends Oklobdzija by taking into account interconnect delay due to counter placement and the different path delays. Our algorithm also uses a complex delay model for the (3,2) counter, and it is further constrained by the availability of wiring tracks for the routing of each ....

V. G. Oklobdzija, D. Villeger, and S. S. Liu, "A method for speed optimized partial product reduction and generation of fast parallel multipliers using an algorithmic approach," IEEE Transactions on Computers, vol. 45, no. 3, pp. 294--306, March 1996.


Design Issues In High Performance Floating Point Arithmetic Units - Oberman (1996)   (7 citations)

....product reduction is implemented by a series of carry free adders, which are connected in one of many different topologies ranging from linear arrays to logarithmic trees. The organization of these reduction trees has been the subject of previous research, much of which is summarized in [3], [4], [5], and [6]. The final carry propagate addition is an application of integer addition, a topic independent from FPU design [7]. The remaining tradeoffs in multiplier design are: method of partial product generation, topology and circuit design of the reduction tree, and the topology and circuit ....

V. G. Oklobdzija, D. Villeger, and S. S. Liu, "A method for speed optimized partial product reduction and generation of fast parallel multipliers using an algorithmic approach," IEEE Transactions on Computers, vol. 45, no. 3, pp. 294--306, March 1996.


Optimum Placement and Routing of Multiplier Partial Product.. - Hesham Al-Twaijry And

....by increasing the current to selected gates. This fine tuning is not feasible for CMOS designs because the current drive in CMOS is related to the transistor widths, and varying the transistor widths affects the previous stages, unlike ECL. Another algorithm was developed by Oklobdzija et al. [12], where the design is based upon the carry save counter. This algorithm takes into account the different delays for the paths, in addition to the different input output delays. However, it ignores the incremental wiring delay that is caused by the placement of the counters. A problem is that these ....

V. G. Oklobdzija, D. Villeger and S. S. Liu, "A Method for Speed Optimized Partial Product Reduction and Generation of Fast Parallel Multipliers Using an Algorithmic Approach" IEEE Trans. on Computers, vol C-45, No.3, pp. 294-305, Mar. 1996.


Design Strategies for Optimal Multiplier Circuits - Martel, Oklobdzija, Ravi.. (1995)   (2 citations)  Self-citation (Oklobdzija)

....Dept. of Computer Science, Princeton University, Princeton, NJ 08544 (Email: ravi@cs.princeton.edu); University of California at Davis, Davis, CA 95616 (Email: stelling@cs.ucdavis.edu). Abstract: We present new design and analysis techniques for the synthesis of fast parallel multiplier circuits. In [4], Oklobdzija, Villeger, and Lui suggested a new approach, the Three Dimensional Method (TDM) for Partial Product Reduction Tree (PPRT) design that produces multipliers which outperform the current best designs. The goal of TDM is to produce a minimum delay PPRT using full adders. This is done by ....

....producing two bits for each column (carries are incorporated from each column to the next). Then add the two (2n-1)-bit numbers produced using a final adder, which we will call a carry propagate adder (see Figure 1). The basic problems we address here relate to designing fast PPRTs. In [4] the Three Dimensional Method (TDM) for globally designing the PPRT of a parallel multiplier circuit is described. The goal of the TDM is to produce a minimum delay PPRT using full adders ((3,2) adders) and a small number of half adders ((2,2) adders). In [4] it was shown that the TDM has the ....

[Article contains additional citation context not shown here]

V. G. Oklobdzija, D. Villeger, S. S. Liu, "A method for speed optimized partial product reduction and generation of fast parallel multipliers using an algorithmic approach," in press IEEE Transaction on Computers, 1995.


Optimized Synthesis of Sum-of-Products - Zimmermann, Tran (2003)

No context found.

V. G. Oklobdzija, D. Villeger, and S. S. Liu, "A Method for Speed Optimized Partial Product Reduction and Generation of Fast Parallel Multipliers Using an Algorithmic Approach," IEEE Trans. Computers, vol. 45, no. 3, pp. 294--305, March 1996.


Parametric Time Delay Modeling for Floating Point Units - Fahmy, Liddicoat, Flynn (2002)

No context found.

V. G. Oklobdzija, D. Villeger, and S. S. Liu, "A method for speed optimized partial product reduction and generation of fast parallel multipliers using an algorithmic approach," IEEE Transactions on Computers 45, pp. 294-306, Mar. 1996.

CiteSeer.IST - Copyright Penn State and NEC