10 citations found.
S. Oberman. Design Issues in High Performance Floating Point Arithmetic Units. PhD thesis, Stanford University, December 1996.

This paper is cited in the following contexts:
Strength Reduction of Integer Division and Modulo Operations - Saman Amarasinghe Walter

....more important today than ever. While modern processors have taken advantage of increasing silicon area by replacing iterative multipliers with faster, non-iterative structures such as Wallace multipliers, similar non-iterative division/modulo functional units have not materialized technologically [19]. Thus, while the performance gap between an add and a multiply has narrowed, the gap between a divide and the other arithmetic operations has either widened or remained the same. In the MIPS family, for example, the ratio of costs of div/mul/add has gone from 35/12/1 on the R3000 to 35/6/1 on the ....

S. Oberman. Design Issues in High Performance Floating Point Arithmetic Units. PhD thesis, Stanford University, December 1996.


Strength Reduction of Integer Division and Modulo Operations - Amarasinghe, Lee, Greenwald

....more important today than ever. While modern processors have taken advantage of increasing silicon area by replacing iterative multipliers with faster, non-iterative structures such as Wallace multipliers, similar non-iterative division/modulo functional units have not materialized technologically [19]. Thus, while the performance gap between an add and a multiply has narrowed, the gap between a divide and the other arithmetic operations has either widened or remained the same. In the MIPS family, for example, the ratio of costs of div/mul/add has gone from 35/12/1 on the R3000 to 35/6/1 on the ....

S. Oberman. Design Issues in High Performance Floating Point Arithmetic Units. PhD thesis, Stanford University, December 1996.


Strength Reduction of Integer Division and Modulo.. - Sheldon, Lee.. (2001)

....more important today than ever. While modern processors have taken advantage of increasing silicon area by replacing iterative multipliers with faster, non-iterative structures such as Wallace multipliers, similar non-iterative division/modulo functional units have not materialized technologically [16]. Thus, while the performance gap between an add and a multiply has narrowed, the gap between a divide and the other arithmetic operations has either widened or remained the same. In the MIPS family, for example, the ratio of costs of div/mul/add has gone from 35/12/1 on the R3000 to 35/6/1 on the ....

S. Oberman. Design Issues in High Performance Floating Point Arithmetic Units. PhD thesis, Stanford University, December 1996.


A comparison of three rounding algorithms for IEEE.. - Even, Seidel (1998)   (2 citations)

....Standard [13]. The latency of the FP multiplier is critical to the floating point performance since a large portion of the FP instructions consists of FP multiplications. For example, Oberman reports that FP multiplications account for 37 percent of the FP instructions in benchmark applications [17]. A lot of research has been devoted to optimizing the latency of adding the partial products to produce the product, e.g. [1, 2, 6, 9, 15, 16, 18, 19, 20, 21, 26, 28, 29, 30]. More recently, work on rounding the product according to the IEEE 754 Standard has been published [4, 7, 10, 22, 23, 24, ....

S.F. Oberman. Design Issues in High Performance Floating Point Arithmetic Units. PhD thesis, Stanford University, January 1997. accessible via ftp://umunhum.stanford.edu/tr/oberman.nov96.thesis.ps.Z.


Parallel, Pipelined CORDICs for Reconfigurable Computing - Mencer, Morf

....1985. Both projects investigated the feasibility of FPGAs as computing platforms. Conventional general purpose processors consist of a fixed, general datapath, and programmable control (instructions) for that datapath. A few general purpose arithmetic units are highly optimized for low latency i.e. [13]. On custom computing machines datapath and control are fully programmable, allowing the designer to tailor the architecture of the computer to the structure of the algorithm. Flexibility or reconfigurability comes at the expense of latency (i.e. longer cycle time) and logic density on the chip. ....

S. Oberman with M. Flynn, Design Issues in High Performance Floating Point Arithmetic Units, PhD Thesis, E.E. Dept., Stanford, Jan. 1997.


A comparison of three rounding algorithms for IEEE.. - Even, Seidel (1998)   (2 citations)

.... microprocessor includes a floating point (FP) multiplier that complies with the IEEE 754 Standard [9]. The latency of the FP multiplier is critical to the floating point performance since a large portion of the FP instructions consists of FP multiplications (37 percent in benchmark applications [13]). A lot of research has been devoted to optimizing the latency of adding the partial products to produce the product, e.g. [1, 5, 11, 12, 14, 15, 16, 22]. More recently, work on rounding the product according to the IEEE 754 Standard has been published [3, 4, 6, 17, 18, 19, 20, 23, 25]. Assuming ....

S. Oberman. Design Issues in High Performance Floating Point Arithmetic Units. PhD thesis, Stanford, January 1997.


SRT Division: Architectures, Models, and Implementations - Harris, Oberman, Horowitz   Self-citation (Oberman)

....like to use a high radix r. However, low latency does not come for free. As the radix increases, the quotient digit selection becomes more complicated, which may increase the cycle time. Moreover, the generation of all required divisor multiples may become impractical for higher radices. Oberman [10] shows that the delay of quotient selection tables increases linearly with increasing radix, while the area increases quadratically. While prescaling of the input operands [11] reduces table complexity at the expense of additional latency, the difficulty in generating all divisor multiples for ....

....there are two typical choices for the digit set: minimally redundant {-2, -1, 0, 1, 2} and maximally redundant {-3, -2, -1, 0, 1, 2, 3}. The quotient selection logic for a maximally redundant radix 4 digit set is about 20% faster and 50% smaller than for a minimally redundant digit set [10]. However, maximally redundant radix 4 requires the computation of the 3x divisor multiple, which typically requires extra initial delay and area. (Figure 1. SRT divider block diagram.) 2.3 Higher Performance. Several techniques have been proposed for improving the ....

[Article contains additional citation context not shown here]

S. F. Oberman, Design Issues in High Performance Floating Point Arithmetic Units, Ph.D. thesis, Stanford University, Nov. 1996.


SRT Division Architectures and Implementations - Harris, Oberman, Horowitz (1997)   (3 citations)  Self-citation (Oberman)

....be a power of 2. However, this latency reduction does not come for free. As the radix increases, the quotient digit selection becomes more complicated, which may increase the cycle time. Moreover, the generation of all required divisor multiples may become impractical for higher radices. Oberman [10] shows that the delay of quotient selection tables increases linearly with increasing radix, while the area increases quadratically. While prescaling of the input operands [11] reduces table complexity at the expense of additional latency, nevertheless the difficulty in generating all of the ....

....typical choices for the digit set: minimally redundant {-2, -1, 0, 1, 2} and maximally redundant {-3, -2, -1, 0, 1, 2, 3}. The quotient selection logic for a maximally redundant radix 4 digit set is about 20% faster and 50% smaller than for a minimally redundant digit set [10]. However, maximally redundant radix 4 requires the computation of the 3x divisor multiple, which typically requires extra initial delay and area. 2.2.3 Choice of Remainder Representation. The partial remainder also can be represented in two different forms, either redundant or nonredundant. Each ....

[Article contains additional citation context not shown here]

S. F. Oberman, Design Issues in High Performance Floating Point Arithmetic Units, Ph.D. thesis, Stanford University, Nov. 1996.


Division Algorithms and Implementations - Oberman, Flynn (1997)   (5 citations)  Self-citation (Oberman)

....implementation, as the number of partial remainder input bits are halved. However, the delay of the quotient-digit selection function increases by the delay of the adder. Such an adder is shown in Fig. 1 as the CPA component. Oberman [13] present a methodology for performing this analysis along with several techniques for minimizing table complexity. Specifically, the use of Gray coding for the quotient digits is proposed to allow for the automatic minimization of the quotient digit selection logic equations, achieving near ....

....carry assimilating adder, increases the size and delay of the table, offsetting the possible performance gain due to the shorter adder. 2.3 Increasing Performance. Several techniques have been reported for improving the performance of SRT division including [14], [15], [16], [17], [18], [19], [13], [21], [22], [23], [24]. Some of these approaches are discussed below. 2.3.1 Simple Staging. In order to retire more bits of quotient in every cycle, a simple low radix divider can be replicated many times to form a higher radix divider, as shown in Fig. 3. In this implementation, the critical ....

[Article contains additional citation context not shown here]

S. Oberman, "Design Issues in High Performance Floating Point Arithmetic Units," PhD thesis, Stanford Univ., Nov. 1996.


The Raw Prototype Design Document - Taylor (2000)   (2 citations)

No context found.

S. Oberman. "Design Issues in High Performance Floating Point Arithmetic Units," Ph.D. Dissertation, Stanford University, December 1996.

CiteSeer.IST - Copyright Penn State and NEC