24 citations found.
P. Lapsley, J. Bier, A. Shoham and E. Lee, DSP Processor Fundamentals: Architectures and Features, Berkeley Design Technology Inc., 1996

This paper is cited in the following contexts:
Design of an Instruction Set for Modular Network Processors - Wolf

....cost. Thus, the main advantages of modular network processors are: ability to specialize modules to common or computationally intense tasks, achieve scalability by using various module configurations. Using specialized hardware (hardware accelerators) like digital signal processors (DSP) [7] or customized logic, is useful for common and computationally intense tasks. These modules are tradeoffs between silicon real estate and processing speedup for packets that can make use of the specialized module. Thus, it is important to use the accelerators for common tasks (Amdahl's Law) ....

P. Lapsley, J. Bier, A. Shoham, and E. A. Lee. DSP Processor Fundamentals: Architectures and Features. IEEE Press Series on Signal Processing, Jan. 1997.


Hardware Support to Reduce Overhead in Fine-Grain Media Codes - Talla, John, Burger (2001)

....essing. All branches and instructions related to loop increments are handled by this technique. This approach is fairly simple and straightforward to implement and has been implemented in many conventional DSP processors such as the Motorola 56000 and TMS320C5x from Texas Instruments [18]. Address calculation: The current PLE allows for three input data structures streams and produces one output structure. The choice was made because many media algorithms can benefit from this capability (current SIMD execution units sometimes operate on three input registers to produce one ....

P. Lapsley, J. Bier, A. Shoham, and E. A. Lee. DSP Processor Fundamentals: Architectures and Features, Chapter 8, IEEE Press series on Signal Processing, ISBN 0-7803-3405-1, 1997.


Assembly Code Optimization Techniques For Real Time Dsp - Implementation Of Speech (2002)

....are the wireless DSP processors where the hardware has been well molded by applications. These enhanced conventional DSP processors are characterized by irregular data paths, small register files, and specialized, non-orthogonal instruction sets. Consequently they make very poor compiler targets [1] and providing high level language support is a difficult task. The need for hand assembly becomes very significant since the firmware determines the target application's performance. As an example the quality of the coreware in a mobile handset may determine its battery life significantly. In ....

....architecture of DSP processors provides multiple memory access capability. Multiple memory accesses are commonly implemented with the use of separate memory banks or a multiport RAM or sometimes a RAM of a much faster access time so that multiple sequential accesses are completed in a single cycle [1]. The C code should be studied for the opportunity to place data in different memory spaces so that parallel access can be performed. As an example the filter coefficients and input data can be placed in separate memory spaces. This can be accomplished with the use of compiler directives facilitated ....

Phil Lapsley, Jeff Bier, Amit Shoham and Edward A. Lee, DSP Processor Fundamentals: Architectures and Features, IEEE Press, 1996.


A New Approach For Block-Floating-Point Arithmetic - Kobayashi, Fettweis (1999)

....DSPs matches with the float type of the C language makes the application of C compiler straightforward and thus relatively easy. However, the signal processing quality of short word floating point has not been studied well, and there are some questions about its arithmetic performance [1]. Block floating point also seems to be a very attractive solution. In view of chip size, a block floating point DSP can be as small as a fixed point DSP while allowing higher signal processing quality. However, again, [Footnote: This work was supported in part by Asahi Chemical, Tokyo, Japan, Siemens] ....

Phil Lapsley et al. DSP Processor Fundamentals: Architectures and Features. IEEE Press, 1997.


The Evolution of DSP Processors - Eyre, Bier, al. (2000)   (3 citations)

....compute a multiplication in a single clock cycle. As might be expected, faster multiplication hardware yields faster performance in many DSP algorithms, and for this reason all modern DSP processors include at least one dedicated single cycle multiplier or combined multiply accumulate (MAC) unit [1]. [Page header: The Evolution of DSP Processors, by Jennifer Eyre and Jeff Bier, Berkeley Design Technology, Inc. (BDTI). Note: A version of this white paper has appeared in IEEE Signal Processing Magazine. Copyright 2000 Berkeley Design Technology, Inc.] Multiple Execution Units DSP ....

Lapsley et al., "DSP Processor Fundamentals: Architectures and Features," IEEE Press, 1996.


Architectural Techniques to Accelerate Multimedia Applications on.. - Talla (2001)

....[8], [43]. The concept of embedding loops in hardware was implemented commercially in the TI ASC [23] (do loop in this case). The SMA architecture [75] provided similar flexibility in accessing matrices. This concept was seen to be successful in all these machines as well as many DSP processors [57]. The Burroughs scientific processor [55] was a pure SIMD array processor that had special purpose hardware called alignment networks for packing and unpacking data. In addition, the processor has several powerful SIMD instructions of which many are being used in current SIMD extensions. ....

....hardware loop control and supports up to five levels of loop nesting. All branches related to loop increments (based on indices used for referencing data) are handled by this technique. This is done in many conventional DSP processors such as the Motorola 56000 and TMS320C5x from Texas Instruments [57]. Data Station: This is the register file for the SIMD computation and is implemented as a queue. Dedicated register files are present in conventional machines for SIMD either as a separate register file (as in AltiVec) or aliased to the floating point register file (as in MMX). Breeze ....

[Article contains additional citation context not shown here]

P. Lapsley, J. Bier, A. Shoham, and E. A. Lee. DSP processor fundamentals: architectures and features, Chapter 8, IEEE Press series on Signal Processing, ISBN 0-7803-3405-1, 1997.


Cost-effective Hardware Acceleration of Multimedia Applications - Talla, John (2001)   (3 citations)

....overhead for loop increments. A similar mechanism was commercially implemented in the TI ASC [7] (two levels of do loop nesting in addition to a self-increment loop). Conventional DSP processors such as the TMS320C5x from TI also use such a technique for one or more levels of loop nesting [8]. Fig. 5 shows the block diagram of the looping hardware. Loop index values are produced every clock cycle based on the loop bound for each level of nesting (bounds for each of the five loops are specified in the Breeze instruction). The value of a loop index varies from 1 (lower bound) to the ....

P. Lapsley, J. Bier, A. Shoham, and E. A. Lee, DSP Processor Fundamentals: Architectures and Features, Chapter 8, IEEE Press series on Signal Processing, ISBN 0-7803-3405-1, 1997.


The Evolution of DSP Processors - Eyre, Bier (2000)   (3 citations)

....a multiplication in a single clock cycle. As might be expected, faster multiplication hardware yields faster performance in many DSP algorithms, and for this reason all modern DSP processors include at least one dedicated single cycle multiplier or combined multiply accumulate (MAC) unit [1]. Multiple Execution Units: DSP applications typically have very high computational requirements in comparison to other types of computing tasks, since they often must execute DSP algorithms (such as FIR filtering) in real time on lengthy segments of signals sampled at 10-100 kHz or higher. ....

Lapsley et al., "DSP Processor Fundamentals: Architectures and Features," IEEE Press, 1996.


Embedded Software - An Agenda for Research - Lee (1999)   (6 citations)

....keyboards and screens. We should not forget that even the emphasis on keyboards and screens is relatively recent. Computation has its roots in the transformation of data, not in the interaction with sensors, actuators, and humans. 1.6 Real time software: Starting when programmable DSPs [51] and microcontrollers appeared in the 1970s, functionality has been steadily shifting from hardware to software. This glib statement actually has profound consequences. What we mean by software is primarily sequential execution, where the same hardware resources are multiplexed in time to ....

P. Lapsley, J. Bier, A. Shoham, and E. A. Lee, DSP Processor Fundamentals --- Architectures and Features, IEEE Press, New York, 1997.


Parallel Saturating Fractional Arithmetic Units - Navindra Yadav And (1999)   (1 citation)

....operations [2] To be compliant with the GSM standard, the results produced must be identical to results produced when the operations are performed serially. To achieve high performance, DSPs have multipliers, adders, and multiply and accumulate (MAC) units with one or two cycle latencies [4]. Typically, saturating arithmetic operations require two cycles, with saturation performed in the second cycle. More complicated operations, such as a MAC operation with saturation after both the multiplication and the addition, often require a greater number of cycles. The challenge is to design ....

P. Lapsley. DSP Processor Fundamentals: Architectures and Features. IEEE Press, 1997.


Combined Unsigned and Two's Complement Saturating.. - Schulte, Gok, Balzola.. (2000)

.... shading [1], [2], [3]. Because of its usefulness in these types of applications, support for saturating arithmetic has been added to several instruction set architecture extensions [1], [4], [5], [6]. Saturating arithmetic operations are also supported on most commercial digital signal processors [7]. With saturating multiplication, products that overflow are saturated to the most positive or most negative representable number [7]. For unsigned saturating multiplication, if the product is too large to represent, it is saturated to the largest representable number. With two's complement ....

.... several instruction set architecture extensions [1], [4], [5], [6]. Saturating arithmetic operations are also supported on most commercial digital signal processors [7]. With saturating multiplication, products that overflow are saturated to the most positive or most negative representable number [7]. For unsigned saturating multiplication, if the product is too large to represent, it is saturated to the largest representable number. With two's complement saturating multiplication, if the magnitude of the product is too large to represent, it is saturated to either the most positive or most ....

P. Lapsley, DSP Processor Fundamentals: Architectures and Features, IEEE Press, 1997.


Integer Multiplication with Overflow Detection or.. - Schulte, Balzola, Akkas, .. (2000)   (1 citation)

....[Fig. 1. Unsigned Multiplication Matrix for P = A · B] most negative representable number [13], [14], [15]. Overflow detection or saturation is used in several applications including digital filter implementations, speech encoding, and graphics applications [11], [16]. Previous research on overflow detection and saturation has focused on fractional operands [14], [17] or operations other ....

P. Lapsley, DSP Processor Fundamentals: Architectures and Features. IEEE Press, 1997.


Aviv: A Retargetable Code Generator for Embedded Processors - Hanono (1999)   (2 citations)

....execution of loops. Such hardware avoids having to perform (in software) an update and comparison of the loop index variable. It also avoids the need for a conditional branch instruction to the beginning of the loop. Zero overhead looping hardware follows one of the following configurations [32]: • A one word repeat buffer: This configuration supports the repeated execution of one instruction. This instruction must be fetched from the instruction memory during the first iteration of the loop. Subsequently, the instruction is fetched from the repeat buffer. • An N word repeat ....

P. Lapsley, J. Bier, A. Shoham, and E. Lee. DSP Processor Fundamentals - Architectures and Features. IEEE Press, 1997.


Techniques for Effectively Exploiting a Zero.. - Uh, Wang, Whalley..

....the branch instruction. Common software techniques include loop strength reduction with basic induction variable elimination and loop unrolling. Note that loop unrolling can significantly increase code size. Currently available versions of ZOLBs in TI, ADI, and Lucent processors have been described [5]. Assembly language programmers for DSPs commonly use ZOLBs in the code that they write. However, optimizing compilers have been used only recently for DSP applications and programmers still tend to write critical sections by hand [6]. [Footnote: A preliminary version of this paper appeared in a workshop] ....

Lapsley, P., Bier, J., Lee, E.: DSP Processor Fundamentals - Architecture and Features, IEEE Press (1996).


Advances in the Dataflow Computational Model - Najjary, Lee, Gao (1999)   (1 citation)

....3.2 Decidable Dataflow: Many signal processing systems involve repeated (infinite) execution of a well defined finite computation on an infinite stream of data. Implementations have real time constraints, and often take the form of embedded software (such as assembly code for programmable DSPs [69]). This raises a number of interesting issues. In particular, it is important that the schedule of actor firings be predictable in order to ensure that real time constraints are met. It is also critical that a program never deadlock. Because of the embedded system context, it is also important ....

P. Lapsley, J. Bier, A. Shoham, and E. A. Lee, DSP Processor Fundamentals: Architectures and Features, IEEE Press, New York, 1997.


Code Generation for Fixed-Point DSPs - Araujo, Malik (1998)   (4 citations)

....features which distinguish DSPs from general purpose processors with respect to basic block code generation. It is not the purpose of this section to give a detailed and extensive analysis of these features. A comprehensive analysis of DSP architectures can be found in [Lee 1988] [Lee 1989] [Lapsley et al. 1996]. DSPs can be classified according to the type of data they use as fixed point DSPs and floating point DSPs. In applications running on a fixed point DSP, users are responsible for scaling the result of the integer operations. This is automatically done in floating point DSPs. Floating point ....

Lapsley, P., Bier, J., Shoham, A., and Lee, E. A. 1996. DSP Processor Fundamentals: Architectures and Features. IEEE Press.


Code Generation Algorithms For Digital Signal Processors - Araujo (1997)   (2 citations)

....not to describe implementation details of each feature, but to analyze its impact on the code generation task. The interested reader should refer to Lee's survey papers [19, 33] for an extensive comparison between a number of DSP architectures. An updated version of this work can also be found in [34]. Annual reports on all commercial DSPs, including benchmark analysis of typical applications, are also available (e.g. [35]). 2.1.1 Memory System: The demand for high performance of DSP applications typically requires one instruction to be executed for each machine cycle. In order to achieve ....

P. Lapsley, J. Bier, A. Shoham, and E. A. Lee. DSP Processor Fundamentals: Architectures and Features. IEEE Press, 1996.


Real - Time Implementation - Of Nearfield Broadband

No context found.

P. Lapsley, J. Bier, A. Shoham and E. Lee, DSP Processor Fundamentals: Architectures and Features, Berkeley Design Technology Inc., 1996


Bottlenecks in Multimedia Processing with SIMD Style.. - Talla, John, Burger (2003)

No context found.

P. Lapsley, J. Bier, A. Shoham, and E. A. Lee. DSP Processor Fundamentals: Architectures and Features, Chapter 8, IEEE Press series on Signal Processing, ISBN 0-7803-3405-1, 1997.


A Sub-Word Parallel Digital Signal Processor - For Wireless Communication (2002)

No context found.

Lapsley et al., DSP Processor Fundamentals: Architectures and Features. IEEE Press, New York, 1996.


A Class Of Efficient-Encoding Generalized - Low-Density Parity-Check Codes (2001)

No context found.

P. Lapsley, J. Bier, A. Shoham, and E. A. Lee, DSP Processor Fundamentals: architectures and features, IEEE Press, 1997.


Parallel Saturating Fractional Arithmetic Units - Navindra Yadav And (1999)   (1 citation)

No context found.

P. Lapsley. DSP Processor Fundamentals: Architectures and Features. IEEE Press, 1997.


Scalable Vector Media-processors for Embedded Systems - Kozyrakis (2002)   (7 citations)

No context found.

P. Lapsley, J. Bier, A. Shoham, and E. Lee. DSP Processor Fundamentals: Architectures and Features. IEEE Press, 1997.


MediaBreeze: A Decoupled Architecture for Accelerating.. - Talla, John (2001)   (1 citation)

No context found.

P. Lapsley, J. Bier, A. Shoham, and E. A. Lee, DSP Processor Fundamentals: Architectures and Features, Chapter 8, IEEE Press series on Signal Processing, ISBN 0-7803-3405-1, 1997.

CiteSeer.IST - Copyright Penn State and NEC