P. M. Kogge. The Architecture of Pipelined Computers. Hemisphere Publishing Corp., 1981.


This paper is cited in the following contexts:


Embedded Software in Real-Time Signal Processing.. - Goossens, Van.. (1997)   (11 citations)  (Correct)

.... accumulation with the previous multiplication result (acc) and the load of the next multiplication operand from memory (load). To control the operations in the data pipeline, two different mechanisms are commonly used in computer architecture: data stationary and time stationary coding [16]. In the case of data stationary coding, every instruction that is part of the processor's instruction set controls a complete sequence of operations that have to be executed on a specific data item, as it traverses the data pipeline. Once the instruction has been fetched from program memory ....

P. M. Kogge, The Architecture of Pipelined Computers. New York: McGraw-Hill, 1981.


Automatic Synthesis of Time-Stationary Controllers for.. - Kim, Kurdahi, Park   (Correct)

....of resources and estimates the cost of registers and interconnections. In [2] module assignment and Register Transfer level synthesis of pipelined data paths was presented. Control synthesis usually follows data path synthesis. There are two basic control schemes for pipelined data paths [3]: data stationary and time stationary. These are depicted in Figure 1. A data stationary control scheme passes control function code along with data. This scheme allows simple and straightforward design of both the state sequencer and the data path control circuits for each stage, and thus, is ....

P. M. Kogge. The Architecture of Pipelined Computers. McGraw-Hill, New York, N.Y., 1981.


Data Locality Optimizations for Multigrid Methods on Structured.. - Weiß   (Correct)

....hierarchy architecture is given. The chapter concludes with a summary of trends in microprocessor architecture. 2.1 Pipelining and Superscalar Execution In this section, pipelining will only be briefly introduced. A more detailed description of pipelining for microprocessors can be found in [Kog81, HP96]. Pipelining is a technique that is applied to many situations to speed up the overall execution of a process which repeatedly performs a certain task. It must be possible to divide the task into a series of individual and independent operations, or stages, that, when applied sequentially, ....

P.M. Kogge. The Architecture of Pipelined Computers. McGraw-Hill, New York, USA, 1981.


Register-Transfer Synthesis of Pipelined Data Paths - Park, Kurdahi (1995)   (Correct)

....group overlap in time due to pipelined execution. Three multipliers and five adders are chosen and shared over all six time steps. The number of rows in the allocation table corresponds to the number of available modules. The allocation table is an extension of a conventional reservation table [9] [10]. The results of scheduling and resource allocation together with the chosen module set are the inputs to the module assignment task, which performs specific operation to operator mapping. [Figure residue: signal labels i1 to i16, w1 to w8, out; steps 1 to 6] ....

P. M. Kogge, The Architecture of Pipelined Computers. New York, N.Y.: McGraw-Hill, 1981.


Complexity-Effective Superscalar Processors - Palacharla (1997)   (161 citations)  (Correct)

....Perspective This section briefly outlines the evolution of ILP processors, especially superscalar processors, while highlighting major trends in design trade-offs involving hardware complexity and performance. Figure 1 1 illustrates the evolution of ILP processors with a time line. Pipelining [Kog81] is the most prevalent technique for exploiting instruction level parallelism. Pipelining enables overlapped execution of multiple instructions by breaking instruction processing into segments, just like an assembly line. It was first implemented in the IBM Stretch [Buc62] in 1961. Ever since, ....

P. M. Kogge. The Architecture of Pipelined Computers. McGraw-Hill, 1981.


Efficient Pipelining of Nested Loops: Unroll-and-Squash - Petkov, Harr, Amarasinghe (2001)   (Correct)

.... loops but considers inner loop entries as exceptional exits from hardware [1]. Overall few techniques for scheduling across basic block boundaries handle nested loop structures efficiently [7, 11]. The general theory of hardware pipelining and optimal scheduling of recurrences can be found in [15]. 6. Conclusion This paper presented a loop pipelining technique that targets nested loop pairs with an iteration parallel outer loop and a strong inter- and intra-iteration data dependent inner loop. The method was evaluated using the Nimble compiler framework on several signal processing ....

P. Kogge. The Architecture of Pipelined Computers, McGraw Hill, NY, 1981.


Block Based Compression Storage Expected Performance - Vassiliadis, Cotofana, Stathis (2000)   (Correct)

....SPAR [15] have been proposed and developed. Finite Element Modeling [9] is a powerful numerical technique for solving partial differential equations which has SMVM as basic computational step. Generally speaking, due to their intrinsic support for data parallelism, vector architectures [11, 8] are potentially good candidates to efficiently execute SMVMs and other types of sparse matrix manipulations. In practice however they are not as efficient on sparse matrices as they are on dense. This performance degradation mainly relates to the code and data irregularity induced by the fact ....

.... $c = A b$; $c_i = \sum_{k=0}^{n-1} a_{i,k} b_k$, $i = 0, 1, \ldots, n-1$ (1) In Equation (1) to calculate $c$, a sequence of $n$ inner products $A_i \times b$, where $A_i$ is the $i$-th row of the matrix $A$, $i = 0, 1, \ldots, n-1$, have to be evaluated. Vector architectures [11, 8] such as the one graphically depicted in Figure 4 are particularly suitable to efficiently perform such inner product evaluations as they can operate on sequences of data, i.e. vectors, via vector instructions. On most current vector architectures [8] the vectors are copied from the main memory ....
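
As a minimal sketch of the computation in Equation (1) of the excerpt above, the product c = A b can be written as one inner product per row of A. The function name `matvec` and the dense list-of-rows layout are assumptions made for illustration; the cited paper targets sparse storage formats and vector hardware, which this sketch does not model.

```python
from typing import List, Sequence


def matvec(a: Sequence[Sequence[float]], b: Sequence[float]) -> List[float]:
    """Return c = A b, where c[i] is the inner product of row i of A with b."""
    for row in a:
        if len(row) != len(b):
            raise ValueError("every row of A must have the same length as b")
    # c_i = sum over k of a[i][k] * b[k]
    return [sum(a_ik * b_k for a_ik, b_k in zip(row, b)) for row in a]


if __name__ == "__main__":
    A = [[1, 0, 2],
         [0, 3, 0]]
    b = [1, 2, 3]
    print(matvec(A, b))
```

Each row's inner product is independent of the others, which is the property that lets a vector instruction process a whole row at a time.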

P. M. Kogge. The Architecture of Pipelined Computers. McGraw-Hill, New York, 1981.


Variable-Precision Arithmetic for Vector Quantization - Dionysian (1994)   (Correct)

....throughput. It is both low cost and simple to incorporate. It reduces the inner product evaluation time by reducing the cycle time. Instead of waiting for propagation through the summation network, we need to wait only for propagation through one pipeline stage. The pipelining speedup $S_{pipe}(l)$ [Kog81] is $S_{pipe}(l) = \frac{n T_{original}}{(n + l - 1) T_s}$; $T_s = \frac{T_{original}}{l} + T_{latch}$; $S_{pipe}(l) \le \frac{n l}{n + l - 1}$. As the number of stages of the pipeline l increases, stage time T s decreases. Even if each codevector to be chosen for classification depends on the previous one, S pipe still can be large. ....
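
The speedup expression in this excerpt lost its operators in extraction. The sketch below assumes the standard reading: n items through an l-stage pipeline take (n + l - 1) stage times, the stage time is T_s = T_original / l + T_latch, and the speedup is n T_original divided by that total. The function and parameter names are illustrative and do not come from the cited paper.

```python
def pipeline_speedup(n: int, stages: int, t_original: float, t_latch: float = 0.0) -> float:
    """Speedup of an l-stage pipeline over the unpipelined unit for n items.

    Assumed model: T_s = t_original / stages + t_latch, and the pipelined
    run takes (n + stages - 1) * T_s in total.
    """
    if n < 1 or stages < 1 or t_original <= 0 or t_latch < 0:
        raise ValueError("need n >= 1, stages >= 1, t_original > 0, t_latch >= 0")
    t_stage = t_original / stages + t_latch
    return (n * t_original) / ((n + stages - 1) * t_stage)


if __name__ == "__main__":
    # With zero latch overhead the speedup equals n*l / (n + l - 1).
    print(pipeline_speedup(n=100, stages=4, t_original=8.0))
    # Latch overhead lowers the achievable speedup.
    print(pipeline_speedup(n=100, stages=4, t_original=8.0, t_latch=0.5))
```

Under this model the speedup approaches the stage count l as n grows, and any latch delay keeps it below the n l / (n + l - 1) bound.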

P. M. Kogge. The Architecture of Pipelined Computers. McGraw-Hill Book Company, 1981.


Sparse Matrix Vector Multiplication Evaluation Using.. - Stathis, Cotofana.. (2001)   (Correct)

....Section 4 the simulation environment setup and results. Finally, in Section 5 we draw some conclusions. 2 The Vector Processor Simulator Before describing the Vector Processor Simulator that we developed we will first give a brief description of a vector architecture. Vector architectures [12, 9] (VPs) have the ability to operate with one instruction on sequences of data, i.e. vectors, via vector instructions. On most current vector architectures [9] the vectors are copied from the main memory into vector registers within the processor before they are operated upon. Vector registers are ....

P. M. Kogge. The Architecture of Pipelined Computers. McGraw-Hill, New York, 1981.


Software performance estimation of DSPs for HW/SW partitioning - Auguin Belleudy Gogniat   (Correct)

....methods [11]. Consequently, loops that operate on structured data must be estimated with appropriate techniques. 3.2.1 Estimation of regular loop executions The maximum performance of a processor is achieved when at least one of its functional units is used at each cycle of the loop execution [10]. Functional units included in DSP processors result from characteristics of signal processing algorithms. Since loops are the most time consuming parts of a program, architectures of DSPs are particularly adapted to their execution. Hence, for regular loops that operate on structured set of data, ....

KOGGE P.M., The architecture of pipelined computers. McGraw-Hill, 1981.


Memory Access Synchronization in Vector Multiprocessors - Mateo Valero Montse   (Correct)

....processor linear address space into a pair (m, d) that indicates the module where the address is mapped, m, and the displacement inside the module, d. The term storage scheme refers to the different types of implementation of such correspondence. Vector computers use the interleaved storage scheme [3] (Figure 1) whereas other storage schemes such as skewing have been used in array processors [4] and linear transformations [5] have been used in VLIW systems such as the Cydra [6] and scalar multiprocessors such as the RP3 [7] to improve the performance of the memory system. Fig. 1. Interleaved ....

P.M. Kogge, "The Architecture of Pipelined Computers", McGraw-Hill, New York, 1981.


On Scheduling Cycle Shops: Classification, Complexity And.. - Middendorf, Timkovsky   (Correct)

....counterpart. Periodic problems in this paper are based on the above periodicity concept. We avoid considering periodic problems on identical parallel machines [HM95, M96] and in recurrent job shops [H94] with potential or conjunctive constraints [CC90, H94] originated from pipeline computing [K81]. ON SCHEDULING CYCLE SHOPS 9 2.4. Job characteristics and minimization criteria. In comparison with the classification of Hall et al. [HKS97] for periodic robotic flow shop problems, our classification includes them as well but remains the job characteristics field the same as in the well known ....

P. M. Kogge, The architecture of pipelined computers, McGraw Hill, New York, 1981.


Vector Multiprocessors with Arbitrated Memory Access - Peiron, Valero.. (1995)   (4 citations)  (Correct)

....and to effectively use the available memory bandwidth. For the case of a single vector processor with one memory port and a matched memory system (M = T) several storage schemes have been proposed to efficiently access streams with the most frequent strides. The basic scheme is interleaving [1], in which the module number is obtained from the m lowest bits of the address; this storage scheme allows a minimum latency in order access for streams of odd stride, but results in degraded performance for even strides. Other storage schemes, such as skewing [2] and linear transformations [3] ....
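
The interleaved storage scheme described in this excerpt can be sketched as follows: the module number is taken from the m lowest address bits and the displacement from the remaining bits. The helper names `map_address` and `modules_touched` are hypothetical, and the sketch only counts which modules a strided stream visits; it does not model timing or arbitration.

```python
from typing import Set, Tuple


def map_address(address: int, m_bits: int) -> Tuple[int, int]:
    """Split an address into (module, displacement) using the m lowest bits."""
    module = address & ((1 << m_bits) - 1)
    displacement = address >> m_bits
    return module, displacement


def modules_touched(stride: int, m_bits: int, length: int, base: int = 0) -> Set[int]:
    """Set of module numbers visited by a stream of `length` strided accesses."""
    return {map_address(base + i * stride, m_bits)[0] for i in range(length)}


if __name__ == "__main__":
    m_bits = 3  # 2**3 = 8 memory modules (illustrative value)
    for stride in (1, 2, 3, 8):
        used = modules_touched(stride, m_bits, length=8)
        print(f"stride {stride}: {len(used)} of {1 << m_bits} modules used")
```

With a power-of-two module count, an odd stride visits every module before repeating, while an even stride concentrates accesses on a subset, which matches the degraded performance for even strides that the excerpt notes.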

P. M. Kogge, "The Architecture of Pipelined Computers", McGraw-Hill, New York, 1981.


A Mathematical Formulation of the Loop Pipelining Problem - Cortadella, Badia, Sanchez (1995)   (5 citations)  (Correct)

.... of other authors to propose techniques for loop pipelining with timing constraints [3, 6, 34]. Similar (if not identical) problems are encountered in the area of optimizing compilers for parallel architectures (VLIW, superscalar and superpipelined). Under the general name of software pipelining [21], several techniques have been devised: modulo scheduling [31], URPR [36], Lam's algorithm [22] and perfect pipelining [1]. Loops are usually represented by means of a data dependence graph (DG). Figure 1 shows an example. Vertices represent computational nodes (henceforth called instructions) ....

P. M. Kogge. The architecture of pipelined computers. New York: McGraw-Hill, 1981.


Processing in Memory: Chips to Petaflops - Kogge, Brockman, Sterling, Gao (1997)   (5 citations)  Self-citation (Kogge)   (Correct)

....space just as discussed above. Another example is in many attached signal and vector processors. In the IBM 3838 array processor, for example, the unit had internally several pipelined floating point units which could directly address a relatively large (for its time) block of SRAM storage [15]. A separately programmable unit sat between this memory and the main DRAM memory. Its entire purpose was to manage data transfers between the two memories. Approximately a half dozen data transfer routines matched the transfer patterns for virtually all the vector and array functions implemented ....

Kogge, P. M., The Architecture of Pipelined Computers, McGraw Hill/Hemisphere Press, NY, NY, 1981.


Worst Case Timing Analysis Of Concurrently Executing Dma I/o And.. - Huang (1997)   (1 citation)  (Correct)

No context found.

P. M. Kogge. The Architecture of Pipelined Computers. Hemisphere Publishing Corp., 1981.


Integrated Analysis of Power and Performance For.. - Zyuban, Brooks.. (2004)   (Correct)

No context found.

P. Kogge, The Architecture of Pipelined Computers. Hemisphere Publishing, 1981.


Compiler-Architecture Exploration using - Reservation Tables Generation   (Correct)

No context found.

P. M. Kogge. The Architecture of Pipelined Computers. McGraw-Hill, New York, 1981.


Memory Access Synchronization in Vector Multiprocessors - Mateo Valero Montse (1994)   (Correct)

No context found.

P.M. Kogge, "The Architecture of Pipelined Computers", McGraw-Hill, New York, 1981.


Vector Multiprocessors With Arbitrated Memory Access - Montse Peiron Mateo (1995)   (4 citations)  (Correct)

No context found.

P. M. Kogge, "The Architecture of Pipelined Computers", McGraw-Hill, New York, 1981.


Multimedia Rectangularly and Separably Addressable Memory - Georgi Kuzmanov Georgi   (Correct)

No context found.

P. M. Kogge. The Architecture of Pipelined Computers. McGraw-Hill, 1981.


A 2D Addressing Mode for Multimedia Applications - Kuzmanov, Vassiliadis, van.. (2002)   (Correct)

No context found.

P. M. Kogge. The Architecture of Pipelined Computers. McGraw-Hill, 1981.


Efficient State-Diagram Construction Methods for Software.. - Zhang, al. (1999)   (Correct)

No context found.

Peter M. Kogge. The Architecture of Pipelined Computers. McGraw-Hill Book Co., New York, NY, 1981.
