Patterson D.A. & Hennessy J.L. (1994) Computer Organization and Design: The Hardware/software Interface. Morgan Kaufmann, San Mateo, California. 648 p.
....I/O instructions are used to transfer data one byte at a time to the NIC; memory-mapped I/O, where communication is done through a memory window on the NIC that is mapped to host memory; and Direct Memory Access (DMA), where the NIC independently copies data directly from host memory [2]. The model used by nearly all new NICs is an extension of DMA called scatter-gather DMA. In scatter-gather DMA, the device driver passes a list describing the fragments of which the packet is composed. For example, the packets may consist of two fragments, protocol headers generated by the kernel ....
Patterson D.A. & Hennessy J.L. (1994) Computer Organization and Design: The Hardware/software Interface. Morgan Kaufmann, San Mateo, California. 648 p.
....results must show whether preloading and locking instructions in the cache makes the system predictable, and also whether the proposed scheme obtains performance similar to that of traditional caches (direct mapped or set associative) with an LRU or pseudo-LRU replacement algorithm. To run the experiments, the SPIM tool [15], a MIPS R2000 simulator, is used. SPIM includes neither a cache nor multitasking, so the original version of SPIM has been modified to include an instruction cache and multitasking (simulated and controlled by the simulator and not by the O.S.), and to obtain execution times. ....
D. Patterson and J. L. Hennessy. "Computer Organization and Design. The Hardware/Software Interface". Morgan Kaufmann. San Mateo, 1994.
....Computer Architecture concepts are usually analyzed theoretically, leaving the students with incomplete and sometimes erroneous views of how a computer works. These misconceptions remain in higher-level courses, making thorough learning in the area difficult. Computer organization literature [1, 2, 3, 4, 5] usually attacks the complexity of computer systems by using several layers to describe them. Each layer describes one abstraction level, providing higher insight when analyzing a given subsystem. These levels usually include assembly language, instruction sets, microprogramming and digital logic. ....
....if( time = preparationTime = time ; Model IncDec: externalFunction( const ExternalMessage msg ) Check the input ports, assigning the input values. if( msg.port( OP0 ) OP[0] int) msg.value( if( msg.port( OP1 ) OP[1] int) msg.value( if( msg.port( OP2 ) OP[2] = int) msg.value( if( msg.port( OP3 ) OP[3] int) msg.value( if( msg.port( OP4 ) OP[4] int) msg.value( if( msg.port( FCOD ) FCOD = int) msg.value( if ( FCOD = 1) Increment for (int i=0; i =4; i ) v[4 i] OP[i] alu.activate(v, 00000 , 11 , 1 ) ....
[Article contains additional citation context not shown here]
PATTERSON, D. "Computer Organization and Design: the Hardware/Software Interface". 2nd. Edition. University of California, Berkeley. 1995.
....test cases traversing the if-else regardless of the safety of the if-else construct itself. Given the safety of potential expressions, we proceed to prove that if-else is a safe construct with respect to DejaVu. An if-else statement has at least one clear interpretation, as given in Figure 7 [20] (footnote 14). Assuming that this format of the if-else is the only format produced by the compiler, and ignoring the existence of additional else if clauses for brevity, we can readily specify the behavior of the if-else [Footnote 14: The figure uses the MIPS assembly language. Equivalent code could be generated ....]
D. A. Patterson, J. L. Hennessy, and D. A. Peterson. Computer Organization and Design: The Hardware/Software Interface. Morgan Kaufmann Publishers, San Francisco, CA, 1997.
....the density and performance of semiconductor technology. This has brought faster microprocessors along with larger and faster primary memory devices. Improvements in secondary storage systems have not kept pace. The performance of RISC microprocessors has been increasing by more than 50% per year [18]; disk transfer rates have only improved by about 20% each year [5]. This has transformed many CPU-bound applications to I/O-bound. Amdahl [2] predicted three decades ago that, unless accompanied by corresponding increases in secondary storage performance, substantial increases in microprocessor ....
....14] For the stripe write and the read-modify-write, it is crucial that a method be available for caching or buffering the contents of the disks accessed within the stripe. We assume throughout the presence of a buffer which is capable of holding the contents of the stripe across all disks (see [18]); otherwise, writes of each disk within the stripe must be treated as independent, and in general check disks are written and rewritten for each information disk written (see [7, 21]). 2.3 Writes in multiple erasure systems. When multiple erasure codes are used, analogues of the stripe write and ....
D. A. Patterson and J. L. Hennessy. Computer Organization and Design: The Hardware/Software Interface. Morgan Kaufmann, San Mateo, California, 1994.
....by the OOTI course. Chapter 4: Disk Measurements. In the project startup phase, it was found that not all hardware performance figures which were relevant for CMS storage management could be found in literature. It turned out that the standard sources, like [15], [16], [17], [18] and [19], did not contain enough information to accurately predict performance in some important disk-bound HEP data access scenarios. Also, it was found that none of the CMS and RD45 members of CMS were already performing, or planning to perform, a detailed study of disk-bound data ....
J. L. Hennessy, D. A. Patterson. Computer organization and design: the hardware/software interface. San Mateo, Morgan Kaufmann, 1994. ISBN 1-55860-281-X.
....data arrives, if the cache is full, some element within the cache must be ejected so that the new data can enter. The decision as to which element is ejected is termed the cache's replacement policy. In all experiments in this work, the replacement policy is true Least Recently Used (LRU) [31]. 5.1.1 Experimental Configurations. In order to simplify the analysis, the experiments are split into two primary configurations: first, a system where the memory macro acts as a page space, which is defined as an area in memory capable of accepting relatively large contiguous chunks of memory ....
....The measured translation cost represents the upper bound on overhead, as it assumes that a given node must ask the directory each time an off-chip communication is necessary. 6.3.1 Interconnection Networks. The three interconnection networks simulated are: a 2-D mesh, a ring, and a binary hypercube [31]. Each network is assumed to consist of up to 2048 nodes. Thus, the mesh used is 32x32, the ring is 2048 around, and the hypercube is 10-dimensional. All communications are presumed to have occurred ideally given the routing algorithm. On the ring, this means that the choice of going left or ....
David A. Patterson and John L. Hennessy. Computer Organization and Design: The Hardware/Software Interface, 2ed. Morgan Kaufmann Publishers, 1997.
....designed the user interface of these two tools. Their integration into a real programming environment for Scheme is detailed in the paper. 1. INTRODUCTION. Since CPU speeds continue to increase faster than memory speeds, memory is and will increasingly be the factor that limits performance [8]. Excessive memory use has two drawbacks: The program itself makes less effective use of the higher layers in the memory hierarchy, and it may interfere with other processes running on the same machine. Functional languages have a reputation for memory consumption. In order to deliver fast ....
D. Patterson and J. Hennessy. Computer Organization and Design The hardware/software interface. Morgan Kaufmann, 2nd edition, 1998.
....whether or not a given algorithm is efficient. One disadvantage of the simulation tool is that it is very slow; on the other hand, one only needs to run a simulation once, since there are no fluctuations due to the machine's workload. 9.1 Cache model. A quadruple is used to describe a cache ([PH94], [IBM94]): Q = (n, w, s, a), where n is the number of cache lines, w the width of each cache line, s the number of splits for each cache line, and a the associativity (number of congruence classes). The size of the cache is nw bits, or nw/8 bytes, or nw/64 double precision floating point ....
David A. Patterson and John L. Hennessy. Computer Organization and Design The Hardware/Software Interface. Morgan Kaufmann Publisher, 1994.
....technology. With this progress came faster microprocessors as well as larger and faster primary memory devices. Improvements in secondary storage systems, on the other hand, have not kept pace. While the performance of RISC microprocessors has been increasing by more than 50% per year [60], disk transfer rates, which depend on the speed of mechanical movements and magnetic media densities, have only improved by about 20% each year [53]. This phenomenon has transformed many computationally bound applications to being I/O-bound. Indeed, Amdahl [51] predicted about three decades ago ....
D.A. Patterson & J.L. Hennessy, Computer Organization and Design: The Hardware/Software Interface, Morgan Kaufmann, San Mateo, Ca. (1994).
....subsections, we will briefly describe each step in the design process. 2.1 Defining goals. At the beginning of the project, we defined the architectural goals of the processor. The starting point for the design team was the book we followed in a previous computer design and organization course [1]. To give us a better idea of what some of the realistic goals were, we studied the architecture of several modern CPUs: Intel Pentium(TM) [2], DEC Alpha(TM) [3], MIPS R4000(TM) [4], Motorola 68040(TM) [5], HP's Precision(TM) [6] and IBM/Motorola PowerPC 601(TM) [7]. We compared the design ....
....series. Subsequently, we have also learned of the implementation of the P6 architecture [8]. 2.2 Specification of the Processor. Our CPU included a simple instruction set which can be easily implemented as a superscalar, superpipelined machine. We borrowed the instruction set from MIPS R2000 [1]. We enhanced the MIPS R2000 instruction set with direct memory operand instructions for all R-type instructions. The processor was also to include separate data and instruction caches, each of 8KB. More specification details are provided in Appendix B. 2.3 Layout of architecture. Having ....
[Article contains additional citation context not shown here]
D. A. Patterson and J. L. Hennessy, "Computer Organization and Design - The Hardware/Software Interface," Morgan Kaufmann, 1994.
....the distinguishing prefixes [4]. As discussed in [2], multiple-word integers are not an exotic special case. For example, the IEEE 754 floating point standard is designed so that the ordering of floating point numbers can be deduced by perceiving their representations as multiple-word integers [11]. Finally, it should be mentioned that we do not need to know n in advance. In [14, Proof of Theorem 4], based on standard doubling techniques, the details are given on how to dynamically increase and decrease the capacity of priority queues. 2 The general reduction. This section is devoted ....
J.L. Hennessy and D.A. Patterson, Computer Organization and Design: the Hardware/Software Interface, Morgan Kaufmann, San Mateo, CA, 1994.
....Spawn and multioperand instructions such as prefix-sum. Using standard C scoping, the Joins are implicit. A Join implementation comprises a parallel sum operation for monitoring the number of terminating threads. The XMT instruction set extends the standard serial MIPS instruction set (see [HP94]). The table describes a few new instructions. A fuller specification for the explicit parallel instruction set (also referred to as spawn-join instruction set) can be found in the extended summary version of [VDBN98]; it describes the new instructions and how they can be efficiently implemented ....
Hennessy J.L., Patterson D.A., Computer Organization and Design - The Hardware / Software Interface. Morgan/Kaufmann. 1994.
....is more fine-grained than the algorithm level) could need less tight synchronization than lock-step; this could lead to better performance in our model. 2.1.2. The instruction set level. The instruction set is based on a standard (serial) instruction set (we chose the MIPS version used in [PH94]) with a few additions. Highlights of the instruction set: (i) Transition from the serial to the parallel state is done by the Spawn instruction and a thread terminates when it hits a Join instruction, for a transition back into the serial state. These Spawn and Join instructions are meant to be ....
....incrementing y. Once y reaches n, a transition into serial state occurs and n gets the size of the compacted array. Finally, note that the program remains correct subject to IOS semantics. Instruction set level For simplicity, we provide next an informal instruction code. For definitions see [PH94]. Instruction Cycle no: Comments : li R1,0 1 move R2, Rn 1 Spawn R3,0,R2,4 2 li R1 ,1 3 lwa R2 ,B OFF(R0) 4[R0 ] 1 Gamma 5 prefetch lw R3 ,C OFF(R2 ) 6 Gamma 10 bne R3 ,R1 ,L 11 PS R1,R1 12 lwa R4 , A OFF(R0) 4[R0 ] 8 Gamma 12 prefetch swa R4 , D OFF(R0) 4[R1 ] 13 L : Join ....
D.A. Patterson and J.L. Hennessy. Computer Organization and Design: The Hardware/Software Interface. Morgan Kaufmann, San Mateo, California, 1994.
....summation, a parallel sum immediate and a parallel mark are also added to the machine. We also proposed to implement the join instructions by using the parallel sum immediate hardware. 2.2 General Comments. 1. The instruction set is based on the MIPS instruction set, presented in appendix A of [1]. 2. Instructions NOT mentioned here are the same (except maybe in the bit count of each field) as in MIPS. 3. Comments in the assembler files begin with a sharp pound (#) sign. 4. The instruction size has to be larger than 32 bits (as used in MIPS) due to additional information that has to be ....
Hennessy J.L., Patterson D.A., Computer Organization and Design - The Hardware / Software Interface. Morgan/Kaufmann. 1994.
....from the high-level code (C and extended C) to the optimized assembly code, which has actually been done manually, is feasible with known compiler techniques. 6. The Instruction Set. The explicit parallel instruction set is based on a standard (serial) instruction set (we used the MIPS version in [HP94]) with some additions. It includes a set of instructions that implement these new ideas and was appropriate for the problems we studied. In [VDBN98], we described the new ideas and their implementation in the instruction set. Since a key feature of an instruction set model is implementability, we ....
Hennessy J.L., Patterson D.A., Computer Organization and Design - The Hardware / Software Interface. Morgan/Kaufmann. 1994.
....already stored in R will be lost during the execution of M1 [62, pp. 15-17]. Additionally, if M1, in turn, calls another macro, say M2, then the original caller of M1 must take into consideration even similar restrictions that perhaps concern M2. • The MIPS assembly instruction set [32, 81] contains synthetic instructions that the assembler may expand into a couple of machine instructions. Sometimes the assembler needs a temporary register for storing intermediate results; modularity is still guaranteed since one of the hardware registers is reserved for the exclusive use of ....
D. A. Patterson and J. L. Hennessy. Computer Organization and Design: The Hardware/Software Interface. Morgan Kaufmann, San Mateo (California, USA), 1994.
....a fixed number of bits for the exponent followed by a fixed number of bits for the mantissa, where the exponent is expressed using biased notation. This is because lexicographic ordering of the binary strings representing numbers in this way is consistent with the numerical order of these numbers [26]. Similarly, our algorithm can be used for sets of rational numbers with numerators and denominators at most $k \in O(b)$ bits long, if each rational number $x/y$, where $x \in [0, 2^k - 1]$ and $y \in [1, 2^k - 1]$, is represented by the $3k$-bit binary representation of the integer $\lfloor x 4^k / y \rfloor$. To see why, note ....
J. L. Hennessy and D. A. Patterson. Computer Organization and Design: The Hardware/Software Interface. Morgan Kaufmann, San Mateo, CA, 1994.
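The ordering property quoted in the context above (biased exponent followed by mantissa, so that comparing bit strings agrees with comparing values) can be checked with a minimal sketch. The helper name `float_bits` and the sample values are illustrative assumptions, not taken from any of the citing papers, and the check is restricted to non-negative, non-NaN doubles, since the sign bit needs separate handling.

```python
import struct


def float_bits(x: float) -> int:
    """Return the IEEE 754 double-precision bit pattern of x as an unsigned 64-bit integer.

    Hypothetical helper for illustration only.
    """
    # Pack as a big-endian double, then reinterpret the same 8 bytes as an unsigned integer.
    return struct.unpack(">Q", struct.pack(">d", x))[0]


# Assumption: only non-negative, non-NaN values. For these the sign bit is 0,
# so integer order of the bit patterns matches numeric order.
values = [3.5, 0.0, 1e-300, 2.0, 1e300, 0.1]

by_value = sorted(values)                  # ordinary numeric sort
by_bits = sorted(values, key=float_bits)   # sort by integer view of the bits

print(by_value == by_bits)
```

Sorting by the integer view is what lets integer-based priority queues and sorting algorithms be applied to floating point keys, which is the use the citing papers allude to.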
....and memory architecture. Table 3 shows that the bandwidth needed for operand access is very high and the absolute addressing mode is most frequent. The separated bus of Harvard Architecture may be useful since this frequency is very high compared to the applications of general purpose processors [19]. To maximize the bandwidth of the separated bus, a balanced pipeline should be implemented. The size of the immediate operand is also analyzed to determine the size of the constant field. Table 4 shows that 80 per cent of immediate operands are within 8 bits and 90 per cent of immediate ....
....[Fig. 9. Bit processing unit] 3.3.4 Word processing unit. Three models are studied for the architecture of the word processing unit: stack architecture, register set architecture, and accumulator architecture [19]. Shimokawa et al. suggested a RISC processor of which the word processing unit is a kind of stack architecture [9] and Rho et al. proposed a RISC processor of which the word processing unit is a kind of register set architecture [20]. Operations are performed on internal registers in these two ....
D. A. Patterson, and J. L. Hennessy, Computer Organization and Design: The Hardware/Software Interface, Morgan Kaufmann Publishers Inc., 1994.
....parallel scalability than the parallel binary exchange approach. 5. Analysis for a Coarse-grained Data Distribution. The degree of parallelism indicates the extent to which a parallel program matches the parallel architecture. Speedup captures the performance gain when utilizing a parallel system [21]: True speedup is defined as the ratio of the time required to solve a problem on a single processor, using the best known sequential algorithm, to the time taken to solve the same problem using P identical processors [21]. For the relative speedup the sequential time to be used is ....
D. Patterson and J. Hennessy, Computer Organization and Design: the hardware/software interface, Morgan Kaufmann, San Francisco, CA, 1994.
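The definition of true speedup quoted in the context above can be written as a single ratio. The symbols are assumptions introduced here for illustration: $T^{*}_{\mathrm{seq}}$ is the single-processor time of the best known sequential algorithm and $T_P$ is the time on $P$ identical processors.

```latex
S_{\mathrm{true}}(P) = \frac{T^{*}_{\mathrm{seq}}}{T_{P}}
```

For instance, under these assumed symbols, a problem that takes 120 s sequentially and 20 s on 8 processors has $S_{\mathrm{true}}(8) = 120/20 = 6$.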
No context found.
D. A. Patterson and J. L. Hennessy. Computer Organization and Design: The Hardware/Software Interface, second edition, Morgan Kaufmann Publishers, Inc., 1998.
No context found.
David A. Patterson and John L. Hennessy. Computer Organization and Design: the hardware/software interface. Morgan Kaufmann Publishers, Inc, San Mateo, California, 1994.
No context found.
D. Patterson, and J. Hennessy, Computer Organization and Design: The Hardware/Software Interface, Morgan Kaufmann, San Mateo, Calif., 1994; http://www.mkp.com.
No context found.
David Patterson and John Hennessy, Computer Organization and Design: The Hardware / Software Interface, Morgan Kaufmann Publishers, 1994.
No context found.
David Patterson and John Hennessy, Computer Organization and Design: The Hardware / Software Interface, Morgan Kaufmann Publishers, 1994.