i860 Microprocessor Family Programmer's Reference Manual, Intel Corporation, 1991.
....assume the processor can perform non-caching loads and stores so that non-unit-stride streams can be accessed without concomitantly accessing extraneous data and wasting bandwidth. While not a common architectural feature, some commercial processors such as the Convex C-1 [Wal85] and Intel i860 [Int91] include such cache bypassing. [Figure 1: Stream Memory Controller] Others, such as the DEC Alpha [DEC92], provide a means of specifying some portions of memory as non-cacheable. 4. Simulation Environment We have simulated a wide ....
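The following Python sketch makes the bandwidth argument concrete. It uses a simplified model that is an assumption and is not taken from the cited papers: memory is filled in whole cache lines of `line_words` words, and a stream touches every `stride`-th word. The helper name `useful_fraction` is hypothetical. Under this model a cached stream uses only part of each line it brings in, whereas a non-caching load fetches just the requested word.

```python
def useful_fraction(stride: int, line_words: int) -> float:
    """Average fraction of fetched words that a strided stream actually uses
    when every access goes through whole-line cache fills.

    Simplified model (assumption): the stream is long, lines are not shared
    with other streams, and no line is reused after it is fetched.
    """
    if stride < 1 or line_words < 1:
        raise ValueError("stride and line_words must be positive")
    # For stride < line_words, a line holds line_words / stride useful words
    # on average, so 1 / stride of the fetched data is used. Once the stride
    # reaches the line size, each fetched line contributes one useful word.
    return 1.0 / min(stride, line_words)


if __name__ == "__main__":
    for s in (1, 2, 4, 8):
        frac = useful_fraction(s, line_words=4)
        print(f"stride {s}: {frac:.2f} of cache-fill bandwidth is useful")
```

With 4-word lines, a stride-8 stream uses only a quarter of the data the cache fetches. In the same model, a non-caching load always has a useful fraction of 1.0.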
i860 XP Microprocessor Data Book, Intel Corporation, 1991.
....Intel 80x196, Intel386 EX, and commercial high-performance processors such as Pentium and Pentium Pro [7] provide on-chip timers driven by the processor clock to implement system clocks with finer time granularity. Computers that do not use such processors usually use an external hardware timer [6], [3] on the processor bus. However, these timers have been known to lose time during operation [19]. For example, setting a new value for the timer usually requires updating a register, but certain operations such as DMA and high-priority interrupts can preempt time-management functions. ....
Microprocessor and Peripheral Handbook, volume II, chapter 6. Intel Corporation, Santa Clara, CA, 1989.
....access ordering requires a small amount of special-purpose hardware, and our static and dynamic access-ordering techniques both require non-caching load instructions. Although rare, these instructions are available in some commercial processors, such as the Convex C-1 [Wal85] and Intel i860 [Int91]. Most current microprocessors (including the DEC Alpha [Dig92], MIPS [Kan92], Intel 80486, Pentium, and i860 [Tab91], and the PowerPC [Mot93]) provide a means of specifying some memory pages as non-cacheable, even though these mechanisms are not generally accessible to the user. Our investigation ....
i860 XP Microprocessor Data Book, Intel Corporation, 1991.
....Aerodynamic Simulation (NAS) Systems Division at NASA Ames Research Center took delivery of one of the first two Intel Touchstone Gamma prototypes, and it became available for testing in January 1990. The Intel Touchstone Gamma system is based on the new 64-bit i860 microprocessor by Intel [4]. The i860 has over 1 million transistors and runs at 40 MHz (the initial Touchstone systems were delivered with 33 MHz processors, but these have since been upgraded to 40 MHz). The theoretical peak speed is 80 MFLOPS for 32-bit floating-point operations and 60 MFLOPS for 64-bit floating-point operations. ....
i860 64-Bit Microprocessor Programmer's Reference Manual, Intel Corporation, Santa Clara, CA, 1990.
....iPSC/860 and will be referred to as the iPSC/860 for the remainder of the paper. For a review of early experiences with the iPSC/860 at Ames Research Center and Oak Ridge National Laboratory, see [1] and [5], respectively. The iPSC/860 system is based on the 64-bit i860 microprocessor by Intel [6]. The i860 runs at 40 MHz (the initial system was delivered with 33 MHz processors, which were upgraded to 40 MHz). The theoretical peak speed is 80 MFLOPS for 32-bit floating-point operations and 60 MFLOPS for 64-bit floating-point operations. There are thirty-two 32-bit integer address registers, and ....
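The quoted peak figures follow from the clock rate multiplied by the number of floating-point results the concurrently running adder and multiplier pipelines can retire each cycle. The snippet does not give those per-cycle throughputs. The values below are an assumption chosen to be consistent with the quoted numbers: for 32-bit operations, each pipeline retires one result per cycle; for 64-bit operations, the adder retires one result per cycle and the multiplier one every two cycles. The function name `peak_mflops` is hypothetical.

```python
def peak_mflops(clock_mhz: float, adder_results_per_cycle: float,
                mult_results_per_cycle: float) -> float:
    """Peak MFLOPS when the adder and multiplier pipelines run concurrently,
    each retiring a fixed number of results per clock cycle."""
    return clock_mhz * (adder_results_per_cycle + mult_results_per_cycle)


# Assumed throughputs (not stated in the quoted text):
#   32-bit: adder 1 result/cycle, multiplier 1 result/cycle
#   64-bit: adder 1 result/cycle, multiplier 1 result every 2 cycles
single = peak_mflops(40, 1.0, 1.0)
double = peak_mflops(40, 1.0, 0.5)
print(single, double)  # prints "80.0 60.0"
```

Under the same assumptions, the original 33 MHz parts would have peaked at 66 MFLOPS for 32-bit operations.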
i860 64-Bit Microprocessor Programmer's Reference Manual, Intel Corporation, Santa Clara, CA, 1990.
....EXPERIMENTAL IMPLEMENTATION In order to demonstrate the viability of dynamic access ordering, we have developed an experimental Stream Memory Controller system. This proof-of-concept version is implemented as a single, semi-custom VLSI integrated circuit interfaced to an Intel i860 host processor [Int91]. The SMC ASIC was designed using VHDL for state-machine specification, Mentor Graphics Corporation's Design Architect for schematic capture, and Cascade Design Automation's Epoch tool for hardware synthesis [Cas93, Men93]. The i860 was selected because it was readily available, and because it ....
i860 XP Microprocessor Data Book, Intel Corporation, 1991.
.... [Figure 2: Example code for dot product, Fortran loop "do 10 i = 1, 1000 / 10 s = s + a(i)*b(i)"; (a) MIPS, (b) MIPS with streaming] [Page header: Smarter Memory = Better Performance: Improving Effective Bandwidth for Streams, draft, p. 5] host processor [2]. Both versions of the SMC ASIC were designed using VHDL for state-machine specification, Mentor Graphics Corporation's Design Architect for schematic capture, and Cascade Design Automation's Epoch tool for hardware synthesis [3, 4]. The i860 was selected because it was readily available, and ....
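The access-ordering idea behind the SMC can be illustrated with the dot-product loop from Figure 2, which reads two unit-stride streams, a and b. The sketch below is a simplified model and not the SMC's actual ordering policy. It assumes that each stream lives in a different DRAM page, so every switch between streams forfeits a fast same-page access. It compares the loop's natural a(i), b(i) interleaving with a hypothetical order that fetches `depth` elements of one stream before switching to the other. All function names are illustrative.

```python
from typing import List, Tuple

Access = Tuple[str, int]  # (stream name, element index)


def natural_order(n: int) -> List[Access]:
    """Order in which the dot-product loop issues loads: a(i), b(i), a(i+1), ..."""
    order: List[Access] = []
    for i in range(n):
        order.append(("a", i))
        order.append(("b", i))
    return order


def blocked_order(n: int, depth: int) -> List[Access]:
    """Hypothetical reordering: fetch `depth` consecutive elements of a,
    then the same elements of b, and repeat until both streams are done."""
    if depth < 1:
        raise ValueError("depth must be positive")
    order: List[Access] = []
    for start in range(0, n, depth):
        block = range(start, min(start + depth, n))
        order.extend(("a", i) for i in block)
        order.extend(("b", i) for i in block)
    return order


def stream_switches(order: List[Access]) -> int:
    """Count consecutive accesses that go to different streams."""
    return sum(1 for prev, cur in zip(order, order[1:]) if prev[0] != cur[0])


print(stream_switches(natural_order(1000)))     # every consecutive pair switches
print(stream_switches(blocked_order(1000, 32)))
```

Both orders touch exactly the same elements. Only the sequence changes, and deeper blocks (longer FIFOs, in the SMC's terms) mean fewer stream switches.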
i860 XP Microprocessor Data Book, Intel Corporation, 1991.
....we assume the processor can perform non-caching loads and stores so that non-unit-stride streams can be accessed without concomitantly accessing extraneous data and wasting bandwidth. While not a common architectural feature, some commercial processors such as the Convex C-1 [Wal85] and Intel i860 [Int91] include such cache bypassing. [Figure 1: Stream Memory Controller] [Page header: Hardware Support for Dynamic Access Ordering: Performance of Some Design Options, p. 7] Others, such as the DEC Alpha [DEC92], provide a means ....
i860 XP Microprocessor Data Book, Intel Corporation, 1991.
....enable element reuse in later implementation phases. In order to rapidly validate the SMC concept, we have chosen to add an SMC system to an existing microprocessor system. The Intel i860 was selected for its support of vector operations and non-cacheable floating-point load and store instructions [13], which will be used to access stream operands. The disadvantage of this approach is that the stream buffers will be external to the processor and will therefore incur a higher access cost than the internal cache. However, accesses to the stream buffers should be fast enough that using the ....
i860 XP Microprocessor Data Book, Intel Corporation, 1991.
....we assume the processor can perform non-caching loads and stores so that non-unit-stride streams can be accessed without concomitantly accessing extraneous data and wasting bandwidth. While not a common architectural feature, some commercial processors such as the Convex C-1 [Wal85] and Intel i860 [Int91] include such cache bypassing. Others, such as the DEC Alpha [DEC92], provide a means of specifying some portions of memory as non-cacheable. 4. Simulation Environment In order to validate the SMC concept, we have simulated a wide range of SMC configurations and benchmarks, varying FIFO depth, ....
i860 XP Microprocessor Data Book, Intel Corporation, 1991.
....machine with additional hardware support for overlapping loop iterations [5]. It relies on software-pipelining techniques in the compiler to take advantage of the special overlapped-loop support. The Cydra 5 has a 256-bit instruction word capable of controlling seven functional units. The i860 [13] from Intel was introduced in 1989. It has two features that warrant inclusion in the list. First, it has a dual-instruction mode in which two instructions are executed at the same time. Second, it has some special dual-operation instructions that manipulate both the adder and multiplier ....
....essential element in VLIW architectures. It is also in fairly wide use. Perhaps most importantly, though, we were able to use the Portland Group C compiler, which can software-pipeline many loops. 5.1.1. Architectural Overview According to the i860 programmer's manual [13], instructions can be grouped into two classes. The first class consists of what are called core instructions, which are executed by the integer unit. Core instructions include integer arithmetic, loads and stores between memory and registers (including floating-point registers), control ....
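The dual-instruction mode described above issues a core instruction and a floating-point instruction in the same cycle. The toy Python model below counts issue cycles for an in-order stream in which an adjacent core/floating-point pair may issue together. It is a hypothetical illustration, not the i860's actual issue logic. It ignores dependencies, latencies, and any encoding, alignment, or mode-switching constraints the real processor imposes. The function name `dual_mode_cycles` is hypothetical.

```python
from typing import List


def dual_mode_cycles(kinds: List[str]) -> int:
    """Count issue cycles for an in-order stream where a core instruction and
    an adjacent floating-point instruction may issue in the same cycle.

    `kinds` holds "core" or "fp" for each instruction (simplified model).
    """
    cycles = 0
    i = 0
    while i < len(kinds):
        nxt = kinds[i + 1] if i + 1 < len(kinds) else None
        if nxt is not None and {kinds[i], nxt} == {"core", "fp"}:
            i += 2  # one core and one FP instruction issue together
        else:
            i += 1  # the instruction issues alone
        cycles += 1
    return cycles


print(dual_mode_cycles(["core", "fp", "core", "fp"]))
print(dual_mode_cycles(["core", "core", "fp"]))
```

The model shows why the compiler's scheduling matters here. A stream that alternates core and floating-point work can issue in half as many cycles as its length, whereas runs of the same class gain nothing.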
i860 64-Bit Microprocessor Programmer's Reference Manual, Intel Corporation, 1990.
....Experimental Implementation In order to demonstrate the viability of dynamic access ordering, we have developed an experimental Stream Memory Controller system. This proof-of-concept version is implemented as a single, semi-custom VLSI integrated circuit interfaced to an Intel i860 host processor [Int91]. The SMC ASIC was designed using VHDL for state-machine specification, Mentor Graphics Corporation's Design Architect for schematic capture, and Cascade Design Automation's Epoch tool for hardware synthesis [Cas93, Men93]. The i860 was selected because it is both readily available and provides ....
i860 XP Microprocessor Data Book, Intel Corporation, 1991.
....branch instructions can be in the pipe at once. Graphics support. None. Communication support. No support for message passing in a distributed-memory model. Technology. Available in a 527-pin PGA, fabricated in 0.5-micron CMOS technology with 6 million transistors, and runs at 3.3 V. i860 [16, 17, 18] Noteworthy instructions. A FLUSH instruction empties the write-back data cache by testing the dirty bit of each cache line and writing to memory the lines whose dirty bit is set. The execution time of this instruction depends on the number of dirty lines in the cache. Prefetching of floating-point ....
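According to the excerpt, FLUSH tests every line but writes back only the dirty ones. Its cost therefore has a part that grows with the number of lines examined and a part that grows with the number of dirty lines. The minimal Python model below captures that structure. The function `flush_cost` and the per-line cycle counts are hypothetical and are not i860 timings.

```python
from typing import List, Tuple


def flush_cost(dirty_bits: List[bool], test_cycles: int,
               writeback_cycles: int) -> Tuple[int, int]:
    """Model a FLUSH that tests every line's dirty bit and writes back the
    dirty lines. Returns (lines written back, total cycles).

    test_cycles and writeback_cycles are hypothetical per-line costs.
    """
    written = 0
    cycles = 0
    for dirty in dirty_bits:
        cycles += test_cycles  # every line's dirty bit is examined
        if dirty:
            cycles += writeback_cycles  # only dirty lines go to memory
            written += 1
    return written, cycles


print(flush_cost([True, False, True, False], test_cycles=1, writeback_cycles=10))
```

A clean cache still pays the per-line test cost, while each dirty line adds a write-back. This is why the instruction's execution time depends on the number of dirty lines.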
i860 64-bit Microprocessor Hardware Design Guide. 1989, Intel Corporation.
.... communication transactions, the latency and bandwidth of the interconnection networks pose the fundamental limit to the scalability of an MPP [13]. The hierarchical ring interconnect of the KSR1, the fat-tree network in the Thinking Machines CM-5 [4], and the mesh network in the Intel Paragon [3] are very different structurally and have different characteristics. Since application performance on MPP systems is greatly affected by the communication capabilities of the architectures, it is essential to evaluate the specific communication requirements of target applications on alternate MPP ....
"Paragon XP/S Product Overview," Intel Corporation, Hillsboro, OR, 1991.
....prevent many unnecessary disk accesses, and prefetching the file sequentially can improve throughput. Application programming interfaces (APIs) for current commercial and experimental file systems provide limited support for application specification of global access patterns. In Intel's PFS [8], the programmer can select one of five modes that describe common global access patterns. MPI-IO [23], a proposed high-level API for parallel input/output, allows a developer to specify arbitrarily complex access patterns. IBM's PIOFS also provides an interface to describe file views. To provide ....
....the performance improvements from global classification, we show results from benchmarks and parallel applications from the Scalable I/O Initiative application suite on the Intel Paragon XP/S. We used PPFS as an intermediary between the application requests and the underlying Intel PFS file system [8]. PFS is a parallel file system that stripes data over disks on I/O nodes using a default 64 KB stripe size. In normal usage, applications provide access-pattern information by specifying PFS modes and have limited control over I/O-node buffering. In our hybrid system, our ....
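Striping means that consecutive 64 KB pieces of a file land on different I/O nodes, so a single large request can engage several disks at once. The Python sketch below assumes that stripes are assigned to I/O nodes round-robin starting from node 0. That assignment is an assumption; the excerpt states only the default stripe size. It maps a byte offset to its I/O node and lists the nodes a read touches. Both function names are hypothetical.

```python
from typing import Set

STRIPE_SIZE = 64 * 1024  # default PFS stripe size quoted in the excerpt


def io_node_for_offset(offset: int, n_io_nodes: int,
                       stripe_size: int = STRIPE_SIZE) -> int:
    """I/O node holding the byte at `offset`, assuming round-robin
    placement of stripes starting at node 0 (assumption)."""
    if offset < 0 or n_io_nodes < 1 or stripe_size < 1:
        raise ValueError("need offset >= 0, n_io_nodes >= 1, stripe_size >= 1")
    return (offset // stripe_size) % n_io_nodes


def nodes_touched(offset: int, length: int, n_io_nodes: int,
                  stripe_size: int = STRIPE_SIZE) -> Set[int]:
    """Set of I/O nodes a read of `length` bytes starting at `offset` touches."""
    if length <= 0:
        return set()
    first = offset // stripe_size
    last = (offset + length - 1) // stripe_size
    # n_io_nodes consecutive stripes already cover every node, so stop there.
    last = min(last, first + n_io_nodes - 1)
    return {s % n_io_nodes for s in range(first, last + 1)}


print(io_node_for_offset(200_000, n_io_nodes=16))  # prints 3
print(sorted(nodes_touched(65_000, 1_000, n_io_nodes=4)))
```

In this model, a small read that straddles a stripe boundary already involves two I/O nodes, and any read of at least `n_io_nodes` stripes touches all of them.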
Paragon XP/S Product Overview. Intel Corporation, 1991.
....20.0 Ring 1000 50.0 126 18 [7] MIT J-Machine 12.5 4 Theta 4 Theta 2 Mesh 3200 256.0 7 N/A 7 [36] MIT M-Machine #100.0 4 Theta 4 Theta 2 Mesh 12800 128.0 10 154 21 [16, 23] Intel Delta 40.0 4 Theta 8 Mesh 216 5. 4 15 N/A 10 [13, 34] Intel Paragon 50.0 4 Theta 8 Mesh 2800 56.0 12 N/A 10 [22, 34] Stanford DASH 33.0 2 Theta 4 480 14.5 31 120 30 [28] (4-processor clusters) Stanford FLASH 200.0 4 Theta 8 Mesh 3200 16.0 62 352 40 [19] Wisconsin T0 #200.0 none simulated N/A N/A 200 1461 40 [39, 37] Wisconsin T1 #200.0 none simulated N/A N/A 200 401 40 [39, 37] Cray T3D 150.0 4 Theta 2 Theta ....
Paragon XP/S product overview. Intel Corporation, 1991.
....multiprocessors, shared memory, message passing, bulk transfer, iterative solution, irregular, sparse matrix 1 Introduction Distributed shared memory and message passing are two dominant communication mechanisms in parallel systems. Most machines incorporate either message passing [Thi93a, Int91, Che93] or shared memory [LT88, LLG+92, BFKR92, SGI94]. A few machines support both mechanisms [ABC+95, HKO+94]. The precise benefit of shared memory versus message passing has remained an open question, largely due to the difficulty of finding experimental platforms supporting ....
Paragon XP/S product overview. Intel Corporation, 1991.
....For instance, an instruction set that gives the user flexibility in how memory is accessed makes it easier to exploit features of the memory system through access ordering. Although not a common feature, non-caching loads are supported by some processors (as in the Convex C-1 [Wal85] or Intel i860 [Int91]). Others allow some regions of memory to be designated as non-cacheable (as in the DEC Alpha [DEC92]). Still other systems include special-purpose hardware for streams (as in the Cray T3D [Cra95], the WM [Wul92], the Stream Memory Controller [McK95c], or the hardware proposed by Valero, et al. ....
....won't permit that degree of optimization. The graph on the right presents the same information in terms of the average number of cycles per stream access. 3. Experimental Platforms Our investigations of compile-time access ordering target three different platforms: a node of the Intel iPSC/860 [Int91], the University of Virginia's Stream Memory Controller system [McK95c], and a node of the Cray T3D [Cra95]. Some preliminary background on the relevant features of each memory system is helpful in understanding how changing the order of memory references affects performance. 3.1 Intel iPSC/860 ....
i860 XP Microprocessor Data Book, Intel Corporation, 1991.
....each additional byte (or any appropriate data unit). Such a fully connected model addresses emerging trends in many modern distributed-memory parallel computers and message-passing communication environments. These trends are evident in systems such as the Thinking Machines CM-5 [36], Intel's Paragon [38], NCUBE's nCUBE 2 [34], MIT's J-Machine [20], IBM's Vulcan [10, 39], and the recently announced IBM Scalable POWERparallel System 1 (SP1), and in environments such as Express [37], PARMACS [31], PICL [29], Zipcode [35], and Venus [4]. These systems and environments generally ignore the specific ....
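The excerpt begins mid-definition, so the model's exact parameters are not visible. Assuming it is the common linear cost model, in which a message pays a fixed startup cost plus a cost for each additional byte and every node pair sees the same parameters (the "fully connected" part), transfer time and the message length at which half the asymptotic bandwidth is reached can be sketched in Python as follows. That length is often written n_1/2. The parameter values are hypothetical and not measurements of any machine listed above.

```python
def message_time(n_bytes: int, startup: float, per_byte: float) -> float:
    """Point-to-point transfer time under a linear cost model: a fixed
    startup cost plus a cost for each additional byte. In a fully connected
    model, every pair of nodes sees the same two parameters."""
    if n_bytes < 0:
        raise ValueError("n_bytes must be non-negative")
    return startup + per_byte * n_bytes


def half_bandwidth_length(startup: float, per_byte: float) -> float:
    """Message length at which the effective bandwidth n / message_time(n)
    reaches half of its asymptotic value 1 / per_byte."""
    return startup / per_byte


# Hypothetical parameters in microseconds, for illustration only.
STARTUP_US = 100.0
PER_BYTE_US = 0.5
print(message_time(1024, STARTUP_US, PER_BYTE_US))
print(half_bandwidth_length(STARTUP_US, PER_BYTE_US))
```

Messages much shorter than n_1/2 are dominated by startup cost, which is why such models distinguish the startup term from the per-byte term.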
Paragon XP/S Overview, Intel Corporation, 1991.
No context found.
Microprocessor and Peripheral Handbook, Vol. II, Peripheral, Intel Corporation, Santa Clara, California, 1989.
No context found.
i486 Microprocessor Databook, Intel Corporation, Santa Clara, California, 1989.
No context found.
i860 Microprocessor Family Programmer's Reference Manual, Intel Corporation, 1991.
No context found.
i860 XP Microprocessor Data Book, Intel Corporation, 1991.
No context found.
i860 XP Microprocessor Data Book, Intel Corporation, 1991.
No context found.
i860 XP Microprocessor Data Book, Intel Corporation, 1991.