D. E. Lenoski et al. Design of Stanford DASH Multiprocessor. In Technical Report CSL-TR-89-403. Stanford Univ., December 1989.
....to the processor elements via switching networks. This type of directory cache is not very different from the snoop cache; the only difference is that snoop caches exploit the bus connections between processors and main memory. At the end of the 1980s, new types of directory caches [31, 1, 6] were proposed and developed for use with large-scale DSM machines. To ensure bandwidth between the processor (or a few processors) and the main memory module, new machines handle the processor(s) and memory module as a combined building block of the system. We call such a building block a node of the ....
....cache system: the on-chip caches of the processors (L1 caches) are at level 1, snoop caches (L2 caches) are at level 2, and the memory banks of the nodes are at level 3. Page-level directory schemes such as IVY [32] were used in the MBP systems instead of cache-block-level directory schemes [31], in order to reduce the amount of directory memory and to use translation lookaside buffers (TLBs) to accelerate consistency-preserving operations. The unit of data transmission, however, is the size of an L1 cache block instead of that of a memory page corresponding to a TLB entry. This is to ....
D. E. Lenoski et al. Design of Stanford DASH Multiprocessor. In Technical Report CSL-TR-89-403. Stanford Univ., December 1989.
....machines, the memory is implemented as a single module accessed over the bus, while in most large scale machines the memory is physically distributed among all the processing nodes. Virtually all machines that support the shared memory programming abstraction provide local caches at each processor [1, 15, 16, 18, 19]. Caches automatically replicate memory locations close to the processor and avoid expensive network traversals for most memory accesses. The existence of caches does not change the memory abstraction. The hardware is responsible for maintaining the illusion that all reads and writes take place ....
D. Lenoski, J. Laudon, K. Gharachorloo, A. Gupta, J. Hennessy, M. Horowitz, and M. Lam. Design of the Stanford DASH Multiprocessor. Computer Systems Laboratory TR 89-403, Stanford University, December 1989.
....we explored the implications for applications of this higher latency. We measured the effect of network latency on the operation of Mether Version 1, which provided a strongly coherent memory model. The measurements and their implications are applicable to other systems such as MemNet [5], Dash [6], Ivy [12], and Plus [1]. We determined that latency was a factor that could not be ignored in a networking environment. The result of this research was Mether Version 4, which we used to support the Monte Carlo program described in Section 4. Mether Version 4 provides an extended memory model to ....
Lenoski et al. Design of the Stanford DASH multiprocessor. Technical report, Stanford University, December 1989.
....lower costs than tightly coupled multiprocessors. They can benefit from heterogeneity and there is potential for providing highly available systems since component replication is not as costly as in other architectures. Second, the success of the shared address space abstraction: Previous work [13, 6] has shown that a shared address space can be provided efficiently on tightly coupled hardware DSM systems up to the 128-processor scale, and most vendors are designing hardware cache-coherent machines, targeting both scientific as well as commercial applications. Traditionally, SAN clusters have ....
D. Lenoski, J. Laudon, K. Gharachorloo, A. Gupta, J. Hennessy, M. Horowitz, and M. Lam. Design of the Stanford DASH multiprocessor. Technical Report CSL-TR-89-403, Stanford University, December 1989.
....network latency on the operation of Mether Version 1, which provided a strongly coherent memory model, and we determined that latency was a factor which could not be ignored in a networking environment. The measurements and their implications are applicable to other systems such as MemNet [12], Dash [14], and Ivy [24]. Mether Version 2 provides an extended memory model to applications. The extensions allow programs to make effective use of the processors and network while minimizing difficulties due to latency. Mether presents users with a virtual address space partitioned into pages. A Mether ....
Lenoski et al. Design of the Stanford DASH multiprocessor. Technical report, Stanford University, December 1989.
....that the performance of a Sather program is close to a comparable C program. It is a relatively clean language, offering certain constructs such as strong typing, storage management and class parameterization which are not available in efficient object-oriented languages such as C++. (Footnote 5: [34] and [33] are examples of work in building NUMA multiprocessors. Footnote 6: [44] gives a non-exhaustive list of references of work on parallel debuggers.) Acknowledgements: Thanks to Krste Asanovic, Joachim Beer, Jeff Bilmes, Steve Omohundro, Abhiram Ranade and Heinz Schmidt, who have ....
D. Lenoski, J. Laudon, K. Gharachorloo, A. Gupta, J. Hennessy, M. Horowitz, and M. S. Lam. Design of the Stanford DASH multiprocessor. Technical Report CSL-TR-89-403, Computer Systems Laboratory, Stanford University, December 1989.
.... are gaining in popularity and have been employed in many recent existing or proposed machines, including the Thinking Machines CM2 [Hill85], Intel iPSC and Paragon, Cosmic Cube [Seit85], MIT Alewife [Agar90], Tera supercomputer [Alve90], CMU Intel iWarp [Bork90], and Stanford DASH multiprocessor [Leno89]. The most commonly used direct networks are variants of the k-ary n-cube. Recall that a k-ary n-cube consists of N = k^n nodes, arranged in n dimensions with k nodes per dimension. Figure 4.1 illustrates several different k-ary n-cubes. Each node is connected via a direct link to its nearest ....
Lenoski, D., J. Laudon, K. Gharachorloo, A. Gupta, J. Hennessy, M. Horowitz, and M. Lam, Design of the Stanford DASH Multiprocessor, CSL Technical Report 89-403, Stanford University, December 1989.
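The k-ary n-cube definition quoted in the excerpt above (N = k^n nodes, n dimensions, k nodes per dimension) can be illustrated with a minimal Python sketch. The function names `kary_ncube_nodes` and `neighbors` are hypothetical, and the wraparound (torus-style) links are an assumption, since the excerpt is cut off before it says how nearest neighbors are connected.

```python
from itertools import product


def kary_ncube_nodes(k, n):
    """Enumerate all k**n node coordinates of a k-ary n-cube."""
    return list(product(range(k), repeat=n))


def neighbors(node, k):
    """Nearest neighbors of a node, one step along each dimension.

    Assumption: links wrap around modulo k in every dimension.
    """
    result = set()
    for dim, coord in enumerate(node):
        for step in (-1, 1):
            candidate = list(node)
            candidate[dim] = (coord + step) % k
            result.add(tuple(candidate))
    # For k <= 2 the +1 and -1 steps coincide (or loop back to the node
    # itself when k == 1); the set and the discard handle both cases.
    result.discard(node)
    return sorted(result)


nodes = kary_ncube_nodes(4, 2)   # a 4-ary 2-cube has 4**2 = 16 nodes
print(len(nodes))
print(neighbors((0, 0), 4))
```

With k = 2 the same functions describe a binary hypercube, where each node has exactly n neighbors.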
....1. UMA or Uniform Memory Access multiprocessors, 2. NUMA or Non-Uniform Memory Access multiprocessors, and 3. NORMA or No Remote Memory Access multiprocessors. Much attention has been paid to CC-NUMA (Cache Coherent NUMA) [71] SADM machines, where the data has a permanent home location, such as DASH [52] and Alewife [1], and UMA SADM machines, where the data can migrate among processors and so has no home location, such as KSR1 [64] and DDM [80]. Recently, software coherence on NCC-NUMA machines has been suggested as a possibly more cost-effective approach to large-scale shared memory multiprocessing ....
D. Lenoski, J. Laudon, K. Gharachorloo, A. Gupta, J. Hennessy, M. Horowitz, and M. Lam. Design of the Stanford DASH multiprocessor. Technical Report CSL-TR-89-403, Stanford University, December 1989.
.... proposed machines that make use of direct networks are: Caltech Cosmic Cube [4], Caltech Mosaic [5], CMU Intel iWarp [6, 7], Connection Machine [8], HORIZON [9], Intel iPSC and Paragon, MIT Alewife [10], MIT J-machine [11], MuNet [12], Stanford DASH Multiprocessor [13], Thinking Machines CM2 [8], and Cray T3E [14]. [Fig. 1.1. Network examples. (a) Indirect network. (b) Direct network.] We will focus on multidimensional direct networks with unidirectional or bidirectional links between ....
D. Lenoski et al., "Design of the Stanford DASH multiprocessor," Comput. Syst. Lab. TR 89-403, Stanford Univ., Dec. 1989.
....[Figure 1-3. A shared memory multiprocessor.] ....interface, a shared memory interface usually generates these messages directly in hardware or firmware. Shared memory interfaces in most large multiprocessors use a form of coherence protocol called a directory protocol [4, 71, 66, 25, 130, 69, 2]. Directory protocols maintain a directory entry per memory block that records which processor(s) currently cache the block. On a processor cache miss for a remotely cached block, the shared memory interface sends a coherence message over an interconnect to a directory entry, which often forwards ....
....records whether or not a memory block is idle (that is, no cached copies exist), a writable copy of the block exists, or one or more readable copies of the block exist. To simplify the discussion I only consider a full-map, write-invalidate directory protocol, such as the SGI Origin protocol [69]. A directory entry in such a protocol maintains logical pointers to all caches that hold a valid copy of the block and invalidates all outstanding copies of the block when one processor wishes to write to it. Similarly, a block in a cache is usually in one of three quiescent states: invalid, ....
Daniel Lenoski, James Laudon, Kourosh Gharachorloo, Anoop Gupta, John Hennessy, Mark Horowitz, and Monica Lam. Design of the Stanford DASH Multiprocessor. Technical Report CSL-TR-89-403, Computer Systems Laboratory, Stanford University, December 1989.
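The directory-entry behavior described in the excerpt above (a block is idle, held by one writer, or held by one or more readers, and a write invalidates all other copies) can be sketched as a toy model. This is only an illustration under stated assumptions: the class `DirectoryEntry`, its method names, and the state labels are hypothetical, the sharer set stands in for the "logical pointers", and the sketch does not reproduce the SGI Origin or DASH protocols, which also handle messages, transient states, and races.

```python
from dataclasses import dataclass, field

# Hypothetical labels for the three directory states named in the excerpt.
IDLE, SHARED, EXCLUSIVE = "idle", "shared", "exclusive"


@dataclass
class DirectoryEntry:
    """Toy full-map, write-invalidate directory entry for one memory block."""
    state: str = IDLE
    sharers: set = field(default_factory=set)  # caches holding a valid copy

    def read(self, proc):
        """Record a read by proc; return the caches that lose write permission."""
        downgraded = []
        if self.state == EXCLUSIVE:
            if proc in self.sharers:
                return downgraded  # the writer reads its own copy
            downgraded = sorted(self.sharers)
        self.sharers.add(proc)
        self.state = SHARED
        return downgraded

    def write(self, proc):
        """Record a write by proc; return the caches whose copies are invalidated."""
        invalidated = sorted(self.sharers - {proc})
        self.sharers = {proc}
        self.state = EXCLUSIVE
        return invalidated


entry = DirectoryEntry()
entry.read(0)
entry.read(1)
print(entry.write(2))   # caches 0 and 1 must drop their copies
print(entry.state)
```

A full map keeps one pointer per cache in the system, which is simple to reason about but makes directory storage grow with the number of processors.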
.... that use such networks are the MuNet [12], Ametek 2010 [26], the Caltech Mosaic [3], the MIT J-machine [9], and the CMU Intel iWarp [4]. Some recent distributed shared memory designs are also planning to use low-dimensional direct networks, e.g. HORIZON [18], the Stanford DASH Multiprocessor [20], and the MIT Alewife machine [2, 6]. The choice of the optimal network for a multiprocessor is highly sensitive to the assumptions about system parameters and the constraints that apply on the design. System parameters include, among other factors, message size and the degree of communication ....
D. Lenoski, J. Laudon, K. Gharachorloo, A. Gupta, J. Hennessy, M. Horowitz, and M. Lam. Design of the Stanford DASH Multiprocessor. Computer Systems Laboratory TR 89-403, Stanford University, December 1989.
....machines, the memory is implemented as a single module accessed over the bus, while in most large scale machines the memory is physically distributed among all the processing nodes. Virtually all machines that support the shared memory programming abstraction provide local caches at each processor [1, 15, 16, 18, 19]. Caches automatically replicate memory locations close to the processor and avoid expensive network traversals for most memory accesses. The existence of caches does not change the memory abstraction. The hardware is responsible for maintaining the illusion that all reads and writes take place ....
D. Lenoski, J. Laudon, K. Gharachorloo, A. Gupta, J. Hennessy, M. Horowitz, and M. Lam. Design of the Stanford DASH Multiprocessor. Computer Systems Laboratory TR 89-403, Stanford University, December 1989.
....machines, the memory is implemented as a single module accessed over the bus, while in most large scale machines the memory is physically distributed among all the processing nodes. Virtually all machines that support the shared memory programming abstraction provide local caches at each processor [1, 17, 18, 20, 21]. Caches automatically replicate memory locations close to the processor and avoid expensive network traversals for most memory accesses. [Plot residue omitted; axis label: "Useful Work"] ....
D. Lenoski, J. Laudon, K. Gharachorloo, A. Gupta, J. Hennessy, M. Horowitz, and M. Lam. Design of the Stanford DASH Multiprocessor. Computer Systems Laboratory TR 89-403, Stanford University, December 1989.
....multiprocessors are an important class of parallel computers. Most shared memory multiprocessors accelerate memory accesses using per processor caches. Caches are usually made transparent to software with a cache coherence protocol. Most large shared memory multiprocessors use directory protocols [3, 21, 19]. Directory protocols maintain a directory entry per memory block that records which processor(s) currently cache the block. On a miss, a processor sends a coherence message over an interconnect to a directory entry, which often forwards message(s) to processor(s) currently caching the block, who ....
....records whether or not a memory block is idle (that is, no cached copies exist), a writable copy of the block exists, or one or more readable copies of the block exist. To simplify our discussion we only consider a full-map, write-invalidate directory protocol, such as the SGI Origin protocol [21]. A directory entry in such a protocol maintains logical pointers to all caches that hold a valid copy of the block and invalidates all outstanding copies of the block when one processor wishes to write to it. Similarly, a block in a cache is usually in one of three quiescent states: invalid, ....
Daniel Lenoski, James Laudon, Kourosh Gharachorloo, Anoop Gupta, John Hennessy, Mark Horowitz, and Monica Lam. Design of the Stanford DASH Multiprocessor. Technical Report CSL-TR-89-403, Computer Systems Laboratory, Stanford University, December 1989.
No context found.
D. E. Lenoski et al. Design of Stanford DASH Multiprocessor. In Technical Report CSL-TR-89-403. Stanford Univ., December 1989.
No context found.
D. Lenoski, J. Laudon, K. Gharachorloo, A. Gupta, J. Hennessy, M. Horowitz, and M. Lam. Design of the Stanford DASH multiprocessor. Technical Report CSL-TR-89-403, Stanford University, Palo Alto, California, December 1989.