D. E. Lenoski, J. Laudon, K. Gharachorloo, W.-D. Weber, A. Gupta, J. Hennessy, M. Horowitz, and M. S. Lam, "The Stanford Dash multiprocessor," IEEE Computer, vol. 25, no. 3, pp. 63-79, March 1992.
....Shared Memory) systems. Researchers in computer architecture and DSM have been studying the cache coherence problem for a long time. Many concepts and theories such as several consistency models [35, 22, 16, 21, 7] and numerous practical systems such as the MIT Alewife [2], the Stanford Dash [36], CRL [30], and so on, have been developed to attack the problem. This problem is no different in the field of distributed virtual environments, so we simply borrow the idea from the DSM literature and modify it slightly to fit our specific requirements. This technique is relatively mature due to ....
....LP_j have a shared copy of s_k. To solve these problems, we present a coherence protocol in the following section. 3.7 Cache Coherence We employ a fixed-owner, directory-based invalidate protocol similar to that used in many hardware- or software-based DSM (distributed shared memory) systems [2, 36, 30]. Directory-based coherence reduces network traffic because it does not use a broadcast scheme for one LP to send invalidate/update messages to all other LPs, which usually generates network traffic that is proportional to the number of LPs squared (M^2). Invalidate coherency protocols [28] ....
D. Lenoski, J. Laudon, K. Gharachorloo, W. Weber, A. Gupta, J. Hennessy, M. Horowitz, and M. Lam. The Stanford Dash multiprocessor. IEEE Computer, pages 63--79, March 1992.
....in a scalable topology, such as a mesh or a hypercube. Each node contains a few processors, a portion of the globally distributed memory, a node controller, and possibly some I/O devices. The node controller handles all memory coherency and I/O traffic going through the node. Several research [2, 35, 37] and commercial [18, 36, 39] projects have built shared-memory multiprocessors based on the above-mentioned design. These machines have been available for several years, and are becoming a popular platform in the server market. Besides traditional computation-intensive workloads, such as raytrace ....
Daniel Lenoski, James Laudon, Kourosh Gharachorloo, Wolf-Dietrich Weber, Anoop Gupta, John Hennessy, Mark Horowitz, and Monica S. Lam. The Stanford DASH multiprocessor. Computer, 25(3):63--79, March 1992.
....a directory controller, a network interface, and a portion of the main memory of the system (Figure 2). The processor is a 6-issue dynamic superscalar. The caches are non-blocking and write-back. The system uses a full-map directory and a cache coherence protocol similar to that used in DASH [12]. The directory controller is extended to support logging and distributed parity needed for ReVive, as described in Section 3.2. Contention is accurately modeled in the entire system, including the busses, the network and the main memory. Table 3 lists the main characteristics of the ....
D. Lenoski et al. The Stanford Dash Multiprocessor. IEEE Computer, pages 63--79, Mar. 1992. It is Dash, not DASH.
....1(b) Cache coherence is maintained by a bus-based snoopy protocol within an SMP and enforced by CCs using a directory-based cache coherence protocol across the machine. Our directory-based protocol uses an invalidation-based approach. Each CC also connects to either a Remote Access Cache (RAC) [10] or an L3 cache. A RAC keeps recently accessed copies of remote memory lines. An L3 cache keeps both local and remote memory lines. If the RAC or L3 can satisfy a local request to a remote memory line, the request does not need to traverse the network in order to fetch that memory line from the ....
D. Lenoski, J. Laudon, K. Gharachorloo, W.-D. Weber, A. Gupta, J. Hennessy, M. Horowitz, and M. Lam. The Stanford DASH Multiprocessor. IEEE Computer, pages 63--79, March 1992.
....memory modules in a transparent way, although it may suffer increased latencies when accessing memory located on remote clusters. SMPs with this type of physical memory organization are called Non-Uniform Memory Access (NUMA) SMPs. Examples of such NUMA SMP architectures include Stanford's Dash [21] and Flash [17] architectures, University of Toronto's Hector [42] and NUMAchine [41] architectures, Sequent's NUMA-Q [32] architecture and SGI's Cray Origin2000 [19]. NUMA SMPs that implement cache coherency in hardware are called CC-NUMA SMPs. In contrast, multiprocessors based on a single bus ....
Daniel Lenoski, James Laudon, Kourosh Gharachorloo, Wolf-Dietrich Weber, Anoop Gupta, John Hennessy, Mark Horowitz, and Monica S. Lam. The Stanford Dash multiprocessor. Computer, 25(3):63--79, March 1992.
....with low overhead, reaping the communication reduction and lower latency benefits without computational overhead. Of course, there are a wealth of cache system optimizations proposed within parallel machines which could be applied in an application-specific manner to achieve best performance [43, 23, 25]. 4.4 Custom Prefetching The pointer-based data access to sparse matrix data structures in current-day memory hierarchies yields poor performance because the indirection introduces main memory and memory hierarchy latencies into the innermost computational loop. Techniques such as software ....
Lenoski, D., et al. The Stanford DASH Multiprocessor. IEEE Computer (Mar 1992), 63--79.
....transfer a nonuniform communication architecture. A NUCA is an architecture in which the unloaded latency for a processor accessing data recently modified by another processor differs at least by a factor of two, depending on where that processor is located. DASH was the first NUCA machine [13]. Each DASH node consists of four processors connected by a snooping bus. A cache-to-cache transfer from a cache in a remote node is 4.5 times slower than a transfer from a cache in the same node. We call this the NUCA ratio. Sequent's NUMAQ has a similar topology, but its NUCA ratio is closer to ....
....locks to provide a hardware queued lock behavior without requiring any software support or new instructions [21]. The load-linked/store-conditional instructions are used to demonstrate a possible implementation. Stanford DASH uses directories to indicate which processors are spinning on the lock [13]. When the lock is released, one of the waiting nodes is chosen at random and is granted the lock. The grant request invalidates only that node's caches and allows one processor in that node to acquire the lock with a local operation. This scheme lowers both the traffic and the latency involved in ....
D. Lenoski, J. Laudon, K. Gharachorloo, W.-D. Weber, A. Gupta, J. Hennessy, M. Horowitz, and M. S. Lam. The Stanford Dash Multiprocessor. IEEE Computer, 25(3):63--79, Mar. 1992.
No context found.
D. E. Lenoski, J. Laudon, K. Gharachorloo, W.-D. Weber, A. Gupta, J. Hennessy, M. Horowitz, and M. S. Lam, "The Stanford Dash multiprocessor," IEEE Computer, vol. 25, no. 3, pp. 63-79, March 1992.
No context found.
D. Lenoski, J. Laudon, K. Gharachorloo, W. Weber, A. Gupta, J. Hennessy, M. Horowitz and M. Lam. The Stanford Dash multiprocessor. IEEE Computer, 25(3):63--79, 1992.
No context found.
D. Lenoski et al. The Stanford Dash Multiprocessor. IEEE Computer, pp. 63--79, March 1992.
No context found.
D. Lenoski, et al.: "The Stanford DASH multiprocessor", IEEE Computer, 25(3), pp. 63-79, March 1992.
No context found.
D. Lenoski, J. Laudon, K. Gharachorloo, W.-D. Weber, A. Gupta, J. Hennessy, M. Horowitz, and M. S. Lam. The Stanford Dash multiprocessor. IEEE Computer, 25(3):63--79, 1992.
No context found.
D. Lenoski, J. Laudon, K. Gharachorloo, W.-D. Weber, A. Gupta, J. Hennessy, M. Horowitz, and M. Lam. The Stanford DASH Multiprocessor. IEEE Computer, 25(3):63--79, Mar. 1992.
No context found.
D. Lenoski et al. The Stanford Dash Multiprocessor. IEEE Computer, pages 63--79, Mar. 1992. It is Dash, not DASH.
No context found.
Daniel Lenoski, James Laudon, Kourosh Gharachorloo, Wolf-Dietrich Weber, Anoop Gupta, John L. Hennessy, Mark Horowitz, and Monica S. Lam. The Stanford Dash Multiprocessor. IEEE Computer, 25(3):63--79, March 1992.
No context found.
Lenoski, D., et al.: The Stanford DASH Multiprocessor. IEEE Computer, 25(3):63--79, March 1992.
No context found.
D. Lenoski, J. Laudon, K. Gharachorloo, W.-D. Weber, A. Gupta, J. Hennessy, M. Horowitz, and M. S. Lam. The Stanford DASH multiprocessor. IEEE COMPUTER, 25(3):63--79, March 1992.
No context found.
D. Lenoski, J. Laudon, K. Gharachorloo, W.-D. Weber, A. Gupta, J. L. Hennessy, M. Horowitz, and M. Lam. The Stanford DASH multiprocessor. IEEE Computer, 25(3):63--79, Mar. 1992.
No context found.
Lenoski, D. et al. (1992) The Stanford Dash Multiprocessor. IEEE Comput. Mag., 25, 63--79.
No context found.
D. Lenoski, et al. The Stanford DASH Multiprocessor. IEEE Computer 25, 3 (March 1992), pp. 63--79.
No context found.
D. Lenoski, J. Laudon, K. Gharachorloo, W.-D. Weber, A. Gupta, J. Hennessy, M. Horowitz, and M. S. Lam. The Stanford DASH multiprocessor. IEEE Computer, March 1992.
No context found.
Daniel Lenoski, James Laudon, Kourosh Gharachorloo, Wolf-Dietrich Weber, Anoop Gupta, John Hennessy, Mark Horowitz and Monica S. Lam, "The Stanford Dash Multiprocessor", Proc. 1992.
No context found.
D. Lenoski, J. Laudon, K. Gharachorloo, W.-D. Weber, A. Gupta, J. Hennessy, M. Horowitz, and M. S. Lam. The Stanford DASH multiprocessor. IEEE COMPUTER, 25(3):63--79, March 1992.
No context found.
D. Lenoski, J. Laudon, K. Gharachorloo, W.-D. Weber, A. Gupta, J. Hennessy, M. Horowitz, and M. Lam. The Stanford DASH Multiprocessor. IEEE Computer, 25(3):63--79, Mar. 1992.
No context found.
D. Lenoski, J. Laudon, K. Gharachorloo, W.-D. Weber, A. Gupta, J. Hennessy, M. Horowitz, and M. S. Lam. The Stanford DASH Multiprocessor. IEEE Computer, 25(3):63--79, March 1992.