478 citations found.
D. E. Lenoski, J. Laudon, K. Gharachorloo, W.-D. Weber, A. Gupta, J. Hennessy, M. Horowitz, and M. S. Lam, "The Stanford Dash multiprocessor," IEEE Computer, vol. 25, no. 3, pp. 63-79, March 1992.

This paper is cited in the following contexts:

Techniques for Mitigating Lag-Time When Joining Interest Groups in.. - Shi (2000)

....Shared Memory) systems. Researchers in computer architecture and DSM have been studying the cache coherence problem for a long time. Many concepts and theories, such as several consistency models [35, 22, 16, 21, 7], and numerous practical systems, such as the MIT Alewife [2], the Stanford Dash [36], CRL [30], and so on, have been developed to attack the problem. This problem is no different in the field of distributed virtual environments, so we simply borrow the idea from the DSM literature and modify it slightly to fit our specific requirements. This technique is relatively mature due to ....

....LP_j have a shared copy of s_k. To solve these problems, we present a coherence protocol in the following section. 3.7 Cache Coherence: We employ a fixed-owner, directory-based invalidate protocol similar to that used in many hardware- or software-based DSM (distributed shared memory) systems [2, 36, 30]. Directory-based coherence reduces network traffic because it avoids a broadcast scheme in which one LP sends invalidate/update messages to all other LPs; broadcasting usually generates network traffic proportional to the square of the number of LPs (M^2). Invalidate coherency protocols [28] ....

D. Lenoski, J. Laudon, K. Gharachorloo, W. Weber, A. Gupta, J. Hennessy, M. Horowitz, and M. Lam. The Stanford Dash multiprocessor. IEEE Computer, pages 63--79, March 1992.
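
The traffic argument in the Shi excerpt can be illustrated with a toy message count. This is a minimal sketch under stated assumptions: M logical processes (LPs) each perform one write to a shared item; under broadcast, every write notifies all M - 1 other LPs, while a directory notifies only the LPs it records as sharers. The function names and the sharer counts are hypothetical and chosen for illustration; this is not the protocol from either paper.

```python
from typing import Sequence


def broadcast_invalidations(num_lps: int) -> int:
    """Broadcast scheme: each of num_lps writers notifies every other LP."""
    return num_lps * (num_lps - 1)


def directory_invalidations(sharers_per_write: Sequence[int]) -> int:
    """Directory scheme: each write invalidates only the sharers recorded for it.

    sharers_per_write[i] is the (assumed) number of other LPs that hold a copy
    of the item when write i happens.
    """
    return sum(sharers_per_write)


if __name__ == "__main__":
    m = 16
    print(broadcast_invalidations(m))        # 240 messages, grows roughly as M^2
    # Assumption for illustration: every write finds exactly 2 other sharers.
    print(directory_invalidations([2] * m))  # 32 messages
```

The broadcast count is M(M - 1), which is where the quadratic growth comes from; the directory count depends only on how widely the data is actually shared.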


Virtual Clusters: Resource Management on Large Shared-Memory.. - Govil (2000)

....in a scalable topology, such as a mesh or a hypercube. Each node contains a few processors, a portion of the globally distributed memory, a node controller, and possibly some I/O devices. The node controller handles all memory coherency and I/O traffic going through the node. Several research [2, 35, 37] and commercial [18, 36, 39] projects have built shared-memory multiprocessors based on the above-mentioned design. These machines have been available for several years, and are becoming a popular platform in the server market. Besides traditional computation-intensive workloads, such as raytrace ....

Daniel Lenoski, James Laudon, Kourosh Gharachorloo, Wolf-Dietrich Weber, Anoop Gupta, John Hennessy, Mark Horowitz, and Monica S. Lam. The Stanford DASH multiprocessor. Computer, 25(3):63--79, March 1992.


ReVive: Cost-Effective Architectural Support for Rollback.. - Prvulovic (2002)   (5 citations)

....a directory controller, a network interface, and a portion of the main memory of the system (Figure 2). The processor is a 6-issue dynamic superscalar. The caches are non-blocking and write-back. The system uses a full-map directory and a cache coherence protocol similar to that used in DASH [12]. The directory controller is extended to support logging and distributed parity needed for ReVive, as described in Section 3.2. Contention is accurately modeled in the entire system, including the busses, the network and the main memory. Table 3 lists the main characteristics of the ....

D. Lenoski et al. The Stanford Dash Multiprocessor. IEEE Computer, pages 63--79, Mar. 1992. It is Dash, not DASH.
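
A full-map directory, as mentioned in the ReVive excerpt, keeps a state and one presence bit per node for every memory line. The sketch below is a simplified, hypothetical model of how such an entry might react to reads and writes under an invalidation protocol. The state names (`UNCACHED`, `SHARED`, `DIRTY`) and the class and method names are assumptions for illustration, not the Dash protocol itself, and the model ignores races, acknowledgements, and request forwarding.

```python
from enum import Enum
from typing import List


class State(Enum):
    UNCACHED = 0  # no cache holds the line
    SHARED = 1    # one or more caches hold clean, read-only copies
    DIRTY = 2     # exactly one cache holds a modified copy


class DirectoryEntry:
    """Simplified full-map directory entry: a state plus one presence bit per node."""

    def __init__(self, num_nodes: int) -> None:
        self.state = State.UNCACHED
        self.presence = [False] * num_nodes

    def sharers(self) -> List[int]:
        return [n for n, bit in enumerate(self.presence) if bit]

    def read(self, node: int) -> List[int]:
        """Handle a read miss from `node`.

        Returns the nodes asked to supply/write back the line: the owner if the
        line is dirty elsewhere, otherwise nobody. The owner keeps a shared copy.
        """
        owners: List[int] = []
        if self.state is State.DIRTY:
            owners = [n for n in self.sharers() if n != node]
        self.presence[node] = True
        self.state = State.SHARED
        return owners

    def write(self, node: int) -> List[int]:
        """Handle a write (ownership) request from `node`.

        Returns the nodes whose copies must be invalidated; afterwards `node`
        is the sole, dirty owner.
        """
        to_invalidate = [n for n in self.sharers() if n != node]
        self.presence = [False] * len(self.presence)
        self.presence[node] = True
        self.state = State.DIRTY
        return to_invalidate
```

For example, after nodes 0 and 2 read a line, a write from node 1 returns `[0, 2]`: only the recorded sharers receive invalidations, and no other node is contacted.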


Design Trade-Offs in High-Throughput Coherence Controllers - Anthony-Trung Nguyen..

....1(b). Cache coherence is maintained by a bus-based snoopy protocol within an SMP and enforced by CCs using a directory-based cache coherence protocol across the machine. Our directory-based protocol uses an invalidation-based approach. Each CC also connects to either a Remote Access Cache (RAC) [10] or an L3 cache. A RAC keeps recently accessed copies of remote memory lines. An L3 cache keeps both local and remote memory lines. If the RAC or L3 can satisfy a local request to a remote memory line, the request does not need to traverse the network in order to fetch that memory line from the ....

D. Lenoski, J. Laudon, K. Gharachorloo, W.-D. Weber, A. Gupta, J. Hennessy, M. Horowitz, and M. Lam. The Stanford DASH Multiprocessor. IEEE Computer, pages 63--79, March 1992.
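
The Remote Access Cache described in the Nguyen excerpt can be sketched as a small cache, consulted before the network, that holds only lines whose home is another node. This sketch assumes LRU replacement, a fixed capacity, and a caller-supplied `fetch_remote` function standing in for the network request; all of these names are hypothetical, and coherence is reduced to a bare `invalidate` hook.

```python
from collections import OrderedDict
from typing import Callable, Hashable


class RemoteAccessCache:
    """Toy RAC: an LRU cache of remote memory lines, consulted before the network."""

    def __init__(self, capacity: int, fetch_remote: Callable[[Hashable], bytes]) -> None:
        self.capacity = capacity
        self.fetch_remote = fetch_remote  # hypothetical stand-in for a network fetch
        self.lines: "OrderedDict[Hashable, bytes]" = OrderedDict()
        self.network_requests = 0

    def read(self, addr: Hashable) -> bytes:
        if addr in self.lines:
            # Hit: the request is satisfied locally, with no network traversal.
            self.lines.move_to_end(addr)
            return self.lines[addr]
        # Miss: fetch the line from its remote home node and keep a copy.
        self.network_requests += 1
        data = self.fetch_remote(addr)
        self.lines[addr] = data
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)  # evict the least recently used line
        return data

    def invalidate(self, addr: Hashable) -> None:
        """Drop a line when the home directory invalidates it."""
        self.lines.pop(addr, None)
```

Repeated reads of the same remote line cost one network request; only misses and evicted lines go back across the network.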


Clustered Objects: Initial Design, Implementation and Evaluation - Appavoo

....memory modules in a transparent way, although it may suffer increased latencies when accessing memory located on remote clusters. SMPs with this type of physical memory organization are called Non-Uniform Memory Access (NUMA) SMPs. Examples of such NUMA SMP architectures include Stanford's Dash [21] and Flash [17] architectures, University of Toronto's Hector [42] and NUMAchine [41] architectures, Sequent's NUMA-Q [32] architecture and SGI's Cray Origin2000 [19]. NUMA SMPs that implement cache coherency in hardware are called CC-NUMA SMPs. In contrast, multiprocessors based on a single bus ....

Daniel Lenoski, James Laudon, Kourosh Gharachorloo, Wolf-Dietrich Weber, Anoop Gupta, John Hennessy, Mark Horowitz, and Monica S. Lam. The Stanford Dash multiprocessor. Computer, 25(3):63--79, March 1992.


MORPH: A System Architecture for Robust High Performance Using .. - Chien, Gupta (1996)   (5 citations)

....with low overhead, reaping the communication reduction and lower latency benefits without computational overhead. Of course, there is a wealth of cache system optimizations proposed within parallel machines which could be applied in an application-specific manner to achieve best performance [43, 23, 25]. 4.4 Custom Prefetching: The pointer-based data access to sparse matrix data structures in current-day memory hierarchies yields poor performance because the indirection introduces main memory and memory hierarchy latencies into the innermost computational loop. Techniques such as software ....

Lenoski, D., et al. The Stanford DASH Multiprocessor. IEEE Computer (Mar 1992), 63--79.


Hierarchical Backoff Locks for Nonuniform Communication.. - Radovic, Hagersten (2003)   (1 citation)

....transfer a nonuniform communication architecture. A NUCA is an architecture in which the unloaded latency for a processor accessing data recently modified by another processor differs by at least a factor of two, depending on where that processor is located. DASH was the first NUCA machine [13]. Each DASH node consists of four processors connected by a snooping bus. A cache-to-cache transfer from a cache in a remote node is 4.5 times slower than a transfer from a cache in the same node. We call this the NUCA ratio. Sequent's NUMA-Q has a similar topology, but its NUCA ratio is closer to ....

....locks to provide a hardware queued lock behavior without requiring any software support or new instructions [21]. The load-linked/store-conditional instructions are used to demonstrate a possible implementation. Stanford DASH uses directories to indicate which processors are spinning on the lock [13]. When the lock is released, one of the waiting nodes is chosen at random and is granted the lock. The grant request invalidates only that node's caches and allows one processor in that node to acquire the lock with a local operation. This scheme lowers both the traffic and the latency involved in ....

D. Lenoski, J. Laudon, K. Gharachorloo, W.-D. Weber, A. Gupta, J. Hennessy, M. Horowitz, and M. S. Lam. The Stanford Dash Multiprocessor. IEEE Computer, 25(3):63--79, Mar. 1992.
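
The directory-based lock grant attributed to DASH in the Radovic and Hagersten excerpt can be modeled as a directory that records which nodes are spinning on a lock and, on release, grants the lock to one waiting node chosen at random, so that only that node is notified. This is a hypothetical software model for illustration (the class, method, and counter names are invented); it is not the DASH hardware implementation and it omits the local, intra-node acquisition step the excerpt describes.

```python
import random
from typing import Optional, Set


class DirectoryLock:
    """Toy model of a directory that tracks the nodes spinning on a lock."""

    def __init__(self, rng: Optional[random.Random] = None) -> None:
        self.holder: Optional[int] = None
        self.waiting: Set[int] = set()  # nodes recorded as spinning on the lock
        self.rng = rng if rng is not None else random.Random()
        self.grant_messages = 0

    def acquire(self, node: int) -> bool:
        """Take the lock if it is free; otherwise record the node as waiting."""
        if self.holder is None:
            self.holder = node
            return True
        self.waiting.add(node)
        return False

    def release(self) -> Optional[int]:
        """Release the lock and grant it to one randomly chosen waiting node."""
        self.holder = None
        if not self.waiting:
            return None
        winner = self.rng.choice(sorted(self.waiting))
        self.waiting.remove(winner)
        self.holder = winner
        # Only the winner receives a grant; the other waiters are not disturbed.
        self.grant_messages += 1
        return winner
```

Each release sends exactly one grant, regardless of how many nodes are waiting, which is the traffic reduction the excerpt points to.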


Memory Latency Reduction via Data Prefetching and Data Forwarding .. - Poulsen (1994)

No context found.

D. E. Lenoski, J. Laudon, K. Gharachorloo, W.-D. Weber, A. Gupta, J. Hennessy, M. Horowitz, and M. S. Lam, "The Stanford Dash multiprocessor," IEEE Computer, vol. 25, no. 3, pp. 63-79, March 1992.


Compiler Support for Array Distribution on - Numa Shared Memory

No context found.

D. Lenoski, J. Laudon, K. Gharachorloo, W. Weber, A. Gupta, J. Hennessy, M. Horowitz and M. Lam. The Stanford Dash multiprocessor. IEEE Computer, 25(3):63--79, 1992.


SmartApps, an Application Centric Approach to High .. - Dang, Garzaran..

No context found.

D. Lenoski et al. The Stanford Dash Multiprocessor. IEEE Computer, pp. 63--79, March 1992.


A View-based Consistency Model based on Transparent Data.. - Huang, al. (2004)

No context found.

D. Lenoski, et al.: "The Stanford DASH multiprocessor", IEEE Computer, 25(3), pp. 63-79, March 1992.


Simulation of the Clustered Torus - Wong

No context found.

D. Lenoski, J. Laudon, K. Gharachorloo, W.-D. Weber, A. Gupta, J. Hennessy, M. Horowitz, and M.S. Lam. The Stanford Dash multiprocessor. IEEE Computer, 25(3):63--79, 1992.


Coherence Decoupling: Making Use of Incoherence - Huh, Chang, Burger, al. (2004)   (1 citation)

No context found.

D. Lenoski, J. Laudon, K. Gharachorloo, W.-D. Weber, A. Gupta, J. Hennessy, M. Horowitz, and M. Lam. The Stanford DASH Multiprocessor. IEEE Computer, 25(3):63--79, Mar. 1992.


Assessment of Cache Coherence Protocols in Shared-memory.. - Grbic (2003)

No context found.

Daniel Lenoski, James Laudon, Kourosh Gharachorloo, Wolf-Dietrich Weber, Anoop Gupta, John L. Hennessy, Mark Horowitz, and Monica S. Lam. The Stanford Dash Multiprocessor. IEEE Computer, 25(3):63--79, March 1992.


Active Memory Clusters: Efficient Multiprocessing on.. - Heinrich, Speight.. (2002)   (1 citation)

No context found.

Lenoski, D., et al.: The Stanford DASH Multiprocessor. IEEE Computer, 25(3):63--79, March 1992.


Deriving Efficient Cache Coherence Protocols through.. - Nalumasu, Gopalakrishnan (1997)   (1 citation)

No context found.

D. Lenoski, J. Laudon, K. Gharachorloo, W.-D. Weber, A. Gupta, J. Hennessy, M. Horowitz, and M. S. Lam. The Stanford DASH multiprocessor. IEEE COMPUTER, 25(3):63--79, March 1992.


Permission to Make Digital Or Hard Copies of All Or Part.. - Personal Or Classroom

No context found.

D. Lenoski, J. Laudon, K. Gharachorloo, W.-D. Weber, A. Gupta, J. L. Hennessy, M. Horowitz, and M. Lam. The Stanford DASH multiprocessor. IEEE Computer, 25(3):63--79, Mar. 1992.


Fault-Tolerant Hierarchical Networks for Shared Memory .. - Mahmud, Samaratunga.. (2002)

No context found.

Lenoski, D. et al. (1992) The Stanford Dash Multiprocessor. IEEE Comput. Mag., 25, 63--79.


Evaluation of the Raw Microprocessor: An.. - Taylor, Lee.. (2004)   (6 citations)

No context found.

D. Lenoski, et al. The Stanford DASH Multiprocessor. IEEE Computer 25, 3 (March 1992), pp. 63--79.


A Comparison of Software and Hardware Synchronization.. - Carter, Kuo, Kuramkote (1996)   (9 citations)

No context found.

D. Lenoski, J. Laudon, K. Gharachorloo, W.-D. Weber, A. Gupta, J. Hennessy, M. Horowitz, and M. S. Lam. The Stanford DASH multiprocessor. IEEE Computer, March 1992.


Shared Memory for Distributed Systems - Of The Requirements

No context found.

Daniel Lenoski, James Laudon, Kourosh Gharachorloo, Wolf-Dietrich Weber, Anoop Gupta, John Hennessy, Mark Horowitz and Monica S. Lam, "The Stanford Dash Multiprocessor", Proc. 1992.


Token Coherence: Decoupling Performance and Correctness - Martin, Hill, Wood (2003)   (2 citations)

No context found.

D. Lenoski, J. Laudon, K. Gharachorloo, W.-D. Weber, A. Gupta, J. Hennessy, M. Horowitz, and M. Lam. The Stanford DASH Multiprocessor. IEEE Computer, 25(3):63--79, Mar. 1992.


Data Locality Optimization of Shared Memory Programs on NUMA.. - Tao

No context found.

D. Lenoski, J. Laudon, K. Gharachorloo, W.-D. Weber, A. Gupta, J. Hennessy, M. Horowitz, and M. S. Lam. The Stanford DASH Multiprocessor. IEEE Computer, 25(3):63--79, March 1992.

CiteSeer.IST - Copyright Penn State and NEC