357 citations found.
Lenoski, D., Laudon, J., Gharachorloo, K., Gupta, A., and Hennessy, J. (1990). The Directory-Based Cache Coherence Protocol for the DASH Multiprocessor. In Proc. of the 17th International Symposium on Computer Architecture, pages 148--159.

This paper is cited in the following contexts:

An Evaluation of Architectural Platforms for Parallel.. - Jayasimha et al. (1996)

.... the Cray T3D, and a cluster of workstations connected via many networks (the Lewis Advanced Cluster Environment (LACE) [9] experimental testbed). One important architecture that has not been considered in our study is cache coherent, massively parallel processors typified by the DASH architecture [11]. An earlier paper by the authors presented the results of a study of this application on LACE [6]. This paper differs from the earlier one in two important aspects: i) It is comprehensive covering a gamut of architectures while the other examined the feasibility of NOW architectures as low cost ....

Lenoski, D. E., et al. "The Directory-Based Cache Coherence Protocol for the DASH Multiprocessor". Int'l Conf. on Computer Architecture, May 1990, pp. 148--159.


A Study on Memory-Based Communications and Synchronization in.. - Matsumoto (2001)

....MBP performs fine grained communications by using a dedicated hard wired circuit. The Tempest detects fine grained access by the main processors, so interrupts occur too frequently to allow an improved performance for the system as a whole. 2.7.3 Flash The Flash [29] is the successor to the DASH [31, 30] multiprocessor and includes an integrated protocol processor which detects fine grained access by the main processors to shared blocks and handles the communications required to maintain consistency. Its basic functions are thus similar to those of the MBP, but the Flash has no virtual ....

D. E. Lenoski, J. Laudon, K. Gharachorloo, A. Gupta, and J. L. Hennessy. The Directory-Based Cache Coherence Protocol for the DASH Multiprocessor. In Proc. of the 17th Int. Symp. on Computer Architecture, pages 148--159, May 1990.


Automatic Software Cache Coherence through Vectorization - Darnell et al. (1992)   (16 citations)

....on the processor memory interconnect, are now in common use for small scale systems [14, 16]; however, snoopy schemes are problematic for large scale machines because such machines cannot be based on a single, central broadcast medium for lack of sufficient bandwidth. Directory schemes [3, 11, 17], in which a directory entry associated with each memory location indicates which processors have cached values for that location, seem more promising for large scale systems. However, directories can require large amounts of additional storage and directory maintenance operations may ....

D. Lenoski, J. Laudon, K. Gharachorloo, A. Gupta, and J. Hennessy. The directory-based cache coherence protocol for the DASH multiprocessor. 17th International Symposium on Computer Architecture/Computer Architecture News, pages 148--159, May 1990.


SPIRAL: A Client-Transparent Third-Party Transfer Scheme for.. - Ma, Reddy

....of the pending blocks. A release message can be sent by the server if we know that the corresponding DLI will not be referred later. This happens when the server receives ACKs from the client that acknowledge all the data specified by the DLI. The approach is similar to release consistency [15]. In practice, the release messages can be sent out to the NADs in batches for efficiency reasons. For UDP based NFS, the reference count on the pending blocks can be reduced after the corresponding redirected packets are sent. If one or some of these UDP packets are lost, NFS RPC will re send the ....

D. Lenoski, J. Laudon, K. Gharachorloo, A. Gupta, and J. Hennessy. The directory-based cache coherence protocol for the DASH multiprocessor. In Proceedings of the 17th Annual Symposium on Computer Architecture, pages 148--159, May 1990.


Efficient Integration of Compiler-directed Cache Coherence And.. - Lim, Yew (2000)   (1 citation)

....is necessary to develop efficient techniques to hide the large remote memory access latencies in such system. The cache coherence techniques used in existing commercially available multiprocessors are mainly hardware based, such as a snoopy cache protocol [26] or a hardware directory-based scheme [17, 19]. These schemes rely on interprocessor communications to determine cache coherence actions. In large scale DSM systems, the scalability of such schemes might be affected by excessive coherence related network traffic. Furthermore, their hardware complexity and cost can become quite substantial as ....

....shared data. The remaining invalid copies are known as stale data, and the references to these data are called stale references. A variety of techniques for the detection, prevention and avoidance of stale references are used by cache coherence schemes. Most hardware based cache coherence schemes [17, 19, 26] prevent stale references at run time by invalidating or updating stale cache entries before the stale references occur. To do so, they require interprocessor communications to keep track of the cache states and to perform these cache coherence actions. Such coherence related network traffic ....

D. Lenoski, J. Laudon, K. Gharachorloo, A. Gupta, and J. Hennessy. The directory-based cache coherence protocol for the DASH multiprocessor. In Proceedings of the 17th International Symposium on Computer Architecture, pages 148--159, May 1990.


Clustered Objects: Initial Design, Implementation and Evaluation - Appavoo

....replicate the Simple Array SSAC. We refer to this implementation as the Replicated SSAC. Figure 4.7 illustrates such an organization. It replicates the Shared SSAC structure on a per processor basis. Consistency between the replicas is maintained using a directory based write-update cache protocol [34, 20]. It was felt that better performance may actually be achieved by using an invalidate protocol but that the added complexity of the update protocol would better explore the expressiveness of the Clustered Object approach. From the client's perspective, the Replicated SSAC and the Shared SSAC ....

D. Lenoski, J. Laudon, K. Gharachorloo, A. Gupta, and J. Hennessy. The directory-based cache coherence protocol for the DASH multiprocessor. In Jean-Loup Baer and Larry Snyder, editors, Proceedings of the 17th Annual International Symposium on Computer Architecture, pages 148--159, Seattle, WA, June 1990. IEEE Computer Society Press.


Implementing Shared Memory On Large-Scale Multiprocessors - Parthasarathy (1992)   (1 citation)

....protocols are the fullmap scheme [5, 12], the chained list scheme [6], the limited copy scheme [6, 7], and the hierarchical protocols [11, 18]. Two major classes of large scale shared address space multiprocessors have emerged in the literature. These are non uniform memory access machines (NUMA) [17, 14, 7], and cache only memory architectures (COMA) [17, 11, 18]. Both incorporate distributed memory and directory based cache coherence. [Figure 1.2: A Shared Memory System with Caches and Distributed Memory] In a NUMA ....

....memory and directory based cache coherence. In a NUMA machine, each processor has a local memory and a cache. The shared address space is synthesized from the local memories of the processors. Examples of NUMA machines are the Stanford DASH multiprocessor [14], the MIT Alewife machine [7] and the BBN Butterfly machine [15]. In a COMA machine, the local memory of each processor resembles a huge cache. The shared address space is synthesized from these large caches. Data in the address space can migrate freely among the caches and is replicated as ....

D. Lenoski et al., "The directory based cache coherence protocol for the DASH multiprocessor," Proceedings of the 17th Annual Symposium on Computer Architecture, pp. 148--160, June 1990.


An Asynchronous Protocol for Release Consistent Distributed.. - Yeo, Yeom, Park

....which write to shared memory. In the release consistency (RC) model, writes to shared memory by a processor p_i need to become visible at another processor p_j only when a subsequent lock release operation of p_i is performed. This relaxation of the memory consistency model allows the DASH [20] implementation of the RC model to combat memory latency by pipelining writes to shared memory. Four variations of the RC model have been proposed: eager invalidate, lazy invalidate, lazy hybrid and aggressive models. In the eager invalidate (EI) protocol, modifications to shared data are made ....

D.E. Lenoski, J. Laudon, K. Gharachorloo, A. Gupta, and J.L. Hennessy, "The directory-based cache coherence protocol for the dash multiprocessor," in Proc. 17th Annual Int'l Symp. on Computer Architecture, pp. 148--159, May 1990.


Speculative Synchronization: Applying Thread-Level.. - Martinez, Torrellas (2002)   (4 citations)

....modeled and the applications executed. 5.1 Architecture Modeled We use an execution driven simulation framework [18] to model in detail a CC NUMA multiprocessor with 16 or 64 nodes. The system uses the release memory consistency model and a cache coherence protocol along the lines of DASH [21]. Each node has one processor and a two level hierarchy of write back caches. The processor is a 4 issue out of order superscalar with register renaming, branch prediction, and nonblocking memory operations. The cache sizes are kept small to capture the behavior that real size input data would ....

D. Lenoski, J. Laudon, K. Gharachorloo, A. Gupta, and J. Hennessy. The directory-based cache coherence protocol for the DASH multiprocessor. In International Symposium on Computer Architecture, pages 148--159, Seattle, WA, May 1990.


Dynamic Computation Migration in Distributed Shared Memory Systems - Hsieh (1995)   (6 citations)

....a remote processor. Under the coherence protocol implemented in MCRL, the data is first sent to the home processor, and then back to the requesting processor. Modifying the protocol to allow the data to be sent directly to the requesting processor (which is done, for example, in the DASH protocol [65]) would improve the relative performance of data migration. 68 Performance This chapter discusses and analyzes the performance of MCRL, as well as the performance of dynamic computation migration under my two heuristics. Several important results regarding computation migration are ....

D. Lenoski, J. Laudon, K. Gharachorloo, A. Gupta, and J. Hennessy. "The Directory-Based Cache Coherence Protocol for the DASH Multiprocessor". In Proceedings of the 17th International Symposium on Computer Architecture, pages 148--158, Seattle, WA, May 1990.


Parallel Hierarchical Molecular Structure Estimation - Chen, Singh, Altman (1996)   (1 citation)

....be encountered by analytical procedures such as the current algorithm. See [7] for details of the problem and its computation. The decomposition of the 30S ribosome problem is indicated in Figure 4. The execution platforms on which we performed our experiments are the Stanford DASH multiprocessor [9] and a Silicon Graphics Challenge. The DASH is an experimental research multiprocessor built at Stanford University. It supports an implicit shared memory communication abstraction in hardware, with hardware supported cache coherence. The machine we used has 32 processors organized into 8 ....

Lenoski, D., J. Laudon, K. Gharachorloo, A. Gupta, and J. Hennessy, "The Directory-Based Cache Coherence Protocol for the DASH Multiprocessor", Proceedings of the 17th Annual International Symposium on Computer Architecture, pp. 148--159, May 1990.


User-Level Interprocess Communication for Shared.. - Bershad, Anderson.. (1991)   (31 citations)

....operating system kernel designed for a uniprocessor, but running on a multiprocessor, kernel resources are logically centralized, but distributed over the many processors in the system. For a medium to large scale shared memory multiprocessor such as the Butterfly [7], Alewife [4], or DASH [25], URPC's user level orientation to operating system design localizes system resources to those processors where the resources are in use, relaxing the performance bottleneck that comes from relying on centralized kernel data structures. This bottleneck is due to the contention for logical ....

....is an indirect but more pervasive overhead stemming from the effect of thread management policy on program performance. There are a large number of parallel programming models, and within these, a wide variety of scheduling disciplines that are most appropriate (for performance) to a given model [25, 33]. Performance, though, is strongly influenced by the choice of interfaces, data structures, and algorithms used to implement threads, so a single model represented by one style of kernel level thread is unlikely to have an implementation that is efficient for all parallel programs. In response to ....

LENOSKI, D., LAUDON, J., GHARACHORLOO, K., GUPTA, A., AND HENNESSY, J. The directory-based cache coherence protocol for the DASH multiprocessor. In Proceedings of the 17th Annual International Symposium on Computer Architecture (May 1990), 148--159.


The Influence of Architectural Parameters on the.. - Silva, Dutra..

....copies of the cache block containing the item in other processors' caches are invalidated. If one of the invalidated processors later requires the same item, it will have to fetch it from the writer's cache. Our WI protocol keeps caches coherent using the DASH protocol with release consistency [9]. WU protocols are the main alternative to invalidate based protocols. In WU protocols, whenever an item is written, the writer sends copies of the new value to the other processors that share the item. In our WU implementation, a processor writes through its cache to the home node. The home node ....

D. Lenoski, J. Laudon, K. Gharachorloo, A. Gupta, and J. Hennessy. The directory-based cache coherence protocol for the DASH multiprocessor. Proceedings of the 17th ISCA, pages 148--159, May 1990.


Design And Analysis Of Update-Based Cache Coherence Protocols For .. - Glasco (1995)   (1 citation)

....centralized directory update based protocol presented in this work currently does not support invalidations, but the protocol could be extended to support such directory initiated invalidations. 3.2 Protocol Deadlock If the system has finite buffering, then protocol level deadlock is possible [69, 53, 35]. For example, figure 3.4 shows two caches that are sending requests to each other through a set of finite buffers. Each buffer can hold a single request. First, cache A sends two requests to cache B, and it begins processing a request that will generate another request to cache B. But because ....

....almost infinitely sized buffer, but it requires a tight coupling of the cache controller, directory controller and local memory. The second technique attempts to break the deadlock by removing requests from the deadlocked buffer and sending them back to their source through an exception network [53, 69, 35]. To minimize the probability of deadlock, message types are statically divided by the protocol into request and reply messages. A request message may generate another message and, therefore, lead to deadlock. A reply message never generates any new messages and, therefore, can always be consumed. ....

[Article contains additional citation context not shown here]

Daniel Lenoski, James Laudon, Kourosh Gharachorloo, Anoop Gupta, and John Hennessy. The Directory-Based Cache Coherence Protocol for the DASH Multiprocessor. In Proceedings of the 17th International Symposium on Computer Architecture, pages 148--159, May 1990.


ARCADE: An Architectural Basis for Distributed Systems - Banerji Casey Cohn (1992)   (1 citation)

....The ARCADE abstractions strike a balance. Data units and data unit links are high enough for multiple cooperating implementations and low enough to support a variety of computational models. 7.2 Data Units and Distributed Shared Memory Distributed shared memory has been implemented in hardware [le90], as operating system software [Li86] and through compiler generated code [Ba89] [Ni91]. The major DSM design issues are granularity of shared data, coherence protocol and support for heterogeneity [Ni91]. Ivy [Li86] classically assumes shared data is totally unstructured, using hardware dependent ....

Lenoski, D., et al., The Directory Based Cache Coherence Protocol for the DASH Multiprocessor, Proceedings of the 17th Annual Int. Symposium on Computer Architecture, IEEE, 1990, pp. 148--159.


Implementation Issues Relating to the WPRAM Model for.. - Nash, Dew, Davy, Dyer (1996)

....of traversals of the global router. This results in the cost O(log p N). 4 Supporting a Weakly Coherent Shared Address Space The WPRAM assumes the use of a randomised shared memory for the support of predictable data access times [18]. Cache coherent multiprocessors, such as the Stanford DASH [15] and KSR machine [2], and distributed shared memory machines such as the Cray T3D, directly support a shared address space, but without the strict bounds on the performance of shared data access as the number of processors increases. This section describes a scalable and practical implementation ....

D. Lenoski, J. Laudon, K. Gharachorloo, A. Gupta, and J. Hennessy. A Directory-Based Cache Coherence Protocol for the DASH Multiprocessor. In Proceedings of the 17th Annual International Symposium on Computer Architecture, pages 148--159, 1990.


A Combination of Scalable Caching Methods for a Weakly.. - Zamanifar, Nash, Dew (1996)

....Snoopy based cache coherency schemes [15, 3, 19] are limited to small scale multiprocessors, because of the limited bandwidth of the shared bus. Hardware solutions to the cache coherency problem for multiprocessors with point to point connections more commonly employ a directory based scheme [27, 2, 6, 18, 16]. Due to the increased complexity of hardware solutions to the cache coherency problem, software assisted schemes [7, 10, 13, 26, 29, 25, 21, 8] have been proposed, which are under supervision of the compiler (static schemes) or supported by the operating system kernel (dynamic schemes). To ....

D. Lenoski, J. Laudon, K. Gharachorloo, A. Gupta, and J. Hennessy. The Directory-Based Cache Coherence Protocol for the DASH multiprocessor. In Proceedings of the 17th Annual International Symposium on Computer Architecture, May 1990.


SPIRAL: A Client-Transparent Third-Party Transfer Scheme for.. - Xiaonan Ma And (2001)

....of the pending blocks. A release message can be sent by the server if we know that the corresponding DLI will not be referred later. This happens when the server receives ACKs from the client that acknowledge all the data specified by the DLI. The approach is similar to release consistency [12]. In practice, the release messages can be sent out to the disks in batches for efficiency reasons. For UDP based NFS, the reference count on the pending blocks can be reduced after the corresponding redirected packets are sent. If one or some of these UDP packets are lost, NFS RPC will re send ....

D. Lenoski, J. Laudon, K. Gharachorloo, A. Gupta, and J. Hennessy. The directory-based cache coherence protocol for the DASH multiprocessor. Proc. of 17th Ann. Symp. on Computer Architecture, pages 148--159, May 1990.


Removing Architectural Bottlenecks to the.. - Prvulovic..   (6 citations)

....modeled. Since many accesses to shared data are not compiler analyzable, shared data pages are allocated round robin in the memory modules of the participating processors. Private data are allocated locally. The system uses a directory based cache coherence protocol along the lines of DASH [14] with the support for speculative thread-level parallelization sketched in Section 2. In the baseline speculation protocol, task commit involves eagerly writing back to memory all the dirty lines generated by the task. Only after the operation is complete can the non speculative status be passed on ....

D. Lenoski, J. Laudon, K. Gharachorloo, A. Gupta, and J. Hennessy. The Directory-Based Cache Coherence Protocol for the DASH Multiprocessor. In Proceedings of the 17th Annual International Symposium on Computer Architecture, pages 148--159, May 1990.


Parallelizing Navier-Stokes Computations on a Variety of .. - Jayasimha, Hayder.. (1995)

.... the Cray T3D, and a cluster of workstations connected via many networks (the Lewis Advanced Cluster Environment (LACE) [6] experimental testbed). One important architecture that has not been considered in our study is cache coherent, massively parallel processors typified by the DASH architecture [7]. Architectures such as LACE (an example of NOW) are becoming increasingly popular because they show promise as a low cost alternative to expensive supercomputers and massively parallel processors. We have therefore laid more emphasis on this aspect of the study in this paper. An earlier paper by ....

Lenoski, D. E., et al. "The Directory-Based Cache Coherence Protocol for the DASH Multiprocessor". Int'l Conf. on Computer Architecture, May 1990, pp. 148--159.


Loosely-Coupled Processes - Jayadev Misra (1991)   (5 citations)

....Therefore, a possible programming methodology is to design a loosely coupled system in which each property of the above form is implemented by a single process, with the restriction that no other process falsify the post condition. A problem in multiprocessor system design, called cache coherence [9], has its roots in shared variable programming. Suppose that several processes hold copies of a shared variable in their local caches. If processes write into their local copies autonomously then these copies may become inconsistent (and then the processes may read different values for the ....

D. Lenoski, J. Laudon, K. Gharachorloo, A. Gupta, and J. Hennessy. The directory-based cache coherence protocol for the dash multiprocessor. In Proc. IEEE, 17th Annual International Symposium on Computer Architecture, pages 148--159, May 1990.


Improving the Performance of Bristled CC-NUMA Systems.. - Martinez, Torrellas.. (1999)

....for the system sizes that we are analysing, the experimental values are not much different from what we obtained using a uniform single unit link delay. Therefore, we are presenting results under the latter assumption. An invalidation based cache coherence protocol similar to the one in DASH [12] is used. The protocol is able to correctly handle out of order messaging. We choose four parallel shared memory codes from the SPLASH 2 suite [20]: FFT, Radix, Ocean, and LU. The choice is made to be representative of the different levels of communication bandwidth required in the suite. Table 1 ....

D. E. Lenoski, J. Laudon, K. Gharachorloo, A. Gupta, and J. Hennessy. "The Directory-Based Cache Coherence Protocol for the DASH Multiprocessor". Proc. 17th Intl. Symp. on Computer Architecture, May 1990.


Wait-Free Data Structures in the Asynchronous PRAM Model - Aspnes, Herlihy (2000)   (34 citations)

....without such atomic instructions. There is another sense in which asynchronous PRAM may be too strong to be realistic. Many modern shared memory multiprocessors do not guarantee that memory is sequentially consistent [34]: reads and writes to shared memory do not appear to occur atomically (e.g., [1, 36] and many commercial multiprocessors). In modern architectures, processors are fast, while memory and communication are slow, and as a result the cache coherency protocols necessary to enforce sequential consistency are expensive, and architects are often unwilling to pay this cost on every memory ....

D. Lenoski, J. Laudon, K. Gharachorloo, A. Gupta, and J. Hennessy. The directory-based cache coherence protocol for the DASH multiprocessor. In Proceedings of the 17th Annual Symposium on Computer Architecture, pages 148--159, May 1990.


Volume Rendering on Scalable Shared-Memory MIMD Architectures - Nieh, Levoy (1992)   (56 citations)

....parallel machine. To sustain high performance, DASH provides caching of memory, including shared writable data, to reduce memory latency. By associating a directory with each cluster's main memory that keeps track of all memory blocks cached in other clusters, a distributed directory-based protocol [9] with an intracluster snoopy bus based protocol [15] provides coherent caches, keeping memory consistent among the processors. The latency of a memory access depends on where it is serviced in the memory system of DASH. The memory system can be broken into four levels of hierarchy: processor, ....

Daniel Lenoski, James Laudon, Kourosh Gharachorloo, Anoop Gupta, and John Hennessy. The Directory-Based Cache Coherence Protocol for the DASH Multiprocessor. In Proceedings of the 17th Annual International Symposium on Computer Architecture, pages 148--159, May 1990.


Dimensions of Verifying the Hardware-Software.. - Abts, Lilja.. (1999)

....The other dimension of the hardware software interface, as shown in Figure 1b, is the coherence mechanism, which can range from nonexistent to a scalable directory based protocol. Naturally, software based coherence mechanisms [6, 7, 8] are more flexible, whereas hardware based coherence [9, 10, 11] provides better performance at the expense of a more complex verification process. As a parallel program executes, memory references will cause data to migrate through the memory hierarchy. However, changes (writes) to data in a local cache may not be visible by other processors unless ....

Lenoski et al. The directory-based cache coherence protocol for the DASH multiprocessor. Proc of the 17th Annual Int. Symposium on Computer Architecture, pages 148--159, June 1990.


Molecular Structure Computation from Multiple Data Sources - Chen (2000)   Self-citation (Hennessy)

....For increased parallelism, the dynamic algorithm also necessarily trades off some of the data locality orchestrated in the static scheme. The extent of the compromise may be quantified by examining the increase in remote memory operations the processes incur. Unfortunately, unlike some machines [60], the performance counters on the Origin are not able to distinguish local misses from remote. The available instrumentation [89] tells us that for the 64 processor runs of 1GLN, the static version of the application uses an average of 11.8 MB/sec of total memory bandwidth per process, while the ....

D. Lenoski, J. Laudon, K. Gharachorloo, A. Gupta, and J. Hennessy, "The Directory-Based Cache Coherence Protocol for the DASH Multiprocessor", pp. 148-159, in Proc. 17th Annual International Symposium on Computer Architecture, 1990.


Interleaving: A Multithreading Technique Targeting.. - Laudon, Gupta, Horowitz (1994)   (38 citations)  Self-citation (Laudon Gupta)

....on a multiprocessor. By addressing the needs of the workstation environment, our proposal makes multiple contexts more attractive for commodity microprocessors. 1 Introduction Large scale multiprocessors, such as the one shown in Figure 1, are increasingly built using commodity microprocessors [2, 16]. While these commodity microprocessors provide a relatively low-cost compute node, their performance depends heavily on employing a sophisticated cache hierarchy to insulate the processor from the long remote memory latency. Providing the ability to cache shared data [16] can greatly increase the ....

....microprocessors [2, 16]. While these commodity microprocessors provide a relatively low-cost compute node, their performance depends heavily on employing a sophisticated cache hierarchy to insulate the processor from the long remote memory latency. Providing the ability to cache shared data [16] can greatly increase the amount of computation that can be done before requiring a long latency operation; however, it cannot remove the long latency operations completely. To address the performance loss associated with remote cache misses, several latency tolerating schemes have been proposed, ....

[Article contains additional citation context not shown here]

Dan Lenoski, James Laudon, Kourosh Gharachorloo, Anoop Gupta, and John Hennessy. The directory-based cache coherence protocol for the DASH multiprocessor. In Proceedings of the 17th Annual International Symposium on Computer Architecture, pages 148--159, May 1990.


Memory Consistency and Event Ordering in Scalable.. - Gharachorloo.. (1990)   (450 citations)  Self-citation (Lenoski Laudon Gharachorloo Gupta Hennessy)

....still providing a reasonable programming model for the programmer. Architectural optimizations that reduce memory latency are especially important for scalable multiprocessor architectures. As a result of the distributed memory and general interconnection networks used by such multiprocessors [8, 9, 12], requests issued by a processor to distinct memory modules may execute out of order. Caching of data further complicates the ordering of accesses by introducing multiple copies of the same location. While memory accesses are atomic in systems with a single copy of data (a new data value becomes ....

....ordering accesses under the various consistency models. The problem is split between ordering accesses to the same memory block and those to different memory blocks. General solutions to achieve the proper ordering are given along with the particular solutions employed in the DASH prototype system [8]. Our discussion focuses on invalidation based coherence protocols, although the concepts can also be applied to update based protocols. 6.1 Inter Block Access Ordering and the FENCE Mechanism As a result of the distribution of the memory and the use of scalable interconnection networks, ....

[Article contains additional citation context not shown here]

Dan Lenoski, James Laudon, Kourosh Gharachorloo, Anoop Gupta, and John Hennessy. The directory-based cache coherence protocol for the DASH multiprocessor. In Proceedings of the 17th Annual International Symposium on Computer Architecture, May 1990.


The Performance Advantages of Integrating Block Data.. - Woo, Singh, Hennessy (1994)   (20 citations)  Self-citation (Hennessy)

....event driven reference generator [11]. Our detailed, variable-latency memory system simulator models contention at the node controller and memory system, but not in the network itself. An invalidation based cache coherence protocol similar to the one used in the Stanford DASH multiprocessor [13] is simulated. Processors are forced to block on read misses, but infinite write buffering hardware is included to eliminate processor stalls on write misses. To reduce miss latencies, speculative memory reads are performed at the home node at the same time that the state of a line is checked. The ....

Dan Lenoski, James Laudon, Kourosh Gharachorloo, Anoop Gupta, and John Hennessy. The Directory-Based Cache Coherence Protocol for the DASH Multiprocessor. In Proceedings of the 17th Annual International Symposium on Computer Architecture, pages 148--159, May 1990.


Comparative Evaluation of Latency Reducing and.. - Gupta, Hennessy.. (1991)   (103 citations)  Self-citation (Gharachorloo Gupta Hennessy)

....each one on its own. Overall, we show that using suitable combinations of the techniques, performance can be improved by 4 to 7 times. 1 Introduction Large scale shared memory multiprocessors are expected to have remote memory reference latencies of several tens to hundreds of processor cycles [18, 22, 25, 30]. The large latencies arise partly due to the increased physical dimensions of the parallel machine and partly due to the ever increasing clock rates at which the individual processors operate. These large memory latencies can quickly offset any performance gains expected from the use of ....

....offset any performance gains expected from the use of parallelism. Techniques that can help to reduce or hide these latencies are essential for achieving high processor utilization. To cope with the large latencies, several different architectural techniques have been proposed. Coherent caches [3, 4, 18, 30] allow shared read write data to be cached and significantly reduce the memory latency seen by the processors. Relaxed memory consistency models [1, 5, 8] hide latency by allowing buffering and pipelining of memory references. Prefetching techniques [11, 16, 21, 23] hide the latency by bringing ....

[Article contains additional citation context not shown here]

D. Lenoski, J. Laudon, K. Gharachorloo, A. Gupta, and J. Hennessy. The directory-based cache coherence protocol for the DASH multiprocessor. In Proc. Int. Symp. Comput. Arch., pages 148-159, May 1990.


Load Balancing and Data Locality in Adaptive.. - Singh, Holt.. (1995)   (30 citations)  Self-citation (Gupta Hennessy)

....of this generalized multiprocessor in our experiments: the Stanford DASH Multiprocessor, a high performance research machine, and a simulated multiprocessor. [Figure 10: The simulated multiprocessor architecture.] The Stanford DASH Multiprocessor: The DASH machine [20] has 48 processors organized in 12 clusters. A cluster comprises 4 MIPS R3000 processors connected by a shared bus, and clusters are connected together in a mesh network. Every processor has a 64KB first level cache memory and a 256KB second level cache, and every cluster has an equal fraction ....

Dan Lenoski, James Laudon, Kourosh Gharachorloo, Anoop Gupta, and John Hennessy. The directory-based cache coherence protocol for the DASH multiprocessor. In Proceedings of the 17th Annual International Symposium on Computer Architecture, pages 148-159, May 1990.


Tolerating Latency Through Software-Controlled Prefetching in.. - Mowry, Gupta (1991)   (232 citations)  Self-citation (Gupta)

....directed prefetching. A prefetch operation in DASH is an explicit non-blocking request to the memory system that brings the prefetched location into a cache close to the processor. Prefetching in DASH is non-binding in the sense that prefetched data remains visible to the cache coherence protocol [18] to keep it consistent until the processor actually reads the value through a binding access (e.g. a register load operation). In contrast, with binding prefetching [9, 14] the value of a later reference is bound (e.g. a processor register is loaded) at the time the prefetch completes. As a ....

....or more cycles. This section presents the architectural assumptions that we make, the benchmark applications, and the simulation environment used to get performance results. 2.1 Architectural Assumptions: For this study, we have chosen an architecture that resembles the DASH multiprocessor [18], a large scale cache coherent machine currently being built at Stanford. Figure 1 shows the high level organization of the architecture. The architecture consists of several processing nodes (or clusters) connected through a low .... [Figure 1: The DASH ....]

[Article contains additional citation context not shown here]

Dan Lenoski, James Laudon, Kourosh Gharachorloo, Anoop Gupta, and John Hennessy. The directory-based cache coherence protocol for the DASH multiprocessor. In Proceedings of the 17th Annual International Symposium on Computer Architecture, pages 148-159, May 1990.


Scaling Parallel Programs for Multiprocessors: - Examples   Self-citation (Gupta Hennessy)

No context found.

Dan Lenoski, James Laudon, Kourosh Gharachorloo, Anoop Gupta, and John Hennessy. The directory-based cache coherence protocol for the DASH multiprocessor. In Proceedings of the 17th Annual International Symposium on Computer Architecture, pages 148-159, May 1990.


Computer Architecture: a qualitative overview of Hennessy and.. - Machanick (2001)   Self-citation (Hennessy)

.... primitives like barriers which scale poorly [Cheriton et al. 1993]. The ParaDiGM architecture [Cheriton et al. 1991] contains some interesting ideas about coherency based locks as well as the notion of scaling up shared memory with a hierarchy of buses and caches, while the DASH project [Lenoski et al. 1990] was one of the earliest to introduce latency hiding strategies (an issue now with uniprocessor systems). Tree based barriers attempt to distribute the synchronization overhead, so a barrier does not become a hot spot for global contention for locks [Mellor-Crummey and Scott 1991]. ....

D Lenoski, J Laudon, K Gharachorloo, A Gupta and J Hennessy. The Directory-Based Cache Coherence Protocol for the DASH Multiprocessor, Proc. 17th Int. Symp. on Computer Architecture, Seattle, WA, May 1990, pp 148--159.


Improving the I/O Performance and Correctness of Network File.. - Wang (1999)

No context found.

Lenoski, D., Laudon, J., Gharachorloo, K., Gupta, A., and Hennessy, J. (1990). The Directory-Based Cache Coherence Protocol for the DASH Multiprocessor. In Proc. of the 17th International Symposium on Computer Architecture, pages 148--159.


Emulation of a Virtual Shared Memory Architecture - Raina (1993)   (3 citations)

No context found.

D. Lenoski, J. Laudon, K. Gharachorloo, A. Gupta, and J. Hennessy. The Directory-based Cache Coherence Protocol for the DASH Multiprocessor. In Proceedings of the 17th Annual International Symposium on Computer Architecture, pages 148--159, 1990.


Arvind Krishnamurthy - Report No Ucb

No context found.

D. Lenoski, J. Laudon, K. Gharachorloo, A. Gupta, and J. Hennessy. The directory-based cache coherence protocol for the DASH multiprocessor. In 17th International Symposium on Computer Architecture, pages 148--159, 1990.


Heuristics for Complexity-Effective Verification of a Cache.. - Abts, Chen, Lilja (2003)

No context found.

D. E. Lenoski. The directory-based cache coherence protocol for the DASH multiprocessor. Proc of the 17th Annual Int. Symposium on Computer Architecture, pages 148--159, June 1990.


Assessment of Cache Coherence Protocols in Shared-memory.. - Grbic (2003)

No context found.

Daniel Lenoski, James Laudon, Kourosh Gharachorloo, Anoop Gupta, and John L. Hennessy. The Directory-Based Cache Coherence Protocol for the DASH Multiprocessor. In Proc. of the 17th International Symposium on Computer Architecture, pages 148--159, Seattle, Washington, June 1990.


The Thrifty Barrier: Energy-Aware Synchronization in - Shared-Memory..

No context found.

D. Lenoski, J. Laudon, K. Gharachorloo, A. Gupta, and J. L. Hennessy. The directory-based cache coherence protocol for the DASH multiprocessor. In International Symposium on Computer Architecture, pages 148--159, Seattle, WA, June 1990.


An Experimental Evaluation of - Software Distributed Shared

No context found.

D. Lenoski, J. Laudon, K. Gharachorloo, A. Gupta, J. Hennessy, M. Horowitz, and M. Lam, "The Directory-Based Cache Coherence Protocol for the DASH Multiprocessor"


The Coherence Predictor Cache: A Resource-Efficient and .. - Nilsson, Landin..

No context found.

D. E. Lenoski, J. P. Laudon, K. Gharachorloo, A. Gupta, and J. L. Hennessy. The Directory-Based Cache Coherence Protocol for the DASH Multiprocessor. In Proc. of ISCA-17, pages 148--159, May 1990.


Coherence Buffer: An Architectural Support for.. - Sarojadevi, Nandy.. (2002)

No context found.

Daniel Lenoski, James Laudon, Kourosh Gharachorloo, Anoop Gupta, and John Hennessy. The Directory-based Cache Coherence Protocol for the DASH multiprocessor. In Proceedings of the 17th Annual International Symposium on Computer Architecture, May 1990.


A Study on Memory-Based Communications and Synchronization in.. - Matsumoto (2001)

No context found.

D. E. Lenoski, J. Laudon, K. Gharachorloo, A. Gupta, and J. L. Hennessy. The Directory-Based Cache Coherence Protocol for the DASH Multiprocessor. In Proc. of the 17th Int. Symp. on Computer Architecture, pages 148--159, May 1990.


A Framework of Memory Consistency Models - Hu, Shi, Tang (1998)

No context found.

D. Lenoski, J. Laudon, K. Gharachorloo, P. Gibbons, A. Gupta, and J. Hennessy, "The Directory-Based Cache Coherence Protocol for the DASH Multiprocessors", In Proceedings of the 17th Annual International Symposium on Computer Architecture, pp. 148--158, June 1990.


Experiences with Oasis+: A Fault Tolerant Storage System - David Watson Yan (2001)

No context found.

D. Lenoski, J. Laudon, K. Gharachorloo, A. Gupta, and J. Hennessy, The Directory-Based Cache Coherence Protocol for the DASH Multiprocessor, Procs of the 17th International Symp on Computer Architecture, pp. 148-159, Seattle, WA, May 1990.


Out-of-Order Execution in Sequentially Consistent Shared-Memory.. - Hu, Xia

No context found.

D. Lenoski, J. Laudon, K. Gharachorloo, P. Gibbons, A. Gupta, and J. Hennessy, "The Directory-Based Cache Coherence Protocol for the DASH Multiprocessors", In Proceedings of the 17th Annual International Symposium on Computer Architecture, pp. 148--158, June 1990.


Verifying Sequential Consistency on Shared-Memory Multiprocessors .. - Qadeer (2001)   (24 citations)

No context found.

D. Lenoski, J. Laudon, K. Gharachorloo, A. Gupta, and J. Hennessy. The directory-based cache coherence protocol for the DASH multiprocessor. In Proceedings of the 17th Annual International Symposium on Computer Architecture, pages 148--159, 1990.


Distributed Paging for General Networks - Awerbuch, Bartal, Fiat (1996)   (36 citations)

No context found.

D. Lenoski, J. Laudon, K. Gharachorloo, A. Gupta, and J. Hennessy. The directory-based cache coherence protocol for the DASH multiprocessor. In Proc. of 17th Intern. Symp. on Computer Architecture, pages 148--159, 1990.

CiteSeer.IST - Copyright Penn State and NEC