38 citations found.
E. Hagersten and S. Haridi. The Cache Coherence Protocol of the Data Diffusion Machine. In Proceedings of the Parallel Architectures and Languages Europe, PARLE, 1989.


This paper is cited in the following contexts:


Implementing Shared Memory On Large-Scale Multiprocessors - Parthasarathy (1992) (1 citation)

....protocols do not impose any restrictions on the interconnection network and, therefore, can be used in large-scale parallel machines. Examples of such protocols are the full-map scheme [5, 12], the chained-list scheme [6], the limited-copy scheme [6, 7], and the hierarchical protocols [11, 18]. Two major classes of large-scale shared-address-space multiprocessors have emerged in the literature. These are non-uniform memory access machines (NUMA) [17, 14, 7] and cache-only memory architectures (COMA) [17, 11, 18]. Both incorporate distributed ....

....scheme [6], the limited-copy scheme [6, 7], and the hierarchical protocols [11, 18]. Two major classes of large-scale shared-address-space multiprocessors have emerged in the literature. These are non-uniform memory access machines (NUMA) [17, 14, 7] and cache-only memory architectures (COMA) [17, 11, 18]. Both incorporate distributed memory and directory-based cache coherence. [Figure 1.2: A Shared Memory System with Caches and Distributed Memory (processors, caches, and local memories on an interconnection network).] In a NUMA machine, each processor has a local memory and a cache. ....

[Article contains additional citation context not shown here]

S. Haridi and E. Hagersten, "The cache coherence protocol of the data diffusion machine," Parallel Architectures and Languages Europe, pp. 1--18, June 1989.


PHD: A Hierarchical Cache Coherent Protocol - Wallach (1992) (1 citation)

....consider how his ideas would work on very large-scale systems. Archibald, in [5], proposes another solution intended for a small hierarchy of buses, remarking that his protocol is feasible for a two-level hierarchy, but not necessarily a three-level or four-level one. Haridi and Hagersten [12] later proposed a hierarchical scheme designed for much larger systems: the Data Diffusion Machine (DDM). Their architecture also assumes a tree composed of buses, which forces all requests to be routed through the hierarchy. The intermediate-level directories store information as to ....

....[Figure: The black node is performing a read request. The grey nodes have copies of the block being requested.] ... request mechanism, and one in the asynchronous invalidate mechanism. We briefly compare four different solutions to this tradeoff, three of which are part of existing protocols. DDM: The DDM protocol [12] requires more traversals of the hierarchy than do any of the other protocols. This requirement is reasonable given the assumptions of that project: they propose to implement their protocol on a bus-based system, where the hierarchy is fixed in hardware and cannot be circumvented. During a read ....

Seif Haridi and Erik Hagersten. The cache coherence protocol of the Data Diffusion Machine. In PARLE '89 Parallel Architectures and Languages Europe, volume I, pages 1-18, June 1989.


SMART: a Simulator of Massive ARchitectures and Topologies - Petrini, Vanneschi (1997)

....important research prototypes, as the Stanford FLASH, provide a cache-coherent shared address space and have communication processors optimized to implement the coherency protocols. COMA machines are another kind of shared-memory multiprocessors with peculiar architectural requirements [17] [14]. At the moment there is no clear winner between all existing architectural approaches. For this reason we decided to provide basic primitives to model the desired machine rather than having a fixed internal structure. We didn't want to tie the design of our simulator to a peculiar architecture. ....

S. Haridi and E. Hagersten. The Cache Coherence Protocol of the Data Diffusion Machine. In PARLE'89, Parallel Architectures and Languages Europe, volume I, pages 1--18, June 1989.


Network Performance under Physical Constraints - Petrini, Vanneschi (1997)

....Elite is the basic building block of the Meiko CS-2 network [4]. This network takes the form of a quaternary fat tree. Its design is based on a multistage network and has the property that the overall communication bandwidth remains constant at each level. Other references to fat trees include [5], [6]. Unfortunately, not much is known on the communication performance of the fat trees. Most of the literature deals with the CM-5 and focuses on raw network performance [7], [8], [9]. Thanks to their simplicity and expandability, low-dimensional cubes have been adopted as interconnection networks ....

S. Haridi and E. Hagersten, "The Cache Coherence Protocol of the Data Diffusion Machine," in PARLE'89, Parallel Architectures and Languages Europe, vol. I, pp. 1--18, June 1989.


Efficient Personalized Communication on Wormhole Networks - Petrini, Vanneschi (1997)

....The communication processor Elan, which interfaces each processing node to the network, attaches at the beginning of each outgoing message a string that is used by the communication processors to route the message in the ascending and descending phases. Other references to fat trees include [30], [31]. Unfortunately, not much is known on the communication performance of the fat trees. Most of the literature deals with the CM-5 and focuses on raw network performance [32], [33], [34]. Typical communication patterns include simple sends and ping-pong between pairs of nodes. Block permutations ....

S. Haridi and E. Hagersten, "The Cache Coherence Protocol of the Data Diffusion Machine," in PARLE'89, Parallel Architectures and Languages Europe, vol. I, pp. 1--18, June 1989.
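The Petrini and Vanneschi excerpts describe messages that climb a (quaternary) fat tree in an ascending phase and then descend to the destination, guided by a routing string. The Python sketch below illustrates that two-phase idea for a k-ary n-tree under stated assumptions: node ids are written as n base-k digits, a leaf switch joins the k nodes that differ only in the last digit, and the up-port choice is left open (any parent reaches a suitable common ancestor). The function names and the (levels, down_ports) return format are hypothetical; this is not the Elan's actual routing-string encoding.

```python
def to_digits(node: int, k: int, n: int) -> list[int]:
    """Return the n base-k digits of a node id, most significant digit first."""
    digits = []
    for _ in range(n):
        digits.append(node % k)
        node //= k
    return digits[::-1]


def fat_tree_route(src: int, dst: int, k: int, n: int) -> tuple[int, list[int]]:
    """Sketch of minimal two-phase routing in a k-ary n-tree (hypothetical API).

    Returns (levels, down_ports): how many switch levels the message climbs
    in the ascending phase, and the output port taken at each switch in the
    descending phase. Up ports are not fixed here, because any parent switch
    leads to a valid common ancestor.
    """
    if not (0 <= src < k ** n and 0 <= dst < k ** n):
        raise ValueError("node id out of range")
    if src == dst:
        return 0, []
    s, d = to_digits(src, k, n), to_digits(dst, k, n)
    # First (most significant) digit where source and destination differ.
    j = next(i for i in range(n) if s[i] != d[i])
    # Climb to a common ancestor, then descend along the remaining destination digits.
    return n - j, d[j:]


if __name__ == "__main__":
    # Quaternary fat tree (k = 4) with n = 3 levels, i.e. 64 nodes.
    print(fat_tree_route(6, 7, 4, 3))  # prints: (1, [3])
    print(fat_tree_route(6, 8, 4, 3))  # prints: (2, [2, 0])
```

For k = 4 and n = 3, nodes 6 and 7 share a leaf switch, so the message climbs one level and exits on port 3; nodes 6 and 8 first differ in the middle digit, so it climbs two levels and descends through ports 2 and 0.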


Communication Performance of Wormhole Interconnection Networks - Petrini (1997)

....The communication processor Elan, which interfaces each processing node to the network, attaches at the beginning of each outgoing message a string that is used by the communication processors to route the message in the ascending and descending phases. Other references to fat trees include [70], [88]. Unfortunately, not much is known on the communication performance of the fat trees. Most of the literature deals with the CM-5 and focuses on raw network performance [94], [103], [107]. Typical communication patterns include simple sends and ping-pong between pairs of nodes. Block ....

....important research prototypes, as the Stanford FLASH [71], provide a cache-coherent shared address space and have communication processors optimized to implement the coherency protocols. COMA machines are another kind of shared-memory multiprocessors with peculiar architectural requirements [88] [70]. At the moment there is no clear winner between all existing architectural approaches. For this reason we decided to provide basic primitives to model the desired machine rather than having a fixed internal structure. We didn't want to tie the design of our ....

Seif Haridi and Erik Hagersten. The Cache Coherence Protocol of the Data Diffusion Machine. In PARLE'89, Parallel Architectures and Languages Europe, volume I, pages 1--18, June 1989.


Automatic Scheduling for Cache Only Memory Architectures - Moore, Klauer, Waldschmidt (1998)

....We have transformed the MESI protocol into an ECOLI protocol. An object can thus be in one of 5 states, relative to a given attraction memory, as summarized in table 1. This can be compared to the 7 states needed for the SICS DDM: Invalid, Exclusive, Shared, Reading, Answering, Waiting and Leaving [11]. For a specific implementation, given details about the actual network, some simplifications of the protocol might be possible. For example, if the network is a shared bus, and if the bus arbiter can be relied upon to choose between multiple recipients when necessary, then the Leaving state is ....

S. Haridi and E. Hagersten. The cache coherence protocol of the Data Diffusion Machine. In Proceedings of the PARLE 89, volume 1, pages 1--18. Springer-Verlag, 1989.
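The Moore, Klauer, and Waldschmidt excerpt compares the five ECOLI states with the seven states of the SICS DDM. A minimal Python sketch that simply encodes both state sets follows. Treating Reading, Answering, Waiting, and Leaving as transient (a transaction in flight) is an assumption here, inferred from the Hagersten et al. (1991) excerpt later on this page, which names exclusive, shared, and invalid as the stable states; no state transitions are modeled.

```python
from enum import Enum


class DDMState(Enum):
    """The seven SICS DDM states listed in the excerpt."""
    INVALID = "I"
    EXCLUSIVE = "E"
    SHARED = "S"
    READING = "R"
    ANSWERING = "A"
    WAITING = "W"
    LEAVING = "L"


class ECOLIState(Enum):
    """The five ECOLI states of the SDAARC protocol."""
    EXCLUSIVE = "E"
    CLONED = "C"
    ORIGINAL = "O"
    LEAVING = "L"
    INVALID = "I"


# Assumption: only Invalid, Exclusive, and Shared are stable; the other DDM
# states are taken to mean a transaction is still in progress.
DDM_STABLE = frozenset({DDMState.INVALID, DDMState.EXCLUSIVE, DDMState.SHARED})


def is_transient(state: DDMState) -> bool:
    """Return True if the state marks an outstanding transaction."""
    return state not in DDM_STABLE


if __name__ == "__main__":
    transient = [s.name for s in DDMState if is_transient(s)]
    print(len(DDMState), len(ECOLIState), transient)
    # prints: 7 5 ['READING', 'ANSWERING', 'WAITING', 'LEAVING']
```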


The SDAARC Architecture - Moore, Klauer, Waldschmidt

....is described in [15]. Both of these protocols have much in common: both allow (copies of) framelets and data containers to be in one of five states: Exclusive, Clone, Original, Leaving, and Invalid. These states are similar to (but somewhat simpler than) the states in other COMA protocols such as [8] or [22]. The main difference between SDAARC and other COMAs lies in the transactions which the protocol allows. While other protocols allow load and store transactions (either blocking or split-phase), SDAARC allows only apply transactions, where results from one object (framelet or data ....

S. Haridi and E. Hagersten. The cache coherence protocol of the Data Diffusion Machine. In Proceedings of the PARLE 89, volume 1, pages 1--18. Springer-Verlag, 1989.


Combining Static Partitioning with Dynamic.. - Moore, Klang, Klauer, .. (1999)

....compared to other COMAs. In our protocol [MKW98a], objects can be in one of five states: Exclusive, Cloned, Original, Leaving or Invalid (ECOLI). This can be compared to the standard SMP (non-COMA) MESI (Modified, Shared, Exclusive and Invalid) protocols, or the 7-state COMA protocol in [HH89]. The shared address space in SDAARC is separated into three partitions. As in a standard Harvard architecture, there exist data and instruction partitions. However, we add a third partition for frames, to allow special optimized handling of these essential data objects. Further, as discussed ....

Seif Haridi and Erik Hagersten. The cache coherence protocol of the Data Diffusion Machine. In Proceedings of the PARLE 89, volume 1, pages 1--18. Springer-Verlag, 1989.


A LA-COMA Implementation of Parallel Volume Rendering - Law, Yagel (1995)

....one node or shared in several nodes. The directories contain only state information, to reduce memory overhead; data are not stored. Such a hierarchical directory organization can be found in machines like the Kendall Square Research KSR-1 [2] and the Swedish Institute of Computer Science's DDM machine [6]. In another organization of the distributed directories, Stenström et al. [14] proposed the COMA-F (flat COMA) architecture, which does not use a hierarchical directory-based cache coherence protocol. COMA-F uses a flat directory organization in which the directory memories are physically distributed ....

Hagersten, E., S. Haridi, and D.H.D. Warren. "The Cache-Coherence Protocol of the Data Diffusion Machine". In Michel Dubois and Shreekant Thakkar, editors, Cache and Interconnect Architectures in Multiprocessors. Kluwer Academic Publishers, 1990.


Page Placement For Non-Uniform Memory Access Time (NUMA) Shared .. - LaRowe, Jr. (1991)

....problem which can be effectively studied in the context of the DUnX operating system kernel. Several scalable shared-memory multiprocessors based on hierarchical busses and consistent caches have been proposed (e.g., the VMP-MC [CGB89], Encore's Gigamax [Wil87], and the Data Diffusion Machine [HHW90]). In these systems, each shared memory module is local to some bus and the processors directly connected to that bus (the memory and processors connected to a bus comprise a cluster of the machine). Despite the presence of coherent caches, however, the benefits of taking advantage of locality in ....

E. Hagersten, S. Haridi, and D.H.D. Warren. The cache coherence protocol of the Data Diffusion Machine. In M. Dubois and S. Thakkar, editors, Proceedings of the Cache and Interconnect Workshop, Norwell, Mass., 1990. Kluwer Academic Publishers.


Page Placement For Non-Uniform Memory Access Time (NUMA) Shared .. - LaRowe, Jr. (1991)

....[CGBG88] uses software-controlled caching. The Gigamax uses an extended snoopy caching protocol that ensures consistency throughout the bus hierarchy. The Data Diffusion Machine (DDM) at the Swedish Institute of Computer Science is another proposed architecture based on hierarchical busses [HH89]. In the DDM proposal, however, the main memory of the machine is comprised only of the cache space at each processor. Any data not available in some processor's cache must be stored on disk. The Wisconsin Multicube computer being developed by Goodman and Woest [GW88] extends the bus-based ....

S. Haridi and E. Hagersten. The cache coherence protocol of the Data Diffusion Machine. In Proceedings of PARLE 89, pages 1--18. Springer-Verlag, 1989.


Distributed-Memory 3D Rendering with Object Migration - Law, Yagel (1995)

....In this new approach, data are not assigned statically to the nodes. Rather, they migrate and replicate at processors on demand, leading to better utilization of local memory. Although the memory management mechanism in our algorithm resembles COMA (Cache-Only Memory Architecture) machines [2][10], it does not use their hierarchical organization but rather relies on a flat arrangement, thus making it more suitable for implementation on any underlying topology, similar to [18]. The paper is organized as follows: in the next section, we give a brief introduction to 3D rendering, and look at ....

E. Hagersten, S. Haridi, D.H.D. Warren. "The Cache-Coherence Protocol of the Data Diffusion Machine". In Michel Dubois and Shreekant Thakkar, editors, Cache and Interconnect Architectures in Multiprocessors. Kluwer Academic Publishers, 1990.


Toward The Design Of Large-Scale, Shared-Memory Multiprocessors - Scott (1992) (3 citations)

....of scalability, including uniformly, architecturally and implementationally scalable. On the other hand, Hill [Hill90] questions whether scalability can be usefully defined at all. He challenges the technical community to either define the term rigorously, or stop using it altogether. Others [Leno90, Cher89, Hage89] use the term without accompanying definition, relying on the readers' intuitive definitions. I believe that a rigorous definition of scalability may be of little use, but that we can arrive at a useful working definition. Several qualifications need to be offered at the start, however. First, ....

Hagersten, E. and S. Haridi, The Cache Coherence Protocol of the Data Diffusion Machine, SICS Research Report R-89004, Swedish Institute of Computer Science, Kista, Sweden, May 1989.


K-ary N-trees: High Performance Networks for Massively.. - Petrini, Vanneschi (1995) (1 citation)

....The communication processor Elan, which interfaces each processing node to the network, attaches at the beginning of each outgoing message a string that is used by the communication processors to route the message in the ascending and descending phases. Other references to fat trees include [HH89, Ken91]. Unfortunately, not much is known on the communication performance of the fat trees. Most of the literature deals with the CM-5 and focuses on raw network performance [KTR93, LTD 92, MB94]. Typical communication patterns include simple sends and ping-pong between pairs of nodes. ....

Seif Haridi and Erik Hagersten. The Cache Coherence Protocol of the Data Diffusion Machine. In PARLE'89, Parallel Architectures and Languages Europe, volume I, pages 1--18, June 1989.


Reactive Proxies: a Flexible Protocol Extension to Reduce.. - Talbot, Kelly

....[4]. Architectures based on clusters of bus-based multiprocessor nodes provide an element of read combining, since caches in the same cluster snoop their shared bus. Caching extra copies of data to speed up retrieval time for remote reads has been explored for hierarchical architectures, including [5]. The proxies approach is different because it does not use a fixed hierarchy: instead it allows requests for copies of successive data lines to be serviced by different proxies. Attempts have been made to identify widely shared data for combining, including the GLOW extensions to the SCI protocol ....

Seif Haridi and Erik Hagersten. The cache coherence protocol of the Data Diffusion Machine. In E. Odijk, M. Rem, and J.-C. Syre, editors, PARLE 89 Parallel Architectures and Languages Europe, Eindhoven, volume 365 of Lecture Notes in Computer Science, pages 1--18. Springer-Verlag, June 1989.


A Survey of Verification Techniques for Cache Coherence Protocols - Pong, Dubois (1996)

....provided by a cache coherence protocol which defines a set of rules coordinating processors, cache controllers, and memory controllers. The verification of cache coherence protocols is an important subject which has been neglected for a long time. Many protocols have been proposed and implemented [6, 16, 30, 46, 57, 61, 88]; however, their correctness has never been formally validated. The main reason for this state of affairs is that most existing protocols are relatively simple snooping protocols which use broadcast of updates or invalidations to keep data copies consistent. Their correctness can be established by ....

....of the memory word. The memory module which contains memory location a is commonly referred to as the home memory of a, and the home memory is located at the home node, or the home [61]. In the COMA model (Figure 1.c), the distributed memory modules are referred to as the attraction memories [46] and act as caches of very large capacity. Data can be replicated freely in the attraction memories of different processor nodes, as if they were caches. A commercial COMA, the KSR-1, has also been called ALLCACHE [79] to emphasize the fact that all memories behave as caches. ....

[Article contains additional citation context not shown here]

Haridi, S. and Hagersten, E., "The Cache Coherence Protocol of the Data Diffusion Machine", Proc. PARLE 89, Vol. 1, Springer-Verlag, pp. 1-18, 1989.


Highly Concurrent Cache Coherence Protocols - Williams, Reynolds, Jr. (1990)

....the affected block or not. Because snoopy protocols rely on broadcasting, they scale poorly. The ICN, typically but not necessarily a shared bus, becomes saturated with only a few PEs. Researchers are exploring ways to delay this saturation point by assuming multiple buses arranged hierarchically [CGB89, HaH89, Wil87] or in a grid [CaD90, GoW88]. This approach is promising for programs with access patterns that allow most broadcasts to be restricted to a local cluster of PEs. Directory protocols represent a more general approach to the scalability problem. These protocols do not require broadcasting. They ....

S. Haridi and E. Hagersten, The Cache Coherence Protocol of the Data Diffusion Machine, Proc. PARLE 89 1(1989), 1-18, Springer-Verlag.


Data Management in Networks: Experimental.. - Krick, der Heide, .. (1999) (3 citations)

....and experimental work. Several projects deal with the implementation of distributed shared memory in hardware. Generally, there exist the CC-NUMA (cache-coherent non-uniform memory access) concept (see, e.g., [1, 23, 29]) and the COMA (cache-only memory architecture) concept (see, e.g., [11, 20]). In a CC-NUMA, each node contains one or more processors with private caches and a memory module that is part of the global shared memory. The address of a shared memory block specifies the memory module and the location inside the memory module. For an application to deliver high performance it ....

E. Hagersten, S. Haridi, and D. Warren. The cache-coherence protocol of the data diffusion machine. In M. Dubois and S. Thakkar, editors, Cache and Interconnect Architectures in Multiprocessors. Kluwer Academic Publishers, 1990.


Cache Inclusion And Processor Sampling In Multiprocessor.. - Chame, Dubois (1992)

....on these conditions, we then show how to generate a trace of references for a group of processors and how to use it to simulate the activity of the processors in isolation. Applications of this technique are the efficient designs of processor nodes (especially their caches) and processor clusters [9], with a simulation efficiency independent of the number of processors. Finally, we show how to estimate system-wide metrics such as the number of misses, number of write-backs, and number of cache coherence events, by measuring them in a set of randomly selected processors. We call this approach ....

E. Hagersten, S. Haridi, and D. Warren, "The Cache-Coherence Protocol of the Data Diffusion Machine". In M. Dubois and S. S. Thakkar, editors, Cache and Interconnect Architectures in Multiprocessors, pp. 165-188, Kluwer Academic Publishers, 1990.


An Empirical Comparison of the Kendall Square Research.. - Singh, Joe, Gupta.. (1993) (37 citations)

.... machines) and (ii) COMA (cache-only memory architectures). Examples of CC-NUMA machines are the Stanford DASH [9] and MIT Alewife multiprocessors [1], while examples of COMA are the Kendall Square Research KSR-1 [4] and the Swedish Institute of Computer Science's Data Diffusion Machine (DDM) [6]. KSR-1 is a commercial product, DASH is an experimental prototype, and Alewife and DDM are in implementation. CC-NUMA and COMA architectures have many important features in common, including a shared address space with physically distributed memory, a scalable interconnection network, and ....

....COMA processing node is similar, except that the main memory on the node is itself converted into a very large, hardware-managed cache by adding tags to cache-line-sized blocks in main memory. This large cache, which is the only main memory in the machine, is called the attraction memory (AM) [6]. As a result of this arrangement, the location of a data item in the machine is decoupled from its physical address, and the data item is automatically moved (or replicated) by hardware to the attraction memory of a processor that references it. The main advantage of COMA machines is that the ....

Erik Hagersten, Seif Haridi, and David H.D. Warren. The cache-coherence protocol of the data diffusion machine. In Michel Dubois and Shreekant Thakkar, editors, Cache and Interconnect Architectures in Multiprocessors. Kluwer Academic Publishers, 1990.
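The Singh et al. excerpt explains that a COMA node's main memory becomes a tagged attraction memory, so a datum's location is decoupled from its address and the hardware moves or replicates it to whichever node references it. The toy Python model below sketches only the replication-on-read-miss part. Everything in it is hypothetical: the class names, the dictionary standing in for tagged cache lines, and the linear scan that stands in for a directory lookup (the DDM uses a hierarchy of directories instead). Writes, invalidation, and replacement, including the rule that the last copy of a datum must not be lost, are not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class AttractionMemory:
    """One node's attraction memory: tagged lines with no fixed home address."""
    node_id: int
    lines: dict[int, int] = field(default_factory=dict)  # address tag -> value


class ToyComa:
    """Hypothetical toy model: a datum is replicated into the reader's AM on a miss."""

    def __init__(self, num_nodes: int) -> None:
        self.ams = [AttractionMemory(i) for i in range(num_nodes)]

    def place(self, node: int, addr: int, value: int) -> None:
        """Put the initial copy of a datum in some node's AM (any node will do)."""
        self.ams[node].lines[addr] = value

    def read(self, node: int, addr: int) -> int:
        local = self.ams[node]
        if addr in local.lines:  # hit in the local attraction memory
            return local.lines[addr]
        # Read miss: search the other AMs. A linear scan stands in for the
        # directory lookup a real COMA performs.
        for am in self.ams:
            if addr in am.lines:
                local.lines[addr] = am.lines[addr]  # replicate to the reader
                return local.lines[addr]
        raise KeyError(f"address {addr:#x} is in no attraction memory")

    def holders(self, addr: int) -> list[int]:
        """Node ids whose AM currently holds a copy of addr."""
        return [am.node_id for am in self.ams if addr in am.lines]


if __name__ == "__main__":
    coma = ToyComa(num_nodes=4)
    coma.place(node=0, addr=0x40, value=7)
    print(coma.read(node=2, addr=0x40), coma.holders(0x40))  # prints: 7 [0, 2]
```

Placing a datum at node 0 and reading it from node 2 leaves copies in both attraction memories, which is the replication behavior the excerpt describes.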


A Performance Study of the DDM - a Cache-Only Memory.. - Hagersten.. (1991) (2 citations) Self-citation (Hagersten Haridi)

....Both approaches have their advantages [EK89]. A COMA coherence protocol can adapt the techniques used in other coherence protocols, but must be extended to search for and retrieve a datum on a read miss. The protocol must also make sure that the last copy of a datum is not lost upon replacement [HHW90]. The address space of a COMA corresponds to the physical addresses of conventional architectures. There is, however, nothing very physical about them, nor are they addresses, since they do not tell where a datum resides. We still guarantee room for every datum by making the shared address space ....

....state memories, directories, storing state information for all data in their subsystem, but not their values. In this performance study we have used a hierarchical write-invalidate cache coherence protocol which uses the directories to make the coherence traffic as local as possible [HHW90]. The state of a datum in the directory indicates if the datum resides in the subsystem of the directory, and if so, whether other copies might exist outside the subsystem (stable states: exclusive, shared and invalid). A directory therefore can judge if a coherence transaction needs to be ....

[Article contains additional citation context not shown here]

E. Hagersten, S. Haridi, and D.H.D. Warren. The Cache-Coherence Protocol of the Data Diffusion Machine. In M. Dubois and S. Thakkar, editors, Cache and Interconnect Architectures in Multiprocessors. Kluwer Academic Publishers, Norwell, Mass., 1990.
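The Hagersten et al. (1991) excerpt notes that each directory records whether a datum is in its subsystem and whether copies might exist outside it, so the directory can decide whether a coherence transaction must leave the subsystem. The excerpt is cut off before the rule is stated, so the Python sketch below is an interpretation under assumptions: only the three stable states are modeled, a read is forwarded upward only when the subsystem holds no copy, and a write (invalidation) stays local only when the state is exclusive. The names DirState, Route, route_read, route_write, and read_levels_climbed are hypothetical and do not come from the DDM papers.

```python
from enum import Enum


class DirState(Enum):
    """Stable directory states named in the excerpt (transient states omitted)."""
    INVALID = "I"    # datum not present in this directory's subsystem
    EXCLUSIVE = "E"  # present, and no copy exists outside the subsystem
    SHARED = "S"     # present, but copies might also exist outside


class Route(Enum):
    LOCAL = "handled within the subsystem"
    UP = "propagated to the parent directory"


def route_read(state: DirState) -> Route:
    """A read miss leaves the subsystem only if no copy exists below."""
    return Route.UP if state is DirState.INVALID else Route.LOCAL


def route_write(state: DirState) -> Route:
    """A write (invalidation) stays local only if no outside copy can exist."""
    return Route.LOCAL if state is DirState.EXCLUSIVE else Route.UP


def read_levels_climbed(path: list[DirState]) -> int:
    """Directories a read passes upward before reaching one that can serve it.

    path holds the datum's state in each directory from the requester's
    lowest directory (index 0) up to the root (a hypothetical input format).
    """
    for level, state in enumerate(path):
        if route_read(state) is Route.LOCAL:
            return level
    raise LookupError("datum not found anywhere below the root")


if __name__ == "__main__":
    print(read_levels_climbed([DirState.INVALID, DirState.INVALID, DirState.SHARED]))  # prints: 2
    print(route_write(DirState.SHARED).name, route_write(DirState.EXCLUSIVE).name)  # prints: UP LOCAL
```

With the path [INVALID, INVALID, SHARED], a read passes two directories before reaching one whose subsystem holds a copy, while a write to a datum whose directory state is exclusive never leaves that subsystem.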


Emulation of a Virtual Shared Memory Architecture - Raina (1993) (3 citations)

No context found.

E. Hagersten and S. Haridi. The Cache Coherence Protocol of the Data Diffusion Machine. In Proceedings of the Parallel Architectures and Languages Europe, PARLE, 1989.


Performance Analysis of Wormhole Routed k-ary n-trees - Petrini, Vanneschi (1998)

No context found.

S. Haridi and E. Hagersten, "The Cache Coherence Protocol of the Data Diffusion Machine," in PARLE'89, Parallel Architectures and Languages Europe, volume I, pages 1--18, June 1989.


Stable Performance For Cc-Numa Using First Touch Page.. - Sarah Talbot (1 citation)

No context found.

Seif Haridi and Erik Hagersten. The cache coherence protocol of the Data Diffusion Machine. In E. Odijk, M. Rem, and J.-C. Syre, editors, PARLE 89 Parallel Architectures and Languages Europe, Eindhoven, June 1989, vol.



CiteSeer.IST - Copyright Penn State and NEC