10 citations found.
R.E. Johnson. Extending the Scalable Coherent Interface for Large-Scale Shared-Memory Multiprocessors. PhD Thesis, University of Wisconsin-Madison, 1993.

This paper is cited in the following contexts:
A New Scalable Directory Architecture for Large-Scale .. - Acacio, González..

....the sharing code between them. Each directory entry has a pointer to the first sharer in the list, which in turn has a pointer to the second sharer, and so on. All nodes holding a copy of the line are obtained by performing a list traversal. Optimizations to this proposal can be found in [5][12][18]. All these schemes introduce significant overhead, drastically increasing the latency of coherence transactions. We have not considered these organizations because they represent a different approach from the implementation point of view. Instead of decreasing directory width, other schemes ....

R.E. Johnson. Extending the Scalable Coherent Interface for Large-Scale Shared-Memory Multiprocessors. PhD Thesis, University of Wisconsin-Madison, 1993.
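
As a rough illustration of the linked-list directory organization described in the excerpt above, the following Python sketch keeps one head pointer per directory entry and finds all sharers by walking the list. The names `Sharer`, `DirectoryEntry`, `add_sharer`, and `sharers` are hypothetical, and prepending new sharers at the head is an assumption modeled on SCI-style sharing lists; none of this is code from the cited works.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Sharer:
    """One node holding a copy of the line (hypothetical structure)."""
    node_id: int
    next: Optional["Sharer"] = None  # pointer to the next sharer in the list


@dataclass
class DirectoryEntry:
    """Directory entry holding only a pointer to the first sharer."""
    head: Optional[Sharer] = None

    def add_sharer(self, node_id: int) -> None:
        # Assumption: a new sharer is prepended and becomes the list head.
        self.head = Sharer(node_id, self.head)

    def sharers(self) -> List[int]:
        # Full list traversal: one sequential hop per sharer.
        found = []
        current = self.head
        while current is not None:
            found.append(current.node_id)
            current = current.next
        return found
```

The traversal cost grows linearly with the number of sharers, which is the overhead the excerpt refers to.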


Example-Standard Contents - These Cover Sheets

....1.9.1 Overview The base SCI coherence protocols are limited, in that sharing list additions and deletions are serialized, at the memory and the sharing list head respectively. Simulations indicate this is sufficient when frequently written data is shared by a small number (10s) of processors [ExtJohn]. Although linear lists are a cost-effective solution for many applications, our scalability objectives also mandated a solution for (nearly) arbitrary applications on large multiprocessor systems. Thus, we are developing compatible extensions [ExtStd] to the base coherence protocols. The ....

Ross Evan Johnson, "Extending the Scalable Coherent Interface for Large-Scale Shared-Memory Multiprocessors," PhD Thesis, Computer Sciences Department, University of Wisconsin-Madison, February 1993. Computer Sciences Technical Report #1136.


The Performance of SCI Memory Hierarchies - Roberto Hexsel Nigel (1994)   (1 citation)

....SCI interface on each node. Scalability to 64K nodes comes at the price of added complexity in the communication and coherence protocols. For instance, a write to a shared datum needs a larger number of network messages for its completion than needed by the same operation in DASH [21]. Johnson, in [20], proposes additions to the cache coherence protocol to alleviate this problem. Additional links can be used in the linked lists, thus turning them into trees, and significantly improving the performance of invalidations when there is global sharing. Aboulenein et al., in [1], examine SCI's ....

Ross Evan Johnson. Extending the Scalable Coherent Interface for large-scale shared-memory multiprocessors. Tech Report ????, Univ of Wisconsin--Madison, 1993. PhD thesis.
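
To make the list-versus-tree point in the excerpt above concrete, here is a minimal Python sketch of an idealized cost model. It assumes a linear sharing list is invalidated one sharer at a time, while a complete binary tree forwards invalidations to both children in parallel. The function names are hypothetical and the model ignores message and network costs; it is not the algorithm from the thesis.

```python
def list_invalidation_steps(num_sharers: int) -> int:
    """Sequential hops to invalidate a linear sharing list: one per sharer."""
    return num_sharers


def tree_invalidation_levels(num_sharers: int) -> int:
    """Levels in a complete binary tree holding num_sharers nodes.

    Assumes each node forwards the invalidation to both children in
    parallel, so the level count bounds the sequential steps. For n >= 1
    this is floor(log2(n)) + 1, which equals n.bit_length().
    """
    return num_sharers.bit_length()
```

Under these assumptions, 1,023 sharers take 1,023 sequential steps in a list but only 10 levels in a tree.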


Identification And Optimization Of Sharing Patterns For Scalable.. - Kaxiras (1998)   (4 citations)

....that provide high-quality support for widely shared data may be much larger than would be indicated by a sample of current shared-memory programs, which generally avoid such data wherever possible. Previous proposals for a wide sharing optimization, such as the STEM Kiloprocessor Extensions to SCI [44] and others [17,69,76] have largely ignored network locality in the network or they are closely tied to a network that is physically hierarchical. In this thesis, I propose a comprehensive solution to optimize wide sharing that borrows from the best attributes of previous proposals. The solution ....

....In brief, I propose: For wide sharing: the GLOW optimization; a static address-based, a static instruction-based, two dynamic address-based, and two dynamic instruction-based methods to selectively apply the optimization. WIDE SHARING Static Dynamic Address EC [17] PROXIES [96] 15] STEM [44] Combining [36] Instruction MIGRATORY SHARING Static Dynamic Address Munin [22] Adaptive protocols for migratory data [28] 93] Instruction PRODUCER-CONSUMER SHARING Static Dynamic Address Munin [22] Update protocols Competitive Update Instruction Data Forwarding ....

[Article contains additional citation context not shown here]

Ross E. Johnson, "Extending the Scalable Coherent Interface for Large-Scale Shared-Memory Multiprocessors." Ph.D. Thesis, University of Wisconsin-Madison, 1993.


Hierarchical Extensions to SCI - Goodman, Kaxiras (1994)

....1. Motivation The current proposal for Kiloprocessor Extensions to Scalable Coherent Interface [1], which we will refer to as STEM [2], has achieved its goal of a logarithmic time algorithm to build, maintain and invalidate a sharing binary tree without taking into account the topology of the interconnect network. The complexity of the algorithm, however, is very high. It requires complex transactions that generate a lot of ....

....(Table 1 rows: 3, 5, 125; 4, 5, 625; 5, 5, 3,125; 6, 5, 15,625.) For cases where our tree is longer than STEM, we believe our scheme will still outperform STEM because of the shorter paths of individual links. 3.2 OMEGA FLIP Topologies In his thesis, Ross Johnson suggested building butterfly topologies with rings [2]. A disadvantage of the topologies he suggested is that the rings have a varying number of nodes. We propose two variations of these topologies, the Omega [7] and the Flip. The important characteristic of these two topologies is that they are built of rings of the same size. The ring-based Omega ....

Ross E. Johnson. "Extending the Scalable Coherent Interface for Large-Scale Shared-Memory Multiprocessors," PhD Thesis, University of Wisconsin-Madison, 1993.


An Efficient Hybrid Cache Coherence Protocol for Shared Memory .. - Chang, Bhuyan (1999)   (1 citation)

....network and therefore, a request may be forwarded to a distant node although it could have been satisfied by a neighboring node. The major disadvantage is the sequential nature of the invalidation process for write misses. The scalable tree protocol (STP) [7] and the SCI tree extension protocol [8] were proposed to reduce the latency of write misses. The low latency of read misses is sacrificed in order to construct a balanced tree connecting all the shared copies of a cache block. The large number of messages generated for read misses, however, makes these protocols prohibitive for an ....

....The index i of Dir_i Tree_k represents the number of nodes having shared copies in their local caches. STP [7] belongs to Dir_2 Tree_k because it maintains a k-ary tree and keeps pointers to the root of the tree and the latest node joining the tree. Similarly, the SCI tree extension (P1596.2 [8]) belongs to Dir_2 Tree_2 because it maintains a balanced binary tree and keeps two pointers, one to the root of the tree and the other to the head (latest node joining the tree). Our tree-based protocol is a Dir_i Tree_k scheme with only forward pointers. 2.1 Bit map Schemes A. Full Map ....

[Article contains additional citation context not shown here]

R. E. Johnson, Extending the Scalable Coherent Interface for Large-Scale Shared-Memory Multiprocessors, PhD thesis, University of Wisconsin-Madison, 1993.


CC-NUMA Page Table Management and Redundant Linked List Based.. - Vlaovic

....shared memory and cache coherence as well as communication. However, ATM may be used as the underlying data transferring mechanism for SCI. Efforts are being made to improve the SCI cache coherence protocol for large-scale sharing (sharing a cache line among hundreds or even thousands of nodes) [19, 22] presented later in this paper and in Kiloprocessor Extensions to SCI (P1596.2). This line of work will eventually lead to more complex but effective directory structures. There is a plethora of current research related to operating systems. We are particularly interested in the work related to ....

....this structure will also default to the sequential linked list. Since new additions to the list must be made at the root (head), maintaining a balanced tree is not trivial. The working group in charge of these extensions is the Kiloprocessor Extensions to SCI (P1596.2). Part of the Wisconsin STEM [19] effort on behalf of cache coherency is being considered as an extension to SCI. The Wisconsin STEM (permuted acronym for Tree Merging Extensions to SCI) also organizes the sharing set as a binary tree. The overhead included for each 64 bit cache line includes three pointers and one five-bit height. ....

[Article contains additional citation context not shown here]

R.E. Johnson. Extending the Scalable Coherent Interface for Large-Scale Shared-Memory Multiprocessors. PhD thesis, University of Wisconsin-Madison, 1993.


Reducing Controller Contention in Shared-Memory.. - Talbot, Kelly

....in the interconnection network has been used to avoid contention for reads, writes and fetch and update operations, for example in the NYU Ultracomputer [7] and the Saarbrucken SB PRAM prototype [1]. Attempts have been made to use combining in a cache coherence protocol, notably Johnson's STEM [12] and Kaxiras and Goodman's GLOW extensions to the SCI protocol [14]. Both have overheads which make them suitable only for data structures where the benefits outweigh the costs. Architectures based on clusters of bus-based multiprocessor nodes provide an element of read combining since caches in ....

....set at 1e 6. Trace results of a 32 node system showed that queues of length 31 were occurring for access to elements of the f array which forms part of the G Memory data structure. • Ocean non contig: The original splash suite version of ocean, which this is similar to, was used by Ross Johnson [12] as an example of long sharing lists, so it was specifically selected to see if it exhibits read contention. • Barnes, FFT, Ocean contig, Water nsq: in the absence of any obvious data contention within these applications, all the significant shared data structures were marked for proxying to ....

[Article contains additional citation context not shown here]

Ross E. Johnson. Extending the Scalable Coherent Interface for Large-Scale Shared-Memory Multiprocessors. PhD thesis, Computer Science Department, University of Wisconsin-Madison, Feb 1993.


Kiloprocessor Extensions to SCI - Stefanos Kaxiras (1996)   (1 citation)

....may devote a significant part of its time to widely shared data accesses. The ANSI/IEEE Standard 1596 Scalable Coherent Interface (SCI) [1] defines a cache coherence protocol based on distributed sharing lists. To improve the performance for widely shared data in SCI, the GLOW [4] and STEM [2] kiloprocessor extensions are developed. Both are designed to handle accesses to widely shared data and provide good scalability to large numbers of processors. To enable scalability of programs, both scalable reads and scalable writes for widely shared data are essential. Request combining, ....

....by themselves (if they are GLOW agents). When the agent finds itself childless and tail in its list, it will invalidate and roll out from its upstream neighbor, freeing that to invalidate itself. 4 STEM kiloprocessor extensions to SCI The STEM extensions to SCI, originally developed by Johnson [2], provide a logarithmic time algorithm to build, maintain and invalidate a binary sharing tree (in contrast to GLOW's k-ary trees) without regard to the topology of the interconnection network. STEM employs combining in the interconnect to provide scalable reads. As employed in STEM, combining ....

[Article contains additional citation context not shown here]

Ross E. Johnson, "Extending the Scalable Coherent Interface for Large-Scale Shared-Memory Multiprocessors," PhD Thesis, University of Wisconsin-Madison, 1993.


Request Combining in Multiprocessors with Arbitrary.. - Lebeck, Sohi (1994)   (7 citations)

....the L locations (nodes of the combining tree) must be distributed across the memory modules in order to alleviate excessive contention for a single memory module. Yew, Tzeng, and Lawrie show how software combining can be used for barrier operations [32]. Goodman, Vernon and Woest [5] and Johnson [12] extend the work of Yew, Tzeng, and Lawrie to carry out arbitrary Fetch F operations with a software combining tree. Tang and Yew also provide several algorithms for traversing a combining tree where the type of memory access determines which algorithm is chosen (e.g. barrier synchronization, ....

....the data structure used to maintain the combining set, as well as the algorithm used to carry out the prefix operation on the combining set. One step in this direction is a recent thesis by Johnson where he investigates the use of IPC to build a tree to implement a scalable cache coherence scheme [12]. More direct comparisons between the different forms of combining, using real application workloads, and different network topologies, also need to be done so that we can get a better picture of the cost performance benefits of the various techniques for request combining. ....

Johnson, R. E., "Extending the Scalable Coherent Interface for Large-Scale Shared-Memory Multiprocessors," Ph.D. Thesis, Department of Computer Science (Technical Report #1136), University of Wisconsin-Madison, Madison, WI 53706, February 1993.

CiteSeer.IST - Copyright Penn State and NEC