J. B. Carter et al. Techniques for Reducing Consistency-Related Information in Distributed Shared Memory Systems. ACM Trans. on Computer Systems, 13(3), Aug 1995.
....IBM. Keywords: C.1.2 Multiple Data Stream Architectures (Parallel Processors), B.3.2 Cache Memories, C.4 Performance of Systems. [Figure residue: a while loop that reads hash[index1] and writes hash[index2], executed speculatively as Epochs 1 to 4 on Processors 1 to 4; each epoch ends with an attempted commit, and Epoch 4 incurs a violation and is redone.] ....
....ways to resolve the situation. One option is to simply squash the more speculative epoch. Alternatively, we could allow both epochs to modify their own copies of the cache line and combine them with the real copy of the cache line as they commit, as is done in a multiple writer coherence protocol [2, 3]. The action G.Combine indicates that the current cache line should be combined with the copy stored at the next level of caching in the external memory system. If combining is not supported, the G.Combine action is simply ignored. When an epoch becomes homefree, it may allow its speculative ....
J.B. Carter, J.K. Bennett, and W. Zwaenepoel. Techniques for reducing consistency-related information in distributed shared memory systems. ACM Transactions on Computer Systems, 13(3):205--243, August 1995.
....is to simply squash the logically later epoch, as is the case for our baseline scheme. Alternatively, we could allow both epochs to modify their own copies of the cache line and combine them with the real copy of the cache line as they commit, as is done in a multiple writer coherence protocol [3, 4]. To support multiple writers in our coherence scheme (thus allowing multiple speculatively modified copies of a single cache line to exist), we need the following two new features. First, an invalidation speculative will only cause a violation if it is from a logically earlier epoch and the ....
J.B. Carter, J.K. Bennett, and W. Zwaenepoel. Techniques for reducing consistency-related information in distributed shared memory systems. ACM Transactions on Computer Systems, 13(3):205--243, August 1995.
....Chapter 1. Introduction. This thesis focuses on protocols for lazy release consistent (LRC) [KCZ92] software distributed shared memory (DSM) [LH89] on commodity hardware. Both single writer (SW) [Kel96] and multiple writer (MW) [CBZ95] protocols have been used to implement LRC. SW protocols allow only a single writable copy of a page at any given time. Furthermore, they always transfer a whole page to satisfy an access miss. With MW protocols, several writable copies of a page may co-exist. Instead of transferring whole pages, ....
....while on the remaining six applications, the multiple writer protocol performed better. For two out of the six applications the multiple writer protocol performed significantly (43%) better. On the average, for the other six the SW protocol performed 3% better. Other systems (such as Munin [CBZ95]) allow multiple protocols to be used, but require user annotation to choose between them. In this thesis we take an alternative approach. We observe that for some applications a MW protocol is preferred while for others a SW protocol is more desirable. Even within a single application, different ....
[Article contains additional citation context not shown here]
J.B. Carter, J.K. Bennett, and W. Zwaenepoel. Techniques for reducing consistency-related information in distributed shared memory systems. ACM Transactions on Computer Systems, 13(3):205--243, August 1995.
....found in [3, 4, 16]. To the best of our knowledge, our work is the first to compare performance of a number of applications under release and entry consistency. Various papers have discussed the performance differences between sequential consistency and relaxed memory consistency models (see e.g. [5, 9, 17]). Our work also builds on the large body of previous work in software DSM (e.g. [14]) and hardware DSM (e.g. [2, 10]). 7 Conclusions. This paper presents two related results. First, it compares two different implementations of entry consistency, one using software dirty bits and one using ....
J.B. Carter, J.K. Bennett, and W. Zwaenepoel. Techniques for reducing consistency-related information in distributed shared memory systems. To appear in ACM Transactions on Computer Systems.
....in this area ever since Li and Hudak's seminal work [23] on software distributed shared memory (DSM) in 1985. Today, it is generally accepted that the ill effects of false sharing can be reduced, but not entirely eliminated, using a relaxed memory consistency model and a multiple writer protocol [2, 7, 11, 22]. Despite this, the conventional wisdom remains that the overhead of false sharing, as well as fine-grained true sharing, in page-based consistency protocols is the primary factor limiting the performance of software DSMs. In this paper, we show that a simple data reorganization technique can ....
....performance results. Finally, Section 7 discusses related work and Section 8 concludes the paper. 2 Effects of Data Locality in Software DSMs. In this paper, we focus on data locality in page-based software DSM systems that use lazy release consistency (LRC) [22] and multiple writer protocols [7], such as TreadMarks [2] and Home-based LRC [31]. Lazy release consistency implements the release consistency (RC) [11] memory model. The LRC algorithm delays the propagation of modifications to a processor until that processor performs an acquire. An acquire marks the beginning of a critical ....
J. Carter, J. Bennett, and W. Zwaenepoel. Techniques for reducing consistency-related information in distributed shared memory systems. ACM Transactions on Computer Systems, 13(3):205--243, Aug. 1995.
....is IEEE Transactions on Automatic Control. 2 a result, the distance between processors and memories is non-uniform and the architecture is called non-uniform memory access (NUMA). DSM multiprocessors maintain coherent data based on software techniques or using hardware support. Software DSMs [1, 11] maintain coherence at the page level by recording and merging changes using software routines at synchronization points. Another software method for maintaining cache coherence at the block level is the use of compiler-directed techniques that insert logical timestamps [15]. To avoid the software ....
....that allocate data in local memories intelligently, thus increasing data locality and eliminating or at least significantly reducing the need for remote accesses. Work on memory management policies has been based primarily on DSM multiprocessors without caches [41, 64, 66] and software DSMs [1, 11] that enable coherence at the page level. The impact of memory management policies on CC-NUMA system performance was only recently studied in [14, 77]. These studies primarily focus on OS hardware support needed for dynamic memory management policies involving page migration and replication. They ....
J. Carter, J. Bennett and W. Zwaenepoel, "Techniques for Reducing Consistency-Related Information in Distributed Shared Memory Systems," ACM Transactions on Computer Systems, 13(3):205--243, Aug 1995.
....message exchanges to maintain data coherency. The issue of how to make fast and efficient data movement is thus important for the design of DSM. Some early researchers in this area resorted to the hardware approach to coherency enforcement [1, 2, 3] while others sought software solutions [4, 5]. The relaxed memory consistency models may incorporate hardware to reduce the latency associated with remote memory accesses [1, 2, 3]. However, the cost involved in special purpose hardware may pose a major concern. The software approaches, in contrast, put an emphasis on efficient data movement ....
....approach is more expensive than a hardware approach. In the release consistency model [3], shared memory updates do not become visible to a particular processor until certain synchronization access is done (for example, until the release of a lock to the next processor). Eager release consistency [4] is a software implementation of the release consistency [3] with buffers written till a lock release in order to reduce the number of messages exchanged. The lazy release consistency (LRC) model proposed by Keleher et al. [5] achieves the best performance among known software approaches. The ....
[Article contains additional citation context not shown here]
J. B. Carter et al., "Techniques for Reducing Consistency-Related Information in Distributed Shared Memory Systems," ACM Trans. Computer Systems, Vol. 13, No. 3, pp. 205--243, August 1995.
....provides a significant reduction in execution time (11%, 16%, 33% and 46%) compared to traditional multiple-writer approaches on our platform. Keywords: Memory consistency, false sharing, single writer, multiple writer, multiple protocols. 1 Introduction. Both single writer [8] and multiple writer [4] approaches have been used to implement LRC [6] in software DSM systems. The single-writer approach allows only one writable copy of a page at any given time, and always transfers an entire page to satisfy an access miss. This work is supported in part by National Natural Science Foundation of China ....
John B. Carter, John K. Bennett, and Willy Zwaenepoel. Techniques for reducing consistency-related information in distributed shared memory systems. ACM Transactions on Computer Systems, 13(3):205--243, August 1995.
....and lock and barrier synchronization. The system supports a release consistent (RC) memory model [10] requiring the programmer to use explicit synchronization to ensure that changes to shared data become visible. TreadMarks uses a lazy invalidate [14] version of RC and a multiple writer protocol [3] to reduce the overhead involved in implementing the shared memory abstraction. The virtual memory hardware is used to detect accesses to shared memory. Consequently, the consistency unit is a virtual memory page. The multiple writer protocol reduces the effects of false sharing with such a large ....
J.B. Carter, J.K. Bennett, and W. Zwaenepoel. Techniques for reducing consistency-related information in distributed shared memory systems. ACM Transactions on Computer Systems, 13(3):205--243, August 1995.
....overheads that limit performance. SW DSMs based on relaxed consistency models can reduce these overheads by delaying and/or restricting communication and coherence transactions as much as possible. (This research was supported by Brazilian FINEP MCT, CNPq, and CAPES.) The Munin system [5], for instance, delays the creation and transfer of diffs (encoded modifications to shared pages) until lock release operations, so that messages can be coalesced and the negative impact of false sharing alleviated. TreadMarks [2] delays the coherence transactions even further, until the next lock ....
....bindings either. Both the ScC and AEC protocols assume update based coherence protocols, while ADSM only uses updates for single writer data protected by locks. SW DSM protocols vary mainly in the way they manage the propagation of coherence information at the synchronization points; Munin [5], TreadMarks [2] and its Lazy Hybrid variation [6] and Midway [3] are important examples. AEC leads to much less communication than in Munin, since updates are only sent to the update set of the lock releaser, as opposed to all processors that shared the modified data. Like AEC, TreadMarks and ....
J. B. Carter, J. K. Bennett, and W. Zwaenepoel. Techniques for Reducing Consistency-Related Information in Distributed Shared Memory Systems. ACM Transactions on Computer Systems, 13(3), Aug 1995.
....[Figure residue: pseudocode and per-processor timelines for a loop over x[my_i] and x[y[my_i]]. Threads are created with fork_threads() and ended with join_threads(); an epoch whose speculation does not succeed tries again non-speculatively, and one violation is marked. Panel caption: (c) Loop executed speculatively using recycled threads and static scheduling. The next fragment begins: i = 0; fork to all processors, returning the number of threads created in num_threads; start: my_i = i;] ....
....[Figure residue: pseudocode that schedules the next epoch statically by updating my_i with num_threads, followed by timelines of iterations 1 to 15 spread across the processors with one violation marked.] Figure 2: Speculative execution illustrated using a while loop. (a) A simple while loop. Initialize y so that there is a RAW dependence in iteration 6 of the ....
[Article contains additional citation context not shown here]
J.B. Carter, J.K. Bennett, and W. Zwaenepoel. Techniques for reducing consistency-related information in distributed shared memory systems. ACM Transactions on Computer Systems, 13(3):205--243, August 1995.
....lock and barrier synchronization. The system supports a release consistent (RC) memory model [10]. The programmer is required to use explicit synchronization to ensure that changes to shared data become visible. TreadMarks uses a lazy invalidate [14] version of RC and a multiple writer protocol [3] to reduce the overhead involved in implementing the shared memory abstraction. The underlying virtual memory hardware is used to detect accesses to shared memory. Consequently, the consistency unit is a virtual memory page. The multiple writer protocol reduces the effects of false sharing with ....
J.B. Carter, J.K. Bennett, and W. Zwaenepoel. Techniques for reducing consistency-related information in distributed shared memory systems. ACM Transactions on Computer Systems, 13(3):205--243, August 1995.
.... on the other hand, usually require either system software support for memory management, networking, and interrupts [17, 5, 30, 12, 11] or hardware support, e.g. a special network interface for efficient communications [14, 25] or cache coherence [15, 26]. Among the existing DSM systems, Munin [5, 6] shares the most common features with Adsmith: both support the release consistency (RC) model and both manipulate shared data with objects. They also allow users to specify the properties of each shared object. For example, the shared object can be replicated elsewhere, fixed in a particular site or ....
....is not straightforward for Munin to port to other systems, because Munin requires a linker to create the shared memory segments and operating system support to handle page faults and manipulate page tables. In reducing consistency-related communication, Munin provides an update timeout mechanism [6] to invalidate replicas of a cached variable that has not been accessed recently. The aim is to reduce the number of messages with unnecessary updates or stale updates. TreadMarks [12], a descendant of Munin, supports a variation of the RC model, called the lazy release consistency (LRC) model ....
J.B. Carter, J.K. Bennett, and W. Zwaenepoel, "Techniques for reducing consistency-related information in distributed shared memory systems," ACM Trans. Computer Systems, vol. 13, no. 3, pp. 205--243, Aug 1995.
.... by the updated page itself if there is only one writable copy of a page at any given time (i.e. a single writer scheme [15, 8]) or by the diff obtained from the updated page and its twin if there are multiple writable copies of a page on different processors (i.e. a multiple-writer scheme [5]). An updates propagation protocol can be integrated with either a single writer scheme or a multiple writer scheme. In the following discussion on updates propagation protocols, we assume a DSM system which achieves both time selection and processor selection by requiring programs to explicitly ....
....[1]. The experimental platform consists of 8 SGI workstations running IRIX Release 5.3, which are connected by a 10 Mbps Ethernet. Each of the workstations has a 100 MHz processor and 32 Mbytes memory. The page size in the virtual memory is 4 KB. TreadMarks has adopted a multiple writer scheme [5], which was proposed to minimize the effect of the false sharing problem [12]. In the multiple writer scheme, initially a page is write protected. When a write protected page is first updated by a processor, a twin of the page is created and stored in the system space. When the updates on the page ....
J.B. Carter, J.K. Bennett, and W. Zwaenepoel. "Techniques for reducing consistency-related information in distributed shared memory systems," ACM Transactions on Computer Systems, 13(3):205--243, August 1995.
....modifications are merged at the next synchronization operation. By delaying the merger of the modifications, the ill effects of false sharing are reduced because the page will not ping-pong between the writers. The first implementation of a multiple writer protocol appeared in the Munin system [5]. Munin, however, used a form of eager RC that is generally inferior to LRC [10]. Since then two multiple writer protocols that are compatible with LRC have been developed: the TreadMarks (Tmk) protocol [11] and the Princeton home-based (HLRC) protocol [20]. Some aspects of these protocols are the ....
J.B. Carter, J.K. Bennett, and W. Zwaenepoel. Techniques for reducing consistency-related information in distributed shared memory systems. ACM Transactions on Computer Systems, 13(3):205--243, August 1995.
....synchronize processors within a SMP node in addition to using internal locks for all protocol operations on shared data blocks. In their paper, Erlichson et al. [7] present a single writer sequential consistency implementation, and identify network bandwidth as the bottleneck. Earlier work (e.g. [5]) has demonstrated that the performance of such a system can be poor when false sharing occurs. Finally, our implementation is a relatively portable user level implementation, while theirs is a kernel implementation specific to the Power Challenge Irix kernel. Cashmere 2L [18] uses Unix processes ....
J. Carter, J. Bennett, and W. Zwaenepoel. Techniques for reducing consistency-related information in distributed shared memory systems. ACM Transactions on Computer Systems, 13(3):205-243, Aug. 1995.
....memory (DSM) implementations that do not use type information (e.g. [1, 13]). Conventional VM-based DSM systems have only achieved good performance on relatively coarse-grained applications, because of their reliance on VM pages. Although relaxed memory models [9] and multiple writer protocols [6] reduce the impact of the large page size, fine-grained sharing and false sharing remain problematic [1]. Fine-grained DSM systems have been built using code instrumentation, but they have been limited by the cost of instrumentation and lack of communication aggregation [8]. The system presented ....
J. Carter, J. Bennett, and W. Zwaenepoel. Techniques for reducing consistency-related information in distributed shared memory systems. ACM Transactions on Computer Systems, 13(3):205--243, Aug. 1995.
....memory (DSM) implementations that do not use type information (e.g. [1, 13]). Conventional VM-based DSM systems have only achieved good performance on relatively coarse-grained applications, because of their reliance on VM pages. Although relaxed memory models [9] and multiple writer protocols [6] reduce the impact of the large page size, fine-grained sharing and false sharing remain problematic [1]. Fine-grained DSM systems have been built using code instrumentation, but they have been limited by the cost of instrumentation and lack of communication aggregation [8]. The system presented ....
J. Carter, J. Bennett, and W. Zwaenepoel. Techniques for reducing consistency-related information in distributed shared memory systems. ACM Transactions on Computer Systems, 13(3):205--243, Aug. 1995.
....are 17 and 40 for the two versions, respectively. 10 Related Work. We have already compared our work extensively to TreadMarks [1] and Shasta [11], using them as examples of coarse-grained and fine-grained DSM systems. Qualitatively similar comparisons can be made with other DSM systems [9, 5, 16]. We have also compared our work to the MultiView approach used in Millipede [8]. Dwarkadas et al. [6] compare Cashmere, a coarse-grained system, and Shasta running on an identical platform, a cluster of four four-way AlphaServers connected by a Memory Channel network. Both systems are designed ....
J. Carter, J. Bennett, and W. Zwaenepoel. Techniques for reducing consistency-related information in distributed shared memory systems. ACM Transactions on Computer Systems, 13(3):205--243, Aug. 1995.
....to keep in mind where the data is, decide when to communicate with other processors, whom to communicate with, and what to communicate, making it hard to program in message passing, especially for applications with complex data structures. Software distributed shared memory (DSM) systems (e.g. [28, 4, 2, 19]) provide a shared memory abstraction on top of the native message passing facilities. An application can be written as if it were executing on a shared memory multiprocessor, accessing shared data with ordinary read and write operations. The chore of message passing is left to the underlying DSM ....
....by one of the writers after a synchronization point, diff requests are sent to all of the writers, causing extra messages and data to be sent. In the current implementation of TreadMarks, diff accumulation occurs for migratory data. Migratory data is shared sequentially by a set of processors [4, 27]. Each processor has exclusive read and write access for a time. Accesses to migratory data are protected by locks in TreadMarks. Each time a processor accesses migratory data, it must see all the preceding modifications. In TreadMarks, this is implemented by fetching all diffs created by ....
[Article contains additional citation context not shown here]
J.B. Carter, J.K. Bennett, and W. Zwaenepoel. Techniques for reducing consistency-related information in distributed shared memory systems. ACM Transactions on Computer Systems, 13(3):205--243, August 1995.
....The adaptations considered in this paper are triggered automatically: the run-time system detects certain access patterns and switches between protocols accordingly. This automated adaptation distinguishes our work from so-called multi-protocol software shared memory implementations (e.g. [8]) in which the user has to annotate the program to select the appropriate protocol. In our experience, removing the need for annotation leads to much improved usability. The adaptive protocols were implemented in TreadMarks [1]. Our experimental platform is a switched 100Mbps Ethernet consisting ....
.... however, can be much improved by the use of RC or LRC, especially for software implementations of shared memory, because the messages propagating the shared memory modifications can be delayed and coalesced with the synchronization messages, leading to a substantial reduction in communication [8, 16]. In addition to being data race free, all synchronization in the program must be done through the primitives supplied by the runtime system, so that it can take the required consistency actions at synchronization points. We assume that the shared memory is implemented as a global virtual ....
[Article contains additional citation context not shown here]
J.B. Carter, J.K. Bennett, and W. Zwaenepoel. Techniques for reducing consistency-related information in distributed shared memory systems. ACM Transactions on Computer Systems, 13(3):205--243, August 1995.
....003604 017, and by grants from IBM Corporation and from Tech Sym, Inc. 1 Introduction. This paper focuses on protocols for implementing lazy release consistent (LRC) [14] software distributed shared memory (DSM) [16] on commodity hardware. Both single writer (SW) [13] and multiple writer (MW) [6] protocols have been used to implement LRC. SW protocols allow only a single writable copy of a page at any given time. Furthermore, they always transfer a whole page to satisfy an access miss. With MW protocols several writable copies of a page may co-exist. Instead of transferring whole pages, ....
....writable copy of a page at any given time. Furthermore, they always transfer a whole page to satisfy an access miss. With MW protocols several writable copies of a page may co-exist. Instead of transferring whole pages, MW protocols transfer diffs, records of the modifications made to a page [6]. SW protocols suffer from the ping-pong effect in the case of write-write false sharing (concurrent writes from different processors to non-overlapping parts of the page). Furthermore, if only a single word in a page is changed, then it is clearly undesirable to transmit the entire page, ....
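The ping-pong effect described in the excerpt above can be illustrated with a small counting model. This is only a sketch under stated assumptions, not the protocol of any cited system: the page size `PAGE_WORDS`, the function names, and the access trace are all hypothetical, and synchronization and request messages are ignored.

```python
# Hypothetical model of write-write false sharing on a single page.
PAGE_WORDS = 1024  # assumed page size in words, chosen only for illustration


def single_writer_transfers(writes):
    """Count whole-page transfers when only one writable copy may exist.

    `writes` is a sequence of (processor, word_offset) pairs in program order.
    """
    owner = None
    transfers = 0
    for proc, _offset in writes:
        if owner is not None and owner != proc:
            transfers += 1  # the page migrates to the new writer
        owner = proc
    return transfers


def multiple_writer_diff_words(writes):
    """Count the words carried in diffs when each writer keeps its own copy."""
    modified = {}
    for proc, offset in writes:
        modified.setdefault(proc, set()).add(offset)
    return sum(len(offsets) for offsets in modified.values())


# Two processors alternately write to disjoint halves of the same page.
writes = [(i % 2, (i % 2) * 512 + i // 2) for i in range(8)]
sw_words = single_writer_transfers(writes) * PAGE_WORDS
mw_words = multiple_writer_diff_words(writes)
print(sw_words, mw_words)
```

In this trace every write after the first forces the page to move under the single-writer model, while the multiple-writer model only needs to ship the words each processor actually changed.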
[Article contains additional citation context not shown here]
J.B. Carter, J.K. Bennett, and W. Zwaenepoel. Techniques for reducing consistency-related information in distributed shared memory systems. ACM Transactions on Computer Systems, 13(3):205--243, August 1995.
....the Battle Against False Sharing has been a dominant, if not the dominant, theme of research in this area. Today, it is generally accepted that the ill effects of false sharing can be reduced, but not entirely eliminated, using a relaxed memory consistency model and a multiple writer protocol [1, 3, 9, 13]. Despite this, the conventional wisdom remains that the overhead of false sharing, as well as fine-grained true sharing, in page-based consistency protocols is the primary factor limiting the performance of software DSM. In a software DSM that uses lazy release consistency (LRC) [13] and multiple ....
....the conventional wisdom remains that the overhead of false sharing, as well as fine-grained true sharing, in page-based consistency protocols is the primary factor limiting the performance of software DSM. In a software DSM that uses lazy release consistency (LRC) [13] and multiple writer protocols [3], such as TreadMarks [1], false sharing can have two harmful effects: it can cause extra messages to be sent, and it can cause additional data to be sent on messages that also carry truly shared data. We refer to these as useless messages and useless data. For the applications in our test suite, ....
[Article contains additional citation context not shown here]
J.B. Carter, J.K. Bennett, and W. Zwaenepoel. Techniques for reducing consistency-related information in distributed shared memory systems. ACM Transactions on Computer Systems, 13(3):205--243, August 1995.
....a shared memory update gets propagated. TreadMarks uses the lazy release consistency algorithm [5] to implement release consistency. Roughly speaking, lazy release consistency enforces consistency at the time of an acquire, in contrast to the earlier implementation of release consistency in Munin [3], sometimes referred to as eager release consistency, which enforced consistency at the time of a release. Figure 6 illustrates the intuitive argument behind lazy release consistency. Assume that x is replicated at all processors. With eager release consistency a message needs to be sent to all ....
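The contrast between enforcing consistency at a release and at an acquire can be made concrete with a rough message count. The sketch below is an illustrative model only: the function names and the handoff trace are hypothetical, and it counts one consistency message per recipient while ignoring lock requests, acknowledgements, and any later data fetches.

```python
def eager_messages(handoffs, replicas):
    """Count messages when each release pushes changes to every other replica."""
    return sum(len(replicas - {releaser}) for releaser, _acquirer in handoffs)


def lazy_messages(handoffs):
    """Count messages when only the next acquirer is brought up to date."""
    return sum(1 for releaser, acquirer in handoffs if releaser != acquirer)


# x is assumed to be replicated at four processors; a lock guarding x
# is passed from processor 0 to 1, then 1 to 2, then 2 to 3.
replicas = {0, 1, 2, 3}
handoffs = [(0, 1), (1, 2), (2, 3)]
print(eager_messages(handoffs, replicas), lazy_messages(handoffs))
```

Under these assumptions the eager count grows with the number of replicas, while the lazy count grows only with the number of lock handoffs.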
....contrast, in release consistency, messages are sent for every synchronization operation. Although the net effect is somewhat application dependent, release consistent DSMs in general send fewer messages than sequentially consistent DSMs and therefore perform better. A recent paper by Carter et al. [3] contains a comparison of seven application programs run either with eager release consistency (Munin) or with sequential consistency. Compared to a sequentially consistent DSM, Munin achieves performance improvements ranging from a few to several hundred percent, depending on the application. ....
[Article contains additional citation context not shown here]
J.B. Carter, J.K. Bennett, and W. Zwaenepoel. Techniques for reducing consistency-related information in distributed shared memory systems. ACM Transactions on Computer Systems, 13(3):205--243, August 1995.
....to keep in mind where the data is, decide when to communicate with other processors, whom to communicate with, and what to communicate, making it hard to program in message passing, especially for applications with complex data structures. Software distributed shared memory (DSM) systems (e.g. [30, 4, 16, 21]) provide a shared memory abstraction on top of the native message passing facilities. An application can be written as if it were executing on a shared memory multiprocessor, accessing shared data with ordinary read and write operations. The chore of message passing is left to the underlying DSM ....
....is imperative to use explicit synchronization, as data is moved from processor to processor only in response to synchronization calls (see Section 2.2.2). 2.2.2 TreadMarks Implementation. TreadMarks uses a lazy invalidate [16] version of release consistency (RC) [10] and a multiple writer protocol [4] to reduce the amount of communication involved in implementing the shared memory abstraction. The virtual memory hardware is used to detect accesses to shared memory. RC is a relaxed memory consistency model. In RC, ordinary shared memory accesses are distinguished from synchronization accesses, ....
[Article contains additional citation context not shown here]
J.B. Carter, J.K. Bennett, and W. Zwaenepoel. Techniques for reducing consistency-related information in distributed shared memory systems. ACM Transactions on Computer Systems, 13(3):205--243, August 1995.
....notices cause the corresponding pages to be invalidated, resulting in a page fault at the time of access. The write notices inform the faulting processor whom it needs to communicate with to get the necessary modifications to the page. TreadMarks uses a multiple writer protocol, retrieving diffs [8] at the time of an access miss rather than whole pages. Diffs are produced by the TreadMarks write detection mechanism. Initially, a page is write protected. When a processor first writes to the page, it incurs a protection violation and TreadMarks makes a twin, a copy of the unmodified page. When ....
....copy to create a diff containing the changes. This diff is transmitted to the faulting processor and merged into its copy of the page. In addition to reducing communication, multiple writer protocols have the benefit of reducing false sharing overheads by allowing multiple concurrent writers [8]. Recent studies (e.g. [22]) have shown that, for relatively coarse-grained applications, software DSM provides good performance, although there still remains a sizable gap between the performance of DSM and message passing for some applications. In particular, in a comparison of PVM and ....
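The twin-and-diff mechanism described in the two excerpts above can be sketched in a few lines. This is a minimal illustration, not TreadMarks code: the function names are hypothetical, the page is only 16 bytes, write detection through page protection is omitted, and the diff is a plain list of (offset, byte) pairs rather than whatever encoding a real system uses.

```python
def make_twin(page):
    """Copy the unmodified page before the first write to it."""
    return bytearray(page)


def create_diff(twin, page):
    """Compare the page with its twin and record every byte that changed."""
    return [(i, new) for i, (old, new) in enumerate(zip(twin, page)) if old != new]


def apply_diff(page, diff):
    """Merge a diff into another copy of the page, in place."""
    for offset, value in diff:
        page[offset] = value


# Two writers modify disjoint parts of their own copies of one page.
base = bytearray(16)
copy_a, copy_b = bytearray(base), bytearray(base)
twin_a, twin_b = make_twin(copy_a), make_twin(copy_b)
copy_a[0:4] = b"AAAA"
copy_b[12:16] = b"BBBB"

diff_a = create_diff(twin_a, copy_a)
diff_b = create_diff(twin_b, copy_b)

# A third copy receives both diffs and ends up with both sets of changes.
merged = bytearray(base)
apply_diff(merged, diff_a)
apply_diff(merged, diff_b)
print(merged)
```

Because the two diffs touch non-overlapping offsets, they can be applied in either order, which is why falsely shared writes do not conflict in this model.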
J.B. Carter, J.K. Bennett, and W. Zwaenepoel. Techniques for reducing consistency-related information in distributed shared memory systems. ACM TOCS, August 1995.
....(in-line checks) before shared memory references. Novel techniques are developed to minimize the overhead of in-line checks. In their paper, Erlichson et al. [6] present a single writer sequential consistency implementation, and identify network bandwidth as the bottleneck. Earlier work (e.g. [4]) has demonstrated that the performance of such a system can be poor when false sharing occurs. Finally, our implementation is a relatively portable user level implementation, while theirs is a kernel implementation specific to the Power Challenge Irix kernel. Cashmere 2L [12] uses Unix processes ....
J. Carter, J. Bennett, and W. Zwaenepoel. Techniques for reducing consistency-related information in distributed shared memory systems. ACM Transactions on Computer Systems, 13(3):205--243, Aug. 1995.
....all processors in the system have arrived at the same barrier. Locks are used to control access to critical sections. No processor can acquire a lock if another processor is holding it. TreadMarks uses a lazy invalidate [2] version of release consistency (RC) [9] and a multiple writer protocol [5] to reduce the overhead involved in implementing the shared memory abstraction. RC is a relaxed memory consistency model. In RC, ordinary shared memory accesses are distinguished from synchronization accesses, with the latter category divided into acquire and release accesses. RC requires ....
J.B. Carter, J.K. Bennett, and W. Zwaenepoel. Techniques for reducing consistency-related information in distributed shared memory systems. ACM Transactions on Computer Systems, 13(3):205--243, August 1995.
....data that it will read. With EC, no communication takes place. With LRC, at the barrier before the second phase, the page must be invalidated at both processors. Our implementations of LRC are, however, not subject to the ping-pong effect, because they allow multiple concurrent writers per page [4]. Rebinding. The rebinding effect is an artifact of the extra synchronization required in EC (see Section 3.3). In the EC implementations, the acquire message carries the acquirer's value of the incarnation number for the lock. This value indicates to the releasing processor the last time that ....
J.B. Carter, J.K. Bennett, and W. Zwaenepoel. Techniques for reducing consistency-related information in distributed shared memory systems. ACM TOCS, August 1995.
....out what to send and whom to send it to. Our results show that because of the use of release consistency and the multiple-writer protocol, TreadMarks performs comparably with PVM on a variety of problems in the experimental environment examined. These results are corroborated by those in [4], which performed a similar experiment comparing the Munin DSM system against message passing on the V System [6]. For five out of the twelve experiments, TreadMarks performed within 10% of PVM. Of the remaining experiments, Barnes-Hut and to a lesser extent IS Large exhibit poor performance on ....
J.B. Carter, J.K. Bennett, and W. Zwaenepoel. Techniques for reducing consistency-related information in distributed shared memory systems. To appear in ACM Transactions on Computer Systems.
....003604 017, and by grants from IBM Corporation and from Tech Sym, Inc. 1 Introduction This paper focuses on protocols for implementing lazy release consistent (LRC) [14] software distributed shared memory (DSM) [16] on commodity hardware. Both single-writer (SW) [13] and multiple-writer (MW) [6] protocols have been used to implement LRC. SW protocols allow only a single writable copy of a page at any given time. Furthermore, they always transfer a whole page to satisfy an access miss. With MW protocols, several writable copies of a page may coexist. Instead of transferring whole pages, ....
....the modifications. The memory costs can be bounded by garbage collection, but frequent garbage collection results in added execution time. CVM [13] uses a SW protocol, while TreadMarks [3] uses a MW protocol (see the work of Keleher [13] for a study of the tradeoffs). Other systems (such as Munin [6]) allow multiple protocols to be used, but require user annotation to choose between them. In this paper we take an alternative approach. We observe that for some applications a MW protocol is preferred while for others a SW protocol is more desirable. Even within a single application, different ....
[Article contains additional citation context not shown here]
J.B. Carter, J.K. Bennett, and W. Zwaenepoel. Techniques for reducing consistency-related information in distributed shared memory systems. ACM Transactions on Computer Systems, 13(3):205--243, August 1995.
....and lock and barrier synchronization. The system supports a release consistent (RC) memory model [11] requiring the programmer to use explicit synchronization to ensure that changes to shared data become visible. TreadMarks uses a lazy invalidate [17] version of RC and a multiple writer protocol [6] to reduce the overhead involved in implementing the shared memory abstraction. The virtual memory hardware is used to detect accesses to shared memory. Consequently, the consistency unit is a virtual memory page. The multiple writer protocol reduces the effects of false sharing with such a large ....
J.B. Carter, J.K. Bennett, and W. Zwaenepoel. Techniques for reducing consistency-related information in distributed shared memory systems. ACM Transactions on Computer Systems, 13(3):205--243, August 1995.
No context found.
J. B. Carter et al. Techniques for Reducing Consistency-Related Information in Distributed Shared Memory Systems. ACM Trans. on Computer Systems, 13(3), Aug 1995.
No context found.
J.B. Carter, J.K. Bennett, and W. Zwaenepoel. Techniques for reducing consistency-related information in distributed shared memory systems. ACM Transactions on Computer Systems, 13(3):205--243, August 1995.
No context found.
J.B. Carter, J.K. Bennett, and W. Zwaenepoel: "Techniques for reducing consistency-related information in distributed shared memory systems," ACM Transactions on Computer Systems, 13(3), pp.205-243, August 1995.
No context found.
J.B. Carter, J.K. Bennett, and W. Zwaenepoel. Techniques for reducing consistency-related information in distributed shared memory systems. ACM Transactions on Computer Systems, 13(3):205--243, August 1995.