L. I. Kontothanassis and M. L. Scott. Using Memory-Mapped Network Interfaces to Improve the Performance of Distributed Shared Memory. In Proc. of the 2nd IEEE Symp. on High-Performance Computer Architecture (HPCA-2), 1996.
....brings large performance penalties. This is a valid concern not only for DSM machines, since large shared memory machines also have a home node concept. However, home node migration would probably make allocation considerations superfluous. 6. RELATED WORK Most DSM systems are either page based [17, 20, 19] or object-based [4, 5, 16] while discarding transparency. Jackal manages pages to implement a shared address space in which regions are stored. This allows shared data to be named by virtual addresses to avoid software address translation. For cache coherence, however, Jackal uses small, ....
L.I. Kontothanassis and M.L. Scott. Using Memory-Mapped Network Interface to Improve the Performance of Distributed Shared Memory. In Proc. of the 2nd Int. Symp. on High-Performance Computer Architecture, pages 166--177, San Jose, CA, February 1996.
....brings large performance penalties. This is a valid concern not only for DSM machines, since large shared memory machines also have a home node concept. However, home node migration would probably make allocation considerations superfluous. 6. RELATED WORK Most DSM systems are either page based [15, 18, 17] or object based [2, 3, 14] while discarding transparency. Jackal manages pages to implement a shared address space in which regions are stored. This allows shared data to be named by virtual addresses to avoid software address translation. For cache coherence, however, Jackal uses small, ....
L.I. Kontothanassis and M.L. Scott. Using Memory-Mapped Network Interface to Improve the Performance of Distributed Shared Memory. In Proc. of the 2nd Int. Symp. on High-Performance Computer Architecture, pages 166--177, San Jose, CA, February 1996.
....protocol and they find both synchronization and data movement across nodes to be a problem. The SVM protocol they use is tuned for traditional clusters with slow interconnection networks (they make use of interrupts and assume high overheads in communication initiation). The authors in [11] take advantage of direct remote access capabilities of system area networks to address communication overheads in software shared memory protocols. This study is simulation based and the performance characteristics of the interconnection network correspond to today's state of the art SANs. The ....
L. I. Kontothanassis and M. L. Scott. Using memory-mapped network interfaces to improve the performance of distributed shared memory. In The 2nd IEEE Symposium on High-Performance Computer Architecture, Feb. 1996.
....this potentially leads to much interprocessor communication, our implementation uses an optimized version of this protocol that still adheres to the memory model. In particular, it is not necessary to invalidate and flush a region that is accessed by only a single processor or that is only read [19]. Jackal's lazy flushing is evaluated and described in detail in [23]. 3.2 Synchronization Logically, each Java object contains a lock and a condition variable. Since threads can access objects from different processors, Jackal provides distributed synchronization protocols. Briefly, an object's ....
....is divided by four. The speedup is still less than RMI's, which must be attributed to more protocol overhead. We do not show this improved performance in Figure 6, since we explicitly want to avoid application level tuning for Jackal. 8. RELATED WORK Most DSM systems are either page based [17, 19] or object-based [2, 3, 15]. Jackal manages pages to implement a shared address space in which regions are stored. For cache coherence, however, Jackal uses small, software managed regions rather than pages and therefore largely avoids the false sharing problems of page based DSM systems. Like ....
L. Kontothanassis and M. Scott. Using Memory-Mapped Network Interface to Improve the Performance of Distributed Shared Memory. In Proc. of the 2nd Int. Symp. on High-Performance Computer Architecture, pages 166--177, San Jose, CA, Feb. 1996.
.... mature approaches: page based shared virtual memory (which we will call SVM) and fine grained or variable grained access control through code instrumentation (which we call the fine grained approach). Much excellent research has been done in the design and implementation of these protocols as well [19, 15, 17, 30, 8, 25]. And finally, above the programming model or protocol layer runs the application itself. Each layer has its own functional and performance characteristics that contribute to the end performance seen by a user. Despite all the research and improvements, currently software shared memory systems ....
.... in page based SVM to alleviate the effects of false sharing [1] and the recent lowering of instrumentation costs for fine grained protocols [25] which are able to avoid false sharing by virtue of using fine granularity). Many important improvements have been made in SVM protocols since then [15, 17, 12, 30]. And the emergence of commercial, low latency, high bandwidth network interfaces and system area interconnects, such as Myrinet [4] and Memory Channel [9], have been the great technological enabler to support these protocols more efficiently on clusters. Fine grained protocols such as the SC ....
L. I. Kontothanassis and M. L. Scott. Using memory-mapped network interfaces to improve the performance of distributed shared memory. In The 2nd IEEE Symposium on High-Performance Computer Architecture, Feb. 1996.
....the home node, but will also add some extra delay while waiting for a synchronous event to happen in the node. The protocol is still implemented as communicating protocol agents. Several other papers have suggested hardware support for fine grain remote write operations in the network interface [23, 22]. One of the recent implementations is the automatic update release consistency (AURC) home based protocol [16]. This implementation is a page based SW DSM which eliminates diffs, the compact encoded representation of the differences between the two pages, frequently used in many ....
L. Kontothanassis and M. Scott. Using Memory-Mapped Network Interfaces to Improve the Performance of Distributed Shared Memory. In Proceedings of the 2nd IEEE Symposium on High Performance Computer Architecture, February 1996.
....the home node, but will also add some extra delay while waiting for a synchronous event to happen in the node. The protocol is still implemented as communicating protocol agents. Several other papers have suggested hardware support for fine grain remote write operations in the network interface [KS96, KHS+97]. One of the recent implementations is the automatic update release consistency (AURC) home based protocol [IBD+98]. This implementation is a page-based SW DSM which eliminates diffs, the compact encoded representation of the differences between the two pages, frequently used ....
L. Kontothanassis and M. Scott. Using Memory-Mapped Network Interfaces to Improve the Performance of Distributed Shared Memory. In Proceedings of the 2nd IEEE Symposium on High Performance Computer Architecture, February 1996.
....of stages, and treat each stage separately. Thus, latency in our work refers only to wire latency, whereas in [50] latency is an end-to-end metric including host overhead and packet processing cost. Various types of hardware support to accelerate protocols have been examined for SVM in [25] and [35], and for fine grained software DSM in the Typhoon zero prototype [41]. In [30] Karlsson et al. find that the latency and bandwidth of an ATM switch is acceptable in a clustered SVM architecture. In [33] a Lazy Release Consistency protocol for hardware cache coherence is presented. In a ....
....of the DEC Memory Channel network interface, where code instrumentation is used to propagate relevant writes (of application or protocol data) to a remote node, also in a home based protocol. A different type of coarse or variable grained remote fetch support has been examined through simulation [35], but not in real implementation. More sophisticated support to accelerate specific protocol operations has also been examined in simulation, such as hardware diff engines in [4]. Support for AURC with write back caches has also been designed and evaluated through simulation in [6]. A discussion on ....
L. I. Kontothanassis and M. L. Scott. Using memory-mapped network interfaces to improve the performance of distributed shared memory. In The 2nd IEEE Symposium on High-Performance Computer Architecture, February 1996.
....Protocol Programming Model Layer Communication Layer Communication Library Network Figure 1: The layers that affect the end application performance in software shared memory. The last decade has seen a lot of excellent research in the individual layers, especially the lower two system layers [4, 3, 5, 13, 11, 12, 21, 18, 17]. Still, software shared memory systems currently yield performance that is, for several classes of applications, far behind that of hardware coherent systems even at quite small scale. This paper uses a layered framework to examine where the major gains (or losses) in the parallel performance ....
....Ocean contiguous behaves similarly, although it is a regular application, due to fine grained (one element) remote accesses at column oriented partition boundaries in the near-neighbor calculations. When diff cost is a problem, hardware support for automatic write propagation [3] can eliminate diffs [12, 8, 9], at the potential cost of contention and/or code instrumentation; we might expect it to help substantially in Water nsquared and Radix. Finally, improving protocol costs halfway (C0P 1 2 , not shown in the figures) usually provides about half or less of the benefit of eliminating them (more on ....
L. I. Kontothanassis and M. L. Scott. Using memory-mapped network interfaces to improve the performance of distributed shared memory. In The 2nd IEEE Symposium on High-Performance Computer Architecture, Feb. 1996.
....software rather than hardware in tightly coupled systems such as the SGI Origin2000 discussed in Chapter 2. Studies on software coherent shared address space multiprocessors have largely used applications as they were written for hardware cache coherent machines. The performance evaluations so far [43, 22, 46, 61, 31, 89, 35, 8] point out that for certain classes of applications there is a large performance gap between hardware cache coherent and software coherent systems. However, it should be possible to modify or restructure applications to interact better with software coherence protocols and granularities, and to ....
....The home copy is thus kept up to date. Upon a page fault following a causally related acquire operation, the entire page is fetched from the home [89]. The tradeoffs between home-based and traditional LRC protocols have been studied in the literature [46, 34, 89]. Overall, due to software management for communication and coherence, SVM systems suffer high costs in communication, protocol overhead, and synchronization as well as critical section dilations. Interactions such as false sharing and fragmentation can easily occur due to the large page ....
L. I. Kontothanassis and M. L. Scott. Using memory-mapped network interfaces to improve the performance of distributed shared memory. In The 2nd IEEE Symposium on High-Performance Computer Architecture, Feb. 1996.
....layers are held fixed. Since the beginning of SVM [59] much excellent research has been done in improving the functionality of the protocol layer either by itself, based on increasingly relaxed memory consistency models [5, 51] or to take advantage of specialized features of a network interface [44, 56, 7]. Recently, several developments have also occurred in communication layers. Commodity network interfaces that support, or can be programmed to support, key data movement and synchronization mechanisms that do not interrupt the processor have been developed [37, 17] communication libraries [27, ....
L. I. Kontothanassis and M. L. Scott. Using memory-mapped network interfaces to improve the performance of distributed shared memory. In The 2nd IEEE Symposium on High-Performance Computer Architecture, Feb. 1996.
....Protocol Programming Model Layer Communication Layer Communication Library Network Figure 1: The layers that affect the end application performance in software shared memory. The last decade has seen a lot of excellent research in the individual layers, especially the lower two system layers [4, 3, 5, 14, 12, 13, 21, 7, 18]. Still, software shared memory systems currently yield performance that is, for several classes of applications, far behind that of hardware coherent systems even at quite small scale. This paper uses a layered framework to examine where the major gains (or losses) in the parallel performance ....
....behaves similarly, although it is a regular application, due to fine grained (one element) remote accesses at column oriented partition boundaries in the near-neighbor calculations. When diff cost is a problem, hardware support for automatic write propagation [3] can eliminate diffs [13, 9, 10], at the potential cost of contention and/or code instrumentation; we might expect it to help substantially in Water nsquared and Radix. Finally, improving protocol costs halfway (C0P 1=2 , not shown in the figures) usually provides about half or less of the benefit of eliminating them (more on ....
L. I. Kontothanassis and M. L. Scott. Using memory-mapped network interfaces to improve the performance of distributed shared memory. In The 2nd IEEE Symposium on High-Performance Computer Architecture, Feb. 1996.
....1990s [9, 27] used the recently introduced release consistency model [16] to breathe new life into the SVM approach. Since then, SVM research has witnessed a very active and fruitful decade (see Figure 1) with many research groups building on one another's ideas to push performance higher [4, 21, 39, 38, 47, 2, 19, 30]. Of the three layers that affect end performance (application, protocol model, and communication architecture) most of the efforts so far have been in the lower two: relaxed consistency models and protocol implementations to reduce communication frequency and traffic [4, 21, 47, 2] and additional ....
.... and communication architecture) most of the efforts so far have been in the lower two: relaxed consistency models and protocol implementations to reduce communication frequency and traffic [4, 21, 47, 2] and additional hardware support in the communication architecture to reduce communication costs [39, 11, 35, 19, 18, 7, 30, 29]. With the relative maturity of protocols, in the last couple of years SVM research has moved to greater emphasis on the application layer and the synergies available across layers. New areas are being emphasized like application driven performance evaluation, application restructuring for SVM ....
L. I. Kontothanassis and M. L. Scott. Using memory-mapped network interfaces to improve the performance of distributed shared memory. In Proceedings of the 2nd IEEE Symposium on High-Performance Computer Architecture, February 1996.
.... can be enhanced to improve overall performance [9]. Since SVM was first proposed [35] much research has been done in improving the protocol layer by relaxing the memory consistency models [5, 32], by improving the communication layer with low latency, high bandwidth, user-level communication [20, 13, 17, 38, 15, 47, 4, 39, 26, 33, 34, 51, 6], and by taking advantage of the two level communication hierarchy in systems with multiprocessor nodes [18, 7, 46, 8, 40, 50]. Recently, the application layer has also been improved, by discovering application restructurings that dramatically improve performance on SVM systems [28]. However, ....
....further as we shall discuss later. While we use a programmable Myrinet network interface for prototyping, the message handling mechanisms are simple and do not require programmability. Unlike some previously proposed mechanisms that also avoid asynchronous message handling in certain situations [34, 26], they support only explicit operations; they do not require code instrumentation or observing the memory bus, or the network to provide global ordering guarantees. Thus, they can more likely be supported in commodity NIs. In fact, many modern communication systems [12, 20, 31, 24] and the recent ....
L. I. Kontothanassis and M. L. Scott. Using memory-mapped network interfaces to improve the performance of distributed shared memory. In The 2nd IEEE Symposium on High-Performance Computer Architecture, Feb. 1996.
....on end performance. Since the beginning of SVM [24] much excellent research has been done in improving the functionality of the protocol layer either by itself, based on increasingly relaxed memory consistency models [2, 20] or to take advantage of specialized features of a network interface [17, 23, 3]. Recently, several developments have also occurred in communication layers. Commodity network interfaces that support, or can be programmed to support, key data movement and synchronization mechanisms without interrupting the processor have been developed [14, 8] communication libraries [11, 26, ....
L. I. Kontothanassis and M. L. Scott. Using memory-mapped network interfaces to improve the performance of distributed shared memory. In The 2nd IEEE Symposium on High-Performance Computer Architecture, Feb. 1996.
.... [Figure 2: Research in Shared Virtual Memory. Labels: Consistency Models, LRC, ERC, ScC, Protocols, SW LRC, Adaptive LRC, Home based, Multicast based, Application specific protocols, Architectural Support, Fine grain, Software dirty bits, Remote Operations, Applications (understanding and restructuring), Compiler Support. Citations appearing in the figure: ISL96a, JSS97, Kel96, KCZ92, SFL94, SB97, ISL96c, SZB94, DCZ96, CF89, PL93, IDFL96, KS96b, BFRS94, SGT96, ACDZ97, Kel98.] ....approach in the early 1990s and lead to eager release consistency (ERC) implementation in Munin [CBZ91] and lazy ....
....use it to accelerate the protocols and the interaction among applications, compiler and protocols. Two particular hardware extensions played major roles in pushing SVM research forward: fine grain local access control [SFL+94] and fine grain remote access [CF89, PL93, PL94, IDFL96, IBD+98, KS96b, KHS+97]. Fine grain sequential consistent shared memory, initially proposed in [SFL+94], was recently revived [SGT96, SG97] as a competitive alternative to the page based shared virtual memory approach. This dissertation contributes to all four areas of SVM research indicated in Figure ....
L. I. Kontothanassis and M. L. Scott. Using Memory-Mapped Network Interfaces to Improve the Performance of Distributed Shared Memory. In Proceedings of the 2nd IEEE Symposium on High-Performance Computer Architecture, February 1996.
....computing platform. To reduce the hardware cost of these systems, cache coherence hardware is often foregone in favor of software layers implementing Shared Virtual Memory (SVM) [18] as illustrated in Figure 1. Although SVM implementations have been the focus of several past performance studies [12, 15, 17, 24], data gathering approaches have varied widely. At the programming and protocol layer, run time tools and software instrumentation have been used to detect high level contributors to execution time: e.g. counts of synchronizations, page faults, and messages. Prototype evaluations in these studies ....
....messages. Prototype evaluations in these studies indicate that the communication related costs plus the software overhead are responsible for limiting the performance of the software shared memory approach. Simulations have been the principal vehicle for much previous research on SVM performance [3, 13, 14, 17]. [Figure 1: Layers that affect the end performance of SVM applications. Labels: Applications, SVM Protocol, Communication Layer, Node, Host, Memory, I/O Bus, Network Interface, Network.] However, the simulation ....
L. Kontothanassis and M. Scott. Using memory-mapped network interfaces to improve the performance of distributed shared memory. In The 2nd Symposium on High-Performance Computer Architecture, Feb. 1996.
....is mapped to page B at processor P1, then all writes which are performed locally on page A are also automatically propagated to page B at node P1. [Figure 3: AU based Write Collection] This simple hardware support is enough to build a multiple writer protocol [6, 10], as long as coherence is still supported in software as before. The basic approach is to have a home memory for every shared page (just as in hardware coherent machines) and to set up mappings such that writes to other copies of a page get automatically updated at the home (see Figure 3). The ....
L. I. Kontothanassis and M. L. Scott. Using memory-mapped network interfaces to improve the performance of distributed shared memory. In The 2nd IEEE Symposium on High-Performance Computer Architecture, February 1996.
....or updates are performed as soon as they arrive (i.e. it is not DC) and a releaser waits for acknowledgments before proceeding. Release latency is large due to large round trip message cost and remote interrupt overhead, both of which are on the critical path. The Cashmere system from Rochester [27] proposes a lazy application, eager propagation protocol that queues invalidations but still waits for acknowledgments at release time. Overall, the performance complexity tradeoffs in laziness of propagating and applying coherence are not yet very clear. 2.3 Multiple writer Protocols Using ....
.... dirty bits due to the cost of instrumentation [1]. The other alternative to incurring software diffing overhead is to use hardware support for either computing and applying diffs [5] or for propagating writes to the home in hardware in home based protocols, thus eliminating diffs altogether [16, 27] (see Sec 2.4). 2.3.2 Laziness in Diffing As with coherence propagation, many degrees of laziness are possible in diff propagation and application. These must be clearly understood in interpreting performance results that compare protocols, since they too can influence the results ....
L. I. Kontothanassis and M. L. Scott. Using Memory-Mapped Network Interfaces to Improve the Performance of Distributed Shared Memory. In The 2nd IEEE Symposium on High-Performance Computer Architecture, February 1996.
.... called AURC (Automatic Update Release Consistency) uses lazy release consistency [10] together with the automatic update mechanism to implement a multiple writer protocol [6]. A similar approach using a directory based scheme has been taken on other systems that provide automatic update support [11]. Recently, small scale shared memory SMPs have become increasingly widespread. Inexpensive SMPs based on Intel PC processors are on the market, and SMPs from other vendors are increasingly popular. Given this development, it is natural to examine whether the SVM mechanisms developed for ....
L. I. Kontothanassis and M. L. Scott. Using Memory-Mapped Network Interfaces to Improve the Performance of Distributed Shared Memory. In The 2nd IEEE Symposium on High-Performance Computer Architecture, February 1996.
.... support for fine grain remote writes (automatic updates). Such support is provided in several recent network interfaces to implement memory mapped communication (e.g. SHRIMP [4], DEC Memory Channel [8]) and has been found to be very useful for efficiently implementing shared virtual memory [10, 15]. To understand the performance implications of this approach, we implemented ScC and AURC protocols within the TangoLite simulation framework [9]. We conducted detailed simulation studies with five real applications, as well as a synthetic benchmark designed to emphasize a communication pattern ....
....the increase in protocol overhead and communication. 6 All software Implementation for ScC The hardware support for automatic update is very valuable for shared virtual memory independent of scope consistency [10] and several shared virtual memory systems have been built using this feature (see [10, 15, 11]). We have seen that scope consistency can be easily built on top of this in software, and has performance benefits. However, it is also interesting to see if ScC can be built on top of an all-software protocol, without automatic update support. One way to do this is to emulate the AURC protocol ....
L. I. Kontothanassis and M. L. Scott. Using Memory-Mapped Network Interfaces to Improve the Performance of Distributed Shared Memory. In The 2nd IEEE Symposium on High-Performance Computer Architecture, February 1996.
....obtained as separate diffs and applied individually by the faulting processor. Automatic Update Release Consistency The AURC protocol [9] is also based on a lazy release consistency model, but it takes advantage of the automatic update hardware feature of virtual memory mapped network interfaces [4, 14] to detect, propagate and merge writes by different processors to a page. By snooping on the memory bus, the automatic update mechanism can propagate updates to shared data at a word granularity with no software overhead, as long as a mapping has been established between the local page and a ....
L. I. Kontothanassis and M. L. Scott. Using Memory-Mapped Network Interfaces to Improve the Performance of Distributed Shared Memory. In The 2nd IEEE Symposium on High-Performance Computer Architecture, February 1996.
....two important consistency models which are supported in software shared memory systems. 2.1 Release Consistency The Release Consistency (RC) model was initially introduced for hardware cache coherent multiprocessor [11]. It has since become the cornerstone of software shared virtual memory systems [8, 20, 19, 17, 21]. Release consistency distinguishes between ordinary memory accesses and synchronization accesses defined as either acquire or release. To make this possible the program must be properly labeled [11] (Figure 1). Only the synchronization accesses are strictly ordered (either sequentially [22] or ....
....by selectively invalidating the obsolete pages. 5.3 All software ScC Protocol The hardware support for automatic update is very valuable for shared virtual memory independent of scope consistency [16, 17] and several shared virtual memory systems have been built using this feature (see [16, 21]). However, ScC can also be built on top of an all software home based protocol, without automatic update support. All software home based LRC protocols have been shown to perform and scale much better than the traditional, homeless LRC [33]. The home based LRC protocol (HLRC) [33] is inspired by ....
L. I. Kontothanassis and M. L. Scott. Using Memory-Mapped Network Interfaces to Improve the Performance of Distributed Shared Memory. In The 2nd IEEE Symposium on High-Performance Computer Architecture, February 1996.
....block via write through to a unique (and possibly remote) main memory copy of each page. Simulation studies indicate that on an ideal remote write network Cashmere will significantly outperform other DSM approaches, and will in fact approach the performance of full hardware cache coherence [20, 21]. In this paper we compare implementations of Cashmere and TreadMarks on a 32 processor cluster (8 nodes, 4 processors each) of DEC AlphaServers, connected by DEC's Memory Channel [15] network. Memory Channel allows a user level application to write to the memory of remote nodes. The remote write ....
....set for certain applications beyond the 16K available, dramatically reducing performance. The larger caches of the 21264 should largely eliminate this problem. We are optimistic about the future of Cashmere like systems as network interfaces continue to evolve. Based on previous simulations [21], it is in fact somewhat surprising that Cashmere performs as well as it does on the current generation of hardware. The second generation Memory Channel, due on the market very soon, will have something like half the latency, and an order of magnitude more bandwidth. Finer grain DSM systems are ....
L. I. Kontothanassis and M. L. Scott. Using Memory-Mapped Network Interfaces to Improve the Performance of Distributed Shared Memory. In Proceedings of the Second International Symposium on High Performance Computer Architecture, San Jose, CA, February 1996.
....shared memory programs, such as might run on a machine with hardware coherence. The memory model presented to the user is release consistency [7] with explicit synchronization operations visible to the run time system. We begin with a summary of existing protocols and implementations. Cashmere [11] employs a directory-based multi writer protocol that exploits the write-through capabilities of its network (in software) to merge updates by multiple processors. AURC [9] is an interval based multi writer protocol that exploits the hardware write through capabilities of the Shrimp network ....
....and for a user-level implementation of polling, allowing processors to exchange asynchronous messages inexpensively. We do not currently use broadcast or remote memory access for either synchronization or protocol data structures, nor do we place shared memory in Memory Channel space. Cashmere [11] is a software coherence system expressly designed for memory mapped network interfaces. It was inspired by Petersen's work on coherence for small scale, non hardware coherent multiprocessors [17]. Cashmere maintains coherence information using a distributed directory data structure. For each ....
L. I. Kontothanassis and M. L. Scott. Using Memory-Mapped Network Interfaces to Improve the Performance of Distributed Shared Memory. In Proc. of the 2nd Intl. Symp. on High Performance Computer Architecture, Feb. 1996.
....of magnitude lower than that of traditional message passing. These networks suggest the need to re-evaluate the assumptions underlying the design of DSM protocols, and specifically to consider protocols that communicate at a much finer grain. The Cashmere system employs this sort of protocol [17, 18]. It uses directories to keep track of sharing information, and merges concurrent writes to the same coherence block via write through to a unique (and possibly remote) main memory copy of each page. In this paper we compare implementations of TreadMarks and a modified version of Cashmere on a ....
....set for certain applications beyond the 16K available, dramatically reducing performance. The larger caches of the 21264 should largely eliminate this problem. We are optimistic about the future of Cashmere like systems as network interfaces continue to evolve. Based on previous simulations [18], it is in fact somewhat surprising that Cashmere performs as well as it does on the current generation of hardware. The second generation Memory Channel, due on the market very soon, will have something like half the latency, and an order of magnitude more bandwidth. Finer grain DSM systems are in ....
L. I. Kontothanassis and M. L. Scott. Using Memory-Mapped Network Interfaces to Improve the Performance of Distributed Shared Memory. In Proc. of the 2nd Intl. Symp. on High Performance Computer Architecture, Feb. 1996.