Kontothanassis, Scott. Software cache coherence for large scale multiprocessors. Proceedings First IEEE Symposium on High-Performance Computer Architecture, pp 286-95, 1995.
....home node, and there is currently an outstanding write request. A directory entry is created for every shared block that a program accesses. Also, the home node creates a directory entry for a block when it is accessed by another node. Unlike many hardware or hardware-software directory schemes [1, 5, 9], our implementation maintains directory entries for both remote and local data; this is because we have no hardware or operating system support for detecting access violations. The only difference between a directory entry for a local block and one for a remote block is that the user vector is ....
Kontothanassis, Scott. Software cache coherence for large scale multiprocessors. Proceedings First IEEE Symposium on High-Performance Computer Architecture, pp 286-95, 1995.
....sharing several hybrid hardware-software approaches are currently being explored. In these approaches, some hardware support is provided to assist the software. Some examples include support for remote writes in the SHRIMP project [1], support for remote reads and writes in the Cashmere project [12], and support for fine-grain sharing in Blizzard-E [23], Typhoon-0 [20], and START-NG [3]. This trend goes all the way to adding a dedicated processor to execute the software protocol and interact tightly with the network interface, as in Typhoon [19, 20] or Flash [14]. This paper explores four ....
L.I. Kontothanassis, M.L. Scott. Software Cache Coherence for Large Scale Multiprocessors. Proc. of the 1st Symposium on High-Performance Computer Architecture, pages 286-295. January 1995.
....many of the issues regarding the implementation on SMP nodes [69]. An interesting direction is the software emulation of CC-NUMA on top of an NCC-NUMA substrate. Following Petersen and Li's work [59] on virtual-memory-supported cache coherence, Kontothanassis and Scott have proposed Cashmere [42]. This approach exploits the virtual memory system of the operating system. Because the granularity of sharing is at the page level and because coherence is maintained by flushing pages from caches, the applications must be written under relaxed consistency models. When a processor writes into a ....
L.I. Kontothanassis, M.L. Scott. Software Cache Coherence for Large Scale Multiprocessors. Proc. of the 1st Symposium on High-Performance Computer Architecture, pages 286-295. January 1995.
....easy in most UNIX OSes because it uses an exported interface originally meant for building third-party file systems. This approach was pioneered by Li [67] and has been further refined in many subsequent implementations, the most notable being TreadMarks [53]. A variant of this approach, Cashmere [56], relies on non-coherent shared memory hardware with remote write capability. A good shared memory system should be built from a combination of all three techniques. Either user directives or dynamic monitoring of access characteristics can help pick the right implementation technique for a ....
L. I. Kontothanassis and M. L. Scott. Software Cache Coherence for Large Scale Multiprocessors. In Proceedings of the First International Symposium on High-Performance Computer Architecture, Raleigh, NC, pages 286--295, Jan. 1995.
.... a software-only shared memory system significantly [14, 7], and that on distributed memory architectures with remote memory reference capability, the performance of software cache coherence maintained at the virtual memory page level is competitive with that of hardware cache coherence schemes [23, 24, 15]. These results suggest that appropriate architectural support may improve the performance of shared virtual memory substantially. The SHRIMP project at Princeton studies how to provide high-performance communication mechanisms to integrate unmodified, commodity desktops such as PCs and ....
L.I. Kontothanassis and M.L. Scott. Software cache coherence for large scale multiprocessors. In Proceedings of the First International Symposium on High Performance Computer Architecture, January 1995.
....Incoming remote writes are compared against this bit vector to selectively update the corresponding cache line. The actual data is written in place. This technique basically eliminates the need for twinning and diffing of shared pages [3, 13, 14], or doubling of writes either in software [15] or in hardware [12], done in some page-based DSM systems to avoid false sharing effects. A merge buffer [5] is used to smooth the outgoing update traffic. These components are at a node level, with multiple application threads being associated with the same cache, directories, write, and merge ....
L. I. Kontothanassis and M. L. Scott. Software Cache Coherence for Large Scale Multiprocessors. In Proceedings of the First International Symposium on High Performance Computer Architecture, pages 286--295, January 1995.
....are relatively easy to build, and can follow improvements in microprocessors and other hardware technology closely. As part of the CASHMERe project we have shown that a lazy software protocol on a tightly coupled NCC-NUMA system can rival the performance of traditional hardware coherence [15, 17]. In this paper we examine the other end of the architectural spectrum, quantifying the performance advantage of software coherence on NCC-NUMA hardware with respect to more traditional DSM systems for message-passing hardware. Using detailed execution-driven simulation, we have compared the ....
....Munin [4] and lazy release consistency [13] in its use of delayed write notices, but we take advantage of the globally accessible physical address space for cache fills (in the Nxxx systems) and for access to the directory and the local write notice lists. We have also presented protocol variants [17] suitable for a multiprocessor such as the BBN TC2000 or the Cray T3D, with a globally accessible physical memory but without hardware coherence. In such systems the lower latency of memory accesses allows the use of uncached remote references as an alternative to caching and provides us with the ....
[Article contains additional citation context not shown here]
L. I. Kontothanassis and M. L. Scott. Software Cache Coherence for Large Scale Multiprocessors. 1st Intl. Symp. on High Performance Computer Architecture, pp. 286--295, Raleigh, NC, Jan. 1995.
....and read-only mappings. If the only previously existing mapping had read-write permissions, or if the current fault was a write fault and all previously existing mappings were read-only, then the page is added to the weak list. Full details of this protocol can be found in a technical report [10]. 3.2 Page Placement Mechanisms. The changes required to add page placement to both the hardware and software coherence protocols were straightforward. The basic idea is that the first processor to touch a given page of shared memory becomes that page's home node. To deal with the common case in ....
....systems generally exhibit comparable performance both with and without migration. Our applications exhibit coarse-grained sharing and therefore scale nicely under both coherence schemes. The one exception is mp3d, which requires several modifications to work well on a software-coherent system [10]. These modifications were not applied to the code in these experiments. Figure 2 shows the percentage of cache misses and writebacks that occur on pages that are local after migration. Without dynamic placement, the applications in our suite satisfy less than two percent of their misses locally, ....
L. I. Kontothanassis and M. L. Scott. Software Cache Coherence for Large Scale Multiprocessors. TR 513, Computer Science Department, University of Rochester, March 1994. Submitted for publication.
....and the thesis work of Karin Petersen [15, 16]. It is also related to ongoing work on the Wind Tunnel [19] and the Princeton Shrimp [1] project and, less directly, to several other DSM and multiprocessor projects. Full protocol details and comparisons to related work can be found in other papers [8, 9, 10]. 2 Results and Project Status. We have evaluated the performance of NCC-NUMA hardware running Cashmere protocols using execution-driven simulation on a variety of applications. The applications include two programs from the SPLASH suite [20] (mp3d and water), two from the NASA parallel ....
....remote memory directly. The results indicate that one can achieve substantial performance improvements over more traditional DSM systems by exploiting the additional hardware capabilities of NCC-NUMA systems. A complete discussion and explanation of the results can be found in previous papers [8, 9, 10]. The best performance, clearly, will be obtained by systems that combine the speed and concurrency of existing hardware coherence mechanisms with the flexibility of software coherence. This goal may be achieved by a new generation of machines with programmable network controllers [12, 18]. Our ....
L. I. Kontothanassis and M. L. Scott. Software Cache Coherence for Large Scale Multiprocessors. In Proceedings of the First International Symposium on High Performance Computer Architecture, pages 286--295, Raleigh, NC, January 1995.
....improving program performance. Table 1 summarizes the default parameters used in our simulations. The CC-NUMA machine uses the directory-based write-invalidate coherence protocol of the Stanford DASH machine. Our software-coherent NUMA machine uses a more complicated, multi-writer protocol [9]. This protocol employs a variant of lazy release consistency [8] in which invalidation messages are sent only at synchronization release points, and processed (locally) only at synchronization acquire points. At an acquire, a processor is required to flush from its own cache all lines of all ....
....exhibit comparable performance both with and without dynamic placement. Our applications exhibit coarse-grained sharing and therefore scale nicely under both coherence schemes. The principal exception is mp3d, which requires several modifications to work well on a software-coherent system [9]. These modifications were not applied to the code in these experiments. Figure 2 shows the percentage of cache misses and writebacks that occur on pages that are local after migration. Without dynamic placement, the applications in our suite satisfy less than two percent of their misses locally, ....
L. I. Kontothanassis and M. L. Scott. Software Cache Coherence for Large Scale Multiprocessors. In Proc. of the 1st Intl. Symp. on High Performance Computer Architecture, Jan. 1995.