Donald Yeung, John Kubiatowicz, and Anant Agarwal. MGS: A multigrain shared memory system. In Proceedings of the 23rd Annual International Symposium on Computer Architecture, May 1996.
....helpful to find out where the possible bottlenecks are (for example, in computation, scheduling, or synchronization) and hence make improvements accordingly (such as better network, faster processors, etc.). 5. Related Work. Some performance models have been proposed for DSMs. Donald Yeung et al. [15, 16] started from a cluster of SMPs with a relatively simple protocol. Blelloch and Gibbons et al. studied the performance of planar DAG scheduling. Their model supports synchronization based on write-once synchronization variables [2]. Bilas [1] analyzed the performance of shared virtual memory on ....
D. Yeung, J. Kubiatowicz, and A. Agarwal. MGS: A multigrain shared memory system. In 23rd Annual Symposium on Computer Architecture, pages 44--55, May 1996.
.... in spirit to the one advocated by the bulk synchronous parallel (BSP) model [32]. Most popular programming methodologies for Cosmos fall into two categories [15]. The first, distributed shared memory (DSM) systems (for example, TreadMarks [2] from Rice University, Multigrain Shared Memory (MGS) [34] from MIT, and Coherent Virtual Machine (CVM) [19] from University of Maryland) provides a software layer which simulates coherent shared memory between nodes by internally using messaging to move around specific data or referenced memory pages. The second, based on message passing primitives (for ....
D. Yeung, J. Kubiatowicz, and A. Agarwal, MGS: A multigrain shared memory system, in Proceedings of the 23rd Annual International Symposium on Computer Architecture, Philadelphia, PA, May 1996.
.... programming approach (see Figure 1). SIMPLE is described in more detail in [7]. Existing programming methodologies for SMP clusters fall into two categories [12]. The first, distributed shared memory (DSM) systems (for example, TreadMarks [2] from Rice University, Multigrain Shared Memory (MGS) [25] from MIT and Coherent Virtual Machine (CVM) [18] from University of Maryland) provides a software layer which simulates coherent shared memory between nodes by internally using messaging to move around specific data or referenced memory pages. The second, based on message passing primitives (for ....
D. Yeung, J. Kubiatowicz, and A. Agarwal. MGS: A Multigrain Shared Memory System. In Proceedings of the 23rd Annual International Symposium on Computer Architecture, Philadelphia, PA, May 1996.
.... in spirit to the one advocated by the Bulk Synchronous Parallel (BSP) model [32]. Most popular programming methodologies for COSMOS fall into two categories [14]. The first, distributed shared memory (DSM) systems (for example, TreadMarks [2] from Rice University, Multigrain Shared Memory (MGS) [34] from MIT and Coherent Virtual Machine (CVM) [19] from University of Maryland) provides a software layer which simulates coherent shared memory between nodes by internally using messaging to move around specific data or ....
D. Yeung, J. Kubiatowicz, and A. Agarwal. MGS: A Multigrain Shared Memory System. In Proceedings of the 23rd Annual International Symposium on Computer Architecture, Philadelphia, PA, May 1996.
....hardware found in commodity microprocessors. Such systems suffer from fragmentation and false sharing and can perform poorly in the presence of fine grain sharing [EK89]. For acceptable performance, page based systems often resort to weaker shared memory consistency models [KDCZ93, ENCH96, KHS 97, YKA96]. Shared virtual memory systems lack fine grain access control, a key feature of hardware shared memory machines. Access control is the ability to selectively restrict reads and writes to memory regions. At each memory reference, the system must perform a lookup to determine whether the ....
....performance competitive to uniprocessor nodes, and are especially beneficial for Blizzard E, converting high overhead FGDSM operations (e.g. write to read only pages) to fast SMP local accesses. This result corroborates previous findings for (high overhead) DVSM implementations on SMP clusters [YKA96]. Quad processor nodes also increase synchronization time in the loop backedge handshake because a protocol handler must wait for three processors to reach a loop backedge. This result indicates that the loop backedge handshake may be suitable for small scale SMP nodes while instrumented ....
[Article contains additional citation context not shown here]
Donald Yeung, John Kubiatowicz, and Anant Agarwal. MGS: A multigrain shared memory system. In Proceedings of the 23rd Annual International Symposium on Computer Architecture, May 1996.
....Figure 8: Application speedups for Sun Enterprise E6000, 2 node CC NUMA, and 2 node DSZOOM WF. .... 2L [41, 10], CRL [19], GeNIMA [5], Ivy [26, 27], MGS [44], Munin [8], Shasta [35, 34, 32, 33, 10], Sirocco S [36], SoftFLASH [11], and TreadMarks [21]. Most of them suffer from synchronous interrupt protocol processing. We believe that many of these implementations would benefit from a more efficient protocol implementation, such as the one described ....
D. Yeung, J. Kubiatowicz, and A. Agarwal. MGS: A Multigrain Shared Memory System. In Proceedings of the 23rd Annual International Symposium on Computer Architecture (ISCA'96), pages 44--56, May 1996.
....Network delay is about 3 microseconds for all DSZOOM EMU configurations. 5 Related Work. Many different SW DSM implementations have been proposed over the years: Blizzard S [SFL+94], Brazos [SB97], Cashmere 2L [SDH+97, DGK+99], CRL [JKW95], GeNIMA [BLS99], Ivy [Li88, LH89], MGS [YKA96], Munin [CBZ91], Shasta [SGT96, SGA97, SG97a, SG97b, DGK+99], Sirocco S [SFH+98], SoftFLASH [ENCH96], and TreadMarks [KCDZ94]. Most of them suffer from synchronous interrupt protocol processing. We believe that many of these implementations would benefit from a more efficient ....
D. Yeung, J. Kubiatowicz, and A. Agarwal. MGS: A Multigrain Shared Memory System. In Proceedings of the 23rd Annual International Symposium on Computer Architecture (ISCA'96), pages 44--56, May 1996.
....and speedups for Fibonacci are shown in Table 2. The speedup obtained is almost linear with the number of processors used. 5 Related Work. The four systems most similar to ours are an OpenMP implementation on a network of multiprocessors [HLCZ99], Strings [RC98], SoftFLASH [ENCH96], and MGS [YKA96]. The OpenMP port [HLCZ99] is probably the most similar to HyFi. Similar to HyFi, it provides a single API for parallelization within a node, between nodes and a combination of the two. It also uses POSIX threads within a node for portability, and a per page mutex for the page table to provide ....
....configuration dedicates additional processors for network interrupt processing. These processors do not run applications and are not counted when assessing speedup. Unlike HyFi, SoftFLASH supports only iterative programs and does not provide support for true fork join parallel applications. MGS [YKA96] was one of the first systems to explore coupling of small to medium scale shared memory multiprocessors through software to synthesize larger logically shared memory systems. Similar to Strings and the OpenMP port, it handles iterative applications well but does not support fork join programs. ....
Donald Yeung, John Kubiatowicz, and Anant Agarwal. MGS: A multigrain shared memory system. In Proceedings of the 23rd Annual International Symposium on Computer Architecture, May 1996.
....of communication. Several studies have also examined the performance of different SVM systems across multiprocessor nodes and compared it with the performance of configurations with uniprocessor nodes. Erlichson et al. [6] find that clustering helps shared memory applications. Yeung et al. in [23] find this to be true for SVM systems in which each node is a hardware coherent DSM machine. In [1] they find that the same is true in general for all software SVM systems, and for SVM systems with support for automatic write propagation. 9 Discussion and Future Work. This work shows that there ....
D. Yeung, J. Kubiatowicz, and A. Agarwal. MGS: a multigrain shared memory system. In Proceedings of the 23rd Annual Symposium on Computer Architecture, May
....and address the programming concerns of cluster based parallel computing, several software DSM systems have been built that do not rely on specialized hardware to provide programmers with shared memory. These systems include Ivy [28], TreadMarks [21], Munin [4], Brazos [43], CRL [18], MGS [49], CVM [20], Blizzard S [40], Shasta [39], Cashmere 2L [46], and SoftFLASH [9]. The underlying principle in these machines is to leverage commodity parts, particularly the use of commodity processors, node boards, networks, and operating systems, to build a scalable DSM machine. Most software DSM ....
D. Yeung, J. Kubiatowicz, and A. Agarwal. MGS: A Multigrain Shared Memory System. In Proceedings of the 23rd International Symposium on Computer Architecture, pages 44--55, May 1996.
....et al. [2] on adaptive TreadMarks implementations is the most similar to ours. In this paper, we compare their most important implementation against ADSM and show that our protocol behaves better, as it does not require as much communication traffic. Other indirectly related pieces of work are [16] and [17]. Just as ADSM, the Lazy Hybrid (LH) protocol studied by Dwarkadas et al. in [6] also applies a hybrid invalidate update coherence approach. ADSM differs from LH in that it only updates the single writer pages associated with the lock variable on a lock acquire operation. During a lock ....
D. Yeung, J. Kubiatowicz, and A. Agarwal. MGS: A Multigrain Shared Memory System. In Proc. of the 23rd Annual Int'l Symp. on Computer Architecture (ISCA'96), pages 45--55, May 1996.
....(for example, TreadMarks [2] from Rice University, Multigrain Shared Memory (MGS) [34] from MIT and Coherent Virtual Machine (CVM) [19] from University of Maryland) provides a software layer which simulates coherent shared memory between nodes by internally using messaging to move around specific data or referenced memory pages. The second, based on message passing primitives (for ....
[Footnote 1 in the citing report, UNM Technical Report EECE TR 98 006: cosmos (noun, Greek kosmos, c. 1650) 1: an orderly harmonious systematic universe 2: a complex orderly self inclusive system 3: Cluster Of Shared Memory Nodes]
D. Yeung, J. Kubiatowicz, and A. Agarwal. MGS: A Multigrain Shared Memory System. In Proceedings of the 23rd Annual International Symposium on Computer Architecture, Philadelphia, PA, May 1996.
....the opportunity to handle incoming messages on any processor on a node for load balancing purposes. The widespread availability of commercial SMP servers with small numbers of processors has led several researchers to consider their use as building blocks for Shared Virtual Memory (SVM) systems [1, 3, 4, 8, 18]. These systems exploit the SMP cache coherence hardware to support fine grain sharing within a node, and use a software protocol to support sharing across nodes at a coarser page size granularity. Most of the above work is based on simulation studies. SoftFLASH is the only actual implementation ....
....We have extended the Shasta protocol to exploit data sharing and clustering benefits within SMP nodes. This protocol is fully functional and runs on our prototype SMP cluster. Several researchers have considered using SMP nodes as building blocks for software Shared Virtual Memory (SVM) systems [1, 3, 4, 8, 18]. Among these, SoftFLASH [4] and MGS [18] are the only real implementations, with SoftFLASH being the only implementation based on commercial multiprocessor nodes. The primary difference with our study is that these systems support coherence across nodes at a fixed coarse granularity equal to the ....
[Article contains additional citation context not shown here]
D. Yeung, J. Kubiatowicz, and A. Agarwal. MGS: A Multigrain Shared Memory System. In Proceedings of the 23rd Annual International Symposium on Computer Architecture, pages 44--56, May 1996.
....7, 63]. All this research has helped narrow the performance gap between SVM on clusters and hardware DSM systems for an expanding range of applications at the 16 processor scale. Recent studies examine the performance of software shared memory clusters at significant scale (32 to 64 processors) [87, 35, 89, 80, 14]. In this chapter, we present the first scalability study of all software SVM on a modern cluster with 64 processors. Since clusters used for high performance computing typically use SMP nodes, we examine a 64 processor cluster composed of sixteen ....
D. Yeung, J. Kubiatowicz, and A. Agarwal. MGS: a multigrain shared memory system. In Proceedings of the 23rd International Symposium on Computer Architecture (ISCA), May 1996.
....model). Similarly, some page based systems (e.g. Treadmarks [14]) reduce the required bandwidth by only communicating the differences between copies, but the coherence granularity is still a page. Page based DSM systems implemented on a cluster of shared memory multiprocessors, such as MGS [25] and SoftFLASH [7], naturally support two coherence granularities: the line size of the multiprocessor hardware and the size of the virtual memory page. However, neither of these granularities can be changed. There has been a lot of research on exploiting and evaluating the benefits of relaxed ....
D. Yeung, J. Kubiatowicz, and A. Agarwal. MGS: A Multigrain Shared Memory System. In Proceedings of the 23rd Annual International Symposium on Computer Architecture, pages 44--56, May 1996.
....We use a multiple writer home based protocol based on lazy release consistency (LRC). The network interface we use provides lower latency and higher bandwidth. We use 4 KBytes pages and the safe page fetch helps us avoid the TLB synchronization problem. The Multigrain Shared Memory System (MGS) [99] built on top of the MIT Alewife machine uses a protocol very similar to the SoftFLASH system. The system is implemented by partitioning the Alewife machine, so each node is a distributed rather than centralized shared memory multiprocessor, and the number of external network connections from the ....
....of communication. Several studies have also examined the performance of different SVM systems across multiprocessor nodes and compared it with the performance of configurations with uniprocessor nodes. Erlichson et al. [28] find that clustering helps shared memory applications. Yeung et al. in [99] find this to be true for SVM systems in which each node is a hardware coherent DSM machine. In [9] they find that the same is true in general for all software SVM systems, and for SVM systems with support for automatic write propagation. 4.6 Discussion ....
D. Yeung, J. Kubiatowicz, and A. Agarwal. MGS: a multigrain shared memory system. In Proceedings of the 23rd Annual Symposium on Computer Architecture, May 1996.
....SC and HLRC, 16 processors [48]. 6.2 SVM across Multiprocessor Clusters. Another recent development is motivated by the increasing popularity of small scale hardware coherent multiprocessors and the ease of constructing systems of systems or clusters that use these multiprocessors as their nodes [46, 10, 23, 15, 45, 36, 5, 43, 37]. A software shared memory layer provides a uniform, coherent shared memory programming model rather than a hybrid message-passing shared memory interface for this increasingly important platform. Also, because the local coherence and synchronization within a multiprocessor node is performed in ....
....number of processors per node. For example, an SVM protocol may be implemented across clusters of distributed memory, hardware coherent machines rather than SMPs, with performance hopefully increasing as node size increases relative to the number of nodes. A start in the latter area has been made [45], but with a relatively simple protocol. If this approach is successful, i.e. it competes well with full hardware coherent largescale implementations, it may provide a way to extend the attractive programming model from moderate scale hardware coherent systems built in industry to much larger ....
D. Yeung, J. Kubiatowicz, and A. Agarwal. MGS: A multigrain shared memory system. In Proceedings of the 23rd Annual Symposium on Computer Architecture, May 1996.
.... relaxing the memory consistency models [5, 32], by improving the communication layer with low latency, high bandwidth, user-level communication [20, 13, 17, 38, 15, 47, 4, 39, 26, 33, 34, 51, 6], and by taking advantage of the two level communication hierarchy in systems with multiprocessor nodes [18, 7, 46, 8, 40, 50]. Recently, the application layer has also been improved, by discovering application restructurings that dramatically improve performance on SVM systems [28]. However, Figure 1 shows that parallel performance is still not satisfactory. Speedups on a 16 processor, all software, home based SVM ....
D. Yeung, J. Kubiatowicz, and A. Agarwal. MGS: a multigrain shared memory system. In Proceedings of the 23rd Annual Symposium on Computer Architecture, May 1996.
....[24] exist for providing consistency. Accessing remote data that is not present in the local cache of the hypernode results in considerable time lost as complete cache lines or pages are transferred across the network. In systems with software based cachecoherency schemes, e.g. the MIT MGS [21] systems, such costs are even more significant. Memory hierarchy optimization techniques such as prefetching may reduce inter hypernode latencies. Although cache effects are important they are specific to particular machines and thus difficult to incorporate in a model. Network gap for short ....
Donald Yeung, John Kubiatowicz, and Anant Agarwal. MGS: A Multigrain Shared Memory System. Proceedings of the 23rd ISCA, pages 44-55, May 1996.
.... Multiprocessor Clusters. Another recent development in software DSM was motivated by the increasing popularity of small scale hardware coherent multiprocessors and the development of systems of systems or clusters that use these multiprocessors as their nodes [ZSLW92, CDK 94, KS96a, ENCH96, YKA96, SISL96, BIMS96, BIS98, SDH 97, SG97, SBIS98]. The home based LRC protocol proposed in this dissertation was adapted and implemented on a cluster of SMPs [SISL96, SBIS98]. There are two strong arguments for using a software DSM layer across the nodes in such a cluster. First, it provides a ....
.... are questions about whether to use a dedicated protocol processor within the node and, if not, how to distribute protocol processing in response to incoming messages across processors [KS96a]. Systems have also been developed to implement SVM across hardware DSM rather than bus based machines [YKA96], which is a promising direction for building truly large scale shared memory systems. Studies have shown that using multiprocessors rather than uniprocessor nodes in software DSM systems indeed improves performance, with the extent of the improvement depending on the localization of ....
D. Yeung, J. Kubiatowicz, and A. Agarwal. MGS: A Multigrain Shared Memory System. In Proceedings of the 23rd Annual Symposium on Computer Architecture, May 1996.
....acquisition accordingly. Munin [CBZ95] uses multiple protocols to handle data with different access characteristics. The innovation in our work is that it chooses automatically between SW and MW protocols. In Munin, the choice of protocol is based on somewhat burdensome user annotations. MGS [YKA96], a DSM system for distributed SMPs, uses a base protocol similar to Munin. Their protocol employs a single writer optimization that avoids diffing overhead when there is only one writable copy. Although the twin is still made, the entire page is sent to the home instead of computing a diff. ....
D. Yeung, J. Kubiatowicz, and A. Agarwal. MGS: A multigrain shared memory system. In Proceedings of the 23rd Annual International Symposium on Computer Architecture, May 1996.
....between computation and communication in software, it can potentially change its resource allocation for different applications. A prototype implementation of the design is built on the MIT Alewife multiprocessor [3]. We integrate our prototype with the MIT Multi Grain Shared memory system (MGS) [15], which is a DSSM system also implemented on Alewife. The prototype provides a flexible platform for studying the following problems: How much overhead needs to be paid for inter cluster protocol processing? By how much does contention in ICCSs impact application performance? To what extent ....
Donald Yeung, John Kubiatowicz, and Anant Agarwal. MGS: A Multigrain Shared Memory System. In Proceedings of the 23rd Annual International Symposium on Computer Architecture, pages 44--55, May 1996.
....Cache coherent shared memory hardware provides a small cache line sharing grain between processors colocated on the same SMP. Pagebased software DSM provides a larger page sharing grain between processors in separate SMPs. Recently, several DSMP architectures have been constructed and studied [6, 7, 8, 9, 4]. This paper builds upon the work in [4] and makes the following novel contributions: 1. We present a fully functional design of a multigrain shared memory system, called MGS, and provide a prototype implementation of MGS on the Alewife multiprocessor. 2. We define two performance metrics, the ....
....small cache line sharing grain between processors colocated on the same SMP. Pagebased software DSM provides a larger page sharing grain between processors in separate SMPs. Recently, several DSMP architectures have been constructed and studied [6, 7, 8, 9, 4]. This paper builds upon the work in [4] and makes the following novel contributions: 1. We present a fully functional design of a multigrain shared memory system, called MGS, and provide a prototype implementation of MGS on the Alewife multiprocessor. 2. We define two performance metrics, the breakup penalty and the multigrain ....
[Article contains additional citation context not shown here]
Donald Yeung, John Kubiatowicz, and Anant Agarwal. MGS: A Multigrain Shared Memory System. In Proceedings of the 1996 International Symposium on Computer Architecture, Philadelphia, May 1996.
....high performance can be achieved on four out of our five applications if each multiprocessor node is at least 16 way. 1 Introduction. Recently, researchers have proposed building large scale distributed shared memory (DSM) systems by coupling multiple small-scale shared memory multiprocessors [1, 2, 3, 4, 5]. These systems combine fine grain cache coherence mechanisms supported in hardware (within a small scale multiprocessor) and coarse grain software page based mechanisms supported in software (between small scale multiprocessors). Because they employ two different ....
....[Figure 1 caption: ....shared memory system is built using small-scale multiprocessors as DSM nodes. P denotes processors, C hardware caches, and M physical memory modules on each DSM node.] .... coherence units (both cache lines and pages), these systems have been referred to as multigrain shared memory systems [5]. Figure 1 shows the architecture of a multigrain shared memory system. Like any conventional DSM, a multigrain system consists of a collection of nodes connected over a network. Each node is a shared memory multiprocessor containing a few (2-100) processors each with its own hardware cache, ....
[Article contains additional citation context not shown here]
Donald Yeung, John Kubiatowicz, and Anant Agarwal. MGS: A Multigrain Shared Memory System. In Proceedings of the 1996 International Symposium on Computer Architecture, Philadelphia, May 1996.
No context found.
Donald Yeung, John Kubiatowicz, and Anant Agarwal. MGS: A multigrain shared memory system. In Proceedings of the 23rd Annual International Symposium on Computer Architecture, May 1996.