Memory coherence in shared virtual memory systems, ACM Transactions on Computer Systems (1989)

by K Li, P Hudak

Results 1 - 10 of 957

Implementation and performance of Munin

by John B. Carter, John K. Bennett, Willy Zwaenepoel - IN PROCEEDINGS OF THE 13TH ACM SYMPOSIUM ON OPERATING SYSTEMS PRINCIPLES, 1991
"... Munin is a distributed shared memory (DSM) system that allows shared memory parallel programs to be executed efficiently on distributed memory multiprocessors. Munin is unique among existing DSM systems in its use of multiple consistency protocols and in its use of release consistency. In Munin, sha ..."
Abstract - Cited by 587 (22 self) - Add to MetaCart
Munin is a distributed shared memory (DSM) system that allows shared memory parallel programs to be executed efficiently on distributed memory multiprocessors. Munin is unique among existing DSM systems in its use of multiple consistency protocols and in its use of release consistency. In Munin, shared program variables are annotated with their expected access pattern, and these annotations are then used by the runtime system to choose a consistency protocol best suited to that access pattern. Release consistency allows Munin to mask network latency and reduce the number of messages required to keep memory consistent. Munin's multiprotocol release consistency is implemented in software using a delayed update queue that buffers and merges pending outgoing writes. A sixteen-processor prototype of Munin is currently operational. We evaluate its implementation and describe the execution of two Munin programs that achieve performance within ten percent of message passing implementations of the same programs. Munin achieves this level of performance with only minor annotations to the shared memory programs.
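
To make the delayed update queue concrete, here is a minimal C sketch of the mechanism the abstract describes, under assumed names (duq_buffer_write, duq_flush, and send_update are illustrative, not Munin's interfaces): pending outgoing writes are buffered and merged per region, then flushed in one batch at release time.

    /* Sketch of a Munin-style delayed update queue (hypothetical names,
     * not Munin's actual implementation). Pending outgoing writes are
     * buffered and merged, then flushed in a batch at release time. */
    #include <stdlib.h>
    #include <string.h>

    struct pending_update {
        void  *addr;                      /* start of modified region  */
        size_t len;                       /* length of modified region */
        char  *data;                      /* buffered new contents     */
        struct pending_update *next;
    };

    static struct pending_update *duq;    /* the delayed update queue  */

    /* Record a write; merge with an existing entry covering the same
     * region, so repeated writes cost only one outgoing message. */
    void duq_buffer_write(void *addr, const void *src, size_t len)
    {
        for (struct pending_update *u = duq; u; u = u->next) {
            if (u->addr == addr && u->len == len) {
                memcpy(u->data, src, len);    /* merge in place */
                return;
            }
        }
        struct pending_update *u = malloc(sizeof *u);
        u->addr = addr;
        u->len  = len;
        u->data = malloc(len);
        memcpy(u->data, src, len);
        u->next = duq;
        duq = u;
    }

    /* At a release (e.g. an unlock), flush all buffered updates to the
     * other sharers in one batch, masking network latency. */
    void duq_flush(void (*send_update)(void *addr, const void *data, size_t len))
    {
        while (duq) {
            struct pending_update *u = duq;
            duq = u->next;
            send_update(u->addr, u->data, u->len);
            free(u->data);
            free(u);
        }
    }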

TreadMarks: Distributed Shared Memory on Standard Workstations and Operating Systems

by Pete Keleher, Alan L. Cox, Sandhya Dwarkadas, Willy Zwaenepoel - IN PROCEEDINGS OF THE 1994 WINTER USENIX CONFERENCE, 1994
"... TreadMarks is a distributed shared memory (DSM) system for standard Unix systems such as SunOS and Ultrix. This paper presents a performance evaluation of TreadMarks running on Ultrix using DECstation-5000/240's that are connected by a 100-Mbps switch-based ATM LAN and a 10-Mbps Ethernet. Ou ..."
Abstract - Cited by 526 (17 self) - Add to MetaCart
TreadMarks is a distributed shared memory (DSM) system for standard Unix systems such as SunOS and Ultrix. This paper presents a performance evaluation of TreadMarks running on Ultrix using DECstation-5000/240's that are connected by a 100-Mbps switch-based ATM LAN and a 10-Mbps Ethernet. Our objective is to determine the efficiency of a user-level DSM implementation on commercially available workstations and operating systems. We achieved good speedups on the 8-processor ATM network for Jacobi (7.4), TSP (7.2), Quicksort (6.3), and ILINK (5.7). For a slightly modified version of Water from the SPLASH benchmark suite, we achieved only moderate speedups (4.0) due to the high communication and synchronization rate. Speedups decline on the 10-Mbps Ethernet (5.5 for Jacobi, 6.5 for TSP, 4.2 for Quicksort, 5.1 for ILINK, and 2.1 for Water), reflecting the bandwidth limitations of the Ethernet. These results support the contention that, with suitable networking technology, DSM is a...

Citation Context

...14, and by a NASA Graduate Fellowship. Various software systems have been proposed and built to support parallel computation on workstation networks, e.g., tuple spaces [2], distributed shared memory [18], and message passing [23]. TreadMarks is a distributed shared memory (DSM) system [18]. DSM enables processes on different machines to share memory, even though the machines physically do not share me...

TreadMarks: Shared memory computing on networks of workstations

by Cristiana Amza, Alan L. Cox, Sandhya Dwarkadas, Pete Keleher, Honghui Lu, Ramakrishnan Rajamony, Weimin Yu, Willy Zwaenepoel - Computer, 1996
"... TreadMarks supports parallel computing on networks of workstations by providing the application with a shared memory abstraction. Shared memory facilitates the transition from sequential to parallel programs. After identifying possible sources of parallelism in the code, most of the data structures ..."
Abstract - Cited by 487 (37 self) - Add to MetaCart
TreadMarks supports parallel computing on networks of workstations by providing the application with a shared memory abstraction. Shared memory facilitates the transition from sequential to parallel programs. After identifying possible sources of parallelism in the code, most of the data structures can be retained without change, and only synchronization needs to be added to achieve a correct shared memory parallel program. Additional transformations may be necessary to optimize performance, but this can be done in an incremental fashion. We discuss the techniques used in TreadMarks to provide efficient shared memory, and our experience with two large applications, mixed integer programming and genetic linkage analysis.
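
As a sketch of the incremental parallelization described above, consider a sequential loop over an array: the data structure is kept, and only shared allocation and barriers are added. The Tmk_* names follow the TreadMarks API presented in the paper, but the exact declarations below are assumptions:

    /* Incremental parallelization with TreadMarks-style primitives.
     * The Tmk_* declarations are assumptions sketched from the paper's
     * description, not copied from the distribution's headers. */
    #include <stdlib.h>

    extern unsigned Tmk_proc_id;                 /* this process's id   */
    extern unsigned Tmk_nprocs;                  /* number of processes */
    void  Tmk_startup(int argc, char **argv);
    void *Tmk_malloc(unsigned size);
    void  Tmk_distribute(char *ptr, unsigned size);
    void  Tmk_barrier(unsigned id);

    #define N 1024                    /* assumed divisible by Tmk_nprocs */
    static double *grid;              /* shared array */

    int main(int argc, char **argv)
    {
        Tmk_startup(argc, argv);

        if (Tmk_proc_id == 0) {
            grid = Tmk_malloc(N * sizeof *grid);        /* shared space */
            Tmk_distribute((char *)&grid, sizeof grid); /* publish ptr  */
        }
        Tmk_barrier(0);

        /* Each process updates its own block of the array, exactly as
         * the sequential loop would; barriers delimit the phases. */
        unsigned lo = Tmk_proc_id * (N / Tmk_nprocs);
        unsigned hi = lo + N / Tmk_nprocs;
        for (unsigned i = lo; i < hi; i++)
            grid[i] *= 0.5;           /* stand-in for real computation */

        Tmk_barrier(1);
        return 0;
    }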

Citation Context

...ions using the TreadMarks distributed shared memory (DSM) system. DSM allows processes to assume a globally shared virtual memory even though they execute on nodes that do not physically share memory [9]. Figure 1 illustrates a DSM system consisting of N networked workstations, each with its own memory, connected by a network. The DSM software provides the abstraction of a globally shared memory, in ...

Evaluation of Release Consistent Software Distributed Shared Memory on Emerging Network Technology

by Sandhya Dwarkadas, Pete Keleher, Alan L. Cox, Willy Zwaenepoel
"... We evaluate the effect of processor speed, network characteristics, and software overhead on the performance of release-consistent software distributed shared memory. We examine five different protocols for implementing release consistency: eager update, eager invalidate, lazy update, lazy invalidat ..."
Abstract - Cited by 467 (43 self) - Add to MetaCart
We evaluate the effect of processor speed, network characteristics, and software overhead on the performance of release-consistent software distributed shared memory. We examine five different protocols for implementing release consistency: eager update, eager invalidate, lazy update, lazy invalidate, and a new protocol called lazy hybrid. This lazy hybrid protocol combines the benefits of both lazy update and lazy invalidate. Our simulations indicate that with the processors and networks that are becoming available, coarse-grained applications such as Jacobi and TSP perform well, more or less independent of the protocol used. Medium-grained applications, such as Water, can achieve good performance, but the choice of protocol is critical. For sixteen processors, the best protocol, lazy hybrid, performed more than three times better than the worst, the eager update. Fine-grained applications such as Cholesky achieve little speedup regardless of the protocol used because of the frequency of synchronization operations and the high latency involved. While the use of relaxed memory models, lazy implementations, and multiple-writer protocols has reduced the impact of false sharing, synchronization latency remains a serious problem for software distributed shared memory systems. These results suggest that future work on software DSMs should concentrate on reducing the amount of synchronization or its effect.
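
The difference between the protocol families compared above can be sketched in C, under assumed helper names (send_diff_to_sharers, record_write_notice, and the rest are illustrative; real implementations track vector timestamps and per-interval write notices): eager protocols perform all coherence actions at release time, while lazy protocols only log what changed and let the next acquirer pull the notices.

    /* Schematic contrast of eager and lazy release consistency
     * (illustrative helpers; not a real protocol implementation). */
    #include <stddef.h>

    struct page { int id; int dirty; };

    void send_diff_to_sharers(struct page *p);        /* eager update     */
    void send_invalidate_to_sharers(struct page *p);  /* eager invalidate */
    void record_write_notice(struct page *p);         /* lazy: log only   */
    void fetch_write_notices_from(int last_releaser); /* lazy: at acquire */

    /* Eager: all coherence traffic happens at the release itself. */
    void eager_release(struct page *pages, size_t n, int update_protocol)
    {
        for (size_t i = 0; i < n; i++) {
            if (!pages[i].dirty)
                continue;
            if (update_protocol)
                send_diff_to_sharers(&pages[i]);       /* push new data */
            else
                send_invalidate_to_sharers(&pages[i]); /* or invalidate */
            pages[i].dirty = 0;
        }
    }

    /* Lazy: a release only records which pages were modified... */
    void lazy_release(struct page *pages, size_t n)
    {
        for (size_t i = 0; i < n; i++)
            if (pages[i].dirty) {
                record_write_notice(&pages[i]);
                pages[i].dirty = 0;
            }
    }

    /* ...and the next acquirer pulls only the notices it is causally
     * behind on, so unrelated processes exchange no traffic at all. */
    void lazy_acquire(int last_releaser)
    {
        fetch_write_notices_from(last_releaser);
    }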

Citation Context

... eliminate this communication, because processors that falsely share data are unlikely to be causally related. This observation is consistent with the results of our simulations. Ivy [12] was the first page-based distributed shared memory system. The shared memory implemented by Ivy is sequentially consistent, and does not allow multiple writers. Clouds [15] uses program-based segment...

Orca: A language for parallel programming of distributed systems

by Henri E. Bal, M. Frans Kaashoek, Andrew S. Tanenbaum - IEEE Transactions on Software Engineering, 1992
"... Orca is a language for implementing parallel applications on loosely coupled distributed systems. Unlike most languages for distributed programming, it allows processes on different machines to share data. Such data are encapsulated in data-objects, which are instances of user-defined abstract data ..."
Abstract - Cited by 332 (46 self) - Add to MetaCart
Orca is a language for implementing parallel applications on loosely coupled distributed systems. Unlike most languages for distributed programming, it allows processes on different machines to share data. Such data are encapsulated in data-objects, which are instances of user-defined abstract data types. The implementation of Orca takes care of the physical distribution of objects among the local memories of the processors. In particular, an implementation may replicate and/or migrate objects in order to decrease access times to objects and increase parallelism. This paper gives a detailed description of the Orca language design and motivates the design choices. Orca is intended for applications programmers rather than systems programmers. This is reflected in its design goals to provide a simple, easy-to-use language that is type-secure and provides clean semantics. The paper discusses three example parallel applications in Orca, one of which is described in detail. It also describes one of the existing implementations, which is based on reliable broadcasting. Performance measurements of this system are given for three parallel applications. The measurements show that significant speedups can be obtained for all three applications. Finally, the paper compares Orca with several related languages and systems.
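
Orca is its own language, but the data-object model can be rendered approximately in C (purely illustrative; this is not Orca syntax): shared state is reachable only through user-defined operations, which the runtime applies indivisibly, leaving it free to replicate or migrate the object behind the interface.

    /* Approximate C rendering of an Orca data-object (illustrative;
     * Orca is its own language). A mutex stands in for the runtime's
     * guarantee that each operation executes indivisibly. */
    #include <pthread.h>

    struct int_object {
        pthread_mutex_t lock;
        int value;
    };

    static struct int_object counter = { PTHREAD_MUTEX_INITIALIZER, 0 };

    /* User-defined operations: the only way to touch the shared data,
     * e.g. assign(&counter, 42); in application code. */
    void assign(struct int_object *o, int v)
    {
        pthread_mutex_lock(&o->lock);
        o->value = v;
        pthread_mutex_unlock(&o->lock);
    }

    int value(struct int_object *o)
    {
        pthread_mutex_lock(&o->lock);
        int v = o->value;
        pthread_mutex_unlock(&o->lock);
        return v;
    }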

Citation Context

...ssors on which they run do not have physical shared memory. The main novelty of our approach is the way access to shared data is expressed. Unlike shared physical memory (or distributed shared memory [6]), shared data in Orca are accessed through user-defined high-level operations, which, as we will see, has many important implications. Supporting shared data on a distributed system imposes some chal...

Tempest and Typhoon: User-level Shared Memory

by Steven K. Reinhardt, James R. Larus, David A. Wood - In Proceedings of the 21st Annual International Symposium on Computer Architecture, 1994
"... Future parallel computers must efficiently execute not only hand-coded applications but also programs written in high-level, parallel programming languages. Today’s machines limit these programs to a single communication paradigm, either message-passing or shared-memory, which results in uneven perf ..."
Abstract - Cited by 309 (27 self) - Add to MetaCart
Future parallel computers must efficiently execute not only hand-coded applications but also programs written in high-level, parallel programming languages. Today’s machines limit these programs to a single communication paradigm, either message-passing or shared-memory, which results in uneven performance. This paper addresses this problem by defining an interface, Tempest, that exposes low-level communication and memory-system mechanisms so programmers and compilers can customize policies for a given application. Typhoon is a proposed hardware platform that implements these mechanisms with a fully-programmable, user-level processor in the network interface. We demonstrate the utility of Tempest with two examples. First, the Stache protocol uses Tempest’s fine-grain access control mechanisms to manage part of a processor’s local memory as a large, fully-associative cache for remote data. We simulated Typhoon on the Wisconsin Wind Tunnel and found that Stache running on Typhoon performs comparably (±30%) to an all-hardware DirNNB cache-coherence protocol for five shared-memory programs. Second, we illustrate how programmers or compilers can use Tempest’s flexibility to exploit an application’s sharing patterns with a custom protocol. For the EM3D application, the custom protocol improves performance up to 35% over the all-hardware protocol.
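
A rough C sketch of the fine-grain access control underlying Stache (names and block size are assumptions): each aligned block of memory carries an access tag, and a load or store that conflicts with the tag vectors to a user-level handler that runs the coherence protocol in ordinary code.

    /* Sketch of Tempest-style fine-grain access control (hypothetical
     * names; in Typhoon the check and dispatch are done in hardware). */
    #include <stdint.h>

    enum tag { TAG_INVALID, TAG_READONLY, TAG_WRITABLE };

    #define BLOCK_SHIFT 5                  /* 32-byte blocks (assumed) */
    #define NBLOCKS     (1u << 20)

    static enum tag tags[NBLOCKS];         /* per-block access tags */

    /* Assumed messaging helper: fetch a block from its home node. */
    extern void fetch_block_from_home(uintptr_t block);

    /* User-level protocol handler, e.g. Stache: treat local memory as
     * a cache for remote data and upgrade the tag after the fill. */
    static void block_access_fault(uintptr_t addr, int is_write)
    {
        uintptr_t block = (addr >> BLOCK_SHIFT) % NBLOCKS;
        fetch_block_from_home(block);
        tags[block] = is_write ? TAG_WRITABLE : TAG_READONLY;
    }

    /* Conceptual check performed on every shared access. */
    void shared_access_check(uintptr_t addr, int is_write)
    {
        uintptr_t block = (addr >> BLOCK_SHIFT) % NBLOCKS;
        if (tags[block] == TAG_INVALID ||
            (is_write && tags[block] != TAG_WRITABLE))
            block_access_fault(addr, is_write);
    }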

The V distributed system

by David R. Cheriton, 1988
"... The V distributed System was developed at Stanford University as part of a research project to explore issues in distributed systems. Aspects of the design suggest important directions for the design of future operating systems and communication systems. ..."
Abstract - Cited by 307 (7 self) - Add to MetaCart
The V distributed system was developed at Stanford University as part of a research project to explore issues in distributed systems. Aspects of the design suggest important directions for the design of future operating systems and communication systems.

Citation Context

...xperimenting with the distributed shared memory provided between nodes by the virtual memory system, as described in the earlier section on Memory Management. Judging by the experience reported by Li [26], we expect this approach to be applicable to a significant class of distributed parallel programs, providing shared memory similar to that available in a shared memory multiprocessor, differing for c...

Transparent Process Migration: Design Alternatives and the Sprite Implementation

by Fred Douglis, John Ousterhout - Software - Practice and Experience, 1991
"... this paper is a description of our implementation and our experiences using it ..."
Abstract - Cited by 287 (5 self) - Add to MetaCart
This paper is a description of our implementation and our experiences using it.

Shasta: A Low Overhead, Software-Only Approach for Supporting Fine-Grain Shared Memory

by Daniel J. Scales, Kourosh Gharachorloo, Chandramohan A. Thekkath
"... This paper describes Shasta, a system that supports a shared address space in software on clusters of computers with physically distributed memory. A unique aspect of Shasta compared to most other software distributed shared memory systems is that shared data can be kept coherent at a fine granulari ..."
Abstract - Cited by 236 (5 self) - Add to MetaCart
This paper describes Shasta, a system that supports a shared address space in software on clusters of computers with physically distributed memory. A unique aspect of Shasta compared to most other software distributed shared memory systems is that shared data can be kept coherent at a fine granularity. In addition, the system allows the coherence granularity to vary across different shared data structures in a single application. Shasta implements the shared address space by transparently rewriting the application executable to intercept loads and stores. For each shared load or store, the inserted code checks to see if the data is available locally and communicates with other processors if necessary. The system uses numerous techniques to reduce the run-time overhead of these checks. Since Shasta is implemented entirely in software, it also provides tremendous flexibility in supporting different types of cache coherence protocols. We have implemented an efficient ...
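
A schematic C rendering of the inline checks described above (illustrative only; Shasta rewrites the compiled executable at the instruction level rather than using source-level macros):

    /* Schematic of Shasta-style inline access checks. Each fixed-size
     * line of shared data has a state entry consulted before every
     * shared load or store; a conflicting access calls a miss handler. */
    #include <stdint.h>

    enum line_state { INVALID, SHARED, EXCLUSIVE };

    #define LINE_SHIFT 6                    /* 64-byte lines (assumed)  */
    extern enum line_state state_table[];   /* per-line coherence state */

    /* Assumed helper: communicates with the line's owner. */
    extern void miss_handler(void *addr, int is_write);

    /* Check inserted before a shared load of *p. */
    #define SHARED_LOAD(p, out)                                         \
        do {                                                            \
            uintptr_t line_ = (uintptr_t)(p) >> LINE_SHIFT;             \
            if (state_table[line_] == INVALID)                          \
                miss_handler((void *)(p), 0); /* fetch the line */      \
            (out) = *(p);                                               \
        } while (0)

    /* Check inserted before a shared store to *p. */
    #define SHARED_STORE(p, v)                                          \
        do {                                                            \
            uintptr_t line_ = (uintptr_t)(p) >> LINE_SHIFT;             \
            if (state_table[line_] != EXCLUSIVE)                        \
                miss_handler((void *)(p), 1); /* gain ownership */      \
            *(p) = (v);                                                 \
        } while (0)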

Citation Context

...otations or explicit calls to access shared data [2, 11]. Another approach, called Shared Virtual Memory (SVM), uses the virtual memory hardware to detect access to data that is not available locally [4, 13, 12]. In most such systems, the granularity at which data is accessed and kept coherent is large, because it is related to the size of an application data structure or the size of a virtual page. We have ...
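
The SVM mechanism cited here, introduced by the Li and Hudak paper this page indexes, maps naturally onto standard POSIX primitives: pages that are not valid locally are access-protected, and a fault handler runs the coherence protocol before the access is retried. A minimal, simplified sketch (fetch_page_from_owner is an assumed helper, and a production handler would need to be more careful about signal safety):

    /* Minimal sketch of page-based shared virtual memory: invalid
     * pages are protected with PROT_NONE, and the SIGSEGV handler
     * fetches the page before the faulting access is retried.
     * Simplified: mprotect/sysconf are not async-signal-safe. */
    #include <signal.h>
    #include <stdint.h>
    #include <sys/mman.h>
    #include <unistd.h>

    static char *shared_base;            /* start of the shared region */

    extern void fetch_page_from_owner(void *page);  /* assumed helper */

    static void svm_fault_handler(int sig, siginfo_t *si, void *ctx)
    {
        (void)sig; (void)ctx;
        long psz = sysconf(_SC_PAGESIZE);
        void *page = (void *)((uintptr_t)si->si_addr &
                              ~(uintptr_t)(psz - 1));

        fetch_page_from_owner(page);     /* run coherence protocol */
        mprotect(page, (size_t)psz, PROT_READ | PROT_WRITE);
        /* returning from the handler retries the faulting access */
    }

    void svm_init(size_t len)
    {
        /* All shared pages start invalid (inaccessible). */
        shared_base = mmap(NULL, len, PROT_NONE,
                           MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

        struct sigaction sa = {0};
        sa.sa_sigaction = svm_fault_handler;
        sa.sa_flags     = SA_SIGINFO;
        sigaction(SIGSEGV, &sa, NULL);
    }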

The Amber System: Parallel Programming on a Network of Multiprocessors

by Jeffrey Chase, Franz G. Amador, Edward D. Lazowska, Henry M. Levy, Richard J. Littlefield - In Proceedings of the 12th ACM Symposium on Operating Systems Principles, 1989
"... Microprocessor-based shared-memory multiprocessors are becoming widely available and promise to provide cost-effective high-performance computing. This paper describes a programming system called Amber which permits a single application program to use a homogeneous network of multiprocessors in a un ..."
Abstract - Cited by 231 (15 self) - Add to MetaCart
Microprocessor-based shared-memory multiprocessors are becoming widely available and promise to provide cost-effective high-performance computing. This paper describes a programming system called Amber which permits a single application program to use a homogeneous network of multiprocessors in a uniform way, making the network appear to the application as an integrated, non-uniform memory access, shared-memory multiprocessor. This simplifies the development of applications and allows compute-intensive parallel programs to effectively harness the potential of multiple nodes. Amber programs are written using an object-oriented subset of the C++ programming language, supplemented with primitives for managing concurrency and distribution. Amber provides a network-wide shared-object virtual memory in which coherence is provided by hardware means for locally-executing threads, and by software means for remote accesses. Amber runs on top of the Topaz operating system on a network of DEC SRC ...