Implementation and performance of Munin
- In Proceedings of the 13th ACM Symposium on Operating Systems Principles, 1991
Cited by 587 (22 self)
Munin is a distributed shared memory (DSM) system that allows shared memory parallel programs to be executed efficiently on distributed memory multiprocessors. Munin is unique among existing DSM systems in its use of multiple consistency protocols and in its use of release consistency. In Munin, shared program variables are annotated with their expected access pattern, and these annotations are then used by the runtime system to choose a consistency protocol best suited to that access pattern. Release consistency allows Munin to mask network latency and reduce the number of messages required to keep memory consistent. Munin's multiprotocol release consistency is implemented in software using a delayed update queue that buffers and merges pending outgoing writes. A sixteen-processor prototype of Munin is currently operational. We evaluate its implementation and describe the execution of two Munin programs that achieve performance within ten percent of message passing implementations of the same programs. Munin achieves this level of performance with only minor annotations to the shared memory programs.
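The delayed update queue mentioned in this abstract can be illustrated with a small sketch. It is not Munin's implementation: the class name `DelayedUpdateQueue`, the `send` callback, and the use of a dictionary keyed by address are all assumptions made for illustration. It shows only the general idea of buffering writes, merging repeated writes to the same location, and flushing them together at a release.

```python
class DelayedUpdateQueue:
    """Hypothetical sketch: buffer writes and merge them until a release."""

    def __init__(self, send):
        self._send = send      # assumed transport callback, one call per message
        self._pending = {}     # address -> most recent buffered value

    def write(self, address, value):
        # A later write to the same address replaces the earlier one,
        # so at most one update per address is sent at release time.
        self._pending[address] = value

    def release(self):
        # At a release operation (for example, an unlock), send all merged
        # updates as a single message and empty the queue.
        if self._pending:
            self._send(dict(self._pending))
            self._pending.clear()


sent = []                          # stands in for the network
duq = DelayedUpdateQueue(sent.append)
duq.write("x", 1)
duq.write("y", 2)
duq.write("x", 3)                  # merged with the first write to "x"
duq.release()                      # one message carrying both updates
duq.release()                      # nothing pending, so nothing is sent
```

Three writes result in one outgoing message, which is the kind of message reduction the abstract attributes to release consistency.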
Evaluation of Release Consistent Software Distributed Shared Memory on Emerging Network Technology
Cited by 467 (43 self)
We evaluate the effect of processor speed, network characteristics, and software overhead on the performance of release-consistent software distributed shared memory. We examine five different protocols for implementing release consistency: eager update, eager invalidate, lazy update, lazy invalidate, and a new protocol called lazy hybrid. This lazy hybrid protocol combines the benefits of both lazy update and lazy invalidate. Our simulations indicate that with the processors and networks that are becoming available, coarse-grained applications such as Jacobi and TSP perform well, more or less independent of the protocol used. Medium-grained applications, such as Water, can achieve good performance, but the choice of protocol is critical. For sixteen processors, the best protocol, lazy hybrid, performed more than three times better than the worst, eager update. Fine-grained applications such as Cholesky achieve little speedup regardless of the protocol used because of the frequency of synchronization operations and the high latency involved. While the use of relaxed memory models, lazy implementations, and multiple-writer protocols has reduced the impact of false sharing, synchronization latency remains a serious problem for software distributed shared memory systems. These results suggest that future work on software DSMs should concentrate on reducing the amount of synchronization or its effect.
Orca: A language for parallel programming of distributed systems
- IEEE Transactions on Software Engineering, 1992
Cited by 332 (46 self)
Orca is a language for implementing parallel applications on loosely coupled distributed systems. Unlike most languages for distributed programming, it allows processes on different machines to share data. Such data are encapsulated in data-objects, which are instances of user-defined abstract data types. The implementation of Orca takes care of the physical distribution of objects among the local memories of the processors. In particular, an implementation may replicate and/or migrate objects in order to decrease access times to objects and increase parallelism. This paper gives a detailed description of the Orca language design and motivates the design choices. Orca is intended for applications programmers rather than systems programmers. This is reflected in its design goals to provide a simple, easy-to-use language that is type-secure and provides clean semantics. The paper discusses three example parallel applications in Orca, one of which is described in detail. It also describes one of the existing implementations, which is based on reliable broadcasting. Performance measurements of this system are given for three parallel applications. The measurements show that significant speedups can be obtained for all three applications. Finally, the paper compares Orca with several related languages and systems.
Experiences with the Amoeba distributed operating system
- Communications of the ACM, 1990
Cited by 229 (20 self)
The Amoeba distributed operating system has been in development and use for over eight years now. In this paper we describe the present system and our experience with it—what we did right, but also what we did wrong. Among the things done right were basing the system on objects, using a single uniform mechanism (capabilities) for naming and protecting them in a location independent way, and designing a completely new, and very fast file system. Among the things done wrong were having threads not be pre-emptable, initially building our own homebrew window system, and not having a multicast facility at the outset.
An Efficient Reliable Broadcast Protocol
- Operating Systems Review, 1989
Cited by 149 (13 self)
Many distributed and parallel applications can make good use of broadcast communication. In this paper we present a (software) protocol that simulates reliable broadcast, even on an unreliable network. Using this protocol, application programs need not worry about lost messages. Recovery of communication failures is handled automatically and transparently by the protocol. In normal operation, our protocol is more efficient than previously published reliable broadcast protocols. An initial implementation of the protocol on 10 MC68020 CPUs connected by a 10 Mbit/sec Ethernet performs a reliable broadcast in 1.5 msec.
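The abstract does not say how the protocol recovers lost messages, so the following is only a generic sketch of one way to simulate reliable broadcast over a lossy network: a central sequencer numbers every message and keeps a history, and a receiver that notices a gap in the sequence numbers asks for retransmission. The names `Sequencer`, `Receiver`, `broadcast`, and `retransmit` are hypothetical, and message loss is simulated simply by not delivering a message.

```python
class Sequencer:
    """Assumed central node that orders messages and keeps a history."""

    def __init__(self):
        self.history = []          # history[i] is the payload with sequence number i

    def broadcast(self, payload):
        seq = len(self.history)
        self.history.append(payload)
        return seq, payload        # what would be put on the wire

    def retransmit(self, seq):
        return seq, self.history[seq]


class Receiver:
    def __init__(self, sequencer):
        self.sequencer = sequencer
        self.next_seq = 0          # next sequence number expected
        self.delivered = []        # payloads handed to the application, in order

    def on_message(self, seq, payload):
        # A gap means earlier messages were lost: fetch them first.
        while self.next_seq < seq:
            _, missed = self.sequencer.retransmit(self.next_seq)
            self.delivered.append(missed)
            self.next_seq += 1
        if seq == self.next_seq:
            self.delivered.append(payload)
            self.next_seq += 1
        # Otherwise seq < next_seq: a duplicate, silently ignored.


seqr = Sequencer()
rx = Receiver(seqr)
m0 = seqr.broadcast("a")
m1 = seqr.broadcast("b")           # pretend this one is lost in transit
m2 = seqr.broadcast("c")
rx.on_message(*m0)
rx.on_message(*m2)                 # gap detected, "b" is recovered first
rx.on_message(*m1)                 # late copy arrives and is dropped as a duplicate
```

The application sees every message exactly once and in order, without handling loss itself, which is the property the abstract describes.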
Lazy Release Consistency for Distributed Shared Memory
- 1995
Cited by 104 (0 self)
A software distributed shared memory (DSM) system allows shared memory parallel programs to execute on networks of workstations. This thesis presents a new class of protocols that has lower communication requirements than previous DSM protocols, and can consequently achieve higher performance. The lazy release consistent protocols achieve this reduction in communication by piggybacking consistency information on top of existing synchronization transfers. Some of the protocols also improve performance by speculatively moving data. We evaluate the impact of these features by comparing the performance of a software DSM using lazy protocols with that of a DSM using previous eager protocols. We found that seven of our eight applications performed better on the lazy system, and four of the applications showed performance speedups of at least 18%. As part of this comparison, we show that the cost of executing the slightly more complex code of the lazy protocols is far less important than the ...
Fragmented Objects for Distributed Abstractions
- Readings in Distributed Computing Systems, 1992
Cited by 73 (3 self)
Mesaac Makpangou, Yvon Gourhant, Jean-Pierre Le Narzul, Marc Shapiro. INRIA, B.P. 105, 78153 Rocquencourt Cédex, France. Tel.: +33 (1) 39-63-52-93, fax: +33 (1) 39 63 53 30, e-mail: mak@sor.inria.fr, telex: 697 033 F. October 1, 1991.
Keywords: distributed objects, object-oriented programming, distributed abstractions, fragmented objects, connective objects, FOG.
Abstract: Fragmented Objects (FOs) extend the object concept to a distributed environment. The abstract view of a FO is a single, shared object, of which the distribution is hidden from clients. In the concrete view the FO designer controls (if wished) the distribution of data and function and of the communication between fragments. FO programming is supported by the FOG language, an extension of C++, and by a toolbox of predefined FOs. The FOG compiler ensures distributed type safety of both the external and internal interfaces, verifies the encapsulation of FO instances, and automatically generates whatever coercions are necess...
The Architectural Design of Globe: A Wide-Area Distributed System
- 1997
Cited by 69 (8 self)
Developing large-scale wide-area applications requires an infrastructure that is presently lacking entirely. Currently, applications have to be built on top of raw communication services, such as TCP connections. All additional services, including those for naming, replication, migration, persistence, fault tolerance, and security, have to be implemented for each application anew. Not only is this a waste of effort, it also makes interoperability between different applications difficult or even impossible. We present a novel, object-based framework for developing wide-area distributed applications. The framework is based on the concept of a distributed shared object, which has the characteristic feature that its state can be physically distributed across multiple machines at the same time. All implementation aspects, including communication protocols, replication strategies, and distribution and migration of state, are part of an object and are hidden behind its interface. The curren...
Efficient Distributed Shared Memory Based On Multi-Protocol Release Consistency
- 1994
Cited by 68 (5 self)
A distributed shared memory (DSM) system allows shared memory parallel programs to be executed on distributed memory multiprocessors. The challenge in building a DSM system is to achieve good performance over a wide range of shared memory programs without requiring extensive modifications to the source code. The performance challenge translates into reducing the amount of communication performed by the DSM system to that performed by an equivalent message passing program. This thesis describes four novel techniques for reducing the communication overhead of DSM, including: (i) the use of software release consistency, (ii) support for multiple consistency protocols, (iii) a multiple writer protocol, and (iv) an update timeout mechanism. Release consistency allows modifications of shared data to be handled via a delayed update queue, which masks network latencies. Providing multiple cons...
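The abstract names a multiple writer protocol without giving its details. A common way to let several nodes write to the same page at once is to keep an unmodified copy (a "twin") of the page, compute a diff against it later, and merge only the changed words. The sketch below illustrates that general technique under the assumption that the writers touch disjoint words; the helper names `make_twin`, `diff`, and `apply_diff` are hypothetical, and pages are modelled as plain Python lists.

```python
def make_twin(page):
    # Snapshot of the page taken before the first local write.
    return list(page)


def diff(twin, page):
    # Map of index -> new value for every word that changed since the twin.
    return {i: new for i, (old, new) in enumerate(zip(twin, page)) if old != new}


def apply_diff(page, changes):
    # Merge another writer's changes without touching unrelated words.
    for index, value in changes.items():
        page[index] = value


home = [0, 0, 0, 0]                # master copy of the shared page

a = list(home)                     # writer A's local copy and its twin
twin_a = make_twin(a)
b = list(home)                     # writer B's local copy and its twin
twin_b = make_twin(b)

a[0] = 7                           # A and B modify different words
b[3] = 9                           # of the same page concurrently

diff_a = diff(twin_a, a)
diff_b = diff(twin_b, b)
apply_diff(home, diff_a)
apply_diff(home, diff_b)           # both updates survive the merge
```

Because only changed words travel, two writers that share a page but not data do not overwrite each other, which limits the cost of false sharing.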
Performance Evaluation of the Orca Shared Object System
- ACM Transactions on Computer Systems, 1998
Cited by 61 (42 self)
Orca is a portable, object-based distributed shared memory system. This paper studies and evaluates the design choices made in the Orca system and compares Orca with other DSMs. The paper gives a quantitative analysis of Orca's coherence protocol (based on write-updates with function shipping), the totally-ordered group communication protocol, the strategy for object placement, and the all-software, user-space architecture. Performance measurements for ten parallel applications illustrate the tradeoffs made in the design of Orca, and also show that essentially the right design decisions have been made. A write-update protocol with function shipping is effective for Orca, especially since it is used in combination with techniques that avoid replicating objects that have a low read/write ratio. The overhead of totally-ordered group communication on application performance is low. The Orca system is able to make near-optimal decisions for object placement and replication. In addition, the...
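To make "write-updates with function shipping" concrete, here is a minimal sketch under stated assumptions. It is not Orca's runtime: the `Replica` class, the `add` operation, and `broadcast_in_order` are invented for illustration, and total ordering is simulated by a single loop that hands every replica the same operations in the same sequence. The point is that the operation and its arguments are sent to each replica and executed there, rather than sending the resulting state.

```python
class Replica:
    """Hypothetical replica of a shared object holding one counter."""

    def __init__(self):
        self.state = {"count": 0}

    def apply(self, op_name, *args):
        # Look up the shipped operation by name and run it locally.
        getattr(self, "_op_" + op_name)(*args)

    def _op_add(self, amount):
        self.state["count"] += amount


def broadcast_in_order(replicas, operations):
    # Stand-in for totally-ordered group communication: every replica
    # receives every operation, and all receive them in the same order.
    for operation in operations:
        for replica in replicas:
            replica.apply(*operation)


replicas = [Replica() for _ in range(3)]
broadcast_in_order(replicas, [("add", 2), ("add", 5)])
counts = [replica.state["count"] for replica in replicas]
```

Since every replica applies the same operations in the same order, all copies end in the same state, and reads can then be served locally.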