First-Class User-Level Threads
- In Proceedings of the Thirteenth ACM Symposium on Operating Systems Principles
, 1991
Cited by 114 (12 self)
It is often desirable, for reasons of clarity, portability, and efficiency, to write parallel programs in which the number of processes is independent of the number of available processors. Several modern operating systems support more than one process in an address space, but the overhead of creating and synchronizing kernel processes can be high. Many runtime environments implement lightweight processes (threads) in user space, but this approach usually results in second-class status for threads, making it difficult or impossible to perform scheduling operations at appropriate times (e.g., when the current thread blocks in the kernel). In addition, a lack of common assumptions may make it difficult for parallel programs or library routines that use dissimilar thread packages to communicate with each other or to synchronize access to shared data. We describe a set of kernel mechanisms and conventions designed to accord first-class status to user-level threads, allowing them to be used in any reasonable way that traditional kernel-provided processes can be used, while leaving the details of their implementation to user-level code. The key features of our approach are (1) shared memory for asynchronous communication between the kernel and the user, (2) software interrupts for events that might require action on the part of a user-level scheduler, and (3) a scheduler interface convention that facilitates interactions in user space between dissimilar kinds of threads. We have incorporated these mechanisms in the Psyche parallel operating system, and have used them to implement several different kinds of user-level threads. We argue for our approach in terms of both flexibility and performance.
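Point (3), the scheduler interface convention, is what lets a synchronization object written for one thread package block and wake threads from another. The sketch below is a minimal illustration under a stated assumption: every thread carries a reference to its own package's `block` and `unblock` routines. The names `SchedulerInterface`, `Thread`, and `Semaphore` are hypothetical and are not Psyche's actual interface.

```python
from collections import deque


class SchedulerInterface:
    """Hypothetical convention: each thread package exports block/unblock."""

    def __init__(self, name):
        self.name = name
        self.ready = deque()  # this package's private ready queue

    def block(self, thread):
        # The package alone decides what "blocked" means; here the thread
        # is simply kept off the ready queue.
        thread.state = "blocked"

    def unblock(self, thread):
        # Only the owning package touches its own ready queue.
        thread.state = "ready"
        self.ready.append(thread)


class Thread:
    def __init__(self, name, scheduler):
        self.name = name
        self.scheduler = scheduler  # reference to the owning package's routines
        self.state = "running"


class Semaphore:
    """A sync object that works with threads from any conforming package."""

    def __init__(self, count=0):
        self.count = count
        self.waiters = deque()

    def p(self, thread):
        if self.count > 0:
            self.count -= 1
            return True  # acquired without blocking
        self.waiters.append(thread)
        thread.scheduler.block(thread)  # call back into the thread's own package
        return False

    def v(self):
        if self.waiters:
            waiter = self.waiters.popleft()
            waiter.scheduler.unblock(waiter)  # hand the semaphore to the waiter
        else:
            self.count += 1


# Example: threads from two dissimilar packages share one semaphore.
fifo = SchedulerInterface("fifo")
lottery = SchedulerInterface("lottery")
a = Thread("a", fifo)
b = Thread("b", lottery)
sem = Semaphore(count=1)
b_got_it = sem.p(b)  # b acquires immediately
a_got_it = sem.p(a)  # a blocks via fifo.block
sem.v()              # b's release wakes a via fifo.unblock
```

The semaphore never inspects a thread's implementation; it only calls through the thread's scheduler reference, which is the property the convention is meant to guarantee.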
The Concurrent Language Shared Prolog
, 1991
Cited by 73 (14 self)
Shared Prolog is a new concurrent logic language. A Shared Prolog system is composed of a set of parallel agents which are Prolog programs extended by a guard mechanism. The programmer controls the granularity of parallelism by coordinating communication and synchronization of the agents through a centralized data structure. The communication mechanism is inherited from the blackboard model of problem solving. Intuitively, the granularity of the logic processes to be elaborated in parallel is large, while the resources shared on the blackboard can be very fine-grained. An operational semantics for Shared Prolog is given in terms of a distributed model. Through an abstract notion of computation, the kinds of parallelism supported by the language, as well as properties of infinite computations such as local deadlocks, are studied. The expressiveness of the language is shown with respect to the specification of two classes of applications: metaprogramming and blackboard systems.
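To make the blackboard style of coordination concrete, here is a deliberately simplified sketch in Python rather than Shared Prolog syntax. Assumptions: facts are plain tuples, an agent's guard is only a test for a fact with a given functor (no unification), and agents fire one at a time instead of in parallel. `Blackboard`, `Agent`, and `run` are hypothetical names.

```python
from collections import Counter


class Blackboard:
    """Shared multiset of ground facts, e.g. ("order", 3)."""

    def __init__(self, facts=()):
        self.facts = Counter(facts)

    def find(self, functor):
        # Non-destructive guard test: first fact whose functor matches.
        return next((f for f, c in self.facts.items()
                     if f[0] == functor and c > 0), None)

    def post(self, facts):
        self.facts.update(facts)


class Agent:
    def __init__(self, name, functor, body):
        self.name = name
        self.functor = functor  # guard: the agent is enabled only if such a fact exists
        self.body = body        # fact -> list of new facts to post


def run(board, agents, max_steps=100):
    trace = []
    for _ in range(max_steps):
        enabled = [a for a in agents if board.find(a.functor) is not None]
        if not enabled:
            break  # no guard is satisfied: the computation is quiescent
        agent = enabled[0]  # a real system would run enabled agents in parallel
        fact = board.find(agent.functor)
        board.facts -= Counter([fact])  # consume the guard fact
        board.post(agent.body(fact))
        trace.append(agent.name)
    return trace


# Example: two agents communicate only through the blackboard.
board = Blackboard([("order", 3)])
agents = [
    Agent("pricer", "order", lambda f: [("priced", f[1] * 10)]),
    Agent("shipper", "priced", lambda f: [("shipped", f[1])]),
]
trace = run(board, agents)
```

The agents never call each other; each one reacts only to what is on the shared structure, which is the coordination pattern the abstract attributes to the blackboard model.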
An Implementation of Distributed Shared Memory
- Software: Practice and Experience
, 1991
Cited by 54 (5 self)
Shared memory is a simple yet powerful paradigm for structuring systems. Recently, there has been an interest in extending this paradigm to non-shared memory architectures as well. For example, the virtual address spaces for all objects in a distributed object-based system could be viewed as constituting a global distributed shared memory. We propose a set of primitives for managing distributed shared memory. We present an implementation of these primitives in the context of an object-based operating system as well as on top of Unix.
The purpose of this paper is to present a set of mechanisms for DSM and an implementation of these mechanisms. All the resources of the system are viewed as potentially shared objects. The name space of these objects constitutes a distributed shared memory. The objects are composed of segments, where a segment is a logical entity with attributes such as read-only or read-write. There is a concept of ownership: the node where a segment is created (the owner node) is responsible for guaranteeing the consistency of the segment. The distributed shared memory controller (DSMC) to be described next is the entity that provides the mechanisms for managing these segments.
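The ownership rule (the creating node guarantees a segment's consistency) can be illustrated with a small model. The excerpt does not say which protocol the owner uses, so this sketch assumes a write-invalidate scheme, a common choice in DSM systems: the owner tracks which nodes hold cached copies and invalidates them before a write. `Segment` and `DSMController` are hypothetical names, and a single object stands in for the message traffic between nodes.

```python
class Segment:
    def __init__(self, seg_id, owner, data, mode="read-write"):
        self.seg_id = seg_id
        self.owner = owner        # node that created the segment
        self.data = list(data)    # authoritative copy kept by the owner
        self.mode = mode          # "read-only" or "read-write"
        self.copy_set = set()     # nodes currently holding a cached copy


class DSMController:
    """Hypothetical owner-side controller using write-invalidate."""

    def __init__(self):
        self.segments = {}
        self.caches = {}  # node -> {seg_id: cached list}

    def create(self, seg_id, owner, data, mode="read-write"):
        self.segments[seg_id] = Segment(seg_id, owner, data, mode)

    def read(self, node, seg_id, index):
        seg = self.segments[seg_id]
        cache = self.caches.setdefault(node, {})
        if seg_id not in cache:
            # Miss: fetch a copy from the owner and record the reader.
            cache[seg_id] = list(seg.data)
            seg.copy_set.add(node)
        return cache[seg_id][index]

    def write(self, node, seg_id, index, value):
        seg = self.segments[seg_id]
        if seg.mode == "read-only":
            raise PermissionError(f"segment {seg_id} is read-only")
        # The owner invalidates every other cached copy before the write.
        for other in seg.copy_set - {node}:
            self.caches[other].pop(seg_id, None)
        seg.copy_set = {node}
        seg.data[index] = value
        self.caches.setdefault(node, {})[seg_id] = list(seg.data)


# Example: two readers, then one writer invalidates the other reader.
dsm = DSMController()
dsm.create("s", owner="n0", data=[0, 0, 0])
dsm.read("n1", "s", 0)
dsm.read("n2", "s", 0)
dsm.write("n1", "s", 0, 7)
```

After the write, node `n2` no longer holds a copy, so its next read misses and fetches the new value from the owner.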
Design, Implementation, and Performance Evaluation of a Distributed Shared Memory Server for Mach
- In 1988 Winter USENIX Conference
, 1988
Cited by 40 (0 self)
This report describes the design, implementation and performance evaluation of a virtual shared memory server for the Mach operating system. The server provides unrestricted sharing of read-write memory between tasks running on either strongly coupled or loosely coupled architectures, and any mixture thereof. A number of memory coherency algorithms have been implemented and evaluated, including a new distributed algorithm that is shown to outperform centralized ones. Some of the features of the server include support for machines with multiple page sizes, for heterogeneous shared memory, and for fault tolerance. Extensive performance measures of applications are presented, and the intrinsic costs evaluated.
1. Introduction
Shared memory multiprocessors are becoming increasingly available, and with them a faster way to program applications and system services via the use of shared memory. Currently, the major limitation in using shared memory is that it is not extensible network-wi...
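The abstract contrasts a new distributed coherency algorithm with centralized ones but does not describe it. As a generic illustration of the difference, the sketch below swaps in a well-known distributed technique: the probable-owner scheme (the "dynamic distributed manager" of Li and Hudak), in which each node keeps only a hint about a page's owner instead of asking a central manager. This is not claimed to be the Mach server's algorithm; `Node`, `locate_owner`, and `transfer_ownership` are hypothetical names.

```python
class Node:
    def __init__(self, node_id, prob_owner):
        self.node_id = node_id
        self.prob_owner = prob_owner  # hint: node we believe owns the page


def locate_owner(nodes, start):
    """Follow probable-owner hints from `start`; return (owner, hops)."""
    path = []
    current = start
    while nodes[current].prob_owner != current:
        path.append(current)
        current = nodes[current].prob_owner
    # Path compression: every node on the path now points at the real owner.
    for node_id in path:
        nodes[node_id].prob_owner = current
    return current, len(path)


def transfer_ownership(nodes, requester):
    """Move page ownership to `requester` (e.g. on a write fault)."""
    owner, hops = locate_owner(nodes, requester)
    nodes[owner].prob_owner = requester      # old owner forwards future requests
    nodes[requester].prob_owner = requester  # requester is now the owner
    return hops


# Example: a chain of hints 0 -> 1 -> 2 -> 3, where node 3 owns the page.
nodes = [Node(0, 1), Node(1, 2), Node(2, 3), Node(3, 3)]
first_hops = transfer_ownership(nodes, 0)
```

No node is a bottleneck: a request costs a few forwarding hops, and path compression keeps later chains short.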
Translation lookaside buffer consistency: a software approach
- Proceedings of the Third International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS-III
, 1989
Cited by 22 (1 self)
We discuss the translation lookaside buffer (TLB) consistency problem for multiprocessors, and introduce the Mach shootdown algorithm for maintaining TLB consistency in software. This algorithm has been implemented on several multiprocessors, and is in regular production use. Performance evaluations establish the basic costs of the algorithm and show that it has minimal impact on application performance. As a result, TLB consistency does not pose an insurmountable obstacle to multiprocessors with several hundred processors. We also discuss hardware support options for TLB consistency ranging from a minor interrupt structure modification to complete hardware implementations. Features are identified in current hardware that compound the TLB consistency problem; removal or correction of these features can simplify and/or reduce the overhead of maintaining TLB consistency in software.
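The ordering at the heart of a software TLB shootdown can be modeled with threads. This is a simplified sketch, not Mach's kernel code: Python `threading.Event` objects stand in for interprocessor interrupts, acknowledgements, and the pmap lock, and each processor's TLB is a dictionary. The key steps are that the initiator interrupts every processor using the address space, waits until all of them have stalled, changes the page table, and only then releases them to flush the stale entry.

```python
import threading


class Processor:
    def __init__(self, cpu_id):
        self.cpu_id = cpu_id
        self.tlb = {}                       # cached vpage -> frame translations
        self.interrupt = threading.Event()  # stands in for an interprocessor interrupt
        self.ack = threading.Event()        # "I have stopped using this pmap"


def responder(cpu, pmap_unlocked, vpage):
    cpu.interrupt.wait()      # receive the shootdown interrupt
    cpu.ack.set()             # acknowledge and leave the active set
    pmap_unlocked.wait()      # stall until the initiator finishes the update
    cpu.tlb.pop(vpage, None)  # flush the stale translation, then resume


def shootdown(initiator, others, page_table, vpage, new_frame):
    pmap_unlocked = threading.Event()  # cleared means the initiator holds the pmap lock
    threads = [threading.Thread(target=responder, args=(cpu, pmap_unlocked, vpage))
               for cpu in others]
    for t in threads:
        t.start()
    for cpu in others:
        cpu.interrupt.set()   # interrupt every processor using this pmap
    for cpu in others:
        cpu.ack.wait()        # wait until every responder has stalled
    page_table[vpage] = new_frame   # now safe to change the mapping
    initiator.tlb.pop(vpage, None)  # flush the initiator's own TLB
    pmap_unlocked.set()             # release the responders
    for t in threads:
        t.join()


# Example: four processors cache the same translation; CPU 0 remaps the page.
cpus = [Processor(i) for i in range(4)]
page_table = {0x10: 5}
for cpu in cpus:
    cpu.tlb[0x10] = 5
shootdown(cpus[0], cpus[1:], page_table, 0x10, 9)
```

Because responders cannot touch the page table between acknowledging and being released, none of them can reload the old translation after it has been flushed.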
Virtual Shared Memory: A Survey of Techniques and Systems
, 1992
Cited by 13 (1 self)
Shared memory abstraction on distributed memory hardware has recently become very popular. The abstraction can be provided at various levels of the architecture (e.g., hardware or software), employing special mechanisms to maintain coherence of data. In this paper we present a survey of basic techniques and review a large number of architectures that provide such an abstraction. We also propose new terminology that is more consistent and orderly than the existing terminology for such architectures.
1 Introduction
Virtual Shared Memory (VSM) in its most general sense refers to the provision of a shared address space on distributed memory hardware. Such architectures contain no physically shared memory. Instead, the distributed local memories collectively provide a virtual address space shared by all the processors. VSM combines the ease of programming found in shared-memory multiprocessors with the scalability of message-passing multiprocessors. The implemen...
Experiments with a Task Partitioning Model for Heterogeneous Computing
, 1992
Cited by 12 (4 self)
One potentially promising approach for exploiting the best features of a variety of different computer architectures is to partition an application program to execute simultaneously on two or more different machines interconnected with a high-speed network. A fundamental problem with this form of heterogeneous computing, however, is the difficulty of partitioning an application program across the machines. This paper presents a partitioning strategy that relates the relative performance of two heterogeneous machines to the communication cost of transferring partial results across their interconnection network. Experiments are described that use this strategy to partition two different application programs across the sequential front-end processor of a Connection Machine CM-200 and its parallel back-end array.
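The trade-off the strategy captures (a faster machine is only worth using if the speedup outweighs the cost of shipping partial results) can be shown with a small dynamic program. This is an illustrative model, not the paper's strategy: it assumes the program is a chain of phases, each with a known running time on the front end and on the back end, plus a fixed transfer cost whenever consecutive phases run on different machines. `partition_chain` is a hypothetical name.

```python
def partition_chain(front_times, back_times, transfer_costs):
    """Assign each phase of a sequential chain to "front" or "back".

    front_times[i], back_times[i]: time of phase i on each machine.
    transfer_costs[i]: cost of moving phase i's partial results to the
    other machine before phase i + 1 (one entry per phase boundary).
    Returns (total_time, assignment).
    """
    machines = ("front", "back")
    times = {"front": front_times, "back": back_times}
    # best[m] = (cost of the best schedule so far ending on m, its assignment)
    best = {m: (times[m][0], [m]) for m in machines}
    for i in range(1, len(front_times)):
        new_best = {}
        for m in machines:
            other = "back" if m == "front" else "front"
            stay_cost, stay_plan = best[m]
            switch_cost, switch_plan = best[other]
            switch_cost += transfer_costs[i - 1]  # ship partial results across
            if stay_cost <= switch_cost:
                cost, plan = stay_cost, stay_plan
            else:
                cost, plan = switch_cost, switch_plan
            new_best[m] = (cost + times[m][i], plan + [m])
        best = new_best
    return min(best.values(), key=lambda entry: entry[0])


# Example: the middle phase is much faster on the parallel back end.
cheap = partition_chain([1, 10, 1], [5, 2, 5], [3, 3])
costly = partition_chain([1, 10, 1], [5, 2, 5], [20, 20])
```

With cheap transfers the best plan moves only the middle phase to the back end (total 10); when transfers cost 20, splitting no longer pays and the best total is 12 on a single machine.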
Multi-level Shared State for Distributed Systems
- In Proc. of the 2002 Intl. Conf. on Parallel Processing
, 2002
Cited by 12 (8 self)
As a result of advances in processor and network speeds, more and more applications can productively be spread across geographically distributed machines. In this paper we present a transparent system for memory sharing, InterWeave, developed with such applications in mind. InterWeave can accommodate hardware coherence and consistency within multiprocessors (level-1 sharing), software distributed shared memory (S-DSM) within tightly coupled clusters (level-2 sharing), and version-based coherence and consistency across the Internet (level-3 sharing). InterWeave allows processes written in multiple languages, running on heterogeneous machines, to share arbitrary typed data structures as if they resided in local memory. Application-specific knowledge of minimal coherence requirements is used to minimize communication. Consistency information is maintained in a manner that allows scaling to large amounts of shared data. In C, operations on shared data, including pointers, take precisely the same form as operations on non-shared data. We demonstrate the ease of use and efficiency of the system through an evaluation of several applications. In particular, we demonstrate that InterWeave's support for sharing at higher (more distributed) levels does not reduce the performance of sharing at lower (more tightly coupled) levels.
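The abstract says InterWeave exploits application-specific knowledge of minimal coherence requirements, together with version-based coherence, to reduce communication. The sketch below assumes one such requirement, a hypothetical temporal bound: a cached copy may be used without contacting the server if it was last validated within `max_age` time units, and when the client does ask, the server replies with data only if its version has changed. `Server`, `Client`, and `fetch_if_newer` are illustrative names, not InterWeave's API; the clock is passed in explicitly to keep the example deterministic.

```python
from dataclasses import dataclass


@dataclass
class CachedSegment:
    version: int
    validated_at: float
    data: dict


class Server:
    def __init__(self):
        self.version = 0
        self.data = {}
        self.requests = 0  # counts round trips, to show communication saved

    def write(self, key, value):
        self.data[key] = value
        self.version += 1  # every committed write creates a new version

    def fetch_if_newer(self, version):
        self.requests += 1
        if version == self.version:
            return None  # client is current: nothing to send
        return self.version, dict(self.data)


class Client:
    def __init__(self, server, max_age):
        self.server = server
        self.max_age = max_age  # hypothetical application-chosen staleness bound
        self.cache = None

    def read(self, key, now):
        c = self.cache
        if c is not None and now - c.validated_at <= self.max_age:
            return c.data.get(key)  # recent enough: no communication
        reply = self.server.fetch_if_newer(c.version if c is not None else -1)
        if reply is None:
            c.validated_at = now  # same version: just refresh the timestamp
        else:
            version, data = reply
            self.cache = CachedSegment(version, now, data)
        return self.cache.data.get(key)


# Example: the client tolerates data up to 5 time units old.
server = Server()
server.write("x", 1)
client = Client(server, max_age=5.0)
r1 = client.read("x", now=0.0)   # first read fetches version 1
server.write("x", 2)
r2 = client.read("x", now=3.0)   # within the bound: stale value, no request
r3 = client.read("x", now=10.0)  # bound exceeded: fetches version 2
```

A looser bound trades freshness for fewer round trips, which is the kind of application-specific choice the abstract describes.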
Gilgamesh: A Multithreaded Processor-In-Memory Architecture for Petaflops Computing
, 2002
Cited by 10 (2 self)
Processor-in-Memory (PIM) architectures avoid the von Neumann bottleneck in conventional machines by integrating high-density DRAM and CMOS logic on the same chip. Parallel systems based on this new technology are expected to provide higher scalability, adaptability, robustness, fault tolerance and lower power consumption than current MPPs or commodity clusters. In this paper we describe the design of Gilgamesh, a PIM-based massively parallel architecture, and elements of its execution model. Gilgamesh extends existing PIM capabilities by incorporating advanced mechanisms for virtualizing tasks and data and providing adaptive resource management for load balancing and latency tolerance. The Gilgamesh execution model is based on macroservers, a middleware layer which supports object-based runtime management of data and threads allowing explicit and dynamic control of locality and load balancing.
Efficient Distributed Shared State for Heterogeneous Machine Architectures
- In Proc. of the 23rd Intl. Conf. on Distributed Computing Systems
, 2003
Cited by 9 (3 self)
InterWeave is a distributed middleware system that supports the sharing of strongly typed, pointer-rich data structures across heterogeneous platforms. As a complement to RPC-based systems such as CORBA, .NET, and Java RMI, InterWeave allows processes to access shared data using ordinary reads and writes. To economize on network bandwidth, InterWeave caches data locally, and employs two-way diffing to maintain coherence and consistency, transmitting only the portions of the data that have changed. Experience indicates that InterWeave-style sharing facilitates the rapid development of distributed applications, and enhances performance through transparent caching of state that would traditionally be obtained via explicit callbacks or overly conservative deep-copy parameter passing. In this paper...
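The idea of transmitting only the changed portions of shared data can be sketched with word-level diffing against a saved copy. This is a generic illustration under stated assumptions: a data block is modeled as a flat Python list, the sender keeps a "twin" (a snapshot taken before modification), and a diff is a list of `(start, new_values)` runs. InterWeave's real diffs use a machine-independent wire format for typed, pointer-rich data, which is not reproduced here; `make_diff` and `apply_diff` are hypothetical names.

```python
def make_diff(twin, current):
    """Return runs (start, new_values) where `current` differs from `twin`."""
    if len(twin) != len(current):
        raise ValueError("twin and current block must have the same length")
    runs = []
    i, n = 0, len(current)
    while i < n:
        if current[i] != twin[i]:
            start = i
            while i < n and current[i] != twin[i]:
                i += 1
            runs.append((start, current[start:i]))  # one contiguous changed run
        else:
            i += 1
    return runs


def apply_diff(target, runs):
    """Patch `target` in place with the runs produced by make_diff."""
    for start, values in runs:
        target[start:start + len(values)] = values


# Client-to-server direction: the client modified a cached block.
twin = [0] * 8
client_block = list(twin)
client_block[2], client_block[3], client_block[7] = 5, 6, 1
upload = make_diff(twin, client_block)
server_block = [0] * 8
apply_diff(server_block, upload)

# Server-to-client direction: another writer changed the server's copy.
old_server = list(server_block)
server_block[0] = 9
download = make_diff(old_server, server_block)
apply_diff(client_block, download)
```

In both directions only the changed runs cross the network, which is why diffing pays off when updates touch a small part of a large structure.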

