Computation Migration: Enhancing Locality for Distributed-Memory Parallel Systems
Cited by 47 (4 self)
We describe computation migration, a new technique that is based on compile-time program transformations, for accessing remote data in a distributed-memory parallel system. In contrast with RPC-style access, where the access is performed remotely, and with data migration, where the data is moved so that it is local, computation migration moves part of the current thread to the processor where the data resides. The access is performed at the remote processor, and the migrated thread portion continues to run on that same processor; this makes subsequent accesses in the thread portion local. We describe an implementation of computation migration that consists of two parts: an implementation that migrates single activation frames, and a high-level language annotation that allows a programmer to express when migration is desired. We performed experiments using two applications; these experiments demonstrate that computation migration is a valuable alternative to RPC and data migration.
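The contrast among the three access mechanisms can be made concrete with a toy message count. The sketch below is not taken from the paper: the function name, the trace format (a list of processor ids, one per access), and the cost model are assumptions chosen for illustration. Every message counts as one unit regardless of size, each processor holds a single datum, and a fetched datum stays cached with no invalidation.

```python
def count_messages(access_homes, start=0):
    """Count messages for a thread that starts on processor `start` and
    performs one access per entry of `access_homes` (the processor that
    holds the accessed datum). Hypothetical cost model, see lead-in."""
    # RPC-style access: every remote access is a request plus a reply.
    rpc = 2 * sum(1 for home in access_homes if home != start)

    # Data migration: fetch each remote datum once (request + data),
    # then treat later accesses to it as local (assumes no invalidation).
    fetched = set()
    data_migration = 0
    for home in access_homes:
        if home != start and home not in fetched:
            fetched.add(home)
            data_migration += 2

    # Computation migration: the thread portion moves to the data and
    # stays there, so only a change of home processor costs a message.
    current = start
    computation_migration = 0
    for home in access_homes:
        if home != current:
            computation_migration += 1
            current = home

    return {
        "rpc": rpc,
        "data_migration": data_migration,
        "computation_migration": computation_migration,
    }


if __name__ == "__main__":
    # Three accesses on processor 1, two on processor 2, one back on 0.
    print(count_messages([1, 1, 1, 2, 2, 0]))
```

The model ignores message size, which matters in practice: a migrated activation frame and a migrated datum are generally not the same size, so the counts only indicate the trend when consecutive accesses share a home processor.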
PRELUDE: A System for Portable Parallel Software
, 1991
Cited by 18 (8 self)
In this paper we describe PRELUDE, a programming language and accompanying system support for writing portable MIMD parallel programs. PRELUDE supports a methodology for designing and organizing parallel programs that makes them easier to tune for particular architectures and to port to new architectures. It builds on earlier work on Emerald, Amber, and various Fortran extensions to allow the programmer to divide programs into architecture-dependent and architecture-independent parts, and then to change the architecture-dependent parts to port the program to a new machine or to tune its performance on a single machine. The architecture-dependent parts of a program are specified by annotations that describe the mapping of a program onto a machine. PRELUDE provides a variety of mapping mechanisms similar to those in other systems, including remote procedure call, object migration, and data replication and partitioning. In addition, PRELUDE includes novel migration mechanisms for computations based on a form of continuation passing. The implementation of object migration in PRELUDE uses a novel approach based on fixup blocks that is more efficient than previous approaches, and amortizes the cost of each migration so that the cost per migration drops as the frequency of migrations increases.
Dynamic Computation Migration in DSM Systems
- In Supercomputing '96, Pittsburgh
, 1996
Cited by 8 (2 self)
Dynamic computation migration is the runtime choice between computation and data migration. Dynamic computation migration speeds up access to concurrent data structures with unpredictable read/write patterns. This paper describes the design, implementation, and evaluation of dynamic computation migration in a multithreaded distributed shared-memory system, MCRL. Two policies are studied, STATIC and REPEAT. Both migrate computation for writes. STATIC migrates data for reads, while REPEAT maintains a limited history of accesses and sometimes migrates computation for reads. On a concurrent, distributed B-tree with 50% lookups and 50% inserts, STATIC improves performance by about 17% on both Alewife and the CM-5. REPEAT generally performs better than STATIC. With 80% lookups and 20% inserts, REPEAT improves performance by 23% on Alewife, and by 46% on the CM-5.
Keywords: computation migration, data migration, replication, coherence
1 Introduction
Dynamic computation migration is the dyn...
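The difference between the two policies can be sketched as a pair of decision functions. Only the outline comes from the abstract: both policies migrate computation for writes, STATIC migrates data for reads, and REPEAT consults a limited access history for reads. The specific REPEAT rule below (migrate computation for a read when at least `write_threshold` of the last `history_len` accesses were writes) is a hypothetical stand-in, since the abstract does not state the actual heuristic, and all names are illustrative.

```python
from collections import deque


def static_policy(op):
    """STATIC as described in the abstract: computation migration for
    writes, data migration for reads."""
    return "computation" if op == "write" else "data"


class RepeatPolicy:
    """REPEAT-like policy for a single datum.

    ASSUMPTION: the threshold rule is invented for illustration; the
    abstract only says a limited history is kept and that computation
    is sometimes migrated for reads.
    """

    def __init__(self, history_len=4, write_threshold=2):
        self.history = deque(maxlen=history_len)  # most recent accesses
        self.write_threshold = write_threshold

    def choose(self, op):
        if op == "write":
            choice = "computation"
        else:
            recent_writes = sum(1 for past in self.history if past == "write")
            if recent_writes >= self.write_threshold:
                choice = "computation"
            else:
                choice = "data"
        self.history.append(op)
        return choice


if __name__ == "__main__":
    policy = RepeatPolicy()
    for op in ["read", "write", "write", "read", "read"]:
        print(op, static_policy(op), policy.choose(op))
```

Under this assumed rule, a read that follows a burst of writes is served by moving the computation, while reads in a read-dominated stretch fall back to data migration, which matches STATIC.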
Asynchronous Shared Memory Search Structures
- In Proc. 8th ACM Symp. on Parallel Algorithms and Architectures
, 1996
Cited by 6 (0 self)
We study the problem of storing an ordered set on an asynchronous shared memory parallel computer. We examine the case where we want to efficiently perform successor (least upper bound) queries on the set members that are stored. We also examine the case where processors insert and delete members of the set. Due to asynchrony, we require processors to perform queries and to maintain the structure independently. Although several such structures have been proposed, the analysis of these structures has been very limited. We here use the recently proposed QRQW PRAM model to provide upper and lower bounds on the performance of such data structures. In the asynchronous QRQW PRAM, the problem of processors concurrently and independently searching a shared data structure is very similar to the problem of routing packets through a network. Using this as a guide, we introduce the Search-Butterfly, a search structure that combines the efficient packet routing properties of the butterfly graph wit...
A Distributed, Replicated, Data-Balanced Search Structure
- International Journal of High Speed Computing
, 1994
Cited by 5 (1 self)
Many concurrent dictionary data structures have been proposed, but usually in the context of shared memory multiprocessors. In this paper, we present an algorithm for a concurrent distributed B-tree that can be implemented on message passing computer systems. Our distributed B-tree (the dB-tree) replicates the interior nodes in order to improve parallelism and reduce message passing. The dB-tree stores some redundant information in its nodes to permit the use of lazy updates to maintain replica coherency. We show how the dB-tree algorithm can be used to build an efficient implementation of a highly parallel, data-balanced distributed dictionary, the dE-tree.
Keywords: Concurrent dictionary data structures, Message passing multiprocessor systems, Balanced search trees, B-link trees, Replica coherency.
1. Introduction.
We introduce a new balanced search tree algorithm for distributed memory architectures. The search tree uses the B-link tree [27] as a base, and distributes ow...
Algorithms for Search Trees on Message-Passing Architectures
- In Proceedings of the 1991 International Conference on Parallel Processing, volume III
, 1991
Cited by 4 (1 self)
MIMD architecture; the algorithm is particularly well suited for implementation on a small number of processors. We introduce a (2^(B-2), 2^B) search tree that uses a linear array of O(log n) processors to store n entries. Update operations use a bottom-up node-splitting scheme, which performs better than top-down search tree algorithms. Additionally, for a given cost ratio of computation to communication the value of B may be varied to maximize performance. Implementations on a parallel-architecture simulator are described.
A Parallel File I/O API for Cilk
Cited by 3 (0 self)
Cheerio is an application programming interface (API) for efficient parallel file input and output (I/O) modeled closely after traditional serial POSIX I/O. Cheerio remedies a long-lived gap in multithreaded programming environments, which typically ignore file I/O and force programmers to implement parallel file I/O semantics on top of the raw operating system I/O API. Not only is this approach clumsy and semantically ugly, it can yield inefficient multithreaded programs.
Pipes: Linguistic Support for . . .
, 1992
... in multiprocessor systems. Pipes allow a sequence of remote invocations to be performed in order, but asynchronously with respect to the calling thread. Using pipes results in programs that are easier to understand and debug than those with explicit synchronization between asynchronous invocations. The semantics
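The ordering property described here (invocations run in the order issued, but asynchronously with respect to the caller) can be illustrated with a small single-machine analogue. This is a sketch under stated assumptions, not the paper's construct: the `Pipe` class, its `send` and `close` methods, and the use of a local worker thread in place of a remote processor are all hypothetical.

```python
import queue
import threading


class Pipe:
    """Runs submitted calls one at a time, in submission order, on a
    worker thread, so the caller never blocks on an individual call.
    Hypothetical local stand-in for a pipe of remote invocations."""

    def __init__(self):
        self._calls = queue.Queue()  # FIFO queue preserves issue order
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def send(self, fn, *args):
        """Enqueue an invocation and return immediately."""
        self._calls.put((fn, args))

    def _run(self):
        while True:
            item = self._calls.get()
            if item is None:  # sentinel placed by close()
                break
            fn, args = item
            fn(*args)

    def close(self):
        """Wait until every invocation sent so far has completed."""
        self._calls.put(None)
        self._worker.join()


if __name__ == "__main__":
    log = []
    pipe = Pipe()
    for i in range(5):
        pipe.send(log.append, i)  # caller continues without waiting
    pipe.close()
    print(log)
```

Because a single worker drains a FIFO queue, the caller needs no explicit synchronization between consecutive invocations; the only synchronization point in this sketch is `close`.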
Fast Serial-Append File I/O Mode Support for Cilk
Apart from resources, such as memory and processors, parallel computations also require file I/O. There exist a number of interesting file I/O modes, depending on the specific application. One particularly interesting file I/O mode is serial-append, described in [5]. In this mode the file output of a parallel computation, executed on any number of processors, is “the same” as the output of the sequential, single processor

