A practical hierarchical model of parallel computation: the model
, 1991
Cited by 37 (5 self)
We introduce a model of parallel computation that retains the ideal properties of the PRAM by using it as a submodel, while simultaneously being more reflective of realistic parallel architectures by accounting for and providing abstract control over communication and synchronization costs. The Hierarchical PRAM (H-PRAM) model controls conceptual complexity in the face of asynchrony in two ways. First, it provides the simplifying assumption of synchronization for the design of algorithms while allowing the algorithms to work asynchronously with each other, and it organizes this "control asynchrony" via an implicit hierarchy relation. Second, it allows the restriction of "communication asynchrony" in order to obtain determinate algorithms (thus greatly simplifying proofs of correctness). It is shown that the model is reflective of a variety of existing and proposed parallel architectures, particularly ones that can support massive parallelism. Relationships to programming
An optical simulation of shared memory
, 1994
Cited by 34 (3 self)
We present a work-optimal randomized algorithm for simulating a shared memory machine (PRAM) on an optical communication parallel computer (OCPC). The OCPC model is motivated by the potential of optical communication for parallel computation. The memory of an OCPC is divided into modules, one module per processor. Each memory module services a request on a timestep only if it receives exactly one memory request. Our algorithm simulates each step of an (n lg lg n)-processor EREW PRAM on an n-processor OCPC in O(lg lg n) expected delay. (The probability that the delay is longer than this is at most n^(-α) for any constant α.) The best previous simulation, due to Valiant, required Θ(lg n) expected delay.
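The servicing rule stated in this abstract (a module answers only when it receives exactly one request in a timestep) can be illustrated with a minimal sketch. This is not the paper's simulation algorithm; the function name ocpc_step and the dictionary encoding of requests are assumptions made for illustration only.

```python
from collections import defaultdict


def ocpc_step(requests):
    """Simulate one OCPC timestep (illustrative sketch, hypothetical API).

    `requests` maps a processor id to the memory module it addresses.
    Returns the set of processors whose request is serviced: a module
    services a request only if it receives exactly one in this step.
    """
    by_module = defaultdict(list)
    for proc, module in requests.items():
        by_module[module].append(proc)
    # Modules with two or more incoming requests service none of them.
    return {procs[0] for procs in by_module.values() if len(procs) == 1}


if __name__ == "__main__":
    # Processors 0 and 1 collide on module 2; processors 2 and 3 do not collide.
    print(sorted(ocpc_step({0: 2, 1: 2, 2: 0, 3: 1})))
```

Processors whose requests collide have to retry in a later step, which is where the delay of a simulation on this model comes from.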
The Complexity of Computation on the Parallel Random Access Machine
, 1993
Cited by 34 (3 self)
PRAMs also approximate the situation where communication to and from shared memory is much more expensive than local operations, for example, where each processor is located on a separate chip and access to shared memory is through a combining network. Not surprisingly, abstract PRAMs can be much more powerful than restricted instruction set PRAMs. THEOREM 21.16 Any function of n variables can be computed by an abstract EROW PRAM in O(log n) steps using n/log_2 n processors and n/(2 log_2 n) shared memory cells. PROOF Each processor begins by reading log_2 n input values and combining them into one large value. The information known by the processors is then combined in a binary-tree-like fashion. In each round, the remaining processors are grouped into pairs. In each pair, one processor communicates the information it knows about the input to the other processor and then leaves the computation. After ⌈log_2 n⌉ rounds, one processor knows all n input values. Then this processor computes th...
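The pairing argument in the proof excerpt can be sketched sequentially. The code below is only an illustration under stated assumptions: Python lists stand in for the "one large value" each processor holds, the function name combine_inputs is hypothetical, and the exclusive-read and owner-write restrictions on shared memory are not modeled.

```python
import math


def combine_inputs(values):
    """Gather all inputs at one 'processor' by pairwise merging.

    Returns (gathered_values, number_of_pairing_rounds).
    Illustrative sketch only; not the construction from the source text.
    """
    n = len(values)
    if n == 0:
        return [], 0
    # Each processor starts by reading a block of about log2(n) inputs.
    block = max(1, int(math.log2(n))) if n > 1 else 1
    known = [values[i:i + block] for i in range(0, n, block)]
    rounds = 0
    while len(known) > 1:
        merged = []
        # In each pair, one processor hands over what it knows and leaves.
        for i in range(0, len(known) - 1, 2):
            merged.append(known[i] + known[i + 1])
        if len(known) % 2 == 1:
            merged.append(known[-1])  # an unpaired processor waits a round
        known = merged
        rounds += 1
    return known[0], rounds


if __name__ == "__main__":
    gathered, rounds = combine_inputs(list(range(16)))
    print(len(gathered), rounds)
```

Because the number of active processors roughly halves each round, the round count stays within the ⌈log_2 n⌉ bound quoted in the proof.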
On Contention Resolution Protocols and Associated Probabilistic Phenomena
 IN PROCEEDINGS OF THE 26TH ANNUAL ACM SYMPOSIUM ON THEORY OF COMPUTING
, 1994
Exploiting Storage Redundancy to Speed Up Randomized Shared Memory Simulations
 IN PROCEEDINGS OF THE 12TH ANNUAL SYMPOSIUM ON THEORETICAL ASPECTS OF COMPUTER SCIENCE
, 1996
Cited by 32 (9 self)
Assume that a set U of memory locations is distributed among n memory modules, using some number a of hash functions h_1, ..., h_a, randomly and independently drawn from a high performance universal class of hash functions. Thus each memory location has a copies. Consider the task of accessing b out of the a copies for each of given keys x_1, ..., x_n ∈ U, b < a. The paper presents and analyses a simple process executing the above task on distributed memory machines (DMMs) with n processors. Efficient implementations are presented, implying (i) a simulation of an n-processor PRAM on an n-processor optical crossbar DMM with delay O(log log n), (ii) a simulation as above on an arbitrary-DMM with delay O(log log n / log log log n), and (iii) an implementation of a static dictionary on an arbitrary-DMM with parallel access time O(log* n + log log n / log a), if a hash functions are used. In particular, an access time of O(log* n) can be reached if (log n)^(1/log* n) hash funct...
Reconfigurable distributed storage for dynamic networks
 In 9th International Conference on Principles of Distributed Systems (OPODIS)
, 2005
Cited by 29 (11 self)
This paper presents a new algorithm, RDS (Reconfigurable Distributed Storage), for implementing a reconfigurable distributed shared memory in an asynchronous dynamic network. The algorithm guarantees atomic consistency (linearizability) in all executions in the presence of arbitrary crash failures of processors and message loss and delays. The algorithm incorporates a quorum-based read/write algorithm and an optimized consensus protocol, based on Paxos. RDS achieves the design goals of: (i) allowing read and write operations to complete rapidly, and (ii) providing long-term fault tolerance through reconfiguration, a process that evolves the quorum configurations used by the read and write operations. The new algorithm improves on previously developed alternatives by using a more efficient reconfiguration protocol, thus guaranteeing better fault tolerance and faster recovery from network instability. This paper presents RDS, a formal proof of correctness, conditional performance analysis, and experimental results.
Exploiting Locality for Data Management in Systems of Limited Bandwidth
 IN PROCEEDINGS OF THE 38TH ANNUAL IEEE SYMPOSIUM ON FOUNDATIONS OF COMPUTER SCIENCE
, 1997
Cited by 27 (3 self)
This paper deals with data management in computer systems in which the computing nodes are connected by a relatively sparse network. We consider the problem of placing and accessing a set of shared objects that are read and written from the nodes in the network. These objects are, for example, global variables in a parallel program, pages or cache lines in a virtual shared memory system, shared files in a distributed file system, or pages in the World Wide Web. A data management strategy consists of a placement strategy that maps the objects (possibly dynamically and with redundancy) to the nodes, and an access strategy that describes how reads and writes are handled by the system (including the routing). We investigate static and dynamic data management strategies. In the static model, we assume that we are given an application for which the rates of read and write accesses for all node-object pairs are known. The goal is to calculate a static placement of the objects to the nodes in the ne...
Parallel Balanced Allocations
 IN PROCEEDINGS OF THE 8TH ANNUAL ACM SYMPOSIUM ON PARALLEL ALGORITHMS AND ARCHITECTURES
, 1996
Cited by 24 (1 self)
We study the well-known problem of throwing m balls into n bins. If each ball in the sequential game is allowed to select more than one bin, the maximum load of the bins can be exponentially reduced compared to the 'classical balls into bins' game. We consider a static and a dynamic variant of a randomized parallel allocation where each ball can choose a constant number of bins. All results hold with high probability. In the static case all m balls arrive at the same time. For m = n, we analyze a very simple optimal class of protocols achieving maximum load O((log n / log log n)^(1/r)) if r rounds of communication are allowed. This matches the lower bound of [ACMR95]. Furthermore, we generalize the protocols to the case of m > n balls. An optimal load of O(m/n) can be achieved using log log n / log(m/n) rounds of communication. Hence, for m = n log log n log log log n balls this slackness makes it possible to hide the amount of communication. In the 'classical balls into bins' game this op...
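The contrast between the classical game and the sequential game with more than one bin choice per ball can be tried out with a small experiment. The sketch below covers only the sequential greedy game, not the parallel protocols with r communication rounds studied in the paper; the function name throw_balls and its parameters are assumptions for illustration.

```python
import random


def throw_balls(m, n, choices, seed=0):
    """Throw m balls into n bins sequentially (illustrative sketch).

    Each ball samples `choices` bins uniformly at random (with
    replacement) and is placed in the least loaded of them.
    choices=1 gives the classical balls-into-bins game.
    """
    rng = random.Random(seed)  # seeded so that runs are reproducible
    loads = [0] * n
    for _ in range(m):
        candidates = [rng.randrange(n) for _ in range(choices)]
        target = min(candidates, key=lambda b: loads[b])
        loads[target] += 1
    return loads


if __name__ == "__main__":
    n = 10_000
    for d in (1, 2):
        print(d, max(throw_balls(n, n, d, seed=1)))
```

Running it for one and two choices with m = n gives a rough feel for how much the maximum load drops; the exact numbers depend on the seed.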
Shared Memory Simulations with Triple-Logarithmic Delay (Extended Abstract)
, 1995
Cited by 19 (4 self)
Artur Czumaj (1), Friedhelm Meyer auf der Heide (2), and Volker Stemann (1). (1) Heinz Nixdorf Institute, University of Paderborn, D-33095 Paderborn, Germany. (2) Heinz Nixdorf Institute and Department of Computer Science, University of Paderborn, D-33095 Paderborn, Germany. Abstract. We consider the problem of simulating a PRAM on a distributed memory machine (DMM). Our main result is a randomized algorithm that simulates each step of an n-processor CRCW PRAM on an n-processor DMM with O(log log log n · log* n) delay, with high probability. This is an exponential improvement on all previously known simulations. It can be extended to a simulation of an (n log log log n · log* n)-processor EREW PRAM on an n-processor DMM with optimal delay O(log log log n · log* n), with high probability. Finally, a lower bound of Ω(log log log n / log log log log n) expected time is proved for a large class of randomized simulations that includes all known simulations. 1 Introduction Para...
Approximation Algorithms for Data Management in Networks
 SPAA
, 2001
Cited by 19 (0 self)
This paper deals with static data management in computer systems connected by networks. A basic functionality in these systems is the interactive use of shared data objects that can be accessed from each computer in the system. Examples of these objects are files in distributed file systems, cache lines in virtual shared memory systems, or pages in the WWW. In the static scenario we are given read and write request frequencies for each computer-object pair. The goal is to calculate a placement of the objects to the memory modules, possibly with redundancy, such that a given cost function is minimized. With the widespread use of commercial networks such as the Internet, it is increasingly important to consider commercial factors within data management strategies. The goal in previous work was to utilize the available resources, especially the bandwidth, as well as possible. We present data management strategies for a model in which commercial costs rather than communication costs are minimized, i.e., we are given a metric communication cost function and a storage cost function. We introduce new deterministic algorithms for the static data management problem on trees and arbitrary networks. Our algorithms aim to minimize the total cost. To our knowledge this is the first analytic treatment of this problem, which is NP-hard on arbitrary networks. Our main result is a combinatorial algorithm that calculates a constant factor approximation for arbitrary networks in polynomial time. Further, we present an algorithm for trees that calculates an optimal placement of all objects in X on a tree T = (V, E) in time O(|X| · |V| · diam(T) · log(deg(T))).