Results 1-10 of 57
Direct Bulk-Synchronous Parallel Algorithms
 JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING
, 1992
Cited by 174 (27 self)
We describe a methodology for constructing parallel algorithms that are transportable among parallel computers having different numbers of processors, different bandwidths of interprocessor communication and different periodicity of global synchronisation. We do this for the bulk-synchronous parallel (BSP) model, which abstracts the characteristics of a parallel machine into three numerical parameters p, g, and L, corresponding to processors, bandwidth, and periodicity respectively. The model differentiates memory that is local to a processor from that which is not, but, for the sake of universality, does not differentiate network proximity. The advantages of this model in supporting shared memory or PRAM style programming have been treated elsewhere. Here we emphasise the viability of an alternative direct style of programming where, for the sake of efficiency, the programmer retains control of memory allocation. We show that optimality to within a multiplicative factor close to one ca...
Special Purpose Parallel Computing
 Lectures on Parallel Computation
, 1993
Cited by 82 (6 self)
A vast amount of work has been done in recent years on the design, analysis, implementation and verification of special purpose parallel computing systems. This paper presents a survey of various aspects of this work. A long, but by no means complete, bibliography is given. 1. Introduction Turing [365] demonstrated that, in principle, a single general purpose sequential machine could be designed which would be capable of efficiently performing any computation which could be performed by a special purpose sequential machine. The importance of this universality result for subsequent practical developments in computing cannot be overstated. It showed that, for a given computational problem, the additional efficiency advantages which could be gained by designing a special purpose sequential machine for that problem would not be great. Around 1944, von Neumann produced a proposal [66, 389] for a general purpose stored-program sequential computer which captured the fundamental principles of...
Doubly Logarithmic Communication Algorithms for Optical Communication Parallel Computers
 In Proceedings of the 5th Annual ACM Symposium on Parallel Algorithms and Architectures
, 1994
Cited by 41 (5 self)
In this paper we consider the problem of interprocessor communication on parallel computers that have optical communication networks. We consider the Completely Connected Optical Communication Parallel Computer (OCPC), which has a completely connected optical network, and also the Mesh of Optical Buses Parallel Computer (MOBPC), which has a mesh of optical buses as its communication network. The particular communication problem that we study is that of realizing an h-relation. In this problem, each processor has at most h messages to send and at most h messages to receive. It is clear that any 1-relation can be realized in one communication step on an OCPC. However, the best previously known p-processor OCPC algorithm for realizing an arbitrary h-relation for h > 1 requires Θ(h + log p) expected communication steps. (This algorithm is due to Valiant and is based on earlier work of Anderson and Miller.) Valiant's algorithm is optimal only for h = Ω(log p) and it is an op...
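The h-relation condition stated in this abstract (each processor sends at most h messages and receives at most h) can be illustrated with a minimal Python sketch. The function name relation_degree and the representation of messages as (source, destination) pairs are assumptions made for illustration only; they do not come from the paper.

```python
from collections import Counter


def relation_degree(messages):
    """Return the smallest h such that `messages` forms an h-relation.

    `messages` is an iterable of (source, destination) processor ids.
    This representation is hypothetical and chosen only for illustration.
    """
    pairs = list(messages)  # materialize so the input can be scanned twice
    sent = Counter(src for src, _ in pairs)
    received = Counter(dst for _, dst in pairs)
    # h is the larger of the maximum send count and the maximum receive count.
    return max(
        max(sent.values(), default=0),
        max(received.values(), default=0),
    )
```

For example, a cyclic shift such as [(0, 1), (1, 2), (2, 0)] has degree 1, which is the 1-relation case that the abstract says an OCPC realizes in one communication step.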
Can a Shared-Memory Model Serve as a Bridging Model for Parallel Computation?
, 1999
Cited by 40 (12 self)
There has been a great deal of interest recently in the development of general-purpose bridging models for parallel computation. Models such as the BSP and LogP have been proposed as more realistic alternatives to the widely used PRAM model. The BSP and LogP models imply a rather different style for designing algorithms when compared with the PRAM model. Indeed, while many consider data parallelism as a convenient style, and the shared-memory abstraction as an easy-to-use platform, the bandwidth limitations of current machines have diverted much attention to message-passing and distributed-memory models (such as the BSP and LogP) that account more properly for these limitations. In this paper we consider the question of whether a shared-memory model can serve as an effective bridging model for parallel computation. In particular, can a shared-memory model be as effective as, say, the BSP? As a candidate for a bridging model, we introduce the Queuing Shared-Memory (QSM) model, which accounts for limited communication bandwidth while still providing a simple shared-memory abstraction. We substantiate the ability of the QSM to serve as a bridging model by providing a simple work-preserving emulation of the QSM on both the BSP and a related model, the (d, x)-BSP. We present evidence that the features of the QSM are essential to its effectiveness as a bridging model. In addition, we describe scenarios
Sorting Selection and Routing on the Array with Reconfigurable Optical Buses
Cited by 34 (7 self)
In this paper we present efficient algorithms for sorting, selection and packet routing on the AROB (Array with Reconfigurable Optical Buses) model.
Parallel Tree Contraction Part 2: Further Applications
 SIAM JOURNAL ON COMPUTING
, 1991
Cited by 34 (3 self)
This paper applies the parallel tree contraction techniques developed in Miller and Reif's paper [Randomness and Computation, 5, S. Micali, ed., JAI Press, 1989, pp. 47-72] to a number of fundamental graph problems. The paper presents an time and processor, a 0-sided randomized algorithm for testing the isomorphism of trees, and an n) time, n-processor algorithm for maximal isomorphism and for common subexpression elimination. An time, n-processor algorithm for computing the canonical forms of trees and subtrees is given. An O(log n) time algorithm for computing the tree of 3-connected components of a graph, an n)time algorithm for computing an explicit planar embedding of a planar graph, and an n)time algorithm for computing a canonical form for a planar graph are also given. All these latter algorithms use only processors on a Parallel Random Access Machine (PRAM) model with concurrent writes and concurrent reads.
An optical simulation of shared memory
, 1994
Cited by 34 (3 self)
We present a work-optimal randomized algorithm for simulating a shared memory machine (PRAM) on an optical communication parallel computer (OCPC). The OCPC model is motivated by the potential of optical communication for parallel computation. The memory of an OCPC is divided into modules, one module per processor. Each memory module only services a request on a timestep if it receives exactly one memory request. Our algorithm simulates each step of an n lg lg n-processor EREW PRAM on an n-processor OCPC in O(lg lg n) expected delay. (The probability that the delay is longer than this is at most n^(-α) for any constant α.) The best previous simulation, due to Valiant, required Θ(lg n) expected delay.
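The module rule described in this abstract (a memory module services a request on a timestep only if it receives exactly one request) can be sketched as a single simulated timestep. This is a hedged illustration, not the paper's algorithm: the function name ocpc_step and the dictionary encoding of requests are hypothetical.

```python
from collections import defaultdict


def ocpc_step(requests):
    """Simulate one OCPC timestep under the exactly-one-request rule.

    `requests` maps a processor id to the memory module it addresses
    (a hypothetical encoding). Returns (served, blocked): two dicts in
    the same processor -> module form.
    """
    by_module = defaultdict(list)
    for proc, module in requests.items():
        by_module[module].append(proc)

    served, blocked = {}, {}
    for module, procs in by_module.items():
        # A module answers only when exactly one request arrives;
        # colliding requests all fail and would have to be retried.
        target = served if len(procs) == 1 else blocked
        for proc in procs:
            target[proc] = module
    return served, blocked
```

With requests {0: 2, 1: 2, 2: 0}, processors 0 and 1 collide on module 2 and are blocked, while processor 2 is served by module 0. How blocked requests are retried is the subject of the simulation algorithm itself and is not modelled here.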
On Contention Resolution Protocols and Associated Probabilistic Phenomena
 IN PROCEEDINGS OF THE 26TH ANNUAL ACM SYMPOSIUM ON THEORY OF COMPUTING
, 1994
Exploiting Storage Redundancy to Speed Up Randomized Shared Memory Simulations
 IN PROCEEDINGS OF THE 12TH ANNUAL SYMPOSIUM ON THEORETICAL ASPECTS OF COMPUTER SCIENCE
, 1996
Cited by 32 (9 self)
Assume that a set U of memory locations is distributed among n memory modules, using some number a of hash functions h_1, ..., h_a, randomly and independently drawn from a high performance universal class of hash functions. Thus each memory location has a copies. Consider the task of accessing b out of the a copies for each of given keys x_1, ..., x_n ∈ U, b < a. The paper presents and analyses a simple process executing the above task on distributed memory machines (DMMs) with n processors. Efficient implementations are presented, implying: a simulation of an n-processor PRAM on an n-processor optical crossbar DMM with delay O(log log n); a simulation as above on an arbitrary-DMM with delay O(log log n / log log log n); an implementation of a static dictionary on an arbitrary-DMM with parallel access time O(log n + log log n / log a), if a hash functions are used. In particular, an access time of O(log n) can be reached if (log n)^(1/log n) hash funct...
Contention Resolution with Constant Expected Delay
Cited by 29 (3 self)
We study the contention resolution problem in a multiple-access channel such as the Ethernet...