Efficient Low-Contention Parallel Algorithms
the 1994 ACM Symp. on Parallel Algorithms and Architectures, 1994
Cited by 29 (11 self)
The queue-read, queue-write (QRQW) parallel random access machine (PRAM) model permits concurrent reading and writing to shared memory locations, but at a cost proportional to the number of readers/writers to any one memory location in a given step. The QRQW PRAM model reflects the contention properties of most commercially available parallel machines more accurately than either the well-studied CRCW PRAM or EREW PRAM models, and can be efficiently emulated with only logarithmic slowdown on hypercube-type non-combining networks. This paper describes fast, low-contention, work-optimal, randomized QRQW PRAM algorithms for the fundamental problems of load balancing, multiple compaction, generating a random permutation, parallel hashing, and distributive sorting. These logarithmic or sublogarithmic time algorithms considerably improve upon the best known EREW PRAM algorithms for these problems, while avoiding the high-contention steps typical of CRCW PRAM algorithms. An illustrative expe...
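The cost rule stated in this abstract can be illustrated with a small sketch. It assumes one step is given as a list of (processor, address) accesses and charges the step its maximum per-location contention. The function name `qrqw_step_cost`, the input format, the choice to count reads and writes together, and the unit charge for an empty step are all assumptions made for illustration, not details taken from the paper.

```python
from collections import Counter

def qrqw_step_cost(accesses):
    """Cost of one step: the largest number of processors touching one location.

    `accesses` is a list of (processor_id, address) pairs for a single step.
    Reads and writes are counted together here (a simplifying assumption).
    A step with no shared-memory access is charged 1 (assumed unit cost).
    """
    per_location = Counter(address for _, address in accesses)
    return max(per_location.values(), default=1)

# Four processors; three of them access address 7 in the same step.
step = [(0, 7), (1, 7), (2, 7), (3, 12)]
print(qrqw_step_cost(step))  # 3
```

For comparison, the same step would be disallowed on an EREW PRAM and would cost one unit on a CRCW PRAM, which is the gap the queued model is meant to fill.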
Optimal Deterministic Approximate Parallel Prefix Sums and Their Applications
In Proc. Israel Symp. on Theory and Computing Systems (ISTCS'95), 1995
Cited by 14 (0 self)
We show that extremely accurate approximation to the prefix sums of a sequence of n integers can be computed deterministically in O(log log n) time using O(n / log log n) processors in the Common CRCW PRAM model. This complements randomized approximation methods obtained recently by Goodrich, Matias and Vishkin and improves previous deterministic results obtained by Hagerup and Raman. Furthermore, our results completely match a lower bound obtained recently by Chaudhuri. Our results have many applications. Using them we improve upon the best known time bounds for deterministic approximate selection and for deterministic padded sorting.
1 Introduction
The computation of prefix sums is one of the most basic tools in the design of fast parallel algorithms (see Blelloch [9] and JáJá [33]). Prefix sums can be computed in O(log n) time and linear work in the EREW PRAM model (Ladner and Fischer [34]) and in O(log n / log log n) time and linear work in the Common CRCW PRAM model (Cole and Vishkin...
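The O(log n)-time, linear-work bound quoted above for exact prefix sums can be made concrete with a sequential simulation of a standard pairwise-contraction scheme. This is a sketch only: it is not the approximate algorithm of the paper and is not claimed to be the exact construction of Ladner and Fischer. The function name `prefix_sums` is illustrative.

```python
def prefix_sums(values):
    """Inclusive prefix sums via pairwise contraction.

    Each recursion level stands for a constant number of parallel rounds:
    all pair sums, and later all expansions, are independent of one another.
    This gives O(log n) rounds and O(n) additions in total.
    """
    n = len(values)
    if n <= 1:
        return list(values)
    # Contraction round: add adjacent pairs, halving the problem size.
    pairs = [values[i] + values[i + 1] for i in range(0, n - 1, 2)]
    # Solve the half-size problem on the pair sums.
    pair_prefix = prefix_sums(pairs)
    # Expansion round: odd positions copy a pair prefix; even positions
    # add their own value to the prefix that ends just before them.
    result = [0] * n
    for i in range(n):
        if i % 2 == 1:
            result[i] = pair_prefix[i // 2]
        elif i == 0:
            result[i] = values[0]
        else:
            result[i] = pair_prefix[i // 2 - 1] + values[i]
    return result

print(prefix_sums([3, 1, 4, 1, 5]))  # [3, 4, 8, 9, 14]
```

The work satisfies T(n) = T(n/2) + O(n), which is linear, while the recursion depth is logarithmic in n.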
Construction of 1-D Lower Envelopes and Applications
Cited by 12 (0 self)
We consider the problem of computing the lower envelope (the minimum) of n constant degree algebraic functions of one variable. The lower envelope has size O(nβ(n)) where β(n) is a nearly constant function, and it can easily be computed in time O(nβ(n) log n) by a simple deterministic divide-and-conquer algorithm [45]. We give an alternative simple (modulo a derandomization black box) approach using divide-and-conquer based on cuttings that results in a deterministic sequential algorithm that runs in the same time bound. This algorithm uses derandomization tools that are by now standard. This approach, however, allows us to obtain the following results:
- A deterministic sequential algorithm that is output sensitive and runs in time O(n log f) if f ≤ n^ε, or O(nβ(f) log f) = O(nβ(n) log n) otherwise, where f is the size of the output;
- a randomized parallel EREW algorithm that runs in time O(log n) and uses nearly optimal work O(nβ²(n) log n) with n-polynomial probability...
Structural Parallel Algorithmics
1991
Cited by 11 (4 self)
The first half of the paper is a general introduction which emphasizes the central role that the PRAM model of parallel computation plays in algorithmic studies for parallel computers. Some of the collective knowledge-base on non-numerical parallel algorithms can be characterized in a structural way. Each structure relates a few problems and techniques to one another, from the basic to the more involved. The second half of the paper provides a bird's-eye view of such structures for: (1) list, tree and graph parallel algorithms; (2) very fast deterministic parallel algorithms; and (3) very fast randomized parallel algorithms.
1 Introduction
Parallelism is a concern that is missing from "traditional" algorithmic design. Unfortunately, it turns out that most efficient serial algorithms become rather inefficient parallel algorithms. The experience is that the design of parallel algorithms requires new paradigms and techniques, offering an exciting intellectual challenge. We note that it had...
Efficient parallel algorithms for closest point problems
1994
Cited by 10 (1 self)
This dissertation develops and studies fast algorithms for solving closest point problems. Algorithms for such problems have applications in many areas including statistical classification, crystallography, data compression, and finite element analysis. In addition to a comprehensive empirical study of known sequential methods, I introduce new parallel algorithms for these problems that are both efficient and practical. I present a simple and flexible programming model for designing and analyzing parallel algorithms. Also, I describe fast parallel algorithms for nearest-neighbor searching and constructing Voronoi diagrams. Finally, I demonstrate that my algorithms actually obtain good performance on a wide variety of machine architectures. The key algorithmic ideas that I examine are exploiting spatial locality and random sampling. Spatial decomposition allows many concurrent threads to work independently of one another in local areas of a shared data structure. Random sampling provides a simple way to adaptively decompose irregular problems, and to balance workload among many threads. Used together, these techniques result in effective algorithms for a wide range of geometric problems. The key
Optimal Parallel Approximation Algorithms for Prefix Sums and Integer Sorting (Extended Abstract)
Cited by 8 (5 self)
Parallel prefix computation is perhaps the most frequently used subroutine in parallel algorithms today. Its time complexity on the CRCW PRAM is Θ(lg n / lg lg n) using a polynomial number of processors, even in a randomized setting. Nevertheless, there are a number of non-trivial applications that have been shown to be solvable using only an approximate version of the prefix sums problem. In this paper we resolve the issue of approximating parallel prefix by introducing an algorithm that runs in O(lg* n) time with very high probability, using n / lg* n processors, which is optimal in terms of both work and running time. Our approximate prefix sums are guaranteed to come within a factor of (1 + ε) of the values of the true sums in a "consistent fashion", where ε is o(1). We achieve this result through the use of a number of interesting new techniques, such as overcertification and estimate-focusing, as well ...
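The (1 + ε) guarantee described in this abstract can be expressed as a simple check against the exact sums. The sketch below assumes non-negative inputs and one-sided overestimation, that is, each approximate value lies between the true sum and (1 + ε) times the true sum. Both are assumptions of the sketch. The "consistent fashion" condition is not defined in the excerpt, so it is not checked. The name `within_factor` is illustrative.

```python
from itertools import accumulate

def within_factor(values, approx, eps):
    """True if every approx[i] lies in [S_i, (1 + eps) * S_i].

    S_i is the exact inclusive prefix sum of `values` (assumed non-negative).
    One-sided overestimation is an assumption of this sketch.
    """
    exact = list(accumulate(values))
    if len(approx) != len(exact):
        return False
    return all(s <= b <= (1 + eps) * s for s, b in zip(exact, approx))

values = [2, 0, 5, 3]  # exact prefix sums: 2, 2, 7, 10
print(within_factor(values, [2, 2, 7.5, 10.5], eps=0.1))
print(within_factor(values, [2, 2, 7, 12], eps=0.1))
```

The first call accepts estimates that overshoot by at most ten percent; the second rejects an estimate of 12 for a true sum of 10.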
The Random Adversary: A Lower-Bound Technique For Randomized Parallel Algorithms
in Proc. of the 3rd SODA (ACM, 1997
Cited by 5 (1 self)
The random-adversary technique is a general method for proving lower bounds on randomized parallel algorithms. The bounds apply to the number of communication steps, and they apply regardless of the processors' instruction sets, the lengths of messages, etc. This paper introduces the random-adversary technique and shows how it can be used to obtain lower bounds on randomized parallel algorithms for load balancing, compaction, padded sorting, and finding Hamiltonian cycles in random graphs. Using the random-adversary technique, we obtain the first lower bounds for randomized parallel algorithms which are provably faster than their deterministic counterparts (specifically, for load balancing and related problems).
Key words. parallel algorithms, parallel computation, PRAM model, randomized parallel algorithms, expected time, lower bounds, load balancing
AMS subject classifications. 68Q10, 68Q22, 68Q25
PII. ...
Approximate Parallel Prefix Computation and Its Applications
1993
Cited by 4 (1 self)
In this paper we address two fundamental problems in parallel algorithm design, parallel prefix sums and integer sorting, and show that both of them can be approximately solved very quickly on a randomized CRCW PRAM. In the case of prefix sums the approximation is in terms of the accuracy of the sums, and in the case of integer sorting it is in terms of allowing some gaps between consecutive elements in the ordered list. By introducing approximation in these ways we are able to solve these problems in o(lg lg n) time, and thus avoid the near-logarithmic lower bounds by Beame and Håstad that hold for the exact versions of these problems. Nevertheless, we demonstrate that these approximations are strong enough to be used as subroutines in fast randomized algorithms for some well-known problems in parallel computational geometry. Perhaps the most succinct way to describe the power of the new tools which are presented is by observing that prior to this work it was known how to solve the i...
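The relaxed notion of sorting mentioned in this abstract, an ordered list with gaps between consecutive elements, can be illustrated by a checker for such an output. The sketch assumes the output is an array in which `None` marks an empty slot. That representation and the name `is_padded_sort` are assumptions for illustration, and no bound on the number of gaps is enforced.

```python
from collections import Counter

def is_padded_sort(items, slots):
    """True if `slots` holds exactly `items` in non-decreasing order.

    `None` marks an empty slot (a gap); gaps may appear anywhere.
    Items themselves are assumed never to be `None`.
    """
    placed = [x for x in slots if x is not None]
    same_multiset = Counter(placed) == Counter(items)
    in_order = all(a <= b for a, b in zip(placed, placed[1:]))
    return same_multiset and in_order

items = [5, 1, 4, 1]
slots = [1, None, 1, 4, None, None, 5, None]
print(is_padded_sort(items, slots))  # True
```

Dropping the requirement that elements occupy consecutive positions is what lets an algorithm avoid computing exact ranks, which is where the lower bounds for exact sorting apply.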
ERCW PRAMs and Optical Communication
in Proceedings of the European Conference on Parallel Processing, EUROPAR ’96, 1996
Cited by 4 (2 self)
This paper presents algorithms and lower bounds for several fundamental problems on the Exclusive Read, Concurrent Write Parallel Random Access Machine (ERCW PRAM) and some results for unbounded fan-in, bounded fan-out (or 'BFO') circuits. Our results for these two models are of importance because of the close relationship of the ERCW model to the OCPC model, a model of parallel computing based on dynamically reconfigurable optical networks, and of BFO circuits to the OCPC model with limited dynamic reconfiguration ability.
Topics: Parallel Algorithms, Theory of Parallel and Distributed Computing.
This research was supported by Texas Advanced Research Projects Grant 003658480. (philmac@cs.utexas.edu)
This research was supported in part by Texas Advanced Research Projects Grants 003658480 and 003658386, and NSF Grant CCR 90-23059. (vlr@cs.utexas.edu)
1 Introduction
In this paper we develop algorithms and lower bounds for fundamental problems on the Exclusive Read Concurrent Wri...
Ultrafast Parallel Algorithms and Reconfigurable Meshes
Proc. of DARPA Software Technology Conference, 1992
Cited by 2 (0 self)
Introduction
This research is concerned with the development of very fast parallel algorithms, ones faster than those available through normal programming techniques or standard parallel computers. Algorithms have been developed for problems in geometry, graph theory, arithmetic, sorting, and image processing. The computing models that these algorithms have been developed for are concurrent read concurrent write parallel random access machines (CRCW PRAMs), and reconfigurable meshes (rmeshes, defined below). For CRCW PRAMs, our work has shown that by combining randomization with the use of some extra memory, one can solve some problems far faster than they can be solved if only randomization is used. We have developed ultrafast algorithms for several problems, where by ultrafast algorithm we mean a parallel algorithm with an input of size n which uses at most a linear number of processors and finishes in poly-log

