Results 11–20 of 57
A Combining Mechanism for Parallel Computers
In Proceedings of the First Heinz Nixdorf Symposium, 1992
Cited by 22 (0 self)

Abstract
In a multiprocessor computer, communication among the components may be based either on a simple router, which delivers messages point-to-point like a mail service, or on a more elaborate combining network that, in return for a greater investment in hardware, can combine messages to the same address prior to delivery. This paper describes a mechanism for recirculating messages in a simple router so that it achieves the added functionality of a combining network, for arbitrary access patterns, with provable efficiency. The method brings together messages with the same destination address over several stages, at a set of components that is determined by a hash function and shrinks at each stage.
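The multi-stage combining idea in the abstract above can be illustrated with a minimal sketch. The function name and the use of random tables in place of proper universal hash functions are assumptions for illustration, not the paper's construction:

```python
import random
from collections import defaultdict

def combine_in_stages(messages, n_components, stages, seed=0):
    """Sketch: at each stage, messages are grouped at components chosen by a
    per-stage hash of their destination address; messages meeting at the same
    component with the same destination merge into one.  The candidate
    component set shrinks at each stage.  `messages` is a list of destination
    addresses."""
    rng = random.Random(seed)
    # Per-stage hash tables (random assignments stand in for universal hashing).
    tables = [{} for _ in range(stages)]
    live = list(messages)
    for s in range(stages):
        width = max(1, n_components >> s)        # component set halves per stage
        buckets = defaultdict(set)
        for dest in live:
            if dest not in tables[s]:
                tables[s][dest] = rng.randrange(width)
            buckets[(tables[s][dest], dest)].add(dest)
        # Same destination at the same component: combine into a single message.
        live = [dest for (_, dest) in buckets]
    return live

# 100 messages aimed at only 3 distinct addresses collapse to 3 combined messages.
out = combine_in_stages([i % 3 for i in range(100)], n_components=16, stages=4)
```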
Delayed path coupling and generating random permutations via distributed stochastic processes
1999
Cited by 18 (3 self)

Abstract
We analyze various stochastic processes for generating permutations almost uniformly at random in distributed and parallel systems. All our protocols are simple and elegant, and are based on performing disjoint transpositions executed in parallel. The challenging problem is to prove that the output configurations of these processes reach an almost uniform probability distribution very rapidly, i.e., in (low) polylogarithmic time. For the analysis of these protocols we develop a novel technique, called delayed path coupling, for proving rapid mixing of Markov chains. Our approach is an extension of the path coupling method of Bubley and Dyer. We apply delayed path coupling to three stochastic processes for generating random permutations. For one ...
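A process of the kind analyzed above, based on rounds of disjoint transpositions, can be sketched as follows. The round count, the pairing rule, and the 1/2 swap probability are illustrative assumptions, not the paper's exact protocol:

```python
import random

def permutation_by_disjoint_transpositions(n, rounds, seed=None):
    """Sketch: start from the identity permutation and, in each round, pair
    the positions into disjoint transpositions, each applied (swapped)
    independently with probability 1/2.  All swaps in a round are disjoint,
    so they could execute in parallel.  Assumes n is even."""
    rng = random.Random(seed)
    perm = list(range(n))
    for _ in range(rounds):
        idx = list(range(n))
        rng.shuffle(idx)                       # random perfect matching of positions
        for a, b in zip(idx[::2], idx[1::2]):
            if rng.random() < 0.5:             # each disjoint pair swaps independently
                perm[a], perm[b] = perm[b], perm[a]
    return perm

p = permutation_by_disjoint_transpositions(16, rounds=10, seed=42)
```

Since every round applies only transpositions, the output is always a valid permutation; the paper's contribution is proving how few rounds suffice for near-uniformity.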
Analysis of Practical Backoff Protocols for Contention Resolution with Multiple Servers
1995
Cited by 18 (4 self)

Abstract
Backoff protocols are probably the most widely used protocols for contention resolution in multiple-access channels. In this paper, we analyze the stochastic behavior of backoff protocols for contention resolution among a set of clients and servers, each server being a multiple-access channel that deals with contention like an Ethernet channel. We use the standard model in which each client generates requests for a given server according to a Bernoulli distribution with a specified mean. The client-server request rate of a system is the maximum, over all client-server pairs (i, j), of the sum of all request rates associated with either client i or server j. (Having a sub-unit client-server request rate is a necessary condition for stability for single-server systems.) Our main result is that any superlinear polynomial backoff protocol is stable for any multiple-server system with a sub-unit client-server request rate. Our result is the first proof of stability for any backoff protocol fo...
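A toy single-server simulation may help fix the terms above. The waiting-window rule (uniform in [1, (f+1)^alpha] after f failures, with alpha > 1 giving superlinear polynomial backoff) and all names here are illustrative assumptions, not the paper's model in full:

```python
import random

def polynomial_backoff_sim(n_clients, alpha, steps, p_request, seed=0):
    """Toy simulation: clients get Bernoulli(p_request) request arrivals; a
    step succeeds only if exactly one ready client transmits; on a collision
    each colliding client with f consecutive failures waits a uniform number
    of steps in [1, (f + 1)**alpha].  Returns the number of successes."""
    rng = random.Random(seed)
    failures = [0] * n_clients     # consecutive failures of the queued request
    wait = [0] * n_clients         # steps until the next attempt (0 = ready)
    queued = [False] * n_clients
    successes = 0
    for _ in range(steps):
        for c in range(n_clients):
            if not queued[c] and rng.random() < p_request:
                queued[c] = True                   # new request arrives
        senders = [c for c in range(n_clients) if queued[c] and wait[c] == 0]
        if len(senders) == 1:                      # exactly one sender: success
            c = senders[0]
            queued[c], failures[c] = False, 0
            successes += 1
        else:
            for c in senders:                      # collision: back off
                failures[c] += 1
                wait[c] = rng.randint(1, max(1, int((failures[c] + 1) ** alpha)))
        for c in range(n_clients):
            if queued[c] and wait[c] > 0:
                wait[c] -= 1
    return successes

done = polynomial_backoff_sim(n_clients=8, alpha=2.0, steps=2000, p_request=0.02)
```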
Integer Sorting and Routing in Arrays with Reconfigurable Optical Buses
In Proceedings of the International Conference on Parallel Processing, 1996
Contention Resolution in Hashing Based Shared Memory Simulations
2000
Cited by 11 (3 self)

Abstract
In this paper we study the problem of simulating shared memory on the distributed memory machine (DMM). Our approach uses multiple copies of shared memory cells, distributed among the memory modules of the DMM via universal hashing. The main aim is to design strategies that resolve contention at the memory modules. Extending results and methods from random graphs and very fast randomized algorithms, we present new simulation techniques that improve the previously best results exponentially. In particular, we show that an n-processor CRCW PRAM can be simulated by an n-processor DMM with delay O(log log log n · log* n), with high probability. Next we describe a general technique that turns these simulations into time-processor optimal ones when EREW PRAMs are simulated. We obtain a time-processor optimal simulation of an (n log log log n · log* n)-processor EREW PRAM on an n-processor DMM with delay O(log log log n · log* n), with high probability. When an (n log log log n · log* n)-processor CRCW PRAM is simulated, the delay is larger by only a log* n factor. We further demonstrate that the simulations presented cannot be significantly improved using our techniques: we show an Ω(log log log n / log log log log n) lower bound on the expected delay for a class of PRAM simulations, called topological simulations, that covers all previously known simulations as well as those presented in this paper.
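The core idea of hashing multiple copies of each cell across modules can be sketched minimally. The class name, the majority-with-timestamps rule, and the use of random placement in place of universal hash functions are assumptions for illustration; the paper's access strategies are far more refined:

```python
import random

class HashedSharedMemory:
    """Sketch of the copies-plus-hashing idea: each shared cell is replicated
    in `copies` memory modules chosen pseudo-randomly (standing in for
    universal hashing).  A write timestamps and updates a majority of the
    copies; a read returns the newest value among a majority, so any minority
    of contended modules can be skipped."""
    def __init__(self, n_modules, copies=3, seed=0):
        self.rng = random.Random(seed)
        self.copies = copies
        self.modules = [dict() for _ in range(n_modules)]
        self.place = {}                    # cell -> its list of home modules
        self.time = 0

    def _homes(self, cell):
        if cell not in self.place:
            self.place[cell] = self.rng.sample(range(len(self.modules)),
                                               self.copies)
        return self.place[cell]

    def write(self, cell, value):
        self.time += 1
        majority = self.copies // 2 + 1
        for m in self._homes(cell)[:majority]:   # updating a majority suffices
            self.modules[m][cell] = (self.time, value)

    def read(self, cell):
        majority = self.copies // 2 + 1
        seen = [self.modules[m].get(cell) for m in self._homes(cell)[:majority]]
        # Any two majorities intersect, so the newest timestamp is always seen.
        return max((s for s in seen if s is not None), default=(0, None))[1]

mem = HashedSharedMemory(n_modules=8)
mem.write("x", 1)
mem.write("x", 2)
```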
Efficient Communication Using Total-Exchange
Cited by 10 (0 self)

Abstract
... programs using a high-level, general-purpose, and architecture-independent programming language and have them executed on a variety of parallel and distributed architectures without sacrificing efficiency. A large body of research suggests that, at least in theory, general-purpose parallel computing is indeed possible provided certain conditions are met: an excess of logical parallelism in the program, and the ability of the target architecture to efficiently realize balanced communication patterns. The canonical example of a balanced communication pattern is an h-relation, in which each processor is the origin and destination of at most h messages. A plethora of protocols has been designed for routing h-relations in a variety of networks. The goal has been to minimize the value of h while guaranteeing delivery of the messages within time a constant factor from optimal. In this paper we describe protocols that meet the most stringent efficiency requirement, namely delivery of messages within time that is a lower-order additive term from the best achievable. Such protocols are called 1-optimal. While these protocols achieve 1-optimality only for heavily loaded networks, that is, for large values of h, they are remarkable for their simplicity in that they only use the total-exchange communication primitive. The total-exchange can be realized in many networks using very simple, contention-free, and extremely efficient schemes. The technical contribution of this paper is a protocol to route random h-relations in an N-processor network using h/N (1 + o(1)) + O(log log N) total-exchange rounds with high probability. Using message duplication, we can improve the bound to h/N (1 + o(1)) + O(log N). This improves upon the h/N (1 + o(1)) + O(log N) bound of Gerbessiotis and Valiant. While our theoretical improvements are modest, our experimental results show an improvement over the protocol of Gerbessiotis and Valiant.
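The constraint that drives the round counts above is that one total-exchange round can carry at most one message per ordered (source, destination) pair. A minimal scheduling sketch of that constraint, with an assumed function name and a naive greedy schedule rather than the paper's randomized protocol:

```python
from collections import defaultdict

def total_exchange_schedule(msgs):
    """Sketch: schedule an h-relation as a sequence of total-exchange rounds.
    `msgs` is a list of (src, dst) pairs.  In each round every ordered
    (src, dst) pair carries at most one message, which is exactly what one
    total-exchange step can deliver; the number of rounds equals the largest
    multiplicity of any single (src, dst) pair."""
    pending = defaultdict(list)
    for s, d in msgs:
        pending[(s, d)].append((s, d))
    rounds = []
    while any(pending.values()):
        rnd = [pending[key].pop() for key in list(pending) if pending[key]]
        rounds.append(rnd)
    return rounds

# Two copies of (0 -> 1) cannot share a round, so three messages need two rounds.
rounds = total_exchange_schedule([(0, 1), (0, 1), (1, 0)])
```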
Fast Deterministic Simulation of Computations on Faulty Parallel Machines
In Proc. of the 3rd Ann. European Symp. on Algorithms, Springer-Verlag LNCS 979, 1995
Cited by 10 (4 self)

Abstract
A method of deterministic simulation of fully operational parallel machines on analogous machines prone to errors is developed. The simulation is presented for the exclusive-read exclusive-write (EREW) PRAM and the Optical Communication Parallel Computer (OCPC), but it applies to a large class of parallel computers. It is shown that simulations of operational multiprocessor machines on faulty ones can be performed with logarithmic slowdown in the worst case. More precisely, we prove that both a PRAM with a bounded fraction of faulty processors and memory cells and an OCPC with a bounded fraction of faulty processors can deterministically simulate their fault-free counterparts with O(log n) slowdown and preprocessing done in time O(log^2 n). The fault model is as follows. The faults are deterministic (worst-case distribution) and static (they do not change in the course of a computation). If a processor attempts to communicate with some other processor (in the case of an OCPC) or re...
Scalable Parallel Matrix Multiplication on Distributed Memory Parallel Computers
 Journal of Parallel and Distributed Computing
Cited by 8 (0 self)

Abstract
Consider any known sequential algorithm for matrix multiplication over an arbitrary ring with time complexity O(N^α), where 2 < α < 3. We show that such an algorithm can be parallelized on a distributed memory parallel computer (DMPC) in O(log N) time by using N^α / log N processors. Such a parallel computation is cost optimal and matches the performance of the PRAM. Furthermore, our parallelization on a DMPC can be made fully scalable, that is, for all 1 ≤ p ≤ N^α / log N, multiplying two N × N matrices can be performed by a DMPC with p processors in O(N^α / p) time; i.e., linear speedup and cost optimality can be achieved in the range [1 .. N^α / log N]. This unifies all known algorithms for matrix multiplication on the DMPC, standard or non-standard, sequential or parallel. Extensions of our methods and results to other parallel systems are also presented. The above claims represent significant progress in scalable parallel matrix multiplication (as well as the solution of many other important problems) on distributed memory systems, both theoretically and practically.
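The scalability claim above, work divided evenly over p processors for O(N^α / p) time, can be illustrated for the standard α = 3 algorithm. The function name and the row-strip partitioning are illustrative assumptions, not the paper's DMPC construction:

```python
from concurrent.futures import ThreadPoolExecutor

def matmul_block_rows(A, B, p):
    """Sketch: split the N x N product into p independent strips of rows, one
    per (simulated) processor, so the O(N^3) work of the standard algorithm
    divides into O(N^3 / p) per processor.  Strips write disjoint rows of C,
    so they can run concurrently without synchronization."""
    n = len(A)
    C = [[0] * n for _ in range(n)]

    def strip(lo, hi):
        for i in range(lo, hi):           # rows assigned to this processor
            for k in range(n):
                aik = A[i][k]
                for j in range(n):
                    C[i][j] += aik * B[k][j]

    bounds = [(r * n // p, (r + 1) * n // p) for r in range(p)]
    with ThreadPoolExecutor(max_workers=p) as pool:
        list(pool.map(lambda b: strip(*b), bounds))
    return C

# Multiplying by the identity returns the original matrix.
I2 = [[1, 0], [0, 1]]
M = [[1, 2], [3, 4]]
C = matmul_block_rows(M, I2, p=2)
```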
Hot-Potato Routing Algorithms for Sparse Optical Torus
2000
Cited by 6 (4 self)

Abstract
In this work we present an optical network architecture and deflection (or hot-potato) routing algorithms supporting efficient communication between n processor nodes. The network consists of an n × n torus, where processor nodes are situated diagonally and routing nodes are optical deflection nodes with two inputs and two outputs. A design of an optical deflection node is presented. The routing algorithms are variations of the greedy routing algorithm, and experiments and partial theoretical analyses show that they run efficiently on this architecture.
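Greedy deflection routing of the kind described above never buffers a packet: a packet that loses arbitration for its preferred link is deflected onto a free one. A toy simulation on an n × n torus, with all names, the link-arbitration rule, and the four-neighbour node model chosen for illustration (the paper's routing nodes have two inputs and two outputs):

```python
import random

def hot_potato_route(n, packets, max_steps=200, seed=0):
    """Toy deflection routing on an n x n torus: each step, every packet asks
    for a link that shortens its torus distance to its destination; when two
    packets at one node want the same outgoing link, one wins and the others
    are deflected onto free links (never buffered).  `packets` is a list of
    ((x, y), (x, y)) source/destination pairs; returns steps until all arrive."""
    rng = random.Random(seed)
    pos = {i: p[0] for i, p in enumerate(packets)}
    dst = {i: p[1] for i, p in enumerate(packets)}

    def step_toward(a, b):
        d = (b - a) % n                  # shorter way around the torus ring
        if d == 0:
            return 0
        return 1 if d <= n - d else -1

    for t in range(1, max_steps + 1):
        moves, links = {}, set()         # links claimed during this step
        order = list(pos)
        rng.shuffle(order)               # random arbitration order
        for i in order:
            (x, y), (tx, ty) = pos[i], dst[i]
            sx, sy = step_toward(x, tx), step_toward(y, ty)
            choices = []                 # greedy moves first, deflections after
            if sx:
                choices.append(((x + sx) % n, y))
            if sy:
                choices.append((x, (y + sy) % n))
            choices += [((x + 1) % n, y), ((x - 1) % n, y),
                        (x, (y + 1) % n), (x, (y - 1) % n)]
            for nxt in choices:
                if ((x, y), nxt) not in links:
                    links.add(((x, y), nxt))
                    moves[i] = nxt
                    break
        for i, nxt in moves.items():
            pos[i] = nxt
        for i in [j for j in pos if pos[j] == dst[j]]:
            del pos[i], dst[i]           # arrived packets leave the network
        if not pos:
            return t
    return max_steps

steps = hot_potato_route(4, [((0, 0), (2, 2)), ((1, 0), (0, 3))])
```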