55 citations found.
Anna R. Karlin and Eli Upfal. Parallel hashing: An efficient implementation of shared memory. Journal of the ACM, 35(4):876--892, October 1988.


This paper is cited in the following contexts:


Hashing and Rehashing in Emulated Shared Memory - Keller (1992)   (2 citations)

....class that restricts module congestion to O(log n) is $H = \{\, p(x) = (\sum_{i=0} a_i \cdot x^i \bmod P) \bmod m : 0 \le a_i < P \,\}$. P is a prime larger than m, O(log n). A function of H is obtained by randomly choosing the values for $a_i$. This class was used in several theoretical investigations [10, 13, 16] to emulate shared memory on a processor network. The module congestion of O(log n) is sufficient because access from processors to memory modules across a constant-degree interconnection network needs time $\Omega(\log n)$ anyway. However, the functions in H are not bijective. This means that ....

Anna R. Karlin and Eli Upfal. Parallel hashing: An efficient implementation of shared memory. Journal of the ACM, 35(4):876--892, October 1988.


Overview of Mesh Results - Sibeyn (1995)   (2 citations)

....on its performance. PRAM simulation is not limited to meshes, and there is a rich literature (see any of the papers mentioned for more references). An essential distinction exists between randomized algorithms, using the randomization to choose a hash function from a certain universal class [33], and deterministic algorithms, distributing copies of every item in order to minimize the access time for all batches of requests [70]. For meshes there are only a few results. Leppanen and Penttonen report simulation results on randomized PRAM simulation [59]. The conclusion from this work is ....

Karlin, A.R., E. Upfal, `Parallel Hashing - an Efficient Implementation of Shared Memory,' Proc. 18th Symposium on Theory of Computing, pp. 160--168, 1986.


Techniques for Shared Key Sorting - Cypher, Plaxton (1990)   (1 citation)

.... For the most recent improvements to randomized routing techniques, and for pointers to previous work in this area, the reader is referred to [10]. Another significant improvement in randomized routing algorithms has been the reduction in the number of random bits from O(n log n) to $O(\log^2 n)$ [7]. It is interesting to note that if a randomized routing algorithm requiring only O(log n) random bits could be devised, then Lemmas 4.1 and 4.2 would provide an on-line shared key sorting algorithm with the same (optimal) asymptotic performance as the non-constructive algorithms of Theorem 4.1. ....

A. R. Karlin and E. Upfal. Parallel hashing -- an efficient implementation of shared memory. In Proc. 18th Annual Symposium on Theory of Computing, pages 160--168, 1986.


Shared Memory Simulations with Triple-Logarithmic Delay.. - Czumaj, al. (1995)   (16 citations)

....of memory contention on the performance of parallel computers, several authors have investigated the simulation of shared memory machines on DMMs. Often the authors assumed that processors and modules are connected by a bounded degree network, and packet routing is used to access the modules [R91, L92a, L92b, U84, KU86]. In this paper we study DMMs with a complete interconnection between processors and modules. Simulations based on hashing distribute the shared memory cells U among the modules using one or more hash functions $h_i : U \to [n]$, $i \in [a]$; cell $u \in U$ is stored in the modules $M_{h_1(u)}, \ldots, M_{h_a}$ ....

A. Karlin and E. Upfal. Parallel hashing - an efficient implementation of shared memory. In Proceedings of the 18th Annual ACM Symposium on Theory of Computing, pages 160--168, 1986.


Implementing Shared Memory on Multi-Dimensional Meshes .. - Herley.. (1995)   (2 citations)

....an (n, m) PRAM on an n-leaf pruned butterfly with worst-case slowdown $O(\sqrt{n} \log(m/n))$, using $O(\log(m/n))$ copies per variable and $O((m/n) \log^3(m/n))$ storage per node. Lower bounds on the slowdown of PRAM simulations on bounded degree networks have been presented in a number of studies [AHMP87, KU88, HB94]. All such bounds, however, apply to the entire class of bounded degree networks, among which there are networks with low diameter and high bandwidth. For example, in [HB94] the authors show an $\Omega(\log^2(m/n) / \log\log(m/n))$ lower bound, which is too weak for our purposes, since a ....

....scheme devised for the mesh to the pruned butterfly, thereby obtaining the result stated in Theorem 2. 5 Lower Bound. In this section, we develop a novel lower bound for deterministic PRAM simulation on processor networks. The lower bound argument is similar in spirit to the one used in [AHMP87, KU88, HB94], but embodies a number of critical enhancements that make it sensitive to the bandwidth characteristics of the interconnection, whereas previous bounds were significant only for highly expanding networks. The bound relies on the notion of decomposition tree [BL84, Lei85] which provides a ....

A.R. Karlin and E. Upfal. Parallel hashing: An efficient implementation of shared memory. Journal of the ACM, 35(4):876--892, Oct 1988.


Improved Optimal Shared Memory Simulations, and the.. - Czumaj, der Heide..

....of memory contention on the performance of parallel computers, several authors have investigated the simulation of shared memory machines on DMMs. Often the authors assumed that processors and modules are connected by a bounded degree network, and packet routing is used to access the modules [R91, L92a, L92b, U84, KU86]. In this paper we focus on DMMs with a complete interconnection between processors and modules. Additionally we introduce a new model of the DMM, called the reconfigurable DMM, or abbreviated as RDMM. This model can be viewed as an ordinary DMM with the additional facility of combining links to ....

A. Karlin and E. Upfal. Parallel hashing - an efficient implementation of shared memory. In Proceedings of the 18th Annual ACM Symposium on Theory of Computing, pages 160--168, 1986.


Simple, Efficient Shared Memory Simulations (Extended.. - Dietzfelbinger, der Heide (1993)   (1 citation)

....accesses to cells stored in one module are sequentialized. Therefore many authors have investigated methods for simulating PRAMs on DMMs. Often it is assumed that processors and modules are connected by a bounded degree network, and packet routing is used to access the modules. See, e.g., [22, 10, 24, 20]; for a survey on packet routing see [15, 16]. In this paper we focus on DMMs with a complete interconnection between processors and modules. 1.1 Computation models. A parallel random access machine (PRAM) consists of processors $P_1, \ldots, P_m$ and a shared memory with cells $U = \{1, \ldots, p\}$, ....

A. Karlin and E. Upfal. Parallel hashing --- an efficient implementation of shared memory. In Proc. of the 18th Ann. ACM Symp. on Theory of Computing, pp. 160-- 168, 1986.


On Universal Classes of Extremely Random Constant Time Hash.. - Siegel

....algorithms that is equivalent to the use of fully random hash functions. For example, recent randomized routing schemes for size n Omega networks have been proven to give optimal expected performance (up to constant factors) given a random $\Omega(\log n)$-wise independent hash function ([5], [13]). The hash functions used to date have typically been polynomials of degree fi log n defined on finite fields. In particular, Carter and Wegman exhibited the universal classes of (h)-wise independent hash functions that map $[0, m-1] \mapsto [0, n-1]$: F (h) ff j f(x) X ....

....whence pairwise independence guarantees optimal expected performance. In the case of randomized routing on n-node bounded-degree graphs, the O(log n) cost for each memory reference hashed by a function from F (8 log n) is readily subsumed by the $\Omega(\log n)$ delay in routing the data ([5], [13]). Recently, O(log n)-wise independent hash functions have also been shown to give optimal expected probe performance for double hashing ([16]). But this efficiency is only in terms of ....

[Article contains additional citation context not shown here]

A. Karlin and E. Upfal. Parallel Hashing - an Efficient Implementation of Shared Memory, 18th Annual Symposium on Theory of Computing, May, 1986, pp. 160-- 168.


A Combining Mechanism for Parallel Computers - Leslie Valiant Aiken (1992)   (22 citations)

....be introduced also if the requests are transmitted bit serially. The algorithm can be adapted to models of parallel computation other than the simple router. One candidate is what is called the S PRAM in [15] that has been suggested as a model of various proposals for optical interconnects [1] [6]. Here at each cycle any component can transmit a message to any other, but only those receive messages that have just one targeted at them in that cycle. The senders find out immediately whether their transmission succeeded. Known general simulations of the BSP on the S PRAM with slack log p or ....

A. Karlin and E. Upfal. Parallel hashing -- an efficient implementation of shared memory. Proc. 18th ACM Symp. on Theory of Computing (1986) 160-168.


Constructive Deterministic PRAM Simulation on a.. - Pietracaprina, Pucci, .. (1993)   (7 citations)

....the time needed to simulate one PRAM step, one has to minimize both the memory contention, caused by requests addressed to the same module, and the congestion in the network, caused by the routing of the messages. Several randomized simulation schemes have been presented in the literature [MV84, KU88, LPP88, Ran91, Mey92, KLM92, DM93]. In all these schemes, the PRAM shared memory is distributed among the modules using one (or more) hash functions randomly drawn from a specific universal class [CW79]. One of the most significant results is Ranade's simulation of an n-processor PRAM step on an n-node Butterfly, in O(log n) time, ....

A.R. Karlin and E. Upfal. Parallel hashing: An efficient implementation of shared memory. J. ACM, 35(4):876--892, 1988.


Parallel Read Operations Without Memory Contention - Andreev, Clementi, Penna, Rolim (2000)

....connected to the memory modules through a switching network. This parallel model, commonly referred to as Distributed Memory Machine (DMM) or Module Parallel Computer, is considered more realistic than the PRAM model and it has been the subject of several studies in the literature [Karp et al. 96, Karlin et al. 86, Mehlhorn et al. 84, Pietracaprina et al. 93, Upfal 84]. In an EREW PRAM, each of the p processors can in fact access any of the N memory words, provided that a word is not accessed by more than one processor simultaneously. To ensure such connectivity, the total number of the switching elements ....

....processors with optimal expected delay time O(log log p log p) per step of simulation. In one (the best) version of their solution the memory contention is O(log log p) and each of the shared data is replicated in c 2 copies and mapped to p memory modules by means of c hash functions (see also [Karlin et al. 86]). It is important to remark that such hash functions have sufficiently good random properties (and, thus, they work correctly) only if the size of the domain of the mapped data is polynomial in p. This is one of the reasons for which their randomized simulation determines a partition of the ....

Karlin A. and Upfal E. (1986), Parallel hashing - an efficient implementation of shared memory. Proc. ACM STOC, 160-168.


The Complexity of Deterministic PRAM Simulation on.. - Pietracaprina, Pucci (1997)

....a simple modification of their proof, 4 it can be shown that any simulation with redundancy r and m polynomial in n requires Omega log (m=n) log log (m=n) min ( n r ; m n 1 r ) 1) slowdown on an n DMM. Lower bounds have been also developed for networks of bounded degree in [1, 7, 4, 5]. However, the techniques used to prove such bounds are of a slightly different nature, since bandwidth issues have to be taken into account. 1.2 New Results The goal of this paper is to study the complexity of deterministic (n; m) PRAM simulation on a p DMM, with p n, as a function of the ....

....loss of efficiency on a suitable bounded degree network. Therefore, Result 3 also holds for such sparser (and feasible) interconnection. 2 Lower Bound. A number of literature papers contain lower bounds on the slowdown of deterministic PRAM simulations on the DMM [17] or on bounded degree networks [1, 7, 4, 5]. All such bounds are expressed in terms of the parameters n and m, and implicitly optimize on the value of the redundancy. Since all deterministic simulation schemes developed until now use a fixed number r of copies per variable, we think useful to estimate the best possible slowdown ....

A.R. Karlin and E. Upfal. Parallel hashing: An efficient implementation of shared memory. Journal of the ACM, 35(4):876--892, October 1988.


Implementing Shared Memory on Mesh-Connected Computers .. - Herley.. (2000)   (1 citation)

....m) PRAM on an n-leaf pruned butterfly with worst-case slowdown $O(\sqrt{n} \log(m/n))$, using $O(\log(m/n))$ copies per variable and $O((m/n) \log^5(m/n))$ storage per node. Lower bounds on the slowdown of PRAM simulations on bounded degree networks have been presented in a number of studies [AHMP87, KU88, HB94]. All such bounds, however, apply to the entire class of such networks, and cannot be specialized to the characteristics of a given topology. For example, in [HB94] the authors show an $\Omega(\log^2(m/n) / \log\log(m/n))$ lower bound on the slowdown required to simulate a PRAM step on any ....

....$= O((m/n) \log^4(m/n))$ per node. Hence, the total storage requirement per node is $O((m/n) \log^5(m/n))$. 7 Lower Bound. In this section, we prove a lower bound on the worst-case slowdown incurred when simulating a PRAM step on a processor network. Unlike previous approaches [AHMP87, KU88, HB94], which do not account for the network topology, we obtain a bound that is based on the bandwidth characteristics of the simulating network. As a result, while previous lower bounds were significant only for very powerful networks such as expanders, our lower bound can be specialized, yielding ....

[Article contains additional citation context not shown here]

Karlin, A.R., and Upfal, E. (1988), Parallel hashing: An efficient implementation of shared memory, J. ACM, 35 (4), 876--892.


A Practical Constructive Scheme for Deterministic.. - Pietracaprina, Preparata (1993)   (2 citations)

....databases) and has received considerable attention in the literature. An early survey by [Kuc77] quotes fourteen works that deal with some special cases. More recently, it has become the main focus of the large body of work concerning the simulation of the PRAM on distributed memory machines [MV84, UW87, AHMP87, KU88, LPP88, Her89, LPP90, Her90, Ran91, Mey92, KLM92]. It is convenient to study this problem on a synchronous system where processors and memories are ideally thought of as being connected by a complete bipartite graph and each memory module is able to fulfill at most one access request (read/write) per time unit (Module Parallel Computer (MPC) ....

....the bipartite graph is simulated by a bounded degree network from the more difficult memory organization problem. For the latter, a number of randomized schemes have been successfully developed based on the use of universal classes of hash functions to distribute the variables among the modules [MV84, KU88, LPP88, Ran91, Mey92, KLM92]. Instead, the development of efficient deterministic schemes appears to be much ....

A.R. Karlin and E. Upfal. Parallel hashing: An efficient implementation of shared memory. J. ACM, 35(4):876--892, 1988.


Local Search in Physical Distribution Management - Kindervater, Savelsbergh (1992)

....writes into the same location. Although the PRAM is hardly a realistic computer model, the resulting algorithms can be adequately used for implementation on any real-world machine and the overhead introduced is only minimal (see for example Alt, Hagerup, Mehlhorn & Preparata [1987] and Karlin & Upfal [1988]). Before addressing the issue of local optimality, we will first consider an elementary problem and describe a basic technique in parallel computing for its solution. The algorithm consists of two phases. In some simple situations, only the first phase is needed. The problem is to find the ....

A.R. KARLIN,E.UPFAL (1988). Parallel hashing - an efficient implementation of shared memory. J. Assoc. Comput. Mach. 35, 876-892.


Efficient Deterministic and Probabilistic Simulations of PRAMs .. - Li, Pan, al. (2000)

.... switches [32]. For probabilistic simulation, it is well known that by allowing packet combining, each step of a p-processor PRAM can be simulated by a p-node butterfly network in O(log p) time with high probability [46]. Other deterministic and probabilistic simulation results are also reported in [1, 5, 18, 20, 22, 33, 34, 47, 48, 49]. A comprehensive survey can be found in [17]. Recently, there have been significant advances in optical interconnections. Fiber optic communication technologies offer a combination of gigabit transmission capacity, predictable message delay, low interference and error probability. Based on the ....

A. R. Karlin and E. Upfal. Parallel hashing---an efficient implementation of shared memory. Journal of the ACM, 35:876--892, 1988.


Horizons of Parallel Computation - Bilardi, Preparata (1993)   (28 citations)

....approach is represented by contentions, occurring either among requests/answers competing for the same transmission link or among memory references directed to the same memory node. A number of solutions to this problem have been proposed, based both on randomized and on deterministic algorithms [MV84, UW87, AHMP87, Ran87, KU88, HB88]. These schemes are collectively referred to as P-RAM simulations, and each of them is categorized on the basis of its slowdown, i.e. the number of network steps needed to simulate a P-RAM step (a global memory reference). Fortunately, the mesh interconnection lends itself to a P-RAM simulation ....

Karlin, A.,Upfal, E.: Parallel hashing - an efficient implementation of shared memory, Journal of the ACM, 35, 4(1988), 876-892


Efficient PRAM Simulation on a Distributed Memory Machine - Karp, Luby, der Heide (1994)   (72 citations)

....PRAM step which are mapped to the same module under h. In the case of a sparse interconnection network, the routing time, i.e. the time needed to route read and write requests from processors to memory modules, and to transmit the results of read requests back to the requesting processors. In [15] and [20] it is shown that, on a butterfly network, expected routing time $O(\log(n))$ can be achieved, which clearly is asymptotically optimal. The expected contention can be made as small as $O(\log(n) / \log\log(n))$ if log(n)-universal hash functions as introduced in [4] are used. These hash ....

A. Karlin and E. Upfal. Parallel hashing --- an efficient implementation of shared memory. In Proc. of the 18th Ann. ACM Symp. on Theory of Computing, pages 160-- 168, 1986.


Parallel Computational Geometry : An approach using randomization - Reif, Sen (1999)   (1 citation)

....will help us in understanding the latter algorithms that are built on them. The problem of packet routing involves routing a message from processor i to $\Pi(i)$ where $\Pi$ is a permutation function. There has been a long and rich history of routing algorithms for fixed connection networks (see [86, 59, 74, 54]) and these can be summarized as following. Lemma 6.4: There exists an algorithm for permutation routing on an n-node butterfly network that executes in O(log n) steps and uses only constant size queues to achieve this running time. A more general result has been proved by Maggs et al. [59] for ....

....assume that after each recursive call, the sub-networks (of varying sizes corresponding to different subroutine calls) are relabeled as if these were isolated networks. The V[w]'s are then defined accordingly. The time analysis for this procedure is carried out using a delay sequence argument [54] and it can be shown that this takes O(log n) time in a $BF_n$. We would need a generalization of the result of Theorem 2.7 where the process tree is modified in the following manner. Instead of all the subroutines from a node proceeding independently, all the subroutines for a fixed (constant) ....

A. Karlin and E. Upfal. Parallel hashing - an efficient implementation of shared memory. Proc. of the 18th ACM STOC, pages 160--168, 1986.


A General Purpose Shared-Memory Model For Parallel Computation - Ramachandran   (3 citations)

....of the bsp components, independent of the other memory locations, and independent of the qsm(g; d) algorithm. In practice one would distribute the shared memory across the bsp processors using a random hash function from a class of universal hash functions that can be evaluated quickly (see, e.g., [11, 37, 26]). Theorem 4.2: A p 0 processor qsm(g; d) algorithm that runs in time t 0 can be emulated on a p processor bsp in time t = t 0 Delta p 0 p w.h.p. provided p p 0 (L=g) g=d) lg p and t 0 is bounded by a polynomial in p. Proof. The emulation algorithm is quite simple. The ....

A. Karlin and E. Upfal. Parallel hashing -- An efficient implementation of shared memory. J. ACM, 35:4, pages 876--892, 1988.


A Comparison Of Data Layout Schemes For Multimedia Servers - Berenbrink, Lüling, Rottmann

....that Lemma 3.3 and Lemma 3.4 hold for any S, not only for a randomly chosen S, because the probability experiment is the random choice of the hash function. 3.3. Behavior of Polynomials as Hash Functions. Polynomials as hash functions have been studied and used in a great number of papers, e.g. [22, 23, 24, 25, 26, 27, 28, 29, 30]. In the following again p is prime and $|U| = p$. Definition 3.5: $H^n_d := \{\, h_a : U \to \{0, \ldots, n-1\};\ a = \{a_0, \ldots, a_{d-1}\} \in U^d \,\}$ and $h_a(x) = (\sum_{i=0}^{d-1} a_i \cdot x^i \bmod p) \bmod n$. The next lemma shows that a polynomial with degree $d-1$ ....

A. Karlin and E. Upfal. Parallel hashing --- an efficient implementation of shared memory. In Proc. of the 18th ACM STOC, pages 160--168, 1986.
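The polynomial hash class quoted in Definition 3.5 of the excerpt above can be illustrated with a short sketch. This is only an illustration under stated assumptions, not code from any of the cited papers: the function names (poly_hash, make_poly_hash, max_module_load) and the toy parameters are hypothetical, and the universe is taken to be the integers 0..p-1 for a prime p.

```python
import random


def poly_hash(coeffs, x, p, n):
    """Evaluate h_a(x) = ((sum_i a_i * x**i) mod p) mod n with Horner's rule.

    coeffs is [a_0, ..., a_{d-1}]; p is assumed prime; n is the number of modules.
    """
    acc = 0
    for a in reversed(coeffs):
        acc = (acc * x + a) % p
    return acc % n


def make_poly_hash(p, n, d, rng=random):
    """Draw one function from the class by picking d coefficients uniformly from 0..p-1."""
    coeffs = [rng.randrange(p) for _ in range(d)]
    return lambda x: poly_hash(coeffs, x, p, n)


def max_module_load(h, cells, n):
    """Count how many of the given cells land in the most heavily loaded module."""
    load = [0] * n
    for x in cells:
        load[h(x)] += 1
    return max(load)


if __name__ == "__main__":
    # Hypothetical toy sizes: 101 memory cells spread over 8 modules, degree-3 polynomial.
    h = make_poly_hash(p=101, n=8, d=4, rng=random.Random(1))
    print(max_module_load(h, range(101), 8))
```

The maximum load reported by max_module_load corresponds to what the excerpts call module congestion or memory contention for one batch of requests; larger d gives higher-degree polynomials at a higher evaluation cost.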


Simulating Shared Memory in Real Time: On the.. - Czumaj, der Heide.. (1995)   (4 citations)

....modules are connected via a routing interconnection network. In this paper we study DMMs with n processors and n modules. In an effort to understand the relative power of the PRAM compared with other parallel computation models several authors described simulations between them (Upfal, 1984; Karlin and Upfal, 1986; Wang and Chen, 1990; Ranade, 1991; Leighton, 1992a; Leighton, 1992b; Karp et al. 1993; Dietzfelbinger and Meyer auf der Heide, 1993; Meyer auf der Heide et al. 1995; Czumaj et al. 1995d). For example, it is known that the n-processor PRAM can be simulated (with high probability) with O(log n) ....

Karlin, A. and Upfal, E. (1986), "Parallel hashing - an efficient implementation of shared memory," in Proceedings of the 18th Annual ACM Symposium on Theory of Computing, pp. 160--168.


Parallel Algorithmic Techniques: PRAM Algorithms And PRAM.. - Czumaj (1995)

....of memory contention on the performance of parallel computers, several authors have investigated the simulation of shared memory machines on DMMs. Often the authors assumed that processors and modules are connected by a bounded degree network, and packet routing is used to access the modules (Karlin and Upfal, 1986; Leighton, 1992a; Leighton, 1992b; Ranade, 1991; Upfal, 1984). In this chapter we analyse the C DMM model, the DMM with a complete interconnection between processors and modules. This module may be viewed as a model which does not take into account the running time of packet routing and focus ....

Karlin, A. and Upfal, E. (1986), "Parallel hashing - an efficient implementation of shared memory," In Proceedings of the 18th Annual ACM Symposium on Theory of Computing, pages 160--168.


Fast Rehashing in PRAM Emulations - Keller (1993)   (2 citations)

....[8] is a widely used theoretical machine model for processors working synchronously on a shared memory, with unit memory access time. Many numerical and combinatorial parallel algorithms have been designed for the PRAM [4, 9, 11]. Much effort has been put in emulating PRAMs on processor networks [10, 14, 15]. We restrict to randomized solutions; we omit the deterministic solutions because they use special expander graphs for which no constructions are known today. A second approach for shared memory emulations uses caches to avoid using the network. An example is the DASH multiprocessor [12]. We do ....

A. R. Karlin and E. Upfal, Parallel hashing: An efficient implementation of shared memory, J. Assoc. Comput. Mach. 35 (1988) 876--892.


Trading Space for Time in Undirected s-t Connectivity - Broder, Karlin, Raghavan.. (1991)   (10 citations)  Self-citation (Karlin Upfal)

....total of $O(m^2 \log^4 n / p)$ time. Since this is also the total number of finds and lookups performed, this is the running time of each execution of the outermost loop. Note that this algorithm is easily parallelizable using p processors and O(p) space. The parallel hashing scheme described in [7] can be used to implement a parallel version of this algorithm that runs on p processors, $n^z \le p \le n^{1-z}$, $z > 0$, that are connected by a bounded degree network. Briefly, storing the leader set using parallel hashing allows for the p processors to execute parallel unions and parallel finds ....

A.R. Karlin and E. Upfal. Parallel hashing: An efficient implementation of shared memory. Journal of the ACM, 35(4):876--892, October 1988.

