C.G. Plaxton. Efficient computation on sparse interconnection networks. Computer Science Tech. Rep. STAN-CS-89-1283, Stanford University, Stanford, CA, September 1989.
....single-step algorithms is an irregular communication scheme and difficulty with load balancing. The other group of sorting algorithms is the multi-step algorithms, which include bitonic sort [8], column sort [20], rotate sort [23], hyperquicksort [26], flashsort [27], B-flashsort [18], smoothsort [25], and Tridgell and Brent's sort [30]. Generally speaking, these algorithms accept multiple rounds of communication in return for better load balancing and, in some cases, regular communication. In this paper, we present a novel variation on the sample sort algorithm which addresses the limitations ....
C. G. Plaxton, "Efficient Computation on Sparse Interconnection Networks," Computer Science Tech. Rep. STAN-CS-89-1283, Stanford University, Stanford, CA, September 1989.
....single-step algorithms is an irregular communication scheme and difficulty with load balancing. The other group of sorting algorithms is the multi-step algorithms, which include bitonic sort [8], column sort [20], rotate sort [23], hyperquicksort [26], flashsort [27], B-flashsort [18], smoothsort [25], and Tridgell and Brent's sort [30]. Generally speaking, these algorithms accept multiple rounds of communication in return for better load balancing and, in some cases, regular communication. In this paper, we present a novel variation on the sample sort algorithm which addresses the limitations ....
C.G. Plaxton. Efficient computation on sparse interconnection networks. Computer Science Tech. Rep. STAN-CS-89-1283, Stanford University, Stanford, CA, September 1989.
....[4] as a subroutine. On coarse grain hypercubes, this problem can certainly be solved by sorting algorithms [1, 7] in $O((n/p) \log n)$ computation and communication time when n = O(p 1 ), but such solutions use $O(n \log n)$ total operations and have high communication complexity. Plaxton [11, 12] presents three coarse grain hypercube algorithms for the unweighted case; the best one [12] achieves a worst case time of $O((n/p) \log\log p + \log^2 p \log(n/p))$ (this we will refer to as Plaxton's algorithm). Plaxton's algorithm assumes $n = O(p^c)$ for some constant c 1. Plaxton also ....
....$(n/p) \log\log p + \log^2 p \log(n/p)$) (this we will refer to as Plaxton's algorithm). Plaxton's algorithm assumes $n = O(p^c)$ for some constant c 1. Plaxton also proves an $\Omega((n/p) \log\log p + \log p)$ lower bound on the local computation time for the selection problem on hypercubes [11, 13]. Hence the local computation time of [12] in general cannot be improved and linear speedup (computation-wise) is impossible for the selection problem on hypercubes unless p is a constant. The communication time of Plaxton's algorithm is $O(\log^2 p \log(n/p))$. Recently, Rajasekaran, Chen, and ....
C. G. Plaxton. Efficient Computations on Sparse Interconnection Networks. PhD thesis, Stanford University, Department of Computer Science, September 1989.
....associated with a node are arranged in a cycle so the network looks like a cube connected cycle. We also assume that each hypercube node can flip an n-sided unbiased coin in one unit of time. All the samplings performed in the algorithms of this paper are uniform. 1.2 Previous Results. Plaxton [8] has presented an algorithm for selection out of n elements that runs on a p-node sequential hypercube in time $O((n/p) \log\log p + (T_1 + T_2 \log p) \log(n/p))$, where $T_1$ is the time needed for sorting p keys (located one per processor) on a p-processor hypercube, and $T_2$ is the time needed for ....
....n elements that runs on a p-node sequential hypercube in time $O((n/p) \log\log p + (T_1 + T_2 \log p) \log(n/p))$, where $T_1$ is the time needed for sorting p keys (located one per processor) on a p-processor hypercube, and $T_2$ is the time needed for broadcasting and summing on a p-node hypercube. He [8] has also proved a lower bound of $\Omega((n/p) \log\log p + \log p)$ for selection. For $n \ge p \log^2 p$ the lower bound matches the upper bound (to within a multiplicative constant). The only operations allowed on the keys are copying and comparison (for both the upper bound and the lower ....
C.G. Plaxton, Efficient Computation on Sparse Interconnection Networks, Ph. D. Thesis, Department of Computer Science, Stanford University, 1989.
....time that is O(TC (n=p) log(n=p) where $\delta = 2/(\log 3 - 1)$, which is approximately 3.419. Using an algorithm they call cubesort, Cypher and Sanz [14] show how to improve the $T_C$ term in these bounds to be $O((25)^{\log^* n - \log^*(n/p)} (\log n / \log(n/p))^2)$, and Plaxton [39] shows how cubesort can be modified to achieve $T_C = O((\log n / \log(n/p))^2)$. Indeed, Plaxton 3 can modify the sharesort method of Cypher and Plaxton [15] to achieve $T_C = O((\log n / \log(n/p)) \log^2(\log n / \log(n/p)))$. Finally, Chvátal [10] describes an approach of Ajtai, Komlós, Paterson, ....
....method for constructing such a network is not included in Chvátal's report, however, for the method he describes is a non-uniform procedure based upon the probabilistic method. In addition, the constant factor in the running time appears to be fairly large. Incidentally, these latter methods [10, 15, 14, 30, 39] are actually defined for more restrictive BSP models where the data elements cannot be duplicated and each internal computation must be a sorting of the internal memory elements. The only previous sorting algorithms we are aware of that were designed with the BSP model in mind are recent methods ....
C. G. Plaxton. Efficient Computation on Sparse Interconnection Networks. PhD thesis, Department of Computer Science, Stanford University, 1989.
....single-step algorithms is an irregular communication scheme and difficulty with load balancing. The other group of sorting algorithms is the multi-step algorithms, which include bitonic sort [9], column sort [22], rotate sort [25], hyperquicksort [28], flashsort [29], B-flashsort [20], smoothsort [27], and Tridgell and Brent's sort [32]. Generally speaking, these algorithms accept multiple rounds of communication in return for better load balancing and, in some cases, regular communication. In this paper, we present a novel variation on the sample sort algorithm [19] which addresses the ....
C.G. Plaxton. Efficient Computation on Sparse Interconnection Networks. Technical Report STAN-CS-89-1283, Department of Computer Science, Stanford University, Stanford, CA, September 1989.
....sets, parallel graph partitioning and parallel construction of multidimensional binary search trees. Many parallel algorithms for selection have been designed for the PRAM model [2, 3, 4, 9, 14] and for various network models including trees, meshes, hypercubes and reconfigurable architectures [6, 7, 13, 16, 22]. More recently, Bader et al. [5] implement a parallel deterministic selection algorithm on several distributed memory machines including CM-5, IBM SP-2 and Intel Paragon. In this paper, we consider and evaluate parallel selection algorithms for coarse grained distributed memory parallel ....
C.G. Plaxton, Efficient computation on sparse interconnection networks, Ph.D. Thesis, Department of Computer Science, Stanford University, 1989.
....et al. [30] 1987 $O(n^{1/6} (\log n)^{2/3})$ DET. Olariu et al. 1992 $O(n^{1/6} (\log n)^{1/3})$ DET. Rajasekaran [50] 1992 $\tilde{O}(n^{1/6})$ RAND. Table 8: Selection on a Mesh with Fixed Buses
Model | Run Time | Lower Bound | Ref.
Sequential | $O((n/p) \log\log p + \log^2 p \log(n/p))$ | $(n/p) \log\log p + \log p$ | [46]
Sequential | $\tilde{O}((n/p) \log\log p + \log p \log\log p)$ | $(n/p) \log\log p + \log p$ | [48]
Sequential | $O(\log n \log^* n)$ | $\log n$ | [6]
Sequential | $\tilde{O}(\log n)$ | $\log n$ | [57, 48]
Weak Parallel | $O((n/p) + \log p \log\log p)$ | $(n/p) + \log p$ | [46]
Weak Parallel | $\tilde{O}((n/p) + \log p)$ | $(n/p) + \log p$ | [48]
Table 9: Selection on the ....
....Ref.
Sequential | $O((n/p) \log\log p + \log^2 p \log(n/p))$ | $(n/p) \log\log p + \log p$ | [46]
Sequential | $\tilde{O}((n/p) \log\log p + \log p \log\log p)$ | $(n/p) \log\log p + \log p$ | [48]
Sequential | $O(\log n \log^* n)$ | $\log n$ | [6]
Sequential | $\tilde{O}(\log n)$ | $\log n$ | [57, 48]
Weak Parallel | $O((n/p) + \log p \log\log p)$ | $(n/p) + \log p$ | [46]
Weak Parallel | $\tilde{O}((n/p) + \log p)$ | $(n/p) + \log p$ | [48]
Table 9: Selection on the Hypercube. [48]'s algorithm has been implemented on CM-2 and empirical results are promising [51]. The best known deterministic algorithm is due to Berthomé et al. [6] and has a run time of $O(\log n \log^* n)$. For the ....
C.G. Plaxton, Efficient Computation on Sparse Interconnection Networks, Ph. D. Thesis, Department of Computer Science, Stanford University, 1989.
....single-step algorithms is an irregular communication scheme and difficulty with load balancing. The other group of sorting algorithms is the multi-step algorithms, which include bitonic sort [9], column sort [23], rotate sort [26], hyperquicksort [29], flashsort [30], B-flashsort [19], smoothsort [28], and Tridgell and Brent's sort [33]. Generally speaking, these algorithms accept multiple rounds of communication in return for better load balancing and, in some cases, regular communication. In this paper, we present a novel variation on the sample sort algorithm which addresses the limitations ....
C.G. Plaxton. Efficient Computation on Sparse Interconnection Networks. Technical Report STAN-CS-89-1283, Department of Computer Science, Stanford University, Stanford, CA, September 1989.
....[Figure 3: Embedding an IL tree on a 3 × 3 network (legend: one hop, zero hop).] 3.2. Token Concentration. Token concentration is an important technique used for load balancing (see Leighton's algorithm in Chapter 4 of [13]). Before the operation of token concentration, each processor may create one token. Note that one processor has no more than one token at any time. The problem of token concentration is defined as follows (an example is illustrated in Figure 4). m Token Concentration: Assume that there are m ....
C.G. Plaxton. Efficient Computation on Sparse Interconnection Networks. PhD thesis, Dept of CS, Stanford University, September 1989.
....in practice than its deterministic counterpart due to the low constant associated with the algorithm. Many parallel algorithms for selection have been designed for the PRAM model [2, 3, 4, 9, 14] and for various network models including trees, meshes, hypercubes and reconfigurable architectures [6, 7, 13, 16, 21]. More recently, Bader et al. [5] implement a parallel deterministic selection algorithm on several distributed memory machines including CM-5, IBM SP-2 and Intel Paragon. In this paper, we consider and evaluate parallel selection algorithms for coarse grained distributed memory parallel ....
C.G. Plaxton, Efficient computation on sparse interconnection networks, Ph.D. Thesis, Department of Computer Science, Stanford University, 1989.
....Centre of Ontario. 1 A single-step algorithm will merge the portions in all the processors or partition the sublists and assign to all the processors in one step. Single step Multi step merge quickmerge [25] Batcher's bitonic sort [5] based PSRS [30, 21] parallel merge sort [9, 10] smoothsort [24] Nassimi and Sahni's sort [22] column sort [18] Tridgell and Brent's sort [32] snakesort [6] quicksort parallel quicksort [7, 20, 13] flashsort [27] based parallel sample sort [15, 8] B-flashsort [14] hyperquicksort [34, 25] Kalé and Krishnan sort [17] Table 1: Classification of parallel sorting ....
C. G. Plaxton. Efficient computation on sparse interconnection networks. Technical Report STAN-CS-89-1283, Stanford University, Department of Computer Science, Stanford, CA, September 1989.
....is work efficient (i.e. exhibits optimal speedup) because the processor-time product is equal to the time, O(n), of the fastest sequential algorithm for this problem. Cole [3] later found an $O(\lg n \lg^* n)$ time work efficient PRAM algorithm. For any p-processor hypercubic network, Plaxton [8] showed that selection from a set of n records requires $\Omega((n/p) \lg\lg p + \lg p)$ time in the worst case. The bound implies that a work efficient algorithm is not possible. This paper presents several algorithms for selecting the kth largest record from a set of n unordered records on ....
.... claim is easily verified by induction on i). For ffi 20, we can set i = blg ffic Gamma 1 to obtain T (ffi; 4) T (b3ffi=4c 1; dffi=4e 3) O(ffi lg ffi) O(ffi lg ffi) Hence T (d; 0) O(d lg d) O(lg n lg lg n). This algorithm is essentially equivalent to that described by Plaxton in [8]. Prior to this algorithm, the best bounds known for selection on the hypercube were given by sorting algorithms. 2.3 An $O(\lg n \lg^{(3)} n)$ algorithm. Throughout this subsection, we will refer to the O(lg n lg lg n) selection algorithm of Section 2.2 as the basic algorithm. Our present goal is ....
C. G. Plaxton. Efficient Computation on Sparse Interconnection Networks. PhD thesis, Department of Computer Science, Stanford University, September 1989.
....in this paper run on the hypercube or on any of its bounded degree derivatives including the butterfly, cube-connected cycles, and shuffle exchange network. The fastest runs in $O(\log n \log^* n)$ time on an n-node network. The fastest previously known algorithm ran in O(log n log log n) time [9]. The algorithms use a technique called successive sampling, which was previously used by Cole and Yap [5] to solve the selection problem in an idealized model of computation called the parallel comparison model. The algorithms also use as subroutines sorting algorithms for hypercubic networks due ....
....selection refinement algorithm until the set of remaining elements is small enough to be sorted directly. 1.3 Previous work. The fastest previously known algorithm for solving the selection problem on a hypercubic network is due to Plaxton and runs in O(log n log log n) time on an n-node network [9]. The selection problem can also be solved in O(log n log log n) using the hypercubic sorting algorithm of Cypher and Plaxton [6]. In [9], Plaxton also showed that any deterministic algorithm for solving the selection problem on a p-processor hypercubic network requires Omega Gammaq n=p) lg lg ....
C. G. Plaxton. Efficient Computation on Sparse Interconnection Networks. PhD thesis, Stanford University, Department of Computer Science, Stanford, CA, September 1989.
....surprisingly, these algorithms behave much better on average than they do in the worst case. Perhaps the most important application of the Balance and MultiBalance operations is to the problem of sorting. Fast, practical sorting algorithms based on these load balancing primitives are described in [10]. A Expansion Properties of the Hypercube. The calculations in this appendix analyze the volume to surface ratio of a Hamming ball of radius r = r(d) lying in a hypercube of dimension d. Theorem B.1, which is used in Section 2.2, characterizes the asymptotic behavior of this ratio for r in ....
C. G. Plaxton. Efficient Computation on Sparse Interconnection Networks. PhD thesis, Department of Computer Science, Stanford University, September 1989.
....is work efficient (i.e. exhibits optimal speedup) because the processor-time product is equal to the time, O(n), of the fastest sequential algorithm for this problem. Cole [3] later found an $O(\lg n \lg^* n)$ time work efficient PRAM algorithm. For any p-processor hypercubic network, Plaxton [8] showed that selection from a set of n records requires $\Omega((n/p) \lg\lg p + \lg p)$ time in the worst case. The bound implies that a work efficient algorithm is not possible. 2 Selection by successive approximation. This section presents several algorithms for selecting the kth largest ....
.... claim is easily verified by induction on i). For ffi 20, we can set i = blg ffic Gamma 1 to obtain T (ffi; 4) T (b3ffi=4c 1; dffi=4e 3) O(ffi lg ffi) O(ffi lg ffi) Hence T (d; 0) O(d lg d) O(lg n lg lg n). This algorithm is essentially equivalent to that described by Plaxton in [8]. Prior to this algorithm, the best bounds known for selection on the hypercube were given by sorting algorithms. 2.3 An $O(\lg n \lg^{(3)} n)$ algorithm. Throughout this subsection, we will refer to the O(lg n lg lg n) selection algorithm of Section 2.2 as the basic algorithm. Our present goal is ....
C. G. Plaxton. Efficient Computation on Sparse Interconnection Networks. PhD thesis, Department of Computer Science, Stanford University, September 1989.
....even a single (general) selection in O(log n) time. Currently, the asymptotically fastest selection algorithm known for cube type computers is based on the $O((\log\log n)^2)$ selection algorithm devised by Cole and Yap for the parallel comparison model [4] and runs in O(log n log log n) time [13]. If the input consists of $n^{1-\epsilon}$ sorted sublists of length $n^{\epsilon}$ for some constant $\epsilon > 0$, algorithm FindSplitters() can be used to perform $O(n^{\epsilon'})$ evenly spaced selections in O(log n) time, for any constant $\epsilon' < \epsilon$. Furthermore, the algorithm can easily be adapted to ....
C. G. Plaxton. Efficient Computation on Sparse Interconnection Networks. PhD thesis, Department of Computer Science, Stanford University, September 1989.
....cycles, and shuffle exchange. For the interesting case where the number of input records is equal to the number of processors, we obtain the first selection algorithms that are asymptotically faster than the best sorting algorithms currently known. For any p-processor hypercubic network, Plaxton [8] showed that selection from a set of n records requires $\Omega((n/p) \lg\lg p + \lg p)$ time in the worst case. The bound implies that a work efficient algorithm is not possible. For the case n = p, it is clear that the selection problem is Omega Gamma 3 n) time. Our best asymptotic result ....
....algorithm is not possible. For the case n = p, it is clear that the selection problem is Omega Gamma 3 n) time. Our best asymptotic result presented in this paper is a uniform selection algorithm with a running time of $O(\lg n \lg^* n)$, improving on Plaxton's O(lg n lg lg n) time algorithm [8]. 1 Laboratoire de l'Informatique du Parallélisme, Institut IMAG, Ecole Normale Supérieure de Lyon, 46, Allée d'Italie, 69364 Lyon Cedex 07, France. This work has been supported by the French CNRS Coordinated Research Program on Parallelism C 3 and the French PRC GDR MathInfo. 2 On leave ....
C. G. Plaxton. Efficient Computation on Sparse Interconnection Networks. PhD thesis, Stanford University, Department of Computer Science, September 1989.
....even a single (general) selection in O(log n) time. Currently, the asymptotically fastest selection algorithm known for cube type computers is based on the $O((\log\log n)^2)$ selection algorithm devised by Cole and Yap for the parallel comparison model [4] and runs in O(log n log log n) time [13]. If the input consists of $n^{1-\epsilon}$ sorted sublists of length $n^{\epsilon}$ for some constant $\epsilon > 0$, algorithm FindSplitters() can be used to perform $O(n^{\epsilon'})$ evenly spaced selections in O(log n) time, for any constant $\epsilon' < \epsilon$. Furthermore, the algorithm can easily be adapted to ....
C. G. Plaxton. Efficient Computation on Sparse Interconnection Networks. PhD thesis, Stanford University, Department of Computer Science, September 1989.
....proposed a more practical O(lg n) time randomized algorithm for sorting [24] called flashsort. Many other parallel sorting algorithms have been proposed in the literature, including parallel versions of radix sort and quicksort [7], a variant of quicksort called hyperquicksort [27], smoothsort [21], column sort [16], Nassimi and Sahni's sort [19], and parallel merge sort [8]. Our paper reports the findings of a recent project undertaken at Thinking Machines Corporation to develop a fast sorting algorithm for the Connection Machine Supercomputer model CM-2. The primary goal of this project ....
....attractive. Note that column sort could well be a useful component of a hybrid sorting scheme that automatically selects an appropriate algorithm depending upon the values of n and p. Nonadaptive smoothsort. There are a number of variants of the smoothsort algorithm, all of which are described in [21]. The most practical variant, and the one of interest to us here, is the nonadaptive version of the smoothsort algorithm. The structure of this algorithm, hereinafter referred to simply as smoothsort, is similar to that of column sort; both algorithms make progress by ensuring that under a certain ....
C. G. Plaxton. Efficient computation on sparse interconnection networks. Technical Report STAN-CS-89-1283, Stanford University, Department of Computer Science, September 1989.
....proposed a more practical O(lg n) time randomized algorithm for sorting [19] called flashsort. Many other parallel sorting algorithms have been proposed in the literature, including parallel versions of radix sort and quicksort [5], a variant of quicksort called hyperquicksort [23], smoothsort [18], column sort [15], Nassimi and Sahni's sort [17], and parallel merge sort [6]. This paper reports the findings of a project undertaken at Thinking Machines Corporation to develop a fast sorting algorithm for the Connection Machine Supercomputer model CM-2. The primary goals of this project were ....
....in performance ever occurs, it appears likely to occur at a point where both cubesort and column sort have poor performance relative to other algorithms (e.g. at low to moderate loads). Nonadaptive smoothsort. There are several variants of the smoothsort algorithm, all of which are described in [18]. The most practical variant, and the one of interest to us here, is the nonadaptive version of the smoothsort algorithm. The structure of this algorithm, hereinafter referred to simply as smoothsort, is similar to that of column sort. Both algorithms make progress by ensuring that under a ....
C. G. Plaxton. Efficient computation on sparse interconnection networks. Technical Report STAN-CS-89-1283, Stanford University, Department of Computer Science, September 1989.
....algorithms in this paper run on the hypercube or on any of its bounded degree derivatives including the butterfly, cube-connected cycles, and shuffle exchange network. The fastest runs in $O(\lg n \lg^* n)$ time on an n-node network. The fastest previously known algorithm ran in O(lg n lg lg n) time [9]. The algorithms use a technique called successive sampling, which was previously used by Cole and Yap [5] to solve the selection problem in an idealized model of computation called the parallel comparison model. The algorithms also use as subroutines sorting algorithms for hypercubic networks due ....
....lg n) time. Note that sparse enumeration sort runs in optimal O(lg n) time if p n 1 for some positive constant . The fastest previously known algorithm for solving the selection problem on a hypercubic network is due to Plaxton and runs in O(lg n lg lg n) time on an n-node network [9]. Of course, the selection problem can also be solved in $O(\lg n (\lg\lg n)^2)$ time using Sharesort. Plaxton also showed that any deterministic algorithm for solving the selection problem on a p-processor hypercubic network requires $\Omega((n/p) \lg\lg p + \lg p)$ time in the worst case [9] ....
C. G. Plaxton. Efficient Computation on Sparse Interconnection Networks. PhD thesis, Department of Computer Science, Stanford University, September 1989.
C.G. Plaxton, Efficient Computation on Sparse Interconnection Networks, Technical Report STAN-CS-89-1283, Stanford University, Department of Computer Science, September 1989.