H. Shi and J. Schaeffer. Parallel Sorting by Regular Sampling. Journal of Parallel and Distributed Computing, 14:361--372, 1992.
....the bandwidth of the interconnect network is limited, the access time on a memory location will increase dramatically when contention increases. Contention has been considered as one of the main factors affecting the performance of shared memory multiprocessors [9]. Results of many researchers [5, 11, 15] have shown that the performance of spinlocks degrades dramatically in the presence of contention on shared memory multiprocessors. The spin operation on a memory location is CPU intensive, performs no useful work and moreover generates an overwhelming amount of network traffic on shared memory ....
....Reisinge. The non-blocking write protocol NBW: A solution to a real-time synchronisation problem. In Proceedings of the Real-Time Systems Symposium, pages 131--137, Raleigh-Durham, NC, Dec. 1993. IEEE Computer Society Press. [14] C. M. Krishna and K. G. Shin. Real-Time Systems. McGraw-Hill, 1997. [15] B. H. Lim and A. Agarwal. Reactive synchronization algorithms for multiprocessors. In Proceedings of the Sixth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS VI), pages 25--35. ACM Press, Oct. 1994. [16] P. Magnusson, A. Landin, and E. ....
[Article contains additional citation context not shown here]
H. Shi and J. Schaeffer. Parallel sorting by regular sampling. Journal of Parallel and Distributed Computing, 14(4):361--372, 1992.
....Nonetheless, the necessity of gathering all keys onto one processor limits the maximum simulation size to less than a million particles on the T3E. The fully parallel sort currently implemented is an adaptation of the PSRS (parallel sort by regular sampling) algorithm originally proposed in Ref. [21]. Since the distribution of keys depends sensitively on the geometry of the system simulated (that is, whether the particles are initially arranged in a cube, sphere or more complex object), regular sampling tends to produce highly imbalanced particle numbers across the processors. To compensate ....
H. Shi and J. Schaeffer, Parallel sorting by regular sampling, J. Par. Dist. Comp. 14, 361--372 (1992).
....of two groups, each with its respective disadvantages. The first group, using the classification of Li and Sevcik [21], is the single-step algorithms, so named because data is moved exactly once between processors. Examples of this include sample sort [19, 9], parallel sorting by regular sampling [29, 22], and parallel sorting by overpartitioning [21]. The price paid by these single-step algorithms is an irregular communication scheme and difficulty with load balancing. The other group of sorting algorithms is the multi-step algorithms, which include bitonic sort [8], column sort [20], rotate sort ....
H. Shi and J. Schaeffer. Parallel sorting by regular sampling. Journal of Parallel and Distributed Computing, 14, 4 (Apr. 1992).
....subarrays, where P also denotes the number of processors. Each partition is sorted in parallel, after which a number of global pivots are determined. Based on these pivots, the subarrays are cyclically merged. As a result, a sorted array is obtained. A description of the algorithm can be found in [16]. To have an interesting scenario for our experiment (i.e. the worst case for Eqs. (5) and (4)), we adjust the parameters such that both tasks have approximately equal workload (which also balances the pipeline). We choose P = 1 processor for NAS EP and P = 24 for PSRS. The pipeline processes 6; ....
Shi, H. and Schaeffer, J.: "Parallel Sorting by Regular Sampling." In Journal of Parallel and Distributed Computing, Vol. 14, No. 4, 1992, pp. 361-372.
....of Blelloch et al. [7] and our own random sample sort algorithm [14]. Alternatively, the splitters may be chosen, as we have for this paper, by regularly sampling the sorted input elements at each processor, hence the name Sorting by Regular Sampling. A previous version of regular sample sort [15, 17], known as Parallel Sorting by Regular Sampling (PSRS), first sorts the elements at each processor and then selects every (n/p^2)-th element as a sample. These samples are then routed to a single processor, where they are sorted and every p-th sample is selected as a splitter. Each processor then ....
....after which local merging of these subsequences is done to complete the sorting process. The first difficulty with this approach is the load balance. There exist inputs for which at least one processor will be left with as many as p 1 elements at the completion of sorting [15, 17]. This could be reduced by choosing more samples, but this would also increase the overhead. And no matter how many samples are chosen, previous studies have shown that the load balance would still deteriorate linearly with the number of duplicates [15]. One could, of course, tag each item with a ....
H. Shi and J. Schaeffer. Parallel Sorting by Regular Sampling. Journal of Parallel and Distributed Computing, 14:361--372, 1992.
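The excerpts above describe PSRS only in words: sort locally, take regular samples, sort the samples on one processor, pick splitters, then partition and merge. The following is a minimal sequential sketch of those phases, not the authors' implementation. The function name `psrs_partitions`, the simulation of processors as list slices, the fallback for small inputs, and the exact pivot offset (`p // 2`) are assumptions made for illustration.

```python
import bisect
from heapq import merge


def psrs_partitions(data, p):
    """Simulate PSRS with p 'processors'; return the p sorted partitions.

    Hypothetical helper for illustration: processors are simulated
    sequentially, so this shows the data flow, not real parallelism.
    """
    n = len(data)
    if p <= 1 or n < p * p:
        # Assumption: fall back to a plain sort when there are too few
        # elements to draw p samples from every block.
        return [sorted(data)]

    # Phase 1: block-partition the input and sort each block locally.
    bounds = [i * n // p for i in range(p + 1)]
    blocks = [sorted(data[bounds[i]:bounds[i + 1]]) for i in range(p)]

    # Regular sampling: p evenly spaced samples from each sorted block.
    samples = []
    for block in blocks:
        step = len(block) // p  # at least 1 because n >= p * p
        samples.extend(block[j * step] for j in range(p))

    # Phase 2: one processor sorts the p*p samples and picks p - 1 pivots
    # at regular intervals (offset p // 2 is one common choice).
    samples.sort()
    pivots = [samples[j * p + p // 2 - 1] for j in range(1, p)]

    # Phase 3: every block is cut at the pivots; piece k goes to processor k.
    runs = [[] for _ in range(p)]
    for block in blocks:
        cuts = [0] + [bisect.bisect_right(block, pv) for pv in pivots]
        cuts.append(len(block))
        for k in range(p):
            runs[k].append(block[cuts[k]:cuts[k + 1]])

    # Phase 4: each processor merges the sorted runs it received.
    return [list(merge(*received)) for received in runs]
```

Concatenating the returned partitions in order gives the sorted sequence, and the partition lengths show how evenly the pivots divided the work. Feeding in an input with many duplicate keys is a quick way to observe the load-balance concern raised in the excerpt above.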
....in order to eventually generate an analytical estimate make the modeling step far from trivial, and many of the associated issues will therefore be treated separately in the design document. The parallel sorting application is based on the PSRS algorithm (Parallel Sorting by Regular Sampling [40]). The pseudo code is shown in Appendix A. The input vector x of length N is block-partitioned in P (number of processors) parts. Each partition is sorted in parallel (sortparts), after which a number of global pivots are determined (computepivots). Based on these pivots, the partially sorted ....
H. Shi and J. Schaeffer, "Parallel sorting by regular sampling," Journal of Parallel and Distributed Computing, vol. 14, no. 4, 1992, pp. 361--372.
....of two groups, each with its respective disadvantages. The first group, using the classification of Li and Sevcik [21], is the single-step algorithms, so named because data is moved exactly once between processors. Examples of this include sample sort [19, 9], parallel sorting by regular sampling [29, 22], and parallel sorting by overpartitioning [21]. The price paid by these single-step algorithms is an irregular communication scheme and difficulty with load balancing. The other group of sorting algorithms is the multi-step algorithms, which include bitonic sort [8], column sort [20], rotate sort ....
H. Shi and J. Schaeffer, Parallel sorting by regular sampling, J. Parallel Distributed Comput. 14, No. 4 (Apr. 1992), 361--372.
....depends on how well we divide the input, and this in turn depends on how evenly we choose the splitters. One way to choose the splitters is by regularly sampling the sorted input elements at each processor, hence the name Sorting by Regular Sampling. A previous version of regular sample sort [18, 16], known as Parallel Sorting by Regular Sampling (PSRS), first sorts the elements at each processor and then selects every (n/p^2)-th element as a sample. These samples are then routed to a single processor, where they are sorted and every p-th sample is selected as a splitter. Each processor then ....
H. Shi and J. Schaeffer. Parallel Sorting by Regular Sampling. Journal of Parallel and Distributed Computing, 14:361--372, 1992.
....Many papers have discussed the task of sorting on parallel computers. See, for example, [1, 2, 12]. Most of these papers have dealt with the problem from a theoretical point of view, neglecting many issues that are important in a practical implementation of a parallel sorting algorithm [4, 10, 14]. This paper describes a practical parallel sorting algorithm which is suitable for efficient general-purpose internal sorting. A description of the algorithm is given in Sections 2 and 3. In Section 4 the algorithm is compared with some other parallel sorting algorithms, such as radix sort and ....
....cost of decreased generality. The parallel sample sort algorithm, as with radix sorting, has a problem with high memory requirements [4]. It also fails to satisfy property 4 due to its reliance on taking pseudo-random samples of the input data. Parallel versions of radix sort [4] and sample sort [4, 14] also require all-to-all communications, which is much more costly in terms of communication overhead on current parallel machines than the transfer of large contiguous blocks of elements as is used in our algorithm. Several methods that rely on the number of elements being a multiple of the ....
H. Shi and J. Schaeffer, Parallel sorting by regular sampling, J. Parallel and Distributed Computing 14, 1992, 361-372.
....in our project, principally because it seems to be easier to get results with the technique. We have experimented in [1] with such a technique for stable 2 algorithms and with the Bulk Synchronous Parallel (BSP) toolkit as the target programming tool. Parallel Sorting by Regular Sampling [2], or PSRS for short, is one technique in the category of one-step merging algorithms. It is as follows: Step 1: Perform a local sort; each processor selects p samples which are gathered onto process zero; Step 2: Perform a local sort of p^2 samples; pick p - 1 regular pivots k from the sorted p^2 ....
....and Sevcik. Until recently, processors that sort in parallel according to the PSRS philosophy began by sequentially sorting their portions in the first phase of the algorithm and used regular sampling to select pivots. Even Shi and Schaeffer, the inventors of the PSRS technique, in the original paper [2] said that "It appears to be a difficult problem to find pivots that partition the data to be sorted into ordered subsets of equal size without sorting the data first". In fact, an optimization can be done to bypass the first sorting phase if we concentrate (see [6]) on the notion of quantiles. ....
[Article contains additional citation context not shown here]
Hanmao Shi and Jonathan Schaeffer, "Parallel sorting by regular sampling", Journal of Parallel and Distributed Computing, vol. 14, no. 4, pp. 361-372, 1992.
.... run button to allow the program to proceed without interruption. However we are free to intercept it at any later stage using the pause button. The development tool has been used to program a number of sophisticated parallel applications [10], such as parallel sorting using regular sampling [8] and Gaussian elimination using partial pivoting [9]. We believe it has saved an immense amount of development time. Note that, for fine-grained debugging, a facility is provided to attach a serial debugger to any thread, and interact with it remotely through the console. 3. HOW THE BSP DEBUGGER ....
H. Shi and J. Schaeffer. Parallel Sorting by Regular Sampling. Journal of Parallel and Distributed Computing, 1992. Volume 14(4), pp. 361-372.
....lg 2 # n, # =#249 the improved deterministic sorting algorithm in [27, 28] requires computation and communication time (1 2 # o(1) n lg n p and O(g # 1 n p ) O(L # 1 ) respectively. The algorithm performs deterministic sorting by extending regular sampling, a technique introduced in [48], to perform deterministic regular oversampling. Past results that use regular sampling have been available for cases with p 2 n. The BSP algorithm in [20, 27, 28] further extends the processor range and achieves asymptotically optimal efficiency for a range of n p that is very close to that ....
H. Shi and J. Schaeffer. Parallel sorting by regular sampling. Journal of Parallel and Distributed Computing, 14:361-372, 1992.
....m and k respectively can be merged by Batcher's algorithm if the concatenation of the former sequence and the reverse of the latter one is input to the algorithm for sorting. An application of the presented algorithm is in parallel sorting [4, 5, 6, 8, 9]. In parallel sorting using regular sampling [17], a sample can be sorted either sequentially (as was also done in [11] in the context of a randomized sorting algorithm) or in parallel. In experimental work reported in [4, 5] a BSP implementation of Batcher's sorter is used for sample sorting. Given the simplicity of the algorithm and its low ....
H. Shi and J. Schaeffer. Parallel sorting by regular sampling. Journal of Parallel and Distributed Computing, 14:361-372, 1992.
H. Shi and J. Schaeffer. Parallel Sorting by Regular Sampling. Journal of Parallel and Distributed Computing, 14(4):361-372, 1992.
.... even though parallel object-oriented languages (such as Mentat [16] and Orca [2]) have existed for over a decade (POOMA [27] is an exception with a narrow focus). Design patterns for parallel programs have existed for two decades, in a variety of guises (e.g. skeletons [8], templates [32][32]). However, no widely used parallel computing tool uses this technology. Frameworks have emerged as a powerful tool for rapid code development. To the best of our knowledge, there are no applications of this idea to building tools to support parallel code development. The reality today is that ....
[Article contains additional citation context not shown here]
H. Shi and J. Schaeffer. Parallel Sorting by Regular Sampling. Journal of Parallel and Distributed Computing, 14(4):361-372, 1992.
H. Shi and J. Schaeffer. Parallel Sorting by Regular Sampling. Journal of Parallel and Distributed Computing, 14:361--372, 1992.
H. Shi and J. Schaeffer. Parallel Sorting by Regular Sampling. Journal of Parallel and Distributed Computing, 14:361--372, 1992.
H. Shi and J. Schaeffer. Parallel Sorting by Regular Sampling. Journal of Parallel and Distributed Computing, 14:361--372, 1992.
H. Shi and J. Schaeffer. Parallel Sorting by Regular Sampling. Journal of Parallel and Distributed Computing, 14(4):361--372, 1992.
H. Shi and J. Schaeffer. Parallel Sorting by Regular Sampling. Journal of Parallel and Distributed Computing, 14:361--372, 1992.
H. Shi and J. Schaeffer. Parallel sorting by regular sampling. Journal of Parallel and Distributed Computing, 14:361--372, 1992.
H. Shi and J. Schaeffer. Parallel sorting by regular sampling. Journal of Parallel and Distributed Computing, 14(4):361--372, 1992.
H. Shi and J. Schaeffer. Parallel sorting by regular sampling. Journal of Parallel and Distributed Computing, 14(4):361--372, 1992.
H. Shi and J. Schaeffer. Parallel sorting by regular sampling. Journal of Parallel and Distributed Computing, 14(4):361--372, 1992.
H. Shi and J. Schaeffer. Parallel Sorting by Regular Sampling. Journal of Parallel and Distributed Computing, 14(4):361--372, 1992.