M. H. Nodine and J. S. Vitter. Greed sort: An optimal sorting algorithm for multiple disks. Journal of the ACM, 42(4):919--933, 1995.
....Diego, California, USA. Copyright 2003 ACM 1-58113-661-7/03/0006...$5.00. performance computer, a high performance external sorting algorithm needs to be able to exploit many disks. Interestingly, parallel disk sorting is a nontrivial problem. Asymptotically I/O-optimal deterministic algorithms [17, 18] are complicated and have rather large constant factors. There are relatively simple randomized algorithms that approach the lower bound of $\frac{2N}{DB} \log_{M/B} \frac{N}{B}$ I/Os for sorting N elements using D disks, fast memory of size M, and blocks of size B [12]. These algorithms are so close to algorithms ....
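As a rough illustration of the bound quoted in the excerpt above, read here as (2N/(DB)) log_{M/B}(N/B), the following Python sketch evaluates the expression for given parameters. The function name `sort_io_bound` and the sample values are assumptions made for illustration only; they do not come from the cited papers.

```python
import math


def sort_io_bound(n, m, b, d):
    """Evaluate (2N / (DB)) * log_{M/B}(N / B).

    n: number of elements, m: fast memory size, b: block size,
    d: number of disks (all measured in elements, as in the excerpt).
    """
    if d < 1 or b < 1 or m <= b or n < b:
        raise ValueError("expected d >= 1, b >= 1, m > b and n >= b")
    # Logarithm of N/B to the base M/B.
    log_term = math.log(n / b, m / b)
    # One read and one write per block, spread over D disks.
    return 2 * n / (d * b) * log_term


if __name__ == "__main__":
    # Hypothetical parameters: N = 2^30, M = 2^20, B = 2^10, D = 4.
    print(sort_io_bound(2**30, 2**20, 2**10, 4))
```

With these hypothetical parameters the expression evaluates to roughly 2^20 I/Os, and doubling D halves it, which is the sense in which an algorithm "exploits many disks".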
M. H. Nodine and J. S. Vitter. Greed sort: An optimal sorting algorithm for multiple disks. Journal of the ACM, 42(4):919--933, 1995.
....time integer time and O(n log n) work general time and O(n) work (rand. constant Proof. External Memory: Sorting tuples and lexicographic naming is easily reduced to external memory integer sorting. I/O-optimal deterministic parallel disk sorting algorithms are well known [34, 33]. We have to make a few remarks regarding internal work, however. To achieve optimal internal work for all values of n, M, and B, we can use radix sort where the most significant digit has $\lfloor \log M \rfloor - 1$ bits and the remaining digits have $\lfloor \log (M/B) \rfloor$ bits. Sorting then starts with $\log_{M/B} (n/M)$ data ....
M. H. Nodine and J. S. Vitter. Greed sort: An optimal sorting algorithm for multiple disks. J. ACM, 42(4):919-933, 1995.
....can move up to D blocks, one from each disk. For graphs with n nodes and m edges the semi-external memory (SEM) setting assumes $c \cdot n \le M < m$ for some appropriate constant $c \ge 1$. A number of basic computational problems can be solved I/O-efficiently. The most prominent example is EM sorting [2, 15]: sorting x items of constant size takes sort(x) [...] I/Os. BFS, however, seems to be hard for external memory computation (and .... Partially supported by the Future and Emerging Technologies programme of the EU under contract number IST-1999-14186 (ALCOM-FT) and the DFG grant SA 933/1-1.
M. H. Nodine and J. S. Vitter. Greed sort: An optimal sorting algorithm for multiple disks. Journal of the ACM, 42(4):919--933, 1995.
....as a subroutine [41]. Perhaps the best algorithm for both a single disk and a parallel multi-head disk is multi-way merge sort. This algorithm can be implemented using about 2 [...] I/Os [23]. Ingenious deterministic algorithms have been developed that adapt multi-way merging to independent disks [29]. Since the known deterministic algorithms increase the number of I/Os by a considerable factor, Barve et al. [6] have developed a more practical algorithm based on randomized striping, which also achieves O(N [...]) I/Os if $M = \Omega(D \log D)$. Our general emulation result does not have ....
Nodine, M. H., and Vitter, J. S. Greed sort: An optimal sorting algorithm for multiple disks. Journal of the ACM 42, 4 (1995), 919--933.
....subroutine [32]. Perhaps the best algorithm for both a single disk and a parallel multi-head disk is multi-way merge sort. This algorithm can be implemented using about 2 [...] $\log_{M/B}$ [...] I/Os [16]. Ingenious deterministic algorithms have been developed that adapt multi-way merging to independent disks [22]. Since the known deterministic algorithms increase the number of I/Os by a considerable factor, Barve et al. [5] have developed a more practical algorithm based on randomized striping, which also achieves O([...] DB $\log_{M/B}$ [...]) I/Os if $M = \Omega(D \log D)$. Our general emulation result does not have ....
Nodine, M. H., and Vitter, J. S. Greed sort: An optimal sorting algorithm for multiple disks. Journal
....$N = 10^{10}$ to $N = 10^{12}$. An increasingly popular approach to further increase the throughput of the I/O system is to use a number of disks in parallel [50, 51, 97]. Several authors have considered an extension of the above model with a parameter D denoting the number of disks in the system [21, 73, 71, 72, 97]. In the parallel disk model [97] one can read or write one block from each of the D disks simultaneously in one I/O. The number of disks D ranges up to $10^2$ in current disk arrays. The parallel disk model corresponds to the one shown in Figure 2, where we only count the number of blocks of B ....
.... Vitter [6] and the notion of parallel disks was introduced by Vitter and Shriver [97]. The latter papers also deal with fundamental problems such as permutation, sorting and matrix transposition, and a number of authors have considered the difficult problem of sorting optimally on parallel disks [5, 21, 73, 71]. The problem of implementing various classes of permutations has been addressed in [38, 39, 41]. More recently researchers have moved on to more specialized problems in the computational geometry [12, 14, 19, 32, 53, 99], graph theoretical [13, 14, 32, 34, 52, 66], and string processing areas [15, ....
[Article contains additional citation context not shown here]
M. H. Nodine and J. S. Vitter. Greed sort: An optimal sorting algorithm for multiple disks. Journal of the ACM, pages 919--933, 1995.
....bounds. 1 Introduction There is a growing interest in algorithms working on sets of data that are too large to fit in the internal memory of computers, and that consequently need to perform input/output accesses to external storage devices, like disks and CD-ROMs (see e.g. [4, 11, 19, 21, 29, 37]). These devices are roughly $10^6$ times slower than internal memory in terms of access time. In many applications, this disparity has given rise to an input/output (or I/O) bottleneck, in which the time spent on moving data between internal and external memory dominates the overall ....
.... the gradation, the incremental step from $T(S_{i-1})$ to $T(S_i)$ can be performed by using efficient internal memory algorithms together with simple I/O-efficient procedures like straightforward movement of data in blocks or, in the most sophisticated case, external memory sorting routines [29, 37]. As the main result, we obtain an I/O-optimal randomized algorithm for the segment intersections problem that requires $O(n \log_m n + k)$ expected I/Os. Since the algorithm necessarily computes the pairwise intersections among the input segments, this settles a question posed in [5]. The ....
[Article contains additional citation context not shown here]
M. H. Nodine and J. S. Vitter. Greed sort: An optimal sorting algorithm for multiple disks. Journal of the ACM (1995) 919--933.
.... optimal EM algorithm for such problems thus requires independent access to the D disks, in which each of the D blocks in a parallel I/O operation can reside at a different offset on its disk [18]. Designing algorithms for independent parallel disks has turned out to be ad hoc and relatively difficult [21, 14, 15, 5, 6, 7, 8, 3, 4, 18], and in practice the added overhead often makes the .... Department of Computer Science, Duke University, Durham, NC 27708-0129. Email: jsv@cs.duke.edu. Web: http://www.cs.duke.edu/~jsv/. Support was provided in part by the Army Research Office through grant DAAD19-01-1-0725 and by the National ....
.... red-blue line intersection in GIS [3], and spatial join [2]. In many cases, the algorithms can be adapted to use parallel disks on an ad hoc basis by applying techniques from the previously developed parallel disk sorting algorithms, such as those by Vitter and Shriver [21], Nodine and Vitter [14, 15], and Dehne et al. [7, 8], but in practice these techniques are often slower than simpler approaches based upon disk striping. 1.2 Our Contributions In this paper we develop a simple, practical and provably optimal randomized algorithm for distribution sort with parallel disks. Our method is ....
M. H. Nodine and J. S. Vitter. Greed Sort: An optimal sorting algorithm for multiple disks. Journal of the ACM, 42(4):919-933, July 1995.
.... EM algorithm for such problems thus requires independent access to the D disks, in which each of the D blocks in a parallel I/O operation can reside at a different offset on its disk [17]. Designing algorithms for independent parallel disks has turned out to be ad hoc and relatively difficult [19, 13, 14, 5, 6, 7, 8, 3, 4, 17], and in practice the added overhead often makes the algorithms slower than those based upon disk striping. It is therefore highly desirable to develop efficient techniques for converting serial EM algorithms into EM algorithms that use parallel disks independently. In this paper we develop a ....
.... of a simple polygon, red-blue line intersection in GIS [3], and spatial join [2]. In many cases, the algorithms can be adapted to use parallel disks on an ad hoc basis by applying techniques from previous parallel disk sorting algorithms, such as those by Vitter and Shriver [19], Nodine and Vitter [13, 14], and Dehne et al. [7, 8], but in practice these techniques are often slower than simpler approaches based upon disk striping. 1.2 Our Contributions In this paper we develop a simple, practical and provably optimal randomized algorithm for distribution sort with parallel disks. Our method is ....
M. H. Nodine and J. S. Vitter. Greed Sort: An optimal sorting algorithm for multiple disks. Journal of the ACM, 42(4):919-933, July 1995.
.... algorithm for sorting and many related EM problems requires independent access to the D disks, in which each of the D blocks in a parallel I/O operation can reside at a different position on its disk [19, 17]. Designing algorithms for independent parallel disks has been surprisingly difficult [19, 14, 15, 3, 8, 9, 17, 16, 18]. In this paper we consider parallel disk output and input separately, in particular as the output scheduling problem and the prefetch scheduling problem respectively. The (online) output scheduling (or queued writing) problem takes as input a fixed size pool of m (initially empty) ....
....in [16] for FR allocation and extended to RC allocation in [18]. These results are the basis for our results in Sect. 4. For reading there is an algorithm for SR allocation that is close to optimal if $m \gg D \log D$ [3]. There are asymptotically optimal deterministic algorithms for external sorting [15], but the constant factors involved make them unattractive in practice. Barve et al. [3] introduced a simple and efficient randomized sorting algorithm called Simple Randomized Mergesort (SRM). For each run, SRM allocates blocks to disks using the SR allocation discipline. SRM comes within $\gamma$ ....
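The excerpt above does not spell out the SR allocation discipline. The Python sketch below assumes it means that each run is striped cyclically across the D disks, starting at a disk chosen uniformly at random for that run. The function name `sr_allocate` and its parameters are hypothetical, and the sketch covers only block placement, not the merge itself.

```python
import random


def sr_allocate(run_lengths, num_disks, rng=None):
    """Assign each block of each run to a disk.

    Assumption: a run starts on a uniformly random disk and its
    following blocks go to the next disks in cyclic order.
    Returns one list of disk indices per run.
    """
    if num_disks < 1:
        raise ValueError("num_disks must be at least 1")
    rng = rng or random.Random()
    layout = []
    for length in run_lengths:
        start = rng.randrange(num_disks)  # random starting disk for this run
        layout.append([(start + i) % num_disks for i in range(length)])
    return layout


if __name__ == "__main__":
    # Hypothetical example: three runs of 6, 4 and 5 blocks on 4 disks.
    for run_id, disks in enumerate(sr_allocate([6, 4, 5], 4, random.Random(1))):
        print(run_id, disks)
```

Under this assumption, consecutive blocks of one run never collide on a disk, while the random starting points make it unlikely that the leading blocks of many runs all sit on the same disk.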
M. H. Nodine and J. S. Vitter. Greed Sort: An optimal sorting algorithm for multiple disks. Journal of the ACM, 42(4):919--933, July 1995.
....the data blocks residing in memory at a given time are such that the I/O required to get them there can be charged to the amount of internal memory work that can be accomplished using that set of memory resident data blocks. Several interesting parallel disk sorting algorithms [VS94, NV95, AP94] performing an optimal number $\Theta$([...]) of I/O operations have been proposed, but they are somewhat complicated and difficult to implement in practice. As a consequence, an attractive alternative to implement sorting algorithms for parallel disks is to use the technique of disk striping ....
....When there are hotspots, reading the set of the next R participating blocks can take many more parallel I/Os than the optimal number $\lceil R/D \rceil$ of parallel I/Os. We refer the reader to [VS94, BGV97] for more intuition regarding the difficulty of merging with parallel independent disks. Nodine and Vitter [NV95] overcame this difficulty by performing external merging by first approximately merging the runs followed by additional passes to refine the merge. Aggarwal and Plaxton's Sharesort [AP94] technique does repeated merging and has accompanying overheads. Each of these approaches involves extra ....
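The hotspot effect described in the excerpt above can be illustrated with a small Python sketch. It assumes the parallel disk model in which one parallel I/O reads at most one block from each disk, so fetching a set of blocks takes as many parallel I/Os as the most heavily loaded disk holds. The function names and the sample placements are hypothetical.

```python
import math
from collections import Counter


def parallel_reads_needed(block_disks):
    """Parallel I/Os needed to read blocks stored on the given disks.

    block_disks lists, for each requested block, the index of the disk
    holding it. Each parallel I/O reads at most one block per disk, so
    the busiest disk determines the count.
    """
    if not block_disks:
        return 0
    return max(Counter(block_disks).values())


def optimal_parallel_reads(num_blocks, num_disks):
    """Best case ceil(R / D) when the R blocks are spread evenly."""
    return math.ceil(num_blocks / num_disks)


if __name__ == "__main__":
    balanced = [0, 1, 2, 3, 0, 1, 2, 3]  # R = 8 blocks spread over D = 4 disks
    hotspot = [0, 0, 0, 0, 0, 1, 2, 3]   # five of the eight blocks on disk 0
    print(optimal_parallel_reads(8, 4))
    print(parallel_reads_needed(balanced))
    print(parallel_reads_needed(hotspot))
```

In the balanced placement the count matches the optimum of two parallel I/Os, whereas the hotspot placement needs five because disk 0 must be read five times.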
M. H. Nodine and J. S. Vitter. Greed sort: An optimal sorting algorithm for multiple disks. Journal of the ACM, 42(4):919--933, July 1995.
.... algorithm for sorting and many related EM problems requires independent access to the D disks, in which each of the D blocks in a parallel I/O operation can reside at a different position on its disk [22, 20]. Designing algorithms for independent parallel disks has been surprisingly difficult [22, 15, 16, 3, 8, 9, 20, 17, 21]. In this paper we consider parallel disk input and output separately, in particular as the prefetch scheduling problem and the output scheduling problem respectively. The (online) output scheduling (or queued writing) problem takes as input a fixed size pool of (empty) memory buffers for storing ....
.... Delta time algorithm with this property, and yielding optimal schedules for the offline case. Recently, Kallahalla and Varman independently proposed an optimal algorithm for integrated prefetching and caching [11]. There are asymptotically optimal deterministic algorithms for external sorting [16], but the constant factors involved make them unattractive in practice. Barve et al. [3] introduced a simple and efficient randomized sorting algorithm called Simple Randomized Merge sort (SRM). For each run, SRM allocates blocks to disks using the SR allocation discipline. SRM comes within $\gamma$ ....
[Article contains additional citation context not shown here]
M. H. Nodine and J. S. Vitter. Greed Sort: An optimal sorting algorithm for multiple disks. Journal of the ACM, 42(4):919--933, July 1995.
....sorting) the final sorted sequence can be obtained with $O(K_2 + N_2/B) = O(N_2/B)$ extra I/Os by moving the strings into their final position one at a time. 1.2 Previous Results in I/O-efficient Computation Early work on I/O algorithms concentrated on sorting and permutation related problems [2, 9, 18, 41, 40, 46]. Work has also been done on matrix algebra and related problems arising in scientific computation [2, 45, 46]. More recently, researchers have designed I/O algorithms for a number of problems in different areas, such as in computational geometry [6, 10, 28], graph theoretic computation [6, 7, 16] ....
M. H. Nodine and J. S. Vitter. Greed sort: An optimal sorting algorithm for multiple disks. Journal of the ACM, pages 919--933, 1995.
....[52] when D is very large, it is often the method of choice in practice for using parallel disks, especially when D is moderately sized [51]. 1.2 Previous Results in I/O-Efficient Computation Early work on I/O algorithms concentrated on algorithms for sorting and permutation related problems [3, 25, 40, 39, 52, 10, 14]. External sorting requires $\Theta(n \log_m n)$ I/Os, which is the external memory equivalent of the well known $\Theta(N \log N)$ time bound for sorting in internal memory. Work has also been done on matrix algebra and related problems arising in scientific computation [3, 51, 52]. More recently, ....
....Unfortunately, the modified algorithm still needs $O(N \log_B n)$ I/Os to sort the segments. In this paper we focus our attention on the single disk model. As described in Section 1.1, striping can be used to implement our algorithms on parallel disk systems with $D > 1$. Additionally, techniques from [40] and [39] can be used to extend many of our results to parallel disk systems. In the conference version of this paper we conjectured that all our results could be improved by the optimal factor of D on parallel disk systems with D disks, but it is still an open problem whether the required merges ....
M. H. Nodine and J. S. Vitter. Greed sort: An optimal sorting algorithm for multiple disks. Journal of the ACM, pages 919--933, 1995.
CiteSeer.IST - Copyright Penn State and NEC