Results 1  10
of
67
External Memory Algorithms and Data Structures
, 1998
"... Data sets in large applications are often too massive to fit completely inside the computer's internal memory. The resulting input/output communication (or I/O) between fast internal memory and slower external memory (such as disks) can be a major performance bottleneck. In this paper, we surve ..."
Abstract

Cited by 349 (23 self)
 Add to MetaCart
(Show Context)
Data sets in large applications are often too massive to fit completely inside the computer's internal memory. The resulting input/output communication (or I/O) between fast internal memory and slower external memory (such as disks) can be a major performance bottleneck. In this paper, we survey the state of the art in the design and analysis of external memory algorithms and data structures (which are sometimes referred to as "EM" or "I/O" or "outofcore" algorithms and data structures). EM algorithms and data structures are often designed and analyzed using the parallel disk model (PDM). The three machineindependent measures of performance in PDM are the number of I/O operations, the CPU time, and the amount of disk space. PDM allows for multiple disks (or disk arrays) and parallel CPUs, and it can be generalized to handle tertiary storage and hierarchical memory. We discuss several important paradigms for how to solve batched and online problems efficiently in external memory. Programming tools and environments are available for simplifying the programming task. The TPIE system (Transparent Parallel I/O programming Environment) is both easy to use and efficient in terms of execution speed. We report on some experiments using TPIE in the domain of spatial databases. The newly developed EM algorithms and data structures that incorporate the paradigms we discuss are significantly faster than methods currently used in practice.
The buffer tree: A new technique for optimal I/Oalgorithms
 University of Aarhus
, 1995
"... ..."
(Show Context)
Optimal Dynamic Interval Management in External Memory (Extended Abstract)
 IN PROC. IEEE SYMP. ON FOUNDATIONS OF COMP. SCI
, 1996
"... We present a space and I/Ooptimal externalmemory data structure for answering stabbing queries on a set of dynamically maintained intervals. Our data structure settles an open problem in databases and I/O algorithms by providing the first optimal externalmemory solution to the dynamic interval m ..."
Abstract

Cited by 82 (20 self)
 Add to MetaCart
We present a space and I/Ooptimal externalmemory data structure for answering stabbing queries on a set of dynamically maintained intervals. Our data structure settles an open problem in databases and I/O algorithms by providing the first optimal externalmemory solution to the dynamic interval management problem, which is a special case of 2dimensional range searching and a central problem for objectoriented and temporal databases and for constraint logic programming. Our data structure simultaneously uses optimal linear space (that is, O(N/B) blocks of disk space) and achieves the optimal O(log B N + T/B) I/O query bound and O(log B N ) I/O update bound, where B is the I/O block size and T the number of elements in the answer to a query. Our structure is also the first optimal external data structure for a 2dimensional range searching problem that has worstcase as opposed to amortized update bounds. Part of the data structure uses a novel balancing technique for efficient worstcase manipulation of balanced trees, which is of independent interest.
The Buffer Tree: A Technique for Designing Batched External Data Structures
, 2003
"... We present a technique for designing external memory data structures that support batched operations I/O efficiently. We show how the technique can be used to develop external versions of a search tree, a priority queue, and a segment tree, and give examples of how these structures can be used to d ..."
Abstract

Cited by 78 (14 self)
 Add to MetaCart
We present a technique for designing external memory data structures that support batched operations I/O efficiently. We show how the technique can be used to develop external versions of a search tree, a priority queue, and a segment tree, and give examples of how these structures can be used to develop I/Oefficient algorithms. The developed algorithms are either extremely simple or straightforward generalizations of known internal memory algorithms—given the developed external data structures.
I/O Optimal Isosurface Extraction
, 1997
"... In this paper we give I/Ooptimal techniques for the extraction of isosurfaces from volumetric data, by a novel application of the I/Ooptimal interval tree of Arge and Vitter. The main idea is to preprocess the dataset once and for all to build an efficient search structure in disk, and then each ti ..."
Abstract

Cited by 77 (17 self)
 Add to MetaCart
In this paper we give I/Ooptimal techniques for the extraction of isosurfaces from volumetric data, by a novel application of the I/Ooptimal interval tree of Arge and Vitter. The main idea is to preprocess the dataset once and for all to build an efficient search structure in disk, and then each time we want to extract an isosurface, we perform an outputsensitive query on the search structure to retrieve only those active cells that are intersected by the isosurface. During the query operation, only two blocks of main memory space are needed, and only those active cells are brought into the main memory, plus some negligible overhead of disk accesses. This implies that we can efficiently visualize very large datasets on workstations with just enough main memory to hold the isosurfaces themselves. The implementation is delicate but not complicated. We give the first implementation of the I/Ooptimal interval tree, and also implement our methods as an I/O filter for Vtk's isosurface ext...
External Memory Data Structures
, 2001
"... In many massive dataset applications the data must be stored in space and query efficient data structures on external storage devices. Often the data needs to be changed dynamically. In this chapter we discuss recent advances in the development of provably worstcase efficient external memory dynami ..."
Abstract

Cited by 73 (31 self)
 Add to MetaCart
In many massive dataset applications the data must be stored in space and query efficient data structures on external storage devices. Often the data needs to be changed dynamically. In this chapter we discuss recent advances in the development of provably worstcase efficient external memory dynamic data structures. We also briefly discuss some of the most popular external data structures used in practice.
Scalable sweepingbased spatial join
 IN PROC. 24TH INT. CONF. VERY LARGE DATA BASES, VLDB
, 1998
"... In this paper, we consider the filter step of the spatial join problem, for the case where neither of the inputs are indexed. We present a new algorithm, Scalable SweepingBased Spatial Join (SSSJ), that achieves both efficiency on reallife data and robustness against highly skewed and worstcase d ..."
Abstract

Cited by 68 (7 self)
 Add to MetaCart
In this paper, we consider the filter step of the spatial join problem, for the case where neither of the inputs are indexed. We present a new algorithm, Scalable SweepingBased Spatial Join (SSSJ), that achieves both efficiency on reallife data and robustness against highly skewed and worstcase data sets. The algorithm combines a method with theoretically optimal bounds on I/O transfers based on the recently proposed distributionsweeping technique with a highly optimized implementation of internalmemory planesweeping. We present experimental results based on an efficient implementation of the SSSJ algorithm, and compare it to the stateoftheart PartitionBased SpatialMerge (PBSM) algorithm of Pate1 and DeWitt.
Asymptotically Tight Bounds for Performing BMMC Permutations on Parallel Disk Systems
, 1994
"... This paper presents asymptotically equal lower and upper bounds for the number of parallel I/O operations required to perform bitmatrixmultiply/complement (BMMC) permutations on the Parallel Disk Model proposed by Vitter and Shriver. A BMMC permutation maps a source index to a target index by an a ..."
Abstract

Cited by 55 (18 self)
 Add to MetaCart
This paper presents asymptotically equal lower and upper bounds for the number of parallel I/O operations required to perform bitmatrixmultiply/complement (BMMC) permutations on the Parallel Disk Model proposed by Vitter and Shriver. A BMMC permutation maps a source index to a target index by an affine transformation over GF (2), where the source and target indices are treated as bit vectors. The class of BMMC permutations includes many common permutations, such as matrix transposition (when dimensions are powers of 2), bitreversal permutations, vectorreversal permutations, hypercube permutations, matrix reblocking, Graycode permutations, and inverse Graycode permutations. The upper bound improves upon the asymptotic bound in the previous best known BMMC algorithm and upon the constant factor in the previous best known bitpermute/complement (BPC) permutation algorithm. The algorithm achieving the upper bound uses basic linearalgebra techniques to factor the characteristic matrix...
Optimal external memory interval management
 SIAM Journal on Computing
"... This work has been made available by the University of Kansas ..."
Abstract

Cited by 48 (7 self)
 Add to MetaCart
(Show Context)
This work has been made available by the University of Kansas
Efficient External Memory Algorithms by Simulating CoarseGrained Parallel Algorithms
, 2003
"... External memory (EM) algorithms are designed for largescale computational problems in which the size of the internal memory of the computer is only a small fraction of the problem size. Typical EM algorithms are specially crafted for the EM situation. In the past, several attempts have been made to ..."
Abstract

Cited by 45 (11 self)
 Add to MetaCart
(Show Context)
External memory (EM) algorithms are designed for largescale computational problems in which the size of the internal memory of the computer is only a small fraction of the problem size. Typical EM algorithms are specially crafted for the EM situation. In the past, several attempts have been made to relate the large body of work on parallel algorithms to EM, but with limited success. The combination of EM computing, on multiple disks, with multiprocessor parallelism has been posted as a challenge by the ACMWorking Group on Storage I/O for LargeScale Computing.