Results 1–10 of 28
External Memory Algorithms and Data Structures
, 1998
"... Data sets in large applications are often too massive to fit completely inside the computer's internal memory. The resulting input/output communication (or I/O) between fast internal memory and slower external memory (such as disks) can be a major performance bottleneck. In this paper, we surve ..."
Abstract

Cited by 360 (23 self)
Data sets in large applications are often too massive to fit completely inside the computer's internal memory. The resulting input/output communication (or I/O) between fast internal memory and slower external memory (such as disks) can be a major performance bottleneck. In this paper, we survey the state of the art in the design and analysis of external memory algorithms and data structures (which are sometimes referred to as "EM" or "I/O" or "out-of-core" algorithms and data structures). EM algorithms and data structures are often designed and analyzed using the parallel disk model (PDM). The three machine-independent measures of performance in PDM are the number of I/O operations, the CPU time, and the amount of disk space. PDM allows for multiple disks (or disk arrays) and parallel CPUs, and it can be generalized to handle tertiary storage and hierarchical memory. We discuss several important paradigms for how to solve batched and online problems efficiently in external memory. Programming tools and environments are available for simplifying the programming task. The TPIE system (Transparent Parallel I/O programming Environment) is both easy to use and efficient in terms of execution speed. We report on some experiments using TPIE in the domain of spatial databases. The newly developed EM algorithms and data structures that incorporate the paradigms we discuss are significantly faster than methods currently used in practice.
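The PDM cost measure mentioned in this abstract comes with a standard sorting bound, Θ((N/DB) · log_{M/B}(N/B)) I/Os. A minimal sketch of that formula follows; the formula is the model's well-known sorting bound, but the helper name and the example parameter values are invented for illustration.

```python
import math

def pdm_sort_ios(N, M, B, D=1):
    """I/O complexity of sorting in the Parallel Disk Model (PDM).

    N: number of items, M: internal memory size (in items),
    B: block size (in items), D: number of independent disks.
    The standard bound is Theta((N / (D * B)) * log_{M/B}(N / B)).
    """
    n = N / B          # input size measured in blocks
    m = M / B          # memory size measured in blocks
    return (N / (D * B)) * math.log(n, m)

# Illustrative parameters (not taken from the survey): one billion items,
# 128M items of memory, 512-item blocks, one disk.
print(round(pdm_sort_ios(N=10**9, M=2**27, B=512)))
```

Note how the logarithm base M/B, rather than 2, is what makes external sorting take only a few passes in practice: with realistic M and B the log factor is a small constant.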
A Functional Approach to External Graph Algorithms
 Algorithmica
, 1998
"... . We present a new approach for designing external graph algorithms and use it to design simple external algorithms for computing connected components, minimum spanning trees, bottleneck minimum spanning trees, and maximal matchings in undirected graphs and multigraphs. Our I/O bounds compete w ..."
Abstract

Cited by 109 (2 self)
We present a new approach for designing external graph algorithms and use it to design simple external algorithms for computing connected components, minimum spanning trees, bottleneck minimum spanning trees, and maximal matchings in undirected graphs and multigraphs. Our I/O bounds compete with those of previous approaches. Unlike previous approaches, ours is purely functional (without side effects) and is thus amenable to standard checkpointing and programming-language optimization techniques. This is an important practical consideration for applications that may take hours to run.

1 Introduction. We present a divide-and-conquer approach for designing external graph algorithms, i.e., algorithms on graphs that are too large to fit in main memory. Our approach is simple to describe and implement: it builds a succession of graph transformations that reduce to sorting, selection, and a recursive bucketing technique. No sophisticated data structures are needed. We apply our t...
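The side-effect-free style the abstract advertises can be sketched in memory: each step builds a new labeling rather than mutating the old one, which is what makes checkpointing straightforward. This is only an in-memory analogue of the connected-components transformation, not the external algorithm itself, and all function names are invented.

```python
def cc_step(labels, edges):
    """One purely functional relabeling step: every vertex adopts the
    smallest label seen across its incident edges. Returns a new dict;
    the input is never mutated, mirroring the paper's functional style."""
    best = dict(labels)
    for u, v in edges:
        lo = min(labels[u], labels[v])
        if lo < best[u]: best[u] = lo
        if lo < best[v]: best[v] = lo
    return best

def connected_components(vertices, edges):
    labels = {v: v for v in vertices}
    while True:
        nxt = cc_step(labels, edges)
        # Follow one level of indirection (label of label), a toy stand-in
        # for the paper's sorting/bucketing contraction passes.
        nxt = {v: nxt[nxt[v]] for v in vertices}
        if nxt == labels:
            return labels
        labels = nxt

print(connected_components([0, 1, 2, 3, 4], [(0, 1), (1, 2), (3, 4)]))
# {0: 0, 1: 0, 2: 0, 3: 3, 4: 3}
```

In the external setting, each such transformation would be realized by a constant number of sorts and scans of the edge list instead of dictionary lookups.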
The Buffer Tree: A Technique for Designing Batched External Data Structures
, 2003
"... We present a technique for designing external memory data structures that support batched operations I/O efficiently. We show how the technique can be used to develop external versions of a search tree, a priority queue, and a segment tree, and give examples of how these structures can be used to d ..."
Abstract

Cited by 75 (14 self)
We present a technique for designing external memory data structures that support batched operations I/O-efficiently. We show how the technique can be used to develop external versions of a search tree, a priority queue, and a segment tree, and give examples of how these structures can be used to develop I/O-efficient algorithms. The developed algorithms are either extremely simple or straightforward generalizations of known internal memory algorithms, given the developed external data structures.
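The core buffer-tree idea is lazy buffering: updates collect in a small in-memory buffer and hit the (simulated on-disk) structure in large batches, so each flush costs one scan instead of one random access per update. The toy class below illustrates only that batching principle with a single buffered level; it is an invented sketch, not the buffer tree's multi-level design.

```python
import bisect

class BufferedSearchTree:
    """Toy illustration of lazy buffering: capacity B stands in for the
    block size, and self.flushes counts how many batched leaf scans
    were needed instead of one disk access per insert."""
    def __init__(self, B=4):
        self.B = B
        self.buffer = []      # pending inserts (in memory)
        self.leaves = []      # sorted keys ("on disk")
        self.flushes = 0

    def insert(self, key):
        self.buffer.append(key)
        if len(self.buffer) >= self.B:
            self.flush()

    def flush(self):
        # One merge pass applies a whole batch of buffered updates.
        self.leaves = sorted(self.leaves + self.buffer)
        self.buffer = []
        self.flushes += 1

    def contains(self, key):
        self.flush()          # queries must see buffered updates
        i = bisect.bisect_left(self.leaves, key)
        return i < len(self.leaves) and self.leaves[i] == key

t = BufferedSearchTree(B=4)
for k in [9, 3, 7, 1, 5]:
    t.insert(k)
print(t.contains(5), t.flushes)   # True 2
```

In the real buffer tree the buffers sit at every internal node and flushing cascades downward, which is how the amortized O((1/B) log_{M/B}(N/B)) I/O cost per operation arises.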
Scalable sweeping-based spatial join
 IN PROC. 24TH INT. CONF. VERY LARGE DATA BASES, VLDB
, 1998
"... In this paper, we consider the filter step of the spatial join problem, for the case where neither of the inputs are indexed. We present a new algorithm, Scalable SweepingBased Spatial Join (SSSJ), that achieves both efficiency on reallife data and robustness against highly skewed and worstcase d ..."
Abstract

Cited by 69 (7 self)
In this paper, we consider the filter step of the spatial join problem, for the case where neither of the inputs is indexed. We present a new algorithm, Scalable Sweeping-Based Spatial Join (SSSJ), that achieves both efficiency on real-life data and robustness against highly skewed and worst-case data sets. The algorithm combines a method with theoretically optimal bounds on I/O transfers, based on the recently proposed distribution-sweeping technique, with a highly optimized implementation of internal-memory plane-sweeping. We present experimental results based on an efficient implementation of the SSSJ algorithm, and compare it to the state-of-the-art Partition-Based Spatial-Merge (PBSM) algorithm of Patel and DeWitt.
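The internal-memory plane-sweep that SSSJ builds on can be sketched briefly: sort rectangles by left x-coordinate, sweep a vertical line across, and test y-overlap only against rectangles currently crossing the line. This is a simplified sketch of the classical technique, not the SSSJ algorithm itself; rectangle encoding and names are invented.

```python
def sweep_join(rects_a, rects_b):
    """Report index pairs (i, j) of intersecting axis-aligned rectangles,
    one from each input. Rectangles are (x1, x2, y1, y2) tuples."""
    events = [(r[0], 0, i) for i, r in enumerate(rects_a)]
    events += [(r[0], 1, i) for i, r in enumerate(rects_b)]
    events.sort()                          # sweep by left x-coordinate
    active = ([], [])                      # rectangles crossing the sweep line
    out = []
    for x, side, i in events:
        r = (rects_a, rects_b)[side][i]
        other = 1 - side
        # Drop rectangles the sweep line has passed, then test y-overlap.
        active[other][:] = [j for j in active[other]
                            if (rects_a, rects_b)[other][j][1] >= x]
        for j in active[other]:
            s = (rects_a, rects_b)[other][j]
            if r[2] <= s[3] and s[2] <= r[3]:
                out.append((i, j) if side == 0 else (j, i))
        active[side].append(i)
    return out

A = [(0, 2, 0, 2), (5, 6, 0, 1)]
B = [(1, 3, 1, 4)]
print(sweep_join(A, B))   # [(0, 0)]
```

SSSJ's contribution is keeping this sweep I/O-efficient when the active sets themselves overflow memory, via distribution sweeping.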
Indexing Animated Objects Using Spatiotemporal Access Methods
 IEEE Transactions on Knowledge and Data Engineering
, 2001
"... AbstractÐWe present a new approach for indexing animated objects and efficiently answering queries about their position in time and space. In particular, we consider an animated movie as a spatiotemporal evolution. A movie is viewed as an ordered sequence of frames, where each frame is a 2D space oc ..."
Abstract

Cited by 54 (7 self)
We present a new approach for indexing animated objects and efficiently answering queries about their position in time and space. In particular, we consider an animated movie as a spatiotemporal evolution. A movie is viewed as an ordered sequence of frames, where each frame is a 2D space occupied by the objects that appear in that frame. The queries of interest are range queries of the form "find the objects that appear in area S between frames fi and fj" as well as nearest neighbor queries such as "find the q nearest objects to a given position A between frames fi and fj." The straightforward approach to indexing such objects considers the frame sequence as another dimension and uses a 3D access method (such as an R-Tree or its variants). This, however, assigns long "lifetime" intervals to objects that appear through many consecutive frames. Long intervals are difficult to cluster efficiently in a 3D index. Instead, we propose to reduce the problem to a partial-persistence problem. Namely, we use a 2D access method that is made partially persistent. We show that this approach leads to faster query performance while still using storage proportional to the total number of changes in the frame evolution. What differentiates this problem from traditional temporal indexing approaches is that objects are allowed to move and/or change their extent continuously between frames. We present novel methods to approximate such object evolutions. We formulate an optimization problem for which we provide an optimal solution for the case where objects move linearly. Finally, we present an extensive experimental study of the proposed methods. While we concentrate on animated movies, our approach is general and can be applied to other spatiotemporal applications as well.

Index Terms: Access methods, spatiotemporal databases, animated objects, multimedia.
Efficient Bulk Operations on Dynamic R-Trees
 ALGORITHMICA
, 2002
"... In recent years there has been an upsurge of interest in spatial databases. A major issue is how to manipulate efficiently massive amounts of spatial data stored on disk in multidimensional spatial indexes (data structures). Construction of spatial indexes (bulk loading) has been studied intensivel ..."
Abstract

Cited by 48 (9 self)
In recent years there has been an upsurge of interest in spatial databases. A major issue is how to efficiently manipulate massive amounts of spatial data stored on disk in multidimensional spatial indexes (data structures). Construction of spatial indexes (bulk loading) has been studied intensively in the database community. The continuous arrival of massive amounts of new data makes it important to update existing indexes (bulk updating) efficiently. In this paper we present a simple, yet efficient, technique for performing bulk update and query operations on multidimensional indexes. We present our technique in terms of the so-called R-tree and its variants, as they have emerged as practically efficient indexing methods for spatial data. Our method uses ideas from the buffer tree lazy buffering technique and fully utilizes the available internal memory and the page size of the operating system. We give a theoretical analysis of our technique, showing that it is efficient in terms of I/O communication, disk storage, and internal computation time. We also present the results of an extensive set of experiments showing that in practice our approach performs better than the previously best known bulk update methods with respect to update time, and that it produces a better quality index in terms of query performance. One important novel feature of our technique is that in most cases it allows us to perform a batch of updates and queries simultaneously. The ability to do so is essential in environments where queries have to be answered even while the index is being updated and reorganized.
I/O-efficient algorithms for contour-line extraction and planar graph blocking (Extended Abstract)
 IN PROCEEDINGS OF THE 10TH ACMSIAM SYMPOSIUM ON DISCRETE ALGORITHMS
, 1998
"... For a polyhedral terrain \Sigma, the contour at zcoordinate h, denoted Ch , is defined to be the intersection of the plane z = h with \Sigma. In this paper, we study the contourline extraction problem, where we want to preprocess \Sigma into a data structure so that given a query zcoordinate h, ..."
Abstract

Cited by 20 (1 self)
For a polyhedral terrain Σ, the contour at z-coordinate h, denoted C_h, is defined to be the intersection of the plane z = h with Σ. In this paper, we study the contour-line extraction problem, where we want to preprocess Σ into a data structure so that given a query z-coordinate h, we can report C_h quickly. This is a central problem that arises in geographic information systems (GIS), where terrains are often stored as Triangular Irregular Networks (TINs). We present an I/O-optimal algorithm for this problem which stores a terrain Σ with N vertices using O(N/B) blocks, where B is the size of a disk block, so that for any query h, the contour C_h can be computed using O(log_B N + |C_h|/B) I/O operations, where |C_h| denotes the size of C_h.
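The geometry of a single contour query is simple: each triangle whose z-range straddles h contributes one line segment, found by linearly interpolating along the two crossing edges. The brute-force sketch below shows only that geometry (the paper's contribution is avoiding the full scan); the function name and triangle encoding are invented.

```python
def contour_segments(triangles, h):
    """Brute-force contour extraction: intersect every triangle of a
    terrain with the plane z = h and return the resulting segments.
    Each triangle is a sequence of three (x, y, z) vertices."""
    segs = []
    for tri in triangles:
        pts = []
        for a, b in [(0, 1), (1, 2), (2, 0)]:
            (x1, y1, z1), (x2, y2, z2) = tri[a], tri[b]
            if (z1 - h) * (z2 - h) < 0:    # edge strictly crosses the plane
                t = (h - z1) / (z2 - z1)   # linear interpolation parameter
                pts.append((x1 + t * (x2 - x1), y1 + t * (y2 - y1)))
        if len(pts) == 2:
            segs.append(tuple(pts))
    return segs

tri = [((0, 0, 0), (2, 0, 2), (0, 2, 2))]
print(contour_segments(tri, 1.0))   # [((1.0, 0.0), (0.0, 1.0))]
```

The I/O-optimal structure instead stores the triangles so that exactly the O(|C_h|/B) blocks containing crossing triangles are read per query, after an O(log_B N) search.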
Sorting, searching, and simulation in the MapReduce framework
 in Proc. of the 22nd International Symp. on Algorithms and Computation
"... In this paper, we study the MapReduce framework from an algorithmic standpoint and demonstrate the usefulness of our approach by designing and analyzing efficient MapReduce algorithms for fundamental sorting, searching, and simulation problems. This study is motivated by a goal of ultimately puttin ..."
Abstract

Cited by 18 (5 self)
In this paper, we study the MapReduce framework from an algorithmic standpoint and demonstrate the usefulness of our approach by designing and analyzing efficient MapReduce algorithms for fundamental sorting, searching, and simulation problems. This study is motivated by a goal of ultimately putting the MapReduce framework on an equal theoretical footing with the well-known PRAM and BSP parallel models, which would benefit both the theory and practice of MapReduce algorithms. We describe efficient MapReduce algorithms for sorting, multi-searching, and simulations of parallel algorithms specified in the BSP and CRCW PRAM models. We also provide some applications of these results to problems in parallel computational geometry for the MapReduce framework, which result in efficient MapReduce algorithms for sorting, 2- and 3-dimensional convex hulls, and fixed-dimensional linear programming. For the case when mappers and reducers have a memory/message-I/O size of M = Θ(N^ε), for a small constant ε > 0, all of our MapReduce algorithms for these applications run in a constant number of rounds.
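A single round of MapReduce sorting can be simulated compactly: mappers route each value to the reducer owning its key range, reducers sort their buckets locally, and concatenating the buckets in reducer order yields a globally sorted output. This is a toy simulation of range-partition sorting, not the paper's algorithm; the fixed partition bounds are an invented simplification (real systems pick them by sampling).

```python
from collections import defaultdict

def mapreduce_sort(values, num_reducers=4, universe=100):
    """One simulated MapReduce round: range-partition shuffle + local sorts.
    Assumes values lie in [0, universe)."""
    width = universe / num_reducers
    shuffle = defaultdict(list)
    for v in values:                       # "map" phase + shuffle
        r = min(int(v // width), num_reducers - 1)
        shuffle[r].append(v)
    out = []
    for r in range(num_reducers):          # "reduce" phase: local sorts
        out.extend(sorted(shuffle[r]))
    return out

data = [42, 7, 93, 15, 68, 3, 77, 50]
print(mapreduce_sort(data))   # [3, 7, 15, 42, 50, 68, 77, 93]
```

The constant-round guarantee in the abstract corresponds to each reducer's bucket fitting in its M = Θ(N^ε) memory, so no recursive repartitioning rounds are needed.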
Research Challenges in LocationEnabled MServices
, 2002
"... Rapid, sustained advances in key computing hardware technologies combine to enable a new class of computing services that aim to meet needs of mobile users. These ubiquitous and intelligent services adapt to each user's particular preferences and current circumstancesthey are personalized. T ..."
Abstract

Cited by 11 (2 self)
Rapid, sustained advances in key computing hardware technologies combine to enable a new class of computing services that aim to meet the needs of mobile users. These ubiquitous and intelligent services adapt to each user's particular preferences and current circumstances: they are personalized. The services exploit data available from multiple sources, including data on past interactions with the users, data accessible via the Internet, and data obtained from sensors. The user's geographical location is particularly central to these services. We outline
LEDA-SM: Extending LEDA to secondary memory
 In Proc. Workshop on Algorithm Engineering
, 1999
"... Abstract. During the last years, many software libraries for incore computation have been developed. Most internal memory algorithms perform very badly when used in an external memory setting. We introduce LEDASM that extends the LEDAlibrary [22] towards secondary memory computation. LEDASM use ..."
Abstract

Cited by 10 (2 self)
In recent years, many software libraries for in-core computation have been developed. Most internal memory algorithms perform very badly when used in an external memory setting. We introduce LEDA-SM, which extends the LEDA library [22] towards secondary memory computation. LEDA-SM uses I/O-efficient algorithms and data structures that do not suffer from the so-called I/O bottleneck. LEDA is used for in-core computation. We explain the design of LEDA-SM and report on performance results.