Results 1  10
of
78
External Memory Algorithms and Data Structures
, 1998
"... Data sets in large applications are often too massive to fit completely inside the computer's internal memory. The resulting input/output communication (or I/O) between fast internal memory and slower external memory (such as disks) can be a major performance bottleneck. In this paper, we surve ..."
Abstract

Cited by 349 (23 self)
 Add to MetaCart
(Show Context)
Data sets in large applications are often too massive to fit completely inside the computer's internal memory. The resulting input/output communication (or I/O) between fast internal memory and slower external memory (such as disks) can be a major performance bottleneck. In this paper, we survey the state of the art in the design and analysis of external memory algorithms and data structures (which are sometimes referred to as "EM" or "I/O" or "outofcore" algorithms and data structures). EM algorithms and data structures are often designed and analyzed using the parallel disk model (PDM). The three machineindependent measures of performance in PDM are the number of I/O operations, the CPU time, and the amount of disk space. PDM allows for multiple disks (or disk arrays) and parallel CPUs, and it can be generalized to handle tertiary storage and hierarchical memory. We discuss several important paradigms for how to solve batched and online problems efficiently in external memory. Programming tools and environments are available for simplifying the programming task. The TPIE system (Transparent Parallel I/O programming Environment) is both easy to use and efficient in terms of execution speed. We report on some experiments using TPIE in the domain of spatial databases. The newly developed EM algorithms and data structures that incorporate the paradigms we discuss are significantly faster than methods currently used in practice.
ExternalMemory Algorithms for Processing Line Segments in Geographic Information Systems
, 2007
"... In the design of algorithms for largescale applications it is essential to consider the problem of minimizing I/O communication. Geographical information systems (GIS) are good examples of such largescale applications as they frequently handle huge amounts of spatial data. In this paper we develop ..."
Abstract

Cited by 67 (25 self)
 Add to MetaCart
In the design of algorithms for largescale applications it is essential to consider the problem of minimizing I/O communication. Geographical information systems (GIS) are good examples of such largescale applications as they frequently handle huge amounts of spatial data. In this paper we develop efficient externalmemory algorithms for a number of important problems involving line segments in the plane, including trapezoid decomposition, batched planar point location, triangulation, red–blue line segment intersection reporting, and general line segment intersection reporting. In GIS systems the first three problems are useful for rendering and modeling, and the latter two are frequently used for overlaying maps and extracting information from them.
Cacheoblivious priority queue and graph algorithm applications
 In Proc. 34th Annual ACM Symposium on Theory of Computing
, 2002
"... In this paper we develop an optimal cacheoblivious priority queue data structure, supporting insertion, deletion, and deletemin operations in O ( 1 B logM/B N) amortized memory B transfers, where M and B are the memory and block transfer sizes of any two consecutive levels of a multilevel memory hi ..."
Abstract

Cited by 65 (8 self)
 Add to MetaCart
In this paper we develop an optimal cacheoblivious priority queue data structure, supporting insertion, deletion, and deletemin operations in O ( 1 B logM/B N) amortized memory B transfers, where M and B are the memory and block transfer sizes of any two consecutive levels of a multilevel memory hierarchy. In a cacheoblivious data structure, M and B are not used in the description of the structure. The bounds match the bounds of several previously developed externalmemory (cacheaware) priority queue data structures, which all rely crucially on knowledge about M and B. Priority queues are a critical component in many of the best known externalmemory graph algorithms, and using our cacheoblivious priority queue we develop several cacheoblivious graph algorithms.
Optimal external memory interval management
 SIAM Journal on Computing
"... This work has been made available by the University of Kansas ..."
Abstract

Cited by 48 (7 self)
 Add to MetaCart
(Show Context)
This work has been made available by the University of Kansas
Logarithmic lower bounds in the cellprobe model
 SIAM Journal on Computing
"... Abstract. We develop a new technique for proving cellprobe lower bounds on dynamic data structures. This enables us to prove Ω(lg n) bounds, breaking a longstanding barrier of Ω(lg n/lg lg n). We can also prove the first Ω(lgB n) lower bound in the external memory model, without assumptions on the ..."
Abstract

Cited by 47 (3 self)
 Add to MetaCart
(Show Context)
Abstract. We develop a new technique for proving cellprobe lower bounds on dynamic data structures. This enables us to prove Ω(lg n) bounds, breaking a longstanding barrier of Ω(lg n/lg lg n). We can also prove the first Ω(lgB n) lower bound in the external memory model, without assumptions on the data structure. We use our technique to prove better bounds for the partialsums problem, dynamic connectivity and (by reductions) other dynamic graph problems. Our proofs are surprisingly simple and clean. The bounds we obtain are often optimal, and lead to a nearly complete understanding of the problems. We also present new matching upper bounds for the partialsums problem. Key words. cellprobe complexity, lower bounds, data structures, dynamic graph problems, partialsums problem AMS subject classification. 68Q17
LazyAdaptive Tree: An Optimized Index Structure for Flash Devices
 PVLDB
"... Flash memories are in ubiquitous use for storage on sensor nodes, mobile devices, and enterprise servers. However, they present significant challenges in designing tree indexes due to their fundamentally different read and write characteristics in comparison to magnetic disks. In this paper, we pres ..."
Abstract

Cited by 38 (0 self)
 Add to MetaCart
(Show Context)
Flash memories are in ubiquitous use for storage on sensor nodes, mobile devices, and enterprise servers. However, they present significant challenges in designing tree indexes due to their fundamentally different read and write characteristics in comparison to magnetic disks. In this paper, we present the LazyAdaptive Tree (LATree), a novel index structure that is designed to improve performance by minimizing accesses to flash. The LAtree has three key features: 1) it amortizes the cost of node reads and writes by performing update operations in a lazy manner using cascaded buffers, 2) it dynamically adapts buffer sizes to workload using an online algorithm, which we prove to be optimal under the cost model for raw NAND flashes, and 3) it optimizes index parameters, memory management, and storage reclamation to address flash constraints. Our performance results on raw NAND flashes show that the LATree achieves 2 × to 12 × gains over the best of alternate schemes across a range of workloads and memory constraints. Initial results on SSDs are also promising, with 3× to 6 × gains in most cases. 1.
On externalmemory MST, SSSP and multiway planar graph separation
 In Proc. 8th Scandinavian Workshop on Algorithmic Theory, volume 1851 of LNCS
, 2000
"... Recently external memory graph algorithms have received considerable attention because massive graphs arise naturally in many applications involving massive data sets. Even though a large number of I/Oefficient graph algorithms have been developed, a number of fundamental problems still remain open ..."
Abstract

Cited by 30 (2 self)
 Add to MetaCart
Recently external memory graph algorithms have received considerable attention because massive graphs arise naturally in many applications involving massive data sets. Even though a large number of I/Oefficient graph algorithms have been developed, a number of fundamental problems still remain open. In this paper we develop an improved algorithm for the problem of computing a minimum spanning tree of a general graph, as well as new algorithms for the single source shortest paths and the multiway graph separation problems on planar graphs.
TerraStream: From elevation data to watershed hierarchies
 PROC. ACM SYMPOS. ON ADVANCES IN GEOGRAPHIC INFORMATION SYSTEMS
"... We consider the problem of extracting a river network and a watershed hierarchy from a terrain given as a set of irregularly spaced points. We describe TerraStream, a “pipelined” solution that consists of four main stages: construction of a digital elevation model (DEM), hydrological conditioning, e ..."
Abstract

Cited by 21 (12 self)
 Add to MetaCart
(Show Context)
We consider the problem of extracting a river network and a watershed hierarchy from a terrain given as a set of irregularly spaced points. We describe TerraStream, a “pipelined” solution that consists of four main stages: construction of a digital elevation model (DEM), hydrological conditioning, extraction of river networks, and construction of a watershed hierarchy. Our approach has several advantages over existing methods. First, we design and implement the pipeline so each stage is scalable to massive data sets; a single nonscalable stage would create a bottleneck and limit overall scalability. Second, we develop the algorithms in a general framework so that they work for both TIN and grid DEMs. TerraStream is flexible and allows users to choose from various models and parameters, yet our pipeline is designed to reduce (or eliminate) the need for manual intervention between stages. We have implemented TerraStream and present experimental results on real elevation point sets that show that our approach handles massive multigigabyte terrain data sets. For example, we can process a data set containing over 300 million points—over 20GB of raw data—in under 26 hours, where most of the time (76%) is spent in the initial CPUintensive DEM construction stage.
Counting Inversions, Offline Orthogonal Range Counting, and Related Problems
"... We give an O(n √ lg n)time algorithm for counting the number of inversions in a permutation on n elements. This improves a longstanding previous bound of O(n lg n / lg lg n) that followed from Dietz’s data structure [WADS’89], and answers a question of Andersson and Petersson [SODA’95]. As Dietz’s ..."
Abstract

Cited by 16 (4 self)
 Add to MetaCart
We give an O(n √ lg n)time algorithm for counting the number of inversions in a permutation on n elements. This improves a longstanding previous bound of O(n lg n / lg lg n) that followed from Dietz’s data structure [WADS’89], and answers a question of Andersson and Petersson [SODA’95]. As Dietz’s result is known to be optimal for the related dynamic rank problem, our result demonstrates a significant improvement in the offline setting. Our new technique is quite simple: we perform a “vertical partitioning ” of a trie (akin to van Emde Boas trees), and use ideas from external memory. However, the technique finds numerous applications: for example, we obtain • in d dimensions, an algorithm to answer n offline orthogonal range counting queries in time O(n lg d−2+1/d n); • an improved construction time for online data structures for orthogonal range counting; • an improved update time for the partial sums problem; • faster Word RAM algorithms for finding the maximum depth in an arrangement of axisaligned rectangles, and for the slope selection problem. As a bonus, we also give a simple (1 + ε)approximation algorithm for counting inversions that runs in linear time, improving the previous O(n lg lg n) bound by Andersson and Petersson.
I/Oefficient Point Location using Persistent BTrees
"... We present an external planar point location data structure that is I/Oefficient both in theory and practice. The developed structure uses linear space and answers a query in optimal O(log B N) I/Os, where B is the disk block size. It is based on a persistent Btree, and all previously developed su ..."
Abstract

Cited by 14 (5 self)
 Add to MetaCart
(Show Context)
We present an external planar point location data structure that is I/Oefficient both in theory and practice. The developed structure uses linear space and answers a query in optimal O(log B N) I/Os, where B is the disk block size. It is based on a persistent Btree, and all previously developed such structures assume a total order on the elements in the structure. As a theoretical result of independent interest, we show how to remove this assumption. Most previous theoretical I/Oefficient planer point location structures are relatively complicated and have not been implemented. Based on a bucket approach, Vahrenhold and Hinrichs therefore developed a simple and practical, but theoretically nonoptimal, heuristic structure. We present an extensive experimental evaluation that shows that on a range of realworld Geographic Information Systems (GIS) data, our structure uses fewer I/Os than the structure of Vahrenhold and Hinrichs to answer a query. On a synthetically generated worstcase dataset, our structure uses significantly fewer I/Os.