Results 1  10
of
266
External Memory Algorithms and Data Structures
, 1998
"... Data sets in large applications are often too massive to fit completely inside the computer's internal memory. The resulting input/output communication (or I/O) between fast internal memory and slower external memory (such as disks) can be a major performance bottleneck. In this paper, we surve ..."
Abstract

Cited by 349 (23 self)
 Add to MetaCart
(Show Context)
Data sets in large applications are often too massive to fit completely inside the computer's internal memory. The resulting input/output communication (or I/O) between fast internal memory and slower external memory (such as disks) can be a major performance bottleneck. In this paper, we survey the state of the art in the design and analysis of external memory algorithms and data structures (which are sometimes referred to as "EM" or "I/O" or "outofcore" algorithms and data structures). EM algorithms and data structures are often designed and analyzed using the parallel disk model (PDM). The three machineindependent measures of performance in PDM are the number of I/O operations, the CPU time, and the amount of disk space. PDM allows for multiple disks (or disk arrays) and parallel CPUs, and it can be generalized to handle tertiary storage and hierarchical memory. We discuss several important paradigms for how to solve batched and online problems efficiently in external memory. Programming tools and environments are available for simplifying the programming task. The TPIE system (Transparent Parallel I/O programming Environment) is both easy to use and efficient in terms of execution speed. We report on some experiments using TPIE in the domain of spatial databases. The newly developed EM algorithms and data structures that incorporate the paradigms we discuss are significantly faster than methods currently used in practice.
Approximation Algorithms for Projective Clustering
 Proceedings of the ACM SIGMOD International Conference on Management of data, Philadelphia
, 2000
"... We consider the following two instances of the projective clustering problem: Given a set S of n points in R d and an integer k ? 0; cover S by k hyperstrips (resp. hypercylinders) so that the maximum width of a hyperstrip (resp., the maximum diameter of a hypercylinder) is minimized. Let w ..."
Abstract

Cited by 302 (22 self)
 Add to MetaCart
We consider the following two instances of the projective clustering problem: Given a set S of n points in R d and an integer k ? 0; cover S by k hyperstrips (resp. hypercylinders) so that the maximum width of a hyperstrip (resp., the maximum diameter of a hypercylinder) is minimized. Let w be the smallest value so that S can be covered by k hyperstrips (resp. hypercylinders), each of width (resp. diameter) at most w : In the plane, the two problems are equivalent. It is NPHard to compute k planar strips of width even at most Cw ; for any constant C ? 0 [50]. This paper contains four main results related to projective clustering: (i) For d = 2, we present a randomized algorithm that computes O(k log k) strips of width at most 6w that cover S. Its expected running time is O(nk 2 log 4 n) if k 2 log k n; it also works for larger values of k, but then the expected running time is O(n 2=3 k 8=3 log 4 n). We also propose another algorithm that computes a c...
On Indexing Mobile Objects
, 1999
"... We show how to index mobile objects in one and two dimensions using efficient dynamic external memory data structures. The problem is motivated by real life applications in traffic monitoring, intelligent navigation and mobile communications domains. For the 1dimensional case, we give (i) a dynamic ..."
Abstract

Cited by 217 (16 self)
 Add to MetaCart
(Show Context)
We show how to index mobile objects in one and two dimensions using efficient dynamic external memory data structures. The problem is motivated by real life applications in traffic monitoring, intelligent navigation and mobile communications domains. For the 1dimensional case, we give (i) a dynamic, external memory algorithm with guaranteed worst case performance and linear space and (ii) a practical approximation algorithm also in the dynamic, external memory setting, which has linear space and expected logarithmic query time. We also give an algorithm with guaranteed logarithmic query time for a restricted version of the problem. We present extensions of our techniques to two dimensions. In addition we give a lower bound on the number of I/O's needed to answer the ddimensional problem. Initial experimental results and comparisons to traditional indexing approaches are also included. 1 Introduction Traditional database management systems assume that data stored in the database rem...
Indexing moving points
, 2003
"... We propose three indexing schemes for storing a set S of N points in the plane, each moving along a linear trajectory, so that any query of the following form can be answered quickly: Given a rectangle R and a real value t; report all K points of S that lie inside R at time t: We first present an in ..."
Abstract

Cited by 184 (11 self)
 Add to MetaCart
We propose three indexing schemes for storing a set S of N points in the plane, each moving along a linear trajectory, so that any query of the following form can be answered quickly: Given a rectangle R and a real value t; report all K points of S that lie inside R at time t: We first present an indexing structure that, for any given constant e> 0; uses OðN=BÞ disk blocks and answers a query in OððN=BÞ 1=2þe þ K=BÞ I/Os, where B is the block size. It can also report all the points of S that lie inside R during a given time interval. A point can be inserted or deleted, or the trajectory of a point can be changed, in Oðlog 2 B NÞ I/Os. Next, we present a general approach that improves the query time if the queries arrive in chronological order, by allowing the index to evolve over time. We obtain a tradeoff between the query time and the number of times the index needs to be updated as the points move. We also describe an indexing scheme in which the number of I/Os required to answer a query depends monotonically on the difference between the query time stamp t and the current time. Finally, we develop an efficient indexing scheme to answer approximate
Efficient Indexing Methods for Probabilistic Threshold Queries over Uncertain Data
 Proc. 30th Int’l Conf. Very Large Data Bases (VLDB
, 2004
"... It is infeasible for a sensor database to contain the exact value of each sensor at all points in time. This uncertainty is inherent in these systems due to measurement and sampling errors, and resource limitations. In order to avoid drawing erroneous conclusions based upon stale data, the use of un ..."
Abstract

Cited by 123 (22 self)
 Add to MetaCart
(Show Context)
It is infeasible for a sensor database to contain the exact value of each sensor at all points in time. This uncertainty is inherent in these systems due to measurement and sampling errors, and resource limitations. In order to avoid drawing erroneous conclusions based upon stale data, the use of uncertainty intervals that model each data item as a range and associated probability density function (pdf) rather than a single value has recently been proposed. Querying these uncertain data introduces imprecision into answers, in the form of probability values that specify the likeliness the answer satisfies the query. These queries are more expensive to evaluate than their traditional counterparts but are guaranteed to be correct and more informative due to the probabilities accompanying the answers. Although the answer probabilities are useful, for many applications, it is only necessary to know whether the probability exceeds a given threshold – we term these Probabilistic Threshold Queries (PTQ). In this paper we address the efficient computation of these types of queries. In particular, we develop two index structures and associated algorithms to efficiently answer PTQs. The first index scheme is based on the idea of augmenting uncertainty information to an Rtree. We establish the difficulty
Algorithms and Data Structures for External Memory
, 2006
"... Data sets in large applications are often too massive to fit completely inside the computer's internal memory. The resulting input/output communication (or I/O) between fast internal memory and slower external memory (such as disks) can be a major performance bottleneck. Algorithms and Data Str ..."
Abstract

Cited by 84 (6 self)
 Add to MetaCart
(Show Context)
Data sets in large applications are often too massive to fit completely inside the computer's internal memory. The resulting input/output communication (or I/O) between fast internal memory and slower external memory (such as disks) can be a major performance bottleneck. Algorithms and Data Structures for External Memory surveys the state of the art in the design and analysis of external memory (or EM) algorithms and data structures, where the goal is to exploit locality in order to reduce the I/O costs. A variety of EM paradigms are considered for solving batched and online problems efficiently in external memory. Algorithms and Data Structures for External Memory describes several useful paradigms for the design and implementation of efficient EM algorithms and data structures. The problem domains considered include sorting, permuting, FFT, scientific computing, computational geometry, graphs, databases, geographic information systems, and text and string processing. Algorithms and Data Structures for External Memory is an invaluable reference for anybody interested in, or conducting research in the design, analysis, and implementation of algorithms and data structures. This book is originally published as
Arrangements and Their Applications
 Handbook of Computational Geometry
, 1998
"... The arrangement of a finite collection of geometric objects is the decomposition of the space into connected cells induced by them. We survey combinatorial and algorithmic properties of arrangements of arcs in the plane and of surface patches in higher dimensions. We present many applications of arr ..."
Abstract

Cited by 81 (17 self)
 Add to MetaCart
(Show Context)
The arrangement of a finite collection of geometric objects is the decomposition of the space into connected cells induced by them. We survey combinatorial and algorithmic properties of arrangements of arcs in the plane and of surface patches in higher dimensions. We present many applications of arrangements to problems in motion planning, visualization, range searching, molecular modeling, and geometric optimization. Some results involving planar arrangements of arcs have been presented in a companion chapter in this book, and are extended in this chapter to higher dimensions. Work by P.A. was supported by Army Research Office MURI grant DAAH049610013, by a Sloan fellowship, by an NYI award, and by a grant from the U.S.Israeli Binational Science Foundation. Work by M.S. was supported by NSF Grants CCR9122103 and CCR9311127, by a MaxPlanck Research Award, and by grants from the U.S.Israeli Binational Science Foundation, the Israel Science Fund administered by the Israeli Ac...
External Memory Data Structures
, 2001
"... In many massive dataset applications the data must be stored in space and query efficient data structures on external storage devices. Often the data needs to be changed dynamically. In this chapter we discuss recent advances in the development of provably worstcase efficient external memory dynami ..."
Abstract

Cited by 73 (31 self)
 Add to MetaCart
In many massive dataset applications the data must be stored in space and query efficient data structures on external storage devices. Often the data needs to be changed dynamically. In this chapter we discuss recent advances in the development of provably worstcase efficient external memory dynamic data structures. We also briefly discuss some of the most popular external data structures used in practice.
Indexing Text using the ZivLempel Trie
 Journal of Discrete Algorithms
, 2002
"... Let a text of u characters over an alphabet of size be compressible to n symbols by the LZ78 or LZW algorithm. We show that it is possible to build a data structure based on the ZivLempel trie that takes 4n log 2 n(1+o(1)) bits of space and reports the R occurrences of a pattern of length m in ..."
Abstract

Cited by 69 (44 self)
 Add to MetaCart
(Show Context)
Let a text of u characters over an alphabet of size be compressible to n symbols by the LZ78 or LZW algorithm. We show that it is possible to build a data structure based on the ZivLempel trie that takes 4n log 2 n(1+o(1)) bits of space and reports the R occurrences of a pattern of length m in worst case time O(m log(m)+(m+R)log n).
Range Searching
, 1996
"... Range searching is one of the central problems in computational geometry, because it arises in many applications and a wide variety of geometric problems can be formulated as a rangesearching problem. A typical rangesearching problem has the following form. Let S be a set of n points in R d , an ..."
Abstract

Cited by 69 (1 self)
 Add to MetaCart
Range searching is one of the central problems in computational geometry, because it arises in many applications and a wide variety of geometric problems can be formulated as a rangesearching problem. A typical rangesearching problem has the following form. Let S be a set of n points in R d , and let R be a family of subsets; elements of R are called ranges . We wish to preprocess S into a data structure so that for a query range R, the points in S " R can be reported or counted efficiently. Typical examples of ranges include rectangles, halfspaces, simplices, and balls. If we are only interested in answering a single query, it can be done in linear time, using linear space, by simply checking for each point p 2 S whether p lies in the query range.