Approximating Multi-Dimensional Aggregate Range Queries Over Real Attributes
, 2000
Cited by 85 (9 self)
Finding approximate answers to multidimensional range queries over real-valued attributes has significant applications in data exploration and database query optimization. In this paper we consider the following problem: given a table of d attributes whose domain is the real numbers, and a query that specifies a range in each dimension, find a good approximation of the number of records in the table that satisfy the query. We present a new histogram technique that is designed to approximate the density of multidimensional datasets with real attributes. Our technique finds buckets of variable size and allows the buckets to overlap. Overlapping buckets allow a more efficient approximation of the density. The size of the cells is based on the local density of the data. This technique leads to a faster and more compact approximation of the data distribution. We also show how to generalize kernel density estimators, and how to apply them on the multidimensional query approxim...
On rectangular partitionings in two dimensions: algorithms, complexity and applications
, 1998
Cited by 57 (7 self)
Partitioning a multidimensional data set into rectangular partitions subject to certain constraints is an important problem that arises in many database applications, including histogram-based selectivity estimation, load balancing, and construction of index structures. While provably optimal and efficient algorithms exist for partitioning one-dimensional data, the multidimensional problem has received less attention, except for a few special cases. As a result, the heuristic partitioning techniques that are used in practice are not well understood and come with no guarantees on the quality of the solution. In this paper, we present algorithmic and complexity-theoretic results for the fundamental problem of partitioning a two-dimensional array into rectangular tiles of arbitrary size in a way that minimizes the number of tiles required to satisfy a given constraint. Our main results are approximation algorithms for several partitioning problems that provably approximate the optimal solutions within small constant factors and run in linear or close to linear time. We also establish the NP-hardness of several partitioning problems; it is therefore unlikely that there are efficient (i.e., polynomial-time) algorithms for solving these problems exactly. We also discuss a few applications in which partitioning problems arise. One of the applications is the problem of constructing multidimensional histograms. Our results, for example, give an efficient algorithm to construct the V-Optimal histograms, which are known to be the most accurate histograms in several selectivity estimation problems. Our algorithms are the first to provide guaranteed bounds on the quality of the solution.
Polynomial-Time Approximation Schemes for Packing and Piercing Fat Objects
 JOURNAL OF ALGORITHMS
, 2001
Cited by 50 (5 self)
We consider two problems: given a collection of n fat objects in a fixed dimension, (1) packing: find the maximum subcollection of pairwise disjoint objects, and (2) piercing: find the minimum point set that intersects every object. Recently, Erlebach, Jansen, and Seidel gave a polynomial-time approximation scheme (PTAS) for the packing problem, based on a shifted hierarchical subdivision method. Using shifted quadtrees, we describe a similar algorithm for packing but with a smaller time bound. Erlebach et al.'s algorithm requires polynomial space. We describe a different algorithm, based on geometric separators, that requires only linear space. This algorithm can also be applied to piercing, yielding the first PTAS for that problem. Abbreviated title: Packing and Piercing Fat Objects.
Maximum Independent Set of Rectangles
Cited by 21 (0 self)
We study the Maximum Independent Set of Rectangles (MISR) problem: given a collection R of n axis-parallel rectangles, find a maximum-cardinality subset of disjoint rectangles. MISR is a special case of the classical Maximum Independent Set problem, where the input is restricted to intersection graphs of axis-parallel rectangles. Due to its many applications, ranging from map labeling to data mining, MISR has received a significant amount of attention from various research communities. Since the problem is NP-hard, the main focus has been on the design of approximation algorithms. Several groups of researchers have independently suggested O(log n)-approximation algorithms for MISR, and this has remained the best known approximation factor for the problem. The main result of our paper is an O(log log n)-approximation algorithm for MISR. Our algorithm combines existing approaches for solving special cases of the problem, in which the input set of rectangles is restricted to containing specific intersection types, with new insights into the combinatorial structure of sets of intersecting rectangles in the plane. We also consider a generalization of MISR to higher dimensions, where rectangles are replaced by d-dimensional hyper-rectangles. Our results for MISR imply an O((log n)^{d-2} log log n)-approximation algorithm for this problem, improving upon the best previously known O((log n)^{d-1})-approximation.
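To make the MISR problem statement concrete, here is a minimal Python sketch that solves tiny instances by exhaustive search. It is not the approximation algorithm from the paper; it only illustrates the definition. The tuple layout `(x_min, y_min, x_max, y_max)`, the choice to treat rectangles as closed sets, and the function names are all assumptions made for illustration.

```python
from itertools import combinations
from typing import List, Tuple

# Assumed representation: (x_min, y_min, x_max, y_max), closed rectangles.
Rect = Tuple[float, float, float, float]


def disjoint(a: Rect, b: Rect) -> bool:
    """True if two axis-parallel rectangles share no point."""
    return a[2] < b[0] or b[2] < a[0] or a[3] < b[1] or b[3] < a[1]


def max_independent_rectangles(rects: List[Rect]) -> List[Rect]:
    """Exhaustive search, largest subsets first (exponential time)."""
    for size in range(len(rects), 0, -1):
        for subset in combinations(rects, size):
            if all(disjoint(a, b) for a, b in combinations(subset, 2)):
                return list(subset)
    return []


if __name__ == "__main__":
    rects = [(0, 0, 2, 2), (1, 1, 3, 3), (4, 0, 5, 1), (2.5, 2.5, 6, 6)]
    print(max_independent_rectangles(rects))
```

The search is exponential in n, which is consistent with the NP-hardness mentioned above and is why the literature focuses on approximation algorithms.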
Selectivity estimators for multidimensional range queries over real attributes
 The VLDB Journal
, 2005
Cited by 17 (0 self)
Estimating the selectivity of multidimensional range queries over real-valued attributes has significant applications in data exploration and database query optimization. In this paper we consider the following problem: given a table of d attributes whose domain is the real numbers, and a query that specifies a range in each dimension, find a good approximation of the number of records in the table that satisfy the query. The simplest approach to this problem is to assume that the attributes are independent. More accurate estimators try to capture the joint data distribution of the attributes. In databases, such estimators include multidimensional histograms, random sampling, and the wavelet transform. In statistics, kernel estimation techniques are used. Many traditional approaches assume that attribute values come from discrete, finite domains, where different values have high frequencies. However, for many novel applications (as in temporal, spatial and multimedia databases), attribute values come from the infinite domain of real numbers. Consequently, each value appears very infrequently, a characteristic that affects the behavior and effectiveness of the estimator. Moreover, real-life data exhibit attribute correlations, which also affect the estimator. We present a new histogram technique that is designed to approximate the density of multidimensional datasets with real attributes. Our technique defines buckets of variable size and allows the buckets to overlap. The size of the cells is based on the local density of the data. The use of overlapping buckets allows a more compact approximation of the data distribution. We also show how to generalize kernel density estimators, and how to apply them on the multidimensional query approximation problem. Finally, we compare the accuracy of the proposed techniques with existing techniques using real and synthetic datasets. The experimental results show that the proposed techniques are more accurate in high dimensions than previous approaches.
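The abstract above names attribute independence as the simplest estimator and notes that correlations affect accuracy. The following Python sketch illustrates that baseline only, not the histogram or kernel techniques proposed in the paper. The function names, the in-memory list-of-tuples table, and the closed range bounds are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Range = Tuple[float, float]  # assumed closed interval [lo, hi] per dimension


def exact_count(rows: Sequence[Sequence[float]], query: List[Range]) -> int:
    """Number of rows that fall inside the query range in every dimension."""
    return sum(
        all(lo <= value <= hi for value, (lo, hi) in zip(row, query))
        for row in rows
    )


def independence_estimate(rows: Sequence[Sequence[float]], query: List[Range]) -> float:
    """Estimate the count as n times the product of per-dimension selectivities."""
    n = len(rows)
    if n == 0:
        return 0.0
    estimate = float(n)
    for dim, (lo, hi) in enumerate(query):
        matching = sum(1 for row in rows if lo <= row[dim] <= hi)
        estimate *= matching / n
    return estimate


if __name__ == "__main__":
    # Perfectly correlated data: both attributes are always equal.
    rows = [(i / 10, i / 10) for i in range(10)]
    query = [(0.0, 0.45), (0.5, 1.0)]
    print(exact_count(rows, query), independence_estimate(rows, query))
```

On this correlated toy table the query region contains no rows, yet the independence assumption predicts 2.5, which is the kind of error that estimators capturing the joint distribution aim to reduce.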
On multidimensional k-anonymity with local recoding generalization
 In Proc. of International Conference on Data Engineering (ICDE)
, 2007
Cited by 17 (3 self)
This paper presents the first theoretical study on using local-recoding generalization (LRG) to compute a k-anonymous table with quality guarantee. First, we prove that it is NP-hard both to find the table with the maximum quality, and to discover a solution with an approximation ratio at most 5/4. Then, we develop an algorithm with good balance between the approximation ratio and time complexity. The quality of our solution is verified by experiments.
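For readers unfamiliar with the term, a table is k-anonymous when every combination of quasi-identifier values it contains is shared by at least k rows. The Python sketch below checks that property on an already generalized table; it does not implement the paper's LRG algorithm. The column names, the dict-per-row layout, and the function name are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence


def is_k_anonymous(rows: List[Dict[str, str]], quasi_identifiers: Sequence[str], k: int) -> bool:
    """True if every quasi-identifier combination occurs in at least k rows."""
    groups = Counter(tuple(row[col] for col in quasi_identifiers) for row in rows)
    return all(count >= k for count in groups.values())


if __name__ == "__main__":
    # Hypothetical table after generalization (ages as ranges, ZIP codes truncated).
    table = [
        {"age": "20-29", "zip": "537**", "disease": "flu"},
        {"age": "20-29", "zip": "537**", "disease": "cold"},
        {"age": "30-39", "zip": "538**", "disease": "flu"},
        {"age": "30-39", "zip": "538**", "disease": "asthma"},
    ]
    print(is_k_anonymous(table, ["age", "zip"], k=2))
    print(is_k_anonymous(table, ["age", "zip"], k=3))
```

The check itself is cheap; the hard part, as the abstract states, is choosing the generalization that satisfies it with the best quality.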
Improved Approximation Algorithms for Rectangle Tiling and Packing (Extended Abstract)
 Proc. 12th ACM-SIAM Symp. on Disc. Alg
, 2001
Cited by 17 (3 self)
We study several rectangle tiling and packing problems. These are natural combinatorial problems that arise in many applications in databases, parallel computing and image processing. We present new approximation algorithms for these problems. In contrast to the previously known results, we meet a crucial demand of most of these applications, namely, our algorithms work on sparse inputs and/or high dimensions very efficiently. In addition, our algorithms have better approximation bounds than known algorithms. Furthermore, the algorithms are simple to implement. In what follows, we will first formally define the problems before presenting our results. The Rectangle Tiling and Packing Problems: We study the following two classes of problems. RTILE problem. Given a two-dimensional array A of size n × n containing nonnegative integers, partition A into at most p rectangular tiles so that the maximum weight of any tile is minimized (a tile is any rectangular ...
Geometric Algorithms for Optimal Airspace Design and Air Traffic Controller Workload Balancing
Cited by 16 (3 self)
The National Airspace System (NAS) is designed to accommodate a large number of flights over North America. For purposes of workload limitations for air traffic controllers, the airspace is partitioned into approximately 600 sectors; each sector is observed by one or more controllers. In order to satisfy workload limitations for controllers, it is important that sectors be designed carefully according to the traffic patterns of flights, so that no sector becomes overloaded. We formulate and study the airspace sectorization problem from an algorithmic point of view, modeling the problem of optimal sectorization as a geometric partition problem with constraints. The novelty of the problem is that it partitions data consisting of trajectories of moving points, rather than the static point sets that are commonly studied. First, we formulate and solve the 1D version of the problem, showing how to partition a line into “sectors” (intervals) according to historical trajectory data. Then, we apply the 1D solution framework to design a 2D sectorization heuristic based on binary space partitions. We also devise partitions based on balanced “pie partitions” of a convex polygon. We evaluate our 2D algorithms experimentally, using actual historical flight track data for the NAS as the basis of our partitioning. We compare the workload balance of our methods to that of the existing set of sectors for the NAS and find that our resectorization yields competitive and improved workload balancing. In particular, our methods yield an improvement by a factor between 2 and 3 over the current sectorization in terms of the time-average and the worst-case workloads of the maximum workload sector. An even better improvement is seen in the standard deviations (over all sectors) of both time-average and worst-case workloads. Copyright © by SIAM. Unauthorized reproduction of this article is prohibited.
A constant factor approximation algorithm for unsplittable flow on paths
 In Proceedings of the 52nd Annual IEEE Symposium on Foundations of Computer Science (FOCS 2011)
, 2011