Semantics of ranking queries for probabilistic data and expected ranks
 In Proc. of ICDE’09
, 2009
Cited by 63 (1 self)
When dealing with massive quantities of data, top-k queries are a powerful technique for returning only the k most relevant tuples for inspection, based on a scoring function. The problem of efficiently answering such ranking queries has been studied and analyzed extensively within traditional database settings. The importance of the top-k is perhaps even greater in probabilistic databases, where a relation can encode exponentially many possible worlds. There have been several recent attempts to propose definitions and algorithms for ranking queries over probabilistic data. However, these all lack many of the intuitive properties of a top-k over deterministic data. Specifically, we define a number of fundamental properties, including exact-k, containment, unique-rank, value-invariance, and stability, which are all satisfied by ranking queries on certain data. We argue that all these conditions should also be fulfilled by any reasonable definition for ranking uncertain data. Unfortunately, none of the existing definitions is able to achieve this. To remedy this shortcoming, this work proposes an intuitive new approach of expected rank. This uses the well-founded notion of the expected rank of each tuple across all possible worlds as the basis of the ranking. We are able to prove that, in contrast to all existing approaches, the expected rank satisfies all the required properties for a ranking query. We provide efficient solutions to compute this ranking across the major models of uncertain data, such as attribute-level and tuple-level uncertainty. For an uncertain relation of N tuples, the processing cost is O(N log N), no worse than simply sorting the relation. In settings where there is a high cost for generating each tuple in turn, we provide pruning techniques based on probabilistic tail bounds that can terminate the search early and guarantee that the top-k has been found. Finally, a comprehensive experimental study confirms the effectiveness of our approach.
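The expected-rank idea in this abstract can be illustrated with a small sketch. It assumes the attribute-level model with independent tuples, each carrying a discrete score distribution, and it defines a tuple's rank in a world as the number of tuples that score strictly higher (ties are ignored here). The names `ScorePdf`, `prob_greater`, `expected_ranks`, and `top_k` are hypothetical, and the pairwise summation is quadratic in N; it is not the O(N log N) algorithm the abstract refers to.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: each tuple's score is a discrete distribution,
# given as (score, probability) pairs whose probabilities sum to 1.
ScorePdf = List[Tuple[float, float]]


def prob_greater(a: ScorePdf, b: ScorePdf) -> float:
    """Probability that a score drawn from `a` strictly exceeds one drawn
    from `b`, assuming the two tuples are independent."""
    return sum(pa * pb for va, pa in a for vb, pb in b if va > vb)


def expected_ranks(relation: Dict[str, ScorePdf]) -> Dict[str, float]:
    """Expected rank of each tuple, where rank = number of tuples scoring
    higher (the best tuple in a world has rank 0). By linearity of
    expectation this is a sum of pairwise probabilities."""
    return {
        name: sum(
            prob_greater(other, pdf)
            for other_name, other in relation.items()
            if other_name != name
        )
        for name, pdf in relation.items()
    }


def top_k(relation: Dict[str, ScorePdf], k: int) -> List[str]:
    """Return the k tuples with the smallest expected rank."""
    ranks = expected_ranks(relation)
    return sorted(ranks, key=ranks.get)[:k]


# Illustrative data (made up for this sketch).
relation = {
    "t1": [(100.0, 0.4), (70.0, 0.6)],
    "t2": [(92.0, 0.6), (80.0, 0.4)],
    "t3": [(85.0, 1.0)],
}
ranks = expected_ranks(relation)
best_two = top_k(relation, 2)
```

Because every tuple receives a single number, sorting by it always yields exactly k answers, and the top-k is always contained in the top-(k+1).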
Computing all skyline probabilities for uncertain data
 In PODS
, 2009
Cited by 23 (2 self)
Skyline computation is widely used in multi-criteria decision making. As research in uncertain databases draws increasing attention, skyline queries with uncertain data have also been studied, e.g. probabilistic skylines. The previous work requires “thresholding” for its efficiency: the efficiency relies on the assumption that points with skyline probabilities below a certain threshold can be ignored. But there are situations where “thresholding” is not desirable, since low-probability events cannot be ignored when their consequences are significant. In such cases it is necessary to compute skyline probabilities of all data items. We provide the first algorithm for this problem whose worst-case time complexity is subquadratic. The techniques we use are interesting in their own right, as they rely on a space partitioning technique combined with using the existing dominance counting algorithm. The effectiveness of our algorithm is experimentally verified.
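To make "skyline probability" concrete, here is a naive quadratic baseline, not the subquadratic algorithm the abstract announces. It rests on assumptions the abstract does not state: each uncertain object is a set of mutually exclusive instances with probabilities, objects are independent, and smaller coordinates are better. All names (`Instance`, `dominates`, `skyline_probabilities`) are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical representation: an instance is (coordinates, probability).
Instance = Tuple[Sequence[float], float]


def dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    """p dominates q if p is no worse in every dimension and strictly
    better in at least one (assumption: smaller values are better)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline_probabilities(objects: Dict[str, List[Instance]]) -> Dict[str, float]:
    """Probability that each object is in the skyline, computed by
    comparing every instance against every instance of every other object."""
    result = {}
    for name, instances in objects.items():
        total = 0.0
        for point, prob in instances:
            p_not_dominated = 1.0
            for other_name, other_instances in objects.items():
                if other_name == name:
                    continue
                # Probability that this other object materializes as an
                # instance that dominates `point`.
                p_dom = sum(qp for q, qp in other_instances if dominates(q, point))
                p_not_dominated *= 1.0 - p_dom
            total += prob * p_not_dominated
        result[name] = total
    return result


# Illustrative data (made up for this sketch).
objects = {
    "A": [((1.0, 1.0), 1.0)],
    "B": [((2.0, 2.0), 0.5), ((0.0, 3.0), 0.5)],
}
probs = skyline_probabilities(objects)
```

Here object B is in the skyline only when it materializes at (0, 3), which the certain object A at (1, 1) does not dominate.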
Probabilistic Reverse Nearest Neighbor Queries on Uncertain Data
 TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING
Cited by 22 (4 self)
Uncertain data is inherent in various important applications, and the reverse nearest neighbor (RNN) query is an important query type for many applications. While many different types of queries have been studied on uncertain data, there is no previous work on answering RNN queries on uncertain data. In this paper, we formalize the probabilistic reverse nearest neighbor query, which retrieves the objects from the uncertain data whose probability of being the RNN of an uncertain query object is higher than a given threshold. We develop an efficient algorithm based on various novel pruning approaches that solves probabilistic RNN queries on multidimensional uncertain data. The experimental results demonstrate that our algorithm is even more efficient than a sampling-based approximate algorithm in most cases and is highly scalable.
Consensus answers for queries over probabilistic databases
 in PODS
, 2009
Cited by 19 (3 self)
We address the problem of finding a “best” deterministic query answer to a query over a probabilistic database. For this purpose, we propose the notion of a consensus world (or a consensus answer), which is a deterministic world (answer) that minimizes the expected distance to the possible worlds (answers). This problem can be seen as a generalization of the well-studied inconsistent information aggregation problems (e.g. rank aggregation) to probabilistic databases. We consider this problem for various types of queries including SPJ queries, top-k ranking queries, group-by aggregate queries, and clustering. For different distance metrics, we obtain polynomial time optimal or approximation algorithms for computing the consensus answers (or prove NP-hardness). Most of our results are for a general probabilistic database model, called the and/xor tree model, which significantly generalizes previous probabilistic database models like x-tuples and block-independent disjoint models, and is of independent interest.
Probabilistic threshold k nearest neighbor queries over moving objects in symbolic indoor space
 In Proc. EDBT
, 2010
Cited by 19 (4 self)
The availability of indoor positioning renders it possible to deploy location-based services in indoor spaces. Many such services will benefit from the efficient support for k nearest neighbor (kNN) queries over large populations of indoor moving objects. However, existing kNN techniques fall short in indoor spaces because these differ from Euclidean and spatial network spaces and because of the limited capabilities of indoor positioning technologies. To contend with indoor settings, we propose the new concept of minimal indoor walking distance (MIWD) along with algorithms and data structures for distance computing and storage; and we differentiate the states of indoor moving objects based on a positioning device deployment graph, utilize these states in effective object indexing structures, and capture the uncertainty of object locations. On these foundations, we study the probabilistic threshold kNN (PTkNN) query. Given a query location q and a probability threshold T, this query returns all subsets of k objects that have probability larger than T of containing the kNN query result of q. We propose a combination of three techniques for processing this query. The first uses the MIWD metric to prune objects that are too far away. The second uses fast probability estimates to prune unqualified objects and candidate result subsets. The third uses efficient probability evaluation for computing the final result on the remaining candidate subsets. An empirical study using both synthetic and real data shows that the techniques are efficient.
Energy management in the
 IEEE 802.16e MAC, IEEE Communications Letters
, 2006
Cited by 18 (1 self)
With the rapid development of various optical, infrared, and radar sensors and GPS techniques, a huge amount of multidimensional uncertain data is collected and accumulated every day. Recently, considerable research efforts have been made in the field of indexing, analyzing, and mining uncertain data. As shown in a recent book [2] on uncertain data, in order to efficiently manage and mine uncertain data, effective indexing techniques are highly desirable. Based on the observation that the existing index structures for multidimensional data are sensitive to the size or shape of uncertain regions of uncertain objects and the queries, in this paper, we introduce a novel R-Tree-based inverted index structure, named UI-Tree, to efficiently support various queries, including range queries, similarity joins, and their size estimation, as well as top-k range queries, over multidimensional uncertain objects in continuous or discrete cases. Comprehensive experiments are conducted on both real data and synthetic data to demonstrate the efficiency of our techniques. Index Terms: uncertain, index, range query, partition.
Nearest-Neighbor Searching Under Uncertainty
, 2012
Cited by 17 (7 self)
Nearest-neighbor queries, which ask for the nearest neighbor of a query point in a set of points, are important and widely studied in many fields because of a wide range of applications. In many of these applications, such as sensor databases, location-based services, face recognition, and mobile data, the location of data is imprecise. We therefore study nearest-neighbor queries in a probabilistic framework in which the location of each input point and/or query point is specified as a probability density function and the goal is to return the point that minimizes the expected distance, which we refer to as the expected nearest neighbor (ENN). We present methods for computing an exact ENN or an ε-approximate ENN, for a given error parameter 0 < ε < 1, under different distance functions. These methods build an index of near-linear size and answer ENN queries in polylogarithmic or sublinear time, depending on the underlying function. As far as we know, these are the first nontrivial methods for answering exact or ε-approximate ENN queries with provable performance guarantees.
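The ENN definition in this abstract can be shown with a brute-force scan. The sketch assumes a certain query point, uncertain input points given as discrete location distributions, and Euclidean distance; it examines every point on each query and so has none of the index-based guarantees the abstract describes. The names `Location`, `expected_distance`, and `expected_nearest_neighbor` are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical representation: a possible location is (coordinates, probability).
Location = Tuple[Sequence[float], float]


def expected_distance(query: Sequence[float], locations: List[Location]) -> float:
    """Expected Euclidean distance from a certain query point to an
    uncertain point with a discrete location distribution."""
    return sum(prob * math.dist(query, point) for point, prob in locations)


def expected_nearest_neighbor(
    query: Sequence[float], points: Dict[str, List[Location]]
) -> str:
    """Return the name of the uncertain point with the smallest expected
    distance to the query (linear scan over all points)."""
    return min(points, key=lambda name: expected_distance(query, points[name]))


# Illustrative data (made up for this sketch).
points = {
    "p": [((1.0, 0.0), 0.5), ((3.0, 0.0), 0.5)],
    "q": [((0.0, 1.5), 1.0)],
}
enn = expected_nearest_neighbor((0.0, 0.0), points)
```

Point "p" is sometimes closer than "q", but its expected distance (2.0) is larger than that of "q" (1.5), so "q" is the ENN under this definition.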
UV-Diagram: A Voronoi Diagram for Uncertain Data
, 2009
Cited by 14 (5 self)
The Voronoi diagram is an important technique for answering nearest-neighbor queries for spatial databases. In this paper, we study how the Voronoi diagram can be used on uncertain data, which are inherent in scientific and business applications. In particular, we propose the Uncertain-Voronoi Diagram (or UV-diagram for short). Conceptually, the data space is divided into distinct “UV-partitions”, where each UV-partition P is associated with a set S of objects; for any point q located in P, the objects in S are exactly those with a nonzero probability of being its nearest neighbor. The UV-diagram facilitates queries that ask for the objects with a nonzero chance of being the nearest neighbor of a given query point. It also allows analysis of nearest-neighbor information, e.g., finding out how many objects are the nearest neighbors in a given area. However, a UV-diagram requires exponential construction and storage costs. To tackle these problems, we devise an alternative representation for UV-partitions, and develop an adaptive index for the UV-diagram. This index can be constructed in polynomial time. We examine how it can be extended to support other related queries. We also perform extensive experiments to validate the effectiveness of our approach.
Effectively Indexing Uncertain Moving Objects for Predictive Queries
Cited by 14 (1 self)
Moving object indexing and query processing is a well-studied research topic, with applications in areas such as intelligent transport systems and location-based services. While much existing work explicitly or implicitly assumes a deterministic object movement model, real-world objects often move in more complex and stochastic ways. This paper investigates the possibility of a marriage between moving-object indexing and probabilistic object modeling. Given the distributions of the current locations and velocities of moving objects, we devise an efficient inference method for the prediction of future locations. We demonstrate that such prediction can be seamlessly integrated into existing index structures designed for moving objects, thus improving the meaningfulness of range and nearest neighbor query results in highly dynamic and uncertain environments. The paper reports on extensive experiments on the B^x-tree that offer insights into the properties of the paper’s proposal.
Boosting spatial pruning: on optimal pruning of MBRs
 In SIGMOD
, 2010
Cited by 12 (7 self)
Fast query processing of complex objects, e.g. spatial or uncertain objects, depends on efficient spatial pruning of the objects’ approximations, which are typically minimum bounding rectangles (MBRs). In this paper, we propose a novel, effective, and efficient criterion to determine the spatial topology between multidimensional rectangles. Given three rectangles R, A, and B in a multidimensional space, the task is to determine whether A is definitely closer to R than B. This domination relation is used in many applications to perform spatial pruning. Traditional techniques apply spatial pruning based on minimal and maximal distance. These techniques, however, show significant deficiencies in terms of effectiveness. We prove that our decision criterion is correct, complete, and efficient to compute even for high-dimensional databases. In addition, we tackle the problem of computing the number of objects dominating an object o. The challenge here is to incorporate objects that only partially dominate o. In this work we show how to detect such partial domination topology by using a modified version of our decision criterion. We propose strategies for conservatively and progressively estimating the total number of objects dominating an object. Our experiments show that the new pruning criterion, albeit very general and widely applicable, significantly outperforms current state-of-the-art pruning criteria.
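For orientation, the traditional min/max-distance pruning that this abstract contrasts itself against can be sketched as follows. This is the baseline, not the paper's criterion: it declares that A dominates B with respect to R only when the largest possible distance between A and R is below the smallest possible distance between B and R, which is correct but can miss true dominations. The sketch assumes axis-aligned rectangles and Euclidean distance; `Rect`, `min_dist`, `max_dist`, and `minmax_dominates` are hypothetical names.

```python
import math
from typing import Sequence, Tuple

# Hypothetical representation: a rectangle is (lower corner, upper corner).
Rect = Tuple[Sequence[float], Sequence[float]]


def min_dist(a: Rect, b: Rect) -> float:
    """Smallest Euclidean distance between any point of a and any point of b."""
    (a_lo, a_hi), (b_lo, b_hi) = a, b
    return math.sqrt(sum(
        max(0.0, al - bh, bl - ah) ** 2  # per-dimension gap, 0 if overlapping
        for al, ah, bl, bh in zip(a_lo, a_hi, b_lo, b_hi)
    ))


def max_dist(a: Rect, b: Rect) -> float:
    """Largest Euclidean distance between any point of a and any point of b."""
    (a_lo, a_hi), (b_lo, b_hi) = a, b
    return math.sqrt(sum(
        max(ah - bl, bh - al) ** 2  # per-dimension farthest pair of endpoints
        for al, ah, bl, bh in zip(a_lo, a_hi, b_lo, b_hi)
    ))


def minmax_dominates(a: Rect, b: Rect, r: Rect) -> bool:
    """Conservative test: True only if every point of a is closer to every
    point of r than any point of b can be."""
    return max_dist(a, r) < min_dist(b, r)


# Illustrative data (made up for this sketch).
R = ((0.0, 0.0), (1.0, 1.0))
A = ((2.0, 0.0), (3.0, 1.0))
B = ((10.0, 0.0), (11.0, 1.0))
a_beats_b = minmax_dominates(A, B, R)
b_beats_a = minmax_dominates(B, A, R)
```

The test compares two worst cases that need not occur for the same point of R, which is why it is sufficient but not necessary for domination.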