Results 1 - 10 of 24
A Survey on Representation, Composition and Application of Preferences in Database Systems
ACM TODS, 2011
Cited by 30 (6 self)
Preferences have been traditionally studied in philosophy, psychology, and economics and applied to decision making problems. Recently, they have attracted the attention of researchers in other fields, such as databases where they capture soft criteria for queries. Databases bring a whole fresh perspective to the study of preferences, both computational and representational. From a representational perspective, the central question is how we can effectively represent preferences and incorporate them in database querying. From a computational perspective, we can look at how we can efficiently process preferences in the context of database queries. Several approaches have been proposed but a systematic study of these works is missing. The purpose of this survey is to provide a framework for placing existing works in perspective and highlight critical open challenges to serve as a springboard for researchers in database systems. We organize our study around three axes: preference representation, preference composition, and preference query processing.
Consensus answers for queries over probabilistic databases
In PODS, 2009
Cited by 19 (3 self)
We address the problem of finding a “best” deterministic query answer to a query over a probabilistic database. For this purpose, we propose the notion of a consensus world (or a consensus answer) which is a deterministic world (answer) that minimizes the expected distance to the possible worlds (answers). This problem can be seen as a generalization of the well-studied inconsistent information aggregation problems (e.g. rank aggregation) to probabilistic databases. We consider this problem for various types of queries including SPJ queries, Top-k ranking queries, group-by aggregate queries, and clustering. For different distance metrics, we obtain polynomial time optimal or approximation algorithms for computing the consensus answers (or prove NP-hardness). Most of our results are for a general probabilistic database model, called and/xor tree model, which significantly generalizes previous probabilistic database models like x-tuples and block-independent disjoint models, and is of independent interest.
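As a rough illustration of the consensus-world idea in this abstract, the sketch below enumerates the possible worlds of a tiny database and picks the deterministic world with the smallest expected distance. Two things are assumptions of this sketch rather than details taken from the listing: the database is tuple-independent, and the distance between two worlds is the size of their symmetric difference. Under those assumptions, linearity of expectation implies that keeping every tuple with probability above 0.5 is optimal, and the code checks that shortcut against the brute-force search. All function names and the example probabilities are hypothetical.

```python
from itertools import combinations


def possible_worlds(probs):
    """Yield (world, probability) pairs for a tuple-independent database.

    probs maps a tuple id to its existence probability (assumed model).
    """
    ids = list(probs)
    for r in range(len(ids) + 1):
        for subset in combinations(ids, r):
            world = frozenset(subset)
            p = 1.0
            for t in ids:
                p *= probs[t] if t in world else 1.0 - probs[t]
            yield world, p


def expected_distance(candidate, probs):
    """Expected symmetric-difference distance from candidate to a random world."""
    return sum(p * len(candidate ^ world) for world, p in possible_worlds(probs))


def consensus_world_brute_force(probs):
    """Try every deterministic world and keep the one with minimum expected distance."""
    candidates = [world for world, _ in possible_worlds(probs)]
    return min(candidates, key=lambda c: expected_distance(c, probs))


def consensus_world_threshold(probs):
    """Shortcut valid for symmetric difference: keep tuples with probability > 0.5."""
    return frozenset(t for t, p in probs.items() if p > 0.5)


# Hypothetical example data.
probs = {"t1": 0.9, "t2": 0.4, "t3": 0.7}
best = consensus_world_brute_force(probs)
print(sorted(best), expected_distance(best, probs))
```

The brute-force search is exponential in the number of tuples and is only there to validate the threshold rule on small inputs; other distance functions or query types would need different algorithms.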
Ranking with Uncertain Scoring Functions: Semantics and Sensitivity Measures
Cited by 12 (3 self)
Ranking queries report the top-K results according to a user-defined scoring function. A widely used scoring function is the weighted summation of multiple scores. Oftentimes, users cannot precisely specify the weights in such functions in order to produce the preferred order of results. Adopting uncertain/incomplete scoring functions (e.g., using weight ranges and partially-specified weight preferences) can better capture user’s preferences in this scenario. In this paper, we study two aspects in uncertain scoring functions. The first aspect is the semantics of ranking queries, and the second aspect is the sensitivity of computed results to refinements made by the user. We formalize and solve multiple problems under both aspects, and present novel techniques that compute query results efficiently to comply with the interactive nature of these problems.
Probabilistic Similarity Search for Uncertain Time Series
2009
Cited by 10 (2 self)
A probabilistic similarity query over uncertain data assigns to each uncertain database object o a probability indicating the likelihood that o meets the query predicate. In this paper, we formalize the notion of uncertain time series and introduce two novel and important types of probabilistic range queries over uncertain time series. Furthermore, we propose an original approximate representation of uncertain time series that can be used to efficiently support both new query types by upper and lower bounding the Euclidean distance.
Scalable Probabilistic Similarity Ranking in Uncertain Databases
Cited by 8 (5 self)
This paper introduces a scalable approach for probabilistic top-k similarity ranking on uncertain vector data. Each uncertain object is represented by a set of vector instances that are assumed to be mutually exclusive. The objective is to rank the uncertain data according to their distance to a reference object. We propose a framework that incrementally computes for each object instance and ranking position, the probability of the object falling at that ranking position. The resulting rank probability distribution can serve as input for several state-of-the-art probabilistic ranking models. Existing approaches compute this probability distribution by applying the Poisson binomial recurrence technique of quadratic complexity. In this paper we theoretically as well as experimentally show that our framework reduces this to a linear-time complexity while having the same memory requirements, facilitated by incremental accessing of the uncertain vector instances in increasing order of their distance to the reference object. Furthermore, we show how the output of our method can be used to apply probabilistic top-k ranking for the objects, according to different state-of-the-art definitions. We conduct an experimental evaluation on synthetic and real data, which demonstrates the efficiency of our approach.
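The quadratic baseline this abstract refers to, the Poisson binomial recurrence, can be sketched as follows. This is the generic textbook recurrence, not the paper's linear-time framework. The sketch assumes that, for a fixed target object, the events "object j is closer to the reference than the target" are independent with known probabilities. The function name and the example probabilities are hypothetical.

```python
def poisson_binomial(probs):
    """Return dist where dist[k] is the probability that exactly k events occur.

    probs holds the success probability of each independent event.
    Each new probability updates the whole distribution, so the total
    cost is quadratic in len(probs).
    """
    dist = [1.0]  # with no events, the count is 0 with certainty
    for p in probs:
        nxt = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            nxt[k] += mass * (1.0 - p)  # event does not occur: count stays k
            nxt[k + 1] += mass * p      # event occurs: count becomes k + 1
        dist = nxt
    return dist


# Hypothetical probabilities that each of two other objects is closer
# to the reference object than the target object.
closer = [0.5, 0.2]
rank_dist = poisson_binomial(closer)
# rank_dist[k] is the probability that exactly k objects are closer,
# i.e. that the target falls at ranking position k + 1.
print(rank_dist)
```

Running the recurrence once per target object gives that object's rank probability distribution, which is the kind of input the abstract says downstream probabilistic ranking models consume.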
Ranking Continuous Probabilistic Datasets
Cited by 8 (3 self)
Ranking is a fundamental operation in data analysis and decision support, and plays an even more crucial role if the dataset being explored exhibits uncertainty. This has led to much work in understanding how to rank uncertain datasets in recent years. In this paper, we address the problem of ranking when the tuple scores are uncertain, and the uncertainty is captured using continuous probability distributions (e.g. Gaussian distributions). We present a comprehensive solution to compute the values of a parameterized ranking function (PRF) [18] for arbitrary continuous probability distributions (and thus rank the uncertain dataset); PRF can be used to simulate or approximate many other ranking functions proposed in prior work. We develop exact polynomial time algorithms for some continuous probability distribution classes, and efficient approximation schemes with provable guarantees for arbitrary probability distributions. Our algorithms can also be used for exact or approximate evaluation of k-nearest neighbor queries over uncertain objects, whose positions are modeled using continuous probability distributions. Our experimental evaluation over several datasets illustrates the effectiveness of our approach at efficiently ranking uncertain datasets with continuous attribute uncertainty.
A novel probabilistic pruning approach to speed up similarity queries in uncertain databases
Cited by 6 (3 self)
In this paper, we propose a novel, effective and efficient probabilistic pruning criterion for probabilistic similarity queries on uncertain data. Our approach supports a general uncertainty model using continuous probabilistic density functions to describe the (possibly correlated) uncertain attributes of objects. In a nutshell, the problem to be solved is to compute the PDF of the random variable denoted by the probabilistic domination count: Given an uncertain database object B, an uncertain reference object R and a set D of uncertain database objects in a multidimensional space, the probabilistic domination count denotes the number of uncertain objects in D that are closer to R than B. This domination count can be used to answer a wide range of probabilistic similarity queries. Specifically, we propose a novel geometric pruning filter and introduce an iterative filter-refinement strategy for conservatively and progressively estimating the probabilistic domination count in an efficient way while keeping correctness according to the possible world semantics. In an experimental evaluation, we show that our proposed technique quickly yields tight probability bounds for the probabilistic domination count, even for large uncertain databases.
Semantics of Ranking Queries for Probabilistic Data
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING
Cited by 5 (0 self)
Recently, there have been several attempts to propose definitions and algorithms for ranking queries on probabilistic data. However, these lack many intuitive properties of a top-k over deterministic data. We define numerous fundamental properties, including exact-k, containment, unique-rank, value-invariance, and stability, which are satisfied by ranking queries on certain data. We argue these properties should also be carefully studied in defining ranking queries in probabilistic data, and fulfilled by definition for ranking uncertain data for most applications. We propose an intuitive new ranking definition based on the observation that the ranks of a tuple across all possible worlds represent a well-founded rank distribution. We studied the ranking definitions based on the expectation, the median and other statistics of this rank distribution for a tuple and derived the expected rank, median rank and quantile rank correspondingly. We are able to prove that the expected rank, median rank and quantile rank satisfy all these properties for a ranking query. We provide efficient solutions to compute such rankings across the major models of uncertain data, such as attribute-level and tuple-level uncertainty. Finally, a comprehensive experimental study confirms the effectiveness of our approach.
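To make the notion of an expected rank over possible worlds concrete, here is a minimal sketch for a small attribute-level model. It rests on assumptions that are not spelled out in the listing: every tuple has an independent, discrete score distribution, higher scores rank better, and the rank of a tuple in one world is the number of tuples with a strictly higher score (so the best rank is 0). The sketch computes expected ranks by enumerating all worlds and cross-checks them with a pairwise formula that follows from linearity of expectation. Function names and data are hypothetical, and this is not claimed to be the paper's algorithm.

```python
from itertools import product


def expected_ranks_brute_force(tuples):
    """Average each tuple's rank over all possible worlds.

    tuples maps a name to a list of (score, probability) alternatives.
    """
    names = list(tuples)
    expected = {n: 0.0 for n in names}
    for choice in product(*(tuples[n] for n in names)):
        world_prob = 1.0
        for _, p in choice:
            world_prob *= p
        scores = {n: s for n, (s, _) in zip(names, choice)}
        for n in names:
            # Rank in this world: how many tuples score strictly higher.
            rank = sum(1 for m in names if scores[m] > scores[n])
            expected[n] += world_prob * rank
    return expected


def expected_ranks_pairwise(tuples):
    """Same quantity without enumerating worlds: sum over m of Pr[score_m > score_n]."""
    result = {}
    for n, dist_n in tuples.items():
        total = 0.0
        for m, dist_m in tuples.items():
            if m == n:
                continue
            total += sum(pn * pm
                         for sn, pn in dist_n
                         for sm, pm in dist_m
                         if sm > sn)
        result[n] = total
    return result


# Hypothetical example data.
tuples = {
    "a": [(100, 0.5), (70, 0.5)],
    "b": [(90, 1.0)],
    "c": [(80, 0.5), (60, 0.5)],
}
ranks = expected_ranks_pairwise(tuples)
ordering = sorted(ranks, key=ranks.get)  # smaller expected rank is better
print(ranks, ordering)
```

The pairwise version avoids the exponential enumeration of worlds, which is why it is the more practical of the two for anything beyond toy inputs; a median or quantile rank would instead need the full rank distribution of each tuple.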
Skyline Query Processing for Uncertain Data
In Proceedings of the International Conference on Information and Knowledge Management, CIKM, 2010
Cited by 3 (1 self)
Recently, several research efforts have addressed answering skyline queries efficiently over large datasets. However, this research lacks methods to compute these queries over uncertain data, where uncertain values are represented as a range. In this paper, we define skyline queries over continuous uncertain data, and propose a novel, efficient framework to answer these queries. Query answers are probabilistic, where each object is associated with a probability value of being a query answer. Typically, users specify a probability threshold that each returned object must exceed, and a tolerance value that defines the allowed error margin in probability calculation to reduce the computational overhead. Our framework employs an efficient two-phase query processing algorithm.
Efficient Rank Based KNN Query Processing Over Uncertain Data
Cited by 2 (1 self)
Uncertain data are inherent in many applications such as environmental surveillance and quantitative economics research. As an important problem in many applications, KNN query has been extensively investigated in the literature. In this paper, we study the problem of processing rank-based KNN queries against uncertain data. Besides applying the expected rank semantics to compute KNN, we also introduce the median rank, which is less sensitive to outliers. We show both ranking methods satisfy nice top-k properties such as exact-k, containment, unique ranking, value invariance, stability and fairfulness. For a given query q, IO and CPU efficient algorithms are proposed in the paper to compute KNN based on expected (median) ranks of the uncertain objects. To tackle the correlations of the uncertain objects and the high IO cost caused by the large number of instances of the uncertain objects, randomized algorithms are proposed to approximately compute KNN with theoretical guarantees. Comprehensive experiments are conducted on both real and synthetic data to demonstrate the efficiency of our techniques.