Efficient Query Evaluation on Probabilistic Databases
2004
Cited by 456 (47 self)
We describe a system that supports arbitrarily complex SQL queries with “uncertain” predicates. The query semantics is based on a probabilistic model and the results are ranked, much like in Information Retrieval. Our main focus is efficient query evaluation, a problem that has not received attention in the past. We describe an optimization algorithm that can efficiently compute most queries. We show, however, that the data complexity of some queries is #P-complete, which implies that these queries do not admit any efficient evaluation methods. For these queries we describe both an approximation algorithm and a Monte Carlo simulation algorithm.
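To make the probabilistic query semantics concrete, here is a minimal sketch, assuming a tuple-independent database and a Boolean query whose lineage is a DNF formula over tuple events. The names `probs`, `lineage`, `exact_probability`, and `monte_carlo_probability` are illustrative and not taken from the paper, and the plain sampling loop stands in for, and is not, the paper's Monte Carlo simulation algorithm.

```python
import itertools
import random


def holds(lineage, world):
    """The query is true if every tuple of at least one clause is present."""
    return any(all(world[t] for t in clause) for clause in lineage)


def exact_probability(lineage, probs):
    """Sum the weights of all possible worlds in which the query holds.

    Exponential in the number of tuples, so only usable on toy inputs.
    """
    names = sorted(probs)
    total = 0.0
    for bits in itertools.product([False, True], repeat=len(names)):
        world = dict(zip(names, bits))
        weight = 1.0
        for name in names:
            weight *= probs[name] if world[name] else 1.0 - probs[name]
        if holds(lineage, world):
            total += weight
    return total


def monte_carlo_probability(lineage, probs, samples=20000, seed=0):
    """Estimate the same probability by sampling worlds (naive sampling)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        world = {name: rng.random() < p for name, p in probs.items()}
        hits += holds(lineage, world)
    return hits / samples


# Hypothetical data: the answer is derived from (r1 AND s1) OR (r2 AND s2).
probs = {"r1": 0.5, "s1": 0.8, "r2": 0.4, "s2": 0.5}
lineage = [("r1", "s1"), ("r2", "s2")]
exact = exact_probability(lineage, probs)
estimate = monte_carlo_probability(lineage, probs)
```

For this input the exact value is 1 - (1 - 0.4)(1 - 0.2) = 0.52, and the sampled estimate should land close to it.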
Model-Driven Data Acquisition in Sensor Networks
In VLDB, 2004
Cited by 449 (36 self)
Declarative queries are proving to be an attractive paradigm for interacting with networks of wireless sensors. The metaphor that "the sensornet is a database" is problematic, however, because sensors do not exhaustively represent the data in the real world. In order to map the raw sensor readings onto physical reality, a model of that reality is required to complement the readings. In this paper, we enrich interactive sensor querying with statistical modeling techniques. We demonstrate that such models can help provide answers that are both more meaningful and, by introducing approximations with probabilistic confidences, significantly more efficient to compute in both time and energy. Using the combination of a model and live data acquisition raises the challenging optimization problem of selecting the best sensor readings to acquire, balancing the increase in the confidence of our answer against the communication and data acquisition costs in the network. We describe an exponential-time algorithm for finding the optimal solution to this optimization problem, and a polynomial-time heuristic for identifying solutions that perform well in practice. We evaluate our approach on several real-world sensor-network data sets, taking into account the real measured data and communication quality, demonstrating that our model-based approach provides a high-fidelity representation of the real phenomena and leads to significant performance gains versus traditional data acquisition techniques.
Trio: a system for integrated management of data, accuracy, and lineage
Presented at CIDR 2005
Cited by 268 (17 self)
Trio is a new database system that manages not only data, but also the accuracy and lineage of the data. Inexact (uncertain, probabilistic, fuzzy, approximate, incomplete, and imprecise!) databases have been proposed in the past, and the lineage problem also has been studied. The goals of the Trio project are to combine and distill previous work into a simple and usable model, design a query language as an understandable extension to SQL, and most importantly build a working system, one that augments conventional data management with both accuracy and lineage as an integral part of the data. This paper provides numerous motivating applications for Trio and lays out preliminary plans for the data model, query language, and prototype system.
Efficient top-k query evaluation on probabilistic data
In ICDE, 2007
Cited by 182 (32 self)
Modern enterprise applications are forced to deal with unreliable, inconsistent and imprecise information. Probabilistic databases can model such data naturally, but SQL query evaluation on probabilistic databases is difficult: previous approaches have either restricted the SQL queries, or computed approximate probabilities, or did not scale, and it was shown recently that precise query evaluation is theoretically hard. In this paper we describe a novel approach, which efficiently computes and ranks the top-k answers to a SQL query on a probabilistic database. The restriction to top-k answers is natural, since imprecisions in the data often lead to a large number of answers of low quality, and users are interested only in the answers with the highest probabilities. The idea in our algorithm is to run in parallel several Monte Carlo simulations, one for each candidate answer, and approximate each probability only to the extent needed to correctly compute the top-k answers. The algorithm is in a certain sense provably optimal and scales to large databases: we have measured running times of 5 to 50 seconds for complex SQL queries over a large database (10M tuples, of which 6M are probabilistic). Additional contributions of the paper include several optimization techniques, and a simple data model for probabilistic data that achieves completeness by using SQL views.
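The idea of refining each candidate's probability only as far as the ranking requires can be sketched as follows. This is a simplified illustration, not the paper's algorithm: it assumes each candidate answer comes with a hypothetical sampler returning True or False for one simulated world, uses Hoeffding confidence intervals, and runs the simulations in sequential batches instead of in parallel. All names and parameters are illustrative.

```python
import math
import random


def top_k_by_simulation(samplers, k, batch=200, delta=0.05, max_steps=1000, seed=0):
    """Return (top-k answers, simulation counts); assumes 1 <= k <= len(samplers)."""
    rng = random.Random(seed)
    hits = {a: 0 for a in samplers}
    runs = {a: 0 for a in samplers}

    def interval(a):
        # Hoeffding interval around the current estimate of answer a.
        if runs[a] == 0:
            return 0.0, 1.0
        mean = hits[a] / runs[a]
        eps = math.sqrt(math.log(2 / delta) / (2 * runs[a]))
        return max(0.0, mean - eps), min(1.0, mean + eps)

    top = []
    for _ in range(max_steps):
        bounds = {a: interval(a) for a in samplers}
        ranked = sorted(samplers, key=lambda a: bounds[a][0], reverse=True)
        top, rest = ranked[:k], ranked[k:]
        low_top = min(bounds[a][0] for a in top)
        high_rest = max((bounds[a][1] for a in rest), default=0.0)
        if low_top >= high_rest:
            break  # the top-k set is separated from the remaining answers
        # Refine only the answers whose intervals straddle the boundary.
        critical = [a for a in top if bounds[a][0] < high_rest]
        critical += [a for a in rest if bounds[a][1] > low_top]
        for a in critical:
            hits[a] += sum(samplers[a](rng) for _ in range(batch))
            runs[a] += batch
    return top, runs


# Hypothetical candidates whose true probabilities are hidden in the samplers.
true_probs = {"a": 0.9, "b": 0.6, "c": 0.3, "d": 0.05}
samplers = {name: (lambda rng, p=p: rng.random() < p) for name, p in true_probs.items()}
top, runs = top_k_by_simulation(samplers, k=2)
```

Candidates that are clearly inside or clearly outside the top-k stop receiving simulations early, so effort concentrates on the answers near the cutoff.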
Representing and querying correlated tuples in probabilistic databases
In ICDE, 2007
Cited by 142 (11 self)
Probabilistic databases have received considerable attention recently due to the need for storing uncertain data produced by many real-world applications. The widespread use of probabilistic databases is hampered by two limitations: (1) current probabilistic databases make simplistic assumptions about the data (e.g., complete independence among tuples) that make it difficult to use them in applications that naturally produce correlated data, and (2) most probabilistic databases can only answer a restricted subset of the queries that can be expressed using traditional query languages. We address both these limitations by proposing a framework that can represent not only probabilistic tuples, but also correlations that may be present among them. Our proposed framework naturally lends itself to the possible-worlds semantics, thus preserving the precise query semantics extant in current probabilistic databases. We develop an efficient strategy for query evaluation over such probabilistic databases by casting the query processing problem as an inference problem in an appropriately constructed probabilistic graphical model. We present several optimizations specific to probabilistic databases that enable efficient query evaluation. We validate our approach by presenting an experimental evaluation that illustrates the effectiveness of our techniques at answering various queries using real and synthetic datasets.
Top-k query processing in uncertain databases
In ICDE, 2007
Cited by 125 (9 self)
Top-k processing in uncertain databases is semantically and computationally different from traditional top-k processing. The interplay between score and uncertainty makes traditional techniques inapplicable. We introduce new probabilistic formulations for top-k queries. Our formulations are based on a “marriage” of traditional top-k semantics and possible-worlds semantics. In the light of these formulations, we construct a framework that encapsulates a state space model and efficient query processing techniques to tackle the challenges of uncertain data settings. We prove that our techniques are optimal in terms of the number of accessed tuples and materialized search states. Our experiments show the efficiency of our techniques under different data distributions, with orders of magnitude improvement over naïve materialization of possible worlds.
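The naïve baseline mentioned above, materializing every possible world, can be sketched in a few lines. This sketch assumes independent tuples and one plausible way of combining the two semantics (returning the top-k list with the highest total probability across worlds); the abstract does not spell out the formulations, so that choice and all names here are assumptions.

```python
import itertools
from collections import defaultdict


def most_probable_top_k(tuples, k):
    """tuples: list of (name, score, probability), assumed independent.

    Enumerates all 2^n possible worlds, so it is only feasible for tiny inputs.
    """
    answer_prob = defaultdict(float)
    for bits in itertools.product([False, True], repeat=len(tuples)):
        weight = 1.0
        present = []
        for (name, score, p), bit in zip(tuples, bits):
            weight *= p if bit else 1.0 - p
            if bit:
                present.append((score, name))
        # Ordinary top-k inside this one world, highest score first.
        top = tuple(name for score, name in sorted(present, reverse=True)[:k])
        answer_prob[top] += weight
    best = max(answer_prob, key=answer_prob.get)
    return best, answer_prob[best]


# Hypothetical readings: the highest-scoring tuple is also the least certain.
readings = [("t1", 10, 0.3), ("t2", 8, 0.9), ("t3", 5, 0.8)]
best, prob = most_probable_top_k(readings, k=1)
```

Here `t1` has the highest score but exists with probability 0.3 only, so the most probable top-1 answer is `t2`, with probability 0.7 × 0.9 = 0.63. This illustrates the interplay between score and uncertainty.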
Efficient Indexing Methods for Probabilistic Threshold Queries over Uncertain Data
Proc. 30th Int’l Conf. Very Large Data Bases (VLDB), 2004
Cited by 123 (22 self)
It is infeasible for a sensor database to contain the exact value of each sensor at all points in time. This uncertainty is inherent in these systems due to measurement and sampling errors, and resource limitations. In order to avoid drawing erroneous conclusions based upon stale data, the use of uncertainty intervals that model each data item as a range and associated probability density function (pdf) rather than a single value has recently been proposed. Querying these uncertain data introduces imprecision into answers, in the form of probability values that specify the likelihood that the answer satisfies the query. These queries are more expensive to evaluate than their traditional counterparts but are guaranteed to be correct and more informative due to the probabilities accompanying the answers. Although the answer probabilities are useful, for many applications it is only necessary to know whether the probability exceeds a given threshold; we term these Probabilistic Threshold Queries (PTQ). In this paper we address the efficient computation of these types of queries. In particular, we develop two index structures and associated algorithms to efficiently answer PTQs. The first index scheme is based on the idea of augmenting an R-tree with uncertainty information. We establish the difficulty
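A probabilistic threshold query can be illustrated with a linear scan, assuming each sensor value is uniformly distributed over its uncertainty interval. The uniform pdf, the function names, and the sample readings are assumptions made for this sketch; it does not use the index structures the paper develops.

```python
def probability_in_range(lo, hi, q_lo, q_hi):
    """P(value in [q_lo, q_hi]) for a value uniform on [lo, hi], with hi > lo."""
    overlap = min(hi, q_hi) - max(lo, q_lo)
    return max(0.0, overlap) / (hi - lo)


def threshold_query(items, q_lo, q_hi, threshold):
    """Return the items whose probability of lying in the range meets the threshold."""
    return [
        name
        for name, (lo, hi) in items.items()
        if probability_in_range(lo, hi, q_lo, q_hi) >= threshold
    ]


# Hypothetical temperature sensors with uncertainty intervals.
sensors = {"s1": (20.0, 30.0), "s2": (24.0, 26.0), "s3": (10.0, 22.0)}
result = threshold_query(sensors, q_lo=22.0, q_hi=28.0, threshold=0.5)
```

With these intervals, `s1` qualifies with probability 0.6 and `s2` with probability 1.0, while `s3` has zero probability mass inside the query range.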
Protecting Location Privacy with Personalized k-Anonymity: Architecture and Algorithms
IEEE Transactions on Mobile Computing, 2008
Cited by 118 (5 self)
Continued advances in mobile networks and positioning technologies have created a strong market push for location-based applications. Examples include location-aware emergency response, location-based advertisement, and location-based entertainment. An important challenge in the wide deployment of location-based services (LBSs) is the privacy-aware management of location information, providing safeguards for location privacy of mobile clients against vulnerabilities for abuse. This paper describes a scalable architecture for protecting location privacy from various privacy threats resulting from uncontrolled usage of LBSs. This architecture includes the development of a personalized location anonymization model and a suite of location perturbation algorithms. A unique characteristic of our location privacy architecture is the use of a flexible privacy personalization framework to support location k-anonymity for a wide range of mobile clients with context-sensitive privacy requirements. This framework enables each mobile client to specify the minimum level of anonymity that it desires and the maximum temporal and spatial tolerances that it is willing to accept when requesting k-anonymity-preserving LBSs. We devise an efficient message perturbation engine to implement the proposed location privacy framework. The prototype that we develop is designed to be run by the anonymity server on a trusted platform and performs location anonymization on LBS request messages of mobile clients, such as identity removal and spatio-temporal cloaking of the location information. We study the effectiveness of our location cloaking algorithms under various conditions by using realistic location data that is synthetically generated from real road maps and traffic volume data. Our experiments show that the personalized location k-anonymity model, together with our location perturbation engine, can achieve high resilience to location privacy threats without introducing any significant performance penalty.
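Spatial cloaking with a per-client anonymity level k and a spatial tolerance can be illustrated with a deliberately simplified sketch: grow a square around the requester until it covers at least k users, and give up once the tolerance is exceeded. This is not the paper's perturbation engine; the function name, the square region centered on the requester, and the fixed growth step are all assumptions.

```python
def cloak(requester, others, k, max_half_width, step=1.0):
    """Return a box (x_lo, y_lo, x_hi, y_hi) covering at least k users, or None.

    requester and others are (x, y) positions; the requester counts toward k.
    """
    x, y = requester
    half = step
    while half <= max_half_width:
        inside = 1 + sum(
            1 for ox, oy in others if abs(ox - x) <= half and abs(oy - y) <= half
        )
        if inside >= k:
            return (x - half, y - half, x + half, y + half)
        half += step
    return None  # the request cannot be anonymized within the tolerance


# Hypothetical positions of other mobile clients.
others = [(0.5, 0.5), (2.0, 1.0), (5.0, 5.0)]
box = cloak((0.0, 0.0), others, k=3, max_half_width=3.0)
too_strict = cloak((0.0, 0.0), others, k=4, max_half_width=3.0)
```

The first request is cloaked to the box (-2, -2, 2, 2); the second asks for k = 4 within the same tolerance and cannot be satisfied, so it returns None.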
Indexing multidimensional uncertain data with arbitrary probability density functions
In Proc. VLDB, 2005
Cited by 116 (15 self)
In an “uncertain database”, an object o is associated with a multidimensional probability density function (pdf), which describes the likelihood that o appears at each position in the data space. A fundamental operation is the “probabilistic range search”, which, given a value p_q and a rectangular area r_q, retrieves the objects that appear in r_q with probabilities at least p_q. In this paper, we propose the U-tree, an access method designed to optimize both the I/O and CPU time of range retrieval on multidimensional imprecise data. The new structure is fully dynamic (i.e., objects can be incrementally inserted/deleted in any order), and does not place any constraints on the data pdfs. We verify the query and update efficiency of U-trees with extensive experiments.
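The probabilistic range search itself, independent of any index, can be sketched by estimating each object's appearance probability through sampling. The sketch assumes every object exposes a hypothetical position sampler standing in for its pdf; the Gaussian objects and all names are illustrative, and this brute-force scan is not the access method the paper proposes.

```python
import random


def appearance_probability(sample_position, rect, samples=20000, seed=0):
    """Estimate P(object lies in rect) by drawing positions from its pdf."""
    rng = random.Random(seed)
    x_lo, y_lo, x_hi, y_hi = rect
    inside = 0
    for _ in range(samples):
        x, y = sample_position(rng)
        if x_lo <= x <= x_hi and y_lo <= y <= y_hi:
            inside += 1
    return inside / samples


def probabilistic_range_search(objects, rect, p_q):
    """Return objects whose estimated appearance probability in rect is >= p_q."""
    return [
        name
        for name, sampler in objects.items()
        if appearance_probability(sampler, rect) >= p_q
    ]


# Hypothetical objects with 2-D Gaussian pdfs (unit variance per axis).
objects = {
    "o1": lambda rng: (rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)),
    "o2": lambda rng: (rng.gauss(5.0, 1.0), rng.gauss(5.0, 1.0)),
}
r_q = (-1.0, -1.0, 1.0, 1.0)
found = probabilistic_range_search(objects, r_q, p_q=0.4)
p_o1 = appearance_probability(objects["o1"], r_q)
```

For `o1` the true probability is roughly 0.683² ≈ 0.466, so it passes the 0.4 threshold, while `o2` is centered far outside the query rectangle.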
MCDB: a Monte Carlo approach to managing uncertain data
2008
Cited by 110 (3 self)
To deal with data uncertainty, existing probabilistic database systems augment tuples with attribute-level or tuple-level probability values, which are loaded into the database along with the data itself. This approach can severely limit the system’s ability to gracefully handle complex or unforeseen types of uncertainty, and does not permit the uncertainty model to be dynamically parameterized according to the current state of the database. We introduce MCDB, a system for managing uncertain data that is based on a Monte Carlo approach. MCDB represents uncertainty via “VG functions,” which are used to pseudorandomly generate realized values for uncertain attributes. VG functions can be parameterized on the results of SQL queries over “parameter tables” that are stored in the database, facilitating what-if analyses. By storing parameters, and not probabilities, and by estimating, rather than exactly computing, the probability distribution over possible query answers, MCDB avoids many of the limitations of prior systems. For example, MCDB can easily handle arbitrary joint probability distributions over discrete or continuous attributes, arbitrarily complex SQL queries, and arbitrary functionals of the query-result distribution such as means, variances, and quantiles. To achieve good performance, MCDB uses novel query processing techniques, executing a query plan exactly once, but over “tuple bundles” instead of ordinary tuples. Experiments indicate that our enhanced functionality can be obtained with acceptable overheads relative to traditional systems.
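The Monte Carlo idea can be sketched outside a database: a VG-style function draws realized values from stored parameters, and an aggregate is recomputed per iteration to estimate its distribution. The parameter table, the normal distribution, and every name below are assumptions for illustration. The sketch also re-runs the aggregate once per iteration, which is precisely what MCDB's tuple-bundle execution is described as avoiding.

```python
import random
import statistics

# Hypothetical parameter table: customer -> (mean demand, standard deviation).
PARAMS = {"c1": (100.0, 10.0), "c2": (50.0, 5.0)}


def normal_vg(rng, mean, std):
    """VG-style function: pseudorandomly generate one realized value."""
    return rng.gauss(mean, std)


def simulate_total_demand(params, iterations=5000, seed=0):
    """Each iteration realizes one database instance and evaluates SUM(demand)."""
    rng = random.Random(seed)
    totals = []
    for _ in range(iterations):
        totals.append(sum(normal_vg(rng, m, s) for m, s in params.values()))
    return totals


totals = simulate_total_demand(PARAMS)
mean = statistics.fmean(totals)
stdev = statistics.stdev(totals)
q95 = statistics.quantiles(totals, n=20)[-1]  # estimated 95th percentile
```

Because only parameters are stored, a what-if analysis amounts to editing `PARAMS` and re-running the simulation; for the values shown, the estimated mean should be near 150 and the standard deviation near √125 ≈ 11.2.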