Results 1 - 10 of 60
Representing and querying correlated tuples in probabilistic databases
In ICDE, 2007
Cited by 142 (11 self)
Probabilistic databases have received considerable attention recently due to the need for storing uncertain data produced by many real world applications. The widespread use of probabilistic databases is hampered by two limitations: (1) current probabilistic databases make simplistic assumptions about the data (e.g., complete independence among tuples) that make it difficult to use them in applications that naturally produce correlated data, and (2) most probabilistic databases can only answer a restricted subset of the queries that can be expressed using traditional query languages. We address both these limitations by proposing a framework that can represent not only probabilistic tuples, but also correlations that may be present among them. Our proposed framework naturally lends itself to the possible world semantics, thus preserving the precise query semantics extant in current probabilistic databases. We develop an efficient strategy for query evaluation over such probabilistic databases by casting the query processing problem as an inference problem in an appropriately constructed probabilistic graphical model. We present several optimizations specific to probabilistic databases that enable efficient query evaluation. We validate our approach by presenting an experimental evaluation that illustrates the effectiveness of our techniques at answering various queries using real and synthetic datasets.
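As a rough illustration of the possible world semantics mentioned in this abstract, the sketch below enumerates every world of a tiny two-tuple relation and sums the probability of the worlds that satisfy a query. It is brute-force enumeration, not the graphical-model inference the paper proposes, and the tuple names, probabilities, and helper functions (`possible_worlds`, `query_probability`, `mutex_joint`, `independent_joint`) are all hypothetical.

```python
from itertools import product


def possible_worlds(tuples, joint):
    """Yield (world, probability) pairs.

    `joint` maps a presence vector (one bool per tuple) to its probability,
    so it can encode any correlation among tuples (assumed interface).
    """
    for presence in product([False, True], repeat=len(tuples)):
        p = joint(presence)
        if p > 0:
            world = frozenset(t for t, here in zip(tuples, presence) if here)
            yield world, p


def query_probability(tuples, joint, predicate):
    """Probability that `predicate` holds, summed over all possible worlds."""
    return sum(p for world, p in possible_worlds(tuples, joint) if predicate(world))


tuples = ["t1", "t2"]  # hypothetical tuple identifiers


def mutex_joint(presence):
    """Correlated case: t1 and t2 never appear together."""
    table = {(True, False): 0.6, (False, True): 0.3, (False, False): 0.1}
    return table.get(presence, 0.0)


def independent_joint(presence):
    """Independent case with the same marginals (0.6 and 0.3)."""
    p = 1.0
    for here, marginal in zip(presence, [0.6, 0.3]):
        p *= marginal if here else 1.0 - marginal
    return p


def both_present(world):
    return {"t1", "t2"} <= world
```

With these assumed numbers, `query_probability(tuples, mutex_joint, both_present)` is 0 while the independent model gives roughly 0.18 for the same marginals, which is the kind of gap an independence assumption hides. Enumeration is exponential in the number of tuples, so it only serves to make the semantics concrete.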
PXML: A probabilistic semistructured data model and algebra
In ICDE, 2003
Cited by 60 (4 self)
Despite the recent proliferation of work on semistructured data models, there has been little work to date on supporting uncertainty in these models. In this paper, we propose a model for probabilistic semistructured data (PSD). The advantage of our approach is that it supports a flexible representation that allows the specification of a wide class of distributions over semistructured instances. We provide two semantics for the model and show that the semantics are probabilistically coherent. Next, we develop an extension of the relational algebra to handle probabilistic semistructured data and describe efficient algorithms for answering queries that use this algebra. Finally, we present experimental results showing the efficiency of our algorithms.
On the expressiveness of probabilistic XML models
2009
Cited by 47 (24 self)
Various known models of probabilistic XML can be represented as instantiations of the abstract notion of p-documents. In addition to ordinary nodes, p-documents have distributional nodes that specify the possible worlds and their probabilistic distribution. Particular families of p-documents are determined by the types of distributional nodes that can be used as well as by the structural constraints on the placement of those nodes in a p-document. Some of the resulting families provide natural extensions and combinations of previously studied probabilistic XML models. The focus of the paper is on the expressive power of families of p-documents. In particular, two main issues are studied.
On the complexity of managing probabilistic XML data
In PODS, 2007
Cited by 37 (11 self)
In [3], we introduced a framework for querying and updating probabilistic information over unordered labeled trees, the probabilistic tree model. The data model is based on trees where nodes are annotated with conjunctions of probabilistic event variables. We briefly described an implementation and scenarios of usage. We develop here a mathematical foundation for this model. In particular, we present complexity results. We identify a very large class of queries for which simple variations of querying and updating algorithms from [3] compute the correct answer. A main contribution is a full complexity analysis of queries and updates. We also exhibit a decision procedure for the equivalence of probabilistic trees and prove it is in co-RP. Furthermore, we study the issue of removing less probable possible worlds, and that of validating a probabilistic tree against a DTD. We show that these two problems are intractable in the most general case.
Range Search on Multidimensional Uncertain Data
Cited by 37 (8 self)
In an uncertain database, every object o is associated with a probability density function, which describes the likelihood that o appears at each position in a multidimensional workspace. This article studies two types of range retrieval fundamental to many analytical tasks. Specifically, a nonfuzzy query returns all the objects that appear in a search region r_q with at least a certain probability t_q. On the other hand, given an uncertain object q, fuzzy search retrieves the set of objects that are within distance ε_q from q with no less than probability t_q. The core of our methodology is a novel concept of “probabilistically constrained rectangle”, which permits effective pruning/validation of nonqualifying/qualifying data. We develop a new index structure called the U-tree for minimizing the query overhead. Our algorithmic findings are accompanied with a thorough theoretical analysis, which reveals valuable insight into the problem characteristics, and mathematically confirms the efficiency of our solutions. We verify the effectiveness of the proposed techniques with extensive
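The nonfuzzy query defined in this abstract can be made concrete with a small sketch. It assumes each object's density is approximated by a handful of weighted sample points and scans every object linearly; it does not implement the probabilistically constrained rectangles or the index structure from the article. The function names and sample data are hypothetical.

```python
def appearance_probability(samples, region):
    """Probability mass of an object that falls inside an axis-aligned region.

    `samples` is a list of (point, weight) pairs whose weights sum to 1
    (an assumed discrete approximation of the object's pdf).
    `region` is a (low_corner, high_corner) pair of coordinate tuples.
    """
    low, high = region
    return sum(
        weight
        for point, weight in samples
        if all(lo <= x <= hi for lo, x, hi in zip(low, point, high))
    )


def nonfuzzy_range_query(objects, region, threshold):
    """Return names of objects that appear in `region` with probability >= threshold."""
    return [
        name
        for name, samples in objects.items()
        if appearance_probability(samples, region) >= threshold
    ]


# Hypothetical two-dimensional data set.
objects = {
    "a": [((1, 1), 0.5), ((5, 5), 0.5)],
    "b": [((2, 2), 0.25), ((9, 9), 0.75)],
}
search_region = ((0, 0), (3, 3))
result = nonfuzzy_range_query(objects, search_region, 0.5)  # ["a"]
```

Object "a" has exactly half of its mass inside the region and qualifies at threshold 0.5, while "b" has only a quarter and is rejected. An index exists to avoid computing this probability for most objects, which a linear scan like this one cannot do.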
Aggregate Queries for Discrete and Continuous Probabilistic XML
2010
Cited by 21 (18 self)
Sources of data uncertainty and imprecision are numerous. A way to handle this uncertainty is to associate probabilistic annotations to data. Many such probabilistic database models have been proposed, both in the relational and in the semistructured setting. The latter is particularly well adapted to the management of uncertain data coming from a variety of automatic processes. An important problem, in the context of probabilistic XML databases, is that of answering aggregate queries (count, sum, avg, etc.), which has received limited attention so far. In a model unifying the various (discrete) semistructured probabilistic models studied up to now, we present algorithms to compute the distribution of the aggregation values (exploiting some regularity properties of the aggregate functions) and probabilistic moments (especially, expectation and variance) of this distribution. We also prove the intractability of some of these problems and investigate approximation techniques. We finally extend the discrete model to a continuous one, in order to take into account continuous data values, such as measurements from sensor networks, and present algorithms to compute distribution functions and moments for various classes of continuous distributions of data values.
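To show what "the distribution of the aggregation values" and its moments look like in the simplest discrete case, here is a sketch for the count aggregate. It assumes each matching element is present independently with a known probability, which is a much narrower setting than the unified model of the paper, and it uses a standard dynamic program rather than the paper's algorithms. The function names are hypothetical.

```python
def count_distribution(presence_probs):
    """Distribution of count() when each element is present independently.

    Returns a list `dist` where dist[k] is the probability that exactly k
    elements are present. One element is folded in per iteration.
    """
    dist = [1.0]  # with no elements, the count is 0 with certainty
    for p in presence_probs:
        updated = [0.0] * (len(dist) + 1)
        for k, q in enumerate(dist):
            updated[k] += q * (1.0 - p)  # element absent: count stays k
            updated[k + 1] += q * p      # element present: count becomes k + 1
        dist = updated
    return dist


def mean_and_variance(dist):
    """Expectation and variance of a distribution over 0..len(dist)-1."""
    mean = sum(k * q for k, q in enumerate(dist))
    variance = sum((k - mean) ** 2 * q for k, q in enumerate(dist))
    return mean, variance


# Two hypothetical elements, each present with probability 0.5.
dist = count_distribution([0.5, 0.5])  # [0.25, 0.5, 0.25]
mean, variance = mean_and_variance(dist)  # 1.0 and 0.5
```

The dynamic program takes quadratic time in the number of elements, compared with the exponential cost of enumerating every possible world. Aggregates such as sum need a distribution indexed by value rather than by count, so the table can grow much larger.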
Ontology-Based User Context Management: The Challenges of Imperfection and Time-Dependence
In On the Move to Meaningful Internet Systems 2006: CoopIS, DOA, GADA, and ODBASE, Part I, ser. Lecture, 2006
Cited by 15 (3 self)
Robust and scalable user context management is the key enabler for the emerging context- and situation-aware applications, and ontology-based approaches have shown their usefulness for capturing especially context information on a high level of abstraction. But so far the problem has not been approached as a data management problem, which is key to scalability and robustness. The specific challenges lie in the imperfection of high-level context information, its time-dependence and the variability in the dynamics between its different elements.
Fusion rules for merging uncertain information
Information Fusion, 2006
Cited by 15 (4 self)
In previous papers, we have presented a logic-based framework based on fusion rules for merging structured news reports [Hun00, Hun02b, Hun02a, HS03, HS04]. Structured news reports are XML documents, where the textentries are restricted to individual words or simple phrases, such as names and domain-specific terminology, and numbers and units. We assume structured news reports do not require natural language processing. Fusion rules are a form of scripting language that define how structured news reports should be merged. The antecedent of a fusion rule is a call to investigate the information in the structured news reports and the background knowledge, and the consequent of a fusion rule is a formula specifying an action to be undertaken to form a merged report. It is expected that a set of fusion rules is defined for any given application. In this paper we extend the approach to handling probability values, degrees of beliefs, or necessity measures associated with textentries in the news reports. We present the formal definition for each of these types of uncertainty and explain how they can be handled using fusion rules. We also discuss the methods of detecting inconsistencies among sources.
Probabilistic XML via Markov Chains
2009
Cited by 14 (9 self)
We show how Recursive Markov Chains (RMCs) and their restrictions can define probabilistic distributions over XML documents, and study tractability
Efficient Subgraph Search over Large Uncertain Graphs
Cited by 14 (3 self)
Retrieving graphs containing a query graph from a large graph database is a key task in many graph-based applications, including chemical compound discovery, protein complex prediction, and structural pattern recognition. However, graph data handled by these applications is often noisy, incomplete, and inaccurate because of the way the data is produced. In this paper, we study subgraph queries over uncertain graphs. Specifically, we consider the problem of answering threshold-based probabilistic queries over a large uncertain graph database with the possible world semantics. We prove that the problem is #P-complete; therefore, we adopt a filtering-and-verification strategy to speed up the search. In the filtering phase, we use a probabilistic inverted index, PIndex, based on subgraph features obtained by an optimal feature selection process. During the verification phase, we develop exact and bound algorithms to validate the remaining candidates. Extensive experimental results demonstrate the effectiveness of the proposed algorithms.