Results 1–8 of 8
A Temporal-Probabilistic Database Model for Information Extraction
Abstract

Cited by 7 (2 self)
Temporal annotations of facts are a key component both for building a high-accuracy knowledge base and for answering queries over the resulting temporal knowledge base with high precision and recall. In this paper, we present a temporal-probabilistic database model for cleaning uncertain temporal facts obtained from information extraction methods. Specifically, we consider a combination of temporal deduction rules, temporal consistency constraints, and probabilistic inference based on the common possible-worlds semantics with data lineage, and we study the theoretical properties of this data model. We further develop a query engine that is capable of scaling to very large temporal knowledge bases, with nearly interactive query response times over millions of uncertain facts and hundreds of thousands of grounded rules. Our experiments over two real-world datasets demonstrate the increased robustness of our approach compared to related techniques based on constraint solving via Integer Linear Programming (ILP) and probabilistic inference via Markov Logic Networks (MLNs). We are also able to show that our runtime performance is more than competitive with current ILP solvers and the fastest available probabilistic, but non-temporal, database engines.
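As an illustration of the kind of temporal consistency constraint mentioned above, the following minimal sketch (with made-up facts and a hypothetical functional relation such as "isMarriedTo") grounds a constraint that forbids the relation to hold for two different objects over overlapping validity intervals:

```python
# Hypothetical base facts: (subject, object, begin, end, probability).
facts = [
    ("alice", "bob",  2001, 2005, 0.9),
    ("alice", "carl", 2004, 2008, 0.6),
    ("alice", "dave", 2009, 2012, 0.7),
]

def overlaps(f, g):
    """True when the two facts' validity intervals intersect."""
    return f[2] < g[3] and g[2] < f[3]

# Ground the constraint: every overlapping pair with distinct objects is a
# violation that the probabilistic inference step would have to resolve.
violations = [
    (f, g)
    for i, f in enumerate(facts)
    for g in facts[i + 1:]
    if f[1] != g[1] and overlaps(f, g)
]
# Only the "bob" and "carl" facts overlap (2004 < 2005).
```

The grounded violations would then be turned into constraints on the possible worlds, so that no world containing both conflicting facts receives positive probability.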
Querying factorized probabilistic triple databases
In ISWC, 2014
Abstract

Cited by 3 (2 self)
Abstract. An increasing amount of data is becoming available in the form of large triple stores, with the Semantic Web’s linked open data cloud (LOD) as one of the most prominent examples. Data quality and completeness are key issues in many community-generated data stores, like LOD, which motivates probabilistic and statistical approaches to data representation, reasoning, and querying. In this paper, we address the issue from the perspective of probabilistic databases, which account for uncertainty in the data via a probability distribution over all database instances. We obtain a highly compressed representation using the recently developed RESCAL approach and demonstrate experimentally that efficient querying can be achieved by exploiting inherent features of RESCAL via sub-query approximations of deterministic views.
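The RESCAL representation referred to above scores a triple (s, p, o) as a bilinear form over learned entity vectors and a per-relation core matrix. A minimal sketch, with random factors standing in for learned ones, and a sigmoid as one common (but not the only) probabilistic reading of the score:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions: 4 entities, 2 relations, rank-3 factorization.
n_entities, n_relations, rank = 4, 2, 3

# RESCAL factors (random here; in practice learned from the triple store):
# A holds one latent vector per entity, R one core matrix per relation.
A = rng.normal(size=(n_entities, rank))
R = rng.normal(size=(n_relations, rank, rank))

def triple_score(s, p, o):
    """RESCAL bilinear score for the triple (entity s, relation p, entity o)."""
    return A[s] @ R[p] @ A[o]

def triple_probability(s, p, o):
    """Squash the raw score into (0, 1) with a sigmoid."""
    return 1.0 / (1.0 + np.exp(-triple_score(s, p, o)))

p = triple_probability(0, 1, 2)
```

Because every triple's probability is recoverable from the small factors A and R, the factorization itself serves as the compressed probabilistic database.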
Anytime approximation in probabilistic databases
, 2013
Abstract

Cited by 2 (2 self)
This article describes an approximation algorithm for computing the probability of propositional formulas over discrete random variables. It incrementally refines lower and upper bounds on the probability of the formulas until the desired absolute or relative error guarantee is reached. This algorithm is used by the SPROUT query engine to approximate the probabilities of results to relational algebra queries on expressive probabilistic databases.
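The bound-refinement idea can be illustrated with a toy scheme (not the SPROUT algorithm itself): Shannon-expand the formula one variable at a time over independent Boolean variables, bounding every still-undetermined branch by [0, 1], until the gap between the bounds is at most eps:

```python
def bounds(formula, probs, assignment, frontier):
    """Lower/upper bounds on P(formula), expanding only `frontier` vars."""
    v = formula(assignment)
    if v is True:
        return 1.0, 1.0
    if v is False:
        return 0.0, 0.0
    if not frontier:          # undetermined and out of expansion budget
        return 0.0, 1.0
    x, rest = frontier[0], frontier[1:]
    lo1, hi1 = bounds(formula, probs, {**assignment, x: True}, rest)
    lo0, hi0 = bounds(formula, probs, {**assignment, x: False}, rest)
    p = probs[x]
    return p * lo1 + (1 - p) * lo0, p * hi1 + (1 - p) * hi0

def anytime_probability(formula, probs, order, eps):
    """Refine the interval by expanding one more variable per round."""
    for k in range(len(order) + 1):
        lo, hi = bounds(formula, probs, {}, order[:k])
        if hi - lo <= eps:
            break
    return lo, hi

def x_or_y(a):
    """Three-valued evaluation of the lineage formula x OR y."""
    if a.get("x") or a.get("y"):
        return True
    if a.get("x") is False and a.get("y") is False:
        return False
    return None  # undetermined under this partial assignment

lo, hi = anytime_probability(x_or_y, {"x": 0.5, "y": 0.5}, ["x", "y"], 0.1)
```

The interval can be reported at any point, which is what makes the scheme "anytime": stopping early still yields a sound (if loose) probability interval.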
Querying and Learning in Probabilistic Databases
Abstract
Abstract. Probabilistic Databases (PDBs) lie at the expressive intersection of databases, first-order logic, and probability theory. PDBs employ logical deduction rules to process Select-Project-Join (SPJ) queries, which form the basis for a variety of declarative query languages such as Datalog, Relational Algebra, and SQL. They employ logical consistency constraints to resolve data inconsistencies, and they represent query answers via logical lineage formulas (a.k.a. “data provenance”) to trace the dependencies between these answers and the input tuples that led to their derivation. While the literature on PDBs dates back more than 25 years, the key role of lineage in establishing a closed and complete representation model for relational operations over this kind of probabilistic data was discovered only fairly recently. Although PDBs benefit from efficient and scalable database infrastructures for data storage and indexing, they couple the data computation with probabilistic inference, and the latter remains a #P-hard problem also in the context of PDBs. In this chapter, we provide a review of the key concepts of PDBs with a particular focus on our own recent research results in this field. We highlight a number of ongoing research challenges related to PDBs, and we refer throughout to an information extraction (IE) scenario as a running application for managing uncertain and temporal facts obtained from IE techniques directly inside a PDB setting.
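How lineage formulas arise from SPJ operators can be sketched in a few lines, assuming a hypothetical two-relation schema: a join conjoins the lineages of its inputs, and a duplicate-eliminating projection disjoins the lineages of all tuples that collapse onto the same answer:

```python
# Base tuples carry a Boolean lineage variable (x1, x2, y1 are tuple ids).
works_at = [("alice", "acme", "x1"), ("bob", "acme", "x2")]
located  = [("acme", "berlin", "y1")]

# Join on company: the lineage of each joined row is a conjunction.
joined = [
    (person, city, ("AND", wv, lv))
    for person, company, wv in works_at
    for company2, city, lv in located
    if company == company2
]

# Project onto city with duplicate elimination: collapsing rows OR together.
answers = {}
for _person, city, lineage in joined:
    answers[city] = ("OR", answers[city], lineage) if city in answers else lineage
```

Confidence computation then reduces to evaluating the probability of each answer's lineage formula over the base-tuple probabilities, which is exactly where the #P-hardness mentioned above enters.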
Towards Enabling Probabilistic Databases for Participatory Sensing
Abstract
Abstract—Participatory sensing has emerged as a new data collection paradigm in which humans use their own devices (cell phone accelerometers, cameras, etc.) as sensors. This paradigm makes it possible to collect a huge amount of data from the crowd for worldwide applications, without the cost of buying dedicated sensors. Despite this benefit, the data collected from human sensors are inherently uncertain, since the participants provide no quality guarantees. Moreover, participatory sensing data are time series that not only exhibit highly irregular dependencies on time but also vary from sensor to sensor. To overcome these issues, we study in this paper the problem of creating probabilistic data from given (uncertain) time series collected by participatory sensors. We approach the problem in two steps. In the first step, we generate probabilistic time series from raw time series using a dynamical model from the time series literature. In the second step, we combine probabilistic time series from multiple sensors based on the mutual relationship between the reliability of the sensors and the quality of their data. Through extensive experimentation, we demonstrate the efficiency of our approach on both real and synthetic data.
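For the second step, one standard fusion rule (possibly differing from the paper's actual reliability model) is inverse-variance weighting of per-sensor Gaussian readings, where a sensor's variance plays the role of its unreliability:

```python
# Illustrative sketch: combine per-sensor probabilistic readings
# (mean, variance) for a single timestamp by precision weighting.

def fuse(readings):
    """readings: list of (mean, variance) pairs, one per sensor."""
    precisions = [1.0 / var for _mean, var in readings]
    total = sum(precisions)
    mean = sum(m / var for m, var in readings) / total
    return mean, 1.0 / total  # fused mean and fused variance

# Two sensors report 20.0 (variance 4.0) and 22.0 (variance 1.0);
# the more reliable sensor dominates the fused estimate.
m, v = fuse([(20.0, 4.0), (22.0, 1.0)])
```

Note that the fused variance is smaller than either input variance, reflecting that combining independent probabilistic readings increases confidence.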
10 Years of Probabilistic Querying – What Next?
Abstract
Abstract. Over the past decade, the two research areas of probabilistic databases and probabilistic programming have intensively studied the problem of making structured probabilistic inference scalable, but, so far, the two areas have developed almost independently of one another. While probabilistic databases have focused on characterizing tractable query classes based on the structure of query plans and data lineage, probabilistic programming has contributed sophisticated inference techniques based on knowledge compilation and lifted (first-order) inference. Both fields have developed their own variants of both exact and approximate top-k algorithms for query evaluation, and both investigate query optimization techniques known from SQL, Datalog, and Prolog, all of which calls for a more intensive study of the commonalities and integration of the two fields. Moreover, we believe that natural-language processing and information extraction will remain a driving factor, and in fact a long-standing challenge, for developing expressive representation models that can be combined with structured probabilistic inference, also for decades to come.
Authors' Addresses
, 2014
Abstract
We would like to thank Floris Geerts and Rainer Gemulla for helpful technical discussions. We thank Radu Curticapean for pointing out Bezout's theorem. Learning the parameters of complex probabilistic-relational models from labeled training data is a standard technique in machine learning, which has been intensively studied in the subfield of Statistical Relational Learning (SRL), but, so far, it remains an under-investigated topic in the context of Probabilistic Databases (PDBs). In this paper, we focus on learning the probability values of base tuples in a PDB from query answers, the latter of which are represented as labeled lineage formulas. Specifically, we consider labels in the form of pairs, each consisting of a Boolean lineage formula and a marginal probability that comes attached to the corresponding query answer. The resulting learning problem can be viewed as the inverse problem to confidence computations in PDBs: given a set of labeled query answers, learn the probability values of the base tuples such that the marginal prob ...
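The inverse problem described above can be sketched at toy scale, with exact marginals computed by possible-worlds enumeration and a finite-difference gradient descent as a deliberately simple stand-in for the paper's actual learning procedure:

```python
from itertools import product

def marginal(formula, probs, variables):
    """Exact P(formula) by enumerating all possible worlds."""
    total = 0.0
    for world in product([False, True], repeat=len(variables)):
        a = dict(zip(variables, world))
        if formula(a):
            w = 1.0
            for v in variables:
                w *= probs[v] if a[v] else 1 - probs[v]
            total += w
    return total

def learn(labeled, variables, steps=500, lr=0.5, h=1e-5):
    """Fit base-tuple probabilities so that lineage marginals match the
    labels, by coordinate-wise finite-difference gradient descent."""
    probs = {v: 0.5 for v in variables}
    for _ in range(steps):
        for v in variables:
            def loss(p):
                trial = {**probs, v: p}
                return sum((marginal(f, trial, variables) - y) ** 2
                           for f, y in labeled)
            g = (loss(probs[v] + h) - loss(probs[v] - h)) / (2 * h)
            probs[v] = min(1 - 1e-6, max(1e-6, probs[v] - lr * g))
    return probs

# One labeled answer: lineage x AND y with observed marginal 0.09.
labeled = [(lambda a: a["x"] and a["y"], 0.09)]
probs = learn(labeled, ["x", "y"])
m = marginal(labeled[0][0], probs, ["x", "y"])
```

With a single conjunctive label the solution is not unique (any pair of probabilities whose product is 0.09 fits); the sketch simply finds one such assignment, mirroring how the inverse problem can be under-determined.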