Results 1–8 of 8
A dichotomy for non-repeating queries with negation in probabilistic databases, 2014
"... This paper shows that any nonrepeating conjunctive relational query with negation has either polynomial time or #Phard data complexity on tupleindependent probabilistic databases. This result extends a dichotomy by Dalvi and Suciu for nonrepeating conjunctive queries to queries with negation. ..."
Abstract

Cited by 4 (0 self)
 Add to MetaCart
(Show Context)
This paper shows that any non-repeating conjunctive relational query with negation has either polynomial-time or #P-hard data complexity on tuple-independent probabilistic databases. This result extends a dichotomy by Dalvi and Suciu for non-repeating conjunctive queries to queries with negation. The tractable queries with negation are precisely the hierarchical ones and can be recognised efficiently.
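As an illustration of the tractability criterion in this abstract, the hierarchical property can be tested by comparing, for every pair of query variables, the sets of atoms containing them. The following is a minimal sketch under an assumed list-of-atoms query encoding (hypothetical representation, not the paper's implementation):

```python
def is_hierarchical(atoms):
    """atoms: list of (relation_name, tuple_of_variables) pairs.
    Returns True iff for every two variables x, y the sets of atoms
    containing x and y are nested or disjoint."""
    at = {}  # variable -> set of indices of atoms containing it
    for i, (_, vars_) in enumerate(atoms):
        for v in vars_:
            at.setdefault(v, set()).add(i)
    vs = list(at)
    for i in range(len(vs)):
        for j in range(i + 1, len(vs)):
            a, b = at[vs[i]], at[vs[j]]
            if not (a <= b or b <= a or not (a & b)):
                return False  # overlapping but not nested: not hierarchical
    return True

# R(x), S(x, y) is hierarchical: at(y) = {S} is contained in at(x) = {R, S}
print(is_hierarchical([("R", ("x",)), ("S", ("x", "y"))]))                  # True
# R(x), S(x, y), T(y) is the classic hard pattern: at(x) and at(y) overlap
print(is_hierarchical([("R", ("x",)), ("S", ("x", "y")), ("T", ("y",))]))   # False
```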
ENFrame: A Platform for Processing Probabilistic Data
"... This paper introduces ENFrame, a unified data processing platform for querying and mining probabilistic data. Using ENFrame, users can write programs in a fragment of Python with constructs such as boundedrange loops, list comprehension, aggregate operations on lists, and calls to external databas ..."
Abstract

Cited by 3 (1 self)
 Add to MetaCart
(Show Context)
This paper introduces ENFrame, a unified data processing platform for querying and mining probabilistic data. Using ENFrame, users can write programs in a fragment of Python with constructs such as bounded-range loops, list comprehension, aggregate operations on lists, and calls to external database engines. The program is then interpreted probabilistically by ENFrame. The realisation of ENFrame required novel contributions along several directions. We propose an event language that is expressive enough to succinctly encode arbitrary correlations, trace the computation of user programs, and allow for computation of discrete probability distributions of program variables. We exemplify ENFrame on three clustering algorithms: k-means, k-medoids, and Markov clustering. We introduce sequential and distributed algorithms for computing the probability of interconnected events exactly or approximately with error guarantees. Experiments with k-medoids clustering of sensor readings from energy networks show orders-of-magnitude improvements of exact clustering using ENFrame over naïve clustering in each possible world, of approximate over exact, and of distributed over sequential algorithms.
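The naïve possible-worlds baseline that the experiments compare against can be sketched as follows for a tuple-independent database (toy representation; ENFrame itself avoids this exponential enumeration):

```python
from itertools import product

def possible_worlds(tuples):
    """tuples: list of (value, probability) pairs of independent tuples.
    Yields every (world, world_probability) pair -- 2^n of them."""
    for mask in product([False, True], repeat=len(tuples)):
        world = [v for (v, _), keep in zip(tuples, mask) if keep]
        p = 1.0
        for (_, q), keep in zip(tuples, mask):
            p *= q if keep else (1 - q)
        yield world, p

# Two uncertain sensor readings; compute the distribution of their sum
# by running the aggregate separately in each possible world.
data = [(10, 0.75), (20, 0.5)]
dist = {}
for world, p in possible_worlds(data):
    key = sum(world)
    dist[key] = dist.get(key, 0.0) + p
print(dist)  # {0: 0.125, 20: 0.125, 10: 0.375, 30: 0.375}
```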
Circuits for Datalog Provenance
"... The annotation of the results of database queries with provenance information has many applications. This paper studies provenance for datalog queries. We start by considering provenance representation by (positive) Boolean expressions, as pioneered in the theories of incomplete and probabilist ..."
Abstract

Cited by 3 (0 self)
 Add to MetaCart
(Show Context)
The annotation of the results of database queries with provenance information has many applications. This paper studies provenance for datalog queries. We start by considering provenance representation by (positive) Boolean expressions, as pioneered in the theories of incomplete and probabilistic databases. We show that even for linear datalog programs the representation of provenance using Boolean expressions incurs a super-polynomial size blowup in data complexity. We address this with an approach that is novel in provenance studies, showing that we can construct in PTIME poly-size (data complexity) provenance representations as Boolean circuits. Then we present optimization techniques that embed the construction of circuits into semi-naïve datalog evaluation and further reduce the size of the circuits. We also illustrate the usefulness of our approach in multiple application domains, such as query evaluation in probabilistic databases and deletion propagation. Next, we study the possibility of extending the circuit approach to the more general framework of semiring annotations introduced in earlier work. We show that for a large and useful class of provenance semirings, we can construct in PTIME poly-size circuits that capture the provenance.
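For the non-recursive baseline, Boolean-expression provenance as used in incomplete and probabilistic databases can be sketched on a single join query (hypothetical tuple-annotation encoding, not the paper's system; recursive datalog is where such expressions blow up and circuits pay off):

```python
# Each tuple carries a Boolean variable as its annotation.
# Provenance of Q(x) :- R(x, y), S(y) is a DNF: one conjunct per
# derivation, one disjunct per alternative derivation of the same answer.
R = {("a", "1"): "r1", ("a", "2"): "r2", ("b", "1"): "r3"}
S = {("1",): "s1", ("2",): "s2"}

def provenance(R, S):
    prov = {}
    for (x, y), r_ann in R.items():
        s_ann = S.get((y,))
        if s_ann:  # the join succeeds; record this derivation of Q(x)
            prov.setdefault(x, []).append(f"{r_ann}*{s_ann}")
    return {x: " + ".join(sorted(terms)) for x, terms in prov.items()}

print(provenance(R, S))
# {'a': 'r1*s1 + r2*s2', 'b': 'r3*s1'}
```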
Provenance for Nondeterministic Order-Aware Queries
"... Data transformations that involve (partial) ordering, and consolidate data in presence of uncertainty, are common in the context of various applications. The complexity of such transformations, in addition to the possible presence of metadata, call for provenance support. We introduce, for the fi ..."
Abstract

Cited by 2 (2 self)
 Add to MetaCart
Data transformations that involve (partial) ordering and consolidate data in the presence of uncertainty are common in the context of various applications. The complexity of such transformations, in addition to the possible presence of metadata, calls for provenance support. We introduce, for the first time, a framework that accounts for the conjunction of these needs. To this end, we enrich the positive relational algebra with order-aware operators, some of which are nondeterministic, accounting for uncertainty. We study the expressive power and the complexity of deciding possibility for the obtained language. We then equip the language with (semiring-based) provenance tracking and highlight the unique challenges in supporting provenance for the order-aware operations. We explain how to overcome these challenges, designing a new provenance structure and a provenance-aware semantics for our language. We show the usefulness of the construction, proving that it satisfies common desiderata for provenance tracking.
Anytime approximation in probabilistic databases, 2013
"... This article describes an approximation algorithm for computing the probability of propositional formulas over discrete random variables. It incrementally refines lower and upper bounds on the probability of the formulas until the desired absolute or relative error guarantee is reached. This algori ..."
Abstract

Cited by 2 (2 self)
 Add to MetaCart
This article describes an approximation algorithm for computing the probability of propositional formulas over discrete random variables. It incrementally refines lower and upper bounds on the probability of the formulas until the desired absolute or relative error guarantee is reached. This algorithm is used by the SPROUT query engine to approximate the probabilities of results to relational algebra queries on expressive probabilistic databases.
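The incremental refinement of lower and upper bounds can be sketched as follows; this is a toy decomposition by variable expansion over independent Boolean variables, illustrating the anytime idea rather than the actual SPROUT algorithm:

```python
def anytime_prob(formula, var_probs, eps=1e-3):
    """formula maps a (possibly partial) {var: bool} assignment to
    True, False, or None (undecided); var_probs gives P(var = True)
    for independent variables.  Returns (lower, upper) bounds whose
    gap is at most eps (absolute error) when the frontier is exhausted."""
    vars_ = list(var_probs)
    lo, hi = 0.0, 1.0
    frontier = [({}, 1.0)]  # (partial assignment, its probability mass)
    while frontier and hi - lo > eps:
        assign, mass = frontier.pop()
        val = formula(assign)
        if val is True:
            lo += mass          # the whole branch satisfies the formula
        elif val is False:
            hi -= mass          # the whole branch falsifies it
        else:                   # undecided: split on an unassigned variable
            v = next(x for x in vars_ if x not in assign)
            p = var_probs[v]
            frontier.append(({**assign, v: True}, mass * p))
            frontier.append(({**assign, v: False}, mass * (1 - p)))
    return lo, hi

# x OR y with P(x) = 0.5 and P(y) = 0.4; the exact probability is 0.7.
def f(a):
    if a.get("x") or a.get("y"):
        return True
    if "x" in a and "y" in a:
        return False
    return None  # not enough of the assignment is decided yet

lo, hi = anytime_prob(f, {"x": 0.5, "y": 0.4}, eps=0.0)
# the bounds converge to the exact probability 0.7
```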
A Provenance Framework for Data-Dependent Process Analysis
"... A datadependent process (DDP) models an application whose control flow is guided by a finite state machine, as well as by the state of an underlying database. DDPs are commonly found e.g., in ecommerce. In this paper we develop a framework supporting the use of provenance in static (temporal) anal ..."
Abstract

Cited by 2 (1 self)
 Add to MetaCart
(Show Context)
A data-dependent process (DDP) models an application whose control flow is guided by a finite state machine, as well as by the state of an underlying database. DDPs are commonly found, e.g., in e-commerce. In this paper we develop a framework supporting the use of provenance in static (temporal) analysis of possible DDP executions. Using provenance support, analysts can interactively test and explore the effect of hypothetical modifications to a DDP’s state machine and/or to the underlying database. They can also extend the analysis to incorporate the propagation of annotations from meta-domains of interest, e.g., cost or access privileges. Toward this goal we note that the framework of semiring-based provenance was proven highly effective in fulfilling similar needs in the context of database queries. In this paper we consider novel constructions that generalize the semiring approach to the context of DDP analysis. These constructions address two interacting new challenges: (1) to combine provenance annotations for both information that resides in the database and information about external inputs (e.g., user choices), and (2) to finitely capture infinite process executions. We analyze our solution from theoretical and experimental perspectives, proving its effectiveness.
10 Years of Probabilistic Querying – What Next?
"... Abstract. Over the past decade, the two research areas of probabilistic databases and probabilistic programming have intensively studied the problem of making structured probabilistic inference scalable, but—so far—both areas developed almost independently of one another. While probabilistic databa ..."
Abstract
 Add to MetaCart
(Show Context)
Over the past decade, the two research areas of probabilistic databases and probabilistic programming have intensively studied the problem of making structured probabilistic inference scalable, but—so far—both areas have developed almost independently of one another. While probabilistic databases have focused on describing tractable query classes based on the structure of query plans and data lineage, probabilistic programming has contributed sophisticated inference techniques based on knowledge compilation and lifted (first-order) inference. Both fields have developed their own variants of—both exact and approximate—top-k algorithms for query evaluation, and both investigate query optimization techniques known from SQL, Datalog, and Prolog, all of which calls for a more intensive study of the commonalities and integration of the two fields. Moreover, we believe that natural-language processing and information extraction will remain a driving factor, and in fact a long-standing challenge, for developing expressive representation models that can be combined with structured probabilistic inference for decades to come.
Probabilistic Data Programming with ENFrame
"... Abstract This paper overviews ENFrame, a programming framework for probabilistic data. In addition to relational query processing supported via an existing probabilistic database management system, ENFrame allows programming with loops, assignments, conditionals, list comprehension, and aggregates ..."
Abstract
 Add to MetaCart
(Show Context)
This paper overviews ENFrame, a programming framework for probabilistic data. In addition to relational query processing supported via an existing probabilistic database management system, ENFrame allows programming with loops, assignments, conditionals, list comprehension, and aggregates to encode complex tasks such as clustering and classification of probabilistic data. We explain the design choices behind ENFrame, some distilled from the wealth of work on probabilistic databases and some new. We also highlight a few challenges lying ahead.

Motivation and Scope. Probabilistic data management has gone a long, fruitful way in the last decade. There is a growing need for computing frameworks that allow users to build applications feeding on uncertain data without worrying about the underlying uncertain nature of such data or the computationally hard inference task that comes along with it. For tasks that only need to query probabilistic data, existing probabilistic database systems do offer a viable solution. A similar observation has been recently made in the areas of machine learning. The thesis of this work is that one can build powerful and useful probabilistic data programming frameworks that leverage existing work on probabilistic databases. ENFrame [21] is a framework that aims to fit this vision: