Results 1 
3 of
3
Circuits for Datalog Provenance
"... The annotation of the results of database queries with provenance information has many applications. This paper studies provenance for datalog queries. We start by considering provenance representation by (positive) Boolean expressions, as pioneered in the theories of incomplete and probabilist ..."
Abstract

Cited by 3 (0 self)
 Add to MetaCart
(Show Context)
The annotation of the results of database queries with provenance information has many applications. This paper studies provenance for datalog queries. We start by considering provenance representation by (positive) Boolean expressions, as pioneered in the theories of incomplete and probabilistic databases. We show that even for linear datalog programs the representation of provenance using Boolean expressions incurs a superpolynomial size blowup in data complexity. We address this with an approach that is novel in provenance studies, showing that we can construct in PTIME polysize (data complexity) provenance representations as Boolean circuits. Then we present optimization techniques that embed the construction of circuits into seminaive datalog evaluation, and further reduce the size of the circuits. We also illustrate the usefulness of our approach in multiple application domains such as query evaluation in probabilistic databases, and in deletion propagation. Next, we study the possibility of extending the circuit approach to the more general framework of semiring annotations introduced in earlier work. We show that for a large and useful class of provenance semirings, we can construct in PTIME polysize circuits that capture the provenance.
PROX: Approximated Summarization of Data Provenance
"... ABSTRACT Many modern applications involve collecting large amounts of data from multiple sources, and then aggregating and manipulating it in intricate ways. The complexity of such applications, combined with the size of the collected data, makes it difficult to understand the application logic and ..."
Abstract
 Add to MetaCart
(Show Context)
ABSTRACT Many modern applications involve collecting large amounts of data from multiple sources, and then aggregating and manipulating it in intricate ways. The complexity of such applications, combined with the size of the collected data, makes it difficult to understand the application logic and how information was derived. Data provenance has been proven helpful in this respect in different contexts; however, maintaining and presenting the full and exact provenance may be infeasible, due to its size and complex structure. For that reason, we introduce the notion of approximated summarized provenance, where we seek a compact representation of the provenance at the possible cost of information loss. Based on this notion, we have developed PROX, a system for the management, presentation and use of data provenance for complex applications. We propose to demonstrate PROX in the context of a movies rating crowdsourcing system, letting participants view provenance summarization and use it to gain insights on the application and its underlying data.
Approximated Provenance for Complex Applications
"... Many applications now involve the collection of large amounts of data from multiple users, and then aggregating and manipulating it in intricate ways. The complexity of such applications, combined with the size of the collected data, makes it difficult to understand how information was derived, and ..."
Abstract
 Add to MetaCart
(Show Context)
Many applications now involve the collection of large amounts of data from multiple users, and then aggregating and manipulating it in intricate ways. The complexity of such applications, combined with the size of the collected data, makes it difficult to understand how information was derived, and consequently difficult to asses its credibility, to optimize and debug its derivation, etc. Provenance has been helpful in achieving such goals in different contexts, and we illustrate its potential for novel complex applications such as those performing crowdsourcing. Maintaining (and presenting) the full and exact provenance information may be infeasible for such applications, due to the size of the provenance and its complex structure. We propose some initial directions towards addressing this challenge, through the notion of approximated provenance. 1.