On Propagation of Deletions and Annotations Through Views
- In PODS, 2002
- Cited by 56 (10 self)
We study two classes of view update problems in relational databases. We are given a source database S, a monotone query Q, and the view Q(S) generated by the query. The first problem that we consider is the classical view deletion problem where we wish to identify a minimal set T of tuples in S whose deletion will eliminate a given tuple t from the view. We study the complexity of optimizing two natural objectives in this setting, namely, finding T to minimize the side-effects on the view, and on the source, respectively. For both objective functions, we show a dichotomy in the complexity. Interestingly, the problem is either in P or is NP-hard, for queries in the same class in either objective function. The second problem in our study is the annotation placement problem. Suppose we annotate an attribute of a tuple in S. The rules for carrying the annotation forward through a query are easily stated. On the other hand, suppose we annotate an attribute of a tuple in the view Q(S): what annotation(s) in S will cause this annotation to appear in the view, minimizing the propagation to other attributes in Q(S)? View annotation is becoming an increasingly useful method of communicating meta-data among users of shared scientific data sets, and to our knowledge, there has been no formal study of this problem. Our study of these problems gives us important insights into computational issues involved in data provenance.
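The view deletion problem described in this abstract can be illustrated with a small brute-force sketch. It is not the paper's algorithm: the query (a projection of a join of two hypothetical relations R(a, b) and S(b, c)), the function names, and the exhaustive search are all assumptions chosen for illustration, and the search is exponential in the size of the source. It looks for a set T of source tuples whose deletion removes a target tuple from the view while deleting as few other view tuples as possible, breaking ties by the size of T.

```python
from itertools import combinations


def view(R, S):
    """Hypothetical monotone query Q: project (a, c) from the join R(a, b), S(b, c)."""
    return {(a, c) for (a, b) in R for (b2, c) in S if b == b2}


def min_view_side_effect_deletion(R, S, target):
    """Exhaustive search for a deletion set T (illustration only).

    Returns (side_effects, size_of_T, T), where side_effects counts the view
    tuples other than `target` that disappear when T is deleted, or None if
    `target` is not in the view.
    """
    original = view(R, S)
    if target not in original:
        return None
    # Tag each source tuple with the relation it belongs to.
    source = [("R", t) for t in sorted(R)] + [("S", t) for t in sorted(S)]
    best = None
    for k in range(1, len(source) + 1):
        for T in combinations(source, k):
            R2 = R - {t for (rel, t) in T if rel == "R"}
            S2 = S - {t for (rel, t) in T if rel == "S"}
            new_view = view(R2, S2)
            if target in new_view:
                continue  # T does not eliminate the target tuple
            side_effects = len(original - new_view) - 1
            candidate = (side_effects, k, T)
            # Prefer fewer view side-effects, then fewer deleted source tuples.
            if best is None or candidate[:2] < best[:2]:
                best = candidate
    return best


R = {(1, "x"), (2, "x")}
S = {("x", "p"), ("x", "q")}
result = min_view_side_effect_deletion(R, S, (1, "p"))
```

In this toy instance every single-tuple deletion that removes (1, "p") also removes one other view tuple, so no side-effect-free deletion exists. Swapping the comparison key to the size of T first would approximate the source-side objective instead.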
Models for Incomplete and Probabilistic Information
- IEEE Data Engineering Bulletin, 2006
- Cited by 50 (6 self)
We discuss, compare and relate some old and some new models for incomplete and probabilistic databases. We characterize the expressive power of c-tables over infinite domains and we introduce a new kind of result, algebraic completion, for studying less expressive models. By viewing probabilistic models as incompleteness models with additional probability information, we define completeness and closure under query languages of general probabilistic database models and we introduce a new such model, probabilistic c-tables, that is shown to be complete and closed under the relational algebra.
Applying Chimera virtual data concepts to cluster finding in the Sloan Sky Survey
- Proceedings of Supercomputing 2002 (SC2002), 2002
- Cited by 45 (12 self)
The GriPhyN project [1] is one of several major efforts [2-4] working to enable large-scale data-intensive computation as a routine scientific tool. GriPhyN focuses in particular on virtual data technologies that allow computational procedures and results to be exploited as community resources so that, for example, scientists can not only run their own computations on raw data, but also discover computational procedures
Update Exchange with Mappings and Provenance
- In Very Large Data Bases (VLDB), 2007
- Cited by 44 (25 self)
We consider systems for data sharing among heterogeneous peers related by a network of schema mappings. Each peer has a locally controlled and edited database instance, but wants to ask queries over related data from other peers as well. To achieve this, every peer’s updates propagate along the mappings to the other peers. However, this update exchange is filtered by trust conditions (expressing what data and sources a peer judges to be authoritative), which may cause a peer to reject another’s updates. In order to support such filtering, updates carry provenance information. These systems target scientific data sharing applications, and their general principles and architecture have been described in [20]. In this paper we present methods for realizing such systems. Specifically, we extend techniques from data integration, data exchange, and incremental view maintenance to propagate updates along mappings; we integrate a novel model for tracking data provenance, such that curators may filter updates based on trust conditions over this provenance; we discuss strategies for implementing our techniques in conjunction with an RDBMS; and we experimentally demonstrate the viability of our techniques in the ORCHESTRA prototype system.
Curated databases
- PODS'08, 2008
- Cited by 43 (6 self)
Curated databases are databases that are populated and updated with a great deal of human effort. Most reference works that one traditionally found on the reference shelves of libraries – dictionaries, encyclopedias, gazetteers etc. – are now curated databases. Since it is now easy to publish databases on the web, there has been an explosion in the number of new curated databases used in scientific research. The value of curated databases lies in the organization and the quality of the data they contain. Like the paper reference works they have replaced, they usually represent the efforts of a dedicated group of people to produce a definitive description of some subject area. Curated databases present a number of challenges for database research. The topics of annotation, provenance, and citation are central, because curated databases are heavily cross-referenced with, and include data from, other databases, and much of the work of a curator is annotating existing data. Evolution of structure is important because these databases often evolve from semistructured representations, and because they have to accommodate new scientific discoveries. Much of the work in these areas is in its infancy, but it is beginning to suggest new research for both theory and practice. We discuss some of this research and emphasize the need to find appropriate models of the processes associated with curated databases.
Recording and reasoning over data provenance in web and grid services
- In Int. Conf. on Ontologies, Databases and Applications of Semantics, volume 2888 of LNCS, 2003
- Cited by 40 (13 self)
Large-scale, dynamic and open environments such as the Grid and Web Services build upon existing computing infrastructures to supply dependable and consistent large-scale computational systems. This kind of architecture has been adopted by those working with business and scientific information systems, allowing them to exploit extensive and diverse computing resources to perform complex data processing tasks. In such systems, results are often derived by composing multiple, geographically distributed, heterogeneous services as specified by intricate workflow management. This leads to the undesirable situation where the results are known, but the means by which they were achieved is not. With both scientific experiments and business transactions, the notion of lineage and dataset derivation is of paramount importance since without it, information is potentially worthless. We address the issue of data provenance, the description of the origin of a piece of data, in these environments, showing the requirements, uses and implementation difficulties. We propose infrastructure-level support for a provenance recording capability for service-oriented architectures such as the Grid and Web Services. We also offer services to view and retrieve provenance and we provide a mechanism by which provenance is used to determine whether previously computed results are still up to date.
A protocol for recording provenance in service-oriented grids
- In Proceedings of the 8th International Conference on Principles of Distributed Systems (OPODIS’04), 2004
- Cited by 40 (13 self)
Both the scientific and business communities, which are beginning to rely on Grids as problem-solving mechanisms, have requirements in terms of provenance. The provenance of some data is the documentation of the process that led to the data; its necessity is apparent in fields ranging from medicine to aerospace. To support provenance capture in Grids, we have developed an implementation-independent protocol for the recording of provenance. We describe the protocol in state machine or a three-dimensional state transition diagram. Using these techniques we sketch a liveness property for the system.
Explaining answers from the semantic web: The inference web approach
- Journal of Web Semantics, 2004
- Cited by 38 (18 self)
The Semantic Web lacks support for explaining answers from web applications. When applications return answers, many users do not know what information sources were used, when they were updated, how reliable the source was, or what information was looked up versus derived. Many users also do not know how implicit answers were derived. The Inference Web (IW) aims to take opaque query answers and make the answers more transparent by providing infrastructure for presenting and managing explanations. The explanations include information concerning where answers came from (knowledge provenance) and how they were derived (or retrieved). In this article we describe an infrastructure for IW explanations. The infrastructure includes: IWBase – an extensible web-based registry containing details about information sources, reasoners, languages, and rewrite rules; PML – the Proof Markup Language specification and API used for encoding portable proofs; IW browser – a tool supporting navigation and presentations of proofs and their explanations; and a new explanation dialogue component. Source information in the IWBase is used to convey knowledge provenance. Representation and reasoning language axioms and rewrite rules in the IWBase are used to support proofs, proof combination, and Semantic Web agent interoperability. The Inference Web is in use by four Semantic Web agents, three of them using embedded reasoning engines fully registered in the IW. Inference Web also provides explanation infrastructure for a number of DARPA and ARDA projects.
Mondrian: Annotating and querying databases through colors and blocks
- In ICDE ’06: Proceedings of the 22nd International Conference on Data Engineering (ICDE’06), 2006
- Cited by 37 (3 self)
Annotations play a central role in the curation of scientific databases. Despite their importance, data formats and schemas are not designed to manage the increasing variety of annotations. Moreover, DBMS’s often lack support for storing and querying annotations. Furthermore, annotations and data are only loosely coupled. This paper introduces an annotation-oriented data model for the manipulation and querying of both data and annotations. In particular, the model allows for the specification of annotations on sets of values and for effectively querying the information on their association. We use the concept of block to represent an annotated set of values. Different colors applied to the blocks represent different annotations. We introduce a color query language for our model and prove it to be both complete (it can express all possible queries over the class of annotated databases), and minimal (all the algebra operators are primitive). We present MONDRIAN, a prototype implementation of our annotation mechanism, and we conduct experiments that investigate the set of parameters which influence the evaluation cost for color queries.
Data provenance: some basic issues
- In Foundations of Software Technology and Theoretical Computer Science, 2000
- Cited by 35 (0 self)
The ease with which one can copy and transform data on the Web has made it increasingly difficult to determine the origins of a piece of data. We use the term data provenance to refer to the process of tracing and recording the origins of data and its movement between databases. Provenance is now an acute issue in scientific databases where it is central to the validation of data. In this paper we discuss some of the technical issues that have emerged in an initial exploration of the topic.

