Results 1 - 10
of
116
Data Integration: A Theoretical Perspective
- Symposium on Principles of Database Systems
, 2002
"... Data integration is the problem of combining data residing at different sources, and providing the user with a unified view of these data. The problem of designing data integration systems is important in current real world applications, and is characterized by a number of issues that are interestin ..."
Abstract
-
Cited by 965 (45 self)
- Add to MetaCart
Data integration is the problem of combining data residing at different sources, and providing the user with a unified view of these data. The problem of designing data integration systems is important in current real world applications, and is characterized by a number of issues that are interesting from a theoretical point of view. This document presents on overview of the material to be presented in a tutorial on data integration. The tutorial is focused on some of the theoretical issues that are relevant for data integration. Special attention will be devoted to the following aspects: modeling a data integration application, processing queries in data integration, dealing with inconsistent data sources, and reasoning on queries.
The DLV System for Knowledge Representation and Reasoning
- ACM Transactions on Computational Logic
, 2002
"... Disjunctive Logic Programming (DLP) is an advanced formalism for knowledge representation and reasoning, which is very expressive in a precise mathematical sense: it allows to express every property of finite structures that is decidable in the complexity class ΣP 2 (NPNP). Thus, under widely believ ..."
Abstract
-
Cited by 456 (102 self)
- Add to MetaCart
Disjunctive Logic Programming (DLP) is an advanced formalism for knowledge representation and reasoning, which is very expressive in a precise mathematical sense: it allows to express every property of finite structures that is decidable in the complexity class ΣP 2 (NPNP). Thus, under widely believed assumptions, DLP is strictly more expressive than normal (disjunction-free) logic programming, whose expressiveness is limited to properties decidable in NP. Importantly, apart from enlarging the class of applications which can be encoded in the language, disjunction often allows for representing problems of lower complexity in a simpler and more natural fashion. This paper presents the DLV system, which is widely considered the state-of-the-art implementation of disjunctive logic programming, and addresses several aspects. As for problem solving, we provide a formal definition of its kernel language, function-free disjunctive logic programs (also known as disjunctive datalog), extended by weak constraints, which are a powerful tool to express optimization problems. We then illustrate the usage of DLV as a tool for knowledge representation and reasoning, describing a new declarative programming methodology which allows one to encode complex problems (up to ∆P 3-complete problems) in a declarative fashion. On the foundational side, we provide a detailed analysis of the computational complexity of the language of
Data Exchange: Semantics and Query Answering
- In ICDT
, 2003
"... Data exchange is the problem of taking data structured under a source schema and creating an instance of a target schema that reflects the source data as accurately as possible. In this paper, we address foundational and algorithmic issues related to the semantics of data exchange and to query answe ..."
Abstract
-
Cited by 427 (41 self)
- Add to MetaCart
(Show Context)
Data exchange is the problem of taking data structured under a source schema and creating an instance of a target schema that reflects the source data as accurately as possible. In this paper, we address foundational and algorithmic issues related to the semantics of data exchange and to query answering in the context of data exchange. These issues arise because, given a source instance, there may be many target instances that satisfy the constraints of the data exchange problem. We give an algebraic specification that selects, among all solutions to the data exchange problem, a special class of solutions that we call universal. A universal solution has no more and no less data than required for data exchange and it represents the entire space of possible solutions. We then identify fairly general, and practical, conditions that guarantee the existence of a universal solution and yield algorithms to compute a canonical universal solution efficiently. We adopt the notion of "certain answers" in indefinite databases for the semantics for query answering in data exchange. We investigate the computational complexity of computing the certain answers in this context and also study the problem of computing the certain answers of target queries by simply evaluating them on a canonical universal solution.
On the decidability and complexity of query answering over inconsistent and incomplete databases
- In Proc. of PODS 2003
, 2003
"... In databases with integrity constraints, data may not satisfy the constraints. In this paper, we address the problem of obtaining consistent answers in such a setting, when key and inclusion dependencies are expressed on the database schema. We establish decidability and complexity results for query ..."
Abstract
-
Cited by 149 (29 self)
- Add to MetaCart
(Show Context)
In databases with integrity constraints, data may not satisfy the constraints. In this paper, we address the problem of obtaining consistent answers in such a setting, when key and inclusion dependencies are expressed on the database schema. We establish decidability and complexity results for query answering under different assumptions on data (soundness and/or completeness). In particular, after showing that the problem is in general undecidable, we identify the maximal class of inclusion dependencies under which query answering is decidable in the presence of key dependencies. Although obtained in a single database context, such results are directly applicable to data integration, where multiple information sources may provide data that are inconsistent with respect to the global view of the sources. 1.
Logical foundations of peer-to-peer data integration
- In Proc. of the 23rd ACM SIGACT SIGMOD SIGART Sym. on Principles of Database Systems (PODS-2004
, 2004
"... In peer-to-peer data integration, each peer exports data in terms of its own schema, and data interoperation is achieved by means of mappings among the peer schemas. Peers are autonomous systems and mappings are dynamically created and changed. One of the challenges in these systems is answering que ..."
Abstract
-
Cited by 107 (13 self)
- Add to MetaCart
(Show Context)
In peer-to-peer data integration, each peer exports data in terms of its own schema, and data interoperation is achieved by means of mappings among the peer schemas. Peers are autonomous systems and mappings are dynamically created and changed. One of the challenges in these systems is answering queries posed to one peer taking into account the mappings. Obviously, query answering strongly depends on the semantics of the overall system. In this paper, we compare the commonly adopted approach of interpreting peerto-peer systems using a first-order semantics, with an alternative approach based on epistemic logic. We consider several central properties of peer-to-peer systems: modularity, generality, and decidability. We argue that the approach based on epistemic logic is superior with respect to all the above properties. In particular, we show that, in systems in which peers have decidable schemas and conjunctive mappings, but are arbitrarily interconnected, the first-order approach may lead to undecidability of query answering, while the epistemic approach always preserves decidability. This is a fundamental property, since the actual interconnections among peers are not under the control of any actor in the system. 1.
J.B.: The chase revisited
- In: PODS (2008
"... We revisit the classical chase procedure, studying its properties as well as its applicability to standard database problems. We settle (in the negative) the open problem of decidability of termination of the standard chase, and we provide sufficient termination conditions which are strictly less ov ..."
Abstract
-
Cited by 87 (6 self)
- Add to MetaCart
We revisit the classical chase procedure, studying its properties as well as its applicability to standard database problems. We settle (in the negative) the open problem of decidability of termination of the standard chase, and we provide sufficient termination conditions which are strictly less over-conservative than the best previously known. We investigate the adequacy of the standard chase for checking query containment under constraints, constraint implication and computing certain answers in data exchange. We find room for improvement after gaining a deeper understanding of the chase by separating the algorithm from its result. We identify the properties of the chase result that are essential to the above applications, and we introduce the more general notion of an F-universal model set, which supports query and constraint languages that are closed under a class F of mappings. By choosing F appropriately, we extend prior results all the way to existential first-order queries and ∀∃-first-order constraints (and various standard sublanguages). We show that the standard chase is incomplete for finding universal model sets, and we introduce the extended core chase which is complete, i.e. finds an F-universal model set when it exists. A key advantage of the new chase is that the same algorithm can be applied for the mapping classes F of interest, by simply modifying appropriately the set of constraints given as input. Even when restricted to the typical input in prior work (unions of conjunctive queries and embedded dependencies), the new chase supports certain answer computation and containment/implication tests in strictly more cases than the incomplete standard chase.
Query rewriting and answering under constraints in data integration systems
- In Proc. of the 18th Int. Joint Conf. on Artificial Intelligence (IJCAI 2003
, 2003
"... In this paper we address the problem of query answering and rewriting in global-as-view data integration systems, when key and inclusion dependencies are expressed on the global integration schema. In the case of sound views, we provide sound and complete rewriting techniques for a maximal class of ..."
Abstract
-
Cited by 83 (25 self)
- Add to MetaCart
In this paper we address the problem of query answering and rewriting in global-as-view data integration systems, when key and inclusion dependencies are expressed on the global integration schema. In the case of sound views, we provide sound and complete rewriting techniques for a maximal class of constraints for which decidability holds. Then, we introduce a semantics which is able to cope with violations of constraints, and present a sound and complete rewriting technique for the same decidable class of constraints. Finally, we consider the decision problem of query answering and give decidability and complexity results. 1
Concept abduction and contraction for semantic-based discovery of matches and negotiation spaces in an e-marketplace
, 2005
"... ..."
Logic Programs for Consistently Querying Data Integration Systems
- In International Joint Conference on Artificial Intelligence (IJCAI
, 2003
"... We solve the problem of obtaining answers to queries posed to a mediated integration system under the local-as-view paradigm that are consistent wrt to certain global integrity constraints. For this, the query program is combined with logic programming specifications under the stable model semantics ..."
Abstract
-
Cited by 61 (5 self)
- Add to MetaCart
We solve the problem of obtaining answers to queries posed to a mediated integration system under the local-as-view paradigm that are consistent wrt to certain global integrity constraints. For this, the query program is combined with logic programming specifications under the stable model semantics of the class of minimal global instances, and of
Condensed representation of database repairs for consistent query answering
- In ICDT
, 2003
"... Abstract. Repairing a database means bringing the database in accordance with a given set of integrity constraints by applying modifications that are as small as possible. In the seminal work of Arenas et al. on query answering in the presence of inconsistency, the possible modifications considered ..."
Abstract
-
Cited by 49 (0 self)
- Add to MetaCart
(Show Context)
Abstract. Repairing a database means bringing the database in accordance with a given set of integrity constraints by applying modifications that are as small as possible. In the seminal work of Arenas et al. on query answering in the presence of inconsistency, the possible modifications considered are deletions and insertions of tuples. Unlike earlier work, we also allow tuple updates as a repair primitive. Update-based repairing is advantageous, because it allows rectifying an error within a tuple without deleting the tuple, thereby preserving other consistent values in the tuple. At the center of the paper is the problem of query answering in the presence of inconsistency relative to this refined repair notion. Given a query, a trustable answer is obtained by intersecting the query answers on all repaired versions of the database. The problem arising is that, in general, a database can be repaired in infinitely many ways. A positive result is that for conjunctive queries and full dependencies, there exists a condensed representation of all repairs that permits computing trustable query answers.