Results 1  10
of
114
Data Integration: A Theoretical Perspective
 Symposium on Principles of Database Systems
, 2002
"... Data integration is the problem of combining data residing at different sources, and providing the user with a unified view of these data. The problem of designing data integration systems is important in current real world applications, and is characterized by a number of issues that are interestin ..."
Abstract

Cited by 944 (45 self)
 Add to MetaCart
Data integration is the problem of combining data residing at different sources, and providing the user with a unified view of these data. The problem of designing data integration systems is important in current real world applications, and is characterized by a number of issues that are interesting from a theoretical point of view. This document presents on overview of the material to be presented in a tutorial on data integration. The tutorial is focused on some of the theoretical issues that are relevant for data integration. Special attention will be devoted to the following aspects: modeling a data integration application, processing queries in data integration, dealing with inconsistent data sources, and reasoning on queries.
The DLV System for Knowledge Representation and Reasoning
 ACM Transactions on Computational Logic
, 2002
"... Disjunctive Logic Programming (DLP) is an advanced formalism for knowledge representation and reasoning, which is very expressive in a precise mathematical sense: it allows to express every property of finite structures that is decidable in the complexity class ΣP 2 (NPNP). Thus, under widely believ ..."
Abstract

Cited by 455 (100 self)
 Add to MetaCart
Disjunctive Logic Programming (DLP) is an advanced formalism for knowledge representation and reasoning, which is very expressive in a precise mathematical sense: it allows to express every property of finite structures that is decidable in the complexity class ΣP 2 (NPNP). Thus, under widely believed assumptions, DLP is strictly more expressive than normal (disjunctionfree) logic programming, whose expressiveness is limited to properties decidable in NP. Importantly, apart from enlarging the class of applications which can be encoded in the language, disjunction often allows for representing problems of lower complexity in a simpler and more natural fashion. This paper presents the DLV system, which is widely considered the stateoftheart implementation of disjunctive logic programming, and addresses several aspects. As for problem solving, we provide a formal definition of its kernel language, functionfree disjunctive logic programs (also known as disjunctive datalog), extended by weak constraints, which are a powerful tool to express optimization problems. We then illustrate the usage of DLV as a tool for knowledge representation and reasoning, describing a new declarative programming methodology which allows one to encode complex problems (up to ∆P 3complete problems) in a declarative fashion. On the foundational side, we provide a detailed analysis of the computational complexity of the language of
Data Exchange: Semantics and Query Answering
 In ICDT
, 2003
"... Data exchange is the problem of taking data structured under a source schema and creating an instance of a target schema that reflects the source data as accurately as possible. In this paper, we address foundational and algorithmic issues related to the semantics of data exchange and to query answe ..."
Abstract

Cited by 420 (37 self)
 Add to MetaCart
(Show Context)
Data exchange is the problem of taking data structured under a source schema and creating an instance of a target schema that reflects the source data as accurately as possible. In this paper, we address foundational and algorithmic issues related to the semantics of data exchange and to query answering in the context of data exchange. These issues arise because, given a source instance, there may be many target instances that satisfy the constraints of the data exchange problem. We give an algebraic specification that selects, among all solutions to the data exchange problem, a special class of solutions that we call universal. A universal solution has no more and no less data than required for data exchange and it represents the entire space of possible solutions. We then identify fairly general, and practical, conditions that guarantee the existence of a universal solution and yield algorithms to compute a canonical universal solution efficiently. We adopt the notion of "certain answers" in indefinite databases for the semantics for query answering in data exchange. We investigate the computational complexity of computing the certain answers in this context and also study the problem of computing the certain answers of target queries by simply evaluating them on a canonical universal solution.
On the decidability and complexity of query answering over inconsistent and incomplete databases
 In Proc. of PODS 2003
, 2003
"... In databases with integrity constraints, data may not satisfy the constraints. In this paper, we address the problem of obtaining consistent answers in such a setting, when key and inclusion dependencies are expressed on the database schema. We establish decidability and complexity results for query ..."
Abstract

Cited by 150 (28 self)
 Add to MetaCart
(Show Context)
In databases with integrity constraints, data may not satisfy the constraints. In this paper, we address the problem of obtaining consistent answers in such a setting, when key and inclusion dependencies are expressed on the database schema. We establish decidability and complexity results for query answering under different assumptions on data (soundness and/or completeness). In particular, after showing that the problem is in general undecidable, we identify the maximal class of inclusion dependencies under which query answering is decidable in the presence of key dependencies. Although obtained in a single database context, such results are directly applicable to data integration, where multiple information sources may provide data that are inconsistent with respect to the global view of the sources. 1.
Logical foundations of peertopeer data integration
 In Proc. of the 23rd ACM SIGACT SIGMOD SIGART Sym. on Principles of Database Systems (PODS2004
, 2004
"... In peertopeer data integration, each peer exports data in terms of its own schema, and data interoperation is achieved by means of mappings among the peer schemas. Peers are autonomous systems and mappings are dynamically created and changed. One of the challenges in these systems is answering que ..."
Abstract

Cited by 106 (13 self)
 Add to MetaCart
(Show Context)
In peertopeer data integration, each peer exports data in terms of its own schema, and data interoperation is achieved by means of mappings among the peer schemas. Peers are autonomous systems and mappings are dynamically created and changed. One of the challenges in these systems is answering queries posed to one peer taking into account the mappings. Obviously, query answering strongly depends on the semantics of the overall system. In this paper, we compare the commonly adopted approach of interpreting peertopeer systems using a firstorder semantics, with an alternative approach based on epistemic logic. We consider several central properties of peertopeer systems: modularity, generality, and decidability. We argue that the approach based on epistemic logic is superior with respect to all the above properties. In particular, we show that, in systems in which peers have decidable schemas and conjunctive mappings, but are arbitrarily interconnected, the firstorder approach may lead to undecidability of query answering, while the epistemic approach always preserves decidability. This is a fundamental property, since the actual interconnections among peers are not under the control of any actor in the system. 1.
J.B.: The chase revisited
 In: PODS (2008
"... We revisit the classical chase procedure, studying its properties as well as its applicability to standard database problems. We settle (in the negative) the open problem of decidability of termination of the standard chase, and we provide sufficient termination conditions which are strictly less ov ..."
Abstract

Cited by 83 (6 self)
 Add to MetaCart
(Show Context)
We revisit the classical chase procedure, studying its properties as well as its applicability to standard database problems. We settle (in the negative) the open problem of decidability of termination of the standard chase, and we provide sufficient termination conditions which are strictly less overconservative than the best previously known. We investigate the adequacy of the standard chase for checking query containment under constraints, constraint implication and computing certain answers in data exchange. We find room for improvement after gaining a deeper understanding of the chase by separating the algorithm from its result. We identify the properties of the chase result that are essential to the above applications, and we introduce the more general notion of an Funiversal model set, which supports query and constraint languages that are closed under a class F of mappings. By choosing F appropriately, we extend prior results all the way to existential firstorder queries and ∀∃firstorder constraints (and various standard sublanguages). We show that the standard chase is incomplete for finding universal model sets, and we introduce the extended core chase which is complete, i.e. finds an Funiversal model set when it exists. A key advantage of the new chase is that the same algorithm can be applied for the mapping classes F of interest, by simply modifying appropriately the set of constraints given as input. Even when restricted to the typical input in prior work (unions of conjunctive queries and embedded dependencies), the new chase supports certain answer computation and containment/implication tests in strictly more cases than the incomplete standard chase.
Query rewriting and answering under constraints in data integration systems
 In Proc. of the 18th Int. Joint Conf. on Artificial Intelligence (IJCAI 2003
, 2003
"... In this paper we address the problem of query answering and rewriting in globalasview data integration systems, when key and inclusion dependencies are expressed on the global integration schema. In the case of sound views, we provide sound and complete rewriting techniques for a maximal class of ..."
Abstract

Cited by 82 (25 self)
 Add to MetaCart
In this paper we address the problem of query answering and rewriting in globalasview data integration systems, when key and inclusion dependencies are expressed on the global integration schema. In the case of sound views, we provide sound and complete rewriting techniques for a maximal class of constraints for which decidability holds. Then, we introduce a semantics which is able to cope with violations of constraints, and present a sound and complete rewriting technique for the same decidable class of constraints. Finally, we consider the decision problem of query answering and give decidability and complexity results. 1
Logic Programs for Consistently Querying Data Integration Systems
 In International Joint Conference on Artificial Intelligence (IJCAI
, 2003
"... We solve the problem of obtaining answers to queries posed to a mediated integration system under the localasview paradigm that are consistent wrt to certain global integrity constraints. For this, the query program is combined with logic programming specifications under the stable model semantics ..."
Abstract

Cited by 61 (5 self)
 Add to MetaCart
We solve the problem of obtaining answers to queries posed to a mediated integration system under the localasview paradigm that are consistent wrt to certain global integrity constraints. For this, the query program is combined with logic programming specifications under the stable model semantics of the class of minimal global instances, and of
Concept abduction and contraction for semanticbased discovery of matches and negotiation spaces in an emarketplace
, 2005
"... ..."
Probabilistic data exchange
 In Proc. ICDT
, 2010
"... The work reported here lays the foundations of data exchange in the presence of probabilistic data. This requires rethinking the very basic concepts of traditional data exchange, such as solution, universal solution, and the certain answers of target queries. We develop a framework for data exchange ..."
Abstract

Cited by 43 (7 self)
 Add to MetaCart
(Show Context)
The work reported here lays the foundations of data exchange in the presence of probabilistic data. This requires rethinking the very basic concepts of traditional data exchange, such as solution, universal solution, and the certain answers of target queries. We develop a framework for data exchange over probabilistic databases, and make a case for its coherence and robustness. This framework applies to arbitrary schema mappings, and finite or countably infinite probability spaces on the source and target instances. After establishing this framework and formulating the key concepts, we study the application of the framework to a concrete and practical setting where probabilistic databases are compactly encoded by means of annotations formulated over random Boolean variables. In this setting, we study the problems of testing for the existence of solutions and universal solutions, materializing such solutions, and evaluating target queries (for unions of conjunctive queries) in both the exact sense and the approximate sense. For each of the problems, we carry out a complexity analysis based on properties of the annotation, in various classes of dependencies. Finally, we show that the framework and results easily and completely generalize to allow not only the data, but also the schema mapping itself to be probabilistic.