A Normal Form for XML Documents
Abstract

Cited by 167 (8 self)
This paper takes a first step towards the design and normalization theory for XML documents. We show that, like relational databases, XML documents may contain redundant information, and may be prone to update anomalies. Furthermore, such problems are caused by certain functional dependencies among paths in the document. Our goal is to find a way of converting an arbitrary DTD into a well-designed one that avoids these problems. We first introduce the concept of a functional dependency for XML, and define its semantics via a relational representation of XML. We then define an XML normal form, XNF, that avoids update anomalies and redundancies. We study its properties and show that it generalizes BCNF and a normal form for nested relations when those are appropriately coded as XML documents. Finally, we present a lossless algorithm for converting any DTD into one in XNF.
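As a rough illustration of the idea of a functional dependency over a relational representation of an XML document, the sketch below flattens a small document into rows keyed by path names and checks whether one set of paths determines another. The path names, the sample rows, and the helper `fd_holds` are hypothetical, and the check ignores the treatment of missing values that a full definition would need.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if every pair of rows that agrees on lhs also agrees on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[p] for p in lhs)
        val = tuple(row[p] for p in rhs)
        # Remember the first rhs value seen for this lhs value, compare later ones.
        if seen.setdefault(key, val) != val:
            return False
    return True

# Hypothetical flattening: one row per (course, student) pair in the document.
rows = [
    {"course.cno": "c1", "student.sno": "s1", "student.name": "Ann"},
    {"course.cno": "c2", "student.sno": "s1", "student.name": "Ann"},
    {"course.cno": "c2", "student.sno": "s2", "student.name": "Bob"},
]

# The student number determines the name, so "Ann" is stored redundantly.
name_determined = fd_holds(rows, ["student.sno"], ["student.name"])

# Updating only one copy of the name produces an update anomaly.
broken = [dict(r) for r in rows]
broken[0]["student.name"] = "Anne"
anomaly_detected = not fd_holds(broken, ["student.sno"], ["student.name"])
```

Here the name of student `s1` appears once per course, which is the kind of redundancy a normal form is meant to rule out.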
Principles of Programming with Complex Objects and Collection Types
 Theoretical Computer Science
, 1995
Abstract

Cited by 146 (29 self)
We present a new principle for the development of database query languages: that the primitive operations should be organized around types. Viewing a relational database as consisting of sets of records, this principle dictates that we should investigate separately operations for records and sets. There are two immediate advantages of this approach, which is partly inspired by basic ideas from category theory. First, it provides a language for structures in which record and set types may be freely combined: nested relations or complex objects. Second, the fundamental operations for sets are closely related to those for other "collection types" such as bags or lists, and this suggests how database languages may be uniformly extended to these new types. The most general operation on sets, that of structural recursion, is one in which not all programs are well-defined. In looking for limited forms of this operation that always give rise to well-defined operations, we find a number of close ...
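To see why structural recursion on sets is not always well-defined, consider a minimal sketch in which a set is presented as a sequence of insertions and a fold is applied to that sequence. The name `sr_insert` and the list-based presentation are assumptions made for illustration. The result is a function of the set only when the step function is insensitive to the order and repetition of insertions.

```python
from functools import reduce

def sr_insert(e, i, elems):
    """Fold i over one presentation of a set, starting from e."""
    return reduce(lambda acc, x: i(x, acc), elems, e)

# Two presentations of the same set {1, 2, 3}.
p1 = [1, 2, 3]
p2 = [3, 1, 2, 1]

# max is commutative and idempotent, so both presentations agree.
max1 = sr_insert(0, max, p1)
max2 = sr_insert(0, max, p2)

# Addition is not idempotent, so the "sum of a set" depends on the presentation.
sum1 = sr_insert(0, lambda x, acc: x + acc, p1)
sum2 = sr_insert(0, lambda x, acc: x + acc, p2)
```

The first fold defines a genuine operation on sets, while the second does not, since the two presentations of the same set give different answers.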
Finitely Representable Databases
, 1995
Abstract

Cited by 57 (8 self)
We study classes of infinite but finitely representable databases based on constraints, motivated by new database applications such as geographical databases. We formally define these notions and introduce the concept of query which generalizes queries over classical relational databases. We prove that in this context the basic properties of queries (satisfiability, containment, equivalence, etc.) are nonrecursive. We investigate the theory of finitely representable models and prove that it differs strongly from both classical model theory and finite model theory. In particular, we show that most of the well-known theorems of either one fail (compactness, completeness, locality, 0/1 laws, etc.). An immediate consequence is the lack of tools to consider the definability of queries in the relational calculus over finitely representable databases. We illustrate this very challenging problem through some classical examples. We then mainly concentrate on dense order databases, and exhibit...
Towards Tractable Algebras for Bags
, 1993
Abstract

Cited by 56 (5 self)
Bags, i.e. sets with duplicates, are often used to implement relations in database systems. In this paper, we study the expressive power of algebras for manipulating bags. The algebra we present is a simple extension of the nested relation algebra. Our aim is to investigate how the use of bags in the language extends its expressive power, and increases its complexity. We consider two main issues, namely (i) the impact of the depth of bag nesting on the expressive power, and (ii) the complexity and the expressive power induced by the algebraic operations. We show that the bag algebra is more expressive than the nested relation algebra (at all levels of nesting), and that the difference may be subtle. We establish a hierarchy based on the structure of algebra expressions. This hierarchy is shown to be highly related to the properties of the powerset operator. Invited to a special issue of the Journal of Computer and System Sciences selected from ACM Princ. of Database Systems,...
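For readers unfamiliar with bag semantics, the following sketch shows operations that commonly appear in bag algebras, using Python's `collections.Counter` as a stand-in for a bag. It is not claimed to be the operator set of the algebra studied in the paper, and the sample bags are made up.

```python
from collections import Counter

a = Counter({"x": 2, "y": 1})   # the bag {x, x, y}
b = Counter({"x": 1, "z": 3})   # the bag {x, z, z, z}

additive_union = a + b          # multiplicities are added
difference = a - b              # multiplicities are subtracted, floored at zero
max_union = a | b               # maximum multiplicity per element
intersection = a & b            # minimum multiplicity per element
dedup = Counter(set(a))         # duplicate elimination: every multiplicity becomes 1
```

Additive union and duplicate elimination have no direct counterpart on sets, which gives some intuition for why bags can change the expressive power of an algebra.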
Deciding Containment for Queries with Complex Objects and Aggregations
, 1997
Abstract

Cited by 47 (6 self)
We address the problem of query containment and query equivalence for complex objects. We show that for a certain conjunctive query language for complex objects, query containment and weak query equivalence are decidable. Our results have two consequences. First, when the answers of the two queries are guaranteed not to contain empty sets, then weak equivalence coincides with equivalence, and our result answers partially an open problem about the equivalence of nest; unnest queries for complex objects [GPG90]. Second, we derive an NP-complete algorithm for checking the equivalence of certain conjunctive queries with grouping and aggregates. Our results rely on a translation of the containment and equivalence conditions for complex objects into novel conditions on conjunctive queries, which we call simulation and strong simulation. These conditions are more complex than containment of conjunctive queries, because they involve arbitrary numbers of quantifier alternations. We prove that c...
Comprehension Syntax
 SIGMOD RECORD
, 1994
Abstract

Cited by 24 (4 self)
The syntax of comprehensions is very close to the syntax of a number of practical database query languages and is, we believe, a better starting point than first-order logic for the development of database languages. We give an informal account of a language based on comprehension syntax that deals uniformly with a variety of collection types; it also includes pattern matching, variant types and function definition. We show, again informally, how comprehension syntax is a natural fragment of structural recursion, a much more powerful programming paradigm for collection types. We also show that a very small "abstract syntax language" can serve as a basis for the implementation and optimization of comprehension syntax.
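The closeness between comprehensions and query languages can be sketched with Python's own set comprehensions. The example below writes a simple join as a comprehension and then as nested applications of a flat-map operator, here called `ext`, which is one way to view a comprehension as a restricted form of structural recursion. The relation contents and the name `ext` are assumptions for illustration, not the notation of the paper.

```python
def ext(f, xs):
    """Flat-map over a set: the union of f(x) for every x in xs."""
    out = set()
    for x in xs:
        out |= f(x)
    return out

emps = {("ann", "d1"), ("bob", "d2")}    # (employee name, department id)
depts = {("d1", "db"), ("d2", "pl")}     # (department id, department name)

# A join written as a comprehension, close to a select-from-where query.
by_comprehension = {
    (name, dname)
    for (name, dept) in emps
    for (dept_id, dname) in depts
    if dept == dept_id
}

# The same query desugared into nested flat-maps with a conditional.
by_ext = ext(
    lambda e: ext(
        lambda d: {(e[0], d[1])} if e[1] == d[0] else set(),
        depts,
    ),
    emps,
)
```

Both forms compute the same set of (employee, department name) pairs, and a small core built on `ext` is the kind of target an optimizer could work on.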
Kleisli, a Functional Query System
 J. Funct. Prog
, 1998
Abstract

Cited by 21 (2 self)
Kleisli is a modern data integration system that has made a significant impact on bioinformatics data integration. This paper contains a brief introduction to the Kleisli system and an example to illustrate its uses in the bioinformatics arena. The primary query language provided by Kleisli is called CPL, which is a functional query language whose surface syntax is based on the comprehension syntax. Kleisli is itself implemented using the functional language SML. So this paper also describes the influence of functional programming research that benefits the Kleisli system, especially the less obvious ones at the implementation level. Availability. Kleisli has been commercialized under the name "KRIS". It is available from Kris Technology Inc., 713 Santa Cruz Ave, #2, Menlo Park, CA 94025. Direct email to info@krisinc.com and web browser to http://www.krisinc.com. 1 Introduction The Kleisli system (Davidson et al., 1997) is an advanced broad-scale integration technology that has pro...
An Algebra for Pomsets
, 1995
Abstract

Cited by 20 (3 self)
We study languages for manipulating partially ordered structures with duplicates (e.g. trees, lists). As a general framework, we consider the pomset (partially ordered multiset) data type. We introduce an algebra for pomsets, which generalizes traditional algebras for (nested) sets, bags and lists. This paper is motivated by the study of the impact of different language primitives on the expressive power. We show that the use of partially ordered types increases the expressive power significantly. Surprisingly, it turns out that the algebra when restricted to both unordered (bags) and totally ordered (lists) intermediate types, yields the same expressive power as fixpoint logic with counting on relational databases. It therefore constitutes a rather robust class of relational queries. On the other hand, we obtain a characterization of PTIME queries on lists by considering only totally ordered types.
An Extended Algebra for Constraint Databases
 IEEE Transactions on Knowledge and Data Engineering
, 1999
Abstract

Cited by 20 (3 self)
Constraint relational databases use constraints to both model and query data. A constraint relation contains a finite set of generalized tuples. Each generalized tuple is represented by a conjunction of constraints on a given logical theory and, depending on the logical theory and the specific conjunction of constraints, it may possibly represent an infinite set of relational tuples. For their characteristics, constraint databases are well suited to model multidimensional and structured data, like spatial and temporal data. The definition of an algebra for constraint relational databases is important in order to make constraint databases a practical technology. In this paper, we extend the previously defined constraint algebra (called generalized relational algebra). First, we show that the relational model is not the only possible semantic reference model for constraint relational databases and we show how constraint relations can be interpreted under the nested relational model. Then...
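The notion of a generalized tuple can be made concrete with a small sketch in which each constraint is a Python predicate over a point and a generalized tuple is a list of such predicates read as a conjunction. The representation, the helper names, and the sample constraints are all hypothetical; a real constraint database would store symbolic constraints over a fixed logical theory rather than opaque functions.

```python
from typing import Callable, Dict, List

Point = Dict[str, float]
Constraint = Callable[[Point], bool]
GeneralizedTuple = List[Constraint]   # read as a conjunction of constraints

def satisfies(gtuple: GeneralizedTuple, point: Point) -> bool:
    """A point belongs to a generalized tuple if it meets every constraint."""
    return all(c(point) for c in gtuple)

def in_relation(relation: List[GeneralizedTuple], point: Point) -> bool:
    """A constraint relation is a finite set (disjunction) of generalized tuples."""
    return any(satisfies(g, point) for g in relation)

# 1 <= x <= 3 and y >= x: one generalized tuple standing for infinitely many points.
strip = [lambda p: 1 <= p["x"], lambda p: p["x"] <= 3, lambda p: p["y"] >= p["x"]]
# x < 0: a second generalized tuple.
half_plane = [lambda p: p["x"] < 0]

relation = [strip, half_plane]

inside = in_relation(relation, {"x": 2.0, "y": 5.0})
outside = in_relation(relation, {"x": 2.0, "y": 1.0})
```

The relation holds only two generalized tuples, yet it describes an infinite set of (x, y) points, which is what makes the representation suitable for spatial and temporal data.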