A Query Language and Optimization Techniques for Unstructured Data
, 1996
Cited by 368 (34 self)
A new kind of data model has recently emerged in which the database is not constrained by a conventional schema. Systems like ACeDB, which has become very popular with biologists, and the recent Tsimmis proposal for data integration organize data in tree-like structures whose components can be used equally well to represent sets and tuples. Such structures allow great flexibility in data representation. What query language is appropriate for such structures? Here we propose a simple language UnQL for querying data organized as a rooted, edge-labeled graph. In this model, relational data may be represented as fixed-depth trees, and on such trees UnQL is equivalent to the relational algebra. The novelty of UnQL consists in its programming constructs for arbitrarily deep data and for cyclic structures. While strictly more powerful than query languages with path expressions like XSQL, UnQL can still be efficiently evaluated. We describe new optimization techniques for the deep or "vertical" dimension of UnQL queries. Furthermore, we show that known optimization techniques for operators on flat relations apply to the "horizontal" dimension of UnQL.
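The data model described in this abstract (a rooted, edge-labeled graph in which a relation becomes a fixed-depth tree) can be illustrated with a small Python sketch. This is not UnQL syntax or its evaluator; the `Graph` class, the helper names, the `"tuple"` edge label, and the sample relation are all hypothetical choices made for illustration.

```python
from collections import defaultdict


class Graph:
    """Rooted, edge-labeled directed graph; nodes are plain integers."""

    def __init__(self, root=0):
        self.root = root
        self.edges = defaultdict(list)  # node -> list of (label, target)
        self._next = root + 1

    def add_edge(self, source, label, target=None):
        """Add a labeled edge; create a fresh target node if none is given."""
        if target is None:
            target = self._next
            self._next += 1
        self.edges[source].append((label, target))
        return target


def follow(graph, start_nodes, label):
    """Nodes reached from start_nodes by one edge carrying `label`."""
    return [t for n in start_nodes for (l, t) in graph.edges[n] if l == label]


def reachable_labels(graph, start):
    """All edge labels reachable from `start`; the visited set handles cycles."""
    seen, labels, stack = set(), set(), [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        for label, target in graph.edges[node]:
            labels.add(label)
            stack.append(target)
    return labels


def project(graph, relation, attribute):
    """Relational projection as a fixed-depth walk: relation/tuple/attribute/value."""
    tuples = follow(graph, follow(graph, [graph.root], relation), "tuple")
    attr_nodes = follow(graph, tuples, attribute)
    return sorted(l for n in attr_nodes for (l, _) in graph.edges[n])


# A two-row relation encoded as a tree of depth four; values sit on edges.
g = Graph()
emp = g.add_edge(g.root, "employee")
for name, dept in [("Ada", "R&D"), ("Bob", "Sales")]:
    row = g.add_edge(emp, "tuple")
    g.add_edge(g.add_edge(row, "name"), name)
    g.add_edge(g.add_edge(row, "dept"), dept)

print(project(g, "employee", "name"))  # ['Ada', 'Bob']
```

The fixed-depth walk in `project` corresponds to the "horizontal" relational case, while `reachable_labels` hints at the "vertical" case, where traversal depth is not known in advance and cycles must be tolerated.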
BioKleisli: A Digital Library for Biomedical Researchers
, 1996
Cited by 70 (15 self)
Data of interest to biomedical researchers associated with the Human Genome Project (HGP) is stored all over the world in a number of different electronic data formats and accessible through a variety of interfaces and retrieval languages. These data sources include conventional relational databases with SQL interfaces, formatted text files on top of which indexing is provided for efficient retrieval (ASN.1-Entrez), and binary files that can be interpreted textually or graphically via special purpose interfaces (ACeDB). Researchers within the HGP want to combine data from these different data sources, add value through sophisticated data analysis techniques (such as the biosequence comparison software BLAST and FASTA), and view it using special purpose scientific visualization tools. However, currently there are no commercial tools for enabling such an integrated digital library, and a fundamental barrier to developing such tools appears to be one of language design and optimization: The data f...
A Data Transformation System for Biological Data Sources
- In Proceedings of 21st International Conference on Very Large Data Bases
, 1995
Cited by 69 (19 self)
Scientific data of importance to biologists in the Human Genome Project resides not only in conventional databases, but in structured files maintained in a number of different formats (e.g. ASN.1 and ACE) as well as sequence analysis packages (e.g. BLAST and FASTA). These formats and packages contain a number of data types not found in conventional databases, such as lists and variants, and may be deeply nested. We present in this paper techniques for querying and transforming such data, and illustrate their use in a prototype system developed in conjunction with the Human Genome Center for Chromosome 22. We also describe optimizations performed by the system, a crucial issue for bulk data. 1 Introduction The goal of the Human Genome Project (HGP) is to sequence the 24 distinct chromosomes comprising the human genome. Much of the information associated with the HGP resides not in conventional databases, but in files that have been formatted according to a variety of conventions. These...
Optimizing Object Queries Using an Effective Calculus
- ACM Transactions on Database Systems
, 1998
Cited by 43 (2 self)
This paper concentrates on query unnesting (also known as query decorrelation), an optimization that, even though it improves performance considerably, is not treated properly (if at all) by most OODB systems. Our framework generalizes many unnesting techniques proposed recently in the literature and is capable of removing any form of query nesting using a very simple and efficient algorithm. The simplicity of our method is due to the use of the monoid comprehension calculus as an intermediate form for OODB queries. The monoid comprehension calculus treats operations over multiple collection types, aggregates, and quantifiers in a similar way, resulting in a uniform way of unnesting queries, regardless of their type of nesting.
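The claim that one calculus covers collections, aggregates, and quantifiers uniformly can be made concrete with a small Python sketch. It assumes the usual reading of a monoid as a zero element, an associative merge, and a unit injection. The names `Monoid`, `comprehension`, `SET`, `SUM`, `SOME`, and `ALL` are hypothetical, and the paper's unnesting algorithm itself is not shown.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Monoid:
    zero: Any                         # identity element of merge
    merge: Callable[[Any, Any], Any]  # associative combining operation
    unit: Callable[[Any], Any]        # lifts one value into the monoid


SET = Monoid(frozenset(), lambda a, b: a | b, lambda x: frozenset([x]))
SUM = Monoid(0, lambda a, b: a + b, lambda x: x)
SOME = Monoid(False, lambda a, b: a or b, bool)   # existential quantifier
ALL = Monoid(True, lambda a, b: a and b, bool)    # universal quantifier


def comprehension(m, head, source, where=lambda x: True):
    """Evaluate the comprehension m{ head(x) | x <- source, where(x) } as a fold."""
    acc = m.zero
    for x in source:
        if where(x):
            acc = m.merge(acc, m.unit(head(x)))
    return acc


salaries = [("Ada", 120), ("Bob", 90), ("Cy", 100)]

# The same evaluator serves a set-valued query, an aggregate, and a quantifier.
names = comprehension(SET, lambda r: r[0], salaries, where=lambda r: r[1] >= 100)
total = comprehension(SUM, lambda r: r[1], salaries)
any_low = comprehension(SOME, lambda r: r[1] < 95, salaries)

print(sorted(names), total, any_low)  # ['Ada', 'Cy'] 310 True
```

Because every query form reduces to the same fold, a rewrite rule stated once over `comprehension` applies to all of them, which is the intuition behind a uniform treatment of nesting.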
Polymorphism and Type Inference in Database Programming
Cited by 37 (10 self)
In order to find a static type system that adequately supports database languages, we need to express the most general type of a program that involves database operations. This can be achieved through an extension to the type system of ML that captures the polymorphic nature of field selection, together with a technique that generalizes relational operators to arbitrary data structures. The combination provides a statically typed language in which generalized relational databases may be cleanly represented as typed structures. As in ML, types are inferred, which relieves the programmer of making the type assertions that may be required in a complex database environment. These extensions may also be used to provide static polymorphic typechecking in object-oriented languages and databases. A problem that arises with object-oriented databases is the apparent need for dynamic typechecking when dealing with queries on heterogeneous collections of objects. An extension of the type system needed for generalized relational operations can also be used for manipulating collections of dynamically typed values in a statically typed language. A prototype language based on these ideas has been implemented. While it lacks a proper treatment of persistent data, it demonstrates that a wide variety of database structures can be cleanly represented in a polymorphic programming language.
Incremental Recomputation of Recursive Queries with Nested Sets and Aggregate Functions
, 1997
Cited by 17 (7 self)
We examine the power of incremental evaluation systems that use an SQL-like language for maintaining recursively-defined views. We show that recursive queries such as transitive closure and "alternating paths" can be incrementally maintained in a nested relational language, when some auxiliary relations are allowed. In the presence of aggregate functions, even more queries can be maintained, for example, the "same generation" query. In contrast, it is still an open problem whether such queries are maintainable in relational calculus. We then restrict the language so that no nested relations are involved (but we keep the aggregate functions). Such a language captures the capability of most practical relational database systems. We prove that this restriction does not reduce the incremental computational power; that is, any query that can be maintained in a nested language with aggregates is still maintainable using only flat relations. We also show that one does not need auxiliar...
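To make the idea of incrementally maintaining a recursive view concrete, here is a minimal Python sketch for the simplest case only: keeping a transitive closure up to date when a single edge is inserted, without recomputing the recursion. It is a standard textbook update rule, not the paper's language or constructions; deletions (where auxiliary relations become important) are not handled, and the function names are hypothetical.

```python
def insert_edge(tc, a, b):
    """Return the transitive closure after adding edge (a, b) to the graph.

    Every new path must use the new edge, so it runs from a node that
    already reaches `a` (or `a` itself) to a node reachable from `b`
    (or `b` itself). No fixpoint iteration is needed.
    """
    sources = {a} | {x for (x, y) in tc if y == a}
    targets = {b} | {y for (x, y) in tc if x == b}
    return tc | {(x, y) for x in sources for y in targets}


def closure_from_scratch(edges):
    """Naive fixpoint computation, used here only as a reference."""
    tc = set(edges)
    changed = True
    while changed:
        new = {(x, w) for (x, y) in tc for (z, w) in tc if y == z}
        changed = not new <= tc
        tc |= new
    return tc


edges = [(1, 2), (3, 4), (2, 3)]
tc = set()
for a, b in edges:
    tc = insert_edge(tc, a, b)

print(tc == closure_from_scratch(edges))  # True
```

The update is a non-recursive expression over the stored view and the new edge, which is the kind of maintenance query an incremental evaluation system aims for.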
A Query Language for NC
- In Proceedings of 13th ACM Symposium on Principles of Database Systems
, 1994
Cited by 14 (9 self)
We show that a form of divide and conquer recursion on sets together with the relational algebra expresses exactly the queries over ordered relational databases which are NC-computable. At a finer level, we relate $k$ nested uses of recursion exactly to $AC^k$, $k \geq 1$. We also give corresponding results for complex objects. 1 Introduction NC is the complexity class of functions that are computable in poly-logarithmic time with polynomially many processors on a parallel random access machine (PRAM). The query language for NC discussed here is centered around a form of divide and conquer recursion ($dcr$) on finite sets which has obvious potential for parallel evaluation and can easily express, for example, transitive closure and parity. Divide and conquer with parameters $e$, $f$, $u$ defines the unique function $\varphi$, notation $dcr(e, f, u)$, taking finite sets as arguments, such that: $\varphi(\emptyset) \stackrel{\mathrm{def}}{=} e$, $\varphi(\{y\}) \stackrel{\mathrm{def}}{=} f(y)$, and $\varphi(s_1 \cup s_2) \stackrel{\mathrm{def}}{=} u(\varphi(s_1), \varphi(s_2))$ when $s_1 \cap s_2 = \emptyset$. For parity, we t...
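The three defining equations of divide and conquer recursion translate almost directly into code. The Python sketch below is an illustration only: it evaluates sequentially, splits the set at an arbitrary midpoint, and assumes that `u` is associative and commutative with `e` as its identity on the values involved, so that the result does not depend on how the set is split. The parameters chosen for parity and cardinality are one possible instantiation, not necessarily the one the truncated text goes on to give.

```python
from operator import add


def dcr(e, f, u):
    """Build phi with phi({}) = e, phi({y}) = f(y),
    phi(s1 ∪ s2) = u(phi(s1), phi(s2)) for disjoint s1, s2."""

    def phi(s):
        items = list(s)
        if not items:
            return e
        if len(items) == 1:
            return f(items[0])
        mid = len(items) // 2
        # The two halves are disjoint, so they could be evaluated in parallel.
        return u(phi(items[:mid]), phi(items[mid:]))

    return phi


# Assumed parameters: each element contributes True, combined by exclusive or.
parity = dcr(False, lambda y: True, lambda a, b: a != b)

# Assumed parameters: each element contributes 1, combined by addition.
cardinality = dcr(0, lambda y: 1, add)

print(parity({"a", "b", "c"}), cardinality({"a", "b", "c"}))  # True 3
```

Splitting in halves gives a recursion tree of logarithmic depth, which is where the potential for parallel evaluation mentioned in the abstract comes from.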
Kleisli, a Functional Query System
- J. Funct. Prog
, 1998
Cited by 13 (1 self)
Kleisli is a modern data integration system that has made a significant impact on bioinformatics data integration. This paper contains a brief introduction to the Kleisli system and an example to illustrate its uses in the bioinformatics arena. The primary query language provided by Kleisli is called CPL, which is a functional query language whose surface syntax is based on the comprehension syntax. Kleisli is itself implemented using the functional language SML. So this paper also describes the influence of functional programming research that benefits the Kleisli system, especially the less obvious ones at the implementation level. Availability. Kleisli has been commercialized under the name "KRIS". It is available from Kris Technology Inc., 713 Santa Cruz Ave, #2, Menlo Park, CA 94025. Direct email to info@kris-inc.com and web browser to http://www.kris-inc.com. 1 Introduction The Kleisli system (Davidson et al., 1997) is an advanced broad-scale integration technology that has pro...
Horizontal Query Optimization on Ordered Semistructured Data
, 1999
Cited by 12 (0 self)
The exchange and storage of XML data is becoming increasingly important. In contrast to conventional semistructured data [4, 1], the labels in a document-oriented representation such as XML are ordered. Furthermore, regular expressions (DTDs) describe the horizontal (and vertical) structure of the data. Traditional query languages for semi-structured data ignore the horizontal order and are therefore limited in their expressiveness and optimizability. We describe a query language for querying ordered semistructured data. This query language provides primitives for specifying more powerful queries on ordered semistructured data. Furthermore, we describe how horizontal type information in DTDs is used to optimize queries based on finite automata.
The Functional Guts of the Kleisli Query System
- SIGPLAN Notices
, 2000
Cited by 10 (2 self)
Kleisli is a modern data integration system that has made a significant impact on bioinformatics data integration. The primary query language provided by Kleisli is called CPL, which is a functional query language whose surface syntax is based on the comprehension syntax. Kleisli is itself implemented using the functional language SML. This paper describes the influence of functional programming research that benefits the Kleisli system, especially the less obvious ones at the implementation level. 1 Introduction The Kleisli system [14] is an advanced broad-scale integration technology that has proved useful in the bioinformatics arena. Many bioinformatics problems require access to data sources that are high in volume, highly heterogeneous and complex, constantly evolving, and geographically dispersed. Solutions to these problems usually involve multiple carefully sequenced steps and require information to be passed smoothly between the steps. Kleisli is designed to handle these req...

