Results 1 - 3 of 3
Cut and Paste, 1998
Cited by 71 (10 self)
The paper develops Editor, a language for manipulating semi-structured documents, such as the ones typically available on the Web. Editor programs are based on two simple ideas, taken from text editors: "search" instructions are used to select regions of interest in a document, and "cut & paste" instructions are used to restructure them. We study the expressive power and the complexity of these programs. We show that they are computationally complete, in the sense that any computable document restructuring can be expressed in Editor. We also study the complexity of a safe subclass of programs, showing that it captures exactly the class of polynomial-time restructurings. The language has been implemented in Java, and is currently used in the Araneus project as a basis for a wrapper-generation toolkit. 1 Introduction It is well known that databases provide robust technology for querying highly structured data in a flexible and efficient way. Recently, the manipulation of less structured information has als...
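The two ideas named in the abstract, "search" to select a region and "cut & paste" to restructure, can be illustrated with a minimal Python sketch. This is not Editor syntax and not the authors' Java implementation; the functions `search`, `cut`, and `paste` and the sample document are hypothetical names chosen only for illustration, and regular expressions stand in for whatever search instructions Editor actually provides.

```python
import re

def search(document, pattern):
    """Return the (start, end) span of every region matching the pattern."""
    return [m.span() for m in re.finditer(pattern, document)]

def cut(document, span):
    """Remove the region at span; return the remaining text and the clipboard."""
    start, end = span
    return document[:start] + document[end:], document[start:end]

def paste(document, position, clipboard):
    """Insert the clipboard text at the given position."""
    return document[:position] + clipboard + document[position:]

# Hypothetical document: move the first list item to the end.
doc = "<li>1998</li><li>Cut and Paste</li>"
first_item = search(doc, r"<li>.*?</li>")[0]
remaining, clipboard = cut(doc, first_item)
restructured = paste(remaining, len(remaining), clipboard)
print(restructured)
```

Each operation returns a new string rather than mutating the document, which keeps the sketch easy to follow; it says nothing about how Editor itself represents documents.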
Sequences, Datalog and Transducers, 1996
Cited by 24 (5 self)
This paper develops a query language for sequence databases, such as genome databases and text databases. The language, called Sequence Datalog, extends classical Datalog with interpreted function symbols for manipulating sequences. It has both a clear operational semantics and a clear declarative semantics, based on a new notion called the extended active domain of a database. The extended active domain contains all the sequences in the database and all their subsequences. This idea leads to a clear distinction between safe and unsafe recursion over sequences: safe recursion stays inside the extended active domain, while unsafe recursion does not. By carefully limiting the amount of unsafe recursion, the paper develops a safe and expressive subset of Sequence Datalog. As part of the development, a new type of transducer is introduced, called a generalized sequence transducer. Unsafe recursion is allowed only within these generalized transducers. Generalized transducers extend ordinary transducers by allowing them to invoke other transducers as "subroutines." Generalized transducers can be implemented in Sequence Datalog in a straightforward way. Moreover, their introduction into the language leads to simple conditions that guarantee safety and finiteness. This paper develops two such conditions. The first condition expresses exactly the class of PTIME sequence functions, and the second expresses exactly the class of elementary sequence functions.
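The extended active domain described above can be made concrete with a short Python sketch. It rests on two assumptions that the abstract does not state: "subsequences" is read as contiguous subsequences, and the empty sequence is included. The function names `extended_active_domain` and `stays_in_domain` are hypothetical.

```python
def extended_active_domain(database):
    """Collect every sequence in the database plus all contiguous subsequences.

    Assumption: subsequences are contiguous, and the empty sequence is included.
    """
    domain = {""}
    for seq in database:
        for i in range(len(seq)):
            for j in range(i + 1, len(seq) + 1):
                domain.add(seq[i:j])
    return domain

def stays_in_domain(value, domain):
    """A derived sequence is 'safe' in this sketch if it is already in the domain."""
    return value in domain

db = ["acg"]
domain = extended_active_domain(db)

# Taking a suffix stays inside the domain; concatenation can leave it.
print(stays_in_domain("acg"[1:], domain))
print(stays_in_domain("acg" + "acg", domain))
```

Under these assumptions, operations that only extract parts of stored sequences never leave the domain, while constructive operations such as concatenation can produce sequences outside it, which is the distinction the abstract draws between safe and unsafe recursion.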
Querying Sequence Databases with Transducers
In International Workshop on Database Programming Languages (DBPL), number 1369 in Lecture Notes in Computer Science, 1997
Cited by 2 (1 self)
This paper develops a database query language called Transducer Datalog, motivated by the needs of a new and emerging class of database applications. In these applications, such as text databases and genome databases, the storage and manipulation of long character sequences is a crucial feature. The issues involved in managing this kind of data are not addressed by traditional database systems, either in theory or in practice. To address these issues, we recently introduced a new machine model called a generalized sequence transducer. These generalized transducers extend ordinary transducers by allowing them to invoke other transducers as "subroutines." This paper establishes the computational properties of Transducer Datalog, a query language based on this new machine model. In the process, we develop a hierarchy of time-complexity classes based on the Ackermann function. The lower levels of this hierarchy correspond to well-known complexity classes, such as polynomial time...
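The idea of a transducer invoking another transducer as a "subroutine" can be illustrated with a toy Python sketch. It does not reproduce the paper's formal machine model: the transducers here have a single state, and the names `run_transducer`, `double`, and `generalized` are hypothetical. The only point is that the outer machine hands its current output to an inner machine at every step.

```python
def run_transducer(step, text):
    """Toy one-state transducer: map each input symbol to an output string."""
    return "".join(step(symbol) for symbol in text)

def double(text):
    """Ordinary transducer used as the subroutine: emit every symbol twice."""
    return run_transducer(lambda symbol: symbol + symbol, text)

def generalized(text):
    """Toy 'generalized' transducer: after reading each symbol, append it to
    the output and replace the output with the subroutine's result."""
    output = ""
    for symbol in text:
        output = double(output + symbol)
    return output

print(generalized("ab"))
print(len(generalized("abc")))
```

Because the subroutine is applied to the accumulated output at every step, the output length in this toy grows much faster than the input length, which hints at why the abstracts treat such invocations as the place where unsafe recursion has to be controlled.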

