Tregex and tsurgeon: tools for querying and manipulating tree data structures
- In 5th International Conference on Language Resources and Evaluation, 2006
Cited by 97 (1 self)
With syntactically annotated corpora becoming increasingly available for a variety of languages and grammatical frameworks, tree query tools have proven invaluable to linguists and computer scientists for both data exploration and corpus-based research. We provide a combined engine for tree query (Tregex) and manipulation (Tsurgeon) that can operate on arbitrary tree data structures with no need for preprocessing. Tregex remedies several expressive and implementational limitations of existing query tools, while Tsurgeon is to our knowledge the most expressive tree manipulation utility available.
Xpath leashed
- In ACM Computing Surveys, 2007
Cited by 52 (3 self)
This survey gives an overview of formal results on the XML query language XPath. We identify several important fragments of XPath, focusing on subsets of XPath 1.0. We then give results on the expressiveness of XPath and its fragments compared to other formalisms for querying trees, algorithms and complexity bounds for evaluation of XPath queries, and static analysis of XPath queries.
Querying and updating treebanks: A critical survey and requirements analysis
- In Proceedings of the Australasian Language Technology Workshop, 2004
Cited by 27 (8 self)
Language technology makes extensive use of hierarchically annotated text and speech data. These databases are stored in flat files and manipulated using corpus-specific query tools or special-purpose scripts. While the size of these databases and the range of applications have grown rapidly in recent years, neither method for managing the data has led to reusable, scalable software. The formal properties of the query languages are not well understood. Hence established methods for indexing tree data and optimizing tree queries cannot be employed. We analyze a range of existing linguistic query languages, and adduce a set of requirements for a reusable, scalable linguistic query language.
Designing and Evaluating an XPath Dialect for Linguistic Queries
- In Proc. of the 22nd Int. Conf. on Data Engineering (ICDE 2006), 2006
Cited by 27 (7 self)
Linguistic research and natural language processing employ large repositories of ordered trees. XML, a standard ordered tree model, and XPath, its associated language, are natural choices for linguistic data and queries. However, several important expressive features required for linguistic queries are missing or hard to express in XPath. In this paper, we motivate and illustrate these features with a variety of linguistic queries. Then we propose extensions to XPath to support linguistic queries, and design an efficient query engine based on a novel labeling scheme. Experiments demonstrate that our language is not only sufficiently expressive for linguistic trees but also efficient for practical usage.
XPath, transitive closure logic, and nested tree walking automata
- In Proceedings PODS 2008, 2008
Cited by 26 (0 self)
We consider the navigational core of XPath, extended with two operators: the Kleene star for taking the transitive closure of path expressions, and a subtree relativisation operator, allowing one to restrict attention to a specific subtree while evaluating a subexpression. We show that the expressive power of this XPath dialect equals that of FO(MTC), first-order logic extended with monadic transitive closure. We also give a characterization in terms of nested tree-walking automata. Using the latter we then proceed to show that the language is strictly less expressive than MSO. This solves an open question about the relative expressive power of FO(MTC) and MSO on trees. We also investigate the complexity for our XPath dialect. We show that query evaluation can be done in polynomial time (combined complexity), but that satisfiability and query containment (as well as emptiness for our automaton model) are 2ExpTime-complete (they are ExpTime-complete for Core XPath).
Processing Queries on Tree-Structured Data Efficiently
- In PODS’06, 2006
Cited by 13 (0 self)
This is a survey of algorithms, complexity results, and general solution techniques for efficiently processing queries on tree-structured data. I focus on query languages that compute nodes or tuples of nodes – conjunctive queries, first-order queries, datalog, and XPath. I also point out a number of connections among previous results that have not been observed before. The techniques belong to five groups: 1. employing orders on the nodes of the tree for efficient labeling schemes and structural joins, 2. linear-time algorithms for evaluating Horn-SAT (the datalog technique), 3. structural decomposition techniques for queries, 4. query rewriting, and 5. holistic query processing techniques that can be explained using ideas from constraint satisfaction.
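The first technique group named in this abstract (node orders used as labeling schemes for structural joins) can be illustrated with a minimal sketch. It is not taken from the survey: the `Node` class, `label_tree`, `is_ancestor`, `structural_join`, and the toy parse tree are all hypothetical names chosen for illustration. The sketch relies only on the standard fact that a node a is a proper ancestor of a node d exactly when a comes before d in preorder and after d in postorder.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    children: list["Node"] = field(default_factory=list)
    pre: int = -1   # preorder rank, assigned by label_tree
    post: int = -1  # postorder rank, assigned by label_tree


def label_tree(root: Node) -> None:
    """Assign preorder and postorder ranks in a single depth-first traversal."""
    counters = {"pre": 0, "post": 0}

    def visit(node: Node) -> None:
        node.pre = counters["pre"]
        counters["pre"] += 1
        for child in node.children:
            visit(child)
        node.post = counters["post"]
        counters["post"] += 1

    visit(root)


def is_ancestor(a: Node, d: Node) -> bool:
    """Constant-time test on the labels alone: a is a proper ancestor of d."""
    return a.pre < d.pre and a.post > d.post


def structural_join(ancestors: list[Node], candidates: list[Node]) -> list[tuple[str, str]]:
    """Naive nested-loop join that returns (ancestor, descendant) label pairs.

    Practical structural-join algorithms avoid the nested loop; this version
    only shows that the labels suffice, with no tree navigation needed.
    """
    return [(a.label, d.label) for a in ancestors for d in candidates if is_ancestor(a, d)]


# Hypothetical toy parse tree: (S (NP DT NN) (VP VB))
dt, nn, vb = Node("DT"), Node("NN"), Node("VB")
np_ = Node("NP", [dt, nn])
vp = Node("VP", [vb])
s = Node("S", [np_, vp])
label_tree(s)

pairs = structural_join([np_, vp], [dt, nn, vb])
```

Once the labels are stored with each node, the ancestor test needs no access to the tree itself, which is what makes such schemes attractive for index-based evaluation.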
Coral: Corpus access in controlled language
- Corpora, 2012
Cited by 5 (5 self)
In this paper, we present Coral, an interface in which complex corpus queries can be expressed in a controlled subset of natural English. With the help of a predictive editor, users can compose queries and submit them to the Coral system, which then automatically translates them into formal AQL statements. We give an overview of the controlled natural language developed for Coral and describe the functionalities of the predictive editor provided for it. We also report on a user experiment in which the system was evaluated. The results show that, with Coral, corpora of annotated texts can be queried more easily and faster than with the existing ANNIS interface. Our system demonstrates that complex corpora can be accessed without the need to learn a complicated formal query language.
Online Evaluation of Regular Tree Queries
Cited by 5 (0 self)
Regular tree queries (RTQs) are a class of queries considered especially relevant for the expressiveness and evaluation of XML query languages. The algorithms proposed so far for evaluating queries online, while scanning the input data rather than by explicitly building the tree representation of the input beforehand, only cover restricted subsets of RTQs. In contrast, we introduce here an efficient algorithm for the online evaluation of unrestricted RTQs. We prove our algorithm is optimal in the sense that it finds matches at the earliest possible time for the query and the input document at hand. The time complexity of the algorithm is quadratic in the input size in the worst case and linear in many practical cases. Preliminary experimental evaluations of our practical implementation are very encouraging.
Storing and Querying Historical Texts in a Relational Database
- Informatik-Bericht 176, Inst. für Informatik, Humboldt-Universität zu Berlin, 2005
Cited by 3 (2 self)
This paper describes an approach for storing and querying a large corpus of linguistically annotated historical texts in a relational database management system. Texts in such a corpus have a complex structure consisting of multiple text layers that are richly annotated and aligned to each other. Modeling and managing such corpora poses various challenges not present in simpler text collections. In particular, it is a difficult task to design and efficiently implement a query language for such complex annotation structures that fulfills the requirements of linguists and philologists. In this report, we describe steps towards a solution of this task. We describe a model for storing arbitrarily complex linguistic annotation schemes for text. The text itself may be present in various transliterations, transcriptions, or editions. We identify the main requirements for a query language on linguistic annotations in this scenario. From these requirements, we derive fundamental query operators and sketch their implementation in our model. Furthermore, we discuss initial ideas for improving the efficiency of an implementation based on
XML Transformation Language Based on Monadic Second-order Logic
Cited by 2 (2 self)
Although monadic second-order logic (MSO) has been a foundation of XML queries, little work has attempted to take MSO formulae themselves as a programming construct. Indeed, MSO can express (1) all regular queries, (2) deep matching without explicit recursion, (3) queries that “don’t care” about unmentioned nodes, and (4) n-ary queries for locating n-tuples of nodes. While previous frameworks for subtree extraction (path expressions, pattern matches, etc.) each have some of these properties, none satisfies all. In this work, we have designed and implemented a practical XML transformation language, MTran, fully exploiting MSO’s expressiveness. Based on the XSLT-like “select-and-transform” paradigm, we design transformation templates specially suited to expressing structure-preserving transformations, eliminating the need for explicit recursive calls. Also, we allow nesting of templates for making use of an n-ary query that depends on the previously selected n − 1 nodes. For the implementation, we have developed an efficient evaluation strategy for n-ary MSO queries, consisting of (a) an exploitation of the MONA system for the translation from MSO to tree automata and (b) a linear-time query evaluation algorithm for tree automata. The latter is similar to the Flum-Frick-Grohe algorithm for locating n-tuples of sets of nodes, except that our query is specialized to querying tuples of nodes and employs partially lazy set operations to attain a simpler implementation with fewer tree traversals. Our preliminary experiments confirm that our strategy yields practical performance.