Robust Accurate Statistical Annotation of General Text
2002
Cited by 193 (13 self)
We describe a robust accurate domain-independent approach to statistical parsing incorporated into the new release of the ANLT toolkit, and publicly available as a research tool. The system has been used to parse many well known corpora in order to produce data for lexical acquisition efforts; it has also been used as a component in an open-domain question answering project. The performance of the system is competitive with that of statistical parsers using highly lexicalised parse selection models. However, we plan to extend the system to improve parse coverage, depth and accuracy.
The LinGO Redwoods Treebank -- Motivation and Preliminary Applications
Cited by 83 (21 self)
The LinGO Redwoods initiative is a seed activity in the design and development of a new type of treebank. While several medium- to large-scale treebanks exist for English (and for other major languages), pre-existing publicly available resources exhibit the following limitations: (i) annotation is mono-stratal, either encoding topological (phrase structure) or tectogrammatical (dependency) information, (ii) the depth of linguistic information recorded is comparatively shallow, (iii) the design and format of linguistic representation in the treebank hard-wires a small, predefined range of ways in which information can be extracted from the treebank, and (iv) representations in existing treebanks are static and over the (often year- or decade-long) evolution of a large-scale treebank tend to fall behind the development of the field. LinGO Redwoods aims at the development of a novel treebanking methodology, rich in nature and dynamic both in the ways linguistic data can be retrieved from the treebank in varying granularity and in the constant evolution and regular updating of the treebank itself. Since October 2001, the project has been working to build the foundations for this new type of treebank, to develop a basic set of tools for treebank construction and maintenance, and to construct an initial set of 10,000 annotated trees to be distributed together with the tools under an open-source license.
The Grammar Matrix: An Open-Source Starter-Kit for the Rapid Development of Cross-Linguistically Consistent Broad-Coverage Precision Grammars
Proceedings of the Workshop on Grammar Engineering and Evaluation at the 19th International Conference on Computational Linguistics, 2002
Cited by 70 (19 self)
The grammar matrix is an open-source starter-kit for the development of broad-coverage HPSGs. By using a type hierarchy to represent cross-linguistic generalizations and providing compatibility with other open-source tools for grammar engineering, evaluation, parsing and generation, it facilitates not only quick start-up but also rapid growth towards the wide coverage necessary for robust natural language processing and the precision parses and semantic representations necessary for natural language understanding.
Coupling CCG and Hybrid Logic Dependency Semantics
In Proc. ACL 2002, 2002
Cited by 69 (33 self)
Categorial grammar has traditionally used the λ-calculus to represent meaning. We present an alternative, dependency-based perspective on linguistic meaning and situate it in the computational setting. This perspective is formalized in terms of hybrid logic and has a rich yet perspicuous propositional ontology that enables a wide variety of semantic phenomena to be represented in a single meaning formalism. Finally,
LinGO Redwoods - A Rich and Dynamic Treebank for HPSG
In Beyond PARSEVAL. Workshop of the Third LREC Conference, 2002
Cited by 53 (12 self)
The LinGO Redwoods initiative is a seed activity in the design and development of a new type of treebank. A treebank is a (typically hand-built) collection of natural language utterances and associated linguistic analyses; typical treebanks---as for example the widely recognized Penn Treebank (Marcus, Santorini, & Marcinkiewicz, 1993), the Prague Dependency Treebank (Hajic, 1998), or the German TiGer Corpus (Skut, Krenn, Brants, & Uszkoreit, 1997)---assign syntactic phrase structure or tectogrammatical dependency trees over sentences taken from a naturally occurring source, often newspaper text. Applications of existing treebanks fall into two broad categories: (i) use of an annotated corpus in empirical linguistics as a source of structured language data and distributional patterns and (ii) use of the treebank for the acquisition (e.g. using stochastic or machine learning approaches) and evaluation of parsing systems.
Rapid prototyping of scalable grammars: Towards modularity in extensions to a language-independent core
In Proceedings of the 2nd International Joint Conference on Natural Language Processing IJCNLP-05 (Posters/Demos), Jeju Island, Korea, 2005
Cited by 47 (18 self)
We present a new way to simplify the construction of precise broad-coverage grammars, employing typologically motivated, customizable extensions to a language-independent core grammar. Each ‘module’ represents a salient dimension of cross-linguistic variation, and presents the grammar developer with simple choices that result in automatically generated language-specific software. We illustrate the approach for several phenomena and explore the interdependence of the modules.
Adapting Chart Realization to CCG
In: Proc
Cited by 36 (13 self)
We describe a bottom-up chart realization algorithm adapted for use with Combinatory Categorial Grammar (CCG), and show how it can be used to efficiently realize a wide range of coordination phenomena, including argument cluster coordination and gapping. The algorithm has been implemented as an extension to the OpenNLP open source CCG parser. As an avenue for future exploration, we also suggest how the realizer could be used to simplify the treatment of aggregation in conjunction with higher level content planning components.
Efficient Deep Processing of Japanese
In Proceedings of the 3rd Workshop on Asian Language Resources and Standardization at the 19th International Conference on Computational Linguistics
Cited by 34 (8 self)
We present a broad coverage Japanese grammar written in the HPSG formalism with MRS semantics. The grammar is created for use in real world applications, such that robustness and performance issues play an important role. It is connected to a POS tagging and word segmentation tool. This grammar is being developed in a multilingual context, requiring MRS structures that are easily comparable across languages.
Ensemble-based Active Learning for Parse Selection
Cited by 29 (3 self)
Supervised estimation methods are widely seen as being superior to semi- and fully unsupervised methods. However, supervised methods crucially rely upon training sets that need to be manually annotated. This can be very expensive, especially when skilled annotators are required. Active learning (AL) promises to help reduce this annotation cost. Within the complex domain of HPSG parse selection, we show that ideas from ensemble learning can help further reduce the cost of annotation. Our main results show that at times, an ensemble model trained with randomly sampled examples can outperform a single model trained using AL. However, converting the single-model AL method into an ensemble-based AL method shows that even this much stronger baseline model can be improved upon. Our best results show a reduction in annotation cost compared with single-model random sampling.