Scalable Discriminative Learning for Natural Language Parsing and Translation
In Proceedings of the 2006 Neural Information Processing Systems (NIPS), 2006
Cited by 17 (1 self)
Parsing and translating natural languages can be viewed as problems of predicting tree structures. For machine learning approaches to these predictions, the diversity and high dimensionality of the structures involved mandate very large training sets. This paper presents a purely discriminative learning method that scales up well to problems of this size. Its accuracy was at least as good as other comparable methods on a standard parsing task. To our knowledge, it is the first purely discriminative learning algorithm for translation with tree-structured models. Unlike other popular methods, this method does not require a great deal of feature engineering a priori, because it performs feature selection over a compound feature space as it learns. Experiments demonstrate the method’s versatility, accuracy, and efficiency. Relevant software is freely available at
Constituent Parsing by Classification
2005
Cited by 14 (6 self)
Ordinary classification techniques can drive a conceptually simple constituent parser that achieves near state-of-the-art accuracy on standard test sets. Here we present such a parser, which avoids some of the limitations of other discriminative parsers. In particular, it does not place any restrictions upon which types of features are allowed. We also present several innovations for faster training of discriminative parsers: we show how training can be parallelized, how examples can be generated prior to training without a working parser, and how independently trained sub-classifiers that have never done any parsing can be effectively combined into a working parser. Finally, we propose a new figure-of-merit for best-first parsing with confidence-rated inferences.
Encoding Syntactic Annotation
Cited by 13 (3 self)
There is a widely recognized need for a general framework for linguistic annotation that is flexible and extensible enough to accommodate different annotation types and different theoretical and practical approaches, while at the same time enabling their representation in a "pivot" format that can serve as the basis for comparative evaluation, merging, and the development of reusable editing and processing tools. To address this need, we have developed a framework composed of an abstract model for a variety of different annotation types (e.g., morpho-syntactic tagging, syntactic annotation, coreference annotation, etc.), which can be instantiated in different ways depending on the annotator's approach and goals. The results have been incorporated into XCES (Ide et al., 2000), the XML instantiation of the Corpus Encoding Standard (Ide 1998a,b), which provides a ready-made, standard encoding format together with a data architecture designed specifically for linguistically annotated corpora.
Advances in discriminative parsing
In Proceedings of the Joint International Conference on Computational Linguistics and Association of Computational Linguistics (COLING/ACL), 2006
Cited by 9 (1 self)
The present work advances the accuracy and training speed of discriminative parsing. Our discriminative parsing method has no generative component, yet surpasses a generative baseline on constituent parsing, and does so with minimal linguistic cleverness. Our model can incorporate arbitrary features of the input and parse state, and performs feature selection incrementally over an exponential feature space during training. We demonstrate the flexibility of our approach by testing it with several parsing strategies and various feature sets. Our implementation is freely available at:
On Detecting Errors in Dependency Treebanks
Cited by 7 (6 self)
Dependency relations between words are increasingly recognized as an important level of linguistic representation that is close to the data and at the same time to the semantic functor-argument structure as a target of syntactic analysis and processing. Correspondingly, dependency structures play an important role in parser evaluation and for the training and evaluation of tools based on dependency treebanks. Gold standard dependency treebanks have been created for some languages, most notably Czech, and annotation efforts for other languages are under way. At the same time, general techniques for detecting errors in dependency annotation have not yet been developed. We address this gap by exploring how a technique proposed for detecting errors in constituency-based syntactic annotation can be adapted to systematically detect errors in dependency annotation. Building on an analysis of key properties and differences between constituency and dependency annotation, we discuss results for dependency treebanks for Swedish, Czech, and German. Complementing the focus on detecting errors in dependency treebanks to improve these gold standard resources, the discussion of dependency error detection for different languages and annotation schemes also raises questions of standardization for some aspects of dependency annotation, in particular regarding the locality of annotation, the assumption of a single head for each dependency relation, and phenomena such as coordination.
The NXT-format Switchboard corpus: A rich resource for investigating the syntax, semantics, pragmatics and prosody of dialogue
Language Resources and Evaluation Journal, 2009 (to appear)
Bootstrapping Parallel Treebanks
In Proceedings of the 7th Conference of the Workshop on Linguistically Interpreted Corpora (LINC), 2004
Cited by 5 (1 self)
This paper argues for the development of parallel treebanks. It summarizes the work done in this area, reports on experiments for building a Swedish-German treebank, and describes our approach for reusing resources from one language while annotating another language.
An Ownership Model of Annotation: The Ancient Greek Dependency Treebank
Cited by 4 (1 self)
We describe here the first release of the Ancient Greek Dependency Treebank (AGDT), a 190,903-word syntactically annotated corpus of literary texts including the works of Hesiod, Homer and Aeschylus. While the far larger works of Hesiod and Homer (142,705 words) have been annotated under a standard treebank production method of soliciting annotations from two independent reviewers and then reconciling their differences, we also put forth with Aeschylus (48,198 words) a new model of treebank production that draws on the methods of classical philology to take into account the personal responsibility of the annotator in the publication and ownership of a “scholarly” treebank.
Computational Challenges in Parsing by Classification
This paper presents a discriminative parser that does not use a generative model in any way, yet whose accuracy still surpasses a generative baseline. The parser performs feature selection incrementally during training, as opposed to a priori, which enables it to work well with minimal linguistic cleverness. The main challenge in building this parser was fitting the training data into memory. We introduce gradient sampling, which increased training speed 100-fold. Our implementation is freely available at
Automatic Sentiment Monitoring of Specific Topics in the Blogosphere
The classification of a text according to its sentiment is a task of growing relevance in many applications, including applications related to monitoring and tracking of the blogosphere. The blogosphere provides a rich source of information about products, personalities, technologies, etc. The identification of the sentiment expressed in articles is an important asset to a proper analysis of this user-generated data. In this paper we focus on the task of automatic determination of the polarity of blog articles, i.e., the sentiment analysis of blogs. In order to identify whether a piece of text expresses a positive or negative opinion, an approach based on word spotting was used. Empirical results on different domains show that our approach performs well compared to costly and domain-specific approaches. In addition, if we consider an aggregation of a set of documents and not the polarity of each individual document, we can achieve an accuracy distribution around 90% for specific topics of a certain domain.
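As a rough illustration of the word-spotting idea mentioned in the last abstract, the sketch below counts hits against two small word lists and then takes a majority vote over a set of documents. The word lists, the function names `polarity` and `aggregate_polarity`, and the tie-breaking rule are all assumptions made for this example; the abstract does not describe the lexicon or the aggregation scheme the authors actually used.

```python
import re
from collections import Counter

# Hypothetical toy lexicons; the abstract does not specify the word lists.
POSITIVE = {"good", "great", "excellent", "love", "reliable"}
NEGATIVE = {"bad", "poor", "terrible", "hate", "broken"}


def polarity(text):
    """Label one text by counting positive and negative lexicon hits."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def aggregate_polarity(texts):
    """Majority label over a set of documents (assumed rule: ties are neutral)."""
    counts = Counter(polarity(t) for t in texts)
    if counts["positive"] > counts["negative"]:
        return "positive"
    if counts["negative"] > counts["positive"]:
        return "negative"
    return "neutral"
```

Aggregating over several posts on one topic can smooth out individual misclassifications, which may be one reason the abstract reports higher accuracy for sets of documents than for single documents.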

