A Statistical Parser for Czech
, 1999
Cited by 100 (4 self)
This paper considers statistical parsing of Czech, which differs radically from English in at least two respects: (1) it is a highly inflected language, and (2) it has relatively free word order. These differences are likely to pose new problems for techniques that have been developed on English. We describe our experience in building on the parsing model of (Collins 97). Our final results - 80% dependency accuracy - represent good progress towards the 91% accuracy of the parser on English (Wall Street Journal) text.
Adding Semantic Annotation to the Penn TreeBank
- In Proceedings of the Human Language Technology Conference
, 2002
Cited by 88 (1 self)
This paper presents our basic approach to creating Proposition Bank, which involves adding a layer of semantic annotation to the Penn English TreeBank. Without attempting to confirm or disconfirm any particular semantic theory, our goal is to provide consistent argument labeling that will facilitate the automatic extraction of relational data. An argument such as the window in John broke the window and in The window broke would receive the same label in both sentences. In order to ensure reliable human annotation, we provide our annotators with explicit guidelines for labeling all of the syntactic and semantic frames of each particular verb. We give several examples of these guidelines and discuss the inter-annotator agreement figures. We also discuss our current experiments on the automatic expansion of our verb guidelines based on verb class membership. Our current rate of progress and our consistency of annotation demonstrate the feasibility of the task.
Intricacies of Collins' Parsing Model
- COMPUTATIONAL LINGUISTICS
Cited by 87 (1 self)
This paper documents a large set of heretofore unpublished details Collins used in his parser, such that, along with Collins' thesis (Collins, 1999), this paper contains all information necessary to duplicate Collins' benchmark results. Indeed, these as-yet-unpublished details account for an 11% relative reduction in error between a clean-room implementation of Collins' model and an implementation including all details. We also show a cleaner and equally well-performing method for the handling of punctuation and conjunction, and reveal certain other probabilistic oddities about Collins' parser. We not only analyze the effect of the unpublished details, but also re-analyze the effect of certain well-known details, revealing that bilexical dependencies are barely used by the model and that head choice is not nearly as important to overall parsing performance as once thought. Finally, we perform experiments that show that the true discriminative power of lexicalization appears to lie in the fact that unlexicalized syntactic structures are generated conditioning on the head word and its part of speech.
Statistical Dependency Analysis with Support Vector Machines
- In Proceedings of IWPT
, 2003
Cited by 83 (0 self)
In this paper, we propose a method for analyzing word-word dependencies in a deterministic bottom-up manner using Support Vector Machines. We experimented with dependency trees converted from Penn Treebank data, and achieved over 90% accuracy of word-word dependency. Though the result is a little worse than the most up-to-date phrase structure based parsers, it looks satisfactorily accurate considering that our parser uses no information from phrase structures.
Training Tree Transducers
- IN HLT-NAACL
, 2004
Cited by 81 (9 self)
Many probabilistic models for natural language are now written in terms of hierarchical tree structure. Tree-based modeling still lacks many of the standard tools taken for granted in (finite-state) string-based modeling. The theory of tree transducer automata provides a possible framework to draw on, as it has been worked out in an extensive literature. We motivate the use of tree transducers for natural language and address the training problem for probabilistic tree-to-tree and tree-to-string transducers.
Automatic Verb Classification Based on Statistical Distributions of Argument Structure
- Computational Linguistics
, 2001
Cited by 79 (15 self)
In this paper, we focus on argument structure--the thematic roles assigned by a verb to its arguments--as the way in which the relational semantics of the verb is represented at the syntactic level.
A novel use of statistical parsing to extract information from text
- ANLP
, 2000
Cited by 78 (4 self)
Since 1995, a few statistical parsing algorithms have demonstrated a breakthrough in parsing accuracy, as measured against the UPenn TREEBANK as a gold standard. In this paper we report adapting a lexicalized, probabilistic context-free parser to information extraction and evaluate this new technique on MUC-7 template elements and template relations.
Efficient Parsing for Bilexical Context-Free Grammars and Head Automaton Grammars
- IN ACL 37
, 1999
Cited by 74 (15 self)
Several recent stochastic parsers use bilexical grammars, where each word type idiosyncratically prefers particular complements with particular head words. We present O(n^4) parsing algorithms for two bilexical formalisms, improving the prior upper bounds of O(n^5). For a common special case that was known to allow O(n^3) parsing (Eisner, 1997), we present an O(n^3) algorithm with an improved grammar constant.
Corpus Variation and Parser Performance
, 2001
Cited by 72 (0 self)
Most work in statistical parsing has focused on a single corpus: the Wall Street Journal portion of the Penn Treebank. While this has allowed for quantitative comparison of parsing techniques, it has left open the question of how other types of text might affect parser performance, and how portable parsing models are across corpora. We examine these questions by comparing results for the Brown and WSJ corpora, and also consider which parts of the parser's probability model are particularly tuned to the corpus on which it was trained. This leads us to a technique for pruning parameters to reduce the size of the parsing model.
Modeling local coherence: An entity-based approach
- In Proceedings of ACL 2005
, 2005
Cited by 70 (5 self)
This paper considers the problem of automatic assessment of local coherence. We present a novel entity-based representation of discourse which is inspired by Centering Theory and can be computed automatically from raw text. We view coherence assessment as a ranking learning problem and show that the proposed discourse representation supports the effective learning of a ranking function. Our experiments demonstrate that the induced model achieves significantly higher accuracy than a state-of-the-art coherence model.

