## Head-Driven Statistical Models for Natural Language Parsing (1999)

Citations: 1139 (15 self)

### Citations

11694 | Maximum Likelihood From Incomplete Data via the EM - Dempster, Laird, et al. |

4744 | Introduction to Automata Theory, Languages, and Computation - Hopcroft, Motwani, et al. - 2000 |

3376 | Conditional random fields: Probabilistic models for segmenting and labeling sequence data - Lafferty, McCallum, et al. - 2001 |

2715 | The minimalist program - Chomsky - 1995 |

1183 | An empirical study of smoothing techniques for language modeling - Chen, Goodman - 1999 |

961 | A maximum-entropy-inspired parser - Charniak - 2000 |
Citation Context: ...on marks) and have the same label as a constituent in the treebank parse. Table 2 shows the results for models 1, 2 and 3 and a variety of other models in the literature. Two models (Collins 2000; Charniak 2000) outperform models 2 and 3 on section 23 of the treebank. Collins (2000) uses a technique based on boosting algorithms for machine learning that reranks n-best output from model 2 in this article. Ch... |

957 | Class-based n-gram models of natural language - Brown, Peter, et al. - 1992 |

915 | The language instinct - Pinker - 1994 |
Citation Context: ...ysis is semantically quite plausible, consider Bill believed John to have been shot.) As evidence that structural preferences can even override semantic plausibility, take the following example (from Pinker 1994): (5) Flip said that Squeaky will do the work yesterday. This sentence is a garden path: The structural preference for yesterday to modify the most recent verb is so strong that it is easy to miss th... |

767 | A stochastic parts program and noun phrase parser for unrestricted text - Church - 1988 |

767 | Tree-adjoining grammars - Joshi, Schabes - 1997 |

702 | An efficient boosting algorithm for combining preferences - Freund, Iyer, et al. - 1998 |
Citation Context: ...red model. The use of additional features gives clear improvements in performance. Collins (2000) shows similar improvements through a quite different model based on boosting approaches to reranking (Freund et al. 1998). An initial model—in fact Model 2 described in the current article—is used to generate N-best output. The reranking approach attempts to rerank the N-best lists using additional features that are no... |

698 | Syntactic Structures - Chomsky - 1957 |

632 | Statistical Language Learning - Charniak - 1993 |

619 | An inequality and associated maximization technique in statistical estimation for probabilistic functions of Markov processes - Baum - 1969 |

607 | Lexical-functional grammar: A formal system for grammatical representation - Kaplan, Bresnan - 1982 |

575 | A maximum-entropy model for part-of-speech tagging - Ratnaparkhi - 1996 |
Citation Context: ...be incorrect. With normalization, only the verb-object relation is incorrect. The justification for this is that there is an estimated 3% error rate in the hand-assigned POS tags in the treebank (Ratnaparkhi 1996), and we didn’t want this noise to contribute to dependency errors. Table 4 Dependency accuracy on section 0 of the treebank with Model 2. No lab... |

566 | Three Generative, Lexicalised Models for Statistical Parsing - Collins - 1997 |

545 | Generalized phrase structure grammar - Gazdar, Klein, et al. - 1985 |

487 | A New Statistical Parser Based on Bigram Lexical Dependencies - Collins - 1996 |
Citation Context: ...atment of punctuation (section 4.3) together with the addition of the Pp and Pcc parameters.) Model ≤ 40 Words (2,245 sentences): LR LP CBs 0 CBs ≤ 2 CBs; Magerman 1995: 84.6% 84.9% 1.26 56.6% 81.4%; Collins 1996: 85.8% 86.3% 1.14 59.9% 83.6%; Goodman 1997: 84.8% 85.3% 1.21 57.6% 81.4%; Charniak 1997: 87.5% 87.4% 1.00 62.1% 86.1%; Model 1: 87.9% 88.2% 0.95 65.8% 86.3%; Model 2: 88.5% 88.7% 0.92 66.7% 87.1%; Model 3: 88.... |

408 | Statistical parsing with a Context-Free Grammar and word statistics - Charniak - 1997 |
Citation Context: ...ameters.) Model ≤ 40 Words (2,245 sentences): LR LP CBs 0 CBs ≤ 2 CBs; Magerman 1995: 84.6% 84.9% 1.26 56.6% 81.4%; Collins 1996: 85.8% 86.3% 1.14 59.9% 83.6%; Goodman 1997: 84.8% 85.3% 1.21 57.6% 81.4%; Charniak 1997: 87.5% 87.4% 1.00 62.1% 86.1%; Model 1: 87.9% 88.2% 0.95 65.8% 86.3%; Model 2: 88.5% 88.7% 0.92 66.7% 87.1%; Model 3: 88.6% 88.7% 0.90 67.1% 87.4%; Charniak 2000: 90.1% 90.1% 0.74 70.1% 89.6%; Collins 2000: 90.... |

388 | Definite clause grammars for language analysis—a survey of the formalism and a comparison with augmented transition networks - Pereira, Warren - 1980 |
Citation Context: ...ve several equally semantically plausible analyses, but that structural preferences distinguish strongly among them. Take the following example (from Pereira and Warren 1980): (4) John was believed to have been shot by Bill. Surprisingly, this sentence has two analyses: Bill can be the deep subject of either believed or shot. Yet people have a very strong preference for ... |

362 | Statistical decision-tree models for parsing - Magerman - 1995 |
Citation Context: ...he main model changes were the improved treatment of punctuation (section 4.3) together with the addition of the Pp and Pcc parameters.) Model ≤ 40 Words (2,245 sentences): LR LP CBs 0 CBs ≤ 2 CBs; Magerman 1995: 84.6% 84.9% 1.26 56.6% 81.4%; Collins 1996: 85.8% 86.3% 1.14 59.9% 83.6%; Goodman 1997: 84.8% 85.3% 1.21 57.6% 81.4%; Charniak 1997: 87.5% 87.4% 1.00 62.1% 86.1%; Model 1: 87.9% 88.2% 0.95 65.8% 86.3%; Model ... |

340 | The penn treebank: Annotating predicate argument structure. ARPA human language technology workshop - Marcus, Kim, et al. - 1994 |

330 | Structural ambiguity and lexical relations - Hindle, Rooth - 1993 |

327 | Discriminative reranking for natural language parsing - Collins - 2000 |
Citation Context: ...ns, or quotation marks) and have the same label as a constituent in the treebank parse. Table 2 shows the results for models 1, 2 and 3 and a variety of other models in the literature. Two models (Collins 2000; Charniak 2000) outperform models 2 and 3 on section 23 of the treebank. Collins (2000) uses a technique based on boosting algorithms for machine learning that reranks n-best output from model 2 in t... |

323 | Nymble: a High-Performance Learning Name-finder - Bikel, Miller, et al. - 1997 |

316 | Three new probabilistic models for dependency parsing: An exploration - Eisner - 1996 |

299 | Trainable Grammars for Speech Recognition - Baker - 1979 |

270 | A procedure for quantitatively comparing the syntactic coverage of English grammars - Black, Abney, et al. - 1991 |
Citation Context: ...Wall Street Journal portion of the Penn Treebank (Marcus, Santorini, and Marcinkiewicz 1993) (approximately 40,000 sentences) and tested on section 23 (2,416 sentences). We use the PARSEVAL measures (Black et al. 1991) to compare performance: Labeled precision = number of correct constituents in proposed parse / number of constituents in proposed parse; Labeled recall = number of correct constituents in proposed parse / ... |
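The PARSEVAL definitions quoted in this snippet can be sketched in code. The following is a minimal, illustrative sketch (the `parseval` helper is hypothetical, not from the paper), assuming constituents are represented as (label, start, end) spans; the punctuation exclusions and multiset counting used by real evaluators are omitted:

```python
def parseval(proposed, gold):
    """Labeled PARSEVAL precision/recall over constituent spans.

    proposed, gold: iterables of (label, start, end) triples.
    A constituent is correct if an identical triple appears in the gold parse.
    """
    proposed, gold = set(proposed), set(gold)
    correct = len(proposed & gold)       # matching label and span
    precision = correct / len(proposed)  # correct / constituents proposed
    recall = correct / len(gold)         # correct / constituents in treebank parse
    return precision, recall

p, r = parseval(
    proposed=[("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5), ("NP", 3, 5)],
    gold=[("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5), ("PP", 3, 5)],
)
# 3 of 4 proposed constituents are correct, and 3 of 4 gold constituents
# are recovered, so precision = recall = 0.75 here.
```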

251 | Tree-bank Grammars - Charniak - 1996 |

232 | TINA: A natural language system for spoken language applications - Seneff - 1992 |

172 | Gemini: A natural language system for spoken language understanding - Dowding, Gawron, et al. - 1993 |

171 | A linear observed time statistical parser based on maximum entropy models - Ratnaparkhi - 1997 |
Citation Context: ...70.7% 89.6% Model ≤ 100 Words (2,416 sentences): LR LP CBs 0 CBs ≤ 2 CBs; Magerman 1995: 84.0% 84.3% 1.46 54.0% 78.8%; Collins 1996: 85.3% 85.7% 1.32 57.2% 80.8%; Charniak 1997: 86.7% 86.6% 1.20 59.5% 83.2%; Ratnaparkhi 1997: 86.3% 87.5% 1.21 60.2% —; Model 1: 87.5% 87.7% 1.09 63.4% 84.1%; Model 2: 88.1% 88.3% 1.06 64.0% 85.1%; Model 3: 88.0% 88.3% 1.05 64.3% 85.4%; Charniak 2000: 89.6% 89.5% 0.88 67.6% 87.7%; Collins 2000: 89.6% 8... |

167 | A rule-based approach to prepositional phrase attachment disambiguation - Brill, Resnik - 1994 |

163 | Building a Syntactically Annotated Corpus: The Prague Dependency Treebank - Hajič - 1998 |

162 | Towards history-based grammars: Using richer models for probabilistic parsing - Black, Jelinek, et al. - 1992 |

154 | Statistical attribute-value grammars - Abney - 1997 |

152 | Prepositional Phrase Attachment through a backed-off model - Collins, Brooks - 1995 |
Citation Context: ...some structural preference is not ideal, but is at least better than chance. This hypothesis is suggested by previous work on specific cases of attachment ambiguity such as PP attachment (see, e.g., Collins and Brooks 1995), which has shown that models will perform better given lexical statistics, and that a straight structural preference is merely a fallback. But some examples suggest this is not the case: that, in f... |

148 | Mathematical Statistics: Basic Ideas and Selected Topics, 2nd edition - Bickel, Doksum - 2001 |

139 | A statistical parser for Czech - Collins, Hajič, et al. - 1999 |

136 | Automatic grammar induction and parsing free text: A transformationbased approach - Brill - 1993 |

136 | Exploiting Syntactic Structure for Language Modeling - Chelba, Jelinek - 1998 |
Citation Context: ...und 1998. Of particular relevance is other work on parsing the Penn WSJ Treebank (Jelinek et al. 1994; Magerman 1995; Eisner 1996a, 1996b; Collins 1996; Charniak 1997; Goodman 1997; Ratnaparkhi 1997; Chelba and Jelinek 1998; Roark 2001). Eisner (1996a, 1996b) describes several dependency-based models that are also closely related to the models in this article. Collins (1996) also describes a dependency-based model appli... |

128 | Equations for part-of-speech tagging - Charniak, Hendrickson, et al. - 1993 |

125 | Stochastic lexicalized tree-adjoining grammars - Schabes - 1992 |

123 | Disambiguation of super parts of speech (or supertags): Almost parsing - Joshi, Srinivas - 1994 |

112 | A novel use of statistical parsing to extract information from text - Miller, Fox, et al. - 2000 |

111 | Applying probability measures to abstract languages - Booth, Thompson - 1973 |

106 | Efficient parsing for bilexical context-free grammars and head automaton grammars - Eisner, Satta - 1999 |

99 | Parsing Inside-Out - Goodman - 1998 |

95 | The NYU System for MUC-6 or Where’s the Syntax - Grishman - 1995 |

90 | Probabilistic tree-adjoining grammar as a framework for statistical natural language processing - Resnik - 1992 |

89 | Grammatical trigrams: A probabilistic model of link grammar - Lafferty, Sleator, et al. - 1992 |

85 | Syntactic Locality and Tree Adjoining Grammar: Grammatical, Acquisition and Processing Perspectives, Doctoral dissertation - Frank - 1992 |

73 | Learning Parse and Translation Decisions from Examples with Rich Context - Hermjakob - 1997 |

69 | The role of semantics in a grammar - McCawley - 1968 |

66 | Building a Large Annotated - Marcus, Marcinkiewicz, et al. - 1993 |
Citation Context: ...babilities conditioned on lexical heads. For this reason we refer to the models as head-driven statistical models. We describe evaluation of the three models on the Penn Wall Street Journal treebank (Marcus et al. 1993). Model 1 achieves 87.7/87.5% constituent precision and recall on sentences of up to 100 words in length in section 23 of the treebank, and Models 2 and 3 give further improvements to 88.3/88.0% cons... |

65 | Efficient algorithms for parsing the dop model - Goodman - 1996 |

62 | Building a large annotated corpus of English: The Penn Treebank - Marcinkiewicz - 1994 |
Citation Context: ...on lexical heads. For this reason we refer to the models as head-driven statistical models. We describe evaluation of the three models on the Penn Wall Street Journal Treebank (Marcus, Santorini, and Marcinkiewicz 1993). Model 1 achieves 87.7% constituent precision and 87.5% constituent recall on sentences of up to 100 words in length in section 23 of the treebank, and Models 2 and 3 give further improvements to 88.... |

62 | Conditional structure versus conditional estimation in NLP models - Klein, Manning - 2002 |

61 | Training and Scaling Preference Functions for Disambiguation - Alshawi, Carter - 1994 |

59 | Statistically-Driven Computer Grammars of English: The IBM/Lancaster Approach, Rodopi: Amsterdam-Atlanta - Black, Garside, et al. - 1993 |

58 | Effective Bayesian inference for stochastic programs - Koller, McAllester, et al. - 1997 |

58 | Pearl: A Probabilistic Chart Parser - Magerman, Marcus - 1991 |

57 | Development and evaluation of a broadcoverage probabilistic grammar of english-language computer manuals - Black, Lafferty, et al. - 1992 |

52 | Decision tree parsing using a hidden derivation model - Jelinek, Lafferty, et al. - 1994 |
Citation Context: ...ted work, chapter 4 of Collins (1999) attempts to give a comprehensive review of work on statistical parsing up to around 1998. Of particular relevance is other work on parsing the Penn WSJ Treebank (Jelinek et al. 1994; Magerman 1995; Eisner 1996a, 1996b; Collins 1996; Charniak 1997; Goodman 1997; Ratnaparkhi 1997; Chelba and Jelinek 1998; Roark 2001). Eisner (1996a, 1996b) describes several dependency-based models... |

52 | Using an annotated corpus as a stochastic grammar - Bod - 1993 |

51 | Poor Estimates of Context are Worse than None - Gale, Church - 1990 |

51 | Efficiency, robustness and accuracy in Picky chart parsing - Magerman, Weir - 1992 |

50 | Head automata and bilingual tiling: Translation with minimal representations - Alshawi - 1996 |

50 | Corpus statistics meet the noun compound: Some empirical results - Lauer - 1995 |

46 | An empirical comparison of probability models for dependency grammar - Eisner - 1996 |
Citation Context: ...) attempts to give a comprehensive review of work on statistical parsing up to around 1998. Of particular relevance is other work on parsing the Penn WSJ Treebank (Jelinek et al. 1994; Magerman 1995; Eisner 1996a, 1996b; Collins 1996; Charniak 1997; Goodman 1997; Ratnaparkhi 1997; Chelba and Jelinek 1998; Roark 2001). Eisner (1996a, 1996b) describes several dependency-based models that are also closely relat... |

44 | Context-sensitive statistics for improved grammatical language models - Charniak, Carroll - 1994 |

43 | Lexicalized context-free grammars - Schabes, Waters - 1993 |

43 | Statistical Parsing of Messages - Chitrao, Grishman - 1990 |

42 | Global thresholding and multiple-pass parsing - Goodman - 1997 |

37 | Probabilistic feature grammars - Goodman - 1997 |
Citation Context: ...er with the addition of the Pp and Pcc parameters.) Model ≤ 40 Words (2,245 sentences): LR LP CBs 0 CBs ≤ 2 CBs; Magerman 1995: 84.6% 84.9% 1.26 56.6% 81.4%; Collins 1996: 85.8% 86.3% 1.14 59.9% 83.6%; Goodman 1997: 84.8% 85.3% 1.21 57.6% 81.4%; Charniak 1997: 87.5% 87.4% 1.00 62.1% 86.1%; Model 1: 87.9% 88.2% 0.95 65.8% 86.3%; Model 2: 88.5% 88.7% 0.92 66.7% 87.1%; Model 3: 88.6% 88.7% 0.90 67.1% 87.4%; Charniak 2000: 90... |

33 | Stochastic HPSG - Brew - 1995 |

28 | Automatic Learning for Semantic Collocation - Sekine, Carroll, et al. - 1992 |

28 | What is the minimal set of fragments that achieves maximal parse accuracy? - Bod - 2001 |
Citation Context: ...., 1998). This approach intends to allow great flexibility in the features which can be incorporated in a model, and additional features are shown to give improvements in parsing performance. Finally, (Bod 2001) describes a very different approach, a DOP approach to parsing, which gives excellent results on treebank parsing, comparable to the results of (Charniak 2000; Collins 2000). 8.1 A Comparison to th... |

22 | Conditional random fields: Probabilistic models for segmenting and labeling sequence data - Lafferty, McCallum, et al. - 2001 |

22 | TINA: A natural language system for spoken language applications - Seneff - 1992 |

17 | Core Natural Language Processing Technology Applicable to Multiple Languages. The Workshop 98 Final Report. At: http://www.clsp.jhu.edu/ws98/projects/nlp/report - Hajič, Brill, et al. - 1998 |

15 | Nymble: a high-performance learning name-finder - Bikel, Miller, et al. - 1997 |
Citation Context: ...context at levels 1, 2 and 3 in the table, and λ1, λ2 and λ3 are smoothing parameters where 0 ≤ λi ≤ 1. We use the smoothing method described in (Bikel et al. 1997), which is derived from a method described in (Witten and Bell 1991). First, say the most specific estimate is e1 = n1/f1 — that is, f1 is the value of the denominator count in the relative frequency e... |
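The truncated snippet above describes a three-level backoff with Witten-Bell-derived smoothing weights. A minimal sketch of that interpolation, assuming the commonly used form λi = fi / (fi + C·ui), where fi is the context count, ui the number of distinct outcomes seen in that context, and C a constant; the constant C = 5 and the helper names here are illustrative assumptions, not taken from the paper:

```python
def witten_bell_lambda(f, u, c=5.0):
    # Confidence in an estimate grows with its context count f and shrinks
    # with the diversity u of outcomes seen in that context.
    # c = 5 is an illustrative choice; the exact value is model-specific.
    return f / (f + c * u) if f > 0 else 0.0

def smoothed_estimate(levels, c=5.0):
    """Recursive linear interpolation of backed-off estimates.

    levels: [(e1, f1, u1), (e2, f2, u2), (e3, f3, u3)] ordered from the
    most-specific estimate to the least; computes
    lam1*e1 + (1 - lam1)*(lam2*e2 + (1 - lam2)*e3).
    """
    est = levels[-1][0]  # least-specific estimate is the base case
    for e, f, u in reversed(levels[:-1]):
        lam = witten_bell_lambda(f, u, c)
        est = lam * e + (1 - lam) * est
    return est

# An unseen most-specific context (f1 = 0) gets lambda = 0, so the
# estimate falls back entirely on the coarser levels.
p = smoothed_estimate([(0.0, 0, 0), (0.5, 10, 2), (0.2, 100, 4)])
```

With these numbers the middle level gets λ2 = 10 / (10 + 5·2) = 0.5, so the backed-off portion is 0.5·0.5 + 0.5·0.2 = 0.35, and the unseen top level passes that through unchanged.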

13 | A Statistical Model for Parsing and Word-Sense Disambiguation - Bikel - 2000 |

13 | Semantic Tagging using a Probabilistic Context Free Grammar - Collins, Miller - 1998 |

11 | On the unsupervised induction of phrase-structure grammars - Marcken - 1995 |

11 | The effect of alternative tree representations on tree bank grammars - Johnson - 1998 |

9 | A probabilistic parser applied to software testing documents - Jones, Eisner - 1992 |

7 | A probabilistic parser and its applications - Jones, Eisner - 1992 |

7 | Zaidel and Dania Egedi. A Freely Available Wide Coverage Morphological Analyzer for English - Karp, Schabes, et al. - 1992 |

6 | FASTUS: a finite-state processor for information extraction from real-world text - Appelt, Hobbs, et al. - 1993 |

6 | Generalized LR Parsing of Natural Language (Corpora) with Unification-Based Grammars - Briscoe, Carroll - 1993 |

5 | Grammatical trigrams: A probabilistic model of link grammar - Lafferty, Sleator, et al. - 1992 |

2 | New figures of merit for best-first probabilistic chart parsing - Caraballo, Charniak - 1998 |

1 | Head Automata and Bilingual Tiling: Translation with Minimal Representations - Alshawi - 1996 |

1 | Head-Driven Statistical Models for NL Parsing - Collins - 1996 |
Citation Context: ...erivation order is depth-first — that is, each modifier recursively generates the subtree below it before the next modifier is generated. Figure 3 gives an example that illustrates this. The models in (Collins 1996) showed that the distance between words standing in head-modifier relationships was important, in particular that it is important to capture a preference for right-branching structures (which almost t... |

1 | The Penn Treebank: Annotating Predicate Argument Structure - Marcinkiewicz, Bies, et al. - 1994 |

1 | de Marcken - 1995 |
Citation Context: ...cribe an alternative, "supertagging" model for tree-adjoining grammars. See (Alshawi 1996) for work on stochastic head automata, and (Lafferty et al. 1992) for a stochastic version of link grammar. (de Marcken 1995) considers stochastic lexicalized PCFGs, with specific reference to EM methods for unsupervised training. (Seneff 1992) describes the use of Markov models for rule generation, which is closely related ... |

1 | Apportioning Development Effort in a Probabilistic LR Parsing System through Evaluation - Carroll, Briscoe - 1995 |

1 | Coping with Syntactic Ambiguity or How to Put the Block in the Box on the Table - Church, Patil - 1982 |