## Wide-coverage efficient statistical parsing with CCG and log-linear models (2007)

### Download Links

- [web.comlab.ox.ac.uk]
- [www.cl.cam.ac.uk]
- [www.it.usyd.edu.au]
- [sydney.edu.au]
- [www.cs.usyd.edu.au]
- [aclweb.org]
- [www.aclweb.org]
- [wing.comp.nus.edu.sg]
- [www.cs.ox.ac.uk]
- DBLP

### Other Repositories/Bibliography

Venue: Computational Linguistics

Citations: 218 (43 self-citations)

### Citations

3484 | Conditional random fields: Probabilistic models for segmenting and labeling sequence data
- Lafferty, McCallum, et al.
- 2001
Citation Context: ...monstrate that both accurate and highly efficient parsing is possible with CCG. 1 Introduction Log-linear models have been applied to a number of problems in NLP, e.g. POS tagging (Ratnaparkhi, 1996; Lafferty, McCallum, and Pereira, 2001), named entity recognition (Borthwick, 1999), chunking (Koeling, 2000) and parsing (Johnson et al., 1999). Log-linear models are also referred to as maximum entropy models and random fields in the NLP...

3325 | Numerical Optimization
- Nocedal, Wright
- 1999
Citation Context: ...rgence was extremely slow; Sha and Pereira (2003) present a similar finding for globally optimised log-linear models for sequences. As an alternative to GIS, we use the limited-memory BFGS algorithm (Nocedal and Wright 1999). As Malouf (2002) demonstrates, general-purpose numerical optimisation algorithms such as BFGS can converge much faster than iterative scaling algorithms (including Improved Iterative Scaling (Della...

2740 | Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics
- Marcus, Santorini, et al.
- 1993
Citation Context: ...ules used by the parser, and it is used as training data for the statistical models. The treebank is CCGbank (Hockenmaier and Steedman, 2002a; Hockenmaier, 2003a), a CCG version of the Penn Treebank (Marcus, Santorini, and Marcinkiewicz, 1993). Penn Treebank conversions have also been carried out for other linguistic formalisms, including TAG (Chen and Vijay-Shanker, 2000; Xia, Palmer, and Joshi, 2000), LFG (Burke et al., 2004) and HPSG (...

971 | A maximum-entropy-inspired parser
- Charniak
- 2000

889 | A high-performance, portable implementation of the MPI message passing interface standard. Parallel Computing 22(6)
- Gropp, Lusk, et al.
- 1996
Citation Context: ...reduction in estimation time: using 18 nodes allows our best-performing model to be estimated in less than three hours. We use the Message Passing Interface (MPI) standard for the implementation (Gropp et al. 1996). The parallel implementation is a straightforward extension of the BFGS algorithm. Each machine in the cluster deals with a subset of the training data, holding the packed charts for that subset in ...

720 | The Syntactic Process
- Steedman
- 2000
Citation Context: ...e estimation problem by developing a parallelised version of the estimation algorithm which runs on a Beowulf cluster. The lexicalized grammar formalism we use is Combinatory Categorial Grammar (CCG; Steedman, 2000). A number of statistical parsing models have recently been developed for CCG and used in parsers applied to newspaper text (Clark, Hockenmaier, and Steedman 2002; Hockenmaier and Steedman 2002b; Hoc...

670 | Inducing features of random fields
- Della Pietra, Della Pietra, et al.
- 1995

581 | Shallow Parsing with Conditional Random Fields
- Sha, Pereira
- 2003

511 | Generalized iterative scaling for log-linear models
- Darroch, Ratcliff
- 1972
Citation Context: ...nd outside scores to calculate expectations, similar to the inside-outside algorithm for estimating the parameters of a PCFG from unlabelled data (Lari and Young 1990). Generalised Iterative Scaling (Darroch and Ratcliff 1972) is a common choice in the NLP literature for estimating a log-linear model, e.g. (Ratnaparkhi 1998; Curran and Clark 2003). Initially we used GIS for the parsing models described here, but found tha...

490 | A New Statistical Parser Based on Bigram Lexical Dependencies
- Collins
- 1996

429 | The estimation of stochastic context-free grammars using the inside-outside algorithm
- Lari, Young
- 1990
Citation Context: ...entence. The dynamic programming method uses inside and outside scores to calculate expectations, similar to the inside-outside algorithm for estimating the parameters of a PCFG from unlabelled data (Lari and Young 1990). Generalised Iterative Scaling (Darroch and Ratcliff 1972) is a common choice in the NLP literature for estimating a log-linear model, e.g. (Ratnaparkhi 1998; Curran and Clark 2003). Initially we us...

414 | Statistical parsing with a context-free grammar and word statistics
- Charniak
- 1997
Citation Context: ...ext. Hockenmaier (2003a) and Hockenmaier and Steedman (2002b) present a generative model of normal-form derivations, based on various techniques from the statistical parsing literature (Collins 2003; Charniak 1997; Goodman 1997). A CCG binary derivation tree is generated top-down, with the probability of generating particular child nodes being conditioned on some limited context from the previously generated s...

363 | A maximum entropy part-of-speech tagger
- Ratnaparkhi
- 1996
Citation Context: ... CCG parser. We demonstrate that both accurate and highly efficient parsing is possible with CCG. 1. Introduction Log-linear models have been applied to a number of problems in NLP, e.g. POS tagging (Ratnaparkhi 1996; Lafferty, McCallum, and Pereira 2001), named entity recognition ∗ Oxford University Computing Laboratory, Wolfson Building, Parks Road, Oxford, OX1 3QD, UK. E-mail: stephen.clark@comlab.ox.ac.uk ∗∗ ...

333 | Discriminative reranking for natural language parsing
- Collins, Koo
- 2004
Citation Context: ...ng the generative model of Hockenmaier and Steedman (2002b) in a similar way. Using a generative model’s score as a feature in a discriminative framework has been beneficial for reranking approaches (Collins and Koo 2005). Since the generative model uses local features similar to those in our log-linear models, it could be incorporated into the estimation and decoding processes without the need for reranking. One way ...

297 | Categorial type logics
- Moortgat
- 1997
Citation Context: ...iments. A recent development in the theory of CCG is the multi-modal treatment given by Baldridge (2002) and Baldridge and Kruijff (2003), following the type-logical approaches to categorial grammar (Moortgat 1997). One possible extension to the parser and grammar described in this paper is to incorporate the multi-modal approach; Baldridge (2002) suggests that, as well as having theoretical motivation, a mult...

253 | A Gaussian prior for smoothing maximum entropy models (technical report)
- Chen, Rosenfeld
- 1999
Citation Context: ... follow Riezler et al. (2002) in using a discriminative estimation method by maximising the conditional log-likelihood of the model given the data, minus a Gaussian prior term to prevent overfitting (Chen and Rosenfeld 1999; Johnson et al. 1999). Thus, given training sentences $S_1, \ldots, S_m$, gold-standard dependency structures $\pi_1, \ldots, \pi_m$, and the definition of the probability of a dependency structure (13), the ob...

253 | PCFG Models of Linguistic Tree Representations
- Johnson
- 1998
Citation Context: ...aseline are considered: increasing the amount of lexicalisation; generating a lexical category at its maximal projection; conditioning the probability of a rule instantiation on the grandparent node (Johnson 1998); adding features designed to deal with coordination; and adding distance to the dependency features. Some of these extensions, such as increased lexicalisation and generating a lexical category at i...

234 | Maximum entropy models for natural language ambiguity resolution (PhD thesis)
- Ratnaparkhi
- 1998
Citation Context: ...eters of a PCFG from unlabelled data (Lari and Young 1990). Generalised Iterative Scaling (Darroch and Ratcliff 1972) is a common choice in the NLP literature for estimating a log-linear model, e.g. (Ratnaparkhi 1998; Curran and Clark 2003). Initially we used GIS for the parsing models described here, but found that convergence was extremely slow; Sha and Pereira (2003) present a similar finding for globally opti...

226 | Surface Structure and Interpretation
- Steedman
- 1996
Citation Context: ...rties of the extracted grammars, is an open question. Related work on statistical parsing with CCG will be described in Section 3. 3. Combinatory Categorial Grammar Combinatory Categorial Grammar (CCG) (Steedman 1996, 2000) is a type-driven lexicalised theory of grammar based on categorial grammar (Wood 1993). CCG lexical entries consist of a syntactic category, which defines valency and directionality, and a sem...

221 | Recognition and parsing of context-free languages in time n³
- Younger
- 1967
Citation Context: ...ng loss in accuracy. Section 10.3 gives results for the speed of the parser. 9.2 Chart parsing algorithm The algorithm used to build the packed charts is the CKY chart parsing algorithm (Kasami 1965; Younger 1967) described in Steedman (2000). The CKY algorithm applies naturally to CCG since the grammar is binary. It builds the chart bottom-up, starting with constituents spanning a single word, incrementally ...

195 | Robust accurate statistical annotation of general text
- Briscoe, Carroll
- 2002

191 | Learning to parse natural language with maximum entropy models
- Ratnaparkhi
- 1999
Citation Context: ...highly efficient parsing is possible with CCG. 2 Related Work The first application of log-linear models to parsing is the work of Ratnaparkhi (Ratnaparkhi, Roukos, and Ward, 1994; Ratnaparkhi, 1996; Ratnaparkhi, 1999). Similar to Della Pietra, Della Pietra, and Lafferty (1997), Ratnaparkhi motivates log-linear models from the perspective of maximising entropy, subject to certain constraints. Ratnaparkhi models th...

188 | Parsing the WSJ using CCG and log-linear models
- Clark, Curran
- 2004

184 | A maximum entropy approach to named entity recognition (PhD thesis)
- Borthwick
- 1999
Citation Context: ...ved: 27 April 2006; Revised submission received: 30 November 2006; Accepted for publication: 16 March 2007. © 0 Association for Computational Linguistics. Computational Linguistics Volume 0, Number 0 (Borthwick 1999), chunking (Koeling 2000) and parsing (Johnson et al. 1999). Log-linear models are also referred to as maximum entropy models and random fields in the NLP literature. They are popular because of the ...

177 | Incremental parsing with the perceptron algorithm
- Collins, Roark
- 2004

168 | Supertagging: An Approach to Almost Parsing
- Bangalore, Joshi
- 1999
Citation Context: ...ng a number of incorrect but plausible lexical categories for each word in the sentence. Second, it greatly increases the efficiency of the parser, which was the original motivation for supertagging (Bangalore and Joshi 1999). One possible criticism of CCG has been that highly efficient parsing is not possible because of the additional “spurious” derivations. In fact, we show that a novel method which tightly integrates ...

164 | A quasi-arithmetical notation for syntactic description
- Bar-Hillel
- 1953
Citation Context: ...ive, infinitival and wh-question. This additional information will be described in later sections. Categories are combined in a derivation using combinatory rules. In the original Categorial Grammar (Bar-Hillel 1953), which is context-free, there are two rules of functional application: $X/Y \;\; Y \Rightarrow X$ (>) (3) and $Y \;\; X\backslash Y \Rightarrow X$ (<) (4), where X and Y denote categories (either basic or complex). The first rule is forward ap...

155 | Max-margin parsing
- Taskar, Klein, et al.
- 2004
Citation Context: ...that it is difficult to test different configurations of the system, for example different feature sets. It may also not be possible to train or run the system on anything other than short sentences (Taskar et al. 2004). The supertagger is a key component in our parsing system. It reduces the size of the charts considerably compared with naive methods for assigning le...

154 | Statistical attribute-value grammars
- Abney
- 1997

154 | Estimators for stochastic “unification-based” grammars
- Johnson, Geman, et al.
- 1999
Citation Context: ...vember 2006; Accepted for publication: 16 March 2007. (Borthwick 1999), chunking (Koeling 2000) and parsing (Johnson et al. 1999). Log-linear models are also referred to as maximum entropy models and random fields in the NLP literature. They are popular because of the ease with which complex discriminating features can be incl...

154 | An efficient recognition and syntax analysis algorithm for context-free languages
- Kasami
- 1965
Citation Context: ...y corresponding loss in accuracy. Section 10.3 gives results for the speed of the parser. 9.2 Chart parsing algorithm The algorithm used to build the packed charts is the CKY chart parsing algorithm (Kasami 1965; Younger 1967) described in Steedman (2000). The CKY algorithm applies naturally to CCG since the grammar is binary. It builds the chart bottom-up, starting with constituents spanning a single word, ...

153 | Prepositional phrase attachment through a backed-off model
- Collins, Brooks
- 1995
Citation Context: ...gramming performed over the chart. Two possible extensions, which we have not investigated, include defining dependency features which account for all three elements of the triple in a PP-attachment (Collins and Brooks 1995), and defining a rule feature which includes the grandparent node (Johnson 1998). Another alternative for future work is to compare the dynamic programming approach taken here with the beam-search ap...

146 | Parser Evaluation: A Survey and a New Proposal
- Carroll, Briscoe, et al.
- 1998
Citation Context: ...-CCG parser, namely the RASP parser – and since we are converting the CCG output into the format used by RASP the CCG parser is not at an unfair advantage. There is also the SUSANNE GR gold standard (Carroll, Briscoe, and Sanfilippo, 1998), on which the B&C annotation is based, but we chose not to use this for evaluation. This earlier GR scheme is less like the dependencies output by the CCG parser, and the comparison would be complic...

142 | Parsing the Wall Street Journal using a Lexical-Functional Grammar and discriminative estimation techniques
- Riezler, King, et al.
- 2002
Citation Context: ...the model, and have been shown to give good performance across a range of NLP tasks. Log-linear models have previously been applied to statistical parsing (Johnson et al. 1999; Toutanova et al. 2002; Riezler et al. 2002; Malouf and van Noord 2004) but typically under the assumption that all possible parses for a sentence can be enumerated. For manually constructed grammars, this assumption is usually sufficient for ...

112 | Parsing algorithms and metrics
- Goodman
- 1996

101 | The PARC 700 dependency bank
- King, Crouch, et al.
- 2003
Citation Context: ...rature. The parser is evaluated on CCGbank (available through the LDC). In order to facilitate comparisons with parsers using different formalisms, we also evaluate on the publicly available DepBank (King et al. 2003), using the Briscoe and Carroll annotation consistent with the RASP parser (Briscoe, Carroll, and Watson 2006). The dependency annotation is designed to be as theory-neutral as possible to allow easy...

93 | Long-Distance Dependency Resolution in Automatically Acquired Wide-Coverage PCFG-Based LFG Approximations
- Cahill, Burke, et al.
- 2004

90 | The Importance of Supertagging for Wide-Coverage CCG Parsing
- Clark, Curran
- 2004

83 | The proposition bank: A corpus annotated with semantic roles
- Palmer, Gildea, et al.
- 2005
Citation Context: ... (King et al., 2003). Cahill et al. (2004) evaluate an LFG parser, which uses an automatically extracted grammar, against DepBank. Miyao and Tsujii (2004) evaluate their HPSG parser against PropBank (Palmer, Gildea, and Kingsbury, 2005). Kaplan et al. (2004) compare the Collins parser with the Parc LFG parser by mapping Penn Treebank parses into the dependencies of DepBank, claiming that the LFG parser is more accurate with only a ...

78 | Automated extraction of TAGs from the Penn Treebank
- Chen, Vijay-Shanker
- 2000
Citation Context: ...provide. The formalism most closely related to CCG from this list is TAG. TAG grammars have been automatically extracted from the Penn Treebank, using techniques similar to those used by Hockenmaier (Chen and Vijay-Shanker 2000; Xia, Palmer, and Joshi 2000). Also, the supertagging idea which is central to the efficiency of the CCG parser originated with TAG (Bangalore and Joshi 1999). Chen et al. (2002) describe the results...

76 | Multi-Modal Combinatory Categorial Grammar
- Baldridge, Kruijff
- 2003

76 | Maximum entropy estimation for feature forests
- Miyao, Tsujii
- 2002

71 | Parsing biomedical literature
- Lease, Charniak
- 2005
Citation Context: ...e showing that, perhaps not surprisingly, the performance of parsers trained on the WSJ Penn Treebank drops significantly when the parser is applied to domains outside of newspaper text (Gildea 2001; Lease and Charniak 2005). The difficulty is that developing new treebanks for each of these domains is infeasible. Developing the techniques to extract a CCG grammar from the Penn Treebank, together with the pre-processing ...

71 | Wide coverage parsing with stochastic attribute value grammars. Draft available from http://www.let.rug.nl/~vannoord. A preliminary version of this paper was published
- van Noord, Malouf
- 2004

62 | Categorial grammar
- Wood
- 1993
Citation Context: ...G will be described in Section 3. 3. Combinatory Categorial Grammar Combinatory Categorial Grammar (CCG) (Steedman 1996, 2000) is a type-driven lexicalised theory of grammar based on categorial grammar (Wood 1993). CCG lexical entries consist of a syntactic category, which defines valency and directionality, and a semantic interpretation. In this paper we are concerned with the syntactic component; see Steedm...

60 | Efficient Normal-Form Parsing for Combinatory Categorial Grammars
- Eisner
- 1996
Citation Context: ...ons in CCG complicates the modelling and parsing problems. In this paper we consider two solutions. The first, following Hockenmaier (2003a), is to define a model in terms of normal-form derivations (Eisner 1996). In this approach we recover only one derivation leading to a given set of predicate-argument dependencies and ignore the rest. The second approach is to define a model over the predicate-argument d...

56 | Cascaded grammatical relation assignment
- Buchholz, Veenstra, et al.
- 1999

56 | A classifier-based parser with linear run-time complexity
- Sagae, Lavie
- 2005

55 | Speed and accuracy in shallow and deep stochastic parsing
- Kaplan, Riezler, et al.
- 2004
Citation Context: ...istical parsing using linguistically motivated grammar formalisms is large and growing. Statistical parsers have been developed for TAG (Chiang 2000; Sarkar and Joshi 2003), LFG (Riezler et al. 2002; Kaplan et al. 2004; Cahill et al. 2004) and HPSG (Toutanova et al. 2002; Toutanova, Markova, and Manning 2004; Miyao and Tsujii 2004; Malouf and van Noord 2004), among others. The motivation for using these formalisms ...

52 | Investigating GIS and Smoothing for Maximum Entropy Taggers
- Curran, Clark
- 2003
Citation Context: ...om unlabelled data (Lari and Young 1990). Generalised Iterative Scaling (Darroch and Ratcliff 1972) is a common choice in the NLP literature for estimating a log-linear model, e.g. (Ratnaparkhi 1998; Curran and Clark 2003). Initially we used GIS for the parsing models described here, but found that convergence was extremely slow; Sha and Pereira (2003) present a similar finding for globally optimised log-linear models...

51 | Corpus-oriented grammar development for acquiring a Head-driven Phrase Structure Grammar from the Penn Treebank
- Miyao, Ninomiya, et al.
- 2004

48 | Probabilistic disambiguation models for wide-coverage HPSG parsing
- Miyao, Tsujii
- 2005

44 | Large-scale induction and evaluation of lexical resources from the Penn-II Treebank
- O’Donovan, Burke, et al.
- 2004

44 | Log-linear models for wide-coverage CCG parsing
- Clark, Curran
- 2003
Citation Context: ...9. Parsing in Practice 9.1 Combining the Supertagger and the Parser The philosophy in earlier work which combined the supertagger and parser (Clark, Hockenmaier, and Steedman 2002; Clark and Curran 2003) was to use an unrestrictive setting of the supertagger, but still allow a reasonable compromise between speed and accuracy. The idea was to give the parser the greatest possibility of finding the co...

42 | Comparison of Evaluation Metrics for a Broad Coverage Stochastic Parser
- Crouch, Kaplan, et al.
- 2002
Citation Context: ...lar parser. The second difficulty is that some constructions may be analysed differently across formalisms, and even apparently trivial differences such as tokenisation can complicate the comparison (Crouch et al. 2002). Despite these difficulties we have attempted a cross-formalism comparison of the CCG parser. For the gold standard we chose the version of DepBank reannotated by Briscoe and Carroll (2006), consist...

37 | Probabilistic feature grammars
- Goodman
- 1997
Citation Context: ...r (2003a) and Hockenmaier and Steedman (2002b) present a generative model of normal-form derivations, based on various techniques from the statistical parsing literature (Collins 2003; Charniak 1997; Goodman 1997). A CCG binary derivation tree is generated top-down, with the probability of generating particular child nodes being conditioned on some limited context from the previously generated structure. Hock...

34 | A Uniform Method of Grammar Extraction and its Applications
- Xia, Palmer, et al.
Citation Context: ...osely related to CCG from this list is TAG. TAG grammars have been automatically extracted from the Penn Treebank, using techniques similar to those used by Hockenmaier (Chen and Vijay-Shanker, 2000; Xia, Palmer, and Joshi, 2000). Also, the supertagging idea which is central to the efficiency of the CCG parser originated with TAG (Bangalore and Joshi, 1999). Chen et al. (2002) describe the results of reranking the output of ...

33 | Chunking with maximum entropy models
- Koeling
- 2000
Citation Context: ... submission received: 30 November 2006; Accepted for publication: 16 March 2007. (Borthwick 1999), chunking (Koeling 2000) and parsing (Johnson et al. 1999). Log-linear models are also referred to as maximum entropy models and random fields in the NLP literature. They are popular because of the ease with which complex d...

32 | Formalism-independent parser evaluation with CCG and DepBank
- Clark, Curran
- 2007

28 | Building Deep Dependency Structures with a Wide-Coverage CCG Parser
- Clark, Hockenmaier, et al.
- 2002
Citation Context: ...ar formalism we use is Combinatory Categorial Grammar (CCG; Steedman, 2000). A number of statistical parsing models have recently been developed for CCG and used in parsers applied to newspaper text (Clark, Hockenmaier, and Steedman, 2002; Hockenmaier and Steedman, 2002b; Hockenmaier, 2003b). In this paper we extend existing parsing techniques by developing log-linear models for CCG, as well as a new model and efficient parsing algori...

27 | A maximum entropy model for parsing
- Ratnaparkhi, Roukos, et al.
- 1994
Citation Context: ...ncing demonstration that, through use of a supertagger, highly efficient parsing is possible with CCG. 2 Related Work The first application of log-linear models to parsing is the work of Ratnaparkhi (Ratnaparkhi, Roukos, and Ward, 1994; Ratnaparkhi, 1996; Ratnaparkhi, 1999). Similar to Della Pietra, Della Pietra, and Lafferty (1997), Ratnaparkhi motivates log-linear models from the perspective of maximising entropy, subject to cert...

26 | Object-extraction and question-parsing using CCG
- Clark, Steedman, et al.
- 2004
Citation Context: ...endencies can be integrated into the parsing process in a straightforward manner, rather than be relegated to such a post-processing phase (Clark, Hockenmaier, and Steedman, 2002; Hockenmaier, 2003a; Clark, Steedman, and Curran, 2004). Another advantage of CCG is that providing a compositional semantics for the grammar is relatively straightforward. It has a completely transparent interface between syntax and semantics and, since...

23 | An introduction to tag sequence grammars and the RASP system parser
- Briscoe
- 2006
Citation Context: ...any GRs in which the words either side of the & are arguments with a single GR in which & is the argument. The ta relation, which identifies text adjuncts delimited by punctuation (Briscoe 2006), is difficult to assign correctly to the parser output. The simple punctuation rules used by the parser, and derived from CCGbank, do not contain enough information to distinguish between the variou...

23 | Parse disambiguation for a rich HPSG grammar
- Toutanova, Manning, et al.
- 2002
Citation Context: ...res can be included in the model, and have been shown to give good performance across a range of NLP tasks. Log-linear models have previously been applied to statistical parsing (Johnson et al. 1999; Toutanova et al. 2002; Riezler et al. 2002; Malouf and van Noord 2004), but typically under the assumption that all possible parses for a sentence can be enumerated. For manually constructed grammars, this assumption is u...

21 | Exploiting auxiliary distributions in stochastic unification-based grammars
- Johnson, Riezler
- 2000

21 | Learning stochastic categorial grammars
- Osborne, Briscoe
- 1997
Citation Context: ...began as part of the Edinburgh wide-coverage CCG parsing project (2000–2004). There has been some other work on defining stochastic categorial grammars, but mainly in the context of grammar learning (Osborne and Briscoe 1997; Watkinson and Manandhar 2001; Zettlemoyer and Collins 2005). An early attempt from the Edinburgh project at wide-coverage CCG parsing is presented in Clark, Hockenmaier, and Steedman (2002). In orde...

17 | Deep syntactic processing by combining shallow methods
- Dienes, Dubey
- 2003
Citation Context: ...). This has led to a number of proposals for post-processing the output of the Collins and Charniak parsers, in which trace sites are located and the antecedent of the trace determined (Johnson 2002; Dienes and Dubey 2003; Levy and Manning 2004). An advantage of using CCG is that the recovery of long-range dependencies can be integrated into the parsing process in a straightforward manner, rather than be relegated to ...

17 | Probabilistic modeling of argument structures including non-local dependencies
- Miyao, Ninomiya, et al.
- 2003

15 | Acquiring compact lexicalized grammars from a cleaner treebank
- 2002a
Citation Context: ...estimation and decoding. However, for wide-coverage grammars extracted from a treebank, enumerating all parses is infeasible. In this paper we apply the dynamic programming method of Miyao and Tsujii (2002) to a packed chart; however, since the grammar is automatically extracted, the packed charts require a considerable amount of memory: up to 25 GB. We solve this massive estimation problem by developin...

14 | Multi-tagging for lexicalized-grammar parsing
- Curran, Clark, et al.
- 2006

14 | Generative models for statistical parsing with Combinatory Categorial Grammar
- 2002b
Citation Context: ...estimation and decoding. However, for wide-coverage grammars extracted from a treebank, enumerating all parses is infeasible. In this paper we apply the dynamic programming method of Miyao and Tsujii (2002) to a packed chart; however, since the grammar is automatically extracted, the packed charts require a considerable amount of memory: up to 25 GB. We solve this massive estimation problem by developin...

12 | Supertagging and full parsing
- Nasr, Rambow
- 2004

12 | The leaf projection path view of parse trees: Exploring string kernels for HPSG parse selection
- Toutanova, Markova, et al.
- 2004
Citation Context: ...growing. Statistical parsers have been developed for TAG (Chiang, 2000; Sarkar and Joshi, 2003), LFG (Riezler et al., 2002; Kaplan et al., 2004; Cahill et al., 2004) and HPSG (Toutanova et al., 2002; Toutanova, Markova, and Manning, 2004; Miyao and Tsujii, 2004; Malouf and van Noord, 2004), among others. The motivation for using these formalisms is that many NLP tasks, such as Machine Translation, Information Extraction, and Question...

9 | Efficient extraction of grammatical relations
- Watson, Carroll, et al.
- 2005

8 | Reranking an n-gram supertagger
- Chen, Bangalore, et al.
- 2002

8 | Deep linguistic analysis for the accurate identification of predicate-argument relations
- Tsujii
- 2004

8 | Reinforcing parser preferences through tagging. Traitement Automatique des Langues
- Prins, van Noord
- 2003

8 | Tree-Adjoining Grammars and its application to statistical parsing
- Joshi, Sarkar
- 2002
Citation Context: ...ude higher. More generally, the literature on statistical parsing using linguistically motivated grammar formalisms is large and growing. Statistical parsers have been developed for TAG (Chiang 2000; Sarkar and Joshi 2003), LFG (Riezler et al. 2002; Kaplan et al. 2004; Cahill et al. 2004) and HPSG (Toutanova et al. 2002; Toutanova, Markova, and Manning 2004; Miyao and Tsujii 2004; Malouf and van Noord 2004), among oth...

6 | Parsing with generative models of predicate-argument structure
- 2003b
Citation Context: ...imating a log-linear model, e.g. (Ratnaparkhi 1998; Curran and Clark 2003). Initially we used GIS for the parsing models described here, but found that convergence was extremely slow; Sha and Pereira (2003) present a similar finding for globally optimised log-linear models for sequences. As an alternative to GIS, we use the limited-memory BFGS algorithm (Nocedal and Wright 1999). As Malouf (2002) demons...

5 | Data and Models for Statistical Parsing with Combinatory Categorial Grammar
- 2003a
Citation Context: ...imating a log-linear model, e.g. (Ratnaparkhi 1998; Curran and Clark 2003). Initially we used GIS for the parsing models described here, but found that convergence was extremely slow; Sha and Pereira (2003) present a similar finding for globally optimised log-linear models for sequences. As an alternative to GIS, we use the limited-memory BFGS algorithm (Nocedal and Wright 1999). As Malouf (2002) demons...

4 | The importance of supertagging for wide-coverage CCG parsing
- 2004a
Citation Context: ...dynamic programming to efficiently calculate the feature expectations. Geman and Johnson (2002) propose a similar method in the context of LFG parsing; an implementation is described in Kaplan et al. (2004). Miyao and Tsujii have carried out a number of investigations similar to the work in this article. In Miyao and Tsujii (2003b, 2003a) log-linear models are developed for automatically extracted gramm...

4 | Deep dependencies from context-free statistical parsers: Correcting the surface dependency approximation

4 | Wide-coverage semantic representations from a CCG parser

3 | Wide-coverage parsing with stochastic attribute value grammars

2 | Semi-supervised training for statistical parsing: Final report
- Steedman, Baker, et al.
- 2002
Citation Context: ...nal training data, by taking the lexical categories chosen by the parser as gold-standard training data. If enough unlabelled data is parsed, then the large volume can overcome the noise in the data (Steedman et al. 2002; Prins and van Noord 2003). We plan to investigate this idea in the context of our own parsing system. 13. Conclusion This paper has shown how to estimate a log-linear parsing model for an automatica...

1 | Wide-Coverage Statistical Parsing
- Clark, Curran
- 2000

1 | Computational Linguistics Volume 0, Number 0
- Goodman
- 1996

1 | A model of syntactic disambiguation based on lexicalized grammars
- 2003a
Citation Context: ...imating a log-linear model, e.g. (Ratnaparkhi 1998; Curran and Clark 2003). Initially we used GIS for the parsing models described here, but found that convergence was extremely slow; Sha and Pereira (2003) present a similar finding for globally optimised log-linear models for sequences. As an alternative to GIS, we use the limited-memory BFGS algorithm (Nocedal and Wright 1999). As Malouf (2002) demons...

1 | Acquisition of large categorial grammar lexicons
- Watkinson, Manandhar
- 2001

1 | A comparison of evaluation metrics for a broad-coverage stochastic parser
- Curran, Clark
- 2002

1 | Parsing biomedical literature
- Levy, Manning
- 2005

1 | 551 Linguistics Volume 33, Number 4
- Taskar, Klein, et al.
- 2004
Citation Context: ...that it is difficult to test different configurations of the system, for example different feature sets. It may also not be possible to train or run the system on anything other than short sentences (Taskar et al. 2004). The supertagger is a key component in our parsing system. It reduces the size of the charts considerably compared with naive methods for assigning lexical categories, which is crucial for practical...

1 | 53 Linguistics Volume 0, Number 0
- Chen, Rosenfeld
- 1999
Citation Context: ... follow Riezler et al. (2002) in using a discriminative estimation method by maximising the conditional log-likelihood of the model given the data, minus a Gaussian prior term to prevent overfitting (Chen and Rosenfeld, 1999; Johnson et al., 1999). Thus, given training sentences $S_1, \ldots, S_m$, gold-standard dependency structures $\pi_1, \ldots, \pi_m$, and the definition of the probability of a dependency structure (13), the o...