## Stacking Dependency Parsers

Citations: 49 (5 self)

### Citations

3484 | Conditional random fields: Probabilistic models for segmenting and labeling sequence data.
- Lafferty, McCallum, et al.
- 2001
Citation Context: ...resort to search or greediness (Ratnaparkhi et al., 1994; Sagae and Lavie, 2005; Hall et al., 2006), so that parsing solutions are inexact and learned models may be subject to certain kinds of bias (Lafferty et al., 2001). A solution that leverages the complementary strengths of these two approaches—described in detail by McDonald and Nivre (2007)—was recently and successfully explored by Nivre and McDonald (2008). O...

1633 | Combining labeled and unlabeled data with co-training.
- Blum, Mitchell
- 1998
Citation Context: ...Malt as well, except for Japanese and Turkish. Further, our non-arc-factored features largely outperform subset A, except on Bulgarian, Chinese, and Japanese. [Footnote 8: This claim has a parallel in the co-training method (Blum and Mitchell, 1998), whose performance is bounded by the degree of independence between the two feature sets.] On average, the best feature configuration is E, which is statistically significant over Malt...

731 | Stacked generalization.
- Wolpert
- 1992
Citation Context: ...sting state-of-the-art dependency parsers. 1 Introduction In this paper we address a representation-efficiency tradeoff in statistical natural language processing through the use of stacked learning (Wolpert, 1992). This tradeoff is exemplified in dependency parsing, illustrated in Fig. 1, on which we focus in this paper: • Exact algorithms for dependency parsing (Eisner and Satta, 1999; McDonald et al., 2005b...
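
The stacked generalization recipe cited here generalizes beyond parsing: level-0 learners are trained, and their cross-validated predictions become input features for a level-1 learner. Below is a minimal sketch using scikit-learn's `StackingClassifier`; the synthetic dataset and the choice of base learners are assumptions for illustration, not anything from the paper.

```python
# Minimal sketch of stacked generalization (Wolpert, 1992) with scikit-learn.
# The dataset and the base learners are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Level-0 learners; their cross-validated predictions become level-1 features.
level0 = [
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ("svm", LinearSVC(random_state=0)),
]

# Level-1 learner combines the level-0 outputs; cv=5 avoids training-set leakage.
stack = StackingClassifier(estimators=level0,
                           final_estimator=LogisticRegression(), cv=5)
stack.fit(X_train, y_train)
print("held-out accuracy:", stack.score(X_test, y_test))
```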

432 | Stacked regressions. - Breiman - 1996

383 | Non-projective dependency parsing using spanning tree algorithms. - McDonald, Pereira, et al. - 2005

344 | CoNLL-X shared task on multilingual dependency parsing.
- Buchholz, Marsi
- 2006
Citation Context: ...-pair-factored version). All our experiments use the non-projective version of this parser. We refer to the MaltParser as Malt. We report experiments on twelve languages from the CoNLL-X shared task (Buchholz and Marsi, 2006). All experiments are evaluated using the labeled attachment score (LAS), using the default [...]. Statistical significance is measured using Dan Bikel's randomized parsing evaluation comparator with 10,0...
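
The labeled attachment score (LAS) used for evaluation above has a one-line definition: the fraction of tokens whose predicted head and dependency label both match the gold annotation. A small sketch; the per-token (head, label) representation is an assumption of this example:

```python
# Labeled attachment score (LAS): fraction of tokens whose predicted head
# *and* dependency label both match the gold annotation.
# The (head_index, label) pair-per-token representation is an assumption.

def las(gold, predicted):
    """gold, predicted: lists of (head_index, label) pairs, one per token."""
    assert len(gold) == len(predicted)
    correct = sum(1 for g, p in zip(gold, predicted) if g == p)
    return correct / len(gold)

# Example: 3-token sentence; the second token gets the right head, wrong label.
gold = [(2, "nsubj"), (0, "root"), (2, "obj")]
pred = [(2, "nsubj"), (0, "dep"),  (2, "obj")]
print(las(gold, pred))  # 0.666...
```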

318 | Three new probabilistic models for dependency parsing: An exploration.
- Eisner
- 1996
Citation Context: ...ed learning as a way of approximating non-local features in a linear model, rather than making empirically dubious independence (McDonald et al., 2005b) or structural assumptions (e.g., projectivity, Eisner, 1996), using search approximations (Sagae and Lavie, 2005; Hall et al., 2006; McDonald and Pereira, 2006), solving a (generally NP-hard) integer linear program (Riedel and Clarke, 2006), or adding latent ...

306 | Online large-margin training of dependency parsers. - McDonald, Crammer, et al. - 2005

288 | Optimum branchings.
- Edmonds
- 1967
Citation Context: ...in cubic time by dynamic programming (Eisner, 1996), and with a weaker “tree” constraint (permitting nonprojective parses) and arc factorization, a quadratic-time algorithm exists (Chu and Liu, 1965; Edmonds, 1967), as shown by McDonald et al. (2005b). In the projective case, the arc-factored assumption can be weakened in certain ways while maintaining polynomial parser runtime (Eisner and Satta, 1999), but no...
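
To make the construction this snippet refers to concrete: arc-factored non-projective parsing reduces to finding a maximum spanning arborescence over a dense digraph of scored arcs, solvable with the Chu-Liu/Edmonds algorithm. A sketch using networkx's implementation; the toy sentence and the arc scores are invented for illustration:

```python
# Arc-factored non-projective parsing as a maximum spanning arborescence
# (Chu and Liu, 1965; Edmonds, 1967), following McDonald et al. (2005b).
# The toy sentence and the arc scores below are invented for illustration.
import networkx as nx

words = ["<root>", "John", "saw", "Mary"]

# Dense digraph: a scored arc from every candidate head to every modifier.
scores = {
    (0, 2): 10.0, (2, 1): 9.0, (2, 3): 9.0,   # arcs of the intended tree
    (0, 1): 2.0, (0, 3): 1.0, (1, 2): 3.0,
    (1, 3): 1.0, (3, 1): 1.0, (3, 2): 2.0,
}
G = nx.DiGraph()
for (head, mod), s in scores.items():
    G.add_edge(head, mod, weight=s)

# Chu-Liu/Edmonds: highest-scoring tree in which every word has one head.
tree = nx.maximum_spanning_arborescence(G, attr="weight")
for head, mod in sorted(tree.edges(), key=lambda e: e[1]):
    print(f"{words[mod]} <- {words[head]}")
```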

287 | Inductive Dependency Parsing. - Nivre - 2006

263 | Dependency tree kernels for relation extraction. - Culotta, Sorensen - 2004

215 | Online learning of approximate dependency parsing algorithms.
- McDonald, Pereira
- 2006
Citation Context: ...n making empirically dubious independence (McDonald et al., 2005b) or structural assumptions (e.g., projectivity, Eisner, 1996), using search approximations (Sagae and Lavie, 2005; Hall et al., 2006; McDonald and Pereira, 2006), solving a (generally NP-hard) integer linear program (Riedel and Clarke, 2006), or adding latent variables (Titov and Henderson, 2007). Notably, we introduce the use of very rich non-local approxim...

187 | A Corpus-Based Approach to Language Learning.
- Brill
- 1993
Citation Context: ...work includes sequence labeling (Cohen and de Carvalho, 2005) and inference in conditional random fields (Kou and Cohen, 2007). Stacking is also intuitively related to transformation-based learning (Brill, 1993). 3 Stacked Dependency Parsing We next describe how to use stacked learning for efficient, rich-featured dependency parsing. 3.1 Architecture The architecture consists of two levels. At level 0 we in...
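
The two-level architecture this snippet introduces can be summarized schematically: the level-0 parser's predicted tree is converted into extra arc features (the paper's PredEdge, PredHead, etc.), and the level-1 parser scores and decodes with the augmented features. The sketch below is a hypothetical rendering of that flow; the function and feature names are stand-ins, not the paper's code.

```python
# Schematic of two-level stacked dependency parsing: level-0 predictions
# become level-1 features. All names here are hypothetical stand-ins.

def stacked_features(h, m, level0_head, base_feats):
    """Features for candidate arc (h, m), augmented with level-0 output."""
    feats = dict(base_feats(h, m))        # ordinary first-order features
    # PredEdge-style feature: does the level-0 parser also propose h -> m?
    feats["pred_edge"] = (level0_head[m] == h)
    # PredHead-style feature: the head that level 0 chose for modifier m.
    feats["pred_head"] = level0_head[m]
    return feats

def parse_level1(sent_len, level0_head, base_feats, score_arc, decode):
    """Score every candidate arc with stacked features, then decode a tree."""
    arc_scores = {
        (h, m): score_arc(stacked_features(h, m, level0_head, base_feats))
        for m in range(1, sent_len)       # token 0 is the artificial root
        for h in range(sent_len) if h != m
    }
    return decode(arc_scores)             # e.g., Chu-Liu/Edmonds on these scores
```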

177 | On the shortest arborescence of a directed graph.
- Chu, Liu
- 1965
Citation Context: ...blem can be solved in cubic time by dynamic programming (Eisner, 1996), and with a weaker “tree” constraint (permitting nonprojective parses) and arc factorization, a quadratic-time algorithm exists (Chu and Liu, 1965; Edmonds, 1967), as shown by McDonald et al. (2005b). In the projective case, the arc-factored assumption can be weakened in certain ways while maintaining polynomial parser runtime (Eisner and Satta...

107 | Efficient parsing for bilexical context-free grammars and head automaton grammars.
- Eisner, Satta
- 1999
Citation Context: ...ugh the use of stacked learning (Wolpert, 1992). This tradeoff is exemplified in dependency parsing, illustrated in Fig. 1, on which we focus in this paper: • Exact algorithms for dependency parsing (Eisner and Satta, 1999; McDonald et al., 2005b) are tractable only when the model makes very strong, linguistically unsupportable independence assumptions, such as “arc factorization” for nonprojective dependency parsing (...
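
The "arc factorization" assumption named here means a tree's score decomposes into a sum of independent arc scores, which is exactly what makes exact decoding tractable. In the standard notation (symbols chosen here for exposition; they follow McDonald et al., 2005b, up to renaming):

$$s(x, y) \;=\; \sum_{(h,m) \in y} \mathbf{w} \cdot \mathbf{f}(x, h, m)$$

where $x$ is the sentence, $y$ a candidate dependency tree, $(h, m)$ a head-modifier arc, $\mathbf{f}$ a feature vector over that single arc, and $\mathbf{w}$ the learned weights. Richer features that look at more than one arc break this decomposition, hence the tradeoff the paper addresses.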

84 | Dependency parsing by belief propagation. - Smith, Eisner - 2008

83 | Machine translation using probabilistic synchronous dependency insertion grammars. - Ding, Palmer - 2005

81 | Integrating Graph-Based and Transition-Based Dependency Parsers.
- Nivre, McDonald
- 2008
Citation Context: ...[rows of Table 2: PredEdge+Sibling+GrandParents, PredEdge+Sibling+GrandParents+PredHead, PredEdge+Sibling+GrandParents+PredHead+AllChildren] Table 2: Combinations of features enumerated in Table 1 used for stacking. A is a replication of (Nivre and McDonald, 2008), except for the modifications described in footnote 4. ...which suggests that a separate regularization of the first-order and stacked features might be beneficial in a stacking framework. As a side no...

79 | Characterizing the errors of data-driven dependency parsing models. - McDonald, Nivre - 2007

73 | Multilingual dependency analysis with a two-stage discriminative parser.
- McDonald, Lerman, et al.
- 2006
Citation Context: ...(x) (Table 2). The results are shown in Table 3. While we see improvements over the single-parser baseline... [Footnote 4: We made other modifications to MSTParser, implementing many of the successes described by (McDonald et al., 2006). Our version of the code is publicly available at http://www.ark.cs.cmu.edu/MSTParserStacked. The modifications included an approximation to lemmas for datasets without lemmas (three-character pref...]
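
The lemma approximation mentioned in footnote 4 (three-character prefixes for treebanks without lemma annotation) is small enough to show in full; a sketch under the assumption that prefix truncation is the whole trick:

```python
# Approximate a lemma by a fixed-length prefix, for treebanks that lack
# lemma annotation (the "three-character prefixes" of footnote 4).
def approximate_lemma(form: str, k: int = 3) -> str:
    return form[:k]

print(approximate_lemma("parsing"))  # par
print(approximate_lemma("parsers"))  # par (collides with "parsing", as intended)
```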

63 | Incremental integer linear programming for non-projective dependency parsing. - Riedel, Clarke - 2006

61 | Labeled pseudo-projective dependency parsing with support vector machines. - Nivre, Hall, et al. - 2006

56 | A classifier-based parser with linear run-time complexity.
- Sagae, Lavie
- 2005
Citation Context: ...ndence assumptions, such as “arc factorization” for nonprojective dependency parsing (McDonald and Satta, 2007). • Feature-rich parsers must resort to search or greediness (Ratnaparkhi et al., 1994; Sagae and Lavie, 2005; Hall et al., 2006), so that parsing solutions are inexact and learned models may be subject to certain kinds of bias (Lafferty et al., 2001). A solution that leverages the complementary strengths of...

47 | What is the Jeopardy model? A quasi-synchronous grammar for QA. - Wang, Smith, et al. - 2007

43 | Relating probabilistic grammars and automata. - Abney, McAllester, et al. - 1999

43 | A latent variable model for generative dependency parsing.
- Titov, Henderson
- 2007
Citation Context: ...arch approximations (Sagae and Lavie, 2005; Hall et al., 2006; McDonald and Pereira, 2006), solving a (generally NP-hard) integer linear program (Riedel and Clarke, 2006), or adding latent variables (Titov and Henderson, 2007). Notably, we introduce the use of very rich non-local approximate features in one parser, through the output of another parser. Related approaches are the belief propagation algorithm of Smith and E...

40 | On the complexity of non-projective data-driven dependency parsing.
- McDonald, Satta
- 2007
Citation Context: ...; McDonald et al., 2005b) are tractable only when the model makes very strong, linguistically unsupportable independence assumptions, such as “arc factorization” for nonprojective dependency parsing (McDonald and Satta, 2007). • Feature-rich parsers must resort to search or greediness (Ratnaparkhi et al., 1994; Sagae and Lavie, 2005; Hall et al., 2006), so that parsing solutions are inexact and learned models may be sub...

33 | Structure compilation: trading structure for features. - Liang, Daumé, et al. - 2008

32 | Stacked graphical models for efficient inference in Markov random fields
- Kou, Cohen
- 2007
Citation Context: ...tructured data. Some applications (including here) use only one classifier at level 0; recent work includes sequence labeling (Cohen and de Carvalho, 2005) and inference in conditional random fields (Kou and Cohen, 2007). Stacking is also intuitively related to transformation-based learning (Brill, 1993). 3 Stacked Dependency Parsing We next describe how to use stacked learning for efficient, rich-featured dependenc...

27 | A maximum entropy model for parsing
- Ratnaparkhi, Roukos, et al.
- 1994
Citation Context: ...cally unsupportable independence assumptions, such as “arc factorization” for nonprojective dependency parsing (McDonald and Satta, 2007). • Feature-rich parsers must resort to search or greediness (Ratnaparkhi et al., 1994; Sagae and Lavie, 2005; Hall et al., 2006), so that parsing solutions are inexact and learned models may be subject to certain kinds of bias (Lafferty et al., 2001). A solution that leverages the com...

20 | Discriminative Classifiers for Deterministic Dependency Parsing
- Hall, Nivre, et al.
- 2006
Citation Context: ...h as “arc factorization” for nonprojective dependency parsing (McDonald and Satta, 2007). • Feature-rich parsers must resort to search or greediness (Ratnaparkhi et al., 1994; Sagae and Lavie, 2005; Hall et al., 2006), so that parsing solutions are inexact and learned models may be subject to certain kinds of bias (Lafferty et al., 2001). A solution that leverages the complementary strengths of these two approach...

2 | Stacked sequential learning. - Cohen, Carvalho - 2005