Dual decomposition for parsing with non-projective head automata
 In Proc. of EMNLP, 2010
Abstract

Cited by 101 (16 self)
This paper introduces algorithms for non-projective parsing based on dual decomposition. We focus on parsing algorithms for non-projective head automata, a generalization of head-automata models to non-projective structures. The dual decomposition algorithms are simple and efficient, relying on standard dynamic programming and minimum spanning tree algorithms. They provably solve an LP relaxation of the non-projective parsing problem. Empirically the LP relaxation is very often tight: for many languages, exact solutions are achieved on over 98% of test sentences. The accuracy of our models is higher than previous work on a broad range of datasets.
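The mechanics the abstract describes — two independently solvable subproblems coordinated by Lagrange multipliers and a subgradient update — can be illustrated on a toy problem. This is a minimal sketch, not the paper's parser: the two trivial argmax routines below merely stand in for the head-automaton dynamic program and the minimum-spanning-tree solver, and all scores are invented.

```python
# Toy dual decomposition: maximize (theta + phi) . z over z in S1 ∩ S2
# by splitting into two subproblems linked by Lagrange multipliers.
# S1 = binary vectors with exactly one 1 (stand-in for one solver),
# S2 = all binary vectors (stand-in for the other). Illustrative only.

def argmax_exactly_one(scores):
    """Best vector with exactly one 1 under linear scores."""
    i = max(range(len(scores)), key=lambda j: scores[j])
    return [1 if j == i else 0 for j in range(len(scores))]

def argmax_unconstrained(scores):
    """Best unconstrained binary vector: keep every positive coordinate."""
    return [1 if s > 0 else 0 for s in scores]

def dual_decompose(theta, phi, iters=50):
    n = len(theta)
    lam = [0.0] * n                      # Lagrange multipliers
    for t in range(1, iters + 1):
        u = argmax_exactly_one([theta[i] + lam[i] for i in range(n)])
        v = argmax_unconstrained([phi[i] - lam[i] for i in range(n)])
        if u == v:                       # agreement certifies optimality
            return u
        step = 0.5 / t                   # decaying subgradient step size
        lam = [lam[i] - step * (u[i] - v[i]) for i in range(n)]
    return u                             # no agreement within budget

print(dual_decompose([1.0, 2.0, 0.0], [0.5, -1.0, 0.3]))  # -> [1, 0, 0]
```

When the two subproblems return the same solution, that solution is provably optimal for the joint problem — this is the "certificate" behind the tightness statistics the abstract reports.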
Dependency parsing by belief propagation
 In Proceedings of EMNLP, 2008
Abstract

Cited by 84 (9 self)
We formulate dependency parsing as a graphical model with the novel ingredient of global constraints. We show how to apply loopy belief propagation (BP), a simple and effective tool for approximate learning and inference. As a parsing algorithm, BP is both asymptotically and empirically efficient. Even with second-order features or latent variables, which would make exact parsing considerably slower or NP-hard, BP needs only O(n³) time with a small constant factor. Furthermore, such features significantly improve parse accuracy over exact first-order methods. Incorporating additional features would increase the runtime additively rather than multiplicatively.
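The core primitive behind BP is message passing: each variable aggregates locally summed evidence from its neighbors. On a tree-structured model the procedure is exact; "loopy" BP iterates the same updates when the graph has cycles. The sketch below (not the paper's parsing model — the potentials are arbitrary numbers) runs exact sum-product on a three-variable chain and checks the result against brute-force enumeration.

```python
# Sum-product message passing on a chain x1 - x2 - x3 of binary variables.
# On a tree this yields exact marginals; loopy BP applies the same update
# rule iteratively on cyclic factor graphs. Potentials are illustrative.

unary = [[1.0, 2.0], [1.5, 0.5], [0.8, 1.2]]   # psi_i(x_i)
pair = [[2.0, 1.0], [1.0, 2.0]]                 # psi(x_i, x_{i+1}), shared

def marginal_x2():
    # Message from x1 into x2: sum over x1 of psi_1(x1) * psi(x1, x2).
    m12 = [sum(unary[0][a] * pair[a][b] for a in range(2)) for b in range(2)]
    # Message from x3 into x2.
    m32 = [sum(unary[2][c] * pair[b][c] for c in range(2)) for b in range(2)]
    belief = [unary[1][b] * m12[b] * m32[b] for b in range(2)]
    z = sum(belief)
    return [p / z for p in belief]

def marginal_x2_brute():
    """Exhaustive check: sum the joint over all 2^3 assignments."""
    scores = {0: 0.0, 1: 0.0}
    for a in range(2):
        for b in range(2):
            for c in range(2):
                scores[b] += (unary[0][a] * unary[1][b] * unary[2][c]
                              * pair[a][b] * pair[b][c])
    z = scores[0] + scores[1]
    return [scores[0] / z, scores[1] / z]
```

The message computation is just the distributive law applied to the brute-force sum, which is why the two functions agree exactly and why BP's per-variable cost stays small.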
Concise Integer Linear Programming Formulations for Dependency Parsing
 2009
Abstract

Cited by 58 (11 self)
We formulate the problem of non-projective dependency parsing as a polynomial-sized integer linear program. Our formulation is able to handle non-local output features in an efficient manner; not only is it compatible with prior knowledge encoded as hard constraints, it can also learn soft constraints from data. In particular, our model is able to learn correlations among neighboring arcs (siblings and grandparents), word valency, and tendencies toward nearly-projective parses. The model parameters are learned in a max-margin framework by employing a linear programming relaxation. We evaluate the performance of our parser on data in several natural languages, achieving improvements over existing state-of-the-art methods.
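The objective such an ILP optimizes — first-order arc scores plus non-local bonuses such as grandparent configurations — can be made concrete on a tiny sentence. The sketch below substitutes brute-force enumeration for the ILP solver (the paper instead encodes tree-ness via flow constraints and linearizes the non-local terms with auxiliary variables); all scores are invented.

```python
from itertools import product

# Brute-force stand-in for the ILP on a 3-word sentence (node 0 = root):
# enumerate head assignments, keep valid trees, score each with arc
# scores plus a second-order "grandparent" bonus. Scores are invented.

ARC = {(0, 1): 10, (0, 2): 1, (0, 3): 1, (1, 2): 5, (1, 3): 2,
       (2, 3): 8, (2, 1): 3, (3, 1): 0, (3, 2): 0}
GRANDPARENT = {(0, 1, 2): 4, (1, 2, 3): 4}   # bonus for g -> h -> m chains

def is_tree(heads):
    """heads[m] = head of word m; valid iff every word reaches root 0."""
    for m in heads:
        seen, node = set(), m
        while node != 0:
            if node in seen:
                return False               # cycle
            seen.add(node)
            node = heads[node]
    return True

def best_parse():
    words = [1, 2, 3]
    best, best_score = None, float("-inf")
    for hs in product([0, 1, 2, 3], repeat=3):
        heads = dict(zip(words, hs))
        if any(h == m for m, h in heads.items()) or not is_tree(heads):
            continue
        score = sum(ARC[(h, m)] for m, h in heads.items())
        score += sum(b for (g, h, m), b in GRANDPARENT.items()
                     if heads.get(h) == g and heads.get(m) == h)
        if score > best_score:
            best, best_score = heads, score
    return best, best_score
```

With these scores the grandparent bonuses tip the optimum to the chain 0 → 1 → 2 → 3 (total 31), illustrating how non-local features change the argmax relative to arc scores alone.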
Unsupervised Search-based Structured Prediction
 2009
Abstract

Cited by 54 (1 self)
We describe an adaptation and application of a search-based structured prediction algorithm “Searn” to unsupervised learning problems. We show that it is possible to reduce unsupervised learning to supervised learning and demonstrate a high-quality unsupervised shift-reduce parsing model. We additionally show a close connection between unsupervised Searn and expectation maximization. Finally, we demonstrate the efficacy of a semi-supervised extension. The key idea that enables this is an application of the predict-self idea for unsupervised learning.
Stacking Dependency Parsers
Abstract

Cited by 49 (5 self)
We explore a stacked framework for learning to predict dependency structures for natural language sentences. A typical approach in graph-based dependency parsing has been to assume a factorized model, where local features are used but a global function is optimized (McDonald et al., 2005b). Recently Nivre and McDonald (2008) used the output of one dependency parser to provide features for another. We show that this is an example of stacked learning, in which a second predictor is trained to improve the performance of the first. Further, we argue that this technique is a novel way of approximating rich non-local features in the second parser, without sacrificing efficient, model-optimal prediction. Experiments on twelve languages show that stacking transition-based and graph-based parsers improves performance over existing state-of-the-art dependency parsers.
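The stacking mechanics — a level-1 learner that receives the level-0 model's prediction as an extra input feature — can be shown in a few lines. This is a minimal sketch, not the paper's parser stack: the level-0 "parser" is a fixed heuristic, the level-1 model is a plain perceptron, and the data are made up.

```python
# Minimal stacked-learning sketch: a level-1 perceptron receives the
# level-0 predictor's output as an additional input feature, mirroring
# how one parser's output provides features for another.
# The data, level-0 rule, and learning setup are all illustrative.

def level0(x):
    """A fixed heuristic standing in for the first (level-0) predictor."""
    return 1 if x[0] + x[1] > 1.0 else 0

def stack_features(x):
    return [x[0], x[1], float(level0(x)), 1.0]   # level-0 output + bias

def train_perceptron(data, epochs=100):
    w = [0.0] * 4
    for _ in range(epochs):
        for x, y in data:
            f = stack_features(x)
            pred = 1 if sum(wi * fi for wi, fi in zip(w, f)) > 0 else 0
            if pred != y:                          # perceptron update
                for i in range(4):
                    w[i] += (y - pred) * f[i]
    return w

def predict(w, x):
    f = stack_features(x)
    return 1 if sum(wi * fi for wi, fi in zip(w, f)) > 0 else 0

DATA = [([0.0, 0.2], 0), ([0.1, 0.3], 0), ([0.9, 0.8], 1),
        ([1.0, 0.6], 1), ([0.2, 0.1], 0), ([0.7, 0.9], 1)]
```

The level-0 output behaves like a rich, precomputed non-local feature: the level-1 model can weight it against its own local evidence rather than recomputing it, which is the efficiency argument the abstract makes.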
Structured prediction models via the matrix-tree theorem
 In EMNLP-CoNLL, 2007
Abstract

Cited by 44 (5 self)
This paper provides an algorithmic framework for learning statistical models involving directed spanning trees, or equivalently non-projective dependency structures. We show how partition functions and marginals for directed spanning trees can be computed by an adaptation of Kirchhoff’s Matrix-Tree Theorem. To demonstrate an application of the method, we perform experiments which use the algorithm in training both log-linear and max-margin dependency parsers. The new training methods give improvements in accuracy over perceptron-trained models.
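The key computation is that the total weight of all directed spanning trees rooted at a fixed node equals a determinant of the graph Laplacian with the root's row and column removed (Tutte's directed version of the theorem). The sketch below verifies this on a tiny weighted graph against brute-force enumeration; the edge weights stand in for exponentiated arc scores and are purely illustrative.

```python
from itertools import product

# Matrix-Tree sketch: total weight of directed spanning trees rooted at
# node 0 = det of the Laplacian minor with the root row/column removed.
# Edge weights are illustrative stand-ins for exponentiated arc scores.

W = [[0.0, 2.0, 1.0],     # W[h][m] = weight of arc h -> m (0 = root)
     [0.0, 0.0, 3.0],
     [0.0, 0.5, 0.0]]
N = len(W)

def partition_matrix_tree():
    # Laplacian: L[m][m] = total incoming weight of m, L[h][m] = -W[h][m].
    L = [[-W[h][m] for m in range(N)] for h in range(N)]
    for m in range(N):
        L[m][m] = sum(W[h][m] for h in range(N) if h != m)
    minor = [[L[i][j] for j in range(1, N)] for i in range(1, N)]
    return det(minor)

def det(a):
    """Determinant by Gaussian elimination (sufficient for this example)."""
    a = [row[:] for row in a]
    n, d = len(a), 1.0
    for k in range(n):
        if a[k][k] == 0:
            return 0.0
        d *= a[k][k]
        for i in range(k + 1, n):
            r = a[i][k] / a[k][k]
            for j in range(k, n):
                a[i][j] -= r * a[k][j]
    return d

def partition_brute_force():
    """Sum of tree weights over all head assignments forming a tree."""
    total = 0.0
    for h1, h2 in product(range(N), repeat=2):   # heads of nodes 1, 2
        if h1 == 1 or h2 == 2:
            continue
        heads = {1: h1, 2: h2}
        if all(_reaches_root(m, heads) for m in (1, 2)):
            total += W[h1][1] * W[h2][2]
    return total

def _reaches_root(m, heads):
    seen = set()
    while m != 0:
        if m in seen:
            return False
        seen.add(m)
        m = heads[m]
    return True
```

For this graph both routes give 8.5: the determinant replaces an exponential sum over trees with an O(n³) linear-algebra computation, which is what makes log-linear training over all non-projective trees feasible.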
An Augmented Lagrangian Approach to Constrained MAP Inference
Abstract

Cited by 37 (3 self)
We propose a new algorithm for approximate MAP inference on factor graphs, by combining augmented Lagrangian optimization with the dual decomposition method. Each slave subproblem is given a quadratic penalty, which pushes toward faster consensus than in previous subgradient approaches. Our algorithm is provably convergent, parallelizable, and suitable for fine decompositions of the graph. We show how it can efficiently handle problems with (possibly global) structural constraints via simple sort operations. Experiments on synthetic and real-world data show that our approach compares favorably with the state-of-the-art.
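The quadratic-penalty idea can be seen in its simplest form: ADMM-style consensus between two slave subproblems, each minimizing its own term plus a quadratic penalty pulling it toward the shared variable. This scalar sketch is purely illustrative — the paper applies the scheme to MAP subproblems on factor graphs, not to the toy objective below.

```python
# Simplest instance of the augmented-Lagrangian / quadratic-penalty idea:
# ADMM consensus for min_x (x - a)^2 + (x - b)^2. Each "slave" minimizes
# its own term plus a quadratic penalty toward the consensus value,
# rather than the linear-only term used in subgradient dual decomposition.

def admm_consensus(a, b, rho=1.0, iters=200):
    x1 = x2 = z = 0.0          # slave copies and consensus variable
    u1 = u2 = 0.0              # scaled dual variables
    for _ in range(iters):
        # Slave i solves: min_x (x - c)^2 + (rho/2) * (x - z + u)^2,
        # which has the closed form below.
        x1 = (2 * a + rho * (z - u1)) / (2 + rho)
        x2 = (2 * b + rho * (z - u2)) / (2 + rho)
        z = (x1 + u1 + x2 + u2) / 2.0        # consensus (averaging) step
        u1 += x1 - z                          # dual updates
        u2 += x2 - z
    return z

print(round(admm_consensus(1.0, 3.0), 4))   # optimum is (a + b) / 2 = 2.0
```

The quadratic term makes each slave's update account for how far it sits from the current consensus, which is the source of the faster agreement the abstract claims over plain subgradient steps.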
An Empirical Study of Semi-supervised Structured Conditional Models for Dependency Parsing
Abstract

Cited by 36 (2 self)
This paper describes an empirical study of high-performance dependency parsers based on a semi-supervised learning approach. We describe an extension of semi-supervised structured conditional models (SS-SCMs) to the dependency parsing problem, whose framework is originally proposed in (Suzuki and Isozaki, 2008). Moreover, we introduce two extensions related to dependency parsing: The first extension is to combine SS-SCMs with another semi-supervised approach, described in (Koo et al., 2008). The second extension is to apply the approach to second-order parsing models, such as those described in (Carreras, 2007), using a two-stage semi-supervised learning approach. We demonstrate the effectiveness of our proposed methods on dependency parsing experiments using two widely used test collections: the Penn Treebank for English, and the Prague Dependency Treebank for Czech.
Probabilistic models of non-projective dependency trees
 In Proc. EMNLP-CoNLL, 2007
Abstract

Cited by 35 (8 self)
A notable gap in research on statistical dependency parsing is a proper conditional probability distribution over non-projective dependency trees for a given sentence. We exploit the Matrix-Tree Theorem (Tutte, 1984) to derive an algorithm that efficiently sums the scores of all non-projective trees in a sentence, permitting the definition of a conditional log-linear model over trees. While discriminative methods, such as those presented in McDonald et al. (2005b), obtain very high accuracy on standard dependency parsing tasks and can be trained and applied without marginalization, “summing trees” permits some alternative techniques of interest. Using the summing algorithm, we present competitive experimental results on four non-projective languages, for maximum conditional likelihood estimation, minimum Bayes-risk parsing, and hidden variable training.