Results 11  20
of
306
Global inference for sentence compression: An integer linear programming approach
 Journal of Artificial Intelligence Research (JAIR
, 2008
"... Sentence compression holds promise for many applications ranging from summarization to subtitle generation. Our work views sentence compression as an optimization problem and uses integer linear programming (ILP) to infer globally optimal compressions in the presence of linguistically motivated cons ..."
Abstract

Cited by 106 (7 self)
 Add to MetaCart
(Show Context)
Sentence compression holds promise for many applications ranging from summarization to subtitle generation. Our work views sentence compression as an optimization problem and uses integer linear programming (ILP) to infer globally optimal compressions in the presence of linguistically motivated constraints. We show how previous formulations of sentence compression can be recast as ILPs and extend these models with novel global constraints. Experimental results on written and spoken texts demonstrate improvements over stateoftheart models. 1.
Exponentiated gradient algorithms for conditional random fields and maxmargin Markov networks
, 2008
"... Loglinear and maximummargin models are two commonlyused methods in supervised machine learning, and are frequently used in structured prediction problems. Efficient learning of parameters in these models is therefore an important problem, and becomes a key factor when learning from very large dat ..."
Abstract

Cited by 94 (2 self)
 Add to MetaCart
(Show Context)
Loglinear and maximummargin models are two commonlyused methods in supervised machine learning, and are frequently used in structured prediction problems. Efficient learning of parameters in these models is therefore an important problem, and becomes a key factor when learning from very large data sets. This paper describes exponentiated gradient (EG) algorithms for training such models, where EG updates are applied to the convex dual of either the loglinear or maxmargin objective function; the dual in both the loglinear and maxmargin cases corresponds to minimizing a convex function with simplex constraints. We study both batch and online variants of the algorithm, and provide rates of convergence for both cases. In the maxmargin case, O ( 1 ε) EG updates are required to reach a given accuracy ε in the dual; in contrast, for loglinear models only O(log (1/ε)) updates are required. For both the maxmargin and loglinear cases, our bounds suggest that the online EG algorithm requires a factor of n less computation to reach a desired accuracy than the batch EG algorithm, where n is the number of training examples. Our experiments confirm that the online algorithms are much faster than the batch algorithms in practice. We describe how the EG updates factor in a convenient way for structured prediction problems, allowing the algorithms to be
Experiments with a higherorder projective dependency parser
 In Proc. EMNLPCoNLL Shared Task
, 2007
"... We present experiments with a dependency parsing model defined on rich factors. Our model represents dependency trees with factors that include three types of relations between the tokens of a dependency and their children. We extend the projective parsing algorithm of Eisner (1996) for our case, an ..."
Abstract

Cited by 91 (5 self)
 Add to MetaCart
(Show Context)
We present experiments with a dependency parsing model defined on rich factors. Our model represents dependency trees with factors that include three types of relations between the tokens of a dependency and their children. We extend the projective parsing algorithm of Eisner (1996) for our case, and train models using the averaged perceptron. Our experiments show that considering higherorder information yields significant improvements in parsing accuracy, but comes at a high cost in terms of both time and memory consumption. In the multilingual exercise of the CoNLL2007 shared task (Nivre et al., 2007), our system obtains the best accuracy for English, and the second best accuracies for Basque and Czech. 1
Structured Models for FinetoCoarse Sentiment Analysis
 Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics
, 2007
"... In this paper we investigate a structured model for jointly classifying the sentiment of text at varying levels of granularity. Inference in the model is based on standard sequence classification techniques using constrained Viterbi to ensure consistent solutions. The primary advantage of such a mod ..."
Abstract

Cited by 89 (6 self)
 Add to MetaCart
(Show Context)
In this paper we investigate a structured model for jointly classifying the sentiment of text at varying levels of granularity. Inference in the model is based on standard sequence classification techniques using constrained Viterbi to ensure consistent solutions. The primary advantage of such a model is that it allows classification decisions from one level in the text to influence decisions at another. Experiments show that this method can significantly reduce classification error relative to models trained in isolation. 1
Dependency parsing by belief propagation
 In Proceedings of EMNLP
, 2008
"... We formulate dependency parsing as a graphical model with the novel ingredient of global constraints. We show how to apply loopy belief propagation (BP), a simple and effective tool for approximate learning and inference. As a parsing algorithm, BP is both asymptotically and empirically efficient. E ..."
Abstract

Cited by 84 (9 self)
 Add to MetaCart
(Show Context)
We formulate dependency parsing as a graphical model with the novel ingredient of global constraints. We show how to apply loopy belief propagation (BP), a simple and effective tool for approximate learning and inference. As a parsing algorithm, BP is both asymptotically and empirically efficient. Even with secondorder features or latent variables, which would make exact parsing considerably slower or NPhard, BP needs only O(n3) time with a small constant factor. Furthermore, such features significantly improve parse accuracy over exact firstorder methods. Incorporating additional features would increase the runtime additively rather than multiplicatively. 1
Characterizing the errors of datadriven dependency parsing models
 Proceedings of the Conference on Empirical Methods in Natural Language Processing and Natural Language Learning
, 2007
"... We present a comparative error analysis of the two dominant approaches in datadriven dependency parsing: global, exhaustive, graphbased models, and local, greedy, transitionbased models. We show that, in spite of similar performance overall, the two models produce different types of errors, in a w ..."
Abstract

Cited by 79 (13 self)
 Add to MetaCart
We present a comparative error analysis of the two dominant approaches in datadriven dependency parsing: global, exhaustive, graphbased models, and local, greedy, transitionbased models. We show that, in spite of similar performance overall, the two models produce different types of errors, in a way that can be explained by theoretical properties of the two models. This analysis leads to new directions for parser development. 1
Distributed Training Strategies for the Structured Perceptron
"... Perceptron training is widely applied in the natural language processing community for learning complex structured models. Like all structured prediction learning frameworks, the structured perceptron can be costly to train as training complexity is proportional to inference, which is frequently non ..."
Abstract

Cited by 77 (4 self)
 Add to MetaCart
Perceptron training is widely applied in the natural language processing community for learning complex structured models. Like all structured prediction learning frameworks, the structured perceptron can be costly to train as training complexity is proportional to inference, which is frequently nonlinear in example sequence length. In this paper we investigate distributed training strategies for the structured perceptron as a means to reduce training times when computing clusters are available. We look at two strategies and provide convergence bounds for a particular mode of distributed structured perceptron training based on iterative parameter mixing (or averaging). We present experiments on two structured prediction problems – namedentity recognition and dependency parsing – to highlight the efficiency of this method. 1
Multilingual dependency analysis with a twostage discriminative parser
 IN PROCEEDINGS OF THE CONFERENCE ON COMPUTATIONAL NATURAL LANGUAGE LEARNING (CONLL
, 2006
"... We present a twostage multilingual dependency parser and evaluate it on 13 diverse languages. The first stage is based on the unlabeled dependency parsing models described by McDonald and Pereira (2006) augmented with morphological features for a subset of the languages. The second stage takes the ..."
Abstract

Cited by 73 (4 self)
 Add to MetaCart
We present a twostage multilingual dependency parser and evaluate it on 13 diverse languages. The first stage is based on the unlabeled dependency parsing models described by McDonald and Pereira (2006) augmented with morphological features for a subset of the languages. The second stage takes the output from the first and labels all the edges in the dependency graph with appropriate syntactic categories using a globally trained sequence classifier over components of the graph. We report results on the CoNLLX shared task (Buchholz et al., 2006) data sets and present an error analysis.
Adaptive Regularization of Weight Vectors
 Advances in Neural Information Processing Systems 22
, 2009
"... We present AROW, a new online learning algorithm that combines several useful properties: large margin training, confidence weighting, and the capacity to handle nonseparable data. AROW performs adaptive regularization of the prediction function upon seeing each new instance, allowing it to perform ..."
Abstract

Cited by 71 (17 self)
 Add to MetaCart
We present AROW, a new online learning algorithm that combines several useful properties: large margin training, confidence weighting, and the capacity to handle nonseparable data. AROW performs adaptive regularization of the prediction function upon seeing each new instance, allowing it to perform especially well in the presence of label noise. We derive a mistake bound, similar in form to the second order perceptron bound, that does not assume separability. We also relate our algorithm to recent confidenceweighted online learning techniques and show empirically that AROW achieves stateoftheart performance and notable robustness in the case of nonseparable data. 1
Semantic Parsing on Freebase from QuestionAnswer Pairs
"... In this paper, we train a semantic parser that scales up to Freebase. Instead of relying on annotated logical forms, which is especially expensive to obtain at large scale, we learn from questionanswer pairs. The main challenge in this setting is narrowing down the huge number of possible logical p ..."
Abstract

Cited by 60 (9 self)
 Add to MetaCart
(Show Context)
In this paper, we train a semantic parser that scales up to Freebase. Instead of relying on annotated logical forms, which is especially expensive to obtain at large scale, we learn from questionanswer pairs. The main challenge in this setting is narrowing down the huge number of possible logical predicates for a given question. We tackle this problem in two ways: First, we build a coarse mapping from phrases to predicates using a knowledge base and a large text corpus. Second, we use a bridging operation to generate additional predicates based on neighboring predicates. On the dataset of Cai and Yates (2013), despite not having annotated logical forms, our system outperforms their stateoftheart parser. Additionally, we collected a more realistic and challenging dataset of questionanswer pairs and improves over a natural baseline. 1