Dual decomposition for parsing with non-projective head automata
In Proc. of EMNLP, 2010
Cited by 101 (16 self)
This paper introduces algorithms for non-projective parsing based on dual decomposition. We focus on parsing algorithms for non-projective head automata, a generalization of head-automata models to non-projective structures. The dual decomposition algorithms are simple and efficient, relying on standard dynamic programming and minimum spanning tree algorithms. They provably solve an LP relaxation of the non-projective parsing problem. Empirically, the LP relaxation is very often tight: for many languages, exact solutions are achieved on over 98% of test sentences. The accuracy of our models is higher than that of previous work on a broad range of datasets.
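A minimal sketch of the general dual decomposition recipe, assuming two toy subproblems that must agree on a shared binary vector. In the paper the subproblems are head-automata dynamic programs and a minimum spanning tree solver; here both are hypothetical brute-force enumerations over tiny candidate sets, and `theta1`, `theta2`, and the step-size schedule are illustrative choices. When the two subproblem solutions agree, the agreement certifies that the relaxation is tight for that instance.

```python
import itertools


def best_candidate(candidates, weights):
    """Return the candidate vector with the highest linear score weights . y."""
    return max(candidates, key=lambda y: sum(w * v for w, v in zip(weights, y)))


def dual_decomposition(theta1, theta2, set1, set2, iters=100, step=1.0):
    """Subgradient descent on the Lagrangian dual of
        max theta1 . y + theta2 . z  s.t.  y in set1, z in set2, y == z.
    Returns (y, u, agreed)."""
    u = [0.0] * len(theta1)  # one Lagrange multiplier per shared coordinate
    for t in range(1, iters + 1):
        # Each subproblem is solved independently with multiplier-adjusted scores.
        y = best_candidate(set1, [a + b for a, b in zip(theta1, u)])
        z = best_candidate(set2, [a - b for a, b in zip(theta2, u)])
        if y == z:
            return y, u, True  # agreement certifies that y is an exact solution
        rate = step / t  # diminishing step size
        # The subgradient of the dual objective with respect to u is y - z.
        u = [ui - rate * (yi - zi) for ui, yi, zi in zip(u, y, z)]
    return y, u, False


# Hypothetical toy instance: set1 = one-hot vectors, set2 = all binary vectors.
set1 = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
set2 = list(itertools.product([0, 1], repeat=3))
y, u, agreed = dual_decomposition([1.0, 2.0, 0.0], [1.0, -1.0, 3.0], set1, set2)
print(y, agreed)
```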
A primal-dual message-passing algorithm for approximated large-scale structured prediction
In Advances in Neural Information Processing Systems 23, 2010
Cited by 38 (19 self)
In this paper we propose an approximated structured prediction framework for large-scale graphical models and derive message-passing algorithms for learning their parameters efficiently. We first relate CRFs and structured SVMs and show that in CRFs a variant of the log-partition function, known as the soft-max, smoothly approximates the hinge loss function of structured SVMs. We then propose an intuitive approximation for the structured prediction problem, using duality, based on a local entropy approximation, and derive an efficient message-passing algorithm that is guaranteed to converge. Unlike existing approaches, this allows us to efficiently learn graphical models with cycles and a very large number of parameters.
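The relation between the CRF log-partition function and the structured-SVM hinge loss can be checked numerically on a problem small enough to enumerate every output. This is a sketch, assuming a hypothetical three-output problem with made-up scores and task losses; `eps` plays the role of a temperature. With zero task loss and `eps = 1`, the soft version is the usual CRF negative log-likelihood, and as `eps` shrinks it approaches the hinge loss from above. The paper's local entropy approximation for large graphs is not reproduced here.

```python
import math


def structured_hinge(scores, losses, gold):
    """Structured-SVM hinge: max_y [loss(y) + score(y)] - score(gold)."""
    return max(s + l for s, l in zip(scores, losses)) - scores[gold]


def soft_hinge(scores, losses, gold, eps):
    """Smoothed hinge: eps * log sum_y exp((loss(y) + score(y)) / eps) - score(gold)."""
    vals = [(s + l) / eps for s, l in zip(scores, losses)]
    m = max(vals)  # subtract the max before exponentiating, for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in vals))
    return eps * log_sum_exp - scores[gold]


scores = [2.0, 1.0, 0.5]  # hypothetical model scores for three outputs
losses = [0.0, 1.0, 1.0]  # hypothetical task loss of each output w.r.t. the gold output 0
for eps in (1.0, 0.1, 0.01):
    print(eps, soft_hinge(scores, losses, 0, eps), structured_hinge(scores, losses, 0))
```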
An Alternating Direction Method for Dual MAP LP Relaxation
Cited by 32 (2 self)
Maximum a posteriori (MAP) estimation is an important task in many applications of probabilistic graphical models. Although finding an exact solution is generally intractable, approximations based on linear programming (LP) relaxation often provide good approximate solutions. In this paper we present an algorithm for solving the LP relaxation optimization problem. In order to overcome the lack of strict convexity, we apply an augmented Lagrangian method to the dual LP. The algorithm, based on the alternating direction method of multipliers (ADMM), is guaranteed to converge to the global optimum of the LP relaxation objective. Our experimental results show that this algorithm is competitive with other state-of-the-art algorithms for approximate MAP estimation.
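For readers unfamiliar with ADMM, the sketch below shows the generic scaled-form iteration on a deliberately simple problem: projecting a vector `a` onto the box [0, 1]^n, written as min ½‖x − a‖² + I_box(z) subject to x = z. This is not the paper's dual MAP LP; the problem, `rho`, and the iteration count are assumptions chosen so that the answer (clipping `a` to [0, 1]) is known in closed form. It illustrates the alternating x-update, z-update, and multiplier update that an augmented Lagrangian method of this kind relies on.

```python
def admm_box_projection(a, rho=1.0, iters=200):
    """Scaled-form ADMM for  min 0.5*||x - a||^2 + I_[0,1]^n(z)  s.t.  x = z."""
    n = len(a)
    z = [0.0] * n
    u = [0.0] * n  # scaled dual variable (Lagrange multiplier divided by rho)
    for _ in range(iters):
        # x-update: minimize 0.5*||x - a||^2 + (rho/2)*||x - z + u||^2 in closed form
        x = [(ai + rho * (zi - ui)) / (1.0 + rho) for ai, zi, ui in zip(a, z, u)]
        # z-update: project x + u onto the box [0, 1]^n
        z = [min(1.0, max(0.0, xi + ui)) for xi, ui in zip(x, u)]
        # dual update: accumulate the constraint residual x - z
        u = [ui + xi - zi for ui, xi, zi in zip(u, x, z)]
    return z


print(admm_box_projection([-0.5, 0.3, 1.7]))
```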
Approximate Inference in Graphical Models using LP Relaxations
2010
Cited by 27 (1 self)
Graphical models such as Markov random fields have been successfully applied to a wide variety of fields, from computer vision and natural language processing to computational biology. Exact probabilistic inference is generally intractable in complex models having many dependencies between the variables. We present new approaches to approximate inference based on linear programming (LP) relaxations. Our algorithms optimize over the cycle relaxation of the marginal polytope, which we show is closely related to the first lifting of the Sherali-Adams hierarchy and is significantly tighter than the pairwise LP relaxation. We show how to efficiently optimize over the cycle relaxation using a cutting-plane algorithm that iteratively introduces constraints into the relaxation. We provide a criterion to determine which constraints would be most helpful in tightening the relaxation, and give efficient algorithms for solving the search problem of finding the best cycle constraint to add according to this criterion.
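To make the cycle constraints concrete, here is a minimal separation check for a single given cycle in a binary pairwise model. It assumes the standard form of the cycle inequalities, in which `d[e]` is the pseudomarginal probability that the endpoints of edge `e` disagree and, for every subset F of the cycle's edges with |F| odd, Σ_{e∈F} d[e] − Σ_{e∈C∖F} d[e] ≤ |F| − 1. The brute-force enumeration of F is an illustrative assumption; the criterion and search procedure from the abstract for choosing which cycle constraint to add are not reproduced here.

```python
import itertools


def most_violated_cycle_inequality(d):
    """d[k] = P(x_i != x_j) for the k-th edge around one cycle (binary variables).
    Returns (violation, F) maximizing
        sum_{e in F} d[e] - sum_{e not in F} d[e] - (|F| - 1)
    over odd-sized subsets F; a positive violation means the inequality is violated."""
    edges = range(len(d))
    best = (float("-inf"), ())
    for size in range(1, len(d) + 1, 2):  # odd subset sizes only
        for F in itertools.combinations(edges, size):
            inside = sum(d[e] for e in F)
            outside = sum(d[e] for e in edges if e not in F)
            violation = inside - outside - (size - 1)
            best = max(best, (violation, F))
    return best


# Frustrated triangle: every edge "disagrees" with probability 1. This point is
# locally consistent (all singleton marginals 0.5), but no integral labeling
# of a cycle can have an odd number of disagreeing edges.
print(most_violated_cycle_inequality([1.0, 1.0, 1.0]))  # violation 1.0 with F = all edges
```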
Cutting-Plane Training of Non-associative Markov Network for 3D Point Cloud Segmentation
2011
Convergence rate analysis of MAP coordinate minimization algorithms
In NIPS, 2012
Cited by 10 (2 self)
Finding maximum a posteriori (MAP) assignments in graphical models is an important task in many applications. Since the problem is generally hard, linear programming (LP) relaxations are often used. Solving these relaxations efficiently is thus an important practical problem. In recent years, several authors have proposed message-passing updates corresponding to coordinate descent in the dual LP. However, these are generally not guaranteed to converge to a global optimum. One approach to remedy this is to smooth the LP and perform coordinate descent on the smoothed dual. However, little is known about the convergence rate of this procedure. Here we perform a thorough rate analysis of such schemes and derive primal and dual convergence rates. We also provide a simple dual-to-primal mapping that yields feasible primal solutions with a guaranteed rate of convergence. Empirical evaluation supports our theoretical claims and shows that the method is highly competitive with state-of-the-art approaches that yield global optima.
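A standard fact helps explain why smoothing is attractive yet introduces a trade-off: replacing a max by a temperature-scaled log-sum-exp makes the objective differentiable while changing its value by a bounded amount. The bound below is a general property of log-sum-exp with temperature τ > 0 over n terms, stated here as background rather than as the specific bound derived in the paper.

```latex
\max_{i} x_i \;\le\; \tau \log \sum_{i=1}^{n} \exp\!\left(\frac{x_i}{\tau}\right) \;\le\; \max_{i} x_i + \tau \log n
```

A smaller τ approximates the original non-smooth objective more tightly, but the gradient of the smoothed function then varies more sharply (its Lipschitz constant grows like 1/τ), which is the tension a convergence-rate analysis of smoothed coordinate descent has to account for.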
Structured Prediction via Output Space Search
Journal of Machine Learning Research (JMLR), 2014
Cited by 9 (4 self)
We consider a framework for structured prediction based on search in the space of complete structured outputs. Given a structured input, an output is produced by running a time-bounded search procedure guided by a learned cost function, and then returning the least-cost output uncovered during the search. This framework can be instantiated for a wide range of search spaces and search procedures, and easily incorporates arbitrary structured-prediction loss functions. In this paper, we make two main technical contributions. First, we describe a novel approach to automatically defining an effective search space over structured outputs, which is able to leverage the availability of powerful classification learning algorithms. In particular, we define the limited-discrepancy search space and relate the quality of that space to the quality of learned classifiers. We also define a sparse version of the search space to improve the efficiency of our overall approach. Second, we give a generic cost function learning approach that is applicable to a wide range of search procedures. The key idea is to learn a cost function that attempts to mimic the behavior of conducting searches guided by the true loss function. Our experiments on six benchmark domains show that a small amount of search in the limited-discrepancy search space is often sufficient for significantly improving on state-of-the-art structured-prediction performance. We also demonstrate significant speed improvements for our approach using sparse search spaces with little or no loss in accuracy.
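A minimal sketch of a limited-discrepancy search space under a simplifying assumption: a per-position classifier ranks the labels at each position independently, so a "discrepancy" simply replaces the top-ranked label at one position with a lower-ranked one. In the paper the space is defined relative to a learned classifier and tied to its quality; that construction and the learned cost function are not reproduced. The ranked labels and the toy cost function below are hypothetical.

```python
import itertools


def limited_discrepancy_outputs(ranked_labels, max_discrepancies):
    """ranked_labels[i] lists the labels for position i, best first.
    Yields every output that deviates from the greedy (top-ranked) labeling
    at no more than max_discrepancies positions."""
    greedy = [labels[0] for labels in ranked_labels]
    for k in range(max_discrepancies + 1):
        for positions in itertools.combinations(range(len(ranked_labels)), k):
            alternatives = [ranked_labels[p][1:] for p in positions]
            for choice in itertools.product(*alternatives):
                output = list(greedy)
                for p, label in zip(positions, choice):
                    output[p] = label
                yield tuple(output)


def least_cost_output(ranked_labels, max_discrepancies, cost):
    """Return the lowest-cost output found in the limited-discrepancy space."""
    return min(limited_discrepancy_outputs(ranked_labels, max_discrepancies), key=cost)


def toy_cost(y):
    """Hypothetical stand-in for a learned cost: mismatches against ("N", "N", "N")."""
    return sum(1 for a, b in zip(y, ("N", "N", "N")) if a != b)


# Hypothetical 3-token tagging example; each position has two candidate tags.
ranked = [["N", "V"], ["V", "N"], ["N", "D"]]
print(least_cost_output(ranked, 1, toy_cost))
```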
Learning deep structured models
2014
Cited by 8 (5 self)
Many problems in real-world applications involve predicting several random variables that are statistically related. Markov random fields (MRFs) are a great mathematical tool to encode such dependencies. The goal of this paper is to combine MRFs with deep learning to estimate complex representations while taking into account the dependencies between the output random variables. Towards this goal, we propose a training algorithm that is able to learn structured models jointly with deep features that form the MRF potentials. Our approach is efficient because it blends learning and inference and makes use of GPU acceleration. We demonstrate the effectiveness of our algorithm on the tasks of predicting words from noisy images and tagging Flickr photographs. We show that joint learning of the deep features and the MRF parameters results in significant performance gains.
Structured learning via logistic regression
In: NIPS (2013)
Cited by 8 (0 self)
A successful approach to structured learning is to write the learning objective as a joint function of linear parameters and inference messages, and iterate between updates to each. This paper observes that if the inference problem is “smoothed” through the addition of entropy terms, for fixed messages, the learning objective reduces to a traditional (non-structured) logistic regression problem with respect to parameters. In these logistic regression problems, each training example has a bias term determined by the current set of messages. Based on this insight, the structured energy function can be extended from linear factors to any function class where an “oracle” exists to minimize a logistic loss.
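The reduction described above says that, once the messages are fixed, each training example contributes an ordinary logistic loss whose score includes an extra example-specific bias. A minimal sketch of that inner problem, assuming binary labels in {−1, +1}, plain batch gradient descent, and made-up offsets standing in for the message-dependent bias terms (how the paper computes those terms from the messages is not reproduced here):

```python
import math


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def logistic_loss(w, xs, ys, offsets):
    """Mean of log(1 + exp(-y * (w . x + b))) over the examples."""
    total = 0.0
    for x, y, b in zip(xs, ys, offsets):
        m = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
        total += max(-m, 0.0) + math.log1p(math.exp(-abs(m)))  # stable log(1 + e^-m)
    return total / len(xs)


def fit_with_offsets(xs, ys, offsets, lr=0.5, epochs=300):
    """Batch gradient descent on the offset logistic loss; only w is learned."""
    w = [0.0] * len(xs[0])
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y, b in zip(xs, ys, offsets):
            m = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
            coef = -y * sigmoid(-m)  # derivative of log(1 + e^-m) w.r.t. the score
            for j, xj in enumerate(x):
                grad[j] += coef * xj
        w = [wj - lr * g / len(xs) for wj, g in zip(w, grad)]
    return w


xs = [[1.0, 0.5], [1.0, -1.0], [1.0, 2.0], [1.0, -0.5]]  # hypothetical features
ys = [1, -1, 1, -1]
offsets = [0.2, -0.1, 0.0, 0.3]  # hypothetical stand-ins for message-dependent biases
w = fit_with_offsets(xs, ys, offsets)
print(logistic_loss([0.0, 0.0], xs, ys, offsets), logistic_loss(w, xs, ys, offsets))
```

Because the offsets are fixed inputs rather than parameters, any off-the-shelf logistic regression solver that accepts a per-example offset could replace the hand-written loop.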
HC-Search: A learning framework for search-based structured prediction
JAIR
Cited by 7 (3 self)
Structured prediction is the problem of learning a function that maps structured inputs to structured outputs. Prototypical examples of structured prediction include part-of-speech tagging and semantic segmentation of images. Inspired by the recent successes of search-based structured prediction, we introduce a new framework for structured prediction called HC-Search. Given a structured input, the framework uses a search procedure guided by a learned heuristic H to uncover high-quality candidate outputs and then employs a separate learned cost function C to select a final prediction among those outputs. The overall loss of this prediction architecture decomposes into the loss due to H not leading to high-quality outputs, and the loss due to C not selecting the best among the generated outputs. Guided by this decomposition, we minimize the overall loss in a greedy stage-wise manner by first training H to quickly uncover high-quality outputs via imitation learning, and then training C to correctly rank the outputs generated via H according to their true losses. Importantly, this training procedure is sensitive to the particular loss function of interest and the time bound allowed for predictions. Experiments on several benchmark domains show that our approach significantly outperforms several state-of-the-art methods.
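The loss decomposition can be computed directly for a single input once the candidate set generated by the heuristic H is known. A minimal sketch, assuming Hamming loss as the true loss, a hypothetical hand-written cost table standing in for the learned cost function C, and a fixed list of candidates standing in for the outputs uncovered by the H-guided search:

```python
def hamming_loss(y, gold):
    """Number of positions where y differs from the gold output."""
    return sum(1 for a, b in zip(y, gold) if a != b)


def hc_loss_decomposition(candidates, gold, cost):
    """Split the overall loss into a generation part (H) and a selection part (C)."""
    # Generation loss: true loss of the best output that H uncovered.
    generation_loss = min(hamming_loss(y, gold) for y in candidates)
    selected = min(candidates, key=cost)  # C picks the lowest-cost candidate
    overall_loss = hamming_loss(selected, gold)
    # Selection loss: how much worse C's pick is than the best available candidate.
    selection_loss = overall_loss - generation_loss
    return overall_loss, generation_loss, selection_loss


candidates = ["AAC", "ABD", "BBD"]  # hypothetical outputs uncovered by H
cost_table = {"AAC": 0.7, "ABD": 0.9, "BBD": 0.2}  # hypothetical learned costs
print(hc_loss_decomposition(candidates, "ABC", cost_table.get))
```

In this toy case H did find an output with loss 1, but C chose one with loss 2, so the overall loss of 2 splits into 1 from generation and 1 from selection.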