From MAP to marginals: Variational inference in Bayesian submodular models
In Neural Information Processing Systems (NIPS), 2014
"... Submodular optimization has found many applications in machine learning and beyond. We carry out the first systematic investigation of inference in probabilis-tic models defined through submodular functions, generalizing regular pairwise MRFs and Determinantal Point Processes. In particular, we pres ..."
Cited by 6 (1 self)
Submodular optimization has found many applications in machine learning and beyond. We carry out the first systematic investigation of inference in probabilistic models defined through submodular functions, generalizing regular pairwise MRFs and Determinantal Point Processes. In particular, we present L-FIELD, a variational approach to general log-submodular and log-supermodular distributions based on sub- and supergradients. We obtain both lower and upper bounds on the log-partition function, which enables us to compute probability intervals for marginals, conditionals and marginal likelihoods. We also obtain fully factorized approximate posteriors, at the same computational cost as ordinary submodular optimization. Our framework results in convex problems for optimizing over differentials of submodular functions, which we show how to solve optimally. We provide theoretical guarantees of the approximation quality with respect to the curvature of the function. We further establish natural relations between our variational approach and the classical mean-field method. Lastly, we empirically demonstrate the accuracy of our inference scheme on several submodular models.
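As a rough illustration of the kind of bound the abstract describes (a sketch with assumed notation, not taken from the paper): for a log-supermodular model, any modular bound on the submodular function turns the intractable sum over subsets into a product over elements.

```latex
% Sketch with assumed notation (not quoted from the paper): a log-supermodular
% model p(A) \propto \exp(-F(A)) with F submodular on a ground set V. A modular
% lower bound m(A) = c + \sum_{i \in A} m_i with m(A) \le F(A) for all A (e.g.,
% obtained from a subgradient of F) yields a factorized upper bound on the
% partition function:
\[
  Z \;=\; \sum_{A \subseteq V} e^{-F(A)}
    \;\le\; \sum_{A \subseteq V} e^{-m(A)}
    \;=\; e^{-c} \prod_{i \in V} \bigl(1 + e^{-m_i}\bigr).
\]
% A modular upper bound on F (from a supergradient) gives the analogous lower
% bound on Z; together the two bounds yield probability intervals for marginals.
```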
Playing with duality: An overview of recent primal-dual approaches for ...
2014
"... Optimization methods are at the core of many problems in signal/image processing, computer vision, and machine learning. For a long time, it has been recognized that looking at the dual of an optimization problem may drastically simplify its solution. Deriving efficient strategies jointly bringing i ..."
Cited by 5 (1 self)
Optimization methods are at the core of many problems in signal/image processing, computer vision, and machine learning. It has long been recognized that looking at the dual of an optimization problem may drastically simplify its solution. Deriving efficient strategies that jointly exploit the primal and the dual problems is, however, a more recent idea which has generated many important new contributions in recent years. These developments are grounded in recent advances in convex analysis, discrete optimization, parallel processing, and nonsmooth optimization with an emphasis on sparsity. In this paper, we aim to present the principles of primal-dual approaches while giving an overview of numerical methods that have been proposed in different contexts. We show the benefits that can be drawn from primal-dual algorithms for solving both large-scale convex optimization problems and discrete ones, and we provide various application examples to illustrate their usefulness.
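For context, a common primal-dual template of the kind surveyed here is the saddle-point reformulation of a composite problem; the sketch below uses assumed notation (Chambolle-Pock-style iterations) and is not an excerpt from the paper.

```latex
% Illustrative sketch (assumed notation, not from the paper): for the primal
% problem  min_x f(Kx) + g(x)  with f, g convex and K a linear operator,
% Fenchel duality gives the saddle-point form
\[
  \min_{x}\,\max_{y}\; \langle Kx,\, y\rangle + g(x) - f^{*}(y),
\]
% which primal-dual iterations of Chambolle--Pock type solve by alternating
% proximal steps on the dual variable y and the primal variable x:
\[
  y^{k+1} = \operatorname{prox}_{\sigma f^{*}}\!\bigl(y^{k} + \sigma K \bar{x}^{k}\bigr),
  \qquad
  x^{k+1} = \operatorname{prox}_{\tau g}\!\bigl(x^{k} - \tau K^{\top} y^{k+1}\bigr),
  \qquad
  \bar{x}^{k+1} = 2x^{k+1} - x^{k},
\]
% with step sizes satisfying \sigma \tau \|K\|^{2} \le 1.
```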
Modular proximal optimization for multidimensional total-variation regularization
"... One of the most frequently used notions of “structured sparsity ” is that of sparse (discrete) gradients, a structure typically elicited through Total-Variation (TV) regularizers. This paper focuses on anisotropic TV-regularizers, in particular on `p-norm weighted TV regularizers for which it develo ..."
Cited by 3 (0 self)
One of the most frequently used notions of “structured sparsity” is that of sparse (discrete) gradients, a structure typically elicited through Total-Variation (TV) regularizers. This paper focuses on anisotropic TV regularizers, in particular ℓp-norm weighted TV regularizers, for which it develops efficient algorithms to compute the corresponding proximity operators. Our algorithms enable one to scalably incorporate TV regularization of vector, matrix, or tensor data into proximal convex optimization solvers. For the special case of vectors, we derive and implement a highly efficient weighted 1D-TV solver. This solver provides a backbone for subsequently handling the more complex task of higher-dimensional (two or more dimensions) TV by means of a modular proximal optimization approach. We present numerical experiments demonstrating that our 1D-TV solver matches or exceeds the best known 1D-TV solvers. Thereafter, we illustrate the benefits of our modular design through extensive experiments on: (i) image denoising; (ii) image deconvolution; (iii) four variants of the fused lasso; and (iv) video denoising. Our results show the flexibility and speed our TV solvers offer over competing approaches. To underscore our claims, we provide our TV solvers in an easy-to-use multi-threaded C++ library (which also aids reproducibility of our results).
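The modular idea can be sketched as follows: the prox of an anisotropic 2D-TV term splits into row-wise and column-wise 1D-TV proxes, which a proximal splitting scheme (here, proximal Dykstra) combines. This is a hedged sketch, not the authors' implementation; prox_tv1d stands for any 1D weighted-TV prox routine, such as one provided by the paper's library.

```python
import numpy as np

def prox_tv2d_dykstra(y, lam, prox_tv1d, n_iter=50):
    """Sketch of the modular idea described above (not the authors' exact code):
    the prox of anisotropic 2D-TV, lam * (sum of row TVs + sum of column TVs),
    is computed by proximal Dykstra iterations that only ever call a 1D-TV prox.

    prox_tv1d(v, lam) is an assumed 1D weighted-TV prox solver.
    """
    x = y.copy()
    p = np.zeros_like(y)   # Dykstra correction for the row-wise TV term
    q = np.zeros_like(y)   # Dykstra correction for the column-wise TV term
    for _ in range(n_iter):
        # Prox of the row-wise TV term: each row is an independent 1D problem.
        z = np.apply_along_axis(prox_tv1d, 1, x + p, lam)
        p = x + p - z
        # Prox of the column-wise TV term: each column is an independent 1D problem.
        x = np.apply_along_axis(prox_tv1d, 0, z + q, lam)
        q = z + q - x
    return x
```

The higher-dimensional solvers described in the abstract follow the same modular pattern, combining 1D proxes along each dimension.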
Minding the gaps for block Frank-Wolfe optimization of structured SVMs. arXiv preprint arXiv:1605.09346
2016
"... Abstract In this paper, we propose several improvements on the block-coordinate Frank-Wolfe (BCFW) algorithm from ..."
Cited by 1 (0 self)
In this paper, we propose several improvements on the block-coordinate Frank-Wolfe (BCFW) algorithm from ...
Higher-Order Inference for Multi-class Log-supermodular Models
"... Although shown to be a very powerful tool in computer vision, existing higher-order models are mostly restricted to computing MAP configuration for specific energy functions. In this thesis, we propose a multi-class model along with a variational marginal inference formulation for capturing higher ..."
Cited by 1 (0 self)
Although shown to be a very powerful tool in computer vision, existing higher-order models are mostly restricted to computing the MAP configuration for specific energy functions. In this thesis, we propose a multi-class model along with a variational marginal inference formulation for capturing higher-order log-supermodular interactions. Our modeling technique is based on set functions, with constraints ensuring that each variable is assigned to exactly one class. Marginal inference for our model can be done efficiently by either a Frank-Wolfe or a soft-move-making algorithm, both of which are easily parallelized. To simultaneously address the associated MAP problem, we extend the marginal inference formulation to a parameterized version, yielding smoothed MAP inference. Accompanying this extension, we present a rigorous analysis of the efficiency-accuracy trade-off obtained by varying the smoothing strength. We evaluate the scalability and effectiveness of our approach on the task of natural scene image segmentation, demonstrating state-of-the-art performance for both ...
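The smoothed MAP idea can be illustrated with a temperature parameter (notation assumed here, not quoted from the thesis):

```latex
% Illustrative sketch with assumed notation: given a model over labelings A with
% score F(A), a temperature T > 0 controls the smoothing strength,
\[
  p_{T}(A) \;\propto\; \exp\!\bigl(F(A)/T\bigr).
\]
% T = 1 recovers the original marginal inference problem, while T \to 0 makes
% p_T concentrate on maximizers of F, so inference in p_T acts as a smoothed
% surrogate for MAP inference.
```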
On the Reducibility of Submodular Functions
"... Abstract The scalability of submodular optimization methods is critical for their usability in practice. In this paper, we study the reducibility of submodular functions, a property that enables us to reduce the solution space of submodular optimization problems without performance loss. We introdu ..."
The scalability of submodular optimization methods is critical for their usability in practice. In this paper, we study the reducibility of submodular functions, a property that enables us to reduce the solution space of submodular optimization problems without performance loss. We introduce the concept of reducibility using marginal gains. We then show that, by adding perturbation, we can endow irreducible functions with reducibility, and on this basis we propose the perturbation-reduction optimization framework. Our theoretical analysis proves that, given the perturbation scales, the reducibility gain can be computed and the performance loss has additive upper bounds. We further conduct empirical studies, and the results demonstrate that our proposed framework significantly accelerates existing optimization methods for irreducible submodular functions at the cost of only small performance losses.
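To give a flavor of solution-space reduction via marginal gains, here is a standard pruning argument for unconstrained submodular minimization, sketched with assumed notation rather than the paper's exact definitions:

```latex
% Illustrative sketch with assumed notation. Write the marginal gain as
% F(e \mid S) = F(S \cup \{e\}) - F(S). Diminishing returns implies
% F(e \mid S) \ge F(e \mid V \setminus \{e\}) for every S, hence
\[
  F(e \mid V \setminus \{e\}) > 0 \;\Longrightarrow\; e \notin A^{*}
  \quad\text{and}\quad
  F(e \mid \emptyset) < 0 \;\Longrightarrow\; e \in A^{*}
  \quad\text{for every minimizer } A^{*},
\]
% so such elements can be fixed up front and the problem solved over a smaller
% ground set.
```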
Asynchronous Parallel Block-Coordinate Frank-Wolfe
"... Abstract We develop mini-batched parallel Frank-Wolfe (conditional gradient) methods for smooth convex optimization subject to block-separable constraints. Our work includes the basic (batch) Frank-Wolfe algorithm as well as the recently proposed Block-Coordinate Frank-Wolfe (BCFW) method [18] as s ..."
We develop mini-batched parallel Frank-Wolfe (conditional gradient) methods for smooth convex optimization subject to block-separable constraints. Our work includes the basic (batch) Frank-Wolfe algorithm as well as the recently proposed Block-Coordinate Frank-Wolfe (BCFW) method [18] as special cases. Our algorithm permits asynchronous updates within the minibatch and is robust to stragglers and faulty worker threads. Our analysis reveals how the potential speedups over BCFW depend on the minibatch size and how one can provably obtain large problem-dependent speedups. We present several experiments to illustrate the empirical behavior of our methods, obtaining significant speedups over competing state-of-the-art (and synchronous) methods on structural SVMs.
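As a reference point for the methods described above, here is a minimal sketch of the sequential BCFW update that the mini-batched, asynchronous variants build on; the helper names (grad_block, lmo_block) are assumptions for this sketch, not the paper's API.

```python
import numpy as np

def bcfw(grad_block, lmo_block, x0, n_iters=1000, seed=0):
    """Minimal sketch of a basic block-coordinate Frank-Wolfe loop.

    grad_block(x, i): partial gradient of the smooth objective w.r.t. block i.
    lmo_block(i, g) : linear minimization oracle for block i,
                      returning argmin_{s in M_i} <g, s>.
    x0              : feasible start, a list with one numpy array per block.
    """
    rng = np.random.default_rng(seed)
    x = [b.copy() for b in x0]
    n = len(x)
    for k in range(n_iters):
        i = rng.integers(n)                # sample one block uniformly
        g = grad_block(x, i)               # gradient restricted to block i
        s = lmo_block(i, g)                # cheap linear oracle over block i's domain
        gamma = 2.0 * n / (k + 2.0 * n)    # standard BCFW step size
        x[i] = (1.0 - gamma) * x[i] + gamma * s   # convex combination keeps feasibility
    return x
```

In the parallel variant described in the abstract, a mini-batch of blocks is drawn and updated by worker threads, with the updates applied asynchronously rather than one block at a time.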
Primal-Dual Approaches for Solving Large-Scale Optimization Problems
2014
"... ar ..."
(Show Context)