Results 1 – 6 of 6
Scalable variational inference in log-supermodular models, 2015
Abstract

Cited by 2 (0 self)
We consider the problem of approximate Bayesian inference in log-supermodular models. These models encompass regular pairwise MRFs with binary variables, but can also capture higher-order interactions, which are intractable for existing approximate inference techniques such as belief propagation, mean field, and variants. We show that a recently proposed variational approach to inference in log-supermodular models, L-FIELD, reduces to the widely studied minimum-norm problem for submodular minimization. This insight allows us to leverage powerful existing tools and hence to solve the variational problem orders of magnitude more efficiently than previously possible. We then provide another natural interpretation of L-FIELD, demonstrating that it exactly minimizes a specific type of Rényi divergence measure. This insight sheds light on the nature of the variational approximations produced by L-FIELD. Furthermore, we show how to perform parallel inference as message passing in a suitable factor graph at a linear convergence rate, without having to sum over all the configurations of the factor. Finally, we apply our approach to a challenging image segmentation task. Our experiments confirm the scalability of our approach, the high quality of the marginals, and the benefit of incorporating higher-order potentials.
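The minimum-norm reduction this abstract describes can be illustrated with a toy sketch: for a small submodular energy, a plain Frank-Wolfe loop, with Edmonds' greedy algorithm as the linear oracle, approximates the minimum-norm point of the base polytope, and factorized marginals are then read off coordinate-wise. The energy values and the sigmoid sign convention below are illustrative assumptions, not the paper's implementation, which relies on much faster specialized solvers.

```python
import numpy as np

def greedy_base_vertex(F, order):
    # Edmonds' greedy algorithm: returns a vertex of the base polytope B(F)
    s = np.zeros(len(order))
    prefix, prev = [], 0.0
    for i in order:
        prefix.append(i)
        cur = F(frozenset(prefix))
        s[i] = cur - prev
        prev = cur
    return s

def min_norm_point_fw(F, n, iters=2000):
    # Frank-Wolfe for min ||s||^2 over B(F); the linear oracle is the
    # greedy algorithm run on the coordinates of s in increasing order.
    s = greedy_base_vertex(F, list(range(n)))
    for t in range(iters):
        v = greedy_base_vertex(F, [int(i) for i in np.argsort(s)])
        step = 2.0 / (t + 2.0)
        s = (1.0 - step) * s + step * v
    return s

# Toy submodular energy: unary costs plus chain-cut terms (values assumed).
c = np.array([1.0, -0.5, 0.3, -1.2])
lam = 0.6
edges = [(0, 1), (1, 2), (2, 3)]

def F(S):
    cut = sum(1 for u, v in edges if (u in S) != (v in S))
    return float(sum(c[i] for i in S) + lam * cut)

s_star = min_norm_point_fw(F, 4)
marginals = 1.0 / (1.0 + np.exp(s_star))  # sigmoid(-s), one per variable
```

Every point of the base polytope sums to F of the ground set, so the invariant `s_star.sum() == F({0,1,2,3})` is a quick sanity check on the oracle.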
Higher-Order Inference for Multiclass Log-supermodular Models
Abstract

Cited by 1 (0 self)
Although shown to be a very powerful tool in computer vision, existing higher-order models are mostly restricted to computing the MAP configuration for specific energy functions. In this thesis, we propose a multiclass model along with a variational marginal inference formulation for capturing higher-order log-supermodular interactions. Our modeling technique utilizes set functions by incorporating constraints that each variable is assigned to exactly one class. Marginal inference for our model can be done efficiently by either Frank-Wolfe or a soft move-making algorithm, both of which are easily parallelized. To simultaneously address the associated MAP problem, we extend the marginal inference formulation to a parameterized version as smoothed MAP inference. Accompanying the extension, we present a rigorous analysis of the tradeoff between efficiency and accuracy obtained by varying the smoothing strength. We evaluate the scalability and effectiveness of our approach on the task of natural scene image segmentation, demonstrating state-of-the-art performance for both …
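The smoothing tradeoff mentioned in this abstract can be seen in miniature with log-sum-exp smoothing of a hard max, a standard construction (the thesis's exact parameterization may differ): a `smoothed_max` with inverse temperature `beta` is smooth and easy to optimize for small `beta`, but only approaches the true MAP value as `beta` grows.

```python
import numpy as np

def smoothed_max(scores, beta):
    # log-sum-exp smoothing of max; beta is the smoothing-strength knob
    scores = np.asarray(scores, dtype=float)
    m = float(scores.max())
    return m + float(np.log(np.exp(beta * (scores - m)).sum())) / beta

scores = [1.0, 2.5, 2.4]
hard_max = max(scores)
# the smoothed value approaches the hard max from above as beta grows
gaps = [smoothed_max(scores, b) - hard_max for b in (1.0, 10.0, 100.0)]
```

The gap to the hard max shrinks monotonically with `beta`, which is exactly the accuracy side of the tradeoff; the cost side is that large `beta` makes the objective nearly non-smooth and harder to optimize.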
Parameter Learning for Log-supermodular Distributions
Abstract
We consider log-supermodular models on binary variables, which are probabilistic models whose negative log-densities are submodular. These models provide probabilistic interpretations of common combinatorial optimization tasks such as image segmentation. In this paper, we focus primarily on parameter estimation in these models from known upper bounds on the intractable log-partition function. We show that the bound based on separable optimization on the base polytope of the submodular function is always inferior to a bound based on "perturb-and-MAP" ideas. Then, to learn parameters, given that our approximation of the log-partition function is an expectation (over our own randomization), we use a stochastic subgradient technique to maximize a lower bound on the log-likelihood. This can also be extended to conditional maximum likelihood. We illustrate our new results in a set of experiments on binary image denoising, where we highlight the flexibility of a probabilistic model to learn with missing data.
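The "perturb-and-MAP" bound on the log-partition function can be sketched on a tiny model: adding zero-mean Gumbel noise to each variable state and taking the expected perturbed maximum upper-bounds log Z (in the Hazan and Jaakkola style). The weights and edge set below are made up for illustration, and the maximization is brute force over all configurations rather than the efficient MAP solver (e.g. graph cuts) a real implementation would use.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
n = 3
w = np.array([0.4, -0.3, 0.8])        # unary weights (illustrative)
lam = 0.5
edges = [(0, 1), (1, 2)]

def energy(x):
    # submodular energy: unaries plus attractive pairwise (cut) terms
    return float(w @ x + lam * sum(int(x[u] != x[v]) for u, v in edges))

configs = [np.array(c) for c in itertools.product([0, 1], repeat=n)]
log_Z = float(np.log(sum(np.exp(-energy(x)) for x in configs)))  # exact

# Perturb-and-MAP bound: log Z <= E[ max_x ( -energy(x) + sum_i g_i(x_i) ) ]
# with i.i.d. zero-mean Gumbel noise per variable state.
samples, vals = 4000, []
for _ in range(samples):
    g = rng.gumbel(loc=-np.euler_gamma, size=(n, 2))  # zero-mean Gumbels
    vals.append(max(-energy(x) + float(g[np.arange(n), x].sum())
                    for x in configs))
upper_bound = float(np.mean(vals))
```

Because the bound is an expectation over the model's own randomization, it is exactly the kind of stochastic quantity the abstract's subgradient learning scheme differentiates through.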
Research Statement
Abstract
Uncertainty is a key factor in real-world problems, and I am interested in intelligent and adaptive systems that can cope with complex and uncertain environments. My research is centered on topics in machine learning (where uncertainty typically resides in the observations and data), communication theory (where uncertainty is due to the transmission medium), and random combinatorial structures (where uncertainty is in the underlying graphical model). The main theme unifying these three fields is information theory. My research focus is on constructing an information-theoretic framework to describe the role of uncertainty in data and models. Within such a framework, I develop algorithms by drawing upon probabilistic reasoning, message-passing techniques, stochastic optimisation, and approximation algorithms to study questions such as: How can we build systems that acquire the most important information at the lowest cost? How can we design efficient algorithms, often involving large numbers of variables and large amounts of data, for inference problems? How can we summarise massive amounts of data into a small number of informative representatives and use the smaller set for processing tasks? How can we design practical coding schemes to transfer data inside networks of possibly many individuals? Moreover, I use concepts and techniques from information theory, probability theory, and optimisation to analyse the performance of algorithms and find fundamental tradeoffs. My dissertation research was in the fields of information theory and graphical models. I investigated …
Importance Sampling over Sets: A New Probabilistic Inference Scheme
Abstract
Computing expectations in high-dimensional spaces is a key challenge in probabilistic inference and machine learning. Monte Carlo sampling, and importance sampling in particular, is one of the leading approaches. We propose a generalized importance sampling scheme based on randomly selecting (exponentially large) subsets of states rather than individual ones. By collecting a small number of extreme states in the sampled sets, we obtain estimates of statistics of interest, such as the partition function of an undirected graphical model. We incorporate this idea into a novel maximum likelihood learning algorithm based on cutting planes. We demonstrate empirically that our scheme provides accurate answers and scales to problems with up to a million variables.
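For contrast with the set-based scheme this abstract proposes, here is the classic single-state importance sampling baseline it generalizes: estimating a partition function as Z = E_q[p~(x)/q(x)] under a uniform proposal. The toy factorized model (weights drawn at random) is an assumption chosen so the exact answer is available for comparison; this is not the authors' algorithm.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10
theta = 0.5 * rng.normal(size=n)      # weights of a toy factorized model

# exact partition function of p~(x) = exp(theta . x), x in {0,1}^n:
# the model factorizes, so Z = prod_i (1 + exp(theta_i))
Z_exact = float(np.prod(1.0 + np.exp(theta)))

# single-state importance sampling with a uniform proposal q(x) = 2^-n:
# Z = E_q[ p~(x) / q(x) ]
m = 20000
xs = rng.integers(0, 2, size=(m, n))
weights = np.exp(xs @ theta) * 2.0 ** n
Z_hat = float(weights.mean())
```

On a factorized model the uniform proposal already works; the paper's point is that sampling whole sets of states, and keeping only their extreme members, keeps the estimate accurate on much harder, strongly coupled models.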
Submodular Point Processes with Applications to Machine Learning
Abstract
We introduce a class of discrete point processes that we call Submodular Point Processes (SPPs). These processes are characterized via a submodular (or supermodular) function, and naturally model notions of information, coverage, and diversity, as well as cooperation. Unlike log-submodular and log-supermodular distributions (Log-SPPs) such as determinantal point processes (DPPs), SPPs are themselves submodular (or supermodular). In this paper, we analyze the computational complexity of probabilistic inference in SPPs. We show that computing the partition function for SPPs (and Log-SPPs) requires exponential complexity in the worst case, and we provide algorithms which approximate SPPs up to polynomial factors. Moreover, for several subclasses of interesting submodular functions that occur in applications, we show how to obtain efficient closed-form expressions for the partition functions, and thereby marginals and conditional distributions. We also show that SPPs are closed under mixtures, thus enabling maximum-likelihood-based strategies for learning mixtures of submodular functions. Finally, we argue that SPPs complement existing Log-SPP distributions and are a natural model for several applications.
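The simplest closed-form case is easy to verify directly (the paper covers richer subclasses): for a modular f(S) = Σ_{i∈S} w_i, every element appears in exactly 2^(n-1) of the 2^n subsets, so the SPP partition function Z = Σ_S f(S) collapses to 2^(n-1) Σ_i w_i. The weights below are arbitrary illustrative values.

```python
import itertools

# modular f(S) = sum of positive weights over S (values are illustrative)
w = [0.7, 1.3, 0.2, 2.0]
n = len(w)

def f(S):
    return sum(w[i] for i in S)

# brute-force partition function of the SPP: Z = sum over all subsets of f(S)
Z_brute = sum(f(S) for r in range(n + 1)
              for S in itertools.combinations(range(n), r))

# closed form: each element appears in exactly 2^(n-1) of the 2^n subsets
Z_closed = 2 ** (n - 1) * sum(w)
```

With the partition function in hand, marginals follow the same counting argument, which is why these subclasses admit efficient inference.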