Results 1–10 of 17
Norm-Product Belief Propagation: Primal-Dual Message-Passing for Approximate Inference
2008
Abstract

Cited by 53 (11 self)
Inference problems in graphical models can be represented as a constrained optimization of a free energy function. In this paper we treat both forms of probabilistic inference, estimating marginal probabilities of the joint distribution and finding the most probable assignment, through a unified message-passing algorithm architecture. In particular, we generalize the sum-product and max-product Belief Propagation (BP) algorithms and the tree-reweighted (TRW) sum- and max-product algorithms (TRBP), and introduce a new set of convergent algorithms based on a "convex free energy" and Linear Programming (LP) relaxation as a zero-temperature limit of a convex free energy. The main idea of this work arises from taking a general perspective on the existing BP and TRBP algorithms while observing that they are all reductions from the basic optimization formula of f + ∑_i h_i
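For context, the standard sum-product BP message and belief updates that the paper generalizes can be written as follows (standard notation from the BP literature, not taken from the abstract); the max-product variant replaces the sum over x_i with a max:

```latex
m_{i \to j}(x_j) \;\propto\; \sum_{x_i} \psi_{ij}(x_i, x_j)\,\psi_i(x_i) \prod_{k \in N(i) \setminus \{j\}} m_{k \to i}(x_i),
\qquad
b_i(x_i) \;\propto\; \psi_i(x_i) \prod_{k \in N(i)} m_{k \to i}(x_i).
```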
What Cannot be Learned with Bethe Approximations
Abstract

Cited by 13 (1 self)
We address the problem of learning the parameters in graphical models when inference is intractable. A common strategy in this case is to replace the partition function with its Bethe approximation. We show that there exists a regime of empirical marginals where such Bethe learning will fail. By failure we mean that the empirical marginals cannot be recovered from the approximated maximum likelihood parameters (i.e., moment matching is not achieved). We provide several conditions on empirical marginals that yield outer and inner bounds on the set of Bethe learnable marginals. An interesting implication of
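For reference, the moment-matching condition at issue is the standard maximum-likelihood identity (textbook material, not quoted from the paper): at the exact ML parameters θ* the model marginals reproduce the empirical marginals,

```latex
\nabla_\theta \log Z(\theta^*) \;=\; \mu(\theta^*) \;=\; \hat{\mu}.
```

Bethe learning replaces log Z by its Bethe approximation, and the failure regime described above is one where no parameter setting makes the corresponding approximate marginals match the empirical ones.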
FastInf: An efficient approximate inference library
 Journal of Machine Learning Research
Abstract

Cited by 7 (1 self)
The FastInf C++ library is designed to perform memory- and time-efficient approximate inference in large-scale discrete undirected graphical models. The focus of the library is propagation-based approximate inference methods, ranging from the basic loopy belief propagation algorithm to propagation based on convex free energies. Various message-scheduling schemes that improve on the standard synchronous or asynchronous approaches are included. Also implemented are clique-tree-based exact inference, Gibbs sampling, and the mean field algorithm. In addition to inference, FastInf provides parameter estimation capabilities as well as representation and learning of shared parameters. It offers a rich interface that facilitates extension of the basic classes to other inference and learning methods.
Hinge-loss Markov random fields and probabilistic soft logic
2015
Abstract

Cited by 6 (4 self)
A fundamental challenge in developing high-impact machine learning technologies is balancing the ability to model rich, structured domains with the ability to scale to big data. Many important problem areas are both richly structured and large scale, from social and biological networks, to knowledge graphs and the Web, to images, video, and natural language. In this paper, we introduce two new formalisms for modeling structured data, distinguished from previous approaches by their ability to both capture rich structure and scale to big data. The first, hinge-loss Markov random fields (HL-MRFs), is a new kind of probabilistic graphical model that generalizes different approaches to convex inference. We unite three approaches from the randomized algorithms, probabilistic graphical models, and fuzzy logic communities, showing that all three lead to the same inference objective. We then derive HL-MRFs by generalizing this unified objective. The second new formalism, probabilistic soft logic (PSL), is a probabilistic programming language that makes HL-MRFs easy to define using a syntax based on first-order logic. We next introduce an algorithm for inferring most-probable variable assignments (MAP inference) that is much more scalable than general-purpose convex optimization software, because it uses message passing to take advantage of sparse dependency structures. We then show how to learn the parameters of HL-MRFs. The learned HL-MRFs are as accurate as analogous discrete models, but much more scalable. Together, these algorithms enable HL-MRFs and PSL to model rich, structured data at scales not previously possible.
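As a sketch of the first formalism (the standard HL-MRF form from this literature; the symbols here are illustrative): an HL-MRF defines a density over continuous variables y ∈ [0,1]^n through weighted hinge-loss potentials,

```latex
P(y) \;\propto\; \exp\!\Big(-\sum_{j=1}^{m} w_j \,\big(\max\{\ell_j(y),\,0\}\big)^{p_j}\Big),
\qquad p_j \in \{1, 2\},\ \ell_j \text{ linear in } y,
```

so MAP inference is the convex problem of minimizing \sum_j w_j (\max\{\ell_j(y), 0\})^{p_j} over the box [0,1]^n.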
Understanding the Bethe Approximation: When and How can it go Wrong?
Abstract

Cited by 4 (4 self)
Belief propagation is a remarkably effective tool for inference, even when applied to networks with cycles. It may be viewed as a way to seek the minimum of the Bethe free energy, though with no convergence guarantee in general. A variational perspective shows that, compared to exact inference, this minimization employs two forms of approximation: (i) the true entropy is approximated by the Bethe entropy, and (ii) the minimization is performed over a relaxation of the marginal polytope termed the local polytope. Here we explore when and how the Bethe approximation can fail for binary pairwise models by examining each aspect of the approximation, deriving results both analytically and with new experimental methods.
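For reference, the two approximations mentioned can be made concrete with the standard form of the Bethe free energy for a pairwise model (standard notation, not quoted from the paper), minimized over the local polytope { b ≥ 0, ∑_{x_i} b_i(x_i) = 1, ∑_{x_j} b_{ij}(x_i,x_j) = b_i(x_i) }, with d_i the degree of variable i:

```latex
F_{\mathrm{Bethe}}(b) \;=\; \sum_{(i,j) \in E} \sum_{x_i, x_j} b_{ij}(x_i,x_j) \log \frac{b_{ij}(x_i,x_j)}{\psi_{ij}(x_i,x_j)}
\;-\; \sum_i \sum_{x_i} b_i(x_i) \log \psi_i(x_i)
\;+\; \sum_i (1 - d_i) \sum_{x_i} b_i(x_i) \log b_i(x_i).
```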
Approximating the Bethe partition function
Abstract

Cited by 4 (3 self)
When belief propagation (BP) converges, it does so to a stationary point of the Bethe free energy F, and is often strikingly accurate. However, it may converge only to a local optimum or may not converge at all. An algorithm was recently introduced by Weller and Jebara for attractive binary pairwise MRFs which is guaranteed to return an ε-approximation to the global minimum of F in polynomial time provided the maximum degree ∆ = O(log n), where n is the number of variables. Here we extend their approach and derive a new method based on analyzing first derivatives of F, which leads to much better performance and, for attractive models, yields a fully polynomial-time approximation scheme (FPTAS) without any degree restriction. Further, our methods apply to general (non-attractive) models, though with no polynomial-time guarantee in this case, demonstrating that approximating the log of the Bethe partition function, log Z_B = −min F, for a general model to additive ε-accuracy may be reduced to a discrete MAP inference problem. This allows the merits of the global Bethe optimum to be tested.
Implicit Differentiation by Perturbation
Abstract

Cited by 3 (2 self)
This paper proposes a simple and efficient finite-difference method for implicit differentiation of marginal inference results in discrete graphical models. Given an arbitrary loss function defined on marginals, we show that the derivatives of this loss with respect to model parameters can be obtained by running the inference procedure twice, on slightly perturbed model parameters. This method can be used with approximate inference, with a loss function over approximate marginals. Convenient choices of loss functions make it practical to fit graphical models with hidden variables, high treewidth, and/or model misspecification.
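A minimal sketch of the perturbation idea on a toy model (my own illustration; the function names and the single-variable "inference" routine are not from the paper): for one K-state variable with natural parameters θ, exact marginals are softmax(θ), and because the Jacobian dμ/dθ is symmetric (it is the Hessian of log Z), the gradient of a loss on marginals can be recovered from two inference calls at perturbed parameters.

```python
import numpy as np

def softmax(t):
    e = np.exp(t - t.max())
    return e / e.sum()

def infer(theta):
    # Toy "marginal inference": exact marginals of a single K-state variable.
    return softmax(theta)

def dloss_dmu(mu, target):
    # L(mu) = 0.5 * ||mu - target||^2, so dL/dmu = mu - target.
    return mu - target

def grad_by_perturbation(theta, target, r=1e-5):
    # Central finite difference of the inference map in the direction
    # g = dL/dmu: since dmu/dtheta is symmetric, this equals dL/dtheta.
    g = dloss_dmu(infer(theta), target)
    return (infer(theta + r * g) - infer(theta - r * g)) / (2.0 * r)

theta = np.array([0.2, -0.1, 0.5])
target = np.array([0.2, 0.3, 0.5])

# Analytic check: the Jacobian of softmax is diag(mu) - mu mu^T.
mu = infer(theta)
J = np.diag(mu) - np.outer(mu, mu)
exact = J @ dloss_dmu(mu, target)
approx = grad_by_perturbation(theta, target)
assert np.allclose(exact, approx, atol=1e-8)
```

The same two-call recipe applies unchanged when infer is an approximate method such as loopy BP on a large model, where forming the Jacobian explicitly would be infeasible.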
Approximate Learning for Structured Prediction Problems
Abstract

Cited by 2 (0 self)
Prediction problems such as image segmentation, sentence parsing, and gene prediction involve complex output spaces for which multiple decisions must be coordinated to achieve optimal results. Unfortunately, this means that there are generally an exponential number of possible predictions for every input. Markov random fields can be used to express structure in these output spaces, reducing the number of model parameters to a manageable size; however, the problem of learning those parameters from a training sample remains NP-hard in general. We review some recent results on approximate learning of structured prediction problems. There are two distinct approaches. In the first, results from the well-studied field of approximate inference are adapted to the learning setting. In the second, learning performance is characterized directly, producing bounds even when the underlying inference method does not offer formal approximation guarantees. While the literature on this topic is still sparse, we review the strengths and weaknesses of current results, and discuss issues
Paired-dual learning for fast training of latent variable hinge-loss MRFs
In Proceedings of the International Conference on Machine Learning
, 2015
Abstract

Cited by 2 (2 self)
Latent variables allow probabilistic graphical models to capture nuance and structure in important domains such as network science, natural language processing, and computer vision. Naive approaches to learning such complex models can be prohibitively expensive, because they require repeated inferences to update beliefs about latent variables, so lifting this restriction for useful classes of models is an important problem. Hinge-loss Markov random fields (HL-MRFs) are graphical models that allow highly scalable inference and learning in structured domains, in part by representing structured problems with continuous variables. However, this representation leads to challenges when learning with latent variables. We introduce paired-dual learning, a framework that greatly speeds up training by using tractable entropy surrogates and avoiding repeated inferences. Paired-dual learning optimizes an objective with a pair of dual inference problems. This allows fast, joint optimization of parameters and dual variables. We evaluate on social-group detection, trust prediction in social networks, and image reconstruction, finding that paired-dual learning trains models as accurate as those trained by traditional methods in much less time, often before traditional methods make even a single parameter update.