Results 1–10 of 13
Bethe Bounds and Approximating the Global Optimum
Cited by 9 (8 self)

Abstract
Inference in general Markov random fields (MRFs) is NP-hard, though identifying the maximum a posteriori (MAP) configuration of pairwise MRFs with submodular cost functions is efficiently solvable using graph cuts. Marginal inference, however, even for this restricted class, is in #P. We prove new formulations of derivatives of the Bethe free energy, provide bounds on the derivatives and bracket the locations of stationary points, introducing a new technique called Bethe bound propagation. Several results apply to pairwise models whether associative or not. Applying these to discretized pseudomarginals in the associative case, we present a polynomial-time approximation scheme for global optimization provided the maximum degree is O(log n), and discuss several extensions.
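The quantity being bounded here, the Bethe free energy, has a simple closed form for pairwise models. A minimal sketch (the function name and log-potential parameterization are ours, not the paper's):

```python
import numpy as np

def bethe_free_energy(theta_i, theta_ij, edges, b_i, b_ij):
    """Bethe free energy F(b) = U(b) - H_Bethe(b) of a pairwise MRF with
    log-potentials theta_i (per-state vectors) and theta_ij (per-edge matrices)."""
    n = len(theta_i)
    deg = [0] * n
    U, H = 0.0, 0.0
    for (i, j), th, b in zip(edges, theta_ij, b_ij):
        deg[i] += 1
        deg[j] += 1
        U -= float(np.sum(b * th))              # average pairwise energy
        H -= float(np.sum(b * np.log(b)))       # pairwise entropy
    for i in range(n):
        U -= float(np.sum(b_i[i] * theta_i[i]))
        # node entropies enter with weight (1 - degree) to correct overcounting
        H += (deg[i] - 1) * float(np.sum(b_i[i] * np.log(b_i[i])))
    return U - H
```

On a tree (e.g. a single edge) with the true marginals plugged in, F equals −log Z exactly, which is the sense in which the Bethe optimum approximates the partition function.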
Predictive trip planning – smart routing in smart cities
In Workshop Proceedings of the EDBT/ICDT 2014 Joint Conference, 2014
Cited by 4 (3 self)

Abstract
Smart route planning gathers increasing interest as cities become crowded and jammed. We present a system for individual trip planning that incorporates future traffic hazards in routing. Future traffic conditions are computed by a Spatio-Temporal Random Field based on a stream of sensor readings. In addition, our approach estimates traffic flow in areas with low sensor coverage using Gaussian Process Regression. The conditioning of spatial regression on intermediate predictions of a discrete probabilistic graphical model allows us to incorporate historical data, streamed online data and a rich dependency structure at the same time. We demonstrate the system and test model assumptions with a real-world use case from Dublin city, Ireland.
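The Gaussian Process Regression step can be sketched generically. This is a plain RBF-kernel GP predictive mean, not the authors' implementation; the function name, kernel, and hyperparameters are illustrative assumptions:

```python
import numpy as np

def gp_predict(X_train, y_train, X_test, length_scale=1.0, noise=1e-4):
    """Predictive mean of GP regression with an RBF kernel."""
    def rbf(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-0.5 * d2 / length_scale ** 2)
    K = rbf(X_train, X_train) + noise * np.eye(len(X_train))  # noise/jitter term
    return rbf(X_test, X_train) @ np.linalg.solve(K, y_train)
```

In the paper's setting, the training inputs would be sensor-equipped junction locations and the targets the random field's intermediate predictions rather than raw observations; that conditioning on model output is the paper's contribution, not captured in this generic sketch.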
Understanding the Bethe Approximation: When and How can it go Wrong?
Cited by 4 (4 self)

Abstract
Belief propagation is a remarkably effective tool for inference, even when applied to networks with cycles. It may be viewed as a way to seek the minimum of the Bethe free energy, though with no convergence guarantee in general. A variational perspective shows that, compared to exact inference, this minimization employs two forms of approximation: (i) the true entropy is approximated by the Bethe entropy, and (ii) the minimization is performed over a relaxation of the marginal polytope termed the local polytope. Here we explore when and how the Bethe approximation can fail for binary pairwise models by examining each aspect of the approximation, deriving results both analytically and with new experimental methods.
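For reference, the two objects named in (i) and (ii) have standard forms in the variational-inference literature (stated for pairwise models; the notation here is ours):

```latex
H_{\mathrm{Bethe}}(b) = \sum_{i \in V} H(b_i) - \sum_{(i,j) \in E} I(b_{ij}),
\qquad
I(b_{ij}) = \sum_{x_i, x_j} b_{ij}(x_i, x_j)
            \log \frac{b_{ij}(x_i, x_j)}{b_i(x_i)\, b_j(x_j)},
```

and the local polytope relaxes the marginal polytope to pseudomarginals that need only be pairwise-consistent:

```latex
\mathbb{L} = \Bigl\{\, b \ge 0 \;:\; \sum_{x_i} b_i(x_i) = 1,\;\;
\sum_{x_j} b_{ij}(x_i, x_j) = b_i(x_i) \ \ \forall (i,j) \in E \,\Bigr\}.
```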
Approximating the Bethe partition function
Cited by 4 (3 self)

Abstract
When belief propagation (BP) converges, it does so to a stationary point of the Bethe free energy F, and is often strikingly accurate. However, it may converge only to a local optimum or may not converge at all. An algorithm was recently introduced by Weller and Jebara for attractive binary pairwise MRFs which is guaranteed to return an ε-approximation to the global minimum of F in polynomial time provided the maximum degree ∆ = O(log n), where n is the number of variables. Here we extend their approach and derive a new method based on analyzing first derivatives of F, which leads to much better performance and, for attractive models, yields a fully polynomial-time approximation scheme (FPTAS) without any degree restriction. Further, our methods apply to general (non-attractive) models, though with no polynomial-time guarantee in this case, demonstrating that approximating the log of the Bethe partition function, log Z_B = −min F, for a general model to additive ε-accuracy may be reduced to a discrete MAP inference problem. This allows the merits of the global Bethe optimum to be tested.
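The underlying idea of optimizing F over discretized pseudomarginals can be seen in a brute-force toy: for a one-edge binary model (our own parameterization, not the paper's algorithm), −min F over a mesh recovers log Z to mesh accuracy, since the Bethe optimum is exact on trees:

```python
import numpy as np

def bethe_F_edge(q1, q2, xi, th1, th2, thE):
    """Bethe free energy of a one-edge binary MRF, parameterized by the
    singleton pseudomarginals q1 = b1(1), q2 = b2(1) and xi = b12(1, 1)."""
    B = np.array([[1 - q1 - q2 + xi, q2 - xi],
                  [q1 - xi, xi]])            # pairwise belief consistent with q1, q2
    if B.min() <= 0:
        return np.inf                        # outside the local polytope
    b1 = np.array([1 - q1, q1]); b2 = np.array([1 - q2, q2])
    U = -(b1 @ th1 + b2 @ th2 + (B * thE).sum())
    H = -(B * np.log(B)).sum()               # degree-1 nodes: only the edge entropy
    return U - H

# brute-force minimization over a mesh of discretized pseudomarginals
th1 = np.array([0.2, -0.1]); th2 = np.array([0.0, 0.3])
thE = np.array([[0.5, 0.0], [0.0, 0.5]])
mesh = np.linspace(0.02, 0.98, 33)
F_min = min(bethe_F_edge(a, b, x, th1, th2, thE)
            for a in mesh for b in mesh for x in mesh)
```

The paper's contribution is doing this with polynomial-time guarantees via derivative bounds, rather than the exponential grid search sketched here.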
Network Ranking With Bethe Pseudomarginals, 2013
Cited by 1 (1 self)

Abstract
Network structure often contains information that can be useful for ranking algorithms. We incorporate network structure by formulating ranking as marginal inference in a Markov random field (MRF). Though inference is generally NP-hard, we apply a recently developed polynomial-time approximation scheme (PTAS) to infer Bethe pseudomarginals. As a case study, we investigate the problem of ranking failing transformers that are physically connected in a network. Compared to independent score-based ranking, the current state of the art, we show superior ranking results. We conclude by discussing an empirical phenomenon of critical parameter regions, with implications for new algorithms.
Marginal Likelihoods for Distributed Estimation of Graphical Model Parameters
Cited by 1 (1 self)

Abstract
This paper considers the estimation of graphical model parameters with distributed data collection and computation. We first discuss the use and limitations of well-known distributed methods for marginal inference in the context of parameter estimation. We then describe an alternative framework for distributed parameter estimation based on maximizing marginal likelihoods. Each node independently estimates local parameters by solving a low-dimensional convex optimization with data collected from its local neighborhood. The local estimates are then combined into a global estimate without iterative message-passing. We provide an asymptotic analysis of the proposed estimator, deriving in particular its rate of convergence. Numerical experiments validate the rate of convergence and demonstrate performance equivalent to the centralized maximum likelihood estimator.
Marginal likelihoods for distributed parameter estimation of Gaussian graphical models
In MACHINE LEARNING: A PROBABILISTIC PERSPECTIVE, 2014
Cited by 1 (0 self)

Abstract
We consider distributed estimation of the inverse covariance matrix, also called the concentration or precision matrix, in Gaussian graphical models. Traditional centralized estimation often requires global inference of the covariance matrix, which can be computationally intensive in large dimensions. Approximate inference based on message-passing algorithms, on the other hand, can lead to unstable and biased estimation in loopy graphical models. Here, we propose a general framework for distributed estimation based on a maximum marginal likelihood (MML) approach. This approach computes local parameter estimates by maximizing marginal likelihoods defined with respect to data collected from local neighborhoods. Due to the non-convexity of the MML problem, we introduce and solve a convex relaxation. The local estimates are then combined into a global estimate without the need for iterative message-passing between neighborhoods. The proposed algorithm is naturally parallelizable and computationally efficient, thereby making it suitable for high-dimensional problems. In the classical regime where the number of variables is fixed and the number of samples increases to infinity, the proposed estimator is shown to be asymptotically consistent and to improve monotonically as the local neighborhood size increases. In the high-dimensional scaling regime where both the number of variables and the number of samples increase to infinity, the convergence rate to the true parameters is derived and is seen to be comparable to centralized maximum-likelihood estimation. Extensive numerical experiments demonstrate the improved performance of the two-hop version of the proposed estimator, which suffices to almost close the gap to the centralized maximum likelihood estimator at a reduced computational cost.
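The population-level fact that makes local estimation work is that a node's row of the precision matrix is recoverable from the marginal covariance over its closed neighborhood alone, because the node's Markov blanket lies inside that neighborhood. A sketch of this one-hop property (the function name is ours; the paper's finite-sample estimator and convex relaxation are omitted):

```python
import numpy as np

def local_precision_rows(Sigma, neighbors):
    """Recover each row of K = Sigma^{-1} from the marginal covariance over
    the node's closed neighborhood {i} + N(i) only; exact whenever N(i)
    contains the node's Markov blanket."""
    n = Sigma.shape[0]
    K_hat = np.zeros((n, n))
    for i in range(n):
        S = [i] + sorted(neighbors[i])          # local index 0 = node i
        M = np.linalg.inv(Sigma[np.ix_(S, S)])  # marginal precision over S
        K_hat[i, S] = M[0]                      # keep only node i's row
    return K_hat
```

With sample rather than population covariances, each row becomes an estimate, which is where the paper's asymptotic and high-dimensional analysis comes in.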
dortmund.de
Abstract
We consider a city where induction-based vehicle count sensors are installed at some, but not all, street junctions. Each sensor regularly outputs a count and a saturation value. We first use a discrete time Gauss-Markov model based on historical data to predict the evolution of these saturation values, and then a Gaussian Process derived from the street graph to extend these predictions to all junctions. We construct this model based on real data collected in Dublin city. Categories and Subject Descriptors G.3 [Probability and Statistics]: Markov processes, multivariate statistics, stochastic processes, time series analysis;
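A discrete-time Gauss-Markov prediction step has a simple generic form; this sketch (names and shapes are our own illustration, not the paper's model) propagates a state mean and covariance one step forward:

```python
import numpy as np

def gauss_markov_predict(A, Q, x, P):
    """One step of a discrete-time Gauss-Markov model x' = A x + w,
    with w ~ N(0, Q): propagate state mean x and covariance P."""
    return A @ x, A @ P @ A.T + Q
```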
Learning CRFs for Image Parsing with Adaptive Subgradient Descent
Abstract
We propose an adaptive subgradient descent method to efficiently learn the parameters of CRF models for image parsing. To balance the learning efficiency and performance of the learned CRF models, the parameter learning is iteratively carried out by solving a convex optimization problem in each iteration, which integrates a proximal term to preserve previously learned information and a large-margin preference to distinguish incorrect labelings from the ground-truth labeling. A subgradient descent update is derived for the convex optimization problem, with an adaptively determined step size. In addition, to deal with partially labeled training data, we propose a new objective constraint modeling both the labeled and unlabeled parts of the partially labeled training data for the parameter learning of CRF models. The superior learning efficiency of the proposed method is verified by experimental results on two public datasets. We also demonstrate the effectiveness of our method in handling partially labeled training data.
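The per-coordinate adaptive step size can be illustrated with an AdaGrad-style update on a simple hinge loss, a stand-in for the paper's structured large-margin CRF objective (the function and its hyperparameters are our own sketch):

```python
import numpy as np

def adagrad_hinge(X, y, steps=100, eta=0.5, eps=1e-8):
    """AdaGrad-style adaptive subgradient descent on the mean hinge loss
    max(0, 1 - y * w.x) for a linear classifier."""
    w = np.zeros(X.shape[1])
    G = np.zeros_like(w)                   # accumulated squared subgradients
    for _ in range(steps):
        active = y * (X @ w) < 1           # margin-violating examples
        g = -(X[active] * y[active, None]).sum(0) / len(X)  # subgradient
        G += g * g
        w -= eta * g / (np.sqrt(G) + eps)  # per-coordinate adaptive step size
    return w
```

The key feature mirrored from the paper is that the effective step size shrinks per coordinate as gradient information accumulates, rather than following a fixed global schedule.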