Results 1  10
of
1,589
Learning in graphical models
 STATISTICAL SCIENCE
, 2004
"... Statistical applications in fields such as bioinformatics, information retrieval, speech processing, image processing and communications often involve largescale models in which thousands or millions of random variables are linked in complex ways. Graphical models provide a general methodology for ..."
Abstract

Cited by 806 (10 self)
 Add to MetaCart
(Show Context)
Statistical applications in fields such as bioinformatics, information retrieval, speech processing, image processing and communications often involve largescale models in which thousands or millions of random variables are linked in complex ways. Graphical models provide a general methodology for approaching these problems, and indeed many of the models developed by researchers in these applied fields are instances of the general graphical model formalism. We review some of the basic ideas underlying graphical models, including the algorithmic ideas that allow graphical models to be deployed in largescale data analysis problems. We also present examples of graphical models in bioinformatics, errorcontrol coding and language processing.
Dynamic Bayesian Networks: Representation, Inference and Learning
, 2002
"... Modelling sequential data is important in many areas of science and engineering. Hidden Markov models (HMMs) and Kalman filter models (KFMs) are popular for this because they are simple and flexible. For example, HMMs have been used for speech recognition and biosequence analysis, and KFMs have bee ..."
Abstract

Cited by 770 (3 self)
 Add to MetaCart
Modelling sequential data is important in many areas of science and engineering. Hidden Markov models (HMMs) and Kalman filter models (KFMs) are popular for this because they are simple and flexible. For example, HMMs have been used for speech recognition and biosequence analysis, and KFMs have been used for problems ranging from tracking planes and missiles to predicting the economy. However, HMMs
and KFMs are limited in their “expressive power”. Dynamic Bayesian Networks (DBNs) generalize HMMs by allowing the state space to be represented in factored form, instead of as a single discrete random variable. DBNs generalize KFMs by allowing arbitrary probability distributions, not just (unimodal) linearGaussian. In this thesis, I will discuss how to represent many different kinds of models as DBNs, how to perform exact and approximate inference in DBNs, and how to learn DBN models from sequential data.
In particular, the main novel technical contributions of this thesis are as follows: a way of representing
Hierarchical HMMs as DBNs, which enables inference to be done in O(T) time instead of O(T 3), where T is the length of the sequence; an exact smoothing algorithm that takes O(log T) space instead of O(T); a simple way of using the junction tree algorithm for online inference in DBNs; new complexity bounds on exact online inference in DBNs; a new deterministic approximate inference algorithm called factored frontier; an analysis of the relationship between the BK algorithm and loopy belief propagation; a way of
applying RaoBlackwellised particle filtering to DBNs in general, and the SLAM (simultaneous localization
and mapping) problem in particular; a way of extending the structural EM algorithm to DBNs; and a variety of different applications of DBNs. However, perhaps the main value of the thesis is its catholic presentation of the field of sequential data modelling.
High dimensional graphs and variable selection with the Lasso
 ANNALS OF STATISTICS
, 2006
"... The pattern of zero entries in the inverse covariance matrix of a multivariate normal distribution corresponds to conditional independence restrictions between variables. Covariance selection aims at estimating those structural zeros from data. We show that neighborhood selection with the Lasso is a ..."
Abstract

Cited by 736 (22 self)
 Add to MetaCart
The pattern of zero entries in the inverse covariance matrix of a multivariate normal distribution corresponds to conditional independence restrictions between variables. Covariance selection aims at estimating those structural zeros from data. We show that neighborhood selection with the Lasso is a computationally attractive alternative to standard covariance selection for sparse highdimensional graphs. Neighborhood selection estimates the conditional independence restrictions separately for each node in the graph and is hence equivalent to variable selection for Gaussian linear models. We show that the proposed neighborhood selection scheme is consistent for sparse highdimensional graphs. Consistency hinges on the choice of the penalty parameter. The oracle value for optimal prediction does not lead to a consistent neighborhood estimate. Controlling instead the probability of falsely joining some distinct connectivity components of the graph, consistent estimation for sparse graphs is achieved (with exponential rates), even when the number of variables grows as the number of observations raised to an arbitrary power.
Convolution Kernels on Discrete Structures
, 1999
"... We introduce a new method of constructing kernels on sets whose elements are discrete structures like strings, trees and graphs. The method can be applied iteratively to build a kernel on an infinite set from kernels involving generators of the set. The family of kernels generated generalizes the fa ..."
Abstract

Cited by 506 (0 self)
 Add to MetaCart
(Show Context)
We introduce a new method of constructing kernels on sets whose elements are discrete structures like strings, trees and graphs. The method can be applied iteratively to build a kernel on an infinite set from kernels involving generators of the set. The family of kernels generated generalizes the family of radial basis kernels. It can also be used to define kernels in the form of joint Gibbs probability distributions. Kernels can be built from hidden Markov random elds, generalized regular expressions, pairHMMs, or ANOVA decompositions. Uses of the method lead to open problems involving the theory of infinitely divisible positive definite functions. Fundamentals of this theory and the theory of reproducing kernel Hilbert spaces are reviewed and applied in establishing the validity of the method.
Model Selection Through Sparse Maximum Likelihood Estimation for Multivariate Gaussian or Binary Data
 JOURNAL OF MACHINE LEARNING RESEARCH
, 2008
"... We consider the problem of estimating the parameters of a Gaussian or binary distribution in such a way that the resulting undirected graphical model is sparse. Our approach is to solve a maximum likelihood problem with an added ℓ1norm penalty term. The problem as formulated is convex but the memor ..."
Abstract

Cited by 334 (2 self)
 Add to MetaCart
(Show Context)
We consider the problem of estimating the parameters of a Gaussian or binary distribution in such a way that the resulting undirected graphical model is sparse. Our approach is to solve a maximum likelihood problem with an added ℓ1norm penalty term. The problem as formulated is convex but the memory requirements and complexity of existing interior point methods are prohibitive for problems with more than tens of nodes. We present two new algorithms for solving problems with at least a thousand nodes in the Gaussian case. Our first algorithm uses block coordinate descent, and can be interpreted as recursive ℓ1norm penalized regression. Our second algorithm, based on Nesterov’s first order method, yields a complexity estimate with a better dependence on problem size than existing interior point methods. Using a log determinant relaxation of the log partition function (Wainwright and Jordan, 2006), we show that these same algorithms can be used to solve an approximate sparse maximum likelihood problem for the binary case. We test our algorithms on synthetic data, as well as on gene expression and senate voting records data.
Logit models and logistic regressions for social networks. II . . .
, 1999
"... The research described here builds on our previous work by generalizing the univariate models described there to models for multivariate relations. This family, labelled p*, generalizes the Markov random graphs of Frank and Strauss, which were further developed by them and others, building on Besag’ ..."
Abstract

Cited by 321 (7 self)
 Add to MetaCart
The research described here builds on our previous work by generalizing the univariate models described there to models for multivariate relations. This family, labelled p*, generalizes the Markov random graphs of Frank and Strauss, which were further developed by them and others, building on Besag’s ideas on estimation. These models were first used to model random variables embedded in lattices by Ising, and have been quite common in the study of spatial data. Here, they are applied to the statistical analysis of multigraphs, in general, and the analysis of multivariate social networks, in particular. In this paper, we show how to formulate models for multivariate social networks by considering a range of theoretical claims about social structure. We illustrate the models by developing structural models for several multivariate networks.
Correctness of belief propagation in Gaussian graphical models of arbitrary topology
 NEURAL COMPUTATION
, 1999
"... Local "belief propagation" rules of the sort proposed byPearl [12] are guaranteed to converge to the correct posterior probabilities in singly connected graphical models. Recently, a number of researchers have empirically demonstrated good performance of "loopy belief propagation&q ..."
Abstract

Cited by 296 (7 self)
 Add to MetaCart
Local "belief propagation" rules of the sort proposed byPearl [12] are guaranteed to converge to the correct posterior probabilities in singly connected graphical models. Recently, a number of researchers have empirically demonstrated good performance of "loopy belief propagation"  using these same rules on graphs with loops. Perhaps the most dramatic instance is the near Shannonlimit performance of "Turbo codes", whose decoding algorithm is equivalentto loopy belief propagation. Except for the
Kalman filtering with intermittent observations
 IEEE TRANSACTIONS ON AUTOMATIC CONTROL
, 2004
"... Motivated by navigation and tracking applications within sensor networks, we consider the problem of performing Kalman filtering with intermittent observations. When data travel along unreliable communication channels in a large, wireless, multihop sensor network, the effect of communication delays ..."
Abstract

Cited by 295 (41 self)
 Add to MetaCart
Motivated by navigation and tracking applications within sensor networks, we consider the problem of performing Kalman filtering with intermittent observations. When data travel along unreliable communication channels in a large, wireless, multihop sensor network, the effect of communication delays and loss of information in the control loop cannot be neglected. We address this problem starting from the discrete Kalman filtering formulation, and modeling the arrival of the observation as a random process. We study the statistical convergence properties of the estimation error covariance, showing the existence of a critical value for the arrival rate of the observations, beyond which a transition to an unbounded state error covariance occurs. We also give upper and lower bounds on this expected state error covariance.
Nonparametric Belief Propagation
 IN CVPR
, 2002
"... In applications of graphical models arising in fields such as computer vision, the hidden variables of interest are most naturally specified by continuous, nonGaussian distributions. However, due to the limitations of existing inf#6F6F3 algorithms, it is of#]k necessary tof#3# coarse, ..."
Abstract

Cited by 279 (25 self)
 Add to MetaCart
In applications of graphical models arising in fields such as computer vision, the hidden variables of interest are most naturally specified by continuous, nonGaussian distributions. However, due to the limitations of existing inf#6F6F3 algorithms, it is of#]k necessary tof#3# coarse, discrete approximations to such models. In this paper, we develop a nonparametric belief propagation (NBP) algorithm, which uses stochastic methods to propagate kernelbased approximations to the true continuous messages. Each NBP message update is based on an efficient sampling procedure which can accomodate an extremely broad class of potentialf#l3]k[[z3 allowing easy adaptation to new application areas. We validate our method using comparisons to continuous BP for Gaussian networks, and an application to the stereo vision problem.
Algebraic Algorithms for Sampling from Conditional Distributions
 Annals of Statistics
, 1995
"... We construct Markov chain algorithms for sampling from discrete exponential families conditional on a sufficient statistic. Examples include generating tables with fixed row and column sums and higher dimensional analogs. The algorithms involve finding bases for associated polynomial ideals and so a ..."
Abstract

Cited by 268 (20 self)
 Add to MetaCart
We construct Markov chain algorithms for sampling from discrete exponential families conditional on a sufficient statistic. Examples include generating tables with fixed row and column sums and higher dimensional analogs. The algorithms involve finding bases for associated polynomial ideals and so an excursion into computational algebraic geometry.