Results 1  10
of
5,294
An introduction to variational methods for graphical models
 TO APPEAR: M. I. JORDAN, (ED.), LEARNING IN GRAPHICAL MODELS
"... ..."
Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers
, 2010
"... ..."
(Show Context)
Learning in graphical models
 STATISTICAL SCIENCE
, 2004
"... Statistical applications in fields such as bioinformatics, information retrieval, speech processing, image processing and communications often involve largescale models in which thousands or millions of random variables are linked in complex ways. Graphical models provide a general methodology for ..."
Abstract

Cited by 800 (10 self)
 Add to MetaCart
(Show Context)
Statistical applications in fields such as bioinformatics, information retrieval, speech processing, image processing and communications often involve largescale models in which thousands or millions of random variables are linked in complex ways. Graphical models provide a general methodology for approaching these problems, and indeed many of the models developed by researchers in these applied fields are instances of the general graphical model formalism. We review some of the basic ideas underlying graphical models, including the algorithmic ideas that allow graphical models to be deployed in largescale data analysis problems. We also present examples of graphical models in bioinformatics, errorcontrol coding and language processing.
Distance Metric Learning, With Application To Clustering With SideInformation
 ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 15
, 2003
"... Many algorithms rely critically on being given a good metric over their inputs. For instance, data can often be clustered in many "plausible" ways, and if a clustering algorithm such as Kmeans initially fails to find one that is meaningful to a user, the only recourse may be for the us ..."
Abstract

Cited by 799 (14 self)
 Add to MetaCart
(Show Context)
Many algorithms rely critically on being given a good metric over their inputs. For instance, data can often be clustered in many "plausible" ways, and if a clustering algorithm such as Kmeans initially fails to find one that is meaningful to a user, the only recourse may be for the user to manually tweak the metric until sufficiently good clusters are found. For these and other applications requiring good metrics, it is desirable that we provide a more systematic way for users to indicate what they consider "similar." For instance, we may ask them to provide examples. In this paper, we present an algorithm that, given examples of similar (and, if desired, dissimilar) pairs of points in R , learns a distance metric over R that respects these relationships. Our method is based on posing metric learning as a convex optimization problem, which allows us to give efficient, localoptimafree algorithms. We also demonstrate empirically that the learned metrics can be used to significantly improve clustering performance.
Cooperative strategies and capacity theorems for relay networks
 IEEE Trans. Inform. Theory
, 2005
"... Abstract—Coding strategies that exploit node cooperation are developed for relay networks. Two basic schemes are studied: the relays decodeandforward the source message to the destination, or they compressandforward their channel outputs to the destination. The decodeandforward scheme is a va ..."
Abstract

Cited by 733 (19 self)
 Add to MetaCart
Abstract—Coding strategies that exploit node cooperation are developed for relay networks. Two basic schemes are studied: the relays decodeandforward the source message to the destination, or they compressandforward their channel outputs to the destination. The decodeandforward scheme is a variant of multihopping, but in addition to having the relays successively decode the message, the transmitters cooperate and each receiver uses several or all of its past channel output blocks to decode. For the compressandforward scheme, the relays take advantage of the statistical dependence between their channel outputs and the destination’s channel output. The strategies are applied to wireless channels, and it is shown that decodeandforward achieves the ergodic capacity with phase fading if phase information is available only locally, and if the relays are near the source node. The ergodic capacity coincides with the rate of a distributed antenna array with full cooperation even though the transmitting antennas are not colocated. The capacity results generalize broadly, including to multiantenna transmission with Rayleigh fading, singlebounce fading, certain quasistatic fading problems, cases where partial channel knowledge is available at the transmitters, and cases where local user cooperation is permitted. The results further extend to multisource and multidestination networks such as multiaccess and broadcast relay channels. Index Terms—Antenna arrays, capacity, coding, multiuser channels, relay channels. I.
A Singular Value Thresholding Algorithm for Matrix Completion
, 2008
"... This paper introduces a novel algorithm to approximate the matrix with minimum nuclear norm among all matrices obeying a set of convex constraints. This problem may be understood as the convex relaxation of a rank minimization problem, and arises in many important applications as in the task of reco ..."
Abstract

Cited by 539 (20 self)
 Add to MetaCart
This paper introduces a novel algorithm to approximate the matrix with minimum nuclear norm among all matrices obeying a set of convex constraints. This problem may be understood as the convex relaxation of a rank minimization problem, and arises in many important applications as in the task of recovering a large matrix from a small subset of its entries (the famous Netflix problem). Offtheshelf algorithms such as interior point methods are not directly amenable to large problems of this kind with over a million unknown entries. This paper develops a simple firstorder and easytoimplement algorithm that is extremely efficient at addressing problems in which the optimal solution has low rank. The algorithm is iterative and produces a sequence of matrices {X k, Y k} and at each step, mainly performs a softthresholding operation on the singular values of the matrix Y k. There are two remarkable features making this attractive for lowrank matrix completion problems. The first is that the softthresholding operation is applied to a sparse matrix; the second is that the rank of the iterates {X k} is empirically nondecreasing. Both these facts allow the algorithm to make use of very minimal storage space and keep the computational cost of each iteration low. On
Pegasos: Primal Estimated subgradient solver for SVM
"... We describe and analyze a simple and effective stochastic subgradient descent algorithm for solving the optimization problem cast by Support Vector Machines (SVM). We prove that the number of iterations required to obtain a solution of accuracy ɛ is Õ(1/ɛ), where each iteration operates on a singl ..."
Abstract

Cited by 531 (21 self)
 Add to MetaCart
We describe and analyze a simple and effective stochastic subgradient descent algorithm for solving the optimization problem cast by Support Vector Machines (SVM). We prove that the number of iterations required to obtain a solution of accuracy ɛ is Õ(1/ɛ), where each iteration operates on a single training example. In contrast, previous analyses of stochastic gradient descent methods for SVMs require Ω(1/ɛ2) iterations. As in previously devised SVM solvers, the number of iterations also scales linearly with 1/λ, where λ is the regularization parameter of SVM. For a linear kernel, the total runtime of our method is Õ(d/(λɛ)), where d is a bound on the number of nonzero features in each example. Since the runtime does not depend directly on the size of the training set, the resulting algorithm is especially suited for learning from large datasets. Our approach also extends to nonlinear kernels while working solely on the primal objective function, though in this case the runtime does depend linearly on the training set size. Our algorithm is particularly well suited for large text classification problems, where we demonstrate an orderofmagnitude speedup over previous SVM learning methods.
Gradient projection for sparse reconstruction: Application to compressed sensing and other inverse problems
 IEEE Journal of Selected Topics in Signal Processing
, 2007
"... Abstract—Many problems in signal processing and statistical inference involve finding sparse solutions to underdetermined, or illconditioned, linear systems of equations. A standard approach consists in minimizing an objective function which includes a quadratic (squared ℓ2) error term combined wi ..."
Abstract

Cited by 524 (15 self)
 Add to MetaCart
Abstract—Many problems in signal processing and statistical inference involve finding sparse solutions to underdetermined, or illconditioned, linear systems of equations. A standard approach consists in minimizing an objective function which includes a quadratic (squared ℓ2) error term combined with a sparsenessinducing (ℓ1) regularization term.Basis pursuit, the least absolute shrinkage and selection operator (LASSO), waveletbased deconvolution, and compressed sensing are a few wellknown examples of this approach. This paper proposes gradient projection (GP) algorithms for the boundconstrained quadratic programming (BCQP) formulation of these problems. We test variants of this approach that select the line search parameters in different ways, including techniques based on the BarzilaiBorwein method. Computational experiments show that these GP approaches perform well in a wide range of applications, often being significantly faster (in terms of computation time) than competing methods. Although the performance of GP methods tends to degrade as the regularization term is deemphasized, we show how they can be embedded in a continuation scheme to recover their efficient practical performance. A. Background I.