Results 1–10 of 56
Optimizing costly functions with simple constraints: A limited-memory projected quasi-Newton algorithm
Proc. of Conf. on Artificial Intelligence and Statistics, 2009
Abstract

Cited by 53 (9 self)
An optimization algorithm for minimizing a smooth function over a convex set is described. Each iteration of the method computes a descent direction by minimizing, over the original constraints, a diagonal plus low-rank quadratic approximation to the function. The quadratic approximation is constructed using a limited-memory quasi-Newton update. The method is suitable for large-scale problems where evaluation of the function is substantially more expensive than projection onto the constraint set. Numerical experiments on one-norm regularized test problems indicate that the proposed method is competitive with state-of-the-art methods such as bound-constrained L-BFGS and orthant-wise descent. We further show that the method generalizes to a wide class of problems, and substantially improves on state-of-the-art methods for problems such as learning the structure of Gaussian graphical models and Markov random fields.
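The setting the abstract describes, where function evaluations dwarf the cost of projecting onto the constraints, is the classic territory of projected first-order methods. As a minimal sketch of that idea (not the paper's quasi-Newton scaling), here is a plain projected gradient loop over box constraints; the names `project_box` and `projected_gradient` are illustrative, not from the paper:

```python
import numpy as np

def project_box(x, lo, hi):
    """Euclidean projection onto the box {x : lo <= x <= hi}."""
    return np.clip(x, lo, hi)

def projected_gradient(f_grad, x0, lo, hi, step=0.1, iters=200):
    """Minimize a smooth function over a box by alternating a gradient
    step with projection back onto the feasible set. The paper replaces
    the plain gradient step with a limited-memory quasi-Newton model."""
    x = x0.copy()
    for _ in range(iters):
        x = project_box(x - step * f_grad(x), lo, hi)
    return x

# Example: minimize ||x - c||^2 over [-1, 1]^3 with c partly outside the box.
c = np.array([2.0, -3.0, 0.5])
grad = lambda x: 2.0 * (x - c)
x_star = projected_gradient(grad, np.zeros(3), lo=-1.0, hi=1.0)
# x_star approaches the projection of c onto the box: [1, -1, 0.5]
```

Because each iteration needs one gradient and one cheap projection, the loop matches the cost profile the abstract assumes: expensive oracle, trivial feasibility step.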
Automatic image annotation using group sparsity
In CVPR, 2010
Abstract

Cited by 42 (13 self)
Automatically assigning relevant text keywords to images is an important problem. Many algorithms have been proposed in the past decade and achieved good performance. Efforts have focused upon model representations of keywords, but properties of features have not been well investigated. In most cases, a group of features is preselected, yet important feature properties are not well used to select features. In this paper, we introduce a regularization-based feature selection algorithm to leverage both the sparsity and clustering properties of features, and incorporate it into the image annotation task. A novel approach is also proposed to iteratively obtain similar and dissimilar pairs from both the keyword similarity and the relevance feedback, so that keyword similarity is modeled in the annotation framework. Numerous experiments are designed to compare the performance of features, feature combinations, and regularization-based feature selection methods applied to the image annotation task, giving insight into the properties of features in this task. The experimental results demonstrate that the group-sparsity-based method is more accurate and stable than the others.
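The group-sparsity regularization the abstract relies on selects or discards whole blocks of features together. A standard building block for optimizing such penalties is the proximal operator of the group lasso, i.e. block soft-thresholding; the sketch below (names hypothetical, not from the paper) shows that operator in isolation:

```python
import numpy as np

def group_soft_threshold(w, groups, lam):
    """Proximal operator of the group-lasso penalty lam * sum_g ||w_g||_2.
    Each group's coefficient block is shrunk toward zero jointly, so an
    entire feature group is either kept (rescaled) or zeroed out."""
    out = w.copy()
    for g in groups:
        norm = np.linalg.norm(w[g])
        out[g] = 0.0 if norm <= lam else (1.0 - lam / norm) * w[g]
    return out

# Two feature groups: one strong, one weak.
w = np.array([3.0, 4.0, 0.1, -0.1])
groups = [np.array([0, 1]), np.array([2, 3])]
shrunk = group_soft_threshold(w, groups, lam=1.0)
# group 0 (norm 5) survives, scaled by 1 - 1/5; group 1 (norm ~0.14) is zeroed
```

This all-or-nothing behavior per group is exactly what lets a group-sparse penalty act as a feature-group selector in tasks like annotation.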
Convex structure learning in log-linear models: Beyond pairwise potentials
In Proceedings of International Workshop on Artificial Intelligence and Statistics, 2010
Abstract

Cited by 28 (2 self)
Previous work has examined structure learning in log-linear models with ℓ1-regularization, largely focusing on the case of pairwise potentials. In this work we consider the case of models with potentials of arbitrary order, but that satisfy a hierarchical constraint. We enforce the hierarchical constraint using group ℓ1-regularization with overlapping groups. An active-set method that enforces hierarchical inclusion allows us to tractably consider the exponential number of higher-order potentials. We use a spectral projected gradient method as a subroutine for solving the overlapping group ℓ1-regularization problem, and make use of a sparse version of Dykstra's algorithm to compute the projection. Our experiments indicate that this model gives equal or better test-set likelihood compared to previous models.
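The projection step in this abstract uses a sparse variant of Dykstra's algorithm. As a hedged illustration of the generic (non-sparse) version: Dykstra's method projects onto an intersection of convex sets using only projections onto each set, and its correction terms make the limit the true nearest point rather than merely a feasible one. The sets and names below are a toy example, not the paper's overlapping-group geometry:

```python
import numpy as np

def dykstra(x0, proj_a, proj_b, iters=100):
    """Dykstra's alternating-projection algorithm: computes the Euclidean
    projection of x0 onto the intersection of two convex sets, given the
    projection onto each set individually. The correction vectors p and q
    distinguish it from plain alternating projections."""
    x = x0.copy()
    p = np.zeros_like(x0)
    q = np.zeros_like(x0)
    for _ in range(iters):
        y = proj_a(x + p)
        p = x + p - y
        x = proj_b(y + q)
        q = y + q - x
    return x

# Toy sets: the nonnegative orthant and the half-space {x : sum(x) <= 1}.
def proj_orthant(x):
    return np.maximum(x, 0.0)

def proj_halfspace(x):
    s = x.sum()
    return x if s <= 1.0 else x - (s - 1.0) / len(x)

x_proj = dykstra(np.array([1.0, 1.0]), proj_orthant, proj_halfspace)
# converges to [0.5, 0.5], the nearest point of the intersection to (1, 1)
```

The "sparse version" mentioned in the abstract exploits that most groups are inactive at the solution, so most per-set projections can be skipped; the skeleton of the iteration is the same.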
Discriminative structure learning of hierarchical representations for object detection
In CVPR, 2009
Abstract

Cited by 24 (1 self)
A variety of flexible models have been proposed to detect objects in challenging real-world scenes. Motivated by some of the most successful techniques, we propose a hierarchical multi-feature representation and automatically learn flexible hierarchical object models for a wide variety of object classes. To that end we not only rely on automatic selection of relevant individual features, but go beyond previous work by automatically selecting and modeling complex, long-range feature couplings within this model. To achieve this generality and flexibility our work combines structure learning in conditional random fields and discriminative parameter learning of classifiers using hierarchical features. We adopt an efficient gradient-based heuristic for model selection and carry it forward to discriminative, multi-dimensional selection of features and their couplings for improved detection performance. Experimentally we consistently outperform the currently leading method on all 20 classes of the PASCAL VOC 2007 challenge and achieve the best published results on 16 of 20 classes.
Automatic discovery of meaningful object parts with latent CRFs
In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2010
Abstract

Cited by 23 (0 self)
Object recognition is challenging due to high intra-class variability caused, e.g., by articulation, viewpoint changes, and partial occlusion. Successful methods need to strike a balance between being flexible enough to model such variation and discriminative enough to detect objects in cluttered, real-world scenes. Motivated by these challenges we propose a latent conditional random field (CRF) based on a flexible assembly of parts. By modeling part labels as hidden nodes and developing an EM algorithm for learning from class labels alone, this new approach enables the automatic discovery of semantically meaningful object part representations. To increase the flexibility and expressiveness of the model, we learn the pairwise structure of the underlying graphical model at the level of object part interactions. Efficient gradient-based techniques are used to estimate the structure of the domain of interest and carried forward to the multi-label or object part case. Our experiments illustrate the meaningfulness of the discovered parts and demonstrate state-of-the-art performance of the approach.
Multi-Task Learning of Gaussian Graphical Models
Abstract

Cited by 19 (1 self)
We present multi-task structure learning for Gaussian graphical models. We discuss uniqueness and boundedness of the optimal solution of the maximization problem. A block coordinate descent method leads to a provably convergent algorithm that generates a sequence of positive definite solutions. Thus, we reduce the original problem to a sequence of strictly convex ℓ∞-regularized quadratic minimization subproblems. We further show that each subproblem leads to the continuous quadratic knapsack problem, for which very efficient methods exist. Finally, we show promising results on a dataset that captures brain function of cocaine-addicted and control subjects under conditions of monetary reward.
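The continuous quadratic knapsack problem the abstract reduces to has a well-known one-dimensional structure: by the KKT conditions the solution is a clipped shift of the input, and the shift can be found by bisection on the Lagrange multiplier. A minimal sketch (function name and example are illustrative, not from the paper):

```python
import numpy as np

def quad_knapsack(a, lo, hi, b, tol=1e-10):
    """Continuous quadratic knapsack: minimize 0.5 * ||x - a||^2 subject to
    lo <= x <= hi and sum(x) == b. The KKT conditions give
    x_i = clip(a_i - t, lo_i, hi_i) for a scalar multiplier t, and
    sum_i x_i(t) is monotone nonincreasing in t, so t is found by bisection."""
    def total(t):
        return np.clip(a - t, lo, hi).sum()
    # Bracket: at t_lo every coordinate saturates at hi, at t_hi at lo.
    t_lo, t_hi = (a - hi).min(), (a - lo).max()
    while t_hi - t_lo > tol:
        t = 0.5 * (t_lo + t_hi)
        if total(t) > b:
            t_lo = t
        else:
            t_hi = t
    return np.clip(a - 0.5 * (t_lo + t_hi), lo, hi)

a = np.array([0.9, 0.5, -0.2])
x = quad_knapsack(a, lo=np.zeros(3), hi=np.ones(3), b=1.0)
# x = [0.7, 0.3, 0.0]: sums to 1, stays in the box, third entry clipped at 0
```

Each evaluation of `total` is O(n), so the bisection is cheap; specialized exact methods (median-finding over breakpoints) achieve the linear-time behavior the abstract alludes to.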
Learning Tree Conditional Random Fields
Abstract

Cited by 19 (1 self)
We examine maximum-spanning-tree-based methods for learning the structure of tree conditional random fields (CRFs) P(Y|X). We use edge weights which take advantage of local inputs X and thus scale to large problems. For a general class of edge weights, we give a negative learnability result. However, we demonstrate that two members of the class, local Conditional Mutual Information and Decomposable Conditional Influence, have reasonable theoretical bases and perform very well in practice. On synthetic data and a large-scale fMRI application, our methods outperform existing techniques.
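The structural core of such methods is a maximum spanning tree over per-edge scores. As a self-contained sketch, here is Kruskal's algorithm run on toy weights standing in for edge scores such as local conditional mutual information (the weights and names are illustrative, not from the paper):

```python
def max_spanning_tree(n, edges):
    """Kruskal's algorithm for a *maximum* spanning tree: sort edges by
    descending weight and greedily add any edge joining two different
    components. `edges` is a list of (weight, u, v) over nodes 0..n-1."""
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    tree = []
    for w, u, v in sorted(edges, reverse=True):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            tree.append((u, v))
    return tree

# Toy per-edge scores standing in for conditional mutual information values.
edges = [(0.9, 0, 1), (0.2, 0, 2), (0.8, 1, 2), (0.1, 2, 3), (0.7, 1, 3)]
tree = max_spanning_tree(4, edges)
# keeps (0,1), (1,2), (1,3): the heaviest acyclic subset of edges
```

Because only pairwise scores enter the algorithm, the tree search scales to large label sets; the learnability question in the abstract concerns which scoring functions make the recovered tree meaningful.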
An Efficient Projection for l1,∞ Regularization
Abstract

Cited by 18 (0 self)
In recent years the l1,∞ norm has been proposed for joint regularization. In essence, this type of regularization aims at extending the l1 framework for learning sparse models to a setting where the goal is to learn a set of jointly sparse models. In this paper we derive a simple and effective projected gradient method for optimization of l1,∞-regularized problems. The main challenge in developing such a method resides in being able to compute efficient projections onto the l1,∞ ball. We present an algorithm that works in O(n log n) time and O(n) memory, where n is the number of parameters. We test our algorithm on a multi-task image annotation problem. Our results show that l1,∞ leads to better performance than both l2 and l1 regularization, and that it is effective in discovering jointly sparse solutions.
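The paper's l1,∞ projection is more involved, but it belongs to the same family of sort-then-threshold projections as the standard O(n log n) Euclidean projection onto the l1 ball. As a hedged illustration of that sorting idea only (this is the simpler l1 case, not the paper's algorithm):

```python
import numpy as np

def project_l1_ball(v, radius=1.0):
    """Sort-based Euclidean projection onto {x : ||x||_1 <= radius}.
    Sort magnitudes, find the largest prefix whose uniform shrinkage
    keeps all entries positive, then soft-threshold by that amount.
    Runs in O(n log n) time and O(n) memory."""
    if np.abs(v).sum() <= radius:
        return v.copy()
    u = np.sort(np.abs(v))[::-1]          # magnitudes, descending
    css = np.cumsum(u)
    k = np.nonzero(u * np.arange(1, len(v) + 1) > css - radius)[0][-1]
    theta = (css[k] - radius) / (k + 1.0)
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

x = project_l1_ball(np.array([1.0, -2.0, 0.5]), radius=1.0)
# x = [0, -1, 0]: uniform soft-threshold of 1 lands exactly on the ball
```

In the l1,∞ case each "coordinate" is the maximum magnitude within a task group, so the sorting and thresholding run over group maxima instead of raw entries, which is what the abstract's O(n log n) algorithm handles.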
Learning Thin Junction Trees via Graph Cuts
Abstract

Cited by 15 (3 self)
Structure learning algorithms usually focus on the compactness of the learned model. However, for general compact models, both exact and approximate inference are still NP-hard. Therefore, focusing only on compactness leads to learning models that require approximate inference techniques, thus reducing their prediction quality. In this paper, we propose a method for learning an attractive class of models: bounded-treewidth junction trees, which permit both compact representation of probability distributions and efficient exact inference. Using a Bethe approximation of the likelihood, we transform the problem of finding a good junction tree separator into a minimum cut problem on a weighted graph. Using the graph-cut intuition, we present an efficient algorithm with theoretical guarantees for finding good separators, which we recursively apply to obtain a thin junction tree. Our extensive empirical evaluation demonstrates the benefit of applying exact inference using our models to answer queries. We also extend our technique to learning low-treewidth conditional random fields, and demonstrate significant improvements over state-of-the-art block-L1 regularization techniques.
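The reduction in this abstract ends in a minimum-cut subproblem. As a generic sketch of that subroutine only (the toy graph below is illustrative; the paper builds its weighted graph from Bethe-likelihood terms), here is min cut via Edmonds-Karp max flow, using the max-flow/min-cut duality:

```python
from collections import deque

def min_cut(capacity, s, t):
    """Edmonds-Karp max flow on an adjacency-matrix capacity graph.
    The returned value equals the minimum s-t cut weight; the s-side of
    the cut is the set of nodes still reachable in the residual graph."""
    n = len(capacity)
    flow = [[0] * n for _ in range(n)]

    def bfs():
        parent = [-1] * n
        parent[s] = s
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in range(n):
                if parent[v] == -1 and capacity[u][v] - flow[u][v] > 0:
                    parent[v] = u
                    queue.append(v)
        return parent

    value = 0
    while True:
        parent = bfs()
        if parent[t] == -1:
            break
        # bottleneck residual capacity along the augmenting path
        bottleneck, v = float("inf"), t
        while v != s:
            u = parent[v]
            bottleneck = min(bottleneck, capacity[u][v] - flow[u][v])
            v = u
        v = t
        while v != s:
            u = parent[v]
            flow[u][v] += bottleneck
            flow[v][u] -= bottleneck
            v = u
        value += bottleneck
    parent = bfs()
    side = [v for v in range(n) if parent[v] != -1]
    return value, side

cap = [[0, 3, 2, 0],
       [0, 0, 1, 2],
       [0, 0, 0, 3],
       [0, 0, 0, 0]]
value, side = min_cut(cap, 0, 3)
# value == 5; the source side of the cut is just node 0
```

In the paper's setting the two cut sides correspond to the variables on either side of a candidate junction-tree separator, so each recursive split is one call to a routine like this.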
Modeling Discrete Interventional Data using Directed Cyclic Graphical Models
Abstract

Cited by 13 (0 self)
We outline a representation for discrete multivariate distributions in terms of interventional potential functions that are globally normalized. This representation can be used to model the effects of interventions, and the independence properties encoded in this model can be represented as a directed graph that allows cycles. In addition to discussing inference and sampling with this representation, we give an exponential-family parametrization that allows parameter estimation to be stated as a convex optimization problem; we also give a convex relaxation of the task of simultaneous parameter and structure learning using group ℓ1-regularization. The model is evaluated on simulated data and intracellular flow cytometry data.