Scalable Matrix-valued Kernel Learning for High-dimensional Nonlinear Multivariate Regression and Granger Causality
Abstract

Cited by 8 (2 self)
We propose a general matrix-valued multiple kernel learning framework for high-dimensional nonlinear multivariate regression problems. This framework allows a broad class of mixed norm regularizers, including those that induce sparsity, to be imposed on a dictionary of vector-valued Reproducing Kernel Hilbert Spaces. We develop a highly scalable and eigendecomposition-free algorithm that orchestrates two inexact solvers for simultaneously learning both the input and output components of separable matrix-valued kernels. As a key application enabled by our framework, we show how high-dimensional causal inference tasks can be naturally cast as sparse function estimation problems, leading to novel nonlinear extensions of a class of Graphical Granger Causality techniques. Our algorithmic developments and extensive empirical studies are complemented by theoretical analyses in terms of Rademacher generalization bounds.
Conditional gradient algorithms for machine learning
, 2012
Abstract

Cited by 8 (1 self)
We consider penalized formulations of machine learning problems whose regularization penalty has conic structure. For several important learning problems, state-of-the-art optimization approaches such as proximal gradient algorithms are difficult to apply and computationally expensive, which prevents their use for large-scale learning. We present a conditional gradient algorithm with theoretical guarantees, and show promising experimental results on two large-scale real-world datasets.
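The conditional gradient (Frank-Wolfe) pattern the abstract refers to replaces the projection or proximal step with a linear minimization over the feasible set. A minimal sketch for a smooth least-squares objective over an l1 ball, an illustrative instance rather than the paper's algorithm (all names here are ours):

```python
import numpy as np

def frank_wolfe_l1(grad_f, x0, radius=1.0, iters=200):
    """Conditional gradient over the l1 ball {x : ||x||_1 <= radius}.

    Each iteration solves the *linear* subproblem min_{||s||_1 <= radius} <g, s>,
    whose minimizer is a signed, scaled coordinate vector, so no Euclidean
    projection is ever needed; convex combinations keep the iterate feasible.
    """
    x = x0.copy()
    for t in range(iters):
        g = grad_f(x)
        i = int(np.argmax(np.abs(g)))        # best coordinate direction
        s = np.zeros_like(x)
        s[i] = -radius * np.sign(g[i])       # vertex of the l1 ball
        gamma = 2.0 / (t + 2.0)              # standard diminishing step size
        x = (1.0 - gamma) * x + gamma * s
    return x

# usage: minimize ||Ax - b||^2 subject to ||x||_1 <= 1
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 10))
b = rng.standard_normal(20)
x = frank_wolfe_l1(lambda v: 2.0 * A.T @ (A @ v - b), np.zeros(10))
```

For conic-structured penalties (e.g., nuclear norm balls) the linear subproblem becomes a leading singular-vector computation, which is exactly why this family of methods scales where full projections do not.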
Collaborative Filtering with the Trace Norm: Learning, Bounding, and Transducing
Abstract

Cited by 7 (1 self)
Trace-norm regularization is a widely used and successful approach for collaborative filtering and matrix completion. However, its theoretical understanding is surprisingly weak, and despite previous attempts, no distribution-free, non-trivial learning guarantees are currently known. In this paper, we bridge this gap by providing such guarantees under mild assumptions which correspond to collaborative filtering as performed in practice. In fact, we claim that previous difficulties partially stemmed from a mismatch between the standard learning-theoretic modeling of collaborative filtering and its practical application. Our results also shed some light on the issue of collaborative filtering with bounded models, which constrain predictions to lie within a certain range. In particular, we provide experimental and theoretical evidence that such models lead to a modest yet significant improvement.
Regularization Paths with Guarantees for Convex Semidefinite Optimization
Abstract

Cited by 6 (4 self)
We devise a simple algorithm for computing an approximate solution path for parameterized semidefinite convex optimization problems that is guaranteed to be ε-close to the exact solution path. As a consequence, we can compute the entire regularization path for many regularized matrix completion and factorization approaches, as well as nuclear norm or weighted nuclear norm regularized convex optimization problems. This also includes robust PCA and variants of sparse PCA. On the theoretical side, we show that the approximate solution path has low complexity. This implies that the whole solution path can be computed efficiently. Our experiments demonstrate the practicality of the approach for large matrix completion problems.
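The underlying pattern of such ε-approximate path algorithms is lazy re-solving: keep reusing the last computed solution while a computable optimality certificate stays below ε, and re-solve only when it is violated. A toy sketch on ridge regression, where both the solver and the exact gap are closed-form (all names are illustrative; this shows the pattern, not the paper's semidefinite algorithm, and in real applications the gap would be a cheap duality bound rather than a second solve):

```python
import numpy as np

def approx_path(solve, gap, lambdas, eps):
    """eps-approximate regularization path by lazy re-solving: the previous
    solution is reused at the next parameter value as long as its optimality
    gap stays below eps."""
    path, x, solves = [], None, 0
    for lam in lambdas:
        if x is None or gap(x, lam) > eps:
            x = solve(lam)
            solves += 1
        path.append((lam, x))
    return path, solves

# toy instance: ridge regression f(x, lam) = ||Ax - b||^2 + lam ||x||^2
A = np.eye(3)
b = np.ones(3)

def solve(lam):
    return np.linalg.solve(A.T @ A + lam * np.eye(3), A.T @ b)

def f(x, lam):
    return np.sum((A @ x - b) ** 2) + lam * np.sum(x ** 2)

def gap(x, lam):          # exact suboptimality (cheap only in this toy)
    return f(x, lam) - f(solve(lam), lam)

path, solves = approx_path(solve, gap, np.linspace(1.0, 2.0, 50), eps=1e-2)
```

Every point on the returned path is ε-optimal by construction, yet only a handful of actual solves are performed; the "low complexity of the approximate path" result in the abstract bounds exactly this number of re-solves.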
Stochastic Gradient Descent with Only One Projection
Abstract

Cited by 6 (5 self)
Although many variants of stochastic gradient descent have been proposed for large-scale convex optimization, most of them require projecting the solution at each iteration to ensure that it stays within the feasible domain. For complex domains (e.g., the positive semidefinite cone), the projection step can be computationally expensive, making stochastic gradient descent unattractive for large-scale optimization problems. We address this limitation by developing novel stochastic optimization algorithms that do not need intermediate projections; instead, only one projection at the last iteration is needed to obtain a feasible solution in the given domain. Our theoretical analysis shows that, with high probability, the proposed algorithms achieve an O(1/√T) convergence rate for general convex optimization, and an O(ln T / T) rate for strongly convex optimization under mild conditions on the domain and the objective function.
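The "one projection" pattern can be illustrated in a heavily simplified form: run unconstrained SGD, average the iterates, and project only the final average onto the feasible set. The paper's actual algorithms are more careful about controlling how far intermediate iterates drift from the domain; this sketch, with names of our choosing, only shows the end-projection structure on an l2 ball:

```python
import numpy as np

def sgd_one_projection(stoch_grad, project, x0, step=0.01, iters=500):
    """SGD without per-iteration projections: iterate freely, average the
    trajectory, and apply a single projection to the averaged point."""
    x = x0.copy()
    avg = np.zeros_like(x0)
    for _ in range(iters):
        x = x - step * stoch_grad(x)
        avg += x
    return project(avg / iters)          # the only projection

# usage: minimize E||x - (c + noise)||^2 over the unit l2 ball
rng = np.random.default_rng(0)
c = np.array([3.0, 0.0])
sg = lambda x: 2.0 * (x - c + 0.1 * rng.standard_normal(2))
proj_ball = lambda v: v / max(1.0, np.linalg.norm(v))
x_hat = sgd_one_projection(sg, proj_ball, np.zeros(2))
```

The payoff is visible when `project` is expensive (e.g., an eigendecomposition for the PSD cone): it is now paid once instead of T times.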
Forward Basis Selection for Sparse Approximation over Dictionary
Abstract

Cited by 5 (1 self)
Recently, the forward greedy selection method has been successfully applied to approximately solve sparse learning problems characterized by a tradeoff between sparsity and accuracy. In this paper, we generalize this method to the setting of sparse approximation over a pre-fixed dictionary. A fully corrective forward selection algorithm is proposed, along with a convergence analysis. The per-iteration computational overhead of the proposed algorithm is dominated by a subproblem of linear optimization over the dictionary and a subproblem to optimally adjust the aggregation weights. In several applications the former is cheaper than a Euclidean projection, while the latter is typically an unconstrained optimization problem that is relatively easy to solve. Furthermore, we extend the proposed algorithm to the setting of non-negative/convex sparse approximation over a dictionary. Applications of our algorithms to several concrete learning problems are explored, with efficiency validated on benchmark datasets.
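For squared loss the two per-iteration subproblems look as follows: a linear step that scans the dictionary for the atom most correlated with the residual, and a corrective step that refits the weights of all selected atoms (in this special case the scheme essentially coincides with orthogonal matching pursuit). A hedged sketch with names of our choosing:

```python
import numpy as np

def fully_corrective_fs(D, y, k):
    """Fully corrective forward selection of up to k atoms (columns of D).

    Per iteration: (1) linear optimization over the dictionary -- pick the
    atom most correlated with the residual; (2) full correction -- re-optimize
    the weights of *all* selected atoms (here a small least-squares solve)."""
    support, w = [], None
    r = y.copy()
    for _ in range(k):
        j = int(np.argmax(np.abs(D.T @ r)))        # linear subproblem
        if j not in support:
            support.append(j)
        w, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)  # correction
        r = y - D[:, support] @ w
    return support, w

# usage on a random normalized dictionary
rng = np.random.default_rng(1)
D = rng.standard_normal((20, 30))
D /= np.linalg.norm(D, axis=0)
y = rng.standard_normal(20)
S, w = fully_corrective_fs(D, y, k=5)
resid = np.linalg.norm(y - D[:, S] @ w)
```

Because the corrective step refits over the whole support, the residual is monotonically non-increasing in the number of selected atoms.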
Riemannian Pursuit for Big Matrix Recovery
Abstract

Cited by 5 (1 self)
Low-rank matrix recovery is a fundamental task in many real-world applications. The performance of existing methods, however, deteriorates significantly when they are applied to ill-conditioned or large-scale matrices. In this paper, we therefore propose an efficient method, called Riemannian Pursuit (RP), that aims to address these two problems simultaneously. Our method consists of a sequence of fixed-rank optimization problems. Each subproblem, solved by a nonlinear Riemannian conjugate gradient method, aims to correct the solution in the most important subspace of increasing size. Theoretically, RP converges linearly under mild conditions, and experimental results show that it substantially outperforms existing methods when applied to large-scale and ill-conditioned matrices.
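The rank-increasing structure can be illustrated on matrix completion with a much simpler stand-in for the Riemannian solver: solve fixed-rank subproblems of growing rank, warm-starting each from the previous factors. Here each fixed-rank problem is handled by plain alternating least squares; all names are ours, and this is not the paper's RP algorithm:

```python
import numpy as np

def rank_pursuit(M, mask, max_rank, inner=30, seed=0):
    """Sequence of fixed-rank completions of increasing rank r = 1..max_rank.

    Each fixed-rank subproblem is solved by alternating least squares over
    the observed entries (mask is a boolean matrix); the factors are
    warm-started from the previous rank by appending one small random column."""
    m, n = M.shape
    rng = np.random.default_rng(seed)
    U = 0.1 * rng.standard_normal((m, 1))
    V = 0.1 * rng.standard_normal((n, 1))
    for r in range(1, max_rank + 1):
        for _ in range(inner):
            for i in range(m):               # row-wise U update
                obs = mask[i]
                U[i] = np.linalg.lstsq(V[obs], M[i, obs], rcond=None)[0]
            for j in range(n):               # column-wise V update
                obs = mask[:, j]
                V[j] = np.linalg.lstsq(U[obs], M[obs, j], rcond=None)[0]
        if r < max_rank:                     # grow the rank, keep the factors
            U = np.hstack([U, 0.01 * rng.standard_normal((m, 1))])
            V = np.hstack([V, 0.01 * rng.standard_normal((n, 1))])
    return U @ V.T

# usage: recover a rank-2 matrix from ~60% of its entries
rng = np.random.default_rng(3)
M_true = rng.standard_normal((15, 2)) @ rng.standard_normal((2, 15))
mask = rng.random((15, 15)) < 0.6
X = rank_pursuit(M_true * mask, mask, max_rank=2)
```

Growing the rank one step at a time, with each level warm-started from the last, is the part this sketch shares with RP; the Riemannian conjugate gradient machinery it replaces is what makes RP robust to ill-conditioning.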
Spectral Regularization for Max-Margin Sequence Tagging
Abstract

Cited by 4 (3 self)
We frame max-margin learning of latent variable structured prediction models as a convex optimization problem, making use of scoring functions computed by input-output observable operator models. This learning problem can be expressed as an optimization problem involving a low-rank Hankel matrix that represents the input-output operator model. The direct outcome of our work is a new spectral regularization method for max-margin structured prediction. Our experiments confirm that the proposed regularization framework leads to an effective way of controlling the capacity of structured prediction models.
Optimizing over the growing spectrahedron
 ESA 2012: 20th Annual European Symposium on Algorithms
, 2012
Abstract

Cited by 3 (1 self)
We devise a framework for computing an approximate solution path for an important class of parameterized semidefinite problems that is guaranteed to be ε-close to the exact solution path. The problem of computing the entire regularization path for matrix factorization problems such as maximum-margin matrix factorization fits into this framework, as do many other nuclear norm regularized convex optimization problems from machine learning. We show that the combinatorial complexity of the approximate path is independent of the size of the matrix. Furthermore, the whole solution path can be computed in near linear time in the size of the input matrix. The framework employs an approximate semidefinite program solver for a fixed parameter value; here we use an algorithm recently introduced by Hazan. We present a refined analysis of Hazan's algorithm that yields improved running time bounds for a single solution, as well as for the whole solution path, as a function of the approximation guarantee.
Learning output kernels for multitask problems
 Neurocomputing
Abstract

Cited by 3 (1 self)
Simultaneously solving multiple related learning tasks is beneficial under a variety of circumstances, but the prior knowledge necessary to correctly model task relationships is rarely available in practice. In this paper, we develop a novel kernel-based multi-task learning technique that automatically reveals structural inter-task relationships. Building on the framework of output kernel learning (OKL), we introduce a method that jointly learns multiple functions and a low-rank multi-task kernel by solving a non-convex regularization problem. Optimization is carried out via a block coordinate descent strategy, where each subproblem is solved using suitable conjugate gradient (CG) type iterative methods for linear operator equations. The effectiveness of the proposed approach is demonstrated on pharmacological and collaborative filtering data.
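The block coordinate descent pattern can be sketched on a simpler surrogate than the OKL objective: a low-rank multi-task ridge model Y ≈ X A B, alternating exact solves over the two blocks A and B. The paper solves its subproblems with CG-type iterations on operator equations; here each block has a small closed-form solve, and all names are illustrative:

```python
import numpy as np

def block_descent(X, Y, rank, lam=0.1, sweeps=50, seed=0):
    """Alternating minimization of ||Y - X A B||_F^2 + lam(||A||_F^2 + ||B||_F^2)
    over A (d x rank) and B (rank x tasks): each block subproblem is a ridge
    regression, so the objective decreases monotonically sweep by sweep."""
    d, tasks = X.shape[1], Y.shape[1]
    rng = np.random.default_rng(seed)
    A = 0.1 * rng.standard_normal((d, rank))
    B = 0.1 * rng.standard_normal((rank, tasks))
    for _ in range(sweeps):
        XA = X @ A                           # B-step: ridge with features X A
        B = np.linalg.solve(XA.T @ XA + lam * np.eye(rank), XA.T @ Y)
        # A-step: stationarity gives  X^T X A (B B^T) + lam A = X^T Y B^T,
        # solved via the Kronecker/vec identity (fine for small d * rank)
        M = np.kron(B @ B.T, X.T @ X) + lam * np.eye(d * rank)
        a = np.linalg.solve(M, (X.T @ Y @ B.T).reshape(-1, order="F"))
        A = a.reshape(d, rank, order="F")
    return A, B

# usage on synthetic rank-2 multi-task data
rng = np.random.default_rng(2)
X = rng.standard_normal((40, 8))
W = rng.standard_normal((8, 2)) @ rng.standard_normal((2, 6))
Y = X @ W + 0.1 * rng.standard_normal((40, 6))
A, B = block_descent(X, Y, rank=2)
```

The rank-2 factor B B^T plays the role of a (low-rank) task-similarity matrix learned jointly with the predictors, which is the structural idea the abstract describes; for large problems the two dense solves would be replaced by CG iterations, as in the paper.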