Results 1  10
of
151
Consistency of the group lasso and multiple kernel learning
 JOURNAL OF MACHINE LEARNING RESEARCH
, 2007
"... We consider the leastsquare regression problem with regularization by a block 1norm, i.e., a sum of Euclidean norms over spaces of dimensions larger than one. This problem, referred to as the group Lasso, extends the usual regularization by the 1norm where all spaces have dimension one, where it ..."
Abstract

Cited by 274 (33 self)
 Add to MetaCart
We consider the leastsquare regression problem with regularization by a block 1norm, i.e., a sum of Euclidean norms over spaces of dimensions larger than one. This problem, referred to as the group Lasso, extends the usual regularization by the 1norm where all spaces have dimension one, where it is commonly referred to as the Lasso. In this paper, we study the asymptotic model consistency of the group Lasso. We derive necessary and sufficient conditions for the consistency of group Lasso under practical assumptions, such as model misspecification. When the linear predictors and Euclidean norms are replaced by functions and reproducing kernel Hilbert norms, the problem is usually referred to as multiple kernel learning and is commonly used for learning from heterogeneous data sources and for non linear variable selection. Using tools from functional analysis, and in particular covariance operators, we extend the consistency results to this infinite dimensional case and also propose an adaptive scheme to obtain a consistent model estimate, even when the necessary condition required for the non adaptive scheme is not satisfied.
Structured variable selection with sparsityinducing norms
, 2011
"... We consider the empirical risk minimization problem for linear supervised learning, with regularization by structured sparsityinducing norms. These are defined as sums of Euclidean norms on certain subsets of variables, extending the usual ℓ1norm and the group ℓ1norm by allowing the subsets to ov ..."
Abstract

Cited by 187 (27 self)
 Add to MetaCart
We consider the empirical risk minimization problem for linear supervised learning, with regularization by structured sparsityinducing norms. These are defined as sums of Euclidean norms on certain subsets of variables, extending the usual ℓ1norm and the group ℓ1norm by allowing the subsets to overlap. This leads to a specific set of allowed nonzero patterns for the solutions of such problems. We first explore the relationship between the groups defining the norm and the resulting nonzero patterns, providing both forward and backward algorithms to go back and forth from groups to patterns. This allows the design of norms adapted to specific prior knowledge expressed in terms of nonzero patterns. We also present an efficient active set algorithm, and analyze the consistency of variable selection for leastsquares linear regression in low and highdimensional settings.
Exploring large feature spaces with hierarchical MKL
, 2008
"... For supervised and unsupervised learning, positive definite kernels allow to use large and potentially infinite dimensional feature spaces with a computational cost that only depends on the number of observations. This is usually done through the penalization of predictor functions by Euclidean or H ..."
Abstract

Cited by 110 (22 self)
 Add to MetaCart
(Show Context)
For supervised and unsupervised learning, positive definite kernels allow to use large and potentially infinite dimensional feature spaces with a computational cost that only depends on the number of observations. This is usually done through the penalization of predictor functions by Euclidean or Hilbertian norms. In this paper, we explore penalizing by sparsityinducing norms such as the ℓ 1norm or the block ℓ 1norm. We assume that the kernel decomposes into a large sum of individual basis kernels which can be embedded in a directed acyclic graph; we show that it is then possible to perform kernel selection through a hierarchical multiple kernel learning framework, in polynomial time in the number of selected kernels. This framework is naturally applied to non linear variable selection; our extensive simulations on synthetic datasets and datasets from the UCI repository show that efficiently exploring the large feature space through sparsityinducing norms leads to stateoftheart predictive performance. 1
More efficiency in multiple kernel learning
 In ICML
, 2007
"... An efficient and general multiple kernel learning (MKL) algorithm has been recently proposed by Sonnenburg et al. (2006). This approach has opened new perspectives since it makes the MKL approach tractable for largescale problems, by iteratively using existing support vector machine code. However, i ..."
Abstract

Cited by 90 (5 self)
 Add to MetaCart
An efficient and general multiple kernel learning (MKL) algorithm has been recently proposed by Sonnenburg et al. (2006). This approach has opened new perspectives since it makes the MKL approach tractable for largescale problems, by iteratively using existing support vector machine code. However, it turns out that this iterative algorithm needs several iterations before converging towards a reasonable solution. In this paper, we address the MKL problem through an adaptive 2norm regularization formulation. Weights on each kernel matrix are included in the standard SVM empirical risk minimization problem with a ℓ1 constraint to encourage sparsity. We propose an algorithm for solving this problem and provide an new insight on MKL algorithms based on block 1norm regularization by showing that the two approaches are equivalent. Experimental results show that the resulting algorithm converges rapidly and its efficiency compares favorably to other MKL algorithms. 1.
A spectral regularization framework for multitask structure learning
 In NIPS
, 2008
"... Learning the common structure shared by a set of supervised tasks is an important practical and theoretical problem. Knowledge of this structure may lead to better generalization performance on the tasks and may also facilitate learning new tasks. We propose a framework for solving this problem, whi ..."
Abstract

Cited by 76 (9 self)
 Add to MetaCart
(Show Context)
Learning the common structure shared by a set of supervised tasks is an important practical and theoretical problem. Knowledge of this structure may lead to better generalization performance on the tasks and may also facilitate learning new tasks. We propose a framework for solving this problem, which is based on regularization with spectral functions of matrices. This class of regularization problems exhibits appealing computational properties and can be optimized efficiently by an alternating minimization algorithm. In addition, we provide a necessary and sufficient condition for convexity of the regularizer. We analyze concrete examples of the framework, which are equivalent to regularization with Lp matrix norms. Experiments on two real data sets indicate that the algorithm scales well with the number of tasks and improves on state of the art statistical performance. 1
Structured Sparse Principal Component Analysis
, 2009
"... We present an extension of sparse PCA, or sparse dictionary learning, where the sparsity patterns of all dictionary elements are structured and constrained to belong to a prespecified set of shapes. This structured sparse PCA is based on a structured regularization recently introduced by [1]. While ..."
Abstract

Cited by 70 (14 self)
 Add to MetaCart
(Show Context)
We present an extension of sparse PCA, or sparse dictionary learning, where the sparsity patterns of all dictionary elements are structured and constrained to belong to a prespecified set of shapes. This structured sparse PCA is based on a structured regularization recently introduced by [1]. While classical sparse priors only deal with cardinality, the regularization we use encodes higherorder information about the data. We propose an efficient and simple optimization procedure to solve this problem. Experiments with two practical tasks, face recognition and the study of the dynamics of a protein complex, demonstrate the benefits of the proposed structured approach over unstructured approaches. 1
Learning convex combinations of continuously parameterized basic kernels
 In COLT
, 2005
"... Abstract. We study the problem of learning a kernel which minimizes a regularization error functional such as that used in regularization networks or support vector machines. We consider this problem when the kernel is in the convex hull of basic kernels, for example, Gaussian kernels which are cont ..."
Abstract

Cited by 62 (10 self)
 Add to MetaCart
(Show Context)
Abstract. We study the problem of learning a kernel which minimizes a regularization error functional such as that used in regularization networks or support vector machines. We consider this problem when the kernel is in the convex hull of basic kernels, for example, Gaussian kernels which are continuously parameterized by a compact set. We show that there always exists an optimal kernel which is the convex combination of at most m + 1 basic kernels, where m is the sample size, and provide a necessary and sufficient condition for a kernel to be optimal. The proof of our results is constructive and leads to a greedy algorithm for learning the kernel. We discuss the properties of this algorithm and present some preliminary numerical simulations. 1
A DCprogramming algorithm for kernel selection
 In ICML
, 2006
"... We address the problem of learning a kernel for a given supervised learning task. Our approach consists in searching within the convex hull of a prescribed set of basic kernels for one which minimizes a convex regularization functional. A unique feature of this approach compared to others in the lit ..."
Abstract

Cited by 61 (5 self)
 Add to MetaCart
(Show Context)
We address the problem of learning a kernel for a given supervised learning task. Our approach consists in searching within the convex hull of a prescribed set of basic kernels for one which minimizes a convex regularization functional. A unique feature of this approach compared to others in the literature is that the number of basic kernels can be infinite. We only require that they are continuously parameterized. For example, the basic kernels could be isotropic Gaussians with variance in a prescribed interval or even Gaussians parameterized by multiple continuous parameters. Our work builds upon a formulation involving a minimax optimization problem and a recently proposed greedy algorithm for learning the kernel. Although this optimization problem is not convex, it belongs to the larger class of DC (difference of convex functions) programs. Therefore, we apply recent results from DC optimization theory to create a new algorithm for learning the kernel. Our experimental results on benchmark data sets show that this algorithm outperforms a previously proposed method.
Prı́ncipe, “The kernel leastmeansquare algorithm
 IEEE Transactions on Signal Processing
, 2008
"... Abstract—The combination of the famed kernel trick and the leastmeansquare (LMS) algorithm provides an interesting samplebysample update for an adaptive filter in reproducing kernel Hilbert spaces (RKHS), which is named in this paper the KLMS. Unlike the accepted view in kernel methods, this pap ..."
Abstract

Cited by 50 (9 self)
 Add to MetaCart
(Show Context)
Abstract—The combination of the famed kernel trick and the leastmeansquare (LMS) algorithm provides an interesting samplebysample update for an adaptive filter in reproducing kernel Hilbert spaces (RKHS), which is named in this paper the KLMS. Unlike the accepted view in kernel methods, this paper shows that in the finite training data case, the KLMS algorithm is well posed in RKHS without the addition of an extra regularization term to penalize solution norms as was suggested by
Learning nonlinear combinations of kernels
 In NIPS
, 2009
"... This paper studies the general problem of learning kernels based on a polynomial combination of base kernels. We analyze this problem in the case of regression and the kernel ridge regression algorithm. We examine the corresponding learning kernel optimization problem, show how that minimax problem ..."
Abstract

Cited by 49 (2 self)
 Add to MetaCart
This paper studies the general problem of learning kernels based on a polynomial combination of base kernels. We analyze this problem in the case of regression and the kernel ridge regression algorithm. We examine the corresponding learning kernel optimization problem, show how that minimax problem can be reduced to a simpler minimization problem, and prove that the global solution of this problem always lies on the boundary. We give a projectionbased gradient descent algorithm for solving the optimization problem, shown empirically to converge in few iterations. Finally, we report the results of extensive experiments with this algorithm using several publicly available datasets demonstrating the effectiveness of our technique. 1