Results 1–10 of 12
Scaling SVM and Least Absolute Deviations via Exact Data Reduction
Abstract

Cited by 7 (2 self)
The support vector machine (SVM) is a widely used method for classification. Although many efforts have been devoted to developing efficient solvers, it remains challenging to apply SVM to large-scale problems. A nice property of SVM is that the non-support vectors have no effect on the resulting classifier. Motivated by this observation, we present fast and efficient screening rules to discard non-support vectors by analyzing the dual problem of SVM via variational inequalities (DVI). As a result, the number of data instances to be entered into the optimization can be substantially reduced. Some appealing features of our screening method are: (1) DVI is safe in the sense that the vectors discarded by DVI are guaranteed to be non-support vectors; (2) the data set needs to be scanned only once to run the screening, and its computational cost is negligible compared to that of solving the SVM problem; (3) DVI is independent of the solvers and can be integrated with any existing efficient solver. We also show that the DVI technique can be extended to detect non-support vectors in least absolute deviations regression (LAD). To the best of our knowledge, there are currently no screening methods for LAD. We have evaluated DVI on both synthetic and real data sets. Experiments indicate that DVI significantly outperforms the existing state-of-the-art screening rules for SVM, and it is very effective in discarding non-support vectors for LAD. The speedup gained by DVI rules can be up to two orders of magnitude.
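The core idea behind such safe rules can be sketched generically: if a ball is known to contain the optimal classifier, any instance whose margin exceeds 1 for every point in the ball is provably a non-support vector. A minimal illustrative sketch (not the paper's actual DVI test; the function name and the ball-region form are assumptions for illustration):

```python
import numpy as np

def safe_screen_svm(X, y, center, radius):
    """Region-based safe screening sketch for a hinge-loss SVM.

    If y_i * (w . x_i) > 1 for every w in the ball B(center, radius),
    the hinge loss is flat at instance i, so its dual variable is zero
    and the instance is provably a non-support vector.
    """
    margins = y * (X @ center)                   # margin at the ball center
    slack = radius * np.linalg.norm(X, axis=1)   # worst-case change over the ball
    return margins - slack > 1.0                 # True = safe to discard
```

Instances flagged `True` can be dropped before the solver runs; the conservatism of the test depends entirely on how tight the ball region is.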
Two-layer feature reduction for sparse-group lasso via decomposition of convex sets
 In Proc. NIPS
, 2014
Abstract

Cited by 3 (1 self)
Sparse-Group Lasso (SGL) has been shown to be a powerful regression technique for simultaneously discovering group and within-group sparse patterns by using a combination of the ℓ1 and ℓ2 norms. However, in large-scale applications, the complexity of the regularizers entails great computational challenges. In this paper, we propose a novel two-layer feature reduction method (TLFre) for SGL via a decomposition of its dual feasible set. The two-layer reduction is able to quickly identify the inactive groups and the inactive features, respectively, which are guaranteed to be absent from the sparse representation and can be removed from the optimization. Existing feature reduction methods are only applicable to sparse models with one sparsity-inducing regularizer. To the best of our knowledge, TLFre is the first method capable of dealing with multiple sparsity-inducing regularizers. Moreover, TLFre has a very low computational cost and can be integrated with any existing solver. We also develop a screening method, called DPC (decomposition of convex set), for the nonnegative Lasso problem. Experiments on both synthetic and real data sets show that TLFre and DPC improve the efficiency of SGL and nonnegative Lasso by several orders of magnitude.
Benchmarking solvers for TV-ℓ1 least-squares and logistic regression in brain imaging
, 2014
Simultaneous Safe Screening of Features and Samples in Doubly Sparse Modeling
Abstract
The problem of learning a sparse model is conceptually interpreted as the process of identifying active features/samples and then optimizing the model over them. Recently introduced safe screening allows us to identify a subset of the non-active features/samples. So far, safe screening has been studied individually, either for feature screening or for sample screening. In this paper, we introduce a new approach for safely screening features and samples simultaneously by alternately iterating feature and sample screening steps. A significant advantage of considering them simultaneously rather than individually is that they have a synergy effect: the results of the previous safe feature screening can be exploited to improve the next safe sample screening, and vice versa. We first theoretically investigate the synergy effect, and then illustrate the practical advantage through intensive numerical experiments on problems with large numbers of features and samples.
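The alternating scheme described above can be sketched as a simple fixed-point loop; the screening callbacks here are hypothetical placeholders standing in for the paper's safe feature and sample tests:

```python
def alternating_screening(screen_features, screen_samples, features, samples, rounds=10):
    """Alternately shrink the feature and sample index sets.

    Each screening step may exploit the other set's latest reduction,
    which is the "synergy" effect: fewer samples make the next feature
    test tighter, and vice versa.  Stops when neither set shrinks.
    """
    for _ in range(rounds):
        new_features = screen_features(features, samples)
        new_samples = screen_samples(new_features, samples)
        if new_features == features and new_samples == samples:
            break  # converged: no further reduction possible
        features, samples = new_features, new_samples
    return features, samples
```

The loop terminates because both index sets are finite and can only shrink; the remaining indices are the candidates actually passed to the solver.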
Multi-Layer Feature Reduction for Tree Structured Group Lasso via Hierarchical Projection
Abstract
Tree structured group Lasso (TGL) is a powerful technique for uncovering tree structured sparsity over the features, where each node encodes a group of features. It has been applied successfully in many real-world applications. However, with extremely large feature dimensions, solving TGL remains a significant challenge due to its highly complicated regularizer. In this paper, we propose a novel Multi-Layer Feature reduction method (MLFre) to quickly identify the inactive nodes (the groups of features with zero coefficients in the solution) hierarchically in a top-down fashion; these nodes are guaranteed to be irrelevant to the response. Thus, we can remove the detected nodes from the optimization without sacrificing accuracy. The major challenge in developing such testing rules lies in the overlaps between parent and child nodes. By a novel hierarchical projection algorithm, MLFre is able to test the nodes independently of any of their ancestor nodes. Moreover, we can integrate MLFre, which has a low computational cost, with any existing solver. Experiments on both synthetic and real data sets demonstrate that the speedup gained by MLFre can be orders of magnitude.
Regularization Path of Cross-Validation Error Lower Bounds
Abstract
Careful tuning of a regularization parameter is indispensable in many machine learning tasks because it has a significant impact on generalization performance. Nevertheless, the current practice of regularization parameter tuning is more of an art than a science; e.g., it is hard to tell how many grid points would be needed in cross-validation (CV) to obtain a solution with sufficiently small CV error. In this paper we propose a novel framework for computing a lower bound of the CV error as a function of the regularization parameter, which we call the regularization path of CV error lower bounds. The proposed framework can be used to provide a theoretical approximation guarantee on a set of solutions, in the sense of how far the CV error of the current best solution could be from the best possible CV error over the entire range of the regularization parameter. Our numerical experiments demonstrate that a theoretically guaranteed choice of a regularization parameter in the above sense is possible with reasonable computational cost.
Unified Methods for Exploiting Piecewise Linear Structure in Convex Optimization
Abstract
We develop methods for rapidly identifying important components of a convex optimization problem for the purpose of achieving fast convergence times. By considering a novel problem formulation, the minimization of a sum of piecewise functions, we describe a principled and general mechanism for exploiting piecewise linear structure in convex optimization. This result leads to a theoretically justified working set algorithm and a novel screening test, which generalize and improve upon many prior results on exploiting structure in convex optimization. In empirical comparisons, we study the scalability of our methods. We find that screening scales surprisingly poorly with the size of the problem, while our working set algorithm convincingly outperforms alternative approaches.
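A generic working set loop of the kind described can be sketched as follows, with all callback names hypothetical: solve over a small set of active pieces, add any pieces whose optimality conditions are violated, and repeat until none remain:

```python
def working_set_solve(pieces, solve_subproblem, is_violated, max_iter=20):
    """Generic working-set sketch.

    `pieces` are the components of the objective (e.g., hinge terms or
    regularizer groups).  `solve_subproblem(work)` returns a solution
    of the restricted problem over the working set; `is_violated(p, x)`
    checks whether piece p's optimality condition fails at x.
    """
    work = set()
    x = None
    for _ in range(max_iter):
        x = solve_subproblem(work)
        violated = {p for p in pieces if p not in work and is_violated(p, x)}
        if not violated:
            break  # x is optimal for the full problem
        work |= violated  # grow the working set and re-solve
    return x
```

The efficiency of such a loop hinges on most pieces never entering the working set, which is exactly what piecewise linear (flat) regions of the objective make possible.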
unknown title
Abstract
Nuclear norm regularization has been shown to be very promising for pursuing a low rank solution for a matrix variable in various machine learning problems. Many efforts have been devoted to developing efficient algorithms for solving the optimization problem in nuclear norm regularization. Solving the problem for large-scale matrix variables, however, is still a challenging task since the complexity grows fast with the size of the matrix variable. In this work, we propose a novel method, called safe subspace screening (SSS), to improve the efficiency of solvers for nuclear norm regularized least squares problems. Motivated by the fact that the low rank solution can be represented by a few subspaces, the proposed method accurately discards a predominant percentage of inactive subspaces prior to solving the problem, reducing the problem size. Consequently, only a much smaller problem needs to be solved, making it more efficient than optimizing the original problem. The proposed SSS is safe, in that its solution is identical to the solution from the solver. In addition, the proposed SSS can be used together with any existing nuclear norm solver since it is independent of the solver. We have evaluated the proposed SSS on several synthetic as well as real data sets. Extensive results show that the
An Algorithmic Framework for Computing Validation Performance Bounds by Using Suboptimal Models
, 2014
SCREENING RULES FOR OVERLAPPING GROUP LASSO
Abstract
Recently, screening rules have been developed to solve large-scale lasso and group lasso problems; their goal is to reduce the problem size by efficiently discarding zero coefficients using simple rules, each applied independently of the others. However, screening for overlapping group lasso remains an open challenge because the overlaps between groups make it infeasible to test each group independently. In this paper, we develop screening rules for overlapping group lasso. To address the challenge arising from groups with overlaps, we take into account overlapping groups only if they are inclusive of the group being tested, and we then derive screening rules by adopting the dual polytope projection approach. This strategy allows us to screen each group independently of the others. In our experiments, we demonstrate the efficiency of our screening rules on various datasets.
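For intuition, the basic sphere-region test for a single (non-overlapping) group, which rules of this kind build upon, can be sketched as follows; the function name and the ball-region form are assumptions for illustration, not the paper's actual overlapping-group rule:

```python
import numpy as np

def screen_groups(X, groups, theta_center, radius, weights):
    """Sphere-region safe test for group lasso (sketch).

    If the largest possible value of ||X_g^T theta||_2, over a ball
    B(theta_center, radius) known to contain the dual optimum, stays
    below the group's weight, the group's coefficients are provably
    zero and the group can be discarded before solving.
    """
    keep = []
    for g, w in zip(groups, weights):
        Xg = X[:, g]
        # Triangle inequality: ||Xg^T theta|| <= ||Xg^T center|| + radius * sigma_max(Xg)
        upper = np.linalg.norm(Xg.T @ theta_center) + radius * np.linalg.norm(Xg, 2)
        if upper >= w:
            keep.append(g)  # cannot certify inactivity; keep the group
    return keep
```

The overlapping case is harder precisely because this per-group bound is no longer valid in isolation; the paper's contribution is a way to restore independent per-group tests despite the overlaps.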