Results 1–10 of 29
Thirion, Small-sample brain mapping: sparse recovery on spatially correlated designs with randomization and clustering, ICML
, 2012
Abstract

Cited by 20 (7 self)
Functional neuroimaging can measure the brain’s response to an external stimulus. It is used to perform brain mapping: identifying from these observations the brain regions involved. This problem can be cast into a linear supervised learning task where the neuroimaging data are used as predictors for the stimulus. Brain mapping is then seen as a support recovery problem. On functional MRI (fMRI) data, this problem is particularly challenging as i) the number of samples is small due to limited acquisition time and ii) the variables are strongly correlated. We propose to overcome these difficulties using sparse regression models over new variables obtained by clustering of the original variables. The use of randomization techniques, e.g. bootstrap samples, and clustering of the variables improves the recovery properties of sparse methods. We demonstrate the benefit of our approach on an extensive simulation study as well as two fMRI datasets.
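The randomization-plus-clustering recipe described in this abstract can be sketched in a few lines of numpy, under simplifying assumptions: features are grouped by a crude correlation-based clustering (the paper clusters voxels spatially), a Lasso is fit on the cluster averages of each bootstrap sample, and per-variable selection frequencies are aggregated. All function names and parameters here are illustrative, not taken from the paper's code.

```python
import numpy as np

def soft_threshold(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def lasso_ista(X, y, lam, n_iter=500):
    # Proximal gradient (ISTA) for 0.5*||y - Xw||^2 + lam*||w||_1
    step = 1.0 / np.linalg.norm(X, 2) ** 2
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        w = soft_threshold(w - step * X.T @ (X @ w - y), step * lam)
    return w

def cluster_columns(X, n_clusters, rng):
    # Crude correlation-based grouping: each column joins the randomly
    # chosen seed column it is most correlated with (a stand-in for the
    # spatial clustering used on fMRI volumes).
    seeds = rng.choice(X.shape[1], n_clusters, replace=False)
    corr = np.abs(np.corrcoef(X, rowvar=False))
    return np.argmax(corr[:, seeds], axis=1)

def randomized_clustered_support(X, y, lam, n_clusters, n_boot, seed=0):
    # Selection frequency of each original variable across bootstrap runs
    rng = np.random.default_rng(seed)
    n, p = X.shape
    counts = np.zeros(p)
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)                  # bootstrap sample
        labels = cluster_columns(X, n_clusters, rng)
        Xr = np.stack([X[idx][:, labels == k].mean(axis=1)
                       for k in range(n_clusters)], axis=1)
        w = lasso_ista(Xr, y[idx], lam)
        for k in np.nonzero(w)[0]:                   # propagate support back
            counts[labels == k] += 1
    return counts / n_boot
```

Variables that sit in a consistently selected cluster across many randomized runs receive a high selection frequency, which is the quantity used for support recovery.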
Sparse Prediction with the k-Support Norm
 In: Neural Information Processing Systems
, 2012
A generalized online mirror descent with applications to classification and regression
, 2012
Abstract

Cited by 7 (4 self)
Online learning algorithms are fast, memory-efficient, easy to implement, and applicable to many prediction problems, including classification, regression, and ranking. Several online algorithms were proposed in the past few decades, some based on additive updates, like the Perceptron, and others on multiplicative updates, like Winnow. Online convex optimization is a general framework that unifies both the design and the analysis of online algorithms using a single prediction strategy: online mirror descent. Different first-order online algorithms are obtained by choosing the regularization function in online mirror descent. We generalize online mirror descent to sequences of time-varying regularizers. Our approach allows us to recover as special cases many recently proposed second-order algorithms, such as the Vovk-Azoury-Warmuth, the second-order Perceptron, and the AROW algorithm. Moreover, we derive a new second-order adaptive p-norm algorithm, and improve bounds for some first-order algorithms, such as Passive-Aggressive (PA-I). Keywords: Online learning, Convex optimization, Second-order algorithms
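The claim that the choice of regularizer determines the algorithm can be illustrated with a toy online mirror descent, sketched here for two classical (time-invariant) regularizers: the squared Euclidean norm yields online gradient descent, while the entropic regularizer yields exponentiated gradient on the simplex. Names and parameters are illustrative.

```python
import numpy as np

def omd(grad_fn, w0, eta, mirror, T):
    """Online mirror descent: the regularizer (via its mirror map)
    determines the update rule."""
    w = w0.copy()
    for t in range(T):
        g = grad_fn(w, t)
        if mirror == "euclidean":    # R(w)=||w||^2/2 -> online gradient descent
            w = w - eta * g
        elif mirror == "entropic":   # R(w)=sum w log w -> exponentiated gradient
            w = w * np.exp(-eta * g)
            w = w / w.sum()          # Bregman projection onto the simplex
    return w
```

On a prediction-with-expert-advice loss the entropic version concentrates its weight on the best expert, while the Euclidean version is plain online gradient descent; the time-varying regularizers of the paper generalize exactly this template.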
Intersecting singularities for multi-structured estimation
Abstract

Cited by 6 (1 self)
We address the problem of designing a convex nonsmooth regularizer that encourages multiple structural effects simultaneously. Focusing on the inference of sparse and low-rank matrices, we suggest a new complexity index and a convex penalty approximating it. The new penalty term can be written as the trace norm of a linear function of the matrix. By analyzing theoretical properties of this family of regularizers, we derive oracle inequalities and compressed sensing results ensuring the quality of our regularized estimator. We also provide algorithms and supporting numerical experiments.
Correlation adaptive subspace segmentation by trace lasso
 In ICCV
, 2013
Abstract

Cited by 6 (2 self)
This paper studies the subspace segmentation problem. Given a set of data points drawn from a union of subspaces, the goal is to partition them into the underlying subspaces they were drawn from. The spectral clustering method is used as the framework. It requires finding an affinity matrix which is close to block diagonal, with nonzero entries corresponding to the data point pairs from the same subspace. In this work, we argue that both sparsity and the grouping effect are important for subspace segmentation. A sparse affinity matrix tends to be block diagonal, with fewer connections between data points from different subspaces. The grouping effect ensures that highly correlated data, which are usually from the same subspace, can be grouped together. Sparse Subspace Clustering (SSC), by using ℓ1-minimization, encourages sparsity for data selection, but it lacks the grouping effect. On the contrary, Low-Rank Representation (LRR), by rank minimization, and Least Squares Regression (LSR), by ℓ2-regularization, exhibit a strong grouping effect, but they fall short in subset selection. Thus the obtained affinity matrix is usually very sparse for SSC, yet very dense for LRR and LSR. In this work, we propose the Correlation Adaptive Subspace Segmentation (CASS) method using trace Lasso. CASS is a data-correlation-dependent method which simultaneously performs automatic data selection and groups correlated data together. It can be regarded as a method which adaptively balances SSC and LSR. Both theoretical and experimental results show the effectiveness of CASS.
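The trace Lasso penalty of a weight vector w under a design X is the nuclear norm of X Diag(w), and the adaptive balancing claimed above can be verified numerically (assuming numpy): the penalty reduces to the ℓ1 norm when the columns of X are orthonormal (uncorrelated data) and to the ℓ2 norm when all columns are identical (perfectly correlated data).

```python
import numpy as np

def trace_lasso(X, w):
    # Trace Lasso penalty: nuclear norm of X @ diag(w)
    # (X * w multiplies column j of X by w[j])
    return np.linalg.svd(X * w, compute_uv=False).sum()
```

Between these two extremes the penalty interpolates according to the correlation structure of the data, which is what lets CASS balance SSC (ℓ1-like) and LSR (ℓ2-like) behaviour.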
Model consistency of partly smooth regularizers
, 2014
Abstract

Cited by 5 (4 self)
This paper studies least-squares regression penalized with partly smooth convex regularizers. This class of functions is very large and versatile, allowing one to promote solutions conforming to some notion of low complexity. Indeed, they force solutions of variational problems to belong to a low-dimensional manifold (the so-called model) which is stable under small perturbations of the function. This property is crucial to make the underlying low-complexity model robust to small noise. We show that a generalized “irrepresentable condition” implies stable model selection under small noise perturbations in the observations and the design matrix, when the regularization parameter is tuned proportionally to the noise level. This condition is shown to be almost a necessary condition. We then show that this condition implies model consistency of the regularized estimator. That is, with probability tending to one as the number of measurements increases, the regularized estimator belongs to the correct low-dimensional model manifold. This work unifies and generalizes several previous ones, where model consistency is known to hold for sparse, group sparse, total variation and low-rank regularizations. Lastly, we also show that this generalized “irrepresentable condition” implies that the Forward-Backward proximal splitting algorithm identifies the model after a finite number of steps.
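For the sparse (ℓ1) special case, the generalized condition mentioned above reduces to the classical Lasso irrepresentable condition, which is easy to evaluate numerically: model selection is stable when ||X_Sc^T X_S (X_S^T X_S)^{-1} sign(w_S)||_∞ < 1. A minimal sketch, assuming numpy:

```python
import numpy as np

def irrepresentable_value(X, support, signs):
    """Compute || X_Sc^T X_S (X_S^T X_S)^{-1} sign(w_S) ||_inf for the
    Lasso; values strictly below 1 indicate stable model selection."""
    Xs = X[:, support]                         # on-support columns
    Xc = np.delete(X, support, axis=1)         # off-support columns
    return np.linalg.norm(Xc.T @ Xs @ np.linalg.solve(Xs.T @ Xs, signs), np.inf)
```

An orthogonal design gives the value 0 (the condition holds trivially), while duplicating a support column outside the support pushes the value to 1, where selection fails.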
Local linear convergence of Forward–Backward under partial smoothness. NIPS
, 2014
Abstract

Cited by 4 (2 self)
In this paper, we consider the Forward–Backward proximal splitting algorithm to minimize the sum of two proper convex functions, one of which has a Lipschitz continuous gradient while the other is partly smooth relative to an active manifold M. We propose a generic framework under which we show that the Forward–Backward (i) correctly identifies the active manifold M in a finite number of iterations, and then (ii) enters a local linear convergence regime that we characterize precisely. This gives a grounded and unified explanation of the typical behaviour that has been observed numerically for many problems encompassed in our framework, including the Lasso, the group Lasso, the fused Lasso and nuclear norm regularization, to name a few. These results may have numerous applications, including in signal/image processing, sparse recovery and machine learning.
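The finite-time manifold identification described above is easy to observe on the Lasso, where the active manifold is simply the support of the iterate. The sketch below (numpy, illustrative names) runs Forward-Backward and records the last iteration at which the support changed:

```python
import numpy as np

def forward_backward_lasso(X, y, lam, n_iter=300):
    """Forward-Backward splitting for the Lasso, tracking when the
    support (the active manifold) stops changing."""
    step = 1.0 / np.linalg.norm(X, 2) ** 2
    w = np.zeros(X.shape[1])
    supports = []
    for _ in range(n_iter):
        w = w - step * X.T @ (X @ w - y)                        # forward (gradient) step
        w = np.sign(w) * np.maximum(np.abs(w) - step * lam, 0)  # backward (prox) step
        supports.append(tuple(np.nonzero(w)[0]))
    last_change = max((i for i in range(1, n_iter)
                       if supports[i] != supports[i - 1]), default=0)
    return w, supports[-1], last_change
```

On well-posed problems the support freezes long before convergence, after which the iteration behaves like a smooth (locally linearly convergent) method restricted to that support.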
Learning by marginalizing corrupted features
 In Proceedings of the International Conference on Machine Learning
, 2013
Abstract

Cited by 3 (3 self)
The goal of machine learning is to develop predictors that generalize well to test data. Ideally, this is achieved by training on an almost infinitely large training data set that captures all variations in the data distribution. In practical learning settings, however, we do not have infinite data and our predictors may overfit. Overfitting may be combated, for example, by adding a regularizer to the training objective or by defining a prior over the model parameters and performing Bayesian inference. In this paper, we propose a third, alternative approach to combat overfitting: we extend the training set with infinitely many artificial training examples that are obtained by corrupting the original training data. We show that this approach is practical and efficient for a range of predictors and corruption models. Our approach, called marginalized corrupted features (MCF), trains robust predictors by minimizing the expected value of the loss function under the corruption model. We show empirically on a variety of data sets that MCF classifiers can be trained efficiently, may generalize substantially better to test data, and are also more robust to feature deletion at test time.
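For the quadratic loss with blankout (dropout-style) corruption, the expectation over infinitely many corrupted copies has a closed form, which is the key to the efficiency claimed above. A minimal sketch assuming numpy, with the standard moments E[x̃] = (1-q)x and E[x̃x̃ᵀ] = (1-q)²xxᵀ + q(1-q)diag(x²) (function name illustrative):

```python
import numpy as np

def mcf_blankout_lsq(X, y, q):
    """Closed-form least squares with marginalized blankout corruption:
    each feature is independently zeroed with probability q, and the
    expected quadratic loss is minimized exactly instead of by sampling."""
    # Sum over samples of E[x~ x~^T] = (1-q)^2 x x^T + q(1-q) diag(x**2)
    A = (1 - q) ** 2 * (X.T @ X) + q * (1 - q) * np.diag((X ** 2).sum(axis=0))
    b = (1 - q) * (X.T @ y)          # uses E[x~] = (1-q) x
    return np.linalg.solve(A, b)
```

At q = 0 this is ordinary least squares; for q > 0 the extra diagonal term acts like a data-dependent ridge penalty, which is how the marginalized corruption regularizes.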
Cox regression with correlation based regularization for electronic health records
 In Data Mining (ICDM), 2013 IEEE 13th International Conference on
, 2013
Abstract

Cited by 3 (1 self)
Survival regression models play a vital role in analyzing time-to-event data in many practical applications ranging from engineering to economics to healthcare. These models are ideal for prediction in complex data problems where the response is a time-to-event variable. An event is defined as the occurrence of a specific event of interest such as a chronic health condition. Cox regression is one of the most popular survival regression models used in such applications. However, these models have a tendency to overfit the data, which is not desirable for healthcare applications because it limits their generalization to other hospital scenarios. In this paper, we address these challenges for the Cox regression model. We combine two unique correlation-based regularizers with Cox regression to handle correlated and grouped features, which are commonly seen in many practical problems. The proposed optimization problems are solved efficiently using cyclic coordinate descent and Alternating Direction Method of Multipliers algorithms. We conduct experimental analysis on the performance of these algorithms over several synthetic datasets and electronic health records (EHR) data about patients diagnosed with heart failure from a hospital. We demonstrate through our experiments that these regularizers effectively enhance the ability of Cox regression to handle correlated features. In addition, we extensively compare our results with other regularized linear and logistic regression algorithms. We validate the goodness of the features selected by these regularized Cox regression models using the biomedical literature and different feature selection algorithms.
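The general shape of such a method (Cox partial likelihood plus a correlation-aware penalty) can be sketched in numpy. The paper's two specific regularizers and its coordinate-descent/ADMM solvers are not reproduced here; a smooth correlation-graph Laplacian penalty and plain gradient descent are used as stand-ins, and all names are illustrative.

```python
import numpy as np

def cox_nll_grad(w, X, time, event):
    # Negative Cox partial log-likelihood and gradient (Breslow, no ties),
    # averaged over events. Sorting by decreasing time makes every risk
    # set a prefix, so cumulative sums give the risk-set aggregates.
    order = np.argsort(-time)
    Xo, do = X[order], event[order]
    eo = np.exp(Xo @ w)
    cs = np.cumsum(eo)
    csx = np.cumsum(eo[:, None] * Xo, axis=0)
    d = do.sum()
    nll = -np.sum(do * (Xo @ w - np.log(cs))) / d
    grad = -np.sum(do[:, None] * (Xo - csx / cs[:, None]), axis=0) / d
    return nll, grad

def cox_corr_laplacian(X, time, event, lam=0.02, lr=0.2, n_iter=1000):
    """Cox regression with a correlation-graph Laplacian penalty
    lam * w^T L w -- a smooth stand-in for the paper's regularizers,
    encouraging correlated features to share coefficients."""
    C = np.abs(np.corrcoef(X, rowvar=False))
    L = np.diag(C.sum(axis=1)) - C
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        _, g = cox_nll_grad(w, X, time, event)
        w = w - lr * (g + 2 * lam * L @ w)
    return w
```

The penalty pulls the coefficients of highly correlated features toward each other, which is the qualitative behaviour the paper's regularizers are designed to achieve.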
Model Selection with Piecewise Regular Gauges
Abstract

Cited by 3 (1 self)
Regularization plays a pivotal role when facing the challenge of solving ill-posed inverse problems, where the number of observations is smaller than the ambient dimension of the object to be estimated. A line of recent work has studied regularization models with various types of low-dimensional structures. In such settings, the general approach is to solve a regularized optimization problem, which combines a data fidelity term and some regularization penalty that promotes the assumed low-dimensional/simple structure. This paper provides a general framework to capture this low-dimensional structure through what we coin piecewise regular gauges. These are convex, nonnegative, closed, bounded and positively homogeneous functions that promote objects living on low-dimensional subspaces. This class of regularizers encompasses many popular examples such as the ℓ1 norm, the ℓ1-ℓ2 norm (group sparsity), the nuclear norm, as well as several others including the ℓ∞ norm. We show that the set of piecewise regular gauges is closed under addition and pre-composition by a linear operator, which allows us to cover mixed regularization (e.g. sparse + low-rank), and so-called analysis-type priors (e.g. total variation, fused Lasso, trace Lasso, bounded polyhedral gauges). Our main result presents a unified sharp analysis of exact and robust recovery of the low-dimensional subspace model associated to the object to recover from partial measurements. This analysis is illustrated on a number of special and previously studied cases.
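The defining properties invoked above (positive homogeneity, closure under addition, closure under pre-composition by a linear operator) are easy to check numerically for the listed examples. A small sketch assuming numpy, with illustrative helper names:

```python
import numpy as np

# Three gauges named in the abstract
def l1(w):      return np.abs(w).sum()
def linf(w):    return np.abs(w).max()
def nuclear(W): return np.linalg.svd(W, compute_uv=False).sum()

# Closure under addition: a mixed sparse + low-rank penalty is again a gauge
def sparse_plus_lowrank(W, mu=0.5):
    return l1(W.ravel()) + mu * nuclear(W)

# Closure under pre-composition by a linear operator: 1-D total variation
# is l1 composed with a finite-difference operator (an analysis-type prior)
def tv1d(w):
    D = np.diff(np.eye(w.size), axis=0)   # (n-1) x n difference operator
    return l1(D @ w)
```

Each of these satisfies g(t·x) = t·g(x) for t ≥ 0, the property that makes a gauge; the combined and pre-composed penalties inherit it automatically, which is the closure result the paper formalizes.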