Results 1  10
of
55
Theory and applications of Robust Optimization
, 2007
"... In this paper we survey the primary research, both theoretical and applied, in the field of Robust Optimization (RO). Our focus will be on the computational attractiveness of RO approaches, as well as the modeling power and broad applicability of the methodology. In addition to surveying the most pr ..."
Abstract

Cited by 110 (16 self)
 Add to MetaCart
(Show Context)
In this paper we survey the primary research, both theoretical and applied, in the field of Robust Optimization (RO). Our focus will be on the computational attractiveness of RO approaches, as well as the modeling power and broad applicability of the methodology. In addition to surveying the most prominent theoretical results of RO over the past decade, we will also present some recent results linking RO to adaptable models for multistage decisionmaking problems. Finally, we will highlight successful applications of RO across a wide spectrum of domains, including, but not limited to, finance, statistics, learning, and engineering.
Robustness and regularization of support vector machines
, 1485
"... We consider regularized support vector machines (SVMs) and show that they are precisely equivalent to a new robust optimization formulation. We show that this equivalence of robust optimization and regularization has implications for both algorithms, and analysis. In terms of algorithms, the equival ..."
Abstract

Cited by 45 (7 self)
 Add to MetaCart
(Show Context)
We consider regularized support vector machines (SVMs) and show that they are precisely equivalent to a new robust optimization formulation. We show that this equivalence of robust optimization and regularization has implications for both algorithms, and analysis. In terms of algorithms, the equivalence suggests more general SVMlike algorithms for classification that explicitly build in protection to noise, and at the same time control overfitting. On the analysis front, the equivalence of robustness and regularization provides a robust optimization interpretation for the success of regularized SVMs. We use this new robustness interpretation of SVMs to give a new proof of consistency of (kernelized) SVMs, thus establishing robustness as the reason regularized SVMs generalize well.
The Interplay of Optimization and Machine Learning Research
 Journal of Machine Learning Research
, 2006
"... The fields of machine learning and mathematical programming are increasingly intertwined. Optimization problems lie at the heart of most machine learning approaches. The Special Topic on Machine Learning and Large Scale Optimization examines this interplay. Machine learning researchers have embra ..."
Abstract

Cited by 24 (1 self)
 Add to MetaCart
The fields of machine learning and mathematical programming are increasingly intertwined. Optimization problems lie at the heart of most machine learning approaches. The Special Topic on Machine Learning and Large Scale Optimization examines this interplay. Machine learning researchers have embraced the advances in mathematical programming allowing new types of models to be pursued. The special topic includes models using quadratic, linear, secondorder cone, semidefinite, and semiinfinite programs. We observe that the qualities of good optimization algorithms from the machine learning and optimization perspectives can be quite different. Mathematical programming puts a premium on accuracy, speed, and robustness. Since generalization is the bottom line in machine learning and training is normally done offline, accuracy and small speed improvements are of little concern in machine learning. Machine learning prefers simpler algorithms that work in reasonable computational time for specific classes of problems. Reducing machine learning problems to wellexplored mathematical programming classes with robust general purpose optimization codes allows machine learning researchers to rapidly develop new techniques.
Robust Regression and Lasso
"... We consider robust leastsquares regression with featurewise disturbance. We show that this formulation leads to tractable convex optimization problems, and we exhibit a particular uncertainty set for which the robust problem is equivalent to ℓ1 regularized regression (Lasso). This provides an inte ..."
Abstract

Cited by 24 (6 self)
 Add to MetaCart
(Show Context)
We consider robust leastsquares regression with featurewise disturbance. We show that this formulation leads to tractable convex optimization problems, and we exhibit a particular uncertainty set for which the robust problem is equivalent to ℓ1 regularized regression (Lasso). This provides an interpretation of Lasso from a robust optimization perspective. We generalize this robust formulation to consider more general uncertainty sets, which all lead to tractable convex optimization problems. Therefore, we provide a new methodology for designing regression algorithms, which generalize known formulations. The advantage is that robustness to disturbance is a physical property that can be exploited: in addition to obtaining new formulations, we use it directly to show sparsity properties of Lasso, as well as to prove a general consistency result for robust regression problems, including Lasso, from a unified robustness perspective. 1
Ellipsoidal kernel machines
 In Proceedings of the Artificial Intelligence and Statistics
, 2007
"... A novel technique is proposed for improving the standard VapnikChervonenkis (VC) dimension estimate for the Support Vector Machine (SVM) framework. The improved VC estimates are based on geometric arguments. By considering bounding ellipsoids instead of the usual bounding hyperspheres and assuming ..."
Abstract

Cited by 18 (3 self)
 Add to MetaCart
(Show Context)
A novel technique is proposed for improving the standard VapnikChervonenkis (VC) dimension estimate for the Support Vector Machine (SVM) framework. The improved VC estimates are based on geometric arguments. By considering bounding ellipsoids instead of the usual bounding hyperspheres and assuming gaptolerant classifiers, a linear classifier with a given margin is shown to shatter fewer points than previously estimated. This improved VC estimation method directly motivates a different estimator for the parameters of a linear classifier. Surprisingly, only VCbased arguments are needed to justify this modification to the SVM. The resulting technique is implemented using Semidefinite Programming (SDP) and is solvable in polynomial time. The new linear classifier also ensures certain invariances to affine transformations on the data which a standard SVM does not provide. We demonstrate that the technique can be kernelized via extensions to Hilbert spaces. Promising experimental results are shown on several standardized datasets. 1
Learning with Marginalized Corrupted Features
"... The goal of machine learning is to develop predictors that generalize well to test data. Ideally, this is achieved by training on very large (infinite) training data sets that capture all variations in the data distribution. In the case of finite training data, an effective solution is to extend the ..."
Abstract

Cited by 15 (2 self)
 Add to MetaCart
The goal of machine learning is to develop predictors that generalize well to test data. Ideally, this is achieved by training on very large (infinite) training data sets that capture all variations in the data distribution. In the case of finite training data, an effective solution is to extend the training set with artificially created examples—which, however, is also computationally costly. We propose to corrupt training examples with noise from known distributions within the exponential family and present a novel learning algorithm, called marginalized corrupted features (MCF), that trains robust predictors by minimizing the expected value of the loss function under the corrupting distribution— essentially learning with infinitely many (corrupted) training examples. We show empirically on a variety of data sets that MCF classifiers can be trained efficiently, may generalize substantially better to test data, and are more robust to feature deletion at test time. 1.
Learning with marginalized corrupted features
 In Proceedings of the 30th International Conference on Machine Learning (ICML13
, 2013
"... The goal of machine learning is to develop predictors that generalize well to test data. Ideally, this is achieved by training on very large (infinite) training data sets that capture all variations in the data distribution. In the case of finite training data, an effective solution is to extend t ..."
Abstract

Cited by 10 (1 self)
 Add to MetaCart
The goal of machine learning is to develop predictors that generalize well to test data. Ideally, this is achieved by training on very large (infinite) training data sets that capture all variations in the data distribution. In the case of finite training data, an effective solution is to extend the training set with artificially created examples—which, however, is also computationally costly. We propose to corrupt training examples with noise from known distributions within the exponential family and present a novel learning algorithm, called marginalized corrupted features (MCF), that trains robust predictors by minimizing the expected value of the loss function under the corrupting distribution— essentially learning with infinitely many (corrupted) training examples. We show empirically on a variety of data sets that MCF classifiers can be trained efficiently, may generalize substantially better to test data, and are more robust to feature deletion at test time. 1.
Robustness and Generalization
"... We derive generalization bounds for learning algorithms based on their robustness: the property that if a testing sample is “similar ” to a training sample, then the testing error is close to the training error. This provides a novel approach, different from the complexity or stability arguments, to ..."
Abstract

Cited by 10 (0 self)
 Add to MetaCart
(Show Context)
We derive generalization bounds for learning algorithms based on their robustness: the property that if a testing sample is “similar ” to a training sample, then the testing error is close to the training error. This provides a novel approach, different from the complexity or stability arguments, to study generalization of learning algorithms. We further show that a weak notion of robustness is both sufficient and necessary for generalizability, which implies that robustness is a fundamental property for learning algorithms to work. 1
Maximum Relative Margin and DataDependent regularization
 JOURNAL OF MACHINE LEARNING RESEARCH
"... Leading classification methods such as support vector machines (SVMs) and their counterparts achieve strong generalization performance by maximizing the margin of separation between data classes. While the maximum margin approach has achieved promising performance, this article identifies its sensit ..."
Abstract

Cited by 9 (1 self)
 Add to MetaCart
Leading classification methods such as support vector machines (SVMs) and their counterparts achieve strong generalization performance by maximizing the margin of separation between data classes. While the maximum margin approach has achieved promising performance, this article identifies its sensitivity to affine transformations of the data and to directions with large data spread. Maximum margin solutions may be misled by the spread of data and preferentially separate classes along large spread directions. This article corrects these weaknesses by measuring margin not in the absolute sense but rather only relative to the spread of data in any projection direction. Maximum relative margin corresponds to a datadependent regularization on the classification function while maximum absolute margin corresponds to an ℓ2 norm constraint on the classification function. Interestingly, the proposed improvements only require simple extensions to existing maximum margin formulations and preserve the computational efficiency of SVMs. Through the maximization of relative margin, surprising performance gains are achieved on realworld problems such as digit, image histogram, and text classification. In addition, risk bounds are derived for the new formulation based on Rademacher averages.