Learning the Kernel with Hyperkernels, 2003
Cited by 59 (2 self)
This paper addresses the problem of choosing a kernel suitable for estimation with a Support Vector Machine, hence further automating machine learning. This goal is achieved by defining a Reproducing Kernel Hilbert Space on the space of kernels itself. Such a formulation leads to a statistical estimation problem very much akin to the problem of minimizing a regularized risk functional.
Variable Selection Using SVM-based Criteria, 2003
Cited by 51 (3 self)
We propose new methods to evaluate variable subset relevance with a view to variable selection.
Optimal Properties and Adaptive Tuning of Standard and Nonstandard Support Vector Machines - In Proceedings of the MSRI Berkeley Workshop on, 2002
Cited by 11 (5 self)
We review some of the basic ideas of Support Vector Machines (SVMs) for classification, with the goal of describing how these ideas can sit comfortably inside the statistical literature in decision theory and penalized likelihood regression. We review recent work on adaptive tuning of SVMs, discussing generalizations to the nonstandard case where the training set is not representative and misclassification costs are not equal. Mention is made of recent results in the multicategory case.
Multi-objective model selection for support vector machines - Proceedings of the Third International Conference on Evolutionary MultiCriterion Optimization (EMO 2005), volume 3410 of LNAI, 2005
Cited by 11 (6 self)
In this article, model selection for support vector machines is viewed as a multi-objective optimization problem, where model complexity and training accuracy define two conflicting objectives. Different optimization criteria are evaluated: split modified radius margin bounds, which allow for comparing existing model selection criteria, and the training error in conjunction with the number of support vectors for designing sparse solutions.
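To make the idea of two conflicting objectives concrete, here is a minimal sketch of a Pareto-dominance filter over candidate models, each scored by training error and number of support vectors (both minimized). The function name `pareto_front` and the sample scores are hypothetical, and this shows only the dominance test, not the evolutionary search the paper uses.

```python
def pareto_front(candidates):
    """Return the non-dominated (training_error, num_support_vectors) pairs.

    Both objectives are minimized. A candidate is dominated if another
    candidate is no worse in both objectives and strictly better in one.
    """
    front = []
    for i, a in enumerate(candidates):
        dominated = any(
            b[0] <= a[0] and b[1] <= a[1] and (b[0] < a[0] or b[1] < a[1])
            for j, b in enumerate(candidates)
            if j != i
        )
        if not dominated:
            front.append(a)
    return front


# Hypothetical scores for four hyperparameter settings (illustrative only).
scores = [(0.10, 40), (0.05, 80), (0.12, 45), (0.05, 90)]
front = pareto_front(scores)
```

Every model on the returned front is a defensible trade-off; choosing among them requires a preference between sparsity and training accuracy.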
Efficient multiple hyperparameter learning for log-linear models - In NIPS, 2007
Cited by 6 (1 self)
Using multiple regularization hyperparameters is an effective method for managing model complexity in problems where input features have varying amounts of noise. While algorithms for choosing multiple hyperparameters are often used in neural networks and support vector machines, they are not common in structured prediction tasks, such as sequence labeling or parsing. In this paper, we consider the problem of learning regularization hyperparameters for log-linear models, a class of probabilistic models for structured prediction tasks which includes conditional random fields (CRFs). Using an implicit differentiation trick, we derive an efficient gradient-based method for learning Gaussian regularization priors with multiple hyperparameters. In both simulations and the real-world task of computational RNA secondary structure prediction, we find that multiple hyperparameter learning provides a significant boost in accuracy compared to models learned using only a single regularization hyperparameter.
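The implicit differentiation idea can be illustrated on a much simpler model than the paper's log-linear setting. The sketch below swaps in one-dimensional ridge regression with a single hyperparameter `lam`, purely as an assumption for illustration: the trained weight satisfies a stationarity condition, and differentiating that condition gives the derivative of the weight, and hence of the validation loss, with respect to `lam`. All function names and data are hypothetical.

```python
def fit_ridge_1d(xs, ys, lam):
    """Closed-form minimizer of sum((x*w - y)^2) + lam * w^2 for scalar w."""
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    return sxy / (sxx + lam)


def validation_loss(w, xs_val, ys_val):
    """Squared error of the scalar model w on held-out data."""
    return sum((x * w - y) ** 2 for x, y in zip(xs_val, ys_val))


def hypergradient(xs, ys, xs_val, ys_val, lam):
    """d(validation loss)/d(lam) via implicit differentiation.

    Stationarity of the training objective: g(w, lam) = 2*(sxx*w - sxy) + 2*lam*w = 0.
    Implicit function theorem: dw/dlam = -(dg/dlam) / (dg/dw) = -w / (sxx + lam).
    """
    w = fit_ridge_1d(xs, ys, lam)
    sxx = sum(x * x for x in xs)
    dw_dlam = -w / (sxx + lam)
    dloss_dw = sum(2 * x * (x * w - y) for x, y in zip(xs_val, ys_val))
    return dloss_dw * dw_dlam


# Hypothetical toy data (illustrative only).
xs, ys = [1.0, 2.0, 3.0], [1.1, 1.9, 3.2]
xs_val, ys_val = [1.5, 2.5], [1.4, 2.7]
grad = hypergradient(xs, ys, xs_val, ys_val, lam=0.5)
```

With this gradient available, `lam` can be tuned by any gradient-based optimizer instead of a grid; with several hyperparameters the same construction would involve solving a linear system with the training Hessian rather than a scalar division.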
Image Replica Detection based on Support Vector Classifier - In Proc. SPIE Applications of Digital Image Processing XXVIII, 2005
Cited by 5 (2 self)
In this paper, we propose a technique for image replica detection. By replica, we mean equivalent versions of a given reference image, e.g. after it has undergone operations such as compression, filtering or resizing. Applications of this technique include discovery of copyright infringement or detection of illicit content. The technique
Model selection via bilevel optimization - In IJCNN, 2006
Cited by 4 (2 self)
A key step in many statistical learning methods used in machine learning involves solving a convex optimization problem containing one or more hyper-parameters that must be selected by the users. While cross validation is a commonly employed and widely accepted method for selecting these parameters, its implementation by a grid-search procedure in the parameter space effectively limits the desirable number of hyper-parameters in a model, due to the combinatorial explosion of grid points in high dimensions. This paper proposes a novel bilevel optimization approach to cross validation that provides a systematic search of the hyper-parameters. The bilevel approach enables the use of state-of-the-art optimization methods and their well-supported software. After introducing the bilevel programming approach, we discuss computational methods for solving a bilevel cross-validation program, and present numerical results to substantiate the viability of this novel approach as a promising computational tool for model selection in machine learning.
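The combinatorial explosion mentioned in this abstract is easy to see numerically. The following sketch, with hypothetical helper names, counts the points in a full grid with a fixed number of values per hyperparameter and builds a small grid explicitly; it illustrates the cost of grid search only, not the bilevel method.

```python
from itertools import product


def grid_size(points_per_axis, num_hyperparameters):
    """Number of model fits needed per cross-validation fold on a full grid."""
    return points_per_axis ** num_hyperparameters


def make_grid(axis_values, num_hyperparameters):
    """Enumerate every combination of the given values, one axis per hyperparameter."""
    return list(product(axis_values, repeat=num_hyperparameters))


# With 10 candidate values per hyperparameter, the grid grows by 10x per added axis.
sizes = [grid_size(10, d) for d in range(1, 6)]
small_grid = make_grid([0.1, 1.0, 10.0], 3)
```

Each grid point costs one training run per fold, so the total cost is the grid size multiplied by the number of folds.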
Bilevel model selection for support vector machines, 2007
Cited by 4 (3 self)
The successful application of Support Vector Machines (SVMs), kernel methods and other statistical machine learning methods requires selection of model parameters based on estimates of the generalization error. This paper presents a novel approach to systematic model selection through bilevel optimization. We show how modelling tasks for widely used machine learning methods can be formulated as bilevel optimization problems and describe how the approach can address a broad range of tasks, among which are parameter, feature and kernel selection. In addition, we also discuss the challenges in implementing these approaches and enumerate opportunities for future work in this emerging research area.
Theoretical and Practical Model Selection Methods for Support Vector Classifiers - In L. Wang (Ed.), Support Vector Machines: Theory and Applications, 2005
Cited by 3 (3 self)
In this chapter, we review several methods for SVM model selection, deriving from different approaches: some of them build on practical lines of reasoning but are not fully justified from a theoretical point of view; on the other hand, some methods rely on rigorous theoretical work but are of little help when applied to real-world problems, because the underlying hypotheses cannot be verified or the result of their application is uninformative. Our objective is to shed some light on these issues by carefully analyzing the most well-known methods and testing some of them on standard benchmarks to evaluate their effectiveness.
Model selection via bilevel programming - Proceedings of the IEEE International Joint Conference on Neural Networks, 2006
Cited by 3 (3 self)
Support vector machines and related classification models require the solution of convex optimization problems that have one or more regularization hyper-parameters. Typically, the hyper-parameters are selected to minimize cross validated estimates of the out-of-sample classification error of the model. This cross-validation optimization problem can be formulated as a bilevel program in which the outer-level objective minimizes the average number of misclassified points across the cross-validation folds, subject to inner-level constraints such that the classification functions for each fold are (exactly or nearly) optimal for the selected hyper-parameters. Feature selection is included in the bilevel program in the form of bound constraints in the weights. The resulting bilevel problem is converted to a mathematical program with linear equilibrium constraints, which is solved using state-of-the-art optimization methods. This approach is significantly more versatile than commonly used grid search procedures, enabling, in particular, the use of models with many hyper-parameters. Numerical results demonstrate the practicality of this approach for model selection in machine learning.
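The bilevel structure described in this abstract can be written schematically as follows. This is a sketch under assumed notation, not the paper's exact program: T folds, validation index set N_t and training index set \overline{N}_t for fold t, a linear classifier (w^t, b^t) per fold, a single regularization hyper-parameter \lambda, and a hinge-loss inner problem chosen here for illustration. The bound constraints on the weights used for feature selection are omitted.

```latex
\begin{aligned}
\min_{\lambda \ge 0,\; \{(w^t, b^t)\}_{t=1}^{T}} \quad
  & \frac{1}{T} \sum_{t=1}^{T} \frac{1}{|\mathcal{N}_t|}
    \sum_{i \in \mathcal{N}_t}
    \mathbf{1}\!\left[ y_i \left( x_i^\top w^t - b^t \right) < 0 \right] \\
\text{s.t.} \quad
  & (w^t, b^t) \in \arg\min_{w,\, b} \;
    \frac{\lambda}{2} \lVert w \rVert_2^2
    + \sum_{j \in \overline{\mathcal{N}}_t}
      \max\!\left( 0,\; 1 - y_j \left( x_j^\top w - b \right) \right),
    \qquad t = 1, \dots, T.
\end{aligned}
```

The outer objective is the average misclassification rate over the validation folds, and each inner problem ties the fold's classifier to the shared hyper-parameter. Replacing each inner problem by its optimality conditions is what yields a program with equilibrium constraints of the kind the abstract mentions.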

