Results 11–20 of 359
The connection between regularization operators and support vector kernels
, 1998
Abstract

Cited by 175 (40 self)
In this paper a correspondence is derived between regularization operators used in regularization networks and support vector kernels. We prove that the Green’s Functions associated with regularization operators are suitable support vector kernels with equivalent regularization properties. Moreover, the paper provides an analysis of currently used support vector kernels in the view of regularization theory and corresponding operators associated with the classes of both polynomial kernels and translation invariant kernels. The latter are also analyzed on periodical domains. As a byproduct we show that a large number of radial basis functions, namely conditionally positive definite
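The correspondence this abstract describes can be sketched in its standard schematic form (a sketch of the usual Green's function condition, not a quotation from the paper): a kernel k is an admissible SV kernel for a regularization operator P exactly when k is the Green's function of P*P, in which case the regularized norm of an SV expansion reduces to a kernel quadratic form.

```latex
% k is the Green's function of P^{*}P:
(P^{*}P\,k)(x, x_i) = \delta(x - x_i)
% hence, for an SV expansion f(x) = \sum_i \alpha_i\, k(x, x_i):
\|P f\|^{2} = \sum_{i,j} \alpha_i \alpha_j\, k(x_i, x_j)
```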
The mathematics of learning: Dealing with data
 Notices of the American Mathematical Society
, 2003
Abstract

Cited by 168 (18 self)
Learning is key to developing systems tailored to a broad range of data analysis and information extraction tasks. We outline the mathematical foundations of learning theory and describe one of its key algorithms.
Asymptotic Behaviors of Support Vector Machines with Gaussian Kernel
Abstract

Cited by 163 (8 self)
Support vector machines (SVMs) with the Gaussian (RBF) kernel have been popular for practical use. Model selection in this class of SVMs involves two hyperparameters: the penalty parameter C and the kernel width σ. This paper analyzes the behavior of the SVM classifier when these hyperparameters take very small or very large values. Our results lead to a better understanding of the hyperparameter space and to an efficient heuristic for searching for hyperparameter values with small generalization error. The analysis also indicates that once complete model selection using the Gaussian kernel has been conducted, there is no need to consider the linear SVM.
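The two-hyperparameter model selection this abstract refers to is commonly carried out as a log-scale grid search over (C, σ); a minimal sketch with scikit-learn, where the toy dataset and grid values are illustrative assumptions, not taken from the paper (scikit-learn parameterizes the width as gamma = 1/(2σ²)):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Illustrative toy data, not from the paper.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Log-scale grid over the penalty C and the kernel width gamma.
param_grid = {"C": np.logspace(-2, 3, 6), "gamma": np.logspace(-3, 2, 6)}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X, y)

print(search.best_params_)   # best (C, gamma) pair on this toy data
print(search.best_score_)    # cross-validated accuracy
```

The paper's analysis of the extreme corners of this grid is precisely what justifies restricting such a search to a bounded log-scale region.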
A nonparametric approach to pricing and hedging derivative securities via learning networks
 Journal of Finance
, 1994
Cited by 158 (7 self)
Networks and the Best Approximation Property
 Biological Cybernetics
, 1989
Abstract

Cited by 143 (8 self)
Networks can be considered as approximation schemes. Multilayer networks of the backpropagation type can approximate arbitrarily well continuous functions (Cybenko, 1989; Funahashi, 1989; Stinchcombe and White, 1989). We prove that networks derived from regularization theory, including Radial Basis Functions (Poggio and Girosi, 1989), have a similar property. From the point of view of approximation theory, however, the property of approximating continuous functions arbitrarily well is not sufficient for characterizing good approximation schemes. More critical is the property of best approximation. The main result of this paper is that multilayer networks of the type used in backpropagation do not have the best approximation property. For regularization networks (in particular Radial Basis Function networks) we prove existence and uniqueness of best approximations.
The analysis of decomposition methods for support vector machines
 IEEE Transactions on Neural Networks
, 1999
Abstract

Cited by 134 (21 self)
The decomposition method is currently one of the major methods for solving support vector machines. An important issue of this method is the selection of working sets. In this paper, through the design of decomposition methods for bound-constrained SVM formulations, we demonstrate that working set selection is not a trivial task. From experimental analysis we then propose a simple working set selection that leads to faster convergence in difficult cases. Numerical experiments on different types of problems are conducted to demonstrate the viability of the proposed method.
Input Space Versus Feature Space in Kernel-Based Methods
 IEEE TRANSACTIONS ON NEURAL NETWORKS
, 1999
Abstract

Cited by 130 (3 self)
This paper collects some ideas targeted at advancing our understanding of the feature spaces associated with support vector (SV) kernel functions. We first discuss the geometry of feature space. In particular, we review what is known about the shape of the image of input space under the feature space map, and how this influences the capacity of SV methods. Following this, we describe how the metric governing the intrinsic geometry of the mapped surface can be computed in terms of the kernel, using the example of the class of inhomogeneous polynomial kernels, which are often used in SV pattern recognition. We then discuss the connection between feature space and input space by dealing with the question of how one can, given some vector in feature space, find a preimage (exact or approximate) in input space. We describe algorithms to tackle this issue, and show their utility in two applications of kernel methods. First, we use it to reduce the computational complexity of SV decision functions; second, we combine it with the kernel PCA algorithm, thereby constructing a nonlinear statistical denoising technique which is shown to perform well on real-world data.
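For the Gaussian kernel, the approximate preimage problem mentioned in this abstract is commonly attacked with a fixed-point iteration over convex combinations of the input points; a minimal sketch of that standard iteration (the paper's own algorithms may differ in detail, and the data below are illustrative):

```python
import numpy as np

def gaussian_preimage(X, gamma, sigma, n_iter=100):
    """Fixed-point iteration for an approximate preimage z of the
    feature-space vector sum_i gamma_i Phi(x_i) under the Gaussian
    kernel k(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    z = X[np.argmax(np.abs(gamma))].copy()  # start near the dominant point
    for _ in range(n_iter):
        # weights w_i = gamma_i * k(x_i, z); z becomes their weighted mean
        w = gamma * np.exp(-np.sum((X - z) ** 2, axis=1) / (2 * sigma**2))
        denom = w.sum()
        if abs(denom) < 1e-12:  # degenerate weights: stop iterating
            break
        z = (w[:, None] * X).sum(axis=0) / denom
    return z

X = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]])
# Sanity check: the preimage of Phi(x_1) itself should recover x_1.
z = gaussian_preimage(X, np.array([0.0, 1.0, 0.0]), sigma=1.0)
print(z)  # -> [1. 1.]
```

In the one-hot case the iteration converges in a single step, since the weighted mean collapses onto the selected point; for genuine mixtures (as in kernel PCA denoising) the iteration only yields an approximate preimage.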
Local Error Estimates for Radial Basis Function Interpolation of Scattered Data
 IMA J. Numer. Anal
, 1992
Abstract

Cited by 120 (24 self)
Introducing a suitable variational formulation for the local error of scattered data interpolation by radial basis functions φ(r), the error can be bounded by a term depending on the Fourier transform of the interpolated function f and a certain "Kriging function", which allows a formulation as an integral involving the Fourier transform of φ. The explicit construction of locally well-behaved admissible coefficient vectors makes the Kriging function bounded by some power of the local density h of data points. This leads to error estimates for interpolation of functions f whose Fourier transform f̂ is "dominated" by the nonnegative Fourier transform φ̂ of Φ(x) = φ(‖x‖) in the sense ∫ |f̂|² φ̂⁻¹ dt < ∞. Approximation orders are arbitrarily high for interpolation with Hardy multiquadrics, inverse multiquadrics and Gaussian kernels. This was also proven in recent papers by Madych and Nelson, using a reproducing kernel Hilbert space approach and requiring the same h...
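The interpolation scheme this abstract analyzes amounts to solving the linear system A c = f with A_ij = φ(‖x_i − x_j‖); a minimal numpy sketch with a Gaussian φ (the choice of φ and the data are illustrative assumptions, not from the paper):

```python
import numpy as np

def rbf_interpolate(X, f, phi):
    """Solve A c = f with A_ij = phi(||x_i - x_j||); return the
    coefficients c and an evaluator for the interpolant s."""
    r = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    c = np.linalg.solve(phi(r), f)
    def s(x):
        return phi(np.linalg.norm(X - x, axis=-1)) @ c
    return c, s

phi = lambda r: np.exp(-r**2)               # Gaussian RBF (positive definite)
X = np.array([[0.0], [0.5], [1.0], [2.0]])  # scattered nodes
f = np.sin(X[:, 0])                         # data values at the nodes
c, s = rbf_interpolate(X, f, phi)

# The interpolant reproduces the data at the nodes (up to rounding).
print(max(abs(s(x) - fi) for x, fi in zip(X, f)))  # ~0
```

The local error estimates of the paper then bound |s(x) − f(x)| *away* from the nodes in terms of the local data density h and the Kriging function.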
A Study on Sigmoid Kernels for SVM and the Training of non-PSD Kernels by SMO-type Methods
, 2003
Abstract

Cited by 94 (5 self)
The sigmoid kernel was quite popular for support vector machines due to its origin in neural networks. However, as the kernel matrix may not be positive semidefinite (PSD), it is not widely used and its behavior is unknown. In this paper, we analyze such non-PSD kernels from the point of view of separability. Based on an investigation of parameters in different ranges, we show that for some parameters the kernel matrix is conditionally positive definite (CPD), a property which explains its practical viability. Experiments are given to illustrate our analysis. Finally, we discuss how to solve the nonconvex dual problems by SMO-type decomposition methods. Suitable modifications for any symmetric non-PSD kernel matrices are proposed with convergence proofs.
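Whether a given sigmoid kernel matrix is PSD can be checked directly from its eigenvalues; a small numpy sketch, where the parameter values a and r and the random data are illustrative assumptions (the paper characterizes which parameter ranges yield CPD matrices):

```python
import numpy as np

def sigmoid_kernel_matrix(X, a, r):
    """Sigmoid kernel: K_ij = tanh(a <x_i, x_j> + r)."""
    return np.tanh(a * (X @ X.T) + r)

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 4))
K = sigmoid_kernel_matrix(X, a=1.0, r=0.0)

# K is symmetric, so eigvalsh applies; K is PSD iff all eigenvalues
# are nonnegative (up to rounding). Depending on a, r and the data,
# the smallest eigenvalue may well be negative.
eigvals = np.linalg.eigvalsh(K)
print("smallest eigenvalue:", eigvals.min())
print("PSD:", eigvals.min() >= -1e-10)
```

Checking conditional positive definiteness would additionally require restricting K to the subspace orthogonal to the all-ones vector, which is the weaker property the paper shows suffices in practice.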
Priors, Stabilizers and Basis Functions: from regularization to radial, tensor and additive splines
, 1993
Abstract

Cited by 86 (14 self)
We had previously shown that regularization principles lead to approximation schemes which are equivalent to networks with one layer of hidden units, called Regularization Networks. In particular, we had discussed how standard smoothness functionals lead to a subclass of regularization networks, the well-known Radial Basis Functions approximation schemes. In this paper we show that regularization networks encompass a much broader range of approximation schemes, including many of the popular general additive models and some of the neural networks. In particular, we introduce new classes of smoothness functionals that lead to different classes of basis functions. Additive splines as well as some tensor product splines can be obtained from appropriate classes of smoothness functionals. Furthermore, the same extension that leads from Radial Basis Functions (RBF) to Hyper Basis Functions (HBF) also leads from additive models to ridge approximation models, containing as special cases Breiman's hinge functions and some forms of Projection Pursuit Regression. We propose to use the term Generalized Regularization Networks for this broad class of approximation schemes that follow from an extension of regularization. In the probabilistic interpretation of regularization, the different classes of basis functions correspond to different classes of prior probabilities on the approximating function spaces, and therefore to different types of smoothness assumptions. In the final part of the paper, we show the relation between activation functions of the Gaussian and sigmoidal type by considering the simple case of the kernel G(x) = x. In summary,