Results 1 - 10 of 68
Regularization Theory and Neural Networks Architectures
Neural Computation, 1995
Cited by 257 (30 self)
We had previously shown that regularization principles lead to approximation schemes which are equivalent to networks with one layer of hidden units, called Regularization Networks. In particular, standard smoothness functionals lead to a subclass of regularization networks, the well known Radial Basis Functions approximation schemes. This paper shows that regularization networks encompass a much broader range of approximation schemes, including many of the popular general additive models and some of the neural networks. In particular, we introduce new classes of smoothness functionals that lead to different classes of basis functions. Additive splines as well as some tensor product splines can be obtained from appropriate classes of smoothness functionals. Furthermore, the same generalization that extends Radial Basis Functions (RBF) to Hyper Basis Functions (HBF) also leads from additive models to ridge approximation models, containing as special cases Breiman's hinge functions, som...
Regularization networks and support vector machines
Advances in Computational Mathematics, 2000
Cited by 215 (28 self)
Regularization Networks and Support Vector Machines are techniques for solving certain problems of learning from examples – in particular the regression problem of approximating a multivariate function from sparse data. Radial Basis Functions, for example, are a special case of both regularization and Support Vector Machines. We review both formulations in the context of Vapnik’s theory of statistical learning which provides a general foundation for the learning problem, combining functional analysis and statistics. The emphasis is on regression: classification is treated as a special case.
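As a rough illustration of the regularization-network side of this abstract, the sketch below fits a Gaussian Radial Basis Function expansion f(x) = sum_i c_i K(x, x_i) by solving a regularised linear system. The function names, the kernel width `sigma`, the toy data, and the `lam * n` scaling of the penalty are assumptions made for this example; none of them is specified in the abstract.

```python
import numpy as np


def gaussian_kernel(a, b, sigma=0.2):
    """Gaussian RBF kernel matrix between two 1-D point sets."""
    diff = a[:, None] - b[None, :]
    return np.exp(-diff ** 2 / (2.0 * sigma ** 2))


def fit_regularization_network(x, y, lam=1e-3, sigma=0.2):
    """Solve (K + lam * n * I) c = y for the expansion coefficients c."""
    n = len(x)
    K = gaussian_kernel(x, x, sigma)
    return np.linalg.solve(K + lam * n * np.eye(n), y)


def predict(x_train, c, x_new, sigma=0.2):
    """Evaluate f(x) = sum_i c_i K(x, x_i) at the new points."""
    return gaussian_kernel(x_new, x_train, sigma) @ c


# Toy regression problem (assumed data, for illustration only).
x_train = np.linspace(0.0, 1.0, 5)
y_train = np.sin(2.0 * np.pi * x_train)

# With a very small regularization parameter the fit nearly interpolates.
coeffs = fit_regularization_network(x_train, y_train, lam=1e-8)
fitted = predict(x_train, coeffs, x_train)
```

Larger values of `lam` trade fidelity to the data for smoothness, which is the role the regularization parameter plays in this family of methods.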
Correlation-based feature selection for machine learning
1998
Cited by 86 (3 self)
A central problem in machine learning is identifying a representative set of features from which to construct a classification model for a particular task. This thesis addresses the problem of feature selection for machine learning through a correlation based approach. The central hypothesis is that good feature sets contain features that are highly correlated with the class, yet uncorrelated with each other. A feature evaluation formula, based on ideas from test theory, provides an operational definition of this hypothesis. CFS (Correlation based Feature Selection) is an algorithm that couples this evaluation formula with an appropriate correlation measure and a heuristic search strategy. CFS was evaluated by experiments on artificial and natural datasets. Three machine learning algorithms were used: C4.5 (a decision tree learner), IB1 (an instance based learner), and naive Bayes. Experiments on artificial datasets showed that CFS quickly identifies and screens irrelevant, redundant, and noisy features, and identifies relevant features as long as their relevance does not strongly depend on other features. On natural domains, CFS typically eliminated well over half the features. In most cases, classification accuracy using the reduced feature set equaled or bettered accuracy using the complete feature set.
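The central hypothesis can be made concrete with a small sketch. The abstract does not state the evaluation formula; the code below assumes the merit heuristic usually associated with CFS, merit = k * r_cf / sqrt(k + k * (k - 1) * r_ff), where k is the subset size, r_cf the mean feature-class correlation, and r_ff the mean feature-feature correlation. The function name and the example values are hypothetical.

```python
import math


def cfs_merit(class_corrs, mean_inter_corr):
    """Merit of a feature subset under the assumed CFS-style heuristic.

    class_corrs: correlation of each feature in the subset with the class.
    mean_inter_corr: mean pairwise correlation among the features (0..1).
    """
    k = len(class_corrs)
    if k == 0:
        return 0.0
    mean_class_corr = sum(abs(r) for r in class_corrs) / k
    return k * mean_class_corr / math.sqrt(k + k * (k - 1) * mean_inter_corr)


# Two equally predictive features: the score is higher when they are
# nearly uncorrelated with each other than when they are redundant.
merit_independent = cfs_merit([0.6, 0.6], 0.1)
merit_redundant = cfs_merit([0.6, 0.6], 0.9)
```

Under this assumed formula, adding a feature that is fully redundant with one already selected does not raise the score, which matches the screening behaviour the abstract describes.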
A unified framework for Regularization Networks and Support Vector Machines
1999
Cited by 40 (11 self)
This report describes research done at the Center for Biological & Computational Learning and the Artificial Intelligence Laboratory of the Massachusetts Institute of Technology. This research was sponsored by the National Science Foundation under contract No. IIS-9800032, the Office of Naval Research under contract No. N00014-93-1-0385 and contract No. N00014-95-1-0600. Partial support was also provided by Daimler-Benz AG, Eastman Kodak, Siemens Corporate Research, Inc., ATR and AT&T.
Contents
1 Introduction
2 Overview of statistical learning theory
2.1 Uniform Convergence and the Vapnik-Chervonenkis bound
2.2 The method of Structural Risk Minimization
2.3 ε-uniform convergence and the V_γ ...
2.4 Overview of our approach
3 Reproducing Kernel Hilbert Spaces: a brief overview
4 Regularization Networks
4.1 Radial Basis Functions
4.2 Regularization, generalized splines and kernel smoothers
4.3 Dual representation of Regularization Networks
4.4 From regression to ...
5 Support vector machines
5.1 SVM in RKHS
5.2 From regression to ...
6 SRM for RNs and SVMs
6.1 SRM for SVM Classification
6.1.1 Distribution dependent bounds for SVMC
7 A Bayesian Interpretation of Regularization and SRM?
7.1 Maximum A Posteriori Interpretation of ...
7.2 Bayesian interpretation of the stabilizer in the RN and SVM functionals
7.3 Bayesian interpretation of the data term in the Regularization and SVM functionals
7.4 Why a MAP interpretation may be misleading
8 Connections between SVMs and Sparse Ap...
Blur Identification by the Method of Generalized Cross-Validation
IEEE Trans. Image Processing, 1991
Cited by 40 (1 self)
The point-spread function (PSF) of a blurred image is often unknown a priori --- the blur must first be identified from the degraded image data before restoring the image. We introduce generalized cross-validation (GCV) to address the blur identification problem. Motivated by the success of GCV in identifying optimal smoothing parameters for image restoration, we have extended the method to the problem of identifying blur parameters as well. The GCV criterion identifies model parameters for the blur, the image, and the regularization parameter, providing all the information necessary to restore the image. Experiments are presented which show that GCV is capable of yielding good identification results. Furthermore, a comparison of the GCV criterion to maximum likelihood (ML) estimation shows that GCV often outperforms ML in identifying the blur and image model parameters. To appear in IEEE Transactions on Image Processing. This work was supported in part by the Joint Services Electroni...
Subspace information criterion for model selection
Neural Computation, 2001
Cited by 27 (16 self)
The problem of model selection is considerably important for acquiring higher levels of generalization capability in supervised learning. In this paper, we propose a new criterion for model selection called the subspace information criterion (SIC), which is a generalization of Mallows' C_L. It is assumed that the learning target function belongs to a specified functional Hilbert space and the generalization error is defined as the Hilbert space squared norm of the difference between the learning result function and target function. SIC gives an unbiased estimate of the generalization error so defined. SIC assumes the availability of an unbiased estimate of the target function and the noise covariance matrix, which are generally unknown. A practical calculation method of SIC for least mean squares learning is provided under the assumption that the dimension of the Hilbert space is less than the number of training examples. Finally, computer simulations in two examples show that SIC works well even when the number of training examples is small.
Regularisation in the Selection of Radial Basis Function Centres
Neural Computation, 1995
Cited by 24 (7 self)
Subset selection and regularisation are two well known techniques which can improve the generalisation performance of nonparametric linear regression estimators, such as radial basis function networks. This paper examines regularised forward selection (RFS) -- a combination of forward subset selection and zero-order regularisation. An efficient implementation of RFS into which either delete-1 or generalised cross-validation can be incorporated and a re-estimation formula for the regularisation parameter are also discussed. Simulation studies are presented which demonstrate improved generalisation performance due to regularisation in the forward selection of radial basis function centres.
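To illustrate how generalised cross-validation can guide the choice of a zero-order (ridge) regularisation parameter, the sketch below scores a fixed set of radial basis function centres over a small grid of parameter values. It uses the standard GCV form n * ||(I - H) y||^2 / trace(I - H)^2 and is not the paper's efficient RFS implementation. The data, centres, basis width, and grid are all assumptions made for the example.

```python
import numpy as np


def gcv_score(Phi, y, lam):
    """Generalised cross-validation score for ridge regression.

    Phi: design matrix (n samples x m basis functions).
    lam: zero-order regularisation parameter.
    """
    n, m = Phi.shape
    A = Phi.T @ Phi + lam * np.eye(m)
    H = Phi @ np.linalg.solve(A, Phi.T)  # hat matrix mapping y to fitted values
    residual = y - H @ y
    return n * float(residual @ residual) / (n - np.trace(H)) ** 2


# Assumed toy problem: noisy sine sampled at 30 points, 8 Gaussian centres.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 30)
centres = np.linspace(0.0, 1.0, 8)
Phi = np.exp(-(x[:, None] - centres[None, :]) ** 2 / (2.0 * 0.15 ** 2))
y = np.sin(2.0 * np.pi * x) + 0.1 * rng.standard_normal(30)

# Pick the candidate value with the lowest GCV score.
lams = [1e-4, 1e-3, 1e-2, 1e-1, 1.0]
scores = {lam: gcv_score(Phi, y, lam) for lam in lams}
best_lam = min(scores, key=scores.get)
```

In a forward-selection setting the same score could be recomputed each time a centre is added, which is the kind of repeated evaluation an efficient implementation would aim to make cheap.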
Efficient Leave-One-Out Cross-Validation of Kernel Fisher Discriminant Classifiers
Pattern Recognition, 2003
Cited by 17 (3 self)
Mika et al. [1] apply the "kernel trick" to obtain a non-linear variant of Fisher's linear discriminant analysis method, demonstrating state-of-the-art performance on a range of benchmark datasets. We show that leave-one-out cross-validation of kernel Fisher discriminant classifiers can be implemented with a computational complexity of only O(l^3) operations rather than the O(l^4) of a naïve implementation, where l is the number of training patterns. Leave-one-out cross-validation then becomes an attractive means of model selection in large-scale applications of kernel Fisher discriminant analysis, being significantly faster than conventional k-fold cross-validation procedures commonly used.
Adaptive spectral methods for simulation output analysis
IBM Journal of Research and Development, 1981
Cited by 14 (0 self)
This paper addresses two central problems in simulation methodology: the generation of confidence intervals for the steady state means of the output sequences and the sequential use of these confidence intervals to control the run length. The variance of the sample mean of a covariance stationary process is given approximately by p(0)/N, where p(f) is the spectral density at frequency f and N is the sample size. In an earlier paper we developed a method of confidence interval generation based on the estimation of p(0) through the least squares fit of a quadratic to the logarithm of the periodogram. This method was applied in a run length control procedure to a sequence of batched means. As the run length increased the batch means were rebatched into larger batch sizes so as to limit storage requirements. In this rebatching the shape of the spectral density changes, gradually becoming flat as N increases. Quadratics were chosen as a compromise between small sample bias and large sample stability. In this paper we consider smoothing techniques which adapt to the changing spectral shape in an attempt to improve both the small and large sample behavior of the method. The techniques considered are polynomial smoothing with the degree selected sequentially using standard regression statistics, polynomial smoothing with the degree selected by cross validation, and smoothing splines with the amount of smoothing determined by cross validation. These techniques were empirically evaluated both for fixed sample sizes and when incorporated into the sequential run length control procedure. For fixed sample sizes they did not improve the small sample behavior and only marginally improved the large sample behavior when compared with the quadratic method. Their performance in the sequential procedure was unsatisfactory. Hence, the straightforward quadratic technique recommended in the earlier paper is still recommended as an effective, practical technique for simulation confidence interval generation and run length control.

