Microchoice Bounds and Self Bounding Learning Algorithms
- Machine Learning
, 2001
Cited by 12 (0 self)
A major topic in machine learning is to determine good upper bounds on the true error rates of learned hypotheses based upon their empirical performance on training data. In this paper, we demonstrate new adaptive bounds designed for learning algorithms that operate by making a sequence of choices. These bounds, which we call Microchoice bounds, are similar to Occam-style bounds and can be used to make learning algorithms self-bounding in the style of Freund [Fre98]. We then show how to combine these bounds with Freund's query-tree approach producing a version of Freund's query-tree structure that can be implemented with much more algorithmic efficiency.
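As a rough illustration of what an Occam-style bound over a sequence of choices can look like, the sketch below uses the standard Hoeffding-based Occam bound, with each step's choice charged ln(number of options). The function names and the uniform weighting of options are assumptions made here for illustration; the abstract does not give the paper's exact formula.

```python
import math


def occam_bound(empirical_error, m, log_inverse_prior, delta):
    """Hoeffding-style Occam bound on the true error.

    With probability at least 1 - delta over m i.i.d. examples,
    true_error <= empirical_error + sqrt((ln(1/P(h)) + ln(1/delta)) / (2m)),
    where log_inverse_prior = ln(1/P(h)).
    """
    slack = (log_inverse_prior + math.log(1.0 / delta)) / (2.0 * m)
    return empirical_error + math.sqrt(slack)


def choice_sequence_log_inverse_prior(choice_set_sizes):
    """ln(1/P(h)) when P(h) is the product of 1/|C_i| over the choices made.

    Assumption for this sketch: every option in a choice set is weighted
    uniformly, so a step with k options costs ln(k).
    """
    return sum(math.log(k) for k in choice_set_sizes)


# Hypothetical learner that made three binary choices on 1000 examples.
cost = choice_sequence_log_inverse_prior([2, 2, 2])
bound = occam_bound(empirical_error=0.10, m=1000, log_inverse_prior=cost, delta=0.05)
```

Under these assumptions, a learner that makes few choices from small option sets pays a small complexity penalty, which is what makes the bound adaptive to the run of the algorithm.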
Support vector machines for dyadic data
- Neural Computation
Cited by 12 (3 self)
We describe a new technique for the analysis of dyadic data, where two sets of objects (“row” and “column” objects) are characterized by a matrix of numerical values which describe their mutual relationships. The new technique, called “Potential Support Vector Machine” (P-SVM), is a large-margin method for the construction of classifiers and regression functions for the “column” objects. Contrary to standard support vector machine approaches, the P-SVM minimizes a scale-invariant capacity measure and requires a new set of constraints. As a result, the P-SVM method leads to a usually sparse expansion of the classification and regression functions in terms of the “row” rather than the “column” objects and can handle data and kernel matrices which are neither positive definite nor square. We then describe two complementary regularization schemes. The first scheme improves generalization performance for classification and regression tasks; the second scheme leads to the selection of a small, informative set of “row” “support” objects and can be applied to feature selection. Benchmarks for classification, regression, and feature selection tasks are performed with toy data as well as with several real-world data sets. The results show that the new method is at least competitive with, and often performs better than, the benchmarked standard methods for standard vectorial as well as for true dyadic data sets. In addition, a theoretical justification is provided for the new approach.
Support Vector Methods in Learning and Feature Extraction
, 1998
Cited by 9 (1 self)
Recent years have witnessed an increasing interest in Support Vector (SV) machines, which use Mercer kernels for efficiently performing computations in high-dimensional spaces. In pattern recognition, the SV algorithm constructs nonlinear decision functions by training a classifier to perform a linear separation in some high-dimensional space which is nonlinearly related to input space. Recently, we have developed a technique for Nonlinear Principal Component Analysis (Kernel PCA) based on the same types of kernels. In this way we can, for instance, efficiently extract polynomial features of arbitrary order by computing projections onto principal components in the space of all products of n pixels of images. We explain the idea of Mercer kernels and associated feature spaces, and describe connections to the theory of reproducing kernels and to regularization theory, followed by an overview of the above algorithms employing these kernels.
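The computational point behind working in "the space of all products of n pixels" can be checked on a small example: the homogeneous polynomial kernel (x . y)^n equals the inner product of the explicit vectors of all ordered n-fold products of coordinates, without ever building those vectors. The sketch below is a minimal illustration of that identity only (it is not Kernel PCA itself), and the function names are chosen here for illustration.

```python
from itertools import product


def dot(x, y):
    """Plain inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(x, y))


def poly_kernel(x, y, degree):
    """Homogeneous polynomial kernel k(x, y) = (x . y)^degree."""
    return dot(x, y) ** degree


def product_features(x, degree):
    """Explicit feature map: all ordered degree-fold products of coordinates.

    The result has len(x) ** degree entries, which is what the kernel avoids.
    """
    feats = []
    for idx in product(range(len(x)), repeat=degree):
        value = 1.0
        for i in idx:
            value *= x[i]
        feats.append(value)
    return feats


x = [1.0, 2.0, 3.0]
y = [0.5, -1.0, 2.0]
explicit = dot(product_features(x, 2), product_features(y, 2))  # 9-dimensional feature space
implicit = poly_kernel(x, y, 2)                                 # one 3-dimensional dot product
```

Both values agree, but the explicit map grows as len(x) ** degree, whereas the kernel evaluation stays linear in the input dimension.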
Detecting and interpreting acoustic features by support vector machines (Tech
- University of Chicago Computer Science Dept
, 2002
Cited by 5 (0 self)
Any approach to speech perception or recognition will have to specify a mechanism by means of which the acoustic input is mapped to discrete linguistic objects or symbols. In most conventional speech recognition systems, the primitive linguistic objects are taken to be phonemes. We are pursuing an approach that considers the primary linguistic objects to be distinctive features (Jakobson et al., 1952) that will then need to be recovered from the speech signal. We proceed by developing detectors for various distinctive ...
Sample Based Generalization Bounds
, 1999
Cited by 2 (1 self)
It is known that the covering numbers of a function class on a double sample (length 2m, where m is the number of points in the sample) can be used to bound the generalization performance of a classifier by using a margin based analysis. Traditionally this has been done using a "Sauer-like" relationship involving a combinatorial dimension such as the fat-shattering dimension. In this paper we show that one can utilize an analogous argument in terms of the observed covering numbers on a single m-sample (being the actual observed data points). The significance of this is that for certain interesting classes of functions, such as support vector machines, one can readily estimate the empirical covering numbers quite well. We show how to do so in terms of the eigenvalues of the Gram matrix created from the data. These covering numbers can be much less than a priori bounds indicate in situations where the particular data received is "easy". The work can be considered an extension of previous results which provided generalization performance bounds in terms of the VC-dimension of the class of hypotheses restricted to the sample, with the considerable advantage that the covering numbers can be readily computed, and they often are small.
Classification, Regression, and Feature Selection on Matrix Data
, 2004
Cited by 1 (1 self)
We describe a new technique for the analysis of data which is given in matrix form. We consider two sets of objects, the "row" and the "column" objects, and we represent these objects by a matrix of numerical values which describe their mutual relationships. We then introduce a new technique, the "Potential Support Vector Machine" (P-SVM), as a large-margin method for the construction of classifiers and regression functions for the "column" objects. Contrary to standard support vector machine (SVM) approaches, the P-SVM minimizes a scale-invariant capacity measure under a new set of constraints. As a result, the P-SVM can handle data matrices which are neither positive definite nor square, and leads to a usually sparse expansion of the classification boundary or the regression function in terms of the "row" rather than the "column" objects. We introduce two complementary regularization schemes in order to avoid overfitting for noisy data sets. The first scheme improves generalization performance for classification and regression problems; the second scheme leads to the selection of a small and informative set of "row" objects and can be applied to feature selection. A fast optimization algorithm based on the "Sequential Minimal Optimization" (SMO) technique is provided. We first apply
Learning via Internal Representation (Extended Abstract)
, 1998
Eli Dichterman, Department of Mathematics, London School of Economics, and Department of Computer Science, Royal Holloway University of London (e-mail: eli@cdam.lse.ac.uk). NeuroCOLT2 Technical Report Series NC2-TR-1998-009, May 1998; received 11-MAY-1998. Produced as part of the ESPRIT Working Group in Neural and Computational Learning II, NeuroCOLT2 27150. We present a learning framework based on reducing a learning task to the problem of finding a good internal representation of the input examples; a good internal representation is a set of features, relative to which a simple generalization rule, ...
Large Margin Classification
, 1999
The Vapnik-Chervonenkis dimension is the point at which the graph stops being linear: $\mathrm{VCdim}(H) = \max\{m : \text{for some } x_1, \dots, x_m, \text{ for all } b \in \{-1, 1\}^m,\ \exists h_b \in H,\ h_b(x_i) = b_i\}$. For linear functions $L$ in $\mathbb{R}^n$, $\mathrm{VCdim}(L) = n + 1$. Sauer's Lemma: $B_H(m) \le \sum_{i=0}^{d} \binom{m}{i} \le \left(\frac{em}{d}\right)^d$, where $m \ge d = \mathrm{VCdim}(H)$.
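Sauer's Lemma bounds the growth function B_H(m) (the number of distinct labelings H can produce on m points) by the sum of binomial coefficients C(m, i) for i = 0..d, which in turn is at most (em/d)^d when m >= d = VCdim(H). A minimal numerical check of both quantities follows; the function names are chosen here for illustration.

```python
import math


def sauer_sum(m, d):
    """Sum of C(m, i) for i = 0..d: Sauer's bound on the growth function."""
    return sum(math.comb(m, i) for i in range(d + 1))


def sauer_closed_form(m, d):
    """The looser closed form (e * m / d) ** d, valid for m >= d >= 1."""
    if d < 1 or m < d:
        raise ValueError("closed form requires m >= d >= 1")
    return (math.e * m / d) ** d


# For m <= d the sum equals 2 ** m (every labeling is possible);
# beyond d it grows only polynomially in m.
small = sauer_sum(3, 3)
large = sauer_sum(10, 3)
```

For m = 10 and d = 3 the sum is 176, far below the 2^10 = 1024 labelings that would be possible without the VC-dimension constraint.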
A PAC bound for mixture discriminants
, 2000
Recently, McAllester [13] proved a remarkable theorem which gives a PAC-style bound on the generalization error of discriminants like the Gibbs classifier. We show how to combine this result with techniques proposed in [1] to arrive at a bound on the generalization error for arbitrary mixture discriminants over a hypothesis space.

