Online Bayes Point Machines
Cited by 55 (2 self)
We present a new and simple algorithm for learning large margin classifiers that works in a truly online manner. The algorithm generates a linear classifier by averaging the weights associated with several perceptron-like algorithms run in parallel in order to approximate the Bayes point. A random subsample of the incoming data stream is used to ensure diversity in the perceptron solutions. We experimentally study the algorithm's performance on online and batch learning settings.
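The averaging scheme described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `online_bpm`, the inclusion probability `keep_prob`, the number of perceptrons, and the plain mistake-driven update with labels in {-1, +1} are all assumptions made for the example.

```python
import numpy as np

def online_bpm(X, y, n_perceptrons=10, keep_prob=0.5, seed=0):
    """Average several perceptrons, each updated on a random subsample of the stream.

    Hypothetical sketch: keep_prob and n_perceptrons are illustrative parameters.
    """
    rng = np.random.default_rng(seed)
    W = np.zeros((n_perceptrons, X.shape[1]))  # one weight vector per perceptron
    for x_t, y_t in zip(X, y):                 # single pass over the data stream
        for j in range(n_perceptrons):
            # Coin flip: perceptron j only sees this example with probability keep_prob.
            if rng.random() >= keep_prob:
                continue
            # Classical perceptron update, applied only on a mistake.
            if y_t * (W[j] @ x_t) <= 0:
                W[j] += y_t * x_t
    # The mean of the individual solutions serves as the approximate Bayes point.
    return W.mean(axis=0)
```

A new point `x` would then be classified with `np.sign(w @ x)`, where `w` is the returned average. Because each perceptron sees a different random subsample, the individual weight vectors differ, which is what makes the average informative.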
Large Scale Transductive SVMs
- JMLR
Cited by 38 (2 self)
We show how the Concave-Convex Procedure can be applied to Transductive SVMs, which traditionally require solving a combinatorial search problem. This provides for the first time a highly scalable algorithm in the nonlinear case. Detailed experiments verify the utility of our approach. Software is available at
Gaussian Process Regression: Active Data Selection and Test Point Rejection
- In Proceedings of the International Joint Conference on Neural Networks (IJCNN), 2000
Cited by 16 (1 self)
We consider active data selection and test point rejection strategies for Gaussian process regression based on the variance of the posterior over target values. Gaussian process regression is viewed as transductive regression that provides target distributions for given points rather than selecting an explicit regression function. Since both the posterior mean and the posterior variance are easily calculated, we use this additional information to two ends. Active data selection is performed either by querying at points of high estimated posterior variance, or by querying at points that minimize the estimated posterior variance averaged over the input distribution of interest or, in a transductive manner, averaged over the test set. Test point rejection is performed using the estimated posterior variance as a confidence measure. We find for both a two-dimensional toy problem and for a real-world benchmark problem that the variance is a reasonable criterion for both active data...
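The two uses of the posterior variance named in this abstract can be illustrated with a short sketch. It is written under stated assumptions and is not the paper's code: the RBF kernel, the noise level, and the helper names `rbf`, `posterior_variance`, `next_query` and `reject` are hypothetical, and only the simplest selection rule (query the candidate with the highest variance) is shown. The variance used is the standard Gaussian process expression k(x*, x*) - k*^T (K + noise I)^{-1} k*.

```python
import numpy as np

def rbf(A, B, length_scale=1.0):
    """Squared-exponential kernel matrix between the rows of A and B (assumed kernel)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / length_scale**2)

def posterior_variance(X_train, X_cand, noise=1e-2):
    """Posterior variance of the latent function at each candidate point."""
    K = rbf(X_train, X_train) + noise * np.eye(len(X_train))
    K_s = rbf(X_train, X_cand)                 # shape (n_train, n_cand)
    v = np.linalg.solve(K, K_s)                # (K + noise I)^{-1} k_*
    # k(x*, x*) equals 1 for this kernel; subtract k_*^T (K + noise I)^{-1} k_*.
    return 1.0 - np.einsum("ij,ij->j", K_s, v)

def next_query(X_train, X_cand, noise=1e-2):
    """Active selection: index of the candidate with the highest posterior variance."""
    return int(np.argmax(posterior_variance(X_train, X_cand, noise)))

def reject(X_train, X_test, threshold, noise=1e-2):
    """Test point rejection: flag points whose variance exceeds the threshold."""
    return posterior_variance(X_train, X_test, noise) > threshold
```

Note that the variance depends only on the input locations, not on the observed targets, so candidate queries can be ranked before any new label is requested.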
On Transductive Regression
2006
Cited by 12 (1 self)
In many modern large-scale learning applications, the amount of unlabeled data far exceeds that of labeled data. A common instance of this problem is the transductive setting where the unlabeled test points are known to the learning algorithm. This paper presents a study of regression problems in that setting. It presents explicit VC-dimension error bounds for transductive regression that hold for all bounded loss functions and coincide with the tight classification bounds of Vapnik when applied to classification. It also presents a new transductive regression algorithm inspired by our bound that admits a primal and kernelized closed-form solution and deals efficiently with large amounts of unlabeled data. The algorithm exploits the position of unlabeled points to locally estimate their labels and then uses a global optimization to ensure robust predictions. Our study also includes the results of experiments with several publicly available regression data sets with up to 20,000 unlabeled examples. The comparison with other transductive regression algorithms shows that it performs well and that it can scale to large data sets.
The Kernel Gibbs Sampler
- In NIPS, 2001
Cited by 9 (3 self)
We present an algorithm that samples the hypothesis space of kernel classifiers. A uniform prior over normalised weight vectors, combined with a likelihood based on a model of label noise, leads to a piecewise constant posterior that can be sampled by the kernel Gibbs sampler (KGS). The KGS is a Markov Chain Monte Carlo method that chooses a random direction in parameter space and samples from the resulting piecewise constant density along the line chosen. The KGS can be used as an analytical tool for the exploration of Bayesian transduction, Bayes point machines, active learning, and evidence-based model selection on small data sets that are contaminated with label noise. For a simple toy example we demonstrate experimentally how a Bayes point machine based on the KGS outperforms an SVM that is incapable of taking into account label noise.
Comparing the Bayes and Typicalness Frameworks
- In Proceedings of the 12th European Conference on Machine Learning (ECML-2001), 2001
Cited by 7 (1 self)
When correct priors are known, Bayesian algorithms give optimal decisions, and accurate confidence values for predictions can be obtained. If the prior is incorrect, however, these confidence values have no theoretical base, even though the algorithms' predictive performance may be good. There also exist many successful learning algorithms which only depend on the iid assumption. Often, however, they produce no confidence values for their predictions. Bayesian frameworks are often applied to these algorithms in order to obtain such values, but they can rely on unjustified priors. In this paper we outline the typicalness framework, which can be used in conjunction with many other machine learning algorithms. The framework provides confidence information based only on the standard iid assumption and so is much more robust to different underlying data distributions. We show how the framework can be applied to existing algorithms. We also present experimental results which show that the typicalness approach performs close to Bayes when the prior is known to be correct. Unlike Bayes, however, the method still gives accurate confidence values even when different data distributions are considered.
The typicalness framework: a comparison with the Bayesian approach
- Department of Computer Science, Royal Holloway, University of London, 2001
Cited by 4 (2 self)
When correct priors are known, Bayesian algorithms give optimal decisions, and accurate confidence values for predictions can be obtained. If the prior is incorrect, however, these confidence values have no theoretical base, even though the algorithms' predictive performance may be good. There also exist many successful learning algorithms which only depend on the iid assumption. Often, however, they produce no confidence values for their predictions. Bayesian frameworks are often applied to these algorithms in order to obtain such values, but they can rely on unjustified priors. In this paper we outline the typicalness framework, which can be used in conjunction with many other machine learning algorithms. The framework provides confidence information based only on the standard iid assumption and so is much more robust to different underlying data distributions. We show how the framework can be applied to existing algorithms. We also present experimental results which show that the typicalness approach performs close to Bayes when the prior is known to be correct. Unlike Bayes, however, the method still gives accurate confidence values even when different data distributions are considered.
Large Margin vs. Large Volume in Transductive Learning
Cited by 2 (0 self)
We consider a large volume principle for transductive learning that prioritizes the transductive equivalence classes according to the volume they occupy in hypothesis space. We approximate volume maximization using a geometric interpretation of the hypothesis space. The resulting algorithm is defined via a non-convex optimization problem that can still be solved exactly and efficiently. We provide a bound on the test error of the algorithm and compare it to transductive SVM (TSVM) using 31 datasets.
The Structure of Version Space
2000
Cited by 1 (0 self)
We investigate the generalisation performance of consistent classifiers, i.e. classifiers that are contained in the so-called version space, both from a theoretical and experimental angle. In contrast to classical VC analysis, where no single classifier within version space is singled out on grounds of a generalisation error bound, the data dependent structural risk minimisation framework suggests that there exists one particular classifier that is to be preferred because it minimises the generalisation error bound. This is usually taken to provide a theoretical justification for learning algorithms such as the well known support vector machine. A reinterpretation of a recent PAC-Bayesian result, however, reveals that given a suitably chosen hypothesis space there exists a large fraction of classifiers with small generalisation error, albeit we cannot identify them for a specific learning task. In the particular case of linear classifiers we show that classifiers found by the classical perceptron algorithm have guarantees bounded by the size of version space. These results are complemented with an empirical study for kernel classifiers on the task of handwritten digit recognition, which demonstrates that even classifiers with a small margin may exhibit excellent generalisation. In order to perform this analysis we introduce the kernel Gibbs sampler, an algorithm which can be used to sample consistent kernel classifiers.
The Science of Pattern Recognition: Achievements and Perspectives
- In Challenges for Computational Intelligence, Studies in Computational Intelligence Series, 2007
Cited by 1 (1 self)
Automatic pattern recognition is usually considered as an engineering area studying the development and evaluation of systems that imitate or assist the human ability of recognizing patterns. It may, however, also be considered as a science that studies the natural phenomenon that human beings (and possibly other biological systems) are able to discover, distinguish and characterize patterns in their environment, and identify new observations accordingly. The engineering approach to pattern recognition is in this view an attempt to build systems that simulate this phenomenon. In this way, scientific understanding is gained of what is needed in order to recognize patterns. As in any science, understanding can be gained from different, sometimes opposite viewpoints. We will introduce the main approaches to the science of pattern recognition as two dichotomies of complementary scenarios, giving rise to four different schools. These schools are roughly defined under the terms of expert systems, neural networks, structural and statistical pattern recognition. We will briefly describe what has been achieved by these schools, what is common and what is specific, which limitations are encountered and which perspectives arise for the future. Finally, we will focus on the challenges facing pattern recognition in the decades to come. Most of them concern weakening assumptions so that procedures for learning and recognition become more widely applicable; others require the development of new formalisms.

