Results 1-10 of 12
Statistical pattern recognition: A review
 IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE
, 2000
Cited by 1035 (30 self)
The primary goal of pattern recognition is supervised or unsupervised classification. Among the various frameworks in which pattern recognition has been traditionally formulated, the statistical approach has been most intensively studied and used in practice. More recently, neural network techniques and methods imported from statistical learning theory have been receiving increasing attention. The design of a recognition system requires careful attention to the following issues: definition of pattern classes, sensing environment, pattern representation, feature extraction and selection, cluster analysis, classifier design and learning, selection of training and test samples, and performance evaluation. In spite of almost 50 years of research and development in this field, the general problem of recognizing complex patterns with arbitrary orientation, location, and scale remains unsolved. New and emerging applications, such as data mining, web searching, retrieval of multimedia data, face recognition, and cursive handwriting recognition, require robust and efficient pattern recognition techniques. The objective of this review paper is to summarize and compare some of the well-known methods used in various stages of a pattern recognition system and identify research topics and applications which are at the forefront of this exciting and challenging field.
Sparse Bayesian Learning and the Relevance Vector Machine
, 2001
Cited by 966 (5 self)
This paper introduces a general Bayesian framework for obtaining sparse solutions to regression and classification tasks utilising models linear in the parameters. Although this framework is fully general, we illustrate our approach with a particular specialisation that we denote the 'relevance vector machine' (RVM), a model of identical functional form to the popular and state-of-the-art 'support vector machine' (SVM). We demonstrate that by exploiting a probabilistic Bayesian learning framework, we can derive accurate prediction models which typically utilise dramatically fewer basis functions than a comparable SVM while offering a number of additional advantages. These include the benefits of probabilistic predictions, automatic estimation of 'nuisance' parameters, and the facility to utilise arbitrary basis functions (e.g. non-'Mercer' kernels). We detail the Bayesian framework and associated learning algorithm for the RVM, and give some illustrative examples of its application along with some comparative benchmarks. We offer some explanation for the exceptional degree of sparsity obtained, and discuss and demonstrate some of the advantageous features, and potential extensions, of Bayesian relevance learning.
The Relevance Vector Machine
, 2000
Cited by 294 (6 self)
The support vector machine (SVM) is a state-of-the-art technique for regression and classification, combining excellent generalisation properties with a sparse kernel representation. However, it does suffer from a number of disadvantages, notably the absence of probabilistic outputs, the requirement to estimate a trade-off parameter and the need to utilise 'Mercer' kernel functions. In this paper we introduce the Relevance Vector Machine (RVM), a Bayesian treatment of a generalised linear model of identical functional form to the SVM. The RVM suffers from none of the above disadvantages, and examples demonstrate that for comparable generalisation performance, the RVM requires dramatically fewer kernel functions.
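The "identical functional form" mentioned in this abstract is a weighted sum of kernel functions plus a bias. The sketch below illustrates only that prediction step; the Gaussian kernel, the `width` parameter, and the example weights are assumptions chosen for illustration, and no Bayesian training is performed.

```python
import math


def gaussian_kernel(x, z, width=1.0):
    """Gaussian kernel between two points (an illustrative kernel choice)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-squared_distance / width ** 2)


def predict(x, basis_points, weights, bias=0.0):
    """Evaluate y(x) = bias + sum_i w_i * K(x, x_i).

    Only the retained basis points are passed in, so a sparse model
    simply means a short list here.
    """
    return bias + sum(
        w * gaussian_kernel(x, z) for z, w in zip(basis_points, weights)
    )


# Hypothetical sparse model: two retained kernel functions.
basis_points = [[0.0], [3.0]]
weights = [2.0, -1.0]
y_at_origin = predict([0.0], basis_points, weights, bias=0.5)
```

In this form, a sparser model means fewer kernel evaluations per prediction, which is the practical benefit the abstract claims for the RVM.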
Fast Marginal Likelihood Maximisation for Sparse Bayesian Models
 Proceedings of the Ninth International Workshop on Artificial Intelligence and Statistics
, 2003
Cited by 117 (0 self)
The 'sparse Bayesian' modelling approach, as exemplified by the 'relevance vector machine', enables sparse classification and regression functions to be obtained by linearly-weighting a small number of fixed basis functions from a large dictionary of potential candidates. Such a model conveys a number of advantages over the related and very popular 'support vector machine', but the necessary 'training' procedure (optimisation of the marginal likelihood function) is typically much slower. We describe a new and highly accelerated algorithm which exploits recently-elucidated properties of the marginal likelihood function to enable maximisation via a principled and efficient sequential addition and deletion of candidate basis functions.
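The abstract does not spell out the add/delete criterion. The sketch below assumes the rule usually associated with this algorithm: each candidate basis function has a "sparsity factor" s and a "quality factor" q, and it belongs in the model only when q² > s. The names `sparsity` and `quality`, and the omission of how these factors are computed from the data, are simplifications made here for illustration.

```python
import math


def optimal_alpha(sparsity, quality):
    """Hyperparameter value maximising the marginal likelihood for one basis.

    Assumed rule: alpha = s^2 / (q^2 - s) when q^2 > s, otherwise infinity,
    which corresponds to pruning the basis function from the model.
    """
    theta = quality ** 2 - sparsity
    if theta > 0:
        return sparsity ** 2 / theta
    return math.inf


def choose_action(in_model, sparsity, quality):
    """Decide what to do with one candidate basis function."""
    alpha = optimal_alpha(sparsity, quality)
    if math.isinf(alpha):
        return "delete" if in_model else "skip"
    return "re-estimate" if in_model else "add"


# Hypothetical factor values for three candidates.
decisions = [
    choose_action(False, sparsity=1.0, quality=2.0),  # q^2 > s, not in model
    choose_action(True, sparsity=4.0, quality=1.0),   # q^2 <= s, in model
    choose_action(True, sparsity=1.0, quality=2.0),   # q^2 > s, in model
]
```

Each decision touches a single basis function, which is what allows the model to grow from an empty set instead of pruning down from the full dictionary.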
Committee-Based Sample Selection For Probabilistic Classifiers
 Journal of Artificial Intelligence Research
, 1999
Cited by 65 (0 self)
In many real-world learning tasks it is expensive to acquire a sufficient number of labeled examples for training. This paper investigates methods for reducing annotation cost by sample selection. In this approach, during training the learning program examines many unlabeled examples and selects for labeling only those that are most informative at each stage. This avoids redundantly labeling examples that contribute little new information. Our work follows on previous research on Query By Committee, and extends the committee-based paradigm to the context of probabilistic classification. We describe a family of empirical methods for committee-based sample selection in probabilistic classification models, which evaluate the informativeness of an example by measuring the degree of disagreement between several model variants. These variants (the committee) are drawn randomly from a probability distribution conditioned by the training set labeled so far. The method was applied to...
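One way to quantify "degree of disagreement" is the entropy of the committee's votes. The sketch below assumes that measure and a simple threshold rule. The example labels, the threshold value, and the function names are hypothetical, and the step of drawing committee members from the posterior distribution is not shown.

```python
import math
from collections import Counter


def vote_entropy(votes):
    """Entropy (in bits) of the labels assigned by the committee members."""
    total = len(votes)
    counts = Counter(votes)
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


def select_for_labeling(committee_votes, threshold):
    """Return indices of examples whose disagreement exceeds the threshold."""
    return [
        index
        for index, votes in enumerate(committee_votes)
        if vote_entropy(votes) > threshold
    ]


# Each inner list holds the labels that three committee members assigned
# to one unlabeled example (hypothetical labels).
committee_votes = [
    ["N", "N", "N"],  # full agreement: little to learn from a label
    ["N", "V", "N"],  # one dissenting member
    ["N", "V", "A"],  # full disagreement: most informative
]
selected = select_for_labeling(committee_votes, threshold=0.5)
```

Examples on which the committee agrees are skipped, so annotation effort goes to the cases the current model variants cannot settle.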
Moderating the Outputs of Support Vector Machine Classifiers
 IEEE Transactions on Neural Networks
, 1999
Cited by 55 (3 self)
In this paper, we extend the use of moderated outputs to the support vector machine (SVM) by making use of a relationship between SVM and the evidence framework. The moderated output is more in line with the Bayesian idea that the posterior weight distribution should be taken into account upon prediction, and it also alleviates the usual tendency of assigning overly high confidence to the estimated class memberships of the test patterns. Moreover, the moderated output derived here can be taken as an approximation to the posterior class probability. Hence, meaningful rejection thresholds can be assigned and outputs from several networks can be directly compared. Experimental results on both artificial and real-world data are also discussed.
Keywords: Support vector machine, Evidence framework, Moderated output, Bayesian
I. Introduction
In recent years, there has been a lot of interest in studying the support vector machine (SVM) [1], [2], [3], [4], [5], [6], [7]. SVM is based on the i...
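The abstract does not give the formula for the moderated output. The sketch below assumes the approximation commonly used in the evidence framework, in which the activation is shrunk by a factor that depends on its posterior variance before passing through the sigmoid. Whether the paper uses exactly this form for the SVM is an assumption, and the argument names are hypothetical.

```python
import math


def moderated_output(activation, variance):
    """Approximate the posterior class probability for one test pattern.

    Assumed form: sigmoid(kappa * a) with kappa = 1 / sqrt(1 + pi * s^2 / 8),
    where a is the most probable activation and s^2 its posterior variance.
    """
    kappa = 1.0 / math.sqrt(1.0 + math.pi * variance / 8.0)
    return 1.0 / (1.0 + math.exp(-kappa * activation))


# With zero variance the output equals the plain sigmoid; larger variance
# pulls the output towards 0.5, i.e. towards less confident predictions.
unmoderated = moderated_output(2.0, 0.0)
moderated = moderated_output(2.0, 4.0)
```

Because the result can be read as a probability, a rejection threshold such as "abstain when the output lies between 0.4 and 0.6" has a direct interpretation.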
Bayesian Neural Networks and Density Networks
 Nuclear Instruments and Methods in Physics Research, A
, 1994
Cited by 48 (7 self)
This paper reviews the Bayesian approach to learning in neural networks, then introduces a new adaptive model, the density network. This is a neural network for which target outputs are provided, but the inputs are unspecified. When a probability distribution is placed on the unknown inputs, a latent variable model is defined that is capable of discovering the underlying dimensionality of a data set. A Bayesian learning algorithm for these networks is derived and demonstrated.
1 Introduction to the Bayesian view of learning
A binary classifier is a parameterized mapping from an input x to an output $y \in [0, 1]$; when its parameters w are specified, the classifier states the probability that an input x belongs to class t = 1, rather than the alternative t = 0. Consider a binary classifier which models the probability as a sigmoid function of x:
$$P(t = 1 \mid x, w, \mathcal{H}) = y(x; w, \mathcal{H}) = \frac{1}{1 + e^{-w \cdot x}} \qquad (1)$$
This form of model is known to statisticians as a linear logistic model, and in the neural networks ...
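Equation (1) in the excerpt above can be evaluated directly. The following is a minimal sketch, assuming x and w are plain lists of floats of equal length; the function name and the example values are hypothetical.

```python
import math


def logistic_probability(x, w):
    """Return P(t = 1 | x, w) = 1 / (1 + exp(-w . x)) for a linear logistic model."""
    activation = sum(wi * xi for wi, xi in zip(w, x))
    # Branch on the sign so that math.exp never receives a large positive
    # argument, which would raise OverflowError.
    if activation >= 0:
        return 1.0 / (1.0 + math.exp(-activation))
    exp_a = math.exp(activation)
    return exp_a / (1.0 + exp_a)


# With all-zero weights the classifier is maximally uncertain.
p_uninformed = logistic_probability([1.0, 2.0], [0.0, 0.0])
# Here w . x = 0.5 * 1.0 + 0.25 * 2.0 = 1.0.
p_positive = logistic_probability([1.0, 2.0], [0.5, 0.25])
```

The output is always in the interval [0, 1], so it can be read as the probability of class t = 1 as described in the text.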
The Relevance Vector Machine Technique for Channel Equalization Application
 IEEE Trans. Neural Networks
, 2001
Cited by 20 (6 self)
The recently introduced relevance vector machine (RVM) technique is applied to communication channel equalization. It is demonstrated that the RVM equalizer can closely match the optimal performance of the Bayesian equalizer, with a much sparser kernel representation than is achievable by the state-of-the-art support vector machine (SVM) technique.
Keywords: Support vector machines, relevance vector machines, Bayesian classification, equalization.
Neural Networks: A Pattern Recognition Perspective
, 1996
Cited by 3 (0 self)
Introduction
Neural networks have been exploited in a wide variety of applications, the majority of which are concerned with pattern recognition in one form or another. However, it has become widely acknowledged that the effective solution of all but the simplest of such problems requires a principled treatment, in other words one based on a sound theoretical framework. From the perspective of pattern recognition, neural networks can be regarded as an extension of the many conventional techniques which have been developed over several decades. Lack of understanding of the basic principles of statistical pattern recognition lies at the heart of many of the common mistakes in the application of neural networks. In this chapter we aim to show that the 'black box' stigma of neural networks is largely unjustified, and that there is actually considerable insight available into the way in which neural networks operate, and how to use them effectively. Some of the ke