Results 1–10 of 26
On combining classifiers
 IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE
, 1998
"... We develop a common theoretical framework for combining classifiers which use distinct pattern representations and show that many existing schemes can be considered as special cases of compound classification where all the pattern representations are used jointly to make a decision. An experimental ..."
Abstract

Cited by 1392 (32 self)
We develop a common theoretical framework for combining classifiers which use distinct pattern representations and show that many existing schemes can be considered as special cases of compound classification where all the pattern representations are used jointly to make a decision. An experimental comparison of various classifier combination schemes demonstrates that the combination rule developed under the most restrictive assumptions, the sum rule, outperforms the other classifier combination schemes. A sensitivity analysis of the various schemes to estimation errors is carried out to show that this finding can be justified theoretically.
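As an illustrative sketch (not the paper's Bayesian derivation), the sum and product rules can be applied to per-classifier posterior estimates as follows; the `combine` helper and the toy posteriors are hypothetical:

```python
def combine(posteriors, rule="sum"):
    """Combine per-classifier posterior estimates for one test pattern.

    posteriors: one dict per classifier, mapping class label -> P(class | x).
    Returns the class label with the highest combined score.
    """
    classes = posteriors[0].keys()
    scores = {}
    for c in classes:
        if rule == "sum":
            scores[c] = sum(p[c] for p in posteriors)
        elif rule == "product":
            prod = 1.0
            for p in posteriors:
                prod *= p[c]
            scores[c] = prod
        else:
            raise ValueError("unknown rule: %s" % rule)
    return max(scores, key=scores.get)

# Two confident classifiers and one with a badly underestimated posterior:
votes = [{"a": 0.9, "b": 0.1},
         {"a": 0.9, "b": 0.1},
         {"a": 0.01, "b": 0.99}]
# The sum rule averages the bad estimate away and picks "a"; the product
# rule is dragged toward zero by the single near-zero factor and flips to "b".
```

This toy case mirrors the abstract's sensitivity argument: one poorly estimated posterior dominates a product but is diluted in a sum.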
An optimization criterion for generalized discriminant analysis on undersampled problems
 IEEE Trans. Pattern Analysis and Machine Intelligence
, 2004
"... Abstract—An optimization criterion is presented for discriminant analysis. The criterion extends the optimization criteria of the classical Linear Discriminant Analysis (LDA) through the use of the pseudoinverse when the scatter matrices are singular. It is applicable regardless of the relative size ..."
Abstract

Cited by 50 (9 self)
An optimization criterion is presented for discriminant analysis. The criterion extends the optimization criteria of classical Linear Discriminant Analysis (LDA) through the use of the pseudoinverse when the scatter matrices are singular. It is applicable regardless of the relative sizes of the data dimension and sample size, overcoming a limitation of classical LDA. The optimization problem can be solved analytically by applying the Generalized Singular Value Decomposition (GSVD) technique. The pseudoinverse has been suggested and used for undersampled problems in the past, where the data dimension exceeds the number of data points. The criterion proposed in this paper provides a theoretical justification for this procedure. An approximation algorithm for the GSVD-based approach is also presented. It reduces the computational complexity by finding subclusters of each cluster and using their centroids to capture the structure of each cluster. This reduced problem yields much smaller matrices to which the GSVD can be applied efficiently. Experiments on text data, with up to 7,000 dimensions, show that the approximation algorithm produces results that are close to those produced by the exact algorithm. Index Terms: classification, clustering, dimension reduction, generalized singular value decomposition, linear discriminant analysis, text mining.
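A minimal sketch of the pseudoinverse idea (not the paper's GSVD algorithm): replace the inverse of a possibly singular scatter matrix with the Moore–Penrose pseudoinverse. The `pinv_lda` helper and its `k` parameter are assumed names for illustration:

```python
import numpy as np

def pinv_lda(X, y, k=1):
    """Discriminant directions from eigenvectors of pinv(St) @ Sb.
    Usable even when the scatter matrices are singular (n_samples < dim)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    mu = X.mean(axis=0)
    Xc = X - mu
    St = Xc.T @ Xc                         # total scatter (may be singular)
    Sb = np.zeros_like(St)                 # between-class scatter
    for c in np.unique(y):
        Xi = X[y == c]
        d = (Xi.mean(axis=0) - mu)[:, None]
        Sb += len(Xi) * (d @ d.T)
    vals, vecs = np.linalg.eig(np.linalg.pinv(St) @ Sb)
    order = np.argsort(-vals.real)
    return vecs[:, order[:k]].real         # dim x k projection matrix

# Undersampled case: 4 samples in 10 dimensions, separated along feature 0.
X = [[0.0] * 10 for _ in range(4)]
X[1][0], X[2][0], X[3][0] = 0.1, 5.0, 5.1
W = pinv_lda(X, [0, 0, 1, 1])              # classical LDA's inverse fails here
```

The projection onto `W` keeps the two classes well separated even though the scatter matrices have rank far below the dimension.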
Bagging for linear classifiers
 Pattern Recognition
, 1998
"... Classifiers built on small training sets are usually biased or unstable. Different techniques exist to construct more stable classifiers. It is not clear which ones are good, and whether they really stabilize the classifier or just improve the performance. In this paper bagging (bootstrapping and ag ..."
Abstract

Cited by 29 (10 self)
Classifiers built on small training sets are usually biased or unstable. Different techniques exist to construct more stable classifiers. It is not clear which ones are good, and whether they really stabilize the classifier or just improve the performance. In this paper bagging (bootstrapping and aggregating [1]) is studied for a number of linear classifiers. A measure for the instability of classifiers is introduced. The influence of regularization and bagging on this instability and on the generalization error of linear classifiers is investigated. In a simulation study it is shown that in general bagging is not a stabilizing technique. It is also demonstrated that the instability of a classifier can be used to predict how useful bagging will be. Finally, it is shown experimentally that bagging may improve the performance of the classifier only in very unstable situations.
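The bagging procedure the abstract studies can be sketched for a toy nearest-mean linear classifier on 1-D data; the classifier, the sample data, and the round count below are illustrative assumptions, not the paper's experimental setup:

```python
import random
import statistics

def nearest_mean_fit(X, y):
    """Fit a nearest-mean (minimum-distance) classifier on 1-D two-class data.
    The decision boundary is the midpoint between the two class means."""
    m0 = statistics.mean(x for x, c in zip(X, y) if c == 0)
    m1 = statistics.mean(x for x, c in zip(X, y) if c == 1)
    return (m0 + m1) / 2.0                 # decision threshold

def bagged_threshold(X, y, n_rounds=25, seed=0):
    """Bagging sketch: average the classifiers fitted on bootstrap
    replicates of the training set."""
    rng = random.Random(seed)
    thresholds = []
    for _ in range(n_rounds):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        if len(set(yb)) < 2:               # bootstrap missed a class; skip
            continue
        thresholds.append(nearest_mean_fit(Xb, yb))
    return sum(thresholds) / len(thresholds)

# Toy 1-D training set: class 0 near the origin, class 1 near 11.
X = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
y = [0, 0, 0, 1, 1, 1]
# A single fit gives threshold 6.0; the bagged threshold lands nearby.
```

The spread of the individual bootstrap thresholds around their average is one crude way to probe the instability that the paper measures more carefully.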
Patient Classification of fMRI Activation Maps
 in Proc. of the 6th Annual International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI'03)
, 2003
"... The analysis of brain activations using functional magnetic resonance imaging (fMRI) is an active area of neuropsychological research. ..."
Abstract

Cited by 22 (1 self)
The analysis of brain activations using functional magnetic resonance imaging (fMRI) is an active area of neuropsychological research.
ADAPTIVE FEATURE SPACES FOR LAND COVER CLASSIFICATION WITH LIMITED GROUND TRUTH DATA
, 2003
"... Classification of land cover based on hyperspectral data is very challenging because typically tens of classes with uneven priors are involved, the inputs are high dimensional, and there is often scarcity of labeled data. Several researchers have observed that it is often preferable to decompose a m ..."
Abstract

Cited by 12 (8 self)
Classification of land cover based on hyperspectral data is very challenging because typically tens of classes with uneven priors are involved, the inputs are high-dimensional, and labeled data are often scarce. Several researchers have observed that it is often preferable to decompose a multiclass problem into multiple two-class problems, solve each such subproblem using a suitable binary classifier, and then combine the outputs of this collection of classifiers in a suitable manner to obtain the answer to the original multiclass problem. This approach is taken by the popular error-correcting output codes (ECOC) technique, as well as by the binary hierarchical classifier (BHC). Classical techniques for dealing with small sample sizes include regularization of covariance matrices and feature reduction. In this paper we address the twin problems of small sample sizes and multiclass settings by proposing a feature reduction scheme that adaptively adjusts to the amount of labeled data available. This scheme can be used in conjunction with ECOC and the BHC, as well as other approaches such as round-robin classification that decompose a multiclass problem into a number of two-(meta)class problems. In particular, we develop the best-basis binary hierarchical classifier (BB-BHC) and best-basis ...
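A round-robin (one-vs-one) decomposition of the kind the abstract builds on can be sketched as follows; the `fit_binary` interface and the toy nearest-mean learner are hypothetical stand-ins for the paper's binary classifiers:

```python
from itertools import combinations

def pairwise_predict(train, x, fit_binary):
    """Round-robin sketch: train one binary classifier per class pair,
    then predict by majority vote over all pairwise decisions.

    train: list of (sample, label) pairs.
    fit_binary: callable taking a two-class subset and returning predict(x).
    """
    classes = sorted({c for _, c in train})
    votes = {c: 0 for c in classes}
    for a, b in combinations(classes, 2):
        subset = [(s, c) for s, c in train if c in (a, b)]
        predict = fit_binary(subset)
        votes[predict(x)] += 1
    return max(votes, key=votes.get)

def nearest_mean(subset):
    """Toy two-class learner for 1-D inputs: assign to the nearer class mean."""
    by_class = {}
    for s, c in subset:
        by_class.setdefault(c, []).append(s)
    means = {c: sum(v) / len(v) for c, v in by_class.items()}
    return lambda x: min(means, key=lambda c: abs(x - means[c]))

# Three 1-D classes; each pair gets its own binary classifier.
train = [(0, "a"), (1, "a"), (5, "b"), (6, "b"), (10, "c"), (11, "c")]
```

In the paper's setting, each pairwise subproblem would additionally get its own adaptively sized feature space before the binary classifier is trained.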
Classification of SPECT Images of Normal Subjects versus Images of Alzheimer's Disease Patients
 in 4th Int. Conf. on Medical Image Computing and Computer-Assisted Intervention (MICCAI'01)
, 2001
"... This wo rk aims atpro vidi g ato o lto assist the i terpretatio o SPECT images fo the diag ogno Alzheimer's Disease (AD). ..."
Abstract

Cited by 6 (3 self)
This work aims at providing a tool to assist the interpretation of SPECT images for the diagnosis of Alzheimer's Disease (AD).
Relational Discriminant Analysis and Its Large Sample Size Problem
 ICPR’98, Proc. 14th Int. Conference on Pattern Recognition (Brisbane, Aug. 16–20), IEEE Computer Society Press, Los Alamitos
, 1998
"... Relational discriminant analysis is based on a similarity matrix of the training set. It is able to construct reliable nonlinear discriminants in infinite dimensional feature spaces based on small training sets. This technique has a large sample size problem as the size of the similarity matrix equ ..."
Abstract

Cited by 5 (5 self)
Relational discriminant analysis is based on a similarity matrix of the training set. It is able to construct reliable nonlinear discriminants in infinite-dimensional feature spaces based on small training sets. This technique has a large sample size problem, as the size of the similarity matrix equals the square of the number of objects in the training set. In this paper we discuss and initially evaluate a solution that drastically decreases training times and memory demands.
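The memory argument can be illustrated with a sketch in which each object is represented by its similarities to a small prototype subset, so storage grows as n × k instead of n². The Gaussian similarity and the `relational_features` helper are assumptions for illustration, not the paper's exact representation-set scheme:

```python
import math

def relational_features(objects, prototypes, sim):
    """Relational representation sketch: map each object to its vector of
    similarities to k prototypes, avoiding the full n x n similarity matrix."""
    return [[sim(o, p) for p in prototypes] for o in objects]

# An assumed Gaussian similarity on 1-D points:
sim = lambda a, b: math.exp(-(a - b) ** 2)

# Three objects, two prototypes -> a 3 x 2 feature matrix instead of 3 x 3.
feats = relational_features([0.0, 1.0, 5.0], [0.0, 5.0], sim)
```

Any conventional (linear) discriminant trained on these k-dimensional vectors then acts as a nonlinear discriminant in the original space.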
Efficient pseudoinverse linear discriminant analysis and its nonlinear form for face recognition
 International Journal of Pattern Recognition and Artificial Intelligence, accepted
, 2007
"... Abstract. Pseudoinverse Linear Discriminant Analysis (PLDA) is a classical and pioneer method that deals with the Small Sample Size (SSS) problem in LDA when applied to such application as face recognition. However, it is expensive in computation and storage due to manipulating on extremely large d ..."
Abstract

Cited by 5 (1 self)
Pseudoinverse Linear Discriminant Analysis (PLDA) is a classical and pioneering method that deals with the Small Sample Size (SSS) problem in LDA when applied to applications such as face recognition. However, it is expensive in computation and storage because it manipulates extremely large d × d matrices, where d is the dimensionality of the sample image. As a result, although frequently cited in the literature, PLDA is hardly ever compared in terms of classification performance with newly proposed methods. In this paper, we propose a new feature extraction method named RSw+LDA, which is 1) much more efficient than PLDA in both computation and storage; and 2) theoretically equivalent to PLDA, meaning that it produces the same projection matrix as PLDA. Our experimental results on the AR face dataset, a challenging dataset with variations in expression, lighting and occlusion, show that PLDA (or RSw+LDA) can achieve significantly higher classification accuracy than the recently proposed Linear Discriminant Analysis via QR decomposition and Discriminant Common Vectors.
RANDOM FORESTS OF BINARY HIERARCHICAL CLASSIFIERS FOR ANALYSIS OF HYPERSPECTRAL DATA
"... Abstract – Statistical classification of hyperspectral data is challenging because the input space is high in dimension and correlated, but labeled information to characterize the class distributions is typically sparse. The resulting classifiers are often unstable and have poor generalization. A ne ..."
Abstract

Cited by 4 (1 self)
Statistical classification of hyperspectral data is challenging because the input space is high-dimensional and correlated, but labeled information to characterize the class distributions is typically sparse. The resulting classifiers are often unstable and have poor generalization. A new approach that is based on the concept of random forests of classifiers and implemented within a multiclassifier system arranged as a binary hierarchy is proposed. The primary goal is to achieve improved generalization of the classifier in analysis of hyperspectral data, particularly when the quantity of training data is limited. The new classifier incorporates bagging of training samples and adaptive random subspace feature selection with the Binary Hierarchical Classifier (BHC), such that the number of features that is selected at each node of the tree depends on the quantity of associated training data. Classification results from experiments on data acquired by the Hyperion sensor on the NASA EO-1 satellite over the Okavango Delta of Botswana are superior to those from our original best-basis BHC algorithm, a random subspace extension of the BHC, and a random forest implementation using the CART classifier.
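The data-dependent feature selection step can be sketched as follows; the linear proportionality rule and the `ratio` constant are assumptions for illustration, not the rule used in the paper:

```python
import random

def adaptive_subspace(n_features, n_train, rng, ratio=0.1):
    """Adaptive random-subspace sketch: draw a random feature subset whose
    size grows with the amount of training data at the tree node, so
    data-poor nodes fit fewer parameters than data-rich ones."""
    k = max(1, min(n_features, int(ratio * n_train)))
    return sorted(rng.sample(range(n_features), k))

rng = random.Random(0)
# A node with little data gets a small subset; a data-rich node gets more.
small = adaptive_subspace(200, 30, rng)    # 3 of 200 features
large = adaptive_subspace(200, 600, rng)   # 60 of 200 features
```

Each node of the binary hierarchy would draw its own subset this way (after bagging the node's training samples), and the forest aggregates the resulting trees.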