Results 1–10 of 20
Functional bregman divergence and bayesian estimation of distributions
 CoRR
Cited by 16 (3 self)
Abstract—A class of distortions termed functional Bregman divergences is defined, which includes squared error and relative entropy. A functional Bregman divergence acts on functions or distributions, and generalizes the standard Bregman divergence for vectors and a previous pointwise Bregman divergence that was defined for functions. A recent result showed that the mean minimizes the expected Bregman divergence. The new functional definition enables the extension of this result to the continuous case to show that the mean minimizes the expected functional Bregman divergence over a set of functions or distributions. It is shown how this theorem applies to the Bayesian estimation of distributions. Estimation of the uniform distribution from independent and identically drawn samples is presented as a case study. Index Terms—Bayesian estimation, Bregman divergence, convexity, Fréchet derivative, uniform distribution.
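The key property cited in this abstract, that the mean minimizes the expected Bregman divergence, can be checked numerically for the simplest member of the class, squared error. The data shapes, seed, and perturbation scale below are illustrative choices, not from the paper:

```python
import numpy as np

# Squared Euclidean distance: the Bregman divergence of phi(x) = ||x||^2.
def squared_error(x, y):
    return float(np.sum((x - y) ** 2))

rng = np.random.default_rng(0)
points = rng.normal(size=(50, 3))   # a finite set of vectors
mean = points.mean(axis=0)

def avg_divergence(candidate):
    return np.mean([squared_error(p, candidate) for p in points])

# Randomly perturbed candidates never beat the mean.
best = avg_divergence(mean)
assert all(avg_divergence(mean + rng.normal(scale=0.5, size=3)) >= best
           for _ in range(200))
```

For squared error this is the familiar fact that the sample mean minimizes the sum of squared distances; the functional result quoted above extends the same statement to divergences between distributions.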
Completely lazy learning
 IEEE Transactions on Knowledge and Data Engineering
Cited by 12 (1 self)
Abstract—Local classifiers are sometimes called lazy learners because they do not train a classifier until presented with a test sample. However, such methods are generally not completely lazy, because the neighborhood size k (or other locality parameter) is usually chosen by cross-validation on the training set, which can require significant preprocessing and risks overfitting. We propose a simple alternative to cross-validation of the neighborhood size that requires no preprocessing: instead of committing to one neighborhood size, average the discriminants for multiple neighborhoods. We show that this forms an expected estimated posterior that minimizes the expected Bregman loss with respect to the uncertainty about the neighborhood choice. We analyze this approach for six standard and state-of-the-art local classifiers, including discriminative adaptive metric kNN (DANN), a local support vector machine (SVM-KNN), hyperplane distance nearest-neighbor (HKNN) and a new local Bayesian quadratic discriminant analysis (local BDA). The empirical effectiveness of this technique vs. cross-validation is confirmed with experiments on seven benchmark datasets, showing that similar classification performance can be attained without any training.
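The core idea of averaging discriminants over multiple neighborhood sizes can be sketched for plain kNN posteriors; the data, labels, and set of k values below are illustrative assumptions, not the paper's experimental setup:

```python
import numpy as np

def knn_posterior(X_train, y_train, x, k, n_classes):
    """Fraction of each class among the k nearest training points."""
    dists = np.linalg.norm(X_train - x, axis=1)
    nearest = y_train[np.argsort(dists)[:k]]
    return np.bincount(nearest, minlength=n_classes) / k

def averaged_knn_classify(X_train, y_train, x, ks=(1, 3, 5)):
    """Average the discriminants over several k instead of cross-validating one."""
    n_classes = int(y_train.max()) + 1
    posterior = np.mean([knn_posterior(X_train, y_train, x, k, n_classes)
                         for k in ks], axis=0)
    return int(np.argmax(posterior))

X = np.array([[0.0], [0.1], [0.2], [1.0], [1.1], [1.2]])
y = np.array([0, 0, 0, 1, 1, 1])
print(averaged_knn_classify(X, y, np.array([0.15])))  # prints 0
print(averaged_knn_classify(X, y, np.array([1.05])))  # prints 1
```

No training-time model selection happens here: all work, including the averaging over neighborhood sizes, is deferred until a test sample arrives, which is what makes the method "completely lazy."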
Functional Bregman divergence
 INT. SYMP. INF. THEORY
, 2008
Cited by 5 (0 self)
To characterize the differences between two positive functions or two distributions, a class of distortion functions has recently been defined termed the functional Bregman divergences. The class generalizes the standard Bregman divergence defined for vectors, and includes total squared difference and relative entropy. Recently a key property was discovered for the vector Bregman divergence: that the mean minimizes the average Bregman divergence for a finite set of vectors. In this paper the analog result is proven: that the mean function minimizes the average Bregman divergence for a set of positive functions that can be parameterized by a finite number of parameters. In addition, the relationship of the functional Bregman divergence to the vector Bregman divergence and pointwise Bregman divergence is stated, as well as some important properties.
Weighted Nearest Neighbor Classifiers and First-order Error
, 2009
Cited by 3 (1 self)
Weighted nearest-neighbor classification is analyzed in terms of squared error of class probability estimates. Two classes of algorithms for calculating weights are studied with respect to their ability to minimize the first-order term of the squared error: local linear regression and a new class termed regularized linear interpolation. A number of variants of each class are considered or proposed, and compared analytically and by simulations and experiments on benchmark datasets. The experiments establish that weighting methods which aim to minimize first-order error can perform significantly better than standard kNN, particularly in high dimensions. Regularization functions, the fitted surfaces, cross-validated neighborhood size, and the effect of high dimensionality are also analyzed.
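The general form of a weighted nearest-neighbor class-probability estimate can be sketched as follows. Inverse-distance weights are used here purely for illustration; they are not the local-linear-regression or regularized-linear-interpolation weights studied in the paper:

```python
import numpy as np

def weighted_knn_proba(X_train, y_train, x, k, n_classes, eps=1e-9):
    """Class-probability estimate as a weighted vote of the k nearest neighbors."""
    dists = np.linalg.norm(X_train - x, axis=1)
    idx = np.argsort(dists)[:k]
    w = 1.0 / (dists[idx] + eps)   # closer neighbors get larger weights
    w /= w.sum()                    # normalize so the estimate sums to one
    proba = np.zeros(n_classes)
    for weight, label in zip(w, y_train[idx]):
        proba[label] += weight
    return proba

X = np.array([[0.0], [0.2], [1.0], [1.2]])
y = np.array([0, 0, 1, 1])
print(weighted_knn_proba(X, y, np.array([0.1]), k=3, n_classes=2))
```

Any weighting scheme, including those analyzed in the paper, fits this template; only the computation of `w` changes.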
Robust Phoneme Classification: Exploiting The Adaptability of Acoustic Waveform Models
 in Proceedings of EUSIPCO
, 2009
Cited by 1 (1 self)
The robustness of classification of isolated phoneme segments using generative classifiers is investigated for the acoustic waveform, MFCC and PLP speech representations. Gaussian mixture models with diagonal covariance matrices are used, followed by maximum likelihood classification. The performance of noise-adapted acoustic waveform models is compared with PLP and MFCC models that were adapted using noisy training-set feature standardisation. In the presence of additive noise, acoustic waveforms have significantly lower classification error. Even in the unrealistic case where PLP and MFCC classifiers are trained and tested in exactly matched noise conditions, acoustic waveform classifiers still outperform them. In both cases the acoustic waveform classifiers are trained explicitly only on quiet data and then modified by a simple transformation to account for the noise.
Comparison of Visible, Thermal Infra-Red and Range Images for Face Recognition
Cited by 1 (0 self)
Abstract. Existing literature compares various biometric modalities of the face for human identification. The common criterion used for comparison is the recognition rate of different face modalities using the same recognition algorithms. Such comparisons are not completely unbiased, as the same recognition algorithm or features may not be suitable for every modality of the face. Moreover, an important aspect overlooked in these comparisons is the amount of variation present in each modality, which will ultimately affect the database size each modality can handle. This paper presents such a comparison between the most common biometric modalities of the face, namely visible, thermal infrared and range images. Experiments are performed on the Equinox and the FRGC databases, with results indicating that visible images capture more interpersonal variations of the human face compared to thermal IR and range images. We conclude that under controlled conditions, visible face images have a greater potential of accommodating large databases compared to long-wave IR and range images.
Generative Models for Similarity-based Classification
, 2007
CLASSIFYING LINEAR SYSTEM OUTPUTS BY ROBUST LOCAL BAYESIAN QUADRATIC DISCRIMINANT ANALYSIS ON LINEAR ESTIMATORS
ABSTRACT We consider the problem of assigning a class label to the noisy output of a linear system, where clean feature examples are available for training. We design a robust classifier that operates on a linear estimate, with uncertainty modeled by a Gaussian distribution with parameters derived from the bias and covariance of a linear estimator. Class-conditional distributions are modeled locally as Gaussians. Since estimation of Gaussian parameters from few training samples can be ill-posed, we extend recent work in Bayesian quadratic discriminant analysis to derive a robust local generative classifier. Experiments show a statistically significant improvement over prior art.
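One ingredient of this approach can be sketched directly: if the input is a linear estimate x_hat with Gaussian uncertainty of covariance S, and each class is a local Gaussian N(mu_c, Sigma_c), the likelihood of the noisy input is the Gaussian N(mu_c, Sigma_c + S) evaluated at x_hat (a convolution of Gaussians). The Bayesian regularization of the covariance estimates described in the paper is not reproduced here, and all parameter values are illustrative:

```python
import numpy as np

def log_gaussian(x, mu, cov):
    """Log density of a multivariate Gaussian at x."""
    d = x - mu
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (logdet + d @ np.linalg.solve(cov, d)
                   + len(x) * np.log(2 * np.pi))

def classify_noisy_estimate(x_hat, S, class_params, priors):
    """class_params: list of (mu_c, Sigma_c); S: covariance of the estimator.
    Each class covariance is inflated by S to account for input uncertainty."""
    scores = [np.log(p) + log_gaussian(x_hat, mu, cov + S)
              for (mu, cov), p in zip(class_params, priors)]
    return int(np.argmax(scores))

params = [(np.zeros(2), np.eye(2)), (np.full(2, 3.0), np.eye(2))]
print(classify_noisy_estimate(np.array([0.4, 0.2]), 0.5 * np.eye(2),
                              params, priors=[0.5, 0.5]))  # prints 0
```

Inflating each class covariance by the estimator covariance is what makes the decision robust to the noise in the linear estimate.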