Results 1–10 of 56
An introduction to variable and feature selection
 Journal of Machine Learning Research
, 2003
Cited by 1283 (16 self)
Variable and feature selection have become the focus of much research in areas of application for which datasets with tens or hundreds of thousands of variables are available.
Spectrum estimation for large dimensional covariance matrices using random matrix theory
 Submitted to The Annals of Statistics
Cited by 65 (4 self)
Estimating the eigenvalues of a population covariance matrix from a sample covariance matrix is a problem of fundamental importance in multivariate statistics; the eigenvalues of covariance matrices play a key role in many widely used techniques, in particular in Principal Component Analysis (PCA). In many modern data analysis problems, statisticians are faced with large datasets where the sample size, n, is of the same order of magnitude as the number of variables, p. Random matrix theory predicts that in this context, the eigenvalues of the sample covariance matrix are not good estimators of the eigenvalues of the population covariance. We propose to use a fundamental result in random matrix theory, the Marčenko–Pastur equation, to better estimate the eigenvalues of large dimensional covariance matrices. The Marčenko–Pastur equation holds in very wide generality and under weak assumptions. The estimator we obtain can be thought of as "shrinking" the eigenvalues of the sample covariance matrix in a non-linear fashion to estimate the population eigenvalues. Inspired by ideas of random matrix theory, we also suggest a change of point of view when thinking about estimation of high-dimensional vectors: we do not try to estimate the vectors directly but rather a probability measure that describes them. We think this is a theoretically more fruitful way to think about these problems. Our estimator is fast and gives good or very good results in extended simulations. Our algorithmic approach is based on convex optimization. We also show that the proposed estimator is consistent.
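The phenomenon the abstract describes is easy to reproduce numerically. The sketch below (illustrative dimensions and seed, not from the paper) draws data from a population with identity covariance, so every population eigenvalue is 1; yet with p/n = 0.5 the sample-covariance eigenvalues spread across roughly the Marčenko–Pastur support [(1 − √c)², (1 + √c)²]:

```python
import numpy as np

# Population covariance = identity: all true eigenvalues equal 1.
# With p comparable to n, the sample eigenvalues nonetheless spread
# out over the Marcenko-Pastur interval instead of concentrating at 1.
rng = np.random.default_rng(0)
n, p = 400, 200                       # aspect ratio c = p/n = 0.5
X = rng.standard_normal((n, p))
S = X.T @ X / n                       # sample covariance matrix
eigs = np.sort(np.linalg.eigvalsh(S))

c = p / n
mp_lower = (1 - np.sqrt(c)) ** 2      # ~0.086 for c = 0.5
mp_upper = (1 + np.sqrt(c)) ** 2      # ~2.914 for c = 0.5
print(eigs.min(), eigs.max())         # far from 1, near the MP edges
```

The average eigenvalue still equals 1 (the trace is preserved), which is why a non-linear shrinkage of the extremes, rather than a global rescaling, is needed to recover the population spectrum.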
Testing for Homogeneity with Kernel Fisher Discriminant Analysis
Cited by 29 (14 self)
We propose to investigate test statistics for testing homogeneity based on kernel Fisher discriminant analysis. Asymptotic distributions under the null hypothesis are derived, and consistency against fixed alternatives is assessed. Finally, experimental evidence of the performance of the proposed approach on both artificial and real datasets is provided.
A probabilistic model for chord progressions
 In Proceedings of the International Conference on Music Information Retrieval
, 2005
Cited by 17 (4 self)
Chord progressions are the building blocks from which tonal music is constructed. Inferring chord progressions is thus an essential step towards modeling long term dependencies in music. In this paper, a distributed representation for chords is designed such that Euclidean distances roughly correspond to psychoacoustic dissimilarities. Estimated probabilities of chord substitutions are derived from this representation and are used to introduce smoothing in graphical models observing chord progressions. Parameters in the graphical models are learnt with the EM algorithm and the classical Junction Tree algorithm is used for inference. Various model architectures are compared in terms of conditional out-of-sample likelihood. Both perceptual and statistical evidence show that binary trees related to meter are well suited to capture chord dependencies.
Tradeoffs in the Empirical Evaluation of Competing Algorithm Designs
Cited by 11 (3 self)
We propose an empirical analysis approach for characterizing tradeoffs between different methods for comparing a set of competing algorithm designs. Our approach can provide insight into performance variation both across candidate algorithms and across instances. It can also identify the best tradeoff between evaluating a larger number of candidate algorithm designs, performing these evaluations on a larger number of problem instances, and allocating more time to each algorithm run. We applied our approach to a study of the rich algorithm design spaces offered by three highly parameterized, state-of-the-art algorithms for satisfiability and mixed integer programming, considering six different distributions of problem instances. We demonstrate that the resulting algorithm design scenarios differ in many ways, with important consequences for both automatic and manual algorithm design. We expect that both our methods and our findings will lead to tangible improvements in algorithm design methods.
Detecting Correlation in Stock Market
 Physica A: Statistical Mechanics and its Applications, Volume 344, Issues 1–2
, 2004
Cited by 7 (5 self)
We present a new method for detecting dependencies in the stock market. In order to find hidden correlations in the daily returns, we build cross prediction models and use the normalized modeling error as a generalized correlation measure that extends the concept of the classical correlation matrix.
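The core idea can be sketched as follows. Note that the cross-prediction models in the paper are richer than the plain least-squares fit used here, and the exact normalization is an assumption for illustration: predict one return series from another and take the prediction error, normalized by the target's variance, as a generalized dissimilarity (near 0 for strongly dependent series, near 1 when prediction does no better than the mean).

```python
import numpy as np

def normalized_cross_prediction_error(x, y):
    """Error of predicting series y from series x with a least-squares fit,
    normalized by the variance of y. 0 = perfectly predictable from x,
    ~1 = no better than predicting the mean of y.
    Illustrative stand-in for the paper's cross-prediction models."""
    a, b = np.polyfit(x, y, 1)          # simple linear cross-model
    resid = y - (a * x + b)
    return np.mean(resid ** 2) / np.var(y)

# Synthetic "daily returns": r2 depends on r1, r3 is independent.
rng = np.random.default_rng(1)
r1 = rng.standard_normal(1000)
r2 = 0.8 * r1 + 0.2 * rng.standard_normal(1000)
r3 = rng.standard_normal(1000)

e12 = normalized_cross_prediction_error(r1, r2)   # small: hidden dependence
e13 = normalized_cross_prediction_error(r1, r3)   # near 1: no dependence
```

Computing this measure for every ordered pair of stocks yields the generalized, and in general asymmetric, analogue of the classical correlation matrix mentioned in the abstract.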
Regularized all-pole models for speaker verification under noisy environments
 IEEE Sig. Proc. Lett
, 2012
Cited by 6 (3 self)
Regularization of linear prediction based mel-frequency cepstral coefficient (MFCC) extraction in speaker verification is considered. Commonly, MFCCs are extracted from the discrete Fourier transform (DFT) spectrum of speech frames. In our recent study, it was shown that replacing the DFT spectrum estimation step with conventional and temporally weighted linear prediction (LP), and their regularized versions, increases the recognition performance considerably. In this paper, we provide a thorough analysis of the regularization of conventional and temporally weighted LP methods. Experiments on the NIST 2002 corpus indicate that regularized all-pole methods yield large improvements in recognition accuracy under additive factory and babble noise conditions in terms of both equal error rate (EER) and minimum detection cost function (MinDCF).
V-fold cross-validation improved: V-fold penalization
 Submitted to The Annals of Statistics
, 2008
Cited by 5 (2 self)
We study the efficiency of V-fold cross-validation (VFCV) for model selection from the non-asymptotic viewpoint, and suggest an improvement on it, which we call "V-fold penalization". Considering a particular (though simple) regression problem, we prove that VFCV with a bounded V is suboptimal for model selection, because it "over-penalizes", all the more so as V grows. Hence, asymptotic optimality requires V to go to infinity. However, when the signal-to-noise ratio is low, over-penalizing appears to be necessary, so that the optimal V is not always the largest one, despite the variability issue. This is confirmed by simulated data. In order to improve on the prediction performance of VFCV, we define a new model selection procedure, called "V-fold penalization" (penVF). It is a V-fold subsampling version of Efron's bootstrap penalties, so it has the same computational cost as VFCV while being more flexible. In a heteroscedastic regression framework, assuming the models to have a particular structure, we prove that penVF satisfies a non-asymptotic oracle inequality with a leading constant that tends to 1 as the sample size goes to infinity. In particular, this implies adaptivity to the smoothness of the regression function, even with highly heteroscedastic noise. Moreover, it is easy to over-penalize with penVF, independently of the V parameter. A simulation study shows that this yields a significant improvement on VFCV in non-asymptotic situations.
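For reference, the baseline procedure the paper improves on is plain V-fold cross-validation. A minimal sketch (the candidate models, data, and V are illustrative choices, not from the paper): split the data into V folds, fit each candidate model on V − 1 folds, score it on the held-out fold, and select the candidate with the lowest averaged error.

```python
import numpy as np

def vfold_cv_error(x, y, degree, V=5):
    """Average held-out squared error of a polynomial fit of the given
    degree, estimated by V-fold cross-validation."""
    folds = np.array_split(np.arange(len(x)), V)
    errs = []
    for k in range(V):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(V) if j != k])
        coef = np.polyfit(x[train], y[train], degree)
        pred = np.polyval(coef, x[test])
        errs.append(np.mean((y[test] - pred) ** 2))
    return np.mean(errs)

# Synthetic regression data with a truly linear signal.
rng = np.random.default_rng(2)
x = rng.uniform(-1, 1, 200)
y = 2.0 * x + 0.3 * rng.standard_normal(200)

# Candidate models: constant, linear, and an over-flexible polynomial.
cv_errors = {d: vfold_cv_error(x, y, d) for d in (0, 1, 5)}
best_degree = min(cv_errors, key=cv_errors.get)
```

The paper's penVF replaces the held-out error above with a resampling-based penalty added to the training error, keeping the same V-fold computational cost while allowing the amount of over-penalization to be tuned separately from V.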
Feature selection for satellite image indexing
 In ESAEUSC: Image Information Mining
, 2005
Cited by 4 (2 self)
Many feature extraction procedures can be found in the literature. We propose to automatically select the best ones using automatic supervised feature selection algorithms. We demonstrate the efficiency of such a methodology by comparing texture feature sets (using Haralick coefficients, Gaussian Markov Random Fields, several wavelets, ...) computed on satellite images. We illustrate the fact that geometrical features and texture features have to be combined to enhance the discriminative power of the selected feature set. Key words: feature selection, supervised classification, satellite image indexing.
Practical feature selection: from correlation to causality
 In Mining Massive Data Sets for Security: Advances in Data Mining, Search, Social Networks and Text Mining, and their Applications to Security
, 2008
Cited by 4 (1 self)
Feature selection encompasses a wide variety of methods for selecting a restricted number of input variables or "features" which are "relevant" to a problem at hand. In this report, we guide practitioners through the maze of methods which have recently appeared in the literature, particularly for supervised feature selection. Starting from the simplest methods of feature ranking with correlation coefficients, we branch in various directions and explore topics including "conditional relevance", "local relevance", "multivariate selection", and "causal relevance". We make recommendations for assessment methods and stress the importance of matching the complexity of the method employed to the available amount of training data. Software and teaching material associated with this tutorial are available [12].
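The starting point mentioned in the abstract, feature ranking with correlation coefficients, is simple enough to sketch in a few lines (synthetic data for illustration; in this toy setup only feature 0 is built to be informative):

```python
import numpy as np

# Filter-style feature ranking: score each feature by the absolute
# Pearson correlation with the target, then sort best-first.
rng = np.random.default_rng(3)
n, d = 500, 10
X = rng.standard_normal((n, d))
y = X[:, 0] + 0.5 * rng.standard_normal(n)   # only feature 0 is relevant

scores = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(d)])
ranking = np.argsort(scores)[::-1]           # indices, best feature first
```

Because each feature is scored in isolation, this method misses exactly the "conditional relevance" and "multivariate selection" effects that the later sections of the tutorial address, such as features that are useful only in combination.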