An introduction to variable and feature selection
Journal of Machine Learning Research, 2003
Cited by 1352 (16 self)
Variable and feature selection have become the focus of much research in areas of application for which datasets with tens or hundreds of thousands of variables are available.
Dimensionality Reduction via Sparse Support Vector Machines
Journal of Machine Learning Research, 2003
Cited by 121 (14 self)
We describe a methodology for performing variable ranking and selection using support vector machines (SVMs). The method constructs a series of sparse linear SVMs to generate linear models that can generalize well, and uses a subset of nonzero weighted variables found by the linear models to produce a final nonlinear model. The method exploits the fact that a linear SVM (no kernels) with ℓ1-norm regularization inherently performs variable selection as a side-effect of minimizing the capacity of the SVM model. The distribution of the linear model weights provides a mechanism for ranking and interpreting the effects of variables.
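The ℓ1 sparsity side-effect described in this abstract can be sketched with scikit-learn's LinearSVC (an assumption of this sketch; the paper uses its own sparse-SVM construction), on synthetic data where only three of twenty variables carry signal:

```python
# Sketch of l1-regularized linear SVM variable selection, assuming
# scikit-learn; features with nonzero weight are "selected", and the
# weight magnitudes give a ranking.
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 20))
# Only variables 0, 1, 2 influence the label (synthetic data).
y = (X[:, 0] + 0.8 * X[:, 1] - 0.6 * X[:, 2] > 0).astype(int)

# The l1 penalty requires the primal (dual=False) squared-hinge form.
svm = LinearSVC(penalty="l1", dual=False, C=0.1, max_iter=10000).fit(X, y)
w = svm.coef_.ravel()
selected = np.flatnonzero(np.abs(w) > 1e-6)   # nonzero-weight variables
ranking = np.argsort(-np.abs(w))              # variables ranked by |weight|
print("selected:", selected.tolist(), "top-ranked:", ranking[:3].tolist())
```

With a small C the ℓ1 penalty zeroes most weights, so the surviving variables form the selected subset without any separate selection step.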
An Improved Approximation Algorithm for the Column Subset Selection Problem
Cited by 74 (13 self)
We consider the problem of selecting the “best” subset of exactly k columns from an m × n matrix A. In particular, we present and analyze a novel two-stage algorithm that runs in O(min{mn², m²n}) time and returns as output an m × k matrix C consisting of exactly k columns of A. In the first stage (the randomized stage), the algorithm randomly selects O(k log k) columns according to a judiciously chosen probability distribution that depends on information in the top-k right singular subspace of A. In the second stage (the deterministic stage), the algorithm applies a deterministic column-selection procedure to select and return exactly k columns from the set of columns selected in the first stage. Let C be the m × k matrix containing those k columns, let PC denote the projection matrix onto the span of those columns, and let Ak denote the “best” rank-k approximation to the matrix A as computed with the singular value decomposition. Then, we prove that ‖A − PC A‖₂ ≤ O(k^(3/4) log^(1/2)(k) (n − k)^(1/4)) ‖A − Ak‖₂.
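A minimal sketch of the two-stage scheme, assuming NumPy/SciPy; pivoted QR stands in for the paper's exact deterministic column-selection procedure, and the oversampling constant is an illustrative choice:

```python
# Sketch of two-stage column subset selection: randomized sampling by
# leverage scores in the top-k right singular subspace, then a
# deterministic pass (pivoted QR here) to keep exactly k columns.
import numpy as np
from scipy.linalg import qr

def two_stage_css(A, k, seed=0):
    rng = np.random.default_rng(seed)
    m, n = A.shape
    c = min(n, int(np.ceil(2 * k * np.log(k + 2))))   # O(k log k) candidates
    # Stage 1 (randomized): leverage-score sampling probabilities.
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    p = np.sum(Vt[:k] ** 2, axis=0) / k               # sums to 1
    cand = rng.choice(n, size=c, replace=False, p=p)
    # Stage 2 (deterministic): QR with column pivoting picks k columns.
    _, _, piv = qr(A[:, cand], pivoting=True)
    cols = cand[piv[:k]]
    return A[:, cols], cols

A = np.random.default_rng(1).standard_normal((20, 30))
C, cols = two_stage_css(A, k=5)
print("selected columns:", sorted(cols.tolist()))
```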
Feature selection with ensembles, artificial variables, and redundancy elimination
JMLR, 2009
Cited by 25 (3 self)
Predictive models benefit from a compact, non-redundant subset of features that improves interpretability and generalization. Modern data sets are wide, dirty, mixed with both numerical and categorical predictors, and may contain interactive effects that require complex models. This is a challenge for filter, wrapper, and embedded feature selection methods. We describe details of an algorithm using tree-based ensembles to generate a compact subset of non-redundant features. Parallel and serial ensembles of trees are combined into a mixed method that can uncover masking and detect features of secondary effect. Simulated and real examples illustrate the effectiveness of the approach.
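The artificial-variable idea can be sketched as follows, assuming scikit-learn; contrasting real-feature importances against permuted copies is a simplification of the paper's method, not its full masking-detection and redundancy-elimination algorithm:

```python
# Sketch of feature selection with artificial (permuted) variables:
# shuffled copies of the features act as a noise baseline, and a real
# feature is kept only if a tree ensemble ranks it above every copy.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n, d = 300, 6
X = rng.standard_normal((n, d))
y = (X[:, 0] - X[:, 1] > 0).astype(int)   # features 0 and 1 are relevant

shadows = rng.permuted(X, axis=0)          # artificial variables: column-wise shuffles
Xa = np.hstack([X, shadows])

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(Xa, y)
imp = rf.feature_importances_
threshold = imp[d:].max()                  # best artificial variable
kept = np.flatnonzero(imp[:d] > threshold)
print("kept features:", kept.tolist())
```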
Performance prediction challenge
In Proceedings of the International Joint Conference on Neural Networks (IJCNN 2006), 2006
Cited by 21 (12 self)
A major challenge for machine learning algorithms in real-world applications is to predict their performance. We approached this question by organizing a challenge in performance prediction for WCCI 2006. The problems addressed are classification problems encountered in pattern recognition (classification of images, speech recognition), medical diagnosis, marketing (customer categorization), and text categorization (filtering of spam). Over 100 participants tried to build the best possible classifier from training data and to guess its generalization error on a large unlabeled test set. The challenge scores indicate that cross-validation yields good results both for model selection and performance prediction. Alternative model selection strategies were also sometimes employed with success. The challenge web site remains open for post-challenge submissions.
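The finding that cross-validation predicts generalization error can be illustrated with a minimal sketch; the nearest-centroid classifier and the synthetic data below are illustrative assumptions, not part of the challenge:

```python
# Sketch: a 5-fold cross-validation estimate of error on labeled data
# is compared with the error actually observed on an unseen half.
import numpy as np

def nearest_centroid_error(Xtr, ytr, Xte, yte):
    c0, c1 = Xtr[ytr == 0].mean(axis=0), Xtr[ytr == 1].mean(axis=0)
    pred = (np.linalg.norm(Xte - c1, axis=1) <
            np.linalg.norm(Xte - c0, axis=1)).astype(int)
    return float(np.mean(pred != yte))

rng = np.random.default_rng(1)
n, d = 400, 10
X = rng.standard_normal((n, d))
y = rng.integers(0, 2, n)
X[y == 1, 0] += 1.5                      # class 1 shifted along one axis

# Cross-validate on the first half only.
tr = np.arange(n // 2)
folds = np.array_split(rng.permutation(n // 2), 5)
cv_errs = []
for f in folds:
    mask = np.isin(tr, f)
    cv_errs.append(nearest_centroid_error(
        X[tr[~mask]], y[tr[~mask]], X[tr[mask]], y[tr[mask]]))
cv_estimate = float(np.mean(cv_errs))

# Compare against the error on the untouched second half.
test_err = nearest_centroid_error(X[:n // 2], y[:n // 2], X[n // 2:], y[n // 2:])
print(f"CV estimate {cv_estimate:.3f} vs test error {test_err:.3f}")
```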
Feature selection for Descriptor-based Classification Models. Part II: Human Intestinal Absorption (HIA)
J. Chem. Inf. Comput. Sci., 2003
Cited by 17 (3 self)
The paper describes different aspects of classification models based on molecular data sets, with a focus on feature selection methods. In particular, model quality and the avoidance of high variance on unseen data (overfitting) are discussed with respect to the feature selection problem. We present several standard approaches, as well as modifications of our Genetic Algorithm based on Shannon Entropy Cliques (GASEC) and its extension to classification problems using boosting.
Embedded Methods
Cited by 16 (1 self)
Although many embedded feature selection methods have been introduced during the last few years, a unifying theoretical framework has not been developed to date. We start this chapter by defining such a framework, which we think is general enough to cover many embedded methods. We then discuss embedded methods based on how they solve the feature selection problem.
Ensemble Feature Ranking
Proceedings of ECML-PKDD'04, 2004
Cited by 16 (6 self)
A crucial issue for Machine Learning and Data Mining is Feature Selection: selecting the relevant features in order to focus the learning search. A relaxed setting for Feature Selection, known as Feature Ranking, ranks the features with respect to their relevance. This paper proposes an ensemble approach for Feature Ranking, aggregating feature rankings extracted along independent runs of an evolutionary learning algorithm named ROGER. The convergence of ensemble feature ranking is studied from a theoretical perspective, and a statistical model is devised for the empirical validation, inspired by the complexity framework proposed in the Constraint Satisfaction domain. Comparative experiments demonstrate the robustness of the approach for learning (a limited kind of) non-linear concepts, specifically when the features significantly outnumber the examples.
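Aggregating per-run rankings into a consensus can be sketched as follows; the noisy per-run scorer below is a stand-in for independent runs of a learner such as ROGER, which this sketch does not implement:

```python
# Sketch of ensemble feature ranking: each randomized run produces a
# ranking of the features, and the runs are aggregated by mean rank.
import numpy as np

rng = np.random.default_rng(0)
d, runs = 8, 30
true_relevance = np.linspace(1.0, 0.0, d)        # feature 0 most relevant

rank_matrix = np.empty((runs, d), dtype=int)
for r in range(runs):
    noisy = true_relevance + 0.3 * rng.standard_normal(d)
    order = np.argsort(-noisy)                   # best feature first this run
    rank_matrix[r, order] = np.arange(d)         # rank of each feature

mean_rank = rank_matrix.mean(axis=0)
consensus = np.argsort(mean_rank)                # ensemble ranking
print("consensus order:", consensus.tolist())
```

Averaging ranks rather than raw scores makes the aggregate robust to the scale of any single run, which is the point of the ensemble approach.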
Prospect for a Silent Speech Interface Using Ultrasound Imaging
In: International Conference on Acoustics, Speech and Signal Processing, 2006
Cited by 10 (4 self)
The feasibility of a silent speech interface using ultrasound (US) imaging and lip profile video is investigated by examining the quality of line spectral frequencies (LSFs) derived from the image sequences. It is found that the data do not at present allow reliable identification of silences and fricatives, but that LSFs recovered from vocalized passages are compatible with the synthesis of intelligible speech.
Blind Source Separation and Sparse Bump Modelling of Time-Frequency Representation of EEG Signals: New Tools for Early Detection of Alzheimer's Disease
Cited by 10 (3 self)
The early detection of Alzheimer’s disease (AD) is an important challenge. In this paper, we propose a novel method for early detection of AD using only electroencephalographic (EEG) recordings from patients with Mild Cognitive Impairment (MCI) who showed no clinical symptoms of the disease but later developed AD. In our method, a blind source separation algorithm is first applied to extract the most significant spatiotemporal uncorrelated components; these components are then wavelet transformed; subsequently the wavelet, or more generally time-frequency, representation (TFR) is approximated with a sparse bump modeling approach. Finally, reliable and discriminant features are selected and reduced with orthogonal forward regression and the random probe method. The resulting features are fed to a simple neural network classifier. The presented method leads to substantially improved performance (93% correctly classified, with improved sensitivity and specificity) over classification results previously published on the same data set. We hope that these new computational and machine learning tools will provide new insights in a wide range of clinical settings, both diagnostic and predictive.
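The random-probe stopping rule used in the feature selection step can be sketched as follows; a greedy correlation-based selection stands in for full orthogonal forward regression, and the data are synthetic:

```python
# Sketch of random-probe feature selection: random "probe" columns are
# appended to the data, features are selected greedily by correlation
# with the residual, and selection stops once a probe would be picked
# (remaining real features then look no better than noise).
import numpy as np

rng = np.random.default_rng(0)
n, d, n_probes = 200, 6, 20
X = rng.standard_normal((n, d))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + 0.1 * rng.standard_normal(n)

probes = rng.standard_normal((n, n_probes))
Z = np.hstack([X, probes])

selected, residual = [], y - y.mean()
while True:
    corr = np.abs(Z.T @ residual) / (np.linalg.norm(Z, axis=0) *
                                     np.linalg.norm(residual))
    corr[selected] = -np.inf                 # never re-pick a feature
    best = int(np.argmax(corr))
    if best >= d:                            # a random probe won: stop
        break
    selected.append(best)
    # Project the chosen column out of the residual before the next pick.
    col = Z[:, best]
    residual = residual - col * (col @ residual) / (col @ col)
print("selected:", selected)
```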