
## Fast Binary Feature Selection with Conditional Mutual Information (2004)


### Download Links

- [cvlab.epfl.ch]
- [www.jmlr.org]
- [jmlr.csail.mit.edu]
- [sci2s.ugr.es]
- [jmlr.org]
- [www.idiap.ch]
### Other Repositories/Bibliography

- DBLP

Venue: Journal of Machine Learning Research

Citations: 175 (1 self)

### Citations

13215 | Statistical Learning Theory - Vapnik - 1998

Citation Context: "...r (Duda and Hart, 1973; Langley et al., 1992) based on features chosen with our criterion achieves error rates similar or lower than AdaBoost (Freund and Schapire, 1996a) or SVMs (Boser et al., 1992; **Vapnik, 1998**; Cristianini and Shawe-Taylor, 2000). Also, experiments show the robustness of this method when challenged by noisy training sets. In such a context, it actually achieves better results than regulariz..."

4842 | Pattern Classification and Scene Analysis - Duda, Hart - 1973

Citation Context: "...cceptance, while our algorithm does not. Experiments demonstrate that CMIM outperforms the other feature selection methods we have implemented. Results also show that a naive Bayesian classifier (**Duda and Hart, 1973**; Langley et al., 1992) based on features chosen with our criterion achieves error rates similar or lower than AdaBoost (Freund and Schapire, 1996a) or SVMs (Boser et al., 1992; Vapnik, 1998; Christia..."

3648 | Bagging predictors - Breiman - 1996

2379 | An introduction to support vector machines and other kernel-based learning methods - Cristianini, Shawe-Taylor - 2000

Citation Context: "...rt, 1973; Langley et al., 1992) based on features chosen with our criterion achieves error rates similar or lower than AdaBoost (Freund and Schapire, 1996a) or SVMs (Boser et al., 1992; Vapnik, 1998; **Cristianini and Shawe-Taylor, 2000**). Also, experiments show the robustness of this method when challenged by noisy training sets. In such a context, it actually achieves better results than regularized AdaBoost, even though it does no..."

2213 | Experiments with a new boosting algorithm - Freund, Schapire - 1996

Citation Context: "...ed. Results also show that a naive Bayesian classifier (Duda and Hart, 1973; Langley et al., 1992) based on features chosen with our criterion achieves error rates similar or lower than AdaBoost (**Freund and Schapire, 1996a**) or SVMs (Boser et al., 1992; Vapnik, 1998; Cristianini and Shawe-Taylor, 2000). Also, experiments show the robustness of this method when challenged by noisy training sets. In such a context, it act..."

1863 | A training algorithm for optimal margin classifiers - Boser, Guyon, et al. - 1992

Citation Context: "...e Bayesian classifier (Duda and Hart, 1973; Langley et al., 1992) based on features chosen with our criterion achieves error rates similar or lower than AdaBoost (Freund and Schapire, 1996a) or SVMs (**Boser et al., 1992**; Vapnik, 1998; Cristianini and Shawe-Taylor, 2000). Also, experiments show the robustness of this method when challenged by noisy training sets. In such a context, it actually achieves better results..."

1748 | Additive logistic regression: a statistical view of boosting (with discussion), Annals of Statistics - Friedman, Hastie, et al. - 2000

1567 | Wrappers for feature subset selection - Kohavi, John - 1997

Citation Context: "...r on the selection of a few tens of binary features among several tens of thousands in a context of classification. Feature selection methods can be classified into two types, filters and wrappers (**Kohavi and John, 1997**; Das, 2001). The first kind are classifier-agnostic, as they are not dedicated to a specific type of classification method. On the contrary, the wrappers rely on the performance of one type of classif..."

1352 | An introduction to variable and feature selection - Guyon, Elisseeff - 2003

1143 | The perceptron: A probabilistic model for information storage and organization in the brain - Rosenblatt - 1958

Citation Context: "...gn of a function of the form f(x_1, ..., x_N) = Σ_{k=1}^{K} ω_k x_{ν(k)} + b. We have used two algorithms to estimate the (ω_1, ..., ω_K) and b from the training set L. The first one is the classical perceptron (**Rosenblatt, 1958**; Novikoff, 1962) and the second one is the naive Bayesian classifier (Duda and Hart, 1973; Langley et al., 1992). 3.1.1 PERCEPTRON The perceptron learning scheme estimates iteratively the normal vect..."
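The context above describes learning a linear decision function f(x_1, ..., x_N) = Σ_{k=1}^{K} ω_k x_{ν(k)} + b with the classical perceptron. A minimal sketch of that learning scheme (illustrative only, assuming ±1 labels; variable names are invented, this is not the paper's implementation):

```python
import numpy as np

def perceptron_train(X, y, epochs=100):
    """Classical perceptron on already-selected features.

    X: (n_samples, K) feature matrix; y: labels in {-1, +1}.
    Returns (w, b) such that the prediction is sign(w @ x + b).
    """
    n, k = X.shape
    w = np.zeros(k)
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for xi, yi in zip(X, y):
            if yi * (w @ xi + b) <= 0:  # misclassified: move the hyperplane
                w += yi * xi
                b += yi
                errors += 1
        if errors == 0:                 # converged on separable data
            break
    return w, b
```

On linearly separable data the loop terminates with zero training errors (the Novikoff convergence result cited in the same passage).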

568 | Support vector machine classification and validation of cancer tissue samples using microarray expression data - Furey, Cristianini, et al. - 2000 |

478 | Toward Optimal Feature Selection - Koller, Sahami - 1996 |

439 | An analysis of Bayesian classifiers - Langley, Iba, et al. - 1992

Citation Context: "...algorithm does not. Experiments demonstrate that CMIM outperforms the other feature selection methods we have implemented. Results also show that a naive Bayesian classifier (Duda and Hart, 1973; **Langley et al., 1992**) based on features chosen with our criterion achieves error rates similar or lower than AdaBoost (Freund and Schapire, 1996a) or SVMs (Boser et al., 1992; Vapnik, 1998; Cristianini and Shawe-Taylor, 2..."

358 | Using mutual information for selecting features in supervised neural net learning - Battiti - 1994

Citation Context: "...l predictive power, which can be estimated by various means such as Fisher score (Furey et al., 2000), Kolmogorov-Smirnov test, Pearson correlation (Miyahara and Pazzani, 2000) or mutual information (**Battiti, 1994**; Bonnlander and Weigend, 1996; Torkkola, 2003). Selection based on such a ranking does not ensure weak dependency among features, and can lead to redundant and thus less informative selected families..."
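Ranking features by their individual mutual information with the class, as in the filter methods this context cites, can be sketched as follows (a toy illustration assuming binary 0/1 features and labels; function names are invented):

```python
import numpy as np

def mutual_information(x, y):
    """Empirical mutual information I(X;Y) in bits between two
    binary 0/1 arrays, from plug-in probability estimates."""
    mi = 0.0
    for a in (0, 1):
        for b in (0, 1):
            p_ab = np.mean((x == a) & (y == b))
            if p_ab > 0:
                mi += p_ab * np.log2(p_ab / (np.mean(x == a) * np.mean(y == b)))
    return mi

def rank_by_mi(X, y):
    """Return feature indices sorted by decreasing individual MI with y."""
    scores = np.array([mutual_information(X[:, j], y) for j in range(X.shape[1])])
    return np.argsort(scores)[::-1]
```

As the quoted passage points out, such a ranking ignores redundancy between features, which is exactly the weakness conditional-mutual-information selection addresses.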

274 | Feature selection for high-dimensional data: a fast correlation-based filter solution - Yu, Liu - 2003

Citation Context: "...additional information about the class to predict. Thus, this criterion ensures a good tradeoff between independence and discrimination. A very similar solution called Fast Correlation-Based Filter (**Yu and Liu, 2003**) selects features which are highly correlated with the class to predict if they are less correlated to any feature already selected. This criterion is very close to ours but does not rely on a uniqu..."
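The selection criterion discussed in this context (pick, at each step, the feature whose conditional mutual information with the class, given every feature already selected, has the largest minimum) can be sketched greedily. This is a naive toy version with invented names and plug-in probability estimates, not the paper's fast implementation:

```python
import numpy as np

def mutual_information(x, y):
    """Empirical I(X;Y) in bits for binary 0/1 arrays."""
    mi = 0.0
    for a in (0, 1):
        for b in (0, 1):
            p_ab = np.mean((x == a) & (y == b))
            if p_ab > 0:
                mi += p_ab * np.log2(p_ab / (np.mean(x == a) * np.mean(y == b)))
    return mi

def cond_mutual_information(x, y, z):
    """Empirical I(X;Y|Z) in bits for binary 0/1 arrays."""
    cmi = 0.0
    for c in (0, 1):
        mask = (z == c)
        if mask.any():
            cmi += mask.mean() * mutual_information(x[mask], y[mask])
    return cmi

def cmim_select(X, y, K):
    """Greedy CMIM-style selection: each new feature maximizes the
    running minimum of I(Y; X_j | X_v) over features v already picked."""
    d = X.shape[1]
    score = np.array([mutual_information(X[:, j], y) for j in range(d)])
    picked = []
    for _ in range(K):
        best = int(np.argmax(score))
        picked.append(best)
        score[best] = -np.inf
        for j in range(d):
            if np.isfinite(score[j]):
                score[j] = min(score[j],
                               cond_mutual_information(X[:, j], y, X[:, best]))
    return picked
```

A feature that merely duplicates an already-picked one has zero conditional mutual information given it, so its score collapses and it is never selected, which is the tradeoff between discrimination and independence the passage describes.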

162 | Feature extraction by non parametric mutual information maximization - Torkkola - 2003 |

159 | Game theory, on-line prediction and boosting - Freund, Schapire - 1996

Citation Context: "...ed. Results also show that a naive Bayesian classifier (Duda and Hart, 1973; Langley et al., 1992) based on features chosen with our criterion achieves error rates similar or lower than AdaBoost (**Freund and Schapire, 1996b**) or SVMs (Boser et al., 1992; Vapnik, 1998; Cristianini and Shawe-Taylor, 2000). Also, experiments show the robustness of this method when challenged by noisy training sets. In such a context, it act..."

156 | Boosting algorithms as gradient descent - Mason, Baxter, et al. - 2000

Citation Context: "...d, and the distribution is refreshed to increase the weight of the misclassified samples and reduce the importance of the others. Boosting can be seen as a functional gradient descent (Breiman, 2000; **Mason et al., 2000**; Friedman et al., 2000) in which each added weak learner is a step in the space of classifiers. From that perspective, the weight of a given sample is proportional to the derivative of the functional..."
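The reweighting scheme this context describes (refresh the distribution to emphasize misclassified samples, then fit the next weak learner) is the core of AdaBoost. A compact sketch using decision stumps on binary features (illustrative only, with invented names; not the regularized variant the paper compares against):

```python
import numpy as np

def adaboost_stumps(X, y, rounds=10):
    """AdaBoost with single-feature stumps on binary 0/1 features.

    X: (n, d) 0/1 matrix; y: labels in {-1, +1}.
    Returns a list of (feature, polarity, alpha) weak learners.
    """
    n, d = X.shape
    w = np.full(n, 1.0 / n)               # sample distribution
    ensemble = []
    for _ in range(rounds):
        best = None
        for j in range(d):                 # pick the lowest-weighted-error stump
            for pol in (1, -1):
                pred = pol * (2 * X[:, j] - 1)
                err = w[pred != y].sum()
                if best is None or err < best[0]:
                    best = (err, j, pol, pred)
        err, j, pol, pred = best
        err = min(max(err, 1e-10), 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)
        ensemble.append((j, pol, alpha))
        w *= np.exp(-alpha * y * pred)     # up-weight misclassified samples
        w /= w.sum()                       # refresh the distribution
    return ensemble

def predict(ensemble, X):
    s = sum(alpha * pol * (2 * X[:, j] - 1) for j, pol, alpha in ensemble)
    return np.sign(s)
```

Using boosting as a feature selector, as the surrounding contexts describe, amounts to keeping the set of feature indices `j` appearing in the returned ensemble.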

156 | On convergence proofs on perceptrons - Novikoff - 1962

Citation Context: "...f the form f(x_1, ..., x_N) = Σ_{k=1}^{K} ω_k x_{ν(k)} + b. We have used two algorithms to estimate the (ω_1, ..., ω_K) and b from the training set L. The first one is the classical perceptron (Rosenblatt, 1958; **Novikoff, 1962**) and the second one is the naive Bayesian classifier (Duda and Hart, 1973; Langley et al., 1992). 3.1.1 PERCEPTRON The perceptron learning scheme estimates iteratively the normal vector (ω_1, ..., ω_K)..."

107 | Filters, wrappers and a boosting-based hybrid for feature selection - Das - 2001

84 | Joint induction of shape features and tree classifiers - Amit, Geman, et al. - 1997

Citation Context: "...he choice of a feature in a binary tree depends on the statistical behavior conditionally on the values of the ones picked above. Efficiency was increased on our specific task by using randomization (**Amit et al., 1997**), which consists of using random subsets of the features instead of random subsets of training examples as in bagging (Breiman, 1999, 1996). We have built 50 trees, each with one half of the features s..."

72 | Result analysis of the NIPS 2003 feature selection challenge - Guyon, Hur, et al. - 2004

Citation Context: "...size 800, 350 and 800). Our main method CMIM + Bayesian achieves 12.46% error rate on the validation set without any tuning, while the top-ranking method achieves 5.47% with a Bayesian Network; see (**Guyon et al., 2004**) for more details on the participants and results. 6.2 Speed The image classification task requires the selection of 50 features among 43,904 with a training set of 500 examples. The naive implementa..."

61 | Selecting input variables using mutual information and nonparametric density estimation - Bonnlander, Weigend - 1994

Citation Context: "...wer, which can be estimated by various means such as Fisher score (Furey et al., 2000), Kolmogorov-Smirnov test, Pearson correlation (Miyahara and Pazzani, 2000) or mutual information (Battiti, 1994; **Bonnlander and Weigend, 1996**; Torkkola, 2003). Selection based on such a ranking does not ensure weak dependency among features, and can lead to redundant and thus less informative selected families. To catch dependencies betwee..."

42 | Improvement of collaborative filtering with the simple Bayesian classifier - Miyahara, Pazzani - 2002

Citation Context: "...filters rank features according to their individual predictive power, which can be estimated by various means such as Fisher score (Furey et al., 2000), Kolmogorov-Smirnov test, Pearson correlation (**Miyahara and Pazzani, 2000**) or mutual information (Battiti, 1994; Bonnlander and Weigend, 1996; Torkkola, 2003). Selection based on such a ranking does not ensure weak dependency among features, and can lead to redundant and t..."

41 | Some infinity theory for predictor ensembles - Breiman - 2000

Citation Context: "...rate is selected, and the distribution is refreshed to increase the weight of the misclassified samples and reduce the importance of the others. Boosting can be seen as a functional gradient descent (**Breiman, 2000**; Mason et al., 2000; Friedman et al., 2000) in which each added weak learner is a step in the space of classifiers. From that perspective, the weight of a given sample is proportional to the derivati..."

27 | Random Forests, Random Features - Breiman - 2001

Citation Context: "...iency was increased on our specific task by using randomization (Amit et al., 1997), which consists of using random subsets of the features instead of random subsets of training examples as in bagging (**Breiman, 1999**, 1996). We have built 50 trees, each with one half of the features selected at random, and collected the features in the first five layers. Several configurations of number of trees, proportions of f..."

15 | Regularizing AdaBoost - Rätsch, Onoda, et al. - 1999

Citation Context: "...sons, we have used the original AdaBoost procedure (Freund and Schapire, 1996a,b), which is known to suffer from overfitting. For noisy tasks, we have chosen a soft-margin version called AdaBoost_reg (**Rätsch et al., 1998**), which regularizes the classical AdaBoost by penalizing samples which too heavily influence the training, as they are usually outliers. To use boosting as a feature selector, we just keep the set of..."

12 | Fast face detection with precise pose estimation - Fleuret, Geman - 2002 |

11 | Feature selection for the naive Bayesian classifier using decision trees - Ratanamahatana, Gunopulos

7 | Coarse-To-Fine Visual Selection - Fleuret, Geman - 2001 |