G. Holmes and C. G. Nevill-Manning. Feature selection via the discovery of simple classification rules. In Proceedings of the Symposium on Intelligent Data Analysis, Baden-Baden, Germany, 1995.
....by dividing continuous features into discrete ranges during the construction of a decision tree. Many of the feature selection algorithms described in the next chapter require continuous features to be discretized, or give superior results if discretization is performed at the outset [AD91, HNM95, KS96b, LS96]. Discretization is used as a preprocessing step for the correlation-based approach to feature selection presented in this thesis, which requires all features to be of the same type. This section describes some discretization approaches from the machine learning literature. 1 CART ....
....classifier on several machine learning datasets. Results showed that Bayesian networks using features selected by the oblivious decision tree algorithms outperformed Bayesian networks without feature selection and Bayesian networks with features selected by a wrapper. Holmes and Nevill-Manning [HNM95] use Holte's 1R system [Hol93] to estimate the predictive accuracy of individual features. 1R builds rules based on a single feature (called 1-rules). If the data is split into training and test sets, it is possible to calculate a classification accuracy for each rule and hence each ....
G. Holmes and C. G. Nevill-Manning. Feature selection via the discovery of simple classification rules. In Proceedings of the Symposium on Intelligent Data Analysis, Baden-Baden, Germany, 1995.
....when every combination of values for a feature subset is associated with a single class label [1]. Another method [13] eliminates features whose information content is subsumed by some number of the remaining features. Still other methods attempt to rank features according to a relevancy score [11][8]. Filters have proven to be much faster than wrappers and hence can be applied to large data sets containing many features. Their general nature allows them to be used with any learner, unlike the wrapper, which must be rerun when switching from one learning algorithm to another. However, most ....
Holmes, G. and Nevill-Manning, C. G. 1995. Feature selection via the discovery of simple classification rules. In Proceedings of the International Symposium on Intelligent Data Analysis.
....a single class label (Almuallim and Dietterich, 1992). Another method (Koller and Sahami, 1996) eliminates features whose information content is subsumed by some number of the remaining features. Still other methods attempt to rank features according to a relevancy score (Kira and Rendell, 1992; Holmes and Nevill-Manning, 1995). Filters have proven to be much faster than wrappers and hence can be applied to large data sets containing many features. Their general nature allows them to be used with any learner, unlike the wrapper, which must be rerun when switching from one learning algorithm to another. This paper ....
Holmes, G. and Nevill-Manning, C. G. 1995. Feature Selection via the Discovery of Simple Classification Rules. In Proceedings of the International Symposium on Intelligent Data Analysis.
.... Some filter methods strive for consistency in the data; that is, they note when every combination of values for a feature subset is associated with a single class label [Almuallim and Dietterich, 1991]. Other filter methods rank features according to a relevancy score [Kira and Rendell, 1992; Holmes and Nevill-Manning, 1995]. Another school of thought argues that the bias of a particular induction algorithm should be taken into account when selecting features. This method, dubbed the Wrapper [Kohavi and John, 1996], uses the induction algorithm along with a statistical resampling technique such as cross-validation ....
G. Holmes and C. G. Nevill-Manning, Feature selection via the discovery of simple classification rules, Proc. Int. Symp. on Intelligent Data Analysis (IDA-95), 1995.
....1991]. Another method [Koller and Sahami, 1996] eliminates features whose information content (concerning other features and the class) is subsumed by some number of the remaining features. Still other methods attempt to rank features according to a relevancy score [Kira and Rendell, 1992; Holmes and Nevill-Manning, 1995]. Filters have proven to be much faster than wrappers and hence can be applied to large data sets containing many features. 2.1 Searching the Feature Subset Space The purpose of feature selection is to decide which of the initial (possibly large number) of features to include in the final subset ....
Holmes, G., Nevill-Manning, C. G. (1995). Feature selection via the discovery of simple classification rules. Proceedings of the International Symposium on Intelligent Data Analysis (IDA-95).
....(Fu 1968; Kohavi et al. 1994). FSS is a forward sequential selection algorithm which iteratively adds attributes and tests their effectiveness relevance with an induction algorithm. It produces a list of what the induction algorithm considers to be the most relevant attributes. Runic (Smith and Holmes 1995) is an algorithm for subset selection that examines rough numeric dependencies in the data. An attribute is considered potentially relevant if a particular subrange of its values can be used to predict the value of another feature more accurately than by pure chance. The degree of predictivity can ....
Holmes, G., and Nevill-Manning, C. G. (1995) "Feature selection via the discovery of simple classification rules." To appear in Proceedings of Symposium on Intelligent Data Analysis (IDA-95), Baden-Baden, Germany, August, 1995.