| R. Kohavi and D. Sommerfield. Feature subset selection using the wrapper model: Overfitting and dynamic search space topology. In First International Conference on Knowledge Discovery and Data Mining, pages 192--197, 1995. |
....methods: the wrapper and filter approaches [9]. The wrapper approach relies on feedback from the performance algorithm to learn feature weights. The filter approach optimizes a classifier-independent criterion function. The wrapper approach tends to perform better; however, it can cause overfitting [10]. Moreover, it should only be applied in combination with classifiers of low complexity to limit its computational cost. Feature selection methods have exploited several search strategies. The most rudimentary strategy, exhaustive search, considers $2^n - 1$ (where n is the maximum number of ....
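The count quoted in the excerpt above (every non-empty subset of n features) can be checked with a minimal sketch. The helper name `nonempty_subsets` and the feature names are hypothetical and do not come from any of the cited papers.

```python
from itertools import combinations


def nonempty_subsets(features):
    """Yield every non-empty subset of `features` as a tuple."""
    for size in range(1, len(features) + 1):
        yield from combinations(features, size)


# Hypothetical feature names, used only for illustration.
features = ["f1", "f2", "f3", "f4"]
subsets = list(nonempty_subsets(features))

# Exhaustive search would have to evaluate 2^n - 1 candidate subsets.
assert len(subsets) == 2 ** len(features) - 1
```

Because the count doubles with each added feature, exhaustive enumeration is only practical for small n, which is presumably why the excerpt calls it the most rudimentary strategy.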
R. Kohavi and D. Sommerfield. Feature subset selection using the wrapper model: Overfitting and dynamic search space topology. In 1 st Int. Conf. on Knowledge Discovery and Data Mining, pages 192--197, 1995.
....to use a different set of feature weights for each cluster. This can help in classifying new documents into one of the pre-existing categories. So far, the problems of clustering and feature selection have been treated rather independently or in a wrapper kind approach [8] [9] [10] [11] [12] [13], but rarely coupled together to achieve the same objective. In [14] we have presented a new algorithm, called Simultaneous Clustering and Attribute Discrimination (SCAD), that performs clustering and feature weighting simultaneously. When used as part of a supervised or unsupervised learning ....
R. Kohavi and D. Sommerfield, "Feature subset selection using the wrapper model: Overfitting and dynamic search space topology, " in First International Conference on Knowledge Discovery and Data Mining, 1995, pp. 192--197.
....induction (Buntine 1991), in which an inductive learner and a human analyst interact in real time, requires very fast learning algorithms in order to be practical. Wrapper approaches, which for a particular problem and algorithm iteratively search for feature subsets or good parameter settings (Kohavi & Sommerfield 1995) (Provost & Buchanan 1995), also require very fast learners because such systems run the learning algorithms multiple times, evaluating them under different conditions. SCALING METHODS Main Approach General Methods Example Techniques Fast algorithm Restricted model space linear discriminant, ....
Kohavi, R. and Sommerfield, D. 1995. Feature subset selection using the wrapper model: Overfitting and dynamic search space topology. In Proc. KDD-95.
....the mentioned weights. Bearing in mind the general EBNA procedure, Figure 3 summarizes the FW EBNA method. Following the basic EDA scheme, the initial population of weights is randomly created. To adopt a stopping criterion we will follow the findings of Ng [16] and Kohavi and Sommerfield [17]. Ng [16], in a work on the overfitting phenomenon, demonstrates that when cross-validation is used to select from a large pool of different classification models in a noisy task with too small a training set, it may not be advisable to pick the model with minimum cross-validation error, and a ....
....of different classification models in a noisy task with too small a training set, it may not be advisable to pick the model with minimum cross-validation error, and a model with higher cross-validation error will have better generalization power over novel test instances. Kohavi and Sommerfield [17] display the effect of overfitting in a Feature Subset Selection problem using a wrapper cross-validated approach when the number of instances is small. As we usually work with small (less than 1,000 training instances) and noisy training sets, we decide to stop the search when in a sampled new ....
R. Kohavi, D. Sommerfield, Feature Subset Selection using the wrapper model: overfitting and dynamic search space topology, Proceedings of the First International Conference on Knowledge Discovery and Data Mining, KDD'95, Montreal, Canada, 1995, pp. 192-197.
....approach to feature selection. In the wrapper model (section 3.2.3) the performance of feature subsets is measured by classifiers. The feature selection process, then, depends on the learning algorithm being used. For this particular implementation of our framework, we chose the wrapper model [13] because in many cases different features can yield different results with different learning algorithms. In our model, features and classifiers are chosen automatically for each node of the object definition hierarchy. D1 represents the training set for a leaf node, and Ci the ....
....searching all possibilities. In this paper, rather than focusing on a particular selection strategy, the goal is to show the benefits of automating the feature selection process. Since the measure $P_e$ is dependent on the learning algorithm and data used, we use the wrapper model described in [13]. In that model (fig. 5) feature selection is performed with respect to a particular classifier and data set. [Figure caption: The wrapper model for feature selection in [13].] The classifiers themselves are used to measure performance, so the features selected depend specifically on the data and ....
[Article contains additional citation context not shown here]
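The wrapper idea described in the excerpts above (the classifier itself scores each candidate feature subset) can be sketched as follows. This is an illustrative sketch only, not the procedure of the cited paper: the 1-nearest-neighbour learner, the greedy forward search, the interleaved fold split, and the toy data are all assumptions chosen to keep the example self-contained.

```python
def one_nn_predict(train_X, train_y, x, subset):
    """Label x with the class of its nearest training point, using only `subset`."""
    def dist(a, b):
        return sum((a[j] - b[j]) ** 2 for j in subset)
    best = min(range(len(train_X)), key=lambda i: dist(train_X[i], x))
    return train_y[best]


def cv_accuracy(X, y, subset, folds=5):
    """Estimate the accuracy of 1-NN restricted to `subset` by k-fold cross-validation."""
    correct = 0
    for f in range(folds):
        test_idx = [i for i in range(len(X)) if i % folds == f]
        train_idx = [i for i in range(len(X)) if i % folds != f]
        tX = [X[i] for i in train_idx]
        ty = [y[i] for i in train_idx]
        correct += sum(one_nn_predict(tX, ty, X[i], subset) == y[i] for i in test_idx)
    return correct / len(X)


def forward_selection(X, y, n_features):
    """Greedy wrapper search: add the feature that most improves the
    cross-validated accuracy, and stop when no addition improves it."""
    selected, best_score = [], 0.0
    while True:
        candidates = [j for j in range(n_features) if j not in selected]
        scored = [(cv_accuracy(X, y, selected + [j]), j) for j in candidates]
        if not scored:
            break
        score, j = max(scored)
        if score <= best_score:
            break
        selected.append(j)
        best_score = score
    return selected, best_score


# Toy data (an assumption): feature 0 determines the class, feature 1 is noise.
X = [(float(i), float((i * 7) % 5)) for i in range(20)]
y = [int(i >= 10) for i in range(20)]
selected, score = forward_selection(X, y, n_features=2)
print(selected, score)
```

Because the same cross-validation estimate is used both to guide the search and to report the final score, the reported score can be optimistic on small samples, which is the overfitting concern several of the citing papers attribute to the wrapper approach.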
R. Kohavi, "Feature Subset Selection Using the Wrapper Model: Overfitting and Dynamic Search Space Topology", First International Conference on Knowledge Discovery and Data Mining, pp. 192-197, 1995.
....an inductive learner and a human analyst interact in real time, requires very fast learning algorithms in order to be practicable. Wrapper approaches, which for a particular problem and algorithm iteratively search for feature subsets or good parameter settings (Kohavi and Sommerfield 1995; Kohavi 1996; Provost 1992; Provost and Buchanan 1995) also require very fast learners because such systems run the learning algorithms multiple times, evaluating them under different conditions. Furthermore, in a wrapper approach, each evaluation may involve multiple runs to produce performance ....
Kohavi, R. and D. Sommerfield (1995). Feature subset selection using the wrapper model: Overfitting and dynamic search space topology. In Proceedings of the First International Conference on Knowledge Discovery and Data Mining, Menlo Park, CA. AAAI Press.
....case with the help of a description generated by the learning system. Instead of ( T ) v) we will just write (T ; v) to indicate the class returned for the feature vector v by the classifier that was generated by on the set of preclassified cases from T (the notation has been taken from Kohavi 1995). 2.2.2 Training Sets and Test Sets We have mentioned above that the set of cases presented to a learning system during the learning phase is usually called training set. Often, the available data set of preclassified cases is subdivided so that one part is used as the training set and the other ....
....domain). In order to do that, we will have to estimate those measures of performance from available data. Most of the measures and methods described in this section have been implemented in the VIE CBR2 system (cf. chapter 4). Most definitions of performance measures and estimates were taken from Kohavi (1995) and Egmont-Petersen, Talmon, Brender, and McNair (1994). 2.2.5.1 Classification Accuracy. Probably the most commonly used measure of performance is accuracy. Accuracy is the relative frequency of correctly classified cases: $a = \frac{n_T}{n_T + n_F}$ (2.2), where $n_T$ and $n_F$ are the number of correctly ....
[Article contains additional citation context not shown here]
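The accuracy measure quoted in the excerpt above (the relative frequency of correctly classified cases) can be written as a short sketch. The function name and the example labels are hypothetical; n_T and n_F follow the excerpt's notation for the counts of correctly and incorrectly classified cases.

```python
def accuracy(predicted, actual):
    """Return n_T / (n_T + n_F), the fraction of cases classified correctly."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty label sequences of equal length")
    n_T = sum(p == a for p, a in zip(predicted, actual))  # correctly classified
    n_F = len(actual) - n_T                               # incorrectly classified
    return n_T / (n_T + n_F)


# Hypothetical labels: three of the four predictions match.
print(accuracy(["a", "b", "a", "a"], ["a", "b", "b", "a"]))  # 0.75
```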
Kohavi, R. & D. Sommerfield (1995). Feature Subset Selection Using the Wrapper Model: Overfitting and Dynamic Search Space Topology. In KDD-95. Submitted manuscript.
....interactive induction [14], in which an inductive learner and a human analyst interact in real time, requires very fast learning algorithms in order to be practical. Wrapper approaches, which for a particular problem and algorithm iteratively search for feature subsets or good parameter settings [46] [44] [58] [55], also require very fast learners because such systems run the learning algorithms multiple times, evaluating them under different conditions. Furthermore, in a wrapper approach, each evaluation may involve multiple runs to produce performance statistics (e.g. with cross ....
Kohavi, R. and Sommerfield, D. (1995). Feature subset selection using the wrapper model: Overfitting and dynamic search space topology. In Proc. of The First Intl. Conf. on Knowledge Discovery and Data Mining (KDD-95), Menlo Park, CA: AAAI Press.
....induction (Buntine 1991), in which an inductive learner and a human analyst interact in real time, requires very fast learning algorithms in order to be practicable. Wrapper approaches, which for a particular problem and algorithm iteratively search for feature subsets or good parameter settings (Kohavi and Sommerfield 1995) (Kohavi 1996) (Provost and Buchanan 1995) (Provost 1992), also require very fast learners because such systems run the learning algorithms multiple times, evaluating them under different conditions. Furthermore, in a wrapper approach, each evaluation may involve multiple runs to produce ....
Kohavi, R. and D. Sommerfield (1995). Feature subset selection using the wrapper model: Overfitting and dynamic search space topology. In Proceedings of the First International Conference on Knowledge Discovery and Data Mining, Menlo Park, CA. AAAI Press.
....be searched before a solution is found. The most common technique for reducing the size of this hypothesis space is to attempt to identify relevant subsets of these attributes, a process that is commonly referred to as feature subset selection (FSS) [Caruana and Freitag, 1994; John et al., 1994; Kohavi and Sommerfield, 1995; Pfahringer, 1995]. The special case of identifying relevant values of attributes that could be used as candidate conditions in a rule learning algorithm has also been called literal selection [Gamberger, 1995]. FSS algorithms attempt to dynamically identify candidate conditions that are ....
Ron Kohavi and Dan Sommerfield. Feature subset selection using the wrapper model: Overfitting and dynamic search space topology. In U.M. Fayyad and R. Uthurusamy, editors, Proceedings of the 1st International Conference on Knowledge Discovery and Data Mining (KDD-95), pages 192--197. AAAI Press, 1995.
....$S''$ of size $\gamma m$. Its increase with $f$ reflects the fact that we are testing a set of hypotheses of size exponential in $f$, and that there is potential for overfitting the $\gamma m$ holdout samples. In the context of feature selection, the issue of overfitting of hold-out data was also raised by [Kohavi and Sommerfield, 1995]; see also [Ng, 1997] for a detailed discussion of overfitting of hold-out data in hypothesis selection. But since this is a worst-case bound, it holds in particular for the bad case where all $2^f$ hypotheses are very different from each other. This is unlikely as they were trained on the ....
Kohavi, R. and Sommerfield, D. (1995). Feature subset selection using the wrapper model: Overfitting and dynamic search space topology. In Proceedings of the First International Conference on Knowledge Discovery and Data Mining.
.... methods give good results [19,1,16]. However, for the sake of computational cost, wrapper methods can only be applied in combination with classifiers of low complexity like e.g. k-nearest-neighbor methods or decision trees (see e.g. [25] or [9]). Moreover, wrapper methods can cause overfitting [11] because the learning algorithm is fitted by the change of data. Summarizing, we favor the filter approach because this technique is generally applicable and can be used even with complex classifiers like neural networks. There exist a lot of search strategies for the determination of an optimal ....
....feature subsets, whereas in RELIEF-F the user has to guess the relevant features by the size of the feature weights. This would be a hard problem if features have approximately the same feature weights: are all of these features relevant or just a subset of them? For applying RELIEF-F, [19], [11] and [12] propose the selection of features with negative weights, which would lead to the selection of all features in every one of our real-world examples and therefore to suboptimal results. Since our feature selection method and RELIEF-F face the same goal but use different optimization techniques ....
John G. Kohavi R. Feature subset selection using the wrapper model: Overfitting and dynamic search space topology. Proceedings of the First International Conference on Knowledge Discovery and Data Mining, 1995.
....counts with an m-estimate Laplace correction (Cestnik 1990) as described in Kohavi, Becker & Sommerfield (1997). The Naive Bayes classifier is relatively simple but very robust to violations of its independence assumptions. It performs well for many real-world datasets (Domingos & Pazzani 1997, Kohavi & Sommerfield 1995) and is excellent at handling irrelevant attributes (Langley & Sage 1997). 4. The Voting Algorithms. The different voting algorithms used are described below. Each algorithm takes an inducer and a training set as input and runs the inducer multiple times by changing the distribution of training ....
....and variance gives similar estimated error to holdout error estimation repeated 30 times. Standard deviations of the error estimate from each run were computed as the standard deviation of the three outer runs, assuming they were independent. Although such an assumption is not strictly correct (Kohavi 1995a, Dietterich 1998), it is quite reasonable given our circumstances because our training sets are small in size and we only average three values. 6. Experimental Design. We now describe our desiderata for comparisons, show a sanity check we performed to verify the correctness of our ....
[Article contains additional citation context not shown here]
Kohavi, R. & Sommerfield, D. (1995), Feature subset selection using the wrapper model: Overfitting and dynamic search space topology, in `The First International Conference on Knowledge Discovery and Data Mining', pp. 192--197.
....knowledge of the algorithm is needed, just the interface). The feature subset selection algorithm conducts a search for a good subset using the induction algorithm itself as part of the evaluation function. The accuracy of the induced classifiers is estimated using accuracy estimation techniques (Kohavi, 1995b). The problem we are investigating is that of state space search, and different search engines will be investigated in the next sections. The wrapper approach conducts a search in the space of possible parameters. A search requires a state space, an initial state, a termination condition, and a ....
....of 100 instances, the estimated accuracy rose to 76% (26% optimistic) after about 300 node evaluations, indicative of overfitting. Although the theoretical problem exists, our experiments with the wrapper approach indicate that overfitting is mainly a problem when the number of instances is small (Kohavi and Sommerfield, 1995). Moreover, even if the estimates are biased, the algorithm may still choose the correct feature subsets because it is the relative accuracy that matters most. 1.5 RELATED WORK. The pattern recognition and statistics literature offers many filter approaches for feature subset selection (Devijver ....
[Article contains additional citation context not shown here]
Kohavi, R. and Sommerfield, D. (1995). Feature subset selection using the wrapper model: Overfitting and dynamic search space topology. In The First International Conference on Knowledge Discovery and Data Mining, pages 192--197.
....problems: the learning algorithms are not given access to the underlying distribution, and most practical algorithms attempt to find a hypothesis by approximating NP-hard optimization problems. The first problem is closely related to the bias-variance tradeoff (Geman, Bienenstock & Doursat 1992, Kohavi 1995b): one must trade off estimation of more parameters (bias reduction) with accurately estimating these parameters (variance reduction). This problem is independent of the computational power available to the learner. The second problem, that of finding a best hypothesis, is usually intractable ....
....algorithm is needed, just the interface). The feature subset selection algorithm conducts a search for a good subset using the induction algorithm itself as part of the evaluation function. The accuracy of the induced classifiers is estimated using accuracy estimation techniques as described in Kohavi (1995b). The problem we are investigating is that of state space search, and different search engines will be investigated in the next sections. The wrapper approach conducts a search in the space of possible parameters. A search requires a state space, an initial state, a termination condition, and a ....
[Article contains additional citation context not shown here]
Kohavi, R. & Sommerfield, D. (1995), Feature subset selection using the wrapper model: Overfitting and dynamic search space topology, in The First International Conference on Knowledge Discovery and Data Mining, pp. 192--197.
....we give comprehensibility an important weight. 3. It is compact. While related to comprehensibility, one does not imply the other. A perceptron (see below) might be a compact classifier, yet given an instance, it may be hard to understand the labelling process. Alternatively, a decision table (Kohavi 1995a; also see Chapter 5 on page 130) may be very large, yet labelling each instance is trivial: simply look it up in the table. Michie (1987) reported that when ID3's output on the chess domain was shown to a domain expert, i.e. a chess master, it was completely opaque. Although it was very accurate, the tree was large, obscure, ....
....learning community by Wolpert (1992a) and Schaffer (1993), and the subject has received much attention in the statistics community (cf. Miller (1990)). Although the theoretical problem exists, our experiments indicate that overfitting is mainly a problem when the number of instances is small. Kohavi & Sommerfield (1995a) reported that out of 70 searches for feature subsets with datasets containing over 250 instances, ten searches were optimistically biased by more than two standard deviations and .... [Figure residue: axes "Nodes" and "Accuracy"; panel titles "Rand forward selection" and "Breast ...."]
Kohavi, R. & Sommerfield, D. (1995a), Feature subset selection using the wrapper model: Overfitting and dynamic search space topology, in "The First International Conference on Knowledge Discovery and Data Mining".
....counts with an m-estimate Laplace correction (Cestnik 1990) as described in Kohavi, Becker & Sommerfield (1997). The Naive Bayes classifier is relatively simple but very robust to violations of its independence assumptions. It performs well for many real-world datasets (Domingos & Pazzani 1997, Kohavi & Sommerfield 1995) and is excellent at handling irrelevant attributes (Langley & Sage 1997). 4. The Voting Algorithms. The different voting algorithms used are described below. Each algorithm takes an inducer and a training set as input and runs the inducer multiple times by changing the distribution of training set ....
....and variance gives similar estimated error to holdout error estimation repeated 30 times. Standard deviations of the error estimate from each run were computed as the standard deviation of the three outer runs, assuming they were independent. Although such an assumption is not strictly correct (Kohavi 1995a, Dietterich 1998), it is quite reasonable given our circumstances because our training sets are small in size and we only average three values. 6. Experimental Design. We now describe our desiderata for comparisons, show a sanity check we performed to verify the correctness of our implementation, ....
[Article contains additional citation context not shown here]
Kohavi, R. & Sommerfield, D. (1995), Feature subset selection using the wrapper model: Overfitting and dynamic search space topology, in `The First International Conference on Knowledge Discovery and Data Mining', pp. 192--197.
....The accuracy of predictions made about an instance. For example, whether a customer will be able to pay a loan or whether he or she will respond to yet another credit card offer. Using methods such as holdout, bootstrap, and cross-validation (Weiss & Kulikowski 1991, Efron & Tibshirani 1995, Kohavi 1995b), one can estimate the future prediction accuracy on unseen data quite well in practice. Comprehensibility: The ability for humans to understand the data and the classification rules induced by the learning algorithm. Some classifiers, such as decision rules and decision trees are inherently ....
....was made so long as it is accurate. Compactness: While related to comprehensibility, one does not necessarily imply the other. A Perceptron (single neuron) might be a compact classifier, yet given an instance, it may be hard to understand the labelling process. Alternatively, a decision table (Kohavi 1995a) may be very large, yet labelling each instance is trivial: one simply looks it up in the table. Training and classification time: The time it takes to classify versus the training time. Some classifiers, such as neural networks, are fast to classify but slow to train. Other classifiers, such as ....
[Article contains additional citation context not shown here]
Kohavi, R. & Sommerfield, D. (1995), Feature subset selection using the wrapper model: Overfitting and dynamic search space topology, in `The First International Conference on Knowledge Discovery and Data Mining', pp. 192--197.
....fragmentation, which in turn causes the irrelevant feature to be chosen. [The feature subset] selection algorithm conducts a search for a good subset using the induction algorithm itself as part of the evaluation function. The accuracy of the induced classifiers is estimated using accuracy estimation techniques (Kohavi 1995b). The problem we are investigating is that of state space search, and different search engines will be investigated in the next sections. The wrapper approach conducts a search in the space of possible parameters. A search requires a state space, an initial state, a termination condition, and a ....
.... seeing only a small number of instances), age (old datasets at the UC Irvine repository, such as Chess, hypothyroid, and vote, were not considered because of their possible influence on the development of algorithms). A detailed description of the datasets and these considerations is given in Kohavi (1995c). Small datasets were tested using ten-fold cross-validation; artificial datasets and large datasets were split into training and testing sets (the artificial datasets have a well-defined training set, as does the DNA dataset from StatLog (Taylor, Michie & Spiegelhalter 1994)). The baseline ....
[Article contains additional citation context not shown here]
Kohavi, R. & Sommerfield, D. (1995), Feature subset selection using the wrapper model: Overfitting and dynamic search space topology, in The First International Conference on Knowledge Discovery and Data Mining, pp. 192--197.
....is made for the class with the largest posterior probability. The probabilities in the above formulas must be estimated from the training set. This model is very robust and continues to perform well even in the face of obvious violations of this independence assumption (Domingos & Pazzani 1996, Kohavi & Sommerfield 1995). We begin with a discussion of our motivation and requirements for the SBC visualization. We then describe it in detail and then explain why we made certain design decisions. 2 Motivation and Requirements for Visualization. The ability to describe the structure of a classifier in a way that ....
Kohavi, R. & Sommerfield, D. (1995), Feature subset selection using the wrapper model: Overfitting and dynamic search space topology, in `The First International Conference on Knowledge Discovery and Data Mining', pp. 192--197.
No context found.
R. Kohavi and D. Sommerfield. Feature subset selection using the wrapper model: Overfitting and dynamic search space topology. In First International Conference on Knowledge Discovery and Data Mining, pages 192--197, 1995.
No context found.
Ron Kohavi and Dan Sommerfield. 1995. Feature subset selection using the wrapper model: Overfitting and dynamic search space topology. In The First International Conference on Knowledge Discovery and Data Mining, pages 192--197. AAAI Press, Menlo Park, California, August. Journal version in AIJ.
No context found.
R. Kohavi and D. Sommerfield. Feature subset selection using the wrapper model: Overfitting and dynamic search space topology. In First International Conference on Knowledge Discovery and Data Mining, pages 192--197, 1995.
No context found.
R. Kohavi, "Feature Subset Selection Using the Wrapper Model: Overfitting and Dynamic Search Space Topology," in proceedings of First International Conference on Knowledge Discovery and Data Mining, pages 192-197, 1995.