Dietterich, T. G., & Kong, E. B. (1995). Machine learning bias, statistical bias, and statistical variance of decision tree algorithms. Tech. rep., Oregon State University. Available via the WWW at http://www.cs.orst.edu:80/tgd/cv/tr.html.
....stage, the classification part, the predictions of the binary classifiers are combined to extend a prediction on the original label of a test instance. Experimental work has shown that output coding can often greatly improve over standard reductions to binary problems (Dietterich & Bakiri, 1995; Dietterich & Kong, 1995; Kong & Dietterich, 1995; Aha & Bankert, 1997; Schapire, 1997; Dietterich, 1999; Berger, 1999; Allwein, Schapire, & Singer, 2000). The performance of output coding was also analyzed in statistics and learning-theoretic contexts (Hastie & Tibshirani, 1998; James & Hastie, 1998; Schapire & Singer, 1999; Allwein et al. ....
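The two stages described in this context (relabel the training data once per binary problem, then combine the binary predictions) can be sketched as follows. This is a minimal illustration, not the method of any cited paper: the four-class code matrix is chosen here for illustration, and `relabel`, `hamming`, and `decode` are hypothetical helper names. Decoding assigns the class whose codeword is nearest, in Hamming distance, to the vector of binary predictions.

```python
from typing import Sequence

# Hypothetical code matrix for four classes: row c is the codeword of class c,
# and column j defines the j-th binary learning problem. Every pair of rows
# differs in 4 of the 7 bits, so any single wrong binary prediction is corrected.
CODE_MATRIX: list[list[int]] = [
    [1, 1, 1, 1, 1, 1, 1],  # class 0
    [0, 0, 0, 0, 1, 1, 1],  # class 1
    [0, 0, 1, 1, 0, 0, 1],  # class 2
    [0, 1, 0, 1, 0, 1, 0],  # class 3
]


def relabel(labels: Sequence[int], column: int,
            code: list[list[int]] = CODE_MATRIX) -> list[int]:
    """Training stage: binary targets for the problem defined by one column."""
    return [code[y][column] for y in labels]


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions at which two bit vectors disagree."""
    return sum(x != y for x, y in zip(a, b))


def decode(bits: Sequence[int], code: list[list[int]] = CODE_MATRIX) -> int:
    """Classification stage: class whose codeword is closest to the predicted bits."""
    return min(range(len(code)), key=lambda c: hamming(bits, code[c]))


if __name__ == "__main__":
    print(relabel([0, 1, 2, 3], column=0))  # [1, 0, 0, 0]
    noisy = [1, 0, 1, 1, 0, 0, 1]           # class 2's codeword with bit 0 flipped
    print(decode(noisy))                    # 2
```

With a real learner, one binary classifier would be trained per column on the `relabel` targets, and their outputs at a test instance would form the bit vector passed to `decode`.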
Dietterich, T. G., & Kong, E. B. (1995). Machine learning bias, statistical bias, and statistical variance of decision tree algorithms. Tech. rep., Oregon State University. Available via the WWW at http://www.cs.orst.edu:80/tgd/cv/tr.html.
....the learner guesses the target on average; the variance measures how much the returned hypotheses h_L differ over all possible samples when the learner is invoked repeatedly. Bias-variance decompositions are known for the quadratic loss function (Geman et al., 1992) and the zero-one loss (Kong & Dietterich, 1995; Dietterich & Kong, 1995; Kohavi & Wolpert, 1996). The bias-variance decomposition serves as a tool for the analysis of learning algorithms rather than as an efficient method of estimating the error incurred by a learner for a particular problem. Estimating the terms of the decomposition requires repeated runs of the ....
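Because the decomposition is defined over all possible training samples, its terms are usually estimated by retraining the learner on many independent samples and examining its predictions at a fixed test point. The sketch below does this for the quadratic loss under illustrative assumptions: a hypothetical linear target `true_f`, Gaussian label noise, and a deliberately biased learner that always predicts the mean training label. Error is measured against the noise-free target value, so the noise term is left out; with all averages taken over the same runs, the mean squared error equals squared bias plus variance exactly.

```python
import random
import statistics
from typing import Callable, Sequence


def true_f(x: float) -> float:
    """Hypothetical noise-free target function."""
    return 2.0 * x


def train_mean_predictor(sample: Sequence[tuple[float, float]]) -> Callable[[float], float]:
    """Deliberately biased learner: ignores x and predicts the mean training label."""
    mean_y = statistics.fmean(y for _, y in sample)
    return lambda x: mean_y


def decompose_squared_loss(predictions: Sequence[float],
                           target: float) -> tuple[float, float, float]:
    """Return (mse, bias^2, variance) of predictions h_L(x) around the target f(x)."""
    mean_pred = statistics.fmean(predictions)
    bias_sq = (mean_pred - target) ** 2
    variance = statistics.fmean((p - mean_pred) ** 2 for p in predictions)
    mse = statistics.fmean((p - target) ** 2 for p in predictions)
    return mse, bias_sq, variance


def repeated_runs(n_runs: int = 500, n_train: int = 10, x0: float = 0.8,
                  noise_sd: float = 0.3, seed: int = 0) -> tuple[float, float, float]:
    """Train on n_runs independent samples and decompose the error at test point x0."""
    rng = random.Random(seed)
    predictions = []
    for _ in range(n_runs):
        sample = []
        for _ in range(n_train):
            x = rng.uniform(0.0, 1.0)
            sample.append((x, true_f(x) + rng.gauss(0.0, noise_sd)))
        h = train_mean_predictor(sample)
        predictions.append(h(x0))
    return decompose_squared_loss(predictions, true_f(x0))


if __name__ == "__main__":
    mse, bias_sq, variance = repeated_runs()
    print(f"mse={mse:.4f} bias^2={bias_sq:.4f} variance={variance:.4f}")
```

For these settings the learner's average prediction is about 1.0 while f(0.8) = 1.6, so the squared bias should come out near 0.36 and the variance near 0.04, regardless of how many runs are used to estimate them.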
Dietterich, T. G., & Kong, E. B. (1995). Machine learning bias, statistical bias, and statistical variance of decision tree algorithms. Tech. rep., Department of Computer Science, Oregon State University.
....prediction accuracy and computational requirements are two primary concerns. To increase the prediction accuracy of classifiers, classifier committee learning techniques have been developed with great success (Freund 1996; Freund and Schapire 1996a; 1996b; Quinlan 1996; Breiman 1996a; 1996b; Dietterich and Kong 1995; Ali 1996; Chan, Stolfo, and Wolpert 1996; Schapire, Freund, Bartlett, and Lee 1997; Domingos 1997; Bauer and Kohavi 1998), especially Boosting (Freund and Schapire 1996b; Quinlan 1996; Bauer and Kohavi 1998). This type of technique generates several classifiers to form a committee by using a ....
.... There are some other classifier committee learning approaches, such as generating multiple trees by manually changing learning parameters (Kwok and Carter 1990), error-correcting output codes (Dietterich and Bakiri 1995), and generating different classifiers by randomizing the base learning process (Dietterich and Kong 1995; Ali 1996), which is similar to Sasc (Zheng and Webb 1998). Reviews of related methods are provided in Dietterich (1997) and Ali (1996). A collection of recent research in this area can be found in Chan et al. (1996). Based on the observation that both Boosting and Sasc can significantly increase ....
Dietterich, T. G., & Kong, E. B. (1995). Machine learning bias, statistical bias, and statistical variance of decision tree algorithms. Tech. rep., Department of Computer Science, Oregon State University, Corvallis, Oregon. Available from ftp://ftp.cs.orst.edu/pub/tgd/papers/tr-bias.ps.gz.
....However, increasing the training data size linearly with concept complexity helped keep the accuracy stable. • Greedily induced trees were not much larger than the optimal trees, even for complex concepts. However, the variance in tree size was greater for higher-dimensional and more complex concepts. Dietterich and Kong (1995) empirically argue that, even in terms of prediction accuracy, variance is the main cause of the failure of decision trees in some domains. • For a fixed training set size, increasing the dimensionality did not affect greedy induction as much as increasing concept complexity or noise did. ....
Dietterich, T. G., & Kong, E. B. (1995). Machine learning bias, statistical bias, and statistical variance of decision tree algorithms. In Schlimmer, J. (Ed.), MLC-95: Machine Learning. Proceedings of the Twelfth International Conference, Tahoe City, CA. Morgan Kaufmann Publishers Inc.
....However, increasing the training data size linearly with concept complexity helped keep the accuracy stable. • Greedily induced trees were not much larger than the optimal trees, even for complex concepts. However, the variance in tree size was greater for higher-dimensional and more complex concepts. Dietterich and Kong (1995) empirically argued that, even in terms of prediction accuracy, variance is the main cause of the failure of decision trees in some domains. • For a fixed training set size, increasing the dimensionality did not affect greedy induction as much as increasing concept complexity or noise did. ....
Dietterich, T. G., & Kong, E. B. (1995). Machine learning bias, statistical bias, and statistical variance of decision tree algorithms. In Machine Learning: Proceedings of the Twelfth International Conference.
....Inductive learning, Machine learning, Data mining. [1 Introduction] In order to increase the prediction accuracy of classifiers, classifier committee learning techniques have been developed with great success (Freund 1996; Freund and Schapire 1996a; 1996b; Quinlan 1996; Breiman 1996a; 1996b; Dietterich and Kong 1995; Ali 1996; Chan, Stolfo, and Wolpert 1996; Schapire, Freund, Bartlett, and Lee 1997; Domingos 1997; Bauer and Kohavi 1998; Zheng and Webb 1998), especially Boosting (Freund and Schapire 1996b; Quinlan 1996; Bauer and Kohavi 1998). This type of technique generates several classifiers to form a ....
.... There are some other classifier committee learning approaches, such as generating multiple trees by manually changing learning parameters (Kwok and Carter 1990), error-correcting output codes (Dietterich and Bakiri 1995), and generating different classifiers by randomizing the base learning process (Dietterich and Kong 1995; Ali 1996), which is similar to Sasc (Zheng and Webb 1998). Reviews of related methods are provided in Dietterich (1997) and Ali (1996). A collection of recent research in this area can be found in Chan et al. (1996). In the following section, we briefly describe the Boosting and Sasc techniques ....
Dietterich, T. G., & Kong, E. B. (1995). Machine learning bias, statistical bias, and statistical variance of decision tree algorithms. Tech. rep., Department of Computer Science, Oregon State University, Corvallis, Oregon. Available from ftp://ftp.cs.orst.edu/pub/tgd/papers/tr-bias.ps.gz.
....learning. The bias-variance decomposition for quadratic loss functions is well known and serves as an important tool for analyzing learning algorithms, yet no decomposition was offered for the more commonly used zero-one (misclassification) loss functions until the recent work of Kong & Dietterich (1995) and Breiman (1996). Their decomposition suffers from some major shortcomings, though (e.g., potentially negative variance), which our decomposition avoids. We show that, in practice, the naive frequency-based estimation of the decomposition terms is by itself biased and show how to correct for ....
....For classification, the quadratic loss function is often inappropriate because the class labels are not numeric. In practice, an overwhelming majority of researchers in the Machine Learning community instead use expected misclassification rate, which is equivalent to the zero-one loss. Kong & Dietterich (1995) and Dietterich & Kong (1995) recently proposed a bias-variance decomposition for zero-one loss functions, but their proposal suffers from some major problems, such as the possibility of negative variance, and only allowing the values zero or one as the bias for a given test point. In this paper, ....
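The negative-variance problem mentioned in these contexts is easy to reproduce. The sketch below follows the zero-one decomposition as it is usually summarized, assuming a noise-free target: the main prediction is the most frequent label h_L(x) across training samples, bias is 1 if that main prediction is wrong and 0 otherwise, and variance is whatever error remains after subtracting the bias. The function name and the toy prediction lists are illustrative only.

```python
from collections import Counter
from typing import Hashable, Sequence


def zero_one_decomposition(predictions: Sequence[Hashable],
                           true_label: Hashable) -> tuple[float, int, float]:
    """Return (error, bias, variance) at one test point.

    predictions holds h_L(x) for learners trained on different samples L.
    """
    error = sum(p != true_label for p in predictions) / len(predictions)
    # Main prediction = mode of the predictions (ties go to the first label seen).
    main_prediction = Counter(predictions).most_common(1)[0][0]
    bias = int(main_prediction != true_label)  # only ever 0 or 1
    variance = error - bias                    # negative whenever bias = 1 and error < 1
    return error, bias, variance


if __name__ == "__main__":
    print(zero_one_decomposition(["a", "a", "a", "b"], "a"))  # (0.25, 0, 0.25)
    print(zero_one_decomposition(["b", "b", "b", "a"], "a"))  # (0.75, 1, -0.25)
```

In the second case the occasional correct vote pulls the error below the bias of 1, and that reduction is booked as a variance of -0.25.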
[Article contains additional citation context not shown here]
Dietterich, T. G., & Kong, E. B. (1995). Machine learning bias, statistical bias, and statistical variance of decision tree algorithms. Tech. rep., Department of Computer Science, Oregon State University.
....may all involve splitting on the same attribute. This is a very crude randomization technique. One can imagine more sophisticated methods that preferred to select splits with higher information gain. But our goal in this paper is to explore how well this simple method works. In a previous paper (Dietterich & Kong, 1995), we reported promising results for this technique on five tasks. In this paper, we have performed a much more thorough experiment using 33 learning tasks. We compare randomized C4.5 to C4.5 alone, C4.5 with bagging, and C4.5 with AdaBoost.M1 (boosting by weighting). We also explore the effect of ....
....competitive with, and probably superior to, bagged C4.5 in applications where there is relatively little noise in the data. One disappointing aspect of the results shown in Figure 1 and Table 1 is that randomization did not reach zero error in the letter recognition domain. In a previous study, Dietterich and Kong (1995) reported an experiment in which 200-fold randomized C4.5 attained perfect performance on the letter recognition task (training on the first 16,000 examples and testing on the remaining 4,000). We have attempted to replicate that result without success, and we have not been able to determine the ....
Dietterich, T. G., & Kong, E. B. (1995). Machine learning bias, statistical bias, and statistical variance of decision tree algorithms. Tech. rep., Department of Computer Science, Oregon State University, Corvallis, Oregon. Available from ftp://ftp.cs.orst.edu/pub/tgd/papers/tr-bias.ps.gz.
.... research has shown that committee (or ensemble) learning techniques (Dietterich, 1997) can significantly increase the prediction accuracy of base classifier learning algorithms, especially decision tree learning (Freund, 1996; Freund & Schapire, 1996a; 1996b; Quinlan, 1996; Breiman, 1996a; 1996b; Dietterich & Kong, 1995; Ali, 1996; Chan, Stolfo, & Wolpert, 1996; Schapire, Freund, Bartlett, & Lee, 1997; Bauer & Kohavi, 1999; Zheng & Webb, 1998a; 1998b). Committee learning induces multiple individual classifiers to form a committee by repeated application of a single base learning algorithm. At the classification ....
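At classification time the committee members' predictions have to be combined. A minimal sketch of that step, assuming each member is any callable that maps an instance to a label: with no weights it is the plain plurality vote used for bagged committees, and with per-member weights (for example, AdaBoost.M1's log(1/β) weights) it becomes a weighted vote. `committee_predict` is a hypothetical name.

```python
from collections import defaultdict
from typing import Callable, Hashable, Optional, Sequence

Classifier = Callable[[object], Hashable]


def committee_predict(committee: Sequence[Classifier], x: object,
                      weights: Optional[Sequence[float]] = None) -> Hashable:
    """Combine member predictions by (optionally weighted) plurality vote."""
    if weights is None:
        weights = [1.0] * len(committee)  # unweighted vote, as in bagging
    tally: dict[Hashable, float] = defaultdict(float)
    for member, weight in zip(committee, weights):
        tally[member(x)] += weight
    # Highest total wins; ties go to the label that received its first vote earliest.
    return max(tally, key=lambda label: tally[label])


if __name__ == "__main__":
    members = [lambda x: "a", lambda x: "b", lambda x: "a"]
    print(committee_predict(members, x=None))                           # a
    print(committee_predict(members, x=None, weights=[1.0, 3.0, 1.0]))  # b
```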
Dietterich, T. G., & Kong, E. B. (1995). Machine learning bias, statistical bias, and statistical variance of decision tree algorithms. Tech. rep., Department of Computer Science, Oregon State University, Corvallis, Oregon. Available from ftp://ftp.cs.orst.edu/pub/tgd/papers/tr-bias.ps.gz.
....features with V values, the decision tree splits the data into V subsets, depending on the value of the chosen feature. For real-valued features, the decision tree splits the data into 2 subsets, depending on whether the value of the chosen feature is above or below a chosen threshold. Dietterich and Kong (1995) implemented a variant of C4.5 that chooses randomly (with equal probability) among the top 20 best tests. Table 2 compares a single run of C4.5 to ensembles of 200 classifiers constructed by bagging C4.5 and by injecting randomness into C4.5. The results show that injecting randomness obtains the ....
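The randomization described here can be sketched as a split-selection step: score every candidate test, keep the k best, and pick one of them uniformly at random. This is an illustration under stated assumptions, not the C4.5 variant itself: it handles only real-valued features with binary threshold tests, scores them by plain information gain (C4.5's default criterion is the gain ratio), and uses observed feature values as candidate thresholds. The function names are hypothetical; the default k = 20 mirrors the "top 20 best tests" in the text.

```python
import math
import random
from collections import Counter
from typing import Hashable, Optional, Sequence

Row = Sequence[float]


def entropy(labels: Sequence[Hashable]) -> float:
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows: Sequence[Row], labels: Sequence[Hashable],
                     feature: int, threshold: float) -> float:
    """Gain of the binary test rows[i][feature] <= threshold."""
    left = [y for r, y in zip(rows, labels) if r[feature] <= threshold]
    right = [y for r, y in zip(rows, labels) if r[feature] > threshold]
    if not left or not right:
        return 0.0  # the test does not split the data
    n = len(labels)
    return (entropy(labels)
            - len(left) / n * entropy(left)
            - len(right) / n * entropy(right))


def candidate_tests(rows: Sequence[Row]) -> list[tuple[int, float]]:
    """All (feature, threshold) pairs, using observed values as thresholds."""
    return sorted({(f, v) for r in rows for f, v in enumerate(r)})


def choose_random_top_k(rows: Sequence[Row], labels: Sequence[Hashable], k: int = 20,
                        rng: Optional[random.Random] = None) -> tuple[int, float]:
    """Pick uniformly at random among the k highest-gain tests."""
    rng = rng or random.Random()
    scored = sorted(((information_gain(rows, labels, f, t), f, t)
                     for f, t in candidate_tests(rows)), reverse=True)
    _, feature, threshold = rng.choice(scored[:k])
    return feature, threshold


if __name__ == "__main__":
    rows = [(0.0, 5.0), (1.0, 3.0), (2.0, 4.0), (3.0, 1.0)]
    labels = ["a", "a", "b", "b"]
    print(choose_random_top_k(rows, labels, k=1))  # (0, 1.0): the single best test
    print(choose_random_top_k(rows, labels, k=20, rng=random.Random(0)))
```

With k = 1 this reduces to ordinary greedy split selection. In an ensemble, a function like this would be called at every node, and trees grown with different random seeds would then vote.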
....and shows that the AdaBoost methods are generally superior. On the other hand, Quinlan (1996) has shown that in domains with very noisy training data, AdaBoost.M1 can perform very badly: it places high weight on incorrectly labeled training examples and consequently constructs bad classifiers. Dietterich and Kong (1995) showed that combining bagging with error-correcting output coding improved the performance of both methods, which suggests that combinations of other ensemble methods should be explored as well. Dietterich and Kong also showed that error-correcting output coding does not work well with highly ....
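The mechanism behind this noise sensitivity shows up in a single AdaBoost.M1 reweighting step: with weighted error ε, the weights of correctly classified examples are multiplied by β = ε/(1 − ε) and the distribution is renormalized, so every misclassified example, including a mislabeled one, gains relative weight. The sketch below covers only this step; the function name is hypothetical, and raising an error outside 0 < ε < 0.5 is a simplification of where the boosting loop would stop.

```python
import math
from typing import Sequence


def adaboost_m1_reweight(weights: Sequence[float],
                         correct: Sequence[bool]) -> tuple[list[float], float]:
    """One AdaBoost.M1 reweighting step.

    weights: current distribution over training examples (sums to 1).
    correct: whether the newest classifier labeled each example correctly.
    Returns the renormalized weights and beta; the classifier's vote weight is log(1/beta).
    """
    error = sum(w for w, ok in zip(weights, correct) if not ok)
    if not 0.0 < error < 0.5:
        # Simplification: boosting would stop here instead of reweighting.
        raise ValueError(f"weighted error {error} outside (0, 0.5)")
    beta = error / (1.0 - error)
    updated = [w * beta if ok else w for w, ok in zip(weights, correct)]
    total = sum(updated)
    return [w / total for w in updated], beta


if __name__ == "__main__":
    # Four equally weighted examples; the last one (say, a mislabeled example) is missed.
    new_weights, beta = adaboost_m1_reweight([0.25] * 4, [True, True, True, False])
    print([round(w, 4) for w in new_weights], round(beta, 4), round(math.log(1 / beta), 4))
```

Here β = 1/3, so the three correct examples drop to 1/6 each while the missed example rises from 0.25 to 0.5 of the total weight after one round; a mislabeled example that no reasonable classifier fits keeps climbing in this way.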
[Article contains additional citation context not shown here]
Dietterich, T. G., & Kong, E. B. (1995). Machine learning bias, statistical bias, and statistical variance of decision tree algorithms. Tech. rep., Department of Computer Science, Oregon State University, Corvallis, Oregon.
....may all involve splitting on the same attribute. This is a very crude randomization technique. One can imagine more sophisticated methods that preferred to select splits with higher information gain. But our goal in this paper is to explore how well this simple method works. In a previous paper (Dietterich & Kong, 1995), we reported promising results for this technique on five tasks. In this paper, we have performed a much more thorough experiment using 33 learning tasks. We compare randomized C4.5 to C4.5 alone, C4.5 with bagging, and C4.5 with AdaBoost.M1 (boosting by weighting). We also explore the effect of ....
....competitive with, and probably superior to, bagged C4.5 in applications where there is relatively little noise in the data. One disappointing aspect of the results shown in Figure 2 and Table 1 is that randomization did not reach zero error in the letter recognition domain. In a previous study, Dietterich and Kong (1995) reported an experiment in which 200-fold randomized C4.5 attained perfect performance on the letter recognition task (training on the first 16,000 examples and testing on the remaining 4,000). We have attempted to replicate that result without success, and we have not been able to determine the ....
Dietterich, T. G., & Kong, E. B. (1995). Machine learning bias, statistical bias, and statistical variance of decision tree algorithms. Tech. rep., Department of Computer Science, Oregon State University, Corvallis, Oregon. Available from ftp://ftp.cs.orst.edu/pub/tgd/papers/tr-bias.ps.gz.
CiteSeer.IST - Copyright Penn State and NEC