S. W. Kwok and C. Carter. Multiple decision trees. In R. D. Shachter, T. S. Levitt, L. N. Kanal, and J. F. Lemmer, editors, Uncertainty in Artificial Intelligence 4, pages 327--335. Elsevier Science Publishers B. V., North Holland, 1990.
....they call it ambiguity, of an ensemble of base classifiers, by calculating the mean square difference between the prediction made by the ensemble and the base classifiers. They proved that increasing ambiguity decreases overall error. A similar conclusion is reached by Kwok and Carter in [Kwok & Carter, 1990]. Their study shows that ensembles with decision trees that were more syntactically diverse achieved lower error rates than ensembles consisting of less diverse decision trees, while Ali and Pazzani [Ali & Pazzani, 1996] suggest that the larger the number of gain ties, the greater the ....
Kwok, S., and Carter, C. 1990. Multiple decision trees. In Uncertainty in Artificial Intelligence 4, 327--335.
....and Vedelsby [17] measure the diversity, called ambiguity, of an ensemble of base classifiers by calculating the mean square difference between the prediction made by the ensemble and the predictions of the base classifiers. They proved that increasing ambiguity (diversity) decreases overall error. Kwok and Carter [18] showed that ensembles of decision trees that were more syntactically diverse achieved lower error rates than ensembles of less diverse decision trees, while Ali and Pazzani [1] suggested that the larger the number of gain ties, the greater the ensemble's syntactic diversity is, ....
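The ambiguity measure described above can be illustrated with a small sketch. It assumes a regression-style setting with squared error and a weighted-average ensemble whose weights sum to one (our assumptions, not necessarily the cited setup). Under these conditions the ensemble error on an input equals the weighted mean member error minus the ambiguity, which is why, at fixed member error, more ambiguity means lower ensemble error. The function name is hypothetical.

```python
def ambiguity_decomposition(preds, target, weights=None):
    """Decompose the squared error of a weighted-average ensemble on one input.

    preds:   member predictions f_i(x)
    target:  true value y
    weights: non-negative, summing to 1 (defaults to uniform)
    Returns (ensemble_error, mean_member_error, ambiguity); the identity
    ensemble_error == mean_member_error - ambiguity holds up to rounding.
    """
    n = len(preds)
    w = weights if weights is not None else [1.0 / n] * n
    ensemble = sum(wi * fi for wi, fi in zip(w, preds))  # weighted average prediction
    ensemble_error = (ensemble - target) ** 2
    mean_member_error = sum(wi * (fi - target) ** 2 for wi, fi in zip(w, preds))
    # Ambiguity: weighted mean squared difference between members and the ensemble
    ambiguity = sum(wi * (fi - ensemble) ** 2 for wi, fi in zip(w, preds))
    return ensemble_error, mean_member_error, ambiguity
```

For example, members predicting 1.0, 2.0, and 3.0 for a target of 2.5 give an ensemble error of 0.25, a mean member error of about 0.917, and an ambiguity of about 0.667.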
S. Kwok and C. Carter. Multiple decision trees. In Uncertainty in Artificial Intelligence 4, pages 327--335, 1990.
....capitalize on these ideas appeared in [8]. Probably due to the immaturity of those results and to the awkwardness of the terminology used therein, the potential has not been fully explored. Of course, this research flavor has been apparent in [13], who discuss overlapping concepts, in [1], and in [12]; the latter two are oriented towards the exploitation of multiple models. Our short review would be incomplete without mentioning RIPPER [5], a system that exploits set-valued features to solve categorization problems in the linguistic domain. Cohen emphasized symbolic set-valued attributes, ....
Kwok, S.W. and Carter, C., Multiple Decision Trees. In Proceedings of the Uncertainty in Artificial Intelligence 4, (R.D. Shachter, T.S. Levitt, L.N. Kanal, and J.F. Lemmer, eds.), 1990.
....example of improving accuracy through combining the power of multiple classifiers [15]. Here each classifier is a decision tree, and we call the combined classifier a decision forest. Some previous attempts to construct multiple trees rely on certain heuristic procedures or manual intervention [19][30][31]. The number of different trees they can obtain is often severely limited. We emphasize that arbitrarily introduced differences between trees, such as using randomized initial conditions or perturbations in the search procedures [9], do not guarantee good combination performance. Here good ....
S.W. Kwok, C. Carter, Multiple Decision Trees, in R.D. Shachter, T.S. Levitt, L.N. Kanal, J.F. Lemmer (eds.), Uncertainty in Artificial Intelligence, 4, Elsevier Science Publishers B.V. (North-Holland), 1990, 327-335.
.... that construct decision trees with multiple options at some nodes (Buntine 1992b, Buntine 1992a, Kohavi & Kunz 1997); averaging path sets, fanned sets, and extended fanned sets as alternatives to pruning (Oliver & Hand 1995); voting trees using different splitting criteria and human intervention (Kwok & Carter 1990); and error-correcting output codes (Dietterich & Bakiri 1991, Kong & Dietterich 1995). Wolpert (1992) discusses stacking classifiers into a more complex classifier instead of using the simple uniform weighting scheme of Bagging. Ali (1996) provides a recent review of related algorithms, and ....
Kwok, S. W. & Carter, C. (1990), Multiple decision trees, in R. D. Shachter, T. S. Levitt, L. N. Kanal & J. F. Lemmer, eds, 'Uncertainty in Artificial Intelligence', Elsevier Science Publishers, pp. 327--335.
....search. Recently, aggregation techniques, sometimes called stacking, have been advocated by many people in machine learning, neural networks, and statistics (Wolpert 1992b, Breiman 1994, Freund & Schapire 1995, Schapire 1990, Freund 1990, Perrone 1993, Krogh & Vedelsby 1995, Buntine 1992, Kwok & Carter 1990). It is possible to build many models, each one with a different parameter setting or with a different feature subset, and let them vote on the class. Aggregation techniques reduce the variance of the models by aggregating them, but they make it extremely hard to interpret the resulting ....
Kwok, S. W. & Carter, C. (1990), Multiple decision trees, in R. D. Shachter, T. S. Levitt, L. N. Kanal & J. F. Lemmer, eds, Uncertainty in Artificial Intelligence, Elsevier Science Publishers, pp. 327--335.
.... and the other would err in favor of introns (many false negatives). A common technique in situations such as this, where large changes in the predictor result from small changes in the training set, is to generate multiple predictors and combine them using some form of voting ([3], [6], [10], [16]). The method which we adopted has two parameters: r is the number of hypotheses and s is the interval (in examples) between hypotheses. Given a multiset of training examples for Winnow, we record the current Winnow hypothesis periodically (r times) at the end of the training phase, at ....
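The r/s snapshot scheme described above might be sketched as follows. The Winnow variant (multiplicative promotion and demotion by `alpha`, threshold `theta = n`, boolean features), taking the r snapshots over the final r*s examples, and majority voting with ties resolved to 0 are all our assumptions, since the passage is cut off before it says exactly where the snapshots are taken. Function names are hypothetical.

```python
def winnow_snapshots(examples, n, r, s, alpha=2.0):
    """Train a basic Winnow on (x, y) pairs and record r weight-vector snapshots.

    Assumption: one snapshot is taken after every s examples during the
    final r*s examples of training.
    """
    if len(examples) < r * s:
        raise ValueError("need at least r*s training examples")
    w = [1.0] * n
    theta = float(n)
    start = len(examples) - r * s
    snapshots = []
    for t, (x, y) in enumerate(examples, 1):
        pred = int(sum(wi for wi, xi in zip(w, x) if xi) >= theta)
        if pred != y:
            # Promote active weights on a false negative, demote on a false positive
            factor = alpha if y == 1 else 1.0 / alpha
            w = [wi * factor if xi else wi for wi, xi in zip(w, x)]
        if t > start and (t - start) % s == 0:
            snapshots.append(list(w))
    return snapshots, theta


def vote(snapshots, theta, x):
    """Majority vote of the recorded hypotheses (ties go to 0)."""
    positives = sum(int(sum(wi for wi, xi in zip(w, x) if xi) >= theta) for w in snapshots)
    return int(2 * positives > len(snapshots))
```

Choosing an odd r avoids ties in the vote.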
Kwok, S., and Carter, C. 1990. Multiple Decision Trees. In Uncertainty in Artificial Intelligence 4, ed. Shachter, R., Levitt, T., Kanal, L., and Lemmer, J. North-Holland, 327-335.
....the same way by two classifiers, while Chan [6] associates it with the entropy in the predictions of the base classifiers. When the predictions of the classifiers are distributed evenly across the possible classes, the entropy is higher and the set of classifiers more diverse. Kwok and Carter [13] correlate the error rates of a set of decision trees with their syntactic diversity, while Ali and Pazzani [1] studied the impact of the number of gain ties on the accuracy of an ensemble of classifiers. Here, we measure the diversity within a set of classifiers by calculating the average ....
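One reading of the entropy-based diversity attributed to Chan [6] is the entropy of the votes the base classifiers cast on each example, averaged over examples. The sketch below uses base-2 logarithms and no normalization; both choices, and the function names, are our assumptions and may differ from the cited definition.

```python
import math
from collections import Counter


def prediction_entropy(predictions):
    """Entropy (bits) of the class labels predicted by all base classifiers on one example."""
    counts = Counter(predictions)
    total = len(predictions)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def mean_prediction_entropy(prediction_matrix):
    """Average per-example entropy; prediction_matrix[j] holds every classifier's label for example j."""
    return sum(prediction_entropy(p) for p in prediction_matrix) / len(prediction_matrix)
```

Unanimous predictions give 0 bits; an even split over four classes gives the maximum of 2 bits for four classes.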
S. Kwok and C. Carter. Multiple decision trees. In Uncertainty in Artificial Intelligence 4, pages 327--335, 1990.
....1996). More precisely, we have taken an approach which uses a relational learning algorithm to learn multiple models from the training data and combines them using a statistical method. It has been shown that learning multiple models can improve classification accuracy (Ali & Pazzani, 1996; Kwok & Carter, 1990; Baxt, 1992; Breiman, 1996; Quinlan, 1996a). We briefly review the system most similar to our approach, Hydra, which uses a form of Bayesian learning to combine multiple models acquired by a relational learning algorithm (Ali & Pazzani, 1995). The Hydra system uses a ....
Kwok, S., & Carter, C. (1990). Multiple decision trees. Uncertainty in Artificial Intelligence, 4, 327--335.
....10-fold cross-validated committees. They found that cross-validated committees worked best, Bagging second best, and multiple random initial weights third best on one synthetic data set and two medical diagnosis data sets. For the C4.5 decision tree algorithm, it is also easy to inject randomness (Kwok & Carter, 1990; Dietterich, 2000). The key decision of C4.5 is to choose a feature to test at each internal node in the decision tree. At each internal node, C4.5 applies a criterion known as the information gain ratio to rank-order the various possible feature tests. It then chooses the top ranked ....
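A minimal sketch of ranking candidate tests by information gain ratio, assuming categorical features and one multiway test per feature. C4.5 itself also handles numeric thresholds and missing values, and restricts the choice to tests whose gain is at least average; all of that is omitted here, and the function names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(rows, labels, feature):
    """Information gain ratio of a multiway split on a categorical feature."""
    n = len(labels)
    groups = {}
    for row, y in zip(rows, labels):
        groups.setdefault(row[feature], []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    # Split information penalizes tests that fragment the data into many branches
    split_info = -sum(len(g) / n * math.log2(len(g) / n) for g in groups.values())
    return gain / split_info if split_info > 0 else 0.0


def rank_tests(rows, labels, features):
    """Candidate tests ordered from highest to lowest gain ratio."""
    return sorted(features, key=lambda f: gain_ratio(rows, labels, f), reverse=True)
```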
Kwok, S. W., & Carter, C. (1990). Multiple decision trees. In Shachter, R. D., Levitt, T. S., Kanal, L. N., & Lemmer, J. F. (Eds.), Uncertainty in Artificial Intelligence 4, pp. 327--335. Elsevier Science, Amsterdam.
....1994). Several researchers have worked on a still more degenerate form of stacked generalization without any cross-validation or learning at level 1. Examples are neural network ensembles (Hansen & Salamon, 1990; Perrone & Cooper, 1993; Krogh & Vedelsby, 1995), multiple decision tree combination (Kwok & Carter, 1990; Buntine, 1991; Oliver & Hand, 1995), and multiple rule combination (Kononenko & Kovacic, 1992). The methods used at level 1 are majority voting, weighted averaging and Bayesian combination. Other possible methods are distribution summation and likelihood combination. There are various forms of ....
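Some of the level-1 combiners listed above can be sketched for a single example: majority voting over predicted labels, and distribution summation with optional per-model weights (a simple form of weighted averaging). Each base model is assumed to output a dict mapping class to probability; Bayesian and likelihood combination are not shown, and the function names are hypothetical.

```python
from collections import Counter


def majority_vote(labels):
    """Most frequent predicted label; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]


def distribution_summation(distributions, weights=None):
    """Sum (optionally weighted) class-probability dicts and return the top class."""
    if weights is None:
        weights = [1.0] * len(distributions)
    totals = {}
    for w, dist in zip(weights, distributions):
        for cls, p in dist.items():
            totals[cls] = totals.get(cls, 0.0) + w * p
    return max(totals, key=totals.get)
```

The two can disagree: with distributions {"x": 0.6, "y": 0.4} (twice) and {"x": 0.0, "y": 1.0}, majority voting over the per-model top labels picks "x", while distribution summation picks "y" (1.8 against 1.2).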
Kwok, S. & C. Carter (1990). Multiple Decision Trees. Uncertainty in Artificial Intelligence 4, R. Shachter, T. Levitt, L. Kanal and J. Lemmer (Editors), pp. 327-335, North-Holland.
....for the latter. This makes Bagging appropriate for parallel data mining. While much recent attention has focused on Boosting and Bagging, other classifier committee learning approaches have also been developed, including generating multiple trees by manually changing learning parameters [16], error-correcting output codes [17], generating different classifiers by randomizing the base learning process [8, 9], and learning option trees [18]. A collection of recent research in this area and reviews of related methods can be found in [10, 1, 9]. In this paper, we investigate an ....
Kwok, S.W. and Carter, C. Multiple decision trees. In Shachter, R.D., Levitt, T.S., Kanal, L.N., and Lemmer, J.F. (eds.), Uncertainty in Artificial Intelligence, 327-335. Elsevier Science, 1990.
....of using multiple decision trees is relatively new, although it appears to be quickly gaining popularity. Some authors have suggested using multiple pruned or shrunken subtrees of one single decision tree ([10]). Others have created multiple decision trees through interaction with the user ([30]). A very recent idea is to use multiple decision stumps, namely depth-two trees; see [39]. In [7], multiple trees are created by generating bootstrap replicates of the learning set and creating a decision tree for each such replicate. Most of these papers deal with relatively small data ....
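The bootstrap scheme described for [7] can be sketched as follows: draw a replicate of the learning set with replacement, fit one model per replicate, and combine the models by plurality vote. To stay self-contained, the base learner here is a hypothetical one-threshold stump on a single numeric feature rather than a full decision tree; function names are also hypothetical.

```python
import random
from collections import Counter


def fit_stump(data):
    """Hypothetical base learner: best single threshold on a 1-D feature.

    data: list of (x, y) pairs with y in {0, 1}.
    """
    best = None
    for t in sorted({x for x, _ in data}):
        for left in (0, 1):
            errors = sum((left if x <= t else 1 - left) != y for x, y in data)
            if best is None or errors < best[0]:
                best = (errors, t, left)
    _, t, left = best
    return lambda x: left if x <= t else 1 - left


def bagging(data, n_models, rng):
    """Fit one stump per bootstrap replicate and combine them by plurality vote."""
    models = []
    for _ in range(n_models):
        # Bootstrap replicate: same size as the learning set, drawn with replacement
        replicate = [rng.choice(data) for _ in data]
        models.append(fit_stump(replicate))
    return lambda x: Counter(m(x) for m in models).most_common(1)[0][0]
```

Passing a seeded `random.Random` as `rng` makes the ensemble reproducible.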
Kwok, S. W. and Carter, C., "Multiple decision trees," in Uncertainty in Artificial Intelligence, ed. Shachter, R. D., Levitt, T. S., Kanal, L.N. and Lemmer, J. F., Elsevier Science Publishers, North-Holland, Amsterdam 1990.
....of this distribution as the estimated class. Our first experiments with such multiple, randomized trees were reported in [21], and there is a statistical analysis of the dependence structure among the trees in [17]. Alternative ways of producing and aggregating multiple trees are proposed in [22], [23], [24], [25], and [19]. Various rejection criteria can also be defined in terms of . For example, an image x is classified only if the value at the mode of (x) exceeds some threshold or exceeds some multiple of the value at the second mode. 6 Handwritten Digit Recognition Although ....
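The rejection rule can be sketched as below, assuming the aggregated distribution is the average of the class distributions reached by x in each tree, represented as a list of probabilities indexed by class (at least two classes), and reading "second mode" as the second-largest value. The names `mu`, `threshold`, and `ratio` are hypothetical; `None` signals a rejected image.

```python
def aggregate(tree_distributions):
    """Average the class distributions (lists indexed by class) reached in each tree."""
    n_trees = len(tree_distributions)
    n_classes = len(tree_distributions[0])
    return [sum(d[c] for d in tree_distributions) / n_trees for c in range(n_classes)]


def classify_or_reject(mu, threshold=None, ratio=None):
    """Return the mode class of mu, or None if a rejection criterion applies.

    threshold: reject unless the value at the mode exceeds it.
    ratio:     reject unless the mode exceeds ratio times the second mode.
    """
    order = sorted(range(len(mu)), key=lambda c: mu[c], reverse=True)
    top, second = mu[order[0]], mu[order[1]]
    if threshold is not None and top <= threshold:
        return None
    if ratio is not None and top <= ratio * second:
        return None
    return order[0]
```

With two trees giving [0.7, 0.2, 0.1] and [0.5, 0.4, 0.1], the aggregate is about [0.6, 0.3, 0.1]: a threshold of 0.5 accepts class 0, while requiring the mode to exceed 2.5 times the runner-up rejects it.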
S. W. Kwok and C. Carter, "Multiple decision trees," in Uncertainty in Artificial Intelligence (R. D. Shachter, T. S. Levitt, L. Kanal, and J. F. Lemmer, eds.), North-Holland, Amsterdam: Elsevier Science Publishers, 1990.
....the predictive accuracy of both pruned and unpruned decision trees learned by C4.5. It has been hypothesised (Webb, 1997) that grafting has a similar effect to learning ensembles of classifiers (Ali, Brunk, & Pazzani, 1994; Breiman, 1996; Dietterich & Bakiri, 1994; Freund & Schapire, 1995; Kwok & Carter, 1990; Oliver & Hand, 1995; Nock & Gascuel, 1995; Schapire, 1990; Wolpert, 1992) in that both consider more evidence about the best classifications for objects in regions of the instance space that are poorly represented in the training data. Both approaches form more complex partitions of the instance ....
Kwok, S. W., & Carter, C. (1990). Multiple decision trees. In Shachter, R. D., Levitt, T. S., Kanal, L. N., & Lemmer, J. F. (Eds.), Uncertainty in Artificial Intelligence 4, pp. 327--335. North Holland, Amsterdam.
....features. The development of these heuristics is a difficult, ad hoc, manual process based on the practitioner's past experience. DS DW avoids this process by building a collection of models and dynamically choosing which one(s) to use based on past performance in the given region of the domain. [Kwok90] select several good models (i.e., rules or trees with high posterior probabilities) manually and average their predictions. This avoids the difficult problem of calculating posterior probabilities for rules [Buntine89]. The two main drawbacks to this approach are the manual derivation of the models, ....
Kwok, S. W., & Carter, C. (1990). Multiple decision trees. Uncertainty in Artificial Intelligence 4 (pp. 327-335). North-Holland, Amsterdam.
....the same way by two classifiers, while Chan [6] associates it with the entropy in the predictions of the base classifiers. When the predictions of the classifiers are distributed evenly across the possible classes, the entropy is higher and the set of classifiers more diverse. Kwok and Carter [16] correlate the error rates of a set of decision trees with their syntactic diversity, while Ali and Pazzani [1] studied the impact of the number of gain ties on the accuracy of an ensemble of classifiers. Here, we measure the diversity within a set of classifiers S by calculating the average ....
S. Kwok and C. Carter. Multiple decision trees. In Uncertainty in Artificial Intelligence 4, pages 327--335, 1990.
....et al. 1993). Freund (1992) has a similar approach, but with potentially many more sequentially generated distributions involved. Hansen and Salamon (1990) integrate an ensemble of neural networks by simple voting. The different networks in an ensemble are generated by randomized parameters. Kwok and Carter (1990) generate different decision trees by choosing different tests at the root and combine their predictions by simple voting. Breiman's (1994) bagging utilizes bootstrapping to generate many different training distributions and voting to combine the predictions. Opitz and Shavlik (1996) perturb ....
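A toy version of generating different trees by forcing different tests at the root and combining them by simple voting. To keep it short, each "tree" is cut off at depth one (a hypothetical simplification; the cited work grows full trees below each root), features are assumed categorical, and the function names are hypothetical.

```python
from collections import Counter, defaultdict


def fit_rooted_stump(rows, labels, root):
    """Depth-one tree forced to test `root`; each branch predicts its majority class."""
    branches = defaultdict(Counter)
    for row, y in zip(rows, labels):
        branches[row[root]][y] += 1
    default = Counter(labels).most_common(1)[0][0]  # fallback for unseen feature values
    table = {v: counts.most_common(1)[0][0] for v, counts in branches.items()}
    return lambda row: table.get(row[root], default)


def vote_over_roots(rows, labels, roots):
    """Build one tree per root test and combine their predictions by simple voting."""
    trees = [fit_rooted_stump(rows, labels, r) for r in roots]
    return lambda row: Counter(t(row) for t in trees).most_common(1)[0][0]
```

On the three-bit majority function, each depth-one tree is right on 6 of the 8 inputs, while the vote of the three trees is right on all 8.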
Kwok, S. & Carter, C. (1990). Multiple decision trees. Uncertainty in Artificial Intelligence 4 (pp. 327--335).
....for the latter. This makes Bagging appropriate for parallel data mining. While much recent attention has focused on Boosting and Bagging, other classifier committee learning approaches have also been developed, including generating multiple trees by manually changing learning parameters [Kwok and Carter, 1990], error-correcting output codes [Dietterich and Bakiri, 1995], generating different classifiers by randomizing the base learning process [Dietterich and Kong, 1995; Ali, 1996], and learning option trees [Kohavi and Kunz, 1997]. A collection of recent research in this area and reviews of related ....
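Error-correcting output codes can be illustrated at decoding time: each class gets a binary codeword, one binary classifier is trained per bit, and an example is assigned to the class whose codeword is nearest in Hamming distance to the predicted bits. The 7-bit code below, for four hypothetical classes, is chosen purely for illustration; every pair of codewords differs in 4 bits, so any single wrong bit is corrected. The per-bit classifiers are not shown.

```python
# Hypothetical code matrix: one 7-bit codeword per class
CODE = {
    "c0": (1, 1, 1, 1, 1, 1, 1),
    "c1": (0, 0, 0, 0, 1, 1, 1),
    "c2": (0, 0, 1, 1, 0, 0, 1),
    "c3": (0, 1, 0, 1, 0, 1, 0),
}


def hamming(a, b):
    """Number of positions where two bit sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def ecoc_decode(bits, code=CODE):
    """Return the class whose codeword is nearest to the predicted bits."""
    return min(code, key=lambda cls: hamming(bits, code[cls]))
```

If the first bit classifier errs on an example of class "c2", the predicted bits (1, 0, 1, 1, 0, 0, 1) are still one bit from "c2" and at least three bits from every other codeword.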
Kwok, S.W. and Carter, C. Multiple Decision Trees. In Shachter, R.D., Levitt, T.S., Kanal, L.N., and Lemmer, J.F. eds, Uncertainty in Artificial Intelligence, 327-335. Elsevier Science.
....of training and pruning samples, by many equally good attributes only one of which can be chosen at a node, due to cross-validation or because of other reasons. A few authors suggested using a collection of decision trees, instead of just one, to reduce the variance in classification performance [274, 443, 444, 62, 203]. The idea is to build a set of (correlated or uncorrelated) trees for the same training sample, and then combine their results. Multiple trees have been built using randomness [203] or using different subsets of attributes for each tree [443, 444]. Classification results of the trees have ....
S.W. Kwok and C. Carter. Multiple decision trees. In R.D. Shachter, T.S. Levitt, L.N. Kanal, and J.F. Lemmer, editors, Uncertainty in Artificial Intelligence, volume 4, pages 327--335. Elsevier Science, Amsterdam, 1990.
....and Vedelsby [17] measure the diversity, called ambiguity, of an ensemble of base classifiers by calculating the mean square difference between the prediction made by the ensemble and the predictions of the base classifiers. They proved that increasing ambiguity (diversity) decreases overall error. Kwok and Carter [18] showed that ensembles of decision trees that were more syntactically diverse achieved lower error rates than ensembles of less diverse decision trees, while Ali and Pazzani [1] suggested that the larger the number of gain ties, the greater the ensemble's syntactic diversity is, ....
S. Kwok and C. Carter. Multiple decision trees. In Uncertainty in Artificial Intelligence 4, pages 327--335, 1990.
....10-fold cross-validated committees. They found that cross-validated committees worked best, bagging second best, and multiple random initial weights third best on one synthetic data set and two medical diagnosis data sets. For the C4.5 decision tree algorithm, it is also easy to inject randomness (Kwok & Carter, 1990). The key decision of C4.5 is to choose a feature to test at each internal node in the decision tree. At each internal node, C4.5 applies a criterion known as the information gain ratio to rank-order the various possible feature tests. It then chooses the top ranked feature value test. For ....
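Randomness can be injected at exactly the step just described: instead of always taking the top-ranked test, pick uniformly among the k best. The sketch below assumes the criterion values (for instance gain ratios) have already been computed for each candidate test; `k` and the function name are hypothetical, and with k = 1 it reduces to the deterministic choice. Growing trees with different random seeds then yields different trees that can be combined by voting.

```python
import random


def choose_randomized_test(scores, k, rng=random):
    """Pick one test uniformly at random among the k highest-scoring candidates.

    scores: mapping from candidate test to its criterion value (e.g. gain ratio).
    With k == 1 this is the usual deterministic top-ranked choice.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    return rng.choice(ranked[:k])
```

Passing a seeded `random.Random` instance as `rng` keeps each tree's construction reproducible.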
Kwok, S. W., & Carter, C. (1990). Multiple decision trees. In Shachter, R. D., Levitt, T. S., Kanal, L. N., & Lemmer, J. F. (Eds.), Uncertainty in Artificial Intelligence 4, pp. 327--335. Elsevier Science, Amsterdam.
S. W. Kwok and C. Carter. Multiple decision trees. In R. D. Shachter, T. S. Levitt, L. N. Kanal, and J. F. Lemmer, editors, Uncertainty in Artificial Intelligence 4, pages 327--335. Elsevier Science Publishers B. V., North Holland, 1990.
S.W. Kwok and C. Carter. Multiple decision trees. In R.D. Shachter, T.S. Levitt, L.N. Kanal, and J.F. Lemmer, editors, Uncertainty in Artificial Intelligence, volume 4, pages 327--335. Elsevier Science, Amsterdam, 1990.
Kwok S. and Carter C. (1990). Multiple decision trees. Uncertainty in Artificial Intelligence, 4, 327-335.
CiteSeer.IST - Copyright Penn State and NEC