Ali K. M., Pazzani M. J., Error Reduction through Learning Multiple Descriptions, Machine Learning, 24: 3, 173-202, 1996.
....that learn an arbiter to arbitrate among predictions generated by different classifiers, and a combiner to merge the predictions of several classifiers. Similar work in this direction also includes stacked generalization [36, 39] and combining multiple rule sets using Bayesian utility theory [1]. Our technique is different from the meta-learning approach. We simply refine a classifier using the training data. Unlike meta-learning, our process involves no voting, and we do not require multiple classification techniques. Another popular framework is adaptive resampling [3, 17, 31] ....
Ali, K. and Pazzani, M. Error reduction through learning multiple descriptions. Machine Learning, 1996.
....combines the three classifiers into an ensemble meta-classifier by learning how they predict, i.e. by observing their input-output behavior. Several methods for integrating ensembles of models have been studied, including techniques that combine the set of models in some linear fashion [Ali and Pazzani, 1996; Breiman, 1994; 1996; Freund and Schapire, 1995; Krogh and Vedelsby, 1995; LeBlanc and Tibshirani, 1993; Littlestone and Warmuth, 1989; Opitz and Shavlik, 1996; Perrone and Cooper, 1993; Schapire, 1990; Tresp and Taniguchi, 1995], techniques that employ referee functions to arbitrate among the predictions ....
....overall error. A similar conclusion is reached by Kwok and Carter [Kwok and Carter, 1990]. Their study shows that ensembles with decision trees that were more syntactically diverse achieved lower error rates than ensembles consisting of less diverse decision trees, while Ali and Pazzani [Ali and Pazzani, 1996] suggest that the larger the number of gain ties, the greater the ensemble's syntactic diversity, which may lead to less correlated errors among the classifiers and hence lower error rates. However, they also cautioned that syntactic diversity may not be enough and members of the ensemble ....
Ali, K., and Pazzani, M. 1996. Error reduction through learning multiple descriptions. Machine Learning 24:173--202.
....One can employ several different measures and methods to analyze, compare and manage the formation of ensembles of classifiers. Here, we focus on the diversity and specialty metrics. Accuracy, correlation error and coverage are other metrics explored in the literature. For example, Ali and Pazzani [1] define correlation error as the fraction of instances for which a pair of base classifiers make the same incorrect prediction, and Brodley and Lane [5] measured coverage by computing the fraction of instances for which at least one of the base classifiers produces the correct prediction. ....
....base classifiers. They proved that increasing ambiguity diversity decreases overall error. Kwok and Carter [18] showed that ensembles with decision trees that were more syntactically diverse achieved lower error rates than ensembles consisting of less diverse decision trees, while Ali and Pazzani [1] suggested that the larger the number of gain ties, the greater the ensemble's syntactic diversity, which may lead to less correlated errors among the classifiers and hence lower error rates. However, they also cautioned that syntactic diversity may not be enough and members of the ....
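The two measures quoted in the first snippet of this citing paper (correlation error after Ali and Pazzani, coverage after Brodley and Lane) can be sketched in Python as follows. This is a minimal illustration, not code from either paper: the function names and toy label vectors are hypothetical, and labels are assumed to be integer class indices.

```python
from typing import Sequence


def correlation_error(pred_a: Sequence[int], pred_b: Sequence[int],
                      truth: Sequence[int]) -> float:
    """Fraction of instances on which both classifiers predict the same wrong label."""
    if not (len(pred_a) == len(pred_b) == len(truth)) or not truth:
        raise ValueError("sequences must be non-empty and of equal length")
    same_wrong = sum(1 for a, b, y in zip(pred_a, pred_b, truth)
                     if a == b and a != y)
    return same_wrong / len(truth)


def coverage(predictions: Sequence[Sequence[int]], truth: Sequence[int]) -> float:
    """Fraction of instances for which at least one classifier predicts correctly."""
    if not truth:
        raise ValueError("truth must be non-empty")
    covered = sum(1 for i, y in enumerate(truth)
                  if any(pred[i] == y for pred in predictions))
    return covered / len(truth)


# Hypothetical toy data: integer class labels for six instances.
truth = [0, 1, 1, 0, 2, 2]
clf_a = [0, 1, 0, 0, 1, 2]
clf_b = [0, 0, 0, 0, 1, 1]

print(correlation_error(clf_a, clf_b, truth))  # 2 of 6 instances -> 0.333...
print(coverage([clf_a, clf_b], truth))         # 4 of 6 instances -> 0.666...
```

Here the two classifiers agree on a wrong label for instances 2 and 4, and at least one of them is right on four of the six instances.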
K. Ali and M. Pazzani. Error reduction through learning multiple descriptions. Machine Learning, 24:173--202, 1996.
....community. Most of the work in machine learning decomposition can be found either in practical attempts in specific real-life applications (see [7]) or in treatments of closely related problems, mainly in the context of distributed and parallel learning (see [42]) or multiple classifiers (see [1]). Figure 1 illustrates our approach for arranging the different types of decomposition in supervised learning. Fig. 1. Decomposition methods in supervised learning. In Intermediate Concept decomposition, instead of learning a single complex classification model, several subproblems with ....
....and the problem complexity. There are two obvious alternatives for measuring the error reduction achieved by using the Feature Decomposition approach: measuring the error difference between IFN and DOT, or measuring the error ratio (i.e. the error of DOT divided by the error of single IFN). Following [1], we use error ratio because it manifests the fact that it becomes ....
Table 1. Summary of experimental results
Database | Naive Bayes | C4.5 | IFN | D-IFN
Aust | 84.93 ± 2.7 | 85.36 ± 5.1 | 84.49 ± 5.1 | 84.49 ± 2.9
Bcan | 97.29 ± 1.6 | 92.43 ± 3.5 | 94.39 ± 3.5 | 97.29 ± 1.6
LED17 | 63.18 ± 8.7 | 59.09 ± 6.9 | 55.55 ± 6.3 | ....
Ali K. M., Pazzani M. J., Error Reduction through Learning Multiple Descriptions, Machine Learning, 24(3): 173-202, 1996.
....classification errors of each method to those of MDT CDP (in terms of average relative accuracy improvement and number of significant wins and losses) is given in Table 10. A summary of this detailed report is given in Table 6. 4.4 Diversity of Base-Level Classifiers. Empirical studies performed in [1, 2] show that the classification error of meta-level learning methods, as well as the improvement in accuracy achieved using them, is highly correlated with the degree of diversity of the predictions of the base-level classifiers. The measure of the diversity of two classifiers used in these studies is ....
....and boosting have been performed using the WEKA data mining suite [21], which includes J48, a Java re-implementation of C4.5. The differences between the J48 results and the C4.5 results are negligible: an average of 0.01 with a maximum relative difference of 4. Error correlation is defined by [1, 2] as the probability that both classifiers make the same error. This definition of error correlation is not normalized: its maximum value is the lower of the two classification errors. An alternative definition of error correlation, proposed in [11], is used in this paper. Error correlation is ....
Ali, K. M. and Pazzani, M. J. (1996) Error reduction through learning multiple descriptions. Machine Learning 24: 173-202.
....to AdaBoost that generates simple perceptrons with random parameters and then combines the perceptron outputs using majority voting [33], similar to generating an ensemble of classifiers by randomizing the internal parameters of a base classifier, as previously introduced by Ali and Pazzani [34]. Ji and Ma give an excellent review of various methods for combining classifiers in [35], whereas Dietterich compares ensembles of classifiers to other types of learners, such as reinforcement and stochastic learners, in [36]. There have also been some attempts at using HMEs in an online setting ....
K. M. Ali and M. J. Pazzani, "Error reduction through learning multiple descriptions," Machine Learn., vol. 24, no. 3, pp. 173--202, 1996.
....problem complexity. There are two obvious alternatives for measuring the error reduction achieved by using the Attribute Decomposition approach: measuring the error difference between IFN and D-IFN, or measuring the error ratio (i.e. the error of D-IFN divided by the error of single IFN). Following [1], we use error ratio because it manifests the fact that it becomes gradually harder to achieve error reduction as the error of single IFN converges to zero. In order to estimate the problem complexity, we used the following ratio: the log of the hypothesis space size divided by the training set ....
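The contrast between the error difference and the error ratio described above can be illustrated with a small sketch. The function names and the error rates below are hypothetical, chosen only to show how the two measures behave; they are not results from the citing paper.

```python
def error_ratio(combined_error: float, single_error: float) -> float:
    """Error of the combined (e.g. decomposed) model divided by the single model's error.

    Values below 1.0 mean the combined model reduced the error.
    """
    if single_error <= 0:
        raise ValueError("single-model error must be positive")
    return combined_error / single_error


def error_difference(combined_error: float, single_error: float) -> float:
    """Absolute error reduction (positive means the combined model is better)."""
    return single_error - combined_error


# Hypothetical error rates: the same 0.02 absolute reduction at two baselines.
for single, combined in [(0.30, 0.28), (0.03, 0.01)]:
    print(f"single={single:.2f} combined={combined:.2f} "
          f"difference={error_difference(combined, single):.2f} "
          f"ratio={error_ratio(combined, single):.3f}")
```

Both rows show a difference of 0.02, but the ratios (about 0.933 versus 0.333) separate them: when the single model's error is already close to zero, the same absolute gain corresponds to a much larger relative reduction, which the difference alone does not reveal.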
K. M. Ali and M. J. Pazzani. Error reduction through learning multiple descriptions. Machine Learning, 24(3):173--202, 1996.
....for combining classifiers: MDTs are more accurate and much more concise. The comparison of MLC4.5 and AC4.5 shows that the performance improvement is due to the extended expressive power of MDT leaves. 5.3 Meta Decision Trees and Diversity of Base-Level Classifiers. Empirical studies performed in [1, 2] show that the classification error of meta-level learning methods, as well as the improvement in accuracy achieved using them, is highly correlated with the degree of diversity of the predictions of the base-level classifiers. The measure of the diversity of two classifiers used in these studies is ....
....The measure of the diversity of two classifiers used in these studies is error correlation. The smaller the error correlation, the greater the diversity of the base-level classifiers. Error correlation is defined by [1, 2] as the probability that both classifiers make the same error. This definition of error correlation is not normalized: its maximum value is the lower of the two classification errors. An alternative definition of error correlation, proposed in [10], is used in this paper. Error correlation is defined ....
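The remark that this definition is not normalized can be checked numerically: an instance only counts when both classifiers err with the same label, so the value can never exceed the smaller of the two error rates. The sketch below is hypothetical (names and data invented for illustration) and does not reproduce the alternative definition from [10], which the snippet cuts off.

```python
from typing import Sequence


def error_rate(pred: Sequence[int], truth: Sequence[int]) -> float:
    """Fraction of instances a single classifier gets wrong."""
    return sum(p != y for p, y in zip(pred, truth)) / len(truth)


def same_error_probability(pred_a: Sequence[int], pred_b: Sequence[int],
                           truth: Sequence[int]) -> float:
    """Estimated probability that both classifiers make the same error
    (the same wrong label on the same instance)."""
    hits = sum(a == b and a != y for a, b, y in zip(pred_a, pred_b, truth))
    return hits / len(truth)


# Hypothetical labels for ten instances (the true class is 0 everywhere).
truth = [0] * 10
clf_a = [1] + [0] * 9          # 1 error
clf_b = [1, 1, 1] + [0] * 7    # 3 errors, one shared with clf_a

corr = same_error_probability(clf_a, clf_b, truth)
bound = min(error_rate(clf_a, truth), error_rate(clf_b, truth))
print(corr, bound)  # 0.1 0.1: the measure sits at its maximum possible value
```

Because the ceiling depends on each pair's own error rates, a value of 0.1 is the largest possible for this pair, yet it would be only a quarter of the maximum for two classifiers that each err 40% of the time; raw values are therefore hard to compare across pairs.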
Ali, K. M. and Pazzani, M. J. (1996) Error reduction through learning multiple descriptions. Machine Learning 24: 173-202.
....attempt to capitalize on these ideas appeared in [8]. Probably due to the immaturity of those results and to the awkwardness of the terminology used therein, the potential has not been fully explored. Of course, this research flavor has been apparent in [13], who discuss overlapping concepts, in [1], and in [12]; the latter two are oriented towards the exploitation of multiple models. Our short review would be incomplete without mentioning RIPPER [5], a system that exploits set-valued features to solve categorization problems in the linguistic domain. Cohen emphasized symbolic set-valued ....
Ali, K.M. and Pazzani, M.J., Error Reduction through Learning Multiple Descriptions, Machine Learning, 24, 173-202, 1996.
....classifiers in an ensemble include simple voting, where each component classifier gets an equal vote, and weighted voting, in which each component classifier's vote is weighted by its accuracy (see e.g. Golding and Roth, 1999). More sophisticated weighting methods have been designed as well; e.g., Ali and Pazzani (1996) apply the Naive Bayes algorithm to learn weights for classifiers. Voting methods lead to the gang effect discussed earlier. The most interesting approach to combination is stacking, in which a classifier is trained to predict the correct output class when given as input the outputs of the ensemble ....
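The difference between simple voting and accuracy-weighted voting described above can be sketched as follows. The labels and accuracy values are hypothetical, and the Naive Bayes weight learning attributed to Ali and Pazzani is not reproduced here; the weights are simply taken as given.

```python
from collections import Counter, defaultdict
from typing import Hashable, Sequence


def simple_vote(votes: Sequence[Hashable]) -> Hashable:
    """Majority vote: every component classifier gets one equal vote."""
    return Counter(votes).most_common(1)[0][0]


def weighted_vote(votes: Sequence[Hashable], weights: Sequence[float]) -> Hashable:
    """Each classifier's vote counts with its weight, e.g. its estimated accuracy."""
    totals = defaultdict(float)  # label -> summed weight of classifiers voting for it
    for label, weight in zip(votes, weights):
        totals[label] += weight
    return max(totals, key=totals.get)


# Hypothetical ensemble of three classifiers voting on one instance.
votes = ["spam", "ham", "ham"]
accuracies = [0.95, 0.45, 0.45]   # assumed held-out accuracy of each classifier

print(simple_vote(votes))                # ham  (2 votes to 1)
print(weighted_vote(votes, accuracies))  # spam (0.95 outweighs 0.45 + 0.45)
```

With equal weights the two functions agree; they diverge only when a confident minority outweighs a weak majority, as in this example.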
Ali, K.M. and M.J. Pazzani. 1996. Error Reduction through Learning Multiple Descriptions.
....distribution mechanism. It also provides a set of meta-learning agents that combine the computed models that were learned (perhaps) at different sites. Several methods for integrating ensembles of models have been studied, including techniques that combine the set of models in some linear fashion (Ali and Pazzani, 1996; Breiman, 1994; Breiman, 1996; Freund and Schapire, 1995; Krogh and Vedelsby, 1995; Opitz and Shavlik, 1996; Perrone and Cooper, 1993; Schapire, 1990; Tresp and Taniguchi, 1995; LeBlanc and Tibshirani, 1993), techniques that employ referee functions to arbitrate among the predictions generated by ....
Ali, K. and Pazzani, M. (1996), 'Error reduction through learning multiple descriptions', Machine Learning 24, 173--202.
....seeks to compute a meta-classifier that integrates in some principled fashion the separately learned classifiers to boost overall predictive accuracy. Several methods for integrating ensembles of models have been studied, including techniques that combine the set of models in some linear fashion [1, 3, 4, 17, 24, 25, 27, 33, 35, 51, 54], e.g. majority or weighted voting, bagging, etc.; techniques that employ referee functions to arbitrate among the predictions generated by the classifiers [7, 20, 22, 50, 21, 23, 34], e.g. arbiters, mixture of experts, etc.; methods that rely on principal components analysis [29, 31], e.g. ....
K. Ali and M. Pazzani. Error reduction through learning multiple descriptions. Machine Learning, 24:173-202, 1996.
....distribution mechanism. It also provides a set of meta-learning agents that combine the computed models that were learned (perhaps) at different sites. Several methods for integrating ensembles of models have been studied, including techniques that combine the set of models in some linear fashion [1, 2, 3, 12, 20, 27, 29, 37, 39, 21], techniques that employ referee functions to arbitrate among the predictions generated by the classifiers [..., 16, 17, 18, 19, 28, 36], methods that rely on principal components analysis [23, 24], or methods that apply inductive learning techniques to learn the behavior and properties of the ....
K. Ali and M. Pazzani. Error reduction through learning multiple descriptions. Machine Learning, 24:173--202, 1996.
....an approach using weighted voting. They find an estimate of the predictive accuracy of each decision region using cross-validation. This predictive accuracy value is then used to weight the final vote. Their aim is similar to ours in that they wish to combine models learned on separate datasets. Ali and Pazzani (1996) investigated combining decision trees learned from disjoint datasets using several different voting mechanisms. Their main conclusion was that voting increases accuracy by eliminating uncorrelated errors amongst the models. This is relevant to our findings comparing the behavior of DAGGER with ....
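The conclusion that voting helps by eliminating uncorrelated errors can be made concrete under simplifying assumptions that the snippet does not state: a two-class task, an odd number n of classifiers, and each classifier erring independently with the same probability p. The function name below is hypothetical; the computation is the standard binomial probability that a strict majority errs.

```python
from math import comb


def majority_vote_error(n: int, p: float) -> float:
    """Probability that a majority vote of n classifiers is wrong.

    Assumes a two-class task, an odd n, and that each classifier errs
    independently with the same probability p (i.e. uncorrelated errors).
    """
    if n < 1 or n % 2 == 0:
        raise ValueError("n must be a positive odd number")
    k_min = n // 2 + 1  # the vote is wrong once a strict majority errs
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(k_min, n + 1))


for n in (1, 3, 5):
    print(n, round(majority_vote_error(n, 0.2), 4))
```

With p = 0.2 this prints 0.2, 0.104 and 0.0579 for one, three and five classifiers; for three, the value is 3(0.2)^2(0.8) + (0.2)^3 = 0.104, roughly half the single-classifier error. If instead all classifiers erred on exactly the same instances, the vote would still err with probability 0.2, which is why the errors must be uncorrelated for voting to help.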
K. M. Ali and M. J. Pazzani (1996). Error Reduction through Learning Multiple Descriptions, Machine Learning, 24, pages 173-202, Kluwer Academic Publishers, Boston.
....classifiers for the final classification, we adopt meta-learning to combine predictions of the individual classifiers. Next, we focus on the diversity and specialty metrics. Accuracy, correlation error and coverage are other metrics explored in the literature. For example, Ali and Pazzani [1] define correlation error as the fraction of instances for which a pair of classifiers make the same incorrect prediction, and Brodley and Lane [4] measure coverage by computing the fraction of instances for which at least one of the classifiers produces the correct prediction. TP stands for ....
....classifiers. When the predictions of the classifiers are distributed evenly across the possible classes, the entropy is higher and the set of classifiers more diverse. Kwok and Carter [13] correlate the error rates of a set of decision trees to their syntactic diversity, while Ali and Pazzani [1] studied the impact of the number of gain ties on the accuracy of an ensemble of classifiers. Here, we measure the diversity within a set of classifiers by calculating the average diversity of all possible pairs of classifiers in that set: where denotes the prediction of the instance by the ....
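The idea of scoring a set of classifiers by averaging a pairwise measure over all possible pairs, together with the entropy view of diversity mentioned just before it, can be sketched as follows. The snippet's own pairwise formula is lost in extraction, so the per-pair measure used here (the plain disagreement rate) is an assumption, as are all function names and the toy predictions.

```python
from collections import Counter
from itertools import combinations
from math import log2
from typing import Sequence


def pairwise_disagreement(p: Sequence[int], q: Sequence[int]) -> float:
    """ASSUMED per-pair measure: fraction of instances where the two predictions differ."""
    return sum(a != b for a, b in zip(p, q)) / len(p)


def average_pairwise_diversity(predictions: Sequence[Sequence[int]]) -> float:
    """Average the per-pair measure over all possible pairs of classifiers."""
    pairs = list(combinations(predictions, 2))
    if not pairs:
        raise ValueError("need at least two classifiers")
    return sum(pairwise_disagreement(p, q) for p, q in pairs) / len(pairs)


def average_vote_entropy(predictions: Sequence[Sequence[int]]) -> float:
    """Mean entropy (in bits) of the classifiers' votes per instance; votes spread
    evenly across classes give higher entropy."""
    m = len(predictions)
    n_instances = len(predictions[0])
    total = 0.0
    for i in range(n_instances):
        counts = Counter(pred[i] for pred in predictions)
        total -= sum((c / m) * log2(c / m) for c in counts.values())
    return total / n_instances


# Hypothetical predictions of three classifiers on four instances.
preds = [
    [0, 1, 2, 0],
    [0, 1, 1, 1],
    [0, 2, 0, 1],
]
print(average_pairwise_diversity(preds))  # (0.5 + 0.75 + 0.5) / 3 = 0.5833...
print(average_vote_entropy(preds))
```

Instances on which all classifiers agree contribute zero entropy, while an instance with three different votes contributes the maximum of log2(3) bits for three classifiers.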
K. Ali and M. Pazzani. Error reduction through learning multiple descriptions. Machine Learning, 24:173--202, 1996.
No context found.
Ali K. M., Pazzani M. J., Error Reduction through Learning Multiple Descriptions, Machine Learning, 24: 3, 173-202, 1996.
No context found.
Ali KM, Pazzani MJ. Error reduction through learning multiple descriptions. Machine Learning 1996; 24:173--202
No context found.
Ali, K. M., & Pazzani, M. (1996). Error reduction through learning multiple descriptions. Machine Learning, 24(3), 173-202.
No context found.
K. M. Ali and M. J. Pazzani, "Error reduction through learning multiple descriptions," Machine Learning, vol. 24, pp. 173--202, 1996.
No context found.
K. Ali and M. Pazzani. Error reduction through learning multiple descriptions. Machine Learning, 24(3), 1996.
No context found.
K. Ali and M. Pazzani. Error reduction through learning multiple descriptions. Machine Learning, 24(3):173-202, 1996.
No context found.
K. Ali and M. J. Pazzani, "Error reduction through learning multiple descriptions," Machine Learning, vol. 24, p. 173, 1996.
No context found.
K. Ali and M. Pazzani. Error Reduction through Learning Multiple Descriptions. Machine Learning, 24:3, 1996.
No context found.
K.M. Ali and M.J. Pazzani. Error reduction through learning multiple descriptions. Machine Learning, 24(3):173--202, 1996.
No context found.
Kamal M. Ali and Michael J. Pazzani. Error reduction through learning multiple descriptions. Machine Learning, 24:173--202, 1996.
CiteSeer.IST - Copyright Penn State and NEC