36 citations found.
T. Niblett. Constructing decision trees in noisy domains. In Proc. 2nd European Working Session on Learning, pages 67--78, Bled, Yugoslavia, 1987.

This paper is cited in the following contexts:

Finite Mixture Model of Bounded Semi-Naive Bayesian Networks .. - Huang, King, Lyu

....i=1 P (A l 1 C) In this way, in handling sparse database, the B-SNB can at least maintain the performance of NB since it will degrade towards the NB classifier for the absence of many configurations of large attributes. To tackle the second issue, we use the popular Laplace correction methods [27]. The modified estimated empirical probability for $P(A_j = a_{jk} \mid C_i)$ is $(n_{ijk} + f)/(n_i + f n_j)$ instead of the uncorrected one, $n_{ijk}/n_i$, where $a_{jk}$ is the value of an attribute $A_j$, $n_{ijk}$ is the number of times class $C_i$ and the value $a_{jk}$ of attribute $A_j$ occur together, $n_i$ is the number ....

T Niblett. Constructing decision trees in noisy domains. In Proceedings of the Second European Working Session on Learning, pages 67--78, 1987.
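
The corrected estimate quoted in the excerpt above can be sketched in a few lines of Python. This is an illustration only, not code from the cited paper: the function name is hypothetical, and it assumes that f is a smoothing weight and that n_j is the number of distinct values of attribute A_j, neither of which the truncated excerpt defines.

```python
def laplace_corrected_probability(n_ijk: int, n_i: int, n_j: int, f: float = 1.0) -> float:
    """Estimate P(A_j = a_jk | C_i) as (n_ijk + f) / (n_i + f * n_j).

    Assumed meanings (not stated in the excerpt): f is a smoothing
    weight and n_j is the number of distinct values of attribute A_j.
    """
    denominator = n_i + f * n_j
    if denominator <= 0:
        raise ValueError("need n_i + f * n_j > 0")
    return (n_ijk + f) / denominator


# A value never seen together with the class still gets a nonzero estimate.
unseen = laplace_corrected_probability(n_ijk=0, n_i=10, n_j=2)
# With f = 0 the estimate falls back to the uncorrected n_ijk / n_i.
uncorrected = laplace_corrected_probability(n_ijk=3, n_i=10, n_j=2, f=0.0)
```

Under these assumptions the corrected estimates over all values of one attribute still sum to one, which is why the denominator grows by f times n_j rather than by f alone.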


A Trade-Off Between Depth and Impurity for Pruning.. - Fournier..

....are presented in Safavian and Landgrebe (1991) [21]. In many areas, we are sure a priori that it is impossible to build a tree that correctly classifies all the examples. In such situations, decision tree algorithms tend to divide nodes having few examples and a main drawback appears (see [1], [15], [3] and [22]): the resulting trees tend to be very large and overspecified. Some branches, especially towards the bottom, are present due to sample variability and are statistically meaningless (one can also say that they are due to noise in the sample). Such branches must either not be built ....

....domains and motivations. 2.1 An overview of existing decision trees pruning methods. The principal methods for pruning decision trees are examined in [18], [14], [9] and [12]. Cestnik and Bratko [4] suggest an approach upgrading the minimization of the expected classification error method [15]. The latter method uses Laplace probability estimates. Cestnik and Bratko present some improvements on the original method. Their method does not need to assume a uniform initial distribution of the class, the degree of pruning is not affected by the number of classes, and it produces several ....

T. Niblett. Constructing decision trees in noisy domains. In proceedings of 2nd European Working Sessions on Learning EWSL 87, pages 67-78, Bled (Yugoslavia), 1987. Sigma Press.


Use of Attribute Selection Criteria in Decision Trees in .. - Crémilleux..

....criteria. Many domains are uncertain and in such areas, like medicine, we are sure a priori that it is impossible to build a tree that correctly classifies all the examples. In such situations, decision tree algorithms tend to divide nodes having few examples and a main drawback appears (see [3], [18], [6], [23]): the resulting trees tend to be very large and overspecified. Some branches, especially towards the bottom, are due to sample variability and are statistically meaningless (one can also say that they are present due to noise in the sample). Such branches must either not be built or be ....

T. Niblett. Constructing decision trees in noisy domains. In proceedings of 2nd European Working Sessions on Learning EWSL 87, pages 67--78, Bled (Yugoslavia), 1987. Sigma Press.


Uncertain Domains and Decision Trees: ORT versus C.M.. - Crémilleux..

....criteria. Many domains are uncertain and in such areas, like medicine, we are sure a priori that it is impossible to build a tree that correctly classifies all the examples. In such situations, decision tree algorithms tend to divide nodes having few examples and a main drawback appears (see [2], [14], [4], [18]): the resulting trees tend to be very large and overspecified. Some branches, especially towards the bottom, are due to sample variability and are statistically meaningless (one can also say that they are present due to noise in the sample). Such branches must either not be built or be ....

T. Niblett. Constructing decision trees in noisy domains. In proceedings of 2nd European Working Sessions on Learning EWSL 87, pages 67--78, Bled (Yugoslavia), 1987. Sigma Press.


A Statistical Approach to 3D Object Detection Applied to Faces .. - Schneiderman (2000)   (17 citations)

....The corresponding bin in the histogram will have zero count denoting zero probability. However, it may not be desirable to actually assign zero probability to this bin during testing, since its probability of occurrence usually will not be zero. To correct for this situation, we use Laplace correction [8], where we add one to each bin in the histogram. 5.4. Training Method (II): Minimization of Classification Error using AdaBoost. Training each histogram separately as described in Section 5.3, however, is not optimal for classification. To achieve better results, we should explicitly train our ....

T. Niblett. "Constructing decision trees in noisy domains." Proceedings of the second European working session on learning. pp 67-78. Bled, Yugoslavia.


A Comparative Analysis of Methods for Pruning Decision Trees - Esposito, Malerba, Semeraro (1997)   (14 citations)

....classify all the training instances. One way around this short-sightedness is that of adopting a post-pruning method. In this case, a tree $T_{max}$ is grown even when it seems worthless and is then retrospectively pruned of those branches that seem superfluous with respect to predictive accuracy [22]. For instance, for the same data set reported above, a learning system could generate the tree in Fig. 1c, in which case it is preferable to prune up to the root since no leaf captures regularities in the data. The final effect is that the intelligibility of the decision tree is improved, without ....

T. Niblett, "Constructing Decision Trees in Noisy Domains," Progress in Machine Learning, Proc. EWSL 87, I. Bratko and N. Lavrac, eds. Wilmslow: Sigma Press, pp. 67-78, 1987.


Well-Trained PETs: Improving Probability Estimation Trees - Provost, Domingos (2000)   (8 citations)

....of $1/C$ for each class (note that with zero examples the probability of each class is $1/C$). This may or may not be desirable for a specific problem; however, practitioners have found the Laplace correction worthwhile. To our knowledge, the Laplace correction was introduced in machine learning by Niblett (1987). Clark and Boswell (1991) incorporated it into the CN2 rule learner, and its use is now widespread. For decision tree learning the Laplace correction has been used by certain researchers and practitioners (Pazzani et al. 1994; Bradford, Kunz, Kohavi, Brunk, Brodley, 1998; Provost et al. 1998; ....

Niblett, T. (1987). Constructing decision trees in noisy domains. In Proceedings of the Second European Working Session on Learning, pp. 67-78 Bled, Yugoslavia. Sigma.


On the Optimality of the Simple Bayesian Classifier under.. - Pazzani (1997)   (107 citations)

....zero probabilities will wipe out the information in all the other probabilities $P(A_j = v_{jk} \mid C_i)$ when they are multiplied according to Equation 2. A principled solution to this problem is to incorporate a small sample correction into all probabilities, such as the Laplace correction (Niblett, 1987). If $n_{ijk}$ is the number of times class $C_i$ and value $v_{jk}$ of attribute $A_j$ occur together, and $n_i$ is the total number of times class $C_i$ occurs in the training set, the uncorrected estimate of $P(A_j = v_{jk} \mid C_i)$ is $n_{ijk}/n_i$, and the Laplace-corrected estimate is $P(A_j = v_{jk} \mid C_i)$ n ....

Niblett, T. (1987). Constructing decision trees in noisy domains. Proceedings of the Second European Working Session on Learning (pp. 67--78). Bled, Yugoslavia: Sigma.


Learning Users Habits: The APE Project - Ruvini, Dony (1999)   (1 citation)

....h, $h_P$ the number of examples h matches in P and $h_N$ the number of examples h matches in N. Then, we use four functions: Match($h$; $\phi_{x,c}$), the subsumption operator. Relevance(h) computes a numerical evaluation of the quality of h. It is similar to the constant time Laplace correction [Niblett, 1987]. Submit($h$; $\phi_{x,c}$) updates the local approximation of $\phi_{x,c}$, i.e. xH h if Relevance(h) Relevance(xH ). HypGen($\phi_{x,c}$; $P \cup H$; $D_L$) returns a set of hypotheses built from $\phi_{x,c}$, i.e. $\bigcup_{i=1}^{|D_L|} \mathrm{HypGen}_i(\phi_{x,c}; P \cup H)$, where $D_L$ is a set of description ....

T. Niblett. Constructing decision trees in noisy domains. In I. Bratko and N. Lavrac, editors, Progress in Machine Learning, pages 67--78, Wilmslow, 1987. Sigma Press.


Extending Naïve Bayes Classifiers Using Long Itemsets - Meretakis, Wüthrich (1999)   (3 citations)

....than one itemsets qualify one is randomly chosen. 4.3 Zero Counts and Smoothing. The set of conditional probabilities used in line 11 of classify can be unreliable for small data sets. A standard statistical technique is to incorporate a small sample correction into the observed probabilities [N87]. This is called smoothing in [FGG97] and helps eliminating unreliable estimates and zero counts. Let $P(x,y,c_i)$ and $P(y,c_i)$ be the observed frequencies in D of $x,y,c_i$ and $y,c_i$ respectively (x and y are itemsets). Instead of $P(x \mid y,c_i) = P(x,y,c_i)/P(y,c_i)$ we use the smoothed ....

T. Niblett, "Constructing Decision trees in noisy domains", Proc. of the Second European Working Session on Learning, (pp 67-78), 1987.


Careful Abstraction from Instance Families in Memory-Based.. - van den Bosch (1999)

....and Boswell, 1991). CN2 is an incremental rule induction algorithm that attempts to find the best rule governing a certain amount of instances in the instance base that are not yet covered by a rule. Goodness of a rule is estimated by computing its apparent accuracy with Laplacian correction (Niblett, 1987; Clark and Boswell, 1991). Rule induction ends when all instances are abstracted (covered by rules). RISE induces rules in a careful manner, operating in cycles. At the onset of learning, all instances are converted to instance-specific rules. During a cycle, for each rule a search is made for ....

Niblett, T. 1987. Constructing decision trees in noisy domains. In Proceedings of the Second European Working Session on Learning, pages 67--78, Bled, Yugoslavia. Sigma.


Learning from Imperfect Data - Pavel Brazdil (1990)   (1 citation)

....unpruned versions of a hypothesis can be directly compared rather than having to estimate the latter's quality during search. Again in rule induction systems this technique is common, for example in the most recent ID3 descendants C4 [16], Assistant 86 [15], and by Niblett and Bratko [17]. Niblett [18] gives a review of pre- and post-pruning techniques used for decision trees. AQ15 [19] employs a post-pruning technique for production rules (termed rule truncation). The use of post-pruning of decision tree branches to generate production rules has been used by Corlett [20] and Quinlan [21] ....

Tim Niblett. Constructing decision trees in noisy domains. In I. Bratko and N. Lavrac, editors, Progress in Machine Learning (proceedings of the 2nd European Working Session on Learning), pages 67--78, Sigma, Wilmslow, UK, 1987.


Bayesian Integration of Rule Models - Domingos

....randomly partitioning the database into several smaller ones, running the system on each one separately, and combining the results. This combination was originally performed by letting each rule set vote for the class it predicts, with a weight equal to the Laplace-corrected training set accuracy (Niblett, 1987; Good, 1965) of the rule that won the example in that rule set (Domingos, 1996b). This ad hoc approach was compared with BMA on eight of the larger databases in the UCI repository. BMA was applied using Equations 3 and 4 above, with each region being the region of instance space won by each ....

Niblett, T. (1987). Constructing decision trees in noisy domains. In Proceedings of the Second European Working Session on Learning (pp. 67--78). Bled, Yugoslavia: Sigma.


A Further Comparison of Simplification Methods for.. - Malerba, Esposito.. (1996)   (3 citations)

....belong to the same class is too weak, since it does not prevent the generation of harmful branches. A typical approach adopted in many TDIDT systems grows a tree, $T_{max}$, to maximum depth, and then retrospectively removes those branches that seem superfluous with respect to predictive accuracy [Nib87]. The final effect should be that of improving the intelligibility of a decision tree without really affecting its predictive accuracy. Recently, Fisher [Fis92] and Schaffer [Sch93] have shown that overfitting avoidance by means of tree pruning is a form of bias (here intended as a set of factors ....

T. Niblett. Constructing decision trees in noisy domains. In I. Bratko and N. Lavrac, editors, Progress in Machine Learning. Sigma Press, Wilmslow, 1987.


On Growing Better Decision Trees from Data - Murthy (1997)   (17 citations)

....of the data. For very large tree classifiers, the critical issue is optimizing structural properties (height, balance etc.) [493, 71]. Breiman et al. [44] pointed out that tree quality depends more on good stopping rules than on splitting rules. Effects of noise on generalization are discussed in [363, 253]. Overfitting avoidance as a specific bias is studied in [507, 428]. Effect of noise on classification tree construction methods is studied in the pattern recognition literature in [468]. Several techniques have been suggested for obtaining the right sized trees. The most popular of these is ....

T. Niblett. Constructing decision trees in noisy domains. In I. Bratko and N. Lavrac, editors, Progress in Machine Learning. Sigma Press, England, 1986.


Rule Induction and Instance-Based Learning: A Unified Approach - Domingos (1995)   (25 citations)

....to a given possible value if it behaves similarly to it, and inversely if it doesn't. When two or more rules are equally close to a test example, the rule that was most accurate on the training set wins. So as to not unduly favor more specific rules, the Laplace-corrected accuracy is used [Niblett, 1987]: $LAcc(R) = \frac{N_{corr}(R) + 1}{N_{won}(R) + C}$ (5) where R is any rule, C is the number of classes, $N_{won}(R)$ is the total number of examples won by R, $N_{corr}(R)$ is the number of examples among those that R correctly classifies, and C is the number of classes. The effect of the Laplace correction is ....

T. Niblett. Constructing decision trees in noisy domains. In Proc. EWSL-87, pages 67--78, 1987.
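
The Laplace-corrected accuracy in the excerpt above, (N_corr + 1) / (N_won + C), is small enough to sketch directly. The function and argument names below are hypothetical and the counts are made-up illustration values, not data from the cited paper.

```python
def laplace_accuracy(n_corr: int, n_won: int, num_classes: int) -> float:
    """Laplace-corrected accuracy: (n_corr + 1) / (n_won + num_classes)."""
    if num_classes < 1 or not 0 <= n_corr <= n_won:
        raise ValueError("need num_classes >= 1 and 0 <= n_corr <= n_won")
    return (n_corr + 1) / (n_won + num_classes)


# A very specific rule: wins one example and classifies it correctly
# (uncorrected accuracy 1.0).
specific = laplace_accuracy(n_corr=1, n_won=1, num_classes=2)
# A general rule: wins 100 examples, 90 of them correctly
# (uncorrected accuracy 0.9).
general = laplace_accuracy(n_corr=90, n_won=100, num_classes=2)
```

With these illustrative counts the general rule scores higher than the specific one after correction, although its uncorrected accuracy is lower; a rule that has won no examples scores 1/C.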


Using Partitioning to Speed Up Specific-to-General Rule Induction - Domingos (1996)   (1 citation)

....Stanfill and Waltz's value difference metric for symbolic attributes (Stanfill & Waltz 1986). When two or more rules are equally close to a test example, the rule that was most accurate on the training set wins. So as to not unduly favor more specific rules, the Laplace-corrected accuracy is used (Niblett 1987): $LAcc(R) = \frac{N_{corr}(R) + 1}{N_{won}(R) + C}$ (1) where R is any rule, C is the number of classes, $N_{won}(R)$ is the total number of examples won by R, $N_{corr}(R)$ is the number of examples among those that R correctly classifies, and C is the number of classes. The effect of the Laplace correction is ....

Niblett, T. 1987. Constructing decision trees in noisy domains. In Proceedings of the Second European Working Session on Learning, 67--78. Bled, Yugoslavia: Sigma.


The Time Complexity of Decision Tree Induction - Martin, Hirschberg (1995)   (2 citations)

....(and whether to perform tree surgery beyond mere pruning). From the point of view of minimizing learning time, there can be no question that post-processing is very expensive, and that stopping is a more effective approach. There have been a number of studies in the area of stopping, pruning, etc. [4, 6, 19, 20, 23, 24, 29]. Some of the noteworthy findings are that: 1) in general, it seems better to post-process using an independent data set than to stop, 2) cross-validation seems to work better than point estimates such as the $\chi^2$ test, 3) the decision to prune is a form of bias whether pruning will ....

T. Niblett. Constructing decision trees in noisy domains. In I. Bratko and N. Lavrac, editors, Progress in Machine Learning: Proceedings of the European Working Session on Learning (EWSL-87). Sigma Press, Wilmslow, 1987.


Context-Sensitive Feature Selection for Lazy Learners - Domingos (1997)   (20 citations)

....the nearest instance is found, and its class assigned to the example. The accuracy of an instance is the fraction of the examples it won that were indeed of its class. Because this tends to overestimate the accuracy of instances that win very few examples, the Laplace-corrected accuracy (Niblett, 1987) is used instead: $LAcc(I) = \frac{N_{corr}(I) + 1}{N_{won}(I) + C}$ (5) where I is any instance, C is the number of classes, $N_{won}(I)$ is the total number of examples won by I, and $N_{corr}(I)$ is the number of examples that I wins and correctly classifies. At classification time, LazyRISE compares the new ....

Niblett, T. (1987). Constructing Decision Trees in Noisy Domains. In Proceedings of The Second European Working Session on Learning, 67--78. Bled, Yugoslavia: Sigma.


Pessimistic Decision Tree Pruning Based on Tree Size - Mansour (1997)   (6 citations)

....is significant. We try to estimate (or at least to upper bound) the true error of a subtree, based on the number of examples it classifies, the number of errors, the number of classes, and the size of the tree. One method that falls into our category of pessimistic pruning is Minimum error pruning [Nib87]. This method tries to estimate the true error using the following formula, $\frac{e + r - 1}{n + r}$, where r is the number of classes, n the number of examples, and e is the number of errors. The theoretical justification for the formula is simple. Assuming a uniform prior over the classes, the ....

T. Niblett. Constructing decision trees in noisy domains. In I. Bratko and N. Lavrac, editors, Progress in Machine Learning-- Proceedings of EWSL 87: 2nd European Working Session on Learning, pages 67--78, Bled, Yugoslavia, May 1987.
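
The error estimate quoted in the excerpt above, (e + r - 1) / (n + r), can be sketched as follows. The function name is hypothetical and the example counts are invented for illustration; the sketch covers only the estimate itself, not the pruning procedure built on it.

```python
def expected_error(e: int, n: int, r: int) -> float:
    """Estimated true error: (e + r - 1) / (n + r).

    e: number of errors, n: number of examples, r: number of classes.
    """
    if r < 1 or not 0 <= e <= n:
        raise ValueError("need r >= 1 and 0 <= e <= n")
    return (e + r - 1) / (n + r)


# A leaf with a single, correctly classified example is not credited
# with zero error.
tiny_leaf = expected_error(e=0, n=1, r=2)
# The same zero observed error over many examples is trusted far more.
large_leaf = expected_error(e=0, n=98, r=2)
```

With no examples at all the estimate is (r - 1) / r, the error of guessing under a uniform prior over the r classes, which matches the justification the excerpt begins to give.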


Concept Learning and the Problem of Small Disjuncts - March Robert

....such as ID3, refrain from testing the significance of small disjuncts (footnote 2). In any case, the significance of small disjuncts is not reliably estimated, with the undesirable result that significant small disjuncts may be eliminated and insignificant ones retained. This problem is not insuperable: [Nib87] gives an exact test for significance. Error Rate Estimation. Error rate cannot be tested exactly: it can only be estimated. Like approximate tests of significance, techniques for estimating error rate are not entirely reliable for small disjuncts. Footnote 2: [Qui86, page 154] The action taken in lieu ....

....is too drastic when the number of learning examples per class per attribute is low. The Need for Both Significance and Error Rate Testing. No existing system tests both significance and error rate. Pre-pruning systems use significance testing; post-pruning systems use error estimation [Nib87]. Indeed, post-pruning systems have a rather strong disregard for the significance of disjuncts, their sole objective being to eliminate from a definition as many disjuncts as possible without suffering too great an increase in error rate. A one-test approach is sufficient to eliminate ....

[Article contains additional citation context not shown here]

Tim Niblett. Constructing decision trees in noisy domains. In Ivan Bratko and Nada Lavrac, editors, Progress in Machine Learning, pages 67--78. Sigma Press, Wilmslow, England, 1987.


An Analysis of Oversearch - Segal (1996)   (1 citation)

....appearing in each rule are selected from a set consisting of all tests of the form $A_i = v$ and $A_i \neq v$ for each discrete attribute $A_i$ and all tests of the form A j t and A j t for each continuous attribute $A_j$. Each rule in the hypothesis space is evaluated using Laplace predicted error [Niblett, 1987]. If a rule matches n examples of the training data and incorrectly predicts the classification of e of those examples, then the Laplace error for this rule is $\mathrm{Laplace} = \frac{e + k - 1}{n + k}$, where k is the number of classes. The experiments are conducted using the eleven data sets from the UCI ....

T. Niblett. Constructing decision trees in noisy domains. In Progress in Machine Learning (Proceedings of the 2nd European Working Session on Learning), pages 67--78, Wilmslow, UK, 1987.


The CN2 Induction Algorithm - Clark, Niblett (1989)   (227 citations)  Self-citation (Niblett)

....requiring relaxation of the constraint that the induced description must classify the training data perfectly. Fortunately the ID3 algorithm lends itself to easy modification allowing this constraint to be relaxed, by the nature of its general-to-specific search. Tree pruning techniques (e.g. [4, 5]), as used for example in the systems C4 [6] and ASSISTANT [7], have proved to be effective methods of avoiding overfitting. The AQ algorithm, however, is less easy to modify due to its dependence on specific training examples during its search. Existing implementations (e.g. AQ11 [8] and AQ15 ....

Tim Niblett. Constructing decision trees in noisy domains. In I. Bratko and N. Lavrac, editors, Progress in Machine Learning (proceedings of the 2nd European Working Session on Learning), pages 67--78. Sigma, Wilmslow, UK, 1987.


The PNC 2 Cluster Algorithm - An integrated learning algorithm.. - Haendel (2003)

No context found.

T. Niblett. Constructing decision trees in noisy domains. In Proc. 2nd European Working Session on Learning, pages 67--78, Bled, Yugoslavia, 1987.


Automatic Construction of Decision Trees from Data: A.. - Murthy (1997)   (37 citations)

No context found.

T. Niblett. Constructing decision trees in noisy domains. In I. Bratko and N. Lavrac, editors, Progress in Machine Learning. Sigma Press, England, 1986.

CiteSeer.IST - Copyright Penn State and NEC