Leo Breiman, Jerome H. Friedman, Richard A. Olshen, and Charles J. Stone. Classification and Regression Trees. Wadsworth International Group, Belmont, California, 1984.
....Perhaps the two most glaring omissions are tree-based methods and artificial neural nets. While these methods have also been parallelised, they rely less on parallel numerical linear algebra. Tree-based methods provide piecewise constant functions which are used for classification and regression [48, 12]. One motivation for the MARS algorithm was to provide a smooth version of the regression trees. The bagging idea we discussed for additive models is directly applicable to classification trees [53]. Trees also have a natural parallelism which is based on the partitioning of the domain but which ....
Leo Breiman, Jerome H. Friedman, Richard A. Olshen, and Charles J. Stone. Classification and Regression Trees. Wadsworth International Group, Belmont, California, 1984.
....$t_2$ respectively, then the decrease of the impurity due to the splitting is modelled as $\phi(p) - s\,\phi(p_1) - (1-s)\,\phi(p_2)$, where $s$ is the proportion of cases in $t_1$. Examples of impurity functions are $\phi_1(p) = \min(p, 1-p)$ (based on the misclassification rate), $\phi_2(p) = 2p(1-p)$ (Gini [20]) and $\phi_3(p) = -p\log(p) - (1-p)\log(1-p)$ (entropy [62]). [Figure 4.6: Three impurity functions (misclassification error, Gini, entropy) plotted against p.] A plot of these three functions can be found in figure 4.6. Alternatively, the value is used. The construction ....
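The impurity decrease of a split can be computed directly from the class proportions. The following is a minimal Python sketch, assuming a two-class problem in which p is the proportion of class-1 cases in a node; the function names are illustrative and not taken from the source, and the entropy uses the natural logarithm (the base only rescales it and does not change which split wins).

```python
import math
from typing import Callable, Sequence

def misclassification(p: float) -> float:
    """phi_1(p) = min(p, 1 - p)."""
    return min(p, 1.0 - p)

def gini(p: float) -> float:
    """phi_2(p) = 2p(1 - p)."""
    return 2.0 * p * (1.0 - p)

def entropy(p: float) -> float:
    """phi_3(p) = -p log p - (1 - p) log(1 - p), natural log, with 0 log 0 = 0."""
    return -sum(x * math.log(x) for x in (p, 1.0 - p) if x > 0.0)

def impurity_decrease(phi: Callable[[float], float],
                      left: Sequence[int], right: Sequence[int]) -> float:
    """phi(p) - s*phi(p1) - (1 - s)*phi(p2) for one binary split.

    `left` and `right` hold the 0/1 class labels sent to the two children;
    p, p1, p2 are the class-1 proportions in the parent, left and right
    node, and s is the proportion of the parent's cases that go left.
    """
    n1, n2 = len(left), len(right)
    if n1 == 0 or n2 == 0:
        raise ValueError("both children must be non-empty")
    n = n1 + n2
    p = (sum(left) + sum(right)) / n
    p1, p2 = sum(left) / n1, sum(right) / n2
    s = n1 / n
    return phi(p) - s * phi(p1) - (1.0 - s) * phi(p2)

# Hypothetical split: 4 cases go left (one of class 1), 4 go right (all class 1).
left, right = [0, 0, 0, 1], [1, 1, 1, 1]
for phi in (misclassification, gini, entropy):
    print(phi.__name__, impurity_decrease(phi, left, right))
```

In a tree-growing loop this value would be evaluated for every candidate split of a node and the split with the largest decrease kept.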
....to the dimension. They have two shortcomings, however: they do not represent linear functions well and, more generally, they cannot approximate smooth functions accurately, as they are discontinuous, piecewise constant functions. The regression tree approach is described by Breiman et al. [20]. [Figure 4.7: Regression tree for Ozone data, with splits on Temp, Wind and Solar.R (temperature in Celsius and wind speed in km/h).] [Figure 4.8: Estimated vs observed values of regression tree for ....]
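To make the "piecewise constant" remark concrete, here is a minimal sketch of the basic step of a regression tree, assuming a single numeric predictor and a least-squares criterion: it searches for the one threshold that minimizes the total within-node squared error and predicts the node mean on each side. `best_split` and `predict` are hypothetical helper names; a full tree would apply the same search recursively to each child. The jump at the threshold is exactly the discontinuity that keeps trees from approximating smooth functions well.

```python
from typing import List, Tuple

Stump = Tuple[float, float, float]  # (threshold, left mean, right mean)

def best_split(x: List[float], y: List[float]) -> Stump:
    """Least-squares split of one predictor into two constant pieces."""
    pairs = sorted(zip(x, y))
    n = len(pairs)
    total_sum = sum(v for _, v in pairs)
    total_sq = sum(v * v for _, v in pairs)
    left_sum = left_sq = 0.0
    best = None  # (sse, threshold, left mean, right mean)
    for i in range(1, n):
        # Move pairs[i - 1] into the left node.
        xi, yi = pairs[i - 1]
        left_sum += yi
        left_sq += yi * yi
        if pairs[i][0] == xi:
            continue  # no threshold separates equal x values
        n_left, n_right = i, n - i
        right_sum, right_sq = total_sum - left_sum, total_sq - left_sq
        # Within-node sum of squares = sum(y^2) - (sum y)^2 / count, per node.
        sse = (left_sq - left_sum ** 2 / n_left) + (right_sq - right_sum ** 2 / n_right)
        if best is None or sse < best[0]:
            best = (sse, (xi + pairs[i][0]) / 2, left_sum / n_left, right_sum / n_right)
    if best is None:
        raise ValueError("need at least two distinct x values")
    return best[1], best[2], best[3]

def predict(stump: Stump, x_new: float) -> float:
    """Piecewise-constant prediction: one constant on each side of the threshold."""
    threshold, left_mean, right_mean = stump
    return left_mean if x_new < threshold else right_mean

stump = best_split([1, 2, 3, 4, 5, 6], [1, 1, 1, 5, 5, 5])
print(stump)  # (3.5, 1.0, 5.0)
```

The running sums make each candidate threshold cost O(1) after sorting, so one split search on n cases costs O(n log n).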
Leo Breiman, Jerome H. Friedman, Richard A. Olshen, and Charles J. Stone. Classification and Regression Trees. Wadsworth International Group, Belmont, California, 1984.
....[240] built binary segmentation trees aimed towards unearthing the interactions between predictor and dependent variables. A standard reference for the current work on decision trees from a statistical perspective is Breiman et al.'s excellent monograph on classification and regression trees [44]. Pattern recognition work on decision trees was motivated by the need to interpret images from remote sensing satellites such as LANDSAT in the 1970s [464]. An overview of work on decision trees in the pattern recognition literature can be found in [106]. A high level comparative perspective on ....
....can be found in [76]. Decision trees in particular, and induction methods in general, arose in machine learning to avoid the knowledge acquisition bottleneck [137] for expert systems. A majority of work on decision trees in machine learning is an offshoot of Breiman et al.'s CART work [44] and Quinlan's ID3 algorithm [391]. Quinlan's book on C4.5 [398], although specific to his tree building program, is perhaps the best available overview of tree methodology from a machine learning perspective. In sequential fault diagnosis, a set of possible tests with associated costs and a set ....
Leo Breiman, Jerome Friedman, Richard Olshen, and Charles Stone. Classification and Regression Trees. Wadsworth International Group, 1984.
....a predictor from a training set; and cost of deployment, the time it takes to generate predictions for new data objects. Accuracy typically depends on the problem domain, although some broad-brush qualitative comparisons between data mining techniques are possible. For example, decision trees [7, 11], which are the workhorses of prediction, can achieve accuracies in the high 90% range. For a training set with n examples, each with d attributes, producing a decision tree of size t, the cost of construction ranges from $O(dn)$ to $O(dn^2 \log n)$ depending on the precise kind of decision ....
Leo Breiman, J. Friedman, R. Olshen, and C. Stone. Classification and Regression Trees. Wadsworth International Group, 1984.
....node except a terminal node is specified by one of the features, a critical value of that feature, and the hypothesis generated when a sample reaches that node. A terminal node is specified by a hypothesis only. In CART and C4.5, the splitting criterion is based on an information-theoretic value [3, 12]. Let us look at another splitting criterion first discussed by others [4, 8, 14]. We have looked at this criterion (termed the Z criterion) over a range of difficult problems and found it superior to that of C4.5. Let us examine the denominator of equation (1) of Figure 3. Its value is ....
Leo Breiman, Jerome Friedman, Richard A. Olshen, and Charles Stone. Classification and Regression Trees. Wadsworth International Group, 1984.
....to other implementations give lower error rates. Keywords: classification trees, boosting, C4.5. Introduction. One aspect of this paper is a discussion of a new splitting criterion, called the Z function, which can be used to build classification trees, as opposed to the splitting criteria of C4.5 [2], CART [18] and others discussed by Mingers [17]. The second objective of this paper is to describe a new procedure to combine trees in a boosting ensemble that has much better performance than a single classification tree. Boosting as a learning algorithm was initiated by Schapire ....
Leo Breiman, Jerome Friedman, Richard A. Olshen, and Charles Stone. Classification and Regression Trees. Wadsworth International Group, 1984.
....hypothesis for the index function $I(q) = 2\sqrt{q(1-q)}$. Empirically, however, the weak tree growth hypothesis seems to hold for a variety of index functions that were already used for tree growth prior to the work of Kearns and Mansour. The Gini index $I(q) = 4q(1-q)$ is used in CART [1] and the entropy $I(q) = -q\log q - (1-q)\log(1-q)$ is used in C4.5 [7]. It has long been empirically observed that it is possible to make steady progress in reducing $I(T)$ for these choices of $I$, while it is difficult to make steady progress in reducing $\epsilon(T)$. We now ....
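The relation between an index function and the training error $\epsilon(T)$ can be illustrated with a small sketch. It assumes that $I(T)$ is the sum over the leaves of the leaf weight (fraction of examples reaching it) times $I(q)$, where $q$ is the leaf's class-1 fraction, and that $\epsilon(T)$ is the same sum with $\min(q, 1-q)$. The entropy is taken in base 2 here so that all three index functions equal 1 at $q = 1/2$; that normalization, the function names and the example tree are assumptions. Each index is concave and symmetric with $I(0) = 0$ and $I(1/2) = 1$, so it lies above $2\min(q, 1-q)$; hence $I(T)$ upper-bounds $\epsilon(T)$ and driving $I(T)$ down also drives the training error down.

```python
import math
from typing import Callable, List, Tuple

def misclassification(q: float) -> float:
    """Training error of a leaf that predicts its majority class."""
    return min(q, 1.0 - q)

def gini_index(q: float) -> float:
    """Gini index scaled so that I(1/2) = 1: I(q) = 4q(1 - q)."""
    return 4.0 * q * (1.0 - q)

def entropy_index(q: float) -> float:
    """Binary entropy in bits, with 0 log 0 taken as 0."""
    return -sum(x * math.log2(x) for x in (q, 1.0 - q) if x > 0.0)

def km_index(q: float) -> float:
    """Kearns-Mansour index I(q) = 2 sqrt(q(1 - q))."""
    return 2.0 * math.sqrt(q * (1.0 - q))

def tree_value(leaves: List[Tuple[float, float]],
               index: Callable[[float], float]) -> float:
    """Sum over leaves of weight * index(q) for (weight, q) pairs."""
    return sum(w * index(q) for w, q in leaves)

# A hypothetical three-leaf tree: (fraction of data, class-1 fraction).
leaves = [(0.5, 0.1), (0.3, 0.8), (0.2, 0.5)]
eps = tree_value(leaves, misclassification)
for index in (gini_index, entropy_index, km_index):
    print(f"{index.__name__}: I(T) = {tree_value(leaves, index):.3f}, eps(T) = {eps:.3f}")
```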
Leo Breiman, Jerome H. Friedman, Richard A. Olshen, and Charles J. Stone. Classification and Regression Trees. Wadsworth International Group, 1984.
....CHAID [240] built binary segmentation trees aimed towards unearthing the interactions between predictor and dependent variables. A standard reference for the current work on decision trees from a statistical perspective is Breiman et al.'s excellent monograph on classification and regression trees [44]. Pattern recognition work on decision trees was motivated by the need to interpret images from remote sensing satellites such as LANDSAT in the 1970s [464]. An overview of work on decision trees in the pattern recognition literature can be found in [106]. A high level comparative perspective on ....
....intelligence can be found in [76]. Decision trees in particular, and induction methods in general, arose in machine learning to avoid the knowledge acquisition bottleneck [137] for expert systems. A majority of work on decision trees in machine learning is an offshoot of Breiman et al.'s CART work [44] and Quinlan's ID3 algorithm [391]. Quinlan's book on C4.5 [398], although specific to his tree building program, is perhaps the best available overview of tree methodology from a machine learning perspective. In sequential fault diagnosis, a set of possible tests with associated costs and a set ....
Leo Breiman, Jerome Friedman, Richard Olshen, and Charles Stone. Classification and Regression Trees. Wadsworth International Group, 1984.
....Oliver and Hand also included subtrees resulting from those splits that were considered but rejected when the decision tree was grown. This can be modeled in our setting by storing multiple prediction rules (one for each rejected split) at the nodes (see Section 5). According to Breiman et al. [1], pages 87–92, predicting with the leaves of a decision tree will often have error at most twice that of the best pruning. They argue why this is likely when the structure of the decision tree is created from a training set independent of the test set (but drawn from the same distribution). In ....
....the size measure defined in the previous section. This prior favors those prunings which are small and thus unlikely to reflect noise in the training set. In small prunings, each leaf's prediction will tend to be based on more examples; see the discussion of bias versus variance in Breiman et al. [1], pages 87–92. Although the master algorithm can run with any prior on the prunings, this $2^{-|P|}$ prior enables us to efficiently implement the master algorithm as described in Section 4. At each time step, the learner computes its prediction as $y_t = F_\beta(r_t)$, where $\beta \in [0, 1]$ is ....
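The preference of a $2^{-|P|}$ prior for small prunings can be illustrated by brute force on a tiny tree. This sketch assumes that the size $|P|$ of a pruning is simply its number of nodes and normalizes the weights explicitly; the size measure actually used is the one defined in the cited paper's previous section, which is not shown here. The enumeration is exponential in the tree size, unlike the efficient implementation the text refers to. A tree is represented as `None` for a leaf or a `(left, right)` tuple; `prunings` and `pruning_prior` are hypothetical names.

```python
from fractions import Fraction
from typing import Iterator, List, Optional, Tuple

Tree = Optional[tuple]  # None = leaf, (left, right) = internal node

def prunings(tree: Tree) -> Iterator[int]:
    """Yield the node count |P| of every pruning of `tree`."""
    # Option 1: prune here, so this node becomes a single leaf.
    yield 1
    # Option 2 (internal nodes only): keep the node and prune each subtree.
    if tree is not None:
        left, right = tree
        for a in prunings(left):
            for b in prunings(right):
                yield 1 + a + b

def pruning_prior(tree: Tree) -> List[Tuple[int, Fraction]]:
    """Return (|P|, normalized 2**-|P| weight) for every pruning."""
    sizes = list(prunings(tree))
    weights = [Fraction(1, 2 ** size) for size in sizes]
    total = sum(weights)
    return [(size, w / total) for size, w in zip(sizes, weights)]

print(pruning_prior((None, None)))  # [(1, Fraction(4, 5)), (3, Fraction(1, 5))]
```

For a root with two leaves, collapsing to a single leaf receives four times the weight of keeping the split, which is the bias toward small prunings described above.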
Leo Breiman, Jerome H. Friedman, Richard A. Olshen, and Charles J. Stone. Classification and Regression Trees. Wadsworth International Group, 1984.
....(factor experiments) is an arduous task. Display of information. In order to understand the problem better, and perhaps bias a learning algorithm, it is helpful to view the resulting structures (e.g., a decision tree) or the data. Even commercially available programs such as C4.5 [17] and CART [3] give only a rudimentary display of an induced tree. A library providing a common framework and basic tools for implementing learning algorithms would alleviate the pain involved in programming from scratch. Since researchers will need to write less code, they can write higher quality code and do ....
Leo Breiman, Jerome H. Friedman, Richard A. Olshen, and Charles J. Stone. Classification and Regression Trees. Wadsworth International Group, 1984.
....encapsulates knowledge. For example, one tree induced concerning the prognosis of patients after a heart attack based upon their symptoms consisted of three nodes and four leaves. Thus a decision tree can dramatically compress information. Algorithms such as ID3 [Qui86], C4.5 [Qui93] and CART [BFOS84] automatically induce decision trees from data. These trees can label unseen cases with one of the possible decisions. A decision tree can be considered a knowledge object. In distributed knowledge discovery, the data sets may reside in distributed sites or the data may be generated there on the ....
Leo Breiman, Jerome H. Friedman, Richard A. Olshen, and Charles J. Stone. Classification and Regression Trees. Wadsworth International Group, Belmont, California, 1984.
No context found.
Leo Breiman, Jerome H. Friedman, Richard A. Olshen, and Charles J. Stone. Classification and Regression Trees. Wadsworth International Group, Belmont, California, 1984.
No context found.
L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone. Classification and Regression Trees. Wadsworth International Group, Belmont, California, 1984.
No context found.
Leo Breiman, Jerome H. Friedman, Richard A. Olshen, and Charles J. Stone. Classification and Regression Trees. Wadsworth International Group, Belmont, California, 1984.
No context found.
Leo Breiman, Jerome H. Friedman, Richard A. Olshen, and Charles J. Stone. Classification and Regression Trees. Wadsworth International Group, 1984.
No context found.
Leo Breiman, Jerome H. Friedman, Richard A. Olshen, and Charles J. Stone. Classification and Regression Trees. Wadsworth International Group, Belmont, California, 1984.
CiteSeer.IST - Copyright Penn State and NEC