L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone. Classification and Regression Trees. Belmont, CA: Wadsworth International Group, 1984.
....have been proposed. The simplest approach is to divide the problem into subproblems that have no common elements, also called a hard split of the data. The optimum solution of the smaller problems can then be chosen on a winner-takes-all basis. Classification and Regression Trees (CART) [3] are based on this principle. Stacked Generalization [40] also uses a hard split of the data and a weighted sum with weights derived from the performance of the smaller problems in their partition space. In contrast, HME divides the large problem into subproblems that can have common elements a ....
L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone. Classification and Regression Trees. Belmont, CA: Wadsworth International Group, 1984.
....problems a combination of methods can work best. Typically, all of the methods have a single parametric form, but each method is tuned to give high performance for a particular subtask in the overall problem. Examples of these approaches include Classification and Regression Trees (CART) [16] and Multivariate Adaptive Regression Splines (MARS) [43]. These methods decompose large (multivariate) problems into subregions, each of which is solved by a single regression function (or expert, in our terminology). To classify a novel pattern, the procedure finds the expert associated with the ....
L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone. Classification and Regression Trees. Belmont, CA: Wadsworth International Group, 1984.
....is very costly to misclassify minority instances. In these cases the training distribution is modified by oversampling the minority class or by undersampling the majority class, because the learning algorithm either cannot accept explicit cost information or cannot use this information effectively [Breiman, et al. 1984; Drummond and Holte, 2000; Kubat and Matwin, 1997]. Our research is not restricted to highly skewed data sets, and consequently one of the things we are able to show is that even when a data set's natural class distribution is nearly balanced, it still often is not best for learning. Other studies ....
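The resampling remedy mentioned in this excerpt can be sketched as follows. This is a minimal illustration, assuming a problem in which `minority` names the smaller class and every other label counts as the majority; the helper `rebalance` is hypothetical and is not the procedure used in the cited study.

```python
import random


def rebalance(examples, labels, minority, mode="oversample", seed=0):
    """Hypothetical helper: equalize the minority class and the rest of the
    data by random oversampling (minority drawn with replacement) or random
    undersampling (majority subsampled without replacement). Assumes the
    minority class really is the smaller one."""
    rng = random.Random(seed)
    pairs = list(zip(examples, labels))
    minor = [p for p in pairs if p[1] == minority]
    major = [p for p in pairs if p[1] != minority]
    if mode == "oversample":
        # Duplicate randomly chosen minority examples until the counts match.
        extra = [rng.choice(minor) for _ in range(len(major) - len(minor))]
        out = major + minor + extra
    elif mode == "undersample":
        # Keep only as many majority examples as there are minority examples.
        out = rng.sample(major, len(minor)) + minor
    else:
        raise ValueError(f"unknown mode: {mode}")
    rng.shuffle(out)
    xs, ys = zip(*out)
    return list(xs), list(ys)
```

Oversampling keeps all the data but repeats minority examples; undersampling discards majority examples, which is cheaper to train on but throws information away.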
L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone. Classification and Regression Trees. Belmont, CA: Wadsworth International Group, 1984.
....in predictive accuracy after PCA has been applied. This method, when combined with the FW V SM feature weight learning algorithm, boosted the performance of kNNwv in the Waveform domains from 81% correct to 85% correct, which is very close to the Bayes optimal classification rate of 86% correct [BFOS84]. We conclude that either the mutual information or the VSM feature weight learning algorithms should be employed whenever nearest neighbor algorithms are used, and that features should be preprocessed via PCA in domains with solely continuous features. 4.5 Related Work. Nearest neighbor ....
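To show why learned feature weights matter for nearest neighbor classification, here is a minimal sketch of k-NN with a feature-weighted Euclidean distance. The weight vector is simply passed in; this is not the mutual-information or VSM weight learner from the excerpt, it uses a plain majority vote rather than whatever voting scheme kNNwv uses, and the PCA preprocessing step is not shown. The function name is hypothetical.

```python
import math
from collections import Counter


def weighted_knn(train_x, train_y, query, weights, k=3):
    """Classify `query` by majority vote among its k nearest training points
    under the distance sqrt(sum_f w_f * (a_f - b_f)^2). `weights` is assumed
    to come from some separate feature-weight learning step."""
    def dist(a, b):
        return math.sqrt(sum(w * (ai - bi) ** 2 for w, ai, bi in zip(weights, a, b)))

    nearest = sorted(range(len(train_x)), key=lambda i: dist(train_x[i], query))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]
```

With a noisy second feature, giving it weight 0 lets the informative first feature decide the neighbors; with equal weights the noise dominates the distance and the prediction flips.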
L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone. Classification and Regression Trees. Belmont, CA: Wadsworth International Group, 1984.
....new kind of neural network called Neural Tree Network (NTN) [5, 6]. This network has a tree structure with a neural network at each tree node. Unlike many standard classification trees, the NTN is not a binary tree. An optimal pruning algorithm for binary classification trees has been developed in [7]. We extend this approach to the case of arbitrary tree structures such as the NTN. In section 2, the NTN is reviewed. The pruning algorithm is described in section 3. In section 4, simulation results on a speaker-independent vowel recognition task are presented. The summary and conclusions are ....
....set. The learning algorithm grows the NTN so that DT is below some small tolerance level. However, for good generalization, it is necessary to prune the NTN as described in the next section. 3. The Pruning Algorithm. An optimal pruning algorithm for binary classification trees has been given in [7]. We extend this algorithm to the case of arbitrary trees such as the NTN. We first establish some notation and definitions. Denote the NTN grown by the learning algorithm as T. Definition 1: A trivial tree consists of just a single node. Definition 2: A subtree T′ of T has the same root node ....
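A minimal sketch of weakest-link (cost-complexity) pruning in the style of CART, written so that nodes may have any number of children, in the spirit of the extension described in this excerpt. It assumes each node's error R(t), the error it would make as a leaf, is precomputed, and that every internal node has at least two children. Unlike CART, it collapses a single node per step even when several nodes tie for the smallest g(t). The class and function names are hypothetical, and this is not the NTN authors' algorithm.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    error: float                                   # R(t): error if t were a leaf
    children: List["Node"] = field(default_factory=list)


def leaves_and_error(node: Node) -> Tuple[int, float]:
    """Number of leaves and summed leaf error R(T_t) of the subtree at node."""
    if not node.children:
        return 1, node.error
    n, r = 0, 0.0
    for child in node.children:
        cn, cr = leaves_and_error(child)
        n += cn
        r += cr
    return n, r


def weakest_link(node: Node) -> Optional[Tuple[float, Node]]:
    """Internal node minimizing g(t) = (R(t) - R(T_t)) / (|leaves(T_t)| - 1)."""
    if not node.children:
        return None
    n, r = leaves_and_error(node)
    best = ((node.error - r) / (n - 1), node)
    for child in node.children:
        cand = weakest_link(child)
        if cand is not None and cand[0] < best[0]:
            best = cand
    return best


def prune_sequence(root: Node) -> List[Tuple[float, int]]:
    """Collapse the weakest link repeatedly; record (g, leaves remaining)."""
    seq = []
    while root.children:
        g, t = weakest_link(root)
        t.children = []                            # replace subtree T_t by leaf t
        seq.append((g, leaves_and_error(root)[0]))
    return seq
```

Each recorded g is the complexity penalty at which the corresponding collapse starts to pay off, so the output is a nested sequence of pruned trees from which one would be selected, for example on held-out data.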
L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone. Classification and Regression Trees. Belmont, CA: Wadsworth International Group, 1984.
....algorithm for finding a minimum split for any impurity measure that is additive in the sense that the impurity of several intervals is the sum of the impurities of each interval. This includes most standard impurity measures, such as entropy [Quinlan, 1993], the twoing rule and Gini index [Breiman et al., 1984], Sum Minority, inconsistency rate, and others. The dynamic programming algorithm runs in time quadratic in the number of starting intervals. We describe two approximation algorithms that can compute nearly optimal solutions efficiently for large datasets. We then propose a method of ....
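A sketch of the dynamic-programming idea in this excerpt: because the impurity of a split is the sum of per-interval impurities, the cheapest way to merge consecutive starting intervals into k groups can be found with O(k·n²) impurity evaluations for n starting intervals. It assumes each starting interval is given as a vector of class counts; the size-weighted Gini index and Sum Minority are illustrative choices, and the function names are hypothetical rather than taken from the cited paper.

```python
from typing import Callable, List, Sequence, Tuple

Counts = Sequence[int]


def gini_impurity(counts: Counts) -> float:
    """Size-weighted Gini index n * (1 - sum_c p_c^2) of one interval."""
    n = sum(counts)
    return 0.0 if n == 0 else n - sum(c * c for c in counts) / n


def sum_minority(counts: Counts) -> float:
    """Number of examples outside the interval's majority class."""
    return float(sum(counts) - max(counts))


def optimal_split(intervals: List[Counts], k: int,
                  impurity: Callable[[Counts], float] = gini_impurity
                  ) -> Tuple[float, List[int]]:
    """Merge consecutive starting intervals into k groups (k <= n) with minimum
    total impurity. Returns (cost, indices where groups 2..k begin)."""
    n, m = len(intervals), len(intervals[0])
    prefix = [[0] * m]                     # prefix[i][c]: class-c count in intervals[:i]
    for iv in intervals:
        prefix.append([a + b for a, b in zip(prefix[-1], iv)])

    def cost(i: int, j: int) -> float:     # impurity of intervals[i:j] merged
        return impurity([b - a for a, b in zip(prefix[i], prefix[j])])

    inf = float("inf")
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for g in range(1, k + 1):              # O(k * n^2) impurity evaluations
        for j in range(g, n + 1):
            for i in range(g - 1, j):
                c = best[g - 1][i] + cost(i, j)
                if c < best[g][j]:
                    best[g][j], back[g][j] = c, i
    cuts, j = [], n
    for g in range(k, 0, -1):              # follow back-pointers to recover cuts
        j = back[g][j]
        cuts.append(j)
    return best[k][n], sorted(cuts)[1:]
```

Any other additive measure can be passed as `impurity` without changing the recursion, which is the point of the additivity condition.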
L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone. Classification and Regression Trees. Belmont, CA: Wadsworth International Group, 1984.
....of neurons and the interconnections between them must be determined before learning can begin. This is a problem since there is no known optimal way to determine the network architecture and it is usually arrived at by trial and error. Decision trees offer another approach to pattern recognition [8, 9]. The pattern to be classified enters the decision tree at its root. At each internal decision tree node, a hyperplane test is evaluated to route the pattern to one of the child nodes. This process is repeated for each successive child node until, finally, the pattern reaches a leaf node. The leaf ....
....to one of the child nodes. This process is repeated for each successive child node until, finally, the pattern reaches a leaf node. The leaf nodes classify the pattern. The hyperplanes at the internal tree nodes are formed during the training stage. Most decision tree learning algorithms like CART [8] and ID3 [9] generate a set of candidate hyperplanes and then use a distortion metric to search for the best hyperplane for each tree node. Since the candidate hyperplanes are arbitrarily generated, this technique is not efficient. Decision tree learning algorithms, however, always converge to a ....
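The routing procedure described in these two excerpts (evaluate a hyperplane test at each internal node, follow the chosen child, stop at a leaf) can be sketched as below. It assumes a binary tree whose tests have the form w·x + b ≥ 0; the left/right convention and all names are hypothetical, and how the hyperplanes are learned is not shown.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class HNode:
    w: Sequence[float] = ()           # hyperplane normal (internal nodes only)
    b: float = 0.0                    # hyperplane offset
    left: Optional["HNode"] = None    # followed when w.x + b < 0
    right: Optional["HNode"] = None   # followed when w.x + b >= 0
    label: Optional[str] = None       # class label (leaf nodes only)


def classify(node: HNode, x: Sequence[float]) -> str:
    """Route pattern x from the root down to a leaf and return its label."""
    while node.label is None:
        s = sum(wi * xi for wi, xi in zip(node.w, x)) + node.b
        node = node.right if s >= 0 else node.left
    return node.label
```

Classification cost is one dot product per level, so it grows with the depth of the path the pattern takes rather than with the size of the tree.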
L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone. Classification and Regression Trees. Belmont, CA: Wadsworth International Group, 1984.
....nodes the classes; the leaf nodes correspond to the inseparable sets classified as one class, and the nodes, the other. Like other nonparametric learning techniques, decision trees are susceptible to overtraining. To correct for overfitting, a fully grown tree can be pruned [3, 41, 4] by determining the classification error for each non-leaf subtree, and then comparing it to the classification error resulting from replacing the subtree with a leaf node bearing the class label of the majority of the training instances in the set. If the leaf node results in better performance, ....
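The subtree-versus-leaf comparison described in this excerpt (a reduced-error style of pruning) could look like the sketch below. It assumes axis-parallel binary tests, that each node stores the majority label of the training instances that reached it, and that errors are counted on a held-out pruning set `data`; the excerpt does not say which set the errors are measured on, and the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Example = Tuple[Sequence[float], str]


@dataclass
class TNode:
    majority: str                     # majority training label at this node
    feature: int = 0                  # test: x[feature] < threshold -> left
    threshold: float = 0.0
    left: Optional["TNode"] = None
    right: Optional["TNode"] = None

    def is_leaf(self) -> bool:
        return self.left is None


def predict(node: TNode, x: Sequence[float]) -> str:
    while not node.is_leaf():
        node = node.left if x[node.feature] < node.threshold else node.right
    return node.majority


def prune(node: TNode, data: List[Example]) -> TNode:
    """Bottom-up: replace a non-leaf subtree by a leaf carrying its majority
    training label when that leaf makes fewer errors on `data`."""
    if node.is_leaf():
        return node
    go_left = [(x, y) for x, y in data if x[node.feature] < node.threshold]
    go_right = [(x, y) for x, y in data if x[node.feature] >= node.threshold]
    node.left = prune(node.left, go_left)
    node.right = prune(node.right, go_right)
    subtree_err = sum(predict(node, x) != y for x, y in data)
    leaf_err = sum(y != node.majority for _, y in data)
    if leaf_err < subtree_err:        # strictly better, as the text states
        node.left = node.right = None
    return node
```

Pruning children before their parent means each comparison is made against the already-simplified subtree, so a single bottom-up pass suffices.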
L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone. Classification and Regression Trees. Belmont, CA: Wadsworth International Group, 1984.
....mixture of experts network as a normalized RBF network with the output layer weights (now provided by the individual experts) being functions of the inputs to the network, rather than constants. The training algorithms are, however, quite different. Decision-tree-type approaches such as CART [27] and MARS [28] also work on localized subspaces. Besides employing a different technique for parameter estimation, these approaches differ in that splits are limited to be parallel to the coordinate axes, rather than being based on normalized ellipsoids. ....
.... via network growing or pruning has been widely studied for feedforward networks [20], [19], [17]. In the mixture of experts framework, Waterhouse et al. [18] have developed a growing technique to build a hierarchical mixture of experts, based on principles from Classification and Regression Trees [27]. The hierarchical network is grown one layer at a time by splitting an expert at a leaf of the tree into two experts and a gate. The expert to be split at any time is the one that maximizes the increase in log likelihood due to the split. The scheme has a nice theoretical framework that ....
L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone. Classification and Regression Trees. Belmont, CA: Wadsworth International Group, 1984.
.... network, where several independent networks are trained to model the data, and a special gating network computes the final output for any given input as a combination of the expert network outputs [12]. A similar modular structure has been used in the statistical literature in nonparametric decision trees [5] and adaptive spline models [6]. For training the mixture of experts network, an Expectation-Maximization algorithm was presented in [12], where the optimal model is found by repeating the following steps: E-step: estimation of the likelihood of the data for the current model; M-step: computing ....
L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone. Classification and Regression Trees. Belmont, CA: Wadsworth International Group, 1984.
....nodes the classes; the leaf nodes correspond to the inseparable sets classified as one class, and the nodes, the other. Like other nonparametric learning techniques, decision trees are susceptible to overtraining. To correct for overfitting, a fully grown tree can be pruned [3, 35, 4] by determining the classification error for each non-leaf subtree, and then comparing it to the classification error .... [Figure 6: LTUs and targets of an MDT.] ....
L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone. Classification and Regression Trees. Belmont, CA: Wadsworth International Group, 1984.
....abnormal sample) and values less than 0.5 are labeled nontarget (a normal sample). The number of inputs is determined by the number of elements in the feature vector. This is 42 for both spiculated lesion detection and microcalcification detection. 4.5 Binary Decision Trees. Binary decision tree [85] (BDT) classification methods provide a means of approximating the optimal Bayesian classifier. A BDT is simply a set of binary threshold operations on the feature vectors, organized as a tree. At each node, one of the features in a vector is compared to a threshold, which moves the vector down ....
L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone. Classification and Regression Trees. Belmont, CA: Wadsworth International Group, 1984.
....of experts network is capable of modeling inverse problems [9], [10], which the system of experts network cannot model. In a broader sense, our architecture draws strength from both localized modeling and multiple model approaches. The former class includes decision-tree-type methods such as CART [11] and MARS [12], which also work on localized subspaces. Besides employing a different technique for parameter estimation, these approaches differ in that splits are limited to be parallel to the coordinate axes, rather than being based on normalized ellipsoids. The latter includes ensemble ....
....a growing technique for HME, wherein the expert with the least likelihood is selected for replacement by multiple experts. Waterhouse et al. [22] have developed techniques for both growing and pruning a hierarchical mixture of experts, based on principles from Classification and Regression Trees [11]. The hierarchical network is grown one layer at a time by splitting an expert at a leaf of the tree into two experts and a gate. The expert to be split at any time is the one that maximizes the increase in log likelihood due to the split. The scheme has a nice theoretical framework that ....
L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone. Classification and Regression Trees. Belmont, CA: Wadsworth International Group, 1984.
....reasons. For example, most applied algorithms for decision tree computation have to solve some version of this problem. 1.1 Decision trees. Decision trees are a very useful tool in artificial intelligence, where they are used to classify new examples and to summarize concepts (see [Q93], [BFOS], [SCFMW], [MKS], [Mo], [PP], [SL]). Decision trees have been used successfully in many practical applications, such as finding protein sequences or identifying cosmic objects ....
....to classify an unknown input point is proportional to the height of the tree. When constructing decision trees, we typically wish to produce accurate and concise (small) trees. The standard algorithms build trees top-down, recursively splitting the examples using tests on the attributes ([Q93], [BFOS], [MKSB]). The splitting stops when all the examples remaining belong to the same class. For examples with real-valued attributes, the decision tree algorithm considers splitting the examples into two groups and picks the best cut point available. Once a set of n examples is sorted along a ....
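Picking the best cut point on a real-valued attribute, as this excerpt describes, is typically done with one sort followed by a linear sweep that updates class counts incrementally. The sketch below assumes size-weighted entropy as the split criterion and midpoints between adjacent distinct values as candidate thresholds; both are illustrative choices, and `best_cut` is a hypothetical name rather than any cited algorithm's interface.

```python
import math
from collections import Counter


def weighted_entropy(counts: Counter) -> float:
    """n * H(p): entropy in bits times the number of examples."""
    n = sum(counts.values())
    return sum(-c * math.log2(c / n) for c in counts.values() if c)


def best_cut(values, labels):
    """Return (threshold, impurity) of the binary cut minimizing the summed
    weighted entropy of both sides. Labels must be mutually comparable
    because ties in `values` are broken by sorting on the label."""
    pairs = sorted(zip(values, labels))            # O(n log n) once
    left, right = Counter(), Counter(labels)
    best = (math.inf, None)
    for i in range(len(pairs) - 1):                # O(n) sweep afterwards
        v, y = pairs[i]
        left[y] += 1                               # move example i to the left side
        right[y] -= 1
        if v == pairs[i + 1][0]:
            continue                               # no cut between equal values
        imp = weighted_entropy(left) + weighted_entropy(right)
        if imp < best[0]:
            best = (imp, (v + pairs[i + 1][0]) / 2)
    return best[1], best[0]
```

Because only the class counts on each side change as the sweep advances, every candidate cut is scored in time proportional to the number of classes instead of re-scanning all n examples.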
L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone. Classification and Regression Trees. Belmont, CA: Wadsworth International Group, 1984.
....$\Gamma_{ij}) / \sum_{j=1}^{M} h_{ij}$ (9), from which we see that the LSE function estimate is a weighted sum of linear approximations, where the weights $h_{ij}$ vary nonlinearly over the input space. In fact, the LSE estimator on a Gaussian mixture has interesting relations to algorithms such as CART [2], MARS [6], and competitive modular networks [11], as the mixture of Gaussians competitively partitions the input space and learns a linear regression surface on each partition (details are given in [7]). In the limit, as the covariance matrices go to zero, the approximation becomes a ....
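The "weighted sum of linear approximations" in this excerpt matches the standard conditional-mean formula for a Gaussian mixture over (x, y): E[y | x] = Σ_j h_j(x)·[μ_y,j + (σ_xy,j / σ_xx,j)(x − μ_x,j)] / Σ_j h_j(x), where h_j(x) is the prior of component j times its marginal density at x. The sketch below assumes scalar x and y, so the per-component slope σ_xy,j / σ_xx,j plays the role we assume Γ plays in the excerpt's Eq. (9); the dictionary keys and function name are hypothetical.

```python
import math


def lse_estimate(x, comps):
    """Least-squares (conditional-mean) estimate of y given scalar x under a
    Gaussian mixture on (x, y). Each component is a dict with prior `pi`,
    means `mx`, `my`, x-variance `sxx` and x-y covariance `sxy` (names assumed)."""
    num, den = 0.0, 0.0
    for c in comps:
        # h_j(x): prior times the Gaussian marginal density of x under component j
        h = c["pi"] * math.exp(-(x - c["mx"]) ** 2 / (2 * c["sxx"])) \
            / math.sqrt(2 * math.pi * c["sxx"])
        # linear regression of y on x within component j
        lin = c["my"] + c["sxy"] / c["sxx"] * (x - c["mx"])
        num += h * lin
        den += h
    return num / den
```

Far from the other components, one h_j dominates and the estimate follows that component's line; as the variances shrink, the normalized weights become close to 0 or 1 and the estimate approaches a piecewise-linear, CART-like partition.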
L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone. Classification and Regression Trees. Belmont, CA: Wadsworth International Group, 1984.
....methods exist for learning the weights in a linear threshold unit. Brodley and Utgoff [Brodley and Utgoff] discuss four such methods: the Recursive Least Squares (RLS) algorithm [Young 1984], the Pocket algorithm [Gallant 1986], Thermal Training [Frean 1990], and CART's coefficient learning method [Breiman, et al. 1984]. Because we are concerned only with two-class classification in this domain, the RLS training method is used in this paper (see [Young 1984] for a description of training LTUs for two-class classification, and [Draper, et al. 1994] for a description of multi-class classification using Frean's ....
....error for each non-leaf subtree, and then comparing it to the classification error resulting from replacing the subtree with a leaf node bearing the class label of the majority of the training instances in the set. If the leaf node results in better performance, the subtree is replaced by it [Breiman, et al. 1984; Quinlan 1986; Brodley and Utgoff]. 5. Finding Camouflaged Vehicles. The color model described in sections 2 and 3 suggests that multivariate decision trees should be a good technique for classifying objects in outdoor images. To test this hypothesis, we tested the MDTs' ability to identify military ....
L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone. Classification and Regression Trees. Belmont, CA: Wadsworth International Group, 1984.
No context found.
L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone. Classification and Regression Trees. Belmont, CA: Wadsworth International Group, 1984.
No context found.
L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone. Classification and Regression Trees. Belmont, CA: Wadsworth International Group, 1984.
No context found.
L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone. Classification and Regression Trees. Belmont, CA: Wadsworth International Group, 1984.
No context found.
L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone. Classification and Regression Trees. Belmont, CA: Wadsworth International Group, 1984.
No context found.
L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone. Classification and Regression Trees. Belmont, CA: Wadsworth International Group, 1984.
No context found.
L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone. Classification and Regression Trees. Belmont, CA: Wadsworth International Group, 1984.
No context found.
L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone. Classification and Regression Trees. Belmont, CA: Wadsworth International Group, 1984.
CiteSeer.IST - Copyright Penn State and NEC