Breiman, L., Friedman, J., Olshen, R., and Stone, C. (1984). Classification and regression trees. Wadsworth & Brooks, Pacific Grove.
....(features) costs of cases, costs of errors. In the literature the latter kind of costs is the most commonly discussed one. Several attempts to incorporate misclassification costs into decision tree or decision rule learning were made so far. The first approach was introduced by Breiman et al. [3] in CART decision tree learning system. Their method consists in modification of the class prior probabilities used in the splitting criterion. The cost based measure is also used for tree pruning. In a simpler approach (e.g. [2], [11]) error costs are taken into consideration during the pruning ....
Breiman, L., Friedman, J.H., Olshen, R.A., Stone, C.J.: Classification and Regression Trees. Wadsworth (1984).
.... by using leave-n_a-out cross validation, where one sets aside assessment data of size n_a instead of size 1 (Geisser [18]). An interesting discussion of those multifold cross validation methods is given in Zhang [36]. A widespread version is the grouped or v-fold cross validation (Breiman et al. [3]) where the dataset is randomly divided into v groups S_{a,k}, k = 1, ..., v, of roughly the same size n_a = n/v. Each group constitutes a test set to assess the results based on the remaining n − n_a training cases. Leave one out cross validation is applied to these groups leading to the v fold ....
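The grouping step described in the excerpt above can be sketched as follows. This is a minimal illustration only, not code from the cited paper: the function name `v_fold_splits`, the fixed seed, and the round-robin assignment of shuffled indices to groups are all assumptions made for the example.

```python
import random

def v_fold_splits(n, v, seed=0):
    """Yield (train_idx, test_idx) pairs for v-fold cross-validation.

    Hypothetical helper: the name and the seeding scheme are assumptions.
    """
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # random division of the dataset
    # Deal the shuffled indices round-robin so that the v groups have
    # roughly the same size (about n / v each, differing by at most one).
    groups = [indices[k::v] for k in range(v)]
    for k in range(v):
        test = groups[k]  # group k is the assessment (test) set
        train = [i for j, g in enumerate(groups) if j != k for i in g]
        yield train, test

# Example: 10 cases split into 5 groups of 2; each group is held out once.
splits = list(v_fold_splits(n=10, v=5))
```

Each case appears in exactly one test group, so every case is assessed once by a model that was trained without it.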
Breiman, L., Friedman, J. H., Olshen, R. A. and Stone, C. J. (1984). Classification and Regression Trees. Wadsworth and Brooks, Belmont.
....algorithm. None of the above techniques would work for a data set which is massively incomplete, because it would lead to the ignoring of almost all the records and attributes. Common solutions to the missing data problem include the use of imputation, statistical or regression based procedures [4, 5, 10, 11, 19, 20, 15, 17] in order to estimate the entries. Unfortunately, these techniques are also prone to estimation errors with increasing dimensionality and incompleteness. This is because when a large percentage of the entries are missing, each attribute can be estimated to a much lower degree of accuracy. ....
L. Breiman, J. H. Friedman, R. A. Olshen, C. J. Stone. Classification and Regression Trees, Wadsworth, Belmont, 1984.
....we use the outcome of that experiment and confidence intervals as in the case of zero prior experiments. In the experimental results section, we will refer to this as the IAVG (Independent AVeraGe) function approximator. 2.2.2 Regression Trees. We implement an algorithm similar to CART [3]. A tree is greedily constructed in a top down fashion. At each node all possible split tests are considered of the form: attribute i = value j. For each test, the data points at that node are separated into those that pass the test and those that don't. Then using the same Gaussian noise ....
L. Breiman, J. Friedman, R. Olshen, and C. Stone. Classification and Regression Trees. Wadsworth, 1984.
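The split search described in the excerpt above (try every test of the form attribute i = value j, separate the points that pass from those that do not) can be sketched as below. The excerpt is cut off before it states how a test is scored, so this sketch assumes the common choice of minimizing the summed squared error of the two resulting groups. The names `sse` and `best_equality_split` are hypothetical.

```python
def sse(ys):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not ys:
        return 0.0
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)

def best_equality_split(rows, targets):
    """Return (attribute_index, value, score) for the best test
    'attribute i == value j', or None if no test separates the data.

    Assumption: score = SSE(pass group) + SSE(fail group), lower is better.
    """
    best = None
    for i in range(len(rows[0])):
        for value in sorted({row[i] for row in rows}):
            passed = [y for row, y in zip(rows, targets) if row[i] == value]
            failed = [y for row, y in zip(rows, targets) if row[i] != value]
            if not passed or not failed:
                continue  # the test does not split the node
            score = sse(passed) + sse(failed)
            if best is None or score < best[2]:
                best = (i, value, score)
    return best

# Toy data: the first attribute (colour) explains the target, the second does not.
rows = [("red", "small"), ("red", "large"), ("blue", "small"), ("blue", "large")]
targets = [1.0, 1.2, 5.0, 5.2]
split = best_equality_split(rows, targets)
```

A full tree would apply the same search recursively to the pass and fail subsets until a stopping rule is met.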
....functions T_j(x) are products of one dimensional basis functions which are characteristic (or indicator) functions in the case of categorical variables and truncated power functions in the case of numeric variables. The key to the method, which can be viewed as a successor of the regression trees [2], is that the T_j(x) are generated in a recursive way and depend on the data. In particular, two new basis functions are generated at each step of the form T_{J+1}(x) = T_j(x) [+(x_v − s)]_+ and T_{J+2}(x) = T_j(x) [−(x_v − s)]_+, where the T_j(x), j = 0, ..., J, are the parent basis function and the ....
Breiman, L., Friedman, J.H., Olshen, R.A. and Stone, C.J., Classification and Regression Trees, Wadsworth, Belmont, California, 1984.
....chapter displays Logic Trees). On the set of trees we define a move set by a collection of standard operations: alternating leaves, changing operators, splitting and deleting leaves, and growing and pruning the trees. The terminology used is similar to the terminology introduced by Breiman et al. [1]. Ruczinski et al. [5] also provide a comparison between CART and Logic models. Since the number of possible Logic Models for a given set of predictors can be very large, we rely on search algorithms to help us find the best scoring models. We implemented two algorithms: a greedy (stepwise) and a ....
Breiman, L., Friedman, J. H., Olshen, R. A., and Stone, C.J. (1984), Classification and Regression Trees, Belmont, CA: Wadsworth.
....intervals, is thus defined by a sequence of length S = {2^{s_k}}_{1≤k≤K_{j,p}}. Imposing that each group has a number of elements which is a power of 2 has no asymptotic effect on the performance of the macrotile averaging, and allows one to optimize this segmentation with a fast CART algorithm [4]. A macrotile model M is specified by the indices of the local cosine basis B = {B_{p_l}}_{1≤l≤L} and the adaptive segmentation S_l for each B_{p_l}: M = {j_l, p_l, S_l}_{1≤l≤L}. (20) Figure 1 displays the representation of P_M(x) for one such macrotile model. This macrotile model is just an ....
....for each j and p, we compute the segmentation S_p which yields a model M_p of minimum cost for local cosine family B_p. The cost is calculated from the N2 coefficients fb j associated to B_p. The optimal segmentation S_p can be computed with O(2 N) operations with a CART algorithm [4]. In the following we write Cost(S) the cost of a partial macrotile model corresponding to a segmentation S of B_p. By definition S_p is the segmentation which minimizes this cost. To compute it, we use a segmentation tree, which sub decompose any segmentation in partial segmentations whose ....
L. Breiman, J. Friedman, R. Olshen and C. Stone, Classification and Regression Trees, Chapman & Hall, 1993.
....data. These fall into two categories, regression techniques and association rules using discretization of numeric values. Regression techniques can be used to predict a numeric value for a given set of input parameters. Examples of such techniques include linear regression, regression trees [9], and neural networks. The relationship between regression techniques and impact rules is like the relationship between classification rules and association rules. Regression techniques allow a prediction to be made about a target value for a given data point. However, they do not provide an ....
L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone. Classification and Regression Trees. Wadsworth International, Belmont, CA, 1984.
....the square root of the size of the program. When DT learning algorithms are viewed as boosting algorithms, the training error declination is only polynomial in the size of the tree [9, 12]. The learning algorithm for BPs is basically very similar to the algorithms used in top down induction of DTs [3, 15]. It greedily searches for good splits of the nodes in the last level of the evolving program. The central difference between a BP and a DT is that in the former branches may grow together, while in the latter two separate branches never unite. In this paper we experiment with a practical ....
....distributions gradually lose impurity, they usually become easier as we go down in the tree. The final predictor is combined from the node predicates. Thus, viewing node predicate selection as a form of weak learning enables to explain the learning behavior of such successful DT learners as CART [3] and C4.5 [15]. Under the weak learning hypothesis, choosing a suitable predicate class and using an appropriate splitting criterion, the training error of the DT is driven down polynomially in its size [9]. This bound, which is close to optimal for any DT learning algorithm, is exponentially ....
Breiman, L., Friedman, J.H., Olshen, R.A., Stone, C.J.: Classification and regression trees. Wadsworth, Pacific Grove, CA (1984)
....nition 5. Suppose we are given S k = 1 ; j ; k ) 2 IR : 8j 2 f1; 2; kg; j 0) j = 1 = 2) the k dimensional simplex, where k is a positive integer. An entropy measure is an application from S k to IR, with the following properties (for more details see [24]): Symmetry, Minimality, Maximality, Continuity and Concavity. Definition 6. The Quadratic Entropy is a function QE from [0; 1] to [0; 1] 1 ; k ) QE( 1 ; k ) 3.4 Local and Total Uncertainties in the MST. Given the previous definitions, we use the quadratic entropy ....
....the better the feature subset, the smallest the overlaps, and the smallest U_tot. Even more, at each instance's level, the quadratic entropy becomes equivalent to Gini's impurity criterion, used in decision tree induction to evaluate the quality of a tree node in the well known CART(TM) package [24]. It is worthwhile to remark that Gini's criterion, as well as more recent criteria such as Schapire-Singer's Z criterion [25], have been rigorously proven to be very efficient measures to grow decision trees, in particular more accurate than the accuracy itself [26]. Furthermore, in our case, a ....
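The excerpt above lost its formula for the quadratic entropy in extraction, so the sketch below shows only the standard Gini impurity of a tree node, 1 − Σ_k p_k², where p_k is the fraction of the node's cases in class k. The function name `gini_impurity` is an assumption for illustration, and the code is not taken from the cited paper or from the CART package.

```python
from collections import Counter

def gini_impurity(labels):
    """Gini impurity of a node: 1 - sum_k p_k**2 over class proportions p_k.

    Hypothetical helper; returns 0.0 for an empty node by convention.
    """
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

pure_node = gini_impurity(["a", "a", "a", "a"])   # one class only
mixed_node = gini_impurity(["a", "a", "b", "b"])  # two classes, evenly mixed
```

A pure node scores 0, and with two classes the value peaks at 0.5 for an even mix, which is why lower impurity after a split indicates a better split.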
L. Breiman, J. H. Friedman, R. A. Olshen, C. J. Stone. Classification and Regression Trees. Wadsworth, 1984.
....predictive clustering approach presented in [4] for which it has been argued that it provides a very general approach to predictive modelling. 4 Predictive Clustering Trees. A variety of algorithms for predictive modeling exists. Among the better known are algorithms that induce decision trees [6, 18]. Compared to other well known techniques such as neural networks [2], decision trees have the advantage of being more interpretable: they clearly explicitate the factors that influence the outcome most strongly. Decision trees are most often used in the context of classification or single-target ....
L. Breiman, J.H. Friedman, R.A. Olshen, and C.J. Stone. Classification and Regression Trees. Wadsworth, Belmont, 1984.
....in size) results to be very high. The use of more sophisticated pattern recognition techniques clearly could not to be set aside. We then began to apply standard classification techniques like Nearest Neighbor (NN), Maximum Likelihood Gaussian Classifier (MLGC) and Tree Based Classifiers (TBC) [Breiman et al. 1984]. We also applied some combining techniques like Bagging [Breiman, 1996] and AdaBoost, using maximal TBC as base classifiers. In Table 1 are reported the performances of these classifiers in terms of global accuracy, sensitivity and specificity, computed by 10-fold cross-validation. AdaBoost ....
Breiman, L., Friedman, J., Olshen, R., and Stone, C. (1984). Classification and Regression Trees. Wadsworth, Belmont (CA).
.... e.g. Quinlan's Foil can also be considered an upgrade of either Michalski's AQ [47] or CN2 [14, 13]. RIBL upgrades the classical k nearest neighbor algorithm (using a first order distance due to [6]). SRT and Tilde upgrade the well known decision (and regression) tree paradigm incorporated in CART [12] and C4.5 [56, 57]. Warmr upgrades Apriori [2, 1]. Maccent upgrades the Maximum Entropy approach in [5]. De Raedt and Dzeroski's PAC learning results (as well as its incorporation in the Claudien system) for jk CT are derived from results in [67] for k CNF, Reddy and Tadepalli's results are based ....
L. Breiman, J.H. Friedman, R.A. Olshen, and C.J. Stone. Classification and Regression Trees. Wadsworth, Belmont, 1984.
....algorithms. 2.1 Discretization Techniques. Nominal attributes, most often, are divided into single value subsets; i.e. so that each separate value in the domain of the attribute receives its own subset of training examples. Though, some learning algorithms also support grouping of nominal values [5, 7, 25, 8]. Numerical attributes, on the other hand, may have an infinite domain, which means that they often require discretization. It is performed either in preprocessing or during induction. The need to discretize numerical attributes arises from two sources: the desire to speed up induction and the ....
....= |{r ∈ R | val_C(r) ≠ maj_C(R)}|. Training Set Error is the number of training instances falsely classified in the partition. For a partition R_i of S it is defined as δ(R_i). Other evaluation functions used in machine learning algorithms include for example the Gini index (GI) [5], which is closely related to the entropy based functions, and many evaluation functions based on the Minimum Description Length [26] and the Minimum Message Length [28] principles. 3 On the Complexity of the Optimal Multisplitting Problem. In this section we concentrate on the inherent ....
Breiman, L., Friedman, J. H., Olshen, R. A., Stone, C. J.: Classification and Regression Trees, Wadsworth, 1984.
....and the D r s. In Fig. 3, results are reported for the data set Gaussians. More specifically, 100 bootstrap replicates of the training set were generated for each data set, and an AdaBoost model was trained on each bootstrap replicate. The base models adopted in our experiments are unpruned trees [1]. The predictions made by the AdaBoost models on an independent test set were used for decomposing the error according to Eq. 5. The first .... Figure 4 (panels (a) and (b)): Controlling bias and variance for the data set Gaussians of ....
L. Breiman, J. Friedman, R. Olshen, and C. Stone. Classification and Regression Trees. Wadsworth, Belmont (CA), 1984.
....not identical to Armstrong's Rules of functional dependencies (cf. Theorem 3.7). The approach to classification that we propose in this paper generalizes well-known classification techniques in data mining such as the one proposed in the CART system which is based on the Gini impurity measure (see [BFOS84]). The algorithm described in our paper is reasonably fast and is scalable. The role played by impurity measures in the definition and study of purity dependencies suggests the need to investigate an axiomatization of these measures. Specifically, given a finite set S and a partition = {B_1, ....
L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone. Classification and Regression Trees. Chapman & Hall/CRC, Boca Raton, 1984. Republished 1993.
.... Control Charts. In this dataset there are 6 classes of control charts [1]. Figure 2.b shows some examples of three of the classes. The data used were obtained from the UCI KDD Archive [4]. The number of examples is 600, with 60 points in each series. Waveform. This dataset was introduced by [9]. We used the version from the UCI ML Repository [7]. The number of examples is 900 and the number of points in each series is 21. Wave Noise. This dataset was generated in the same way as the previous one, but 19 random points are added at the end of each example, with mean 0 and variance 1. ....
....C4.5, over the raw data, and obtained an average error of 8.6 (also using five five-fold cross validation). It should also be noted that obtained error is reduced to 0.82 using 100 iterations and only dtw le. Waveform. The error of a Bayes optimal classifier on this dataset is approximately 14 [9]. The best previous result we are aware of for this dataset is an error of 15.21 [11]. That result was obtained using boosting, with decision trees as base classifiers, which are much more complex than our base classifiers (clauses with one literal in the body). The obtained error is reduced to ....
L. Breiman, J.H. Friedman, R.A. Olshen, and C.J. Stone. Classification and Regression Trees. Chapman & Hall, New York, 1993.
....in the training data. Parameters can also be tied for the covariance matrix of the distribution using algorithms such as semi tied covariance [40, 11, 46, 41]. Decision trees have proved to be useful in classification, description and generalization of data in a variety of applications (e.g. [91, 14, 15]). Decision trees were first used in ASR for tying HMMs where state labels were approximated using a Poisson process [5]. It became popular when they were used to cluster observation densities of a state in a maximum likelihood framework [70, 130]. Decision tree clustering is particularly ....
Leo Breiman, Jerome Friedman, Richard Olshen, and Charles Stone. Classification and regression trees. Wadsworth Inc., Belmont, CA, 1984.
....or is not in the subset. Our implementation is based on a further variant of AdaBoost.OC, named AdaBoost.ECC [11], but dealing with confidence based predictions. 4 Experimental Validation. The characteristics of the data sets are summarized in table 1. The data sets waveform, waveform with noise [5, 6], CBF (cylinder, bell and funnel) [19] and control charts [1, 3] were already used in our work on boosting distance literals [18]. Auslan is the Australian sign language, the language of the Australian deaf community. Instances of the signs were collected using an instrumented glove [12]. Each ....
L. Breiman, J.H. Friedman, R.A. Olshen, and C.J. Stone. Classification and Regression Trees. Chapman & Hall, New York, 1993.
....are more expensive to evaluate than axis orthogonal hyperplanes. There is a vast body of literature on the design of decision trees and a description of available methods and their advantages and disadvantages is beyond the scope of this thesis. However, two seminal works should be mentioned. In [16], Breiman et al. describe the CART (Classification And Regression Trees) algorithm, and in [53] Quinlan describes the ID3 algorithm which has since evolved into the C4.5 [54] and C5.0 algorithms (the latter of which combines a version of AdaBoost with C4.5). All of these algorithms use top down ....
....complexity of each mask perceptron in this result is related to the proportion of training examples which are close to the mask perceptron threshold. For details see [49]. 4 51 The alternating decision tree. Part of the popularity of algorithms for constructing decision trees (such as CART [16] and C4.5 [54]) is due to the fact that they produce classifiers which are highly accurate, yet still admit simple interpretation. The application of voting methods, particularly the AdaBoost algorithm, to the construction of voted combinations of decision trees has proven to be remarkably ....
L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone. Classification and Regression Trees. Wadsworth, Belmont, 1984.
....of GALE2 coevolves different knowledge representations. In this paper, GALE2 coevolves three different knowledge representations in its heterogeneous runs: (1) sets of fully defined instances [Llorà and Garrell, 2001b], (2) orthogonal decision trees [Quinlan, 1993] and (3) oblique decision trees [Breiman et al. 1984, Van de Merckt, 1993]. Rules can be extracted from orthogonal decision trees. Instance sets are evolved sets of instances that describe the set P mined, based on nearest neighbor algorithms. Merge uses two point crossover [De Jong and Spears, 1991] and splitting is done using mutation based on ....
Breiman, L., Friedman, J., Olshen, R., and Stone, C. (1984). Classification and Regression Trees. Wadsworth International Group.
.... The algorithm recursively extracts the dyadic partition points according to our maximized penalized likelihood criterion defined in (12). As described here, our algorithm is reminiscent of dynamic programming, Coifman and Wickerhauser's Best Ortho Basis algorithm [8] or the CART algorithm [9]. It begins one level above the leaf nodes in our dyadic tree and traverses upwards, performing a tree pruning operation at each stage. For each node at a particular level, the vector y contains the leaf nodes descending from the current node. We then calculate the penalized likelihoods p(y|H_0) ....
L. Breiman, J. Friedman, R. Olshen, and C. J. Stone, Classification and Regression Trees, Wadsworth, Belmont, CA, 1984.
....is to split the cluster in such a way that the impurity of its subsets is minimized. Splitting can be repeated recursively until the cluster size reaches some minimal threshold or the impurity can no longer be significantly reduced. Splitting of clusters was done by means of the CART method [15], i.e. by building a decision tree. Each non-final node of the tree contains a yes/no question about the phonetic, lexical and/or phonological context of a given unit (e.g. is the right neighbor a nasal?, is the syllable stressed?, does the syllable have a pitch accent?). The CART algorithm ....
L. Breiman, J. Friedman, R. Olshen, and C. Stone. Classification and Regression Trees. Wadsworth & Brooks, Pacific Grove, CA., 1984.
....of the classes. Suppose a system constructs a binary classifier for which we have unbiased estimates for the following: TP , the proportion of instances observed to be and classified as such; and FP , the proportion of instances observed to be and classified as . Using the notation in [4], let the costs of false positives and false negatives be C( j) and C(j ) respectively (that is, the cost of classifying an instance as when it is really a , and vice versa) 2. The expected misclassification cost of the classifier is then given by ( 1 TP ) C(j ) F P C( j) For brevity, ....
L. Breiman, J.H. Friedman, R.A. Olshen, and C.J. Stone. Classification and Regression Trees. Wadsworth, Belmont, 1984.
....by a SOCRATES (ERASMUS) grant. 2 S. Kramer et al. Prediction of Ordinal Classes Using Regression Trees. 1. Introduction. Learning to predict discrete classes or numerical values from preclassified examples has long been, and continues to be, a central research topic in Machine Learning (e.g. [Breiman et al. 1984, Quinlan, 1992, Quinlan 1993]). A class of problems between classification and regression, learning to predict ordinal classes, i.e. discrete classes with a linear ordering, has not received much attention so far, which seems somewhat surprising, as there are many classification problems in the ....
....classes or numerical values from examples and relational background knowledge. The algorithm constructs a tree containing a positive literal or a conjunction of literals in each node, and assigns a discrete class or a numeric value to each leaf. S CART is a full-fledged relational version of CART [Breiman et al. 1984]. After the tree growing phase, the tree is pruned using so called error complexity pruning for regression or cost complexity pruning for classification [Breiman et al. 1984]. These types of pruning are based on a separate prune set of examples or on cross validation. 4 S. Kramer et al. ....
Breiman, L., Friedman, J.H., Olshen, R.A., & Stone, C.J. (1984). Classification and Regression Trees. Belmont, CA: Wadsworth International Group.
....consistent with H_j and 0 otherwise. b) is an associated binary decision tree illustrating how hypotheses might be discriminated by the outcomes of trials. 2.1.2 A Near Optimal Solution. However, according to results by Muggleton and Page [35] within a Bayesian setting, the CART algorithm [3] gives a near optimal solution for this binary decision tree construction problem in time polynomial in m, n. Thus while CART is usually used in Machine Learning for constructing hypothesised propositional theories based on example data, ASE Progol uses ILP to construct hypothesised first order ....
L. Breiman, J.H. Friedman, R.A. Olshen, and C.J. Stone. Classification and Regression Trees. Wadsworth, Belmont, 1984.
No context found.
Breiman, L., Friedman, J., Olshen, R., and Stone, C. (1984). Classification and regression trees. Wadsworth & Brooks, Pacific Grove.
No context found.
Breiman, L., Friedman, J.H., Olshen, R.A., & Stone, C.J. (1984). Classification and regression trees. Monterey, CA: Wadsworth & Brooks/Cole Advanced Books & Software.
No context found.
L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone. Classification and regression trees. Wadsworth & Brooks, 1984.
No context found.
L. Breiman, J. Friedman, R. Olshen, and C. Stone. Classification and Regression Trees. Wadsworth, 1984.
No context found.
L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone. Classification and Regression Trees. Wadsworth, Belmont, Ca., 1984.
No context found.
L. Breiman, J. Friedman, R. Olshen, and C. Stone. Classification and Regression Trees. Pacific Grove, 1984.
No context found.
L. Breiman, J. Friedman, R. Olshen, and C. Stone. Classification and Regression Trees. Pacific Grove, 1984.
No context found.
L. Breiman, J.H. Friedman, R.A. Olshen, and C.J. Stone. Classification and Regression Trees. Wadsworth Int., 1984.
No context found.
L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone. Classification and Regression Trees. Wadsworth Inc., Belmont, CA, USA, 1984.
No context found.
L. Breiman, J.H. Friedman, R.A. Olshen, and C.J. Stone. Classification and Regression Trees. Wadsworth, Belmont, 1984.
No context found.
Breiman, L., Friedman, J., Olshen, R., Stone, C. Classification and Regression Trees, Wadsworth and Brooks, Pacific Grove, CA. 1984
No context found.
Breiman, L., Friedman, J.H., Olshen, R.A., Stone, C.J.: Classification and Regression Trees. Wadsworth (1984)
No context found.
Breiman, L., Friedman, J., Olshen, R. and Stone, C. (1984) Classification and Regression Trees. Wadsworth, Belmont, CA.
No context found.
L. Breiman, J. Friedman, R. Olshen, and C. Stone. Classification and Regression Trees. Wadsworth International (California), 1984.
No context found.
L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone. Classification and Regression Trees. Wadsworth, 1984.
No context found.
L. Breiman, J. Friedman, R. Olshen, and C. Stone. Classification and Regression Trees. Wadsworth International (California), 1984.
No context found.
L. Breiman, J.H. Friedman, R. Olshen, and C.J. Stone, Classification and Regression Trees, Wadsworth, Belmont, CA., 1984.
No context found.
L. Breiman, J.H. Friedman, R. A. Olshen, and C.J. Stone. Classification and Regression Trees. Wadsworth Inc., Belmont, California, 1984.
No context found.
L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone. Classification and regression trees. Technical report, Wadsworth International, Monterey, CA, 1984.
No context found.
Breiman, L., Friedman, J.H., Olshen, R.A. & Stone, C.J. (1984). Classification and Regression Trees. Wadsworth, Belmont (CA).
No context found.
Breiman, L., Friedman, J.H., Olshen, R.A. & Stone, C.J. (1984). Classification and Regression Trees. Wadsworth.
No context found.
L. Breiman, J.H. Friedman, R.A. Olshen, and C.J. Stone. Classification and Regression Trees. Wadsworth, Belmont, 1984.
No context found.
Breiman, L., Friedman, J.H., Olshen, R.A. & Stone, C.J. (1984). Classification and Regression Trees. Wadsworth, Belmont (CA).
No context found.
L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone. Classification and Regression Trees. Wadsworth, Belmont, CA, 1984.
CiteSeer.IST - Copyright Penn State and NEC