D. Heath, S. Kasif, and S. Salzberg, "Learning oblique decision trees," in Proc. 13th Int. Joint Conf. Artificial Intelligence, 1993, pp. 1002--1007.
....of at most 2 levels) [12]. Its computation time is almost linear in the size of the training set. The T2 algorithm is evaluated on 15 common real-world datasets. It is shown that on most of these datasets, T2 provides simple decision trees with little or no loss in accuracy compared to C4.5. SADT [34] and OC1 [52] are decision tree induction methods that partition instances using oblique hyperplanes. Standard decision tree techniques, such as C4.5 [58], partition a set of points with axis-parallel hyperplanes, whereas oblique decision tree algorithms attempt to find hyperplanes at any ....
.... values of test instance 1: F[1] 2 F[2] 3 F[3] 3 F[4] 3 F[5] 3 F[6] 0 F[7] 0 F[8] 0 F[9] 3 F[10] 3 F[11] 0 F[12] 0 F[13] 0 F[14] 0 F[15] 0 F[16] 0 F[17] 3 F[18] 2 F[19] 2 F[20] 3 F[21] 3 F[22] 3 F[23] 1 F[24] 3 F[25] 0 F[26] 0 F[27] 0 F[28] 0 F[29] 0 F[30] 0 F[31] 0 F[32] 1 F[33] 0 F[34]:34 Classes: 1] 2] 3] 4] 5] 6] Votes of Feature[1] 0.16 0.16 0.17 0.18 0.15 0.18 Votes of Feature[2] 0.36 0.27 0.15 0.07( 0.05( 0.10 Votes of Feature[3] 0.34 0.05( 0.38 0.07( 0.10 0.05( Votes of ....
[Article contains additional citation context not shown here]
D. Heath, S. Kasif, and S. Salzberg, Learning Oblique Decision Trees, Proceedings of 13th International Joint Conference on Artificial Intelligence (IJCAI-93), 1002--1007, 1993.
....of this problem. Key words. linear decision tree, hardness of approximation, parameterized complexity. AMS subject classifications. 68Q17, 62H30. 1. Introduction. Classifying point sets in R^n by linear decision trees is of great interest in pattern analysis and many other applications [4, 5, 14]. Typically, in such a problem we are given a set W of white points and a set B of black points in R^n, and we must produce a linear decision tree which classifies them. That is, the tree defines a linear decision at each internal node, such that for each leaf of this tree, either only white ....
D. Heath, S. Kasif, and S. Salzberg, Learning oblique decision trees, in Proc. 13th International Joint Conference on Artificial Intelligence, Morgan Kaufmann, 1993, pp. 1002--1007. Chambery, France.
.... measures were proposed, such as impurity [2], information gain [17], and more in [14]. In building an MDT, however, we need to find the most promising hyperplane (a set of features is needed). It is known that selecting the best set of features for a test at a node is an NP-complete problem [9]. In order to find an optimal hyperplane for a multivariate test, researchers have offered various suggestions, such as Regression [2], Linear Programming [1, 3], Randomization [15], and Simulated Annealing [9]. In the following, we examine the conventional procedure of MDT building with reference to ....
.... known that selecting the best set of features for a test at a node is an NP-complete problem [9]. In order to find an optimal hyperplane for a multivariate test, researchers have offered various suggestions, such as Regression [2], Linear Programming [1, 3], Randomization [15], and Simulated Annealing [9]. In the following, we examine the conventional procedure of MDT building with reference to UDT building. 2.1 Common aspects of MDT building. Many issues in building MDTs are the same as those for UDTs. Both types of trees are built from labeled examples. In finding a linear combination of ....
[Article contains additional citation context not shown here]
D. Heath, S. Kasif, and S. Salzberg. Learning oblique decision trees. In Proceedings of the Thirteenth International Joint Conference on AI, pages 1002--1007, France, 1993.
....therefore to attack the fragmentation problem by removing replications and repetitions and avoiding branching on features with many values. Many solutions to these problems have been investigated, such as global data analysis [21], constructive induction [10], multivariate or oblique decision trees [8], and discretization [4]. All these have been studied in isolation. This work proposes a solution based on constructive induction and global data analysis to create compound features. The solution does not require feature discretization, and compound features can be used in generating ....
D. Heath, S. Kasif, and S. Salzberg. Learning oblique decision trees. In Proceedings of the Thirteenth International Joint Conference on Artificial Intelligence (IJCAI-93), pages 1002--1007, 1993.
....variants of perceptron algorithms [Utg89, BU92] have worked well in practice, but the algorithms may fail to converge and may not find optimal solutions. The problem of creating a linear function that minimizes the number of points misclassified is NP-complete [Hea92]. The algorithms CSADT [HKS93] and OC1 [MKSB93, MKS94] use simulated annealing to minimize functions of the number of misclassified points. Using mathematical programming, we have developed an algorithm that combines the two error criteria and exploits the mathematical structure of the underlying problem in order to find ....
D. Heath, S. Kasif, and S. Salzberg. Learning oblique decision trees. In Proceedings of the Thirteenth International Joint Conference on Artificial Intelligence, pages 1002--1007, Chambery, France, 1993. Morgan Kaufmann.
....trying a split than before splitting) so there is no natural way of assessing where to stop further expansion of a node. As a remedy, Kalkanis suggested the use of the upper bounds in the confidence intervals for the misclassification error as an attribute selection criterion. Heath et al. [204] used the simplest possible attribute selection criteria, based on the number of misclassified objects, for oblique decision tree induction. Their measures were called max minority and sum minority, respectively denoting the maximum and the sum of the number of misclassified points on either side ....
.... the information theory or distance based measures, and additional tricks are needed to make these measures robust [297, 351]. Another measure suggested by Heath et al., called the sum of impurities, assigns an integer to each class and measures the variance between class numbers in each partition [204, 351]. An almost identical measure was used earlier in the Automatic Interaction Detection (AID) program [139]. Most of the above feature evaluation criteria assume no knowledge of the probability distribution of the training objects. The optimal decision rule at each tree node, a rule that minimizes ....
[Article contains additional citation context not shown here]
D. Heath, S. Kasif, and S. Salzberg. Learning oblique decision trees. In IJCAI93
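The max minority and sum minority measures described in the excerpts above can be illustrated with a minimal sketch. It assumes that the "misclassified" points on each side of a split are those outside that side's most common class; the helper names (`minority_count`, `split_by_hyperplane`) and the toy data are hypothetical and are not taken from the cited paper.

```python
from collections import Counter


def minority_count(labels):
    """Count the points that do not belong to the most common class."""
    if not labels:
        return 0
    return len(labels) - max(Counter(labels).values())


def split_by_hyperplane(points, labels, weights, bias):
    """Send each label left if w.x + b <= 0, otherwise right."""
    left, right = [], []
    for x, y in zip(points, labels):
        value = sum(w * xi for w, xi in zip(weights, x)) + bias
        (left if value <= 0 else right).append(y)
    return left, right


def max_minority(left, right):
    """Larger of the two per-side minority counts."""
    return max(minority_count(left), minority_count(right))


def sum_minority(left, right):
    """Total minority count over both sides of the split."""
    return minority_count(left) + minority_count(right)


# Toy data (assumed for illustration): an oblique split x + y - 2.5 = 0.
points = [(0, 0), (1, 0), (0, 1), (0, 2), (2, 2), (3, 1), (3, 3)]
labels = ["a", "a", "b", "b", "b", "b", "a"]
left, right = split_by_hyperplane(points, labels, weights=(1, 1), bias=-2.5)
print(max_minority(left, right), sum_minority(left, right))
```

Lower values indicate a purer split under either measure, so a search over hyperplanes would look for splits that reduce them.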
.... improve accuracy are: • Construction of multi-attribute tests using logical combinations (Ragavan and Rendell 1993), arithmetic combinations (Utgoff and Brodley 1990; Heath, Kasif, and Salzberg 1993), and counting operations (Murphy and Pazzani 1991; Zheng 1995) • Use of error-correcting codes when there are more than two classes (Dietterich and Bakiri 1995) • Decision trees that incorporate classifiers of other kinds (Brodley 1993; Ting 1994) • Automatic methods for setting .... (Footnote 1: For extremely large datasets, however, learning time can remain the dominant issue (Catlett 1991; Chan and Stolfo 1995).)
Heath, D., Kasif, S., and Salzberg, S. 1993. Learning oblique decision trees. In Proceedings 13th International Joint Conference on Artificial Intelligence, 1002--1007.
....a classifier with a single inequality that involves as few variables as possible, and point out certain aspects of the difficulty of this problem. 1 Introduction. Classifying point sets in R^n by linear decision trees is of great interest in pattern analysis and many other applications [BFOS84, HKS93, BGV92]. Typically, in such a problem we are given a set W of white points and a set B of black points in R^n, and we must produce a decision tree with linear decision nodes, such that for each leaf of this tree, either only white or ....
D. Heath, S. Kasif, and S. Salzberg. Learning oblique decision trees. In Proc. 13th International Joint Conference on Artificial Intelligence, pages 1002--1007. Morgan Kaufmann, 1993. Chambery, France.
....a remedy, Kalkanis suggested the use of the upper bounds in the confidence intervals for the misclassification error as an attribute selection criterion. The total number of misclassified points has been explored as a selection criterion by many authors. Two examples are Heath's sum minority [147] and Lubinsky's inaccuracy [223, 224]. The CART book [31], among others, discusses why this is not a good measure for tree induction. Additional tricks are needed to make this measure useful [223, 269]. Heath [147] also used max minority (maximum of the number of misclassified points on two sides ....
....as a selection criterion by many authors. Two examples are Heath's sum minority [147] and Lubinsky's inaccuracy [223, 224]. The CART book [31], among others, discusses why this is not a good measure for tree induction. Additional tricks are needed to make this measure useful [223, 269]. Heath [147] also used max minority (maximum of the number of misclassified points on two sides of a binary split) and sum of impurities (which assigns an integer to each class and measures the variance between class numbers in each partition) [147, 269]. An almost identical measure to sum of impurities was ....
[Article contains additional citation context not shown here]
D. Heath, S. Kasif, and S. Salzberg. Learning oblique decision trees. In IJCAI-93 [160], pages 1002--1007. Editor: Ruzena Bajcsy.
....is simpler to describe, algorithms that manipulate these objects run much faster than they would with the more complex object. In addition, there has been considerable work in machine learning directed at the design of small linear decision trees to represent a multi-category point set (e.g., see [5, 6, 8, 21, 29, 32, 33, 34, 35, 36]). In this paper we formally establish that several natural problems on convex polyhedra are provably difficult, including several problems involving the approximation and illumination of convex polyhedra. Interestingly, a key ingredient in our proofs is a linear time method for realizing any ....
D. Heath, S. Kasif, and S. Salzberg. Learning oblique decision trees. In Proceedings of the 13th International Joint Conference on Artificial Intelligence, pages 1002--1007, Chambery, France, 1993. Morgan Kaufmann.
....as the classification for the query point. An example of classified points and an associated decision tree are shown in Figure 1. Research in artificial intelligence is showing linear decision trees to be a powerful, practical method for performing these machine learning classification tasks [4, 5, 8, 13, 18, 19, 20, 21, 22, 23]. Typically, the real-world problems where such methods are being applied involve points that are taken from R^d, with d being a reasonably small constant (typically, d 50). There has been a fairly extensive empirical study of such instances of the learning problem, but there has not been a ....
D. Heath, S. Kasif, and S. Salzberg. Learning oblique decision trees. In Proceedings of the 13th International Joint Conference on Artificial Intelligence, pages 1002--1007, Chambery, France, 1993. Morgan Kaufmann.
....equivalent to partitioning a set of examples with an axis-parallel hyperplane. Although Breiman et al. [1984] suggested an elegant method for inducing multivariate linear decision trees, there has not been much activity in the development of such trees until very recently [Utgoff and Brodley, 1991] [Heath et al. 1992]. Because these trees use oblique hyperplanes to partition the data, we call them oblique decision trees. This paper presents a new method for inducing oblique decision trees. As it constructs a tree, this method searches at each node for the best hyperplane to partition the data. Although most of ....
....avoids local minima in many cases. Randomization also allows the algorithm to produce many different trees for the same data set. This offers the possibility of a new family of classifiers: k-decision tree algorithms, in which an example is classified by the majority vote of k trees (see [Heath, 1992]). Two other methods for generating oblique trees that have been introduced recently are perceptron trees [Utgoff and Brodley, 1991] and simulated annealing (SADT) [Heath et al. 1992]. The former shows that much smaller trees can be induced when oblique hyperplanes are used. However, theirs is ....
[Article contains additional citation context not shown here]
Heath, D.; Kasif, S.; and Salzberg, S. 1992. Learning oblique decision trees. Technical report, Johns Hopkins University, Baltimore MD.
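The k-tree majority vote mentioned in the excerpt above can be sketched as follows. This is only an illustration under stated assumptions: each "tree" is modelled as a plain Python callable that maps an example to a class label, and the three toy classifiers are hypothetical stand-ins for trees produced by repeated randomized induction.

```python
from collections import Counter


def majority_vote(trees, example):
    """Return the label predicted by the largest number of trees.

    Ties are resolved in favour of the label that was voted first,
    which is an arbitrary choice made for this sketch.
    """
    votes = Counter(tree(example) for tree in trees)
    return votes.most_common(1)[0][0]


# Hypothetical stand-ins for k = 3 independently induced trees.
trees = [
    lambda x: "pos" if x[0] > 0 else "neg",
    lambda x: "pos" if x[1] > 0 else "neg",
    lambda x: "pos" if x[0] + x[1] > 1 else "neg",
]

print(majority_vote(trees, (1, -1)))
print(majority_vote(trees, (2, 0.5)))
```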
....for trees of depth five or less. In particular, we describe simple and linear time algorithms for answering the query: is there a depth two or three decision tree that fits the data exactly? 3 Trees with K Split Nodes. Many recent papers have examined the use of decision trees for real-valued data [HKS93, MKS94, dM92, dM93, FI92, DKS95]. The standard approach to handling continuous values is to sort all the values according to each attribute i, and then consider the cut points between each successive pair of points in the sorted list. Each cut point splits the examples into two groups, and the split is scored by applying a ....
D. Heath, S. Kasif, and S. Salzberg. Learning oblique decision trees. In Proc. of the 13th Internatl. Joint Conf. on Artificial Intelligence, pages 1002--1007, Chambery, France, 1993. Morgan Kaufmann.
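The cut-point procedure described in the excerpt above (sort the values of one attribute, then consider a cut between each successive pair) can be sketched as below. Two assumptions are made because the excerpt is truncated: cuts are placed at midpoints between successive distinct values, and each split is scored by its total number of misclassified points. The function names are hypothetical.

```python
from collections import Counter


def candidate_cut_points(values):
    """Midpoints between successive distinct values in sorted order."""
    distinct = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]


def misclassified(labels):
    """Points outside the most common class of a group."""
    if not labels:
        return 0
    return len(labels) - max(Counter(labels).values())


def best_cut(values, labels):
    """Return (cut, score) for the lowest-scoring cut, or None if no cut exists."""
    best = None
    for cut in candidate_cut_points(values):
        left = [y for v, y in zip(values, labels) if v <= cut]
        right = [y for v, y in zip(values, labels) if v > cut]
        score = misclassified(left) + misclassified(right)
        if best is None or score < best[1]:
            best = (cut, score)
    return best


print(best_cut([1.0, 2.0, 3.0, 4.0], ["a", "a", "b", "b"]))
```

A full axis-parallel inducer would repeat this scan for every attribute and keep the best cut overall; any other scoring function could be substituted for the misclassification count.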
....the optimal solutions. Fixed depth lookahead search is one such technique, and in fact Sarkar et al. [22] have shown that lookahead search can guarantee bounded error solutions for the 0-1 knapsack problem. Several versions of the optimal decision tree induction problem are known to be NP-complete [11, 2, 9]. As a result, virtually all implemented decision tree systems use a greedy, top-down approach. There have been, however, some exceptions to this rule. Moret [12] surveys early induction systems that used dynamic programming and branch and bound methods to produce optimal trees. Hartmann et al. ....
D. Heath, S. Kasif, and S. Salzberg. Learning oblique decision trees. In Proceedings of the 13th IJCAI, pages 1002--1007, Chambery, France, 1993. Morgan Kaufmann.
....databases. One of the most promising paradigms for solving this problem is that of using a combination of techniques from the fields of learning, search, and clustering to classify, analyze, and visualize scientific data. Our methods are based on recent advances in the areas of machine learning [15, 22, 1, 18, 19, 9, 10, 13] as well as older work on pattern classification [7]. Our goal is to develop a methodology for knowledge discovery in scientific databases, and to use this methodology to discover new and important classes of objects. We also want to produce better characterizations of known object classes. We are ....
....as well as different feature sets. The methods we have used are: back propagation, nearest neighbor, nested hyperrectangles, and multivariate (oblique) decision trees. The first two methods are standard learning methods, while the latter two are relatively new methods we have developed [19, 13, 9]. Since the first two methods are standard, we will not describe them here, but readers interested in details of back propagation can see [17]. The nearest neighbor method has been used for classification since the 1950s [7] and numerous variations have been described and explored. For some ....
[Article contains additional citation context not shown here]
D. Heath, S. Kasif, and S. Salzberg. Learning oblique decision trees. Proceedings of the 13th International Joint Conference on Artificial Intelligence, Chambery, France, 1993 (to appear).
....a constant; i.e., they are equivalent to partitioning a set of examples with an axis-parallel hyperplane. Although [BFOS84] suggested an elegant method for inducing multivariate linear decision trees, there has not been much activity in the development of such trees until very recently ([UB91], [HKS92]). Because these trees use oblique hyperplanes to partition the data, we call them oblique decision trees. This paper presents a new method for inducing oblique decision trees. As it constructs a tree, this method searches at each node for the best hyperplane to partition the data. Although most ....
.... the possibility of a new family of classifiers: k-decision tree algorithms, in which an example is classified by the majority vote of k trees (see [Hea92]). Two other methods for generating oblique trees have been introduced recently: perceptron trees [UB91] and simulated annealing (SADT) [HKS92]. In [UB91], Utgoff and Brodley show how oblique hyperplanes can be used to find much smaller trees than standard methods. In [Hea92], Heath shows that the problem of finding the optimal oblique tree is NP-complete. More precisely, Heath [Hea92] proves that the problem of finding an optimal ....
[Article contains additional citation context not shown here]
D. Heath, S. Kasif, and S. Salzberg. Learning oblique decision trees. Technical report, Johns Hopkins University, Baltimore MD, 1992.
No context found.
Heath, D., Kasif, S., and Salzberg, S. 1993. Learning oblique decision trees. In Proceedings of the 13th International Joint Conference on Artificial Intelligence, pp. 1002-1007, Chambery, France. Morgan Kaufmann, San Mateo, CA.
....to the results reported in Table 1, the following results have appeared on the Star Galaxy data. Odewahn et al. (1992) reported 99.8% accuracy on the bright objects and 92.0% on the dim ones, although it should be noted that this study used a single training and test set partition. Heath (1992) reported 99.0% accuracy on the bright objects using SADT, with an average tree size of 7.03 leaves. This study also used a single training and test set. Salzberg (1992) reported accuracies of 98.8% on the bright objects and 95.1% on the dim objects, using 1-Nearest Neighbor (1-NN) coupled with a feature ....
Heath, D., Kasif, S., & Salzberg, S. (1993b). Learning oblique decision trees. In Proceedings of the 13th International Joint Conference on Artificial Intelligence, pp. 1002--1007.
No context found.
D. Heath, S. Kasif, and S. Salzberg, "Learning oblique decision trees," in Proc. 13th Int. Joint Conf. Artificial Intelligence, 1993, pp. 1002--1007.
No context found.
D. Heath, Simon Kasif, and Steven Salzberg. Learning oblique decision trees. In Proceedings of the 13th International Joint Conference on Artificial Intelligence, pages 1002--1007. Morgan Kaufmann, 1993.
No context found.
D. Heath, S. Kasif, and S. Salzberg, "Learning oblique decision trees," in Proc. 13th Int. Joint Conf. Artificial Intelligence, 1993, pp. 1002--1007.
No context found.
D. Heath, S. Kasif, and S. Salzberg. Learning oblique decision trees. In IJCAI-93 [160], pages 1002--1007. Editor: Ruzena Bajcsy.