| Fulton T., Kasif S., and Salzberg S. (1995). Efficient algorithms for finding multi-way splits for decision trees. In Proceedings of the Twelfth Int. Conference on Machine Learning, 244-251. Morgan Kaufmann. |
....If we measure optimality by the linear loss function, our algorithm is better by a factor of sample sizeorder compared to solutions induced by the dynamic programming scheme presented in [5] for this kind of problems. 1 We learnt from a referee that the same time bound can be obtained from [3] (Theorem 3) by carrying over and generalizing the formalism introduced there. However, we think that our algorithm completes and improves the algorithm presented in [3] in the following ways. Our algorithm is simpler (no preprocessing step has to be performed) Its proof of correctness is more or ....
....scheme presented in [5] for this kind of problems. 1 We learnt from a referee that the same time bound can be obtained from [3] (Theorem 3) by carrying over and generalizing the formalism introduced there. However, we think that our algorithm completes and improves the algorithm presented in [3] in the following ways. Our algorithm is simpler (no preprocessing step has to be performed). Its proof of correctness is more or less evident. Our dynamic scheme can be generalized easily (see Section 3). At last, we briefly point out how DYN can be used as a tool to identify two dimensional ....
Truxton Fulton, Simon Kasif, Steven Salzberg. Efficient Algorithms for Finding Multi-way Splits for Decision Trees. Proceedings of the 12th International Conference on Machine Learning, 1995, p. 244--251.
....within the domain. Numerical attributes have been observed to slow down, e.g. the C4.5 decision tree learning algorithm [14, 12]. Restricted classes of attribute evaluation functions have efficient optimization algorithms, but only the Training Set Error is known to optimize in linear time [1, 2, 11]. The class of so called cumulative functions can be optimized in quadratic time in the number of possible cut points [6, 11]. However, even quadratic time evaluation may be too much if the number of potential cut points is high. Therefore, reducing their number is critical for the efficiency of ....
.... 12]. Restricted classes of attribute evaluation functions have efficient optimization algorithms, but only the Training Set Error is known to optimize in linear time [1, 2, 11]. The class of so called cumulative functions can be optimized in quadratic time in the number of possible cut points [6, 11]. However, even quadratic time evaluation may be too much if the number of potential cut points is high. Therefore, reducing their number is critical for the efficiency of numerical attribute handling. For most commonly used attribute evaluation functions it is possible to do static pruning of cut ....
T. Fulton, S. Kasif, and S. Salzberg. Efficient algorithms for finding multiway splits for decision trees. In Machine Learning: Proceedings of the Twelfth International Conference, pages 244--251, San Francisco, Calif., 1995. Morgan Kaufmann.
....some similarity measure. It could be difficult to specify a good threshold of similarity so that not too many intervals are constructed. Another approach aims at finding a split that optimizes some goodness criterion. Examples are [ Quinlan, 1993; Catlett, 1991; Holte, 1993; Chiu et al. 1990; Fulton et al. 1995; Auer et al. 1995 ] See [ Dougherty et al. 1995 ] for more on these work. In these methods, additional constraints, such as the maximum number of intervals, the minimum number of examples in each interval, a penalty function on the number of intervals, are needed to control the number of ....
T. Fulton, S. Kasif, and S. Salzberg. Efficient Algorithms for Finding Multi-Way Splits for Decision Trees. Machine Learning.
....may be time consuming if the domain at hand has a very high number of candidate cut points. This affects both binarization [4, 14] methods and, in particular, algorithms that need to partition numerical ranges into more than two subsets; e.g. off line discretization algorithms [5] and optimal [8, 11] or greedy [5, 10] multisplitters in decision tree learning, rule induction, and nearest neighbor methods. In data mining applications numerical attributes may constitute a significant time consumption bottleneck. In this paper we continue to explore ways to enhance the efficiency of numerical ....
.... 2 ) O( k m)B) ACE, IG, GI Monotonic One pass O(kmn) O(km) TSE Cumulative evaluation functions, i.e. functions that compute a (weighted) sum of goodness scores of the subsets, can be optimized in time quadratic in the number of bins using the general algorithm which uses dynamic programming [11, 8]. Subsequently we refer to this algorithm as Bin Opt. If the evaluation function, additionally, is well behaved, then an algorithm called Block Opt [8] can be used to optimize it in time quadratic in the number of blocks. This paper introduces a pruning method for minimization of concave and ....
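The dynamic programming idea mentioned in this excerpt can be sketched as follows. This is a minimal illustration, not the Bin Opt algorithm as published: the function name `optimal_multisplit`, the input layout (one class-count vector per bin, in attribute order), and the choice of Training Set Error as the cumulative evaluation function are all assumptions made for the example. For a fixed arity k and m bins the triple loop performs on the order of k·m² interval evaluations, which is the quadratic dependence on the number of bins that the excerpt refers to.

```python
from typing import List, Sequence, Tuple


def optimal_multisplit(bins: Sequence[Sequence[int]], k: int) -> Tuple[int, List[int]]:
    """Split `bins` into exactly k contiguous intervals with minimal training set error.

    bins[i][c] is the number of class-c examples in the i-th bin (hypothetical layout).
    Returns (total error, sorted cut indices); a cut index j means an interval
    starts at bin j.
    """
    m = len(bins)
    if not 1 <= k <= m:
        raise ValueError("k must be between 1 and the number of bins")
    n_classes = len(bins[0])

    # prefix[i][c] = number of class-c examples in bins[0:i]
    prefix = [[0] * n_classes]
    for b in bins:
        prefix.append([p + x for p, x in zip(prefix[-1], b)])

    def cost(i: int, j: int) -> int:
        # Error of merging bins[i:j] into one interval: everything outside the majority class.
        counts = [hi - lo for lo, hi in zip(prefix[i], prefix[j])]
        return sum(counts) - max(counts)

    inf = float("inf")
    # best[a][j] = minimal error of splitting bins[0:j] into exactly a intervals
    best = [[inf] * (m + 1) for _ in range(k + 1)]
    back = [[0] * (m + 1) for _ in range(k + 1)]
    best[0][0] = 0
    for a in range(1, k + 1):
        for j in range(a, m + 1):
            for i in range(a - 1, j):
                c = best[a - 1][i] + cost(i, j)
                if c < best[a][j]:
                    best[a][j] = c
                    back[a][j] = i

    # Walk the back pointers from the full range down to the first interval.
    cuts = []
    j = m
    for a in range(k, 0, -1):
        j = back[a][j]
        cuts.append(j)
    return best[k][m], sorted(cuts[:-1])  # the last pointer is always 0, not a real cut
```

Because the cost of a partition is a plain sum of per-interval costs, any other cumulative evaluation function could be substituted for `cost` without changing the recurrence.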
Fulton, T., Kasif, S., Salzberg, S.: Efficient algorithms for finding multi-way splits for decision trees. In: Prieditis, A., Russell, S. (eds.): Machine Learning: Proceedings of the Twelfth International Conference. Morgan Kaufmann, San Francisco, CA (1995) 244--251
....and possibly still after that. Even though the example set, which determines the goodness of an attribute, changes dynamically as the construction process proceeds, we do not have to keep sorting the examples over and over again, but can rely on careful bookkeeping instead (Fayyad and Irani, 1993; Fulton, Kasif, and Salzberg, 1995). One remaining topic is the arity, k, of the resulting partition. Usually we would like to penalize increase in k; that is, we would like to keep the arity of the partition relatively low because of the following reasons: The utility of an attribute in class prediction may be impaired because of ....
....by the attribute. In the empirical experiments reported in this paper we circumvented taking a stand to this issue by assuming k to be a given value. We continue the presentation by giving, in Section 2, a review of related work conducted hitherto; in particular, we recapitulate the recent work of Fulton, Kasif, and Salzberg (1995), which has laid the ground for our method, and the formal results of Fayyad and Irani (1992) which constitute the underpinnings for our work. Thereafter, in Section 3, we introduce our main contribution: An efficient method of finding optimal multi splits in one shot. Furthermore, we extend the ....
Fulton, T., Kasif, S., and Salzberg, S. (1995). Efficient algorithms for finding multi-way splits for decision trees. In A. Prieditis and S. Russell (eds.), Proc. Twelfth International Conference on Machine Learning (pp. 244-- 251). San Francisco, CA: Morgan Kaufmann.
....or continuous (real). Numerical attributes cannot be handled by the TDIDT scheme quite as naturally as nominal attributes. In real world induction tasks numerical attribute ranges, nevertheless, regularly appear, and therefore, they also need to be taken into account in decision tree construction [4, 7, 8, 9, 16, 17, 18]. Independent of which evaluation function is used to determine the goodness of an attribute, a numerical attribute's value range needs to be categorized into two or more intervals for evaluation. When discretizing a numerical value range we inherently meet the problem of choosing the right ....
....resulting multisplit is assigned to a single node of the evolving tree. Neither of the above mentioned methods can guarantee the quality (as measured by the evaluation function) of the resulting partition. Algorithms for finding optimal multisplits have been devised by Fulton, Kasif, and Salzberg [9] and by Elomaa and Rousu [6]. These methods finally make it possible to discretize a numerical value range into intervals that are determined best by the evaluation function. Independent of the algorithmic strategy that is used for partitioning the numerical value ranges it is important to ensure ....
T. Fulton, S. Kasif and S. Salzberg, Efficient algorithms for finding multi-way splits for decision trees. In: A. Prieditis and S. Russell (eds.), Proc. Twelfth International Conference on Machine Learning. Morgan Kaufmann, San Francisco, CA, 1995, 244--251.
....be either discrete (integer) or continuous (real). Numerical attributes cannot be handled by the TDIDT scheme quite as naturally as nominal attributes. In real world induction tasks numerical attribute ranges, nevertheless, regularly appear, and therefore, they also need to be taken into account [4, 7-9, 14-16]. Independent of the function that is used to determine the goodness of an attribute, a numerical attribute's value range needs to be categorized into two or more intervals for evaluation. When discretizing a numerical value range we inherently meet the problem of choosing the right arity for ....
....at once; the resulting multisplit is assigned to a single node of the evolving tree. Neither of the abovementioned methods can guarantee the quality (as measured by the evaluation function) of the resulting partition. Algorithms for finding optimal multisplits have been devised by Fulton et al. [9] and by Elomaa and Rousu [5]. Independent of the algorithmic strategy that is used for partitioning the numerical value ranges it is important to ensure that obviously bad partitions are not selected by the function, i.e. that the function is well behaved. A manifestation of poor behavior of an ....
Fulton, T., Kasif, S., Salzberg, S.: Efficient algorithms for finding multi-way splits for decision trees. In A. Prieditis, S. Russell (eds.), Proc. Twelfth Intl. Conf. on Machine Learning (244--251). Morgan Kaufmann, San Francisco, CA, 1995.
....attribute at a time can be said to be local discretization methods. In contrast, global discretization methods simultaneously convert all continuous attributes [81] Fast methods for splitting a continuous dimension into more than two ranges is considered in the machine learning literature [135, 157]. Trees in which an internal node can have more than 2 children, have also been considered in the vector quantization literature [431] An extension to ID3 [391] that distinguishes between attributes with unordered domains and attributes with linearly ordered domains is suggested in [88] 20 2.4 ....
Truxton K. Fulton, Simon Kasif, and Steven Salzberg. An efficient algorithm for finding multi-way splits for decision trees. In ML-95 [333]. to appear.
....if the attribute domain at hand has a very high number of candidate cut points. Even a linear time method like binarization can require substantial amount of time. This presents a particular problem for learning algorithms that have to manipulate numerical attributes exhaustively; e.g. optimal [13, 9] or greedy [11] multisplitters in decision tree learning, rule induction, and nearest neighbor methods. The inconvenience for all attribute selection strategies alike is that the time consumption of attribute selection is dominated by the attributes that are the heaviest ones to evaluate. Hence, ....
....or simplicity [15, 14, 2] but strive to maintain the prediction ability of the resulting decision tree while speeding up the classifier construction by simple and efficient dynamic data processing. Our research is motivated by an attempt to relieve the quadratic time optimal multisplitting [13, 9] from its vulnerability with respect to a single anomalous dimension of the data. Even though the methods that we propose can be characterized as lazy evaluation, this is not lazy decision tree learning in the sense of Friedman, Kohavi and Yun [12] We still construct a comprehensive hypothesis ....
Fulton, T., Kasif, S. & Salzberg, S. (1995). Efficient algorithms for finding multi-way splits for decision trees. In A. Prieditis & S. Russell (eds.), Machine Learning: Proc. Twelfth International Conference. Morgan Kaufmann.
.... which can be either discrete (integer) or continuous (real). Numerical attributes are frequent in real world induction tasks and their proper treatment is important (Catlett, 1991; Fayyad & Irani, 1992b, 1993; Quinlan, 1993, 1996; Van de Merckt, 1993; Maass, 1994; Dougherty, Kohavi & Sahami, 1995; Fulton, Kasif & Salzberg, 1995). Unfortunately, numerical value ranges are not as easy for evaluation functions to handle as nominal domains: for example, Quinlan (1996) was compelled to reconsider this problem in his well known C4.5 decision tree learner (Quinlan, 1993). The problem of choosing the interval borders and the ....
....This paper addresses the problem of finding optimal multiway partitions for numerical attribute value ranges. The only known general algorithm for finding them is the exponential brute force method, which, of course, is not a feasible solution in practice. Fulton et al. (1995) devised an algorithm for finding efficiently optimal multisplits for a class of evaluation functions. In this article we enhance and extend their method and give an exact characterization of the kind of attribute evaluation functions for which the enhanced approach is valid. Henceforth, optimal ....
Fulton, T., Kasif, S., & Salzberg, S. (1995). Efficient algorithms for finding multi-way splits for decision trees. In A. Prieditis, & S. Russell (Eds.), Machine Learning: Proceedings of the Twelfth International Conference (pp. 244--251). San Francisco, CA: Morgan Kaufmann.
....consuming if the domain at hand has a very high number of candidate cut points. Even a linear time method like binarization can require substantial amount of time. This presents a particular problem for learning algorithms that have to manipulate numerical attributes exhaustively; e.g. optimal [8, 11] or greedy [10] multisplitters in decision tree learning. The inconvenience for all attribute selection strategies alike is that the time consumption of attribute selection is dominated by the attributes that require the heaviest evaluation. Hence, even a single difficult attribute can ruin the ....
....By using a well behaved function we may concentrate on boundary points independent of whether the partition arity is limited a priori or not. If a well behaved evaluation function also has the so called cumulativity property, the general optimal partitioning algorithm of Fulton et al. [11] can be adapted to operate in time that is quadratic in the number of blocks instead of bins. 3 Boundary points as an indication of attribute relevance Let us study the well behaved evaluation function average class entropy, ACE. For a partition U i S i of the data set S, ACE is defined to ....
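The reduction from bins to blocks that this excerpt relies on can be illustrated with a short sketch. The excerpt does not define a block, so the code assumes one common reading: a block is a maximal run of adjacent bins that each contain examples of a single class, with that class being the same across the run; any bin with mixed classes stays a block of its own. The names `pure_class` and `merge_bins_into_blocks` and the class-count input layout are hypothetical.

```python
from typing import List, Optional, Sequence


def pure_class(counts: Sequence[int]) -> Optional[int]:
    """Return the index of the only class present, or None if the counts are mixed or empty."""
    nonzero = [c for c, n in enumerate(counts) if n > 0]
    return nonzero[0] if len(nonzero) == 1 else None


def merge_bins_into_blocks(bins: Sequence[Sequence[int]]) -> List[List[int]]:
    """Merge adjacent class-uniform bins of the same class (assumed block definition).

    bins[i][c] is the number of class-c examples in the i-th bin, in attribute order.
    The borders between the returned blocks are the remaining candidate cut points.
    """
    blocks: List[List[int]] = []
    for counts in bins:
        label = pure_class(counts)
        if blocks and label is not None and pure_class(blocks[-1]) == label:
            # Same single class on both sides: no boundary point between them.
            blocks[-1] = [a + b for a, b in zip(blocks[-1], counts)]
        else:
            blocks.append(list(counts))
    return blocks
```

Under this assumption, a partitioning routine that is quadratic in its number of input intervals can be run on the blocks instead of the bins, which is where the saving comes from when many neighbouring bins are class uniform.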
Fulton, T., Kasif, S., Salzberg, S.: Efficient algorithms for finding multi-way splits for decision trees. In: Machine Learning: Proc. 12th Intl. Conf. Morgan Kaufmann (1995) 244--251
....Holte ([Ho]) has shown that this simple classification method can be surprisingly effective. His experiments demonstrate the interesting fact that in many practical cases, even in instance spaces of more than 10 dimensions, there exists one dimension that fairly accurately describes the concept. [FKS] give an algorithm with O(dKn + dn log n) running time that computes the tree in T(1, K) that minimizes the error for a set of n points in d dimensions. They also give new experimental results that verify that these simple trees have good accuracy. Auer et al. ([AHM]) were the first to give a ....
....[Figure 6: An optimal 4-stripe.] (|S| = n) the tree in T(2, K) that minimizes the error can be computed in O(d^2 εKn + d^2 K^2 n + d^2 n log n) time if there exists a tree that incorrectly classifies at most ε points. In a different approach to generalize the results of [Ho] and [FKS], [DG] introduce the hypothesis set S_K of K-stripes. A K-stripe is defined by K parallel lines of arbitrary orientation (Fig. 6). The following result is shown: The stripe in S_K that minimizes the error for a set S ⊂ [0, 1]^2 (|S| = n) can be computed in O(K^2 n^2 log n) time, and in O(n ....
T. Fulton, S. Kasif and S. Salzberg. An Efficient Algorithm for Finding Multi-way Splits in Decision Trees. Proc. Machine Learning 1995.
....recursively to partition the sub intervals left and right of the cutpoint. We found this method to be quite conservative in the number of intervals used in the discretization, which led to poor performance when used for our classification algorithm. The second method was originally proposed by Fulton, Kasif and Salzberg (1995) and then extended by Elomaa and Rousu (1996). In this case, the algorithm searches for the best split with a given maximum number of intervals. The quality of a partition is evaluated by an impurity measure, and the efficiency of the search is ensured by a dynamic programming algorithm. The ....
Fulton, T., Kasif, S. & Salzberg, S. (1995). Efficient algorithms for finding multi-way splits for decision trees.
....values those most likely to make correct classification decisions. The search for such boundaries begins at a coarse level and is refined over time to find locally optimal partition boundaries. Dynamic programming methods have been applied to find interval boundaries for continuous features (Fulton, Kasif & Salzberg 1994). In such methods, each pass over the observed values of the data can identify a new partition on the continuous space based on the intervals already identified up to that point. This general framework allows for a wide variety of impurity functions to be used to measure the quality of candidate ....
Fulton, T., Kasif, S. & Salzberg, S. (1994), An efficient algorithm for finding multi-way splits for decision trees, Unpublished paper.
....have shown that very shallow trees (one level or two levels respectively) can be rather effective on many common problems if splits of this sort are allowed. This has led to discussion of algorithms to find efficient and optimal multi way splits of continuous attributes (Fayyad & Irani, 1993; Fulton et al. 1995; Elomaa & Rousu, 1996). The idea is to partition a continuous attribute into K 2 disjoint intervals, and have K daughter nodes, in such a way as to maximize the reduction in impurity. Elomaa & Rousu (1996) point out errors and inefficiencies in the earlier algorithms. Quinlan (1996b) disputes ....
Fulton, T., Kasif, S. & Salzberg, S. (1995) Efficient algorithms for finding multi-way splits for decision trees. In Proceedings of the Twelfth International Conference on Machine Learning, eds A. Prieditis & S. Russell, pp. 244--251. San Francisco: Morgan Kaufmann.
....continuous attributes into these algorithms is considered subsequently. The problem of meaningfully discretizing a continuous dimension is considered in [99, 181, 367, 263]. Fast methods for splitting a continuous dimension into more than two ranges is considered in the machine learning literature [100, 115]. An extension to ID3 [301] that distinguishes between attributes with unordered domains and attributes with linearly ordered domains is suggested in [60]. Quinlan [308] recently discussed improved ways of using continuous attributes with C4.5. 4. ....
Truxton K. Fulton, Simon Kasif, and Steven Salzberg. An efficient algorithm for finding multi-way splits for decision trees. In ML-95 [255]. to appear.
....S of S by method M Methods: Promise, Information gain Table 2. A summary of representation space contraction operators in AQ17 DCI. 4.2.2. 1 Attribute value Abstraction Using QUANT Research on attribute value abstraction is usually performed under the name attribute value discretization [7] [8]. We view this process as a form of abstraction because the result of it is a decrease of information about an object [24] By replacing original attribute values by more abstract ones the representation space is reduced, thus this process represents a representation space contraction ....
Fulton, T., Kasif, S., and Salzberg S., "Efficient Algorithms for Finding Multi-way Splits for Decision Trees", Proceedings of the Twelfth International Conference on Machine Learning, pp. 244-251., June 1995.
....the data. With this result in hand, we can make one more pass to find the optimal 4 way split. We continue for K passes (for any K n) each time incurring only linear cost. At the end, we have the optimal K way multi split for all K. This algorithm is described below in Section 3.1 (see also [FKS95]). Clearly one could model a K split with a succession of binary splits. However, an algorithm that considers binary splits one at a time will not necessarily discover a K split, even though the K way split might lead to a much smaller tree overall. We illustrate this with an example in Section 5. ....
....any of six built in impurity measures, including information gain. Because we have not yet investigated pruning methods for our k split algorithms, we also ran some comparisons against unpruned trees produced by these methods. 5.3 Artificial data We ran experiments on numerous artificial datasets [FKS95]; for space considerations, we present just two of those datasets here. Both data sets were constructed to illustrate distributions for which a multiple split capability should benefit a tree building algorithm. We use 2 D data in order to illustrate pictorially how our algorithm classifies the ....
T. Fulton, S. Kasif, and S. Salzberg. Efficient algorithms for finding multi-way splits for decision trees. In Proc. of the Twelfth Internatl. Conf. on Machine Learning, July 1995.