L.J. Frey and D.H. Fisher. Modeling decision tree performance with the power law. In Proceedings of the Seventh International Workshop on Artificial Intelligence and Statistics. Morgan Kaufmann, 1999.
.... curve, if available, can be very useful: (1) to predict what learning accuracy is reachable for a given amount of data, (2) to estimate how much data should be used to achieve a desired learning accuracy, and (3) to decide when to stop a learning process according to the predictable gain in learning accuracy [5]. Such uses can find many practical applications among data mining practitioners, who frequently deal with large data sets. For large data sets, it is preferable to fit learning points that cover only a small amount of all available data. Mathematically, such a fitting can be implemented by ....
....data mining. Then which model is the desired
1.2 Related Research
The common feature in the shape of learning curves suggests the possibility of fitting them with a common model whose parameters are tuned to fit the exact shapes. The well-known power law is the model most often suggested for this job [4, 5, 8]. Frey and Fisher [5] use $y = a \times x^{b}$ to model C4.5 decision tree performance, where y is the error rate of learning, x is the training data size, and a 0 and b 0 are parameters to be fitted. After comparing the two-parameter power law with three other two-parameter models, namely linear model, ....
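A brief worked form of the two-parameter power law discussed in this excerpt, as a sketch under stated assumptions: the model is read as y = a x^b with a > 0, x > 0, y > 0 and b nonzero. The sign of b is not fixed here; a decreasing error curve corresponds to a negative exponent. Taking logarithms makes the model linear in log x, so a and b can be estimated by ordinary linear regression, and solving for x gives the training size at which the model predicts a target error. The symbols x* and y* are introduced here only for illustration.

```latex
\begin{align*}
y &= a\,x^{b} \\
\log y &= \log a + b \log x \\
x^{*} &= \left(\frac{y^{*}}{a}\right)^{1/b}
\end{align*}
```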
[Article contains additional citation context not shown here]
L.J. Frey and D.H. Fisher. Modeling decision tree performance with the power law. In Proceedings of the Seventh International Workshop on Artificial Intelligence and Statistics. Morgan Kaufmann, 1999.
....consider it as the process of increasing the speed of data mining. Although large data sets are necessary for reliable results, large databases are not necessarily advantageous, for the following reasons:
• Not all data is informative [14, 19, 15].
• There is a high degree of redundancy in the databases [9, 11].
• Experimental studies on the entire database are expensive [3].
This is the basic problem in genebank collections and in the drug industry. For example, to conduct genetic studies on a single gene we need many resources, and it is impossible to conduct studies on all of the genes. So ....
L.J. Frey and D. H. Fisher Jr. Modeling decision tree performance with the power law. Morgan Kaufmann, San Francisco, CA, 1999.
....is required to do so. In Section 7 we discuss a similar technique for determining the minimum number of training examples sufficient for satisfactory learning, namely, progressively sampling larger subsets until model performance no longer improves (John and Langley 1996; Frey and Fisher 1999; Provost, Jensen, and Oates 1999).
6.2.2. Select a subset of the features
So far, our discussion of data partitioning has focused on selecting a subset of the examples. Let us now turn to the problem of selecting a subset of features. It is important to consider the symmetry with selecting ....
....method. Considering that the run time complexity of inductive algorithms is at best linear in the number of examples, and often worse, relatively inexpensive experiments can be conducted on small samples in order to estimate the number of examples that are actually needed (John and Langley 1996; Frey and Fisher 1999; Provost, Jensen, and Oates 1999). In cases where the number of examples needed is much smaller than the number available, such procedures can provide substantial practical speedups. Subsets of the examples should be sampled, using stratified sampling when one class dominates strongly. Subsets ....
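The stratified sampling mentioned in this excerpt can be sketched as follows. This is a minimal illustration, not the procedure of the cited authors: the function name `stratified_sample`, the `label_of` callback, the proportional allocation rule, and the toy data are all assumptions made for the example. Because of rounding, the returned sample can differ slightly in size from the requested `n`.

```python
import random
from collections import Counter, defaultdict
from typing import Any, Callable, Hashable, List, Sequence


def stratified_sample(
    examples: Sequence[Any],
    label_of: Callable[[Any], Hashable],
    n: int,
    seed: int = 0,
) -> List[Any]:
    """Draw roughly n examples, keeping each class's share of the data.

    Hypothetical helper: allocation is proportional to class frequency,
    and every class that occurs keeps at least one example.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for ex in examples:
        by_class[label_of(ex)].append(ex)

    total = len(examples)
    sample: List[Any] = []
    for members in by_class.values():
        k = round(n * len(members) / total)   # proportional share
        k = min(max(k, 1), len(members))      # at least 1, at most all
        sample.extend(rng.sample(members, k))  # without replacement
    return sample


# Toy data (assumed): one class dominates strongly, 950 versus 50.
toy = [("majority", i) for i in range(950)] + [("minority", i) for i in range(50)]
subset = stratified_sample(toy, label_of=lambda ex: ex[0], n=100)
class_counts = Counter(label for label, _ in subset)
```

With simple random sampling, a small subset could contain very few minority examples by chance; the proportional allocation above keeps the class ratio of the full data.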
Frey, L. J. and D. H. Fisher (1999). Modeling decision tree performance with the power law. In D. Heckerman and J. Whittaker (Eds.), Proceedings of the Seventh International Workshop on Artificial Intelligence and Statistics. San Francisco, CA: Morgan Kaufmann.
....mechanics [7, 19] have shown that sudden increases in accuracy are possible, particularly on small samples. However, empirical studies of the application of standard induction algorithms to large data sets (those of relevance to this paper) have shown learning curves to be well behaved [3, 4, 6, 12, 13]. In addition, practical progressive sampling demands only that learning curves are well behaved at the level
    Compute schedule S = {n_0, n_1, n_2, ..., n_k} of sample sizes
    n ← n_0
    M ← model induced from n instances
    while not converged
        recompute S if necessary
        n ← next element of S
        ....
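The progressive sampling loop quoted in this excerpt can be sketched in Python as below. This is a hedged illustration only: the excerpt does not define the convergence test or the schedule, so the geometric schedule, the accuracy-gain threshold `tol`, the callbacks `induce` and `evaluate`, and the toy learner are all assumptions. The "recompute S if necessary" step is omitted, and the data are assumed to be in random order so that a prefix is a random sample.

```python
import math
from typing import Any, Callable, List, Sequence, Tuple


def geometric_schedule(n0: int, n_max: int, factor: int = 2) -> List[int]:
    """Sample sizes n0, n0*factor, ... ending with n_max (assumed schedule)."""
    if n0 < 1 or factor < 2:
        raise ValueError("need n0 >= 1 and factor >= 2")
    sizes = []
    n = n0
    while n < n_max:
        sizes.append(n)
        n *= factor
    sizes.append(n_max)
    return sizes


def progressive_sample(
    data: Sequence[Any],
    induce: Callable[[Sequence[Any]], Any],
    evaluate: Callable[[Any], float],
    schedule: Sequence[int],
    tol: float = 0.005,
) -> Tuple[Any, int]:
    """Induce models on growing samples until the accuracy gain is <= tol.

    Returns the last model and the sample size it was induced from.
    The stopping rule is an assumption, not the cited authors' test.
    """
    model, used, prev_acc = None, 0, None
    for n in schedule:
        model = induce(data[:n])   # M <- model induced from n instances
        acc = evaluate(model)
        used = n
        if prev_acc is not None and acc - prev_acc <= tol:
            break                  # treated as "converged"
        prev_acc = acc
    return model, used


# Toy learner (assumed): the "model" is just the sample size, and its
# accuracy follows a smooth curve that flattens as the sample grows.
data = list(range(100_000))
model, used = progressive_sample(
    data,
    induce=len,
    evaluate=lambda size: 1.0 - 0.5 / math.sqrt(size),
    schedule=geometric_schedule(100, len(data)),
)
```

In this toy run the loop stops well before the full data set is used, which is the speedup the excerpt is concerned with; with a real learner, `evaluate` would measure accuracy on held-out data.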
....fraught with difficulties. Actual learning curves often require a complex functional form to estimate accurately. The curve shown in figure 1 has three regions of behavior: a primary rise, a secondary rise, and a plateau. Most simple functional forms (e.g. the power laws used by Frey and Fisher [4] and by John and Langley [8]) generally cannot capture all three regions of behavior, often causing the estimated curves to converge too quickly or never to converge. Estimating convergence is generally more challenging than fitting earlier parts of the curve, and even fairly small errors can ....
[Article contains additional citation context not shown here]
Frey, L. J., and Fisher, D. H. Modeling decision tree performance with the power law. In Proceedings of the Seventh International Workshop on Artificial Intelligence and Statistics (1999), D. Heckerman and J. Whittaker, Eds., San Francisco, CA: Morgan Kaufmann.
....the large quantities available may not be sufficient to capture all the relevant structure. It would thus be useful to have methods, even if heuristic in nature, to estimate early on how much data will be needed. Examples of research in this direction are the fitting of power laws to learning curves (Frey & Fisher, 1999) and statistical tests on the slope of these curves (Provost, Jensen, & Oates, 1999). A complementary approach is to attempt to estimate the Bayes rate, i.e. the error rate at which even an infinite capacity learner will necessarily asymptote (Dasarathy, 1991; Cortes, 1995; Tumer & Ghosh, 1996) ....
Frey, L. J., & Fisher, D. H. (1999). Modeling decision tree performance with the power law. In Proceedings of Uncertainty '99: The Seventh International Workshop on Artificial Intelligence and Statistics (pp. 59-65). Fort Lauderdale, FL: Morgan Kaufmann.
....incorrect. However, progressive sampling algorithms can model the convergence probability dynamically. For example, a progressive sampling algorithm might assume that the accuracy of a particular algorithm on a particular data set can be modeled by a power law. A simple power law is shown by Frey and Fisher (1999) to model learning curves better than a variety of alternatives, and a similar approach is used by John and Langley (1996) to determine convergence (see Section 6). Such a modeling approach could allow a progressive sampling procedure to adaptively improve the efficiency of its schedule during ....
.... empirical estimates of the complexity of C4.5 on the led and waveform data sets (used below) and found the former to be $O(n^{1.22})$ and the latter to be $O(n^{1.37})$ (footnote 4). As with learning curve estimation, progressive sampling can dynamically determine the ....
Footnote 4: We follow a similar procedure to (Frey & Fisher, 1999) and assumed that the running time could be modeled by $y = a n^{b}$, gathered samples of CPU time required to build trees on 1,000 to 100,000 instances in increments of 1,000, then took the log of both the CPU time and the number of instances, ran linear regression, and used the resulting slope ....
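The log-log regression described in this footnote can be sketched as follows. It is a minimal sketch, not the cited authors' code: the function name `fit_power_law` and the synthetic timings are assumptions, and the timings are generated from an exact power law purely so that the recovered slope can be checked. Real CPU-time measurements would be noisy.

```python
import math
from typing import Sequence, Tuple


def fit_power_law(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Fit y = a * x**b by least squares on (log x, log y).

    Returns (a, b); b is the slope of the regression line in log-log
    space and a is exp(intercept). All xs and ys must be positive.
    """
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    sxx = sum((u - mean_x) ** 2 for u in log_x)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(log_x, log_y))
    b = sxy / sxx                         # slope: the estimated exponent
    a = math.exp(mean_y - b * mean_x)     # intercept mapped back from logs
    return a, b


# Synthetic timings (assumed): 1,000 to 100,000 instances in steps of
# 1,000, with running time following 2e-5 * n**1.22 exactly.
sizes = list(range(1_000, 100_001, 1_000))
cpu_seconds = [2e-5 * n ** 1.22 for n in sizes]
a_hat, b_hat = fit_power_law(sizes, cpu_seconds)
```

The same function applies to error rates versus training size; only the interpretation of the slope changes.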
[Article contains additional citation context not shown here]
Frey, L. J., & Fisher, D. H. (1999). Modeling decision tree performance with the power law. In Heckerman, D., & Whittaker, J. (Eds.), Proceedings of the Seventh International Workshop on Artificial Intelligence and Statistics. San Francisco, CA: Morgan Kaufmann.