Oren Etzioni and Ruth Etzioni. Statistical methods for analyzing speedup learning experiments. Machine Learning. (To Appear).
....and Russell [18] point out, if the faster system has problems censored by resource limits then experiments that compare these two systems' problem-solving speeds can exhibit a horizon effect, i.e. changing the resource bounds can radically change which system appears faster. Etzioni and Etzioni [8] describe two conservative statistical tests for analyzing speedup experiments. Both of these tests ignore the magnitude of the differences between the systems' problem-solving times. Instead they either just look at the sign of the differences or they look at both the sign and the ranking of the ....
....how long it takes the system to find a solution and how long it takes to fail. With the expert search control theory the system often goes directly to the solution. Because they ignore the actual magnitudes of the differences in problem-solving times, the tests proposed by Etzioni and Etzioni [8] are not appropriate here. One standard statistical approach to handling censored data (i.e. data where the problem solver's resource consumption is terminated early because it has exceeded its resource limits) is to throw out the doubly censored data (i.e. data where both problem solvers fail ....
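The sign test mentioned in this context uses only the sign of each paired difference in solution time. The following is a minimal sketch of a textbook one-sided sign test, not the procedure from the cited paper: the function name and the sample times are hypothetical, ties are simply dropped, and censored (timed-out) runs are not given any special treatment.

```python
from math import comb

def sign_test_p_value(times_a, times_b):
    """One-sided sign test for the hypothesis that system A is faster than B.

    Only the sign of each paired difference is used; magnitudes are ignored.
    Assumption: ties are dropped and no run is censored by a resource limit.
    """
    wins = sum(a < b for a, b in zip(times_a, times_b))    # A faster
    losses = sum(a > b for a, b in zip(times_a, times_b))  # B faster
    n = wins + losses
    # Under the null hypothesis each sign is a fair coin flip, so the
    # p-value is P(X >= wins) for X ~ Binomial(n, 1/2).
    return sum(comb(n, k) for k in range(wins, n + 1)) / 2 ** n

# Hypothetical solution times (seconds) on five paired problems.
p = sign_test_p_value([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
print(p)
```

Because the test discards magnitudes, a system that wins by a wide margin on a few problems and loses narrowly on many can still fail to reach significance.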
[Article contains additional citation context not shown here]
O. Etzioni and R. Etzioni. Statistical methods for analyzing speedup learning experiments. Machine Learning, 14, March 1994.
....(Figure 9(a) and Figure 9(b) respectively) we see that TWEAK's performance deteriorates significantly around points with #est #clob 4. We also note that around these middle points both SNLP-MTC and McNONLIN-MTC perform much better than TWEAK. This conforms to Hypothesis 2. Using a signed rank test [7] on CPU times, these results are statistically significant at #clob = 4 with p-values of 0.01 and 0.008, respectively. To show that this observed performance differential is correlated with the interplay between the b_t and b_e factors, we compare the average b_t and b_e factors of the three planners ....
O Etzioni and R Etzioni. Statistical Methods for Analyzing Speedup Learning Experiments. Machine Learning, 14(3), 1994.
....of abstraction and no abstraction is less dramatic, although prodigy alpine has solved more of the problems in considerably less time than prodigy. To evaluate the statistical significance of the results in the experiment, we can apply the signed rank test as presented by Etzioni and Etzioni [18]. This test generates an upper bound on what is called the p value. The p value is the probability that conclusions drawn from the data are in error. The lower the p value, the stronger the evidence that the hypotheses are correct. In all of these comparisons the significance level is taken to be ....
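The signed rank test referred to in this context uses the ranks of the absolute differences as well as their signs. The sketch below is a textbook exact one-sided Wilcoxon signed rank test, offered as an illustration rather than as the cited paper's bounded procedure: the function name is hypothetical, and it assumes no ties among the absolute differences and no censored runs.

```python
def signed_rank_p_value(times_a, times_b):
    """Exact one-sided signed rank test that system A is faster than B.

    Assumptions: zero differences are dropped, the remaining absolute
    differences are all distinct, and no run is censored.
    """
    diffs = [b - a for a, b in zip(times_a, times_b) if a != b]
    n = len(diffs)
    # Rank the differences by absolute value (rank 1 = smallest).
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    # W+ is the sum of ranks of the differences that favor A.
    w_plus = sum(rank for rank, i in enumerate(order, start=1) if diffs[i] > 0)
    # counts[s] = number of subsets of {1..n} whose ranks sum to s.
    max_sum = n * (n + 1) // 2
    counts = [1] + [0] * max_sum
    for r in range(1, n + 1):
        for s in range(max_sum, r - 1, -1):
            counts[s] += counts[s - r]
    # Under the null hypothesis every sign assignment is equally likely.
    return sum(counts[w_plus:]) / 2 ** n

# Hypothetical times: A is slower on the problem with the smallest difference.
print(signed_rank_p_value([1, 5, 3], [3, 4, 6]))
```

The subset-sum table avoids enumerating all 2^n sign assignments, which keeps the exact computation practical for moderately sized problem sets.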
Oren Etzioni and Ruth Etzioni. Statistical methods for analyzing speedup learning experiments. Machine Learning, forthcoming.
....of these experiments. Figure 20 shows the cumulative performance graphs for the three methods in the second test set. Our results clearly show that SNLP EBL was able to outperform SNLP significantly on these problem populations (p-value for this was .24 for sign test and .00 for signed rank test [15]). A closer analysis of the second set revealed that SNLP EBL outperformed SNLP in 36 problems, resulting in a cumulative saving of 3,587 CPU sec. SNLP on the other hand outperformed SNLP EBL in 43 instances, but the cumulative difference in this case was a mere 27 sec. Similarly between ....
O. Etzioni and R. Etzioni. Statistical methods for analyzing speedup learning experiments. Machine Learning, Vol 14, 1994.
....that could be solved within the time limit is shown in Table 7. We have determined these values for reasoning from abstract cases separately for each of the three types of abstract cases. The significance of the speedup results has been investigated by using a maximally conservative sign test (Etzioni & Etzioni, 1994). Unfortunately it turned out that the speedup of hierarchical planning over pure search was not significant. We also couldn't find a significant speedup of reasoning from abstract cases when always using the worst applicable abstract case (c) over pure search. This was due to the large number of ....
....for an upper bound of the p-value of 0.001. The mentioned p-value is a standard value used in statistical hypothesis tests. It is the probability, assuming that the hypothesis does not hold, of encountering data that favors the hypothesis as much or more than the observed data in the experiment (Etzioni & Etzioni, 1994). Therefore a result is more significant if the p-value is smaller. From this analysis, we can clearly see that our two basic hypotheses are supported by our experimental data. Even if not significant we can see a moderate improvement in the problem-solving time and in the number of solved ....
[Article contains additional citation context not shown here]
Etzioni, O., & Etzioni, R. (1994). Statistical methods for analyzing speedup learning.
....the results obtained using ten well-known data sets. The two statistical hypothesis tests performed (sign test and signed rank test) fully confirm that the trees induced with the distance are smaller. For a good short exposition of statistical hypothesis tests, we refer the reader to a paper by O. Etzioni and R. Etzioni (Etzioni 1994) in the Machine Learning journal where these same two tests are applied to analyze speedup learning algorithms. For a detailed exposition on hypothesis testing see Gibbons (1971). 2. The learning data sets. In our analysis we have used ten data sets from the UCI repository ....
Etzioni, O. & Etzioni, R. (1994). Statistical Methods for Analyzing Speedup Learning Experiments. Machine Learning, 14, 333-347.
....(the project member) wrote the best program, followed closely by Multi tac and then by Subject1. The unoptimized CSP program was by far the least efficient. These conclusions regarding the relative efficiencies of the four programs can be justified statistically using the methodology proposed by Etzioni and Etzioni (1993). Specifically, any pairwise comparison using a simple sign test on the four programs' completion times (on all 100 test instances) is statistically significant with p .05. In fact, we note that for the rest of the experiments summarized in Table 1, a similar comparison between Multi tac's ....
O. Etzioni and R. Etzioni. Statistical methods for analyzing speedup learning experiments. Machine Learning, 14(3):333--347, 1993.
....9(a) and Figure 9(b) respectively) we see that TWEAK's performance deteriorates significantly around points with #est #clob 4. We also note that around these middle points both SNLP-MTC and McNONLIN-MTC perform much better than TWEAK. This conforms to Hypothesis 2. Using a signed rank test [7] on CPU times, these results are statistically significant at #clob = 4 with p-values of 0.01 and 0.008, respectively. To show that this observed performance differential is correlated with the interplay between the b_t and b_e factors, we compare the average b_t and b_e factors of the three ....
O Etzioni and R Etzioni. Statistical Methods for Analyzing Speedup Learning Experiments. Machine Learning, 14(3), 1994.
....to escape from local minima but we do not allow it to acquire new macros (see the SolveProblem procedure in Figure 2). For all the experiments described in this paper, Micro Hillary was able to solve all the problems while in testing mode; therefore, no special handling of censored data (Etzioni & Etzioni, 1994; Segre, Elkan, & Russell, 1991) was necessary. Recall that the quiescence test lets Micro Hillary stop only after it is able to solve 50 problems without getting stuck in any local minima. This significantly reduces the likelihood that it will encounter a local minimum after learning. 4.1.1 ....
Etzioni, O., & Etzioni, R. (1994). Statistical methods for analyzing speedup learning experiments.
....Elkan, and Russell (1991) a learning system that greatly improves problem-solving performance under a given resource bound may perform quite differently under a different resource bound. Some researchers suggest statistical analysis methods for assessing the significance of this factor (e.g. see Etzioni and Etzioni, 1994). In this study, however, we do not address the issue of how results might change given different resource bounds. We note that COMPOSER's statistical properties suggest that problem-solving performance should be no worse after learning, whatever the resource bound, but the performance improvement ....
Etzioni, O. & Etzioni, R. (1994). Statistical Methods for Analyzing Speedup Learning Experiments.
....sizes and the different kinds of reuse. These average numbers are computed from the 10 training and testing sets for each size. We can see a strong improvement through reusing abstract cases. Additionally, these results were analyzed with the maximally conservative sign test as proposed in [10]. It turned out that in all 20 (10 + 10) experiments, the improvement was significant (p 0.05). 4.4 Flexibility of Reuse. The purpose of this experiment was to evaluate the flexibility of the reuse. For each of the 100 cases, we evaluated how many of the problems in the remaining 99 cases could ....
O. Etzioni and R. Etzioni. Statistical methods for analyzing speedup learning. Machine Learning, 14:333--347, 1994.
....in this time is considered unsolved. (This limit includes both the time taken for retrieval and the time taken for planning.) To eliminate any bias introduced by the time bound (cf. [16]) we used the maximally conservative statistical tests for censored data, described by Etzioni and Etzioni in [2], to assess the significance of all speedups. All experiments were performed in interpreted Lucid Common Lisp running on a Sun Sparc II. 4.2. Experimental Results. Table 1 shows the cumulative statistics for solving the 30 test problems from each domain for all three planners and all three reuse ....
....the cumulative time and the percentage problems solved. IEBG strategy is also the best strategy for TOCL, but turns out to be considerably less effective than the IEBG strategy for SNLP. More interestingly, we see that the 8 Using the statistical tests for censored data advocated by Etzioni in [2], we find that the hypothesis that SNLP IEBG is faster than SNLP as well as the hypothesis that SNLP IEBG is faster than TOCL IEBG are both supported with very high significance levels by our experimental data. The p-value is bounded above by 0.000 for both the sign test and the more ....
O. Etzioni and R. Etzioni. Statistical methods for analyzing speedup learning experiments. Machine Learning. (To Appear).
....realized. Speedup on the 10 package problems increased from 4.5x to 6.6x, and on the 20 package problems from 4.5x to 7.5x. On the 50 package problems, solution coverage actually slightly decreased from 24 to 22. This decrease is most likely caused by the use of a time limit. As discussed in (Etzioni & Etzioni, 1994), using a time limit can decrease speedup, especially when solution times are close to the limit. In these problems, solution times were often very close to the time limit (of 500 seconds). After training on harder problems it's possible that solution times in several of these problems got ....
....in this domain since many of the test problems cannot be solved by the base planner under the time limit. Though employing a time limit is often necessary when running experiments such as the one presented in this dissertation, it can often underplay the true speedup achieved by a learning system (Etzioni & Etzioni, 1994). For these experiments, it is unclear how long it would take the base planner (without control knowledge) to solve all of the test problems. The planner was run for several days without a time limit and could only solve a few problems. Second, Scope is producing high quality solutions. The time ....
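The effect described in these two contexts, where a time limit understates the speedup of the learning system, can be illustrated with a small calculation. This is a sketch under stated assumptions: speedup is taken to be the ratio of cumulative solution times, a run that hits the limit is charged exactly the limit, and all numbers and the function name are hypothetical rather than taken from the cited experiments.

```python
import math

def measured_speedup(base_times, learned_times, time_limit):
    """Ratio of cumulative base time to cumulative learned time.

    Assumption: a run exceeding time_limit is censored and charged the limit.
    """
    capped_base = [min(t, time_limit) for t in base_times]
    capped_learned = [min(t, time_limit) for t in learned_times]
    return sum(capped_base) / sum(capped_learned)

# Hypothetical true solution times in seconds for two problems.
base = [100.0, 2000.0]    # planner without learned control knowledge
learned = [50.0, 400.0]   # planner with learned control knowledge

true_speedup = measured_speedup(base, learned, math.inf)  # no censoring
capped_speedup = measured_speedup(base, learned, 500.0)   # 500 s limit
print(true_speedup, capped_speedup)
```

In this example the slow base run is charged only 500 seconds, so the measured ratio falls well below the uncensored one even though the learned planner is unchanged.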
Etzioni, O., & Etzioni, R. (1994). Statistical methods for analyzing speedup learning experiments.
....average, on a population of problems (e.g. [21, 32, 52]). Average speedup is relatively easy to test experimentally; measuring problem-solving time with and without a set of control rules, on a large, randomly generated sample of problems, indicates whether the set achieves average speedup or not [13]. Unfortunately, average speedup is distribution-specific: a rule set may be effective on one problem distribution and ineffective on another. Furthermore, average speedup is a weak notion. Problem solving may remain intractable, due to its exponential nature, despite a sizable average speedup. ....
....specifically designed to address this problem, showing how the relative performance of the systems scales on larger and larger CPU time bounds. Segre et al. also argue that the experimenter's choice of time bound can bias the results of the experiment. To address this problem, Etzioni and Etzioni [13] develop statistical hypothesis tests designed to analyze speedup learning data that is truncated due to the use of time bounds. The tests show that the differences between prodigy static and prodigy ebl are statistically significant. The graphs in Figure 9 show that prodigy static is faster ....
Oren Etzioni and Ruth Etzioni. Statistical methods for analyzing speedup learning experiments. Submitted for publication., 1992.
No context found.
5. Etzioni, O. and Etzioni, R. Statistical methods for analyzing speedup learning experiments. To appear in Machine Learning. 6. Noreen, E.W. Computer Intensive Methods for Testing Hypotheses: An Introduction. John Wiley and Sons, New York, NY
....of time doing redundant sensing. The version of xii without LCW completed only 8 of the goals before hitting a fixed time bound of 1000 CPU seconds. In contrast, the version with LCW completed 94 of the goals in the allotted time. The hypothesis tests for analyzing censored data, described in [15], demonstrate that our results are statistically significant. 5 Future Work. Although we have relaxed the assumption of complete information, we still assume correct information. Since we want our agents to cope with exogenous events, we are in the process of relaxing this assumption as well. We ....
Oren Etzioni and Ruth Etzioni. Statistical methods for analyzing speedup learning experiments. Machine Learning, 14, March 1994. Technical note.
No context found.
Oren Etzioni and Ruth Etzioni. Statistical methods for analyzing speedup learning experiments. Machine Learning. (To Appear).
No context found.
Oren Etzioni and Ruth Etzioni. Statistical methods for analyzing speedup learning experiments. Machine Learning, 1992. Technical note, to appear.