J.N. Hooker. Testing Heuristics: We Have it All Wrong. Journal of Heuristics, 1:33--42, 1995.
....confidence intervals) on input classes that should induce uniform behavior. Ultimately, the role of the skeptic is to be subsumed in a new methodology of experimental design and performance evaluation of combinatorial algorithms. The case for such methods has already been succinctly articulated in [15, 16] as well as demonstrated experimentally in our earlier work [17, 18, 19, 20]. We first contrast the proposed approach with the up-to-date benchmarking experiments on SAT solvers that are being reported on the Web [3], the figure of merit being time-to-solve on a particular PC. A sample of such a ....
.... SAT solvers has been evaluated experimentally either in terms of randomly generated instances of SAT problems, e.g. [24, 25], or structured instances, such as the instances from the DIMACS set [21] or the SATPLAN set [22]. Merits of either approach are subject to ongoing critique and examination [15, 16, 26, 27, 28]. The traditional way to report results of SAT solvers is the time-to-solve performance on single instances of a reference formula in conjunctive normal form (cnf). Table 1 represents the traditional organization of such an experimental report. Results of more comprehensive experiments, repeated ....
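The context above argues for reporting statistics such as confidence intervals over an instance class, rather than time-to-solve on single instances. As a minimal sketch of that kind of summary, assuming a hypothetical list of per-instance costs (for example, search steps) drawn from one instance class and a normal approximation, the helper `mean_confidence_interval` (a name introduced here for illustration) computes a mean with an approximate 95% interval:

```python
import math
import statistics


def mean_confidence_interval(costs, z=1.96):
    """Return (mean, lower, upper) for a normal-approximation confidence interval.

    `costs` is a hypothetical list of per-instance measurements (e.g. search
    steps) from a single instance class; z=1.96 gives roughly 95% coverage.
    """
    if len(costs) < 2:
        raise ValueError("need at least two observations")
    mean = statistics.mean(costs)
    # Standard error of the mean, based on the sample standard deviation.
    sem = statistics.stdev(costs) / math.sqrt(len(costs))
    return mean, mean - z * sem, mean + z * sem


# Illustrative numbers only, not data from any cited study.
print(mean_confidence_interval([120, 95, 130, 110, 105]))
```

For small samples a Student t quantile in place of `z` would be more appropriate; the normal approximation is used here only to stay within the standard library.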
J. Hooker. Testing heuristics: We have it all wrong. Journal of Heuristics, 1:33--42, 1996.
....been, the workhorses of industrial computing, there is no question about the ultimate significance of this program for dealing with intractability. There has recently been a revival of interest in obtaining systematic empirical performance evaluations of heuristic algorithms for hard problems [BGKRS95, Hoo95, JM93, JT96]. The main question is whether this can actually be considered a program of computer science theory in the sense intended for this list. It isn't fundamentally at present a mathematical research program, though theorists have sometimes contributed to the design of heuristic algorithms. ....
J. N. Hooker, "Testing Heuristics: We Have It All Wrong," J. Heuristics 1 (1995), 33--42.
....some form of comparative testing can expect their work to be ignored. The compulsion to develop a new method has resulted in the literature being full of new algorithms, most of which are never used or analyzed by anyone other than the researchers who created them. Hooker [6] discusses the evils of competitive testing and points out the difficulty of making fair comparisons of algorithm performance. Implementation details can significantly impact algorithm performance, as can the values selected for various tuning parameters. Some algorithms have been refined for ....
J.N. Hooker. Testing Heuristics: We Have it All Wrong. Journal of Heuristics, 1:33--42, 1995.
....been, the workhorses of industrial computing, there is no question about the ultimate significance of this program for dealing with intractability. There has recently been a revival of interest in obtaining systematic empirical performance evaluations of heuristic algorithms for hard problems [BGKRS95, Hoo95, JM93, JT96]. There have been vigorous developments of new ideas for designing heuristic algorithms, particularly new ideas employing randomization in various ways. These approaches frequently have colorful and extravagant names based on far-fetched analogies in other sciences, such as simulated annealing, ....
J. N. Hooker, "Testing Heuristics: We Have It All Wrong," J. Heuristics 1 (1995), 33--42.
....of algorithms. In a number of computer science fields, we need more and more knowledge about the behavior of the algorithms we design and about the characteristics of benchmarks. This competition is a step toward better, and crucially needed, empirical knowledge of SAT algorithms and benchmarks [Hoo94, Hoo96]. The aim of this paper is to report what the organizers learned during this competition (about solvers, benchmarks and the competition itself) and to publish enough data to allow readers to form their own opinion about the results. As mentioned, in the first half of the last decade, some ....
J. N. Hooker. Testing heuristics: We have it all wrong. Journal of Heuristics, pages 33--42, 1996.
....of algorithm features influence algorithm behavior needs a careful design of experiments, because the brute-force approach of trying all combinations is not practicable. We will not go into general principles of experiment design; good treatments of this subject can be found, for instance, in [8, 9]. Here we only note that in this case the weaponry of algorithm performance measures is extended with indicators of algorithm behavior. The leading question behind such indicators is "What happens during execution?" rather than "How far can we get?". Commonly used indicators include population ....
J. N. Hooker, "Testing heuristics: We have it all wrong," Journal of Heuristics, vol. 1, no. 1, pp. 33--42, 1995.
....confidence intervals) on input classes that should induce uniform behavior. Ultimately, the role of the skeptic is to be subsumed in a new methodology of experimental design and performance evaluation of combinatorial algorithms. The case for such methods has already been succinctly articulated in [15, 16] as well as demonstrated experimentally in our earlier work [17, 18, 19, 20]. We first contrast the proposed approach with the up-to-date benchmarking experiments with SAT solvers that are being reported on the Web [3], the figure of merit being time-to-solve on a particular PC. A sample of such ....
.... has been evaluated experimentally either in terms of randomly generated instances of SAT problems, e.g. [24, 25], or structured instances, such as the instances from the DIMACS set [21] or the SATPLAN set [22]. Merits of either approach are subject to ongoing critique and examination [15, 16, 26, 27, 28]. The traditional way to report results of SAT solvers is the time-to-solve performance on single instances of a reference formula in conjunctive normal form (cnf). Table 1 represents the traditional organization of such an experimental report. Results of more comprehensive ....
J. Hooker. Testing heuristics: We have it all wrong. Journal of Heuristics, 1:33--42, 1996.
.... both discrete- and continuous-based greedy algorithms have been proposed, see for example [10, 15, 28] and the reviews in [16] and [27]. MAX SAT is therefore a paradigmatic problem for the algorithmic engineering and scientific testing and tuning effort advocated, for example, by [2, 5] and [22]. The core of this paper consists of the design of a new heuristic algorithm guided by a series of focused experiments, where individual factors are isolated and studied through a statistical analysis. Because of this design process the algorithm achieves average results that are significantly ....
.... Algorithms (GA) [21], Tabu Search (TS) [12]. Now, the pitfalls of competitive testing, where the participants select options and tune parameters in a non-scientific way (i.e. poorly documented and hardly reproducible), are being recognized in the heuristics research community; see for example [22], where a more scientific approach of controlled experimentation is advocated. It is often the case that the user is a crucial learning component of a heuristic algorithm, whose eventual success on a problem should be credited more to human smartness than to the algorithm's intrinsic ....
J. Hooker, "Testing Heuristics: We Have it All Wrong," Journal of Heuristics 1(1) (1996), 33--42.
....of search space topologies should enable intelligent (as opposed to the current ad hoc methods) tuning of problem-sensitive algorithm parameters. Taken together, we feel development of such methodologies is an important next step toward the development of a Science of Algorithms advocated in [Hoo95]. Finally, we acknowledge there are many theoretical questions regarding the feasibility of developing such methodologies. Statistical measures of search space topologies are necessarily approximate; the open question is how much information is actually lost during the process of sampling and ....
J.N. Hooker. Testing heuristics: We have it all wrong. Journal of Heuristics, 1(1):33--42, 1995.
.... either in terms of randomly generated instances of SAT problems, e.g. [28, 11], or structured instances, such as the instances from the DIMACS set [30] or the SATPLAN set [23]. Merits of either approach are subject to ongoing critique and examination [10, 24, 26] in particular, and [21, 22, 25] in general. The papers [21] and [22] succinctly articulate the case for careful experimental design, an approach adopted for the experiments with SAT problems in this paper. The experimental design methodology presumes availability of well-defined classes of experimental subjects. In our ....
....instances of SAT problems, e.g. [28, 11], or structured instances, such as the instances from the DIMACS set [30] or the SATPLAN set [23]. Merits of either approach are subject to ongoing critique and examination [10, 24, 26] in particular, and [21, 22, 25] in general. The papers [21] and [22] succinctly articulate the case for careful experimental design, an approach adopted for the experiments with SAT problems in this paper. The experimental design methodology presumes availability of well-defined classes of experimental subjects. In our earlier work we identified and created ....
J. Hooker, Testing heuristics: We have it all wrong, Journal of Heuristics 1 (1996), pp. 33--42.
....(of models) of the considered family for the following reason. To use real-life instances, we would have to first pick some models within the considered family, but we would then be unable to justify why these models were picked rather than some others. The purpose of our experiments [7] is to generate statistics that guide us in designing a family-specific default heuristic for a solver, which must be able to handle random instances over that entire family. We do not aim at a heuristic for a specific model, which would have to be able to handle (only) real-life instances of (only) ....
J.N. Hooker. Testing heuristics: We have it all wrong. Journal of Heuristics 1:33-42, 1996.
....models) of the considered family for the following reason. To use real-life instances, we would have had to first pick some models within the considered family, but we would then have been unable to justify why these models were picked rather than some others. The purpose of our experiments [10] was to generate statistics that guide us in our companion work [7, 11], where we aim at a family-specific default heuristic for a solver, which must be able to handle random instances over that entire family. We do not aim at a heuristic for a specific model, which would have to be able to handle ....
J.N. Hooker. Testing heuristics: We have it all wrong. J. of Heuristics 1:33-42, 1996.
....search steps, i.e. we get run-length distributions (RLDs) instead of run-time distributions. Note that obtaining run-length distributions for single instances does not involve significantly higher computation times than getting a stable estimate of the mean performance of an algorithm. Hooker [12,13] criticises the fact that the empirical analysis of algorithms usually remains at the stage of simply collecting data and argues that, analogous to empirical methodology used in other sciences, one should furthermore attempt to formulate hypotheses based on this data which, in turn, can be experimentally ....
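A run-length distribution records, for a single instance, the fraction of independent runs of a randomized algorithm that finish within k search steps. A minimal sketch, assuming every run eventually solved the instance and using the hypothetical helper names `empirical_rld` and `solve_probability`:

```python
from bisect import bisect_right


def empirical_rld(run_lengths):
    """Empirical run-length distribution as a list of (steps, fraction solved).

    `run_lengths` holds the number of search steps each independent run of a
    randomized algorithm needed to solve one instance (hypothetical data;
    every run is assumed to have succeeded).
    """
    ordered = sorted(run_lengths)
    n = len(ordered)
    # For each distinct step count k, the fraction of runs with length <= k.
    return [(k, bisect_right(ordered, k) / n) for k in sorted(set(ordered))]


def solve_probability(run_lengths, max_steps):
    """Estimated probability that a single run succeeds within `max_steps` steps."""
    ordered = sorted(run_lengths)
    return bisect_right(ordered, max_steps) / len(ordered)


runs = [40, 15, 90, 15, 60]  # illustrative only
print(empirical_rld(runs))  # [(15, 0.4), (40, 0.6), (60, 0.8), (90, 1.0)]
print(solve_probability(runs, 50))  # 0.6
```

Unsolved (censored) runs would need separate handling, for instance by counting them in `n` but never as reaching any step bound.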
J.N. Hooker. Testing Heuristics: We Have It All Wrong. Journal of Heuristics, pages 33--42, 1996.
.... while not so good performance with tableaux proof systems, and for some problem sets the situation is reversed, cf. [22, 24, 27]. However, by competitive testing alone it is nearly impossible to identify the major factors having a positive or negative influence on the performance of a theorem prover [20]. As long as the theorem provers which are being compared follow different search strategies, this difference is likely to have a dominating effect on the overall performance. This has two consequences. One, we can say little about the other factors influencing the performance, for example, fundamental ....
Hooker, J. N.: 1996, `Testing heuristics: We have it all wrong'. Journal of Heuristics 1, 33-42.
....can vary a lot due to implementation differences in hardware, e.g. CPU speed, as well as software, e.g. choice of implementation language. Hooker has argued that the use of CPU time as a performance indicator for algorithmic experimentation may be suitable for development, but not for research [9]. We believe that hardware-independent performance measures are quite important for conducting large-scale empirical CSP research because all available hardware can be utilised for experimentation in spite of their underlying differences. 3.1. General results Figures 2 and 3 display the ....
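As an illustration of a hardware-independent measure, a backtracking search can count consistency checks instead of timing itself. This is a minimal sketch that uses n-queens as a hypothetical stand-in for a CSP benchmark (not the problems in the cited study); the function `solve_n_queens` is introduced here for illustration:

```python
def solve_n_queens(n):
    """Backtracking n-queens search that also counts consistency checks.

    Returns (solution, checks), where solution[r] is the column of the queen
    in row r, or None if no solution exists. The check count does not depend
    on CPU speed, unlike CPU time.
    """
    checks = 0
    cols = []  # columns of the queens placed so far, one per row

    def consistent(row, col):
        nonlocal checks
        for r, c in enumerate(cols):
            checks += 1  # one pairwise constraint check
            if c == col or abs(c - col) == row - r:
                return False
        return True

    def place(row):
        if row == n:
            return True
        for col in range(n):
            if consistent(row, col):
                cols.append(col)
                if place(row + 1):
                    return True
                cols.pop()  # undo and try the next column
        return False

    found = place(0)
    return (list(cols) if found else None), checks


print(solve_n_queens(4))
```

Because the search is deterministic, the check count for a given n is identical on any machine, which is what makes it comparable across hardware; it does not, however, reflect differences in the cost of a single check between implementations.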
Hooker, J.N. Testing Heuristics: We have It All Wrong. Journal of Heuristics 1, 33-42, 1995.
....either because they represent seemingly typical practical problems or (in most cases) because they have been discussed in the GA literature as being especially challenging. In general, finding a representative sample of decision models is an unsolved problem, and its pursuit may be futile [5]. Certainly, what needs to be done includes (a) examining a much larger body of models, and (b) taking a carefully designed approach to computational experimentation with the models we do examine [1, 5]. 2. What happens when much larger problems and models are addressed? Will CLA ideas ....
.... a representative sample of decision models is an unsolved problem, and its pursuit may be futile [5]. Certainly, what needs to be done includes (a) examining a much larger body of models, and (b) taking a carefully designed approach to computational experimentation with the models we do examine [1, 5]. 2. What happens when much larger problems and models are addressed? Will CLA ideas effectively scale up? Again, substantial and extensive computational experiments are indicated, and experiments performed but not reported here cohere with results we do report. In addition, we note that ....
Hooker, J.N., "Testing Heuristics: We Have It All Wrong," Journal of Heuristics, 1, no. 1, Fall 1995, pp. 33--42.
....control schemes should not be disregarded prematurely, on the basis of effectiveness alone. Such a focus turns the process of improving algorithms into a competitive race which concentrates on beating other algorithms rather than on insight into why some algorithms perform better than others (Hooker 1995). Given the long history of increasingly effective heuristics for project scheduling problems, evaluating algorithms on these clearly puts new algorithms at a disadvantage if the outcome is considered in terms of effectiveness alone. One has to bear in mind that the analytical and experimental ....
HOOKER, J.N. (1995), "Testing heuristics: We have it all wrong", Journal of Heuristics 1, pp. 33-42.
....We applied exactly the same branch-and-bound algorithm to the original problem (without decomposition) and compared the computation time to that of the Benders algorithm. This provided a controlled experiment in which the effect of decomposition could be isolated, a general approach advocated in [24, 26]. Because the master problem does not have the traditional inequality constraints, we solved it with a modified branch-and-bound algorithm. We branched on the y_j's with fractional values as well as on the possible values of the right-hand side of each Benders cut (39). More precisely, for k = 1, ....
Hooker, J. N., Testing heuristics: We have it all wrong, Journal of Heuristics xx. 33
No context found.
J.N. Hooker. Testing Heuristics: We Have it All Wrong. Journal of Heuristics, 1:33--42, 1995.
No context found.
J. Hooker. Testing heuristics: We have it all wrong. Journal of Heuristics, 1(1):33--42, 1996.
No context found.
J. Hooker. Testing heuristics: We have it all wrong. Journal of Heuristics, 1:33--42, 1995.
No context found.
J. N. Hooker, "Testing Heuristics: We Have It All Wrong," Heuristics, vol. 1, pp. 33--42, 1996.
No context found.
J. Hooker. Testing heuristics: We have it all wrong. Journal of Heuristics, 1(1):33--42, 1996.
No context found.
J. N. Hooker. Testing heuristics: We have it all wrong. Journal of Heuristics, 1:33--42, 1995.
No context found.
Hooker, J. N. Testing heuristics: We have it all wrong. Journal of Heuristics, 1:33--42, 1995.
CiteSeer.IST - Copyright Penn State and NEC