Results 1 - 10 of 135
Mutation-driven generation of unit tests and oracles
- IEEE Transactions on Software Engineering (TSE)
Cited by 72 (26 self)
To assess the quality of test suites, mutation analysis seeds artificial defects (mutations) into programs; a non-detected mutation indicates a weakness in the test suite. We present an automated approach to generate unit tests that detect these mutations for object-oriented classes. This has two advantages: First, the resulting test suite is optimized towards finding defects rather than covering code. Second, the state change caused by mutations induces oracles that precisely detect the mutants. Evaluated on two open source libraries, our µtest prototype generates test suites that find significantly more seeded defects than the original manually written test suites.
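The mutation analysis loop described in this abstract can be illustrated with a minimal sketch. It is not the µtest implementation: the function under test, the three mutants, and the names `make_max`, `killed`, and `mutation_score` are all hypothetical, and the oracle here simply compares each mutant's return value against the original's.

```python
import operator

def make_max(compare):
    """Build a 'max of two' function; `compare` is the operator a mutation replaces."""
    def max_of(a, b):
        return a if compare(a, b) else b
    return max_of

original = make_max(operator.gt)

# Each mutant replaces the relational operator (a classic mutation operator).
mutants = {
    "gt->lt": make_max(operator.lt),
    "gt->ge": make_max(operator.ge),  # equivalent mutant: same output for all ints
    "gt->eq": make_max(operator.eq),
}

def killed(mutant, tests):
    """A mutant is killed if some test input makes its output differ from the original's."""
    return any(mutant(a, b) != original(a, b) for a, b in tests)

def mutation_score(tests):
    """Fraction of mutants detected by the given test inputs."""
    return sum(killed(m, tests) for m in mutants.values()) / len(mutants)

weak_suite = [(1, 1)]
strong_suite = [(1, 1), (2, 1), (1, 2)]
```

With these inputs, `weak_suite` kills no mutant (score 0), while `strong_suite` kills two of three (score 2/3). The surviving `gt->ge` mutant is equivalent to the original, so no test can detect it.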
Symbolic Execution for Software Testing in Practice – Preliminary Assessment
Cited by 39 (6 self)
We present results for the “Impact Project Focus Area” on the topic of symbolic execution as used in software testing. Symbolic execution is a program analysis technique introduced in the 70s that has received renewed interest in recent years, due to algorithmic advances and increased availability of computational power and constraint solving technology. We review classical symbolic execution and some modern extensions such as generalized symbolic execution and dynamic test generation. We also give a preliminary assessment of the use in academia, research labs, and industry.
Whole test suite generation
- IEEE Transactions on Software Engineering
Cited by 30 (14 self)
Not all bugs lead to program crashes, and not always is there a formal specification to check the correctness of a software test’s outcome. A common scenario in software testing is therefore that test data is generated, and a tester manually adds test oracles. As this is a difficult task, it is important to produce small yet representative test sets, and this representativeness is typically measured using code coverage. There is, however, a fundamental problem with the common approach of targeting one coverage goal at a time: Coverage goals are not independent, not equally difficult, and sometimes infeasible – the result of test generation is therefore dependent on the order of coverage goals and how many of them are feasible. To overcome this problem, we propose a novel paradigm in which whole test suites are evolved with the aim of covering all coverage goals at the same time, while keeping the total size as small as possible. This approach has several advantages, as for example its effectiveness is not affected by the number of infeasible targets in the code. We have implemented this novel approach in the EVOSUITE tool, and compared it to the common approach of addressing one goal at a time. Evaluated on open source libraries and an industrial case study for a total of 1,741 classes, we show that EVOSUITE achieved up to 188 times the branch coverage of a traditional approach targeting single branches, with up to 62% smaller test suites. Index Terms: Search based software engineering, length, branch coverage, genetic algorithm, infeasible goal, collateral coverage
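The idea of scoring a whole suite against all coverage goals at once, with size as a secondary objective, can be sketched as follows. This is a deliberately simplified illustration, not EVOSUITE's algorithm: it counts uncovered goals instead of using branch distances, it uses a (1+1) evolutionary loop rather than a genetic algorithm with crossover, and `classify`, `GOALS`, `fitness`, `mutate`, and `evolve` are hypothetical names.

```python
import random

def classify(x, covered):
    """Toy function under test, hand-instrumented to record which branches run."""
    if x > 100:
        covered.add("x>100:true")
    else:
        covered.add("x>100:false")
    if x % 2 == 0:
        covered.add("even:true")
    else:
        covered.add("even:false")
    if x > 100 and x < 50:  # infeasible: no input can take this branch
        covered.add("infeasible")

GOALS = {"x>100:true", "x>100:false", "even:true", "even:false", "infeasible"}

def fitness(suite):
    """Lower is better: (uncovered goals, suite size), compared lexicographically."""
    covered = set()
    for x in suite:
        classify(x, covered)
    return (len(GOALS - covered), len(suite))

def mutate(suite, rng):
    """Add, remove, or change one test input in a copy of the suite."""
    child = list(suite)
    op = rng.choice(["add", "remove", "change"])
    if op == "add" or not child:
        child.append(rng.randint(0, 200))
    elif op == "remove":
        child.pop(rng.randrange(len(child)))
    else:
        child[rng.randrange(len(child))] = rng.randint(0, 200)
    return child

def evolve(generations=500, seed=0):
    """(1+1) evolutionary loop over whole suites; keeps the child if it is no worse."""
    rng = random.Random(seed)
    best = [rng.randint(0, 200)]
    for _ in range(generations):
        child = mutate(best, rng)
        if fitness(child) <= fitness(best):
            best = child
    return best
```

Because the infeasible goal is just one more term in the suite-level fitness, the search never stalls on it, whereas an approach that targets one goal at a time could spend its whole budget on that goal. For example, `fitness([150, 7])` is `(1, 2)`: only the infeasible goal remains uncovered, using two tests.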
Strong higher order mutation-based test data generation
- In 8th European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE ’11)
Cited by 28 (8 self)
This paper introduces SHOM, a mutation-based test data generation approach that combines Dynamic Symbolic Execution and Search Based Software Testing. SHOM targets strong mutation adequacy and is capable of killing both first and higher order mutants. We report the results of an empirical study using 17 programs, including production industrial code from ABB and Daimler and open source code as well as previously studied subjects. SHOM achieved higher strong mutation adequacy than two recent mutation-based test data generation approaches, killing between 8% and 38% of those mutants left unkilled by the best performing previous approach.
Automated oracle creation support, or: How I learned to stop worrying about fault propagation and love mutation testing
- In Int’l Conf. Softw. Eng., 2012
Cited by 26 (9 self)
In testing, the test oracle is the artifact that determines whether an application under test executes correctly. The choice of test oracle can significantly impact the effectiveness of the testing process. However, despite the prevalence of tools that support the selection of test inputs, little work exists for supporting oracle creation. In this work, we propose a method of supporting test oracle creation. This method automatically selects the oracle data (the set of variables monitored during testing) for expected value test oracles. This approach is based on the use of mutation analysis to rank variables in terms of fault-finding effectiveness, thus automating the selection of the oracle data. Experiments over four industrial examples demonstrate that our method may be a cost-effective approach for producing small, effective oracle data, with fault finding improvements over current industrial best practice of up to 145.8% observed. Keywords: testing, test oracles, oracle data, oracle selection, verification
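One way to picture ranking variables by fault-finding effectiveness is a greedy ordering over mutant-kill data. The abstract does not state the ranking procedure, so the greedy set-cover strategy below is an assumption, and the function name, variable names, and mutant ids are hypothetical.

```python
def rank_oracle_variables(kills):
    """Greedy ranking of candidate oracle variables.

    `kills` maps a variable name to the set of mutant ids whose effect is
    visible in that variable. Each step picks the variable that reveals the
    most mutants not yet revealed; ties are broken alphabetically.
    """
    remaining = set().union(*kills.values())
    candidates = dict(kills)
    ranking = []
    while candidates and remaining:
        best = max(sorted(candidates), key=lambda v: len(candidates[v] & remaining))
        ranking.append(best)
        remaining -= candidates.pop(best)
    return ranking

# Hypothetical kill data from running a test suite against four mutants.
kills = {
    "altitude": {1, 2, 3},
    "mode": {3, 4},
    "alarm": {2, 3},
    "debug_counter": set(),
}
oracle_data = rank_oracle_variables(kills)
```

Here `oracle_data` is `["altitude", "mode"]`: those two variables reveal all four mutants, so monitoring `alarm` and `debug_counter` would add cost without adding fault-finding power on this data.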
Evolutionary generation of whole test suites
- In International Conference On Quality Software (QSIC), 2011
Cited by 25 (17 self)
Recent advances in software testing allow automatic derivation of tests that reach almost any desired point in the source code. There is, however, a fundamental problem with the general idea of targeting one distinct test coverage goal at a time: Coverage goals are neither independent of each other, nor is test generation for any particular coverage goal guaranteed to succeed. We present EVOSUITE, a search-based approach that optimizes whole test suites towards satisfying a coverage criterion, rather than generating distinct test cases directed towards distinct coverage goals. Evaluated on five open source libraries and an industrial case study, we show that EVOSUITE achieves up to 18 times the coverage of a traditional approach targeting single branches, with up to 44% smaller test suites. Keywords: Search based software engineering, length, branch coverage, genetic algorithm
Is Operator-Based Mutant Selection Superior to Random Mutant Selection?
Cited by 21 (8 self)
Due to the expensiveness of compiling and executing a large number of mutants, it is usually necessary to select a subset of mutants to substitute the whole set of generated mutants in mutation testing and analysis. Most existing research on mutant selection focused on operator-based mutant selection, i.e., determining a set of sufficient mutation operators and selecting mutants generated with only this set of mutation operators. Recently, researchers began to leverage statistical analysis to determine sufficient mutation operators using execution information of mutants. However, whether mutants selected with these sophisticated techniques are superior to randomly selected mutants remains an open question. In this paper, we empirically investigate this open question by comparing three representative operator-based mutant-selection techniques with two random techniques. Our empirical results show that operator-based mutant selection is not superior to random mutant selection. These results also indicate that random mutant selection can be a better choice and mutant selection on the basis of individual mutants is worthy of further investigation.
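The two families of selection strategies compared in this abstract differ only in how the subset is chosen, which a short sketch makes concrete. The mutant records, the operator abbreviations, the chosen "sufficient" set, and the 25% sampling rate are illustrative assumptions, not the configurations studied in the paper.

```python
import random

# Hypothetical pool of generated mutants, each tagged with its mutation operator.
mutants = [
    {"id": 0, "op": "ROR"}, {"id": 1, "op": "ROR"},
    {"id": 2, "op": "AOR"}, {"id": 3, "op": "LCR"},
    {"id": 4, "op": "SDL"}, {"id": 5, "op": "SDL"},
    {"id": 6, "op": "UOI"}, {"id": 7, "op": "CRP"},
]

def select_by_operator(pool, sufficient_ops):
    """Operator-based selection: keep only mutants from a fixed operator subset."""
    return [m for m in pool if m["op"] in sufficient_ops]

def select_random(pool, fraction, seed=0):
    """Random selection: sample a fixed fraction of all mutants uniformly."""
    k = max(1, round(len(pool) * fraction))
    return random.Random(seed).sample(pool, k)

by_operator = select_by_operator(mutants, {"ROR", "AOR", "LCR", "UOI"})
by_chance = select_random(mutants, 0.25)
```

Either subset would then stand in for the full pool when computing a mutation score. The empirical question raised in the abstract is whether the first kind of subset predicts the full score any better than the second.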
Efficient multi-objective higher order mutation testing with genetic programming
Cited by 21 (11 self)
It is said ninety percent of faults that survive manufacturer’s testing procedures are complex. That is, the corresponding bug fix contains multiple changes. Higher order mutation testing is used to study defect interactions and their impact on software testing for fault finding. We adopt a multi-objective Pareto optimal approach using Monte Carlo sampling, genetic algorithms and genetic programming to search for higher order mutants which are both hard-to-kill and realistic. The space of complex faults (higher order mutants) is much larger than that of traditional first order mutations which correspond to simple faults, nevertheless search based approaches make this scalable. The problems of non-determinism and efficiency are overcome. Easy to detect faults may become harder to detect when they interact and impossible to detect single faults may be brought to light when code contains two such faults. We use strong typing and BNF grammars in search based mutation testing to find examples of both in ancient heavily optimised every day C code.
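The claim that easily detected faults can become harder to detect when they interact can be shown with a toy second-order mutant. The example is hypothetical and far simpler than the C code studied in the paper, and it shows an extreme case in which the two changes cancel completely.

```python
def original(a, b):
    return a + b

def fom_operator(a, b):
    """First-order mutant: '+' replaced by '-'."""
    return a - b

def fom_operand(a, b):
    """First-order mutant: 'b' replaced by '-b'."""
    return a + (-b)

def hom(a, b):
    """Second-order mutant: both changes applied together."""
    return a - (-b)

tests = [(1, 2), (0, 5), (3, 0)]

def killed(mutant):
    """True if any test input distinguishes the mutant from the original."""
    return any(mutant(a, b) != original(a, b) for a, b in tests)
```

Both first-order mutants are killed by the input `(1, 2)`, yet the higher order mutant survives every test because `a - (-b)` equals `a + b`. A search for hard-to-kill higher order mutants looks for combinations that mask each other only partially, so that they remain detectable in principle.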
A critical review of “automatic patch generation learned from human-written patches”: An essay on the problem statement and the evaluation of automatic software repair
- In Proc. of the Int. Conf. on Software Engineering (ICSE), 2014
Cited by 21 (9 self)
At ICSE’2013, there was the first session ever dedicated to automatic program repair. In this session, Kim et al. presented PAR, a novel template-based approach for fixing Java bugs. We strongly disagree with key points of this paper. Our critical review has two goals. First, we aim at explaining why we disagree with Kim and colleagues and why the reasons behind this disagreement are important for research on automatic software repair in general. Second, we aim at contributing to the field with a clarification of the essential ideas behind automatic software repair. In particular we discuss the main evaluation criteria of automatic software repair: understandability, correctness and completeness. We show that depending on how one sets up the repair scenario, the evaluation goals may be contradictory. Eventually, we discuss the nature of fix acceptability and its relation to the notion of software correctness.
On the danger of coverage directed test case generation
- In 15th Int’l Conf. on Fundamental Approaches to Software Engineering (FASE), 2012
Cited by 19 (6 self)
In the avionics domain, the use of structural coverage criteria is legally required in determining test suite adequacy. With the success of automated test generation tools, it is tempting to use these criteria as the basis for test generation. To more firmly establish the effectiveness of such approaches, we have generated and evaluated test suites to satisfy two coverage criteria using counterexample-based test generation and a random generation approach, contrasted against purely random test suites of equal size. Our results yield two key conclusions. First, coverage criteria satisfaction alone is a poor indication of test suite effectiveness. Second, the use of structural coverage as a supplement, not a target, for test generation can have a positive impact. These observations point to the dangers inherent in the increase in test automation in critical systems and the need for more research in how coverage criteria, generation approach, and system structure jointly influence test effectiveness.