Is the cure worse than the disease? Overfitting in automated program repair (2015)
Venue: | In European Software Engineering Conference and ACM SIGSOFT International Symposium on Foundations of Software Engineering (ESEC/FSE |
Citations: | 4 - 2 self |
Citations
3723 |
Genetic programming: on the programming of computers by means of natural selection
- Koza
- 1992
Citation Context ...inistic, and so much of our experimental methodology does not apply. However, we do find that AE similarly overfits to input tests (Section 4.2). GenProg [32, 62] uses a genetic programming heuristic =-=[31]-=- to search the space of candidate repairs. Given a buggy program and a set of tests, GenProg generates a population of random patches by using statistical fault localization to identify which program ... |
1556 |
Modeling by shortest data description
- Rissanen
- 1978
Citation Context ...verfitting is also a well-studied problem in machine learning [41]. Our experiments suggest that minimization and overfitting are unrelated, which is consistent with prior results in machine learning =-=[52]-=-. To the best of our knowledge, ours is the first consideration of this relationship in the program repair domain. G&V approaches fall in the space of search-based software engineering [25], which ada... |
556 | KLEE: unassisted and automatic generation of high-coverage tests for complex systems programs
- Cadar, Dunbar, et al.
- 2008
Citation Context ...independent test suites: a black-box test suite written by the course instructor to the natural-language specification, and a white-box test suite constructed using the symbolic execution engine Klee =-=[12]-=- on a reference solution. This dataset is publicly available: http://repairbenchmarks.cs.umass.edu. Our study admittedly uses small programs written by novice programmers, which threatens the general... |
312 | An Experimental Evaluation of the assumption of Independence in Multi-version Programming
- Knight, Leveson
- 1986
Citation Context ..., a group vote on the behavior may outperform each individual patch. N-version patches may therefore provide an avenue to mitigate overfitting. Human-written code typically lacks sufficient diversity =-=[30]-=- to enable true n-version programming [15], but randomized G&V repair may not. We created the n-version program Pn in the following way: For each buggy version-test suite subset pair Pb, run GenProg o... |
289 | Simplifying and isolating failure-inducing input - Zeller, Hildebrandt |
183 |
N-version programming: a fault tolerant approach to reliability of software operation
- Chen, Avizienis
- 1978
Citation Context ...orm each individual patch. N-version patches may therefore provide an avenue to mitigate overfitting. Human-written code typically lacks sufficient diversity [30] to enable true n-version programming =-=[15]-=-, but randomized G&V repair may not. We created the n-version program Pn in the following way: For each buggy version-test suite subset pair Pb, run GenProg on Pb 20 times. If fewer than three of the ... |
183 | Generating software test data by evolution
- Michael, McGraw, et al.
- 2001
Citation Context ...ch-based software engineering [25], which adapts search methods, such as genetic programming, to software engineering tasks. Search-based software engineering has been used for developing test suites =-=[40,59]-=-, finding safety violations [3], refactoring [53], and project management and effort estimation [8]. Good fitness functions are critical to searchbased software engineering. Our findings indicate that... |
157 | The current state and future of search based software engineering
- Harman
- 2007
Citation Context ...PROGRAM REPAIR Automatic repair techniques can be classified broadly into two classes: (1) Generate-and-validate (G&V) techniques create candidate patches (often via search-based software engineering =-=[25]-=-) and then validate them, typically through testing (e.g., [4, 13, 14, 17, 18, 28, 29, 37, 39, 43, 47, 50, 55, 58, 61, 62]. (2) Synthesis-based techniques use constraints to build correct-by-construct... |
146 | Automatically finding patches using genetic programming
- Weimer, Nguyen, et al.
- 2009
Citation Context ...ing other approaches, such as synthesis-based repair [26, 46, 60] techniques, is also of great value, but is outside the scope of this paper. Our contribution is a controlled investigation of GenProg =-=[35, 62]-=- and TrpAutoRepair [49], both test-case-guided, search-based automatic program repair tools with freely available implementations that scale to large programs. The evaluation identifies the circumstan... |
143 | Fault-scalable byzantine fault-tolerant services
- Abd-El-Malek, Ganger, et al.
- 2005
Citation Context ...ell as evaluated the monetary and time costs of automatic repair [32], the relationship between operator choices and test execution parameters and success [33,61], and human-rated patch acceptability =-=[1,29]-=- and maintainability [21]. However, these evaluations have generally not used an objective metric of correctness independent of patch construction. Our evaluation measures patch correctness independen... |
115 |
A few billion lines of code later: using static analysis to find bugs in the real world
- Bessey, Block, et al.
Citation Context ...ddress software quality in existing systems written in legacy languages. Since legacy codebases often are often idiosyncratic to the point of not adhering to the specifications of their host language =-=[9]-=-, it might not be possible even to add contracts to such projects. G&V repair works by generating multiple candidate patches that might address a particular bug and then validating the candidates to d... |
102 | Automatically patching errors in deployed software
- Perkins, Kim, et al.
- 2009
Citation Context ...o improve repair by helping designers select change operators and search strategies [27, 64]. Understanding how automated repair handles particular classes of errors, such as security vulnerabilities =-=[35, 47]-=- can guide tool design. For this reason, some automated repair techniques focus on a particular defect class, such as buffer overruns [54, 57], unsafe integer use in C programs [17], single-variable a... |
96 |
Mechanization in problem solving: the effect of Einstellung
- Luchins
- 1942
Citation Context ...stractly about the program specification. However, while humans can reason about program faults abstractly above the level of a repair tool, they are also subject to a large array of cognitive biases =-=[2, 38]-=- that can hamper their debugging effort. Repair tools have no such biases, and will mechanically explore the solution space as guided by an objective function, without becoming irrationally fixated on... |
87 | Time-aware test suite prioritization
- Walcott, Soffa, et al.
- 2006
Citation Context ...ch-based software engineering [25], which adapts search methods, such as genetic programming, to software engineering tasks. Search-based software engineering has been used for developing test suites =-=[40,59]-=-, finding safety violations [3], refactoring [53], and project management and effort estimation [8]. Good fitness functions are critical to searchbased software engineering. Our findings indicate that... |
83 | Automating string processing in spreadsheets using input-output examples
- Gulwani
- 2011
Citation Context ...echniques provide the benefit of provable correctness for patches, but require contracts, so they are unsuitable for legacy systems. Synthesis techniques can also construct new features from examples =-=[16, 23]-=-, rather than address existing bugs. Our work has focused on G&V approaches, and investigating overfitting and patch quality in synthesisbased techniques is a complementary and worthwhile pursuit. The... |
81 | A novel co-evolutionary approach to automatic software bug fixing
- Arcuri, Yao
- 2008
Citation Context ...r evaluates existing techniques in a new way to expose previously hidden limitations to G&V program repair. Our findings may extend to other search-based or test suite-guided repair techniques (e.g., =-=[6,18,29,39,43,44,47,61]-=-). Monperrus [42] has recently discussed the challenges of experimentally comparing program repair techniques. For example, the selection of test subjects (defects) can introduce evaluation bias [10, ... |
79 |
A practical guide for using statistical tests to assess randomized algorithms in software engineering
- Arcuri, Briand
- 2011
Citation Context ...d requires paying special attention to the sample sizes, statistical tests, crossvalidation, and uses of bootstrapping. Our work is consistent with the guidelines for evaluating randomized algorithms =-=[5]-=- to enhance the credibility of our findings. Specifically, we used a large sample of 998 buggy student programs, controlled for a variety of potential influencers in our experiments, and used fixed-ef... |
79 | Countering network worms through automatic patch generation
- Sidiroglou, Keromytis
- 2005
Citation Context ...cular classes of errors, such as security vulnerabilities [35, 47] can guide tool design. For this reason, some automated repair techniques focus on a particular defect class, such as buffer overruns =-=[54, 57]-=-, unsafe integer use in C programs [17], single-variable atomicity violations [26], deadlock and livelock defects [36], concurrency errors [37], and data input errors [4]. Other techniques tackle gene... |
73 | GenProg: A generic method for automatic software repair
- Goues, Nguyen, et al.
- 2012
Citation Context ...g/10.1145/2786805.2786825. The most common prior evaluations of automatic repair provide evidence of techniques’ feasibility with respect to this test-casebased definition of patch correctness (e.g., =-=[17, 35, 46, 60]-=-). However, in practice, test suites are rarely exhaustive [51], and repair techniques must avoid breaking undertested functionality. When evaluations of repair techniques use the same test cases to b... |
72 | Automated fixing of programs with contracts
- Wei, Pei, et al.
- 2010
Citation Context ...ting tools General Terms: Experimentation Keywords: automated program repair, empirical evaluation, independent evaluation, GenProg, TrpAutoRepair, INTROCLASS 1. INTRODUCTION Automated program repair =-=[4,13,17,18,26,28,29,32,36,37,39,39, 43,46,47,50,55,58,60,61]-=- holds great potential to reduce debugging costs and improve software quality. For example, GenProg quickly and cheaply generated patches for 55 out of 105 C bugs [32], while PAR showed comparable res... |
69 | A systematic study of automated program repair: Fixing 55 out of 105 bugs for $8 each
- Goues, Dewey-Vogt, et al.
- 2012
Citation Context ...ting tools General Terms: Experimentation Keywords: automated program repair, empirical evaluation, independent evaluation, GenProg, TrpAutoRepair, INTROCLASS 1. INTRODUCTION Automated program repair =-=[4,13,17,18,26,28,29,32,36,37,39,39, 43,46,47,50,55,58,60,61]-=- holds great potential to reduce debugging costs and improve software quality. For example, GenProg quickly and cheaply generated patches for 55 out of 105 C bugs [32], while PAR showed comparable res... |
67 | Fair and balanced?: bias in bug-fix datasets.
- Bird, Bachmann, et al.
- 2009
Citation Context ...7,61]). Monperrus [42] has recently discussed the challenges of experimentally comparing program repair techniques. For example, the selection of test subjects (defects) can introduce evaluation bias =-=[10, 48]-=-. Our evaluation focuses precisely on the limits and potential of repair techniques on a large dataset of defects, and controls for a variety of potential influencers, addressing some of Monperrus’ co... |
58 | Search-based determination of refactorings for improving the class structure of object-oriented systems
- Seng, Stammel, et al.
- 2006
Citation Context ...rch methods, such as genetic programming, to software engineering tasks. Search-based software engineering has been used for developing test suites [40,59], finding safety violations [3], refactoring =-=[53]-=-, and project management and effort estimation [8]. Good fitness functions are critical to searchbased software engineering. Our findings indicate that using test cases alone as the fitness function l... |
52 | Are automated debugging techniques actually helping programmers?
- Parnin, Orso
- 2011
Citation Context ...tch construction, it is fundamentally different from the correctness criterion we use in our evaluation, as it is often difficult for humans to spot bugs even when told exactly where to look for them =-=[45]-=-. Meanwhile, our recent evaluation of SearchRepair uses the same methodology as the evaluation we present here [28]. Our work evaluates automated repair so that it can be improved. Empirical studies o... |
51 | Automated atomicity-violation fixing
- Jin, Song, et al.
- 2011
Citation Context ...ting tools General Terms: Experimentation Keywords: automated program repair, empirical evaluation, independent evaluation, GenProg, TrpAutoRepair, INTROCLASS 1. INTRODUCTION Automated program repair =-=[4,13,17,18,26,28,29,32,36,37,39,39, 43,46,47,50,55,58,60,61]-=- holds great potential to reduce debugging costs and improve software quality. For example, GenProg quickly and cheaply generated patches for 55 out of 105 C bugs [32], while PAR showed comparable res... |
50 | Automatic patch generation learned from human-written patches
- Kim, Nam, et al.
- 2013
Citation Context ...r evaluates existing techniques in a new way to expose previously hidden limitations to G&V program repair. Our findings may extend to other search-based or test suite-guided repair techniques (e.g., =-=[6,18,29,39,43,44,47,61]-=-). Monperrus [42] has recently discussed the challenges of experimentally comparing program repair techniques. For example, the selection of test subjects (defects) can introduce evaluation bias [10, ... |
46 | SemFix: Program repair via semantic analysis
- Nguyen, Qi, et al.
- 2013
Citation Context ...ting tools General Terms: Experimentation Keywords: automated program repair, empirical evaluation, independent evaluation, GenProg, TrpAutoRepair, INTROCLASS 1. INTRODUCTION Automated program repair =-=[4,13,17,18,26,28,29,32,36,37,39,39, 43,46,47,50,55,58,60,61]-=- holds great potential to reduce debugging costs and improve software quality. For example, GenProg quickly and cheaply generated patches for 55 out of 105 C bugs [32], while PAR showed comparable res... |
39 | Automatic workarounds for web applications
- Carzaniga, Gorla, et al.
- 2010
Citation Context ...oadly into two classes: (1) Generate-and-validate (G&V) techniques create candidate patches (often via search-based software engineering [25]) and then validate them, typically through testing (e.g., =-=[4, 13, 14, 17, 18, 28, 29, 37, 39, 43, 47, 50, 55, 58, 61, 62]-=-. (2) Synthesis-based techniques use constraints to build correct-by-construction patches via formal verification or inferred or programmer-provided contracts or specifications (e.g., [26, 46, 60]. Th... |
34 |
Automatic recovery from runtime failures
- Carzaniga, Gorla, et al.
- 2013
Citation Context ...ting tools General Terms: Experimentation Keywords: automated program repair, empirical evaluation, independent evaluation, GenProg, TrpAutoRepair, INTROCLASS 1. INTRODUCTION Automated program repair =-=[4,13,17,18,26,28,29,32,36,37,39,39, 43,46,47,50,55,58,60,61]-=- holds great potential to reduce debugging costs and improve software quality. For example, GenProg quickly and cheaply generated patches for 55 out of 105 C bugs [32], while PAR showed comparable res... |
34 | Flight of the FINCH through the Java wilderness - Orlov, Sipper - 2011 |
32 |
Functional fixedness as related to problem solving: A repetition of three experiments.
- Adamson
- 1952
Citation Context ...stractly about the program specification. However, while humans can reason about program faults abstractly above the level of a repair tool, they are also subject to a large array of cognitive biases =-=[2, 38]-=- that can hamper their debugging effort. Repair tools have no such biases, and will mechanically explore the solution space as guided by an objective function, without becoming irrationally fixated on... |
31 |
Staffing a software project: a constraint satisfaction approach
- Barreto, Barros, et al.
- 2008
Citation Context ...re engineering tasks. Search-based software engineering has been used for developing test suites [40,59], finding safety violations [3], refactoring [53], and project management and effort estimation =-=[8]-=-. Good fitness functions are critical to searchbased software engineering. Our findings indicate that using test cases alone as the fitness function leads to patches that may not generalize to the pro... |
31 | Using mutation to automatically suggest fixes for faulty programs
- Debroy, Wong
- 2010
Citation Context ...r evaluates existing techniques in a new way to expose previously hidden limitations to G&V program repair. Our findings may extend to other search-based or test suite-guided repair techniques (e.g., =-=[6,18,29,39,43,44,47,61]-=-). Monperrus [42] has recently discussed the challenges of experimentally comparing program repair techniques. For example, the selection of test subjects (defects) can introduce evaluation bias [10, ... |
26 | Ecological inference in empirical software engineering
- Posnett, Filkov, et al.
- 2011
Citation Context ...7,61]). Monperrus [42] has recently discussed the challenges of experimentally comparing program repair techniques. For example, the selection of test subjects (defects) can introduce evaluation bias =-=[10, 48]-=-. Our evaluation focuses precisely on the limits and potential of repair techniques on a large dataset of defects, and controls for a variety of potential influencers, addressing some of Monperrus’ co... |
23 | Finding safety errors with ACO
- Alba, Chicano
- 2007
Citation Context ... which adapts search methods, such as genetic programming, to software engineering tasks. Search-based software engineering has been used for developing test suites [40,59], finding safety violations =-=[3]-=-, refactoring [53], and project management and effort estimation [8]. Good fitness functions are critical to searchbased software engineering. Our findings indicate that using test cases alone as the ... |
22 | Problem difficulty and code growth in genetic programming. Genetic Programming and Evolvable Machines
- Gustafson, Ekart, et al.
- 2004
Citation Context ...the same program. Finally, as a challenge that applies to GenProg in particular, genetic programming is known to lead to bloat, in which solutions contain more code than necessary to maximize fitness =-=[24]-=-. GenProg minimizes code bloat post facto; prior work has claimed that minimization reduces patches overfitting to the training tests [35]. TrpAutoRepair only attempts single-edit patches, and thus do... |
22 |
Dynamic limits for bloat control in genetic programming and a review of past and current bloat theories
- Silva, Costa
- 2009
Citation Context ...trols for a variety of potential influencers, addressing some of Monperrus’ concerns [42]. Genetic programming tends to produce extraneous code that does not contribute to the fitness of the solution =-=[24, 56]-=-. GenProg attempts to mitigate this through solution minimization, which may reduce the chances of breaking undertested functionality. Overfitting is also a well-studied problem in machine learning [4... |
21 | A critical review of "automatic patch generation learned from human-written patches": Essay on the problem statement and the evaluation of automatic software repair
- Monperrus
- 2014
Citation Context ...new way to expose previously hidden limitations to G&V program repair. Our findings may extend to other search-based or test suite-guided repair techniques (e.g., [6,18,29,39,43,44,47,61]). Monperrus =-=[42]-=- has recently discussed the challenges of experimentally comparing program repair techniques. For example, the selection of test subjects (defects) can introduce evaluation bias [10, 48]. Our evaluati... |
18 | A human study of patch maintainability
- Fry, Landau, et al.
- 2012
Citation Context ... consider independent quality measures, though less extensively than we do here. And while some evaluations have used humans to independently measure repair acceptability [19, 29] and maintainability =-=[21]-=-, unlike our work, they neither directly nor objectively evaluate functional patch correctness. Meanwhile, our recent evaluation of SearchRepair uses the same methodology as the evaluation we propose ... |
16 | Representations and operators for improving evolutionary software repair
- Goues, Weimer, et al.
- 2012
Citation Context ... search algorithms, but comparable to previous program repair evaluations. More attempts may have revealed more solutions. Finally, we used the recommended GenProg parameters defined in previous work =-=[33]-=-; a full parameter sweep is outside the scope of this investigation. Our INTROCLASS dataset (http://repairbenchmarks.cs.umass.edu) includes all the buggy versions, student-written solutions, and tes... |
15 | The economic impacts of inadequate infrastructure for software testing. Planning Report 02-3 - Institute - 2002 |
15 | Leveraging program equivalence for adaptive program repair: Models and first results.
- Weimer, Fry, et al.
- 2013
Citation Context ...number of bugs in medium-sized programs, as well as evaluated the monetary and time costs of automatic repair [32], the relationship between operator choices and test execution parameters and success =-=[33,61]-=-, and human-rated patch acceptability [1,29] and maintainability [21]. However, these evaluations have generally not used an objective metric of correctness independent of patch construction. Our eval... |
11 | Testing mined specifications
- Gabel, Su
- 2012
Citation Context ... in an attempt to fix it. This describes the fix space of a particular program repair problem. GenProg and TrpAutoRepair tackle this challenge using the observation that programs are often repetitive =-=[7, 22]-=- and logic implemented with a bug in one place is likely to be implemented correctly elsewhere in the same program. GenProg and TrpAutoRepair therefore limit the code changes to deleting constructs an... |
10 |
Program transformations to fix C integers
- Coker, Hafiz
- 2013
Citation Context ...ting tools General Terms: Experimentation Keywords: automated program repair, empirical evaluation, independent evaluation, GenProg, TrpAutoRepair, INTROCLASS 1. INTRODUCTION Automated program repair =-=[4,13,17,18,26,28,29,32,36,37,39,39, 43,46,47,50,55,58,60,61]-=- holds great potential to reduce debugging costs and improve software quality. For example, GenProg quickly and cheaply generated patches for 55 out of 105 C bugs [32], while PAR showed comparable res... |
9 | The plastic surgery hypothesis
- Barr, Brun, et al.
- 2014
Citation Context ... in an attempt to fix it. This describes the fix space of a particular program repair problem. GenProg and TrpAutoRepair tackle this challenge using the observation that programs are often repetitive =-=[7, 22]-=- and logic implemented with a bug in one place is likely to be implemented correctly elsewhere in the same program. GenProg and TrpAutoRepair therefore limit the code changes to deleting constructs an... |
8 | An analysis of patch plausibility and correctness for generate-and-validate patch generation systems.
- Qi, Long, et al.
- 2015
Citation Context ...ting tools General Terms: Experimentation Keywords: automated program repair, empirical evaluation, independent evaluation, GenProg, TrpAutoRepair, INTROCLASS 1. INTRODUCTION Automated program repair =-=[4,13,17,18,26,28,29,32,36,37,39,39, 43,46,47,50,55,58,60,61]-=- holds great potential to reduce debugging costs and improve software quality. For example, GenProg quickly and cheaply generated patches for 55 out of 105 C bugs [32], while PAR showed comparable res... |
7 |
Automatic repair of real bugs: An experience report on the Defects4J dataset
- Durieux, Martinez, et al.
- 2015
Citation Context ...(e.g., [50, 64]) has begun to consider independent quality measures, though less extensively than we do here. And while some evaluations have used humans to independently measure repair acceptability =-=[19, 29]-=- and maintainability [21], unlike our work, they neither directly nor objectively evaluate functional patch correctness. Meanwhile, our recent evaluation of SearchRepair uses the same methodology as t... |
7 | Defects4J: A database of existing faults to enable controlled testing studies for Java programs.
- Just, Jalali, et al.
- 2014
Citation Context ...automated repair so that it can be improved. Empirical studies of fixes of real bugs in open-source projects can also improve repair by helping designers select change operators and search strategies =-=[27, 64]-=-. Understanding how automated repair handles particular classes of errors, such as security vulnerabilities [35, 47] can guide tool design. For this reason, some automated repair techniques focus on a... |
7 |
Efficient automated program repair through fault-recorded testing prioritization
- Qi, Mao, et al.
- 2013
Citation Context ...as synthesis-based repair [26, 46, 60] techniques, is also of great value, but is outside the scope of this paper. Our contribution is a controlled investigation of GenProg [35, 62] and TrpAutoRepair =-=[49]-=-, both test-case-guided, search-based automatic program repair tools with freely available implementations that scale to large programs. The evaluation identifies the circumstances under which these t... |
6 | Automatic error elimination by horizontal code transfer across multiple applications
- Sidiroglou-Douskos, Lahtinen, et al.
- 2015
Citation Context ...ting tools General Terms: Experimentation Keywords: automated program repair, empirical evaluation, independent evaluation, GenProg, TrpAutoRepair, INTROCLASS 1. INTRODUCTION Automated program repair =-=[4,13,17,18,26,28,29,32,36,37,39,39, 43,46,47,50,55,58,60,61]-=- holds great potential to reduce debugging costs and improve software quality. For example, GenProg quickly and cheaply generated patches for 55 out of 105 C bugs [32], while PAR showed comparable res... |
6 |
DIRA: Automatic detection, identification and repair of control-hijacking attacks
- Smirnov, Chiueh
- 2005
Citation Context ...cular classes of errors, such as security vulnerabilities [35, 47] can guide tool design. For this reason, some automated repair techniques focus on a particular defect class, such as buffer overruns =-=[54, 57]-=-, unsafe integer use in C programs [17], single-variable atomicity violations [26], deadlock and livelock defects [36], concurrency errors [37], and data input errors [4]. Other techniques tackle gene... |
5 |
Semantic differential repair for input validation and sanitization
- Alkhalaf, Aydin, et al.
- 2014
Citation Context ...ting tools General Terms: Experimentation Keywords: automated program repair, empirical evaluation, independent evaluation, GenProg, TrpAutoRepair, INTROCLASS 1. INTRODUCTION Automated program repair =-=[4,13,17,18,26,28,29,32,36,37,39,39, 43,46,47,50,55,58,60,61]-=- holds great potential to reduce debugging costs and improve software quality. For example, GenProg quickly and cheaply generated patches for 55 out of 105 C bugs [32], while PAR showed comparable res... |
4 | Contracts in practice
- Estler, Furia, et al.
- 2013
Citation Context ...specifications. As examples, Clearview [47], GenProg, Par, and Debroy and Wong [18] have successfully fixed bugs in legacy software. Although new projects appear to be increasingly adopting contracts =-=[20]-=-, their penetration into existing systems and languages remains limited. Few maintained contract implementations exist for widely-used languages such as C. As an example, as of March 2014, in the Debi... |
4 | The ManyBugs and IntroClass benchmarks for automated repair of C programs
- Goues, Holtschulte, et al.
- 2015
Citation Context ...er 4, 2015, Bergamo, Italy . 978-1-4503-3675-8/15/08...$15.00 http://dx.doi.org/10.1145/2786805.2786825 532 respect to some specification representation test suites. We produce the INTROCLASS dataset =-=[34]-=- for our evaluation by collecting 998 student-written programs with defects, submitted as homework in a freshman programming class, and all with student-written, bugfixing patches. Each program is acc... |
4 |
DirectFix: Looking for simple program repairs
- Mechtaev, Yi, et al.
- 2015
Citation Context ...ting tools General Terms: Experimentation Keywords: automated program repair, empirical evaluation, independent evaluation, GenProg, TrpAutoRepair, INTROCLASS 1. INTRODUCTION Automated program repair =-=[4,13,17,18,26,28,29,32,36,37,39,39, 43,46,47,50,55,58,60,61]-=- holds great potential to reduce debugging costs and improve software quality. For example, GenProg quickly and cheaply generated patches for 55 out of 105 C bugs [32], while PAR showed comparable res... |
3 | Program boosting: Program synthesis via crowdsourcing
- Cochran, D’Antoni, et al.
- 2015
Citation Context ...echniques provide the benefit of provable correctness for patches, but require contracts, so they are unsuitable for legacy systems. Synthesis techniques can also construct new features from examples =-=[16, 23]-=-, rather than address existing bugs. Our work has focused on G&V approaches, and investigating overfitting and patch quality in synthesisbased techniques is a complementary and worthwhile pursuit. The... |
3 | Automatic repair for multi-threaded programs with deadlock/livelock using maximum satisfiability - Lin, Kulkarni - 2014 |
3 | An empirical study on real bug fixes
- Zhong, Su
- 2015
Citation Context ...tions of automated repair techniques that relied on test cases or workloads to validate candidate patches failed to evaluate those patches independently of patch construction. More recent work (e.g., =-=[50, 64]-=-) has begun to consider independent quality measures, though less extensively than we do here. And while some evaluations have used humans to independently measure repair acceptability [19, 29] and ma... |
2 | Evolution vs. intelligent design in program patching - Brun, Barr, et al. - 2013 |
2 | Repairing programs with semantic code search
- Ke, Stolee, et al.
- 2015
Citation Context ...ting tools General Terms: Experimentation Keywords: automated program repair, empirical evaluation, independent evaluation, GenProg, TrpAutoRepair, INTROCLASS 1. INTRODUCTION Automated program repair =-=[4,13,17,18,26,28,29,32,36,37,39,39, 43,46,47,50,55,58,60,61]-=- holds great potential to reduce debugging costs and improve software quality. For example, GenProg quickly and cheaply generated patches for 55 out of 105 C bugs [32], while PAR showed comparable res... |
2 |
Grail: Context-aware fixing of concurrency bugs
- Liu, Tripp, et al.
- 2014
Citation Context ...ting tools General Terms: Experimentation Keywords: automated program repair, empirical evaluation, independent evaluation, GenProg, TrpAutoRepair, INTROCLASS 1. INTRODUCTION Automated program repair =-=[4,13,17,18,26,28,29,32,36,37,39,39, 43,46,47,50,55,58,60,61]-=- holds great potential to reduce debugging costs and improve software quality. For example, GenProg quickly and cheaply generated patches for 55 out of 105 C bugs [32], while PAR showed comparable res... |
2 |
relifix: Automated repair of software regressions
- Tan, Roychoudhury
- 2015
Citation Context ...ting tools General Terms: Experimentation Keywords: automated program repair, empirical evaluation, independent evaluation, GenProg, TrpAutoRepair, INTROCLASS 1. INTRODUCTION Automated program repair =-=[4,13,17,18,26,28,29,32,36,37,39,39, 43,46,47,50,55,58,60,61]-=- holds great potential to reduce debugging costs and improve software quality. For example, GenProg quickly and cheaply generated patches for 55 out of 105 C bugs [32], while PAR showed comparable res... |