Results 1 - 10
of
15
Fair and Balanced? Bias in bug-fix Datasets
- in European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE
, 2009
"... Software engineering researchers have long been interested in where and why bugs occur in code, and in predicting where they might turn up next. Historical bug-occurence data has been key to this research. Bug tracking systems, and code version histories, record when, how and by whom bugs were fixed ..."
Abstract
-
Cited by 17 (6 self)
- Add to MetaCart
Software engineering researchers have long been interested in where and why bugs occur in code, and in predicting where they might turn up next. Historical bug-occurence data has been key to this research. Bug tracking systems, and code version histories, record when, how and by whom bugs were fixed; from these sources, datasets that relate file changes to bug fixes can be extracted. These historical datasets can be used to test hypotheses concerning processes of bug introduction, and also to build statistical bug prediction models. Unfortunately, processes and humans are imperfect, and only a fraction of bug fixes are actually labelled in source code version histories, and thus become available for study in the extracted datasets. The question naturally arises, are the bug fixes recorded in these historical datasets a fair representation of the full population of bug fixes? In this paper, we investigate historical data from several software projects, and find strong evidence of systematic bias. We then investigate the potential effects of “unfair, imbalanced” datasets on the performance of prediction techniques. We draw the lesson that bias is a critical problem that threatens both the effectiveness of processes that rely on biased datasets to build prediction models and the generalizability of hypotheses tested on biased data 1.
The Promises and Perils of Mining Git
"... We are now witnessing the rapid growth of decentralized source code management (DSCM) systems, in which every developer has her own repository. DSCMs facilitate a style of collaboration in which work output can flow sideways (and privately) between collaborators, rather than always up and down (and ..."
Abstract
-
Cited by 8 (5 self)
- Add to MetaCart
We are now witnessing the rapid growth of decentralized source code management (DSCM) systems, in which every developer has her own repository. DSCMs facilitate a style of collaboration in which work output can flow sideways (and privately) between collaborators, rather than always up and down (and publicly) via a central repository. Decentralization comes with both the promise of new data and the peril of its misinterpretation. We focus on git, a very popular DSCM used in high-profile projects. Decentralization, and other features of git, such as automatically recorded contributor attribution, lead to richer content histories, giving rise to new questions such as “How do contributions flow between developers to the official project repository? ” However, there are pitfalls. Commits may be reordered, deleted, or edited as they move between repositories. The semantics of terms common to SCMs and DSCMs sometimes differ markedly, potentially creating confusion. For example, a commit is immediately visible to all developers in centralized SCMs, but not in DSCMs. Our goal is to help researchers interested in DSCMs avoid these and other perils when mining and analyzing git data. Out of a stem that scored the hand I wrung it in a weary land.
The Missing Links: Bugs and Bug-fix Commits
"... Empirical studies of software defects rely on links between bug databases and program code repositories. This linkage is typically based on bug-fixes identified in developer-entered commit logs. Unfortunately, developers do not always report which commits perform bug-fixes. Prior work suggests that ..."
Abstract
-
Cited by 7 (1 self)
- Add to MetaCart
Empirical studies of software defects rely on links between bug databases and program code repositories. This linkage is typically based on bug-fixes identified in developer-entered commit logs. Unfortunately, developers do not always report which commits perform bug-fixes. Prior work suggests that such links can be a biased sample of the entire population of fixed bugs. The validity of statistical hypotheses-testing based on linked data could well be affected by bias. Given the wide use of linked defect data, it is vital to gauge the nature and extent of the bias, and try to develop testable theories and models of the bias. To do this, we must establish ground truth: manually analyze a complete version history corpus, and nail down those commits that fix defects, and those that do not. This is a difficult task, requiring an expert to compare versions, analyze changes, find related bugs in the bug database, reverse-engineer missing links, and finally record their work for use later. This effort must be repeated for hundreds of commits to obtain a useful sample of reported and unreported bug-fix commits. We make several contributions. First, we present Linkster, a tool to facilitate link reverse-engineering. Second, we evaluate this tool, engaging a core developer of the Apache HTTP web server project to exhaustively annotate 493 commits that occurred during a six week period. Finally, we analyze this comprehensive data set, showing that there are serious and consequential problems in the data.
An Extensive Comparison of Bug Prediction Approaches
"... Abstract—Reliably predicting software defects is one of software engineering’s holy grails. Researchers have devised and implemented a plethora of bug prediction approaches varying in terms of accuracy, complexity and the input data they require. However, the absence of an established benchmark make ..."
Abstract
-
Cited by 3 (0 self)
- Add to MetaCart
Abstract—Reliably predicting software defects is one of software engineering’s holy grails. Researchers have devised and implemented a plethora of bug prediction approaches varying in terms of accuracy, complexity and the input data they require. However, the absence of an established benchmark makes it hard, if not impossible, to compare approaches. We present a benchmark for defect prediction, in the form of a publicly available data set consisting of several software systems, and provide an extensive comparison of the explanative and predictive power of well-known bug prediction approaches, together with novel approaches we devised. Based on the results, we discuss the performance and stability of the approaches with respect to our benchmark and deduce a number of insights on bug prediction models. I.
Ownership and Experience in Fix-Inducing Code
"... Software defects cost the U.S. economy billions of dollars annually [1]. Recent research indicates that “people ” factors such as ownership, experience, organizational structure, and geographic distribution have a big impact on software quality. This paper considers the impact of code ownership and ..."
Abstract
-
Cited by 1 (0 self)
- Add to MetaCart
Software defects cost the U.S. economy billions of dollars annually [1]. Recent research indicates that “people ” factors such as ownership, experience, organizational structure, and geographic distribution have a big impact on software quality. This paper considers the impact of code ownership and developer experience on software quality. In a large project, a file might be entirely owned by a single developer, or worked on by many. Some previous research indicates that more developers working on a file might lead to more defects. We examine this issue at a fine-grained level. Using version control history, we examine contributions to code fragments that are actually repaired to fix bugs. Are these “troubled ” code fragments the result of contributions from many? or from one? Does experience matter? What type of experience? We find that fix-inducing code is more strongly associated with a single developer’s contribution; our findings also indicate that author’s specific experience in the target file is more important than general experience. Our findings suggest that quality control efforts could be profitably targeted at changes made by single developers with limited prior experience on that file.
Reducing Features to Improve Bug Prediction
"... Abstract—Recently, machine learning classifiers have emerged as a way to predict the existence of a bug in a change made to a source code file. The classifier is first trained on software history data, and then used to predict bugs. Two drawbacks of existing classifier-based bug prediction are poten ..."
Abstract
-
Cited by 1 (1 self)
- Add to MetaCart
Abstract—Recently, machine learning classifiers have emerged as a way to predict the existence of a bug in a change made to a source code file. The classifier is first trained on software history data, and then used to predict bugs. Two drawbacks of existing classifier-based bug prediction are potentially insufficient accuracy for practical use, and use of a large number of features. These large numbers of features adversely impact scalability and accuracy of the approach. This paper proposes a feature selection technique applicable to classification-based bug prediction. This technique is applied to predict bugs in software changes, and performance of Naïve Bayes and Support Vector Machine (SVM) classifiers is characterized.
Recalling the “imprecision” of cross-project defect prediction
- In the 20th ACM SIGSOFT FSE
, 2012
"... There has been a great deal of interest in defect prediction: using prediction models trained on historical data to help focus quality-control resources in ongoing development. Since most new projects don’t have historical data, there is interest in cross-project prediction: using data from one proj ..."
Abstract
-
Cited by 1 (1 self)
- Add to MetaCart
There has been a great deal of interest in defect prediction: using prediction models trained on historical data to help focus quality-control resources in ongoing development. Since most new projects don’t have historical data, there is interest in cross-project prediction: using data from one project to predict defects in another. Sadly, results in this area have largely been disheartening. Most experiments in cross-project defect prediction report poor performance, using the standard measures of precision, recall and F-score. We argue that these IR-based measures, while broadly applicable, are not as well suited for the quality-control settings in which defect prediction models are used. Specifically, these measures are taken at specific threshold settings (typically thresholds of the predicted probability of defectiveness returned by a logistic regression model). However, in practice, software quality control processes choose from a range of time-and-cost vs quality tradeoffs: how many files shall we test? how many shall we inspect? Thus, we argue that measures based on a variety of tradeoffs, viz., 5%, 10 % or 20 % of files tested/inspected would be more suitable. We study cross-project defect prediction from this perspective. We find that cross-project prediction performance is no worse than within-project performance, and substantially better than random prediction!
Change Prediction in Object-Oriented Software Systems: A Probabilistic Approach
, 2008
"... An estimation of change-proneness of parts of a software system is an active topic in the area of software engineering. Such estimates can be used to predict changes to different classes of a system from one release to the next. They can also be used to estimate and possibly reduce the effort requ ..."
Abstract
- Add to MetaCart
An estimation of change-proneness of parts of a software system is an active topic in the area of software engineering. Such estimates can be used to predict changes to different classes of a system from one release to the next. They can also be used to estimate and possibly reduce the effort required during the development and maintenance phase by balancing the amount of developers’ time assigned to each part of a software system. This research work proposes a novel approach to predict changes in an object-oriented software system. The rationale behind this approach is that in a well-designed software system, feature enhancement or corrective maintenance should affect a limited amount of existing code. Our goal is to quantify this aspect of quality by assessing the probability that each class will change in a future generation. Our proposed probabilistic approach uses the dependencies obtained from the UML diagrams, as well as other code metrics extracted from source code of several releases of a software system using reverse engineering techniques. These measures, combined with the change log of the software system and the expected time of next release, are used in an automated manner to predict whether a class will change in the next release of the software system. The proposed systematic approach has been evaluated on a multiversion medium sized open source project namely JFlex, the Fast Scanner Generator for Java. The obtained results indicate the simplicity and accuracy of our approach in the comparison with existing methods referred in the literature.
Oscar Nierstrasz Software Composition Group
"... Abstract—Navigating large software systems, even when using a modern IDE, is difficult, since conceptually related software artifacts are distributed in a huge software space. For most software maintenance tasks, only a small fraction of the entire software space is actually relevant. The IDE, howev ..."
Abstract
- Add to MetaCart
Abstract—Navigating large software systems, even when using a modern IDE, is difficult, since conceptually related software artifacts are distributed in a huge software space. For most software maintenance tasks, only a small fraction of the entire software space is actually relevant. The IDE, however, does not reveal the task relevancy of source artifacts, thus developers cannot easily focus on the artifacts required to accomplish their tasks. SmartGroups help developers to perform software maintenance tasks by representing groups of source artifacts that are relevant for the current task. Relevancy is determined by analyzing historical navigation and modification activities, evolutionary information, and runtime information. The prediction quality of SmartGroups is validated with a benchmark evaluation using recorded development activities and evolutionary information from versioning systems.
The Code Orb – Supporting Contextualized Coding via Ata-Glance Views (NIER Track)
"... While code is typically presented as a flat file to a developer who must change it, this flat file exists within a context that can drastically influence how a developer approaches changing it. While the developer clearly must be careful changing any code, they probably should be yet more careful in ..."
Abstract
- Add to MetaCart
While code is typically presented as a flat file to a developer who must change it, this flat file exists within a context that can drastically influence how a developer approaches changing it. While the developer clearly must be careful changing any code, they probably should be yet more careful in changing code that recently saw major changes, is barely covered by test cases, and was the source of a number of bugs. Contextualized coding refers to the ability of the developer to effectively use such contextual information while they work on some changes. In this paper, we introduce the Code Orb, a contextualized coding tool that builds upon existing mining and analysis techniques to warn developers on a line-by-line basis of the volatility of the code they are working on. The key insight underneath the Code Orb is that it is neither desired nor possible to always present a code’s context in its entirety; instead, it is necessary to provide an abstracted view of the context that informs the developer of which parts of the code they need to pay more attention to. This paper discusses the principles of and rationale behind contextualized coding, introduces the Code Orb, and illustrates its function with example code and context drawn from the Mylyn [11] project.

