PR-Miner: automatically extracting implicit programming rules and detecting violations in large software code
- In Proc. 10th European Software Engineering Conference held jointly with 13th ACM SIGSOFT International Symposium on Foundations of Software Engineering (ESEC/FSE’05), 2005
Cited by 87 (9 self)
Programs usually follow many implicit programming rules, most of which are too tedious to be documented by programmers. When these rules are violated by programmers who are unaware of or forget about them, defects can be easily introduced. Therefore, it is highly desirable to have tools to automatically extract such rules and also to automatically detect violations. Previous work in this direction focuses on simple function-pair based programming rules and additionally requires programmers to provide rule templates. This paper proposes a general method called PR-Miner that uses a data mining technique called frequent itemset mining to efficiently extract implicit programming rules from large software code written in an industrial programming language such as C, requiring little effort from programmers and no prior knowledge of the software. Benefiting from frequent itemset mining, PR-Miner can extract programming
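The idea of mining rules from co-occurring program elements can be sketched as follows. This is a minimal illustration, not PR-Miner's actual algorithm: it enumerates itemsets by brute force, and the function names, the `calls_per_function` data, and the `lock`/`unlock` rule are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=2):
    """Count every itemset (up to max_size items) and keep the frequent ones.

    Brute-force enumeration for illustration only; real miners prune the
    search space instead of counting every combination.
    """
    counts = Counter()
    for items in transactions.values():
        for size in range(1, max_size + 1):
            for combo in combinations(sorted(items), size):
                counts[combo] += 1
    return {itemset: n for itemset, n in counts.items() if n >= min_support}


def violations(transactions, antecedent, consequent):
    """Names of transactions that contain `antecedent` but lack `consequent`."""
    return sorted(
        name
        for name, items in transactions.items()
        if antecedent in items and consequent not in items
    )


# Hypothetical input: each function body reduced to the set of calls it makes.
calls_per_function = {
    "f1": {"lock", "unlock", "read"},
    "f2": {"lock", "unlock", "write"},
    "f3": {"lock", "unlock"},
    "f4": {"lock", "read"},
}

frequent = frequent_itemsets(calls_per_function, min_support=3)
rule_violations = violations(calls_per_function, "lock", "unlock")
```

Here `("lock", "unlock")` is frequent, which suggests the candidate rule "a function that calls `lock` also calls `unlock`"; `f4` is the one function that breaks it and would be reported for inspection.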
AccMon: Automatically Detecting Memory-related Bugs via Program Counter-based Invariants
- In 37th International Symposium on Microarchitecture (MICRO), 2004
Cited by 47 (10 self)
This paper makes two contributions to architectural support for software debugging. First, it proposes a novel statistics-based, on-the-fly bug detection method called PC-based invariant detection. The idea is based on the observation that, in most programs, a given memory location is typically accessed by only a few instructions. Therefore, by capturing the invariant of the set of PCs that normally access a given variable, we can detect accesses by outlier instructions, which are often caused by memory corruption, buffer overflow, stack smashing or other memory-related bugs. Since this method is statistics-based, it can detect bugs that do not violate any programming rules and that, therefore, are likely to be missed by many existing tools. The second contribution is a novel architectural extension called the Check Look-aside Buffer (CLB). The CLB uses a Bloom filter to reduce monitoring overheads in the recently proposed iWatcher architectural framework for software debugging. The CLB significantly reduces the overhead of PC-based invariant debugging.
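The PC-based invariant described above can be illustrated in software. The sketch below assumes a trace is a list of `(pc, address)` pairs; the class name `PCInvariantMonitor` and the sample addresses are hypothetical, and the hardware side (the CLB and its Bloom filter) is not modeled.

```python
class PCInvariantMonitor:
    """Learn which program counters touch each address, then flag outliers."""

    def __init__(self):
        # address -> set of PCs observed accessing it during training
        self.allowed = {}

    def train(self, trace):
        for pc, addr in trace:
            self.allowed.setdefault(addr, set()).add(pc)

    def check(self, trace):
        """Return accesses to known addresses made by previously unseen PCs."""
        return [
            (pc, addr)
            for pc, addr in trace
            if addr in self.allowed and pc not in self.allowed[addr]
        ]


monitor = PCInvariantMonitor()
monitor.train([(0x400, 0x1000), (0x404, 0x1000), (0x500, 0x2000)])

# The access from PC 0x7F0 to address 0x1000 was never seen during training.
outliers = monitor.check([(0x400, 0x1000), (0x7F0, 0x1000), (0x500, 0x2000)])
```

An outlier is only a statistical hint: a rarely executed but legitimate instruction would also be flagged, so a real detector needs some way to tolerate or filter false positives.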
Bugbench: Benchmarks for evaluating bug detection tools
- In Workshop on the Evaluation of Software Defect Detection Tools, 2005
Cited by 33 (3 self)
Benchmarking provides an effective way to evaluate different tools. Unfortunately, so far there is no good benchmark suite to systematically evaluate software bug detection tools. As a result, it is difficult to quantitatively compare the strengths and limitations of existing or newly proposed bug detection tools. In this paper, we share our experience of building a bug benchmark suite called BugBench. Specifically, we first summarize the general guidelines on the criteria for selecting representative bug benchmarks, and the metrics for evaluating a bug detection tool. Second, we present a set of buggy applications collected by us, with various types of software bugs. Third, we conduct a preliminary study on the application and bug characteristics in the context of software bug detection. Finally, we evaluate several existing bug detection tools including Purify, Valgrind, and CCured to validate the selection of our benchmarks.
Deckard: Scalable and accurate tree-based detection of code clones
- In ICSE, 2007
Cited by 29 (3 self)
Detecting code clones has many software engineering applications. Existing approaches either do not scale to large code bases or are not robust against minor code modifications. In this paper, we present an efficient algorithm for identifying similar subtrees and apply it to tree representations of source code. Our algorithm is based on a novel characterization of subtrees with numerical vectors in the Euclidean space R^n and an efficient algorithm to cluster these vectors w.r.t. the Euclidean distance metric. Subtrees with vectors in one cluster are considered similar. We have implemented our tree similarity algorithm as a clone detection tool called DECKARD and evaluated it on large code bases written in C and Java including the Linux kernel and JDK. Our experiments show that DECKARD is both scalable and accurate. It is also language independent, applicable to any language with a formally specified grammar.
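The vector characterization of subtrees can be sketched with Python's standard `ast` module. This is an assumption-laden illustration, not DECKARD itself: it uses node-type counts as the vector, compares whole functions only, and measures distance pairwise rather than using the paper's clustering algorithm. The sample functions are hypothetical.

```python
import ast
import math
from collections import Counter


def characteristic_vector(node):
    """Count how often each AST node type occurs in the subtree rooted at node."""
    return Counter(type(child).__name__ for child in ast.walk(node))


def euclidean(u, v):
    """Euclidean distance between two sparse count vectors."""
    keys = set(u) | set(v)
    return math.sqrt(sum((u[k] - v[k]) ** 2 for k in keys))


SOURCE = '''
def total(xs):
    s = 0
    for x in xs:
        s += x
    return s

def add_all(values):
    acc = 0
    for v in values:
        acc += v
    return acc

def greet(name):
    print("hello", name)
'''

tree = ast.parse(SOURCE)
vectors = {
    f.name: characteristic_vector(f)
    for f in tree.body
    if isinstance(f, ast.FunctionDef)
}

clone_distance = euclidean(vectors["total"], vectors["add_all"])
other_distance = euclidean(vectors["total"], vectors["greet"])
```

`total` and `add_all` differ only in identifier names, so their vectors coincide and the distance is zero, while `greet` has a different structure and lands farther away. Pairwise comparison is quadratic in the number of subtrees, which is why a scalable tool needs a clustering step instead.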
Have Things Changed Now? An Empirical Study of Bug Characteristics in Modern Open Source Software
- Proc. of 1st Workshop on Architectural and System Support for Improving Software Dependability, 2006
Cited by 29 (4 self)
Software errors are a major cause of system failures. Effectively designing tools and support for detecting and recovering from software failures requires a deep understanding of bug characteristics. Recently, software and its development process have changed significantly in many ways, including more help from bug detection tools, a shift towards multi-threading architecture, the open-source development paradigm and increasing concerns about security and user-friendly interfaces. Therefore, results from previous studies may not be applicable to present software. Furthermore, many new aspects such as security, concurrency and open-source-related characteristics have not been well studied. Additionally, previous studies were based on a small number of bugs, which may lead to non-representative results. To investigate the impacts of the new factors on software errors,
Scalable Detection of Semantic Clones
Cited by 17 (1 self)
Several techniques have been developed for identifying similar code fragments in programs. These similar fragments, referred to as code clones, can be used to identify redundant code, locate bugs, or gain insight into program design. Existing scalable approaches to clone detection are limited to finding program fragments that are similar only in their contiguous syntax. Other, semantics-based approaches are more resilient to differences in syntax, such as reordered statements, related statements interleaved with other unrelated statements, or the use of semantically equivalent control structures. However, none of these techniques have scaled to real world code bases. These approaches capture semantic information from Program Dependence Graphs (PDGs), program representations that encode data and control dependencies between statements and predicates. Our definition of a code clone is also based on this representation: we consider program fragments with isomorphic PDGs to be clones. In this paper, we present the first scalable clone detection algorithm based on this definition of semantic clones. Our insight is the reduction of the difficult graph similarity problem to a simpler tree similarity problem by mapping carefully selected PDG subgraphs to their related structured syntax. We efficiently solve the tree similarity problem to create a scalable analysis. We have implemented this algorithm in a practical tool and performed evaluations on several million-line open source projects, including the Linux kernel. Compared with previous approaches, our tool locates significantly more clones, which are often more semantically interesting than simple copied and pasted code fragments.
HeapMD: Identifying Heap-based Bugs using Anomaly Detection
- In International Conference on Architectural Support for Programming Languages and Operating Systems, 2006
Cited by 8 (0 self)
We present the design, implementation, and evaluation of HeapMD, a dynamic analysis tool that finds heap-based bugs using anomaly detection. HeapMD is based upon the observation that, in spite of the evolving nature of the heap, several of its properties remain stable. HeapMD uses this observation in a novel way: periodically, during the execution of the program, it computes a suite of metrics which are sensitive to the state of the heap. These metrics track heap behavior, and the stability of the heap reflects quantitatively in the values of these metrics. The “normal” ranges of stable metrics, obtained by running a program on multiple inputs, are then treated as indicators of correct behaviour, and are used in conjunction with an anomaly detector to find heap-based bugs. Using HeapMD, we were able to find 40 heap-based bugs, 31 of them previously unknown, in 5 large, commercial applications.
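The range-based check described above can be sketched as follows, assuming each run is already reduced to a dictionary of periodically sampled metric values. The metric name `pct_indegree_1`, the sample numbers, and both function names are hypothetical; the sketch also skips the step of deciding which metrics are stable.

```python
def learn_ranges(training_runs):
    """Record the (min, max) observed for each metric across all training runs.

    Each run maps a metric name to a non-empty list of sampled values.
    """
    ranges = {}
    for run in training_runs:
        for metric, samples in run.items():
            lo, hi = min(samples), max(samples)
            if metric in ranges:
                old_lo, old_hi = ranges[metric]
                lo, hi = min(lo, old_lo), max(hi, old_hi)
            ranges[metric] = (lo, hi)
    return ranges


def find_anomalies(ranges, run):
    """Return (metric, sample_index, value) for samples outside the learned range."""
    anomalies = []
    for metric, samples in run.items():
        if metric not in ranges:
            continue  # no baseline for this metric, so nothing to compare against
        lo, hi = ranges[metric]
        for index, value in enumerate(samples):
            if value < lo or value > hi:
                anomalies.append((metric, index, value))
    return anomalies


# Hypothetical metric: percentage of heap nodes with exactly one incoming pointer.
training = [
    {"pct_indegree_1": [40.0, 42.5, 41.0]},
    {"pct_indegree_1": [39.5, 43.0]},
]
normal_ranges = learn_ranges(training)
anomalies = find_anomalies(normal_ranges, {"pct_indegree_1": [41.0, 55.0, 42.0]})
```

A strict min/max range is the simplest possible detector; with few training inputs it will flag legitimate behavior that the training runs simply never exercised.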
Listening to programmers - Taxonomies and characteristics of comments in operating system code
- In Proc. 31st ICSE, 2009
Cited by 7 (2 self)
Innovations from multiple directions have been proposed to improve software reliability. Unfortunately, many of the innovations are not fully exploited by programmers. To bridge the gap, this paper proposes a new approach to “listen” to thousands of programmers: studying their programming comments. Since comments express programmers’ assumptions and intentions, comments can reveal programmers’ needs, which can provide guidance (1) for language/tool designers on where they should develop new techniques or enhance the usability of existing ones, and (2) for programmers on what problems are most pervasive and important so that they should take initiatives to adopt some existing tools or language extensions. We studied 1050 comments randomly sampled from the latest versions of Linux, FreeBSD, and OpenSolaris. We found that 52.6% of these comments could be leveraged by existing or to-be-proposed tools for improving reliability. Our findings include: (1) many comments describe code relationships, code evolutions, or the usage and meaning of integers and integer macros, (2) a significant number of comments could be expressed by existing annotation languages, and (3) many comments express synchronization-related concerns but are not well supported by annotation languages.
Mining control flow abnormality for logic error isolation
- In Proceedings of 2006 SIAM International Conference on Data Mining (SDM’06), 2006
Cited by 6 (4 self)
Analyzing the executions of a buggy program is essentially a data mining process: Tracing the data generated during program executions may disclose important patterns and outliers that could eventually reveal the location of software errors. In this paper, we investigate program logic errors, which rarely incur memory access violations but generate incorrect outputs. We show that through mining program control flow abnormality, we could isolate many logic errors without knowing the program semantics. In order to detect the control abnormality, we propose a hypothesis testing-like approach that statistically contrasts the evaluation probability of condition statements between correct and incorrect executions. Based on this contrast, we develop two algorithms that effectively rank functions with respect to their likelihood of containing the hidden error. We evaluated these two algorithms on a set of standard test programs, and the result clearly indicates their effectiveness.
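The contrast between correct and incorrect executions can be sketched as below. This is a deliberately simplified stand-in for the paper's hypothesis-testing-like approach: it compares the mean fraction of true evaluations per condition and ranks conditions rather than functions. The predicate names, the counts, and the function names are hypothetical.

```python
def mean_true_fraction(runs, predicate):
    """Average fraction of evaluations in which `predicate` was true.

    Each run maps a predicate name to (times_true, times_evaluated).
    Runs that never evaluated the predicate are ignored.
    """
    fractions = [
        run[predicate][0] / run[predicate][1]
        for run in runs
        if predicate in run and run[predicate][1] > 0
    ]
    return sum(fractions) / len(fractions) if fractions else None


def rank_predicates(passing_runs, failing_runs):
    """Rank predicates by how differently they behave in failing vs passing runs."""
    predicates = set()
    for run in passing_runs + failing_runs:
        predicates.update(run)
    scored = []
    for predicate in predicates:
        passed = mean_true_fraction(passing_runs, predicate)
        failed = mean_true_fraction(failing_runs, predicate)
        if passed is None or failed is None:
            continue  # not observed on both sides, so there is no contrast to measure
        scored.append((predicate, abs(failed - passed)))
    return sorted(scored, key=lambda item: item[1], reverse=True)


passing = [
    {"p1": (5, 10), "p2": (9, 10)},
    {"p1": (6, 10), "p2": (8, 10)},
]
failing = [{"p1": (5, 10), "p2": (1, 10)}]

ranking = rank_predicates(passing, failing)
```

In this data `p2` is true far less often in the failing run than in the passing ones, so it ranks first as the condition most worth inspecting.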
SoftGUESS: Visualization and Exploration of Code Clones in Context
- In the proceedings of the 29th International Conference on Software Engineering (ICSE’07), Tool Demo, 2007
Cited by 5 (0 self)
We introduce SoftGUESS, a code clone exploration system. SoftGUESS is built on the more general GUESS system which provides users with a mechanism to interactively explore graph structures both through direct manipulation as well as a domain-specific language. We demonstrate SoftGUESS through a number of mini-applications to analyze evolutionary code-clone behavior in software systems. The mini-applications of SoftGUESS represent a novel way of looking at code-clones in the context of many system features. It is our hope that SoftGUESS will form the basis for other analysis tools in the software-engineering domain.

