Results 1 - 10
of
349
KLEE: Unassisted and Automatic Generation of High-Coverage Tests for Complex Systems Programs
"... We present a new symbolic execution tool, KLEE, capable of automatically generating tests that achieve high coverage on a diverse set of complex and environmentally-intensive programs. We used KLEE to thoroughly check all 89 stand-alone programs in the GNU COREUTILS utility suite, which form the cor ..."
Abstract
-
Cited by 557 (15 self)
- Add to MetaCart
(Show Context)
We present a new symbolic execution tool, KLEE, capable of automatically generating tests that achieve high coverage on a diverse set of complex and environmentally-intensive programs. We used KLEE to thoroughly check all 89 stand-alone programs in the GNU COREUTILS utility suite, which form the core user-level environment installed on millions of Unix systems, and arguably are the single most heavily tested set of open-source programs in existence. KLEE-generated tests achieve high line coverage — on average over 90% per tool (median: over 94%) — and significantly beat the coverage of the developers’ own hand-written test suite. When we did the same for 75 equivalent tools in the BUSYBOX embedded system suite, results were even better, including 100 % coverage on 31 of them. We also used KLEE as a bug finding tool, applying it to 452 applications (over 430K total lines of code), where it found 56 serious bugs, including three in COREUTILS that had been missed for over 15 years. Finally, we used KLEE to crosscheck purportedly identical BUSYBOX and COREUTILS utilities, finding functional correctness errors and a myriad of inconsistencies.
Automated Whitebox Fuzz Testing
"... Fuzz testing is an effective technique for finding security vulnerabilities in software. Traditionally, fuzz testing tools apply random mutations to well-formed inputs of a program and test the resulting values. We present an alternative whitebox fuzz testing approach inspired by recent advances in ..."
Abstract
-
Cited by 311 (25 self)
- Add to MetaCart
(Show Context)
Fuzz testing is an effective technique for finding security vulnerabilities in software. Traditionally, fuzz testing tools apply random mutations to well-formed inputs of a program and test the resulting values. We present an alternative whitebox fuzz testing approach inspired by recent advances in symbolic execution and dynamic test generation. Our approach records an actual run of the program under test on a well-formed input, symbolically evaluates the recorded trace, and gathers constraints on inputs capturing how the program uses these. The collected constraints are then negated one by one and solved with a constraint solver, producing new inputs that exercise different control paths in the program. This process is repeated with the help of a code-coverage maximizing heuristic designed to find defects as fast as possible. We have implemented this algorithm in SAGE (Scalable, Automated, Guided Execution), a new tool employing x86 instruction-level tracing and emulation for whitebox fuzzing of arbitrary file-reading Windows applications. We describe key optimizations needed to make dynamic test generation scale to large input files and long execution traces with hundreds of millions of instructions. We then present detailed experiments with several Windows applications. Notably, without any format-specific knowledge, SAGE detects the MS07-017 ANI vulnerability, which was missed by extensive blackbox fuzzing and static analysis tools. Furthermore, while still in an early stage of development, SAGE has already discovered 30+ new bugs in large shipped Windows applications including image processors, media players, and file decoders. Several of these bugs are potentially exploitable memory access violations.
A decision procedure for bit-vectors and arrays
- IN COMPUTER AIDED VERIFICATION, NUMBER 4590 IN LNCS
, 2007
"... STP is a decision procedure for the satisfiability of quantifier-free formulas in the theory of bit-vectors and arrays that has been optimized for large problems encountered in software analysis applications. The basic architecture of the procedure consists of word-level pre-processing algorithms fo ..."
Abstract
-
Cited by 190 (11 self)
- Add to MetaCart
STP is a decision procedure for the satisfiability of quantifier-free formulas in the theory of bit-vectors and arrays that has been optimized for large problems encountered in software analysis applications. The basic architecture of the procedure consists of word-level pre-processing algorithms followed by translation to SAT. The primary bottlenecks in software verification and bug finding applications are large arrays and linear bit-vector arithmetic. New algorithms based on the abstraction-refinement paradigm are presented for reasoning about large arrays. A solver for bit-vector linear arithmetic is presented that eliminates variables and parts of variables to enable other transformations, and reduce the size of the problem that is eventually received by the SAT solver. These and other algorithms have been implemented in STP, which has been heavily tested over thousands of examples obtained from several real-world applications. Experimental results indicate that the above mix of algorithms along with the overall architecture is far more effective, for a variety of applications, than a direct translation of the original formula to SAT or other comparable decision procedures.
Exploring multiple execution paths for malware analysis
- In Security and Privacy, 2007. SP ’07. IEEE Symposium on
, 2007
"... Malicious code (or malware) is defined as software that fulfills the deliberately harmful intent of an attacker. Malware analysis is the process of determining the behavior and purpose of a given malware sample (such as a virus, worm, or Trojan horse). This process is a necessary step to be able to ..."
Abstract
-
Cited by 151 (13 self)
- Add to MetaCart
(Show Context)
Malicious code (or malware) is defined as software that fulfills the deliberately harmful intent of an attacker. Malware analysis is the process of determining the behavior and purpose of a given malware sample (such as a virus, worm, or Trojan horse). This process is a necessary step to be able to develop effective detection techniques and removal tools. Currently, malware analysis is mostly a manual process that is tedious and time-intensive. To mitigate this problem, a number of analysis tools have been proposed that automatically extract the behavior of an unknown program by executing it in a restricted environment and recording the operating system calls that are invoked. The problem of dynamic analysis tools is that only a single program execution is observed. Unfortunately, however, it is possible that certain malicious actions are only triggered under specific circumstances (e.g., on a particular day, when a certain file is present, or when a certain command is received). In this paper, we propose a system that allows us to explore multiple execution paths and identify malicious actions that are executed only when certain conditions are met. This enables us to automatically extract a more complete view of the program under analysis and identify under which circumstances suspicious actions are carried out. Our experimental results demonstrate that many malware samples show different behavior depending on input read from the environment. Thus, by exploring multiple execution paths, we can obtain a more complete picture of their actions. 1
Output-deterministic replay for multicore debugging
, 2009
"... Reproducing bugs is hard. Deterministic replay systems address this problem by providing a high-fidelity replica of an original program run that can be repeatedly executed to zero-in on bugs. Unfortunately, existing replay systems for multiprocessor programs fall short. These systems either incur hi ..."
Abstract
-
Cited by 127 (6 self)
- Add to MetaCart
(Show Context)
Reproducing bugs is hard. Deterministic replay systems address this problem by providing a high-fidelity replica of an original program run that can be repeatedly executed to zero-in on bugs. Unfortunately, existing replay systems for multiprocessor programs fall short. These systems either incur high overheads, rely on non-standard multiprocessor hardware, or fail to reliably reproduce executions. Their primary stumbling block is data races – a source of nondeterminism that must be captured if executions are to be faithfully reproduced. In this paper, we present ODR–a software-only replay system that reproduces bugs and provides low-overhead multiprocessor recording. The key observation behind ODR is that, for debugging purposes, a replay system does not need to generate a high-fidelity replica of the original execution. Instead, it suffices to produce any execution that exhibits the same outputs as the original. Guided by this observation, ODR relaxes its fidelity guarantees to avoid the problem of reproducing data-races altogether. The result is a system that replays real multiprocessor applications, such as Apache, MySQL, and the Java Virtual Machine, and provides low record-mode overhead. Categories andSubjectDescriptors D.2.5 [Testing and Debugging]: Debugging aids
HAMPI: A Solver for String Constraints
, 2009
"... Many automatic testing, analysis, and verification techniques for programs can be effectively reduced to a constraint-generation phase followed by a constraint-solving phase. This separation of concerns often leads to more effective and maintainable tools. The increasing efficiency of off-the-shelf ..."
Abstract
-
Cited by 101 (19 self)
- Add to MetaCart
(Show Context)
Many automatic testing, analysis, and verification techniques for programs can be effectively reduced to a constraint-generation phase followed by a constraint-solving phase. This separation of concerns often leads to more effective and maintainable tools. The increasing efficiency of off-the-shelf constraint solvers makes this approach even more compelling. However, there are few effective and sufficiently expressive off-the-shelf solvers for string constraints generated by analysis techniques for string-manipulating programs. We designed and implemented Hampi, a solver for string constraints over fixed-size string variables. Hampi constraints express membership in regular languages and fixed-size context-free languages. Hampi constraints may contain context-free-language definitions, regular-language definitions and operations, and the membership predicate. Given a set of constraints, Hampi outputs a string that satisfies all the constraints, or reports that the constraints are unsatisfiable. Hampi is expressive and efficient, and can be successfully applied to testing and analysis of real programs. Our experiments use Hampi in: static and dynamic analyses for finding SQL injection vulnerabilities in Web applications; automated bug finding in C programs using systematic testing; and compare Hampi with another string solver. Hampi’s source code, documentation, and the experimental data are available at
Grammar-based Whitebox Fuzzing
"... Whitebox fuzzing is a form of automatic dynamic test generation, based on symbolic execution and constraint solving, designed for security testing of large applications. However, the effectiveness of whitebox fuzzing is limited when testing applications with highly-structured inputs, such as compile ..."
Abstract
-
Cited by 96 (8 self)
- Add to MetaCart
Whitebox fuzzing is a form of automatic dynamic test generation, based on symbolic execution and constraint solving, designed for security testing of large applications. However, the effectiveness of whitebox fuzzing is limited when testing applications with highly-structured inputs, such as compilers and interpreters. These applications process their inputs in stages, such as lexing, parsing and evaluation. Due to the enormous number of control paths in the early processing stages, whitebox fuzzing rarely reaches parts of the application beyond the first stages. In this paper, we study how to enhance whitebox fuzzing of complex structured-input applications with a grammarbased specification of their valid inputs. We present a novel dynamic test generation algorithm where symbolic execution directly generates grammar-based constraints whose satisfiability is checked using a custom grammar-based constraint solver. We have implemented this algorithm and evaluated it on a large security-critical application, the JavaScript interpreter of Internet Explorer 7. Results of experiments show that grammar-based whitebox fuzzing explores deeper program paths and avoids dead-ends due to non-parsable inputs. Compared to regular whitebox fuzzing, grammar-based whitebox fuzzing increased coverage of the code generation module of the IE7 JavaScript interpreter from 53 % to 81% while using three times fewer tests.
DSD-Crasher: A hybrid analysis tool for bug finding
- In ISSTA
, 2006
"... DSD-Crasher is a bug finding tool that follows a three-step approach to program analysis: D. Capture the program’s intended execution behavior with dynamic invariant detection. The derived invariants exclude many unwanted values from the program’s input domain. S. Statically analyze the program with ..."
Abstract
-
Cited by 90 (5 self)
- Add to MetaCart
(Show Context)
DSD-Crasher is a bug finding tool that follows a three-step approach to program analysis: D. Capture the program’s intended execution behavior with dynamic invariant detection. The derived invariants exclude many unwanted values from the program’s input domain. S. Statically analyze the program within the restricted input domain to explore many paths. D. Automatically generate test cases that focus on reproducing the predictions of the static analysis. Thereby confirmed results are feasible. This three-step approach yields benefits compared to past two-step combinations in the literature. In our evaluation with third-party applications, we demonstrate higher precision over tools that lack a dynamic step and higher efficiency over tools that lack a static step.
An attack surface metric
, 2008
"... Abstract—Measurement of software security is a long standing challenge to the research community. At the same time, practical security metrics and measurements are essential for secure software development. Hence the need for metrics is more pressing now due to a growing demand for secure software. ..."
Abstract
-
Cited by 89 (4 self)
- Add to MetaCart
(Show Context)
Abstract—Measurement of software security is a long standing challenge to the research community. At the same time, practical security metrics and measurements are essential for secure software development. Hence the need for metrics is more pressing now due to a growing demand for secure software. In this paper, we propose to use a software system’s attack surface measurement as an indicator of the system’s security. We formalize the notion of a system’s attack surface and introduce an attack surface metric to measure the attack surface in a systematic manner. Our measurement method is agnostic to a software system’s implementation language and is applicable to systems of all sizes; we demonstrate our method by measuring the attack surfaces of small desktop applications and large enterprise systems implemented in C and Java. We conducted three exploratory empirical studies to validate our method. Software developers can mitigate their software’s security risk by measuring and reducing their software’s attack surfaces. Our attack surface reduction approach complements software industry’s traditional code quality improvement approach for security risk mitigation and is useful in multiple phases of the software development lifecycle. Our collaboration with SAP demonstrates the use of our metric in the software development process.
Heuristics for scalable dynamic test generation
, 2008
"... Abstract—Recently there has been great success in using symbolic execution to automatically generate test inputs for small software systems. A primary challenge in scaling such approaches to larger programs is the combinatorial explosion of the path space. It is likely that sophisticated strategies ..."
Abstract
-
Cited by 82 (2 self)
- Add to MetaCart
(Show Context)
Abstract—Recently there has been great success in using symbolic execution to automatically generate test inputs for small software systems. A primary challenge in scaling such approaches to larger programs is the combinatorial explosion of the path space. It is likely that sophisticated strategies for searching this path space are needed to generate inputs that effectively test large programs (by, e.g., achieving significant branch coverage). We present several such heuristic search strategies, including a novel strategy guided by the control flow graph of the program under test. We have implemented these strategies in CREST, our open source concolic testing tool for C, and evaluated them on two widely-used software tools, grep 2.2 (15K lines of code) and Vim 5.7 (150K lines). On these benchmarks, the presented heuristics achieve significantly greater branch coverage on the same testing budget than concolic testing with a traditional depth-first search strategy. I.