Results 1 - 10
of
41
Eclat: Automatic generation and classification of test inputs
- In 19th European Conference Object-Oriented Programming
, 2005
"... Abstract. This paper describes a technique that selects, from a large set of test inputs, a small subset likely to reveal faults in the software under test. The technique takes a program or software component, plus a set of correct executions— say, from observations of the software running properly, ..."
Abstract
-
Cited by 91 (12 self)
- Add to MetaCart
Abstract. This paper describes a technique that selects, from a large set of test inputs, a small subset likely to reveal faults in the software under test. The technique takes a program or software component, plus a set of correct executions— say, from observations of the software running properly, or from an existing test suite that a user wishes to enhance. The technique first infers an operational model of the software’s operation. Then, inputs whose operational pattern of execution differs from the model in specific ways are suggestive of faults. These inputs are further reduced by selecting only one input per operational pattern. The result is a small portion of the original inputs, deemed by the technique as most likely to reveal faults. Thus, the technique can also be seen as an error-detection technique. The paper describes two additional techniques that complement test input selection. One is a technique for automatically producing an oracle (a set of assertions) for a test input from the operational model, thus transforming the test input into a test case. The other is a classification-guided test input generation technique that also makes use of operational models and patterns. When generating inputs, it filters out code sequences that are unlikely to contribute to legal inputs, improving the efficiency of its search for fault-revealing inputs. We have implemented these techniques in the Eclat tool, which generates unit tests for Java classes. Eclat’s input is a set of classes to test and an example program execution—say, a passing test suite. Eclat’s output is a set of JUnit test cases, each containing a potentially fault-revealing input and a set of assertions at least one of which fails. In our experiments, Eclat successfully generated inputs that exposed fault-revealing behavior; we have used Eclat to reveal real errors in programs. The inputs it selects as fault-revealing are an order of magnitude as likely to reveal a fault as all generated inputs. 1
Improved error reporting for software that uses black-box components
- In Proceedings of the Conference on Programming Language Design and Implementation 2007
, 2007
"... An error occurs when software cannot complete a requested action as a result of some problem with its input, configuration, or environment. A high-quality error report allows a user to understand and correct the problem. Unfortunately, the quality of error reports has been decreasing as software bec ..."
Abstract
-
Cited by 22 (6 self)
- Add to MetaCart
An error occurs when software cannot complete a requested action as a result of some problem with its input, configuration, or environment. A high-quality error report allows a user to understand and correct the problem. Unfortunately, the quality of error reports has been decreasing as software becomes more complex and layered. End-users take the cryptic error messages given to them by programs and struggle to fix their problems using search engines and support websites. Developers cannot improve their error messages when they receive an ambiguous or otherwise insufficient error indicator from a black-box software component. We introduce Clarify, a system that improves error reporting by classifying application behavior. Clarify uses minimally invasive monitoring to generate a behavior profile, which is a summary of the program’s execution history. A machine learning classifier uses the behavior profile to classify the application’s behavior,
Applying classification techniques to remotely-collected program execution data
- IN PROCEEDINGS OF THE EUROPEAN SOFTWARE ENGINEERING CONFERENCE AND ACM SIGSOFT SYMPOSIUM ON THE FOUNDATIONS OF SOFTWARE ENGINEERING (ESEC/FSE 2005
, 2005
"... There is an increasing interest in techniques that support measurement and analysis of fielded software systems. One of the main goals of these techniques is to better understand how software actually behaves in the field. In particular, many of these techniques require a way to distinguish, in the ..."
Abstract
-
Cited by 16 (5 self)
- Add to MetaCart
There is an increasing interest in techniques that support measurement and analysis of fielded software systems. One of the main goals of these techniques is to better understand how software actually behaves in the field. In particular, many of these techniques require a way to distinguish, in the field, failing from passing executions. So far, researchers and practitioners have only partially addressed this problem: they have simply assumed that program failure status is either obvious (i.e., the program crashes) or provided by an external source (e.g., the users). In this paper, we propose a technique for automatically classifying execution data, collected in the field, as coming from either passing or failing program runs. (Failing program runs are executions that terminate with a failure, such as a wrong outcome.) We use statistical learning algorithms to build the classification models. Our approach builds the models by analyzing executions performed in a controlled environment (e.g., test cases run in-house) and then uses the models to predict whether execution data produced by a fielded instance were generated by a passing or failing program execution. We also present results from an initial feasibility study, based on multiple versions of a software subject, in which we investigate several issues vital to the applicability of the technique. Finally, we present some lessons learned regarding the interplay between the reliability of classification models and the amount and type of data collected.
Context-Aware Statistical Debugging: From Bug Predictors to Faulty Control Flow Paths
- ASE'07
, 2007
"... Effective bug localization is important for realizing automated debugging. One attractive approach is to apply statistical techniques on a collection of evaluation profiles of program properties to help localize bugs. Previous research has proposed various specialized techniques to isolate certain p ..."
Abstract
-
Cited by 12 (0 self)
- Add to MetaCart
Effective bug localization is important for realizing automated debugging. One attractive approach is to apply statistical techniques on a collection of evaluation profiles of program properties to help localize bugs. Previous research has proposed various specialized techniques to isolate certain program predicates as bug predictors. However, because many bugs may not be directly associated with these predicates, these techniques are often ineffective in localizing bugs. Relevant control flow paths that may contain bug locations are more informative than stand-alone predicates for discovering and understanding bugs. In this paper, we propose an approach to automatically generate such faulty control flow paths that link many bug predictors together for revealing bugs. Our approach combines feature selection (to accurately select failure-related predicates as bug predictors), clustering (to group correlated predicates), and control flow graph traversal in a novel way to help generate the paths. We have evaluated our approach on code including the Siemens test suite and rhythmbox (a large music management application for GNOME). Our experiments show that the faulty control flow paths are accurate, useful for localizing many bugs, and helped to discover previously unknown errors in rhythmbox.
MATRIX: Maintenance-Oriented Testing Requirements Identifier and Examiner
"... This paper presents a new test-suite augmentation technique for use in regression testing of software. Our technique combines dependence analysis and symbolic evaluation and uses information about the changes between two versions of a program to (1) identify parts of the program affected by the chan ..."
Abstract
-
Cited by 10 (6 self)
- Add to MetaCart
This paper presents a new test-suite augmentation technique for use in regression testing of software. Our technique combines dependence analysis and symbolic evaluation and uses information about the changes between two versions of a program to (1) identify parts of the program affected by the changes, (2) compute the conditions under which the effects of the changes are propagated to such parts, and (3) create a set of testing requirements based on the computed information. Testers can use these requirements to assess the effectiveness of the regression testing performed so far and to guide the selection of new test cases. The paper also presents MATRIX, a tool that partially implements our technique, and its integration into a regression-testing environment. Finally, the paper presents a preliminary empirical study performed on two small programs. The study provides initial evidence of both the effectiveness of our technique and the shortcomings of previous techniques in assessing the adequacy of a test suite with respect to exercising the effect of program changes. 1
Failure proximity: A fault localization-based approach
- In Proceedings of the International Symposium on the Foundations of Software Engineering
, 2006
"... Recent software systems usually feature an automated failure reporting system, with which a huge number of failing traces are collected every day. In order to prioritize fault diagnosis, failing traces due to the same fault are expected to be grouped together. Previous methods, by hypothesizing that ..."
Abstract
-
Cited by 9 (2 self)
- Add to MetaCart
Recent software systems usually feature an automated failure reporting system, with which a huge number of failing traces are collected every day. In order to prioritize fault diagnosis, failing traces due to the same fault are expected to be grouped together. Previous methods, by hypothesizing that similar failing traces imply the same fault, cluster failing traces based on the literal trace similarity, which we call trace proximity. However, since a fault can be triggered in many ways, failing traces due to the same fault can be quite different. Therefore, previous methods actually group together traces exhibiting similar behaviors, like similar branch coverage, rather than traces due to the same fault. In this paper, we propose a new type of failure proximity, called R-Proximity, which regards two failing traces as similar if they suggest roughly the same fault location. The fault location each failing case suggests is automatically obtained with Sober, an existing statistical debugging tool. We show that with R-Proximity, failing traces due to the same fault can be grouped together. In addition, we find that R-Proximity is helpful for statistical debugging: It can help developers interpret and utilize the statistical debugging result. We illustrate the usage of R-Proximity with a case study on the grep program and some experiments on the Siemens suite, and the result clearly demonstrates the advantage of R-Proximity over trace proximity.
Classification of software behaviors for failure detection: A discriminative pattern mining approach
- In KDD
, 2009
"... Software is a ubiquitous component of our daily life. We often depend on the correct working of software systems. Due to the difficulty and complexity of software systems, bugs and anomalies are prevalent. Bugs have caused billions of dollars loss, in addition to privacy and security threats. In thi ..."
Abstract
-
Cited by 9 (5 self)
- Add to MetaCart
Software is a ubiquitous component of our daily life. We often depend on the correct working of software systems. Due to the difficulty and complexity of software systems, bugs and anomalies are prevalent. Bugs have caused billions of dollars loss, in addition to privacy and security threats. In this work, we address software reliability issues by proposing a novel method to classify software behaviors based on past history or runs. With the technique, it is possible to generalize past known errors and mistakes to capture failures and anomalies. Our technique first mines a set of discriminative features capturing repetitive series of events from program execution traces. It then performs feature selection to select the best features for classification. These features are then used to train a classifier to detect failures. Experiments and case studies on traces of several benchmark software systems and a real-life concurrency bug from MySQL server show the utility of the technique in capturing failures and anomalies. On average, our pattern-based classification technique outperforms the baseline approach by 24.68 % in accuracy 1.
On-line anomaly detection of deployed software: a statistical machine learning approach
- In Proc. of the 3rd International Workshop on Software Quality Assurance
, 2006
"... This paper presents a new machine-learning technique that performs anomaly detection as software is executing in the field. The technique uses a fully observable Markov model where each state in the model emits a number of distinct observations according to a probability distribution, and estimates ..."
Abstract
-
Cited by 8 (0 self)
- Add to MetaCart
This paper presents a new machine-learning technique that performs anomaly detection as software is executing in the field. The technique uses a fully observable Markov model where each state in the model emits a number of distinct observations according to a probability distribution, and estimates the model parameters using the Baum-Welch algorithm. The trained model is then deployed with the software to perform anomaly detection. By performing the anomaly detection as the software is executing, faults associated with anomalies can be located and fixed before they cause critical failures in the system, and developers time to debug deployed software can be reduced. This paper also presents a prototype implementation of our technique, along with a case study that shows, for the subjects we studied, the effectiveness of the technique.
Techniques for classifying executions of deployed software to support software engineering tasks
- IEEE TRANSACTIONS ON SOFTWARE ENGINEERING
, 2007
"... There is an increasing interest in techniques that support analysis and measurement of fielded software systems. These techniques typically deploy numerous instrumented instances of a software system, collect execution data when the instances run in the field, and analyze the remotely collected data ..."
Abstract
-
Cited by 8 (2 self)
- Add to MetaCart
There is an increasing interest in techniques that support analysis and measurement of fielded software systems. These techniques typically deploy numerous instrumented instances of a software system, collect execution data when the instances run in the field, and analyze the remotely collected data to better understand the system’s in-the-field behavior. One common need for these techniques is the ability to distinguish execution outcomes (e.g., to collect only data corresponding to some behavior or to determine how often and under which condition a specific behavior occurs). Most current approaches, however, do not perform any kind of classification of remote executions and either focus on easily observable behaviors (e.g., crashes) or assume that outcomes’ classifications are externally provided (e.g., by the users). To address the limitations of existing approaches, we have developed three techniques for automatically classifying execution data as belonging to one of several classes. In this paper, we introduce our techniques and apply them to the binary classification of passing and failing behaviors. Our three techniques impose different overheads on program instances and, thus, each is appropriate for different application scenarios. We performed several empirical studies to evaluate and refine our techniques and to investigate the trade-offs among them. Our results show that 1) the first technique can build very accurate models, but requires a complete set of execution data; 2) the second technique produces slightly less accurate models, but needs only a small fraction of the total execution data; and 3) the third technique allows for even further cost reductions by building the models incrementally, but requires some sequential ordering of the software instances ’ instrumentation.
Automatic Goal-Oriented Classification of Failure Behaviors for Testing XML-Based Multimedia Software Applications: an Experimental Case Study
- Journal of Systems and Software
, 2006
"... When testing multimedia software applications, we need to overcome important issues such as the forbidding size of the input domains, great difficulties in repeating non-deterministic test outcomes, and the test oracle problem. A statistical testing methodology is proposed. It applies pattern classi ..."
Abstract
-
Cited by 7 (6 self)
- Add to MetaCart
When testing multimedia software applications, we need to overcome important issues such as the forbidding size of the input domains, great difficulties in repeating non-deterministic test outcomes, and the test oracle problem. A statistical testing methodology is proposed. It applies pattern classification techniques enhanced with the notion of test dimensions. Test dimensions are orthogonal properties of associated test cases. Temporal properties are being studied in the experimentation in this paper. For each test dimension, a pattern classifier is trained on the normal and abnormal behaviors. A type of failure is said to be classified if it is recognized by the classifier. Test cases can then be analyzed by the failure pattern recognizers. Experiments show that some test dimensions are more effective than others in failure identification and classification.

