Results 1 - 7 of 7
Leveraging Existing Instrumentation to Automatically Infer Invariant-Constrained Models
Cited by 6 (4 self)
Computer systems are often difficult to debug and understand. A common way of gaining insight into system behavior is to inspect execution logs and documentation. Unfortunately, manual inspection of logs is an arduous process and documentation is often incomplete and out of sync with the implementation. This paper presents Synoptic, a tool that helps developers by inferring a concise and accurate system model. Unlike most related work, Synoptic does not require developer-written scenarios, specifications, negative execution examples, or other complex user input. Synoptic processes the logs most systems already produce and requires developers only to specify a set of regular expressions for parsing the logs. Synoptic has two unique features. First, the model it produces satisfies three kinds of temporal invariants mined from the logs, improving accuracy over related approaches. Second, Synoptic uses refinement and coarsening to explore the space of models. This improves model efficiency and precision, compared to using just one approach. In this paper, we formally prove that Synoptic always produces a model that satisfies exactly the temporal invariants mined from the log, and we argue that it does so efficiently. We empirically evaluate Synoptic through two user experience studies, one with a developer of a large, real-world system and another with 45 students in a distributed systems course. Developers used Synoptic-generated models to verify known bugs, diagnose new bugs, and increase their confidence in the correctness of their systems. None of the developers in our evaluation had a background in formal methods but were able to easily use Synoptic and detect implementation bugs in as little as a few minutes.
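The log-to-invariants step described in this abstract can be illustrated with a minimal sketch. It is not Synoptic's implementation. The abstract does not name the three invariant kinds, so the sketch assumes "always followed by", "never followed by", and "always precedes" over totally ordered traces. The function names and the `event` regex group are hypothetical.

```python
import re
from itertools import product


def parse_trace(lines, pattern):
    """Map log lines to event types using a regex with a named group 'event'.

    Lines that do not match the pattern are skipped.
    """
    rx = re.compile(pattern)
    return [m.group("event") for m in map(rx.search, lines) if m]


def always_followed_by(trace, a, b):
    """True if every occurrence of a in trace has a later occurrence of b."""
    pending = False
    for event in trace:
        if event == a:
            pending = True
        elif event == b:
            pending = False
    return not pending


def never_followed_by(trace, a, b):
    """True if no occurrence of a in trace has a later occurrence of b."""
    seen_a = False
    for event in trace:
        if seen_a and event == b:
            return False
        if event == a:
            seen_a = True
    return True


def always_precedes(trace, a, b):
    """True if every occurrence of b in trace has an earlier occurrence of a."""
    seen_a = False
    for event in trace:
        if event == b and not seen_a:
            return False
        if event == a:
            seen_a = True
    return True


def mine_invariants(traces):
    """Return the event-type pairs for which each (assumed) invariant holds in all traces."""
    event_types = {event for trace in traces for event in trace}
    checks = {
        "always_followed_by": always_followed_by,
        "never_followed_by": never_followed_by,
        "always_precedes": always_precedes,
    }
    mined = {name: set() for name in checks}
    for a, b in product(event_types, repeat=2):
        for name, check in checks.items():
            if all(check(trace, a, b) for trace in traces):
                mined[name].add((a, b))
    return mined


# Hypothetical log format: "<time> INFO <event> ..."
trace = parse_trace(["12:00 INFO open file", "12:01 INFO close file"],
                    r"INFO (?P<event>\w+)")
invariants = mine_invariants([["open", "read", "close"], trace])
```

With these two traces, `open` is always followed by `close` and always precedes `read`, while `close` is never followed by `open`. A model-inference tool would then constrain its model so that these mined properties hold.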
Execution Anomaly Detection in Distributed Systems through Unstructured Log Analysis
Cited by 5 (1 self)
Detection of execution anomalies is very important for the maintenance, development, and performance refinement of large-scale distributed systems. Execution anomalies include both work flow errors and low performance problems. People often use system logs produced by distributed systems for troubleshooting and problem diagnosis. However, manually inspecting system logs to detect anomalies is infeasible due to the increasing scale and complexity of distributed systems. Therefore, there is a great demand for automatic anomaly detection techniques based on log analysis. In this paper, we propose an unstructured log analysis technique for anomaly detection. In the technique, we propose a novel algorithm to convert free-form text messages in log files to log keys without heavily relying on application-specific knowledge. The log keys correspond to the log-print statements in the source code, which can provide cues of system execution behavior. After converting log messages to log keys, we learn a Finite State Automaton (FSA) from training log sequences to present the normal work flow for each system component. At the same time, a performance measurement model is learned to characterize the normal execution performance based on the log messages' timing information. With these learned models, we can automatically detect anomalies in newly input log files. Experiments on Hadoop and SILK show that the technique can effectively detect running anomalies.
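The pipeline in this abstract (messages to log keys, then a learned work flow model, then anomaly detection) can be sketched roughly as follows. This is not the paper's algorithm. It assumes that variable parts of a message are numbers or hex values that a regex can mask, and it stands in for the learned FSA with a plain set of observed key-to-key transitions. All names are hypothetical.

```python
import re

# Assumption: variable fields are decimal numbers, dotted numbers (e.g. IPs), or hex literals.
_VARIABLE = re.compile(r"\b(?:0x[0-9a-fA-F]+|\d+(?:\.\d+)*)\b")

START, END = "<START>", "<END>"


def to_log_key(message):
    """Replace variable fields with a placeholder so messages from one print statement share a key."""
    return _VARIABLE.sub("<*>", message).strip()


def key_transitions(messages):
    """Return the list of consecutive log-key pairs for one log sequence, with start/end markers."""
    keys = [START] + [to_log_key(m) for m in messages] + [END]
    return list(zip(keys, keys[1:]))


def learn_transitions(training_sequences):
    """Collect every key-to-key transition seen in the training sequences."""
    seen = set()
    for messages in training_sequences:
        seen.update(key_transitions(messages))
    return seen


def find_anomalies(messages, seen):
    """Return the transitions in a new sequence that never occurred during training."""
    return [t for t in key_transitions(messages) if t not in seen]


training = [
    ["Start task 1", "Read 512 bytes", "Finish task 1"],
    ["Start task 2", "Read 64 bytes", "Read 128 bytes", "Finish task 2"],
]
model = learn_transitions(training)
anomalies = find_anomalies(["Start task 3", "Finish task 3"], model)
```

Here the new sequence skips the read step, so the transition from the start key to the finish key is reported as unseen. A timing model like the one the abstract mentions would be a separate component and is not sketched here.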
I-Queue: Smart queues for service management
in Proceedings of the 4th International Conference on Service Oriented Computing (ICSOC 06), Lecture Notes in Computer Science, 2006
Cited by 4 (2 self)
Modern enterprise applications and systems are characterized by complex underlying software structures, constantly evolving feature sets, and frequent changes in the data on which they operate. The dynamic nature of these applications and systems poses substantial challenges to their use and management, suggesting the need for automated solutions. This paper considers a specific set of dynamic changes, large data updates that reflect changes in the current state of the business, where the frequency of such updates can be multiple times per day. The paper then presents techniques and their middleware implementation for automatically managing request streams directed at server applications subjected to dynamic data updates, the goal being to improve application reliability in the face of evolving feature sets and business data. These techniques (1) automatically detect input patterns that lead to performance degradation or failures and then (2) use these detections to trigger application-specific methods that control input patterns to avoid, or at least defer, such undesirable phenomena. Lab experiments using actual traces from Worldspan show a 16% decrease in frequency of server restarts when using these techniques, at negligible costs in additional overheads and within delays suitable for the rates of changes experienced by this application.
Mining Temporal Invariants from Partially Ordered Logs
Cited by 2 (2 self)
A common assumption made in log analysis research is that the underlying log is totally ordered. For concurrent systems, this assumption constrains the generated log to either exclude concurrency altogether, or to capture a particular interleaving of concurrent events. This paper argues that capturing concurrency as a partial order is useful and often indispensable for answering important questions about concurrent systems. To this end, we motivate a family of event ordering invariants over partially ordered event traces, give three algorithms for mining these invariants from logs, and evaluate their scalability on simulated distributed system logs.
Visual, Log-based Causal Tracing for Performance Debugging of MapReduce Systems
Cited by 2 (1 self)
The distributed nature and large scale of MapReduce programs and systems pose two challenges in using existing profiling and debugging tools to understand MapReduce programs. Existing tools produce too much information because of the large scale of MapReduce programs, and they do not expose program behaviors in terms of Maps and Reduces. We have developed a novel non-intrusive log-analysis technique which extracts state-machine views of the control- and data-flows in MapReduce behavior from the native logs of Hadoop MapReduce systems, and it synthesizes these views to create a unified, causal view of MapReduce program behavior. This technique enables us to visualize MapReduce programs in terms of MapReduce-specific behaviors, greatly aiding operators in reasoning about and debugging performance problems in MapReduce systems in a scalable fashion. We validate our technique and visualizations using a real-world workload, showing how to understand the structure and performance behavior of MapReduce jobs, and diagnose injected performance problems reproduced from real-world problems.
Log-based Approaches to Characterizing and Diagnosing MapReduce Systems
, 2009
Cited by 1 (1 self)
MapReduce programs and systems are large-scale, highly distributed and parallel, consisting of many interdependent Map and Reduce tasks executing simultaneously on potentially large numbers of cluster nodes. They typically process large datasets and run for long durations. Thus, diagnosing failures in MapReduce programs is challenging due to their scale. This renders traditional time-based Service-Level Objectives ineffective. Hence, even detecting whether a MapReduce program is suffering from a performance problem is difficult. Tools for debugging and profiling traditional programs are not suitable for MapReduce programs, as they generate too much information at the scale of MapReduce programs, do not fully expose the distributed interdependencies, and do not expose information at the MapReduce level of abstraction.
Keywords: MapReduce, Hadoop, Failure Diagnosis, Log analysis

