Debugging Parallel Systems: A State of the Art Report
, 2002
Cited by 28 (6 self)
In this State of the Art Report (SotA), we will give an introduction to work presented in the area of debugging large software systems with modern hardware architectures. We will discuss techniques used for single-, multi-, and distributed systems. In addition, we will provide pointers to work by large players in the field and to the major conferences of importance.
Macrodebugging: Global Views of Distributed Program Execution
Cited by 20 (1 self)
Creating and debugging programs for wireless embedded networks (WENs) is notoriously difficult. Macroprogramming is an emerging technology that aims to address this problem by providing high-level programming abstractions. We present MDB, the first system to support the debugging of macroprograms. MDB allows the user to set breakpoints and step through a macroprogram using a source-level debugging interface similar to GDB, a process we call macrodebugging. A key challenge of MDB is to step through a macroprogram in sequential order even though it executes on the network in a distributed, asynchronous manner. Besides allowing the user to view distributed state, MDB also provides the ability to search for bugs over the entire history of distributed states. Finally, MDB allows the user to make hypothetical changes to a macroprogram and to see the effect on distributed state without the need to redeploy, execute, and test the new code. We show that macrodebugging is both easy and efficient: MDB consumes few system resources and requires few user commands to find the cause of bugs. We also provide a lightweight version of MDB called MDB Lite that can be used during the deployment phase to reduce resource consumption while still eliminating the possibility of heisenbugs: changes in the manifestation of bugs caused by enabling or disabling the debugger.
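MDB's implementation is not part of this listing; as a hedged sketch of its central idea — imposing a single sequential stepping order on per-node event logs that were produced asynchronously — one might merge logs by logical timestamp (the function name and log format here are illustrative assumptions, not MDB's actual API):

```python
import heapq

def macro_step_order(node_logs):
    """Merge per-node event logs into one sequential stepping order.

    node_logs: dict mapping node id -> list of (logical_time, event) pairs,
    each list already sorted by logical time.  Ties are broken by node id,
    so the merged order is deterministic across replays.
    """
    heap = []
    for node, log in node_logs.items():
        if log:
            # (timestamp, node, index into that node's log)
            heapq.heappush(heap, (log[0][0], node, 0))
    order = []
    while heap:
        t, node, i = heapq.heappop(heap)
        order.append((node, node_logs[node][i][1]))
        if i + 1 < len(node_logs[node]):
            heapq.heappush(heap, (node_logs[node][i + 1][0], node, i + 1))
    return order
```

A debugger front end could then step through `order` one entry at a time, GDB-style, even though the events were recorded concurrently.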
//TRACE: Parallel trace replay with approximate causal events
- In Proceedings of the 5th USENIX Symposium on File and Storage Technologies (FAST’07)
, 2007
Cited by 19 (1 self)
//TRACE is a new approach for extracting and replaying traces of parallel applications to recreate their I/O behavior. Its tracing engine automatically discovers inter-node data dependencies and inter-I/O compute times for each node (process) in an application. This information is reflected in per-node annotated I/O traces. Such annotation allows a parallel replayer to closely mimic the behavior of a traced application across a variety of storage systems. When compared to other replay mechanisms, //TRACE offers significant gains in replay accuracy. Overall, the average replay error for the parallel applications evaluated in this paper is below 6%.
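The replayer itself is not reproduced in the abstract; the role of the per-node annotations — inter-I/O compute times that must elapse between successive I/Os — can be sketched with a toy schedule builder (the record fields and names are assumptions for illustration; the real //TRACE replayer additionally enforces inter-node data dependencies):

```python
def replay_schedule(trace):
    """Compute issue times for an annotated per-node I/O trace.

    trace: list of records {"op": name, "compute": seconds of think time
    since the previous I/O completed, "duration": I/O service time}.
    Returns (issue_time, op) pairs.  Blocking on cross-node dependencies,
    which the annotated traces also capture, is omitted from this sketch.
    """
    now = 0.0
    schedule = []
    for rec in trace:
        now += rec["compute"]          # emulate inter-I/O compute time
        schedule.append((now, rec["op"]))
        now += rec["duration"]         # I/O completes before next think time
    return schedule
```

Preserving these think times is what lets a replay approximate the traced application's timing on a different storage system instead of issuing I/Os back to back.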
Debugging distributed programs using controlled re-execution
- In Proceedings of the 19th ACM Symposium on Principles of Distributed Computing (PODC)
, 2000
Cited by 16 (7 self)
Distributed programs are hard to write. A distributed debugger equipped with a mechanism to re-execute the traced computation in a controlled fashion can greatly facilitate the detection and localization of bugs. This approach gives rise to a general problem, called the predicate control problem, which takes a computation and a safety property specified on the computation, and outputs a controlled computation that maintains the property. We define a class of global predicates, called region predicates, that can be controlled efficiently in a distributed computation. We prove that the synchronization generated by our algorithm is optimal. Further, we introduce the notion of an admissible sequence of events and prove that it is equivalent to the notion of predicate control. We then give an efficient algorithm for the class of disjunctive predicates based on the notion of an admissible sequence.
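The paper's control algorithm is not included in the abstract; as a toy illustration of what "controlling" a disjunctive predicate means, the greedy replay sketch below only lets a process advance when the disjunction (some process's local predicate holds) remains true afterwards. The names and trace format are assumptions, and unlike the paper's algorithm this greedy search is not a complete decision procedure:

```python
def controlled_execution(traces, holds):
    """Greedily interleave per-process traces so that a disjunctive
    predicate (at least one process's local predicate is true) holds
    in every global state along the way.

    traces: dict proc -> list of local states (index 0 is the initial state).
    holds(proc, state) -> bool is that process's local predicate.
    Returns a list of (proc, new_state) steps, or None if the greedy
    choice gets stuck or the initial state already violates the predicate.
    """
    cur = {p: 0 for p in traces}
    steps = []

    def ok():
        return any(holds(p, traces[p][cur[p]]) for p in traces)

    if not ok():
        return None
    while any(cur[p] < len(traces[p]) - 1 for p in traces):
        for p in traces:
            if cur[p] < len(traces[p]) - 1:
                cur[p] += 1            # tentatively advance process p
                if ok():
                    steps.append((p, traces[p][cur[p]]))
                    break
                cur[p] -= 1            # would violate the disjunction; undo
        else:
            return None                # no process can advance safely
    return steps
```

In the example below, advancing `p` first would leave no process in a "good" state, so the controlled execution delays `p` until `q` has reached its good state — exactly the kind of added synchronization controlled re-execution introduces.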
Observation and Control for Debugging Distributed Computations
- In Proceedings of the International Workshop on Automated Debugging (AADEBUG)
, 1997
Cited by 13 (0 self)
I present a general framework for observing and controlling a distributed computation and its applications to distributed debugging. Algorithms for observation are useful in distributed debugging to stop a distributed program under certain undesirable global conditions. I present the main ideas required for developing efficient algorithms for observation. Algorithms for control are useful in debugging to restrict the behavior of the distributed program to suspicious executions. It is also useful when a programmer wants to test a distributed program under certain conditions. I present different models and their limitations for controlling distributed computations.
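The observation algorithms themselves are not given in the abstract. One standard building block for stopping a program under a global condition — offered here only as a hedged sketch, not necessarily the paper's exact formulation — is checking with vector clocks whether chosen local states, one per process, are pairwise concurrent and therefore form a consistent global state in which the condition can be declared to hold:

```python
def concurrent(u, v):
    """Two vector clocks are concurrent iff neither dominates the other."""
    le = all(a <= b for a, b in zip(u, v))
    ge = all(a >= b for a, b in zip(u, v))
    return not le and not ge

def forms_consistent_cut(clocks):
    """True iff the chosen local states (one vector clock per process)
    are pairwise concurrent, i.e. they can all belong to one consistent
    global state of the computation."""
    return all(concurrent(clocks[i], clocks[j])
               for i in range(len(clocks))
               for j in range(i + 1, len(clocks)))
```

A debugger can scan candidate local states satisfying each process's local condition and halt the system the first time such a consistent combination is found.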
Decomposing Partial Order Execution Graphs to Improve Message Race Detection
- Proc. 21st Int’l Parallel and Distributed Processing Symposium (IPDPS’07)
, 2007
Cited by 2 (2 self)
In message-passing parallel applications, messages are not delivered in a strict order. In most applications, the computation results and the set of messages produced during the execution should be the same for all distinct orderings of message delivery. Finding an ordering that produces a different outcome then reveals a message race. Assuming that the Partial Order Execution Graph (POEG) capturing the causality between events is known for a reference execution, the present paper describes techniques for identifying independent sets of messages and, within each set, equivalent message orderings. Orderings of messages belonging to different sets may then be re-executed independently from each other, thereby reducing the number of orderings that must be tested to detect message races. We integrated the presented techniques into the Dynamic Parallel Schedules parallelization framework, and applied our approach to an image processing, a linear algebra, and a neighborhood-dependent parallel computation. In all cases, the number of possible orderings is reduced by several orders of magnitude. In order to further reduce this number, we describe an algorithm that generates a subset of orderings that are likely to reveal existing message races.
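The decomposition itself is not shown in the abstract. A minimal sketch of the grouping step, under the assumption that the POEG has already been reduced to pairwise dependencies between messages (so that dependent messages must land in the same set, and disconnected sets can be reordered independently), is a union-find over those pairs:

```python
class DSU:
    """Disjoint-set union with path halving (no rank, fine for a sketch)."""
    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)

def independent_sets(n_messages, dependency_pairs):
    """Group messages 0..n_messages-1 into independent sets: messages
    linked directly or transitively by a dependency pair fall into the
    same set, so orderings of different sets can be tested separately."""
    dsu = DSU(n_messages)
    for a, b in dependency_pairs:
        dsu.union(a, b)
    sets = {}
    for m in range(n_messages):
        sets.setdefault(dsu.find(m), []).append(m)
    return sorted(sets.values())
```

If k sets have n1, ..., nk interchangeable messages, testing them separately costs roughly the sum rather than the product of the per-set ordering counts, which is where the orders-of-magnitude reduction comes from.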
Dynamic Testing of Flow Graph Based Parallel Applications
"... In message-passing parallel applications, messages are not delivered in a strict order. The number of messages, their content and their destination may depend on the ordering of their delivery. Nevertheless, for most applications, the computation results should be the same for all possible orderings ..."
Cited by 2 (0 self)
In message-passing parallel applications, messages are not delivered in a strict order. The number of messages, their content and their destination may depend on the ordering of their delivery. Nevertheless, for most applications, the computation results should be the same for all possible orderings. Finding an ordering that produces a different outcome or that prevents the execution from terminating reveals a message race or a deadlock. Starting from the initial application state, we dynamically build an acyclic message-passing state graph such that each path within the graph represents one possible message ordering. All paths lead to the same final state if no deadlock or message race exists. If multiple final states are reached, we reveal message orderings that produce the different outcomes. The corresponding executions may then be replayed for debugging purposes. We reduce the number of states to be explored by using previously acquired knowledge about communication patterns and about how operations read and modify local process variables. We also describe a heuristic that tests a subset of orderings that are likely to reveal existing message races or deadlocks. We applied our approach on several applications developed using the Dynamic Parallel Schedules (DPS) parallelization framework. Compared to the naive execution of all message orderings, the use of a message-passing state graph reduces the cost of testing all orderings by several orders of magnitude. The use of prior information further reduces the number of visited states by a factor of up to fifty in our tests. The heuristic relying on a subset of orderings was able to reveal race conditions in all tested cases. We finally present a first step in generalizing the approach to MPI applications.
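The construction above can be illustrated with a deliberately tiny sketch (the function names and state encoding are assumptions, not the paper's implementation): explore every delivery order from the initial state, and flag a message race when more than one final state is reachable.

```python
def final_states(initial, deliverable, deliver):
    """Explore the message-passing state graph exhaustively.

    initial:           hashable application state.
    deliverable(s):    iterable of messages deliverable in state s.
    deliver(s, m):     state reached by delivering m in s.
    Returns the set of final states (no message left to deliver);
    more than one final state reveals a message race.  The pruning by
    communication patterns described in the paper is omitted here.
    """
    seen = {initial}
    frontier = [initial]
    finals = set()
    while frontier:
        state = frontier.pop()
        msgs = list(deliverable(state))
        if not msgs:
            finals.add(state)
        for m in msgs:
            nxt = deliver(state, m)
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return finals
```

For example, two pending messages that apply the non-commuting updates "double" and "add 3" to a register reach different final values depending on delivery order, so the exploration returns two final states and the racy orderings can be replayed for debugging.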
A Debugger for Flow Graph Based Parallel Applications
- Proceedings of the ACM International Symposium on Software Testing and Analysis (ISSTA'07), Parallel and Distributed Systems: Testing and Debugging workshop (PADTAD'07)
, 2007
"... Flow graphs provide an explicit description of the parallelization of an application by mapping vertices onto serial computations and edges onto message transfers. We present the design and implementation of a debugger for the flow graph based Dynamic Parallel Schedules (DPS) parallelization framewo ..."
Cited by 1 (1 self)
Flow graphs provide an explicit description of the parallelization of an application by mapping vertices onto serial computations and edges onto message transfers. We present the design and implementation of a debugger for the flow graph based Dynamic Parallel Schedules (DPS) parallelization framework. We use the flow graph to provide both a high-level and a detailed picture of the current state of the application execution. We describe how reordering incoming messages enables testing for the presence of message races while debugging a parallel application. The knowledge about causal dependencies between messages enables tracking messages and computations along individual branches of the flow graph. In addition to common features such as restricting the analysis to a subset of threads or processes and attaching sequential debuggers to running processes, the proposed debugger also provides support for message alterations and for message content dependent breakpoints.
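One of the listed features, a message-content-dependent breakpoint on a flow-graph edge, is simple to sketch (the class and method names below are illustrative assumptions, not the actual DPS debugger API):

```python
class ContentBreakpoint:
    """Fires when an intercepted message's content satisfies a
    user-supplied predicate; delivery on that flow-graph edge is then
    suspended so the state can be inspected or the message altered."""

    def __init__(self, predicate):
        self.predicate = predicate
        self.hits = []          # (edge, message) pairs that triggered

    def intercept(self, edge, message):
        if self.predicate(message):
            self.hits.append((edge, message))
            return True         # suspend delivery on this edge
        return False            # let the message through unchanged
```

A user could, for instance, break only when a particular data row flows along a given edge, instead of stopping on every message transfer.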
Testing Concurrent Software Systems
, 2006
"... Two approaches to testing concurrent software are presented. In the first, a system is assumed to contain a deterministic computation when correct, and I describe two testing algorithms to optimally achieve coverage of a testing metric involving racing pairs of messages. In the second approach, the ..."
Two approaches to testing concurrent software are presented. In the first, a system is assumed to contain a deterministic computation when correct, and I describe two testing algorithms to optimally achieve coverage of a testing metric involving racing pairs of messages. In the second approach, the system model is improved to allow additional nondeterministic behavior when it is either commutative in nature or localized in its effect. I present two sets of algorithms. The first detects whether or not a system is deterministic (i.e., no race conditions affect the computation’s outcome). The second identifies localized non-determinism, and can be used to determine whether or not a system converges to a deterministic end.
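The coverage metric over racing pairs of messages can be sketched as follows, assuming the common definition that two messages race when they target the same destination and neither causally precedes the other (the function names and message encoding are illustrative, not the thesis's notation):

```python
def racing_pairs(messages, happens_before):
    """Enumerate racing pairs of messages.

    messages: list of (msg_id, destination) pairs.
    happens_before(a, b): True iff message a causally precedes message b.
    Two messages race when they share a destination and are causally
    unordered; covering a pair means testing both delivery orders, so
    the returned list is the denominator of the coverage metric.
    """
    pairs = []
    for i in range(len(messages)):
        for j in range(i + 1, len(messages)):
            (a, da), (b, db) = messages[i], messages[j]
            if da == db and not happens_before(a, b) and not happens_before(b, a):
                pairs.append((a, b))
    return pairs
```

A test driver would then generate schedules until both orders of every returned pair have been exercised.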