Decoupling dynamic program analysis from execution in virtual environments
Cited by 28 (3 self)
Analyzing the behavior of running programs has a wide variety of compelling applications, from intrusion detection and prevention to bug discovery. Unfortunately, the high runtime overheads imposed by complex analysis techniques make their deployment impractical in most settings. We present a virtual-machine-based architecture called Aftersight that ameliorates this, providing a flexible and practical way to run heavyweight analyses on production workloads. Aftersight decouples analysis from normal execution by logging nondeterministic VM inputs and replaying them on a separate analysis platform. VM output can be gated on the results of an analysis for intrusion prevention, or analysis can run at its own pace for intrusion detection and best-effort prevention. Logs can also be stored for later offline analysis for bug finding or forensics, allowing analyses that would otherwise be unusable to be applied ubiquitously. In all cases, multiple analyses can be run in parallel, added on demand, and are guaranteed not to interfere with the running workload. We present our experience implementing Aftersight as part of the VMware virtual machine platform and using it to develop a realtime intrusion detection and prevention system, as well as an offline system for bug detection, which we used to detect numerous novel and serious bugs in VMware ESX Server, Linux, and Windows applications.
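The decoupling described in this abstract can be illustrated with a toy sketch: log every nondeterministic input during the live run, then feed the log to a second run and attach an analysis there. This is not Aftersight's implementation. The `workload`, `record`, and `replay` functions and the even-state "analysis" are hypothetical stand-ins for a VM, its input log, and a heavyweight checker.

```python
import random

def workload(read_input, steps=5):
    """A toy 'guest' whose behavior depends only on its inputs."""
    state = 0
    trace = []
    for _ in range(steps):
        value = read_input()              # the only source of nondeterminism
        state = (state * 31 + value) % 1000
        trace.append(state)
    return trace

def record(steps=5):
    """Run the workload live, logging each nondeterministic input."""
    log = []
    def live_input():
        value = random.randint(0, 255)
        log.append(value)
        return value
    return workload(live_input, steps), log

def replay(log, analysis):
    """Re-run the workload from the log and analyze the replayed states."""
    inputs = iter(log)
    trace = workload(lambda: next(inputs), steps=len(log))
    findings = [state for state in trace if analysis(state)]
    return trace, findings

live_trace, input_log = record()
# Hypothetical analysis: flag even states. It runs only on the replay side.
replayed_trace, findings = replay(input_log, analysis=lambda s: s % 2 == 0)
```

Because the replay consumes the same inputs, it reproduces the live trace exactly, so the analysis can be arbitrarily slow without slowing the recorded run.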
Operating System Transactions (2008)
Cited by 15 (4 self)
Operating systems should provide system transactions to user applications, in which user-level processes execute a series of system calls atomically and in isolation from other processes on the system. System transactions provide a simple tool for programmers to express safety conditions during concurrent execution. This paper describes TxOS, a variant of Linux 2.6.22, which is the first operating system to implement system transactions on commodity hardware with strong isolation and fairness between transactional and non-transactional system calls. System transactions provide a simple and expressive interface for user programs to avoid race conditions on system resources. For instance, system transactions eliminate time-of-check-to-time-of-use (TOCTTOU) race conditions in the file system, a class of security vulnerability that is difficult to eliminate with other techniques. System transactions also provide transactional semantics for user-level transactions that require system resources, allowing applications that use hardware or software transactional memory systems to safely make system calls. While system transactions may reduce single-thread performance, they can yield more scalable performance. For example, enclosing link and unlink within a system transaction outperforms rename on Linux by 14% at 8 CPUs.
Respec: Efficient Online Multiprocessor Replay via Speculation and External Determinism
Cited by 12 (3 self)
Deterministic replay systems record and reproduce the execution of a hardware or software system. While it is well known how to replay uniprocessor systems, replaying shared memory multiprocessor systems at low overhead on commodity hardware is still an open problem. This paper presents Respec, a new way to support deterministic replay of shared memory multithreaded programs on commodity multiprocessor hardware. Respec targets online replay in which the recorded and replayed processes execute concurrently. Respec uses two strategies to reduce overhead while still ensuring correctness: speculative logging and externally deterministic replay. Speculative logging optimistically logs less information about shared memory dependencies than is needed to guarantee deterministic replay, then recovers and retries if the replayed process diverges from the recorded process. Externally deterministic replay relaxes the degree to which the two executions must match by requiring only that their system output and final program states match. We show that the combination of these two techniques results in low recording and replay overhead for the common case of data-race-free execution intervals and still ensures correct replay for execution intervals that have data races. We modified the Linux kernel to implement our techniques. Our software system adds on average about 18% overhead to the execution time for recording and replaying programs with two threads and 55% overhead for programs with four threads.
DoublePlay: Parallelizing Sequential Logging and Replay
Cited by 11 (2 self)
Deterministic replay systems record and reproduce the execution of a hardware or software system. In contrast to replaying execution on uniprocessors, deterministic replay on multiprocessors is very challenging to implement efficiently because of the need to reproduce the order or values read by shared memory operations performed by multiple threads. In this paper, we present DoublePlay, a new way to efficiently guarantee replay on commodity multiprocessors. Our key insight is that one can use the simpler and faster mechanisms of single-processor record and replay, yet still achieve the scalability offered by multiple cores, by using an additional execution to parallelize the record and replay of an application. DoublePlay timeslices multiple threads on a single processor, then runs multiple time intervals (epochs) of the program concurrently on separate processors. This strategy, which we call uniparallelism, makes logging much easier because each epoch runs on a single processor (so threads in an epoch never simultaneously access the same memory) and different epochs operate on different copies of the memory. Thus, rather than logging the order of shared-memory accesses, we need only log the order in which threads in an epoch are timesliced on the processor. DoublePlay runs an additional execution of the program on multiple processors to generate checkpoints so that epochs run in parallel. We evaluate DoublePlay on a variety of client, server, and scientific parallel benchmarks; with spare cores, DoublePlay reduces logging overhead to an average of 15% with two worker threads and 28% with four threads.
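The claim that a single-processor epoch needs only a timeslice log can be illustrated with a toy sketch. It models a single epoch only, with Python generators standing in for threads, and leaves out checkpoints and the parallel execution. The `worker` and `run` helpers are hypothetical and are not part of DoublePlay.

```python
import random

def worker(name, shared, count):
    """A toy thread: each step appends to shared state, then may be preempted."""
    for i in range(count):
        shared.append(f"{name}{i}")
        yield  # a point where the scheduler may switch threads

def run(schedule=None, seed=None):
    """Timeslice two toy threads on one 'processor'.

    With schedule=None, pick the next thread at random and log each choice.
    With a schedule, follow the logged choices exactly.
    """
    shared = []
    threads = {"A": worker("A", shared, 3), "B": worker("B", shared, 3)}
    rng = random.Random(seed)
    log = []
    replaying = schedule is not None
    pending = iter(schedule) if replaying else None
    while threads:
        name = next(pending) if replaying else rng.choice(sorted(threads))
        log.append(name)
        try:
            next(threads[name])       # run the chosen thread for one timeslice
        except StopIteration:
            del threads[name]         # thread finished
    return shared, log

recorded, schedule_log = run(seed=7)
replayed, _ = run(schedule=schedule_log)
```

Only the sequence of scheduling decisions is logged, never the individual memory accesses, yet the replayed run produces the same shared state as the recorded one.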
Fast Track: A software system for speculative program optimization (CGO, 2009)
Cited by 10 (0 self)
Fast Track is a software speculation system that enables unsafe optimization of sequential code. It speculatively runs optimized code to improve performance and then checks the correctness of the speculative code by running the original program on multiple processors. We present the interface design and system implementation for Fast Track. It lets a programmer or a profiling tool mark fast-track code regions and uses a run-time system to manage the parallel execution of the speculative process and its checking processes and to ensure the correct display of program outputs. The core of the run-time system is a novel concurrent algorithm that balances exploitable parallelism and available processors when the fast track is too slow or too fast. The programming interface closely affects the run-time support. Our system permits both explicit and implicit end markers for speculatively optimized code regions as well as extensions that allow the use of multiple tracks and user-defined correctness checking. We discuss the possible uses of speculative optimization and demonstrate the effectiveness of our prototype system by examples of unsafe semantic optimization and a general system for fast memory-safety checking, which is able to reduce the checking time by factors between 2 and 7 for large sequential code on an 8-CPU system.

    while (...) {
        if (FastTrack()) {
            /* unsafely optimized */
            fast_fortuitous();
        } else {
            /* safe code */
            safe_sequential();
        }
    }
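The check-and-fall-back idea can be illustrated with a toy sketch that borrows the names `fast_fortuitous` and `safe_sequential` from the abstract. It is not the Fast Track run-time system. The function bodies, the `run_with_check` helper, and the use of a thread pool as the checker are assumptions made for illustration, and unlike the real system this sketch waits for the check before returning, so it shows the correctness logic rather than any speedup.

```python
from concurrent.futures import ThreadPoolExecutor

def safe_sequential(xs):
    """Reference implementation: always correct."""
    return sum(x * x for x in xs)

def fast_fortuitous(xs):
    """Hypothetical unsafe shortcut: only correct for non-negative inputs."""
    return sum(abs(x) * x for x in xs)

def run_with_check(xs, pool):
    """Run the fast track, check it against the safe track, fall back on mismatch."""
    checker = pool.submit(safe_sequential, xs)   # normal track runs concurrently
    speculative = fast_fortuitous(xs)            # fast track
    verified = checker.result()
    if speculative == verified:
        return speculative, "fast"
    return verified, "safe"                      # speculation failed: discard it

with ThreadPoolExecutor(max_workers=2) as pool:
    ok_result = run_with_check([1, 2, 3], pool)
    bad_result = run_with_check([-1, 2], pool)
```

With non-negative inputs the speculative result is accepted. A negative input breaks the shortcut's assumption, so the checked result from the safe track is used instead.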
Parallelizing Dynamic Information Flow Tracking
Cited by 10 (5 self)
Dynamic information flow tracking (DIFT) is an important tool for detecting common security attacks and memory bugs. A DIFT tool tracks the flow of information through a monitored program’s registers and memory locations as the program executes, detecting and containing/fixing problems on-the-fly. Unfortunately, sequential DIFT tools are quite slow, and DIFT is quite challenging to parallelize. In this paper, we present a new approach to parallelizing DIFT-like functionality. Extending our recent work on accelerating sequential DIFT, we consider a variant of DIFT that tracks the information flow only through unary operations (relaxed DIFT), and yet makes sense for detecting security attacks and memory bugs. We present a parallel algorithm for relaxed DIFT, based on symbolic inheritance tracking, which achieves linear speed-up asymptotically. Moreover, we describe techniques for reducing the constant factors, so that speed-ups can be obtained even with just a few processors. We implemented the algorithm in the context of a Log-Based Architectures (LBA) system, which provides hardware support for logging a program trace and delivering it to other (monitoring) processors. Our simulation results on SPEC benchmarks and a video player show that our parallel relaxed DIFT reduces the overhead to as low as 1.2X using 9 monitoring cores on a 16-core chip multiprocessor.
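To make the notion of tracking taint only through unary operations concrete, here is a small sequential sketch over a made-up instruction trace. It is not the paper's parallel algorithm and does not model symbolic inheritance tracking. The trace format, the `relaxed_dift` function, and the choice to treat the result of a binary operation as untainted are all assumptions for illustration.

```python
def relaxed_dift(trace):
    """Propagate taint through unary copies only; flag jumps on tainted values.

    trace entries (hypothetical format):
      ("input", dst)        dst receives untrusted data
      ("mov", dst, src)     unary copy: dst inherits the taint of src
      ("add", dst, a, b)    binary op: assumed here to produce an untainted dst
      ("jmp", src)          indirect jump through src
    """
    tainted = set()
    alerts = []
    for index, (op, *args) in enumerate(trace):
        if op == "input":
            tainted.add(args[0])
        elif op == "mov":
            dst, src = args
            if src in tainted:
                tainted.add(dst)
            else:
                tainted.discard(dst)
        elif op == "add":
            tainted.discard(args[0])   # assumption: binary results are clean
        elif op == "jmp":
            if args[0] in tainted:
                alerts.append(index)   # jump target derived from untrusted input
    return tainted, alerts

trace = [
    ("input", "r1"),
    ("mov", "r2", "r1"),
    ("add", "r3", "r2", "r2"),
    ("jmp", "r3"),
    ("jmp", "r2"),
]
tainted, alerts = relaxed_dift(trace)
```

In this trace the copy into `r2` stays tainted and the jump through it is flagged, while the jump through the sum in `r3` is not, which is the precision that the relaxed variant gives up.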
Enforcing Authorization Policies using Transactional Memory Introspection (CCS'08, 2008)
Cited by 6 (0 self)
Correct enforcement of authorization policies is a difficult task, especially for multi-threaded software. Even in carefully-reviewed code, unauthorized access may be possible in subtle corner cases. We introduce Transactional Memory Introspection (TMI), a novel reference monitor architecture that builds on Software Transactional Memory—a new, attractive alternative for writing correct, multi-threaded software. TMI facilitates correct security enforcement by simplifying how the reference monitor integrates with software functionality. TMI can ensure complete mediation of security-relevant operations, eliminate race conditions related to security checks, and simplify handling of authorization failures. We present the design and implementation of a TMI-based reference monitor and experiment with its use in enforcing authorization policies on four significant servers. Our experiments confirm the benefits of the TMI architecture and show that it imposes an acceptable runtime overhead.
Functional Partitioning to Optimize End-to-End Performance on Many-core Architectures
Cited by 6 (5 self)
Scaling computations on emerging massive-core supercomputers is a daunting task, which, coupled with the significantly lagging system I/O capabilities, exacerbates applications' end-to-end performance. The I/O bottleneck often negates potential performance benefits of assigning additional compute cores to an application. In this paper, we address this issue via a novel functional partitioning (FP) runtime environment that allocates cores to specific application tasks (checkpointing, de-duplication, and scientific data format transformation) so that the deluge of cores can be brought to bear on the entire gamut of application activities. The focus is on utilizing the extra cores to support HPC application I/O activities and also on leveraging solid-state disks in this context. For example, our evaluation shows that dedicating 1 core on an oct-core machine to checkpointing and its assist tasks using FP can improve overall execution time of a FLASH benchmark on 80 and 160 cores by 43.95% and 41.34%, respectively.
Tolerating latency in replicated state machines through client speculation
Cited by 6 (0 self)
Replicated state machines are an important and widely studied methodology for tolerating a wide range of faults. Unfortunately, while replicas should be distributed geographically for maximum fault tolerance, current replicated state machine protocols tend to magnify the effects of high network latencies caused by geographic distribution. In this paper, we examine how to use speculative execution at the clients of a replicated service to reduce the impact of network and protocol latency. We first give design principles for using client speculation with replicated services, such as generating early replies and prioritizing throughput over latency. We then describe a mechanism that allows speculative clients to make new requests through replica-resolved speculation and predicated writes. We implement a detailed case study that applies this approach to a standard Byzantine fault tolerant protocol (PBFT) for replicated NFS and counter services. Client speculation trades in 18% maximum throughput to decrease the effective latency under light workloads, letting us speed up run time on single-client micro-benchmarks 1.08–19× when the client is co-located with the primary. On a macro-benchmark, reduced latency gives the client a speedup of up to 5×.
Towards Practical Taint Tracking (2010)
Cited by 6 (0 self)

