Results 1 - 10
of
479
Valgrind: A framework for heavyweight dynamic binary instrumentation
- In Proceedings of the 2007 Programming Language Design and Implementation Conference
, 2007
"... Dynamic binary instrumentation (DBI) frameworks make it easy to build dynamic binary analysis (DBA) tools such as checkers and profilers. Much of the focus on DBI frameworks has been on performance; little attention has been paid to their capabilities. As a result, we believe the potential of DBI ha ..."
Abstract
-
Cited by 545 (5 self)
- Add to MetaCart
(Show Context)
Dynamic binary instrumentation (DBI) frameworks make it easy to build dynamic binary analysis (DBA) tools such as checkers and profilers. Much of the focus on DBI frameworks has been on performance; little attention has been paid to their capabilities. As a result, we believe the potential of DBI has not been fully exploited. In this paper we describe Valgrind, a DBI framework designed for building heavyweight DBA tools. We focus on its unique support for shadow values—a powerful but previously little-studied and difficult-to-implement DBA technique, which requires a tool to shadow every register and memory value with another value that describes it. This support accounts for several crucial design features that distinguish Valgrind from other DBI frameworks. Because of these features, lightweight tools built with Valgrind run comparatively slowly, but Valgrind can be used to build more interesting, heavyweight tools that are difficult or impossible to build with other DBI frameworks such as Pin and DynamoRIO. Categories and Subject Descriptors D.2.5 [Software Engineering]: Testing and Debugging—debugging aids, monitors; D.3.4
Valgrind: A program supervision framework
- In Third Workshop on Runtime Verification (RV’03
, 2003
"... a;1 ..."
(Show Context)
A comparison of software and hardware techniques for x86 virtualization
- in ASPLOS-XII: Proceedings of the 12th international conference on Architectural
, 2006
"... Until recently, the x86 architecture has not permitted classical trap-and-emulate virtualization. Virtual Machine Monitors for x86, such as VMware R ○ Workstation and Virtual PC, have instead used binary translation of the guest kernel code. However, both Intel and AMD have now introduced architectu ..."
Abstract
-
Cited by 205 (3 self)
- Add to MetaCart
(Show Context)
Until recently, the x86 architecture has not permitted classical trap-and-emulate virtualization. Virtual Machine Monitors for x86, such as VMware R ○ Workstation and Virtual PC, have instead used binary translation of the guest kernel code. However, both Intel and AMD have now introduced architectural extensions to support classical virtualization. We compare an existing software VMM with a new VMM designed for the emerging hardware support. Surprisingly, the hardware VMM often suffers lower performance than the pure software VMM. To determine why, we study architecture-level events such as page table updates, context switches and I/O, and find their costs vastly different among native, software VMM and hardware VMM execution. We find that the hardware support fails to provide an unambiguous performance advantage for two primary reasons: first, it offers no support for MMU virtualization; second, it fails to co-exist with existing software techniques for MMU virtualization. We look ahead to emerging techniques for addressing this MMU virtualization problem in the context of hardware-assisted virtualization.
A Framework for Reducing the Cost of Instrumented Code
- In SIGPLAN Conference on Programming Language Design and Implementation
, 2001
"... Instrumenting code to collect profiling information can cause substantial execution overhead. This overhead makes instrumentation difficult to perform at runtime, often preventing many known offline feedback-directed optimizations from being used in online systems. This paper presents a general fram ..."
Abstract
-
Cited by 200 (11 self)
- Add to MetaCart
Instrumenting code to collect profiling information can cause substantial execution overhead. This overhead makes instrumentation difficult to perform at runtime, often preventing many known offline feedback-directed optimizations from being used in online systems. This paper presents a general framework for performing instrumentation sampling to reduce the overhead of previously expensive instrumentation. The framework is simple and effective, using code-duplication and counter-based sampling to allow switching between instrumented and non-instrumented code.
Managing Multi-Configurable Hardware via Dynamic Working Set Analysis
- In 29th Annual International Symposium on Computer Architecture
, 2002
"... Microprocessors are designed to provide good average performance over a variety of workloads. This can lead to inefficiencies both in power and performance for individual programs and during individual phases within the same program. Microarchitectures with multi-configuration units (e.g. caches, pr ..."
Abstract
-
Cited by 192 (3 self)
- Add to MetaCart
(Show Context)
Microprocessors are designed to provide good average performance over a variety of workloads. This can lead to inefficiencies both in power and performance for individual programs and during individual phases within the same program. Microarchitectures with multi-configuration units (e.g. caches, predictors, instruction windows) are able to adapt dynamically to program behavior and enable /disable resources as needed. A key element of existing configuration algorithms is adjusting to program phase changes. This is typically done by "tuning" when a phase change is detected -- i.e. sequencing through a series of trial configurations and selecting the best. We study algorithms that dynamically collect and analyze program working set information. To make this practical, we propose working set signatures -- highly compressed working set representations (e.g. 32-128 bytes total). We describe algorithms that use working set signatures to 1) detect working set changes and trigger re-tuning; 2) identify recurring working sets and re-install saved optimal reconfigurations, thus avoiding the time-consuming tuning process; 3) estimate working set sizes to configure caches directly to the proper size, also avoiding the tuning process. We use reconfigurable instruction caches to demonstrate the performance of the proposed algorithms. When applied to reconfigurable instruction caches, an algorithm that identifies recurring phases achieves power savings and performance similar to the best algorithm reported to date, but with orders-of-magnitude savings in retunings. 1
Understanding data lifetime via whole system simulation
- In USENIX Security Symposium
, 2004
"... Rights to individual papers remain with the author or the author's employer. Permission is granted for noncommercial reproduction of the work for educational or research purposes. This copyright notice must be included in the reproduced paper. USENIX acknowledges all trademarks herein. ..."
Abstract
-
Cited by 192 (5 self)
- Add to MetaCart
(Show Context)
Rights to individual papers remain with the author or the author's employer. Permission is granted for noncommercial reproduction of the work for educational or research purposes. This copyright notice must be included in the reproduced paper. USENIX acknowledges all trademarks herein.
Adaptive Optimization in the Jalapeno JVM
- In ACM SIGPLAN Conference on Object-Oriented Programming Systems, Languages, and Applications (OOPSLA
, 2000
"... (*58()9$"2#$:0/,;58(03<10/2,>=?33@">"29 #A:0*/,B58(*C2"258/052,D3*>#$,,6-*0'/ 58@F,058*,+HG?!"*0"I"252J58K0/ ,6-*0'/ 030"6N*IO40"58DP)"58QF,058SRUT6252,D<0!2T6252,V52!8("9 "W5X3,06*9E,'Y58(*03C:0'/ X3,06 ..."
Abstract
-
Cited by 180 (14 self)
- Add to MetaCart
(*58()9$"2#$:0/,;58(03<10/2,>=?33@">"29 #A:0*/,B58(*C2"258/052,D3*>#$,,6-*0'/ 58@F,058*,+HG?!"*0"I"252J58K0/ ,6-*0'/ 030"6N*IO40"58DP)"58QF,058SRUT6252,D<0!2T6252,V52!8("9 "W5X3,06*9E,'Y58(*03C:0'/ X3,06*9E,'Y58(*03C 1622 *'\,20/2XD3Q#$,U-0/269EU,/52,X"58QF,0'58,+ I,2/2-K58X^528-3L2T6252,_0/252/,58('4-*0'2,Y 0C#$,058Z#>58,0@=`58a02T/2*(*C/,':b(/,058c+ \",25C0d@"3,152058[#;58!*03e0/252,/58( 5805f8(""52<00"58>b(3589$3,3*"*58QF058C-02,;"(3T Y2520'58258/,03@20'Q"3+ ] D,Q"...
Randomized Instruction Set Emulation To Disrupt Binary . . .
- ACM TRANSACTIONS ON INFORMATION SYSTEM SECURITY
, 2003
"... Many remote attacks against computer systems inject binary code into the execution path of a running program, gaining control of the program's behavior. If each defended system or program could use a machine instruction set that was both unique and private, such binary code injection attacks ..."
Abstract
-
Cited by 155 (3 self)
- Add to MetaCart
Many remote attacks against computer systems inject binary code into the execution path of a running program, gaining control of the program's behavior. If each defended system or program could use a machine instruction set that was both unique and private, such binary code injection attacks would become extremely difficult if not impossible. A binary-to-binary translator provides an economic and flexible implementation path for realizing that idea. As a proof of concept, we describe a randomized instruction set emulator (RISE) based on the open-source Valgrind x86-to-x86 binary translator. Although currently very slow and memory-intensive, our prototype RISE can indeed disrupt binary code injection attacks against a program without requiring its recompilation, linking, or access to source code. We describe the RISE implementation, give evidence demonstrating that RISE defeats common attacks, consider consequences of the dense x86 instruction set on the method's effects, and discuss limitations of the RISE prototype as well as design tradeoffs and extensions of the underlying idea.
N-variant systems: A secretless framework for security through diversity
- In Proceedings of the 15th USENIX Security Symposium
, 2006
"... We present an architectural framework for systematically using automated diversity to provide high assurance detection and disruption for large classes of attacks. The framework executes a set of automatically diversified variants on the same inputs, and monitors their behavior to detect divergences ..."
Abstract
-
Cited by 119 (3 self)
- Add to MetaCart
(Show Context)
We present an architectural framework for systematically using automated diversity to provide high assurance detection and disruption for large classes of attacks. The framework executes a set of automatically diversified variants on the same inputs, and monitors their behavior to detect divergences. The benefit of this approach is that it requires an attacker to simultaneously compromise all system variants with the same input. By constructing variants with disjoint exploitation sets, we can make it impossible to carry out large classes of important attacks. In contrast to previous approaches that use automated diversity for security, our approach does not rely on keeping any secrets. In this paper, we introduce the N-variant systems framework, present a model for analyzing security properties of N-variant systems, define variations that can be used to detect attacks that involve referencing absolute memory addresses and executing injected code, and describe and present performance results from a prototype implementation. 1.
Dynamic hot data stream prefetching for general-purpose programs
- InACM SIGPLANConference on Programming Language Designand Implementation
, 2002
"... Prefetching data ahead of use has the potential to tolerate the growing processor-memory performance gap by overlapping long latency memory accesses with useful computation. While sophisticated prefetching techniques have been automated for limited domains, such as scientific codes that access dense ..."
Abstract
-
Cited by 116 (2 self)
- Add to MetaCart
(Show Context)
Prefetching data ahead of use has the potential to tolerate the growing processor-memory performance gap by overlapping long latency memory accesses with useful computation. While sophisticated prefetching techniques have been automated for limited domains, such as scientific codes that access dense arrays in loop nests, a similar level of success has eluded general-purpose programs, especially pointer-chasing codes written in languages such as C and C++. We address this problem by describing, implementing and evaluating a dynamic prefetching scheme. Our technique runs on stock hardware, is completely automatic, and works for generalpurpose programs, including pointer-chasing codes written in weakly-typed languages, such as C and C++. It operates in three phases. First, the profiling phase gathers a temporal data reference profile from a running program with low-overhead. Next, the profiling is turned off and a fast analysis algorithm extracts hot data streams, which are data reference sequences that frequently repeat in the same order, from the temporal profile. Then, the system dynamically injects code at appropriate program points to detect and prefetch these hot data streams. Finally, the process enters the hibernation phase where no profiling or analysis is performed, and the program continues to execute with the added prefetch instructions. At the end of the hibernation phase, the program is deoptimized to remove the inserted checks and prefetch instructions, and control returns to the profiling phase. For long-running programs, this profile, analyze and optimize, hibernate, cycle will repeat multiple times. Our initial results from applying dynamic prefetching are promising, indicating overall execution time improvements of 5–19 % for several memory-performance-limited SPECint2000 benchmarks running their largest (ref) inputs.