Results 1 - 10
of
205
Dynamo: A Transparent Dynamic Optimization System
- ACM SIGPLAN NOTICES
, 2000
"... We describe the design and implementation of Dynamo, a software dynamic optimization system that is capable of transparently improving the performance of a native instruction stream as it executes on the processor. The input native instruction stream to Dynamo can be dynamically generated (by a JIT ..."
Abstract
-
Cited by 479 (2 self)
- Add to MetaCart
We describe the design and implementation of Dynamo, a software dynamic optimization system that is capable of transparently improving the performance of a native instruction stream as it executes on the processor. The input native instruction stream to Dynamo can be dynamically generated (by a JIT for example), or it can come from the execution of a statically compiled native binary. This paper evaluates the Dynamo system in the latter, more challenging situation, in order to emphasize the limits, rather than the potential, of the system. Our experiments demonstrate that even statically optimized native binaries can be accelerated Dynamo, and often by a significant degree. For example, the average performance of --O optimized SpecInt95 benchmark binaries created by the HP product C compiler is improved to a level comparable to their --O4 optimized version running without Dynamo. Dynamo achieves this by focusing its efforts on optimization opportunities that tend to manifest only at runtime, and hence opportunities that might be difficult for a static compiler to exploit. Dynamo's operation is transparent in the sense that it does not depend on any user annotations or binary instrumentation, and does not require multiple runs, or any special compiler, operating system or hardware support. The Dynamo prototype presented here is a realistic implementation running on an HP PA-8000 workstation under the HPUX 10.20 operating system.
Disco: Running commodity operating systems on scalable multiprocessors
- ACM Transactions on Computer Systems
, 1997
"... In this paper we examine the problem of extending modern operating systems to run efficiently on large-scale shared memory multiprocessors without a large implementation effort. Our approach brings back an idea popular in the 1970s, virtual machine monitors. We use virtual machines to run multiple c ..."
Abstract
-
Cited by 253 (10 self)
- Add to MetaCart
(Show Context)
In this paper we examine the problem of extending modern operating systems to run efficiently on large-scale shared memory multiprocessors without a large implementation effort. Our approach brings back an idea popular in the 1970s, virtual machine monitors. We use virtual machines to run multiple commodity operating systems on a scalable multiprocessor. This solution addresses many of the challenges facing the system software for these machines. We demonstrate our approach with a prototype called Disco that can run multiple copies of Silicon Graphics ’ IRIX operating system on a multiprocessor. Our experience shows that the overheads of the monitor are small and that the approach provides scalability as well as the ability to deal with the non-uniform memory access time of these systems. To reduce the memory overheads associated with running multiple operating systems, we have developed techniques where the virtual machines transparently share major data structures such as the program code and the file system buffer cache. We use the distributed system support of modern operating systems to export a partial single system image to the users. The overall solution achieves most of the benefits of operating systems customized for scalable multiprocessors yet it can be achieved with a significantly smaller implementation effort. 1
Managing Multi-Configurable Hardware via Dynamic Working Set Analysis
- In 29th Annual International Symposium on Computer Architecture
, 2002
"... Microprocessors are designed to provide good average performance over a variety of workloads. This can lead to inefficiencies both in power and performance for individual programs and during individual phases within the same program. Microarchitectures with multi-configuration units (e.g. caches, pr ..."
Abstract
-
Cited by 192 (3 self)
- Add to MetaCart
(Show Context)
Microprocessors are designed to provide good average performance over a variety of workloads. This can lead to inefficiencies both in power and performance for individual programs and during individual phases within the same program. Microarchitectures with multi-configuration units (e.g. caches, predictors, instruction windows) are able to adapt dynamically to program behavior and enable /disable resources as needed. A key element of existing configuration algorithms is adjusting to program phase changes. This is typically done by "tuning" when a phase change is detected -- i.e. sequencing through a series of trial configurations and selecting the best. We study algorithms that dynamically collect and analyze program working set information. To make this practical, we propose working set signatures -- highly compressed working set representations (e.g. 32-128 bytes total). We describe algorithms that use working set signatures to 1) detect working set changes and trigger re-tuning; 2) identify recurring working sets and re-install saved optimal reconfigurations, thus avoiding the time-consuming tuning process; 3) estimate working set sizes to configure caches directly to the proper size, also avoiding the tuning process. We use reconfigurable instruction caches to demonstrate the performance of the proposed algorithms. When applied to reconfigurable instruction caches, an algorithm that identifies recurring phases achieves power savings and performance similar to the best algorithm reported to date, but with orders-of-magnitude savings in retunings. 1
An Infrastructure for Adaptive Dynamic Optimization
, 2003
"... Dynamic optimization is emerging as a promising approach to overcome many of the obstacles of traditional static compilation. But while there are a number of compiler infrastructures for developing static optimizations, there are very few for developing dynamic optimizations. We present a framework ..."
Abstract
-
Cited by 189 (6 self)
- Add to MetaCart
Dynamic optimization is emerging as a promising approach to overcome many of the obstacles of traditional static compilation. But while there are a number of compiler infrastructures for developing static optimizations, there are very few for developing dynamic optimizations. We present a framework for implementing dynamic analyses and optimizations. We provide an interface for building external modules, or clients, for the DynamoRIO dynamic code modification system. This interface abstracts away many low-level details of the DynamoRIO runtime system while exposing a simple and powerful, yet efficient and lightweight, API. This is achieved by restricting optimization units to linear streams of code and using adaptive levels of detail for representing instructions. The interface is not restricted to optimization and can be used for instrumentation, profiling, dynamic translation, etc.. To demonstrate
CHIMAERA: a high-performance architecture with a tightly-coupled reconfigurable functional unit
- IN PROCEEDINGS OF THE 27TH ANNUAL INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE
, 2000
"... Reconfigurable hardware has the potential for significant performance improvements by providing support for application−specific operations. We report our experience with Chimaera, a prototype system that integrates a small and fast reconfigurable functional unit (RFU) into the pipeline of an aggres ..."
Abstract
-
Cited by 109 (1 self)
- Add to MetaCart
(Show Context)
Reconfigurable hardware has the potential for significant performance improvements by providing support for application−specific operations. We report our experience with Chimaera, a prototype system that integrates a small and fast reconfigurable functional unit (RFU) into the pipeline of an aggressive, dynamically−scheduled superscalar processor. Chimaera is capable of performing 9−input/1−output operations on integer data. We discuss the Chimaera C compiler that automatically maps computations for execution in the RFU. Chimaera is capable of: (1) collapsing a set of instructions into RFU operations, (2) converting control−flow into RFU operations, and (3) supporting a more powerful fine−grain data−parallel model than that supported by current multimedia extension instruction sets (for integer operations). Using a set of multimedia and communication applications we show that even with simple optimizations, the Chimaera C compiler is able to map 22 % of all instructions to the RFU on the average. A variety of computations are mapped into RFU operations ranging from as simple as add/sub−shift pairs to operations of more than 10 instructions including several branches. Timing experiments demonstrate that for a 4−way out−of−order superscalar processor Chimaera results in average performance improvements of 21%, assuming a very aggressive core processor design (most pessimistic RFU latency model) and communication overheads from and to the RFU.
A dynamic compilation framework for controlling microprocessor energy and performance.
- In Microarchitecture, 2005. MICRO-38. Proceedings. 38th Annual IEEE/ACM International Symposium on
, 2005
"... ..."
(Show Context)
Retargetable and reconfigurable software dynamic translation
- In CGO ’03: Proceedings of the international symposium on Code generation and optimization
, 2003
"... Software dynamic translation (SDT) is a technology that permits the modification of an executing program’s instructions. In recent years, SDT has received increased attention, from both industry and academia, as a feasible and effective approach to solving a variety of significant problems. Despite ..."
Abstract
-
Cited by 84 (19 self)
- Add to MetaCart
(Show Context)
Software dynamic translation (SDT) is a technology that permits the modification of an executing program’s instructions. In recent years, SDT has received increased attention, from both industry and academia, as a feasible and effective approach to solving a variety of significant problems. Despite this increased attention, the task of initiating a new project in software dynamic translation remains a difficult one. To address this concern, and in particular, to promote the adoption of SDT technology into an even wider range of applications, we have implemented Strata, a cross-platform infrastructure for building software dynamic translators. This paper describes Strata’s architecture, our experience retargeting it to three different processors,
A brief history of just-in-time
- ACM Computing Surveys
"... Software systems have been using “just-in-time ” compilation (JIT) techniques since the 1960s. Broadly, JIT compilation includes any translation performed dynamically, after a program has started execution. We examine the motivation behind JIT compilation and constraints imposed on JIT compilation s ..."
Abstract
-
Cited by 79 (2 self)
- Add to MetaCart
Software systems have been using “just-in-time ” compilation (JIT) techniques since the 1960s. Broadly, JIT compilation includes any translation performed dynamically, after a program has started execution. We examine the motivation behind JIT compilation and constraints imposed on JIT compilation systems, and present a classification scheme for such systems. This classification emerges as we survey forty years of JIT work, from
System Support for Automatic Profiling and Optimization
"... The Morph system provides a framework for automatic collection and management of profile information and application of profile-driven optimizations. In this paper, we focus on the operating system support that is required to collect and manage profile information on an end-user’s workstation in an ..."
Abstract
-
Cited by 73 (6 self)
- Add to MetaCart
The Morph system provides a framework for automatic collection and management of profile information and application of profile-driven optimizations. In this paper, we focus on the operating system support that is required to collect and manage profile information on an end-user’s workstation in an automatic, continuous, and transparent manner. Our implementation for a Digital Alpha machine running Digital UNIX 4.0 achieves run-time overheads of less than 0.3 % during profile collection. Through the application of three code layout optimizations, we further show that Morph can use statistical profiles to improve application performance. With appropriate system support, automatic profiling and optimization is both possible and effective.
Safe virtual execution using software dynamic translation
- Annual Computer Security Applications Conference
, 2002
"... Safe virtual execution (SVE) allows a host computer system to reduce the risks associated with running untrusted programs. SVE prevents untrusted programs from directly accessing system resources, thereby giving the host the ability to control how individual resources may be used. SVE is used in a v ..."
Abstract
-
Cited by 71 (11 self)
- Add to MetaCart
(Show Context)
Safe virtual execution (SVE) allows a host computer system to reduce the risks associated with running untrusted programs. SVE prevents untrusted programs from directly accessing system resources, thereby giving the host the ability to control how individual resources may be used. SVE is used in a variety of safety-conscious software systems, including the Java Virtual Machine (JVM), software fault isolation (SFI), system call interposition layers, and execution monitors. While SVE is the conceptual foundation for these systems, each uses a different implementation technology. The lack of a unifying framework for building SVE systems results in a variety of problems: many useful SVE systems are not portable and therefore are usable only on a limited number of platforms; code reuse among different SVE systems is often difficult or impossible; and building SVE systems from scratch can be both time consuming and error prone. To address these concerns, we have developed a portable, extensible framework for constructing SVE systems. Our framework, called Strata, is based on software dynamic translation (SDT), a technique for modifying binary programs as they execute. Strata is designed to be ported easily to new platforms and to date has been targeted to SPARC/Solaris, x86/Linux, and MIPS/IRIX. This portability ensures that SVE applications implemented in Strata are available to a wide variety of host systems. Strata also affords the opportunity for code reuse among different SVE applications by establishing a common implementation framework. Strata implements a basic safe virtual execution engine using SDT. The base functionality supplied by this engine is easily extended to implement specific SVE systems. In this paper we describe the organization of Strata and demonstrate its extension by building two SVE systems: system call interposition and stacksmashing prevention. To illustrate the use of the system call interposition extensions, the paper presents implementations of several useful security policies.