Results 1 - 10 of 56
A Survey of Adaptive Optimization in Virtual Machines
- Proceedings of the IEEE, 93(2), 2005. Special Issue on Program Generation, Optimization, and Adaptation
Cited by 59 (6 self)
Virtual machines face significant performance challenges beyond those confronted by traditional static optimizers. First, portable program representations and dynamic language features, such as dynamic class loading, force the deferral of most optimizations until runtime, inducing runtime optimization overhead. Second, modular
The Structure and Performance of Efficient Interpreters
- Journal of Instruction-Level Parallelism, 2003
Cited by 28 (3 self)
Interpreters designed for high general-purpose performance typically perform a large number of indirect branches (3.2%-13% of all executed instructions in our benchmarks). These branches consume...
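The indirect branches this abstract measures come from the dispatch loop itself. A minimal sketch (toy instruction set invented here, not from the paper) of a direct-threaded interpreter using GCC's labels-as-values extension shows where they occur: every instruction body ends with its own indirect `goto`.

```c
#include <assert.h>

/* Toy direct-threaded interpreter (GCC labels-as-values extension).
 * Each body ends with an indirect branch ("goto *dispatch[*pc++]"),
 * the class of branch the paper finds makes up 3.2%-13% of executed
 * instructions in efficient interpreters. */
enum { OP_INC, OP_ADD2, OP_HALT };

long run_threaded(const unsigned char *code) {
    static const void *dispatch[] = { &&op_inc, &&op_add2, &&op_halt };
    long acc = 0;
    const unsigned char *pc = code;
    goto *dispatch[*pc++];          /* initial dispatch */
op_inc:
    acc += 1;
    goto *dispatch[*pc++];          /* per-body indirect branch */
op_add2:
    acc += 2;
    goto *dispatch[*pc++];
op_halt:
    return acc;
}
```

Because each body has its own indirect branch, the hardware branch predictor sees many distinct branch sites rather than one, which is the usual argument for threading over a single `switch` dispatch loop.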
YETI: a graduallY Extensible Trace Interpreter
, 2008
Cited by 22 (0 self)
The implementation of new programming languages benefits from interpretation because it is simple, flexible and portable. The only downside is speed of execution, as there remains a large performance gap between even efficient interpreters and systems that include a just-in-time (JIT) compiler. Augmenting an interpreter with a JIT, however, is not a small task. Today, Java JITs are typically method-based. To compile whole methods, the JIT must re-implement much functionality already provided by the interpreter, leading to a “big bang” development effort before the JIT can be deployed. Adding a JIT to an interpreter would be easier if we could more gradually shift from dispatching virtual instruction bodies implemented for the interpreter to running instructions compiled into native code by the JIT. We show that virtual instructions implemented as lightweight callable routines can form the basis for a very efficient interpreter. Our new technique, interpreted traces, identifies hot paths, or traces, as a virtual program is interpreted. By exploiting the way traces predict branch destinations, our technique markedly reduces branch mispredictions caused by dispatch. Interpreted traces are a high-performance technique, running about 25% faster than direct threading. We show that interpreted traces are a good starting point for a trace-based JIT. We extend
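The "lightweight callable routines" idea the abstract starts from can be sketched as follows (instruction set, operand encoding, and VM state layout are invented for illustration, not YETI's): each virtual instruction body is a small C function, and the interpreter dispatches by calling it.

```c
#include <assert.h>

/* Sketch of virtual instruction bodies as callable routines. A body
 * receives the VM state, does its work, and advances (or clears) the
 * virtual PC; the interpreter is just a loop of native calls. */
struct insn;
typedef struct { long acc; const struct insn *pc; } VM;
typedef void (*body_fn)(VM *);
struct insn { body_fn body; long operand; };

void op_push(VM *vm) { vm->acc += vm->pc->operand; vm->pc++; }
void op_dbl (VM *vm) { vm->acc *= 2;               vm->pc++; }
void op_halt(VM *vm) { vm->pc = 0; }               /* stop the loop */

long interpret(const struct insn *program) {
    VM vm = { 0, program };
    while (vm.pc)                   /* one native call per virtual insn */
        vm.pc->body(&vm);
    return vm.acc;
}
```

Because bodies are ordinary callable routines, a trace-based JIT can later replace the loop with generated code that calls the same bodies, which is the gradual extensibility the title refers to.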
The Case for Virtual Register Machines
, 2004
Cited by 22 (1 self)
Virtual machines (VMs) are a popular target for language implementers. A long-running question in the design of virtual machines has been whether stack or register architectures can be implemented more efficiently with an interpreter. Many designers favour stack architectures since the location of operands is implicit in the stack pointer. In contrast, the operands of register machine instructions must be specified explicitly. In this paper, we present a working system for translating stack-based Java virtual machine (JVM) code to a simple register code. We describe the translation process, the complicated parts of the JVM which make translation more difficult, and the optimisations needed to eliminate copy instructions. Experimental results show that a register format reduced the number of executed instructions by 34.88%, while increasing the number of bytecode loads by an average of 44.81%. Overall, this corresponds to an increase of 2.32 loads for each dispatch removed. We believe that the high cost of dispatches makes register machines attractive even at the cost of increased loads.
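The dispatch-versus-load trade-off the abstract quantifies can be sketched with two toy interpreters (instruction encodings invented here, not the paper's). Computing `c = a + b` costs the stack form four dispatches and the register form two, counting the halt, while the register instruction must fetch three explicit operands.

```c
#include <assert.h>

enum { S_LOAD, S_ADD, S_HALT };   /* toy stack machine */
enum { R_ADD, R_HALT };           /* toy register machine */

/* Returns the number of dispatches executed; result stored via vars/regs. */
int run_stack(const int *code, int *vars) {
    int stack[8], sp = 0, dispatches = 0;
    for (;;) {
        dispatches++;
        switch (*code++) {
        case S_LOAD: stack[sp++] = vars[*code++]; break;
        case S_ADD:  sp--; stack[sp - 1] += stack[sp]; break;
        case S_HALT: vars[*code] = stack[--sp]; return dispatches;
        }
    }
}

int run_reg(const int *code, int *regs) {
    int dispatches = 0;
    for (;;) {
        dispatches++;
        switch (*code++) {
        case R_ADD:                      /* three explicit operand fetches */
            regs[code[0]] = regs[code[1]] + regs[code[2]];
            code += 3;
            break;
        case R_HALT: return dispatches;
        }
    }
}
```

The paper's 2.32-loads-per-dispatch figure is the empirical version of this trade: each eliminated dispatch is paid for with extra operand loads.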
Context Threading: A Flexible and Efficient Dispatch Technique for Virtual Machine Interpreters
- In CGO ’05: Proceedings of the International Symposium on Code Generation and Optimization, 2005
Cited by 12 (3 self)
Direct-threaded interpreters use indirect branches to dispatch bytecodes, but deeply-pipelined architectures rely on branch prediction for performance. Due to the poor correlation between the virtual program’s control flow and the hardware program counter, which we call the context problem, direct threading’s indirect branches are poorly predicted by the hardware, limiting performance. Our dispatch technique, context threading, improves branch prediction and performance by aligning hardware and virtual machine state. Linear virtual instructions are dispatched with native calls and returns, aligning the hardware and virtual PC. Thus, sequential control flow is predicted by the hardware return stack. We convert virtual branching instructions to native branches, mobilizing the hardware’s branch prediction resources. We evaluate the impact of context threading on both branch prediction and performance using interpreters for Java and OCaml on the Pentium and PowerPC architectures. On the Pentium IV, our technique reduces mean mispredicted branches by 95%. On the PowerPC, it reduces mean branch stall cycles by 75% for OCaml and 82% for Java. Due to reduced branch hazards, context threading reduces mean execution time by 25% for Java and by 19% and 37% for OCaml on the P4 and PPC970, respectively. We also combine context threading with a conservative inlining technique and find its performance comparable to that of selective inlining.
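The core move, dispatching linear virtual instructions with native calls and returns, can be illustrated in plain C (instruction bodies and the virtual program are invented; in the real system the call sequence is generated at load time rather than written by hand). Each call/return pair keeps the hardware return-address stack in step with the virtual PC, so sequential flow is predicted for free.

```c
#include <assert.h>

/* Hand-written stand-in for a generated context-threading table: one
 * native call per virtual instruction of the straight-line program
 * "LOAD 3; ADD 4; MUL 2". Sequential returns are predicted by the
 * hardware return stack instead of a shared indirect branch. */
static long acc;

void vi_load3(void) { acc  = 3; }
void vi_add4 (void) { acc += 4; }
void vi_mul2 (void) { acc *= 2; }

long run_context_threaded(void) {
    vi_load3();   /* native call aligns hardware PC with virtual PC */
    vi_add4();
    vi_mul2();
    return acc;
}
```

The contrast with direct threading is that there is no single dispatch site whose indirect branch must predict every successor; prediction work is spread across ordinary call and return instructions.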
Retargeting JIT compilers by using C-compiler generated executable code
- In Parallel Architecture and Compilation Techniques (PACT ’04), 2004
Cited by 11 (1 self)
JIT compilers produce fast code, whereas interpreters are easy to port between architectures. We propose to combine the advantages of these language implementation techniques as follows: we generate native code by concatenating and patching machine code fragments taken from interpreter-derived code (generated by a C compiler); we completely eliminate the interpreter dispatch overhead and accesses to the interpreted code by patching jump target addresses and other constants into the fragments. In this paper we present the basic idea, discuss some issues in more detail, and present results from a proof-of-concept implementation, providing speedups of up to 1.87 over the fastest previous interpreter-based technique, and performance comparable to simple native-code compilers. The effort required for retargeting our implementation from the 386 to the PPC architecture was less than a person-day.
Improving the Performance of Object-Oriented Languages with Dynamic Predication of Indirect Jumps
Cited by 9 (1 self)
Indirect jump instructions are used to implement increasingly common programming constructs such as virtual function calls, switch-case statements, jump tables, and interface calls. The performance impact of indirect jumps is likely to increase because indirect jumps with multiple targets are difficult to predict even with specialized hardware. This paper proposes a new way of handling hard-to-predict indirect jumps: dynamically predicating them. The compiler (static or dynamic) identifies indirect jumps that are suitable for predication along with their control-flow merge (CFM) points. The hardware predicates the instructions between different targets of the jump and its CFM point if the jump turns out to be hard-to-predict at run time. If the jump would actually have been mispredicted, its dynamic predication eliminates a pipeline flush, thereby improving performance. Our evaluations show that Dynamic Indirect jump Predication (DIP) improves the performance of a set of object-oriented applications including the Java DaCapo benchmark suite by 37.8% compared to a commonly-used branch target buffer based predictor, while also reducing energy consumption by 24.8%. We compare DIP to three previously proposed indirect jump predictors and find that it provides the best performance and energy-efficiency.
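DIP itself is a hardware mechanism, but the predication idea can be shown with a software analogy (bodies and selector are invented): instead of branching to one of two targets and risking a flush on a misprediction, execute both target bodies and select the result at the control-flow merge point.

```c
#include <assert.h>

/* Software analogy of predicating a two-target indirect jump: both
 * target bodies execute, and a predicate selects the live result at
 * the CFM point, so no control transfer needs to be predicted. */
int dispatch_predicated(int selector, int x) {
    int p  = (selector == 0);   /* predicate from the jump condition */
    int r0 = x + 1;             /* body of target 0 */
    int r1 = x * 2;             /* body of target 1 */
    return p ? r0 : r1;         /* select at the control-flow merge */
}
```

The hardware version pays the cost of executing extra instructions only when the jump is detected to be hard-to-predict, which is why it wins on average despite doing redundant work.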
A study of the influence of coverage on the relationship between static and dynamic coupling metrics
- Science of Computer Programming, 2006
Cited by 8 (3 self)
This paper examines the relationship between the static coupling between objects (CBO) metric and some of its dynamic counterparts. The dimensions of the relationship for Java programs are investigated, and the influence of instruction coverage on this relationship is measured. An empirical evaluation of 14 Java programs taken from the SPEC JVM98 and the JOlden benchmark suites is conducted using the static CBO metric, six dynamic metrics and instruction coverage data. The results presented here confirm preliminary studies indicating the independence of static and dynamic coupling metrics, but point to a strong influence of coverage on the relationship. On this basis, the paper suggests that dynamic coupling metrics might be better interpreted in the context of coverage measures, rather than as stand-alone software metrics.
Blink: Advanced Display Multiplexing for Virtualized Applications
Cited by 6 (0 self)
Providing untrusted applications with shared and dependable access to modern display hardware is of increasing importance. Our new display system, called Blink, safely multiplexes complex graphical content from multiple untrusted virtual machines onto a single Graphics Processing Unit (GPU). Blink does not allow clients to program the GPU directly, but instead provides a virtual processor abstraction to which they can program. Blink executes virtual processor programs and controls the GPU on behalf of the client, in a manner that reduces processing and context switching overheads. To achieve performance and safety, Blink employs just-in-time compilation and simple static analysis.
Optimizations for a Java Interpreter Using Instruction Set Enhancement
Cited by 6 (0 self)
Several methods for optimizing Java interpreters have been proposed that involve augmenting the existing instruction set. In this paper we describe the design and implementation of three such optimizations for an efficient Java interpreter. Specialized instructions are new versions of existing instructions with commonly occurring operands hardwired into them, which reduces operand fetching. Superinstructions are new Java instructions which perform the work of common sequences of instructions. Finally, instruction replication is the duplication of existing instructions with a view to improving branch prediction accuracy. We describe our basic interpreter, the interpreter generator we use to automatically create optimised source code for enhanced instructions, and discuss Java specific issues relating to these optimizations. Experimental results show that significant speedups (up to a factor of 3.3, and realistic average speedups of 30%-35%) are attainable using these techniques.
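Two of the three enhancements can be sketched in a toy switch-dispatched interpreter (opcode names and the instruction set are invented, not the paper's): `ADD_1` is a specialized instruction with its operand hardwired, and `LOAD_ADDI` is a superinstruction fusing a load and an add into one dispatch.

```c
#include <assert.h>

enum { ADDI, ADD_1, LOADI, LOAD_ADDI, HALT };

long interp_enhanced(const int *code) {
    long acc = 0;
    for (;;) {
        switch (*code++) {
        case ADDI:  acc += *code++; break;  /* generic: fetches its operand */
        case ADD_1: acc += 1;       break;  /* specialized: operand hardwired */
        case LOADI: acc  = *code++; break;
        case LOAD_ADDI:                     /* superinstruction: two bodies,
                                               one dispatch */
            acc = code[0] + code[1];
            code += 2;
            break;
        case HALT:  return acc;
        }
    }
}
```

Computing 5 + 1 via `LOADI; ADD_1` takes three dispatches; the fused `LOAD_ADDI` form takes two, which is where the speedups come from once dispatch dominates.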