Efficient Software-Based Fault Isolation
, 1993
Cited by 627 (11 self)
One way to provide fault isolation among cooperating software modules is to place each in its own address space. However, for tightly-coupled modules, this solution incurs prohibitive context switch overhead. In this paper, we present a software approach to implementing fault isolation within a single address space. Our approach has two parts. First, we load the code and data for a distrusted module into its own fault domain, a logically separate portion of the application's address space. Second, we modify the object code of a distrusted module to prevent it from writing or jumping to an address outside its fault domain. Both these software operations are portable and programming language independent. Our approach poses a tradeoff relative to hardware fault isolation: substantially faster communication between fault domains, at a cost of slightly increased execution time for distrusted modules. We demonstrate that for frequently communicating modules, implementing fault isolation in software rather than hardware can substantially improve end-to-end application performance.
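The abstract does not say how the rewritten object code keeps a store or jump inside its fault domain. One plausible mechanism, sketched here purely as an illustration, is to overwrite the upper bits of every target address with the domain's segment identifier before it is used. The 16-bit offset width and the names `SEGMENT_BITS`, `sandbox_address`, and `checked_store` are assumptions made for this sketch, not details taken from the paper.

```python
# Illustrative sketch only: the layout below is an assumption, not the paper's.
SEGMENT_BITS = 16                       # assumed: low 16 bits are the in-domain offset
OFFSET_MASK = (1 << SEGMENT_BITS) - 1   # keeps only the offset bits


def sandbox_address(addr: int, segment_id: int) -> int:
    """Force addr into the fault domain by replacing its segment bits."""
    return (segment_id << SEGMENT_BITS) | (addr & OFFSET_MASK)


def checked_store(memory: dict, addr: int, value: int, segment_id: int) -> int:
    """Store value at the sandboxed address and return the address actually written."""
    target = sandbox_address(addr, segment_id)
    memory[target] = value
    return target
```

With this scheme an address that already lies in the domain is unchanged, while a stray address is redirected into the domain instead of corrupting another module's memory.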
Using Profile Information to Assist Classic Code Optimizations
- Software: Practice and Experience
, 1991
Cited by 116 (13 self)
This paper describes the design and implementation of an optimizing compiler that automatically generates profile information to assist classic code optimizations. This compiler contains two new components, an execution profiler and a profile-based code optimizer, which are not commonly found in traditional optimizing compilers. The execution profiler inserts probes into the input program, executes the input program for several inputs, accumulates profile information and supplies this information to the optimizer. The profile-based code optimizer uses the profile information to expose new optimization opportunities that are not visible to traditional global optimization methods. Experimental results show that the profile-based code optimizer significantly improves the performance of production C programs that have already been optimized by a high-quality global code optimizer.
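The probe-and-accumulate step described above can be illustrated with a toy example. This is a minimal sketch, assuming hand-placed probes and a made-up function `classify`; the compiler in the paper inserts probes automatically, and none of the names below come from it.

```python
from collections import Counter

# Accumulated execution counts, keyed by a hypothetical basic-block label.
profile = Counter()


def probe(label: str) -> None:
    """Record one execution of the labelled block."""
    profile[label] += 1


def classify(n: int) -> str:
    """Toy input program with a probe at the entry and on each branch."""
    probe("entry")
    if n % 2 == 0:
        probe("even")
        return "even"
    probe("odd")
    return "odd"


def run_profile(inputs) -> dict:
    """Run the instrumented program over several inputs and return the counts."""
    profile.clear()
    for n in inputs:
        classify(n)
    return dict(profile)
```

An optimizer could then treat the branch with the highest count as the frequent path when deciding where to spend optimization effort.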
Optimization of Instruction Fetch Mechanisms for High Issue Rates
- In 22nd Annual International Symposium on Computer Architecture
, 1995
Cited by 115 (4 self)
Recent superscalar processors issue four instructions per cycle. These processors are also powered by highly-parallel superscalar cores. The potential performance can only be exploited when fed by high instruction bandwidth. This task is the responsibility of the instruction fetch unit. Accurate branch prediction and low I-cache miss ratios are essential for the efficient operation of the fetch unit. Several studies on cache design and branch prediction address this problem. However, these techniques are not sufficient. Even in the presence of efficient cache designs and branch prediction, the fetch unit must continuously extract multiple, non-sequential instructions from the instruction cache, realign these in the proper order, and supply them to the decoder. This paper explores solutions to this problem and presents several schemes with varying degrees of performance and cost. The most-general scheme, the collapsing buffer, achieves near-perfect performance and consistently aligns in...
Cache Behavior Prediction by Abstract Interpretation
- Science of Computer Programming
, 1996
Cited by 72 (11 self)
Cache Memories and Real-Time Applications. Caches are used to improve the access times of fast microprocessors to relatively slow main memories. They can reduce the number of cycles a processor is waiting for data by providing faster access to recently referenced regions of memory (Hennessy and Patterson [8] describe typical values for caches in 1990 workstations). Programs with hard real-time constraints have to be subjected to a schedulability analysis by the compiler [17, 6]; it has to be determined whether all timing constraints can be satisfied. WCETs (Worst Case Execution Times) for processes have to be used for this. For hardware with caches, the appropriate worst case assumption is that all accesses miss the cache. This is an overly pessimistic assumption which leads to a waste of hardware resources.
Generating Efficient Protocol Code from an Abstract Specification
, 1996
Cited by 50 (0 self)
A protocol compiler takes as input an abstract specification of a protocol and generates an implementation of that protocol. Protocol compilers usually produce inefficient code both in terms of code speed and code size. In this paper, we show that by compiling a modular specification into an integrated automaton and by selectively optimizing its different transitions, it is possible to automatically generate efficient protocol code. Our protocol compiler takes as input a protocol specification in the synchronous language Esterel and compiles it into a C implementation. This process is divided into two stages. First, the specification is compiled into an integrated automaton by the Esterel front end. This automaton is then optimized and converted into an efficient C implementation by a protocol code optimizer called HIPPCO. HIPPCO improves performance and reduces code size by simultaneously optimizing the performance of common path whi...
Using Profile Information to Assist Advanced Compiler Optimization and Scheduling
Cited by 11 (2 self)
Compilers for superscalar and VLIW processors must expose sufficient instruction-level parallelism in order to achieve high performance. Compile-time code transformations which expose instruction-level parallelism typically take into account the constraints imposed by all execution scenarios in the program. However, there are additional opportunities to increase instruction-level parallelism along the frequent execution scenario at the expense of the less frequent execution sequences. Profile information identifies these important execution sequences in a program. In this paper, two major categories of profile information are studied: control-flow and memory-dependence. Profile-based transformations have been incorporated into the IMPACT compiler. These transformations include global optimization, acyclic global scheduling, and software pipelining. The effectiveness of these profile-based techniques is evaluated for a range of superscalar and VLIW processors.
Binary rewriting of an operating system kernel
- In Proc. Workshop on Binary Instrumentation and Applications
, 2006
Cited by 5 (2 self)
This paper deals with some of the issues that arise in the context of binary rewriting and instrumentation of an operating system kernel. OS kernels are very different from ordinary application code in many ways, e.g., they contain a significant
Accurate Data Distribution Into Blocks May Boost Cache Performance
, 1996
Cited by 3 (0 self)
Applications often under-utilize cache space, generating unnecessarily high cache miss ratios. Data distribution is a software technique which could improve cache miss rates for any type of application. There is a great potential to exploit: data distribution can reduce capacity misses as well as conflict misses.
Reducing Load Delay to Improve Performance of Internet-Computing Programs
, 2001
Cited by 3 (1 self)
I Problem Statement
II Background
  A. Implementation of the Java Language Specification
    1. Access Rights
    2. Class File Format
  B. The Java Virtual Machine (JVM)
  C. Applets v/s Applications
  D. The Java Execution Model
III Related Work
  A. Transfer Delay Reduction
...
Compact and Efficient Presentation Conversion Code
, 1997
Cited by 2 (0 self)
Presentation conversion is a key operation in any development environment for distributed applications, such as Corba, Java-RMI, DCE or ASN.1-based environments. It is also a well-known performance bottleneck in high-speed network communication. Presentation conversion code is usually generated by an automatic code generation tool referred to as a stub compiler. The quality of the code generated by a stub compiler is often very low. The code is either very slow, or has a large code size, or both. This paper describes the design and experimental evaluation of an optimization stage for a stub compiler. The optimization stage automates the trade-off between code size and execution speed of the code generated by the compiler. This is achieved by using a hybrid of two implementation alternatives for presentation conversion routines (interpreted and procedure-driven code). The optimization problem is modeled as a Knapsack problem. A Markov model in combination with a heuristic branch predictor is used for...
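To make the Knapsack framing concrete, the following is a minimal sketch of how a size-versus-speed choice could be posed as a 0/1 knapsack. It assumes each routine has a known extra code size when compiled as procedure-driven code and a known time saving relative to interpretation. The function name `choose_compiled`, the tuple layout, and the example numbers are all hypothetical; the abstract does not give the paper's actual formulation or cost estimates.

```python
def choose_compiled(routines, size_budget):
    """0/1 knapsack by dynamic programming.

    routines: list of (name, extra_size, time_saved) tuples (hypothetical values).
    Returns (total_time_saved, names) for the best set of routines to emit as
    procedure-driven code without exceeding size_budget.
    """
    # best[b] holds the best (saving, chosen names) using at most b size units.
    best = [(0, [])] * (size_budget + 1)
    for name, size, saved in routines:
        # Walk budgets downwards so each routine is chosen at most once.
        for b in range(size_budget, size - 1, -1):
            candidate = best[b - size][0] + saved
            if candidate > best[b][0]:
                best[b] = (candidate, best[b - size][1] + [name])
    return best[size_budget]


# Hypothetical routines: (name, extra code size, time saved per call).
example = [("int", 2, 30), ("string", 3, 40), ("seq", 4, 50)]
```

Under this reading, routines left out of the chosen set would stay interpreted, which keeps code size within the budget at the cost of slower conversion for those types.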

