Static Cache Simulation and its Applications
, 1994
Cited by 41 (13 self)
This work takes a fresh look at the simulation of cache memories. It introduces the technique of static cache simulation that statically predicts a large portion of cache references. To efficiently utilize this technique, a method to perform efficient on-the-fly analysis of programs in general is developed and proved correct. This method is combined with static cache simulation for a number of applications. The application of fast instruction cache analysis provides a new framework to evaluate instruction cache memories that outperforms even the fastest techniques published. Static cache simulation is shown to address the issue of predicting cache behavior, contrary to the belief that cache memories introduce unpredictability to real-time systems that cannot be efficiently analyzed. Static cache simulation for instruction caches provides a large degree of predictability for real-time systems. In addition, an architectural modification through bit-encoding is introduced that provides fu...
Effectively Exploiting Indirect Jumps
- Software Practice and Experience
, 1997
Cited by 13 (2 self)
This dissertation describes a general code-improving transformation that can coalesce conditional branches into an indirect jump from a table. Applying this transformation allows an optimizer to exploit indirect jumps for many other coalescing opportunities besides the translation of multiway branch statements. First, dataflow analysis is performed to detect a set of coalescent conditional branches, which are often separated by blocks of intervening instructions. Second, several techniques are applied to reduce the cost of performing an indirect jump operation, often requiring the execution of only two instructions on a SPARC. Finally, the control flow is restructured using code duplication to replace the set of branches with an indirect jump. Thus, the transformation essentially provides early resolution of conditional branches that may originally have been some distance from the point where the indirect jump is inserted. The transformation can be frequently applied with often significant reductions in the number of instructions executed, total cache work, and execution time. In fact, over twice the benefit was achieved from exploiting indirect jumps as a general code-improving transformation instead of using the traditional approach of producing indirect jumps as an intermediate code generation decision. In addition, the author shows that with comparable branch target buffer support, indirect jumps improve branch prediction since they cause fewer mispredictions than the set of branches they replaced.
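As a rough source-level illustration of the idea (not the optimizer described in the abstract, which works on machine instructions), the following Python sketch contrasts a chain of conditional branches on one variable with a single table lookup followed by an indirect call. All names and the choice of character classes are hypothetical.

```python
# Hypothetical example: classify a character code either with a chain of
# conditional branches or with one table-driven indirect call.

def classify_branches(c):
    """Chain of conditional branches, each testing the same variable."""
    if c == ord(" "):
        return "space"
    if c == ord("\t"):
        return "tab"
    if c == ord("\n"):
        return "newline"
    return "other"


def _space():
    return "space"


def _tab():
    return "tab"


def _newline():
    return "newline"


def _other():
    return "other"


# Jump table covering codes 0..32; every slot defaults to _other.
JUMP_TABLE = [_other] * 33
JUMP_TABLE[ord(" ")] = _space
JUMP_TABLE[ord("\t")] = _tab
JUMP_TABLE[ord("\n")] = _newline


def classify_table(c):
    """One bounds check, then a single indirect call through the table."""
    if 0 <= c < len(JUMP_TABLE):
        return JUMP_TABLE[c]()
    return _other()
```

Both functions return the same answer for every input; the table version replaces up to three comparisons with one bounds check and one indexed call, which is the kind of trade the abstract evaluates at the instruction level.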
Supporting the Specification and Analysis of Timing Constraints
- Proceedings of the IEEE Real-Time Technology and Applications Symposium
, 1996
Cited by 11 (8 self)
Real-time programmers have to deal with the problem of relating timing constraints associated with source code to sequences of machine instructions. This paper describes an environment to assist users in the specification and analysis of timing constraints. A user is allowed to specify timing constraints within the source code of a C program. A user interface for a timing analyzer was developed to depict whether these constraints were violated or met. In addition, the interface allows portions of programs to be quickly selected with the corresponding bounded times, source code lines, and machine instructions automatically displayed. The result is a user-friendly environment that supports the user specification and analysis of timing constraints at a high (source code) level and retains the accuracy of low (machine code) level analysis.
Decreasing process memory requirements by overlapping program portions
- In Proceedings of the Hawaii International Conference on System Sciences
, 1998
Cited by 6 (0 self)
Most compiler optimizations focus on saving time and sometimes occur at the expense of increasing size. Yet processor speeds continue to increase at a faster rate than main memory and disk access times. Processors are now frequently being used in embedded systems that often have strict limitations on the size of programs they can execute. Also, reducing the size of a program may result in improved memory hierarchy performance. This paper describes general techniques for decreasing the memory requirements for a process by automatically overlapping portions of a program. Live range analysis, similar to the analysis used for allocating variables to registers, is used to determine which program portions conflict. Nonconflicting portions are assigned overlapping memory locations. The results show an average decrease of over 10% in process size for a variety of programs with minimal or no dynamic instruction increases.
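The overlapping step can be pictured with a small Python sketch. This is a simplified illustration under stated assumptions, not the paper's algorithm: each program portion is a hypothetical tuple (name, size, first_use, last_use), two portions conflict when their live ranges intersect, and a greedy first-fit pass (an assumed placement strategy) gives nonconflicting portions the same addresses.

```python
# Hypothetical portions: (name, size_in_bytes, first_use, last_use).

def conflicts(a, b):
    """Two portions conflict when their live ranges overlap in time."""
    return a[2] <= b[3] and b[2] <= a[3]


def assign_offsets(portions):
    """Greedy first-fit: place each portion at the lowest offset that does
    not overlap, in memory, any already placed portion it conflicts with."""
    placed = []  # list of (offset, portion)
    for p in sorted(portions, key=lambda q: -q[1]):  # largest first
        # Memory ranges occupied by conflicting, already placed portions.
        blocked = sorted(
            (off, off + q[1]) for off, q in placed if conflicts(p, q)
        )
        offset = 0
        for lo, hi in blocked:
            if offset + p[1] <= lo:
                break  # p fits in the gap before this blocked range
            offset = max(offset, hi)
        placed.append((offset, p))
    return {p[0]: off for off, p in placed}


def footprint(portions, offsets):
    """Total memory needed after overlapping."""
    return max(offsets[p[0]] + p[1] for p in portions)


# Made-up example data, for illustration only.
portions = [
    ("init_buf", 100, 0, 2),
    ("work_buf", 80, 3, 9),
    ("log_buf", 40, 1, 5),
]
offsets = assign_offsets(portions)
```

In this made-up example, `init_buf` and `work_buf` are never live at the same time, so both are placed at offset 0, while `log_buf` conflicts with both and is placed after them. The footprint is 140 bytes instead of the 220 bytes a non-overlapping layout would need.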
Coalescing Conditional Branches into Efficient Indirect Jumps
- Proceedings of the International Static Analysis Symposium
, 1997
Cited by 5 (4 self)
Indirect jumps from tables are traditionally only generated by compilers as an intermediate code generation decision when translating multiway selection statements. However, making this decision during intermediate code generation poses problems. The research described in this paper resolves these problems by using several types of static analysis as a framework for a code improving transformation that exploits indirect jumps from tables. First, control-flow analysis is performed that provides opportunities for coalescing branches generated from other control statements besides multiway selection statements. Second, the optimizer uses various techniques to reduce the cost of indirect jump operations by statically analyzing the context of the surrounding code. Finally, path and branch prediction analysis is used to provide a more accurate estimation of the benefit of coalescing a detected set of branches into a single indirect jump. The results indicate that the coalescing transformation can be frequently applied with significant reductions in the number of instructions executed and total cache work. This paper shows that static analysis can be used to implement an effective improving transformation for exploiting indirect jumps.
VISTA: VPO Interactive System for Tuning Applications
- ACM Transactions on Embedded Computing Systems
, 2005
Cited by 5 (0 self)
Software designers face many challenges when developing applications for embedded systems. One major challenge is meeting the conflicting constraints of speed, code size and power consumption. Embedded application developers often resort to hand-coded assembly language to meet these constraints since traditional optimizing compiler technology is usually of little help in addressing this challenge. The results are software systems that are not portable, less robust and more costly to develop and maintain. Another limitation is that compilers traditionally apply the optimizations to a program in a fixed order. However, it has long been known that a single ordering of optimization phases will not produce the best code for every application. In fact, the smallest unit of compilation in most compilers is typically a function and the programmer has no control over the code improvement process other than setting flags to enable or disable certain optimization phases. This paper describes a new code improvement paradigm implemented in a system called VISTA that can help achieve the cost/performance trade-offs that embedded applications demand. The VISTA system opens the code improvement process and gives the application programmer, when necessary, the ability to finely control it. VISTA also provides support for finding effective sequences of optimization phases. This support includes the ability to interactively get
Jello: a retargetable Just-In-Time compiler for LLVM bytecode
, 2002
Cited by 2 (2 self)
We present the design and implementation of Jello, a retargetable Just-In-Time (JIT) compiler for the Intel IA-32 architecture. The input to Jello is a C program statically compiled to Low-Level Virtual Machine (LLVM) bytecode. Jello takes advantage of the features of the LLVM bytecode representation to permit efficient run-time code generation, while emphasizing retargetability. Our approach uses an abstract machine code representation in Static Single Assignment form that is machine-independent, but can handle machine-specific features such as implicit and explicit register references. Because this representation is target-independent, many phases of code generation can be target-independent, making the JIT easily retargetable to new platforms without changing the code generator. Jello's ultimate goal is to provide a flexible host for future research in runtime optimization for programs written in languages which are traditionally compiled statically.
Decreasing process memory requirements by overlapping program portions
- In Proceedings of the Hawaii International Conference on System Sciences
, 1998
Cited by 2 (0 self)
Most of the time, faced with a time/space trade-off, a compiler writer will choose to optimize time, even at the cost of space. This was not always the case. Early in the history of computers, programmers would try everything they could think of to reduce the size of their code to get it to fit in the computer’s constrained space. As memory and

