This directory is created automatically, and some papers may be mislabeled. Only documents within the CiteSeer database are listed. The directory is intended to provide entry points for browsing the database and is not intended to be authoritative. Papers may not appear in all relevant categories. For example, papers in a sub-category may not appear in higher-level categories.
614.4 LEDA - A Platform for Combinatorial and Geometric Computing - Mehlhorn, Näher (1995)
LEDA is a library of efficient data types and algorithms in combinatorial and
geometric computing. The main features of the library are its wide collection
of data types and algorithms, the precise an... / can be used with any C++ compiler supporting templates e.g. cfront / areas such as discrete optimization scheduling traffic control
531.9 Parallel Programming in Split-C - Culler (1993)
We introduce the Split-C language, a parallel extension
of C intended for high performance programming
on distributed memory multiprocessors, and demonstrate
the use of the language in optimizing para... / the programmer specifies the compiler takes care of addressing and / or locality is not limited by the compiler's recognition capability nor
480.8 MediaBench: A Tool for Evaluating and Synthesizing Multimedia and.. - Lee (1997)
Over the last decade, significant advances have been made in compilation technology for capitalizing on instruction-level parallelism (ILP). The vast majority of ILP compilation research has been cond... / matched to the needs of the ILP compilers. Most of these processors are / currently exists a gap between the compiler community and embedded
469.5 Performance of Various Computers Using Standard Linear Equations.. - Jack Dongarra (1995)
This report compares the performance of different computer systems in solving dense systems
of linear equations. The comparison involves approximately a hundred computers, ranging from
a Cray Y-MP to ... / or multiple processors. The compilers on some machines may of / gives the operating system and compiler used. The run was based on two
451.4 Automatically Tuned Linear Algebra Software - Whaley, Dongarra (1998)
This paper describes an approach for the automatic generation and optimization of
numerical software for processors with deep memory hierarchies and pipelined functional
units. The production of suc... / Why Can't the Compiler Do This / Conclusions A BLAS and compiler details List of Tables
400.0 Shade: A Fast Instruction-Set Simulator for Execution Profiling - Bob Cmelik (1993)
Tracing tools are used widely to help analyze, design, and tune
both hardware and software systems. This paper describes a tool
called Shade which combines efficient instruction-set simulation
with a ... / everything from architectures to compilers to applications. Analyzers can / on particular languages and compilers. Ideally it should also avoid
400.0 A Survey of Program Slicing Techniques - Tip (1995)
A program slice consists of the parts of a program that (potentially) affect the
values computed at some point of interest, referred to as a slicing criterion. The task
of computing program slices is ... / are investigated. We discuss how compiler-optimization techniques can be / Section . Section suggests how compiler-optimization techniques may be
391.3 Simultaneous Multithreading: Maximizing On-Chip Parallelism - Tullsen, Eggers, Levy (1995)
This paper examines simultaneous multithreading, a technique permitting
several independent threads to issue instructions to a superscalar's
multiple functional units in a single cycle. We present
se... / Multiflow trace scheduling compiler Our results show the / the workload and the wide-issue compiler optimization and scheduling
379.3 Compiler Transformations for High-Performance Computing - Bacon (1993)
In the last three decades a large number of compiler transformations for optimizing programs have been implemented. Most optimizations for uniprocessors reduce the number of instructions executed by t... / Compiler Transformations for / three decades a large number of compiler transformations for optimizing
374.4 JavaParty - Transparent Remote Objects in Java - Philippsen, Zenger (1997)
Java's threads offer appropriate means either for parallel programming of SMPs or as target constructs when compiling add-on features (e.g. forall constructs, automatic parallelization, etc.) Unfortun... / to specific nodes of the network compiler and runtime system deal with / selected and changed at runtime. Compiler analysis or a well-informed
365.2 Value Locality and Load Value Prediction - Lipasti et al. (1996)
Since the introduction of virtual memory demand-paging
and cache memories, computer systems have been exploiting
spatial and temporal locality to reduce the average latency of a
memory reference. In t... / by modern state-of-the-art compilers exhibits these tendencies. We / for e.g. a switch statement the compiler must generate code to load a
344.8 Uniprocessor Garbage Collection Techniques - Wilson (1992)
We survey basic garbage collection algorithms, and variations such as incremental and generational collection; we then discuss low-level implementation considerations and the relationships between sto... / systems languages and compilers. Throughout we attempt to / and Smart Pointers . Compiler Cooperation and Optimizations
342.0 A Metaobject Protocol for C++ - Chiba (1995)
This paper presents a metaobject protocol (MOP)
for C++. This MOP was designed to bring the
power of meta-programming to C++ programmers.
It avoids penalties on runtime performance
by adopting a new m... / objects or customized compiler optimizations such Appeared in / extensions but also ones for compiler optimizations. From the
336.2 TIL: A Type-Directed Optimizing Compiler for ML - Tarditi, Morrisett, Cheng (1995)
We describe a new compiler for Standard ML called TIL, that is based on four technologies: intensional
polymorphism, tag-free garbage collection, conventional functional language optimization,
and loo... / TIL A Type-Directed Optimizing Compiler for ML David Tarditi Greg / Abstract We describe a new compiler for Standard ML called TIL that
327.2 Typed Memory Management in a Calculus of Capabilities - Crary, Walker, Morrisett (1999)
An increasing number of systems rely on programming language
technology to ensure safety and security of low-level
code. Unfortunately, these systems typically rely on a complex,
trusted garbage colle... / type-safe code. We present a compiler intermediate language called the / heavily optimized by hand or by compiler and yet be automatically
314.2 Implementing Multiple Protection Domains in Java - Hawblitzel, Chang, Czajkowski, Hu.. (1998)
Safe language technology can be used for protection within a single address space. This protection
is enforced by the language's type system, which ensures that references to objects cannot be
forged... / due to current Java just-in-time compilers optimizing for fast compile / in the case of Java just-in-time compilers have the opportunity to perform
313.0 Compiling Polymorphism Using Intensional Type Analysis - Harper, Morrisett (1995)
Traditional techniques for implementing polymorphism use a universal representation for objects of unknown type. Often, this forces a compiler to use universal representations even if the types of obj... / type. Often this forces a compiler to use universal representations / Introduction Many compilers assume a universal or boxed
285.7 Dependent Types in Practical Programming - Xi (1998)
Programming is a notoriously error-prone process, and a great deal of evidence in practice has demonstrated that the use of a type system in a programming language can effectively detect program error... / program error detection and compiler optimization. A major / ones. The use of types for compiler optimization such as passing
281.1 Optimization of Object-Oriented Programs Using Static Class Hierarchy .. - Dean, Grove, Chambers (1995)
Optimizing compilers for object-oriented languages apply static
class analysis and other techniques to try to deduce precise information about
the possible classes of the receivers of messages; if s... / Abstract. Optimizing compilers for object-oriented languages / class hierarchy analysis the compiler can improve the quality of static
268.5 Titanium: A High-Performance Java Dialect - Yelick, Semenzato, Pike, Miyamoto.. (1998)
Titanium is a language and system for high-performance parallel scientific computing. Titanium
uses Java as its base, thereby leveraging the advantages of that language and allowing us to focus
attent... / on heroic parallelizing compiler technology and the consequent / and the consequent absence of compilers and tools and the
263.7 Optimizing ML with Run-Time Code Generation - Leone, Lee (1995)
We describe the design and implementation of a compiler that automatically translates ordinary
programs written in a subset of ML into code that generates native code at run time. Run-time
code genera... / the design and implementation of a compiler that automatically translates / Our system called Fabius is a compiler that takes ordinary programs
257.9 Dealing With Disaster: Surviving Misbehaved Kernel Extensions - Seltzer (1996)
Today's extensible operating systems allow applications
to modify kernel behavior by providing mechanisms for
application code to run in the kernel address space. The
advantage of this approach is tha... / can take advantage of advanced compiler optimization techniques / e.g. compiled with the correct compiler Finally we must limit the
254.5 A More Efficient RMI for Java - Nester, Philippsen, Haumacher (1999)
In current Java implementations, Remote Method Invocation (RMI) is too slow, especially for high performance computing. RMI is designed for wide-area and high-latency networks, it is based on a slow o... / is currently working on the compiler project Manta Manta has / platforms or particular native compilers. There are other approaches to
245.7 Putting Pointer Analysis To Work - Ghiya (1998)
Pointer analysis has recently been a subject of active research. The focus of most
techniques is on: (1) estimating the targets for stack-directed pointers, (2) computing
relationships between heap-di... / results of pointer analysis for compiler optimizations. This thesis / information to a wide variety of compiler applications. That is once the
245.7 Cache-Conscious Data Placement - Calder, Krintz, John, Austin (1998)
As the gap between memory and processor speeds continues to widen, cache efficiency is an increasingly important component of processor performance. Compiler techniques have been used to improve instr... / of processor performance. Compiler techniques have been used to / Data Placement. This is a compiler directed approach that creates
236.3 Nonlinear Array Layouts for Hierarchical Memory Systems - Chatterjee, Jain, Lebeck, Mundhra.. (1999)
Programming languages that provide multidimensional arrays and
a flat linear model of memory must implement a mapping between
these two domains to order array elements in memory. This layout
function ... / in several high-performance compilers. Tiling techniques are also / by the programmer or by the compiler and examine the additional
232.0 Compiler Optimizations for Improving Data Locality - Carr, McKinley, Tseng (1994)
In the past decade, processor speed has become significantly faster than memory speed. Small, fast cache memories are designed to overcome this discrepancy, but they are only effective when programs e... / Compiler Optimizations for Improving Data / In this paper we present compiler optimizations to improve data
231.8 Improving Data Locality with Loop Transformations - McKinley (1996)
In this article, we present compiler optimizations to improve data locality based on a simple yet accurate cost model. The model computes both temporal and spatial... / In this article we present compiler optimizations to improve data / Languages Processors-compilers optimization General
226.0 Type-Directed Partial Evaluation - Danvy (1996)
We present a strikingly simple partial evaluator, that is type-directed
and reifies a compiled program into the text of a residual,
specialized program. Our partial evaluator is concise
(a few lines) a... / subtyping and coercions compiler optimization and run-time code / semantics-based compilation and compiler generation. Background and
218.1 The Jalapeño Dynamic Optimizing Compiler for Java - Burke, Choi, Fink, Grove, Hind.. (1999)
The Jalapeño Dynamic Optimizing Compiler is a key component of the Jalapeño Virtual Machine, a new Java Virtual Machine (JVM) designed to support efficient and scalable execution of Java applicati... / The Jalapeño Dynamic Optimizing Compiler for Java TM Michael G. / The Jalapeño Dynamic Optimizing Compiler is a key component of the
218.1 TALx86: A Realistic Typed Assembly Language - Morrisett, Crary, Glew, Grossman.. (1999)
In previous work, we presented a formalism for a statically
typed, idealized assembly language called TAL.
The goal of TAL was to provide an extremely low-level,
statically-typed target language that i... / in practice just-in-time JIT compilers are used to achieve acceptable / verification an error in the compiler can introduce a security hole.
217.3 Data and Computation Transformations for Multiprocessors - Anderson (1995)
Effective memory hierarchy utilization is critical to the performance of modern multiprocessor architectures. We have developed the first compiler system that fully automatically parallelizes sequentia... / We have developed the first compiler system that fully automatically / framework. We ran our compiler on a set of application programs
214.8 Implementation of a Portable Nested Data-Parallel Language - Blelloch, Chatterjee, Hardwick.. (1994)
This paper gives an overview of the implementation of NESL, a portable nested data-parallel language.
This language and its implementation are the first to fully support nested data structures as well... / nested parallelism allows a compiler to convert them into a form that / Fortran and CM Fortran compilers generate near-optimal code. The
214.4 Optimizing Matrix Multiply using PHiPAC: a Portable.. - Bilmes, Asanovic, Demmel, Lam, Chin (1996)
BLAS3 operations have great potential for aggressive optimization. Unfortunately, they usually need to be hand-coded for a specific machine and compiler to achieve near-peak performance. We have develop... / for a specific machine and compiler to achieve near-peak performance. / analyzing current machines and C compilers we've developed guidelines for
205.7 Optimization of Instruction Fetch Mechanisms for High Issue Rates - Conte, Menezes, Mills, Patel (1995)
Recent superscalar processors issue four instructions
per cycle. These processors are also powered by
highly-parallel superscalar cores. The potential performance
can only be exploited when fed by hig... / The performance boost provided by compiler optimization techniques is also / investigated. Results show that compiler optimization can significantly
204.1 Lifetime-Sensitive Modulo Scheduling - Huff (1993)
This paper shows how to software pipeline a loop for minimal
register pressure without sacrificing the loop's minimum
execution time. This novel bidirectional slack-scheduling
method has been impleme... / been implemented in a FORTRAN compiler and tested on many scientific / To find an overlapped schedule a compiler must represent the complex
203.5 KIDS: A Semi-Automatic Program Development System - Smith (1990)
The Kestrel Interactive Development System (KIDS) provides automated support for the development of correct and efficient programs from formal specifications. The system has components for performing ... / executable form by a conventional compiler. The initial algorithm that KIDS / language also called REFINE and compiler. The language supports
203.4 Programming In Vienna Fortran - Chapman, Mehrotra, Zima (1992)
Exploiting the full performance potential of distributed memory machines requires
a careful distribution of data across the processors. Vienna Fortran is a language
extension of Fortran which provides... / not only for Fortran and current compiler research is aimed at / implementing them. Research in compiler technology has so far resulted in
201.4 LimitLESS Directories: A Scalable Cache Coherence Scheme - Chaiken, Kubiatowicz, Agarwal (1991)
Caches enhance the performance of multiprocessors by reducing
network traffic and average memory access latency.
However, cache-based systems must address the problem of
cache coherence. We propose th... / closely with a multiprocessor's compiler and run-time system. The / by function as static compiler-dependent or dynamic using
195.7 Cache Miss Equations: An Analytical Representation of Cache Misses - Ghosh (1997)
With the widening performance gap between processors and main memory, efficient memory referencing behavior is necessary for good program performance. Both hand-tuning and compiler optimization techni... / Workshop on Interaction between Compilers and Computer Architectures / performance. Both hand-tuning and compiler optimization techniques are
195.0 Scout: A Communications-Oriented Operating System - Montz, Mosberger, O'Malley.. (1994)
This white paper describes Scout, a new operating system being designed for systems connected to the National Information Infrastructure (NII). Scout provides a communication-oriented software archite... / with the application of advanced compiler techniques result in a system / to the overall system. . Compiler Support A key design principle
194.2 Type-Based Alias Analysis - Diwan, McKinley, Moss (1998)
This paper evaluates three alias analyses based on programming language types. The first analysis uses type compatibility to determine aliases. The second extends the first by using additional high-le... / of modern uniprocessors compilers must reorder instructions. For / programs that use pointers the compiler's alias analysis dramatically
190.1 Reducing Indirect Function Call Overhead In C++ Programs - Calder, Grunwald (1994)
Modern computer architectures increasingly depend on mechanisms that estimate future control flow decisions to increase performance. Mechanisms such as speculative execution and prefetching are becomi... / techniques and demonstrate how compilers can use existing branch / in control and there are few compiler or hardware tricks that could
177.1 Cache Miss Equations: A Compiler Framework for Analyzing and Tuning.. - Ghosh, Martonosi, Malik (1998)
This paper describes methods for generating and solving Cache Miss Equations (CMEs) that
give a detailed representation of cache behavior, including conflict misses, in loop-oriented scientific
code. ... / Cache Miss Equations A Compiler Framework for Analyzing and / performance. Both hand-tuning and compiler optimization techniques are often
177.1 Automatic Program Transformation with JOIE - Cohen, Chase, Kaminsky (1998)
While the availability of platform-independent code on
the Internet is increasing, third-party code rarely exhibits
all of the features desired by end users. Unfortunately,
developers cannot foresee a... / translated into an executable by a compiler. Authors or users can employ / instructions by a Just-In-Time compiler JIT JITs only reimplement the
175.3 Unifying Data and Control Transformations for Distributed.. - Cierniak, Li (1994)
We present a unified approach to locality optimization that employs both data and control transformations. Data transformations include changing the array layout in memory. Control transformations inv... / have developed new techniques for compiler optimizations for distributed / with a memory hierarchy. Our compiler optimizations are based on an
175.3 Unifying Data and Control Transformations for Distributed Shared.. - Cierniak (1994)
We present a unified approach to locality optimization that employs both data and control transformations. Data transformations include changing the array layout in memory. Control transformations inv... / have developed new techniques for compiler optimizations for distributed / with a memory hierarchy. Our compiler optimizations are based on an
170.2 Run-time Adaptive Cache Hierarchy Management via Reference Analysis - Johnson, Hwu (1997)
Improvements in main memory speeds have not kept pace with increasing processor clock frequency and improved exploitation of instruction-level parallelism. Consequently, the gap between processor and ... / they can also disrupt the compiler-generated ILP schedule. / programs there are several known compiler techniques for optimizing data
170.2 Code Compression - Ernst, Evans, Fraser, Lucco.. (1997)
Current research in compiler optimization counts mainly
CPU time and perhaps the first cache level or two. This
view has been important but is becoming myopic, at least
from a system-wide viewpoint, a... / Abstract Current research in compiler optimization counts mainly CPU / for example all commercial JIT compilers known to us. This high
162.3 A Linear Algebra Framework for Static HPF Code Distribution - Ancourt, Coelho, Irigoin, Keryell (1995)
High Performance Fortran (HPF) was developed to support data parallel
programming for SIMD and MIMD machines with distributed memory. The programmer
is provided a familiar uniform logical address spac... / distribution by directives. The compiler then exploits these directives to / Fourth International Workshop on Compilers for Parallel Computers held in
161.7 Flick: A Flexible, Optimizing IDL Compiler - Eide, Frei, Ford, Lepreau, Lindstrom (1997)
An interface definition language (IDL) is a nontraditional
language for describing interfaces between software components.
IDL compilers generate "stubs" that provide separate
communicating processes... / Flick A Flexible Optimizing IDL Compiler Eric Eide Kevin Frei Bryan / software components. IDL compilers generate stubs that provide
159.4 Compiler-Based Prefetching for Recursive Data Structures - Luk (1996)
Software-controlled data prefetching offers the potential for bridging the ever-increasing speed gap between the memory subsystem and today's high-performance processors. While prefetching has enjoyed... / Compiler-Based Prefetching for Recursive / This paper investigates compiler-based prefetching for
154.5 Fine-Grained Dynamic Instrumentation of Commodity Operating System.. - Tamches, Miller (1999)
We have developed a technology, fine-grained
dynamic instrumentation of commodity kernels, which
can splice (insert) dynamically generated code before
almost any machine code instruction of a complete... / machine code is that the effect of compiler optimizations which can reorder / source code line needs the compiler's debugging line number
148.9 Value Profiling - Calder (1997)
Identifying variables as invariant or constant at compile-time allows the compiler to perform optimizations including constant folding, code specialization, and partial evaluation. Some variables, whi... / at compile-time allows the compiler to perform optimizations / then benefit from invariant-based compiler optimizations. In this paper we
148.4 Branch Prediction For Free - Ball, Larus (1993)
Many compilers rely on branch prediction to improve program performance by identifying frequently
executed regions and by aiding in scheduling instructions. Profile-based predictors
require a time-con... / Abstract Many compilers rely on branch prediction to / information available to a compiler would enhance our heuristics.
147.0 An Implementation of Interprocedural Bounded Regular Section Analysis - Havlak, Kennedy (1991)
Optimizing compilers should produce efficient code even in the presence of high-level language
constructs. However, current programming support systems are significantly lacking in their ability
to an... / Abstract Optimizing compilers should produce efficient code / Introduction A major goal of compiler optimization research is to
145.4 An Efficient Implementation of Java's Remote Method Invocation - Maassen, van Nieuwpoort, Veldema.. (1999)
Java offers interesting opportunities for parallel computing. In particular,
Java Remote Method Invocation provides an unusually flexible
kind of Remote Procedure Call. Unlike RPC, RMI supports
polymo... / a Java system based on a native compiler which supports both compile time / scheme with just-in-time compilers native compilers and
144.3 Increasing Network Throughput by Integrating Protocol Layers - Abbott, Peterson (1993)
Integrating protocol data manipulations is a strategy for increasing the throughput of network protocols. The idea is to combine a series of protocol layers into a pipeline so as to access message dat... / Integration generalizes the compiler optimization known as loop / into one with good locality. The compiler takes advantage of this increased
142.8 Precise Miss Analysis for Program Transformations with Caches of.. - Ghosh, Martonosi, Malik (1998)
Analyzing and optimizing program memory performance is
a pressing problem in high-performance computer architectures.
Currently, software solutions addressing the processor-memory
performance gap inclu... / performance gap include compiler- or programmer-applied / and other program transformations. Compiler optimization can be effective
137.1 Flow and Stretch Metrics for Scheduling Continuous Job Streams - Bender, Chakrabarti, Muthukrishnan (1998)
Many servers, such as web and database servers, receive a continual stream of requests requiring vastly different amounts of processing time. The servers should schedule these requests to provide the ... / completion time is useful in some compiler optimization settings / schedule. The two classical optimization metrics in Scheduling Theory
133.8 IMPACT: An Architectural Framework for Multiple-Instruction-Issue.. - Chang, Mahlke, Chen, Warter, Hwu (1991)
The performance of multiple-instruction-issue processors can be severely limited by the compiler's ability to generate efficient code for concurrent hardware. In the IMPACT project, we have developed ... / can be severely limited by the compiler's ability to generate efficient / IMPACT-I a highly optimizing C compiler to exploit instruction level
133.3 Profile-Guided Receiver Class Prediction - Grove, Dean, Garrett, Chambers (1995)
The use of dynamically-dispatched procedure calls is a key
mechanism for writing extensible and flexible code in
object-oriented languages. Unfortunately, dynamic
dispatching imposes a runtime perform... / faster than previous Self compilers on the same applications. Thus / it internally within a compiler. In sections and we report
133.3 Automatic Partitioning of Parallel Loops and Data Arrays for.. - Agarwal (1995)
This paper presents a theoretical framework for automatically partitioning parallel loops to minimize cache coherency traffic on shared-memory multiprocessors. While several previous papers have looke... / implemented this framework in a compiler for Alewife a distributed shared / by the run time system or by the compiler. Relegating the partitioning task
133.3 Generating Communication for Array Statements: Design.. - Stichnoth (1994)
Array statements as included in Fortran 90 or High Performance Fortran (HPF) are a well-accepted way to specify data parallelism in programs. When generating code for such a data parallel program for a... / memory parallel system the compiler must determine when array / in an experimental Fortran compiler and this paper reports an
131.9 To Copy or Not to Copy: A Compile-Time Technique for Assessing When.. - Temam (1993)
In this paper, we present a compile-time technique for making this determination,
and present a selective copying strategy based on this methodology. Preliminary experimental
results demonstrate that, be... / data reuse cache conflicts compiler-directed cache management / incorporated into production compilers. Without copying the behavior
131.9 The Effects of the Precision of Pointer Analysis - Shapiro (1997)
In order to analyze programs that manipulate pointers, it is necessary to have safe information about what each pointer might point to. There are many algorithms that can be used to determine this i... / to run faster. Introduction Compilers often perform a variety of / assignment x a compiler can safely ignore the first
131.9 Procedure Placement Using Temporal Ordering Information - Gloy, Blackwell, Smith, Calder (1997)
Instruction cache performance is very important to
instruction fetch efficiency and overall processor performance.
The layout of an executable has a substantial effect
on the cache miss rate during ex... / direct-mapped caches where the compiler achieves an optimized cache line / code layout produced by most compilers places procedures in the order
131.4 Theory and Practice of Constraint Handling Rules - Frühwirth (1998)
Constraint Handling Rules (CHR) are our proposal to allow more flexibility and application-oriented customization of constraint systems. CHR are a declarative language extension especially designed fo... / typically a library containing a compiler and run-time system written in / FrBr b FrBr includes a compiler a run-time system with debugger
125.9 Automatic Data Layout Using 0-1 Integer Programming - Bixby, Kennedy, Kremer (1994)
The goal of languages like Fortran D or High Performance Fortran (HPF) is to
provide a simple yet efficient machine-independent parallel programming model. By shifting
much of the burden of machine-... / optimization to the compiler the programmer is able to write / Even the most sophisticated compiler may not be able to compensate
125.7 Accurate Indirect Branch Prediction - Driesen, Hölzle (1998)
Indirect branch prediction is likely to become increasingly important in the future
because indirect branches occur more frequently in object-oriented programs. With misprediction
rates of around 25... / of choice for most C++ and Java compilers execute an indirect branch for / each see Table and beta a compiler for the Beta programming
124.6 Quantifying Behavioral Differences Between C and C++ Programs - Calder, Grunwald, Zorn (1995)
Improving the performance of C programs has been a topic of great interest for many years. Both hardware technology and compiler optimization research has been applied in an effort to make C programs ... / Both hardware technology and compiler optimization research has been / results should be of interest to compiler writers and architecture
124.6 Simple and Effective Link-Time Optimization of Modula-3 Programs - Fernandez (1995)
Modula-3 supports development of modular programs by separating an object's interface from its implementation. This separation induces a runtime overhead in the implementation of objects, because it p... / objects because it prevents the compiler from having complete information / to implement them the Modula-3 compiler must generate code for various
124.1 Optimizing for Parallelism and Data Locality - Kennedy, McKinley (1992)
Previous research has used program transformation to
introduce parallelism and to exploit data locality. Unfortunately,
these two objectives have usually been considered
independently. This work explo... / are two of the most valuable compiler techniques in use today. / of the cache is required the compiler must know the cache line size
123.4 DyC: An Expressive Annotation-Directed Dynamic Compiler for C - Brian Grant (1997)(Correct)
We present the design of DyC, a dynamic-compilation system for C
based on run-time specialization. Directed by a few declarative user
annotations that specify the variables and code on which dynamic
c... / Annotation-Directed Dynamic Compiler for C Brian Grant Markus br in the context of an optimizing compiler and initial results have been
119.7 Improving Register Allocation for Subscripted Variables - Callahan, Carr, Kennedy (1990)(Correct)
Most conventional compilers fail to allocate array elements to registers because standard data-flow analysis treats arrays like scalars, making it impossible to analyze the definitions and uses of ind... / z Abstract Most conventional compilers fail to allocate array elements br register allocators found in most compilers. In addition we present
118.5 Neural Network Synthesis Using Cellular Encoding And The Genetic.. - Frédéric Gruau (1994)(Correct)
Artificial neural networks used to be considered only as a machine that learns using small
modifications of internal parameters. Now this is changing. Such learning methods do not allow
to generate big... / a number of properties and a compiler of high level language. The br A neural Compiler . Introduction
118.1 Value Profiling and Optimization - Calder, Feller, et al. (1999)(Correct)
Variables and instructions that have invariant or predictable values at run-time, but
cannot be identified as such using compiler analysis, can benefit from value-based compiler
optimizations. Value-b... / cannot be identified as such using compiler analysis can benefit from br can benefit from value-based compiler optimizations. Value-based
118.1 SUIF Explorer: an interactive and interprocedural parallelizer - Liao, Diwan, Bosch, Ghuloum, Lam (1999)(Correct)
The SUIF Explorer is an interactive parallelization tool that
is more effective than previous systems in minimizing the
number of lines of code that require programmer
assistance. First, the interproc... / form of explicit directives to the compiler to perform or ignore some br to perform or ignore some compiler analysis or optimization
118.1 Set Constraints for Destructive Array Update Optimization - Wand, Clinger (1999)(Correct)
Destructive array update optimization is critical for writing scientific
codes in functional languages. We present set constraints for an interprocedural
update optimization that runs in polynomial ti... / analysis operational semantics compiler correctness Introduction br flow analysis that is familiar to compiler writers ASU It does not
117.5 A Static Parameter based Performance Prediction Tool for Parallel.. - Fahringer, Zima (1993)(Correct)
This paper presents a Parameter based Performance Prediction
Tool (P³T) which is part of the Vienna Fortran Compilation
System (VFCS), a compiler that automatically translates
Fortran programs in... / Compilation System VFCS a compiler that automatically translates br programs. In contrast to earlier compilers such as SUPERB and the
117.5 Register Allocation with Instruction Scheduling: a New Approach - Pinter (1993)(Correct)
We present a new framework in which considerations of both register allocation and instruction scheduling can be applied uniformly and simultaneously. In this framework an optimal coloring of a graph,... / is an important task of every compiler. The problem of efficiently br instruction scheduling in some compilers like those for the MIPS
117.2 Reducing Memory Latency via Non-blocking and Prefetching Caches - Chen (1992)(Correct)
Non-blocking caches and prefetching caches are two techniques for hiding memory latency by exploiting the overlap of processor computations with data accesses. A non-blocking cache allows execution to... / these approaches. We also consider compiler-based optimizations to enhance br can be improved substantially by compiler optimizations such as instruction
117.2 A Retargetable Technique for Predicting Execution Time of Code.. - Harmon, Baker, Whalley (1992)(Correct)
Predicting the execution times of straight-line code sequences is a fundamental problem in the design and evaluation of hard-real-time systems. The reliability of system-level timings and schedulabi... / into account. This technique is compiler and language-independent and br is integrated with an existing C compiler. This system predicts the bounded
115.9 Synchronization and Communication in the T3E Multiprocessor - Scott (1996)(Correct)
This paper describes the synchronization and communication primitives of the Cray T3E multiprocessor, a shared memory system scalable to 2048 processors. We discuss what we have learned from the T3D p... / significantly easier for the compiler. For either programming model br queue is used by both the CRAFT compiler to fetch remote data in loops
115.9 Minimizing Register Requirements under Resource-Constrained.. - Govindarajan, Altman, Gao (1995)(Correct)
The rapid advances in high-performance computer architecture and compilation techniques provide both challenges and opportunities to exploit the rich solution space of software pipelined loop schedule... / ACAPS Laboratory Advanced Compilers Architectures and Parallel br in two different ways i As a compiler option which can be used in
114.2 Practical Virtual Method Call Resolution for Java - Sundaresan, Razafimahefa.. (1998)(Correct)
This paper addresses the problem of resolving virtual method and interface calls in Java. The main
focus is on practical, flow-insensitive techniques that can be used to analyze large applications.
We... / important to provide optimizing compilers and more efficient runtime br that has been produced by any compiler optimizer or other tool. In
113.5 Optimal Code Motion: Theory and Practice - Knoop, Rüthing, Steffen (1994)(Correct)
In this paper, we emphasize the practicality of lazy code motion by giving explicit
directions for its implementation in standard compiler environments. In particular,
we present a version of the algorit... / format is standard in optimizing compilers. The theoretical foundations of br for its implementation in standard compiler environments. Categories and
112.0 A Practical System for Intermodule Code Optimization at Link-Time - Srivastava, Wall (1992)(Correct)
We have developed a system called OM to explore the problem of code optimization at link-time. OM takes a collection of object modules constituting the entire program, and converts the object code int... / to perform optimizations that a compiler looking at a single module cannot br the particular source language or compiler this also gives us the chance
111.7 Practical Dependence Testing - Goff, Kennedy, Tseng (1991)(Correct)
Precise and efficient dependence tests are essential to
the effectiveness of a parallelizing compiler. This paper
proposes a dependence testing scheme based on classifying
pairs of subscripted variabl... / effectiveness of a parallelizing compiler. This paper proposes a br in both PFC a parallelizing compiler and ParaScope a parallel
110.1 Beyond Induction Variables: Detecting and Classifying Sequences Using .. - Gerlek (1996)(Correct)
1 Introduction
2 A Bestiary of Sequence Forms
2.1 Linear Induction Variables
2.2 Polynomial Induction Variables ... / and the field of high performance compilers. For four years he has served br it most. Outside of the realm of compilers the most important lessons I
109.0 Clustered Speculative Multithreaded Processors - Marcuello, González (1999)(Correct)
In this paper we present a processor microarchitecture that can simultaneously execute multiple
threads and has a clustered design for scalability purposes. A main feature of the proposed
microarchite... / compiled with the DEC Alpha compiler with full optimization have a br runtime mechanisms without any compiler support b data speculation
109.0 Localizing Non-affine Array References - Mitchell, Carter, Ferrante (1999)(Correct)
Existing techniques can enhance the locality of arrays
indexed by affine functions of induction variables. This paper
presents a technique to localize non-affine array references,
such as the indirect... / Despite the efforts of the vendor compiler to mask latencies the load br CA July . A. J. C. Bik. Compiler Support for Sparse Matrix
108.6 Evaluating Compiler Optimizations For Fortran D - Hiranandani, Kennedy, Tseng (1994)(Correct)
The Fortran D compiler uses data decomposition specifications to automatically translate Fortran
programs for execution on MIMD distributed-memory machines. This paper introduces and
classifies a numb... / Evaluating Compiler Optimizations For Fortran D br Foundation. Evaluating Compiler Optimizations For Fortran D
108.6 A High-Performance Microarchitecture with Hardware-Programmable.. - Razdan, Smith (1994)(Correct)
This paper explores a novel way to incorporate hardware-programmable
resources into a processor microarchitecture to improve the
performance of general-purpose applications. Through a coupling
of comp... / Using this information the compiler interacts with sophisticated br processor and it relies on the compiler and run-time system to
108.6 Compiler Blockability of Numerical Algorithms - Carr (1992)(Correct)
Over the past decade, microprocessor design strategies have focused on increasing the computational power on a single chip. Unfortunately, memory speeds have not kept pace. The result is an imbalance ... / Compiler Blockability of Numerical br CRPC -MS Houston TX Compiler Blockability of Numerical
108.5 Fast Module Mapping and Placement for Datapaths in FPGAs - Callahan, Chong, DeHon, Wawrzynek (1998)(Correct)
By tailoring a compiler tree-parsing tool for datapath module
mapping, we produce good quality results for datapath synthesis
in very fast run time. Rather than flattening the design
to gates, we pres... / Abstract By tailoring a compiler tree-parsing tool for datapath br developed for code generation in compilers and was first used for the
107.2 A Methodology For Query Reformulation In CIS Using Semantic Knowledge - Florescu, Raschid (1996)(Correct)
We consider Cooperative Information Systems (CIS) that are multidatabase systems
(MDBMS), with a common object-oriented model, based on the ODMG standard, together
with local databases that may be rel... / technique in our Flora compiler prototype which we used for br within the Flora compiler optimizer for the ODMG data model
106.3 From ML to Ada(!?!): Strongly-typed Language Interoperability via.. - Tolmach, Oliva (1997)(Correct)
We describe a system that supports source-level integration of ML-like functional language
code with ANSI C or Ada83 code. The system works by translating the functional code into
type-correct, "vanil... / output of current optimizing ML compilers even though handicapped by a br details of FL and GL compilers which may be unacceptable in
106.3 Linear-time Subtransitive Control Flow Analysis - Heintze, McAllester (1997)(Correct)
We present a linear-time algorithm for bounded-type programs that builds a directed graph whose transitive closure gives exactly the results of the standard (cubic-time) Control-Flow Analysis (CFA) alg... / in the program. This limits compiler optimization. One way to address br a barrier to the use of CFA in compilers. In fact although a number of
105.1 Interprocedural Modification Side Effect Analysis With Pointer.. - Landi, Ryder, Zhang (1993)(Correct)
We present a new interprocedural modification side effects algorithm for C programs, that can
discern side effects through general-purpose pointer usage. Ours is the first complete design and
implemen... / effects is crucial for aggressive compiler optimization ASU practical br analyzed in LR plus compiler a compiler for a subset of
104.3 Flow-directed Inlining - Jagannathan, Wright (1996)(Correct)
A flow-directed inlining strategy uses information derived from control-flow analysis to specialize and inline procedures for functional and object-oriented languages. Since it uses control-flow analy... / makes it simple to upgrade a compiler that uses our strategy to include br is easy to implement in a compiler that uses flat closures A
104.3 Storage Assignment to Decrease Code Size - Liao (1995)(Correct)
DSP architectures typically provide indirect addressing modes with auto-increment and decrement. In addition, indexing mode is not available, and there are usually few, if any, general-purpose registe... / time-to-market. However current compilers for microcontrollers and br size penalties. While optimizing compilers have proved effective for
103.4 Profile-Guided Automatic Inline Expansion for C Programs - Chang, Mahlke, Chen, Hwu (1992)(Correct)
This paper describes critical implementation issues that must be addressed to develop a fully automatic inliner. These issues are: integration into a compiler, program representation, hazard preventio... / issues are integration into a compiler program representation hazard br integrated into an optimizing C compiler. The experimental results show
102.8 Building Program Optimizers with Rewriting Strategies - Visser, Benaissa, Tolmach (1998)(Correct)
We describe a language for defining term rewriting strategies, and its application to the production of program optimizers. Valid transformations on program terms can be described by a set of rewrite ... / Introduction Compiler components such as parsers br attractive for functional language compilers e.g. that operate
102.1 System Support for Automatic Profiling and Optimization - Zhang, Wang, Gloy, Chen, Smith (1997)(Correct)
The Morph system provides a framework for automatic collection and management of profile information and application of profile-driven optimizations. In this paper, we focus on the operating system su... / of operating system and compiler technology that provides a br framework for the advanced compiler optimizations needed to support
102.1 A Single-Chip Multiprocessor - Hammond, et al. (1997)(Correct)
In this article, we explain why software and hardware
trends will favor the CMP microarchitecture. We
base our conclusion on the performance results from
a comparison of simulated superscalar, SMT, and C... / as much parallelism as possible. Compilers which have essentially infinite br parallelism. Some compilers can also divide a program into
101.4 Analysis of Techniques to Improve Protocol Processing Latency - Mosberger (1996)(Correct)
This paper describes several techniques designed to improve
protocol latency, and reports on their effectiveness when
measured on a modern RISC machine employing the DEC
Alpha processor. We found that... / they can all be characterized as compiler-based techniques. As such one br limited context available to the compiler's optimizer. A technique called
100.0 An Overview of the Pablo Performance Analysis Environment - Reed, Aydt, Madhyastha, Noe.. (1992)(Correct)
As massively parallel, distributed memory systems replace traditional vector supercomputers, effective application program optimization and system resource management become more than research curiosi... / Performance Tool Compiler Integration br Pablo with data parallel Fortran compilers based on the emerging High
99.9 MultiView and Millipage - Fine-Grain Sharing in Page-Based DSMs - Itzkovitz, Schuster (1999)(Correct)
In this paper we develop a novel technique, called MultiView,
which enables implementation of page-based
fine-grain dsms. We show how the traditional techniques
for implementing page-based dsms can be... / system api and requires no compiler intervention page twinning br require a specially tailored compiler or binary code instrumentation
99.9 Flexible Type Analysis - Crary, Weirich (1999)(Correct)
Run-time type dispatch enables a variety of advanced
optimization techniques for polymorphic languages, including
tag-free garbage collection, unboxed function
arguments, and flattened data structures... / However modern type-preserving compilers transform types between stages br Introduction Type-directed compilers use type information to enable
99.9 MultiView and Millipage Fine-Grain Sharing in Page-Based DSMs - Ayal Itzkovitz (1999)(Correct)
In this paper we develop a novel technique, called MultiView,
which enables implementation of page-based
fine-grain dsms. We show how the traditional techniques
for implementing page-based dsms can be ... / system api and requires no compiler intervention page twinning br require a specially tailored compiler or binary code instrumentation
98.7 Space-Efficient Closure Representations - Shao, Appel (1994)(Correct)
Many modern compilers implement function calls (or returns)
in two steps: first, a closure environment is properly
installed to provide access for free variables in the target
program fragment; second... / Abstract Many modern compilers implement function calls or br by the Standard ML of New Jersey compiler by about on a DECstation
98.5 Points-to Analysis by Type Inference of Programs with Structures and.. - Bjarne Steensgaard (1996)(Correct)
We present an interprocedural flow-insensitive points-to analysis algorithm
based on monomorphic type inference. The source language models the important
features of C including pointers, pointer arith... / Introduction Modern optimizing compilers and program understanding and br variables Most current compilers and programming tools use only
98.5 Elimination of Redundant Array Subscript Range Checks - Kolte, Wolfe (1995)(Correct)
This paper presents a compiler optimization algorithm to reduce
the run time overhead of array subscript range checks
in programs without compromising safety. The algorithm
is based on partial redunda... / Abstract This paper presents a compiler optimization algorithm to reduce br the algorithm in our research compiler Nascent and conducted
97.1 Efficient JavaVM Just-in-Time Compilation - Krall (1998)(Correct)
Conventional compilers are designed for producing
highly optimized code without paying much attention to
compile time. The design goals of Java just-in-time compilers
are different: produce fast code ... / Abstract Conventional compilers are designed for producing br design goals of Java just-in-time compilers are different produce fast
96.9 Cache Performance of the SPEC92 Benchmark Suite - Gee (1993)(Correct)
The SPEC92 benchmark suite consists of twenty public-domain, non-trivial programs that are
widely used to measure the performance of computer systems, particularly those in the Unix
workstation market... / of any source code modifications compiler and operating system release br realistic workloads. Similarly compiler writers have been concentrating
96.9 Cache Performance of the SPEC Benchmark Suite - Gee, Hill, Pnevmatikatos, Smith (1993)(Correct)
The SPEC benchmark suite consists of ten public-domain, non-trivial programs that are widely used to
measure the performance of computer systems, particularly those in the Unix workstation market.
The... / of any source code modifications compiler and operating system release br realistic workloads. Similarly compiler writers have been concentrating
95.6 Demand-driven Computation of Interprocedural Data Flow - Duesterwald, Gupta, Soffa (1995)(Correct)
This paper presents a general framework for deriving demand-driven
algorithms for interprocedural data flow analysis of
imperative programs. The goal of demand-driven analysis
is to reduce the time and... / Optimizing and parallelizing compilers that exhaustively analyze a br automatically e.g. by the compiler or manually by the user e.g.
92.7 Distributed Memory Compiler Design for Sparse Problems - Wu (1995)(Correct)
This paper addresses the issue of compiling concurrent loop nests in the presence of complicated
array references and irregularly distributed arrays. Arrays accessed within loops
may contain accesses ... / Distributed Memory Compiler Design for Sparse Problems br that is used effectively by a compiler to generate efficient code in
91.4 Compiling for the Multiscalar Architecture - Vijaykumar (1998)(Correct)
High-performance, general-purpose microprocessors serve as compute engines for computers ranging from personal computers to supercomputers. Sequential programs constitute a major portion of real-world... / of performance and explore a few compiler optimization opportunities br To extract high degrees of ILP compiler heuristics partition programs
91.4 Monitors and Exceptions: How to implement Java efficiently - Krall, Probst (1998)(Correct)
Efficient implementation of monitors and exceptions is
crucial for the performance of Java. One implementation
of threads showed a factor of 30 difference in
run time on some benchmark programs. This ... / as used in the CACAO just-in-time compiler. With this implementation the br are possible using just-in-time compilers which translate Java byte code
90.9 Efficient Incremental Run-Time Specialization for Free - Marlet, Consel, Boinot (1999)(Correct)
Availability of data in a program determines computation
stages. Incremental partial evaluation exploits
these stages for optimization: it allows further specialization
to be performed as data become a... / its data and the generation of a compiler generator capable of br is generally obtained using a compiler generator cogen given a
89.8 Simple and Effective Analysis of Statically-Typed Object-Oriented.. - Diwan (1996)(Correct)
To use modern hardware effectively, compilers need
extensive control-flow information. Unfortunately,
the frequent method invocations in object-oriented
languages obscure control flow. In this paper, ... / use modern hardware effectively compilers need extensive control-flow br are thus practical for use in a compiler. When they fail we introduce
89.8 Extending SUIF for Machine-dependent Optimizations - Smith (1996)(Correct)
This paper describes a set of modifications and extensions to the base SUIF library that provide the abstractions necessary for machine-dependent optimizations such as global instruction scheduling. W... / . . Introduction The SUIF compiler provides an excellent set of br are useful for machine-dependent compiler research and we want to exploit
89.8 An HPF Compiler for the IBM SP2 - Gupta, Midkiff, Schonberg, Shields, .. (1995)(Correct)
We describe pHPF, a research prototype HPF compiler for the IBM SP series parallel machines. The compiler accepts as input Fortran 90 and Fortran 77 programs, augmented with HPF directives; sequentia... / An Hpf Compiler For The Ibm Sp Manish Gupta br Phpf An Research Prototype Hpf Compiler For The Ibm Sp Series Parallel
89.8 Control-Flow Analysis and Type Systems - Heintze (1995)(Correct)
We establish a series of equivalences between type systems
and control-flow analyses. Specifically, we take four type systems from the
literature (involving simple types, subtypes and recursion) and... / A central concept in compiler optimization and code generation br can significantly limit compiler performance. To address this
89.7 Making Pure Object-Oriented Languages Practical - Chambers, Ungar (1991)(Correct)
In the past, object-oriented language designers and programmers have been forced to choose between pure message passing and performance. Last year, our SELF system achieved close to half the speed of ... / about as fast as an optimizing C compiler and runs at over half the speed br single target method and so the compiler cannot simply expand its
88.8 Two Classes of Boolean Functions for Dependency Analysis - Armstrong, Marriott, Schachte.. (1994)(Correct)
Many static analyses for declarative programming/database languages use Boolean functions
to express dependencies among variables or argument positions. Examples include
groundness analysis, arguably ... / Languages Processors-compilers optimization F. . Logics br only important for an optimizing compiler attempting to speed up
86.5 Dependence-Based Program Analysis - Johnson, Pingali (1993)(Correct)
Program analysis and optimization can be speeded up through the use of the dependence flow graph (DFG), a representation of program dependences which generalizes def-use chains and static single assig... / into a traditional optimizing compiler framework. We accomplish this as br For example the Multiflow compiler performed predicate analysis to
85.7 Model Checking Concurrent Systems with Unbounded Integer Variables.. - Tevfik Bultan (1998)(Correct)
Model checking is a powerful technique for analyzing large, finite-state systems. In an
infinite-state system, however, many basic properties are undecidable. In this paper, we present
a new symbolic ... / Fellowship. Presburger Compiler DNF Partitioning Control-Point br a programmer draws on when setting compiler-optimization levels. As for the
85.7 Run-time Code Generation and Modal-ML - Philip Wickline (1998)(Correct)
This paper presents early experience with a typed programming language and compiler for run-time code
generation. The language is an extension of the SML language with modal operators, based on the
... / a typed programming language and compiler for run-time code generation. br of computation in a program. The compiler generates target code that makes
85.7 Power and Performance Tradeoffs using Various Caching Strategies - Bahar, Albera, Manne (1998)(Correct)
In this paper, we propose several different data and instruction
cache configurations and analyze their power as well as performance
implications on the processor. Unlike most existing work in
low pow... / version of the GNU gcc compiler with full optimization. This br with full optimization. This compiler generates bit-wide
85.1 Java for Parallel Computing and as a General Language for Scientific.. - Fox (1997)(Correct)
We discuss the role of Java and Web technologies for general simulation. We classify the
classes of concurrency typical in problems and analyze separately the role of Java in user
interfaces, coarse g... / a few instructions scheduled by a compiler smaller than an application br little reason why native Java compilers as opposed to current portable
84.0 The execution algorithm of Mercury, an efficient purely declarative.. - Somogyi, Henderson, Conway (1996)(Correct)
Machine or WAM. Section 5 describes some optimizations and shows how
Mercury handles I/O. Section 6 gives the current state of the Mercury system while
section 7 presents performance results.
2. The... / very efficient code. The Mercury compiler uses this execution model to br as much help as possible from the compiler in locating errors in their
83.9 Lightweight Run-Time Code Generation - Leone, Lee (1994)(Correct)
Run-time code generation is an alternative and complement
to compile-time program analysis and optimization. Static
analyses are inherently imprecise because most interesting
aspects of run-time behav... / developed for a prototype compiler are discussed and the results br Introduction Many compiler optimizations depend on
82.7 Beyond Induction Variables - Wolfe (1992)(Correct)
Induction variable detection is usually closely tied to the strength reduction optimization. This paper studies induction variable analysis from a different perspective, that of finding induction vari... / others are not analyzed by current compilers. Giving a unified approach br approach improves the speed of compilers and allows a more general
81.8 A Comparison of Compiler Tiling Algorithms - Rivera, Tseng (1999)(Correct)
Linear algebra codes contain data locality which can be exploited
by tiling multiple loop nests. Several approaches to tiling have
been suggested for avoiding conflict misses in low associativity ca... / A Comparison of Compiler Tiling Algorithms Gabriel br intra-variable padding a compiler optimization for eliminating
81.8 Towards Automatic Specialization of Java Programs - Schultz, Lawall, Consel, Muller (1999)(Correct)
Automatic program specialization can derive efficient implementations from
generic components, thus reconciling the often opposing goals of genericity and efficiency.
This technique has proved usefu... / for C programs and a Java-to-C compiler. Specialization is managed using br the Harissa optimizing Java-to-C compiler In this paper we
81.8 A semantics for imprecise exceptions - Jones, Reid, Hoare, Marlow, Henderson (1999)(Correct)
Some modern superscalar microprocessors provide only imprecise
exceptions. That is, they do not guarantee to report
the same exception that would be encountered by a
straightforward sequential execut... / crippling the language or its compilers. We do not yet have enough br of the programming language. The compiler or the programmer might want
81.8 Improving the Performance of Speculatively Parallel Applications on.. - Olukotun, Hammond, Willey (1999)(Correct)
Hydra is a chip multiprocessor (CMP) with integrated support for thread-level speculation. Thread-level
speculation provides a way to parallelize sequential programs without the need for data dependen... / thread architecture the compiler must break the program into br to be parallel. Furthermore compilers and manual optimization can be
81.4 Compiler-directed Data Prefetching in Multiprocessors with Memory.. - Edward Gornish (1990)(Correct)
Memory hierarchies are used by multiprocessor systems
to reduce large memory access times. It is necessary to
automatically manage such a hierarchy, to obtain effective
memory utilization. In this pap... / Compiler-directed Data Prefetching in br We take the approach that the compiler should perform the program
81.1 A Manual for the CHAOS Runtime Library - Saltz, Ponnusamy, Sharma, Moon.. (1995)(Correct)
Procedures are presented that are designed to help users efficiently program irregular
problems (e.g. unstructured mesh sweeps, sparse matrix codes, adaptive mesh partial
differential equations solver... / are also designed for use in compilers for distributed memory br forming a portion of a portable compiler independent runtime support
81.0 Compiling Fortran D for MIMD Distributed-Memory Machines - Hiranandani (1992)(Correct)
Fortran D, a version of Fortran extended with
data decomposition specifications, is designed to provide
a machine-independent data-parallel programming
model. This paper describes analysis, optimizati... / employed in the Fortran D compiler. The compiler first partitions br in the Fortran D compiler. The compiler first partitions programs using
79.9 Using Shape Analysis to Reduce Finite-State Models of Concurrent Java .. - Corbett (1998)(Correct)
Finite-state verification (e.g., model checking) provides a powerful means to detect concurrency errors, which are
often subtle and difficult to reproduce. Nevertheless, widespread use of this technol... / methods were developed for compiler optimization where accurate br any lock is released. This allows compilers a great deal of flexibility in
79.4 A Code Generation Interface for ANSI C - Fraser (1991)(Correct)
machine code resembles assembly or machine language for a fictitious computer. 8 A front end emits a stream of instructions (in a text or compressed binary encoding) to a logically separate back end. ... / lcc is a retargetable production compiler for ANSI C it has been ported to br it results in efficient compact compilers. The interface is illustrated
79.0 Reducing Branch Costs via Branch Alignment - Calder, Grunwald (1994)(Correct)
Several researchers have proposed algorithms for basic block reordering. We call these branch alignment algorithms. The primary emphasis of these algorithms has been on improving instruction cache loc... / are compiled by any existing compiler and then transformed via binary br time analysis in the IMPACT-I compiler system. Using profile-based
78.2 A Quantitative Analysis of Loop Nest Locality - McKinley, Temam (1996)(Correct)
This paper analyzes and quantifies the locality characteristics of numerical loop nests in order to suggest future directions for architecture and software cache optimizations. Since most programs spe... / and provide new insights for the compiler writer and the architect. br cache memories Smi Smi and compiler techniques that exploit cache
78.2 Effective Flow Analysis for Avoiding Run-Time Checks - Jagannathan, Wright (1995)
This paper describes a general purpose program analysis that
computes global control-flow and data-flow information for higher-order,
call-by-value programs. This information can be used to drive gl... / overheads hence sophisticated compiler optimizations are essential if / tend to be small these compiler optimizations must be
78.2 Optimizing Instruction Cache Performance for Operating System.. - Torrellas, Xia, Daigle (1995)
High instruction cache hit rates are key to high performance. One known technique to
improve the hit rate of caches is to use an optimizing compiler to minimize cache interference
via an improved layo... / of caches is to use an optimizing compiler to minimize cache interference / runs of the second phase of the C compiler which generates assembly code
77.9 Using Profile Information to Assist Classic Code Optimizations - Chang (1991)
This paper describes the design and implementation of an optimizing compiler that automatically generates profile information to assist classic code optimizations. This compiler contains two new compo... / implementation of an optimizing compiler that automatically generates / classic code optimizations. This compiler contains two new components an
76.5 Quantifying the Multi-Level Nature of Tiling Interactions - Nicholas Mitchell (1997)
Optimizations, including tiling, often target a single level
of memory or parallelism, such as cache. These optimizations usually
operate on a level-by-level basis, guided by a cost function paramet... / available to an optimizing compiler. Many machines now have multiple / is to provide a system whereby a compiler or human may consider
76.5 Automatic Inline Allocation of Objects - Dolby (1997)
Object-oriented languages like Java and Smalltalk provide
a uniform object model that simplifies programming by
providing a consistent, abstract model of object behavior.
But direct implementations in... / For example in Figure the compiler must determine that the call to / of CThen we present the compiler framework for our optimizations
76.5 Interprocedural Conditional Branch Elimination - Bodik (1997)
The existence of statically detectable correlation among conditional branches enables their elimination, an optimization that has a number of benefits. This paper presents techniques to determine whet... / statically and its usability for compiler optimizations. We found that not / manipulation available in the compiler. Since query propagation may not
76.5 A Note on Native Level 1 BLAS in Java - Aart Bik (1997)
In this research note, we explore the potential of extending the Java Application
Programming Interface with some mathematical primitives to improve the performance
of certain operations in Java progr... / as additional phase to the Java compiler or at run-time where / into Blas.class using the Java compiler javac. Subsequently the tool
75.3 Generation of efficient interprocedural analyzers with PAG - Alt, Martin (1995)
To produce high quality code, modern compilers use global
optimization algorithms based on abstract interpretation. These algorithms
are rather complex; their implementation is therefore a non-triv... / produce high quality code modern compilers use global optimization / be easily integrated in existing compilers. The analyzers are