Results 1 - 10
of
27
Instruction-Level Parallel Processing: History, Overview and Perspective
, 1992
"... Instruction-level Parallelism CILP) is a family of processor and compiler design techniques that speed up execution by causing individual machine operations to execute in parallel. Although ILP has appeared in the highest performance uniprocessors for the past 30 years, the 1980s saw it become a muc ..."
Abstract
-
Cited by 166 (0 self)
- Add to MetaCart
Instruction-level Parallelism CILP) is a family of processor and compiler design techniques that speed up execution by causing individual machine operations to execute in parallel. Although ILP has appeared in the highest performance uniprocessors for the past 30 years, the 1980s saw it become a much more significant force in computer design. Several systems were built, and sold commercially, which pushed ILP far beyond where it had been before, both in terms of the amount of ILP offered and in the central role ILP played in the design of the system. By the end of the decade, advanced microprocessor design at all major CPU manufacturers had incorporated ILP, and new techniques for ILP have become a popular topic at academic conferences. This article provides an overview and historical perspective of the field of ILP and its development over the past three decades.
Register Allocation via Graph Coloring
, 1992
"... Chaitin and his colleagues at IBM in Yorktown Heights built the first global register allocator based on graph coloring. This thesis describes a series of improvements and extensions to the Yorktown allocator. There are four primary results: Optimistic coloring Chaitin's coloring heuristic pessimis ..."
Abstract
-
Cited by 133 (4 self)
- Add to MetaCart
Chaitin and his colleagues at IBM in Yorktown Heights built the first global register allocator based on graph coloring. This thesis describes a series of improvements and extensions to the Yorktown allocator. There are four primary results: Optimistic coloring Chaitin's coloring heuristic pessimistically assumes any node of high degree will not be colored and must therefore be spilled. By optimistically assuming that nodes of high degree will receive colors, I often achieve lower spill costs and faster code; my results are never worse. Coloring pairs The pessimism of Chaitin's coloring heuristic is emphasized when trying to color register pairs. My heuristic handles pairs as a natural consequence of its optimism. Rematerialization Chaitin et al. introduced the idea of rematerialization to avoid the expense of spilling and reloading certain simple values. By propagating rematerialization information around the SSA graph using a simple variation of Wegman and Zadeck's constant propag...
Profile-guided automatic inline expansion for C programs
- SOFTWARE PRACTICE AND EXPERIENCE
, 1992
"... This paper describes critical implementation issues that must be addressed to develop a fully automatic inliner. These issues are: integration into a compiler, program representation, hazard prevention, expansion sequence control, and program modi cation. An automatic inter- le inliner that uses pro ..."
Abstract
-
Cited by 109 (5 self)
- Add to MetaCart
This paper describes critical implementation issues that must be addressed to develop a fully automatic inliner. These issues are: integration into a compiler, program representation, hazard prevention, expansion sequence control, and program modi cation. An automatic inter- le inliner that uses pro le information has been implemented and integrated into an optimizing C compiler. The experimental results show that this inliner achieves signi cant speedups for production C programs.
Linear Scan Register Allocation
- ACM Transactions on Programming Languages and Systems
, 1999
"... this article we use depth-first order. The choice of instruction ordering does not a#ect the correctness of the algorithm, but it may a#ect the quality of allocation. We discuss alternative orderings in Section 6. ..."
Abstract
-
Cited by 108 (4 self)
- Add to MetaCart
this article we use depth-first order. The choice of instruction ordering does not a#ect the correctness of the algorithm, but it may a#ect the quality of allocation. We discuss alternative orderings in Section 6.
Effective Partial Redundancy Elimination
- Proceedings of the ACM SIGPLAN '94 Conference on Programming Language Design and Implementation
, 1994
"... Partial redundancy elimination is a code optimization with a long history of literature and implementation. In practice, its effectiveness depends on issues of naming and code shape. This paper shows that a combination of global reassociation and global value numbering can increase the effectiveness ..."
Abstract
-
Cited by 84 (13 self)
- Add to MetaCart
Partial redundancy elimination is a code optimization with a long history of literature and implementation. In practice, its effectiveness depends on issues of naming and code shape. This paper shows that a combination of global reassociation and global value numbering can increase the effectiveness of partial redundancy elimination. By imposing a discipline on the choice of names and the shape of expressions, we are able to expose more redundancies. As part of the work, we introduce a new algorithm for global reassociation of expressions. It uses global information to reorder expressions, creating opportunities for other optimizations. The new algorithm generalizes earlier work that ordered FORTRAN array address expressions to improve optimization [25]. 1 Introduction Partial redundancy elimination is a powerful optimization that has been discussed in the literature for many years (e.g., [21, 8, 14, 12, 18]). Unfortunately, partial redundancy elimination has two serious limitations...
Adaptive Optimizing Compilers for the 21st Century
- Journal of Supercomputing
, 2001
"... Historically, compilers have operated by applying a fixed set of optimizations in a predetermined order. We call such an ordered list of optimizations a compilation sequence. This paper describes a prototype system that uses biased random search to discover a program-specific compilation sequence ..."
Abstract
-
Cited by 70 (8 self)
- Add to MetaCart
Historically, compilers have operated by applying a fixed set of optimizations in a predetermined order. We call such an ordered list of optimizations a compilation sequence. This paper describes a prototype system that uses biased random search to discover a program-specific compilation sequence that minimizes an explicit, external objective function. The result is a compiler framework that adapts its behavior to the application being compiled, to the pool of available transformations, to the objective function, and to the target machine.
Code selection through object code optimization
- ACM Transactions on Programming Languages and Systems
, 1984
"... This paper shows how thorough object code optimization has simplified a compiler and made it easy to retarget. The code generator forgoes case analysis and emits naive code that is improved by a retargetable object code optimizer. With this technique, cross-compilers have been built for seven machin ..."
Abstract
-
Cited by 66 (13 self)
- Add to MetaCart
This paper shows how thorough object code optimization has simplified a compiler and made it easy to retarget. The code generator forgoes case analysis and emits naive code that is improved by a retargetable object code optimizer. With this technique, cross-compilers have been built for seven machines, some in as few as three person days. These cross-compilers emit code comparable to host-specific compilers. Categories and Subject Descriptors: D.3.4 [Programming Languages]: Processors--code genera-tion; compilers; optimization
Recovery management in QuickSilver
- ACM Transactions on Computer Systems
, 1988
"... developed at the IBM Almaden Research Center, which uses atomic tran.sactions as a unified failure recovery mechanism for a client-server structured distributed system. Transactions allow failure atomicity for related activities at a single server or at a number of independent servers. Rather than b ..."
Abstract
-
Cited by 61 (0 self)
- Add to MetaCart
developed at the IBM Almaden Research Center, which uses atomic tran.sactions as a unified failure recovery mechanism for a client-server structured distributed system. Transactions allow failure atomicity for related activities at a single server or at a number of independent servers. Rather than bundling transaction management into a dedicated language or recoverable object manager, Quicksilver exposes the basic commit protocol and log recovery primi-tives, allowing clients and servers to tailor their recovery techniques to their specific needs. Servers can implement their own log recovery protocols rather than being required to use a system-defined protocol. These decisions allow servers to make their own choices to balance simplicity, efficiency, and recoverability. Categories and Subject Descriptors: D.4.3 [Operating Systems]: File System Management-distrib-uted file systems; file organization; maintenance; D.4.5 [Operating Systems]: Reliability-FauZt-tolerance; checkpoint/restart; H.2.4 [Database Management]: Systems--distributed systems; trun.s-action processing
Threads cannot be implemented as a library
- In PLDI
, 2005
"... threads, library, register promotion, compiler optimization, garbage collection In many environments, multi-threaded code is written in a language that was originally designed without thread support (e.g. C), to which a library of threading primitives was subsequently added. There appears to be a ge ..."
Abstract
-
Cited by 51 (3 self)
- Add to MetaCart
threads, library, register promotion, compiler optimization, garbage collection In many environments, multi-threaded code is written in a language that was originally designed without thread support (e.g. C), to which a library of threading primitives was subsequently added. There appears to be a general understanding that this is not the right approach. We provide specific arguments that a pure library approach, in which the compiler is designed independently of threading issues, cannot guarantee correctness of the resulting code. We first review why the approach almost works, and then examine some of the surprising behavior it may entail. We further illustrate that there are very simple cases in which a pure library-based approach seems incapable of expressing an efficient parallel algorithm. Our discussion takes place in the context of C with Pthreads, since it is commonly used, reasonably well specified, and does not attempt to ensure type-safety, which would entail even stronger constraints. The issues we raise are not specific to that context.
Interprocedural Symbolic Analysis
, 1994
"... Compiling for efficient execution on advanced computer architectures requires extensive program analysis and transformation. Most compilers limit their analysis to simple phenomena within single procedures, limiting effective optimization of modular codes and making the programmer's job harder. We p ..."
Abstract
-
Cited by 48 (1 self)
- Add to MetaCart
Compiling for efficient execution on advanced computer architectures requires extensive program analysis and transformation. Most compilers limit their analysis to simple phenomena within single procedures, limiting effective optimization of modular codes and making the programmer's job harder. We present methods for analyzing array side effects and for comparing nonconstant values computed in the same and different procedures. Regular sections, described by rectangular bounds and stride, prove as effective in describing array side effects in Linpack as more complicated summary techniques. On a set of six programs, regular section analysis of array side effects gives 0 to 39 percent reductions in array dependences at call sites, with 10 to 25 percent increases in analysis time. Symbolic analysis is essential to data dependence testing, array section analysis, and other high-level program manipulations. We give methods for building symb...

