Results 11 - 20 of 51
Design and Implementation of a Graph Coloring Register Allocator for GCC
, 2003
Cited by 5 (0 self)
Historically the register allocator used in GCC is a two-phase allocator differentiating between local and global pseudo registers, which doesn’t itself produce spill code, and therefore is limited in code quality if spilling is needed. This paper describes a new register allocator for GCC based on graph coloring. After a short overview of the general concepts of such allocators, including some of the improvements (if used in the implementation), we discuss the actual implementation of the allocator, including design decisions and justification for them. This includes parts which aren’t explained in the usual scientific papers but are needed in a real-world multi-target allocator.
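For readers unfamiliar with the general concept the abstract refers to, the following is a minimal textbook-style sketch of graph-coloring allocation with a simplify phase and an optimistic select phase. It is not the GCC allocator described in the paper: the function name `color_graph`, the dict-of-sets graph representation, and the tie-breaking rules are all assumptions made for illustration, and a real allocator would also handle coalescing, spill costs, and target constraints.

```python
def color_graph(graph, k):
    """Try to color an interference graph with k colors (registers).

    graph: dict mapping each node (a string, assumed) to the set of nodes
           it interferes with; edges must be symmetric.
    Returns (coloring, spilled): node -> color index, and the list of
    nodes that received no color and would need spill code.
    """
    degrees = {n: len(adj) for n, adj in graph.items()}
    remaining = set(graph)
    stack = []

    # Simplify: remove nodes one at a time, preferring degree < k.
    while remaining:
        low = [n for n in remaining if degrees[n] < k]
        if low:
            node = min(low)  # deterministic choice for the example
        else:
            # No trivially colorable node: push a spill candidate anyway
            # (optimistic coloring) and hope a color is free later.
            node = max(remaining, key=lambda n: (degrees[n], n))
        remaining.remove(node)
        stack.append(node)
        for m in graph[node]:
            if m in remaining:
                degrees[m] -= 1

    # Select: pop nodes and give each the lowest color unused by neighbors.
    coloring, spilled = {}, []
    while stack:
        node = stack.pop()
        used = {coloring[m] for m in graph[node] if m in coloring}
        free = [c for c in range(k) if c not in used]
        if free:
            coloring[node] = free[0]
        else:
            spilled.append(node)
    return coloring, spilled


# A 4-cycle has no node of degree < 2, yet it is 2-colorable.
cycle = {"a": {"b", "d"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"a", "c"}}
cycle_coloring, cycle_spilled = color_graph(cycle, 2)

# A 4-clique cannot be colored with 3 colors, so one node is spilled.
clique = {n: {m for m in "abcd" if m != n} for n in "abcd"}
clique_coloring, clique_spilled = color_graph(clique, 3)
```

The 4-cycle case illustrates why the optimistic variant matters: a purely pessimistic simplify phase would mark a node for spilling as soon as no low-degree node is available, even though a valid coloring exists.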
An Optimistic and Conservative Register Assignment Heuristic for Chordal Graphs
, 2007
Cited by 4 (0 self)
This paper presents a new register assignment heuristic for procedures in SSA Form, whose interference graphs are chordal; the heuristic is called optimistic chordal coloring (OCC). Previous register assignment heuristics eliminate copy instructions via coalescing, in other words, merging nodes in the interference graph. Node merging, however, cannot preserve the chordal graph property, making it unappealing for SSA-based register allocation. OCC is based on graph coloring, but does not employ coalescing, and, consequently, preserves graph chordality, and does not increase its chromatic number; in this sense, OCC is conservative as well as optimistic. OCC is observed to eliminate at least as many dynamically executed copy instructions as iterated register coalescing (IRC) for a set of chordal interference graphs generated from several Mediabench and MiBench applications. In many cases, OCC and IRC were able to find optimal or near-optimal solutions for these graphs. OCC ran 1.89x faster than IRC, on average.
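The reason chordality is worth preserving can be illustrated with a short sketch. This is not the OCC heuristic itself, whose details the abstract does not give; it shows only the standard background result that a chordal graph can be colored with the minimum number of colors by a greedy pass over a maximum cardinality search (MCS) order. The function names and the dict-of-sets representation are assumptions for illustration.

```python
def mcs_order(graph):
    """Maximum cardinality search: repeatedly visit the unvisited node
    with the most already-visited neighbors. For a chordal graph, the
    reverse of this order is a perfect elimination ordering."""
    weight = {n: 0 for n in graph}
    unvisited = set(graph)
    order = []
    while unvisited:
        # Ties are broken by node name to keep the example deterministic.
        node = max(unvisited, key=lambda n: (weight[n], n))
        unvisited.remove(node)
        order.append(node)
        for m in graph[node]:
            if m in unvisited:
                weight[m] += 1
    return order


def greedy_color(graph, order):
    """Assign each node, in the given order, the lowest color not used
    by its already-colored neighbors."""
    coloring = {}
    for node in order:
        used = {coloring[m] for m in graph[node] if m in coloring}
        color = 0
        while color in used:
            color += 1
        coloring[node] = color
    return coloring


# Two triangles sharing the edge b-c: chordal, largest clique has size 3.
g = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c"},
}
order = mcs_order(g)
coloring = greedy_color(g, order)
num_colors = max(coloring.values()) + 1
```

On a chordal graph the number of colors used equals the size of the largest clique, which is why merging nodes (and thereby possibly destroying chordality) can make the assignment problem harder.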
Optimizing Scientific Application Loops on Stream Processors
Cited by 4 (3 self)
This paper describes a graph coloring compiler framework to allocate on-chip SRF (Stream Register File) storage for optimizing scientific applications on stream processors. Our framework consists of first applying enabling optimizations such as loop unrolling to expose stream reuse and opportunities for maximizing parallelism, i.e., overlapping kernel execution and memory transfers. Then the three SRF management tasks are solved in a unified manner via graph coloring: (1) placing streams in the SRF, (2) exploiting stream use, and (3) maximizing parallelism. We evaluate the performance of our compiler framework by actually running nine representative scientific computing kernels on our FT64 stream processor. Our preliminary results show that compiler management achieves an average speedup of 2.3x compared to First-Fit allocation. In comparison with the performance results obtained from running these benchmarks on Itanium 2, an average speedup of 2.1x is observed.
From Bytecode to Javascript: the Js_of_ocaml Compiler
Cited by 4 (4 self)
We present the design and implementation of a compiler from OCaml bytecode to Javascript. We believe that taking bytecode as input instead of a high-level language is a sensible choice. Virtual machines provide a very stable API. Such a compiler is thus easy to maintain. It is also convenient to use: it can just be added to an existing installation of the development tools. Already compiled libraries can be used directly, with no need to reinstall anything. Finally, some virtual machines are the target of several languages. A bytecode to Javascript compiler would make it possible to retarget all these languages to Web browsers at once.
Optimal bitwise register allocation using integer linear programming
- In International Workshop on Languages and Compilers for Parallel Computing (LCPC’06), LNCS
, 2006
Cited by 3 (0 self)
This paper addresses the problem of optimal global register allocation. The register allocation problem is expressed as an integer linear programming problem and solved optimally. The model is more flexible than previous graph-coloring based methods and thus allows for register allocations with significantly fewer moves and spills. The formulation can also model complex architectural features, such as bit-wise access to registers. With bit-wise access to registers, multiple subword temporaries can be stored in a single register and accessed efficiently, resulting in a register allocation problem that cannot be addressed effectively with simple graph coloring. The paper describes techniques that can help reduce the problem size of the ILP formulation, making the algorithm feasible in practice. Preliminary empirical results from an implementation prototype are reported.
Comparing conservative coalescing criteria
- ACM Trans. Programming Languages and Systems
, 2005
Cited by 3 (0 self)
Graph-coloring register allocators can eliminate copy instructions from a program by coalescing the interference graph nodes corresponding to the source and destination. Briggs showed that by limiting coalescing to those situations that he dubbed “conservative,” it could be prevented from causing spilling, i.e., a situation where the allocator fails to assign a register to each live range. George and Appel adopted Briggs’s conservativeness criterion in general, but provided an alternative criterion (the George test) to use in those cases where one of the nodes has been “precolored,” i.e., pre-assigned a specific register. They motivated this alternative criterion by efficiency considerations, and provided no indication of the relative power of the two criteria. Thus it remained an open question whether the efficiency had been bought at the expense of reduced coalescing. Their implementation also used a limited version of the Briggs test, in place of the original, full version, without any comment on the impact of this substitution. In this paper we also present an analogously limited version of the George test. Thus we are now confronted with four different criteria for conservative coalescing: the full and limited Briggs tests and the full and limited George tests. We present a number of theorems characterizing the relative power of these different criteria, and a number of theorems characterizing
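As a rough illustration of the two families of criteria named in this abstract, the sketch below encodes the Briggs and George tests in their commonly stated textbook form. It may not correspond exactly to any of the four variants the paper compares, since the abstract does not define the full and limited versions. The function names, the dict-of-sets graph representation, and the assumption that `a` and `b` are copy-related and do not interfere are all illustrative.

```python
def briggs_ok(graph, a, b, k):
    """Briggs-style test: merging a and b is conservative if the merged
    node would have fewer than k neighbors of significant degree
    (degree >= k, measured after the merge)."""
    merged_neighbors = (graph[a] | graph[b]) - {a, b}
    significant = 0
    for t in merged_neighbors:
        # After merging, t's edges to a and/or b collapse into one edge.
        degree_after = len(graph[t] - {a, b}) + 1
        if degree_after >= k:
            significant += 1
    return significant < k


def george_ok(graph, a, b, k):
    """George-style test: a may be merged into b if every neighbor t of a
    either already interferes with b or has insignificant degree (< k)."""
    return all(t in graph[b] or len(graph[t]) < k
               for t in graph[a] - {b})


# Case 1: a and b share their only neighbor x, so merging adds no edges.
g1 = {"a": {"x"}, "b": {"x"}, "x": {"a", "b", "y"}, "y": {"x"}}

# Case 2: a and b have distinct significant-degree neighbors x and z.
g2 = {
    "a": {"x"}, "b": {"z"},
    "x": {"a", "p"}, "z": {"b", "q"},
    "p": {"x"}, "q": {"z"},
}

results = {
    "g1_briggs": briggs_ok(g1, "a", "b", 2),
    "g1_george": george_ok(g1, "a", "b", 2),
    "g2_briggs": briggs_ok(g2, "a", "b", 2),
    "g2_george": george_ok(g2, "a", "b", 2),
}
```

With `k = 2`, both tests accept the merge in the first graph and reject it in the second. Note that the George test inspects only the neighbors of `a`, which is the efficiency advantage the abstract mentions for precolored nodes, whose adjacency sets are typically large or not stored explicitly.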
Live-range Unsplitting for Faster Optimal Coalescing
Cited by 3 (0 self)
Register allocation is often a two-phase approach: spilling of registers to memory, followed by coalescing of registers. Extreme live-range splitting (i.e. live-range splitting after each statement) enables optimal solutions based on ILP, for both spilling and coalescing. However, while the solutions are easily found for spilling, for coalescing they are more elusive. This difficulty stems from the huge size of interference graphs resulting from live-range splitting. This report focuses on optimal coalescing in the context of extreme live-range splitting. We present some theoretical properties that give rise to an algorithm for reducing interference graphs, while preserving optimality. This reduction consists mainly in finding and removing useless splitting points. It is followed by a graph decomposition based on clique separators. The last optimization consists in two preprocessing rules. Any coalescing technique can be applied after these optimizations. Our optimizations have been tested on a standard benchmark, the optimal coalescing challenge. For this benchmark, the cutting-plane algorithm for optimal coalescing (the only optimal algorithm for coalescing) runs 300 times faster when combined with our optimizations. Moreover, we provide all the solutions of the optimal coalescing challenge, including the 3 instances that were previously unsolved.
Unroll-based Copy Elimination for Enhanced Pipeline Scheduling
- IEEE Trans. Comput
, 2002
types and structures
Cited by 2 (1 self)
We compile Nova, a new language designed for writing network processing applications, using a back end based on integer-linear programming (ILP) for register allocation, optimal bank assignment, and spills. The compiler’s optimizer employs CPS as its intermediate representation; some of the invariants that this IR guarantees are essential for the formulation of a practical ILP model. Appel and George used a similar ILP-based technique for the IA32 to decide which variables reside in registers but deferred the actual assignment of colors to a later phase. We demonstrate how to carry over their idea to an architecture with many more banks, register aggregates, variables with multiple simultaneous register assignments, and, very importantly, one where bank- and register-assignment cannot be done in isolation from each other. Our approach performs well in practise, without causing an explosion in size or solve time of the generated integer linear programs.