44 citations found.
Chang, P.P., Hwu, W.W.: Trace selection for compiling large C application programs to microcode. In: Proceedings of the 21st Annual Workshop on Microprogramming and Microarchitecture. (1988) 21--29

This paper is cited in the following contexts:

Security in Dynamic Execution Environments - Inoue (2001)   (Correct)

....The profile required for trace scheduling must be constructed by examining execution paths. This is usually done by edge profiling. If basic blocks are viewed as nodes on a graph, edges are branches. A trace is constructed by following the most commonly executed edge out of a basic block [15]. These traces, since they are now contiguous, can be optimized as one large block. A group at Microsoft Research has pointed out that this heuristic is not perfect: bad traces can be constructed [7]. They have discovered another technique that builds slightly longer paths. They have ....

Pohua P. Chang. Trace selection for compiling large C application programs to microcode. In 21st Annual Workshop on Microprogramming and Microarchitecture (MICRO 21), pages 21--29, November 1988.
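
The edge-following heuristic described in the excerpt above can be illustrated with a minimal sketch. This is not the algorithm from the cited paper; it only shows the greedy step the excerpt names (follow the most commonly executed edge out of each basic block). The function name `select_trace`, the dictionary layout, and the profile counts are all hypothetical.

```python
def select_trace(seed, edges):
    """Grow a trace from `seed` by repeatedly taking the hottest outgoing edge.

    `edges` maps a block name to {successor: execution count}.
    Growth stops at a block with no successors, or when the hottest
    successor is already in the trace (for example, a loop back edge).
    """
    trace = [seed]
    visited = {seed}
    block = seed
    while edges.get(block):
        # Pick the successor reached by the most frequently executed edge.
        successor = max(edges[block], key=edges[block].get)
        if successor in visited:
            break
        trace.append(successor)
        visited.add(successor)
        block = successor
    return trace


# Hypothetical edge profile for a small control flow graph.
profile = {
    "A": {"B": 90, "C": 10},
    "B": {"D": 60, "E": 30},
    "C": {"E": 10},
    "D": {"F": 60},
    "E": {"F": 40},
    "F": {},
}

print(select_trace("A", profile))  # prints ['A', 'B', 'D', 'F']
```

Because each step looks only at the edges leaving the current block, the sketch can pick a path whose whole-path frequency is lower than that of a competing path, which is the weakness of the heuristic that the excerpt mentions.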


Clustered VLIW Architectures with Predicated Switching - Jacome, de Veciana, Pillai (2001)   (3 citations)  (Correct)

.... opportunities for speculation (predicate promotion) occur quite frequently (e.g. in the Mediabench benchmark) In fact, this form of speculation has been shown to be effective, while avoiding the code explosion problems associated with other compiler directed speculation techniques, see e.g. [13, 4, 10]. Space precludes us from discussing generic criteria used in selecting the code segments to be predicated. For details we refer the reader to [2] In the sequel we will refer to if conversion based predicated and speculated code as standard predication. 3 Predicated Switching In this section we ....

.... transformation generates code whose execution performance on clustered machines is frequently superior to that achieved by standard predication with predicate promotion, and never worse (in clustered or centralized machines) 10 More traditional compiler directed speculation techniques, such as [13, 4, 10, 9] can only reliably improve performance when there is a significant bias on execution paths, and typically lead to code explosion. In [6] the authors propose a predicated single static assignment conversion to enable aggressive speculation on VLIW machines. Unfortunately, the proposed ....

P.Chang and W.Hwu. Trace selection for compiling large c applications to microcode. In Proceedings of the 21st International Workshop on Microprogramming and Microarchitecture, pages 188--198, November 1988.


Microarchitectural and Compile-Time Optimizations for.. - Kalamatianos (2000)   (1 citation)  (Correct)

....their temporal reuse. The code reordering algorithm they employed is described in [55]. In [83] several code transformations, including procedure inlining and intraprocedural basic block reordering, are examined, as they relate to instruction cache design. They used the algorithm described in [66, 86] to form traces and reduce a function's most frequently executed part. Their approach was found to increase code sequentiality, as well as code size, improving performance when the application's working set did not fit in the instruction cache. In [85] the authors compare profile-guided code ....

W.M. Hwu and P.P. Chang. Trace Selection for compiling Large C Application Programs to Microcode. In Proceedings of the International Workshop on Microarchitecture and Microprogramming, November 1988.


Using Branch Handling Hardware to Support Profile-Driven.. - Conte, Patel, Cox (1994)   (11 citations)  (Correct)

....superblocks were used to extend the scope of traditional optimizations [2]. Superblock formation and trace selection both use the same heuristics to form traces. Superblocks differ from traces in the method for providing fix-up code for off-trace superblock execution and tail duplication [2] [9] [15]. Either method results in significant code size explosion. To limit this explosion, a threshold is placed on the execution frequency of a block. If a block's frequency is below this threshold, it is not considered for trace membership. (This is discussed in more detail in Section 3.4 below.) ....

....the estimated profiles and compare the results. An example of trace selection is illustrated in Figure 4. Graph (a) is annotated with the actual profile information, whereas graph (b) is the hardware-generated profile. Traces are formed using an arc trace selection threshold of 60 to group blocks [15]. Code explosion is avoided by not extending traces to blocks with low weights. This is implemented as a threshold, T. Values of T = 0.1, 1, 3 and 5 are considered below. The metric for trace selection error is introduced using the example of Figure 4. In the actual graph (graph (a)) basic ....

W. W. Hwu and P. P. Chang, "Trace selection for compiling large C application programs to microcode," in Proc. 21st Ann. Workshop on Microprogramming and Microarchitectures, (San Diego, CA.), Nov. 1988.
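
The two thresholds mentioned in the excerpts above (one on the arc followed, one on the weight of the block being added) can be sketched as follows. The excerpts do not define the units or the exact normalization, so this sketch makes its own assumptions: the arc threshold is taken as a fraction of the current block's outgoing flow, and the block weight threshold as a fraction of the total block weight. All names and numbers here are hypothetical.

```python
def select_trace_thresholded(seed, edges, block_weight,
                             arc_threshold=0.6, min_weight=0.01):
    """Grow a trace from `seed`, stopping on a weak arc or a cold block.

    Assumptions (not taken from the cited papers):
      - arc_threshold: minimum share of the current block's outgoing
        flow that the chosen edge must carry.
      - min_weight: minimum share of the total block weight that the
        candidate successor must have.
    """
    total_weight = sum(block_weight.values())
    trace = [seed]
    block = seed
    while True:
        out = edges.get(block, {})
        out_total = sum(out.values())
        if out_total == 0:
            break  # no executed outgoing edges
        successor = max(out, key=out.get)
        if out[successor] / out_total < arc_threshold:
            break  # branch is not biased enough to extend the trace
        if block_weight.get(successor, 0) / total_weight < min_weight:
            break  # successor is too cold; limits code growth
        if successor in trace:
            break  # do not revisit a block already in the trace
        trace.append(successor)
        block = successor
    return trace


# Hypothetical profile: edge counts and per-block execution counts.
edges = {
    "A": {"B": 90, "C": 10},
    "B": {"D": 50, "E": 40},
    "C": {"E": 10},
    "D": {"F": 50},
    "E": {"F": 50},
    "F": {},
}
weights = {"A": 100, "B": 90, "C": 10, "D": 50, "E": 50, "F": 100}

print(select_trace_thresholded("A", edges, weights))
print(select_trace_thresholded("A", edges, weights, arc_threshold=0.5))
```

With the default arc threshold the trace stops at B, because the B to D edge carries only 50 of the 90 executions leaving B. Lowering the arc threshold to 0.5 lets the trace continue through D to F.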


Region Formation Analysis with Demand-driven Inlining for.. - Ben (2000)   (1 citation)  (Correct)

....[14] included classical optimizations like global common subexpression elimination, dead code removal, and code motion, but particularly found register allocation and scheduling to be enhanced by this approach. Region based compilation as proposed by Hank et al. and as implemented in the IMPACT [5] compiler is accomplished by performing an aggressive inlining pass, followed by a partitioning phase that forms new regions based on a heuristic, bundles regions to look like functions, and passes these compiler created functions to the unchanged optimization phases. While this approach can ....

....exposing interprocedural scheduling and optimization opportunities without the cost of very large function bodies created through inlining, or the expense and complexity of sophisticated interprocedural analysis and code motion. This region based compilation framework is embellished in the IMPACT [5] and Trimaran compilers [21] Limited forms of region based compilation were used in the Multiflow [20] and Cydrome [10] compilers. While they have shown it to be especially beneficial in an ILP compiler, region based compilation also can be useful for achieving both interprocedural scope and ....

[Article contains additional citation context not shown here]

P. P. Chang and W. W. Hwu. Trace selection for compiling large C application programs to microcode. Proceedings of the 21st International Workshop on Microprogramming and Microarchitecture, pages 188--198, Nov. 1988.


Scalable Procedure Restructuring for Ambitious Optimization - Way (2000)   (Correct)

....of very large function bodies created through inlining, or the expense and complexity of sophisticated interprocedural analysis and code motion. Hank s region based compilation framework is the direct descendant of superblock and hyperblock research [40, 48, 54] and is embellished in the IMPACT [15] and Trimaran compilers [55] Limited forms of region based compilation were used in the Multiflow [53] and Cydrome [61, 25] compilers. While it has proven to be especially beneficial in an ILP compiler, region based compilation also can be useful for achieving both interprocedural scope and ....

....the quality of the generated code depends upon the ability of the compiler to efficiently transform individual regions in isolation. Hank et al. use a profile sensitive region formation process that is a generalization of the profile based trace selection algorithm used in the IMPACT compiler [15], which is closely related to other trace selection and scheduling research [31, 53] In practice, each region is encapsulated in a single entry single exit CFG by adding dummy prologue and epilogue blocks and boundary condition blocks that convey variable liveness at the region exit points. Side ....

P. P. Chang and W. W. Hwu. Trace selection for compiling large C application programs to microcode. Proceedings of the 21st International Workshop on Microprogramming and Microarchitecture, pages 188--198, Nov. 1988.


Control Independence in Trace Processors - Rotenberg (1999)   (6 citations)  (Correct)

....binaries; PEs are managed in a fifo queue so CGCI is not explicitly exploited. Other related work includes trace selection studies for trace caches and trace processors (Peleg Weiser, 1995; Rotenberg et al. 1996, 1997; Patel et al. 1997, 1998) trace selection for compilers (Fisher, 1981; Hwu Chang, 1988), and task selection for multiscalar processors (Vijaykumar, 1998; Vijaykumar Sohi, 1998) 1.3 Paper Organization Section 2 describes the trace processor s novel window management, i.e. support for instruction insertion removal from the middle of the window (both control flow and data flow ....

Hwu, W., & Chang, P. (1988). Trace selection for compiling large C application programs to microcode. In Proceedings of the 21st International Symposium on Microarchitecture.


Trace Processors: Exploiting Hierarchy And Speculation - Rotenberg (1999)   (3 citations)  (Correct)

....evaluates sizes up to that of a single VAX instruction and a basic block, it also suggests joining two consecutive basic blocks if the intervening branch is highly predictable . In [60] software basic block enlargement is discussed. In the spirit of trace scheduling [19] and trace selection [35], the compiler uses profiling to identify candidate basic blocks for merging into a single execution atomic unit. The hardware sequences at the level of execution atomic units as created by the compiler. The advantage of this approach is the compiler can optimize and schedule across basic block ....

....perform comparably to complete bypasses because communication is localized as much as possible within each cluster. 2.2.5 VLIW and block structured ISAs The concept of traces has long existed in the software realm of instruction level parallelism. Early work by Fisher [19], Hwu and Chang [35], and others on trace scheduling and trace selection for microcode recognized the problem imposed by branches on code optimization. Subsequent VLIW architectures and novel ISA techniques, for example [36,61,33], further promote the ability to schedule long sequences of instructions containing ....

W. Hwu and P. Chang. Trace Selection for Compiling Large C Application Programs to Microcode. 21st International Symposium on Microarchitecture, December 1988.


Optimizing the Instruction Cache Performance of the.. - Torrellas, Xia, Daigle (1995)   (38 citations)  (Correct)

....McFarling's technique [16] uses a profile of the conditional, loop, and routine structure of the program. With this information, he places the basic blocks so that callers of routines, loops, and conditionals do not interfere with the callee routines or their descendants. Hwu and Chang's technique [7, 15] is based on identifying groups of basic blocks within a routine that tend to execute in sequence. These basic blocks are then placed in contiguous cache locations. Furthermore, routines are placed such that frequent callee routines follow immediately after their callers. The algorithms used by ....

....In that case, we start again from the seed looking for the next acceptable basic block. Note that we often end up placing some of the basic blocks of a callee routine surrounded by basic blocks of the caller. This is one of the main differences between an algorithm proposed by Chang and Hwu [7] and ours. Once we have created the sequences out of the seeds, we catenate them and place them in the cache contiguously. With this placement, we expose much spatial locality and, consequently, reduce self interference misses. We will describe the algorithm in detail in Section 4. In this ....

P. P. Chang and W. W. Hwu. Trace Selection for Compiling Large C Application Programs to Microcode. In Proceedings of the 21st Annual Workshop on Microprogramming and Microarchitectures, pages 21--29, November 1988.


Control Independence in Trace Processors - Rotenberg, Smith (1999)   (6 citations)  (Correct)

....these gains are due to manually inserted, FGCIlike trace selection hints conveyed in the benchmark binaries; PEs are managed in a fifo queue so CGCI is not explicitly exploited. Basic trace selection studies for trace caches and trace processors can be found in [8,9,10,5,24] and for compilers in [25,26]. Task selection studies can be found in [27,28] 1.3. Paper organization Section 2 describes the trace processor s novel window management, i.e. support for instruction insertion removal from the middle of the window (both control flow and data flow aspects) This is followed by trace selection ....

W. Hwu and P. Chang. Trace selection for compiling large c application programs to microcode. 21st Intl. Symp. on Microarch. , Dec 1988.


Compiler Support For Sparc Architecture Processors - Roland Ouellette Massachusetts (1994)   (6 citations)  (Correct)

....processors. Reference [10] shows the importance of function inlining in compiling C programs. Reference [11] shows how instruction placement may be improved after function inlining has been performed. Reference [12] is a later version of the report [11] published as an article. Reference [13] describes some of the early work of Po Hua Chang and Wen Mei Hwu applying trace selection to large C programs. The trace scheduling technology was later incorporated into the IMPACT I compiler. Reference [14] shows how compiler technology may be used to improve performance by improving ....

P. P. Chang and W. W. Hwu, "Trace selection for compiling large C application programs to microcode," in Proceedings of the 21st International Workshop on Microprogramming and Microarchitecture, pp. 188--198, November 1988.


Commercializing Profile-Driven Optimization - Stan Cox David (1995)   (1 citation)  (Correct)

....[Figure 3: Trace selection example; panel (b): estimated (hardware-generated) graph.] .... trace selection threshold of 60 to group blocks [14]. Code explosion is avoided by not extending traces to blocks with low weights. This is also implemented as a threshold. Threshold values of 0.1, 1, 3 and 5 are considered below. The lower this threshold, the larger the code size of the generated executable, since additional patch-up code is ....

W. W. Hwu and P. P. Chang, "Trace selection for compiling large C application programs to microcode," in Proc. 21st Ann. Workshop on Microprogramming and Microarchitectures, (San Diego, CA.), Nov. 1988.


Efficient Path Profiling - Ball, Larus (1996)   (95 citations)  (Correct)

....code. Recently, fine grain profiles of basic blocks and control flow edges have become the basis for profile driven compilation, which uses measured frequencies to guide compilation and optimization. One use of profile information is to identify heavily executed paths (or traces) in a program [Fis81, Ell85, Cha88, YS94]. Unfortunately, basic block and edge profiles, although inexpensive and widely available, do not always correctly predict frequencies of overlapping paths. Consider, for example, the control flow graph (CFG) in Figure 1. Each edge in the CFG is labeled with its frequency, which normally results ....

....1. Each edge in the CFG is labeled with its frequency, which normally results from dynamic profiling, but in the figure is induced by path profiles in the table. A commonly used heuristic to select a heavily executed path follows the most frequently executed edge out of a basic block [Cha88], 1 which identifies path # . However, in path profile ) this path executed only 60 times, as compared to 90 times for path , and 100 times for path . # . In profile ) 10 , the disparity is even greater although the edge profile is exactly the ....

Pohua P. Chang. Trace selection for compiling large C application programs to microcode. In 21st Annual Workshop on Microprogramming and Microarchitecture (MICRO 21), pages 21--29, November 1988.


Enhancing Instruction Level Parallelism Through.. - Bringmann (1995)   (5 citations)  (Correct)

....across basic block boundaries by removing the constraints due to side entrances within a sequence of basic blocks. Superblocks are formed in two steps. Traces within a program (sets of basic blocks which tend to execute in sequence [8] are first identified using execution profile information [24]. Tail duplication is then performed to eliminate any side entrances to the trace [25] The basic blocks in a superblock need not be consecutive in the code. However, our implementation restructures the code so that all blocks in a superblock appear in consecutive order to the optimizer and ....

P. P. Chang and W. W. Hwu, "Trace selection for compiling large C application programs to microcode," in Proceedings of the 21st International Workshop on Microprogramming and Microarchitecture, pp. 188--198, November 1988.


Trace Cache Design for Wide-Issue Superscalar Processors - Patel (1999)   (5 citations)  (Correct)

....a program (usually a subroutine). This trace is treated as a unit, as if all internal branches were removed, giving the compiler a larger scope on which to apply optimizations and scheduling. Fix-up code is added to repair the cases where an internal branch did not behave as expected. Superblocks [7, 8, 19] build upon the trace scheduling concept by dividing the subprogram along most likely paths called superblocks, each composed of multiple basic blocks. Each superblock has only one entry point, but can have multiple exit points. Superblock formation allows certain basic blocks to be duplicated ....

P. P. Chang and W. W. Hwu, "Trace selection for compiling large C application programs to microcode," in Proceedings of the 21st Annual ACM/IEEE International Symposium on Microarchitecture, pp. 21-29, 1988.


Automatic Annotation Of Instructions With Profiling Information - Johnson (1995)   (2 citations)  (Correct)

.... predictable branches, even with different input data sets [3]. Control flow profiling has been used to identify frequent and infrequent execution paths, which aids trace scheduling [8], ILP-enhancing optimizations, software pipelining, and classic global and loop optimizations [1] [2] [9] [10]. Profiling information has also been used to guide instruction placement [11] [12], to help the register allocator identify frequently accessed variables [13] [14], and to aid the compiler with inlining expansion [15] [16]. Memory dependence profiling has been used to aid ILP-enhancing ....

P. P. Chang and W. W. Hwu, "Trace selection for compiling large C application programs to microcode," in Proceedings of the 21st International Workshop on Microprogramming and Microarchitecture, pp. 188--198, November 1988.


Dynamic Control Of Compile Time Using Vertical Region-Based.. - Braun   Self-citation (Hwu)   (Correct)

....in the region selection process allows the compiler to apply transformations over compilation units which are more representative of the dynamic behavior of the program. The algorithm used for region selection is a generalization of the profile based trace selection algorithm used within IMPACT [25]. The difference between the two algorithms is that the region selection algorithm is able to expand a region along multiple control flow paths, while the trace selection algorithm is limited to a single path. There are four basic steps to the region selection algorithm. First, a seed basic block ....

P. P. Chang and W. W. Hwu, "Trace selection for compiling large C application programs to microcode," in Proceedings of the 21st International Workshop on Microprogramming and Microarchitecture, pp. 188--198, November 1988.


Efficient Instruction Sequencing with Inline Target Insertion - Hwu, Chang (1990)   (5 citations)  Self-citation (Chang Hwu)   (Correct)

....which closely resembles MIPS R2000 3000[25] with modifications to accommodate Inline Target Insertion. The IMPACT I C Compiler, an optimizing C compiler developed for deep pipelining and multiple instruction issue at the University of Illinois, is used to generate code for all the experiments [4][21] 6] 7] 4.1 The Benchmark Table 3 presents the benchmarks chosen for this experiment. The C lines column describes the size of the benchmark programs in number of lines of C code (not counting comments) The runs column shows the number of inputs used to generate the profile databases and ....

....3. The use of many different real inputs to each program is intended to verify the stability of Inline Target Insertion using profile information. The IMPACT I compiler automatically applies trace selection and placement, and has removed unnecessary unconditional branches via code restructuring [4][6] 4.2 Code Expansion The problem of code expansion has to do with the frequent occurrence of branches in programs. Inserting target instructions for a branch adds N instructions to the static program. 18 In Figure 8, target insertion for F and I increases the size of the loop from 5 to 9 ....

P. P. Chang and W. W. Hwu, "Trace Selection for Compiling Large C Application Programs to Microcode", Proceedings of the 21st Annual Workshop on Microprogramming and Microarchitectures, pp.21-29, November, 1988.


Three Superblock Scheduling Models for Superscalar and.. - Pohua Chang Nancy (1991)   (6 citations)  Self-citation (Chang Hwu)   (Correct)

.... are effective at scheduling code across iterations in a well-defined manner [18] [23] [16]. For control-intensive code, profiling provides accurate branch prediction [13]. Once the direction of the branch is determined, blocks which tend to execute together can be grouped to form a trace [9] [3]. To reduce some of the bookkeeping complexity, the side entrances to the trace can be removed to form a superblock [5]. In dynamically and statically scheduled processors in which the scheduling scope is enlarged by predicting the branch direction, there are possible hazards to moving ....

P. P. Chang and W. W. Hwu, "Trace Selection for Compiling Large C Application Programs to Microcode", Proceedings of the 21st Annual Workshop on Microprogramming and Microarchitectures, pp.21-29, San Diego, California, November, 1988.


Design And Implementation Of A Portable Global Code Optimizer - Mahlke (1991)   (10 citations)  Self-citation (Chang Hwu)   (Correct)

....in a sequence and groups them into a trace [13] [12]. The definition of a trace is the same as the definition of a super block, except that the program control is not restricted to enter at the first basic block. An experimental study of several trace selection algorithms was reported in [5]. Figure 4.1 shows the result of trace selection. Each dotted line box represents a trace. There are three traces: {A, B, E, F}, {C}, and {D}. After trace selection, each trace is converted into a super block by duplicating the tail part of the trace, to ensure that the program control can enter ....

P. P. Chang and W. W. Hwu, "Trace selection for compiling large C application programs to microcode," Proceedings of the 21st Annual Workshop on Microprogramming and Microarchitectures, November 1988, pp. 21-29.


The Effect of Code Expanding Optimizations on Instruction .. - Chen, Chang, Conte, Hwu (1993)   (23 citations)  Self-citation (Hwu Chang)   (Correct)

....the sequential and spatial localities, and decreases cache mapping conflicts of the instruction accesses. For a given function body, several steps are taken to reorder the instruction sequence. For each function, basic blocks which tend to execute in sequence are grouped into traces [22] [23]. Traces are the basic units used for instruction placement. The algorithm starts with the function entrance trace and expands the placement by placing the most important descendent after it. The placement continues until all the traces with non zero execution profile count have been placed. ....

W. W. Hwu and P. P. Chang, "Trace selection for compiling large C application programs to microcode," in Proc. 21st Ann. Workshop on Microprogramming and Microarchitectures, (San Diego, CA.), Nov. 1988.


The Use of Traces for Inlining in Java Programs - Borys Bradel And   (Correct)

No context found.

Chang, P.P., Hwu, W.W.: Trace selection for compiling large C application programs to microcode. In: Proceedings of the 21st Annual Workshop on Microprogramming and Microarchitecture. (1988) 21--29


TFP: Time-sensitive, Flow-specific Profiling at Runtime - Nandy, Gao, Ferrante (2003)   (Correct)

No context found.

P. P. Chang and W. W. Hwu. Trace Selection for Compiling Large C Application Programs to Microcode. In Proceedings of the 21st Annual Workshop on Microprogramming and Microarchitecture, pages 21--29. IEEE Computer Society Press, 1988.


Compiler-assisted Full Checkpointing - Li, Stewart, Fuchs (1994)   (5 citations)  (Correct)

No context found.

P. P. Chang and W.-M. W. Hwu, `Trace selection for compiling large C application programs to microcode', The 21st Annual Workshop on Microprogramming and Microarchitecture, November 1988, pp. 21--29.


CiteSeer.IST - Copyright Penn State and NEC