11 citations found.
W.-m. W. Hwu, P. P. Chang, Efficient Instruction Sequencing with Inline Target Insertion, IEEE Transactions on Computers, 41(12), 1992, pp. 1537-1551.

This paper is cited in the following contexts:
Loop Optimization Techniques On Multi-Issue Architectures - Kaiser

....less optimal than the unrolled code; if the loop typically executes few iterations, loop unrolling can be detrimental to performance. One way to overcome this problem is to add code specially designed to execute the loop a constant few iterations (Hwu calls this type of structure a superblock in [75]). There is also a secondary cost of loop unrolling in some architectures caused by the additional cache misses due to the increased code size [115, 116, 40, 171]. The efficiency of loop unrolling quickly drops in relation to the size of original loop inefficiency and the unroll count. It is easy ....

W.-m. W. Hwu, P. P. Chang, Efficient Instruction Sequencing with Inline Target Insertion, IEEE Transactions on Computers, 41(12), 1992, pp. 1537-1551.


Compiler Support For Sparc Architecture Processors - Roland Ouellette Massachusetts (1994)   (6 citations)

....the advantages of scheduling superblocks especially on superpipelined superscalar processors. Reference [10] shows the importance of function inlining in compiling C programs. Reference [11] shows how instruction placement may be improved after function inlining has been performed. Reference [12] is a later version of the report [11] published as an article. Reference [13] describes some of the early work of Po Hua Chang and Wen Mei Hwu applying trace selection to large C programs. The trace scheduling technology was later incorporated into the IMPACT I compiler. Reference [14] shows ....

W. W. Hwu and P. P. Chang, "Efficient instruction sequencing with inline target insertion," IEEE Transactions on Computers, accepted for publication.


Compiler Support For Sparc Architecture Processors - Roland Ouellette Massachusetts (1994)   (6 citations)

....technical report showing the advantages of scheduling code prior to register allocation. Reference [9] shows the advantages of scheduling superblocks especially on superpipelined superscalar processors. Reference [10] shows the importance of function inlining in compiling C programs. Reference [11] shows how instruction placement may be improved after function inlining has been performed. Reference [12] is a later version of the report [11] published as an article. Reference [13] describes some of the early work of Po Hua Chang and Wen Mei Hwu applying trace selection to large C programs. ....

....especially on superpipelined superscalar processors. Reference [10] shows the importance of function inlining in compiling C programs. Reference [11] shows how instruction placement may be improved after function inlining has been performed. Reference [12] is a later version of the report [11] published as an article. Reference [13] describes some of the early work of Po Hua Chang and Wen Mei Hwu applying trace selection to large C programs. The trace scheduling technology was later incorporated into the IMPACT I compiler. Reference [14] shows how compiler technology may be used to ....

W. W. Hwu and P. P. Chang, "Efficient instruction sequencing with inline target insertion," Tech. Rep. CSG-123, Center for Reliable and High-Performance Computing, University of Illinois, Urbana, IL, May 1990.


Data Preload For Superscalar And VLIW Processors - Chen, Jr. (1993)   (16 citations)

....such as loop unrolling, register renaming, and critical path reduction, have been successful in removing register dependences within applications [1]. Aggressive branch handling techniques, such as branch target insertion, are utilized to allow the execution of multiple branches per cycle [2]. The combination of these optimizations gives the code scheduler more freedom to reorder instructions. Unfortunately, the amount of static instruction reordering may be severely restricted due to dependences between memory instructions. Because memory references often occur on program critical ....

W. W. Hwu and P. P. Chang, "Efficient instruction sequencing with inline target insertion," IEEE Transactions on Computers, Dec. 1992.


Hardware Support for Hiding Cache Latency - Golden (1993)   (13 citations)

....solutions such as a buffer into which future instructions can be prefetched. Much time has been invested in researching the various techniques of instruction prefetching [6]. Highly accurate branch prediction schemes, both static and dynamic, have been developed to make this process effective [7, 20, 12, 16]. A large latency in accessing the data cache presents a more difficult problem. Write buffers can eliminate the bottleneck in storing data to the memory system [6], but the loading of data cannot be effectively buffered in this way because the results are desired immediately. One method of hiding ....

W.-M. Hwu and P. P. Chang, "Efficient instruction sequencing with inline target insertion," IEEE Transactions on Computers, 1992. Accepted for Publication.


Lanalysis: A Performance Analysis Tool For The Impact Compiler - Cho (1996)   Self-citation (Hwu)

....may be inserted into the Hcode which is then reverse translated to C and compiled. The resulting executable will produce a profile database [3] which is merged back into the Hcode representation. In addition to program execution profiling, profile guided code layout and function inline expansion [4] may be performed at the Hcode level. After processing is completed at the Hcode representation level, the code is translated to the Lcode format. Lcode is a machine independent assembly-like representation similar to many load-store RISC instruction sets. The Lcode intermediate format will be ....

W. W. Hwu and P. P. Chang, "Efficient instruction sequencing with inline target insertion," IEEE Transactions on Computers, vol. 41, pp. 1537-51, December 1992.


Efficient Instruction Sequencing with Inline Target Insertion - Hwu, Chang (1990)   (5 citations)  Self-citation (Hwu Chang)

No context found.

W. W. Hwu and P. P. Chang, "Efficient Instruction Sequencing with Inline Target Insertion", Technical Report CSG-103, Center for Reliable and High-Performance Computing, University of Illinois, Urbana-Champaign, 1990.


Three Superblock Scheduling Models for Superscalar and.. - Pohua Chang Nancy (1991)   (6 citations)  Self-citation (Hwu Chang)

....set is a superset of the MIPS R2000 instruction set with additional branching modes [15]. Table 2 shows the instruction latencies. Instructions are issued in order. Read after write hazards are handled by stalling the instruction unit pipeline. The microarchitecture uses a squashing branch scheme [14] and profile based branch prediction. For the base processor, one branch slot is allocated by the compiler for each predicted taken branch. The processor has 64 integer registers and 32 floating-point registers. The superscalar version of this processor fetches multiple instructions into an ....

....simultaneously is called the issue rate. The superscalar processor also contains multiple function units. In this study, unless otherwise specified, every instruction can be executed from every instruction slot. When the issue rate is greater than one, the number of branch slots increases [14]. The superpipelined version of this processor has deeper pipelining for each function unit. If the number of pipeline stages is increased by a factor P, the clock cycle is reduced by approximately the same factor. The latency in clock cycles is longer, but in real time it is the same as the base ....

[Article contains additional citation context not shown here]

W. W. Hwu and P. P. Chang, "Efficient Instruction Sequencing with Inline Target Insertion", Coordinated Science Laboratory Report, UILU-ENG-90-2215, CSG-123, May, 1990.


Three Architectural Models for Compiler-Controlled.. - Chang, Warter.. (1995)   (11 citations)  Self-citation (Hwu Chang)

....are handled by stalling the instruction unit pipeline. The microarchitecture uses a squashing branch scheme [27] and profile based branch prediction. Branch prediction is used to layout the superblocks such that the branches are likely not taken. If the branch is taken, the instruction(s) following the branch is squashed. If the branch is predicted taken, the base processor has one branch delay slot. The ....

Table 3: Instruction latencies.
Function          Latency
integer ALU       1
barrel shifter    1
integer multiply  3
integer divide    25
load              2
store
FP ALU            3
FP conversion     3
FP multiply       4
FP divide         25

....rate. The superscalar processor also contains multiple function units. In this study, unless otherwise specified, we assume uniform function units where every instruction can be executed from every instruction slot. When the issue rate is greater than one, the number of branch slots increases [27]. The superpipelined version of this processor has deeper pipelining for each function unit. If the number of pipeline stages is increased by a factor P, the clock cycle is reduced by approximately the same factor. The latency in clock cycles is longer, but in real time it is the same as the base ....

[Article contains additional citation context not shown here]

W. W. Hwu and P. P. Chang, "Efficient instruction sequencing with inline target insertion," Tech. Rep. CSG-123, Center for Reliable and High-Performance Computing, University of Illinois, Urbana, IL, May 1990.


Comparing Static And Dynamic Code Scheduling for.. - Chang, Chen, Mahlke, Hwu (1991)   (9 citations)  Self-citation (Hwu Chang)

No context found.

W. W. Hwu and Pohua P. Chang, "Efficient Instruction Sequencing with Inline Target Insertion", Coordinated Science Laboratory Report, UILU-ENG-90-2215, CSG-123, May, 1990.


IMPACT: An Architectural Framework for.. - Chang, Mahlke.. (1991)   (115 citations)  Self-citation (Hwu Chang)

....there is at most one store or load per cycle, the performance of a four-issue machine approaches that of a two-issue machine. The IMPACT I C compiler is designed to support multiple branch operations per cycle. We have developed a variant of the squashing branch, called inline target insertion [Hwu 90, Chang 89.1], which allows concurrent execution of branch operations. Furthermore, inline target insertion allows branch operations to be fetched from branch slots and independent of the length of the control unit pipeline, only one program counter needs to be saved in order to return from an ....

W. W. Hwu and P. P. Chang, "Efficient Instruction Sequencing with Inline Target Insertion", Coordinated Science Laboratory Report, UILU-ENG-90-2215, CSG-123, May, 1990.

CiteSeer.IST - Copyright Penn State and NEC