24 citations found.
W.-M. W. Hwu, R. E. Hank, D. M. Gallagher, S. A. Mahlke, D. M. Lavery, G. E. Haab, J. C. Gyllenhaal, and D. I. August. Compiler technology for future microprocessors. Proceedings of the IEEE, 83(12):1625--1640, December 1995.


This paper is cited in the following contexts:
SUDS: Automatic Parallelization for Raw Processors - Frank (2003)   (Correct)

....loop distribution and critical path reduction share the goal of trying to minimize the time observed to update state visible outside the loop body. Schlansker and Kathail [101] have a critical path reduction algorithm that optimizes critical paths in the context of superblock scheduling [53], a form of trace scheduling [41]. Vijaykumar implemented a critical path reduction algorithm for the multiscalar processor that moves updates in the control flow graph [120]. Steffan et al. have implemented a critical path reduction algorithm based on Lazy Code Motion [63] that moves update ....

W. W. Hwu, R. E. Hank, D. M. Gallagher, S. A. Mahlke, D. M. Lavery, G. E. Haab, J. C. Gyllenhaal, and D. I. August. Compiler technology for future microprocessors. Proceedings of the IEEE, 83(12):1625--1640, December 1995.


A General Compiler Framework for Speculative Multithreading - Bhowmik, Franklin (2002)   (1 citation)  (Correct)

....anywhere within a thread. Sometimes, the spawning is delayed until a particular branch or data dependence gets resolved. Apart from these SpMT compiler work, there has been some notable compiler work for other parallelization models. Some of the notable ones among them are the IMPACT compiler [7], the EARTH McCAT compiler [17] and the XMT [12] compiler. The IMPACT compiler takes sequential programs, and performs a variety of optimizations, including predicated execution, superblock formation, and hyperblock formation [7]. These optimizations are geared for wide issue uniprocessors. The ....

....models. Some of the notable ones among them are the IMPACT compiler [7], the EARTH McCAT compiler [17] and the XMT [12] compiler. The IMPACT compiler takes sequential programs, and performs a variety of optimizations, including predicated execution, superblock formation, and hyperblock formation [7]. These optimizations are geared for wide issue uniprocessors. The focus of our compiler framework, on the other hand, is to exploit thread level parallelism (TLP), which complements instruction level parallelism (ILP). The EARTH multi threaded framework provides simple extensions to the C ....

W. W. Hwu, R. E. Hank, D. M. Gallagher, S. A. Mahlke, D. M. Lavery, G. E. Haab, J. C. Gyllenhaal, and D. I. August. Compiler Technology for Future Microprocessors. Proc. IEEE, 83(12):1625-1640, December 1995.


GCDS: A compiler strategy for trading code size.. - Bodin, Chamski..   (4 citations)  (Correct)

....scheduling: this transformation performs the scheduling of basic blocks or of superblocks [16, 11, 12]; superblocks construction: gathers a set of basic block into a superblock [11]; guard insertion: adds guards to instructions to remove jumps and thus allows scheduling across jumps [10]; software pipeline: generates a modulo scheduling of the loop body. Registers are renamed to achieve low latency. The algorithm used to compute a modulo scheduling is based on the method proposed in [22]; register allocation: This transformation allocates registers either before or after ....

Wen-mei W. Hwu, Richard E. Hank, David M. Gallagher, Scott A. Mahlke, Daniel M. Lavery, Grant E. Haab, John C. Gyllenhaal, and David I. August. Compiler technology for future microprocessors. In Proceedings of the IEEE, volume 83, pages 1625--1639, December 1995.


Scalable Procedure Restructuring for Ambitious Optimization - Way (2000)   (Correct)

....architectures have evolved to incorporate fine grained parallelism where four or more instructions can be executed in parallel [29, 60, 66]. To discover and exploit this instruction level parallelism (ILP) within a program, additional compiler optimization techniques have been developed [47]. The goal of an instruction level parallel architecture is to execute multiple, independent instructions in parallel [66], thereby increasing the rate at which instructions are completed. The two most important types of ILP processors are superscalar processors in which the hardware is ....

....and memory caches. Compiler techniques that benefit instruction level parallel computers attempt to identify dependences between instructions, remove these dependences when possible, and reorder, repackage, or schedule the instructions so that the processor can execute them in parallel [47]. The compiler generally can do a better job of finding a parallel schedule for instructions by considering a larger portion of the program. Naturally occurring boundaries within programs, such as functions, loops and branches, are often limiting factors to discovering more parallelism because ....

W. W. Hwu, R. E. Hank, D. M. Gallagher, S. A. Mahlke, D. M. Lavery, G. E. Haab, J. C. Gyllenhaal, and D. I. August. Compiler technology for future microprocessors. Proceedings of the IEEE, 83(12):1625--1640, Dec. 1995.


OCEANS: Optimising Compilers for Embedded ApplicatioNS - Barreteau, Bodin.. (1998)   (3 citations)  (Correct)

....are currently available within Sea: Register renaming renames local registers in each basic block. Superblock construction merges a set of basic block into a superblock [14]. Guard insertion adds guards to instructions to remove jumps and thus allow scheduling across jumps [13]. Loop unrolling (also available at the high level). Local superblock scheduling [14, 15]. Software pipeline generates a modulo scheduling of the loop body. The implementation is based on the tools PiLo [19] and LoRa [11]. PiLo and LoRa are optimisation kernels based on periodic ....

W. Hwu, R. E. Hank, D. M. Gallagher, S. A. Mahlke, D. M. Lavery, G. E. Haab, J. C. Gyllenhaal, and D. I. August. Compiler Technology for Future Microprocessors. Proceedings of the IEEE, 83(12), Dec. 1995, pp. 1625--1639.


A framework for profile-driven optimization in the IMPACT binary.. - Merten (1999)   (Correct)

....of optimizations that expose fine grained parallelism, or instruction level parallelism (ILP), allowing for multiple, independent instructions from the execution stream to be executed in parallel. These techniques have been shown to lead to significant performance improvements in scalar code [16]. Other algorithms can optimize code for particular values or for particular control flow paths. However, some of these optimizations actually increase the number of instructions executed. At the same time, the dependence height may be reduced, or second order factors, such as instruction cache ....

W. W. Hwu, R. E. Hank, D. M. Gallagher, S. A. Mahlke, D. M. Lavery, G. E. Haab, J. C. Gyllenhaal, and D. I. August, "Compiler technology for future microprocessors," Proceedings of the IEEE, vol. 83, pp. 1625--1640, December 1995.


An overview of the IMPACT x86 binary reoptimization framework - Merten, Thiems (1998)   (Correct)

....makes this technology inapplicable to the stated problem. The first goal of this project is to create a bridge between distributed binary programs and an internal representation of the IMPACT compiler. Doing so enables the application of IMPACT's established compiler technology tools [1] [2] to the reoptimization of binary programs, the primary aim of the project. The Intel x86 instruction set [3] and Microsoft's 32 bit Windows operating systems (Windows 95 and Windows NT) are well established standards in the market. A huge body of binary distributed commercial software exists for ....

W. W. Hwu et al., "Compiler Technology for Future Microprocessors" in Proceedings of the IEEE, Vol. 83, No. 12, December 1995, pp. 1625-1640.


Optimization and Executable Regeneration in the Impact Binary.. - Thiems (1998)   (2 citations)  (Correct)

....reoptimization framework is to apply compiler technology to programs starting from binary form and thus to reoptimize the programs for a specific processor version. In particular, the proposed framework takes advantage of the established compiler technology tools of the IMPACT compiler system [2] [3]. The Intel x86 instruction set [4] and Microsoft's 32 bit Windows operating systems (Windows 95 and Windows NT) are well established standards in the market. A large body of binary distributed commercial software exists for this platform. Furthermore, the platform continues to be advanced with ....

....operand position(s) in which the operand can be found and the expected type of the operand. Each operand is thereby given a number to identify it independent of the Mcode source or destination operand number. In the examples, operand 2 comes from Mcode src[1] while operand 3 is found in Mcode dest[3]. The remaining fields in each entry specify the binary format of the machine instruction. The opcode field specifies the base binary machine opcode, while the s and w fields specify bit positions of that opcode that may also be set depending on the size of various operands. Note that the ....

W. W. Hwu et al., "Compiler Technology for Future Microprocessors" in Proceedings of the IEEE, Vol. 83, No. 12, December 1995, pp. 1625-1640.


A Linker for Effective Whole-program Optimizations - Cilio, Corporaal (1999)   (1 citation)  (Correct)

....before scheduling and register allocation. 3 Optimizations using high level information. A number of low level transformations, particularly those increasing the instruction level parallelism of the program, are more effective when the information computed by source level analyses is available [7]. Our proposal is to compute the required information at the point in compilation which is most suitable for the analysis, and to maintain this information throughout compilation until it is needed by some code transformation pass. Dependence information is one of them. When reordering machine ....

Wen-mei W. Hwu et al. Compiler technology for future microprocessors. Proceedings of the IEEE, 83(12):1625--1640, December 1995.


OCEANS: Optimising Compilers for Embedded ApplicatioNS - Barreteau, Bodin.. (1998)   (3 citations)  (Correct)

....of code, then according to performance or size criteria one of the solutions found is chosen and propagated to the low level program representation using the rebuild() method. The optimisations currently available within Sea are: register renaming, superblock construction [12], guard insertion [11], loop unrolling (also available at the high level), local superblock scheduling [12], software pipeline, and register allocation. The implementation of software pipeline is based on the tools PiLo [17] and LoRa [9], which generate a modulo scheduling of the loop body. 5 Integration 5.1 The ....

W. Hwu, R. E. Hank, D. M. Gallagher, S. A. Mahlke, D. M. Lavery, G. E. Haab, J. C. Gyllenhaal, and D. I. August. Compiler Technology for Future Microprocessors. Proceedings of the IEEE, 83(12), Dec. 1995, pp. 1625--1639.


Data Prefetch Mechanisms For Accelerating Symbolic And Numeric.. - Mehrotra (1996)   (10 citations)  (Correct)

....generated in the instruction stream by branches, compilers use predicated execution to reduce branches from code, by using a hardware conditional move instruction. Most current CPUs now implement a conditional move instruction, including the newly announced version of Intel's Pentium. See [41] for a recent overview of important issues in compilation for instruction level parallelism. We will not say any more about the branch prediction and instruction prefetching problem in this dissertation. Note, however, that the hardware requirements are similar for instructions and data ....

W. W. Hwu, R. E. Hank, D. M. Gallagher, S. A. Mahlke, D. M. Lavery, G. E. Haab, J. C. Gyllenhaal, and D. I. August, "Compiler Technology for Future Microprocessors," Proceedings of the IEEE, vol. 83, pp. 1625--1640, December 1995.


Code Coverage and Input Variability: Effects on Architecture.. - Hunter, Hwu (2002)   Self-citation (Hwu)   (Correct)

....[10] [11] [12] [13] and [14]. The five kernels comprise the EEMBC telecommunications suite used for performance benchmarking of embedded and DSP processors. 2.2 Code generation Code generation and performance results presented in this paper were obtained using the IMPACT compiler framework [17]. Discussions will refer to two types of code: CLO (Classically optimized) code, to which only traditional compiler optimizations have been applied; and ILPO code, which has been aggressively control transformed to enhance instruction-level parallelism. ILPO code has been formed into superblocks ....

W. Hwu et al., "Compiler technology for future microprocessors," Proceedings of the IEEE, vol. 83, pp. 1625--1640, Dec. 1995.


The Impact SC140 Code Generator - Shannon (2002)   Self-citation (Hwu)   (Correct)

....the desired level of performance on a compiler, instead of a programmer. This thesis documents the adaptation of an advanced, general purpose, instruction-level parallelism (ILP) uncovering compiler to generate code for an embedded digital signal processing (DSP) core. The IMPACT compiler [1] offers many advanced ILP features in a research inspired infrastructure. The StarCore SC140 is a wide issue DSP core capable of capitalizing on large amounts of parallelism as exposed by the compiler. Creating a code generator that targets the SC140 presented many unique challenges due to the ....

W. W. Hwu, R. E. Hank, D. M. Gallagher, S. A. Mahlke, D. M. Lavery, G. E. Haab, J. C. Gyllenhaal, and D. I. August, "Compiler technology for future microprocessors," Proceedings of the IEEE, vol. 83, pp. 1625--1640, December 1995.


Enhancing Loop Buffering of Media and Telecommunications.. - Sias, Hunter, Hwu (2001)   (2 citations)  Self-citation (Hwu)   (Correct)

....[Figure 1: Nested loop transformations. (a) Loop peeling; (b) Loop collapsing] ....dependence height. This places the following transformations, implemented in the context of the IMPACT compiler framework [12], in the general realm of instruction level parallel (ILP) transformations. As previously stated, the use of a loop buffering technique requires removal of all internal control flow from the loop to be buffered. If-conversion [7] converts any acyclic region of control flow into an equivalent ....

W. W. Hwu, R. E. Hank, D. M. Gallagher, et al., "Compiler technology for future microprocessors," Proceedings of the IEEE, vol. 83, pp. 1625--


Modulo Schedule Buffers - Merten, Hwu (2001)   (1 citation)  Self-citation (Hwu)   (Correct)

....a 16 bit stage tag. Each buffer would contain 32 entries for a total of 192 bytes. Therefore, the total system could require 1536 bytes of storage plus a handful of registers for the control unit. 3. Experimental Results In order to explore the effectiveness of our system, the IMPACT Compiler [17] was enhanced to generate code for kernel only modulo scheduling and for the Modulo Schedule Buffers using Iterative Modulo Scheduling [6]. Twelve MediaBench [18] programs, a MP3 player, and a GSM codec [19] were emulated to verify the correctness of our system while examining loop ....

W. W. Hwu, R. E. Hank, D. M. Gallagher, S. A. Mahlke, D. M. Lavery, G. E. Haab, J. C. Gyllenhaal, and D. I. August, "Compiler technology for future microprocessors," Proc. of the IEEE, vol. 83, pp. 1625--


Automatic Formal Verification for Scheduled VLIW Code - Xiushan Feng Xsfeng (2002)   (1 citation)  (Correct)

No context found.

W.-M. W. Hwu, R. E. Hank, D. M. Gallagher, S. A. Mahlke, D. M. Lavery, G. E. Haab, J. C. Gyllenhaal, and D. I. August. Compiler technology for future microprocessors. Proceedings of the IEEE, 83(12):1625--1640, December 1995.


Automatic Formal Verification For Scheduled..   (Correct)

No context found.

Wen-Mei W. Hwu, Richard E. Hank, David M. Gallagher, Scott A. Mahlke, Daniel M. Lavery, Grant E. Haab, John C. Gyllenhaal, and David I. August. Compiler technology for future microprocessors. Proceedings of the IEEE, 83(12):1625-1640, December 1995.


DRESC: A Retargetable Compiler for Coarse-Grained.. - Mei, Vernalde.. (2002)   (1 citation)  (Correct)

No context found.

W. Hwu, R. Hank, D. Gallagher, S. Mahlke, D. Lavery, G. Haab, J. Gyllenhaal, and D. August, "Compiler technology for future microprocessors," Proceedings of the IEEE, vol. 83, pp. 1625-- 1640, Dec. 1995.


Compiling for Coarse-Grained Adaptable Architectures - Ebeling (2002)   (Correct)

No context found.

W. Hwu, R. Hank, D. Gallagher, S. Mahlke, D. Lavery, G. Haab, J. Gyllenhaal, and D. August. Compiler Technology for Future Microprocessors, Proceedings of the IEEE, December 1995, pp. 1625-1640.


The Program Optimization Spectrum - Bodik   (Correct)

No context found.

W.W. Hwu et al. Compiler technology for future microprocessors. IEEE, 83:1625--1640, 1995.


Worst-Case Execution Time Analysis for Optimized Code - Engblom (1998)   (3 citations)  (Correct)

No context found.

W. W. Hwu, R. E. Hank, D. M. Gallagher, S. A. Mahlke, D. M. Lavery, G. E. Haab, J. C. Gyllenhaal, and D. I. August. Compiler technology for future microprocessors. Proceedings of the IEEE, 83(12):1625--1640, December 1995.


Static Instruction Scheduling For Dynamic Issue Processors - Silvera (1997)   (Correct)

No context found.

W. W. Hwu, R. E. Hank, D. M. Gallagher, S. A. Mahlke, D. M. Lavery, G. E. Haab, J. C. Gyllenhaal, and D. I. August. Compiler technology for future microprocessors. Proceedings of the IEEE, 83(12):1625--1640, December 1995.


CiteSeer.IST - Copyright Penn State and NEC