• Documents
  • Authors
  • Tables
  • Other Seers ▼
    RefSeer AckSeer CollabSeer SeerSeer
  • Log in
  • Sign up
  • MetaCart

CiteSeerX logo

Advanced Search Include Citations
Advanced Search Include Citations | Disambiguate

Procedure restructuring for ambitious optimization (2002)

by T P Way
Add To MetaCart

Tools

Sorted by:
Results 1 - 3 of 3

Procedure Cloning and Integration for Converting Parallelism From Coarse to Fine Grain

by Won So, Alex Dean - In Proceedings of Seventh Workshop on Interaction between Compilers and Computer Architecture (INTERACT-7 , 2003
"... This paper introduces a method for improving program run-time performance by gathering work in an application and executing it efficiently in an integrated thread. Our methods extend whole-program optimization by expanding the scope of the compiler through a combination of software thread integratio ..."
Abstract - Cited by 3 (0 self) - Add to MetaCart
This paper introduces a method for improving program run-time performance by gathering work in an application and executing it efficiently in an integrated thread. Our methods extend whole-program optimization by expanding the scope of the compiler through a combination of software thread integration and procedure cloning. In each experiment we integrate a frequently executed procedure with itself twice or thrice, creating two clones. Then, based on profile data we select at compile time the fastest version (original or clone) and modify call sites as needed. These techniques convert parallelism at the procedure level to the instruction level, improving performance on ILP uniprocessors. This is quite useful for media-processing applications that feature large amounts of such parallelism. We demonstrate our technique by cloning and integrating three procedures from cjpeg and djpeg at the C source code level, compiling with four compilers for the Itanium EPIC architecture and measuring the performance with the on-chip performance measurement units. Detailed performance analysis shows the primary bottleneck to be the Itanium’s 16K instruction cache, which has limited room for the code expansion introduced by thread integration. For cjpeg, which is not significantly constrained by the i-cache, we find integration consistently improves code generated by all compilers but one, with a mean program speedup of 11.9%.

Function Outlining and Partial Inlining

by Peng Zhao
"... Frequently invoked large functions are common in non-numeric applications. These large functions present challenges to modern compilers not only because they require more time and resources at compilation time, but also because they may prevent optimizations such as function inlining. However, usual ..."
Abstract - Cited by 2 (0 self) - Add to MetaCart
Frequently invoked large functions are common in non-numeric applications. These large functions present challenges to modern compilers not only because they require more time and resources at compilation time, but also because they may prevent optimizations such as function inlining. However, usually it is the case that large portions of the code in a hot function fhost are executed much less frequently than fhost itself. Partial inlining is a natural solution to the problems caused by including cold code segments that are seldom executed into hot functions that are frequently invoked. When applying partial inlining, a compiler outlines cold statements from a hot function fhost. After outlining, fhost becomes smaller and thus can be easily inlined. This paper presents a framework for function outlining and partial inlining that includes several innovations: (1) an abstract-syntax-tree-based analysis and transformation to form cold regions for outlining; (2) a set of ¤exible heuristics to control the aggressiveness of function outlining; (3) several possible function outlining strategies; (4) alias agent, a new technique that overcomes negative side-effects of function outlining. With the proper strategy, partial inlining improves performance by up to 5.75%. A performance study also suggests that partial inlining is not effective on enabling more aggressive inlining. The performance improvement from partial inlining actually comes from better code placement and better code generation. 1

Commentary

by Derek M. Jones, Copyright Derek Jones , 2005
"... The material in the C99 subsections is copyright © ISO. The material in the C90 and C++ sections that is quoted from the respective language standards is copyright © ISO. Credits and permissions for quoted material is given where that material appears. ..."
Abstract - Add to MetaCart
The material in the C99 subsections is copyright © ISO. The material in the C90 and C++ sections that is quoted from the respective language standards is copyright © ISO. Credits and permissions for quoted material is given where that material appears.
The National Science Foundation
  • About CiteSeerX
  • Submit Documents
  • Privacy Policy
  • Help
  • Data
  • Source
  • Contact Us

Developed at and hosted by The College of Information Sciences and Technology

© 2007-2010 The Pennsylvania State University