T. M. Conte, B. A. Patel, and J. S. Cox. Using branch handling hardware to support profile-driven optimization. In 27th International Symposium on Microarchitecture, pages 12--21, Nov 1994.


This paper is cited in the following contexts:
Dynamic Optimization through the use of Automatic Runtime.. - Whaley (1999)   (3 citations)  (Correct)

....can minimize the dynamic instruction count along critical paths [71, 72]. It can specialize procedures for common argument values. Using a calling context tree with actual branch percentages, it can optimize branches for the typical case and maximize cache locality by putting related code together [117, 87, 48]. It can utilize the actual execution frequencies to reorder if-elseif constructs to put the most common cases at the top, and compile switch statements into a Huffman tree to minimize average lookup time [63]. For hash tables, it can come up with a near-perfect hash function that is suited to the ....

....at each point in the dynamic region. 2.3 Profile driven optimization. Profile driven optimization is a relatively new field. Some static compilers utilize profile information from prior test runs to perform better optimizations, for example, trace scheduling [79], improving cache locality [87, 117, 48, 128, 127], or traditional optimizations [35, 34, 36, 23]. Profile driven optimizations have shown up in recent commercial products [89, 88, 131]. There has been work on using profile information in dynamic compilers for Scheme [18, 17, 19], Self [61], Cecil [26], and ML [96, 101]. More recently, the ....

Thomas M. Conte, Burzin A. Patel, and J. Stan Cox. Using branch handling hardware to support profile-driven optimization. In Proceedings of the 27th Annual International Symposium on Microarchitecture, pages 12--21, San Jose, California, November 30--December 2, 1994. ACM SIGMICRO and IEEE Computer Society TC-MICRO.


Saving and Restoring Implementation Contexts with.. - Dhodapkar, Smith (2001)   (3 citations)  (Correct)

....implementation context the worst thing that can happen is performance loss. As an extended example, we consider the implementation state contained in branch prediction tables. We assume the branch predictor table contents can be read and written via special implementation-dependent instructions [10] available to the VMM. Whenever the operating system is about to switch to a new context, the VMM can save the branch predictor contents, and load the predictor contents for the new context. 3 Context Switches and Conditional Branch Prediction. Previously, researchers have observed that context ....

T. M. Conte, B. A. Patel, and J. S. Cox, "Using Branch Handling Hardware to Support Profile-Driven Optimization," Proc. of the 27th Intl. Sym. on Microarchitecture, pp. 12-21, Nov. 1994.


Relational Profiling: Enabling Thread-Level Parallelism in.. - Heil, Smith (2000)   (8 citations)  (Correct)

....7 Related work. Special purpose mechanisms have been proposed for many of the profiling operations discussed above. These require special hardware, software, or both. Conte et al. propose two hardware methods [6, 7] for edge profiling. The first samples the values of the branch-target buffer and branch prediction array to derive estimates of the edge profile. The second method improves the accuracy using a small special purpose array, the Profile Buffer, to count taken and not taken branches indexed by PC. ....

T. M. Conte, B. A. Patel, J. S. Cox, "Using Branch Handling Hardware to Support Profile-Driven Optimization," 27th Intl. Symp. on Microarchitecture, pp. 12-21, Nov. 1994.
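
The Profile Buffer mentioned in the context above (a small array that counts taken and not-taken branches indexed by PC) can be illustrated with a minimal software sketch. This is not the hardware design from Conte et al.; the direct-mapped indexing, the buffer size, the spill-on-conflict policy, and every name used here (ProfileBuffer, record, edge_profile) are assumptions made only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    """One buffer slot: a branch PC with its taken/not-taken counters."""
    pc: int
    taken: int = 0
    not_taken: int = 0


class ProfileBuffer:
    """Hypothetical model of a small PC-indexed branch counter array."""

    def __init__(self, num_entries=64):
        self.num_entries = num_entries
        self.entries = [None] * num_entries
        # Assumed policy: counts of an evicted branch are spilled to software.
        self.spilled = {}

    def _flush(self, entry):
        t, n = self.spilled.get(entry.pc, (0, 0))
        self.spilled[entry.pc] = (t + entry.taken, n + entry.not_taken)

    def record(self, pc, taken):
        """Update the counters for one executed conditional branch."""
        idx = pc % self.num_entries  # assumed direct-mapped index
        entry = self.entries[idx]
        if entry is None or entry.pc != pc:
            if entry is not None:
                self._flush(entry)  # conflict: save the old counts first
            entry = Entry(pc)
            self.entries[idx] = entry
        if taken:
            entry.taken += 1
        else:
            entry.not_taken += 1

    def edge_profile(self):
        """Return {pc: (taken, not_taken)} from spilled and resident counts."""
        totals = dict(self.spilled)
        for entry in self.entries:
            if entry is not None:
                t, n = totals.get(entry.pc, (0, 0))
                totals[entry.pc] = (t + entry.taken, n + entry.not_taken)
        return totals
```

Calling record(pc, taken) once per executed branch and reading edge_profile() at the end yields per-branch taken and not-taken counts, which is the edge-profile information the citing paper describes.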


Saving and Restoring Implementation Contexts with.. - Dhodapkar, Smith (2001)   (3 citations)  (Correct)

....implementation context the worst thing that can happen is performance loss. As an extended example, we consider the implementation state contained in branch prediction tables. We assume the branch predictor table contents can be read and written via special implementation dependent instructions [10] available to the VMM. Whenever the operating system is about to switch to a new context, the VMM can save the branch predictor contents, and load the predictor contents for the new context. 3.1 Context Switches Previously, researchers have observed that context switches can affect branch ....

T. M. Conte, B. A. Patel, and J. S. Cox, "Using Branch Handling Hardware to Support Profile-Driven Optimization," Proc. of the 27th Intl. Sym. on Microarchitecture, pp. 12-21, Nov. 1994.


A Programmable Co-processor for Profiling - Zilles, Sohi (2001)   (11 citations)  (Correct)

....is the key to maximizing performance. Without a means to identify bottlenecks and inefficiencies, it is difficult to effectively optimize a program's execution. Program profiling is an important mechanism for observing dynamic program behavior. Many program profiling systems have been proposed [1, 2, 6, 7, 16, 17, 18, 24, 25, 30, 36, 40, 41] and there is some consensus as to the desired attributes of such a system. These attributes can be grouped into four main categories: Usability: Widespread adoption of profiling necessitates that the effort required by the user be minimized and that the technique be widely applicable. ....

T. Conte, B. Patel, and J. Cox. Using branch handling hardware to support profile-driven optimization. In Proc. 27th International Symposium on Microarchitecture, pages 12--21, Nov. 1994.


Ephemeral Instrumentation for Lightweight Program Profiling - Traub, Schechter, Smith (2000)   (14 citations)  (Correct)

....be a significant improvement. This line of thought helped to spur the development of profiling systems like DCPI [1] and Morph [17], which achieve extremely low overheads through the use of statistical sampling, and the recent investigations into hardware based approaches to profile gathering [6, 10]. Though the overhead in these approaches is nearly unnoticeable (Anderson et al. [1] report overhead of 1-3% and Conte et al. [6] report an overhead of 0.4-4.6%), researchers are still investigating the usefulness of statistical and hardware generated profiles in code optimization. This paper ....

.... which achieve extremely low overheads through the use of statistical sampling, and the recent investigations into hardware based approaches to profile gathering [6, 10]. Though the overhead in these approaches is nearly unnoticeable (Anderson et al. [1] report overhead of 1-3% and Conte et al. [6] report an overhead of 0.4-4.6%), researchers are still investigating the usefulness of statistical and hardware generated profiles in code optimization. This paper introduces a new technique for program profiling called ephemeral instrumentation. The goal of this software based instrumentation ....

[Article contains additional citation context not shown here]

T. Conte, B. Patel, and J. S. Cox, "Using Branch Handling Hardware to Support Profile-Driven Optimization," Proc. of the 27th Annual International Symposium on Microarchitecture, San Jose, CA, 1994.


Algorithms and Architecture Support for Pipelining and.. - Yu, Sha, Passos, Ju (1997)   (Correct)

....reduce pipeline stalls, by using branch history or just a guess, to predict the outcome of a condition test. However, these methods require extra hardware for the conditionals under concern and they cannot guarantee 100% accuracy. In the Instruction Level Parallelism research, branch predication [5, 28, 32, 33] converts instructions from different basic blocks, through if-conversion, to straight line code guarded by boolean predicates. It usually requires a significant amount of effort in redesigning instruction formats. Since resource sharing cannot be applied, the final schedule is less efficient than ....

....come from diverse areas and are typical for Multi Dimensional applications. In all experiments except for Kim's MD CdDFGs from [12], we use two unit cycle ALUs and one three cycle multiplier. The methods listed are list scheduling, GPMB approach from [31], Multi Dimensional Branch Predication [5, 28, 32, 33], which utilizes the OPTIMUS algorithm from [22], polynomial time Branch Anticipation algorithm MDBA, and the optimal schedule length utilizing resource sharing but ignoring any sharing prevention cycles. In Table 1, the nodes column lists the total number of operations in each MD CdDFG. The len ....

T. M. Conte, B. A. Patel, and J. S. Cox, "Using Branch Handling Hardware to Support Profile-Driven Optimization", 27th International Symposium on Microarchitecture, Nov. 1994, pp. 12-21.


Branch Anticipation using Loop Pipelining for Resource.. - Yu, Passos, Sha (1996)   (Correct)

....is another way to eliminate pipeline stalls, by using branch history or just a guess to predict the outcome of a condition test. However, these methods do not consider the extra hardware requirement for the conditions under control. In Instruction Level Parallelism research, branch predication [11, 12, 13, 14, 15] has been proposed in order to completely remove conditional branches via conditional execution of individual instructions. This method usually requires a significant amount of effort in redesigning instruction formats and is restricted to small numbers of instructions following the branch. In ....

T. M. Conte, B. A. Patel, and J. S. Cox, "Using Branch Handling Hardware to Support Profile-Driven Optimization", 27th International Symposium on Microarchitecture, Nov. 1994, pp. 12-21.


GURU: A retargetable cfg-based program reorganizer - Ramadan, Gupta (1995)   (Correct)

....the frequencies of all edges incident on the block. Insertion of counters increases the execution time of programs. Ball and Larus [2] present a technique to reduce the number of profiling counters by a factor of two. Machine hardware can also be used to obtain fast profiling. Conte, Patel et al. [5] present a technique that uses branch history hardware to achieve a fast profiling technique. Static profiling techniques use a common set of rules to estimate the probability of the direction of a conditional branch. The rules are heuristics derived from the structure of the program and/or ....

Thomas M. Conte, Burzin A. Patel, and J. Stan Cox. Using branch handling hardware to support profile-driven optimization. Proceedings of the 27th Annual International Symposium on Microarchitecture, pages 12--21, November 1994.


ProfileMe: Hardware Support for Instruction-Level.. - Dean, Hicks.. (1997)   (66 citations)  (Correct)

....from a high L2 cache miss rate. Using the effective addresses and the latency information for loads and stores captured by ProfileMe, we can provide the same information as a CML buffer. Some processors, such as the Intel Pentium, have software readable branch target buffers (BTB). Conte et al. [7] showed how to cheaply estimate a program's edge execution frequencies by periodically reading the contents of the BTB. More recently, Conte et al. [6] proposed additional hardware called a profile buffer, which counts the number of times a branch is taken and not taken. The branch direction ....

T. M. Conte, B. A. Patel, and J. S. Cox. Using branch handling hardware to support profile-driven optimization. In Proc. 27th Annual Intl. Symp. on Microarchitecture, pages 12--21, Nov. 1994.


Commercializing Profile-Driven Optimization - Stan Cox David (1995)   (1 citation)  Self-citation (Conte Cox)   (Correct)

....is required. Hardware based profiling can be used with existing processors by exploiting BIST scan paths or performance monitoring features. The accuracy of the scheme is not as high as probe based or arc based profiling, since the BTB must be sampled. This is discussed in depth in [15]. One reason for the error is that short lived arcs are not well represented in the sampled data. However, they comprise the less frequently executed transitions in the program. The error is quantified in the following section. 2.5 Comparisons. The slowdown of each profiling technique for ....

T. M. Conte, B. A. Patel, and J. S. Cox, "Using branch handling hardware to support profile-driven optimization," in Proceedings of the 27th Annual International Symposium on Microarchitecture, (San Jose, CA), Dec. 1994.


Accurate and Practical Profile-Driven Compilation Using the.. - Thomas Conte (1996)   (18 citations)  Self-citation (Conte)   (Correct)

....commercial acceptance, profiling must be smoothly integrated into the software development cycle. Unfortunately, this requires the reduction or elimination of the need for a sample input suite, as well as more efficient profiling methods. Hardware based profiling was introduced by Conte, et al. [8] as a way to address these problems. This technique uses existing branch prediction hardware to collect profile information at kernel entrances. It has a slowdown of 1.02 on average, and 1.05 as a worst case [8]. The use of hardware based profiling allows software vendors to supply instrumented ....

....efficient profiling methods. Hardware based profiling was introduced by Conte, et al. [8] as a way to address these problems. This technique uses existing branch prediction hardware to collect profile information at kernel entrances. It has a slowdown of 1.02 on average, and 1.05 as a worst case [8]. The use of hardware based profiling allows software vendors to supply instrumented versions of applications to alpha and beta testers. The profiled information based on actual day to day usage can be later retrieved, and final program optimization can be performed. The advantage to hardware ....

[Article contains additional citation context not shown here]

T. M. Conte, B. A. Patel, and J. S. Cox. Using branch handling hardware to support profile-driven optimization. In Proc. 27th Ann. International Symposium on Microarchitecture, San Jose, CA, Nov. 1994.


Optimization of Instruction Fetch Mechanisms for High.. - Conte, Menezes, Mills.. (1995)   (85 citations)  Self-citation (Conte Patel)   (Correct)

....of both code reordering and padtrace is that they require profile information, which is often hard to gather and requires additional steps when compiling code. Hardware based profiling techniques can remove many of these disadvantages, although their use was not studied in this paper. See [21]. An alternative to pad trace is to pad all blocks without regard for trace membership. Padtrace introduces significantly less nops than pad all, as can be seen from Table 4. Table 4: Degree of nops inserted for pad all and padtrace (expressed as percentage of nops vs. original code size) ....

T. M. Conte, B. A. Patel, and J. S. Cox, "Using branch handling hardware to support profile-driven optimization," in Proc. 27th Ann. International Symposium on Microarchitecture, (San Jose, CA), Nov. 1994.


Evolutionary Compilation to Long Instruction Superscalar.. - Thomas Conte (1998)   (2 citations)  Self-citation (Conte)   (Correct)

....code while it remains dormant between executions: flossing the executable, if you will. A key challenge is the collection of profiles all the time. That implies it shouldn't impact performance at all. This rules out software profiling. We introduced techniques to allow branch predictors [1] or performance monitors [2, 3] to collect profile data in real time in hardware, without significant overhead. These hardware buffers are actually performing prediction of events, just like branch and memory dependence predictors. But they are predicting events for future runs of the code, ....

T. M. Conte, B. A. Patel, and J. S. Cox, "Using branch handling hardware to support profile-driven optimization," in Proc. 27th Annual International Symposium on Microarchitecture, (San Jose, CA), Dec. 1994.


Instruction History Management for High-Performance Microprocessors - Bhargava (2003)   (Correct)

No context found.

T. M. Conte, B. A. Patel, and J. S. Cox. Using branch handling hardware to support profile-driven optimization. In 27th International Symposium on Microarchitecture, pages 12--21, Nov 1994.

