45 citations found.
T. Ball and J. Larus. Optimally profiling and tracing programs. ACM Transactions on Programming Languages and Systems, 16(4):1319--1360, 1994.


This paper is cited in the following contexts:


Online Feedback-Directed Optimization of Java - Arnold, Hind, Ryder (2002)   (3 citations)

....as early as possible is preferable. However, converging on high performance sooner is not worth compromising the level of performance that is eventually reached, or substantially degrading startup performance. 3.2 Intraprocedural Edge Profiles The goal of intraprocedural edge counters [13] is to collect the execution frequencies of intraprocedural control flow edges. Execution frequencies of basic blocks can be easily derived from edge frequencies. Such profile information is useful for a variety of optimizations. It has been used offline in previous work for optimizations such as ....
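The excerpt's remark that basic-block frequencies follow from edge frequencies can be illustrated with a minimal sketch. It assumes a block executes once per traversal of an incoming edge, plus once per procedure invocation for the entry block. The function name, the block labels, and the counts are hypothetical and do not come from the cited papers.

```python
from collections import defaultdict


def block_frequencies(edge_counts, entry, entry_count=1):
    """Derive basic-block execution counts from edge counts.

    A block runs once per traversal of an incoming edge; the entry
    block additionally runs once per invocation of the procedure.
    """
    freq = defaultdict(int)
    freq[entry] += entry_count
    for (_src, dst), count in edge_counts.items():
        freq[dst] += count
    return dict(freq)


# Hypothetical loop: A enters, B branches to C or D,
# E either loops back to B or exits to F.
example_edges = {
    ("A", "B"): 1,
    ("B", "C"): 7,
    ("B", "D"): 3,
    ("C", "E"): 7,
    ("D", "E"): 3,
    ("E", "B"): 9,
    ("E", "F"): 1,
}
example_blocks = block_frequencies(example_edges, entry="A")
```

The reverse derivation is not possible in general, which is one reason edge profiles carry more information than block profiles.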

....subproblems involved in collecting edge profiles for the purpose of optimization: collecting the profiles, and making the profiles available to the client optimizations that use them. These two topics are discussed in the next two subsections, respectively. Collecting Edge Profiles Previous work [13] has shown that careful placement of counters can reduce the overhead of collecting edge profiles. However, one advantage of the instrumentation sampling framework is that it significantly reduces the execution overhead of instrumentation. Therefore, to avoid unnecessary complexity, a simple ....

Thomas Ball and James R. Larus. Optimally profiling and tracing programs. ACM Transactions on Programming Languages and Systems, 16(4):1319--1360, July 1994.


Online Profiling And Feedback-Directed Optimization Of Java - Arnold (2002)   (1 citation)

....can be substantially improved by exploiting invariant runtime values; however, these systems were not fully automatic and relied on programmer directives to identify regions of code to be optimized. There exists a large body of work on collecting profiling information by performing instrumentation [25, 44, 4, 18, 17], as well as fully automatic optimizations based on instrumented profiles [34, 27, 46, 30, 31, 10, 62, 65, 53]. However, this work assumes an execution model where profiles can be collected offline, using a separate training run. Although the resulting speedups are often promising, this approach ....

....optimizations. All 3 of these steps involve overhead and create the potential for degrading performance rather than improving it. Most importantly, the overhead of collecting instrumented profiles is a problem. Overheads in the range of 30%-1,000% above non-instrumented code are not uncommon [46, 17, 18, 27, 26, 4] for collecting the kinds of profiles often used to drive feedback-directed optimizations, and overheads in the range of 10,000% (100 times slower) have been reported [26]. This overhead is one of the main reasons why today's JVMs perform only limited forms of feedback-directed optimizations [8, ....

[Article contains additional citation context not shown here]

Thomas Ball and James R. Larus. Optimally profiling and tracing programs. ACM Transactions on Programming Languages and Systems, 16(4):1319--1360, July 1994.


Profile-Directed Optimization of Event-Based Programs - Saumya (2002)   (1 citation)

....time the handler is invoked, thereby obtaining handler profiles. Profiling is done on one program and, for configurable programs, one program configuration at a time. At present, the event framework is instrumented by hand, but this can easily be automated using well understood techniques [2]. The analysis and optimizations are currently performed offline after the program to be optimized is executed enough times. Online analysis, and potentially optimization, are potential extensions to this work and are discussed in section 5. The profiling algorithm takes the event trace ....

T. Ball and J. R. Larus. Optimally profiling and tracing programs. ACM Transactions on Programming Languages and Systems, 16(4):1319--1360, July 1994.


Efficient Performance Prediction - For Modern Microprocessors

....These tools included ATOM from Digital [SE94], Shade from Sun [CK94], MINT from SGI [V97], and EEL from the University of Wisconsin [LS95]. These tools were not limited to building profile-based performance prediction, but this is one of the tasks they were used for. A study by Ball and Larus [BL94] showed that simple basic block node and edge profiling only added an average of 16% to the runtime of the uninstrumented application. This is a tiny overhead compared with simulation-based approaches, which, at their fastest, are still an order of magnitude slower [WR96]. The analysis phase ....

....The next two subsections evaluate the run time performance of each of these phases independently. The cost of the instrumentation phase is proportional to the total number of dynamic instructions in the program, which places a strong requirement on the instrumentation overhead being small. Ball and Larus [BL94] report that for the SPEC95 benchmarks, their efficient edge-based profiling technique only causes an average 16% overhead over the run time of the uninstrumented program. The fastest simulator has a factor of four to ten times slowdown just to execute the instructions [WR96]. The two simulators ....

T. Ball and J. R. Larus. Optimally profiling and tracing programs. ACM Transactions on Programming Languages and Systems, vol. 16, no. 4, pp. 1319-1360, July 1994.


Dataflow Frequency Analysis based on Whole Program Paths - Scholz, Mehofer (2002)

....flow framework which achieves more accurate results by taking execution history into account by inspecting intervals of k edges instead of treating each edge separately. However, the result is still an approximation of the best solution. So far profiling algorithms were based on edge profiling [1] or profiling of intraprocedural, acyclic paths [2] only. In [2] it has been shown that profiling of acyclic paths can be done efficiently and takes about twice the time of edge profiling. An interprocedural extension of acyclic path profiling [9] results in longer paths, but paths still do not ....

....nodes. The CFG consists of a branching statement inside a loop with two assignments d1 and d2 on edges 2→4 and 3→4, respectively. For sake of simplicity consider the reaching definitions problem [4]. Moreover, let us consider a specific program run r which takes 8 times the left branch [1,2,4] and terminates with the right branch [1,3,4,5]. Hence, we get the frequencies that definition d1 reaches node 2 seven times while definition d2 never reaches node 2, whereas the use of variable x at edge 4→5 is reached by definition d2 only. Recently, approaches have been developed to ....

[Article contains additional citation context not shown here]

T. Ball and J. Larus. Optimally profiling and tracing programs. ACM Transactions on Programming Languages and Systems, 16(4):1319--1360, July 1994.


Profile-Directed Optimization of Event-Based Programs - Rajagopalan, Debray.. (2002)   (1 citation)

....of events. Second, this information is used to target specific handlers in the program for similar profiling to identify predictable sequences of handlers that can be optimized. At present, programs are instrumented by hand, but this can easily be automated using well understood techniques [3]. The remainder of this section elaborates on these profiling steps. Event execution is profiled by instrumenting the event system to create an event trace each time a program is executed. This trace consists of a sequence of entries, where each entry corresponds to an occurrence of the raise or ....

T. Ball and J. R. Larus. Optimally profiling and tracing programs. ACM Transactions on Programming Languages and Systems, 16(4):1319--1360, July 1994.


Automating Selective Dynamic Compilation - Mock (2002)

....only after 2 1.8#10 executions; therefore no overflow check was implemented. Instead of adding a counter to every basic block, a spanning tree algorithm could be used to place a minimal number of counters in the program and thereby mitigate the slowdown from frequency profile collection [BL94]. However, since most of the slowdown is incurred by value profiling, any improvements to frequency profiling will not have a significant impact on the overall performance of an instrumented application. Therefore, only the straightforward counter-per-basic-block approach was implemented in Tumi. ....
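The spanning-tree idea mentioned in this excerpt can be sketched as follows. This is an illustrative simplification, not the algorithm as published: it picks an arbitrary spanning tree of the undirected control flow graph rather than one weighted by expected frequency, and it assumes a virtual exit-to-entry edge so that inflow equals outflow at every block. All function names, block labels, and counts are hypothetical.

```python
def choose_counter_edges(nodes, edges):
    """Split edges into spanning-tree edges and chords (union-find).

    Only the chords need counters; tree edge counts are derivable.
    """
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree, chords = [], []
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            tree.append((u, v))
        else:
            chords.append((u, v))
    return tree, chords


def recover_counts(nodes, edges, chord_counts):
    """Recover uninstrumented edge counts via flow conservation."""
    counts = dict(chord_counts)
    unknown = set(edges) - set(counts)
    while unknown:
        progressed = False
        for n in nodes:
            missing = [e for e in unknown if n in e]
            if len(missing) != 1:
                continue
            e = missing[0]
            inflow = sum(c for (_, v), c in counts.items() if v == n)
            outflow = sum(c for (u, _), c in counts.items() if u == n)
            # The single unknown edge balances inflow and outflow at n.
            counts[e] = inflow - outflow if e[0] == n else outflow - inflow
            unknown.remove(e)
            progressed = True
        if not progressed:
            raise ValueError("uninstrumented edges do not form a forest")
    return counts


# Hypothetical CFG with a virtual exit-to-entry edge ("F", "A").
cfg_nodes = ["A", "B", "C", "D", "E", "F"]
cfg_edges = [("A", "B"), ("B", "C"), ("B", "D"), ("C", "E"),
             ("D", "E"), ("E", "B"), ("E", "F"), ("F", "A")]
tree_edges, chord_edges = choose_counter_edges(cfg_nodes, cfg_edges)
measured = {("D", "E"): 3, ("E", "B"): 9, ("F", "A"): 1}
all_counts = recover_counts(cfg_nodes, cfg_edges, measured)
```

In this example three counters stand in for eight edges. Choosing the tree so that the chords fall on rarely executed edges would reduce the run-time cost further, which this sketch does not attempt.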

Thomas Ball and James R. Larus. Optimally profiling and tracing programs. ACM Transactions on Programming Languages and Systems (TOPLAS), 16(4):1319--1360, 1994.


Optimal and Efficient Speculation-based Partial Redundancy.. - Qiong, Xue (2003)   (1 citation)

....$W_{fl} : E_{fl} \mapsto \mathbb{N}$ (where $\mathbb{N}$ is the set of natural numbers starting from 0). The weight $W_{fl}(u, v)$ attached to the edge $(u, v) \in E_{fl}$ is a nonnegative integer representing the frequency of its execution. The edge profiling information required can be gathered via code instrumentation [5], statistical sampling of the program counter [3], or static program-based heuristics [6, 27]. An edge profile has less runtime overhead to collect than a path profile [7]. The information contained in an edge profile, while less than a path profile, is sufficient to guarantee computationally optimal ....

T. Ball and J. R. Larus. Optimally profiling and tracing programs. ACM Transactions on Programming Languages and Systems, 16(4):1319--1360, July 1994.


Simple and General Statistical Profiling with PCT - Charles Blake Steve (2002)   (1 citation)

....basic blocks instead of function calls. Many implementations of these types of profiler are not robust to improper program exits and are not tolerant of inadequate data in some objects. Many systems have also implemented some form of link-time instrumentation or post-link-time binary rewriting [23, 14, 9, 15, 24, 20]. These address rebuilding issues somewhat and have some weak extensibility. A significant invasion of foreign code may remain, though. The code must be inserted to count executions, or, in more involved cases, log procedure arguments. Recently, a number of researchers have begun investigating the ....

T. Ball and J. R. Larus. Optimally profiling and tracing programs. ACM Transactions on Programming Languages and Systems, 16(4):1319--1360, July 1994.


General-Purpose Architecture Instruction Scheduling Techniques - De Sutter (1998)

....do we need execution frequency data for global scheduling algorithms, code layout interacting with dynamic branch prediction and fetching mechanisms is important as well. Some well known profiling techniques (instrumentation algorithms) and a comparison between these techniques can be found in [6]. The authors conclude that the most efficient profiling algorithm for counting basic block executions is based on instrumentation code placed on edges in the CFG, instead of placing it in basic blocks. Once the profiles have been generated, we can ask ourselves how well program executions are ....

Ball, T., and Larus, J. Optimally profiling and tracing programs. ACM Transactions on Programming Languages and Systems 16, 4 (July 1994), 1319-1360.


Path-Sensitive, Value-Flow Optimizations of Programs - Bodik (1999)   (2 citations)

....For edge profiling, the most efficient technique is sampling the execution of the program, which is a more efficient but less precise technique than instrumentation. Of the three profiles, sampling was used only for edge profiling. Edge profiles. An edge profile with about 16% overhead [BL94]. With a hardware-based sampling approach, edge profiles cost only 1-3% overhead [ABD+97]. Recently, a software-based sampling approach was developed, via transient (removable) instrumentation [TS99] or transient interpretation of native instructions [BDB99]. Their cost is comparable to ....

Thomas Ball and James R. Larus. Optimally profiling and tracing programs. ACM Transactions on Programming Languages and Systems, 16(4):1319--1360, July 1994.


TSF: An Environment for Program Transformations - Mével

....array accesses) to very complex ones (i.e. parallelism and data locality optimizations [5, 1]). Furthermore, these transformations are usually based on information extracted either statically (i.e. data flow and data dependence analysis [3]) or dynamically (i.e. via program instrumentation [2]). Most of the currently available tools (compilers, parallelizers, preprocessors, profilers, etc.) cover part of the needs, but they suffer from a major drawback: they cannot be easily extended. Extensibility allows the user (i.e. an application developer in our case) to add new program transformations ....

Thomas Ball and James R Larus. Optimally profiling and tracing programs. ACM Transactions on Programming Languages and Systems, 16:13


Online Instrumentation and Feedback-Directed Optimization of Java - Arnold (2002)   (2 citations)

....occur when multiple threads simultaneously access a single counter. 5 Example Instrumentation: Edge Counters Our instrumentation infrastructure was designed to allow incorporating a wide variety of instrumentations. The first instrumentation included in our system is intraprocedural edge counters [8]. The goal of edge counters is to collect the execution frequencies of the intraprocedural control flow edges between basic blocks. The execution frequencies of basic blocks can be derived easily from edge counts. Edge counters were chosen as the first instrumentation because they are useful for a ....

....counts. Edge counters were chosen as the first instrumentation because they are useful for a variety of optimizations. They have been used offline in previous work for optimizations such as code reordering [27], instruction scheduling [19], and other classic code optimizations [12]. Previous work [8] has shown that careful placement of counters can reduce the overhead of collecting edge counts. However, one of the main advantages of instrumentation sampling is that it essentially eliminates the need to worry about the execution overhead of instrumentation. Therefore, to avoid unnecessary ....

T. Ball and J. R. Larus. Optimally profiling and tracing programs. ACM Transactions on Programming Languages and Systems, 16(4):1319--1360, July 1994.


A Framework for Reducing the Cost of Instrumented Code - Arnold, Ryder (2001)   (26 citations)

....level of optimization. For these applications the most substantial performance improvements will come from feedback-directed optimizations, where profiling information is used to decide not only what to optimize, but how to optimize. There exists a large body of work on collecting offline profiles [3, 10, 11, 15, 26], as well as optimizations based on offline profiles [6, 16, 17, 19, 20, 27]. Although some systems [5, 9, 21, 22, 32] apply limited forms of online feedback-directed optimizations, most of the offline work mentioned above has not yet been applied in fully automated online systems. The main difficulty in ....

....in applying these optimizations online is that they often rely on instrumenting the code to collect detailed information about program execution, and instrumentation can cause substantial performance degradation. Overheads in the range of 30%-1,000% above non-instrumented code are not uncommon [3, 10, 11, 16, 17, 27], and overheads in the range of 10,000% (100 times slower) have been reported [16]. An online system needs to execute instrumented code for some period of time, prior to performing optimization. The overhead introduced by instrumentation makes this task difficult to perform for several reasons. ....

[Article contains additional citation context not shown here]

T. Ball and J. R. Larus. Optimally profiling and tracing programs. ACM Transactions on Programming Languages and Systems, 16(4):1319--1360, July 1994.


Rapid Profiling via Stratified Sampling - Sastry, Bodik, James (2001)   (8 citations)

....profilers, and hybrid profilers. Smart software profilers: The first group of software profilers instruments the program with profiling instructions. One method for reducing the overhead of executing the additional instructions is to exploit the program structure: Ball-Larus edge profiling [5] and path profiling [6] use program analysis and manage to restrict overheads to 10-30%. Other tricks for reducing the instrumentation overhead include restricting profiling to a subset of instructions [8, 36] and turning off profiling after the profile stabilizes [8]. Despite recent advances, ....

Thomas Ball and James R. Larus. Optimally profiling and tracing programs. ACM Transactions on Programming Languages and Systems, 16(4):1319--1360, July 1994.


An Empirical Study of Tracing Techniques - From Failure Analysis   Self-citation (Tracing)

No context found.

T. Ball and J. R. Larus. Optimally profiling and tracing programs. ACM Transactions on Programming Languages and Systems, 16(4):1319--1360, 1994.


Reducing Coverage Collection Overhead With Disposable.. - Kalyan-Ram..

No context found.

T. Ball and J. Larus. Optimally profiling and tracing programs. ACM Transactions on Programming Languages and Systems, 16(4):1319--1360, 1994.


An Empirical Study of Profiling Strategies for Released.. - Elbaum, Hardojo (2004)

No context found.

T. Ball and J. Larus. Optimally profiling and tracing programs. ACM Transactions on Programming Languages and Systems, 16(4):1319--1360, 1994.


Leveraging Disposable Instrumentation to Reduce Coverage.. - Chilakamarri, Elbaum (2006)

No context found.

T. Ball and J. Larus. Optimally profiling and tracing programs. ACM Transactions on Programming Languages and Systems, 16(4):1319--1360, 1994.


Flux: A Language for Programming High-Performance Servers - Brendan Burns Kevin

No context found.

T. Ball and J. R. Larus. Optimally profiling and tracing programs. ACM Transactions on Programming Languages and Systems, 16(4):1319--1360, July 1994.


DISE: Implementing Application Meta-Features via - Software-Programmable..

No context found.

Thomas Ball and James R. Larus. Optimally profiling and tracing programs. ACM Transactions on Programming Languages and Systems, 16(4):1319--1360, July 1994.


TFP: Time-sensitive, Flow-specific Profiling at Runtime - Nandy, Gao, Ferrante (2003)

No context found.

Thomas Ball and James R. Larus. Optimally profiling and tracing programs. ACM Transactions on Programming Languages and Systems, 16(4):1319--1360, July 1994.


Techniques for Transparent Program Specialization in Dynamic.. - Sastry

No context found.

T. Ball and J. R. Larus. Optimally Profiling and Tracing Programs. ACM Transactions on Programming Languages and Systems, 16(4):1319--1360, July 1994.


Cross-Architecture Performance Predictions for Scientific.. - Marin, Mellor-Crummey (2004)

No context found.

T. Ball and J. R. Larus. Optimally profiling and tracing programs. ACM Transactions on Programming Languages and Systems, 16(4):1319--1360, July 1994.


Checking Program Profiles - Patrick Moseley Saumya (2003)

No context found.

T. Ball and J. R. Larus. Optimally profiling and tracing programs. ACM Transactions on Programming Languages and Systems, 16(4):1319--1360, July 1994.


CiteSeer.IST - Copyright Penn State and NEC