D. Callahan, K. Kennedy, and A. Porterfield. Analyzing and visualizing performance of memory hierarchies. In Performance Instrumentation and Visualization, pages 1-26. ACM Press, 1990.
....provide different trade-offs between accuracy, speed, flexibility (i.e., adaptability to different memory configurations), and information provided. Memory simulation techniques are very accurate and flexible, and can provide rich information. They are usually based on trace-driven simulation [11, 9, 17, 20, 6, 13, 10, 3, 16, 23]. However, these techniques are very slow (usually by several orders of magnitude). For instance, the slowdown exhibited by all simulators surveyed in [22] is in the range of 45 to 6250. There are some innovative methods that have been proposed with the ....
K. Kennedy, D. Callahan, and A. Porterfield. Analyzing and visualizing performance of memory hierarchy. In Instrumentation for Visualization. ACM Press, New York, 1990.
.... match between performance model and experiment, as has been demonstrated for pipelined vector processors [11]. Complexities arise when multiple interacting components of an architecture are considered, but even in this case, tractable and predictive architectural models can be constructed, e.g., [12], [13]. For a complete understanding of performance, however, the data transport capacities of the architecture must be compared to the demands for data movement of the program it executes. Starting at the bottom of the memory hierarchy (except for register management), it is possible to develop a ....
D. Callahan, K. Kennedy and A. Porterfield, "Analyzing and visualizing performance of memory hierarchies," Performance Instrumentation and Visualization, R. Koskela and M. Simmons, Eds., pp. 1-26 (1990).
....the simple example of nested loops where the outer loop iterates L times and the inner loop sequentially accesses an array of N 4-byte integers. [Figure 1: Determining Expected Cache Behavior. Panels: a) Cache, b) Small Array, c) Large Array] Sequentially accessing an array that fits in cache (Figure 1b) should produce M cache misses, where M is the number of cache blocks required to hold the array. Accessing an ....
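The expectation quoted in this excerpt (an array that fits in cache produces M misses, one per cache block it occupies) can be checked with a minimal sketch. The excerpt does not state the cache organization, so the following are assumptions made only for illustration: a direct-mapped cache that starts empty, an array aligned to a block boundary, and the hypothetical helper names `blocks_needed` and `simulate_misses`. Only the 4-byte element size and the loop structure come from the excerpt.

```python
from math import ceil

INT_BYTES = 4  # element size stated in the excerpt


def blocks_needed(n_ints, block_bytes):
    """M: cache blocks needed to hold n_ints 4-byte integers.

    Assumes the array starts on a block boundary.
    """
    return ceil(n_ints * INT_BYTES / block_bytes)


def simulate_misses(n_ints, outer_iters, cache_bytes, block_bytes):
    """Count misses for the nested loop on a direct-mapped cache.

    The direct-mapped organization and the initially empty cache are
    assumptions; the excerpt does not specify them.
    """
    n_lines = cache_bytes // block_bytes
    lines = [None] * n_lines  # block number currently held by each cache line
    misses = 0
    for _ in range(outer_iters):      # outer loop: L iterations
        for i in range(n_ints):       # inner loop: sequential array access
            block = (i * INT_BYTES) // block_bytes
            slot = block % n_lines    # direct-mapped placement
            if lines[slot] != block:  # block not resident: count a miss and load it
                lines[slot] = block
                misses += 1
    return misses
```

With illustrative sizes (64 integers, 32-byte blocks, a 1024-byte cache), `blocks_needed(64, 32)` gives M = 8, and `simulate_misses(64, 10, 1024, 32)` also gives 8: every miss occurs on the first pass, and the later outer iterations hit in cache.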
D. Callahan, K. Kennedy, and A. Porterfield, "Analyzing and Visualizing Performance of Memory Hierarchies," Instrumentation for Visualization, ACM Press (1990).
....[Figure 6: C program fragment] can be done at the source code level or machine code level. For our framework we need source-code-level instrumentation. In the past, a variety of different cache profilers were introduced, e.g., MTOOL (Goldberg and Hennessy, 1991), PFC-Sim (Callahan et al., 1990), and CPROF (Lebeck and Wood, 1994). The novelty of our approach is to compute the trace data symbolically at compile time without executing the program. A symbolic tracefile is a constructive description for all possible memory references in chronological order. It is represented as symbolic ....
Callahan, D., K. Kennedy, and A. Porterfield: 1990, Analyzing and Visualizing Performance of Memory Hierarchies, Instrumentation for Visualization. ACM Press.
.... [15, 16]. Memory dependence profiling has been used to aid ILP-enhancing optimizations by allowing the compiler to reorder ambiguous memory references [2, 4]. Profiling has also been used to identify procedures, basic blocks, or source lines with high memory overheads or cache misses [17, 18, 19, 20]. Profiling information specifying the number of cache misses incurred by each access has been proposed to guide the compiler to selectively prefetch data [21, 22] and has been recently used to hand-tune code [20]. 1.2 The IMPACT Compiler: The tool described in this thesis is ....
D. Callahan, K. Kennedy, and A. Porterfield, "Analyzing and visualizing performance of memory hierarchies," in Instrumentation for Visualization, New York: ACM Press, 1990.
....measurements of meaningful categories can provide significant performance tuning assistance even apart from their use in analytic models. 2.4 Related Work: The majority of the tools and metrics devised for performance evaluation and tuning reflect their orientation on the measure-modify paradigm [Callahan et al. 1990; Cybenko et al. 1991; Davis and Hennessy, 1988; Dongarra et al. 1990; Goldberg and Hennessy, 1993; Heath and Etheridge, 1991; Kohn and Williams, 1993; So et al. 1987]. Such tools are very useful in application fine-tuning, but usually do not provide completeness (i.e., they don't measure all ....
David Callahan, Ken Kennedy, and Allan Porterfield, "Analyzing and Visualizing Performance of Memory Hierarchies," In Performance Instrumentation and Visualization, pages 1--26. ACM Press, 1990.
D. Callahan, K. Kennedy, and A. Porterfield. Analyzing and visualizing performance of memory hierarchies. In Performance Instrumentation and Visualization, pages 1--26. ACM Press, 1990.
....Compared to these techniques, the bandwidth-based method is simpler, more accurate, and more widely applicable to different machines and applications. In the past, the modeling of memory hierarchy performance relied on measuring memory latency through machine simulation. Callahan et al. [10] first used a compiler-based approach to analyze and visualize memory hierarchy performance with a memory simulator. Another approach is taken by Goldberg and Hennessy [11], who simulated program execution and measured memory stall time by comparing actual running time with simulation result of ....
D. Callahan, K. Kennedy, and A. Porterfield. Analyzing and visualizing performance of memory hierarchies. In Performance Instrumentation and Visualization, pages 1--26. ACM Press, 1990.
....and estimations of bandwidth requirement or constraint is far easier and simpler than those based on memory latency. As a result, bandwidth-based tuning and prediction are more efficient and accurate than previous techniques for bandwidth-limited applications. Callahan, Kennedy, and Porterfield [4] first used a compiler-based approach to analyze and visualize memory hierarchy performance. They instrumented the code to obtain memory access behavior and examined it with a memory simulator. Gallivan et al. [9] documented the memory performance of programs with different load-store patterns and ....
David Callahan, Ken Kennedy, and Allan Porterfield. Analyzing and Visualizing Performance of Memory Hierarchies. In Performance Instrumentation and Visualization, pages 1--26. ACM Press, 1990.
D. Callahan, K. Kennedy, and A. Porterfield. Analyzing and visualizing performance of memory hierarchies. In Performance Instrumentation and Visualization, pages 1-26. ACM Press, 1990.
David Callahan, Ken Kennedy, and Allan Porterfield. Analyzing and visualizing performance of memory hierarchies. In Performance Instrumentation and Visualization, pages 1--26. ACM Press, 1990.