R. E. Kessler, M. D. Hill, and D. A. Wood. A comparison of trace-sampling techniques for multi-megabyte caches. IEEE Transactions on Computers, 43(6):664--675, 1994.
....constraint. Conversely, the Pareto points in the performance-power space are in general not optimal from the cost perspective. c) When using the performance as a constraint, we determine the cost-power Pareto points. For performance and power estimation purposes we use a time sampling technique [16], which significantly speeds up the simulation process. While this may not be highly accurate compared to full simulation, the fidelity is sufficient to make good incremental decisions guiding the search through the design space. To verify that our heuristic guides the search towards the Pareto curve ....
....architectures, estimates, and prunes the design space, guiding the search towards the most promising designs. We used a set of large real-life multimedia and scientific benchmarks. Compress and Li are from SPEC95, and Vocoder is a GSM voice encoding application. We use a time sampling [16] estimation to guide the walk through the design space, pruning out the designs which are not interesting. The time sampling alternates between on-sampling and off-sampling periods, assuming a ratio of 1:9 between the on and off time intervals. We then use full simulation for the most promising ....
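The on/off alternation described above can be sketched as periodic time sampling. This is a minimal illustration, not code from the cited work: it assumes the on and off intervals are measured in trace references (the citing paper may measure them in time or instructions), and `time_sample` and `sampling_ratio` are hypothetical helper names. The sampling ratio follows the definition quoted elsewhere on this page: references inside the samples divided by references in the whole run.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def time_sample(trace: Sequence[T], on_len: int, off_len: int) -> List[List[T]]:
    """Hypothetical helper: keep on_len consecutive references, skip off_len, repeat."""
    if on_len <= 0 or off_len < 0:
        raise ValueError("on_len must be positive and off_len non-negative")
    period = on_len + off_len
    # Each sample is the first on_len references of one on+off period.
    return [list(trace[start:start + on_len]) for start in range(0, len(trace), period)]


def sampling_ratio(samples: List[List[T]], total_refs: int) -> float:
    """References inside the samples divided by references in the full run."""
    return sum(len(s) for s in samples) / total_refs
```

With `on_len=10` and `off_len=90`, a 1,000-reference trace yields 10 samples and a sampling ratio of 0.1, i.e. a 1:9 on/off split.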
R. Kessler, M. Hill, and D. Wood. A comparison of trace-sampling techniques for multi-megabyte caches. Technical report, University of Wisconsin, 1991.
....as cycle-accurate simulation, modeling all cache and branch predictor interactions is still costly. One viable method for further accelerating sampled simulations is to avoid full warm-up by only modeling those interactions that occur within a given number of instructions prior to the sample [2, 3, 5]. Our technique determines when to engage cache and branch predictor warm-up by exploiting memory reference reuse latencies (MRRL), a measurement of the number of instructions that elapse between successive references to the same address. We have developed software that ....
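The reuse-latency measurement described above can be sketched in a few lines. This is a hedged illustration: the input format (instruction count, address) and the helper names are assumptions, and picking the warm-up length as a percentile of observed reuse latencies is an illustrative policy, not necessarily the exact rule used by the MRRL work.

```python
import math
from typing import Dict, Iterable, List, Tuple


def reuse_latencies(refs: Iterable[Tuple[int, int]]) -> List[int]:
    """refs: (instruction_count, address) pairs in program order.

    For every reference whose address was seen before, record the number of
    instructions elapsed since the previous reference to that address.
    """
    last_seen: Dict[int, int] = {}
    latencies: List[int] = []
    for icount, addr in refs:
        if addr in last_seen:
            latencies.append(icount - last_seen[addr])
        last_seen[addr] = icount
    return latencies


def warmup_length(latencies: List[int], percentile: float = 99.0) -> int:
    """Hypothetical policy: smallest L such that at least `percentile`% of latencies are <= L."""
    if not latencies:
        return 0
    ordered = sorted(latencies)
    k = math.ceil(percentile / 100.0 * len(ordered))
    return ordered[max(k, 1) - 1]
```

For references at instructions 0, 1, 3, 4, 10 to addresses A, B, A, B, A, the reuse latencies are 3, 3 and 7; warming up for 7 instructions before a sample would cover every reuse in this tiny example.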
....in the mean observed IPC. In the experiments conducted for this research, we use a similar multiple-sample simulation regime, prefixing each sample with a warm-up interval and preserving stale cache state between samples. Other heuristics for reducing cold-start bias are studied by Kessler et al. [5]. They consider using half of a sample's references for warm-up purposes; tracking only entries that are known to contain good state; using stale state from the previous sample; and flushing state but estimating how much error this introduces. The warm-up acceleration methods proposed by [2, 5] ....
[Article contains additional citation context not shown here]
R. E. Kessler, M. D. Hill, and D. A. Wood. A Comparison of Trace-Sampling Techniques for Multi-Megabyte Caches. Technical Report 1048, Univ. of Wisconsin-Madison Computer Sciences Dept., September 1991.
....or evaluate a system design [32]: (1) use the traced workload directly to drive a simulation, or (2) create a model from the trace and use the model for either analysis or simulation. For example, trace-driven simulations based on large address traces are often used to evaluate cache designs [45, 42]. But models of how applications traverse their address space have also been proposed, and provide interesting insights into program behavior [71, 72]. 3.1 Why Model. The advantage of using a trace directly is that it is the most real test of the system; the workload reflects a real workload ....
R. E. Kessler, M. D. Hill, and D. A. Wood, "A comparison of trace-sampling techniques for multi-megabyte caches". IEEE Trans. Comput. 43(6), pp. 664--675, Jun 1994.
....even in non-sampled portions of simulation. Unfortunately, this is too expensive: if samples are distributed across the full length of a long program's execution, the cost of moving from the end of one sample to the beginning of the next sample becomes prohibitive. Kessler, Hill, and Wood [16] and Conte, Hirsch, Menezes, and Hwu [4, 5] describe various techniques for reducing cold-start bias for cache and branch-predictor simulation. Unfortunately, these prior techniques for dealing with cold-start bias are heuristics whose accuracy can only be verified experimentally. Haskins and ....
R. E. Kessler, Mark D. Hill, and David A. Wood. A comparison of trace-sampling techniques for multi-megabyte caches. IEEE Transactions on Computers, 43(6):664--75, June 1994.
....Gupta, and Anderson [22], Kaplan, Smaragdakis, and Wilson [14], and Elnozahy [9] examine memory reference trace sampling and present new algorithms for trace reduction and compression. Other work has studied analytic models for estimating cache miss rates during the unprimed portion of the sample [15, 33], or described means for bounding errors by adjusting simulation lengths [21]. The most widely used sampling technique in the processor architecture community is to perform full-detail simulation for a single, large segment of execution, anywhere from tens of millions to billions of instructions ....
R. E. Kessler, Mark D. Hill, and David A. Wood. A comparison of trace-sampling techniques for multi-megabyte caches. Tech. Report 1048, University of Wisconsin-Madison Computer Sciences Department, Sep. 1991.
....be used as well for selecting reduced input data sets. A reference input set and a resembling reduced input set will be situated close to each other in the dimensional space built up by the principal components. Another important research topic that is related to this paper is trace sampling [5, 6, 13, 16]. In trace sampling, several samples are taken from a program execution so that the total number of instructions in the samples is significantly less than the total number of instructions of a complete execution. In order to make viable design decisions based on these sampled traces, a sampled ....
R. E. Kessler, M. D. Hill, and D. A. Wood. A comparison of trace-sampling techniques for multi-megabyte caches. IEEE Transactions on Computers, 43(6):664--675, June 1994.
....third dependent parameter is the sampling ratio: the total number of references within the samples divided by the total number of references in the run. In this paper, we present accuracy results for one setting of sampling parameters, and briefly summarize other results. References [6], [10] and [11] discuss reference trace sampling in more detail. [Figure 10: Estimated and true cache miss rates for sequential and parallel applications (true miss rate vs. miss rate estimated using sampling; applications: MATM, ESPR, TRI, MP3D, CHOL, WATER, LOCUS).] ....
R. E. Kessler, M.D. Hill, and D. A. Wood. A Comparison of Trace-Sampling Techniques for Multi-Megabyte Caches. Technical Report 1048, Univ. of Wisconsin Computer Sciences Department, Sept. 1991.
.... rigorous method of choosing simulation phases will be used in future work [29]. Finally, Table 1 shows the anticipated load on the L2 cache by listing the number of L2 accesses per 1 million instructions given 64KB level-1 instruction and data caches (this metric was proposed by Kessler et al. [20]). 3 Uniform Access Caches. Modern level-two caches no longer employ a single monolithic data array, and instead are subdivided into multiple smaller sub-banks to minimize the access time. In addition, they are typically single-ported, as adding additional physical ports to the SRAM cells incurs a ....
R. Kessler, M. Hill, and D. Wood. A comparison of trace-sampling techniques for multi-megabyte caches. IEEE Transactions on Computers, June 1994.
....execution path. 3.3 The Sampling System. The sampling system is used to obtain a representative, sampled trace out of a full trace. It helps to increase the simulation speed and to reduce the storage capacity required to store the traces. There are several sampling techniques proposed previously [8, 16, 18, 21]: One large sample: a single, large sample from the trace is chosen. Multiple periodic samples: a fixed-size chunk of instructions is sampled, followed by a fixed-size chunk of instructions that is skipped, and so on. The number of chunks collected can be determined according to a scaling factor. ....
R. E. Kessler, M. D. Hill, and D. A. Wood. A comparison of trace-sampling techniques for multi-megabyte caches. In IEEE Transactions on Computers, 1994.
....to analyze or evaluate a system design: (1) use the logged workload directly to drive a simulation, or (2) create a model from the log and use the model for either analysis or simulation. For example, trace-driven simulations based on large address traces are often used to evaluate cache designs [13, 12]. But models of how applications traverse their address space have also been proposed, and provide interesting insights into program behavior [24, 25]. The advantage of using a trace directly is that it is the most real test of the system; the workload reflects a real workload precisely, with ....
R. E. Kessler, M. D. Hill, and D. A. Wood, \A comparison of trace-sampling techniques for multi-megabyte caches". IEEE Trans. Comput. 43(6), pp. 664-675, Jun 1994.
....varying parameters or to reach samples deep in a benchmark's execution. Checkpointing simulation state at the beginning of each sample would be one solution, but separate checkpoints would be required for each desired combination of cache and branch predictor configurations. Current approaches [2, 4] use ad hoc heuristics to reduce warm-up length. MSE is a more formal mathematical approach that determines a minimal length of the warm-up period necessary to conform to a user-specified probability of accurate large-structure initialization. MSE is also flexible, being directly applicable to ....
....chosen samples' representativeness. A potential problem with this approach is that finding configuration-independent metrics for representativeness is difficult. This work does not treat cold-start bias between samples. Other heuristics for reducing cold-start bias are studied by Kessler et al. [4]. They consider using half of a sample's references for warm-up purposes; tracking only entries that are known to contain good state; using stale state from the previous sample; and flushing state but estimating how much error this introduces. In short, several formal techniques exist for ....
R. E. Kessler, Mark D. Hill, and David A. Wood. A Comparison of Trace-Sampling Techniques for Multi-Megabyte Caches. Technical Report 1048, University of Wisconsin-Madison Computer Sciences Department, Sep. 1991.
....to 5 times speedup compared to our simulator without sampling. 2.3 Related Work on Sampling. Sampling was first proposed by Laha et al. [5] to improve the speed of cache simulation, and has subsequently been used in a number of other studies to improve either cache simulation [6, 7, 8] or processor simulation [9, 10]. The main inaccuracies in all these studies stem from loss of state information about the simulated system due to sampling. To reduce these inaccuracies, these studies use a combination of state repair methods, larger sampling periods, and larger warm-up periods. ....
R.E.Kessler, M.D.Hill, and D.A.Wood. A Comparison of Trace-Sampling Techniques for Multi-Megabyte Caches. In IEEE Transactions on Computers Volume C-43, 1994.
....long to stabilise in their miss rate behaviour. A valid study must therefore involve tens of billions of data references, which are extremely expensive to store and use. Thus the question arises of trying to pick representative subsets of these references. A few papers by Hill, Kessler, and Wood [3, 4] suggest two techniques: time sampling and set sampling. The former samples the behaviour of the entire cache during certain stretches of time; the latter samples the behaviour of a few of the sets in the cache for the entire running of the program. Time sampling is complicated by the fact that a ....
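Set sampling, as described above, keeps every reference (over the whole run) that maps to a few chosen sets. The sketch below is a minimal illustration assuming a tiny direct-mapped cache; the block size, set count and helper names are hypothetical, not taken from the cited papers.

```python
from typing import Dict, Iterable, List, Sequence

BLOCK_SIZE = 64  # hypothetical block size in bytes
NUM_SETS = 4     # hypothetical (tiny) number of sets in a direct-mapped cache


def set_index(addr: int) -> int:
    """Map a byte address to its cache set (block number modulo number of sets)."""
    return (addr // BLOCK_SIZE) % NUM_SETS


def set_sample(trace: Sequence[int], sampled_sets: Iterable[int]) -> List[int]:
    """Set sampling: keep every reference, for the entire run, that maps to a chosen set."""
    chosen = set(sampled_sets)
    return [a for a in trace if set_index(a) in chosen]


def miss_ratio(trace: Sequence[int]) -> float:
    """Direct-mapped cache simulation; a set holding no block counts as a miss."""
    resident: Dict[int, int] = {}  # set index -> block number currently cached
    misses = 0
    for a in trace:
        block = a // BLOCK_SIZE
        idx = block % NUM_SETS
        if resident.get(idx) != block:
            misses += 1
            resident[idx] = block
    return misses / len(trace) if trace else 0.0
```

Because sets in such a cache evolve independently, each sampled set sees exactly the references it would see in a full simulation, so no cold-start state has to be repaired; the estimate is only as good as the chosen sets are representative.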
R.E. Kessler, M.D. Hill, and D.A. Wood. A Comparison of Trace-Sampling Techniques for Multi-Megabyte Caches. IEEE Transactions on Computers, 43:6, 1994.
....or data addresses to be collected in a trace buffer during the application's execution. The generated information is later used to drive a simulator of the system under study. Given that an application may execute billions of instructions, sampling [5, 21, 26] is often used to reduce the storage requirements of the traces. [Footnote 1: This work was supported in part by the National Science Foundation under grants NSF Young Investigator Award MIP 9457436, ASC 9612099 and MIP 9619351, DARPA Contract DABT63 95 C0097, NASA Contract NAG 1 613 and gifts from IBM and Intel.] While TDS is an effective methodology in ....
R.E.Kessler, M.D.Hill, and D.A.Wood. A Comparison of Trace-Sampling Techniques for Multi-Megabyte Caches. IEEE Transactions on Computers, C-43:664--675, June 1994.
....expanding on trace data from part of the grid, and we can at least use the currently available trace data from parallel computers to form synthetic trace data for local scheduling systems. In essence, this means that sampling is used to solve the size problem, as has also been done with address traces [32]. More research is required to establish the methodological basis and limitations of this approach. 4 Convergence. 4.1 A Comparison. Scheduling for parallel systems has been studied for a long time, and many schemes have been proposed and evaluated [15]. Scheduling on a grid is ....
R. E. Kessler, M. D. Hill, and D. A. Wood, "A comparison of trace-sampling techniques for multi-megabyte caches". IEEE Trans. Comput. 43(6), pp. 664--675, Jun 1994.
....analyze or evaluate a system design: (1) use the traced workload directly to drive a simulation, or (2) create a model from the trace and use the model for either analysis or simulation. For example, trace-driven simulations based on large address traces are often used to evaluate cache designs [23, 22]. But models of how applications traverse their address space have also been proposed, and provide interesting insights into program behavior [36, 37]. The advantage of using a trace directly is that it is the most real test of the system; the workload reflects a real workload precisely, with ....
R. E. Kessler, M. D. Hill, and D. A. Wood, "A comparison of trace-sampling techniques for multi-megabyte caches". IEEE Trans. Comput. 43(6), pp. 664--675, Jun 1994.
....and simply simulates all instructions preceding the desired sample, just at a lower level of detail. Only the model's caches, branch predictor, and architectural state are updated. Other work has studied analytic models for estimating cache miss rates during the unprimed portion of the sample [25, 64], or described means for bounding errors by adjusting simulation lengths [34]. Iyengar and Trevillyan have derived the R metric for measuring the representativeness of a trace [18], and they generate traces by scaling basic block transition counts and adjusting selected instructions to ....
R. E. Kessler, M. D. Hill, and D. A. Wood. A Comparison of Trace-Sampling Techniques for Multi-Megabyte Caches. Technical Report 1048, Univ. of Wisconsin Computer Sciences Department, Sept. 1991.
....it also demands large amounts of space and time, particularly for large caches and long-running applications. These demands can be greatly reduced by employing sampling techniques at the expense of providing only a statistical estimate of the properties of a full trace. Previous studies [10, 7, 6] contain results for other workloads and caches and discuss the conditions under which sampling may, or may not, be used. [Footnote: This work was supported in part by NSF Grant MIP 9700970 and by a gift from Intel Corporation.] Our interest in using sampling is threefold. First, we are interested in the ....
....traces for Windows NT on the Intel x86 platform. Second, we want to demonstrate the utility of these sampling techniques for architectural studies. Although it has been shown that trace sampling is not very accurate for metrics such as hit rate when simulating large, multi-megabyte caches [6], we want to demonstrate that sampling is useful to assess trends not only for caches but also for other architectural structures whose state depends on the processing of past references. Such techniques permit the testing of a wide range of architectural parameters in a relatively short amount of ....
[Article contains additional citation context not shown here]
R. Kessler, M. D. Hill, and D. A. Wood. A comparison of trace-sampling techniques for multi-megabyte caches. IEEE Transactions on Computers, 43(6):664--675, June 1994.
....and long-running applications. These demands can be greatly reduced by employing sampling techniques at the expense of providing only a statistical estimate of the properties of a full trace. Previous studies contain results for various workloads and caches [Martonosi et al. 93, Laha et al. 88, Kessler et al. 94] and discuss the conditions under which sampling may, or may not, be used. Our interest is in the behavior of commonly used desktop applications. When compared to benchmarks such as SPEC95, these applications have larger working sets, are .... [Footnote: This work was supported in part by NSF Grant MIP 9700970 ....]
....can be measured in a full trace by observing the average live and dead time lengths for each cache line in a cache. In a sampled trace, this probability must be estimated with observations within each sample. This sampled probability is the basis for INITMR, the miss rate estimator described in [Kessler et al. 94] and [Wood et al. 91]. Accurately coping with unknown references is particularly important when sampling for large caches, where the number of unknown references can easily dominate the number of known misses. Very large caches typically incur a very small number of misses, and, hence, ....
[Article contains additional citation context not shown here]
Kessler, R., Hill, M. D., and Wood, D. A. A comparison of trace-sampling techniques for multi-megabyte caches. IEEE Transactions on Computers, 43(6):664--675, June 1994.
....for a large set of cache sizes and sampling techniques. Second, we want to demonstrate the utility of these sampling techniques for architectural studies. Although it has been shown that trace sampling is not very accurate for metrics such as hit rate when simulating large multi-megabyte caches [3], we demonstrate that sampling is useful in assessing trends not only for caches but also for other architectural structures whose state depends on the processing of past references. Two architectural studies are presented here that apply sampling techniques. The first study demonstrates how ....
....likely to change between samples.
- cold: each unknown reference misses
- half: the first half of each sample is used to prime the cache [4]
- stitch: uses the end state of the previous sample as the start state for the current sample
- INITMR: estimates the miss ratio of unknown references [3]
- true sample: starts each sample with the correct cache state
Figure 1. Sampling Techniques (technique: description).
Note that true sample simulates the caches over the full trace and reports the miss ratio observed over the regions that are sampled with the other techniques. It is therefore an unbiased estimator of ....
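The cold, half, stitch and true-sample techniques listed above can be compared on a toy direct-mapped cache. This is a sketch under stated assumptions, not the authors' simulator: block size and set count are hypothetical, samples are given as [start, end) index ranges into the trace, and `half` is assumed to start each sample from an empty cache before priming with the first half. INITMR is left out because this excerpt does not specify its estimator.

```python
from typing import Dict, List, Sequence, Tuple

BLOCK_SIZE = 64   # hypothetical block size in bytes
NUM_SETS = 256    # hypothetical number of sets in a direct-mapped cache


def access(state: Dict[int, int], addr: int) -> bool:
    """Simulate one reference; return True on a miss.

    A set with no recorded block (an "unknown" reference) is treated as a miss.
    """
    block = addr // BLOCK_SIZE
    idx = block % NUM_SETS
    miss = state.get(idx) != block
    state[idx] = block
    return miss


def sampled_miss_ratio(trace: Sequence[int],
                       samples: List[Tuple[int, int]],
                       policy: str) -> float:
    """Estimate the miss ratio over time samples using one state-repair policy."""
    if policy == "true":
        # Simulate the full trace once, then read off the misses inside the samples.
        full: Dict[int, int] = {}
        flags = [access(full, a) for a in trace]
        picked = [flags[i] for s, e in samples for i in range(s, e)]
        return sum(picked) / len(picked)

    state: Dict[int, int] = {}
    misses = counted = 0
    for start, end in samples:
        if policy in ("cold", "half"):
            state = {}                      # flush state at the start of each sample
        elif policy != "stitch":            # "stitch" keeps the previous sample's end state
            raise ValueError(f"unknown policy: {policy}")
        # "half" uses the first half of each sample only to prime the cache.
        measure_from = start + (end - start) // 2 if policy == "half" else start
        for i in range(start, end):
            miss = access(state, trace[i])
            if i >= measure_from:
                misses += miss
                counted += 1
    return misses / counted
```

On the trace `[0, 64, 16384, 0, 64]` with samples `[(0, 2), (3, 5)]`, the unsampled reference to 16384 evicts block 0. True sample reports 0.75, cold and half both report 1.0, and stitch reports 0.5 because it never sees that eviction, illustrating how stale and flushed state bias the estimate in opposite directions.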
R.E. Kessler, et al. A comparison of trace-sampling techniques for multi-megabyte caches. IEEE Trans. on Computers, 43(6):664-675, June 1994.
....or data addresses to be collected in a trace buffer during the application's execution. The generated information is later used to drive a simulator of the system under study. Given that an application may execute billions of instructions, sampling [15, 65, 79] is often used to reduce the storage requirements of the traces. While TDS is an effective methodology in the study of high performance uniprocessor systems, it is not applicable to multiprocessor systems. The reason is that, in the latter systems, it is very important to faithfully model the interleaving of the memory accesses of the different ....
R.E.Kessler, M.D.Hill, and D.A.Wood. A Comparison of Trace-Sampling Techniques for Multi-Megabyte Caches. IEEE Transactions on Computers, C-43:664-675, June 1994.
....on trace data from part of the metasystem, and we can at least use the currently available trace data from parallel computers to form synthetic trace data for machine scheduling systems. In essence, this means that sampling is used to solve the size problem, as has also been done with address traces [39]. More research is required to establish the methodological basis and limitations of this approach. 4 Convergence. 4.1 A Comparison. Scheduling for parallel systems has been studied for a long time, and many schemes have been proposed and evaluated [19]. Scheduling in metasystems is relatively ....
R. E. Kessler, M. D. Hill, and D. A. Wood, "A comparison of trace-sampling techniques for multi-megabyte caches". IEEE Trans. Comput. 43(6), pp. 664--675, Jun 1994.
....and simply simulates all instructions preceding the desired sample, just at a lower level of detail. Only the model's caches, branch predictor, and architectural state are updated. Other work has studied analytic models for estimating cache miss rates during the unprimed portion of the sample [9, 26], or described means for bounding errors by adjusting simulation lengths [13]. Iyengar and Trevillyan have derived the R metric for measuring the representativeness of a trace [7], and they generate traces by scaling basic block transition counts and adjusting selected instructions to optimize the ....
R. E. Kessler, M. D. Hill, and D. A. Wood. A Comparison of Trace-Sampling Techniques for Multi-Megabyte Caches. Tech. Report TR-1048, University of Wisconsin Computer Sciences Department, Sept. 1991.
....and simply simulates all instructions preceding the desired sample, just at a lower level of detail. Only the model's caches, branch predictor, and architectural state are updated. Other work has studied analytic models for estimating cache miss rates during the unprimed portion of the sample [48, 112], or described means for bounding errors by adjusting simulation lengths [62]. Iyengar and Trevillyan have derived the R metric for measuring the representativeness of a trace [37], and they generate traces by scaling basic block transition counts and ....
R. E. Kessler, M. D. Hill, and D. A. Wood. A Comparison of Trace-Sampling Techniques for Multi-Megabyte Caches. Tech. Report TR-1048, University of Wisconsin Computer Sciences Department, Sept. 1991.
.... to other cache organizations by Traiger and Slutz [4] and extended even further for additional cache management policies by Thomson and Smith [5] and Hill and Smith [6]. At the same time, work has been done on improving the performance of functional cache simulation for multi-megabyte caches [7]. Such techniques are based on statistical sampling of caches, first proposed by Laha et al. [8] and Stone [9]. These techniques take 30-40 contiguous stripes or clusters of address references from the trace to produce an input to a simulation. The simulation results based on the sampled data ....
....30-40 contiguous stripes or clusters of address references from the trace to produce an input to a simulation. The simulation results based on the sampled data are only approximations of the simulation results for the entire trace, yet Kessler et al. found that these results are highly accurate [7]. Their accuracy, however, depends on the methods used for repairing the state of the cache at the beginning of each cluster before the simulation is applied. Variations in these repair mechanisms produce different simulation results; this is known as the state repair problem. In this paper, we ....
[Article contains additional citation context not shown here]
R. E. Kessler, M. D. Hill, and D. A. Wood, "A comparison of trace-sampling techniques for multi-megabyte caches," IEEE Trans. Comput., vol. C-43, pp. 664--675, June 1994.
....to complete transactions in the OLTP benchmark. Clearly, a single short simulation run cannot capture the wide spectrum of the commercial workloads' behavior. Time sampling is a well-known technique that may prove valuable to complete an architectural study within a reasonable simulation time [10, 12]. We intend to explore this further in future work. 8 Related Work. Prior work has studied commercial workloads for their architectural and micro-architectural characteristics, and has used them for simulation studies and for performance evaluations. The characterization studies can be classified ....
R. E. Kessler, Mark D. Hill, and David A. Wood. A Comparison of Trace-Sampling Techniques for Multi-Megabyte Caches. IEEE Transactions on Computers, 43(6):664--675, 1994.
....issue even for benchmarks with almost no space variability. For workloads that exhibit this behavior, time sampling approaches are necessary to decrease the probability of reaching incorrect conclusions, while enabling architectural studies to be completed within reasonable simulation time limits [17, 21]. Section 5 discusses using short runs from multiple checkpoints to estimate the average workload performance on a given configuration. 5. Statistical Simulation Methodology. Classical statistics provides a wealth of techniques for coping with variability. In this section, we apply a few of those ....
....5.1. Our methodology can be improved in several directions. Given a fixed simulation budget (time allowed for all simulations), a tradeoff must be made between the length of each simulation and the number of simulations required to maximize the confidence probability (and minimize cold-start bias [17]). If the simulated system configuration has an impact on variability, ANOVA can be performed for different workload/system-configuration combinations. Sampling techniques other than systematic sampling can be used to select representative time samples. These issues are left for future work. 6. ....
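The length-versus-count tradeoff above can be made concrete with a confidence interval over per-run estimates and the classical sample-size formula n = (z * CV / e)^2, where CV is the coefficient of variation and e the target relative error. This is a hedged sketch: it uses a normal approximation rather than Student's t (an assumption that is optimistic for very few runs), and the function names are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev
from typing import Sequence, Tuple


def z_value(confidence: float) -> float:
    """Two-sided standard-normal critical value (about 1.96 for 95%)."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)


def confidence_interval(estimates: Sequence[float],
                        confidence: float = 0.95) -> Tuple[float, float]:
    """Normal-approximation interval for the mean of per-run estimates."""
    m = mean(estimates)
    half_width = z_value(confidence) * stdev(estimates) / math.sqrt(len(estimates))
    return m - half_width, m + half_width


def runs_needed(pilot: Sequence[float], rel_error: float = 0.10,
                confidence: float = 0.95) -> int:
    """Runs needed so the half-width is about rel_error * mean: n = (z * CV / e)^2."""
    cv = stdev(pilot) / mean(pilot)
    return math.ceil((z_value(confidence) * cv / rel_error) ** 2)
```

A pilot of three runs with estimates 1.0, 2.0 and 3.0 has a coefficient of variation of 0.5, so reaching a 10% relative half-width at 95% confidence would take about 97 runs under this approximation; shorter runs buy more of them within the same budget, at the price of more cold-start bias per run.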
[Article contains additional citation context not shown here]
R. E. Kessler, Mark D. Hill, and David A. Wood. A Comparison of Trace-Sampling Techniques for Multi-Megabyte Caches. IEEE Transactions on Computers, 43(6):664--675, 1994.
.... Kessler, Hill, and Wood compared several trace sampling techniques for their ability to meet a 10% sampling goal, i.e., could they estimate a trace's true misses per instruction within 10% relative error, using 10% of the trace, at least 90% of the time, in the presence of multi-megabyte caches [36]. They focused on two basic strategies, set sampling and time sampling, illustrated in Figure 2.1. Their results showed that trace sampling using a set sampling strategy met the 10% sampling goal. However, set sampling is not applicable in all situations, such as for caches that have time-dependent ....
....goal. However, set sampling is not applicable in all situations, such as for caches that have time-dependent behavior (e.g., prefetching) or when structures are shared by many sets (e.g., write buffers). FIGURE 2.1: Sampling as vertical and horizontal time-space slices. This figure (taken from [36]) shows a time-space diagram of a simulation with a very short trace. An observation in set sampling is the cache performance of references that occur in a horizontal slice of this figure. An observation in time sampling is the cache performance of references in a vertical slice. (Figure axes: Time, Cache.) ....
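The 10% sampling goal described above reduces to a simple check over repeated sampled estimates. This is a minimal sketch: it assumes the estimates come from repeated sampled simulations (for example, different sample placements) of a trace whose true misses per instruction is known, and `meets_sampling_goal` and its parameters are hypothetical names.

```python
from typing import Sequence


def meets_sampling_goal(true_mpi: float,
                        estimates: Sequence[float],
                        sampled_fraction: float,
                        rel_error: float = 0.10,
                        max_fraction: float = 0.10,
                        success_rate: float = 0.90) -> bool:
    """Hypothetical check of a 10%-style goal: using at most max_fraction of the
    trace, at least success_rate of the estimates fall within rel_error of true_mpi."""
    if sampled_fraction > max_fraction or not estimates:
        return False
    within = sum(1 for e in estimates if abs(e - true_mpi) <= rel_error * true_mpi)
    return within / len(estimates) >= success_rate
```

With a true value of 2.0, nine estimates of 2.1 and one of 3.0 pass (90% within 10%), while eight estimates of 2.1 and two of 3.0 fail.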
[Article contains additional citation context not shown here]
R. E. Kessler, Mark D. Hill, and David A. Wood, "A Comparison of Trace Sampling Techniques for Multi-Megabyte Caches," in IEEE Transactions on Computers, vol. 43, no. 6, 664-675, June 1994.
No context found.
R. E. Kessler, M. D. Hill, and D. A. Wood. A comparison of trace-sampling techniques for multi-megabyte caches. IEEE Transactions on Computers, 43(6):664--675, 1994.
No context found.
R.E. Kessler, M.D. Hill, and D.A. Wood. A comparison of trace-sampling techniques for multi-megabyte caches. IEEE Transactions on Computers, 43(6):664--675, June 1994.
No context found.
R. E. Kessler, M. D. Hill, and D. A. Wood. A comparison of trace-sampling techniques for multi-megabyte caches. IEEE Transactions on Computers, 43(6):664--675, 1994.
No context found.
R. E. Kessler, M. D. Hill, and D. A. Wood, "A Comparison of Trace-Sampling Techniques for Multi-Megabyte Caches", IEEE Transactions on Computers, Vol. 43, No. 6, June 1994, pp. 664-675.
No context found.
CiteSeer.IST - Copyright Penn State and NEC