Uhlig, R., Nagle, D., Mudge, T. and Sechrest, S. Trap-driven simulation with Tapeworm II. In Proceedings of the Sixth International Conference on Architectural Support for Programming Languages and Operating Systems, San Jose, California, ACM Press (SIGARCH), 132-144, 1994.
....such an ideal trace collection. Most current trace collection platforms are limited to collecting traces from a single application, generally only user instructions in this application [15], while it was shown that the usage of such incomplete traces leads to significantly misleading conclusions [1, 16, 17]. Another concern for micro architecture studies is the performance of trace collection. Among the numerous trace collection platforms available [15], none allows a slowdown lower than 10 for simply collecting addresses for memory hierarchy simulations. For studies like value prediction ....
R. Uhlig, D. Nagle, T. Mudge, and S. Sechrest. Trap-driven simulation with Tapeworm II. In Proceedings of the Sixth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-VI), pages 132--144, San Jose, California, October 1994.
....been applied to a uniprocessor operating system. However, these are hardly the right tools to capture complete traces of a multiprogrammed parallel workload that includes applications, operating system and several daemons. A second type of software based systems are trap driven simulation systems [20]. In this approach, simulations are driven by kernel traps. These traps allow the simulation of a cache as the kernel executes. Indeed, memory traps are set on addresses that are currently not in the simulated cache. When that address is accessed, the kernel traps. After trapping, the kernel ....
....TLBs can be simulated in a similar way. With this approach, both operating system and application effects are considered. However, a major disadvantage of this scheme is that it cannot generate as much information as systems based on traces. Examples of this approach are Uhlig et al.'s Tapeworm II [20] and Talluri's TLB simulator [17]. Finally, the last type of software based system is exemplified by SimOS [15]. In SimOS, the hardware of a machine is simulated with enough detail to run an entire operating system. On top of this operating system, we can run applications. With this system, we ....
R. Uhlig, D. Nagle, T. Mudge, and S. Sechrest. Trap-Driven Simulation with Tapeworm II. In Proceedings of the 6th International Conference on Architectural Support for Programming Languages and Operating Systems, pages 37--47, October 1994.
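As an illustration of the trap driven approach described in the excerpts above (traps are set on addresses absent from the simulated cache, and simulator code runs only when a trap fires), a minimal Python sketch follows. It is a toy model, not Tapeworm II's implementation: the class `TrapDrivenCacheSim`, its parameters, and the direct-mapped organization are assumptions made for illustration, and a dictionary lookup stands in for the hardware trap mechanism.

```python
class TrapDrivenCacheSim:
    """Toy model: a 'trap' is armed on every memory line absent from the simulated cache."""

    def __init__(self, num_sets, line_size):
        self.num_sets = num_sets    # hypothetical direct-mapped cache: one line per set
        self.line_size = line_size  # bytes per cache line
        self.resident = {}          # set index -> line number held in the simulated cache
        self.misses = 0             # number of times the trap handler ran

    def _trap_armed(self, line):
        # Stand-in for the hardware check that real systems get for free
        # (for example via ECC bits or TLB valid bits).
        return self.resident.get(line % self.num_sets) != line

    def _handle_trap(self, line):
        # Simulator code runs only here: count the miss and replace the victim.
        # Overwriting the set entry clears the trap on the new line and
        # implicitly re-arms it on the evicted one.
        self.misses += 1
        self.resident[line % self.num_sets] = line

    def access(self, address):
        line = address // self.line_size
        if self._trap_armed(line):
            self._handle_trap(line)
            return True   # trapped: simulated miss
        return False      # no trap: the reference costs the simulator nothing


sim = TrapDrivenCacheSim(num_sets=4, line_size=16)
trace = [0, 4, 64, 0, 16, 20]
trapped = [sim.access(a) for a in trace]
print(trapped, sim.misses)
```

In this sketch, references that hit in the simulated cache never reach `_handle_trap`, which mirrors the property the excerpts attribute to trap driven simulation: simulation cost scales with the number of misses rather than with the number of references.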
....that AVM is required if applications are to achieve any consistent degree of flexibility. 2.1 Examples Fine grain monitoring. AVM (by necessity) gives very precise information about and control over the TLB. This information can be used to derive working sets [18] or to trace address streams [22]. Accurate in core information. AVM systems have total control over virtual memory mappings. Their accurate knowledge of which pages are resident in memory can be useful to many types of applications. For example, a scientific program manipulating large matrices could work on those pieces that ....
Richard Uhlig, David Nagle, Trevor Mudge, and Stuart Sechrest. Trap-driven simulation with Tapeworm II. In Proceedings of the Sixth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-VI), pages 132--144, October 1994.
....the latter approach leads to more efficient simulation. Finally, we examine trap driven simulation. In this approach, traps are set on memory locations by making use of error correction codes and the simulator takes control only during the traps. This approach has been used in WWT [8] and Tapeworm [9]. The advantage of trap driven simulation is that memory reference events that do not miss in the simulated memory hierarchy do not cause a trap. As a result, the simulation is faster. However, since this approach is heavily dependent on the underlying architecture, it cannot be used ....
R. Uhlig, D. Nagle, T. Mudge, and S. Sechrest. "Trap-driven simulation with Tapeworm II," 6th Int. Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS-VI), Oct. 1994, pp. 132-144.
....SPEC programs were compiled to use dynamic linking. SPEC benchmarks are normally linked statically, but most real-world programs on Solaris use dynamic linking to libraries. 9.2 Methodology We evaluate the performance of a common mask TLB using trap driven simulation [17] implemented in foxtrot [15], a Solaris 2.1 based operating system that counts the number of user TLB misses for a workload. Our simulation environment does not include kernel TLB misses but includes the effect of context switches in the multi programmed workloads. Kernel TLB misses will ....
R. Uhlig, D. Nagle, T. Mudge, and S. Sechrest, "Trap-driven Simulation with Tapeworm II," 6th Symposium on Architectural Support for Programming Languages and Operating Systems (ASPLOS-VI), October 1994, pp. 132-144.
.... and OS workloads involved microcode or different forms of hardware monitoring or modifications [1, 13, 30, 41]. Recently, more flexible techniques have been developed that rely only on the manipulation of ECC bits in the host memory and clever modifications to the host operating system [32, 42]. In general, these techniques are unwieldy and inflexible. Instrumenting the program binary is a common software solution to simulation. However, this approach tends to sharply constrain the class of programs that can be studied, in particular the programming language and/or compiler being used. ....
R. Uhlig, D. Nagle, T. Mudge, and S. Sechrest. Trap--driven Simulation with Tapeworm II. In Proceedings of ASPLOS--VI, pages 132--144, October 1994.
....state bits for SVM on the CM5; they used ECC bits to cause faults, however, this would still require emulating writes, and the Alpha 250 has imprecise exceptions on data parity errors, making use of parity difficult or impossible. Similar techniques have been used for trace production as well [78]. 2 The IBM 801 used a similar scheme to manage transactions on units of less than a page in their case, for each 128 byte line [17] 126 Table 8.2: Page fault latencies for Eager Fullpage Fetch from remote memory. Latencies are arrival times of subpage and rest of page. Improvement ....
Richard Uhlig, David Nagle, Trevor Mudge, and Stuart Sechrest. Trap-driven simulation with Tapeworm II. In Proc. of the 6th Int. Conf. on Arch. Support for Prog. Languages and Operating Systems, October 1994.
....a similar optimization more cleanly, using the OM liveness analysis to detect, and save, caller save registers used in the simulator routines [21]. However, ATOM still incurs unnecessary procedure linkage overhead in the no action cases. A recent alternative technique, trap driven simulation [17, 25], optimizes no action cases to their logical extreme. Trap driven simulators exploit the characteristics of the simulation platform to implement effective address calculation and lookup (steps 1 and 2) in hardware. References requiring no action run at full hardware speed; other references cause ....
....and hardware support that is not readily available on most machines. Generality is lacking because current trap driven simulators do not simulate arbitrary memory systems: the Wisconsin Wind Tunnel does not simulate stack references [17] while Tapeworm II does not simulate any data references [25]. Furthermore, the overhead of memory exceptions can overwhelm the benefits of free lookups for simulations with non negligible miss ratios. The active memory abstraction described in detail in the next section combines the efficiency of trap driven simulation with the generality and ....
Richard Uhlig, David Nagle, Trevor Mudge, and Stuart Sechrest. Trap-Driven Simulation with Tapeworm II. In Proceedings of the Sixth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS VI), pages 132--144, October 1994.
....a similar optimization more cleanly, using the OM liveness analysis to detect, and save, caller save registers used in the simulator routines [71]. However, ATOM still incurs unnecessary procedure linkage overhead in the no action cases. A recent alternative technique, trap driven simulation [60,78], optimizes no action cases to their logical extreme. Trap driven simulators exploit the characteristics of the simulation platform to implement effective address calculation and lookup (steps 1 and 2) in .... [Figure 2: On The Fly Simulator]
.... exploit the characteristics of the simulation platform to implement effective address calculation and lookup (steps 1 and 2) in hardware using error correcting code (ECC) bits [60] or valid bits in the TLB [78]. References requiring no action run at full hardware speed; other references cause memory system exceptions that invoke simulation software. By executing most references without software intervention, these simulators potentially perform much better than other simulation systems. Unfortunately, ....
Richard Uhlig, David Nagle, Trevor Mudge, and Stuart Sechrest. Trap-Driven Simulation with Tapeworm II. In Proceedings of the Sixth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS VI), pages 132--144, October 1994.
....some perturbation. Indeed, 8 out of 64 general purpose registers are reserved for tracing in order to minimize memory references. Furthermore, tracing slows down execution by about 10 times. A related software scheme that has been implemented in uniprocessors is called trap driven simulation [21, 28]. In this approach, simulations are not driven by traces; they are driven by kernel traps. These traps allow the simulation of a cache as the kernel executes. Indeed, memory traps are set on addresses that are currently not in the simulated cache. When that address is accessed, the kernel traps. ....
....In addition, both operating system and application effects are considered. However, there is overhead involved in the extra instructions executed. Furthermore, this approach cannot generate as much information as trace driven simulations. Examples of this approach are Uhlig et al.'s Tapeworm II [21, 28] and Talluri's [25] TLB simulator. Finally, there are many software based systems that only consider application traces and ignore the operating system. Many of them run on uniprocessors and may either trace a multiprocessor program (for example [13]) or a uniprocessor program (for example [9] ....
R. Uhlig, D. Nagle, T. Mudge, and S. Sechrest. Trap-Driven Simulation with Tapeworm II. In Proceedings of the 6th International Conference on Architectural Support for Programming Languages and Operating Systems, pages 37--47, October 1994.
....with 384MB of main memory. We are unable to report TLB simulation numbers for databases as we could not get access to a commercial database server that would run our modified operating system. 9.2 Methodology We evaluate the performance of a common mask TLB using trap driven simulation [17] implemented in foxtrot [15], a Solaris 2.1 based operating system that counts the number of user TLB misses for a workload. Our simulation environment does not include kernel TLB misses, but includes the effect of context switches in the multi programmed workloads. Kernel TLB misses will ....
R. Uhlig, D. Nagle, T. Mudge, and S. Sechrest, "Trap-driven Simulation with Tapeworm II", 6th Symposium on Architectural Support for Programming Languages and Operating Systems (ASPLOS-VI), October 1994, pp. 132-144.
....a similar optimization more cleanly, using the OM liveness analysis to detect, and save, caller save registers used in the simulator routines [29]. However, ATOM still incurs unnecessary procedure linkage overhead in the no action cases. A recent alternative technique, trap driven simulation [23,34], optimizes no action cases to their logical extreme. Trap driven simulators exploit the characteristics of the simulation platform to implement effective address calculation and lookup (steps 1 and 2) in hardware using error correcting code (ECC) bits [23] or valid bits in the TLB [19] ....
....system and hardware support that is not readily available on most machines. Generality is lacking because current trap driven simulators do not simulate arbitrary memory systems: the Wisconsin Wind Tunnel [23] does not simulate stack references because of SPARC register windows, while Tapeworm II [34] does not simulate any data references because of write buffers on the DECstation. Furthermore, as we show in Section 5, the overhead of memory exceptions (roughly 250 cycles [34,33,22] on well tuned .... [Figure 2: On The Fly Simulator]
Richard Uhlig, David Nagle, Trevor Mudge, and Stuart Sechrest. Trap-Driven Simulation with Tapeworm II. In Proceedings of the Sixth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS VI), pages 132--144, October 1994.
....gprof [4] and mprof [20] profilers provide information about processor usage and dynamic memory allocation, respectively. Memory hierarchy simulators can allow hardware designers to evaluate designs on real programs or permit programmers to take advantage of specific architectural features (e.g. [15, 18, 19]) Since there is no reason to expect hardware developments to cease or programs to become significantly simpler than they currently are, it is reasonable to expect a continuing need for new kinds of program analysis tools. Some dynamic analysis systems are limited in scope. For example, gprof ....
UHLIG, R., NAGLE, D., MUDGE, T., AND SECHREST, S. Trap-driven simulation with Tapeworm II. In Proceedings of the Sixth International Conference on Architectural Support for Programming Languages and Operating Systems (San Jose, California, 1994), pp. 132--144.
....state bits for SVM on the CM5; they used ECC bits to cause faults, however, this would still require emulating writes, and the Alpha 250 has imprecise exceptions on data parity errors, making use of parity difficult or impossible. Similar techniques have been used for trace production as well [22]. 2 The IBM 801 used a similar scheme to manage transactions on units of less than a page in their case, for each 128 byte line [3] 3 We have optimized the performance of global memory operations along the lines described in [21] hence our latencies are slightly better than those reported ....
Richard Uhlig, David Nagle, Trevor Mudge, and Stuart Sechrest. Trap-driven simulation with Tapeworm II. In Proc. of the 6th Int. Conf. on Arch. Support for Prog. Languages and Operating Systems, October 1994.
Uhlig, R., Nagle, D., Mudge, T. and Sechrest, S. Trap-driven simulation with Tapeworm II. In Proceedings of the Sixth International Conference on Architectural Support for Programming Languages and Operating Systems, San Jose, California, ACM Press (SIGARCH), 132-144, 1994.
....the difficulty and expensive cost of the experimental methodology restrains designers from doing it. To overcome this problem, an efficient method of evaluating the system performance under a multi tasking environment has been proposed by Uhlig, et al. which is termed trap driven simulation [Uhlig94a, Uhlig94b]. In this study, we extended this new method to a different but even more popular hardware architecture and collected some interesting results. In order to emphasize the multi tasking environment, we incorporated the operating system (OS) because the OS is primarily responsible for managing ....
....we must be able to both monitor OS activities and keep the system functioning undisturbed (not stalled) as much as possible. A limited sized buffer and, therefore, the necessity of frequent system stalls inevitably changes the system behavior. To overcome these shortcomings, Uhlig et al. [Uhlig94b] developed a trap driven simulator, called Tapeworm, that can capture events during operating system activity efficiently and correctly. Furthermore, these events can be processed on the fly, thereby avoiding the need for buffering and stalling. Tapeworm, moreover, is purely software based. It ....
Uhlig, R., Nagle, D., Mudge, T., Sechrest, S. Trap-driven Simulation with Tapeworm II, In the proceedings of the Sixth International Conference on Architectural Support for Programming Languages and Operating Systems, San Jose, California, ACM, 132-144,
....A disadvantage of using OS traps is that, if many events must be recorded, the cumulative OS overhead of handling all the traps is significant. However, there are a number of exception mechanisms in operating systems that can be utilized to improve the efficiency of this method. Tapeworm II [28] is an example of an efficient software based tool that drives cache and TLB simulations using information from kernel traps. It utilizes low overhead exceptions and traps of relatively few events. The applicability and efficiency of the OS trap approach depends upon the accessibility of certain ....
R. Uhlig, D. Nagle, T. Mudge, and S. Sechrest, "Trap-driven simulation with Tapeworm II," Proceedings of the 6th International Conference on Architectural Support for Programming Languages and Operating Systems, (San Jose, CA), Oct. 1994.
....4 55 2 7 Annotation D cache No No [Rosenblum95] Witchel96] SimOS Embra 10 7 21 Emulation D cache, I cache, TLB Yes Yes Hardware based Miss Detection [Nagle93] Tapeworm 1 2 100 650 0.5 4.5 TLB Miss TLB Yes Yes [Reinhardt93] WWT 1 2 2,500 1 1. 4 46 1 ECC D cache No No [Uhlig94] Tapeworm II 1 2 300 0 10 ECC I cache, TLB Yes Yes [Lee94] Tapeworm486 1 2 3,600 4,000 0 14 Page Fault TLB Yes Yes [Talluri94] Foxtrot 1 2 1,500 4,000 TLB Miss TLB No No Table 8. Beyond Traces: Some Recent Fast Memory Simulators Each of the simulators in this table improve ....
.... Tapeworm simulator, which also uses ECC bit modification to simulate caches, improves on the speed of WWT by showing that trap handling times can be reduced by nearly an order of magnitude to about 300 cycles, bringing overall simulation slowdowns for instruction caches into the range of 0 to 10 [Uhlig94]. Tapeworm II, like the original Tapeworm, also demonstrates that trap driven cache simulation is capable of completely monitoring multi process and operating system workloads. Experiments performed with Tapeworm II show that trap driven simulation slowdowns are highly dependent on the memory ....
Uhlig, R., Nagle, D., Mudge, T. and Sechrest, S. Trap-driven simulation with Tapeworm II. In Proceedings of the Sixth International Conference on Architectural Support for Programming Languages and Operating Systems, San Jose, California, ACM Press (SIGARCH), 132-144, 1994.
Richard Uhlig, David Nagle, Trevor Mudge and Stuart Sechrest. Trap-driven Simulation with Tapeworm II, ASPLOS, San Jose, 1994.