Peter Magnusson. A Design for Efficient Simulation of a Multiprocessor. In Proceedings of MASCOTS, pages 69--78, January 1993.
.... 11 780 VAX 11 780 Iterative Interpretation No No 1,500
[Cmelik93] Spa SPARC SPARC Iterative Interpretation No No 600
[Larus91] SPIM MIPS I SPARC, 680x0, MIPS, x86, HP PA Predecode to IR No Some 25
[Cmelik93] Shadow SPARC SPARC Dynamic Compilation No No 64
[Magnusson93] gsim 88100 HP PA, SPARC Block Compilation Yes Some 35 45 50 74
[Veenstra94] MINT R3000 R3000 Block Compilation Yes No 18 65
[Bedicheck94] mg88 88100 SPARC, 680x0, 88100 Threaded Code Yes Some 54 74
[Cmelik94] Shade SPARC V8, SPARC V9, MIPS ....
....The analyzer code is given access to the emulation state, such as addresses generated by the previous instruction, so that memory simulations are possible. The slowdowns reported in Table 2.3 are for Shadow and Shade emulations with a null analyzer. Like Shadow and Shade, MINT [Veenstra94] and gsim [Magnusson93] also dynamically compile code into sequences of host instructions that are then saved for future re-execution. But unlike Shade, which compiles one instruction at a time, these simulators compile clusters of several source instructions together. This strategy makes it possible to construct ....
Magnusson, P. A design for efficient simulation of a multiprocessor. In Proceedings of the
....techniques can improve the performance of macro simulators. Instead of decoding the operation fields each time an instruction is executed, the instruction is translated once into a form that is faster to execute. This idea has been used in a variety of simulators for a number of applications [8, 10, 17, 19, 26]. It is also used in some processors to translate an instruction set that programmers see into a more RISC-like form that is more efficient to execute [7, 11]. 3.3 Direct Execution The target program can also be executed directly on the simulator host [5, 13, 23] by encasing the program in an ....
....The first solution is straightforward and will only increase the page-crossing cost. The second solution could be done in conjunction with full modelling of the i-cache and will probably result in only a small decrease in performance. The third solution is possible, and is used by Magnusson [17], but has a number of problems with instruction cache modelling, debugger breakpoints, register access, and support for multiple program workloads. 4.5 Modelling Basic Instruction Execution Time To model the number of cycles an instruction takes, each simulated processor has an associated ....
Peter S. Magnusson. A design for efficient simulation of a multiprocessor. MASCOTS '93 -- Proceedings of the
....trade-offs between accuracy, speed, flexibility (i.e., is adaptable to different memory configurations) and information provided. Memory simulation techniques are very accurate, flexible and can provide rich information. They are usually based on trace-driven simulation [11], [9], [17], [20], [6], [13], [10], [3], [16], [23]. However, these techniques are very slow (usually several orders of magnitude). For instance, the slowdown exhibited by all simulators surveyed in [22] is in the range of 45 to 6250. There are some innovative methods that have been proposed with the objective of reducing the ....
P. Magnusson. A design for efficient simulation of a multiprocessor. In Procs. of the Western Simulation Multiconference on Int. Workshop on MASCOTS-93, pages 69--78, 1993. La Jolla, California.
....asm u N N N N scc N
Dynascope[68] db atr otr hll u N N S Y pdi Y
EEL[39] tb C exe u N N Y Y aug Y
Executor[17] sim exe u N N Y Y pdi Y
FastSim v.1[65] sim exe u N N Y N aug ffw Y
FastSim v.2 tb C exe u N N Y Y ddi ffw Y
FX32[15] sim exe u N N Y Y ddi scc Y
g88[9] sim db exe usd Y N Y Y tci Y
gsim[43][44] sim db atr otr tb C exe usd Y Y1 Y Y tci dcc Y
Mable[23] sim db atr exe u N Y1 N Y ddi N
Migrant[66] sim exe u Y N Y Y scc emu Y
Mimic[46] sim exe u N N N N dcc N
MINT[73] atr exe u N Y1 Y N pdi dcc Y
Moxie[16] sim exe u N N Y N scc N
MPtrace[26] atr asm u N Y= S N aug N
MX Vest[67] sim ....
Peter S. Magnusson, "A Design For Efficient Simulation of a Multiprocessor," in the Proceedings of the First International Workshop on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS), La Jolla, California, January 1993.
....that simulation is slower than one would like. Timing often depends substantially on the host and target architectures and on the host implementation. It also depends on the application's use of the target, which is in turn affected by details of the application, compiler optimizations, and so on [Magnusson93]. 6.16. Level Of Detail As described above, Shade simulates only the user-mode SPARC architectural features. It is possible to simulate more of the machine [Bedichek90a] but the execution is typically slower because the virtual-to-host mapping is degraded. For example, if Shade simulates ....
....was available. It is also used to debug code that is difficult to debug on the real hardware. g88 has also been extended to simulate multiprocessors. One version uses a different decoded instruction format to reduce the number of host memory references needed to simulate a target instruction [Magnusson93]. Simulation typically takes 35 to 40 instructions per simulated instruction on a SPARC, and 55 to 60 instructions per simulated instruction on an HP PA. Switching context between virtual processors every 10 simulated instructions almost halves performance, and context switches every cycle reduces ....
Peter S. Magnusson, "A Design For Efficient Simulation of a Multiprocessor," MASCOTS '93 - Proceedings of the 1993 Western Simulation Multiconference on International Workshop on Modeling, Analysis and Simulation of Computer and Telecommunication Systems, La Jolla, California, January 1993.
.... us Y N Y Y scc gi Y
ATOM [SE94] tb C exe u N N Y N aug N
ATUM [ASH86] sim atr exe us Y Y= Y Y emu Y
dis mod run [FC88] sim atr asm u N N N N scc N
Dynascope [Sosic92] db atr otr hll u N N S Y pdi Y
Executor [Hostetter93] sim exe u N N Y Y pdi Y
g88 [Bedichek90] sim db exe usd Y N Y Y tci Y
gsim [Magnusson93, Magnusson94] sim db atr otr tb C exe usd Y Y1 Y Y tci dcc Y
Mable [DLHH93] sim db atr exe u N Y1 N Y ddi N
mg88 [Bedichek94] sim db atr otr tb C exe usd Y Y1 Y Y tci Y
Migrant [SE93] sim exe u Y N Y Y scc emu Y
Mimic [May87] sim exe u N N N N dcc N
MINT [VF94] atr exe u N Y1 Y N pdi dcc Y
Moxie [CHKW86] sim ....
.... Accelerator [AS92] ebb nr, bo, ph, regs 3 pages
dis mod run [FC88] bb nr 10
Executor [Hostetter93] proc nr 10 mixed code
g88 [Bedichek90] i nr, bo 30 pages
gsim [Magnusson93, Magnusson94] bb nr, bo 30 pages
Mable [DLHH93] i 20 80
mg88 [Bedichek94] i nr, bo 80 pages
Migrant [SE93] ebb nr, bo
Mimic [May87] ebb nr, bo, regs 4 no fp, no align, compile
Moxie [CHKW86] bb nr 2
MX Vest [SCKMR93] ip bo 2 mixed code, fp prec
SELF [CUL89] ip none N A VM spec
SoftPC [Nielsen91] 10
SPIM ....
Peter S. Magnusson, "A Design For Efficient Simulation of a Multiprocessor," Proc. of the First International Workshop on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS), La Jolla, California, Jan. 1993.
....no support for following the execution of one thread or switching over to another thread. There are now tools available for debugging multithreaded programs, but these tools were not available when the system was implemented. In the last phase of the project the instruction level simulator SimICS [43, 42, 44] was used. The simulator accurately simulates a multiprocessor SPARC architecture and can provide valuable statistics. The simulator was mainly used for performance debugging and proved to be a vital tool. The simulator does provide a feature that could have saved many days (not to say weeks) of ....
P. Magnusson. A Design for Efficient Simulation of a Multiprocessor. In Proceedings of MASCOTS, pages 69--78, January 1993.
....execution details than instrumentation monitors. Some of the simulators, such as Shade [9], provide extensive support for instruction analysis or even a debugging interface to the simulated target machine. Surrogate execution in Dynascope is similar to techniques used in some of the simulators [4, 9, 27]. Dynascope can be used as an instruction level simulator if its performance overhead of two orders of magnitude can be tolerated by the application. The overhead is a result of event generation, because Dynascope generates extensive information for every instruction. Tracing performance of ....
P. S. Magnusson. A design for efficient simulation of a multiprocessor. In Proceedings of the First International Workshop on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS), January 1993.
....obtaining the next address in the trace, searching for that address in a simulated cache, and then invoking a replacement policy in the event of a miss. The trace addresses can come from a file created by a trace extraction tool, or they might be generated on the fly by an annotated workload [Agarwal86, Borg90, Chen93b, Cmelik94, Eggers90, Holliday91, Hsu89, Larus90, Larus93, Magnusson93, MIPS88, Mogul91, Sites88, Smith91]. The search procedure involves indexing a data structure that represents the cache and then, depending on the associativity of the cache, performing one or more comparisons to test for a hit. Though a simple operation, the search and test must be performed for every address in the trace. Tapeworm ....
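The lookup loop described in the excerpt above (take the next trace address, index the cache structure, compare tags, apply a replacement policy on a miss) can be sketched roughly as follows. This is a minimal illustration and not code from any of the cited simulators: the class name, the default geometry, and the choice of LRU replacement are all assumptions made for the example.

```python
from collections import OrderedDict


class SetAssociativeCache:
    """Toy set-associative cache with LRU replacement (hypothetical model)."""

    def __init__(self, num_sets=64, ways=2, line_size=32):
        self.num_sets = num_sets
        self.ways = ways
        self.line_size = line_size
        # One ordered dict per set; keys are tags, order tracks recency.
        self.sets = [OrderedDict() for _ in range(num_sets)]
        self.hits = 0
        self.misses = 0

    def access(self, address):
        """Look up one trace address and return True on a hit."""
        line = address // self.line_size
        index = line % self.num_sets      # which set to search
        tag = line // self.num_sets       # value compared within the set
        cache_set = self.sets[index]
        if tag in cache_set:
            cache_set.move_to_end(tag)    # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        if len(cache_set) >= self.ways:
            cache_set.popitem(last=False) # evict the least recently used tag
        cache_set[tag] = True
        return False


def simulate(trace, cache):
    """Feed every address in the trace to the cache; return (hits, misses)."""
    for address in trace:
        cache.access(address)
    return cache.hits, cache.misses
```

Because the search and test run once per trace address, the per-access cost of `access` dominates the simulation time, which is the overhead the excerpt refers to.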
Magnusson, P. S. A design for efficient simulation of a multiprocessor, In MASCOTS '93 - Proceedings of the 1993 Western Simulation Multiconference on International Workshop on Modeling, Analysis and Simulation of Computer and Telecommunication Systems, La Jolla, California, 1993.
.... Translation Policy Chain, Thread or Block
Iterative Interpretation
[Cmelik93] Spa (Spy) SPARC SPARC Host Registers N A No 40 600
[Davies94] Mable MIPS I, MIPS III MIPS I Memory N A No 20 200
Predecode Interpretation
[Larus91] SPIM MIPS I SPARC, 680x0, MIPS, x86, HP PA Memory All at once No 25
[Magnusson93] gsim 88100 HP PA, SPARC Memory Lazy Threading 45 75
[Bedicheck95] Talisman 88100 SPARC Memory Lazy Threading 100 150
[Veenstra94] MINT R3000 R3000 Hybrid All at once Block 20 70
Dynamic Translation
[Cmelik94] Shade SPARC V8, SPARC V9, MIPS SPARC V8 Memory Lazy Chaining 9 14
Table 2. ....
....SPIM, which reads and translates a MIPS I executable, in its entirety, to an intermediate representation understood by the emulation engine [Larus91]. After translation, SPIM can look up and emulate predecoded instructions with a slowdown factor of approximately 25. Talisman [Bedichek95] and gsim [Magnusson93] also use a form of instruction predecoding, but instead of decoding all instructions of a workload before it begins running, these emulators predecode instructions lazily, as they are executed for the first time. By caching the results, these emulators can benefit from predecoding without the ....
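The lazy predecoding idea in the excerpt above (decode an instruction the first time it executes, then reuse the cached result) might look like the sketch below. It is only an illustration under stated assumptions: the instruction encoding, the `decode` helper, and the `LazyPredecoder` class are hypothetical and do not reflect the actual formats used by gsim or Talisman.

```python
def decode(word):
    """Hypothetical decoder: top 8 bits are the opcode, low 24 the operand."""
    return (word >> 24) & 0xFF, word & 0xFFFFFF


class LazyPredecoder:
    """Decodes each instruction on first execution and caches the result."""

    def __init__(self, memory):
        self.memory = memory    # maps address -> raw instruction word
        self.cache = {}         # maps address -> decoded (opcode, operand)
        self.decode_count = 0   # how many times the slow path ran

    def fetch(self, pc):
        decoded = self.cache.get(pc)
        if decoded is None:
            # Slow path: taken only the first time this address executes.
            decoded = decode(self.memory[pc])
            self.cache[pc] = decoded
            self.decode_count += 1
        return decoded
```

For example, fetching address 0 twice and address 4 once runs the decoder only twice, so instructions that are never executed are never decoded at all.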
Magnusson, P. A design for efficient simulation of a multiprocessor. In Proceedings of the 1993 Western Simulation Multiconference on International Workshop on Modeling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS-93), 69-78, La Jolla, California, 1993.
Peter Magnusson. A Design for Efficient Simulation of a Multiprocessor. In Proceedings of MASCOTS, pages 69--78, January 1993.
....at the instruction set level, as it represents a well-defined border between hardware and software. Variation of modeling an entire computer system at this level is a traditional theme. This type of simulation has been called complete machine [8], complete computer system [22], system level [12], instruction set [2] and faithful [5] simulation. In early work, it was referred to as virtual machines [4] or simply simulation [11]. In this paper we shall use the term complete system instruction set simulation, or complete system simulation. Complete system simulators model ....
P. Magnusson. A Design for Efficient Simulation of a Multiprocessor. In Proceedings of MASCOTS, pages 69--78, January 1993.
....architecture design, porting system software, and performance tuning of software. The basic design of modern instruction set simulators is often a variation of threaded code (Bell 1973). This design can be extended to support full system level simulation (Bedichek 1990) and multiprocessors (Magnusson 1993). An element that has been missing in instruction set simulators is efficient support for accurate execution profiling and instruction cache modeling. Instruction cache behavior is important for computer architects, and both instruction cache and execution profiling are important to support ....
....of which 6 constitute the standard epilogue (4). This sets an upper limit on performance for this technique of about 20 times slower than native execution. Achieving significantly better performance than this requires more sophisticated translation, including run-time generation of host code (Magnusson 1993, Witchel and Rosenblum 1996). Figure 1: SimICS Interpreter Core. SimICS emulates a SunOS 5.x kernel by explicitly emulating common system calls. This includes support for running multiple programs (multitasking) as well as running programs on several processors (multiprocessing). SimICS can also ....
Magnusson, P. S. 1993. A design for efficient simulation of a multiprocessor. In Proceedings of MASCOTS'93, 69-78.
....to support all of the features described in this paper. Furthermore, the individual features can be toggled during execution; the necessary internal data structures are allocated as necessary, while the execution state is unaffected. 5 Details on the internals of SimICS are described elsewhere [23, 24, 25, 26, 34]. 3.4 Memory Management Unit The MMU module is well isolated from the other memory simulation components. The MMU needs to provide an mmu_logical_to_physical() routine to report on legal translations. Conversely, the MMU simulation code can call a routine to clear address intervals that are ....
....Considerations SimICS simulates the concurrency of multiprocessors by round-robin scheduling the processors. Each processor is simulated for a fixed time slice (determined by the user) before switching. This switch must be efficient, or the user will be limited to using long switching intervals [24]. The STC implementation described above requires one pointer to be allocated in a global register during interpretation. It needs to be reloaded from memory upon every processor switch. This is the only overhead that the memory simulation directly contributes to multiprocessor simulation. 8 ....
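The round-robin scheme in the excerpt above (run each simulated processor for a fixed, user-chosen time slice, then switch) can be sketched as below. This is a simplified illustration, not SimICS code: the `Processor` class, the `run_round_robin` function, and the use of an instruction count as the time slice are assumptions made for the example.

```python
class Processor:
    """Hypothetical simulated CPU that only counts executed instructions."""

    def __init__(self, cpu_id):
        self.cpu_id = cpu_id
        self.executed = 0

    def step(self):
        # A real simulator would interpret one target instruction here.
        self.executed += 1


def run_round_robin(processors, time_slice, total_instructions):
    """Simulate total_instructions in slices; return the number of switches."""
    switches = 0
    remaining = total_instructions
    current = 0
    while remaining > 0:
        burst = min(time_slice, remaining)
        cpu = processors[current]
        for _ in range(burst):
            cpu.step()
        remaining -= burst
        # Per-switch work (such as reloading cached per-CPU state) goes here.
        current = (current + 1) % len(processors)
        switches += 1
    return switches
```

The returned switch count makes the trade-off visible: a shorter time slice interleaves the processors more finely but multiplies whatever fixed cost each switch carries.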
P. Magnusson. A Design for Efficient Simulation of a Multiprocessor. In Proceedings of MASCOTS, pages 69--78, January 1993.
....they expect a machine level view of their host. Since instruction set simulators are a software-only solution to target architecture modelling, there is no difficulty in principle in emulating the target faithfully enough to fool an operating system; the difficulty lies in the details [3,19,26]. A sun4m architecture model for SimICS has been implemented that is sufficiently accurate to boot Linux (see Acknowledgements). It uses runtime loadable modules to SimICS for each of the devices and MMU (SPARC Reference MMU). The benefit of using a simulator to study an OS is that it cuts through ....
....on the architectural relationship between the host and target. Examples include g88 [3], CacheMire [7], Mint [28], Shade [8], SimOS [26, 30], Talisman [4] and SimICS. All these simulators translate from a target code to an intermediate format. This format can then either be interpreted [3, 4, 19] or directly executed [2, 20, 26, 28, 30]. Instruction set simulation is generally the slowest but most flexible approach. 8.6 SimICS SimICS is an instruction set simulator that has borrowed many design principles from g88 [3]. SimICS takes the brute-force approach to all three problems mentioned ....
P. S. Magnusson. A Design for Efficient Simulation of a Multiprocessor. In Proceedings of MASCOTS, pages 69-78, January 1993.
....of this paper to go into any detailed discussion of trade-offs between the various strategies. Simply put, instruction set simulation is the slowest and most flexible, host supported simulation is the converse (fast but inflexible) and execution driven simulation holds the middle ground. SimICS [9, 10, 11] is an instruction set simulator that has borrowed many design principles from g88 [3]. SimICS takes the brute-force approach, modelling the target architecture on an instruction-by-instruction level. This results in a lower performance compared to execution driven or host supported ....
P. S. Magnusson. A design for efficient simulation of a multiprocessor. In Proceedings of MASCOTS, pages 69--78, January 1993.