| H. Davis, S. R. Goldschmidt, and J. Hennessy, "Multiprocessor simulation and tracing using Tango," Proceedings of the International Conference on Parallel Processing, pp. II-99-106, 1991. |
....expense of purchasing it; second, one can do the simulation fast: there is no need to simulate the workstation's behavior (for example, down to the level of memory references) since that part of the hardware is readily available. Many of the early simulators were designed for sequential execution [9, 13, 14]. However, even with the use of abstract models and direct execution, sequential program simulators tended to be slow, with slowdown factors ranging from 2 to 35 for each process in the simulated program [9]. Several recent efforts have been exploring the use of parallel execution [10, 17, 18, 24, ....
H. Davis, S. R. Goldschmidt, and J. Hennessy. Multiprocessor simulation and tracing using Tango. In Proceedings of ICPP'91, pages 99--107, 1991.
....file, which allows the latency, functional unit usage, and ability to cause exceptions to be specified for each instruction. Twine scheduled the optimized code for our pipeline, leaving the final two issues. We have developed a detailed processor and system simulator that interfaces to Tango Lite [5]. Tango Lite is a simulation package that allows execution driven simulation of parallel programs on uniprocessors. Our simulator accepts basic block and memory reference addresses from Tango Lite. The basic block addresses index into the properly scheduled object file and are used to generate ....
Helen Davis, Stephen R. Goldschmidt, and John Hennessy. Multiprocessor simulation and tracing using Tango. In Proceedings of the 1991.
....slow to evaluate system level performance. Real applications on parallel machines run for billions, or even trillions of cycles; even register transfer level simulators are much too slow. Over the last several years, direct execution has become widely used to accelerate architectural simulations [6, 4, 3, 7, 14]. Direct execution exploits the commonality between the instruction set of the simulated target machine and the underlying host system. For example, a floating point multiply on the target is simulated by executing a floating point multiply on the host. Such a system need only simulate the ....
....by executing a floating point multiply on the host. Such a system need only simulate the differences between the target system and the host, achieving impressive performance when the two systems are very similar. Simulations of parallel computers have exploited direct execution in several ways [3, 7, 5]. Most commonly, a parallel target system is simulated on a uniprocessor host. For example, the Tango system spawns an event generation process for each processor in a target shared-memory system. These processes directly execute all computation instructions, but must send most memory references to ....
Helen Davis, Stephen R. Goldschmidt, and John Hennessy. Multiprocessor Simulation and Tracing Using Tango. In Proceedings of the 1991.
....of the coherence protocol. The protocol's operation (and cost), in turn, depends on the memory's state. An accurate, but slow and complex method of calculating the cost of a memory access is to simulate a particular machine in detail and record which statements cause interprocessor communication [11, 14, 35]. Simulation, because it is so time consuming, is generally limited to studying short programs with small data sets. For many programmers, a more attractive approach would trade accuracy for simplicity. The CICO performance model makes three approximations that permit reasoning about cache s ....
Helen Davis, Stephen R. Goldschmidt, and John Hennessy. Multiprocessor Simulation and Tracing Using Tango. In Proceedings of the 1991.
....simulating a less complex processor, as well as from several speed enhancing techniques developed for such simulators. Direct execution is one such widely used technique that has previously relied on simple processor features such as blocking reads, in-order issue, and no speculation [CDJ 91, DGH91, MRF 97]. This chapter presents a novel adaptation of direct execution to substantially speed up simulation of shared memory multiprocessors with ILP processors, without much loss of accuracy. We have developed a new simulator, DirectRSIM, based on our new technique. We evaluate the accuracy ....
....as the methodologies used for simulating ILP based systems. 6.1.1 Direct execution with simple processors. Direct execution is a widely used form of execution driven simulation, and has been shown to be accurate and fast for modeling shared memory systems with simple processors [CDJ 91, DGH91, MRF 97]. Direct execution decouples functional and timing simulation. Functional simulation generates values (for registers and memory) and control flow, while timing simulation determines the number of cycles taken by the simulated execution. Direct execution achieves high speed in two ....
[Article contains additional citation context not shown here]
Helen Davis, Stephen R. Goldschmidt, and John Hennessy. Multiprocessor Simulation and Tracing Using Tango. In Proceedings of the International Conference on Parallel Processing, pages II-99--II-107, August 1991.
....solution. The right first time approach of VLSI design (and Total Quality Management) should be extended to cover parallel software. Some simulation tools employ direct execution to evaluate performance of parallel programs on parallel architectures (e.g. MIT Proteus [11], Stanford Tango [12], WWT [13]). This is an excellent approach for obtaining fast, realistic simulations if the processor you're running the simulation on is very similar to the processor used in the parallel machine, but we believe that this assumption is sometimes too restrictive and a hierarchical tool can find ....
Helen Davis, Stephen R. Goldschmidt, and John Hennessy. "Multiprocessor Simulation and Tracing Using Tango", Proc.
.... statements in data parallel [20] or task parallel programs [21] and (2) exploiting the communication topology to reduce number of neighbours of each process and hence the number of null messages [15]. Other research in the field of parallel simulation for performance prediction includes [7], [22] and [23]. Direct execution was shown in [6] as a means to avoid instruction level simulation by executing portions of code directly on the architecture being studied. The application is compiled for the instruction set architecture of the machine, but the simulator traps all calls that ....
H. Davis, S. Goldschmidt, and J. Hennessy. "Multiprocessor Simulation and Tracing Using Tango," In Proceedings of the 1991 International Conference on Parallel Processing, August 1991.
....selecting among the modes dynamically so that it can simulate only interesting sections of execution in detail; however, it doesn't provide a detailed strategy to change its simulation mode in order to achieve the best performance with respect to an error requirement. Other simulation systems [6, 9, 12] that permit multi level simulation have also been developed. None of these systems allow selecting the most efficient level of simulation that meets a target error bound. However, these systems explicitly consider the effects of system activities such as Range of Time Raw Combined Simulation ....
H. Davis, S. R. Goldschmidt, and J. Hennessy, "Multiprocessor Simulation and Tracing Using Tango," 1991 ICPP, August 1991, St. Charles, IL, pp. 99-107.
....I O compilation) application) GUI SimpleScalar Uniprocessor x86, RS6000, Text, Instr. set, memory [2] Simulation Sparc, Alpha, C, Fortran program hierarchy, superscalar, PA RISC dependent branch pred. speculation TangoLite Multiprocessor MIPS C Fortran with Text Distrib. shared memory, [7] Simulation ANL macros mem. hier. coherence VTune Performance x86, C C , GUI, CISC microarchitecture, Analysis MS Windows Fortran statistics superscalar, mem. hier. WARTS: Cache Program Sparc Solaris C w PARMACS Text Instr. set, mem. QPT [16] Profiling and NOWs, macros (WWT) traces, ....
Helen Davis, Stephen R. Goldschmidt and John Hennessy. Multiprocessor Simulation and Tracing Using Tango. In Proceedings of the 1991 International Conference on Parallel Processing (ICPP, Vol. II, Software), August 1991.
....preliminary and need to be confirmed by studies with additional applications, they illustrate the large potential benefits that could be achieved via compiler analysis of lookahead in target applications. 2 Related Work. Many of the early program simulators were designed for sequential execution [8, 12, 13]. However, even with the use of direct execution, sequential program simulators tended to be slow, with slowdown factors ranging from 2 to 35 for each process in the simulated program [8]. Several efforts have been exploring the use of parallel execution [10, 14, 17, 19, 22] to reduce the model ....
H. Davis, S. R. Goldschmidt, and J. Hennessy, "Multiprocessor Simulation and Tracing using Tango," Proceedings of ICPP'91, pp. 99-107, 1991.
....a channel has lookahead x if a message sent over that channel at time s never affects the recipient before time s + x. Parallel discrete event simulation has proven successes in several application areas, most notably in aviation control [23], Markov chain simulation [15], architectural simulation [20, 4, 5], and telecommunications [3]. Nevertheless, every success involves some tuning of synchronization protocol to the model. This is one of several reasons why parallel discrete event simulation is viewed by many as a domain for experts only. This research is supported in part by DARPA Contract ....
H. Davis, S. Goldschmidt, and J. Hennessy. Multiprocessor simulation and tracing using tango. In Proceedings of the 1991 International Conference on Parallel Processing, pages II99--II107, August 1991.
....of resulting data that may be produced by massively parallel applications. The trace collection issue is not a recent one. A number of user tools are heavily based on traces, including correctness debuggers [5, 17, 26, 16, 22, 15], performance debuggers [9, 10, 19, 18, 8], trace driven simulators [3, 28, 6, 24, 4], etc. A broad classification [25] of tracing techniques distinguishes four basic classes: hardware based methods which use a hardware monitor to record all requests on the address bus of a processor, interrupt based methods which cause an interrupt on every instruction that accesses some memory ....
.... modified microcode [23] and instrumented program based methods that introduce tracing statements in an application's code which are responsible for producing the traces; instrumentation code may be inserted before the compilation of the application [29, 20, 8], during the compilation [3, 6], or even during the execution of the application [13]. Hardware based methods are characterised by a low degree of intrusiveness which results in low overhead and low perturbation of the execution of the target application. However, their cost, inherent inflexibility and inability to provide ....
H. Davis, S. R. Goldschmidt, and J. Hennessy. "Multiprocessor Simulation and Tracing Using Tango". In Proceedings of the 1991 International Conference on Parallel Processing, 1991.
....interconnection network, and memory hierarchy) for the purpose of isolating and quantifying overheads. IV. SPASM. SPASM is an execution driven simulator written in CSIM [28] used for simulating the execution of a parallel program on a parallel machine. As with other recent simulators [29], [30], [31], [32], [33], the bulk of the instructions in the parallel program is executed at the speed of the native processor (SPARC in our studies) and only instructions such as LOADs/STOREs on a shared memory platform, and SENDs/RECEIVEs on a message passing platform, that may potentially involve a ....
....in the application program since the last trap to the simulator. At the trapped instruction, SPASM reconciles the simulated time for the processor issuing the instruction since the last trap using the cycle counts. This technique has been popular in other execution driven simulators [29], [30], [31], [33], [34] as well. Finally, the assembled binary is linked with the rest of the simulator code. A simulation platform like SPASM allows us to vary a wide range of hardware parameters such as the number of processors, CPU clock speed, network topology, bandwidth of the links in the network, ....
H. Davis, S. R. Goldschmidt, and J. L. Hennessy, "Multiprocessor Simulation and Tracing Using Tango," in Proceedings of the 1991 International Conference on Parallel Processing, 1991, pp. II 99--107.
....These studies used a hybrid method combining trace driven simulations and real execution of user code to study the performance of shared memory systems. This approach is difficult and expensive since it requires execution of a part of user code during simulation. The execution-driven simulation [3] or hybrid simulation of message passing programs whose behavior depends on message arrival order would be more difficult and impractical. Message driven execution, explained in Section 2, is based on the ability to run computations in different orders. Therefore, the simulation of message-driven ....
H. Davis, S. Goldschmidt, and J. Hennessy, "Multiprocessor Simulation and Tracing using Tango", Proceedings of the International Conference on Parallel Processing, Vol. II, Aug. 1991, pp. 99-107.
....innovations on the performance of applications. It provides a controlled environment in which the various components of the simulated system can be changed, and their results evaluated. However, sequential simulation of detailed large programs and systems becomes extremely time consuming [BDC91, DGH91, CDJ91]. This has led to a variety of attempts to use parallel execution to reduce simulation time. Existing parallel program simulators are used to evaluate the performance of memory systems, interconnection networks, or processor architectures. This thesis describes the design and implementation of a ....
....of a process contains two types of events: local events that correspond to execution of a local code block (LCB) in the target program, and I/O events which correspond to the execution of an MPI-IO statement. The most common method for simulation of a local event is by direct execution [BDC91, DGH91, CDJ91]. The LP, say lp_i, executes the LCB on the host machine, measuring its duration, say t, and advancing clock_i by t. For runtime measurement to have a reasonable degree of accuracy, the host and target processors must be the same (or an appropriate scaling factor must be determined). The simulation ....
H. Davis, S. R. Goldschmidt, and J. Hennessy. "Multiprocessor simulation and tracing using Tango." In Proceedings of the 1991 International Conference on Parallel Processing (ICPP'91), pp. II99--II107, August 1991.
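The direct-execution step described in the excerpt above (run the local code block on the host, measure its duration t, advance the logical process's clock by t) can be sketched roughly as follows. This is only an illustration under stated assumptions: the class name `LogicalProcess`, the `scale` parameter, and the injectable `timer` are hypothetical and do not come from Tango or from the citing paper.

```python
import time
from typing import Callable


class LogicalProcess:
    """Hypothetical LP that simulates local events by direct execution."""

    def __init__(self, scale: float = 1.0,
                 timer: Callable[[], float] = time.perf_counter):
        self.clock = 0.0    # simulated time of this LP
        self.scale = scale  # assumed host-to-target scaling factor
        self.timer = timer  # injectable so the sketch can be tested

    def run_local_block(self, block: Callable[[], object]) -> object:
        """Execute a local code block on the host and advance the clock."""
        start = self.timer()
        result = block()                  # direct execution on the host
        elapsed = self.timer() - start    # measured duration t
        self.clock += elapsed * self.scale
        return result
```

With the default timer the clock advances by measured wall-clock seconds, so results vary from run to run; a scaling factor other than 1.0 stands in for the case where host and target processors differ.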
.... 2: Details of L1 L2 data cache

Memory component    Latency
L1                  1 cycle
L2                  10 cycles
Local Memory        35 cycles
Remote Memory       80 cycles
Remote Cache        105 cycles

Table 3: Hit Latencies for the memory hierarchy

3 Simulation environment

The simulator is built upon the TangoLite simulation environment [DGH91], which is a software-based multiprocessor environment and runs on a MIPS-based uniprocessor. It provides a multiprocessor environment through the multiplexing of lightweight threads onto this sequential machine. The simulation is achieved by instrumenting the application code with calls to the ....
H. Davis, S.R. Goldschmidt, and J. Hennessy. Multiprocessor simulation and tracing using tango. In International Conference on Parallel Processing, 1991.
No context found.
H. Davis, S. R. Goldschmidt, and J. Hennessy, "Multiprocessor simulation and tracing using Tango," Proceedings of the International Conference on Parallel Processing, pp. II-99-106, 1991.
No context found.
Helen Davis, Stephen R. Goldschmidt, and John Hennessy. Multiprocessor Simulation and Tracing using Tango. In Proc. 1991.
No context found.
Davis, H., Goldschmidt, S.R., Hennessy, J.: Multiprocessor simulation and tracing using Tango. In Proc. of the Int. Conf. on Parallel Processing, (1991) II99--II107
No context found.
H. Davis et al. Multiprocessor Simulation and Tracing Using Tango. In Proceedings of the International Conference on Parallel Processing, 1991.
No context found.
H. Davis, S. R. Goldschmidt, and J. Hennessy. Multiprocessor simulation and tracing using tango. In Proceedings of the 1991.
No context found.
H. Davis, S. Goldschmidt, and J. Hennessy. Multiprocessor simulation and tracing using Tango. In Proceedings of the 1991.
No context found.
Helen Davis, Stephen R. Goldschmidt, and John Hennessy, "Multiprocessor Simulation and Tracing Using Tango," In Proceedings of the 1991 International Conference on Parallel Processing, pages II-99 - II-107, August 1991.
No context found.
Davis, H., Goldschmidt, S. and Hennessy, J. Multiprocessor simulation and tracing using Tango. In Proceedings of the