67 citations found.
R. C. Covington, S. Madala, V. Mehta, J. R. Jump, and J. B. Sinclair, "The Rice Parallel Processing Testbed," Proceedings of the ACM Sigmetrics Conference on Measurement and Modeling of Computer Systems, pp. 4-11, 1988.

This paper is cited in the following contexts:

Compiler-Optimized Simulation of Large-Scale.. - Adve, Bagrodia..   (Correct)

....expense of purchasing it; second, one can do the simulation fast: there is no need to simulate the workstation's behavior (for example down to the level of memory references) since that part of the hardware is readily available. Many of the early simulators were designed for sequential execution [9, 13, 14]. However, even with the use of abstract models and direct execution, sequential program simulators tended to be slow with slowdown factors ranging from 2 to 35 for each process in the simulated program [9]. Several recent efforts have been exploring the use of parallel execution [10, 17, 18, 24, ....

R. C. Covington, S. Madala, V. Mehta, J. R. Jump, and J. B. Sinclair. The Rice Parallel Processing Testbed. In Proceedings of 1988 ACM SIGMETRICS Conference on Measurement and Modeling of Computer Systems, Santa Fe, NM, 1988.


A Study of Program Behavior to Establish Temporal Locality at .. - Levy, Murdocca (2001)   (2 citations)  (Correct)

....the disk can be the limiting factors in applying this technique. In addition, if the system being simulated determines the program control flow, then the simulation becomes unrealistic because the program execution has to be completed before the system is simulated. In the program driven approach [23], the feedback problem is overcome by performing the simulation as the traced program executes. A program driven simulation can be partitioned into two parts: a memory reference generator, which models the execution of the application, and a target system simulator. The target system simulator is ....

R.C.Covington, S.Madala, V.Mehta, J.R.Jump, and J.B.Sinclair. The rice parallel processing testbed. In ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, pages 4-11, 1988.


Cost/Performance of a Parallel Computer Simulator - Falsafi, Wood   (Correct)

....slow to evaluate system level performance. Real applications on parallel machines run for billions, or even trillions of cycles; even register transfer level simulators are much too slow. Over the last several years, direct execution has become widely used to accelerate architectural simulations [6, 4, 3, 7, 14]. Direct execution exploits the commonality between the instruction set of the simulated target machine and the underlying host system. For example, a floating point multiply on the target is simulated by executing a floating point multiply on the host. Such a system need only simulate the ....

R.C. Covington, S. Madala, V. Mehta, J.R. Jump, and J.B. Sinclair. The Rice Parallel Processing Testbed. In Proceedings of the 1988.


CICO: A Practical Shared-Memory Programming Performance Model - Larus, Chandra, Wood (1993)   (4 citations)  (Correct)

....of the coherence protocol. The protocol's operation (and cost), in turn, depends on the memory's state. An accurate, but slow and complex method of calculating the cost of a memory access is to simulate a particular machine in detail and record which statements cause interprocessor communication [11, 14, 35]. Simulation, because it is so time consuming, is generally limited to studying short programs with small data sets. For many programmers, a more attractive approach would trade accuracy for simplicity. The CICO performance model makes three approximations that permit reasoning about cache's ....

R.C. Covington, S. Madala, V. Mehta, J.R. Jump, and J.B. Sinclair. The Rice Parallel Processing Testbed. In Proceedings of the 1988.


Talisman: Fast and Accurate Multicomputer Simulation - Bedichek (1995)   (24 citations)  (Correct)

....vary widely in their application. They are used by processor architects to evaluate uniprocessor design tradeoffs [8], operating system authors to debug their code [1] and to evaluate operating system performance [6, 25], parallel system architects to assess the performance of large systems [4, 9, 23], and end users to execute programs written for one system on a different host system [19, 26, 27]. Simulators also vary in their performance and the level of detail they can model. A common metric is the slow down, or the average number of simulator host instructions executed per simulated ....

R.C. Covington, S. Madala, V. Mehta, J.R. Jump, and J.B. Sinclair. The Rice parallel processing testbed. In Proceedings of the


Improving Lookahead in Parallel Discrete Event.. - Deelman.. (2001)   (Correct)

....preliminary and need to be confirmed by studies with additional applications, they illustrate the large potential benefits that could be achieved via compiler analysis of lookahead in target applications. 2 Related Work Many of the early program simulators were designed for sequential execution [8, 12, 13]. However, even with the use of direct execution, sequential program simulators tended to be slow with slowdown factors ranging from 2 to 35 for each process in the simulated program [8]. Several efforts have been exploring the use of parallel execution [10, 14, 17, 19, 22] to reduce the model ....

R. C. Covington, S. Madala, V. Mehta, J. R. Jump, and J. B. Sinclair, "The Rice parallel processing testbed," ACM Sigmetrics, 1988.


An Application-driven Study of Parallel System.. - Sivasubramaniam.. (1999)   (1 citation)  (Correct)

....of the interconnection network, and memory hierarchy) for the purpose of isolating and quantifying overheads. IV. SPASM SPASM is an execution driven simulator written in CSIM [28] used for simulating the execution of a parallel program on a parallel machine. As with other recent simulators [29], [30], [31], [32], [33], the bulk of the instructions in the parallel program is executed at the speed of the native processor (SPARC in our studies) and only instructions such as LOADs STOREs on a shared memory platform, and SENDs RECEIVEs on a message passing platform, that may potentially involve a ....

....spent in the application program since the last trap to the simulator. At the trapped instruction, SPASM reconciles the simulated time for the processor issuing the instruction since the last trap using the cycle counts. This technique has been popular in other execution driven simulators [29], [30], [31], [33], [34] as well. Finally, the assembled binary is linked with the rest of the simulator code. A simulation platform like SPASM allows us to vary a wide range of hardware parameters such as the number of processors, CPU clock speed, network topology, bandwidth of the links in the ....

R. G. Covington, S. Madala, V. Mehta, J. R. Jump, and J. B. Sinclair, "The Rice parallel processing testbed," in Proceedings of the ACM SIGMETRICS 1988 Conference on Measurement and Modeling of Computer Systems, Santa Fe, NM, May 1988, pp. 4--11.


Applying Programming Language Implementation Techniques to.. - Schnarr (2000)   (2 citations)  (Correct)

.... N N Y N scc N MPtrace[26] atr asm u N Y = S N aug N MX Vest[67] sim exe u N Y = Y Y scc gi Y Pixie[48] atr exe u Y N Y N aug N Pixie II[17] atr otr db exe us Y N Y S scc N Proteus[12] atr hll u N Y 1 N S aug N Purify[32] db exe u N N Y N aug Y qp qpt[38] atr otr exe u N N N N aug N RPPT[21] atr hll u N Y 1 NNaug N RSIM[54] sim atr otr exe u Y N N N emu Y 13 Purpose indicates the use for which the tool was intended: cross architecture simulation (sim) debugging (db) address tracing or memory hierarchy analysis (atr) or more detailed kinds of tracing (otr) Tools marked tb C are ....

R. C. Covington, S. Madala, V. Mehta, J. R. Jump, and J. B. Sinclair, "The Rice Parallel Processing Testbed," ACM SIGMETRICS, 4-11, 1988.


An Integrated Software Development Model for.. - Parashar, Hariri.. (1993)   (Correct)

....debugging support (DETOP) while FAUST incorporates a compile time and run time environment. Another tool applicable to this stage is Parafrase 2. Evaluation Stage Existing evaluation systems include PATOP and VISTOP from TOPSYS, the IPS 2 system [33], the SIMPLE environment [34], and RPPT [35]. FAUST and RPPT [35] specifically provide evaluation support for the CEDAR computer system. Maintenance Evolution Stage The PAWS systems [36] presents an approach for machine evaluation and can be used during the maintenance evolution stage. System prototyping capabilities are provided by SiGle ....

....(DETOP) while FAUST incorporates a compile time and run time environment. Another tool applicable to this stage is Parafrase 2. Evaluation Stage Existing evaluation systems include PATOP and VISTOP from TOPSYS, the IPS 2 system [33], the SIMPLE environment [34], and RPPT [35]. FAUST and RPPT [35] specifically provide evaluation support for the CEDAR computer system. Maintenance Evolution Stage The PAWS systems [36] presents an approach for machine evaluation and can be used during the maintenance evolution stage. System prototyping capabilities are provided by SiGle [37] and Proteus ....

R. C. Covington, S. Madala, V. Mehta, J. R. Jump, and J. B. Sinclair, "The Rice Parallel Processing Testbed", 1988 ACM 0-89791-254-3/88/0005/0004 pp 4-11, 1988.


Abstracting Network Characteristics and Locality.. - Sivasubramaniam.. (1993)   (2 citations)  (Correct)

....But many of these models make simplifying assumptions about the hardware and/or the applications, restricting their ability to model the behavior of real parallel systems. Execution driven simulation is becoming increasingly popular for capturing the dynamic behavior of parallel systems [25, 8, 10, 13, 20]. Some of these simulators have abstracted out the instruction set of the processors, since a detailed simulation of the instruction set is not likely to contribute significantly to the performance analysis of parallel systems. Researchers have tried to use other abstractions for the workload as ....

....The input to the simulator are parallel applications written in C. These programs are pre-processed (to label shared memory accesses), the compiled assembly code is augmented with cycle counting instructions, and the assembled binary is linked with the simulator code. As with other recent simulators [8, 13, 10, 20], bulk of the instructions is executed at the speed of the native processor (the SPARC in this case) and only instructions (such as LOADs and STOREs on a shared memory platform or SENDs and RECEIVEs on a message-passing platform) that may potentially involve a network access are simulated. The reader ....

R. G. Covington, S. Madala, V. Mehta, J. R. Jump, and J. B. Sinclair. The Rice parallel processing testbed. In Proceedings of the ACM SIGMETRICS 1988 Conference on Measurement and Modeling of Computer Systems, pages 4--11, Santa Fe, NM, May 1988.


A Distributed Memory LAPSE: Parallel Simulation of.. - Dickens.. (1993)   (18 citations)  (Correct)

....systems could make use of parallelized simulation of the network, driven by executing application code, as could designers of new communication networks. This paper shows how to couple parallelized simulation of the large machine's communication network with direct execution techniques [5, 6, 8, 9, 11]. Given N application processes whose performance on N processors is sought, we use n N processors to both execute the application and simulate its timing behavior. Each physical processor is assigned some number of application processes (virtual processors, or VPs) and a simulator process; ....

....code and LAPSE represent differences in the way OSF 1 handles multiple processes per processor. For more than one processor, the LAPSE slowdowns, compared to native code, range from about 4 to 11. These slowdowns are well within the range of slowdowns reported by other execution driven simulators [6, 17]. LAPSE runs from between 2.8 to 7.1 times slower than the instrumented code. The causes for the difference in these times include: 1. Operating system overheads as described above. Additional effects occur when executing on multiple processors. For example, the simulator processes may be trying ....

R.C. Covington, S. Madala, V. Mehta, J.R. Jump, and J.B. Sinclair. The Rice Parallel Processing Testbed. In Proceedings of the 1988 SIGMETRICS Conference, pages 4--11, May 1988.


Compiler-Optimized Simulation of Large-Scale.. - Adve, Bagrodia..   (Correct)

....expense of purchasing it; second, one can do the simulation fast: there is no need to simulate the workstation's behavior (for example down to the level of memory references) since that part of the hardware is readily available. Many of the early simulators were designed for sequential execution [9, 13, 14]. However, even with the use of abstract models and direct execution, sequential program simulators tended to be slow with slowdown factors ranging from 2 to 35 for each process in the simulated program [9]. Several recent efforts have been exploring the use of parallel execution [10, 16, 17, 23, ....

R. C. Covington, S. Madala, V. Mehta, J. R. Jump, and J. B. Sinclair, "The Rice parallel processing testbed," Proceedings of 1988 ACM SIGMETRICS Conference on Measurement and Modeling of Computer Systems, Santa Fe, NM, USA, 1988.


Techniques for Cache and Memory Simulation Using Address.. - Holliday (1990)   (9 citations)  (Correct)

....of the compiler for a uniprocessor program. At the start of each basic block instructions are inserted to increment a counter by the estimated time of the basic block. The estimated time of a basic block can be done statically from instruction counts and types. As noted by Covington et al. [61], this approach can be extended to trace events as well as the control flow. They introduced the term execution driven simulation for this extension. In the Rice Parallel Processing Testbed they used it, for example, for tracing message-passing on a hypercube. Stunkel and Fuchs [62] noted that ....

R. Covington, S. Madala, V. Mehta, J. Jump, and J. Sinclair, "The Rice parallel processing testbed," in Proceedings of the 1988 ACM Sigmetrics Conference on Measurement and Modeling of Computer Systems, (Santa Fe, NM), pp. 4--11, May 1988.


Wisconsin Wind Tunnel II: A Fast and Portable Parallel.. - Mukherjee (1997)   (24 citations)  (Correct)

....to evaluate computers without building hardware prototypes. However, simulating big problems parallel machines with realistic workloads requires large amounts of computation and memory. Two techniques, direct execution and parallel simulation, make this approach feasible. In direct execution [6], a program from the system under study (the target) runs on an existing system (the host). For example, a target's floating point multiply executes as a floating point multiply instruction on the host. The host calculates the target's execution time and only simulates operations unavailable on ....

....simulates operations unavailable on the host. Direct execution can run orders of magnitude faster than pure software simulation (which interprets every target instruction). This approach can accurately calculate the target execution time for statically scheduled processors with blocking caches [6]. However, computing the execution time for dynamically scheduled processors with non blocking caches is an open problem [15]. Parallel simulation of a parallel computer further speeds simulation by exploiting the parallelism inherent in the target parallel computer and the parallel host's large ....

[Article contains additional citation context not shown here]

R.C. Covington, S. Madala, V. Mehta, J.R. Jump, and J.B. Sinclair. The Rice Parallel Processing Testbed. In Proceedings of the 1988 ACM SIGMETRICS Conference on Measurement and Modeling of Computer Systems, pages 4--11, May 1988.


Simulation of the SCI Transport Layer on the Wisconsin Wind.. - Douglas Burger And (1995)   (2 citations)  (Correct)

....Wind Tunnel (WWT) [11] is one such parallel simulator, which runs on a Thinking Machines CM 5. WWT uses conservative, discrete event simulation [5, 8, 10, 15] to accurately calculate the logical execution time of the target application. Superior performance is obtained through direct execution [3] of identical target and host instructions on the native CM 5 hardware. WWT's solution to the problem of communicating simulation state is to guarantee windows of target time during which a node can perform simulation without requiring state from other physical nodes. This permits fixed quanta of ....

R.C. Covington, S. Madala, V. Mehta, J.R. Jump, and J.B. Sinclair. The Rice Parallel Processing Testbed. In Proceedings of the 1988 ACM SIGMETRICS Conference on Measurements and Modeling of Computer Systems, pages 4--11, May 1988.


MPSS: a Simulator of Message-Passing Applications for.. - Aversa, Mazzocca, Romano (1996)   (1 citation)  (Correct)

....possible to derive information on the performance for several different (real or hypothetical) hardware configurations from the trace collected in the test run. MPSS is based on a simulation engine developed for the PS simulator [7, 8] using Ptolemy [9]. Unlike PS, which adopts an execution driven [10] approach, MPSS is trace driven [11]. However, this is not a substantial difference. During the simulation, the information that can be obtained from the previously collected traces is used to derive the duration of the sequential computation phases, i.e. the execution time of the blocks of code ....

R. C. Convington, S. Madala, V. Mehta, J. R. Jump, J. B. Sinclair. The Rice Parallel Processing Testbed. In Proc. 1988 ACM SIGMETRICS Conf. on Measurement and Modeling of Computer Systems, pages 4-11, 1988.


Accuracy vs. Performance in Parallel Simulation of.. - Burger, Wood (1995)   (8 citations)  (Correct)

....CM 5) Given an application, topology, protocol, and network model, WWT calculates the logical execution time (in cycles). WWT achieves good simulation performance for two reasons. First, direct execution, in which a target instruction is simulated by executing the identical host instruction [7], allows most target instructions to execute at the speed of the underlying host hardware. WWT uses a fine-grain extension of shared virtual memory [17] to directly execute all load and store instructions (excluding instruction fetches), rather than just computation instructions. Second, WWT ....

R.C. Covington, S. Madala, V. Mehta, J.R. Jump, and J.B. Sinclair. The Rice Parallel Processing Testbed. In Proceedings of the 1988 ACM SIGMETRICS Conference on Measurements and Modeling of Computer Systems, pages 4--11, May 1988.


Design and Evaluation of Network Interfaces for System Area.. - Mukherjee (1998)   (Correct)

....portable simulator for parallel architectures. I developed WWT II jointly with Babak Falsafi, Mike Litzkow, and Steve Reinhardt. WWT II inherits many features of the original Wisconsin Wind Tunnel (WWT) [99, 91], including distributed, discrete event simulation techniques [40], direct execution [27], and accurate calculation of a simulated architecture's execution time via executable editing [65]. However, unlike WWT, which only runs on the TMC CM 5, we designed WWT II to be easily portable. Consequently, WWT II runs on several uniprocessor and multiprocessor SPARC platforms, including ....

R.C. Covington, S. Madala, V. Mehta, J.R. Jump, and J.B. Sinclair. The Rice Parallel Processing Testbed. In Proceedings of the 1988 ACM SIGMETRICS Conference on Measurement and Modeling of Computer Systems, pages 4--11, May 1988.


Memory Latency Reduction via Data Prefetching and Data Forwarding .. - Poulsen (1994)   (Correct)

No context found.

R. C. Covington, S. Madala, V. Mehta, J. R. Jump, and J. B. Sinclair, "The Rice Parallel Processing Testbed," Proceedings of the ACM Sigmetrics Conference on Measurement and Modeling of Computer Systems, pp. 4-11, 1988.


Fast Accurate Simulation of Large Shared - Memory Multiprocessors Revised   (Correct)

No context found.

R. C. Covington et al. The Rice Parallel Processing Testbed. In Proc. 1988.


Unknown - Stephen Von Worley   (Correct)

No context found.

R.G. Convington, S. Madala, V. Mehta, J.R. Jump, and J.B. Sinclair. "The Rice Parallel Processing Testbed" Proceedings of the 1988.


Fast and Portable Parallel Architecture.. - Mukherjee.. (2000)   (Correct)

No context found.

[1] R. Covington, S. Madala, V. Mehta, J. Jump, and J. Sinclair. The Rice parallel processing testbed. In Proceedings of the 1988.


Trap-driven Memory Simulation - Uhlig (1995)   (2 citations)  (Correct)

No context found.

Covington, R. C., Madala, S., Mehta, V., Jump, J. R. and Sinclair, J. B. The Rice parallel processing testbed. In Proceedings of the


Design of a Simulator for Large-Scale Distributed.. - Xiaowen Liu Sudikoff   (Correct)

No context found.

R. G. Covington, S. Madala, V. Mehta, J. R. Jump, and J. B. Sinclair, "The rice parallel processing testbed," in Proceedings of the 1988 ACM SIGMETRICS Conference, pp. 4--11, 1988.

CiteSeer.IST - Copyright Penn State and NEC