P. Dickens, P. Heidelberger, and D. Nicol. A distributed memory LAPSE: Parallel simulation of message-passing programs. In Proceedings of the 1994.


This paper is cited in the following contexts:
Compiler-Optimized Simulation of Large-Scale.. - Adve, Bagrodia..

.... [9, 13, 14]. However, even with the use of abstract models and direct execution, sequential program simulators tended to be slow, with slowdown factors ranging from 2 to 35 for each process in the simulated program [9]. Several recent efforts have been exploring the use of parallel execution [10, 17, 18, 24, 25, 28, 29] to reduce the model execution times, with varying degrees of success. In order to have multiple simulation processes and maintain accuracy, simulations use protocols to synchronize the processes. One of the widely used protocols is the Quantum protocol, which lets the processes compute for a ....

....synchronizing them. In general, synchronous simulators that use the quantum protocol must trade off simulation accuracy with speed (frequent synchronizations slow down the simulation, but synchronizing less frequently introduces errors, by possibly executing statements out of order). Both LAPSE [17, 18] and Parallel Proteus use some form of program analysis to increase the simulation window beyond a fixed quantum. MPI Sim uses parallel discrete event simulation with the conservative protocol [25, 28]. Supported protocols include the Null Message Protocol (NMP) [11], the Conditional Event ....

P. Dickens, P. Heidelberger, and D. Nicol. Distributed memory lapse: Parallel simulation of message-passing programs. In Proceedings of 8th Workshop on Parallel and Distributed Simulation (PADS'94), 1994.


Programming Environments for High-Performance Grid .. - Kielmann, Bal.. (2002)   (1 citation)

....the communication related runtime system for parallel applications that are running on top of such Grid infrastructures. Network simulators like NSE [26] or DaSSF [27] focus on packet delivery and network protocols, rather than the network behavior as it is observed by an application. LAPSE [28] simulates parallel applications on configurations with more than the available number of CPUs; the network behavior simulates the Intel Paragon machines. The MicroGrid software [29] virtualizes the Grid resources like memory, CPU, and networks. For the simulation, all relevant system calls are ....

P. M. Dickens, P. Heidelberger, D. M. Nicol, A Distributed Memory LAPSE: Parallel Simulation of Message-Passing Programs, in: Proceedings of the 8th Workshop on Parallel and Distributed Simulation (PADS '94), 1994.


Improving Lookahead in Parallel Discrete Event.. - Deelman.. (2001)

.... designed for sequential execution [8, 12, 13]. However, even with the use of direct execution, sequential program simulators tended to be slow, with slowdown factors ranging from 2 to 35 for each process in the simulated program [8]. Several efforts have been exploring the use of parallel execution [10, 14, 17, 19, 22] to reduce the model execution times, with varying degrees of success. Many such simulators use sequential or parallel implementations of the quantum protocol. In order to support multiple simulation processes (possibly executing on multiple processors) and maintain accuracy, parallel simulation ....

....[23], thus reducing the accuracy of the simulations. Although MPI SIM is the only simulator that identifies communication patterns and directly exploits them for the purposes of synchronization, other simulators have used techniques to reduce the synchronization overhead. Among them are LAPSE [14] and Parallel Proteus [17]. Both LAPSE and Parallel Proteus use some form of program analysis to increase the simulation window beyond a fixed quantum, without sacrificing accuracy. LAPSE uses a quantum protocol called WHOA (Window based Halting On Appointments) and runtime analysis to determine ....

P. Dickens, P. Heidelberger, and D. Nicol, "A Distributed Memory LAPSE: Parallel Simulation of Message-Passing Programs," PADS, 1994.


Compiler-Optimized Simulation of Large-Scale.. - Adve, Bagrodia..

.... [9, 13, 14]. However, even with the use of abstract models and direct execution, sequential program simulators tended to be slow, with slowdown factors ranging from 2 to 35 for each process in the simulated program [9]. Several recent efforts have been exploring the use of parallel execution [10, 16, 17, 23, 24, 27, 28] to reduce the model execution times, with varying degrees of success. In order to have multiple simulation processes and maintain accuracy, simulations use protocols to synchronize the processes. One of the widely used protocols is the Quantum protocol, which lets the processes compute for a ....

....before synchronizing them. In general, synchronous simulators that use the quantum protocol must trade off simulation accuracy with speed: frequent synchronizations slow down the simulation, but synchronizing less frequently introduces errors, by possibly executing statements out of order. Both LAPSE [16, 17] and Parallel Proteus use some form of program analysis to increase the simulation window beyond a fixed quantum. MPI Sim uses parallel discrete event simulation with the conservative protocol [24, 27]. Supported protocols include the Null Message Protocol (NMP) [11], the Conditional Event ....

P. Dickens, P. Heidelberger, and D. Nicol, "A Distributed Memory LAPSE: Parallel Simulation of Message-Passing Programs," Proceedings of 8th Workshop on Parallel and Distributed Simulation, 1994.


Asynchronous Parallel Simulation of Parallel Programs - Prakash, Deelman, Bagrodia (2000)   (1 citation)

.... This research was supported in part by an ARPA CSTO Award (No. F30602 94 C 0273) 1 tended to be slow with slowdown factors ranging from 2 to 35 for each process in the simulated program [BDCW91]. Several recent efforts have been exploring the use of parallel execution [LW96, RHL 93, DHN94, PB95, CH96] to reduce the model execution times, with varying degrees of success. In this paper, we describe a parallel simulator, which can model the behavior of parallel programs using conservative synchronization algorithms [Mis86]. The main contributions of this paper are as follows: We ....

....including cache and CPU models of one specific architecture, but also such items as device drivers. On the other hand, it might be too restrictive, because it supports only the MIPS architecture. Two simulation engines which use approaches similar to ours are Parallel Proteus [LW96] and LAPSE [DHN94]. In order to have multiple simulation processes and maintain accuracy, simulations use protocols to synchronize the processes. One of the widely used protocols is the Quantum protocol, which lets the processes compute for a given quantum before synchronizing them. In general, synchronous ....

P. Dickens, P. Heidelberger, and D. Nicol. A distributed memory lapse: Parallel simulation of message-passing programs. In Workshop on Parallel and Distributed Simulation, pages 32-38, July 1994.


Asynchronous Parallel Simulation of Parallel Programs - Prakash, Deelman, Bagrodia (2000)   (1 citation)

.... Parts of this work were previously reported in [PB95] and [PB98] 1 simulators tended to be slow with slowdown factors ranging from 2 to 35 for each process in the simulated program [BDCW91]. Several recent efforts have been exploring the use of parallel execution [LW96, RHL 93, DHN94, PB95, CH96] to reduce the model execution times, with varying degrees of success. Our simulator, MPI SIM, is capable of simulating a set of core MPI [GL93] functions such as non blocking, synchronous or buffered sends and non blocking receives. These are the building blocks of more complex ....

....[RHWG95], thus reducing the accuracy of the simulations. Although MPI SIM is the only simulator that identifies communication patterns and directly exploits them for the purposes of synchronization, other simulators have used techniques to reduce the synchronization overhead. Among them are LAPSE [DHN94] and Parallel Proteus [LW96]. Both LAPSE and Parallel Proteus use some form of program analysis to increase the simulation window beyond a fixed quantum, without sacrificing accuracy. LAPSE is a parallel simulation engine for programs that use the message passing library of the Intel Paragon. It ....

P. Dickens, P. Heidelberger, and D. Nicol. A distributed memory lapse: Parallel simulation of message-passing programs. In Workshop on Parallel and Distributed Simulation, pages 32--38, July 1994.


Simulating Architecture Adaptive Algorithms with MISS-PVM - Kvasnicka, Ueberhuber (1997)

....metrics like speedup, scaled speedup, sizeup, or an isoefficiency function and an application for scalability problems. Lapse is an example of a simulator which uses parallel simulation on a small number of processors to predict the performance on large numbers of processors (Dickens et al. [9] and Dickens et al. [10]). Lapse distinguishes between execution threads and communication. Only the communication has to be simulated. Therefore there are two extremes in the simulation: if communication is rare, simulation is (at first sight) slowed down only by the number of processors; if ....

....to be on another node. They can run on the processor while it is waiting to perform the first task. Taking this behavior into account, it is possible to simulate several processors of a bigger machine on only a few processors of the host machine, which can be very advantageous (Dickens et al. [9] and Dickens et al. [10]). Intrusiveness: To improve timing estimations, the time used for timing routines and for simulation functions has to be taken into account. On computers with accurate and reliable timers this can be easily achieved as straightforward measurements (i.e. on the SGI Power ....

P. M. Dickens, P. Heidelberger, D. M. Nicol, A Distributed Memory LAPSE: Parallel Simulation of Message-Passing Programs, Proceedings 8th Workshop on Parallel and Distributed Simulation (PADS '94), (D. K. Arvind, R. Bagrodia, J. Y. B. Lin, Eds.), SCS, San Diego, CA, USA, 1994, pp. 32--38.


Timepatch: A Novel Technique for the Parallel Simulation.. - Umakishore Ramachandran

....methods. In [HS90] it is also shown that it is sufficient to execute the traces for a set of cache lines instead of the entire cache. An implementation of a parallel trace driven simulation on a MasPar is discussed in [NGLR92], which offers extensions to the above approach. Dickens et al. [DHN94] suggest a technique for parallel simulation of message passing programs. Their objective is to simulate the performance of these programs on a larger configuration of a target machine on a smaller host machine. The Wisconsin Wind Tunnel [RHL 92] uses a direct execution approach to simulate a ....

P. M. Dickens, P. Heidelberger, and D. Nicol. A distributed memory LAPSE: Parallel simulation of message passing programs. In 8th Workshop on Parallel and Distributed Simulation, pages 32--38, July 1994.


Mpi-Sim: Using Parallel Simulation To Evaluate Mpi Programs - Prakash, Bagrodia (1998)   (7 citations)

....Most existing simulators (Brewer et al. 1991, Davis et al. 1991, Covington et al. 1991) use direct execution to simulate the sequential blocks of code, and simulate only the communication and/or I/O events. As sequential execution of such models (Legedza and Weihl 1996, Reinhardt et al. 1993, Dickens et al. 1994, Dickens et al. 1996) is typically slow (slowdown factors of 2 to 15 per processor are not atypical), several researchers have used parallel execution of such models with varying degrees of success. The primary difficulty in obtaining better performance is the significant synchronization overhead ....

.... 3 PARALLEL EXECUTION OF MPI SIMULATION MODEL. Two types of protocols have commonly been used in the parallel simulation of parallel programs: the synchronous or quantum protocol (e.g. SimOS (Rosenblum et al. 1995, Rosenblum et al. 1997)) and the asynchronous protocols (e.g. LAPSE (Dickens et al. 1994)). In the synchronous protocol, each LP periodically simulates its corresponding process for a previously determined interval Q, termed the simulation quantum, and then executes a global barrier. These barriers are used to ensure that messages from remote LPs will be accepted in their correct ....
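
The synchronous (quantum) protocol described in this context can be illustrated with a minimal Python sketch. It is a toy model, not code from MPI-SIM, SimOS, or LAPSE: the function names, the integer tick times, and the rule that a message becomes visible to its receiver only at the barrier closing the window in which it was sent are all assumptions made for illustration.

```python
def quantum_acceptance_times(sends, quantum):
    """Toy quantum protocol (hypothetical model, not any simulator's real API).

    sends: iterable of (send_time, latency, payload), times in integer ticks.
    Returns {payload: (true_arrival, accepted_at)}.
    """
    result = {}
    for send_time, latency, payload in sends:
        # The sending LP produces the message inside the window [k*Q, (k+1)*Q)
        # that contains send_time; the global barrier sits at the window's end.
        window_end = (send_time // quantum + 1) * quantum
        true_arrival = send_time + latency
        # Assumption: messages are exchanged only at the barrier, so the
        # receiver cannot accept the message before window_end.
        accepted_at = max(true_arrival, window_end)
        result[payload] = (true_arrival, accepted_at)
    return result


def max_acceptance_error(sends, quantum):
    """Largest delay between true arrival and acceptance, in ticks."""
    times = quantum_acceptance_times(sends, quantum).values()
    return max((accepted - arrival for arrival, accepted in times), default=0)


if __name__ == "__main__":
    demo_sends = [(3, 5, "a"), (12, 5, "b")]  # made-up example messages
    for q in (5, 20):
        print(q, max_acceptance_error(demo_sends, q))
```

In this toy model, a quantum Q no larger than the smallest message latency gives zero acceptance error, while a larger Q means fewer barriers but late acceptance of some messages.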

[Article contains additional citation context not shown here]

Dickens, P., P. Heidelberger, and D. Nicol. A Distributed Memory Lapse: Parallel Simulation of Message-Passing Programs. In Workshop on Parallel and Distributed Simulation, pages 32-38, July 1994.


Phase-based Adaptive Dynamic Load Balancing for Parallel Tree.. - Haron (1998)

....algorithms for computation dominated and communication dominated applications, with individual node grain size being 10000 and 100, respectively. There is a consistent overestimation of the simulation results for the computation dominated problem. This is largely due to the non linear cache performance [18] of the T3D. The simulator, on the other hand, simulates the floating point operations linearly by multiplying the number of operations with a single operation cost. For the computation dominated problem on 1 to 128 processors, the simulation results are within 15% of the actual measurements ....

P. Dickens, P. Heidelberger, and D. Nicol. A Distributed Memory LAPSE: Parallel Simulation of Message-Passing Programs. In Workshop on Parallel and Distributed Simulation, pages 32--38, July 1994.


Asynchronous Parallel Simulation of Parallel Programs - Bagrodia, Prakash

.... even with the use of abstract models and direct execution, sequential program simulators tended to be slow, with slowdown factors ranging from 2 to 35 for each process in the simulated program [BDCW91]. Several recent efforts have been exploring the use of parallel model execution [LW96, RHL 93, DHN94, PB95b, CH96] to reduce the model execution times, with varying degrees of success. In this paper, we describe a parallel simulator for parallel programs using conservative synchronization algorithms. The primary contributions of this paper are as follows: This research was supported in part by ....

.... simulation engine, Wisconsin Wind Tunnel [RHL 93], a shared memory architecture simulation engine, and SimOS [RBDH97], a complete system simulator (multiple programs plus operating system). Two simulation engines which use approaches similar to ours are Parallel Proteus [LW96] and LAPSE [DHN94]. In Proteus, the application to be simulated is written in a superset of C, and constructs are provided to control the placement of data. Library routines are provided for message passing, thread management, memory management and data collection. The target architecture is specified in terms of ....

P. Dickens, P. Heidelberger, and D. Nicol. A distributed memory lapse: Parallel simulation of message-passing programs. In Workshop on Parallel and Distributed Simulation, pages 32--38, July 1994.


Where is Time Spent in Message-Passing and Shared-Memory.. - Chandra, Larus, Rogers (1994)   (57 citations)

....machine. For the most part, this change involved disabling the shared address space, adding calls into WWT to simulate the memory mapped locations in the CM-5 network interface, and using WWT's event mechanisms to simulate message transmission. Our message passing simulator is similar to LAPSE [5], a direct execution simulator for message passing machines that runs on the Intel Paragon. One difference between the two simulators is that LAPSE models network contention. To provide a communication library for message-passing programs, we ported the Active Message [22] layer from Thinking ....

Phillip M. Dickens, Philip Heidelberger, and David M. Nicol. A Distributed Memory LAPSE: Parallel Simulation of Message-Passing Programs. In Proceedings of the 8th Workshop on Parallel and Distributed Simulation (PADS '94), pages 32--38, July 1994.


Efficient Simulation of Message-Passing in Distributed-Memory.. - Demaine (1996)

....tedious and should be avoided whenever possible. 2.1.6 Parallelism and Concurrency. Our final consideration is parallelism present within the simulator. In simulating a parallel program, some authors claim that there is inherent parallelism available, which can be exploited to yield good speedup [26, 71]. The idea is that the simulator needs to use communication only when the program's processes communicate. Essentially, the parallelism from the program is carried over to the simulator. For tightly coupled simulation systems, this makes sense: part of the overall system is the parallel program. ....

....
Simulator | Coupling | Timing method | Network sim. | Language | Parallel | Threads
RPPT [19] | Tight | Augmentation | Accurate | Concurrent C [52] | No | Yes
Proteus [8] | Tight | Augmentation | Accurate; Agarwal | Custom | No | Yes
EPPP [63] | Tight | Augmentation | Accurate; Agarwal | HPC [30] | No | Yes
LAPSE [26] | Tight | Augmentation | Accurate | NX | Yes | No
PAPS [78] | Loose | None | Accurate | Task graph | No
PerPreT [5] | Loose | Analytic | Accurate | LOOP [6] | No
ExtraP [57] | Loose | CPU time | Accurate | pC [4] | No | Yes
APNM [67] | Either | Augmentation | No traffic | PVM | No | Varies
PUPPET | Loose | CPU time | No traffic | MPI NX | No | Varies
....

[Article contains additional citation context not shown here]

Phillip M. Dickens, Philip Heidelberger, and David M. Nicol. A distributed memory LAPSE: Parallel simulation of message-passing programs. In Proceedings of the 8th Workshop on Parallel and Distributed Simulation, volume 24 of SIGSIM Newsletter, pages 32--38, Edinburgh, Scotland, July 1994. IEEE Computer Society Press.


Parallel Simulation of Parallel File Systems and I/O Programs - Bagrodia, Docy, Kahn   (4 citations)

....events. Even with direct execution, sequential simulation of large parallel programs can be very time consuming [BDCW91, DGH91, CDJ 91]. This has led to a variety of attempts to use parallel execution to reduce simulation times for models that simulate parallel programs [LW96, RHL 93, DHN94]. Most of the existing parallel program simulators are used to evaluate the performance of the memory hierarchy, interconnection network, or processor architecture. To the best of our knowledge, none of the existing parallel simulators have been used to evaluate parallel I/O systems. Specific ....

P. Dickens, P. Heidelberger, and D. Nicol. A distributed memory lapse: Parallel simulation of message-passing programs. In Workshop on Parallel and Distributed Simulation, pages 32--38, July 1994.


Parallel Numerical Algorithms Workshop - Icase And

....benefit in using all nodes of a large machine to simulate the performance of a code running on a much larger (perhaps as yet unbuilt) machine. We have developed a tool, LAPSE (Large Application Parallel Simulation Environment), which provides the necessary simulation capability on the Intel Paragon (Dickens et al. 1994). Given an application's makefile, LAPSE transparently transforms a code intended for N processors into a set of N LAPSE application processes, and P simulation processes, where P ≤ N is the number of processors actually used. Each processor is responsible for executing N/P application ....

P. Dickens, P. Heidelberger, D. Nicol. A Distributed Memory LAPSE: Parallel Simulation of Message Passing Programs. In Proceedings of the 1994 Workshop on Parallel and Distributed Simulation, Edinburgh, Scotland, to appear.


Timepatch: A Novel Technique for the Parallel Simulation of.. - Gautam Shah (1994)   (1 citation)

....methods. In [HS90] it is also shown that it is sufficient to execute the traces for a set of cache lines instead of the entire cache. An implementation of a parallel trace driven simulation on a MasPar is discussed in [NGLR92], which offers extensions to the above approach. Dickens et al. [DHN94] suggest a technique for parallel simulation of message passing programs. Their objective is to simulate the performance of these programs on a larger configuration of a target machine on a smaller host machine. The Wisconsin Wind Tunnel [RHL 92] uses a direct execution approach to simulate a ....

P. M. Dickens, P. Heidelberger, and D. Nicol. A distributed memory LAPSE: Parallel simulation of message passing programs. In 8th Workshop on Parallel and Distributed Simulation, pages 32--38, July 1994.


Parallel Simulation of Data parallel Programs - Sundeep Prakash (1995)

....and matrix multiplication. 1 Introduction. Simulators for parallel programs can be effectively utilized to test, debug, and predict performance of parallel programs on a diverse set of parallel architectures. A variety of simulators have been designed [BDCW91, DGH91, RHL 93, CDJ 91, DHN94] to estimate the performance of a parallel program. Most simulators were designed to estimate the performance of asynchronous or task parallel programs. With few exceptions, these simulators fall broadly into two classes: the simulator itself is sequential and can be ported to almost any ....

....by inserting calls to UserSend and possibly a global memory simulator at the required points. A profiler is used to augment the code for direct execution. Parallel simulation engines include the Wisconsin Wind Tunnel [RHL 93] (WWT) and the Large Application Parallel Simulation Environment [DHN94] (LAPSE). WWT is a simulator of cache coherent, shared memory computers that runs on the Thinking Machines CM-5. It provides fast parallel simulation by direct execution of local code, one host processor per target processor. Shared memory of the target architecture is simulated by trapping on ....

P. Dickens, P. Heidelberger, and D. Nicol. A distributed memory lapse: Parallel simulation of message-passing programs. In Workshop on Parallel and Distributed Simulation, pages 32--38, July 1994.


On Bottleneck Partitioning of k-ary n-cubes - David Nicol Weizhen (1994)   Self-citation (Nicol)

No context found.

P. Dickens, P. Heidelberger, and D. Nicol. A distributed memory LAPSE: Parallel simulation of message-passing programs. In Proceedings of the 1994.


Parallelized Direct Execution Simulation of.. - Dickens.. (1994)   (19 citations)  Self-citation (Dickens Heidelberger Nicol)

....the time to natively execute the application to that when the application is augmented with instruction counting, but not timing simulation. It is harder to separate simulation overhead from operating system overhead, nor shall we attempt to do so here. However, some measurements are presented in [13] indicating that OSF/1 on the Paragon has process management overheads that increase superlinearly as the number of processes per node increases. We do note that, parallel simulation overheads aside, LAPSE must send at least twice as many messages as the natively executing application; each message ....

P.M. Dickens, P. Heidelberger, and D.M. Nicol. A distributed memory LAPSE: Parallel simulation of message-passing programs. In Proceedings of the 8th Workshop on Parallel and Distributed Simulation (PADS), Edinburgh, Scotland, 1994. The Society of Computer Simulation, pp. 32--38.


Parallelized Direct Execution Simulation of.. - Dickens.. (1994)   (19 citations)  Self-citation (Dickens Heidelberger Nicol)

....the time to natively execute the application to that when the application is augmented with instruction counting, but not timing simulation. It is harder to separate simulation overhead from operating system overhead, nor shall we attempt to do so here. However, some measurements are presented in [8] indicating that OSF/1 on the Paragon has process management overheads that increase superlinearly as the number of processes per node increases. We do note that, parallel simulation overheads aside, LAPSE must send at least twice as many messages as the natively executing application; each message ....

P.M. Dickens, P. Heidelberger, and D.M. Nicol. A distributed memory LAPSE: Parallel simulation of message-passing programs. In Proceedings of the 8th Workshop on Parallel and Distributed Simulation (PADS), Edinburgh, Scotland, 1994. The Society of Computer Simulation. to appear.


Parallel Execution for Serial Simulators - Nicol, Heidelberger (1996)   (5 citations)  Self-citation (Heidelberger Nicol)

....calls are optional; without them, the computation of lookahead is automatically handled by the U.P.S. carriers. 5 Synchronization Protocols. U.P.S. provides three synchronization protocols: YAWNS (Yet Another Windowing Network Simulator, see [16, 18]), WHOA (Windows, Halting On Appointments, see [6]) and PUCS (Parallel Uniformized Continuous time Markov Chain Simulator, see [13]). One of these protocols is assigned to a carrier at its declaration; thus a U.P.S. model may use simultaneously any combination of these protocols. This feature allows one to tailor the synchronization method to ....

P.M. Dickens, P. Heidelberger and D.M. Nicol. A distributed memory LAPSE: Parallel simulation of message-passing programs. In Proceedings of the 8th Workshop on Parallel and Distributed Simulation (PADS), 32-38, IEEE Computer Society Press, 1994.
