Results 1 - 10 of 980
Parallel discrete event simulation, 1990
Cited by 818 (39 self)
Parallel discrete event simulation (PDES), sometimes called distributed simulation, refers to the execution of a single discrete event simulation program on a parallel computer. PDES has attracted a considerable amount of interest in recent years. From a pragmatic standpoint, this interest arises from the fact that large simulations in engineering, computer science, economics, and military applications, to mention a few, consume enormous amounts of time
A Survey of Rollback-Recovery Protocols in Message-Passing Systems, 1996
Cited by 716 (22 self)
this paper, we use the terms event logging and message logging interchangeably
Optimistic recovery in distributed systems - ACM Transactions on Computer Systems, 1985
Cited by 355 (6 self)
Optimistic Recovery is a new technique supporting application-independent transparent recovery from processor failures in distributed systems. In optimistic recovery, communication, computation, and checkpointing proceed asynchronously. Synchronization is replaced by causal dependency tracking, which enables a posteriori reconstruction of a consistent distributed system state following a failure using process rollback and message replay. Because there is no synchronization among computation, communication, and checkpointing, optimistic recovery can tolerate the failure of an arbitrary number of processors and yields better throughput and response time than other general recovery techniques whenever failures are infrequent.
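The rollback-and-replay step mentioned in this abstract can be illustrated with a minimal single-process sketch. It is not the paper's protocol: it logs every message synchronously and omits the causal dependency tracking that optimistic recovery relies on. The class `LoggedProcess` and its methods are hypothetical names chosen for illustration, and the message log is simply assumed to survive a crash.

```python
import copy


class LoggedProcess:
    """Toy deterministic process whose state is a running total of received integers."""

    def __init__(self):
        self.state = {"total": 0, "delivered": 0}
        self.checkpoint = copy.deepcopy(self.state)
        # Assumption: this list stands in for stable storage that survives a crash.
        self.log = []

    def _apply(self, value):
        # Deterministic state transition, shared by normal delivery and replay.
        self.state["total"] += value
        self.state["delivered"] += 1

    def deliver(self, value):
        self.log.append(value)  # log the message, then process it
        self._apply(value)

    def take_checkpoint(self):
        self.checkpoint = copy.deepcopy(self.state)

    def crash_and_recover(self):
        # Roll back to the last checkpoint, then replay the logged messages
        # that were delivered after that checkpoint was taken.
        self.state = copy.deepcopy(self.checkpoint)
        for value in self.log[self.state["delivered"]:]:
            self._apply(value)


process = LoggedProcess()
process.deliver(3)
process.take_checkpoint()
process.deliver(4)
process.deliver(5)
state_before_crash = copy.deepcopy(process.state)
process.crash_and_recover()
```

Because the process is deterministic, replaying the logged messages from the checkpoint reproduces the state it had before the crash.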
Distributed discrete-event simulation - ACM Computing Surveys, 1986
Cited by 288 (0 self)
Traditional discrete-event simulations employ an inherently sequential algorithm. In practice, simulations of large systems are limited by this sequentiality, because only a modest number of events can be simulated. Distributed discrete-event simulation (carried out on a network of processors with asynchronous message-communicating capabilities) is
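The "inherently sequential algorithm" this abstract refers to is commonly organized around a single time-ordered event list. The sketch below is a generic illustration under that assumption, not code from the survey; `run_sequential`, `on_arrival`, and the arrival/departure model are hypothetical.

```python
import heapq


def run_sequential(initial_events, handlers, end_time):
    """Process events one at a time in nondecreasing timestamp order."""
    # Each queue entry is (timestamp, sequence number, event name); the
    # sequence number breaks timestamp ties in insertion order.
    queue = []
    seq = 0
    for timestamp, name in initial_events:
        heapq.heappush(queue, (timestamp, seq, name))
        seq += 1

    trace = []
    while queue:
        timestamp, _, name = heapq.heappop(queue)
        if timestamp > end_time:
            break
        trace.append((timestamp, name))
        handler = handlers.get(name)
        if handler is None:
            continue
        # A handler may schedule future events as (delay, name) pairs.
        for delay, new_name in handler(timestamp):
            heapq.heappush(queue, (timestamp + delay, seq, new_name))
            seq += 1
    return trace


def on_arrival(now):
    # Hypothetical model: every arrival schedules a departure 2.0 time units later.
    return [(2.0, "departure")]


trace = run_sequential(
    [(0.0, "arrival"), (1.0, "arrival")],
    {"arrival": on_arrival},
    end_time=10.0,
)
```

Only the event with the smallest timestamp is processed at each step, which is the sequential bottleneck the abstract describes.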
Parsec: A Parallel Simulation Environment for Complex Systems - IEEE Computer, 1998
Cited by 247 (23 self)
ulating large-scale systems. Widespread use of parallel simulation, however, has been significantly hindered by a lack of tools for integrating parallel model execution into the overall framework of system simulation. Although a number of algorithmic alternatives exist for parallel execution of discrete-event simulation models, performance analysts not expert in parallel simulation have relatively few tools giving them flexibility to experiment with multiple algorithmic or architectural alternatives for model execution. Another drawback to widespread use of simulations is the cost of model design and maintenance. The design and development costs for detailed simulation models for complex systems can easily rival the costs for the physical systems themselves. The simulation environment we developed at UCLA attempts to address some of these issues by providing these features:
- An easy path for the migration of simulation models to operational software prototypes.
- Implementation on
The Wisconsin Wind Tunnel: Virtual Prototyping of Parallel Computers - In Proceedings of the 1993 ACM Sigmetrics Conference on Measurement and Modeling of Computer Systems, 1993
Cited by 226 (29 self)
We have developed a new technique for evaluating cache-coherent, shared-memory computers. The Wisconsin Wind Tunnel (WWT) runs a parallel shared-memory program on a parallel computer (CM-5) and uses execution-driven, distributed, discrete-event simulation to accurately calculate program execution time. WWT is a virtual prototype that exploits similarities between the system under design (the target) and an existing evaluation platform (the host). The host directly executes all target program instructions and memory references that hit in the target cache. WWT's shared memory uses the CM-5 memory's error-correcting code (ECC) as valid bits for a fine-grained extension of shared virtual memory. Only memory references that miss in the target cache trap to WWT, which simulates a cache-coherence protocol. WWT correctly interleaves target machine events and calculates target program execution time. WWT runs on parallel computers with greater speed and memory capacity than uniprocessors. WWT'...
Recovery in Distributed Systems Using Optimistic Message Logging and Checkpointing, 1988
Cited by 224 (14 self)
In a distributed system using message logging and checkpointing to provide fault tolerance, there is always a unique maximum recoverable system state, regardless of the message logging protocol used. The proof of this relies on the observation that the set of system states that have occurred during any single execution of a system forms a lattice, with the sets of consistent and recoverable system states as sublattices. The maximum recoverable system state never decreases, and if all messages are eventually logged, the domino effect cannot occur. This paper presents a general model for reasoning about recovery in such a system and, based on this model, an efficient algorithm for determining the maximum recoverable system state at any time. This work unifies existing approaches to fault tolerance based on message logging and checkpointing, and improves on existing methods for optimistic recovery in distributed systems.
Optimistic parallelism requires abstractions - In PLDI, 2007
Cited by 179 (24 self)
Irregular applications, which manipulate large, pointer-based data structures like graphs, are difficult to parallelize manually. Automatic tools and techniques such as restructuring compilers and runtime speculative execution have failed to uncover much parallelism in these applications, in spite of a lot of effort by the research community. These difficulties have even led some researchers to wonder if there is any coarse-grain parallelism worth exploiting in irregular applications. In this paper, we describe two real-world irregular applications: a Delaunay mesh refinement application and a graphics application that performs agglomerative clustering. By studying the algorithms and data structures used in these applications, we show that there is substantial coarse-grain data parallelism in these applications, but that this parallelism is very dependent on the input data and therefore cannot be uncovered by compiler analysis. In principle, optimistic techniques such as thread-level speculation can be used to uncover this parallelism, but we argue that current implementations cannot accomplish this because they do not use the proper abstractions for the data structures in these programs. These insights have informed our design of the Galois system, an object-based optimistic parallelization system for irregular applications. There are three main aspects to Galois: (1) a small number of syntactic constructs for packaging optimistic parallelism as iteration over ordered and unordered sets, (2) assertions about methods in class libraries, and (3) a runtime scheme for detecting and recovering from potentially unsafe accesses to shared memory made by an optimistic computation. We show that Delaunay mesh generation and agglomerative clustering can be parallelized in a straightforward way using the Galois approach, and we present experimental measurements to show that this approach is practical. These results suggest that Galois is a practical approach to exploiting data parallelism in irregular programs.
Real time groupware as a distributed system: Concurrency control and its effect on the interface, 1994
Cited by 179 (9 self)
This paper exposes the concurrency control problem in groupware when it is implemented as a distributed system. Traditional concurrency control methods cannot be applied directly to groupware because system interactions include people as well as computers. Methods, such as locking, serialization, and their degree of optimism, are shown to have quite different impacts on the interface and how operations are displayed and perceived by group members. The paper considers both human and technical factors that designers should ponder before choosing a particular concurrency control method. It also reviews our work in progress on designing and implementing a library of concurrency schemes in GROUPKIT, a groupware toolkit.