P. Dasgupta, Z. Kedem, and M. Rabin. Parallel processing on networks of workstations: a fault tolerant, high performance approach. In Proc. of the 15th International Conference on Distributed Computing Systems, 1995, pp. 467--474.
....not be ready to receive the data at the time P1 reaches step 10. Worse, P2 may simply choose to leave the system, in which case P1 would get stuck with no one to send the data to. Various strategies for implementing adaptive parallelism have already been proposed and studied. In eager scheduling [34], packets of work to be done are kept in a pool from which worker nodes get any undone work whenever they run out of work to do. In this way, faster workers get more work according to their capability. And, if any work is left undone by a slow node, or a node that dies, it eventually gets ....
....work 2. 3.5.1 Adaptive Parallelism: Eager Scheduling. Generally, adaptive parallelism can be implemented by writing work managers that follow appropriate scheduling strategies. In our master-worker runtime system we employ a simple form of adaptive parallelism sometimes called eager scheduling [34] (also discussed earlier in Sect. 2.3.2). In our implementation of this scheme, as shown in Fig. 3 5, each work object has a done flag which is set when a worker returns the result for that object. The work objects are stored in a circular list, with a pointer keeping track of the next available ....
P. Dasgupta, Z. Kedem, and M. Rabin. Parallel processing on networks of workstations: A fault-tolerant high performance approach, in Proc. 15th IEEE International Conference on Distributed Computing Systems, 1995. URL: http://cs.nyu.edu/milan/milan/
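The eager scheduling scheme described in the two contexts above (work objects with a done flag, kept in a circular list with a pointer to the next available item) can be sketched roughly as follows. This is only an illustrative sketch, not the cited systems' implementation: the names `WorkObject`, `EagerScheduler`, `next_work`, and `submit` are hypothetical, and the sketch assumes a single-threaded manager and that work items are idempotent, so handing the same item to several workers is harmless.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class WorkObject:
    """One unit of work; `done` is set when some worker returns its result."""
    payload: Any
    done: bool = False
    result: Any = None


class EagerScheduler:
    """Hypothetical manager: hands out undone work from a circular list."""

    def __init__(self, payloads: List[Any]) -> None:
        self.items = [WorkObject(p) for p in payloads]
        self.next_index = 0  # pointer to the next candidate in the circular list

    def next_work(self) -> Optional[int]:
        """Return the index of the next undone item, or None if all are done.

        An item that was already handed out but has not been completed is
        handed out again, so work held by a slow or dead worker is redone.
        """
        n = len(self.items)
        for step in range(n):
            i = (self.next_index + step) % n
            if not self.items[i].done:
                self.next_index = (i + 1) % n
                return i
        return None

    def submit(self, index: int, result: Any) -> bool:
        """Record a result; only the first result for an item is kept."""
        item = self.items[index]
        if item.done:
            return False  # a faster worker already finished this item
        item.done = True
        item.result = result
        return True

    def all_done(self) -> bool:
        return all(item.done for item in self.items)
```

In this sketch a worker simply loops on `next_work()` and `submit()`; faster workers call `next_work()` more often and therefore receive more items, which is the load-balancing effect the contexts describe.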
....Partially supported by the NSF Grants 9988304 and 0121277. 1 Introduction The ability to effectively cooperate on common tasks in a decentralized setting is key to solving many computation problems ranging from distributed search (e.g. SETI [22]) to distributed simulation (e.g. [8]) and multi-agent collaboration (e.g. [1, 28]). Do-All, an abstraction of such cooperative activity, is the problem of using n processors to cooperatively perform m independent tasks in the presence of failures. The Do-All problem can be used as the cornerstone in identifying aspects of the ....
P. Dasgupta, Z. Kedem, and M. Rabin. Parallel processing on networks of workstations: A fault-tolerant, high performance approach. In Proceedings of the 15th IEEE International Conference on Distributed Computing Systems (ICDCS), 1995.
....could offer programmers the illusion that their programs run on a parallel system that is synchronous, while in fact the programs would be simulated on an asynchronous system. Simulations of a parallel system that is synchronous on a system that is asynchronous have been studied for over a decade now [3, 4, 5, 6, 10, 13, 14, 17, 19, 20, 21, 22, 28, 30, 35, 36]. Simplifying considerably, simulations assume that there is a system with p asynchronous processors, and the system is to simulate a program written for n synchronous processors. The simulations use three main ideas: idempotence, load balancing, and synchronization. Specifically, the execution of ....
Dasgupta, P., Kedem, Z.M., Rabin, M.O.: Parallel Processing on Networks of Workstations: A Fault-Tolerant, High Performance Approach. 15th International Conference on Distributed Computing Systems ICDCS'95, (1995) 467--474
....the goal of improving productivity of parallel computing involves using the synchronous PRAM model as the programming paradigm and then efficiently simulating PRAM programs on realistic machines. Simulations of PRAM on asynchronous parallel machines have been studied for over a decade now [2, 3, 4, 5, 9, 13, 14, 18, 20, 21, 22, 23, 27, 29, 34, 35]. Simplifying considerably, simulations assume that there is a machine with p asynchronous fault-prone processors and the machine is to simulate a program written for n synchronous fault-free processors. The simulations use three main ideas: idempotence, load balancing, and synchronization. ....
Dasgupta, P., Kedem, Z.M., Rabin, M.O.: Parallel Processing on Networks of Workstations: A Fault-Tolerant, High Performance Approach. 15th International Conference on Distributed Computing Systems ICDCS'95 (1995) 467--474
....12, 17, 26, 25, 28, 29, 32]. Some approaches preserve PRAM as a model for algorithm designers and provide algorithmic simulations of PRAM algorithms on other platforms. It has been shown that solutions for a particular problem can be used as building blocks in constructing such simulations (e.g. [8, 21, 26, 30]). This problem is known as the Write-All problem [18]: given a t-element array and n undependable processors, set each element of the array to 1. Despite its simplicity, solutions for Write-All can be used in constructing more complex robust algorithms and simulations of synchronous parallel ....
Dasgupta, P., Kedem, Z., Rabin, M.: Parallel Processing on Networks of Workstations: A Fault-Tolerant, High Performance Approach. Proc. of the International Conference on Distributed Computing Systems (1995) 467--474
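To make the Write-All statement in the context above concrete, here is a small simulation sketch. It is not an algorithm from the cited papers: it uses a naive strategy in which every processor sweeps the whole array from its own starting offset, which is correct whenever at least one processor survives but performs far more total work than the robust algorithms in the literature. The function name `write_all` and the `crash_after` parameter (the number of writes a processor completes before failing, or `None` for no failure) are assumptions made for illustration, and the processors are simulated one after another, which is safe only because writing 1 is idempotent.

```python
from typing import List, Optional


def write_all(t: int, n: int, crash_after: List[Optional[int]]) -> List[int]:
    """Simulate n undependable processors setting a t-element array to 1.

    crash_after[p] is how many writes processor p completes before it
    crashes; None means processor p never crashes.
    """
    array = [0] * t
    for p in range(n):
        start = (p * t) // n  # spread the starting offsets over the array
        budget = crash_after[p]
        for step in range(t):
            if budget is not None and step >= budget:
                break  # processor p has crashed
            array[(start + step) % t] = 1  # idempotent write
    return array
```

With one surviving processor, for example `write_all(8, 3, [2, None, 0])`, every cell ends up set; if every processor crashes early, some cells stay 0, which is why practical solutions must reassign unfinished work.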
....the execution environment. Also, complicated programming paradigms, e.g. elaborate consistency models, can make a programmer's life unnecessarily difficult. Hence, it would be ideal for a programmer to write programs for an idealized virtual machine with very simple semantics. The Calypso system [3,12] provides such a virtual machine with straightforward distributed shared memory (DSM) semantics; Calypso's run-time system is responsible for executing a program written for this virtual machine on a set of real, imperfect machines. For a user of a parallel program, a simple setup and ....
....compiling P into a semantically equivalent program C(P) that can be efficiently executed on an asynchronous machine. 3.2 The Calypso System. Calypso is a software system, based upon this theoretical work, that allows writing and executing parallel programs on COTS-based clusters of workstations [3, 12]. One of the objectives of Calypso is to evaluate the theoretical results in a practical prototype for their suitability to issues like programmability, high performance, scalability, load balancing, and fault masking. The Calypso programming model. In the Calypso system, a programmer writes a ....
P. Dasgupta, Z. M. Kedem, and M. O. Rabin. Parallel Processing on Networks of Workstations: A Fault-Tolerant, High Performance Approach. In Proc. 15th Intl. Conference on Distributed Computing Systems, pages 467--474, 1995.
....13, 18, 29, 28, 31, 32, 35]. Some approaches preserve PRAM as a model for algorithm designers and provide algorithmic simulations of PRAM algorithms on other platforms. It has been shown that solutions for a particular problem can be used as building blocks in constructing such simulations (e.g. [9, 23, 29, 33]). This problem is known as the Write-All problem [19]: given a t-element array and n undependable processors, set each element of the array to 1. Write-All captures and abstracts the computational progress that can be achieved in unit time by t correct synchronous processors. Despite its ....
Dasgupta, P., Kedem, Z., Rabin, M.: Parallel Processing on Networks of Workstations: A Fault-Tolerant, High Performance Approach. Proc. of the International Conference on Distributed Computing Systems (1995) 467--474
....the overall computation is efficient with high probability. Note that in some deterministic models optimal simulations are possible (cf. [30]); however, randomized solutions are able to achieve optimality (whp) for broader ranges of models and algorithms. Practical implementations are discussed in [6], where it is also observed that parallel computation can be made faster by essentially ignoring processors that are slower than others. The rest of the paper is structured as follows. In Section 2 we present models and definitions. In Section 3 we present the bounds under the perfect ....
Dasgupta, P., Kedem, Z., Rabin, M., "Parallel Processing on Networks of Workstations: A Fault-Tolerant, High Performance Approach", in Proc. of the International Conference on Distributed Computing Systems, pp. 467--474, 1995.
....network resources; and a small fovea size leads to a quicker response for getting the foveal region, but a larger number of rounds to transmit the whole image. 2.2. Junction Detection. The Junction Detection application [15] is a parallel image processing application running on the Calypso system [11,18], an adaptive parallel processing system which views computations as consisting of several parallel tasks inserted into a sequential program. Each parallel task can run on a changing set of worker machines and is ....
P. Dasgupta, Z. Kedem, and M. Rabin. Parallel processing on networks of workstations: Fault-tolerant high performance approach. In Proc. 15th IEEE Intl. Conf. on Distributed Computing Systems, 1995.
No context found.
P. Dasgupta, Z. Kedem, and M. Rabin. Parallel processing on networks of workstations: Fault-tolerant high performance approach. In Proc. 15th IEEE Intl. Conf. on Distributed Computing Systems, 1995.
....does not have different types of shared variables. All of shared memory follows lazy release consistency. See Section 1.3.2. TreadMarks supports two synchronization primitives, locks and barriers. Section 1.3.4 describes the implementation of locks and barriers. 6.4 Calypso. Calypso [BDK95, DKR95] is a software system for writing and executing parallel programs on a non-dedicated platform. Calypso provides an abstraction of a distributed shared memory model with an unbounded number of virtual processors. Programs are executed by steps. Each step can be either sequential or parallel. In ....
P. Dasgupta, Z. M. Kedem, and M. O. Rabin. Parallel processing on networks of workstations: a fault-tolerant, high performance approach. In Proc. 15th Intl. Conference on Distributed Computing Systems, 1995.
....Fault-tolerant features are an integral part of our system, without additional fault detection and recovery mechanisms like checking points and roll back. There is no additional overhead if no failures actually occurred. 1.3 Contributions. Our work is a continuation of research reported in [29, 20, 4, 39, 37, 35, 36, 38, 40]. We developed several unique features in our system: Novel techniques to handle nested parallelism and synchronization. Several new techniques are developed in our system in order to cope with nested parallelism. These techniques include nested two-phase idempotent execution strategy and ....
....computing and distributed systems related to the system. Chapter 8 summarizes our work. Chapter 2: Key Concepts and Techniques. In this chapter, we will discuss the key concepts and techniques used in our system. Some of these concepts and techniques were developed in previous research [29, 20, 4, 39, 37, 35, 36, 38, 40], including the abstract execution model, two-phase idempotent execution strategy, and eager scheduling. However, the introduction of the new features in our system like nested parallelism and synchronization makes these techniques inadequate. We will present both the original ideas and the ....
P. Dasgupta, Z. M. Kedem, and M. O. Rabin. Parallel processing on networks of workstations: A fault-tolerant, high performance approach. In Proceedings of the 15th IEEE International Conference on Distributed Computing Systems, pages 467--474, 1995.
....are not. In addition, the parallel tasks execute in an isolated context; i.e. they do not have access to variables defined in the parent's context. In addition, a parallel task cannot call a function that has an embedded parallel step (nesting of parallelism is not allowed). The Calypso system [4, 17, 18] adds fault tolerance and load balancing to the DSM concept, but suffers from the lack of nesting and synchronization (except barrier synchronization). Chime is an extension to Calypso and resolves these shortcomings. A plethora of programming systems for NOW-based systems exist that use the ....
P. Dasgupta, Z. M. Kedem, and M. O. Rabin. Parallel Processing on Networks of Workstations: A Fault-Tolerant, High Performance Approach. In Proceedings of the 15th IEEE International Conference on Distributed Computing Systems, 1995.
....The Networks of Workstations (NOW) approaches and the cluster approaches for distributed computing have had limited impact on the general-purpose computing arena. The NOW approach has resulted in a plethora of parallel computing platforms (such as PVM [12], MPI [13], Calypso [10], Linda [6], Treadmarks [1], Brazos [25] and so on). The clustering approach has been successful in a few special application areas such as highly reliable file and database and web services (notably from Sun, Tandem, IBM and Microsoft). The promise of distributed computing for general-purpose ....
P. Dasgupta, Z. M. Kedem and M. O. Rabin. Parallel Processing on Networks of Workstations: A Fault-Tolerant, High Performance Approach. In Proceedings of the 15th IEEE International Conference on Distributed Computing Systems, 1995.
....(RPC or message-based systems with provisions for fault detection, checkpointing, and so on). While the majority of the systems run on Unix, there are a few systems that run on Windows NT. These include Win PVM, Win32 MPI and Brazos [SB97]. 4. The Calypso System. The design of Calypso [BDK95, DKR95] addresses efficient, reliable parallel processing in a clean and efficient manner. In particular, Calypso NT [MSD97] has the following salient features: Ease of Programming: The programmer writes programs in C or C++ and uses a language-independent API (application programming interface) to ....
P. Dasgupta, Z. M. Kedem, and M. O. Rabin. Parallel Processing on Networks of Workstations: A Fault-Tolerant, High Performance Approach. In Proceedings of the 15th IEEE International Conference on Distributed Computing Systems, 1995.
....may itself have nested parallel step(s). The manager maintains an execution dependency graph to capture the dependencies between the parallel tasks and schedules them and their corresponding continuations in correct order. The allocation of tasks to the workers is done by eager scheduling [DKR95, BDK95], which replicates computations in a non-conventional fashion whenever more than the required computational resources are available. To ensure correctness in eager scheduled computations, the execution uses TIES (two-phase idempotent execution strategy) [BDK95] for memory management. This has two ....
P. Dasgupta, Z. M. Kedem, and M. O. Rabin. Parallel Processing on Networks of Workstations: A Fault-Tolerant, High Performance Approach. In Proceedings of the 15th IEEE International Conference on Distributed Computing Systems, 1995.
....PROJECT CONTEXT The MILAN metacomputing system [1] provides middleware layers that enable the efficient, reliable, predictable execution of applications on an unreliable and dynamically changing set of machines. MILAN takes advantage of two execution techniques with strong theoretical foundations [17, 11], two-phase idempotent execution strategy and eager scheduling, to provide programmers with the view of a fault-free virtual shared memory environment, even when the underlying parallel and distributed system resources may incur faults and exhibit wide variations in processing speeds. This ....
P. Dasgupta, Z. Kedem, and M. Rabin. Parallel processing on networks of workstations: Fault-tolerant high performance approach. In Proc. 15th IEEE Intl. Conf. on Distributed Computing Systems, 1995.
....PROJECT CONTEXT The MILAN metacomputing system [1] provides middleware layers that enable the efficient, reliable, predictable execution of applications on an unreliable and dynamically changing set of machines. MILAN takes advantage of two execution techniques with strong theoretical foundations [11, 17], two-phase idempotent execution strategy and eager scheduling, to provide programmers with the view of a fault-free virtual shared memory environment, even when the underlying parallel and distributed system resources may incur faults and exhibit wide variations in processing speeds. This support ....
P. Dasgupta, Z. Kedem, and M. Rabin, Parallel processing on networks of workstations: Fault-tolerant high performance approach, in Proc. 15th IEEE Intl. Conf. on Distributed Computing Systems, 1995.
No context found.
P. Dasgupta, Z. Kedem, and M. Rabin. Parallel processing on networks of workstations: a fault tolerant, high performance approach. In Proc. of the 15th International Conference on Distributed Computing Systems, 1995, pp. 467--474.
No context found.
P. Dasgupta, Z. Kedem, M. Rabin. "Parallel processing on networks of workstations: a fault tolerant, high performance approach". In Proc. of the 15th International Conference on Distributed Computing Systems, 1995.
No context found.
Dasgupta, P., Kedem, Z.M., Rabin, M.O.: Parallel Processing on Networks of Workstations: A Fault-Tolerant, High Performance Approach. 15th International Conference on Distributed Computing Systems ICDCS'95, (1995) 467--474
No context found.
Dasgupta, P., Kedem, Z.M., Rabin, M.O.: Parallel Processing on Networks of Workstations: A Fault-Tolerant, High Performance Approach. 15th International Conference on Distributed Computing Systems ICDCS'95, (1995) 467--474
No context found.
Dasgupta, P., Kedem, Z.M., Rabin, M.O.: Parallel Processing on Networks of Workstations: A Fault-Tolerant, High Performance Approach. 15th International Conference on Distributed Computing Systems ICDCS'95, (1995) 467--474
No context found.
P. Dasgupta, Z. Kedem, and M. Rabin. Parallel processing on networks of workstations: A fault-tolerant, high performance approach. In Proceedings of the 15th IEEE International Conference on Distributed Computing Systems (ICDCS), 1995.
No context found.
Dasgupta, P., Kedem, Z., Rabin, M.: Parallel Processing on Networks of Workstations: A Fault-Tolerant, High Performance Approach. Proc. of the International Conference on Distributed Computing Systems (1995) 467--474