Results 1 - 9 of 9
Dynamic Load Balancing with Group Communication
, 2006
Cited by 20 (9 self)
This work considers the problem of efficiently performing a set of tasks using a network of processors in a setting where the network is subject to dynamic reconfigurations, including partitions and merges. A key challenge in this setting is implementing dynamic load balancing that reduces the number of tasks performed redundantly because of the reconfigurations. We explore new approaches for load balancing in dynamic networks that can be employed by applications using a group communication service. The group communication services that we consider include a membership service (establishing new groups to reflect dynamic changes) but do not include maintenance of a primary component. For the n-processor, n-task load balancing problem defined in this work, the following specific results are obtained. For the case of fully dynamic changes, including fragmentation and merges, we show that the termination time of any online task assignment algorithm is greater than the termination time of an offline task assignment algorithm by a factor greater than n/12. We present a load balancing algorithm that guarantees completion of all tasks in all fragments caused by partitions with work O(n + f · n) in the presence of f fragmentation failures. We develop an effective scheduling strategy for minimizing the task execution redundancy, and we prove that our strategy provides each of the n processors with a schedule of Θ(n^{1/3}) tasks such that at most one task is performed redundantly by any two processors.
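The pairwise-overlap property mentioned at the end of this abstract can be illustrated with a small sketch. The code below is not the scheduling strategy of the paper. It uses a standard design-theory construction (lines of an affine plane over the integers modulo a prime q) purely to show what "at most one task is performed redundantly by any two processors" means. The function names `line_schedules` and `max_pairwise_overlap`, and the encoding of tasks as integers, are assumptions made for illustration. The schedule length obtained here is a property of this particular construction and says nothing about the bounds proved in the paper.

```python
from itertools import combinations


def line_schedules(q):
    """Build q*q schedules over q*q tasks (q is assumed to be prime).

    Tasks are the points (x, y) of a q-by-q grid, encoded as x * q + y.
    Each hypothetical processor is assigned the points of one line
    y = a*x + b (mod q), so every schedule contains q tasks.
    """
    schedules = []
    for a in range(q):        # slope of the line
        for b in range(q):    # intercept of the line
            schedules.append([x * q + (a * x + b) % q for x in range(q)])
    return schedules


def max_pairwise_overlap(schedules):
    """Return the largest number of tasks shared by any two schedules."""
    return max(len(set(s) & set(t)) for s, t in combinations(schedules, 2))
```

Two lines with different slopes meet in exactly one point when q is prime, and two distinct lines with the same slope never meet, so `max_pairwise_overlap(line_schedules(5))` evaluates to 1. For a non-prime q the guarantee can fail, which is why primality is stated as an assumption.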
A Work-Optimal Deterministic Algorithm for the Asynchronous Certified Write-All Problem (Extended Abstract)
 22nd ACM Symposium on Principles of Distributed Computing PODC’03
, 2003
Cited by 6 (2 self)
In their SIAM J. on Computing paper [27] from 1992, Martel et al. posed the question of developing a work-optimal deterministic asynchronous algorithm for the fundamental load-balancing and synchronization problem called Certified Write-All. In this problem, introduced in a slightly different form by Kanellakis and Shvartsman in a PODC'89 paper [17], $p$ processors must update $n$ memory cells and then signal the completion of the updates. It is known that solutions to this problem can be used to simulate synchronous parallel programs on asynchronous systems with worst-case guarantees for the overhead of a simulation. Such simulations are interesting because they may increase productivity in parallel computing, since synchronous parallel programs are easier to reason about than asynchronous ones. This paper presents a solution to the question of Martel et al. Specifically, we show a deterministic asynchronous algorithm for the Certified Write-All problem. Our algorithm has $O(n + p^4\log n)$ work, which is optimal for a nontrivial number of processors $p\leq \left(n/\log n\right)^{1/4}$. In contrast, all known deterministic algorithms require work superlinear in $n$ when $p= n^{1/r}$, for any fixed $r\geq 1$. Our algorithm generalizes the collision principle used by the algorithm T that was introduced by Buss et al. [7].
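To make the problem statement concrete, here is a minimal sketch of a naive Certified Write-All baseline. It is not the algorithm of the paper and it is far from work-optimal (its work can reach roughly $n \cdot p$ steps). The function name `certified_write_all`, the `schedule` parameter (a sequence of processor ids standing in for an asynchronous scheduler), and the offset rule are all assumptions made for illustration. Each simulated processor visits every cell itself and raises the completion flag only afterwards, so the flag is never raised before all cells are written.

```python
def certified_write_all(n, p, schedule):
    """Simulate p asynchronous processors writing 1 into n memory cells.

    `schedule` is an iterable of processor ids; each id grants that
    processor one step. Returns (cells, done, work), where `work` is the
    number of steps executed before the run stopped.
    """
    cells = [0] * n
    done = False
    work = 0
    visited = [0] * p  # how many cells each processor has visited so far
    for pid in schedule:
        if done:
            break
        work += 1
        if visited[pid] < n:
            # Start each processor at a different offset to spread the load.
            idx = (pid * n // p + visited[pid]) % n
            cells[idx] = 1
            visited[pid] += 1
        else:
            # This processor has written every cell itself, so it can certify.
            done = True
    return cells, done, work
```

For example, a round-robin interleaving such as `[i % p for i in range(p * (n + 1))]` lets every processor take `n + 1` steps, which is enough for one of them to certify. A schedule that is too short leaves `done` as `False`, reflecting that completion was never signalled.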
Challenges in evaluating distributed algorithms
 Future Directions in Distributed Computing, volume 2584 of LNCS
, 2003
A Method for Creating Near-Optimal Instances of a Certified Write-All Algorithm
 11th Annual European Symposium on Algorithms (ESA’03)
, 2003
Cited by 3 (1 self)
This paper shows how to create near-optimal instances of the Certified Write-All algorithm called AWT that was introduced by Anderson and Woll [2]. This algorithm is the best known deterministic algorithm that can be used to simulate n synchronous parallel processors on n asynchronous processors. In this algorithm n processors update n memory cells and then signal the completion of the updates. The algorithm is instantiated with q permutations, where q can be chosen from a wide range of values. When implementing a simulation on a specific parallel system with n processors, one would like to use an instance of the algorithm with the best possible value of q, in order to maximize the efficiency of the simulation. This paper shows that the choice of q is critical for obtaining an instance of the AWT algorithm with near-optimal work. For any > 0, and any large enough n, work of any instance of the algorithm must be at least n . Under certain conditions, however, that q is about e and for infinitely many large enough n, this lower bound can be nearly attained by instances of the algorithm with work at most n . The paper also shows a penalty for not selecting q well. When q is significantly away from e , then work of any instance of the algorithm with this displaced q must be considerably higher than otherwise.
Group membership and wide-area master-worker computations
 In Proc. 23rd ICDCS
, 2003
Cited by 3 (0 self)
Group communications systems have been designed to provide an infrastructure for fault-tolerance in distributed systems, including wide-area systems. In our work on master-worker computation for GriPhyN, which is a large project in the area of the computational grid, we asked the question: should we build our wide-area master-worker computation using wide-area group communications? This paper explains why we decided doing so was not a good idea.
Distributed Computation Meets Design Theory: Local Scheduling for Disconnected Cooperation
 BULLETIN OF THE EATCS
, 2004
Practical Fault-Tolerance for Mobile Agents
, 2011
The amount of computational resources available on the Internet is increasing. Effectively using these resources for distributed computations is challenging. An infrastructure called computational grids provides tools for structuring and deploying large-scale distributed computations on the Internet. One of the key problems in computational grids is managing the available computational resources; tools based on mobile agents are being advocated to solve this problem. However, to be widely adopted, such tools must be robust to failures in the grid environment, and thus require effective mechanisms for mobile agent fault-tolerance. To gain insight into how grid applications perform on the Internet, this dissertation investigates two master-worker algorithms, one based on group communication and one based on message flooding. Both algorithms are executed in simulations using Internet communication traces. The results from running and evaluating the algorithms are used to infer requirements for our mobile agent fault-tolerance approach. This dissertation then derives a fault-tolerant mobile agent protocol. The protocol is rooted in the primary-backup approach, where a set of backups monitors the progress of the mobile agent during the computation. The protocol allows the set of backups to be changed during the computation to adapt to the current network topology. The dissertation then describes an implementation of our protocol on top of a mobile agent platform, and evaluates the performance of the protocol. The results show that explicit management of backups can be beneficial to performance, and that our protocol is applicable outside the scope of mobile agent computations.