Results 1–10 of 28
Parallel Superposition for Bulk Synchronous Parallel ML
, 2003
"... The BSMLlib is a library for Bulk Synchronous Parallel programming with the functional language Objective Caml. It is based on an extension of the lcalculus by parallel operations on a parallel data structure named parallel vector, which is given by intention. ..."
Abstract

Cited by 12 (7 self)
The BSMLlib is a library for Bulk Synchronous Parallel programming with the functional language Objective Caml. It is based on an extension of the λ-calculus by parallel operations on a parallel data structure named parallel vector, which is given by intension.
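BSMLlib's actual primitives are OCaml functions over parallel vectors; as an illustration only, a sequential Python simulation of the parallel-vector idea (loosely mirroring the `mkpar`/`apply` style, with invented names and a fixed machine width) might look like:

```python
# Hypothetical sequential simulation of a BSP "parallel vector":
# a value of width P is given intensionally by a function from
# processor id to local value. Names are illustrative, not BSMLlib's API.

P = 4  # number of processors in the simulated BSP machine

def mkpar(f):
    """Build the parallel vector <f 0, ..., f (P-1)> as a plain list."""
    return [f(i) for i in range(P)]

def apply(vf, vx):
    """Pointwise application of a vector of functions to a vector of values."""
    return [f(x) for f, x in zip(vf, vx)]

# Example: each simulated processor holds the square of its own id.
squares = apply(mkpar(lambda i: (lambda x: x * x)), mkpar(lambda i: i))
print(squares)  # [0, 1, 4, 9]
```

The intensional view (a function of the processor id, not an explicit list) is what lets each processor evaluate only its own component.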
Parallel Juxtaposition for Bulk Synchronous Parallel ML
 Euro-Par 2003, number 2790 in LNCS
, 2002
"... The BSMLlib is a library for Bulk Synchronous Parallel (BSP) programming with the functional language Objective Caml. It is based on an extension of the #calculus by parallel operations on a parallel data structure named parallel vector, which is given by intention. ..."
Abstract

Cited by 10 (6 self)
The BSMLlib is a library for Bulk Synchronous Parallel (BSP) programming with the functional language Objective Caml. It is based on an extension of the λ-calculus by parallel operations on a parallel data structure named parallel vector, which is given by intension.
A simple and efficient parallel FFT algorithm using the BSP model
 Parallel Computing
, 2000
"... In this paper, we present a new parallel radix4 FFT algorithm based on the BSP model. Our parallel algorithm uses the groupcyclic distribution family, which makes it simple to understand and easy to implement. We show how to reduce the communication cost of the algorithm by a factor of three, in ..."
Abstract

Cited by 5 (0 self)
In this paper, we present a new parallel radix-4 FFT algorithm based on the BSP model. Our parallel algorithm uses the group-cyclic distribution family, which makes it simple to understand and easy to implement. We show how to reduce the communication cost of the algorithm by a factor of three in the case that the input/output vector is in the cyclic distribution. We also show how to reduce computation time on computers with a cache-based architecture. We present performance results on a Cray T3E with up to 64 processors, obtaining reasonable efficiency levels for local problem sizes as small as 256 and very good efficiency levels for sizes larger than 2048.
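The cyclic distribution is one member of the group-cyclic family the abstract mentions; a minimal sketch of its index mapping (illustrative only, not the paper's FFT code):

```python
# Under the cyclic distribution over p processors, global index j
# lives on processor j mod p, at local index j div p.

def cyclic_owner(j, p):
    """Processor that owns global index j under the cyclic distribution."""
    return j % p

def cyclic_local(j, p):
    """Local index of global element j on its owning processor."""
    return j // p

n, p = 16, 4
layout = [(cyclic_owner(j, p), cyclic_local(j, p)) for j in range(n)]
# every processor receives exactly n // p consecutive-stride elements
counts = [sum(1 for owner, _ in layout if owner == q) for q in range(p)]
print(counts)  # [4, 4, 4, 4]
```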
Evaluation of Two BSP Libraries through Parallel Sorting on Clusters
, 2000
"... We present our experiences in developping and tuning the performance at the user level, of (in core) parallel sorting on homogeneous and non homogeneous clusters with the use of the two available BSP (Bulk Synchronous Parallel model) libraries: BSPLib from Oxford university (UK) and PUB7 from the ..."
Abstract

Cited by 4 (2 self)
We present our experiences in developing and tuning, at the user level, the performance of (in-core) parallel sorting on homogeneous and non-homogeneous clusters using the two available BSP (Bulk Synchronous Parallel model) libraries: BSPLib from Oxford University (UK) and PUB from the University of Paderborn (Germany). The paper is mainly about the communication performance of these two libraries and, in more general terms, it compares and summarizes the programming facilities and differences between them.
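The kind of in-core parallel sorting benchmarked here is often realized as sample sort; a sequential simulation of that pattern (local sorts, splitters by regular sampling, then one all-to-all exchange per superstep) can be sketched as follows. This is a generic textbook scheme, not the specific programs of the paper, and the "processors" are simulated list partitions:

```python
import bisect

def sample_sort(data, p):
    """Sequential simulation of BSP-style sample sort on p 'processors'."""
    n = len(data)
    # superstep 1: each processor sorts its local block
    blocks = [sorted(data[i * n // p:(i + 1) * n // p]) for i in range(p)]
    # take p regularly spaced samples from each sorted block
    sample = sorted(b[len(b) * k // p] for b in blocks if b for k in range(p))
    splitters = [sample[k * p] for k in range(1, p)]
    # superstep 2: the all-to-all exchange (the h-relation) into buckets
    buckets = [[] for _ in range(p)]
    for b in blocks:
        for x in b:
            buckets[bisect.bisect_left(splitters, x)].append(x)
    # superstep 3: each processor sorts its received bucket
    return [x for bucket in buckets for x in sorted(bucket)]
```

The communication performance the paper measures corresponds to the bucket-exchange step, which is the only point where data crosses processor boundaries.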
SPC-XML: A structured representation for nested-parallel programming languages
, 2005
"... Nestedparallelism programming models, where the task graph associated to a computation is seriesparallel, present good analysis properties that can be exploited for scheduling, cost estimation or automatic mapping to different architectures. In this paper we present an XML intermediate representa ..."
Abstract

Cited by 3 (1 self)
Nested-parallelism programming models, where the task graph associated with a computation is series-parallel, present good analysis properties that can be exploited for scheduling, cost estimation, or automatic mapping to different architectures. In this paper we present an XML intermediate representation for nested-parallel programming languages from which the application task graph can be easily derived. We introduce some design principles oriented toward allowing the compiler to exploit information about the task synchronization structure, automatically determine implicit communication structures, apply different scheduling policies, and generate lower-level code using different models or communication tools. Results obtained for simple applications, using an extensible prototype compiler framework, show how this flexible approach can lead to portable and efficient implementations.
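Deriving a task graph from a series-parallel XML structure can be sketched with a toy schema (the element names `seq`, `par`, `task` below are invented for illustration and are not the actual SPC-XML vocabulary):

```python
# Recursively derive precedence edges from a series-parallel XML tree:
# a <seq> chains its children, a <par> runs them independently.
import xml.etree.ElementTree as ET

doc = """
<seq>
  <task id="a"/>
  <par>
    <task id="b"/>
    <task id="c"/>
  </par>
  <task id="d"/>
</seq>
"""

def edges(node):
    """Return (entry_tasks, exit_tasks, edge_list) of a series-parallel tree."""
    if node.tag == "task":
        t = node.get("id")
        return [t], [t], []
    children = [edges(c) for c in node]
    es = [e for _, _, child_es in children for e in child_es]
    if node.tag == "par":
        ins = [t for i, _, _ in children for t in i]
        outs = [t for _, o, _ in children for t in o]
        return ins, outs, es
    # "seq": connect each child's exits to the next child's entries
    for (_, o1, _), (i2, _, _) in zip(children, children[1:]):
        es += [(u, v) for u in o1 for v in i2]
    return children[0][0], children[-1][1], es

ins, outs, es = edges(ET.fromstring(doc))
print(sorted(es))  # [('a', 'b'), ('a', 'c'), ('b', 'd'), ('c', 'd')]
```

Because the tree is series-parallel by construction, the derivation needs no global graph analysis, which is the analysis property the abstract highlights.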
Portable checkpointing and communication for BSP applications on dynamic heterogeneous Grid environments
 In SBAC-PAD'05: The 17th International Symposium on Computer Architecture and High Performance Computing (Rio de Janeiro)
, 2005
"... Executing longrunning parallel applications in Opportunistic Grid environments composed of heterogeneous, shared user workstations, is a daunting task. Machines may fail, become unaccessible, or may switch from idle to busy unexpectedly, compromising the execution of applications. A mechanism for f ..."
Abstract

Cited by 3 (2 self)
Executing long-running parallel applications in opportunistic Grid environments composed of heterogeneous, shared user workstations is a daunting task. Machines may fail, become inaccessible, or switch from idle to busy unexpectedly, compromising the execution of applications. A fault-tolerance mechanism that supports these heterogeneous architectures is an important requirement for such a system. In this paper, we describe support for fault-tolerant execution of BSP parallel applications on heterogeneous, shared workstations. A precompiler instruments application source code to save state periodically into checkpoint files. In case of failure, the stored state can be recovered from these files. Generated checkpoints are portable and can be recovered on a machine of a different architecture, with data representation conversions performed at recovery time. The precompiler also modifies BSP parallel applications to allow execution on a Grid composed of machines with different architectures. We implemented a monitoring and recovery infrastructure in the InteGrade Grid middleware. Experimental results evaluate the overhead incurred and the viability of using this approach in a Grid environment.
Parallel Bridging Models and Their Impact on Algorithm Design
 In Proc. Int'l Conf. on Computational Science, Part II
, 2001
"... The aim of this paper is to demonstrate the impact of features of parallel computation models on the design of efficient parallel algorithms. For this purpose, we start with considering Valiant's BSP model and design an optimal multisearch algorithm. For a realistic extension of this model w ..."
Abstract

Cited by 1 (0 self)
The aim of this paper is to demonstrate the impact of features of parallel computation models on the design of efficient parallel algorithms. For this purpose, we start by considering Valiant's BSP model and design an optimal multisearch algorithm. For a realistic extension of this model that takes the critical block size into account, namely the BSP* model due to Bäumker, Dittrich, and Meyer auf der Heide, this algorithm is far from optimal. We show how the critical block size can be taken into account by presenting a modified multisearch algorithm that is optimal in the BSP* model. Similarly, we consider the D-BSP model due to de la Torre and Kruskal, which extends BSP by introducing a way to measure locality of communication. Its influence on algorithm design is demonstrated by considering the broadcast problem. Finally, we explain how our Paderborn University BSP (PUB) library incorporates such BSP extensions.
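The contrast between the models can be made concrete with their superstep cost expressions. The sketch below uses the standard BSP cost w + h·g + l and a deliberately simplified reading of the BSP* idea, namely that each message is charged as if it had at least the critical block size B; this is an illustration of the penalty on fine-grained communication, not the exact BSP* definition:

```python
# Superstep cost under BSP, and a simplified BSP*-style variant in which
# every message is charged at least the critical block size B.

def bsp_cost(w, h, g, l):
    """w: max local work, h: max words sent/received, g: gap, l: latency."""
    return w + h * g + l

def bsp_star_cost(w, message_sizes, g, l, B):
    """message_sizes: sizes of the messages sent by the busiest processor.
    Messages smaller than B are rounded up to B (simplified BSP* reading)."""
    h = sum(max(m, B) for m in message_sizes)
    return w + h * g + l

# Three one-word messages look cheap in BSP (h = 3) but are charged
# as three B-word blocks in the BSP*-style account.
print(bsp_cost(100, 3, 2, 50))                  # 156
print(bsp_star_cost(100, [1, 1, 1], 2, 50, 8))  # 198
```

This is exactly why the paper's BSP-optimal multisearch algorithm, which may send many small messages, stops being optimal once block size matters.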
1-optimality of static BSP computations: scheduling independent chains as a case study
, 2001
"... The aim of this work is to study a specific scheduling problem under the machineindependent model BSP. The problem of scheduling a set of independent chains in this context is shown to be a difficult optimization problem, but it can be easily approximated in practice. Efficient heuristics taking in ..."
Abstract
The aim of this work is to study a specific scheduling problem under the machine-independent BSP model. The problem of scheduling a set of independent chains in this context is shown to be a difficult optimization problem, but it can be easily approximated in practice. Efficient heuristics that take communications into account are proposed and analyzed in this paper. We particularly focus on the influence of synchronization between consecutive supersteps. A family of algorithms is proposed with the best possible load balancing. Then, a strategy for determining a good compromise between the two opposing criteria, minimizing the number of supersteps and balancing the load, is derived. Finally, a heuristic that considers the influence of the latency is presented. Simulations of a large number of instances have been carried out to complement the theoretical worst-case analysis. They confirm the very good average-case behavior of the algorithms.
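The load-balancing side of the problem can be illustrated with the classic greedy LPT (longest processing time first) heuristic for assigning independent chains, identified by their total lengths, to p processors. This is a standard textbook heuristic used here only to make the trade-off concrete; it is not the paper's specific family of algorithms:

```python
# Greedy LPT assignment: process chains longest-first, always giving the
# next chain to the currently least-loaded processor (via a min-heap).
import heapq

def lpt_assign(chain_lengths, p):
    """Return (assignment dict, makespan) for p processors."""
    heap = [(0, q) for q in range(p)]          # (load, processor id)
    heapq.heapify(heap)
    assignment = {q: [] for q in range(p)}
    for c in sorted(chain_lengths, reverse=True):
        load, q = heapq.heappop(heap)
        assignment[q].append(c)
        heapq.heappush(heap, (load + c, q))
    makespan = max(load for load, _ in heap)
    return assignment, makespan

# Chains of lengths 5, 4, 3, 3, 3 on two processors.
_, m = lpt_assign([5, 4, 3, 3, 3], 2)
print(m)  # 10  (optimal is 9: {5, 4} vs {3, 3, 3})
```

The example also shows why the paper needs more than pure load balancing: a greedy balance can be slightly suboptimal, and in BSP each extra superstep additionally pays the synchronization latency.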
A New Way to Divide and Conquer
 Parallel Processing Letters, © World Scientific Publishing Company
"... ABSTRACT Valiant's model of bulksynchronous parallel (BSP) computation does not allow the programmer to synchronize a subset, rather than the complete set of a parallel computer's processors. This is perceived by many to be an obstacle to expressing divideandconquer algorithms in the BS ..."
Abstract
Valiant's model of bulk-synchronous parallel (BSP) computation does not allow the programmer to synchronize a subset, rather than the complete set, of a parallel computer's processors. This is perceived by many to be an obstacle to expressing divide-and-conquer algorithms in the BSP model. We argue that the divide-and-conquer paradigm fits naturally into the BSP model, without any need for subset synchronization. The proposed method of divide-and-conquer BSP programming is fully compliant with the BSP computation model. The method is based on sequentially interleaved threads of BSP computation, called superthreads. Keywords: Bulk-synchronous parallel; BSP; subset synchronization; divide-and-conquer; superthreads.
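The superthread idea, sequentially interleaved threads of BSP computation that all reach the same global barrier, can be sketched with Python generators standing in for threads (an illustrative simulation under invented names, not the paper's implementation):

```python
# "Superthreads" simulated with generators: each yield marks a barrier,
# and the scheduler advances every live superthread once per superstep,
# so all of them share the same global synchronizations.

def run_superthreads(threads):
    """Round-robin each thread to its next barrier until all finish."""
    trace = []
    active = list(threads)
    while active:
        still_running = []
        for t in active:                 # one global superstep per round
            try:
                trace.append(next(t))
                still_running.append(t)
            except StopIteration:        # this superthread has finished
                pass
        active = still_running
    return trace

def worker(name, steps):
    """A superthread doing `steps` supersteps of (simulated) work."""
    for s in range(steps):
        yield f"{name}:step{s}"          # local work, then barrier

print(run_superthreads([worker("A", 2), worker("B", 3)]))
# ['A:step0', 'B:step0', 'A:step1', 'B:step1', 'B:step2']
```

Recursive divide-and-conquer then maps to spawning more superthreads instead of splitting the machine into independently synchronizing processor subsets, which is what keeps the scheme fully within the flat BSP barrier model.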