Broadcast Trees for Heterogeneous Platforms
, 2004
Cited by 29 (2 self)
In this paper, we deal with broadcasting on heterogeneous platforms. Typically, the message to be broadcast is split into several slices, which are sent by the source processor in a pipeline fashion. A spanning tree is used to implement this operation, and the objective is to find the tree which maximizes the throughput, i.e. the average number of slices sent by the source processor every time-unit. We introduce several heuristics to solve this problem. The good news is that the best heuristics perform quite efficiently, reaching more than 70% of the absolute optimal throughput, thereby providing a simple yet efficient approach to achieve very good performance for broadcasting on heterogeneous platforms.
Complexity results for collective communications on heterogeneous platforms
 INT. JOURNAL OF HIGH PERFORMANCE COMPUTING APPLICATIONS
, 2006
Cited by 9 (2 self)
In this paper, we consider the communications involved in the execution of a complex application, deployed on a heterogeneous platform. Such applications extensively use macro-communication schemes, for example to broadcast data items, either to all resources (broadcast) or to a restricted set of targets (multicast). Rather than aiming at minimizing the execution time of a single collective communication, we focus on the steady-state operation. We assume that there is a large number of messages to be broadcast or multicast in pipelined fashion, and we aim at maximizing the throughput, i.e. the (rational) number of messages which can be broadcast or multicast every time-step. We target heterogeneous platforms, modeled by a graph where resources have different communication and computation speeds. Achieving the best throughput may well require that the target platform is used in totality: different messages may need to be transferred along different paths. The main focus of the paper is on complexity results. We aim at presenting a unified framework for analyzing the complexity of collective communication schemes. We concentrate on the classification (whether maximizing the throughput is a polynomial or NP-hard problem), rather than actually providing efficient polynomial algorithms (when such algorithms are known, we refer to bibliographical pointers).
Optimizing the steady-state throughput of Broadcasts on heterogeneous platforms
, 2003
Cited by 7 (5 self)
In this paper, we consider the communications involved by the execution of a complex application, deployed on a heterogeneous "grid" platform. Such applications extensively use macro-communication schemes, for example to broadcast data items. Rather than aiming at minimizing the execution time of a single broadcast, we focus on the steady-state operation. We assume that there is a large number of messages to be broadcast in pipeline fashion, and we aim at maximizing the throughput, i.e. the (rational) number of messages which can be broadcast every time-step. We target heterogeneous platforms, modeled by a graph where resources have different communication and computation speeds. Achieving the best throughput may well require that the target platform is used in totality: we show that neither spanning trees nor DAGs are as powerful as general graphs. We show how
Broadcasting on large scale heterogeneous platforms under the bounded multiport model
 in Parallel & Distributed Processing (IPDPS), 2010 IEEE International Symposium on. IEEE, 2010
Cited by 7 (1 self)
We consider the classical problem of broadcasting a large message at an optimal rate in a large scale distributed network. The main novelty of our approach is that we consider that the set of participating nodes can be split into two parts: “green” nodes that stay in the open Internet and “red” nodes that lie behind firewalls or NATs. Two red nodes cannot communicate directly, but rather need to use a green node as a gateway for transmitting a message. In this context, we are interested in both maximizing the throughput (i.e. the rate at which nodes receive the message) and minimizing the degree at the participating nodes, i.e. the number of TCP connections they must handle simultaneously. We consider both cyclic and acyclic solutions for the flow graph. In the cyclic case, our main contributions are a closed-form formula for the optimal cyclic throughput and the proof that the optimal solution may require arbitrarily large degrees. In the acyclic case, we prove that it is possible to achieve the optimal throughput with low degree. Then, we prove a worst-case ratio between the optimal acyclic and cyclic throughput and show through simulations that this ratio is on average very close to 1, which makes acyclic solutions efficient both in terms of the throughput and the number of connections.
On the Importance of Bandwidth Control Mechanisms for Scheduling on Large Scale Heterogeneous Platforms
 in "24th IEEE International Parallel and Distributed Processing Symposium (IPDPS 2010)", Atlanta, United States, 2010, http://hal.inria.fr/inria00444585
Cited by 6 (2 self)
Toward Optimal Complete Exchange on Wormhole-Routed Tori
 [8] GU Naijie, “Efficient Indirect All-to-All Personalized Communication on Rings and 2D Tori,” Journal of Computer Science and Technology
, 1999
unknown title
, 2004
Multi-node broadcasting in all-ported 3D wormhole-routed torus using an aggregation-then-distribution strategy. Yuh-Shyan Chen, Chao-Yu Chiang, Che-Yi Chen
Towards A Scalable Broadcast in Wormhole-Switched Mesh Networks
Broadcast algorithms for wormhole-switched meshes have been widely reported in the literature. However, most of these algorithms handle broadcast in a sequential manner and do not scale well with the network size. As a consequence, many parallel applications cannot be efficiently supported using existing algorithms. Motivated by these observations, this paper presents a new broadcast algorithm based on our previously proposed Coded Path Routing (or CPR for short) [1]. The main feature of the proposed algorithm lies in its ability to perform broadcast operations with a high degree of parallelism. Furthermore, its performance is insensitive to the network size, i.e., only two message-passing steps are required to implement a broadcast operation irrespective of the network size. Results from a comparative analysis reveal that the new algorithm exhibits superior performance characteristics over those of the well-known
Pipelining Broadcasts on Heterogeneous Platforms under the One-Port Model
, 2004
In this paper, we consider the communications involved by the execution of a complex application, deployed on a heterogeneous platform. Such applications extensively use macro-communication schemes, for example to broadcast data items. Rather than aiming at minimizing the execution time of a single broadcast, we focus on the steady-state operation. We assume that there is a large number of messages to be broadcast in pipeline fashion, and we aim at maximizing the throughput, i.e. the (rational) number of messages which can be broadcast every time-step. We target heterogeneous platforms, modeled by a graph where resources have different communication speeds under the unidirectional one-port model (i.e. at a given time step, a processor can be involved in at most one (incoming or outgoing) communication with one of its neighbors). Achieving the best throughput may well require that the target platform is used in totality: we show that neither spanning trees nor DAGs are as powerful as general graphs. We propose a rather sophisticated polynomial algorithm for determining the optimal throughput that can be achieved using a platform, together with a (periodic) schedule achieving this throughput. The algorithm is based on the use of polynomial oracles and of the ellipsoid method [9, 13] for solving linear programs in rational numbers. The polynomial compactness of the description comes from the decomposition of the schedule into several broadcast trees that are used concurrently to reach the best throughput. It is important to point out that a concrete scheduling algorithm based upon the steady-state operation is asymptotically optimal, in the class of all possible schedules (not only periodic solutions).
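To make the unidirectional one-port constraint concrete, the sketch below estimates the steady-state rate of a single broadcast tree. It is not the paper's algorithm (which combines several trees through linear programming); it only illustrates that, when a node cannot overlap receiving and sending, its busy time per message is its receive time plus the time of every send to its children, and the busiest node limits the rate. The function name, the dictionary-based tree representation and the edge costs are assumptions made for this illustration.

```python
from fractions import Fraction


def tree_throughput_bound(children, cost, root):
    """Rate limit of one broadcast tree under a unidirectional one-port model.

    children: dict mapping a node to the list of its children (hypothetical layout)
    cost: dict mapping (parent, child) to the time needed to send one message
    root: the source node of the broadcast
    """
    worst_period = Fraction(0)
    stack = [(root, None)]
    while stack:
        node, parent = stack.pop()
        # One-port, unidirectional: the receive from the parent and the sends
        # to the children are serialized, so their durations add up.
        busy = Fraction(0) if parent is None else Fraction(cost[(parent, node)])
        for child in children.get(node, []):
            busy += Fraction(cost[(node, child)])
            stack.append((child, node))
        worst_period = max(worst_period, busy)
    if worst_period == 0:
        raise ValueError("the tree has no edges, so no rate is defined")
    # Messages per time-step, kept as an exact rational number.
    return 1 / worst_period


# Hypothetical 4-node tree: 0 -> 1 (cost 1), 0 -> 2 (cost 2), 1 -> 3 (cost 3).
example_children = {0: [1, 2], 1: [3]}
example_cost = {(0, 1): 1, (0, 2): 2, (1, 3): 3}
example_rate = tree_throughput_bound(example_children, example_cost, 0)
```

In this example node 1 is the bottleneck: it spends 1 time-step receiving and 3 sending, so the tree delivers at most one message every 4 time-steps. Using `Fraction` keeps the rate rational, in line with the "(rational) number of messages" wording of the abstract.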
Multi-Node Broadcasting in a Wormhole-Routed 2D Torus Using an Aggregation-then-Distribution Strategy
This paper presents an efficient multi-node broadcasting algorithm in a wormhole-routed 2D torus, where there are an unknown number of s source nodes located on unknown positions each intending to broadcast a message of size m bytes to the rest of network. The torus is assumed to use the all-port model and the popular dimension-ordered routing. Most existing results are derived based on finding multiple edge-disjoint spanning trees in the network. The main technique used in this paper is an aggregation-then-distribution strategy. First, the broadcast messages are aggregated into some positions of the torus. Then, a number of independent subnetworks are constructed from the torus. These subnetworks, which are responsible for distributing the messages, can well exploit the communication parallelism and the characteristic of wormhole routing. It is shown that such an approach is more appropriate than those using edge-disjoint trees for fixed-connection networks such as tori. This is justified by our performance analysis.