Results 1–10 of 13
A proposal for a heterogeneous cluster ScaLAPACK (dense linear solvers)
, 2001
Abstract

Cited by 59 (24 self)
In this paper, we study the implementation of dense linear algebra kernels, such as matrix multiplication or linear system solvers, on heterogeneous networks of workstations. The uniform block-cyclic data distribution scheme commonly used for homogeneous collections of processors limits the performance of these linear algebra kernels on heterogeneous grids to the speed of the slowest processor. We present and study more sophisticated data allocation strategies that balance the load on heterogeneous platforms with respect to the performance of the processors. When targeting unidimensional grids, the load-balancing problem can be solved rather easily. When targeting two-dimensional grids, which are the key to scalability and efficiency for numerical kernels, the problem turns out to be surprisingly difficult. We formally state the 2D load-balancing problem and prove its NP-completeness. Next, we introduce a data allocation heuristic, which turns out to be very satisfactory: its practical usefulness is demonstrated by MPI experiments conducted with a heterogeneous network of workstations.
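The 1D allocation idea in the abstract above can be sketched in a few lines: give each processor a number of column blocks proportional to its relative speed. This is an illustrative sketch, not the paper's algorithm; the speeds and block counts below are made up.

```python
# Sketch of 1D heterogeneous load balancing: distribute n column blocks
# over processors with relative speeds s_i so that each processor's
# share is (nearly) proportional to its speed.

def balance_1d(n_blocks, speeds):
    """Return the number of blocks per processor, proportional to speed."""
    total = sum(speeds)
    # Start from the floor of each processor's ideal share...
    shares = [int(n_blocks * s / total) for s in speeds]
    # ...then hand the leftover blocks to the processors whose
    # fractional part (ideal share minus allocated) is largest.
    order = sorted(range(len(speeds)),
                   key=lambda i: n_blocks * speeds[i] / total - shares[i],
                   reverse=True)
    for i in order[: n_blocks - sum(shares)]:
        shares[i] += 1
    return shares

print(balance_1d(10, [3.0, 2.0, 1.0]))  # [5, 3, 2]
```

With these shares, the time per block-column step is roughly equal on every processor, instead of being dictated by the slowest one as under a uniform distribution.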
Non-Linear Divisible Loads: There is No Free Lunch
, 2013
Abstract

Cited by 4 (2 self)
Abstract—Divisible Load Theory (DLT) has received a lot of attention in the past decade. A divisible load is a perfectly parallel task that can be split arbitrarily and executed in parallel on a set of possibly heterogeneous resources. The success of DLT is strongly related to the existence of many optimal resource allocation and scheduling algorithms, which strongly differs from general scheduling theory. Moreover, close relationships have recently been underlined between DLT, which provides a fruitful theoretical framework for scheduling jobs on heterogeneous platforms, and MapReduce, which provides a simple and efficient programming framework for deploying applications on large-scale distributed platforms. The success of both has suggested extending their frameworks to tasks of non-linear complexity. In this paper, we show that both DLT and MapReduce are better suited to workloads with linear complexity. In particular, we prove that divisible load theory cannot directly be applied to quadratic workloads, as has been proposed recently. We precisely state the limits of classical DLT studies, and we review and propose solutions based on a careful preparation of the dataset and clever data partitioning algorithms. In particular, through simulations, we show the possible impact of this approach on the volume of communications generated by MapReduce, in the context of Matrix Multiplication and Outer Product algorithms.
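The core obstruction described above can be seen numerically: splitting a load preserves total work only when cost is linear in chunk size. This sketch (not from the paper) shows that under a quadratic cost model, total computation shrinks with every split, so arbitrarily fine splitting would make the work vanish.

```python
# Why a quadratic workload cannot be treated as a classical divisible
# load: splitting a chunk of size W into pieces a and b with a + b = W
# keeps the total work at W under a linear cost model, but shrinks it
# under a quadratic one, since (a + b)**2 > a**2 + b**2.

def total_cost(pieces, exponent):
    """Total computation time of a load split into `pieces`, unit speed."""
    return sum(p ** exponent for p in pieces)

W = 8.0
for k in (1, 2, 4, 8):                 # split W into k equal pieces
    split = [W / k] * k
    linear = total_cost(split, 1)      # stays at 8.0 for every k
    quadratic = total_cost(split, 2)   # 64, 32, 16, 8: halves each time
    print(k, linear, quadratic)
```

This is the "no free lunch": the apparent speedup from splitting a quadratic task is an artifact of the cost model, which motivates the paper's careful dataset preparation and partitioning instead.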
Revisiting matrix product on master-worker platforms
, 2006
Abstract

Cited by 2 (2 self)
This paper is aimed at designing efficient parallel matrix-product algorithms for heterogeneous master-worker platforms. While matrix product is well understood for homogeneous 2D arrays of processors (e.g., Cannon's algorithm and the ScaLAPACK outer-product algorithm), there are three key hypotheses that render our work original and innovative: Centralized data. We assume that all matrix files originate from, and must be returned to, the master. The master distributes both data and computations to the workers (while in ScaLAPACK, input and output matrices are initially distributed among participating resources). Typically, our approach is useful in the context of speeding up MATLAB or SCILAB clients running on a server (which acts as the master and initial repository of files). Heterogeneous star-shaped platforms. We target fully heterogeneous platforms, where computational resources have different computing powers. Also, the workers are connected to the master by links of different capacities. This framework is realistic when deploying the application from the server, which is responsible for enrolling authorized resources. Limited memory. Because we investigate the parallelization of large problems, we cannot assume that full matrix panels can be stored in the worker memories and reused for subsequent updates (as in ScaLAPACK). The amount of memory available in each worker is expressed as a given number mi of buffers, where a buffer can store a square block of matrix elements. The size q of these square blocks is chosen so as to harness the power of Level 3 BLAS routines: q = 80 or 100 on most platforms. We have devised efficient algorithms for resource selection (deciding which workers to enroll) and communication ordering (both for input and result messages), and we report a set of numerical experiments on various platforms at École Normale Supérieure de Lyon and the University of Tennessee.
However, we point out that in this first version of the report, experiments are limited to homogeneous platforms.
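The q-by-q block model in the abstract above can be illustrated with a toy blocked product: the matrix is processed block by block, which is what lets a worker with a limited number of block buffers participate and what makes Level 3 BLAS effective. This is a minimal pedagogical sketch (tiny matrices, pure Python), not the paper's scheduling algorithm.

```python
def block_multiply(A, B, q):
    """Blocked product C = A @ B on n-by-n nested lists, using q-by-q
    blocks: each innermost pass touches only one block of A, B, and C."""
    n = len(A)
    C = [[0.0] * n for _ in range(n)]
    for ib in range(0, n, q):            # block row of C
        for jb in range(0, n, q):        # block column of C
            for kb in range(0, n, q):    # one rank-q block update
                for i in range(ib, min(ib + q, n)):
                    for j in range(jb, min(jb + q, n)):
                        s = 0.0
                        for k in range(kb, min(kb + q, n)):
                            s += A[i][k] * B[k][j]
                        C[i][j] += s
    return C

A = [[1.0, 2.0], [3.0, 4.0]]
B = [[5.0, 6.0], [7.0, 8.0]]
print(block_multiply(A, B, 1))  # [[19.0, 22.0], [43.0, 50.0]]
```

In the paper's setting, each q-by-q block occupies one worker buffer, so the number of buffers mi bounds how many blocks of A, B, and C a worker can keep resident between updates.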
DAGmaps: Space Filling Visualization of Directed Acyclic Graphs
, 2009
Abstract

Cited by 2 (1 self)
Gene Ontology information related to the biological role of genes is organized in a hierarchical manner that can be represented by a directed acyclic graph (DAG). Space-filling visualizations, such as treemaps, have the capacity to display thousands of items legibly in limited space via a two-dimensional rectangular map. Treemaps have been used to visualize the Gene Ontology by first transforming the DAG into a tree. However, this transformation has several undesirable effects, such as producing trees with a large number of nodes and scattering the rectangles associated with the duplicates of a node around the display rectangle. In this paper we introduce the problem of visualizing a DAG with space-filling techniques without first converting it to a tree, we present two special cases of the problem, and we discuss complexity issues.
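For context, the tree-based baseline the paper generalizes works by recursive area subdivision. Below is a sketch of the classic slice-and-dice treemap layout for a plain tree (illustrative only; the node encoding as `(size, children)` tuples is an assumption of this sketch, and the paper's DAGmap layout is more involved).

```python
def slice_and_dice(node, x, y, w, h, depth=0, out=None):
    """Classic treemap layout. A node is (size, children); returns a
    list of (size, rect) pairs, alternating split direction per level."""
    if out is None:
        out = []
    size, children = node
    out.append((size, (x, y, w, h)))
    total = sum(c[0] for c in children)
    offset = 0.0
    for c in children:
        frac = c[0] / total if total else 0.0
        if depth % 2 == 0:   # split horizontally at even depths
            slice_and_dice(c, x + offset * w, y, frac * w, h, depth + 1, out)
        else:                # split vertically at odd depths
            slice_and_dice(c, x, y + offset * h, w, frac * h, depth + 1, out)
        offset += frac
    return out

# A root of size 4 with two leaves of sizes 1 and 3, in the unit square:
tree = (4, [(1, []), (3, [])])
print(slice_and_dice(tree, 0.0, 0.0, 1.0, 1.0))
```

The DAG difficulty the paper addresses is visible even here: a node with two parents would be laid out once per parent, duplicating its rectangle, which is exactly the scattering effect the abstract describes.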
DLB-EM: Dynamic Load Balancing Using Expectation-Maximization
Abstract

Cited by 1 (0 self)
Abstract—This paper proposes a dynamic load balancing strategy called DLB-EM, based on maximum likelihood estimation methods, for parallel and distributed applications. A Gaussian mixture model is employed to characterize the workload of data-intensive applications. Using only a small subset of the workload information in the system, the DLB-EM strategy avoids much of the communication overhead caused by workload information exchange and job migration. Meanwhile, based on the Expectation-Maximization algorithm, DLB-EM achieves near-accurate estimation of the global system state with significantly less communication overhead, resulting in efficient workload balancing. Simulation results for some representative cases on a two-dimensional 16×16 grid demonstrate that the DLB-EM approach achieves even resource utilization and over 90% accuracy in the estimation of the global system state, with an over 70% reduction in communication overhead compared to a baseline strategy.
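The fitting machinery referenced above is standard Expectation-Maximization for a Gaussian mixture. The sketch below fits a tiny 1-D, two-component mixture from scratch; it illustrates the E-step/M-step structure only, and is not the paper's DLB-EM strategy (the data and initialization are made up).

```python
import math
import random

def em_gaussian_mixture(data, iters=50):
    """Tiny 1-D, two-component EM: alternate computing responsibilities
    (E-step) and re-estimating weights, means, variances (M-step)."""
    mu = [min(data), max(data)]          # crude initialization
    var = [1.0, 1.0]
    pi = [0.5, 0.5]
    for _ in range(iters):
        # E-step: responsibility of each component for each sample.
        resp = []
        for x in data:
            p = [pi[k] / math.sqrt(2 * math.pi * var[k])
                 * math.exp(-(x - mu[k]) ** 2 / (2 * var[k]))
                 for k in range(2)]
            s = sum(p)
            resp.append([pk / s for pk in p])
        # M-step: re-estimate mixture weights, means, and variances.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            pi[k] = nk / len(data)
            mu[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var[k] = sum(r[k] * (x - mu[k]) ** 2
                         for r, x in zip(resp, data)) / nk + 1e-6
    return mu, var, pi

random.seed(0)
data = ([random.gauss(0.0, 1.0) for _ in range(200)]
        + [random.gauss(10.0, 1.0) for _ in range(200)])
mu, var, pi = em_gaussian_mixture(data)
print(sorted(round(m, 1) for m in mu))  # means close to 0.0 and 10.0
```

The point of using such a model for load balancing is that a few parameters (weights, means, variances) summarize the whole workload distribution, so nodes can exchange the parameters instead of raw workload data.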
Max-plus algebra and discrete event simulation on parallel hierarchical heterogeneous platforms
 Lecture Notes in Computer Science, Euro-Par Workshops Proceedings 2010, as Part of HeteroPar
, 2010
Abstract

Cited by 1 (1 self)
Abstract. In this paper we explore computing max-plus algebra operations and discrete event simulations on parallel hierarchical heterogeneous platforms. When performing such tasks on heterogeneous platforms, parameters such as the total volume of communication and the top-level data partitioning strategy must be carefully taken into account. The choice of partitioning strategy is shown to greatly affect the overall performance of these applications, due to the different volumes of inter-partition communication that the various strategies impart on these operations. One partitioning strategy in particular is shown to reduce the execution times of these operations more than other, more traditional strategies. The main goal of this paper is to present the benefits waiting to be exploited by the use of max-plus algebra operations on these platforms, and thus to speed up more complex and quite common computational areas such as discrete event simulation.
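The basic kernel behind this line of work is matrix "multiplication" in the max-plus semiring: ordinary (+, ×) is replaced by (max, +). A minimal sketch:

```python
# Max-plus matrix product: entry (i, j) is max_k (A[i][k] + B[k][j]).
# -inf is the additive identity of the semiring (max(x, -inf) == x),
# playing the role that 0 plays in ordinary matrix multiplication.

NEG_INF = float("-inf")

def maxplus_matmul(A, B):
    n, m, p = len(A), len(B), len(B[0])
    return [[max(A[i][k] + B[k][j] for k in range(m))
             for j in range(p)] for i in range(n)]

A = [[0, 3], [NEG_INF, 1]]
B = [[2, NEG_INF], [1, 0]]
print(maxplus_matmul(A, B))  # [[4, 3], [2, 1]]
```

Because the loop structure is identical to ordinary matrix multiplication, the data-partitioning strategies discussed in the abstract carry over directly; only the scalar operations change.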
Heterogeneous Matrix-Matrix Multiplication or Partitioning a Square into Rectangles: NP-Completeness and Approximation Algorithms
, 2001
Abstract
In this paper we deal with two geometric problems arising from heterogeneous parallel computing: how to partition the unit square into p rectangles of given areas s1, s2, ..., sp (such that the areas sum to 1), so as to minimize (i) either the sum of the p perimeters of the rectangles, (ii) or the largest perimeter of the p rectangles. For both problems, we prove NP-completeness and we introduce approximation algorithms.
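The objective in case (i) can be made concrete with a simple column-based layout, a natural heuristic for this problem (this layout is illustrative and not claimed optimal): split the unit square into vertical columns, stack each column's rectangles, and sum the perimeters.

```python
# Partition the unit square into columns of rectangles and compute the
# sum of perimeters. `columns` is a list of lists of areas; a column's
# width is its total area (so the widths sum to 1 when the areas do),
# and each rectangle's height is its area divided by the column width.

def sum_of_perimeters(columns):
    total = 0.0
    for col in columns:
        width = sum(col)
        for area in col:
            total += 2 * (width + area / width)
    return total

# Areas 1/2, 1/4, 1/4: one column holding [1/2], one holding [1/4, 1/4].
print(sum_of_perimeters([[0.5], [0.25, 0.25]]))  # 7.0
```

For comparison, a rectangle of area s has perimeter at least 4*sqrt(s) (achieved by a square), so any partition of these three areas must have a perimeter sum of at least about 6.83; the column layout above is close to that bound, but the paper shows that finding the best partition in general is NP-complete.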
Novel Force Matrix Transformations with Optimal Load-Balance for 3-body Potential based Parallel Molecular Dynamics in a Heterogeneous Cluster Environment
, 2007
Abstract
Evaluating the Force Matrix constitutes the most computationally intensive part of a Molecular Dynamics (MD) simulation. In three-body MD simulations, the total energy of the system is determined by the energy of every unique triple in the system, and the force matrix is three-dimensional. The execution time of a three-body MD algorithm is thus proportional to the cube of the number of atoms in the system. Fortunately, there exist symmetries in the Force Matrix that can be exploited to improve the running time of the algorithm. While this optimization is straightforward to implement in sequential code, it has proven to be non-trivial for parallel code, even in a homogeneous environment. In this paper, we present two force matrix transformations that are capable of exploiting the symmetries in a 3-body ...
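The symmetry argument above amounts to the following counting fact: the 3-body energy over all ordered triples can be accumulated by visiting each unordered triple {i, j, k} exactly once, C(n, 3) evaluations instead of roughly n**3. A minimal sketch:

```python
from itertools import combinations

def unique_triples(n):
    """All index triples i < j < k over n atoms: each unordered triple
    is visited exactly once, exploiting the force-matrix symmetry."""
    return list(combinations(range(n), 3))

n = 6
triples = unique_triples(n)
print(len(triples))   # C(6, 3) = 20, versus 6**3 = 216 ordered triples
```

A sequential loop over these triples is trivial; the parallel difficulty the paper addresses is that the triples touching a given atom are spread unevenly across the index space, so naive partitioning of the i < j < k region load-balances poorly.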
GRENOBLE – RHÔNE-ALPES
, 2012
Abstract
Abstract: Divisible Load Theory (DLT) has received a lot of attention in the past decade. A divisible load is a perfectly parallel task that can be split arbitrarily and executed in parallel on a set of possibly heterogeneous resources. The success of DLT is strongly related to the existence of many optimal resource allocation and scheduling algorithms, which strongly differs from general scheduling theory. Moreover, close relationships have recently been underlined between DLT, which provides a fruitful theoretical framework for scheduling jobs on heterogeneous platforms, and MapReduce, which provides a simple and efficient programming framework for deploying applications on large-scale distributed platforms. The success of both has suggested extending their frameworks to tasks of non-linear complexity. In this paper, we show that both DLT and MapReduce are better suited to workloads with linear complexity. In particular, we prove that divisible load theory cannot directly be applied to quadratic workloads, as has been proposed recently. We precisely state the limits of classical DLT studies, and we review and propose solutions based on a careful preparation of the dataset and clever data partitioning algorithms. In particular, through simulations, we show the possible impact of this approach on the volume of communications generated by MapReduce, in the context of Matrix Multiplication and Outer Product algorithms.
A New Paradigm for Load Balancing in Wireless Mesh Networks
Abstract
Abstract: Obtaining maximum throughput across a network or a mesh through optimal load balancing is known to be an NP-hard problem. Designing efficient load balancing algorithms for networks in the wireless domain becomes an especially challenging task due to the limited bandwidth available. In this paper we present heuristic algorithms for load balancing and maximum-throughput scheduling in Wireless Mesh Networks with stationary nodes. The goals are to (a) improve the network throughput through admissibly optimal distribution of the network traffic across the wireless links, (b) ensure that the scheme is secure, and (c) ensure fairness to all nodes in the network for bandwidth allocation. The main consideration is the routing of non-local traffic between the nodes and the destination via multiple Internet gateways. Our schemes split an individual node's traffic to the Internet across the multiple gateways that are accessible from it. Simulation results show that this approach results in a marked increase in average network throughput in moderate to heavy traffic scenarios. We also prove that in our algorithm it is very difficult for an adversary to block a fraction of a node's available paths, making it extremely hard to compromise all traffic from a node. Simulation results also show that our scheme is admissibly fair in bandwidth allocation, even to nodes with the longest paths to the gateway nodes.
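The multi-gateway splitting idea above can be sketched in its simplest form: divide a node's Internet-bound demand across its reachable gateways in proportion to each gateway's residual capacity. This is an illustrative sketch only; the demands and capacities are made up, and the paper's heuristics additionally account for fairness and path security.

```python
# Split one node's Internet-bound traffic across its reachable gateways
# in proportion to each gateway's residual capacity.

def split_traffic(demand, residual_capacities):
    """Return per-gateway traffic shares proportional to residual capacity."""
    total = sum(residual_capacities)
    return [demand * c / total for c in residual_capacities]

shares = split_traffic(12.0, [6.0, 3.0, 3.0])
print(shares)  # [6.0, 3.0, 3.0]
```

Spreading each node's traffic over several gateways is also what underlies the security claim: an adversary must compromise or block every one of the node's gateway paths to capture all of its traffic.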