A Taxonomy of Scheduling in General-Purpose Distributed Computing Systems
- IEEE Transactions on Software Engineering
, 1988
Cited by 223 (0 self)
Abstract: One measure of usefulness of a general-purpose distributed computing system is the system’s ability to provide a level of performance commensurate to the degree of multiplicity of resources present in the system. Many different approaches and metrics of performance have been proposed in an attempt to achieve this goal in existing systems. In addition, analogous problem formulations exist in other fields such as control theory, operations research, and production management. However, due to the wide variety of approaches to this problem, it is difficult to meaningfully compare different systems since there is no uniform means for qualitatively or quantitatively evaluating them. It is difficult to successfully build upon existing work or identify areas worthy of additional effort without some understanding of the relationships between past efforts. In this paper, a taxonomy of approaches to the resource management problem is presented in an attempt to provide a common terminology and classification mechanism necessary in addressing this problem. The taxonomy, while presented and discussed in terms of distributed scheduling, is also applicable to most types of resource management. As an illustration of the usefulness of the taxonomy an annotated bibliography is given which classifies a large number of distributed scheduling approaches according to the taxonomy. Index Terms: Distributed operating systems, distributed resource management, general-purpose distributed computing systems, scheduling, task allocation, taxonomy.
Models of Machines and Computation for Mapping in Multicomputers
, 1993
Cited by 76 (1 self)
It is now more than a quarter of a century since researchers started publishing papers on mapping strategies for distributing computation across the computation resource of multiprocessor systems. There exists a large body of literature on the subject, but there is no commonly-accepted framework whereby results in the field can be compared. Nor is it always easy to assess the relevance of a new result to a particular problem. Furthermore, changes in parallel computing technology have made some of the earlier work of less relevance to current multiprocessor systems. Versions of the mapping problem are classified, and research in the field is considered in terms of its relevance to the problem of programming currently available hardware in the form of a distributed memory multiple instruction stream multiple data stream computer: a multicomputer.
Design, implementation, and evaluation of parallel pipelined STAP on parallel computers
- In IEEE IPPS/SPDP'98
, 1998
Cited by 6 (1 self)
Performance results are presented for the design and implementation of parallel pipelined space-time adaptive processing (STAP) algorithms on parallel computers. In particular, the issues involved in parallelization, our approach to parallelization, and performance results on an Intel Paragon are described. The process of developing software for such an application on parallel computers when latency and throughput are considered together is discussed, and the tradeoffs with respect to inter- and intratask communication and data redistribution are presented. The results show not only that scalable performance was achieved for the individual component tasks of STAP but also that linear speedups were obtained for the integrated task performance, for both latency and throughput. Results are presented for up to 236 compute nodes (limited by the machine size available to us). Another interesting observation from the implementation results is that assigning additional processors to one task can improve the performance of other tasks without any increase in the number of processors assigned to them. Normally, this cannot be predicted by theoretical analysis. Manuscript received January 29, 1999; revised June 9 and
Mapping Arbitrary Non-Uniform Task Graphs onto Arbitrary Non-Uniform System Graphs
- In Proc. International Conference on Parallel Processing
, 1995
Cited by 5 (2 self)
In this paper, a generic technique for clustering and mapping arbitrary task graphs onto arbitrary system graphs is presented. The task and system graphs studied in this paper have non-uniform computation and communication weights associated with the nodes and edges. The task graphs are directed graphs, while the system graphs are undirected. Using the two clustering algorithms presented, a multi-level clustered graph called a Spec graph can be obtained from a given task graph, and a multi-level clustered graph called a Rep graph can be obtained from a given system graph. We present a mapping algorithm which produces a sub-optimal matching of a given Spec graph containing M task modules onto a Rep graph of N processors in O(MP) time, where P = max(M, N). This algorithm is the first technique which can map arbitrary task graphs with non-uniform nodes and edges onto arbitrary system graphs with non-uniform nodes and edges. A number of algorithms exist which can map an arbitrary non-uniform task graph onto a specific uniform system graph. Even though our algorithm is more generic, we still compare it with these specialized techniques and show that our technique produces similar results with lower time complexity. 1 Introduction The mapping problem is one of the most challenging problems in parallel and distributed computing [9, 16]. It is known to be NP-complete in its general form as well as in several restricted forms [16]. The mapping problem has been studied in a number of different ways in the literature. Mapping can be either static or dynamic. In static mapping, the assignments of the nodes
Video Signal Processing and Coding on Data-Parallel Computers
, 1995
Cited by 5 (2 self)
Special purpose hardware has traditionally been viewed as the only practical solution for high speed Video Signal Processing (VSP). However, new parallel computing technologies may provide a much more flexible alternative. In this paper we discuss techniques for implementing VSP algorithms on data-parallel computers, including data distribution and the tradeoffs between memory usage, communication, and computation. We provide theoretical analyses and illustrate them with examples of implementations written for the MasPar data-parallel computers. The algorithms studied here are a selection of classical algorithms that includes block-DCT coding, subband coding and block-matching motion estimation. Two new algorithms by the authors are also presented, on intraframe nonorthogonal subband coding and on motion compensated 3-D subband coding. 1 Introduction The concurrent progress in digital video signal processing (VSP) technologies and digital video transmission on the one h...
Improved Compressions of Cube-Connected Cycles Networks
Cited by 5 (0 self)
We present a new technique for the embedding of large cube-connected cycles networks (CCC) into smaller ones, a problem that arises when algorithms designed for an architecture of an ideal size are to be executed on an existing architecture of a fixed size. Using the new embedding strategy, we show that the CCC of dimension $l$ can be embedded into the CCC of dimension $k$ with dilation 1 and optimum load for any $k, l \in \mathbb{N}$, $k \geq 8$, such that $\frac{5}{3} + c_k < \frac{l}{k} \leq 2$, $c_k = \frac{4k+3}{3 \cdot 2^{\frac{2}{3}k}}$, thus improving known results. Our embedding technique also leads to improved dilation 1 embeddings in the case $\frac{3}{2} < \frac{l}{k} \leq \frac{5}{3} + c_k$. Index Terms: Parallel computations, parallel architectures, interconnection networks, graph embedding, network simulation, cube-connected cycles network. 1 Introduction Over the past few years, a lot of research has been done in the field of interconnection networks for parallel computer architectures (for an overview, cf. [19]). Much of the work has been focused...
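The parameter ranges in this abstract are easier to read with a small calculation. The sketch below is illustrative only: it assumes the extraction-damaged condition reads 5/3 + c_k < l/k <= 2 with c_k = (4k + 3) / (3 * 2^(2k/3)), and 3/2 < l/k <= 5/3 + c_k for the second case. The function names `c` and `regime` are hypothetical and do not come from the paper.

```python
def c(k: int) -> float:
    """c_k = (4k + 3) / (3 * 2^(2k/3)), as reconstructed from the abstract (an assumption)."""
    return (4 * k + 3) / (3 * 2 ** (2 * k / 3))


def regime(k: int, l: int) -> str:
    """Report which case of the abstract a pair (k, l) falls into.

    k is the dimension of the smaller (host) CCC, l that of the larger (guest) CCC.
    """
    if k < 8:
        return "not covered (k < 8)"
    ratio = l / k
    threshold = 5 / 3 + c(k)
    if threshold < ratio <= 2:
        return "dilation 1 with optimum load"
    if 3 / 2 < ratio <= threshold:
        return "improved dilation 1 embedding"
    return "not covered"


if __name__ == "__main__":
    for k, l in [(12, 24), (12, 20), (12, 18)]:
        print(k, l, regime(k, l))
```

Because c_k shrinks quickly as k grows, the first range approaches 5/3 < l/k <= 2 for large k.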
Compressing Cube-Connected Cycles and Butterfly Networks
- Proc. 2nd IEEE Symposium on Parallel and Distributed Processing
, 1990
Cited by 4 (4 self)
We consider the simulation of large cube-connected cycles (CCC) and large butterfly networks (BFN) on smaller ones, a problem that arises when algorithms designed for an architecture of an ideal size are to be executed on an existing architecture of a fixed size. We show that large CCC's and BFN's can be embedded into smaller networks of the same type with (a) dilation 2 and optimum load, (b) dilation 1 and optimum load in most cases, (c) dilation 1 and nearly optimum load in all cases. Our results show that large CCC's and BFN's can be simulated very efficiently on smaller ones. Additionally, we implemented our algorithm for compressing CCC's and ran several experiments on a Transputer network, which showed that our technique also behaves very well from a practical point of view. A preliminary version of these results appears in: Proc. 2nd IEEE Symposium on Parallel and Distributed Processing (1990), pp. 858-865. † This work was supported by grant Mo 285/4-1 from the German Re...
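The two quality measures named in this abstract, load and dilation, can be made concrete with a short sketch. It uses the standard definitions: load is the largest number of guest nodes mapped to one host node, and dilation is the largest host distance between the images of adjacent guest nodes. Plain cycles stand in for CCC and butterfly networks purely to keep the example small; the helper names and the embedding are illustrative assumptions, not the paper's construction.

```python
from collections import Counter, deque


def cycle(n):
    """Adjacency lists of an undirected cycle on nodes 0..n-1 (a stand-in topology)."""
    return {v: [(v - 1) % n, (v + 1) % n] for v in range(n)}


def distances_from(graph, source):
    """Breadth-first search distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in graph[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def load_and_dilation(guest, host, embedding):
    """Return (load, dilation) of a node-to-node embedding of guest into host."""
    load = max(Counter(embedding[v] for v in guest).values())
    dilation = max(
        distances_from(host, embedding[u])[embedding[v]]
        for u in guest
        for v in guest[u]
    )
    return load, dilation


if __name__ == "__main__":
    guest, host = cycle(6), cycle(3)
    embedding = {v: v // 2 for v in guest}  # two consecutive guest nodes per host node
    print(load_and_dilation(guest, host, embedding))  # (2, 1)
```

Here the load of 2 equals 6/3, so it is optimum in the sense used above, and every guest edge maps to a host path of length at most 1.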
Composition of Specifications of Message Passing Applications Composed by the Ensemble Methodology
- In Proc. of 6th Hellenic Conference on Informatics, Athens, volume I, 299-312, Ekdoseis Neon Technologion
, 1997
Cited by 3 (3 self)
We present a specification composition technique which supports the message passing composition of applications by the Ensemble methodology. In Ensemble, applications are built by composing reusable executable program components designed with scalable communication interfaces. The composition is specified by scripts. We formally support Ensemble by defining reusable specifications of program components which are then composed to obtain the complete specification of the application. Formal methods may then be used to test and verify applications. The composition is controlled by the same script that is used to compose the application. We use coloured Petri nets (CPN) for component specification and their composition. Associating program components with their specification components, and composing programs and their specifications in similar ways, improves the software production process. 1. Introduction The software engineering step from message passing application des...
Load Balanced Tree Embeddings
, 1991
Cited by 3 (1 self)
When an $n$-processor architecture $T$ is embedded into an $m$-processor architecture $H$ with $n > m$ and every processor of $H$ simulates at least $\lfloor n/m \rfloor$ and at most $\lceil n/m \rceil$ processors of $T$, the embedding has a balanced processor load. We present efficient embeddings with a balanced load for the case when both architectures are complete binary trees. We show that $T$ can be embedded into $H$ with a dilation of 1 and a congestion of at most $\min\{\lceil n/m \rceil, 2 \log n\}$. We also consider embeddings that achieve a balanced l/i load; i.e., every processor of $H$ simulates at most $\lceil \frac{n+1}{2m} \rceil$ leaves and at most $\lceil \frac{n-1}{2m} \rceil$ interior processors of $T$. We present an embedding that achieves a balanced l/i load, a dilation of $2\lceil \log \log m \rceil + 1$ and a congestion of $O(\log n)$. We show that every embedding strategy achieving a balanced l/i load must have a dilation of at least 3. We also consider the embedding problem when every edge of $T$ has a weight associated with it. Keywords: Graph embeddings, binary tree ne...
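The balanced-load condition defined in this abstract can be checked mechanically. The sketch below assumes the definition as read here: each of the m host processors simulates between floor(n/m) and ceil(n/m) of the n guest processors. The function names are hypothetical, the sample assignment is not the paper's tree embedding, and base-2 logarithms in the congestion bound are an assumption.

```python
from collections import Counter
from math import log2


def is_balanced(assignment, n, m):
    """True if every host 0..m-1 simulates between floor(n/m) and ceil(n/m) guests.

    assignment maps each guest processor to the host processor that simulates it.
    """
    if len(assignment) != n:
        return False
    load = Counter(assignment.values())
    low, high = n // m, -(-n // m)  # integer floor and ceiling of n/m
    return all(low <= load.get(h, 0) <= high for h in range(m))


def congestion_bound(n, m):
    """min(ceil(n/m), 2 log n), the congestion bound quoted above (log base 2 assumed)."""
    return min(-(-n // m), 2 * log2(n))


if __name__ == "__main__":
    n, m = 7, 3  # a 7-node guest on a 3-node host
    round_robin = {guest: guest % m for guest in range(n)}
    print(is_balanced(round_robin, n, m))  # True: the loads are 3, 2, 2
    print(congestion_bound(n, m))          # 3
```

A round-robin assignment satisfies the load condition but says nothing about dilation or congestion, which depend on how the tree edges are routed.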

