Results 1–10 of 89
Static Scheduling Algorithms for Allocating Directed Task Graphs to Multiprocessors
, 1999
Abstract

Cited by 312 (4 self)
Modes of Computation: Parallelism and concurrency. General Terms: Algorithms, Design, Performance, Theory. Additional Key Words and Phrases: automatic parallelization, DAG, multiprocessors, parallel processing, software tools, static scheduling, task graphs. This research was supported by the Hong Kong Research Grants Council under contract numbers HKUST 734/96E, HKUST 6076/97E, and HKU 7124/99E. Authors' addresses: Y.-K. Kwok, Department of Electrical and Electronic Engineering, The University of Hong Kong, Pokfulam Road, Hong Kong (email: ykwok@eee.hku.hk); I. Ahmad, Department of Computer Science, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong. ACM Computing Surveys, Vol. 31, No. 4, December 1999.
Hypertool: A Programming Aid for Message-Passing Systems
 IEEE TRANS. ON PARALLEL AND DISTRIBUTED SYSTEMS
, 1990
Abstract

Cited by 194 (17 self)
As both the number of processors and the complexity of problems to be solved increase, programming multiprocessing systems becomes more difficult and error-prone. This paper discusses programming assistance and automation concepts and their application to a program development tool for message-passing systems called Hypertool. It performs scheduling and handles communication primitive insertion automatically. Two algorithms, based on the critical-path method, are presented for scheduling processes statically. Hypertool also generates performance estimates and other program quality measures to help programmers improve their algorithms and programs.
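Critical-path static scheduling of the kind the abstract mentions can be illustrated with a small list-scheduling sketch. This is a generic bottom-level heuristic under my own simplifying assumptions (zero communication cost, earliest-free-processor placement); it is not Hypertool's actual algorithm, and all function names are mine:

```python
import heapq

def bottom_levels(succ, cost):
    """Bottom level of each task: longest-cost path from it to any exit task."""
    bl = {}
    def bottom(v):
        if v not in bl:
            bl[v] = cost[v] + max((bottom(s) for s in succ[v]), default=0)
        return bl[v]
    for v in succ:
        bottom(v)
    return bl

def list_schedule(succ, cost, nprocs):
    """Greedy list scheduling: pick the ready task with the highest
    bottom level and place it on the processor that frees up earliest."""
    pred = {v: [] for v in succ}
    for u in succ:
        for s in succ[u]:
            pred[s].append(u)
    bl = bottom_levels(succ, cost)
    ready = [(-bl[v], v) for v in succ if not pred[v]]
    heapq.heapify(ready)
    proc_free = [0.0] * nprocs
    start, finish = {}, {}
    remaining = {v: len(pred[v]) for v in succ}
    while ready:
        _, v = heapq.heappop(ready)
        p = min(range(nprocs), key=lambda i: proc_free[i])
        # earliest start: processor free AND all predecessors finished
        est = max([proc_free[p]] + [finish[u] for u in pred[v]])
        start[v] = est
        finish[v] = est + cost[v]
        proc_free[p] = finish[v]
        for s in succ[v]:
            remaining[s] -= 1
            if remaining[s] == 0:
                heapq.heappush(ready, (-bl[s], s))
    return start, max(finish.values())
```

On a unit-cost diamond DAG (a before b and c, both before d) with two processors, this yields the expected makespan of 3.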
On a New Class of Codes for Identifying Vertices in Graphs
 IEEE Transactions on Information Theory
, 1998
Abstract

Cited by 92 (2 self)
We investigate a new class of codes for the optimal covering of vertices in an undirected graph G such that any vertex in G can be uniquely identified by examining the vertices that cover it. We define a ball of radius t centered on a vertex v to be the set of vertices in G that are at distance at most t from v. The vertex v is then said to cover itself and every other vertex in the ball with center v. Our formal problem statement is as follows: given an undirected graph G and an integer t ≥ 1, find a (minimal) set C of vertices such that every vertex in G belongs to a unique set of balls of radius t centered at the vertices in C. The set of vertices thus obtained constitutes a code for vertex identification. We first develop topology-independent bounds on the size of C. We then develop methods for constructing C for several specific topologies such as binary cubes, non-binary cubes, and trees. We also describe the identification of sets of vertices using covering codes that uniquely identify single vertices. We develop methods for constructing optimal topologies that yield identifying codes with a minimum number of codewords. Finally, we describe an application of the theory developed in this paper to fault diagnosis of multiprocessor systems.
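The definition translates directly into a brute-force check. The sketch below (function names are mine, not from the paper) verifies whether a candidate set C is an identifying code of radius t on a small graph: every vertex's intersection of its ball with C must be nonempty and distinct from every other vertex's:

```python
def balls(adj, t):
    """B_t(v) for each vertex v: all vertices within distance t (BFS)."""
    out = []
    for v in range(len(adj)):
        dist = {v: 0}
        frontier = [v]
        while frontier:
            nxt = []
            for u in frontier:
                for w in adj[u]:
                    if w not in dist:
                        dist[w] = dist[u] + 1
                        nxt.append(w)
            frontier = nxt
        out.append({u for u, d in dist.items() if d <= t})
    return out

def is_identifying(adj, C, t=1):
    """True iff every vertex's identifying set B_t(v) ∩ C is nonempty
    and unique among all vertices."""
    B = balls(adj, t)
    sigs = [frozenset(B[v] & set(C)) for v in range(len(adj))]
    return all(sigs) and len(set(sigs)) == len(sigs)
```

For the 4-cycle, {0, 1, 2} is identifying at t = 1, while {0, 1} is not, since vertices 0 and 1 receive the same identifying set.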
A survey of the theory of hypercube graphs
 Computers and Mathematics with Applications 15
, 1988
Abstract

Cited by 62 (1 self)
We present a comprehensive survey of the theory of hypercube graphs. Basic properties related to distance, coloring, domination and genus are reviewed. The properties of the n-cube defined by its subgraphs are considered next, including thickness, coarseness, Hamiltonian cycles and induced paths and cycles. Finally, various embedding and packing problems are discussed, including the determination of the cubical dimension of a given cubical graph.
Chare Kernel: A Runtime Support System for Parallel Computations
, 1991
"... This paper presents the chare kernel system, which supports parallel computations with irregular structure. The chare kernel is a collection of primitive functions that manage chares, manipulate messages, invoke atomic computations, and coordinate concurrent activities. Programs written in the chare ..."
Abstract

Cited by 33 (4 self)
This paper presents the chare kernel system, which supports parallel computations with irregular structure. The chare kernel is a collection of primitive functions that manage chares, manipulate messages, invoke atomic computations, and coordinate concurrent activities. Programs written in the chare kernel language can be executed on different parallel machines without change. Users writing such programs concern themselves with the creation of parallel actions but not with assigning them to specific processors. We describe the design and implementation of the chare kernel. Performance of chare kernel programs on two hypercube machines, the Intel iPSC/2 and the NCUBE, is also given. 1. Introduction. Large parallel computer systems are becoming increasingly available, and larger systems will be built in the near future [27]. For example, the NCUBE/2 with 8K processors has been commercially announced, and a 16K-processor MIMD machine is being built [6, 12]. However, programming these machin...
Intensive Hypercube Communication: Prearranged Communication in Link-Bound Machines
 Journal of Parallel and Distributed Computing
, 1990
Abstract

Cited by 32 (0 self)
Hypercube algorithms are developed for a variety of communication-intensive tasks such as transposing a matrix, histogramming, one node sending a (long) message to another, broadcasting a message from one node to all others, each node broadcasting a message to all others, and nodes exchanging messages via a fixed permutation. The algorithm for exchanging via a fixed permutation can be viewed as a deterministic analogue of Valiant's randomized routing. The algorithms are for link-bound hypercubes, in which local processing time is ignored, communication time predominates, message headers are not needed because all nodes know the task being performed, and all nodes can use all communication links simultaneously. Through systematic use of techniques such as pipelining, batching, variable packet sizes, symmetrizing, and completing, algorithms are obtained for all problems that achieve a time with an optimal highest-order term. 1. Introduction. This paper gives efficient hypercube algorith...
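The simplest of the tasks listed, one-to-all broadcast, finishes in d rounds on a d-cube by forwarding across one dimension per round. The small simulation below is my own illustration of that standard pattern, not an algorithm taken from the paper:

```python
def hypercube_broadcast(d, source=0):
    """Simulate one-to-all broadcast on a d-cube: in round k, every node
    that holds the message forwards it to its neighbor across dimension k.
    Returns the list of (sender, receiver) pairs used in each round."""
    have = {source}
    rounds = []
    for k in range(d):
        bit = 1 << k
        sends = [(u, u ^ bit) for u in sorted(have) if u ^ bit not in have]
        have |= {v for _, v in sends}
        rounds.append(sends)
    assert have == set(range(1 << d))  # all 2^d nodes reached after d rounds
    return rounds
```

The number of senders doubles each round (1, 2, 4, ...), matching the d-round lower bound for reaching all 2^d nodes.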
Iterative algorithms for solution of large sparse systems of linear equations on hypercubes
 IEEE Transactions on Computers
, 1988
Dual-Cubes: A New Interconnection Network for High-Performance Computer Clusters
, 2000
Abstract

Cited by 26 (19 self)
The binary hypercube, or n-cube, has been widely used as the interconnection network in parallel computers. However, the major drawback of the hypercube is the increase in the number of communication links per node as the total number of nodes in the system grows. This paper introduces a new interconnection network for large-scale distributed-memory multiprocessors called the dual-cube. This network mitigates the problem of the increasing number of links in a large-scale hypercube network while keeping most of the topological properties of the hypercube. We investigate the topological properties of the dual-cube, compare them with other hypercube-like networks, and establish the basic routing and broadcasting algorithms for dual-cubes.
All pairs shortest paths on a hypercube multiprocessor
, 1987
Abstract

Cited by 20 (1 self)
In this paper, we consider parallel solutions to the all pairs problem only. As with other studies, our development considers finding only the lengths of the shortest paths. We are interested in solving the all pairs problem on an MIMD hypercube in which each processor has local memory. Specifically, our experimentation is done on an NCUBE/7 hypercube. A block diagram of this computer is shown in Fig 1.1. [Figure 1.1: Block diagram of the NCUBE hypercube.] A detailed description of the architecture of the NCUBE hypercube appears in [6]. This is an MIMD computer. Each hypercube processor is a custom 32-bit, 2-MIPS, 500,000-FLOPS processor. The local memory for each hypercube processor is either 128K or 512K bytes. In the configuration available to us, each processor has 128K bytes of local memory. The high-level language support currently includes FORTRAN and C. Both have been extended to allow for communication between the host and the hypercube and among the hypercube processors. When designing a parallel algorithm for a shared-memory machine such as the HEP, one focuses on the creation of multiple processes that avoid memory access conflicts. In the case of a computer such as the hypercube, the focus is on partitioning the data so as to reduce the time spent in interprocessor communication. We consider two methods of partitioning the data for the all pairs problem, called partitioning by stripes and by rectangles. These two methods are evaluated experimentally and are also compared with the alternative of running a single-source sequential algorithm on each of the hypercube processors. 2. Floyd's All Pairs Algorithm. Floyd's all pairs algorithm (see [3], e.g.) is given in Fig 2.1. Here L [i, j ] is initial...
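Floyd's all-pairs algorithm referenced here is the standard triple loop over intermediate vertices; a sketch in the paper's L[i, j] notation (the function name is mine):

```python
def floyd_all_pairs(L):
    """Floyd's all-pairs shortest-path lengths. L[i][j] holds the edge
    weight (math.inf if no edge) with L[i][i] = 0; updated in place so
    that afterwards L[i][j] is the shortest i-to-j path length."""
    n = len(L)
    for k in range(n):          # allow k as an intermediate vertex
        for i in range(n):
            for j in range(n):
                if L[i][k] + L[k][j] < L[i][j]:
                    L[i][j] = L[i][k] + L[k][j]
    return L
```

The sequential cost is O(n^3), which is exactly what the stripe and rectangle partitionings discussed in the abstract distribute across hypercube processors.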
Optimal Processor Assignment for Pipeline Computations
 IEEE TRANS. ON PARALLEL AND DISTRIBUTED SYSTEMS
, 1994
Abstract

Cited by 19 (7 self)
The availability of large-scale multitasked parallel architectures introduces the following processor assignment problem for pipelined computations. Given a set of tasks and their precedence constraints, along with their experimentally determined individual response times for different processor sizes, find an assignment of processors to tasks. Two objectives interest us: minimal response time given a throughput requirement, and maximal throughput given a response time requirement. These assignment problems differ considerably from the classical mapping problem, in which several tasks share a processor; instead, we assume that a large number of processors are to be assigned to a relatively small number of tasks. In this paper we develop efficient assignment algorithms for different classes of task structures. For a p-processor system and a series-parallel precedence graph with n constituent tasks, we provide an O(np²) algorithm that finds the optimal assignment for the response time optimization problem; we find the assignment optimizing the constrained throughput in O(np² log p) time. Special cases of linear, independent, and tree graphs are also considered. In addition, we examine more efficient algorithms when certain restrictions are placed on the problem parameters. Our techniques are applied to a task system in computer vision.
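For intuition on the O(np²) bound, consider the simplest case: a linear chain of stages whose total response time is the sum of per-stage times. Splitting p processors among the stages then admits a knapsack-style dynamic program. The sketch below is my own illustration of that shape (the resp table is hypothetical measured data), not the paper's algorithm:

```python
def assign_processors(resp, p):
    """Minimize summed stage response time for a linear chain of stages.
    resp[i][a] is stage i's measured response time on a processors
    (index 0 is an unused placeholder); every stage gets >= 1 processor.
    Runs in O(n * p^2): n stages, p^2 (budget, allocation) pairs."""
    INF = float('inf')
    n = len(resp)
    # f[q]: best total response for the remaining stages on exactly q procs
    f = [INF] * (p + 1)
    f[0] = 0.0
    for i in range(n - 1, -1, -1):
        g = [INF] * (p + 1)
        for q in range(1, p + 1):
            for a in range(1, min(q, len(resp[i]) - 1) + 1):
                if f[q - a] < INF:
                    g[q] = min(g[q], resp[i][a] + f[q - a])
        f = g
    return f[p]
```

With two stages and three processors, the program correctly trades processors toward the stage that benefits most from them.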