Results 1 - 10
of
169
Cilk: An Efficient Multithreaded Runtime System
- JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING
, 1995
"... Cilk (pronounced "silk") is a C-based runtime system for multithreaded parallel programming. In this paper, we document the efficiency of the Cilk work-stealing scheduler, both empirically and analytically. We show that on real and synthetic applications, the "work" and "critical-path length" of a C ..."
Abstract
-
Cited by 430 (34 self)
- Add to MetaCart
Cilk (pronounced "silk") is a C-based runtime system for multithreaded parallel programming. In this paper, we document the efficiency of the Cilk work-stealing scheduler, both empirically and analytically. We show that on real and synthetic applications, the "work" and "critical-path length" of a Cilk computation can be used to model performance accurately. Consequently, a Cilk programmer can focus on reducing the computation's work and critical-path length, insulated from load balancing and other runtime scheduling issues. We also prove that for the class of "fully strict" (well-structured) programs, the Cilk scheduler achieves space, time, and communication bounds all within a constant factor of optimal. The Cilk
Fortran D Language Specification
, 1990
"... This paper presents Fortran D, a version of Fortran enhanced with data decomposition specifications. It is designed to support two fundamental stages of writing a data-parallel program: problem mapping using sophisticated array alignments, and machine mapping through a rich set of data distribution ..."
Abstract
-
Cited by 278 (47 self)
- Add to MetaCart
This paper presents Fortran D, a version of Fortran enhanced with data decomposition specifications. It is designed to support two fundamental stages of writing a data-parallel program: problem mapping using sophisticated array alignments, and machine mapping through a rich set of data distribution functions. We believe that Fortran D provides a simple machine-independent programming model for most numerical computations. We intend to evaluate its usefulness for both programmers and advanced compilers on a variety of parallel architectures.
Programming Parallel Algorithms
, 1996
"... In the past 20 years there has been treftlendous progress in developing and analyzing parallel algorithftls. Researchers have developed efficient parallel algorithms to solve most problems for which efficient sequential solutions are known. Although some ofthese algorithms are efficient only in a th ..."
Abstract
-
Cited by 163 (7 self)
- Add to MetaCart
In the past 20 years there has been treftlendous progress in developing and analyzing parallel algorithftls. Researchers have developed efficient parallel algorithms to solve most problems for which efficient sequential solutions are known. Although some ofthese algorithms are efficient only in a theoretical framework, many are quite efficient in practice or have key ideas that have been used in efficient implementations. This research on parallel algorithms has not only improved our general understanding ofparallelism but in several cases has led to improvements in sequential algorithms. Unf:ortunately there has been less success in developing good languages f:or prograftlftling parallel algorithftls, particularly languages that are well suited for teaching and prototyping algorithms. There has been a large gap between languages
Automatic Data Partitioning on Distributed Memory Multiprocessors
, 1991
"... An important problem facing numerous research projects on parallelizing compilers for distributed memory machines is that of automatically determining a suitable data partitioning scheme for a program. Most of the current projects leave this tedious problem almost entirely to the user. In this paper ..."
Abstract
-
Cited by 102 (6 self)
- Add to MetaCart
An important problem facing numerous research projects on parallelizing compilers for distributed memory machines is that of automatically determining a suitable data partitioning scheme for a program. Most of the current projects leave this tedious problem almost entirely to the user. In this paper, we present a novel approach to the problem of automatic data partitioning. We introduce the notion of constraints on data distribution, and show how, based on performance considerations, a compiler identifies constraints to be imposed on the distribution of various data structures. These constraints are then combined by the compiler to obtain a complete and consistent picture of the data distribution scheme, one that offers good performance in terms of the overall execution time.
Exploiting Task and Data Parallelism on a Multicomputer
- In ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
, 1993
"... For many applications, achieving good performance on a private memory parallel computer requires exploiting data parallelism as well as task parallelism. Depending on the size of the input data set and the number of nodes (i.e., processors), different tradeoffs between task and data parallelism are ..."
Abstract
-
Cited by 86 (21 self)
- Add to MetaCart
For many applications, achieving good performance on a private memory parallel computer requires exploiting data parallelism as well as task parallelism. Depending on the size of the input data set and the number of nodes (i.e., processors), different tradeoffs between task and data parallelism are appropriate for a parallel system. Most existing compilers focus on only one of data parallelism and task parallelism. Therefore, to achieve the desired results, the programmer must separately program the data and task parallelism. We have taken a unified approach to exploiting both kinds of parallelism in a single framework with an existing language. This approach eases the task of programming and exposes the tradeoffs between data and task parallelism to the compiler. We have implemented a parallelizing Fortran compiler for the iWarp system based on this approach. We discuss the design of our compiler, and present performance results to validate our approach. 1 Introduction Many applicati...
Compiler Support for Machine-Independent Parallel Programming in Fortran D
, 1991
"... Because of the complexity and variety of parallel architectures, an efficient machine-independent parallel programming model is needed to make parallel computing truly usable for scientific programmers. We believe that Fortran D, a version of Fortran enhanced with data decomposition specifications, ..."
Abstract
-
Cited by 76 (16 self)
- Add to MetaCart
Because of the complexity and variety of parallel architectures, an efficient machine-independent parallel programming model is needed to make parallel computing truly usable for scientific programmers. We believe that Fortran D, a version of Fortran enhanced with data decomposition specifications, can provide such a programming model. This paper presents the design of a prototype Fortran D compiler for the iPSC/860, a MIMD distributed-memory machine. Issues addressed include data decomposition analysis, guard introduction, communications generation and optimization, program transformations, and storage assignment. A test suite of scientific programs will be used to evaluate the effectiveness of both the compiler technology and programming model for the Fortran D compiler.
The Message-Driven Processor: A Multicomputer Processing Node with Efficient Mechanisms
- IEEE MICRO
, 1992
"... The Message-Driven Processor (MDP) is an integrated multicomputer node that provides efficient mechanisms for parallel computing. It incorporates a 36-bit integer processor, a memory management unit, a router for a 3-D mesh network, a network interface, a 4K-word x 36-bit SRAM, and an ECC DRAM co ..."
Abstract
-
Cited by 68 (8 self)
- Add to MetaCart
The Message-Driven Processor (MDP) is an integrated multicomputer node that provides efficient mechanisms for parallel computing. It incorporates a 36-bit integer processor, a memory management unit, a router for a 3-D mesh network, a network interface, a 4K-word x 36-bit SRAM, and an ECC DRAM controller in a single 1.1M transistor VLSI chip. Rather than being specialized for a single model of computation, the MDP incorporates efficient primitive mechanisms for communication, synchronization and naming. These mechanisms efficiently support most proposed parallel programming models. Each processing node of the MIT J-Machine consists of an MDP with 1 MByte of DRAM. MDPs have been operational since June 1991 and J-Machines built from them have been on-line since July 1991.
Architecture and Applications of the Connection Machine
- IEEE Computer
, 1988
"... F the use ot computers affects increasingly broader segments of computer systems increase steadily. For-tunately, the technology base for the last twenty years has continued to improve at a steady rate-increasing in capacity and speed while decreasing in cost for perfor-mance. However, the demands o ..."
Abstract
-
Cited by 56 (0 self)
- Add to MetaCart
F the use ot computers affects increasingly broader segments of computer systems increase steadily. For-tunately, the technology base for the last twenty years has continued to improve at a steady rate-increasing in capacity and speed while decreasing in cost for perfor-mance. However, the demands outpace the technology. This raises the question, can we make a quantum leap in perfor-mance while the rate of technology improvement remains relatively constant? Computer architects have followed two general approaches in response to this question. The first uses exotic technology in a fairly conventional serial computer architecture. This approach suffers from manufacturing and maintenance problems and high costs. The second approach exploits the parallelism inherent in many problems. The parallel approach seems to offer the best long-term strategy because, as the problems grow, more and more opportunities arise to exploit the parallel-ism inherent in the data itself. Where do we find the inherent parallel-ism and how do we exploit it? Most com-puter programs consist of a control sequence (the instructions) and a collection of data elements. Large programs have tens of thousands of instructions operat-
Representing and Executing Agent-Based Systems
- Intelligent Agents: Theories, Architectures, and Languages (LNAI Volume 890
, 1994
"... In this paper we describe an approach to the representation and implementation of agent-based systems where the behaviour of an individual agent is represented by a set of logical rules in a particular form. This not only provides a logical specification of the agent, but also allows us to direct ..."
Abstract
-
Cited by 50 (9 self)
- Add to MetaCart
In this paper we describe an approach to the representation and implementation of agent-based systems where the behaviour of an individual agent is represented by a set of logical rules in a particular form. This not only provides a logical specification of the agent, but also allows us to directly execute the rules in order to implement the agent's behaviour. Agents communicate with each other through a simple, and logically well-founded, broadcast communication mechanism.
C**: A Large-Grain, Object-Oriented, Data-Parallel Programming Language
- LANGUAGES AND COMPILERS FOR PARALLEL COMPUTING (5TH INTERNATIONAL WORKSHOP
, 1992
"... C** is a new data-parallel programming language based on a new computation model called large-grain data parallelism. C** overcomes many disadvantages of existing data-parallel languages, yet retains their distinctive and advantageous programming style and deterministic behavior. This style makes da ..."
Abstract
-
Cited by 46 (3 self)
- Add to MetaCart
C** is a new data-parallel programming language based on a new computation model called large-grain data parallelism. C** overcomes many disadvantages of existing data-parallel languages, yet retains their distinctive and advantageous programming style and deterministic behavior. This style makes data parallelism well-suited for massively-parallel computation. Large-grain data parallelism enhances data parallelism by permitting a wider range of algorithms to be expressed naturally. C is an object-oriented programming language that inherits data abstraction features from C++. Existing scientific programming languages do not provide modern programming facilities such as operator extensibility, abstract datatypes, or object-oriented programming. C ---and its sequential subset C ++ ---support modern programming practices and enable a single language to be used for all parts of large, complex programs and libraries. This technical report consists of three parts. The body of t...

