Results 1 - 10 of 55
Cilk: An Efficient Multithreaded Runtime System
, 1995
"... Cilk (pronounced “silk”) is a C-based runtime system for multithreaded parallel programming. In this paper, we document the efficiency of the Cilk work-stealing scheduler, both empirically and analytically. We show that on real and synthetic applications, the “work” and “critical path ” of a Cilk co ..."
Abstract
-
Cited by 763 (33 self)
- Add to MetaCart
(Show Context)
Cilk (pronounced “silk”) is a C-based runtime system for multithreaded parallel programming. In this paper, we document the efficiency of the Cilk work-stealing scheduler, both empirically and analytically. We show that on real and synthetic applications, the “work” and “critical path” of a Cilk computation can be used to accurately model performance. Consequently, a Cilk programmer can focus on reducing the work and critical path of his computation, insulated from load balancing and other runtime scheduling issues. We also prove that for the class of “fully strict” (well-structured) programs, the Cilk scheduler achieves space, time, and communication bounds all within a constant factor of optimal. The Cilk runtime system currently runs on the Connection Machine CM5 MPP, the Intel Paragon MPP, the Silicon Graphics Power Challenge SMP, and the MIT Phish network of workstations. Applications written in Cilk include protein folding, graphic rendering, backtrack search, and the *Socrates chess program, which won third prize in the 1994 ACM International Computer Chess Championship.
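The performance model the abstract refers to is simple enough to state directly: with work T1, critical path T∞, and P processors, running time is modeled as roughly T1/P plus a constant times T∞. The small C sketch below only illustrates that model; the constant c and the function name are assumptions for illustration, not part of Cilk's interface.

    /* Illustrative sketch of the Cilk work/critical-path performance model:
       predicted time on P processors is about work/P + c * critical_path.
       The constant c is fitted empirically; its value and this function
       name are assumptions, not Cilk API. */
    double predicted_runtime(double work, double critical_path,
                             int processors, double c)
    {
        return work / (double) processors + c * critical_path;
    }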
CRL: High-Performance All-Software Distributed Shared Memory
, 1995
"... This paper introduces the C Region Library (CRL), a new all-software distributed shared memory (DSM) system. CRL requires no special compiler, hardware, or operating system support beyond the ability to send and receive messages. It provides a simple, portable shared address space programming model ..."
Abstract
-
Cited by 208 (13 self)
- Add to MetaCart
This paper introduces the C Region Library (CRL), a new all-software distributed shared memory (DSM) system. CRL requires no special compiler, hardware, or operating system support beyond the ability to send and receive messages. It provides a simple, portable shared address space programming model that is capable of delivering good performance on a wide range of multiprocessor and distributed system architectures. We have developed CRL implementations for two platforms: the CM-5, a commercial multicomputer, and the MIT Alewife machine, an experimental multiprocessor offering efficient support for both message passing and shared memory. We present results for up to 128 processors on the CM-5 and up to 32 processors on Alewife. In a set of controlled experiments, we demonstrate that CRL is the first all-software DSM system capable of delivering performance competitive with hardware DSMs. CRL achieves speedups within 30% of those provided by Alewife's native support for shared memory, eve...
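CRL's programming model is region based: shared data lives in named regions that a node maps into its address space and then accesses inside bracketed read or write operations. The C sketch below is a rough illustration of that usage pattern; the declarations and signatures (rgn_map, rgn_start_write, and so on) are assumptions modeled on the operations the paper describes, not a verified CRL header.

    /* Rough sketch of region-based sharing in the CRL style.  All CRL
       declarations below are assumed for illustration; consult the real
       CRL interface for the actual types and signatures. */
    #include <string.h>

    typedef unsigned long rid_t;              /* assumed region-identifier type */
    extern void *rgn_map(rid_t rid);          /* map a region locally (assumed) */
    extern void  rgn_unmap(void *rgn);
    extern void  rgn_start_write(void *rgn);  /* begin an exclusive access */
    extern void  rgn_end_write(void *rgn);    /* end it; updates become visible */

    /* Publish n doubles into a shared region identified by rid. */
    void publish_vector(rid_t rid, const double *src, int n)
    {
        double *rgn = (double *) rgn_map(rid);
        rgn_start_write(rgn);
        memcpy(rgn, src, (size_t) n * sizeof *src);
        rgn_end_write(rgn);
        rgn_unmap(rgn);
    }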
Supporting Dynamic Data Structures on Distributed-Memory Machines
, 1995
"... this article, we describe an execution model for supporting programs that use pointer-based dynamic data structures. This model uses a simple mechanism for migrating a thread of control based on the layout of heap-allocated data and introduces parallelism using a technique based on futures and lazy ..."
Abstract
-
Cited by 166 (8 self)
- Add to MetaCart
this article, we describe an execution model for supporting programs that use pointer-based dynamic data structures. This model uses a simple mechanism for migrating a thread of control based on the layout of heap-allocated data and introduces parallelism using a technique based on futures and lazy task creation. We intend to exploit this execution model using compiler analyses and automatic parallelization techniques. We have implemented a prototype system, which we call Olden, that runs on the Intel iPSC/860 and the Thinking Machines CM-5. We discuss our implementation and report on experiments with five benchmarks.
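The abstract's two ingredients, futures and lazy task creation over pointer-based structures, can be illustrated without Olden itself. The plain C/pthreads sketch below shows only the futures idea (spawn the left subtree as a concurrent task, compute the right locally, then join); Olden's compiler-driven thread migration over a distributed heap is not represented here.

    /* Illustrative future-style parallel tree sum in plain C with pthreads.
       Not Olden code; spawning one thread per node is wasteful and is done
       here only to make the "future" idea concrete. */
    #include <pthread.h>
    #include <stdint.h>
    #include <stddef.h>

    typedef struct node { long value; struct node *left, *right; } node;

    static void *sum_thread(void *arg);

    static long tree_sum(const node *t)
    {
        if (t == NULL)
            return 0;

        pthread_t tid;
        if (pthread_create(&tid, NULL, sum_thread, (void *) t->left) == 0) {
            long right = tree_sum(t->right);
            void *res;
            pthread_join(tid, &res);              /* "force" the future */
            return t->value + (long) (intptr_t) res + right;
        }
        /* Thread creation failed: fall back to a sequential traversal. */
        return t->value + tree_sum(t->left) + tree_sum(t->right);
    }

    static void *sum_thread(void *arg)
    {
        return (void *) (intptr_t) tree_sum((const node *) arg);
    }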
Software Caching and Computation Migration in Olden
, 1995
"... The goal of the Olden project is to build a system that provides parallelism for general purpose C programs with minimal programmer annotations. We focus on programs using dynamic structures such as trees, lists, and DAGs. We demonstrate that providing both software caching and computation migratio ..."
Abstract
-
Cited by 105 (0 self)
- Add to MetaCart
The goal of the Olden project is to build a system that provides parallelism for general purpose C programs with minimal programmer annotations. We focus on programs using dynamic structures such as trees, lists, and DAGs. We demonstrate that providing both software caching and computation migration can improve the performance of these programs, and provide a compile-time heuristic that selects between them for each pointer dereference. We have implemented a prototype system on the Thinking Machines CM-5. We describe our implementation and report on experiments with ten benchmarks.
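The interesting design point here is the per-dereference choice between fetching remote data through a software cache and migrating the computation to the data. The C sketch below is not Olden's heuristic; it is a hypothetical cost comparison, with invented inputs and thresholds, meant only to make the trade-off concrete.

    #include <stddef.h>

    typedef enum { USE_SOFTWARE_CACHE, USE_MIGRATION } access_strategy;

    /* Hypothetical comparison, not Olden's compile-time heuristic: migrate
       when the thread will keep touching the same remote home and its
       captured state is smaller than the data it would otherwise fetch. */
    access_strategy choose_strategy(size_t thread_state_bytes,
                                    size_t bytes_per_fetch,
                                    unsigned refs_to_same_home)
    {
        size_t fetch_cost   = bytes_per_fetch * refs_to_same_home;
        size_t migrate_cost = thread_state_bytes;

        return (refs_to_same_home > 1 && migrate_cost < fetch_cost)
                   ? USE_MIGRATION
                   : USE_SOFTWARE_CACHE;
    }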
Heterogeneous Process Migration: The Tui System
- Software Practice and Experience
, 1997
"... Heterogeneous Process Migration is a technique whereby an active process is moved from one machine to another. It must then continue normal execution and communication. The source and destination processors can have a different architecture, that is, different instruction sets and data formats. B ..."
Abstract
-
Cited by 78 (0 self)
- Add to MetaCart
(Show Context)
Heterogeneous Process Migration is a technique whereby an active process is moved from one machine to another. It must then continue normal execution and communication. The source and destination processors can have a different architecture, that is, different instruction sets and data formats. Because of this heterogeneity, the entire process memory image must be translated during the migration. "Tui" is a migration system that is able to translate the memory image of a program (written in ANSI-C) between four common architectures (m68000, SPARC, i486 and PowerPC). This requires detailed knowledge of all data types and variables used within the program. This is not always possible in non-type-safe (but popular) languages such as ANSI-C, Pascal and Fortran. The important features of the Tui algorithm are discussed in great detail. This includes the method by which a program's entire set of data values can be located, and eventually reconstructed on the target processor. Perfo...
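The core obligation Tui takes on is re-encoding every value when the memory image crosses architectures. The sketch below shows only the flavor of that translation for a single struct (byte-order conversion of a 32-bit integer and an IEEE-754 float); it is an illustration, not Tui's algorithm, which also has to locate values via type information and relocate pointers and stack frames.

    /* Illustrative re-encoding of one struct for a big-endian target,
       assuming a little-endian source host and IEEE-754 floats on both
       sides.  Names and layout are invented for this example. */
    #include <stdint.h>
    #include <string.h>

    static uint32_t to_big_endian32(uint32_t x)
    {
        return (x >> 24) | ((x >> 8) & 0x0000FF00u)
             | ((x << 8) & 0x00FF0000u) | (x << 24);
    }

    struct point { int32_t id; float weight; };

    void encode_point_be(const struct point *p, unsigned char out[8])
    {
        uint32_t id = to_big_endian32((uint32_t) p->id);
        uint32_t w;
        memcpy(&w, &p->weight, sizeof w);   /* reinterpret float bits */
        w = to_big_endian32(w);
        memcpy(out,     &id, 4);
        memcpy(out + 4, &w,  4);
    }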
Higher-Order Distributed Objects
, 1995
"... IONS 3.1 Scheme 48 Kali Scheme is implemented as an extension to Scheme 48 [Kelsey and Rees 1994], an implementation of Scheme [Clinger and Rees 1991]. Scheme is a lexically scoped dialect of Lisp. Scheme 48 is based on as byte-coded interpreter written in a highly optimized, restricted dialect of ..."
Abstract
-
Cited by 62 (4 self)
- Add to MetaCart
Kali Scheme is implemented as an extension to Scheme 48 [Kelsey and Rees 1994], an implementation of Scheme [Clinger and Rees 1991]. Scheme is a lexically scoped dialect of Lisp. Scheme 48 is based on a byte-coded interpreter written in a highly optimized, restricted dialect of Scheme called Pre-Scheme, which compiles to C. Because of the way it is implemented, the system is very portable and is reasonably efficient for an interpreted system (roughly 10-15 times slower than a highly optimized Scheme compiler generating native code [Kranz et al. 1986]). Unlike other Scheme implementations, ...
(define-record-type thread ... continuation ...)
(define current-thread ...)
(define (spawn thunk)
  (let ((thread (make-thread)))
    (set-thread-continuation! thread
      (lambda (ignore) (thunk) (terminate-current-thread)))
    (context-switch thread)))
(define (context-switch thread)
  (add-to-queue! runnable-threads current-thread)
  (switch-to-thread thre...
Concert -- Efficient Runtime Support for Concurrent Object-Oriented Programming Languages on Stock Hardware
, 1993
"... Inefficient implementations of global namespaces, message passing, and thread scheduling on stock multicomputers have prevented concurrent object-oriented programming (COOP) languages from gaining widespread acceptance. Recognizing that the architectures of stock multicomputers impose a hierarchy of ..."
Abstract
-
Cited by 59 (11 self)
- Add to MetaCart
Inefficient implementations of global namespaces, message passing, and thread scheduling on stock multicomputers have prevented concurrent object-oriented programming (COOP) languages from gaining widespread acceptance. Recognizing that the architectures of stock multicomputers impose a hierarchy of costs for these operations, we have described a runtime system which provides different versions of each primitive, exposing performance distinctions for optimization. We confirm the advantages of a cost-hierarchy based runtime system organization by showing a variation of two orders of magnitude in version costs for a CM5 implementation. Frequency measurements based on COOP application programs demonstrate that a 39% invocation cost reduction is feasible by simply selecting cheaper versions of runtime operations.
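The "cost hierarchy" idea is that the runtime exposes several versions of each primitive and the cheapest applicable one is selected, rather than paying the worst-case remote, dynamically dispatched price everywhere. The C sketch below is a made-up three-tier selector for object invocation meant only to illustrate that organization; it is not Concert's interface.

    /* Hypothetical three-tier invocation selector illustrating a cost
       hierarchy; the tiers, names, and selection rule are invented for
       illustration and are not Concert's runtime interface. */
    typedef enum {
        INVOKE_INLINE,      /* local object, statically known type: direct call   */
        INVOKE_LOCAL_SEND,  /* local object, dynamic dispatch through the runtime */
        INVOKE_REMOTE_SEND  /* remote object: marshal arguments into a message    */
    } invoke_kind;

    invoke_kind select_invocation(int object_is_local, int type_known_statically)
    {
        if (object_is_local && type_known_statically)
            return INVOKE_INLINE;
        if (object_is_local)
            return INVOKE_LOCAL_SEND;
        return INVOKE_REMOTE_SEND;
    }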
The Cilk System for Parallel Multithreaded Computing
, 1996
"... Although cost-effective parallel machines are now commercially available, the widespread use of parallel processing is still being held back, due mainly to the troublesome nature of parallel programming. In particular, it is still diiticult to build eiticient implementations of parallel applications ..."
Abstract
-
Cited by 43 (2 self)
- Add to MetaCart
Although cost-effective parallel machines are now commercially available, the widespread use of parallel processing is still being held back, due mainly to the troublesome nature of parallel programming. In particular, it is still difficult to build efficient implementations of parallel applications whose communication patterns are either highly irregular or dependent upon dynamic information. Multithreading has become an increasingly popular way to implement these dynamic, asynchronous, concurrent programs. Cilk (pronounced "silk") is our C-based multithreaded computing system that provides provably good performance guarantees. This thesis describes the evolution of the Cilk language and runtime system, and describes applications which affected the evolution of the system.
Space-Efficient Scheduling of Nested Parallelism
- ACM Transactions on Programming Languages and Systems
, 1999
"... This article presents an on-line scheduling algorithm that is provably space e#cient and time e#cient for nested-parallel languages. For a computation with depth D and serial space requirement S1 , the algorithm generates a schedule that requires at most S1 +O(K D p)space (including scheduler spa ..."
Abstract
-
Cited by 35 (6 self)
- Add to MetaCart
(Show Context)
This article presents an on-line scheduling algorithm that is provably space efficient and time efficient for nested-parallel languages. For a computation with depth D and serial space requirement S1, the algorithm generates a schedule that requires at most S1 + O(K·D·p) space (including scheduler space) on p processors. Here, K is a user-adjustable runtime parameter specifying the net amount of memory that a thread may allocate before it is preempted by the scheduler. Adjusting the value of K provides a trade-off between the running time and the memory requirement of a parallel computation. To allow the scheduler to scale with the number of processors, we also parallelize the scheduler and analyze the space and time bounds of the computation to include scheduling costs. In addition to showing that the scheduling algorithm is space and time efficient in theory, we demonstrate that it is effective in practice. We have implemented a runtime system that uses our algorithm to schedule lightweight parallel threads. The results of executing parallel programs on this system show that our scheduling algorithm significantly reduces memory usage compared to previous techniques, without compromising performance.
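The parameter K is what makes the S1 + O(K·D·p) bound tunable: a thread that has allocated more than K bytes since it was last scheduled gets preempted, so a larger K means fewer preemptions (faster) but more extra space. The allocator hook below is an assumed sketch of that rule, not the paper's runtime system.

    /* Assumed sketch of the K-bounded allocation rule: names, the value
       of K, and the scheduler_preempt hook are illustrative only. */
    #include <stdlib.h>

    #define K_BYTES (64 * 1024)   /* illustrative value of the parameter K */

    struct thread_ctx {
        size_t allocated_since_schedule;
        /* ... continuation, stack, etc. omitted ... */
    };

    extern void scheduler_preempt(struct thread_ctx *t);  /* assumed hook */

    void *thread_alloc(struct thread_ctx *t, size_t n)
    {
        void *p = malloc(n);
        if (p != NULL) {
            t->allocated_since_schedule += n;
            if (t->allocated_since_schedule > K_BYTES) {
                t->allocated_since_schedule = 0;
                scheduler_preempt(t);   /* yield back to the scheduler */
            }
        }
        return p;
    }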
Runtime Optimizations for a Java DSM Implementation
- In Proc. of the 2001 Joint ACM-ISCOPE Conf. on Java Grande
, 2001
"... All in-text references underlined in blue are linked to publications on ResearchGate, letting you access and read them immediately. ..."
Abstract
-
Cited by 34 (11 self)
- Add to MetaCart
(Show Context)
All in-text references underlined in blue are linked to publications on ResearchGate, letting you access and read them immediately.