We present a user-level thread scheduler for shared-memory multiprocessors, and we analyze its performance under multiprogramming. We model multiprogramming with two scheduling levels: our scheduler runs at user-level and schedules threads onto a fixed collection of processes, while below, the operating-system kernel schedules processes onto a fixed collection of processors. We consider the kernel to be an adversary, and our goal is to schedule threads onto processes such that we make efficient use of whatever processor resources are provided by the kernel. Our thread scheduler is a non-blocking implementation of the work-stealing algorithm. For any multithreaded computation with work $T_1$ and critical-path length $T_\infty$, and for any number $P$ of processes, our scheduler executes the computation in expected time $O(T_1/P_A + T_\infty P/P_A)$, where $P_A$ is the average number of processors allocated to the computation by the kernel. This time bound is optimal to within a constant factor, and achieves linear speedup whenever $P$ is small relative to the average parallelism $T_1/T_\infty$.
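To make the bound concrete, the following minimal sketch evaluates its two terms, reading the bound as T_1/P_A + T_inf * P/P_A with the constant factor hidden by the O-notation omitted. The function name `time_bound_terms` and the sample numbers are hypothetical, chosen only for illustration; they do not come from the paper.

```python
def time_bound_terms(t1, t_inf, p, p_a):
    """Return the two terms of T_1/P_A + T_inf*P/P_A (constant factors omitted).

    t1    -- work of the computation
    t_inf -- critical-path length
    p     -- number of processes
    p_a   -- average number of processors granted by the kernel
    All names and the validation below are illustrative assumptions.
    """
    if min(t1, t_inf, p, p_a) <= 0:
        raise ValueError("all parameters must be positive")
    work_term = t1 / p_a               # time to do all the work on P_A processors
    critical_term = t_inf * p / p_a    # overhead tied to the critical path
    return work_term, critical_term


# Hypothetical inputs: average parallelism T_1/T_inf = 10_000, far larger than P = 8.
work_term, critical_term = time_bound_terms(t1=1_000_000, t_inf=100, p=8, p_a=4)
print(work_term, critical_term)  # prints 250000.0 200.0
```

With these inputs the work term dominates, so the bound is close to T_1/P_A, which is the linear-speedup regime described above. The ratio of the critical-path term to the work term is T_inf * P / T_1, so the second term only becomes significant once P approaches the average parallelism.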