Shared-memory symmetric multiprocessors (SMPs) based on conventional microprocessors are by far the most common parallel architecture today, and will remain so for the foreseeable future. This thesis describes techniques to compile and schedule Id-S, a dialect of the implicitly parallel language Id, for execution on SMPs. We show that previous implementations of Id for conventional microprocessors incurred overheads of at least 40-300% relative to an efficient sequential implementation of Id-S, and we break this overhead down into its presence-tag checking and scheduling components. Given this overhead, we conclude that a fine-grained implementation of Id that synchronizes element by element is not suitable for small-scale SMPs. We then describe a parallelization technique for Id-S that discovers both DAG and loop parallelism by exploiting Id-S's single-assignment semantics for data structures. We show that for many programs our technique discovers ample parallelism without relying on Id's traditional non-strict, fine-grained, producer-consumer semantics. Because our parallelization eliminates the need for presence-tag checking and creates coarser-grained units of work, the parallelized codes only incur a
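As a rough illustration of where element-wise presence-tag overhead comes from, the sketch below models an I-structure-style write-once array in Python. The class `IStructure`, its methods, and the `tag_checks` counter are hypothetical names chosen for illustration, not the thesis's runtime; a real implementation would suspend and reschedule threads rather than queue Python callbacks. The second function assumes that compile-time analysis has proved the producer loop finishes before the consumer starts, so a plain array with no tags suffices.

```python
from typing import Callable, List

EMPTY, FULL = 0, 1


class IStructure:
    """Hypothetical write-once array with a presence tag on every element."""

    def __init__(self, n: int) -> None:
        self.tags: List[int] = [EMPTY] * n
        self.vals: List[float] = [0.0] * n
        # Readers that arrived before the element was written (deferred reads).
        self.waiting: List[List[Callable[[float], None]]] = [[] for _ in range(n)]
        self.tag_checks = 0  # counts presence-tag tests, i.e. the per-access overhead

    def write(self, i: int, v: float) -> None:
        self.tag_checks += 1
        if self.tags[i] == FULL:
            # Single-assignment semantics: a second write is an error.
            raise ValueError(f"element {i} written twice")
        self.vals[i] = v
        self.tags[i] = FULL
        # Wake every reader that was deferred on this element.
        for k in self.waiting[i]:
            k(v)
        self.waiting[i].clear()

    def read(self, i: int, k: Callable[[float], None]) -> None:
        self.tag_checks += 1
        if self.tags[i] == FULL:
            k(self.vals[i])
        else:
            self.waiting[i].append(k)  # defer until the producer writes


def fine_grained_sum(n: int) -> tuple:
    """Consumer issues reads before the producer writes; every access checks a tag."""
    a = IStructure(n)
    total = [0.0]

    def add(v: float) -> None:
        total[0] += v

    for i in range(n):  # consumer runs first: all reads are deferred
        a.read(i, add)
    for i in range(n):  # producer fills the array and wakes the readers
        a.write(i, float(i * i))
    return total[0], a.tag_checks


def coarse_grained_sum(n: int) -> float:
    """Assumes analysis ordered producer before consumer: no tags, no deferral."""
    a = [float(i * i) for i in range(n)]
    return sum(a)
```

For `n = 4`, both functions compute the same sum (0 + 1 + 4 + 9 = 14.0), but the fine-grained version performs one tag check per read and per write (8 in total) and carries a queue of deferred readers, whereas the coarse-grained version touches an ordinary list.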