  on Symmetric Multiprocessors

by Arthur C. Smith, Andrew Shaw
ftp://csg-ftp.lcs.mit.edu/pub/papers/theses/shaw-phd.ps.gz

Abstract:

Shared-memory symmetric multiprocessors (SMP's) based on conventional microprocessors are by far the most common parallel architecture today, and will continue to be so for the foreseeable future. This thesis describes techniques to compile and schedule Id-S, a dialect of the implicitly parallel language Id, for execution on SMP's. We show that previous implementations of Id for conventional microprocessors incurred an overhead of at least 40-300% over an efficient sequential implementation of Id-S. We break down this overhead into various presence-tag checking and scheduling overheads. Given this overhead, we conclude that a fine-grained, element-wise synchronizing implementation of Id is not suitable for use on small-scale SMP's. We then describe a parallelization technique for Id-S that discovers both DAG and loop parallelism. Our parallelization exploits Id-S's single-assignment semantics for data structures. We show that for many programs, our technique can discover ample parallelism, without need for Id's traditional nonstrict, fine-grained, producer-consumer semantics. Because our parallelization eliminates the need for presence-tag checking and creates coarser-grained units of work, the parallelized codes only incur a
