Abstract: This dissertation discusses how to write computer programs that attain both high performance and
portability, despite the fact that current computer systems have different degrees of parallelism, deep
memory hierarchies, and diverse processor architectures.
To cope with parallelism portably in high-performance programs, we present the Cilk multithreaded
programming system. In the Cilk-5 system, parallel programs scale up to run efficiently
on multiple processors, but unlike existing...
Context of citations to this paper:
...The result of automatic regulation of GC parallelism, in BH O2K. costs than on irregular programs. The Cilk performance model [1, 8] estimates the parallel running time of both regular and irregular programs that are executed in LTC strategy. We utilize this model to...
Cited by:
Predicting Scalability of Parallel Garbage Collectors on.. - Endo, Taura, Yonezawa (2001)
Similar documents (at the sentence level):
15.1%: Cilk: Efficient Multithreaded Computing - Randall (1998)
11.7%: The Implementation of the Cilk-5 Multithreaded Language - Frigo, Leiserson, Randall (1998)
8.3%: Cache-Oblivious Algorithms (Extended Abstract) - Frigo, Leiserson, Prokop..
Active bibliography (related documents):
12.2: Portable High-Performance Programs - Frigo (1992)
1.0: The Fastest Fourier Transform in the West - Frigo, Johnson (1997)
0.6: Dag-Consistent Distributed Shared Memory - Blumofe, Frigo, Joerg.. (1996)
Similar documents based on text:
1.2: FFTW for version 3.0 - Frigo, Johnson (2003)
0.8: Executing Multithreaded Programs Efficiently - Blumofe (1995)
0.4: Code Compaction and Parallelization for VLIW/DSP Chip.. - Tsvetomir Petrov..
Related documents from co-citation:
2: A Scalable Mark-Sweep Garbage Collector on Large-Scale Shared-Memory Machines - Endo, Taura et al. (1997)
BibTeX entry:
Matteo Frigo. Portable High-Performance Programs. PhD thesis, Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, June 1999. http://citeseer.ist.psu.edu/frigo99portable.html
@techreport{ frigo99portable,
author = "M. Frigo",
title = "Portable High-Performance Programs",
number = "MIT/LCS/TR-785",
pages = "169",
year = "1999",
url = "citeseer.ist.psu.edu/frigo99portable.html" }
Documents on the same site (http://supertech.lcs.mit.edu/cilk/papers/):
Portable Fault-Tolerant File I/O - Lyubashevskiy
Debugging Multithreaded Programs that Incorporate User-Level Locking - Stark (1998)
Scheduling Adaptively Parallel Jobs - Song (1998)
CiteSeer.IST - Copyright Penn State and NEC