Abstract: In this paper, we presented and evaluated a new runtime
algorithm to parallelize these loops. Our scheme handles any type of data dependence pattern without
requiring any special architectural support. Furthermore, compared to an older scheme with the same
generality, it speeds up execution by allowing the reuse of the inspector phase across loop invocations,
allowing partial overlap of dependent iterations, and optimizing the inspector for high locality and low
communication.
We have evaluated...
Related documents from co-citation:
3: Compiler algorithm for Synchronization - Midkiff, Padua - 1987
2: Loop transformations for restructuring compilers: the foundations - Banerjee - 1993
2: Automatic synchronization elimination in synchronous FORALLs - Philippsen, Heinz - 1995
BibTeX entry:
D. Chen. Compiler Optimizations for Parallel Loops with Fine-grained Synchronization. PhD thesis, Dept. of Computer Science, University of Illinois at Urbana-Champaign, 1994. http://citeseer.ist.psu.edu/chen94compiler.html
@techreport{ chen94compiler,
author = "Ding-Kai Chen",
title = "Compiler optimizations for parallel loops with fine-grained synchronization",
number = "UIUCDCS-R-94-1863",
year = "1994",
url = "citeseer.ist.psu.edu/chen94compiler.html" }
Citations (may not include all citations):
4212: Computers and Intractability: A Guide to the Theory of NPCom.. - Garey, Johnson - 1979
1254: Computational Geometry: An Introduction - Preparata, Shamos - 1985
405: Depth first search and linear graph algorithms - Tarjan - 1972
357: The directory-based cache coherence protocol for the DASH mul.. - Lenoski, Laudon et al. - 1990
353: Software pipelining: An effective scheduling technique for v.. - Lam - 1988
299: Dependence Analysis for Supercomputing - Banerjee - 1988
294: A loop transformation theory and an algorithm to maximize pa.. - Wolf, Lam - 1991
277: Advanced compiler optimization for supercomputers - Padua, Wolfe - 1986
260: Validity of the single processor approach to achieving large.. - Amdahl - 1967
234: Multilisp: A language for concurrent symbolic computation - Jr - 1985
217: The PERFECT club benchmarks: Effective performance evaluatio.. - Berry, Chen - 1989
175: Matrix eigensystem routines - EISPACK guide - Smith, Boyle et al. - 1976
168: The parallel execution of DO loops - Lamport - 1974
165: SPARSKIT: A Basic Tool Kit for Sparse Matrix Computation - Saad - 1990
159: The NYU Ultracomputer -- designing an MIMD shared memory par.. - Gottlieb, Grishman et al. - 1983
146: Unimodular transformation of double loops - Banerjee - 1990
104: The Structure of Computers and Computations - Kuck - 1978
94: Run-time parallelization and scheduling of loops - Saltz, Mirchandaney et al. - 1991
92: Performance evaluation of memory consistency models for shar.. - Gharachorloo, Gupta et al. - 1991
90: The IBM research parallel processor prototype - Pfister, Brantley et al. - 1985
82: Some computer organizations and their effectiveness - Flynn - 1972
78: Compiler algorithms for synchronization - Midkiff, Padua - 1987
69: Runtime compilation techniques for data partitioning and com.. - Ponnusamy, Saltz et al. - 1993
46: Analysis of event synchronization in a parallel programming .. - Callahan, Kennedy et al. - 1990
45: An empirical study of Fortran programs for parallelizing com.. - Shen, Li et al. - 1990
44: Optimizing Compilers for Supercomputers - Wolfe - 1982
42: Improving the performance of runtime parallelization - Leung, Zahorjan - 1993
31: shared resource mimd computer - Smith, pipelined - 1978
31: Execution-driven tools for parallel simulation of parallel a.. - Poulsen, Yew - 1993
30: The Cedar system and an initial performance study - Kuck - 1993
29: Series Architecture Manual - System, FX - 1986
25: Multiprocessors: Discussion of Some Theoretical and Practica.. - Padua - 1979
23: An approach to synchronization of parallel computing - Krothapalli, Sadayappan - 1988
21: Dependence uniformization: A loop parallelization technique - Tzen, Ni - 1993
21: Dependence uniformization: A loop parallelization technique - Tzen, Ni - 1991
20: Compiler generated synchronization for DO loops - Midkiff, Padua - 1986
19: On reducing data synchronization in multiprocessed loops - Li, Abu-Sufah - 1987
19: The preprocessed doacross loop - Saltz, Mirchandaney - 1991
16: On uniformization of affine dependence algorithms - Chen, Shang - 1992
14: Removal of redundant dependences in doacross loops with cons.. - Krothapalli, Sadayappan - 1991
14: A special purpose architecture for finite element analysis - Jordan - 1978
14: Removal of redundant dependences in doacross loops with cons.. - Krothapalli, Sadayappan - 1992
13: Minimization of interprocessor synchronization in multiproce.. - Shaffer - 1989
9: Butterfly products overview - Computers - 1987
8: Cedar architecture and its software - Emrath, Padua et al. - 1989
8: MaxPar: An execution driven simulator for studying parallel .. - Chen - 1989
8: Automatic generation of synchronization instructions for par.. - Midkiff - 1986
8: A synchronization scheme and its application for large multi.. - Zhu, Yew - 1984
7: Compile-time Scheduling and Optimization for Asynchronous Mac.. - Cytron - 1984
7: Automatic transformation of Fortran program to vector form - Allen, Kennedy - 1987
7: On data synchronization for multiprocessors - Su, Yew - 1989
6: A comparison of four synchronization optimization techniques - Midkiff, Padua - 1991
6: Prentice-Hall Inc - Reingold, Nievergelt et al. - 1977
6: Scheduling loops on processors: Algorithms and complexity - Simons, Munshi - 1990
6: PCF Fortran: Language definition - Computing - 1988
5: Advanced loop optimizations for parallel computers - Polychronopoulos - 1987
5: A technique for reducing synchronization overhead in large s.. - Li, Abu-Sufah - 1985
4: Efficient doacross execution for distributed shared memory m.. - Su, Yew - 1991
4: of Illinois at Urbana-Champaign - Muroaka, Exploitation et al. - 1971
3: A scheduling problem arising from loop parallelization on MI.. - Anderson, Munshi et al. - 1988