Portable High Performance Programming via Architecture-Cognizant Divide-and-Conquer Algorithms (2000)  (1 citation)
Kang Su Gatlin
University of California, San Diego



View or download:
ucsd.edu/~kgatlin/papers/thesis.pdf

From: ucsd.edu/~kgatlin/papers
Abstract (table of contents excerpt):
Abstract
1 Introduction
  1. Divide-and-Conquer and the Memory Hierarchy
  2. Overview of Architecture-Cognizant Divide-and-Conquer
  3. Overview of Napoleon
  4. What You Can Expect
  5. Contributions
  ...

Cited by:
Rescheduling for Locality in Sparse Matrix Computations - Strout, Carter, Ferrante

Active bibliography (related documents):
1.1:   Architecture-Cognizant Divide and Conquer Algorithms - Gatlin, Carter (1999)
0.8:   Memory Hierarchy Considerations for Fast Transpose and.. - Gatlin, Carter (1999)
0.7:   Faster FFTs via Architecture-Cognizance - Gatlin, Carter (2000)

Similar documents based on text:
0.5:   Adaptive Scheduling of Master/Worker Applications on Distributed.. - Shao (2001)
0.4:   Asymptotic Expansions of the Mergesort Recurrences - Hwang (1998)
0.4:   Cv - Strout

BibTeX entry:

Kang Su Gatlin. Portable High Performance Programming via Architecture-Cognizant Divide-and-Conquer Algorithms. Ph.D. thesis, University of California, San Diego, September 2000. http://citeseer.ist.psu.edu/gatlin00portable.html

@phdthesis{gatlin00portable,
  author = "K. Gatlin",
  title = "Portable High Performance Programming via Architecture-Cognizant Divide-and-Conquer Algorithms",
  school = "University of California, San Diego",
  year = "2000",
  month = sep,
  url = "citeseer.ist.psu.edu/gatlin00portable.html"
}
Citations (may not include all citations):
3972   Introduction to Algorithms - Cormen, Leiserson et al. - 1994
1535   Cambridge University Press - Press, Teukolsky et al. - 1993
1399   Compilers: Principles - Aho, Sethi et al. - 1988
476   Programming Language - Kernighan, Ritchie - 1988
294   A Loop Transformation Theory and an Algorithm to Maximize Pa.. - Wolf, Lam - 1991
292   Advanced Compiler Design and Implementation - Muchnick - 1997
216   Strategies for Cache and Local Memory Management by Global Pr.. - Gannon, Jalby et al. - 1988
206   An Algorithm for the Machine Calculation of Complex Fourier .. - Cooley, Tukey - 1965
193   Superscalar Microprocessor Design - Johnson - 1991
157   Automatically Tuned Linear Algebra Software - Whaley, Dongarra - 1998
123   Optimizing Matrix Multiply using PHiPAC: a Portable High-Per.. - Bilmes, Asanovic et al. - 1997
99   Hitting the Memory Wall: Implications of the Obvious - Wulf, McKee - 1995
98   High Performance Compilers for Parallel Computing - Wolfe - 1996
83   Programming with POSIX Threads - Butenhof - 1997
82   To Copy or Not to Copy: A Compile-Time Technique for Assessi.. - Temam, Granston et al. - 1993
79   Intel Architecture Software Developer's Manual - Corporation - 1999
64   Cache-Oblivious Algorithms - Frigo, Leiserson et al. - 1999
62   Computational Frameworks for the Fast Fourier Transform - Van Loan - 1992
62   An analysis of DAG-consistent distributed shared-memory algo.. - Blumofe, Frigo et al. - 1996
60   The Uniform Memory Hierarchy Model of Computation - Alpern, Carter et al. - 1994
60   Recursion Leads to Automatic Variable Blocking for Dense Lin.. - Gustavson - 1997
49   Design and Implementation of Code Optimizations for a TypeDi.. - Tarditi - 1997
42   The Cache Performance of Blocked Algorithms - Lam, Rothberg et al. - 1991
40   Algorithmic Skeletons: A Structured Approach to the Manageme.. - Cole - 1988
29   Modeling Parallel Computers as Memory Hierarchies - Alpern, Carter et al. - 1993
28   Tuning Strassen's Matrix Multiplication for Memory Efficiency - Thottethodi, Chatterjee et al. - 1998
26   Computer Architecture: A Quantitative Approach - Hennessy, Patterson - 1996
24   POWER2: The Next Generation of the RISC System/6000 Family - White, Dhawan - 1994
23   Optimizing for Reduced Code Space Using Genetic Algorithms - Cooper, Schielke et al. - 1999
21   Hierarchical Tiling: A Methodology for High Performance - Carter, Ferrante et al. - 1996
18   The Fastest Fourier Transform in the West - Frigo, Johnson - 1997
15   Architecture-Cognizant Divide and Conquer Algorithms - Gatlin, Carter - 1999
14   Towards a Model for Portable Parallel Performance: Exposing .. - Alpern, Carter - 1994
13   Mastering Regular Expressions - Friedl, Oram - 1997
12   Portable High-Performance Supercomputing: High-Level Platform.. - Brewer - 1994
11   Influence of Caches on the Performance of Sorting - LaMarca, Ladner - 1997
9   Space-Limited Procedures: A Methodology for Portable High-Pe.. - Alpern, Carter et al. - 1995
8   Towards an Optimal Bit-Reversal Permutation Program - Carter, Gatlin - 1998
8   Parallelization of Divide-and-Conquer in the Bird-Meertens F.. - Gorlatch, Lengauer - 1995
6   Challenges of Computing the Fast Fourier Transform - Johnson, Johnson
6   Microprocessor Report - Linley, Sports et al. - 1997
5   Memory Hierarchy Consideration for Fast Transpose and Bit-Re.. - Gatlin, Carter - 1999
5   Software Methods for Improvement of Cache Performance on Sup.. - Porterfield - 1989
4   Reference Manual - Group - 1998
4   On Computing the Fast Fourier Transform - Singleton - 1967
4   Eliminating Conflict Misses for High Performance Architecture.. - Rivera, Tseng - 1998
3   IBM's POWER to replace PSC - POWER, SC et al. - 1997
2   An Overview of the Alpha AXP 21164 Microprocessor - Benschneider - 1995
2   Programming with Divide-and-Conquer Skeletons: A Case Study of.. - Gorlatch - 1997
1   Technical Report SC - Scientific, Version et al. - 1992
1   The Hewlett-Packard PA 8000 RISC CPU: A High Performance Out.. - Kumar - 1997
1   Overcoming the Challenges to Feedback-Directed Optimizations - Smith - 2000
1   Personal Correspondence - Mou - 1998

Documents on the same site (http://www.cs.ucsd.edu/~kgatlin/papers.html):
Towards an Optimal Bit-Reversal Permutation Program - Carter, Gatlin (1998)
Architecture-Cognizant Divide and Conquer Algorithms - Gatlin, Carter (1999)
Performance Optimisations of the NPB FT Kernel by.. - Getov, Wei, Carter.. (1999)

CiteSeer.IST - Copyright Penn State and NEC