
Exploiting Superword-Level Locality in Multimedia Extension Architectures (2003)
Jaewook Shin, Jacqueline Chame, Mary W. Hall



View or download:
jilp.org/vol5/v5paper4.pdf

From:  jilp.org/vol5/index

Abstract: In this paper, we describe an algorithm and implementation of locality optimizations for architectures with instruction sets such as Intel's SSE and Motorola's AltiVec that support operations on superwords, i.e., aggregate objects consisting of several machine words. We treat the large superword register file as a compiler-controlled cache, thus avoiding unnecessary memory accesses by exploiting reuse in superword registers. This research is distinguished from previous work on exploiting...

Active bibliography (related documents):
2.5:   Compiler-Controlled Caching in Superword Register Files for.. - Shin, Chame, Hall (2002)
0.4:   Current Research Efforts in Media ISA Development - Lappalainen, Liuha, Hämäläinen
0.4:   Optimizing Compiler for a CELL Processor - Eichenberger, O'Brien, O'Brien.. (2005)

Similar documents based on text:
0.4:   Evaluating Compiler Technology for Control-Flow.. - Shin, Hall, Chame
0.3:   The Architecture of the DIVA Processing-in-Memory Chip - Draper, Chame, Hall.. (2002)
0.3:   Code Transformations for Exploiting Bandwidth in PIM-Based.. - Chame, Shin, Hall (2000)

BibTeX entry:

@misc{ shin-exploiting,
  author = "Jaewook Shin and Jacqueline Chame and Mary W. Hall",
  title = "Exploiting Superword-Level Locality in Multimedia Extension Architectures",
  url = "citeseer.ist.psu.edu/shin03exploiting.html" }
Citations (may not include all citations):
474   A data locality optimizing algorithm - Wolf, Lam - 1991
376   The cache performance and optimization of blocked algorithms - Lam, Rothberg et al. - 1991
292   Advanced Compiler Design and Implementation - Muchnick - 1997
197   Maximizing multiprocessor performance with the SUIF compiler - Hall, Anderson et al. - 1996
137   Compiler optimizations for improving data locality - Carr, McKinley et al. - 1994
124   Tile size selection using cache organization and data layout - Coleman, McKinley - 1995
111   More iteration space tiling - Wolfe - 1989
82   To copy or not to copy: A compile-time technique for assessi.. - Temam, Granston et al. - 1993
82   On estimating and enhancing cache effectiveness - Ferrante, Sarkar et al. - 1991
77   Cache miss equations: An analytical representation of cache .. - Ghosh, Martonosi et al. - 1997
71   Improving Locality and Parallelism in Nested Loops - Wolf - 1992
51   Improving the ratio of memory operations to floating-point o.. - Carr, Kennedy - 1994
48   Optimizing Compilers for Modern Architectures - Allen, Kennedy - 2002
47   Scalar replacement in the presence of conditional control fl.. - Carr, Kennedy - 1994
46   Precise miss analysis for program transformations with cache.. - Ghosh, Martonosi et al. - 1998
35   Performance of image and video processing with general-purpo.. - Ranganathan, Adve et al. - 1999
32   Subword parallelism with max2 - Lee - 1996
31   A comparison of compiler tiling algorithms - Rivera, Tseng - 1999
26   Computational RAM: a memory-SIMD hybrid and its application .. - Elliott, Snelgrove et al. - 1992
23   Exploiting superword level parallelism with multimedia instr.. - Larsen, Amarasinghe - 2000
21   Improving data locality for caches - Esseghir - 1993
19   Microservers: A new memory semantics for massively parallel .. - Brockman, Kogge et al. - 1999
17   Evaluation of existing architectures in IRAM systems - Bowman, Cardwell et al. - 1997
15   A tile selection algorithm for data locality and cache inter.. - Chame, Moon - 1999
15   Influence of cross-interferences on blocked loops: A case st.. - Fricker, Temam et al. - 1995
13   An optimizer for multimedia instruction sets - Cheong, Lam - 1997
10   A vectorizing suif compiler: Implementation and performance - DeVries - 1997
8   Increasing and detecting memory address congruence - Larsen, Witchel et al. - 2002
8   A vectorizing compiler for multimedia extensions - Sreraman, Govindarajan - 2000
5   A compiler approach to fast hardware design space exploratio.. - So, Hall et al. - 2002
5   Mapping irregular applications to DIVA, a PIM-based data-int.. - Hall, Kogge et al. - 1999
3   Compiler-controlled caching in superword register files for .. - Shin, Chame et al. - 2002
2   Index set splitting to exploit data locality at the register.. - Jimenez, Llaberia et al. - 1996
2   VASTAltiVec Feature - AltiVec, http et al. - 2001
1   T0 engineering data - Asanovic, Beck

Documents on the same site (http://www.jilp.org/vol5/index.html):
Instruction-Isomorphism in Program Execution - Sazeides (2003)
Journal of Instruction-Level Parallelism 5 (2003) 1-21.. - The Role Of
Journal of Instruction-Level Parallelism 5(2003) 1-32.. - Branch Predictors..
