In this paper, we present several novel strategies to improve software-controlled cache utilization, so as to achieve lower power requirements for multimedia and signal processing applications. Our methodology targets embedded multimedia and DSP processors. It takes into account many program parameters, such as the locality of data, the size of data structures, the access structures of large array variables, the regularity of loop nests, and the size and type of cache, with the objective of improving cache performance for lower power. We also account for the overhead that the different transformations impose on the instruction count and the number of execution cycles, so that real-time constraints and code size limitations are still met. Experiments on a real-life demonstrator show that our methodology achieves a significant reduction in power requirements while meeting all other system constraints.