by Chidamber Kulkarni, Koen Danckaert
Download: http://www.imec.be/design/dtse/pdf/Kul01b.pdf
Abstract:
Real-time multimedia applications need large processing power, yet require a low-power implementation in an embedded context. For programmable parallel processors, this poses new challenges in optimizing a given application for both high performance and low power. In this paper, we present a case study of applying our low-power-oriented data transfer and storage exploration methodology and coupling it with a state-of-the-art performance-optimizing and parallelizing compiler. Experiments on two real-life applications show that this combined approach heavily reduces memory accesses and bus loading, and hence power. At the same time, a significant reduction in the total execution time is obtained. Decomposing the detailed parallelization and the data transfer and storage exploration issues into two separate stages is required to obtain the important benefits of both stages without exploding the complexity of solving all the issues simultaneously. This is demonstrated by the experimental results.

Key-Words: Program transformations, parallelization, data transfer and storage, low power, multimedia applications.

1 Introduction and Related Work

Parallel machines were mainly, if not exclusively, used in scientific communities until recently. Lately, the rapid
Citations
3148 | Computer Architecture: A Quantitative Approach – Hennessy, Patterson - 1996
657 | Advanced Compiler Design and Implementation – Muchnick - 1997
152 | Unifying data and control transformations for distributed shared memory machines – Cierniak, Li - 1995
123 | Automatic Array Privatization – Tu, Padua - 1993
107 | Instruction level power analysis and optimization of software – Tiwari, Malik, et al. - 1996
106 | The SUIF compiler for scalable parallel machines – Amarasinghe, Anderson, et al. - 1995
52 | Automatic Partitioning of Parallel Loops and Data Arrays for Distributed Shared-Memory Multiprocessors – Agarwal, Krantz, et al. - 1995
50 | Formalized methodology for data reuse exploration in hierarchical memory mappings – Diguet, Wuytack, et al. - 1997
38 | Low-overhead scheduling of nested parallelism – Hummel, Schonberg - 1991
31 | Communication-Free Data Allocation Techniques for Parallelizing Compilers on Multicomputers – Chen, Sheu - 1994
29 | Memory size reduction through storage order optimization for embedded parallel multimedia applications – De Greef, Catthoor, et al. - 1997
25 | Custom Memory Management Methodology – Exploration of Memory Organisation for Embedded – Catthoor, De Greef, Balasa, Nachtergaele, Vandecappelle - 1998
18 | QSDPCM – A New Technique in Scene Adaptive Coding – Strobach - 1988
17 | System-level memory management for weakly parallel image processing – Danckaert, De Man - 1996
15 | High-level address optimisation and synthesis techniques for data-transfer intensive applications – Miranda, Janssen, et al. - 1998
13 | A Strategy for Array Management – Eisenbeis, Jalby, et al. - 1991
6 | Transformation of nested loops with modulo indexing to affine recurrences – Balasa, Franssen, et al. - 1994
6 | Automatic program parallelisation – Banerjee, Nicolau, Padua - 1993
5 | Program transformation strategies for reduced power and memory size in pseudo-regular multimedia applications (accepted for publication) – De Greef, De Man - 1998
5 | Optimizing Supercompilers for Supercomputers, Research Monographs in Parallel and Distributed Computing – Wolfe - 1989
3 | Automatic Segmentation of Cardiac MR – Bister, Cornelis - 1989
3 | System level energy-delay exploration for multimedia applications on embedded cores with hardware caches – Kulkarni, Moolenaar, et al. - 1999
3 | Fast and extensive system-level memory exploration for ATM applications – Slock, Catthoor, de Jong - 1997
2 | network computer and its future – Broderson - 1997
2 | Cache optimization for multimedia compilation on embedded processors for low power – Kulkarni, Catthoor, et al. - 1998