by Manjunath Kudlur, Kevin Fan, Michael Chu, Scott Mahlke
in IEEE 15th International Conference on Application-Specific Systems, Architectures and Processors
http://cccp.eecs.umich.edu/papers/kvman-asap04.ps
Abstract:
Distributed local memories, or scratchpads, have been shown to effectively reduce the cost and power consumption of application-specific accelerators while maintaining performance. The design of the local memory organization must take several factors into account, including the memory bandwidth and size requirements of the program and the distribution of program data among the memories. In addition, when the register structures and function units of the accelerator are clustered, the effects of intercluster communication must be taken into account. This work proposes a technique to synthesize the local memory architecture of a clustered accelerator using a phase-ordered approach. First, the dataflow graph is pre-partitioned to define a performance-centric grouping of the operations. Second, memory synthesis combines multiple data structures into a set of physical memories that minimizes cost while maintaining a performance threshold. Finally, post-partitioning determines the final assignment of operations to clusters given the memory organization. Results show that customization reduces memory cost by 2% to 59% relative to a naïve scheme that uses one physical memory per program data structure. Further, pre-partitioning is shown to reduce the intercluster communication required to achieve a fixed performance.
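To make the memory-synthesis step concrete, here is a minimal sketch of the idea of grouping data structures into physical memories under a performance threshold. It is not the authors' formulation. The cost model (a fixed per-memory overhead plus size scaled by port count), the port-count rule (accesses per loop iteration divided by a target initiation interval `ii`), and every name and parameter (`set_partitions`, `partition_cost`, `synthesize`, `max_ports`, `fixed`, `port_weight`) are hypothetical. The search is exhaustive over all set partitions, which grows with the Bell numbers, so a practical tool would need something smarter, such as an integer-programming formulation.

```python
from math import ceil
from typing import Dict, Iterator, List, Optional, Tuple

# Hypothetical input format: name -> (size in words, accesses per loop iteration).
DataStructs = Dict[str, Tuple[int, int]]


def set_partitions(items: List[str]) -> Iterator[List[List[str]]]:
    """Yield every split of `items` into non-empty groups (one group = one physical memory)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        # Add `first` to each existing group in turn...
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        # ...or give it a memory of its own.
        yield [[first]] + part


def partition_cost(part: List[List[str]], ds: DataStructs, ii: int,
                   max_ports: int = 2, fixed: float = 64.0,
                   port_weight: float = 0.5) -> Optional[float]:
    """Toy cost model (an assumption): each memory pays a fixed overhead plus its
    size scaled by its port count. Returns None if any memory would need more
    than `max_ports` ports to sustain the target initiation interval `ii`."""
    total = 0.0
    for group in part:
        size = sum(ds[n][0] for n in group)
        accesses = sum(ds[n][1] for n in group)
        ports = max(1, ceil(accesses / ii))  # all accesses must fit in ii cycles
        if ports > max_ports:
            return None  # violates the performance threshold
        total += fixed + size * (1 + port_weight * (ports - 1))
    return total


def synthesize(ds: DataStructs, ii: int, **model) -> Tuple[Optional[List[List[str]]], float]:
    """Exhaustively pick the cheapest feasible grouping of data structures."""
    best, best_cost = None, float("inf")
    for part in set_partitions(sorted(ds)):
        cost = partition_cost(part, ds, ii, **model)
        if cost is not None and cost < best_cost:
            best, best_cost = part, cost
    return best, best_cost


# Usage with made-up data structures.
ds = {"A": (256, 2), "B": (64, 1), "C": (64, 1), "D": (1024, 1)}
naive = partition_cost([[n] for n in sorted(ds)], ds, ii=2)  # one memory per structure
best, cost = synthesize(ds, ii=2)
print(naive, cost)  # 1664.0 1600.0
```

Under this toy model, merging the two small arrays saves one memory's fixed overhead without raising any port count, while merging everything into one memory is rejected because it would need three ports to meet `ii = 2`. Several groupings tie at the minimum cost here, so only the cost, not the specific grouping, should be relied on.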