Power Optimization of Variable-Voltage Core-Based Systems
- IEEE Trans. Computer-Aided Design, 1999
- Cited by 56 (4 self)
The growing class of portable systems, such as personal computing and communication devices, has resulted in a new set of system design requirements, mainly characterized by dominant importance of power minimization and design reuse. The energy efficiency of systems-on-a-chip (SOC) could be much improved if one were to vary the supply voltage dynamically at run time. We develop the design methodology for the low-power core-based real-time SOC based on dynamically variable voltage hardware. The key challenge is to develop effective scheduling techniques that treat voltage as a variable to be determined, in addition to the conventional task scheduling and allocation. Our synthesis technique also addresses the selection of the processor core and the determination of the instruction and data cache size and configuration so as to fully exploit dynamically variable voltage hardware, which results in significantly lower power consumption for a set of target applications than existing techniques. The highlight of the proposed approach is the nonpreemptive scheduling heuristic, which results in solutions very close to optimal ones for many test cases. The effectiveness of the approach is demonstrated on a variety of modern industrial-strength multimedia and communication applications.
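To make the idea of treating voltage as a scheduling variable concrete, here is a minimal sketch, not the paper's heuristic. It assumes the common first-order model in which energy per cycle scales with V^2 and clock speed scales with (V - V_T)^2 / V. The voltage values, the function names, and the discrete level set are all hypothetical.

```python
# Simplified variable-voltage model (an assumption, not the paper's scheduler).
V_MAX = 3.3  # hypothetical nominal supply voltage, in volts
V_T = 0.8    # hypothetical threshold voltage, in volts


def relative_speed(v):
    """Clock speed at supply voltage v, normalized so that speed(V_MAX) == 1."""
    def raw(x):
        return (x - V_T) ** 2 / x
    return raw(v) / raw(V_MAX)


def relative_energy(v):
    """Energy per cycle at supply voltage v, normalized to V_MAX."""
    return (v / V_MAX) ** 2


def lowest_feasible_voltage(time_at_vmax, deadline, levels):
    """Return the lowest voltage level at which the task still meets its
    deadline, or None if even the levels given are all too slow."""
    feasible = [v for v in levels
                if time_at_vmax / relative_speed(v) <= deadline]
    return min(feasible) if feasible else None
```

With these assumed numbers, a task that takes 4 time units at 3.3 V and has a deadline of 10 can run at 2.5 V, which under this model costs a little under 60% of the energy per cycle.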
Improving Cache Behavior of Dynamically Allocated Data Structures
- In International Conference on Parallel Architectures and Compilation Techniques, 1998
- Cited by 50 (0 self)
Poor data layout in memory may generate weak data locality and poor performance. Code transformations such as loop blocking or interchanging and array padding have addressed this issue for scientific applications. However, many generalist applications do not use data arrays, but dynamically allocated heterogeneous data structures. In this paper, we explore two data layout techniques for dynamically allocated data structures: field reorganization and instance interleaving. The application of these techniques may be guided by program profiling. This allows significant cache behavior improvements on some applications. To support instance interleaving, we developed a specific memory allocation library called ialloc. An ialloc-like library may be of great help in a toolbox for performance tuning of general-purpose applications.
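The effect of instance interleaving on cache footprint can be illustrated with a small address model. This is a sketch of one plausible reading of the technique (storing the same field of many instances contiguously), not the ialloc API. The line size, field sizes, and function names are hypothetical.

```python
LINE_SIZE = 32  # hypothetical cache line size, in bytes


def lines_touched(addresses, size, line_size=LINE_SIZE):
    """Count distinct cache lines covered by accesses of `size` bytes."""
    lines = set()
    for a in addresses:
        lines.update(range(a // line_size, (a + size - 1) // line_size + 1))
    return len(lines)


def conventional_addresses(n, field_sizes, field):
    """Address of `field` in each of n instances laid out one after another."""
    stride = sum(field_sizes)
    offset = sum(field_sizes[:field])
    return [i * stride + offset for i in range(n)]


def interleaved_addresses(n, field_sizes, field):
    """Address of `field` when each field of all n instances is stored
    contiguously (assumed interleaved layout)."""
    base = n * sum(field_sizes[:field])
    return [base + i * field_sizes[field] for i in range(n)]
```

For 64 instances with a 4-byte frequently accessed field followed by a 28-byte rarely accessed field, a traversal of the first field touches 64 lines in the conventional layout and 8 lines in the interleaved one under this model.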
Reducing Cache Misses Using Hardware and Software Page Placement
- In Proceedings of the International Conference on Supercomputing, 1999
- Cited by 38 (1 self)
As the gap between memory and processor speeds continues to widen, cache efficiency is an increasingly important component of processor performance. Compiler techniques have been used to improve instruction and data cache performance for virtually indexed caches by mapping code and data with temporal locality to different cache blocks. In this
Data remapping for design space optimization of embedded memory systems
- ACM Transactions on Embedded Computing Systems, 2003
- Cited by 25 (8 self)
In this article, we present a novel linear time algorithm for data remapping, that is, (i) lightweight; (ii) fully automated; and (iii) applicable in the context of pointer-centric programming languages with dynamic memory allocation support. All previous work in this area lacks one or more of these features. We proceed to demonstrate a novel application of this algorithm as a key step in optimizing the design of an embedded memory system. Specifically, we show that by virtue of locality enhancements via data remapping, we may reduce the memory subsystem needs of an application by 50%, and hence concomitantly reduce the associated costs in terms of size, power, and dollar-investment (61%). Such a reduction overcomes key hurdles in designing high-performance embedded computing solutions. Namely, memory subsystems are very desirable from a performance standpoint, but their costs have often limited their use in embedded systems. Thus, our innovative approach offers the intriguing possibility of compilers playing a significant role in exploring and optimizing the design space of a memory subsystem for an embedded design. To this end and in order to properly leverage the improvements afforded by a compiler optimization, we identify a range of measures for quantifying the cost-impact of popular notions of locality, prefetching, regularity of memory access and others. The proposed methodology will
Code Transformations for Low Power Caching in Embedded Multimedia Processors
- Intnl. Parallel Proc. Symp. (IPPS/SPDP), Orlando, FL, 1998
- Cited by 18 (4 self)
In this paper, we present several novel strategies to improve software controlled cache utilization, so as to achieve lower power requirements for multi-media and signal processing applications. Our methodology is targeted towards embedded multi-media and DSP processors. This methodology takes into account many program parameters like the locality of data, size of data structures, access structures of large array variables, regularity of loop nests and the size and type of cache with the objective of improving the cache performance for lower power. We also take into account the potential overhead due to the different transformations on the instruction count and the number of execution cycles to meet the real-time constraints and code size limitations. Experiments on a real-life demonstrator illustrate the fact that our methodology is able to achieve significant gain in power requirements while meeting all other system constraints.
Design Space Optimization of Embedded Memory Systems via Data Remapping
- In Proceedings of the Languages, Compilers, and Tools for Embedded Systems and Software and Compilers for Embedded Systems, 2002
- Cited by 15 (3 self)
In this paper, we provide a novel compile-time data remapping algorithm that runs in linear time. This remapping algorithm is the first fully automatic approach applicable to pointer-intensive dynamic applications. We show that data remapping can be used to significantly reduce the energy consumed as well as the memory size needed to meet a user-specified performance goal (i.e., execution time) -- relative to the same application executing without being remapped. These twin advantages afforded by a remapped program -- reduced cache size and energy needs -- constitute a key step in a framework for design space exploration: for any given performance goal, remapping allows the user to reduce the primary and secondary cache size by 50%, yielding a concomitant energy savings of 57%. Additionally, viewed as a compiler optimization for a fixed processor, we show that remapping improves the energy consumed by the cache subsystem by 25%. All of the above savings are in the context of the cache subsystem in isolation. We also show that remapping yields an average 20% energy saving for an ARM-like processor and cache subsystem. All of our improvements are achieved in the context of DIS, OLDEN and SPEC2000 pointer-centric benchmarks.
Memory Design and Exploration for Low Power, Embedded Systems
- 2001
- Cited by 13 (2 self)
In this paper, we describe a procedure for memory design and exploration for low power embedded systems. Our system consists of an instruction cache and a data cache on-chip, and a large memory off-chip. In the first step, we try to reduce the power consumption due to memory traffic by applying memory-optimizing transformations such as loop transformations. Next we use a memory exploration procedure to choose a cache configuration (cache size and line size) that satisfies the system requirements of area, number of cycles and energy consumption. We include energy in the performance metrics, since for different cache configurations, the variation in energy consumption is quite different from the variation in the number of cycles. The memory exploration procedure is very efficient since it exploits the trends in the cycles and energy characteristics to reduce the search space significantly.
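The exploration step can be pictured as a sweep over cache configurations that keeps only those meeting a cycle budget and then picks the one with the lowest energy. The sketch below is an exhaustive toy version, so it does not include the trend-based pruning the abstract mentions. The direct-mapped cache, the miss penalty, and the energy formula are all assumptions made for illustration.

```python
def simulate_direct_mapped(trace, cache_size, line_size):
    """Return the miss count of a direct-mapped cache on a byte-address trace."""
    num_lines = cache_size // line_size
    tags = [None] * num_lines
    misses = 0
    for addr in trace:
        block = addr // line_size
        index = block % num_lines
        if tags[index] != block:
            tags[index] = block
            misses += 1
    return misses


def explore(trace, cache_sizes, line_sizes, max_cycles, miss_penalty=20):
    """Return (energy, cache_size, line_size) for the lowest-energy
    configuration whose cycle count is within max_cycles, or None."""
    best = None
    for size in cache_sizes:
        for line in line_sizes:
            if line > size:
                continue
            misses = simulate_direct_mapped(trace, size, line)
            # Assumed timing: one cycle per access plus a fixed miss penalty.
            cycles = len(trace) + misses * miss_penalty
            if cycles > max_cycles:
                continue
            # Toy energy model (hypothetical): per-access cost grows with
            # cache size, per-miss cost grows with line size.
            energy = len(trace) * size / 1024 + misses * 10 * line / 16
            candidate = (energy, size, line)
            if best is None or candidate < best:
                best = candidate
    return best
```

On a trace that sweeps a 512-byte array twice, this toy model reports the same cycle count for the 512-byte and 1024-byte caches with 32-byte lines but a lower energy for the smaller one, which is the kind of divergence between cycles and energy that motivates including energy in the metrics.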
Data Cache Sizing for Embedded Processor Applications
- 1997
- Cited by 12 (2 self)
We present a technique for determining the best data cache size required for a given memory-intensive application.
Advanced Data Layout Optimization for Multimedia Applications
- Proc. Workshop on Parallel and Distributed Computing in Image, Video and Multimedia Processing (PDIVM’00), IPDPS’2000, 2000
- Cited by 5 (1 self)
Increasing disparity between processor and memory speeds has been a motivation for designing systems with deep memory hierarchies. Most data-dominated multimedia applications do not use their cache efficiently and spend much of their time waiting for memory accesses [1]. This also implies a significant additional

