Results 1 - 10
of
129
An adaptive, nonuniform cache structure for wire-delay dominated on-chip caches
- In International Conference on Architectural Support for Programming Languages and Operating Systems
, 2002
"... Growing wire delays will force substantive changes in the designs of large caches. Traditional cache architectures assume that each level in the cache hierarchy has a single, uniform access time. Increases in on-chip communication delays will make the hit time of large on-chip caches a function of a ..."
Abstract
-
Cited by 199 (34 self)
- Add to MetaCart
Growing wire delays will force substantive changes in the designs of large caches. Traditional cache architectures assume that each level in the cache hierarchy has a single, uniform access time. Increases in on-chip communication delays will make the hit time of large on-chip caches a function of a line's physical location within the cache. Consequently, cache access times will become a continuum of latencies rather than a single discrete latency. This nonuniformity can be exploited to provide faster access to cache lines in the portions of the cache that reside closer to the processor. In this paper, we evaluate a series of cache designs that provides fast hits to multi-megabyte cache memories. We first propose physical designs for these Non-Uniform Cache Architectures (NUCAs). We extend these physical designs with logical policies that allow important data to migrate toward the processor within the same level of the cache. We show that, for multi-megabyte level-two caches, an adaptive, dynamic NUCA design achieves 1.5 times the IPC of a Uniform Cache Architecture of any size, outperforms the best static NUCA scheme by 11%, outperforms the best three-level hierarchywhile using less silicon area-by 13%, and comes within 13 % of an ideal minimal hit latency solution. 1.
CACTI 3.0: An Integrated Cache Timing, Power, and Area Model
, 2001
"... CACTI 3.0 is an integrated cache access time, cycle time, area, aspect ratio, and power model. By integrating all these models together users can have confidence that tradeoffs between time, power, and area are all based on the same assumptions and hence are mutually consistent. CACTI is intended fo ..."
Abstract
-
Cited by 198 (4 self)
- Add to MetaCart
CACTI 3.0 is an integrated cache access time, cycle time, area, aspect ratio, and power model. By integrating all these models together users can have confidence that tradeoffs between time, power, and area are all based on the same assumptions and hence are mutually consistent. CACTI is intended for use by computer architects so they can better understand the performance tradeoffs inherent in different cache sizes and organizations.
Multiple-banked register file architectures
- In Proceedings of the 27th Annual International Symposium on Computer Architecture
, 2000
"... The register file access time is one of the critical delays in current superscalar processors. Its impact on processor performance is likely to increase in future processor generations, as they are expected to increase the issue width (which implies more register ports) and the size of the instructi ..."
Abstract
-
Cited by 117 (7 self)
- Add to MetaCart
The register file access time is one of the critical delays in current superscalar processors. Its impact on processor performance is likely to increase in future processor generations, as they are expected to increase the issue width (which implies more register ports) and the size of the instruction window (which implies more registers), and to use some kind of multithreading. Under this scenario, the register file access time could be a dominant delay and a pipelined implementation would be desirable to allow for high clock rates. However, a multi-stage register file has severe implications for processor performance (e.g. higher branch misprediction penalty) and complexity (more levels of bypass logic). To tackle these two problems, in this paper we propose a register file architecture composed of multiple banks. In particular we focus on a multi-level organization of the register file, which provides low
System-Level Power Optimization: Techniques and Tools
- ACM TRANSACTIONS ON DESIGN AUTOMATION OF ELECTRONIC SYSTEMS
, 2000
"... ..."
Synthesis Techniques for Low-Power Hard Real-Time Systems on Variable Voltage Processors
, 1998
"... The energy efficiency of systems-on-a-chip can be much improved if one were to vary the supply voltage dynamically at run time. In this paper we describe the synthesis of systems-on-a-chip based on core processors, while treating voltage (and correspondingly, the clock frequency) as a variable to be ..."
Abstract
-
Cited by 96 (5 self)
- Add to MetaCart
The energy efficiency of systems-on-a-chip can be much improved if one were to vary the supply voltage dynamically at run time. In this paper we describe the synthesis of systems-on-a-chip based on core processors, while treating voltage (and correspondingly, the clock frequency) as a variable to be scheduled along with the computation tasks during the static scheduling step. In addition to describing the complete synthesis design flow for these variable voltage systems, we focus on the problem of doing the voltage scheduling while taking into account the inherent limitation on the rates at which the voltage and clock frequency can be changed by the power supply controllers and clock generators. Taking these limits on rate of change into account is crucial since changing the voltage by even a volt may take time equivalent to 100s to 10,000s of instructions on modern processors. We present both an exact but impractical formulation of this scheduling problem as a set of non-linear equations, as well as a heuristic approach based on reduction to an optimally solvable restricted ordered scheduling problem. Using various task mixes drawn from a set of nine real-life applications, our results show that we are able to reduce power consumption to within 7% of the lower bound obtained by imposing no limit at the rate of change of voltage and clock frequencies.
Reconfigurable Caches and their Application to Media Processing
- In Proceedings of the 27th Annual International Symposium on Computer Architecture
, 2000
"... High performance general-purpose processors are increasingly being used for a variety of application domains -- scientific, engineering, databases, and more recently, media processing. It is therefore important to ensure that architectural features that use a significant fraction of the on-chip tra ..."
Abstract
-
Cited by 91 (4 self)
- Add to MetaCart
High performance general-purpose processors are increasingly being used for a variety of application domains -- scientific, engineering, databases, and more recently, media processing. It is therefore important to ensure that architectural features that use a significant fraction of the on-chip transistors are applicable across these different domains. For example, current processor designs often devote the largest fraction of on-chip transistors (up to 80%) to caches. Many workloads, however, do not make effective use of large caches; e.g., media processing workloads which often have streaming data access patterns and large working sets. This paper proposes a new reconfigurable cache design. This design enables the cache SRAM arrays to be dynamically divided into multiple partitions that can be used for different processor activities. These activities can benefit applications that would otherwise not use the storage allocated to large conventional caches. Our design involves relativ...
Power Optimization of Variable Voltage Core-Based Systems
, 1998
"... The growing class of portable systems, such as personal computing and communication devices, has resulted in a new set of system design requirements, mainly characterized by dominant importance of power minimization and design reuse. We develop the design methodology for the low power core-based rea ..."
Abstract
-
Cited by 86 (7 self)
- Add to MetaCart
The growing class of portable systems, such as personal computing and communication devices, has resulted in a new set of system design requirements, mainly characterized by dominant importance of power minimization and design reuse. We develop the design methodology for the low power core-based real-time system-onchip based on dynamically variable voltage hardware. The key challenge is to develop effective scheduling techniques that treat voltage as a variable to be determined, in addition to the conventional task scheduling and allocation. Our synthesis technique also addresses the selection of the processor core and the determination of the instruction and data cache size and configuration so as to fully exploit dynamically variable voltage hardware, which result in significantly lower power consumption for a set of target applications than existing techniques. The highlight of the proposed approach is the non-preemptive scheduling heuristic which results in solutions very close to optimal ones for many test cases. The effectiveness of the approach is demonstrated on a variety of modern industrial-strength multimedia and communication applications.
Assigning Program and Data Objects to Scratchpad for Energy Reduction
- In Proceedings of the conference on Design, automation and test in Europe
, 2002
"... The number of embedded systems is increasing and a remarkable percentage is designed as mobile applications. For the latter, the energy consumption is a limiting factor because of today's battery capacities. Besides the processor, memory accesses consume a high amount of energy. The use of additiona ..."
Abstract
-
Cited by 73 (14 self)
- Add to MetaCart
The number of embedded systems is increasing and a remarkable percentage is designed as mobile applications. For the latter, the energy consumption is a limiting factor because of today's battery capacities. Besides the processor, memory accesses consume a high amount of energy. The use of additional less power hungry memories like caches or scratchpads is thus common.
Positional Adaptation of Processors: Application to Energy Reduction
- In International Symposium on Computer Architecture
, 2003
"... Although adaptive processors can exploit application variability to improve performance or save energy, effectively managing their adaptivity is challenging. To address this problem, we introduce a new approach to adaptivity: the Positional approach. In this approach, both the testing of configurati ..."
Abstract
-
Cited by 69 (2 self)
- Add to MetaCart
Although adaptive processors can exploit application variability to improve performance or save energy, effectively managing their adaptivity is challenging. To address this problem, we introduce a new approach to adaptivity: the Positional approach. In this approach, both the testing of configurations and the application of the chosen configurations are associated with particular code sections. This is in contrast to the currently-used Temporal approach to adaptation, where both the testing and application of configurations are tied to successive intervals in time.
A Framework for Dynamic Energy Efficiency and Temperature Management
- INTERNATIONAL SYMPOSIUM ON MICROARCHITECTURE, DEC. 2000
, 2000
"... While technology is delivering increasingly sophisticated and powerful chip designs, it is also imposing alarmingly high energy requirements on the chips. One way to address this problem is to manage the energy dynamically. Unfortunately, current dynamic schemes for energy management are relatively ..."
Abstract
-
Cited by 62 (5 self)
- Add to MetaCart
While technology is delivering increasingly sophisticated and powerful chip designs, it is also imposing alarmingly high energy requirements on the chips. One way to address this problem is to manage the energy dynamically. Unfortunately, current dynamic schemes for energy management are relatively limited. In addition, they manage energy either for energy efficiency or for temperature control, but not for both simultaneously. In this

