Results 1 - 10 of 51
Slack: Maximizing Performance Under Technological Constraints
, 2002
"... Many emerging processor microarchitectures seek to manage technological constraints (e.g., wire delay, power, and circuit complexity) by resorting to nonuniform designs that provide resources at multiple quality levels (e.g., fast/slow bypass paths, multi-speed functional units, and grid architectur ..."
Abstract
-
Cited by 55 (3 self)
- Add to MetaCart
Many emerging processor microarchitectures seek to manage technological constraints (e.g., wire delay, power, and circuit complexity) by resorting to nonuniform designs that provide resources at multiple quality levels (e.g., fast/slow bypass paths, multi-speed functional units, and grid architectures). In such designs, the constraint problem becomes a control problem, and the challenge becomes designing a control policy that mitigates the performance penalty of the non-uniformity. Given the increasing importance of non-uniform control policies, we believe it is appropriate to examine them in their own right. To this end, we develop slack for use in creating control policies that match program execution behavior to machine design. Intuitively, the slack of a dynamic instruction i is the number of cycles i can be delayed with no effect on execution time. This property makes slack a natural candidate for hiding non-uniform latencies. We make three contributions in our exploration of slack. First, we formally define slack, distinguish three variants (local, global and apportioned), and perform a limit study to show that slack is prevalent in our SPEC2000 workload. Second, we show how to predict slack in hardware. Third, we illustrate how to create a control policy based on slack for steering instructions among fast (high power) and slow (lower power) pipelines.
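To make the slack definition concrete, the sketch below (not taken from the paper) computes global slack for each node of a toy dependence graph by comparing earliest and latest start times; the graph, latencies, and variable names are invented for illustration.

    # Toy dependence graph (hypothetical): node -> successors, plus per-node latency.
    latency = {"a": 1, "b": 3, "c": 1, "d": 2, "e": 1}
    succs = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["e"], "e": []}
    order = ["a", "b", "c", "d", "e"]          # a topological order of the DAG

    # Earliest start times (as-soon-as-possible schedule).
    earliest = {n: 0 for n in order}
    for n in order:
        for s in succs[n]:
            earliest[s] = max(earliest[s], earliest[n] + latency[n])

    makespan = max(earliest[n] + latency[n] for n in order)

    # Latest start times (as-late-as-possible schedule), walking the DAG backwards.
    latest = {n: makespan - latency[n] for n in order}
    for n in reversed(order):
        for s in succs[n]:
            latest[n] = min(latest[n], latest[s] - latency[n])

    # Global slack: cycles a node can be delayed without stretching the makespan.
    for n in order:
        print(n, "global slack =", latest[n] - earliest[n])

Nodes on the critical path come out with zero slack, while the off-path node can absorb delay, which is the property a slack-based steering policy would exploit.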
Using Interaction Costs for Microarchitectural Bottleneck Analysis
- In the 36th International Symposium on Microarchitecture (MICRO ’03)
, 2003
"... Attacking bottlenecks in modern processors is difficult because many microarchitectural events overlap with each other. This parallelism makes it difficult to both (a) assign a cost to an event (e.g., to one of two overlapping cache misses) and (b) assign blame for each cycle (e.g., for a cycle wher ..."
Abstract
-
Cited by 36 (2 self)
- Add to MetaCart
Attacking bottlenecks in modern processors is difficult because many microarchitectural events overlap with each other. This parallelism makes it difficult to both (a) assign a cost to an event (e.g., to one of two overlapping cache misses) and (b) assign blame for each cycle (e.g., for a cycle where many, overlapping resources are active). This paper introduces a new model for understanding event costs to facilitate processor design and optimization. First, we observe that everything in a machine (instructions, hardware structures, events) can interact in only one of two ways (in parallel or serially). We quantify these interactions by defining interaction cost, which can be zero (independent, no interaction), positive (parallel), or negative (serial). Second, we illustrate the value of using interaction costs in processor design and optimization. Finally, we propose performance-monitoring hardware for measuring interaction costs that is suitable for modern processors.
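As a made-up numeric illustration of the definition above, interaction cost can be computed from execution times measured with each event idealized alone and with both idealized together; all cycle counts below are hypothetical.

    # Hypothetical execution times (cycles) from runs where an event type is
    # idealized (made free), alone and in combination.
    base           = 1000   # baseline
    no_cache_miss  = 900    # cache misses idealized
    no_fetch_stall = 950    # fetch stalls idealized
    no_both        = 820    # both idealized together

    cost_miss  = base - no_cache_miss    # 100 cycles
    cost_stall = base - no_fetch_stall   #  50 cycles
    cost_both  = base - no_both          # 180 cycles

    # Interaction cost: cost of the pair minus the sum of the individual costs.
    icost = cost_both - cost_miss - cost_stall   # 180 - 150 = +30
    print("interaction cost =", icost)

A positive result indicates parallel overlap between the two events, zero would indicate independence, and a negative result would indicate serial interaction.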
Power-Aware Control Speculation through Selective Throttling
- In Proceedings of the Ninth International Symposium on High-Performance Computer Architecture
, 2003
"... ..."
(Show Context)
Quantifying Instruction Criticality
, 2002
"... Information about instruction criticality can be used to control the application of micro-architectural resources efficiently. To this end, several groups have proposed methods to predict critical instructions. This paper presents a framework that allows us to directly measure the criticality of ind ..."
Abstract
-
Cited by 24 (2 self)
- Add to MetaCart
Information about instruction criticality can be used to control the application of micro-architectural resources efficiently. To this end, several groups have proposed methods to predict critical instructions. This paper presents a framework that allows us to directly measure the criticality of individual dynamic instructions. This allows us to (1) measure the accuracy of proposed critical path predictors, (2) quantify the amount of slack present in non-critical instructions, and (3) provide a new metric, called tautness, which ranks critical instructions by their dominance on the critical path. This research investigates methods for improving critical path predictor accuracy and studies the distribution of slack and tautness in programs. It shows that instruction criticality changes dynamically, and that criticality history patterns can be used to significantly improve predictor accuracy.
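The following sketch (a simplification, not the paper's framework) shows one way to measure criticality directly on a toy dependence graph: delay an instruction by a cycle and check whether total execution time grows, then estimate a tautness-like value by counting how many cycles of speedup still shorten the run. All names and latencies are hypothetical.

    # Toy dependence graph and a helper that computes total execution time.
    def makespan(latency, succs, order):
        finish = {n: 0 for n in order}
        for n in order:
            start = max([finish[p] for p in order if n in succs[p]] or [0])
            finish[n] = start + latency[n]
        return max(finish.values())

    latency = {"a": 1, "b": 3, "c": 1, "d": 2}
    succs = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
    order = ["a", "b", "c", "d"]
    base = makespan(latency, succs, order)

    for n in order:
        # Critical if delaying this instruction by one cycle slows the whole run.
        slower = dict(latency, **{n: latency[n] + 1})
        critical = makespan(slower, succs, order) > base
        # Tautness-like value: cycles of speedup that still shorten the run.
        taut, faster = 0, dict(latency)
        while faster[n] > 1:
            faster[n] -= 1
            if makespan(faster, succs, order) < base - taut:
                taut += 1
            else:
                break
        print(n, "critical:", critical, "tautness-like value:", taut)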
Interaction cost and shotgun profiling
- ACM Transactions on Architecture and Code Optimization
, 2004
"... We observe that the challenges software optimizers and microarchitects face every day boil down to a single problem: bottleneck analysis. A bottleneck is any event or resource that contributes to execution time, such as a critical cache miss or window stall. Tasks such as tuning processors for energ ..."
Abstract
-
Cited by 19 (2 self)
- Add to MetaCart
We observe that the challenges software optimizers and microarchitects face every day boil down to a single problem: bottleneck analysis. A bottleneck is any event or resource that contributes to execution time, such as a critical cache miss or window stall. Tasks such as tuning processors for energy efficiency and finding the right loads to prefetch all require measuring the performance costs of bottlenecks. In the past, simple event counts were enough to find the important bottlenecks. Today, the parallelism of modern processors makes such analysis much more difficult, rendering traditional performance counters less useful. If two microarchitectural events (such as a fetch stall and a cache miss) occur in the same cycle, which event should we blame for the cycle? What cost should we assign to each event? In this paper, we introduce a new model for understanding event costs to facilitate processor design and optimization. First, we observe that all instructions, hardware structures, and events in a machine can interact in only one of two ways (in parallel or serially). We quantify these interactions by defining interaction cost, which can be zero (independent, no interaction), positive (parallel), or negative (serial). Second, we illustrate the value of using interaction costs in processor design and optimization.
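A small hypothetical example of the blame-assignment problem raised above: naive per-event counters sum the stall cycles of each event, which double-counts cycles where two events overlap, whereas the cycles actually lost are the union. The intervals below are invented.

    # Hypothetical stall intervals for two overlapping events.
    cache_miss_cycles  = set(range(100, 160))   # 60 cycles stalled behind a miss
    fetch_stall_cycles = set(range(140, 180))   # 40 cycles stalled on fetch

    naive_total = len(cache_miss_cycles) + len(fetch_stall_cycles)   # 100
    actual_lost = len(cache_miss_cycles | fetch_stall_cycles)        # 80 (the union)

    print("naive sum of per-event costs:", naive_total)
    print("cycles actually lost:", actual_lost)
    print("overlap that simple counters double-count:", naive_total - actual_lost)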
Critical Path Analysis of the TRIPS Architecture
- In IEEE International Symposium on Performance Analysis of Systems and Software
, 2006
"... Fast, accurate, and effective performance analysis is essential for the design of modern processor architectures and improving application performance. Recent trends toward highly concurrent processors make this goal increasingly difficult. Conventional techniques, based on simulators and performanc ..."
Abstract
-
Cited by 18 (13 self)
- Add to MetaCart
(Show Context)
Fast, accurate, and effective performance analysis is essential for the design of modern processor architectures and improving application performance. Recent trends toward highly concurrent processors make this goal increasingly difficult. Conventional techniques, based on simulators and performance monitors, are ill-equipped to analyze how a plethora of concurrent events interact and how they affect performance. Prior research has shown the utility of critical path analysis in solving this problem [5, 18]. This analysis abstracts the execution of a program with a dependence graph. With simple manipulations on the graph, designers can gain insights into the bottlenecks of a design. This paper extends critical path analysis to understand the performance of a next-generation, high-ILP architecture. The TRIPS architecture introduces new features not present in conventional superscalar architectures. We show how dependence constraints introduced by these features, specifically the execution model and operand communication links, can be modeled with a dependence graph. We describe a new algorithm that tracks critical path information at a fine-grained level and yet can deliver an order of magnitude (30x) improvement in performance over previously proposed techniques [5, 18]. Finally, we provide a breakdown of the critical path for a select set of benchmarks and show an example where we use this information to improve the performance of a heavily hand-optimized program by as much as 11%.
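As a rough illustration of the dependence-graph approach (not the TRIPS tool or its algorithm), the sketch below builds a tiny graph with typed edge latencies, including hypothetical operand-network hops, finds the longest (critical) path, and tallies how much each edge type contributes, mirroring the idea of a critical path breakdown.

    # Tiny dependence graph with typed, weighted edges (all values invented):
    # (source, destination, edge type, latency in cycles).
    edges = [
        ("fetch", "i1", "fetch",   2),
        ("i1",    "i2", "execute", 1),
        ("i1",    "i3", "network", 3),   # operand routed between tiles
        ("i2",    "i4", "execute", 1),
        ("i3",    "i4", "network", 2),
        ("i4",    "commit", "execute", 1),
    ]
    order = ["fetch", "i1", "i2", "i3", "i4", "commit"]   # topological order

    # Longest-path (critical-path) computation; edges are listed in topological order.
    dist = {n: 0 for n in order}
    pred = {n: None for n in order}
    for src, dst, kind, lat in edges:
        if dist[src] + lat > dist[dst]:
            dist[dst] = dist[src] + lat
            pred[dst] = (src, kind, lat)

    # Walk back from the sink and tally each edge type's share of the critical path.
    breakdown, node = {}, "commit"
    while pred[node] is not None:
        src, kind, lat = pred[node]
        breakdown[kind] = breakdown.get(kind, 0) + lat
        node = src
    print("critical path length:", dist["commit"], "breakdown by edge type:", breakdown)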
Reconfigurable security support for embedded systems
- In Proceedings of the 39th Hawaii International Conference on System Sciences
, 2006
"... Abstract — Embedded systems present significant security challenges due to their limited resources and power constraints. We propose a novel security architecture for embedded systems (SANES) that leverages the capabilities of reconfigurable hardware to provide efficient and flexible architectural s ..."
Abstract
-
Cited by 13 (3 self)
- Add to MetaCart
(Show Context)
Embedded systems present significant security challenges due to their limited resources and power constraints. We propose a novel security architecture for embedded systems (SANES) that leverages the capabilities of reconfigurable hardware to provide efficient and flexible architectural support for both security standards and defenses against a range of attacks. This paper shows the efficiency of reconfigurable architectures for implementing security primitives within embedded systems. We also propose the use of hardware monitors to detect and defend against attacks. The SANES architecture is based on three main ideas: 1) reconfigurable security primitives, 2) reconfigurable hardware monitors, and 3) a hierarchy of security controllers at the primitive, system, and executive levels. Results are presented for a reconfigurable AES security primitive within the IPSec standard and highlight the value of such a solution.
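Purely as an illustration of the three-level controller hierarchy described above (class names and behavior are invented, not the SANES design), a primitive-level controller could flag timing deviations of a monitored primitive, a system-level controller could aggregate alerts, and an executive-level controller could choose a response.

    # Invented class names and thresholds, for illustration only.
    class PrimitiveController:
        """Watches one security primitive (e.g. an AES core) through a monitor."""
        def __init__(self, name):
            self.name = name
        def deviates(self, observed_cycles, expected_cycles):
            # Flag a deviation from the primitive's expected timing profile.
            return abs(observed_cycles - expected_cycles) > 0.1 * expected_cycles

    class SystemController:
        """Aggregates alerts raised by the primitive-level controllers."""
        def __init__(self, primitives):
            self.primitives = primitives
        def alerts(self, samples):
            return [p.name for p, (obs, exp) in zip(self.primitives, samples)
                    if p.deviates(obs, exp)]

    class ExecutiveController:
        """Chooses a system-level response to the alerts."""
        def respond(self, alerts):
            return "reconfigure primitive" if alerts else "normal operation"

    system = SystemController([PrimitiveController("AES")])
    print(ExecutiveController().respond(system.alerts([(1300, 1000)])))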
Combining Software and Hardware Monitoring for Improved Power and Performance Tuning
- In 7th Annual Workshop on Interaction Between Compilers and Computer Architecture (INTERACT-7)
, 2003
"... By anticipating when resources will be idle, it is possible to reconfigure the hardware to reduce power consumption without significantly reducing performance. This requires predicting what the resource requirements will be for an application. In the past, researchers have taken one of two approache ..."
Abstract
-
Cited by 9 (1 self)
- Add to MetaCart
(Show Context)
By anticipating when resources will be idle, it is possible to reconfigure the hardware to reduce power consumption without significantly reducing performance. This requires predicting what the resource requirements will be for an application. In the past, researchers have taken one of two approaches: design hardware monitors that can measure recent performance, or profile the application to determine the most likely behavior for each block of code. This paper explores a third option which is to combine hardware monitoring with software profiling to achieve lower power utilization than either method alone. We demonstrate the potential for this approach in two ways. First, we compare hardware monitoring and software profiling of IPC for code blocks and show that they capture different information. By combining them, we can control issue width and ALU usage more effectively to save more power. Second, we show that anticipating stalls due to critical load misses in the L2 cache can enable fetch halting. However, hardware monitoring and software profiling must be used together to effectively predict misses and criticality of loads.
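The sketch below is an invented illustration of the combination idea, not the paper's mechanism: blend a per-block IPC obtained from offline profiling with a recent hardware-counter measurement to choose an issue width; the blocks, weights, and thresholds are hypothetical.

    # Per-block IPC from offline profiling (values invented).
    profiled_ipc = {"blockA": 3.2, "blockB": 0.9}

    def choose_issue_width(block, recent_hw_ipc, weight=0.5, max_width=4):
        # Blend the profile (expected behavior) with the hardware monitor
        # (recent behavior); either source alone can miss phase changes.
        expected = weight * profiled_ipc[block] + (1 - weight) * recent_hw_ipc
        return max(1, min(max_width, round(expected)))

    print(choose_issue_width("blockA", recent_hw_ipc=1.5))   # narrower than the profile alone suggests
    print(choose_issue_width("blockB", recent_hw_ipc=1.1))   # stays at the minimum width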
Reconfigurable hardware for high-security/high-performance embedded systems: The SAFES perspective
- IEEE Transactions on Very Large Scale Integration (VLSI) Systems
, 2008
"... Abstract—Embedded systems present significant security chal-lenges due to their limited resources and power constraints. This paper focuses on the issues of building secure embedded systems on reconfigurable hardware and proposes a security architecture for embedded systems (SAFES). SAFES leverages ..."
Abstract
-
Cited by 8 (0 self)
- Add to MetaCart
Embedded systems present significant security challenges due to their limited resources and power constraints. This paper focuses on the issues of building secure embedded systems on reconfigurable hardware and proposes a security architecture for embedded systems (SAFES). SAFES leverages the capabilities of reconfigurable hardware to provide efficient and flexible architectural support for security standards and defenses against a range of hardware attacks. The SAFES architecture is based on three main ideas: 1) reconfigurable security primitives; 2) reconfigurable hardware monitors; and 3) a hierarchy of security controllers at the primitive, system, and executive levels. Results are presented for reconfigurable AES and RC6 security primitives and highlight the value of such an architecture. This paper also emphasizes that reconfigurable hardware is not just a technology for hardware accelerators dedicated to security primitives, as most studies have treated it, but a real solution for providing high security and high performance for a system. Index Terms: Cryptography, hardware monitors, performance and security policies, reconfigurable hardware, secure embedded systems, security primitives.
Application Adaptive Energy Efficient Clustered Architectures
- In Proceedings of the International Symposium on Low Power Electronics and Design
, 2004
"... As clock frequency and die area increase, achieving energy efficiency, while distributing a low skew, global clock signal becomes increasingly difficult. Challenges imposed by deep-submicron technologies can be alleviated by using a multiple voltage/multiple frequency island design style, or otherwi ..."
Abstract
-
Cited by 7 (0 self)
- Add to MetaCart
(Show Context)
As clock frequency and die area increase, achieving energy efficiency while distributing a low-skew global clock signal becomes increasingly difficult. Challenges imposed by deep-submicron technologies can be alleviated by using a multiple voltage/multiple frequency island design style, otherwise known as the globally asynchronous, locally synchronous (GALS) design paradigm. This paper proposes a clustered architecture that enables application-adaptive energy efficiency through the use of dynamic voltage scaling for application code that is determined at run-time to be non-critical to overall performance. As opposed to task scheduling using dynamic voltage scaling (DVS), which exploits workload variations across applications, our approach targets workload variations within the same application, classifying code on the fly as critical or non-critical and adapting to changes in the criticality of such code portions. Our results show that application-adaptive variable voltage/variable frequency clustered architectures are up to 22% better in energy and 11% better in energy-delay product than their non-adaptive counterparts, while providing up to 31% more energy savings when compared to DVS applied globally.
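As a toy illustration of the adaptation idea (not the paper's controller), a cluster's voltage/frequency operating point could be chosen from a running estimate of how often the code it executes is performance-critical; the levels and thresholds below are made up.

    # Invented voltage/frequency operating points: (volts, GHz).
    VF_LEVELS = [(1.2, 2.0), (1.0, 1.5), (0.8, 1.0)]

    def pick_vf(critical_fraction):
        # Mostly critical work keeps the cluster at the fast point; mostly
        # non-critical work tolerates a slower, lower-energy point.
        if critical_fraction > 0.6:
            return VF_LEVELS[0]
        if critical_fraction > 0.2:
            return VF_LEVELS[1]
        return VF_LEVELS[2]

    # Re-evaluated periodically as the criticality of the running code changes.
    for fraction in (0.8, 0.4, 0.05):
        print(fraction, "->", pick_vf(fraction))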