Results 1 - 9 of 9
Microarchitectural Design Space Exploration Using An Architecture-Centric Approach
Cited by 7 (1 self)
The microarchitectural design space of a new processor is too large for an architect to evaluate in its entirety. Even with the use of statistical simulation, evaluation of a single configuration can take excessive time due to the need to run a set of benchmarks with realistic workloads. This paper proposes a novel machine learning model that can quickly and accurately predict the performance and energy consumption of any set of programs on any microarchitectural configuration. This architecture-centric approach uses prior knowledge from off-line training and applies it across benchmarks. This allows our model to predict the performance of any new program across the entire microarchitecture configuration space with just 32 further simulations. We compare our approach to a state-of-the-art program-specific predictor and show that we significantly reduce prediction error. We reduce the average error when predicting performance from 24% to just 7% and increase the correlation coefficient from 0.55 to 0.95. We then show that this predictor can be used to guide the search of the design space, selecting the best configuration for energy-delay in just 3 further simulations, reducing it to 0.85. We also evaluate the cost of off-line learning and show that we can still achieve a high level of accuracy when using just 5 benchmarks to train. Finally, we analyse our design space and show how different microarchitectural parameters can affect the cycles, energy and energy-delay of the architectural configurations.
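The abstract above does not say how the off-line knowledge is combined for a new program. As a minimal sketch of one possibility (an assumption, not the authors' stated method), the Python below fits a least-squares linear combination of previously trained per-program models to 32 simulations of an unseen program. The three trained models, the two-parameter design space, and `simulate_new_program` are all hypothetical stand-ins.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def fit_weights(train_models, configs, responses):
    """Least-squares weights via the normal equations (F^T F) w = F^T y."""
    feats = [[model(cfg) for model in train_models] for cfg in configs]
    k = len(train_models)
    ftf = [[sum(f[i] * f[j] for f in feats) for j in range(k)] for i in range(k)]
    fty = [sum(f[i] * y for f, y in zip(feats, responses)) for i in range(k)]
    return solve(ftf, fty)

def predict(train_models, weights, cfg):
    """Predict the new program's metric as a weighted sum of the trained models."""
    return sum(w * model(cfg) for w, model in zip(weights, train_models))

# Hypothetical off-line models: cycles as a function of (issue width, ROB size).
TRAINED = [
    lambda cfg: 1000.0 / cfg[0],
    lambda cfg: 2000.0 / cfg[1],
    lambda cfg: 50.0 + 0.01 * cfg[0] * cfg[1],
]

def simulate_new_program(cfg):
    """Hypothetical stand-in for a detailed simulation of the unseen program."""
    return 0.5 * TRAINED[0](cfg) + 0.3 * TRAINED[1](cfg) + 0.2 * TRAINED[2](cfg)

# Invented design space of 128 points; only 32 of them are "simulated".
space = [(w, r) for w in range(1, 9) for r in range(16, 257, 16)]
sampled = random.Random(0).sample(space, 32)
weights = fit_weights(TRAINED, sampled, [simulate_new_program(c) for c in sampled])
```

Once `weights` is fitted, `predict(TRAINED, weights, cfg)` covers every configuration in `space` without further simulation, which is the property the abstract emphasises.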
Efficiency Trends and Limits from Comprehensive Microarchitectural Adaptivity
- ASPLOS'08, 2008
Cited by 7 (3 self)
Increasing demand for power-efficient, high-performance computing requires tuning applications and/or the underlying hardware to improve the mapping between workload heterogeneity and computational resources. To assess the potential benefits of hardware tuning, we propose a framework that leverages synergistic interactions between recent advances in (a) sampling, (b) predictive modeling, and (c) optimization heuristics. This framework enables qualitatively new capabilities in analyzing the performance and power characteristics of adaptive microarchitectures. For the first time, we are able to simultaneously consider high temporal and comprehensive spatial adaptivity. In particular, we optimize efficiency for many short adaptive intervals and identify the best configuration of 15 parameters, which define a space of 240B points. With frequent sub-application reconfiguration and a fully reconfigurable hardware substrate, adaptive microarchitectures achieve bips^3/w efficiency gains of up to 5.3x (median 2.4x) relative to their static counterparts already optimized for a given application. This 5.3x efficiency gain is derived from a 1.6x performance gain and 0.8x power reduction. Although several applications achieve a significant fraction of their potential efficiency with as few as three adaptive parameters, the three most significant parameters differ across applications. These differences motivate a hardware substrate capable of comprehensive adaptivity to meet these diverse application requirements.
Roughness of Microarchitectural Design Topologies and its Implications for Optimization
Cited by 6 (3 self)
Recent advances in statistical inference and machine learning close the divide between simulation and classical optimization, thereby enabling more rigorous and robust microarchitectural studies. To most effectively utilize these now computationally tractable techniques, we characterize design topology roughness and leverage this characterization to guide our usage of analysis and optimization methods. In particular, we compute roughness metrics that require high-order derivatives and multi-dimensional integrals of design metrics, such as performance and power. These roughness metrics exhibit noteworthy correlations (1) against regression model error, (2) against non-linearities and non-monotonicities of contour maps, and (3) against the effectiveness of optimization heuristics such as gradient ascent. Thus, this work quantifies the implications of design topology roughness for commonly used methods and practices in microarchitectural analysis.
Accurate Memory Data Flow Modeling in Statistical Simulation
- Proceedings of the International Conference on Supercomputing, 2006
Cited by 3 (3 self)
Microprocessor design is both complex and time-consuming: exploring a huge design space to identify the optimal design under a number of constraints is infeasible using detailed architectural simulation of entire benchmark executions. Statistical simulation is a recently introduced approach for efficiently culling the microprocessor design space. The basic idea of statistical simulation is to collect a number of important program characteristics and to generate a synthetic trace from them. Simulating this synthetic trace is extremely fast as it contains only a million instructions. This paper improves the statistical simulation methodology by proposing accurate memory data flow models. We propose 1) cache miss correlation, or measuring cache statistics conditionally dependent on the global cache hit/miss history, for modeling cache miss patterns and memory-level parallelism, 2) cache line reuse distributions for modeling accesses to outstanding cache lines, and 3) through-memory read-after-write dependency distributions for modeling load forwarding and bypassing. Our experiments using the SPEC CPU2000 benchmarks show substantial improvements compared to current state-of-the-art statistical simulation methods. For example, for our baseline configuration, we reduce the average instructions per cycle (IPC) prediction error from 10.9 to 2.1 percent; the maximum error observed equals 5.8 percent. In addition, we show that performance trends are predicted very accurately, making statistical simulation enhanced with accurate data flow models a useful tool for efficient and accurate microprocessor design space explorations. Index Terms: Performance of systems, modeling techniques, simulation.
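The cache miss correlation idea in this abstract, miss statistics conditioned on recent hit/miss history, can be illustrated with a small Python sketch. This is a simplified illustration under assumptions, not the paper's actual model: the history length, the function names, and the toy access pattern are all invented here.

```python
import random
from collections import defaultdict, deque

def profile_miss_correlation(outcomes, history_len=2):
    """Estimate P(miss | previous `history_len` outcomes); True means a miss."""
    counts = defaultdict(lambda: [0, 0])  # history -> [accesses, misses]
    history = deque(maxlen=history_len)
    for miss in outcomes:
        if len(history) == history_len:
            entry = counts[tuple(history)]
            entry[0] += 1
            entry[1] += int(miss)
        history.append(miss)
    return {h: misses / total for h, (total, misses) in counts.items()}

def synthesize(miss_prob, length, history_len=2, default=0.0, seed=0):
    """Generate a synthetic hit/miss sequence from the conditional statistics."""
    rng = random.Random(seed)
    history = deque([False] * history_len, maxlen=history_len)  # start from all hits
    trace = []
    for _ in range(length):
        p = miss_prob.get(tuple(history), default)
        miss = rng.random() < p
        trace.append(miss)
        history.append(miss)
    return trace

# Toy bursty pattern: two misses followed by two hits, repeated.
observed = [True, True, False, False] * 50
probs = profile_miss_correlation(observed)
synthetic = synthesize(probs, 8)
```

A single unconditional miss rate for `observed` would be 0.5 and would lose the burst structure, whereas the history-conditioned table reproduces the two-miss, two-hit pattern in `synthetic`.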
Distilling the Essence of Proprietary Workloads into Miniature Benchmarks
Cited by 2 (1 self)
Benchmarks set standards for innovation in computer architecture research and industry product development. Consequently, it is of paramount importance that the workload used in computer architecture research and development is representative of real-world applications. However, composing such representative workloads poses practical challenges to application analysis teams and benchmark developers – (1) real-world workloads are intellectual property and vendors hesitate to share these proprietary applications; and (2) porting and reducing these applications to benchmarks that can be simulated in a tractable amount of time is a non-trivial task. In this paper we address this problem by proposing a technique that automatically distills key inherent performance attributes of a proprietary workload and captures them into a miniature synthetic benchmark clone. The advantage of the benchmark clone is that it hides the functional meaning of the code but exhibits similar performance characteristics as the target application. Moreover, the dynamic instruction count of the synthetic benchmark clone is substantially shorter than the proprietary application, greatly reducing overall simulation time – for SPEC CPU, the simulation time reduction is over five orders of magnitude compared to entire benchmark execution. By using a set of benchmarks representative of general-purpose, scientific, and embedded applications, we demonstrate that the power and performance characteristics of the synthetic benchmark clone correlate well with
A Tutorial in Spatial Sampling and Regression Strategies for Microarchitectural Analysis
Cited by 1 (1 self)
We present a new simulation paradigm for microarchitectural design evaluation and optimization. This paradigm counters increasing simulation costs attributed to the exponentially increasing size of design spaces and the need for more thorough, comprehensive studies when evaluating increasingly diverse design options. We present a tutorial for (1) obtaining a more comprehensive understanding of the design space by (2) selectively simulating a modest number of designs from that space and then (3) more effectively leveraging that simulation data using techniques in statistical inference. We survey techniques in spatial sampling to obtain designs for simulation. We also detail the statistical techniques required to derive efficient and robust models, interleaving code segments from scripts performing these analyses. The predictive ability and computational efficiency of these regression models enable new capabilities in microarchitectural design space studies. Collectively, our experiences with this paradigm suggest significant potential for accurate, efficient statistical inference in the microarchitectural domain.
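The sampling step in this abstract, choosing a modest number of designs to simulate, might look like the following Python sketch. It assumes uniform random sampling, which is only one of the spatial sampling techniques a tutorial could survey; the parameter names and ranges are hypothetical.

```python
import random

# Hypothetical parameter ranges, invented for illustration only.
DESIGN_SPACE = {
    "pipeline_depth": [12, 15, 18, 21, 24, 27, 30],
    "issue_width": [2, 4, 8],
    "l1d_kb": [8, 16, 32, 64, 128],
    "l2_kb": [256, 512, 1024, 2048, 4096],
}

def space_size(space):
    """Number of points in the Cartesian product of all parameter ranges."""
    size = 1
    for values in space.values():
        size *= len(values)
    return size

def sample_designs(space, n, seed=0):
    """Draw n distinct designs uniformly at random without enumerating the space."""
    if n > space_size(space):
        raise ValueError("cannot draw more distinct designs than the space holds")
    rng = random.Random(seed)
    chosen = set()
    while len(chosen) < n:
        # An independent uniform choice per parameter is uniform over the product.
        chosen.add(tuple(rng.choice(values) for values in space.values()))
    names = list(space)
    return [dict(zip(names, design)) for design in sorted(chosen)]

designs = sample_designs(DESIGN_SPACE, 30)
```

Each sampled design would then be simulated, and the resulting (design, metric) pairs used to fit a regression model.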
Applied Inference: Case Studies in Microarchitectural Design
We propose and apply a new simulation paradigm for microarchitectural design evaluation and optimization. This paradigm enables more comprehensive design studies by combining spatial sampling and statistical inference. Specifically, this paradigm (1) defines a large, comprehensive design space, (2) samples points from the space for simulation, and (3) constructs regression models based on sparse simulations. This approach greatly improves the computational efficiency of microarchitectural simulation and enables new capabilities in design space exploration. We illustrate new capabilities in three case studies for a large design space of approximately 260,000 points: (1) Pareto frontier, (2) pipeline depth, and (3) multiprocessor heterogeneity analyses. In particular, regression models are exhaustively evaluated to identify Pareto optimal designs that maximize performance for given power budgets. These models enable pipeline depth studies in which all parameters vary simultaneously with depth, thereby more effectively revealing interactions with non-depth parameters. Heterogeneity analysis combines regression-based optimization with clustering heuristics to identify efficient design compromises between similar optimal architectures. These compromises are potential core designs in a heterogeneous multicore architecture. Increasing heterogeneity can improve bips^3/w efficiency by as much as 2.4x, a theoretical upper bound on heterogeneity benefits that neglects contention between shared resources as well as design complexity. Collectively these studies demonstrate regression models' ability to expose trends and identify optima in diverse design regions, motivating the application of such models in statistical inference for more effective use of modern simulator infrastructure.
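The Pareto frontier case study in this abstract, exhaustively evaluating a cheap model and keeping the designs that maximize performance for a given power budget, can be sketched in Python as below. The `predict` function is a hypothetical stand-in for a fitted regression model with invented coefficients, and the 54-point space is far smaller than the one the abstract describes.

```python
from itertools import product

def predict(depth, width, l2_kb):
    """Hypothetical stand-in for fitted regression models; returns (bips, watts)."""
    bips = 0.5 + 0.08 * depth + 0.30 * width + 0.0004 * l2_kb - 0.003 * depth * depth
    watts = 5.0 + 0.9 * depth + 4.0 * width + 0.004 * l2_kb
    return bips, watts

def pareto_frontier(points):
    """Keep designs that no other design beats on both performance and power."""
    # Sweep by ascending power; a point survives only if it raises the best bips so far.
    ordered = sorted(points, key=lambda p: (p[1][1], -p[1][0]))
    frontier = []
    best_bips = float("-inf")
    for design, (bips, watts) in ordered:
        if bips > best_bips:
            frontier.append((design, (bips, watts)))
            best_bips = bips
    return frontier

def best_under_budget(frontier, watt_budget):
    """Highest-performance frontier design within the power budget, or None."""
    feasible = [p for p in frontier if p[1][1] <= watt_budget]
    return feasible[-1] if feasible else None

# Invented design space: (pipeline depth, issue width, L2 size in KB).
space = list(product(range(10, 31, 4), (2, 4, 8), (256, 1024, 4096)))
evaluated = [(design, predict(*design)) for design in space]
frontier = pareto_frontier(evaluated)
```

Because model evaluation is cheap compared with simulation, the exhaustive sweep over `space` stays practical even when the space is much larger than this toy example.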
Rapid Early-Stage Microarchitecture Design Using Predictive Models
- Christophe Dubach, Timothy M. Jones, Michael F. P. O'Boyle
... involves the evaluation of a wide range of benchmarks across a large number of architectural configurations. Several methods are used to cut down on the required simulation time. Typically, however, existing approaches fail to capture true program behaviour accurately and require a non-negligible number of training simulations to be run. We address these problems by developing a machine learning model that predicts the mean of any given metric, e.g. cycles or energy, across a range of programs, for any microarchitectural configuration. It works by combining only the most representative programs from the benchmark suite based on their behaviour in the design space under consideration. We use our model to predict the mean performance, energy, energy-delay (ED) and energy-delay-squared (EDD) of the SPEC CPU 2000 and MiBench benchmark suites within our design space. We achieve the same level of accuracy as two state-of-the-art prediction techniques but require five times fewer training simulations. Furthermore, our technique is scalable and we show that, asymptotically, it requires an order of magnitude fewer simulations than these existing approaches.
Capability Evaluation of Embedded Systems, Cell Phone Case Study
This paper proposes a generic technique for evaluating and comparing the capability of Embedded Systems (ESs) as a decision-making aid. The main task in this area is to compare and evaluate the success factors and risks associated with each ES. In this regard, decision-making modelling concepts are based on identifying capability factors and finding mathematical models to describe or prescribe the best choice. The techniques combine subjective and qualitative assumptions with mathematical modelling techniques. The digital cell phone, as a sample ES, is analyzed as a case study to show the application of the proposed approach. The results show the high performance of this methodology for capability evaluation of such systems.

