The instruction-set extension problem: A survey
- Reconfigurable Computing: Architectures, Tools and Applications, volume 4943 of Lecture Notes in Computer Science, 2008
Cited by 14 (1 self)
Satisfying real-time constraints with custom instructions
- In ACM International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS), 2005
Cited by 9 (2 self)
Instruction-set extensible processors allow an existing processor core to be extended with application-specific custom instructions. In this paper, we explore a novel application of instruction-set extensions to meet timing constraints in real-time embedded systems. In order to satisfy real-time constraints, the worst-case execution time (WCET) of a task should be reduced as opposed to its average-case execution time. Unfortunately, existing custom instruction selection techniques based on average-case profile information may not reduce a task’s WCET. We first develop an Integer Linear Programming (ILP) formulation to choose optimal instruction-set extensions for reducing the WCET. However, ILP solutions for this problem are often too expensive to compute. Therefore, we also propose an efficient and scalable heuristic that obtains results quite close to the optimal. Experimental results indicate that a suitable choice of custom instructions can reduce the WCET of our benchmark programs by as much as 42% (23.5% on average).
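The gap between average-case and worst-case selection can be illustrated with a toy example. The sketch below is not the paper's ILP formulation or its heuristic: it brute-forces a tiny, entirely hypothetical task (the block names, cycle counts, probabilities, candidate instructions and area budget are all made up for illustration) and compares the choice that minimizes average-case time with the choice that minimizes WCET.

```python
from itertools import combinations

# Hypothetical task: an if/else with a frequent short path and a rare long path.
BLOCK_CYCLES = {"entry": 10, "hot": 40, "cold": 90, "exit": 10}
PATHS = {
    "common": (["entry", "hot", "exit"], 0.9),  # (basic blocks, probability)
    "rare": (["entry", "cold", "exit"], 0.1),
}
# Hypothetical candidates: name -> (block it accelerates, cycles saved, area cost).
CANDIDATES = {
    "ci_hot": ("hot", 20, 1),
    "ci_cold": ("cold", 30, 1),
}
AREA_BUDGET = 1  # room for only one custom instruction


def path_cycles(blocks, chosen):
    """Cycles along one path once the chosen custom instructions are applied."""
    saved = {}
    for name in chosen:
        block, gain, _area = CANDIDATES[name]
        saved[block] = saved.get(block, 0) + gain
    return sum(BLOCK_CYCLES[b] - saved.get(b, 0) for b in blocks)


def wcet(chosen):
    """Worst case: the longest path, regardless of how rarely it runs."""
    return max(path_cycles(blocks, chosen) for blocks, _p in PATHS.values())


def average(chosen):
    """Average case: path lengths weighted by profile probabilities."""
    return sum(p * path_cycles(blocks, chosen) for blocks, p in PATHS.values())


def feasible_subsets():
    """Enumerate every candidate subset that fits in the area budget."""
    names = list(CANDIDATES)
    for r in range(len(names) + 1):
        for subset in combinations(names, r):
            if sum(CANDIDATES[n][2] for n in subset) <= AREA_BUDGET:
                yield subset


def best(metric):
    """Exhaustive search; only viable for toy instances like this one."""
    return min(feasible_subsets(), key=metric)


if __name__ == "__main__":
    avg_pick, wcet_pick = best(average), best(wcet)
    print("average-case pick:", avg_pick, "-> WCET", wcet(avg_pick))
    print("WCET-driven pick: ", wcet_pick, "-> WCET", wcet(wcet_pick))
```

In this made-up instance the profile-driven choice accelerates the frequent path and leaves the WCET untouched, while the WCET-driven choice accelerates the rare long path. Exhaustive enumeration grows exponentially with the number of candidates, which is consistent with the abstract's motivation for a scalable heuristic.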
An Efficient Framework for Dynamic Reconfiguration of Instruction-Set Customization
- 2007
Cited by 7 (3 self)
We present an efficient framework for dynamic reconfiguration of application-specific instruction-set customization. A key component of this framework is an iterative algorithm for temporal and spatial partitioning of the loop kernels. Our algorithm maximizes the performance gain of an application while taking into consideration the dynamic reconfiguration cost. It selects the appropriate custom instruction-sets for the loops and maps them into appropriate configurations. We model the temporal partitioning problem as a k-way graph partitioning problem. A dynamic-programming-based solution is used for the spatial partitioning. Comprehensive experimental results indicate that our iterative algorithm is highly scalable while producing optimal or near-optimal (99% of the optimal) performance gain.
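The abstract does not spell out the dynamic program used for spatial partitioning. As a rough illustration only, the sketch below treats a single configuration as a multiple-choice knapsack: each loop offers several hypothetical custom-instruction versions, each with an area cost and a performance gain, and at most one version per loop is picked within an area budget. The function name, the data layout and the numbers are assumptions, and the sketch ignores reconfiguration cost and temporal partitioning entirely.

```python
def select_versions(loops, area_budget):
    """Pick at most one (area, gain) version per loop to maximize total gain.

    loops: list of lists of (area, gain) tuples, one inner list per loop.
    Returns (best_gain, choices), where choices[i] is the index of the
    version picked for loop i, or None if the loop stays in software.
    """
    # best[a] = (gain, choices) achievable with at most `a` area units
    # for the loops processed so far.
    best = [(0, []) for _ in range(area_budget + 1)]
    for options in loops:
        new_best = []
        for a in range(area_budget + 1):
            # Default: leave this loop unaccelerated.
            cand_gain, cand_choice = best[a][0], best[a][1] + [None]
            for idx, (area, gain) in enumerate(options):
                if area <= a:
                    total = best[a - area][0] + gain
                    if total > cand_gain:
                        cand_gain = total
                        cand_choice = best[a - area][1] + [idx]
            new_best.append((cand_gain, cand_choice))
        best = new_best
    return best[area_budget]


if __name__ == "__main__":
    # Hypothetical versions per loop: (area units, cycles saved).
    loops = [
        [(2, 10), (4, 18)],  # loop 0: small and large version
        [(3, 12)],           # loop 1: a single version
        [(1, 3), (2, 8)],    # loop 2: small and large version
    ]
    gain, choices = select_versions(loops, area_budget=6)
    print(gain, choices)
```

The table has one entry per area unit and is rebuilt once per loop, so the running time is proportional to the area budget times the total number of versions.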
Hardware/software partitioning for custom instruction processors
- 2007
Cited by 2 (1 self)
Technological Research Council of Turkey (TUBITAK) under National Ph.D. Scholarship Program. I would like to thank my thesis advisors Prof. Günhan Dündar and Assoc. Prof. Can Özturan for their invaluable guidance and support throughout the development of this thesis. In particular, I am grateful to Assoc. Prof. Can Özturan for encouraging me to do graduate studies since my undergraduate years. His love in mathematics and theoretical computer science has been a constant source of inspiration for me. I am indebted to Prof. Günhan Dündar for dedicating some of his valuable time for me weekly. His trust in me and his continuous help with all kinds of difficulties I faced during my Ph.D. years helped me keep my motivation always high and made this thesis possible. Additionally, I would like to thank Assoc. Prof. Oskar Mencer and Prof. Wayne Luk for giving me the opportunity to work with the Custom Computing and Computer Architecture groups at Imperial College London in the last two years. The prolific environment and the insightful discussions have greatly contributed to the quality of this thesis. I also would like to thank Prof. Cem Ersoy and Prof.
Design space exploration of instruction set customizable MPSoCs for multimedia applications
- In International Conference on Embedded Computer Systems (SAMOS), 2010
Cited by 1 (0 self)
Multiprocessor System-on-Chips or MPSoCs in the embedded systems domain are increasingly employing multiple customizable processor cores. Such cores offer higher performance through application-specific instruction-set extensions without sacrificing the flexibility of software solutions. Existing techniques for generating appropriate custom instructions for an application domain are primarily restricted to specializing a single processor with the objective of maximizing performance. In a customizable MPSoC, in contrast, the different processor cores have to be customized in a synergistic fashion to create a heterogeneous MPSoC solution that best suits the application. Moreover, such a platform presents conflicting design tradeoffs between system throughput and on-chip memory/logic capacity. In this paper, we propose a framework to systematically explore the complex design space of customizable MPSoC platforms. In particular, we focus on multimedia streaming applications, as this class of applications constitutes a primary target of MPSoC platforms. We capture the high variability in execution times and the bursty nature of streaming applications through appropriate mathematical models. Thus, our framework can efficiently and accurately evaluate the different customization choices without resorting to expensive system-level simulations. We perform a detailed case study of an MPEG encoder application with our framework. It reveals design points with interesting tradeoffs between silicon area requirement for the custom instructions and the on-chip storage for partially-processed video data, while ensuring that all the design points strictly satisfy required QoS guarantees.
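The tradeoff between custom-instruction area and on-chip storage mentioned above can be pictured as a Pareto front over candidate design points. The sketch below is a generic dominance filter, not the paper's framework or its analytical models; the design points are made-up numbers, and every point is assumed to have already passed the QoS check.

```python
def pareto_front(points):
    """Return the non-dominated (area, buffer) points; lower is better on both axes."""
    front = []
    for p in points:
        dominated = any(
            q != p and q[0] <= p[0] and q[1] <= p[1] for q in points
        )
        if not dominated:
            front.append(p)
    return sorted(front)


if __name__ == "__main__":
    # Hypothetical design points: (custom-instruction area, on-chip buffer size).
    design_points = [(10, 80), (20, 50), (25, 55), (40, 30), (40, 45), (60, 30)]
    for area, buffer_size in pareto_front(design_points):
        print(f"area={area} buffer={buffer_size}")
```

Each surviving point trades more silicon area for less buffering or the reverse; points such as (25, 55) drop out because another point is at least as good on both axes.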
Looking for Instruction Patterns in the Design of Extensible Processors
Cited by 1 (0 self)
In recent years, several approaches have been proposed to improve embedded systems performance by extending base processors to fit the performance demands of specific applications. Although the majority of the contributions focus especially on the architectural challenges, which range from completely reconfigurable hardware to custom ASICs, every work faces a common issue: the need to extract, analyze and transform code patterns from the application source in order to implement them in special hardware units. In this paper we present Pattlib, a library of C functions and a file format specifically designed to manipulate and store instruction patterns, binding software representations to hardware descriptions. As an intermediate pattern representation, Pattlib serves as a common denominator among a compiler, hardware generation tools and pattern manipulation tools, allowing for the highest modularization of the design flow of extensible processors. The paper also presents the results of our investigation of code patterns that are suitable for becoming new instructions, extracted from the Mediabench and MiBench benchmarks. Our approach was able to find patterns that occur in up to 15 applications, exposing code regularity.
M BOURENNANE El-bey
I would like to extend my deepest gratitude and appreciation to my advisors Alain Merigot, Omar Hammami and Lionel Lacassagne at IEF and ENSTA. Guidance and instruction of Mr Omar Hammami has played an invaluable part in both my graduate studies and PhD Work. I will specially like to thank Mr. Alain Sibille for his efforts of creating and maintaining an excellent research environment at LEI, ENSTA. It has been a pleasure to work with my colleagues at IEF and ENSTA Paris. They have provided a friendly, encouraging, and supportive environment for me to work in. I will specially like to thank Asad Mahmood, Taj Muhammad Khan and Husnain Mansoor Ali for being there with me to help me at all difficult moments and sharing their experiences in all ups and downs during these three and a half years. I am very thankful to my reviewers Habibullah Jamal and El-Bay Bourennane for helping me improve my manuscript and for providing me valuable feedback on my research work. I am honoured to have Eric Martin and Bertrand Granado on my PhD jury and I am thankful to them for accepting to be a part of it. Finally I would like to recognize the best family anyone could ever ask for, especially
Processor Evaluation Cube: A classification and survey of processor evaluation techniques
, 1404
Selecting appropriate hardware resources corresponding to the application is an important task for the design of an embedded system or a SoC. A large number of techniques have been proposed in the literature to select a processor matching the application requirements. In this report, we propose a framework called Processor Evaluation Cube (PEC) which helps in systematic classification and comparison of various processor evaluation techniques. The three axes of PEC are: Analysis, Architecture and Abstraction. The Analysis axis distinguishes methods employing static analysis or simulation; the Architecture axis distinguishes methods evaluating single processor or multiprocessor computing platforms; the Abstraction axis distinguishes methods employing clock true evaluation or higher level execution time estimation techniques. Our survey not only puts the existing techniques in proper perspective but also points to the weaknesses in the existing techniques which need to be removed if these techniques have to be
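The three axes described in this abstract map naturally onto a small data structure. The sketch below is one possible encoding, not something taken from the report: the class names, the `group_by_cell` helper and the technique names in the usage example are hypothetical, while the axis values follow the wording of the abstract.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import product


class Analysis(Enum):
    STATIC = "static analysis"
    SIMULATION = "simulation"


class Architecture(Enum):
    SINGLE = "single processor"
    MULTI = "multiprocessor"


class Abstraction(Enum):
    CLOCK_TRUE = "clock true evaluation"
    HIGH_LEVEL = "higher level execution time estimation"


@dataclass(frozen=True)
class Technique:
    """One processor evaluation technique placed in the cube."""
    name: str
    analysis: Analysis
    architecture: Architecture
    abstraction: Abstraction

    def cell(self):
        return (self.analysis, self.architecture, self.abstraction)


def all_cells():
    """Every corner of the cube: 2 x 2 x 2 = 8 combinations."""
    return list(product(Analysis, Architecture, Abstraction))


def group_by_cell(techniques):
    """Map each occupied cell to the names of the techniques that fall in it."""
    cells = {}
    for t in techniques:
        cells.setdefault(t.cell(), []).append(t.name)
    return cells


if __name__ == "__main__":
    # Placeholder names; a real survey would list published techniques here.
    surveyed = [
        Technique("technique_a", Analysis.STATIC, Architecture.SINGLE, Abstraction.HIGH_LEVEL),
        Technique("technique_b", Analysis.SIMULATION, Architecture.MULTI, Abstraction.CLOCK_TRUE),
        Technique("technique_c", Analysis.STATIC, Architecture.SINGLE, Abstraction.HIGH_LEVEL),
    ]
    occupied = group_by_cell(surveyed)
    empty = [c for c in all_cells() if c not in occupied]
    print(len(occupied), "occupied cells,", len(empty), "empty cells")
```

Listing the empty cells is one simple way to spot combinations of analysis method, architecture and abstraction level that a set of techniques does not cover.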
Affiliations
A high-performance data-path to implement DSP kernels is introduced in this paper. The data-path is realized by a Flexible Computational Component (FCC), a purely combinational circuit that can implement any 2x2 template (cluster) of primitive resources. Thus, the data-path’s performance benefits from the intra-component chaining of operations. Due to the flexible structure of the FCC, the data-path is implemented by a small number of such components. This allows for direct connections among FCCs and for exploiting inter-component chaining, which further improves performance. Due to the universality and flexibility of the FCC, simple and efficient algorithms perform scheduling and binding of the Data Flow Graph. DSP benchmarks synthesized with the FCC data-path method show significant performance improvements when compared with template-based data-path designs. Detailed results on execution time, FCC utilization, and area are presented.