Real-time scheduling on multicore platforms
Proc. of the 12th IEEE Real-Time and Embedded Technology and Applications Symp., 2006
Cited by 17 (4 self)
Multicore architectures, which have multiple processing units on a single chip, are widely viewed as a way to achieve higher processor performance, given that thermal and power problems impose limits on the performance of single-core designs. Accordingly, several chip manufacturers have already released, or will soon release, chips with dual cores, and it is predicted that chips with up to 32 cores will be available within a decade. To effectively use the available processing resources on multicore platforms, software designs should avoid co-executing applications or threads that can worsen the performance of shared caches, if not thrash them. While cache-aware scheduling techniques for such platforms have been proposed for throughput-oriented applications, to the best of our knowledge, no such work has targeted real-time applications. In this paper, we propose and evaluate a cache-aware Pfair-based scheduling scheme for real-time tasks on multicore platforms.
Multiprocessor scheduling in processor-based router platforms: Issues and ideas
In Proceedings of the 2nd Workshop on Network Processors, 2003
Cited by 16 (8 self)
Two important trends are expected to guide the design of next-generation networks. First, with the commercialization of the Internet, providers will use value-added services to differentiate their service offerings from other providers; such services require the use of sophisticated resource scheduling mechanisms in routers. Second, to enable extensibility and the deployment of new services in a rapid and cost-effective manner, routers will be instantiated using programmable network processors. In this research, our goal is to develop sophisticated multiprocessor scheduling mechanisms that would enable networks that deploy such router platforms to provide service guarantees to applications. Existing multiprocessor scheduling techniques are either not applicable to router platforms due to their complexity or simplistic assumptions, or are not based on rigorous formalism, which is necessary to enable strong assertions about service guarantees. In this work, we propose to address these limitations. This paper presents our current ideas and planned future directions.
1 Introduction. Routers are the basic building blocks of wide-area networks such as the Internet. Conventionally, routers have been built using application-specific integrated circuits (ASICs) that enable high-speed packet switching. Unfortunately, ASIC designs take months to develop, and routers built using them are costly to deploy. In order to enable router extensibility in a rapid and cost-effective manner, significant effort is now being invested in a different approach: implementing routers on programmable network processors (NPs) [1, 2, 3, 34].
*Work supported by NSF grants CCR 9972211, CCR 9988327, ITR 0082866, and CCR 0204312.
Cache-Aware Real-Time Scheduling on Multicore Platforms: Heuristics and a Case Study
2008
Cited by 9 (2 self)
Multicore architectures, which have multiple processing units on a single chip, have been adopted by most chip manufacturers. Most such chips contain on-chip caches that are shared by some or all of the cores on the chip. To effectively use the available processing resources on such platforms, scheduling methods must be aware of these caches. In this paper, we explore various heuristics that attempt to improve cache performance when scheduling real-time workloads. Such heuristics are applicable when multiple multithreaded applications exist with large working sets. In addition, we present a case study that shows how our best-performing heuristics can improve the end-user performance of video encoding applications.
Parallel real-time task scheduling on multicore platforms
Proc. of the 27th IEEE Real-Time Systems Symp., 2006
Cited by 9 (4 self)
We propose a scheduling method for real-time systems implemented on multicore platforms that encourages individual threads of multithreaded real-time tasks to be scheduled together. When such threads are cooperative and share a common working set, this method enables more effective use of on-chip shared caches.
Virtual Private Machines: A Resource Abstraction
In University of Wisconsin - Madison, ECE TR, 2007
Cited by 7 (1 self)
Virtual Private Machines (VPM) are an abstraction for managing resource sharing in multi-core computer systems. A VPM consists of a complete set of resources, which includes both spatial (microarchitecture) and temporal (processor time slice) resources. Tasks assigned VPMs achieve a minimum level of performance regardless of other tasks in the system – that is, a VPM provides performance isolation. The VPM abstraction provides the interface between a system’s resource management policies and mechanisms. VPM policies, implemented primarily in software, translate system-level performance requirements into VPM assignments. Then VPM mechanisms, implemented in hardware, enforce the VPM assignments. To illustrate the potential of the VPM abstraction, we propose and implement a complete set of VPM policies and mechanisms. The policies translate applications' system-level Quality of Service requirements into VPMs and distribute unassigned and unused resources in order to optimize aggregate system-level performance. A simulation-based study shows that the proposed VPM policies and mechanisms, in combination, provide a high degree of QoS and can significantly improve aggregate performance.
Comparing the Energy Efficiency of CMP and SMT Architectures for Multimedia Workloads
In International Conference on Supercomputing, 2003
Cited by 6 (0 self)
Chip multiprocessing (CMP) and simultaneous multithreading (SMT) are two recently adopted techniques for improving the throughput of general-purpose processors by using multithreading. These techniques are likely to benefit the increasingly important real-time multimedia workloads, which are inherently multithreaded. These workloads, however, often run in an energy constrained environment. This paper compares the energy efficiency of CMP and SMT for multimedia applications. Assuming out-of-order processors as the core components, we investigate the design space by varying the core processor complexity and using a range of frequencies. To measure energy efficiency, we compare the energy consumed by systems that provide the same performance in the entire design space. We find that across the performance spectrum, a CMP configuration is the most energy efficient for our systems and applications. Further, CMP processors are amenable to further energy reductions through the use of recently proposed adaptive techniques. Finally, since SMT can provide benefits for single-thread performance, we propose a hybrid CMP/SMT architecture consisting of a CMP with SMT processor cores. This architecture shows significantly better energy-efficiency than pure SMT, and is a good compromise solution that achieves both the high single-thread performance of SMT and the energy efficiency of multithreaded CMP.
On the Design and Implementation of a Cache-Aware Multicore Real-Time Scheduler
2009
Cited by 6 (1 self)
Multicore architectures, which have multiple processing units on a single chip, have been adopted by most chip manufacturers. Most such chips contain on-chip caches that are shared by some or all of the cores on the chip. Prior work has presented methods for improving the performance of such caches when scheduling soft real-time workloads. Given these methods, two additional research issues arise: (1) how to automatically profile the cache behavior of real-time tasks within the scheduler; and (2) how to implement scheduling methods efficiently, so that scheduling overheads do not offset any cache-related performance gains. This paper addresses these two issues in an implementation of a cache-aware soft real-time scheduler within Linux, and shows that the use of this scheduler can result in performance improvements that directly result from a decrease in shared cache miss rates.
Safely Exploiting Multithreaded Processors to Tolerate Memory Latency in Real-Time Systems
In Proc. of the 2004 Int’l Conf. on Compilers, Architecture, and Synthesis for Embedded Systems, 2004
Cited by 5 (1 self)
A coarse-grain multithreaded processor can effectively hide long memory latencies by quickly switching to an alternate task when the active task issues a memory request, improving overall throughput. However, dynamic switching cannot be safely exploited to improve throughput in hard-real-time embedded systems. The schedulability of a task-set (guaranteeing all tasks meet deadlines) must be determined a priori using offline schedulability tests. Any computation/memory overlap must be statically accounted for. We develop a novel analytical framework that bounds the overlap between computation of a pipeline-resident-task and on-going memory transfers of other tasks. A simple closed-form schedulability test is derived, that only depends on the aggregate computation (C) and memory (M) components of tasks. Namely, the technique does not require
Virtual multiprocessor: an analyzable, high-performance architecture for real-time computing
In Proc. of the 2005 Int’l Conf. on Compilers, Architecture, and Synthesis for Embedded Systems, 2005
Cited by 5 (0 self)
The design of a real-time architecture is governed by a trade-off between analyzability necessary for real-time formalism and performance demanded by high-end embedded systems. We reconcile this trade-off with a novel Real-time Virtual Multiprocessor (RVMP). RVMP virtualizes a single in-order superscalar processor into multiple interference-free different-sized virtual processors. This provides a flexible spatial dimension. In the time dimension, the number and size of virtual processors can be rapidly reconfigured. A simple real-time scheduling approach concentrates scheduling within a small time interval, producing a simple repeating space/time schedule that orchestrates virtualization. RVMP successfully combines the analyzability (hence real-time formalism) of multiple processors with the flexibility (hence high performance) of simultaneous multithreading (SMT). Worst-case schedulability experiments show that more task-sets are provably schedulable on RVMP than on conventional rigid multiprocessors with equal aggregate resources, and the advantage only intensifies with more demanding task-sets. Run-time experiments show RVMP’s statically-controlled coarser-grain space/time configurability is as effective as unsafe SMT. Moreover, RVMP provides a real-time formalism that SMT does not currently provide.
Implementing Real-time Scheduling Within a Multithreaded Java Microcontroller
2002
Cited by 4 (3 self)
This paper presents the design, evaluation and hardware implementation of real-time scheduling schemes, which are embedded in a multithreaded Java microcontroller. We show the feasibility of a hardware real-time scheduler integrated deeply into the processor pipeline with a VHDL design and its synthesis. Evaluations with a software simulator and real-time applications as benchmarks show that hardware multithreading reaches a 1.2 to 1.6 performance increase for hard real-time applications (multithreading without latency utilization) and a 1.8 to 2.6 speedup by latency utilization for programs without hard real-time requirements. We also show that even for the complex scheduling algorithms EDF (Earliest Deadline First), LLF (Least Laxity First), and GP (Guaranteed Percentage) a scheduling decision is possible within one processor cycle of a 327 MHz, 325 MHz, resp. 274 MHz processor with four threads. With respect to real-time scheduling on a multithreaded microcontroller, the LLF scheme outperforms the FPP (Fixed Priority Preemptive), EDF, and GP schemes. However, only GP allows isolation of threads.

