Results 1 - 10
of
13
Bounding preemption delay within data cache reference patterns for real-time tasks
- In IEEE Real-Time Embedded Technology and Applications Symposium
, 2006
"... Caches have become invaluable for higher-end architectures to hide, in part, the increasing gap between processor speed and memory access times. While the effect of caches on timing predictability of single real-time tasks has been the focus of much research, bounding the overhead of cache warm-ups ..."
Abstract
-
Cited by 15 (6 self)
- Add to MetaCart
Caches have become invaluable for higher-end architectures to hide, in part, the increasing gap between processor speed and memory access times. While the effect of caches on timing predictability of single real-time tasks has been the focus of much research, bounding the overhead of cache warm-ups after preemptions remains a challenging problem, particularly for data caches. In this paper, we bound the penalty of cache interference for real-time tasks by providing accurate predictions of the data cache behavior across preemptions. For every task, we derive data cache reference patterns for all scalar and non-scalar references. Partial timing of a task is performed up to a preemption point using these patterns. The effects of cache interference are then analyzed using a settheoretic approach, which identifies the number and location of additional misses due to preemption. A feedback mechanism provides the means to interact with the timing analyzer, which subsequently times another interval of a task bounded by the next preemption. Our experimental results demonstrate that it is sufficient to consider the n most expensive preemption points, where n is the maximum possible number of preemptions. Further, it is shown that such accurate modeling of data cache behavior in preemptive systems significantly improves the WCET predictions for a task. To the best of our knowledge, our work of bounding preemption delay for data caches is unprecedented. 1.
Tightening the bounds on feasible preemption points
- In Proc. of the 27th IEEE International Real-Time Systems Symposium (RTSS
, 2006
"... Caches have become invaluable for higher-end architectures to hide, in part, the increasing gap between processor speed and memory access times. While the effect of caches on timing predictability of single real-time tasks has been the focus of much research, bounding the overhead of cache warm-ups ..."
Abstract
-
Cited by 12 (4 self)
- Add to MetaCart
Caches have become invaluable for higher-end architectures to hide, in part, the increasing gap between processor speed and memory access times. While the effect of caches on timing predictability of single real-time tasks has been the focus of much research, bounding the overhead of cache warm-ups after preemptions remains a challenging problem, particularly for data caches. This paper makes multiple contributions. First, we bound the penalty of cache interference for real-time tasks by providing accurate predictions of the data cache behavior across preemptions, including instruction cache and pipeline effects. For every task, we derive data cache reference patterns for all scalar and non-scalar references. We show that, when considering cache preemption, the critical instance does not occur upon simultaneous release of all tasks. Second, we develop analysis methods to calculate tight upper bounds on the number of possible preemption points for each job of a task and consider the worst-case placement of these preemption points. Partial timing of a job is performed up to a preemption point using the cache reference patterns. The effects of cache interference are then analyzed using a set-theoretic approach, which identifies the number and location of additional misses due to preemption. A feedback mechanism provides the means to interact with the timing analyzer, which subsequently times another interval of a job bounded by the next preemption. Significant improvements in tightening bounds of up to an order of magnitude over two prior methods and up to half a magnitude over a third prior method are obtained by experiments for (a) the number of preemptions, (b) the WCET and (c) the response time of a task. Overall, this work contributes (1) by formulating a new critical instance under cache preemption, (2) by proving a new analysis method to derive bounds on the number of preemptions and (3) by determining actual preemption points when calculating the preemption delay under consideration of data caches. 1.
Tightening the Bounds on Feasible Preemptions
"... Data Caches are an increasingly important architectural feature in most modern computer systems. They help bridge the gap between processor speeds and memory access times. One inherent difficulty of using data caches in a real-time system is the unpredictability of memory accesses, which makes it di ..."
Abstract
-
Cited by 5 (2 self)
- Add to MetaCart
Data Caches are an increasingly important architectural feature in most modern computer systems. They help bridge the gap between processor speeds and memory access times. One inherent difficulty of using data caches in a real-time system is the unpredictability of memory accesses, which makes it difficult to calculate worst-case execution times (WCETs) of real-time tasks. While cache analysis for single real-time tasks has been the focus of much research in the past, bounding the preemption delay in a multi-task preemptive environment is a challenging problem, particularly for data caches. This paper makes multiple contributions in the context of independent, periodic tasks with deadlines less than or equal to their periods executing on a single processor. 1) For every task, we derive data cache reference patterns for all scalar and non-scalar references. These patterns are used to derive an upper bound on the WCET of real-time tasks. 2) We show that, when considering cache preemption effects, the critical instant does not occur upon simultaneous release of all tasks. We provide results for task sets with phase differences to prove our claim. 3) We develop a method to calculate tight upper bounds on the maximum number of possible preemptions for each job of a task and, considering the worst-case placement of these preemption points, derive a much tighter bound on its WCET. We provide results using both static and dynamic priority schemes. Our results show significant improvements in the bounds derived. We achieve up to an order of magnitude improvement over two prior methods and up to half an order of magnitude over a third prior method for the number of preemptions, the WCET and the response time of a task. Consideration of the best-case and worst-case execution times of higher priority jobs enables these improvements.
Cache-aware timing analysis of streaming applications
- IN ECRTS ’07: PROCEEDINGS OF THE 19TH EUROMICRO CONFERENCE ON REAL-TIME SYSTEMS
, 2007
"... Of late, there has been a considerable interest in models, algorithms and methodologies specifically targeted towards designing hardware and software for streaming applications. Such applications process potentially infinite streams of audio/video data or network packets and are found in a wide rang ..."
Abstract
-
Cited by 3 (0 self)
- Add to MetaCart
Of late, there has been a considerable interest in models, algorithms and methodologies specifically targeted towards designing hardware and software for streaming applications. Such applications process potentially infinite streams of audio/video data or network packets and are found in a wide range of devices, starting from mobile phones to set-top boxes. Given a streaming application and an architecture, the timing analysis problem is to determine the timing properties of the processed data stream, given the timing properties of the input stream. This problem arises while determining many common performance metrics related to streaming applications and the mapping of such applications onto hardware architectures. Such metrics include the maximum delay experienced by any data item of the stream and the maximum backlog or the buffer requirement to store the incoming stream. Most of the previous work related to estimating or optimizing these metrics take a high-level view of the architecture and neglect micro-architectural features such as caches. In this paper, we show that an accurate estimation of these metrics, however, heavily relies on an appropriate modeling of the
Bounding Worst-Case Response Time for Tasks With Non-Preemptive Regions ∗
"... Real-time schedulability theory requires a priori knowledge of the worst-case execution time (WCET) of every task in the system. Fundamental to the calculation of WCET is a scheduling policy that determines priorities among tasks. Such policies can be non-preemptive or preemptive. While the former r ..."
Abstract
-
Cited by 3 (1 self)
- Add to MetaCart
Real-time schedulability theory requires a priori knowledge of the worst-case execution time (WCET) of every task in the system. Fundamental to the calculation of WCET is a scheduling policy that determines priorities among tasks. Such policies can be non-preemptive or preemptive. While the former reduces analysis complexity and overhead in implementation, the latter provides increased flexibility in terms of schedulability for higher utilizations of arbitrary task sets. In practice, tasks often have non-preemptive regions but are otherwise scheduled preemptively. To bound the WCET of tasks, architectural features have to be considered in the context of a scheduling scheme. In particular, preemption affects caches, which can be modeled by bounding the cache-related preemption delay (CRPD) of a task. In this paper, we propose a framework that provides safe and tight bounds of the data-cache related preemption delay (D-CRPD), the WCET and the worst-case response times, not just for homogeneous tasks under fully preemptive or fully non-preemptive systems, but for tasks with a non-preemptive region. By retaining the option of preemption where legal, task sets become schedulable that might otherwise not be. Yet, by requiring a region within a task to be non-preemptive, correctness is ensured in terms of arbitration of access to shared resources. Experimental results confirm an increase in schedulability of a task set with nonpreemptive regions over an equivalent task set where only those tasks with non-preemptive regions are scheduled nonpreemptively altogether. Quantitative results further indicate that D-CRPD bounds and response-time bounds comparable to task sets with fully non-preemptive tasks can be retained in the presence of short non-preemptive regions. To the best of our knowledge, this is the first framework that performs D-CRPD calculations in a system for tasks with a non-preemptive region. ∗ This work was supported in part by NSF grants CCR-0310860, CCR-
Timing Analysis of Concurrent Programs Running on Shared Cache Multi-Cores
"... Abstract—Memory accesses form an important source of timing unpredictability. Timing analysis of real-time embedded software thus requires bounding the time for memory accesses. Multiprocessing, a popular approach for performance enhancement, opens up the opportunity for concurrent execution. Howeve ..."
Abstract
-
Cited by 3 (0 self)
- Add to MetaCart
Abstract—Memory accesses form an important source of timing unpredictability. Timing analysis of real-time embedded software thus requires bounding the time for memory accesses. Multiprocessing, a popular approach for performance enhancement, opens up the opportunity for concurrent execution. However due to contention for any shared memory by different processing cores, memory access behavior becomes more unpredictable, and hence harder to analyze. In this paper, we develop a timing analysis method for concurrent software running on multi-cores with a shared instruction cache. Communication across tasks is by message passing where the message mailboxes are accessed via interrupt service routines. We do not handle data cache, shared memory synchronization and code sharing across tasks. Our method progressively improves the lifetime estimates of tasks that execute concurrently on multiple cores, in order to estimate potential conflicts in the shared cache. Possible conflicts arising from overlapping task lifetimes are accounted for in the hit-miss classification of accesses to the shared cache, to provide safe execution time bounds. We show that our method produces lower worst-case response time (WCRT) estimates than existing shared-cache analysis on a real-world embedded application. I.
Parametric Timing Analysis and Its Application to Dynamic Voltage Scaling
"... Embedded systems with real-time constraints depend on a-priori knowledge of worst-case execution times (WCETs) to determine if tasks meet deadlines. Static timing analysis derives bounds on WCETs but requires statically known loop bounds. This work removes the constraint on known loop bounds through ..."
Abstract
-
Cited by 2 (1 self)
- Add to MetaCart
Embedded systems with real-time constraints depend on a-priori knowledge of worst-case execution times (WCETs) to determine if tasks meet deadlines. Static timing analysis derives bounds on WCETs but requires statically known loop bounds. This work removes the constraint on known loop bounds through parametric analysis expressing WCETs as functions. Tighter WCETs are dynamically discovered to exploit slack by dynamic voltage scaling (DVS) saving 60%-82 % energy over DVS-oblivious techniques and showing savings close to more costly dynamic-priority DVS algorithms. Overall, parametric analysis expands the class of real-time applications to programs with loop-invariant dynamic loop bounds while retaining tight WCET bounds.
Resilience Analysis: Tightening the CRPD bound for set-associative caches
, 2010
"... In preemptive real-time systems, scheduling analyses need—in addition to the worst-case execution time—the context-switch cost. In case of preemption, the preempted and the preempting task may interfere on the cache memory. This interference leads to additional cache misses in the preempted task. Th ..."
Abstract
-
Cited by 2 (1 self)
- Add to MetaCart
In preemptive real-time systems, scheduling analyses need—in addition to the worst-case execution time—the context-switch cost. In case of preemption, the preempted and the preempting task may interfere on the cache memory. This interference leads to additional cache misses in the preempted task. The delay due to these cache misses is referred to as the cache-related preemption delay (CRPD), which constitutes the major part of the context-switch cost. In this paper, we present a new approach to compute tight bounds on the CRPD for LRU set-associative caches, based on analyses of both the preempted and the preempting task. Previous approaches analyzing both the preempted and the preempting task were either imprecise or unsound. As the basis of our approach we introduce the notion of resilience: The resilience of a memory block of the preempted task is the maximal number of memory accesses a preempting task could perform without causing an additional miss to this block. By computing lower bounds on the resilience of blocks and an upper bound on the number of accesses by a preempting task, one can guarantee that some blocks may not contribute to the CRPD. The CRPD analysis based on resilience considerably outperforms previous approaches.
Exploiting Hardware/Software Interactions for Analyzing Embedded Systems
"... Embedded systems are often subject to real-time timing constraints. Such systems require determinism to ensure that task deadlines are met. The knowledge of the bounds on worst-case execution times (WCET) of tasks is a critical piece of information required to achieve this objective. One limiting fa ..."
Abstract
-
Cited by 1 (0 self)
- Add to MetaCart
Embedded systems are often subject to real-time timing constraints. Such systems require determinism to ensure that task deadlines are met. The knowledge of the bounds on worst-case execution times (WCET) of tasks is a critical piece of information required to achieve this objective. One limiting factor in designing real-time systems is the class of processors that may be used. Contemporary processors with their advanced architectural features, such as out-of-order execution, branch prediction, speculation, and prefetching, cannot be statically analyzed to obtain WCETs for tasks as they introduce non-determinism into task execution, which can only be resolved at run-time. Such micro-processors are tuned to reduce average-case execution times at the expense of predictability. Hence, they do not find use in hard real-time systems. On the other hand, static timing analysis derives bounds on WCETs but requires that bounds on loop iterations be known statically, i.e., at compile time. This limits the class of applications that may be analyzed by static timing analysis and, hence, used in a real-time system. Finally, many embedded systems have communication and/or synchronization constructs and need to function on a wide spectrum of hardware devices ranging from small microcontrollers to modern multi-core architectures. Hence, any single analysis technique (be it static or dynamic) will not suffice in gauging the true nature of such

