Results 1 - 10 of 32
Compiler-Decided Dynamic Memory Allocation for Scratch-Pad Based Embedded Systems
, 2006
"... In this research we propose a highly predictable, low overhead and yet dynamic, memory allocation strategy for embedded systems with scratch-pad memory. A scratch-pad is a fast compiler-managed SRAM memory that replaces the hardware-managed cache. It is motivated by its better real-time guarantees v ..."
Cited by 73 (4 self)
In this research we propose a highly predictable, low-overhead and yet dynamic memory allocation strategy for embedded systems with scratch-pad memory. A scratch-pad is a fast compiler-managed SRAM memory that replaces the hardware-managed cache. It is motivated by its better real-time guarantees vs. cache and by its significantly lower overheads in energy consumption, area and overall runtime, even with a simple allocation scheme. Scratch-pad allocation methods are primarily of two types. First, software-caching schemes emulate the workings of a hardware cache in software. Instructions are inserted before each load/store to check the software-maintained cache tags. Such methods incur large overheads in runtime, code size, energy consumption and SRAM space for tags, and deliver poor real-time guarantees, just like hardware caches. A second category of algorithms partitions variables at compile-time between the two banks. However, a drawback of such static allocation schemes is that they do not account for dynamic program behavior. We propose a dynamic allocation methodology for global and stack data …
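The static compile-time partitioning that the abstract critiques can be sketched as a greedy accesses-per-byte split between scratch-pad and DRAM (a minimal sketch; the struct fields, names, and the greedy policy are illustrative assumptions, not the paper's algorithm):

```c
#include <stdlib.h>

/* Hypothetical variable descriptor: size in bytes plus a profiled access
   count. The fields are illustrative, not the paper's data structures. */
typedef struct { const char *name; int size; int accesses; int in_spm; } Var;

/* Sort by accesses per byte, descending (cross-multiplied to avoid floats). */
static int by_density(const void *a, const void *b) {
    const Var *x = (const Var *)a, *y = (const Var *)b;
    return y->accesses * x->size - x->accesses * y->size;
}

/* Greedy static partitioning: fill the scratch-pad with the densest
   variables; everything that does not fit stays in DRAM. */
int allocate_static(Var *vars, int n, int spm_bytes) {
    int used = 0;
    qsort(vars, n, sizeof(Var), by_density);
    for (int i = 0; i < n; i++) {
        if (used + vars[i].size <= spm_bytes) {
            vars[i].in_spm = 1;
            used += vars[i].size;
        }
    }
    return used;   /* scratch-pad bytes actually occupied */
}
```

Because the assignment is fixed at compile time, a variable hot in only one program phase occupies scratch-pad space for the whole run, which is exactly the weakness the proposed dynamic method addresses.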
WCET-centric software-controlled instruction caches for hard real-time systems
- In Euromicro Conference on Real-Time Systems (ECRTS)
, 2006
"... Cache memories have been extensively used to bridge the gap between high speed processors and relatively slower main memories. However, they are sources of pre-dictability problems because of their dynamic and adaptive behavior, and thus need special attention to be used in hard real-time systems. A ..."
Cited by 49 (8 self)
Cache memories have been extensively used to bridge the gap between high-speed processors and relatively slower main memories. However, they are sources of predictability problems because of their dynamic and adaptive behavior, and thus need special attention to be used in hard real-time systems. A lot of progress has been achieved in the last ten years in statically predicting worst-case execution times (WCETs) of tasks on architectures with caches. However, cache-aware WCET analysis techniques are not always applicable due to the lack of documentation in hardware manuals concerning the cache replacement policies. Moreover, they tend to be pessimistic with some cache replacement policies (e.g. random replacement) [6]. Lastly, caches are sources of timing anomalies in dynamically scheduled processors [13] (a cache miss may in some cases result in a shorter execution time than a hit). To reconcile performance and predictability of caches, we propose in this paper algorithms for software control of instruction caches. The proposed algorithms statically divide the code of tasks into regions, for which the cache contents are statically selected. At run-time, at every transition between regions, the cache contents computed off-line are loaded into the cache and the cache replacement policy is disabled (the cache is locked). Experimental results provided in the paper show that with an appropriate selection of regions and cache contents, the worst-case performance of applications with locked instruction caches is competitive with the worst-case performance of unlocked caches.
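The locked-cache idea lends itself to a small worst-case model: once a region's cache contents are fixed, every access either hits a locked line or misses, so the per-region miss count is exact rather than estimated. A minimal sketch with hypothetical names, not the paper's algorithms:

```c
#define CACHE_LINES 4

/* Per-region locked cache contents, selected off-line by the compiler. */
typedef struct {
    int locked[CACHE_LINES];  /* line addresses locked for this region */
    int n_locked;
} Region;

/* Runtime model: with replacement disabled, an access hits iff its line is
   in the region's locked set, so the worst-case miss count is exact. */
int count_misses(const Region *r, const int *accesses, int n) {
    int misses = 0;
    for (int i = 0; i < n; i++) {
        int hit = 0;
        for (int j = 0; j < r->n_locked; j++)
            if (r->locked[j] == accesses[i]) { hit = 1; break; }
        misses += !hit;
    }
    return misses;
}
```

This determinism is the point: an unlocked cache's miss count depends on history and replacement policy, while the locked model depends only on the statically chosen contents.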
Heap Data Allocation To Scratch-Pad Memory In Embedded Systems
, 2007
"... This thesis presents the first-ever compile-time method for allocating a portion of a program’s dynamic data to scratch-pad memory. A scratch-pad is a fast directly addressed compiler-managed SRAM memory that replaces the hardware-managed cache. It is motivated by its better real-time guarantees vs ..."
Cited by 43 (5 self)
This thesis presents the first-ever compile-time method for allocating a portion of a program’s dynamic data to scratch-pad memory. A scratch-pad is a fast, directly addressed, compiler-managed SRAM memory that replaces the hardware-managed cache. It is motivated by its better real-time guarantees vs. cache and by its significantly lower overheads in access time, energy consumption, area and overall runtime. Dynamic data refers to all objects allocated at run-time in a program, as opposed to static data objects, which are allocated at compile-time. Existing compiler methods for allocating data to scratch-pad are able to place only code, global and stack data (static data) in scratch-pad memory; heap and recursive-function objects (dynamic data) are allocated entirely in DRAM, resulting in poor performance for these dynamic data types. Runtime methods based on software caching can place data in scratch-pad, but because of their high overheads from software address translation, they have not been successful, especially for dynamic data. In this thesis we present a dynamic yet compiler-directed allocation method for dynamic data that for the first time (i) is able to place a portion of the dynamic …
WCET-Directed Dynamic Scratchpad Memory Allocation of Data
- In Proc. ECRTS
, 2007
"... Many embedded systems feature processors coupled with a small and fast scratchpad memory. To the difference with caches, allocation of data to scratchpad memory must be handled by software. The major gain is to enhance the pre-dictability of memory accesses latencies. A compile-time dy-namic allocat ..."
Cited by 26 (2 self)
Many embedded systems feature processors coupled with a small and fast scratchpad memory. Unlike with caches, allocation of data to scratchpad memory must be handled by software. The major gain is enhanced predictability of memory access latencies. A compile-time dynamic allocation approach enables eviction and placement of data in the scratchpad memory at runtime. Previous dynamic scratchpad memory allocation approaches aimed to reduce average-case program execution time or the energy consumption due to memory accesses. For real-time systems, worst-case execution time is the main metric to optimize. In this paper, we propose a WCET-directed algorithm to dynamically allocate static data and stack data of a program to scratchpad memory. The granularity of placement of memory transfers (e.g. at function or basic-block boundaries) is discussed from the perspective of its computational complexity and the quality of the allocation.
Scratchpad memory management for portable systems with a memory management unit
- In Conf. Embedded Software
, 2006
"... In this paper, we present a dynamic scratchpad memory allocation strategy targeting a horizontally partitioned memory subsystem for contemporary embedded processors. The memory subsystem is equipped with a memory management unit (MMU), and physically addressed scratchpad memory (SPM) is mapped into ..."
Cited by 18 (7 self)
In this paper, we present a dynamic scratchpad memory allocation strategy targeting a horizontally partitioned memory subsystem for contemporary embedded processors. The memory subsystem is equipped with a memory management unit (MMU), and physically addressed scratchpad memory (SPM) is mapped into the virtual address space. A small minicache is added to further reduce energy consumption and improve performance. Using the MMU’s page fault exception mechanism, we track page accesses and copy frequently executed code sections into the SPM before they are executed. Because the minimal transfer unit between the external memory and the SPM is a single memory page, good code placement is of great importance for the success of our method. Based on profiling information, our postpass optimizer divides the application binary into pageable, cacheable, and uncacheable regions. The latter two are placed at fixed locations in the external memory, and only pageable code is copied on demand to the SPM from the external memory. Pageable code is grouped into sections whose sizes are equal to the physical page size of the MMU. We discuss code grouping techniques and also analyze the effect of the minicache on execution time and energy consumption. We evaluate our SPM allocation strategy with twelve embedded applications, including MPEG-4. Compared to a fully-cached configuration, on average we achieve a 12% improvement in runtime performance and a 33% reduction in energy consumption by the memory system.
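The page-fault-driven copying described above can be modeled with a small resident-page table: touching a non-resident page triggers a copy from external memory into an SPM slot. Round-robin eviction and the field names below are simplifying assumptions, not the paper's policy:

```c
#define NPAGES 8
#define SPM_SLOTS 2

typedef struct {
    int slot_of[NPAGES];       /* SPM slot holding each page, -1 if absent */
    int slot_page[SPM_SLOTS];  /* page currently resident in each slot */
    int next_victim;           /* round-robin eviction: a simplification */
    int transfers;             /* pages copied in from external memory */
} Spmm;

void spmm_init(Spmm *m) {
    for (int i = 0; i < NPAGES; i++) m->slot_of[i] = -1;
    for (int i = 0; i < SPM_SLOTS; i++) m->slot_page[i] = -1;
    m->next_victim = 0;
    m->transfers = 0;
}

/* Model of the page-fault path: a fault on a non-resident page copies it
   into an SPM slot, unmapping the slot's previous tenant. */
void access_page(Spmm *m, int page) {
    if (m->slot_of[page] >= 0) return;      /* resident: no fault, no copy */
    int v = m->next_victim;
    m->next_victim = (v + 1) % SPM_SLOTS;
    if (m->slot_page[v] >= 0) m->slot_of[m->slot_page[v]] = -1;  /* unmap */
    m->slot_page[v] = page;
    m->slot_of[page] = v;
    m->transfers++;
}
```

Since each fault costs a full page copy, the transfer count is what good code grouping tries to minimize, which is why placement matters so much at page granularity.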
Dynamic Scratchpad Memory Management for Code in Portable Systems with an MMU
"... In this work, we present a dynamic memory allocation technique for a novel, horizontally partitioned memory subsystem targeting contemporary embedded processors with a memory management unit (MMU). We propose to replace the on-chip instruction cache with a scratchpad memory (SPM) and a small minicac ..."
Cited by 12 (2 self)
In this work, we present a dynamic memory allocation technique for a novel, horizontally partitioned memory subsystem targeting contemporary embedded processors with a memory management unit (MMU). We propose to replace the on-chip instruction cache with a scratchpad memory (SPM) and a small minicache. Serializing the address translation with the actual memory access enables the memory system to access either only the SPM or the minicache. Independent of the SPM size and based solely on profiling information, a postpass optimizer classifies the code of an application binary into a pageable and a cacheable code region. The latter is placed at a fixed location in the external memory and cached by the minicache. The former, the pageable code region, is copied on demand to the SPM before execution. Both the pageable code region and the SPM are logically divided into pages the size of an MMU memory page. Using the MMU’s page-fault exception mechanism, a runtime scratchpad memory manager (SPMM) tracks page accesses and copies frequently executed code pages to the SPM before they get executed. In order to minimize the number of page transfers from the external memory to the SPM, good code placement techniques become more important with increasing sizes of the MMU pages. We discuss code-grouping techniques and provide an analysis of the effect of the MMU’s page size on execution time, energy consumption, and external …
Implementing time-predictable load and store operations
- In Proc. Embedded Software (EMSOFT)
, 2009
"... Scratchpads have been widely proposed as an alternative to caches for embedded systems. Advantages of scratchpads in-clude reduced energy consumption in comparison to a cache and access latencies that are independent of the preceding memory access pattern. The latter property makes memory accesses t ..."
Cited by 11 (3 self)
Scratchpads have been widely proposed as an alternative to caches for embedded systems. Advantages of scratchpads include reduced energy consumption in comparison to a cache and access latencies that are independent of the preceding memory access pattern. The latter property makes memory accesses time-predictable, which is useful for hard real-time tasks, as the worst-case execution time (WCET) must be safely estimated in order to check that the system will meet timing requirements. However, data must be explicitly moved between scratchpad and external memory as a task executes in order to make the best use of the limited scratchpad space. When dynamic data is moved, issues such as pointer aliasing and pointer invalidation become problematic. Previous work has proposed solutions that are not suitable for hard real-time tasks because memory accesses are not time-predictable. This paper proposes the scratchpad memory management unit (SMMU) as an enhancement to scratchpad technology. The SMMU implements an alternative solution to the pointer aliasing and pointer invalidation problems which (1) does not require whole-program pointer analysis and (2) makes every memory access operation time-predictable. This allows WCET analysis to be applied to hard real-time tasks which use a scratchpad and dynamic data, but the results are also applicable in the wider context of minimizing energy consumption or average execution time. Experiments using C software show that the combination of an SMMU and scratchpad compares favorably with the best- and worst-case performance of a conventional data cache.
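One way to read the SMMU's role is as a constant-time range check on every load/store: pointers into a remapped object are redirected to its scratchpad copy, so existing pointers keep working after the object moves. A hedged sketch (the entry layout and function are hypothetical, not the paper's hardware interface):

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical SMMU entry: one remapped region (base/size) whose accesses
   are redirected into the scratchpad; everything else goes to external
   memory. One range check keeps every access time-predictable. */
typedef struct {
    uintptr_t ext_base;  /* external-memory address of the moved object */
    size_t    size;
    char     *spm_copy;  /* where the object currently lives in scratchpad */
} SmmuEntry;

/* Translate a program address: pointers into the remapped range resolve to
   the scratchpad copy; all other addresses resolve to external memory. */
char *smmu_translate(const SmmuEntry *e, uintptr_t addr, char *external_mem) {
    if (addr >= e->ext_base && addr < e->ext_base + e->size)
        return e->spm_copy + (addr - e->ext_base);
    return external_mem + addr;
}
```

Because the check takes the same time whether it hits or not, WCET analysis can bound every access exactly, unlike a software cache whose translation cost depends on tag-lookup outcomes.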
Dynamic code mapping for limited local memory systems
, 2010
"... Abstract—This paper presents heuristics for dynamic man-agement of application code on limited local memories present in high-performance multi-core processors. Previous techniques formulate the problem using call graphs, which do not capture the temporal ordering of functions. In addition, they onl ..."
Cited by 8 (5 self)
Abstract—This paper presents heuristics for dynamic management of application code on the limited local memories present in high-performance multi-core processors. Previous techniques formulate the problem using call graphs, which do not capture the temporal ordering of functions. In addition, they use only a conservative estimate of the interference cost between functions to obtain a mapping. As a result, previous techniques are unable to achieve efficient code mapping. The techniques proposed in this paper overcome both these limitations and achieve superior code mapping. Experimental results from executing benchmarks from MiBench on the Cell processor in the Sony PlayStation 3 demonstrate up to 29% and on average 12% performance improvement, at tolerable compile-time overhead.
Scratchpad Allocation for Data Aggregates in Superperfect Graphs
, 2007
"... Existing methods place data or code in scratchpad memory, i.e., SPM by either relying on heuristics or resorting to integer programming or mapping it to a graph coloring problem. In this work, the SPM allocation problem is formulated as an interval coloring problem. The key observation is that in ma ..."
Cited by 6 (5 self)
Existing methods place data or code in scratchpad memory (SPM) by either relying on heuristics, resorting to integer programming, or mapping it to a graph coloring problem. In this work, the SPM allocation problem is formulated as an interval coloring problem. The key observation is that in many embedded applications, arrays (including structs as a special case) are often related in the following way: for any two arrays, their live ranges are often such that one is either disjoint from or contains the other. As a result, array interference graphs are often superperfect graphs, and optimal interval colorings for such array interference graphs are possible. This has led to the development of two new SPM allocation algorithms. While differing in whether live-range splits and spills are done sequentially or together, both algorithms place arrays in SPM based on examining the cliques in an interference graph. In both cases, we guarantee that all arrays in an interference graph can be placed in SPM whenever its size is no smaller than the clique number of the graph. In the case that the SPM is not large enough, we rely on heuristics to split or spill a live range until the graph is colorable. Our experimental results using embedded benchmarks show that our algorithms can outperform graph coloring when their interference graphs are superperfect or nearly so, although graph coloring is admittedly more general and may also be effective for applications with arbitrary interference graphs.
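The clique-number condition has a concrete reading for interval graphs: the (weighted) clique number is the peak total size of simultaneously live arrays, which a simple sweep over live-range start points can compute. A sketch under the assumption of half-open [start, end) live ranges, not the paper's algorithm:

```c
/* Live range of an array: half-open [start, end), with its size in bytes. */
typedef struct { int start, end, bytes; } Range;

/* Weighted clique number of an interval graph: the peak total size of
   simultaneously live arrays. For intervals, the peak always occurs at
   some range's start point, so an O(n^2) sweep over starts suffices.
   If the SPM is at least this large, every array fits without spilling. */
int peak_live_bytes(const Range *r, int n) {
    int peak = 0;
    for (int i = 0; i < n; i++) {
        int live = 0;
        for (int j = 0; j < n; j++)
            if (r[j].start <= r[i].start && r[i].start < r[j].end)
                live += r[j].bytes;
        if (live > peak) peak = live;
    }
    return peak;
}
```

The disjoint-or-contained structure the abstract observes makes the intervals laminar, which is what lets the coloring itself be assigned optimally rather than just bounded.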
Heap data management for limited local memory (llm) multi-core processors
- In Proceedings of the Eighth IEEE/ACM/IFIP Int. Conf. on Hardware/Software Codesign and System Synthesis (CODES+ISSS)
"... This paper presents a scheme to manage heap data in the local memory present in each core of a limited local memory (LLM) multi-core processor. While it is possible to manage heap data semi-automatically using software cache, manag-ing heap data of a core through software cache may require changing ..."
Cited by 6 (3 self)
This paper presents a scheme to manage heap data in the local memory present in each core of a limited local memory (LLM) multi-core processor. While it is possible to manage heap data semi-automatically using a software cache, managing the heap data of a core through a software cache may require changing the code of the other threads. Cross-thread modifications are difficult to code and debug, and only become more difficult as we scale the number of cores. We propose a semi-automatic and scalable scheme for heap data management that hides this complexity in a library with a more natural programming interface. Furthermore, for embedded applications, where the maximum heap size can be known at compile time, we propose optimizations on the heap management to significantly improve application performance. Experiments on several benchmarks from MiBench executing on the Sony PlayStation 3 show that our scheme is easier to use, and, if we know the maximum size of heap data, our optimizations can improve application performance by an average of 14%.
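When the maximum heap size is known at compile time, heap data can be served out of a fixed local-memory buffer by a trivial bump allocator. The sketch below illustrates only that special case (no free or reuse) and is an assumption-laden illustration, not the paper's management scheme:

```c
#include <stddef.h>

/* Assumed compile-time bound on total heap usage: the whole heap can then
   live in a statically sized local-memory buffer, with no software cache
   and no cross-thread coordination. */
#define LOCAL_HEAP_BYTES 1024

static unsigned char local_heap[LOCAL_HEAP_BYTES];
static size_t heap_top = 0;

/* Bump allocation: constant-time, no metadata, no reuse of freed space. */
void *llm_malloc(size_t bytes) {
    bytes = (bytes + 7) & ~(size_t)7;        /* round up to 8-byte alignment */
    if (heap_top + bytes > LOCAL_HEAP_BYTES)
        return NULL;                         /* would exceed the known bound */
    void *p = &local_heap[heap_top];
    heap_top += bytes;
    return p;
}
```

The appeal in an LLM setting is that every allocation stays in the core's own local store, so no other thread's code has to change, which is the scalability property the abstract emphasizes.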