A software memory partition approach for eliminating bank-level interference in multicore systems. In PACT, 2012.
Cited by 18 (2 self)
The main memory system is a shared resource in modern multicore machines, and the resulting interference causes performance degradation in the form of throughput slowdown and unfairness. Numerous new memory scheduling algorithms have been proposed to address the interference problem. However, these algorithms usually employ complex scheduling logic and require hardware modifications to memory controllers; as a result, industrial vendors have been hesitant to adopt them. This paper presents a practical software approach that effectively eliminates the interference without hardware modification. The key idea is to modify the OS memory management subsystem to adopt a page-coloring based bank-level partition mechanism (BPM), which allocates specific DRAM banks to specific cores (threads). With BPM, memory controllers passively schedule memory requests in a core-cluster (or thread-cluster) manner. We implement BPM in the Linux 2.6.32.15 kernel and evaluate it on 4-core and 8-core real machines, running 20 randomly generated multi-programmed workloads (each containing 4/8 benchmarks) as well as a multi-threaded benchmark. Experimental results show that BPM improves overall system throughput by 4.7% on average (up to 8.6%) and reduces the maximum slowdown by 4.5% on average (up to 15.8%). Moreover, BPM saves 5.2% of the memory system's energy consumption.
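The page-coloring idea behind BPM can be sketched in a few lines: the bank index of a physical page is fixed by certain physical-address bits, so an allocator that only hands a core pages from its assigned banks partitions DRAM with no hardware change. The bit positions, bank count, and banks-per-core below are illustrative assumptions, not the paper's values.

```python
# Sketch of page-coloring based bank-level partitioning (BPM).
# Assumed mapping: bank index lives in physical-address bits starting
# at BANK_SHIFT; 8 banks split two-per-core across 4 cores.

PAGE_SHIFT = 12      # 4 KiB pages
BANK_SHIFT = 13      # assumption: bank bits start at physical bit 13
NUM_BANKS = 8
BANKS_PER_CORE = 2

def bank_of(phys_addr):
    """Bank index encoded in a physical address (assumed mapping)."""
    return (phys_addr >> BANK_SHIFT) % NUM_BANKS

def allowed_banks(core):
    """Banks privately assigned to one core under BPM."""
    start = core * BANKS_PER_CORE
    return set(range(start, start + BANKS_PER_CORE))

def alloc_page(core, free_pfns):
    """Return a free page frame whose bank belongs to `core`."""
    for pfn in free_pfns:
        if bank_of(pfn << PAGE_SHIFT) in allowed_banks(core):
            free_pfns.remove(pfn)
            return pfn
    raise MemoryError("no free page in this core's banks")
```

Because each core's pages land only in its own banks, requests from different cores never compete for the same bank's row buffer, which is the interference BPM removes.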
The Blacklisting Memory Scheduler: Achieving High Performance and Fairness at Low Cost
Cited by 7 (4 self)
In a multicore system, applications running on different cores interfere at main memory. This inter-application interference degrades overall system performance and unfairly slows down applications. Prior works have developed application-aware memory request schedulers to tackle this problem. State-of-the-art application-aware memory request schedulers prioritize memory requests of applications that are vulnerable to interference, by ranking individual applications based on their memory access characteristics and enforcing a total rank order. In this paper, we observe that state-of-the-art application-aware memory schedulers have two major shortcomings. First, ranking applications individually with a total order based on memory access characteristics leads to high hardware cost and complexity. Second, ranking can unfairly slow down applications that are at the bottom of the ranking stack. To overcome these shortcomings …
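The blacklisting alternative to a total rank order can be sketched simply: serve requests oldest-first, but once one application has had several requests served back-to-back, blacklist it so others get through. The streak threshold and the (missing) blacklist-clearing policy are illustrative assumptions, not the paper's parameters.

```python
# Sketch of the Blacklisting idea: no per-application total rank.
# An application whose requests are served STREAK_LIMIT times in a row
# is blacklisted (deprioritized). STREAK_LIMIT = 4 is an assumption.

STREAK_LIMIT = 4

class BlacklistScheduler:
    def __init__(self):
        self.last_app = None
        self.streak = 0
        self.blacklist = set()

    def pick(self, queue):
        """queue: list of (app_id, addr), oldest first. Prefer the
        oldest request from a non-blacklisted application."""
        choice = next((r for r in queue if r[0] not in self.blacklist),
                      queue[0])   # everyone blacklisted: oldest wins
        queue.remove(choice)
        app = choice[0]
        self.streak = self.streak + 1 if app == self.last_app else 1
        self.last_app = app
        if self.streak >= STREAK_LIMIT:
            self.blacklist.add(app)   # app is monopolizing service
            self.streak = 0
        return choice
```

A streak counter and a bitmask are far cheaper in hardware than comparators that maintain a full rank order over all applications, which is the cost argument the abstract makes.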
Martínez, “Improving memory scheduling via processor-side load criticality information.” In ISCA, 2013.
Cited by 6 (0 self)
We hypothesize that performing processor-side analysis of load instructions, and judiciously providing this pre-digested information to memory schedulers, can increase the sophistication of memory decisions while maintaining a lean memory controller that can take scheduling actions quickly. This is increasingly important as DRAM frequencies continue to increase relative to processor speed. In this paper we propose one such mechanism, pairing a processor-side load criticality predictor with a lean memory controller that prioritizes load requests based on ranking information supplied from the processor side. Using a sophisticated multicore simulator that includes a detailed quad-channel DDR3 DRAM model, we demonstrate that this mechanism can improve performance significantly on a CMP, with minimal overhead and virtually no changes to the processor itself. We show that our design compares favorably to several state-of-the-art schedulers.
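The "lean controller" half of this design reduces to a cheap selection rule: the processor attaches a criticality rank to each load, and the controller just picks the highest-ranked pending request. The rank encoding (higher = more critical) and the oldest-first tie-break below are illustrative assumptions.

```python
# Sketch of a lean controller prioritizing loads by a criticality rank
# supplied from the processor side. Field names are assumptions.

def pick_request(queue):
    """queue: list of dicts with 'addr', 'crit' (processor-side rank,
    higher = more critical) and 'age' (cycles waiting). Highest
    criticality wins; ties go to the oldest request, so non-critical
    loads cannot starve indefinitely."""
    best = max(queue, key=lambda r: (r["crit"], r["age"]))
    queue.remove(best)
    return best
```

All the expensive analysis (deciding which loads are critical) happens on the processor side; the controller itself only does one max over its queue, which is what keeps it fast enough to issue commands every DRAM cycle.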
Half-DRAM: A High-Bandwidth and Low-Power DRAM Architecture from the Rethinking of Fine-grained Activation. In ISCA, 2014.
A Dual-Criticality Memory Controller (DCmc): Proposal and Evaluation of a Space Case Study. In Proc. IEEE Real-Time Systems Symposium, 2014.
Cited by 1 (0 self)
Multicore dual-criticality systems comprise two types of applications, each with a different criticality level. In the space domain these types are referred to as payload and control applications, which have high-performance and real-time requirements, respectively. To control the interaction (contention) between payload and control applications in their access to main memory, achieving both high bandwidth for the former and guaranteed timing bounds for the latter, we propose a Dual-Criticality memory controller (DCmc). DCmc virtually divides memory banks into real-time and high-performance banks, deploying a different request scheduling policy for each bank type, which facilitates achieving both goals. Our evaluation with a cycle-accurate multicore simulator and a real space case study shows that DCmc enables deriving tight WCET estimates regardless of the co-running payload applications, hence effectively isolating the effect of contention in the access to memory. DCmc also enables payload applications to exploit memory locality, which is needed for high performance.
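The bank split at the heart of DCmc can be sketched as a per-bank policy dispatch: real-time banks use a policy that is easy to bound analytically, high-performance banks use a row-hit-favoring policy. The concrete policies below (FCFS for RT, FR-FCFS-style row-hit-first for HP) and the bank assignment are illustrative assumptions about one plausible instantiation.

```python
# Sketch of DCmc's dual policy: each bank is tagged real-time (RT) or
# high-performance (HP) and scheduled under its own rule.

RT_BANKS = {0, 1}    # assumption: banks reserved for control apps

def schedule_bank(bank, queue, open_row):
    """Pick the next request for one bank. queue: list of dicts with
    a 'row' field, oldest first; open_row: the bank's open row."""
    if bank in RT_BANKS:
        choice = queue[0]                       # FCFS: analysable WCET bound
    else:
        hits = [r for r in queue if r["row"] == open_row]
        choice = hits[0] if hits else queue[0]  # favor row-buffer hits
    queue.remove(choice)
    return choice
```

Because control applications only map to RT banks and payload applications only to HP banks, each class sees only its own scheduler, which is why the WCET analysis of control tasks stays independent of the payload co-runners.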
Lazy Precharge: An Overhead-free Method to Reduce Precharge Overhead for Memory Parallelism Improvement of DRAM System
As we enter the multi-core era, main memory becomes a bottleneck due to the explosion of memory requests. In this work, we propose a novel memory architecture, Lazy Precharge (LaPRE), that enables aggressive activation schemes so that multiple rows in a bank can be activated successively without being interrupted by precharges. LaPRE thereby effectively reduces precharge overhead and improves memory parallelism. In addition, three corresponding memory scheduling schemes are proposed to fully exploit the improved memory parallelism. Experimental results show that LaPRE achieves a 14% performance improvement on average without hardware overhead.
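The behavioral effect described (several rows activated in a bank before any precharge is paid) can be sketched with a small bank model that defers precharge until a capacity of open rows is exceeded. The capacity of two open rows, and modeling precharge as evicting the oldest row, are illustrative assumptions about the mechanism, not details from the paper.

```python
# Sketch of the Lazy Precharge behavior: a bank keeps up to MAX_OPEN
# rows activated and defers precharge until capacity is exceeded.
# MAX_OPEN = 2 and oldest-row eviction are assumptions.

MAX_OPEN = 2

class LazyBank:
    def __init__(self):
        self.open_rows = []    # activated rows, oldest first
        self.precharges = 0    # precharges actually issued

    def access(self, row):
        """Return 'hit' if the row is already activated, otherwise
        activate it, lazily precharging the oldest row if needed."""
        if row in self.open_rows:
            return "hit"
        if len(self.open_rows) == MAX_OPEN:
            self.open_rows.pop(0)   # lazy precharge of the oldest row
            self.precharges += 1
        self.open_rows.append(row)
        return "activate"
```

A conventional open-page bank would precharge on every row conflict; here the second row activates with no precharge at all, which is the overhead reduction the abstract claims.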
A Survey of Architectural Techniques for DRAM Power Management. Int. J. of High Performance System Architecture.
Recent trends in CMOS technology scaling and the widespread use of multicore processors have dramatically increased the power consumption of main memory. It has been estimated that modern data centers spend more than 30% of their total power consumption on main memory alone. This excessive power dissipation has created the problem of the “memory power wall”, which has emerged as a major design constraint inhibiting further performance scaling. Recently, several techniques have been proposed to address this issue. The focus of this paper is to survey architectural techniques designed to improve the power efficiency of main memory systems, specifically DRAM systems. To help the reader gain insight into the similarities and differences between the techniques, the paper also presents a classification of the techniques on the basis of their characteristics. The aim of the paper is to equip engineers and architects with knowledge of state-of-the-art DRAM power saving techniques and to motivate them to design novel solutions to the challenges presented by the memory power wall problem.
BATMAN: Maximizing Bandwidth Utilization of Hybrid Memory Systems
, 2015
High bandwidth memory technologies such as HMC, HBM, and WideIO provide 4x-8x higher bandwidth than commodity DRAM while maintaining similar random access latency. The limited capacity of such high bandwidth memories requires them to be architected in conjunction with traditional DRAM. Such a system designates the high bandwidth memory as Near Memory (NM) and the commodity DRAM as Far Memory (FM), and relies on techniques that try to maximize the NM hit rate. We show that the conventional approach of optimizing for NM hit rate is inefficient, as it does not maximize the overall available system bandwidth. For example, when the application working set fits entirely in NM, all requests are serviced by NM alone and the bandwidth of FM remains unutilized. We show that optimizing for overall system bandwidth, instead of NM hit rate, can significantly improve system performance.
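The bandwidth argument has a simple closed form: with NM bandwidth B_nm and FM bandwidth B_fm, both channels are saturated when NM serves the fraction B_nm / (B_nm + B_fm) of the traffic, not 100% of it. The sketch below routes requests deterministically toward that target; the 4:1 bandwidth ratio and the every-k-th-request interleaving are illustrative assumptions, not BATMAN's actual mechanism.

```python
# Sketch of the bandwidth-balancing observation behind BATMAN:
# deliberately send some traffic to Far Memory even on NM hits so
# that both channels' bandwidth is used.

def target_nm_fraction(bw_nm, bw_fm):
    """Fraction of requests NM should serve to saturate both channels."""
    return bw_nm / (bw_nm + bw_fm)

def route(req_index, bw_nm=4.0, bw_fm=1.0):
    """Deterministic interleaving that approximates the target
    fraction: with a 4:1 ratio, every 5th request goes to FM."""
    frac = target_nm_fraction(bw_nm, bw_fm)   # 0.8 for a 4:1 ratio
    period = round(1 / (1 - frac))            # send every 5th to FM
    return "FM" if req_index % period == period - 1 else "NM"
```

With the working set fully resident in NM, a hit-rate-maximizing policy would return "NM" for every request and leave 20% of the system's aggregate bandwidth (all of FM's) idle, which is exactly the inefficiency the abstract points out.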
Design and Implementation of a DDR3-based Memory Controller
Memory performance has become the major bottleneck in improving the overall performance of computer systems. DDR3 SDRAM is a new generation of memory technology standardized by JEDEC that supports multiple banks operating in parallel and open-page technology. On the basis of an in-depth study of the DDR3 timing specification, we design a DDR3-based memory controller, in which the memory access control module is the key component. Evaluating performance with a streaming testbench, experimental results show that our memory controller correctly schedules memory access transactions and improves memory bandwidth. Keywords: DDR3 SDRAM, memory controller, memory access scheduling, memory bandwidth.