MLP-aware heterogeneous memory system, 2011
Cited by 11 (0 self)
Main memory plays a critical role in a computer system's performance and energy efficiency. Three key parameters define a main memory system's efficiency: latency, bandwidth, and power. Current memory systems try to balance all three parameters to achieve reasonable efficiency for most programs. However, in a multi-core system, applications with varied memory demands execute simultaneously. This paper proposes a heterogeneous main memory with three different memory modules, where each module is heavily optimized for one of the three parameters at the cost of compromising the other two. Based on the memory access characteristics of an application, the operating system allocates its pages in a memory module that satisfies its memory requirements. Compared to a homogeneous memory system, we demonstrate through cycle-accurate simulations that our design results in about a 13.5% increase in system performance and a 20% improvement in memory power.
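The page-placement idea in this abstract can be sketched in a few lines. The sketch below is purely illustrative, not the paper's algorithm: the module names, thresholds, and the two-statistic access profile are our assumptions, standing in for whatever characteristics the OS actually uses.

```python
# Hypothetical sketch of OS-driven page placement across three memory
# modules, each optimized for one parameter (latency, bandwidth, or power).
# Thresholds and profile statistics are illustrative, not from the paper.

def choose_module(accesses_per_epoch, row_buffer_hit_rate):
    """Pick a memory module for a page based on simple heuristics."""
    if accesses_per_epoch < 10:
        return "power"        # cold page: park it in the low-power module
    if row_buffer_hit_rate > 0.8:
        return "bandwidth"    # streaming page: high-bandwidth module
    return "latency"          # hot, irregular page: low-latency module

pages = {
    0x1000: (500, 0.9),   # hot, streaming
    0x2000: (3, 0.1),     # cold
    0x3000: (200, 0.2),   # hot, random
}
placement = {addr: choose_module(*stats) for addr, stats in pages.items()}
```

In this toy model each page lands in exactly one module; the paper's OS-level policy would additionally weigh migration costs and module capacities.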
RAMZzz: Rank-aware DRAM power management with dynamic migrations and demotions. In SC '12: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis. IEEE Computer, 2012
Cited by 6 (1 self)
Main memory is a significant energy consumer which may contribute to over 40% of the total system power, and will become more significant for server machines with more main memory. In this paper, we propose a novel memory system design named RAMZzz with rank-aware energy saving optimizations. Specifically, we rely on a memory controller to monitor the memory access locality, and group the pages with similar access locality into the same rank. We further develop dynamic page migrations to adapt to data access patterns, and a prediction model to estimate the demotion time for accurate control of power state transitions. We experimentally compare our algorithm with other energy saving policies using cycle-accurate simulation. Experiments with benchmark workloads show that RAMZzz achieves significant improvement in energy-delay² and energy consumption over other power saving techniques.
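The rank-grouping step described above can be sketched as a simple packing problem. The code below is an illustrative stand-in, not RAMZzz's actual grouping policy: it sorts pages by a single hotness count and packs them into ranks so cold ranks stay idle and can be demoted to a low-power state.

```python
# Illustrative sketch (not RAMZzz's algorithm): cluster pages with similar
# access intensity into the same rank, so that entire cold ranks can enter
# a low-power state while hot ranks absorb most of the traffic.

def group_into_ranks(page_hotness, num_ranks):
    """page_hotness: {page: access_count}. Returns {rank: [pages]}."""
    ordered = sorted(page_hotness, key=page_hotness.get, reverse=True)
    per_rank = -(-len(ordered) // num_ranks)  # ceiling division
    return {r: ordered[r * per_rank:(r + 1) * per_rank]
            for r in range(num_ranks)}

hotness = {"A": 900, "B": 850, "C": 12, "D": 5, "E": 700, "F": 3}
ranks = group_into_ranks(hotness, 2)
# Hot pages cluster in rank 0; rank 1 stays mostly idle and can be demoted.
```

RAMZzz additionally migrates pages dynamically and predicts demotion times; this sketch only shows the static grouping intuition.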
A case for refresh pausing in DRAM memory systems. In Proc. HPCA, 2013
Cited by 6 (0 self)
DRAM cells rely on periodic refresh operations to maintain data integrity. As the capacity of DRAM memories has increased, so has the amount of time consumed in doing refresh. Refresh operations contend with read operations, which increases read latency and reduces system performance. We show that eliminating the latency penalty due to refresh can improve average performance by 7.2%. However, simply doing intelligent scheduling of refresh operations is ineffective at obtaining significant performance improvement. This paper provides an alternative and scalable option to reduce the latency penalty due to refresh. It exploits the property that each refresh operation in a typical DRAM device internally refreshes multiple DRAM rows in JEDEC-based distributed refresh mode. Therefore, a refresh operation has well-defined points at which it can potentially be paused to service a pending read request. Leveraging this property, we propose Refresh Pausing, a solution that is highly effective at alleviating the contention from refresh operations. It provides an average performance improvement of 5.1% for 8Gb devices, and becomes even more effective for future high-density technologies. We also show that Refresh Pausing significantly outperforms the recently proposed Elastic Refresh scheme.
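The pause-point mechanism lends itself to a small scheduling sketch. The model below is ours, not the paper's implementation: a multi-row refresh is walked row by row, and a read arriving mid-refresh is served at the next row boundary instead of waiting for the whole operation to finish.

```python
# Illustrative sketch of refresh pausing: a JEDEC-style refresh internally
# covers several rows, so the boundary after each row is a natural point
# to pause and serve a waiting read. Timing model is ours, not the paper's.

def refresh_with_pausing(rows_to_refresh, read_arrivals):
    """read_arrivals maps a row index to a read that arrives while that
    row is being refreshed. Returns the resulting service order."""
    schedule = []
    for step, row in enumerate(rows_to_refresh):
        schedule.append(("refresh", row))
        if step in read_arrivals:          # pause point between rows
            schedule.append(("read", read_arrivals[step]))
    return schedule

# A read arriving during row 1 is served before rows 2 and 3 are refreshed,
# rather than stalling for the full refresh operation.
sched = refresh_with_pausing([0, 1, 2, 3], {1: "0xBEEF"})
```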
Exploring DRAM Organizations for Energy-Efficient and Resilient Exascale Memories
Cited by 5 (0 self)
The power target for exascale supercomputing is 20MW, with about 30% budgeted for the memory subsystem. Commodity DRAMs will not satisfy this requirement. Additionally, the large number of memory chips (>10M) required will result in crippling failure rates. Although specialized DRAM memories have been reorganized to reduce power through 3D-stacking or row buffer resizing, their implications on fault tolerance have not been considered. We show that addressing reliability and energy is a co-optimization problem involving tradeoffs between error correction cost, access energy, and refresh power: reducing the physical page size to decrease access energy increases the energy/area overhead of error resilience. Additionally, power can be reduced by optimizing bitline lengths. The proposed 3D-stacked memory uses a page size of 4kb and consumes 5.1pJ/bit based on simulations with NEK5000 benchmarks. Scaling to 100PB, the memory consumes 4.7MW at 100PB/s which, while well within the total power budget (20MW), is also error-resilient.
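The page-size tradeoff at the heart of this abstract can be captured in a toy model. The formulas and constants below are illustrative assumptions, not the paper's model: access energy is taken as proportional to the activated page size, while a fixed per-page ECC cost makes the relative overhead grow as pages shrink.

```python
# Toy model of the co-optimization tension: shrinking the physical page
# lowers per-access energy but raises the relative ECC overhead, because
# the ECC cost per protected page is roughly fixed. Numbers are illustrative.

def tradeoff(page_bits, ecc_bits=16, energy_per_bit=0.5):
    """Return (access_energy, ecc_overhead_fraction) for a page size."""
    access_energy = page_bits * energy_per_bit   # energy to activate a page
    overhead = ecc_bits / page_bits              # relative resilience cost
    return access_energy, overhead

big = tradeoff(8192)    # large page: high energy, low relative ECC overhead
small = tradeoff(1024)  # small page: lower energy, higher relative overhead
```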
Resilient die-stacked DRAM caches
Cited by 4 (1 self)
Die-stacked DRAM can provide large amounts of in-package, high-bandwidth cache storage. For server and high-performance computing markets, however, such DRAM caches must also provide sufficient support for reliability and fault tolerance. While conventional off-chip memory provides ECC support by adding one or more extra chips, this may not be practical in a 3D stack. In this paper, we present a DRAM cache organization that uses error-correcting codes (ECCs), strong checksums (CRCs), and dirty data duplication to detect and correct a wide range of stacked DRAM failures, from traditional bit errors to large-scale row, column, bank, and channel failures. With only a modest performance degradation compared to a DRAM cache with no ECC support, our proposal can correct all single-bit failures, and 99.9993% of all row, column, and bank failures, providing more than a 54,000× improvement in the FIT rate of silent-data corruptions compared to basic SECDED ECC protection.
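The detect-and-recover flow combining checksums with dirty-data duplication can be sketched as follows. This is a toy model, not the paper's organization: `zlib.crc32` stands in for the CRC, a second dictionary slot stands in for the duplicate location, and ECC correction of small errors is omitted.

```python
# Toy sketch of checksum-based detection plus dirty-data duplication:
# a strong CRC catches corruption beyond ECC's reach, and a duplicate
# copy of dirty data allows recovery. Layout is illustrative only.
import zlib

def write_line(cache, addr, data, dirty):
    entry = {"data": data, "crc": zlib.crc32(data), "dirty": dirty}
    cache[addr] = entry
    if dirty:                            # duplicate dirty data elsewhere
        cache[("dup", addr)] = dict(entry)

def read_line(cache, addr):
    entry = cache[addr]
    if zlib.crc32(entry["data"]) != entry["crc"]:   # corruption detected
        entry = cache[("dup", addr)]                # recover from duplicate
        cache[addr] = dict(entry)
    return entry["data"]

cache = {}
write_line(cache, 0x40, b"critical", dirty=True)
```

Duplicating only dirty lines is the key economy: clean lines can always be refetched from memory, so they need detection but not a second copy.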
Analyzing the Impact of Useless Write-Backs on the Endurance and Energy Consumption of PCM Main Memory
Cited by 3 (0 self)
Phase Change Memory (PCM) is an emerging technology that has recently been considered as a cost-effective and energy-efficient alternative to traditional DRAM main memory. Due to the high energy consumption of writes and the limited number of write cycles, reducing the number of writes to PCM can result in considerable energy savings and endurance improvement. In this paper, we introduce the concept of useless write-backs, which occur when a dirty cache line that belongs to a dead memory region is evicted from the cache (a dead region is a memory location that is not used again by a program). Since the evicted data is not used again, the write-back can be safely avoided to improve endurance and energy consumption. This paper presents a limit study on the improvement that passing information to the memory system about useless write-backs has on the endurance and energy consumption of systems based on PCM main memory. We developed algorithms to measure the number of useless write-backs to PCM for three different types of memory regions, and we present an energy model to determine the maximum energy savings that could potentially be achieved through such a scheme. Our results show that avoiding useless write-backs can save up to 19.8% of energy and improve endurance by up to 26.2%.
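The eviction-time filter implied by the abstract can be sketched directly. The code below is illustrative, not the paper's mechanism: dead regions are modeled as a plain set of region indices, whereas the paper tracks three distinct region types and studies how that information could reach the memory system.

```python
# Minimal sketch of the useless-write-back filter: on eviction, a dirty
# line whose address falls in a region known to be dead is dropped rather
# than written to PCM. Region tracking here is a simple set; the region
# size and decision logic are illustrative assumptions.

def evict(line_addr, dirty, dead_regions, region_size=4096):
    """Return 'writeback' or 'drop' for an evicted cache line."""
    if not dirty:
        return "drop"                  # clean lines never need a write-back
    region = line_addr // region_size
    if region in dead_regions:
        return "drop"                  # dead region: the write-back is useless
    return "writeback"
```

Every "drop" of a dirty line is one fewer PCM write, which is where the endurance and energy savings in the limit study come from.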
Rethinking DRAM Power Modes for Energy Proportionality
Cited by 3 (1 self)
We rethink DRAM power modes by modeling and characterizing inter-arrival times for memory requests to determine the properties an ideal power mode should have. This analysis indicates that even the most responsive of today's power modes are rarely used: up to 88% of memory time is spent idling in an active mode. Power modes must therefore have much shorter exit latencies than they have today; wake-up latencies below 100ns are ideal. To address these challenges, we present MemBlaze, an architecture with DRAMs and links that are capable of fast powerup, which provides more opportunities to power down memories. By eliminating DRAM chip timing circuitry, a key contributor to powerup latency, and by shifting timing responsibility to the controller, MemBlaze permits data transfers immediately after wake-up and reduces energy per transfer by 50% with no performance impact. Alternatively, in scenarios where DRAM timing circuitry must remain, we explore mechanisms to accommodate DRAMs that power up with less-than-perfect interface timing. We present MemCorrect, which detects timing errors, while MemDrowsy lowers transfer rates and widens sampling margins to accommodate timing uncertainty where the interface circuitry must recalibrate after exit from a powerdown state. Combined, MemCorrect and MemDrowsy still reduce energy per transfer by 50% but incur modest (e.g., 10%) performance penalties.
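The link between exit latency and power-mode usefulness can be shown with a break-even calculation. All constants below are illustrative placeholders, not measured values from the paper: an idle gap is only worth a powerdown if it is longer than the exit latency and the energy saved while asleep exceeds the transition cost.

```python
# Back-of-envelope sketch of why exit latency governs power-mode use:
# short inter-arrival gaps cannot amortize a slow wake-up. The power and
# energy numbers here are made-up illustrative values.

def worth_powering_down(gap_ns, exit_latency_ns,
                        active_mw=500, sleep_mw=50, transition_nj=100):
    """Decide whether an idle gap justifies entering a sleep mode."""
    saved_nj = (active_mw - sleep_mw) * gap_ns * 1e-3   # mW * ns = 1e-3 nJ
    return gap_ns > exit_latency_ns and saved_nj > transition_nj
```

With a 100ns exit latency, a 1µs gap pays off but a 50ns gap never can, which is the paper's argument for sub-100ns wake-up latencies: they turn the many short idle gaps into usable powerdown opportunities.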
Improving System Energy Efficiency with Memory Rank Subsetting. Jung Ho Ahn, Seoul National University
Cited by 2 (0 self)
VLSI process technology scaling has enabled dramatic improvements in the capacity and peak bandwidth of DRAM devices. However, current standard DDRx DIMM memory interfaces are not well tailored to achieve high energy efficiency and performance in modern chip-multiprocessor-based computer systems. Their suboptimal performance and energy inefficiency can have a significant impact on system-wide efficiency, since much of the system power dissipation is due to memory power. New memory interfaces, better suited for future many-core systems, are needed. In response, there are recent proposals to enhance the energy efficiency of main-memory systems by dividing a memory rank into subsets, and making a subset rather than a whole rank serve a memory request. We holistically assess the effectiveness of rank subsetting from system-wide performance, energy-efficiency, and reliability perspectives. We identify the impact of rank subsetting on memory power and processor performance analytically, compare two promising rank-subsetting proposals, Multicore DIMM and mini-rank, and verify our analysis by simulating a chip-multiprocessor system using multithreaded and consolidated workloads. We extend the design of Multicore DIMM for high-reliability systems and show that compared with conventional chipkill approaches, rank subsetting can lead to much higher system-level energy efficiency and performance at the cost of additional DRAM devices. This holistic assessment shows …
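The core energy intuition behind rank subsetting can be captured in a toy cost model. The units and constants below are illustrative assumptions, not from the article: activating fewer chips per request saves activation energy, but the narrower data interface means each cache line takes more beats to transfer.

```python
# Toy model of rank subsetting: serve a request from a subset of a rank's
# chips instead of all of them. Activation energy drops with the subset
# size while transfer time grows. All numbers are illustrative.

def request_cost(chips_in_rank, subset_chips, line_bytes=64,
                 act_energy_per_chip=1.0, bytes_per_chip_per_beat=1):
    """Return (activation_energy_units, transfer_beats) per request."""
    activate = subset_chips * act_energy_per_chip
    beats = line_bytes // (subset_chips * bytes_per_chip_per_beat)
    return activate, beats

full = request_cost(8, 8)   # whole rank: more energy, short burst
sub = request_cost(8, 2)    # quarter subset: less energy, longer burst
```

This is exactly the performance/energy tradeoff the article assesses holistically, alongside the reliability cost of spreading a line across fewer chips.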
Gather-Scatter DRAM: In-DRAM Address Translation to Improve the Spatial Locality of Non-unit Strided Accesses
Cited by 2 (2 self)
Many data structures (e.g., matrices) are typically accessed with multiple access patterns. Depending on the layout of the data structure in physical address space, some access patterns result in non-unit strides. In existing systems, which are optimized to store and access cache lines, non-unit strided accesses exhibit low spatial locality. Therefore, they incur high latency, and waste memory bandwidth and cache space. We propose the Gather-Scatter DRAM (GS-DRAM) to address this problem. We observe that a commodity DRAM module contains many chips. Each chip stores a part of every cache line mapped to the module. Our idea is to enable the memory controller to access multiple values that belong to a strided pattern from different chips …
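The chip-mapping observation can be demonstrated with a toy layout model. The XOR shuffle below is our illustrative stand-in for GS-DRAM's address translation, and the eight-value cache line is an assumption: with a plain layout, a power-of-two stride hammers one chip; a per-line shuffle spreads the same strided elements across all chips, so they can be gathered in parallel.

```python
# Toy sketch of the in-DRAM address-translation idea: each chip holds one
# value of every cache line. A simple XOR shuffle (standing in for
# GS-DRAM's mapping) spreads strided elements across chips.

CHIPS = 8  # values per cache line, one per chip (illustrative)

def chip_plain(line, offset):
    return offset                      # value i of every line sits on chip i

def chip_shuffled(line, offset):
    return (offset ^ line) % CHIPS     # rotate the layout per line

def chips_touched(mapping, stride, count):
    """Set of chips hit when gathering `count` elements at a given stride."""
    return {mapping((i * stride) // CHIPS, (i * stride) % CHIPS)
            for i in range(count)}

# Stride-8 gather: the plain layout hits a single chip repeatedly, while
# the shuffled layout touches every chip once.
plain = chips_touched(chip_plain, 8, 8)
shuffled = chips_touched(chip_shuffled, 8, 8)
```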
MemZip: Exploring unconventional benefits from memory compression. In Proceedings of the International Symposium on High Performance Computer Architecture (HPCA), 2014
Cited by 2 (0 self)
Memory compression has been proposed and deployed in the past to grow the capacity of a memory system and reduce page fault rates. Compression also has secondary benefits: it can reduce energy and bandwidth demands. However, most prior mechanisms have been designed to focus on the capacity metric, and few prior works have attempted to explicitly reduce energy or bandwidth. Further, mechanisms that focus on the capacity metric also require complex logic to locate the requested data in memory. In this paper, we design a highly simple compressed memory architecture that does not target the capacity metric. Instead, it focuses on complexity, energy, bandwidth, and reliability. It relies on rank subsetting and a careful placement of compressed data and metadata to achieve these benefits. Further, the space made available via compression is used to boost other metrics: the space can be used to implement stronger error correction codes or energy-efficient data encodings. The best performing MemZip configuration yields a 45% performance improvement and 57% memory energy reduction, compared to an uncompressed non-sub-ranked baseline. Another energy-optimized configuration yields a 29.8% performance improvement and a 79% memory energy reduction, relative to the same baseline.
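The bandwidth-and-reliability angle described above can be sketched in a few lines. This is a toy model, not MemZip's design: `zlib` stands in for the hardware compressor, and the burst size is an illustrative assumption. A compressed line needs fewer fixed-size bursts from a rank subset, and leftover space in the last burst can hold extra ECC bits.

```python
# Toy sketch of compression for bandwidth rather than capacity: count how
# many fixed-size bursts a compressed line needs, and how much slack is
# left over for stronger ECC. zlib and the sizes are illustrative.
import zlib

BURST_BYTES = 8   # bytes delivered per burst by a sub-rank in this model

def bursts_needed(line):
    """Return (bursts, spare_bytes) for a compressed cache line."""
    compressed = zlib.compress(line)
    size = min(len(compressed), len(line))  # fall back if incompressible
    bursts = -(-size // BURST_BYTES)        # ceiling division
    spare = bursts * BURST_BYTES - size     # slack usable for extra ECC
    return bursts, spare

full_bursts = -(-64 // BURST_BYTES)         # uncompressed 64B line: 8 bursts
```

Because each line's burst count is computed from the line itself, there is no capacity bookkeeping: the line still lives at its usual address, which is the source of the design's simplicity.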