Understanding Reduced-Voltage Operation in Modern DRAM Devices


Understanding Reduced-Voltage Operation in Modern DRAM Devices: Experimental Characterization, Analysis, and Mechanisms
Kevin Chang, A. Giray Yaglikci, Saugata Ghose, Aditya Agrawal*, Niladrish Chatterjee*, Abhijith Kashyap, Donghyuk Lee*, Mike O'Connor*, Hasan Hassan, Onur Mutlu*

Executive Summary
DRAM (memory) power is significant in today's systems, yet existing low-voltage DRAM reduces voltage only conservatively.
Goal: Understand and exploit the reliability and latency behavior of real DRAM chips under aggressive reduced-voltage operation.
Key experimental observations:
- Errors occur and increase with lower voltage
- Errors exhibit spatial locality
- Higher operation latency mitigates voltage-induced errors
Voltron: A new DRAM energy reduction mechanism
- Reduces DRAM voltage without introducing errors
- Uses a regression model to select a voltage that does not degrade performance beyond a chosen target
→ 7.3% system energy reduction

Outline
- Executive Summary
- Motivation
- DRAM Background
- Characterization of DRAM
- Voltron: DRAM Energy Reduction Mechanism
- Conclusion

High DRAM Power Consumption
Problem: DRAM (memory) power is high in today's systems: >40% in POWER7 (Ware+, HPCA'10) and >40% in GPUs (Paul+, ISCA'15).

Low-Voltage Memory
Existing DRAM designs help reduce DRAM power by lowering the supply voltage conservatively (Power ∝ Voltage²):
- DDR3L (low-voltage) reduces voltage from 1.5V to 1.35V (-10%)
- LPDDR4 (low-power) employs a low-power I/O interface at 1.2V (at the cost of lower bandwidth)
Can we reduce DRAM power and energy by further reducing the supply voltage?
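Since dynamic power scales roughly with the square of the supply voltage, even modest reductions compound. Below is a minimal back-of-the-envelope sketch, assuming the textbook P ∝ V² relation for dynamic power (real DRAM power also has static and I/O components the slide does not break out):

```python
# Rough dynamic-power savings from supply-voltage scaling, assuming P ~ V^2.
# Illustrative only: real DRAM power also has static and I/O components.
def dynamic_power_ratio(v_new: float, v_nominal: float) -> float:
    return (v_new / v_nominal) ** 2

# DDR3 (1.5V) -> DDR3L (1.35V): today's conservative -10% voltage step
print(f"DDR3L vs. DDR3: {1 - dynamic_power_ratio(1.35, 1.5):.0%} saved")  # ~19%
# DDR3L (1.35V) -> 1.0V: the most aggressive point tested in this study
print(f"1.0V vs. DDR3L: {1 - dynamic_power_ratio(1.0, 1.35):.0%} saved")  # ~45%
```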

Goals
1. Understand and characterize the behavior of DRAM under reduced voltage
2. Develop a mechanism that reduces DRAM energy by lowering voltage while keeping performance loss within a target

Key Questions
- How does reducing voltage affect reliability (errors)?
- How does reducing voltage affect DRAM latency?
- How do we design a new DRAM energy reduction mechanism?

Outline
- Executive Summary
- Motivation
- DRAM Background
- Characterization of DRAM
- Voltron: DRAM Energy Reduction Mechanism
- Conclusion

High-Level DRAM Organization
[Figure: a DRAM module contains multiple DRAM chips and connects to the processor over a DRAM channel]

DRAM Chip Internals
[Figure: a DRAM chip contains a DRAM array organized into banks of cells, each cell at a wordline/bitline crossing, with sense amplifiers serving as the row buffer, plus peripheral circuitry (control logic and I/O) connecting to the off-chip channel]

DRAM Operations
1. ACTIVATE: Store the row into the row buffer
2. READ: Select the target cache line and drive it to the CPU over the I/O
3. PRECHARGE: Prepare the array for a new ACTIVATE

DRAM Access Latency
[Timing diagram: ACTIVATE, then READ returning a 64B cache line, then PRECHARGE before the next ACTIVATE]
- Activation latency: 13ns / 50 cycles
- Precharge latency: 13ns / 50 cycles
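As a concrete illustration of how these latencies add up on a row access, here is a toy timing calculation using the slide's numbers; the constants mirror the slide, but the helper itself is a simplification (it ignores the READ data burst and any overlap between commands):

```python
# Toy calculation of back-to-back row-access cost from the slide's numbers.
# Simplified: ignores the READ data burst and command overlap.
T_ACTIVATE_NS = 13.0   # time from ACTIVATE until the row buffer is ready
T_PRECHARGE_NS = 13.0  # time to restore the array before the next ACTIVATE

def row_miss_cost_ns() -> float:
    # ACTIVATE -> READ (64B cache line) -> PRECHARGE -> next ACTIVATE
    return T_ACTIVATE_NS + T_PRECHARGE_NS

print(f"{row_miss_cost_ns()} ns (~100 CPU cycles) between row activations")
```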

Outline
- Executive Summary
- Motivation
- DRAM Background
- Characterization of DRAM
  - Experimental methodology
  - Impact of voltage on reliability and latency
- Voltron: DRAM Energy Reduction Mechanism
- Conclusion

Supply Voltage Control on DRAM
[Figure: DRAM module with an adjustable supply voltage]
We adjust the supply voltage delivered to every chip on the same module.

Custom Testing Platform
SoftMC [Hassan+, HPCA'17]: an FPGA-based testing platform that lets us 1) adjust the supply voltage to DRAM modules and 2) schedule DRAM commands to DRAM modules. (Existing systems do not expose DRAM commands to users.)
[Photo: DRAM module, FPGA, and voltage controller]
https://github.com/cmu-safari/dram-voltage-study

Tested DRAM Modules
- 124 DDR3L (low-voltage) DRAM chips on 31 SO-DIMMs
- Nominal voltage: 1.35V (DDR3 uses 1.5V)
- Density: 4Gb per chip
- Three major vendors/manufacturers
- Manufacturing dates: 2014-2016
We iteratively read every bit in each 4Gb chip under a wide range of supply voltage levels: 1.35V down to 1.0V (-26%), as sketched below.
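The characterization boils down to a nested sweep over voltages and addresses. The sketch below mirrors it under a hypothetical SoftMC-style driver; `set_voltage`, `write_row`, and `read_row` are placeholder names, not the actual API of the released infrastructure:

```python
# Sketch of the voltage-sweep characterization loop. The driver methods
# (set_voltage, write_row, read_row) are hypothetical placeholders, not
# the real SoftMC API from the linked repository.
VOLTAGES_V = [1.35, 1.3, 1.25, 1.2, 1.175, 1.15, 1.125, 1.1, 1.075, 1.05, 1.025, 1.0]
PATTERN = bytes([0xAA]) * 8192  # one example data pattern per row

def characterize(dimm, num_rows: int) -> dict:
    errors_at = {}  # supply voltage -> number of rows read back incorrectly
    for v in VOLTAGES_V:
        dimm.set_voltage(v)
        bad = 0
        for row in range(num_rows):
            dimm.write_row(row, PATTERN)
            if dimm.read_row(row) != PATTERN:
                bad += 1
        errors_at[v] = bad
    return errors_at
```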

Outline
- Executive Summary
- Motivation
- DRAM Background
- Characterization of DRAM
  - Experimental methodology
  - Impact of voltage on reliability and latency
- Voltron: DRAM Energy Reduction Mechanism
- Conclusion

Reliability Worsens with Lower Voltage
[Plot: fraction of cache lines with errors (%, log scale) vs. supply voltage (V), from 1.025V up to the nominal 1.35V, for Vendors A, B, and C. Annotations mark the errors induced by reduced-voltage operation and the minimum voltage (Vmin) without errors.]
Reducing voltage below Vmin causes an increasing number of errors.

Source of Errors
We ran detailed circuit simulations (SPICE) of a DRAM cell array to model the behavior of DRAM operations: https://github.com/cmu-safari/dram-voltage-study
[Plot: simulated activate and precharge latency (ns) vs. supply voltage (V); latency rises as voltage drops below the nominal 1.35V]
Reliable low-voltage operation requires higher latency.
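For intuition about why latency rises, a toy first-order model captures the trend: drive current falls roughly as the square of the voltage headroom above the transistor threshold, so charging the bitline takes longer as the supply drops. This is an illustrative approximation with assumed constants, not the paper's SPICE netlist:

```python
# Toy first-order latency model: time to charge the bitline capacitance
# scales as C*V / I_drive, with I_drive ~ k*(V - Vth)^2 (square-law MOSFET).
# All constants are assumed for illustration; the paper instead uses a
# detailed SPICE model of the DRAM cell array.
C_NORM = 1.0   # normalized bitline capacitance
K_NORM = 1.0   # normalized transistor transconductance
V_TH = 0.4     # assumed threshold voltage in volts (not from the paper)

def relative_activate_latency(v: float, v_nominal: float = 1.35) -> float:
    t = lambda vv: C_NORM * vv / (K_NORM * (vv - V_TH) ** 2)
    return t(v) / t(v_nominal)

for v in (1.35, 1.2, 1.1, 1.0, 0.9):
    print(f"{v:.2f} V -> {relative_activate_latency(v):.2f}x nominal latency")
```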

DIMMs Operating at Higher Latency
We measured the minimum latency that does not cause errors in our DRAM modules (search sketched below).
[Plot: distribution of the measured minimum activate latency (ns, roughly 8-14ns) across the total module population; annotations mark the latencies required by 100% and by 40% of modules]
DRAM requires longer latency to access data without errors at lower voltage. The measured values are a lower bound on latency, as our latency adjustment granularity is 2.5ns.
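Finding each module's minimum reliable latency is then a simple step-up search at the platform's 2.5ns timing granularity, which is why the reported values are lower bounds. A sketch, again with a hypothetical test hook:

```python
# Sketch: per-module search for the minimum error-free activate latency.
# `errors_at(latency_ns)` is a hypothetical hook that runs a full test
# pass at the given latency and returns the number of erroneous lines.
STEP_NS = 2.5  # the platform's latency adjustment granularity

def min_reliable_latency_ns(errors_at, start_ns=10.0, max_ns=20.0):
    latency = start_ns
    while latency <= max_ns:
        if errors_at(latency) == 0:
            return latency  # a lower bound: true minimum may lie within one step
        latency += STEP_NS
    return None  # no error-free latency found in the tested range
```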

Spatial Locality of Errors
[Error map of a module operating at 1.175V, a 12% voltage reduction]
Errors concentrate in certain regions.

Other Results in the Paper
- Error-correcting codes: ECC (SECDED) is not sufficient to mitigate the errors
- Effect of temperature: higher temperature requires higher latency under some voltage levels
- Data retention time: lower voltage does not require more frequent refreshes
- Effect of the stored data pattern on error rate: the difference is not statistically significant enough to draw conclusions

Summary of Key Experimental Observations
- Voltage-induced errors increase as voltage reduces further below Vmin
- Errors exhibit spatial locality
- Increasing the latency of DRAM operations mitigates voltage-induced errors

Outline
- Executive Summary
- Motivation
- DRAM Background
- Characterization of DRAM
  - Experimental methodology
  - Impact of voltage on reliability and latency
- Voltron: DRAM Energy Reduction Mechanism
- Conclusion

DRAM Voltage Adjustment to Reduce Energy
Goal: Exploit the trade-off between voltage and latency to reduce energy consumption.
Approach: Reduce DRAM voltage reliably, accepting some performance loss due to increased latency at lower voltage.
[Plot: performance and DRAM power savings (% improvement over nominal voltage) vs. supply voltage from 0.9V to 1.3V. Low voltage yields high power savings but bad performance; near-nominal voltage yields low power savings but good performance.]

Voltron Overview
The user specifies a performance loss target; Voltron selects the minimum DRAM voltage without violating that target.
Key question: How do we predict the performance loss due to increased latency under low DRAM voltage?

Linear Model to Predict Performance
[Flow diagram: candidate DRAM voltages (1.3V, 1.25V, ...) and the application's characteristics feed a linear regression model, which outputs a predicted performance loss per voltage (-1%, -3%, ...); Voltron picks as the final voltage the minimum voltage whose prediction meets the target]

Linear Model to Predict Performance
The application's characteristics used by the model:
- Memory intensity: frequency of last-level cache misses
- Memory stall time: amount of time memory requests stall commit inside the CPU
Handling multiple applications: predict a performance loss for each application, then select the minimum voltage that satisfies the performance target for all applications (see the sketch below).
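Putting this together, Voltron's selection step can be sketched as evaluating the linear model at each candidate voltage and keeping the lowest voltage whose worst per-application predicted loss stays within the target. The θ coefficients and most latency values below are placeholders; only the +5% (1.2V) and +13% (1.1V) latency points come from the backup slides, and the paper trains the model offline:

```python
# Sketch of Voltron's minimum-voltage selection. Coefficients and most
# voltage->latency entries are placeholders; only the +5% (1.2V) and
# +13% (1.1V) latency points appear in the backup slides.
CANDIDATE_VOLTAGES = [1.35, 1.3, 1.25, 1.2, 1.15, 1.1]          # high -> low
EXTRA_DRAM_LATENCY = {1.35: 0.00, 1.3: 0.02, 1.25: 0.03,
                      1.2: 0.05, 1.15: 0.09, 1.1: 0.13}

def predicted_loss(theta, extra_latency, intensity, stall_time):
    # PredictedLoss = theta0 + theta1*Latency + theta2*Intensity + theta3*StallTime
    t0, t1, t2, t3 = theta
    return t0 + t1 * extra_latency + t2 * intensity + t3 * stall_time

def select_voltage(theta, apps, target_loss):
    """apps: (memory_intensity, memory_stall_time) per running application."""
    chosen = CANDIDATE_VOLTAGES[0]  # fall back to nominal voltage
    for v in CANDIDATE_VOLTAGES:    # try progressively lower voltages
        worst = max(predicted_loss(theta, EXTRA_DRAM_LATENCY[v], mi, st)
                    for mi, st in apps)
        if worst <= target_loss:
            chosen = v              # lower voltage still meets the target
    return chosen
```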

Comparison to Prior Work
Prior work dynamically scales the frequency and voltage of the entire DRAM based on bandwidth demand [David+, ICAC'11].
Problem: lowering the voltage of the peripheral circuitry decreases the channel frequency (memory data throughput).
Voltron: reduce the voltage of only the DRAM array, without changing the voltage of the peripheral circuitry.
[Figure: prior work lowers voltage across the whole chip, including control logic and I/O, so the off-chip channel runs at low frequency; Voltron lowers voltage only on the DRAM array banks, keeping the off-chip channel at high frequency]

Exploiting Spatial Locality of Errors
Key idea: Increase the latency only for the DRAM banks that observe errors under low voltage (sketched below).
Benefit: Higher performance.
[Figure: within one chip, banks that showed errors use a higher latency while the remaining banks keep the low nominal latency]
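The per-bank policy amounts to a small lookup the controller consults when issuing commands. A minimal sketch, where the error profile contents and the relaxed-latency value are illustrative:

```python
# Sketch: per-bank activate latency chosen from a low-voltage error profile.
# The bank IDs in the profile and the relaxed latency are illustrative.
NOMINAL_LATENCY_NS = 13.0
RELAXED_LATENCY_NS = 15.5   # one 2.5ns step above nominal (example value)

error_prone_banks = {0}      # banks that showed errors at the chosen voltage

def activate_latency_ns(bank: int) -> float:
    # Only error-prone banks pay the latency penalty; the rest keep
    # the nominal low latency, preserving performance.
    return RELAXED_LATENCY_NS if bank in error_prone_banks else NOMINAL_LATENCY_NS
```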

Outline
- Executive Summary
- Motivation
- DRAM Background
- Characterization of DRAM
  - Experimental methodology
  - Impact of voltage on reliability and latency
- Voltron: DRAM Energy Reduction Mechanism
  - Evaluation
- Conclusion

Voltron Evaluation Methodology
- Cycle-level simulator: Ramulator [CAL'15], with McPAT and DRAMPower for energy measurement: https://github.com/cmu-safari/ramulator
- 4-core system with DDR3L memory
- Benchmarks: SPEC CPU2006, YCSB
- Comparison to prior work: MemDVFS [David+, ICAC'11], which dynamically scales DRAM frequency and voltage based on memory bandwidth consumption

Energy Savings with Bounded Performance
[Bar charts: CPU+DRAM energy savings (%) and performance loss (%) for MemDVFS [David+, ICAC'11] and Voltron on low- and high-memory-intensity workloads. MemDVFS saves more for high-bandwidth applications; Voltron saves 3.2% (low intensity) and 7.3% (high intensity) while keeping performance loss at -1.6% and -1.8%, within the performance target.]
1. Voltron improves energy for both low and high memory intensity workloads.
2. Voltron satisfies the performance loss target via a regression model.

Outline
- Executive Summary
- Motivation
- DRAM Background
- Characterization of DRAM
  - Experimental methodology
  - Impact of voltage on reliability and latency
- Voltron: DRAM Energy Reduction Mechanism
- Conclusion

Conclusion
DRAM (memory) power is significant in today's systems, yet existing low-voltage DRAM reduces voltage only conservatively.
Goal: Understand and exploit the reliability and latency behavior of real DRAM chips under aggressive reduced-voltage operation.
Key experimental observations:
- Errors occur and increase with lower voltage
- Errors exhibit spatial locality
- Higher operation latency mitigates voltage-induced errors
Voltron: A new DRAM energy reduction mechanism
- Reduces DRAM voltage without introducing errors
- Uses a regression model to select a voltage that does not degrade performance beyond a chosen target
→ 7.3% system energy reduction

Understanding Reduced-Voltage Operation in Modern DRAM Devices: Experimental Characterization, Analysis, and Mechanisms
Kevin Chang, A. Giray Yaglikci, Saugata Ghose, Aditya Agrawal*, Niladrish Chatterjee*, Abhijith Kashyap, Donghyuk Lee*, Mike O'Connor*, Hasan Hassan, Onur Mutlu*

BACKUP

Error Rates Across Modules

Error Density

Temperature Impact

Impact on Retention Time

Derivation of More Precise Latency
[Plot: circuit-level SPICE simulation showing the potential latency range vs. supply voltage]
The DRAM circuit model validates our experimental results and provides more precise latency values.

Performance Loss Correlation
Observation: An application's performance loss due to higher latency has a strong linear relationship with its memory intensity.
[Two scatter plots: performance degradation (%) vs. memory intensity (MPKI), at 1.2V (-13% voltage, +5% DRAM latency) and at 1.1V (-23% voltage, +13% DRAM latency)]
MPKI = last-level cache Misses Per Thousand Instructions

Performance-Aware Voltage Adjustment
We build a performance (linear-regression) model to predict performance loss based on the selected voltage:
PredictedLoss = θ0 + θ1 × Latency + θ2 × App.Intensity + θ3 × App.StallTime
where Latency is the latency increase due to the voltage adjustment, and App.Intensity and App.StallTime are the running application's characteristics. The θ coefficients are trained on 151 application samples.
We use the model to select the minimum voltage that satisfies a performance loss target specified by the user.

Linear Model Accuracy
R² = 0.75 / 0.90 for low- and high-intensity workloads
RMSE = 2.8 / 2.5 for low- and high-intensity workloads

Dynamic Voltron

Effect of Exploiting Error Locality

Energy Breakdown

Heterogeneous Workloads

Performance Target Sweep

Sensitivity to Profile Interval Length