Emerging NVM Memory Technologies


Yuan Xie, Associate Professor
Department of Computer Science & Engineering, The Pennsylvania State University
www.cse.psu.edu/~yuanxie
yuanxie@cse.psu.edu

Position Statement
- Emerging NVM technologies are very attractive: they combine the speed of SRAM, the density of DRAM, and the non-volatility of Flash memory.
- Attractive features: high density, low leakage, non-volatility.
- Undesirable features:
  - Write-related: long write latency, high write energy, low endurance (e.g., PCRAM)
  - Cost (needs large-volume production)
- Solution: hybrid cache/memory/storage + 3D?
- Enabling unique applications

Outline
- Introduction
- Modeling: MRAM/PCRAM modeling
- Architecture: MRAM stacking; HCA: Hybrid Cache Architecture; hybrid storage system
- Application: exascale computing
- Conclusion

Traditional Memory Hierarchies
Latency (cycles):
- On-chip memory (SRAM): 1~30
- Off-chip memory (DRAM): 100~300
- Solid State Disk (Flash Memory): 25000~2000000
- Secondary Storage (HDD): >5000000
Note the large latency gap between off-chip memory and the storage levels.

Emerging Memory Technologies
- FeRAM (Ferroelectric RAM): Toshiba FeRAM (2009)
- MRAM (Magnetic RAM): EverSpin MRAM (2008)
- Memristor (Resistive RAM): HP Labs Memristor (2009)
- PCRAM (Phase-Change RAM): Samsung PCRAM (2008)



NVRAM Comparison
- FeRAM, MRAM, and PCRAM each combine advantages of SRAM, DRAM, and flash.
- This is a good opportunity to rethink the memory hierarchy design.
Courtesy: Motoyuki Ooishi

What is the impact of emerging NVM technologies on computer memory hierarchies?
Traditional memory hierarchy, latency (cycles):
- On-chip memory (SRAM): ~10
- Off-chip memory (DRAM): ~100
- Solid State Disk (SSD): 25000~2000000
- Secondary Storage (HDD): >5000000
Emerging non-volatile memory (NVM):
- Phase-change RAM (PCRAM)
- Magnetic RAM (MRAM)

PCRAMsim Model
- Developed on the basis of CACTI.
- CACTI models SRAM and DRAM caches; CACTI does NOT support PCRAM.
- PCRAMsim makes 3 modifications at the subarray level.
[Figure: CACTI-modeled memory subarray, showing the 2D array of memory cells and the peripheral circuitry. Labeled blocks: precharge & equalization, memory cells, row decoders, wordline drivers, bitline mux, sense amplifiers, sense amplifier mux, output/write drivers.]

SRAM vs. MRAM
MRAM characteristics: high density, fast read, slow write, low read energy, high write energy, low leakage.
Per-bank comparison (65nm), SRAM vs. MRAM:
- Area: 3.66 mm^2 vs. 3.30 mm^2
- Capacity: 128KB vs. 512KB
- Read latency: 2.25ns vs. 2.32ns
- Write latency: 2.26ns vs. 11.02ns
- Read energy: 0.90nJ vs. 0.86nJ
- Write energy: 0.80nJ vs. 5.00nJ
Cache configurations and leakage power:
- 2MB (16x128KB) SRAM cache: 2.09W
- 8MB (16x512KB) MRAM cache: 0.26W
Pros: low leakage power, high density.
Cons: long write latency and large write energy.
Replace SRAM caches with MRAM?
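The per-bank figures above can be turned into a rough worked example. The following is a minimal sketch, assuming every access is either a read or a write to a single bank and that the mean cost is a simple weighted average; the function names are illustrative and do not come from the slides.

```python
# Per-bank figures copied from the slide (65 nm): 128KB SRAM bank vs. 512KB MRAM bank.
SRAM = {"read_ns": 2.25, "write_ns": 2.26, "read_nj": 0.90, "write_nj": 0.80}
MRAM = {"read_ns": 2.32, "write_ns": 11.02, "read_nj": 0.86, "write_nj": 5.00}


def mean_access(cell, write_fraction):
    """Return (mean latency in ns, mean energy in nJ) for a given write fraction."""
    if not 0.0 <= write_fraction <= 1.0:
        raise ValueError("write_fraction must be between 0 and 1")
    read_fraction = 1.0 - write_fraction
    latency = read_fraction * cell["read_ns"] + write_fraction * cell["write_ns"]
    energy = read_fraction * cell["read_nj"] + write_fraction * cell["write_nj"]
    return latency, energy


def energy_break_even(a, b):
    """Write fraction at which technologies a and b cost the same mean energy per access."""
    # Solve a_r + w * (a_w - a_r) = b_r + w * (b_w - b_r) for w.
    slope_a = a["write_nj"] - a["read_nj"]
    slope_b = b["write_nj"] - b["read_nj"]
    return (a["read_nj"] - b["read_nj"]) / (slope_b - slope_a)


if __name__ == "__main__":
    for w in (0.0, 0.1, 0.3, 0.5):
        print(w, mean_access(SRAM, w), mean_access(MRAM, w))
    print("break-even write fraction:", energy_break_even(SRAM, MRAM))
```

Under these assumptions the MRAM bank is cheaper per access only for almost read-only traffic (a write fraction below roughly 1%), which is consistent with the slide's "large write energy" caveat.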

Direct Replacement
- Replace SRAM with MRAM of the same area.
- The number of banks is kept the same.
- The capacity of the L2 cache increases by 4X.
[Figure: L2 cache miss rate]
- The L2 cache miss rate is reduced. How is the performance?

IPC Comparison (Direct Replacement)
[Figure: IPC, SRAM vs. MRAM]
- The last four benchmarks have high write intensities (see Observation 1).

Observation 1
- Replacing SRAM L2 caches directly with MRAM can reduce the miss rate of the L2 caches.
- However, the long write latency of the MRAM cache has a negative impact on performance. When the write intensity is high, it even results in performance degradation.
- Direct MRAM replacement may harm performance. How is the power consumption?

Power Analysis (Direct Replacement)
[Figure: total power, SRAM vs. MRAM, normalized to 2M-SRAM-SNUCA; MRAM bars split into dynamic power and leakage power]
- For some workloads, MRAM dynamic power dominates! (see Observation 2)

Observation 2
- Replacing SRAM L2 caches directly with MRAM can greatly reduce the leakage power.
- When the write intensity is high, the dynamic power increases significantly because of the high write energy of the MRAM cache.
- Question: how to improve the performance and further reduce the power of MRAM?
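The trade-off in Observation 2 can be illustrated with a simple model: total power is leakage plus access rate times mean energy per access. This is a sketch only, assuming the whole-cache leakage figures (2.09W and 0.26W) and the per-bank access energies from the SRAM vs. MRAM slide, with each access touching exactly one bank; the access rates and helper names are hypothetical.

```python
# Figures from the slides: leakage in W (whole cache), energy in nJ per bank access.
CACHES = {
    "2MB SRAM": {"leak_w": 2.09, "read_nj": 0.90, "write_nj": 0.80},
    "8MB MRAM": {"leak_w": 0.26, "read_nj": 0.86, "write_nj": 5.00},
}


def mean_energy_nj(cache, write_fraction):
    """Mean dynamic energy per access for a given read/write mix."""
    return (1.0 - write_fraction) * cache["read_nj"] + write_fraction * cache["write_nj"]


def total_power_w(cache, accesses_per_s, write_fraction):
    """Leakage plus dynamic power (1 nJ = 1e-9 J)."""
    return cache["leak_w"] + accesses_per_s * mean_energy_nj(cache, write_fraction) * 1e-9


def break_even_rate(write_fraction):
    """Access rate above which the MRAM cache draws more total power than the SRAM cache."""
    sram, mram = CACHES["2MB SRAM"], CACHES["8MB MRAM"]
    extra_nj = mean_energy_nj(mram, write_fraction) - mean_energy_nj(sram, write_fraction)
    if extra_nj <= 0:
        return float("inf")  # MRAM is never worse at this mix
    return (sram["leak_w"] - mram["leak_w"]) / (extra_nj * 1e-9)


if __name__ == "__main__":
    for w in (0.05, 0.3, 0.6):
        print(f"write fraction {w}: break-even at {break_even_rate(w):.3e} accesses/s")
```

In this model the leakage saving is a fixed budget, and write-heavy workloads spend it faster, which is one way to read the slide's remark that dynamic power can dominate.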

SRAM-MRAM Hybrid L2 Cache
[Figure: write intensity, pure MRAM vs. hybrid]
- Using a hybrid L2 cache, MRAM write intensities are reduced.

IPC Result
[Figure: IPC comparison, direct replacement vs. with read-preemptive]
- The performance degradation is eliminated.
- The average IPC is increased by 15%.

Power Result
[Figure: total power comparison for 8M-MRAM-DNUCA, direct replacement vs. with read-preemptive]
- The dynamic power is reduced.
- The average total power is further reduced by 17%.

Comparisons (SRAM / eDRAM / MRAM / PRAM)
- Density (ratio): Low (1) / High (4) / High (4) / High (16)
- Dynamic power: Low / Medium / Low for read, high for write / Medium for read, high for write
- Leakage power: High / Medium / Low / Low
- Speed: Very fast / Fast / Fast for read, slow for write / Slow for read, very slow for write
- Non-volatility: No / No / Yes / Yes
- Scalability: Yes / Yes / Yes / Yes
- Endurance: 10^16 / 10^16 / >10^15 / 10^8
Callouts on the slide: reduce cache miss rate; increase hit latency; low leakage power; high dynamic power.

No Such Ideal (One-Size-Fits-All) Memory
- No single memory technology has the best power-performance.
- A hybrid cache may outperform its single-technology counterpart.
[Figure: normalized IPC for 1M-SRAM, 4M-DRAM, 4M-MRAM, and 16M-PRAM across the benchmarks astar, bzip2, gcc, gobmk, h264, hmmer-sp, libquantum, mcf, omnetpp, perl, sjeng, blast, bt, cg, clustalw, hmmer, lu, mg, sp, ua, specjbb, dedup, fluidanimate, freqmine, streamcluster, plus the geometric mean]
[Figure: normalized power, split into static and dynamic, for the same configurations and benchmarks; off-scale bar labels on the slide: 1.88, 1.89]

HCA: Hybrid Cache Architecture
- Baseline: a 2D 8-core CMP (3-level SRAM caches).
[Figure: cache organizations labeled A through E.
2D design scenario: LHCA, with cores w/ L1s, L2 (SRAM), and L3 (eDRAM/MRAM/PRAM); RHCA, flattening L2 and L3 with a hybrid cache made of an L2 fast region (SRAM) and an L2 slow region (eDRAM/MRAM/PRAM).
3D design scenario (a cache design scenario with 3D chip integration): 3DHCA variants spread over 3D layer 1 and 3D layer 2, with L2 (SRAM), L3 (eDRAM/MRAM), and L4 (PRAM); one variant flattens L3 and L4 with a hybrid cache, another flattens L2, L3, and L4 with a hybrid cache of fast (SRAM), middle (eDRAM/MRAM), and slow (PRAM) regions.]

Hybrid Storage (HPCA 2010)
[Figure: hybrid architecture, physical view and structural view. Data region: NAND flash, organized in erase units. Log region: PRAM, supporting in-place updating at sector (512 bytes) granularity. Data buffer in memory.]
- How to manage the log region efficiently?
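The slide gives only the diagram labels, so the following toy model is a sketch under loud assumptions: updates go to a small PRAM log region (updated in place), reads consult the log first, and a full log is merged into the NAND flash data region. The class name, the merge-everything policy, and the counters are hypothetical and are not the management scheme of the HPCA 2010 paper.

```python
class HybridStore:
    """Toy model: NAND-flash data region plus a small PRAM log region."""

    def __init__(self, log_capacity):
        if log_capacity < 1:
            raise ValueError("log_capacity must be at least 1")
        self.flash = {}            # sector number -> data (data region)
        self.log = {}              # sector number -> newest data (log region)
        self.log_capacity = log_capacity
        self.flash_writes = 0      # sectors programmed into flash so far

    def write(self, sector, data):
        """Absorb the update in the PRAM log; merge first if the log is full."""
        if sector not in self.log and len(self.log) >= self.log_capacity:
            self.merge()
        self.log[sector] = data    # in-place update when the sector is already logged

    def read(self, sector):
        """The log holds the newest copy, so check it before the data region."""
        if sector in self.log:
            return self.log[sector]
        return self.flash.get(sector)

    def merge(self):
        """Hypothetical policy: flush every logged sector to flash, then empty the log."""
        for sector, data in self.log.items():
            self.flash[sector] = data
            self.flash_writes += 1
        self.log.clear()


if __name__ == "__main__":
    store = HybridStore(log_capacity=2)
    for i in range(100):
        store.write(7, f"version {i}".encode())   # repeated updates to one hot sector
    print(store.read(7), store.flash_writes)
```

Repeated updates to a hot sector stay in the log and cost no flash programming in this model, which is the intuition behind pairing a byte-addressable log with a block-erased data region.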


Fault Resiliency for Exascale System
- Microprocessors become unreliable: process scaling, voltage scaling, soft errors, NBTI, ...
- Even assuming socket MTTF remains constant: system MTTF = socket MTTF / number of sockets
  - 1 socket: socket MTTF = 5 years
  - Exascale, ~100,000 sockets: system MTTF = 26 minutes
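The 26-minute figure follows directly from the formula on the slide. A minimal sketch of the arithmetic, assuming 365-day years and that any single socket failure takes down the system; the function name is illustrative.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumes 365-day years


def system_mttf_minutes(socket_mttf_years, sockets):
    """System MTTF = socket MTTF / number of sockets, expressed in minutes."""
    if sockets < 1:
        raise ValueError("sockets must be at least 1")
    return socket_mttf_years * MINUTES_PER_YEAR / sockets


if __name__ == "__main__":
    # Slide values: 5-year socket MTTF, roughly 100,000 sockets at exascale.
    print(system_mttf_minutes(5, 100_000))  # about 26.3 minutes
```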

Checkpoint / Restart
- Checkpoint/restart is the state of the art.
- Hard disk drive (HDD) as the checkpoint storage; HDD peak bandwidth: ~100MB/s
- BlueGene/L: 12 mins to take a checkpoint, equivalent to 8% performance loss. Tolerable.
- Scale to exascale... Unacceptable!

PCRAM: A Good Candidate
- PCRAM is 2 orders of magnitude faster than flash.
- PCRAM has 3 orders of magnitude higher endurance than flash.
- Good candidate for local checkpointing.
Comparison (HDD / NAND flash / PCRAM):
- Cell size: - / 4-6F^2 / 4-6F^2
- Read time: ~4ms / 5us-50us / 10ns-100ns
- Write time: ~4ms / 2ms-3ms / 100-1000ns
- Standby power: ~1W / ~0W / ~0W
- Endurance: 10^15 / 10^5 / 10^8
Courtesy: Motoyuki Ooishi

3D PCRAM: How to Integrate PCRAM
- Deploy PCRAM directly on top of DRAM.
- Possible local bandwidth: ~2.5TB/s (DIMM bandwidth: ~10GB/s)
Parameters and values:
- Bank size: 32MB
- Mat count: 16
- Required TSV pitch: < 74um
- ITRS TSV pitch projection for 2012: 3.8um
- 3D-PCRAM delay: 0.8ms
- Equivalent bandwidth: 2500GB/s
(Collaboration with HP Labs, Exascale Computing Lab, Dr. Norm Jouppi, SC 2009)
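To put the bandwidth figures side by side, the sketch below computes how long one checkpoint copy would take at each quoted bandwidth. The slides do not state the checkpoint size; the 2 GB used here is only what the slide's 0.8ms delay and 2500GB/s bandwidth imply when multiplied together, and decimal units (1 GB = 10^9 bytes) are an assumption.

```python
MB = 1e6  # decimal units assumed
GB = 1e9

# Bandwidths quoted on the slides, in bytes per second.
BANDWIDTH = {
    "HDD (peak)": 100 * MB,
    "DIMM": 10 * GB,
    "3D PCRAM on DRAM": 2500 * GB,
}


def transfer_time_s(size_bytes, bandwidth_bytes_per_s):
    """Time to move size_bytes at a sustained bandwidth."""
    if bandwidth_bytes_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    return size_bytes / bandwidth_bytes_per_s


# Size implied by the slide's delay and equivalent bandwidth (not stated directly).
implied_size = 0.8e-3 * BANDWIDTH["3D PCRAM on DRAM"]

if __name__ == "__main__":
    print(f"implied checkpoint size: {implied_size / GB:.1f} GB")
    for name, bw in BANDWIDTH.items():
        print(f"{name}: {transfer_time_s(implied_size, bw):.4g} s")
```

Under these assumptions the same copy takes on the order of tens of seconds at HDD bandwidth and under a millisecond with the stacked PCRAM, which is the gap the slide is pointing at.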

Our Projection
[Figure: projection over the years 2008 to 2017]
(Collaboration with HP Labs, Exascale Computing Lab, Dr. Norm Jouppi, SC 2009)

More Details
http://www.cse.psu.edu/~yuanxie/3d.html
- Xiangyu Dong, X. Wu, Guangyu Sun, Yuan Xie, H. Li, Y. Chen, "Circuit and Microarchitecture Evaluation of 3D MRAM," DAC 2008.
- Xiangyu Dong, Norm Jouppi, Yuan Xie, "PCRAMsim: System-Level Performance, Energy, and Area Modeling for Phase-Change RAM," ICCAD 2009.
- G. Sun, X. Dong, Y. Xie, J. Li, Y. Chen, "Novel MRAM-Stacking Architecture for CMP," HPCA 2009.
- Xiaoxia Wu, J. Li, L. Zhang, E. Speight, Yuan Xie, "Hybrid Cache Architecture with Disparate Memory Technologies," ISCA 2009.
- Guangyu Sun, Y. Joo, Y. Chen, Yuan Xie, Y. Chen, H. Li, "A Hybrid Solid-State Storage Architecture for Performance, Energy Consumption and Lifetime Improvement," HPCA 2010.
- Y. Joo, D. Niu, Guangyu Sun, Xiangyu Dong, Y. Xie, "Energy- and Endurance-Aware Design of PCRAM Caches," DATE 2010.
- Xiangyu Dong, N. Muralimanohar, Norm Jouppi, Richard Kaufmann, Yuan Xie, "Leveraging 3D PCRAM Technologies to Reduce Checkpoint Overhead for Future Exascale Systems," SC 2009.

Conclusion
- Emerging NVM technologies are very attractive: they combine the speed of SRAM, the density of DRAM, and the non-volatility of Flash memory.
- Attractive features: high density, low leakage, non-volatility.
- Undesirable features:
  - Write-related: long write latency, high write energy, low endurance (e.g., PCRAM)
  - Cost (needs large-volume production)
- Solution: hybrid cache/memory/storage + 3D?
- Enabling unique applications