MANAGING MULTI-TIERED NON-VOLATILE MEMORY SYSTEMS FOR COST AND PERFORMANCE 8/9/16


THE DATA CHALLENGE

4.4 ZB of data was created, replicated, and consumed in a single year, on the way to 44 ZB (source: IDC, 2014).

[Chart: relative performance improvement, 2000-2015, log scale from 1 to 1000. New storage class memory (SCM) NVMs (MRAM, ReRAM, 3D-Xpoint, CNT) improve far faster than HDD throughput.]

MEMORY HIERARCHIES ARE STILL NEEDED WITH NEW NVMS

[Chart: performance (access speed, 1x to 1,000,000x) vs. cost ($/GB, 10x to 10,000x): HDD, Flash, DRAM, and SRAM, with the new SCM NVMs (MRAM, ReRAM, 3D-Xpoint, ...) falling between Flash and DRAM.]

There is only a ~10X access-speed delta from SRAM to DRAM, but a ~1,000X delta from DRAM to Flash. How to achieve a hierarchy across such a gap? The SSD/HDD hybrid is one existing answer.

BRIDGING THE GAP BETWEEN HDD AND SSD

Access times: SSD 100 us, HDD 10 ms.

Read flow: on a read, the cache engine performs an address lookup (t1). If the data is in the SSD (hit), it is read from the SSD; on a miss, the address lookup and data replacement path (t2) is taken and the data is read from the HDD.

Average access time = t1 + Hit Rate x SSD access time + Miss Rate x (t2 + HDD access time)

Design requirement: the hybrid drive should carry only a 1-2% access-time penalty compared to a pure SSD.

TRADEOFFS BETWEEN HIT RATE, CACHE ENGINE DELAY, AND PERFORMANCE

Miss rate required for a given delay penalty relative to a pure SSD (SSD access 100 us, HDD access 10 ms):

1% performance (delay) penalty:
    t1 = 1 us,   t2 = 100 us:  0 (not possible)
    t1 = 0.5 us, t2 = 50 us:   miss rate <= 0.0050%
    t1 = 0.1 us, t2 = 10 us:   miss rate <= 0.0091%

2% performance (delay) penalty:
    t1 = 1 us,   t2 = 100 us:  miss rate <= 0.0100%
    t1 = 0.5 us, t2 = 50 us:   miss rate <= 0.0151%
    t1 = 0.1 us, t2 = 10 us:   miss rate <= 0.0192%

Implications:
- t1 must be much smaller than the SSD access time, so hardware-based lookup is required.
- t2 only needs to be much smaller than the HDD access time, so a software solution is acceptable there.
- A very high hit rate is required because of the large performance gap between SSD and HDD, which calls for a large cache with high associativity (i.e., data can be placed anywhere in the cache).
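The miss-rate requirements above come from inverting the average-access-time formula. A minimal sketch (times in microseconds, using the SSD and HDD figures from these slides; the function names are my own):

```python
def avg_access_time_us(t1, t2, miss_rate, ssd_us=100.0, hdd_us=10_000.0):
    """Average read latency: t1 + hit rate * SSD time + miss rate * (t2 + HDD time)."""
    return t1 + (1.0 - miss_rate) * ssd_us + miss_rate * (t2 + hdd_us)

def max_miss_rate(t1, t2, penalty, ssd_us=100.0, hdd_us=10_000.0):
    """Largest miss rate keeping the average within (1 + penalty) * SSD latency.

    Returns None when the lookup cost t1 alone already exceeds the delay budget.
    """
    slack = penalty * ssd_us - t1          # budget left after paying for the lookup
    if slack < 0:
        return None
    return slack / (t2 + hdd_us - ssd_us)

# 2% penalty, t1 = 1 us, t2 = 100 us -> 0.0100%, matching the table above
print(f"{max_miss_rate(1.0, 100.0, 0.02):.4%}")
```

At that miss rate the average comes back to exactly the 102 us budget, which is how each table entry can be cross-checked.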

CURRENT MEMORY HIERARCHIES

                          L1 cache    L2 cache    Page (DRAM)         HDD
    # of entries          100s        10,000s     millions            100s of millions
    Block placement       2-4 way     4-8 way     fully associative   -
    Lookup/miss handling  Hardware    Hardware    Hardware/Software   Software

Moving down the hierarchy, associativity and hit rate increase while access performance falls and cost per bit drops.

Source: https://courses.cs.washington.edu/courses/cse378/09au/

USE STORAGE CLASS MEMORY NVM AS CACHE

Access times: DRAM 20 ns, storage class memory (SCM) NVM 200 ns, SLC flash 20 us.

Read flow: address lookup (t1); on a hit, read from DRAM; on a miss, perform address lookup and data replacement (t2) and read from the SCM NVM or SLC flash.

Achieving a 0.1% DRAM access-delay penalty requires both a low miss rate and a low-overhead cache engine:

    Miss rate           0.005%        0.001%        0.0001%       0.00005%
    t2, SCM NVM (us)    0.2           1.8           19.8          39.8
    t2, SLC (us)        not possible  not possible  not possible  20.0
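The t2 budgets in the table follow from the same average-latency algebra as the SSD/HDD case, this time solved for the engine delay t2. A small sketch (times in nanoseconds, with the 0.1% penalty taken against the 20 ns DRAM latency; function name is my own):

```python
def max_t2_ns(miss_rate, backing_access_ns, dram_ns=20.0, penalty=0.001):
    """Largest miss-path delay t2 such that the added average latency,
    miss_rate * (t2 + backing-store access time), stays within penalty * DRAM latency."""
    t2 = penalty * dram_ns / miss_rate - backing_access_ns
    return t2 if t2 > 0 else None   # None: unreachable even with a zero-delay engine

# SCM NVM (200 ns access) at a 0.001% miss rate leaves a t2 budget of ~1.8 us;
# SLC flash (20 us access) only gains a budget (~20 us) at a 0.00005% miss rate.
print(max_t2_ns(0.00001, 200.0))       # SCM NVM
print(max_t2_ns(0.0000005, 20_000.0))  # SLC flash
```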

HARDWARE CACHE ENGINE IS NEEDED FOR SCM NVM

[Bar chart: relative time split (0-100%) between NVM access and cache-engine overhead for three configurations: Flash with a software cache engine, SCM NVM with a software cache engine, and SCM NVM with a hardware cache engine. With Flash, a software cache engine is a small fraction of the access time; with fast SCM NVMs it dominates, so a hardware cache engine is needed.]

UTILIZING SCM NVMs AS A CACHE

                          L1 cache    L2 cache    Page (DRAM)          SCM NVM             SSD
    # of entries          100s        10,000s     100,000s - millions  millions            100s of millions
    Block placement       2-4 way     4-8 way     fully associative    fully associative   -
    Lookup/miss handling  Hardware    Hardware    Hardware             Hardware            Software

DESIGN CHALLENGE: a high hit rate demands a large, fully set associative cache, while low overhead favors a small, non-set-associative one.

MARVELL FINAL LEVEL CACHE (FLC) CACHE ENGINE

Low miss rate:
- Supports a very large cache (GBs)
- Large cache line (16 KB or larger)
- Fully set associative

Low overhead:
- Pseudo-CAM based design
- Hardware-based pseudo-LRU replacement scheme
- Under 10 clock cycles of latency
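Pseudo-LRU is what makes a hardware replacement scheme cheap: instead of a full recency ordering, a binary tree keeps one bit per node, so an access update and a victim pick each touch only log2(ways) bits. A generic tree-PLRU sketch (illustrative only; the class and its structure are my own, not Marvell's actual FLC design):

```python
class TreePLRU:
    """Tree-based pseudo-LRU state for one set of `ways` cache lines."""

    def __init__(self, ways):
        assert ways > 1 and (ways & (ways - 1)) == 0, "ways must be a power of two"
        self.ways = ways
        self.bits = [0] * (ways - 1)   # one bit per internal node of a binary tree

    def touch(self, way):
        """Record an access: point every node on the path away from `way`."""
        node, lo, hi = 0, 0, self.ways
        while hi - lo > 1:
            mid = (lo + hi) // 2
            if way < mid:
                self.bits[node] = 1            # 1 = victim search should go right
                node, hi = 2 * node + 1, mid
            else:
                self.bits[node] = 0            # 0 = victim search should go left
                node, lo = 2 * node + 2, mid

    def victim(self):
        """Follow the node bits down the tree to a pseudo-least-recently-used way."""
        node, lo, hi = 0, 0, self.ways
        while hi - lo > 1:
            mid = (lo + hi) // 2
            if self.bits[node] == 0:
                node, hi = 2 * node + 1, mid
            else:
                node, lo = 2 * node + 2, mid
        return lo

plru = TreePLRU(4)
for way in (0, 1, 2, 3):
    plru.touch(way)
print(plru.victim())   # prints 0: way 0 is the pseudo-least-recently-used line
```

Touching ways 0 through 3 in order leaves way 0 as the victim, matching true LRU here; in general pseudo-LRU only approximates the LRU ordering, which is the price of the one-bit-per-node cost.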

PERFORMANCE OF FLC (TESTED ON HYBRID HDD)

[Bar charts: PCMark8 score (axis 0-5000) and PCMark8 bandwidth (axis 0-160) for an HDD, a hybrid HDD with FLC, and an SSD.]

LOWER TIERS SHOULD FOCUS ON COST

Today:     Register -> L1/L2/L3 cache -> DRAM -> SSD -> HDD
With SCM:  Register -> L1/L2/L3 cache -> DRAM -> SCM-NVM -> SSD -> HDD

Once SCM-NVM sits between DRAM and SSD, the tiers below it (SSD, HDD) should be optimized for one thing: LOWER THE COST!

ECONOMY OF CHIP DESIGN DOESN'T WORK OUT

UNIT COST = (R&D COST + MASK COST + PACKAGE NRE) / # OF CHIPS + PER-DIE COST + PER-PACKAGE COST

- R&D cost: increasing
- Mask cost: increasing
- Package NRE: increasing
- # of chips per design: decreasing
- Per-die cost: decreasing
- Per-package cost: flat or decreasing

Source: http://semiengineering.com/how-much-will-that-chip-cost/

NEED HIGH VOLUME TO AMORTIZE DEVELOPMENT COST

Assuming a derivative controller with $40M of R&D cost and $7.5M of mask and packaging NRE:

[Chart: amortized R&D and mask cost per chip vs. volume, falling from $47.50 at 1,000,000 chips to about $1.58 at 30,000,000 chips.]
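The curve is just the up-front development cost spread over volume; a quick sketch with the slide's $40M R&D and $7.5M NRE assumptions (function name is my own):

```python
def amortized_nre_per_chip(volume, rnd_cost=40e6, mask_pkg_nre=7.5e6):
    """Up-front development cost (R&D + mask/packaging NRE) per chip shipped."""
    return (rnd_cost + mask_pkg_nre) / volume

# The four volumes on the chart's x-axis:
for n in (1_000_000, 3_000_000, 10_000_000, 30_000_000):
    print(f"{n:>10,} chips: ${amortized_nre_per_chip(n):6.2f} per chip")
```

Going from 1M to 30M units cuts the amortized cost thirtyfold, from $47.50 to about $1.58 per chip, which is the case for sharing one design across volumes.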

EXPANDABLE SSD CONTROLLERS

Two dedicated designs, an 8-channel and a 16-channel SATA controller (each fronting its own NAND array), cost $50M + $20M to develop. Alternatively, a single 8-channel SATA controller design with an 8 Gbps MCi* port serves the 8-channel case alone, and two of them chained over MCi provide 16 channels, for a single $50M development cost.

*MCi: MoChi Interconnect

MoChi INTERCONNECT (MCi) IS TRANSPARENT

[Diagram: an SoC with a native SATA3 port vs. an SoC whose MCi port drives a MoChi-SATA bridge to SATA3.]

                        READ bandwidth (MB/s)   WRITE bandwidth (MB/s)   R/W mixed bandwidth (MB/s)
    Native SATA         515                     490                      500
    SATA through MCi    492                     510                      501

CONCLUSIONS

- It's all about data access: get data when needed, at the right cost.
- Multi-tiered design is essential in storage systems, and even more so with the new SCM NVMs.
- A hardware-based cache engine is needed for the high-performance tiers.
- Expandable controllers can help reduce development cost for the lower tiers.