Extending RDMA for Persistent Memory over Fabrics. Live Webcast October 25, 2018

Similar documents
Persistent Memory Over Fabrics. Paul Grun, Cray Inc Stephen Bates, Eideticom Rob Davis, Mellanox Technologies

Persistent Memory over Fabrics

Everything You Wanted To Know About Storage: Part Teal The Buffering Pod

Architectural Principles for Networked Solid State Storage Access

SNIA Tutorial 3 EVERYTHING YOU WANTED TO KNOW ABOUT STORAGE: Part Teal Queues, Caches and Buffers

Remote Persistent Memory SNIA Nonvolatile Memory Programming TWG

Planning For Persistent Memory In The Data Center. Sarah Jelinek/Intel Corporation

Multi-Cloud Storage: Addressing the Need for Portability and Interoperability

Mark Rogov, Dell EMC Chris Conniff, Dell EMC. Feb 14, 2018

Fibre Channel vs. iscsi. January 31, 2018

RoCE vs. iwarp A Great Storage Debate. Live Webcast August 22, :00 am PT

SNIA Tutorial 1 A CASE FOR FLASH STORAGE HOW TO CHOOSE FLASH STORAGE FOR YOUR APPLICATIONS

Adrian Proctor Vice President, Marketing Viking Technology

RDMA Requirements for High Availability in the NVM Programming Model

EVERYTHING YOU WANTED TO KNOW ABOUT STORAGE, BUT WERE TOO PROUD TO ASK Where Does My Data Go? August 1, :00 am PT

SNIA NVM Programming Model Workgroup Update. #OFADevWorkshop

PERSISTENT MEMORY PROGRAMMING

Remote Persistent Memory With Nothing But Net Tom Talpey Microsoft

Trends in Worldwide Media and Entertainment Storage

N V M e o v e r F a b r i c s -

Accessing NVM Locally and over RDMA Challenges and Opportunities

LEVERAGING FLASH MEMORY in ENTERPRISE STORAGE

Use Cases for iscsi and FCoE: Where Each Makes Sense

Persistent Memory over Fabric (PMoF) Adding RDMA to Persistent Memory Pawel Szymanski Intel Corporation

REMOTE PERSISTENT MEMORY ACCESS WORKLOAD SCENARIOS AND RDMA SEMANTICS

LTFS Bulk Transfer Standard PRESENTATION TITLE GOES HERE

Opportunities from our Compute, Network, and Storage Inflection Points

Overview and Current Topics in Solid State Storage

SMB3 Extensions for Low Latency. Tom Talpey Microsoft May 12, 2016

Overview and Current Topics in Solid State Storage

Design Considerations When Implementing NVM

What NVMe /TCP Means for Networked Storage. Live Webcast January 22, 2019

NVMe SSDs with Persistent Memory Regions

Highly Scalable, Non-RDMA NVMe Fabric. Bob Hansen,, VP System Architecture

PRESENTATION TITLE GOES HERE. Understanding Architectural Trade-offs in Object Storage Technologies

2017 Storage Developer Conference. Mellanox Technologies. All Rights Reserved.

Jeff Dodson / Avago Technologies

Windows Support for PM. Tom Talpey, Microsoft

Application Access to Persistent Memory The State of the Nation(s)!

Ron Emerick, Oracle Corporation

The SNIA NVM Programming Model. #OFADevWorkshop

Accelerating Real-Time Big Data. Breaking the limitations of captive NVMe storage

Optimize Storage Efficiency & Performance with Erasure Coding Hardware Offload. Dror Goldenberg VP Software Architecture Mellanox Technologies

Everything You Wanted To Know About Storage (But Were Too Proud To Ask) The Basics

Annual Update on Flash Memory for Non-Technologists

Using NVDIMM under KVM. Applications of persistent memory in virtualization

SAS: Today s Fast and Flexible Storage Fabric. Rick Kutcipal President, SCSI Trade Association Product Planning and Architecture, Broadcom Limited

NVMe Over Fabrics (NVMe-oF)

The Benefits of Solid State in Enterprise Storage Systems. David Dale, NetApp

Why Composable Infrastructure? Live Webcast February 13, :00 am PT

Tiered File System without Tiers. Laura Shepard, Isilon

SAS: Today s Fast and Flexible Storage Fabric

Scaling Data Center Application Infrastructure. Gary Orenstein, Gear6

Enabling the NVMe CMB and PMR Ecosystem

Paving the Way to the Non-Volatile Memory Frontier. PRESENTATION TITLE GOES HERE Doug Voigt HP

NVMe over Fabrics. High Performance SSDs networked over Ethernet. Rob Davis Vice President Storage Technology, Mellanox

InfiniBand Networked Flash Storage

Facing an SSS Decision? SNIA Efforts to Evaluate SSS Performance. Ray Lucchesi Silverton Consulting, Inc.

Virtualization Practices:

Interoperable Cloud Storage with the CDMI Standard. Mark Carlson, SNIA TC and Oracle Co-Chair, SNIA Cloud Storage TWG

Notes & Lessons Learned from a Field Engineer. Robert M. Smith, Microsoft

Architected for Performance. NVMe over Fabrics. September 20 th, Brandon Hoff, Broadcom.

Persistent Memory, NVM Programming Model, and NVDIMMs. Presented at Storage Field Day June 15, 2017

Windows Support for PM. Tom Talpey, Microsoft

PCIe Storage Beyond SSDs

Accelerating Microsoft SQL Server Performance With NVDIMM-N on Dell EMC PowerEdge R740

Performance and Innovation of Storage. Advances through SCSI Express

What You can Do with NVDIMMs. Rob Peglar President, Advanced Computation and Storage LLC

Trends in Data Protection and Restoration Technologies. Mike Fishman, EMC 2 Corporation

Mobile and Secure Healthcare: Encrypted Objects and Access Control Delegation

ADVANCED DATA REDUCTION CONCEPTS

EVERYTHING YOU WANTED TO KNOW ABOUT STORAGE, BUT WERE TOO PROUD TO ASK Part Cyan Storage Management. September 28, :00 am PT

How Might Recently Formed System Interconnect Consortia Affect PM? Doug Voigt, SNIA TC

THE IN-PLACE WORKING STORAGE TIER OPPORTUNITIES FOR SOFTWARE INNOVATORS KEN GIBSON, INTEL, DIRECTOR MEMORY SW ARCHITECTURE

NVMe over Universal RDMA Fabrics

Industry Collaboration and Innovation

Microsoft SMB Looking Forward. Tom Talpey Microsoft

Flash Memory Summit Persistent Memory - NVDIMMs

Create a Smarter and More Economic Cloud Storage Architecture. November 7, 2018

12th ANNUAL WORKSHOP 2016 NVME OVER FABRICS. Presented by Phil Cayton Intel Corporation. April 6th, 2016

The NE010 iwarp Adapter

Impact on Application Development: SNIA NVM Programming Model in the Real World. Andy Rudoff pmem SW Architect, Intel

Important new NVMe features for optimizing the data pipeline

Realizing the Next Generation of Exabyte-scale Persistent Memory-Centric Architectures and Memory Fabrics

Evolution of Rack Scale Architecture Storage

Introducing NVDIMM-X: Designed to be the World s Fastest NAND-Based SSD Architecture and a Platform for the Next Generation of New Media SSDs

Remote Access to Ultra-Low-Latency Storage Tom Talpey Microsoft

Virtualization Practices: Providing a Complete Virtual Solution in a Box

Hewlett Packard Enterprise HPE GEN10 PERSISTENT MEMORY PERFORMANCE THROUGH PERSISTENCE

Low-Overhead Flash Disaggregation via NVMe-over-Fabrics

NVDIMM-N Cookbook: A Soup-to-Nuts Primer on Using NVDIMM-Ns to Improve Your Storage Performance

Application Acceleration Beyond Flash Storage

ADVANCED DEDUPLICATION CONCEPTS. Thomas Rivera, BlueArc Gene Nagle, Exar

FMS18 Invited Session 101-B1 Hardware Acceleration Techniques for NVMe-over-Fabric

Overcoming System Memory Challenges with Persistent Memory and NVDIMM-P

OpenStack Manila An Overview of Manila Liberty & Mitaka

Storage in combined service/product data infrastructures. Craig Dunwoody CTO, GraphStream Incorporated

Low-Overhead Flash Disaggregation via NVMe-over-Fabrics Vijay Balakrishnan Memory Solutions Lab. Samsung Semiconductor, Inc.

Apples to Apples, Pears to Pears in SSS performance Benchmarking. Esther Spanjer, SMART Modular

OpenCAPI Technology. Myron Slota Speaker name, Title OpenCAPI Consortium Company/Organization Name. Join the Conversation #OpenPOWERSummit

Transcription:

Extending RDMA for Persistent Memory over Fabrics Live Webcast October 25, 2018

Today s Presenters John Kim SNIA NSF Chair Mellanox Tony Hurson Intel Rob Davis Mellanox

SNIA-At-A-Glance 3

SNIA Legal Notice The material contained in this presentation is copyrighted by the SNIA unless otherwise noted. Member companies and individual members may use this material in presentations and literature under the following conditions: Any slide or slides used must be reproduced in their entirety without modification The SNIA must be acknowledged as the source of any material used in the body of any document containing material from these presentations. This presentation is a project of the SNIA. Neither the author nor the presenter is an attorney and nothing in this presentation is intended to be, or should be construed as legal advice or an opinion of counsel. If you need legal advice or a legal opinion please contact your attorney. The information presented herein represents the author's personal opinion and current understanding of the relevant issues involved. The author, the presenter, and the SNIA do not assume any responsibility or liability for damages arising out of any reliance on or use of this information. NO WARRANTIES, EXPRESS OR IMPLIED. USE AT YOUR OWN RISK.

What is Persistent Memory (PM)? Persistent Memory (PM) benefits Considerably faster than NAND Flash Performance accessible via PCIe or DDR interfaces Lower cost/bit than DRAM Significantly denser than DRAM non-volatile memory PM 5

For Many Storage Applications PM Needs Network Connectivity PM NAND Read Latency ~100ns ~20us Write Latency ~500ns ~50us 6

Applications Benefitting RPM In-Memory application scaleout Software Defined Storage Big Data & NoSQL workloads Machine learning Hyperconverged Infrastructure Complete compute disaggregation Network 7

Compute Disaggregation PM Apps 8

Core Requirements for Networked PM PM NAND Read Latency ~100ns ~20us Write Latency ~500ns ~50us Network PM is really fast Needs ultra low-latency networking PM has very high bandwidth Needs ultra efficient protocol, transport offload, high bandwidth networks Remote accesses must not add significant latency PM networking requires: Predictability, Fairness, Zero Packet Loss Network switches and adapters must deliver all of these 9

What is RDMA? Remote Data transfers between nodes in a network Direct No Operating System Kernel involvement in transfers Everything about a transfer offloaded onto Interface Card Memory Transfers between user space application virtual memory No extra copying or buffering Access Send, receive, read, write, atomic operations Byte or Block Access 10

RDMA Standards Development RDMA extensions for RPM being standardized in: InfiniBand Trade Association (IBTA): native IB, RoCE Internet Engineering Task Force (IETF): iwarp IBTA/IETF names/terminology may differ: Commit : Flush Atomic Write : Non-Posted Write However, extension semantics are similar, such that - IETF will use same Verbs extensions as IBTA 11

Application 1: Database Persistent Log Replicated log data plus write pointer (wptr) updates, to remote persistent FIFOs CPU Server 0 NVDIMM Client Database Application Network RNIC wptr FIFO Server 1 NVDIMM CPU RNIC Logs are remote persistent FIFOs wptr FIFO Each log entry write: data (~4 KB), followed by write pointer (wptr) update 12

Application 2: Hot Tier Hyperscale or Hyperconverged Storage Compute Node 0 Compute Node 1 Write Replication, Data plus Metadata Network PCIe CPU RNIC NVDIMM metadata data SSD With PMR Storage data writes, replicated for availability: latency-critical, but replicas seldom used Great application for emerging NVMe Persistent Memory Regions (PMRs) Target RNIC writes data directly to SSD PMR, bypassing CPU/memory subsystem Accompanying metadata may be processed by target CPU 13

Why Simple RDMA Writes are Not Enough Client Persistent storage app.. Volatile buffer Network PCIe switch RNIC CPU Race Hazard Server NVDIMM Target buffer Server RNIC acknowledges (ACK) Write as soon as validates it At server, there may be multiple volatile intermediate buffers between RNIC and persistent memory ACK might race back to complete Write at client, before actual data makes it to persistence Bad news, if Write subsequently fails within server Client requirement: explicit confirmation of persistence, to follow Write 14

The Need for RPM Write Ordering Near Socket Far Socket DRAM NVDIMM FIFO tail ptr 2) Write pointer CPU PCIe RNIC Inter socket link 1) Data CPU DRAM NVDIMM FIFO data Server/Target System Remote persistent FIFO, with data NVDIMM further from RNIC than write pointer NVDIMM Without explicit write ordering at RNIC, pointer update may race ahead of data update Result upon server failure: corrupt FIFO 15

RDMA Extensions Summary Three new RDMA Messages: Commit Request/Response: confirms prior Writes have been flushed and committed to persistence Atomic Write Request/Response: small (8B) all-or-nothing Write, with explicit response Verify Request/Response (IETF only): hash-based integrity check of persistent data New Persistence property for data Memory Regions (MRs) New API (Verbs) extensions for Messages and MRs 16

Accelerating RPM Writes Client Best Case Existing Server RNIC Server Mem. Ctlr Client NEW Server RNIC stall Server Mem. Ctlr New RDMA Messages eliminate one serialized round-trip-time in network Commit guarantees persistence (unlike plain Reads on some platforms) New Atomic Write guarantees ordering of successive RPM writes (Diagram uses IETF terminology) 17

Matches SNIA NVM PM Model RDMA to PMEM for High Availability App: SW PeerA: Host SW PeerANIC: RNic PeerBNIC: RNic PeerB: Host SW PeerB PM: PM MAP Memory address for the file Memory address + Registration of the replication SYNC Write all the dirty pages to remote replication FLUSH (IBTA or Commit, IETF) the writes to persistency UNMAP Invalidate the registered pages for replication Map Store Store Store Optimized Flush Unmap RDMAOpen RDMAMmap RDMAWrite 1 RDMAWrite RDMAWrite RDMAWrite RDMAWrite RDMAWrite Flush Flush 2 RDMAUnmap register Memory Flush unregister Memory Write Write Write Flush 3 18

HGST PM Remote Access Demo HGST live demo at Flash Memory Summit RPM(=PCM) based Key-Value store over 100Gb/s RDMA 19

Application Level Performance PCM is slightly slower than DRAM but equivalent application perf https://www.hgst.com/company/media-room/press-releases/hgst-to-demo-inmemory-flash-fabricand-lead-discussions 20

After This Webcast Please rate this webcast and provide us with feedback This webcast and a PDF of the slides will be posted to the SNIA Networking Storage Forum (NSF) website and available on-demand at www.snia.org/library A full Q&A from this webcast, including answers to questions we couldn't get to today, will be posted to the SNIA-NSF blog: sniansfblog.org Follow us on Twitter @SNIANSF 2

Thanks!