Persistent Memory over Fabric (PMoF): Adding RDMA to Persistent Memory
Pawel Szymanski, Intel Corporation


Agenda
- PMoF overview
- Comparison with other remote replication technologies
- RPMEM library capabilities
- NVML architecture with RPMEM library
- RPMEM library API
- Remote replication using PMEMOBJ library
- Future improvements

Persistent Memory over Fabric (PMoF) Overview
- Enables low-latency, high-speed remote replication of persistent memory over various fabrics (IB, RoCE, iWARP, Omni-Path, etc.)
- Transport agnostic by using RDMA Verbs
- Many non-volatile byte-addressable devices are in scope (NVDIMMs, 3DXP DIMMs)
- Does not support replication of traditional block-based storage

PMoF comparison with other remote replication storage technologies
[Figure: three initiator-to-target data paths, compared side by side]
- TCP/IP-based replication to block storage
  - Initiator node: APP, DRAM, kernel network stack, NIC (PCIe/DDR)
  - Target node: NIC (PCIe/DDR), kernel network stack, DRAM, kernel storage stack, block storage (SSD, HDD, etc.) over PCIe/NVMe/SATA
- RDMA-based replication to block storage (e.g. NVMe-oF)
  - Initiator node: APP, DRAM, PCIe/DDR
  - Target node: PCIe/DDR, DRAM, kernel storage stack, block storage (SSD, HDD, etc.) over PCIe/NVMe/SATA
- PMoF-based replication to persistent memory
  - Initiator node: APP, DRAM/PM, PCIe/DDR
  - Target node: PCIe/DDR, persistent memory (e.g. NVDIMM)

RPMEM library
- PMoF implementation in NVML
- Fabric agnostic: can use any fabric with libfabric and libverbs support
- Supports the GPSPM and APM persistency methods*
  - GPSPM uses RDMA Send
  - APM uses RDMA Read
- Uses multiple RDMA queue pairs for highly parallel workloads
  - Each application thread can replicate data independently, with no need for inter-thread synchronization
- Includes the RPMEMD daemon process, to be run on the remote replica node
  - No need to implement an application-specific target node service
* For details, refer to the SNIA SDC 2015 presentation by Chet Douglas: https://www.snia.org/sites/default/files/sdc15_presentations/persistant_mem/chetdouglas_rdma_with_pm.pdf

NVML Architecture with RPMEM
[Figure: software stacks on the initiator and target nodes, connected by the fabric]
- Initiator node: Application; NVML (LIBPMEMBLOCK / LIBPMEMLOG, LIBPMEMOBJ, LIBPMEM, LIBRPMEM); DAX-enabled file system; PMEM; libfabric, libverbs, SSH
- Fabric
- Target node: RPMEMD daemon; LIBPMEM; libfabric, libverbs, SSH; DAX-enabled file system; PMEM

RPMEM Library API: remote poolset management functions
- rpmem_create()
  - Starts the RPMEMD process on a single remote node using SSH
  - Requests the remote RPMEMD to create a poolset
  - Establishes an RDMA connection to the remote node
  - Registers the local and remote poolsets (persistent memory) in libfabric
- rpmem_open()
  - Same as rpmem_create(), but opens an existing poolset and verifies that it matches the local poolset

RPMEM Library API: poolset management functions
- rpmem_close()
  - Deregisters the local and remote poolsets from libfabric/verbs
  - Disconnects the RDMA connection
  - Shuts down the RPMEMD process on the remote node
- rpmem_remove()
  - Same as rpmem_close(), but also removes the poolset on the remote node

RPMEM Library API: memory replication functions
- rpmem_persist()
  - Replicates data from the local poolset to the remote poolset using RDMA
  - Lets the caller specify the data offset and size within the pool
- rpmem_read()
  - Copies data from the remote poolset to local memory (either local persistent memory or regular DRAM) using RDMA Read
  - Can be used to verify the correctness of the remote replica or to recover the local poolset from the remote replica
  - Lets the caller specify the data offset and size within the pool
  - Does not persist data locally (no CPU cache flush, etc.)
- Refer to pmem.io for a more detailed description
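The offset-and-size addressing described above can be pictured with a small in-process model. This is only a sketch: `ToyReplica` and its methods are hypothetical names made up for illustration, the two "pools" are plain byte arrays, and nothing here calls librpmem or performs RDMA.

```python
class ToyReplica:
    """Toy in-process model of a local pool mirrored to a remote pool.

    Hypothetical helper for illustration only; it is not the librpmem API.
    """

    def __init__(self, pool_size: int) -> None:
        self.local = bytearray(pool_size)   # stands in for the local poolset
        self.remote = bytearray(pool_size)  # stands in for the remote poolset

    def _check(self, offset: int, length: int) -> None:
        # Both operations address a byte range inside the pool.
        if offset < 0 or length < 0 or offset + length > len(self.local):
            raise ValueError("range falls outside the pool")

    def persist(self, offset: int, length: int) -> None:
        """Copy local[offset:offset+length] to the same range of the remote pool."""
        self._check(offset, length)
        self.remote[offset:offset + length] = self.local[offset:offset + length]

    def read(self, offset: int, length: int) -> bytes:
        """Copy a range back from the remote pool; nothing is flushed locally."""
        self._check(offset, length)
        return bytes(self.remote[offset:offset + length])


pool = ToyReplica(pool_size=64)
pool.local[8:13] = b"hello"
pool.persist(offset=8, length=5)

# Simulate losing the local copy, then recover it from the remote replica.
pool.local[8:13] = bytes(5)
pool.local[8:13] = pool.read(offset=8, length=5)
```

In this model, recovery is just a read of the same range that was persisted earlier, which mirrors the "recover local poolset from remote replica" use case on the slide.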

Remote replication in PMEMOBJ
- PMEMOBJ can automatically replicate any writes to persistent memory using RPMEM
- The replication process is transparent to the application
- Just add one or more remote replicas to the pool set definition file:

    PMEMPOOLSET
    100G /mountpoint0/myfile.part0

    # local replica
    REPLICA
    100G /mountpoint3/mymirror.part0

    # remote replica
    REPLICA user@example.com remote-objpool.set
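To make the structure of that pool set file explicit, here is a minimal sketch of a reader for it. The function `parse_poolset` is a hypothetical helper, not NVML's own parser, and it understands only the constructs shown on the slide: the `PMEMPOOLSET` header, `size path` part lines, a bare `REPLICA` line for a local replica, and `REPLICA target poolset-file` for a remote one.

```python
POOLSET_TEXT = """\
PMEMPOOLSET
100G /mountpoint0/myfile.part0

# local replica
REPLICA
100G /mountpoint3/mymirror.part0

# remote replica
REPLICA user@example.com remote-objpool.set
"""


def parse_poolset(text):
    """Return a list of replicas; the first entry is the primary pool."""
    # Drop comments and blank lines before interpreting anything.
    lines = [ln.split("#", 1)[0].strip() for ln in text.splitlines()]
    lines = [ln for ln in lines if ln]
    if not lines or lines[0] != "PMEMPOOLSET":
        raise ValueError("missing PMEMPOOLSET header")

    replicas = [{"kind": "local", "parts": []}]
    for ln in lines[1:]:
        fields = ln.split()
        if fields[0] == "REPLICA":
            if len(fields) == 1:
                # Bare REPLICA: the part lines that follow describe a local mirror.
                replicas.append({"kind": "local", "parts": []})
            elif len(fields) == 3:
                # REPLICA <target> <poolset file on the target node>
                replicas.append(
                    {"kind": "remote", "target": fields[1], "poolset": fields[2]}
                )
            else:
                raise ValueError(f"unrecognized REPLICA line: {ln!r}")
        elif len(fields) == 2:
            current = replicas[-1]
            if current["kind"] != "local":
                raise ValueError("part lines are only valid in local replicas")
            # Sizes are kept as written (e.g. "100G") rather than converted.
            current["parts"].append({"size": fields[0], "path": fields[1]})
        else:
            raise ValueError(f"unrecognized line: {ln!r}")
    return replicas


replicas = parse_poolset(POOLSET_TEXT)
```

For the file on the slide, this yields three entries: the primary pool, one local replica, and one remote replica whose parts are defined by a separate pool set file on the target node.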

Remote replication in PMEMOBJ
[Figure: sequence diagram across Application, PMEMOBJ, Local PMEM, RPMEM, Libfabric, and Target Node(s)]
- Application: write, then pmemobj_persist()
- Local PMEM: CLFLUSHOPT/CLWB + SFENCE
- Repeat for each target node: rpmem_persist(target_node1, ...)
  - fi_writemsg(): RDMA Write
  - fi_readmsg(): RDMA Read
  - fi_cqread(): RDMA Read ACK
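The per-target part of that sequence (RDMA Write, then RDMA Read, then waiting for the read to complete) can be sketched as a toy model. Everything in it is an assumption made for illustration: `ToyTarget` and `toy_rpmem_persist` are hypothetical names, and the rule that a read completes only after earlier writes have reached persistent memory is a simplification of the APM method referenced earlier in the deck, not a statement about any particular NIC or platform.

```python
class ToyTarget:
    """Toy target node: RDMA Writes may sit in a volatile buffer until flushed."""

    def __init__(self, name):
        self.name = name
        self.in_flight = []  # writes accepted by the target, not yet durable
        self.pmem = {}       # offset -> data that has reached persistent memory

    def rdma_write(self, offset, data):
        self.in_flight.append((offset, data))

    def rdma_read(self):
        # Model assumption: a read on the same connection completes only after
        # all earlier writes have been pushed out to persistent memory.
        for offset, data in self.in_flight:
            self.pmem[offset] = data
        self.in_flight.clear()
        return "read-ack"


def toy_rpmem_persist(target, offset, data, trace):
    """Replicate one range to one target: write, read, wait for the read."""
    target.rdma_write(offset, data)
    trace.append((target.name, "RDMA Write"))
    ack = target.rdma_read()
    trace.append((target.name, "RDMA Read"))
    if ack != "read-ack":
        raise RuntimeError("replication to target was not acknowledged")
    trace.append((target.name, "RDMA Read ACK"))


targets = [ToyTarget("node1"), ToyTarget("node2")]
trace = []
for target in targets:  # "Repeat for each target node"
    toy_rpmem_persist(target, 0, b"record", trace)

# A write alone does not make data durable in this model.
pending = ToyTarget("node3")
pending.rdma_write(0, b"record")
```

The loop replicates to the targets one after another, which is why the deck lists parallel replication to multiple remote nodes as a future performance improvement.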

Future improvements and enhancements
- Performance improvements
  - Parallel replication to multiple remote nodes and local replica(s)
  - Use of a single RDMA Read/Send for multiple RDMA Write operations, similar to the local OPTIMIZED_FLUSH
- Support for Windows OS
- Eventually consistent (a.k.a. asynchronous) replication

Key Takeaways
- PMoF improves the latency and throughput of remote PMEM replication
- NVML includes the RPMEM library, which implements PMoF
- Applications can either call RPMEM directly or use PMEMOBJ, which uses RPMEM for replication
- Start using NVML with RPMEM for remote replication: http://pmem.io