Software and Management for NVMe Session A12 Part B 3:40 to 4:45


Software and Management for NVMe, Session A12 Part B, 3:40 to 4:45
An overview and new features targeting NVMe-MI 1.1; new features in NVMe drivers (Linux, Windows, and VMware); Storage Performance Development Kit and NVM Express
Austin Bolen, Senior Principal Engineer, Dell EMC
Myron Loewen, Platform Architect, NVM Solutions Group, Intel
Uma Parepalli, Senior Director, Stealth Mode Startup
Lee Prewitt, Principal Program Manager, Microsoft
Suds Jain, VMware
Parag Maharana, SSD Architect, Seagate
Jim Harris, Principal Engineer, Intel
Santa Clara, CA 1

NVMe-MI Enhancements. Austin Bolen, Dell EMC; Myron Loewen, Intel; Peter Onufryk, Microsemi. Santa Clara, CA 2

Agenda: NVMe-MI Workgroup Update; NVMe-MI 1.0a Overview; Proposed NVMe-MI 1.1 Major Features (In-Band NVMe-MI, NVMe-MI Enclosure Management, NVMe Storage Device Extension); Summary. Santa Clara, CA 3

NVMe-MI Workgroup Update Santa Clara, CA 4

NVMe-MI 1.0a Overview Santa Clara, CA 5

NVMe Management Interface 1.0a What is the NVMe Management Interface 1.0a? A programming interface that allows out-of-band management of an NVMe Field Replaceable Unit (FRU) or an embedded NVMe NVM Subsystem Santa Clara, CA 6

Management Fundamentals. What is meant by management? Four pillars of systems management: inventory, configuration, monitoring, and change management. Management operational times: deployment (no OS), pre-OS (e.g., UEFI/BIOS), runtime, auxiliary power, and decommissioning. Santa Clara, CA 7

Field Replaceable Unit (FRU) definition (Wikipedia): a circuit board, part, or assembly that can be quickly and easily removed from a computer or other piece of electronic equipment and replaced by the user or a technician without having to send the entire product or system to a repair facility. Santa Clara, CA 8

Out-of-Band Definition. Per the MCTP Overview White Paper (DSP2016), version 1.0.0, out-of-band means management that operates with hardware resources and components that are independent of the operating system's control. In NVMe-MI, the out-of-band communication path is from a Management Controller (BMC) to a Management Endpoint (NVMe storage device) via: 1. MCTP over SMBus/I2C; 2. MCTP over PCIe VDM; 3. IPMI FRU Data (VPD) access over SMBus/I2C per the IPMI Platform Management FRU Information Storage Definition. Santa Clara, CA 9
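As a hedged illustration of the third path (VPD access over SMBus/I2C), the sketch below reads a few FRU/VPD bytes from a Linux-based BMC through the i2c-dev interface. The bus number and the 7-bit address 0x50 (A0h in 8-bit notation, as on the later SMBus topology slide) are assumptions for illustration, not values taken from the talk.

```c
/* Minimal sketch (not from the talk): reading the first bytes of an NVMe
 * device's IPMI FRU/VPD EEPROM over SMBus/I2C from a Linux BMC. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/i2c-dev.h>

int main(void)
{
    unsigned char offset = 0x00;   /* start of the FRU common header */
    unsigned char vpd[8];

    int fd = open("/dev/i2c-1", O_RDWR);            /* assumed SMBus segment */
    if (fd < 0 || ioctl(fd, I2C_SLAVE, 0x50) < 0) { /* assumed VPD address   */
        perror("i2c");
        return 1;
    }

    /* EEPROM-style access: write the byte offset, then read the data back. */
    if (write(fd, &offset, 1) != 1 || read(fd, vpd, sizeof(vpd)) < 0) {
        perror("vpd read");
        return 1;
    }

    for (int i = 0; i < (int)sizeof(vpd); i++)
        printf("%02x ", vpd[i]);
    printf("\n");
    close(fd);
    return 0;
}
```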

In-Band Definition. Per the MCTP Overview White Paper (DSP2016), version 1.0.0, in-band means management that operates with the support of hardware components that are critical to and used by the operating system. (The operating system referenced here is the host operating system, not the BMC operating system.) In NVMe-MI, the in-band communication path is from host software to an NVMe Controller via the NVMe Admin Queue, using the NVMe-MI Send and NVMe-MI Receive commands. Santa Clara, CA 10

NVMe-MI 1.0a (out-of-band) [architecture diagram: a host processor running the host OS and NVMe driver, and a management controller (BMC) running its own OS and NVMe-MI driver, both connected to an NVMe NVM Subsystem]. The NVMe driver communicates with NVMe controllers over PCIe per the NVMe spec. The management controller runs its own OS on its own processor, independent of the host OS and driver. Two out-of-band paths: NVMe-MI over MCTP over PCIe VDM, and NVMe-MI over MCTP over SMBus/I2C (plus VPD). PCIe VDMs are completely separate from in-band PCIe traffic even though they share the same physical connection. NVMe-MI 1.0a is out-of-band only. Santa Clara, CA 11

NVMe Storage Device Management [diagram: server caching, server storage, and external storage topologies with NVMe devices behind root complexes, PCIe switches, and SAS/RAID controllers alongside SAS HDDs]. Example pre-boot management: inventory, power budgeting, configuration, firmware update. Example out-of-band management during system operation: health monitoring, power/thermal management, firmware update, configuration. Santa Clara, CA 12

NVMe-MI Protocol Layering [diagram: on the management controller (BMC or host processor), management applications (e.g., a remote console) form the application layer; the NVMe Management Interface is the protocol layer; the Management Component Transport Protocol (MCTP), with its MCTP over SMBus/I2C and MCTP over PCIe VDM bindings, is the transport layer; SMBus/I2C and PCIe form the physical layer down to the PCIe SSD]. Santa Clara, CA 13

NVMe-MI 1.0a Command Set Overview
NVMe Management Interface Specific Commands: Read NVMe-MI Data Structure, NVM Subsystem Health Status Poll, Controller Health Status Poll, Configuration Get, Configuration Set, VPD Read, VPD Write, Reset
PCIe Commands: PCIe Configuration Read, PCIe Configuration Write, PCIe I/O Read, PCIe I/O Write, PCIe Memory Read, PCIe Memory Write
NVMe Commands: Firmware Activate/Commit, Firmware Image Download, Format NVM, Get Features, Get Log Page, Identify, Namespace Management, Namespace Attachment, Security Send, Security Receive, Set Features
Santa Clara, CA 14

In-band NVMe-MI Santa Clara, CA 15

NVMe-MI 1.1 (out-of-band and in-band) [architecture diagram, same host processor/BMC/NVM Subsystem topology as the NVMe-MI 1.0a slide]. NVMe-MI 1.1 adds an in-band NVMe-MI tunnel: NVMe-MI commands are tunneled using two new NVMe Admin commands, NVMe-MI Send and NVMe-MI Receive. Data flows are now out-of-band (NVMe-MI over MCTP over PCIe VDM; NVMe-MI over MCTP over SMBus/I2C and VPD) and in-band (NVMe-MI tunnel over NVMe). Santa Clara, CA 16
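The sketch below is a hypothetical illustration (not from the talk) of exercising this in-band tunnel from Linux host software with the NVME_IOCTL_ADMIN_CMD passthrough ioctl. The NVMe-MI Send/Receive admin opcodes (assumed here to be 0x1D/0x1E per NVMe 1.3) and the request encoding in the command dwords are simplified; consult the NVMe-MI 1.1 specification for the actual message format.

```c
/* Hypothetical sketch: tunneling an NVMe-MI request in-band from Linux. */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/nvme_ioctl.h>

#define OPC_NVME_MI_SEND    0x1D   /* assumed admin opcode */
#define OPC_NVME_MI_RECEIVE 0x1E   /* assumed admin opcode */

static int mi_admin_cmd(int fd, uint8_t opcode, void *buf, uint32_t len)
{
    struct nvme_admin_cmd cmd;

    memset(&cmd, 0, sizeof(cmd));
    cmd.opcode   = opcode;
    cmd.addr     = (uint64_t)(uintptr_t)buf;
    cmd.data_len = len;
    /* cdw10..cdw15 would carry the tunneled NVMe-MI request fields. */
    return ioctl(fd, NVME_IOCTL_ADMIN_CMD, &cmd);
}

int main(void)
{
    unsigned char req[64] = {0}, rsp[4096] = {0};

    int fd = open("/dev/nvme0", O_RDWR);
    if (fd < 0) {
        perror("open /dev/nvme0");
        return 1;
    }

    /* Tunnel an NVMe-MI request in, then fetch the response back out. */
    if (mi_admin_cmd(fd, OPC_NVME_MI_SEND, req, sizeof(req)) < 0 ||
        mi_admin_cmd(fd, OPC_NVME_MI_RECEIVE, rsp, sizeof(rsp)) < 0) {
        perror("NVME_IOCTL_ADMIN_CMD");
        return 1;
    }
    printf("first response byte: 0x%02x\n", rsp[0]);
    return 0;
}
```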

NVMe-MI over NVMe-oF [diagram: the same host/BMC topology plus a flash appliance reached over an NVMe-oF fabric]. Data flows: out-of-band (NVMe-MI over MCTP over PCIe VDM; NVMe-MI over MCTP over SMBus/I2C and VPD), in-band (NVMe-MI tunnel over NVMe), and in-band over fabrics (NVMe-MI tunnel over NVMe-oF). The plumbing is in place for NVMe-MI over NVMe-oF. Santa Clara, CA 17

Benefits NVMe-MI offers features not available in-band via NVMe. For example: o Ability to manage NVMe at the FRU level o Vital Product Data (VPD) Access o Enclosure Management NVMe-MI in-band tunnel allows defining commands once in NVMe-MI and utilizing them out-of-band, in-band, and over fabrics. Allows NVMe Technical Workgroup to focus on non-management related work Santa Clara, CA 18

Enclosure Management Santa Clara, CA 19

Example Enclosure [diagram: an NVMe enclosure containing an NVM Subsystem with an NVMe Controller, a Management Endpoint, and a Controller Management Interface; an Enclosure Services Process managing power supplies, cooling objects, temperature sensors, and other objects; and multiple NVMe Storage Devices].

Enclosure Management. Native PCIe Enclosure Management (NPEM): a submission to the PCI-SIG Protocol Workgroup (PWG) on behalf of the NVMe Management Interface (NVMe-MI) workgroup, providing transport-specific basic management that is outside the scope of the NVMe-MI workgroup. SES Based Enclosure Management: a technical proposal being developed in the NVMe-MI workgroup, providing comprehensive enclosure management.

SES Based Enclosure Management: reuse NVMe drivers and reuse SCSI Enclosure Services (SES), developed by T10 for management of enclosures using the SCSI architecture. While the NVMe and SCSI architectures differ, the elements of an enclosure and the capabilities required to manage those elements are the same. Example enclosure elements: power supplies, fans, displays or indicators, locks, temperature sensors, current sensors, and voltage sensors. NVMe-MI leverages SES for enclosure management: SES manages the elements of an enclosure using control and status diagnostic pages transferred with SCSI commands (SCSI SEND DIAGNOSTIC and SCSI RECEIVE DIAGNOSTIC RESULTS); NVMe-MI uses these same control and status diagnostic pages but transfers them using the SES Send and SES Receive commands.

NVMe-MI SES Layering [diagram: legacy SCSI host software (through SCSI translation), NVMe host software, or a management controller reach, over PCIe or a fabric, an NVMe controller with in-band NVMe-MI support; NVMe-MI carries the SES Send and SES Receive commands down to SCSI Enclosure Services (SES)].

NVM Storage Device Enhancement

Original NVMe Storage Devices [diagram: two PCIe SSDs, one with a single PCIe port and one with PCIe ports 0 and 1, each containing one NVM Subsystem plus an SMBus/I2C interface]. An NVMe Storage Device consists of one NVM Subsystem with one or more PCIe ports and an optional SMBus/I2C interface. Santa Clara, CA 25

New Multi-Subsystem NVMe Devices [diagram: an M.2 carrier board from Amfeltec and an ANA carrier board from Facebook, each a PCIe SSD with a PCIe switch in front of multiple NVM Subsystems].

SMBus Topologies [diagram: multiple subsystems on a single SMBus path; ARP and a mux are supported, scalable from 2 to 8 or more subsystems; example host-to-subsystem topologies show slave addresses E8h, 3Ah, and A0h, with VPD optional in one topology and required behind the mux in the other].

Related Changes. ARP changes: the NVMe-MI specification to enable additional devices; the PMBus/SMBus specification to add a new default slave type. VPD updates in NVMe-MI: indicate the topology and details for the mux, an optional temperature sensor on the carrier board, and updates for multiple ports.

Summary: NVMe-MI 1.0a has been released. Multiple NVMe devices have passed the UNH-IOL NVMe-MI Compliance Tests and are shipping, and systems that support NVMe-MI 1.0a devices are shipping. NVMe-MI 1.1, targeting release by end of 2017, adds in-band NVMe-MI, enclosure management, and NVMe storage device enhancements.

NVMe Device Drivers Update and New Features - Uma Parepalli, Lee Prewitt, Parag Maharana, Suds Jain Santa Clara, CA 30

NVMe Ecosystem http://www.nvmexpress.org/resources/drivers/ Santa Clara, CA 31

New Features in NVMe Drivers: UEFI and Windows community drivers, Uma Parepalli; Microsoft Windows, Lee Prewitt (Microsoft); VMware, Suds Jain (VMware); Linux, Parag Maharana (Seagate). Santa Clara, CA 32

NVMe UEFI Drivers Uma Parepalli Santa Clara, CA 33

NVMe UEFI Drivers. Non-blocking I/O: added NVMe read/write Block I/O protocols that issue a command and then poll or signal an event when the transfer is complete. ARM platforms support NVMe at the UEFI level. Namespace support at the UEFI level is available through some implementations. NVMe UEFI diagnostics drivers are available. Santa Clara, CA 34
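A minimal sketch (not from the talk) of the non-blocking pattern described above, assuming the NVMe UEFI driver produces EFI_BLOCK_IO2_PROTOCOL; the handle lookup and buffer ownership are simplified for illustration.

```c
/* Minimal sketch: asynchronous read through EFI_BLOCK_IO2_PROTOCOL. The
 * event callback fires when the transfer completes, so the caller does not
 * block while the NVMe command is in flight. */
#include <Uefi.h>
#include <Library/UefiBootServicesTableLib.h>
#include <Protocol/BlockIo2.h>

STATIC VOID EFIAPI OnReadDone(IN EFI_EVENT Event, IN VOID *Context)
{
  /* Transfer finished; status is in the token's TransactionStatus field. */
}

EFI_STATUS ReadFirstBlockAsync(EFI_HANDLE NvmeHandle, VOID *Buffer)
{
  EFI_BLOCK_IO2_PROTOCOL *BlkIo2;
  EFI_BLOCK_IO2_TOKEN    Token;
  EFI_STATUS             Status;

  Status = gBS->HandleProtocol(NvmeHandle, &gEfiBlockIo2ProtocolGuid,
                               (VOID **)&BlkIo2);
  if (EFI_ERROR(Status)) {
    return Status;
  }

  Status = gBS->CreateEvent(EVT_NOTIFY_SIGNAL, TPL_CALLBACK,
                            OnReadDone, NULL, &Token.Event);
  if (EFI_ERROR(Status)) {
    return Status;
  }

  /* Read one block at LBA 0; completion is signaled via Token.Event. */
  return BlkIo2->ReadBlocksEx(BlkIo2, BlkIo2->Media->MediaId, 0, &Token,
                              BlkIo2->Media->BlockSize, Buffer);
}
```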

NVMe Windows Community Driver Uma Parepalli Santa Clara, CA 35

NVMe Windows Community Driver. Latest rev. 1.5.0.0 released in Dec 2016. Separate from the Microsoft inbox drivers. Maintained by dedicated engineers. Hosted on the OFA site. Heavily used by some OEMs and IHVs for custom test and debug. Santa Clara, CA 36

NVMe Windows Community Driver: Namespace Management (Create, Delete, Attach, Detach); EOL Read-Only Support; Win 8.1 Timers; Surprise Removal Support in the IOCTL Path; Disk Initialization Performance Optimization. Santa Clara, CA 37

NVMe Windows Community Driver: Storage Request Block Support; StorPort Performance Options; StorPort DPC Redirection; Security Send/Receive with Zero Data Length; SNTI updates for SCSI-to-NVMe Translation. Santa Clara, CA 38

NVMe Windows Community Driver: includes additional bug fixes; performance improvement and robustness; NVMe spec rev 1.2 feature compliance; support for Microsoft Windows 10, 8.1, 7, Server 2012 R2, 2012, and 2008 R2; support for both 32-bit and 64-bit. Santa Clara, CA 39

Windows Inbox NVMe Driver Lee Prewitt Microsoft Santa Clara, CA 40

Agenda: new additions for Windows Creators Update (RS2); new additions for the Fall Creators Update (RS3); futures. Santa Clara, CA 41

NVMe Additions for Windows Creators Update (RS2): Host Memory Buffer enabled by default; Firmware Update and Activate; performance tuning; power tuning. Santa Clara, CA 42

New Additions for the Fall Creators Update (RS3): Timestamp (v1.3); Firmware Update Granularity (v1.3); Namespace Optimal IO Boundary (v1.3); asynchronous events for namespace addition; pass-through support of vendor-unique log pages, Device Self-test, and Compare commands; support for the Controller Fatal Status flag; Streams (for Azure). Santa Clara, CA 43

Futures*: NVMe SCSI Translation Reference; IO Determinism. (*Not plan of record.) Santa Clara, CA 44

NVM Express in vsphere Environment Sudhanshu (Suds) Jain VMware Santa Clara, CA 45

Agenda vsphere NVMe Driver EcoSystem Santa Clara, CA 46

Software-Defined Infrastructure: Flash and NVMe are a major focus area moving forward. vSphere flash use cases (KB 2145210): host swap cache; regular datastore; vSphere Flash Read Cache (aka Virtual Flash); vSphere ESXi boot disk; vSphere ESXi coredump device; vSphere ESXi logging device; Virtual SAN (vSAN). 47

vSphere I/O Stack: Powering Your Applications & Infrastructure [diagram of the ESXi I/O stack: guest OS devices, both emulated and para-virtualized (vmxnet3, LSI, PVSCSI, vNVMe, vRDMA, direct pass-through), feed the ESXi networking path (TCP/IP stack, VDS, NIOC, NSX datapath) and storage path (SCSI stack, NVMe stack, SIOC, scheduler, memory allocator, Virtual SAN, Virtual Volumes, VMFS, NFS, FCoE, iSCSI, NVMe, NVMe over Fabrics) through I/O drivers down to physical hardware such as NICs, SAS/SCSI and FC/SAS HBAs, RDMA adapters, NVMe SSDs, PCIe SSDs, FPGAs, IB and OPA interconnects, HDDs, GPUs/GPGPUs, and hardware accelerations such as QAT]. 48

vSphere NVMe Native Driver Stack [diagram: guest OS LSI, PVSCSI, native NVMe, and para-virtualized RDMA drivers map through the monitor to the LSI, PVSCSI, vNVMe, and vRDMA virtual devices; the vmkernel SCSI and NVMe stacks, scheduler, and memory allocator connect Virtual SAN, Virtual Volumes, VMFS, and NFS through I/O drivers (NIC, FCoE, iSCSI, SAS/SCSI, FC/SAS, NVMe, NVMe over Fabrics, RDMA) to physical hardware: CPU, DIMM/NVDIMM, NICs, HBAs, HDDs, SSDs, PCIe SSDs, and PCIe NVMe]. 49

vSphere Driver Architecture Evolution [diagram: new feature development moves to native drivers behind the VMK API (native SAS/FC, NIC, and graphics drivers, RDMA with RoCE and iWARP, software FCoE, and additional partner native drivers), while the legacy vmklinux path (SATA library and driver, software FCoE, SAS/FC, NIC, IPMI, and miscellaneous drivers) remains for legacy support]. 50

Journey towards an All-NVMe Stack: NVM Express evolution and vSphere.
NVMe technical work began in 2009. NVMe 1.0 (released March 1, 2011): queueing interface, NVM command set, admin command set, end-to-end protection (DIF/DIX), security, Physical Region Pages (PRPs). NVMe 1.1 (released October 11, 2012): general Scatter Gather Lists (SGLs), multi-path I/O and namespace sharing, reservations, autonomous power transitions during idle. NVMe 1.2 (released November 3, 2014): implementation and reporting refinements, namespace management, Controller Memory Buffer, Host Memory Buffer, groundwork for NVMe management. NVMe over Fabrics 1.0 (released June 5, 2016): defines extensions to NVMe for non-PCIe transports, with a primary focus on RDMA, compatible with FC-NVMe (INCITS 540).
vSphere timeline: vSphere 5.5 introduced the first async NVMe driver (1.0e) and launched the IOVP cert program for NVMe; vSphere 6.0 introduced the first inbox NVMe driver and brought broader ecosystem support; vSphere 6.5 added vNVMe and optimized NVMe driver performance; future direction: end-to-end NVMe, multiple namespaces and queues, NVMe over Fabrics, and an end-to-end NVMe stack. 51

NVMe Driver Ecosystem: available as part of the base ESXi image from vSphere 6.0 onwards, with faster innovation through async releases of the VMware NVMe driver. VMware leads the vSphere NVMe Open Source Driver project to encourage the ecosystem to innovate: https://github.com/vmware/nvme. Broad NVMe ecosystem on the VMware NVMe driver (https://www.vmware.com/resources/compatibility/search.php?devicecategory=io): close to 300 third-party NVMe devices certified on the VMware NVMe driver. Also available for download (async): VMware ESXi 5.5 nvme 1.2.0.27-4vmw, NVMe Driver for PCI Express based Solid-State Drives. 52

Introducing Virtual NVMe, new in vSphere 6.5. Features: NVMe 1.0e device emulation; works with the inbox NVMe driver in various OSes; hot add/remove support. Benefits: improved application performance with better IOPS and latency numbers; leverages the native stack from the guest OS (Linux, Windows). 53

NVMe Focus @VMware Summary [roadmap table with columns vSphere 6.5, 6-12 Months, and Future Direction and rows Driver, Core Stack, and Virtual Devices; items include: Driver: boot (UEFI), firmware update, end-to-end protection, Deallocate/TRIM/Unmap, 4K, SMART, planned hot-remove, performance enhancements, extended CLI/UI, namespace management, async event error handling, enhanced diagnostic logs, NVMe over Fabrics with multiple fabric options, SR-IOV, Sanitize, I/O Determinism; Core Stack: reduced serialization, locality improvements, vNVMe adaption layer, multiple completion worlds support in NVMe, optimized stack with higher performance, NVMe multi-pathing, dynamic namespace management, next-generation storage stack with ultra-high IOPS, end-to-end NVMe stack; Virtual Devices: NVMe 1.0e spec, hot-plug support, VM orchestration, performance improvements, async mode support, unmap support, spec revision, parallel execution at the backend, 4K support, scatter-gather support, interrupt coalescing]. 54

NVMe Core and Fabrics Linux Drivers Update Parag Maharana SSD Architect Seagate

NVMe Linux Drivers Overview: the Linux core and fabrics drivers are based on Fabrics spec 1.0 and core spec 1.2.1. The Linux host driver has been re-architected to support multiple transports (PCIe and fabrics). The Linux fabrics driver has host and target components: the host has core, PCIe, and fabric modules; the target has core and fabric modules; the target side required a new configuration tool (nvmetcli). The Linux fabrics driver is part of Linux kernel 4.8, from June 2016. Santa Clara, CA 56

Implemented Features (Previously). NVMe host driver: support for RDMA transports (InfiniBand, RoCE, iWARP, Intel OmniPath); connect/disconnect to multiple controllers; transport of NVMe commands/data generated by the NVMe core; initial discovery service implementation; multi-path; keep alive. NVMe target driver: support for mandatory NVMe and Fabrics commands; support for multiple hosts/subsystems/controllers/namespaces; namespaces backed by any Linux block device; initial discovery service with discovery subsystem/controller(s); target configuration interface using Linux configfs; creation of NVM and discovery subsystems. Santa Clara, CA 57
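As an illustration of the host-side connect path listed above, the following hypothetical sketch (not from the talk) writes an options string to /dev/nvme-fabrics, which is what nvme-cli's `nvme connect` does under the hood; the transport address, port, and NQN are placeholders.

```c
/* Hypothetical sketch: connecting the Linux NVMe-oF host driver to a
 * remote controller by writing a connect options string. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    const char *opts =
        "transport=rdma,traddr=192.168.1.10,trsvcid=4420,nqn=testnqn";
    char reply[256] = {0};

    int fd = open("/dev/nvme-fabrics", O_RDWR);
    if (fd < 0) {
        perror("open /dev/nvme-fabrics");
        return 1;
    }

    /* The driver parses the option string and creates a new controller. */
    if (write(fd, opts, strlen(opts)) < 0) {
        perror("connect");
        close(fd);
        return 1;
    }

    /* Reading back returns a string such as "instance=0,cntlid=1". */
    if (read(fd, reply, sizeof(reply) - 1) > 0)
        printf("connected: %s\n", reply);

    close(fd);
    return 0;
}
```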

New Features. NVMe host driver: support for the FC transport; automated host multi-path (work in progress). NVMe target driver: support for the FC fabric transport; log page support (SMART log pages, error log pages, and others). Santa Clara, CA 58

NVMe over Fabrics Host and Target Driver Components [diagram: a common core with NVMe admin commands, NVMe I/O commands, configuration, discovery, Fabrics commands, and NVMe/Fabrics common data structures sits above two transports: the PCIe transport (memory based) with its register interface and PCIe bus enumeration, and the fabric transport (capsule based) over RoCE, iWARP, InfiniBand, and FC]. Santa Clara, CA 59

Host Driver Components [diagram: Linux I/O operations mapping to NVMe I/O operations]. Santa Clara, CA 60

Target Driver Components [diagram: the Linux NVMe over Fabrics target driver has an NVMe target core handling fabric commands, discovery, admin commands, configuration, and I/O commands; target configuration comes through Linux configfs via nvmetcli; transports include a loop transport and fabric transports over the Linux transport drivers (FC, iWARP, RoCE, IB); back-end I/O goes through Linux block I/O or the NVMe host driver (PCIe transport)]. Santa Clara, CA 61

Linux Driver WG Next Steps. Next steps: fabric authentication features; Controller Memory Buffer; NVMe 1.3 compliance and new features (Directive/Streams support, virtualization support, Sanitize, IO Determinism). Call to action: download the driver and try it out; provide suggestions, comments, and feedback; suggest any future enhancements. Santa Clara, CA 62

Linux Driver Reference (Linux fabrics drivers): NVMe specification, http://www.nvmexpress.org/specifications/; NVMe fabric driver resources, http://www.nvmexpress.org/resources/nvme-over-fabrics-drivers/; NVMe Linux fabric drivers source, www.kernel.org; nvme-cli (nvme) source, http://github.com/linux-nvme/nvme-cli/; nvmetcli source, http://git.infradead.org/users/hch/nvmetcli.git; NVMe Linux fabric mailing list, linux-nvme@lists.infradead.org. Santa Clara, CA 63

Thank You! Architected for Performance www.nvmexpress.org Santa Clara, CA 64

Example nvmetcli [screenshot of the nvmetcli configuration tree]: target NVMe controllers are exposed as hostnqn N; target NVMe SSDs are exposed as testnqn MM. Santa Clara, CA 65
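What nvmetcli configures underneath is the kernel nvmet configfs tree. The following C sketch is hypothetical (not from the talk) and shows the equivalent raw configfs writes for exporting one namespace of testnqn over RDMA; the paths follow the upstream nvmet driver, while the backing device, IP address, and port are placeholders.

```c
/* Hypothetical sketch: building an NVMe-oF target via nvmet configfs,
 * i.e. the same tree that nvmetcli manages. Run as root with the nvmet
 * and nvmet-rdma modules loaded. */
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

static void put(const char *path, const char *val)
{
    FILE *f = fopen(path, "w");
    if (f) { fputs(val, f); fclose(f); }
}

int main(void)
{
    const char *base = "/sys/kernel/config/nvmet";
    char p[256];

    /* Create a subsystem and allow any host to connect. */
    snprintf(p, sizeof(p), "%s/subsystems/testnqn", base); mkdir(p, 0755);
    snprintf(p, sizeof(p), "%s/subsystems/testnqn/attr_allow_any_host", base);
    put(p, "1");

    /* Back namespace 1 with a local NVMe block device and enable it. */
    snprintf(p, sizeof(p), "%s/subsystems/testnqn/namespaces/1", base); mkdir(p, 0755);
    snprintf(p, sizeof(p), "%s/subsystems/testnqn/namespaces/1/device_path", base);
    put(p, "/dev/nvme0n1");
    snprintf(p, sizeof(p), "%s/subsystems/testnqn/namespaces/1/enable", base);
    put(p, "1");

    /* Create an RDMA port and link the subsystem to it. */
    snprintf(p, sizeof(p), "%s/ports/1", base); mkdir(p, 0755);
    snprintf(p, sizeof(p), "%s/ports/1/addr_trtype", base);  put(p, "rdma");
    snprintf(p, sizeof(p), "%s/ports/1/addr_adrfam", base);  put(p, "ipv4");
    snprintf(p, sizeof(p), "%s/ports/1/addr_traddr", base);  put(p, "192.168.1.10");
    snprintf(p, sizeof(p), "%s/ports/1/addr_trsvcid", base); put(p, "4420");
    snprintf(p, sizeof(p), "%s/ports/1/subsystems/testnqn", base);
    symlink("/sys/kernel/config/nvmet/subsystems/testnqn", p);
    return 0;
}
```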

PCIe Memory Queuing Model 1. Host writes command to SQ 2. Host writes SQ tail pointer for doorbell 3. Controller fetches command 4. Controller processes command 5. Controller writes completion to CQ 6. Controller generates MSI-X interrupt 7. Host processes completion 8. Host writes to CQ head pointer for doorbell Santa Clara, CA 66
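A simplified host-side sketch (not from the talk) of steps 1-2 and 5-8 of this model; steps 3-4 happen inside the controller. The struct definitions are trimmed to the fields the flow actually touches, and queue initialization is omitted.

```c
/* Simplified sketch of the NVMe PCIe memory queuing model. */
#include <stdint.h>
#include <string.h>

struct nvme_command    { uint8_t  opcode; uint8_t rsvd[63]; };       /* 64 B */
struct nvme_completion { uint32_t dw0, dw1; uint16_t sq_head, sq_id;
                         uint16_t cid, status; };                    /* 16 B */

#define QUEUE_DEPTH 64

struct queue_pair {
    volatile struct nvme_command    *sq;        /* submission queue memory  */
    volatile struct nvme_completion *cq;        /* completion queue memory  */
    volatile uint32_t *sq_tail_db, *cq_head_db; /* doorbells in BAR0        */
    uint16_t sq_tail, cq_head;
    uint8_t  cq_phase;                          /* expected phase tag       */
};

/* Steps 1-2: write the command into the SQ, then ring the SQ tail doorbell. */
static void submit(struct queue_pair *qp, const struct nvme_command *cmd)
{
    memcpy((void *)&qp->sq[qp->sq_tail], cmd, sizeof(*cmd));
    qp->sq_tail = (qp->sq_tail + 1) % QUEUE_DEPTH;
    *qp->sq_tail_db = qp->sq_tail;              /* MMIO write, no read */
}

/* Steps 5-8: the controller posts a completion and flips the phase tag; the
 * host consumes it (by polling or after an MSI-X interrupt) and writes the
 * CQ head doorbell. */
static int poll_completion(struct queue_pair *qp, uint16_t *cid)
{
    volatile struct nvme_completion *cpl = &qp->cq[qp->cq_head];

    if ((cpl->status & 0x1) != qp->cq_phase)    /* phase bit not flipped yet */
        return 0;

    *cid = cpl->cid;
    qp->cq_head = (qp->cq_head + 1) % QUEUE_DEPTH;
    if (qp->cq_head == 0)
        qp->cq_phase ^= 1;                      /* expected phase flips on wrap */
    *qp->cq_head_db = qp->cq_head;
    return 1;
}
```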

Fabrics Queuing Model. 1. The host issues an RDMA_SEND that arrives at the target as an RDMA_RECV, placing the command capsule in the target SQ. 2. The target issues an RDMA_READ to fetch data from host memory (for a write) or an RDMA_WRITE to place data into host memory (for a read). 3. On completion, the target updates the host CQ using an RDMA_SEND that is received by the host as an RDMA_RECV. 4. NVMe over Fabrics does not define an interrupt mechanism that allows a controller to generate a host interrupt; it is the responsibility of the host fabric interface (e.g., the host bus adapter) to generate host interrupts. Santa Clara, CA 67

Storage Performance Development Kit and NVM Express* Jim Harris Principal Software Engineer Intel Data Center Group Santa Clara, CA August 2017 68

NVMe* Software Overhead. The NVMe specification enables highly optimized drivers: no register reads in the I/O path, and multiple I/O queues allow lockless submission from multiple CPU cores in parallel. But even best-in-class kernel-mode drivers have non-trivial software overhead: 3-5 us of software overhead per I/O, with 500K+ IO/s per SSD, 4-24 SSDs per server, and <10 us latency with the latest media (i.e., Intel Optane TM SSDs). Enter the Storage Performance Development Kit, which includes polled-mode, user-space drivers for NVMe. Santa Clara, CA, August 2017 69

Storage Performance Development Kit (SPDK): an open source software project, BSD licensed; source code: http://github.com/spdk; project website: http://spdk.io. A set of software building blocks for scalable, efficient storage applications: polled-mode, user-space drivers and protocol libraries (including NVMe*), designed for current and next-generation NVM media latencies (i.e., Intel Optane TM). Santa Clara, CA, August 2017 70

Architecture [SPDK architecture diagram; the legend marks components as Released or Q4 '17. Storage Protocols layer: NVMe-oF* target, iSCSI target (with NVMe and SCSI layers), vhost-scsi target, vhost-blk target. Storage Services layer: Block Device Abstraction (BDEV) with 3rd-party, NVMe, Linux async IO, and Ceph RBD bdev modules, plus Logical Volumes, BlobFS, and Blobstore. Drivers layer: NVMe-oF* initiator, NVMe* PCIe driver for NVMe devices, Intel QuickData Technology driver.]

NVMe* Driver [same SPDK architecture diagram as the previous slide, highlighting the NVMe* PCIe driver and NVMe devices in the Drivers layer.]

NVMe* Driver Key Characteristics: supports NVMe 1.0 to 1.3 spec-compliant devices; userspace, asynchronous, polled-mode operation; the application owns I/O queue allocation and synchronization. Features supported include: end-to-end data protection, SGL, reservations, namespace management, weighted round robin, Controller Memory Buffer, firmware update, and hotplug. Santa Clara, CA, August 2017 73
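A minimal sketch of how an application drives this userspace polled-mode driver; it is not an official SPDK example, and buffer sizes, namespace selection, and error handling are simplified. API names follow spdk/nvme.h.

```c
/* Minimal sketch: probe NVMe devices with SPDK and issue one read. */
#include <stdbool.h>
#include <stdio.h>
#include "spdk/env.h"
#include "spdk/nvme.h"

static struct spdk_nvme_ctrlr *g_ctrlr;
static bool g_done;

static bool probe_cb(void *ctx, const struct spdk_nvme_transport_id *trid,
                     struct spdk_nvme_ctrlr_opts *opts) { return true; }

static void attach_cb(void *ctx, const struct spdk_nvme_transport_id *trid,
                      struct spdk_nvme_ctrlr *ctrlr,
                      const struct spdk_nvme_ctrlr_opts *opts) { g_ctrlr = ctrlr; }

static void read_done(void *arg, const struct spdk_nvme_cpl *cpl) { g_done = true; }

int main(void)
{
    struct spdk_env_opts opts;

    spdk_env_opts_init(&opts);                 /* hugepages, PCI access, etc. */
    if (spdk_env_init(&opts) < 0)
        return 1;

    /* Enumerate local PCIe NVMe controllers and claim them in userspace. */
    if (spdk_nvme_probe(NULL, NULL, probe_cb, attach_cb, NULL) != 0 || !g_ctrlr)
        return 1;

    struct spdk_nvme_ns *ns = spdk_nvme_ctrlr_get_ns(g_ctrlr, 1);
    struct spdk_nvme_qpair *qp = spdk_nvme_ctrlr_alloc_io_qpair(g_ctrlr, NULL, 0);
    void *buf = spdk_dma_zmalloc(4096, 4096, NULL);  /* DMA-able buffer */

    /* Asynchronous read of one block at LBA 0; completion via callback. */
    spdk_nvme_ns_cmd_read(ns, qp, buf, 0, 1, read_done, NULL, 0);

    /* Polled mode: the application drives completions, no interrupts. */
    while (!g_done)
        spdk_nvme_qpair_process_completions(qp, 0);

    printf("read complete\n");
    spdk_nvme_ctrlr_free_io_qpair(qp);
    spdk_dma_free(buf);
    spdk_nvme_detach(g_ctrlr);
    return 0;
}
```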

NVMe-oF Initiator [same SPDK architecture diagram, highlighting the NVMe-oF* initiator in the Drivers layer.]

NVMe-oF Initiator: a common API for local and remote access, differentiated by probe parameters. Pluggable fabric transports: RDMA is supported currently (using libibverbs), and the design allows for future transports (i.e., TCP). Santa Clara, CA, August 2017 75
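A hypothetical sketch (not from the talk) of what "differentiated by probe parameters" means: the same spdk_nvme_probe() call reaches a local PCIe SSD or a remote NVMe-oF subsystem purely through the transport ID passed in. The RDMA address and subsystem NQN below are placeholders.

```c
/* Hypothetical sketch: one probe API, selected by the transport ID. */
#include <string.h>
#include "spdk/nvme.h"

static void probe_local_and_remote(spdk_nvme_probe_cb probe_cb,
                                   spdk_nvme_attach_cb attach_cb)
{
    struct spdk_nvme_transport_id trid;

    /* Local: a NULL transport ID enumerates PCIe-attached devices. */
    spdk_nvme_probe(NULL, NULL, probe_cb, attach_cb, NULL);

    /* Remote: parse an NVMe-oF transport ID and probe over RDMA instead. */
    memset(&trid, 0, sizeof(trid));
    spdk_nvme_transport_id_parse(&trid,
        "trtype:RDMA adrfam:IPv4 traddr:192.168.1.10 trsvcid:4420 "
        "subnqn:nqn.2016-06.io.spdk:cnode1");
    spdk_nvme_probe(&trid, NULL, probe_cb, attach_cb, NULL);
}
```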

NVMe* Driver Performance Comparison [two charts: software overhead in nanoseconds for submission and completion, Linux kernel vs. SPDK; and throughput in KIOps on a single Intel Xeon core vs. number of Intel SSD DC P3700 drives (1 to 8), Linux kernel vs. SPDK]. System configuration (overhead chart): 2x Intel Xeon E5-2695v4 (HT off), Intel SpeedStep enabled, Intel Turbo Boost Technology disabled, 8x 8GB DDR4 2133 MT/s, 1 DIMM per channel, CentOS* Linux* 7.2, Linux kernel 4.7.0-rc1, 1x Intel P3700 NVMe SSD (800GB), 4x per CPU socket, FW 8DV10102, I/O workload 4KB random read, queue depth 1 per SSD; performance measured by Intel using the SPDK overhead tool, Linux kernel data using Linux AIO. System configuration (throughput chart): 2x Intel Xeon E5-2695v4 (HT off), Intel SpeedStep enabled, Intel Turbo Boost Technology disabled, 8x 8GB DDR4 2133 MT/s, 1 DIMM per channel, CentOS* Linux* 7.2, Linux kernel 4.10.0, 8x Intel P3700 NVMe SSD (800GB), 4x per CPU socket, FW 8DV101H0, I/O workload 4KB random read, queue depth 128 per SSD; performance measured by Intel using the SPDK perf tool, Linux kernel data using Linux AIO.

Blobstore [same SPDK architecture diagram, highlighting Blobstore, BlobFS, and Logical Volumes in the Storage Services layer.] What about filesystems and logical volumes? SPDK Blobstore is a userspace, general-purpose block allocator; SPDK Logical Volumes enable dynamic partitioning; SPDK BlobFS enables basic filesystem use cases and includes RocksDB integration. (See FMS Forum D-11.)

vhost [same SPDK architecture diagram, highlighting the vhost-scsi and vhost-blk targets in the Storage Protocols layer.] Accelerated I/O virtualization for QEMU/KVM-based VMs: leverages SPDK advantages for I/O on behalf of VMs, reduces I/O overhead on both the I/O processing cores and the VM cores, and polling eliminates VMEXITs on submission. (See FMS Forum W-32.)

Architected for Performance

Bios TBA

BACKUP