An Approach for Implementing NVMe-oF based Solutions. Sanjeev Kumar, Software Product Engineering, HiTech, Tata Consultancy Services. 25 May 2018, SDC India 2018. Copyright 2018 Tata Consultancy Services Limited
Agenda: Recap of NVMe-oF Evolution; Network Fabric – Which one to Choose?; Implementing NVMe-oF based Solution; Driver Support for NVMe-oF
Recap of NVMe-oF Evolution
Recap of Communication Protocols: SAS and SATA performance increased over time, but the protocols have not changed much. NVMe was created to allow direct access to the logical block based architecture of SSDs and to do highly parallel IO, enabling an SSD to execute more IO threads than any protocol before. The latest NVMe Protocol Specification, Version 1.3b, was launched in May 2018.
Data Flow with Enterprise Storage Over Network: Limitations. [Diagram: Linux and Windows hosts reach flash storage over SCSI, iSCSI, and FC; behind the array, NVM subsystems with PCIe SSDs sit behind protocol conversion bridges.] A protocol conversion bridge is required to access the data over the network, which increases latency.
What is an NVM Subsystem? [Diagram: an NVM subsystem exposes one or more fabric ports on the network fabric; each port connects to a controller, which presents namespaces over an NVM interface to the NVM media.]
NVMe Over Fabric Solution: A new specification, NVMe Over Fabric 1.0, was launched in June 2016. It is a way to send NVMe commands over networking protocols (fabrics), e.g. RDMA (InfiniBand, iWARP, RoCE, ...) and Fibre Channel. It defines a common architecture that supports a range of storage networking fabrics for the NVMe block storage protocol. The inherent parallelism of multiple NVMe I/O queues is exposed to the host (64K queues and 64K commands per queue). NVMe commands and structures are transferred end-to-end, and the NVMe architecture is maintained across a range of fabric types. Architecture and software consistency between fabric types is maintained by standardizing a common abstraction and encapsulation definition. Design goal of NVMe over Fabrics: provide distance connectivity to NVMe devices with no more than 10 microseconds (µs) of additional latency.
Network Fabric – Which one to Choose?
NVMe Over Fabric Protocol Transport Modes (Source: nvmexpress.org)
NVMe Over Fabric Stack: NVMe Architecture, Queuing Interface, Admin Command & I/O Command Sets, Properties; Fabric Specific Properties; Transport Specific Features/Specialization; NVMe Transport Binding Specification; NVMe Transport Binding Services; NVMe Transport; Fabric Protocol; Fabric Physical (i.e. Ethernet, InfiniBand, Fibre Channel)
NVMe Network Fabrics: NVMe Over Fabric describes how to transport the NVMe interface across several scalable fabrics. NVMe Over Fabric initially defines two types of fabrics for transport: Fibre Channel and RDMA. [Diagram: host software sits on a host-side transport abstraction; Fibre Channel, RoCEv2, iWARP, InfiniBand, FCoE, and next-generation fabrics connect it to a controller-side transport abstraction in front of the NVMe SSDs.]
Comparative Analysis:
- RoCEv2 (RDMA over Converged Ethernet v2), promoted by Mellanox: RDMA based; requires lossless Ethernet support (ECN – Explicit Congestion Notification); not compatible with other Ethernet options.
- iWARP (Internet Wide Area RDMA Protocol), promoted by Intel: RDMA based; lossless Ethernet not required; not compatible with other Ethernet options.
- InfiniBand, promoted by Mellanox: RDMA based; already a lossless protocol; very rarely deployed, for special use cases like HPC or server-to-server clusters; not compatible with Ethernet options.
- FCoE (Fibre Channel over Ethernet), promoted by CISCO: leverages FC; requires a DCB network (lossless Ethernet).
- FC (Fibre Channel), promoted by CISCO and Brocade: non RDMA based; already a lossless protocol; compatible with both SCSI and NVMe protocols.
- NVMe Over TCP, promoted by Facebook, DELL EMC, and Intel: non RDMA based; not yet a part of the NVMe-oF standard; leverages a software implementation of NVMe.
NVMe Over Fabric Protocol Stacks: three dominant protocols. Fibre Channel is a lossless network fabric, so NVMe is just a new upper layer protocol and RDMA is not needed. The RDMA + IP + lossless Ethernet (DCB) layers add complexity to the RoCEv2 and iWARP protocols. Ethernet/IP based protocols use commodity hardware and scale to internet scale. [Stack diagram – FC: NVMe, FC framing & flow control, encoding, FC physical. RoCEv2: NVMe over RDMA, RDMA verbs, IB transport (with ECN), UDP, IP (with ECN), Ethernet DCB, Ethernet physical. iWARP: NVMe over RDMA, RDMA verbs, DDP protocol, MPA, TCP (with ECN), IP (with ECN), Ethernet DCB, Ethernet physical.]
Business Drivers to Adopt NVMe-oF based Solutions: AI/Machine Learning, Analytics, Video Processing, High Performance Computing, Hyper Converged Infrastructure
Implementing NVMe-oF based Solution
Core Elements for Implementing NVMe Over Fabric Solutions: (1) NVMe Enabled Host System – Windows, Linux, Unix, Solaris, VMware; (2) Supported Network – Fibre Channel, RoCE v2, iWARP; (3) NVMe based Storage Subsystem – Controller, NSID (Namespace ID), NVM Interface, NVM Media (Chip, SSD, NVDIMM)
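A quick way to check the first core element, an NVMe-oF enabled Linux host, is sketched below. This is a minimal, hypothetical sequence assuming a modern distribution with the nvme-cli package installed; module names are the upstream kernel ones and the dry-run flags only probe availability without loading anything.

```shell
# Hypothetical capability check for an NVMe-oF enabled Linux host.
uname -r                           # fabrics support was merged in kernel 4.8+
modprobe -n -v nvme-fabrics        # dry run: is the common fabrics module available?
modprobe -n -v nvme-rdma           # transport-specific host modules
modprobe -n -v nvme-fc
nvme version                       # nvme-cli must be recent enough for the connect sub-command
```

If the dry-run modprobe prints a module path rather than an error, the corresponding transport driver is available on this host.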
Reference Architecture for NVMe-oF Solution for Enterprise Storage. [Diagram: Linux and Windows hosts reach flash storage over a front-end fabric (SCSI, iSCSI, SMB, FC, iSER, SRP); an NVMe host connects over NVMe Over Fabric; on the back-end fabric, subsystem bridges attach PCIe SSDs; an NVMe-optimized host talks NVMe Over Fabric end to end to a subsystem bridge and its PCIe SSD.]
Linux Based NVMe-oF Driver
NVMe Over Fabric Support in Linux
Current functionality implemented for the Host Driver:
- Support for RDMA transport (InfiniBand/RoCE/iWARP)
- Connect/Disconnect to multiple controllers
- Transport of commands/data generated by the NVMe core
- Initial Discovery service implementation
- Multi-Path
Current functionality implemented for the Target Driver:
- Support for mandatory NVMe core and Fabrics commands
- Support for multiple hosts/subsystems/controllers/namespaces
- Namespaces backed by Linux block devices
- Initial Discovery service; Discovery Subsystem/controller(s)
- Target configuration interface using Linux configfs
- Create NVM and Discovery Subsystems
The Linux NVMe Fabrics driver is part of the Linux kernel from 4.8 onwards.
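The configfs-based target configuration mentioned above can be sketched as follows. This is a minimal illustrative fragment, not a production setup: the subsystem NQN (testnqn), the backing device (/dev/nvme0n1), and the address 192.168.1.10 are all placeholder values; the configfs paths are those exposed by the upstream nvmet driver.

```shell
# Hypothetical sketch: export a local block device as an NVMe-oF
# target over RDMA via the nvmet configfs interface (run as root).
modprobe nvmet nvmet-rdma
cd /sys/kernel/config/nvmet

# 1. Create a subsystem and allow any host to connect to it
mkdir subsystems/testnqn
echo 1 > subsystems/testnqn/attr_allow_any_host

# 2. Create a namespace backed by a local block device
mkdir subsystems/testnqn/namespaces/1
echo -n /dev/nvme0n1 > subsystems/testnqn/namespaces/1/device_path
echo 1 > subsystems/testnqn/namespaces/1/enable

# 3. Create an RDMA port and bind the subsystem to it
mkdir ports/1
echo 192.168.1.10 > ports/1/addr_traddr
echo rdma         > ports/1/addr_trtype
echo 4420         > ports/1/addr_trsvcid
echo ipv4         > ports/1/addr_adrfam
ln -s /sys/kernel/config/nvmet/subsystems/testnqn ports/1/subsystems/testnqn
```

Removing the symlink and the directories in reverse order tears the target down again.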
NVMe Over Fabric Host Driver Implementation in Linux
[Diagram: applications issue FS operations or block device I/O through the file system and block layer; generic NVMe block commands flow either to the new Fabric driver, over the RDMA stack and host channel adapter driver (InfiniBand/RoCE/iWARP), or to the PCIe driver over the PCIe stack and PCI Express; common NVMe code is shared between the two paths, and NVMe commands can also be issued by IOCTL.]
- The new Fabric driver uses the existing NVMe common code
- Additionally, it is split into a small common fabrics library and the actual transport driver
- Existing user space APIs of the PCIe driver are also supported when using fabrics
- Uses a new sub-command of the existing nvme-cli tool to connect to remote controllers
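The nvme-cli sub-commands referred to above can be sketched like this on the host side. The transport, address, port, and NQN are illustrative placeholders that would have to match the target's configuration.

```shell
# Hypothetical sketch: discover and connect to a remote NVMe-oF
# subsystem over RDMA with nvme-cli (run as root).
modprobe nvme-rdma

# Query the target's discovery controller for advertised subsystems
nvme discover -t rdma -a 192.168.1.10 -s 4420

# Connect; a new /dev/nvmeXnY block device appears on success
nvme connect -t rdma -n testnqn -a 192.168.1.10 -s 4420

nvme list                        # verify the remote namespace is visible
# nvme disconnect -n testnqn     # tear the connection down when done
```

From this point on, the remote namespace behaves like a local NVMe block device to file systems and applications.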
NVMe Over Fabric Target Driver Implementation in Linux
[Diagram: the RDMA transport (iWARP, RoCE v2, InfiniBand) on top of the Linux RDMA drivers feeds the NVMe Over Fabric Discovery Service, NVMe Over Fabric Support, Admin Command Implementation, and I/O Command Implementation, all built on the Target Core.]
- Discovery Service: responsible for discovering subsystems available to a host, namespaces, and multipath
- Fabrics Support: responsible for servicing Fabrics commands (connect, property get/set)
- Admin: responsible for parsing and servicing admin commands such as controller identify, set features, keep-alive, log page, ...
- I/O: responsible for performing the actual I/O (Read, Write, Flush, Deallocate)
- Target Core: defines and manages the entities (subsystems, controllers, namespaces, ...) and their allocation; responsible for initial command processing and correct orchestration of stack setup and tear down
How Data Flows from Host to Target in NVMe-oF? [Diagram: on the host, the Linux file system and block IO layers hand off to the kernel NVMe Fabrics driver, which issues RDMA verbs to the RDMA driver and a NIC with RDMA/TCP offload; on the target, the NIC delivers RDMA verbs up through the RDMA driver and the kernel Fabrics code to the NVMe device driver and the drives.]
Supported Storage Arrays Today: NetApp EF570 All Flash Array (FC-NVMe), Pure Storage DirectFlash, DELL EMC, IBM, Supermicro, Mangstor, E8 Storage, Pavilion Data, Excelero, Apeiron
Q&A – mail us @ sanjeev24.k@tcs.com