A Study of iSCSI Extensions for RDMA (iSER). Patricia Thaler (Agilent).

A Study of iSCSI Extensions for RDMA (iSER)
Mallikarjun Chadalapaka (HP), Michael Ko (IBM), Patricia Thaler (Agilent), Uri Elzur (Broadcom), Hemal Shah (Intel)
Slide 1 August 27, 2003 NICELI, ACM SIGCOMM 2003

Outline
Background: the who and where
Motivation and case for iSER: the why
Layering of iSCSI, iSER and iWARP: stack and functionality distribution
iSER design features: connection setup, transformation, data integrity management
Changes/extensions to iSCSI: what is changed and why
Enhancements in iWARP protocols: automatic invalidation
Enhancements to iWARP Verbs: efficient registration of STags
Next steps: standardization
Questions

Background
The iSER paper is based on a just-concluded protocol design, explores work done by contributors from several companies in the RDMA Consortium, and belongs to the Experience category (the "E" in NICELI).
iSER stands for iSCSI Extensions for RDMA. It maps the iSCSI protocol over the iWARP protocol suite (RDMA over TCP/IP).
The focus of this paper is:
  how iSER enables efficient data movement for iSCSI using generic RDMA hardware;
  how and why certain iWARP architectural features were conceived during the iSER design.

iSCSI, TCP and the challenges therein
iSCSI is an application protocol designed to run on TCP/IP. It transports the SCSI protocol exchanges so that SCSI I/Os can be done over TCP/IP.
TCP copy overhead and reassembly buffer requirements were identified as serious acceptance and deployment barriers during iSCSI's early design.
The iSCSI protocol thus includes an optional feature called markers. Markers delineate iSCSI PDU boundaries via recurring pointers that show up at fixed intervals within the TCP data stream.
The iSCSI markers, however, aided iSCSI-specific direct data placement, which places each iSCSI PDU directly into its final memory location (this can also be done without markers, albeit with more reassembly memory).
With or without markers, iSCSI-specific data placement needed an iSCSI-specific NIC to run the iSCSI protocol efficiently and avoid TCP data copies.
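The marker idea on this slide can be illustrated with a small sketch. It is a simplified model, not the iSCSI marker format: the function name `marker_pointers`, the idea that each marker carries a single byte distance to the next PDU start, and the choice to measure positions in the stream without counting the markers themselves are all assumptions made for illustration.

```python
from bisect import bisect_left


def marker_pointers(pdu_starts, stream_len, interval):
    """Toy model: one marker every `interval` bytes of the stream.

    Returns (marker_position, distance_to_next_pdu_start) pairs.
    Positions are byte offsets in the stream; the marker bytes themselves
    are not counted (a simplifying assumption).
    """
    starts = sorted(pdu_starts)
    pointers = []
    for pos in range(interval, stream_len, interval):
        i = bisect_left(starts, pos)      # first PDU start at or after pos
        if i == len(starts):
            break                         # no further PDU start to point at
        pointers.append((pos, starts[i] - pos))
    return pointers


# PDUs begin at offsets 0, 300, 1000 and 1700 in a 2048-byte stream.
print(marker_pointers([0, 300, 1000, 1700], 2048, 512))
```

A receiver that has lost track of PDU boundaries can, in this model, wait for the next marker position and skip forward by the advertised distance to find a PDU header.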

The case for iSER
Considerations the designers pondered:
  Does RDMA over TCP/IP technology satisfy the data movement needs of iSCSI? If so, when the RDMA technology advances, so does iSCSI.
  Why tackle fundamental issues such as copy elimination via an iSCSI-specific protocol?
  Did iWARP say it offers CRC-level reliability on TCP/IP? Let iSCSI take the opportunity to stop playing transport!
  If nothing else, iSCSI needs iSER to run most efficiently on the RNICs (RDMA-enabled NICs) that are presumed to become pervasive in the future.
The iSCSI designers were thus ultimately convinced of the need for iSER, an extension to iSCSI that enables it to run on RDMA over TCP/IP (aka iWARP).
iSER has the explicit design goal of letting iSCSI run on RNICs with no more interrupts than an iSCSI NIC requires, i.e., of running most efficiently on generic RNICs.

iSCSI, iSER and iWARP
The iSER protocol is designed to run on the RDMAP protocol of the iWARP suite. The paper contains a discussion of why RDMAP was preferred over DDP.
The iSER wire protocol depends only on RDMAP. However, the iWARP Verbs are a crucial part of the solution puzzle. During the iSER design, certain innovations in the iWARP Verbs were also made to best meet the needs of iSER.
The first step was to define an architecture model, the Datamover Architecture, that distilled the needs of iSCSI into generic data movement primitives. The iSER design then mapped these primitives to RDMAP exchanges.
[Figure: protocol stack. Labels: SCSI; iSCSI; iSER; Datamover Interface; iWARP Verbs; generic RDMA over TCP/IP; RDMAP, DDP, MPA, TCP (iWARP protocol suite); RNIC.]

iSER design
The iSER protocol uses the well-known TCP port used for iSCSI connection establishment, rather than a new iSER well-known port.
  The iSCSI/iSER connection thus always starts in iSCSI streaming mode.
  A new iSCSI login key is used to turn the RDMA (iSER) mode on after login.
  The existing discovery and boot mechanisms work with no changes.
Transformation or encapsulation? A question not traditionally encountered in layered protocols.
  The iSER protocol simply encapsulates certain iSCSI PDUs (called control-type PDUs) in iSER RDMA Send Messages, while it transforms certain other iSCSI PDUs (called data-type PDUs) into RDMA Writes or RDMA Reads.
The iSER protocol relieves iSCSI of having to play the transport role.
  iSER mandates that iSCSI-level PDU digests must not be used, because iWARP guarantees CRC-level data integrity.
  iSCSI CRC generation, checking, retransmission requests, retransmissions, timeout-based retransmissions: a lot of complexity in iSCSI is thus gone!
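The encapsulate-or-transform split can be sketched as a small dispatch table. The slide does not list which PDUs fall into each class; the assignments below (SCSI Data-In to RDMA Write, R2T to RDMA Read, everything else encapsulated in a Send) are an assumption based on a reading of the iSER specification, and the names `Op`, `DATA_TYPE_PDUS` and `map_pdu` are hypothetical.

```python
from enum import Enum


class Op(Enum):
    SEND = "RDMA Send (PDU encapsulated)"
    RDMA_WRITE = "RDMA Write (PDU transformed)"
    RDMA_READ = "RDMA Read (PDU transformed)"


# Assumed data-type PDUs; every other PDU type is treated as control-type.
DATA_TYPE_PDUS = {
    "SCSI Data-In": Op.RDMA_WRITE,  # target writes read data into initiator memory
    "R2T": Op.RDMA_READ,            # target pulls write data from initiator memory
}


def map_pdu(pdu_type: str) -> Op:
    """Return the RDMAP operation this sketch assigns to an iSCSI PDU type."""
    return DATA_TYPE_PDUS.get(pdu_type, Op.SEND)


for pdu in ("SCSI Command", "SCSI Data-In", "R2T", "SCSI Response"):
    print(f"{pdu:14} -> {map_pdu(pdu).value}")
```

The point of the table is that only the data-type PDUs disappear from the wire as PDUs. Control-type PDUs still travel intact, wrapped in a Send message.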

Changes to iSCSI
The biggest set of changes to iSCSI in order to support iSER is in how iSCSI interfaces to its LLP (lower level protocol).
  Traditional iSCSI interfaces directly with TCP and is involved in a lot of data movement activity.
  In the new model, iSCSI simply yields the administration of data movement to iSER, and iSER and iWARP work together to move the data.
Wire protocol
  iSCSI-level PDU digests (header and data) must not be used (so don't bother to use the PDU-level recovery features of iSCSI).
  No piggybacking of status on the last read data PDU (the receiving RNIC doesn't demux during placement!).
Other areas
  iSCSI must know to negotiate the new login key that turns the RDMA (iSER) mode on after login.
  iSCSI must chunk long unsolicited data sequences into PDUs so that each mid-PDU is exactly of the negotiated max size.
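The chunking rule in the last point can be sketched as follows. The sketch assumes that "mid-PDU" means every PDU except the last one in the sequence, and the function name `chunk_unsolicited` and its parameters are hypothetical.

```python
def chunk_unsolicited(data: bytes, max_pdu_data: int) -> list[bytes]:
    """Split an unsolicited data sequence into PDU payloads.

    Every payload except the last is exactly `max_pdu_data` bytes long;
    only the final payload may be shorter.
    """
    if max_pdu_data <= 0:
        raise ValueError("negotiated max size must be positive")
    return [data[i:i + max_pdu_data] for i in range(0, len(data), max_pdu_data)]


payloads = chunk_unsolicited(bytes(10), max_pdu_data=4)
print([len(p) for p in payloads])
```

Fixed-size payloads let the receiver compute where each PDU's data belongs from its position in the sequence alone, which is presumably why only the final PDU may be short.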

Enhancement to RDMAP (automatic invalidation)
SCSI has a clearly defined transactional model: command (initiator -> target), data (either way), status (target -> initiator).
The initiator iSER layer (client) exposes its STags to the target (server).
After receiving the status, the initiator iSER layer invalidates the STag mapping before using those buffers.
How about doing this invalidation automatically on receiving the status? That takes one hardware access out of the performance path.
[Figure: two iSCSI / iSER / RNIC sequence diagrams. Without automatic invalidation: Status (SendSE Message), invalidate the exposed STag, allow buffer usage. With automatic invalidation: Status (SendSE with Invalidate Message), check the invalidated STag, allow buffer usage. Note: the red line is crossed only once!]
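The saving described on this slide can be made concrete with a toy model that counts host-initiated RNIC operations on the status path. The model is illustrative only: `ToyRnic` and its methods are hypothetical names and do not correspond to any real verbs API.

```python
class ToyRnic:
    """Toy RNIC that tracks valid STags and counts host-initiated accesses."""

    def __init__(self):
        self.valid_stags = set()
        self.host_accesses = 0

    def register(self, stag):
        self.host_accesses += 1
        self.valid_stags.add(stag)

    def invalidate(self, stag):
        # Explicit invalidation: the host has to go to the hardware.
        self.host_accesses += 1
        self.valid_stags.discard(stag)

    def receive_send_se(self, payload):
        # Plain status delivery; the STag stays valid.
        return payload

    def receive_send_se_with_invalidate(self, payload, stag):
        # The RNIC invalidates the STag itself and reports which one.
        self.valid_stags.discard(stag)
        return payload, stag


def complete_explicit(rnic, stag):
    before = rnic.host_accesses
    rnic.receive_send_se("status")
    rnic.invalidate(stag)                 # extra trip to the hardware
    return rnic.host_accesses - before


def complete_automatic(rnic, stag):
    before = rnic.host_accesses
    _, invalidated = rnic.receive_send_se_with_invalidate("status", stag)
    if invalidated != stag:               # local check, no hardware access
        raise RuntimeError("unexpected STag invalidated")
    return rnic.host_accesses - before


a, b = ToyRnic(), ToyRnic()
a.register(0x10)
b.register(0x10)
print(complete_explicit(a, 0x10), complete_automatic(b, 0x10))
```

In both cases the STag ends up invalid before the buffer is reused. With automatic invalidation, the iSER layer only has to compare the reported STag against the one it advertised.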

Enhancements to iWARP Verbs (fast register)
The initiator iSER layer (client) exposes its STags to the target (server).
The initiator iSER layer must register the Command buffer locally with the RNIC.
  The registration process yields the STag, so it must precede the advertisement.
  This is a synchronous wait for a hardware response in the performance path.
In the fast-register model, the STag is allocated to iSER a priori. It is merely associated with the Command buffer at runtime.
  The fast-registration is now guaranteed to succeed.
  The initiator iSER layer can post the fast-register and command requests to the hardware back-to-back, with no more waiting.
The paper also discusses automatic deregistration and Shared Receive Queues.
[Figure: two iSCSI / iSER / RNIC sequence diagrams. Without fast register: SCSI Command, register the buffer to get an STag, advertise the STag in the Command. With fast register: SCSI Command, Fast-Register with a known STag, advertise the same STag in the Command.]
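The difference between the two registration models can be sketched with a toy send queue that counts synchronous waits. All names here (`ToySendQueue`, `register_sync`, the work-request tuples) are hypothetical and are not taken from the iWARP Verbs specification.

```python
from collections import deque


class ToySendQueue:
    """Toy send queue that records posted work requests and blocking waits."""

    def __init__(self, preallocated_stags=()):
        self.free_stags = deque(preallocated_stags)  # STags allocated a priori
        self.posted = []                             # work requests, in order
        self.sync_waits = 0
        self._next_stag = 0x1000

    def register_sync(self, buffer_id):
        # Classic registration: block until the hardware returns an STag.
        self.sync_waits += 1
        stag = self._next_stag
        self._next_stag += 1
        return stag

    def post(self, work_request):
        self.posted.append(work_request)


def send_command_classic(sq, buffer_id):
    stag = sq.register_sync(buffer_id)           # must finish before advertising
    sq.post(("SEND_COMMAND", stag))
    return stag


def send_command_fast(sq, buffer_id):
    stag = sq.free_stags.popleft()               # STag is already known
    sq.post(("FAST_REGISTER", stag, buffer_id))  # posted back-to-back,
    sq.post(("SEND_COMMAND", stag))              # with no wait in between
    return stag


classic = ToySendQueue()
fast = ToySendQueue(preallocated_stags=[0x2000])
send_command_classic(classic, "buf0")
send_command_fast(fast, "buf0")
print(classic.sync_waits, fast.sync_waits)
```

Because the STag value is known before the fast-register request is posted, the command that advertises it can be queued immediately behind it.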

Next steps
The Datamover Architecture for iSCSI (DA) and iSCSI Extensions for RDMA (iSER) specifications were publicly released by the RDMA Consortium on July 21, 2003 (all specs are available at www.rdmaconsortium.org).
Several Consortium member companies are working on productization of the iWARP protocol suite and iSER.
Both the DA and iSER specs have been submitted to the IETF as Internet Drafts to pursue standardization.

Thank you! Questions?