OceanStor 9000 InfiniBand Technical White Paper

OceanStor 9000
Issue: V1.01
Date: 2014-03-29
HUAWEI TECHNOLOGIES CO., LTD.

Copyright Huawei Technologies Co., Ltd. 2014. All rights reserved.

No part of this document may be reproduced or transmitted in any form or by any means without prior written consent of Huawei Technologies Co., Ltd.

Trademarks and Permissions

HUAWEI and other Huawei trademarks are trademarks of Huawei Technologies Co., Ltd. All other trademarks and trade names mentioned in this document are the property of their respective holders.

Notice

The purchased products, services and features are stipulated by the contract made between Huawei and the customer. All or part of the products, services and features described in this document may not be within the purchase scope or the usage scope. Unless otherwise specified in the contract, all statements, information, and recommendations in this document are provided "AS IS" without warranties, guarantees or representations of any kind, either express or implied.

The information in this document is subject to change without notice. Every effort has been made in the preparation of this document to ensure accuracy of the contents, but all statements, information, and recommendations in this document do not constitute a warranty of any kind, express or implied.

Huawei Technologies Co., Ltd.
Address: Huawei Industrial Base, Bantian, Longgang, Shenzhen 518129, People's Republic of China
Website: http://www.huawei.com
Email: support@huawei.com

2014-6-24 Huawei Confidential

Contents

1 InfiniBand Overview
  1.1 Technical Background
2 Technical Features of IB
  2.1 Introduction to IB
    2.1.1 Working Principles
    2.1.2 IB Architecture
    2.1.3 IB Protocol Layering
  2.2 Technical Features
  2.3 Customer Benefits
3 Acronyms and Abbreviations

1 InfiniBand Overview

1.1 Technical Background

The layered structure of the traditional TCP/IP protocol stack introduces high network latency, buffer management overhead, and extra load on the operating system. As network technologies develop, networks need an open, switch-centric architecture that offers high bandwidth, low latency, high reliability, and flexible scalability. InfiniBand (IB) was created to meet these requirements and is well suited to both storage networks and computing networks.

By using Remote Direct Memory Access (RDMA) and a virtual addressing scheme, IB enables a server to identify and use part of the memory of other servers without consuming operating system kernel resources. IB retains the benefits of a bus, such as high bandwidth and low latency, while reducing the processing load on the CPU. IB is therefore a good fit for storage cluster applications.

2 Technical Features of IB

2.1 Introduction to IB

2.1.1 Working Principles

Compared with other network protocols such as TCP/IP, IB has higher transfer efficiency. Some protocols can resend lost data packets, but repeated acknowledgement and retransmission slow down communication and degrade system performance.

TCP is a transport protocol used widely in devices ranging from refrigerators to supercomputers. It has disadvantages, however: the protocol is extremely complicated, its code base is large, it must handle many exception cases, and it is hard to offload to hardware.

Unlike TCP, IB uses a reliable, credit-based flow control mechanism to keep connections intact and packet loss near zero. Data is sent only when the receive buffer has sufficient space. After receiving data, the recipient returns a signal that indicates the available buffer space. In this way, IB eliminates the retransmission delay caused by packet loss, improving transfer efficiency and overall performance.

IB supports channel-based, bidirectional, serial transfer. The IB topology is a switched fabric built from IB switches. If a channel is not long enough, InfiniBand Architecture (IBA) repeaters can extend it. Each IBA network is called a subnet and contains a maximum of 65,536 nodes. IBA switches and IBA repeaters operate only within a subnet. Communication across multiple IBA subnets requires IBA routers or IBA gateways.

Nodes connect to an IBA subnet through adapters. CPUs and memory connect through host channel adapters (HCAs). Disks and I/O devices connect through target channel adapters (TCAs). The connection between two components is called a link. Together, these components form a complete IBA.

IB supports several transmission media. Inside a device, copper traces on printed circuit boards (PCBs) carry the signal, typically in the backplanes of industrial control and telecom equipment. Outside a device, copper cables are used. Over long distances, optical fibers are used. The maximum transfer distance is 17 m over copper traces or copper cables and up to 10 km over optical fiber. In addition, IBA components are hot pluggable, and IBA supports the Active Cable intelligent connection mechanism with auto-detection and auto-adaptation.
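The flow control behaviour described in this section (send only when the receiver has buffer space, and have the receiver report freed space back to the sender) can be illustrated with a small simulation. This is a minimal sketch, not IB's actual link-layer implementation: the class names, the one-credit-per-packet accounting, and the round-based drain model are all assumptions made for illustration.

```python
from collections import deque


class Receiver:
    """Receiver with a fixed number of buffer slots (hypothetical model)."""

    def __init__(self, buffer_slots):
        self.buffer_slots = buffer_slots
        self.buffer = deque()

    def accept(self, packet):
        # A well-behaved sender holds a credit for every packet it sends,
        # so the buffer can never overflow and no packet is dropped.
        assert len(self.buffer) < self.buffer_slots, "flow control violated"
        self.buffer.append(packet)

    def consume(self, count):
        """Drain up to `count` packets; return the number of slots freed."""
        freed = 0
        while self.buffer and freed < count:
            self.buffer.popleft()
            freed += 1
        return freed


class Sender:
    """Sender that transmits only while it holds credits."""

    def __init__(self, initial_credits):
        self.credits = initial_credits
        self.sent = 0
        self.stalled = 0  # number of times the sender had to wait

    def try_send(self, receiver, packet):
        if self.credits == 0:
            self.stalled += 1
            return False
        self.credits -= 1
        receiver.accept(packet)
        self.sent += 1
        return True


def simulate(num_packets, buffer_slots, drain_per_round):
    """Send num_packets through a credit-controlled link; nothing is dropped."""
    if buffer_slots <= 0 or drain_per_round <= 0:
        raise ValueError("buffer_slots and drain_per_round must be positive")
    receiver = Receiver(buffer_slots)
    sender = Sender(initial_credits=buffer_slots)
    next_packet = 0
    delivered = 0
    while delivered < num_packets:
        # Send as many packets as the current credits allow.
        while next_packet < num_packets and sender.try_send(receiver, next_packet):
            next_packet += 1
        # The receiver processes some packets and returns the freed credits.
        freed = receiver.consume(drain_per_round)
        delivered += freed
        sender.credits += freed
    return sender.sent, sender.stalled, delivered


if __name__ == "__main__":
    sent, stalled, delivered = simulate(num_packets=10, buffer_slots=4, drain_per_round=2)
    print(f"sent={sent} stalled={stalled} delivered={delivered}")
```

In this model the sender waits instead of transmitting into a full buffer, so every packet that is sent is also delivered and no retransmission is ever needed.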

2.1.2 IB Architecture

Figure 2-1 IB architecture

The IB standard defines a set of devices used for system communication: channel adapters, switches, and routers.

1. A channel adapter connects devices to the network. Channel adapters are classified into HCAs for main control nodes and TCAs for peripheral nodes. As a result, I/O devices are independent of hosts and can sit directly on the network. A channel adapter implements the functions of the physical layer, link layer, network layer, and transport layer. It is an important part of the IB network interface: a programmable DMA component with specific protection features that allows local and remote DMA operations.
2. A switch is a basic component of the IB architecture and is responsible for forwarding packets within an IB subnet.
3. A router is also a basic component of the IB architecture and is responsible for forwarding packets between different IB subnets.
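The division of work between the three device types can be summarized in a short sketch. It assumes a simplified model in which each endpoint is identified by a subnet label and a node label. The `Endpoint` type and `forwarding_path` function are hypothetical names introduced only for this illustration and do not correspond to any IB API.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Endpoint:
    subnet: str  # hypothetical subnet label
    node: str    # hypothetical node label


def forwarding_path(src: Endpoint, dst: Endpoint) -> List[str]:
    """Return the kinds of devices a packet passes through from src to dst."""
    if src.subnet == dst.subnet:
        # Switches forward packets within a single subnet.
        return ["channel adapter", "switch", "channel adapter"]
    # Routers forward packets between different subnets.
    return ["channel adapter", "switch", "router", "switch", "channel adapter"]


if __name__ == "__main__":
    host = Endpoint("subnet-A", "host-1")
    local_disk = Endpoint("subnet-A", "disk-1")
    remote_disk = Endpoint("subnet-B", "disk-7")
    print(forwarding_path(host, local_disk))
    print(forwarding_path(host, remote_disk))
```

A real fabric may place several switches on a path. The sketch only shows that a router appears when the source and destination are in different subnets.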

2.1.3 IB Protocol Layering

Figure 2-2 IB protocol layering

Similar to the traditional TCP/IP protocol stack, IB is a layered protocol. Each layer provides different functions, lower layers provide services for upper layers, and the layers are independent of each other.

1. The upper-level protocol layer provides the Verbs interface between applications and hardware drivers, allowing upper-layer applications to perform RDMA programming through the Verbs interface.
2. The transport layer delivers packets to their target and segments and reassembles messages that exceed the maximum transfer unit (MTU). Specifically, it is responsible for packet delivery, multiplexing, basic transport services, and the sending, receiving, and reassembly of packet segments. The transport layer sends a data packet to a specific queue pair (QP) and instructs the QP to process it. If the payload of a message exceeds the path MTU, the transport layer divides the message into multiple data packets. The QP at the receiving end reassembles the data into a specified data buffer.
3. The network layer routes packets between subnets. It describes the protocols for forwarding data packets between subnets and is similar to the network layer of an IP network. The network layer is not required when data is transferred within a subnet.
4. The link layer defines packet formats and packet operations, such as flow control and packet switching within a subnet. It supports two types of packets: link management packets and data packets. Link management packets are used to establish and maintain link operation. They are generated and consumed at the link layer and are not subject to flow control. They are used to negotiate bit rates and link widths between the ports at each end of a link, to transport flow control credits, and to maintain link integrity. Data packets carry a number of optional headers.
5. The physical layer transfers data bit by bit over the transmission medium. IB provides lossless network transfer, and the physical layer meets a bit error rate (BER) requirement of 10^-12. The physical layer defines electrical and mechanical characteristics, including cables and receptacles for optical fiber and copper media, backplane connectors, and hot-swap characteristics. It also defines three types of physical ports: backplane ports, electrical ports, and optical ports. Electrical cables are made of copper and support a transfer distance of up to 17 m. Optical fiber ports support a transfer distance of up to 10 km.
6. The physical layer also specifies how bits are encoded into symbols on the channel and defines the symbols used for framing (start and end of packet), data symbols, and idle fill. In addition, it describes the signaling protocol for a validly formed packet: symbol encoding, alignment of framing symbols, no invalid or non-data symbols between the start and end delimiters, no disparity errors, and the synchronization method.

2.2 Technical Features

1. High bandwidth
   IB aggregates 1, 4, 8, or 12 channels (lanes) in parallel to increase link bandwidth, and the SDR, DDR, QDR, and FDR data rates increase bandwidth further. Bandwidth depends on the number of parallel channels and the data rate, as shown in the following table:

   Number of Channels   SDR          DDR          QDR           FDR
   1X                   2.5 Gbit/s   5 Gbit/s     10 Gbit/s     14 Gbit/s
   4X                   10 Gbit/s    20 Gbit/s    40 Gbit/s     56 Gbit/s
   8X                   20 Gbit/s    40 Gbit/s    80 Gbit/s     112 Gbit/s
   12X                  30 Gbit/s    60 Gbit/s    120 Gbit/s    168 Gbit/s

2. Low latency
   Latency is a key metric for the interconnect of high-performance computers. In the IB topology, switch latency is less than 100 ns and application latency is between 1 µs and 3 µs.

3. Flexible scalability
   Millions of terminal devices can be interconnected in the point-to-point (P2P) switched fabric to provide an IB network without congestion.

4. QoS
   Sixteen virtual channels are available, to which 16 service levels (SLs) can be mapped. By setting a priority for each virtual channel, you can manage the quality of service of the different SLs. Together with the credit-based flow control mechanism and the injection rate control mechanism, this allows congestion to be controlled.

5. Support for RDMA
   RDMA enables IB servers and the services in the storage network to exchange data with the memory of other servers at high speed.

6. Dedicated protocol offload engine
   IB implements efficient and reliable P2P transmission in hardware and supports channel-based message transmission and memory mapping. It can also bypass the operating system kernel, relieving the CPU load and improving overall performance.

7. Separation of the I/O subsystem from the host
   Channel adapters provide links to I/O controllers and allow I/O devices to be detached from hosts. The subnet management model, which consists of one primary subnet manager (SM) and multiple subnet management agents (SMAs), is secure and efficient. This separation helps conserve cabinet space, provides excellent scalability, and removes the distance bottleneck between the host and the I/O system. The transfer distance is 17 m over copper and 10 km over optical fiber.

8. Support for partitioning
   IB subnets can be divided into multiple partitions, providing improved performance and security.

9. Fault tolerance
   Multiple independent physical channels are built between the host system and the I/O devices to achieve fault tolerance.

10. IPv6 addressing
   IB uses the IPv6 header format, in which each data packet contains the source address and destination address of the data. These addresses enable IB switches and routers to forward data directly to specific devices according to their forwarding tables, which the SM configures by sending subnet management packets (SMPs) to the SMAs.
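Section 2.1.3 notes that the transport layer splits a message whose payload exceeds the path MTU into multiple data packets and that the receiving QP reassembles them into a data buffer. The following is a minimal sketch of that idea, assuming a simplified packet made of a sequence number and a payload. The function names and the packet layout are hypothetical and do not reflect the real IB packet format.

```python
from typing import List, Tuple

Packet = Tuple[int, bytes]  # (sequence number, payload): illustrative layout only


def segment(message: bytes, path_mtu: int) -> List[Packet]:
    """Split a message into packets whose payload is at most path_mtu bytes."""
    if path_mtu <= 0:
        raise ValueError("path_mtu must be positive")
    return [
        (seq, message[offset:offset + path_mtu])
        for seq, offset in enumerate(range(0, len(message), path_mtu))
    ]


def reassemble(packets: List[Packet]) -> bytes:
    """Rebuild the message, checking that packets arrive in sequence order."""
    buffer = bytearray()
    for expected_seq, (seq, payload) in enumerate(packets):
        if seq != expected_seq:
            raise ValueError(f"expected packet {expected_seq}, got {seq}")
        buffer.extend(payload)
    return bytes(buffer)


if __name__ == "__main__":
    message = bytes(range(256)) * 40          # 10,240-byte example message
    packets = segment(message, path_mtu=4096)  # 4096 is an example MTU value
    print([len(payload) for _, payload in packets])
    assert reassemble(packets) == message
```

The sketch rejects out-of-order packets rather than reordering them. That is a simplification chosen for clarity, not a statement about how a given IB transport service handles ordering.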
2.3 Customer Benefits

Typical Application Scenarios

The IB network applies to high-concurrency and high-performance computing scenarios, in which customers have demanding bandwidth and latency requirements. Two networking modes are common: both the front-end and back-end networks use IB, or the front-end network uses 10GE and the back-end network uses IB. The OceanStor 9000 supports both networking modes. A typical deployment uses 4-channel QDR IB adapters and switches.

Customer Benefits

With its high bandwidth, low latency, high reliability, and massive cluster scalability, and through the use of RDMA and a dedicated protocol offload engine, IB provides storage customers with sufficient bandwidth and low response latency.
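The typical scenario above uses 4-channel QDR links. The table in Section 2.2 follows a simple rule: link bandwidth is the per-channel rate multiplied by the number of channels. The sketch below reproduces that arithmetic. It assumes the table values are signaling rates in Gbit/s and uses the table's rounded figure of 14 Gbit/s per channel for FDR. The names `PER_CHANNEL_GBITS` and `link_rate` are hypothetical.

```python
# Per-channel signaling rates in Gbit/s, taken from the 1X row of the table.
PER_CHANNEL_GBITS = {"SDR": 2.5, "DDR": 5.0, "QDR": 10.0, "FDR": 14.0}
CHANNEL_COUNTS = (1, 4, 8, 12)


def link_rate(channels: int, data_rate: str) -> float:
    """Return the aggregate signaling rate of a link in Gbit/s."""
    if channels not in CHANNEL_COUNTS:
        raise ValueError(f"unsupported channel count: {channels}")
    return channels * PER_CHANNEL_GBITS[data_rate]


if __name__ == "__main__":
    # Rebuild the whole table row by row.
    print("Channels " + " ".join(f"{name:>6}" for name in PER_CHANNEL_GBITS))
    for channels in CHANNEL_COUNTS:
        row = " ".join(f"{link_rate(channels, name):>6g}" for name in PER_CHANNEL_GBITS)
        print(f"{str(channels) + 'X':<8} {row}")
    # The typical 4-channel QDR configuration:
    print("4X QDR:", link_rate(4, "QDR"), "Gbit/s")
```

These are raw signaling rates. The data rate available to applications is lower because of encoding and protocol overhead.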

3 Acronyms and Abbreviations

DDR   Double Data Rate
FDR   Fourteen Data Rate
IB    InfiniBand
QDR   Quad Data Rate
SDR   Single Data Rate