OceanStor 9000
Issue: V1.01
Date: 2014-03-29
HUAWEI TECHNOLOGIES CO., LTD.
Copyright © Huawei Technologies Co., Ltd. 2014. All rights reserved.
No part of this document may be reproduced or transmitted in any form or by any means without prior written consent of Huawei Technologies Co., Ltd.

Trademarks and Permissions
The Huawei logo and other Huawei trademarks are trademarks of Huawei Technologies Co., Ltd. All other trademarks and trade names mentioned in this document are the property of their respective holders.

Notice
The purchased products, services and features are stipulated by the contract made between Huawei and the customer. All or part of the products, services and features described in this document may not be within the purchase scope or the usage scope. Unless otherwise specified in the contract, all statements, information, and recommendations in this document are provided "AS IS" without warranties, guarantees or representations of any kind, either express or implied.
The information in this document is subject to change without notice. Every effort has been made in the preparation of this document to ensure accuracy of the contents, but all statements, information, and recommendations in this document do not constitute a warranty of any kind, express or implied.

Huawei Technologies Co., Ltd.
Address: Huawei Industrial Base, Bantian, Longgang, Shenzhen 518129, People's Republic of China
Website: http://www.huawei.com
Email: support@huawei.com

2014-6-24 Huawei Confidential
Contents

1 InfiniBand Overview
  1.1 Technical Background
2 Technical Features of IB
  2.1 Introduction to IB
    2.1.1 Working Principles
    2.1.2 IB Architecture
    2.1.3 IB Protocol Layering
  2.2 Technical Features
  2.3 Customer Benefits
3 Acronyms and Abbreviations
1 InfiniBand Overview

1.1 Technical Background

The layered structure of the traditional TCP/IP protocol stack adds network latency and imposes extra buffer-management overhead on the operating system. As network technologies develop, networks require an open, switch-centric architecture that offers high bandwidth, low latency, high reliability, and flexible scalability. InfiniBand (IB) was created to meet these requirements, and its features make it suitable for both storage networks and computing networks.

By using Remote Direct Memory Access (RDMA) and a virtual addressing scheme, IB enables a server to identify and use part of the memory of other servers without consuming operating system kernel resources. IB retains the benefits of a bus, such as high bandwidth and low latency, while relieving the CPU of processing load. IB is therefore well suited to storage cluster applications.
2 Technical Features of IB

2.1 Introduction to IB

2.1.1 Working Principles

Compared with other network protocols (such as TCP/IP), IB has a higher transfer efficiency. Some protocols can resend lost data packets, but the repeated acknowledgement and retransmission during packet forwarding slow down communication and degrade system performance.

TCP is a transport protocol widely used in devices ranging from refrigerators to supercomputers. It has disadvantages, however: the protocol is extremely complicated, its code base is large, it must handle many exception cases, and it is hard to offload to hardware.

Unlike TCP, IB uses a credit-based flow control mechanism to ensure intact connections and near-zero packet loss. Data is sent only when the receive buffer has sufficient space. After receiving the data, the recipient returns a signal that specifies the available buffer space. In this way, IB eliminates the retransmission delay caused by packet loss, improving transfer efficiency and overall performance.

IB uses channel-based, bidirectional serial links. The IB topology is a switched fabric built from IB switches. If a link is not long enough, InfiniBand Architecture (IBA) repeaters can be used to extend it. Each IBA network is called a subnet and has a maximum of 65,536 nodes. IBA switches and IBA repeaters operate only within a subnet. Communication across multiple IBA subnets requires IBA routers or IBA gateways.

Nodes are connected to IBA subnets through adapters. CPUs and memory are connected through host channel adapters (HCAs). Disks and I/O devices are connected through target channel adapters (TCAs). The connection between two components is called a link. Together, these components form a complete IBA.

IB supports several transmission media.
Inside a device, the copper traces of printed circuit boards (PCBs) carry the signals (this is especially common in the backplanes of industrial control and telecom equipment). Outside a device, copper wires are used. Over long distances, optical fibers can be used. With copper traces or copper wires, the maximum transfer distance is 17 m. With optical fibers, the transfer distance can reach 10 km. In addition, IBA is hot-pluggable and supports the Active Cable intelligent connection mechanism with auto-detection and auto-adaptation.
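The flow control mechanism described in this section (data is sent only when the receiver has buffer space, and the receiver reports freed space back to the sender) can be sketched as a small simulation. This is a minimal illustrative model, not the actual IB link-layer implementation: the names `Receiver`, `Sender`, and `simulate`, and the idea of counting one credit per buffered packet, are assumptions made for this example.

```python
from collections import deque


class Receiver:
    """Receiving port with a fixed number of buffer slots (hypothetical model)."""

    def __init__(self, buffer_slots):
        self.free_slots = buffer_slots
        self.buffer = deque()

    def accept(self, packet):
        # The sender only transmits while it holds a credit,
        # so a free slot must exist here and nothing is dropped.
        assert self.free_slots > 0, "sender violated flow control"
        self.free_slots -= 1
        self.buffer.append(packet)

    def drain(self, count):
        """Consume up to `count` buffered packets and return the credits freed."""
        freed = min(count, len(self.buffer))
        for _ in range(freed):
            self.buffer.popleft()
        self.free_slots += freed
        return freed


class Sender:
    """Sending port that spends one credit per packet (hypothetical model)."""

    def __init__(self, initial_credits):
        self.credits = initial_credits
        self.sent = 0

    def try_send(self, receiver, packet):
        if self.credits == 0:
            return False  # wait instead of sending into a full buffer
        self.credits -= 1
        receiver.accept(packet)
        self.sent += 1
        return True

    def add_credits(self, count):
        self.credits += count


def simulate(total_packets, buffer_slots, drain_per_round):
    """Send `total_packets`; return (packets delivered, number of sender stalls)."""
    if drain_per_round <= 0:
        raise ValueError("the receiver must free at least one slot per round")
    receiver = Receiver(buffer_slots)
    sender = Sender(initial_credits=buffer_slots)
    stalls = 0
    next_packet = 0
    while next_packet < total_packets:
        if sender.try_send(receiver, next_packet):
            next_packet += 1
        else:
            # Out of credits: the receiver frees space and reports it back.
            stalls += 1
            sender.add_credits(receiver.drain(drain_per_round))
    return sender.sent, stalls
```

For example, `simulate(10, 4, 2)` delivers all 10 packets with 3 stalls and no drops. The sender waits rather than retransmits, which is the property the section attributes to IB.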
2.1.2 IB Architecture

Figure 2-1 IB architecture

The IB standard defines a set of devices that are used for system communication, including channel adapters, switches, and routers.

1. A channel adapter connects devices to the network. Channel adapters are classified into HCAs for host nodes and TCAs for peripheral nodes. In this way, I/O devices are independent of hosts and can sit directly on the network. A channel adapter incorporates the functions of the physical layer, link layer, network layer, and transport layer. It is an important part of the IB network interface: a programmable DMA component with specific protection features that allows local and remote DMA operations.
2. A switch is a basic component of the IB architecture and is responsible for forwarding packets within an IB subnet.
3. A router is also a basic component of the IB architecture and is responsible for forwarding packets among different IB subnets.
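The division of work among these devices can be illustrated with a short sketch: traffic inside one subnet is forwarded by switches only, while traffic between subnets also crosses a router. The `Node` class, the subnet labels, and the simplified one-switch-per-subnet path are assumptions made for illustration and are not part of the IB specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    subnet: str   # hypothetical subnet label
    adapter: str  # "HCA" for a host node, "TCA" for a peripheral (I/O) node


def forwarding_path(src, dst):
    """Return the device types a packet traverses (simplified model)."""
    if src.subnet == dst.subnet:
        # Same subnet: switches forward the packet.
        return [src.adapter, "switch", dst.adapter]
    # Different subnets: a router forwards the packet between them.
    return [src.adapter, "switch", "router", "switch", dst.adapter]


host = Node("host1", "subnet-a", "HCA")
disk = Node("disk1", "subnet-a", "TCA")
remote_host = Node("host2", "subnet-b", "HCA")

print(forwarding_path(host, disk))
print(forwarding_path(host, remote_host))
```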
2.1.3 IB Protocol Layering

Figure 2-2 IB protocol layering

Like the traditional TCP/IP protocol stack, IB is a layered protocol. Each layer provides different functions, the lower layers provide services for the upper layers, and the layers are independent of each other.

1. The upper-level protocol layer provides the Verbs interface between applications and hardware drivers, allowing upper-layer applications to perform RDMA programming based on the Verbs interface.
2. The transport layer is responsible for packet distribution, multiplexing, basic transmission services, and the sending, receiving, and reassembly of packet segments. It delivers each data packet to a specific queue pair (QP) and instructs the QP to process the packet. If the data payload of a message exceeds the path maximum transfer unit (MTU), the transport layer divides the message into multiple data packets. The QP at the receiving end reassembles the data into the specified data buffer.
3. The network layer is responsible for routing packets between different subnets. It describes the protocols for forwarding data packets between subnets and is similar to the network layer of an IP network. The network layer is not required when data is transferred within a subnet.
4. The link layer describes the data packet formats and the protocols for packet operations, such as flow control and packet switching within a subnet. The link layer supports two types of packets: link management packets and data packets.
Link management packets are used to establish and maintain link operations. They are generated and consumed at the link layer and are not subject to flow control. Data packets carry various optional headers. These packets are used to negotiate bit rates and link widths between the ports at each end of a link, to transport flow control credits, and to maintain link integrity.
5. The physical layer is responsible for transferring data frames over cables bit by bit. IB provides lossless network transfer, and the physical layer meets the requirement for a bit error rate (BER) of 10^-12. The physical layer defines the electrical and mechanical characteristics, including the cables and receptacles for optical fiber and copper media, the backplane connectors, and the hot-swap characteristics. In addition, the physical layer defines three types of physical ports: backplane ports, electrical ports, and optical ports. Electrical cables are made of copper wires and support a transfer distance of 100 m. Optical fiber ports support a transfer distance of up to 10 km.
The physical layer also specifies how bits are encoded into symbols on a channel and defines the signals used for framing (packet start and end), data symbols, and idles. In addition, the physical layer describes the signaling protocol for what constitutes a valid packet, such as symbol encoding, alignment of framing symbols, the absence of invalid or non-data symbols between start and end delimiters, the absence of disparity errors, and synchronization methods.

2.2 Technical Features

1. High bandwidth
In IB, 1, 4, 8, or 12 channels are used in parallel to increase link bandwidth, and the SDR, DDR, QDR, and FDR technologies are used to further increase bandwidth.
Bandwidth varies depending on the number of parallel channels and the technology used, as shown in the following table:

Number of Channels   SDR          DDR         QDR          FDR
1X                   2.5 Gbit/s   5 Gbit/s    10 Gbit/s    14 Gbit/s
4X                   10 Gbit/s    20 Gbit/s   40 Gbit/s    56 Gbit/s
8X                   20 Gbit/s    40 Gbit/s   80 Gbit/s    112 Gbit/s
12X                  30 Gbit/s    60 Gbit/s   120 Gbit/s   168 Gbit/s

2. Low latency
Latency is a key metric for evaluating the interconnect of high-performance computers. In the IB topology, switch latency is less than 100 ns and application latency is between 1 µs and 3 µs.
3. Flexible scalability
Millions of terminal devices can be interconnected in the point-to-point (P2P) switched fabric to provide an IB network without congestion.
4. QoS
Sixteen virtual channels are available and can be mapped to 16 service levels (SLs). By setting a priority for each virtual channel, you can manage the quality of service of the different SLs. Together with the credit-based flow control mechanism and the injection rate control mechanism, this keeps congestion under control.
5. Support for RDMA
RDMA enables IB servers and the services in the storage network to exchange data with the memory of other servers at high speed.
6. Dedicated protocol offload engine
IB implements efficient and reliable P2P transmission in hardware and supports channel-based message transmission and memory mapping technologies. It can also bypass the operating system kernel, which reduces the CPU load and improves overall performance.
7. Separation of the I/O subsystem from the host
Channel adapters provide links to the I/O controllers and allow the I/O devices to be detached from hosts. The subnet management pattern (one primary subnet manager (SM) plus multiple subnet management agents (SMAs)) is secure and efficient. This design saves cabinet space, provides excellent scalability, and removes the distance bottleneck between the host and the I/O system. If the transmission medium is copper, the transfer distance is 17 m. If the transmission medium is optical fiber, the transfer distance is 10 km.
8. Support for partitioning
IB subnets are divided into multiple partitions, providing improved performance and security.
9. Fault tolerance
Multiple physical channels (separate from each other) are built between the host system and the I/O devices to achieve fault tolerance.
10. IPv6 addressing
IB uses the IPv6 header format, in which a data packet contains the source address and destination address of the data. These addresses enable the IB switches and routers to forward data directly to specific devices according to their forwarding tables (configured by the SM-SMP-SMA).
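The bandwidth table under item 1 (High bandwidth) follows from a single multiplication: link rate = number of channels × per-channel rate. The sketch below rebuilds the table from the 1X row. It treats the values as signaling rates in Gbit/s, which is an assumption about the units, and the names `LANE_RATE_GBITS`, `link_rate`, and `bandwidth_table` are introduced only for this example.

```python
# Per-channel rate taken from the 1X row of the table (assumed to be Gbit/s).
LANE_RATE_GBITS = {"SDR": 2.5, "DDR": 5.0, "QDR": 10.0, "FDR": 14.0}

# Link widths listed in the table: 1X, 4X, 8X, and 12X.
LINK_WIDTHS = (1, 4, 8, 12)


def link_rate(width, technology):
    """Return the aggregate rate of a link with `width` parallel channels."""
    return width * LANE_RATE_GBITS[technology]


def bandwidth_table():
    """Rebuild the whole table as {width: {technology: rate}}."""
    return {
        width: {tech: link_rate(width, tech) for tech in LANE_RATE_GBITS}
        for width in LINK_WIDTHS
    }


for width, row in bandwidth_table().items():
    cells = ", ".join(f"{tech} {rate:g}" for tech, rate in row.items())
    print(f"{width}X: {cells}")
```

With these inputs, `link_rate(4, "QDR")` gives 40, the figure for the 4-channel QDR configuration that Section 2.3 names as the typical deployment.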
2.3 Customer Benefits

Typical application scenarios: The IB network applies to high-concurrency and high-performance computing scenarios, in which customers have demanding bandwidth and latency requirements. Typically, either both the front-end and back-end networks use IB, or the front-end network uses 10GE and the back-end network uses IB. The OceanStor 9000 supports both networking modes. In the typical scenario, 4-channel QDR IB adapters and switches are used.

Customer benefits: With its high bandwidth, low latency, high reliability, and massive cluster scalability, and through the adoption of RDMA and a dedicated protocol offload engine, IB provides storage customers with sufficient bandwidth and low response latency.
3 Acronyms and Abbreviations

DDR   Double Data Rate
FDR   Fourteen Data Rate
IB    InfiniBand
QDR   Quad Data Rate
SDR   Single Data Rate