CHAPTER 6

Host Redundancy, and IPoIB and SRP Redundancies

This chapter describes host redundancy, IPoIB redundancy, and SRP redundancy and includes the following sections:

• HCA Redundancy, page 6-1
• IPoIB High Availability, page 6-4
• OFED SRP High Availability, page 6-8

IPoIB and SRP are the drivers that currently support redundancy. For expansions of acronyms and abbreviations used in this publication, see Appendix A, Acronyms and Abbreviations.

HCA Redundancy

This section describes HCA redundancy and includes the following topics:

• Single HCA Redundancy, page 6-1
• Multiple HCA Redundancy, page 6-3
• Two HCAs with the IBM BladeCenter, page 6-3

Single HCA Redundancy

This section describes how a single HCA can be configured to provide redundancy. A dual-port HCA provides port-to-port redundancy within a single unit (see Figure 6-1). In this configuration, the HCA hardware and software drivers handle failover between the two ports on the same HCA.
See the Cisco InfiniBand Host Channel Adapter Hardware Installation Guide for further details about your HCA hardware installation.

Figure 6-1 Single HCA Redundancy with Dual Ports

[Figure: an InfiniBand host with a dual-port HCA connected to InfiniBand Fabric 1 and InfiniBand Fabric 2 through the SFS 7000D-1 and SFS 7000D-2 switches.]
Multiple HCA Redundancy

This section describes how multiple HCAs can be configured to provide redundancy. Multiple HCAs can be installed in a single host, which enables network traffic to fail over from one HCA to another. For example, two HCAs serving one host provide redundancy at the host level. Installing two HCAs in one host is the minimum recommended configuration for a redundant IB fabric (see Figure 6-2).

Figure 6-2 Two HCAs in a Single Host for Redundancy

[Figure: an InfiniBand host with HCA-1 and HCA-2, each providing Port 1 and Port 2, connected to InfiniBand Fabric 1 and InfiniBand Fabric 2 through the SFS 7000D-1 and SFS 7000D-2 switches.]

Two HCAs with the IBM BladeCenter

For a description of this redundant configuration, see Chapter 3, InfiniBand Server Switch Module Redundancy for the IBM BladeCenter.
IPoIB High Availability

This section describes IPoIB high availability and includes the following topics:

• Cisco SFS IPoIB High Availability, page 6-4
• OFED IPoIB High Availability, page 6-6

Every host that complies with the RFC 4391 IPoIB specification can use Ethernet gateway redundancies. For more information about Ethernet gateway redundancies, see Chapter 7, Ethernet Gateway and IPoIB Redundancies.

Cisco SFS IPoIB High Availability

This section describes Cisco SFS IPoIB high availability and includes the following topics:

• Merging Physical Ports
• Unmerging Physical Ports

IPoIB supports active/passive port failover high availability between two or more ports. When you enable the high availability feature, the ports on the HCA (for example, ib0 and ib1) merge into one virtual port. If you configure high availability between the ports on the HCA(s), only one of the physical ports passes traffic. The other ports serve as standbys in the event of a failure. For more details about the Cisco SFS host drivers, see the Cisco SFS InfiniBand Host Drivers User Guide for Linux.

Merging Physical Ports

The following procedure shows how to configure IPoIB high availability between two ports on one HCA. To configure IPoIB high availability on HCA ports in a Linux host, perform the following steps:

Step 1 Log in to your Linux host.

Step 2 Display the available interfaces by entering the ipoibcfg list command. The following example shows how to display the available interfaces:

host1# /usr/local/topspin/sbin/ipoibcfg list
ib0 (P_Key 0xffff) (SL:255) (Ports: InfiniHost0/1, Active: InfiniHost0/1)
ib1 (P_Key 0xffff) (SL:255) (Ports: InfiniHost0/2, Active: InfiniHost0/2)

Step 3 Take the interfaces offline. You cannot merge interfaces until you take them offline. The following example shows how to take the interfaces offline:

host1# ifconfig ib0 down
host1# ifconfig ib1 down
Step 4 Merge the two ports into one virtual IPoIB high availability port by entering the ipoibcfg merge command with the IB identifiers of the first and second IB ports on the HCA. The following example shows how to merge the two ports into one virtual IPoIB high availability port:

host1# /usr/local/topspin/sbin/ipoibcfg merge ib0 ib1

Step 5 Display the available interfaces by entering the ipoibcfg list command. The following example shows how to display the available interfaces:

host1# /usr/local/topspin/sbin/ipoibcfg list
ib0 (P_Key 0xffff) (SL:255) (Ports: InfiniHost0/1, Active: InfiniHost0/1)

The ib1 interface no longer appears because it is merged with ib0.

Step 6 Enable the interface by entering the ifconfig command with the appropriate port identifier ib# argument and the up keyword. The following example shows how to enable the interface with the ifconfig command:

host1# ifconfig ib0 up

Step 7 Assign an IP address to the merged port just as you would assign an IP address to a standard interface.

Unmerging Physical Ports

To unmerge physical ports and disable active/passive IPoIB high availability, perform the following steps:

Step 1 Disable the IPoIB high availability interface that you want to unmerge by entering the ifconfig command with the appropriate IB interface argument and the down argument. The following example shows how to disable the IPoIB high availability interface:

host1# ifconfig ib0 down

Step 2 Unmerge the port by entering the ipoibcfg unmerge command with the identifier of the port that you want to unmerge. After unmerging the port, ib1 no longer has an IP address and must be configured. The following example shows how to unmerge the port:

host1# /usr/local/topspin/sbin/ipoibcfg unmerge ib0 ib1

Step 3 Display the available interfaces by entering the ipoibcfg list command.
The following example shows how to display the available interfaces:

host1# /usr/local/topspin/sbin/ipoibcfg list
ib0 (P_Key 0xffff) (SL:255) (Ports: InfiniHost0/1, Active: InfiniHost0/1)
ib1 (P_Key 0xffff) (SL:255) (Ports: InfiniHost0/2, Active: InfiniHost0/2)
Step 4 Enable the interfaces by entering the ifconfig command with the appropriate IB interface argument and the up argument. The following example shows how to enable the interfaces:

host1# ifconfig ib0 up

OFED IPoIB High Availability

This section describes OFED IPoIB high availability and includes the following topics:

• Configuring IPoIB High Availability
• Verifying IPoIB High Availability

IPoIB supports active/passive port failover high availability between two or more ports. When you enable the high availability feature, the ports on the HCA(s) (such as ib0 and ib1) bond into one virtual port. If you configure high availability between the ports on the HCA(s), only one of the physical ports passes traffic. The other ports serve as standbys in the event of a failure.

IPoIB high availability is implemented through the IPoIB bonding driver. This driver is based on the Linux Ethernet bonding driver and has been modified to work with IPoIB. The ib-bonding package contains the bonding driver and a utility named ib-bond to manage and control the driver operation. For more details about the OFED host drivers, see the Cisco OpenFabrics Enterprise Distribution InfiniBand Host Drivers User Guide for Linux.

Configuring IPoIB High Availability

To configure IPoIB high availability, perform the following steps:

Step 1 Remove the existing IP addresses from the interfaces. The IP address from ib0 is reassigned to the bonding interface. The following example shows how to remove the existing IP addresses:

host1# ifconfig ib0 0.0.0.0
host1# ifconfig ib1 0.0.0.0

Step 2 Bond the two ports into one virtual IPoIB high availability port by using the ib-bond command.
The following example shows how to bond two ports into one virtual IPoIB high availability port in verbose mode:

host1# ib-bond --bond-ip 192.168.0.1/24 --slaves ib0,ib1 -v
enslaving ib0
enslaving ib1
bonding is up: 192.168.0.1
bond0: 80:00:04:04:fe:80:00:00:00:00:00:00:00:05:ad:02:00:23:f0:d0 192.168.0.1/24
    slave0: ib0 *
    slave1: ib1

In the preceding output, the asterisk after ib0 indicates that ib0 is the active interface and ib1 is the passive interface, and /24 is the subnet mask. Partition interfaces such as ib0.8002 can also be used with IPoIB high availability.
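The /24 prefix that ib-bond accepts and the dotted netmask that ifconfig reports describe the same mask. As a side illustration (a sketch in plain shell, not part of the ib-bonding package; the function name is hypothetical), the conversion can be done as follows:

```shell
# cidr_to_netmask PREFIX: print the dotted-quad netmask for a CIDR
# prefix length; for example, the /24 used with ib-bond above
# corresponds to 255.255.255.0.
cidr_to_netmask() {
    prefix=$1
    # Build a 32-bit mask with the top PREFIX bits set.
    mask=$(( (0xffffffff << (32 - prefix)) & 0xffffffff ))
    echo "$(( (mask >> 24) & 255 )).$(( (mask >> 16) & 255 )).$(( (mask >> 8) & 255 )).$(( mask & 255 ))"
}
```

For example, `cidr_to_netmask 24` prints 255.255.255.0, matching the Mask field that ifconfig shows for bond0.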
Step 3 (Optional) Enter the ifconfig command. The following example shows how to enter the ifconfig command:

host1# ifconfig bond0
bond0  Link encap:infiniband  HWaddr 80:00:04:04:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
       inet addr:192.168.0.1  Bcast:192.168.0.255  Mask:255.255.255.0
       inet6 addr: fe80::205:ad00:20:849/64 Scope:Link
       UP BROADCAST RUNNING MASTER MULTICAST  MTU:65520  Metric:1
       RX packets:33523452 errors:0 dropped:0 overruns:0 frame:0
       TX packets:165408699 errors:2 dropped:3 overruns:0 carrier:0
       collisions:0 txqueuelen:0
       RX bytes:175570845580 (163.5 GiB)  TX bytes:619840713192 (577.2 GiB)

The IPoIB high availability status information can be printed at any time with the ib-bond --status-all command. The following example shows how to print the IPoIB high availability status information:

host1# ib-bond --status-all
bond0: 80:00:04:04:fe:80:00:00:00:00:00:00:00:05:ad:02:00:23:f0:d0 192.168.0.1/24
    slave0: ib0 *
    slave1: ib1

The IPoIB high availability configuration can be removed with the ib-bond --stop-all command. The following example shows how to remove the IPoIB high availability configuration:

host1# ib-bond --stop-all

IPoIB high availability interfaces that are configured manually are not persistent across reboots. To configure IPoIB high availability when the host boots, you must use the configuration file /etc/infiniband/openib.conf, and you must also remove any existing IPoIB boot-time configuration files such as ifcfg-ib0. The following example shows the portion of openib.conf that must be edited to configure IPoIB high availability at boot time:

# Enable the bonding driver on startup
IPOIBBOND_ENABLE=yes
# Set bond interface names
IPOIB_BONDS=bond0
# Set specific bond params; address and slaves
bond0_ip=192.168.0.1/24
bond0_slaves=ib0,ib1

The drivers can be restarted for the change to take effect without rebooting.
The following example shows how the drivers can be restarted:

host1# /etc/init.d/openibd restart
Unloading HCA driver:                                      [  OK  ]
Loading HCA driver and Access Layer:                       [  OK  ]
Setting up InfiniBand network interfaces:
No configuration found for ib0
No configuration found for ib1
Setting up service network . . .                           [  done ]
Setting up bonding interfaces:
Bringing up interface bond0                                [  OK  ]
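Because a manual ib-bond configuration is lost at reboot, it is easy to miss one of the openib.conf lines shown above. A small sanity check along these lines (a sketch; the function name is illustrative and the key list is taken from the example above) can be run before restarting the drivers or rebooting:

```shell
# check_bond_conf FILE: print any boot-time bonding keys that are
# missing from an openib.conf-style file. Empty output means all of
# the keys from the example above are present.
check_bond_conf() {
    conf=$1
    for key in IPOIBBOND_ENABLE IPOIB_BONDS bond0_ip bond0_slaves; do
        grep -q "^${key}=" "$conf" || echo "missing: $key"
    done
}
```

For example, `check_bond_conf /etc/infiniband/openib.conf` prints nothing when the file matches the fragment shown above.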
Verifying IPoIB High Availability

To force an IPoIB high availability failover while IPoIB traffic is running, perform the following steps:

Step 1 Start ping or Netperf between two IPoIB hosts. For more details about how to start ping or Netperf between two IPoIB hosts, see the Cisco OpenFabrics Enterprise Distribution InfiniBand Host Drivers User Guide for Linux.

Step 2 Remove the cable connected to ib0, either manually or by using the OS CLI or GUI. Enter the ib-bond --status-all command to verify the IPoIB high availability status. The following example shows how to print the IPoIB high availability status:

host1# ib-bond --status-all
bond0: 80:00:04:04:fe:80:00:00:00:00:00:00:00:05:ad:02:00:23:f0:d0 192.168.0.1/24
    slave0: ib0
    slave1: ib1 *

The ib-bond --status-all command prints an asterisk next to the active interface. The active interface has now switched to ib1, as shown in the preceding example. A kernel syslog message is also printed every time there is a failover:

host1# dmesg
bonding: bond0: link status definitely down for interface ib0, disabling it
bonding: bond0: making interface ib1 the new active one.
bonding: send gratuitous arp: bond bond0 slave ib1

Step 3 Verify that the ping or Netperf continues with little or no interruption. The Element Manager GUI can also be used to display port statistics, which is useful for watching a port failover. For more information about the Element Manager GUI, see the Cisco SFS Product Family Element Manager User Guide.

OFED SRP High Availability

This section describes how to configure SRP for use with Device Mapper Multipath, which is included with both RHEL and SLES. Device Mapper Multipath supports both active/active (load balancing and failover) and active/passive (failover only) high availability, depending on the capability of the storage device. SRP should always be used with multipathing software for high availability, to prevent data corruption and data loss.
Other third-party multipathing software can also be used with SRP; for configuration information, consult the relevant documentation for that software.

Device Mapper Multipath allows hosts to route I/O over the multiple paths available to an end storage unit. A path refers to the connection from a host IB port to a storage controller port. When an active path through which I/O flows fails, Device Mapper Multipath reroutes the I/O over the other available paths. In a Linux host, when there are multiple paths to a storage controller, each path appears as a separate block device, which results in multiple block devices for a single LUN. Device Mapper Multipath creates a new multipath block device for the devices that have the same LUN WWN. For example, a host with two IB ports attached to a Cisco SFS 3012R Server Switch with two Fibre Channel ports attached
to a storage controller sees two block devices: /dev/sda and /dev/sdb, for example. Device Mapper Multipath creates a single block device, /dev/mapper/360003ba27cf53000429f82b300016652, that reroutes I/O through those two underlying block devices.

Device Mapper Multipath includes the following software components:

• dm-multipath: a kernel module that routes I/O and fails over between paths
• multipath: a configuration tool that provides commands to configure, list, and flush multipath devices
• multipathd: a daemon that monitors paths to check whether paths have failed or been fixed

Independent of the storage device's high availability capability, Device Mapper Multipath provides active/active high availability on the host IB ports. The Cisco SFS Fibre Channel gateway similarly provides active/active high availability between the SFS chassis and the Fibre Channel fabric.

To configure SRP high availability with Device Mapper Multipath, perform the following steps:

Step 1 Edit the file /etc/infiniband/openib.conf and change SRPHA_ENABLE=no to SRPHA_ENABLE=yes. This starts the srp_daemon program at boot time to create block devices for all paths to the SRP storage. The srp_daemon program also handles dynamic storage reconfiguration, such as new storage being added after the host is booted. Both SRP_LOAD and SRPHA_ENABLE must be set to yes for SRP high availability to function correctly.

Step 2 (Optional) Edit the file /etc/srp_daemon.conf to restrict SRP host driver access to a subset of the available SRP targets. By default, srp_daemon configures block devices for all SRP targets; the default /etc/srp_daemon.conf file contains this configuration. For more details about the srp_daemon.conf file, see the Cisco OpenFabrics Enterprise Distribution InfiniBand Host Drivers User Guide for Linux.

Step 3 Edit the file /etc/multipath.conf. On RHEL4, the devnode_blacklist section (blacklist on RHEL5) should be removed, commented out, or modified.
The following example shows the section of the file to be edited:

devnode_blacklist {
    devnode "*"
}

(Optional) On RHEL, also change user_friendly_names yes to user_friendly_names no in /etc/multipath.conf; the friendly names are not consistent between different hosts and operating systems. On SLES, /etc/multipath.conf does not exist by default, and no devnode_blacklist is in effect.

On both RHEL and SLES, additional storage-specific configuration information may be required in /etc/multipath.conf. Consult your storage device documentation for more details. For more information on multipath.conf, consult the device-mapper-multipath package (RHEL) or the multipath-tools package (SLES). Both packages have well-documented sample multipath.conf files in /usr/share/doc.
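Putting the Step 3 edits together, a minimal RHEL /etc/multipath.conf might look like the following (a sketch only; storage-specific device sections recommended by your array vendor may still be required):

```
# Catch-all blacklist commented out so that SRP devices are not
# ignored (the section is named "blacklist" on RHEL5).
# devnode_blacklist {
#     devnode "*"
# }

defaults {
    # Use WWID-based device names, which are consistent across hosts.
    user_friendly_names no
}
```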
Step 4 Configure Device Mapper Multipath to start at boot time. The following example shows the command to enter on RHEL:

host1# chkconfig multipathd on

The following example shows the commands to enter on SLES:

host1# chkconfig boot.multipath on
host1# chkconfig multipathd on

Step 5 Reboot the Linux host. After the reboot, multipath SRP devices should be accessible in /dev/mapper. Depending on the configuration, it may take a few minutes after reboot for /dev/mapper to be fully populated. The following example shows the output for the Fibre Channel gateway configuration when the IB host has one HCA with both IB ports connected to the Server Fabric Switch:

host1# ls /dev/mapper
3600c0ff00000000007a6d11b6f245e00
3600c0ff00000000007a6d11b6f245e00p1
3600c0ff00000000007a6d11b6f245e00p2
3600c0ff00000000007a6d11b6f245e01
3600c0ff00000000007a6d11b6f245e02
3600c0ff00000000007a6d11b6f245e02p1
3600c0ff00000000007a6d11b6f245e02p2
3600c0ff00000000007a6d11b6f245e02p3
3600c0ff00000000007a6d11b6f245e03
3600c0ff00000000007a6d11b6f245e03p1
3600c0ff00000000007a6d11b6f245e03p2
3600c0ff00000000007a6d11b6f245e04
3600c0ff00000000007a6d11b6f245e05
3600c0ff00000000007a6d11b6f245e06
3600c0ff00000000007a6d11b6f245e06p1
control

Step 6 View the SCSI devices.
The following example shows how to view the SCSI devices:

host1# lsscsi
[0:0:0:0]  disk  IBM-ESXS MAY2036RC      T107  /dev/sda
[1:0:0:0]  disk  SUN      StorEdge 3510  327P  /dev/sdb
[1:0:0:1]  disk  SUN      StorEdge 3510  327P  /dev/sdd
[1:0:0:2]  disk  SUN      StorEdge 3510  327P  /dev/sdk
[1:0:0:3]  disk  SUN      StorEdge 3510  327P  /dev/sdl
[1:0:0:4]  disk  SUN      StorEdge 3510  327P  /dev/sdm
[1:0:0:5]  disk  SUN      StorEdge 3510  327P  /dev/sdn
[1:0:0:6]  disk  SUN      StorEdge 3510  327P  /dev/sdo
[2:0:0:0]  disk  SUN      StorEdge 3510  327P  /dev/sdc
[2:0:0:1]  disk  SUN      StorEdge 3510  327P  /dev/sde
[2:0:0:2]  disk  SUN      StorEdge 3510  327P  /dev/sdf
[2:0:0:3]  disk  SUN      StorEdge 3510  327P  /dev/sdg
[2:0:0:4]  disk  SUN      StorEdge 3510  327P  /dev/sdh
[2:0:0:5]  disk  SUN      StorEdge 3510  327P  /dev/sdi
[2:0:0:6]  disk  SUN      StorEdge 3510  327P  /dev/sdj

The lsscsi command is supported by RHEL5 and SLES10 only; it is not supported by RHEL4.
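In the lsscsi output above, each StorEdge LUN appears twice, once through each SCSI host (one per IB port), which is exactly the duplication that Device Mapper Multipath collapses. As an illustration (a sketch, not part of the OFED tooling; the function name is hypothetical), a short filter can count the paths per LUN from lsscsi-style text:

```shell
# count_paths: read lsscsi-style lines on stdin and print how many
# StorEdge block devices (paths) were seen for each LUN number, which
# is the fourth field of the [host:channel:target:lun] tuple.
count_paths() {
    awk -F'[][]' '/StorEdge/ {
        split($2, id, ":")
        paths[id[4]]++
    }
    END { for (lun in paths) printf "LUN %s: %d paths\n", lun, paths[lun] }'
}
```

Running the lsscsi output above through this filter would report two paths for each of the seven StorEdge LUNs; the internal IBM disk at [0:0:0:0] is deliberately excluded by the StorEdge match.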
Step 7 List the relationship between the SRP block devices and multipath devices by using the multipath -l command. The following example shows how to use the multipath -l command:

host1# multipath -l
3600c0ff00000000007a6d11b6f245e06 dm-13 SUN,StorEdge 3510
[size=13g][features=0][hwhandler=0]
\_ 2:0:0:6 sdj 8:144 [active][undef]
\_ 1:0:0:6 sdo 8:224 [active][undef]
3600c0ff00000000007a6d11b6f245e05 dm-12 SUN,StorEdge 3510
[size=15g][features=0][hwhandler=0]
\_ 2:0:0:5 sdi 8:128 [active][undef]
\_ 1:0:0:5 sdn 8:208 [active][undef]
3600c0ff00000000007a6d11b6f245e04 dm-11 SUN,StorEdge 3510
[size=15g][features=0][hwhandler=0]
\_ 2:0:0:4 sdh 8:112 [active][undef]
\_ 1:0:0:4 sdm 8:192 [active][undef]
3600c0ff00000000007a6d11b6f245e03 dm-3 SUN,StorEdge 3510
[size=15g][features=0][hwhandler=0]
\_ 2:0:0:3 sdg 8:96 [active][undef]
\_ 1:0:0:3 sdl 8:176 [active][undef]
3600c0ff00000000007a6d11b6f245e02 dm-2 SUN,StorEdge 3510
[size=15g][features=0][hwhandler=0]
\_ 2:0:0:2 sdf 8:80 [active][undef]
\_ 1:0:0:2 sdk 8:160 [active][undef]
3600c0ff00000000007a6d11b6f245e01 dm-1 SUN,StorEdge 3510
[size=15g][features=0][hwhandler=0]
\_ 2:0:0:1 sde 8:64 [active][undef]
\_ 1:0:0:1 sdd 8:48 [active][undef]
3600c0ff00000000007a6d11b6f245e00 dm-0 SUN,StorEdge 3510
[size=15g][features=0][hwhandler=0]
\_ 2:0:0:0 sdc 8:32 [active][undef]
\_ 1:0:0:0 sdb 8:16 [active][undef]

In the preceding output, each multipath device corresponds to two SRP block devices, one on each of the two attached host IB ports.