STO1193BU
A Closer Look at vSAN Networking Design and Configuration Considerations
Cormac Hogan, Andreas Scherr
VMworld 2017 #VMworld #STO1193BU
Disclaimer This presentation may contain product features that are currently under development. This overview of new technology represents no commitment from VMware to deliver these features in any generally available product. Features are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind. Technical feasibility and market demand will affect final delivery. Pricing and packaging for any new technologies or features discussed or presented have not been determined.
Agenda
1 vSAN Networking Overview
2 Multicast and Unicast
3 NIC Teaming and Load Balancing
4 Network Topologies (incl. Stretched and 2-node)
5 Network Performance Considerations
Where Should I Begin? StorageHub! https://storagehub.vmware.com/#!/vmware-vsan/plan-and-design
vSAN Networking Overview
vSAN Networking Major Software Components
CMMDS (Cluster Monitoring, Membership, and Directory Service)
- Inter-node cluster communications and metadata exchange
- Multicast with vSAN 6.5 and earlier; unicast with vSAN 6.6 and later
- Heartbeat sent from the master to all hosts every second
- Traffic is light in steady state
RDT (Reliable Datagram Transport)
- Carries the bulk of vSAN traffic: virtual disk data distributed across the cluster
- Replication/resync traffic
vSAN Networking Ports and Firewalls
ESXi firewall considerations
- When vSAN is enabled on a cluster, all required firewall ports are opened automatically; no admin action is needed
Ports
- CMMDS (UDP 12345, 23451, 12321)
- RDT (TCP 2233)
- VSANVP (TCP 8080)
- Witness host (TCP 2233 and UDP 12321)
vSAN Encryption / KMS server
- Communication between vCenter and the KMS to obtain keys
- vSAN Encryption has a special dynamic firewall rule opened on demand on ESXi hosts
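The corresponding rulesets can be confirmed from the ESXi shell. A minimal check, assuming the ruleset names carried in current builds (vsanvp, rdt, cmmds):
# esxcli network firewall ruleset list | grep -i -E 'vsan|rdt|cmmds'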
Network Connectivity - IPv6
- vSAN can operate in IPv6-only mode (available since vSAN 6.2); all network communications go over the IPv6 network
- vSAN supports mixed IPv4 & IPv6 during upgrades only; do not run mixed mode in production
Minimum NIC Requirements for vSAN Networking

Configuration                         10Gb support   1Gb support   Comments
Hybrid cluster                        Y              Y             10Gb min. recommended, but 1Gb supported; <1ms RTT
All-flash cluster                     Y              N             All-flash requires 10Gb min.; 1Gb not supported; <1ms RTT
Stretched cluster - data to data      Y              N             10Gb required between data sites*; <5ms RTT
Stretched cluster - witness to data   Y              Y             100Mbps connectivity required from data sites to witness; <200ms RTT
2-node - data to data                 Y              Y             10Gb min. required for all-flash; 1Gb supported for hybrid, but 10Gb recommended
2-node - witness to data              Y              Y             1.5Mbps bandwidth required; <500ms RTT
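Where an RTT limit applies, a quick sanity check from the ESXi shell is vmkping against the remote vSAN VMkernel interface; vmk2 and the target address below are illustrative:
# vmkping -I vmk2 172.16.10.11
The average RTT in the output can then be compared against the limits in the table above.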
Distributed or Standard Switches?
vSphere Standard Switch
- No management dependence on vCenter; recovery is simple
- Prone to misconfiguration in larger setups
vSphere Distributed Switch
- Consistency: avoids configuration skew
- Teaming and failover: LACP/LAG/EtherChannel
- Network I/O Control: manage/allocate network bandwidth for the different vSphere traffic types
The vSphere Distributed Switch is free with vSAN
Network I/O Control (NIOC) Configuration Sample
- A single 10GbE physical adapter (for simplicity) handles traffic for vSAN, vMotion, virtual machines, and management
- If the adapter becomes saturated, Network I/O Control controls the bandwidth allocation
Sample configuration:

Traffic Type      Custom Shares Value   Bandwidth
vSAN              100                   5Gbps
vMotion           50                    2.5Gbps
Virtual Machine   30                    1.5Gbps
Management        20                    1Gbps
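Shares are relative weights that only take effect under contention: the sample totals 200 shares, so on a saturated 10Gb adapter vSAN receives 100/200 x 10Gbps = 5Gbps, vMotion 50/200 x 10Gbps = 2.5Gbps, virtual machines 30/200 x 10Gbps = 1.5Gbps, and management 20/200 x 10Gbps = 1Gbps. When the link is not saturated, each traffic type can use whatever bandwidth is free.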
NIC Teaming and Failover Options
Keep it simple, folks!
All virtual switches (VSS + VDS)
- Route based on IP hash / originating virtual port ID
Distributed switch only (VDS)
- Route based on physical NIC load (LBT)
Distributed switch + physical switch only
- Physical switches that support LACP/LAG/EtherChannel provide additional load-balancing algorithms
- Multi-chassis link aggregation capable switches
vSAN Multicast & Unicast
What Is Multicast?
- vSAN 6.5 (and earlier) used multicast as a discovery protocol to find all other nodes trying to join a vSAN cluster
- Multicast is a network communication technique used to send information simultaneously (one-to-many or many-to-many) to a group of destinations over an IP network
- Multicast must be enabled on the switches/routers of the physical network
- Internet Group Management Protocol (IGMP) is used within an L2 domain for group membership (follow switch vendor recommendations)
- Protocol Independent Multicast (PIM) is used for routing multicast traffic to a different L3 domain
- Multicast added complexity to vSAN networking
IGMP Considerations
Considerations with multiple vSAN clusters: prevent individual clusters from receiving all multicast streams
- Option 1 - Separate VLANs for each vSAN cluster
- Option 2 - When multiple vSAN clusters reside on the same layer-2 network, VMware recommends changing the default multicast address (see VMware KB 2075451)
Multicast Group Addresses on vSAN
- The vSAN master group multicast address is 224.1.2.3 (CMMDS updates)
- The vSAN agent group multicast address is 224.2.3.4 (heartbeats)
- The vSAN traffic service assigns the default multicast address settings to each host node

# esxcli vsan network list
Interface
   VmkNic Name: vmk2
   IP Protocol: IP
   Interface UUID: 26ce8f58-7e8b-062e-ba57-a0369f56deac
   Agent Group Multicast Address: 224.2.3.4
   Agent Group IPv6 Multicast Address: ff19::2:3:4
   Agent Group Multicast Port: 23451
   Master Group Multicast Address: 224.1.2.3
   Master Group IPv6 Multicast Address: ff19::1:2:3
   Master Group Multicast Port: 12345
   Host Unicast Channel Bound Port: 12321
   Multicast TTL: 5
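On pre-6.6 clusters these streams can be observed directly from the ESXi shell with the bundled capture tool; a quick sketch, with vmk2 as an illustrative interface:
# tcpdump-uw -i vmk2 udp and \( port 12345 or port 23451 \)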
vSAN 6.6 Introduces Unicast in Place of Multicast for vSAN Communication
vSAN and Unicast
- vSAN 6.6 now communicates using unicast for CMMDS updates
- A unicast transmission/stream sends IP packets to a single recipient on a network
- vCenter becomes the new source of truth for vSAN membership; the list of nodes is pushed to the CMMDS layer
- The networking mode (unicast/multicast) is not configurable
vSAN and Unicast
The cluster summary now shows whether a vSAN cluster's network mode is unicast or multicast.
Member Coordination with Unicast on vSAN 6.6
- vCenter now becomes the source of truth for vSAN cluster membership with unicast
- The vSAN cluster continues to operate in multicast mode until all participating nodes are upgraded to vSAN 6.6
- All hosts maintain a configuration generation number in case vCenter has an outage; on recovery, vCenter checks the configuration generation number to see if the cluster configuration changed in its absence
New Unicast Considerations in vSAN 6.6
Upgrade / Mixed Cluster Considerations with Unicast

6.6-only nodes*, all disks at format version 5 - CMMDS mode: unicast
- Permanently operates in unicast; cannot switch back to multicast. Adding pre-6.6 nodes will partition the cluster.
6.6-only nodes*, all disks at format version 3 or below - CMMDS mode: unicast
- 6.6 nodes operate in unicast mode; the cluster switches back to multicast if a pre-6.6 node is added.
Mixed 6.6 and pre-6.6 nodes, version 5 disks mixed with version 3 or below - CMMDS mode: mixed
- 6.6 nodes with v5 disks operate in unicast mode; pre-6.6 nodes with v3 disks operate in multicast mode. *** This will cause a cluster partition! ***
Mixed 6.6 and pre-6.6 nodes, all disks at format version 3 or below - CMMDS mode: multicast
- The cluster operates in multicast mode; all vSAN nodes must be upgraded to 6.6 to switch to unicast mode. *** A disk format upgrade to v5 makes unicast permanent ***
Considerations with Unicast
vSAN 6.6 unicast and DHCP
- vCenter Server deployed on a vSAN 6.6 cluster whose nodes obtained their IP addresses via DHCP
- If the IP addresses change, the vCenter VM may become unavailable; this can lead to a cluster partition, as vCenter cannot update the membership list
- This is not supported unless DHCP reservations are used
vSAN 6.6 unicast and IPv6
- IPv6 is supported with unicast communications in vSAN 6.6
- However, IPv6 link-local addresses are not supported for unicast communications; vSAN doesn't use link-local addresses to track membership
Query Unicast with esxcli
esxcli vsan cluster get now displays the CMMDS networking mode on each node - unicast or multicast:
# esxcli vsan cluster get
Query Unicast with esxcli
One can also check which vSAN cluster nodes are operating in unicast mode:
# esxcli vsan cluster unicastagent list
Unicast info is also displayed in the vSAN network details:
# esxcli vsan network list
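Taken together, a quick per-host check might look like the following sketch; the exact field names vary between builds, so treat the grep patterns as illustrative:
# esxcli vsan cluster get | grep -i -E 'mode|unicast'
# esxcli vsan cluster unicastagent list
# esxcli vsan network list | grep -i unicast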
NIC Teaming and Load-Balancing Recommendations
NIC Teaming - Single vmknic, Multiple vmnics (Uplinks)
Route based on originating virtual port
- Pros: simplest teaming mode, with very minimal physical switch configuration
- Cons: a single VMkernel interface cannot use more than a single physical NIC's bandwidth
Route based on physical NIC load (LBT)
- Pros: no physical switch configuration required
- Cons: with only one VMkernel port, its effectiveness is limited; minor overhead when ESXi re-evaluates the load
Load Balancing - Single vmknic, Multiple vmnics (Uplinks)
[Chart: KBps utilization per vmnic (vmnic0 vs. vmnic1) across Nodes 1-4]
- vSAN does not use NIC teaming for load balancing
- vSAN has no load-balancing mechanism to differentiate between multiple vmknics; as such, the vSAN I/O path chosen is not deterministic across physical NICs
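Per-uplink utilization is easy to confirm on the host itself; a sketch using the vmnic names from the chart above:
# esxcli network nic stats get -n vmnic0
# esxcli network nic stats get -n vmnic1
The network view in esxtop (key 'n') shows the same distribution interactively.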
NIC Teaming - LACP & LAG (***Preferred***)
Pros
- Improves performance and bandwidth
- If a NIC fails and the link state goes down, the remaining NICs in the team continue to pass traffic
- Many load-balancing options
- Rebalancing of traffic after failures is automatic
- Based on the 802.3ad standard
Cons
- Requires that physical switch ports be configured in a port-channel configuration
- More complex to configure and maintain
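Once a LAG is in place, its negotiation state and per-NIC counters can be verified from each host; a sketch, assuming a VDS-based LACP configuration:
# esxcli network vswitch dvs vmware lacp status get
# esxcli network vswitch dvs vmware lacp stats get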
Load Balancing - LACP & LAG (***Preferred***)
[Chart: KBps utilization per vmnic (vmnic0 vs. vmnic1) across Nodes 1-4 in a LACP setup]
- More consistent than route based on physical NIC load
- More individual clients (VMs) further increase the probability of a balanced load
vSAN Network on Different Subnets (Air Gap)
vSAN networks on two different subnets?
- If the subnets are routed and one host's NIC fails, the host will communicate over the other subnet
- If the subnets are air-gapped and one host's NIC fails, it will not be able to reach the other hosts via the other subnet; the host with the failing NIC becomes isolated
- There is no software-controlled failover mechanism; expect a TCP timeout of ~90 seconds on failure
Supported Network Topologies
Topologies
- Single site, multiple hosts
- Single site, multiple hosts with fault domains
- Multiple sites, multiple hosts with fault domains (campus cluster, but not a stretched cluster)
- Stretched cluster
- ROBO / 2-node
Design considerations: L2/L3, multicast/unicast, RTT (round-trip time)
Simplest Topology - Layer-2, Single Site, Single Rack
- Single site, multiple hosts, shared subnet/VLAN/L2 topology, multicast with IGMP
- No need to worry about routing the multicast traffic in pre-vSAN 6.6 deployments
- Layer-2 implementations are simplified even further with vSAN 6.6 and unicast; with such a deployment, IGMP snooping is not required
Layer-2, Single Site, Multiple Racks - pre-vSAN 6.6 (Multicast)
- Pre-vSAN 6.6, where vSAN traffic is multicast
- Vendor-specific multicast configuration required (IGMP/PIM)
Layer-2, Single Site, Multiple Racks - 6.6 and Later (Unicast)
- vSAN 6.6, where vSAN traffic is unicast
- No need to configure IGMP/PIM on the switches
Stretched Cluster Topologies
Stretched Cluster - L2 for Data, L3 to Witness, or L3 Everywhere
- vSAN 6.5 and earlier: traffic between data sites is multicast (metadata) and unicast (I/O)
- vSAN 6.6 and later: all traffic is unicast
- In all versions of vSAN, the witness traffic between a data site and the witness site has always been unicast
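Because vSAN VMkernel traffic does not use the management network's default gateway, reaching a witness on a different L3 network typically requires static routes on each data node; a sketch, with illustrative gateway and witness-network addresses:
# esxcli network ip route ipv4 add -n 192.168.109.0/24 -g 172.16.10.1
# esxcli network ip route ipv4 list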
Stretched Cluster - Why Not L2 Everywhere? (Unsupported)
- Consider a situation where the link between S2 and S3 is broken
- Spanning tree may discover that a path between S2 and S3 exists via switch S1
- Possible performance decrease if data network traffic passes through a lower-specification witness site
2-Node (ROBO)
2-Node vSAN for Remote Locations
- Both hosts in the remote office store data
- The witness, in a central office or third site, stores the witness data
- Unicast connectivity to the witness appliance
- <500ms RTT latency and 1.5Mbps bandwidth required from the data site to the witness
2-Node Direct Connect and Witness Traffic Separation
- Separates the vSAN data traffic from the witness traffic
- Ability to connect the data nodes directly using Ethernet cables (10GbE vSAN traffic via direct cable); two cables between hosts for higher network availability
- Witness traffic (management & witness) uses the management network
- Note: witness traffic separation is NOT supported for stretched clusters at this time
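Witness traffic separation is configured by tagging a VMkernel interface from the ESXi shell; a sketch, with vmk1 as an illustrative management-network interface on each data node:
# esxcli vsan network ip add -i vmk1 -T=witness
# esxcli vsan network list
The second command should then list the tagged interface with a witness traffic type.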
vSAN and Network Performance
General Concepts on Network Performance
Understand vSAN concepts and features
- Standard vSAN setup vs. stretched cluster; FTT=1 or RAID-5/6
Understand network best practices for optimum performance
- Physical switch topology
- ISL trunks are not oversubscribed
- MTU size is a factor
- No errors/drops/pause frames on the network switches
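When jumbo frames are configured, end-to-end MTU can be validated with vmkping: -d disallows fragmentation, and a payload of 8972 bytes is the largest that fits a 9000-byte MTU once the 28 bytes of IP and ICMP headers are counted (vmk2 and the target address are illustrative):
# vmkping -I vmk2 -s 8972 -d 192.168.1.12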
General Concepts on Network Performance
Understand host communication
- No errors/drops/CRC/pause frames on the network card
- Driver/firmware as per the VMware HCL
- Use SFPs/GBICs certified by your hardware vendor
- Use NIOC to optimize traffic at the protocol layer if links carry shared traffic (e.g. VM/vMotion/...)
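Driver and firmware versions can be read per NIC for comparison against the HCL; vmnic0 is illustrative:
# esxcli network nic get -n vmnic0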
DEMO: Adding 10ms Network Latency
Summary: Graphical Interpretation - IOPS vs. Latency
[Chart: IOPS against added network latency, showing a roughly linear decline]
- Native: ~47,000 IOPS
- +5ms latency: ~33,000 IOPS
- +10ms latency: ~23,100 IOPS
DEMO: Network 2% and 10% Packet Loss
Summary: Graphical Interpretation - IOPS vs. Packet Loss
[Chart: IOPS against packet loss %, showing a roughly exponential decline]
- Native: ~47,000 IOPS
- 1% loss: ~42,300 IOPS
- 2% loss: ~32,000 IOPS
- 10% loss: ~3,400 IOPS
Cormac Hogan @CormacJHogan Andreas Scherr @vsantester