STO1926BU
A Day in the Life of a vSAN I/O: Diving in to the I/O Flow of vSAN
John Nicholson (@lost_signal), Pete Koehler (@vmpete)
VMworld 2017 Content: Not for publication
#VMworld #STO1926BU

Disclaimer This presentation may contain product features that are currently under development. This overview of new technology represents no commitment from VMware to deliver these features in any generally available product. Features are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind. Technical feasibility and market demand will affect final delivery. Pricing and packaging for any new technologies or features discussed or presented have not been determined.

vSAN Objects and Components

vSAN objects and components
The vSAN datastore is an object store
An object store allows you to meet granular availability and performance requirements
Each object is made up of one or more components
Data (components) is distributed across the cluster based on VM storage policy
[Figure: a 700GB object with RAID-1, FTT=1: two copies of the object, each a RAID-0 set of components C1, C2, C3, plus a witness (W)]

Virtual machine as a set of objects on vSAN
VM Home namespace object
VM swap object
Virtual disk (VMDK) object
Snapshot (delta) object
Snapshot (delta) memory object

Applying performance and protection policies to objects
Policies define levels of protection and performance
Applied at a per-VM level or a per-VMDK level
vSAN currently provides 10 unique storage capabilities to vCenter Server
What-if APIs

Number of Failures to Tolerate (how many copies of your data?)
FTT defines the number of host, disk or network failures a storage object can tolerate
For n failures tolerated, n+1 copies of the object are created and 2n+1 hosts contributing storage are required
Primary Failures to Tolerate (PFTT) defines the number of site failures an object can tolerate (0, 1)
Secondary Failures to Tolerate (SFTT) defines the number of failures within a site an object can tolerate (0, 1, 2, 3)
[Figure: RAID-1 object with FTT=1 on hosts esxi-01 to esxi-04: two replicas, each serving ~50% of I/O, plus a witness]
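The n+1 copies and 2n+1 hosts rule above is simple arithmetic. A minimal sketch follows; the function name `mirroring_requirements` is illustrative and not part of any vSAN API, and the rule is applied here to RAID-1 mirroring only.

```python
def mirroring_requirements(failures_to_tolerate: int) -> dict:
    """Copies and hosts needed under the n+1 / 2n+1 rule from the slide."""
    n = failures_to_tolerate
    if n < 0:
        raise ValueError("failures to tolerate cannot be negative")
    return {"copies": n + 1, "min_hosts": 2 * n + 1}


if __name__ == "__main__":
    for ftt in (1, 2, 3):
        print(ftt, mirroring_requirements(ftt))
```

For FTT=1 this gives two copies on at least three hosts, which matches the two replicas plus witness in the illustrated cluster.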

Number of Disk Stripes Per Object (on how many devices?)
Defines the minimum number of capacity devices across which each replica of a storage object is distributed
Higher values may result in better performance
  Stripe width can improve performance of write destaging and fetching of reads
Higher values may put more constraints on the flexibility of meeting storage compliance policies
To be used only if performance is an issue
[Figure: RAID-1 object with FTT=1 and stripe width=2: each replica is a RAID-0 set of two stripes (stripe-1a, stripe-1b and stripe-2a, stripe-2b) on hosts esxi-01 to esxi-04, plus a witness]
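As a rough illustration of how FTT and stripe width combine, the sketch below counts the minimum number of data components for a mirrored object. It is an assumption-laden simplification: the function name is hypothetical, witnesses are not counted, and vSAN may create more components than this minimum.

```python
def min_data_components(failures_to_tolerate: int, stripe_width: int) -> int:
    """Minimum data components for a RAID-1 object: one stripe set per replica."""
    if failures_to_tolerate < 0 or stripe_width < 1:
        raise ValueError("need FTT >= 0 and stripe width >= 1")
    replicas = failures_to_tolerate + 1  # n+1 copies of the object
    return replicas * stripe_width       # each copy is striped across devices


if __name__ == "__main__":
    # FTT=1, stripe width=2: stripe-1a, stripe-1b, stripe-2a, stripe-2b
    print(min_data_components(1, 2))
```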

vSAN Fault Domains
Create fault domains to increase availability
Protect against rack failure, etc.
Example: four defined fault domains, one per rack
  FD1 = esxi-01, esxi-02
  FD2 = esxi-03, esxi-04
  FD3 = esxi-05, esxi-06
  FD4 = esxi-07, esxi-08
Cluster can tolerate a single rack failure in the illustrated scenario (RAID-1, FTT=1, with the replicas and the witness in different fault domains)
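The rack-failure claim can be checked with a small model. The sketch below is a simplification under stated assumptions: every component carries one vote, an object stays available when more than half of its components and at least one full replica survive, and the component placement shown is hypothetical (the slide does not list which hosts hold which component).

```python
FAULT_DOMAINS = {
    "FD1": ["esxi-01", "esxi-02"],
    "FD2": ["esxi-03", "esxi-04"],
    "FD3": ["esxi-05", "esxi-06"],
    "FD4": ["esxi-07", "esxi-08"],
}

# Hypothetical placement for a RAID-1, FTT=1 object.
PLACEMENT = {"replica-a": "esxi-01", "replica-b": "esxi-03", "witness": "esxi-05"}


def object_available(placement, fault_domains, failed_fd):
    """True if the object survives the loss of every host in failed_fd."""
    fd_of = {host: fd for fd, hosts in fault_domains.items() for host in hosts}
    surviving = [name for name, host in placement.items() if fd_of[host] != failed_fd]
    has_replica = any(name.startswith("replica") for name in surviving)
    has_quorum = len(surviving) * 2 > len(placement)  # strictly more than half
    return has_replica and has_quorum


if __name__ == "__main__":
    for fd in FAULT_DOMAINS:
        print(fd, object_available(PLACEMENT, FAULT_DOMAINS, fd))
```

Placing a replica and the witness in the same rack breaks the guarantee, which is why the components are spread across fault domains.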

Nested fault domains: Remote Protection for Stretched Clusters
Redundancy locally and across sites
With site failure, vSAN maintains availability with local redundancy in the surviving site
No change in stretched cluster configuration steps
Optimized site locality logic to minimize I/O traffic across sites
[Figure: two vSphere/vSAN sites, each protected locally with RAID-6 and mirrored across sites with RAID-1 over a 5ms RTT, 10GbE link, with a 3rd site for the witness]

vSAN I/O path explained

Anatomy of a write
[Figure: RAID-1 object with FTT=1 on hosts esxi-01, esxi-02 and esxi-03; the numbers below mark the steps in the diagram]
1. Guest OS issues a write operation to the virtual disk
2. Owner clones the write operation according to the FTT policy
3. esxi-01 and esxi-03 synchronously write to flash (log)
4. esxi-01 and esxi-03 ACK the prepare operation to the owner
5. Owner receives the ACK from both prepare operations and completes the I/O
6. Batches of writes are committed during the destaging process
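The six steps above can be sketched as a toy model. This is an illustration only: the `Replica` class and `owner_write` function are hypothetical names, the replicas are written one after another here rather than in parallel, and nothing below reflects vSAN's actual implementation.

```python
class Replica:
    """One copy of the object on a host, with a flash log and a capacity tier."""

    def __init__(self, host):
        self.host = host
        self.log = []       # flash write buffer (step 3)
        self.capacity = []  # capacity tier, filled when destaging (step 6)

    def prepare(self, data):
        self.log.append(data)  # write lands in the flash log
        return True            # ACK back to the owner (step 4)

    def destage(self):
        self.capacity.extend(self.log)  # commit the batch of logged writes
        self.log.clear()


def owner_write(data, replicas):
    """Clone the write to every replica; complete only when all have ACKed."""
    acks = [replica.prepare(data) for replica in replicas]  # steps 2 to 4
    return all(acks)                                        # step 5


if __name__ == "__main__":
    replicas = [Replica("esxi-01"), Replica("esxi-03")]
    print(owner_write(b"block", replicas))
    for replica in replicas:
        replica.destage()
    print([replica.capacity for replica in replicas])
```

The point of the model is the ordering: the guest sees the write complete once every replica has logged it to flash, while the move to the capacity tier happens later in batches.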

Anatomy of a Read (hybrid)
[Figure: hosts esxi-01, esxi-02 and esxi-03; the numbers below mark the steps in the diagram]
1. Guest OS issues a read on the virtual disk
2. Owner chooses the replica to read from
   Load balances across replicas
   Not necessarily the local replica (if there is one)
   A block is always read from the same replica
3. At the chosen replica (esxi-03): read data from the flash read cache, if present
4. Otherwise, read from HDD and place the data in the flash read cache
   Replaces cold data
5. Return data to the owner
6. Complete the read and return data to the VM
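Steps 2 to 4 can be illustrated with a short sketch. Everything specific here is an assumption made for illustration: the slide does not say how blocks map to replicas (a fixed 1 MiB chunk with a modulo is used), nor how cold data is chosen for replacement (a least-recently-used policy is used), and the names `pick_replica` and `HybridReplica` are hypothetical.

```python
from collections import OrderedDict

CHUNK = 1024 * 1024  # hypothetical mapping granularity, not a vSAN constant


def pick_replica(offset, replicas):
    """Deterministic choice: the same offset always maps to the same replica."""
    return replicas[(offset // CHUNK) % len(replicas)]


class HybridReplica:
    """Replica with an HDD capacity tier and a small flash read cache."""

    def __init__(self, hdd, cache_slots=2):
        self.hdd = hdd              # block number -> data
        self.cache = OrderedDict()  # flash read cache, oldest entry first
        self.cache_slots = cache_slots

    def read(self, block):
        if block in self.cache:                # step 3: read cache hit
            self.cache.move_to_end(block)
            return self.cache[block], "cache"
        data = self.hdd[block]                 # step 4: read from HDD
        self.cache[block] = data               # place data in the read cache
        if len(self.cache) > self.cache_slots:
            self.cache.popitem(last=False)     # replace the coldest entry
        return data, "hdd"


if __name__ == "__main__":
    hosts = ["esxi-01", "esxi-03"]
    print(pick_replica(0, hosts), pick_replica(CHUNK, hosts))
    replica = HybridReplica({0: b"a", 1: b"b", 2: b"c"})
    print(replica.read(0), replica.read(0))
```

A fixed mapping spreads reads across replicas while keeping each block warm in only one replica's cache, which is the benefit the slide attributes to always reading a block from the same replica.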

Anatomy of a Read (all-flash)
[Figure: hosts esxi-01, esxi-02 and esxi-03; the numbers below mark the steps in the diagram]
1. Guest OS issues a read on the virtual disk
2. Owner chooses the replica to read from
   Load balances across replicas
   Not necessarily the local replica (if there is one)
   A block is always read from the same replica
3. At the chosen replica (esxi-03): read data from the (write) flash cache, if present
4. Otherwise, read from the capacity flash device
5. Return data to the owner
6. Complete the read and return data to the VM

vSAN Caching
[Figure: orders per minute (5-minute moving average) over time in seconds, with vMotion events marked; performance is consistent throughout]
vSAN caches based on frequency of data accessed and spatial locality
Smart data locality
  Improved flash utilization in the cluster
  Avoids data migration with VM migration (e.g. DRS)
Minor latency penalty
  Network latencies: 5-50 microseconds (10GbE)
  Flash latencies with real load: ~1 millisecond
vSAN supports an in-memory local cache ("client cache")
  Memory: very low latency
  Read caching using host RAM (0.4% of host RAM, up to 1GB per host)
  Complements CBRC
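The client cache sizing rule (0.4% of host RAM, capped at 1GB per host) works out as follows. This is a minimal sketch; the function name is illustrative and 1GB is treated as 1 GiB.

```python
GIB = 1024 ** 3


def client_cache_bytes(host_ram_bytes: int) -> int:
    """0.4% of host RAM, capped at 1 GiB, using integer arithmetic."""
    return min(host_ram_bytes * 4 // 1000, GIB)


if __name__ == "__main__":
    for ram_gib in (64, 128, 256, 512):
        size = client_cache_bytes(ram_gib * GIB)
        print(f"{ram_gib} GiB host RAM -> {size / GIB:.3f} GiB client cache")
```

The cap is reached at 250 GiB of host RAM, so larger hosts do not get a larger client cache under this rule.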

Checksum and disk scrubbing
Detects and resolves silent disk errors
Checks data in flight and at rest
Upon checksum verification failure
  RAID-1: fetches from the other copy
  RAID-5/6: rebuilds the data
Disk scrubbing runs in the background
Dramatic performance improvements of checksum in 6.6
  25%-73% more IOPS*
  21%-44% reduction in latency*

Deduplication and Compression: how it works (All-Flash only, Beta)
Deduplication
  Nearline
  Occurs at a per disk group level
  Data is deduplicated when destaging from the cache tier to the capacity tier
  Uses 4KB fixed blocks for high dedup rates
Compression
  Occurs after deduplication, prior to data being destaged
  If a block compresses to 2KB or less, the compressed block is stored
  Otherwise the full 4KB block is stored
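The dedup-then-compress rule can be sketched in a few lines. This is a toy model, not vSAN's implementation: SHA-256 is an assumed fingerprint (the slides do not name the hash), `zlib` from the Python standard library stands in for the compression algorithm, and the `store` dictionary is a hypothetical stand-in for a disk group's capacity tier.

```python
import hashlib
import zlib

BLOCK = 4096  # 4KB fixed block size from the slide


def destage(data: bytes, store: dict) -> list:
    """Deduplicate 4KB blocks, then keep the compressed form only if <= 2KB."""
    refs = []
    for start in range(0, len(data), BLOCK):
        block = data[start:start + BLOCK].ljust(BLOCK, b"\0")
        digest = hashlib.sha256(block).hexdigest()  # assumed fingerprint
        if digest not in store:                     # new block: store it once
            packed = zlib.compress(block)
            if len(packed) <= BLOCK // 2:
                store[digest] = ("compressed", packed)
            else:
                store[digest] = ("raw", block)      # full 4KB block is stored
        refs.append(digest)                         # duplicates only add a reference
    return refs


def read_block(digest: str, store: dict) -> bytes:
    kind, payload = store[digest]
    return zlib.decompress(payload) if kind == "compressed" else payload


if __name__ == "__main__":
    store = {}
    refs = destage(bytes(BLOCK) * 3, store)
    print(len(refs), "blocks written,", len(store), "block stored")
```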

Deduplication and Compression: Disk Group Stripes (All-Flash only, Beta)
Deduplication and compression operate at a per disk group level
Data stripes across the disk group
Fault domain isolated to the disk group
Fault of a device leads to a rebuild of the disk group
Stripes reduce hotspots
Endurance/throughput impact

Deduplication and Compression (I/O Path) (All-Flash only)
1. VM issues write
2. VM issues write
3. Cold data to memory
4. Deduplication
5. Compression
6. Data written to capacity
Avoids the downsides of inline or post-process approaches
Performed at disk group level
4KB fixed block
LZ4 compression after deduplication

Costs of Deduplication (nothing is free) (All-Flash only)
CPU overhead
Metadata and memory overhead
  Overhead for metadata?
IO overhead
  Metadata lookup
  Data movement from WB
  Fragmentation
Endurance overhead

Costs of Compression (nothing is free) (All-Flash only)
CPU overhead
Capacity overhead
Memory overhead
IO overhead

Erasure coding: RAID-5 and RAID-6 (All-Flash only)
Alternative to RAID-1 mirroring
Guaranteed space efficiency feature, available in all-flash configurations only
Object comprised of components that are striped across devices
Set per object using SPBM policy
RAID-5 implies a failures to tolerate (FTT) of 1
RAID-6 implies a failures to tolerate (FTT) of 2
[Figure: RAID-5 object with components C1 to C4 on hosts esxi-01 to esxi-04]

Data layout for RAID-5 (All-Flash only)
[Figure: a RAID-5 object on four ESXi hosts; each stripe has three data blocks and one parity block, and the parity block rotates to a different host on each stripe]
Available in all-flash configurations only
Example: FTT = 1 with FTM = RAID-5
  3+1 (4 host minimum, 1 host can fail without data loss)
  5 hosts would tolerate 1 host failure or maintenance mode state and still maintain redundancy
1.33x instead of 2x overhead, roughly a 33% savings (a 20GB disk consumes 40GB with RAID-1 and ~27GB with RAID-5)
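The capacity figures in this example follow directly from the overhead multipliers. A minimal sketch, using only the two schemes named on the slide (the dictionary and function names are illustrative):

```python
# Raw-to-usable multipliers: RAID-1 with FTT=1 keeps two full copies,
# RAID-5 in a 3+1 layout stores one parity block per three data blocks.
OVERHEAD = {"RAID-1": 2.0, "RAID-5": 4 / 3}


def raw_capacity_gb(usable_gb: float, scheme: str) -> float:
    """Raw capacity consumed by a disk of the given usable size."""
    return usable_gb * OVERHEAD[scheme]


def savings_vs_mirroring(scheme: str) -> float:
    """Fraction of raw capacity saved compared with RAID-1 mirroring."""
    return 1 - OVERHEAD[scheme] / OVERHEAD["RAID-1"]


if __name__ == "__main__":
    for scheme in OVERHEAD:
        print(scheme, round(raw_capacity_gb(20, scheme), 1), "GB")
    print(f"RAID-5 saves {savings_vs_mirroring('RAID-5'):.0%} versus RAID-1")
```

A 20GB disk consumes 40GB under RAID-1 and about 26.7GB under RAID-5, which is a saving of about one third relative to mirroring.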

Data-at-Rest Encryption: ingesting writes
Encryption occurs in the last step of the I/O flow for the highest level of protection and efficiency of dedup
Incoming to buffer
1. Write I/O broken into 64K chunks
2. Checksum performed on 4K blocks
3. Encryption performed on 4K blocks
4. Lands in buffer
Destaging
5. Decrypt performed on 4K blocks
6. Dedupe performed on 4K blocks
7. Compression performed on 4K blocks
8. Encryption performed on 2-4K blocks
9. Lands in persistent tier
Data in flight is not encrypted

Swap Placement?
Sparse swap
  Reclaims space used by memory swap
  A host advanced option enables the setting
How to set it?
  Query the current value: esxcfg-advcfg -g /VSAN/SwapThickProvisionDisabled
  Enable sparse swap: esxcfg-advcfg -s 1 /VSAN/SwapThickProvisionDisabled
  https://github.com/jasemccarty/sparseswap

Snapshots for vSAN
Not using VMFS redo logs
Writes allocated into 4MB allocations
Snapshot metadata cache (avoids read amplification)
Performs pre-fetch of metadata cache
Maximum: 31

vSAN back end storage I/O explained

vSAN storage traffic types
Front end storage traffic
  Guest VM storage I/O traffic
Back end storage traffic
  Data resynchronizations
  Object policy changes
  Host or disk group evacuations
  Object or component rebalancing
  Object or component repairs

Repairs and Rebuilds
Occurs at the granular component level
Reestablishes the level of compliance for protection as defined in the SPBM policy
Repair process begins 60 minutes after a component is reported as absent
Works in non-stretched and stretched clusters
[Figure: a 700GB object with RAID-1, FTT=1 and a witness (W)]
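The 60-minute rule amounts to a simple timer check. A minimal sketch, with hypothetical names and the delay taken from the slide:

```python
from datetime import datetime, timedelta

REPAIR_DELAY = timedelta(minutes=60)  # delay stated on the slide


def repair_due(absent_since: datetime, now: datetime) -> bool:
    """True once a component has been reported absent for the full delay."""
    return now - absent_since >= REPAIR_DELAY


if __name__ == "__main__":
    went_absent = datetime(2017, 8, 28, 9, 0)
    print(repair_due(went_absent, datetime(2017, 8, 28, 9, 30)))  # still waiting
    print(repair_due(went_absent, datetime(2017, 8, 28, 10, 0)))  # repair starts
```

Waiting before rebuilding avoids copying data for a host that is only briefly offline, for example during a reboot.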


Intelligent Rebuilds: Enhanced Rebalancing
[Figure: disk capacity used per host before reactive rebalance: 70%, 70%, 85%, 60% (RAID-0 components C1 to C3); after reactive rebalance with component split: 70%, 70%, 75%, 70% (RAID-0 components C1 to C4)]
Larger components can be split during redistribution
  Better balance
  Higher level of effective capacity
Improved placement decisions reduce overhead
  Faster time to completion
Can be manually throttled for corner case scenarios
Improved visibility of rebalancing status in Health & Performance Services

Intelligent Rebuilds: Smart, Efficient Repairs
Two methods for repairs of offline components reappearing after 60 minutes
Calculates the cost of each method of repair at the time the host comes back online
Will choose the most efficient method and cancel the other action
Significant improvement in speed and efficiency of component repairs
[Figure: a 700GB object with RAID-1, FTT=1, components C1 to C3 and a witness (W)]
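The cost comparison can be sketched as choosing the cheaper of two options. The slide names neither the two methods nor the cost function, so everything below is an assumption for illustration: the options are taken to be resynchronizing the returned component versus finishing the rebuild already in progress, and cost is measured as bytes left to copy.

```python
def choose_repair(resync_bytes: int, rebuild_bytes_remaining: int) -> str:
    """Pick the cheaper repair; the caller would cancel the other action."""
    if resync_bytes < 0 or rebuild_bytes_remaining < 0:
        raise ValueError("byte counts cannot be negative")
    if resync_bytes <= rebuild_bytes_remaining:
        return "resync_returned_component"
    return "continue_new_rebuild"


if __name__ == "__main__":
    gib = 1024 ** 3
    # Host returns after 70 minutes: little data changed, the rebuild has barely begun.
    print(choose_repair(5 * gib, 600 * gib))
    # Host returns much later: the rebuild is nearly finished.
    print(choose_repair(300 * gib, 20 * gib))
```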


Intelligent Rebuilds Using Partial Repairs
Status: Two host failures
Configured policy: FTT2
Effective compliance: FTT2
More resilient repair process
Repairs as many components as possible even if there are not enough resources to ensure full compliance
Remaining components will be repaired as soon as enough resources are available
Works in non-stretched and stretched clusters

Intelligent Rebuilds Using Partial Repairs
Status: Two host failures
Configured policy: FTT2
Effective compliance: FTT0

Intelligent Rebuilds Using Partial Repairs
Status: Partial repair completed
Configured policy: FTT2
Effective compliance: FTT1

Intelligent Rebuilds Using Partial Repairs
Status: New host added. Full repair completed
Configured policy: FTT2
Effective compliance: FTT2
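The partial repair sequence above can be related to the 2n+1 host rule from the Failures to Tolerate slide. The sketch below estimates the highest protection level a repair could restore with the hosts that remain. It is a simplification under stated assumptions: RAID-1 mirroring, host count as the only constraint (capacity and component placement are ignored), and a hypothetical function name.

```python
def achievable_ftt(healthy_hosts: int, configured_ftt: int) -> int:
    """Highest FTT that fits on the healthy hosts, capped at the policy value."""
    if healthy_hosts < 1 or configured_ftt < 0:
        raise ValueError("need at least one host and a non-negative FTT")
    # 2n+1 hosts are required to tolerate n failures, so n <= (hosts - 1) // 2.
    return min(configured_ftt, (healthy_hosts - 1) // 2)


if __name__ == "__main__":
    for hosts in (3, 4, 5):
        print(hosts, "healthy hosts ->", "FTT", achievable_ftt(hosts, 2))
```

With an FTT2 policy, three healthy hosts allow a partial repair to FTT1, and full compliance returns once five hosts are available again.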

Wrapping up

www.vspeakingpodcast.com @vpedroarrow @Lost_Signal