vSAN 6.6 Performance Improvements
First Published On: 07-24-2017
Last Updated On: 07-28-2017

Table of Contents
1. Overview
1.1. Executive Summary
1.2. Introduction
2. vSAN Testing Configuration and Conditions
2.1. Hardware and vSAN Configurations
2.2. Measurements
3. Performance Comparisons and Findings
3.1. Large Working Set
3.2. Small Working Set
3.3. Latency Improvements
3.4. Additional Performance Benefits
4. Enhancements behind Performance Improvements
4.1. Enhancements in vSAN 6.6
4.2. Summary

1. Overview
Provides an executive summary and an introduction to the objective of the document.

1.1 Executive Summary
VMware vSAN 6.6 delivers a new level of performance and consistency for VMs running on VMware's integrated Hyper-Converged Infrastructure (HCI) platform. Delivering additional performance entirely through a software update, with no additional hardware or software costs, demonstrates the power of a software-defined data center. The test results shared in this document show that vSAN 6.6 can deliver a performance improvement of 50% or more under a variety of workload conditions, while using common data services such as deduplication and compression, and data integrity features such as software checksum. Compared with vSAN 6.5, vSAN 6.6 not only delivers higher IOPS and throughput with lower latency, but also delivers more consistent results. Lower, more predictable latencies are what application owners expect when delivering solutions and addressing their business objectives. Faster, more efficient storage delivery in vSAN allows organizations to run more workloads per host while still meeting performance expectations.

1.2 Introduction
vSAN 6.6 introduced several enhancements aimed at improving the efficiency and performance of vSAN. Some enhancements focused on data management: these optimizations made it more efficient to maintain the health and balance of data on vSAN, and are detailed in the "VMware vSAN 6.6 - Intelligent Rebuilds" Technical Note. This document focuses on the enhancements that allow vSAN 6.6 to deliver better storage I/O performance on the very same hardware that runs vSAN 6.5. These improvements are not limited to I/O activity from VM read and write requests; they also benefit data management activity such as resynchronization and rebuilds.

2. vSAN Testing Configuration and Conditions
Details the hardware and software settings used when running these tests in a controlled environment.

2.1 Hardware and vSAN Configurations
The objective of the testing described here is to observe the comparative advantages of vSAN 6.6 over vSAN 6.5. The evaluation comprises a series of micro-benchmarks that measure the performance improvements of data services (e.g. checksum), space efficiency features (Deduplication & Compression), and failure tolerance methods (FTM), both independently and together. The battery of tests was run against a large working set of data and a small working set of data. Together, these conditions provide the best cross section for evaluating the performance improvements in vSAN 6.6.

Hardware and vSAN Configurations
The test conditions for all comparisons between vSAN 6.5 and vSAN 6.6 remained the same, and are as follows:

Hardware
Physical hosts: SuperMicro Sys-2028u-TNRT+ with Intel Haswell-EP E5-2670 v3 processors
Host count: Four hosts comprising one vSAN cluster
Disk groups: Each host contains two disk groups. Each disk group has a single 400GB Intel P3700 NVMe flash device and three 800GB Intel S3500 SATA SSD flash devices.

Software
Hypervisor versions tested: vSphere 6.5 versus vSphere 6.5 EP2 (also known as vSAN 6.6). To accommodate the larger working sets, the vSAN 6.5 configuration in this test used adjustments similar to the manual settings described in "VMware vSAN 6.0 Performance: Scalability and Best Practices".
Worker VMs: Eight VMs were used on each host, with one virtual disk per VM.
Data placement: The working set of data is split across the virtual disks on each host.
Data uniqueness: In any testing involving deduplication and compression, the workload buffer is configured so that 0% of the data can be deduplicated or compressed.

Working sets tested
Two working set sizes were used in each test scenario.
The large and small working set sizes attempt to simulate two distinct conditions commonly found with production workloads. Large working sets reflect a considerable amount of active, or hot, data relative to the size of any caching or buffering tier. Small working sets reflect a smaller amount of hot data relative to the size of any caching or buffering tier.

Large working set configuration: This configuration uses a working set of 1680GB per host, which occupies about 60% of the total storage capacity of the vSAN cluster. With two disk groups per host, each with a 400GB NVMe buffer device, the working set cannot fit in the 800GB of buffer space per host. This creates a large, sustained amount of destaging activity once the test has ramped up.

Small working set configuration: This configuration uses a working set of 200GB per host for random I/O and 400GB per host for sequential I/O. These sizes allow the working set to fit within the capacity of the two 400GB NVMe buffering devices on each host. As with the large working set, all test data collected reflects steady-state numbers.

These definitions of large and small working sets apply to these test scenarios only. In production environments, determining working set sizes, and how much of each working set is served from cache, can be challenging. For this reason, the analysis of the test results focuses primarily on the large working set tests, which produce the most conservative results.
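The buffer-fit reasoning above reduces to simple arithmetic: buffer capacity per host is the number of disk groups times the size of each group's NVMe buffer device. The Python sketch below encodes the figures from this test configuration. The function names are hypothetical, and the model deliberately ignores metadata overhead and any software limits on usable buffer space.

```python
# Per-host figures from the test configuration in Section 2.1.
DISK_GROUPS_PER_HOST = 2
BUFFER_DEVICE_GB = 400  # one Intel P3700 NVMe buffer device per disk group


def buffer_capacity_gb(disk_groups: int = DISK_GROUPS_PER_HOST,
                       buffer_device_gb: int = BUFFER_DEVICE_GB) -> int:
    """Total buffer capacity on one host, in GB."""
    return disk_groups * buffer_device_gb


def fits_in_buffer(working_set_gb: int) -> bool:
    """True if a per-host working set fits entirely in the buffer tier.

    Simplified model: metadata overhead and usable-capacity limits are ignored.
    """
    return working_set_gb <= buffer_capacity_gb()


# Per-host working set sizes used in the tests, in GB.
WORKING_SETS_GB = {
    "large": 1680,
    "small (random I/O)": 200,
    "small (sequential I/O)": 400,
}

if __name__ == "__main__":
    capacity = buffer_capacity_gb()
    for name, size_gb in WORKING_SETS_GB.items():
        verdict = "fits" if fits_in_buffer(size_gb) else "does not fit"
        print(f"{name}: {size_gb}GB {verdict} in {capacity}GB of buffer")
```

Running it reports that both small working sets fit, while the 1680GB large working set is 2.1 times the 800GB of buffer per host, which is why that configuration produces sustained destaging.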

Data services and protection schemes tested
vSAN is an object-based storage system. Unlike traditional storage systems, this allows data services such as software checksum, or a failure tolerance method (FTM), to be assigned through a storage policy on a per-VM or per-VMDK basis. Each test case was run under four distinct configurations. Table 1 shows the combinations tested against vSAN 6.5 and vSAN 6.6. All testing used the same hardware and the same test methods for each scenario.

Table 1. Conditions tested

Other combinations of data services and protection schemes exist, but this test was limited to the scenarios listed in Table 1.

Access patterns tested
Each test condition included the following I/O patterns: random read, random write, and mixed random read/write (70% read, 30% write). Production workloads often exhibit a mix of these patterns over a given time period. In most cases, aggregating multiple workloads across a collection of hosts blends discrete patterns into a mix of random reads and random writes. This is the primary reason mixed random I/O is the most prevalent pattern in virtualized production environments, and it is the area of emphasis in the analysis of the test results.

2.2 Measurements
Performance testing measures data activity as the number of I/O commands successfully completed per second (IOPS). Reported IOPS is the aggregate, or sum, of the IOPS generated by all worker VMs. Latency is the time needed to complete a read or write operation; it is expressed in microseconds (us) or milliseconds (ms), whichever reads more clearly. Reported latency is an average. Throughput, the amount of payload transmitted per second, is omitted for clarity. All random I/O tests used a 4K I/O size. Test results in this document are reported as the percentage change from vSAN 6.5 to vSAN 6.6. The test results refer to data collected from an all-flash cluster.
The performance improvements in vSAN 6.6 apply to both hybrid and all-flash vSAN clusters, but the degree of improvement will differ because of caching considerations and the physical properties of spinning disks. Deduplication and Compression is exclusive to all-flash configurations and enterprise-based vSAN licensing.
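The measurement rules in Section 2.2 can be expressed as a few small helper functions. This Python sketch is illustrative only: the function names are hypothetical, weighting each VM's average latency by its IOPS is an assumption (the paper states only that reported latency is an average), and the inputs in the usage lines are invented, chosen to reproduce the 63.5% and 39.8% figures from Section 3.1. It also shows why throughput adds little here: with a fixed 4K I/O size, throughput is simply IOPS multiplied by the I/O size.

```python
IO_SIZE_BYTES = 4 * 1024  # all random I/O tests used a 4K I/O size


def aggregate_iops(per_vm_iops: list[float]) -> float:
    """Cluster IOPS: the sum of the IOPS generated by every worker VM."""
    return sum(per_vm_iops)


def weighted_avg_latency_us(per_vm: list[tuple[float, float]]) -> float:
    """Average latency per I/O from (iops, avg_latency_us) pairs, one per VM.

    Assumption: each VM's average is weighted by the I/Os it completed.
    """
    total_iops = sum(iops for iops, _ in per_vm)
    return sum(iops * lat for iops, lat in per_vm) / total_iops


def throughput_mib_s(iops: float, io_size_bytes: int = IO_SIZE_BYTES) -> float:
    """Throughput is fully determined by IOPS when the I/O size is fixed."""
    return iops * io_size_bytes / (1024 * 1024)


def pct_improvement(old: float, new: float) -> float:
    """Percentage increase from vSAN 6.5 (old) to vSAN 6.6 (new), e.g. IOPS."""
    return (new - old) / old * 100


def pct_reduction(old: float, new: float) -> float:
    """Percentage decrease from old to new, e.g. latency."""
    return (old - new) / old * 100


if __name__ == "__main__":
    # Hypothetical: 4 hosts x 8 worker VMs = 32 VMs at 2,000 IOPS each.
    total = aggregate_iops([2_000.0] * 32)
    print(f"cluster IOPS: {total:,.0f}")
    print(f"throughput: {throughput_mib_s(total):.1f} MiB/s")
    print(f"IOPS change: {pct_improvement(100_000, 163_500):.1f}%")
    print(f"latency change: {pct_reduction(1_000, 602):.1f}%")
```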

3. Performance Comparisons and Findings
Provides the test results in a number of different ways to demonstrate the performance differences between vSAN 6.5 and vSAN 6.6.

3.1 Large Working Set
Tests using a large working set are designed to simulate workloads with a large amount of active data, which flushes out any recently accessed data held in a caching or buffering tier of faster storage. A large working set typically involves large amounts of sustained writes, with read requests for data that lies well outside the capacity of any buffering or caching tier. Large amounts of sustained writes, which contribute to a large working set, generally produce the lowest performance results for any storage system. Testing large working sets therefore helps establish a minimum level of performance and consistency.

Overview of Improvement
The following test results summarize the changes in effective performance under the conditions outlined in the vSAN Testing Configuration and Conditions section of this document. IOPS and latency results for multiple I/O patterns are expressed as a percentage of improvement from vSAN 6.5 to vSAN 6.6. Smaller bars do not indicate less performance, only less improvement.

Figure 1 shows the percentage of performance improvement under a variety of I/O patterns using a large working set of data. Improvements grow as more data services are used, and the most significant improvements come with all tested data services (RAID-5 erasure coding, checksum, and Dedup & Compression) enabled. All random I/O patterns (random read, random write, and mixed) improved significantly in vSAN 6.6. The mixed workload using all data services improved by 63.5%. The same workload using just checksum and RAID-5 improved by 56.3%.

Figure 1. Improvement in IOPS for vSAN 6.6 over vSAN 6.5, large working sets

The reductions in latency shown in Figure 2 generally correspond to the IOPS improvements shown in Figure 1. With a large working set and a mixed workload using all data services, latency was reduced by 39.8%. The same test using just RAID-5 and checksum yields a latency reduction of 37.4%. Notable improvements also appear in the random write test, with a latency reduction of 20.2% when using all data services and 35.1% when using just RAID-5 and checksum.

Figure 2. Reduction in latency for vSAN 6.6 over vSAN 6.5, large working sets

Reductions in latency are the best measure of how real production workloads can benefit from the performance enhancements.

Results of common I/O patterns
For large working sets, the data above shows that mixed workloads saw the most improvement. Two I/O pattern types are examined in more detail below to better understand the results. Figure 3 shows that for a 70/30 mixed workload using a large working set, IOPS increased by between 28.5% and 63.5% when using data services. This allowed vSAN 6.6 to drive more IOPS with all data services than vSAN 6.5 could with just RAID-5 and checksum.

Figure 3. Improvement in IOPS for a 70R/30W mixed workload with a large working set

As Figure 4 shows, the latency improvements corresponding to those IOPS improvements are equally impressive, ranging from a 22.5% to a 39.8% reduction in latency. This allowed vSAN 6.6 to deliver lower latency with all data services (Deduplication & Compression, RAID-5, and checksum) than vSAN 6.5 with just RAID-5 and checksum.

Figure 4. Reduction in latency for a 70R/30W mixed workload with a large working set
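Figures 3 and 4 pair each IOPS gain with a latency reduction, and the two move together for a reason. Under Little's Law, outstanding I/O equals IOPS multiplied by latency; if a benchmark holds outstanding I/O constant (an assumption, since the paper does not state the queue depth its worker VMs used), an IOPS gain g implies a latency reduction of 1 - 1/(1 + g). The Python sketch below checks that relationship against the mixed-workload figures from this section; `implied_latency_reduction` is a hypothetical helper name.

```python
def implied_latency_reduction(iops_gain: float) -> float:
    """Latency reduction implied by an IOPS gain at constant outstanding I/O.

    Little's Law: outstanding_io = iops * latency. Holding outstanding_io
    fixed, latency scales with 1 / iops, so a gain g (0.635 means +63.5%)
    implies a reduction of 1 - 1 / (1 + g).
    """
    return 1 - 1 / (1 + iops_gain)


# Large working set, 70/30 mixed workload: (IOPS gain, reported latency reduction).
MIXED_LARGE = {
    "all data services": (0.635, 0.398),
    "RAID-5 + checksum": (0.563, 0.374),
}

if __name__ == "__main__":
    for label, (gain, reported) in MIXED_LARGE.items():
        implied = implied_latency_reduction(gain)
        print(f"{label}: implied {implied:.1%}, reported {reported:.1%}")
```

The implied reductions (38.8% and 36.0%) land within about 1.5 percentage points of the reported ones (39.8% and 37.4%), which is consistent with latency reductions tracking the IOPS improvements. The residual gap could come from averaging or from outstanding I/O not being perfectly constant; the sketch cannot tell those apart.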

Long periods of random writes against a large working set are a challenging I/O pattern to improve upon. With the improvements shown in Figure 5, vSAN 6.6 delivered more IOPS using RAID-5 and checksum than the same test on vSAN 6.5 using just checksum.

Figure 5. Improvement in IOPS for a random write workload with a large working set

This type of I/O scenario is particularly difficult to improve upon because it largely depends on the constraints and characteristics of the physical hardware. The reduction in latency shown in Figure 6 allowed vSAN 6.6 to deliver lower effective latency using RAID-5 and checksum than the same test on vSAN 6.5 using just checksum.
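Figures 5 and 6 compare checksum on RAID-1 with checksum on RAID-5, and the analysis accompanying Figure 6 attributes the difference to how many components absorb the writes. The Python sketch below is a simplified model of that effect. It assumes RAID-1 means two mirrored components (as described for the checksum test) and that each small RAID-5 write updates one data component plus a parity component that rotates across all four; the reads performed by a RAID-5 read-modify-write are ignored, and the names are hypothetical.

```python
# (component writes per guest write, components in the object)
LAYOUTS = {
    # Mirroring: every guest write lands on both replicas.
    "RAID-1": (2, 2),
    # 3 data + 1 parity: a small write is assumed to update one data
    # component and the parity, with parity rotating across all four.
    "RAID-5": (2, 4),
}


def writes_per_component(guest_writes: int, layout: str) -> float:
    """Average number of writes each component absorbs."""
    writes_per_io, components = LAYOUTS[layout]
    return guest_writes * writes_per_io / components


if __name__ == "__main__":
    for layout in LAYOUTS:
        print(layout, writes_per_component(10_000, layout))
    # RAID-1 10000.0
    # RAID-5 5000.0
```

Under these assumptions, each RAID-5 component absorbs half as many writes as each RAID-1 mirror for the same guest load, which is one way to picture why spreading writes across four components relieved the congestion seen with two.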

Figure 6. Reduction in latency for a random write workload with a large working set

While large amounts of random writes sustained over a long period can be difficult for any storage system, the improvements in latency are noteworthy, especially when multiple data services are used. Figure 5 and Figure 6 show that for tests running only checksum, the improvement was smaller than with the other data services. In this specific test case, the result is due to RAID-1 (mirroring) being constrained by the small number of components the I/Os are written to. In the checksum test, RAID-1 with checksum writes to two mirrored components, and writing data to only two components triggered congestion, which reduced the level of improvement. In the checksum with RAID-5 test, the data was striped across four components (3 data + 1 parity). Writing the data across four components eliminated the component congestion and produced a significant increase in IOPS and reduction in latency.

Summary of Analysis
Several observations can be drawn from these tests against large working sets:
All random I/O profiles demonstrated a significant improvement in performance.
For mixed I/O patterns, vSAN 6.6 delivered over a 60% increase in IOPS and nearly a 40% reduction in latency compared with the same test on vSAN 6.5.
vSAN 6.6 is often able to drive higher performance (more IOPS, lower latency) with a broader set of data services than vSAN 6.5 with a reduced set of data services.
vSAN 6.6 consistently reduces the performance impact of using data services and protection policies.

Areas of benefit
Mixed workload patterns are the most common I/O profile found in virtualized environments, and are where vSAN 6.6 shows some of its most significant performance improvements. Combined with applications that have large working sets, the following are some examples of environments and applications that may see the most benefit:
Large batch processes from databases and structured data sets. ERP systems often have this type of activity.
Transactional applications. These applications have a serialized I/O dialog in which the next write waits on the previous read. Latency is critical in these types of applications.
Unstructured data, such as file servers and data warehouse environments.
High density of VMs per physical host. This randomizes I/O patterns and creates greater potential for contention from bursts across more VMs.
Latency-sensitive applications. These often have strict service level requirements defined in deployment guides.

3.2 Small Working Set
Tests using a small working set of data are designed to simulate workloads whose data fits within the caching and buffering capacity allocated by the storage system. A small working set typically involves a blend of recently accessed data, read or written, that remains in the caching tier for subsequent reads. Modest-sized working sets are more representative of general-purpose workloads with relatively frequent, repeating tasks over a period of time, or duty cycle. Although small working sets are not representative of every workload, including them in a battery of tests serves a very specific purpose when testing software optimizations like those made in vSAN 6.6. In many cases, results from smaller working sets better reflect optimizations made to a software stack, because they depend less on the characteristics and constraints of the hardware components used in a specific test. Testing small working sets alongside large working sets also helps demonstrate effective performance when application workloads and workflows can take advantage of cached content.
Caching is an important aspect of any data center, and exists in compute, network, and storage.

Overview of Improvement
The following test results summarize the changes in effective performance under the conditions outlined in the vSAN Testing Configuration and Conditions section of this document. IOPS and latency results for multiple I/O patterns are expressed as a percentage of improvement from vSAN 6.5 to vSAN 6.6. Smaller bars do not indicate less performance, only less improvement.

Figure 7 illustrates substantial improvements across a variety of I/O profiles when using a smaller working set. The most significant improvements occur when data services such as checksum, RAID-5, and Dedup & Compression are enabled. Just as in the large working set tests, all random I/O patterns (random read, random write, and mixed) improved significantly in vSAN 6.6. For a mixed workload, performance using all data services improved by 52.3%, and the same workload using just checksum and RAID-5 improved by 86%. The improvement in random write performance is particularly interesting: using all data services, performance improved by 48%, and using just checksum and RAID-5, by 74.4%.

Figure 7. Improvement in IOPS for vSAN 6.6 over vSAN 6.5, small working sets

The latency improvements illustrated in Figure 8 mirror the IOPS increases described in Figure 7. A mixed workload on a small working set with all data services saw a 35.6% reduction in latency, and the same test using just RAID-5 and checksum saw a 47.7% reduction. Random write latencies also improved dramatically, with reductions of 33.4% when using all data services and 44% when using just RAID-5 and checksum.

Note that when no data services are used, there is less opportunity for data path optimization to improve results. Figure 8 shows an example of this, and similar cases appear in other test results in this document. This specific result shows a slight regression for random reads when no data services are used: the measured latency was 830us (microseconds) on vSAN 6.5 compared to 833us on vSAN 6.6, a 0.36% increase. Small variances like this, in either direction, can result from fluctuating behavior anywhere in the stack, including all hardware components, and fall well within the range of variability across multiple test runs, even in tightly controlled environments.
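The 0.36% figure is the percentage change from 830us to 833us. Deciding whether a change like this is real or noise means comparing it with run-to-run variability. The Python sketch below does that with a simple two-standard-deviation band; both the threshold and the repeated-run values are hypothetical and only illustrate the check, since the paper does not publish per-run data.

```python
from statistics import mean, stdev


def pct_change(old: float, new: float) -> float:
    """Signed percentage change from old to new."""
    return (new - old) / old * 100


def within_noise(baseline_runs: list[float], candidate: float,
                 k: float = 2.0) -> bool:
    """True if candidate lies within k sample standard deviations of the
    baseline mean (k = 2 is an illustrative choice, not a standard)."""
    return abs(candidate - mean(baseline_runs)) <= k * stdev(baseline_runs)


if __name__ == "__main__":
    # Random read, no data services, small working set (Figure 8).
    print(f"{pct_change(830, 833):.2f}%")  # 0.36%

    # Hypothetical repeated vSAN 6.5 runs in microseconds, NOT measured data.
    baseline_runs_us = [826.0, 833.0, 829.0, 835.0, 827.0]
    print(within_noise(baseline_runs_us, 833.0))  # True
```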

Figure 8. Reduction in latency for vSAN 6.6 over vSAN 6.5, small working sets

Latency reductions with the smaller working set can represent the effective performance improvement that some workloads in a production environment would see.

Results of common I/O patterns
As with large working sets, the data above shows that mixed workloads saw the most improvement when using smaller working sets. Two I/O pattern types are examined in more detail below to better understand the results. For a 70/30 mixed workload using a smaller working set, IOPS increased by between 52.3% and 86% when using data services. With the dramatic increases shown in Figure 9, vSAN 6.6 drove almost as many IOPS with all data services (Deduplication & Compression, RAID-5, and checksum) as vSAN 6.5 did with just checksum.

Figure 9. Improvement in IOPS for a mixed workload with a small working set

The latency reductions corresponding to those IOPS improvements reinforce the performance gains shown in Figure 9, ranging from 35.6% to 47.7%. The reduction in latency shown in Figure 10 allowed vSAN 6.6 to run all data services (Deduplication & Compression, RAID-5, and checksum) at about the same latency as vSAN 6.5 using just checksum.

Figure 10. Reduction in latency for a mixed workload with a small working set

Random writes against a smaller working set expose more of the performance gains built into vSAN 6.6 than the same test against a large working set. As shown in Figure 11, improvements ranged between 25.3% and 75.5% when using data services.

Figure 11. Improvement in IOPS for a random write workload with a small working set

Latency reductions for random writes were significant. With data services, Figure 12 shows latency reductions ranging from 21.7% to 44%. Not only were latencies lower in vSAN 6.6, but latency also varied less across the different data service configurations.

Figure 12. Reduction in latency for a random write workload with a small working set

While large amounts of sustained random writes can challenge any storage system, the improvements seen with a small working set are most representative of production workloads that have short duty cycles and a modestly sized working set.

Summary of Analysis
Running the same battery of tests with a smaller working set yields a number of interesting observations:
All random I/O profiles demonstrated a significant improvement in performance.
For mixed I/O patterns, IOPS increased by up to 86% and latency decreased by up to 47.7% when comparing the same test between vSAN 6.5 and vSAN 6.6.
vSAN 6.6 is consistently able to drive higher performance (more IOPS, lower latency) with a broader set of data services than vSAN 6.5 with a limited set of data services.
Checksum performance improved significantly across a wider variety of reads and writes, and this is even more visible with a smaller working set.
vSAN 6.6 consistently reduces the performance impact of using data services and protection policies.
Testing smaller working sets showcases the potential benefits for workloads that have smaller working sets, whether because of a smaller overall data footprint or shorter duty cycles.

Areas of benefit
Just as with large working sets, applications that combine a mixed I/O pattern with a smaller working set will see significant improvements in vSAN 6.6. Applications and scenarios that would benefit include, but are not limited to, the following:
Multi-tier and scale-out applications. These application architectures tend to have smaller working sets of data dispersed across the application nodes.
VDI environments. These environments have a number of different workload profiles, depending on the task being performed.
Transactional applications. These applications have a serialized I/O dialog in which the next write waits on the previous read. Latency is critical in these types of applications.
High density of VMs per physical host. This randomizes I/O patterns and creates greater potential for contention from bursts across more VMs. The reduction in latency allows VMs to deliver a lower, more consistent level of latency.
Latency-sensitive applications. These often have strict service level requirements defined in deployment guides.

3.3 Latency Improvements
Delivering consistently low latency over long periods of sustained activity is a challenge for any storage architecture. The latency a VM experiences is the best indicator of whether the underlying infrastructure can deliver adequate performance to an application: higher latency means tasks take longer to complete. Application owners want predictable, low latency over time, but storage systems can find it difficult to deliver. vSAN 6.6 makes significant improvements in this area. Tests that ran for 4 hours showed significantly better predictability and consistency of latency in vSAN 6.6. Measured as the range of latencies (from the low mark to the 95th-percentile peak), vSAN 6.6 exhibited over an 80% reduction in latency deviation compared to vSAN 6.5.

3.4 Additional Performance Benefits

3.4 Additional Performance Benefits

An increase in IOPS combined with a reduction in latency can also yield a benefit that this testing did not explicitly measure: lower host CPU utilization with steady-state, non-synthetic workloads, which are more representative of production environments. Synthetic testing is designed to stress resources, so any CPU cycles that are freed are immediately spent processing additional I/O. With steady-state workloads, I/Os would be processed more quickly, which shortens the time that host CPU resources are in use. Lower host resource consumption could, in turn, increase the number of VMs a host can run.
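The CPU argument above follows the utilization law from queueing theory: at a fixed steady-state I/O rate, the CPU time consumed per second equals the I/O rate multiplied by the CPU time spent per I/O, so shortening per-I/O processing frees CPU. The sketch below works through that arithmetic. Every number in it (20,000 IOPS, 100 versus 70 CPU microseconds per I/O, 500 IOPS per VM, a 4-core budget for storage I/O) is a hypothetical assumption, the helper names are illustrative, and the model ignores all other CPU consumers on the host.

```python
def cpu_cores_busy(iops: int, cpu_us_per_io: int) -> float:
    """Utilization law: busy cores = I/O per second x CPU seconds per I/O."""
    return iops * cpu_us_per_io / 1_000_000


def vms_within_budget(budget_cores: int, iops_per_vm: int, cpu_us_per_io: int) -> int:
    """How many identical VMs fit before their I/O processing exceeds a CPU budget."""
    # CPU-microseconds available per second / CPU-microseconds each VM needs per second.
    return (budget_cores * 1_000_000) // (iops_per_vm * cpu_us_per_io)


# Hypothetical steady-state numbers (not from the tests in this document).
STEADY_IOPS = 20_000
BEFORE_US, AFTER_US = 100, 70  # CPU microseconds spent per I/O

print(cpu_cores_busy(STEADY_IOPS, BEFORE_US))  # prints 2.0
print(cpu_cores_busy(STEADY_IOPS, AFTER_US))   # prints 1.4
print(vms_within_budget(4, 500, BEFORE_US), vms_within_budget(4, 500, AFTER_US))  # prints 80 114
```

In a synthetic test the freed 0.6 core would simply be consumed by more I/O, which is why this benefit only shows up with steady-state workloads.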

4. Enhancements behind Performance Improvements

This section describes the specific enhancements that had the most influence on the improved test results in vSAN 6.6.

4.1 Enhancements in vSAN 6.6

The test comparisons in this document show performance improvements that result from four specific enhancements.

Checksum optimizations. Checksum is a technique that provides an additional layer of integrity for data in flight or at rest. Checksum is a storage policy setting applied per object, and it is enabled unless explicitly disabled in a policy. Several optimizations to checksum processing improve performance for both read and write operations.

Write buffer destaging optimizations. Destaging is the act of moving recently written data from the write buffer to the capacity tier. Incoming data can be placed in the buffer more efficiently, which helps reduce the fill rate and usage of the buffer. More proactive destaging helps reduce metadata buildup that could otherwise impact guest I/O or resync operations; this helps scenarios with large numbers of deletes, which invoke metadata writes. More aggressive destaging can also help in write-intensive environments, especially with RAID-5/6, by reducing how often vSAN must invoke "congestion," its built-in mechanism for controlling contention in the storage system.

Deduplication and compression improvements. Deduplication and compression is a cluster-wide feature in vSAN that improves space efficiency; it is applied as data is destaged from the write buffer to the capacity tier. vSAN 6.6 changes how the data to be destaged is ordered, which offers more predictable performance, especially with sequential writes.

Object management improvements. vSAN is an object-based storage system and uses an object manager to carry out its duties. vSAN 6.6 tunes the object manager's use of memory to reduce CPU overhead. This optimization reduces heavy context switching of cache and CPU, making I/O delivery more efficient.
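The first three enhancements above act on a single write path: data is checksummed, lands in the write buffer, and is deduplicated and compressed as it is destaged to the capacity tier. The toy Python model below sketches that flow under stated assumptions; it is not vSAN's implementation, and this document does not describe the underlying algorithms. The class ToyDiskGroup, the 4 KB block size, the `destage_threshold` knob, and the rule that a block is kept compressed only if it shrinks to half its size or less are all hypothetical. The standard-library zlib.crc32, hashlib.sha1, and zlib.compress are stand-ins, not necessarily the checksum, hash, and compression algorithms vSAN uses.

```python
import hashlib
import zlib
from dataclasses import dataclass, field

BLOCK_SIZE = 4096  # assumption: fixed 4 KB blocks


class ChecksumError(Exception):
    """Raised when data read back no longer matches its stored checksum."""


@dataclass
class ToyDiskGroup:
    """Hypothetical write buffer -> capacity tier model; not vSAN's actual design."""

    destage_threshold: int = 4  # lower value = more proactive destaging (hypothetical knob)
    write_buffer: dict = field(default_factory=dict)  # lba -> (data, crc)
    capacity: dict = field(default_factory=dict)      # sha1 digest -> (stored, compressed?, crc)
    lba_map: dict = field(default_factory=dict)       # lba -> sha1 digest

    def write(self, lba: int, data: bytes) -> None:
        # The checksum is computed once, as data enters, and travels with the block.
        self.write_buffer[lba] = (data, zlib.crc32(data))
        if len(self.write_buffer) >= self.destage_threshold:
            self.destage()

    def destage(self) -> None:
        # Deduplication and compression are applied only here, on the way to capacity.
        # Blocks are destaged in insertion order; vSAN 6.6's ordering is not modeled.
        for lba, (data, crc) in self.write_buffer.items():
            digest = hashlib.sha1(data).digest()
            if digest not in self.capacity:  # an identical block is stored only once
                packed = zlib.compress(data)
                if len(packed) <= BLOCK_SIZE // 2:  # keep compressed form only if it halves the size
                    self.capacity[digest] = (packed, True, crc)
                else:
                    self.capacity[digest] = (data, False, crc)
            self.lba_map[lba] = digest
        self.write_buffer.clear()

    def read(self, lba: int) -> bytes:
        if lba in self.write_buffer:  # recently written data is served from the buffer
            data, crc = self.write_buffer[lba]
        else:
            stored, compressed, crc = self.capacity[self.lba_map[lba]]
            data = zlib.decompress(stored) if compressed else stored
        # Verify on every read so silent corruption is detected rather than returned.
        if zlib.crc32(data) != crc:
            raise ChecksumError(f"checksum mismatch at LBA {lba}")
        return data


dg = ToyDiskGroup(destage_threshold=4)
zero_block = bytes(BLOCK_SIZE)
for lba in range(4):  # four identical, highly compressible blocks
    dg.write(lba, zero_block)
print(len(dg.write_buffer), len(dg.capacity))  # prints 0 1
```

Lowering `destage_threshold` drains the buffer sooner, which is the intuition behind more proactive destaging. The model deliberately omits congestion control, metadata handling, space reclamation for overwritten blocks, and the object manager.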
These performance benefits also extend to back-end vSAN management activities such as rebalancing, resyncing, and repairing objects. These activities happen in any environment, regardless of workload type.

4.2 Summary

vSAN 6.6 provides a substantial performance improvement over vSAN 6.5, all through a simple software upgrade. Better performance allows administrators to provide higher, more consistent levels of performance for the applications and services powered by vSAN, and gives environments more room to absorb growing demand as more workloads are introduced to a vSAN-powered environment.

About the Author

The content in this document was assembled from data collected during extensive testing by the VMware Performance Engineering team. You can read more from the Performance Engineering team on their blog, VMware VROOM!

Pete Koehler is a Sr. Technical Marketing Manager in the Storage and Availability Business Unit at VMware, Inc. He specializes in enterprise architectures, data center analytics, software-defined storage, and hyper-converged infrastructures. Pete provides more insight into data center challenges at vmpete.com and on VMware's Virtual Blocks blog. He can also be found on Twitter at @vmpete.