Performance & Scalability Testing in Virtual Environments
Hemant Gaidhani, Senior Technical Marketing Manager, VMware
© 2010 VMware Inc. All rights reserved
About the Speaker
Hemant Gaidhani, Senior Technical Marketing Manager, VMware
- Capacity Management, Mastermind
- Author, Virtualizing Microsoft Tier 1 Applications with VMware vSphere 4
Performance & Scalability in Virtual Environments
- Understanding Virtualization Performance
- Performance Benchmarking in Virtual Environments
- Performance Monitoring in Virtual Environments
- Virtual Better than Physical?
Real-Life Examples: Physical-to-Virtual Comparison
- Physical machine: 2-socket, dual-core server, 2 GB RAM
- Virtual machine (default settings): 1 virtual CPU, 256 MB RAM
Comparing different virtualization products:
- VMware Server: hosted architecture
- VMware ESX Server: bare-metal architecture (hypervisor)
Real-Life Examples (Continued): Server Consolidation and VDI Benchmarks
- Multiple under-utilized physical machines migrated to virtual machines
- Resources (CPU, memory) become over-committed as server utilization goes up
- In over-committed environments, virtual machines can starve each other: one VM's performance gain comes at the cost of another VM
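The consolidation arithmetic above can be sketched in a few lines. This is a minimal illustration, not VMware tooling; the host capacity and VM sizes below are made-up example values.

```python
# Hypothetical sketch: estimating CPU and memory overcommitment on a host.
# All capacity numbers are invented for illustration.

def overcommit_ratio(allocated, physical):
    """Ratio of resources promised to VMs versus what the host physically has."""
    return allocated / physical

# Example host: 2 sockets x 2 cores = 4 pCPUs, 16 GB RAM
pcpus, host_ram_gb = 4, 16

# Eight consolidated VMs, each configured with 1 vCPU and 4 GB RAM
vms = [{"vcpus": 1, "ram_gb": 4} for _ in range(8)]

cpu_ratio = overcommit_ratio(sum(vm["vcpus"] for vm in vms), pcpus)
mem_ratio = overcommit_ratio(sum(vm["ram_gb"] for vm in vms), host_ram_gb)

print(cpu_ratio)  # 2.0 -> vCPUs are 2x overcommitted
print(mem_ratio)  # 2.0 -> memory is 2x overcommitted
```

Ratios above 1.0 mean the hypervisor must arbitrate: a VM's demand can then only be met by taking resources from another VM.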
Performance in Virtualized Environments: Two Dimensions
- Vertical performance, monitor efficiency (VMM): single-VM performance; monitor overhead
- Horizontal performance, scheduling (VMkernel): multi-VM performance; scheduling overhead; often overlooked
[Diagram: multiple vCPUs multiplexed by the hypervisor onto pCPUs]
Horizontal Performance: Some Examples
- Ready time: VMs don't get to run all the time, even when they want to
- Migration/switching overhead: VMs must be context-switched out to perform critical tasks or to schedule another VM
- CPU cache contention: multiple VMs share the same physical CPU
- NUMA effects: a VM could be temporarily migrated to a remote node
[Diagram: vCPUs and memory spread across NUMA nodes 0 and 1 under the hypervisor]
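Ready time is usually consumed as a percentage of the sampling window, which is how esxtop's %RDY figure is derived. The sketch below shows that conversion; the counter value and 20-second window are illustrative examples, not measurements.

```python
# Illustrative sketch (not VMware tooling): converting accumulated
# "ready time" into a %RDY-style figure. Values are invented examples.

def ready_pct(ready_ms_delta, window_ms):
    """Percent of the sample window a vCPU spent runnable but not running."""
    return 100.0 * ready_ms_delta / window_ms

window_ms = 20_000   # esxtop's default 20-second refresh interval
ready_ms = 1_500     # ready time accumulated during the window (example)

print(ready_pct(ready_ms, window_ms))  # 7.5 -> noticeable CPU contention
```

As a rule of thumb, sustained high ready-time percentages indicate the horizontal dimension (scheduling contention) rather than monitor overhead.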
SQL Server Scale-Out: Overcommitment Fairness
[Chart: percentage of aggregate throughput for each of 8 x 2-vCPU VMs (VM1-VM8)]
- Throughput is distributed fairly across all eight VMs
Third Dimension: Distributed Resource Scheduling (DRS)
- Reduces the impact of horizontal performance issues
- However, aggressive migration impacts performance
[Diagram: VMs moved between hypervisor hosts via vMotion within the virtual infrastructure]
vSphere Scaling Trends
                 ESX 2      ESX 3      ESX 3.5     ESX 4
Overhead         30%-60%    20%-30%    <10%-20%    <2%-10%
vCPUs per VM     1          2          4           8
Memory per VM    <4 GB      16 GB      64 GB       255 GB
Network          380 Mb/s   800 Mb/s   9 Gb/s      30 Gb/s
I/O operations   <10,000    20,000     100,000     >350,000
[Chart: percentage of applications whose performance requirements fall within each generation's limits]
Virtualization Architectures
- Hosted architecture: VMware Server / Workstation / Fusion
- Bare-metal architecture (hypervisor): VMware ESX Server; VMware vSphere Hypervisor (free edition)
VMware ESX Architecture
- CPU is controlled by the scheduler and virtualized by the monitor
- Monitor supports BT (binary translation), HW (hardware assist), and PV (paravirtualization)
- Memory is allocated by the VMkernel and virtualized by the monitor
- Network and I/O devices are emulated and proxied through native device drivers
[Diagram: guests (TCP/IP, file system) atop virtual NIC/virtual SCSI; monitor; VMkernel (scheduler, memory allocator, virtual switch, NIC drivers, file system, I/O drivers); physical hardware]
Hardware-Assist CPU Architecture
- Intel VT-x: available since 2006
- AMD AMD-V: available since 2006
Prior to these hardware technologies, binary translation was used. The first implementations of hardware assist were slower than binary translation in software.
Hardware-Assist (HWmmu) Memory Architectures
- Intel Extended Page Tables (EPT): available since 2009; supported in ESX 4.0 + Nehalem or better
- AMD Rapid Virtualization Indexing (RVI): available since 2008; supported in ESX 3.5 + Shanghai or better
Prior to these hardware technologies, shadow paging (SWmmu) was used.
vSphere Unlocks Processing Cores for Applications
- VMware ESX scaling: keeping up with core counts
- Virtualization provides a means to exploit the hardware's increasing parallelism
- Most applications don't scale beyond 4- or 8-way
Agenda
- Understanding Virtualization Performance
- Performance Benchmarking in Virtual Environments
- Performance Monitoring in Virtual Environments
- Virtual Better than Physical?
Performance Benchmarking in Virtual Environments
- The standard performance benchmarking guidelines still hold true
- vSphere is optimized for performance out of the box: no tuning required
- Virtualization overhead depends upon the workload
- Virtualization does not create new resources; it enables more efficient utilization of existing resources
- It cannot break the laws of physics
Virtual Environment Implications
- Guest OS metrics: performance metrics in the guest can be skewed when the guest's rate of progress of time is skewed; guest OS resource availability can give an incorrect picture
- Resource availability: resources are shared and the hypervisor controls allocation; virtual machines may not get all the hardware resources
- Performance profiling: hardware performance counters are not virtualized, so applications cannot use them for performance profiling in the guest
Virtualization moves performance measurement and management to the hypervisor layer.
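The guest-clock pitfall above is easy to demonstrate numerically. This is a toy sketch with invented numbers: if the guest's clock loses ticks, throughput computed inside the guest is silently inflated.

```python
# Sketch of the time-skew pitfall: throughput computed from a guest clock
# that runs slow looks better than it really is. All numbers are invented.

def throughput(ops, elapsed_s):
    return ops / elapsed_s

ops = 10_000
real_elapsed = 100.0                 # wall-clock seconds (host's view)
guest_elapsed = real_elapsed * 0.9   # guest clock lost 10% of its ticks

print(throughput(ops, real_elapsed))   # 100.0 ops/s (true rate)
print(throughput(ops, guest_elapsed))  # ~111.1 ops/s (inflated by skew)
```

This is why benchmark timing should come from the hypervisor layer or an external client, not from inside the guest.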
Benchmarking Options
- Benchmarking methodology: fair and consistent; avoid apples-to-oranges comparisons
- Careful selection of benchmarks: several public and custom benchmarks are available; the best benchmark is the one that simulates your workload; some benchmarks may produce inconsistent, unpredictable results in virtual environments
- Typical physical benchmarks are single-workload benchmarks and do not directly apply to virtual environments
- Consider VMmark: a scalable benchmark for virtualized systems
Focus on Storage: Storage Virtualization Concepts
- A storage array consists of physical disks that are presented as logical disks (storage array volumes, or LUNs) to the ESX server
- Storage array LUNs are formatted as VMFS volumes
- Virtual disks are presented to the guest OS; they can be partitioned and used in guest file systems (e.g., D:\SQLdata)
Server Consolidation: Storage Planning
- Physical setup: each SQL Server instance (on Windows Server 2003) was provided its own 5-spindle LUN
- Virtual architecture: each VM on the vSphere ESX server is provided its own VMDK
- But how do the VMDKs map to disks?
Sequential Workloads Generate Random Access
- As observed in VMFS scalability tests: sequential streams from multiple VMs interleave at the shared volume, so the array sees a near-random access pattern
Creating Virtual Machines
Two options: P2V conversion, or create a fresh VM (recommended)
- Select the correct guest OS when creating the virtual machine
- Install the latest version of VMware Tools
- Disable unused devices: serial and parallel ports, USB devices, floppy drive, CD-ROM/DVD-ROM
Configuration Guidelines
- Smaller VMs are recommended: SMP VMs have co-scheduling overheads
- Avoid over-commitment: scalability is linear until CPU or memory is over-committed
- Do NOT disable vSphere optimization features: several CPU, memory, storage, and networking optimizations improve performance in a shared environment
- Leverage paravirtualized drivers: storage (pvscsi), network (vmxnet3)
Run-Time Guidelines
- Idle virtual machines: turn off virtual machines that are not required; even idle VMs consume resources
- Benchmark clients: virtual machines can be used, but pay attention to timekeeping issues if they are
- Measuring end-user response time: will include network latency; consider remote performance monitoring
Virtualizing the Application Stack
- Multi-tier applications: you are obviously NOT virtualizing to run a single VM per server, so how do you mix and match?
- Overheads multiply (orchestra analogy)
- Performance benchmarking: virtualize one tier at a time, or virtualize all at once and then optimize
Mix-and-Match Application Stack Components
- Understand the load profile of the different components in the application stack and place the VMs accordingly: do not combine all CPU-, I/O-, or memory-intensive VMs on the same server
- Horizontal scalability (scaling out): do not place multiple similar VMs on the same ESX server; similar load profiles can create bottlenecks and a single point of failure in the application stack
- You do not have to virtualize all components
VMmark Benchmark
Consider VMmark: a scalable benchmark for virtualized systems
- Provides a meaningful measurement of virtualization performance
- Generates an easily understandable, precise metric that scales with underlying system capacity
- Used to compare the suitability and performance of different hardware platforms for virtual environments
- Employs realistic, diverse workloads running on multiple operating systems
- VMware worked with SPEC to establish industry-wide credibility and standardization
VMmark: Mixed Workloads
One tile = 6 VMs on ESX: mail, file server, web, OLTP database, and Java order-entry workloads
Benchmark Topology
Three tiles (18 VMs) on one ESX host, each tile running the mail, file server, web, OLTP database, and Java order-entry workloads, driven by clients 0-2
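The tile-based scoring idea can be sketched as follows. This is a hedged illustration of the general approach (normalize each workload's throughput to a reference system, take a geometric mean per tile, sum the tile scores); the reference and measured numbers are invented, and the exact VMmark run rules should be taken from the official specification.

```python
# Hedged sketch of a VMmark-style score: per-tile geometric mean of
# normalized workload throughputs, summed across tiles. All numbers
# below are invented for illustration.

import math

def tile_score(measured, reference):
    """Geometric mean of per-workload throughputs normalized to a reference."""
    normalized = [m / r for m, r in zip(measured, reference)]
    return math.exp(sum(math.log(n) for n in normalized) / len(normalized))

reference = [100, 100, 100, 100, 100, 100]  # per-workload reference throughput
tile1 = [120, 110, 100, 130, 90, 115]       # measured throughputs, tile 1
tile2 = [105, 100, 95, 110, 85, 100]        # measured throughputs, tile 2

score = tile_score(tile1, reference) + tile_score(tile2, reference)
print(round(score, 2))
```

The geometric mean keeps one outlier workload from dominating the tile score, which is why benchmarks of heterogeneous workloads tend to prefer it over an arithmetic mean.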
Agenda
- Understanding Virtualization Performance
- Performance Benchmarking in Virtual Environments
- Performance Monitoring in Virtual Environments
- Virtual Better than Physical?
Troubleshooting Tools
- vCenter (high level): historical performance data (check statistics levels); consolidated metrics for all hosts/datastores in the environment
- vscsiStats (storage guts): detailed virtual SCSI device latency metrics; seek distance, I/O size, latency; displays histograms
- esxtop / resxtop (tactical): detailed real-time performance data for a single ESX/ESXi host
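For offline analysis, esxtop can also run in batch mode (`esxtop -b > stats.csv`), emitting perfmon-style CSV. The sketch below shows one way to pull the ready-time columns out of such a file; the host name, VM name, and sample values are invented, and the exact column naming should be verified against your own capture.

```python
# Illustrative parser for esxtop batch-mode output, which is
# perfmon-style CSV with "\\host\Object(instance)\Counter" headers.
# The sample below is synthetic; names and values are invented.

import csv
import io

sample_csv = (
    '"(PDH-CSV 4.0)","\\\\esx1\\Group Cpu(12:appvm)\\% Used",'
    '"\\\\esx1\\Group Cpu(12:appvm)\\% Ready"\n'
    '"03/15/2010 10:00:00","42.1","6.3"\n'
    '"03/15/2010 10:00:20","55.0","9.8"\n'
)

def ready_samples(csv_text):
    """Pull every '% Ready' column out of an esxtop batch CSV."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    idx = [i for i, name in enumerate(header) if name.endswith("% Ready")]
    return {header[i]: [float(r[i]) for r in data] for i in idx}

for col, vals in ready_samples(sample_csv).items():
    print(col, max(vals))  # report the worst sample per counter
```

Batch captures like this let you correlate hypervisor-level counters with in-guest observations after a benchmark run, instead of watching esxtop interactively.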
Performance Monitoring Guidelines
Agenda
- Understanding Virtualization Performance
- Performance Benchmarking in Virtual Environments
- Performance Monitoring in Virtual Environments
- Virtual Better than Physical?
Virtual Better than Physical?
- Work around application stack scalability limits: when components of the stack do not scale in a physical environment, virtualize to achieve higher scalability
- Higher ROI despite virtualization overheads
- Examples: Citrix Presentation Server 32-bit (32-bit operating system limitations prohibit scaling); Java virtual machines (heap size limits on 32-bit Windows operating systems)
Multi-VM Approach to Scalability: Building Blocks
- Some applications have limited SMP scalability: web servers, mail servers, Java virtual machines
- Virtualization onto the ESX platform enables horizontal scaling: concurrently run several operating environments
- Advanced resource management features allow efficient sharing of compute, network, and storage resources
Let's look at examples: Exchange, Apache, and Citrix XenApp
Virtualization-Aware Architecture: Building Blocks
- Many applications lack scalability beyond a certain number of CPUs: Apache web server, WebSphere, Exchange
- Configure vCPUs to the application's scalability limit; for additional capacity, instantiate more such VMs
[Chart: SPECweb2005 aggregate metric, native vs. virtual, scaling across 1-6 virtual machines or CPUs]
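Why many small VMs can beat one big VM follows from Amdahl's law. The sketch below compares the two layouts on the same eight physical cores; the 90% parallel fraction is an assumed figure, and the scale-out case assumes the four instances run independently without contending for shared resources.

```python
# Back-of-envelope comparison of scale-up vs. scale-out using Amdahl's
# law. The parallel fraction (0.90) is an assumption, not a measurement.

def amdahl_speedup(parallel_fraction, n_cpus):
    """Ideal speedup of one instance on n_cpus, per Amdahl's law."""
    return 1.0 / ((1 - parallel_fraction) + parallel_fraction / n_cpus)

p = 0.90  # assumed parallelizable fraction of the workload

# Scale-up: one 8-vCPU VM
scale_up = amdahl_speedup(p, 8)

# Scale-out: four independent 2-vCPU VMs on the same 8 pCPUs
scale_out = 4 * amdahl_speedup(p, 2)

print(round(scale_up, 2))   # 4.71 -> 8 vCPUs yield well under 8x
print(round(scale_out, 2))  # 7.27 -> same cores, more aggregate work
```

This is the arithmetic behind sizing vCPUs to the application's scalability sweet spot and adding VMs for capacity, rather than adding vCPUs to one VM.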
Citrix XenApp / Microsoft Terminal Services
[Diagram: physical XenApp deployment (one XenApp instance per Windows OS per server) vs. virtualized XenApp deployment (multiple XenApp VMs on VMware vSphere)]
Example Migration Scenario 4_4_0_0 with DRS Balanced Cluster
DRS Scalability: Transactions per Minute (higher is better)
[Chart: transactions per minute, DRS vs. no DRS, across run scenarios 2_2_2_2 through 5_3_0_0]
- An already balanced cluster shows fewer gains
- Higher gains (>40%) with more initial imbalance
DRS Scalability: Application Response Time (lower is better)
[Chart: transaction response time in ms, DRS vs. no DRS, across run scenarios 2_2_2_2 through 5_3_0_0]
Q & A