@kleegeek
davidklee.net
gplus.to/kleegeek
linked.com/a/davidaklee

Specialties / Focus Areas / Passions:
- Performance Tuning & Troubleshooting
- Virtualization
- Cloud Enablement
- Infrastructure Architecture
- Health & Efficiency
- Capacity Management
- High Availability
- Disaster Recovery

SQLPASS Virtual Chapters:
- Virtualization
- HA / DR
- Performance

Heraflux Technologies

- What is right-sizing, and why
- Profiling the system stack components
  - CPU / memory / storage
- Analyzing the environment
  - Workload analysis
  - Perfmon data review

- Abstraction layer between hardware and OS
  - Resources
  - Queues
- Limits in the environment
  - Resource limitations (hard)
  - Queue contention (soft)

- VM resource allocations
  - vCPU
  - Memory
  - Storage presentation
- One size does not fit all workloads
- Inappropriate resource allocations can hurt performance

Hard Limits (Resources)
- Single compute node hardware
- Total cluster compute capacity
- Storage speed (IOPS, throughput) maximums
- Interconnect path speed

Soft Limits (Queues)
- Memory oversubscription
- CPU scheduler contention
- Shared resource utilization
- Variable resource utilization levels
- Noisy neighbors

[Diagram: eight VMs (one with 16 vCPU / 128 GB vRAM, one with 8 vCPU / 64 GB vRAM, six with 2 vCPU / 16 GB vRAM) on a virtualization layer backed by 150 GHz CPU, 4 TB memory, 4x10GbE network, 20 TB Tier 1 storage, and 40 TB Tier 2 storage]

[Diagram: tasks pass through the hypervisor's schedulers and queues before reaching hardware. CPU Scheduler > CPU Scheduling Queue > CPU Execution; Memory Allocator > Mem Allocation Queue > Mem R / W; Disk Scheduler > Disk Scheduling Queue > Disk R / W; Network Scheduler > Network Scheduling Queue > Network Tran / Rec]

- Resource limits are easy to detect / work around
- Queue contention is much harder
  - Time in queue = time lost from the VM
  - Silent performance killer
- Everything in a VM must be scheduled, including idle resources
- Queue processing is not always FIFO

[Diagram: vCPU scheduling queues by pCPU core. SMP tasks on vCPU0 through vCPU3 incur scheduling queue waits. Label: high vCPU queue contention]

[Diagram: vCPU scheduling queues by pCPU core. SMP tasks on vCPU0 and vCPU1 incur scheduling queue waits. Label: high vCPU queue contention]

- 24x7 performance metric collection is CRITICAL
- Metrics from every piece of the system stack:
  - Application
  - SQL Server DB
  - SQL Server Instance
  - Operating System
  - Virtualization
  - Physical Server
  - Storage
  - Networking
  - Interconnects

SQL Server
- Raw CPU / mem / disk usage
- NUMA memory usage
- Signal waits
- Storage latency by DB file
- Wait statistics
- Glenn Berry @ bit.ly/1wdmb8n

Windows
- CPU & memory consumption
- Storage IOPS / latency / throughput
- Processes (SQL Server vs. other)
- Perfmon how-to @ bit.ly/1sqsvns @ bit.ly/1xw4jzj

Virtualization
- Resource consumption by VM
- Resource utilization by host
- CPU scheduling queue wait
- Overcommitment metrics
- VMware vSphere: CPU Ready
- MS Hyper-V: CPU Wait Time per Dispatch

Storage
- IOPS / latency / throughput
  - By LUN
  - By disk group
- Controller
- Interconnect path utilization
- Controller cache hit metrics

Capture all metrics as granularly as possible!
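
The CPU Ready counter listed above is reported by vSphere as a summation in milliseconds per sampling interval, so it is usually converted to a percentage before being compared across VMs. The sketch below shows that conversion, assuming the 20-second interval used by vSphere real-time charts. The function name `cpu_ready_percent` and the per-vCPU normalization are illustrative choices, not something the slides prescribe.

```python
def cpu_ready_percent(ready_ms: float, interval_s: float = 20.0, vcpus: int = 1) -> float:
    """Convert a CPU Ready summation (milliseconds) to an average percent per vCPU.

    interval_s is the chart sampling interval in seconds (assumed: 20 s for
    real-time charts); vcpus spreads a VM-level summation across its vCPUs.
    """
    if interval_s <= 0 or vcpus <= 0:
        raise ValueError("interval_s and vcpus must be positive")
    return ready_ms * 100.0 / (interval_s * 1000.0 * vcpus)


if __name__ == "__main__":
    # 1,000 ms of ready time in one 20 s sample on a 1-vCPU VM
    print(cpu_ready_percent(1000))
    # The same summation spread across a 4-vCPU VM
    print(cpu_ready_percent(1000, vcpus=4))
```

Historical charts use longer rollup intervals, so pass the matching `interval_s` when working with anything other than real-time data.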

- Overlay all data streams
- Understand / classify:
  - Workload periods
  - Workload sources
  - Business time period
- Goal: metrics by time period
- Median & percentile analysis
- Explain & filter statistical anomalies

Statistics
- Min / Average / Max / Median
- Percentile
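
The statistics named above can be computed per time period with a few lines of standard-library Python. This is a minimal sketch: the function names, the `(period, value)` record shape, and the nearest-rank percentile method are assumptions made for illustration, and the samples would come from whatever Perfmon or hypervisor export you collect.

```python
import math
import statistics
from collections import defaultdict


def summarize(samples, percentile=95):
    """Return min / average / max / median and a nearest-rank percentile."""
    ordered = sorted(samples)
    if not ordered:
        raise ValueError("samples must not be empty")
    # Nearest-rank: the smallest sample with at least `percentile` percent
    # of all samples at or below it.
    rank = max(1, math.ceil(percentile / 100 * len(ordered)))
    return {
        "min": ordered[0],
        "avg": statistics.fmean(ordered),
        "max": ordered[-1],
        "median": statistics.median(ordered),
        f"p{percentile}": ordered[rank - 1],
    }


def summarize_by_period(records, percentile=95):
    """Group (period_label, value) pairs and summarize each period separately."""
    groups = defaultdict(list)
    for period, value in records:
        groups[period].append(value)
    return {period: summarize(values, percentile) for period, values in groups.items()}
```

Comparing the median and a high percentile against the average is one way to spot the anomalies the slide mentions: a single spike moves the average and maximum far more than the median.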

- vCPU counts matter!
- Size for what you need today
- Too many vCPUs = BAD (probably)
- Too few vCPUs = BAD (usually)
- Workload / server specific

- Not done at just the vCPU count
- vNUMA configuration also matters
  - Closely align with pNUMA
  - Adds efficiency by aligning with underlying hardware
  - Performance difference improves with larger VMs

Example: 16 vCPU
- What's better?
  - 2 vSocket x 8 vCore?
  - 4 vSocket x 4 vCore?
  - 8 vSocket x 2 vCore?
- Varies by workload, hardware
- Test it for yourself!

[Chart: vNUMA SQL Server Scalability - 16 vCPUs - HammerDB. Y axis: Transactions / min (0 to 900,000). X axis: Concurrent HammerDB Users (8, 16, 64, 256). Series: 4socket x 4CPU, 8socket x 2CPU, 2socket x 8CPU]

SQL Server
- CPU consumption by DB
- Top waits
- Signal waits
- Scrape parallelism from execution plan @ bit.ly/1rts9ux

Windows
- CPU usage per core
- SQL Server vs. background

Host
- CPU utilization over 80%
- CPU queue waits high
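
To build the list of layouts worth testing, the socket / core combinations for a given vCPU count can be enumerated mechanically. The sketch below does that and optionally drops layouts whose cores per virtual socket exceed the cores in one physical NUMA node. The `pnuma_cores` filter is a simplifying assumption about what "align with pNUMA" means, not a rule from the slides, and the result is only a candidate list for benchmarking.

```python
def vnuma_layouts(vcpus, pnuma_cores=None):
    """List (vsockets, vcores_per_socket) pairs whose product equals vcpus.

    If pnuma_cores is given (assumed: physical cores per NUMA node), keep only
    layouts whose virtual socket fits inside one physical NUMA node.
    """
    if vcpus <= 0:
        raise ValueError("vcpus must be positive")
    layouts = [(s, vcpus // s) for s in range(1, vcpus + 1) if vcpus % s == 0]
    if pnuma_cores is not None:
        layouts = [(s, c) for s, c in layouts if c <= pnuma_cores]
    return layouts


if __name__ == "__main__":
    # Candidate layouts for a 16-vCPU VM on a host with 8 cores per NUMA node
    for sockets, cores in vnuma_layouts(16, pnuma_cores=8):
        print(f"{sockets} vSocket x {cores} vCore")
```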

- Understand the workload: parallelism, concurrent volume
- Determine averages, maximums, and percentiles
- Determine the appropriate profiling period
- < 40% average utilization: too many CPUs
- > 60% average utilization: too few CPUs
- Factor in CPU waits inside SQL Server
- Vary according to your circumstances
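
The 40% / 60% rule of thumb above is easy to encode as a first-pass check over the average utilization for a profiling period. This is a sketch only: the function name and return strings are made up for illustration, and the thresholds are parameters because the slide itself says to vary them to your circumstances. It does not factor in SQL Server CPU waits.

```python
def cpu_sizing_hint(avg_util_pct, low=40.0, high=60.0):
    """Classify average CPU utilization against the low / high thresholds."""
    if not 0.0 <= avg_util_pct <= 100.0:
        raise ValueError("avg_util_pct must be between 0 and 100")
    if avg_util_pct < low:
        return "too many vCPUs"
    if avg_util_pct > high:
        return "too few vCPUs"
    return "within range"


if __name__ == "__main__":
    for util in (25.0, 50.0, 75.0):
        print(util, cpu_sizing_hint(util))
```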

- SQL Server data must be in the buffer pool
- More memory = less I/O
- Less I/O = less waiting on shared storage & queues
- NO HOST MEMORY OVERCOMMITMENT
- Too much memory = lower consolidation ratio
- Balancing act

SQL Server
- Page Life Expectancy
- Buffer Cache Hit Ratio
- High page fault count
- High recompile ratio
- RESOURCE_SEMAPHORE waits
- Memory grants pending

Windows
- MB free
- Paging

Host
- Memory consumption > 90%
- Memory ballooning / dynamic memory expansion
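
The "no host memory overcommitment" rule comes down to simple arithmetic: the vRAM assigned to all VMs on a host should not exceed the host's physical memory. The sketch below checks that. The function names and the example figures are hypothetical, and it ignores hypervisor overhead, which a real check would subtract from the host total.

```python
def memory_commit_ratio(vm_vram_gb, host_ram_gb):
    """Return total assigned vRAM divided by physical host RAM."""
    if host_ram_gb <= 0:
        raise ValueError("host_ram_gb must be positive")
    return sum(vm_vram_gb) / host_ram_gb


def is_overcommitted(vm_vram_gb, host_ram_gb):
    """True when the VMs are assigned more memory than the host physically has."""
    return memory_commit_ratio(vm_vram_gb, host_ram_gb) > 1.0


if __name__ == "__main__":
    vms = [128, 64, 16, 16]  # hypothetical vRAM assignments in GB
    print(memory_commit_ratio(vms, 512))
    print(is_overcommitted(vms, 192))
```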

How much memory?
- Slow storage? More RAM!
- Fast storage? Less RAM?
- More RAM = less host-level consolidation
- More SQL Server licensing (possibly)
- Table / index compression

7/11/2015

- Much less variable in nature
- Most shared resource
- Most critical
- Most complex
- Most problematic
- Slowest piece of the stack
- Random I/O disk patterns
- Many individual points of contention

[Diagram labels: LUN, Controller, Disk Pool]

- Test raw performance
  - SQLIO Batch: bit.ly/1meas9w
  - DiskSpd: bit.ly/1ceqauw
- Collect metrics:
  - I/Os per second (IOPS)
  - Latency (ms)
  - Throughput (MB/s)

[Chart: IOPS per Operations per Thread. Y axis: IOPS (0 to 70,000). X axis: Thread Intensity (1, 2, 4, 8, 16, 32, 64, 128). Series: Sequential Read, Random Read, Sequential Write, Random Write]
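
Two of the metrics above are tied together by the I/O block size: throughput is IOPS multiplied by the size of each I/O. The sketch below makes that conversion explicit so results from tests run at different block sizes can be compared. The function name is illustrative, and it assumes 1 MB = 1024 KB.

```python
def throughput_mb_per_s(iops, block_size_kb):
    """Throughput in MB/s for a given IOPS rate and I/O block size in KB."""
    if iops < 0 or block_size_kb <= 0:
        raise ValueError("iops must be >= 0 and block_size_kb must be > 0")
    return iops * block_size_kb / 1024.0  # assumes 1 MB = 1024 KB


if __name__ == "__main__":
    # 10,000 IOPS at a 64 KB block size
    print(throughput_mb_per_s(10_000, 64))
    # 20,000 IOPS at an 8 KB block size
    print(throughput_mb_per_s(20_000, 8))
```

A high IOPS figure at a small block size can therefore represent less data moved than a lower IOPS figure at a large block size, which is why all three metrics are collected together.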

- Determine your runtime stats & percentiles
- Determine load thresholds
- Review estimated requirements
- Change configuration
  - Incremental changes, not huge ones
- Test and retest

- Workloads & applications change
- DBs are added / removed
- Perform a right-sizing analysis as necessary
- Adjust the resources accordingly
- Recommended: periodic review of VM sizing
  - Quarterly for volatile environments

- One size does not fit all workloads
- Profile and record your workload performance characteristics
- Analyze the numbers
- Adjust configuration and validate
- Repeat as often as your workload changes