@kleegeek | davidklee.net | heraflux.com | linkedin.com/in/davidaklee
Founder & Chief Architect, Heraflux Technologies
Specialties / Focus Areas / Passions:
- Performance Tuning & Troubleshooting
- Virtualization
- Cloud Enablement
- Infrastructure Architecture
- Health & Efficiency
- Capacity Management
Do not reproduce this document in any way.
- What is virtualization?
- Resources and queues
- Physical servers
- Storage and interconnects
- Inside the VM architecture

- An added layer over the physical resources and their resource queues
- Very small queue delays when done right
- Shared-everything architecture
[Figure: a virtualized cluster. VMs of 16 vCPU / 128 GB vRAM, 8 vCPU / 64 GB vRAM, and six of 2 vCPU / 16 GB vRAM run on a virtualization layer over pooled physical resources: 150 GHz CPU, 4 TB memory, 4x10GbE network, 20 TB Tier 1 storage, 40 TB Tier 2 storage.]

[Figure: inside the hypervisor. Tasks pass through a scheduler and its queue for each resource (CPU scheduler, memory allocator, disk scheduler, network scheduler) before reaching CPU execution, memory read/write, disk read/write, and network transmit/receive.]
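The VM sizes in the cluster diagram can be tallied against the pooled capacity it shows. This is a minimal sketch; the class and helper names are illustrative, and reading "4 TB" as 4 x 1024 GB is an assumption. The figure states CPU in GHz rather than cores, so a vCPU:pCPU ratio cannot be derived from it.

```python
# Tally the VM allocations from the cluster figure against the host memory
# pool from the same figure. Names and the helper are illustrative only.
from dataclasses import dataclass


@dataclass(frozen=True)
class VM:
    vcpu: int
    vram_gb: int


# From the figure: one 16 vCPU / 128 GB VM, one 8 vCPU / 64 GB VM,
# and six 2 vCPU / 16 GB VMs.
vms = [VM(16, 128), VM(8, 64)] + [VM(2, 16) for _ in range(6)]

# Assumption: "4 TB Memory" read as 4 x 1024 GB.
HOST_MEMORY_GB = 4 * 1024


def totals(machines):
    """Return (total vCPUs, total vRAM in GB) allocated across all VMs."""
    return sum(m.vcpu for m in machines), sum(m.vram_gb for m in machines)


vcpu_total, vram_total = totals(vms)
print(f"{len(vms)} VMs, {vcpu_total} vCPUs, {vram_total} GB vRAM")
print(f"vRAM as a share of host memory: {vram_total / HOST_MEMORY_GB:.1%}")
```

The small share of memory in use is the point of a shared-everything pool: the same hardware is expected to host many more VMs.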
Hard limits (resources): finite!
- Single compute node hardware
- Total cluster compute capacity
- Storage speed (IOPS, throughput)
- Interconnect path speed

Soft limits (queues): measurable but not finite
- Memory oversubscription
- CPU scheduler contention
- Shared resource utilization maximums
- Noisy neighbors

[Figure: the stack, top to bottom: Application, SQL Server DB, SQL Server Instance, Operating System, Virtualization, Physical Server, Storage, with interconnects and networking running alongside the layers.]
Storage:
- Most shared
- Most critical
- Most complex
- Most problematic
- Slowest piece of the stack
- Many individual points of contention
[Figure: storage path. LUNs on tiers T1, T2, and T3 are presented through controllers from a shared disk pool.]

Test raw performance
- SQLIO (deprecated!)
- DiskSpd
- Batch: heraflux.com/go/diskspd
Collect metrics:
- I/Os per second (IOPS)
- Latency (ms)
- Throughput (MB/s)
[Chart: IOPS per operations per thread. IOPS (0 to 70,000) vs. thread intensity (1, 2, 4, 8, 16, 32, 64, 128) for sequential read, random read, sequential write, and random write.]
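A test matrix like the one charted can be scripted. This sketch is not the heraflux.com/go/diskspd batch file; the target path, file size, block size, duration, and queue depth are assumptions. It uses standard DiskSpd switches (-b, -d, -t, -o, -r, -w, -Sh, -L, -c) and also shows how the three metrics relate: throughput follows from IOPS and I/O size, and Little's law ties outstanding I/Os to IOPS and latency.

```python
# Build a DiskSpd test matrix shaped like the slide's chart: sequential and
# random reads and writes, each at thread counts 1 through 128.
# Assumptions: target path, file size, block size, duration, and queue depth.

PATTERNS = {
    "Sequential Read": ["-w0"],          # no -r: DiskSpd defaults to sequential
    "Random Read": ["-r", "-w0"],
    "Sequential Write": ["-w100"],
    "Random Write": ["-r", "-w100"],
}
THREADS = [1, 2, 4, 8, 16, 32, 64, 128]


def diskspd_commands(target=r"F:\diskspd\test.dat", block="64K",
                     duration_s=60, outstanding=8, file_size="20G"):
    """Return (pattern, threads, command line) tuples for every test."""
    cmds = []
    for name, flags in PATTERNS.items():
        for threads in THREADS:
            args = [
                "diskspd.exe",
                f"-b{block}",         # I/O size
                f"-d{duration_s}",    # test duration in seconds
                f"-t{threads}",       # threads per target
                f"-o{outstanding}",   # outstanding I/Os per thread
                *flags,
                "-Sh",                # disable software and hardware write caching
                "-L",                 # collect latency statistics
                f"-c{file_size}",     # create the test file at this size
                target,
            ]
            cmds.append((name, threads, " ".join(args)))
    return cmds


def throughput_mb_s(iops, block_kb=64):
    """MB/s follows from IOPS and I/O size: IOPS * KB / 1024."""
    return iops * block_kb / 1024


def implied_latency_ms(iops, outstanding_total):
    """Little's law: outstanding I/Os = IOPS * latency, so latency = outstanding / IOPS."""
    return outstanding_total * 1000 / iops


if __name__ == "__main__":
    for name, threads, cmd in diskspd_commands():
        print(f"{name:16} t={threads:<3} {cmd}")
```

Keeping the I/O size fixed across the matrix means the IOPS curves and the MB/s figures differ only by a constant factor, which makes the four patterns directly comparable.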
[Image: 2U server. Img src: http://www.asacomputers.com/2u-server-sandy-bridge.html]
- Manufacturer: Intel, AMD
- Sockets
- Cores
- Speed vs. GHz
- Logical vs. physical
- Hyperthreading

[Figure: four CPUs and memory. Img src: http://frankdenneman.nl/2011/01/05/amd-magny-cours-and-esx/]
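The socket, core, and hyperthreading arithmetic can be made concrete. This is a sketch with a hypothetical host (2 sockets, 12 cores per socket, 2.6 GHz, Hyper-Threading on); the aggregate GHz figure assumes capacity is counted from physical cores only, since hyperthreads share a core's execution resources rather than adding clock cycles.

```python
# Physical cores, logical CPUs, and aggregate clock for a host.
# The example host below is hypothetical.
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    sockets: int
    cores_per_socket: int
    ghz: float
    hyperthreading: bool

    @property
    def physical_cores(self) -> int:
        return self.sockets * self.cores_per_socket

    @property
    def logical_cpus(self) -> int:
        # Hyper-Threading presents two logical CPUs per physical core,
        # but both threads share that core's execution resources.
        return self.physical_cores * (2 if self.hyperthreading else 1)

    @property
    def total_ghz(self) -> float:
        # Counted from physical cores only; HT threads add no clock cycles.
        return self.physical_cores * self.ghz


host = Host(sockets=2, cores_per_socket=12, ghz=2.6, hyperthreading=True)
print(host.physical_cores, host.logical_cpus, round(host.total_ghz, 1))
```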
Virtual SQL Servers. Actual Performance. 2016
[Image. Img src: http://www.learnyourtech.com/hardware/]

Sample Cisco UCS memory config
- Slot & chip placement
(Src: http://www.cisco.com/c/dam/en/us/products/collateral/servers-unified-computing/ucs-b-series-blade-servers/b200m4-specsheet.pdf page 44)
Resource limits are easy to detect and work around.
Queue contention is much harder:
- Time in queue = time lost from the VM
- Silent performance killer
- Everything in a VM must be scheduled, including idle resources
- Queue processing is not always FIFO
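Why queue time is a silent killer can be shown with the textbook M/M/1 queue, where the mean wait is Wq = rho / (1 - rho) x service time. This is a swapped-in simplification: hypervisor schedulers are not M/M/1 queues and, as the slide notes, are not always FIFO. It only illustrates that waiting grows non-linearly as a shared resource gets busier.

```python
# Textbook M/M/1 model: mean time spent waiting in queue as utilization rises.
# A hypervisor scheduler is not an M/M/1 queue (and is not always FIFO);
# this only illustrates why queue delay grows non-linearly with load.


def mm1_queue_wait(utilization: float, service_ms: float) -> float:
    """Mean wait in queue, Wq = rho / (1 - rho) * service time."""
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    return utilization / (1 - utilization) * service_ms


for rho in (0.5, 0.7, 0.8, 0.9, 0.95):
    wait = mm1_queue_wait(rho, service_ms=1.0)
    print(f"utilization {rho:.0%}: wait {wait:.1f} ms per 1 ms of work")
```

At 50% utilization a request waits about as long as it runs; at 90% it waits roughly nine times as long.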
- Get the physical machine configuration
- Try to fit inside one NUMA node
- Otherwise, balance across the number of NUMA nodes
- Test configurations for best results

Example: 16 vCPU. What's better?
- 2 vsocket x 8 vcore?
- 4 vsocket x 4 vcore?
- 8 vsocket x 2 vcore?
Varies by workload and hardware. Test it for yourself!
[Chart: vNUMA SQL Server Scalability - 16 vCPUs - HammerDB. Transactions/min (0 to 900,000) vs. concurrent HammerDB users (8, 16, 64, 256) for 4socket x 4CPU, 8socket x 2CPU, and 2socket x 8CPU.]
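The fit-or-balance rule can be sketched as a function that proposes a starting vsocket x vcore shape. The host figures in the usage lines are hypothetical, and the sketch does not check that the result stays within the host's actual number of NUMA nodes. Its output is a candidate to benchmark, as the HammerDB comparison suggests, not an answer.

```python
# Sketch of the slide's vNUMA guidance: fit the VM in one NUMA node if it
# can, otherwise spread it evenly over the fewest nodes that hold it.
# Host figures in the usage lines are hypothetical; test the result.
from math import ceil


def layouts(vcpus):
    """All (vsockets, cores_per_vsocket) pairs whose product is vcpus."""
    return [(s, vcpus // s) for s in range(1, vcpus + 1) if vcpus % s == 0]


def suggest_layout(vcpus, vram_gb, cores_per_node, mem_per_node_gb):
    """Return (vsockets, cores_per_vsocket) as a starting point for testing."""
    if vcpus <= cores_per_node and vram_gb <= mem_per_node_gb:
        return (1, vcpus)  # fits inside a single NUMA node
    nodes = max(ceil(vcpus / cores_per_node), ceil(vram_gb / mem_per_node_gb))
    # Balance: step up until the vCPUs divide evenly across the nodes.
    while vcpus % nodes:
        nodes += 1
    return (nodes, vcpus // nodes)


print(layouts(16))                       # -> [(1, 16), (2, 8), (4, 4), (8, 2), (16, 1)]
print(suggest_layout(16, 128, 12, 256))  # hypothetical 12-core nodes -> (2, 8)
print(suggest_layout(16, 128, 6, 128))   # hypothetical 6-core nodes -> (4, 4)
```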
A 1 vCPU to 1 pCPU ratio is a poor recommendation.
Queues: Ready Time, Co-Stop
Ex: client environment
- 74 running VMs @ 2-8 vCPUs each
- 72 total physical cores
- Fantastic performance
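The client example gives only a per-VM range, so the overall vCPU:pCPU ratio can only be bounded. The sketch below computes those bounds and also converts a vCenter CPU Ready summation (ms) into a percentage of the sample interval, assuming the 20-second real-time chart interval; pass the VM's vCPU count for a per-vCPU figure.

```python
# Overcommit arithmetic for the client example on the slide, plus the
# usual conversion of a vCenter CPU Ready summation (ms) to a percentage.


def overcommit_ratio(total_vcpus: int, physical_cores: int) -> float:
    return total_vcpus / physical_cores


def ready_percent(ready_ms: float, interval_s: int = 20, vcpus: int = 1) -> float:
    """Share of the sample interval spent ready but not scheduled.
    20 s is the vCenter real-time chart interval; pass vcpus to get a
    per-vCPU figure from a VM-level summation."""
    return ready_ms * 100 / (interval_s * 1000 * vcpus)


VMS, PCORES = 74, 72
low = overcommit_ratio(VMS * 2, PCORES)   # every VM at 2 vCPUs
high = overcommit_ratio(VMS * 8, PCORES)  # every VM at 8 vCPUs
print(f"vCPU:pCPU between {low:.1f}:1 and {high:.1f}:1")
print(f"{ready_percent(1000):.1f}% ready")  # 1,000 ms in a 20 s sample
```

Even the low bound is about 2:1, yet the slide reports fantastic performance; the queue metrics, not the ratio, are what show contention.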
- Right amount of vCPU and vRAM resources
- Physical world = size for end of life
- Virtual world = size for right now
- Idle vCPUs will slow the application's performance
- Repeat right-sizing analysis routinely
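One way to "size for right now" is to size to the vCPUs actually busy today. This is a hypothetical heuristic, not the author's method: it takes a nearest-rank 95th percentile of guest CPU %, adds an assumed 25% headroom, and rounds up to an even count. It covers vCPU only; vRAM needs its own analysis.

```python
# Hypothetical right-sizing heuristic (not from the slides): size for the
# vCPUs that are busy today at a high percentile, add headroom, and round
# up to an even count. Rerun it on fresh samples, as the slide advises.
import math


def busy_vcpus(cpu_pct_samples, allocated_vcpus, percentile=0.95):
    """Nearest-rank percentile of guest CPU %, expressed as busy vCPUs."""
    ordered = sorted(cpu_pct_samples)
    rank = max(0, math.ceil(percentile * len(ordered)) - 1)
    return ordered[rank] * allocated_vcpus / 100


def recommend_vcpus(cpu_pct_samples, allocated_vcpus, headroom=0.25):
    """Suggested vCPU count; may be lower or higher than today's allocation."""
    busy = busy_vcpus(cpu_pct_samples, allocated_vcpus)
    target = math.ceil(busy * (1 + headroom))
    target += target % 2  # round an odd count up to the next even number
    return max(2, target)


# A 16 vCPU VM that sits at 25% CPU across a day of 5-minute samples.
print(recommend_vcpus([25] * 288, allocated_vcpus=16))
```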
- Virtual disks whenever possible
- Multiple SCSI controllers
- VMware PVSCSI
- Spread out the workload

C: Operating System
D: SQL Server Instance Home
E: System Databases (master, model, msdb) *
F: User Database Data (1 of X)
G: User Database Log (1 of Y)
H: TempDB
Y: Windows Page file **
Z: Backups
Adjust as necessary (but stay standardized).
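Spreading the workload across multiple SCSI controllers can be written down and checked. vSphere allows up to four SCSI controllers per VM; which drive goes on which PVSCSI controller below is an assumption for illustration, not the author's layout. The check only enforces that data, log, and TempDB sit on separate controllers.

```python
# One hypothetical mapping of the slide's drive layout onto PVSCSI
# controllers. vSphere allows up to four SCSI controllers per VM; which
# drive goes on which controller here is an assumption, not the author's.

MAX_CONTROLLERS = 4

LAYOUT = {
    # drive: (purpose, SCSI controller number)
    "C": ("Operating System", 0),
    "D": ("SQL Server Instance Home", 0),
    "E": ("System Databases", 0),
    "F": ("User Database Data", 1),
    "G": ("User Database Log", 2),
    "H": ("TempDB", 3),
    "Y": ("Windows Page file", 0),
    "Z": ("Backups", 0),
}

# Busiest I/O streams that this sketch keeps on separate controllers.
BUSY = ("F", "G", "H")


def check(layout):
    """Return a list of problems with the layout (empty if none)."""
    problems = []
    for drive, (_, ctrl) in layout.items():
        if not 0 <= ctrl < MAX_CONTROLLERS:
            problems.append(f"{drive}: controller {ctrl} out of range")
    busy_ctrls = [layout[d][1] for d in BUSY if d in layout]
    if len(busy_ctrls) != len(set(busy_ctrls)):
        problems.append("data, log, and TempDB share a controller")
    return problems


for drive, (purpose, ctrl) in LAYOUT.items():
    print(f"{drive}: {purpose:<26} SCSI controller {ctrl}")
print(check(LAYOUT) or "no problems found")
```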
- 64 KB NTFS allocation unit (block) size
- Set power settings to High Performance (verify with CPU-Z)
- Set antivirus exclusions for SQL Server (tinyurl.com/sqlav)
- Ongoing OS-level performance metric collection
  - No greater than a five-minute interval
  - Windows Perfmon, Microsoft SCOM, or a third-party utility
  - heraflux.com/go/perfmon

Goal: maximize performance while reducing resource scheduling
- Parallelizable workloads: determine how parallel the workload is
- Set MaxDOP = vNUMA node core count (?)
- Cost threshold for parallelism = not the default
- Jonathan Kehayias, "Tuning 'cost threshold for parallelism' from the Plan Cache": bit.ly/1rts9ux
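The two parallelism settings translate into sp_configure calls. This sketch follows the slide's MaxDOP suggestion (cores per vNUMA node), which the slide itself marks with "(?)", and rejects SQL Server's default cost threshold of 5. The value 50 in the usage line is a placeholder to be replaced by a number tuned from the plan cache, not a recommendation.

```python
# Emit sp_configure statements for the two parallelism settings on the
# slide. MaxDOP follows the slide's suggestion (cores per vNUMA node), which
# the slide itself marks "(?)"; the cost threshold is a placeholder to be
# tuned from the plan cache, not a recommended value.

DEFAULT_COST_THRESHOLD = 5  # SQL Server's shipped default


def parallelism_script(cores_per_vnuma_node: int, cost_threshold: int) -> str:
    if cores_per_vnuma_node < 1:
        raise ValueError("need at least one core per vNUMA node")
    if cost_threshold == DEFAULT_COST_THRESHOLD:
        raise ValueError("slide: cost threshold should not stay at the default")
    return "\n".join([
        "EXEC sp_configure 'show advanced options', 1;",
        "RECONFIGURE;",
        f"EXEC sp_configure 'max degree of parallelism', {cores_per_vnuma_node};",
        f"EXEC sp_configure 'cost threshold for parallelism', {cost_threshold};",
        "RECONFIGURE;",
    ])


# Hypothetical VM: 2 vsockets x 8 vcores, cost threshold placeholder of 50.
print(parallelism_script(cores_per_vnuma_node=8, cost_threshold=50))
```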
Spread out the I/O
- File groups, data files, partitions
- Parallelism with multiple active storage paths
Reduce I/O
- Table / index compression vs. SAN compression
- In-memory constructs
- More RAM
- SSD read / write caching
Faster I/O
- All-flash SAN
And then clean up bad schemas & queries!

Virtualization works.
- Equivalent performance if done right
- Efficiency in data handling
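Spreading I/O across data files works best when the files are sized equally, because SQL Server's proportional fill favors files with more free space. The sketch below is a simplified stand-in for that behavior (each new extent goes to the file with the most free space, ties to the lower file), not the engine's exact algorithm.

```python
# Simplified illustration of why equally sized data files spread I/O evenly.
# SQL Server's proportional fill favors files with more free space; this
# sketch models that as "allocate each extent to the file with the most free
# space" (ties go to the lower file number). It is not the engine's exact
# algorithm.


def allocate(free_extents, extents):
    """Return how many of `extents` new extents land in each file."""
    free = list(free_extents)
    placed = [0] * len(free)
    for _ in range(extents):
        target = max(range(len(free)), key=free.__getitem__)
        if free[target] == 0:
            raise RuntimeError("filegroup is full")
        free[target] -= 1
        placed[target] += 1
    return placed


print(allocate([100, 100, 100, 100], 40))  # equal files: even spread
print(allocate([100, 50], 60))             # uneven files: most writes hit the first file
```

With uneven files, the emptier file absorbs most new writes and becomes a hot spot, which defeats the purpose of having several files on separate storage paths.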