Accelerating OLTP performance with NVMe SSDs
Veronica Lagrange, Changho Choi, Vijay Balakrishnan
Agenda
- OLTP status quo
- Goals
- System environments
- Tuning and optimization
- MySQL Server results
- Percona Server results
- Summary
OLTP status quo
- Online Transaction Processing (OLTP) is typically I/O bound.
- ACID properties: every committed transaction must be durable.
- Capacity planning is driven by the required IOPS, which leads to many storage devices and idle CPUs.
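The capacity-planning point can be made concrete with a back-of-the-envelope sketch: when the device count is driven by required IOPS, a slow device multiplies the hardware needed. The figure of roughly 200 random IOPS per 15K rpm HDD is an assumed rule of thumb, not a value measured in this study, the 48K target is borrowed from the combined-IOPS annotation on the Percona Server I/O chart, and `devices_needed` is a hypothetical helper.

```python
import math


def devices_needed(required_iops: int, iops_per_device: int) -> int:
    """Smallest number of identical devices whose combined IOPS meets the target."""
    if iops_per_device <= 0:
        raise ValueError("iops_per_device must be positive")
    return math.ceil(required_iops / iops_per_device)


# Assumption: ~200 random IOPS per 15K rpm HDD (rule of thumb, not measured here).
HDD_IOPS = 200
# Target borrowed from the ~48K combined IOPS annotated on the Percona I/O chart.
TARGET_IOPS = 48_000

print(devices_needed(TARGET_IOPS, HDD_IOPS))  # 240
```

A storage array sized this way serves the I/O demand, but the CPUs attached to it spend most of their time waiting, which is the status quo the deck sets out to change.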
Goals
- Maximize throughput (tpmp: New Order transactions per minute)
- Minimize response times
- Side benefit: increase server capacity
MySQL Server and Percona Server
MySQL Server:
- Open-source relational database management system (RDBMS)
- The world's most widely used open-source client-server RDBMS
- Optimized for Online Transaction Processing (OLTP)
Percona Server:
- A free, fully compatible, open-source enhancement of MySQL Server
- Developed and distributed by Percona
- Especially optimized for the I/O subsystem
TPC-C and tpcc-mysql
TPC-C:
- A 23-year-old OLTP benchmark
- Five types of well-defined transactions:
  1. New Order (read & write)
  2. Payment (read & write)
  3. Delivery (read & write)
  4. Order Status (read only)
  5. Stock Level (read only)
- Throughput is measured in New Order transactions per minute (tpmC)
- Relational schema
tpcc-mysql:
- The best open-source implementation available
- Developed by Percona
- Not 100% compliant with the standard
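The following sketch shows how a single throughput number is derived from a mixed workload. The weights follow the minimum mix in the TPC-C specification: Payment at least 43%, Order Status, Delivery and Stock Level at least 4% each, and New Order taking the remainder (about 45%). tpcc-mysql's own weighting may differ. `simulate` and `tpm` are hypothetical helpers. Only New Order transactions count toward throughput, even though all five types run.

```python
import random
from collections import Counter

# Minimum mix from the TPC-C specification; New Order takes the remainder.
# Illustrative only: tpcc-mysql's actual weighting may differ.
MIX = {
    "NewOrder": 45,
    "Payment": 43,
    "OrderStatus": 4,
    "Delivery": 4,
    "StockLevel": 4,
}


def simulate(n: int, seed: int = 42) -> Counter:
    """Draw n transaction types according to the weighted mix."""
    rng = random.Random(seed)
    return Counter(rng.choices(list(MIX), weights=list(MIX.values()), k=n))


def tpm(new_order_count: int, elapsed_seconds: float) -> float:
    """Throughput metric: New Order transactions per minute (other types ignored)."""
    return new_order_count * 60.0 / elapsed_seconds


counts = simulate(100_000)
print(counts["NewOrder"] / sum(counts.values()))  # roughly 0.45
print(tpm(counts["NewOrder"], 7200))  # hypothetical 2-hour run
```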
Methodology
- Establish a baseline configuration
- Characterize the performance of the system and software
- Identify key parameters for NVMe SSDs
- Optimize the system and software to achieve the highest throughput
Performance Measurement Environment
[Diagram: tpcc-mysql (Linux, ODBC) and MySQL (Linux, InnoDB, filesystem) connected over 10 Gbit Ethernet; the data dir and log dir are on the storage under test (HDD or SSD), and devices are replaced between runs.]
- Database size (500 warehouses): 43 GB initially, 79 GB after a 2-hour run
- Workload: 50, 100, 150, and 200 connections
Dual Socket Server Environment
- Server: Dell 730xd
- CPU (model name): Intel(R) Xeon(R) CPU E5-2670 v3 @ 2.30GHz
- Memory: 64GB
- OS: Linux 4.4.0-040400-generic
- Software: MySQL Server 5.7.11; Percona Server 5.7.11-4

Storage | Seq. reads (MB/s) | Seq. writes (MB/s) | Random reads (IOPS) | Random writes (IOPS) | Capacity
NVMe XS1715* | 3,000 | 1,400 | 750K | 115K | 1.6 TB
SATA 850 Pro | 550 | 520 | 100K | 90K | 512 GB
SAS PM1633 | 1,350 | 750 | 190K | 30K | 960 GB

The HDD is a Seagate 15K rpm SAS HDD.
* The XS1715 is a discontinued drive; results with an updated drive appear later in the presentation.
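A quick bandwidth check shows why the NVMe random-read figure must be in thousands rather than millions of IOPS. The 4 KiB transfer size is an assumption: it is the block size commonly used in random-IOPS specifications, but the slide does not state it. `iops_to_mb_per_s` is a hypothetical helper.

```python
def iops_to_mb_per_s(iops: float, block_bytes: int = 4096) -> float:
    """Bandwidth implied by an IOPS figure at a fixed block size, in decimal MB/s."""
    return iops * block_bytes / 1_000_000


# Assumption: random-IOPS specifications are quoted at 4 KiB blocks.
print(iops_to_mb_per_s(750_000))      # 3072.0, close to the 3,000 MB/s sequential-read figure
print(iops_to_mb_per_s(750_000_000))  # 3072000.0, i.e. about 3 TB/s, far beyond one drive
```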
MySQL Server Throughput
[Chart: MySQL Server throughput (tpmp) at 100 connections, SAS-HDD vs. NVMe (XS1715), for the out-of-box, initial, and optimized MySQL configurations. Annotations: 2.6X, 67X, 49X, -37%.]
MySQL Server Optimized Response Times
[Chart, left: New Order 95th-percentile response time (milliseconds, log scale) for SAS-HDD, SATA-SSD, and NVMe at 50, 100, and 150 connections.]
[Chart, right: New Order response times, 95th percentile and maximum (milliseconds, log scale), for SAS-HDD, SATA-SSD, and NVMe at 50, 100, and 150 connections.]
Annotations: 172X, 22X.
MySQL Key Parameters
Columns: MySQL out-of-box, MySQL initial, MySQL optimized, Percona optimized
- innodb_flush_method: NULL / <empty> / O_DIRECT
- innodb_buffer_pool_size: 128MB / 3GB / 12GB
- innodb_io_capacity: 200 / 300,000 / 15,000
- innodb_io_capacity_max: 2,000 / 600,000 / 25,000
- innodb_adaptive_hash_index: ON / OFF / 0
- innodb_fill_factor: 100 / 50 / 100
- innodb_page_cleaners: 4 / 32 / 8
- innodb_buffer_pool_instances: 8 / 32 / 8
- innodb_flush_neighbors: 1 / 1 / 0
- innodb_log_file_size: 48MB / 48MB / 1G / 10G
- innodb_lru_scan_depth: 1024 / 4000 / 8192
- innodb_write_io_threads: 4 / 16
- innodb_read_io_threads: 4 / 16
- innodb_log_files_in_group: 2 / 3
- innodb_max_dirty_pages_pct: 75 / 90
- innodb_max_dirty_pages_pct_lwm: 0 / 10
- join_buffer_size: 256KB / 32K
- sort_buffer_size: 256KB / 32K
- innodb_spin_wait_delay: 6 / 96 / 6
- innodb_max_purge_lag_delay: 0 / 30000000 / 0
- performance_schema: ON / OFF
MySQL Server Top Tunables
The following parameters are especially important for NVMe:
- innodb_io_capacity: sets the upper limit on InnoDB's background I/O activity (such as flushing). The default of 200 is a clear resource throttle for fast storage; it should be set to approximately the number of IOPS the system can sustain.
- innodb_flush_method: whether or not to use the filesystem cache. With O_DIRECT, InnoDB bypasses the filesystem cache and relies on the InnoDB buffer pool instead.
- innodb_buffer_pool_size: the memory area where InnoDB caches table and index data. Sizing involves the usual Goldilocks trade-offs, and more buffer pool space is needed when using O_DIRECT.
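These three tunables end up in the `[mysqld]` section of a MySQL option file. The sketch below renders such a fragment with Python's standard `configparser`. The helper name is hypothetical, and the example values (15,000, 25,000, 12 GB) are taken from the key-parameters table for illustration only. They are not a recommendation for other hardware.

```python
import configparser
import io


def mysqld_fragment(io_capacity: int, io_capacity_max: int, buffer_pool_gb: int) -> str:
    """Render a [mysqld] option-file fragment for the three NVMe-sensitive tunables.

    All values are operator choices; nothing here is derived from the drive.
    """
    if io_capacity_max < io_capacity:
        raise ValueError("innodb_io_capacity_max must be >= innodb_io_capacity")
    cfg = configparser.ConfigParser()
    cfg.optionxform = str  # keep option names exactly as written
    cfg["mysqld"] = {
        # IOPS budget for InnoDB background tasks such as flushing (default 200).
        "innodb_io_capacity": str(io_capacity),
        # Upper bound InnoDB may use when flushing falls behind.
        "innodb_io_capacity_max": str(io_capacity_max),
        # Bypass the filesystem cache; the buffer pool does the caching instead.
        "innodb_flush_method": "O_DIRECT",
        # With O_DIRECT the buffer pool must be larger to keep hot pages in memory.
        "innodb_buffer_pool_size": f"{buffer_pool_gb}G",
    }
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


print(mysqld_fragment(io_capacity=15_000, io_capacity_max=25_000, buffer_pool_gb=12))
```

In MySQL 5.7, if innodb_io_capacity is set at startup but innodb_io_capacity_max is not, the maximum defaults to twice innodb_io_capacity (with a minimum of 2000). Setting both explicitly keeps the ceiling predictable.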
MySQL Server Top Tunables
The following parameters are especially important for TPC-C:
- innodb_thread_concurrency: experimentally, the default (0 = unlimited) yields better OLTP throughput and lower system CPU utilization, because there is no gatekeeping on the number of threads inside InnoDB.
- innodb_adaptive_hash_index: requires extra work to monitor index lookups and maintain the hash index structure, and may become a source of contention. Experimentally, disabling it brought Order Status and Payment latencies down from more than 3 minutes to less than 50 seconds.
- innodb_fill_factor: the percentage of each B-tree page that is filled during a sorted index build. It is a hint, not a hard limit. A smaller value may benefit transactions with INSERTs because it decreases the number of page splits.
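The page-split argument for a lower innodb_fill_factor comes down to simple arithmetic. The space a sorted index build leaves empty on a page is room for later inserts before that page must split. This is a deliberately simplified model: it ignores page headers, per-record overhead, and the extra free space InnoDB reserves in clustered-index pages. The 200-byte row size is hypothetical, and `inserts_before_split` is a hypothetical helper.

```python
PAGE_SIZE = 16 * 1024  # InnoDB default page size (innodb_page_size = 16KB)


def inserts_before_split(fill_factor: int, row_bytes: int, page_size: int = PAGE_SIZE) -> int:
    """Rows that fit into the space a sorted index build leaves free on one page.

    Simplified model: ignores page headers, per-record overhead and the extra
    space InnoDB reserves in clustered-index pages.
    """
    if not 10 <= fill_factor <= 100:
        raise ValueError("innodb_fill_factor must be between 10 and 100")
    free_bytes = page_size * (100 - fill_factor) // 100
    return free_bytes // row_bytes


# Hypothetical 200-byte rows.
print(inserts_before_split(100, 200))  # 0 in this simplified model
print(inserts_before_split(50, 200))   # 40 more rows before the page must split
```

The cost is space. At a fill factor of 50, a sorted index build produces roughly twice as many pages as at 100.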
MySQL Server Optimization Learnings
- MySQL Server is built for STABILITY: lots of latches prevent any subcomponent from being overwhelmed.
- It is important to tune I/O-specific parameters.
- It is important to tune OLTP-specific parameters.
- The optimization process more than doubled throughput.
MySQL Server Optimized Throughput
[Chart: MySQL Server throughput (tpmp) for SAS-HDD, SATA, and NVMe at 50, 100, and 150 connections. Annotations: 87X, 67X, 13X.]
MySQL Server System Metrics
tpcc-mysql running on the dual-socket server:
- SAS-HDD is I/O bound
- NVMe is CPU bound

Mean CPU utilization (User% + Sys%):
Connections | SAS-HDD | NVMe
50 c | 1.81 | -
100 c | 2.45 | 80.93
150 c | - | 86.44
Percona Server on a Quad Socket Server
- Apply and tune the optimizations on Percona's distribution
- Change to a quad-socket server
- The NVMe drive is now a PM1725

- Server: Dell PowerEdge R930
- CPU (model name): Intel(R) Xeon(R) CPU E7-4850 v3 @ 2.20GHz
- Memory: 124GB
- OS: Linux 4.4.0-040400-generic
- Storage: SAS HDD Seagate ST600MP0005 15K rpm; SATA SSD Samsung 850 Pro; SAS SSD Samsung PM1633; NVMe Samsung PM1725

Storage | Seq. reads (MB/s) | Seq. writes (MB/s) | Random reads (IOPS) | Random writes (IOPS) | Capacity
NVMe PM1725 | 3,000 | 1,900 | 750K | 130K | 1.5 TB
SATA 850 Pro | 550 | 520 | 100K | 90K | 512 GB
SAS PM1633 | 1,350 | 750 | 190K | 30K | 960 GB
Percona Server on a Quad Socket Server
[Chart: Percona Server transactions per minute (tpmp) for SAS-HDD, SAS-SSD, and NVMe (PM1725) at 50, 100, 150, and 200 connections. Annotation: 180X.]
Percona Server Optimized Response Time
[Chart, left: New Order 95th-percentile response time (milliseconds, log scale) for SAS-HDD, SAS-SSD, and NVMe at 50, 100, 150, and 200 connections.]
[Chart, right: New Order response time, 95th percentile and maximum (milliseconds, log scale), for SAS-HDD, SAS-SSD, and NVMe at 50, 100, 150, and 200 connections.]
Annotations: 267X, 333X.
Percona Server: System Resources
[Chart, left: mean CPU utilization (%) at 100 connections for SAS-HDD, SAS-SSD, and NVMe (PM1725).]
[Chart, right: Percona optimized, combined I/O: mean read and write IOPS for SAS-HDD, SAS-SSD, and NVMe (PM1725). Annotation: 48K IOPS.]
How about Server Capacity?
Introducing CPU PATH LENGTH: the average number of CPU cycles it takes to complete one transaction.
CPU PATHL = (CPU frequency * cores * total average CPU utilization) / (transactions per second)
- CPU frequency is the nominal frequency in the CPU model name (2.30 GHz for the E5-2670 v3, 2.20 GHz for the E7-4850 v3).
- CPU PATHL measures how much work the CPUs need to do to execute one transaction.
- Because the workload is always the same, variations in CPU PATHL indicate the extra bookkeeping work the server must do to manage queues, buffers, context switches, etc.
- We observe that faster devices require less bookkeeping by the CPUs. This frees CPU resources and therefore increases server capacity.
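The CPU path length formula can be written as a small function. With the frequency expressed in MHz, the result is in millions of cycles per transaction. The inputs below are hypothetical, not the measured values behind the charts. The core count of 24 assumes two 12-core E5-2670 v3 sockets counted as physical cores; the deck does not say whether it counts physical cores or hardware threads. `pct_change` shows how a relative reduction such as -50% follows from two path lengths.

```python
def cpu_path_length(freq_mhz: float, cores: int, cpu_util: float, tps: float) -> float:
    """CPU PATHL in millions of cycles per transaction.

    cpu_util is total average utilization (User% + Sys%) as a fraction in [0, 1];
    tps is transactions per second (divide a per-minute rate by 60).
    """
    if tps <= 0:
        raise ValueError("tps must be positive")
    return freq_mhz * cores * cpu_util / tps


def pct_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Hypothetical runs on a 2.30 GHz server; 24 physical cores is an assumption.
before = cpu_path_length(2300, 24, cpu_util=0.50, tps=1000)  # 27.6 million cycles/txn
after = cpu_path_length(2300, 24, cpu_util=0.25, tps=1000)   # 13.8 million cycles/txn
print(pct_change(before, after))  # -50.0
```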
Server Capacity: Dual Socket MySQL Server
Replacing HDD storage with NVMe decreases CPU path length by at least 50%.
[Chart: MySQL Server on the dual-socket server, CPU path length (millions of cycles per transaction) for SAS-HDD, SATA-SSD, and NVMe (XS1715), all optimized, at 50, 100, and 150 connections. Annotations: -65%, -50%, -44%.]
Server Capacity: Quad Socket Percona Server
[Chart: Percona Server CPU path length (millions of cycles per transaction) for SAS-HDD, SAS-SSD, and NVMe (PM1725) at 50, 100, and 150 connections. Annotations: -60%, -56%, -46%.]
Summary
- NVMe throughput can be 100x better than HDD throughput.
- All SSD maximum latencies are much smaller than the 95th-percentile HDD response times.
- The OLTP paradigm changes from I/O bound on HDD to healthier CPU utilization on NVMe.
- Tuning is critical.
Next steps
- Multiple instances sharing the same NVMe drive
- Leverage fast storage: optimize software by removing the latency workarounds added over time to mitigate HDD latencies
- Or, do less caching and buffering and more I/O
Questions?
veronica.l@samsung.com
Backup Slides
Better I/O balance between DATA and LOG disks with Percona Server
Example using NVMe, 100-connection test, both on the dual-socket server.

Metric | MySQL Server | Percona Server
Mean CPU (User+Sys) % | 81 | 65
Mean CPU Wait % | 10 | 15
Mean data disk reads (IOPS) | 19,919 | 30,291
Mean data disk writes (IOPS) | 79,274 | 31,933
Mean log disk writes (IOPS) | 3,541 | 32,634
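The "better I/O balance" can be quantified from the backup-slide table by comparing data-disk writes with log-disk writes. The sketch below only rearranges the measured numbers. The dictionary layout and the helper name are illustrative.

```python
# Mean IOPS from the backup-slide table (NVMe, 100 connections, dual-socket server).
MEAN_IOPS = {
    "MySQL Server": {"data_reads": 19_919, "data_writes": 79_274, "log_writes": 3_541},
    "Percona Server": {"data_reads": 30_291, "data_writes": 31_933, "log_writes": 32_634},
}


def data_to_log_write_ratio(m: dict) -> float:
    """Data-disk writes issued per log-disk write."""
    return m["data_writes"] / m["log_writes"]


for name, m in MEAN_IOPS.items():
    total = sum(m.values())
    print(f"{name}: data/log write ratio {data_to_log_write_ratio(m):.2f}, "
          f"total {total:,} IOPS")
```

MySQL Server issues about 22 data-disk writes per log-disk write, while Percona Server's ratio is close to 1. That spreads the write load across both devices instead of concentrating it on the data disk.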
Percona Server Throughput
[Chart: Percona Server transactions per minute, normalized to SAS-HDD at 50 connections, for SAS-HDD, SAS SSD, and NVMe SSD at 50, 100, 150, and 200 connections. Bar labels: 110K, 136K, 139K, 139K.]