Mastering the art of indexing


Yoshinori Matsunobu
Lead of MySQL Professional Services APAC, Sun Microsystems
Yoshinori.Matsunobu@sun.com

Table of contents

Speeding up Selects
- B+TREE index structure
- Index range scan, MySQL 6.0 MRR
- Covering Index (Index-only read)
- Multi-column index, index-merge

Insert performance
- Understanding what happens when doing insert
- Insert buffering in InnoDB

Benchmarks
- SSD/HDD
- InnoDB/MyISAM
- Changing RAM size
- Using MySQL 5.1 Partitioning
- Changing Linux I/O scheduler settings

Important performance indicator: IOPS

- IOPS: number of (random) disk i/o operations per second
- Regular SAS HDD: 200 iops per drive (disk seek & rotation is heavy)
- Intel SSD (X25-E): 2,000+ (writes) / 5,000+ (reads) per drive
- SSD figures currently depend heavily on the SSD model and device drivers
- Best practice: writes can be boosted by using BBWC (Battery Backed up Write Cache), especially for REDO logs

[Diagram: disk writes with and without a write cache; seek & rotation time is marked on the disk]
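The IOPS figures translate directly into elapsed time for random reads. The sketch below is a back-of-the-envelope estimate only: it assumes every read is uncached and costs exactly one disk i/o, and the 2,000,000 read count is an illustrative choice, not a measurement from this slide.

```python
def seconds_for_random_reads(num_reads: int, iops: float) -> float:
    """Elapsed seconds if every read costs exactly one random disk i/o."""
    return num_reads / iops

HDD_IOPS = 200         # slide figure for one SAS HDD
SSD_READ_IOPS = 5000   # slide figure for Intel X25-E reads ("5,000+")

reads = 2_000_000      # illustrative workload size (assumption)
hdd_seconds = seconds_for_random_reads(reads, HDD_IOPS)
ssd_seconds = seconds_for_random_reads(reads, SSD_READ_IOPS)

print(f"HDD: {hdd_seconds:.0f} s, SSD: {ssd_seconds:.0f} s")
```

Real workloads do better than this bound whenever part of the data is cached in memory.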

Speeding up Selects

B+Tree index structure

  SELECT * FROM tbl WHERE key1=1;

[Diagram: Root (< 100000: Branch 1) -> Branch 1 (< 60: Leaf 1, < 120: Leaf 2) -> Leaf block 1 -> table records]

Leaf block 1:
  key1   RowID
  1      10000
  2      5
  3      15321
  60     431

Table records:
  5:     col2='aaa', col3=10
  10000: col2='abc', col3=100
  15321: col2='a', col3=7

- Index scan is done by accessing Root -> Branch -> Leaf -> table records
- Index entries are stored in leaf blocks, sorted by key order
- Each leaf block usually has many entries
- Index entries are mapped to RowID
  - RowID = Record Pointer (MyISAM), Primary Key Value (InnoDB)
- Root and Branch blocks are cached almost all the time
- On large tables, some (many) leaf blocks and table records are not cached
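The leaf-to-record step can be sketched in a few lines. This is a toy model, not MySQL's implementation: the leaf block is a sorted Python list, the table is a dict keyed by RowID, and `lookup` is a hypothetical helper name. Only the rows shown on the slide are included.

```python
from bisect import bisect_left

# Leaf block from the slide: (key1, RowID) pairs kept in key order.
leaf_block = [(1, 10000), (2, 5), (3, 15321), (60, 431)]

# Table records keyed by RowID (only the three rows shown on the slide).
table_records = {
    5: {"col2": "aaa", "col3": 10},
    10000: {"col2": "abc", "col3": 100},
    15321: {"col2": "a", "col3": 7},
}

def lookup(key):
    """Binary-search the sorted leaf, then follow the RowID to the record."""
    keys = [k for k, _ in leaf_block]
    pos = bisect_left(keys, key)
    if pos == len(keys) or keys[pos] != key:
        return None                      # key is not in the index
    row_id = leaf_block[pos][1]
    return table_records.get(row_id)     # the second, random access

print(lookup(1))
```

The final `table_records.get` is the step that becomes a random disk read when the record's block is not cached.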

InnoDB: Clustered Index (index-organized table)

  SELECT * FROM tbl WHERE secondary_index1=1;

[Diagram: secondary index (Root -> Branch 1 -> Leaf 1) pointing into the clustered index / primary key (Root -> Branch 1 -> Leaf 1 holding the table records)]

Secondary index, Leaf 1:
  sec_ind1  PK
  1         10
  2         1
  3         3
  60        5

Clustered index (primary key), Leaf 1 with table records:
  PK   record
  1    col2='a',   col3='2008-10-31' ...
  2    col2='aaa', col3='2008-10-31' ...
  3    col2='aaa', col3='2008-10-31' ...
  10   col2='abc', col3='2008-10-31' ...

- On secondary indexes, entries are mapped to PK values
- Two-step index lookups are required for SELECT BY SECONDARY KEY
- Primary key lookup is very fast because it is single-step
- Only one random read is required for SELECT BY PRIMARY KEY
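The two-step lookup can be illustrated with the values on the slide. In this sketch plain dicts stand in for the two B+trees, and the function names and the "traversal count" return value are illustrative assumptions rather than anything InnoDB exposes.

```python
# Secondary index leaf from the slide: sec_ind1 -> primary key value.
secondary_index = {1: 10, 2: 1, 3: 3, 60: 5}

# Clustered index from the slide: primary key -> full row.
clustered_index = {
    1: {"col2": "a"},
    2: {"col2": "aaa"},
    3: {"col2": "aaa"},
    10: {"col2": "abc"},
}

def select_by_primary_key(pk):
    """One index traversal: the row lives in the clustered index leaf."""
    return clustered_index.get(pk), 1

def select_by_secondary_key(value):
    """Two traversals: secondary index to find the PK, then the clustered index."""
    pk = secondary_index.get(value)
    if pk is None:
        return None, 1
    row, _ = select_by_primary_key(pk)
    return row, 2

print(select_by_secondary_key(1))
print(select_by_primary_key(10))
```

Both calls return the same row; the secondary-key path simply pays for one more index traversal.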

Non-unique index scan

  SELECT * FROM message_table WHERE user_id=1;

[Diagram: Branch 1 (-60: Leaf 1, -120: Leaf 2) -> Leaf block 1 -> table records]

Leaf block 1:
  user_id  RowID
  1        5
  1        10000
  1        15321
  10       ...

Table records:
  5:     col2='aaa', col3=10
  10000: col2='abc', col3=100
  15321: col2='a', col3=7

- When three index entries meet the condition, three random reads might happen to get the table records
- Only one random read is needed for the leaf block because all entries are stored in the same block
- 1 disk i/o for the index leaf, 3 for the table records (1 + N random reads)

Range scan

  SELECT * FROM tbl WHERE key1 BETWEEN 1 AND 3;

[Diagram: Branch 1 (-60: Leaf 1, -120: Leaf 2) -> Leaf block 1 -> table records]

Leaf block 1:
  key1   RowID
  1      10000
  2      5
  3      15321
  60     431

Table records:
  5:     col2='aaa', col3=10
  10000: col2='abc', col3=100
  15321: col2='a', col3=7

- Keys 1..3 are stored in the same leaf block -> only one disk i/o to get the index entries
- The RowIDs are not contiguous, so the records are not stored sequentially -> a single disk read cannot fetch all records; 3 random disk reads might happen
- 1 disk i/o for the index leaf, 3 for the table records (1 + N random reads)

Bad index scan

  SELECT * FROM tbl WHERE key1 < 2000000

[Diagram: Branch 1 (-60: Leaf 1, -120: Leaf 2, -180: Leaf 3) -> Leaf block 1 -> table records]

Leaf block 1:
  key1   RowID
  1      10000
  2      5
  3      15321
  60     431

Table records:
  5:     col2='aaa', col3=10
  10000: col2='abc', col3=100
  15321: col2='a', col3=7

- When doing an index scan, index leaf blocks can be fetched by *sequential* reads, but *random* reads are required for the table records (very expensive)
- Normally MySQL does not choose this execution plan and does a full table scan instead (when it does pick the index scan, control the plan with IGNORE INDEX)
- Using SSD boosts random reads

Full table scan

  SELECT * FROM tbl WHERE key1 < 2000000

[Diagram: the index (Branch 1 -> Leaf blocks) is bypassed; the table records are read directly by a full table scan]

- A full table scan does sequential reads
- It reads more data than the bad index scan, but is faster because the number of i/o operations is much smaller:
  - A single block holds a number of records
  - InnoDB has a read-ahead feature: it reads an extent (64 contiguous blocks) at one time
  - MyISAM: reads read_buffer_size (128KB by default) at one time

Multi-Range Read (MRR: MySQL 6.0 feature)

  SELECT * FROM tbl WHERE key1 < 2000000

[Diagram: Branch 1 (-60: Leaf 1, -120: Leaf 2, -180: Leaf 3) -> Leaf block 1 -> sort by RowID -> table records]

Leaf block 1:
  key1   RowID
  1      10000
  2      5
  3      15321
  60     431

Table records:
  5:     col2='aaa', col3=10
  10000: col2='abc', col3=100
  15321: col2='a', col3=7

- Random read overheads (especially disk seeks) can be decreased
- Some records fitting in the same block can be read at once
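The effect of sorting by RowID can be seen with a small counting model. This is a sketch under loud assumptions, not the MRR implementation: a RowID maps to block `RowID // ROWS_PER_BLOCK`, only the most recently read block stays cached, and the RowIDs beyond the four on the slide (10050, 17, 15399) are made up for illustration.

```python
ROWS_PER_BLOCK = 100  # hypothetical number of records per table block

def block_reads(row_ids, rows_per_block=ROWS_PER_BLOCK):
    """Count block fetches when only the most recently read block is cached."""
    reads = 0
    current_block = None
    for row_id in row_ids:
        block = row_id // rows_per_block
        if block != current_block:
            reads += 1
            current_block = block
    return reads

# RowIDs in index-key order: the first four are from the slide, the rest are invented.
row_ids_in_key_order = [10000, 5, 15321, 431, 10050, 17, 15399]

unsorted_reads = block_reads(row_ids_in_key_order)
sorted_reads = block_reads(sorted(row_ids_in_key_order))
print(unsorted_reads, sorted_reads)
```

In key order every record triggers its own block fetch; after sorting, records that share a block are fetched together and the blocks are visited in ascending order, which also shortens seeks on HDD.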

Benchmarks: Full table scan vs Range scan

- Selecting 2 million records from a table containing 100 million records
  - Selecting 2% of the data
  - Suppose a daily/weekly batch job
- Query: SELECT * FROM tbl WHERE secondary_key < 2000000
  - Secondary key values were generated by rand()*100,000,000 (random order)
- Using InnoDB (built-in), 5.1.33, RHEL5.3 (2.6.18-128)
- 4GB index size, 13GB data (non-indexed) size
- innodb_buffer_pool_size = 5GB, using O_DIRECT
- Not all records fit in the buffer pool

Benchmarking points
- Full table scan (sequential access) vs index scan (random access)
  - Controlling the optimizer plan by FORCE/IGNORE INDEX
- HDD vs SSD
- Buffer pool size (tested with 5GB/10GB)

Benchmarks (1): Full table scan vs Index scan (HDD)

Index scan for 2mil rows vs full scan for 100mil rows
HDD: SAS 15000RPM, 2 disks, RAID1

  Plan                            Throughput
  Index scan (buffer pool=10G)    900 rows/s
  Index scan (buffer pool=5G)     297 rows/s
  Full table scan                 502,310 rows/s

[Chart: elapsed time per plan, x-axis 0 to 8000 seconds]

- Index scan is 10-30+ times slower even though it accesses just 2% of the records (highly dependent on the memory hit ratio)
- MySQL unfortunately chose the index scan in my case (check EXPLAIN, then use IGNORE INDEX)

Benchmarks (2): SSD vs HDD, index scan

Time to do 2,000,000 random reads
HDD: SAS 15000RPM, 2 disks, RAID1
SSD: SATA Intel X25-E, no RAID

  Storage (buffer pool)   Throughput
  SSD (10G)               13,418 rows/s
  SSD (5G)                4,614 rows/s
  HDD (10G)               900 rows/s
  HDD (5G)                297 rows/s

[Chart: elapsed time per configuration, x-axis 0 to 8000 seconds]

- SSD was 15 times faster than HDD
- Increasing RAM improves performance because the hit ratio is improved (5G -> 10G, 3 times faster on both SSD and HDD)

OS statistics

HDD, range scan
  # iostat -xm 1
        rrqm/s  wrqm/s  r/s      w/s   rMB/s  wMB/s  avgrq-sz  avgqu-sz  await  svctm  %util
  sdb   0.00    0.00    243.00   0.00  4.11   0.00   34.63     1.23      5.05   4.03   97.90

SSD, range scan
  # iostat -xm 1
        rrqm/s  wrqm/s  r/s      w/s   rMB/s  wMB/s  avgrq-sz  avgqu-sz  await  svctm  %util
  sdc   24.00   0.00    2972.00  0.00  53.34  0.00   36.76     0.72      0.24   0.22   66.70

4.11MB / 243.00 ~= 53.34MB / 2972.00 ~= 16KB (InnoDB block size)
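The per-request size on the slide can be re-derived from the iostat columns. The sketch below divides read throughput by reads per second, and cross-checks against avgrq-sz on the assumption that iostat reports it in 512-byte sectors; `avg_read_size_kb` is a hypothetical helper name.

```python
def avg_read_size_kb(read_mb_per_s: float, reads_per_s: float) -> float:
    """Average size of one read request in KB."""
    return read_mb_per_s * 1024 / reads_per_s

hdd_kb = avg_read_size_kb(4.11, 243.00)     # sdb row
ssd_kb = avg_read_size_kb(53.34, 2972.00)   # sdc row

# Cross-check with avgrq-sz, assumed to be in 512-byte sectors.
hdd_kb_from_sectors = 34.63 * 512 / 1024
ssd_kb_from_sectors = 36.76 * 512 / 1024

print(round(hdd_kb, 1), round(ssd_kb, 1))
print(round(hdd_kb_from_sectors, 1), round(ssd_kb_from_sectors, 1))
```

Both devices come out at roughly 17 to 18KB per request, close to the 16KB InnoDB block size the slide points to: almost every request is a single-page random read.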

Benchmarks (3): Full table scan vs Index scan (SSD)

Index scan for 2mil rows vs full scan for 100mil rows (SSD)
SSD: SATA Intel X25-E

  Plan                            Throughput
  Index scan (buffer pool=10G)    13,418 rows/s
  Index scan (buffer pool=5G)     4,614 rows/s
  Full table scan                 1,083,658 rows/s

[Chart: elapsed time per plan, x-axis 0 to 500 seconds]

- Index scan is 1.5-5 times slower when accessing 2% of the table (still highly dependent on the memory hit ratio)
- The gap is much smaller than on HDD
- Whether MySQL should choose a full table scan or an index scan therefore depends on the storage (HDD/SSD) and the buffer pool size
- FORCE/IGNORE INDEX is helpful
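The "10-30+ times" (HDD) and "1.5-5 times" (SSD) figures follow from the reported throughputs. The sketch below recomputes them; the rows/s values are the ones quoted on the benchmark slides, while the dictionary layout and names are only for illustration.

```python
INDEX_ROWS = 2_000_000      # rows fetched by the index scan
FULL_ROWS = 100_000_000     # rows read by the full table scan

# Throughput in rows/s as reported on the benchmark slides.
rows_per_second = {
    "HDD": {"index_10g": 900, "index_5g": 297, "full": 502_310},
    "SSD": {"index_10g": 13_418, "index_5g": 4_614, "full": 1_083_658},
}

slowdown = {}
for device, rate in rows_per_second.items():
    full_seconds = FULL_ROWS / rate["full"]
    slowdown[device] = {
        plan: (INDEX_ROWS / rate[plan]) / full_seconds
        for plan in ("index_10g", "index_5g")
    }

for device, ratios in slowdown.items():
    print(device, {plan: round(r, 1) for plan, r in ratios.items()})
```

On HDD the index scan takes about 11 to 34 times as long as the full scan; on SSD only about 1.6 to 4.7 times, which is why the better plan depends on the storage and the buffer pool.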

Benchmarks (4): SSD vs HDD, full scan

Full table scan for 100mil rows

  Storage   Throughput
  SSD       1,083,658 rows/s
  HDD       502,310 rows/s

[Chart: elapsed time per storage type, x-axis 0 to 250 seconds]

- A single SSD was two times faster than two HDDs

Benchmarks (5): Using MySQL 6.0 MRR

Time to do 2,000,000 random reads (HDD)

  Configuration      Throughput
  6.0 MRR (10G)      1376 rows/s
  5.1 range (10G)    900 rows/s
  6.0 MRR (5G)       465 rows/s
  5.1 range (5G)     297 rows/s

[Chart: elapsed time per configuration, x-axis 0 to 8000 seconds]

- 1.5 times faster, a nice effect
- No performance difference was found on SSD

Covering index, Multi column index, index merge

Covering Index (Index-only read)

  SELECT key1 FROM tbl WHERE key1 BETWEEN 1 AND 60;

[Diagram: Branch 1 (-60: Leaf 1, -120: Leaf 2) -> Leaf 1; the table records are not accessed]

Leaf 1:
  key1   RowID
  1      10000
  2      5
  3      15321
  60     431

- Some types of queries can be completed by reading only the index, without reading table records
- Very efficient for wide range queries because no random read happens
- All columns in the SQL statement (SELECT/WHERE/etc) must be contained within a single index

Covering Index

  > explain select count(ind) from t
             id: 1
    select_type: SIMPLE
          table: t
           type: index
  possible_keys: NULL
            key: ind
        key_len: 5
            ref: NULL
           rows: 100000181
          Extra: Using index

  mysql> select count(ind) from d;
  +------------+
  | count(ind) |
  +------------+
  |  100000000 |
  +------------+
  1 row in set (15.98 sec)

  > explain select count(c) from t
             id: 1
    select_type: SIMPLE
          table: t
           type: ALL
  possible_keys: NULL
            key: NULL
        key_len: NULL
            ref: NULL
           rows: 100000181
          Extra:

  mysql> select count(c) from d;
  +-----------+
  | count(c)  |
  +-----------+
  | 100000000 |
  +-----------+
  1 row in set (28.99 sec)

Multi column index

  SELECT * FROM tbl WHERE keypart1 = 2 AND keypart2 = 3

Leaf block 1:
  keypart1  keypart2  RowID
  1         5         10000
  2         1         5
  2         2         4
  2         3         15321
  3         1         100
  3         2         200
  3         3         300
  4         1         400

Table records:
  5:     col2='aaa', col3=10
  10000: col2='abc', col3=100
  15321: col2='a', col3=7

- One access for the leaf block, one access for the table records
- (If keypart2 were not part of the index, three random accesses would happen for the table records, one per entry with keypart1 = 2)

Index merge

  SELECT * FROM tbl WHERE key1 = 2 AND key2 = 3

Key1's leaf block:
  key1   RowID
  1      10000
  2      4
  2      537
  2      999
  3      100
  3      200
  3      300
  4      400

Key2's leaf block:
  key2   RowID
  1      10
  1      20
  1      30
  2      500
  3      100
  3      200
  3      300
  3      999

Merge: RowIDs 4, 537, 999 (key1 = 2) and 100, 200, 300, 999 (key2 = 3) -> 999

Table records:
  5:   col2='aaa', col3=10
  999: col2='a', col3=7

- Key1 and Key2 are two separate indexes
- One access for key1, one access for key2, merging 7 entries, one access on the data
- The more records matched, the more overhead is added
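The intersect step can be reproduced with the leaf entries from the slide. This is a simplified sketch: each index is a plain list of (key, RowID) pairs, and `row_ids_for` is a hypothetical helper, not MySQL's index_merge code.

```python
# Leaf entries from the slide as (key value, RowID) pairs.
key1_leaf = [(1, 10000), (2, 4), (2, 537), (2, 999),
             (3, 100), (3, 200), (3, 300), (4, 400)]
key2_leaf = [(1, 10), (1, 20), (1, 30), (2, 500),
             (3, 100), (3, 200), (3, 300), (3, 999)]

def row_ids_for(leaf, value):
    """RowIDs of all index entries whose key equals value."""
    return [row_id for key, row_id in leaf if key == value]

from_key1 = row_ids_for(key1_leaf, 2)   # key1 = 2
from_key2 = row_ids_for(key2_leaf, 3)   # key2 = 3

entries_merged = len(from_key1) + len(from_key2)
matching_row_ids = sorted(set(from_key1) & set(from_key2))

print(entries_merged, matching_row_ids)
```

Seven index entries are read and merged to find a single matching row, which is the overhead a multi-column index on both columns avoids.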

Case: index merge vs multi-column index

  mysql> SELECT count(*) as c FROM t WHERE t.type=8 AND t.member_id=10;
  +---+
  | 2 |
  +---+
  1 row in set (0.06 sec)

     type: index_merge
      key: idx_type,idx_member
  key_len: 2,5
     rows: 83
    Extra: Using intersect(idx_type,idx_member); Using where; Using index

  mysql> ALTER TABLE t ADD INDEX idx_type_member(type,member_id);

  mysql> SELECT count(*) as c FROM t WHERE t.type=8 AND t.member_id=10;
  +---+
  | 2 |
  +---+
  1 row in set (0.00 sec)

     type: ref
      key: idx_type_member
  key_len: 7
      ref: const,const
     rows: 1
    Extra: Using index

Cases when a multi-column index cannot be used

  SELECT * FROM tbl WHERE keypart2 = 3
  SELECT * FROM tbl WHERE keypart1 = 1 OR keypart2 = 3

Leaf 1:
  keypart1  keypart2  RowID
  1         5         10000
  2         1         5
  2         2         4
  2         3         15321
  3         1         100
  3         2         200
  3         3         300
  4         1         400

Table records (read by full table scan):
  5:     col2='aaa', col3=10
  10000: col2='abc', col3=100
  15321: col2='a', col3=7

- The first index column must be used
- Index Skip Scan is not supported in MySQL
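The "first index column must be used" rule can be written as a small check. This is a deliberately simplified model, not the MySQL optimizer: it only handles equality predicates joined by AND (the OR case on the slide is outside its scope), and `usable_prefix_length` is a hypothetical name.

```python
def usable_prefix_length(index_columns, equality_columns):
    """Number of leading index columns usable by AND-ed equality predicates."""
    used = 0
    for column in index_columns:
        if column not in equality_columns:
            break            # a gap ends the usable prefix
        used += 1
    return used

index_columns = ("keypart1", "keypart2")

print(usable_prefix_length(index_columns, {"keypart1", "keypart2"}))
print(usable_prefix_length(index_columns, {"keypart1"}))
print(usable_prefix_length(index_columns, {"keypart2"}))
```

A result of 0, as for the keypart2-only predicate, means the index cannot narrow the search at all.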

Range scan & multi-column index

  SELECT * FROM tbl WHERE keypart1 < '2009-03-31 20:00:00' AND keypart2 = 3;

Leaf 1, index on (keypart1, keypart2):
  keypart1              keypart2  RowID
  2009-03-29 01:11:11   5         10000
  2009-03-29 02:11:11   10        5
  2009-03-29 03:11:11   1         4
  2009-03-30 04:11:11   3         999
  2009-03-31 01:11:11   7         100
  2009-03-31 02:11:11   4         200
  2009-03-31 03:11:11   1         300
  2009-04-01 01:11:11   2         400

Leaf 1, index on (keypart2, keypart1):
  keypart2  keypart1              RowID
  1         2009-03-29 03:11:11   4
  1         2009-03-31 03:11:11   300
  2         2009-04-01 01:11:11   400
  3         2009-03-30 04:11:11   999
  3         2009-04-29 02:11:11   6
  4         2009-03-31 02:11:11   200
  5         2009-03-29 01:11:11   10000
  7         2009-03-31 01:11:11   100

Table records:
  5:     col2='aaa', col3=10
  10000: col2='abc', col3=100
  999:   col2='a', col3=7

- In an index on (keypart1, keypart2), entries are sorted by keypart1 first, so keypart2 values are ordered only within each keypart1 value
- If the cardinality of keypart1 is very high (common for DATETIME/TIMESTAMP) or keypart1 is used with a range condition, keypart2 is useless for narrowing records
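The difference between the two column orders can be counted directly. The sketch below uses the eight rows of the first leaf on the slide for both orderings and counts how many index entries each ordering has to examine; it is a toy model of the access pattern, with sorted tuples standing in for index leaves.

```python
# (keypart1, keypart2, RowID) rows from the first leaf on the slide.
rows = [
    ("2009-03-29 01:11:11", 5, 10000),
    ("2009-03-29 02:11:11", 10, 5),
    ("2009-03-29 03:11:11", 1, 4),
    ("2009-03-30 04:11:11", 3, 999),
    ("2009-03-31 01:11:11", 7, 100),
    ("2009-03-31 02:11:11", 4, 200),
    ("2009-03-31 03:11:11", 1, 300),
    ("2009-04-01 01:11:11", 2, 400),
]
CUTOFF = "2009-03-31 20:00:00"   # ISO timestamps compare correctly as strings

# Index on (keypart1, keypart2): the range on the leading column decides
# which entries are examined; keypart2 = 3 can only filter afterwards.
index_a = sorted(rows)
examined_a = [e for e in index_a if e[0] < CUTOFF]
matched_a = [e for e in examined_a if e[1] == 3]

# Index on (keypart2, keypart1): equality on the leading column first,
# then the range applies inside that small group.
index_b = sorted((k2, k1, rid) for k1, k2, rid in rows)
examined_b = [e for e in index_b if e[0] == 3 and e[1] < CUTOFF]

print(len(examined_a), len(matched_a), len(examined_b))
```

With the range column first, seven entries are examined to find one match; with the equality column first, only the matching entry is examined.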

Covering index & multi-column index

  SELECT a,b FROM tbl WHERE secondary_key < 100;

Leaf 1, index on (sec_key) only:
  sec_key  RowID
  1        4
  1        10000
  1        5
  1        15321
  1        100
  2        200
  3        300
  ..

Table records:
  5:     status=1, col2='aaa', col3=10
  10000: status=0, col2='abc', col3=100
  15321: keypart2=3, col2='a', col3=7

Leaf 1, index on (sec_key, a, b):
  sec_key  a           b   RowID
  1        2009-03-29  1   4
  1        2009-03-30  0   10000
  1        2009-03-31  1   5
  1        2009-04-01  1   15321
  1        2009-03-31  1   100
  2        2009-03-30  2   200
  3        2009-04-13  3   300
  ..

- {a, b} are not useful for narrowing records, but they make the index a covering index
- In InnoDB, the PK is included in secondary indexes

Back to the previous benchmark..

- Query: SELECT * FROM tbl WHERE secondary_key < 2000000
- Can be done by a covering index scan

  ALTER TABLE tbl ADD INDEX (secondary_key, a, b, c..);

Benchmarks: index range vs covering index range

Index scan for 2 mil rows (HDD)

  Plan                         Elapsed time
  Index range scan             1 hour 52 min 3.29 sec
  Covering index range scan    1 min 59.28 sec (less than 1 sec when fully cached)

[Chart: elapsed time per plan, x-axis 0 to 8000 seconds]

Warning: adding an additional index has side effects
- Lower write performance
- Bigger index size
- Lower cache efficiency

Speeding up Inserts

What happens when inserting

  INSERT INTO tbl (key1) VALUES (61)

Before: Leaf Block 1 is (almost) full
  key1   RowID
  1      10000
  2      5
  3      15321
  60     431

After: Leaf Block 1 is unchanged and a new block is allocated
  Leaf Block 2
  key1   RowID
  61     15322
  (empty)

Sequential order INSERT

  INSERT INTO tbl (key1) VALUES (current_date())

Before: Leaf Block 1
  key1         RowID
  2008-08-01   1
  2008-08-02   2
  2008-08-03   3
  2008-10-29   60

After: Leaf Block 1 is unchanged; the new entry goes into Leaf Block 2
  Leaf Block 2
  key1         RowID
  2008-10-29   61
  (empty)

- Some index entries are inserted in sequential order (e.g. auto_increment, current_datetime)
- Sequentially stored
- No fragmentation
- Small number of blocks, small size
- Highly recommended for the InnoDB PRIMARY KEY
- All entries are inserted into the last block, which stays cached in memory

Random order INSERT

  INSERT INTO message_table (user_id) VALUES (31)

Before: Leaf Block 1
  user_id  RowID
  1        10000
  2        5
  3        15321
  60       431

After: the block is split in two
  Leaf Block 1
  user_id  RowID
  1        10000
  30       333
  (empty)

  Leaf Block 2
  user_id  RowID
  31       345
  60       431
  (empty)

- Normally insert ordering is random (e.g. user_id on message_table)
- Fragmented
- Small number of entries per leaf block
- More blocks, bigger size, less cached
- Many leaf blocks are modified: less cached in memory
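The fragmentation effect can be reproduced with a toy leaf-block simulator. This is a sketch under stated assumptions, not InnoDB's or MyISAM's actual split policy: leaves hold at most `capacity` keys, an insert past the right edge of a full last leaf allocates a fresh leaf (as in the sequential slide), and any other insert into a full leaf splits it in half (as in the random slide). The capacity of 4 and the 1,000 keys are arbitrary.

```python
import random
from bisect import bisect_right, insort

def count_leaf_blocks(keys, capacity=4):
    """Insert keys one by one and return how many leaf blocks exist at the end."""
    leaves = []                               # ordered list of sorted leaf blocks
    for key in keys:
        if not leaves:
            leaves.append([key])
            continue
        lows = [leaf[0] for leaf in leaves]   # smallest key of each leaf
        i = max(bisect_right(lows, key) - 1, 0)
        leaf = leaves[i]
        if len(leaf) < capacity:
            insort(leaf, key)
        elif i == len(leaves) - 1 and key > leaf[-1]:
            leaves.append([key])              # right-edge insert: old leaf stays full
        else:
            insort(leaf, key)                 # full leaf: split it in half
            mid = len(leaf) // 2
            leaves[i:i + 1] = [leaf[:mid], leaf[mid:]]
    return len(leaves)

sequential_keys = list(range(1000))
random_keys = sequential_keys[:]
random.Random(42).shuffle(random_keys)        # fixed seed for repeatability

sequential_blocks = count_leaf_blocks(sequential_keys)
random_blocks = count_leaf_blocks(random_keys)
print(sequential_blocks, random_blocks)
```

Sequential inserts fill every leaf completely, while random inserts leave half-empty leaves behind after each split, so the same keys occupy more blocks and fewer of them fit in cache.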

Random order INSERT does read() for indexes

  INSERT INTO message (user_id) VALUES (31)

[Diagram: buffer pool in memory above the disk, with Leaf Block 1 (user_id/RowID: 1/10000, 2/5, 3/15321, 60/431)]

Steps:
1. Check whether the index blocks are cached
2. pread() (if not cached)
3. Modify the index blocks

- Index blocks must be in memory before index entries can be modified or inserted
- When the block is cached in the RDBMS buffer pool, pread() is not called; otherwise pread() is called
- Sequentially stored indexes (AUTO_INC, datetime, etc) usually do not suffer from this
- Increasing RAM size / using SSD helps to improve write performance

InnoDB feature: Insert Buffering

- If non-unique secondary index blocks are not in memory, InnoDB inserts the entries into a special buffer (the "insert buffer") to avoid random disk i/o operations
  - The insert buffer is allocated both in memory and in the InnoDB SYSTEM tablespace
- Periodically, the insert buffer is merged into the secondary index trees in the database ("merge")

Pros: reduced I/O overhead
- Reduces the number of disk i/o operations by merging i/o requests to the same block
- Optimized i/o: some random i/o operations can become sequential

Cons: additional operations are added
- Merging might take a very long time when many secondary indexes must be updated and many rows have been inserted. It may continue to happen after a server shutdown and restart
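The "merging i/o requests to the same block" benefit can be shown with a grouping sketch. This is not InnoDB's data structure: the mapping of a key to block `key // ENTRIES_PER_BLOCK`, the buffered key values, and the function name are all assumptions made for illustration.

```python
from collections import defaultdict

ENTRIES_PER_BLOCK = 100   # hypothetical number of index entries per leaf block

def group_by_target_block(buffered_keys, entries_per_block=ENTRIES_PER_BLOCK):
    """Group buffered index entries by the leaf block they belong to."""
    by_block = defaultdict(list)
    for key in buffered_keys:
        by_block[key // entries_per_block].append(key)
    return dict(by_block)

# Entries that arrived while their leaf blocks were not in memory (made-up values).
buffered_keys = [31, 870, 35, 12, 899, 47, 802]

merged = group_by_target_block(buffered_keys)
print(len(buffered_keys), "buffered entries ->", len(merged), "block reads at merge time")
```

Applied one at a time, the seven inserts could each need a random read; merged, they touch only two leaf blocks.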

Benchmarks: Insert performance

- Inserting hundreds of millions of records
  - Suppose a high-volume insert table (twitter messages, etc..)
- Checking the time to add one million records
- Having three secondary indexes
  - Random order inserts vs sequential order inserts
  - Random: INSERT .. VALUES (id, rand(), rand(), rand());
  - Sequential: INSERT .. VALUES (id, id, id, id)
  - Primary key index is AUTO_INCREMENT
- InnoDB vs MyISAM
  - InnoDB: buffer pool=5G, O_DIRECT, trx_commit=1
  - MyISAM: key buffer=2G, filesystem cache=5G
- Three indexes vs one index
- Changing buffer pool size
- 5.1 Partitioning or not

Benchmarks (1): Sequential order vs random order

Time to insert 1 million records (InnoDB, HDD)

[Chart: seconds (0 to 600) vs existing records in millions (1 to 145), one line each for sequential order and random order; annotations: 10,000 rows/s, 2,000 rows/s, "Index size exceeded buffer pool size"]

- Index size exceeded the InnoDB buffer pool size at 73 million records in the random order test
- Inserts gradually take more time because the buffer pool hit ratio gets worse (more random disk reads are needed)
- For sequential order inserts, insertion time did not change: no random reads/writes

Benchmarks (2): InnoDB vs MyISAM (HDD)

Time to insert 1 million records (HDD)

[Chart: seconds (0 to 5000) vs existing records in millions (1 to 145), one line each for InnoDB and MyISAM; annotation: 250 rows/s]

- MyISAM does not do any special i/o optimization like Insert Buffering, so a lot of random reads/writes happen and performance depends heavily on the OS
- Disk seek & rotation overhead is really serious on HDD

Benchmarks (3): MyISAM vs InnoDB (SSD)

Time to insert 1 million records (SSD)

[Chart: seconds (0 to 600) vs existing records in millions (1 to 103), one line each for InnoDB and MyISAM; annotations: 2,000 rows/s, 5,000 rows/s, "Index size exceeded buffer pool size", "Filesystem cache was fully used, disk reads began"]

- MyISAM got much faster just by replacing HDD with SSD!

Benchmarks (4): SSD vs HDD (MyISAM)

Time to insert 1 million records (MyISAM)

[Chart: seconds (0 to 6000) vs existing records in millions (1 to 99), one line each for MyISAM (SSD) and MyISAM (HDD)]

- MyISAM on SSD is much faster than on HDD
- No seek/rotation happens on SSD

Benchmarks (5): SSD vs HDD (InnoDB)

Time to insert 1 million records (InnoDB)

[Chart: seconds (0 to 600) vs existing records in millions (1 to 136), one line each for InnoDB (SSD) and InnoDB (HDD)]

- SSD is 10% or more faster
- The difference is not so big because InnoDB insert buffering is highly optimized for HDD
- Insert buffer merging completed about three times faster on SSD (SSD: 15 min / HDD: 42 min)

Benchmarks (6): Three indexes vs Single index

Time to insert 1 million records (InnoDB-HDD)

[Chart: seconds (0 to 600) vs existing records in millions (1 to 401), one line each for three indexes and one index; the points where index size exceeded buffer pool size are marked]

- With a single index, the index size was three times smaller, so it exceeded the buffer pool size much later
- With a single index, random i/o overhead was much smaller
- Common practice: do not create unneeded indexes

Benchmarks (7): Increasing RAM (InnoDB)

Time to insert 1 million records (InnoDB-HDD)

[Chart: seconds (0 to 600) vs existing records in millions (1 to 201), one line each for buffer pool=5G and buffer pool=10G]

- Increasing RAM (allocating more memory for the buffer pool) raises the break-even point
- Common practice: make index size smaller

Make index size smaller

[Diagram: a single big physical table (index) split into Partition 1, Partition 2, Partition 3, Partition 4]

- When all indexes (active/hot index blocks) fit in memory, inserts are fast
- Follow the best practices to decrease data size (optimal data types, etc)
- Sharding
- MySQL 5.1 range partitioning
  - Range partitioning, partitioned by a sequentially inserted column (e.g. auto_inc id, current_datetime)
  - Indexes are automatically partitioned
  - Only index blocks in the latest partition are *hot* if old entries are not accessed

Benchmarks (8): Using 5.1 Range Partitioning (InnoDB)

Time to insert 1 million records (InnoDB-HDD)

[Chart: seconds (0 to 600) vs existing records in millions (1 to 145), one line each for normal table and range-partitioned table]

  PARTITION BY RANGE(id) (
    PARTITION p1 VALUES LESS THAN (10000000),
    PARTITION p2 VALUES LESS THAN (20000000),
    .

- On insert-only tables (logging/auditing/timeline..), only the latest partition is updated
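How a row is routed to a partition under VALUES LESS THAN can be sketched with a binary search over the upper bounds. Only the two bounds visible on the slide are used; the slide truncates the rest of the definition, so ids beyond the last known bound return None here rather than guessing further partitions. `partition_for` is a hypothetical helper name.

```python
from bisect import bisect_right

# Upper bounds (exclusive) of the partitions shown on the slide; the list is truncated there.
partition_names = ["p1", "p2"]
partition_bounds = [10_000_000, 20_000_000]

def partition_for(row_id):
    """Name of the partition whose VALUES LESS THAN bound covers row_id, if known."""
    i = bisect_right(partition_bounds, row_id)
    return partition_names[i] if i < len(partition_names) else None

print(partition_for(9_999_999), partition_for(10_000_000), partition_for(19_999_999))
```

With an auto-increment id, consecutive inserts all land in the same (latest) partition, so only that partition's index blocks need to stay in memory.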

Benchmarks (9): Using 5.1 Range Partitioning (MyISAM)

Time to insert 1 million records (MyISAM-HDD)

[Chart: seconds (0 to 5000) vs existing records in millions (1 to 181), one line each for normal table and range-partitioned table]

- Random read/write overhead is small for small indexes
  - No random reads when the index fits in memory
  - Less seek overhead for small indexes

Benchmarks (10): Linux I/O Scheduler (MyISAM-HDD)

Time to insert 1 million records (HDD)

[Chart: seconds (0 to 5000) vs existing records in millions (1 to 77), one line each for queue size=100000 and queue size=128 (default)]

- MyISAM does not have a mechanism like the insert buffer in InnoDB
  - I/O optimization depends heavily on the OS and storage
- The Linux I/O scheduler has an i/o request queue
  - Requests in the queue are sorted to process i/o effectively
- Queue size is configurable

  # echo 100000 > /sys/block/sdx/queue/nr_requests
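Why sorting queued requests helps on HDD can be seen with a head-movement count. This is a simplified elevator-style model, not the Linux scheduler's algorithm: block numbers stand in for disk positions, the head starts at block 0, and the queue contents are made up.

```python
def total_seek_distance(block_numbers, start=0):
    """Sum of head movements when blocks are served in the given order."""
    position = start
    distance = 0
    for block in block_numbers:
        distance += abs(block - position)
        position = block
    return distance

# Pending requests in arrival order (illustrative values).
queue = [900, 10, 500, 20, 950, 30]

fifo_distance = total_seek_distance(queue)
sorted_distance = total_seek_distance(sorted(queue))
print(fifo_distance, sorted_distance)
```

A larger queue gives the scheduler more requests to sort at once, so more of them can be served in a single sweep.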

Benchmarks (11): Linux I/O Scheduler (MyISAM-SSD)

Time to insert 1 million records (SSD)

[Chart: seconds (0 to 400) vs existing records in millions (1 to 97), one line each for queue size=100000 and queue size=128 (default)]

- No big difference
- Sorting i/o requests decreases seek & rotation overhead on HDD, but is not needed on SSD

Summary

- Minimizing the number of random disk accesses is very important to boost index performance
  - Increasing RAM boosts both SELECT & INSERT performance
  - SSD can handle 10 times or more random reads compared to HDD
- Utilize common & MySQL-specific indexing techniques
  - Do not create unneeded indexes (write performance goes down)
  - Covering index for range scans
  - Multi-column index with covering index
- Check query execution plans
  - Control execution plans by FORCE/IGNORE INDEX if needed
  - Query Analyzer is very helpful for analyzing
- Create small-sized indexes
  - Fitting in memory is very important for random inserts
  - MySQL 5.1 range partitioning helps in some cases
- MyISAM is slow for inserting into large tables (on HDD) when using indexes and inserting in random order
  - MyISAM for logging/auditing tables is not always good
  - Using SSD boosts performance

Enjoy the conference!