1 A Performance Puzzle: B-Tree Insertions are Slow on SSDs or What Is a Performance Model for SSDs?
Bradley C. Kuszmaul, MIT CSAIL & Tokutek. HPTS 2009.
[Figure: iibench SSD insert test. Rows/second versus rows inserted for InnoDB and TokuDB 1.1.3, each on an Intel SSD and on RAID1.]
Motivation: I want to understand SSD performance so I can design fast data structures.
2 Poor MySQL B-Tree SSD Performance?
[Figure: iibench SSD insert test. Rows/second versus rows inserted for InnoDB and TokuDB 1.1.3, each on an Intel SSD and on RAID1.]
Surprisingly, disk is almost as good as SSD.
InnoDB's insertion buffer helps.
Not CPU bound.
3 Intel X25-E Specifications
Read bandwidth: up to 25 MB/s.
Write bandwidth: up to 17 MB/s.
Random 4KB read rate: 35 KIO/s.
Random 4KB write rate: 3.5 KIO/s.
Disk (5-disk RAID):
Read/write bandwidth: about 4 MB/s.
Random read/write rate: 6/s (with the wind at your back).
So why isn't the SSD giving InnoDB a 6x performance boost?
4 MySQL Complex: Measure Berkeley DB
[Figure: Berkeley DB insertions per second, with the points marked where the data falls out of the BDB cache (into the OS cache) and out of the OS cache.]
Trending to 45 writes per second (still dropping...)
5 Berkeley DB Too Complex: Try File I/O
Method:
Build a 12GB file on a machine with 3GB RAM.
Perform random reads and writes of various sizes.
Build a performance model.
Still strange and unpredictable.
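The measurement method above can be sketched roughly as follows. This is a minimal illustration and not the benchmark used in the talk: the helper names `make_test_file` and `random_read_rate` are hypothetical, and the demo file is tiny, so it will be served from the OS cache. For a meaningful measurement the file must be much larger than RAM, as in the method above. It relies on `os.pread`, which is available on Unix-like systems.

```python
import os
import random
import tempfile
import time


def make_test_file(size_bytes: int) -> str:
    """Create a temporary file of random bytes and return its path."""
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(os.urandom(size_bytes))
        return f.name


def random_read_rate(path: str, block_size: int, count: int, seed: int = 0) -> float:
    """Issue `count` block-aligned random reads and return reads per second."""
    rng = random.Random(seed)
    num_blocks = os.path.getsize(path) // block_size
    if num_blocks < 1:
        raise ValueError("file is smaller than one block")
    fd = os.open(path, os.O_RDONLY)
    try:
        start = time.perf_counter()
        for _ in range(count):
            offset = rng.randrange(num_blocks) * block_size
            os.pread(fd, block_size, offset)
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    # Guard against a zero reading from a coarse timer.
    return count / max(elapsed, 1e-9)


if __name__ == "__main__":
    # Demo on a 1 MiB file; far too small to defeat the OS cache.
    demo_path = make_test_file(1 << 20)
    try:
        for lg in (9, 12, 16):
            rate = random_read_rate(demo_path, 1 << lg, count=1000)
            print(f"lg(block size) = {lg}: {rate:.0f} reads/s")
    finally:
        os.remove(demo_path)
```

Random writes could be timed the same way with `os.pwrite` on a descriptor opened for writing.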
6 A Performance Model
Can I make the following performance model work?
When reading a block of size B:
There is a startup cost, S ("seek time").
There is bandwidth, W ("transfer rate").
The simple model is thus
T_R = S_R + B/W_R
T_W = S_W + B/W_W
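To see what the simple model predicts, here is a small sketch that evaluates T = S + B/W and the resulting effective bandwidth B/T over a range of block sizes. The function names and the device parameters (S = 100 microseconds, W = 100 MB/s) are illustrative assumptions, not measurements from these slides.

```python
def transfer_time(block_bytes: float, startup_s: float, bandwidth_bps: float) -> float:
    """Model time in seconds to move one block: T = S + B/W."""
    return startup_s + block_bytes / bandwidth_bps


def effective_bandwidth(block_bytes: float, startup_s: float, bandwidth_bps: float) -> float:
    """Bytes per second actually achieved at this block size: B / T."""
    return block_bytes / transfer_time(block_bytes, startup_s, bandwidth_bps)


if __name__ == "__main__":
    S = 100e-6  # hypothetical startup cost, seconds
    W = 100e6   # hypothetical peak bandwidth, bytes/second
    for lg in range(8, 24, 2):
        B = 2 ** lg
        mbps = effective_bandwidth(B, S, W) / 1e6
        print(f"lg(block size) = {lg:2d}: {mbps:6.1f} MB/s")
```

Under this model small blocks are dominated by S and large blocks approach W, which is the shape the measured curves are compared against.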
7 Read Performance as a Function of Block Size
[Figure: read bandwidth (MB/s) versus lg(block size).]
8 A Model from the Data Sheet?
The manufacturer's 35 KIO/s and 25 MB/s suggest this (poor) model:
T_B = 16µs + B/(25 MB/s).
[Figure: read bandwidth (MB/s) versus lg(block size), with the model overlaid.]
9 A Model from the Data Sheet?
The bandwidth looks good, but I never saw anything like 35 KIO/s on any workload.
Actual read performance is about 1, IO/s:
T_B = 2µs + B/(25 MB/s).
[Figure: read bandwidth (MB/s) versus lg(block size), with the model overlaid.]
10 Small Blocks: A Little Misleading
For block sizes of less than 4096 bytes, the OS first does a read (of 4KB) and then a write.
[Figure: bandwidth (MB/s) versus lg(block size).]
Surprisingly, this doesn't affect the curve much.
11 Up to 1, reads/s with Multithreading
[Figure: 4KB reads/s versus number of threads. Fit: 1/s + 18/s/thread, max=11,]
12 Write Performance
[Figure: write bandwidth (MB/s) versus lg(block size).]
Spec sheet: 3µs/write + 17 MB/s.
Actual: 1µs/write + 1 MB/s.
13 Read-Write Performance
Mixing reads and writes gives the worst of both.
         startup (S)   bandwidth (W)
read     2µs           25MB/s
write    1µs           1MB/s
mixed    2µs           1MB/s
14 What Block Size To Use?
For point queries, B-trees are insensitive to block size. As soon as you have any reasonable fanout you do well.
For range queries, the block size is important.
Tension:
Large block sizes make range queries faster.
Large block sizes make point queries slower.
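One way to see why point queries are insensitive to block size is to count the levels a point query must traverse as the block size grows. The sketch below is a simplified illustration: it assumes a fixed entry size and completely full nodes, and the name `btree_levels` is hypothetical.

```python
def btree_levels(num_keys: int, block_bytes: int, entry_bytes: int) -> int:
    """Levels needed to index num_keys when every node holds block_bytes // entry_bytes entries."""
    fanout = block_bytes // entry_bytes
    if fanout < 2:
        raise ValueError("block must hold at least two entries")
    levels = 1
    capacity = fanout  # keys reachable with the current number of levels
    while capacity < num_keys:
        capacity *= fanout
        levels += 1
    return levels


if __name__ == "__main__":
    # Hypothetical workload: one billion keys, 16-byte entries.
    for lg in (12, 16, 20):
        print(f"lg(block size) = {lg}: {btree_levels(10**9, 1 << lg, 16)} levels")
```

In this example a 256-fold increase in block size removes only two levels, while each block read transfers 256 times as much data, which is the tension described above.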
15 Half-Power Point
Idea: Set the block size so that half the time is accounted for by the seek time, and half by bandwidth.
Half-power point   SSD     Rotating disk
read               5KB     .5MB 1MB
write              1KB     .5MB 1MB
read/write mix     21KB    .5MB 1MB
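Under the simple model T = S + B/W, the half-power point is where the startup cost equals the transfer time, S = B/W, which gives B = S·W. A minimal sketch follows; the function name and the example parameters (S = 1 ms, W = 100 MB/s) are assumptions chosen for illustration, not values taken from the table.

```python
def half_power_block_size(startup_s: float, bandwidth_bps: float) -> float:
    """Block size in bytes at which startup time equals transfer time: B = S * W."""
    return startup_s * bandwidth_bps


if __name__ == "__main__":
    S = 1e-3   # hypothetical startup cost, seconds
    W = 100e6  # hypothetical bandwidth, bytes/second
    B = half_power_block_size(S, W)
    print(f"half-power block size: {B / 1e3:.0f} KB")
    # At this block size, half of the total time S + B/W is startup.
    print(f"startup share of total time: {S / (S + B / W):.2f}")
```

At the half-power point the device delivers half of its peak bandwidth, so larger blocks yield diminishing returns for range queries.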
16 Cache-Oblivious Approach
Use data structures that are fast for any block size.
Can also speed insertions without slowing searches.
[Figure: iibench SSD insert test. Rows/second versus rows inserted for InnoDB and TokuDB 1.1.3, each on an Intel SSD and on RAID1.]
Tokutek's MySQL storage engine uses these ideas.