A Performance Puzzle: B-Tree Insertions are Slow on SSDs or What Is a Performance Model for SSDs?

Similar documents
How TokuDB Fractal TreeTM. Indexes Work. Bradley C. Kuszmaul. MySQL UC 2010 How Fractal Trees Work 1

How TokuDB Fractal TreeTM. Indexes Work. Bradley C. Kuszmaul. Guest Lecture in MIT Performance Engineering, 18 November 2010.

Databases & External Memory Indexes, Write Optimization, and Crypto-searches

NoVA MySQL October Meetup. Tim Callaghan VP/Engineering, Tokutek

Write-Optimization in B-Trees. Bradley C. Kuszmaul MIT & Tokutek

Hard Disk Drives. Nima Honarmand (Based on slides by Prof. Andrea Arpaci-Dusseau)

The TokuFS Streaming File System

Indexing Big Data. Big data problem. Michael A. Bender. data ingestion. answers. oy vey ??? ????????? query processor. data indexing.

L7: Performance. Frans Kaashoek Spring 2013

CSCI-GA Database Systems Lecture 8: Physical Schema: Storage

Main Memory and the CPU Cache

COS 318: Operating Systems. Storage Devices. Vivek Pai Computer Science Department Princeton University

The Right Read Optimization is Actually Write Optimization. Leif Walsh

Moneta: A High-Performance Storage Architecture for Next-generation, Non-volatile Memories

STORING DATA: DISK AND FILES

MySQL Performance Optimization and Troubleshooting with PMM. Peter Zaitsev, CEO, Percona Percona Technical Webinars 9 May 2018

CSE 451: Operating Systems Spring Module 12 Secondary Storage. Steve Gribble

MySQL Database Scalability

Sizing & Quotation for Sangfor HCI Technical Training

EE 457 Unit 7b. Main Memory Organization

MySQL Performance Optimization and Troubleshooting with PMM. Peter Zaitsev, CEO, Percona

COS 318: Operating Systems. Storage Devices. Kai Li Computer Science Department Princeton University

Database Architecture 2 & Storage. Instructor: Matei Zaharia cs245.stanford.edu

TokuDB vs RocksDB. What to choose between two write-optimized DB engines supported by Percona. George O. Lorch III Vlad Lesin

Mastering the art of indexing

How To Rock with MyRocks. Vadim Tkachenko CTO, Percona Webinar, Jan

CSE 451: Operating Systems Spring Module 12 Secondary Storage

Sarah L. Harris and David Money Harris. Digital Design and Computer Architecture: ARM Edition Chapter 8 <1>

CS429: Computer Organization and Architecture

High Performance SSD & Benefit for Server Application

CS429: Computer Organization and Architecture

Recovering Disk Storage Metrics from low level Trace events

ASN Configuration Best Practices

The Care and Feeding of a MySQL Database for Linux Adminstrators. Dave Stokes MySQL Community Manager

Moneta: A High-performance Storage Array Architecture for Nextgeneration, Micro 2010

Pharmacy college.. Assist.Prof. Dr. Abdullah A. Abdullah

Record Placement Based on Data Skew Using Solid State Drives

SSD/Flash for Modern Databases. Peter Zaitsev, CEO, Percona October 08, 2014 Percona Technical Webinars

IBM DS8870 Release 7.0 Performance Update

Chapter 4 File Systems. Tanenbaum, Modern Operating Systems 3 e, (c) 2008 Prentice-Hall, Inc. All rights reserved

COS 318: Operating Systems. Storage Devices. Jaswinder Pal Singh Computer Science Department Princeton University

Operating Systems Design Exam 2 Review: Spring 2011

Robert Gottstein, Ilia Petrov, Guillermo G. Almeida, Todor Ivanov, Alex Buchmann

Flash Trends: Challenges and Future

Warehouse- Scale Computing and the BDAS Stack

Virtual Memory. Reading. Sections 5.4, 5.5, 5.6, 5.8, 5.10 (2) Lecture notes from MKP and S. Yalamanchili

CS 416: Opera-ng Systems Design March 23, 2012

Using Synology SSD Technology to Enhance System Performance Synology Inc.

u Covered: l Management of CPU & concurrency l Management of main memory & virtual memory u Currently --- Management of I/O devices

Performance Modeling and Analysis of Flash based Storage Devices

EECS750: Advanced Operating Systems. 2/24/2014 Heechul Yun

COLUMN-STORES VS. ROW-STORES: HOW DIFFERENT ARE THEY REALLY? DANIEL J. ABADI (YALE) SAMUEL R. MADDEN (MIT) NABIL HACHEM (AVANTGARDE)

Third Midterm Exam April 24, 2017 CS162 Operating Systems

I/O CANNOT BE IGNORED

Column Stores vs. Row Stores How Different Are They Really?

The TokuFS Streaming File System

Extreme Storage Performance with exflash DIMM and AMPS

Random-Access Memory (RAM) CS429: Computer Organization and Architecture. SRAM and DRAM. Flash / RAM Summary. Storage Technologies

Princeton University. Computer Science 217: Introduction to Programming Systems. The Memory/Storage Hierarchy and Virtual Memory

Benchmarking Enterprise SSDs

Algorithm Performance Factors. Memory Performance of Algorithms. Processor-Memory Performance Gap. Moore s Law. Program Model of Memory I

The University of Adelaide, School of Computer Science 13 September 2018

Record Placement Based on Data Skew Using Solid State Drives

Managing Array of SSDs When the Storage Device is No Longer the Performance Bottleneck

SSD/Flash for Modern Databases. Peter Zaitsev, CEO, Percona November 1, 2014 Highload Moscow,Russia

I/O CANNOT BE IGNORED

A New Metric for Analyzing Storage System Performance Under Varied Workloads

Performance of popular open source databases for HEP related computing problems

Comparing Performance of Solid State Devices and Mechanical Disks

GPUs and Emerging Architectures

Advanced Database Systems

This Unit: Main Memory. Building a Memory System. First Memory System Design. An Example Memory System

Name: Instructions. Problem 1 : Short answer. [56 points] CMU Storage Systems 25 Feb 2009 Spring 2009 Exam 1

Computer Engineering II Solution to Exercise Sheet Chapter 6

ibench: Quantifying Interference in Datacenter Applications

Architecting For Availability, Performance & Networking With ScaleIO

Quiz for Chapter 6 Storage and Other I/O Topics 3.10

Storage Management 1

MySQL Performance Improvements

System Characteristics

UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering. Computer Architecture ECE 568

Oracle Performance on M5000 with F20 Flash Cache. Benchmark Report September 2011

RocksDB Key-Value Store Optimized For Flash

88X + PERFORMANCE GAINS USING IBM DB2 WITH BLU ACCELERATION ON INTEL TECHNOLOGY

Data Storage and Query Answering. Data Storage and Disk Structure (2)

Readings. Storage Hierarchy III: I/O System. I/O (Disk) Performance. I/O Device Characteristics. often boring, but still quite important

Non-Blocking Writes to Files

CS 261 Fall Mike Lam, Professor. Memory

ZBD: Using Transparent Compression at the Block Level to Increase Storage Space Efficiency

Reduced Instruction Set Computers

Using Transparent Compression to Improve SSD-based I/O Caches

Voldemort. Smruti R. Sarangi. Department of Computer Science Indian Institute of Technology New Delhi, India. Overview Design Evaluation

Choosing Hardware and Operating Systems for MySQL. Apr 15, 2009 O'Reilly MySQL Conference and Expo Santa Clara,CA by Peter Zaitsev, Percona Inc

Normalization, Generated Keys, Disks

COMP375 Practice Final Exam

Secondary storage. CS 537 Lecture 11 Secondary Storage. Disk trends. Another trip down memory lane

Moving Weather Model Ensembles To A Geospatial Database For The Aviation Weather Center

NEC Express5800 A2040b 22TB Data Warehouse Fast Track. Reference Architecture with SW mirrored HGST FlashMAX III

A Case Study in Optimizing GNU Radio s ATSC Flowgraph

Chapter 8 Virtual Memory

Transcription:

1 A Performance Puzzle: B-Tree Insertions are Slow on SSDs or What Is a Performance Model for SSDs? Bradley C. Kuszmaul MIT CSAIL, & Tokutek 3 iibench - SSD Insert Test 25 2 Rows/Second 15 1 5 2,, 4,, 6,, 8,, 1,, 12,, 14,, Rows Inserted InnoDB - Intel SSD TokuDB 1.1.3 - Intel SSD InnoDB - RAID1 TokuDB 1.1.3 - RAID1 Motivation: I want to understand SSD performance so I can design fast data structures. HPTS 29

2 Poor MYSQL B-Tree SSD Performance? 3 iibench - SSD Insert Test 25 2 Rows/Second 15 1 5 2,, 4,, 6,, 8,, 1,, 12,, 14,, Rows Inserted InnoDB - Intel SSD TokuDB 1.1.3 - Intel SSD InnoDB - RAID1 TokuDB 1.1.3 - RAID1 Surprisingly, disk almost as good as SSD. InnoDB s insertion buffer helps. Not CPU bound.

3 Intel X25E Specifications Read bandwidth up to 25 MB/s. Write bandwidth up to 17 MB/s. Random 4KB read rate: 35 KIO/s. Random 4KB write reate: 3.5 KIO/s. Disk bandwidth (5 disk RAID): Read/Write bandwidth about 4 MB/s. Random Read/Write rate: 6/s (with the wind at your back.) So why isn t the SSD giving InnoDB a 6x performance boost?

4 MySQL Complex Measure Berkeley DB 1e+6 Insertions per Second 1 1 Out of BDB cache (into OS cache) Out of OS cache 1 5 1 15 2 25 3 35 4 45 5 Trending to 45 writes per second (still dropping...)

5 Berkeley DB too complex Try File I/O Method: Build a 12GB file on a machine with 3GB RAM. Perform random reads and writes of various sizes. Build a performance model. Still strange and unpredictable.

6 A Performance Model Can I make the following performance model work? When reading a block of size B, There is a startup cost, S, ( seek time ) There is bandwidth, W, ( transfer rate ). The simple model is thus T R = S R + B/W R T W = S W + B/W W

7 Read Performance as a Function of Block Size 3 25 Bandwidth (MB/s) 2 15 1 5 8 1 12 14 16 18 2 22 lg(block Size)

8 A Model from the Data Sheet? The Manufacturer s 35 KIO/s and 25 MB/s suggests this (poor) model: T B = 16µs + B/(25MB/s). 3 25 Bandwidth (MB/s) 2 15 1 5 8 1 12 14 16 18 2 22 lg(block Size)

A Model from the Data Sheet? The bandwidth looks good, but I never saw anything like 35, IO/s on any workload. Actual read performance is about 1, IO/s: T B = 2µs + B/(25MB/s). 3 25 Bandwidth (MB/s) 2 15 1 5 8 1 12 14 16 18 2 22 lg(block Size) 9

1 Small Blocks A Little Misleading For block sizes of less than 496, the OS first does a read (of 4KB) and then a write. 3 25 Bandwidth (MB/s) 2 15 1 5 8 1 12 14 16 18 2 22 lg(block Size) Surprisingly, this doesn t affect the curve much.

11 Up to 1, reads/s with Multithreading 12 1 4KB reads/s 8 6 4 1/s + 18/s/thread, max=11, 2 5 1 15 2 25 3 35 Number of threads

12 Write Performance 18 16 14 Spec sheet: 3us/write + 17MB/s Bandwidth (MB/s) 12 1 8 6 4 2 Actual: 1us/write + 1MB/s 8 1 12 14 16 18 2 22 lg(block Size)

13 Read-Write Performance Mixing reads and writes gives the worst of both. startup (S) bandwidth (W) read 2µs 25MB/s write 1µs 1MB/s mixed 2µs 1MB/s

14 What Block Size To Use? For point-queries, B-trees are insensitive to block size. As soon as you have any reasonable fanout you do well. For range queries, the block size is important. Tension: Large block sizes make range queries faster. Large block sizes make point queries slower.

15 Half-power point Idea: Set block size so that half the time is accounted for the seek time, and half the time by bandwidth. Half Power Point SSD Rotating Disk read 5KB.5MB 1MB write 1KB.5MB 1MB read/write mix 21KB.5MB 1MB

16 Cache-Oblivious Approach Use data structures that are fast for any block size. Can also speed insertions without slowing searches. 3 iibench - SSD Insert Test 25 2 Rows/Second 15 1 5 2,, 4,, 6,, 8,, 1,, 12,, 14,, Rows Inserted InnoDB - Intel SSD TokuDB 1.1.3 - Intel SSD InnoDB - RAID1 TokuDB 1.1.3 - RAID1 Tokutek s MySQL storage engine uses these ideas.