Ceph in a Flash: Micron's Adventures in All-Flash Ceph Storage
Ryan Meredith & Brad Spiers, Micron Principal Solutions Engineer and Architect
©2017 Micron Technology, Inc. All rights reserved. Information, products, and/or specifications are subject to change without notice.
Solve the Storage Optimization Puzzle with Micron
- We've Done the Tuning for You: Consider Micron-Powered Ceph Architectures
- Discuss How Your Workloads Could Benefit From 3D XPoint or Persistent Memory
- When It's Time: How SSDs Might Even Be Best for Archive
Micron Storage Solutions Engineering (Austin, TX)
BFL: the Big Fancy Lab
Real-world application performance testing using Micron storage & DRAM:
- Ceph, vSAN, Storage Spaces
- Hadoop, Spark
- MySQL, MSSQL, PostgreSQL
- Cassandra, MongoDB
Performance Comparison of Micron-Powered Ceph Architectures
Micron-Powered Ceph Architectures
PERFORMANCE COMPARISON

| | Micron SATA PoC | Micron SAS+SATA PoC | Micron NVMe RA |
|---|---|---|---|
| Completion date | March 2016 | July 2016 | April 2017 |
| # of storage nodes | 8 | 10 | 4 |
| # of drives/node | 10x 800GB Micron M510DC | 2x 800GB Micron S650DC (SAS) + 10x 800GB Micron M510DC | 10x 2.4TB Micron 9100MAX NVMe SSD |
| Raw capacity/node | 8TB | 8TB | 24TB |
| CPU | 2x Intel Xeon E5-2690 v3 | 2x Intel Xeon E5-2690 v4 | 2x Intel Xeon E5-2699 v4 |
| RAM | 256GB | 256GB | 256GB |
| Network | Mellanox 40GbE | Mellanox 40GbE | Mellanox 50GbE |
| OS | Ubuntu 14.04 | RHEL 7.2 | RHEL 7.3 |
| Ceph version | Ceph Hammer 0.94.5 | Red Hat Ceph Storage 1.3.2 (Hammer 0.94.5) | Red Hat Ceph Storage 2.1 (Jewel 10.2.3) |
Micron-Powered Ceph Architectures
PERFORMANCE COMPARISON
RBD FIO 4KB random performance per storage node:
- 4KB random read: 125K IOPS (Micron Ceph 0.94.5 SATA), 199K IOPS (Micron RH Ceph 1.3 SAS+SATA), 287K IOPS (Micron RH Ceph 2.1 9100MAX NVMe)
- 4KB random write: 10K IOPS (Micron Ceph 0.94.5 SATA), 23K IOPS (Micron RH Ceph 1.3 SAS+SATA), 60K IOPS (Micron RH Ceph 2.1 9100MAX NVMe)
Micron + Red Hat + Supermicro All-NVMe Ceph Reference Architecture
Hardware Configuration
MICRON + RED HAT + SUPERMICRO ALL-NVMe CEPH RA
Storage nodes (x4): Supermicro Ultra Server SYS-1028U-TN10RT+
- 2x Intel Xeon E5-2699 v4 (22 cores each)
- 256GB DDR4-2400 DRAM (8x 32GB)
- 2x Mellanox 50GbE single-port NICs
- 10x Micron 2.4TB 9100MAX NVMe SSDs
Monitor nodes (x3): Supermicro SYS-1028U-TNRT+ (1U)
Network: 2x Supermicro 100GbE 32-port switches (SSE-C3632SR)
Software Configuration
MICRON + RED HAT + SUPERMICRO ALL-NVMe CEPH RA
- Storage + monitor nodes: Red Hat Ceph Storage 2.1 (Jewel 10.2.3) on Red Hat Enterprise Linux 7.3, Mellanox OFED driver 3.4.2
- Switch OS: Cumulus Linux 3.1.2
- Deployment tool: ceph-ansible
Performance Testing Methodology
MICRON + RED HAT + SUPERMICRO ALL-NVMe CEPH RA
- FIO RBD for block tests; RADOS Bench for object tests
- 12x Supermicro SYS-2028U load generators with Mellanox ConnectX-4 40GbE networking
- Tests kicked off on multiple clients simultaneously
- 15-minute test runs x 3; averages recorded
- 5TB of data on a 2x replicated pool (10TB total data)
Drive Scaling
TESTED IN REFERENCE ARCHITECTURE
Drives were scaled up to determine the performance sweet spot: a balance among CPU utilization, network, storage, and Ceph itself. 8+ OSD processes per node are necessary to fully utilize 2x E5-2699 v4 CPUs; the arithmetic is sketched below.

| | 2 drives/node | 4 drives/node | 10 drives/node |
|---|---|---|---|
| Total # of drives | 8 | 16 | 40 |
| Total raw capacity | 19.2TB | 38.4TB | 96TB |
| # of OSDs per drive | 4 | 2 | 1 |
| Total # of OSDs | 32 | 32 | 40 |
FIO RBD 4KB Random Read Performance
MICRON + RED HAT + SUPERMICRO ALL-NVMe CEPH RA
4 & 10 drives/node: CPU limited. 2 drives/node: drive & CPU limited.

| Configuration | 4KB random read IOPS | Average latency |
|---|---|---|
| 19.2TB | 745K | 2.1ms |
| 38.4TB | 1.13M | 1.38ms |
| 96TB | 1.15M | 1.11ms |
FIO RBD 4KB Random Write Performance
MICRON + RED HAT + SUPERMICRO ALL-NVMe CEPH RA
4 & 10 drives/node: CPU limited. 2 drives/node: drive limited.

| Configuration | 4KB random write IOPS | Average latency |
|---|---|---|
| 19.2TB | 163K | ~6ms |
| 38.4TB | 240K | ~5ms |
| 96TB | 242K | ~5ms |
RADOS Bench 4MB Object Read Performance
MICRON + RED HAT + SUPERMICRO ALL-NVMe CEPH RA
Object read is network limited.

| Configuration | 4MB object read throughput | Average latency |
|---|---|---|
| 19.2TB | 20.7 GB/s (166 Gbps) | 37ms |
| 38.4TB | 21.2 GB/s (170 Gbps) | 36ms |
| 96TB | 21.8 GB/s (174 Gbps) | 35ms |
RADOS Bench 4MB Object Write Performance
MICRON + RED HAT + SUPERMICRO ALL-NVMe CEPH RA
Object write is drive limited.

| Configuration | 4MB object write throughput | Average latency |
|---|---|---|
| 19.2TB | 1.8 GB/s (14 Gbps) | 140ms |
| 38.4TB | 3.2 GB/s (26 Gbps) | 81ms |
| 96TB | 4.6 GB/s (37 Gbps) | 41ms |
4KB Block Performance Summary (RBD)
MICRON + RED HAT + SUPERMICRO ALL-NVMe CEPH RA
- 4 Micron 9100MAX NVMe drives per storage node is the optimal IOPS/node configuration; increasing past 4 drives only marginally reduces latency and increases IOPS
- Red Hat Ceph Storage 2.1 can saturate 2x Intel E5-2699 v4 CPUs with 8 to 10 OSDs, given proper tuning and sufficiently fast drives
- 4KB reads will saturate a 10GbE link at this performance level; 25GbE+ is recommended
- 4KB writes can be serviced by 10GbE at this performance level
Object Performance Summary (RADOS Bench)
MICRON + RED HAT + SUPERMICRO ALL-NVMe CEPH RA
- Reads are always network limited with 50GbE, even with 2 drives per node
- Writes are drive limited and can saturate 25GbE; this is a symptom of large object writes with journals and OSDs co-located
- Preliminary testing with Kraken + BlueStore showed large improvements in 4MB writes
- CPU utilization is low
Platform Notes
MICRON + RED HAT + SUPERMICRO ALL-NVMe CEPH RA
- CPU 1 has 6 NVMe drives; CPU 2 has 4 NVMe drives + 2 PCIe x8 slots (2x single-port 50GbE NICs)
- Pinning OSD processes to specific CPUs did not net a performance gain; good old irqbalance did a decent job of evenly distributing load
- 50GbE is the fastest NIC for a PCIe x8 slot, so this server could not use 100GbE
- 4MB object read is the only test that would benefit from 100GbE
Future Ceph Testing at Micron
NVDIMM: Non-Volatile DRAM
FUTURE CEPH TESTING AT MICRON
- 8GB & 16GB capacities; fits in a DDR4 DRAM slot
- A supercapacitor allows the DRAM to dump to local flash during a power outage
- Crazy fast
- Possible use cases for Ceph: small journals (2GB-4GB) in front of NVMe OSDs (Jewel); storage for RocksDB data using BlueStore (Kraken / Luminous)
Micron 5100 SATA SSD
FUTURE CEPH TESTING AT MICRON
- Up to 8TB capacity; utilizes 3D TLC NAND
- 1U storage nodes with up to 80TB/storage node
- Possible Ceph architectures: all-SATA capacity solution; SATA + NVMe journals; SATA + NVDIMM journals
Flash optimizes your space: smarter, better, faster.
+ Bankable TCO
+ Bringing data closer to the CPU
+ Exponential capacity and speed increases
+ Drastic reduction in (or re-investment of) datacenter real estate
+ Dramatic cut in physical power / energy costs
What is the Impact of System Implementation?
TRADITIONAL SAN TO ALL-FLASH SAN: 2.5x
[Latency-ladder graphic: block-mode NVDIMM (DDR), 3D XPoint NVMe, NVMe SSD, SATA SSD, SAS HDD (10K RPM), all-flash array on Fibre Channel SAN, hybrid array on Fibre Channel SAN; software protection & scale vs. array protection & scale]
Moving into the Realm of Real Advantages
SAN TO VSAN WITH SATA SSD: 10x
(Same latency-ladder graphic as above.)
Not All Flash-Built Systems are the Same
AFA SAN TO VSAN WITH SATA SSD: 4x
(Same latency-ladder graphic as above.)
The NVMe Advantage Can Vary by Use
NVMe VSAN OVER AFA SAN: 48x
(Same latency-ladder graphic as above.)
What About Non-Volatile RAM?
NVDIMM IS DATA-SAFE AND FAST: 3,000x
(Same latency-ladder graphic as above.)
Summary: Solve the Storage Optimization Puzzle with Micron
- We've Done the Tuning for You: Consider Micron-Powered Ceph Architectures
- Discuss How Your Workloads Could Benefit From 3D XPoint or Persistent Memory
- When It's Time: How SSDs Might Even Be Best for Archive
Visit Us in Booth B3 for More Information
Thanks, all! These slides will be available on the OpenStack conference website. Reference architecture available now!
Micron Ceph Collateral
MICRON + RED HAT + SUPERMICRO ALL-NVMe CEPH RA
Micron NVMe Reference Architecture: https://www.micron.com/solutions/micron-accelerated-solutions
Direct link to the RA document: https://www.micron.com/~/media/documents/products/technical-marketing-brief/accelerated_ceph_solution_nvme_ref_arch.pdf
Backup
FIO RBD 4KB Random Read Performance (Backup Detail)
MICRON + RED HAT + SUPERMICRO ALL-NVMe CEPH REFERENCE ARCHITECTURE

| Configuration | 4KB random read IOPS | Average latency | IOPS/drive |
|---|---|---|---|
| 19.2TB | 745K | 2.1ms | 93K |
| 38.4TB | 1.13M | 1.38ms | 71K |
| 96TB | 1.15M | 1.11ms | 29K |

(Charts: total read IOPS vs. per-drive IOPS, client network throughput, and single-drive IOPS per configuration.)
FIO RBD 4KB Random Read Performance
MICRON + RED HAT + SUPERMICRO ALL-NVMe CEPH REFERENCE ARCHITECTURE
RBD FIO 4KB random read performance is CPU limited at 4 & 10 drives/node and CPU and drive limited at 2 drives/node. (Chart: total read IOPS per configuration.)
FIO RBD 4KB Random Write Performance (Backup Detail)
MICRON + RED HAT + SUPERMICRO ALL-NVMe CEPH REFERENCE ARCHITECTURE

| Configuration | 4KB random write IOPS | Average latency | IOPS/drive |
|---|---|---|---|
| 19.2TB | 163K | ~6ms | 20K |
| 38.4TB | 240K | ~5ms | 15K |
| 96TB | 242K | ~5ms | 6K |

(Charts: total write IOPS vs. per-drive IOPS, plus client and storage network throughput per configuration.)
FIO RBD 4KB Random Write Performance
MICRON + RED HAT + SUPERMICRO ALL-NVMe CEPH REFERENCE ARCHITECTURE
4KB write is CPU limited at 4 & 10 drives/node and drive limited at 2 drives/node. (Charts: single-drive latency and single-drive IOPS per configuration.)
RADOS Bench 4MB Object Read Performance (Backup Detail)
MICRON + RED HAT + SUPERMICRO ALL-NVMe CEPH REFERENCE ARCHITECTURE
4MB object read is network limited.

| Configuration | Throughput | Average latency | Throughput/drive |
|---|---|---|---|
| 19.2TB | 20.7 GB/s (166 Gbps) | 37ms | 2.6 GB/s |
| 38.4TB | 21.2 GB/s (170 Gbps) | 36ms | 1.3 GB/s |
| 96TB | 21.8 GB/s (174 Gbps) | 35ms | 0.55 GB/s |
RADOS Bench 4MB Object Write Performance (Backup Detail)
MICRON + RED HAT + SUPERMICRO ALL-NVMe CEPH REFERENCE ARCHITECTURE

| Configuration | Throughput | Average latency | Throughput/drive |
|---|---|---|---|
| 19.2TB | 1.8 GB/s (14 Gbps) | 140ms | 230 MB/s |
| 38.4TB | 3.2 GB/s (26 Gbps) | 81ms | 205 MB/s |
| 96TB | 4.6 GB/s (37 Gbps) | 41ms | 118 MB/s |

(Charts: client and storage network throughput per configuration.)
RADOS Bench 4MB Object Write Performance (Backup Detail)
MICRON + RED HAT + SUPERMICRO ALL-NVMe CEPH REFERENCE ARCHITECTURE
4MB object write is drive limited; measured at the drive level:

| Configuration | Average drive throughput | Average drive latency |
|---|---|---|
| 19.2TB | 929 MB/s | 112ms |
| 38.4TB | 772 MB/s | 92ms |
| 96TB | 479 MB/s | 52ms |