MySQL and Ceph: A tale of two friends
Karan Singh, Sr. Storage Architect, Red Hat | Taco Scargo, Sr. Solution Architect, Red Hat
Agenda: Ceph Introduction and Architecture | Why MySQL on Ceph | MySQL and Ceph Performance Tuning | Head-to-Head Performance: MySQL on Ceph vs. AWS | Architectural Considerations | Where to go next?
Quick Poll - Who runs DB workloads on VMs / Cloud? - Who is familiar with Ceph?
Ceph Introduction & Architecture
What is Ceph? Open source software-defined storage solution; unified storage platform (block, object and file storage); runs on commodity hardware; self-managing, self-healing; massively scalable; no single point of failure
Ceph : Under the hood
Architectural Components: OBJECTS | VIRTUAL DISKS | FILESYSTEM
RGW: A web services gateway for object storage, compatible with S3 and Swift
RBD: A reliable, fully-distributed block device with cloud platform integration
CEPHFS: A distributed file system with POSIX semantics and scale-out metadata
LIBRADOS: A library allowing apps to directly access RADOS (C, C++, Java, Python, Ruby, PHP)
RADOS: A software-based, reliable, autonomous, distributed object store comprised of self-healing, self-managing, intelligent storage nodes and lightweight monitors
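As an illustration of the most direct access path (not from the original deck), objects can be stored in and read back from a RADOS pool with the rados CLI; the pool name "demo" and object name "hello" below are made up:

  # assumes a working /etc/ceph/ceph.conf and client keyring, and an existing pool named "demo"
  rados -p demo put hello ./hello.txt     # store a local file as object "hello"
  rados -p demo ls                        # list objects in the pool
  rados -p demo get hello ./hello.out     # read the object back to a local file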
RADOS Components
OSDs (Object Storage Daemons): 10s to 1000s in a cluster; typically one daemon per disk; store the actual data on disk; intelligently peer for replication & recovery
Monitors: maintain cluster membership and health; provide consensus for distributed decision-making; small, odd number; do not store data
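A quick sketch of how these daemons show up in practice (any host or rack names in the output come from your own cluster layout):

  ceph -s          # overall cluster health, monitor quorum, OSD count
  ceph osd tree    # OSDs grouped under hosts/racks as the CRUSH map sees them
  ceph mon stat    # monitor membership and quorum status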
Ceph OSDs (diagram: one OSD daemon per disk, each sitting on its own XFS filesystem)
RADOS cluster, a.k.a. Ceph cluster (diagram: application talking to the RADOS cluster)
How to access the cluster? (diagram: how do an application's objects reach the cluster?)
CRUSH Algorithm: Controlled Replication Under Scalable Hashing (diagram: objects are hashed into placement groups (PGs), and CRUSH maps each PG onto OSDs in the cluster)
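The object-to-PG-to-OSD mapping can be inspected directly from the CLI; a sketch, assuming a pool named "demo" and an object named "hello" (both hypothetical):

  ceph osd map demo hello
  # prints the placement group the object hashes into and the set of OSDs
  # that CRUSH selects to hold that PG (the acting set)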
Data is organized into pools (diagram: objects are written into pools A-D; each pool contains PGs, which are mapped onto the cluster)
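A minimal sketch of creating a pool; the pool name, PG count, and replica count are illustrative only:

  ceph osd pool create mysql-data 128 128   # new pool with 128 placement groups
  ceph osd pool set mysql-data size 3       # keep 3 replicas of each object
  ceph osd lspools                          # list all pools in the cluster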
Ceph Access Methods
ARCHITECTURAL COMPONENTS: APP → RGW | HOST/VM → RBD | CLIENT → CEPHFS
RGW: A web services gateway for object storage, compatible with S3 and Swift
RBD: A reliable, fully-distributed block device with cloud platform integration
CEPHFS: A distributed file system with POSIX semantics and scale-out metadata
LIBRADOS: A library allowing apps to directly access RADOS (C, C++, Java, Python, Ruby, PHP)
RADOS: A software-based, reliable, autonomous, distributed object store comprised of self-healing, self-managing, intelligent storage nodes and lightweight monitors
STORING VIRTUAL DISKS (diagram: a VM's disk is stored from the hypervisor via LIBRBD into the RADOS cluster)
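A hedged sketch of provisioning an RBD image to back a VM disk; the pool, image name, and size are made up, and the hypervisor would then attach the image through librbd (e.g. QEMU/libvirt):

  rbd create mysql-vol1 --pool mysql-data --size 102400   # 100 GB image (size in MB)
  rbd info mysql-data/mysql-vol1                          # confirm size and image features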
VIRTUAL MACHINE LIVE MIGRATION (diagram: a VM moves between two hypervisors, each using LIBRBD against the same RADOS cluster)
PERSISTENT STORAGE FOR CONTAINERS (diagram: a container host uses KRBD against the RADOS cluster)
PERCONA SERVER ON KRBD (diagram: a Percona Server container on a host that maps its volume through KRBD to the RADOS cluster)
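A sketch of mapping such an image through the kernel RBD client on the container host and handing it to a Percona Server container; the image name, device path, mount point, and password are illustrative:

  rbd map mysql-data/mysql-vol1            # exposes the image as a block device, e.g. /dev/rbd0
  mkfs.xfs /dev/rbd0
  mkdir -p /var/lib/mysql-vol1
  mount /dev/rbd0 /var/lib/mysql-vol1
  docker run -d -e MYSQL_ROOT_PASSWORD=secret \
    -v /var/lib/mysql-vol1:/var/lib/mysql percona:5.7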
Why MySQL on Ceph
Why MySQL on Ceph? MARKET DRIVERS: Ceph is the #1 block storage for OpenStack; MySQL is the #4 workload on OpenStack (workloads #1-3 often use a database too!); 70% of apps on OpenStack use LAMP; MySQL is the leading open-source RDBMS; Ceph is the leading open-source SDS
Why MySQL on Ceph? OPS EFFICIENCY DRIVERS: distributed, elastic storage pools on commodity servers; dynamic data placement; flexible volume resizing; live instance migration; pool and volume snapshots; read replicas via copy-on-write snapshots; a familiar environment, like public clouds
Why MySQL on Ceph? Databases require HIGH IOPS.
Workload              Media            Access Method
General Purpose       Spinning/SSD     Block
Capacity ($/GB)       Spinning         Object
High IOPS ($/IOPS)    SSD / NVMe       Block
MySQL and Ceph: Performance Tuning
Tuning for Harmony (a config sketch follows below)
Tuning MySQL: buffer pool > 20%; flush each transaction, or batch?; Percona parallel doublewrite buffer feature
Tuning Ceph: RHCS 1.3.2, tcmalloc 2.4 with 128M thread cache; If (OSDs on flash media); then co-resident journals, 2-4 OSDs per SSD/NVMe; If (OSDs on magnetic media); then SSD journals, RAID write-back cache; RBD cache (software cache)
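A hedged sketch of the kind of settings behind these bullets; the values are illustrative only and must be sized to the actual dataset and hardware:

  # MySQL side (size the buffer pool to the working set)
  cat >> /etc/my.cnf <<'EOF'
  [mysqld]
  innodb_buffer_pool_size = 12G
  # 1 = flush the log on every transaction (safest); 2 = batch flushes, trading durability for IOPS
  innodb_flush_log_at_trx_commit = 1
  EOF

  # Ceph client side (enable RBD write-back caching for the database volumes)
  cat >> /etc/ceph/ceph.conf <<'EOF'
  [client]
  rbd cache = true
  rbd cache writethrough until flush = true
  EOF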
Tuning for Harmony: effect of MySQL buffer pool size on TpmC (chart)
Tuning for Harmony: effect of MySQL transaction flush setting on TpmC (chart)
Tuning for Harmony: creating a separate pool to serve the IOPS workload (a command sketch follows below)
Create multiple pools in the CRUSH map: distinct branch in the OSD tree; edit the CRUSH map and add SSD rules; create the pool and set crush_ruleset to the SSD rule
If (storage provisioning uses OpenStack); then add a volume type to Cinder
If (! OpenStack); then provision database storage volumes directly from the SSD pool
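A hedged command-level sketch of that workflow on RHCS 1.3.x; the rule and pool names, PG count, ruleset id, and Cinder backend name are all made up:

  # export, edit, and re-inject the CRUSH map to add an SSD-only rule
  ceph osd getcrushmap -o crushmap.bin
  crushtool -d crushmap.bin -o crushmap.txt
  # (edit crushmap.txt: add an SSD branch in the OSD tree and a rule that selects from it)
  crushtool -c crushmap.txt -o crushmap.new
  ceph osd setcrushmap -i crushmap.new

  # create the high-IOPS pool and point it at the SSD rule (ruleset id 1 here)
  ceph osd pool create mysql-ssd 512 512
  ceph osd pool set mysql-ssd crush_ruleset 1

  # OpenStack path: expose the pool as a Cinder volume type
  cinder type-create ceph-ssd
  cinder type-key ceph-ssd set volume_backend_name=ceph-ssd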
Head-to-Head Performance: MySQL on Ceph vs. MySQL on AWS. Target: 30 IOPS/GB (AWS EBS P-IOPS)
Head-to-Head Lab Test Environment
AWS: EC2 r3.2xlarge and m4.4xlarge instances; EBS Provisioned IOPS and GP-SSD volumes; Percona Server
Ceph: Supermicro servers; Red Hat Ceph Storage RBD; Percona Server
SUPERMICRO Ceph Cluster Lab Environment
Shared 10G SFP+ networking
Ceph OSD Nodes: 5x SuperStorage SSG-6028R-OSDXXX; dual Intel Xeon E5-2650 v3 (10-core); 32GB SDRAM DDR3; 2x 80GB boot drives; 4x 800GB Intel DC P3700 (hot-swap U.2 NVMe); 1x dual-port 10GbE network adaptor AOC-STGN-i2S; 8x Seagate 6TB 7200 RPM SAS (unused in this lab); Mellanox 40GbE network adaptor (unused in this lab)
Monitor Nodes / MySQL Client Nodes: 12x SuperServer 2UTwin2 nodes; dual Intel Xeon E5-2670 v2 (cpuset limited to 8 or 16 vCPUs); 64GB SDRAM DDR3; 12x client nodes
Storage Server Software: Red Hat Ceph Storage 1.3.2; Red Hat Enterprise Linux 7.2; Percona Server 5.7.11
IOPS/GB per MySQL Instance (chart)
Focusing on Write IOPS/GB: AWS throttles to deliver deterministic performance (chart)
Effect of Ceph cluster loading on IOPS/GB
HEAD-TO-HEAD: MySQL on Ceph vs. AWS
$/STORAGE-IOP
Architectural Considerations
Architectural Considerations: understanding the workloads
Traditional Ceph Workload    MySQL Ceph Workload
$/GB                         $/IOPS
PBs                          TBs
Unstructured data            Structured data
MB/sec                       IOPS
Architectural Considerations: fundamentally different design
Traditional Ceph Workload    MySQL Ceph Workload
50-300+ TB per server        < 10 TB per server
Magnetic media (HDD)         Flash (SSD -> NVMe)
Low CPU-core:OSD ratio       High CPU-core:OSD ratio
10GbE -> 40GbE               10GbE
Considering CPU Core to Flash Ratio
SUPERMICRO MICROCLOUD CEPH MYSQL PERFORMANCE SKU
1x CPU + 1x NVMe + 1x SFP+; 8x nodes in a 3U chassis; Model: SYS-5038MR-OSDXXXP+
Per Node Configuration: CPU: single Intel Xeon E5-2630 v4; Memory: 32GB; NVMe Storage: single 800GB Intel P3700; Networking: 1x dual-port 10G SFP+
Where to go Next?
MySQL on Red Hat Ceph Storage Reference Architecture white paper: download the PDF at http://bit.ly/mysql-on-ceph
Red Hat Ceph Storage Test Drive: learning by doing. Absolutely free Ceph playground; multi-node Ceph lab on AWS; self-paced, instruction-led. http://bit.ly/ceph-test-drive
Thank You. Ceph Test Drive: http://bit.ly/ceph-test-drive | MySQL on Ceph Reference Arch: http://bit.ly/mysql-on-ceph. Join us to hear about MySQL, Red Hat Storage, and the free Test Drive environment: today, 3:40 PM, Room: Lausanne