Ceph vs Swift Performance Evaluation on a Small Cluster
edupert monthly call, July 24th, 2014
About me
Vincenzo Pii, researcher @ ICCLab (ZHAW)
Leading the research initiative on Cloud Storage, under the IaaS theme
More on ICCLab: www.cloudcomp.ch
About this work
Performance evaluation study on cloud storage for small installations
Hardware resources hosted at the ZHAW ICCLab data center in Winterthur:
- Two OpenStack clouds (stable and experimental)
- One cluster dedicated to storage research
INTRODUCTION
Cloud storage
Cloud storage:
- Based on distributed, parallel, fault-tolerant file systems
- Distributed resources exposed through a single homogeneous interface
Typical requirements:
- Highly scalable
- Replication management
- Redundancy (no single point of failure)
- Data distribution
Object storage:
- A way to manage/access data in a storage system
- Typical alternatives: block storage, file storage
Ceph and Swift
Ceph (ceph.com):
- Supported by Inktank, recently acquired by Red Hat (owners of GlusterFS)
- Mostly developed in C++
- Started as a PhD thesis project in 2006
- Block, file and object storage
Swift (launchpad.net/swift):
- OpenStack object storage
- Completely written in Python
- RESTful HTTP APIs
Objectives of the study
1. Performance evaluation of Ceph and Swift on a small cluster
   - Private storage
   - Storage backend for in-house apps with limited size requirements
   - Experimental environments
2. Evaluate Ceph maturity and stability
   - Swift is already widely deployed and industry-proven
3. Hands-on experience
   - Configuration
   - Tooling
CONFIGURATION AND PERFORMANCE OF SINGLE COMPONENTS
Network configuration
- Three servers on a dedicated VLAN
- 1 Gbps NICs, 1000Base-T cabling
[Diagram: Node 1 (10.0.5.2), Node 2 (10.0.5.3), Node 3 (10.0.5.4) on the 10.0.5.0/24 subnet]
Servers configuration
Hardware: Lynx CALLEO Application Server 1240
- 2x Intel Xeon E5620 (4 cores each)
- 8x 8 GB DDR3 SDRAM, 1333 MHz, registered, ECC
- 4x 1 TB Enterprise SATA-3 hard disks, 7200 RPM, 6 Gb/s (Seagate ST1000NM0011)
- 2x Gigabit Ethernet network interfaces
Operating system: Ubuntu 14.04 Server Edition with kernel 3.13.0-24-generic
Disks performance
READ:
$ sudo hdparm -t --direct /dev/sdb1
/dev/sdb1:
 Timing O_DIRECT disk reads: 430 MB in 3.00 seconds = 143.17 MB/sec
WRITE:
$ dd if=/dev/zero of=anof bs=1G count=1 oflag=direct
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 8.75321 s, 123 MB/s
Network performance
$ iperf -c ceph-osd0
------------------------------------------------------------
Client connecting to ceph-osd0, TCP port 5001
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[  3] local 10.0.5.2 port 41012 connected with 10.0.5.3 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  1.10 GBytes  942 Mbits/sec
942 Mbits/s ≈ 117.5 MB/s
CLOUD STORAGE CONFIGURATION
Ceph OSDs
cluster-admin@ceph-mon0:~$ ceph status
    cluster ff0baf2c-922c-4afc-8867-dee72b9325bb
     health HEALTH_OK
     monmap e1: 1 mons at {ceph-mon0=10.0.5.2:6789/0}, election epoch 1, quorum 0 ceph-mon0
     osdmap e139: 4 osds: 4 up, 4 in
      pgmap v17348: 632 pgs, 13 pools, 1834 bytes data, 52 objects
            199 MB used, 3724 GB / 3724 GB avail
            632 active+clean
cluster-admin@ceph-mon0:~$ ceph osd tree
# id    weight  type name         up/down  reweight
-1      3.64    root default
-2      1.82      host ceph-osd0
0       0.91        osd.0         up       1
1       0.91        osd.1         up       1
-3      1.82      host ceph-osd1
2       0.91        osd.2         up       1
3       0.91        osd.3         up       1
Disk layout:
- Monitor (mon0): HDD1 (OS), remaining disks not used
- Storage node 0: HDD1 (OS), osd0 (XFS), osd1 (XFS), journal disk
- Storage node 1: HDD1 (OS), osd2 (XFS), osd3 (XFS), journal disk
Swift devices
Building rings on the storage devices (no separation of accounts, containers and objects):
export ZONE=                  # set the zone number for that storage device
export STORAGE_LOCAL_NET_IP=  # and the IP address
export WEIGHT=100             # relative weight (higher for bigger/faster disks)
export DEVICE=
swift-ring-builder account.builder add z$ZONE-$STORAGE_LOCAL_NET_IP:6002/$DEVICE $WEIGHT
swift-ring-builder container.builder add z$ZONE-$STORAGE_LOCAL_NET_IP:6001/$DEVICE $WEIGHT
swift-ring-builder object.builder add z$ZONE-$STORAGE_LOCAL_NET_IP:6000/$DEVICE $WEIGHT
Disk layout:
- Swift proxy: HDD1 (OS), remaining disks not used
- Storage node 0: HDD1 (OS), dev1 (XFS), dev2 (XFS), one disk not used
- Storage node 1: HDD1 (OS), dev3 (XFS), dev4 (XFS), one disk not used
Highlighting a difference
librados used to access Ceph:
- Plain installation of a Ceph storage cluster
- Non-RESTful interface
- This is the fundamental access layer in Ceph
- RadosGW (Swift/S3 APIs) is an additional component on top of librados (as are the block and file storage clients)
RESTful APIs over HTTP used to access Swift:
- Extra overhead in the communication
- Out-of-the-box access method for Swift
This is part of the differences to be benchmarked, even if HTTP APIs for object storage are interesting for many use cases.
This use case: unconstrained, self-managed storage infrastructure for, e.g., in-house apps, with control over both infrastructure and applications.
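To make the two access paths concrete, here is a minimal Python sketch (assuming the python-rados and python-swiftclient bindings are installed; the pool name, proxy URL, account and key are placeholders). It only illustrates the interface difference; it is not the benchmark code used in the study.

# Ceph: talk to the cluster directly through librados (no HTTP involved).
import rados

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')  # assumes a readable ceph.conf
cluster.connect()
ioctx = cluster.open_ioctx('data')                     # 'data' pool is a placeholder
ioctx.write_full('hello.txt', b'stored via librados')
print(ioctx.read('hello.txt'))
ioctx.close()
cluster.shutdown()

# Swift: every operation is a RESTful HTTP request going through the proxy server.
from swiftclient import client as swift

conn = swift.Connection(authurl='http://10.0.5.2:8080/auth/v1.0',  # placeholder proxy/auth URL
                        user='test:tester', key='testing')         # placeholder credentials
conn.put_container('bench')
conn.put_object('bench', 'hello.txt', contents=b'stored via HTTP')
headers, body = conn.get_object('bench', 'hello.txt')
print(body)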
WORKLOADS
Tools
COSBench (v. 0.4.0.b2) - https://github.com/intel-cloud/cosbench
- Developed by Intel
- Benchmarking for cloud object storage
- Supports both Swift and Ceph
- Cool web interface to submit workloads and monitor the current status
- Workloads defined as XML files
- Very good level of abstraction for object storage
Supported metrics:
- Op-Count (number of operations)
- Byte-Count (number of bytes)
- Response-Time (average response time for each successful request)
- Processing-Time (average processing time for each successful request)
- Throughput (operations per second)
- Bandwidth (bytes per second)
- Success-Ratio (ratio of successful operations)
Outputs CSV data
Graphs generated with cosbench-plot - https://github.com/icclab/cosbench-plot
- Describe inter-workload charts in Python
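As a rough illustration of how the CSV output can also be post-processed outside cosbench-plot, here is a minimal Python sketch. The file name and the "Throughput" column header are assumptions based on the metric list above, not the exact COSBench CSV schema; adapt them to the files produced by your installation.

# Hypothetical post-processing of a COSBench per-workstage CSV export.
import csv

def average_metric(csv_path, column="Throughput"):
    """Average the samples of one metric column from a workstage CSV."""
    with open(csv_path, newline="") as f:
        samples = [float(row[column]) for row in csv.DictReader(f) if row.get(column)]
    return sum(samples) / len(samples) if samples else 0.0

# Example (placeholder file name):
# print(average_metric("ceph-read-4KB.csv"), "op/s")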
Workloads gist
COSBench web interface
Workload matrix
Containers: 1, 20
Object size: 4 KB, 128 KB, 512 KB, 1024 KB, 5 MB, 10 MB
R/W/D distribution (%): 80/15/5, 100/0/0, 0/100/0
Workers: 1, 16, 64, 128, 256, 512
Workloads
- 216 workstages (all the combinations of the values in the workload matrix), as sketched below
- 12 minutes per workstage: 2 minutes of warm-up, 10 minutes of running time
- 1000 objects per container (pools in Ceph)
- Operations uniformly distributed over the available objects (1000 or 20000)
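A small Python sketch of how the 216 workstages follow from the matrix (2 container counts x 6 object sizes x 3 R/W/D mixes x 6 worker counts). The names and structure are purely illustrative, not the actual COSBench workload files used in the study.

# Enumerate every combination of the workload-matrix values.
from itertools import product

containers   = [1, 20]
object_sizes = ["4KB", "128KB", "512KB", "1024KB", "5MB", "10MB"]
rwd_mixes    = [(80, 15, 5), (100, 0, 0), (0, 100, 0)]   # read/write/delete %
workers      = [1, 16, 64, 128, 256, 512]

workstages = list(product(containers, object_sizes, rwd_mixes, workers))
print(len(workstages))   # -> 216, matching the number of workstages reported

for n_cont, size, (r, w, d), n_workers in workstages[:3]:
    print("{} container(s), {} objects, {}/{}/{} R/W/D, {} workers".format(
        n_cont, size, r, w, d, n_workers))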
Performance results: READING
Read throughput: workstage averages
Read throughput: 1 container, 4 KB objects
Read throughput: 1 container, 128 KB objects
Read response time: 1 container, 128 KB objects
Read bandwidth: workstage averages
Read throughput: 20 containers, 1024 KB objects
Response time: 20 containers, 1024 KB objects
Performance results: WRITING
Write throughput: workstage averages
Write throughput: 1 container, 128 KB objects
Ceph write throughput: 1 container, 128 KB objects, varying number of replicas
Ceph write response time: 1 container, 128 KB objects, varying number of replicas
Write bandwidth: workstage averages
Write response time: 20 containers, 512 KB objects
Performance results: READ/WRITE/DELETE
R/W/D throughput: workstage averages
Read response time (R/W/D workload)
CONCLUSIONS: General considerations and future works
Performance analysis recap
- Ceph performs better when reading, Swift when writing (Ceph: librados; Swift: REST APIs over HTTP)
- The difference is more remarkable with small objects: less overhead for Ceph thanks to librados and the CRUSH algorithm
- Comparable performance with bigger objects: network bottleneck at ~120 MB/s for read operations
- Response time: Swift shows greedy behavior, Ceph shows fairness
General considerations: challenges
Equivalency:
- Comparing two similar systems that do not exactly overlap
- Creating fair setups (e.g., journals on additional disks for Ceph)
- Transposing corresponding concepts
Configuration:
- Choosing the right/best settings for the context (e.g., number of Swift workers)
- Identifying bottlenecks in advance, to create meaningful workloads
Workloads:
- Running many tests to identify saturating conditions
- Huge decision space
Keeping up the pace:
- Lots of development going on (new versions, new features)
General considerations: lessons learnt
Publication medium (blog post):
- Excellent feedback (e.g., from a Rackspace developer)
- Immediate right of reply and real comments
Most important principles:
- Openness: share every bit of information; clear intents, clear justifications
- Neutrality: when analyzing the results and when drawing conclusions
- Very good suggestions coming from "you could", "you should", "you didn't" comments
Future works
Performance evaluation is necessary for cloud storage:
- More object storage evaluations: interesting because it's very close to the application level
- Block storage evaluations: very appropriate for IaaS (providing storage resources to VMs)
- Seagate Kinetic: possible opportunity to work on a Kinetic setup
Vincenzo Pii: piiv@zhaw.ch THANKS! QUESTIONS?