Evaluating Cloud Storage Strategies
James Bottomley, CTO, Server Virtualization
Introduction to Storage
Attachments:
- Local (direct, cheap): SAS, SATA
- Remote (SAN, NAS; expensive): FC, net
Types:
- Block: spinning disk drive, SSD, RAID unit
- File: NFS, CEPH
- Object: RADOS, PCS
Profit from the cloud
Storage Performance Comparison
Storage Cost Comparison
A Closer Look at the Terms
Block device
- A unit of storage
- May be divided inflexibly (by partitioning)
- Usually locally attached, but may be on a SAN
File-based storage
- Exports views of a filesystem via NFS, CIFS or other protocols
- Is flexible: storage in views can be expanded and contracted on the fly
- Suffers from metadata issues on the server
Object storage
- Really just means a flexible block device
- May be expanded and contracted on the fly
- Easily administrable (unlike LUN partitioning in SANs)
Storage Types Comparison
Cloud Utility
- Simple Web API
- No easy way to update objects
- Slow
- CEPH, Gluster
- Object size tuning problem
- Tuned to disk-image-size objects
- Designed for rapid update
- Scalable B/W
- Inelastic
- Hard to aggregate
- Attached to individual systems
- Slightly elastic
- Fixed size
- Good B/W
- Dedicated network
- Based on SAN
- Limited scaling
Hosting Utility
Object vs File and the Metadata Problem
A large number of cloud storage systems are file based
- CEPH, Gluster
The specific problem is that updating any file requires a change in the metadata
- This produces a hot spot in the journal
- As well as locking hierarchy issues
- And communication with the metadata server
- All of which slow operations down
Object storage only uses metadata when objects are resized, created or destroyed
- Using a fixed-size object incurs no metadata overhead whatsoever
So objects providing virtual environment roots allow efficient embedded filesystems with zero metadata overhead
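The difference in metadata traffic can be illustrated with a toy model. This is only a sketch under stated assumptions: the class names FileStore and ObjectStore, the one-metadata-update-per-write cost, and the 512-byte write size are all hypothetical and do not describe how CEPH, Gluster or any real object store is implemented.

```python
class FileStore:
    """Toy file-based store: assume every write also changes metadata."""

    def __init__(self):
        self.metadata_updates = 0
        self.files = {}

    def write(self, name, offset, data):
        buf = self.files.setdefault(name, bytearray())
        end = offset + len(data)
        if end > len(buf):
            buf.extend(b"\0" * (end - len(buf)))  # file grows on demand
        buf[offset:end] = data
        self.metadata_updates += 1  # assumption: size/mtime/journal touched


class ObjectStore:
    """Toy object store: metadata changes only when an object is created."""

    def __init__(self):
        self.metadata_updates = 0
        self.objects = {}

    def create(self, name, size):
        self.objects[name] = bytearray(size)
        self.metadata_updates += 1

    def write(self, name, offset, data):
        obj = self.objects[name]
        end = offset + len(data)
        if end > len(obj):
            raise ValueError("write past end of fixed-size object")
        obj[offset:end] = data  # in-place update, no metadata change


def run_workload(n_writes, block=512):
    """Apply the same sequential writes to both stores and count metadata updates."""
    files, objects = FileStore(), ObjectStore()
    objects.create("ve-root", block * n_writes)
    for i in range(n_writes):
        files.write("ve-root", block * i, b"x" * block)
        objects.write("ve-root", block * i, b"x" * block)
    return files.metadata_updates, objects.metadata_updates
```

In this model the file store pays one metadata update per write, while the fixed-size object pays a single update at creation time regardless of how many writes follow.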
FUSE Issues
FUSE is the Linux Filesystem in Userspace
The main problem is that it's incredibly slow
However, it is very useful, so a large number of cloud filesystems use it
- Gluster
Parallels originally avoided using it. However, we've now decided we'll fix it for everyone
Parallels engineers are currently working with the Linux filesystems and FUSE mailing lists
The objective is to add write caching and mtime fixes to accelerate FUSE
Tests show we can get ~95% of the performance of a natively written filesystem
Consistency
Strong consistency is hard to achieve in clusters
- Strong consistency means that all updates are seen immediately after they are committed
- Strong consistency is most often violated across cluster reconfigurations
- Ironically, this is precisely when you usually need it (HA)
- Sheepdog, CEPH, PStorage
Eventual consistency is the norm
- Means that all updates are eventually seen, but may not be immediately visible after they are committed
- SWIFT, Gluster (which does have a much slower quorum enforcement mode that provides strong consistency)
Weak consistency
- Does not guarantee write ordering and visibility
- Too weak to be useful for most cloud storage
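The visibility difference between strong and eventual consistency can be sketched with a toy replicated value. This is an illustrative model only: the ReplicatedValue class, its sync() step and the three-replica default are hypothetical and do not reflect the replication protocol of any system named above.

```python
class ReplicatedValue:
    """Toy replicated register with a strong or an eventual write path."""

    def __init__(self, replicas=3, strong=True):
        self.replicas = [None] * replicas
        self.strong = strong
        self.pending = []  # (replica index, value) pairs not yet propagated

    def write(self, value):
        """Commit a write; it always lands on replica 0 first."""
        self.replicas[0] = value
        for i in range(1, len(self.replicas)):
            if self.strong:
                # Strong: every replica is updated before the write returns.
                self.replicas[i] = value
            else:
                # Eventual: propagation is deferred until sync() runs.
                self.pending.append((i, value))

    def read(self, replica):
        """Read whatever the chosen replica currently holds."""
        return self.replicas[replica]

    def sync(self):
        """Deliver all deferred updates in commit order."""
        for i, value in self.pending:
            self.replicas[i] = value
        self.pending.clear()


strong = ReplicatedValue(strong=True)
strong.write("v1")
strong_read = strong.read(2)          # committed write is visible at once

eventual = ReplicatedValue(strong=False)
eventual.write("v1")
stale_read = eventual.read(2)         # still the old value (None)
eventual.sync()
converged_read = eventual.read(2)     # visible after propagation
```

A reader on replica 2 sees the committed value immediately in the strong case, but only after propagation in the eventual case.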
Performance and Scalability
Cloud storage must be designed to scale not just per node, but also per Virtual Environment per node
This requires that there be no bottlenecks connecting a virtual environment to storage
- Sheepdog problem: it uses a single-threaded per-node gateway process, causing its scalability per VE to be poor
Ideally, a direct connection should be made between the virtual environment using the object and the storage providing it, with no intermediate broker
- Or using an intermediate broker tuned for scalability
Chunking (large block size for objects) also improves performance
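One way to see why a large chunk size helps is to count how many chunks a single request has to touch. The helper below is a minimal sketch; the function name chunks_for_range and the 4 KiB and 64 MiB chunk sizes are assumptions chosen for illustration, not values taken from any product.

```python
def chunks_for_range(offset, length, chunk_size):
    """Split a byte range into (chunk index, offset in chunk, byte count) pieces."""
    pieces = []
    end = offset + length
    while offset < end:
        index, within = divmod(offset, chunk_size)
        # Take bytes up to the end of this chunk or the end of the request.
        count = min(chunk_size - within, end - offset)
        pieces.append((index, within, count))
        offset += count
    return pieces


ONE_MIB = 1 << 20
small_chunk_pieces = len(chunks_for_range(0, ONE_MIB, 4096))          # 4 KiB chunks
large_chunk_pieces = len(chunks_for_range(0, ONE_MIB, 64 * ONE_MIB))  # 64 MiB chunks
```

With these assumed sizes, a 1 MiB read touches 256 small chunks but only one large chunk, so far fewer per-chunk lookups and connections are needed.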
Requirements for Hosting Storage
The cardinal hosting requirement is that existing local storage should be repurposed as generic object-based storage for
1. Supporting existing hosting environments and additional services
2. Enabling the provision of cloud services
This equates to the technical requirements
1. Performance must match SATA wire speed (100MB/s)
   - Tuned exactly for GB-sized objects containing small files
2. Storage must be object based to avoid metadata issues
3. Objects should be capable of rapid random read/write updates
4. Storage bandwidth should scale linearly with the cluster
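The bandwidth requirements can be checked with simple arithmetic. This is a sketch under stated assumptions: the function names are hypothetical, decimal megabytes are used, protocol overhead is ignored, and the 10-node cluster is only an example.

```python
GIGABIT_BITS_PER_SECOND = 1_000_000_000  # raw line rate of gigabit Ethernet


def raw_megabytes_per_second(bits_per_second):
    """Convert a raw line rate in bit/s to decimal MB/s, ignoring protocol overhead."""
    return bits_per_second / 8 / 1_000_000


def fraction_of_line_rate(target_mb_s, bits_per_second=GIGABIT_BITS_PER_SECOND):
    """Share of the raw link that a target throughput would occupy."""
    return target_mb_s / raw_megabytes_per_second(bits_per_second)


def aggregate_bandwidth(nodes, per_node_mb_s=100):
    """Cluster bandwidth if it scales linearly with the number of nodes."""
    return nodes * per_node_mb_s


link_mb_s = raw_megabytes_per_second(GIGABIT_BITS_PER_SECOND)  # 125.0
share = fraction_of_line_rate(100)                             # 0.8
cluster_mb_s = aggregate_bandwidth(10)                         # 1000
```

A gigabit link carries 125 MB/s raw, so the 100MB/s target uses 80% of it before overhead, and linear scaling means each added node contributes a further 100MB/s to the cluster total.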
Simple Requirements for Additional Benefits
Hosting Enhancements
1. Free storage from individual nodes
   - Easy, fast migration of Virtual Environments
   - High Availability
2. Simple and efficient resizing, with assistance for legacy roots (ext3)
   - Makes storage easier to sell in increments
3. Cloning and snapshotting
   - Value add for templating block-based roots
   - Permits easy backup
4. Redundancy
   - Allows different storage SLAs for different prices
Cloud Enhancements (Ideal Storage Solution)
1. Dropbox-like services
2. Storage as a Service (like S3)
3. Storage on Demand
4. Tiered Storage Pricing
Ideal Solution
Technical Specs
- Metadata is the key to improving performance
- Large static objects with rapid updates have fixed metadata
- 100MB/s performance over gigabit Ethernet (no 10GE requirement)
Avoid
- Anything like a filesystem (CEPH, Gluster) because of
  - Locking problems
  - Speed issues from the per-file need to consult metadata
- Anything using FUSE (Gluster)
  - At least anything using FUSE without the Parallels acceleration patches
- Anything with a single-threaded connection multiplexer (Sheepdog)
  - Per cluster is worse (kills all scalability)
  - Per node is still bad (kills VE scalability)
Introducing Parallels Cloud Storage
Why Choose Us?
- We're the experts in the field (we studied the problem)
- We fixed FUSE
- We redid the Linux loop device to work efficiently for virtual environment roots
  - In collaboration with Oracle, who did the Direct I/O patches
- The loop device was also modified to do snapshotting and legacy filesystem resizing
- All the necessary infrastructure patches are upstream in Linux
  - Or are moving that way
What we provide
- Complete leverage of existing local node storage
- Strong consistency and redundancy
- Wire-speed transfers because of an optimised data architecture
  - Up to 100MB/s per node over 1GigE
- Hot object tiering and SSD caching
Parallels Cloud Storage Architecture
Future Features
Chunk Server based snapshotting
De-duplication
Thin Provisioning
- Actual storage size can appear much larger than the in-use backing store because of sparsity of objects
- Also provides the ability to do dynamic in-place upgrades of actual storage capacity
Innovative redundancy algorithms
Geographic Object Replication for advanced disaster recovery
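Thin provisioning can be illustrated with a sparse in-memory volume that allocates backing chunks only on first write. This is a minimal sketch: the ThinVolume class, its method names and the 4096-byte chunk size are hypothetical and say nothing about how the feature is actually built.

```python
class ThinVolume:
    """Toy thin-provisioned volume: backing chunks exist only once written."""

    def __init__(self, apparent_size, chunk_size=4096):
        self.apparent_size = apparent_size
        self.chunk_size = chunk_size
        self.chunks = {}  # chunk index -> bytearray, allocated on first write

    def write(self, offset, data):
        if offset < 0 or offset + len(data) > self.apparent_size:
            raise ValueError("write outside volume")
        pos = 0
        while pos < len(data):
            index, within = divmod(offset + pos, self.chunk_size)
            count = min(self.chunk_size - within, len(data) - pos)
            chunk = self.chunks.setdefault(index, bytearray(self.chunk_size))
            chunk[within:within + count] = data[pos:pos + count]
            pos += count

    def read(self, offset, length):
        if offset < 0 or offset + length > self.apparent_size:
            raise ValueError("read outside volume")
        out = bytearray()
        pos = 0
        while pos < length:
            index, within = divmod(offset + pos, self.chunk_size)
            count = min(self.chunk_size - within, length - pos)
            chunk = self.chunks.get(index)
            if chunk is None:
                out += bytes(count)  # unallocated space reads back as zeros
            else:
                out += chunk[within:within + count]
            pos += count
        return bytes(out)

    def allocated_size(self):
        """Bytes of backing store actually in use."""
        return len(self.chunks) * self.chunk_size

    def grow(self, new_size):
        """Enlarge the apparent size in place without allocating anything."""
        if new_size < self.apparent_size:
            raise ValueError("shrinking is not modelled")
        self.apparent_size = new_size
```

In this model a volume advertised as 1 GiB consumes only the chunks that have been written, and growing the apparent size changes no allocation.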
Conclusions
Getting cloud storage right for current hosting needs is not a simple problem
- The basic construction of many cloud storage offerings is unsuitable for hosting provider environments
Parallels has devoted considerable study and effort to mapping the needs of hosters onto cloud storage
Parallels has studied the strengths and weaknesses of current cloud storage offerings and incorporated the best into our cloud storage offerings
- While attempting to eliminate all the negative issues
- And improve performance
Parallels will leverage (and enhance) open source to achieve the best cloud storage system for hosters