Turning Object Storage into Virtual Machine Storage

Open vStorage is the world's fastest distributed block store that spans datacenters. It combines ultra-high performance and low latency with data integrity that has no comparison. Data is distributed across datacenters using both replication and erasure coding. Joining performance and integrity is not a simple bolt-on solution and requires a from-the-ground-up approach. Disk failures, node failures and even datacenter failures do not cause data loss and hence do not threaten your data integrity. You have been led to believe that 100% data loss protection requires a compromise on performance. While this might sound logical and acceptable, it is time to step out of the box and demand a no-compromise storage platform. With Open vStorage you can have your cake and eat it too!

Object Storage is today the standard way to build scale-out storage. But due to technical hurdles it is impossible to run Virtual Machines directly from an Object Store. Open vStorage is the layer between the hypervisor and the Object Store and turns the Object Store into a high performance, distributed, VM-centric storage platform.

Antwerpse Steenweg 19, 9080 Lochristi, Belgium
Phone: +32 9 324 25 74
Mail: info@openvstorage.com
Introduction

Object Storage became mainstream over the last year. Amazon S3 started the Object Storage momentum, but today other players such as Scality, Ceph, OpenStack (Swift) and many other on-site Object Storage solutions are taking over. The adoption of these scale-out, cost-effective storage systems for general file storage can no longer be stopped. Using Object Storage as primary storage for Virtual Machines, on the other hand, has not taken off due to many technical hurdles. With Open vStorage these hurdles are lifted and any Object Store can be turned into high performance, VM-centric Virtual Machine storage.

The Rise of Object Storage

Object Storage, a storage architecture that stores data as objects identified by a unique key, is fast becoming the standard way to store data. IDC [1] estimates that the market for File- and Object-Based Storage will experience an annual growth rate of 27% through 2017, reaching $21.7 billion. This estimate might even be modest considering the amount of funding Object Storage companies have received over the last few years [2].

[1] http://amplidata.com/wp-content/uploads/2013/11/amplidata-idc-marketscape-2013.pdf
[2] http://blog.oxygencloud.com/2013/09/16/after-10-years-object-storage-investment-continues-and-begins-to-bear-significant-fruit/
The benefits of Object Storage are immense:

- Scalability: Service Providers can build scale-out storage solutions that offer the flexibility to scale-as-you-grow by adding more disks and standard x86 servers to the storage repository.
- Reliability: data is duplicated across multiple hosts or protected by even more advanced erasure coding algorithms. This makes it virtually impossible to lose data.
- Ease of management: low-level administrative functions such as managing logical volumes and RAID sets are taken away.
- Standardized APIs: almost all Object Storage solutions support the Amazon Simple Storage Service (S3) API, while traditional storage solutions each have their own proprietary API. This standardized API significantly reduces vendor lock-in and makes migration between different Object Storage solutions easy.
- Cost-effectiveness: different storage tiers can easily be created by mixing fast storage with large capacity slow storage.

Let's have a look at the traditional way of setting up Virtual Machine environments. Virtual Machines require block storage. But block level storage such as a SAN is hard to manage, hard to scale and expensive. What is needed is a technology whereby Virtual Machines can use Object Stores instead of a SAN and get the benefits of the low cost and scale-out capabilities of Object Stores. However, there are a number of challenges in doing this, which are described below. Open vStorage is a "Grid Storage Router" that connects the hypervisor on one side and an Object Store on the other to create a high performance, ultra-reliable, VM-centric and scale-out storage system.
The Object Storage Challenges

Eventual Consistency

Object Storage solutions are designed to scale out by simply adding more x86 servers (nodes) to the Object Store. All these nodes work together to form a distributed, highly available storage repository. Due to this distributed nature, Object Storage is subject to Brewer's CAP theorem [3]. This theorem states that it is impossible for a distributed system to simultaneously provide Consistency (all nodes see the same data at the same time), Availability (a guarantee that every request receives a response about whether it was successful or failed) and Partition tolerance (the system continues to operate despite failure of part of the system). Object Stores can offer two but never all three, so a trade-off has to be made. For Object Storage the trade-off is eventual consistency. Eventual consistency means that if data objects are stored and receive no new updates, eventually all nodes with access to these data objects will return the last updated value. Eventual consistency was introduced so Object Stores can offer acceptable performance, but it has a big impact on the correctness of data: if you retrieve data from an Object Store, you are never sure that you actually received the latest data. Possible stale reads were accepted in exchange for acceptable performance.

[3] http://ksat.me/a-plain-english-introduction-to-cap-theorem/
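The eventual-consistency trade-off described above can be sketched with a toy model (this is an illustration, not a real Object Store client: the class, its replica count and its lazy `sync` step are all assumptions made for the example):

```python
import random

class EventuallyConsistentStore:
    """Toy model: each replica holds its own copy of the data and
    updates propagate lazily, as in an eventually consistent store."""

    def __init__(self, replicas=3):
        self.replicas = [dict() for _ in range(replicas)]

    def put(self, key, value):
        # The write is acknowledged once it reaches a single replica;
        # the others are brought up to date later by anti-entropy.
        self.replicas[0][key] = value

    def sync(self):
        # Background anti-entropy: propagate replica 0's state everywhere.
        for replica in self.replicas[1:]:
            replica.update(self.replicas[0])

    def get(self, key):
        # A read may land on any replica, so it can return stale data.
        return random.choice(self.replicas).get(key)

store = EventuallyConsistentStore()
store.put("obj-1", "v1")
store.sync()
store.put("obj-1", "v2")             # not yet propagated
stale_possible = store.get("obj-1")  # may be "v1" or "v2" -- the client can't tell
store.sync()
assert store.get("obj-1") == "v2"    # after convergence, all reads agree
```

The read between the second `put` and the final `sync` is exactly the situation the text describes: the caller has no way of knowing whether the value it received is the latest one.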
But rest assured, this doesn't mean that your data on the Object Store is possibly corrupt. It means that applications accessing data on the Object Store need to be aware of, and able to detect, that data might be outdated. Upon detection, the application can retrieve the data again, and in many cases the subsequent call will return the correct data.

Latency and Performance

Virtual Machines, and especially IOPS-devouring applications, require low-latency, high performance storage. Each Virtual Machine requires almost immediate access to the underlying storage for its disks. As latency and IOPS issues became a flood tide in larger Virtual Machine environments, faster and more expensive hardware was developed to bring the latency down: SAS disks, Fibre Channel, InfiniBand and All-Flash Arrays were introduced to offer the necessary bandwidth and an acceptable latency. Object Storage, however, is developed and optimized to contain a massive amount of data. To maximize the storage capacity per node in the Object Storage cluster, large SATA disks are selected as these provide the best price per GB. With these large disks, you can't achieve the IOPS and storage performance needed by Virtual Machines. This fact isn't jaw-dropping, as for years SANs have been fitted with fast but small SAS drives. One could of course forgo maximizing the storage per node and select smaller, more expensive SSDs, but this makes the price per GB of stored data skyrocket. Having expensive, fast disks also does not remove the additional latency introduced by having the hypervisor connect over the local LAN to the Object Store: fetching data across the network will never be faster than fetching it locally, even with InfiniBand or 40 GbE technologies. With converged and hyperconverged infrastructure, the trend towards bringing storage closer to the application layer has irreversibly started.
Different Management Paradigms

Object Stores understand objects, while hypervisors understand Virtual Machines. What is needed is a software layer that plugs into the hypervisor so that the system administrator doesn't need to understand LUNs, RAID groups and the like, but can simply manage Virtual Machines. This software layer has to translate the VM paradigm into the Object Store paradigm.
Why a (distributed) file system does not work for Virtual Machines

Virtual Machines need block level storage: a block of storage they can control like a hard drive. File systems have over time been adjusted to emulate block storage behavior. For example, copy-on-write file systems, where every write requires a read and 2 write actions, were developed to support block-level snapshots. It is clear that when multiple Virtual Machines are writing at the same time, these write actions, which are very expensive I/O actions, become a limitation on performance. Virtualized environments also demand the same file system to be available on all Hosts in the virtualization cluster. This requires a distributed file system or dedicated, expensive hardware like SANs. These distributed file systems are not designed for Virtual Machines: as they need to balance Consistency, Availability and Partition tolerance, their performance is fundamentally limited and hence they are not suited for virtualized environments. To conclude, none of the file systems available today have been designed to link Virtual Machines and Object Storage. Copy-on-write file systems, for example, struggle with eventual consistency: for every write they first need a read to safeguard the latest data, and with eventual consistency you never know for sure that you have the latest data.

Turning Object Storage into Virtual Machine Storage

To turn Object Storage into primary storage for hypervisors, the solution must be designed especially for Virtual Machines and take a radically different approach compared to existing storage technologies. Open vStorage takes this different approach and is designed from the ground up with Virtual Machines and their performance requirements in mind. It uses a well-considered architecture which allows Object Storage to be turned into block storage for Virtual Machines and avoids pitfalls such as those seen with distributed file systems linked to Object Storage.
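The copy-on-write cost mentioned earlier (one read plus two writes per overwrite of a shared block) can be made concrete with a minimal sketch; the class and its I/O accounting are invented for illustration and are not any real file system's implementation:

```python
class CopyOnWriteVolume:
    """Minimal copy-on-write sketch: a volume is a mapping from a
    logical block number to a physical block that snapshots may share."""

    def __init__(self):
        self.blocks = {}    # physical block id -> data
        self.mapping = {}   # logical block nr -> physical block id
        self.next_id = 0
        self.io_ops = []    # record each physical I/O for illustration

    def snapshot(self):
        # A snapshot only copies the mapping; all data blocks are shared.
        return dict(self.mapping)

    def write(self, logical, data):
        old = self.mapping.get(logical)
        if old is not None:
            self.io_ops.append(("read", old))      # read the shared block
        new_id = self.next_id
        self.next_id += 1
        self.blocks[new_id] = data
        self.io_ops.append(("write", new_id))      # write the new data block
        self.mapping[logical] = new_id
        self.io_ops.append(("write", "metadata"))  # persist the new mapping

vol = CopyOnWriteVolume()
vol.write(0, b"original")
snap = vol.snapshot()
vol.io_ops.clear()
vol.write(0, b"updated")   # overwrite a block shared with the snapshot
# One logical write became one read plus two writes:
print(vol.io_ops)          # [('read', 0), ('write', 1), ('write', 'metadata')]
```

With many Virtual Machines issuing such overwrites concurrently, this threefold I/O amplification is exactly the performance limitation the text describes.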
Open vStorage creates a unified namespace for the Virtual Machines across multiple Hosts, but within that namespace not all data gets treated the same way. The actual data of the Virtual Machine, the bits which make up the volume, is separated from all other files. Each created volume is stored as a separate block device in a different bucket on the Object Store. Every new write to the Virtual Machine volume results in a new 4k block that is added to a Storage Container Object (SCO). Once a SCO is full, typically when it contains 4MB, it is pushed at a slower pace to the back-end, an Object Store like OpenStack Swift. As this second layer of storage is also a time-based storage implementation, eventual consistency is no longer an issue. Say a Storage Container Object is pushed to the Object Store and later retrieved. Because data is always appended and never overwritten, the Object Store can, under the eventual consistency rule, give only two answers: the actual data or no answer at all. Under no circumstances will the hypervisor receive outdated, incorrect data.

Another big difference compared with traditional distributed file systems is that a Virtual Machine volume is only available on one Host and not on all Hosts. Each Virtual Machine running the Open vStorage software has its own NFS server and exports a different file system instance. Nevertheless, each Host is tricked into believing that it accesses a single unified namespace shared across all these Virtual Machines running the Open vStorage software.

The non-volume files are treated completely differently. Depending on their size and role, they are stored in a distributed database or on a Virtual File Server. For example, VMware Virtual Machine configuration files (vmx files) need to be available on all Hosts, so they are stored in the distributed database. By having the mission-critical files stored in a distributed database, Open vStorage supports VMware vMotion, as the storage presented to each Host looks like shared storage. ISO files, on the other hand, are not mission-critical and are routed to a Virtual File Server stored on the Object Store. If the Virtual File Server goes down, it can easily be restarted on another Host.
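The append-only write path described above can be sketched as follows. This is an illustrative model only, assuming the 4k block size and 4MB SCO size from the text; the class name, the SHA-1 hash, the metadata layout and the `sco_...` object naming are all invented for the example, not the real Open vStorage API:

```python
import hashlib

BLOCK_SIZE = 4096           # every write becomes a 4k block
SCO_SIZE = 4 * 1024 * 1024  # a full SCO holds 4MB before it is pushed

class WriteBuffer:
    """Sketch of the append-only write path: 4k blocks are appended to
    the current SCO; full SCOs are pushed to the Object Store back-end."""

    def __init__(self, backend):
        self.backend = backend      # Object Store stand-in: name -> bytes
        self.current = bytearray()  # the SCO being filled (on SSD in practice)
        self.sco_number = 0
        self.metadata = {}          # volume offset -> (hash, sco number, offset)

    def write(self, volume_offset, block):
        assert len(block) == BLOCK_SIZE
        digest = hashlib.sha1(block).hexdigest()
        self.metadata[volume_offset] = (digest, self.sco_number, len(self.current))
        self.current += block       # random writes become sequential appends
        if len(self.current) >= SCO_SIZE:
            self.flush()

    def flush(self):
        # Push the SCO to the back-end. It is never overwritten afterwards,
        # so an eventually consistent store can only return it or nothing.
        self.backend[f"sco_{self.sco_number:08d}"] = bytes(self.current)
        self.current = bytearray()
        self.sco_number += 1

backend = {}
buf = WriteBuffer(backend)
buf.write(0, b"a" * BLOCK_SIZE)
buf.write(BLOCK_SIZE, b"b" * BLOCK_SIZE)
buf.flush()  # push the partially filled SCO to the Object Store
```

Because each SCO object is written once and never updated, a later read of `sco_00000000` can only yield that exact content or a not-found response, which is why eventual consistency stops being a correctness problem at this layer.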
Open vStorage Features

Open vStorage, as the only solution on the market, turns Object Storage into usable, high performance storage for Virtual Machines, with the following features:

Scale-out

Open vStorage offers scalability in both performance and storage. Adding more Virtual Machines running the Open vStorage software will linearly scale the performance. This guarantees that storage performance will never be a bottleneck in the virtualized environment. Open vStorage also allows adding multiple Object Stores to a single virtualized environment. Start with one Object Store and, when available storage space becomes an issue, make a decision: buy new hardware to enlarge the existing storage pool, or invest in a new Object Store. Mixing and matching Object Stores as primary storage for Virtual Machines is something only Open vStorage offers.

VM-Centric

The flexibility of Open vStorage doesn't only appear in the possibility to mix different Object Stores; Open vStorage also allows you to carefully design your storage tiers. On one and the same Object Storage solution you could for example have a test tier and a highly redundant tier. The highly redundant tier could make 3 copies of the data while the test tier saves the data only once. Both these storage tiers can be made available in Open vStorage as primary storage for Virtual Machines. Splitting up the Virtual Machine volumes into separately manageable buckets and objects turns Open vStorage into a VM-centric storage platform, which allows for storage actions like snapshotting, cloning or replication at the Virtual Machine level. Gone are the days of selecting a single retention policy across all Virtual Machines on the LUN. With Open vStorage, administrators can easily select only the most important Virtual Machines for replication [4]. On top, Open vStorage supports thin provisioning, as only data that has been written to the Virtual Machine disk will be stored.
Having VM-centric functionality lowers the management overhead: for example, bulk provisioning of hundreds of Virtual Machines comes out of the box. These Virtual Machines are provisioned nearly instantly, as only metadata needs to be copied for each Virtual Machine. A snapshot is merely a reference to the correct metadata; taking snapshots thus imposes no overhead, as no data needs to be copied.

[4] Planned for Q1 2015
High Performing

To eliminate the typical VM I/O blender effect, circumvent the eventual consistency issue of Object Storage and boost storage performance, Open vStorage uses a write cache on fast flash or SSD in the Host, which works as a transaction log. The Storage Container Objects (SCOs) are sequentially filled with each new 4k block written by a Virtual Machine. This basically turns any random write I/O pattern into a sequential write operation. The transaction log immediately confirms the write to the hypervisor for fast response times. During each write, the address of the 4k block, its hash, the SCO number and the offset are stored as metadata. Open vStorage uses a Paxos-based distributed database to provide redundancy and immediate access to the metadata in case the volume is moved or failed over to another Host. To provide redundancy, all writes are mirrored to a Fail-Over Cache (FOC) on a second Host. The size of this Fail-Over Cache can be very small (a couple of MB per volume) because only the data that is not yet stored on the Object Store needs to be protected.

To improve read performance, Open vStorage uses a deduplicated read cache across all volumes hosted on the same hypervisor. When a read request comes in, Open vStorage looks up the hash in the metadata and, if it exists in the cache, serves the data directly from SSD or flash storage, resulting in very fast read operations. When thin clones are made, for example when multiple Virtual Machines are cloned from a master template, identical 4k blocks will have the same hash and will be stored only once in the read cache (dedupe), while the hypervisor sees them all as individual and independent volumes.

Conclusion

In the past year, on-site Object Stores (Ceph, OpenStack Swift, ...) have left their niche status and are fast becoming the de facto standard for scale-out, redundant storage.
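The deduplicated read cache idea can be sketched as a content-addressed map; the class, the SHA-1 hash choice and the in-memory dict are assumptions for illustration (in practice the cache lives on SSD or flash), not the actual implementation:

```python
import hashlib

class DedupReadCache:
    """Content-addressed read cache sketch: identical 4k blocks from
    cloned volumes share a single cache entry."""

    def __init__(self):
        self.cache = {}  # block hash -> block data

    @staticmethod
    def key(block):
        return hashlib.sha1(block).hexdigest()

    def put(self, block):
        self.cache[self.key(block)] = block  # stored once per unique block

    def get(self, digest):
        # A miss (None) would fall through to the Object Store back-end.
        return self.cache.get(digest)

cache = DedupReadCache()
template_block = b"\x00" * 4096
# Ten clones of a master template read the same block:
for _ in range(10):
    cache.put(template_block)
# the cache holds exactly one copy, yet serves all ten volumes
```

Because the cache is keyed on the block's hash rather than on the volume it belongs to, cloning a template costs no extra cache space until the clones start diverging.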
Running Virtual Machines on this type of storage does not come out of the box due to issues such as eventual consistency, high latency and limited bandwidth. Open vStorage is the solution to run Virtual Machines on top of these Object Stores. By using an architecture with caching on fast flash or SSD drives close to the hypervisor, transaction logs, and isolation of Virtual Machine volumes from other Virtual Machine files, Open vStorage turns an Object Store into a high performance, distributed, VM-centric storage platform which lowers the management overhead and offers features such as zero-copy snapshots, thin provisioning, bulk provisioning and quick restores.