
Enterprise Strategy Group | Getting to the bigger truth.

White Paper
Strong Consistency versus Weak Consistency: Why You Should Start from a Position of Strength

By Terri McClure, ESG Senior Analyst
August 2016

This ESG White Paper was commissioned by HGST, a Western Digital brand, and is distributed under license from ESG.

Contents

Building for Scale
Consistency: Why It Matters
Consistency Models
Consistency Considerations
Strong Consistency: What Is It and When Would You Use It?
Weak Consistency: What Is It and When Would You Use It?
The Bigger Truth

Building for Scale

We are seeing a fundamental shift in how we architect the storage environment for unstructured data. Unstructured data is the bulk of data under management: it is made up of documents, presentations, log files, images, videos, and all the other core bits and pieces that enable us to share information, communicate with one another and the outside world, and efficiently do our jobs. Much of this data needs to be retained for a long period of time: medical records for the life of the patient, and mortgage documents for the life of the mortgage plus another seven years, for instance. This data grows and builds until the storage systems (typically NAS arrays or Windows file servers) that contain it are bursting at the seams. Then we add another NAS array to the mix. And another. And another. Before you know it, hundreds of NAS arrays are spread throughout the organization, data is everywhere, and it takes an army of people just to manage it and ensure the data is protected and retained.

The problem certainly isn't going away on its own; it's getting worse. Advances in imaging and graphics technology mean that we are seeing very rich detail in our medical images, photos, videos, and PowerPoint presentations. But this detail comes at a price. For example, a 60-minute video recorded in standard definition (SD), uncompressed, consumes just under 90 GB of storage capacity; in high definition (HD), it consumes just under 440 GB. Because of advances such as these, the data growth challenge is accelerating.

The challenges of managing, retaining, and protecting large and growing data sets are driving more organizations to shift how they approach unstructured data storage and to look at scalable, efficient, long-term options. Active archives allow data to be retrieved for reuse (for example, medical images for diagnosis, or video clips for an outtake reel) yet stored in a highly efficient manner for a long period of time. Object storage technology was created to meet these needs, and there are clear benefits to deploying it to handle the data growth challenge.

The primary drivers for object storage adoption are all related in some way to easing the challenges associated with storing and managing ever-growing data sets: reduction in CapEx thanks to fewer systems under management and better density/storage efficiency, simplified management, reduction in OpEx, and use of object storage as a foundation for a private cloud, among other things (see Figure 1). In recent ESG research, only 5% of the 323 storage decision makers surveyed had no plans for or interest in object storage technology; the other 95% are investigating object storage, planning to deploy it, or have already deployed it.[1]

[1] Source: ESG Research Report, 2015 Data Storage Market Trends, October 2015.

Figure 1. Object Storage Adoption Drivers

To the best of your knowledge, which of the following factors are responsible for your organization's initial deployment or consideration of object storage technology? (Percent of respondents, N=305)

Factor | Most important factor | All factors
Reduction in capital expenditures | 13% | 49%
Simplified management of unstructured data | 15% | 48%
Reduction in operational expenditures | 12% | 46%
Foundation for cloud-based storage solution | 17% | 45%
Total cost of ownership (TCO) | 13% | 44%
Repository for data collected as part of BI/analytics initiatives | 11% | 42%
Improved regulatory compliance | 10% | 41%
Repository for archived data | 8% | 39%
Don't know | 1% | 1%

Source: Enterprise Strategy Group, 2016

These are compelling benefits. In fact, 45% of the organizations that have deployed, plan to deploy, or have an interest in object stores plan to significantly reduce their NAS footprint, while 25% plan to eliminate NAS completely![2]

Object storage solutions were designed to solve the challenges of protecting massive-capacity environments where traditional backup is often not an option. Object storage architectures provide robust protection by expecting multi-failure scenarios to be common occurrences. In any scale-out system, consistency of data across the nodes is a core consideration. Consistency is the guarantee that the data can be brought back to a valid state reflecting the most recently completed transaction. Within enterprises, this is normally done with hardware and software designed to work together to provide the highest level of consistency (strong consistency). In object stores, however, there are two consistency models: strong and weak.

Consistency: Why It Matters

Achieving availability at scale is a complex challenge that traditional, rigidly configured scale-up NAS systems simply cannot meet. To achieve cloud scale, large, object-based, multi-node, or clustered scale-out systems are a core architectural requirement, but balancing performance and availability across these clusters is a challenge. For example, the same piece of unstructured content may be stored in and accessed from multiple nodes, so the data in those nodes must maintain some level of consistency to return the most current data; a lack of consistency means running the risk of users obtaining, and acting on, stale or incorrect data.

[2] Ibid.

Two consistency models are in use today: weak consistency (sometimes referred to as eventual consistency) and strong consistency. Understanding the differences between these approaches can help IT administrators, architects, cloud builders, and executives decide which model may work best for their workloads, and whether data inconsistencies are acceptable for their particular applications.

Consistency Models

At a high level, weak consistency models are widely deployed in cloud infrastructures to manage the exponential growth of unstructured data when getting any data at all is more important than getting the most up-to-date data. Strong consistency models, which have historically been optimized for file systems and block storage to support operating system and database uses, are employed when having the most up-to-date and consistent information is essential for applications, such as financial transactions or statistics. You can use strong consistency and still get to cloud scale with the right system, but IT must take many elements into consideration when choosing which model to deploy.

Consistency Considerations

Data is in a consistent state when any changes to that data are written to all nodes where the data is housed, or, more specifically, all nodes where replicas (or erasure-coded data) exist. This is a nuance associated with scale-out systems (as opposed to more traditional scale-up systems). Any system with finite controller resources is a scale-up system and cannot scale out, whereas distributed systems in which controller resources are added with each node (along with memory and storage capacity) can scale out. Object stores are typically based on scale-out architectures, which add processing and storage capacity as systems grow; that is part of the secret sauce that allows object stores to scale. That's not to say they can't scale up by adding bulk capacity to a single node; they typically can do this as well.

The other part of the secret sauce is how the system manages storing and protecting data. In scale-out object stores, data is often protected by storing copies of data or segments of erasure code on multiple nodes so that if one node fails, the data can be retrieved from another node (in the case of exact copies) or rebuilt from the remaining erasure-coded data. Erasure coding is similar to RAID in that failure protection can be provided, using parity information, without a direct multiplication of the raw capacity. The difference is that erasure coding is more flexible and can be distributed across multiple drives, nodes, or even sites for multi-site failure survivability. Because massive content repositories align naturally with multi-site failure survivability, erasure coding saw its earliest implementations in object storage.

Erasure-coded schemes can provide the flexibility to protect against a large number of simultaneous failures, if desired, which may not be feasible with a replica-based protection approach. For example, HGST's Active Archive System, with its BitSpread technology, can tolerate up to five simultaneous failures with no data loss. It uses an advanced erasure code (18/5) that spreads the data across a wide variety of nodes to enhance durability (15 nines), availability (6 nines), and performance. This differs from Reed-Solomon erasure coding, often used in weak consistency models, which has a fixed structure that is difficult to grow and can cause hot spots that impact performance. Advanced erasure coding can take more CPU, but CPU is getting cheaper every day, and the tradeoff is a more flexible architecture that is dynamic and grows more easily. Of course, the more failures the system is designed to withstand, the larger the potential hardware investment, so it is important to understand how higher resiliency affects the cost of the solution and to balance expectations accordingly.

Many object stores that store multiple copies of data allow users to specify how many copies are required to meet availability requirements; typically, three is the recommended amount. This generally means that only three of the nodes in the cluster will have a copy of a particular object. When erasure code is employed, the data can be striped across many nodes and, depending upon the implementation, potentially all nodes.
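
To make the protection-versus-capacity tradeoff concrete, the short sketch below compares triple replication with an erasure-coded layout. It assumes, as one reasonable reading of the 18/5 figure above, that 18 shards are written per object and that any 13 of them are enough to reconstruct it; the function names and numbers are illustrative only, not a description of BitSpread internals.

# Illustrative arithmetic: failure tolerance and raw-capacity overhead of
# replica-based protection vs. an erasure-coded layout. Assumption: an
# "n/f" scheme writes n shards and tolerates f simultaneous shard losses,
# i.e., any n - f shards can rebuild the object. Not vendor code.

def replication(copies: int) -> dict:
    # Full copies: survives copies - 1 failures, costs copies x raw capacity.
    return {"tolerated_failures": copies - 1, "raw_per_usable": float(copies)}

def erasure_code(total_shards: int, tolerated_losses: int) -> dict:
    # n shards in total; any (n - f) shards are sufficient to reconstruct.
    data_shards = total_shards - tolerated_losses
    return {"tolerated_failures": tolerated_losses,
            "raw_per_usable": total_shards / data_shards}

print("3-copy replication:", replication(3))
print("18/5 erasure code: ", erasure_code(18, 5))
# Roughly 1.38x raw capacity per usable byte (18/13) while tolerating five
# simultaneous failures, vs. 3.0x and two failures for triple replication.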

Solutions such as HGST's Active Archive System with BitSpread technology will distribute the code across as many racks, nodes, shelves, and drives as possible, helping to reduce latency for rebuilds and enabling strong consistency at scale. For both of these models, multiple disks or complete nodes can fail while data remains available, which is why multi-node scale-out systems are ideal for meeting the cost, scale, and availability requirements of public and private cloud environments.

Real-time analytics, which is often based on unstructured data, is a great case for strong consistency. However, the highest level of consistency is not always necessary. Realizing a consistent state for each transaction, as strong consistency requires, can be more challenging in widely dispersed, multi-site configurations: strong consistency adds latency, and that latency grows with geographical distance. For example, operating systems need consistency and databases with transactions need strong consistency, but backup files, log files, and unstructured data operations that can be batched and do not require real-time data do not need that same level of consistency. This situation has led to the emergence of weak consistency models, which allow storage systems to scale geographically while still meeting generally high performance and availability requirements.

Strong Consistency: What Is It and When Would You Use It?

Strong consistency means that data is written to all nodes for the purposes of protection and that reads are guaranteed to return the most recent data, regardless of which node delivers it. In this environment, all nodes must be queried to ensure that all updates have been written to all nodes and that the read is returning the most recent copy. In other words, strong consistency guarantees that the database is in a consistent state when a transaction is finished and before the next transaction can be handled. This is how most data center storage arrays and POSIX-compliant file systems support business and back-office applications, as well as virtual machines, which absolutely must return the most recent data and maintain transactional consistency across all copies. For example, when removing money from one bank account and adding it to another, the total balance across both accounts should be the same as it was before the transactions. The strong consistency model is required for real-time systems.

Strong consistency offers a positive user experience when having the most up-to-date data is paramount, as in financial transactions or other relational-database types of applications. However, strong consistency may imply a compromise on the scalability and performance of the system.
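
As a minimal sketch of the write-all, read-latest behavior described above (an in-memory toy model under those assumptions, not any particular product's protocol), the code below acknowledges a write only after every replica has applied it, so a read that consults the replicas always returns the most recent version.

# Toy model of strong consistency: a put is acknowledged only after ALL
# replicas apply it, so any subsequent read is guaranteed to see the
# latest version. Illustrative only.

from dataclasses import dataclass, field
from typing import Dict, Tuple

@dataclass
class Replica:
    # key -> (version, value)
    store: Dict[str, Tuple[int, bytes]] = field(default_factory=dict)

    def apply(self, key: str, version: int, value: bytes) -> None:
        self.store[key] = (version, value)

    def get(self, key: str) -> Tuple[int, bytes]:
        return self.store.get(key, (0, b""))

class StronglyConsistentStore:
    def __init__(self, replica_count: int = 3):
        self.replicas = [Replica() for _ in range(replica_count)]
        self.versions: Dict[str, int] = {}

    def put(self, key: str, value: bytes) -> None:
        version = self.versions.get(key, 0) + 1
        for replica in self.replicas:        # synchronous write to every node
            replica.apply(key, version, value)
        self.versions[key] = version         # acknowledge only now

    def get(self, key: str) -> bytes:
        # Consult the replicas and return the highest-versioned copy.
        version, value = max(r.get(key) for r in self.replicas)
        return value

store = StronglyConsistentStore()
store.put("account:42", b"balance=100")
store.put("account:42", b"balance=80")
assert store.get("account:42") == b"balance=80"   # never a stale read
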
Weak Consistency: What Is It and When Would You Use It?

Weak consistency changes the rules slightly. In this model, when data is written across multiple nodes for data protection purposes, a read returns the version it finds first, whether or not it is the most recent version. The weak consistency model guarantees that the system will eventually become consistent, with the most up-to-date version of the data on all copies; in the meantime, however, reads are not held until the cluster reaches a consistent state. Weak consistency only matters when you need to change something (e.g., overwrite all or part of a piece of data), so it can be very effective for certain data types.

As with the strong consistency model, certain use cases make weak consistency a good design choice: it optimizes other factors, such as performance and scalability, which are particularly important for massive, highly distributed infrastructures with lots of unstructured data serving global sites. Examples include follower lists on Twitter, friend lists on Facebook, and log data for scientific applications. In general, any use case in which providing an answer is more important than providing the most up-to-date answer can be served well by weak consistency. If the users of an application will not notice (user-perceived consistency) or care whether updates are reflected consistently at all times, then weak consistency could make sense for its read performance benefits and scalability. But in cases where stale data could present a problem, such as financial transactions or other systems that require real-time data, the tradeoff for scale or performance introduces business risk.
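
For contrast, here is an equally simplified sketch of the weak (eventual) model: the write is acknowledged after a single replica is updated, a background anti-entropy pass later copies the newest version to the rest, and a read returns whatever the chosen replica holds, which may be stale until the cluster converges. Again, this is an illustrative toy under those assumptions, not a description of any specific object store.

# Toy model of weak (eventual) consistency: writes are acknowledged after
# updating one replica; reads hit an arbitrary replica and may return stale
# data until the anti-entropy pass propagates the newest version.

import random
from typing import Dict, List, Tuple

class EventuallyConsistentStore:
    def __init__(self, replica_count: int = 3):
        self.replicas: List[Dict[str, Tuple[int, bytes]]] = [{} for _ in range(replica_count)]
        self.version = 0

    def put(self, key: str, value: bytes) -> None:
        self.version += 1
        self.replicas[0][key] = (self.version, value)   # acknowledged immediately

    def get(self, key: str) -> bytes:
        replica = random.choice(self.replicas)          # may not have the write yet
        return replica.get(key, (0, b""))[1]

    def anti_entropy(self) -> None:
        # Background convergence: propagate the newest version of every key.
        for key in {k for r in self.replicas for k in r}:
            newest = max(r.get(key, (0, b"")) for r in self.replicas)
            for r in self.replicas:
                r[key] = newest

store = EventuallyConsistentStore()
store.put("profile:jane", b"followers=1001")
print(store.get("profile:jane"))   # may print b'' (stale) or the new value
store.anti_entropy()
print(store.get("profile:jane"))   # after convergence, always the new value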

The Bigger Truth

Storage strategies have become critically important as users struggle to keep up with the exponentially increasing speed of unstructured data growth. Deploying the right storage model with the right consistency model is key to transforming the storage infrastructure to meet today's needs: it is how IT can help its company achieve new levels of scale and flexibility at a much better cost point than using traditional NAS or SAN storage. But it is no longer necessarily a choice between scale, performance, and consistency. Solutions such as HGST's Active Archive System enable strong consistency at cloud scale.

All trademark names are property of their respective companies. Information contained in this publication has been obtained by sources The Enterprise Strategy Group (ESG) considers to be reliable but is not warranted by ESG. This publication may contain opinions of ESG, which are subject to change from time to time. This publication is copyrighted by The Enterprise Strategy Group, Inc. Any reproduction or redistribution of this publication, in whole or in part, whether in hard-copy format, electronically, or otherwise to persons not authorized to receive it, without the express consent of The Enterprise Strategy Group, Inc., is in violation of U.S. copyright law and will be subject to an action for civil damages and, if applicable, criminal prosecution. Should you have any questions, please contact ESG Client Relations at 508.482.0188.

Enterprise Strategy Group is an IT analyst, research, validation, and strategy firm that provides actionable insight and intelligence to the global IT community.

© 2016 by The Enterprise Strategy Group, Inc. All Rights Reserved.
www.esg-global.com | contact@esg-global.com | P. 508.482.0188