Scale-Out Architectures for Secondary Storage

Similar documents
Data Protection for Virtualized Environments

Scale-out Object Store for PB/hr Backups and Long Term Archive April 24, 2014

Milestone Solution Partner IT Infrastructure Components Certification Report

Using Self-Protecting Storage to Lower Backup TCO

INTEL NEXT GENERATION TECHNOLOGY - POWERING NEW PERFORMANCE LEVELS

How Architecture Design Can Lower Hyperconverged Infrastructure (HCI) Total Cost of Ownership (TCO)

De-dupe: It s not a question of if, rather where and when! What to Look for and What to Avoid

NEC Sets High Bar in Scale-Out Secondary Storage Market

Maximizing Data Efficiency: Benefits of Global Deduplication

LATEST INTEL TECHNOLOGIES POWER NEW PERFORMANCE LEVELS ON VMWARE VSAN

ECONOMICAL, STORAGE PURPOSE-BUILT FOR THE EMERGING DATA CENTERS. By George Crump

Technology Insight Series

Understanding Virtual System Data Protection

Business Benefits of Policy Based Data De-Duplication Data Footprint Reduction with Quality of Service (QoS) for Data Protection

VMware vsan Ready Nodes

Oracle s Engineered Systems Approach to Maximizing Database Protection

Scale-out Data Deduplication Architecture

powered by Cloudian and Veritas

HPE SimpliVity. The new powerhouse in hyperconvergence. Boštjan Dolinar HPE. Maribor Lancom

Hedvig as backup target for Veeam

Evaluator Group Inc. Executive Editor: Randy Kerns

Nutanix Tech Note. Virtualizing Microsoft Applications on Web-Scale Infrastructure

NEC Express5800 R320f Fault Tolerant Servers & NEC ExpressCluster Software

White Paper Simplified Backup and Reliable Recovery

IBM ProtecTIER and Netbackup OpenStorage (OST)

The Data-Protection Playbook for All-flash Storage KEY CONSIDERATIONS FOR FLASH-OPTIMIZED DATA PROTECTION

Software Defined Storage for the Evolving Data Center

SoftNAS Cloud Data Management Products for AWS Add Breakthrough NAS Performance, Protection, Flexibility

Discover the all-flash storage company for the on-demand world

Disclaimer This presentation may contain product features that are currently under development. This overview of new technology represents no commitme

The storage challenges of virtualized environments

Preserving the World s Most Important Data. Yours. SYSTEMS AT-A-GLANCE: KEY FEATURES AND BENEFITS

7 Things ISVs Must Know About Virtualization

Protect enterprise data, achieve long-term data retention

Using Cohesity with Amazon Web Services (AWS)

WHY DO I NEED FALCONSTOR OPTIMIZED BACKUP & DEDUPLICATION?

DATA CENTRE SOLUTIONS

Boost your data protection with NetApp + Veeam. Schahin Golshani Technical Partner Enablement Manager, MENA

SolidFire and Ceph Architectural Comparison

High performance and functionality

IBM TS7700 grid solutions for business continuity

HPE MSA 2042 Storage. Data sheet

Simple Data Protection for the Cloud Era

Renovating your storage infrastructure for Cloud era

Hyper-Converged Infrastructure: Providing New Opportunities for Improved Availability

Opendedupe & Veritas NetBackup ARCHITECTURE OVERVIEW AND USE CASES

HPE Nimble Storage HF20 Adaptive Dual Controller 10GBASE-T 2-port Configure-to-order Base Array (Q8H72A)

Technology Insight Series

See what s new: Data Domain Global Deduplication Array, DD Boost and more. Copyright 2010 EMC Corporation. All rights reserved.

Maximizing Availability With Hyper-Converged Infrastructure

VMworld 2018 Content: Not for publication or distribution

New HPE 3PAR StoreServ 8000 and series Optimized for Flash

CONFIGURATION GUIDE WHITE PAPER JULY ActiveScale. Family Configuration Guide

Why Datrium DVX is Best for VDI

Flash Storage-based Data Protection with HPE

Cohesity Flash Protect for Pure FlashBlade: Simple, Scalable Data Protection

SolidFire and AltaVault

EMC DATA DOMAIN PRODUCT OvERvIEW

The Business Case to deploy NetBackup Appliances and Deduplication

IM B32: What s New in NetBackup: The Vision and Roadmap

Today s trends in the storage world. Jacint Juhasz Storage Infrastructure Architect

ADVANCED DEDUPLICATION CONCEPTS. Thomas Rivera, BlueArc Gene Nagle, Exar

IBM Real-time Compression and ProtecTIER Deduplication

EMC Virtual Infrastructure for Microsoft Applications Data Center Solution

IBM Storwize V7000, Storwize V5000 and IBM Storwize V5000F

Hyper-Convergence De-mystified. Francis O Haire Group Technology Director

ADVANCED DATA REDUCTION CONCEPTS

Hyper-converged Secondary Storage for Backup with Deduplication Q & A. The impact of data deduplication on the backup process

Veeam with Cohesity Data Platform

Modernizing Virtual Infrastructures Using VxRack FLEX with ScaleIO

Hitachi Virtual Storage Platform Family: Virtualize and Consolidate Storage Management

HYDRAstor: a Scalable Secondary Storage

SOLUTION BRIEF Fulfill the promise of the cloud

Data Domain OpenStorage Primer

Veritas NetBackup Appliance Family OVERVIEW BROCHURE

HPE SimpliVity 380. Simplyfying Hybrid IT with HPE Wolfgang Privas Storage Category Manager

LTO and Magnetic Tapes Lively Roadmap With a roadmap like this, how can tape be dead?

Copyright 2010 EMC Corporation. Do not Copy - All Rights Reserved.

IBM TS4300 with IBM Spectrum Storage - The Perfect Match -

NetApp HCI. Ready For Next. Enterprise-Scale Hyper Converged Infrastructure VMworld 2017 Content: Not for publication Gabriel Chapman: Sr. Mgr. - NetA

HOW DATA DEDUPLICATION WORKS A WHITE PAPER

Ten things hyperconvergence can do for you

Vision of the Software Defined Data Center (SDDC)

Verron Martina vspecialist. Copyright 2012 EMC Corporation. All rights reserved.

Microsoft Azure StorSimple Hybrid Cloud Storage. Manu Aery, Raju S

(TBD GB/hour) was validated by ESG Lab

MODERNISE WITH ALL-FLASH. Intel Inside. Powerful Data Centre Outside.

Dell EMC All-Flash solutions are powered by Intel Xeon processors. Learn more at DellEMC.com/All-Flash

Dell Storage Point of View: Optimize your data everywhere

EMC Data Protection for Microsoft

Modernize Your Backup and DR Using Actifio in AWS

1 Quantum Corporation 1

HCI File Services Powered by ONTAP Select

EMC DATA DOMAIN OPERATING SYSTEM

Veeam Availability Solution for Cisco UCS: Designed for Virtualized Environments. Solution Overview Cisco Public

Solution Brief. Bridging the Infrastructure Gap for Unstructured Data with Object Storage. 89 Fifth Avenue, 7th Floor. New York, NY 10003

Cohesity extends its reach deeper into the broad market for

Nutanix White Paper. Hyper-Converged Infrastructure for Enterprise Applications. Version 1.0 March Enterprise Applications on Nutanix

The Data Protection Rule and Hybrid Cloud Backup

VMware vsan 6.6. Licensing Guide. Revised May 2017

Transcription:

Technology Insight Paper Scale-Out Architectures for Secondary Storage NEC is a Pioneer with HYDRAstor By Steve Scully, Sr. Analyst February 2018

Scale-Out Architectures for Secondary Storage 1

Scale-Out Architectures for Secondary Storage 2 IT organizations have seen explosive growth in the amount of data for several years. Forecasts are for that growth to continue at a rapid pace and even accelerate for organizations where the deluge of data from next generation applications such as rich media or IoT networks is just beginning to have an impact. All this growth puts pressure on storage resources, IT budgets and on the delivery of IT services including data protection. This pressure in turn is driving organizations to re-evaluate various aspects of their IT environment including data protection strategies. In addition to the continued growth of data, other industry trends - like server virtualization, all-flash primary storage, hyperconverged and hybrid cloud infrastructures - puts pressure on storage administrators charged with protecting the enterprise s data to continually monitor and reevaluate their processes and supporting infrastructure. One consequence of this growth in overall data is the effect on growth of secondary data. Industry estimates are that three to four times as much storage is used for secondary copies of primary data. Many IT organizations have reached a point where their current data protection plans are not working. Because of this, Evaluator Group believes we are entering a period of change regarding data protection processes and solutions. Unlike previous waves driven by a technology change say the introduction of deduplication backup appliances - this period of change is driven by a combination of factors from inside and outside the IT organization and supported by technology advancements that offer new ways of solving data protection challenges. Secondary Storage Features and Benefits One of the most important technology advancements to improve data protection strategies is the development of secondary storage systems, especially those using scale-out or grid architectures. This solution segment has attracted several vendors from interesting startups to established data protection vendors, each offering a converged appliance with storage software running on scale-out commodity hardware. Secondary storage systems allow IT organizations to re-evaluate multiple aspects of their data protection strategies and develop new approaches that work best in their environment. A secondary storage system provides a common platform to consolidate a variety of secondary workloads including backup, archive, disaster recovery, and content management into one place, eliminating multiple independent silos of storage that currently exist in many IT environments. They can offload certain applications and data from primary storage to optimize performance of that tier, especially useful with all-flash systems. These systems apply storage optimization techniques such as compression and deduplication across all the different data types for improved efficiencies over standalone storage silos.

Scale-Out Architectures for Secondary Storage 3 Enterprise backup solutions are the most common deployments we see customers using scale-out secondary storage systems for. As mentioned above, they are also used for archiving, hierarchical storage, content management, and meeting compliance requirements. As such, they are commonly used to store the increasing types and sizes of today s unstructured files documents, images, videos, files, etc. There are a number of defining features and capabilities in a scale-out, secondary storage system. Some of the more important characteristics we look for include: Scalability. This is a given since we are discussing scale-out secondary storage. There are several aspects to scalability that need to be considered. The first is to be able to scale the overall system capacity into the multi-pb range. The second is the granularity of scaling with system nodes that scale in increments that can meet application and budget requirements. Finally, the ability to grow the cluster and refresh the technology over time must be transparent and nondisruptive to the operation of the system. Performance. A secondary storage system also needs to deliver high performance in the range of PB per hour for ingest of data from a variety of applications and for access to the stored data. The performance should scale linearly with the number of nodes in the cluster. Resiliency. Since one of the main goals is to consolidate data into one environment, secondary storage systems must provide a high degree of internal data resiliency which doesn t impact the performance or capacity of the system as the cluster grows. They also need to provide system resiliency features such as snapshots and replication to Data Reduction. The system needs to make use of various data reduction techniques to make it cost effective at multi-pb scale, preferably implemented inline while the data is stored. These techniques typically include data deduplication and compression to reduce the amount of data stored while the use of thin provisioning can also lower the configured storage needed. Data deduplication needs to be global across all the data in the cluster to improve its efficacy. Scalable File System. Another key component of the scale-out secondary storage system is the use of a distributed file system that contains the stored data and scales in capacity and performance. The ability to provide a single name space at multi-pb scale is a significant advantage for the overall efficiency of the solution. Commodity hardware. Industry-standard servers with commodity components (like HDDs) are typically used by vendors of secondary storage systems to lower the costs of storage. This use of commodity hardware also allows customers to more quickly take advantage of the latest hardware advancements. Cloud Integration. Properly executed, a large secondary storage system can be viewed as a private cloud for an organization s secondary data. Increasingly, customers are looking for

Scale-Out Architectures for Secondary Storage 4 secondary storage systems that have the ability to integrate with public cloud resources, typically as a tier of storage for the secondary system or for use in data movability. In addition to the defining characteristics, there are a variety of additional features offered in some secondary storage systems which can be requirements for certain applications. These can include capabilities to satisfy compliance requirements like data encryption, WORM, data shredding and legal holds. They can include data management capabilities such as search and indexing. They can also include deeper integration with backup and archiving applications that leverage the capabilities of the application. A few vendors bundle backup software into their solution to offer what they refer to as a hyperconverged, scale-out data protection solution. Successful systems will have a broad range of partner and ISV support for a variety of secondary storage applications. The use of a scale-out secondary storage system as part of an updated data protection strategy provides a number of benefits to IT organizations. Consolidating any separate, secondary storage silos into a single platform is an important step in managing data and optimizing storage usage. Consolidation improves the efficacy in data reduction techniques like deduplication and copy data management. Overall, secondary storage systems provide ways to do more with less. Some of the more significant benefits that come from implementing secondary storage include: Simplified IT infrastructure. Standardizing on and consolidating to one scale-out secondary storage solution will simplify the IT infrastructure and improve storage utilization for many IT organizations. Leveraging a scale-out architecture also makes the system simpler to deploy, manage, and grow, improving the user experience. Reducing the number of silos can improve compliance with regulatory requirements such as GDPR. Lower costs. The overall costs of scale-out secondary storage systems are lower when compared to primary storage costs or the costs of maintaining multiple storage silos. In addition to using a variety of data reduction techniques, the use of commodity systems with lower costs components is typical. Single image cluster scalability is a critical determinant of storage resource cost efficiency. Improved Data Management. Value comes with access to information and access can be improved when more of an organization s information is co-located because data in individual silos is not easily inventoried and managed. Less duplicate data in the environment since duplication can be global. Provides for common tools and consistent policies across different types of secondary data which support a transition to data management activities like Copy Data Management (CDM). Long System Life. A scale-out architecture can provide a long-term storage system, lasting for many years as the one system continuously evolves throughout the life of the data store. There are no forced refresh cycles due to technology obsolescence and when appropriate technology

Scale-Out Architectures for Secondary Storage 5 refreshes happen non-disruptively. Especially when they scale to multi-pb capacities, this architecture allows for decades of growth without requiring specific migration activities to keep them current. NEC HYDRAstor in a Modern Data Protection Environment IT organizations are looking for new approaches to address long-standing data protection challenges for their changing IT environments. We believe that a modern approach to enterprise data protection should consider several available technologies including scale-out secondary storage. Here we offer a review of one such system - HYDRAstor from NEC as part of an updated approach to data protection. NEC HYDRAstor Essentials NEC HYDRAstor is an on-premise, scale-out, secondary storage system in line with the definitions from the first section of this Technical Insight. NEC is one of the pioneers in scale-out secondary storage having launched the first version of HYDRAstor in early 2007. NEC has been growing their solution for more than a decade and is currently shipping the fifth generation of HYDRAstor. Designed to run on industry standard Linux server clusters, HYDRAstor can be implemented as a virtual appliance, a preconfigured single node appliance (HS3 model), or as a scale-out cluster of pre-configured nodes (HS8 model). The remainder of this Technical Insight focuses on the scale-out capabilities of the HS8 model of HYDRAstor. HYDRAstor Scale-Out Storage HYDRAstor systems are scaled out using Hybrid Nodes that provide connectivity, performance and capacity, or scaled up using Storage Nodes that provide additional capacity. This allows scaling of performance and capacity together, or capacity alone if that is all that is required. Each node ships with full capacity using 12 x 6TB 3.5 SATA drives with appropriate systems resources (CPUs, memory, NICs, etc.) for the node s purpose. These nodes are connected via a grid architecture that creates a matrix with no single point of failure in either the nodes or in the grid. IT administrators can start with as little as one physical node and can scale to a total of 165 Hybrid and Storage nodes. Cluster nodes of differing configurations and capacities are added non-disruptively to form the multi-pb repository. A distributed grid file system and data services present as a common pool of storage and provides resiliency without imposing significant capacity or performance overhead. At scale, HYDRAstor provides industry leading ingest of deduped data at 5.2 PB/hour and an overall effective deduped capacity of 154 PB. Transparent, non-disruptive upgrades with three generations of nodes coexisting in the same cluster allows the HYDRAstor system to last for decades. The system automatically load balances across the

Scale-Out Architectures for Secondary Storage 6 nodes and ports as new nodes are added to the cluster. Once rebalanced, the overall performance scales linearly with the number of nodes. Evaluator Group Comments: The ability to scale capacity and performance up to 11.88PB of raw capacity, and over 154PB of de-duped capacity is significant. Further, the performance of nearly 5.2PB/hr. of deduped ingest for the HS8 is also significant. For many IT organizations, a single HYDRAstor cluster can provide for all their secondary storage needs. Data Resiliency at Scale The NEC HYDRAstor derives its data resiliency capabilities from its grid architecture, coupled with the HYDRAstor software elements. The replaceable, multi-node, shared nothing approach of HYDRAstor is able to tolerate multiple hardware failures without loss of either availability or data. NEC was an early adopter of erasure coding to provide internal data protection against device or node failures in the system known as Distributed Resilient Data (DRD). For systems of this scale, erasure coding provides greater reliability with less overhead than traditional RAID. Data is dispersed across the cluster and IT administrators can select from multiple erasure coding configurations depending on their resiliency requirements. HYDRAstor can recover from a drive failure faster since only the data is rebuilt and not the entire drive while the rebuild also happens across multiple drives. HYDRAstor also provides point in time snapshots to protect against user errors and accidental deletion, as well as asynchronous remote copy (optional RepliGrid software) to provide disaster recovery capabilities. The snapshots offer read-only and read-write capability and are implemented using a redirect on write technique. Unlike some implementations, HYDRAstor attempts to keep the original pointers and any changes near each other, in order to maximize streaming read/write performance. Data Reduction at Scale HYDRAstor s DynamicStor feature virtualizes all available storage resources into a common shared pool and dynamically allocates storage capacity as needed when writing data to the system. DynamicStor eliminates the task of provisioning, automatically allocates storage as needed and balances incoming data across all the storage resources within the grid. Through this dynamic and adaptive resource sharing, HYDRAstor ensures maximum capacity utilization. HYDRAstor implements variable-sized block, inline, hash-based data deduplication known as DataRedux. This application-aware data deduplication occurs globally, across all nodes within a grid and across all data types in the system. HYDRAstor also provides various compression algorithms that compress data as it s stored. The overall impact of these data reduction capabilities means that HYDRAstor s effective

Scale-Out Architectures for Secondary Storage 7 capacity is 20x that of its raw capacity. HYDRAstor also uses the deduped and compressed data when replicating to another system to reduce the amount of data and networking resources needed. Evaluator Group Comments: The design of the HYDRAstor hardware and software components represent a well thought out, scalable approach to long-term data storage. The use of DRD provides the necessary resiliency without imposing significant capacity or performance overhead as the system scales out. The DataRedux design provides high scalability by distributing the data reduction tasks among all nodes in a grid. Optional Software Capabilities NEC offers several optional capabilities for the HYDRAstor system to meet various data protection and data requirements. Some of the highlighted software options include: HYDRAlock provides compliance and security capabilities including enterprise and compliance WORM, data shredding, and in-flight and at-rest data encryption including secure and reliable key management. RepliGrid provides WAN-optimized remote replication between multiple HYDRAstor grids for disaster recovery. Only unique, deduplicated and compressed data is encrypted in-flight and replicated to the destination site. Advanced Data Services supports Universal Express I/O and Universal Deduped Transfer which offer more efficient data movement when using any Linux-based backup solution. HYDRAstor OpenStorage Suite enhances NetBackup integrations with source-side deduplication, express data transfers, adaptive load balancing across nodes, storage-synthesized full backups and WAN-optimized copy services and Auto Image Replication. Evaluator Group Comments: HYDRAstor s OST interface is not unique for a disk backup product, however, its capability to load balance across multiple nodes and ports is unique. When coupled with load balancing across multiple accelerator nodes, the NEC HYDRAstor should provide very high performance along with high reliability. Cloud Integration The HYDRAstor scales to one of the highest capacity systems available in the market, which reduces the need to use public cloud resources for many organizations. However, many IT organizations are looking for IT solutions that integrate with public clouds even if they don t currently need the capabilities. The current version of HYDRAstor supports cloud-ready data through Amazon S3 and OpenSwift compatible APIs and the availability of the virtual appliance version of HYDRAstor provides deployment flexibility for the future.

Scale-Out Architectures for Secondary Storage 8 Conclusion We have positioned the use of a scale-out secondary storage systems as the important component for an updated data protection strategy. In addition, it can be a foundation for a transition to a higher-level data management strategy. The HYDRAstor product is designed to accommodate large capacity environments using a scale out architecture. With support for up to 165 nodes, this degree of scale goes far beyond what many competing systems and architectures are able to support. With current systems supporting 154 PB of effective capacity and 5.2PB/hour of backup speed, scale, capacity and performance should not be a concern for any anticipated use case. The data protection strength of NEC s Distributed Resilient Data, coupled with thin provisioning and data deduplication as well as the capability to load balance OST backup jobs across multiple nodes, all provide evidence of a robust, scalable, highly capable scale-out secondary storage solution. Optional capabilities for WAN-optimized replication, data encryption, WORM and data shredding enhance the functionality. Another strength of HYDRAstor is its ease of use and configuration. NEC has continued to refine their management interface for the product, which is an important consideration in the data protection and archiving appliance market. By intelligently linking multiple components into a single grid, management duties are also reduced with the ability to manage one logical system. About Evaluator Group Evaluator Group Inc. is dedicated to helping IT professionals and vendors create and implement strategies that make the most of the value of their storage and digital information. Evaluator Group services deliver in-depth, unbiased analysis on storage architectures, infrastructures and management for IT professionals. Since 1997 Evaluator Group has provided services for thousands of end users and vendor professionals through product and market evaluations, competitive analysis and education. www.evaluatorgroup.com Follow us on Twitter @evaluator_group Copyright 2018 Evaluator Group, Inc. All rights reserved. No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying and recording, or stored in a database or retrieval system for any purpose without the express written consent of Evaluator Group Inc. The information contained in this document is subject to change without notice. Evaluator Group assumes no responsibility for errors or omissions. Evaluator Group makes no expressed or implied warranties in this document relating to the use or operation of the products described herein. In no event shall Evaluator Group be liable for any indirect, special, inconsequential or incidental damages arising out of or associated with any aspect of this publication, even if advised of the possibility of such damages. The Evaluator Series is a trademark of Evaluator Group, Inc. All other trademarks are the property of their respective companies. This document was developed with NEC funding.