Opendedupe & Veritas NetBackup ARCHITECTURE OVERVIEW AND USE CASES


May 2017

Contents

Introduction
Overview
Architecture
    SDFS File System Service
    Data Writes
    Data Reads
    De-duplication Storage Engine
    Data Blocks
    Cloud Storage of Data Blocks
Reference Configurations
    Standalone Opendedupe System
Customer Pain Points/Business Challenges
Applications
    Case 1: Backup and Replication
    Case 2: Tape Elimination
    Case 3: Migrate Data from EMC Data Domain to an Alternate Object Store
    Case 4: Long-Term Archival to Lower Storage Costs
    Case 5: Zero Data Movement, A.I.R-Based Cloud DR Recovery of On-Premise Servers
    Case 6: Optimized Replication Between Different Cloud Storage Targets
    Case 7: NetBackup in the Cloud
Conclusion

E: info@ T: 1-800-526-2821

Opendedupe & Veritas NetBackup Whitepaper

Introduction

Opendedupe represents an opportunity for enterprises to optimize storage usage and protect large amounts of data. As an open source product it carries no capital cost, and customers who adopt Opendedupe have the option to purchase enterprise-grade 24/7 support. Opendedupe is an excellent fit for backup and long-term archival retention, whether on-premise, off-premise (in the cloud), or a combination of the two. It can store data either locally on-site or off-site in the cloud, while using the smallest amount of actual storage possible.

OVERVIEW

Opendedupe originated in 2010 and comprises two components: the SDFS filesystem and the SDVOL volume manager. SDFS performs inline de-duplication and provides expandability and flexibility on either local or cloud storage (typically via the standard S3 protocol used by Amazon, Google, Azure, and others), while SDVOL is a distributed, expandable volume manager that provides inline de-duplication and replication to any filesystem. SDFS and SDVOL can be deployed in either a standalone or a distributed, multi-node configuration. A standalone configuration provides inline de-duplication, replication, and unlimited snapshots. Multi-node deployments add global, intra-volume de-duplication and configurable block storage redundancy with block storage expandability. A unique and powerful feature of Opendedupe is that it stores a copy of its hash and metadata lookup information alongside the de-duplicated data, in the cloud or in local storage.

ARCHITECTURE

SDFS's design decouples block data from file metadata, allowing any number of logical files to reference the same unique data block. A data block has no knowledge of which files reference it or where those files are located. The metadata contains a hash that is associated with the logical location within a file.
As data is de-duplicated and shared between volumes, I/O is significantly reduced both across the network and on the system. SDFS has three basic components:

SDFS File System Service (FSS)
De-duplication Storage Engine (DSE)
Data Chunks
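The decoupling described above can be illustrated with a minimal content-addressed store. This is a toy sketch, not SDFS code: files hold only an ordered list of block hashes (playing the role of the mapping file), while the unique block data lives in a separate table (playing the role of the DSE), so two files that share content store each shared block once.

```python
import hashlib

class DedupStore:
    """Toy content-addressed store: block data is decoupled from file metadata."""
    def __init__(self):
        self.blocks = {}   # hash -> unique block data (the DSE's role)
        self.files = {}    # filename -> ordered list of block hashes (mapping file's role)

    def write(self, name, data, chunk_size=4):
        hashes = []
        for i in range(0, len(data), chunk_size):
            chunk = data[i:i + chunk_size]
            h = hashlib.sha256(chunk).hexdigest()
            self.blocks.setdefault(h, chunk)   # store the block only if unseen
            hashes.append(h)                   # every file still references it
        self.files[name] = hashes

    def read(self, name):
        return b"".join(self.blocks[h] for h in self.files[name])

store = DedupStore()
store.write("a.bak", b"AAAABBBBCCCC")
store.write("b.bak", b"AAAABBBBDDDD")   # shares two of three chunks with a.bak
print(len(store.blocks))                # 4 unique blocks stored instead of 6
```

Six logical chunks were written, but only four unique blocks exist, and neither block knows which of the two files reference it.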

Figure 1 - Read/Write flow from application layer to local or S3 bucket storage

SDFS File System Service

The SDFS filesystem provides a POSIX-compliant view of the de-duplicated data and is a logical container responsible for all filesystem-level activity (chunking data into blocks for de-duplication, filesystem statistics, snapshots). The SDFS File System Service stores the metadata for files and folders, plus the mapping file for the actual file data. The actual data blocks, or chunks, are stored within the local De-duplication Storage Engine or, in a multi-node deployment, a node's DSE. An SDFS logical file is represented by two metadata components, held in two files. The first represents the filesystem namespace presented when the filesystem is mounted; it contains the filesystem attributes associated with the file (e.g. size, atime, ctime, ACLs) and links to the associated map file. The second is the mapping file, which contains the list of records corresponding to the locations of the data blocks that make up the file. Each record contains the hash entry, whether the data is a duplicate, whether it is stored on remote nodes, and which nodes hold the data.

Data Writes

A data write to the SDFS filesystem is first passed to the filesystem process from the kernel via the FUSE library. SDFS takes the data from the FUSE layer API, breaks it into fixed-size chunks, and immediately caches the chunks per file in a FIFO buffer for active I/O. Chunks are expired from the FIFO buffer as new data enters, or after two seconds. Expired chunks move to a flushing buffer, which is emptied by a configurable number of write threads. Each thread computes the hash for a data block, searches to see whether that hash (and therefore the data) has already been stored, stores the block if it is new, and updates the record associated with the data block in the mapping file.
Data Reads

When a read request is made for data on the SDFS filesystem, the file position and data length are passed via the FUSE layer to the SDFS application. The record(s) associated with the file's position and length are looked up in the mapping file, and the relevant block data is recalled from local storage by its hash.
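The read path can be sketched the same way. In this illustrative toy (again, not SDFS code), a mapping "file" is a list of `(offset, hash)` records; a read resolves a `(position, length)` request to the records that overlap it, recalls each block by hash, and slices out the requested bytes.

```python
import hashlib

CHUNK = 4
blocks = {}     # hash -> chunk data (the DSE's role)
records = []    # mapping file: ordered (file_offset, hash) records

# Populate the store with one 12-byte file split into three chunks.
data = b"ABCDEFGHIJKL"
for i in range(0, len(data), CHUNK):
    h = hashlib.sha256(data[i:i + CHUNK]).hexdigest()
    blocks[h] = data[i:i + CHUNK]
    records.append((i, h))

def read(position, length):
    """Resolve a (position, length) request via the mapping records."""
    out = bytearray()
    for off, h in records:
        if off + CHUNK <= position or off >= position + length:
            continue                       # record lies outside the requested range
        chunk = blocks[h]                  # recall the block data by its hash
        lo = max(position - off, 0)        # trim to the requested byte range
        hi = min(position + length - off, CHUNK)
        out += chunk[lo:hi]
    return bytes(out)

print(read(2, 6))   # a read spanning two chunks -> b'CDEFGH'
```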

De-duplication Storage Engine

The DSE stores, retrieves, and removes all de-duplicated data chunks. It can run as part of an SDFS volume (the default) or as part of a global de-duplication storage pool. Chunks are stored either on local disk or at a cloud provider and are indexed for retrieval with a custom-written hash table.

Data Blocks

Unique data chunks are stored in a Data Block by the DSE, either on disk or in the cloud. The chunks are stored in sequence within the chunkstore folder. Data Blocks default to 40 MB in size but can be set as high as 2 GB. Once a block has filled, or times out waiting for new data (six seconds), it is closed and is no longer available for writing. New blocks are created as unique data is written into the DSE; each is given a unique long-integer identifier and is compressed and/or encrypted in the chunkstore. The associated DSE map file is updated, and the data is written to a permanent location on disk, or uploaded to the cloud and cached locally.

Cloud Storage of Data Blocks

Data uploaded to the cloud is cached locally. Should a data block be flushed from the local cache, it is retrieved from the cloud storage provider and re-cached locally; the data is then read back to the requesting application from the local cache. Should the data be stored in an Amazon Glacier repository, the volume is informed that the data is archived and initiates an archive retrieval process.

REFERENCE CONFIGURATIONS

Standalone Opendedupe system

The configuration below is designed to manage 100 TB of backend (de-duplicated) stored data and assumes physical resources with an 8:1 de-duplication rate:

36 GB of RAM
16-core CPU @ 2.3 GHz or better
103 TB storage (actual data, operating system, hash table, and metadata requirements)

If using cloud storage, a minimum link speed of 5 MB/s is required; faster may be needed.
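The Data Block container behavior described above (sequential chunks packed into a fixed-size container that is closed once full) can be sketched as follows. This is a simplified illustration, not SDFS code; the six-second idle timeout, compression, and encryption are omitted.

```python
class DataBlock:
    """Toy container file in the chunkstore; closes when full (or, in SDFS, on timeout)."""
    MAX_SIZE = 40 * 1024 * 1024   # default 40 MB, configurable up to 2 GB
    _next_id = 0

    def __init__(self):
        self.id = DataBlock._next_id   # unique long-integer identifier
        DataBlock._next_id += 1
        self.chunks = []
        self.size = 0
        self.closed = False

def pack(chunks, max_size=DataBlock.MAX_SIZE):
    """Append unique chunks in sequence, closing blocks as they fill."""
    out = [DataBlock()]
    for c in chunks:
        blk = out[-1]
        if blk.size + len(c) > max_size:
            blk.closed = True          # a closed block is never written to again
            out.append(DataBlock())
            blk = out[-1]
        blk.chunks.append(c)
        blk.size += len(c)
    return out

# Three 30-byte chunks into 64-byte containers: the first block closes after two.
blocks = pack([b"x" * 30, b"y" * 30, b"z" * 30], max_size=64)
print([(b.id, b.size, b.closed) for b in blocks])   # [(0, 60, True), (1, 30, False)]
```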

Figure 2 - Single Veritas NetBackup domain, on-premise with cloud replication using DPOO or capacity licensing

NetBackup media server with Opendedupe OST plugin:

16-core CPU @ 2.3 GHz
32 GB RAM
OS, hash table, and metadata: 3 x 1 TB mirrored volumes
De-duplicated data pool: 100 TB RAID storage
S3 object store in the cloud

Customer Pain Points/Business Challenges

Businesses today face multiple requirements to store and archive large amounts of unstructured data, which adds a growing burden to operational activity and costs. De-duplicating this data lets customers reduce both the electronic footprint and the physical footprint of their back-end storage. Using the 8:1 de-duplication ratio specified in the reference configurations, Opendedupe coupled with Veritas NetBackup can significantly reduce a customer's on-premise backup storage needs and extend the storage density of existing storage. Longer-term images can therefore be retained on-premise, and with the ability to use cloud S3-based storage, long-term archival data can be moved off-site, with a commensurate reduction in media handling fees and physical system costs. Because Opendedupe also stores a copy of the metadata needed to access the data chunks alongside the data sent to the cloud, a customer can survive a catastrophic event at the source site by simply deploying a new Opendedupe server and configuring it to access the appropriate S3 bucket.
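The arithmetic behind the 8:1 claim is straightforward. Under that ratio, the 100 TB backend pool in the reference configuration above corresponds to roughly 800 TB of logical backup data; the example figures below are illustrative, not measured results.

```python
logical_tb = 800          # logical backup data protected (illustrative)
dedup_ratio = 8           # the 8:1 ratio from the reference configuration

stored_tb = logical_tb / dedup_ratio          # backend capacity actually consumed
savings = 1 - stored_tb / logical_tb          # fraction of raw capacity saved

print(f"{stored_tb:.0f} TB stored, {savings:.1%} raw capacity saved")
# 100 TB stored, 87.5% raw capacity saved -- matching the 100 TB backend sizing
```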

APPLICATIONS

The following high-level use cases of Opendedupe with Veritas NetBackup are presented for consideration.

Case 1: Backup and Replication

(Diagram: an NB Master/Media Server with an Opendedupe server replicating data to both an on-premise destination and a public cloud provider.)

Consider a customer running Veritas NetBackup 7.7.3. Due to business requirements they cannot upgrade to NetBackup 8.1 or later for a protracted period, but they need a way to store data in the cloud. The Opendedupe OST plugin (on the NetBackup HCL) allows the customer to replicate images from their disk solution (basic, advanced, or MSDP) to any S3-based cloud solution of their choice, on-site or off-site. Should the cloud solution be an on-premise deployment, the customer can protect the data for disaster recovery (DR) purposes by using Opendedupe's built-in de-duplicated replication to create a synchronous or asynchronous copy of the data at an alternate location.

Case 2: Tape Elimination

(Diagram: an NB Master/Media Server backed by an Opendedupe server in place of tape.)

Consider a customer with a heavily tape-based NetBackup solution who wants to move to a disk-based solution combined with the cloud. Opendedupe's OST plugin, combined with either Capacity or DPOO licensing, can move the customer efficiently to cloud-based archival storage, with a resulting reduction in data-center footprint as tape libraries are no longer needed. The data is then protected by the cloud provider's data protection mechanisms and is no longer subject to the physical damage inherent in the use of tape.

Case 3: Migrate Data from EMC Data Domain to an Alternate Object Store

(Diagram: Data Domain data migrated via an NB Master/Media Server and Opendedupe server to a private cloud.)

EMC Data Domain (DD) storage, while effective, is costly for customers. Given the way Opendedupe's OST plugin works, data can be duplicated efficiently from the DD devices to an Opendedupe solution and written to any lower-cost S3-compatible object storage, either on-site or off-site.

Case 4: Long-term archival to lower storage costs

(Diagram: an NB Master/Media Server with an Opendedupe server archiving to cloud storage.)

With the advent of cloud storage services, Opendedupe is a good fit: it provides reliable, low-cost, long-term retention for data-compliance requirements, in a de-duplicated and encrypted (in-flight and at-rest) manner. The customer avoids the costly maintenance and infrastructure support of tape libraries and can redeploy both the freed-up physical infrastructure resources and the operational time previously spent on the care and feeding of the tape library.

Case 5: Zero Data Movement, A.I.R-based Cloud DR recovery of on-premise servers

(Diagram: two NetBackup domains, Data Center 1 writing backup images and Data Center 2 importing them, each through an S3 connector to a shared S3 data bucket.)

Replication with traditional storage targets requires unique-data synchronization between the primary and replica targets. Historically this worked quite well, but as backup datasets have grown, backup target replication has come to consume a greater share of site-to-site bandwidth, potentially up to 70% of inter-site traffic. Opendedupe can replicate using this method, and it also includes a cloud-optimized method of replication that reduces replication traffic to nearly zero. With Opendedupe's Zero Data Movement feature, multiple NetBackup domains can share the same object storage bucket for reads and writes. An organization can back up an image from one NetBackup domain and restore it from another simply by reading the data and metadata directly from the shared bucket (a backup image import is required). This is effective for organizations that use, or want to use, cloud infrastructure for disaster recovery. Using Opendedupe, an organization can back up on-premise servers to one cloud S3 storage provider and then restore those servers onto an alternate cloud provider's infrastructure through a second NetBackup domain in the cloud, without any additional data movement. Additionally, since the S3 object storage is co-located with the secondary cloud environment, faster restore speeds and recovery are possible.
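The shared-bucket idea can be simulated in a few lines. This is an illustrative toy, not the OST plugin's actual on-bucket layout: one domain writes de-duplicated blocks and a catalog entry to the (simulated) bucket, and a second domain reconstructs the image straight from the bucket, with no data ever crossing between the two domains.

```python
import hashlib

bucket = {"blocks": {}, "catalog": {}}   # one shared S3 bucket (simulated)

def backup(image_name, data, chunk=4):
    """Domain A: write de-duplicated blocks plus image metadata into the bucket."""
    hashes = []
    for i in range(0, len(data), chunk):
        h = hashlib.sha256(data[i:i + chunk]).hexdigest()
        bucket["blocks"].setdefault(h, data[i:i + chunk])
        hashes.append(h)
    bucket["catalog"][image_name] = hashes   # metadata stored alongside the data

def import_and_restore(image_name):
    """Domain B: import and restore the image straight from the shared bucket --
    no replication traffic between the two NetBackup domains."""
    return b"".join(bucket["blocks"][h] for h in bucket["catalog"][image_name])

backup("web01-full", b"ABCDEFGHIJKL")                 # hypothetical image name
print(import_and_restore("web01-full"))               # b'ABCDEFGHIJKL'
```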

Case 6: Optimized replication between different cloud storage targets

(Diagram: an NB Master/Media Server with an Opendedupe server replicating data to both Cloud Provider 1 and Cloud Provider 2.)

For DR and long-term retention, protecting backup data is paramount to prevent data corruption and to ensure data availability in the event of a DR incident. With traditional backup storage targets, data consistency and availability are achieved by creating multiple optimized replicas of backup images across many data centers. This method can achieve five nines (99.999% uptime) of availability, which is typically required for most DR plans. When migrating to cloud storage as a target, keep in mind that even the best cloud infrastructure vendors do not guarantee or report five nines of availability for their object storage. For organizations that require high availability of their backup data, replicating backup data to two cloud storage vendors is critical. Opendedupe includes optimized replication and Auto Image Replication (AIR) support that enables organizations to replicate backup images between two or more cloud storage vendors with reduced bandwidth and storage costs. Like traditional backup storage targets, Opendedupe replicates backup images to secondary storage targets while moving only unique data across the wire. But unlike traditional storage targets, Opendedupe can migrate data automatically between cloud storage vendors, not just between on-premise storage targets. Organizations can thus replicate backup images between Microsoft Azure and Amazon AWS, or between on-premise and off-premise object storage, without any additional infrastructure.
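The availability argument for two cloud replicas is simple probability: if each provider is available a fraction a of the time, and outages at the two providers are independent (an assumption; correlated failures such as a shared network path would weaken it), the data is unreachable only when both are down at once.

```python
a = 0.999                        # assumed per-provider availability (three nines)
combined = 1 - (1 - a) ** 2      # unavailable only if both providers fail together

print(f"single provider: {a:.3%}   two replicas: {combined:.6%}")
# single provider: 99.900%   two replicas: 99.999900% -- roughly six nines,
# provided the two providers' outages really are independent
```

This is why replicating between two vendors can exceed the five-nines target even though neither vendor alone guarantees it.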

Case 7: NetBackup in the cloud

(Diagram: two data centers connected by VPN through S3 connectors to a shared S3 data bucket, with a virtual NB Master/Media Server restoring virtual systems on cloud provider infrastructure.)

MSDP and traditional de-duplicated backup storage targets rely on block storage as the primary storage for backup images. On-premise block storage can be cost-effective, but cloud-infrastructure block storage costs can be prohibitive for large backup environments. As an example, block storage within Amazon's EBS is, at the time of writing, $0.10 (U.S.) per GB per month, and depending on the I/O requirements the cost can be even higher. In contrast, S3 storage costs $0.023 (U.S.) per GB per month. Opendedupe leverages S3 storage for backup images, reducing cost to approximately one fifth that of traditional backup targets.
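Applying the per-GB prices quoted above to the 100 TB backend pool from the reference configuration makes the "roughly one fifth" claim concrete (prices are as quoted in this document at the time of its writing; current rates will differ).

```python
ebs_per_gb = 0.10    # EBS price per GB-month quoted in the text
s3_per_gb = 0.023    # S3 price per GB-month quoted in the text
backend_tb = 100     # backend pool size from the reference configuration

ebs_monthly = backend_tb * 1024 * ebs_per_gb   # cost if images sat on block storage
s3_monthly = backend_tb * 1024 * s3_per_gb     # cost with images on S3

print(f"EBS: ${ebs_monthly:,.0f}/mo   S3: ${s3_monthly:,.0f}/mo   "
      f"ratio: {s3_per_gb / ebs_per_gb:.2f}")
# EBS: $10,240/mo   S3: $2,355/mo   ratio: 0.23 -- roughly one fifth the cost
```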

Conclusion

Opendedupe coupled with Veritas NetBackup addresses several business concerns and challenges. It permits global de-duplication of backup data across multiple NetBackup domains. Any NetBackup media server, including appliances, can access cloud storage solutions using the S3 protocol. Migration of data from Data Domain to lower-cost object storage becomes possible and is optimized. Finally, Opendedupe eliminates the need for the care and feeding of tape libraries, storing data either on-premise or in the cloud in a dense, de-duplicated state, with encryption. Where cloud storage is used, a site failure does not mean the loss of data, as the hash table and other metadata can be recalled from the cloud and the data recovered from there.

Authors: David Kerrivan (Kanatek), Darryl Levesque (Kanatek), Sam Silverberg (Veritas)