How NAS Systems Participate in Data Protection. Paul Massiglia, Symantec Corporation

Similar documents
Trends in Data Protection and Restoration Technologies. Mike Fishman, EMC 2 Corporation

Trends in Data Protection and Restoration Technologies. Jason Iehl, NetApp

ADVANCED DEDUPLICATION CONCEPTS. Thomas Rivera, BlueArc Gene Nagle, Exar

ADVANCED DATA REDUCTION CONCEPTS

Data Deduplication Methods for Achieving Data Efficiency

Storage Virtualization II Effective Use of Virtualization - focusing on block virtualization -

A Promise Kept: Understanding the Monetary and Technical Benefits of STaaS Implementation. Mark Kaufman, Iron Mountain

Application Recovery. Andreas Schwegmann / HP

Restoration Technologies

A Crash Course In Wide Area Data Replication. Jacob Farmer, CTO, Cambridge Computer

Storage Virtualization II. - focusing on block virtualization -

Notes & Lessons Learned from a Field Engineer. Robert M. Smith, Microsoft

Tom Sas HP. Author: SNIA - Data Protection & Capacity Optimization (DPCO) Committee

LEVERAGING FLASH MEMORY in ENTERPRISE STORAGE

TSM Paper Replicating TSM

ASPECTS OF DEDUPLICATION. Dominic Kay, Oracle Mark Maybee, Oracle

Server Fault Protection with NetApp Data ONTAP Edge-T

Multi-Cloud Storage: Addressing the Need for Portability and Interoperability

Replication is the process of creating an

The Role of WAN Optimization in Cloud Infrastructures. Josh Tseng, Riverbed

Overview and Current Topics in Solid State Storage

Virtualization Practices:

Identifying and Eliminating Backup System Bottlenecks: Taking Your Existing Backup System to the Next Level

Overview and Current Topics in Solid State Storage

Scaling Data Center Application Infrastructure. Gary Orenstein, Gear6

SRM: Can You Get What You Want? John Webster, Evaluator Group.

Chapter 11. SnapProtect Technology

Trends in Data Protection CDP and VTL

BrightStor ARCserve Backup for Windows

VERITAS Volume Replicator. Successful Replication and Disaster Recovery

High Availability Using Fault Tolerance in the SAN. Wendy Betts, IBM Mark Fleming, IBM

Tiered File System without Tiers. Laura Shepard, Isilon

ECE Engineering Robust Server Software. Spring 2018

Module 4 STORAGE NETWORK BACKUP & RECOVERY

SNIA Tutorial 1 A CASE FOR FLASH STORAGE HOW TO CHOOSE FLASH STORAGE FOR YOUR APPLICATIONS

in Transition to the Cloud David A. Chapa, CTE EVault, a Seagate Company

EMC RecoverPoint. EMC RecoverPoint Support

Effective Storage Tiering for Databases

Storage in combined service/product data infrastructures. Craig Dunwoody CTO, GraphStream Incorporated

LTFS Bulk Transfer Standard PRESENTATION TITLE GOES HERE

Everything You Wanted To Know About Storage (But Were Too Proud To Ask) The Basics

High Availability through Warm-Standby Support in Sybase Replication Server A Whitepaper from Sybase, Inc.

Architectural Principles for Networked Solid State Storage Access

An Introduction to Key Management for Secure Storage. Walt Hubis, LSI Corporation

The File Systems Evolution. Christian Bandulet, Sun Microsystems

STORAGE CONSOLIDATION WITH IP STORAGE. David Dale, NetApp

Chapter 12 Backup and Recovery

MOVING TOWARDS ZERO DOWNTIME FOR WINTEL Caddy Tan 21 September Leaders Have Vision visionsolutions.com 1

Virtualization Practices: Providing a Complete Virtual Solution in a Box

PRESENTATION TITLE GOES HERE

Netapp Exam NS0-510 NCIE-Backup & Recovery Implementation Engineer Exam Version: 7.0 [ Total Questions: 216 ]

Deploying Public, Private, and Hybrid. Storage Cloud Environments

White paper ETERNUS CS800 Data Deduplication Background

Business Continuity and Disaster Recovery. Ed Crowley Ch 12

COS 318: Operating Systems. NSF, Snapshot, Dedup and Review

Felix Xavier CloudByte Inc.

STORAGE CONSOLIDATION WITH IP STORAGE. David Dale, NetApp

BEST PRACTICES GUIDE FOR DATA PROTECTION WITH FILERS RUNNING FCP

Extending RDMA for Persistent Memory over Fabrics. Live Webcast October 25, 2018

Felix Xavier CloudByte Inc.

DATA PROTECTION SOLUTION MICROSOFT SQL SERVER

SAP HANA. HA and DR Guide. Issue 03 Date HUAWEI TECHNOLOGIES CO., LTD.

IBM Spectrum Protect Version Introduction to Data Protection Solutions IBM

Zero Data Loss Recovery Appliance DOAG Konferenz 2014, Nürnberg

ECE Enterprise Storage Architecture. Fall 2018

Cloud Archive and Long Term Preservation Challenges and Best Practices

SAS: Today s Fast and Flexible Storage Fabric. Rick Kutcipal President, SCSI Trade Association Product Planning and Architecture, Broadcom Limited

Exam : S Title : Snia Storage Network Management/Administration. Version : Demo

Trends in Data Protection and Restoration Technologies. Jason Iehl, Network Appliance Michael Rowan, Consultant

The Data-Protection Playbook for All-flash Storage KEY CONSIDERATIONS FOR FLASH-OPTIMIZED DATA PROTECTION

REC (Remote Equivalent Copy) ETERNUS DX Advanced Copy Functions

PRESENTATION TITLE GOES HERE. Understanding Architectural Trade-offs in Object Storage Technologies

Planning For Persistent Memory In The Data Center. Sarah Jelinek/Intel Corporation

CA ARCserve Backup. Benefits. Overview. The CA Advantage

PASS4TEST. IT Certification Guaranteed, The Easy Way! We offer free update service for one year

VERITAS Volume Replicator Successful Replication and Disaster Recovery

Everything You Wanted To Know About Storage: Part Teal The Buffering Pod

Chapter 11: File System Implementation. Objectives

VERITAS Storage Foundation for Windows FlashSnap Option

SNIA Tutorial 3 EVERYTHING YOU WANTED TO KNOW ABOUT STORAGE: Part Teal Queues, Caches and Buffers

Boost your data protection with NetApp + Veeam. Schahin Golshani Technical Partner Enablement Manager, MENA

ZYNSTRA TECHNICAL BRIEFING NOTE

Protect enterprise data, achieve long-term data retention

Oracle Zero Data Loss Recovery Appliance (ZDLRA)

Using Self-Protecting Storage to Lower Backup TCO

Copyright 2010 EMC Corporation. Do not Copy - All Rights Reserved.

TOP REASONS TO CHOOSE DELL EMC OVER VEEAM

HUAWEI OceanStor Enterprise Unified Storage System. HyperReplication Technical White Paper. Issue 01. Date HUAWEI TECHNOLOGIES CO., LTD.

Simplify and Improve DB2 Administration by Leveraging Your Storage System

SAP HANA Disaster Recovery with Asynchronous Storage Replication

IBM Tivoli Storage Manager Version Introduction to Data Protection Solutions IBM

C H A P T E R Overview Figure 1-1 What is Disaster Recovery as a Service?

Veritas NetBackup OpenStorage Solutions Guide for Disk

Benefits of Multi-Node Scale-out Clusters running NetApp Clustered Data ONTAP. Silverton Consulting, Inc. StorInt Briefing

Where s Your Third Copy?

SRM: Can You Get What You Want? John Webster Principal IT Advisor, Illuminata

Microsoft DPM Meets BridgeSTOR Advanced Data Reduction and Security

Exam Name: Midrange Storage Technical Support V2

CS370: System Architecture & Software [Fall 2014] Dept. Of Computer Science, Colorado State University

Massively Scalable File Storage. Philippe Nicolas, KerStor

Transcription:

How NAS Systems Participate in Data Protection Paul Massiglia, Symantec Corporation

SNIA Legal Notice The material contained in this tutorial is copyrighted by the SNIA. Member companies and individuals may use this material in presentations and literature under the following conditions: Any slide or slides used must be reproduced without modification The SNIA must be acknowledged as source of any material used in the body of any document containing material from these presentations. This presentation is a project of the SNIA Education Committee. Neither the Author nor the Presenter is an attorney and nothing in this presentation is intended to be nor should be construed as legal advice or opinion. If you need legal advice or legal opinion please contact an attorney. The information presented herein represents the Author's personal opinion and current understanding of the issues involved. The Author, the Presenter, and the SNIA do not assume any responsibility or liability for damages arising out of any reliance on or use of this information. NO WARRANTIES, EXPRESS OR IMPLIED. USE AT YOUR OWN RISK. 2

Abstract High Availability and Disaster Recovery for NAS Data Network Attached Storage (NAS) systems are able to use their awareness of file structures to take a more proactive part in protecting data against faults and disasters than is possible with disk arrays. This session surveys common techniques for protecting data against hardware and software failures, accidental or deliberate corruption, and disasters that incapacitate entire data centers, and shows how NAS systems are able to participate actively in all forms of protection. Backup, snapshots, continuous and periodic replication, and continuous data protection will be discussed. For each technique, the threats it covers, the costs of using it, and expected recovery times and recovery points will be pointed out. The goal of the session is to give students an appreciation for the high availability and disaster protection options available for data stored in NAS systems, to better equip them to make informed decisions when purchasing equipment or defining operating procedures. 3

Background: Data protection vs. high availability Data protection An intact copy of critical data survives disaster events Restoring application and client access is treated as a separate problem High availability An intact copy of critical data survives disaster events Procedures (usually automatic) in place to restore service to applications and clients 4

Background: Measuring availability RPO details: slide 36 RTO details: slide 35 Business time represented by data when service to clients resumes Service to clients restored Event time recovery point lost work (data) recovery time Business objectives for these measures determine appropriate data recovery and availability techniques In general, RTO&RPO $$$ 5

What s unique about NAS? Application Database File system I/O driver Client stack Application Database NFS/CIFS client Network stack Specialized storage network Network General enterprise network presentation presentation /files_n Presented objects /files_1 SAN storage system File system NAS storage system 6

What s unique about NAS? Application Database File system I/O driver Client stack Application Database NFS/CIFS client Network stack Specialized storage network Network General enterprise network presentation presentation A NAS server knows more about the data it contains, and can therefore take a more active part SAN storage system in protection and recovery Presented objects /files_1 File system NAS storage system /files_n 7

Background: Disasters Logical Equipment and facilities remain physically intact Data has been destroyed or corrupted Physical System failure Surrounding environment is intact Data center outage Entire IT environment must be restored or recreated 8

Logical disasters Causes Malice (malware or human action) Innocent human error Hardware or software fault General recovery strategy: turn back the clock Restore and revert to a good version of data Inherent consequences: Periodic copies of data sets stored in safe locations s that occur between latest recovery point and time of fault discovery are lost 9

A good version of data is Correct Is captured prior to the corrupting event Consistent Represents a snapshot of the data set Modular Is restorable en masse or file-by-file (Near-)current Represents a recent recovery point when restored Persistent persistence Synonym for non-volatility. Usually used to distinguish between data and metadata held in DRAM, which is lost when electrical power is lost, and data held on non-volatile storage (disk, tape, battery-backed DRAM, etc.) that survives, or persists across power http://www.snia.org/education/dictionary/s/ 10 Is restorable after years or decades

Techniques for protecting NAS data Technique Properties NAS Role Copy files Backup management software Low cost (nothing to buy) Labor intensive Fragile Higher cost Robust Infrequent recovery points Low-cost storage for disk-based backup VTL or filer model Low-cost storage for disk-based backup NDMP: direct copy from storage to backup device or system Backup details: slide 37 VTL details: slide 42 11

Enabling technology: Snapshots snapshot CONTEXT [Data Recovery] [Storage System] A fully usable copy of a defined collection of data that contains an image of the data as it appeared at the point in time at which the copy was initiated. A snapshot may be either a duplicate or a replicate of the data it represents. http://www.snia.org/education/dictionary/s/ Live data Snap! Snap! Snap! 3 PM image 2 PM image 1 PM image Recovery points Applications Make data stand still for copy/backup Ready-made recovery from logical corruption 12

The three basic snapshot choices Scheduling Periodic Ad hoc Fullsize Spacesaving Preservation technique Snapped objects 13

Block vs. file system snapshots Live data volumes Live file system Snap! Block update Block update Block update Snap! Block update Block update Block update 3 PM image 2 PM image Snap! Block update Block update Block update 1 PM image Snap! trunc file mkdir remove file Snap! extend file rmdir 3 PM image 2 PM image Snap! create file mkdir 1 PM image Point-in-time images of logical volume contents Point-in-time images of directory tree contents Compare properties: Slide 38 14

Full-size vs. space-saving snapshots Data set Live metadata copy on write CONTEXT [Storage System, Backup] A technique for maintaining a point in time copy of a collection of data by copying only data which is modified after the instant of replicate initiation. The original source data is used to satisfy read requests for both the source data itself and for the unmodified portion of the point in time copy. cf. pointer remapping http://www.snia.org/education/dictionary/c/ Snapshot metadata Data set Snapshot Snap! Snapshot new new new new Snap! Compare properties: slide 39 COW vs mapping: slides 40-41 15

Which kind of snapshot is best? It depends on your definition of best Full-size ( split mirror ) snapshots Protect against physical destruction of data set Enable off-hosting Almost no impact on production data High cost in storage and re-synchronization Space-saving snapshots Space-saving ( frequent recovery points) Some (usually minor) impact on application write performance Do not protect against physical destruction of data set 16

Which kind of snapshot is best? It depends on your definition of best File system snapshots Ready-to-use as read-only file systems File-system specific Certain performance anomalies (e.g., bulk deletions) Disk (LUN) snapshots Universal: snap any kind of data Coordinated snapshots of multiple name spaces Require external declaration of sync points 17

Which kind of snapshot is best? It depends on your definition of best Copy-on-write snapshots Favor live file performance Pointer-mapped snapshots Favor snapshot performance Why not a hybrid? Why not, indeed? Some vendors are at least experimenting with the concept 18

The data protection game changer: Continuous Data Protection (CDP) Fundamental technique Log every transaction on a data set (a universal redo log ) Fundamental benefit Defers specification of recovery point until restore time Compared to snapshots Snapshots:..Pre-stored images of data taken at fixed times CDP:..Image of recovery point data created dynamically at recovery time Challenges What s in a transaction and a data set? Resource consumption Impact on application performance Time to identify and present a point-in-time data set image Integration with applications and data managers Continuous Data Protection CDP CONTEXT [Data Recovery] A data protection service that captures changes to data to a separate storage location. There are multiple methods for capturing the continuous changes involving different technologies that serve different needs. CDP-based solutions can provide fine granularities of restorable objects ranging from crash-consistent images to logical objects such as files, mail boxes, messages, etc. http://www.snia.org/education/dictionary/s/ 19

Physical disasters System failure Definition Failure beyond the reach of usual techniques like RAID/mirroring NAS head clustering Network redundancy Needed to recover Intact, accessible copy of critical production data Connectivity to app servers Data center loss Entire IT environment incapacitated Storage systems App servers Local clients Connections to remote clients Intact recovery site Staff (including provisioning) Hardware and connectivity for critical apps Intact, accessible copy of critical production data 20

The case for data replication Backup-restore Cost advantage Replication Tape drive, media, and library Online storage capacity Physical transportation Replication link with adequate bandwidth and latency Acquisition or activation of recovery site Cost of downtime to restore data (hours to days) Cost of data loss due to recovery point (hours to days) Full-time recovery facility premises and staff Seconds to minutes Zero to seconds 21

Recovery from physical disasters Have or acquire adequate equipment for access to critical data and applications Have or recreate a copy of critical data at the recovery location 22

Recovery from physical disasters Have or acquire adequate equipment for access to critical data and applications Have or recreate a copy of critical data at the recovery location acquire The difference between have and recreate recreate is the distinction between high and normal availability 23

Replication Basic purpose: keep an up-to-date copy of a data set on separate storage resources (usually attached to separate processing resources) Different from mirroring Primary-secondary relationship Time ordered updates to a consistency group of devices May be asynchronous Use cases Recovery from physical disaster Data distribution/consolidation Second data source for read-only applications Baseline for writable clones of data 24

The three basic replication choices Currency Real-time Batch (episodic) App/database server File server or storage system Implementation location Replicated objects 25

Block vs. file-level replication Block replication Client activates when replica becomes live data File replication Readable while replication is occurring Block update Block update Block update create update file extendfile file mkdir rmdir Client read/write access Replication traffic Compare properties: slide 44 26

Host-based vs. storage system-based replication Host (application server)-based Storage system-based Activates when replica becomes live data Other variations: SAN switch, NAS aggregator, dedicated appliance Roughly equivalent to storage system-based replication Client read/write access Replication traffic Compare properties: slide 43 27

Real-time vs. batch replication Live data Replica 28

Real-time vs. batch replication Live data Replica Batch t 3 T=t 3 T=t 2 Batch t 2 Batch t 1 T=t 1 Compare properties: slide 45 29

Synchronous vs. asynchronous replication Log update at primary Queue for secondary Send to secondary Receive ack from secondary Signal completion to client Persist update at primary Incremental response time time Log update at primary Queue for secondary Signal completion to client s at risk Persist update at primary Send to secondary Receive ack from secondary Compare properties: slide 46 Variation on a theme: limited asynchronous replication 30

Which kind of replication is best? It depends on your definition of best Block-level vs. file-level Universal availability vs. bandwidth and readability Synchronous vs. asynchronous Potential for data loss vs. client responsiveness Real-time vs batch update Bandwidth consumption vs. recovery point granularity 31

Summary Different threats different recovery techniques Logical disaster Technique: turn back the clock Property: Inherently some data loss Physical disaster Technique: Recovery facility + replica of critical data Property: Cost vs. recovery point tradeoff Different data different requirements Critical: Less-critical RTO = seconds; RPO = now! Cost becomes a factor NAS is unique because it knows about data structure Able to participate proactively in protection and recovery processes 32

Q&A / Feedback Please submit any questions or comments on this tutorial to the SNIA at trackfilemgmt@snia.org Many thanks to the following individuals for their contributions to this tutorial. SNIA Education Committee David Dale Rob Peglar Warren Avery Norman Owens Netapp Xiotech storagenetworking.org storagenetworking.org 33

Background slides for publication not to be presented

Availability objectives: RTO Recovery time objective Requirements Back Days Hours Minutes Seconds Prepare and staff facilities and hardware Acquire backup copy of data and restore Replay logs, restore app environment and restart apps Restore data from onsite incremental backup Replay logs, restore app environment and restart apps Replay logs against data replica and activate Restart apps Restore client connections Detect failure (avoiding false positives) Switch data replica to live mode Switch apps to live or full-service 35

Availability objectives: RPO Recovery point objective Recovery point is a consequence of the data preservation technique Days Time of newest backup copy Hours Time and location of newest incremental backup Minutes Amount of time by which data replica lags live data Seconds Zero Back 36

Techniques for backing up NAS data Backup management software Advantages Limitations Set-and-forget automation Schedules Device and media management Data selection Added-value features Incremental backup Multi-streaming & multiplexing Cost License & maintenance Training Requires data to stand still during backup Application (e.g., database) integration Inherently coarse-grained recovery times & recovery points Back 37

Snapshots: what to snap blocks vs files Creation time Fast Blocks Files Some metadata creation required Atomicity Block update File system operation Impact on live data Directly related to disk I/O operations Some surprises possible (e.g., deletions) NAS developers have the luxury of choosing either Back 38

Full-size vs. space-saving snapshots Split mirror (content = bit-for-bit copy) Protects against device failures and media defects Storage requirement: full data set size Overhead to maintain: zero Deletion cost: Resynchronization with data set Copy-on-write (content = changes only) Requires separate physical protection mechanism Storage requirement: change in data set size during snapshot life Overhead to maintain: 2-3x for every first write Deletion cost: Space reclamation Back 39

Copy-on-write vs. pointer-mapped snapshots Allocate space for new data Write new data Adjust data set pointers Data set Snapshot Allocate space for data new Copy (read+write) data Adjust snapshot pointers Data set new Snapshot Streaming read Streaming read Performance impact Back 40

Copy-on-write vs pointer-mapped snapshots Pointer-mapped Copy-on-write Faster initial update-in-place (allocate+write+pointer adjust) More work for initial updatein-place (allocate+read+write+pointer adjust) For small updates: Tends to fragment live data set For small updates: no live data set fragmentation For large updates: Both tend to preserve physical contiguity (for later streaming reads) Back 41

Virtual Tape Library (VTL) Disk-based storage that emulates tape libraries Exploits existing backup infrastructure Easy to implement Can use tiered storage to minimize cost per byte stored e.g., MAID e.g., compression & de-duplication Behaves like Tape drives +Media +Robotic loader Tape media Enhances NDMP capabilities Mitigates impact of tape drive reservations Mitigates impact of single-streaming Back 42

Host-based vs. storage system-based replication Host-based Uses app server processing resources Storage systembased No processing impact on app server Arbitrary consistency groups (e.g., volumes from multiple arrays) Consistency group limited to storage system s scope Back 43

Block and file-based replication Block replication File replication Sends every block update over the replication link Sends file system operations over the replication link (uses less network bandwidth) Replicates file system operations I/O by I/O (i.e., is bug-compatible ) No context for block updates: (Replica is not usable during replication) Performs source and target file system actions independently File system operation-atomic (Replica can be used by read-only apps during replication) Back 44

Real-time and batch replication Real-time (op-by-op) Transmits every operation (i.e., sends repeated ops repeatedly) Replicates every primary data set state (any primary data set recovery point can be recreated from the replica) Batch (periodic snapshots) May consume less network bandwidth May not represent all primary data set states in the replica (recovery points at batch boundaries only) Back 45

Synchronous vs asynchronous replication Synchronous Data update is at the replica before client I/O completes (No data lost in a disaster) Round trip time is additive to client response time Asynchronous Data is queued for transmission to replica before client I/O completes (committed updates may be lost in a disaster) Little if any increment in client response time Back 46

Refer to Other Tutorials Please use this icon to refer to other SNIA Tutorials where appropriate. Check out SNIA Tutorial: Enter Tutorial Title Here 47