THE COMPLETE GUIDE COUCHBASE BACKUP & RECOVERY


INTRODUCTION Driven by the need to remain competitive and differentiate themselves, organizations are undergoing digital transformations and becoming increasingly data driven. This has led to the proliferation of modern applications like IoT and Customer 360 built on massively scalable NoSQL data platforms, including Couchbase. It has also created a critical data protection gap, leaving organizations exposed to data loss, unprotected against ransomware attacks, and unable to address compliance requirements. This ebook discusses the key data protection challenges and provides guidance on what to look for in a solution.

UNDERSTANDING COUCHBASE BACKUP & RECOVERY REQUIREMENTS A successful backup and recovery strategy is predicated on addressing numerous functional requirements, mainly:

- Full automation requiring no scripting
- Incremental-forever backups
- Fast and granular point-in-time recovery
- Agentless architecture
- Massive scalability
- Backup storage optimization
- Application-aware backups and restores

We will examine each of these requirements in detail in the following chapters.

FULLY AUTOMATED BACKUP No one likes doing backups. And with Big Data platforms, you get three replicated copies of your data. So why back it up? If all you care about is protection against hardware failures, perhaps replication is good enough. But replication actually amplifies more common problems like user errors, application corruption, or ransomware attacks, because the bad data is faithfully copied to every replica. Ok, so maybe backups are good. Now, database technologists are inherently tech savvy. They can just write scripts to handle the process, right? Not really. There are a lot of considerations:

- Database awareness: In addition to the underlying database files, metadata and schema need to be protected.
- Complexity: Multiple nodes and replicas are difficult to manage for backup.
- Changes: As described later in this ebook, multiple full backups are not practical. Any solution must be able to track net changes.
- Copying data: Scripting must copy all data files from all nodes, copy in parallel, handle errors and retries, and manage cloud APIs and storage tiering.
- Versioning: Scripts must track historical backups to enable multiple restore points.
- Errors: Scripts must handle a wide variety of errors and warnings.
- Capacity: Backup storage consumption will be high due to backup of all replicas and periodic full backups.
- Performance: Backup and restore performance will be slow and limited to the single server on which the script is running.
- Restores: Restores will continue to be manual, error-prone, and resource intensive.
- RPO: With different data requiring different RPOs, scripting for this can be a challenge.

These factors make scripting an incredibly painful and ongoing process. They also significantly increase risk. Unless environments are small and expected to remain relatively static, a decision to script should be taken very carefully. In most cases, a third-party solution is going to perform better and be less work, lower risk, and lower TCO. In the end, an organization must decide whether it is better to dedicate significant engineering resources to backup scripting, with all the inherent risks, or use a proven third-party platform.
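To make the scripting burden concrete, here is a minimal sketch of a do-it-yourself wrapper around Couchbase's own cbbackupmgr utility. The archive path, repository name, cluster address, and credentials are illustrative placeholders. Even this toy version already has to deal with retries; versioning, cataloging, tiering, and alerting would all sit on top of it.

```python
# Minimal sketch only: a naive backup wrapper around Couchbase's cbbackupmgr CLI.
# All paths, names, and credentials below are illustrative placeholders.
import subprocess
import time

ARCHIVE = "/backups/couchbase"                      # hypothetical archive location
REPO = "prod-cluster"                               # hypothetical repository name
CLUSTER = "couchbase://cb-node1.example.com"        # hypothetical cluster address

def run_backup(max_retries: int = 3) -> None:
    """One backup attempt with crude retry handling. A real script must also
    track versions, verify success, manage storage tiering, and alert on errors."""
    cmd = [
        "cbbackupmgr", "backup",
        "--archive", ARCHIVE,
        "--repo", REPO,
        "--cluster", CLUSTER,
        "--username", "backup_user",
        "--password", "********",
    ]
    for attempt in range(1, max_retries + 1):
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode == 0:
            return
        time.sleep(30 * attempt)  # crude backoff; production needs far more care
    raise RuntimeError(f"Backup failed after {max_retries} attempts: {result.stderr}")
```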

THE IMPORTANCE OF INCREMENTAL FOREVER Traditionally, organizations backed up data by creating a complete backup each week, followed by daily incremental backups. For recovery, the last full backup became the starting point to which the subsequent incremental backups were added, thus generating the image that needed to be recovered. Consider a Couchbase application with several buckets and a total data size of 20 TB. Implementing a weekly full backup of a 20 TB data set is not feasible and can never meet any reasonable service-level agreement. The only practical way to back up Couchbase and other Big Data databases is with incremental-forever techniques. A full backup is done only once, when the backup workflow is initially set up. After that, only changes are incrementally copied to the destination cluster. For example, when new documents are added to a bucket, only those new documents need to be copied to the backup cluster.
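Couchbase's cbbackupmgr itself follows this model: once a repository is configured, each backup run copies only what has changed since the last run. A minimal sketch, with illustrative paths and names:

```python
# Sketch: incremental-forever with cbbackupmgr. The one-time "config" step
# creates the repository; the first backup into it is full, and every
# subsequent "backup" run appends an incremental. Names are placeholders.
import subprocess

ARCHIVE, REPO = "/backups/couchbase", "prod-cluster"

# One-time setup: create the backup repository.
subprocess.run(
    ["cbbackupmgr", "config", "--archive", ARCHIVE, "--repo", REPO],
    check=True,
)

# Recurring job: each invocation copies only changed data into the repo.
subprocess.run([
    "cbbackupmgr", "backup",
    "--archive", ARCHIVE, "--repo", REPO,
    "--cluster", "couchbase://cb-node1.example.com",
    "--username", "backup_user", "--password", "********",
], check=True)
```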

FAST AND GRANULAR RECOVERY IS THE TICKET Another requirement in the Big Data world is that incremental changes must be immediately merged into the full backup to create a complete image of the primary data at a particular point in time. This ensures that recovery can be done in a single step, without the lag associated with assembling a recovery image from a full backup and multiple incremental copies. Data recovery must also be application-aware and granular. For a Couchbase database, the backup cluster must be able to recover a single bucket. Additionally, all buckets, comprising hundreds of documents, might be backed up in a single workflow; the recovery workflow needs to be flexible enough to restore a single bucket from that backup workflow.
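As a point of reference, single-bucket recovery from a multi-bucket backup is expressible even with Couchbase's native tooling. A sketch follows; note that the data-filtering flag has varied across Couchbase releases (for example, --include-data in recent versions), so check the documentation for your version. The bucket name and connection details are placeholders.

```python
# Sketch: granular recovery of one bucket from a multi-bucket backup via
# cbbackupmgr. The filtering flag name varies by Couchbase release
# (e.g. --include-data in recent versions); all values are placeholders.
import subprocess

subprocess.run([
    "cbbackupmgr", "restore",
    "--archive", "/backups/couchbase",
    "--repo", "prod-cluster",
    "--cluster", "couchbase://cb-node1.example.com",
    "--username", "backup_user", "--password", "********",
    "--include-data", "orders",   # restore only the hypothetical "orders" bucket
], check=True)
```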

SPEED YOUR WAY WITH PARALLEL DATA TRANSFERS The architectures of all Big Data platforms, from Couchbase to Impala, specify a loosely coupled, shared-nothing design built on commodity hardware with cheap direct-attached storage. Implicit in this design is that data is distributed across several nodes on the primary cluster. Good backup performance therefore requires parallel-capable workflows: each node containing data is contacted independently, and its data is copied directly from the individual primary-cluster nodes hosting it. This also means a monolithic backup solution will not scale to Big Data levels because of the chokepoints inherent in that design. Hence, any workable Couchbase backup implementation has to run on a scale-out platform built on commodity hardware with direct-attached drives, where all the nodes in the backup cluster set up connections to all the nodes in the primary cluster so that data can be transferred in parallel.
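The fan-out pattern described above can be illustrated with a small sketch. This is not any product's implementation; fetch_from_node() is a hypothetical placeholder for the actual transfer logic, and the host names are invented.

```python
# Illustrative sketch of parallel, per-node data transfer: every primary node
# is contacted independently and streamed concurrently. fetch_from_node() is
# a hypothetical placeholder, not a real API.
from concurrent.futures import ThreadPoolExecutor

PRIMARY_NODES = ["cb-node1", "cb-node2", "cb-node3", "cb-node4"]  # example hosts

def fetch_from_node(host: str) -> int:
    """Placeholder: stream this node's local data to backup storage and return
    bytes copied. A real implementation handles errors, retries, and throttling."""
    return 0

with ThreadPoolExecutor(max_workers=len(PRIMARY_NODES)) as pool:
    bytes_copied = list(pool.map(fetch_from_node, PRIMARY_NODES))
```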

AVOID AGENT OVERHEAD Big Data cluster configurations are in a state of constant flux. Since commodity hardware is used to deploy platforms such as Couchbase, Hadoop, and Vertica, those clusters are configured to withstand or quickly recover from failures of various components such as drives, network adapters, and even whole nodes. A traditional backup solution deploys agents on primary nodes to perform data transfers. That model is not operationally feasible in a Couchbase environment, where new nodes are constantly commissioned and dead nodes decommissioned. Managing and monitoring the availability of agents deployed on individual nodes is a non-trivial task given the number of nodes in a Couchbase cluster. There are also security implications: in a datacenter, authorization from the security infrastructure team is typically required before additional daemons are deployed on production nodes. For these operational reasons, any Couchbase data protection solution needs an agentless model, with no backup software installed on the nodes of the primary cluster.
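One reason the agentless model works well with Couchbase is that a central backup host can discover the ever-changing cluster topology remotely over Couchbase's standard REST API (port 8091, the /pools/default endpoint), with nothing installed on the data nodes. A minimal sketch, with placeholder host and credentials:

```python
# Sketch of agentless topology discovery: query Couchbase's cluster REST API
# from a central host. Endpoint and port are Couchbase's standard ones;
# the host name and credentials are placeholders.
import requests

resp = requests.get(
    "http://cb-node1.example.com:8091/pools/default",
    auth=("backup_user", "********"),
    timeout=10,
)
resp.raise_for_status()

# Each entry describes a current cluster member; no agent was installed there.
for node in resp.json()["nodes"]:
    print(node["hostname"], node["status"])
```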

LET THE SCALABLE CATALOG DO THE HARD WORK Backing up tens or hundreds of Couchbase clusters is non-trivial. The problem becomes harder as data sets grow to hundreds of terabytes and the number of stored documents reaches the millions or billions. A centralized backup and recovery strategy makes the most sense to ensure operational simplicity and reduce costs. The number of objects that need to be versioned and monitored in the Big Data world is in the millions, and the catalog supporting that many objects has to scale horizontally. For example, a Couchbase data store might easily hold millions of documents, and assuming a typical change rate, every incremental backup adds a large number of objects. Each of these objects has to be mapped to an appropriate recovery point. The catalog therefore needs extensive search capabilities and must scale to Big Data levels: metadata for every object must be stored, and mutations of that metadata must be searchable across different versions and transitions.
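To make the catalog requirement concrete, here is a hypothetical sketch of the kind of record such a catalog must manage at scale. The field names are invented for illustration and do not reflect any product's schema.

```python
# Hypothetical sketch of a backup-catalog record: millions of versioned
# objects, each mapped to a recovery point and searchable by attribute.
# Field names are illustrative only.
from dataclasses import dataclass, field

@dataclass
class CatalogEntry:
    object_path: str          # e.g. "bucket/orders/doc::1042"
    version: int              # monotonically increasing per object
    recovery_point: str       # identifier of the backup run this version belongs to
    size_bytes: int
    attributes: dict = field(default_factory=dict)  # searchable metadata

def find_versions(entries: list[CatalogEntry], path_prefix: str) -> list[CatalogEntry]:
    """Toy in-memory search; a real catalog needs distributed, indexed search."""
    return sorted(
        (e for e in entries if e.object_path.startswith(path_prefix)),
        key=lambda e: e.version,
    )
```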

BACKUP STORAGE OPTIMIZATION By its very nature, Big Data is, well, big. Fully protecting your Couchbase environment can consume a huge amount of storage. To reduce these costs, organizations should squeeze data into the smallest possible footprint and move it to the lowest-cost storage tier, all without impacting performance. A robust deduplication and compression methodology can reduce data needs by up to 90%; when you consider the potential size of Couchbase backups, that can lead to huge savings. But in addition to the reduction factor, deduplication and compression need to be high performance, since they take time and compute cycles to complete, during which you are at increased risk. The best footprint optimization methodologies generate the smallest storage requirement in the shortest duration. Minimizing the footprint is only half the process, though. Not all storage costs the same: direct-attached flash costs considerably more than cloud storage, and long-term cloud storage costs less than transactional cloud storage. And with data classification and policies (performance, compliance) often shifting, data placement needs to remain highly dynamic. Ensuring that your data is always on the lowest-cost storage that still meets SLAs is a near impossibility for human and script alike; some form of automation is needed to fully optimize for cost and risk.
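The placement decision the text argues must be automated can be sketched as a simple policy function: pick the cheapest tier that still meets the backup's restore SLA. The tiers, prices, and latency figures below are invented for illustration.

```python
# Hypothetical tiering policy sketch: choose the lowest-cost storage tier
# whose restore latency still satisfies the SLA. All numbers are invented.
TIERS = [  # (name, $/GB/month, worst-case restore latency in hours)
    ("archive-cloud", 0.004, 48),
    ("standard-cloud", 0.023, 4),
    ("local-flash", 0.10, 0.25),
]

def cheapest_tier(required_restore_hours: float) -> str:
    """Return the lowest-cost tier whose restore latency meets the SLA."""
    eligible = [t for t in TIERS if t[2] <= required_restore_hours]
    return min(eligible, key=lambda t: t[1])[0]

assert cheapest_tier(24) == "standard-cloud"   # 48h archive tier is too slow
assert cheapest_tier(72) == "archive-cloud"    # everything qualifies; take cheapest
```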

APPLICATION-AWARE BACKUPS AND RESTORES The Big Data world involves different applications with different types of data abstractions. For example, data in Couchbase is stored in buckets and documents, while the abstraction layer for Hive focuses on databases, tables, and partitions. These differences affect backup requirements in several ways. First, the user setting up workflows needs to interact with the backup system at the data abstraction layer supported by the application. Second, all the metadata and attributes associated with that abstraction layer also need to be backed up: with Couchbase, that means backing up and restoring bucket properties, view definitions, and index definitions.
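As a sketch of what capturing that metadata involves, the pieces named above are all reachable over Couchbase's standard REST and query interfaces (ports 8091 and 8093). The host, bucket name, and credentials below are placeholders.

```python
# Sketch: capturing application-level metadata for a Couchbase bucket so a
# restore can recreate it. Ports/endpoints are Couchbase's standard ones;
# host, bucket, and credentials are placeholders.
import requests

HOST, AUTH = "cb-node1.example.com", ("backup_user", "********")

# Bucket properties (quota, replica count, eviction policy, ...)
props = requests.get(
    f"http://{HOST}:8091/pools/default/buckets/orders", auth=AUTH, timeout=10
).json()

# View definitions (design documents) for the bucket
views = requests.get(
    f"http://{HOST}:8091/pools/default/buckets/orders/ddocs", auth=AUTH, timeout=10
).json()

# Index definitions via the query service's system catalog
indexes = requests.post(
    f"http://{HOST}:8093/query/service",
    data={"statement": "SELECT * FROM system:indexes"},
    auth=AUTH, timeout=10,
).json()
```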

WHY ARE WE PASSIONATE ABOUT THIS? We have witnessed numerous companies feel the pain of losing Big Data, and that inspired us to create Imanis Data. Our software solution is built on a fully scale-out architecture and supports commodity hardware with direct-attached, network-attached, or cloud-attached, software-defined storage. We use an incremental-forever model to fetch only modified objects from the primary cluster. We are completely application-aware: for an application like Couchbase, we pull bucket metadata such as configuration, indexes, and views, and any recovery of a bucket ensures that the metadata associated with it is applied to the recovered bucket. We are also agentless; no Imanis Data software need be installed on any node of the Couchbase cluster. Our catalog is architected to host millions of versioned objects along with their attributes and properties, and it is searchable by different attributes and regular expressions. In addition to backup/recovery, we also support cloud migration, test data management, and enhanced security and compliance. To learn more or to schedule an introductory meeting, please contact info@imanisdata.com.

Imanis Data, Inc. 2860 Zanker Road, Suite 109, San Jose CA 95134 +1.408.649.6338 imanisdata.com © 2018 Imanis Data, Inc. All rights reserved. Imanis Data and the Imanis Data logo are trademarks of Imanis Data in the US and in other countries. Information subject to change without notice. All other trademarks and service marks are property of their respective owners.