BARNS: Backup and Recovery for NoSQL Databases

Similar documents
BARNS: Towards Building Backup and Recovery for NoSQL Databases

Document Sub Title. Yotpo. Technical Overview 07/18/ Yotpo

Cloud Backup and Recovery for Healthcare and ecommerce

MongoDB Architecture

How to Scale MongoDB. Apr

NoSQL Databases MongoDB vs Cassandra. Kenny Huynh, Andre Chik, Kevin Vu

Finding Consistency in an Inconsistent World: Towards Deep Semantic Understanding of Scale-out Distributed Databases

Migrating to Cassandra in the Cloud, the Netflix Way

Nutanix Tech Note. Virtualizing Microsoft Applications on Web-Scale Infrastructure

Course Content MongoDB

Hedvig as backup target for Veeam

SOLUTION BRIEF Fulfill the promise of the cloud

A Non-Relational Storage Analysis

THE COMPLETE GUIDE COUCHBASE BACKUP & RECOVERY

Microsoft SQL Server HA and DR with DVX

Introduction to OpenStack Trove

Your Complete Guide to Backup and Recovery for MongoDB

Spotify. Scaling storage to million of users world wide. Jimmy Mårdell October 14, 2014

Trends in Data Protection and Restoration Technologies. Mike Fishman, EMC 2 Corporation

How do we build TiDB. a Distributed, Consistent, Scalable, SQL Database

Executive Summary SOLE SOURCE JUSTIFICATION. Microsoft Integration

Hadoop An Overview. - Socrates CCDH

Opendedupe & Veritas NetBackup ARCHITECTURE OVERVIEW AND USE CASES

The Definitive Guide to Backup and Recovery for Cassandra

Introduction to Database Services

Kubernetes Integration with Virtuozzo Storage

Cohesity Flash Protect for Pure FlashBlade: Simple, Scalable Data Protection

Backup Edition Comparison OVERVIEW

Simple Data Protection for the Cloud Era

Backup License Comparison OVERVIEW

Distributed PostgreSQL with YugaByte DB

TIBX NEXT-GENERATION ARCHIVE FORMAT IN ACRONIS BACKUP CLOUD

Dell EMC Unity: Data Protection & Copy Data Management Options. Ryan Poulin Product Technologist Midrange & Entry Solutions Group

Axway API Management 7.5.x Cassandra Best practices. #axway

Using Cohesity with Amazon Web Services (AWS)

Backup and Recovery Best Practices With Tintri VMstore

MongoDB Backup & Recovery Field Guide

Making Non-Distributed Databases, Distributed. Ioannis Papapanagiotou, PhD Shailesh Birari

Cohesity DataPlatform Protecting Individual MS SQL Databases Solution Guide

OpenStack Trove and DBaaS: Impedance Match?

The Definitive Guide to MongoDB Backup and Recovery

Introducing RecoverX 2.5

The course modules of MongoDB developer and administrator online certification training:

Tintri & Veeam VM Backup & Replication Best Practices. John Phillips Strategic Alliances and Technical Marketing Ryan Post Systems Engineer

THE COMPLETE GUIDE HADOOP BACKUP & RECOVERY

NoSQL BENCHMARKING AND TUNING. Nachiket Kate Santosh Kangane Ankit Lakhotia Persistent Systems Ltd. Pune, India

Distributed System. Gang Wu. Spring,2018

Acronis Backup 12.5 License Comparison incl. cloud deployment functionality

Protecting Hyper-V Environments

Adaptation in distributed NoSQL data stores

ApsaraDB for Redis. Product Introduction

Data Protection Guide

Construct a High Efficiency VM Disaster Recovery Solution. Best choice for protecting virtual environments

Architecture of a Real-Time Operational DBMS

EMC Virtual Infrastructure for Microsoft Exchange 2007

Deploying Software Defined Storage for the Enterprise with Ceph. PRESENTATION TITLE GOES HERE Paul von Stamwitz Fujitsu

SnapCenter Software 4.0 Concepts Guide

Copyright 2010 EMC Corporation. Do not Copy - All Rights Reserved.

Nutanix White Paper. Hyper-Converged Infrastructure for Enterprise Applications. Version 1.0 March Enterprise Applications on Nutanix

Availability for the Modern Data Center on FlexPod Introduction NetApp, Inc. All rights reserved. NetApp Proprietary Limited Use Only

Virtual Server Agent for VMware VMware VADP Virtualization Architecture

Distributed File Systems II

Trends in Data Protection and Restoration Technologies. Jason Iehl, NetApp

Google File System, Replication. Amin Vahdat CSE 123b May 23, 2006

StorageCraft OneXafe and Veeam 9.5

Scylla Open Source 3.0

Integrated Data Protection

The Data Protection Rule and Hybrid Cloud Backup

MongoDB and Mysql: Which one is a better fit for me? Room 204-2:20PM-3:10PM

Data Protection Modernization: Meeting the Challenges of a Changing IT Landscape

The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into

Evaluating Cloud Storage Strategies. James Bottomley; CTO, Server Virtualization

Cohesity Architecture White Paper. Building a Modern, Web-Scale Architecture for Consolidating Secondary Storage

IBM Spectrum Protect Version Introduction to Data Protection Solutions IBM

Server Fault Protection with NetApp Data ONTAP Edge-T

Understanding Virtual System Data Protection

XtremIO Business Continuity & Disaster Recovery. Aharon Blitzer & Marco Abela XtremIO Product Management

How Microsoft Built MySQL, PostgreSQL and MariaDB for the Cloud. Santa Clara, California April 23th 25th, 2018

EBOOK. NetApp ONTAP Cloud FOR MICROSOFT AZURE ENTERPRISE DATA MANAGEMENT IN THE CLOUD

Protecting Miscrosoft Hyper-V Environments

Cassandra - A Decentralized Structured Storage System. Avinash Lakshman and Prashant Malik Facebook

Move Amazon RDS MySQL Databases to Amazon VPC using Amazon EC2 ClassicLink and Read Replicas

A. Deduplication rate is less than expected, accounting for the remaining GSAN capacity

Copyright 2012 EMC Corporation. All rights reserved.

New Oracle NoSQL Database APIs that Speed Insertion and Retrieval

MySQL Cluster Ed 2. Duration: 4 Days

NoSQL systems. Lecture 21 (optional) Instructor: Sudeepa Roy. CompSci 516 Data Intensive Computing Systems

GFS: The Google File System. Dr. Yingwu Zhu

5 Fundamental Strategies for Building a Data-centered Data Center

SnapCenter Software 2.0 Installation and Setup Guide

EMC RecoverPoint. EMC RecoverPoint Support

Master Services Agreement:

CS 655 Advanced Topics in Distributed Systems

INTRODUCTION TO XTREMIO METADATA-AWARE REPLICATION

Google File System. Arun Sundaram Operating Systems

Modernize Your Backup and DR Using Actifio in AWS

EMC Data Protection for Microsoft

White paper ETERNUS CS800 Data Deduplication Background

NOSQL DATABASE SYSTEMS: DECISION GUIDANCE AND TRENDS. Big Data Technologies: NoSQL DBMS (Decision Guidance) - SoSe

Asigra Cloud Backup Provides Comprehensive Virtual Machine Data Protection Including Replication

Transcription:

BARNS: Backup and Recovery for NoSQL Databases Atish Kathpal, Priya Sehgal Advanced Technology Group, NetApp 1

Why Backup/Restore NoSQL DBs? Customers are directly ingesting into NoSQL Security breach are on the rise e.g. ransomware attacks on MongoDB [1] and recent WannaCrypt exploits Fat-finger errors eventually propagate to replicas Ransomware Sandbox deployments for test/dev Bring up shadow clusters of different cardinality (from production cluster snapshots) Compliance and regulatory requirements IDC, 2016 report [2] lists data-protection and retention as one of the top infrastructural requirements for NoSQL 2 [1] http://thehackernews.com/2017/01/secure-mongodb-database.html [2] Nadkarni A., Polyglot Persistence: Insights on NoSQL Adoption and the Resulting Impact on Infrastructure. IDC. 2016 Feb.

NoSQL Database Classes From Backup/Restore Service Perspective Master-slave Authoritative copy of each partition is contained in the master node that we can backup. Loss of primary node leads to shard/partition-unavailability until new leader is elected. Example: MongoDB, Redis, Oracle NoSQL, MarkLogic Master-less Data is scattered across nodes using consistent hashing techniques, no single node has all data for a given partition Eventual consistency: Unavailability of a destination node does not lead to write-failure, data is eventually replicated Example: Cassandra, Couchbase https://www.slideshare.net/mongodb/sharding-v-final, https://blog.imaginea.com/consistent-hashing-in-cassandra/ 3

NoSQL DBs Hosted on Shared Storage High-level, conceptual deployment architecture Backup: Leverage storage snapshots Restore: Leverage cloning NoSQL DB software (Node1) NoSQL DB software (Node2) NoSQL DB software (Node3) Filesystem (DB Journal, Logs, Data Files) Filesystem (DB Journal, Logs, Data Files) Filesystem (DB Journal, Logs, Data Files) LUN-1 LUN-2 LUN-3 Shared Storage Array (Snapshots, Cloning, Compression, Deduplication, Encryption, Cloud Integrations) 4

Backup/Restore Challenges Cluster-consistency at scale Cluster/App quiesce significantly hampers application performance. Cross node consistency not guaranteed Take crash consistent snapshots Post process crash consistent snapshots (in a sand-box) using NoSQL DB stack to reach an cluster-consistent state Space Efficiency Replica set data copies do not de-duplicate small row sizes, scattered across nodes (Cassandra) and unique ids added by storage engines (MongoDB) DB performs compression and encryption Remove replicas logically (application aware backup) Topology Changes Commodity nodes, at scale of 10-100s of nodes. E.g., Primary node might be unreachable while taking backup in case of MongoDB Storage snapshots do not have context about cluster topology Use cases may require restore to a test/dev cluster of different cardinality Save Cluster topology and storage mapping as part of backup Insignificant block based deduplication Write: (K1, V1), (K3, V3) Cassandra N1 Replication factor = 2 Write: (K1, V1), (K2, V2) Cassandra N2 Shared Storage cluster LUN1: K1, K3 LUN2: K1, K2 LUN3: K2, K3 Write: (K2, V2), (K3, V3) Cassandra N3 Existing open source utilities like Mongodump and Cassandra snapshots suffer from above challenges. 5

NoSQL Data Protection Challenges Master-Less Databases Client, Update K1 (CL.ONE) Challenges: Fault tolerance Backup may capture stale data due to eventual consistency Higher restore times, since Cassandra will perform repairs during restore Ack Update: (K1, V11, Tnew) Cassandra N1 Update: (K1, V11, Tnew) Cassandra N2 Cassandra N3 Snapshot of LUN2 will point to stale data Shared Storage cluster LUN1: (K1, V1, Told), (K1, V11, Tnew) LUN2: (K1, V1, Told), LUN3: http://docs.datastax.com/en/cassandra/3.0/cassandra/operations/opsrepairnodesreadrepair.html, https://wiki.apache.org/cassandra/hintedhandoff 6

BARNS Architecture Addresses challenges of: 1. Taking cluster-consistent backup at scale 2. Taking storage efficient backups (through replica removal) 3. Enables recovery/cloning to different cluster topologies Mongo APIs MongoDB Cluster Lib_MongoDB Light Weight Backup Process (LWB) Lib_Store1 APIs Lib_DB Post- Process (PP) Lib_Storage Cassandra Cluster Lib_Cassandra Restore Process Lib_Store2 Cassandra APIs APIs Stop_balancer() Get_topology() Check_cluster_health() Prepare_Backup_Topo( Etc. Get_lun_mappings() Take_snapshots() Lun_clone() Vault_backup() Etc. 7 Store1 Store2

BARNS Solution: Cassandra Master-less Distributed Database 8

Phase 1: Light-weight Backup Phase C1 Token1 Production Cluster C2 C3 Token2 Token3 C4 Token4 BARNS: LWB 1. Capture token assignment of each node 2. Store mapping of LUNs à Tokens L1 L2 L3 L4 CL1 D1 CL2 D2 CL3 D3 CL4 D4 3. Take snapshots of L1 to L4 Backup Metadata example CL1 CL2 CL3 CL4 D1 D2 D3 D4 sn1 sn2 sn3 sn4 9

Phase2: Post Process Phase Part1: Flush Commitlogs PP Node(s) BARNS: PP-Phase P1 P2 P3 P4 CL_sn1 CL_sn2 CL_sn3 CL_sn4 1. Clone the LWB Snapshot LUNs 2. Mount on different PP processes or different nodes 3. Start Cassandra processes P1 to P4 CL1 D1 CL2 D2 CL3 D3 CL4 D4 4. Flush CommitLogs E.g. Data K1, V1, T1 K2, V2, T1 K1,V11, T2 10

Phase2: Post Process Phase Part2: Compaction BARNS: PP-Phase PP Node /cassandra unionfs 1. Using UnionFS mount all snapshot clones as readonly 2. Create a new volume and LUN for full backup 3. Mount it as RW through UnionFS 4. Let PP node have all tokens of prod cluster 5. Start Cassandra 6. Start compaction process 7. Final compacted files will be stored on Fullbackup LUN Data 1 Data 2 Data 3 Data 4 K1, V1, T1 K1, V1, T1 K1, V1, T1 K3,V3, T3 K2, V2, T1 K1, V11, T2 K2, V2, T1 K2, V22, T4 K1,V11, T2 K3,V3, T3 CL_sn1 CL_sn2 CL_sn3 CL_sn4 Data Compact K1, V11, T2 K2, V22, T1 K3,V3, T3 Full_bk_lun Keeping single copy of data in the backup => ~66% reduction in backup storage requirements 11

Cassandra Restore The post-process step enables cloning to different restore/clone topologies Token1 Token2 N1 Token3 Token4 N2 Clones Full Backup K1, V11, T2 K2, V22, T1 Full_bk_lun K3,V3, T3 12

Evaluation - Cassandra Backup and Restore Full Backup LWB <10 secs pp-flush - ~40 secs pp-compact time increases by 35-40% à incremental backup Production cluster 4 nodes 4 iscsi LUNs Commitlog and SSTables for a node on same LUN Cassandra 4.0 Post Process Node 2 CPUs 8GB RAM YCSB to ingest data Restore time less than ~2 mins (irrespective of cluster size and data set size) 13

BARNS Solution: MongoDB Master-Slave Distributed Database Ø Check the paper or just attend the poster session J 14

Summary Tracking replicas and cluster topologies is important for taking backups and performing flexible topology restores Existing open-source solutions have several inefficiencies like need for repairs after restore, lack of storage efficiency in backup and poor integrations with shared storage Opportunity to provide efficient backup and restore through light-weight snapshots and clones 15

Thank You. 16