Your Complete Guide to Backup and Recovery for MongoDB


EBOOK

Table of Contents
Part I: Backup and Recovery for MongoDB
Part II: Customer Case Study
Part III: RecoverX Overview and Comparison of Current Solutions
Part IV: Datos IO Product Summary

[Figure: market survey conducted by Dimensional Research, March 2015. Question: "How important is data backup and recovery for production applications?" Response categories: must have, critical requirement; nice to have; not required. Chart values: 1%, 89%, 10%.]

Part I: Backup and Recovery for MongoDB

In this era of big data, enterprise applications create a large volume of data that may be structured, semi-structured or unstructured. In addition, application development cycles are much shorter, and application availability is a critical requirement. Given these requirements, enterprises are forced to look beyond traditional relational databases to onboard next-generation applications (on IaaS or cloud-based PaaS). NoSQL databases such as MongoDB are now being evaluated and adopted by enterprises for these applications (ecommerce, content management, etc.). MongoDB provides a dynamic schema, easy scaling through auto-sharding, tunable consistency for reads, and built-in replication.

MongoDB's native replication satisfies availability requirements. However, the data protection requirements for scalable point-in-time backup and recovery still need to be addressed. For robust data protection, yes, enterprises need both backup and replication! Without point-in-time backups, organizations are at substantial risk of losing data due to human error, logical corruption and other operational failures.

Traditional backup solutions were built for structured applications on relational databases that used shared storage and followed the ACID transaction model. Unfortunately, they fall short of the point-in-time backup requirements of next-generation applications and distributed databases. Enterprises are using a few alternative script-based solutions to fill the data protection gap, but these solutions are suboptimal at best.

1. Manual Scripted Solutions

Manual solutions leverage the native MongoDB snapshot utility and scripts to transfer data to secondary storage. The scripts (via mongodump) are customized for each MongoDB cluster and require significant operational effort to scale or to adapt to topology changes (such as adding nodes to or removing nodes from your MongoDB database). Further, these scripts are not resilient to failure scenarios, e.g., failure of a node (primary or secondary) or intermittent network issues. Finally, recovery is a manual process: it is time consuming, results in very high application downtime, and carries a risk of data loss from bugs in the scripts. Overall, these solutions work when the MongoDB environment is small and the application can tolerate some data loss.

Top issues of scripted solutions:
- Lack of an enterprise backup solution for sharded configurations
- Database needs to be offline when snapshots are taken
- Both backup and recovery fail under node failure and other infrastructure failures
- Recovery process is manual and requires verification, which increases recovery time
- Recovery at collection level is manual and time consuming
- Recovery to unlike topologies (sharded to unsharded) for test/dev refresh is not available

Most enterprises use these scripted methods as a temporary quick fix. It is like driving a car with flat tires: you can keep going, but you can neither travel at the speed you want nor avoid the risk of disaster.

2. MongoDB Paid Backup and Recovery (aka MMS)

MongoDB itself provides a couple of ways to back up MongoDB databases. Enterprises may choose a managed backup offering (MMS) that runs in the public cloud or, if they are paid MongoDB customers, they may deploy the backup service on-premise. In addition to being exorbitantly costly, the managed backup service stores customers' data in the public cloud. Backup data transfer over the WAN may not work for customers who deploy MongoDB on-premise or who need to keep their sensitive data in-house. Further, there are significant per-shard data limits for using this service.

Using the MongoDB on-premise backup service is possible, but it is complex to deploy and operationalize (the deployment diagram speaks for itself!). To enable on-premise backups, enterprises need to deploy 8 servers, additional databases (with additional licensing) and about 6-9x the storage capacity of the database being backed up. Overall, the on-premise backup service is a theoretical solution that brings with it significant CAPEX and OPEX investments:

1. Complexity of deploying multiple databases
2. Cost of additional infrastructure (servers and storage)
3. Cost of licensing additional MongoDB nodes
4. Risk of failed backups when nodes fail (the secondary from which the backup is taken)
5. Siloed backup infrastructure for the MongoDB database only

Recognizing the data protection requirements of enterprise customers, the emerging era of next-generation distributed databases, and the limitations of the solutions described above, Datos IO has built the industry-first scale-out data protection software product for next-generation applications deployed on distributed and cloud databases such as MongoDB and Apache Cassandra (DataStax). The Datos IO solution is built from the ground up for next-generation applications, caters to the needs of application owners and DevOps, and takes away the operational hassle of deploying and managing protection infrastructure. Most importantly, it remains reliable and scalable even when nodes fail, which minimizes recovery time (RTO).
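To make the scripted approach from section 1 concrete, here is a minimal sketch of what such a script typically assembles. It only builds the mongodump command lines and does not run them. The host strings, the backup root and the function names are hypothetical; `--host`, `--oplog` and `--out` are standard mongodump options. Note that each replica set is dumped independently, so nothing here coordinates the dumps to a single cluster-wide point in time, which is one of the gaps listed above.

```python
from datetime import datetime, timezone


def build_mongodump_command(host: str, out_root: str, when: datetime) -> list[str]:
    """Return the argument list for one mongodump run against one replica set."""
    stamp = when.strftime("%Y%m%dT%H%M%SZ")
    return [
        "mongodump",
        "--host", host,                  # replica set connection string (hypothetical)
        "--oplog",                       # also capture writes made while the dump runs
        "--out", f"{out_root}/{stamp}",  # one timestamped directory per run
    ]


def plan_cluster_backup(shard_hosts: dict[str, str], out_root: str,
                        when: datetime) -> dict[str, list[str]]:
    """Build one independent mongodump command per shard (no cross-shard coordination)."""
    return {
        shard: build_mongodump_command(host, f"{out_root}/{shard}", when)
        for shard, host in shard_hosts.items()
    }


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    plan = plan_cluster_backup(
        {"rs1": "rs1/db1.example.internal:27017", "rs2": "rs2/db4.example.internal:27017"},
        "/backups",
        now,
    )
    for shard, command in plan.items():
        print(shard, " ".join(command))
```

Every topology change (a shard added or removed) means editing the `shard_hosts` mapping by hand, which is the operational effort the text refers to.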

This section introduces the key requirements for protecting data that resides on MongoDB, whether it is deployed on-premise, on a private cloud with an as-a-service model, or in a public cloud such as Amazon AWS or Google Cloud Platform.

Requirement #1: Online Cluster-Consistent Backups

One of the key requirements of next-generation applications deployed on MongoDB is their always-on nature. This means that quiescing the database to take backups is not feasible and, moreover, that the backup operation should not impact the performance of the application. As the application scales, the underlying MongoDB database also needs to scale out to multiple shards. In this case, a backup solution must provide a consistent backup copy across shards without disrupting database and application performance during backup operations.

Requirement #2: Flexible Backup Options

Depending on the application, data may have different change rates and patterns. For example, in a product catalog, certain items may be refreshed every day (fast-selling goods), while others may have a longer shelf life (premium items). Based on the application requirements, some collections may need to be backed up every hour while others may be backed up daily. The flexibility to schedule backups at any interval and at collection-level granularity is another requirement that we have heard from customers who are using MongoDB. More importantly, these backups should always be stored on secondary storage in native formats to avoid vendor lock-in.

[Figure: RecoverX Data Protection Software Platform. Labels: Advanced Data Management Services; Recovery; Query-able Versions; Analytics.]
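The collection-level scheduling described in Requirement #2 can be pictured as a small policy table. The sketch below is illustrative only and is not RecoverX's interface: the `BackupPolicy` type, the `collections_due` helper and the collection names are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class BackupPolicy:
    collection: str      # "database.collection" namespace (hypothetical names below)
    interval: timedelta  # how often a new backup version should be taken


def collections_due(policies: list[BackupPolicy],
                    last_backup: dict[str, datetime],
                    now: datetime) -> list[str]:
    """Return the collections whose backup interval has elapsed, in policy order."""
    due = []
    for policy in policies:
        previous = last_backup.get(policy.collection)
        # A collection that was never backed up is always due.
        if previous is None or now - previous >= policy.interval:
            due.append(policy.collection)
    return due


if __name__ == "__main__":
    policies = [
        BackupPolicy("catalog.fast_selling", timedelta(hours=1)),  # refreshed daily, back up hourly
        BackupPolicy("catalog.premium", timedelta(days=1)),        # longer shelf life, back up daily
    ]
    now = datetime(2016, 6, 1, 12, 0)
    last = {name: now - timedelta(hours=2) for name in ("catalog.fast_selling", "catalog.premium")}
    print(collections_due(policies, last, now))
```

With both collections last backed up two hours ago, only the hourly collection is due; the daily one waits until its own interval has passed.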

Requirement #3: Scalable Recovery

During its lifecycle, data resides in multiple stages such as development, test, pre-production and production, and may also reside in multiple clouds (private and public). The topology of the MongoDB cluster differs at each stage. For production, the application could be deployed on a sharded MongoDB cluster on-premise, while the test team might only have access to unsharded MongoDB clusters in Amazon AWS (public cloud). Hence, the backup solution should allow multiple restore operations, such as sharded to sharded (for example, from a 5x3 cluster to a 2x3 sharded cluster) or sharded to unsharded (for example, from a 5x3 cluster to a 1x3 unsharded cluster), across such cloud configurations.

[Figure: restore topologies with Datos IO RecoverX. Labels: Production (On Premise), Production 3x3 Cluster (Replica Sets 1-3); Test/Dev (Cloud), Test 2x3 Cluster (Replica Sets 1-2), Dev Unsharded (Replica Set 1); Restore to same topology; Restore to different topology; Restore sharded to unsharded; NFS or Object Storage (S3).]

Requirement #4: Handling Failure

Failures are the norm in the distributed database world. The backup solution should therefore be resilient to database process failures, node failures, network failures and even logical corruption of data during backup and recovery operations. It should also be able to handle failures of the MongoDB config servers that store metadata for sharded clusters.

Finally, customers are deploying MongoDB on a variety of models: physical servers, private clouds, microservices-like frameworks, and the public cloud. Backup and recovery should be seamless across these deployments, and ease of deployment is a major requirement for MongoDB customers. At Datos IO, we are working to provide enterprise-grade backup and recovery solutions that enable you to onboard and scale your enterprise applications on MongoDB with confidence.
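As a rough illustration of the restore topologies in Requirement #3, the sketch below assigns the backups of the source shards to a smaller number of target shards in round-robin order. This is an assumption made for illustration, not how RecoverX performs restores; a real restore must also respect shard keys and chunk ranges. The `map_shards` name and the replica set labels are hypothetical.

```python
def map_shards(source_shards: list[str], target_count: int) -> dict[int, list[str]]:
    """Assign each source shard's backup to one of `target_count` restore targets."""
    if target_count < 1:
        raise ValueError("target_count must be at least 1")
    mapping: dict[int, list[str]] = {index: [] for index in range(target_count)}
    for position, shard in enumerate(source_shards):
        # Round-robin fan-in: shard i is restored into target i mod target_count.
        mapping[position % target_count].append(shard)
    return mapping


if __name__ == "__main__":
    production = ["rs1", "rs2", "rs3", "rs4", "rs5"]  # the five shards of a 5x3 cluster
    print(map_shards(production, 2))  # sharded to sharded (5x3 to 2x3)
    print(map_shards(production, 1))  # sharded to unsharded (5x3 to 1x3)
```

The unsharded case is simply the mapping with a single target, which receives the data of every source shard.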

Part II: Customer Case Study

The Customer (Media & Entertainment): Our customer is a leading North American entertainment discovery platform company that provides personalized entertainment. The organization has over 1,800 employees worldwide. Its end customers include Spotify, Shazam, iTunes and Google. The customer deploys an entertainment guide application on a MongoDB database because of its flexible schema, ease of use, and scalability, given the high-volume nature of their applications. Specifically, they use multiple MongoDB clusters in both sharded and unsharded configurations. They store entertainment content metadata for audio/video programming, as well as sensitive customer data such as names and addresses.

The Problem: This customer required a durable backup and recovery solution that is purpose-built for MongoDB and that would meet internal IT standards. They initially considered the native backup and recovery capabilities of MongoDB, but the complexity of deployment, extremely high TCO, and lack of sharded cluster support made them unsuitable. Our customer uses MongoDB Enterprise with Cloud Manager for operational management of their MongoDB cluster. However, they found that the on-premise backup solution from MongoDB is very resource-intensive and requires setting up multiple dedicated servers (~6-8), a large amount of storage (6-8x) and additional MongoDB licenses. Overall, the competitive solution price was ~4x higher than Datos IO RecoverX. In addition, they wanted a single backup and recovery solution that can support multiple databases as they look to adopt additional non-relational databases in the future.

The Solution: RecoverX gave our customer the ability to 1) take daily backups, and 2) recover in the event of operational failures. Specifically, they chose Datos IO for the capability to schedule and perform flexible versioning and collection-level recovery operations, along with an intuitive user interface. Additionally, our customer wanted to keep backups on local on-premise storage, so they needed Datos IO RecoverX to support classical network-attached secondary storage. Finally, the software-only Datos IO RecoverX solution allowed them to deploy on existing hardware and scale across their application environments.

Environment Details

                          MongoDB Cluster 1    MongoDB Cluster 2
MongoDB Version           3.0 WT               3.0 WT
MongoDB Configuration     Sharded              Unsharded
Database Size             1TB                  200GB
Number of Collections     ~100                 Several hundred
Deployment Type           Physical servers     Physical servers
Storage Type              NFS storage          NFS storage
Config Server (Yes/No)    Replica set          No

[Figure: deployment diagram. MongoDB Cluster 1 (2 replica sets, 3.0 WT) and MongoDB Cluster 2 (1 replica set, 3.0 WT), each with a control plane, transfer data in parallel from their primary (P) and secondary (S) replica set members to RecoverX, which handles consistency and deduplication and writes to storage.]

Part III: RecoverX Overview and Comparison of Current Solutions

Benefits of RecoverX for MongoDB Backup and Recovery
- Recover in minutes, not hours
- 80% savings on storage costs
- ~5x improvement in DevOps efficiency

Datos IO RecoverX: The Leading Solution for Scalable and Reliable MongoDB Backup and Recovery

Datos IO RecoverX is the industry-first scale-out, software-only data protection product to deliver scalable and reliable MongoDB backup and recovery. RecoverX provides scalable versioning, 1-click recovery, and industry-first semantic deduplication through a scale-out software-only platform. RecoverX allows organizations to protect their data at any granularity and at any point in time (flexible RPOs), to reduce downtime with recovery in minutes, not hours (low RTOs), to save up to 80% on secondary storage costs, and to increase the productivity of application and DevOps teams.

Datos IO Compatibility Specifics for MongoDB

MongoDB versions: MongoDB v3.0 (MMAP & WT), MongoDB v3.2 (MMAP & WT)

Deployment Type           On-Premise                    AWS Cloud                 Google Cloud
Storage Type              NFS                           AWS S3                    Google Cloud Storage
RecoverX S/W              8-core physical or virtual    EC2 m4.2xlarge            Standard Compute Engine machine, 8 vCPUs, 30GB
Requirements (per node)   16GB memory                   EC2 m4.2xlarge            Standard Compute Engine machine, 8 vCPUs, 30GB
                          128GB local storage (SSD)     128GB EBS or local SSD    128GB SSD

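How Datos IO Stacks Up for MongoDB Backup and Recovery

Backup
  Native tool (mongodump): Not consistent
  Database vendor backup service: Consistent
  RecoverX: Consistent version at any interval and at any granularity
  Customer value and benefits: Enterprise-grade backup and recovery software

Infrastructure Cost & Complexity
  Native tool (mongodump): Implementation dependent
  Database vendor backup service: Very high (6-8 servers and storage required); Medium (agents)
  RecoverX: Single (1) Datos IO software server
  Customer value and benefits: 8X lower TCO (saving of 60% to 70%)

Impact on Database
  Native tool (mongodump): High (quiescing)
  Database vendor backup service: Medium (agents)
  RecoverX: Low
  Customer value and benefits: Streaming backup, no database impact

Recovery
  Native tool (mongodump): Manual and risk prone
  Database vendor backup service: Slow
  RecoverX: Parallel recovery to all shards
  Customer value and benefits: ~4-5X faster recovery (reduced RTO)

Recovery Option
  Native tool (mongodump): Manual and risk limited
  Database vendor backup service: Limited options
  RecoverX: Flexible options for all combinations, such as sharded to sharded
  Customer value and benefits: Enterprise grade; enables test/dev and workload migration use cases

Failure Handling
  Native tool (mongodump): None
  Database vendor backup service: None
  RecoverX: Yes, resilient to database and node failures (including primary)
  Customer value and benefits: Resiliency and reduced enterprise risk

Storage Savings
  Native tool (mongodump): None
  Database vendor backup service: None
  RecoverX: Up to 3x (RF=3)
  Customer value and benefits: Up to 80% reduced storage cost

Support for Multiple Databases
  Native tool (mongodump): None
  Database vendor backup service: None
  RecoverX: Apache Cassandra and MongoDB
  Customer value and benefits: Multi-platform, enterprise-grade data protection software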
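The "Up to 3x (RF=3)" storage entry in the comparison can be read as follows: with a replication factor of 3, every document exists on three replica set members, so a backup that keeps a single copy of each document needs one third of the space. The sketch below demonstrates that arithmetic with plain content hashing. This is a generic technique chosen for illustration and is not the semantic deduplication that RecoverX describes; the function names and sample documents are hypothetical.

```python
import hashlib
import json


def document_fingerprint(doc: dict) -> str:
    """Hash a canonical JSON form so that key order does not affect the result."""
    canonical = json.dumps(doc, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def deduplicate(replicas: list[list[dict]]) -> dict[str, dict]:
    """Keep one copy of each distinct document seen across all replicas."""
    unique: dict[str, dict] = {}
    for replica in replicas:
        for doc in replica:
            unique.setdefault(document_fingerprint(doc), doc)
    return unique


if __name__ == "__main__":
    primary = [{"_id": 1, "title": "Song A"}, {"_id": 2, "title": "Song B"}]
    secondary_1 = [{"title": "Song A", "_id": 1}, {"title": "Song B", "_id": 2}]
    secondary_2 = [{"_id": 2, "title": "Song B"}, {"_id": 1, "title": "Song A"}]
    replicas = [primary, secondary_1, secondary_2]

    total = sum(len(replica) for replica in replicas)
    stored = len(deduplicate(replicas))
    print(f"{total} documents read, {stored} stored, reduction {total / stored:.1f}x")
```

Three identical replicas of two documents reduce to two stored documents, a 3x reduction, which matches the RF=3 figure in the comparison.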

Part IV: Datos IO Product Summary

RecoverX Overview

Datos IO RecoverX is the industry-first scale-out data protection software purpose-built for non-relational databases such as Apache Cassandra and MongoDB. RecoverX provides any-point-in-time backup and orchestrated recovery to protect against logical errors, human errors, malicious data corruption, application schema corruption and other soft errors. RecoverX also enables continuous integration and development by fully automating the refresh of test and development environments using production data.

Features & Benefits

RecoverX provides organizations with the following benefits:
- Minimize application downtime through application-consistent point-in-time backups that remove the need for database repairs after recovery operations
- Reduce secondary storage costs by ~80% using industry-first global semantic deduplication
- Failure resiliency and elastic performance via a highly available software-only product
- Operational efficiency through fully orchestrated recovery
- Lightweight deployment in a public cloud or private datacenter

[Figure: RecoverX architecture. Data Source (Cluster 1) and Data Source (Cluster 2), each with a control plane, stream data in parallel to a 3-node RecoverX deployment, which handles consistency and deduplication, writes to storage, and supports test/dev refresh for DevOps.]

About Datos IO

Datos IO is the application-centric data management company for the multi-cloud world. Our flagship Datos IO RecoverX delivers a radically novel approach to data management, helping organizations embrace the cloud with confidence by delivering solutions that protect, mobilize, and monetize their data at scale. Datos IO was recently awarded Product of the Year by Storage Magazine and was recognized by Gartner in the 2016 Hype Cycle for Storage Technologies. Backed by Lightspeed Venture Partners and True Ventures, Datos IO is headquartered in San Jose, California.

Datos IO RecoverX: Application-Centric Data Protection for the Cloud
www.datos.io
408.708.4136
info@datos.io
2550 North First Street, Suite 420, San Jose, CA 95131