Your Complete Guide to Backup and Recovery for MongoDB
Table of Contents
Part I: Backup and Recovery for MongoDB
Part II: Customer Case Study
Part III: RecoverX Overview and Comparison of Current Solutions
Part IV: Datos IO Product Summary

[Figure: Market survey conducted by Dimensional Research, March 2015. Question: "How important is data backup and recovery for production applications?" Response options: Must have, critical requirement; Nice to have; Not required. Reported shares: 1%, 89%, 10%.]

Part I: Backup and Recovery for MongoDB

In this era of big data, enterprise applications create a large volume of data that may be structured, semi-structured or unstructured in nature. In addition, application development cycles are much shorter and application availability is a critical requirement. Given these requirements, enterprises are forced to look beyond traditional relational databases to onboard next-generation applications (on IaaS or cloud-based PaaS). NoSQL databases such as MongoDB are now being evaluated and adopted by enterprises for these applications (ecommerce, content management, etc.). MongoDB provides dynamic schema, easy scaling through auto-sharding, tunable consistency for reads, and built-in replication.

MongoDB has native replication that satisfies availability requirements. However, data protection requirements for scalable point-in-time backup and recovery still need to be addressed. For robust data protection, yes, enterprises need both backup and replication! Without point-in-time backups, organizations are at substantial risk of losing data due to human error, logical corruption and other operational failures.

Traditional backup solutions were built to address the requirements of structured applications on relational databases that used shared storage and had the ACID transaction model. Unfortunately, they fall short of addressing the point-in-time backup requirements of next-generation applications and distributed databases.
There are a few alternative script-based solutions that enterprises use to fill the data protection gap, but these solutions are suboptimal at best.
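To make the script-based approach concrete, here is a minimal sketch of the kind of wrapper such scripts tend to be built around. It only constructs mongodump command lines and does not run them. The host names, port, and backup path are hypothetical, and the hand-maintained host list is an assumption about how these scripts are usually written, not something taken from a specific deployment. The flags used (--host, --port, --out, --oplog) are standard mongodump options.

```python
import datetime
import shlex


def build_mongodump_command(host, port, out_root, when, use_oplog=True):
    """Return the argv list for one mongodump run against a single node."""
    # One timestamped output directory per node and run.
    out_dir = f"{out_root}/{host}-{when:%Y%m%dT%H%M%S}"
    argv = ["mongodump", "--host", host, "--port", str(port), "--out", out_dir]
    if use_oplog:
        # --oplog records writes that happen while the dump runs; it applies
        # to full-instance dumps of a replica-set member only.
        argv.append("--oplog")
    return argv


# Hypothetical, hand-maintained topology: one secondary per shard.
SHARD_SECONDARIES = [
    ("shard1-sec.example.internal", 27018),
    ("shard2-sec.example.internal", 27018),
]

when = datetime.datetime(2016, 1, 1, 2, 0, 0)
commands = [
    build_mongodump_command(host, port, "/mnt/backup", when)
    for host, port in SHARD_SECONDARIES
]
for argv in commands:
    print(shlex.join(argv))
```

Each command could be executed with subprocess.run(argv, check=True). Note that every shard is dumped independently and the host list is fixed in the script, so nothing in this sketch coordinates a single point in time across shards or adapts when nodes are added, removed, or fail.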
1. Manual Scripted Solutions

Manual solutions leverage the native MongoDB snapshot utility and scripts to transfer data to secondary storage. The scripts (via mongodump) are customized for each MongoDB cluster and require significant operational effort to scale or to adapt to any topology change (such as adding nodes to or removing nodes from your MongoDB database). Further, these scripts are not resilient to failure scenarios, e.g., failure of a node (primary or secondary) or intermittent network issues. Finally, recovery is a manual process, so it is time consuming, results in very high application downtime, and carries a risk of data loss from any bugs in the scripts. Overall, these solutions work when the MongoDB environment is small and some data loss is acceptable to the application.

Top issues of scripted solutions:
- Lack of an enterprise backup solution for sharded configurations
- Database needs to be offline when snapshots are taken
- Both backup and recovery fail under node failure and other infrastructure failures
- Recovery process is manual and requires verification, which increases recovery time
- Recovery at collection level requires manual recovery that is time consuming
- Recovery to unlike topologies (sharded to unsharded) for test/dev refresh is not available

Most enterprises use these scripted methods as a temporary quick fix. It is like driving your car with flat tires: you can keep going, but you can neither go at the speed you want nor are you free from the risk of disaster.

2. MongoDB Paid Backup and Recovery (aka MMS)

MongoDB itself provides a couple of ways to back up MongoDB databases. Enterprises may choose a managed backup offering (MMS) that runs in the public cloud or, if they are paid MongoDB customers, they may deploy the backup service on-premise. In addition to being exorbitantly costly, the managed backup service stores customers' data in the public cloud.
Backup data transfer over WAN may not work for customers who deploy MongoDB on-premise or for customers who need to keep their sensitive data in-house. Further, the service imposes significant per-shard data limits. Using the MongoDB on-premise backup service is possible, but it is complex to deploy and operationalize (the deployment diagram speaks for itself!). To enable on-premise backups, enterprises need to deploy 8 servers, additional databases (with additional licensing) and roughly 6-9x the storage capacity of the database that is backed up. Overall, the on-premise backup service is a theoretical solution that brings with it significant CAPEX and OPEX investments:
1. Complexity of deploying multiple databases
2. Cost of additional infrastructure (servers and storage)
3. Cost of licensing additional MongoDB nodes
4. Risk of failed backups when nodes fail (the secondary from which the backup is taken)
5. Siloed backup infrastructure for the MongoDB database only

Recognizing the data protection requirements of enterprise customers, the emerging era of next-generation distributed databases, and the limitations of the solutions described above, Datos IO has built the industry-first scale-out data protection software product for next-generation applications deployed on distributed and cloud databases such as MongoDB and Apache Cassandra (DataStax). The Datos IO solution is built from the ground up for next-generation applications, caters to the needs of application owners and DevOps, and takes away the operational hassles of deploying and managing protection infrastructure. Most importantly, it remains reliable and scalable even when nodes fail, which keeps recovery time (RTO) to a minimum.
This section introduces the key requirements for protecting data that resides on MongoDB, whether deployed on-premise, on a private cloud with an as-a-service model, or in a public cloud such as Amazon AWS or Google Cloud Platform.

Requirement #1: Online Cluster-Consistent Backups

One of the key requirements of next-generation applications deployed on MongoDB is their always-on nature. This means that quiescing the database to take backups is not feasible and, moreover, that the backup operation should not impact the performance of the application. As the application scales, the underlying MongoDB deployment also needs to scale out to multiple shards. In this case, a backup solution must provide a consistent backup copy across shards without disrupting database and application performance during backup operations.

Requirement #2: Flexible Backup Options

Depending on the application, data may have different change rates and patterns. For example, in a product catalog, certain items may be refreshed every day (fast-selling goods), while others may have a longer shelf life (premium items). Based on the application requirements, some collections may need to be backed up every hour while others may be backed up daily. The flexibility to schedule backups at any interval and at collection-level granularity is another requirement that we have heard from customers who use MongoDB. More importantly, these backups should always be stored on secondary storage in native formats to avoid vendor lock-in.

[Figure: RecoverX Data Protection Software Platform with Advanced Data Management Services (Recovery, Query-able Versions, Analytics).]
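The per-collection scheduling described under Requirement #2 can be illustrated with a short sketch. It assumes a simple policy table that maps each collection to a backup interval and decides which collections are due at a given time. The namespaces, intervals, and function name are hypothetical and chosen to mirror the product catalog example; this is not how RecoverX or any other product is configured.

```python
from datetime import datetime, timedelta

# Hypothetical policy: collection namespace -> backup interval.
POLICY = {
    "catalog.fast_selling": timedelta(hours=1),  # refreshed every day
    "catalog.premium": timedelta(days=1),        # longer shelf life
}


def collections_due(policy, last_backup, now):
    """Return the namespaces whose interval has elapsed, sorted by name."""
    due = []
    for namespace, interval in policy.items():
        last = last_backup.get(namespace)
        # A collection that has never been backed up is always due.
        if last is None or now - last >= interval:
            due.append(namespace)
    return sorted(due)


now = datetime(2016, 1, 2, 12, 0)
last_backup = {
    "catalog.fast_selling": datetime(2016, 1, 2, 10, 30),  # 1.5 hours ago
    "catalog.premium": datetime(2016, 1, 2, 0, 0),         # 12 hours ago
}
print(collections_due(POLICY, last_backup, now))  # ['catalog.fast_selling']
```

Keeping the policy as data rather than as logic means a new collection or a changed interval is a one-line change, which is the kind of flexibility the requirement asks for.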
Requirement #3: Scalable Recovery

During its lifecycle, data resides in multiple stages such as development, test, pre-production and production, and may also reside in multiple clouds (private and public). The topology of the MongoDB clusters at each stage is different. In production, the application could be deployed on a sharded MongoDB cluster on-premise, while the test team might only have access to unsharded MongoDB clusters in Amazon AWS (public cloud). Hence, the backup solution should allow multiple kinds of restore operation across such cloud configurations, such as sharded to sharded (for example, from a 5x3 cluster to a 2x3 sharded cluster) or sharded to unsharded (for example, from a 5x3 cluster to a 1x3 unsharded cluster).

[Figure: Datos IO RecoverX restore topologies. Labels: Production (On Premise), Production 3x3 Cluster; Test/Dev (Cloud), Test 2x3 Cluster, Dev Unsharded; Restore to same topology; Restore to different topology; Restore sharded to unsharded; NFS or Object Storage (S3).]

Requirement #4: Handling Failure

Failures are the norm in the distributed database world. The backup solution should therefore be resilient to database process failures, node failures, network failures and even logical corruption of data during backup and recovery operations. It should also be able to handle failures of the MongoDB config servers that store metadata for sharded clusters.

Finally, customers are deploying MongoDB on a variety of models, such as physical servers, private clouds, microservices-like frameworks, and public cloud. Backup and recovery should be seamless across these deployments, and ease of deployment is a major consideration for MongoDB customers. At Datos IO, we are working to provide enterprise-grade backup and recovery solutions that enable you to onboard and scale your enterprise applications on MongoDB with confidence.
Part II: Customer Case Study

Media & Entertainment

The Customer: Our customer is a leading North American entertainment discovery platform company that provides personalized entertainment. The organization has more than 1,800 employees worldwide. Its end customers include Spotify, Shazam, iTunes and Google. The customer deploys an entertainment guide application on MongoDB because of its schema flexibility, ease of use, and scalability given the high-volume nature of their applications. Specifically, they use multiple MongoDB clusters in both sharded and unsharded configurations. They store entertainment content metadata for audio/video programming, as well as sensitive customer data such as names and addresses.

The Problem: This customer required a durable backup and recovery solution that is purpose-built for MongoDB and that would meet internal IT standards. They initially considered the native backup and recovery capabilities of MongoDB, but the complexity of deployment, extremely high TCO, and lack of sharded cluster support made them unsuitable. Our customer uses MongoDB Enterprise with Cloud Manager for operational management of their MongoDB clusters. However, they found that the on-premise backup solution from MongoDB is very resource-intensive and requires setting up multiple dedicated servers (~6-8), a large amount of storage (6-8x) and additional MongoDB licenses. Overall, the price of the competitive solution was ~4x higher than that of Datos IO RecoverX. In addition, they wanted a single backup and recovery solution that can support multiple databases as they look to adopt additional non-relational databases in the future.

The Solution: RecoverX gave our customer the ability to 1) take daily backups and 2) recover in the event of operational failures. Specifically, they chose Datos IO for the capability to schedule and perform flexible versioning and collection-level recovery operations, along with an intuitive user interface.
Additionally, our customer wanted to keep backups on local, on-premise storage, so they needed Datos IO RecoverX to support classic network-attached secondary storage. Finally, the software-only Datos IO RecoverX solution allowed them to deploy on existing hardware and scale across their application environments.
Environment Details

                          MongoDB Cluster 1    MongoDB Cluster 2
MongoDB Version           3.0 WT               3.0 WT
MongoDB Configuration     Sharded              Unsharded
Database Size             1TB                  200GB
Number of Collections     ~100                 Several Hundreds
Deployment Type           Physical Server      Physical Servers
Storage Type              NFS Storage          NFS Storage
Config Server (Yes/No)    Replica Set          No

[Figure: Deployment Diagram. Labels: MongoDB Cluster 1 (2 replica sets, 3.0 WT); MongoDB Cluster 2 (1 replica set, 3.0 WT); RecoverX; Storage; Parallel Data Transfer; Consistency & Duplication; Control Plane.]
Part III: RecoverX Overview and Comparison of Current Solutions

Benefits of RecoverX for MongoDB Backup and Recovery
- Recover in Minutes, Not Hours
- 80% Savings on Storage Costs
- ~5x Improvement in DevOps Efficiency

Datos IO RecoverX: The Leading Solution for Scalable and Reliable MongoDB Backup and Recovery

Datos IO RecoverX is the industry-first scale-out, software-only data protection product to deliver scalable and reliable MongoDB backup and recovery. RecoverX provides scalable versioning, 1-click recovery, and industry-first semantic de-duplication through a scale-out software-only platform. RecoverX allows organizations to protect their data at any granularity and at any point in time (flexible RPOs), to reduce downtime with recovery in minutes, not hours (low RTOs), to save up to 80% on secondary storage costs, and to increase the productivity of application and DevOps teams.

Datos IO Compatibility Specifics for MongoDB

MongoDB versions: MongoDB v3.0 (MMAP & WT), MongoDB v3.2 (MMAP & WT)

Deployment Type: On-Premise
  Storage Type: NFS
  RecoverX S/W Requirements (per node): 8-core physical or virtual machine, 16GB memory, 128GB local storage (SSD)

Deployment Type: AWS Cloud
  Storage Type: AWS S3
  RecoverX S/W Requirements (per node): EC2 m4.2xlarge, 128GB EBS or local SSD

Deployment Type: Google Cloud
  Storage Type: Google Cloud Storage
  RecoverX S/W Requirements (per node): Standard Compute Engine machine (8 vCPUs, 30GB), 128GB SSD
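How Datos IO Stacks up for MongoDB Backup and Recovery

Backup
  Native Tool (Mongodump): Not consistent
  Database Vendor Backup Service: Consistent
  RecoverX: Consistent version at any interval and at any granularity
  Customer Value & Benefits: Enterprise grade backup and recovery software

Infrastructure Cost & Complexity
  Native Tool (Mongodump): Implementation dependent
  Database Vendor Backup Service: Very high (6-8 servers and storage required); Medium (agents)
  RecoverX: Single (1) Datos IO software server
  Customer Value & Benefits: 8X lower TCO (saving of 60% to 70%)

Impact on Database
  Native Tool (Mongodump): High (quiescing)
  Database Vendor Backup Service: Medium (agents)
  RecoverX: Low
  Customer Value & Benefits: Streaming backup, no database impact

Recovery
  Native Tool (Mongodump): Manual and risk prone
  Database Vendor Backup Service: Slow
  RecoverX: Parallel recovery to all shards
  Customer Value & Benefits: ~4-5X faster recovery (reduced RTO)

Recovery Option
  Native Tool (Mongodump): Manual and risk limited
  Database Vendor Backup Service: Limited options
  RecoverX: Enterprise grade with all combinations, such as sharded to sharded
  Customer Value & Benefits: Flexible options to enable test/dev and workload migration use case

Failure Handling
  Native Tool (Mongodump): None
  Database Vendor Backup Service: None
  RecoverX: Yes, resilient to database and node failures (including primary)
  Customer Value & Benefits: Resiliency and reduced enterprise risk

Storage Savings
  Native Tool (Mongodump): None
  Database Vendor Backup Service: None
  RecoverX: Up to 3x (RF=3)
  Customer Value & Benefits: Up to 80% reduced storage cost

Support for Multiple Databases
  Native Tool (Mongodump): None
  Database Vendor Backup Service: None
  RecoverX: Apache Cassandra and MongoDB
  Customer Value & Benefits: Multi-platform, enterprise grade data protection software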
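One way to read the "Up to 3x (RF=3)" storage figure in the comparison above is as replica de-duplication: if a cluster keeps RF identical copies of the data and the backup keeps only one of them, the backup is RF times smaller than a copy of every replica. The sketch below works through that arithmetic. This reading is an assumption rather than a description of how RecoverX computes its savings, and the function name is hypothetical. It does not model the additional savings that the 80% figure presumably includes.

```python
def replica_dedup_saving(replication_factor):
    """Fraction of storage saved by keeping one of N identical replica copies."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return 1 - 1 / replication_factor


for rf in (1, 2, 3):
    print(f"RF={rf}: {replica_dedup_saving(rf):.0%} saved")
```

With a replication factor of 3 this gives a saving of about two thirds from replica de-duplication alone, so any figure above that has to come from other sources such as de-duplication across backup versions.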
Part IV: Datos IO Product Summary

RecoverX Overview

Datos IO RecoverX is the industry-first scale-out data protection software purpose-built for non-relational databases such as Apache Cassandra and MongoDB. RecoverX provides any-point-in-time backup and orchestrated recovery to protect against logical errors, human errors, malicious data corruption, application schema corruption and other soft errors. RecoverX also enables continuous integration and development by fully automating the refresh of test and development environments using production data.

Features & Benefits

RecoverX provides organizations with the following benefits:
- Minimize application downtime through application-consistent point-in-time backups that remove the need for database repairs after recovery operations
- Reduce secondary storage costs by ~80% using industry-first global semantic deduplication
- Failure resiliency and elastic performance via a highly available software-only product
- Operational efficiency through fully orchestrated recovery
- Lightweight deployment in public cloud or private datacenter

[Figure: RecoverX 3-node deployment. Labels: Data Source (Cluster 1); Data Source (Cluster 2); Storage; Parallel Data Streaming; Test/Dev Refresh; Consistency & Duplication; Control Plane; DevOps.]

About Datos IO

Datos IO is the application-centric data management company for the multi-cloud world. Our flagship product, Datos IO RecoverX, delivers a radically novel approach to data management, helping organizations embrace the cloud with confidence by delivering solutions that protect, mobilize, and monetize their data at scale. Datos IO was recently awarded Product of the Year by Storage Magazine and was recognized by Gartner in the 2016 Hype Cycle for Storage Technologies. Backed by Lightspeed Venture Partners and True Ventures, Datos IO is headquartered in San Jose, California.
Datos IO RecoverX: Application-Centric Data Protection for the Cloud
www.datos.io
408.708.4136
info@datos.io
2550 North First Street, Suite 420, San Jose, CA 95131