Mike Kania Truss


Mike Kania, Engineer @ Truss, http://truss.works/

MongoDB on AWS With Minimal Suffering

Topics
- Provisioning MongoDB replica sets on AWS
- Choosing storage and a storage engine
- Backups
- Monitoring
- Capacity planning

Why Shouldn't You Listen to Me?
- MongoDB is a jack of all trades, and there are certain features that I haven't touched
  - Sharding: built a custom way to shard data
  - Aggregation/MapReduce: didn't touch this at all

Why Should You Listen to Me?
- Ran one of the most complex MongoDB infrastructures (in the world?)
- Started using MongoDB in v1.8
- Ran MongoDB on a platform that supported hundreds of thousands of developers
- Did crazy shit with MongoDB

What happens when you manage MongoDB for 2 years

Do You Need to Run Your Own MongoDB?
- Hosted MongoDB services do exist and they aren't terrible, but there is a cost
  - mLab (formerly MongoLab)
  - ObjectRocket
  - Compose.io
  - Parse (RIP)

I thought we were going to talk about MongoDB on AWS

Getting Your AWS House in Order
- Use VPC(s)
  - EC2 Classic is full of problems
  - VPC is better in every way except for IPv6
- Run all your servers using hardware virtual machine (HVM) AMIs
- Use Enhanced Networking on certain instance types
  - Way better packets-per-second limits
  - Less packet loss

Starter Replica Set
- 1 primary and 2 secondaries
- In at least 3 Availability Zones, because redundancy
- Each node has one vote (default)
- 1 secondary is a dedicated backup node and should never be primary
  - hidden = true
  - priority = 0
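The starter layout above can be sketched as a replica set configuration document. This is a minimal illustration, not a provisioning tool: the set name `rs0`, the host names, and the helper `build_replset_config` are hypothetical, and the dict only mirrors the `members` fields (`votes`, `priority`, `hidden`) named on the slide.

```python
def build_replset_config(name, hosts, backup_index):
    """Return a replica set config document as a plain dict.

    `hosts` should hold one host per Availability Zone; `backup_index`
    selects the secondary that is dedicated to backups.
    """
    if not 0 <= backup_index < len(hosts):
        raise ValueError("backup_index must point at one of the hosts")
    members = []
    for i, host in enumerate(hosts):
        # Every node keeps the default single vote.
        member = {"_id": i, "host": host, "votes": 1, "priority": 1}
        if i == backup_index:
            # Dedicated backup node: hidden from clients, never elected primary.
            member["hidden"] = True
            member["priority"] = 0
        members.append(member)
    return {"_id": name, "members": members}


# Hypothetical hosts, one per Availability Zone.
config = build_replset_config(
    "rs0",
    [
        "db1.az-a.example.internal:27017",
        "db2.az-b.example.internal:27017",
        "db3.az-c.example.internal:27017",
    ],
    backup_index=2,
)
```

A document shaped like this is what one would pass to `rs.initiate()` in the mongo shell; the backup node still votes, it just cannot become primary.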

Replica Set

SERVER-17882 - Update with key too large to index crashes WiredTiger/RockDB secondary

Arbiters
- mongod processes that do nothing but vote
- Highly reliable and mostly stateless
- Easy to run multiple arbiter instances on a single host
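Why a vote-only process is useful comes down to majority arithmetic: a primary can only be elected when a strict majority of voting members is reachable. The sketch below works through that arithmetic; the function names are illustrative and not part of any MongoDB API.

```python
def votes_needed(total_votes):
    """Strict majority of voting members required to elect a primary."""
    return total_votes // 2 + 1


def can_elect_primary(total_votes, reachable_votes):
    """True if the reachable members hold a majority of all votes."""
    return reachable_votes >= votes_needed(total_votes)


# Two data nodes alone: losing one leaves 1 of 2 votes, which is no majority.
two_nodes_survive = can_elect_primary(total_votes=2, reachable_votes=1)

# Two data nodes plus one arbiter: losing one data node leaves 2 of 3 votes.
with_arbiter_survive = can_elect_primary(total_votes=3, reachable_votes=2)
```

The arbiter stores no data, so it adds the tie-breaking vote without the cost of a third full copy of the dataset.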

Storage Engines
- Choosing a storage engine will influence your options around instance types and disk storage
- It also depends on how you use MongoDB
  - Read-heavy: MMAP, WiredTiger
  - Write-heavy: RocksDB
- If you have an existing workload, use Flashback to benchmark each storage engine

MMAP
- Only storage engine that existed before v3.0
- Most stable of all the storage engines
- It is what it says: memory-mapped files
- Uses B-trees as the data structure
- No performance tuning
  - Trusts the underlying OS will do the right thing
- No compression
- Collection-level locking

WiredTiger
- The next-generation storage engine, default in 3.2
- B-trees in 3.0 and 3.2
- Up to 10x compression over MMAP
- Document-level locks
- Good for mixed read/write workloads

RocksDB
- RocksDB is an embedded database written and deployed at Facebook
- Uses log-structured merge (LSM) trees as the underlying data structure
- Designed for fast writes
- Document-level locks
- Up to 10x compression over MMAP
- Incremental backups with Strata
- Supported by Percona

Disk Storage
- EBS
  - Built-in block-level snapshots and restores
  - Up to 20,000 IOPS per volume and overall max throughput of 800MB/sec
  - No longer need to RAID unless you have a dataset > 16TB!
  - Restoring from snapshot requires an expensive pre-warm step
- Ephemeral storage
  - Up to 315,000 write IOPS and 365,000 read IOPS
  - Lives and dies with the EC2 instance
  - Low network latencies

EBS
- Standard
  - Never use these
  - Spinning drives are so 2012
- GP2
  - Max 10,000 IOPS per EBS volume
  - Max 160MB/sec per volume
  - SSD-backed but not as performant as PIOPS
- Provisioned IOPS
  - Max 20,000 IOPS per EBS volume
  - Max 320MB/sec per volume
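A rough way to reason about GP2 sizing is sketched below. The 10,000 IOPS cap comes from the slide; the 3 IOPS per GiB ratio and the 100 IOPS floor are assumptions taken from AWS's published gp2 baseline model and are not stated in the talk. Both helper functions are illustrative.

```python
import math

GP2_IOPS_PER_GIB = 3     # assumption: AWS's published gp2 baseline ratio
GP2_MIN_IOPS = 100       # assumption: AWS's published gp2 floor
GP2_MAX_IOPS = 10_000    # per-volume cap quoted on the slide


def gp2_baseline_iops(size_gib):
    """Baseline IOPS for a GP2 volume of the given size."""
    scaled = size_gib * GP2_IOPS_PER_GIB
    return max(GP2_MIN_IOPS, min(GP2_MAX_IOPS, scaled))


def gp2_size_for_iops(required_iops):
    """Smallest GP2 volume (GiB) whose baseline meets `required_iops`."""
    if required_iops > GP2_MAX_IOPS:
        raise ValueError("beyond the GP2 cap; consider Provisioned IOPS")
    return max(1, math.ceil(required_iops / GP2_IOPS_PER_GIB))
```

Under these assumptions a workload that needs more than the GP2 cap per volume is the point where Provisioned IOPS or ephemeral storage enters the picture.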

Ephemeral Storage
- Use the i2 instance class
- Use only with RocksDB and Strata for backups
- No need to pre-warm
- No reliance on the network for storage
- Low latency and high throughput
- Goes POOF with the EC2 instance

Backups
- EBS snapshots
  - Incremental backups at the block level
  - Works on all storage engines
  - Impacts performance
- Strata
  - Open source backup tool used by Parse
  - Incremental backups to S3
  - Supports RocksDB only

EBS Snapshots
- Avoid RAIDing EBS volumes
  - 16TB is more than enough for anyone
- Dedicated secondary for backups
  - Set priority = 0
  - Set hidden = true
- Lock mongo with db.fsyncLock(), or use xfs_freeze if using XFS
- Restoring from snapshot comes with a pre-warming cost
  - Use dd or fio to read all the blocks
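The pre-warm step amounts to reading every block of the restored volume once so later reads do not pay the first-touch penalty. The talk recommends dd or fio; the sketch below shows the same idea in Python and is only an illustration. The function name `prewarm` is hypothetical, and the demo reads a temporary file rather than a real block device (which would need root and a device path such as the attached EBS volume).

```python
import os
import tempfile


def prewarm(path, block_size=1024 * 1024):
    """Read `path` sequentially from start to end; return bytes read."""
    total = 0
    # buffering=0 gives raw reads, so each call asks the OS for one block.
    with open(path, "rb", buffering=0) as device:
        while True:
            chunk = device.read(block_size)
            if not chunk:
                break  # end of file or device
            total += len(chunk)
    return total


# Demo on a temporary file standing in for a restored volume.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\0" * (3 * 1024 * 1024 + 17))
    demo_path = tmp.name

bytes_read = prewarm(demo_path)
os.remove(demo_path)
```

On a multi-terabyte volume this single-threaded read is slow, which is the "expensive" part of restoring from a snapshot.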

Strata
- No need for a dedicated backup node
- Use with ephemeral storage
- Turn on Cross-Region Replication for the S3 bucket
- Run a Strata backup to S3:
    strata backup -r=mydata-db1 -b=s3-mongo-backups
- Prune metadata for backups older than a certain date:
    strata delete -r=mydata-db1 -b=s3-mongo-backups -a=720h
- Delete data files that are orphaned by strata delete:
    strata gc -r=mydata-db1 -b=s3-mongo-backups
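The three commands above naturally form one scheduled job: back up, prune old metadata, then garbage-collect orphaned files. A minimal sketch of such a wrapper follows. The flags are copied from the slide; the helper names, the default dry-run behaviour, and the idea of running the steps in this order from one script are assumptions.

```python
def strata_commands(replica_id, bucket, max_age="720h"):
    """Build the argv lists for backup, prune, and gc, in that order."""
    common = [f"-r={replica_id}", f"-b={bucket}"]
    return [
        ["strata", "backup", *common],
        ["strata", "delete", *common, f"-a={max_age}"],
        ["strata", "gc", *common],
    ]


def run_backup_cycle(replica_id, bucket, runner=None):
    """Run each step through `runner`; with no runner, just collect them.

    To execute for real, pass something like:
        lambda argv: subprocess.run(argv, check=True)
    """
    executed = []
    for argv in strata_commands(replica_id, bucket):
        if runner is not None:
            runner(argv)
        executed.append(argv)
    return executed


commands = run_backup_cycle("mydata-db1", "s3-mongo-backups")
```

Keeping the command construction separate from execution makes the job easy to dry-run before pointing it at a production bucket.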

Planning for More Mongo
- Try to keep replica sets to 2-3TB in size
  - Allows a reasonable time for initial syncs and restoring from backup
- If running MMAP, keep an eye on lock % as the metric for adding more capacity
  - > 50% means it's time to add more
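These two rules of thumb are simple enough to encode as an alerting check. The thresholds below come from the slide (3TB upper bound, 50% lock); the function name and message wording are illustrative.

```python
MAX_REPLICA_SET_TB = 3.0    # upper end of the 2-3TB guideline
LOCK_PCT_THRESHOLD = 50.0   # MMAP lock % above which to add capacity


def capacity_warnings(data_size_tb, lock_pct=None):
    """Return a list of human-readable capacity warnings (possibly empty).

    `lock_pct` only applies to MMAP; pass None for other storage engines.
    """
    warnings = []
    if data_size_tb > MAX_REPLICA_SET_TB:
        warnings.append(
            f"replica set holds {data_size_tb}TB; initial syncs and "
            "restores will be slow, consider splitting the data"
        )
    if lock_pct is not None and lock_pct > LOCK_PCT_THRESHOLD:
        warnings.append(
            f"lock % is {lock_pct}; time to add more capacity"
        )
    return warnings
```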

Configuration
- Chef/Ansible/Salt
  - Pick your poison... as long as it's not CFEngine
  - Parse chose Chef, but eventually grew out of it
- Most important lesson learned was limiting scope to managing system-level configs and services
  - ulimits
  - cronjobs
  - sysctl

Managing Running Replica Sets
- Mongoctl
  - Keeps state in a JSON file or database
  - Allows for centralizing commands like resync-secondary or configure-cluster
- Write your own
  - Open source it, because more tools like this need to exist

CloudFormation
- Suited for managing stateless services or services that don't change often
  - Auto Scaling groups
  - VPCs/security groups/NAT gateways
- Orchestration becomes difficult once you start dealing with stateful EBS volumes
- Also doesn't really fit the need for managing cluster state
  - Adding/removing nodes
  - Changing priorities
  - Rolling upgrades

Monitoring
- Log all queries to your favorite centralized logging service
- MMAP
  - Lock %: the biggest bottleneck
- RocksDB
  - Tombstones
  - CPU and disk I/O

Logging Pipeline
- Log all queries, not just the slow ones
- Open source logtailer written in Go that can output Mongo logs as consistent JSON
  - https://github.com/parseplatform/logtailer
- Fire into your favorite centralized logging service
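To make "Mongo logs as consistent JSON" concrete, here is a small sketch of the transformation. It is not how logtailer is implemented (that tool is written in Go); the regex assumes the MongoDB 3.x log layout of timestamp, severity, component, bracketed context, then message, and the sample line is made up for illustration.

```python
import json
import re

# Assumed layout: "<timestamp> <severity> <component> [<context>] <message>"
LINE_RE = re.compile(
    r"^(?P<timestamp>\S+)\s+(?P<severity>[A-Z])\s+(?P<component>\S+)\s+"
    r"\[(?P<context>[^\]]+)\]\s+(?P<message>.*)$"
)
DURATION_RE = re.compile(r"(\d+)ms$")


def parse_line(line):
    """Turn one log line into a dict, or None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    # Operation lines end with their duration, e.g. "... 153ms".
    duration = DURATION_RE.search(record["message"])
    if duration:
        record["duration_ms"] = int(duration.group(1))
    return record


sample = (
    '2016-05-10T18:22:31.123+0000 I COMMAND  [conn42] command app.users '
    'command: find { find: "users" } planSummary: COLLSCAN 153ms'
)
record = parse_line(sample)
as_json = json.dumps(record, sort_keys=True)
```

Once every line is a JSON object with the same keys, shipping it to a centralized logging service and querying by `component` or `duration_ms` becomes straightforward.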

Qs and As

Links
- RocksDB - http://rocksdb.org/
- Compose - compose.io
- ObjectRocket - http://objectrocket.com/
- mLab - https://mlab.com/
- Strata - https://github.com/facebookgo/rocks-strata
- Mongo Logtailer - https://github.com/parseplatform/logtailer
- Mongoctl - https://github.com/mongolab/mongoctl
- Terraform - https://www.terraform.io/
- CloudFormation - https://aws.amazon.com/documentation/cloudformation/