MongoDB. David Murphy MongoDB Practice Manager, Percona

Transcription:

MongoDB Replication and Sharding
David Murphy, MongoDB Practice Manager, Percona

Who is this Person and What Does He Know?
- Former MongoDB Master
- Former Lead DBA for ObjectRocket, the original high-performance DBaaS for Mongo
- RDBMS background: 15+ years in MySQL
- NoSQL/MySQL Architect @ Electronic Arts
- Contributor to Mongo Core and tool builder

MongoDB Replication and Sharding
- Mongo Architecture
- Oplogs and Journals
- How replication works
- Special Replicaset settings
- What's in a configsvr? Talk through key collections
- What exactly is a chunk? What is it, and what happens in its life?
- Picking a shard key
- Bearing Fruit
  - Understand what happened recently
  - Keeping things balanced

Mongo Architecture - Overview

Mongo Architecture - Node
- Single mongod process
- A single server could have many processes
- Holds the full data and index data
- Uses journaling for recovery
- Similar to a MySQL node with no replication being used

Mongo Architecture - Replica Set
- Group of nodes
- Forms the basis of HA via elections
- All members are equal by default and replicate to each other
- Oplog used in place of MySQL-style binlogs; it is a capped collection, which is timestamp-based and idempotent
- Will auto-heal/rebuild bad secondary nodes
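An election only succeeds when a strict majority of voting members can see each other, which is why replica set sizing matters for HA. A minimal sketch of that arithmetic follows; the function names are illustrative and not part of any MongoDB API.

```python
def majority(voting_members: int) -> int:
    """Votes needed to elect a primary: a strict majority of voting members."""
    return voting_members // 2 + 1


def can_elect_primary(voting_members: int, reachable: int) -> bool:
    """True if the members that can still reach each other form a majority."""
    return reachable >= majority(voting_members)


def fault_tolerance(voting_members: int) -> int:
    """How many voting members can be lost while a primary can still be elected."""
    return voting_members - majority(voting_members)
```

Note that going from 3 to 4 voting members does not buy any extra fault tolerance under this rule, which is one reason odd member counts are usually preferred.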

Mongo Architecture - Shard
- If sharding is enabled, replica sets act as the groups, or shards, of a clustered setup
- An additional thread, called the chunk manager, is started on the primary
- Filters queries so they only return data from chunks the shard should own
- Does the legwork of balancing, while another component starts the process

Mongo Architecture - Config Server
- In 3.2+ this is a special replica set; before that, 1 or 3 standalone machines
- Holds the metadata:
  - Databases
  - Collections
  - Chunks
  - Changelog
  - Lock manager
  - Mongos list

Mongo Architecture - Mongos
- If config servers are the memory, mongos is the brain
- Sharding commands run on this
- Responsible for splitting chunks to keep the cluster balanced
- Responsible for telling one shard to drain and another to receive a chunk
- Groups and sorts multi-shard queries (scatter-gather)
- Connects to shards in an HA way for fault tolerance
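The scatter-gather step can be pictured as a merge of per-shard result streams that are each already sorted. The sketch below is only a model of that idea in plain Python; the function name and the sample documents are made up, and it says nothing about how mongos is implemented internally.

```python
import heapq
from typing import Dict, List


def scatter_gather_sorted(per_shard: Dict[str, List[dict]], sort_field: str) -> List[dict]:
    """Merge per-shard result lists, each already sorted by sort_field, into one sorted list."""
    return list(heapq.merge(*per_shard.values(), key=lambda doc: doc[sort_field]))


# Made-up results: every shard returns its own matches, sorted by "score".
results = {
    "shard0": [{"_id": 1, "score": 10}, {"_id": 4, "score": 40}],
    "shard1": [{"_id": 2, "score": 20}, {"_id": 3, "score": 30}],
}
merged = scatter_gather_sorted(results, "score")
```

The router has to wait on every shard it scattered to, so a query that cannot be targeted by shard key is only as fast as the slowest shard.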

Oplogs and Journals
- The oplog is like a MySQL binary log, but:
  - It is bound by timestamp (ntpd needed)
  - It is a capped collection / circular buffer
  - It is statement-based, not binary
  - It is idempotent
  - Its size cannot be changed online (may change in 3.4)
  - One statement updating 1M docs produces 1M oplog entries
- Journaling
  - Mongo has journaling like the InnoDB logs (iblogs) in MySQL, but different storage engines may or may not need it
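Two of the oplog properties above, idempotency and the one-entry-per-document fan-out, are easier to see in a toy model. The sketch below uses a simplified, invented entry shape (real oplog entries carry more fields) and a `deque` with `maxlen` as a stand-in for the capped collection.

```python
from collections import deque
from typing import Dict, List

Doc = Dict[str, int]


def inc_all(collection: Dict[int, Doc], field: str, amount: int, oplog: deque) -> None:
    """Apply one increment-style update to every document, logging one entry per document.

    Each entry records the resulting value (a "set"), not the increment itself,
    which is what makes replaying it safe.
    """
    for _id, doc in collection.items():
        doc[field] = doc.get(field, 0) + amount
        oplog.append({"op": "u", "_id": _id, "set": {field: doc[field]}})


def replay(collection: Dict[int, Doc], entries: List[dict]) -> None:
    """Apply oplog entries to a copy of the data, as a secondary would."""
    for entry in entries:
        collection[entry["_id"]].update(entry["set"])


primary = {i: {"views": i} for i in range(5)}
secondary = {i: dict(doc) for i, doc in primary.items()}
oplog = deque(maxlen=1000)  # capped: the oldest entries fall off once it is full

inc_all(primary, "views", 10, oplog)  # one statement, five oplog entries
replay(secondary, list(oplog))
replay(secondary, list(oplog))        # replaying a second time changes nothing
```

The same fan-out is why a single bulk update can push older entries out of a small oplog and shorten the window a lagging secondary has to catch up.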

How replication works
- Replication connections are normal connections, and may hit max connections
- No separate SQL/IO threads
- Relay logs kept on node for buffering
- Uses tailable cursor model to detect when more data is available
- No indexes on timestamp
- Hard-coded batch sizes
- High latency is a killer!
- Rollbacks are pure EVIL due to lack of re-integration

Special Replicaset settings
- Delayed Secondary: used to have an offset node (can be used for reporting) in case something bad happens. Many people will run ½ the backup offset, so at worst they roll forward 50% of the data in a recovery.
- Hidden: used with tags or a delay to prevent normal queries from using a node; sometimes you will back up from a hidden node.
- Tags: let queries target specific nodes, for example nodes with special indexes.
- Chained Replication: secondaries can replicate from other secondaries.
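These settings all live in the replica set configuration document. The sketch below builds such a document as a plain Python dict and checks one rule: hidden and delayed members must have priority 0. The hostnames, the tag, and the daily backup interval are assumptions for illustration; `slaveDelay` is the field name from the 3.x era of this talk and was renamed `secondaryDelaySecs` in MongoDB 5.0.

```python
from typing import List

BACKUP_INTERVAL_SECS = 24 * 60 * 60  # assumption: one backup per day

config = {
    "_id": "rs0",
    "members": [
        {"_id": 0, "host": "db1.example.net:27017", "priority": 1},
        {"_id": 1, "host": "db2.example.net:27017", "priority": 1,
         "tags": {"use": "reporting"}},
        # Hidden, delayed by half the backup interval, and never electable.
        {"_id": 2, "host": "db3.example.net:27017", "priority": 0, "hidden": True,
         "slaveDelay": BACKUP_INTERVAL_SECS // 2},
    ],
    "settings": {"chainingAllowed": True},
}


def config_problems(cfg: dict) -> List[str]:
    """List members that are hidden or delayed but could still become primary."""
    problems = []
    for member in cfg["members"]:
        special = member.get("hidden", False) or member.get("slaveDelay", 0) > 0
        if special and member.get("priority", 1) != 0:
            problems.append(f"member {member['_id']} is hidden or delayed but has non-zero priority")
    return problems
```

With a daily backup and a 12 hour delay, the worst case in a recovery is rolling forward roughly half a day of changes rather than a full day.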

What's in a configsvr?
- Key collections:
  - Databases/Collections
  - Chunks
  - Shards
  - Changelog
  - Mongos
  - Locks
- Types of changes:
  - Splits
  - moveChunks

What exactly is a chunk?
- What is it
  - Simply said, it is a logical range of documents
  - It is not physical
  - Splits do not cause data movement by themselves
  - Ranges live in config.chunks and tell mongos where to find data
- Lifecycle
  - Splits: when a chunk grows, the range is broken into parts to keep balance
  - MoveChunks: has stages of start, to, from, commit
  - moveChunk.from stage 4 is copying to the new location
  - Orphans: documents left on the source that were not cleaned up
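Because a chunk is only a range in a routing table, both lookup and split can be modelled without touching any data. The sketch below uses a made-up table in the spirit of config.chunks, with integer shard key values and invented shard names; the real collection stores richer documents.

```python
import bisect
from typing import List, Tuple

# (min_key, shard): each chunk covers [min_key, next chunk's min_key).
# Made-up routing table, kept sorted by min_key.
Chunk = Tuple[float, str]
chunks: List[Chunk] = [(float("-inf"), "shard0"), (100, "shard1"), (200, "shard0")]


def find_shard(table: List[Chunk], key: float) -> str:
    """Return the shard owning the chunk whose range contains key."""
    mins = [min_key for min_key, _ in table]
    return table[bisect.bisect_right(mins, key) - 1][1]


def split(table: List[Chunk], at: float) -> List[Chunk]:
    """Split the chunk containing `at` in two; both halves stay on the same shard."""
    if any(min_key == at for min_key, _ in table):
        raise ValueError("already a chunk boundary")
    return sorted(table + [(at, find_shard(table, at))], key=lambda c: c[0])


after = split(chunks, 150)
```

After the split there is one more chunk, but every key still routes to the same shard as before, which is the sense in which splits do not move data by themselves.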

Picking a shard key
- Must be immutable
- Should have high cardinality (low cardinality prevents splits)
- No arrays, geo, sparse/partial indexes, or nulls
- No incremental fields on the left side of the shard key
- Never use _id as a standalone key unless it is hashed
- Avoid using dates and sequential user IDs; if they are needed, put them on the right side
- Text fields are big; try to avoid them
- Unique fields must be on the left side
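The warning about incremental fields can be checked with a small simulation: ever-increasing values all fall into the last range chunk, while a hashed value spreads out. This is a sketch under simplifying assumptions: md5 stands in for MongoDB's own hash function, and a modulo over four buckets stands in for range chunks over the hashed value.

```python
import bisect
import hashlib
from collections import Counter
from typing import List


def chunk_index(boundaries: List[int], value: int) -> int:
    """Index of the range chunk holding value; boundaries are the split points."""
    return bisect.bisect_right(boundaries, value)


def hashed_bucket(value: int, buckets: int) -> int:
    """Stand-in for a hashed shard key (md5 here, not MongoDB's actual hash)."""
    digest = hashlib.md5(str(value).encode()).digest()
    return int.from_bytes(digest[:8], "big") % buckets


boundaries = [1000, 2000, 3000]  # four range chunks, split on existing ids
new_ids = range(3001, 4001)      # 1000 new, ever-increasing ids

ranged = Counter(chunk_index(boundaries, i) for i in new_ids)
spread = Counter(hashed_bucket(i, 4) for i in new_ids)
```

Here `ranged` puts all 1000 inserts into the last chunk, so one shard takes every write until the balancer catches up, while `spread` touches all four buckets.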

Understand what happened recently
- Changelog: best source of truth for all sharding actions
  - Balancing and splits
  - New/dropped DBs and collections
  - New sharding
- Locks: good to help understand if something is still running or done
- Logs: need to look at mongos and mongod logs (donor and target)
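A quick way to read the changelog is to count events per namespace and look for migrations that started but never committed. The sketch below works on made-up entries that carry only the `time`, `what` and `ns` fields; the timestamps and namespaces are invented for illustration.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, List

# Made-up entries shaped like config.changelog documents (only a few fields shown).
changelog = [
    {"time": datetime(2020, 1, 1, 10, 0), "what": "split", "ns": "app.users"},
    {"time": datetime(2020, 1, 1, 10, 5), "what": "moveChunk.start", "ns": "app.users"},
    {"time": datetime(2020, 1, 1, 10, 6), "what": "moveChunk.commit", "ns": "app.users"},
    {"time": datetime(2020, 1, 1, 11, 0), "what": "moveChunk.start", "ns": "app.events"},
]


def summarize(entries: Iterable[dict]) -> Counter:
    """Count changelog events per (namespace, event type)."""
    return Counter((e["ns"], e["what"]) for e in entries)


def unfinished_moves(entries: Iterable[dict]) -> List[str]:
    """Namespaces whose latest moveChunk.start has no later moveChunk.commit."""
    open_moves = {}
    for entry in sorted(entries, key=lambda e: e["time"]):
        if entry["what"] == "moveChunk.start":
            open_moves[entry["ns"]] = entry["time"]
        elif entry["what"] == "moveChunk.commit":
            open_moves.pop(entry["ns"], None)
    return sorted(open_moves)
```

Against a live cluster the same documents would come from the changelog collection in the config database; a namespace reported by `unfinished_moves` is a hint to go check the locks and the donor and target logs.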

Keeping things balanced
- Balancer can get stuck on a problematic collection
  - Can manually run sh.moveChunk to balance other collections
  - Using the changelog, find the bad chunk and move it by hand
  - If it's a jumbo chunk, you should be able to run a split (sh.splitAt or sh.splitFind) on it
- Draining can be a pain
  - Look up sh.moveChunk; sometimes doing it yourself will be easier than removeShard
- Avoid restarting mongos too often: it tracks when a split should happen with an in-memory counter, and restarts lose this.
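When balancing by hand, the first question is which shard should give up a chunk and which should take it. The sketch below counts chunks per shard from made-up documents that carry a `shard` field and suggests a donor and recipient; the `threshold` default is an assumption for illustration, not the balancer's actual rule.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple


def chunks_per_shard(chunk_docs: Iterable[dict]) -> Counter:
    """Count chunks per shard from documents that carry a 'shard' field."""
    return Counter(doc["shard"] for doc in chunk_docs)


def suggest_move(counts: Dict[str, int], threshold: int = 2) -> Optional[Tuple[str, str]]:
    """Return (donor, recipient) if the chunk-count gap reaches threshold, else None."""
    donor = max(counts, key=lambda shard: counts[shard])
    recipient = min(counts, key=lambda shard: counts[shard])
    if counts[donor] - counts[recipient] < threshold:
        return None
    return donor, recipient


# Made-up chunk documents for one collection.
chunk_docs = [{"shard": "shard0"}] * 5 + [{"shard": "shard1"}]
counts = chunks_per_shard(chunk_docs)
move = suggest_move(counts)
```

The suggestion only says where to move from and to; picking the actual chunk and running the move is still a manual step.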

Bonus: moveChunk.from Stages
- Parse and prepare
- Check config servers, and activate a migration lock
- Find documents on the donor shard and get a sort-ordered list
- Copy chunk data to the destination shard (24 hour timeout), keep oplog
- Take a global lock
- Log event to changelog
- Update config.chunks and config version
- Cleanup* (delete from source) and unlock for next migration
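To see which stage a slow migration spent its time in, the per-step timings can be pulled out of a moveChunk.from changelog entry. The sketch below assumes the entry's details sub-document holds millisecond timings under keys of the form "step N of M"; the sample values are made up, and the total number of steps differs between server versions.

```python
import re
from typing import Dict, Tuple

STEP_KEY = re.compile(r"^step (\d+) of (\d+)$")


def step_timings(details: dict) -> Dict[int, int]:
    """Extract {step number: milliseconds} from a details sub-document."""
    timings = {}
    for key, value in details.items():
        match = STEP_KEY.match(key)
        if match:
            timings[int(match.group(1))] = value
    return timings


def slowest_step(details: dict) -> Tuple[int, int]:
    """Return (step number, milliseconds) for the slowest recorded step."""
    timings = step_timings(details)
    step = max(timings, key=lambda s: timings[s])
    return step, timings[step]


# Made-up details from a moveChunk.from entry; non-step keys are ignored.
sample_details = {
    "step 1 of 6": 0, "step 2 of 6": 12, "step 3 of 6": 40,
    "step 4 of 6": 91500, "step 5 of 6": 230, "step 6 of 6": 0,
    "note": "success",
}
```

In this sample the time is dominated by step 4, the copy to the destination shard, which points at data volume or network rather than locking.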

Thanks and Enjoy the Conference!
Follow me on Twitter: @dmurphy_data
Percona is hiring, are you ready to be challenged?