MongoDB Schema Design

Similar documents
Scaling MongoDB: Avoiding Common Pitfalls. Jon Tobin Senior Systems

MongoDB and Mysql: Which one is a better fit for me? Room 204-2:20PM-3:10PM

SQL, NoSQL, MongoDB. CSE-291 (Cloud Computing) Fall 2016 Gregory Kesden

MongoDB Schema Design for. David Murphy MongoDB Practice Manager - Percona

Group13: Siddhant Deshmukh, Sudeep Rege, Sharmila Prakash, Dhanusha Varik

MongoDB. David Murphy MongoDB Practice Manager, Percona

relational Key-value Graph Object Document

MongoDB Revs You Up: What Storage Engine is Right for You?

Oral Questions and Answers (DBMS LAB) Questions & Answers- DBMS

Scaling for Humongous amounts of data with MongoDB

Beyond Relational Databases: MongoDB, Redis & ClickHouse. Marcos Albe - Principal Support Percona

Chapter 24 NOSQL Databases and Big Data Storage Systems

MongoDB. copyright 2011 Trainologic LTD

Course Content MongoDB

Kim Greene - Introduction

MongoDB - a No SQL Database What you need to know as an Oracle DBA

Database Solution in Cloud Computing

MongoDB 2.2 and Big Data

Scaling. Marty Weiner Grayskull, Eternia. Yashh Nelapati Gotham City

Topics. History. Architecture. MongoDB, Mongoose - RDBMS - SQL. - NoSQL

Invitation to a New Kind of Database. Sheer El Showk Cofounder, Lore Ai We re Hiring!

Jargons, Concepts, Scope and Systems. Key Value Stores, Document Stores, Extensible Record Stores. Overview of different scalable relational systems

Final Exam Review 2. Kathleen Durant CS 3200 Northeastern University Lecture 23

Database Architectures

NOSQL EGCO321 DATABASE SYSTEMS KANAT POOLSAWASD DEPARTMENT OF COMPUTER ENGINEERING MAHIDOL UNIVERSITY

Percona Live Updated Sharding Guidelines in MongoDB 3.x with Storage Engine Considerations. Kimberly Wilkins

Document Object Storage with MongoDB

NoSQL systems. Lecture 21 (optional) Instructor: Sudeepa Roy. CompSci 516 Data Intensive Computing Systems

The course modules of MongoDB developer and administrator online certification training:

Scaling with mongodb

Oracle NoSQL Database at OOW 2017

Making MongoDB Accessible to All. Brody Messmer Product Owner DataDirect On-Premise Drivers Progress Software

Overview. * Some History. * What is NoSQL? * Why NoSQL? * RDBMS vs NoSQL. * NoSQL Taxonomy. *TowardsNewSQL

Big Data Infrastructure CS 489/698 Big Data Infrastructure (Winter 2017)

How to Scale MongoDB. Apr

MongoDB Architecture

Extreme Computing. NoSQL.

Understanding NoSQL Database Implementations

CSE 544 Principles of Database Management Systems. Magdalena Balazinska Winter 2015 Lecture 14 NoSQL

Interview Questions on DBMS and SQL [Compiled by M V Kamal, Associate Professor, CSE Dept]

NoSQL Databases. Amir H. Payberah. Swedish Institute of Computer Science. April 10, 2014

Writing High Performance SQL Statements. Tim Sharp July 14, 2014

CSE 344 JULY 9 TH NOSQL

SCALABLE DATABASES. Sergio Bossa. From Relational Databases To Polyglot Persistence.

AN ALGORITHM FOR MAPPING THE RELATIONAL DATABASES TO MONGODB A CASE STUDY

A Survey Paper on NoSQL Databases: Key-Value Data Stores and Document Stores

Big Data Infrastructure CS 489/698 Big Data Infrastructure (Winter 2016)

Introduction Aggregate data model Distribution Models Consistency Map-Reduce Types of NoSQL Databases

Run your own Open source. (MMS) to avoid vendor lock-in. David Murphy MongoDB Practice Manager, Percona

NoSQL Databases Analysis

FREE AND OPEN SOURCE SOFTWARE CONFERENCE (FOSSC-17) MUSCAT, FEBRUARY 14-15, 2017

DATABASE SYSTEMS. Database programming in a web environment. Database System Course, 2016

MongoDB Tutorial for Beginners

MongoDB Introduction and Red Hat Integration Points. Chad Tindel Solution Architect

<Insert Picture Here> MySQL Cluster What are we working on

Database Systems CSE 414

Building High Performance Apps using NoSQL. Swami Sivasubramanian General Manager, AWS NoSQL

CSE 530A. Non-Relational Databases. Washington University Fall 2013

Scaling MongoDB. Percona Webinar - Wed October 18th 11:00 AM PDT Adamo Tonete MongoDB Senior Service Technical Service Engineer.

Migrating Oracle Databases To Cassandra

/ Cloud Computing. Recitation 10 March 22nd, 2016

ICALEPS 2013 Exploring No-SQL Alternatives for ALMA Monitoring System ADC

MongoDB An Overview. 21-Oct Socrates

Relational databases

COMP9321 Web Application Engineering

MySQL Views & Comparing SQL to NoSQL

Database Systems CSE 414

File Structures and Indexing

Introduction to NoSQL

Applied NoSQL in.net

MONGODB INTERVIEW QUESTIONS

Copyright 2013, Oracle and/or its affiliates. All rights reserved.

Scaling. Yashh Nelapati Gotham City. Marty Weiner Krypton. Friday, July 27, 12

Improving overall Robinhood performance for use on large-scale deployments Colin Faber

MongoDB. History. mongodb = Humongous DB. Open-source Document-based High performance, high availability Automatic scaling C-P on CAP.

DISTRIBUTED DATABASE OPTIMIZATIONS WITH NoSQL MEMBERS

WiredTiger In-Memory vs WiredTiger B-Tree. October, 5, 2016 Mövenpick Hotel Amsterdam Sveta Smirnova

Open Source Database Ecosystem in Peter Zaitsev 3 October 2016

MongoDB Chunks Distribution, Splitting, and Merging. Jason Terpko

Sharding Introduction

CISC 7610 Lecture 2b The beginnings of NoSQL

Non-Relational Databases. Pelle Jakovits

Bringing code to the data: from MySQL to RocksDB for high volume searches

A Journey to DynamoDB

Distributed Data Store

CSC 453 Database Technologies. Tanu Malik DePaul University

Database Architectures

Motivation Overview of NoSQL space Comparing technologies used Getting hands dirty tutorial section

{SDD} Applied NoSQL in.net. Software Design & Development. Michael

Performance improvement of sharding in MongoDB using k-mean clustering algorithm

Distributed Databases: SQL vs NoSQL

Introduction to Database Systems CSE 344

5 Fundamental Strategies for Building a Data-centered Data Center

Cloud Computing. DB Special Topics Lecture (10/5/2012) Kyle Hale Maciej Swiech

Networks and Web for Health Informatics (HINF 6220)

Migrating to Vitess at (Slack) Scale. Michael Demmer Percona Live - April 2018

MySQL Database Scalability

Use Cases for Partitioning. Bill Karwin Percona, Inc

מרכז התמחות DBA. NoSQL and MongoDB תאריך: 3 דצמבר 2015 מציג: רז הורוביץ, ארכיטקט מרכז ההתמחות

Physical Design of Relational Databases

Transcription:

MongoDB Schema Design Demystifying document structures in MongoDB Jon Tobin @jontobs

MongoDB Overview NoSQL Document Oriented DB Dynamic Schema HA/Sharding Built In Simple async replication setup Automated elections Sharding engine/router/balancer Aggregation Pipeline Map-Reduce Developers Database Full driver library Work outside of shell Easy to use Read: Easy to get started 2

Terms: What Do They Mean? MySQL Database Table Row Field MongoDB Database Collection Document Key : value pairs 3

Practical Examples

Modeling Data - SQL @ each intersection is a single scalar value 5

Modeling Data - SQL Good Normalization gives guidelines Minimizes redundancy Efficient updates JOIN to get data (in database) Database is feature rich Schema enforces data intergrity TLDR: great for consistency, updates & application simplicity Bad Three queries for data Pre-defined schema constrains agility Complex relationships TLDR: querying and inserting can (will) be inefficient, features may affect performance WHY: It s all about (co)location of relevant data WHERE: At what level should feature be implemented for best performance 6

RDBMS JOINS Assumptions Network latency: 2 ms Single table op:.5 ms Application OPERATION 3 table JOIN operation 2 ms Network 2 ms JOINS are handled by the DB RDBMS Time to App Response 2 + (3 *.5) + 2 = 5.5 ms.5 ms/per Tables 7

MongoDB Design Basics Known as a Developers DB Meaning: put it in the app! No Joins Joins are done in application V3.2 = $LOOKUP =left outer join (no sharding) Dynamic Schemas Fields (keys) can be added anytime Keys don t need to be added to all docs (rows) Keys can have Multiple values (arrays) Multiple key:value pairs (sub-docs) (De)Normalization is up to you What best fits your application Could be a mix 16MB BSON limit on docs Atomicity within a single document NO multi-doc transactions 8

JSON Types Number Text Boolean Array Object Null 9

Modeling Data - MongoDB { _id : ObjectId( 507f1f77bcf86cd799439011 ), studentid : 100, firstname : Jonathan, middlename : Eli, lastname : Tobin, classes : [ { courseid : PHY101, grade : B, coursename : Physics 101, credits : 3 }, { courseid : BUS101, grade : B+, coursename : Business 101, credits : 3 } ] 10

QnD Doc Design Pointers Embed Query performance priority Fields are fairly static Size of doc can be reasonably determined Eventual consistency acceptable { } _id : ObjectId("53d98f1 ") firstname" : Jonathan, lastname" : Tobin, year" : 3, classes" : [ { class : Calc 101, credits : 3, }, { -etc- Reference Insert performance priority Updates are common Immediate consistency necessary Field size can t be determined { } _id : ObjectId("53d98f1 ") firstname" : Jonathan, lastname" : Tobin, year" : 3, classes" : [ ObjectID(<of_class_1>), ObjectID(<of_class_2>), ObjectID(<of_class_3>), ] 11

Embedding Insert Quick Semi efficient Update studentid Quick courseid Complex Inefficient Inconsistent Query Fast Efficient Be mindful: cache thrashing Show collections college.students //sample document { _id : ObjectId( 507f1f77bcf86cd799439011 ), studentid : 100, firstname : Jonathan, middlename : Eli, lastname : Tobin, classes : [ { courseid : PHY101, grade : B, coursename : Physics 101, credits : 3 }, { courseid : BUS101, grade : B+, coursename : Business 101, credits : 3 } ] } 12

Referencing Insert Quick Efficient Update classes Fairly quick courseid Efficient Consistent Query Fast Efficient Show collections college.students college.courses college.grades //sample document { _id : ObjectId("507f1f77bcf86cd799439011") firstname" : Jonathan, lastname" : Tobin, year" : 3, classes" : [ ObjectID(<of_class_1>), ObjectID(<of_class_2>), ObjectID(<of_class_3>), ] } Be mindful: join overhead 13

MongoDB Design - QnD DEPENDS: on use case Embed Efficient lookups Infrequently changed data Often queried data Atomicity Reference Efficient writes Oft excluded data (from queries) Boundless additions Doc size may approach 16MB limit 14

MongoDB JOINs Each query is separate full stack operation Application OPERATION 3 coll JOIN operation Assumptions 2 ms Network 2 ms Network latency: 2 ms Single table op:.5 ms MongoDB.5 ms Time to App Response (2 + 2 +.5) *3 = 13.5 ms 15 Collections

Finding Middle Ground Embed fields that are often fetched If they don t grow boundlessly Limit growing keys to 1/per doc Move to last key Reference fields that are volatile Or are occasionally queried Atomicity can be achieved @ single doc level Take care in design Index judiciously Re-evaluate often Store relevant data Archive old data (when possible) Or delete DON T default to the RDBMS way 16

Sharding an unfortunate name

Sharded Cluster 18

Sharding Mongo distributes data based on shard key Indexed single key Indexed compound key Data chunked by key space Range based Hash based Shard key is immutable Balancing happens in background Inside each shard Between shards Two distinct possibilities: Range: data has low entropy (scatter) > key1 & key2 are likely to be together Hash: data has high entropy (scatter) > key1 & key2 are unlikely to be together 19

Shard Keys For insert speed: (avoid single shard bottleneck) High-entropy shard key (mostly random). Balances load across all shards. Avoid migrations, can be expensive in MongoDB. Range queries are scatter-gathers. Scatter is good. For query speed: (avoid scatter gather queries) Low-entropy shard key (mostly sequential). Range queries should only hit 1 shard. Queries should include shard key. Indexing is still necessary Scatter is bad. Data loading Shard the collection(s) Pre-split and distribute chucks Removes balancer bottleneck 20

@ MongoDB World? When: Thursday 6/30 8:00 PM Where: Park Central Hotel Why: Percona believes in community. Without the entire community we all lose. More Details & Reg: http://bit.ly/perconaoh 21

Percona Live Amsterdam Share Your Knowledge Learn from others Network with The Community Drink (free) Beer!!! Call for papers is open!! https://www.percona.com/live/plam16/ 22

Useful Resources Free Resources Understanding How Your MongoDB Schemas Affect Scaling David Murphy MongoDB Administration for MySQL DBA Alexander Rubin Optimizing MongoDB for High Performance Applications David Murphy MongoDB University FREE ONLINE TRAINING! Books MongoDB: The Definitive Guide by Kristina Chodorow Outdated, but still largely relevant as far as design goes MongoDB Applied Design Patterns by Rick Copeland More up to date than the definitive guide in regards to functionality but is already dated in terms of storage engine 23

DATABASE PERFORMANCE MATTERS