MongoDB 2.2 and Big Data
- Barry Daniels
- 5 years ago
Transcription
1 MongoDB 2.2 and Big Data Christian Kvalheim Team Lead Engineering, christiankvalheim.com
3 From here...
4 ...to here...
5 ...without buying several of these
6 Warning! This is a technical talk. But MongoDB is very simple!
7 Solving real world data problems with MongoDB
- Effective schema design for scaling
- Linking versus embedding
- Bucketing
- Time series
- Implications of sharding keys with alternatives
- Read scaling through replication
- Challenges of eventual consistency
8 Since the dawn of the RDBMS
- Main memory: Intel 1103, 1k bits, versus 4GB of RAM costing $25.99
- Mass storage: IBM 3330 Model 1, 100 MB, versus 3TB SuperSpeed USB for $129
- Microprocessor: nearly (4004 being developed; 4 bits and 92,000 instructions per second), versus Westmere EX with 10 cores, 30MB L3 cache, running at 2.4GHz
9 More recent changes
(a decade ago -> now)
- Faster: buy a bigger server -> buy more servers
- Faster storage: a SAN with more spindles -> SSD
- More reliable storage: more expensive SAN -> more copies of local storage
- Deployed in: your data center -> the cloud, private or public
- Large data set: millions of rows -> billions to trillions of rows
- Development: waterfall -> iterative
10 What MongoDB solves
- Agility
  - Applications store complex data that is easier to model as documents
  - Schemaless DB enables faster development cycles
- Flexibility
  - Relaxed transactional semantics enable easy scale out
  - Auto Sharding for scale down and scale up
- Cost
  - Cost-effectively operationalize abundant data (clickstreams, logs, tweets, ...)
11 Design Goal of MongoDB
[Figure: scalability & performance plotted against depth of functionality, with memcached, key/value stores, and RDBMS as reference points]
12 Effective Schema Design
13 Design Schema for Twitter
- Model each user's activity stream
- Users: name, email address, display name
- Tweets: text, who, timestamp
14 Solution A Two Collections
// users - one doc per user
{ _id: "alvin", email: "alvin@10gen.com", display: "jonnyeight" }
// tweets - one doc per user per tweet
{ user: "bob", tweet: " ", text: "Best Tweet Ever!", ts: ISODate(" T09:56:06.298Z") }
15 Solution B Embedded Tweets
// users - one doc per user with all tweets
{ _id: "alvin",
  email: "alvin@10gen.com",
  display: "jonnyeight",
  tweets: [
    { user: "bob", tweet: " ", text: "Best Tweet Ever!", ts: ISODate(" T09:56:06.298Z") }
  ]
}
16 Embedding
- Great for read performance
  - One seek to load entire object
  - One roundtrip to database
- Writes can be slow if adding to objects all the time
17 Linking or Embedding?
Linking can make some queries easy
// Find latest 10 tweets for "alvin"
> db.tweets.find( { user: "alvin" } ).sort( { ts: -1 } ).limit(10)
But what effect does this have on the systems?
18 [Diagram: Collection 1 and Index 1]
19 [Diagram: Collection 1 and Index 1 mapped into Virtual Address Space 1] This is your virtual memory size (mapped)
20 [Diagram: part of Virtual Address Space 1 held in Physical RAM] This is your resident memory size
21 [Diagram: Virtual Address Space 1 backed by Physical RAM and Disk]
22 [Diagram: same layout with access times] Physical RAM = 100 ns, Disk = 10,000,000 ns
23 [Diagram: Collection 1, Index 1, Virtual Address Space 1, Physical RAM, Disk]
db.tweets.find( { user: "alvin" } ).sort( { ts: -1 } ).limit(10)
Linking = many random reads
24 [Diagram: Collection 1, Index 1, Virtual Address Space 1, Physical RAM, Disk]
db.tweets.find( { _id: "alvin" } )
Embedding = large sequential read
25 Problems
- Large sequential reads
  - Good: disks are great at sequential reads
  - Bad: read too much data
- Many random reads
  - Good: ease of query
  - Bad: disks are poor at random reads (SSD?)
26 Solution C Buckets
// tweets: one doc per user per day
> db.tweets.findOne()
{ _id: "alvin-2011/12/09",
  email: "alvin@10gen.com",
  tweets: [
    { user: "Bob", tweet: " ", text: "Best Tweet Ever!" },
    { author: "Joe", date: "May ", text: "Stuck in traffic (again)" }
  ]
}
27 Solution C Last 10 Tweets db.tweets.find( { _id: "alvin-2011/12/09" }, { tweets: { $slice : 10 } } ).sort( { _id: -1 } ).limit(1)
28 [Diagram: Collection 1, Index 1, Virtual Address Space 1, Physical RAM, Disk]
db.tweets.find( { _id: "alvin-2011/12/09" }, { tweets: { $slice : 10 } } ).sort( { _id: -1 } ).limit(1)
Bucket = small sequential read
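The bucketing idea in Solution C can be sketched in plain JavaScript, with a `Map` standing in for the tweets collection so it runs without a database. The helper names (`bucketId`, `addTweet`, `lastTweets`), the timestamps, and the "Still stuck" tweet are hypothetical; only the `user-YYYY/MM/DD` `_id` layout comes from the slides.

```javascript
// Build the per-user, per-day bucket _id, e.g. "alvin-2011/12/09".
function bucketId(user, date) {
  const pad = (n) => String(n).padStart(2, "0");
  return `${user}-${date.getUTCFullYear()}/${pad(date.getUTCMonth() + 1)}/${pad(date.getUTCDate())}`;
}

// Append a tweet to its bucket, creating the bucket on first use.
// `buckets` is a Map standing in for the tweets collection.
function addTweet(buckets, user, tweet) {
  const id = bucketId(user, tweet.ts);
  if (!buckets.has(id)) buckets.set(id, { _id: id, tweets: [] });
  buckets.get(id).tweets.push(tweet);
  return id;
}

// Read the newest bucket for a user and return up to `n` of its latest tweets.
function lastTweets(buckets, user, n) {
  const ids = [...buckets.keys()].filter((id) => id.startsWith(user + "-")).sort();
  if (ids.length === 0) return [];
  return buckets.get(ids[ids.length - 1]).tweets.slice(-n);
}

const buckets = new Map();
addTweet(buckets, "alvin", { text: "Best Tweet Ever!", ts: new Date(Date.UTC(2011, 11, 8, 9, 56)) });
addTweet(buckets, "alvin", { text: "Stuck in traffic (again)", ts: new Date(Date.UTC(2011, 11, 9, 8, 0)) });
addTweet(buckets, "alvin", { text: "Still stuck", ts: new Date(Date.UTC(2011, 11, 9, 8, 30)) });
console.log(lastTweets(buckets, "alvin", 10).map((t) => t.text));
```

Each read touches one small document per user per day, which is the "small sequential read" the slide describes. If the newest bucket holds fewer than `n` tweets, a real application would likely also read the previous day's bucket; this sketch does not.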
29 Sharding
30 Sharding - Goals
- Data location transparent to your code
- Data distribution is automatic
- Data re-distribution is automatic
- Aggregate system resources horizontally
- No code changes
31 Sharding - Range distribution
sh.shardCollection("test.tweets", {_id: 1}, false)
[Diagram: shard01, shard02, shard03]
32 Sharding - Range distribution: shard01 holds a-i; shard02 holds j-r; shard03 holds s-z
33 Sharding - Splits: shard01 holds a-i; shard02 holds ja-jz and k-r; shard03 holds s-z
34 Sharding - Splits: shard01 holds a-i; shard02 holds ja-ji, ji-js, js-jw, and jz-r; shard03 holds s-z
35 Sharding - Auto Balancing [Diagram: chunks js-jw and jz-r being migrated between shards; chunks shown: a-i, ja-ji, ji-js, js-jw, jz-r, s-z]
36 Sharding - Auto Balancing [Diagram: chunks a-i, ja-ji, ji-js, js-jw, jz-r, s-z spread across shard01, shard02, shard03]
37 How does sharding affect Schema Design?
- Sharding key choice
- Access patterns (query versus write)
38 Sharding Key
{ photo_id : ????, data : <binary> }
What's the right key?
- auto increment
- MD5( data )
- month() + MD5( data )
39 Right balanced access
- Only have to keep small portion in RAM
- Right shard "hot"
- Examples: time based, ObjectId, auto increment
40 Random access
- Have to keep entire index in RAM
- All shards "warm"
- Example: hash
41 Segmented access
- Have to keep some index in RAM
- Some shards "warm"
- Example: month + hash
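The three access patterns can be illustrated with a small simulation. It is a rough sketch under stated assumptions: chunks are modelled as equal-sized ranges of the sorted key space, `hashHex` is an FNV-1a hash used as a simple stand-in for MD5, and the document counts, month labels, and function names are made up for illustration.

```javascript
// FNV-1a 32-bit hash rendered as 8 hex chars; a stand-in for MD5 in this sketch.
function hashHex(text) {
  let h = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h.toString(16).padStart(8, "0");
}

// The three candidate shard keys from the slides.
const strategies = {
  autoIncrement: (i) => String(i).padStart(8, "0"),
  hash: (i) => hashHex("photo" + i),
  monthPlusHash: (i, month) => month + "-" + hashHex("photo" + i),
};

// Insert `total` documents spread evenly over `months`, split the sorted key
// space into `chunkCount` equal ranges, and count how many chunks the most
// recent `recent` inserts landed in.
function chunksTouchedByRecent(makeKey, total, chunkCount, recent, months) {
  const keys = [];
  for (let i = 0; i < total; i++) {
    const month = months[Math.floor((i * months.length) / total)];
    keys.push(makeKey(i, month));
  }
  const sorted = [...keys].sort();
  const perChunk = Math.ceil(total / chunkCount);
  const chunkOf = new Map(sorted.map((k, rank) => [k, Math.floor(rank / perChunk)]));
  return new Set(keys.slice(-recent).map((k) => chunkOf.get(k))).size;
}

const months = ["2012-05", "2012-06", "2012-07", "2012-08"];
for (const [name, makeKey] of Object.entries(strategies)) {
  console.log(name, chunksTouchedByRecent(makeKey, 1200, 12, 100, months));
}
```

With these inputs the auto-increment key sends all recent inserts to the last chunk (one "hot" shard), the hashed key typically scatters them across most chunks, and month plus hash confines them to the chunks of the current month.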
42 Shard Key Example
{ _id: "alvin", // shard key
  email: "alvin@10gen.com",
  display: "jonnyeight",
  li: "alvin.j.richards",
  tweets: [... ] }
- Shard on { _id : 1 }
- Lookup by _id routed to 1 node
- Index on { email : 1 }
43 Sharding - Routed Query
find({_id: "alvin"})
[Diagram: chunks a-i, ja-ji, ji-js, js-jw, jz-r, s-z across shard01, shard02, shard03]
44 Sharding - Routed Query
find({_id: "alvin"})
[Diagram: chunks a-i, ja-ji, ji-js, js-jw, jz-r, s-z across shard01, shard02, shard03]
45 Sharding - Scatter Gather
find({email: "alvin@10gen.com"})
[Diagram: chunks a-i, ja-ji, ji-js, js-jw, jz-r, s-z across shard01, shard02, shard03]
46 Sharding - Scatter Gather
find({email: "alvin@10gen.com"})
[Diagram: chunks a-i, ja-ji, ji-js, js-jw, jz-r, s-z across shard01, shard02, shard03]
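The difference between a routed query and a scatter gather comes down to whether the query contains the shard key. A minimal sketch of that routing decision follows; the chunk table, its boundaries, and the function name `targetShards` are illustrative assumptions, not the actual router implementation.

```javascript
// Chunk table kept by the router: each chunk covers [min, max) of the shard key.
// The ranges and shard assignments are made-up sample data.
const chunks = [
  { min: "a", max: "j", shard: "shard01" },
  { min: "j", max: "s", shard: "shard02" },
  { min: "s", max: "{", shard: "shard03" }, // "{" sorts just after "z"
];

// Decide which shards a query must visit, given the shard key field name.
function targetShards(query, shardKey, chunkTable) {
  const allShards = [...new Set(chunkTable.map((c) => c.shard))];
  if (!(shardKey in query)) return allShards; // no shard key: scatter gather
  const value = query[shardKey];
  const hit = chunkTable.find((c) => c.min <= value && value < c.max);
  return hit ? [hit.shard] : allShards;
}

console.log(targetShards({ _id: "alvin" }, "_id", chunks));
console.log(targetShards({ email: "alvin@10gen.com" }, "_id", chunks));
```

The lookup by `_id` resolves to a single shard, while the lookup by `email` has to be sent to every shard because the chunk table says nothing about where a given email lives.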
47 Multiple Identities
- User can have multiple identities: twitter name, email address, etc.
- What is the best sharding key & schema design?
48 Solution A Multiple Identities
{ _id: "alvin",
  display: "jonnyeight",
  email: "alvin@10gen.com",
  li: "alvin.j.richards", // linkedin
  tweets: [... ] }
- Shard on { _id: 1 }
- Lookup by _id routed to 1 node
- Lookup by li or email is scatter gather
- Cannot create a unique index on li or email
49 Solution B Multiple Identities
identities
{ type: "_id", val: "alvin", info: " " }
{ type: "em", val: "alvin@10gen.com", info: " " }
{ type: "li", val: "alvin.j.richards", info: " " }
tweets
{ _id: " ", tweets: [... ] }
- Shard identities on { type : 1, val : 1 }
- Lookup by type & val routed to 1 node
- Can create unique index on type & val
- Shard info on { _id: 1 }
- Lookup info on _id routed to 1 node
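A small in-memory sketch of the two-step lookup in Solution B. The two `Map`s stand in for the `identities` and `tweets` collections, and the `"user-1"` value is a hypothetical placeholder for the `info` reference, which is blank in the slides. The function names are also assumptions.

```javascript
// In-memory stand-ins for the two collections in Solution B.
const identities = new Map(); // key: "type|val" -> { type, val, info }
const tweetsById = new Map(); // key: _id -> { _id, tweets }

// Emulates the unique index on { type, val }: a duplicate pair is rejected.
function addIdentity(type, val, info) {
  const key = type + "|" + val;
  if (identities.has(key)) throw new Error("duplicate identity: " + key);
  identities.set(key, { type, val, info });
}

// Two routed lookups: the identity by (type, val), then the tweets doc by _id.
function tweetsFor(type, val) {
  const identity = identities.get(type + "|" + val);
  return identity ? tweetsById.get(identity.info) || null : null;
}

tweetsById.set("user-1", { _id: "user-1", tweets: ["Best Tweet Ever!"] });
addIdentity("_id", "alvin", "user-1");
addIdentity("em", "alvin@10gen.com", "user-1");
addIdentity("li", "alvin.j.richards", "user-1");
console.log(tweetsFor("em", "alvin@10gen.com").tweets);
```

The trade-off is one extra round trip per lookup in exchange for every identity type being routable to a single node and enforceable as unique.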
50 Sharding - Routed Query
[Diagram: chunks across shard01, shard02, shard03. Identity chunks: type: em val: a-q; type: em val: r-z; type: li val: a-c; type: li val: d-r; type: li val: s-z; type: _id val: a-z. Chunks keyed by _id: "Min"-"1100", "1100"-"1200", "1200"-"Max"]
51 Sharding - Routed Query
find({type: "em", val: "alvin@10gen.com"})
[Diagram: same chunk layout as slide 50]
52 Sharding - Routed Query
find({type: "em", val: "alvin@10gen.com"})
find({_id: " "})
[Diagram: same chunk layout as slide 50]
53 Sharding - Caching
[Diagram: a single server, shard01, with 96 GB Mem holds all 300 GB of data (a-i, j-r, s-z): 3:1 Data/Mem]
54 Sharding - Caching
[Diagram: shard01 (a-i), shard02 (j-r), and shard03 (s-z) each have 96 GB Mem; the 300 GB of data is split across them: 1:1 Data/Mem per shard]
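The arithmetic behind the two caching slides, as a short sketch. It assumes the data is spread evenly across shards, which real chunk distributions only approximate; the function name is hypothetical.

```javascript
// Ratio of data held by each shard to the memory available on that shard,
// assuming the data set is spread evenly across `shardCount` shards.
function dataToMemRatio(totalDataGB, memPerShardGB, shardCount) {
  const dataPerShardGB = totalDataGB / shardCount;
  return dataPerShardGB / memPerShardGB;
}

console.log(dataToMemRatio(300, 96, 1)); // one server: roughly 3:1
console.log(dataToMemRatio(300, 96, 3)); // three shards: roughly 1:1
```

Adding shards adds RAM as well as disk, so the share of the data set that can stay resident in memory grows with the cluster.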
55 Auto Sharding - Summary
- Fully consistent
- Application code unaware of data location
  - Zero code changes
- Shard by Compound Key, Tag, Hash (2.4)
- Add capacity
  - On-line
  - When needed
  - Zero downtime
56 Replica Sets
- Data Protection
  - Multiple copies of the data
  - Spread across Data Centers, AZs
- High Availability
  - Automated Failover
  - Automated Recovery
57 Replica Sets [Diagram: the App writes to and reads from the Primary; asynchronous replication to two Secondaries, which can also serve reads]
58 Replica Sets [Diagram: App with Write and Read paths to a Primary and two Secondaries]
59 Replica Sets [Diagram: App, Primary, Secondary] Automatic election of new Primary
60 Replica Sets [Diagram: App, one member Recovering, Primary, Secondary] New primary serves data
61 Replica Sets [Diagram: App reads and writes against a Primary with two Secondaries]
62 Replica Sets - Summary
- Data Protection
- High Availability
- Scaling eventually consistent reads
- Source to feed other systems
  - Backups
  - Indexes (Solr etc.)
63 Types of Durability with MongoDB
- Fire and forget
- Wait for error
- Wait for fsync
- Wait for journal sync
- Wait for replication
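64 Durability Summary
[Table: write options against the place a write has reached (Memory, Journal, Secondary, Other Data Center), ordered from less durability to more durability]
- "Fire & Forget"
- w=1
- w=1, j=true
- w="majority", w=n
- w="mytag"
Other labels on the slide: RDBMS, Default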
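One way to picture the durability levels is as the options a client passes to the `getLastError` command after a write. The sketch below only builds the command documents and sends nothing to a server. The level names are hypothetical labels for this example; `w`, `j`, and `fsync` are the `getLastError` options the slides refer to.

```javascript
// Map a durability level to getLastError command options.
// Returns null for "fire and forget", where the client does not ask at all.
function getLastErrorOptions(level, w) {
  switch (level) {
    case "fire-and-forget":
      return null;
    case "wait-for-error":
      return { getLastError: 1 };
    case "wait-for-fsync":
      return { getLastError: 1, fsync: true };
    case "wait-for-journal":
      return { getLastError: 1, j: true };
    case "wait-for-replication":
      return { getLastError: 1, w: w }; // w: a number, "majority", or a tag mode
    default:
      throw new Error("unknown durability level: " + level);
  }
}

console.log(getLastErrorOptions("wait-for-journal"));
console.log(getLastErrorOptions("wait-for-replication", "majority"));
```

In the mongo shell of that era, a document like these would be passed to `db.runCommand(...)` right after the write it should confirm.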
65 Eventual Consistency
66 Understanding Eventual Consistency
[Diagram: timeline. The Application inserts v1 on the Primary, reads, then updates to v2; each version is replicated to the Secondary after a delay]
67 Understanding Eventual Consistency
[Diagram: same timeline with Application #2 reading from the Secondary. Before v1 is replicated the document does not exist; then reads return v1; after v2 is replicated reads return v2. The first Application reads v1 from the Primary]
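The timeline can be reproduced with a toy model: a primary that applies writes immediately and a secondary that only catches up when replication runs. This is an illustrative simulation, not how a replica set is implemented; the class and method names are made up.

```javascript
// A primary that applies writes immediately and a secondary that applies them
// only when replicate() is called, modelling asynchronous replication.
class ReplicaPair {
  constructor() {
    this.primary = new Map();
    this.secondary = new Map();
    this.pending = []; // writes not yet applied on the secondary
  }
  write(id, value) {
    this.primary.set(id, value);
    this.pending.push([id, value]);
  }
  replicate() {
    for (const [id, value] of this.pending) this.secondary.set(id, value);
    this.pending = [];
  }
  read(id, fromSecondary = false) {
    const node = fromSecondary ? this.secondary : this.primary;
    return node.has(id) ? node.get(id) : null; // null means "does not exist"
  }
}

const rs = new ReplicaPair();
rs.write("doc", "v1");
const beforeReplication = rs.read("doc", true); // secondary has nothing yet
rs.replicate();
rs.write("doc", "v2");
const stale = rs.read("doc", true); // secondary still serves v1
const fresh = rs.read("doc"); // primary already serves v2
rs.replicate();
const caughtUp = rs.read("doc", true);
console.log({ beforeReplication, stale, fresh, caughtUp });
```

An application that reads from secondaries has to tolerate all three outcomes: a missing document, a stale version, and the current version.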
68 Mongo 2.2 Big Data support
69 Time to live collections
- TTL features
  - Set a time to live for documents in a collection
  - Automatic removal of documents without big batch purges
- Example
  db.log.events.ensureIndex( { "status": 1 }, { expireAfterSeconds: 3600 } )
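What a TTL index does can be approximated in a few lines: documents whose indexed date field is older than `expireAfterSeconds` are dropped, and documents without a date in that field are left alone. The sketch is a plain-JavaScript model of one sweep, not the server's background task. It assumes the `status` field from the example holds a `Date`; the function name and sample documents are hypothetical.

```javascript
// Return the documents a TTL sweep would keep at time `now`: those whose
// date field is newer than the cutoff, plus those with no date in the field.
function ttlSweep(docs, field, expireAfterSeconds, now) {
  const cutoff = now.getTime() - expireAfterSeconds * 1000;
  return docs.filter((doc) => !(doc[field] instanceof Date) || doc[field].getTime() > cutoff);
}

const now = new Date(Date.UTC(2012, 9, 18, 12, 0, 0));
const events = [
  { _id: 1, status: new Date(Date.UTC(2012, 9, 18, 10, 0, 0)) }, // 2 hours old
  { _id: 2, status: new Date(Date.UTC(2012, 9, 18, 11, 50, 0)) }, // 10 minutes old
  { _id: 3, note: "no date field, never expires" },
];
console.log(ttlSweep(events, "status", 3600, now).map((d) => d._id));
```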
70 Aggregation framework
- Declarative
  - No JavaScript required
- C++ implementation
  - Higher performance than JavaScript
- Expression evaluation
  - Return computed values
- Framework: we can add new operations easily
71 Aggregation framework
- Series of operations
- Members of a collection are passed through a pipeline to produce a result
72 Pipeline Operations
- $match: uses a query predicate (like .find({ })) as a filter
- $project: uses a sample document to determine the shape of the result (similar to .find()'s optional argument); this can include computed values
- $unwind: hands out array elements one at a time
- $group: aggregates items into buckets defined by a key
73 Pipeline Operations (continued)
- $sort: sort documents
- $limit: only allow the specified number of documents to pass
- $skip: skip over the specified number of documents
74 Example: finding popular checkins
checkin_1 = { location: location_id, user: user_id, ts: "20/03/2010" }
// Find most popular locations in last 3 hours
> agg = db.checkins.aggregate(
    {$match: {ts: {$gt: now_minus_3_hrs}}},
    {$group: {_id: "$location", numentries: {$sum: 1}}},
    {$sort: {numentries : -1 }} )
> agg.result
[{"_id": "Din Tai Fung", "numentries" : 17},...]
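To see what the three stages in this pipeline do to the data, here is an in-memory imitation in plain JavaScript. It is a sketch of the semantics only. The helper names, the numeric timestamps, and the "Cafe X" location are invented for the example.

```javascript
// Tiny in-memory stand-ins for three pipeline stages.
const match = (pred) => (docs) => docs.filter(pred);
const groupCount = (keyField, countField) => (docs) => {
  const counts = new Map();
  for (const d of docs) counts.set(d[keyField], (counts.get(d[keyField]) || 0) + 1);
  return [...counts].map(([key, n]) => ({ _id: key, [countField]: n }));
};
const sortDesc = (field) => (docs) => [...docs].sort((a, b) => b[field] - a[field]);

// A pipeline feeds the output of each stage into the next one.
const pipeline = (...stages) => (docs) => stages.reduce((acc, stage) => stage(acc), docs);

// Sample checkins; ts is hours since midnight to keep the example small.
const checkins = [
  { location: "Din Tai Fung", user: "a", ts: 2 },
  { location: "Cafe X", user: "b", ts: 9 },
  { location: "Din Tai Fung", user: "c", ts: 8 },
  { location: "Din Tai Fung", user: "d", ts: 9 },
  { location: "Din Tai Fung", user: "e", ts: 9.5 },
];
const nowMinus3Hrs = 7;

const popular = pipeline(
  match((d) => d.ts > nowMinus3Hrs),
  groupCount("location", "numentries"),
  sortDesc("numentries")
)(checkins);
console.log(popular);
```

Each stage is a function from a list of documents to a list of documents, which is why stages can be reordered or added without touching the others.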
75 Tag aware sharding
76 Tag aware sharding
// A document we need to store in Japan
{ _id: "...", k: { c: "jp", uname: "jhilman" } }
// Define a shard tag range where documents containing the field c = "jp" are stored in servers tagged with "JP"
> sh.addTagRange("mydb.users", { k: { c: "jp" } }, { k: { c: "jp" } }, "JP")
77 As your data processing needs increase you will want to use a tool designed for the job
78 Hadoop Map Reduce
[Diagram: InputFormat -> Map(k1, v1, ctx) -> ctx.write(k2, v2) -> Combiner(k2, values2) -> (k2, v3) -> Partitioner(k2) -> Sort(keys2) -> Reduce(k3, values4) -> Output Format (kf, vf)]
Annotations on the diagram:
- Many map operations, 1 at a time per input split
- Runs on same thread as map
- same as Mongo's emit
- similar to Mongo's group
- similar to Mongo's reducer
- similar to Mongo's Finalize
- Reducer threads
- Runs once per key
79 MongoDB & Hadoop
[Diagram: MongoDB (single server or sharded cluster) -> InputFormat creates a list of Input Splits -> for each split: RecordReader -> Map(k1, v1, ctx) -> ctx.write(k2, v2) -> Combiner(k2, values2) -> (k2, v3) -> Partitioner(k2) -> Sort(k2) -> Reduce(k2, values3) -> Output Format (kf, vf) -> MongoDB]
Annotations on the diagram:
- Input splits: same as Mongo's shard chunks (64mb)
- Many map operations, 1 at a time per input split
- Runs on same thread as map
- Reducer threads
- Runs once per key
80 Product & Roadmap
81 The Evolution of MongoDB
[Timeline: March 11, Sept 11, Aug 12, winter 12]
- Journaling
- Sharding and Replica set enhancements
- Spherical geo search
- Index enhancements to improve size and performance
- Authentication with sharded clusters
- Replica Set Enhancements
- Concurrency improvements
- Aggregation Framework
- Multi-Data Center Deployments
- Improved Performance and Concurrency
82 2.4 Roadmap
- Must
  - Kerberos integration
  - LDAP/AD integration
- Nice To Have (not 100% yet)
  - Full Text Search
  - Hash Shard Key
  - Background Index Build on Secondaries
  - V8 for Map/Reduce
  - Geo: intersecting polygons, Geo shard key
83 Morning With MongoDB
- Barcelona, Martim Museum, 18th October
- Free
- Technical Overview, Roadmap, Common Use Cases, Customer Projects
More informationThe Little Mongo DB Schema Design Book
The Little Mongo DB Schema Design Book Christian Kvalheim This book is for sale at http://leanpub.com/mongodbschemadesign This version was published on 2015-10-09 ISBN 978-1-943846-77-1 This is a Leanpub
More informationCascade Mapping: Optimizing Memory Efficiency for Flash-based Key-value Caching
Cascade Mapping: Optimizing Memory Efficiency for Flash-based Key-value Caching Kefei Wang and Feng Chen Louisiana State University SoCC '18 Carlsbad, CA Key-value Systems in Internet Services Key-value
More informationebay s Architectural Principles
ebay s Architectural Principles Architectural Strategies, Patterns, and Forces for Scaling a Large ecommerce Site Randy Shoup ebay Distinguished Architect QCon London 2008 March 14, 2008 What we re up
More informationOracle NoSQL Database Enterprise Edition, Version 18.1
Oracle NoSQL Database Enterprise Edition, Version 18.1 Oracle NoSQL Database is a scalable, distributed NoSQL database, designed to provide highly reliable, flexible and available data management across
More informationAsynchronous Logging and Fast Recovery for a Large-Scale Distributed In-Memory Storage
Asynchronous Logging and Fast Recovery for a Large-Scale Distributed In-Memory Storage Kevin Beineke, Florian Klein, Michael Schöttner Institut für Informatik, Heinrich-Heine-Universität Düsseldorf Outline
More informationPutting together the platform: Riak, Redis, Solr and Spark. Bryan Hunt
Putting together the platform: Riak, Redis, Solr and Spark Bryan Hunt 1 $ whoami Bryan Hunt Client Services Engineer @binarytemple 2 Minimum viable product - the ideologically correct doctrine 1. Start
More informationUsing PostgreSQL in Tantan - From 0 to 350bn rows in 2 years
Using PostgreSQL in Tantan - From 0 to 350bn rows in 2 years Victor Blomqvist vb@viblo.se Tantan ( 探探 ) December 2, PGConf Asia 2016 in Tokyo tantanapp.com 1 Sweden - Tantan - Tokyo 10 Million 11 Million
More informationDATABASE DESIGN II - 1DL400
DATABASE DESIGN II - 1DL400 Fall 2016 A second course in database systems http://www.it.uu.se/research/group/udbl/kurser/dbii_ht16 Kjell Orsborn Uppsala Database Laboratory Department of Information Technology,
More informationClustering Documents. Document Retrieval. Case Study 2: Document Retrieval
Case Study 2: Document Retrieval Clustering Documents Machine Learning for Big Data CSE547/STAT548, University of Washington Sham Kakade April, 2017 Sham Kakade 2017 1 Document Retrieval n Goal: Retrieve
More informationCOMP Parallel Computing. Lecture 22 November 29, Datacenters and Large Scale Data Processing
- Parallel Computing Lecture 22 November 29, 2018 Datacenters and Large Scale Data Processing Topics Parallel memory hierarchy extend to include disk storage Google web search Large parallel application
More informationCS3600 SYSTEMS AND NETWORKS
CS3600 SYSTEMS AND NETWORKS NORTHEASTERN UNIVERSITY Lecture 11: File System Implementation Prof. Alan Mislove (amislove@ccs.neu.edu) File-System Structure File structure Logical storage unit Collection
More informationDept. Of Computer Science, Colorado State University
CS 455: INTRODUCTION TO DISTRIBUTED SYSTEMS [HADOOP/HDFS] Trying to have your cake and eat it too Each phase pines for tasks with locality and their numbers on a tether Alas within a phase, you get one,
More informationDistributed Computations MapReduce. adapted from Jeff Dean s slides
Distributed Computations MapReduce adapted from Jeff Dean s slides What we ve learnt so far Basic distributed systems concepts Consistency (sequential, eventual) Fault tolerance (recoverability, availability)
More information5 Fundamental Strategies for Building a Data-centered Data Center
5 Fundamental Strategies for Building a Data-centered Data Center June 3, 2014 Ken Krupa, Chief Field Architect Gary Vidal, Solutions Specialist Last generation Reference Data Unstructured OLTP Warehouse
More informationGhislain Fourny. Big Data 5. Wide column stores
Ghislain Fourny Big Data 5. Wide column stores Data Technology Stack User interfaces Querying Data stores Indexing Processing Validation Data models Syntax Encoding Storage 2 Where we are User interfaces
More informationNext-Generation Cloud Platform
Next-Generation Cloud Platform Jangwoo Kim Jun 24, 2013 E-mail: jangwoo@postech.ac.kr High Performance Computing Lab Department of Computer Science & Engineering Pohang University of Science and Technology
More informationComparing SQL and NOSQL databases
COSC 6397 Big Data Analytics Data Formats (II) HBase Edgar Gabriel Spring 2014 Comparing SQL and NOSQL databases Types Development History Data Storage Model SQL One type (SQL database) with minor variations
More informationelasticsearch The Road to a Distributed, (Near) Real Time, Search Engine Shay Banon
elasticsearch The Road to a Distributed, (Near) Real Time, Search Engine Shay Banon - @kimchy Lucene Basics - Directory A File System Abstraction Mainly used to read and write files Used to read and write
More informationADVANCED HBASE. Architecture and Schema Design GeeCON, May Lars George Director EMEA Services
ADVANCED HBASE Architecture and Schema Design GeeCON, May 2013 Lars George Director EMEA Services About Me Director EMEA Services @ Cloudera Consulting on Hadoop projects (everywhere) Apache Committer
More informationHow we build TiDB. Max Liu PingCAP Amsterdam, Netherlands October 5, 2016
How we build TiDB Max Liu PingCAP Amsterdam, Netherlands October 5, 2016 About me Infrastructure engineer / CEO of PingCAP Working on open source projects: TiDB: https://github.com/pingcap/tidb TiKV: https://github.com/pingcap/tikv
More informationMaking Non-Distributed Databases, Distributed. Ioannis Papapanagiotou, PhD Shailesh Birari
Making Non-Distributed Databases, Distributed Ioannis Papapanagiotou, PhD Shailesh Birari Dynomite Ecosystem Dynomite - Proxy layer Dyno - Client Dynomite-manager - Ecosystem orchestrator Dynomite-explorer
More informationEngineering Goals. Scalability Availability. Transactional behavior Security EAI... CS530 S05
Engineering Goals Scalability Availability Transactional behavior Security EAI... Scalability How much performance can you get by adding hardware ($)? Performance perfect acceptable unacceptable Processors
More informationFacebook. The Technology Behind Messages (and more ) Kannan Muthukkaruppan Software Engineer, Facebook. March 11, 2011
HBase @ Facebook The Technology Behind Messages (and more ) Kannan Muthukkaruppan Software Engineer, Facebook March 11, 2011 Talk Outline the new Facebook Messages, and how we got started with HBase quick
More informationA BigData Tour HDFS, Ceph and MapReduce
A BigData Tour HDFS, Ceph and MapReduce These slides are possible thanks to these sources Jonathan Drusi - SCInet Toronto Hadoop Tutorial, Amir Payberah - Course in Data Intensive Computing SICS; Yahoo!
More information