MongoDB 2.2 and Big Data

Size: px
Start display at page:

Download "MongoDB 2.2 and Big Data"

Transcription

1 MongoDB 2.2 and Big Data Christian Kvalheim Team Lead Engineering, christiankvalheim.com

2

3 From here...

4 ...to here...

5 ...without buying several of these

6 Warning! This is a technical talk But MongoDB is very simple!

7 Solving real world data problems with MongoDB Effective schema design for scaling Linking versus embedding Bucketing Time series Implications of sharding keys with alternatives Read scaling through replication Challenges of eventual consistency

8 Since the dawn of the RDBMS Main memory Intel 1103, 1k bits 4GB of RAM costs $25.99 Mass storage IBM 3330 Model 1, 100 MB 3TB Superspeed USB for $129 Microprocessor Nearly 4004 being developed; 4 bits and 92,000 instructions per second Westmere EX has 10 cores, 30MB L3 cache, runs at 2.4GHz

9 More recent changes A decade ago Now Faster Buy a bigger server Buy more servers Faster storage A SAN with more spindles SSD More reliable storage More expensive SAN More copies of local storage Deployed in Your data center The cloud private or public Large data set Millions of rows Billions to trillions of rows Development Waterfall Iterative

10 What MongoDB solves Agility Applications store complex data that is easier to model as documents Schemaless DB enables faster development cycles Flexibility Relaxed transactional semantics enable easy scale out Auto Sharding for scale down and scale up Cost Cost effective operationalize abundant data (clickstreams, logs, tweets,...)

11 Design Goal of MongoDB scalability & performance memcached key/value RDBMS depth of functionality

12 Effective Schema Design

13 Design Schema for Twitter Model each users activity stream Users Name, address, display name Tweets Text Who Timestamp

14 Solution A Two Collections // users - one doc per user { _id: "alvin", "alvin@10gen.com", display: "jonnyeight" } // tweets - one doc per user per tweet { user: "bob", tweet: " ", text: "Best Tweet Ever!", ts: ISODate(" T09:56:06.298Z") }

15 Solution B Embedded Tweets // users - one doc per user with all tweets { _id: "alvin", "alvin@10gen.com", display: "jonnyeight", tweets: [! {!! user: "bob",!! tweet: " ",!! text: "Best Tweet Ever!", ts: ISODate(" T09:56:06.298Z")! } ] }

16 Embedding Great for read performance One seek to load entire object One roundtrip to database Writes can be slow if adding to objects all the time

17 Linking or Embedding? Linking can make some queries easy // Find latest 50 tweets for "alvin" > db.tweets.find( { _id:"alvin"} ).sort( {ts:-1} ).limit(10) But what effect does this have on the systems?

18 Collection 1 Index 1

19 Collection 1 Virtual Address Space 1 Index 1 This is your virtual memory size (mapped)

20 Collection 1 Virtual Address Space 1 Physical RAM Index 1 This is your resident memory size

21 Collection 1 Virtual Address Space 1 Disk Physical RAM Index 1

22 Collection 1 Virtual Address Space 1 Disk Physical RAM Index 1 = = 100 ns 10,000,000 ns

23 Collection 1 Virtual Address Space 1 Disk Physical RAM Index db.tweets.find( { _id:"alvin"} ).sort( {ts:-1} ).limit(10) 3 Linking = Many Random Read

24 Collection 1 Virtual Address Space 1 Disk Physical RAM Index 1 db.tweets.find( { _id:"alvin"} ) 1 Embedding = Large Sequential Read

25 Problems Large sequential reads Good: Disks are great at Sequential reads Bad: Read too much data Many Random reads Good: Easy of query Bad: Disks are poor at Random reads (SSD?)

26 Solution C Buckets // tweets : one doc per user per day > db.tweets.findone() { _id: "alvin-2011/12/09", "alvin@10gen.com", tweets: [! { user: "Bob",! tweet: " ",! text: "Best Tweet Ever!" },! { author: "Joe",! date: "May ",! text: "Stuck in traffic (again)" } ]! }

27 Solution C Last 10 Tweets db.tweets.find( { _id: "alvin-2011/12/09" }, { tweets: { $slice : 10 } } ).sort( { _id: -1 } ).limit(1)

28 Collection 1 Virtual Address Space 1 Disk Physical RAM Index 1 db.tweets.find( { _id: "alvin-2011/12/09" }, { tweets: { $slice : 10 } } ).sort( { _id: -1 } ).limit(1) 1 Bucket = Small Sequential Read

29 Sharding

30 Sharding - Goals Data location transparent to your code Data distribution is automatic Data re-distribution is automatic Aggregate system resources horizontally No code changes

31 Sharding - Range distribution sh.shardcollection("test.tweets", {_id: 1}, false) shard01 shard02 shard03

32 Sharding - Range distribution shard01 shard02 shard03 a-i j-r s-z

33 Sharding - Splits shard01 shard02 shard03 a-i ja-jz s-z k-r

34 Sharding - Splits shard01 shard02 shard03 a-i ja-ji s-z ji-js js-jw jz-r

35 Sharding - Auto Balancing shard01 shard02 shard03 a-i ja-ji s-z ji-js js-jw js-jw jz-r jz-r

36 Sharding - Auto Balancing shard01 shard02 shard03 a-i ja-ji s-z ji-js js-jw jz-r

37 How does sharding effect Schema Design? Sharding key choice Access patterns (query versus write)

38 Sharding Key { photo_id :????, data : <binary> } What s the right key? auto increment MD5( data ) month() + MD5( data )

39 Right balanced access Only have to keep small portion in ram Right shard "hot" Time Based ObjectId Auto Increment

40 Random access Have to keep entire index in ram All shards "warm" Hash

41 Segmented access Have to keep some index in ram Some shards "warm" Month + Hash

42 Shard Key Example { _id : "alvin", // shard key "alvin@10gen.com", display: "jonnyeight" li: "alvin.j.richards", tweets: [... ] } Shard on { _id : 1 } Lookup by _id routed to 1 node Index on { 1 }

43 Sharding - Routed Query find({_id: "alvin"}) shard01 shard02 shard03 a-i ja-ji s-z ji-js js-jw jz-r

44 Sharding - Routed Query find({_id: "alvin"}) shard01 shard02 shard03 a-i ja-ji s-z ji-js js-jw jz-r

45 Sharding - Scatter Gather find({ "alvin@10gen.com"}) shard01 shard02 shard03 a-i ja-ji s-z ji-js js-jw jz-r

46 Sharding - Scatter Gather find({ "alvin@10gen.com"}) shard01 shard02 shard03 a-i ja-ji s-z ji-js js-jw jz-r

47 Multiple Identities User can have multiple identities twitter name address etc. What is the best sharding key & schema design?

48 Solution A Multiple Identities { _id: "alvin", display: "jonnyeight", "alvin@10gen.com", li: "alvin.j.richards", // linkedin tweets: [... ] } Shard on { _id: 1 } Lookup by _id routed to 1 node Lookup by li or is scatter gather Cannot create a unique index on li or

49 Solution B Multiple Identities identities { type: "_id", val: "alvin", info: " "} { type: "em", val: "alvin@10gen.com", info: " "} { type: "li", val: "alvin.j.richards",info: " "} tweets { _id: " ", tweets : [... ] } Shard identities on { type : 1, val : 1 } Lookup by type & val routed to 1 node Can create unique index on type & val Shard info on { _id: 1 } Lookup info on _id routed to 1 node

50 Sharding - Routed Query shard01 shard02 shard03 type: em val: a-q "Min"- "1100" type: li val: s-z type: em val: r-z type: li val: d-r "1100"- "1200" type: _id val: a-z "1200"- "Max" type: li val: a-c

51 Sharding - Routed Query find({type: "em", val: "alvin@10gen.com}) shard01 shard02 shard03 type: em val: a-q "Min"- "1100" type: li val: s-z type: em val: r-z type: li val: d-r "1100"- "1200" type: _id val: a-z "1200"- "Max" type: li val: a-c

52 Sharding - Routed Query find({type: "em", val: "alvin@10gen.com}) find({_id: " "}) shard01 shard02 shard03 type: em val: a-q "Min"- "1100" type: li val: s-z type: em val: r-z type: li val: d-r "1100"- "1200" type: _id val: a-z "1200"- "Max" type: li val: a-c

53 Sharding - Caching 96 GB Mem shard01 a-i 300 GB Data j-r s-z 3:1 Data/Mem

54 Sharding - Caching 96 GB Mem 96 GB Mem 96 GB Mem shard01 shard02 shard03 a-i j-r s-z 300 GB Data 1:1 Data/Mem 1:1 Data/Mem 1:1 Data/Mem

55 Auto Sharding - Summary Fully consistent Application code unaware of data location Zero code changes Shard by Compound Key, Tag, Hash (2.4) Add capacity On-line When needed Zero downtime

56 Replica Sets Data Protection Multiple copies of the data Spread across Data Centers, AZs High Availability Automated Failover Automated Recovery

57 Replica Sets App Write Read Primary Asynchronous Replication Read Secondary Read Secondary

58 Replica Sets App Write Read Read Primary Secondary Read Secondary

59 Replica Sets App Primary Write Read Read Primary Secondary Automatic Election of new Primary

60 Replica Sets App Recovering Write Read Read Primary Secondary New primary serves data

61 Replica Sets App Read Write Read Read Secondary Primary Secondary

62 Replica Sets - Summary Data Protection High Availability Scaling eventual consistent reads Source to feed other systems Backups Indexes (Solr etc.)

63 Types of Durability with MongoDB Fire and forget Wait for error Wait for fsync Wait for journal sync Wait for replication

64 Durability Summary Memory Journal Secondary Other Data Center RDBMS Default "Fire & Forget" w=1 w=1 j=true w="majority" w=n w="mytag" Less Durability More Durability

65 Eventual Consistency

66 Understanding Eventual Consistency Application Insert Read Update Primary v1 v2 Time Replicated Replicated Secondary v1 v2

67 Understanding Eventual Consistency Does not exist Application Insert Read Update Reads v1 Primary v1 v2 Reads Replicated Replicated v1 Secondary Application #2 v1 Read Read Read Read v2 Reads v2

68 Mongo 2.2 Big Data support

69 Time to live collections TTL Features set a time to live for documents in a collection automatic removal of documents without big batch purges Example db.log.events.ensureindex( { "status": 1 }, { expireafterseconds: 3600 } )

70 Aggregation framework Declarative No JavaScript required C++ implementation Higher performance than JavaScript Expression evaluation Return computed values Framework: we can add new operations easily

71 Aggregation framework Series of operations Members of a collection are passed through a pipeline to produce a result

72 Pipeline Operations $match Uses a query predicate (like.find({ })) as a filter $project Uses a sample document to determine the shape of the result (similar to.find() s optional argument) This can include computed values $unwind Hands out array elements one at a time $group Aggregates items into buckets defined by a key

73 Pipeline Operations (continued) $sort Sort documents $limit Only allow the specified number of documents to pass $skip Skip over the specified number of documents

74 Example: finding popular checkins checkin_1 = { location: location_id, user: user_id, ts: "20/03/2010" } // Find most popular locations in last 3 hours > agg = db.checkins.aggregate( {$match: {ts: {$gt: now_minus_3_hrs}}}, {$group: {_id: "$location", numentries: {$sum: 1}}} {$sort: {numentries : -1 }} ) > agg.result [{"_id": "Din Tai Fung", "numentries" : 17},...]

75 Tag aware sharding

76 Tag aware sharding // A document we need to store in the uk { _id: "...", k: {c: jp, uname: jhilman } } // Define a shard tag range where documents containing the field c = jp are stored in servers tagged with JP > sh.addtagrange( mydb.users, {k: {c: jp }}, {{c: jp }}, JP )

77 As your data processing needs increase you will want to use a tool designed for the job

78 Hadoop Map Reduce InputFormat Map (k 1, v 1, ctx) Runs on same thread as map Many map operations ctx.write(k 2,v 2 ) Combiner(k 2,values 2 ) 1 at time per input split similar to Mongo's group same as Mongo's emit Partitioner(k 2 ) Sort(keys 2 ) k 2, v 3 similar to Mongo's reducer similar to Mongo's Finalize Reducer threads Output Format Reduce(k 3,values 4 ) k f,v f Runs once per key

79 MongoDB & Hadoop MongoDB same as Mongo's shard chunks (64mb) Many map operations 1 at time per input split single server or sharded cluster Creates a list of Input Splits (InputFormat) each split each split each split RecordReader Map (k Map (k 1, 1 v, 1, v 1 ctx), ctx) Map (k 1, v 1, ctx) ctx.write(k 2,v 2 ctx.write(k 2 ),v 2 ctx.write(k 2 ),v 2 ) Runs on same thread as map Combiner(k 2,values 2 ) Combiner(k Combiner(k 2,values 2,values 2 ) 2 ) k 2, k 2 v 3, k 2 v 3, v 3 Partitioner(k 2 ) Partitioner(k 2 ) Partitioner(k 2 ) Sort(keys 2 ) Sort(k 2 ) Sort(k 2 ) MongoDB Reducer threads Output Format Reduce(k 2,values 3 ) k f,v f Runs once per key

80 Product & Roadmap

81 The Evolution of MongoDB March 11 Sept 11 Aug 12 winter 12 Journaling Sharding and Replica set enhancements Spherical geo search Index enhancements to improve size and performance Authentication with sharded clusters Replica Set Enhancements Concurrency improvements Aggregation Framework Multi-Data Center Deployments Improved Performance and Concurrency

82 2.4 Roadmap Must Kerberos integration LDAP/AD integration Nice To Have (not 100% yet) Full Text Search Hash Shard Key Background Index Build on Secondaries V8 for Map/Reduce Geo: intersecting polygons, Geo shard key

83 Morning With MongoDB Barcelona, Martim Museum 18th October Free Technical Overview, Roadmap, Common Use Cases, Customer Projects

84

Scaling for Humongous amounts of data with MongoDB

Scaling for Humongous amounts of data with MongoDB Scaling for Humongous amounts of data with MongoDB Alvin Richards Technical Director, EMEA alvin@10gen.com @jonnyeight alvinonmongodb.com From here... http://bit.ly/ot71m4 ...to here... http://bit.ly/oxcsis

More information

Scaling with mongodb

Scaling with mongodb Scaling with mongodb Ross Lawley Python Engineer @ 10gen Web developer since 1999 Passionate about open source Agile methodology email: ross@10gen.com twitter: RossC0 Today's Talk Scaling Understanding

More information

Course Content MongoDB

Course Content MongoDB Course Content MongoDB 1. Course introduction and mongodb Essentials (basics) 2. Introduction to NoSQL databases What is NoSQL? Why NoSQL? Difference Between RDBMS and NoSQL Databases Benefits of NoSQL

More information

The course modules of MongoDB developer and administrator online certification training:

The course modules of MongoDB developer and administrator online certification training: The course modules of MongoDB developer and administrator online certification training: 1 An Overview of the Course Introduction to the course Table of Contents Course Objectives Course Overview Value

More information

Open source, high performance database. July 2012

Open source, high performance database. July 2012 Open source, high performance database July 2012 1 Quick introduction to mongodb Data modeling in mongodb, queries, geospatial, updates and map reduce. Using a location-based app as an example Example

More information

SQL, NoSQL, MongoDB. CSE-291 (Cloud Computing) Fall 2016 Gregory Kesden

SQL, NoSQL, MongoDB. CSE-291 (Cloud Computing) Fall 2016 Gregory Kesden SQL, NoSQL, MongoDB CSE-291 (Cloud Computing) Fall 2016 Gregory Kesden SQL Databases Really better called Relational Databases Key construct is the Relation, a.k.a. the table Rows represent records Columns

More information

MongoDB - a No SQL Database What you need to know as an Oracle DBA

MongoDB - a No SQL Database What you need to know as an Oracle DBA MongoDB - a No SQL Database What you need to know as an Oracle DBA David Burnham Aims of this Presentation To introduce NoSQL database technology specifically using MongoDB as an example To enable the

More information

Scaling MongoDB. Percona Webinar - Wed October 18th 11:00 AM PDT Adamo Tonete MongoDB Senior Service Technical Service Engineer.

Scaling MongoDB. Percona Webinar - Wed October 18th 11:00 AM PDT Adamo Tonete MongoDB Senior Service Technical Service Engineer. caling MongoDB Percona Webinar - Wed October 18th 11:00 AM PDT Adamo Tonete MongoDB enior ervice Technical ervice Engineer 1 Me and the expected audience @adamotonete Intermediate - At least 6+ months

More information

How to Scale MongoDB. Apr

How to Scale MongoDB. Apr How to Scale MongoDB Apr-24-2018 About me Location: Skopje, Republic of Macedonia Education: MSc, Software Engineering Experience: Lead Database Consultant (since 2016) Database Consultant (2012-2016)

More information

MongoDB. David Murphy MongoDB Practice Manager, Percona

MongoDB. David Murphy MongoDB Practice Manager, Percona MongoDB Click Replication to edit Master and Sharding title style David Murphy MongoDB Practice Manager, Percona Who is this Person and What Does He Know? Former MongoDB Master Former Lead DBA for ObjectRocket,

More information

Percona Live Updated Sharding Guidelines in MongoDB 3.x with Storage Engine Considerations. Kimberly Wilkins

Percona Live Updated Sharding Guidelines in MongoDB 3.x with Storage Engine Considerations. Kimberly Wilkins Percona Live 2016 Updated Sharding Guidelines in MongoDB 3.x with Storage Engine Considerations Kimberly Wilkins Principal Engineer - Databases, Rackspace/ ObjectRocket www.linkedin.com/in/wilkinskimberly,

More information

Couchbase Architecture Couchbase Inc. 1

Couchbase Architecture Couchbase Inc. 1 Couchbase Architecture 2015 Couchbase Inc. 1 $whoami Laurent Doguin Couchbase Developer Advocate @ldoguin laurent.doguin@couchbase.com 2015 Couchbase Inc. 2 2 Big Data = Operational + Analytic (NoSQL +

More information

Time-Series Data in MongoDB on a Budget. Peter Schwaller Senior Director Server Engineering, Percona Santa Clara, California April 23th 25th, 2018

Time-Series Data in MongoDB on a Budget. Peter Schwaller Senior Director Server Engineering, Percona Santa Clara, California April 23th 25th, 2018 Time-Series Data in MongoDB on a Budget Peter Schwaller Senior Director Server Engineering, Percona Santa Clara, California April 23th 25th, 2018 TIME SERIES DATA in MongoDB on a Budget Click to add text

More information

Group13: Siddhant Deshmukh, Sudeep Rege, Sharmila Prakash, Dhanusha Varik

Group13: Siddhant Deshmukh, Sudeep Rege, Sharmila Prakash, Dhanusha Varik Group13: Siddhant Deshmukh, Sudeep Rege, Sharmila Prakash, Dhanusha Varik mongodb (humongous) Introduction What is MongoDB? Why MongoDB? MongoDB Terminology Why Not MongoDB? What is MongoDB? DOCUMENT STORE

More information

Jargons, Concepts, Scope and Systems. Key Value Stores, Document Stores, Extensible Record Stores. Overview of different scalable relational systems

Jargons, Concepts, Scope and Systems. Key Value Stores, Document Stores, Extensible Record Stores. Overview of different scalable relational systems Jargons, Concepts, Scope and Systems Key Value Stores, Document Stores, Extensible Record Stores Overview of different scalable relational systems Examples of different Data stores Predictions, Comparisons

More information

Document Object Storage with MongoDB

Document Object Storage with MongoDB Document Object Storage with MongoDB Lecture BigData Analytics Julian M. Kunkel julian.kunkel@googlemail.com University of Hamburg / German Climate Computing Center (DKRZ) 2017-12-15 Disclaimer: Big Data

More information

MongoDB Schema Design

MongoDB Schema Design MongoDB Schema Design Demystifying document structures in MongoDB Jon Tobin @jontobs MongoDB Overview NoSQL Document Oriented DB Dynamic Schema HA/Sharding Built In Simple async replication setup Automated

More information

Accelerate MySQL for Demanding OLAP and OLTP Use Cases with Apache Ignite. Peter Zaitsev, Denis Magda Santa Clara, California April 25th, 2017

Accelerate MySQL for Demanding OLAP and OLTP Use Cases with Apache Ignite. Peter Zaitsev, Denis Magda Santa Clara, California April 25th, 2017 Accelerate MySQL for Demanding OLAP and OLTP Use Cases with Apache Ignite Peter Zaitsev, Denis Magda Santa Clara, California April 25th, 2017 About the Presentation Problems Existing Solutions Denis Magda

More information

NoSQL systems. Lecture 21 (optional) Instructor: Sudeepa Roy. CompSci 516 Data Intensive Computing Systems

NoSQL systems. Lecture 21 (optional) Instructor: Sudeepa Roy. CompSci 516 Data Intensive Computing Systems CompSci 516 Data Intensive Computing Systems Lecture 21 (optional) NoSQL systems Instructor: Sudeepa Roy Duke CS, Spring 2016 CompSci 516: Data Intensive Computing Systems 1 Key- Value Stores Duke CS,

More information

Conceptual Modeling on Tencent s Distributed Database Systems. Pan Anqun, Wang Xiaoyu, Li Haixiang Tencent Inc.

Conceptual Modeling on Tencent s Distributed Database Systems. Pan Anqun, Wang Xiaoyu, Li Haixiang Tencent Inc. Conceptual Modeling on Tencent s Distributed Database Systems Pan Anqun, Wang Xiaoyu, Li Haixiang Tencent Inc. Outline Introduction System overview of TDSQL Conceptual Modeling on TDSQL Applications Conclusion

More information

MongoDB and Mysql: Which one is a better fit for me? Room 204-2:20PM-3:10PM

MongoDB and Mysql: Which one is a better fit for me? Room 204-2:20PM-3:10PM MongoDB and Mysql: Which one is a better fit for me? Room 204-2:20PM-3:10PM About us Adamo Tonete MongoDB Support Engineer Agustín Gallego MySQL Support Engineer Agenda What are MongoDB and MySQL; NoSQL

More information

Kim Greene - Introduction

Kim Greene - Introduction Kim Greene kim@kimgreene.com 507-216-5632 Skype/Twitter: iseriesdomino Copyright Kim Greene Consulting, Inc. All rights reserved worldwide. 1 Kim Greene - Introduction Owner of an IT consulting company

More information

Reduce MongoDB Data Size. Steven Wang

Reduce MongoDB Data Size. Steven Wang Reduce MongoDB Data Size Tangome inc Steven Wang stwang@tango.me Outline MongoDB Cluster Architecture Advantages to Reduce Data Size Several Cases To Reduce MongoDB Data Size Case 1: Migrate To wiredtiger

More information

CS 655 Advanced Topics in Distributed Systems

CS 655 Advanced Topics in Distributed Systems Presented by : Walid Budgaga CS 655 Advanced Topics in Distributed Systems Computer Science Department Colorado State University 1 Outline Problem Solution Approaches Comparison Conclusion 2 Problem 3

More information

CISC 7610 Lecture 5 Distributed multimedia databases. Topics: Scaling up vs out Replication Partitioning CAP Theorem NoSQL NewSQL

CISC 7610 Lecture 5 Distributed multimedia databases. Topics: Scaling up vs out Replication Partitioning CAP Theorem NoSQL NewSQL CISC 7610 Lecture 5 Distributed multimedia databases Topics: Scaling up vs out Replication Partitioning CAP Theorem NoSQL NewSQL Motivation YouTube receives 400 hours of video per minute That is 200M hours

More information

Accelerate Database Performance and Reduce Response Times in MongoDB Humongous Environments with the LSI Nytro MegaRAID Flash Accelerator Card

Accelerate Database Performance and Reduce Response Times in MongoDB Humongous Environments with the LSI Nytro MegaRAID Flash Accelerator Card Accelerate Database Performance and Reduce Response Times in MongoDB Humongous Environments with the LSI Nytro MegaRAID Flash Accelerator Card The Rise of MongoDB Summary One of today s growing database

More information

Introduction to Database Services

Introduction to Database Services Introduction to Database Services Shaun Pearce AWS Solutions Architect 2015, Amazon Web Services, Inc. or its affiliates. All rights reserved Today s agenda Why managed database services? A non-relational

More information

CSE 544 Principles of Database Management Systems. Magdalena Balazinska Winter 2015 Lecture 14 NoSQL

CSE 544 Principles of Database Management Systems. Magdalena Balazinska Winter 2015 Lecture 14 NoSQL CSE 544 Principles of Database Management Systems Magdalena Balazinska Winter 2015 Lecture 14 NoSQL References Scalable SQL and NoSQL Data Stores, Rick Cattell, SIGMOD Record, December 2010 (Vol. 39, No.

More information

Introduction to Oracle NoSQL Database

Introduction to Oracle NoSQL Database Introduction to Oracle NoSQL Database Anand Chandak Ashutosh Naik Agenda NoSQL Background Oracle NoSQL Database Overview Technical Features & Performance Use Cases 2 Why NoSQL? 1. The four V s of Big Data

More information

1 Big Data Hadoop. 1. Introduction About this Course About Big Data Course Logistics Introductions

1 Big Data Hadoop. 1. Introduction About this Course About Big Data Course Logistics Introductions Big Data Hadoop Architect Online Training (Big Data Hadoop + Apache Spark & Scala+ MongoDB Developer And Administrator + Apache Cassandra + Impala Training + Apache Kafka + Apache Storm) 1 Big Data Hadoop

More information

Sharding Introduction

Sharding Introduction search MongoDB Home Admin Zone Sharding Sharding Introduction Sharding Introduction MongoDB supports an automated sharding architecture, enabling horizontal scaling across multiple nodes. For applications

More information

Voldemort. Smruti R. Sarangi. Department of Computer Science Indian Institute of Technology New Delhi, India. Overview Design Evaluation

Voldemort. Smruti R. Sarangi. Department of Computer Science Indian Institute of Technology New Delhi, India. Overview Design Evaluation Voldemort Smruti R. Sarangi Department of Computer Science Indian Institute of Technology New Delhi, India Smruti R. Sarangi Leader Election 1/29 Outline 1 2 3 Smruti R. Sarangi Leader Election 2/29 Data

More information

Ghislain Fourny. Big Data 5. Column stores

Ghislain Fourny. Big Data 5. Column stores Ghislain Fourny Big Data 5. Column stores 1 Introduction 2 Relational model 3 Relational model Schema 4 Issues with relational databases (RDBMS) Small scale Single machine 5 Can we fix a RDBMS? Scale up

More information

The MapReduce Abstraction

The MapReduce Abstraction The MapReduce Abstraction Parallel Computing at Google Leverages multiple technologies to simplify large-scale parallel computations Proprietary computing clusters Map/Reduce software library Lots of other

More information

Your First MongoDB Environment: What You Should Know Before Choosing MongoDB as Your Database

Your First MongoDB Environment: What You Should Know Before Choosing MongoDB as Your Database Your First MongoDB Environment: What You Should Know Before Choosing MongoDB as Your Database Me - @adamotonete Adamo Tonete Senior Technical Engineer Brazil Agenda What is MongoDB? The good side of MongoDB

More information

מרכז התמחות DBA. NoSQL and MongoDB תאריך: 3 דצמבר 2015 מציג: רז הורוביץ, ארכיטקט מרכז ההתמחות

מרכז התמחות DBA. NoSQL and MongoDB תאריך: 3 דצמבר 2015 מציג: רז הורוביץ, ארכיטקט מרכז ההתמחות מרכז התמחות DBA NoSQL and MongoDB תאריך: 3 דצמבר 2015 מציג: רז הורוביץ, ארכיטקט מרכז ההתמחות Raziel.Horovitz@tangram-soft.co.il Matrix IT work Copyright 2013. Do not remove source or Attribution from any

More information

NoSQL Databases MongoDB vs Cassandra. Kenny Huynh, Andre Chik, Kevin Vu

NoSQL Databases MongoDB vs Cassandra. Kenny Huynh, Andre Chik, Kevin Vu NoSQL Databases MongoDB vs Cassandra Kenny Huynh, Andre Chik, Kevin Vu Introduction - Relational database model - Concept developed in 1970 - Inefficient - NoSQL - Concept introduced in 1980 - Related

More information

MongoDB An Overview. 21-Oct Socrates

MongoDB An Overview. 21-Oct Socrates MongoDB An Overview 21-Oct-2016 Socrates Agenda What is NoSQL DB? Types of NoSQL DBs DBMS and MongoDB Comparison Why MongoDB? MongoDB Architecture Storage Engines Data Model Query Language Security Data

More information

MySQL Cluster Web Scalability, % Availability. Andrew

MySQL Cluster Web Scalability, % Availability. Andrew MySQL Cluster Web Scalability, 99.999% Availability Andrew Morgan @andrewmorgan www.clusterdb.com Safe Harbour Statement The following is intended to outline our general product direction. It is intended

More information

MongoDB. copyright 2011 Trainologic LTD

MongoDB. copyright 2011 Trainologic LTD MongoDB MongoDB MongoDB is a document-based open-source DB. Developed and supported by 10gen. MongoDB is written in C++. The name originated from the word: humongous. Is used in production at: Disney,

More information

CS 61C: Great Ideas in Computer Architecture. MapReduce

CS 61C: Great Ideas in Computer Architecture. MapReduce CS 61C: Great Ideas in Computer Architecture MapReduce Guest Lecturer: Justin Hsia 3/06/2013 Spring 2013 Lecture #18 1 Review of Last Lecture Performance latency and throughput Warehouse Scale Computing

More information

Amazon ElastiCache 8/1/17. Why Amazon ElastiCache is important? Introduction:

Amazon ElastiCache 8/1/17. Why Amazon ElastiCache is important? Introduction: Amazon ElastiCache Introduction: How to improve application performance using caching. What are the ElastiCache engines, and the difference between them. How to scale your cluster vertically. How to scale

More information

<Insert Picture Here> Oracle NoSQL Database A Distributed Key-Value Store

<Insert Picture Here> Oracle NoSQL Database A Distributed Key-Value Store Oracle NoSQL Database A Distributed Key-Value Store Charles Lamb The following is intended to outline our general product direction. It is intended for information purposes only,

More information

A Non-Relational Storage Analysis

A Non-Relational Storage Analysis A Non-Relational Storage Analysis Cassandra & Couchbase Alexandre Fonseca, Anh Thu Vu, Peter Grman Cloud Computing - 2nd semester 2012/2013 Universitat Politècnica de Catalunya Microblogging - big data?

More information

CIS 601 Graduate Seminar. Dr. Sunnie S. Chung Dhruv Patel ( ) Kalpesh Sharma ( )

CIS 601 Graduate Seminar. Dr. Sunnie S. Chung Dhruv Patel ( ) Kalpesh Sharma ( ) Guide: CIS 601 Graduate Seminar Presented By: Dr. Sunnie S. Chung Dhruv Patel (2652790) Kalpesh Sharma (2660576) Introduction Background Parallel Data Warehouse (PDW) Hive MongoDB Client-side Shared SQL

More information

CIB Session 12th NoSQL Databases Structures

CIB Session 12th NoSQL Databases Structures CIB Session 12th NoSQL Databases Structures By: Shahab Safaee & Morteza Zahedi Software Engineering PhD Email: safaee.shx@gmail.com, morteza.zahedi.a@gmail.com cibtrc.ir cibtrc cibtrc 2 Agenda What is

More information

Breaking Barriers: MongoDB Design Patterns. Nikolaos Vyzas & Christos Soulios

Breaking Barriers: MongoDB Design Patterns. Nikolaos Vyzas & Christos Soulios Breaking Barriers: MongoDB Design Patterns Nikolaos Vyzas & Christos Soulios a bit about us and this talk Who we are what we do Christos Soulios Christos is a principal architect at Pythian Delivers Big

More information

Database Architectures

Database Architectures Database Architectures CPS352: Database Systems Simon Miner Gordon College Last Revised: 4/15/15 Agenda Check-in Parallelism and Distributed Databases Technology Research Project Introduction to NoSQL

More information

What s new in Mongo 4.0. Vinicius Grippa Percona

What s new in Mongo 4.0. Vinicius Grippa Percona What s new in Mongo 4.0 Vinicius Grippa Percona About me Support Engineer at Percona since 2017 Working with MySQL for over 5 years - Started with SQL Server Working with databases for 7 years 2 Agenda

More information

Part 1: Indexes for Big Data

Part 1: Indexes for Big Data JethroData Making Interactive BI for Big Data a Reality Technical White Paper This white paper explains how JethroData can help you achieve a truly interactive interactive response time for BI on big data,

More information

/ Cloud Computing. Recitation 10 March 22nd, 2016

/ Cloud Computing. Recitation 10 March 22nd, 2016 15-319 / 15-619 Cloud Computing Recitation 10 March 22nd, 2016 Overview Administrative issues Office Hours, Piazza guidelines Last week s reflection Project 3.3, OLI Unit 4, Module 15, Quiz 8 This week

More information

Database Architectures

Database Architectures Database Architectures CPS352: Database Systems Simon Miner Gordon College Last Revised: 11/15/12 Agenda Check-in Centralized and Client-Server Models Parallelism Distributed Databases Homework 6 Check-in

More information

TALK 1: CONVINCE YOUR BOSS: CHOOSE THE "RIGHT" DATABASE. Prof. Dr. Stefan Edlich Beuth University of Technology Berlin (App.Sc.)

TALK 1: CONVINCE YOUR BOSS: CHOOSE THE RIGHT DATABASE. Prof. Dr. Stefan Edlich Beuth University of Technology Berlin (App.Sc.) TALK 1: CONVINCE YOUR BOSS: CHOOSE THE "RIGHT" DATABASE Prof. Dr. Stefan Edlich Beuth University of Technology Berlin (App.Sc.) nosqlfrankfurt.de nosql powerdays 2 years of NoSQL Consulting http://nosql-database.org

More information

Data Centers. Tom Anderson

Data Centers. Tom Anderson Data Centers Tom Anderson Transport Clarification RPC messages can be arbitrary size Ex: ok to send a tree or a hash table Can require more than one packet sent/received We assume messages can be dropped,

More information

Introduction to Hadoop. Owen O Malley Yahoo!, Grid Team

Introduction to Hadoop. Owen O Malley Yahoo!, Grid Team Introduction to Hadoop Owen O Malley Yahoo!, Grid Team owen@yahoo-inc.com Who Am I? Yahoo! Architect on Hadoop Map/Reduce Design, review, and implement features in Hadoop Working on Hadoop full time since

More information

Distributed Filesystem

Distributed Filesystem Distributed Filesystem 1 How do we get data to the workers? NAS Compute Nodes SAN 2 Distributing Code! Don t move data to workers move workers to the data! - Store data on the local disks of nodes in the

More information

Map-Reduce. Marco Mura 2010 March, 31th

Map-Reduce. Marco Mura 2010 March, 31th Map-Reduce Marco Mura (mura@di.unipi.it) 2010 March, 31th This paper is a note from the 2009-2010 course Strumenti di programmazione per sistemi paralleli e distribuiti and it s based by the lessons of

More information

MongoDB Architecture

MongoDB Architecture VICTORIA UNIVERSITY OF WELLINGTON Te Whare Wananga o te Upoko o te Ika a Maui MongoDB Architecture Lecturer : Dr. Pavle Mogin SWEN 432 Advanced Database Design and Implementation Advanced Database Design

More information

MATH is Hard: TTL Index Configuration and Considerations. Kimberly Wilkins Sr.

MATH is Hard: TTL Index Configuration and Considerations. Kimberly Wilkins Sr. MATH is Hard: TTL Index Configuration and Considerations Kimberly Wilkins Sr. DBA/Engineer kimberly@objectrocket.com @dba_denizen Drowning in Data? TTL s are your lifeboat Sources? Amounts? 600 TB 115

More information

relational Relational to Riak Why Move From Relational to Riak? Introduction High Availability Riak At-a-Glance

relational Relational to Riak Why Move From Relational to Riak? Introduction High Availability Riak At-a-Glance WHITEPAPER Relational to Riak relational Introduction This whitepaper looks at why companies choose Riak over a relational database. We focus specifically on availability, scalability, and the / data model.

More information

Perspectives on NoSQL

Perspectives on NoSQL Perspectives on NoSQL PGCon 2010 Gavin M. Roy What is NoSQL? NoSQL is a movement promoting a loosely defined class of nonrelational data stores that break with a long history of relational

More information

MongoDB Tutorial for Beginners

MongoDB Tutorial for Beginners MongoDB Tutorial for Beginners Mongodb is a document-oriented NoSQL database used for high volume data storage. In this tutorial you will learn how Mongodb can be accessed and some of its important features

More information

CISC 7610 Lecture 2b The beginnings of NoSQL

CISC 7610 Lecture 2b The beginnings of NoSQL CISC 7610 Lecture 2b The beginnings of NoSQL Topics: Big Data Google s infrastructure Hadoop: open google infrastructure Scaling through sharding CAP theorem Amazon s Dynamo 5 V s of big data Everyone

More information

MongoDB Distributed Write and Read

MongoDB Distributed Write and Read VICTORIA UNIVERSITY OF WELLINGTON Te Whare Wananga o te Upoko o te Ika a Maui MongoDB Distributed Write and Read Lecturer : Dr. Pavle Mogin SWEN 432 Advanced Database Design and Implementation Advanced

More information

MongoDB w/ Some Node.JS Sprinkles

MongoDB w/ Some Node.JS Sprinkles MongoDB w/ Some Node.JS Sprinkles Niall O'Higgins Author MongoDB and Python O'Reilly @niallohiggins on Twitter niallo@beyondfog.com MongoDB Overview Non-relational (NoSQL) document-oriented database Rich

More information

ECS High Availability Design

ECS High Availability Design ECS High Availability Design March 2018 A Dell EMC white paper Revisions Date Mar 2018 Aug 2017 July 2017 Description Version 1.2 - Updated to include ECS version 3.2 content Version 1.1 - Updated to include

More information

RA-GRS, 130 replication support, ZRS, 130

RA-GRS, 130 replication support, ZRS, 130 Index A, B Agile approach advantages, 168 continuous software delivery, 167 definition, 167 disadvantages, 169 sprints, 167 168 Amazon Web Services (AWS) failure, 88 CloudTrail Service, 21 CloudWatch Service,

More information

Distributed computing: index building and use

Distributed computing: index building and use Distributed computing: index building and use Distributed computing Goals Distributing computation across several machines to Do one computation faster - latency Do more computations in given time - throughput

More information

ITG Software Engineering

ITG Software Engineering Introduction to MongoDB Course ID: Page 1 Last Updated 12/15/2014 MongoDB for Developers Course Overview: In this 3 day class students will start by learning how to install and configure MongoDB on a Mac

More information

Accelerate MySQL for Demanding OLAP and OLTP Use Case with Apache Ignite December 7, 2016

Accelerate MySQL for Demanding OLAP and OLTP Use Case with Apache Ignite December 7, 2016 Accelerate MySQL for Demanding OLAP and OLTP Use Case with Apache Ignite December 7, 2016 Nikita Ivanov CTO and Co-Founder GridGain Systems Peter Zaitsev CEO and Co-Founder Percona About the Presentation

More information

Oral Questions and Answers (DBMS LAB) Questions & Answers- DBMS

Oral Questions and Answers (DBMS LAB) Questions & Answers- DBMS Questions & Answers- DBMS https://career.guru99.com/top-50-database-interview-questions/ 1) Define Database. A prearranged collection of figures known as data is called database. 2) What is DBMS? Database

More information

FLAT DATACENTER STORAGE. Paper-3 Presenter-Pratik Bhatt fx6568

FLAT DATACENTER STORAGE. Paper-3 Presenter-Pratik Bhatt fx6568 FLAT DATACENTER STORAGE Paper-3 Presenter-Pratik Bhatt fx6568 FDS Main discussion points A cluster storage system Stores giant "blobs" - 128-bit ID, multi-megabyte content Clients and servers connected

More information

High Performance NoSQL with MongoDB

High Performance NoSQL with MongoDB High Performance NoSQL with MongoDB History of NoSQL June 11th, 2009, San Francisco, USA Johan Oskarsson (from http://last.fm/) organized a meetup to discuss advances in data storage which were all using

More information

Oracle NoSQL Database Enterprise Edition, Version 18.1

Oracle NoSQL Database Enterprise Edition, Version 18.1 Oracle NoSQL Database Enterprise Edition, Version 18.1 Oracle NoSQL Database is a scalable, distributed NoSQL database, designed to provide highly reliable, flexible and available data management across

More information

Managing IoT and Time Series Data with Amazon ElastiCache for Redis

Managing IoT and Time Series Data with Amazon ElastiCache for Redis Managing IoT and Time Series Data with ElastiCache for Redis Darin Briskman, ElastiCache Developer Outreach Michael Labib, Specialist Solutions Architect 2016, Web Services, Inc. or its Affiliates. All

More information

Building High Performance Apps using NoSQL. Swami Sivasubramanian General Manager, AWS NoSQL

Building High Performance Apps using NoSQL. Swami Sivasubramanian General Manager, AWS NoSQL Building High Performance Apps using NoSQL Swami Sivasubramanian General Manager, AWS NoSQL Building high performance apps There is a lot to building high performance apps Scalability Performance at high

More information

The Little Mongo DB Schema Design Book

The Little Mongo DB Schema Design Book The Little Mongo DB Schema Design Book Christian Kvalheim This book is for sale at http://leanpub.com/mongodbschemadesign This version was published on 2015-10-09 ISBN 978-1-943846-77-1 This is a Leanpub

More information

Cascade Mapping: Optimizing Memory Efficiency for Flash-based Key-value Caching

Cascade Mapping: Optimizing Memory Efficiency for Flash-based Key-value Caching Cascade Mapping: Optimizing Memory Efficiency for Flash-based Key-value Caching Kefei Wang and Feng Chen Louisiana State University SoCC '18 Carlsbad, CA Key-value Systems in Internet Services Key-value

More information

ebay s Architectural Principles

ebay s Architectural Principles ebay s Architectural Principles Architectural Strategies, Patterns, and Forces for Scaling a Large ecommerce Site Randy Shoup ebay Distinguished Architect QCon London 2008 March 14, 2008 What we re up

More information

Oracle NoSQL Database Enterprise Edition, Version 18.1

Oracle NoSQL Database Enterprise Edition, Version 18.1 Oracle NoSQL Database Enterprise Edition, Version 18.1 Oracle NoSQL Database is a scalable, distributed NoSQL database, designed to provide highly reliable, flexible and available data management across

More information

Asynchronous Logging and Fast Recovery for a Large-Scale Distributed In-Memory Storage

Asynchronous Logging and Fast Recovery for a Large-Scale Distributed In-Memory Storage Asynchronous Logging and Fast Recovery for a Large-Scale Distributed In-Memory Storage Kevin Beineke, Florian Klein, Michael Schöttner Institut für Informatik, Heinrich-Heine-Universität Düsseldorf Outline

More information

Putting together the platform: Riak, Redis, Solr and Spark. Bryan Hunt

Putting together the platform: Riak, Redis, Solr and Spark. Bryan Hunt Putting together the platform: Riak, Redis, Solr and Spark Bryan Hunt 1 $ whoami Bryan Hunt Client Services Engineer @binarytemple 2 Minimum viable product - the ideologically correct doctrine 1. Start

More information

Using PostgreSQL in Tantan - From 0 to 350bn rows in 2 years

Using PostgreSQL in Tantan - From 0 to 350bn rows in 2 years Using PostgreSQL in Tantan - From 0 to 350bn rows in 2 years Victor Blomqvist vb@viblo.se Tantan ( 探探 ) December 2, PGConf Asia 2016 in Tokyo tantanapp.com 1 Sweden - Tantan - Tokyo 10 Million 11 Million

More information

DATABASE DESIGN II - 1DL400

DATABASE DESIGN II - 1DL400 DATABASE DESIGN II - 1DL400 Fall 2016 A second course in database systems http://www.it.uu.se/research/group/udbl/kurser/dbii_ht16 Kjell Orsborn Uppsala Database Laboratory Department of Information Technology,

More information

Clustering Documents. Document Retrieval. Case Study 2: Document Retrieval

Clustering Documents. Document Retrieval. Case Study 2: Document Retrieval Case Study 2: Document Retrieval Clustering Documents Machine Learning for Big Data CSE547/STAT548, University of Washington Sham Kakade April, 2017 Sham Kakade 2017 1 Document Retrieval n Goal: Retrieve

More information

COMP Parallel Computing. Lecture 22 November 29, Datacenters and Large Scale Data Processing

COMP Parallel Computing. Lecture 22 November 29, Datacenters and Large Scale Data Processing - Parallel Computing Lecture 22 November 29, 2018 Datacenters and Large Scale Data Processing Topics Parallel memory hierarchy extend to include disk storage Google web search Large parallel application

More information

CS3600 SYSTEMS AND NETWORKS

CS3600 SYSTEMS AND NETWORKS CS3600 SYSTEMS AND NETWORKS NORTHEASTERN UNIVERSITY Lecture 11: File System Implementation Prof. Alan Mislove (amislove@ccs.neu.edu) File-System Structure File structure Logical storage unit Collection

More information

Dept. Of Computer Science, Colorado State University

Dept. Of Computer Science, Colorado State University CS 455: INTRODUCTION TO DISTRIBUTED SYSTEMS [HADOOP/HDFS] Trying to have your cake and eat it too Each phase pines for tasks with locality and their numbers on a tether Alas within a phase, you get one,

More information

Distributed Computations MapReduce. adapted from Jeff Dean s slides

Distributed Computations MapReduce. adapted from Jeff Dean s slides Distributed Computations MapReduce adapted from Jeff Dean s slides What we ve learnt so far Basic distributed systems concepts Consistency (sequential, eventual) Fault tolerance (recoverability, availability)

More information

5 Fundamental Strategies for Building a Data-centered Data Center

5 Fundamental Strategies for Building a Data-centered Data Center 5 Fundamental Strategies for Building a Data-centered Data Center June 3, 2014 Ken Krupa, Chief Field Architect Gary Vidal, Solutions Specialist Last generation Reference Data Unstructured OLTP Warehouse

More information

Ghislain Fourny. Big Data 5. Wide column stores

Ghislain Fourny. Big Data 5. Wide column stores Ghislain Fourny Big Data 5. Wide column stores Data Technology Stack User interfaces Querying Data stores Indexing Processing Validation Data models Syntax Encoding Storage 2 Where we are User interfaces

More information

Next-Generation Cloud Platform

Next-Generation Cloud Platform Next-Generation Cloud Platform Jangwoo Kim Jun 24, 2013 E-mail: jangwoo@postech.ac.kr High Performance Computing Lab Department of Computer Science & Engineering Pohang University of Science and Technology

More information

Comparing SQL and NOSQL databases

Comparing SQL and NOSQL databases COSC 6397 Big Data Analytics Data Formats (II) HBase Edgar Gabriel Spring 2014 Comparing SQL and NOSQL databases Types Development History Data Storage Model SQL One type (SQL database) with minor variations

More information

elasticsearch The Road to a Distributed, (Near) Real Time, Search Engine Shay Banon

elasticsearch The Road to a Distributed, (Near) Real Time, Search Engine Shay Banon elasticsearch The Road to a Distributed, (Near) Real Time, Search Engine Shay Banon - @kimchy Lucene Basics - Directory A File System Abstraction Mainly used to read and write files Used to read and write

More information

ADVANCED HBASE. Architecture and Schema Design GeeCON, May Lars George Director EMEA Services

ADVANCED HBASE. Architecture and Schema Design GeeCON, May Lars George Director EMEA Services ADVANCED HBASE Architecture and Schema Design GeeCON, May 2013 Lars George Director EMEA Services About Me Director EMEA Services @ Cloudera Consulting on Hadoop projects (everywhere) Apache Committer

More information

How we build TiDB. Max Liu PingCAP Amsterdam, Netherlands October 5, 2016

How we build TiDB. Max Liu PingCAP Amsterdam, Netherlands October 5, 2016 How we build TiDB Max Liu PingCAP Amsterdam, Netherlands October 5, 2016 About me Infrastructure engineer / CEO of PingCAP Working on open source projects: TiDB: https://github.com/pingcap/tidb TiKV: https://github.com/pingcap/tikv

More information

Making Non-Distributed Databases, Distributed. Ioannis Papapanagiotou, PhD Shailesh Birari

Making Non-Distributed Databases, Distributed. Ioannis Papapanagiotou, PhD Shailesh Birari Making Non-Distributed Databases, Distributed Ioannis Papapanagiotou, PhD Shailesh Birari Dynomite Ecosystem Dynomite - Proxy layer Dyno - Client Dynomite-manager - Ecosystem orchestrator Dynomite-explorer

More information

Engineering Goals. Scalability Availability. Transactional behavior Security EAI... CS530 S05

Engineering Goals. Scalability Availability. Transactional behavior Security EAI... CS530 S05 Engineering Goals Scalability Availability Transactional behavior Security EAI... Scalability How much performance can you get by adding hardware ($)? Performance perfect acceptable unacceptable Processors

More information

Facebook. The Technology Behind Messages (and more ) Kannan Muthukkaruppan Software Engineer, Facebook. March 11, 2011

Facebook. The Technology Behind Messages (and more ) Kannan Muthukkaruppan Software Engineer, Facebook. March 11, 2011 HBase @ Facebook The Technology Behind Messages (and more ) Kannan Muthukkaruppan Software Engineer, Facebook March 11, 2011 Talk Outline the new Facebook Messages, and how we got started with HBase quick

More information

A BigData Tour HDFS, Ceph and MapReduce

A BigData Tour HDFS, Ceph and MapReduce A BigData Tour HDFS, Ceph and MapReduce These slides are possible thanks to these sources Jonathan Drusi - SCInet Toronto Hadoop Tutorial, Amir Payberah - Course in Data Intensive Computing SICS; Yahoo!

More information