MongoDB 2.2 and Big Data

Size: px

Start display at page:

Download "MongoDB 2.2 and Big Data"

Barry Daniels
5 years ago
Views:

1 MongoDB 2.2 and Big Data Christian Kvalheim Team Lead Engineering, christiankvalheim.com

3 From here...

4 ...to here...

5 ...without buying several of these

6 Warning! This is a technical talk But MongoDB is very simple!

7 Solving real world data problems with MongoDB Effective schema design for scaling Linking versus embedding Bucketing Time series Implications of sharding keys with alternatives Read scaling through replication Challenges of eventual consistency

8 Since the dawn of the RDBMS Main memory Intel 1103, 1k bits 4GB of RAM costs $25.99 Mass storage IBM 3330 Model 1, 100 MB 3TB Superspeed USB for $129 Microprocessor Nearly 4004 being developed; 4 bits and 92,000 instructions per second Westmere EX has 10 cores, 30MB L3 cache, runs at 2.4GHz

9 More recent changes A decade ago Now Faster Buy a bigger server Buy more servers Faster storage A SAN with more spindles SSD More reliable storage More expensive SAN More copies of local storage Deployed in Your data center The cloud private or public Large data set Millions of rows Billions to trillions of rows Development Waterfall Iterative

10 What MongoDB solves Agility Applications store complex data that is easier to model as documents Schemaless DB enables faster development cycles Flexibility Relaxed transactional semantics enable easy scale out Auto Sharding for scale down and scale up Cost Cost effective operationalize abundant data (clickstreams, logs, tweets,...)

11 Design Goal of MongoDB scalability & performance memcached key/value RDBMS depth of functionality

12 Effective Schema Design

13 Design Schema for Twitter Model each users activity stream Users Name, address, display name Tweets Text Who Timestamp

14 Solution A Two Collections // users - one doc per user { _id: "alvin", "alvin@10gen.com", display: "jonnyeight" } // tweets - one doc per user per tweet { user: "bob", tweet: " ", text: "Best Tweet Ever!", ts: ISODate(" T09:56:06.298Z") }

15 Solution B Embedded Tweets // users - one doc per user with all tweets { _id: "alvin", "alvin@10gen.com", display: "jonnyeight", tweets: [! {!! user: "bob",!! tweet: " ",!! text: "Best Tweet Ever!", ts: ISODate(" T09:56:06.298Z")! } ] }

16 Embedding Great for read performance One seek to load entire object One roundtrip to database Writes can be slow if adding to objects all the time

17 Linking or Embedding? Linking can make some queries easy // Find latest 50 tweets for "alvin" > db.tweets.find( { _id:"alvin"} ).sort( {ts:-1} ).limit(10) But what effect does this have on the systems?

18 Collection 1 Index 1

19 Collection 1 Virtual Address Space 1 Index 1 This is your virtual memory size (mapped)

20 Collection 1 Virtual Address Space 1 Physical RAM Index 1 This is your resident memory size

21 Collection 1 Virtual Address Space 1 Disk Physical RAM Index 1

22 Collection 1 Virtual Address Space 1 Disk Physical RAM Index 1 = = 100 ns 10,000,000 ns

23 Collection 1 Virtual Address Space 1 Disk Physical RAM Index db.tweets.find( { _id:"alvin"} ).sort( {ts:-1} ).limit(10) 3 Linking = Many Random Read

24 Collection 1 Virtual Address Space 1 Disk Physical RAM Index 1 db.tweets.find( { _id:"alvin"} ) 1 Embedding = Large Sequential Read

25 Problems Large sequential reads Good: Disks are great at Sequential reads Bad: Read too much data Many Random reads Good: Easy of query Bad: Disks are poor at Random reads (SSD?)

26 Solution C Buckets // tweets : one doc per user per day > db.tweets.findone() { _id: "alvin-2011/12/09", "alvin@10gen.com", tweets: [! { user: "Bob",! tweet: " ",! text: "Best Tweet Ever!" },! { author: "Joe",! date: "May ",! text: "Stuck in traffic (again)" } ]! }

27 Solution C Last 10 Tweets db.tweets.find( { _id: "alvin-2011/12/09" }, { tweets: { $slice : 10 } } ).sort( { _id: -1 } ).limit(1)

28 Collection 1 Virtual Address Space 1 Disk Physical RAM Index 1 db.tweets.find( { _id: "alvin-2011/12/09" }, { tweets: { $slice : 10 } } ).sort( { _id: -1 } ).limit(1) 1 Bucket = Small Sequential Read

29 Sharding

30 Sharding - Goals Data location transparent to your code Data distribution is automatic Data re-distribution is automatic Aggregate system resources horizontally No code changes

31 Sharding - Range distribution sh.shardcollection("test.tweets", {_id: 1}, false) shard01 shard02 shard03

32 Sharding - Range distribution shard01 shard02 shard03 a-i j-r s-z

33 Sharding - Splits shard01 shard02 shard03 a-i ja-jz s-z k-r

34 Sharding - Splits shard01 shard02 shard03 a-i ja-ji s-z ji-js js-jw jz-r

35 Sharding - Auto Balancing shard01 shard02 shard03 a-i ja-ji s-z ji-js js-jw js-jw jz-r jz-r

36 Sharding - Auto Balancing shard01 shard02 shard03 a-i ja-ji s-z ji-js js-jw jz-r

37 How does sharding effect Schema Design? Sharding key choice Access patterns (query versus write)

38 Sharding Key { photo_id :????, data : <binary> } What s the right key? auto increment MD5( data ) month() + MD5( data )

39 Right balanced access Only have to keep small portion in ram Right shard "hot" Time Based ObjectId Auto Increment

40 Random access Have to keep entire index in ram All shards "warm" Hash

41 Segmented access Have to keep some index in ram Some shards "warm" Month + Hash

42 Shard Key Example { _id : "alvin", // shard key "alvin@10gen.com", display: "jonnyeight" li: "alvin.j.richards", tweets: [... ] } Shard on { _id : 1 } Lookup by _id routed to 1 node Index on { 1 }

43 Sharding - Routed Query find({_id: "alvin"}) shard01 shard02 shard03 a-i ja-ji s-z ji-js js-jw jz-r

44 Sharding - Routed Query find({_id: "alvin"}) shard01 shard02 shard03 a-i ja-ji s-z ji-js js-jw jz-r

45 Sharding - Scatter Gather find({ "alvin@10gen.com"}) shard01 shard02 shard03 a-i ja-ji s-z ji-js js-jw jz-r

46 Sharding - Scatter Gather find({ "alvin@10gen.com"}) shard01 shard02 shard03 a-i ja-ji s-z ji-js js-jw jz-r

47 Multiple Identities User can have multiple identities twitter name address etc. What is the best sharding key & schema design?

48 Solution A Multiple Identities { _id: "alvin", display: "jonnyeight", "alvin@10gen.com", li: "alvin.j.richards", // linkedin tweets: [... ] } Shard on { _id: 1 } Lookup by _id routed to 1 node Lookup by li or is scatter gather Cannot create a unique index on li or

49 Solution B Multiple Identities identities { type: "_id", val: "alvin", info: " "} { type: "em", val: "alvin@10gen.com", info: " "} { type: "li", val: "alvin.j.richards",info: " "} tweets { _id: " ", tweets : [... ] } Shard identities on { type : 1, val : 1 } Lookup by type & val routed to 1 node Can create unique index on type & val Shard info on { _id: 1 } Lookup info on _id routed to 1 node

50 Sharding - Routed Query shard01 shard02 shard03 type: em val: a-q "Min"- "1100" type: li val: s-z type: em val: r-z type: li val: d-r "1100"- "1200" type: _id val: a-z "1200"- "Max" type: li val: a-c

51 Sharding - Routed Query find({type: "em", val: "alvin@10gen.com}) shard01 shard02 shard03 type: em val: a-q "Min"- "1100" type: li val: s-z type: em val: r-z type: li val: d-r "1100"- "1200" type: _id val: a-z "1200"- "Max" type: li val: a-c

52 Sharding - Routed Query find({type: "em", val: "alvin@10gen.com}) find({_id: " "}) shard01 shard02 shard03 type: em val: a-q "Min"- "1100" type: li val: s-z type: em val: r-z type: li val: d-r "1100"- "1200" type: _id val: a-z "1200"- "Max" type: li val: a-c

53 Sharding - Caching 96 GB Mem shard01 a-i 300 GB Data j-r s-z 3:1 Data/Mem

54 Sharding - Caching 96 GB Mem 96 GB Mem 96 GB Mem shard01 shard02 shard03 a-i j-r s-z 300 GB Data 1:1 Data/Mem 1:1 Data/Mem 1:1 Data/Mem

55 Auto Sharding - Summary Fully consistent Application code unaware of data location Zero code changes Shard by Compound Key, Tag, Hash (2.4) Add capacity On-line When needed Zero downtime

56 Replica Sets Data Protection Multiple copies of the data Spread across Data Centers, AZs High Availability Automated Failover Automated Recovery

57 Replica Sets App Write Read Primary Asynchronous Replication Read Secondary Read Secondary

58 Replica Sets App Write Read Read Primary Secondary Read Secondary

59 Replica Sets App Primary Write Read Read Primary Secondary Automatic Election of new Primary

60 Replica Sets App Recovering Write Read Read Primary Secondary New primary serves data

61 Replica Sets App Read Write Read Read Secondary Primary Secondary

62 Replica Sets - Summary Data Protection High Availability Scaling eventual consistent reads Source to feed other systems Backups Indexes (Solr etc.)

63 Types of Durability with MongoDB Fire and forget Wait for error Wait for fsync Wait for journal sync Wait for replication

64 Durability Summary Memory Journal Secondary Other Data Center RDBMS Default "Fire & Forget" w=1 w=1 j=true w="majority" w=n w="mytag" Less Durability More Durability

65 Eventual Consistency

66 Understanding Eventual Consistency Application Insert Read Update Primary v1 v2 Time Replicated Replicated Secondary v1 v2

67 Understanding Eventual Consistency Does not exist Application Insert Read Update Reads v1 Primary v1 v2 Reads Replicated Replicated v1 Secondary Application #2 v1 Read Read Read Read v2 Reads v2

68 Mongo 2.2 Big Data support

69 Time to live collections TTL Features set a time to live for documents in a collection automatic removal of documents without big batch purges Example db.log.events.ensureindex( { "status": 1 }, { expireafterseconds: 3600 } )

70 Aggregation framework Declarative No JavaScript required C++ implementation Higher performance than JavaScript Expression evaluation Return computed values Framework: we can add new operations easily

71 Aggregation framework Series of operations Members of a collection are passed through a pipeline to produce a result

72 Pipeline Operations $match Uses a query predicate (like.find({ })) as a filter $project Uses a sample document to determine the shape of the result (similar to.find() s optional argument) This can include computed values $unwind Hands out array elements one at a time $group Aggregates items into buckets defined by a key

73 Pipeline Operations (continued) $sort Sort documents $limit Only allow the specified number of documents to pass $skip Skip over the specified number of documents

74 Example: finding popular checkins checkin_1 = { location: location_id, user: user_id, ts: "20/03/2010" } // Find most popular locations in last 3 hours > agg = db.checkins.aggregate( {$match: {ts: {$gt: now_minus_3_hrs}}}, {$group: {_id: "$location", numentries: {$sum: 1}}} {$sort: {numentries : -1 }} ) > agg.result [{"_id": "Din Tai Fung", "numentries" : 17},...]

75 Tag aware sharding

76 Tag aware sharding // A document we need to store in the uk { _id: "...", k: {c: jp, uname: jhilman } } // Define a shard tag range where documents containing the field c = jp are stored in servers tagged with JP > sh.addtagrange( mydb.users, {k: {c: jp }}, {{c: jp }}, JP )

77 As your data processing needs increase you will want to use a tool designed for the job

78 Hadoop Map Reduce InputFormat Map (k 1, v 1, ctx) Runs on same thread as map Many map operations ctx.write(k 2,v 2 ) Combiner(k 2,values 2 ) 1 at time per input split similar to Mongo's group same as Mongo's emit Partitioner(k 2 ) Sort(keys 2 ) k 2, v 3 similar to Mongo's reducer similar to Mongo's Finalize Reducer threads Output Format Reduce(k 3,values 4 ) k f,v f Runs once per key

MongoDB & Hadoop MongoDB same as Mongo's shard chunks (64mb) Many map operations 1 at time per input split single server or sharded cluster Creates a list of Input Splits (InputFormat) each split

79 MongoDB & Hadoop MongoDB same as Mongo's shard chunks (64mb) Many map operations 1 at time per input split single server or sharded cluster Creates a list of Input Splits (InputFormat) each split each split each split RecordReader Map (k Map (k 1, 1 v, 1, v 1 ctx), ctx) Map (k 1, v 1, ctx) ctx.write(k 2,v 2 ctx.write(k 2 ),v 2 ctx.write(k 2 ),v 2 ) Runs on same thread as map Combiner(k 2,values 2 ) Combiner(k Combiner(k 2,values 2,values 2 ) 2 ) k 2, k 2 v 3, k 2 v 3, v 3 Partitioner(k 2 ) Partitioner(k 2 ) Partitioner(k 2 ) Sort(keys 2 ) Sort(k 2 ) Sort(k 2 ) MongoDB Reducer threads Output Format Reduce(k 2,values 3 ) k f,v f Runs once per key

80 Product & Roadmap

81 The Evolution of MongoDB March 11 Sept 11 Aug 12 winter 12 Journaling Sharding and Replica set enhancements Spherical geo search Index enhancements to improve size and performance Authentication with sharded clusters Replica Set Enhancements Concurrency improvements Aggregation Framework Multi-Data Center Deployments Improved Performance and Concurrency

82 2.4 Roadmap Must Kerberos integration LDAP/AD integration Nice To Have (not 100% yet) Full Text Search Hash Shard Key Background Index Build on Secondaries V8 for Map/Reduce Geo: intersecting polygons, Geo shard key

83 Morning With MongoDB Barcelona, Martim Museum 18th October Free Technical Overview, Roadmap, Common Use Cases, Customer Projects

Scaling for Humongous amounts of data with MongoDB

Scaling for Humongous amounts of data with MongoDB Alvin Richards Technical Director, EMEA alvin@10gen.com @jonnyeight alvinonmongodb.com From here... http://bit.ly/ot71m4 ...to here... http://bit.ly/oxcsis