Open source, high performance database
July 2012

Quick introduction to MongoDB
- Data modeling in MongoDB: queries, geospatial, updates and map/reduce
- Using a location-based app as an example
- Example works in the MongoDB JS shell

MongoDB is a scalable, high-performance, open source, document-oriented database.
- Fast querying
- In-place updates
- Full index support
- Replication / high availability
- Auto-sharding
- Aggregation; Map/Reduce
- GridFS

Relational vs. MongoDB
- Better data locality
- In-memory caching
- Auto-sharding: read scaling, write scaling

"We just can't get any faster than the way MongoDB handles our data."
Tony Tam, CTO, Wordnik

MongoDB is
- Implemented in C++
- Available on Windows, Linux, Mac OS X, Solaris
Drivers are available in many languages
- 10gen supported: C, C# (.NET), C++, Erlang, Haskell, Java, JavaScript, Perl, PHP, Python, Ruby, Scala, Node.js
- Multiple community-supported drivers

RDBMS        MongoDB
Table        Collection
Row(s)       JSON Document
Index        Index
Partition    Shard
Join         Embedding/Linking
Schema       (implied Schema)

{
  _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
  author : "Asya",
  date : ISODate("2012-02-02T11:52:27.442Z"),
  text : "About MongoDB...",
  tags : [ "tech", "databases" ],
  comments : [{
    author : "Fred",
    date : ISODate("2012-02-03T17:22:21.124Z"),
    text : "Best Post Ever!"
  }],
  comment_count : 1
}

JSON has a powerful but limited set of datatypes
- Mongo extends the datatypes with Date, Int types, Id, ...
MongoDB stores data in BSON
- BSON is a binary representation of JSON
- Optimized for performance and navigational abilities
- Also compression
See: bsonspec.org

MySQL
  START TRANSACTION;
  INSERT INTO contacts VALUES (NULL, 'joeblow');
  INSERT INTO contact_emails VALUES
    (NULL, 'joe@blow.com', LAST_INSERT_ID()),
    (NULL, 'joseph@blow.com', LAST_INSERT_ID());
  COMMIT;

MongoDB
  db.contacts.save({
    username: "joeblow",
    emailaddresses: ["joe@blow.com", "joseph@blow.com"]
  });

- Native drivers for dozens of languages
- Data maps naturally to OO data structures

Want to build an app where users can
- check in to a location
- leave notes or comments about that location

"As a user I want to be able to find other locations nearby"
- Need to store locations (offices, restaurants, etc.): name, address, tags, coordinates
- User-generated content, e.g. tips / notes

"As a user I want to be able to 'checkin' to a location"
Checkins
- User should be able to 'check in' to a location
- Want to be able to generate statistics: recent checkins, popular locations

Collections
- locations: loc1, loc2, loc3
- users: user1, user2
- checkins: checkin1, checkin2

> location_1 = {
    name: "Lotus Flower",
    address: "123 University Ave",
    city: "Palo Alto",
    post_code: 94012
  }
> db.locations.ensureIndex({name: 1})
> db.locations.find({name: "Lotus Flower"})

> location_2 = {
    name: "Lotus Flower",
    address: "123 University Ave",
    city: "Palo Alto",
    post_code: 94012,
    tags: ["restaurant", "dumplings"]
  }
> db.locations.ensureIndex({tags: 1})
> db.locations.find({tags: "dumplings"})

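The tags query above matches a document when the queried value is one of the array's elements. The following is a minimal in-memory sketch of that matching rule in plain JavaScript; it is not MongoDB's implementation, and the function name `fieldMatches` and the extra sample documents are hypothetical.

```javascript
// Sketch (assumption: simplified equality matching, no operators).
// A value matches a field if the field equals it, or if the field
// is an array that contains it.
function fieldMatches(doc, field, value) {
  const actual = doc[field];
  return Array.isArray(actual) ? actual.includes(value) : actual === value;
}

const sampleLocations = [
  { name: "Lotus Flower", tags: ["restaurant", "dumplings"] },
  { name: "Book Nook", tags: ["bookstore"] },      // hypothetical
  { name: "Dumpling Cart", tags: "dumplings" }      // hypothetical, scalar tag
];

const hits = sampleLocations
  .filter(doc => fieldMatches(doc, "tags", "dumplings"))
  .map(doc => doc.name);

console.log(hits);
```

Both the array-valued and the scalar-valued documents are selected, while the bookstore is not.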
> location_3 = {
    name: "Lotus Flower",
    address: "123 University Ave",
    city: "Palo Alto",
    post_code: 94012,
    tags: ["restaurant", "dumplings"],
    lat_long: [52.5184, 13.387]
  }
> db.locations.ensureIndex({lat_long: "2d"})
> db.locations.find({lat_long: {$near: [52.53, 13.4]}})

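To make the idea behind a "nearest first" query concrete, here is a minimal in-memory sketch in plain JavaScript. It assumes simple planar distance between coordinate pairs and a linear scan; it is only an illustration, not how MongoDB's 2d index works internally. The helper names `distance` and `near` and the two extra locations are hypothetical.

```javascript
// Sketch (assumption: flat, planar distance over [x, y] pairs).
const places = [
  { name: "Lotus Flower", lat_long: [52.5184, 13.387] },
  { name: "Far Away Cafe", lat_long: [48.8566, 2.3522] },  // hypothetical
  { name: "Corner Bakery", lat_long: [52.52, 13.405] }     // hypothetical
];

// Euclidean distance between two coordinate pairs.
function distance(a, b) {
  return Math.hypot(a[0] - b[0], a[1] - b[1]);
}

// Return up to `limit` documents, closest to `point` first.
function near(docs, point, limit) {
  return docs
    .slice() // copy so the input order is left untouched
    .sort((d1, d2) => distance(d1.lat_long, point) - distance(d2.lat_long, point))
    .slice(0, limit);
}

const nearest = near(places, [52.53, 13.4], 2).map(d => d.name);
console.log(nearest);
```

A real index avoids scanning and sorting every document; the sketch only shows the ordering that the query asks for.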
// creating your indexes:
> db.locations.ensureIndex({tags: 1})
> db.locations.ensureIndex({name: 1})
> db.locations.ensureIndex({lat_long: "2d"})
// finding places:
> db.locations.find({lat_long: {$near: [52.53, 13.4]}})
// with regular expressions:
> db.locations.find({name: /^Din/})
// by tag:
> db.locations.find({tags: "dumplings"})

Atomic operators: $set, $unset, $inc, $push, $pushAll, $pull, $pullAll, $bit

// initial data load:
> db.locations.insert(location_3)
// adding a tip with update:
> db.locations.update(
    {name: "Lotus Flower"},
    {$push: {
      tips: {
        user: "Asya",
        date: "28/03/2012",
        tip: "The hairy crab dumplings are awesome!"
      }
    }})

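As a rough illustration of what three of these operators do to a document, the sketch below applies `$set`, `$inc` and `$push` to a plain JavaScript object. It is a simplified in-memory model under stated assumptions (top-level fields only, no dotted paths), not the server's implementation; `applyUpdate` and the `tip_count` field are hypothetical.

```javascript
// Sketch (assumption: top-level fields only; returns a modified copy).
function applyUpdate(doc, update) {
  const result = JSON.parse(JSON.stringify(doc)); // deep copy of plain data
  for (const [field, value] of Object.entries(update.$set || {})) {
    result[field] = value;                         // overwrite or create
  }
  for (const [field, amount] of Object.entries(update.$inc || {})) {
    result[field] = (result[field] || 0) + amount; // missing field starts at 0
  }
  for (const [field, item] of Object.entries(update.$push || {})) {
    if (result[field] === undefined) result[field] = []; // create the array
    if (!Array.isArray(result[field])) {
      throw new Error("$push target is not an array: " + field);
    }
    result[field].push(item);
  }
  return result;
}

const before = { name: "Lotus Flower", tags: ["restaurant", "dumplings"] };
const after = applyUpdate(before, {
  $push: { tips: { user: "Asya", tip: "The hairy crab dumplings are awesome!" } },
  $inc: { tip_count: 1 } // hypothetical counter field
});

console.log(after.tips.length, after.tip_count);
```

The point of the operators is that the change is described as a delta, so the client does not need to read the document, modify it and write it back.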
> db.locations.findOne()
{
  name: "Lotus Flower",
  address: "123 University Ave",
  city: "Palo Alto",
  post_code: 94012,
  tags: ["restaurant", "dumplings"],
  lat_long: [52.5184, 13.387],
  tips: [{
    user: "Asya",
    date: "28/03/2012",
    tip: "The hairy crab dumplings are awesome!"
  }]
}

"As a user I want to be able to 'checkin' to a location"
Checkins
- User should be able to 'check in' to a location
- Want to be able to generate statistics: recent checkins, popular locations

> user_1 = {
    _id: "asya@10gen.com",
    name: "Asya",
    twitter: "asya999",
    checkins: [
      {location: "Lotus Flower", ts: "28/03/2012"},
      {location: "Meridian Hotel", ts: "27/03/2012"}
    ]
  }
> db.users.ensureIndex({"checkins.location": 1})
> db.users.find({"checkins.location": "Lotus Flower"})

// find all users who've checked in here:
> db.users.find({"checkins.location": "Lotus Flower"})
// find the last 10 checkins here: - Warning!
> db.users.find({"checkins.location": "Lotus Flower"}).sort({"checkins.ts": -1}).limit(10)
Hard to query for the last 10 checkins: this returns whole user documents, not individual checkins.

> user_2 = {
    _id: "asya@10gen.com",
    name: "Asya",
    twitter: "asya999"
  }
> checkin_1 = {
    location: location_id,
    user: user_id,
    ts: "20/03/2010"
  }
> db.checkins.ensureIndex({user: 1})
> db.checkins.find({user: user_id})

// find all users who've checked in here:
> location_id = db.locations.findOne({name: "Lotus Flower"})._id
> u_ids = db.checkins.find({location: location_id}, {_id: 0, user: 1}).map(function(c) { return c.user; })
> users = db.users.find({_id: {$in: u_ids}})
// find the last 10 checkins here:
> db.checkins.find({location: location_id}).sort({ts: -1}).limit(10)
// count how many checked in today:
> db.checkins.find({location: location_id, ts: {$gt: midnight}}).count()

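Sorting and range queries on ts only behave chronologically if ts is stored as a date type. The sample documents write ts as day-first strings such as "28/03/2012"; the plain JavaScript sketch below shows how string ordering and date ordering can differ. The sample checkins and the helper `parseDayFirst` are hypothetical.

```javascript
// Sketch (assumption: timestamps written as DD/MM/YYYY strings).
const sampleCheckins = [
  { user: "u1", ts: "28/03/2012" },
  { user: "u2", ts: "01/04/2012" },
  { user: "u3", ts: "15/03/2012" }
];

// Descending sort on the raw strings compares character by character.
const byString = sampleCheckins
  .slice()
  .sort((a, b) => (a.ts < b.ts ? 1 : a.ts > b.ts ? -1 : 0))
  .map(c => c.user);

// Convert "DD/MM/YYYY" to a Date (UTC midnight) before sorting.
function parseDayFirst(s) {
  const [day, month, year] = s.split("/").map(Number);
  return new Date(Date.UTC(year, month - 1, day));
}

const byDate = sampleCheckins
  .map(c => ({ user: c.user, ts: parseDayFirst(c.ts) }))
  .sort((a, b) => b.ts - a.ts) // newest first
  .map(c => c.user);

console.log(byString); // string order
console.log(byDate);   // chronological order, newest first
```

In the string ordering the April checkin sorts last, although it is the most recent one; with Date values it comes first.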
// Find most popular locations
> agg = db.checkins.aggregate(
    {$match: {ts: {$gt: now_minus_3_hrs}}},
    {$group: {_id: "$location", numentries: {$sum: 1}}}
  )
> agg.result
[{"_id": "Lotus Flower", "numentries": 17}]

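The pipeline above first filters checkins by time and then counts them per location. A minimal in-memory equivalent in plain JavaScript might look like the sketch below; it only mirrors the logic of the two stages, and the function name `countByLocation`, the fixed clock value and the sample data are hypothetical.

```javascript
// Sketch (assumption: ts is a numeric millisecond timestamp).
const HOUR = 60 * 60 * 1000;
const nowMs = Date.UTC(2012, 6, 1, 12, 0, 0); // fixed "now" for a repeatable example

const recentCheckins = [
  { location: "Lotus Flower", ts: nowMs - 1 * HOUR },
  { location: "Lotus Flower", ts: nowMs - 2 * HOUR },
  { location: "Meridian Hotel", ts: nowMs - 0.5 * HOUR },
  { location: "Lotus Flower", ts: nowMs - 5 * HOUR }  // older than 3 hours
];

function countByLocation(docs, since) {
  const counts = new Map();
  for (const doc of docs) {
    if (doc.ts > since) {                              // the $match step
      counts.set(doc.location, (counts.get(doc.location) || 0) + 1); // the $group / $sum step
    }
  }
  return [...counts].map(([_id, numentries]) => ({ _id, numentries }));
}

const popular = countByLocation(recentCheckins, nowMs - 3 * HOUR);
console.log(popular);
```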
// Find most popular locations
> map_func = function() {
    emit(this.location, 1);
  }
> reduce_func = function(key, values) {
    return Array.sum(values);
  }
> db.checkins.mapReduce(map_func, reduce_func,
    {query: {ts: {$gt: now_minus_3_hrs}}, out: "result"})
> db.result.findOne()
{"_id": "Lotus Flower", "value": 17}

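The flow of the map and reduce functions above can be simulated in plain JavaScript. This is a single-process sketch, not the server's implementation: `simulateMapReduce` is hypothetical, `emit` is passed as an argument instead of being a global, and `Array.sum` (a shell helper) is replaced by `reduce`.

```javascript
// Sketch (assumption: everything fits in memory, one pass, no re-reduce).
function simulateMapReduce(docs, mapFn, reduceFn) {
  const groups = new Map();
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  // Map phase: call mapFn with `this` bound to each document.
  for (const doc of docs) mapFn.call(doc, emit);
  // Reduce phase: keys with a single value are passed through unreduced.
  return [...groups].map(([key, values]) => ({
    _id: key,
    value: values.length === 1 ? values[0] : reduceFn(key, values)
  }));
}

const mapFunc = function (emit) { emit(this.location, 1); };
const reduceFunc = function (key, values) {
  return values.reduce((total, v) => total + v, 0);
};

const mrResult = simulateMapReduce(
  [
    { location: "Lotus Flower" },
    { location: "Meridian Hotel" },
    { location: "Lotus Flower" }
  ],
  mapFunc,
  reduceFunc
);
console.log(mrResult);
```

Because reduce may be skipped for single-value keys, the reduce function should return a value of the same shape as the values that map emits.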
Deployment

Single server
- need a strong backup plan
- diagram: P

Replica sets
- High availability
- Automatic failover
- diagram: P S S

Sharded
- Horizontally scale
- Auto balancing
- diagram: P S S | P S S | P S S

(P = primary, S = secondary)

Use cases
- Content Management
- Operational Intelligence
- E-Commerce
- User Data Management
- High Volume Data Feeds

download at mongodb.org
conferences, appearances, and meetups: http://www.10gen.com/events
Facebook: http://bit.ly/mongofb
Twitter: @mongodb
LinkedIn: http://linkd.in/joinmongo
support, training, and this talk brought to you by