Friday, April 26, 13


2 Introduction to Map Reduce with Couchbase. Tugdual Grall. NoSQL Matters 13 - Cologne - April 25th 2013

3 About Me. Tugdual "Tug" Grall. Couchbase: Technical Evangelist. eXo: CTO. Oracle: Developer/Product Manager, mainly Java/SOA. Developer in firms. http://blog.grallandco.com tgrall. NantesJUG co-founder. Pet Project: http://

4 What's the Problem? Lots of Data: Big Data, Big Users, SaaS/Cloud Computing

5 Solution. Distribute: the data, and the processing of the data

6 Map Reduce. MapReduce is a programming model for processing large data sets, and the name of an implementation of the model by Google. MapReduce is typically used to do distributed computing on clusters of computers. http://research.google.com/archive/mapreduce.html

7 In detail. The developer specifies 2 methods:
map (in_key, in_value) -> list(out_key, intermediate_value): processes input data and produces key/value pairs.
reduce (out_key, list(intermediate_value)) -> list(out_value): combines all intermediate values for a particular key and produces a set of merged output values.

8 Execution
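The two signatures above can be sketched as a tiny single-process driver. This is only an illustration of the programming model, not how Google's implementation or Couchbase executes it. The names `mapReduce`, `wordCountMap` and `wordCountReduce` are made up for this example, and the reduce here returns one merged value per key rather than a list.

```javascript
// Minimal in-memory MapReduce driver (illustrative only, single process).
function mapReduce(inputs, map, reduce) {
  // Map phase: each (in_key, in_value) yields a list of [out_key, intermediate_value].
  const groups = new Map();
  for (const [inKey, inValue] of inputs) {
    for (const [outKey, value] of map(inKey, inValue)) {
      if (!groups.has(outKey)) groups.set(outKey, []);
      groups.get(outKey).push(value);
    }
  }
  // Reduce phase: combine all intermediate values collected for each key.
  const output = {};
  for (const [outKey, values] of groups) {
    output[outKey] = reduce(outKey, values);
  }
  return output;
}

// Hypothetical example: count words across two small "documents".
const wordCountMap = (name, text) =>
  text.split(/\s+/).filter(Boolean).map((word) => [word, 1]);
const wordCountReduce = (word, counts) => counts.reduce((a, b) => a + b, 0);

const counts = mapReduce(
  [["doc1", "big data big users"], ["doc2", "big cluster"]],
  wordCountMap,
  wordCountReduce
);
console.log(counts);
```

In a distributed setting the grouping step in the middle is where data moves between machines; here it is just a `Map` in memory.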

9 Most common use case Yahoo inc.

10 What about Couchbase?


11 Couchbase Open Source Project. Leading NoSQL database project focused on distributed database technology and surrounding ecosystem. Supports both key-value and document-oriented use cases. All components are available under the Apache License 2.0. Obtained as packaged software in both enterprise and community editions.

12 Couchbase Server Core Principles. Easy Scalability: grow the cluster without changes, with a single click. Consistent High Performance: consistent sub-millisecond reads and writes with consistent high throughput. Always On 24x365: no downtime for software upgrades, hardware maintenance, etc. Flexible Data Model: JSON document model with no fixed schema.

13 Additional Couchbase Server Features. Built-in clustering: all nodes equal. Data with auto-failover. Zero-maintenance. Built-in managed cache. Append-only storage layer. Online monitoring and admin API & UI. SDK for a variety of languages.

14 Couchbase Server 2.0 Architecture (diagram). Data Manager: Query API (port 8092), Memcapable 1.0 (port 11211), Memcapable 2.0 (port 11210), Moxi, Query Engine, Memcached, Couchbase EP Engine, storage interface, New Persistence Layer. Cluster Manager (Erlang/OTP): REST management API/Web UI (HTTP, port 8091), Erlang port mapper (port 4369), Distributed Erlang. On each node: heartbeat, process monitor, manager. One per cluster: global singleton supervisor, rebalance orchestrator, node health monitor, vbucket state and manager.

15 Couchbase Server 2.0 Architecture (same diagram, annotated). Annotations: Object-level Cache; RAM Cache, Indexing & Persistence Management (C & V8); Disk Persistence; Server/Cluster Management & Communication (Erlang). Reference: "The Unreasonable Effectiveness of C" by Damien Katz.


16 Basic Operation (diagram: two app servers, each with the Couchbase client library and a cluster map, send read/write/update requests to a Couchbase Server cluster of three servers; each server holds active docs and replica docs; user-configured replica count = 1). Docs are distributed evenly across servers. Each server stores both active and replica docs; only one server is active at a time for a given doc. The client library provides the app with a simple interface to the database. The cluster map provides the map to which server a doc is on; the app never needs to know. The app reads, writes and updates docs. Multiple app servers can access the same document at the same time.
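The cluster-map idea can be sketched in a few lines. This is a toy model under loud assumptions: the partition count, the server names and the hash function below are all invented for illustration, and the hash is not the one Couchbase uses. The point is only that the client library, not the application, turns a key into a server.

```javascript
// Hypothetical cluster map: a fixed number of partitions, each assigned to one active server.
const NUM_PARTITIONS = 8; // assumption for readability; real clusters use many more
const servers = ["server1", "server2", "server3"]; // hypothetical server names
const clusterMap = Array.from(
  { length: NUM_PARTITIONS },
  (_, partition) => servers[partition % servers.length]
);

// Toy string hash (NOT the hash Couchbase uses), just to get a stable partition number.
function partitionFor(key) {
  let hash = 0;
  for (let i = 0; i < key.length; i++) {
    hash = (hash * 31 + key.charCodeAt(i)) >>> 0;
  }
  return hash % NUM_PARTITIONS;
}

// What a client library does on the app's behalf: key -> partition -> server.
function serverFor(key) {
  return clusterMap[partitionFor(key)];
}

console.log(serverFor("my-key"));
```

Because the mapping goes through partitions instead of hashing straight to a server, growing the cluster only means reassigning partitions in the map; the keys keep their partition numbers.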

17 How to access the data?

18 Couchbase.get("my-key");

19 Look at a document. Key -> JSON object (document): { "string": "string", "string": value, "string": { "string": "string", "string": value }, "string": [ array ] }. How to find a document based on its attributes? Get employee by ..., get products by type, ... You need to look into the document/value.

20 How to? Create an index!


21 Create the index (diagram: a stack of beer documents, each with metadata and a JSON body, feeding an index of key/value rows)
Metadata: { "id": "110f37fa30", "rev": "1-000000000", "expiration": 0, "flags": 0, "type": "json" }
Document: { "name": "Aventinus", "abv": 8.2, "ibu": 0, "srm": 0, "upc": 0, "type": "beer", "brewery_id": "110f1f2012", "updated": "... 20:00:20", "description": "Dark-ruby, ... Weizenbock", "category": "German Ale" }
Index (Key, Value): Aventinus, 8.2; Avenue Ale, ...

22 Concrete Example. The map function receives the document and its metadata; as a developer, you just have to emit the key and value (K, V).

23 Map Function
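A map function of the kind described here might look like the sketch below. It assumes the index shown on the "Create the index" slide (beer name as key, `abv` as value). On the server, `emit` is provided by the view engine; the `emit` stand-in, the `indexRows` array and the sample documents are hypothetical scaffolding so the sketch can run under Node.js.

```javascript
// View map function in the shape the slides describe: it receives the document
// and its metadata and emits a key/value pair for the index.
function map(doc, meta) {
  if (doc.type === "beer" && doc.name) {
    emit(doc.name, doc.abv);
  }
}

// Local stand-in for the server-provided emit(), so the sketch runs under Node.js.
const indexRows = [];
let currentId = null;
function emit(key, value) {
  indexRows.push({ id: currentId, key: key, value: value });
}

// Hypothetical sample data modeled on the beer document shown earlier.
const docs = [
  { meta: { id: "110f37fa30" }, doc: { name: "Aventinus", abv: 8.2, type: "beer" } },
  { meta: { id: "110f1f2012" }, doc: { name: "Some Brewery", type: "brewery" } },
];
for (const { doc, meta } of docs) {
  currentId = meta.id;
  map(doc, meta);
}
console.log(indexRows);
```

Documents that do not match the `type` check emit nothing, so they simply do not appear in the index.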

24 ?startkey="b1"&endkey="zn", ?startkey="bz"&endkey="zz". Pulls the index keys within the UTF-8 range specified by the startkey and endkey.
Index (doc. -> meta.id):
abba@couchbase.com    u::1
beta@couchbase.com    u::7
jasdeep@couchbase.com u::2
math@couchbase.com    u::5
matt@couchbase.com    u::6
yeti@couchbase.com    u::4
zorro@couchbase.com   u::3

25 ?key= Match a single index key (same index as on slide 24).

26 ?keys=[ ] Query multiple keys in the set, using array notation (same index as on slide 24).
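The three query styles (range, single key, key list) amount to three ways of selecting rows from a sorted index. The sketch below models them with plain array filters over rows based on the index table shown on these slides. It is a simplification: JavaScript string comparison stands in for the collation the server applies, and the helper names `byRange`, `byKey` and `byKeys` are invented for this example.

```javascript
// Index rows sorted by key, modeled on the index table shown on these slides.
const index = [
  { key: "abba@couchbase.com", id: "u::1" },
  { key: "beta@couchbase.com", id: "u::7" },
  { key: "jasdeep@couchbase.com", id: "u::2" },
  { key: "math@couchbase.com", id: "u::5" },
  { key: "matt@couchbase.com", id: "u::6" },
  { key: "yeti@couchbase.com", id: "u::4" },
  { key: "zorro@couchbase.com", id: "u::3" },
];

// ?startkey=...&endkey=... : every row whose key falls inside the range.
function byRange(rows, startkey, endkey) {
  return rows.filter((row) => row.key >= startkey && row.key <= endkey);
}

// ?key=... : rows matching exactly one key.
function byKey(rows, key) {
  return rows.filter((row) => row.key === key);
}

// ?keys=[...] : rows matching any key in the list.
function byKeys(rows, keys) {
  return rows.filter((row) => keys.includes(row.key));
}

const ranged = byRange(index, "b1", "zn").map((row) => row.id);
const single = byKey(index, "math@couchbase.com").map((row) => row.id);
const several = byKeys(index, ["abba@couchbase.com", "zorro@couchbase.com"]).map((row) => row.id);
console.log(ranged, single, several);
```

With the range "b1" to "zn", the first row sorts before the start key and the last row sorts after the end key, so both are left out.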

27 How does it work?

28 Indexing and Querying (diagram: two app servers, each with the Couchbase client library and a cluster map, send a query to a Couchbase Server cluster of three servers holding active and replica docs; user-configured replica count = 1). Indexing work is distributed amongst nodes: large data sets are possible and the effort is parallelized. Each node has the index for the data stored on it. Queries combine the results from the required nodes.
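The "each node indexes its own data, queries combine the results" idea is a scatter-gather pattern. The sketch below illustrates it with invented data: the node names, beer names and document ids are hypothetical, and `queryCluster` is not a Couchbase API.

```javascript
// Hypothetical per-node index fragments: each node indexes only the docs it stores.
const nodeIndexes = {
  server1: [{ key: "Aventinus", id: "beer_1" }, { key: "Pale Ale", id: "beer_4" }],
  server2: [{ key: "Dubbel", id: "beer_2" }],
  server3: [{ key: "Bock", id: "beer_3" }, { key: "Witbier", id: "beer_5" }],
};

// Scatter: run the same range query against every node's local index.
// Gather: merge the partial results into one list sorted by key.
function queryCluster(indexes, startkey, endkey) {
  const partials = Object.values(indexes).map((rows) =>
    rows.filter((row) => row.key >= startkey && row.key <= endkey)
  );
  return partials
    .flat()
    .sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
}

const rows = queryCluster(nodeIndexes, "A", "P");
console.log(rows.map((row) => row.key));
```

Each node only filters its own rows, so the indexing and filtering work scales with the number of nodes; only the final merge happens in one place.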

29 Couchbase Server 2.0: Views. Views can cover a few different use cases: primary index; simple secondary indexes (the most common); complex secondary, tertiary and composite indexes; aggregation functions (reduction), for example counting the number of North American Ales; organizing related data. Views are built using Map/Reduce: the map function creates a matrix from document fields, and the reduce function summarizes (reduces) information.

30 Distributed Index Build Phase. Optimized for lookups, in-order access and aggregations. All view reads are from disk (a different performance profile). A view builds against every document on every node; this is why you should group views in a design document. Automatically kept up to date: incremental Map Reduce.

31 Dynamic Range Queries with Optional Aggregation. Efficiently fetch a row or a group of related rows. Queries use cached values from B-tree inner nodes when possible. Take advantage of in-order tree traversal with group_level queries. Example: ?startkey="J"&endkey="K" returns {"rows":[{"key":"Juneau","value":null}]} (diagram: three servers, each holding active docs and replica docs).

32 Append Only Index. Disk activity is slow; updating disk blocks is very slow. Appending new data to the end of the current file is fast, and the overhead of reverse reading is small. Because existing blocks are not re-used, this can lead to fragmentation; Couchbase will compact the index. (diagram: the view processor reads changed documents and appends to the original index file on disk)

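33 Adding a new Document (diagram: B-tree with a reduction value on each node). Leaf keys: A B C D F G H I K L M N O Q R, where M is the new key. Leaf nodes: A-C 3, D-F 2, G-H 2, I-L 3, N-R 4 (becomes M-R 5). Inner nodes: A-H 7, I-R 7 (becomes I-R 8). Root: A-R 14 (new root A-R 15). Only the nodes on the path to the new key get new reductions.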
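The tree on this slide can be modeled with a small copy-on-write structure: inserting a key creates new nodes only along the path from the root to the affected leaf, and each new node recomputes its cached count. This is a sketch under simplifying assumptions (no node splitting, counts as the only reduction, everything in memory); the helper names `leaf`, `inner`, `lastKey` and `insert` are invented here.

```javascript
// Leaf nodes hold sorted keys; inner nodes hold children. Both cache a count (the reduction).
function leaf(keys) {
  return { keys: keys, count: keys.length };
}
function inner(children) {
  return { children: children, count: children.reduce((sum, c) => sum + c.count, 0) };
}
function lastKey(node) {
  return node.keys
    ? node.keys[node.keys.length - 1]
    : lastKey(node.children[node.children.length - 1]);
}

// Copy-on-write insert: returns a new root and leaves the old tree untouched.
function insert(node, key) {
  if (node.keys) {
    return leaf([...node.keys, key].sort());
  }
  // Descend into the first child whose range can hold the key (or the last child).
  let i = node.children.findIndex((child) => lastKey(child) >= key);
  if (i === -1) i = node.children.length - 1;
  const children = node.children.slice();
  children[i] = insert(node.children[i], key);
  return inner(children); // new node, new reduction; siblings are shared as-is
}

// The tree from the slide, before the new key M is added.
const oldRoot = inner([
  inner([leaf(["A", "B", "C"]), leaf(["D", "F"]), leaf(["G", "H"])]),
  inner([leaf(["I", "K", "L"]), leaf(["N", "O", "Q", "R"])]),
]);
const newRoot = insert(oldRoot, "M");
console.log(oldRoot.count, newRoot.count);
```

The untouched A-H subtree is shared between the old and new roots, which is what makes appending a new root cheap compared with rewriting the whole index.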

34 What about Reduce? Out-of-the-box functions: _count(), _sum(), _stats(). Create your own if needed:
    function (key, values, rereduce) {
      if (rereduce) {
        // values holds partial counts from earlier reduce calls: add them up
        var result = 0;
        for (var i = 0; i < values.length; i++) {
          result += values[i];
        }
        return result;
      } else {
        // values holds the values emitted by the map function: count them
        return values.length;
      }
    }

35 Reduce Function. Takes a key and arrays of values as parameters. Written in JavaScript. Called after the map function. Used to reduce the map results to single values. Used with grouping. Can be ignored when querying, to reuse the index.
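The `rereduce` flag matters because a reduce function may be called on batches of emitted values and then again on the partial results. The sketch below runs the count function from this slide over three hypothetical batches to show why the two branches differ; the name `countReduce` and the batch data are invented for this example.

```javascript
// The custom count reduce from the slide, given a name so it can be called locally.
function countReduce(key, values, rereduce) {
  if (rereduce) {
    // values are partial counts produced by earlier reduce calls
    let result = 0;
    for (let i = 0; i < values.length; i++) {
      result += values[i];
    }
    return result;
  }
  // values are the raw values emitted by the map function
  return values.length;
}

// Hypothetical: the emitted values for one key arrive in three separate batches.
const batches = [[1, 1, 1], [1, 1], [1]];
const partialCounts = batches.map((values) => countReduce(null, values, false));
const total = countReduce(null, partialCounts, true);

// Ignoring the flag would count the partial results instead of summing them.
const wrongTotal = countReduce(null, partialCounts, false);
console.log(partialCounts, total, wrongTotal);
```

The partial counts are 3, 2 and 1; summing them in the rereduce branch gives 6, while treating them as raw values would give 3.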

36 Reduce in Action
Map() result (Key, Value):
Belgian-Style Dubbel, 1
Belgian-Style Dubbel, 1
Belgian-Style Dubbel, 1
Belgian-Style Pale Ale, 1
Belgian-Style White, 1
Belgian-Style White, 1
Reduce() with _count() result (Key, Value):
Belgian-Style Dubbel, 3
Belgian-Style Pale Ale, 1
Belgian-Style White, 2
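The grouped result on this slide can be reproduced locally. The sketch below is a plain JavaScript model of what a count reduce does when results are grouped by key, compared with a single ungrouped total; it is not server code, and `groupAndCount` is a name invented for this example.

```javascript
// Rows as emitted by the map function on the slide: [key, value].
const mapRows = [
  ["Belgian-Style Dubbel", 1],
  ["Belgian-Style Dubbel", 1],
  ["Belgian-Style Dubbel", 1],
  ["Belgian-Style Pale Ale", 1],
  ["Belgian-Style White", 1],
  ["Belgian-Style White", 1],
];

// Grouped reduce: one count per distinct key.
function groupAndCount(rows) {
  const result = new Map();
  for (const [key] of rows) {
    result.set(key, (result.get(key) || 0) + 1);
  }
  return result;
}

const grouped = groupAndCount(mapRows);
// Ungrouped reduce: a single count over all rows.
const ungrouped = mapRows.length;
console.log([...grouped.entries()], ungrouped);
```

The same index serves both answers; grouping only changes how much of it is folded into each output row.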


37 How to use it? Use the client SDK to call the view:
    View view = client.getView("beer", "by_name");
    Query query = new Query();
    query.setIncludeDocs(true)
         .setLimit(20)
         .setRangeStart(ComplexKey.of(startKey))
         .setRangeEnd(ComplexKey.of(startKey + "\uefff"));
    ViewResponse result = client.query(view, query);
    for (ViewRow row : result) { ... }

38 Demonstration
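The range from `startKey` to `startKey + "\uefff"` in the call above is a common way to express "all keys starting with this prefix" as a range query: the high code point sorts after any character likely to follow the prefix. The sketch below shows the idea with plain JavaScript string comparison, which only approximates the server's collation; `prefixRange` and the sample names other than the two beers from the slides are hypothetical.

```javascript
// Hypothetical sorted index keys (beer names).
const names = ["Aventinus", "Avenue Ale", "Bock", "avalanche"].sort();

// Prefix search expressed as a range: [prefix, prefix + "\uefff"].
function prefixRange(sortedKeys, prefix) {
  const start = prefix;
  const end = prefix + "\uefff"; // sorts after any ordinary continuation of the prefix
  return sortedKeys.filter((key) => key >= start && key <= end);
}

const matches = prefixRange(names, "Ave");
console.log(matches);
```

Keys that merely sort nearby, such as "Bock" or the lowercase "avalanche", fall outside the range, so only the keys sharing the prefix are returned.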

39 Hadoop & Couchbase. Hadoop: deals with Big Data; more is better than faster; batch oriented; usually used to extract/transform data; fully distributed (map, shuffle, reduce). Couchbase: distributed, executed where the document is; deals with indexing data; as fast as possible; used to query the data in the database.

40 Map Reduce in Couchbase. As in many other NoSQL databases, it is used for queries! Indexes are distributed on each node of the cluster. Indexes are updated incrementally. Write your Map Reduce in JavaScript.

41 Thank you! Get Couchbase Server at http://
