Friday, April 26, 13


Introduction to Map Reduce with Couchbase. Tugdual Grall / @tgrall. NoSQL Matters 13 - Cologne - April 25th 2013

About Me: Tugdual "Tug" Grall
- Couchbase: Technical Evangelist
- eXo: CTO
- Oracle: Developer/Product Manager
- Mainly Java/SOA
- Developer in consulting firms
- Web: @tgrall, http://blog.grallandco.com, tgrall
- NantesJUG co-founder
- Pet project: http://www.resultri.com

What's the Problem?
- Lots of Data
- Big Data
- Big Users
- SaaS/Cloud Computing

Solution: distribute
- the data
- the processing of the data

Map Reduce
MapReduce is a programming model for processing large data sets, and the name of an implementation of the model by Google. MapReduce is typically used to do distributed computing on clusters of computers.
http://research.google.com/archive/mapreduce.html

In detail
The developer specifies 2 methods:
map (in_key, in_value) -> list(out_key, intermediate_value)
- Processes input data
- Produces key/value pairs
reduce (out_key, list(intermediate_value)) -> list(out_value)
- Combines all intermediate values for a particular key
- Produces a set of merged output values
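
The two signatures above can be exercised in a few lines of plain JavaScript. This is a minimal in-memory sketch, not Google's or Couchbase's implementation; `mapReduce`, `wordMap` and `wordReduce` are hypothetical names chosen for illustration, and the grouping step runs on a single machine rather than a cluster.

```javascript
// Minimal in-memory sketch of the MapReduce model (illustrative only).
// map(inKey, inValue)      -> list of [outKey, intermediateValue]
// reduce(outKey, values[]) -> merged output value
function mapReduce(inputs, map, reduce) {
  const groups = new Map();
  // Map phase: every input record produces zero or more key/value pairs.
  for (const [inKey, inValue] of inputs) {
    for (const [outKey, value] of map(inKey, inValue)) {
      if (!groups.has(outKey)) groups.set(outKey, []);
      groups.get(outKey).push(value);
    }
  }
  // Reduce phase: all intermediate values for one key are combined.
  const output = {};
  for (const [outKey, values] of groups) {
    output[outKey] = reduce(outKey, values);
  }
  return output;
}

// Hypothetical word-count job used to exercise the sketch.
function wordMap(docId, text) {
  return text.toLowerCase().split(/\s+/).filter(Boolean).map((w) => [w, 1]);
}
function wordReduce(word, counts) {
  return counts.reduce((a, b) => a + b, 0);
}

const wordCounts = mapReduce(
  [["d1", "big data big users"], ["d2", "big cluster"]],
  wordMap,
  wordReduce
);
console.log(wordCounts);
```

In a distributed setting the map calls and the per-key reduce calls would run on different machines; the grouping loop here stands in for that shuffle step.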

Execution

Most common use case Yahoo inc.

What about Couchbase?

Couchbase Open Source Project
- Leading NoSQL database project focused on distributed database technology and surrounding ecosystem
- Supports both key-value and document-oriented use cases
- All components are available under the Apache 2.0 Public License
- Obtained as packaged software in both enterprise and community editions

Couchbase Server Core Principles
- Easy Scalability: grow the cluster without application changes, without downtime, with a single click
- Consistent High Performance: consistent sub-millisecond read and write response times with consistent high throughput
- Always On 24x365: no downtime for software upgrades, hardware maintenance, etc.
- Flexible Data Model: JSON document model with no fixed schema

Additional Couchbase Server Features
- Built-in clustering: all nodes equal
- Data replication with auto-failover
- Zero-downtime maintenance
- Built-in managed cache
- Append-only storage layer
- Online compaction
- Monitoring and admin API & UI
- SDK for a variety of languages

Couchbase Server 2.0 Architecture
[Diagram]
- Ports: 8092 Query API; 11211 Memcapable 1.0; 11210 Memcapable 2.0
- Data Manager: Moxi, Query Engine, Memcached, Couchbase EP Engine, storage interface, New Persistence Layer
- Cluster Manager (Erlang/OTP): REST management API/Web UI, Heartbeat, Process monitor, Configuration manager, Global singleton supervisor, Rebalance orchestrator, Node health monitor, vbucket state and replication manager
- Some components run on each node, others once per cluster; http on each node
- Ports: HTTP 8091; Erlang port mapper 4369; Distributed Erlang 21100-21199

Couchbase Server 2.0 Architecture
[Diagram, annotated]
- Ports: 8092 Query API; 11211 Memcapable 1.0; 11210 Memcapable 2.0
- Components: Moxi, Query Engine, Object-level Cache, Couchbase EP Engine, storage interface, New Persistence Layer
- Annotations: RAM Cache, Indexing & Persistence Management (C & V8); Disk Persistence; Server/Cluster Management & Communication (Erlang)
- Erlang/OTP: REST management API/Web UI, Heartbeat, Process monitor, Configuration manager, Global singleton supervisor, Rebalance orchestrator, Node health monitor, vbucket state and replication manager
- Some components run on each node, others once per cluster; http on each node
- Ports: HTTP 8091; Erlang port mapper 4369; Distributed Erlang 21100-21199
- Reference: "The Unreasonable Effectiveness of C" by Damien Katz

Basic Operation
[Diagram: App Server 1 and App Server 2, each with a Couchbase client library and cluster map, send READ/WRITE/UPDATE to a Couchbase Server cluster of Servers 1-3; each server holds active docs and replica docs. User Configured Replica Count = 1]
- Docs distributed evenly across servers
- Each server stores both active and replica docs; only one server is active for a doc at a time
- Client library provides app with simple interface to database
- Cluster map provides map to which server doc is on; app never needs to know
- App reads, writes, updates docs
- Multiple app servers can access same document at same time
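
The client-side routing described on this slide can be sketched as a hash of the key followed by a lookup in the cluster map. The hash function (`simpleHash`), the partition count (8) and the map layout below are assumptions made for illustration; they are not Couchbase's actual algorithm or map format.

```javascript
// Sketch of client-side routing with a cluster map (illustrative only).
// ASSUMPTIONS: the hash function, the partition count and the map layout
// below are hypothetical; the real client library uses its own scheme.
const clusterMap = {
  servers: ["server1", "server2", "server3"],
  // partitions[p] = { active, replica }, both indexes into servers
  partitions: Array.from({ length: 8 }, (_, p) => ({
    active: p % 3,
    replica: (p + 1) % 3,
  })),
};

// Simple deterministic string hash (not the hash Couchbase uses).
function simpleHash(key) {
  let h = 0;
  for (let i = 0; i < key.length; i++) {
    h = (h * 31 + key.charCodeAt(i)) >>> 0;
  }
  return h;
}

// Resolve which server holds the active copy and which holds the replica.
function locate(key, map) {
  const partition = simpleHash(key) % map.partitions.length;
  const { active, replica } = map.partitions[partition];
  return {
    partition,
    active: map.servers[active],
    replica: map.servers[replica],
  };
}

const where = locate("my-key", clusterMap);
console.log(where);
```

Because every client computes the same partition for the same key, the application only ever calls get/set on the client library and never names a server.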

How to access the data?

Couchbase.get("my-key");

Look at a document
Key -> JSON object (document):
    {
      "string": "string",
      "string": value,
      "string": { "string": "string", "string": value },
      "string": [ array ]
    }
How to find a document based on its attributes?
- get employee by email
- get products by type
- ...
You need to look into the document/value

How to? Create an index!

Create the index
[Diagram: a stack of documents, each with metadata and a JSON body, feeding an index]
Document metadata:
    { "id": "110f37fa30", "rev": "1-000000000", "expiration": 0, "flags": 0, "type": "json" }
Document body:
    { "name": "Aventinus", "abv": 8.2, "ibu": 0, "srm": 0, "upc": 0, "type": "beer", "brewery_id": "110f1f2012", "updated": "2010-07-22 20:00:20", "description": "Dark-ruby, ... Weizenbock", "category": "German Ale" }
Index:
    Key          Value
    Aventinus    8.2
    Avenue Ale   4.1
    ...          ...

Concrete Example
This map function:
- receives the document and metadata
- as a developer, you just have to emit the key and value (K,V)

Map Function
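
The function shown on this slide is not part of the transcript. As a stand-in, here is a minimal sketch of a map function that would produce a name-to-abv index for beer documents, together with a tiny local harness. The `emit` stub, the name `beerByName` and the sample documents (other than the Aventinus fields) are assumptions for illustration.

```javascript
// Sketch of a view map function plus a tiny local harness.
// ASSUMPTION: this is not the slide's own function; it is one possible
// function that indexes beer documents by name with abv as the value.
const indexRows = [];

// Stand-in for the emit() that the server provides to map functions.
function emit(key, value) {
  indexRows.push({ key, value });
}

// The map function receives the document and its metadata.
function beerByName(doc, meta) {
  if (doc.type === "beer" && doc.name) {
    emit(doc.name, doc.abv);
  }
}

// Hypothetical sample data, only to exercise the function locally.
const sample = [
  { meta: { id: "110f37fa30" }, doc: { type: "beer", name: "Aventinus", abv: 8.2 } },
  { meta: { id: "b2" }, doc: { type: "beer", name: "Avenue Ale", abv: 4.1 } },
  { meta: { id: "br1" }, doc: { type: "brewery", name: "Some Brewery" } },
];
for (const { doc, meta } of sample) beerByName(doc, meta);
console.log(indexRows);
```

In a view definition the function is written anonymously, as `function (doc, meta) { ... }`, and `emit` is supplied by the server.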

?startkey="b1"&endkey="zz"
?startkey="bz"&endkey="zn"
Pulls the index keys in the UTF-8 range specified by the startkey and endkey.
    doc.email                meta.id
    abba@couchbase.com       u::1
    beta@couchbase.com       u::7
    jasdeep@couchbase.com    u::2
    math@couchbase.com       u::5
    matt@couchbase.com       u::6
    yeti@couchbase.com       u::4
    zorro@couchbase.com      u::3

?key="math@couchbase.com"
Matches a single index key.
    doc.email                meta.id
    abba@couchbase.com       u::1
    beta@couchbase.com       u::7
    jasdeep@couchbase.com    u::2
    math@couchbase.com       u::5
    matt@couchbase.com       u::6
    yeti@couchbase.com       u::4
    zorro@couchbase.com      u::3

?keys=["math@couchbase.com", "yeti@couchbase.com"]
Queries multiple keys in the set (array notation).
    doc.email                meta.id
    abba@couchbase.com       u::1
    beta@couchbase.com       u::7
    jasdeep@couchbase.com    u::2
    math@couchbase.com       u::5
    matt@couchbase.com       u::6
    yeti@couchbase.com       u::4
    zorro@couchbase.com      u::3
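
The three query styles on these slides (range, single key, list of keys) can be mimicked against a sorted in-memory array. This is only a sketch of the semantics: the helper names `byRange`, `byKey` and `byKeys` are hypothetical, and plain JavaScript string comparison is assumed to stand in for the server's key ordering.

```javascript
// In-memory sketch of the three query styles against a sorted index.
// ASSUMPTION: plain string comparison stands in for the server's ordering;
// the rows mirror the email index on the slides.
const emailIndex = [
  { key: "abba@couchbase.com", id: "u::1" },
  { key: "beta@couchbase.com", id: "u::7" },
  { key: "jasdeep@couchbase.com", id: "u::2" },
  { key: "math@couchbase.com", id: "u::5" },
  { key: "matt@couchbase.com", id: "u::6" },
  { key: "yeti@couchbase.com", id: "u::4" },
  { key: "zorro@couchbase.com", id: "u::3" },
];

// ?startkey=...&endkey=... : every row whose key falls inside the range.
function byRange(index, startkey, endkey) {
  return index.filter((row) => row.key >= startkey && row.key <= endkey);
}

// ?key=... : rows whose key matches exactly.
function byKey(index, key) {
  return index.filter((row) => row.key === key);
}

// ?keys=[...] : exact matches for each key, in the order requested.
function byKeys(index, keys) {
  return keys.flatMap((key) => byKey(index, key));
}

console.log(byRange(emailIndex, "b", "m").map((row) => row.id));
console.log(byKey(emailIndex, "math@couchbase.com").map((row) => row.id));
console.log(
  byKeys(emailIndex, ["math@couchbase.com", "yeti@couchbase.com"]).map((row) => row.id)
);
```

A real index would answer the range query by walking the B-tree between the two keys rather than scanning every row.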

How does it work?

Indexing and Querying
[Diagram: App Server 1 and App Server 2, each with a Couchbase client library and cluster map, send a query to a Couchbase Server cluster of Servers 1-3; each server holds active docs and replica docs. User Configured Replica Count = 1]
- Indexing work is distributed amongst nodes
- Large data set possible
- Parallelize the effort
- Each node has index for data stored on it
- Queries combine the results from required nodes
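
The last two points (a local index per node, results combined at query time) amount to a scatter/gather pattern. Below is a minimal sketch under stated assumptions: three hypothetical nodes, shortened keys, and a merge done with a plain sort; it does not reflect how the server actually merges results.

```javascript
// Sketch of scatter/gather over per-node indexes (illustrative only).
// ASSUMPTION: three hypothetical nodes, each holding a sorted local index
// for the documents stored on it.
const nodeIndexes = {
  server1: [{ key: "abba", id: "u::1" }, { key: "math", id: "u::5" }],
  server2: [{ key: "beta", id: "u::7" }, { key: "zorro", id: "u::3" }],
  server3: [{ key: "jasdeep", id: "u::2" }],
};

// Each node answers the range query from its own index.
function queryNode(index, startkey, endkey) {
  return index.filter((row) => row.key >= startkey && row.key <= endkey);
}

// The query gathers the partial results and merges them into key order.
function queryCluster(nodes, startkey, endkey) {
  return Object.values(nodes)
    .flatMap((index) => queryNode(index, startkey, endkey))
    .sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
}

const merged = queryCluster(nodeIndexes, "b", "n");
console.log(merged.map((row) => row.key));
```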

Couchbase Server 2.0: Views
Views can cover a few different use cases:
- Primary index
- Simple secondary indexes (the most common)
- Complex secondary, tertiary and composite indexes
- Aggregation functions (reduction). Example: count the number of North American Ales
- Organizing related data
Built using Map/Reduce:
- Map function creates a matrix from document fields
- Reduce function summarizes (reduces) information

Distributed Index Build Phase
- Optimized for lookups, in-order access and aggregations
- All view reads are from disk (different performance profile)
- View builds run against every document on every node. This is why you should group them in a design document
- Automatically kept up to date: incremental Map Reduce

Dynamic Range Queries with Optional Aggregation
- Efficiently fetch a row or group of related rows
- Queries use cached values from B-tree inner nodes when possible
- Take advantage of in-order tree traversal with group_level queries
?startkey="J"&endkey="K"
{"rows":[{"key":"Juneau","value":null}]}
[Diagram: Servers 1-3, each holding active docs and replica docs]

Append-Only Index
- Disk activity is slow
- Updating disk blocks is very slow
- Appending new data to the end of the current file is fast
- Overhead of reverse reading is small
- Because existing blocks are not re-used, this can lead to fragmentation
- Couchbase will compact the index automatically
[Diagram: the View Processor writes changed documents to disk by appending them after the original data]
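
The trade-off on this slide (fast appends, stale blocks left behind, periodic compaction) can be illustrated with a flat append-only log. This is a deliberately simplified sketch: the class name `AppendOnlyLog` is hypothetical, an in-memory array stands in for the file on disk, and the real index is a tree rather than a flat log.

```javascript
// Sketch of an append-only store with compaction (illustrative only).
// ASSUMPTION: an in-memory array stands in for the index file on disk.
class AppendOnlyLog {
  constructor() {
    this.entries = []; // the "file": records are only ever appended
    this.latest = new Map(); // key -> position of the newest record
  }
  set(key, value) {
    this.entries.push({ key, value });
    this.latest.set(key, this.entries.length - 1);
  }
  get(key) {
    const pos = this.latest.get(key);
    return pos === undefined ? undefined : this.entries[pos].value;
  }
  // Share of records that are stale, i.e. superseded by a later append.
  fragmentation() {
    if (this.entries.length === 0) return 0;
    return 1 - this.latest.size / this.entries.length;
  }
  // Compaction rewrites only the live records into a fresh "file".
  compact() {
    const live = [...this.latest.values()]
      .sort((a, b) => a - b)
      .map((pos) => this.entries[pos]);
    this.entries = [];
    this.latest = new Map();
    for (const { key, value } of live) this.set(key, value);
  }
}

const log = new AppendOnlyLog();
log.set("Aventinus", 8.2);
log.set("Avenue Ale", 4.0);
log.set("Avenue Ale", 4.1); // update: the old record stays, a new one is appended
console.log(log.entries.length, log.fragmentation());
```

Calling `log.compact()` afterwards drops the superseded record while `log.get("Avenue Ale")` keeps returning the newest value.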

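Adding a new Document
[Diagram: B-tree index with a reduction (count) stored on each node. Leaf keys: A B C D F G H I K L M N O Q R. Before: root A-R 14; inner nodes A-H 7 and I-R 7; leaf nodes A-C 3, D-F 2, G-H 2, I-L 3, N-R 4. Adding the new key M appends a new leaf node M-R 5, a new inner node I-R 8 and a new root A-R 15, each with its new reduction]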

What about Reduce?
Out of the box functions: _count() _sum() _stats()
Create your own if needed:
    function(key, values, rereduce) {
      if (rereduce) {
        var result = 0;
        for (var i = 0; i < values.length; i++) {
          result += values[i];
        }
        return result;
      } else {
        return values.length;
      }
    }
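
To see why the custom function needs the `rereduce` branch, the sketch below runs the same counting logic in two stages: first over chunks of raw map values, then over the partial counts. The function name `countReduce` and the chunking are assumptions for illustration; on a real cluster the server decides how values are split across reduce calls.

```javascript
// Local harness showing how the rereduce flag is used (illustrative only).
// ASSUMPTION: the chunking below is hypothetical; the server decides how
// values are split across reduce calls.
function countReduce(key, values, rereduce) {
  if (rereduce) {
    // values are partial counts from earlier reduce calls: add them up
    let result = 0;
    for (let i = 0; i < values.length; i++) {
      result += values[i];
    }
    return result;
  }
  // values are raw values emitted by map: count them
  return values.length;
}

const emitted = [1, 1, 1, 1, 1]; // five map() results for one key
const chunks = [emitted.slice(0, 2), emitted.slice(2)];
const partials = chunks.map((chunk) => countReduce(null, chunk, false));
const total = countReduce(null, partials, true);
console.log(partials, total);
```

Without the `rereduce` branch the second stage would return the number of partial results (2) instead of the number of rows (5).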

Reduce Function
- Key and arrays of values as parameters
- Written in JavaScript
- Called after the map function
- Used to reduce the result of a map of single values
- Used with grouping
- Could be ignored when querying: reuse the index

Reduce in Action
Map() result:
    Key                       Value
    Belgian-Style Dubbel      1
    Belgian-Style Dubbel      1
    Belgian-Style Dubbel      1
    Belgian-Style Pale Ale    1
    Belgian-Style White       1
    Belgian-Style White       1
    ...                       ...
Reduce() _count() result:
    Key                       Value
    Belgian-Style Dubbel      3
    Belgian-Style Pale Ale    1
    Belgian-Style White       2
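
The two tables above can be reproduced locally by grouping the map rows by key and counting each group. This is a sketch of the grouping semantics only; `groupCount` is a hypothetical helper, not the server's `_count` implementation.

```javascript
// Sketch of grouping map() rows by key and counting each group.
// ASSUMPTION: groupCount is a hypothetical local helper, not the built-in.
const mapRows = [
  ["Belgian-Style Dubbel", 1],
  ["Belgian-Style Dubbel", 1],
  ["Belgian-Style Dubbel", 1],
  ["Belgian-Style Pale Ale", 1],
  ["Belgian-Style White", 1],
  ["Belgian-Style White", 1],
];

function groupCount(rows) {
  const counts = new Map();
  for (const [key] of rows) {
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  // Map preserves insertion order, so groups come out in key order here
  // because the input rows are already sorted by key.
  return [...counts].map(([key, value]) => ({ key, value }));
}

const reduced = groupCount(mapRows);
console.log(reduced);
```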

How to use it?
Use the client SDK to call the view:
    View view = client.getView("beer", "by_name");
    Query query = new Query();
    query.setIncludeDocs(true)
         .setLimit(20)
         .setRangeStart(ComplexKey.of(startKey))
         .setRangeEnd(ComplexKey.of(startKey + "\uefff"));
    ViewResponse result = client.query(view, query);
    for (ViewRow row : result) {
      ...
    }

Demonstration

Hadoop & Couchbase
Hadoop:
- Deals with Big Data
- More is better than faster
- Batch oriented
- Usually used to extract/transform data
- Fully distributed: Map, Shuffle, Reduce
Couchbase:
- Distributed
- Executed where the document is
- Deals with indexing data
- As fast as possible
- Used to query the data in the database

Map Reduce in Couchbase
- Like many other NoSQL databases: used for queries!
- Indexes are distributed on each node of the cluster
- Indexes are updated incrementally
- Write your Map Reduce in JavaScript

Thank you! tug@couchbase.com @tgrall
Get Couchbase Server at http://www.couchbase.com/download