Friday, April 26, 13

Size: px
Start display at page:

Download "Friday, April 26, 13"

Transcription

1

2 Introduc)on to Map Reduce with Couchbase Tugdual Grall NoSQL Ma)ers 13 - Cologne - April 25th 2013

3 About Me Tugdual Tug Grall Couchbase exo Technical Evangelist CTO Oracle Developer/Product Manager Mainly Java/SOA Developer in firms hep://blog.grallandco.com tgrall NantesJUG co- founder Pet Project : hep://

4 What s the Problem? Lots of Data Big Data Big Users SaaS/Cloud CompuDng

5 Solu)on Distribute: the data the processing of the data

6 Map Reduce MapReduce is a programming model for processing large data sets, and the name of an implementa@on of the model by Google. MapReduce is typically used to do distributed compu@ng on clusters of computers. hep://research.google.com/archive/mapreduce.html

7 In details Developer specifies 2 methods: map (in_key, in_value) -> list(out_key, intermediate_value) Processes input data Produces key, values pairs reduce (out_key, list(intermediate_value)) -> list(out_value) Combines all intermediate values for a par@cular key Produce a set of merged output values

8 Execu)on

9 Most common use case Yahoo inc.

10 What about Couchbase?

11 Couchbase Open Source Project Leading NoSQL database project focused on distributed database technology and surrounding ecosystem Supports both key- value and document- oriented use cases All components are available under the Apache 2.0 Public License Obtained as packaged soxware in both enterprise and community Couchbase Open Source Project

12 Couchbase Server Core Principles Easy Scalability PERFORMANCE Consistent High Performance Grow cluster without changes, without with a single click Consistent sub- millisecond read and write with consistent high throughput Always On 24x365 JSON JSON JSON JSON Flexible Data Model No down@me for soxware upgrades, hardware maintenance, etc. JSON document model with no fixed schema.

13 Addi)onal Couchbase Server Features Built- in clustering All nodes equal Data with auto- failover Zero- maintenance Built- in managed cached Append- only storage layer Online Monitoring and admin API & UI SDK for a variety of languages

14 Couchbase Server 2.0 Architecture 8092 Query API Memcapable Memcapable 2.0 Moxi Query Engine Memcached Couchbase EP Engine Data Manager New Persistence Layer storage interface REST management API/Web UI Heartbeat Process monitor manager Global singleton supervisor Rebalance orchestrator Node health monitor vbucket state and manager Cluster Manager hvp on each node one per cluster Erlang/OTP HTTP 8091 Erlang port mapper 4369 Distributed Erlang

15 Couchbase Server 2.0 Architecture 8092 Query API Memcapable Memcapable 2.0 Moxi Query Engine Object- level Cache RAM Cache, Indexing & Persistence Management (C & V8) Couchbase EP Engine New Disk Persistence Persistence Layer storage interface REST management API/Web UI Heartbeat Process monitor manager Global singleton supervisor Rebalance orchestrator Node health monitor vbucket state and manager Server/Cluster Management & CommunicaDon (Erlang) hvp on each node one per cluster Erlang/OTP The Unreasonable Effectiveness of C by Damien Katz HTTP 8091 Erlang port mapper 4369 Distributed Erlang

16 Basic Opera)on APP SERVER 1 APP SERVER 2 COUCHBASE Client Library CLUSTER MAP COUCHBASE Client Library CLUSTER MAP READ/WRITE/UPDATE SERVER 1 ACTIVE SERVER 2 ACTIVE SERVER 3 ACTIVE s distributed evenly across servers Each server stores both ac)ve and replica docs Only one server ac@ve at Client library provides app with simple interface to database REPLICA 4 REPLICA 6 REPLICA 7 Cluster map provides map to which server doc is on App never needs to know App reads, writes, updates docs Mul)ple app servers can access same document at same )me COUCHBASE SERVER CLUSTER User Configured Replica Count = 1

17 How to access the data?

18 Couchbase.get( my-key );

19 Look at a document Key { } string : string, string : value, string JSON : { string : string, OBJECT string : value }, string : [ array ] ( DOCUMENT ) How to find document based on its avributes? get employee by get products by type... You need to look into the document/value

20 How to? Create an index!

21 Create the index { { {"id": "110f37fa30", {"id": "110f37fa30", "rev": {"id":" ", "110f37fa30", "rev": "expiration": {"id":" ", "110f37fa30", "rev": 0, "expiration": {"id":" ", "110f37fa30", "rev": 0, "flags": "expiration": {"id":" ", 0, "110f37fa30", "rev": 0, "flags": "expiration": {"id":" ", 0, "110f37fa30", "type": "rev": "flags": "id": "json" " ", 0, "110f37fa30", 0, "expiration": "type": "rev": "flags": "id": "json" " ", 0, "110f37fa30", 0, } "expiration": "type": "rev": "json" " ", 0, "flags": } "expiration": "type": "rev": 0, "json" " ", 0, "flags": } "expiration": 0, 0, "type": "flags": } "expiration": "json" 0, 0, "type": "flags": "json" 0, { } "type": "flags": "json" 0, { } "type": "json" {"name": } "Aventinus", "type": "json" {"name": } "Aventinus", "abv": {"name": } 8.2, "Aventinus", "abv": "name": 8.2, "Aventinus", "ibu": { "abv": 0, 8.2, "ibu": {"name": "Aventinus", "abv": 0, 8.2, "srm": "ibu": {"name": 0, "Aventinus", "abv": 0, "srm": "ibu": {"name": 8.2, 0, "Aventinus", 0, "upc": "abv": "srm": "ibu": 0, "name": 8.2, 0, "Aventinus", 0, "upc": "abv": "srm": 0, "name": 8.2, 0, "Aventinus", "type": "ibu": "upc": "abv": "srm": "beer", 0, 0, 8.2, 0, "type": "ibu": "upc": "abv": "beer", 0, 0, 8.2, "brewery_id": "srm": "type": "ibu": 0, "upc":"beer", 0, "110f1f2012", 0, "brewery_id": "srm": "type": "ibu": 0, "beer", "110f1f2012", 0, "updated": "upc": "brewery_id": "srm": "type":" , 0, "beer", "110f1f2012", 20:00:20", "updated": "upc": "brewery_id": "srm": " , 0, "110f1f2012", 20:00:20", "description": "type": "beer", "updated": "upc": "brewery_id": " , "Dark-ruby, "110f1f2012", 20:00:20", "description": "type": "beer", "updated": "upc": " , "Dark-ruby, 20:00:20",... "brewery_id": "description": "updated": Weizenbock", "type": "beer", "110f1f2012", " "Dark-ruby, 20:00:20",... "brewery_id": "description": Weizenbock", "type": "beer", "110f1f2012", "Dark-ruby, "category": "updated":... "brewery_id": "description": Weizenbock", "German " "Dark-ruby, Ale" "110f1f2012", 20:00:20", "category": "updated":... "brewery_id": Weizenbock", "German " Ale" "110f1f2012", 20:00:20", } "description": "Dark-ruby, "category": "updated":... Weizenbock", "German " Ale" 20:00:20", } "description": "Dark-ruby, "category": "updated":... Weizenbock", "German " Ale" 20:00:20", } "description": "Dark-ruby, "category":... Weizenbock", "German Ale" } "description": "Dark-ruby, "category":... Weizenbock", "German Ale" } "category":... Weizenbock", "German Ale" } "category": "German Ale" } "category": "German Ale" } } Key Value Aven@nus 8.2 Avenue Ale

22 Concrete Example This map func)on: receives the document and metadata as developer you just have to emit the K,V

23 Map Func)on Text

24 ?startkey= b1?startkey= bz & endkey= zn endkey= zz Pulls the Index- Keys between UTF- 8 Range specified by the startkey and endkey. doc. abba@couchbase.com beta@couchbase.com jasdeep@couchbase.com math@couchbase.com mae@couchbase.com ye@@couchbase.com zorro@couchbase.com meta.id u::1 u::7 u::2 u::5 u::6 u::4 u::3

25 ?key= Match a Single Index- Key doc. abba@couchbase.com beta@couchbase.com jasdeep@couchbase.com math@couchbase.com mae@couchbase.com ye@@couchbase.com zorro@couchbase.com meta.id u::1 u::7 u::2 u::5 u::6 u::4 u::3

26 ?keys=[ ] Query Mul@ple in the Set (Array Nota@on) doc. abba@couchbase.com beta@couchbase.com jasdeep@couchbase.com math@couchbase.com mae@couchbase.com ye@@couchbase.com zorro@couchbase.com meta.id u::1 u::7 u::2 u::5 u::6 u::4 u::3

27 How it works?

28 Indexing and Querying APP SERVER 1 APP SERVER 2 COUCHBASE Client Library CLUSTER MAP COUCHBASE Client Library CLUSTER MAP Query SERVER 1 ACTIVE 5 SERVER 2 ACTIVE 5 SERVER 3 ACTIVE 5 Indexing work is distributed amongst nodes Large data set possible Parallelize the effort Each node has index for data stored on it 4 REPLICA 4 REPLICA 4 REPLICA Queries combine the results from required nodes COUCHBASE SERVER CLUSTER User Configured Replica Count = 1

29 Couchbase Server 2.0: Views Views can cover a few different use cases Primary Index Simple secondary indexes (the most common) Complex secondary, ter@ary and composite indexes Aggrega@on func@ons (reduc@on) Example: count the number of North American Ales Organizing related data Built using Map/Reduce Map func@on creates a matrix from document fields Reduce func@on summarizes (reduces) informa@on

30 Distributed Index Build Phase Op)mized for lookups, in- order access and aggrega)ons All view reads from disk (different performance profile) View builds against every document on every node This is why you should group them in a design document Automa)cally kept up to date Incremental Map Reduce

31 Dynamic Range Queries with Op5onal Aggrega5on Efficiently fetch an row or group of related rows. Queries use cached values from B- tree inner nodes when possible Take advantage of in- order tree traversal with group_level queries?startkey= J &endkey= K { rows :[{ key : Juneau, value :null}]} SERVER 1 SERVER 2 SERVER 3 Ac@ve s Ac@ve s Ac@ve s 5 DOC 4 DOC 1 DOC 2 DOC 7 DOC 3 DOC 9 DOC 8 DOC 6 DOC Replica s Replica s Replica s 4 DOC 6 DOC 7 DOC 1 DOC 3 DOC 9 DOC 8 DOC 2 DOC 5 DOC

32 Append Only Index Disk acdvity is slow UpdaDng disk blocks is very slow Appending new data to the end of the current file is fast Overhead of reverse reading is small Because exisdng blocks are not re- used, can lead to fragmentadon Couchbase will compact the index View Processor Disk Changed uments View Processor Original Appended

33 Adding a new ument new root A-R 14 A-R 15 new reductions A-H 7 I-R 7 I-R 8 A-C 3 D-F 2 G-H 2 I-L 3 N-R 4 M-R 5 A B C D F G H I K L M N O Q R new key

34 What about Reduce? Out of the box func)ons : _count() _sum() _stats() Create your own if needed function(key, values, rereduce) { if (rereduce) { var result = 0; for (var i = 0; i < values.length; i++) { result += values[i]; } return result; } else { return values.length; } }

35 Reduce Func)on Key and Arrays of values as parameters WriVen Javascript Called aner the map func)on Used to reduce the result of a map of single values Used with grouping Could be ignored when querying reuse the index

36 Reduce in Ac)on Map() Result Key Value Belgian- Style Dubbel 1 Belgian- Style Dubbel 1 Belgian- Style Dubbel 1 Belgian- Style Pale Ale 1 Belgian- Style White 1 Belgian- Style White Reduce() _count() Result Key Value Belgian- Style Dubbel 3 Belgian- Style Pale Ale 1 Belgian- Style White 2

37 How to use it? Use client SDK to call the view: View view = client.getview("beer", "by_name"); Query query = new Query(); query.setincludes(true).setlimit(20).setrangestart(complexkey.of(startkey)).setrangeend(complexkey.of(startkey + "\uefff")); ViewResponse result = client.query(view, query); for(viewrow row : result) {... }

38 Demonstra)on

39 Hadoop & Couchbase Deal with Big Data More is be)er than Faster Batch Oriented Usually used to extract/transform data Fully distributed Map, Shuffle, Reduce Distributed Executed where the document is Deal with indexing data As fast as possible Use to query the data in the Database

40 Map Reduce in Couchbase Like many other NoSQL Database : Used for queries! Index are distributed on each node of the cluster Index are updated Incrementally Write you Map Reduce in Javascript

41 Thank Get Couchbase Server at hep://

42

Couchbase Server. Chris Anderson Chief

Couchbase Server. Chris Anderson Chief Couchbase Server Chris Anderson Chief Architect @jchris 1 Couchbase Server Simple = Fast Elas=c NoSQL Database Formerly known as Membase Server 2 Couchbase Server Features Built- in clustering All nodes

More information

Developing in NoSQL with Couchbase and Java. Raghavan N. Srinivas Couchbase 123

Developing in NoSQL with Couchbase and Java. Raghavan N. Srinivas Couchbase 123 Developing in NoSQL with Couchbase and Java Raghavan N. Srinivas Couchbase 123 Objective A technical overview To getting started on programming with Couchbase using Java To learn about the basic operations,

More information

Couchbase Architecture Couchbase Inc. 1

Couchbase Architecture Couchbase Inc. 1 Couchbase Architecture 2015 Couchbase Inc. 1 $whoami Laurent Doguin Couchbase Developer Advocate @ldoguin laurent.doguin@couchbase.com 2015 Couchbase Inc. 2 2 Big Data = Operational + Analytic (NoSQL +

More information

CS Silvia Zuffi - Sunil Mallya. Slides credits: official membase meetings

CS Silvia Zuffi - Sunil Mallya. Slides credits: official membase meetings CS227 - Silvia Zuffi - Sunil Mallya Slides credits: official membase meetings Schedule Overview silvia History silvia Data Model silvia Architecture sunil Transaction support sunil Case studies silvia

More information

How$to$integrate$Hadoop$ with$your$nosql$database?

How$to$integrate$Hadoop$ with$your$nosql$database? How$to$integrate$Hadoop$ with$your$nosql$database? Tugdual$ Tug $Grall Technical$Evangelist About$Me$ Tugdual$ Tug $Grall Couchbase Technical.Evangelist exo CTO Oracle Developer/Product.Manager Mainly.Java/SOA

More information

The NoSQL Landscape. Frank Weigel VP, Field Technical Opera;ons

The NoSQL Landscape. Frank Weigel VP, Field Technical Opera;ons The NoSQL Landscape Frank Weigel VP, Field Technical Opera;ons What we ll talk about Why RDBMS are not enough? What are the different NoSQL taxonomies? Which NoSQL is right for me? Macro Trends Driving

More information

Performance Evaluation of a MongoDB and Hadoop Platform for Scientific Data Analysis

Performance Evaluation of a MongoDB and Hadoop Platform for Scientific Data Analysis Performance Evaluation of a MongoDB and Hadoop Platform for Scientific Data Analysis Elif Dede, Madhusudhan Govindaraju Lavanya Ramakrishnan, Dan Gunter, Shane Canon Department of Computer Science, Binghamton

More information

Developing in NoSQL with Couchbase

Developing in NoSQL with Couchbase Developing in NoSQL with Couchbase Raghavan Rags Srinivas Developer Advocate Simple. Fast. Elastic. Speaker Introduction Architect and Evangelist working with developers Speaker at JavaOne, RSA conferences,

More information

MapReduce, Apache Hadoop

MapReduce, Apache Hadoop NDBI040: Big Data Management and NoSQL Databases hp://www.ksi.mff.cuni.cz/ svoboda/courses/2016-1-ndbi040/ Lecture 2 MapReduce, Apache Hadoop Marn Svoboda svoboda@ksi.mff.cuni.cz 11. 10. 2016 Charles University

More information

MapReduce, Apache Hadoop

MapReduce, Apache Hadoop Czech Technical University in Prague, Faculty of Informaon Technology MIE-PDB: Advanced Database Systems hp://www.ksi.mff.cuni.cz/~svoboda/courses/2016-2-mie-pdb/ Lecture 12 MapReduce, Apache Hadoop Marn

More information

Parallel Programming Principle and Practice. Lecture 10 Big Data Processing with MapReduce

Parallel Programming Principle and Practice. Lecture 10 Big Data Processing with MapReduce Parallel Programming Principle and Practice Lecture 10 Big Data Processing with MapReduce Outline MapReduce Programming Model MapReduce Examples Hadoop 2 Incredible Things That Happen Every Minute On The

More information

Informa)on Retrieval and Map- Reduce Implementa)ons. Mohammad Amir Sharif PhD Student Center for Advanced Computer Studies

Informa)on Retrieval and Map- Reduce Implementa)ons. Mohammad Amir Sharif PhD Student Center for Advanced Computer Studies Informa)on Retrieval and Map- Reduce Implementa)ons Mohammad Amir Sharif PhD Student Center for Advanced Computer Studies mas4108@louisiana.edu Map-Reduce: Why? Need to process 100TB datasets On 1 node:

More information

Realtime visitor analysis with Couchbase and Elasticsearch

Realtime visitor analysis with Couchbase and Elasticsearch Realtime visitor analysis with Couchbase and Elasticsearch Jeroen Reijn @jreijn #nosql13 About me Jeroen Reijn Software engineer Hippo @jreijn http://blog.jeroenreijn.com About Hippo Visitor Analysis OneHippo

More information

Hadoop An Overview. - Socrates CCDH

Hadoop An Overview. - Socrates CCDH Hadoop An Overview - Socrates CCDH What is Big Data? Volume Not Gigabyte. Terabyte, Petabyte, Exabyte, Zettabyte - Due to handheld gadgets,and HD format images and videos - In total data, 90% of them collected

More information

Document Databases: MongoDB

Document Databases: MongoDB NDBI040: Big Data Management and NoSQL Databases hp://www.ksi.mff.cuni.cz/~svoboda/courses/171-ndbi040/ Lecture 9 Document Databases: MongoDB Marn Svoboda svoboda@ksi.mff.cuni.cz 28. 11. 2017 Charles University

More information

Outline. Spanner Mo/va/on. Tom Anderson

Outline. Spanner Mo/va/on. Tom Anderson Spanner Mo/va/on Tom Anderson Outline Last week: Chubby: coordina/on service BigTable: scalable storage of structured data GFS: large- scale storage for bulk data Today/Friday: Lessons from GFS/BigTable

More information

Document stores using CouchDB

Document stores using CouchDB 2018 Document stores using CouchDB ADVANCED DATABASE PROJECT APARNA KHIRE, MINGRUI DONG aparna.khire@vub.be, mingdong@ulb.ac.be 1 Table of Contents 1. Introduction... 3 2. Background... 3 2.1 NoSQL Database...

More information

HBase... And Lewis Carroll! Twi:er,

HBase... And Lewis Carroll! Twi:er, HBase... And Lewis Carroll! jw4ean@cloudera.com Twi:er, LinkedIn: @jw4ean 1 Introduc@on 2010: Cloudera Solu@ons Architect 2011: Cloudera TAM/DSE 2012-2013: Cloudera Training focusing on Partners and Newbies

More information

Parallel Computing: MapReduce Jin, Hai

Parallel Computing: MapReduce Jin, Hai Parallel Computing: MapReduce Jin, Hai School of Computer Science and Technology Huazhong University of Science and Technology ! MapReduce is a distributed/parallel computing framework introduced by Google

More information

MapReduce. Cloud Computing COMP / ECPE 293A

MapReduce. Cloud Computing COMP / ECPE 293A Cloud Computing COMP / ECPE 293A MapReduce Jeffrey Dean and Sanjay Ghemawat, MapReduce: simplified data processing on large clusters, In Proceedings of the 6th conference on Symposium on Opera7ng Systems

More information

Big Data Analytics. Izabela Moise, Evangelos Pournaras, Dirk Helbing

Big Data Analytics. Izabela Moise, Evangelos Pournaras, Dirk Helbing Big Data Analytics Izabela Moise, Evangelos Pournaras, Dirk Helbing Izabela Moise, Evangelos Pournaras, Dirk Helbing 1 Big Data "The world is crazy. But at least it s getting regular analysis." Izabela

More information

MapReduce-style data processing

MapReduce-style data processing MapReduce-style data processing Software Languages Team University of Koblenz-Landau Ralf Lämmel and Andrei Varanovich Related meanings of MapReduce Functional programming with map & reduce An algorithmic

More information

Jargons, Concepts, Scope and Systems. Key Value Stores, Document Stores, Extensible Record Stores. Overview of different scalable relational systems

Jargons, Concepts, Scope and Systems. Key Value Stores, Document Stores, Extensible Record Stores. Overview of different scalable relational systems Jargons, Concepts, Scope and Systems Key Value Stores, Document Stores, Extensible Record Stores Overview of different scalable relational systems Examples of different Data stores Predictions, Comparisons

More information

Lecture: The Google Bigtable

Lecture: The Google Bigtable Lecture: The Google Bigtable h#p://research.google.com/archive/bigtable.html 10/09/2014 Romain Jaco3n romain.jaco7n@orange.fr Agenda Introduc3on Data model API Building blocks Implementa7on Refinements

More information

vbuckets: The Core Enabling Mechanism for Couchbase Server Data Distribution (aka Auto-Sharding )

vbuckets: The Core Enabling Mechanism for Couchbase Server Data Distribution (aka Auto-Sharding ) vbuckets: The Core Enabling Mechanism for Data Distribution (aka Auto-Sharding ) Table of Contents vbucket Defined 3 key-vbucket-server ping illustrated 4 vbuckets in a world of s 5 TCP ports Deployment

More information

OLTP on Hadoop: Reviewing the first Hadoop- based TPC- C benchmarks

OLTP on Hadoop: Reviewing the first Hadoop- based TPC- C benchmarks OLTP on Hadoop: Reviewing the first Hadoop- based TPC- C benchmarks Monte Zweben Co- Founder and Chief Execu6ve Officer John Leach Co- Founder and Chief Technology Officer September 30, 2015 The Tradi6onal

More information

CS6200 Informa.on Retrieval. David Smith College of Computer and Informa.on Science Northeastern University

CS6200 Informa.on Retrieval. David Smith College of Computer and Informa.on Science Northeastern University CS6200 Informa.on Retrieval David Smith College of Computer and Informa.on Science Northeastern University Indexing Process Indexes Indexes are data structures designed to make search faster Text search

More information

A Distributed Data- Parallel Execu3on Framework in the Kepler Scien3fic Workflow System

A Distributed Data- Parallel Execu3on Framework in the Kepler Scien3fic Workflow System A Distributed Data- Parallel Execu3on Framework in the Kepler Scien3fic Workflow System Ilkay Al(ntas and Daniel Crawl San Diego Supercomputer Center UC San Diego Jianwu Wang UMBC WorDS.sdsc.edu Computa3onal

More information

Welcome to the New Era of Cloud Computing

Welcome to the New Era of Cloud Computing Welcome to the New Era of Cloud Computing Aaron Kimball The web is replacing the desktop 1 SDKs & toolkits are there What about the backend? Image: Wikipedia user Calyponte 2 Two key concepts Processing

More information

Practice and Applications of Data Management CMPSCI 345. Lecture 18: Big Data, Hadoop, and MapReduce

Practice and Applications of Data Management CMPSCI 345. Lecture 18: Big Data, Hadoop, and MapReduce Practice and Applications of Data Management CMPSCI 345 Lecture 18: Big Data, Hadoop, and MapReduce Why Big Data, Hadoop, M-R? } What is the connec,on with the things we learned? } What about SQL? } What

More information

Standalone to SQL Server HA Clusters in Minutes.

Standalone to SQL Server HA Clusters in Minutes. Standalone to SQL Server HA Clusters in Minutes Connor.Cox@DH2i.com Failover Cluster Instances Instance- level failover Applica:on, OS, and infrastructure protec:on Fast, automated failover Free hdps://technet.microsog.com/en-

More information

The MapReduce Abstraction

The MapReduce Abstraction The MapReduce Abstraction Parallel Computing at Google Leverages multiple technologies to simplify large-scale parallel computations Proprietary computing clusters Map/Reduce software library Lots of other

More information

hashfs Applying Hashing to Op2mize File Systems for Small File Reads

hashfs Applying Hashing to Op2mize File Systems for Small File Reads hashfs Applying Hashing to Op2mize File Systems for Small File Reads Paul Lensing, Dirk Meister, André Brinkmann Paderborn Center for Parallel Compu2ng University of Paderborn Mo2va2on and Problem Design

More information

Introduc)on to Apache Ka1a. Jun Rao Co- founder of Confluent

Introduc)on to Apache Ka1a. Jun Rao Co- founder of Confluent Introduc)on to Apache Ka1a Jun Rao Co- founder of Confluent Agenda Why people use Ka1a Technical overview of Ka1a What s coming What s Apache Ka1a Distributed, high throughput pub/sub system Ka1a Usage

More information

Search Engines. Informa1on Retrieval in Prac1ce. Annotations by Michael L. Nelson

Search Engines. Informa1on Retrieval in Prac1ce. Annotations by Michael L. Nelson Search Engines Informa1on Retrieval in Prac1ce Annotations by Michael L. Nelson All slides Addison Wesley, 2008 Indexes Indexes are data structures designed to make search faster Text search has unique

More information

Logisland Event mining at scale. Thomas [ ]

Logisland Event mining at scale. Thomas [ ] Logisland Event mining at scale Thomas Bailet @hurence [2017-01-19] Overview Logisland provides a stream analy0cs solu0on that can handle all enterprise-scale event data and processing Big picture Open

More information

Dealing with Memcached Challenges

Dealing with Memcached Challenges Dealing with Memcached Challenges Dealing with Memcached Challenges eplacing a Memcached Tier With a Couchbase Cluster Summary Memcached is an open-source caching technology that is used by 18 of the top

More information

Overview. Why MapReduce? What is MapReduce? The Hadoop Distributed File System Cloudera, Inc.

Overview. Why MapReduce? What is MapReduce? The Hadoop Distributed File System Cloudera, Inc. MapReduce and HDFS This presentation includes course content University of Washington Redistributed under the Creative Commons Attribution 3.0 license. All other contents: Overview Why MapReduce? What

More information

Distributed Systems. 29. Distributed Caching Paul Krzyzanowski. Rutgers University. Fall 2014

Distributed Systems. 29. Distributed Caching Paul Krzyzanowski. Rutgers University. Fall 2014 Distributed Systems 29. Distributed Caching Paul Krzyzanowski Rutgers University Fall 2014 December 5, 2014 2013 Paul Krzyzanowski 1 Caching Purpose of a cache Temporary storage to increase data access

More information

TITLE: PRE-REQUISITE THEORY. 1. Introduction to Hadoop. 2. Cluster. Implement sort algorithm and run it using HADOOP

TITLE: PRE-REQUISITE THEORY. 1. Introduction to Hadoop. 2. Cluster. Implement sort algorithm and run it using HADOOP TITLE: Implement sort algorithm and run it using HADOOP PRE-REQUISITE Preliminary knowledge of clusters and overview of Hadoop and its basic functionality. THEORY 1. Introduction to Hadoop The Apache Hadoop

More information

Clustering Lecture 8: MapReduce

Clustering Lecture 8: MapReduce Clustering Lecture 8: MapReduce Jing Gao SUNY Buffalo 1 Divide and Conquer Work Partition w 1 w 2 w 3 worker worker worker r 1 r 2 r 3 Result Combine 4 Distributed Grep Very big data Split data Split data

More information

A BigData Tour HDFS, Ceph and MapReduce

A BigData Tour HDFS, Ceph and MapReduce A BigData Tour HDFS, Ceph and MapReduce These slides are possible thanks to these sources Jonathan Drusi - SCInet Toronto Hadoop Tutorial, Amir Payberah - Course in Data Intensive Computing SICS; Yahoo!

More information

Accelerate Database Performance and Reduce Response Times in MongoDB Humongous Environments with the LSI Nytro MegaRAID Flash Accelerator Card

Accelerate Database Performance and Reduce Response Times in MongoDB Humongous Environments with the LSI Nytro MegaRAID Flash Accelerator Card Accelerate Database Performance and Reduce Response Times in MongoDB Humongous Environments with the LSI Nytro MegaRAID Flash Accelerator Card The Rise of MongoDB Summary One of today s growing database

More information

Learning CouchDB A Non-Relational Alternative to Data Persistence for Modern Software Applications

Learning CouchDB A Non-Relational Alternative to Data Persistence for Modern Software Applications Learning CouchDB A Non-Relational Alternative to Data Persistence for Modern Software Applications Is CouchDB a good choice for your application? Problem: The Object-Relational Impedance Mismatch [1] How

More information

Func%onal Programming in Scheme and Lisp

Func%onal Programming in Scheme and Lisp Func%onal Programming in Scheme and Lisp http://www.lisperati.com/landoflisp/ Overview In a func(onal programming language, func(ons are first class objects You can create them, put them in data structures,

More information

Scaling MongoDB: Avoiding Common Pitfalls. Jon Tobin Senior Systems

Scaling MongoDB: Avoiding Common Pitfalls. Jon Tobin Senior Systems Scaling MongoDB: Avoiding Common Pitfalls Jon Tobin Senior Systems Engineer Jon.Tobin@percona.com @jontobs www.linkedin.com/in/jonathanetobin Agenda Document Design Data Management Replica3on & Failover

More information

Re- op&mizing Data Parallel Compu&ng

Re- op&mizing Data Parallel Compu&ng Re- op&mizing Data Parallel Compu&ng Sameer Agarwal Srikanth Kandula, Nicolas Bruno, Ming- Chuan Wu, Ion Stoica, Jingren Zhou UC Berkeley A Data Parallel Job can be a collec/on of maps, A Data Parallel

More information

RAD, Rules, and Compatibility: What's Coming in Kuali Rice 2.0

RAD, Rules, and Compatibility: What's Coming in Kuali Rice 2.0 software development simplified RAD, Rules, and Compatibility: What's Coming in Kuali Rice 2.0 Eric Westfall - Indiana University JASIG 2011 For those who don t know Kuali Rice consists of mul8ple sub-

More information

Matt Ingenthron. Couchbase, Inc.

Matt Ingenthron. Couchbase, Inc. Matt Ingenthron Couchbase, Inc. 2 What is Membase? Before: Application scales linearly, data hits wall Application Scales Out Just add more commodity web servers Database Scales Up Get a bigger, more complex

More information

NoSQL data stores and SOS: Uniform Access to Non-Relational Database Systems Paolo Atzeni Francesca Bugiotti Luca Rossi

NoSQL data stores and SOS: Uniform Access to Non-Relational Database Systems Paolo Atzeni Francesca Bugiotti Luca Rossi NoSQL data stores and SOS: Uniform Access to Non-Relational Database Systems Paolo Atzeni Francesca Bugiotti Luca Rossi Outline Context Rela&onal DBMS NoSQL Data Stores NoSQL Timeline NoSQL Data Stores

More information

Introduction to HDFS and MapReduce

Introduction to HDFS and MapReduce Introduction to HDFS and MapReduce Who Am I - Ryan Tabora - Data Developer at Think Big Analytics - Big Data Consulting - Experience working with Hadoop, HBase, Hive, Solr, Cassandra, etc. 2 Who Am I -

More information

Oracle VM Workshop Applica>on Driven Virtualiza>on

Oracle VM Workshop Applica>on Driven Virtualiza>on Oracle VM Workshop Applica>on Driven Virtualiza>on Simon COTER Principal Product Manager Oracle VM & VirtualBox simon.coter@oracle.com hnps://blogs.oracle.com/scoter November 25th, 2015 Copyright 2014

More information

Architecture of So-ware Systems Massively Distributed Architectures Reliability, Failover and failures. Mar>n Rehák

Architecture of So-ware Systems Massively Distributed Architectures Reliability, Failover and failures. Mar>n Rehák Architecture of So-ware Systems Massively Distributed Architectures Reliability, Failover and failures Mar>n Rehák Mo>va>on Internet- based business models imposed new requirements on computa>onal architectures

More information

Big Data Infrastructure CS 489/698 Big Data Infrastructure (Winter 2017)

Big Data Infrastructure CS 489/698 Big Data Infrastructure (Winter 2017) Big Data Infrastructure CS 489/698 Big Data Infrastructure (Winter 2017) Week 10: Mutable State (1/2) March 14, 2017 Jimmy Lin David R. Cheriton School of Computer Science University of Waterloo These

More information

Capabilities of Cloudant NoSQL Database IBM Corporation

Capabilities of Cloudant NoSQL Database IBM Corporation Capabilities of Cloudant NoSQL Database After you complete this section, you should understand: The features of the Cloudant NoSQL Database: HTTP RESTfulAPI Secondary indexes and MapReduce Cloudant Query

More information

NODE.JS SERVER SIDE JAVASCRIPT. Introduc)on Node.js

NODE.JS SERVER SIDE JAVASCRIPT. Introduc)on Node.js NODE.JS SERVER SIDE JAVASCRIPT Introduc)on Node.js Node.js was created by Ryan Dahl starting in 2009. For more information visit: http://www.nodejs.org 1 What about Node.js? 1. JavaScript used in client-side

More information

Splout SQL When Big Data Output is also Big Data

Splout SQL When Big Data Output is also Big Data Iván de Prado Alonso CEO of Datasalt www.datasalt.es @ivanprado @datasalt Splout SQL When Big Data Output is also Big Data Big Data consulting & training Full SQL * * Within each par??on Full SQL * Unlike

More information

Submitted to: Dr. Sunnie Chung. Presented by: Sonal Deshmukh Jay Upadhyay

Submitted to: Dr. Sunnie Chung. Presented by: Sonal Deshmukh Jay Upadhyay Submitted to: Dr. Sunnie Chung Presented by: Sonal Deshmukh Jay Upadhyay Submitted to: Dr. Sunny Chung Presented by: Sonal Deshmukh Jay Upadhyay What is Apache Survey shows huge popularity spike for Apache

More information

Agenda. Request- Level Parallelism. Agenda. Anatomy of a Web Search. Google Query- Serving Architecture 9/20/10

Agenda. Request- Level Parallelism. Agenda. Anatomy of a Web Search. Google Query- Serving Architecture 9/20/10 Agenda CS 61C: Great Ideas in Computer Architecture (Machine Structures) Instructors: Randy H. Katz David A. PaHerson hhp://inst.eecs.berkeley.edu/~cs61c/fa10 Request and Data Level Parallelism Administrivia

More information

Today s Lecture. CS 61C: Great Ideas in Computer Architecture (Machine Structures) Map Reduce

Today s Lecture. CS 61C: Great Ideas in Computer Architecture (Machine Structures) Map Reduce CS 61C: Great Ideas in Computer Architecture (Machine Structures) Map Reduce 8/29/12 Instructors Krste Asanovic, Randy H. Katz hgp://inst.eecs.berkeley.edu/~cs61c/fa12 Fall 2012 - - Lecture #3 1 Today

More information

Latest Trends in Database Technology NoSQL and Beyond

Latest Trends in Database Technology NoSQL and Beyond Latest Trends in Database Technology NoSQL and Beyond Sebas>an Marsching www.aquenos.com Why we want more than SQL Performance / Data Size Opera>onal Costs Availability 2 NoSQL NoSQL Not Only SQL 3 NoSQL

More information

April Final Quiz COSC MapReduce Programming a) Explain briefly the main ideas and components of the MapReduce programming model.

April Final Quiz COSC MapReduce Programming a) Explain briefly the main ideas and components of the MapReduce programming model. 1. MapReduce Programming a) Explain briefly the main ideas and components of the MapReduce programming model. MapReduce is a framework for processing big data which processes data in two phases, a Map

More information

June 20, 2017 Revision NoSQL Database Architectural Comparison

June 20, 2017 Revision NoSQL Database Architectural Comparison June 20, 2017 Revision 0.07 NoSQL Database Architectural Comparison Table of Contents Executive Summary... 1 Introduction... 2 Cluster Topology... 4 Consistency Model... 6 Replication Strategy... 8 Failover

More information

Developing MapReduce Programs

Developing MapReduce Programs Cloud Computing Developing MapReduce Programs Dell Zhang Birkbeck, University of London 2017/18 MapReduce Algorithm Design MapReduce: Recap Programmers must specify two functions: map (k, v) * Takes

More information

Tools for Social Networking Infrastructures

Tools for Social Networking Infrastructures Tools for Social Networking Infrastructures 1 Cassandra - a decentralised structured storage system Problem : Facebook Inbox Search hundreds of millions of users distributed infrastructure inbox changes

More information

Intro to Couchbase Server for ColdFusion - Clustered NoSQL and Caching at its Finest

Intro to Couchbase Server for ColdFusion - Clustered NoSQL and Caching at its Finest Tweet Intro to Couchbase Server for ColdFusion - Clustered NoSQL and Caching at its Finest Brad Wood Jul 26, 2013 Today we are starting a new blogging series on how to leverage Couchbase NoSQL from ColdFusion

More information

Distributed File Systems II

Distributed File Systems II Distributed File Systems II To do q Very-large scale: Google FS, Hadoop FS, BigTable q Next time: Naming things GFS A radically new environment NFS, etc. Independence Small Scale Variety of workloads Cooperation

More information

MongoDB Revs You Up: What Storage Engine is Right for You?

MongoDB Revs You Up: What Storage Engine is Right for You? MongoDB Revs You Up: What Storage Engine is Right for You? Jon Tobin, Director of Solution Eng. --------------------- Jon.Tobin@percona.com @jontobs Linkedin.com/in/jonathanetobin Agenda How did we get

More information

DATABASE SYSTEMS. Database programming in a web environment. Database System Course, 2016

DATABASE SYSTEMS. Database programming in a web environment. Database System Course, 2016 DATABASE SYSTEMS Database programming in a web environment Database System Course, 2016 AGENDA FOR TODAY Advanced Mysql More than just SELECT Creating tables MySQL optimizations: Storage engines, indexing.

More information

Accelerate MySQL for Demanding OLAP and OLTP Use Case with Apache Ignite December 7, 2016

Accelerate MySQL for Demanding OLAP and OLTP Use Case with Apache Ignite December 7, 2016 Accelerate MySQL for Demanding OLAP and OLTP Use Case with Apache Ignite December 7, 2016 Nikita Ivanov CTO and Co-Founder GridGain Systems Peter Zaitsev CEO and Co-Founder Percona About the Presentation

More information

NoSQL systems. Lecture 21 (optional) Instructor: Sudeepa Roy. CompSci 516 Data Intensive Computing Systems

NoSQL systems. Lecture 21 (optional) Instructor: Sudeepa Roy. CompSci 516 Data Intensive Computing Systems CompSci 516 Data Intensive Computing Systems Lecture 21 (optional) NoSQL systems Instructor: Sudeepa Roy Duke CS, Spring 2016 CompSci 516: Data Intensive Computing Systems 1 Key- Value Stores Duke CS,

More information

MapReduce. Simplified Data Processing on Large Clusters (Without the Agonizing Pain) Presented by Aaron Nathan

MapReduce. Simplified Data Processing on Large Clusters (Without the Agonizing Pain) Presented by Aaron Nathan MapReduce Simplified Data Processing on Large Clusters (Without the Agonizing Pain) Presented by Aaron Nathan The Problem Massive amounts of data >100TB (the internet) Needs simple processing Computers

More information

Distributed Non-Relational Databases. Pelle Jakovits

Distributed Non-Relational Databases. Pelle Jakovits Distributed Non-Relational Databases Pelle Jakovits Tartu, 7 December 2018 Outline Relational model NoSQL Movement Non-relational data models Key-value Document-oriented Column family Graph Non-relational

More information

Voldemort. Smruti R. Sarangi. Department of Computer Science Indian Institute of Technology New Delhi, India. Overview Design Evaluation

Voldemort. Smruti R. Sarangi. Department of Computer Science Indian Institute of Technology New Delhi, India. Overview Design Evaluation Voldemort Smruti R. Sarangi Department of Computer Science Indian Institute of Technology New Delhi, India Smruti R. Sarangi Leader Election 1/29 Outline 1 2 3 Smruti R. Sarangi Leader Election 2/29 Data

More information

Introduction to MapReduce

Introduction to MapReduce 732A54 Big Data Analytics Introduction to MapReduce Christoph Kessler IDA, Linköping University Towards Parallel Processing of Big-Data Big Data too large to be read+processed in reasonable time by 1 server

More information

Be warned Niklas Gustavsson

Be warned Niklas Gustavsson 1 Niklas Gustavsson niklas.gustavsson@callistaenterprise.se www.callistaenterprise.se Be warned CouchDB, Slide 2 2 Won't replace your relational database You (probably) won't be using it any time soon

More information

Op#mizing MapReduce for Highly- Distributed Environments

Op#mizing MapReduce for Highly- Distributed Environments Op#mizing MapReduce for Highly- Distributed Environments Abhishek Chandra Associate Professor Department of Computer Science and Engineering University of Minnesota hep://www.cs.umn.edu/~chandra 1 Big

More information

Cloud Computing and Hadoop Distributed File System. UCSB CS170, Spring 2018

Cloud Computing and Hadoop Distributed File System. UCSB CS170, Spring 2018 Cloud Computing and Hadoop Distributed File System UCSB CS70, Spring 08 Cluster Computing Motivations Large-scale data processing on clusters Scan 000 TB on node @ 00 MB/s = days Scan on 000-node cluster

More information

Introduction to Hadoop and MapReduce

Introduction to Hadoop and MapReduce Introduction to Hadoop and MapReduce Antonino Virgillito THE CONTRACTOR IS ACTING UNDER A FRAMEWORK CONTRACT CONCLUDED WITH THE COMMISSION Large-scale Computation Traditional solutions for computing large

More information

@ COUCHBASE CONNECT. Using Couchbase. By: Carleton Miyamoto, Michael Kehoe Version: 1.1w LinkedIn Corpora3on

@ COUCHBASE CONNECT. Using Couchbase. By: Carleton Miyamoto, Michael Kehoe Version: 1.1w LinkedIn Corpora3on @ COUCHBASE CONNECT Using Couchbase By: Carleton Miyamoto, Michael Kehoe Version: 1.1w Overview The LinkedIn Story Enter Couchbase Development and Opera3ons Clusters and Numbers Opera3onal Tooling Carleton

More information

OpenWorld 2015 Oracle Par22oning

OpenWorld 2015 Oracle Par22oning OpenWorld 2015 Oracle Par22oning Did You Think It Couldn t Get Any Be6er? Safe Harbor Statement The following is intended to outline our general product direc2on. It is intended for informa2on purposes

More information

Non-Relational Databases. Pelle Jakovits

Non-Relational Databases. Pelle Jakovits Non-Relational Databases Pelle Jakovits 25 October 2017 Outline Background Relational model Database scaling The NoSQL Movement CAP Theorem Non-relational data models Key-value Document-oriented Column

More information

An Introduction to Big Data Formats

An Introduction to Big Data Formats Introduction to Big Data Formats 1 An Introduction to Big Data Formats Understanding Avro, Parquet, and ORC WHITE PAPER Introduction to Big Data Formats 2 TABLE OF TABLE OF CONTENTS CONTENTS INTRODUCTION

More information

ECS 165B: Database System Implementa6on Lecture 3

ECS 165B: Database System Implementa6on Lecture 3 ECS 165B: Database System Implementa6on Lecture 3 UC Davis April 4, 2011 Acknowledgements: some slides based on earlier ones by Raghu Ramakrishnan, Johannes Gehrke, Jennifer Widom, Bertram Ludaescher,

More information

CS6030 Cloud Computing. Acknowledgements. Today s Topics. Intro to Cloud Computing 10/20/15. Ajay Gupta, WMU-CS. WiSe Lab

CS6030 Cloud Computing. Acknowledgements. Today s Topics. Intro to Cloud Computing 10/20/15. Ajay Gupta, WMU-CS. WiSe Lab CS6030 Cloud Computing Ajay Gupta B239, CEAS Computer Science Department Western Michigan University ajay.gupta@wmich.edu 276-3104 1 Acknowledgements I have liberally borrowed these slides and material

More information

26- April- 2010, Spring Member Mee4ng Chris Hyzer, Grouper developer

26- April- 2010, Spring Member Mee4ng Chris Hyzer, Grouper developer 26- April- 2010, Spring Member Mee4ng Chris Hyzer, Grouper developer XMPP integra4on XMPP and the Grouper loader XMPP and the Grouper client Kuali Rice integra4on Rice groups Rice subjects Automa4c workflow

More information

Introduc)on to. CS60092: Informa0on Retrieval

Introduc)on to. CS60092: Informa0on Retrieval Introduc)on to CS60092: Informa0on Retrieval Ch. 4 Index construc)on How do we construct an index? What strategies can we use with limited main memory? Sec. 4.1 Hardware basics Many design decisions in

More information

High Performance Drupal

High Performance Drupal High Performance Drupal A Panel Discussion 25 1 st St., Suite 104, Cambridge, MA 02141 www.bioraft.com Panelists Erik Peterson (eporama) Seth Cohn (sethcohn) Micky MeMs (freescholar) Patrick CorbeM (pcorbem)

More information

Elas%c Load Balancing, Amazon CloudWatch, and Auto Scaling Sco) Linder

Elas%c Load Balancing, Amazon CloudWatch, and Auto Scaling Sco) Linder Elas%c Load Balancing, Amazon, and Auto Scaling Sco) Linder Overview Elas4c Load Balancing Features/Restric4ons Connec4on Types Listeners Configura4on Op4ons Auto Scaling Launch Configura4ons Scaling Types

More information

Hadoop محبوبه دادخواه کارگاه ساالنه آزمایشگاه فناوری وب زمستان 1391

Hadoop محبوبه دادخواه کارگاه ساالنه آزمایشگاه فناوری وب زمستان 1391 Hadoop محبوبه دادخواه کارگاه ساالنه آزمایشگاه فناوری وب زمستان 1391 Outline Big Data Big Data Examples Challenges with traditional storage NoSQL Hadoop HDFS MapReduce Architecture 2 Big Data In information

More information

Rule 14 Use Databases Appropriately

Rule 14 Use Databases Appropriately Rule 14 Use Databases Appropriately Rule 14: What, When, How, and Why What: Use relational databases when you need ACID properties to maintain relationships between your data. For other data storage needs

More information

Func%onal Programming in Scheme and Lisp

Func%onal Programming in Scheme and Lisp Func%onal Programming in Scheme and Lisp http://www.lisperati.com/landoflisp/ Overview In a func(onal programming language, func(ons are first class objects You can create them, put them in data structures,

More information

Mul$media Networking. #9 CDN Solu$ons Semester Ganjil 2012 PTIIK Universitas Brawijaya

Mul$media Networking. #9 CDN Solu$ons Semester Ganjil 2012 PTIIK Universitas Brawijaya Mul$media Networking #9 CDN Solu$ons Semester Ganjil 2012 PTIIK Universitas Brawijaya Schedule of Class Mee$ng 1. Introduc$on 2. Applica$ons of MN 3. Requirements of MN 4. Coding and Compression 5. RTP

More information

Stay Informed During and AEer OpenWorld

Stay Informed During and AEer OpenWorld Stay Informed During and AEer OpenWorld TwiIer: @OracleBigData, @OracleExadata, @Infrastructure Follow #CloudReady LinkedIn: Oracle IT Infrastructure Oracle Showcase Page Oracle Big Data Oracle Showcase

More information

Scaling DreamFactory

Scaling DreamFactory Scaling DreamFactory This white paper is designed to provide information to enterprise customers about how to scale a DreamFactory Instance. The sections below talk about horizontal, vertical, and cloud

More information

Securing Hadoop. Keys Botzum, MapR Technologies Jan MapR Technologies - Confiden6al

Securing Hadoop. Keys Botzum, MapR Technologies Jan MapR Technologies - Confiden6al Securing Hadoop Keys Botzum, MapR Technologies kbotzum@maprtech.com Jan 2014 MapR Technologies - Confiden6al 1 Why Secure Hadoop Historically security wasn t a high priority Reflec6on of the type of data

More information

CISC 7610 Lecture 2b The beginnings of NoSQL

CISC 7610 Lecture 2b The beginnings of NoSQL CISC 7610 Lecture 2b The beginnings of NoSQL Topics: Big Data Google s infrastructure Hadoop: open google infrastructure Scaling through sharding CAP theorem Amazon s Dynamo 5 V s of big data Everyone

More information

Managing IoT and Time Series Data with Amazon ElastiCache for Redis

Managing IoT and Time Series Data with Amazon ElastiCache for Redis Managing IoT and Time Series Data with ElastiCache for Redis Darin Briskman, ElastiCache Developer Outreach Michael Labib, Specialist Solutions Architect 2016, Web Services, Inc. or its Affiliates. All

More information

MapReduce. Stony Brook University CSE545, Fall 2016

MapReduce. Stony Brook University CSE545, Fall 2016 MapReduce Stony Brook University CSE545, Fall 2016 Classical Data Mining CPU Memory Disk Classical Data Mining CPU Memory (64 GB) Disk Classical Data Mining CPU Memory (64 GB) Disk Classical Data Mining

More information

Stream and Complex Event Processing Discovering Exis7ng Systems: esper

Stream and Complex Event Processing Discovering Exis7ng Systems: esper Stream and Complex Event Processing Discovering Exis7ng Systems: esper G. Cugola E. Della Valle A. Margara Politecnico di Milano gianpaolo.cugola@polimi.it emanuele.dellavalle@polimi.it Univ. della Svizzera

More information