Friday, April 26, 13
2 Introduction to Map Reduce with Couchbase. Tugdual Grall, NoSQL Matters 13 - Cologne - April 25th 2013
3 About Me: Tugdual "Tug" Grall. Couchbase: Technical Evangelist; eXo: CTO; Oracle: Developer/Product Manager; mainly Java/SOA; developer in firms. http://blog.grallandco.com, tgrall, NantesJUG co-founder. Pet Project: http://
4 What's the Problem? Lots of Data; Big Data; Big Users; SaaS/Cloud Computing
5 Solution: Distribute the data and the processing of the data
6 Map Reduce: MapReduce is a programming model for processing large data sets, and the name of an implementation of the model by Google. MapReduce is typically used to do distributed computing on clusters of computers. http://research.google.com/archive/mapreduce.html
7 In detail: the developer specifies 2 methods.
map (in_key, in_value) -> list(out_key, intermediate_value): processes input data; produces key/value pairs.
reduce (out_key, list(intermediate_value)) -> list(out_value): combines all intermediate values for a particular key; produces a set of merged output values.
8 Execution
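To make the two signatures concrete, here is a minimal single-process sketch of the model using a word count. The names `mapFn`, `reduceFn` and `runMapReduce` and the two sample inputs are illustrative assumptions; a real framework distributes the map calls and the grouping across machines.

```javascript
// Word count expressed in the map/reduce model.
// All names and sample inputs here are illustrative.

// map(in_key, in_value) -> list of [out_key, intermediate_value]
function mapFn(docId, text) {
  return text
    .toLowerCase()
    .split(/\W+/)
    .filter((word) => word.length > 0)
    .map((word) => [word, 1]);
}

// reduce(out_key, list(intermediate_value)) -> merged output value
function reduceFn(word, counts) {
  return counts.reduce((total, n) => total + n, 0);
}

// Single-process driver: map every input, group by key, reduce each group.
function runMapReduce(inputs, map, reduce) {
  const groups = new Map();
  for (const [key, value] of Object.entries(inputs)) {
    for (const [outKey, outValue] of map(key, value)) {
      if (!groups.has(outKey)) groups.set(outKey, []);
      groups.get(outKey).push(outValue);
    }
  }
  const result = {};
  for (const [outKey, values] of groups) {
    result[outKey] = reduce(outKey, values);
  }
  return result;
}

const wordCounts = runMapReduce(
  { doc1: "map reduce with couchbase", doc2: "Map and Reduce" },
  mapFn,
  reduceFn
);
console.log(wordCounts);
```

The grouping step in the middle is the part a distributed implementation performs between the map and reduce phases.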
9 Most common use case Yahoo inc.
10 What about Couchbase?
11 Couchbase Open Source Project: Leading NoSQL database project focused on distributed database technology and surrounding ecosystem. Supports both key-value and document-oriented use cases. All components are available under the Apache 2.0 Public License. Obtained as packaged software in both enterprise and community editions.
12 Couchbase Server Core Principles. Easy Scalability: grow cluster without changes, with a single click. Consistent High Performance: consistent sub-millisecond read and write with consistent high throughput. Always On 24x365: no downtime for software upgrades, hardware maintenance, etc. Flexible Data Model: JSON document model with no fixed schema.
13 Additional Couchbase Server Features: built-in clustering; all nodes equal; data with auto-failover; zero-maintenance; built-in managed cache; append-only storage layer; online monitoring and admin API & UI; SDK for a variety of languages
14 Couchbase Server 2.0 Architecture [Diagram. Data Manager: Query API on port 8092, Query Engine, Memcapable, Memcapable 2.0, Moxi, Memcached, Couchbase EP Engine, storage interface, New Persistence Layer. Cluster Manager on Erlang/OTP: REST management API/Web UI over HTTP on port 8091, heartbeat, process monitor, global singleton supervisor, rebalance orchestrator, node health monitor, vBucket state; components run either on each node or one per cluster. Erlang port mapper on 4369, Distributed Erlang.]
15 Couchbase Server 2.0 Architecture [Same diagram, annotated by layer. Object-level Cache; RAM Cache, Indexing & Persistence Management (C & V8); Disk Persistence; Server/Cluster Management & Communication (Erlang). Reference: "The Unreasonable Effectiveness of C" by Damien Katz.]
16 Basic Operation [Diagram: two app servers, each with a Couchbase client library and cluster map, read/write/update against a Couchbase Server cluster of three servers holding active and replica docs; user-configured replica count = 1.] Docs distributed evenly across servers. Each server stores both active and replica docs; only one server active at a time. Client library provides app with simple interface to database. Cluster map provides map to which server doc is on; app never needs to know. App reads, writes, updates docs. Multiple app servers can access same document at same time.
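The role of the cluster map can be sketched as a lookup from document key to partition to server. Everything in this sketch is an assumption made for illustration: the server names, the partition count, the round-robin layout and the string hash are not Couchbase's own.

```javascript
// Hypothetical cluster map: a fixed number of partitions, each assigned
// to one active server and one replica server (replica count = 1).
const SERVERS = ["server1", "server2", "server3"];
const PARTITIONS = 12; // illustrative; a real cluster uses far more

// Simple string hash (an assumption for this sketch, not Couchbase's own).
function hashKey(key) {
  let h = 0;
  for (let i = 0; i < key.length; i++) {
    h = (h * 31 + key.charCodeAt(i)) >>> 0;
  }
  return h;
}

// Round-robin layout: the replica always lives on a different server.
function buildClusterMap(servers, partitions) {
  const map = [];
  for (let p = 0; p < partitions; p++) {
    map.push({
      active: servers[p % servers.length],
      replica: servers[(p + 1) % servers.length],
    });
  }
  return map;
}

const clusterMap = buildClusterMap(SERVERS, PARTITIONS);

// The client library does this lookup, so the app never needs to know.
function locate(key) {
  const partition = hashKey(key) % PARTITIONS;
  return { partition, ...clusterMap[partition] };
}

console.log(locate("u::1"));
```

Because every client holds the same map, any app server resolves a given key to the same active server without asking a coordinator.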
17 How to access the data?
18 Couchbase.get("my-key");
19 Look at a document: Key -> JSON OBJECT (DOCUMENT): { "string": "string", "string": value, "string": { "string": "string", "string": value }, "string": [ array ] }. How to find a document based on its attributes? (get employee by, get products by type...) You need to look into the document/value.
20 How to? Create an index!
21 Create the index. Metadata: { "id": "110f37fa30", "rev": " ", "expiration": 0, "flags": 0, "type": "json" }. Document: { "name": "Aventinus", "abv": 8.2, "ibu": 0, "srm": 0, "upc": 0, "type": "beer", "brewery_id": "110f1f2012", "updated": "... 20:00:20", "description": "Dark-ruby, ... Weizenbock", "category": "German Ale" }. Index (Key, Value): Aventinus, 8.2; Avenue Ale
22 Concrete Example: this map function receives the document and its metadata; as a developer you just have to emit the key and value (K,V).
23 Map Function
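A map function of the kind described on these two slides could look like the sketch below. The `(doc, meta)` parameters and the `emit(key, value)` call follow the slide's description; the `buildIndex` harness, the second beer and the brewery document are hypothetical, added only so the sketch runs outside the server.

```javascript
// `emit` is provided by the server inside a view; here a small harness
// binds it before each call so the sketch can run standalone.
let emit;

// Map function: index beers by name, with the alcohol by volume as value.
const mapFn = function (doc, meta) {
  if (doc.type === "beer" && doc.name) {
    emit(doc.name, doc.abv);
  }
};

// Hypothetical harness: run the map over every document and sort by key.
function buildIndex(documents, map) {
  const rows = [];
  for (const { meta, doc } of documents) {
    emit = (key, value) => rows.push({ id: meta.id, key, value });
    map(doc, meta);
  }
  rows.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
  return rows;
}

const documents = [
  // From the slide:
  { meta: { id: "110f37fa30" }, doc: { name: "Aventinus", abv: 8.2, type: "beer" } },
  // Hypothetical documents added for the example:
  { meta: { id: "sample-ale" }, doc: { name: "Sample Ale", abv: 5.0, type: "beer" } },
  { meta: { id: "sample-brewery" }, doc: { name: "Sample Brewery", type: "brewery" } },
];

const index = buildIndex(documents, mapFn);
console.log(index);
```

The brewery document produces no row because the map function only emits for documents whose `type` is `"beer"`.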
24 ?startkey=...&endkey=... (startkey values shown: "b1", "bz"; endkey values shown: "zn", "zz"). Pulls the index keys within the UTF-8 range specified by the startkey and endkey. Index (doc. key, meta.id): abba@couchbase.com, u::1; beta@couchbase.com, u::7; jasdeep@couchbase.com, u::2; math@couchbase.com, u::5; matt@couchbase.com, u::6; yeti@couchbase.com, u::4; zorro@couchbase.com, u::3
25 ?key=... Match a single index key. Same index as on the previous slide.
26 ?keys=[...] Query multiple keys in the set (array notation). Same index as on the previous slide.
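The three query styles above can be imitated against a small in-memory copy of the index. This is a sketch under assumptions: `queryIndex` is a hypothetical helper, plain JavaScript string comparison stands in for the server's key ordering, the range is treated as inclusive at both ends, and only five of the slide's rows are used.

```javascript
// Sorted index rows taken from the slides: key plus document id.
const emailIndex = [
  { key: "abba@couchbase.com", id: "u::1" },
  { key: "beta@couchbase.com", id: "u::7" },
  { key: "jasdeep@couchbase.com", id: "u::2" },
  { key: "math@couchbase.com", id: "u::5" },
  { key: "zorro@couchbase.com", id: "u::3" },
];

// Hypothetical helper mirroring ?key=, ?keys=[...] and ?startkey=/&endkey=.
function queryIndex(index, { startkey, endkey, key, keys } = {}) {
  if (key !== undefined) {
    return index.filter((row) => row.key === key);
  }
  if (keys !== undefined) {
    return index.filter((row) => keys.includes(row.key));
  }
  return index.filter(
    (row) =>
      (startkey === undefined || row.key >= startkey) &&
      (endkey === undefined || row.key <= endkey)
  );
}

const wideRange = queryIndex(emailIndex, { startkey: "b1", endkey: "zz" });
const narrowRange = queryIndex(emailIndex, { startkey: "bz", endkey: "zn" });
const single = queryIndex(emailIndex, { key: "beta@couchbase.com" });
const several = queryIndex(emailIndex, {
  keys: ["abba@couchbase.com", "zorro@couchbase.com"],
});

console.log(wideRange.map((row) => row.id));
console.log(narrowRange.map((row) => row.id));
console.log(single.map((row) => row.id));
console.log(several.map((row) => row.id));
```

Because the index is sorted by key, a range query on a real server is a contiguous scan rather than the full filter used here.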
27 How does it work?
28 Indexing and Querying [Diagram: two app servers, each with a Couchbase client library and cluster map, send a query to a Couchbase Server cluster of three servers holding active and replica docs; user-configured replica count = 1.] Indexing work is distributed amongst nodes: large data sets are possible and the effort is parallelized. Each node has the index for the data stored on it. Queries combine the results from the required nodes.
29 Couchbase Server 2.0: Views. Views can cover a few different use cases: primary index; simple secondary indexes (the most common); complex secondary, tertiary and composite indexes; aggregation functions (reduction), for example counting the number of North American Ales; organizing related data. Built using Map/Reduce: the map function creates a matrix from document fields; the reduce function summarizes (reduces) information.
30 Distributed Index Build Phase: Optimized for lookups, in-order access and aggregations. All view reads are served from disk (different performance profile). A view builds against every document on every node; this is why you should group views in a design document. Automatically kept up to date: incremental Map Reduce.
31 Dynamic Range Queries with Optional Aggregation: Efficiently fetch a row or group of related rows. Queries use cached values from B-tree inner nodes when possible. Take advantage of in-order tree traversal with group_level queries. Example: ?startkey="J"&endkey="K" returns {"rows":[{"key":"Juneau","value":null}]} [Diagram: three servers, each holding active docs and replica docs.]
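The effect of a group_level query can be sketched with array keys that are truncated to the first N elements before the reduction is applied. The rows below and the `groupCount` helper are hypothetical; a real server answers from the sorted B-tree and its cached reductions instead of scanning every row.

```javascript
// Hypothetical map output: compound keys [country, state, city], value 1.
const cityRows = [
  { key: ["France", "Loire-Atlantique", "Nantes"], value: 1 },
  { key: ["USA", "Alaska", "Anchorage"], value: 1 },
  { key: ["USA", "Alaska", "Juneau"], value: 1 },
  { key: ["USA", "Texas", "Austin"], value: 1 },
];

// Truncate each key to `groupLevel` elements and sum the values per group.
function groupCount(rows, groupLevel) {
  const groups = new Map();
  for (const row of rows) {
    const groupKey = row.key.slice(0, groupLevel);
    const id = JSON.stringify(groupKey);
    const entry = groups.get(id) || { key: groupKey, value: 0 };
    entry.value += row.value;
    groups.set(id, entry);
  }
  return [...groups.values()];
}

const byCountry = groupCount(cityRows, 1);
const byState = groupCount(cityRows, 2);
console.log(byCountry);
console.log(byState);
```

Since rows that share a key prefix sit next to each other in sorted order, each group corresponds to one contiguous stretch of the index.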
32 Append Only Index: Disk activity is slow. Updating disk blocks is very slow. Appending new data to the end of the current file is fast. Overhead of reverse reading is small. Because existing blocks are not re-used, this can lead to fragmentation; Couchbase will compact the index. [Diagram labels: View Processor, Disk, Changed Documents, Original, Appended.]
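The trade-off on this slide (fast appends, growing fragmentation, periodic compaction) can be illustrated with a toy in-memory store. `AppendOnlyStore` and its methods are hypothetical names; an array stands in for the file, and nothing here reflects the actual on-disk format.

```javascript
// Toy append-only store: writes only ever push to the end of the log.
class AppendOnlyStore {
  constructor() {
    this.log = []; // stands in for the file on disk
    this.latest = new Map(); // key -> position of the newest entry
  }

  // An update never rewrites an old entry; it appends a new one.
  put(key, value) {
    this.log.push({ key, value });
    this.latest.set(key, this.log.length - 1);
  }

  get(key) {
    const position = this.latest.get(key);
    return position === undefined ? undefined : this.log[position].value;
  }

  // Share of log entries that have been superseded by newer ones.
  fragmentation() {
    if (this.log.length === 0) return 0;
    return 1 - this.latest.size / this.log.length;
  }

  // Compaction: copy only the live entries into a fresh log.
  compact() {
    const live = [...this.latest.values()]
      .sort((a, b) => a - b)
      .map((position) => this.log[position]);
    this.log = [];
    this.latest = new Map();
    for (const { key, value } of live) {
      this.put(key, value);
    }
  }
}

const store = new AppendOnlyStore();
store.put("a", 1);
store.put("b", 2);
store.put("a", 3); // supersedes the first entry without touching it
console.log(store.log.length, store.fragmentation());
store.compact();
console.log(store.log.length, store.fragmentation());
```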
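33 Adding a new Document [Diagram: B-tree index with a reduction count on each node. Leaves: A-C 3, D-F 2, G-H 2, I-L 3, N-R 4 over keys A B C D F G H I K L N O Q R. Inner nodes: A-H 7, I-R 7. Root: A-R 14. Adding the new key M appends a new leaf M-R 5, new reductions I-R 8, and a new root A-R 15.]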
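The node labels on this slide (A-R 14 becoming A-R 15, I-R 7 becoming I-R 8, and so on) can be reproduced with a small sketch in which every node caches a count of the keys below it. The `leaf` and `inner` helpers are hypothetical, and the tree shape is hard-coded from the slide instead of being produced by a real B-tree insert.

```javascript
// Leaves hold sorted keys; every node caches a reduction (here: a count).
function leaf(keys) {
  return { keys, count: keys.length };
}

function inner(children) {
  return {
    children,
    count: children.reduce((total, child) => total + child.count, 0),
  };
}

// Tree before the insert, as drawn on the slide.
const leafAC = leaf(["A", "B", "C"]);
const leafDF = leaf(["D", "F"]);
const leafGH = leaf(["G", "H"]);
const leafIL = leaf(["I", "K", "L"]);
const leafNR = leaf(["N", "O", "Q", "R"]);
const nodeAH = inner([leafAC, leafDF, leafGH]);
const nodeIR = inner([leafIL, leafNR]);
const root = inner([nodeAH, nodeIR]);

// Append-only insert of key "M": write new copies along one path only
// and reuse every untouched node from the old tree.
const newLeafMR = leaf(["M", ...leafNR.keys]);
const newNodeIR = inner([leafIL, newLeafMR]);
const newRoot = inner([nodeAH, newNodeIR]);

console.log(root.count, newRoot.count);
```

Only the path from the changed leaf to the root is rewritten, which is why the old root stays valid for readers until the new one is in place.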
34 What about Reduce? Out-of-the-box functions: _count(), _sum(), _stats(). Create your own if needed:
function(key, values, rereduce) {
  if (rereduce) {
    var result = 0;
    for (var i = 0; i < values.length; i++) {
      result += values[i];
    }
    return result;
  } else {
    return values.length;
  }
}
35 Reduce Function: takes a key and arrays of values as parameters; written in JavaScript; called after the map function; used to reduce the result of a map of single values; used with grouping; can be ignored when querying, to reuse the index.
36 Reduce in Action. Map() result (Key, Value): Belgian-Style Dubbel, 1; Belgian-Style Dubbel, 1; Belgian-Style Dubbel, 1; Belgian-Style Pale Ale, 1; Belgian-Style White, 1; Belgian-Style White, 1. Reduce() _count() result (Key, Value): Belgian-Style Dubbel, 3; Belgian-Style Pale Ale, 1; Belgian-Style White, 2.
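To show why a custom reduce needs the `rereduce` branch, the sketch below splits the map rows from this slide across two hypothetical nodes, reduces each part on its own, and then combines the partial results. The split into `node1Rows` and `node2Rows` and the `reducePerKey` helper are assumptions made for illustration.

```javascript
// Counting reduce in the same shape as the custom function on the slide.
function countReduce(key, values, rereduce) {
  if (rereduce) {
    // Values are partial counts from earlier reduce calls: add them up.
    return values.reduce((sum, n) => sum + n, 0);
  }
  // Values are raw map output: count the rows.
  return values.length;
}

// Map rows from the slide, split across two hypothetical nodes.
const node1Rows = [
  ["Belgian-Style Dubbel", 1],
  ["Belgian-Style Dubbel", 1],
  ["Belgian-Style White", 1],
];
const node2Rows = [
  ["Belgian-Style Dubbel", 1],
  ["Belgian-Style Pale Ale", 1],
  ["Belgian-Style White", 1],
];

// Group rows by key and apply the reduce function to each group.
function reducePerKey(rows, rereduce) {
  const grouped = new Map();
  for (const [key, value] of rows) {
    if (!grouped.has(key)) grouped.set(key, []);
    grouped.get(key).push(value);
  }
  return [...grouped].map(([key, values]) => [
    key,
    countReduce(key, values, rereduce),
  ]);
}

// First pass on each node, then a rereduce pass over the partial counts.
const partialCounts = [
  ...reducePerKey(node1Rows, false),
  ...reducePerKey(node2Rows, false),
];
const finalCounts = Object.fromEntries(reducePerKey(partialCounts, true));
console.log(finalCounts);
```

Without the `rereduce` branch the second pass would count the partial results instead of adding them, so Dubbel would come out as 2 instead of 3.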
37 How to use it? Use the client SDK to call the view:
View view = client.getView("beer", "by_name");
Query query = new Query();
query.setIncludeDocs(true)
     .setLimit(20)
     .setRangeStart(ComplexKey.of(startKey))
     .setRangeEnd(ComplexKey.of(startKey + "\uefff"));
ViewResponse result = client.query(view, query);
for (ViewRow row : result) {
    ...
}
38 Demonstration
39 Hadoop & Couchbase: Deal with Big Data; more is better than faster; batch oriented; usually used to extract/transform data; fully distributed; Map, Shuffle, Reduce; distributed; executed where the document is; deal with indexing data; as fast as possible; use to query the data in the database
40 Map Reduce in Couchbase: Like many other NoSQL databases, used for queries! Indexes are distributed on each node of the cluster. Indexes are updated incrementally. Write your Map Reduce in JavaScript.
41 Thank you. Get Couchbase Server at http://
@ COUCHBASE CONNECT Using Couchbase By: Carleton Miyamoto, Michael Kehoe Version: 1.1w Overview The LinkedIn Story Enter Couchbase Development and Opera3ons Clusters and Numbers Opera3onal Tooling Carleton
More informationOpenWorld 2015 Oracle Par22oning
OpenWorld 2015 Oracle Par22oning Did You Think It Couldn t Get Any Be6er? Safe Harbor Statement The following is intended to outline our general product direc2on. It is intended for informa2on purposes
More informationNon-Relational Databases. Pelle Jakovits
Non-Relational Databases Pelle Jakovits 25 October 2017 Outline Background Relational model Database scaling The NoSQL Movement CAP Theorem Non-relational data models Key-value Document-oriented Column
More informationAn Introduction to Big Data Formats
Introduction to Big Data Formats 1 An Introduction to Big Data Formats Understanding Avro, Parquet, and ORC WHITE PAPER Introduction to Big Data Formats 2 TABLE OF TABLE OF CONTENTS CONTENTS INTRODUCTION
More informationECS 165B: Database System Implementa6on Lecture 3
ECS 165B: Database System Implementa6on Lecture 3 UC Davis April 4, 2011 Acknowledgements: some slides based on earlier ones by Raghu Ramakrishnan, Johannes Gehrke, Jennifer Widom, Bertram Ludaescher,
More informationCS6030 Cloud Computing. Acknowledgements. Today s Topics. Intro to Cloud Computing 10/20/15. Ajay Gupta, WMU-CS. WiSe Lab
CS6030 Cloud Computing Ajay Gupta B239, CEAS Computer Science Department Western Michigan University ajay.gupta@wmich.edu 276-3104 1 Acknowledgements I have liberally borrowed these slides and material
More information26- April- 2010, Spring Member Mee4ng Chris Hyzer, Grouper developer
26- April- 2010, Spring Member Mee4ng Chris Hyzer, Grouper developer XMPP integra4on XMPP and the Grouper loader XMPP and the Grouper client Kuali Rice integra4on Rice groups Rice subjects Automa4c workflow
More informationIntroduc)on to. CS60092: Informa0on Retrieval
Introduc)on to CS60092: Informa0on Retrieval Ch. 4 Index construc)on How do we construct an index? What strategies can we use with limited main memory? Sec. 4.1 Hardware basics Many design decisions in
More informationHigh Performance Drupal
High Performance Drupal A Panel Discussion 25 1 st St., Suite 104, Cambridge, MA 02141 www.bioraft.com Panelists Erik Peterson (eporama) Seth Cohn (sethcohn) Micky MeMs (freescholar) Patrick CorbeM (pcorbem)
More informationElas%c Load Balancing, Amazon CloudWatch, and Auto Scaling Sco) Linder
Elas%c Load Balancing, Amazon, and Auto Scaling Sco) Linder Overview Elas4c Load Balancing Features/Restric4ons Connec4on Types Listeners Configura4on Op4ons Auto Scaling Launch Configura4ons Scaling Types
More informationHadoop محبوبه دادخواه کارگاه ساالنه آزمایشگاه فناوری وب زمستان 1391
Hadoop محبوبه دادخواه کارگاه ساالنه آزمایشگاه فناوری وب زمستان 1391 Outline Big Data Big Data Examples Challenges with traditional storage NoSQL Hadoop HDFS MapReduce Architecture 2 Big Data In information
More informationRule 14 Use Databases Appropriately
Rule 14 Use Databases Appropriately Rule 14: What, When, How, and Why What: Use relational databases when you need ACID properties to maintain relationships between your data. For other data storage needs
More informationFunc%onal Programming in Scheme and Lisp
Func%onal Programming in Scheme and Lisp http://www.lisperati.com/landoflisp/ Overview In a func(onal programming language, func(ons are first class objects You can create them, put them in data structures,
More informationMul$media Networking. #9 CDN Solu$ons Semester Ganjil 2012 PTIIK Universitas Brawijaya
Mul$media Networking #9 CDN Solu$ons Semester Ganjil 2012 PTIIK Universitas Brawijaya Schedule of Class Mee$ng 1. Introduc$on 2. Applica$ons of MN 3. Requirements of MN 4. Coding and Compression 5. RTP
More informationStay Informed During and AEer OpenWorld
Stay Informed During and AEer OpenWorld TwiIer: @OracleBigData, @OracleExadata, @Infrastructure Follow #CloudReady LinkedIn: Oracle IT Infrastructure Oracle Showcase Page Oracle Big Data Oracle Showcase
More informationScaling DreamFactory
Scaling DreamFactory This white paper is designed to provide information to enterprise customers about how to scale a DreamFactory Instance. The sections below talk about horizontal, vertical, and cloud
More informationSecuring Hadoop. Keys Botzum, MapR Technologies Jan MapR Technologies - Confiden6al
Securing Hadoop Keys Botzum, MapR Technologies kbotzum@maprtech.com Jan 2014 MapR Technologies - Confiden6al 1 Why Secure Hadoop Historically security wasn t a high priority Reflec6on of the type of data
More informationCISC 7610 Lecture 2b The beginnings of NoSQL
CISC 7610 Lecture 2b The beginnings of NoSQL Topics: Big Data Google s infrastructure Hadoop: open google infrastructure Scaling through sharding CAP theorem Amazon s Dynamo 5 V s of big data Everyone
More informationManaging IoT and Time Series Data with Amazon ElastiCache for Redis
Managing IoT and Time Series Data with ElastiCache for Redis Darin Briskman, ElastiCache Developer Outreach Michael Labib, Specialist Solutions Architect 2016, Web Services, Inc. or its Affiliates. All
More informationMapReduce. Stony Brook University CSE545, Fall 2016
MapReduce Stony Brook University CSE545, Fall 2016 Classical Data Mining CPU Memory Disk Classical Data Mining CPU Memory (64 GB) Disk Classical Data Mining CPU Memory (64 GB) Disk Classical Data Mining
More informationStream and Complex Event Processing Discovering Exis7ng Systems: esper
Stream and Complex Event Processing Discovering Exis7ng Systems: esper G. Cugola E. Della Valle A. Margara Politecnico di Milano gianpaolo.cugola@polimi.it emanuele.dellavalle@polimi.it Univ. della Svizzera
More information