introduction to riak 1.2 Good data need good storage POWERED BY SmartRecruiters



2 pawel

3 hiring made easy. www.smartrecruiters.com


5 introduction to riak 1.2
- how we do it at smartrecruiters
- what is your data
- closer look at riak 1.2
- basic concepts
- configuration
- management
- java client and API overview
- searching with riak

6 I won't say much about
- mapreduce
- link walking
- secondary indexes
- alternative persistent storage backends: Bitcask, InnoDB, LevelDB, Memory and more
- managing the cluster with command line tools (not all of them)
- multiple distributions: riak, riak enterprise, riak cs

7 at smartrecruiters
- one oracle instance for storing data
- mongodb with a replica for storing data
- one solr instance for search
- one neo4j graph instance for relations
- there is not much data - we are not facebook - yet!

8 mongodb is great
- replication and fault tolerance
- better performance
- better locking
- auto sharding
- read/write concerns for tuning
- it has embedded mongo for testing
- good clients
- we are moving away from morphia to mongo-jackson-mapper

9 mongodb is great but it is a bit complex when you need it to scale

10 preparing for growth

11 what is your data model?
- how do you represent data?
- can model objects be bound in size?
- do you have lots of concurrent updates?
- can you detect conflicts and resolve them at runtime?
- is your data logically monotonic?

12 logically monotonic data
data is logically monotonic when we can model it so that two versions can always be merged without losing data
- ex. two objects representing keywords mentioned today on twitter can be merged into one object with all unique keywords ;]
- ex. two objects representing keywords not mentioned yesterday on twitter cannot be merged... :/
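
The two examples above can be sketched in a few lines of Python. This is only an illustration of the idea, not anything Riak does for you: the function name `merge_mentioned`, the sample keywords and the `vocabulary` set are all made up for the example.

```python
def merge_mentioned(a, b):
    """Merge two 'keywords mentioned today' objects: a plain set union."""
    return a | b


# Two replicas that each saw a different part of today's traffic.
replica_1 = {"riak", "erlang"}
replica_2 = {"riak", "bitcask"}
mentioned = merge_mentioned(replica_1, replica_2)

# The "not mentioned" view of the same replicas, over a hypothetical vocabulary.
vocabulary = {"riak", "erlang", "bitcask", "solr"}
not_mentioned_1 = vocabulary - replica_1
not_mentioned_2 = vocabulary - replica_2

naive_merge = not_mentioned_1 | not_mentioned_2  # same union as above
correct = vocabulary - mentioned                 # needs the full picture

print(sorted(mentioned))
print(sorted(naive_merge), "vs", sorted(correct))
```

The "mentioned" set only ever grows, so a union never loses anything. The "not mentioned" set shrinks as new mentions arrive, so the same union wrongly keeps keywords that the other replica has already seen.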

13 a key-value distributed store


15 BASIC CONCEPTS distributed, highly available system

16 CAP theorem
It is impossible for a distributed computer system to simultaneously provide all three of the following guarantees:
- Consistency (all nodes see the same data at the same time)
- Availability (a guarantee that every request receives a response about whether it was successful or failed)
- Partition tolerance (the system continues to operate despite arbitrary message loss or failure of part of the system)
You need to choose two! By default the Riak team has chosen Availability and Partition tolerance at the expense of Consistency.

17 CAP theorem fortunately you can tune this behaviour, see tunable CAP...

18 nodes, vnodes and partitions
in this example the cluster consists of:
- 32 partitions in the cluster (this is configurable)
- 4 physical nodes
- 8 vnodes per node
- 8 partitions per node
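
A rough sketch of how the numbers in this example fit together. It assumes a SHA-1 sized ring split into equal partitions that are handed to nodes round-robin. The node names, the `bucket/key` string that gets hashed and the helper names are assumptions for illustration; Riak's real claim algorithm and key encoding are more involved.

```python
import hashlib

RING_SIZE = 2 ** 160                 # size of the SHA-1 output space
NUM_PARTITIONS = 32                  # ring size from the example
NODES = ["node1", "node2", "node3", "node4"]  # hypothetical node names
PARTITION_SIZE = RING_SIZE // NUM_PARTITIONS


def owner_of(partition):
    """Round-robin claim: partition i belongs to node i mod 4."""
    return NODES[partition % len(NODES)]


def partition_for(bucket, key):
    """Hash the bucket/key pair and find the partition it falls into."""
    digest = hashlib.sha1(f"{bucket}/{key}".encode("utf-8")).digest()
    return int.from_bytes(digest, "big") // PARTITION_SIZE


vnodes_per_node = {
    node: sum(1 for p in range(NUM_PARTITIONS) if owner_of(p) == node)
    for node in NODES
}
print(vnodes_per_node)
print(partition_for("cars", "audi"), owner_of(partition_for("cars", "audi")))
```

With 32 partitions and 4 nodes every node ends up running 8 vnodes, one per partition it owns.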

19 vnodes and replication
each object is replicated to N separate partitions on the Riak Ring
HINTED HANDOFF - if the target vnode is unavailable, Riak will try to choose another node, which will spawn a temporary process to stand in for the failed node and keep operating as usual. this temporary process will keep trying to contact the failed node and will pass the data on to it.

20 vnodes and replication
For cases where the number of nodes is less than the N value, data will likely be duplicated on some nodes. For example, with N=3 and 2 nodes in the cluster, one node will likely have one replica, and the other node will have two replicas.
- In general N copies of each piece of data will be stored in the cluster.
- There are no guarantees that the N replicas will go to N separate physical nodes.
- Riak will rebalance/re-replicate data after each successful read if needed, the so-called read repair.
- There is no master node to manage read/write operations.
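
Read repair can be pictured roughly like this. The sketch assumes every replica carries a vector clock (a map of actor to counter) and that the replica whose clock descends all others wins. The replica names, clock contents and function names are invented for the example, and the case of concurrent clocks (siblings) is deliberately left out.

```python
def descends(a, b):
    """True if vector clock `a` has seen every event recorded in `b`."""
    return all(a.get(actor, 0) >= count for actor, count in b.items())


def read_repair(replicas):
    """replicas maps a replica name to (vclock, value).

    Overwrites stale replicas with the winning version and returns the
    names of the replicas that were repaired. Returns an empty list when
    no single version descends all others (that would mean siblings).
    """
    winner = None
    for clock, value in replicas.values():
        if all(descends(clock, other) for other, _ in replicas.values()):
            winner = (clock, value)
            break
    if winner is None:
        return []
    stale = [name for name, (clock, _) in replicas.items() if clock != winner[0]]
    for name in stale:
        replicas[name] = winner
    return stale


replicas = {
    "vnode_a": ({"client_x": 2}, "new value"),
    "vnode_b": ({"client_x": 1}, "old value"),
    "vnode_c": ({"client_x": 2}, "new value"),
}
print(read_repair(replicas))
```

In this run only `vnode_b` is behind, so it is the only replica rewritten after the read.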

21 replication in depth
1. each bucket/key pair is hashed by the node handling the request
2. the node requests a list of preferred vnodes for the hash
3. the node chooses the first N vnodes and sends the object to the parent node of each vnode
4. if a node is unavailable, the requesting node takes the next vnodes from the preference list and sends the same data to the next node - HINTED HANDOFF
5. if the operation is successful, the requesting node responds to the client
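
Steps 2 to 4 above can be sketched as follows. This is a simplified model, assuming 32 partitions claimed round-robin by four hypothetical nodes. The function `preference_list` and its return shape are made up for the example and are not Riak's internal API.

```python
NUM_PARTITIONS = 32
NODES = ["node1", "node2", "node3", "node4"]  # hypothetical node names


def owner_of(partition):
    return NODES[partition % len(NODES)]


def preference_list(first_partition, n_val=3, down_nodes=frozenset()):
    """Return n_val targets as (partition, node, is_fallback) tuples.

    Primaries are the n_val partitions following the hash on the ring.
    When a primary's node is down, the next partition with a live owner
    further along the ring takes the write instead (hinted handoff).
    """
    if set(NODES) <= set(down_nodes):
        raise RuntimeError("no node is available")
    targets = []
    fallback = (first_partition + n_val) % NUM_PARTITIONS
    for offset in range(n_val):
        partition = (first_partition + offset) % NUM_PARTITIONS
        node = owner_of(partition)
        if node not in down_nodes:
            targets.append((partition, node, False))
            continue
        while owner_of(fallback) in down_nodes:
            fallback = (fallback + 1) % NUM_PARTITIONS
        targets.append((partition, owner_of(fallback), True))
        fallback = (fallback + 1) % NUM_PARTITIONS
    return targets


print(preference_list(5))
print(preference_list(5, down_nodes={"node3"}))
```

With `node3` down, the replica meant for partition 6 is accepted by `node1` as a fallback, which would later hand the data back to `node3`.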

22 architecture built on top of the erlang ecosystem; the erlang distribution needs to be installed separately

23 node configuration
riak/etc/app.config
[
  %% Riak Client APIs config
  {riak_api, [
    {pb_ip, " " },
    {pb_port, 8087}
  ]},
  %% Riak Core config
  {riak_core, [
    %% Default location of ringstate
    {ring_state_dir, "./data/ring"},
    {http, [{" ", 8098}]},
    {https, [{" ", 8090}]},
    {ssl, [
      {certfile, "./etc/cert.pem"},
      {keyfile, "./etc/key.pem"}
    ]},
    {handoff_port, 8099 },
    ...
  ]},
  %% Riak KV config
  {riak_kv, [
    {storage_backend, riak_kv_bitcask_backend},
    ...
  ]

24 node configuration
riak/etc/vm.args
## Name of the riak node
-name
## Cookie for distributed erlang. All nodes in the same cluster
## should use the same cookie or they will not be able to communicate.
-setcookie riak67c e5c9de9bd24f8da591d5f7
## Heartbeat management; auto-restarts VM if it dies or becomes unresponsive
## (Disabled by default..use with caution!)
##-heart
## Enable kernel poll and a few async threads
+K true
+A 64
## Treat error_logger warnings as warnings
+W w
...

25 buckets organise keys and values in buckets, think of it as tables in SQL

26 default bucket configuration
riak/etc/app.config default bucket values
[
  %% Riak Core config
  {riak_core, [
    {default_bucket_props, [
      {n_val, 3},
      {allow_mult, false},
      {last_write_wins, false},
      {r, all},     %% n
      {w, quorum},  %% n/2+1
      {dw, 1},
      {rw, 2},
      {precommit, []},
      {postcommit, []},
      {chash_keyfun, {riak_core_util, chash_std_keyfun}},
      {linkfun, {modfun, riak_kv_wm_link_walker, mapreduce_linkfun}}
    ]}
  ]}
]

27 working with buckets
curl -v -X PUT -H "Content-Type: application/json" -d '{"props":{"n_val":2, "dw":1, "w":2, "r":1}}'
curl -v
curl -v
curl -v
in general there is no limit on the number of buckets: buckets come for free as long as the default configuration is used, because there are no custom bucket properties to gossip around the cluster.

28 tunable CAP
defaults:
- N - number of replicas, defaults to 3
- R, W, DW, RW - default to quorum
- PR, PW - default to 0
parameters:
- rw - quorum for both operations (get and put) involved in deleting an object (default is set at the bucket level)
- r - (read quorum) how many replicas need to agree when retrieving the object
- w - (write quorum) how many vnodes must confirm receiving the write request before returning a successful response
- dw - (durable write quorum) how many replicas must commit to durable storage before returning a successful response
- pr - (primary read quorum) works like r but requires that the nodes read from are not fallback nodes
- pw - (primary write quorum) how many replicas must be committed to primary nodes before returning a successful response
possible values: 1..N, one, all (N), quorum (N/2 + 1, with N/2 rounded down)
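
To make the symbolic values concrete, here is a small sketch that turns `one`, `quorum` and `all` into replica counts and derives two rules of thumb from them. The helper names are made up. The "tolerates" figures count only the N replica holders and ignore fallback vnodes, so treat them as a simplified model rather than Riak's exact behaviour.

```python
def resolve(value, n):
    """Translate a symbolic or numeric quorum value into a replica count."""
    if value == "one":
        return 1
    if value == "all":
        return n
    if value == "quorum":
        return n // 2 + 1
    if isinstance(value, int) and 1 <= value <= n:
        return value
    raise ValueError(f"invalid quorum value {value!r} for n_val={n}")


def describe(n, r, w):
    """Summarise what a given N/R/W combination implies."""
    r_count, w_count = resolve(r, n), resolve(w, n)
    return {
        "r": r_count,
        "w": w_count,
        # replicas that may be unreachable while the operation still succeeds
        "read_tolerates": n - r_count,
        "write_tolerates": n - w_count,
        # R + W > N means every read set shares a replica with every write set
        "read_write_overlap": r_count + w_count > n,
    }


print(describe(3, "quorum", "quorum"))
print(describe(3, "one", "one"))
print(describe(3, "all", "one"))
```

The trade-off is visible directly: lowering R and W tolerates more unreachable replicas but gives up the overlap between reads and writes.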

29 tunable CAP
how many down nodes can the cluster tolerate when performing operations?
Nodes | N | R | W | store operation | read operation

30 MANAGEMENT

31 Web console
riak/etc/app.config
[
  {riak_core, [
    {https, [{ " ", 8090 }]},
    {ssl, [
      {certfile, "./etc/cert.pem"},
      {keyfile, "./etc/key.pem"}
    ]}
  ]},
  {riak_control, [
    %% Set to false to disable the admin panel.
    {enabled, true},
    {auth, userlist},
    %% If auth is set to 'userlist' then this is the
    %% list of usernames and passwords for access to the
    %% admin panel.
    {userlist, [{"user", "pass"}]}
  ]}
]


34 cluster status
curl -v
or
riak/bin/riak-admin status
- node_gets: number of GETs coordinated by this node, including GETs to non-local vnodes, within the last minute
- node_gets_total: number of GETs coordinated by this node since the node was started
- node_puts: number of PUTs coordinated by this node, including PUTs to non-local vnodes, within the last minute
- node_puts_total: number of PUTs coordinated by this node since the node was started
- vnode_gets: number of GET operations coordinated by vnodes on this node within the last minute
- vnode_gets_total: number of GET operations coordinated by vnodes on this node since the node was started
- vnode_puts: number of PUT operations coordinated by vnodes on this node within the last minute
- vnode_puts_total: number of PUT operations coordinated by vnodes on this node since the node was started
- read_repairs: number of read repair operations this node has coordinated in the last minute
- read_repairs_total: number of read repair operations this node has coordinated since the node was started
also reported:
- request-response time statistics
- number of siblings statistics
- object size statistics
- system cpu/threads/memory usage statistics
- cluster operating statistics
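
The status output is plain JSON over HTTP, so it is easy to post-process. Below is a minimal sketch that works on an already fetched document. The counter names come from the list above, while the sample numbers and the `summarize` helper are invented for illustration.

```python
import json


def summarize(stats):
    """Pick a few one-minute counters and derive a read-repair ratio."""
    gets = stats.get("node_gets", 0)
    repairs = stats.get("read_repairs", 0)
    return {
        "gets_last_minute": gets,
        "puts_last_minute": stats.get("node_puts", 0),
        "read_repair_ratio": repairs / gets if gets else 0.0,
    }


# Hypothetical excerpt of a stats response, not real cluster output.
sample = json.loads('{"node_gets": 200, "node_puts": 50, "read_repairs": 10}')
summary = summarize(sample)
print(summary)
```

A read-repair ratio that keeps climbing can be a hint that replicas are drifting apart, for example after a node outage.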

35 cluster backup
centralized backups
riak/bin/riak-admin backup riak1.srv riak67c e5c9de9bd24f8da591d5f7 log all
riak/bin/riak-admin restore riak1.srv riak67c e5c9de9bd24f8da591d5f7 log all
one node backup - copy/archive files from the following directories
- riak/data/bitcask
- riak/data/ring
- riak/etc

36 DEVELOPMENT fetching and storing

37 HTTP API
basic api to communicate with the cluster, supports:
- storing objects
- fetching objects
- deleting objects
- searching objects
- link walking
- map reduce

38 GET Object
curl -v
HTTP/ OK
X-Riak-Vclock: a85hygbgzgdkbvicnmcuy5reyvfd6ejivcwa=
Vary: Accept-Encoding
Server: MochiWeb/1.1 WebMachine/1.9.0 (someone had painted it blue)
Link: </riak/cars>; rel="up"
Last-Modified: Wed, 03 Oct :58:23 GMT
ETag: "3mf1qgFRZiIL02ilCVZHVr"
Date: Fri, 05 Oct :00:14 GMT
Content-Type: application/json; charset=utf-8
Content-Length: 390

39 PUT/POST Object
curl -v -X PUT -d '{"name":"baz", "friends":["max"]}' -H "Content-Type: application/json" -H "X-Riak-Vclock: IszMk55zKYEhnzWBlKIniO8mUBAA==" -H "X-Riak-Meta-MyCustomHeader:.."
returnbody=true&r=2&w=5&dw=1
to generate sortable keys that are unique across the cluster
with POST the key can be auto generated
Riak generates keys using an Erlang-generated unique ID and a timestamp, hashed with SHA-1 and base-62 encoded for URL safety. Thus Riak doesn't make any guarantees of uniqueness for the keys it generates, but collisions are extremely unlikely.
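
The key generation described above can be imitated in a few lines. This is a sketch of the general recipe (unique ID plus timestamp, SHA-1, base-62), not Riak's actual implementation: the UUID stands in for the Erlang-generated ID, and the alphabet order and the `id:timestamp` layout are assumptions.

```python
import hashlib
import string
import time
import uuid

ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase  # 62 symbols


def base62(number):
    """Encode a non-negative integer using only URL-safe alphanumerics."""
    if number == 0:
        return ALPHABET[0]
    symbols = []
    while number:
        number, remainder = divmod(number, 62)
        symbols.append(ALPHABET[remainder])
    return "".join(reversed(symbols))


def generate_key(unique_id=None, timestamp=None):
    """Hash a unique ID together with a timestamp and base-62 encode the digest."""
    unique_id = uuid.uuid4().hex if unique_id is None else unique_id
    timestamp = time.time_ns() if timestamp is None else timestamp
    digest = hashlib.sha1(f"{unique_id}:{timestamp}".encode("utf-8")).digest()
    return base62(int.from_bytes(digest, "big"))


print(generate_key())
```

Because the key is a hash, two different inputs could in principle collide, which is why uniqueness is very likely rather than guaranteed. Hashing also means such keys carry no ordering, so they are not sortable by creation time.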

40 java PB (protocol buffers) client
final PBClientConfig config = new PBClientConfig.Builder()
    .withPort(8097)
    .withConnectionTimeoutMillis(200)
    .build();
// requests and retries are round-robined between nodes
final PBClusterConfig pbClusterConfig = new PBClusterConfig(200);
pbClusterConfig.addHosts(config, " ", " ",..);
final IRiakClient riakClient = RiakFactory.newClient(pbClusterConfig);
riakClient.setClientId(ClientId.generate());
final Bucket storeBucket = riakClient.createBucket("cars")
    .allowSiblings(true)
    .lazyLoadBucketProperties()
    .execute();
Car carsObject = //create any Car object
carsObject = storeBucket.store(carsObject)
    .withoutFetch()
    .returnBody(true)
    .execute();
final Bucket fetchBucket = riakClient.fetchBucket("cars").execute();
final Car fetched = fetchBucket.fetch(new Car(id)).execute();

41 helpers
com.basho.riak.client.cap.*
public interface Retrier {
    <T> T attempt(Callable<T> command) throws RiakRetryFailedException;
}
public interface Converter<T> {
    IRiakObject fromDomain(T domainObject, VClock vclock) throws ConversionException;
    T toDomain(IRiakObject riakObject) throws ConversionException;
}
public interface ConflictResolver<T> {
    T resolve(final Collection<T> siblings);
}
public interface Mutation<T> {
    T apply(T original);
}
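
The idea behind the `Retrier` helper is simple enough to sketch outside of Java. The Python below is an analogue, not the client's code: `attempt`, `RetryFailed` and the flaky command are hypothetical names chosen to mirror the interface above.

```python
import time


class RetryFailed(Exception):
    """Raised when every attempt has failed (mirrors RiakRetryFailedException)."""


def attempt(command, attempts=3, delay_seconds=0.0):
    """Call `command` until it succeeds or the attempts are used up."""
    last_error = None
    for _ in range(attempts):
        try:
            return command()
        except Exception as error:  # a real client would catch narrower errors
            last_error = error
            time.sleep(delay_seconds)
    raise RetryFailed(f"gave up after {attempts} attempts") from last_error


calls = {"count": 0}


def flaky_fetch():
    """Fails twice, then succeeds, like a node that is briefly unreachable."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("node unavailable")
    return "car-42"


result = attempt(flaky_fetch)
print(result, "after", calls["count"], "calls")
```

Combined with a client that round-robins between nodes, each retry has a good chance of landing on a healthy node.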

42 balancing requests? YEP:)

43 DEVELOPMENT commit hooks

44 hooks
- work like triggers in a good old SQL RDBMS
- can be triggered before or after an object is persisted
- hooks can
  - modify the object before persisting
  - prevent the object from being persisted
  - allow the object to be persisted
- post-commit hooks shouldn't modify the target object
- hooks are stored per bucket in its properties
- pre-commit hooks can be written in javascript or in erlang
- post-commit hooks can be written in erlang only
%% Riak Core config
{riak_core, [
  {default_bucket_props, [
    {n_val, 3},
    {allow_mult, false},
    {precommit, [{struct, [{<<"mod">>, <<"riak_search_kv_hook">>}, {<<"fun">>, <<"precommit">>}]}]},
    {postcommit, []}
  ]}
]}
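
The hook life cycle can be modelled in a few lines. This is a toy model in Python, not how hooks are written for Riak (those are Erlang or JavaScript functions referenced from bucket properties). The `store` function, the hooks and the in-memory `backend` dict are all hypothetical.

```python
class Rejected(Exception):
    """Raised by a pre-commit hook to stop the write."""


def require_name(obj):
    """Pre-commit: refuse objects without a name."""
    if not obj.get("name"):
        raise Rejected("name is required")
    return obj


def normalize_name(obj):
    """Pre-commit: return a modified copy that will be persisted."""
    return {**obj, "name": obj["name"].strip().lower()}


def store(backend, key, obj, precommit=(), postcommit=()):
    for hook in precommit:       # each hook may modify or reject the object
        obj = hook(obj)
    backend[key] = obj           # the write itself
    for hook in postcommit:      # runs after the write, gets a copy only
        hook(key, dict(obj))
    return obj


backend, audit_log = {}, []
bucket_props = {
    "precommit": [require_name, normalize_name],
    "postcommit": [lambda key, obj: audit_log.append(key)],
}
stored = store(backend, "k1", {"name": "  Baz "}, **bucket_props)
print(stored, audit_log)
```

A rejected write never reaches the backend and never triggers the post-commit hooks, which matches the trigger analogy from the slide.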

45 DEVELOPMENT searching and Yokozuna

46 search hooks
- the search index may be persisted on different vnodes than the data itself
- index data is stored in merge_index, inspired by the Lucene file format, Bitcask (Riak KV), and SSTables (Google's file structure)
- Riak Search uses timestamps, rather than vector clocks, for performance reasons

47 search hooks
- Riak Search does not use quorum values when writing (indexing) data. The data is written in a fire and forget model.
- Riak Search does use hinted-handoff to remain write-available when a node goes offline.
- Riak Search does not use quorum values when reading (querying) data. Only one copy of the data is read, and the partition is chosen based on what will create the most efficient query plan overall.

48 search hooks Riak Search indexes currently have no form of anti-entropy (such as read-repair). Furthermore, for performance and load balancing reasons, Search reads from 1 random node. This means that when a replica loss has occurred, inconsistent results may be returned.

49 configuration
search is disabled by default
%% Riak Search Config
{riak_search, [
  %% To enable Search functionality set this 'true'.
  {enabled, false}
]},
%% Merge Index Config
{merge_index, [
  %% The root dir to store search merge_index data
  {data_root, "./data/merge_index"},
  %% Size, in bytes, of the in-memory buffer.
  {buffer_rollover_size, },
  %% This is the maximum number of segments that will be
  %% compacted at once.
  {max_compact_segments, 20}
]},
next install the hook for a bucket
riak/bin/search-cmd install cars
OR
curl -X PUT -H "content-type:application/json" /riak/friend -d '{"props":{"precommit":[{"mod":"riak_search_kv_hook","fun":"precommit"}]}}'

50 indexing
curl -v -X PUT friend/ -d '{"name":"baz", "friends":{"name": "max milus", "age_int":"21st"}}' -H "Content-Type: application/json"
<?xml version="1.0" encoding="utf-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">21</int>
    <lst name="params">
      <str name="indent">on</str>
      <str name="start">0</str>
      <str name="q">mil*</str>
      <str name="q.op">or</str>
      <str name="filter"></str>
      <str name="df">friends_name</str>
      <str name="wt">standard</str>
      <str name="version">1.1</str>
      <str name="rows">1</str>
    </lst>
  </lst>
  <result name="response" numFound="1" start="0">
    <doc>
      <str name="id"> </str>
      <str name="name">baz</str>
      <str name="friends_name">max milus</str>
      <str name="friends_age">21</str>
    </doc>
  </result>
</response>

51 schema
riak/bin/search-cmd show-schema
riak/bin/search-cmd show-schema friend
- FIELDNAME_num - Numeric field. Uses Integer analyzer. Values are padded to 10 characters.
- FIELDNAME_int - Numeric field. Uses Integer analyzer. Values are padded to 10 characters.
- FIELDNAME_dt - Date field. Uses No-Op analyzer.
- FIELDNAME_date - Date field. Uses No-Op analyzer.
- FIELDNAME_txt - Full text field. Uses Standard Analyzer.
- FIELDNAME_text - Full text field. Uses Standard Analyzer.
- All other fields use the Whitespace analyzer.
{
  schema,
  [
    {version, "1.1"},
    {n_val, 1},
    {default_field, "friends_name"},
    {analyzer_factory, {erlang, text_analyzers, noop_analyzer_factory}}
  ],
  [
    %% Field names ending in "_num" are indexed as integers
    {dynamic_field, [
      {name, "*_num"},
      {type, integer},
      {skip, false},
      {analyzer_factory, {erlang, text_analyzers, integer_analyzer_factory}}
    ]},

52 analyzers
- whitespace analyzer: splits on whitespace, including spaces, tabs, newlines, carriage returns, etc.
- standard analyzer: uses a dictionary to split english sentences
- integer analyzer
- no-op analyzer: doesn't tokenize a field
- custom analyzer: written in Erlang
NOTE: the whitespace analyzer preserves capitalization and punctuation
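
To get a feel for what the simpler analyzers do, here are rough Python equivalents. They imitate the behaviour described above and in the schema notes (integers padded to 10 characters) and are not the Erlang implementations. In particular, the digit-extraction rule in `integer_analyzer` is an assumption.

```python
import re


def whitespace_analyzer(text):
    """Split on any run of whitespace; case and punctuation are kept."""
    return text.split()


def noop_analyzer(text):
    """Do not tokenize: the whole field value is a single term."""
    return [text]


def integer_analyzer(text, width=10):
    """Keep only the integers found in the text, left-padded with zeros."""
    return [digits.zfill(width) for digits in re.findall(r"\d+", text)]


print(whitespace_analyzer("Hello,  World!\tRiak"))
print(noop_analyzer("2012-10-05T12:00:00"))
print(integer_analyzer("21st"))
```

Padding makes numeric terms sort correctly as strings, which is what range queries over an inverted index rely on.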

53 data types
JSON
Content-Type: application/json
{
  "name": "Alyssa P. Hacker",
  "bio": "I'm an engineer, making awesome things.",
  "favorites": {
    "book": "The Moon is a Harsh Mistress",
    "album": "Magical Mystery Tour"
  }
}
indexed fields: name, bio, favorites_book, favorites_album
XML
Content-Type: application/xml
PLAIN TEXT
Content-Type: text/plain
ERLANG TERM
Content-Type: application/x-erlang
CUSTOM FORMAT
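
The mapping from nested JSON to indexed field names follows a simple pattern: nested keys are joined with an underscore. A sketch of that flattening, using the document from the slide, is shown below. The `flatten` helper is illustrative and only handles nested objects, not arrays.

```python
def flatten(document, prefix=""):
    """Turn nested JSON objects into flat, underscore-joined field names."""
    fields = {}
    for name, value in document.items():
        full_name = f"{prefix}_{name}" if prefix else name
        if isinstance(value, dict):
            fields.update(flatten(value, full_name))
        else:
            fields[full_name] = value
    return fields


document = {
    "name": "Alyssa P. Hacker",
    "bio": "I'm an engineer, making awesome things.",
    "favorites": {
        "book": "The Moon is a Harsh Mistress",
        "album": "Magical Mystery Tour",
    },
}
indexed = flatten(document)
print(list(indexed))
```

The same rule explains why the nested `friends.name` value from the indexing example shows up as `friends_name` in the search result.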

54 search quick overview
- Support for various mime types (JSON, XML, plain text, Erlang, Erlang binaries) for automatic data extraction
- Support for various analyzers (to break text into tokens) including a white space analyzer, an integer analyzer, and a no-op analyzer
- Robust, easy-to-use query language
  - Exact match queries
  - Wildcards
  - Inclusive/exclusive range queries
  - AND/OR/NOT support
  - Grouping
  - Prefix matching
  - Proximity searches
  - Term boosting
- Solr-like interface via HTTP (not Solr compatible)
- Protocol buffers interface
- Scoring and ranking for most relevant results
- Search queries as input for MapReduce jobs

55 Yokozuna overview
- very new project - just released its alpha
- architecture: one Jetty/Solr operating system process per node; all vnodes on a node share that one instance
- includes Solr
- handles all JVM management for you

56 how yammer did it? com/blog/technical/2011/03/28/riakand-scala-at-yammer/

57 FIN Questions?


More information

Distributed Systems. Fall 2017 Exam 3 Review. Paul Krzyzanowski. Rutgers University. Fall 2017

Distributed Systems. Fall 2017 Exam 3 Review. Paul Krzyzanowski. Rutgers University. Fall 2017 Distributed Systems Fall 2017 Exam 3 Review Paul Krzyzanowski Rutgers University Fall 2017 December 11, 2017 CS 417 2017 Paul Krzyzanowski 1 Question 1 The core task of the user s map function within a

More information

CA485 Ray Walshe NoSQL

CA485 Ray Walshe NoSQL NoSQL BASE vs ACID Summary Traditional relational database management systems (RDBMS) do not scale because they adhere to ACID. A strong movement within cloud computing is to utilize non-traditional data

More information

What Came First? The Ordering of Events in

What Came First? The Ordering of Events in What Came First? The Ordering of Events in Systems @kavya719 kavya the design of concurrent systems Slack architecture on AWS systems with multiple independent actors. threads in a multithreaded program.

More information

CSE 544 Principles of Database Management Systems. Magdalena Balazinska Winter 2015 Lecture 14 NoSQL

CSE 544 Principles of Database Management Systems. Magdalena Balazinska Winter 2015 Lecture 14 NoSQL CSE 544 Principles of Database Management Systems Magdalena Balazinska Winter 2015 Lecture 14 NoSQL References Scalable SQL and NoSQL Data Stores, Rick Cattell, SIGMOD Record, December 2010 (Vol. 39, No.

More information

Distributed Non-Relational Databases. Pelle Jakovits

Distributed Non-Relational Databases. Pelle Jakovits Distributed Non-Relational Databases Pelle Jakovits Tartu, 7 December 2018 Outline Relational model NoSQL Movement Non-relational data models Key-value Document-oriented Column family Graph Non-relational

More information

Distributed Computation Models

Distributed Computation Models Distributed Computation Models SWE 622, Spring 2017 Distributed Software Engineering Some slides ack: Jeff Dean HW4 Recap https://b.socrative.com/ Class: SWE622 2 Review Replicating state machines Case

More information

Putting together the platform: Riak, Redis, Solr and Spark. Bryan Hunt

Putting together the platform: Riak, Redis, Solr and Spark. Bryan Hunt Putting together the platform: Riak, Redis, Solr and Spark Bryan Hunt 1 $ whoami Bryan Hunt Client Services Engineer @binarytemple 2 Minimum viable product - the ideologically correct doctrine 1. Start

More information

Horizontal or vertical scalability? Horizontal scaling is challenging. Today. Scaling Out Key-Value Storage

Horizontal or vertical scalability? Horizontal scaling is challenging. Today. Scaling Out Key-Value Storage Horizontal or vertical scalability? Scaling Out Key-Value Storage COS 418: Distributed Systems Lecture 8 Kyle Jamieson Vertical Scaling Horizontal Scaling [Selected content adapted from M. Freedman, B.

More information

Scaling Out Key-Value Storage

Scaling Out Key-Value Storage Scaling Out Key-Value Storage COS 418: Distributed Systems Logan Stafman [Adapted from K. Jamieson, M. Freedman, B. Karp] Horizontal or vertical scalability? Vertical Scaling Horizontal Scaling 2 Horizontal

More information

Piqi-RPC. Exposing Erlang services via JSON, XML and Google Protocol Buffers over HTTP. Friday, March 25, 2011

Piqi-RPC. Exposing Erlang services via JSON, XML and Google Protocol Buffers over HTTP. Friday, March 25, 2011 Piqi-RPC Exposing Erlang services via JSON, XML and Google Protocol Buffers over HTTP 1 Anton Lavrik http://piqi.org http://www.alertlogic.com 2 Overview Call Erlang functions using HTTP POST : Server

More information

GFS-python: A Simplified GFS Implementation in Python

GFS-python: A Simplified GFS Implementation in Python GFS-python: A Simplified GFS Implementation in Python Andy Strohman ABSTRACT GFS-python is distributed network filesystem written entirely in python. There are no dependencies other than Python s standard

More information

Big Data Development CASSANDRA NoSQL Training - Workshop. November 20 to (5 days) 9 am to 5 pm HOTEL DUBAI GRAND DUBAI

Big Data Development CASSANDRA NoSQL Training - Workshop. November 20 to (5 days) 9 am to 5 pm HOTEL DUBAI GRAND DUBAI Big Data Development CASSANDRA NoSQL Training - Workshop November 20 to 24 2016 (5 days) 9 am to 5 pm HOTEL DUBAI GRAND DUBAI ISIDUS TECH TEAM FZE PO Box 9798 Dubai UAE, email training-coordinator@isidusnet

More information

Google File System 2

Google File System 2 Google File System 2 goals monitoring, fault tolerance, auto-recovery (thousands of low-cost machines) focus on multi-gb files handle appends efficiently (no random writes & sequential reads) co-design

More information

There is a tempta7on to say it is really used, it must be good

There is a tempta7on to say it is really used, it must be good Notes from reviews Dynamo Evalua7on doesn t cover all design goals (e.g. incremental scalability, heterogeneity) Is it research? Complexity? How general? Dynamo Mo7va7on Normal database not the right fit

More information

Transactions and ACID

Transactions and ACID Transactions and ACID Kevin Swingler Contents Recap of ACID transactions in RDBMSs Transactions and ACID in MongoDB 1 Concurrency Databases are almost always accessed by multiple users concurrently A user

More information

Cassandra- A Distributed Database

Cassandra- A Distributed Database Cassandra- A Distributed Database Tulika Gupta Department of Information Technology Poornima Institute of Engineering and Technology Jaipur, Rajasthan, India Abstract- A relational database is a traditional

More information

MMS Backup Manual Release 1.4

MMS Backup Manual Release 1.4 MMS Backup Manual Release 1.4 MongoDB, Inc. Jun 27, 2018 MongoDB, Inc. 2008-2016 2 Contents 1 Getting Started with MMS Backup 4 1.1 Backing up Clusters with Authentication.................................

More information

CS 138: Dynamo. CS 138 XXIV 1 Copyright 2017 Thomas W. Doeppner. All rights reserved.

CS 138: Dynamo. CS 138 XXIV 1 Copyright 2017 Thomas W. Doeppner. All rights reserved. CS 138: Dynamo CS 138 XXIV 1 Copyright 2017 Thomas W. Doeppner. All rights reserved. Dynamo Highly available and scalable distributed data store Manages state of services that have high reliability and

More information

6.830 Lecture Spark 11/15/2017

6.830 Lecture Spark 11/15/2017 6.830 Lecture 19 -- Spark 11/15/2017 Recap / finish dynamo Sloppy Quorum (healthy N) Dynamo authors don't think quorums are sufficient, for 2 reasons: - Decreased durability (want to write all data at

More information

CS555: Distributed Systems [Fall 2017] Dept. Of Computer Science, Colorado State University

CS555: Distributed Systems [Fall 2017] Dept. Of Computer Science, Colorado State University CS 555: DISTRIBUTED SYSTEMS [DYNAMO & GOOGLE FILE SYSTEM] Frequently asked questions from the previous class survey What s the typical size of an inconsistency window in most production settings? Dynamo?

More information

Large-Scale Key-Value Stores Eventual Consistency Marco Serafini

Large-Scale Key-Value Stores Eventual Consistency Marco Serafini Large-Scale Key-Value Stores Eventual Consistency Marco Serafini COMPSCI 590S Lecture 13 Goals of Key-Value Stores Export simple API put(key, value) get(key) Simpler and faster than a DBMS Less complexity,

More information

MongoDB - a No SQL Database What you need to know as an Oracle DBA

MongoDB - a No SQL Database What you need to know as an Oracle DBA MongoDB - a No SQL Database What you need to know as an Oracle DBA David Burnham Aims of this Presentation To introduce NoSQL database technology specifically using MongoDB as an example To enable the

More information

Nasuni Data API Nasuni Corporation Boston, MA

Nasuni Data API Nasuni Corporation Boston, MA Nasuni Corporation Boston, MA Introduction The Nasuni API has been available in the Nasuni Filer since September 2012 (version 4.0.1) and is in use by hundreds of mobile clients worldwide. Previously,

More information

CS 138: Google. CS 138 XVI 1 Copyright 2017 Thomas W. Doeppner. All rights reserved.

CS 138: Google. CS 138 XVI 1 Copyright 2017 Thomas W. Doeppner. All rights reserved. CS 138: Google CS 138 XVI 1 Copyright 2017 Thomas W. Doeppner. All rights reserved. Google Environment Lots (tens of thousands) of computers all more-or-less equal - processor, disk, memory, network interface

More information

Bigtable. Presenter: Yijun Hou, Yixiao Peng

Bigtable. Presenter: Yijun Hou, Yixiao Peng Bigtable Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach Mike Burrows, Tushar Chandra, Andrew Fikes, Robert E. Gruber Google, Inc. OSDI 06 Presenter: Yijun Hou, Yixiao Peng

More information

Mastering phpmyadmiri 3.4 for

Mastering phpmyadmiri 3.4 for Mastering phpmyadmiri 3.4 for Effective MySQL Management A complete guide to getting started with phpmyadmin 3.4 and mastering its features Marc Delisle [ t]open so 1 I community experience c PUBLISHING

More information

Elixir Domain Configuration and Administration

Elixir Domain Configuration and Administration Elixir Domain Configuration and Administration Release 4.0.0 Elixir Technology Pte Ltd Elixir Domain Configuration and Administration: Release 4.0.0 Elixir Technology Pte Ltd Published 2015 Copyright 2015

More information

Distributed PostgreSQL with YugaByte DB

Distributed PostgreSQL with YugaByte DB Distributed PostgreSQL with YugaByte DB Karthik Ranganathan PostgresConf Silicon Valley Oct 16, 2018 1 CHECKOUT THIS REPO: github.com/yugabyte/yb-sql-workshop 2 About Us Founders Kannan Muthukkaruppan,

More information

CS 138: Google. CS 138 XVII 1 Copyright 2016 Thomas W. Doeppner. All rights reserved.

CS 138: Google. CS 138 XVII 1 Copyright 2016 Thomas W. Doeppner. All rights reserved. CS 138: Google CS 138 XVII 1 Copyright 2016 Thomas W. Doeppner. All rights reserved. Google Environment Lots (tens of thousands) of computers all more-or-less equal - processor, disk, memory, network interface

More information

Soir 1.4 Enterprise Search Server

Soir 1.4 Enterprise Search Server Soir 1.4 Enterprise Search Server Enhance your search with faceted navigation, result highlighting, fuzzy queries, ranked scoring, and more David Smiley Eric Pugh *- PUBLISHING -J BIRMINGHAM - MUMBAI Preface

More information

Nasuni Data API Nasuni Corporation Boston, MA

Nasuni Data API Nasuni Corporation Boston, MA Nasuni Corporation Boston, MA Introduction The Nasuni API has been available in the Nasuni Filer since September 2012 (version 4.0.1) and is in use by hundreds of mobile clients worldwide. Previously,

More information

References. What is Bigtable? Bigtable Data Model. Outline. Key Features. CSE 444: Database Internals

References. What is Bigtable? Bigtable Data Model. Outline. Key Features. CSE 444: Database Internals References CSE 444: Database Internals Scalable SQL and NoSQL Data Stores, Rick Cattell, SIGMOD Record, December 2010 (Vol 39, No 4) Lectures 26 NoSQL: Extensible Record Stores Bigtable: A Distributed

More information

/ Cloud Computing. Recitation 10 March 22nd, 2016

/ Cloud Computing. Recitation 10 March 22nd, 2016 15-319 / 15-619 Cloud Computing Recitation 10 March 22nd, 2016 Overview Administrative issues Office Hours, Piazza guidelines Last week s reflection Project 3.3, OLI Unit 4, Module 15, Quiz 8 This week

More information

Primary-Backup Replication

Primary-Backup Replication Primary-Backup Replication CS 240: Computing Systems and Concurrency Lecture 7 Marco Canini Credits: Michael Freedman and Kyle Jamieson developed much of the original material. Simplified Fault Tolerance

More information

Big Data Analytics. Rasoul Karimi

Big Data Analytics. Rasoul Karimi Big Data Analytics Rasoul Karimi Information Systems and Machine Learning Lab (ISMLL) Institute of Computer Science University of Hildesheim, Germany Big Data Analytics Big Data Analytics 1 / 1 Outline

More information

! Design constraints. " Component failures are the norm. " Files are huge by traditional standards. ! POSIX-like

! Design constraints.  Component failures are the norm.  Files are huge by traditional standards. ! POSIX-like Cloud background Google File System! Warehouse scale systems " 10K-100K nodes " 50MW (1 MW = 1,000 houses) " Power efficient! Located near cheap power! Passive cooling! Power Usage Effectiveness = Total

More information

Accelerating NoSQL. Running Voldemort on HailDB. Sunny Gleason March 11, 2011

Accelerating NoSQL. Running Voldemort on HailDB. Sunny Gleason March 11, 2011 Accelerating NoSQL Running Voldemort on HailDB Sunny Gleason March 11, 2011 whoami Sunny Gleason, human passion: distributed systems engineering previous... Ning : custom social networks Amazon.com : infra

More information

Technical Deep Dive: Cassandra + Solr. Copyright 2012, Think Big Analy7cs, All Rights Reserved

Technical Deep Dive: Cassandra + Solr. Copyright 2012, Think Big Analy7cs, All Rights Reserved Technical Deep Dive: Cassandra + Solr Confiden7al Business case 2 Super scalable realtime analytics Hadoop is fantastic at performing batch analytics Cassandra is an advanced column family oriented system

More information

MarkLogic Server. Database Replication Guide. MarkLogic 9 May, Copyright 2017 MarkLogic Corporation. All rights reserved.

MarkLogic Server. Database Replication Guide. MarkLogic 9 May, Copyright 2017 MarkLogic Corporation. All rights reserved. Database Replication Guide 1 MarkLogic 9 May, 2017 Last Revised: 9.0-3, September, 2017 Copyright 2017 MarkLogic Corporation. All rights reserved. Table of Contents Table of Contents Database Replication

More information

1 Big Data Hadoop. 1. Introduction About this Course About Big Data Course Logistics Introductions

1 Big Data Hadoop. 1. Introduction About this Course About Big Data Course Logistics Introductions Big Data Hadoop Architect Online Training (Big Data Hadoop + Apache Spark & Scala+ MongoDB Developer And Administrator + Apache Cassandra + Impala Training + Apache Kafka + Apache Storm) 1 Big Data Hadoop

More information

Big Data Infrastructure CS 489/698 Big Data Infrastructure (Winter 2016)

Big Data Infrastructure CS 489/698 Big Data Infrastructure (Winter 2016) Big Data Infrastructure CS 489/698 Big Data Infrastructure (Winter 2016) Week 10: Mutable State (1/2) March 15, 2016 Jimmy Lin David R. Cheriton School of Computer Science University of Waterloo These

More information

MongoDB Architecture

MongoDB Architecture VICTORIA UNIVERSITY OF WELLINGTON Te Whare Wananga o te Upoko o te Ika a Maui MongoDB Architecture Lecturer : Dr. Pavle Mogin SWEN 432 Advanced Database Design and Implementation Advanced Database Design

More information

Hadoop An Overview. - Socrates CCDH

Hadoop An Overview. - Socrates CCDH Hadoop An Overview - Socrates CCDH What is Big Data? Volume Not Gigabyte. Terabyte, Petabyte, Exabyte, Zettabyte - Due to handheld gadgets,and HD format images and videos - In total data, 90% of them collected

More information

Everything You Need to Know About MySQL Group Replication

Everything You Need to Know About MySQL Group Replication Everything You Need to Know About MySQL Group Replication Luís Soares (luis.soares@oracle.com) Principal Software Engineer, MySQL Replication Lead Copyright 2017, Oracle and/or its affiliates. All rights

More information

The NoSQL Ecosystem. Adam Marcus MIT CSAIL

The NoSQL Ecosystem. Adam Marcus MIT CSAIL The NoSQL Ecosystem Adam Marcus MIT CSAIL marcua@csail.mit.edu / @marcua About Me Social Computing + Database Systems Easily Distracted: Wrote The NoSQL Ecosystem in The Architecture of Open Source Applications

More information

BigData and Map Reduce VITMAC03

BigData and Map Reduce VITMAC03 BigData and Map Reduce VITMAC03 1 Motivation Process lots of data Google processed about 24 petabytes of data per day in 2009. A single machine cannot serve all the data You need a distributed system to

More information

run your own search engine. today: Cablecar

run your own search engine. today: Cablecar run your own search engine. today: Cablecar Robert Kowalski @robinson_k http://github.com/robertkowalski Search nobody uses that, right? Services on the Market Google Bing Yahoo ask Wolfram Alpha Baidu

More information

2/26/2017. For instance, consider running Word Count across 20 splits

2/26/2017. For instance, consider running Word Count across 20 splits Based on the slides of prof. Pietro Michiardi Hadoop Internals https://github.com/michiard/disc-cloud-course/raw/master/hadoop/hadoop.pdf Job: execution of a MapReduce application across a data set Task:

More information

Bringing Riak to the Mobile Platform

Bringing Riak to the Mobile Platform Bringing Riak to the Mobile Platform Kresten Krab Thorup Hacker @drkrab Outline Riak and it s Data Model RiakSync, a protocol for Key/Value synchronization RiakMobile, Riak clients for mobile What is Riak,

More information

Maintaining the NDS Database

Maintaining the NDS Database Chapter 7 Maintaining the NDS Database Overview..................................................................2 Concepts to Know....................................................... 2 Preserving the

More information

Architekturen für die Cloud

Architekturen für die Cloud Architekturen für die Cloud Eberhard Wolff Architecture & Technology Manager adesso AG 08.06.11 What is Cloud? National Institute for Standards and Technology (NIST) Definition On-demand self-service >

More information