Using Erlang, Riak and the ORSWOT CRDT at bet365 for Scalability and Performance. Michael Owen, Research and Development Engineer
3 Background
4 bet365 in stats
- Founded in 2000
- Located in Stoke-on-Trent
- The largest online sports betting company
- Over 19 million customers
- One of the largest private companies in the UK
- Employs more than 2,000 people
- Over 26 billion was staked; last year is likely to be around 25% up
- Business growing very rapidly!
- Very technology focused company
5 bet365 technology stats
- Over 500 employees within technology
- 60 million per year IT budget
- Fifteen datacentres in seven countries worldwide
- 100Gb capable private dark fibre network
- 9 upstream ISPs
- 150 Gigabits of aggregated edge bandwidth
- 25 Gbps and 6M HTTP requests/sec at peak
- Around 1 to 1.5 million markets on site at any time
- 18 languages supported
- Push systems burst to 100,000 changes per second; almost all this change generated via automated models
- Database systems running at > 500K TPS at peak
- Over 2.5 million concurrent users of our push systems
- We stream more live sport than anyone else in Europe
6 Production systems using Erlang and Riak
- Cash-out: a system used by customers to close out bets early.
- Stronger: an online transaction processing (OLTP) data layer.
7 Why Erlang and Riak?
8 Our historical technology stack
- Very pragmatic: whatever would deliver a quality product to market in record time
- Mostly .NET with some Java middleware
- Lots and lots of SQL Server
9 But we needed to change
- Complexity of code and systems
- Needed to make better use of multi-core CPUs
- Needed to scale out
- Could no longer scale our SQL infrastructure; we had scaled up and out as far as we could
- Lack of scalability caused undue stress on the infrastructure, which led to loss of availability
10 Erlang Adoption
11 Erlang Key learnings
- You can get a lot done in a short space of time. A plus and a minus!
- Tooling is limited
- Hot code upgrades with state can be hard
- Dependency management could be better
- Get as much visibility into the system as possible, e.g. stats, data etc.
- Use OTP / reuse proven code
- Keep to standards (e.g. code layout etc.)
12 Erlang Key learnings
- Check your supervision tree
- Message passing is a double-edged sword
- Keep state small (e.g. gen_server state)
- Binaries and GC: heap vs reference-counted
- Explore whether you need the Transparent Huge Pages (THP) feature in the Linux kernel
- Don't use error_logger as your main logger
- Validate all data coming into the Erlang system at the edge
13 Riak Adoption
14 Riak Brief overview
- Key-value store inspired by the Dynamo paper
- Traditionally an eventually consistent system (AP from CAP)
- Riak 2.0+: introduction of a strongly consistent option (CP from CAP)
- A Riak cluster is a 160-bit integer space, the Riak Ring
- The ring is split into partitions; a virtual node (vnode) is responsible for each one
- A vnode lives on one of the Riak nodes
- Data is stored in a number of partitions (n_val setting, default 3)
- A consistent hashing technique identifies the partitions for putting and getting the data
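The ring lookup described above can be sketched in a few lines. This is a simplified Python model (not Riak's actual implementation); the partition count is the common default, and the bucket/key names are made up for illustration:

```python
import hashlib

RING_SIZE = 2 ** 160          # Riak's ring is a 160-bit integer space
NUM_PARTITIONS = 64           # illustrative ring size (a common default)
N_VAL = 3                     # replicas per key (the n_val default)

def key_hash(bucket, key):
    # Riak hashes the bucket/key pair with SHA-1 onto the ring
    return int.from_bytes(hashlib.sha1(bucket + key).digest(), "big")

def preference_list(bucket, key):
    # Each partition owns an equal slice of the ring; a key maps to the
    # partition whose slice contains its hash, then to the next
    # N_VAL - 1 partitions clockwise around the ring.
    slice_size = RING_SIZE // NUM_PARTITIONS
    first = key_hash(bucket, key) // slice_size
    return [(first + i) % NUM_PARTITIONS for i in range(N_VAL)]

pl = preference_list(b"users", b"customer-42")
assert len(pl) == N_VAL
assert all(0 <= p < NUM_PARTITIONS for p in pl)
```

Because the hash (not the node list) determines placement, any node can compute where a key lives, which is what makes the masterless design possible.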
15 Riak Why?
- Open source
- Uses Erlang
- Based on solid ideas
- Horizontally scalable
- Highly available
- Masterless: no global locks, so performance is predictable
- Designed to be operationally simple
- Support and community exist
16 Riak Key learnings
- Eventually consistent: eventually the data will converge to the consistent value
- Keep in mind: a get/read may return an old value
- A put/write may be accepted for a key at the same time as another concurrent put for the same key in the cluster (i.e. no global locking)
- Bend your problem! E.g. look at it from another side
- Data model for your use case, i.e. normalisation isn't key: trade off puts vs gets for your use case
- No object sizes bigger than 1MB
- Store data in a structure which helps with version upgrades
- Riak Enterprise: Multi-Datacenter Replication + support
17 Riak Key learnings
- Consult Riak's System Performance Tuning documentation
- Separate internode network vs inbound
- Monitor network and disk usage
- Use the riak-admin diag command
- Use Basho Bench to load test your cluster
- For the bitcask backend: load test merging and tune for your use case; setting log_needs_merge to true will help with this tuning
- Allow siblings (i.e. allow_mult = true), resolving siblings asap
18 Riak What are siblings?
- A sibling happens when Riak does not know which value is the causally most recent (e.g. because of concurrent puts/writes)
- Riak uses version vectors to know this: version vector A is Concurrent to version vector B (as opposed to Descends or Dominates), explained later
- They are referenced as vector clocks in the Riak documentation, but should have been named version vectors
- Talk: A Brief History of Time in Riak. Sean Cribbs (Basho). RICON
- Similar logic, however: vector clocks are about tracking events in a computation, while version vectors are about tracking updates to data replicas
- Riak 2.0 introduced the option of dotted version vectors instead
- A similar idea to the ORSWOT CRDT dot functionality (explained later); causality tracking is more accurate, which reduces the potential number of siblings and limits sibling explosion
- All sibling values are returned (i.e. more than one), a big difference to the normal experience with SQL-type data stores
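The Descends / Dominates / Concurrent relationships mentioned above can be illustrated with a small Python sketch. Version vectors are modelled as actor-to-counter dicts, and the function names here are mine, not Riak's:

```python
def descends(a, b):
    # A descends B if every counter in B is matched or exceeded in A,
    # i.e. A has seen everything B has seen.
    return all(a.get(actor, 0) >= count for actor, count in b.items())

def compare(a, b):
    # The causal relationship between two version vectors.
    if descends(a, b) and descends(b, a):
        return "equal"
    if descends(a, b):
        return "dominates"   # A strictly descends B
    if descends(b, a):
        return "descended"   # B strictly descends A
    return "concurrent"      # neither has seen the other's updates

assert compare({"x": 2, "y": 1}, {"x": 1}) == "dominates"
assert compare({"x": 1}, {"y": 1}) == "concurrent"
```

The "concurrent" case is exactly the one that produces siblings: neither write causally descends the other, so Riak cannot pick a winner without losing data.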
19 Riak allow_mult=false
You can set allow_mult to false (i.e. no siblings to the client) with:
- last_write_wins set to false: uses version vectors; in conflict, the sibling with the highest timestamp wins
- last_write_wins set to true: doesn't use version vectors; the new value overwrites the current value
However, this is not recommended* because of potential data loss:
- Network problems. Reading: Fallacies of Distributed Computing Explained, Arnon Rotem-Gal-Oz
- Complexity of time synchronisation across servers (speed of light, machines fail etc.). Reading: There is No Now - Problems with simultaneity in distributed systems, Justin Sheehy
* Perhaps acceptable for immutable data with separate unique keys
20 Riak Dealing with siblings
- Sibling values are returned on a get request
- You need a merge function to produce the correct value
- The merge function should be deterministic by having the following properties:
  - Associativity: the order in which the merge function is applied to the data doesn't matter, as long as the sequence of the data items is not changed
  - Commutativity: the order of inputs into the merge function does not matter
  - Idempotence: the merge function applied twice to the same value results in the same value
- This can be hard to get right, and can lead to possible data loss and incorrect data
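These three properties can be checked mechanically for a candidate merge function. A minimal Python sketch, using set union as the merge function and made-up sample sibling values:

```python
def merge(a, b):
    # Set union is associative, commutative and idempotent, so merging
    # siblings in any order, any number of times, converges to one value.
    return a | b

# Three hypothetical sibling values for the same key
r1, r2, r3 = {"bet1"}, {"bet1", "bet2"}, {"bet3"}

assert merge(merge(r1, r2), r3) == merge(r1, merge(r2, r3))  # associative
assert merge(r1, r2) == merge(r2, r1)                        # commutative
assert merge(r1, r1) == r1                                   # idempotent
```

Union alone cannot express removal (a removed element reappears on merge); this is precisely the gap the ORSWOT covered later in the talk closes.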
21 CRDTs
22 CRDTs What are they?
- Conflict-Free Replicated Data Types
- Can be:
  - Operation based: Commutative Replicated Data Types
  - State based: Convergent Replicated Data Types
- Reduces complexity by having no client-side siblings, but still having no data loss
- Readings:
  - A comprehensive study of Convergent and Commutative Replicated Data Types. Marc Shapiro, Nuno Preguiça, Carlos Baquero, Marek Zawirski
  - Conflict-free replicated data types. Marc Shapiro, Nuno Preguiça, Carlos Baquero, Marek Zawirski
  - CRDTs: Consistency without concurrency control. Mihai Letia, Nuno Preguiça, Marc Shapiro
23 CRDTs Operation based
- Commutative Replicated Data Types
- All replicas of the data are sent operational updates
- Relies more on a good network and reliably delivering updates
- Knowing the current true membership is more important
24 CRDTs State based
- Convergent Replicated Data Types
- Data is locally updated, sent to replicas and merged
- The update function must be monotonically increasing
- Generally easier to understand than operation based
- Easier to have an elastic membership of replicas
- However, more data is sent around the network
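The state-based pattern above (update locally, then merge with a monotonic function) is easiest to see in a G-Counter, the simplest convergent type. A minimal Python sketch, with dicts modelling per-actor state (illustrative only, not any particular library's API):

```python
def increment(counter, actor):
    # Each replica only ever increments its own slot
    c = dict(counter)
    c[actor] = c.get(actor, 0) + 1
    return c

def merge(a, b):
    # Element-wise maximum: monotonically increasing, so replicas
    # converge no matter how merges are ordered or repeated
    return {actor: max(a.get(actor, 0), b.get(actor, 0))
            for actor in a.keys() | b.keys()}

def value(counter):
    return sum(counter.values())

a = increment(increment({}, "x"), "x")   # replica x counted twice
b = increment({}, "y")                   # replica y counted once
assert value(merge(a, b)) == 3
assert merge(a, b) == merge(b, a)
```

The per-actor slots are what make the merge safe: taking a plain maximum of a single shared total would lose concurrent increments.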
25 CRDTs Types
- Different data type implementations exist for counters, sets and maps
- For our core use case the Sets data type made sense
- However, we were and are using Riak 1.4+, where a Sets CRDT isn't available; it was introduced in Riak 2.0+
- At the time, Riak 2.0+ wasn't even a release candidate
26 CRDTs riak_dt
- We decided to use the riak_dt dependency (Apache License Version 2.0) and integrate it ourselves into our system using Riak
- Different set-based implementations exist (all state-based CRDTs):
  - G-Set: grow-only set, i.e. no remove
  - OR-Set: Observe Remove Set. Able to add and remove; however, when an element is removed a tombstone still exists, i.e. a size problem
  - ORSWOT: Observe Remove Set Without Tombstones. Able to add and remove, but doesn't have tombstones; in a concurrent add and remove of the same element, the add wins
- Reading: An optimized conflict-free replicated set. Annette Bieniusa, Marek Zawirski, Nuno Preguiça, Marc Shapiro, Carlos Baquero, Valter Balegas, Sérgio Duarte
- For our core use case we chose to use the ORSWOT
27 CRDTs ORSWOT overview
Example: the value stored under a key is the ORSWOT itself. The ORSWOT version vector comes first, then the entries (shown as a list for the purpose of example), each pairing an element (i.e. the data) with its dots:
<<"KEY_NAME">> = { [ {x,1}, {y,2} ],
                   [ {<<"Data1">>, [ {x,1} ] },
                     {<<"Data2">>, [ {y,1} ] },
                     {<<"Data3">>, [ {y,2} ] } ] }
- A version vector exists as part of the ORSWOT: a list of tuples, each tuple being {UniqueActorName, Counter}
- ORSWOT operations happen through a unique actor
- This is a similar but different instance of version vector to the one that will also exist for the overall key/value
- Dots exist for each element in the ORSWOT set: just a minimal version vector
- Each dot pair represents a tuple in the ORSWOT version vector at a particular point
- Usually a list of one; it can be a list of more than one when, for example, two ORSWOTs holding two concurrent adds of the same element (i.e. using two different unique actors) are merged
28 CRDTs ORSWOT overview
Adding an element:
- The version vector as part of the ORSWOT is incremented: the counter for the unique actor being used is incremented (or set to 1 if not currently in the version vector)
- The updated {UniqueActorName, Counter} pair is stored with the element as its dots
- If an entry already exists in the ORSWOT for the element, it is replaced
Example: adding <<"Data2">> using unique actor y to the existing ORSWOT:
{ [ {x,1} ], [ {<<"Data1">>, [ {x,1} ] } ] }
results in the new ORSWOT:
{ [ {x,1}, {y,1} ], [ {<<"Data1">>, [ {x,1} ] }, {<<"Data2">>, [ {y,1} ] } ] }
and an ORSWOT value (i.e. ignoring metadata; what a client would be interested in) of:
[ <<"Data1">>, <<"Data2">> ]
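The add operation can be modelled in a few lines of Python (an illustrative sketch, not the riak_dt code: the Erlang tuple lists above are modelled as dicts, and binaries as byte strings):

```python
def orswot_add(orswot, actor, element):
    vv, elements = orswot
    vv = dict(vv)
    vv[actor] = vv.get(actor, 0) + 1          # increment the set's version vector
    elements = dict(elements)
    elements[element] = {actor: vv[actor]}    # new dot replaces any existing entry
    return (vv, elements)

# The slide's example: add <<"Data2">> via actor y
o = ({"x": 1}, {b"Data1": {"x": 1}})
o = orswot_add(o, "y", b"Data2")
assert o == ({"x": 1, "y": 1},
             {b"Data1": {"x": 1}, b"Data2": {"y": 1}})
```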
29 CRDTs ORSWOT overview
Removing an element:
- The version vector as part of the ORSWOT does not change (of course, when the put happens to store the ORSWOT's updated value, the version vector for the overall key/value is incremented)
- The element's entry is simply removed from the ORSWOT, i.e. no tombstones
Example: removing <<"Data1">> from the existing ORSWOT:
{ [ {x,1}, {y,1} ], [ {<<"Data1">>, [ {x,1} ] }, {<<"Data2">>, [ {y,1} ] } ] }
results in the new ORSWOT:
{ [ {x,1}, {y,1} ], [ {<<"Data2">>, [ {y,1} ] } ] }
Further options do exist, such as being able to delay removes:
- E.g. the ORSWOT object doing a remove might not have seen the original add for the element yet (e.g. because it is a replica not yet merged with the add)
- This would also add further logic to the merge operation
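In the same illustrative Python model (dicts for the Erlang tuple lists, not the riak_dt code), remove is even simpler than add:

```python
def orswot_remove(orswot, element):
    vv, elements = orswot
    # Drop the element's entry; the set's own version vector is
    # unchanged, and no tombstone is kept
    elements = {e: dots for e, dots in elements.items() if e != element}
    return (vv, elements)

# The slide's example: remove <<"Data1">>
o = ({"x": 1, "y": 1}, {b"Data1": {"x": 1}, b"Data2": {"y": 1}})
assert orswot_remove(o, b"Data1") == ({"x": 1, "y": 1},
                                      {b"Data2": {"y": 1}})
```

The unchanged version vector is the whole trick: it still records that this replica has seen {x,1}, so a later merge can tell an observed-and-removed add apart from an add it has never seen.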
30 CRDTs ORSWOT overview
Merging ORSWOT A and ORSWOT B (e.g. because of siblings detected by the version vectors for the overall key/value):
- The version vectors for ORSWOT A and ORSWOT B are merged, i.e. the least possible common descendant of both
- Elements are merged:
  - Common elements are only kept if a non-empty set of dots remains for them after merging:
    - Common dot pairs for the element in ORSWOT A and ORSWOT B
    - Dot pairs for the element only in ORSWOT A where the dot pair count is greater than any count* for the same actor in ORSWOT B's version vector
    - Dot pairs for the element only in ORSWOT B where the dot pair count is greater than any count* for the same actor in ORSWOT A's version vector
  - Elements only in ORSWOT A are only kept if a non-empty set of dots remains after keeping only dot pairs for the element where the dot pair count is greater than any count* for the same actor in ORSWOT B's version vector
  - Elements only in ORSWOT B are only kept if a non-empty set of dots remains after keeping only dot pairs for the element where the dot pair count is greater than any count* for the same actor in ORSWOT A's version vector
* As you might expect, if there is no count for that actor in the version vector then the dot pair is merged/kept
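The merge rules above can be written out as a short Python sketch (illustrative, not the riak_dt implementation; version vectors and dots are modelled as dicts) and checked against the talk's worked example:

```python
def vv_merge(a, b):
    # Least common descendant: per-actor maximum of the two version vectors
    return {k: max(a.get(k, 0), b.get(k, 0)) for k in a.keys() | b.keys()}

def unseen(dots, other_vv):
    # Keep dots whose count exceeds the other side's version vector,
    # i.e. adds the other replica cannot have observed (and so cannot
    # have removed)
    return {a: c for a, c in dots.items() if c > other_vv.get(a, 0)}

def orswot_merge(A, B):
    (vv_a, el_a), (vv_b, el_b) = A, B
    merged = {}
    for e in el_a.keys() | el_b.keys():
        if e in el_a and e in el_b:
            common = {a: c for a, c in el_a[e].items()
                      if el_b[e].get(a) == c}
            dots = {**common, **unseen(el_a[e], vv_b), **unseen(el_b[e], vv_a)}
        elif e in el_a:
            dots = unseen(el_a[e], vv_b)   # kept only if B hasn't seen its add
        else:
            dots = unseen(el_b[e], vv_a)   # kept only if A hasn't seen its add
        if dots:                           # element survives only with non-empty dots
            merged[e] = dots
    return (vv_merge(vv_a, vv_b), merged)

# The worked example from the talk:
A = ({"x": 1, "y": 2},
     {b"Data1": {"x": 1}, b"Data2": {"y": 1}, b"Data3": {"y": 2}})
B = ({"x": 1, "y": 1, "z": 2},
     {b"Data2": {"y": 1}, b"Data3": {"z": 1}, b"Data4": {"z": 2}})
assert orswot_merge(A, B) == (
    {"x": 1, "y": 2, "z": 2},
    {b"Data2": {"y": 1}, b"Data3": {"y": 2, "z": 1}, b"Data4": {"z": 2}})
```

Note how <<"Data1">> disappears without a tombstone: B's version vector already covers A's dot {x,1}, so B must have observed that add and removed it.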
31 CRDTs ORSWOT overview
Merging example:
ORSWOT A: { [ {x,1}, {y,2} ], [ {<<"Data1">>, [ {x,1} ] }, {<<"Data2">>, [ {y,1} ] }, {<<"Data3">>, [ {y,2} ] } ] }
Seen:
1. Adding element <<"Data1">> via actor x
2. Adding element <<"Data2">> via actor y
3. Adding element <<"Data3">> via actor y
ORSWOT B: { [ {x,1}, {y,1}, {z,2} ], [ {<<"Data2">>, [ {y,1} ] }, {<<"Data3">>, [ {z,1} ] }, {<<"Data4">>, [ {z,2} ] } ] }
Seen:
1. Adding element <<"Data1">> via actor x
2. Adding element <<"Data2">> via actor y
3. Adding element <<"Data3">> via actor z
4. Adding element <<"Data4">> via actor z
5. Removing element <<"Data1">> via actor z
Merged ORSWOT: { [ {x,1}, {y,2}, {z,2} ], [ {<<"Data2">>, [ {y,1} ] }, {<<"Data3">>, [ {y,2}, {z,1} ] }, {<<"Data4">>, [ {z,2} ] } ] }
32 CRDTs ORSWOT overview
Merging example:
ORSWOT A: { [ {x,1}, {y,2} ], [ {<<"Data1">>, [ {x,1} ] }, {<<"Data2">>, [ {y,1} ] }, {<<"Data3">>, [ {y,2} ] } ] }
ORSWOT B: { [ {x,1}, {y,1}, {z,2} ], [ {<<"Data2">>, [ {y,1} ] }, {<<"Data3">>, [ {z,1} ] }, {<<"Data4">>, [ {z,2} ] } ] }
Merged = { [ {x,1}, {y,2}, {z,2} ], [ {<<"Data2">>, [ {y,1} ] }, {<<"Data3">>, [ {y,2}, {z,1} ] }, {<<"Data4">>, [ {z,2} ] } ] }
- [ {x,1}, {y,2}, {z,2} ] is the least possible common descendant of the version vectors of ORSWOT A and ORSWOT B
- Common elements:
  - <<"Data2">>: the ORSWOTs have the common dot pair {y,1}, with no other dot pairs to consider/merge with it => element in
  - <<"Data3">>: ORSWOT A has {y,2}, which has a greater count than {y,1} in ORSWOT B's version vector [ {x,1}, {y,1}, {z,2} ]. ORSWOT B has {z,1}, which is included because ORSWOT A's version vector [ {x,1}, {y,2} ] doesn't have a count for z. Therefore {y,2} and {z,1} are merged to give [ {y,2}, {z,1} ] => element in
- Elements only in ORSWOT A:
  - <<"Data1">>: no dots from [ {x,1} ] are greater than ORSWOT B's version vector [ {x,1}, {y,1}, {z,2} ] => element not in
- Elements only in ORSWOT B:
  - <<"Data4">>: the dot pair {z,2} for the element is kept because ORSWOT A's version vector [ {x,1}, {y,2} ] doesn't have a count for z => element in
33 CRDTs riak_dt integration
- An Erlang middle layer sits between clients and Riak
- The middle layer needed to have unique actors (gen_servers) as part of using the ORSWOT implementation, balancing the number of them, e.g. because it impacts ORSWOT version vector size
- Unique actor names are pre-defined as part of server setup configuration, and kept small, again because they impact ORSWOT version vector size
- Clients don't have to deal with siblings; the middle layer (i.e. the client to Riak) does need to bring back siblings
- Not as good as a Riak server-side CRDT (i.e. Riak 2.0+)
- Still using version vectors on the overall key/values, i.e. to know when to do the ORSWOT merge operation
34 Getting It Live
35 Tooling
- From day 1 we ate our own dog food
- Monitoring, performance counters, error reporting (including correlation between systems)
- Built a custom ad-hoc query tool on a replicated Riak cluster
- Custom reconciliation between the new system and the old system
- Created build and release scripts for automation
36 Released in phases
- Able to build on stable ground
- Able to get data to impact future decisions
- Built confidence
- Moved functionality to using Erlang and Riak
- Not all phases were immediately business impacting; however, overall we were able to get business-impacting functionality out sooner
37 Replay testing
- Captured logs asynchronously in one phase
- Common interface between the old and new systems; able to do reconciliation between the systems
- Big range of test data / realistic load profile
- Used for functional and performance testing; different logs for a long weekend run vs a quick run
- Complemented other testing such as specific unit/integration testing, fuzz testing and formal UAT
38 Performance testing
- Done early and repeated
- Built a custom client using the replay logs; able to increase the load profile easily
- Used our custom tooling / monitoring
- Identified bottlenecks / tested horizontal scalability of the system
- Ad-hoc changes were fed back into development
- Had a specific profile which could give a fair test between changes
- Basho Bench used to sanity test the Riak cluster setup
39 Failure testing
- Did failure testing to build confidence and understanding
- Trying to make sure a failure seen in production isn't the first time it's experienced
- Tested common procedures, e.g. taking nodes in and out of service
- Failures will happen: embrace them
40 Today with Stronger
The project was a success! We have a system which is:
- Performant and able to deal with many times our peak load
- Reliable, dealing with failure using minimal human intervention
We have introduced the business to a number of new technologies, and we have grown the capabilities of the business and our people.
41 Questions?
More informationDYNAMO: AMAZON S HIGHLY AVAILABLE KEY-VALUE STORE. Presented by Byungjin Jun
DYNAMO: AMAZON S HIGHLY AVAILABLE KEY-VALUE STORE Presented by Byungjin Jun 1 What is Dynamo for? Highly available key-value storages system Simple primary-key only interface Scalable and Reliable Tradeoff:
More informationThe What, Why and How of the Pure Storage Enterprise Flash Array. Ethan L. Miller (and a cast of dozens at Pure Storage)
The What, Why and How of the Pure Storage Enterprise Flash Array Ethan L. Miller (and a cast of dozens at Pure Storage) Enterprise storage: $30B market built on disk Key players: EMC, NetApp, HP, etc.
More informationSQL, NoSQL, MongoDB. CSE-291 (Cloud Computing) Fall 2016 Gregory Kesden
SQL, NoSQL, MongoDB CSE-291 (Cloud Computing) Fall 2016 Gregory Kesden SQL Databases Really better called Relational Databases Key construct is the Relation, a.k.a. the table Rows represent records Columns
More informationKey-Value Stores: RiakKV
NDBI040: Big Data Management and NoSQL Databases h p://www.ksi.mff.cuni.cz/ svoboda/courses/2016-1-ndbi040/ Lecture 4 Key-Value Stores: RiakKV Mar n Svoboda svoboda@ksi.mff.cuni.cz 25. 10. 2016 Charles
More informationCLUSTERING HIVEMQ. Building highly available, horizontally scalable MQTT Broker Clusters
CLUSTERING HIVEMQ Building highly available, horizontally scalable MQTT Broker Clusters 12/2016 About this document MQTT is based on a publish/subscribe architecture that decouples MQTT clients and uses
More informationScaleArc for SQL Server
Solution Brief ScaleArc for SQL Server Overview Organizations around the world depend on SQL Server for their revenuegenerating, customer-facing applications, running their most business-critical operations
More informationCloud-backed applications. Write Fast, Read in the Past: Causal Consistency for Client- Side Applications. Requirements & challenges
Write Fast, Read in the Past: ausal onsistenc for lient- Side Applications loud-backed applications 1. Request 3. Repl 2. Process request & store update 4. Transmit update 50~200ms databas e app server
More informationOracle Data Warehousing Pushing the Limits. Introduction. Case Study. Jason Laws. Principal Consultant WhereScape Consulting
Oracle Data Warehousing Pushing the Limits Jason Laws Principal Consultant WhereScape Consulting Introduction Oracle is the leading database for data warehousing. This paper covers some of the reasons
More informationIntroduction Storage Processing Monitoring Review. Scaling at Showyou. Operations. September 26, 2011
Scaling at Showyou Operations September 26, 2011 I m Kyle Kingsbury Handle aphyr Code http://github.com/aphyr Email kyle@remixation.com Focus Backend, API, ops What the hell is Showyou? Nontrivial complexity
More informationDatacenter replication solution with quasardb
Datacenter replication solution with quasardb Technical positioning paper April 2017 Release v1.3 www.quasardb.net Contact: sales@quasardb.net Quasardb A datacenter survival guide quasardb INTRODUCTION
More informationConflict-free Replicated Data Types in Practice
Conflict-free Replicated Data Types in Practice Georges Younes Vitor Enes Wednesday 11 th January, 2017 HASLab/INESC TEC & University of Minho InfoBlender Motivation Background Background: CAP Theorem
More informationCS November 2017
Bigtable Highly available distributed storage Distributed Systems 18. Bigtable Built with semi-structured data in mind URLs: content, metadata, links, anchors, page rank User data: preferences, account
More informationDistributed File Systems
Distributed File Systems Today l Basic distributed file systems l Two classical examples Next time l Naming things xkdc Distributed File Systems " A DFS supports network-wide sharing of files and devices
More informationCS Amazon Dynamo
CS 5450 Amazon Dynamo Amazon s Architecture Dynamo The platform for Amazon's e-commerce services: shopping chart, best seller list, produce catalog, promotional items etc. A highly available, distributed
More informationImportant Lessons. Today's Lecture. Two Views of Distributed Systems
Important Lessons Replication good for performance/ reliability Key challenge keeping replicas up-to-date Wide range of consistency models Will see more next lecture Range of correctness properties L-10
More informationCS November 2018
Bigtable Highly available distributed storage Distributed Systems 19. Bigtable Built with semi-structured data in mind URLs: content, metadata, links, anchors, page rank User data: preferences, account
More informationTITLE: PRE-REQUISITE THEORY. 1. Introduction to Hadoop. 2. Cluster. Implement sort algorithm and run it using HADOOP
TITLE: Implement sort algorithm and run it using HADOOP PRE-REQUISITE Preliminary knowledge of clusters and overview of Hadoop and its basic functionality. THEORY 1. Introduction to Hadoop The Apache Hadoop
More informationBUILDING A SCALABLE MOBILE GAME BACKEND IN ELIXIR. Petri Kero CTO / Ministry of Games
BUILDING A SCALABLE MOBILE GAME BACKEND IN ELIXIR Petri Kero CTO / Ministry of Games MOBILE GAME BACKEND CHALLENGES Lots of concurrent users Complex interactions between players Persistent world with frequent
More informationData Replication CS 188 Distributed Systems February 3, 2015
Data Replication CS 188 Distributed Systems February 3, 2015 Page 1 Some Other Possibilities What if the machines sharing files are portable and not always connected? What if the machines communicate across
More informationNext-Generation Cloud Platform
Next-Generation Cloud Platform Jangwoo Kim Jun 24, 2013 E-mail: jangwoo@postech.ac.kr High Performance Computing Lab Department of Computer Science & Engineering Pohang University of Science and Technology
More informationKey-Value Stores: RiakKV
B4M36DS2: Database Systems 2 h p://www.ksi.mff.cuni.cz/ svoboda/courses/2016-1-b4m36ds2/ Lecture 4 Key-Value Stores: RiakKV Mar n Svoboda svoboda@ksi.mff.cuni.cz 24. 10. 2016 Charles University in Prague,
More informationOutline. INF3190:Distributed Systems - Examples. Last week: Definitions Transparencies Challenges&pitfalls Architecturalstyles
INF3190:Distributed Systems - Examples Thomas Plagemann & Roman Vitenberg Outline Last week: Definitions Transparencies Challenges&pitfalls Architecturalstyles Today: Examples Googel File System (Thomas)
More informationCS555: Distributed Systems [Fall 2017] Dept. Of Computer Science, Colorado State University
CS 555: DISTRIBUTED SYSTEMS [DYNAMO & GOOGLE FILE SYSTEM] Frequently asked questions from the previous class survey What s the typical size of an inconsistency window in most production settings? Dynamo?
More informationECE Engineering Robust Server Software. Spring 2018
ECE590-02 Engineering Robust Server Software Spring 2018 Business Continuity: Disaster Recovery Tyler Bletsch Duke University Includes material adapted from the course Information Storage and Management
More informationMY CONVERSATION HAS RUN DRY
PARTITION TOLERANCE MY CONVERSATION HAS RUN DRY Many systems degrade, or otherwise change state, under partition BRING THE PIECES BACK TOGETHER REDISCOVER COMMUNICATION A EXAMPLE ANPLICATION 5 clients
More informationTraditional RDBMS Wisdom is All Wrong -- In Three Acts. Michael Stonebraker
Traditional RDBMS Wisdom is All Wrong -- In Three Acts Michael Stonebraker The Stonebraker Says Webinar Series The first three acts: 1. Why main memory is the answer for OLTP Recording available at VoltDB.com
More informationIntroduction to Distributed Data Systems
Introduction to Distributed Data Systems Serge Abiteboul Ioana Manolescu Philippe Rigaux Marie-Christine Rousset Pierre Senellart Web Data Management and Distribution http://webdam.inria.fr/textbook January
More informationLarge-Scale Key-Value Stores Eventual Consistency Marco Serafini
Large-Scale Key-Value Stores Eventual Consistency Marco Serafini COMPSCI 590S Lecture 13 Goals of Key-Value Stores Export simple API put(key, value) get(key) Simpler and faster than a DBMS Less complexity,
More informationDesigning for Scalability. Patrick Linskey EJB Team Lead BEA Systems
Designing for Scalability Patrick Linskey EJB Team Lead BEA Systems plinskey@bea.com 1 Patrick Linskey EJB Team Lead at BEA OpenJPA Committer JPA 1, 2 EG Member 2 Agenda Define and discuss scalability
More informationSolidFire and Ceph Architectural Comparison
The All-Flash Array Built for the Next Generation Data Center SolidFire and Ceph Architectural Comparison July 2014 Overview When comparing the architecture for Ceph and SolidFire, it is clear that both
More informationBig Data Analytics. Rasoul Karimi
Big Data Analytics Rasoul Karimi Information Systems and Machine Learning Lab (ISMLL) Institute of Computer Science University of Hildesheim, Germany Big Data Analytics Big Data Analytics 1 / 1 Outline
More informationTwo Success Stories - Optimised Real-Time Reporting with BI Apps
Oracle Business Intelligence 11g Two Success Stories - Optimised Real-Time Reporting with BI Apps Antony Heljula October 2013 Peak Indicators Limited 2 Two Success Stories - Optimised Real-Time Reporting
More informationFLAT DATACENTER STORAGE CHANDNI MODI (FN8692)
FLAT DATACENTER STORAGE CHANDNI MODI (FN8692) OUTLINE Flat datacenter storage Deterministic data placement in fds Metadata properties of fds Per-blob metadata in fds Dynamic Work Allocation in fds Replication
More informationA Guide to Architecting the Active/Active Data Center
White Paper A Guide to Architecting the Active/Active Data Center 2015 ScaleArc. All Rights Reserved. White Paper The New Imperative: Architecting the Active/Active Data Center Introduction With the average
More information10. Replication. CSEP 545 Transaction Processing Philip A. Bernstein. Copyright 2003 Philip A. Bernstein. Outline
10. Replication CSEP 545 Transaction Processing Philip A. Bernstein Copyright 2003 Philip A. Bernstein 1 Outline 1. Introduction 2. Primary-Copy Replication 3. Multi-Master Replication 4. Other Approaches
More informationExample File Systems Using Replication CS 188 Distributed Systems February 10, 2015
Example File Systems Using Replication CS 188 Distributed Systems February 10, 2015 Page 1 Example Replicated File Systems NFS Coda Ficus Page 2 NFS Originally NFS did not have any replication capability
More informationHeckaton. SQL Server's Memory Optimized OLTP Engine
Heckaton SQL Server's Memory Optimized OLTP Engine Agenda Introduction to Hekaton Design Consideration High Level Architecture Storage and Indexing Query Processing Transaction Management Transaction Durability
More informationConfiguration changes such as conversion from a single instance to RAC, ASM, etc.
Today, enterprises have to make sizeable investments in hardware and software to roll out infrastructure changes. For example, a data center may have an initiative to move databases to a low cost computing
More informationInside the PostgreSQL Shared Buffer Cache
Truviso 07/07/2008 About this presentation The master source for these slides is http://www.westnet.com/ gsmith/content/postgresql You can also find a machine-usable version of the source code to the later
More informationExtending Eventually Consistent Cloud Databases for Enforcing Numeric Invariants
Extending Eventually Consistent Cloud Databases for Enforcing Numeric Invariants Valter Balegas, Diogo Serra, Sérgio Duarte Carla Ferreira, Rodrigo Rodrigues, Nuno Preguiça NOVA LINCS/FCT/Universidade
More information