Scalable Wikipedia with Erlang
Transcription
1 Scalable Wikipedia with Erlang Thorsten Schütt, Florian Schintke, Alexander Reinefeld Zuse Institute Berlin (ZIB) onscale solutions Thorsten Schütt 1
2 Scaling Web 2.0 Hosting
3 Step 1: Clients; single server running both the web server and the DB server
4 Step 2: Clients; single web server; single DB server
5 Step 3: Clients; load balancer; n web servers; single DB server
6 Step 4: Clients; load balancer; n web servers; master DB server; slave DB servers
7 Step 5: Clients; load balancer; n web servers; DB partitioning; DB directory; DB clusters; n DB servers; memcached; ...
8 What did others do? SimpleDB, BigTable, Dynamo, MySQL Cluster, MapReduce, Google FS
9 Our Approach: P2P in the Data Center. Clients; Key/Value Store (simple DBMS); ACID Transactions; Replication; P2P Overlay
10 Scalable Wikipedia with Erlang: Chord#, Transactional Layer, Load Balancing, Wikipedia
11 Chord# = Chord \ Hash (Chord without the hash function)
12 Chord#: A dictionary has 3 operations: insert(key, value), delete(key), lookup(key). Chord# implements a distributed dictionary. Each node stores only part of the data. Example dictionary (key, value), spread over nodes A to D: Clarke 2007, Allen 2006, Bachman 1973, Thompson 1983, Knuth 1974, Codd 1981.
13 Chord#: The key space is an arbitrary set with a total order, e.g. strings. Every node is assigned a random key. Nodes form a logical ring over the key space. Example ring: Backus, Codd, Dijkstra, Floyd, Gray, Karp, Knuth, Milner, Minsky, Naur, Nygaard, Perlis, Ritchie, Rivest, Wirth, Yao.
14 Distributed Dictionary: Items are stored on their successor, i.e. the first node encountered in clockwise direction. Thompson on Wirth; Allen on Backus; Bachman on Backus; Clarke on Codd. Items: (Clarke, 2007), (Bachman, 1973), (Thompson, 1983), (Allen, 2006). Ring excerpt: Ritchie, Rivest, Wirth, Yao, Backus, Codd.
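The successor rule on slides 12 to 14 can be sketched in a few lines of Python. This is a single-process illustration, not the Chord# implementation: the `Ring` class and its methods are hypothetical names, and plain string comparison stands in for the total order on keys. The node names are the ones shown on the slides.

```python
import bisect

# Node keys taken from the ring on the slides.
NODES = ["Backus", "Codd", "Dijkstra", "Floyd", "Gray", "Karp", "Knuth", "Milner",
         "Minsky", "Naur", "Nygaard", "Perlis", "Ritchie", "Rivest", "Wirth", "Yao"]


class Ring:
    """Toy distributed dictionary: every item lives on its successor node."""

    def __init__(self, node_keys):
        self.nodes = sorted(node_keys)
        self.store = {node: {} for node in self.nodes}  # per-node local data

    def successor(self, key):
        # First node whose key is >= the item key; wrap around past the last node.
        i = bisect.bisect_left(self.nodes, key)
        return self.nodes[i % len(self.nodes)]

    def insert(self, key, value):
        self.store[self.successor(key)][key] = value

    def delete(self, key):
        self.store[self.successor(key)].pop(key, None)

    def lookup(self, key):
        return self.store[self.successor(key)].get(key)


ring = Ring(NODES)
for name, year in [("Clarke", 2007), ("Bachman", 1973), ("Thompson", 1983), ("Allen", 2006)]:
    ring.insert(name, year)
    print(name, "is stored on", ring.successor(name))
```

Because keys are not hashed, neighbouring keys land on the same or adjacent nodes, which is what later makes range queries possible.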
15 Distributed Dictionary: Each node puts its successor and log2(N) exponentially spaced fingers in its routing table. With each greedy routing step, the distance between the current node and the destination is halved. This yields O(log2 N) hops. [Figure: node ring]
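A minimal sketch of finger tables and greedy routing, under simplifying assumptions: nodes are identified by their position 0..N-1 on the ring, and each finger level is obtained by asking the previous finger for its own finger of the same level (pointer doubling, which is our reading of how Chord# builds its table). The function names are hypothetical.

```python
def build_fingers(n_nodes):
    """fingers[i][k] is the node 2**k positions clockwise of node i."""
    levels = max(1, (n_nodes - 1).bit_length())
    # Level 0 is simply the successor.
    fingers = [[(i + 1) % n_nodes] for i in range(n_nodes)]
    for k in range(1, levels):
        for i in range(n_nodes):
            prev = fingers[i][k - 1]
            # One remote lookup: the previous finger's finger at level k-1.
            fingers[i].append(fingers[prev][k - 1])
    return fingers


def route(fingers, src, dst):
    """Greedy routing: always take the longest finger that does not overshoot."""
    n = len(fingers)
    hops, cur = 0, src
    while cur != dst:
        remaining = (dst - cur) % n
        k = remaining.bit_length() - 1  # largest k with 2**k <= remaining
        cur = fingers[cur][k]
        hops += 1
    return hops


fingers = build_fingers(16)
worst = max(route(fingers, 0, dst) for dst in range(16))
print("worst case hops for 16 nodes:", worst)
```

Each hop removes the highest set bit of the remaining distance, so the remaining distance at least halves and the hop count never exceeds the number of finger levels.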
16 Summary: Chord#. DHTs/Chord: A DHT is a fully decentralized data structure. DHTs self-organize as nodes join, leave, and fail. All operations require only local knowledge. Chord#: Needs only one hop to calculate a routing pointer; Chord needs log(N) hops. Routes in at most log(N) hops, while Chord guarantees this only with high probability. Can adapt to imbalances in the query load; Chord can't. Supports range queries.
17 Scalable Wikipedia with Erlang: Chord#, Transactional Layer (ACID), Load Balancing, Wikipedia
18 Transactions on DHTs are Challenging: high churn rate (nodes may leave, join, or crash at any time; changing responsibilities); crash-stop fault model; no perfect failure detector (never know whether a node crashed or the network is just slow)
19 Transactions + Replicas: The transaction START debit(a, 100); deposit(b, 100); COMMIT becomes, on replicated items, START debit(a1, 100); debit(a2, 100); debit(a3, 100); deposit(b1, 100); deposit(b2, 100); deposit(b3, 100); COMMIT
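The expansion on slide 19 can be written as a small helper. This is only an illustration of the bookkeeping; the function name `replicate` and the tuple layout are assumptions, and nothing here performs an atomic commit.

```python
def replicate(operations, replication_factor=3):
    """Expand operations on logical keys into operations on every replica key."""
    expanded = []
    for op, key, amount in operations:
        for i in range(1, replication_factor + 1):
            expanded.append((op, f"{key}{i}", amount))
    return expanded


transaction = [("debit", "a", 100), ("deposit", "b", 100)]
for op, key, amount in replicate(transaction):
    print(f"{op}({key}, {amount})")
```

A two-operation transaction therefore touches six items, all of which must commit or abort together.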
20 Adapted Paxos Commit: Optimistic concurrency control with fallback option. Fast read operation: just 1 round; reads a majority (of the last versions) of all replicas. Write operation: 3 rounds; non-blocking (fallback); succeeds when > f/2 nodes are alive. [Figure: message sequence in 6 steps. Participants: Leader, Transaction Manager, TMs and TPs, Items. Messages: GetTP, RegisterTP, InitTM, RegisterRTM, Prepare, Prepared, Ack, Commit]
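The one-round read can be illustrated with a majority rule. The sketch below is not the adapted Paxos Commit protocol; it only shows the quorum idea, assuming each reachable replica answers with a (version, value) pair and an unreachable replica is represented by None. The function name is hypothetical.

```python
def quorum_read(replies, replication_factor):
    """Return the newest (version, value) if a majority of replicas answered."""
    alive = [reply for reply in replies if reply is not None]
    if len(alive) <= replication_factor // 2:
        raise RuntimeError("no majority of replicas reachable")
    # Among the answers, the highest version number wins.
    return max(alive, key=lambda reply: reply[0])


# Two of three replicas answer; one of them still holds an older version.
print(quorum_read([(3, "old text"), None, (4, "new text")], replication_factor=3))
```

Since any two majorities overlap in at least one replica, a majority read always sees the latest version that was written to a majority.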
21 Summary: Transactions. Consistent way to update several items and their replicas. Mitigates some of the overlay oddities: node failures, asynchronous programming model.
22 Scalable Wikipedia with Erlang: Chord#, Transactional Layer, Load Balancing, Wikipedia
23 Load-Balancing: 1. Pick 2 random nodes ("Naur", "Wirth"). 2. IF Load("Wirth") >> Load("Naur"): (1) "Naur" leaves the system; (2) "Naur" joins as "Tarjan". Load can be any metric. [Figure: ring before rebalancing; Wirth is overloaded.] D. Karger, M. Ruhl. Simple Efficient Load Balancing Algorithms for Peer-to-Peer Systems. IPTPS
24 Load-Balancing: 1. Pick 2 random nodes ("Naur", "Wirth"). 2. IF Load("Wirth") >> Load("Naur"): (1) "Naur" leaves the system; (2) "Naur" joins as "Tarjan". Load can be any metric. [Figure: ring after rebalancing; Tarjan now sits between Rivest and Wirth.] D. Karger, M. Ruhl. Simple Efficient Load Balancing Algorithms for Peer-to-Peer Systems. IPTPS
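One rebalancing step from slides 23 and 24 can be sketched as follows. The assumptions are ours: load is the number of stored items, ">>" is modelled by a fixed ratio, and the rejoining node takes the key of the median item in the overloaded range. The function names and the sample items are hypothetical.

```python
def items_of(nodes, items, node):
    """Items stored on `node`: keys in (predecessor, node], in clockwise order."""
    pred = nodes[nodes.index(node) - 1]
    if pred >= node:  # range wraps around the end of the key space
        return sorted(k for k in items if k > pred) + sorted(k for k in items if k <= node)
    return sorted(k for k in items if pred < k <= node)


def balance_step(nodes, items, light, heavy, ratio=2.0):
    """If `heavy` is much more loaded than `light`, move `light` in front of `heavy`."""
    heavy_load = len(items_of(nodes, items, heavy))
    light_load = len(items_of(nodes, items, light))
    if heavy_load < 2 or heavy_load <= ratio * light_load:
        return nodes  # nothing to do
    # The light node leaves; its items fall to its successor automatically.
    remaining = [n for n in nodes if n != light]
    heavy_items = items_of(remaining, items, heavy)
    # It rejoins at the median key of the heavy range and takes over half of it.
    new_key = heavy_items[(len(heavy_items) - 1) // 2]
    return sorted(remaining + [new_key])


nodes = ["Codd", "Naur", "Rivest", "Wirth"]
items = ["Knuth", "Sutherland", "Tarjan", "Thompson", "Valiant"]
print(balance_step(nodes, items, light="Naur", heavy="Wirth"))
```

In this toy run the node "Naur" holds one item and "Wirth" holds four, so "Naur" leaves and rejoins under the key "Tarjan", after which both nodes hold two items.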
25 Multi-Data-Center Scenario: Optimize for latency; increase availability. Prefix articles with language and replica number. E.g. de:Main Page with 5 replicas: 2 in Germany, 1 in UK, 1 in USA, 1 in Asia. Keys: 0de:Main Page, 1de:Main Page, 2de:Main Page, ...
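The key-prefix scheme can be sketched as below. The mapping from replica number to region is an assumption made for illustration (the slide gives only the counts per region), and both function names are hypothetical.

```python
# Hypothetical placement: which replica prefix is hosted in which region.
REGION_OF_REPLICA = {0: "Germany", 1: "Germany", 2: "UK", 3: "USA", 4: "Asia"}


def replica_keys(article_key, replicas=5):
    """Prefix the article key with each replica number."""
    if not 1 <= replicas <= 10:
        raise ValueError("single-digit prefixes only cover up to 10 replicas")
    return [f"{i}{article_key}" for i in range(replicas)]


def region_of(replica_key):
    """Region that hosts the ring segment for this replica prefix."""
    return REGION_OF_REPLICA[int(replica_key[0])]


for key in replica_keys("de:Main Page"):
    print(key, "->", region_of(key))
```

Because the key space is ordered rather than hashed, all keys with the same prefix are neighbours on the ring, so a whole prefix range can be assigned to nodes in one data center.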
26 Scalable Wikipedia with Erlang: Chord#, Transactional Layer, Load Balancing, Wikipedia
27 Wikipedia: Top 10 Web sites (source: alexa.com): 1. Yahoo! 2. Google 3. YouTube 4. Windows Live 5. MSN 6. Myspace 7. Wikipedia 8. Facebook 9. Blogger.com 10. Yahoo! カテゴリ. Wikipedia is the number one open Web site: source code is open, architecture is open, dumps are available.
28 Wikipedia: Top 10 Web sites (list as on slide 27). requests/sec; 95% are answered by squid proxies; 2,000 req./sec hit the backend
29 The Wikipedia System Architecture [Figure: web servers, search servers, NFS, other]
30 Erlang Implementation
31 Overlay Implemented in Erlang: Chord#; Load-Balancing; Transaction Framework. OTP behaviours: supervisor, gen_server. Distributed Erlang: security problems, scalability problems -> own transport layer on top of TCP
32 Data Model: Wikipedia (SQL DB) -> Chord# (Key-Value Store). SQL schema excerpt: CREATE TABLE /*$wgDBprefix*/page ( page_id int unsigned NOT NULL auto_increment, page_namespace int NOT NULL, ... Map relations to key-value pairs: (Title, List of Versions); (CategoryName, List of Titles); (Title, List of Titles) // backlinks
33 Data Model
    void updatePage(string title, int oldVersion, string newText)
    {
        // new transaction
        Transaction t = new Transaction();
        // read old version
        Page p = t.read(title);
        // check for concurrent update
        if (p.currentVersion != oldVersion)
            t.abort();
        else {
            // write new text
            t.write(p.add(newText));
            // update categories
            foreach (category c in p)
                t.write(t.read(c.name).add(title));
            // commit
            t.commit();
        }
    }
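For readers who want to run the idea, here is a single-process Python sketch of the same optimistic update on top of the key-value mapping from slide 32. The `Transaction` class is a hypothetical stand-in that buffers writes in memory; it does not provide the atomicity or replication of the real transaction layer, and the "category:" key prefix is an assumption.

```python
class Transaction:
    """Toy transaction: reads see buffered writes, commit applies them at once."""

    def __init__(self, store):
        self.store = store
        self.writes = {}

    def read(self, key):
        return self.writes.get(key, self.store.get(key))

    def write(self, key, value):
        self.writes[key] = value

    def commit(self):
        self.store.update(self.writes)


def update_page(store, title, old_version, new_text, categories):
    t = Transaction(store)
    versions = t.read(title) or []        # (Title, List of Versions)
    if len(versions) != old_version:      # someone else edited in between
        return False                      # abort: nothing is written
    t.write(title, versions + [new_text])
    for name in categories:               # (CategoryName, List of Titles)
        key = "category:" + name
        titles = t.read(key) or []
        if title not in titles:
            t.write(key, titles + [title])
    t.commit()
    return True


store = {}
print(update_page(store, "Erlang", 0, "first text", ["Languages"]))
print(update_page(store, "Erlang", 0, "conflicting edit", []))
```

The second call is based on a stale version number, so it aborts and leaves the stored page unchanged.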
34 Wiki. Database: Chord#. Mapping: Wiki -> Key-Value Store. Renderer: Java, Tomcat, Plog4u. Jinterface: interface to Erlang.
35 IEEE Scale Challenge 2008. Live demos: Bavarian, Simple English; browsing; editing with full history; category pages. Deployments: PlanetLab (20 nodes); cluster (320 nodes) in Berlin.
36 Summary: DHT + Transactions = Scalable, Reliable, Efficient Key/Value Store. Previously, P2P was mainly used for file sharing (read only). We support consistent, distributed write operations. Numerous applications: Internet databases, transactional online services, ...
37 Team: Thorsten Schütt, Florian Schintke, Monika Moser, Stefan Plantikow, Alexander Reinefeld, Nico Kruber, Christian von Prollius, Seif Haridi (SICS), Ali Ghodsi (SICS), Tallat Shafaat (SICS)
38 Detailed Descriptions. Wiki: S. Plantikow, A. Reinefeld, F. Schintke. Transactions for Distributed Wikis on Structured Overlays. DSOM, October. Transactions: M. Moser, S. Haridi. Atomic Commitment in Transactional DHTs. 1st CoreGRID Symposium, August. DHT: T. Shafaat, M. Moser, A. Ghodsi, S. Haridi, T. Schütt, A. Reinefeld. Key-Based Consistency and Availability in Structured Overlay Networks. Infoscale, June. T. Schütt, F. Schintke, A. Reinefeld. A Structured Overlay for Multi-dimensional Range Queries. Euro-Par, August. T. Schütt, F. Schintke, A. Reinefeld. Structured Overlay without Consistent Hashing: Empirical Results. GP2PC, May.
39 Questions
Building a transactional distributed
Building a transactional distributed data store with Erlang Alexander Reinefeld, Florian Schintke, Thorsten Schütt Zuse Institute Berlin, onscale solutions GmbH Transactional data store - What for? Web
More informationEnhanced Paxos Commit for Transactions on DHTs
Konrad-Zuse-Zentrum für Informationstechnik Berlin Takustraße 7 D-14195 Berlin-Dahlem Germany FLORIAN SCHINTKE, ALEXANDER REINEFELD, SEIF HARIDI, THORSTEN SCHÜTT Enhanced Paxos Commit for Transactions
More informationEnhanced Paxos Commit for Transactions on DHTs
Konrad-Zuse-Zentrum für Informationstechnik Berlin Takustraße 7 D-14195 Berlin-Dahlem Germany FLORIAN SCHINTKE, ALEXANDER REINEFELD, SEIF HARIDI, THORSTEN SCHÜTT Enhanced Paxos Commit for Transactions
More informationSCALARIS. Irina Calciu Alex Gillmor
SCALARIS Irina Calciu Alex Gillmor RoadMap Motivation Overview Architecture Features Implementation Benchmarks API Users Demo Conclusion Motivation (NoSQL) "One size doesn't fit all" Stonebraker Reinefeld
More informationPeer-to-Peer Systems and Distributed Hash Tables
Peer-to-Peer Systems and Distributed Hash Tables CS 240: Computing Systems and Concurrency Lecture 8 Marco Canini Credits: Michael Freedman and Kyle Jamieson developed much of the original material. Selected
More informationToday. Why might P2P be a win? What is a Peer-to-Peer (P2P) system? Peer-to-Peer Systems and Distributed Hash Tables
Peer-to-Peer Systems and Distributed Hash Tables COS 418: Distributed Systems Lecture 7 Today 1. Peer-to-Peer Systems Napster, Gnutella, BitTorrent, challenges 2. Distributed Hash Tables 3. The Chord Lookup
More information08 Distributed Hash Tables
08 Distributed Hash Tables 2/59 Chord Lookup Algorithm Properties Interface: lookup(key) IP address Efficient: O(log N) messages per lookup N is the total number of servers Scalable: O(log N) state per
More informationPassive/Active Load Balancing with Informed Node Placement in DHTs
Passive/Active Load Balancing with Informed Node Placement in DHTs Mikael Högqvist and Nico Kruber Zuse Institute Berlin Takustr. 7, 14195, Berlin, Germany hoegqvist@zib.de kruber@zib.de Abstract. Distributed
More informationSelf Management for Large-Scale Distributed Systems
Self Management for Large-Scale Distributed Systems Peter Van Roy and SELFMAN partners May 8, 2008 Grid@Mons 2008 Université catholique de Louvain Louvain-la-Neuve, Belgium May 2008 P. Van Roy & SELFMAN
More informationDIVING IN: INSIDE THE DATA CENTER
1 DIVING IN: INSIDE THE DATA CENTER Anwar Alhenshiri Data centers 2 Once traffic reaches a data center it tunnels in First passes through a filter that blocks attacks Next, a router that directs it to
More informationExtreme Computing. NoSQL.
Extreme Computing NoSQL PREVIOUSLY: BATCH Query most/all data Results Eventually NOW: ON DEMAND Single Data Points Latency Matters One problem, three ideas We want to keep track of mutable state in a scalable
More informationDynamo: Amazon s Highly Available Key-value Store. ID2210-VT13 Slides by Tallat M. Shafaat
Dynamo: Amazon s Highly Available Key-value Store ID2210-VT13 Slides by Tallat M. Shafaat Dynamo An infrastructure to host services Reliability and fault-tolerance at massive scale Availability providing
More informationApache Cassandra - A Decentralized Structured Storage System
Apache Cassandra - A Decentralized Structured Storage System Avinash Lakshman Prashant Malik from Facebook Presented by: Oded Naor Acknowledgments Some slides are based on material from: Idit Keidar, Topics
More informationPassive/Active Load Balancing with Informed Node Placement in DHTs
Passive/Active Load Balancing with Informed Node Placement in DHTs Mikael Högqvist and Nico Kruber Zuse Institute Berlin Takustr. 7, 14195, Berlin, Germany hoegqvist@zib.de, kruber@zib.de Abstract. Distributed
More informationPEER-TO-PEER NETWORKS, DHTS, AND CHORD
PEER-TO-PEER NETWORKS, DHTS, AND CHORD George Porter May 25, 2018 ATTRIBUTION These slides are released under an Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0) Creative Commons license
More informationDistributed Systems. 16. Distributed Lookup. Paul Krzyzanowski. Rutgers University. Fall 2017
Distributed Systems 16. Distributed Lookup Paul Krzyzanowski Rutgers University Fall 2017 1 Distributed Lookup Look up (key, value) Cooperating set of nodes Ideally: No central coordinator Some nodes can
More informationDistributed K-Ary System
A seminar presentation Arne Vater and Prof. Schindelhauer Professorship for Computer Networks and Telematik Department of Computer Science University of Freiburg 2007-03-01 Outline 1 2 3 4 5 Outline 1
More informationDistributed Systems. 17. Distributed Lookup. Paul Krzyzanowski. Rutgers University. Fall 2016
Distributed Systems 17. Distributed Lookup Paul Krzyzanowski Rutgers University Fall 2016 1 Distributed Lookup Look up (key, value) Cooperating set of nodes Ideally: No central coordinator Some nodes can
More informationCSE 486/586 Distributed Systems
CSE 486/586 Distributed Systems Distributed Hash Tables Slides by Steve Ko Computer Sciences and Engineering University at Buffalo CSE 486/586 Last Time Evolution of peer-to-peer Central directory (Napster)
More informationP2P: Distributed Hash Tables
P2P: Distributed Hash Tables Chord + Routing Geometries Nirvan Tyagi CS 6410 Fall16 Peer-to-peer (P2P) Peer-to-peer (P2P) Decentralized! Hard to coordinate with peers joining and leaving Peer-to-peer (P2P)
More informationCISC 7610 Lecture 2b The beginnings of NoSQL
CISC 7610 Lecture 2b The beginnings of NoSQL Topics: Big Data Google s infrastructure Hadoop: open google infrastructure Scaling through sharding CAP theorem Amazon s Dynamo 5 V s of big data Everyone
More informationJargons, Concepts, Scope and Systems. Key Value Stores, Document Stores, Extensible Record Stores. Overview of different scalable relational systems
Jargons, Concepts, Scope and Systems Key Value Stores, Document Stores, Extensible Record Stores Overview of different scalable relational systems Examples of different Data stores Predictions, Comparisons
More informationDistributed Hash Tables: Chord
Distributed Hash Tables: Chord Brad Karp (with many slides contributed by Robert Morris) UCL Computer Science CS M038 / GZ06 12 th February 2016 Today: DHTs, P2P Distributed Hash Tables: a building block
More informationDistributed Hash Tables
Distributed Hash Tables CS6450: Distributed Systems Lecture 11 Ryan Stutsman Material taken/derived from Princeton COS-418 materials created by Michael Freedman and Kyle Jamieson at Princeton University.
More informationA Framework for Peer-To-Peer Lookup Services based on k-ary search
A Framework for Peer-To-Peer Lookup Services based on k-ary search Sameh El-Ansary Swedish Institute of Computer Science Kista, Sweden Luc Onana Alima Department of Microelectronics and Information Technology
More informationCMU SCS CMU SCS Who: What: When: Where: Why: CMU SCS
Carnegie Mellon Univ. Dept. of Computer Science 15-415/615 - DB s C. Faloutsos A. Pavlo Lecture#23: Distributed Database Systems (R&G ch. 22) Administrivia Final Exam Who: You What: R&G Chapters 15-22
More informationScalability In Peer-to-Peer Systems. Presented by Stavros Nikolaou
Scalability In Peer-to-Peer Systems Presented by Stavros Nikolaou Background on Peer-to-Peer Systems Definition: Distributed systems/applications featuring: No centralized control, no hierarchical organization
More informationMarch 10, Distributed Hash-based Lookup. for Peer-to-Peer Systems. Sandeep Shelke Shrirang Shirodkar MTech I CSE
for for March 10, 2006 Agenda for Peer-to-Peer Sytems Initial approaches to Their Limitations CAN - Applications of CAN Design Details Benefits for Distributed and a decentralized architecture No centralized
More information: Scalable Lookup
6.824 2006: Scalable Lookup Prior focus has been on traditional distributed systems e.g. NFS, DSM/Hypervisor, Harp Machine room: well maintained, centrally located. Relatively stable population: can be
More informationDistributed Systems. 29. Distributed Caching Paul Krzyzanowski. Rutgers University. Fall 2014
Distributed Systems 29. Distributed Caching Paul Krzyzanowski Rutgers University Fall 2014 December 5, 2014 2013 Paul Krzyzanowski 1 Caching Purpose of a cache Temporary storage to increase data access
More informationDistributed Hash Tables
KTH ROYAL INSTITUTE OF TECHNOLOGY Distributed Hash Tables Distributed Hash Tables Vladimir Vlassov Large scale data bases hundreds of servers High churn rate servers will come and go Benefits fault tolerant
More informationBig Data Infrastructure CS 489/698 Big Data Infrastructure (Winter 2017)
Big Data Infrastructure CS 489/698 Big Data Infrastructure (Winter 2017) Week 10: Mutable State (1/2) March 14, 2017 Jimmy Lin David R. Cheriton School of Computer Science University of Waterloo These
More informationLarge-Scale Data Stores and Probabilistic Protocols
Distributed Systems 600.437 Large-Scale Data Stores & Probabilistic Protocols Department of Computer Science The Johns Hopkins University 1 Large-Scale Data Stores and Probabilistic Protocols Lecture 11
More informationC 1. Last Time. CSE 486/586 Distributed Systems Distributed Hash Tables. Today s Question. What We Want. What We Want. What We Don t Want
Last Time Distributed Systems Distributed Hash Tables Evolution of peer-to-peer Central directory (Napster) Query flooding (Gnutella) Hierarchical overlay (Kazaa, modern Gnutella) BitTorrent Focuses on
More informationLast Time. CSE 486/586 Distributed Systems Distributed Hash Tables. What We Want. Today s Question. What We Want. What We Don t Want C 1
Last Time Distributed Systems Distributed Hash Tables Evolution of peer-to-peer Central directory (Napster) Query flooding (Gnutella) Hierarchical overlay (Kazaa, modern Gnutella) BitTorrent Focuses on
More informationZHT A Fast, Reliable and Scalable Zero- hop Distributed Hash Table
ZHT A Fast, Reliable and Scalable Zero- hop Distributed Hash Table 1 What is KVS? Why to use? Why not to use? Who s using it? Design issues A storage system A distributed hash table Spread simple structured
More informationDynamo. Smruti R. Sarangi. Department of Computer Science Indian Institute of Technology New Delhi, India. Motivation System Architecture Evaluation
Dynamo Smruti R. Sarangi Department of Computer Science Indian Institute of Technology New Delhi, India Smruti R. Sarangi Leader Election 1/20 Outline Motivation 1 Motivation 2 3 Smruti R. Sarangi Leader
More informationNOSQL EGCO321 DATABASE SYSTEMS KANAT POOLSAWASD DEPARTMENT OF COMPUTER ENGINEERING MAHIDOL UNIVERSITY
NOSQL EGCO321 DATABASE SYSTEMS KANAT POOLSAWASD DEPARTMENT OF COMPUTER ENGINEERING MAHIDOL UNIVERSITY WHAT IS NOSQL? Stands for No-SQL or Not Only SQL. Class of non-relational data storage systems E.g.
More informationDistributed Hash Table
Distributed Hash Table P2P Routing and Searching Algorithms Ruixuan Li College of Computer Science, HUST rxli@public.wh.hb.cn http://idc.hust.edu.cn/~rxli/ In Courtesy of Xiaodong Zhang, Ohio State Univ
More informationCSE 544 Principles of Database Management Systems. Magdalena Balazinska Winter 2015 Lecture 14 NoSQL
CSE 544 Principles of Database Management Systems Magdalena Balazinska Winter 2015 Lecture 14 NoSQL References Scalable SQL and NoSQL Data Stores, Rick Cattell, SIGMOD Record, December 2010 (Vol. 39, No.
More informationPersistence and Node Failure Recovery in Strongly Consistent Key-Value Datastore
Persistence and Node Failure Recovery in Strongly Consistent Key-Value Datastore MUHAMMAD EHSAN UL HAQUE Master of Science Thesis Stockholm, Sweden 2012 TRITA-ICT-EX-2012:175 Persistence and Node Failure
More informationL3S Research Center, University of Hannover
, University of Hannover Dynamics of Wolf-Tilo Balke and Wolf Siberski 21.11.2007 *Original slides provided by S. Rieche, H. Niedermayer, S. Götz, K. Wehrle (University of Tübingen) and A. Datta, K. Aberer
More informationCassandra- A Distributed Database
Cassandra- A Distributed Database Tulika Gupta Department of Information Technology Poornima Institute of Engineering and Technology Jaipur, Rajasthan, India Abstract- A relational database is a traditional
More informationA Structured Overlay for Non-uniform Node Identifier Distribution Based on Flexible Routing Tables
A Structured Overlay for Non-uniform Node Identifier Distribution Based on Flexible Routing Tables Takehiro Miyao, Hiroya Nagao, Kazuyuki Shudo Tokyo Institute of Technology 2-12-1 Ookayama, Meguro-ku,
More informationCS /15/16. Paul Krzyzanowski 1. Question 1. Distributed Systems 2016 Exam 2 Review. Question 3. Question 2. Question 5.
Question 1 What makes a message unstable? How does an unstable message become stable? Distributed Systems 2016 Exam 2 Review Paul Krzyzanowski Rutgers University Fall 2016 In virtual sychrony, a message
More informationCompSci 356: Computer Network Architectures Lecture 21: Overlay Networks Chap 9.4. Xiaowei Yang
CompSci 356: Computer Network Architectures Lecture 21: Overlay Networks Chap 9.4 Xiaowei Yang xwy@cs.duke.edu Overview Problem Evolving solutions IP multicast Proxy caching Content distribution networks
More informationLarge-Scale Data Engineering
Large-Scale Data Engineering nosql: BASE vs ACID THE NEED FOR SOMETHING DIFFERENT One problem, three ideas We want to keep track of mutable state in a scalable manner Assumptions: State organized in terms
More informationHow to make sites responsive? 7/121
How to make sites responsive? 7/2 Goals of Replication Fault-Tolerance That s what we have been looking at so far... Databases We want to have a system that looks like a single node, but can tolerate node
More informationIndexing Large-Scale Data
Indexing Large-Scale Data Serge Abiteboul Ioana Manolescu Philippe Rigaux Marie-Christine Rousset Pierre Senellart Web Data Management and Distribution http://webdam.inria.fr/textbook November 16, 2010
More informationEECS 498 Introduction to Distributed Systems
EECS 498 Introduction to Distributed Systems Fall 2017 Harsha V. Madhyastha Looking back and ahead So far: Different ways of replicating state Tradeoffs between replication and consistency Impact of strong
More informationDISTRIBUTED SYSTEMS CSCI 4963/ /4/2015
1 DISTRIBUTED SYSTEMS CSCI 4963/6963 12/4/2015 2 Info Quiz 7 on Tuesday. Project 2 submission URL is posted on the web site Submit your source code and project report (PDF!!!) in a single zip file. If
More informationFinding Data in the Cloud using Distributed Hash Tables (Chord) IBM Haifa Research Storage Systems
Finding Data in the Cloud using Distributed Hash Tables (Chord) IBM Haifa Research Storage Systems 1 Motivation from the File Systems World The App needs to know the path /home/user/my pictures/ The Filesystem
More informationChapter 24 NOSQL Databases and Big Data Storage Systems
Chapter 24 NOSQL Databases and Big Data Storage Systems - Large amounts of data such as social media, Web links, user profiles, marketing and sales, posts and tweets, road maps, spatial data, email - NOSQL
More informationPerformance and Forgiveness. June 23, 2008 Margo Seltzer Harvard University School of Engineering and Applied Sciences
Performance and Forgiveness June 23, 2008 Margo Seltzer Harvard University School of Engineering and Applied Sciences Margo Seltzer Architect Outline A consistency primer Techniques and costs of consistency
More informationKademlia: A P2P Informa2on System Based on the XOR Metric
Kademlia: A P2P Informa2on System Based on the XOR Metric Today! By Petar Mayamounkov and David Mazières, presented at IPTPS 22 Next! Paper presentation and discussion Image from http://www.vs.inf.ethz.ch/about/zeit.jpg
More informationIntegrity in Distributed Databases
Integrity in Distributed Databases Andreas Farella Free University of Bozen-Bolzano Table of Contents 1 Introduction................................................... 3 2 Different aspects of integrity.....................................
More informationScalable Transactions for Web Applications in the Cloud
Scalable Transactions for Web Applications in the Cloud Zhou Wei 1,2, Guillaume Pierre 1 and Chi-Hung Chi 2 1 Vrije Universiteit, Amsterdam, The Netherlands zhouw@few.vu.nl, gpierre@cs.vu.nl 2 Tsinghua
More informationCS 138: Dynamo. CS 138 XXIV 1 Copyright 2017 Thomas W. Doeppner. All rights reserved.
CS 138: Dynamo CS 138 XXIV 1 Copyright 2017 Thomas W. Doeppner. All rights reserved. Dynamo Highly available and scalable distributed data store Manages state of services that have high reliability and
More informationBig Data Infrastructure CS 489/698 Big Data Infrastructure (Winter 2016)
Big Data Infrastructure CS 489/698 Big Data Infrastructure (Winter 2016) Week 10: Mutable State (1/2) March 15, 2016 Jimmy Lin David R. Cheriton School of Computer Science University of Waterloo These
More informationPage 1. Key Value Storage"
Key Value Storage CS162 Operating Systems and Systems Programming Lecture 14 Key Value Storage Systems March 12, 2012 Anthony D. Joseph and Ion Stoica http://inst.eecs.berkeley.edu/~cs162 Handle huge volumes
More informationDistributed Hash Tables
Smruti R. Sarangi Department of Computer Science Indian Institute of Technology New Delhi, India Smruti R. Sarangi 1/34 Outline 1 2 Smruti R. Sarangi 2/34 Normal Hashtables Hashtable : Contains a set of
More informationGoal of the presentation is to give an introduction of NoSQL databases, why they are there.
1 Goal of the presentation is to give an introduction of NoSQL databases, why they are there. We want to present "Why?" first to explain the need of something like "NoSQL" and then in "What?" we go in
More informationNoSQL systems. Lecture 21 (optional) Instructor: Sudeepa Roy. CompSci 516 Data Intensive Computing Systems
CompSci 516 Data Intensive Computing Systems Lecture 21 (optional) NoSQL systems Instructor: Sudeepa Roy Duke CS, Spring 2016 CompSci 516: Data Intensive Computing Systems 1 Key- Value Stores Duke CS,
More informationConsistency in Distributed Storage Systems. Mihir Nanavati March 4 th, 2016
Consistency in Distributed Storage Systems Mihir Nanavati March 4 th, 2016 Today Overview of distributed storage systems CAP Theorem About Me Virtualization/Containers, CPU microarchitectures/caches, Network
More informationIntroduction to Distributed Data Systems
Introduction to Distributed Data Systems Serge Abiteboul Ioana Manolescu Philippe Rigaux Marie-Christine Rousset Pierre Senellart Web Data Management and Distribution http://webdam.inria.fr/textbook January
More informationSearching for Shared Resources: DHT in General
1 ELT-53206 Peer-to-Peer Networks Searching for Shared Resources: DHT in General Mathieu Devos Tampere University of Technology Department of Electronics and Communications Engineering Based on the original
More informationSearching for Shared Resources: DHT in General
1 ELT-53207 P2P & IoT Systems Searching for Shared Resources: DHT in General Mathieu Devos Tampere University of Technology Department of Electronics and Communications Engineering Based on the original
More informationRule 14 Use Databases Appropriately
Rule 14 Use Databases Appropriately Rule 14: What, When, How, and Why What: Use relational databases when you need ACID properties to maintain relationships between your data. For other data storage needs
More informationTriple Distribution, Resoning and Load Balancing in DHT Based RDF Stores
Albert-Ludwigs-University Freiburg SS 2009 Department of Computer Science Computer Networks and Telematics Triple Distribution, Resoning and Load Balancing in DHT Based RDF Stores Aldarwich Yaser 29. Juli
More informationSpanner: Google's Globally-Distributed Database* Huu-Phuc Vo August 03, 2013
Spanner: Google's Globally-Distributed Database* Huu-Phuc Vo August 03, 2013 *OSDI '12, James C. Corbett et al. (26 authors), Jay Lepreau Best Paper Award Outline What is Spanner? Features & Example Structure
More informationP2P Network Structured Networks: Distributed Hash Tables. Pedro García López Universitat Rovira I Virgili
P2P Network Structured Networks: Distributed Hash Tables Pedro García López Universitat Rovira I Virgili Pedro.garcia@urv.net Index Description of CHORD s Location and routing mechanisms Symphony: Distributed
More informationCS5412: DIVING IN: INSIDE THE DATA CENTER
1 CS5412: DIVING IN: INSIDE THE DATA CENTER Lecture V Ken Birman Data centers 2 Once traffic reaches a data center it tunnels in First passes through a filter that blocks attacks Next, a router that directs
More informationStructured Peer-to-Peer Networks
Structured Peer-to-Peer Networks The P2P Scaling Problem Unstructured P2P Revisited Distributed Indexing Fundamentals of Distributed Hash Tables DHT Algorithms Chord Pastry Can Programming a DHT Graphics
More informationHow do we build TiDB. a Distributed, Consistent, Scalable, SQL Database
How do we build TiDB a Distributed, Consistent, Scalable, SQL Database About me LiuQi ( 刘奇 ) JD / WandouLabs / PingCAP Co-founder / CEO of PingCAP Open-source hacker / Infrastructure software engineer
More informationEssential Skills - RDBMS and SQL
Essential Skills - RDBMS and SQL Essential Skills RDBMS and SQL Daniël van Eeden dveeden@snow.nl October 2011 What is a Database? A structured collection of data What is a DBMS DataBase Management System
More informationCS555: Distributed Systems [Fall 2017] Dept. Of Computer Science, Colorado State University
CS 555: DISTRIBUTED SYSTEMS [P2P SYSTEMS] Shrideep Pallickara Computer Science Colorado State University Frequently asked questions from the previous class survey Byzantine failures vs malicious nodes
More informationWinter CS454/ Assignment 2 Instructor: Bernard Wong Due date: March 12 15, 2012 Group size: 2
Winter CS454/654 2012 Assignment 2 Instructor: Bernard Wong Due date: March 12 15, 2012 Group size: 2 Distributed systems is a rapidly evolving field. Most of the systems we have studied (or will study)
More informationChordNet: A Chord-based self-organizing super-peer network
ChordNet: A Chord-based self-organizing super-peer network Dennis Schwerdel, Matthias Priebe, Paul Müller Dennis Schwerdel University of Kaiserslautern Department of Computer Science Integrated Communication
More informationBabelchord: a Social Tower of DHT-Based Overlay Networks
Babelchord: a Social Tower of DHT-Based Overlay Networks Luigi Liquori Cédric Tedeschi Francesco Bongiovanni INRIA Sophia Antipolis - Méditerranée, France surname.name@sophia.inria.fr Abstract Chord is
More informationCS5412: DIVING IN: INSIDE THE DATA CENTER
1 CS5412: DIVING IN: INSIDE THE DATA CENTER Lecture V Ken Birman We ve seen one cloud service 2 Inside a cloud, Dynamo is an example of a service used to make sure that cloud-hosted applications can scale
More informationDistributed Hash Tables Chord and Dynamo
Distributed Hash Tables Chord and Dynamo (Lecture 19, cs262a) Ion Stoica, UC Berkeley October 31, 2016 Today s Papers Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications, Ion Stoica,
More informationCoordinate-based Routing:
Coordinate-based Routing: Refining NodeIds in Fabian Hartmann and Bernhard Heep P2PNet 09 (in conjunction with ICUMT 09) St. Petersburg, Russia Motivation Structured P2P overlays offering key-based routing
More informationDesigning Robust and Adaptive Distributed Systems with Weakly Interacting Feedback Structures
Designing Robust and Adaptive Distributed Systems with Weakly Interacting Feedback Structures Peter Van Roy Univ. catholique de Louvain Place Sainte Barbe, 2 B-1348, Louvain-la-Neuve peter.vanroy@uclouvain.be
More informationA Structured Overlay for Multi-Dimensional Range Queries
A Structured Overlay for Multi-Dimensional Range Queries Thorsten Schütt, Florian Schintke, and Alexander Reinefeld Zuse Institute Berlin Abstract. We introduce SONAR, a structured overlay to store and
Scalable Data Models with the Transactional Key-Value Store Scalaris
Scalable Data Models with the Transactional Key-Value Store Scalaris Nico Kruber Michael Berlin Zuse Institut Berlin Parallel and Distributed Systems 20th November INGI 2012 Doctoral School Day in Cloud
EECS 498 Introduction to Distributed Systems
EECS 498 Introduction to Distributed Systems Fall 2017 Harsha V. Madhyastha Dynamo Recap Consistent hashing 1-hop DHT enabled by gossip Execution of reads and writes Coordinated by first available successor
Tools for Social Networking Infrastructures
Tools for Social Networking Infrastructures 1 Cassandra - a decentralised structured storage system Problem : Facebook Inbox Search hundreds of millions of users distributed infrastructure inbox changes
Chord: A Scalable Peer-to-peer Lookup Protocol for Internet Applications
Chord: A Scalable Peer-to-peer Lookup Protocol for Internet Applications Amir H. Payberah (amir@sics.se) 1/71 Recap 2/71 Distributed Hash Tables (DHT) An ordinary hash table, which is... Key Fatemeh Sarunas
Today CSCI Coda. Naming: Volumes. Coda GFS PAST. Instructor: Abhishek Chandra. Main Goals: Volume is a subtree in the naming space
Today CSCI 5105 Coda GFS PAST Instructor: Abhishek Chandra 2 Coda Main Goals: Availability: Work in the presence of disconnection Scalability: Support large number of users Successor of Andrew File System
Peer to Peer I II 1 CS 138. Copyright 2015 Thomas W. Doeppner, Rodrigo Fonseca. All rights reserved.
Peer to Peer I II 1 Roadmap This course will feature key concepts in Distributed Systems, often illustrated by their use in example systems Start with Peer-to-Peer systems, which will be useful for your
Distributed Data Store
Distributed Data Store Large-Scale Distributed file system Q: What if we have too much data to store in a single machine? Q: How can we create one big filesystem over a cluster of machines, whose data is
CSE 444: Database Internals. Lectures 26 NoSQL: Extensible Record Stores
CSE 444: Database Internals Lectures 26 NoSQL: Extensible Record Stores CSE 444 - Spring 2014 1 References Scalable SQL and NoSQL Data Stores, Rick Cattell, SIGMOD Record, December 2010 (Vol. 39, No. 4)
Content Overlays (continued) Nick Feamster CS 7260 March 26, 2007
Content Overlays (continued) Nick Feamster CS 7260 March 26, 2007 Administrivia Quiz date Remaining lectures Interim report PS 3 Out Friday, 1-2 problems 2 Structured vs. Unstructured Overlays Structured
A Scalable Content-Addressable Network
A Scalable Content-Addressable Network In Proceedings of ACM SIGCOMM 2001 S. Ratnasamy, P. Francis, M. Handley, R. Karp, S. Shenker Presented by L.G. Alex Sung 9th March 2005 for CS856 1 Outline CAN basics
Overlay networks. To do. Overlay networks. P2P evolution DHTs in general, Chord and Kademlia. Turtles all the way down.
Overlay networks To do Overlay networks P2P evolution DHTs in general, Chord and Kademlia Turtles all the way down Overlay networks virtual networks Different applications with a wide range of needs
Overlay and P2P Networks. Structured Networks and DHTs. Prof. Sasu Tarkoma
Overlay and P2P Networks Structured Networks and DHTs Prof. Sasu Tarkoma 6.2.2014 Contents Today Semantic free indexing Consistent Hashing Distributed Hash Tables (DHTs) Thursday (Dr. Samu Varjonen) DHTs
Spanner: Google's Globally-Distributed Database. Presented by Maciej Swiech
Spanner: Google's Globally-Distributed Database Presented by Maciej Swiech What is Spanner? "...Google's scalable, multi-version, globally-distributed, and synchronously replicated database." What is Spanner?
NoSQL Databases MongoDB vs Cassandra. Kenny Huynh, Andre Chik, Kevin Vu
NoSQL Databases MongoDB vs Cassandra Kenny Huynh, Andre Chik, Kevin Vu Introduction - Relational database model - Concept developed in 1970 - Inefficient - NoSQL - Concept introduced in 1980 - Related
References. What is Bigtable? Bigtable Data Model. Outline. Key Features. CSE 444: Database Internals
References CSE 444: Database Internals Scalable SQL and NoSQL Data Stores, Rick Cattell, SIGMOD Record, December 2010 (Vol 39, No 4) Lectures 26 NoSQL: Extensible Record Stores Bigtable: A Distributed
Megastore: Providing Scalable, Highly Available Storage for Interactive Services & Spanner: Google's Globally-Distributed Database.
Megastore: Providing Scalable, Highly Available Storage for Interactive Services & Spanner: Google's Globally-Distributed Database. Presented by Kewei Li The Problem db nosql complex legacy tuning expensive