Scalable overlay Networks
|
|
- Willis Barber
- 5 years ago
- Views:
Transcription
1 Scalable overlay Networks Dr. Samu Varjonen Scalable overlay networks
2 Lectures MO C122 Introduction. Exercises. Motivation. TH DK117 Unstructured networks I MO C122 Unstructured networks II TH DK117 Bittorrent and evaluation MO C122 Privacy (Freenet etc.) and intro to power-law networks. TH DK117 Consistent hashing. Distributed Hash Tables (DHTs) MO C122 DHTs continued TH DK117 Power-law networks MO C122 Power-law networks and applications. TH DK117 Applications I MO C122 Applications I TH DK117 Advanced topics MO C122 Conclusions and summary TH DK117 Reserved Scalable overlay networks
3 Contents Today Structured Overlay Networks Content Delivery Networks Akamai Coral Amazon Dynamo Scalable overlay networks
4 Content Delivery Networks 4
5 Limitations of Web Proxies (Caching) Inability of cache all objects/content Dynamic Data Encrypted data Server Side Analytics Scalability Hit Metering (limiting and reporting RFC 2227), User Demographics, etc. Inability to support flash crowds 5
6 Content Delivery Networks Role Redirect content requests to an 'optimal site' Cache and Serve content from 'optimal site' Export logs and other information to origin servers Redirection mechanism DNS redirection URL rewriting 6
7 Critical Issues in Deploying CDNs Servers Placement Where to place the servers? How many in each location? Content Selection Which content to distribute in CDNs? Content Replication Proactive push from origin server Cooperative vs Uncooperative Pulls Pricing George Pallis et al. Insight and perspectives for content delivery networks. In Communicatoins of ACM 49, 1 (January 2006), 7
8 Server Placement Problem Given N possible locations at edge of the Internet, we are able to place K (K<N) surrogate servers, how to place them to minimize the total cost? Minimum K-median problem Given N points we need to select K centers Assign each input point j to a center 'closest' to it Minimize the sum of distances between each j and its center NP-Hard 8
9 Minimum K-median problem 9
10 Redirection Techniques Routing Strategy Anycast Load Balancing Application specific selection HTTP redirection (HTTP status codes 3xx) Naming based redirection DNS (example next slide) 10
11 DNS Based Redirection CDN DNS Server 5 Client ISP DNS Server 2 3 Content Provider DNS Server 11
12 Akamai CDN (overview) Client requests content from Original Server URLs for content in CDN modified in the original response Client resolves <content>.<akamai host> name Server from the region (best server) chosen Client fetches content from akamai server 12
13 Akamai Content Source (initial request) DNS Root Server Akamai (high level DNS server) Akamai (low level DNS server) Akamai (content server) Srinivasan Seshan Computer Networking Caching, CDN, Consistent Hashing, P2P 13
14 Akamai Content Source (subsequent request) DNS Root Server Akamai (high level DNS server) 1 Akamai (low level DNS server) Akamai (content server) Srinivasan Seshan Computer Networking Caching, CDN, Consistent Hashing, P2P 14
15 Democratizing Content Publication with Coral (Coral CDN) 15
16 Coral Objectives Pool resources to dissipate Flash Crowds Work with unmodified clients Fetch content only once from Origin No centralized management Browser Coral CDN Coral http prx dnssrv Coral http prx dnssrv Coral http prx dnssrv Origin Server Coral http prx dnssrv Browser Browser Browser Browser Browser 16
17 Using Coral Origin Server rewrites URLs abc.com abc.com.coralhost:coralport Redirect clients to Coral server Coral CDN Components DNS server Given address of resolver used by the client, return the address of proxy near the client HTTP proxy Given the URL find nearest proxy that has content Cache the content (DHT) Distributed Sloppy Hash Table (DSHT) 17
18 Coral System Overview 1) A client sends a DNS request for nyud.net to its local resolver. 2) The client s resolver attempts to resolve the hostname using some Coral DNS server(s) 3) Upon receiving a query, a Coral DNS server probes the client to determines its round-trip-time and last few network hops. 4) Based on the probe results, the DNS server checks Coral to see if there are any known nameservers and/or HTTP proxies near the client s resolver. 5) The DNS server replies, returning any servers (close ones) found through Coral in the previous step; if none were found, it returns a random set of nameservers and proxies. 6) The client s resolver returns the address of a Coral HTTP proxy for 7) The client sends the HTTP request x.com.nyud.net:8090/ to the specified proxy. If the proxy is caching the file locally, it returns the file and stops. Otherwise, this process continues. 8) The proxy looks up the web object s URL in Coral. 9) If Coral returns the address of a node caching the object, the proxy fetches the object from this node. Otherwise, the proxy downloads the object from the origin server, (not shown). 10) The proxy stores the web object and returns it to the client browser. 11) The proxy stores a reference to itself in Coral, recording the fact that is now caching the URL. Freedman, Michael et al. "Democratizing Content Publication with Coral." In NSDI
19 Distributed Sloppy Hash Table Sloppy hashing and self-organizing clusters Michael J. Freedman and David Mazières DHTs provide the wrong abstraction. DHTs have poor locality Overloading nodes caching frequent updates a node might need to send a query half way around the world to learn that its neighbor is caching a particular web page Coral s two principal goals are to avoid hot spots and to find nearby data without querying distant nodes The fundamental observation is that a node doesn t need to know every replicated location of a resource it only needs a single, valid, nearby copy On top of Chord but equally supports Kademlia 19
20 DSHT Coral was made up of concentric rings of distributed hash tables (DHTs) each ring representing a wider and wider geographic range (ping range) The DHTs are composed of nodes all within some latency of each other (for example, a ring of nodes within 20 milliseconds of each other) It avoids hot spots (the 'sloppy' part) by only continuing to query progressively larger sized rings if they are not overburdened If the two top-most rings are experiencing too much traffic, a node will just ping closer ones: when a node that is overloaded is reached, upward progression stops This minimises the occurrence of hot spots, with the disadvantage that knowledge of the system as a whole is reduced. 20
21 Hierarchical Indexing Diameter, Clusters, Levels Each Coral Node part of several DSHTs called clusters Each cluster characterized by max RTT (diameter) Fixed hierarchy of diameters called levels Group of nodes can form a level-i cluster if the pairwise RTT less than threshold for level-i Paper uses 3 (levels): 20ms (2), 60ms (1), SHA-1 for Coral Keys and Node-Ids Bitwise XOR is distance (Kademlia) Longer matching prefix numerically closer Key stored at node having ID close to key (0) 21
22 Routing and Sloppy Storage Routing Initiate from highest level cluster Sloppy Storage Cache key/value pairs at nodes whose IDs are close to the key being referenced Reduces hot-spot congestion for popular content More knowledge of nearby nodes Continue at lower levels on MISS Routing table size logarithmic in total number of nodes Freedman, Michael et al. "Democratizing Content Publication with Coral." In NSDI
23 Reduction in Server Load Initial minute, 15 requests hit the origin web server During this first minute, equal numbers of requests were handled by the level-1 and level-2 cluster caches Within 8-10 minutes, the files became replicated at nearly every server Freedman, Michael et al. "Democratizing Content Publication with Coral." In NSDI
24 Insights from 5 year Deployment A large majority of its traffic does not require any cooperative caching Handling of flash crowds relies on cooperative caching Flash Crowds Small fraction of CoralCDN s domains experience large rate increases within short time periods Flash crowd domains traffic accounts for a small fraction of the total requests Request rate increases very rarely occur on the order of seconds Content delivery via untrusted nodes requires the HTTP protocol to support end-to-end signatures for content integrity Freedman, Michael. "Experiences with CoralCDN: A Five Year Operational View." In NSDI
25 Other CDNs 27
26 Distributing content P2P applications may be oblivious to underlying network Lot of inter-domain traffic (Karagiannis et al. 2005) Approaches to address this problem ISP approaches Block P2P, Rate-limit P2P, Cache content, etc. Reduces traffic but hurts the overlay 28
27 Ono project (P2P) Summary: there are no good solutions that allow ISPs to reduce traffic generated by P2P applications. Proposed solution: use information provided by CDNs for biased peer selection. Features: no cooperation between users and ISPs, no infrastructure,and no network topology information required. Hypothesis: clients who are redirected to the same replica server by a CDN are likely to be good peers. BitTorrent plugin that Periodically, each peer does a DNS lookup for popular CDNs Builds and share ratio maps that how many times the peer has been directed to a certain CDN node. Peer selection biased toward similar ratio maps Taming the Torrent: A Practical Approach to Reducing Cross-ISP Traffic in P2P Systems, Choffnes and Bustamante, in SIGCOMM
28 itracker of P4P P4P (Provider Portals for Applications) Network provider runs an itracker itracker used by ISPs to provide additional information regarding network topology P2P networks may choose to utilize to optimize network data delivery 3. Query itrackers of the ISP for peer selection 1. ISPs run their own itrackers 2. Register (a & b) If trackerless then a and b can connect the itrackers themself Haiyong Xie et al. P4P: provider portal for applications. In SIGCOMM
29 Did this work? Principal result, however, is that this benefit is difficult to achieve in practice Limited impact Reduced performance and robustness Degrades the structural robustness of the overlay Local peers might not be the fastest ones Conflicting interests Too few peers inside a ISP Some of the tier-1 IPSs have incentives to create longer paths than necessary Improvements shown (although, not that big than the original papers) System designers should not expect universal cooperation from IPSs Piatek et al., Pitfalls for ISP-friendly P2P design, 31
30 Maygh P2P CDN P2P CDN on Browser a system that distributes the cost of serving content across the visitors to a web site. Maygh automatically recruits web visitors to help serve content to other visitors, thereby substantially reducing the costs for the web site Leverage on WebSockets, WebRTC, WebStorage API Liang Zhang et al. Maygh: building a CDN from client web browsers. In EuroSys '13. 32
31 Amazon Dynamo 34
32 ACID (Recap) Atomicity Consistency Successful transaction commits only legal results Isolation All or nothing Events within a transaction must be hidden from other transactions running concurrently Durability Once a transaction has been committed its results, the system must guarantee the results survive subsequent malfunctions Theo Haerder et al. "Principles of transaction oriented database recovery." ACM Computing Surveys
33 CAP Theorem C: Strong Consistency (only one consistent state can be observed among parallel processes) A: High Availability (available at all times) P: Partition Tolerance (survive partition between replicas) PICK ANY TWO This is usually relaxed a bit A. Fox et al. Harvest, yield, and scalable tolerant systems. HotOS
34 Two Phase Perspective of CAP Two phase commit P1: Coordinator asks databases to perform a pre-commit and asks them if commit is possible. If all DBs agree then proceed to P2 P2: Coordinator asks DBs to commit Two phase commit supports consistency and partitioning. How is availability violated? Availability of any system is the product of the availability of the components required for the operation i.e., two DBs (availability 99.9) taking part in 2PC results in 99.8 availability ACID provides Consistency. Partion Tolerance is essential. How do you achieve Availability? BASE 37
35 BASE Basically available, Soft state, Eventually consistent Strong vs Eventual (informal comparison) Strong: Every replica sees every update in the same order (atomic updates) Eventual: every replica will eventually see updates and eventually agree on all values (non-atomic updates) Eventual Consistency Database consistency will be in a state of flux but eventually it will be consistent Reads might not return the results of the latest update Dan Pritchett. BASE: An Acid Alternative. ACM Queue
36 Requirements from Dynamo Key-value store shopping carts, seller lists, preferences, product catalog System built using off-the-shelf hardware. Platform must scale to support continuous growth Address tradeoff of high-availability, guaranteed performance, cost-effectiveness, and performance The system needs to have scalable and robust solutions for load balancing, membership and failure detection, failure recovery, replica synchronization, overload handling, state transfer, concurrency and job scheduling, request marshalling, request routing, system monitoring and alarming, and configuration management G. DeCandia et al. Dynamo: Amazon s Highly Available Key value Store, In SOSP
37 Partitioning and Replication in Dynamo Consistent Hashing DHT Each physical node added as multiple virtual nodes Each data-item replicated in N nodes Each virtual node responsible for the region between it and its Nth predecessor Preference List: list of nodes (in multiple datacenters) storing a key G. DeCandia et al. Dynamo: Amazon s Highly Available Key value Store, In SOSP
38 API get (key) may return many versions of the same object put(key, context, object) Context: encodes system metadata and includes information such as the version of the object may return to its caller before the update has been applied at all the replicas An object may have different version sub-histories Vector clock based versioning One vector clock associated with every version of objects 41
39 Data Versioning Assume object is shopping cart. Requirements: additions to the cart don t get lost but deletions can be lost Node Clock Objects versions: D1, D2, D3,... all versions of the object committed to the system are returned when read Reconciliation is usually a merge of the versions so nothing is lost but deleted items may pop back The vector may get long over time but, Dynamo suppors truncation (remove oldest pairs, not foolproof) G. DeCandia et al. Dynamo: Amazon s Highly Available Key value Store, In SOSP
40 Sloppy Quorum Read + Write involves N nodes from the preference list R: minimum number of nodes for Read W: minimum number of nodes for Write R+W>N R = W = 5 high consistency but system is vulnerable to network partitions R = W = 1 weak consistency with failure Typical values of (N, R, W) = (3,2,2) balance between performance and consistency Critique: in the case of network partitions, writes could be made to nodes outside of the set of top N preferred nodes. In this case, there would be no guarantee that writes and reads over the N nodes would overlap since the nodes that constitute N are in flux. Therefore, the formula R+W>N has no meaning. 43
41 Read and Write Operations Coordinator Node responsible for read/writes First node in the preference list Write Operation New vector clock from coordinator Write locally and forward to N-1 nodes, if W-1 nodes respond then write was successful Read Operation Forward request to N-1 nodes, if R-1 nodes respond then forward to user User resolves conflicts and writes back result 44
42 Membership Changes Gossip-based Protocol to propagate membership changes Each node contacts a peer chosen at random every second and the two nodes efficiently reconcile their persisted membership change histories Each node is aware of the key ranges handled by its peers 45
43 Handling Failures: Hinted Handoff Imagine A goes down and N=3 Keys stored by A will now be stored by D D is hinted in the metadata that it is storing keys meant for A When A recovers, the keys at D are now copied to A 46
44 Handling Failures: Merkle Trees Each node maintains seperate Merkle tree for each key-range H(...) Merkle Tree: Leaves are hashes of keys Parents are hashes of children Minimize the amount of transferred data H(H(B3), H(H(B4), H(B5))) H(H(B1), H(B2)) H(B3) H(B1) H(B2) H(B4) Easy to check if a branch is the same (i.e. the hash is the same) H(H(B4), H(B5)) H(B5) 47
Dynamo: Amazon s Highly Available Key-value Store. ID2210-VT13 Slides by Tallat M. Shafaat
Dynamo: Amazon s Highly Available Key-value Store ID2210-VT13 Slides by Tallat M. Shafaat Dynamo An infrastructure to host services Reliability and fault-tolerance at massive scale Availability providing
More informationCS Amazon Dynamo
CS 5450 Amazon Dynamo Amazon s Architecture Dynamo The platform for Amazon's e-commerce services: shopping chart, best seller list, produce catalog, promotional items etc. A highly available, distributed
More informationDemocratizing Content Publication with Coral
Democratizing Content Publication with Mike Freedman Eric Freudenthal David Mazières New York University NSDI 2004 A problem Feb 3: Google linked banner to julia fractals Users clicking directed to Australian
More informationDYNAMO: AMAZON S HIGHLY AVAILABLE KEY-VALUE STORE. Presented by Byungjin Jun
DYNAMO: AMAZON S HIGHLY AVAILABLE KEY-VALUE STORE Presented by Byungjin Jun 1 What is Dynamo for? Highly available key-value storages system Simple primary-key only interface Scalable and Reliable Tradeoff:
More informationDemocratizing Content Publication with Coral
Democratizing Content Publication with Mike Freedman Eric Freudenthal David Mazières New York University www.scs.cs.nyu.edu/coral A problem Feb 3: Google linked banner to julia fractals Users clicking
More informationHorizontal or vertical scalability? Horizontal scaling is challenging. Today. Scaling Out Key-Value Storage
Horizontal or vertical scalability? Scaling Out Key-Value Storage COS 418: Distributed Systems Lecture 8 Kyle Jamieson Vertical Scaling Horizontal Scaling [Selected content adapted from M. Freedman, B.
More informationCAP Theorem, BASE & DynamoDB
Indian Institute of Science Bangalore, India भ रत य व ज ञ न स स थ न ब गल र, भ रत DS256:Jan18 (3:1) Department of Computational and Data Sciences CAP Theorem, BASE & DynamoDB Yogesh Simmhan Yogesh Simmhan
More informationScaling Out Key-Value Storage
Scaling Out Key-Value Storage COS 418: Distributed Systems Logan Stafman [Adapted from K. Jamieson, M. Freedman, B. Karp] Horizontal or vertical scalability? Vertical Scaling Horizontal Scaling 2 Horizontal
More informationOverlay and P2P Networks. Applications. Prof. Sasu Tarkoma and
Overlay and P2P Networks Applications Prof. Sasu Tarkoma 10.2.2015 and 12.2.2015 Contents Monday 9.2. Applications I BitTorrent Mainline DHT Scribe and PAST Thursday 12.2. Applications II P2PSIP Internet
More informationCS 138: Dynamo. CS 138 XXIV 1 Copyright 2017 Thomas W. Doeppner. All rights reserved.
CS 138: Dynamo CS 138 XXIV 1 Copyright 2017 Thomas W. Doeppner. All rights reserved. Dynamo Highly available and scalable distributed data store Manages state of services that have high reliability and
More informationPresented By: Devarsh Patel
: Amazon s Highly Available Key-value Store Presented By: Devarsh Patel CS5204 Operating Systems 1 Introduction Amazon s e-commerce platform Requires performance, reliability and efficiency To support
More informationDynamo: Amazon s Highly Available Key-Value Store
Dynamo: Amazon s Highly Available Key-Value Store DeCandia et al. Amazon.com Presented by Sushil CS 5204 1 Motivation A storage system that attains high availability, performance and durability Decentralized
More informationDynamo. Smruti R. Sarangi. Department of Computer Science Indian Institute of Technology New Delhi, India. Motivation System Architecture Evaluation
Dynamo Smruti R. Sarangi Department of Computer Science Indian Institute of Technology New Delhi, India Smruti R. Sarangi Leader Election 1/20 Outline Motivation 1 Motivation 2 3 Smruti R. Sarangi Leader
More informationRecap. CSE 486/586 Distributed Systems Case Study: Amazon Dynamo. Amazon Dynamo. Amazon Dynamo. Necessary Pieces? Overview of Key Design Techniques
Recap Distributed Systems Case Study: Amazon Dynamo CAP Theorem? Consistency, Availability, Partition Tolerance P then C? A? Eventual consistency? Availability and partition tolerance over consistency
More informationDynamo: Key-Value Cloud Storage
Dynamo: Key-Value Cloud Storage Brad Karp UCL Computer Science CS M038 / GZ06 22 nd February 2016 Context: P2P vs. Data Center (key, value) Storage Chord and DHash intended for wide-area peer-to-peer systems
More informationScalable overlay Networks
overlay Networks Dr. Samu Varjonen 1 Contents Course overview Lectures Assignments/Exercises 2 Course Overview Overlay networks and peer-to-peer technologies have become key components for building large
More informationRecap. CSE 486/586 Distributed Systems Case Study: Amazon Dynamo. Amazon Dynamo. Amazon Dynamo. Necessary Pieces? Overview of Key Design Techniques
Recap CSE 486/586 Distributed Systems Case Study: Amazon Dynamo Steve Ko Computer Sciences and Engineering University at Buffalo CAP Theorem? Consistency, Availability, Partition Tolerance P then C? A?
More informationScalable overlay Networks
overlay Networks Dr. Samu Varjonen 1 Lectures MO 15.01. C122 Introduction. Exercises. Motivation. TH 18.01. DK117 Unstructured networks I MO 22.01. C122 Unstructured networks II TH 25.01. DK117 Bittorrent
More informationEECS 498 Introduction to Distributed Systems
EECS 498 Introduction to Distributed Systems Fall 2017 Harsha V. Madhyastha Dynamo Recap Consistent hashing 1-hop DHT enabled by gossip Execution of reads and writes Coordinated by first available successor
More informationScaling KVS. CS6450: Distributed Systems Lecture 14. Ryan Stutsman
Scaling KVS CS6450: Distributed Systems Lecture 14 Ryan Stutsman Material taken/derived from Princeton COS-418 materials created by Michael Freedman and Kyle Jamieson at Princeton University. Licensed
More informationDynamo: Amazon s Highly Available Key-value Store
Dynamo: Amazon s Highly Available Key-value Store Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swaminathan Sivasubramanian, Peter Vosshall and
More informationDistributed Systems. Characteristics of Distributed Systems. Lecture Notes 1 Basic Concepts. Operating Systems. Anand Tripathi
1 Lecture Notes 1 Basic Concepts Anand Tripathi CSci 8980 Operating Systems Anand Tripathi CSci 8980 1 Distributed Systems A set of computers (hosts or nodes) connected through a communication network.
More informationDistributed Systems. Characteristics of Distributed Systems. Characteristics of Distributed Systems. Goals in Distributed System Designs
1 Anand Tripathi CSci 8980 Operating Systems Lecture Notes 1 Basic Concepts Distributed Systems A set of computers (hosts or nodes) connected through a communication network. Nodes may have different speeds
More informationDistributed Systems. 21. Content Delivery Networks (CDN) Paul Krzyzanowski. Rutgers University. Fall 2018
Distributed Systems 21. Content Delivery Networks (CDN) Paul Krzyzanowski Rutgers University Fall 2018 1 2 Motivation Serving web content from one location presents problems Scalability Reliability Performance
More informationCS555: Distributed Systems [Fall 2017] Dept. Of Computer Science, Colorado State University
CS 555: DISTRIBUTED SYSTEMS [P2P SYSTEMS] Shrideep Pallickara Computer Science Colorado State University Frequently asked questions from the previous class survey Byzantine failures vs malicious nodes
More informationCSE 544 Principles of Database Management Systems. Magdalena Balazinska Winter 2015 Lecture 14 NoSQL
CSE 544 Principles of Database Management Systems Magdalena Balazinska Winter 2015 Lecture 14 NoSQL References Scalable SQL and NoSQL Data Stores, Rick Cattell, SIGMOD Record, December 2010 (Vol. 39, No.
More informationCS November 2018
Distributed Systems 21. Delivery Networks (CDN) Paul Krzyzanowski Rutgers University Fall 2018 1 2 Motivation Serving web content from one location presents problems Scalability Reliability Performance
More informationCONTENT DISTRIBUTION. Oliver Michel University of Illinois at Urbana-Champaign. October 25th, 2011
CONTENT DISTRIBUTION Oliver Michel University of Illinois at Urbana-Champaign October 25th, 2011 OVERVIEW 1. Why use advanced techniques for content distribution on the internet? 2. CoralCDN 3. Identifying
More informationDrafting Behind Akamai (Travelocity-Based Detouring)
(Travelocity-Based Detouring) Ao-Jan Su, David R. Choffnes, Aleksandar Kuzmanovic and Fabián E. Bustamante Department of EECS Northwestern University ACM SIGCOMM 2006 Drafting Detour 2 Motivation Growing
More informationDistributed Systems. 16. Distributed Lookup. Paul Krzyzanowski. Rutgers University. Fall 2017
Distributed Systems 16. Distributed Lookup Paul Krzyzanowski Rutgers University Fall 2017 1 Distributed Lookup Look up (key, value) Cooperating set of nodes Ideally: No central coordinator Some nodes can
More informationPeer-to-Peer Systems and Distributed Hash Tables
Peer-to-Peer Systems and Distributed Hash Tables CS 240: Computing Systems and Concurrency Lecture 8 Marco Canini Credits: Michael Freedman and Kyle Jamieson developed much of the original material. Selected
More informationToday: World Wide Web! Traditional Web-Based Systems!
Today: World Wide Web! WWW principles Case Study: web caching as an illustrative example Invalidate versus updates Push versus Pull Cooperation between replicas Lecture 22, page 1 Traditional Web-Based
More information416 Distributed Systems. March 23, 2018 CDNs
416 Distributed Systems March 23, 2018 CDNs Outline DNS Design (317) Content Distribution Networks 2 Typical Workload (Web Pages) Multiple (typically small) objects per page File sizes are heavy-tailed
More informationA Survey on Research on the Application-Layer Traffic Optimization (ALTO) Problem
A Survey on Research on the Application-Layer Traffic Optimization (ALTO) Problem draft-rimac-p2prg-alto-survey-00 Marco Tomsu, Ivica Rimac, Volker Hilt, Vijay Gurbani, Enrico Marocco 75 th IETF Meeting,
More information11/12/2018 Week 13-A Sangmi Lee Pallickara. CS435 Introduction to Big Data FALL 2018 Colorado State University
S Introduction to ig Data Week - W... S Introduction to ig Data W.. FQs Term project final report Preparation guide is available at: http://www.cs.colostate.edu/~cs/tp.html Successful final project must
More informationBackground. Distributed Key/Value stores provide a simple put/get interface. Great properties: scalability, availability, reliability
Background Distributed Key/Value stores provide a simple put/get interface Great properties: scalability, availability, reliability Increasingly popular both within data centers Cassandra Dynamo Voldemort
More informationCE693: Adv. Computer Networking
CE693: Adv. Computer Networking L-17 Naming Acknowledgments: Lecture slides are from the graduate level Computer Networks course thought by Srinivasan Seshan at CMU. When slides are obtained from other
More informationTelematics Chapter 9: Peer-to-Peer Networks
Telematics Chapter 9: Peer-to-Peer Networks Beispielbild User watching video clip Server with video clips Application Layer Presentation Layer Application Layer Presentation Layer Session Layer Session
More informationDynamo: Amazon s Highly Available Key-value Store
Dynamo: Amazon s Highly Available Key-value Store Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swaminathan Sivasubramanian, Peter Vosshall and
More informationThe Design and Implementation of a Next Generation Name Service for the Internet (CoDoNS) Presented By: Kamalakar Kambhatla
The Design and Implementation of a Next Generation Name Service for the Internet (CoDoNS) Venugopalan Ramasubramanian Emin Gün Sirer Presented By: Kamalakar Kambhatla * Slides adapted from the paper -
More informationDynamo Tom Anderson and Doug Woos
Dynamo motivation Dynamo Tom Anderson and Doug Woos Fast, available writes - Shopping cart: always enable purchases FLP: consistency and progress at odds - Paxos: must communicate with a quorum Performance:
More information10. Replication. Motivation
10. Replication Page 1 10. Replication Motivation Reliable and high-performance computation on a single instance of a data object is prone to failure. Replicate data to overcome single points of failure
More informationLecture 6: Overlay Networks. CS 598: Advanced Internetworking Matthew Caesar February 15, 2011
Lecture 6: Overlay Networks CS 598: Advanced Internetworking Matthew Caesar February 15, 2011 1 Overlay networks: Motivations Protocol changes in the network happen very slowly Why? Internet is shared
More informationCS 43: Computer Networks. 14: DHTs and CDNs October 3, 2018
CS 43: Computer Networks 14: DHTs and CDNs October 3, 2018 If Alice wants to stream Mad Men over a highspeed Internet connection her browser may choose a video rate A. in the range of Mbps. B. in the range
More informationOverview Computer Networking Lecture 16: Delivering Content: Peer to Peer and CDNs Peter Steenkiste
Overview 5-44 5-44 Computer Networking 5-64 Lecture 6: Delivering Content: Peer to Peer and CDNs Peter Steenkiste Web Consistent hashing Peer-to-peer Motivation Architectures Discussion CDN Video Fall
More informationCS 655 Advanced Topics in Distributed Systems
Presented by : Walid Budgaga CS 655 Advanced Topics in Distributed Systems Computer Science Department Colorado State University 1 Outline Problem Solution Approaches Comparison Conclusion 2 Problem 3
More informationPeer-to-peer computing research a fad?
Peer-to-peer computing research a fad? Frans Kaashoek kaashoek@lcs.mit.edu NSF Project IRIS http://www.project-iris.net Berkeley, ICSI, MIT, NYU, Rice What is a P2P system? Node Node Node Internet Node
More informationToday. Why might P2P be a win? What is a Peer-to-Peer (P2P) system? Peer-to-Peer Systems and Distributed Hash Tables
Peer-to-Peer Systems and Distributed Hash Tables COS 418: Distributed Systems Lecture 7 Today 1. Peer-to-Peer Systems Napster, Gnutella, BitTorrent, challenges 2. Distributed Hash Tables 3. The Chord Lookup
More informationDistributed Hash Tables Chord and Dynamo
Distributed Hash Tables Chord and Dynamo (Lecture 19, cs262a) Ion Stoica, UC Berkeley October 31, 2016 Today s Papers Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications, Ion Stoica,
More informationCS November 2017
Distributed Systems 21. Delivery Networks () Paul Krzyzanowski Rutgers University Fall 2017 1 2 Motivation Serving web content from one location presents problems Scalability Reliability Performance Flash
More informationJohn S. Otto Mario A. Sánchez John P. Rula Fabián E. Bustamante
John S. Otto Mario A. Sánchez John P. Rula Fabián E. Bustamante Northwestern, EECS http://aqualab.cs.northwestern.edu ! DNS designed to map names to addresses Evolved into a large-scale distributed system!
More informationLarge-Scale Key-Value Stores Eventual Consistency Marco Serafini
Large-Scale Key-Value Stores Eventual Consistency Marco Serafini COMPSCI 590S Lecture 13 Goals of Key-Value Stores Export simple API put(key, value) get(key) Simpler and faster than a DBMS Less complexity,
More informationLast Class: Consistency Models. Today: Implementation Issues
Last Class: Consistency Models Need for replication Data-centric consistency Strict, linearizable, sequential, causal, FIFO Lecture 15, page 1 Today: Implementation Issues Replica placement Use web caching
More informationFAQs Snapshots and locks Vector Clock
//08 CS5 Introduction to Big - FALL 08 W.B.0.0 CS5 Introduction to Big //08 CS5 Introduction to Big - FALL 08 W.B. FAQs Snapshots and locks Vector Clock PART. LARGE SCALE DATA STORAGE SYSTEMS NO SQL DATA
More informationCSE 124: CONTENT-DISTRIBUTION NETWORKS. George Porter December 4, 2017
CSE 124: CONTENT-DISTRIBUTION NETWORKS George Porter December 4, 2017 ATTRIBUTION These slides are released under an Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0) Creative Commons
More informationCS 43: Computer Networks BitTorrent & Content Distribution. Kevin Webb Swarthmore College September 28, 2017
CS 43: Computer Networks BitTorrent & Content Distribution Kevin Webb Swarthmore College September 28, 2017 Agenda BitTorrent Cooperative file transfers Briefly: Distributed Hash Tables Finding things
More informationOverlay and P2P Networks. Unstructured networks. Prof. Sasu Tarkoma
Overlay and P2P Networks Unstructured networks Prof. Sasu Tarkoma 20.1.2014 Contents P2P index revisited Unstructured networks Gnutella Bloom filters BitTorrent Freenet Summary of unstructured networks
More informationThere is a tempta7on to say it is really used, it must be good
Notes from reviews Dynamo Evalua7on doesn t cover all design goals (e.g. incremental scalability, heterogeneity) Is it research? Complexity? How general? Dynamo Mo7va7on Normal database not the right fit
More informationPerformance and Forgiveness. June 23, 2008 Margo Seltzer Harvard University School of Engineering and Applied Sciences
Performance and Forgiveness June 23, 2008 Margo Seltzer Harvard University School of Engineering and Applied Sciences Margo Seltzer Architect Outline A consistency primer Techniques and costs of consistency
More informationDistributed Hash Tables
Distributed Hash Tables What is a DHT? Hash Table data structure that maps keys to values essen=al building block in so?ware systems Distributed Hash Table (DHT) similar, but spread across many hosts Interface
More informationEARM: An Efficient and Adaptive File Replication with Consistency Maintenance in P2P Systems.
: An Efficient and Adaptive File Replication with Consistency Maintenance in P2P Systems. 1 K.V.K.Chaitanya, 2 Smt. S.Vasundra, M,Tech., (Ph.D), 1 M.Tech (Computer Science), 2 Associate Professor, Department
More informationDynamo: Amazon s Highly Available Key-value Store
Dynamo: Amazon s Highly Available Key-value Store Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swaminathan Sivasubramanian, Peter Vosshall and
More informationDistributed Hash Tables
Distributed Hash Tables CS6450: Distributed Systems Lecture 11 Ryan Stutsman Material taken/derived from Princeton COS-418 materials created by Michael Freedman and Kyle Jamieson at Princeton University.
More informationConsistency in Distributed Systems
Consistency in Distributed Systems Recall the fundamental DS properties DS may be large in scale and widely distributed 1. concurrent execution of components 2. independent failure modes 3. transmission
More informationMarch 10, Distributed Hash-based Lookup. for Peer-to-Peer Systems. Sandeep Shelke Shrirang Shirodkar MTech I CSE
for for March 10, 2006 Agenda for Peer-to-Peer Sytems Initial approaches to Their Limitations CAN - Applications of CAN Design Details Benefits for Distributed and a decentralized architecture No centralized
More informationCONTENT-DISTRIBUTION NETWORKS
CONTENT-DISTRIBUTION NETWORKS George Porter June 1, 2018 ATTRIBUTION These slides are released under an Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0) Creative Commons license These
More informationOverlay networks. To do. Overlay networks. P2P evolution DHTs in general, Chord and Kademlia. Turtles all the way down. q q q
Overlay networks To do q q q Overlay networks P2P evolution DHTs in general, Chord and Kademlia Turtles all the way down Overlay networks virtual networks Different applications with a wide range of needs
More informationServer Selection Mechanism. Server Selection Policy. Content Distribution Network. Content Distribution Networks. Proactive content replication
Content Distribution Network Content Distribution Networks COS : Advanced Computer Systems Lecture Mike Freedman Proactive content replication Content provider (e.g., CNN) contracts with a CDN CDN replicates
More informationCOOPERATIVE WEB CACHING OF DYNAMIC WEB CONTENT
The American University in Cairo School of Sciences and Engineering COOPERATIVE WEB CACHING OF DYNAMIC WEB CONTENT A Thesis Submitted to The Department of Computer Science and Engineering in partial fulfillment
More informationOverlay and P2P Networks. Introduction. Prof. Sasu Tarkoma
Overlay and P2P Networks Introduction Prof. Sasu Tarkoma 12.1.2015 Contents Course Overview Lectures Assignments/Exercises Course Overview Overlay networks and peer-to-peer technologies have become key
More informationContent Overlays. Nick Feamster CS 7260 March 12, 2007
Content Overlays Nick Feamster CS 7260 March 12, 2007 Content Overlays Distributed content storage and retrieval Two primary approaches: Structured overlay Unstructured overlay Today s paper: Chord Not
More informationContent Distribution. Today. l Challenges of content delivery l Content distribution networks l CDN through an example
Content Distribution Today l Challenges of content delivery l Content distribution networks l CDN through an example Trends and application need " Some clear trends Growing number of and faster networks
More informationCS 640 Introduction to Computer Networks. Today s lecture. What is P2P? Lecture30. Peer to peer applications
Introduction to Computer Networks Lecture30 Today s lecture Peer to peer applications Napster Gnutella KaZaA Chord What is P2P? Significant autonomy from central servers Exploits resources at the edges
More informationDistributed Data Management Replication
Felix Naumann F-2.03/F-2.04, Campus II Hasso Plattner Institut Distributing Data Motivation Scalability (Elasticity) If data volume, processing, or access exhausts one machine, you might want to spread
More informationCompSci 356: Computer Network Architectures Lecture 21: Overlay Networks Chap 9.4. Xiaowei Yang
CompSci 356: Computer Network Architectures Lecture 21: Overlay Networks Chap 9.4 Xiaowei Yang xwy@cs.duke.edu Overview Problem Evolving solutions IP multicast Proxy caching Content distribution networks
More informationOverlay and P2P Networks. Unstructured networks. PhD. Samu Varjonen
Overlay and P2P Networks Unstructured networks PhD. Samu Varjonen 25.1.2016 Contents Unstructured networks Last week Napster Skype This week: Gnutella BitTorrent P2P Index It is crucial to be able to find
More informationCS555: Distributed Systems [Fall 2017] Dept. Of Computer Science, Colorado State University
CS 555: DISTRIBUTED SYSTEMS [DYNAMO & GOOGLE FILE SYSTEM] Frequently asked questions from the previous class survey What s the typical size of an inconsistency window in most production settings? Dynamo?
More informationToday s class. CSE 123b Communications Software. Telnet. Network File System (NFS) Quick descriptions of some other sample applications
CSE 123b Communications Software Spring 2004 Today s class Quick examples of other application protocols Mail, telnet, NFS Content Distribution Networks (CDN) Lecture 12: Content Distribution Networks
More informationTECHNISCHE UNIVERSITEIT EINDHOVEN Faculteit Wiskunde en Informatica
TECHNISCHE UNIVERSITEIT EINDHOVEN Faculteit Wiskunde en Informatica Examination Architecture of Distributed Systems (2IMN10 / 2II45), on Monday November 2, 2015, from 13.30 to 16.30 hours. Indicate on
More informationDHT Overview. P2P: Advanced Topics Filesystems over DHTs and P2P research. How to build applications over DHTS. What we would like to have..
DHT Overview P2P: Advanced Topics Filesystems over DHTs and P2P research Vyas Sekar DHTs provide a simple primitive put (key,value) get (key) Data/Nodes distributed over a key-space High-level idea: Move
More informationDISTRIBUTED SYSTEMS Principles and Paradigms Second Edition ANDREW S. TANENBAUM MAARTEN VAN STEEN. Chapter 1. Introduction
DISTRIBUTED SYSTEMS Principles and Paradigms Second Edition ANDREW S. TANENBAUM MAARTEN VAN STEEN Chapter 1 Introduction Modified by: Dr. Ramzi Saifan Definition of a Distributed System (1) A distributed
More informationSCALABLE CONSISTENCY AND TRANSACTION MODELS
Data Management in the Cloud SCALABLE CONSISTENCY AND TRANSACTION MODELS 69 Brewer s Conjecture Three properties that are desirable and expected from realworld shared-data systems C: data consistency A:
More informationScalability In Peer-to-Peer Systems. Presented by Stavros Nikolaou
Scalability In Peer-to-Peer Systems Presented by Stavros Nikolaou Background on Peer-to-Peer Systems Definition: Distributed systems/applications featuring: No centralized control, no hierarchical organization
More informationConsistency and Replication (part b)
Consistency and Replication (part b) EECS 591 Farnam Jahanian University of Michigan Tanenbaum Chapter 6.1-6.5 Eventual Consistency A very weak consistency model characterized by the lack of simultaneous
More informationBuilding Consistent Transactions with Inconsistent Replication
Building Consistent Transactions with Inconsistent Replication Irene Zhang, Naveen Kr. Sharma, Adriana Szekeres, Arvind Krishnamurthy, Dan R. K. Ports University of Washington Distributed storage systems
More informationChapter 6: Distributed Systems: The Web. Fall 2012 Sini Ruohomaa Slides joint work with Jussi Kangasharju et al.
Chapter 6: Distributed Systems: The Web Fall 2012 Sini Ruohomaa Slides joint work with Jussi Kangasharju et al. Chapter Outline Web as a distributed system Basic web architecture Content delivery networks
More informationCS /15/16. Paul Krzyzanowski 1. Question 1. Distributed Systems 2016 Exam 2 Review. Question 3. Question 2. Question 5.
Question 1 What makes a message unstable? How does an unstable message become stable? Distributed Systems 2016 Exam 2 Review Paul Krzyzanowski Rutgers University Fall 2016 In virtual sychrony, a message
More informationDepartment of Computer Science Institute for System Architecture, Chair for Computer Networks. Caching, Content Distribution and Load Balancing
Department of Computer Science Institute for System Architecture, Chair for Computer Networks Caching, Content Distribution and Load Balancing Motivation Which optimization means do exist? Where should
More informationPitfalls for ISP-friendly P2P design. Michael Piatek*, Harsha V. Madhyastha, John P. John*, Arvind Krishnamurthy*, Thomas Anderson* *UW, UCSD
Pitfalls for ISP-friendly P2P design Michael Piatek*, Harsha V. Madhyastha, John P. John*, Arvind Krishnamurthy*, Thomas Anderson* *UW, UCSD P2P & ISPs P2P systems: Large volume of traffic (20 80% of total)
More information08 Distributed Hash Tables
08 Distributed Hash Tables 2/59 Chord Lookup Algorithm Properties Interface: lookup(key) IP address Efficient: O(log N) messages per lookup N is the total number of servers Scalable: O(log N) state per
More informationCSE 123b Communications Software
CSE 123b Communications Software Spring 2002 Lecture 13: Content Distribution Networks (plus some other applications) Stefan Savage Some slides courtesy Srini Seshan Today s class Quick examples of other
More informationCS555: Distributed Systems [Fall 2017] Dept. Of Computer Science, Colorado State University
CS 555: DISTRIBUTED SYSTEMS [REPLICATION & CONSISTENCY] Frequently asked questions from the previous class survey Shrideep Pallickara Computer Science Colorado State University L25.1 L25.2 Topics covered
More informationCSC2231: DNS with DHTs
CSC2231: DNS with DHTs http://www.cs.toronto.edu/~stefan/courses/csc2231/05au Stefan Saroiu Department of Computer Science University of Toronto Administrivia Next lecture: P2P churn Understanding Availability
More information11/13/2018 CACHING, CONTENT-DISTRIBUTION NETWORKS, AND OVERLAY NETWORKS ATTRIBUTION
CACHING, CONTENT-DISTRIBUTION NETWORKS, AND OVERLAY NETWORKS George Porter November 1, 2018 ATTRIBUTION These slides are released under an Attribution-NonCommercial-ShareAlike.0 Unported (CC BY-NC-SA.0)
More informationIntroduction to Distributed Data Systems
Introduction to Distributed Data Systems Serge Abiteboul Ioana Manolescu Philippe Rigaux Marie-Christine Rousset Pierre Senellart Web Data Management and Distribution http://webdam.inria.fr/textbook January
More informationWeb, HTTP, Caching, CDNs
Web, HTTP, Caching, CDNs Outline Web HyperText Transfer Protocol (HTTP) Inefficiencies in HTTP HTTP Persistent Connections Caching CDNs Consistent Hashing CS 640 1 Web Original goal of the web: mechanism
More informationOverlay and P2P Networks. Introduction. Prof. Sasu Tarkoma
Overlay and P2P Networks Introduction Prof. Sasu Tarkoma 13.1.2014 Contents Course Overview Lectures Assignments/Exercises Course Overview Overlay networks and peer-to-peer technologies have become key
More informationPage 1. Key Value Storage"
Key Value Storage CS162 Operating Systems and Systems Programming Lecture 14 Key Value Storage Systems March 12, 2012 Anthony D. Joseph and Ion Stoica http://inst.eecs.berkeley.edu/~cs162 Handle huge volumes
More informationConsistency and Replication 1/62
Consistency and Replication 1/62 Replicas and Consistency??? Tatiana Maslany in the show Orphan Black: The story of a group of clones that discover each other and the secret organization Dyad, which was
More informationCS 347 Parallel and Distributed Data Processing
CS 347 Parallel and Distributed Data Processing Spring 2016 Notes 12: Distributed Information Retrieval CS 347 Notes 12 2 CS 347 Notes 12 3 CS 347 Notes 12 4 Web Search Engine Crawling Indexing Computing
More information