Page 1. Key Value Storage" System Examples" Key Values: Examples "

Similar documents
Page 1. Key Value Storage" System Examples" Key Values: Examples "

Page 1. Key Value Storage"

Distributed Key-Value Stores UCSB CS170

Page 1. Goals for Today" What Is A Protocol?" CS162 Operating Systems and Systems Programming Lecture 10. Protocols, Layering and e2e Argument"

Distributed Hash Tables Chord and Dynamo

Haridimos Kondylakis Computer Science Department, University of Crete

Goals for Today s Class. EE 122: Networks & Protocols. What Global (non-digital) Communication Network Do You Use Every Day?

EE 122: Introduction To Communication Networks. Some Questions & Answers

Page 1. Review: The Problem" Software System Modularity" Solution: Intermediate Layers"

Internet Design Principles

CS3600 SYSTEMS AND NETWORKS

416 Distributed Systems. Networks review; Day 1 of 2 Jan 5 + 8, 2018

Lecture 2: Layering & End-to-End

Chapter 1. Computer Networks and the Internet

CS 162 Operating Systems and Systems Programming Professor: Anthony D. Joseph Spring Lecture 21: Network Protocols (and 2 Phase Commit)

Page 1. Goals for Today" Discussion" Example: Reliable File Transfer" CS162 Operating Systems and Systems Programming Lecture 11

LECTURE 9. Ad hoc Networks and Routing

CS 43: Computer Networks. 08:Network Services and Distributed Systems 19 September

CSE 544 Principles of Database Management Systems. Magdalena Balazinska Winter 2015 Lecture 14 NoSQL

Dynamo: Key-Value Cloud Storage

Introduction to Distributed Data Systems

Introduction to Networks

CSCI Computer Networks

Page 1. How Did it Start?" Model" Main Challenge" CS162 Operating Systems and Systems Programming Lecture 24. Peer-to-Peer Networks"

Large-Scale Data Stores and Probabilistic Protocols

CS 138: Dynamo. CS 138 XXIV 1 Copyright 2017 Thomas W. Doeppner. All rights reserved.

Distributed systems. Lecture 6: distributed transactions, elections, consensus and replication. Malte Schwarzkopf

The Internet. Overview. Network building blocks

Dynamo: Amazon s Highly Available Key-value Store. ID2210-VT13 Slides by Tallat M. Shafaat

Announcements Computer Networking. What is the Objective of the Internet? Today s Lecture

CE 443: Computer Networks

Replication in Distributed Systems

Networking and Internetworking 1

The Transmission Control Protocol (TCP)

CS4700/CS5700 Fundaments of Computer Networks

10. Replication. Motivation

Web Mechanisms. Draft: 2/23/13 6:54 PM 2013 Christopher Vickery

Review of Topology and Access Techniques / Switching Concepts

Lecture 3 Protocol Stacks and Layering

CSC 401 Data and Computer Communications Networks

CMPE 80N: Introduction to Networking and the Internet

Scaling Out Key-Value Storage

CS457 Transport Protocols. CS 457 Fall 2014

Outline. Inter-Process Communication. IPC across machines: Problems. CSCI 4061 Introduction to Operating Systems

CS 5520/ECE 5590NA: Network Architecture I Spring Lecture 13: UDP and TCP

Next Steps Spring 2011 Lecture #18. Multi-hop Networks. Network Reliability. Have: digital point-to-point. Want: many interconnected points

Large-Scale Key-Value Stores Eventual Consistency Marco Serafini

Causal Consistency. CS 240: Computing Systems and Concurrency Lecture 16. Marco Canini

IP Packet Switching. Goals of Todayʼs Lecture. Simple Network: Nodes and a Link. Connectivity Links and nodes Circuit switching Packet switching

Distributed Systems. Characteristics of Distributed Systems. Lecture Notes 1 Basic Concepts. Operating Systems. Anand Tripathi

Distributed Systems. Characteristics of Distributed Systems. Characteristics of Distributed Systems. Goals in Distributed System Designs

Recap. CSE 486/586 Distributed Systems Case Study: Amazon Dynamo. Amazon Dynamo. Amazon Dynamo. Necessary Pieces? Overview of Key Design Techniques

CMPE150 Midterm Solutions

Connectionless and Connection-Oriented Protocols OSI Layer 4 Common feature: Multiplexing Using. The Transmission Control Protocol (TCP)

EECS 228a Lecture 1 Overview: Networks. Jean Walrand

CS162 Operating Systems and Systems Programming Lecture 21. Networking. Page 1

Horizontal or vertical scalability? Horizontal scaling is challenging. Today. Scaling Out Key-Value Storage

CS 268: Internet Architecture & E2E Arguments. Today s Agenda. Scott Shenker and Ion Stoica (Fall, 2010) Design goals.

Distributed Systems. 16. Distributed Lookup. Paul Krzyzanowski. Rutgers University. Fall 2017

Networking Overview. CS Computer Security Profs. Vern Paxson & David Wagner

CSCI 466 Midterm Networks Fall 2013

Lecture 2: Links and Signaling

Data Centers. Tom Anderson

Distributed Systems 16. Distributed File Systems II

Lecture (11) OSI layer 4 protocols TCP/UDP protocols

Page 1. Goals for Today" Why is Networking Important?" Example: Whatʼs in a Search Query?"

What Global (non-digital) Communication Network Do You Use Every Day?"

Internet Technology. 06. Exam 1 Review Paul Krzyzanowski. Rutgers University. Spring 2016

Internet Technology 3/2/2016

Dynamo. Smruti R. Sarangi. Department of Computer Science Indian Institute of Technology New Delhi, India. Motivation System Architecture Evaluation

Reminder: Datalink Functions Computer Networking. Datalink Architectures

Transport Protocols Reading: Sections 2.5, 5.1, and 5.2. Goals for Todayʼs Lecture. Role of Transport Layer

PNUTS: Yahoo! s Hosted Data Serving Platform. Reading Review by: Alex Degtiar (adegtiar) /30/2013

CS 3516: Advanced Computer Networks

Distributed Systems CS6421

CLOUD-SCALE FILE SYSTEMS

Dynamo: Amazon s Highly Available Key-Value Store

CS 640: Introduction to Computer Networks. Today s Lecture. Page 1

CS 162 Operating Systems and Systems Programming Professor: Anthony D. Joseph Spring Lecture 19: Networks and Distributed Systems

CS 43: Computer Networks. 14: DHTs and CDNs October 3, 2018

Operating Systems. 16. Networking. Paul Krzyzanowski. Rutgers University. Spring /6/ Paul Krzyzanowski

CSEN 503 Introduction to Communication Networks

EECS 122: Introduction to Computer Networks Overlay Networks and P2P Networks. Overlay Networks: Motivations

Programming Assignment 3: Transmission Control Protocol

4/9/2018 Week 13-A Sangmi Lee Pallickara. CS435 Introduction to Big Data Spring 2018 Colorado State University. FAQs. Architecture of GFS

Computer Communication Networks Midterm Review

SUBJECT: DATA COMMUNICATION AND NETWORK SEMESTER: V SEMESTER COURSE: BCA SUBJECT TEACHER: Dr.K.Chitra Assistant Professor, Department of Computer

Application Layer Introduction; HTTP; FTP

Distributed Systems (5DV147)

The Google File System

Question 1 (6 points) Compare circuit-switching and packet-switching networks based on the following criteria:

CS 162 Operating Systems and Systems Programming Professor: Anthony D. Joseph Spring Lecture 20: Networks and Distributed Systems

2/29/2012. Part 1: Networking overview Part 2: Data transfer methods Part 3: Communication Channels

Eventual Consistency 1

CA485 Ray Walshe Google File System

CS Amazon Dynamo

EE 122: Peer-to-Peer Networks

Peer-to-Peer Systems and Distributed Hash Tables

Review for Internet Introduction

Reliable Transport I: Concepts and TCP Protocol

Transcription:

Key Value Storage" CS162 Operating Systems and Systems Programming Lecture 15 Key-Value Storage, Network Protocols" March 20, 2013 Anthony D. Joseph http://inst.eecs.berkeley.edu/~cs162 Interface put(key, value); // insert/write value associated with key value = get(key); // get/read data associated with key Abstraction used to implement File systems: value content à block Sometimes as a simpler but more scalable database Can handle large volumes of data, e.g., PBs Need to distribute data over hundreds, even thousands of machines Lec 15.2 Key Values: Examples " Amazon: Key: customerid Value: customer profile (e.g., buying history, credit card,..) Facebook, Twitter: Key: UserID Value: user profile (e.g., posting history, photos, friends, ) icloud/itunes: Key: Movie/song name Value: Movie, Song Distributed file systems Key: Block ID Value: Block Lec 15.3 System Examples" Google File System, Hadoop Dist. File Systems (HDFS)" Amazon" Dynamo: internal key value store used to power Amazon.com (shopping cart) Simple Storage System (S3) BigTable/HBase/Hypertable: distributed, scalable data storage Cassandra: distributed data management system (Facebook) Memcached: in-memory key-value store for small chunks of arbitrary data (strings, objects) edonkey/emule: peer-to-peer sharing system Lec 15.4 Page 1

Key Value Store" Also called a Distributed Hash Table (DHT) Main idea: partition set of key-values across many machines key, value Challenges" Fault Tolerance: handle machine failures without losing data and without degradation in performance Scalability: " Need to scale to thousands of machines Need to allow easy addition of new machines Consistency: maintain data consistency in face of node failures and message losses Heterogeneity (if deployed as peer-to-peer systems): Latency: 1ms to 1000ms Bandwidth: 32Kb/s to 100Mb/s Lec 15.5 Lec 15.6 Key Questions" Directory-Based Architecture" put(key, value): where do you store a new (key, value) tuple? get(key): where is the value associated with a given key stored? Have a node maintain the mapping between keys and the machines (nodes) that store the values associated with the keys" Master/Directory And, do the above while providing Fault Tolerance Scalability Consistency put(k14, V14) put(k14, V14) K5 N2 K14 N3 K105 N50 K5 V5 K14 V14 K105V105 N 1 N 2 N 3 N 50 Lec 15.7 Lec 15.8 Page 2

Directory-Based Architecture" Have a node maintain the mapping between keys and the machines (nodes) that store the values associated with the keys" get(k14) V14 get(k14) Master/Directory V14 K5 N2 K14 N3 K105 N50 Directory-Based Architecture" Having the master relay the requests à recursive query" Another method: iterative query (this slide) Return node to requester and let requester contact node put(k14, V14) N3 put(k14, V14) Master/Directory K5 N2 K14 N3 K105 N50 K5 V5 K14 V14 K105V105 K5 V5 K14 V14 K105V105 N 1 N 2 N 3 N 50 N 1 N 2 N 3 N 50 Lec 15.9 Lec 15.10 Directory-Based Architecture" Having the master relay the requests à recursive query" Another method: iterative query" Return node to requester and let requester contact node get(k14) N3 V14 get(k14) Master/Directory K5 N2 K14 N3 K105 N50 K5 V5 K14 V14 K105V105 N 1 N 2 N 3 N 50 Lec 15.11 Discussion: Iterative vs. Recursive Query" get(k14) V14 Recursive get(k14) K14 V14 Master/Directory V14 K14 N3 N1 N2 N3 N50 get(k14) N3 Recursive Query: Advantages:» Faster, as typically master/directory closer to nodes» Easier to maintain consistency, as master/directory can serialize puts()/gets() Disadvantages: scalability bottleneck, as all Values go through master/directory Iterative Query Advantages: more scalable Disadvantages: slower, harder to enforce data consistency V14 Iterative get(k14) K14 V14 Master/Directory K14 N3 N1 N2 N3 N50 Lec 15.12 Page 3

Fault Tolerance" Replicate value on several nodes Usually, place replicas on different racks in a datacenter to guard against rack failures put(k14, V14) N1, N3 put(k14, V14) put(k14, V14), N1 Master/Directory K5 N2 K14 N1,N3 K105 N50 Fault Tolerance" Again, we can have Recursive replication (previous slide) Iterative replication (this slide) put(k14, V14) N1, N3 put(k14, V14) put(k14, V14) Master/Directory K5 N2 K14 N1,N3 K105 N50 K14 V14 K5 V5 K14 V14 K105V105 K14 V14 K5 V5 K14 V14 K105V105 N 1 N 2 N 3 N 50 N 1 N 2 N 3 N 50 Lec 15.13 Lec 15.14 Scalability" Scalability: Load Balancing" Storage: use more nodes Request Throughput: Can serve requests from all nodes on which a value is stored in parallel Master can replicate a popular value on more nodes Master/directory scalability: Replicate it Partition it, so different keys are served by different masters/directories» How do you partition? (p2p DHDT, end of semester) Directory keeps track of the storage availability at each node Preferentially insert new values on nodes with more storage available What happens when a new node is added? Cannot insert only new values on new node. Why? Move values from the heavy loaded nodes to the new node What happens when a node fails? Need to replicate values from fail node to other nodes Lec 15.15 Lec 15.16 Page 4

Replication Challenges" Need to make sure that a value is replicated correctly How do you know a value has been replicated on every node? Wait for acknowledgements from every node What happens if a node fails during replication? Pick another node and try again Consistency" How close does a distributed system emulate a single machine in terms of read and write semantics? Q: Assume put(k14, V14 ) and put(k14, V14 ) are concurrent, what value ends up being stored? A: assuming put() is atomic, then either V14 or V14, right? What happens if a node is slow? Slow down the entire put()? Pick another node? In general, with multiple replicas Slow puts and fast gets Lec 15.17 Q: Assume a client calls put(k14, V14) and then get(k14), what is the result returned by get()? A: It should be V14, right? Above semantics, not trivial to achieve in distributed systems Lec 15.18 Concurrent Writes (Updates)" Concurrent Writes (Updates) " If concurrent updates (i.e., puts to same key) may need to make sure that updates happen in the same order put(k14, V14 ) put(k14, V14ʼʼ) put(k14, V14 ) Master/Directory K5 N2 K14 N1,N3 K105 N50 put(k14, V14 ) put(k14, V14ʼ) and put(k14, V14ʼʼ) reach N1 and N3 in reverse order What does get(k14) return? Undefined If concurrent updates (i.e., puts to same key) may need to make sure that updates happen in the same order put(k14, V14 ) put(k14, V14ʼʼ) put(k14, V14 ) Master/Directory put(k14, V14 ) K5 N2 K14 N1,N3 K105 N50 put(k14, V14 ) put(k14, V14 ') put(k14, V14ʼ) and put(k14, V14ʼʼ) reach N1 and N3 in reverse order What does get(k14) return? Undefined K14 V14 V14 K5 V5 K14 V14 V14 K105V105 K14 V14 V14 K5 V5 K14 V14 V14 K105V105 N 1 N 2 N 3 N 50 N 1 N 2 N 3 N 50 Lec 15.19 Lec 15.20 Page 5

Read after Write" Read not guaranteed to return value of latest write Can happen if Master processes requests in different threads Master/Directory get(k14) happens right after put(k14, V14 ) K5 N2 put(k14, V14ʼ) get(k14) K14 N1,N3 V14 get(k14) reaches N3 before K105 N50 put(k14, V14ʼ) put(k14, V14 ) K14 V14 V14 V14 get(k14, V14 ) put(k14, V14 ) K5 V5 K14 V14 V14 K105V105 N 1 N 2 N 3 N 50 Consistency (cont d)" Large variety of consistency models: Atomic consistency (linearizability): reads/writes (gets/puts) to replicas appear as if there was a single underlying replica (single system image)» Think one updated at a time» Transactions (later in the class) Eventual consistency: given enough time all updates will propagate through the system» One of the weakest form of consistency; used by many systems in practice And many others: causal consistency, sequential consistency, strong consistency, Lec 15.21 Lec 15.22 Strong Consistency" Quorum Consensus" Assume Master serializes all operations Improve put() and get() operation performance Challenge: master becomes a bottleneck Not address here Still want to improve performance of reads/writes à quorum consensus Define a replica set of size N put() waits for acks from at least W replicas get() waits for responses from at least R replicas W+R > N Why does it work? There is at least one node that contains the update Why you may use W+R > N+1? Lec 15.23 Lec 15.24 Page 6

Quorum Consensus Example" N=3, W=2, R=2 Replica set for K14: {N1, N2, N4} Assume put() on N3 fails Quorum Consensus Example" Now, issuing get() to any two nodes out of three will return the answer ACK put(k14, V14) put(k14, V14) ACK put(k14, V14) V14 get(k14) nill get(k14) K14 V14 K14 V14 K14 V14 K14 V14 N 1 N 2 N 3 N 4 N 1 N 2 N 3 N 4 Lec 15.25 Lec 15.26 Conclusions: Key Value Store" Quiz 15.1: Key-Value Store" Very large scale storage systems Two operations put(key, value) value = get(key) Challenges Fault Tolerance à replication Scalability à serve get() s in parallel; replicate/cache hot tuples Consistency à quorum consensus to improve put/get performance Q1: True _ False _ On a single node, a key-value store can be implemented by a hash-table Q2: True _ False _ Master can be a bottleneck point for a key-value store Q3: True _ False _ Iterative puts achieve lower throughput than recursive puts Q4: True _ False _ With quorum consensus, we can improve read performance at expense of write performance 3/20/2013 Anthony D. Joseph CS162 UCB Spring 2013 Lec 15.27 Lec 15.28 Page 7

Quiz 15.1: Key-Value Store" Q1: True _ X" False _ On a single node, a key-value store can be implemented by a hash-table Q2: True _ X" False _ Master can be a bottleneck point for a key-value store Q3: True _ False _ X" Iterative puts achieve lower throughput than recursive puts Q4: True _ X" False _ With quorum consensus, we can improve read performance at expense of write performance 5min Break" Lec 15.29 Lec 15.30 Networking: This Lectureʼs Goals" What Is A Protocol?" What is a protocol? Layering A protocol is an agreement on how to communicate Includes Syntax: how a communication is specified & structured» Format, order messages are sent and received Semantics: what a communication means» Actions taken when transmitting, receiving, or when a timer expires Many slides generated from my lecture notes by Vern Paxson, and Scott Shenker." Lec 15.31 Lec 15.32 Page 8

Examples of Protocols in Human Interactions" Telephone 1. (Pick up / open up the phone.) 2. Listen for a dial tone / see that you have service. 3. Dial 4. Should hear ringing 5. Callee: Hello? 6. Caller: Hi, it s Alice. Or: Hi, it s me ( what s that about?) 7. Caller: Hey, do you think blah blah blah pause 8. Callee: Yeah, blah blah blah pause" 9. Caller: Bye 10. Callee: Bye 11. Hang up Examples of Protocols in Human Interactions" Asking a question 1. Raise your hand. 2. Wait to be called on. 3. Or: wait for speaker to pause and vocalize Lec 15.33 Lec 15.34 End System: Computer on the Net" Internet Client program Running on end host Requests service Clients and Servers" E.g., Web browser GET /index.html Also known as a host Lec 15.35 Lec 15.36 Page 9

Clients and Servers" Client-Server Communication" Client program Running on end host Requests service E.g., Web browser GET /index.html Server program Running on end host Provides service E.g., Web server Client sometimes on Initiates a request to the server when interested E.g., Web browser on your laptop or cell phone Doesn t communicate directly with other clients Needs to know the server s address Server is always on Services requests from many client hosts E.g., Web server for the www.cnn.com Web site Doesn t initiate contact with the clients Needs a fixed, well-known address Site under construction Lec 15.37 Lec 15.38 Peer-to-Peer Communication" The Problem" Not always-on server at the center of it all Hosts can come and go, and change addresses Hosts may have a different address each time Many different applications email, web, P2P, etc. Example: peer-to-peer file sharing (e.g., Bittorrent) Any host can request files, send files, query to find where a file is located, respond to queries, and forward queries Scalability by harnessing millions of peers Each peer acting as both a client and server Many different network styles and technologies Wireless vs. wired vs. optical, etc. How do we organize this mess? Lec 15.39 Lec 15.40 Page 10

The Problem (cont d)" Solution: Intermediate Layers" Application" Skype " SSH" NFS" HTTP" Introduce intermediate layers that provide set of abstractions for various network functionality & technologies A new app/media implemented only once Variation on add another level of indirection Transmission" Media" Coaxial " cable" Fiber" optic" Packet" Radio" Application" Skype " SSH" NFS" HTTP" Re-implement every application for every technology? No But how does the Internet design avoid this? Intermediate " layers" Transmission" Media" Coaxial " cable" Fiber" optic" Packet" radio" Lec 15.41 Lec 15.42 Software System Modularity" Network System Modularity" Partition system into modules & abstractions: Well-defined interfaces give flexibility Hides implementation - thus, it can be freely changed Extend functionality of system by adding new modules E.g., libraries encapsulating set of functionality E.g., programming language + compiler abstracts away not only how the particular CPU works but also the basic computational model Well-defined interfaces hide information Isolate assumptions Present high-level abstractions But can impair performance Like software modularity, but: Implementation distributed across many machines (routers and hosts) Must decide: How to break system into modules» Layering What functionality does each module implement» End-to-End Principle" We will address these choices next lecture Lec 15.43 Lec 15.44 Page 11

Layering: A Modular Approach" Partition the system Each layer solely relies on services from layer below Each layer solely exports services to layer above Interface between layers defines interaction Hides implementation details Layers can change without disturbing other layers Protocol Standardization" Ensure communicating hosts speak the same protocol Standardization to enable multiple implementations Or, the same folks have to write all the software Standardization: Internet Engineering Task Force Based on working groups that focus on specific issues Produces Request For Comments (RFCs)» Promoted to standards via rough consensus and running code IETF Web site is http://www.ietf.org RFCs archived at http://www.rfc-editor.org De facto standards: same folks writing the code P2P file sharing, Skype, <your protocol here> Lec 15.45 Lec 15.46 Example: The Internet Protocol (IP): Best-Effort Packet Delivery" Datagram packet switching Send data in packets Header with source & destination address Service it provides: Packets may be lost Packets may be corrupted Packets may be delivered out of order source destination Example: Transmission Control Protocol (TCP)" Communication service Ordered, reliable byte stream Simultaneous transmission in both directions Key mechanisms at end hosts Retransmit lost and corrupted packets Discard duplicate packets and put packets in order Flow control to avoid overloading the receiver buffer Congestion control to adapt sending rate to network load TCP connection IP network source network destination Lec 15.47 Lec 15.48 Page 12

Quiz 15.2: Protocols" Q1: True _ False _ Protocols specify the syntax and semantics of communication Q2: True _ False _ Protocols specify the implementation Q3: True _ False _ Layering helps to improve application performance Q4: True _ False _ Best Effort packet delivery ensures that packets are delivered in order Q5: True _ False _ In p2p systems a node is both a client and a server Q6: True _ False _ TCP ensures that each packet is delivered within a predefined amount of time Quiz 15.2: Protocols" Q1: True _ X" False _ Protocols specify the syntax and semantics of communication Q2: True _ False _ X" Protocols specify the implementation Q3: True _ False _ X" Layering helps to improve application performance Q4: True _ False _ X" Best Effort packet delivery ensures that packets are delivered in order Q5: True X" _ False _ In p2p systems a node is both a client and a server Q6: True _ False _ X" TCP ensures that each packet is delivered within a predefined amount of time Lec 15.49 Lec 15.50 Summary" Roles of Standardization Clients, servers, peer-to-peer Layered architecture as a powerful means for organizing complex networks Though layering has its drawbacks too Next lecture Layering End-to-end arguments Lec 15.51 Page 13