15-440 Distributed Systems
Homework 1 (6 problems)
November 29, 2011
Due: November 30, 11:59 PM, via electronic handin. Hand in to Autolab in PDF format.

1. You have set up a fault-tolerant banking service. Based upon an examination of other systems, you've decided that the best way to do so is to use Paxos to replicate log entries across three servers, and to let one of your employees handle recovering from a failure using the log.

The state on the replicas consists of a list of all bank account mutation operations that have been made, each with a unique request ID to prevent retransmitted requests, listed in the order they were committed. Assume that the replicas execute Paxos for every operation. [1] Each value that the servers agree on looks like "account action 555: transfer $1,000,000 from Mark Stehlik to David Andersen". When a server receives a request, it looks at its state to find the next unused action number, and uses Paxos to propose a value for that action number.

The three servers are S1, S2, and S3. At the same time:

- S1 receives a request to withdraw $500 from A. Carnegie. S1 picks proposal number 101 (the n in Paxos).
- S2 receives a request to transfer $500 from A. Carnegie to A. Mellon. S2 picks proposal number 102.

Both servers look at their lists of applied account actions and decide that the next action number is 15, so both start executing Paxos as the leader for action 15.

(a) Each sequence below shows one initial sequence of messages from a possible execution of Paxos in which both S1 and S2 are acting as the leader. Each message shown is received successfully. The messages are sent one by one in the indicated order, and no other messages are sent until after the last message shown is received. Answer three questions about the final outcomes that could result from each of these sequences: Is it possible for the servers to agree on the withdrawal as entry 15? Is it possible for the servers to agree on the transfer as entry 15?
For each of these outcomes, explain how it either could occur or how Paxos prevents it.

To be clear on the message terminology:

- PREPARE is the leader-election message.
- RESPONSE is the response to the PREPARE message.
- ACCEPT says "you've agreed that I'm the leader; now take a value."

[1] In practice, most systems use Paxos to elect a primary and let it hold a lease on the operations for a while, but that adds complexity to the homework problem.

Sequence 1:

Sequence 2:
    S1 -> S3: ACCEPT(101, "withdraw...")

Sequence 3:
    S1 -> S3: ACCEPT(101, "withdraw...")
    S1 -> S1: ACCEPT(101, "withdraw...")

(b) Suppose one of the servers received an ACCEPT for a particular instance of Paxos (remember, each instance agrees on a value for a particular account event), but it never heard back about the final outcome. What steps should the server take to figure out whether agreement was reached, and what the agreed-upon value was? Explain why your procedure is correct even if there are still active leaders executing this instance of Paxos.
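To make the PREPARE/RESPONSE/ACCEPT vocabulary concrete, here is a minimal sketch of the acceptor side of a single Paxos instance in Go. The struct, method names, and message fields are illustrative, not part of the assignment:

```go
package main

import "fmt"

// Acceptor holds one server's state for a single Paxos instance.
// Proposal numbers are assumed positive, so 0 means "none yet".
type Acceptor struct {
	promised  int    // highest proposal number promised
	acceptedN int    // proposal number of the accepted value, if any
	acceptedV string // the accepted value itself
}

// Prepare handles PREPARE(n). The return values form the RESPONSE:
// whether the promise was granted, plus any previously accepted proposal.
func (a *Acceptor) Prepare(n int) (ok bool, prevN int, prevV string) {
	if n > a.promised {
		a.promised = n
		return true, a.acceptedN, a.acceptedV
	}
	return false, 0, ""
}

// Accept handles ACCEPT(n, v): the value is taken unless the acceptor
// has already promised a higher-numbered proposal.
func (a *Acceptor) Accept(n int, v string) bool {
	if n >= a.promised {
		a.promised, a.acceptedN, a.acceptedV = n, n, v
		return true
	}
	return false
}

func main() {
	var s3 Acceptor
	ok, _, _ := s3.Prepare(102) // S2's PREPARE(102) arrives first
	fmt.Println(ok)             // true: S3 promises proposal 102
	fmt.Println(s3.Accept(101, "withdraw...")) // false: S1's ACCEPT(101) is rejected
}
```

This is why, in sequences like the ones above, an ACCEPT carrying a lower proposal number can be rejected once a higher-numbered PREPARE has been promised.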

2. In order to reduce load on the root and top-level domain servers, local DNS servers cache queries. The length of time for which a cached record is kept is referred to as its time to live (TTL). Caching also allows a DNS server to respond more quickly to queries for the same domain or host. In this problem you will compute downtimes for a configuration of servers.

(a) Suppose a particular ISP has a set of servers with the following records:

Server 1: Authoritative for CAR. Knows the following authoritative servers:
    - HIT (Server 4)
    - BLUE (Server 5)
    - ODD (Server 6)
Server 2: Replica of Server 1
Server 3: Replica of Server 1
Server 4: Authoritative for HIT (WWW.HIT.CAR)
Server 5: Authoritative for BLUE (WWW.BLUE.CAR)
Server 6: Authoritative for ODD (WWW.ODD.CAR)

The time to live for each NS record is 3 hours. The timeout for each of the resolvers is 30 seconds: for example, if a query comes in to Server 1 but Server 1 has crashed, the resolver that issued the query to Server 1 will wait 30 seconds for a response before querying Server 2. To simplify things a bit, the time for the resolver to respond to the client from the cache is 0, and the time to respond to a query not in the cache is 1 second per other server that needs to be contacted.

For the following sequence of events at time t (in seconds):

t=0:   QUERY to Server 1 for WWW.BLUE.CAR
t=10:  QUERY to Server 2 for WWW.BLUE.CAR
t=15:  QUERY to Server 1 for WWW.ODD.CAR
t=20:  Server 1 crashes.
t=25:  QUERY to Server 1 for WWW.BLUE.CAR
t=60:  QUERY to Server 3 for WWW.HIT.CAR
t=65:  QUERY to Server 3 for WWW.HIT.CAR
t=100: QUERY to Server 3 for WWW.HIT.CAR
t=105: QUERY to Server 2 for WWW.BLUE.CAR

t=110: Server 6 crashes.
t=115: QUERY to Server 2 for WWW.ODD.CAR

What is the total latency for this set of queries? For this problem, we define latency as the amount of time the client has to wait for an answer, whether that answer is the IP address of the hostname or an error. Explain how you got your answer.

(b) Let's start with a clean slate with the same servers as in part (a). This time t is in minutes. Consider the following sequence of events:

t=0:   QUERY to Server 1 for WWW.BLUE.CAR
t=15:  The administrator for Server 5 changes the IP address record for WWW.BLUE.CAR
t=20:  QUERY to Server 1 for WWW.BLUE.CAR
t=30:  QUERY to Server 2 for WWW.HIT.CAR
t=40:  QUERY to Server 3 for WWW.BLUE.CAR
t=60:  Server 1 crashes.
t=80:  QUERY to Server 2 for WWW.HIT.CAR
t=85:  QUERY to Server 2 for WWW.BLUE.CAR
t=90:  Server 1 is restored (same state as when it crashed).
t=100: QUERY to Server 1 for WWW.BLUE.CAR
t=200: QUERY to Server 1 for WWW.BLUE.CAR

How many times did the client receive the incorrect IP address for WWW.BLUE.CAR?

(c) As you saw in part (b), caching has a drawback. What is this drawback?

3. For this problem you have to implement a barrier using Go channels. To understand how a barrier works, consider N goroutines (where N is a global, known constant), all running the following code concurrently:

    for i := 0; i < 100; i++ {
        barrier()
        log.Printf("epoch %d\n", i)
    }

The barrier ensures that all N goroutines stop at the barrier before any can move past it. As a result, messages from different epochs will not be intermixed. Provide an implementation of the function barrier() using only Go channels for synchronization. You will also need to define and give the implementation of a different goroutine (a different function) dedicated to monitoring the barrier (let's call it barrier_monitor()). You don't need to write initialization code: consider that all channels (and any other variables) have been defined globally.
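For a sense of the shape such a solution can take (this is a sketch under assumed names, not the required answer: the constant N, the two channels, and the helper runEpochs are all illustrative), one common pattern uses two unbuffered channels and a monitor loop:

```go
package main

import (
	"fmt"
	"sync"
)

const N = 3 // illustrative; the assignment assumes N is defined globally

var (
	arrive      = make(chan struct{}) // one send per goroutine reaching the barrier
	release     = make(chan struct{}) // one receive per goroutine leaving it
	monitorOnce sync.Once
)

// barrier_monitor collects N arrivals, then releases all N waiters.
// Looping forever makes the barrier reusable across epochs.
func barrier_monitor() {
	for {
		for i := 0; i < N; i++ {
			<-arrive
		}
		for i := 0; i < N; i++ {
			release <- struct{}{}
		}
	}
}

// barrier blocks until all N goroutines have called it.
func barrier() {
	arrive <- struct{}{}
	<-release
}

// runEpochs starts N goroutines that cross the barrier `epochs` times,
// recording the epoch number at each crossing. With a correct barrier
// the recorded order is grouped: all 0s, then all 1s, and so on.
func runEpochs(epochs int) []int {
	monitorOnce.Do(func() { go barrier_monitor() })
	var (
		mu    sync.Mutex
		order []int
		wg    sync.WaitGroup
	)
	for g := 0; g < N; g++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < epochs; i++ {
				barrier()
				mu.Lock()
				order = append(order, i)
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	return order
}

func main() {
	fmt.Println(runEpochs(3)) // [0 0 0 1 1 1 2 2 2]
}
```

The monitor alternates between draining N arrivals and issuing N releases; because the channels are unbuffered, a goroutine that races ahead blocks on its next arrival until the current epoch's releases have all been handed out.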
Also, N is a global constant that has already been defined.

4. Skype uses a custom protocol to determine whether a user is logged in, where they are located, and what ports they are listening on. Suppose that you have set out to build your own peer-to-peer telephony system.

(a) Very briefly, how could you use the Chord distributed hash table to maintain a directory of users that is stored entirely on the end users' computers, with no centralized infrastructure?

(b) Your telephony system catches on very quickly. Soon you have 100,000 users worldwide. However, users start to complain that it takes a long time to perform directory lookups. Assume that the average latency between users is 100 ms. Explain why lookups are taking so long, and estimate how long they take.

5. This is the annoying open-ended question. Don't spend more than half an hour on it, for your own sanity, but do discuss it with a friend or three. You will not be able to find an exact answer; the course staff doesn't know the correct answer. So come up with a decent estimate based upon sources you can find, and have fun with it! (This is good practice for job interviews. :-)

How many total entries are there in the domain name system? (All levels deep count: www.garble.gobble.andrew.cmu.edu is fair game.) Hint: You'll find some useful statistics on the number of domain names at http://www.dailychanges.com/. But remember, each domain may have many entries, and there are also country-specific domain names like .uk, .de, etc. So this page isn't enough on its own, but hopefully it'll get you thinking in the right direction.

6. For this problem, you will devise a method to solve the all-pairs shortest path problem via Map/Reduce. Suppose that we have a graph consisting of a set of nodes {1, 2, ..., n} and a set of edges {e_1, e_2, ..., e_m}, where each edge is of the form (i, j, d), indicating that the edge is from node i to node j and has length d, where d is a non-negative real number. Our goal is to compute, for each pair of nodes i and j, a value D(i, j) equal to either the length of the shortest path from i to j, or to infinity if no such path exists.

Devise an algorithm that can perform this computation using a total of 2 log2(n) Map/Reduce steps. Hint: Formulate a scheme that computes a sequence of values of the form D_k(i, j), equal to the length of the shortest path from i to j containing at most 2^k edges. You can perform each iteration by adapting the matrix multiplication algorithm shown in the lecture of October 25. The trick is to define the multiplication and addition operations of matrix multiplication appropriately.