Remote Procedure Call. Tom Anderson


Why Are Distributed Systems Hard?
- Asynchrony: different nodes run at different speeds; messages can be unpredictably, arbitrarily delayed
- Failures (partial and ambiguous): parts of the system can crash; can't tell a crash from slowness
- Concurrency and consistency: replicated state, cached on multiple nodes; how do we keep many copies of data consistent?

Why Are Distributed Systems Hard?
- Performance: have to efficiently coordinate many machines; performance is variable and unpredictable; tail latency: only as fast as the slowest machine
- Testing and verification: almost impossible to test all failure cases; proofs (an emerging field) are really hard
- Security: need to assume adversarial nodes

Three-Tier Web Architecture
- Scalable number of front-end web servers: stateless ("RESTful"), so if one crashes we can reconnect the user to another server
- Scalable number of cache servers: lower latency (better for the front end), reduced load (better for the database)
  - Q: how do we keep the cache layer consistent?
- Scalable number of back-end database servers: run carefully designed distributed systems code

And Beyond
- Worldwide distribution of users: cross-continent Internet delay is ~half a second; Amazon: reduction in sales if latency > 100 ms
- Many data centers, one near every user: smaller data centers have just the web and cache layers; larger data centers include the storage layer as well
  - Q: how do we coordinate updates across data centers?

MapReduce Computational Model
- For each key k with value v, compute a new set of key-value pairs: map(k, v) -> list(k', v')
- For each intermediate key k' and list of values v', compute a new (hopefully smaller) list of values: reduce(k', list(v')) -> list(v')
- The user writes the map and reduce functions; the framework takes care of parallelism, distribution, and fault tolerance.
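As a concrete illustration (not from the slides), here is a minimal sketch of the two user-written functions in Go, the labs' language, using word count as the example; the KeyValue type and the string-valued keys and values are assumptions, and the framework supplies the splitting, shuffling, and fault tolerance.

    package mr

    import (
        "strconv"
        "strings"
    )

    // KeyValue is a hypothetical intermediate key-value pair type.
    type KeyValue struct {
        Key   string
        Value string
    }

    // Map: for each input (k, v) -- here a filename and its contents --
    // emit a list of intermediate (k', v') pairs, one per word.
    func Map(filename string, contents string) []KeyValue {
        var out []KeyValue
        for _, word := range strings.Fields(contents) {
            out = append(out, KeyValue{Key: word, Value: "1"})
        }
        return out
    }

    // Reduce: for each intermediate key k' and the list of values emitted
    // for it, produce a (hopefully smaller) list -- here, a single count.
    func Reduce(key string, values []string) []string {
        return []string{strconv.Itoa(len(values))}
    }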

MapReduce Example: grep (find lines that match a text pattern)
1. Master splits the input file into M roughly equal chunks, at line boundaries
2. Master hands each partition to a mapper
3. Map phase: for each partition, call map on each line of text; map searches the line for the word and outputs (line number, line of text) if the word shows up, nil if not
4. Partition the map results among the R reducers: map writes each output record into one of R files, chosen by hashing the key

Example: grep (continued)
5. Reduce phase: each reduce job collects its 1/R share of the output from each map job, after all map jobs have completed; the reduce function is the identity (v1 in, v1 out)
6. Merge phase: the master merges the R outputs
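A sketch of how grep fits the model above, reusing the hypothetical KeyValue type from the word-count sketch (same package); the pattern constant and function names are made up for illustration.

    const pattern = "needle" // hypothetical search word

    // GrepMap is called once per line of a partition; it emits
    // (line number, line of text) if the word shows up, nil if not.
    func GrepMap(lineNum string, line string) []KeyValue {
        if strings.Contains(line, pattern) {
            return []KeyValue{{Key: lineNum, Value: line}}
        }
        return nil
    }

    // GrepReduce is the identity: v1 in, v1 out.
    func GrepReduce(key string, values []string) []string {
        return values
    }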

MapReduce (or ML or ...) Architecture
- Scheduler: accepts MapReduce jobs; finds a MapReduce master and a set of available workers
- MapReduce master (one per job): farms tasks out to workers; restarts failed jobs; syncs task completion
- Worker array: executes Map and Reduce tasks
- Storage array: stores the initial data set, intermediate files, and end results

Remote Procedure Call (RPC)
- A request from the client to execute a function on the server
- To the client, it looks like a procedure call
- To the server, it looks like an implementation of a procedure call

Remote Procedure Call (RPC)
A request from the client to execute a function on the server.
On the client:
- Ex: result = DoMap(worker, i)
- Parameters are marshalled into a message (can be arbitrary types)
- Message sent to the server (can be multiple packets)
- Wait for the reply
On the server:
- Message is parsed
- Operation DoMap(i) is invoked
- Result is marshalled into a message (can be multiple packets)
- Message sent to the client
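A minimal sketch of both sides in Go using the standard net/rpc package; the Coordinator type, the argument/reply structs, and the port are assumptions made up to mirror the DoMap example, not the labs' actual code.

    package main

    import (
        "fmt"
        "log"
        "net"
        "net/rpc"
    )

    type MapArgs struct {
        Worker string
        Index  int
    }

    type MapReply struct {
        OK bool
    }

    type Coordinator struct{}

    // DoMap is the server-side implementation: by the time it runs, the RPC
    // library has parsed the message and deserialized the arguments.
    func (c *Coordinator) DoMap(args *MapArgs, reply *MapReply) error {
        // ... run the map task for partition args.Index on args.Worker ...
        reply.OK = true
        return nil
    }

    func main() {
        // Server: register the handler and accept connections.
        rpc.Register(new(Coordinator))
        ln, err := net.Listen("tcp", "localhost:1234")
        if err != nil {
            log.Fatal(err)
        }
        go rpc.Accept(ln)

        // Client: this looks like a procedure call, but the library marshals
        // the args, sends the message, and waits for the reply.
        client, err := rpc.Dial("tcp", "localhost:1234")
        if err != nil {
            log.Fatal(err)
        }
        var reply MapReply
        if err := client.Call("Coordinator.DoMap", &MapArgs{Worker: "w1", Index: 0}, &reply); err != nil {
            log.Fatal(err)
        }
        fmt.Println("DoMap returned", reply.OK)
    }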

RPC Implementation: Transport (diagram)
- Client side: DoMap(worker, i) enters the RPC library, which serializes the arguments, opens a connection, and writes the data with a TCP/IP write in the OS
- Server side: the OS's TCP/IP read hands the data to the RPC library, which deserializes the arguments and invokes Map(worker, i); the reply is serialized, written back, and carried over the transport
- Client side: TCP/IP read, then the RPC library deserializes the reply and returns it to the caller

RPC vs. Procedure Call
What is the equivalent of:
- The name of the procedure?
- The calling convention?
- The return value?
- The return address?

RPC vs. Procedure Call
- Binding: the client needs a connection to the server; the server must implement the required function; what if the server is running a different version of the code?
- Performance: a procedure call is maybe 10 cycles = ~3 ns; an RPC within a data center is ~10 microseconds => ~1,000x slower; an RPC across the wide area is millions of times slower

RPC vs. Procedure Call
Failures:
- What happens if messages get dropped?
- What if the client crashes?
- What if the server crashes?
- What if the server crashes after performing the operation but before replying?
- What if the server appears to crash but is just slow?
- What if the network partitions?

Semantics
Semantics = meaning:
- reply == ok => ???
- reply != ok => ???

Semantics
- At least once (NFS, DNS, lab 1b): reply ok => executed at least once; reply not ok => maybe executed, maybe multiple times
- At most once (lab 1c): reply ok => executed once; reply not ok => maybe executed, but never more than once
- Exactly once: reply ok => executed once; never returns "not ok"

At Least Once
- RPC library waits for the response for a while
- If none arrives, re-send the request
- Do this a few times
- Still no response: return an error to the application
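A minimal at-least-once client loop, sketched with Go's net/rpc asynchronous Go() call; the retry count, timeout, and function name are assumptions, not part of any particular RPC library.

    package client

    import (
        "errors"
        "net/rpc"
        "time"
    )

    // atLeastOnceCall waits a while for a response, re-sends if none arrives,
    // tries a few times, and finally returns an error to the application.
    func atLeastOnceCall(c *rpc.Client, method string, args, reply interface{}) error {
        const retries = 3
        const wait = time.Second

        lastErr := errors.New("no response")
        for i := 0; i < retries; i++ {
            call := c.Go(method, args, reply, nil) // send (or re-send) the request
            select {
            case <-call.Done:
                if call.Error == nil {
                    return nil // got a reply
                }
                lastErr = call.Error
            case <-time.After(wait):
                // No response yet; fall through and re-send. (A real library
                // would also account for the still-outstanding earlier call.)
            }
        }
        return lastErr
    }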

Non-Replicated Key/Value Server
- Client sends Put k v
- Server gets the request, but the network drops the reply
- Client sends Put k v again: should the server respond "yes"? or "no"?
- What if the op is Append?
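To make the last question concrete, here is a tiny in-memory sketch (type and method names are made up): re-executing a duplicate Put leaves the same state, but re-executing a duplicate Append does not.

    package kv

    type Store struct {
        data map[string]string
    }

    // Put is idempotent: applying the same Put twice leaves the same value.
    func (s *Store) Put(k, v string) {
        s.data[k] = v
    }

    // Append is not idempotent: applying the same Append twice duplicates v,
    // which is why a dropped reply plus a re-send changes the result.
    func (s *Store) Append(k, v string) {
        s.data[k] = s.data[k] + v
    }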

Does TCP Fix This?
- TCP: a reliable bi-directional byte stream between two endpoints, with retransmission of lost packets and duplicate detection
- But what if TCP times out and the client reconnects?
  - Browser connects to Amazon
  - RPC to purchase a book
  - Wifi times out during the RPC
  - Browser reconnects

When Does At-Least-Once Work?
- If there are no side effects: read-only operations (or idempotent ops)
- Example: MapReduce
- Example: NFS readfileblock, writefileblock

At Most Once
- Client includes a unique ID (UID) with each request, and uses the same UID for any re-send
- Server RPC code detects duplicate requests and returns the previous reply instead of re-running the handler:

    if seen[uid] {
        r = old[uid]
    } else {
        r = handler()
        old[uid] = r
        seen[uid] = true
    }
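The same logic as a Go sketch, with a mutex added since a real server handles RPCs concurrently; the UID type, reply type, and names are assumptions.

    package server

    import "sync"

    type AtMostOnce struct {
        mu   sync.Mutex
        seen map[uint64]bool
        old  map[uint64]string // previous reply, keyed by UID
    }

    func NewAtMostOnce() *AtMostOnce {
        return &AtMostOnce{seen: make(map[uint64]bool), old: make(map[uint64]string)}
    }

    // Handle returns the saved reply for a duplicate UID instead of
    // re-running the handler.
    func (s *AtMostOnce) Handle(uid uint64, handler func() string) string {
        s.mu.Lock()
        defer s.mu.Unlock()
        if s.seen[uid] {
            return s.old[uid]
        }
        r := handler()
        s.old[uid] = r
        s.seen[uid] = true
        return r
    }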

Some At-Most-Once Issues
- How do we ensure the UID is unique? A big random number? Combine a unique client ID (IP address?) with a sequence number?
- What if the client crashes and restarts: can it reuse the same UID?
- In the labs, nodes never restart; equivalent to: every node gets a new ID on start

When Can the Server Discard Old RPCs?
- Option 1: never?
- Option 2: unique client IDs, per-client RPC sequence numbers, and the client includes "seen all replies <= X" with every RPC
- Option 3: only allow the client one outstanding RPC at a time, so the arrival of seq+1 allows the server to discard all state <= seq
- The labs use Option 3
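A sketch of Option 3's per-client bookkeeping, under the assumption that the client has at most one RPC outstanding and tags each with an increasing sequence number; the type and field names are made up for illustration.

    package server

    // perClient is everything the server keeps for one client under Option 3:
    // the latest sequence number and its reply.
    type perClient struct {
        lastSeq   int64
        lastReply string
    }

    // handle processes a request with sequence number seq. Because the client
    // has at most one RPC outstanding, seeing seq means every reply < seq has
    // been received, so all older state can be discarded.
    func (c *perClient) handle(seq int64, handler func() string) (string, bool) {
        switch {
        case seq < c.lastSeq:
            return "", false // stale duplicate of an already-superseded RPC
        case seq == c.lastSeq:
            return c.lastReply, true // duplicate of the current RPC: re-send saved reply
        default:
            c.lastReply = handler() // new RPC: run it and drop the old state
            c.lastSeq = seq
            return c.lastReply, true
        }
    }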

What If the Server Crashes?
- If the at-most-once list of recent RPC results is stored only in memory, the server will forget it and accept duplicate requests when it reboots
- Does the server need to write the recent RPC results to disk?
- If replicated, does the replica also need to store recent RPC results?
- In the labs, the server gets a new address on restart, so client messages aren't delivered to the restarted server