NETWORKED STORAGE AND REMOTE PROCEDURE CALLS (RPC)

George Porter, Oct 16 and 18, 2018

ATTRIBUTION
These slides are released under an Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0) Creative Commons license. They incorporate material from Michael Freedman and Kyle Jamieson, Princeton University (also under a CC BY-NC-SA 3.0 Creative Commons license).

ANNOUNCEMENTS: THIS WEEK, STORAGE AND RPC
- Networked storage underpins many (if not most) networked services
- It is a microcosm of many principles we'll cover in this course
- The protocol is in many ways more complex than HTTP: consistency, heterogeneity, caching, locks, leases
- Tue: overview of networked storage
- Thu: dive into remote procedure calls

Outline
1. Networked storage
2. Remote procedure calls

ABSTRACTION, ABSTRACTION, ABSTRACTION!
- Local file systems: disks are terrible abstractions (low-level blocks, etc.); directories, files, and links are much better
- Distributed file systems: make a remote file system look local
- Today: NFS (Network File System), developed by Sun in the 1980s and still used today!

NFS ARCHITECTURE
[Figure: directory trees on a client and two servers; the client remote-mounts a subtree exported by Server 1 (people) and one exported by Server 2 (users) into its own local tree]
- Mount a remote FS (host:path) as local directories

VIRTUAL FILE SYSTEM ENABLES TRANSPARENCY

VFS / LOCAL FS
  fd = open(path, flags)
  read(fd, buf, n)
  write(fd, buf, n)
  close(fd)
The server maintains state that maps fd to inode and offset.

STATELESS NFS: STRAWMAN 1
  fd = open(path, flags)
  read(path, buf, n)
  write(path, buf, n)
  close(fd)
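To see why the local-FS interface is problematic for a remote server, here is a minimal sketch (Python; the class shape and helpers such as path_to_inode are invented for illustration) of the per-client table a stateful server would have to keep for every open file. A server crash wipes this table, stranding every outstanding fd:

```python
# Hypothetical stateful file server: a sketch, not NFS's actual design.
class StatefulFileServer:
    def __init__(self):
        self.next_fd = 0
        self.open_files = {}            # fd -> [inode, offset]: lost on a crash!

    def open(self, path, flags):
        inode = self.path_to_inode(path)     # assumed helper: resolve the path
        fd = self.next_fd
        self.next_fd += 1
        self.open_files[fd] = [inode, 0]     # server must remember the offset
        return fd

    def read(self, fd, n):
        inode, offset = self.open_files[fd]  # KeyError after a server reboot
        data = self.read_inode(inode, offset, n)  # assumed helper
        self.open_files[fd][1] += len(data)
        return data

    # Stubs for the assumed helpers, to keep the sketch self-contained.
    def path_to_inode(self, path): ...
    def read_inode(self, inode, offset, n): ...
```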

STATELESS NFS: STRAWMAN 2
  fd = open(path, flags)
  read(path, offset, buf, n)
  write(path, offset, buf, n)
  close(fd)

EMBED PATHNAMES IN SYSCALLS?
- Suppose a file's directory is renamed (dir1 to dir2) after a client opens the file: should a later read refer to the current dir1/f or to dir2/f?
- In UNIX, it's dir2/f: the open file follows the rename. How do we preserve this in NFS if every request names the file by path?

STATELESS NFS (FOR REAL)
  fh = lookup(path, flags)
  read(fh, offset, buf, n)
  write(fh, offset, buf, n)
  getattr(fh)
Implemented as Remote Procedure Calls (RPCs).

NFS FILE HANDLES (FH)
- An opaque identifier provided to the client by the server
- Includes all the info needed to identify the file/object on the server: volume ID, inode #, generation #
- It's a trick: store server state at the client!
- The generation # is optional, depending on the underlying file system
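As a concrete illustration, a file handle can be modeled as an immutable record that the client treats as opaque. This is a sketch only; the field names below are assumptions for the example, not NFS's wire format:

```python
from typing import NamedTuple

class FileHandle(NamedTuple):
    """Opaque to the client; only the server interprets the fields."""
    volume_id: int    # which exported file system
    inode: int        # which file within that volume
    generation: int   # detects a recycled inode (stale handle)

def lookup(path: str) -> FileHandle:
    # Server-side: resolve the path once, hand back a self-describing handle.
    ...

def read(fh: FileHandle, offset: int, n: int) -> bytes:
    # Every request carries all the state the server needs: fh plus offset.
    ...
```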

ARE REMOTE == LOCAL?
TANSTAAFL (THERE AIN'T NO SUCH THING AS A FREE LUNCH)
- With a local FS, a read sees the data from the most recent write, even if that write was performed by a different process: read/write coherence, linearizability
- Achieve the same with NFS? Perform all reads & writes synchronously to the server
- Huge cost: high latency, low scalability
- And what if the server doesn't return? Options: hang indefinitely, or return ERROR

CACHING OPTIONS
- Centralized control: record the status of clients (which files are open for reading/writing, what is cached, ...)
- Read-ahead: pre-fetch blocks before they are needed
- Write-through: all writes are sent to the server
- Write-behind: writes are buffered locally and sent as a batch
Consistency challenges:
- When a client writes, how do others caching the data get updated? (Callbacks, ...)
- Two clients write concurrently? (Locking, overwrite, ...)

SHOULD THE SERVER MAINTAIN PER-CLIENT STATE?
Stateful
- Pros: smaller requests; simpler request processing; better cache coherence, file locking, etc.
- Cons: per-client state limits scalability; fault tolerance on that state is required for correctness
Stateless
- Pros: easy server crash recovery; no open/close needed; better scalability
- Cons: each request must be fully self-describing; consistency is harder, e.g., no simple file locking

SOFT VS. HARD STATE
- Hard state: don't lose data
  - Durability: state is not lost; write to disk, or to a cold remote backup; keep an exact replica, or make it recoverable (DB: checkpoint + op log)
  - Availability (liveness): maintain online replicas
- Soft state: a performance optimization

NFS
- Stateless protocol; recovery is easy: a crashed server looks just like a slow server
- Messages over UDP (unencrypted)
- Reads from the server are cached in the NFS client
- NFSv2 was write-through (i.e., synchronous); NFSv3 added write-behind: delay writes until close or an fsync from the application

EXPLORING THE CONSISTENCY TRADEOFFS
- Write-to-read semantics are too expensive: give up caching, require server-side state, or ...
- Close-to-open "session" semantics: ensure an ordering, but only between application close and open, not between all writes and reads
  - If B opens after A closes, B will see A's writes
  - But if two clients open at the same time? No guarantees
  - And what gets written? Last writer wins

NFS CACHE CONSISTENCY
- Recall the challenge: potential concurrent writers
- Cache validation:
  - Get the file's last modification time from the server: getattr(fh)
  - Do this when first opening the file, then poll every 3-60 seconds
  - If the server's last modification time has changed, flush dirty blocks and invalidate the cache
- When reading a block:
  - Validate: (current time - last validation time) < threshold
  - If valid, serve from the cache; otherwise, refresh from the server
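A minimal sketch of that validation logic (Python; getattr_rpc, read_rpc, and the cache layout are assumptions for illustration, not the real NFS client code):

```python
import time

VALIDATION_THRESHOLD = 30.0  # seconds; NFS polls every 3-60 s

def getattr_rpc(fh): ...           # assumed RPC: returns server's mtime
def read_rpc(fh, blockno): ...     # assumed RPC: fetches one block

class CachedFile:
    def __init__(self, fh):
        self.fh = fh
        self.blocks = {}           # block number -> bytes
        self.server_mtime = None   # last modification time seen from server
        self.last_validated = 0.0

    def read_block(self, blockno):
        now = time.time()
        if now - self.last_validated >= VALIDATION_THRESHOLD:
            mtime = getattr_rpc(self.fh)
            if mtime != self.server_mtime:   # someone else wrote the file
                self.blocks.clear()          # invalidate cached blocks
                self.server_mtime = mtime
            self.last_validated = now
        if blockno not in self.blocks:
            self.blocks[blockno] = read_rpc(self.fh, blockno)
        return self.blocks[blockno]
```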

SOME PROBLEMS
- Mixed reads across versions: A reads blocks 1-10 from a file, B replaces blocks 1-20, then A keeps reading blocks 11-20
- Assumes synchronized clocks; not really correct (we'll learn about the notion of logical clocks later)
- Writes are specified by offset: concurrent writes can change the offset

NFS CACHE CONSISTENCY
- Recall the challenge: potential concurrent writers
- Timestamp invalidation: NFS
- Callback invalidation: AFS, Sprite, Spritely NFS
  - The server tracks all clients that have opened a file
  - On a write, the server sends notifications to those clients if the file changes; each client invalidates its cache
- Leases: Gray & Cheriton '89, NFSv4

LOCKS
- A client can request a lock over a file / byte range
  - Advisory: well-behaved clients comply
  - Mandatory: server-enforced
- The client performs its writes, then unlocks
- Problem: what if the client crashes? Solution: a keep-alive timer; recover the lock on timeout
- Problem: what if the client is alive but the network route failed? The client thinks it holds the lock while the server gives the lock to another: "split brain"

LEASES
- A client obtains a lease on a file for read or write
- A lease is a ticket permitting an activity; the lease is valid until some expiration time
- A read lease allows the client to cache clean data; guarantee: no other client is modifying the file
- A write lease allows safe delayed writes: the client can modify locally, then batch writes to the server; guarantee: no other client has the file cached
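A small sketch of the client-side view of a lease (Python; the class shape and the renew_rpc call are illustrative assumptions, not any particular protocol's API):

```python
import time

def renew_rpc(fh, mode): ...     # assumed RPC asking the server to extend

class Lease:
    def __init__(self, fh, mode, duration):
        self.fh = fh                 # file the lease covers
        self.mode = mode             # "read" or "write"
        self.expires = time.time() + duration

    def valid(self):
        # The client may rely on its cache only while the lease is unexpired.
        return time.time() < self.expires

    def renew(self, duration):
        # Must happen before expiry; otherwise the server may reclaim it.
        renew_rpc(self.fh, self.mode)
        self.expires = time.time() + duration
```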

USING LEASES
- The client requests a lease
  - May be implicit, distinct from file locking
  - The issued lease carries a file version number for cache coherence
- The server determines whether the lease can be granted
  - Read leases may be granted concurrently
  - Write leases are granted exclusively
- If a conflict exists, the server may send eviction notices
  - An evicted write lease must write back
  - Evicted read leases must flush/disable caching
  - The client acknowledges when this is complete

BOUNDED LEASE TERM SIMPLIFIES RECOVERY
- Before a lease expires, the client must renew it
- Client fails while holding a lease? The server waits until the lease expires, then unilaterally reclaims it; if the client fails during eviction, the server waits, then reclaims
- Server fails while leases are outstanding? On recovery: wait one lease period + clock skew before issuing new leases, and absorb renewal requests and/or writes for evicted leases

ANDREW FILE SYSTEM (CMU, 1980s-)
- Scalability was a key design goal: many servers, tens of thousands of users
- Observations about the workload:
  - Reads are much more common than writes
  - Concurrent writes are rare / writes between users are disjoint
- Interfaces are in terms of files, not blocks
  - Whole-file serving: entire files and directories
  - Whole-file caching: clients cache files to local disk; the cache is large and permanent, so it persists across reboots

AFS: CONSISTENCY
- Consistency: close-to-open consistency
  - No mixed writes, thanks to whole-file caching / whole-file overwrites
- Update visibility: callbacks to invalidate caches
- What about crashes or partitions?
  - The client invalidates its cache iff it is recovering from a failure, or its regular liveness check (heartbeat) to the server fails
  - The server assumes the cache is invalidated once callbacks fail and the heartbeat period is exceeded

Outline
1. Networked storage
2. Remote procedure calls

WHY RPC?
- The typical programmer is trained to write single-threaded code that runs in one place
- Goal: easy-to-program network communication that makes client-server communication transparent
  - Retains the feel of writing centralized code
  - The programmer needn't think about the network

REMOTE PROCEDURE CALL (RPC)
- Distributed programming is challenging
  - We need common primitives/abstractions to hide the complexity, e.g., the file system abstraction that hides block layout, or the process abstraction for scheduling/fault isolation
- In the early 1980s, researchers at PARC noticed that most distributed programming took the form of a remote procedure call

WHAT'S THE GOAL OF RPC?
- Within a single program, running in a single process, recall the well-known notion of a procedure call:
  - The caller pushes arguments onto the stack and jumps to the address of the callee function
  - The callee reads the arguments from the stack, executes, puts its return value in a register, and returns to the next instruction in the caller
- RPC's goal: make communication appear like a local procedure call: transparency for procedure calls

RPC EXAMPLE
Local computing:
  x = 3 * 10
  print(x)
  > 30
Remote computing:
  server = connectToServer(S)
  try:
      x = server.mult(3, 10)
      print(x)
  except Exception as e:
      print("Error!")
  > 30 or > Error

RPC ISSUES
- Heterogeneity
  - The client needs to rendezvous with the server
  - The server must dispatch to the required function; what if the server is a different type of machine?
- Failure
  - What if messages get dropped? What if the client, server, or network fails?
- Performance
  - A procedure call takes ~10 cycles, about 3 ns
  - An RPC in a data center takes ~10 microseconds (10^3 times slower); in the wide area, typically 10^6 times slower
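To make the remote half of the example concrete, here is a runnable sketch using Python's standard-library xmlrpc (one of many possible RPC mechanisms; the slides do not prescribe a specific library, so treat the choice as an assumption):

```python
# server.py: expose mult() over RPC.
from xmlrpc.server import SimpleXMLRPCServer

def mult(a, b):
    return a * b

server = SimpleXMLRPCServer(("localhost", 8000))
server.register_function(mult)
server.serve_forever()
```

```python
# client.py: the call looks local, but every RPC issue below can bite.
import xmlrpc.client

proxy = xmlrpc.client.ServerProxy("http://localhost:8000/")
try:
    x = proxy.mult(3, 10)   # network message, marshaling, dispatch...
    print(x)                # 30
except ConnectionError as e:
    print("Error!", e)      # e.g., the server is down
```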

PROBLEM: DIFFERENCES IN DATA REPRESENTATION
- Not an issue for a local procedure call
- For a remote procedure call, a remote machine may:
  - Represent data types using different sizes
  - Use a different byte ordering (endianness)
  - Represent floating-point numbers differently
  - Have different data alignment requirements (e.g., a 4-byte type begins only on a 4-byte memory boundary)

BYTE ORDER
- x86-64 is a little-endian architecture: the least significant byte of a multibyte entity sits at the lowest memory address ("little end goes first")
  int 5 at address 0x1000:
    0x1000: 0000 0101
    0x1001: 0000 0000
    0x1002: 0000 0000
    0x1003: 0000 0000
- Some other systems use big endian: the most significant byte of a multibyte entity sits at the lowest memory address ("big end goes first")
  int 5 at address 0x1000:
    0x1000: 0000 0000
    0x1001: 0000 0000
    0x1002: 0000 0000
    0x1003: 0000 0101
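You can observe both layouts directly with Python's struct module (a small demonstration added here, not part of the original slides):

```python
import struct

# Little-endian 32-bit int 5: least significant byte first.
print(struct.pack("<i", 5))   # b'\x05\x00\x00\x00'

# Big-endian 32-bit int 5: most significant byte first.
print(struct.pack(">i", 5))   # b'\x00\x00\x00\x05'

# "Network byte order", used by many wire formats, is big-endian.
assert struct.pack("!i", 5) == struct.pack(">i", 5)
```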

PROBLEM: DIFFERENCES IN PROGRAMMING SUPPORT
- Language support varies
- Many programming languages have no built-in concept of remote procedure calls, e.g., C, C++, earlier Java
- Some languages have support that enables RPC, e.g., Python, Haskell, Go

SOLUTION: INTERFACE DESCRIPTION LANGUAGE (IDL)
- A mechanism to pass procedure parameters and return values in a machine-independent way
- The programmer writes an interface description in the IDL
  - It defines the API for the procedure calls: names, parameter/return types
- Then an IDL compiler is run, which generates:
  - Code to marshal (convert) native data types into machine-independent byte streams, and vice versa (unmarshaling)
  - A client stub: forwards a local procedure call as a request to the server
  - A server stub: dispatches the RPC to its implementation
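Here is a hand-written sketch of what generated marshaling code might do for add(int, int). The wire format (length-prefixed method name, then two big-endian int32s) is a toy invented for illustration, not any real IDL compiler's output:

```python
import struct

def marshal_add_request(a: int, b: int) -> bytes:
    # Toy format: 2-byte name length, method name, two big-endian int32s.
    name = b"add"
    return struct.pack("!H", len(name)) + name + struct.pack("!ii", a, b)

def unmarshal_add_request(msg: bytes):
    (name_len,) = struct.unpack_from("!H", msg, 0)
    name = msg[2:2 + name_len].decode()
    a, b = struct.unpack_from("!ii", msg, 2 + name_len)
    return name, a, b

msg = marshal_add_request(3, 5)
print(unmarshal_add_request(msg))   # ('add', 3, 5)
```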

A DAY IN THE LIFE OF AN RPC
[Figure, repeated across ten slide builds: a client machine (client process calling k = add(3, 5), client stub / RPC library, client OS) and a server machine (server OS, server stub / RPC library, server process implementing add); the message "proc: add, int: 3, int: 5" flows to the server and "Result, int: 8" flows back]
1. The client calls the stub function (pushes the parameters onto the stack)
2. The stub marshals the parameters into a network message
3. The client OS sends the network message to the server
4. The server OS receives the message and sends it up to the stub
5. The server stub unmarshals the parameters and calls the server function
6. The server function runs (add(3, 5)) and returns a value (8)
7. The server stub marshals the return value and sends the message
8. The server OS sends the reply back across the network
9. The client OS receives the reply and passes it up to the stub
10. The client stub unmarshals the return value and returns it to the client (k = 8)

PETERSON AND DAVIE VIEW: THE SERVER STUB IS REALLY TWO PARTS
- Dispatcher
  - Receives a client's RPC request
  - Identifies the appropriate server-side method to invoke
- Skeleton
  - Unmarshals the parameters to server-native types
  - Calls the local server procedure
  - Marshals the response and sends it back via the dispatcher
- All of this is hidden from the programmer
  - The dispatcher and skeleton may be integrated, depending on the implementation
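A minimal sketch of that split (Python; the table-driven dispatch and the skeleton function are illustrative assumptions, reusing the toy wire format from the marshaling example above):

```python
import struct

def add(a, b):                      # the actual server procedure
    return a + b

def add_skeleton(payload: bytes) -> bytes:
    # Skeleton: unmarshal args, call the local procedure, marshal the reply.
    a, b = struct.unpack("!ii", payload)
    return struct.pack("!i", add(a, b))

SKELETONS = {"add": add_skeleton}   # the dispatcher's method table

def dispatch(msg: bytes) -> bytes:
    # Dispatcher: peel off the method name, route to the right skeleton.
    (name_len,) = struct.unpack_from("!H", msg, 0)
    name = msg[2:2 + name_len].decode()
    return SKELETONS[name](msg[2 + name_len:])
```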

Outline
1. RPC fundamentals
2. Handling failures in RPCs
3. gRPC: Google RPC overview
4. ATM server demo

WHAT COULD POSSIBLY GO WRONG?

WHAT COULD POSSIBLY GO WRONG?
1. The client may crash and reboot
2. Packets may be dropped
   - Some individual packet loss in the Internet
   - Broken routing results in many lost packets
3. The server may crash and reboot
4. The network or server might just be very slow
All of these may look the same to the client.

FAILURES, FROM THE CLIENT'S PERSPECTIVE
[Figure: a client-server timeline; the request, the server's execution, or the reply may fail at different points]
The cause of the failure is hidden from the client!

AT-LEAST-ONCE SCHEME
The simplest scheme for handling failures:
1. The client stub waits a while for a response (the response takes the form of an acknowledgement message from the server stub)
2. If no response arrives within a fixed timeout period, the client stub re-sends the request
Repeat the above a few times; still no response? Return an error to the application.

AT-LEAST-ONCE AND SIDE EFFECTS
The client sends a "debit $10 from bank account" RPC.
[Figure: the client's request times out and is retransmitted; the server executes "debit $10" twice]
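A sketch of the client-side retry loop over UDP (Python; the request format and the choice of error type are assumptions for illustration):

```python
import socket

def call_at_least_once(addr, request: bytes,
                       timeout=1.0, max_tries=3) -> bytes:
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(timeout)
    try:
        for _ in range(max_tries):
            sock.sendto(request, addr)   # (re)send; server may run it again!
            try:
                reply, _ = sock.recvfrom(4096)
                return reply
            except socket.timeout:
                continue                 # no reply in time: retry
        raise TimeoutError("no response after %d tries" % max_tries)
    finally:
        sock.close()
```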

AT-LEAST-ONCE AND WRITES
Consider a client storing key-value pairs in a database: after put(x, value), a get(x) should return value.
[Figure: timeline. The client's put(x, 10) times out and is retransmitted; the server applies put(x, 10), then put(x, 20), and finally the delayed duplicate of put(x, 10), so x ends up 10 and a later get(x) returns 10 instead of 20]

SO IS AT-LEAST-ONCE EVER OKAY?
- Yes: if the operations are read-only, with no side effects, e.g., reading a key's value in a database
- Yes: if the application has its own functionality to cope with duplication and reordering

AT-MOST-ONCE SCHEME
- Idea: the server's RPC code detects duplicate requests and returns the previous reply instead of re-running the handler
- How to detect a duplicate request?
  - Test: the server sees the same function with the same arguments twice?
  - No! Applications sometimes legitimately submit the same function with the same arguments twice in a row

AT-MOST-ONCE SCHEME
How to detect a duplicate request?
- The client includes a unique transaction ID (xid) with each of its RPC requests
- The client uses the same xid for retransmitted requests
At-most-once server:
  if seen[xid]:
      retval = old[xid]
  else:
      retval = handler()
      old[xid] = retval
      seen[xid] = True
  return retval

AT-MOST-ONCE: ENSURING UNIQUE XIDS
How do we ensure that each xid is unique?
1. Combine a unique client ID (e.g., IP address) with the current time of day
2. Combine a unique client ID with a sequence number; but suppose the client crashes and restarts: can it reuse the same client ID?
3. Use a big random number
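A sketch of options 2 and 3 (Python; uuid4 yields a 122-bit random value, comfortably "big" for this purpose):

```python
import itertools
import uuid

# Option 3: a big random number per request.
def fresh_xid_random() -> str:
    return str(uuid.uuid4())

# Option 2: unique client ID + sequence number. Unsafe across a client
# crash/restart unless the counter is made persistent.
class XidSource:
    def __init__(self, client_id: str):
        self.client_id = client_id
        self.seq = itertools.count()

    def next_xid(self):
        return (self.client_id, next(self.seq))
```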

AT-MOST-ONCE: DISCARDING SERVER STATE
- Problem: the seen and old tables will grow without bound
- Observation: by construction, once the client gets a response to a particular xid, it will never re-send it
- The client could tell the server "I'm done with xid x, delete it"
  - But it would have to tell the server about each and every retired xid
  - It could piggyback this on subsequent requests; still significant overhead if many RPCs are in flight in parallel
- Better: suppose xid = (unique client ID, sequence number), e.g., (42, 1000), (42, 1001), (42, 1002)
- The client includes "I have seen all replies with sequence number <= X" with every RPC, much like TCP sequence numbers and ACKs
- How does the client know that the server received the information about retired RPCs? Each of these messages is cumulative: later "seen" messages subsume earlier ones
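A sketch of the server-side pruning this enables (Python; the table names follow the pseudocode above, and the helper itself is an illustrative assumption):

```python
def prune(old: dict, seen: dict, client_id, acked: int):
    """Drop saved replies for xids (client_id, seq) with seq <= acked.

    'acked' is the cumulative "seen all replies <= X" value the client
    piggybacks on each request; later values subsume earlier ones.
    """
    retired = [x for x in old if x[0] == client_id and x[1] <= acked]
    for xid in retired:
        del old[xid]
        seen.pop(xid, None)
```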

AT-MOST-ONCE: CONCURRENT REQUESTS
- Problem: how do we handle a duplicate request while the original is still executing?
  - The server doesn't know the reply yet, and we don't want to run the procedure twice
- Idea: add a pending flag per executing RPC
  - The server either waits for the procedure to finish or ignores the duplicate

AT-MOST-ONCE: SERVER CRASH AND RESTART
- Problem: the server may crash and restart
- Does the server need to write its tables to disk? Yes!
- On server crash and restart, if the old[] and seen[] tables are only in memory, the server will forget them and accept duplicate requests
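Putting the pieces together, here is a sketch of an at-most-once handler with a pending flag (Python with threads; the locking discipline is an illustrative assumption, and persisting the tables to disk is omitted):

```python
import threading

class AtMostOnce:
    def __init__(self):
        self.lock = threading.Lock()
        self.old = {}       # xid -> saved reply (should be written to disk!)
        self.pending = {}   # xid -> Event for in-flight executions

    def handle(self, xid, handler):
        with self.lock:
            if xid in self.old:            # duplicate: replay the old reply
                return self.old[xid]
            event = self.pending.get(xid)
            if event is None:              # first copy of this request
                event = threading.Event()
                self.pending[xid] = event
                is_first = True
            else:                          # duplicate of an in-flight call
                is_first = False
        if is_first:
            retval = handler()             # run the procedure exactly once
            with self.lock:
                self.old[xid] = retval
                del self.pending[xid]
            event.set()
            return retval
        event.wait()                       # wait for the original to finish
        with self.lock:
            return self.old[xid]
```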

RPC SEMANTICS

Delivery guarantees:

  Retry request? | Duplicate filtering? | Retransmit response?  | RPC call semantics
  ---------------+----------------------+-----------------------+-------------------
  No             | N/A                  | N/A                   | Maybe
  Yes            | No                   | Re-execute procedure  | At-least-once
  Yes            | Yes                  | Retransmit reply      | At-most-once