ECE454 Tutorial June 16, 2009 (Material prepared by Evan Jones)

2. Consider the following function:

    void strcpy(char* out, char* in) {
        while (*out++ = *in++);
    }

which is invoked by the following code:

    void main(void) {
        char buf[10] = "name";
        strcpy(buf+4, buf);
        cout << buf << endl;
    }

What is the result of executing this code if the strcpy function is a remote procedure using copy/restore semantics? What is the result if it is a local procedure, using the standard C/C++ call-by-value semantics?

Q2: RPC Copy & Restore
Server copies buf into a local buffer:

    void strcpy(char* out = char[n], char* in = "name") {
        while (*out++ = *in++);  // out = "name", in = "name"
    }

Client stub copies this buffer into buf+4:

    // buf = [ 'n', 'a', 'm', 'e', 0, 0, 0, 0, 0, 0 ];
    strcpy(buf+4, buf);
    // buf = [ 'n', 'a', 'm', 'e', 'n', 'a', 'm', 'e', 0, 0 ];

Result: namename

Q2: Local Semantics

    void strcpy(char* out = buf+4, char* in = buf) {
        while (*out++ = *in++);
    }

The copy overwrites the null characters.
Result: infinite loop. Eventually, a write to a forbidden address will terminate the program with a segmentation fault.
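To make the copy/restore trace concrete, here is a minimal, self-contained C++ sketch (the stub, buffer sizes, and names are illustrative, not from the tutorial) that simulates what the client stub does: marshal each argument into its own buffer, run the procedure on the copies, and copy the output back. It prints "namename", matching the trace above.

    #include <cstring>
    #include <iostream>

    // The procedure from the question, renamed so it does not shadow
    // the standard library's strcpy.
    void my_strcpy(char* out, char* in) {
        while ((*out++ = *in++)) {}
    }

    // Hypothetical client stub simulating copy/restore semantics:
    // marshal each argument into a separate buffer, run the procedure
    // on the copies (as the server would), then restore the output.
    void my_strcpy_copy_restore(char* out, char* in) {
        char out_copy[16] = {0};
        char in_copy[16] = {0};
        std::memcpy(out_copy, out, 6);   // marshal the out region (buf+4..buf+9)
        std::strcpy(in_copy, in);        // marshal the in string ("name")
        my_strcpy(out_copy, in_copy);    // server runs on copies: no aliasing
        std::memcpy(out, out_copy, std::strlen(out_copy) + 1);  // restore
    }

    int main() {
        char buf[10] = "name";
        my_strcpy_copy_restore(buf + 4, buf);
        std::cout << buf << std::endl;   // prints "namename"
        return 0;
    }

Because the server works on private copies, the aliasing between out and in disappears, which is exactly why the remote and local results differ.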

3. Consider the following declaration in C:

    union {
        int a;
        char b;
        float c;
    } foo;

At run-time there is no way to determine which of the entries in the foo union is valid. What implications does this have for RPC? What is the implication if, instead of being char b;, it was char* b;?

Q3: Struct and Union Memory Layout

Q3: Unions
- Multiple data types occupy the same space
- A union is as wide as its largest data type
- The type retrieved should be the type last stored
- To marshal a union, we need its current type
- Discriminated unions have a tag indicating the current type
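A minimal sketch of the discriminated-union idea in C++ (the tag and names are illustrative): the tag records which member is live, which is exactly the information the marshalling code needs.

    // Tag recording which member of the union was stored last.
    enum FooTag { TAG_INT, TAG_CHAR, TAG_FLOAT };

    struct DiscriminatedFoo {
        FooTag tag;    // which member is currently valid
        union {
            int a;
            char b;
            float c;
        } u;
    };

    // The marshaller can now switch on the tag instead of guessing.
    void marshal(const DiscriminatedFoo& f) {
        switch (f.tag) {
            case TAG_INT:   /* write f.u.a to the wire */ break;
            case TAG_CHAR:  /* write f.u.b to the wire */ break;
            case TAG_FLOAT: /* write f.u.c to the wire */ break;
        }
    }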

Q3: RPC Problems
- Send all three members? We would marshal invalid values.
- Just send the bits? What about different architectures? (Big or little endian, floating point format, etc.)
- Pointers (char* b instead of char b):
  - May try to access invalid or inaccessible address space when marshalling
  - May send an invalid address to the remote system

4. We wish to determine some of the benefits and drawbacks of caching the result of a server address lookup in an RPC system. Consider a system in which a client requests the server address for a given procedure from a binder. The time to execute this request is τ_b. The client can then request execution of the procedure at the server, which takes a total time of τ_s.
Hint: This is basically Project 1

4 (continued). If the client caches the server address, it does not need to look it up on subsequent RPCs. However, the server may be shut down for maintenance from time to time. As a result, the client must now consider the possibility that the RPC will not execute because the server address information it has cached is stale. To determine this, the client will simply have a timeout period τ_o > τ_s. If, after invoking an RPC using a cached server address, the server has not responded within this timeout period, the client will presume the server is down and will ask the binder to look up a different server address.

Q4: Normal Request

Q4: Cached Request

Q4: Cached Request Timeout

Q4: Summary
- System like Project 1
- Binder lookup time: τ_b
- Server request time: τ_s
- Client timeout: τ_o > τ_s

4. a) If the client does no caching, what is the minimum and maximum amount of time it takes to execute an RPC?

4. a) If the client does no caching, what is the minimum and maximum amount of time it takes to execute an RPC?
Max time = Min time = τ_b + τ_s

4. b) If the client does caching, what is the minimum and maximum amount of time it takes to execute an RPC?

4. b) If the client does caching, what is the minimum and maximum amount of time it takes to execute an RPC?
Min time = τ_s
Max time = τ_o + τ_b + τ_s

4. c) Suppose a client executes k RPCs before the server is shut down for maintenance and another server takes over. What is the average time to execute an RPC using the caching scheme?

4. c)
1st request: τ_b + τ_s
2nd request: τ_s
...
kth request: τ_s
(k+1)th request: τ_o + τ_b + τ_s
(k+2)th request: same as the 2nd
Total time for a cycle of k requests = τ_o + τ_b + kτ_s
(This ignores the initial k requests because we assume an infinite series of requests; in steady state, each cycle of k requests includes exactly one timeout and one binder lookup.)
Average time = τ_s + (τ_o + τ_b)/k
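Written out in LaTeX (this simply restates the arithmetic above), each steady-state cycle of k requests is one timed-out request plus k-1 cached ones:

\[
T_{\text{cycle}} = (\tau_o + \tau_b + \tau_s) + (k-1)\tau_s = \tau_o + \tau_b + k\tau_s,
\qquad
T_{\text{avg}} = \frac{T_{\text{cycle}}}{k} = \tau_s + \frac{\tau_o + \tau_b}{k}
\]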

4. d) For what value of k will the caching scheme outperform the non-caching scheme?

4. d) For what value of k will the caching scheme outperform the non-caching scheme?
Non-caching: τ_b + τ_s
Caching: τ_s + (τ_o + τ_b)/k
Equate the two and solve for k: k > τ_o/τ_b + 1
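The algebra behind the last step, for completeness:

\[
\tau_s + \frac{\tau_o + \tau_b}{k} < \tau_b + \tau_s
\;\Longrightarrow\;
\frac{\tau_o + \tau_b}{k} < \tau_b
\;\Longrightarrow\;
k > \frac{\tau_o + \tau_b}{\tau_b} = \frac{\tau_o}{\tau_b} + 1
\]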

1. Identify four ways in which a Remote Procedure Call is different from a Local Procedure Call, and explain the significance of those differences.

Q1: Parameter Passing
- RPC: copy and restore
- Local: call by value or call by reference
- Remote references: must use code to access the data remotely
- Significance: results can be different when using RPC or local calls

Q1: Failure
- Local calls: failures are only due to local bugs
- RPC: can fail due to network or server problems
- Significance: the client must have additional error handling for RPC calls

Q1: Performance
- Parameters must be marshaled
- The server must be accessed over the network
- Significance: RPC calls have much more overhead than local calls

Q1: Performance Workaround
- RPC has a lot of overhead
- Processing on the server occurs at full speed
- Conclusion: RMI interfaces should do a lot of work per call
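To illustrate that conclusion, here is a hypothetical C++ interface sketch (none of these names come from the tutorial): a chatty design pays the per-call overhead once per field, while a coarse-grained design pays it once per logical operation.

    // Chatty: one RPC round trip per accessor -- overhead dominates.
    struct ChattyAccount {
        virtual int getBalance() = 0;          // one RPC
        virtual int getOverdraftLimit() = 0;   // another RPC
        virtual void setBalance(int b) = 0;    // another RPC
        virtual ~ChattyAccount() = default;
    };

    // Coarse-grained: one RPC does all the work on the server.
    struct CoarseAccount {
        // Withdraws amount if funds permit; returns the new balance.
        virtual int withdraw(int amount) = 0;  // single RPC
        virtual ~CoarseAccount() = default;
    };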

Q1: Global Resources
- Local procedures all share the same global state
- RPC calls have no access to global shared resources
- Example: if an operation depends on the value of the computer's clock, it may not work as an RPC

Q1: At-most-once or At-least-once semantics
- The client has sent the request
- The server crashes
- What should the client do?
- The server might have executed the call before it crashed, but the client has no way to tell

Q1: At-most-once
- The RPC call fails with an error
- It is the application's responsibility to handle it appropriately:
  - Query an application-specific function about the state of the system
  - Retry if the application knows it doesn't matter
- At-most-once operation: the call was executed one or zero times

Q1: At-least-once
- The RPC is retried until the client knows it was executed
- Appropriate if the result does not change when the operation is performed multiple times: an idempotent operation
- At-least-once operation: the call was executed one or more times

Q1: But aren't most operations not idempotent?
- Example: adding a record to a database:
    int addrecord( DataBase, Record );
- If this is executed multiple times, the database will have duplicate entries
- But maybe we can rework this

Making operations idempotent

    EntryHandle createrecord();
    int modifyrecord( DataBase, Record, EntryHandle );

- Multiple createrecord calls: unused entries, which can eventually be deleted
- Multiple modifyrecord calls: identical data
- In general, avoid keeping state on the server. This is not always possible.
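A sketch of why the reworked interface tolerates retries, using a toy in-memory server (the C++ types here are hypothetical stand-ins for the slide's DataBase and Record):

    #include <map>
    #include <string>

    using EntryHandle = long;
    using Record = std::string;   // stand-in for the slide's Record type

    class DataBase {
        std::map<EntryHandle, Record> entries_;
        EntryHandle next_ = 0;
    public:
        // A retried createrecord only burns an extra handle (an unused
        // entry that can eventually be deleted); it never duplicates data.
        EntryHandle createrecord() {
            EntryHandle h = next_++;
            entries_[h];              // reserve an empty entry
            return h;
        }
        // A retried modifyrecord overwrites the same entry with identical
        // data, so executing it one or more times has the same effect.
        int modifyrecord(const Record& r, EntryHandle h) {
            entries_[h] = r;
            return 0;
        }
    };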

5. Consider the following line of Java code:

    a = b.foo(c);

where a, b, and c are objects of types A, B, and C respectively. The foo() method for type B is defined as:

    A foo( C c ) { return c.bar(); }

Objects b and c are located on a server and client respectively. Object c does not have a remote interface defined (i.e. C does not extend java.rmi.Remote).

5. a) Can we determine where the process that is executing this line of code is located? If so, where is it, and why must it be there? If not, why can we not determine this?

5. a) Can we determine where the process that is executing this line of code is located? If so, where is it, and why must it be there? If not, why can we not determine this?
a = b.foo(c);
- b: on the server
- c: on the client
- C: no remote interface

Q5a: Executing on the Client
- Because c has no remote interface and c is located on the client, the client is the only process that has a copy
- Therefore, this code must run on the client

5. b) In the process where this line of code is executing, is b a local or a remote reference, or is it not possible to determine?

5. b) In the process where this line of code is executing, is b a local or a remote reference, or is it not possible to determine?
a = b.foo(c);
- b: on the server
- c: on the client
- C: no remote interface
- Executing on the client

Q5b: Remote Reference
- We are executing on the client
- b is on the server
- Therefore, b must be a remote reference

5. c) Can we determine where object a is located?

5. c) Can we determine where object a is located?
a = b.foo(c);
- b: remote reference (on the server)
- c: on the client (no remote interface)
- Executing on the client
- A foo( C c ) { return c.bar(); }

Q5c: Unable to Determine
- The question does not provide information about type A
- If A is a remote interface, a will be located on the server, and a remote reference will be returned to the client
- Otherwise, a will be copied back to the client and be a local object

5. d) What is the sequence of actions during the execution of this line of code? You should consider the possibility that the returned value is either a remote reference or a local object. Indicate at all stages when either a remote reference or a local object is passed.

Q5d: Sequence
1. The client calls foo on the server via RMI. b is passed as a remote reference, and a copy of c is sent because it is a local object.
2. The server executes b.foo with its own local copy of c.
3. The server returns a to the client. If A is a remote interface, then a is returned as a remote reference. Otherwise, a copy is sent to the client.

2. Consider the maximum server throughput, in client requests handled per second, for different numbers of threads. If a single thread has to perform all processing then the time for handling any request is on average 2 milliseconds of processing and 8 milliseconds of input-output delay when the server reads from a drive on behalf of the client. Any new messages that arrive while the server is handling a request are queued at the server port.

2. a) Compute the maximum throughput when the server has two threads that are independently scheduled and disk access requests can be serialized.
T_Disk = 8 ms, T_CPU = 2 ms

2. a) Compute the maximum throughput when the server has two threads that are independently scheduled and disk access requests can be serialized.
T_Disk = 8 ms, T_CPU = 2 ms
Disk limited: a request completes every 8 ms
Throughput = 1/0.008 = 125 requests/s

2. a) Compute the maximum throughput when the server has two threads that are independently scheduled and disk access requests can be serialized.
T_Disk = 8 ms, T_CPU = 2 ms
If we add more threads, can we increase the throughput?

2. a) Compute the maximum throughput when the server has two threads that are independently scheduled and disk access requests can be serialized.
T_Disk = 8 ms, T_CPU = 2 ms
If we add more threads, can we increase the throughput?
No: we are limited by the disk performance, so we cannot take advantage of more threads.

2. b) Caching is introduced: a server thread that is asked to retrieve data first looks in the shared cache, and avoids accessing the disk if it finds the data there, so a cache hit incurs no I/O time. Assume a 75% hit rate on the cache, and that the processing time, due to the cache search, increases to 4 milliseconds per request.

2. b) Compute the maximum throughput:
T_Disk = 8 ms, P(Disk) = 25%, T_CPU = 4 ms

2. b) Compute the maximum throughput:
T_Disk = 8 ms, P(Disk) = 25%, T_CPU = 4 ms
We care about the average case: T_Disk avg = 8 × 0.25 = 2 ms

2. b) Compute the maximum throughput:
T_Disk = 8 ms, P(Disk) = 25%, T_CPU = 4 ms
We care about the average case: T_Disk avg = 8 × 0.25 = 2 ms
CPU limited: a request completes every 4 ms
Throughput = 1/0.004 = 250 requests/s
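The bottleneck argument in one formula (the max() expresses the assumption, implicit in the slides, that with enough threads the CPU and disk stages overlap perfectly):

\[
\text{Throughput} = \frac{1}{\max\bigl(T_{\text{CPU}},\; P(\text{Disk}) \cdot T_{\text{Disk}}\bigr)}
= \frac{1}{\max(4\text{ ms},\, 2\text{ ms})}
= 250\ \text{requests/s}
\]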

2.b) Reality Check How many threads do we need to get that maximum rate, with caching?

2.b) Reality Check How many threads do we need to get that maximum rate, with caching?
In theory (ideal case), we only need 2:
- One thread using the CPU
- One thread waiting for the disk

2.b) Reality Check How many threads do we need to get that maximum rate, with caching? In reality, the order of cache hits and misses will be random. What happens if we get two requests in a row that need to access the disk?

2. c) Caching is introduced as above with a 75% hit rate, but now there are two processors using a shared memory model.
T_Disk = 8 ms, P(Disk) = 25%, T_CPU = 4 ms

2. c) Caching is introduced as above with a 75% hit rate, but now there are two processors using a shared memory model.
T_Disk = 8 ms, P(Disk) = 25%, T_CPU = 4 ms
We care about the average case:
T_Disk avg = 8 × 0.25 = 2 ms
T_CPU avg = 4/2 = 2 ms (per processor)

2. c) Caching is introduced as above with a 75% hit rate, but now there are two processors using a shared memory model.
T_Disk = 8 ms, P(Disk) = 25%, T_CPU = 4 ms
We care about the average case:
T_Disk avg = 8 × 0.25 = 2 ms
T_CPU avg = 4/2 = 2 ms (per processor)
Overlap disk and CPU: 2 ms between completions
Throughput = 1/0.002 = 500 requests/s
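With two processors the CPU stage halves; under the same ideal-overlap assumption as in part b):

\[
\text{Throughput} = \frac{1}{\max\!\left(\tfrac{T_{\text{CPU}}}{2},\; P(\text{Disk}) \cdot T_{\text{Disk}}\right)}
= \frac{1}{\max(2\text{ ms},\, 2\text{ ms})}
= 500\ \text{requests/s}
\]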

2.c) Reality Check How many threads do we need to get that maximum rate, with caching and two CPUs?

2.c) Reality Check How many threads do we need to get that maximum rate, with caching and two CPUs?
Theoretically, we need 3:
- 2 using the CPUs
- 1 blocked waiting for the disk

2.c) Reality Check How many threads do we need to get that maximum rate, with caching and two CPUs? Realistically, we still have a problem if multiple requests that need the disk arrive in a row.

5. Consider the following C code for implementing a file copy command:

    void filecopy(char* dest, char* src) {
        const int bufsz = 1024;
        char buf[bufsz];

        int fd1 = open(src, O_RDONLY);             // Open src
        int fd2 = open(dest, O_WRONLY | O_CREAT);  // Open dest

        bool done = false;
        while (!done) {
            int rc = read(fd1, buf, bufsz);
            if (rc <= 0) done = true;
            else write(fd2, buf, rc);
        }
        close(fd1);
        close(fd2);
    }

5. (continued) Suppose that we wish to use this as a basis for a client/server file copy operation, in which the copy command is executed at the client, while the files reside on the server.

5. a) What are the advantages (if any) and disadvantages (if any) of using this code as is, except with the various file operations (open, read, write and close) implemented as remote procedure calls?

5. a) Advantages
Very flexible: implementing open, close, read and write as RPC calls would allow all possible file system operations. In fact, you could build a remote file system on top of them.

Q5: A File System Over RPC!?!? Are you Crazy?
No: Sun's NFS, the de facto standard Unix network file system, was built on top of Sun's RPC system. It has some problems, but has been working pretty well since 1985.

5. a) Disadvantages
- Many RPC calls add overhead
- Data is copied over the network to the client, then copied back to the server: a waste of bandwidth, since the server already has the data
- Reliability: if the client crashes in the middle of the copy, the server will have part of a copied file, and maybe the files will be locked

5. b) What changes would you make to overcome any disadvantages you have identified, and how might those changes affect the advantages you have identified?
- Only implement a copy RPC:
  - Only one RPC call of overhead
  - No bandwidth wasted moving data
  - Less flexibility
- Implement copy as well as the others:
  - More complexity
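A sketch of the server side of that copy RPC (the RPC plumbing that dispatches to this handler is assumed, not shown): since both paths name files on the server, the file data never crosses the network.

    void filecopy(char* dest, char* src);  // the function from the question

    // Hypothetical server-side handler invoked by the RPC layer.
    // Both dest and src are paths on the server, so the copy runs
    // entirely locally.
    int copy_rpc_handler(char* dest, char* src) {
        filecopy(dest, src);  // runs at full speed on the server
        return 0;             // a real handler would also report open/read/write errors
    }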

1. Reading a file using a single-threaded file server and a multithreaded server. It takes 15 ms to get a request for work, dispatch it, and do the rest of the processing, assuming that the data needed are in a cache in main memory. If a disk operation is needed, as is the case one-third of the time, an additional 75 ms is required, during which time the thread sleeps. How many requests per second can the server handle if it is single-threaded? If it is multithreaded?

Q1: Single-Threaded Diagram

Q1: Single-Threaded
We care about the average case:
T_avg = T_CPU + T_Disk × P(Disk) = 15 + 75/3 = 40 ms
Requests per second = 1/0.040 = 25

Q1: Multi-Threaded Diagram

Q1: Multi-Threaded
We care about the average case. On average, each task has:
T_CPU = 15 ms
T_Disk = 75/3 = 25 ms
But we can overlap CPU and disk operations.
Time between completions = max(T_CPU, T_Disk) = 25 ms
Requests per second = 1/0.025 = 40

Q1: Multi-Threaded Average

3. We wish to examine the effect of the order of processing client requests at a server. A typical server will initiate the processing of clients' requests in the order in which they are received. It does this because it has no knowledge of future requests. Suppose we have a server that has just received two requests, one big and one little. The big request will take T_b time to process. The little request will take T_l time to process.

3. In addition to these times, the time it takes to initiate the processing is T_i. The time it takes a client to send the request is T_c. Finally, the time it takes the server to package up and return the results to the client is T_r.

3. i) Our server is single-threaded. What is the least amount of time that the client initiating the little request will experience before getting the results back?
Two requests have arrived at the server simultaneously.
T_l = little request
T_b = big request
T_c = client request
T_i = initiate request
T_r = return data

3. i) Sequence of Events
1. Client initiates the requests (T_c)
2. Server initiates the little request (T_i)
3. Processing time for the little request (T_l)
4. Return results for the little request (T_r)
Total = T_c + T_i + T_l + T_r

3. ii) What is the greatest amount of time that the client initiating the little request will experience before getting the results back?
Two requests have arrived at the server simultaneously.
T_l = little request
T_b = big request
T_c = client request
T_i = initiate request
T_r = return data

3. ii) Sequence of Events
1. Client initiates the requests (T_c)
2. Server processes the big request first (T_i + T_b + T_r)
3. Server processes the little request (steps 2-4 above: T_i + T_l + T_r)
Total = T_c + T_i + T_b + T_r + T_i + T_l + T_r

3. iii) Now suppose we have a multi-threaded server. The server makes a thread switch every T_t, and it takes T_s seconds to make the switch. Whenever a request is received, a thread is spawned to process the request. Spawning takes time T_f. Thus, in this situation there would initially be a thread that received the requests, and it would spawn two additional threads to process them. The receiving thread would then remain blocked for the rest of the time, and can thus be ignored after this point.

3. iii) What are the least and greatest amounts of time that the client initiating the little request will experience before getting the results back?

3. iii) Receiving Thread

    loop:
        initiate request (from queue, or block) (T_i)
        spawn a new thread for that request (T_f)
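A minimal C++ sketch of this loop (the queue, the handler type, and the omitted synchronization are assumptions; the slides only specify the costs T_i and T_f):

    #include <functional>
    #include <queue>
    #include <thread>

    std::queue<std::function<void()>> requests;  // assumed filled by the network layer

    // Receiving thread: take each request ("initiate", cost T_i in the
    // model) and spawn a worker thread to process it (cost T_f).
    // Locking and blocking on an empty queue are omitted for brevity.
    void receiving_thread() {
        while (!requests.empty()) {
            std::function<void()> request = requests.front();  // initiate (T_i)
            requests.pop();
            std::thread(request).detach();                     // spawn (T_f)
        }
    }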

3. iii) Assumptions
- The receiving thread does all its work in a single time slice (no switches)
- The little task takes one time slice
- The big task takes multiple time slices

3. iii) Best Case Sequence
1. Client initiates the requests (T_c)
2. The receiving thread initiates the requests and creates 2 threads (2(T_i + T_f))
3. Switch to the little task and complete (T_s + T_l)
4. Switch to the big task and execute (T_s + T_t)
5. Switch to the little task, return results (T_s + T_r)

3. iii) Worst Case Sequence
1. Client initiates the requests (T_c)
2. The receiving thread initiates the requests and creates 2 threads (2(T_i + T_f))
3. Switch to the big task and execute (T_s + T_t)
4. Switch to the little task and complete (T_s + T_l)
5. Switch to the big task and execute (T_s + T_t)
6. Switch to the little task, return results (T_s + T_r)

3. iv) Define server efficiency as the percent of time spent processing requests (i.e. the time T_l or T_b as a fraction of total time). What is the efficiency of this server in the single-threaded case?

3. iv) Define server efficiency as the percent of time spent processing requests (i.e. the time T_l or T_b as a fraction of total time). What is the efficiency of this server in the single-threaded case?
Assume T_c is predominantly client and network time, and so is not part of efficiency.
Efficiency = Ideal Time / Actual Time = (T_l + T_b) / (T_l + T_b + 2(T_i + T_r))

3. iv) What is the efficiency of this server in the multi-threaded case?

3. iv) What is the efficiency of this server in the multi-threaded case?
Multi-threading adds overhead: T_t of CPU time actually takes T_t + T_s

3. iv) What is the efficiency of this server in the multi-threaded case?
Multi-threading adds overhead:
- T_f to spawn threads
- T_t of CPU time actually takes T_t + T_s
Multiply the single-threaded actual time by a factor of (T_t + T_s)/T_t:
Efficiency = Ideal / ((Single-Threaded Time)(Context-Switch Factor)) = (T_l + T_b) / ((T_l + T_b + 2(T_i + T_r + T_f))(T_t + T_s)/T_t)
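The two efficiency formulas side by side, in LaTeX:

\[
E_{\text{single}} = \frac{T_l + T_b}{T_l + T_b + 2(T_i + T_r)},
\qquad
E_{\text{multi}} = \frac{T_l + T_b}{\bigl(T_l + T_b + 2(T_i + T_r + T_f)\bigr)\,\frac{T_t + T_s}{T_t}}
\]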

3. v) Compute all the answers with actual numbers.
One of the multi-threaded assumptions is no longer valid: "The receiving thread does all its work in a single time slice" requires 2(T_i + T_f) ≤ T_t.
The numbers given: T_i = 5, T_f = 3, T_t = 10
2(T_i + T_f) = 16 > 10

3. v) What changes in the best case?
1. Client initiates the requests (T_c = 5 ms)
2. The receiving thread initiates the requests and creates 2 threads (2(T_i + T_f) = 16 ms)
3. Switch to the little task and complete (T_s + T_l = 11 ms)
4. Switch to the big task and execute (T_s + T_t = 11 ms)
5. Switch to the little task, return results (T_s + T_r = 6 ms)

3. v) What changes in the best case?
1. Client initiates the requests (T_c = 5 ms)
2. The receiving thread initiates and creates the little task thread (T_i + T_f = 8 ms)
3. It starts initiating the big task (2 ms)
4. Switch to the little task and complete (T_s + T_l = 11 ms)
5. Switch to the receiving thread and finish initiating (T_s + T_i + T_f - 2 = 7 ms)
6. Switch to the little task, return results (T_s + T_r = 6 ms)

3. v) What changes in the worst case?
1. Client initiates the requests (T_c = 5 ms)
2. The receiving thread initiates the requests and creates 2 threads (2(T_i + T_f) = 16 ms)
3. Switch to the big task and execute (T_s + T_t = 11 ms)
4. Switch to the little task and complete (T_s + T_l = 11 ms)
5. Switch to the big task and execute (T_s + T_t = 11 ms)
6. Switch to the little task, return results (T_s + T_r = 6 ms)

3. v) What changes in the worst case?
1. Client initiates the requests (T_c = 5 ms)
2. The receiving thread initiates and creates the big task thread (T_i + T_f = 8 ms)
3. It starts initiating the little task (2 ms)
4. Switch to the big task and execute (T_s + T_t = 11 ms)
5. Switch to the receiving thread and finish initiating (T_s + T_i + T_f - 2 = 7 ms)
6. Switch to the big task and execute (T_s + T_t = 11 ms)
7. Switch to the little task and complete (T_s + T_l = 11 ms)
8. Switch to the big task and execute (T_s + T_t = 11 ms)
9. Switch to the little task, return results (T_s + T_r = 6 ms)
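As a sanity check (these totals are not on the original slides; they are just the sums of the steps above), the little client's response time is:

\[
T_{\text{best}} = 5 + 8 + 2 + 11 + 7 + 6 = 39\ \text{ms},
\qquad
T_{\text{worst}} = 5 + 8 + 2 + 11 + 7 + 11 + 11 + 11 + 6 = 72\ \text{ms}
\]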