Notes adapted from Robert Morris Remote Procedure Call (RPC) a key piece of distrib sys machinery; all the labs use RPC goal: easy-to-program network communication hides most details of client/server communication client call is much like ordinary procedure call server handlers are much like ordinary procedures RPC is widely used! RPC ideally makes communication look just like a local fn call: Client: z = fn(x, y) Server: fn(x, y) { compute return z } RPC aims for this level of transparency Examples from lab 1: DoJob Register RPC message diagram: Client Server request---> <---response Software structure client app handlers stubs dispatcher RPC lib RPC lib net ------------ net A few details: Which server function (handler) to call? Marshalling: format data into packets Tricky for arrays, pointers, objects, &c Go's RPC library is pretty powerful! some things you cannot pass: e.g., channels, functions Binding: how does client know who to talk to? Maybe client supplies server host name Maybe a name service maps service names to best server host Threads: Client often has many threads, so > 1 call outstanding, match up replies Handlers may be slow, so server often runs each in a thread RPC problem: what to do about failures? e.g. lost packet, broken network, slow server, crashed server What does a failure look like to the client RPC library? Client never sees a response from the server Client does *not* know if the server saw the request! Maybe server/net failed just before sending reply Simplest scheme: "at least once" behavior RPC library waits for response for a while If none arrives, re-send the request Do this a few times Still no response -- return an error to the application Q: is "at least once" easy for applications to cope with? Simple problem w/ at least once: client sends "deduct $10 from bank account" Q: what can go wrong with this client program? Put("k", 10) -- an RPC to set key's value in a DB server Put("k", 20) -- client then does a 2nd Put to same key [diagram, timeout, re-send, original arrives very late] Q: is at-least-once ever OK? yes: if it's OK to repeat operations, e.g. read-only op yes: if application has its own plan for coping w/ duplicates which you will need for Lab 1 Better RPC behavior: "at most once" idea: server RPC code detects duplicate requests returns previous reply instead of re-running handler Q: how to detect a duplicate request? client includes unique ID (XID) with each request uses same XID for re-send server: if seen[xid]: r = saved_results[xid] else r = handler() saved_results[xid] = r seen[xid] = true Server must eventually discard info about old RPCs (or saved_results hashmap grows without bound) when is discard safe? idea: unique client IDs per-client RPC sequence numbers client includes "seen all replies <= X" with every RPC much like TCP sequence #s and acks or only allow client one outstanding RPC at a time arrival of seq+1 allows server to discard all <= seq or client agrees to keep retrying for < 5 minutes server discards after 5+ minutes how to handle dup req while original is still executing? server doesn't know reply yet; don't want to run twice idea: "pending" flag per executing RPC; wait or ignore What if an at-most-once server crashes and re-starts? if at-most-once duplicate info in memory, server will forget and accept duplicate requests after re-start maybe it should write the duplicate info to disk? What about "exactly once"? Does not make sense to implement this at RPC layer. We can achieve this at the storage layer (labs 2 3) Go RPC is "at-most-once" open TCP connection write request to TCP connection TCP may retransmit, but server's TCP will filter out duplicates no retry in Go code (i.e. will NOT create 2nd TCP connection) Go RPC code returns an error if it doesn't get a reply perhaps after a timeout (from TCP) perhaps server didn't see request perhaps server processed request but server/net failed before reply came back Go RPC's at-most-once isn't enough for Lab 1 it only applies to a single RPC call if worker doesn't respond, the master re-send to it to another worker but original worker may have not failed, and is working on it too Go RPC can't detect this kind of duplicate No problem in lab 1, which handles at application level Lab 2 will explicitly detect duplicates Threads threads are a fundamental server structuring tool you'll use them a lot in the labs they can be tricky useful with RPC Go calls them goroutines; everyone else calls them threads Thread = "thread of control" threads allow one program to (logically) do many things at once the threads share memory each thread includes some per-thread state: program counter, registers, stack Threading challenges: sharing data two threads modify the same variable at same time? one thread reads data that another thread is changing? these problems are often called races need to protect invariants on shared data use Go sync.Mutex coordination between threads e.g. wait for all Map threads to finish use Go channels deadlock thread 1 is waiting for thread 2 thread 2 is waiting for thread 1 easy detectable (unlike races) lock granularity coarse-grained -> simple, but little concurrency/parallelism fine-grained -> more concurrency, more races and deadlocks let's look at a toy RPC package to illustrate these problems look at today's handout -- l-rpc.go it's a simplified RPC system illustrates threads, mutexes, channels it's a toy, though it does run assumes connection already open only supports an integer arg, integer reply omits error checks struct ToyClient client RPC state mutex per ToyClient connection to server (e.g. TCP socket) xid -- unique ID per call, to match reply to caller pending[] -- chan per thread waiting in Call() so client knows what to do with each arriving reply Call application calls reply := client.Call(procNum, arg) procNum indicates what function to run on server WriteRequest knows the format of an RPC msg basically just the arguments turned into bits in a packet Q: why the mutex in Call()? what does mu.Lock() do? Q: could we move "xid := tc.xid" outside the critical section? after all, we are not changing anything [diagram to illustrate] Q: do we need to WriteRequest inside the critical section? note: Go says you are responsible for preventing concurrent map ops that's one reason the update to pending is locked Listener runs as a background thread what is <- doing? not quite right that it may need to wait on chan for caller Back to Call()... Q: what if reply comes back very quickly? could Listener() see reply before pending[xid] entry exists? or before caller is waiting for channel? Q: should we put reply:=<-done inside the critical section? why is it OK outside? after all, two threads use it. Q: why mutex per ToyClient, rather than single mutex per whole RPC pkg? Server's Dispatcher() note that the Dispatcher echos the xid back to the client so that Listener knows which Call to wake up Q: why run the handler in a separate thread? Q: is it a problem that the dispatcher can reply out of order? main() note registering handler in handlers[] what will the program print? Q: when to use channels vs shared memory + locks? here is my opinion use channels when you want one thread to explicitly wait for another often wait for a result, or wait for the next request e.g. when client Call() waits for Listener() use shared memory and locks when the threads are not intentionally directly interacting, but just happen to r/w the same data e.g. when Call() uses tc.xid but: they are fundamentally equivalent; either can always be used. Go's "memory model" requires explicit synchronization to communicate! This code is not correct: var x int done := false go func() { x = f(...); done = true } while done == false { } it's very tempting to write, but the Go spec says it's undefined use a channel or sync.WaitGroup instead Study the Go tutorials on goroutines and channels use go's race detector 'go test -race'