Lecture 3: Primary/Backup Replication
Notes adapted from Robert Morris'

Fault tolerance
  we'd like a service that continues despite failures!
  available: still usable despite [some class of] failures
  correct: acts just like a single server to clients
  very hard!
  very useful!

Need a failure model: what will we try to cope with?
  Independent fail-stop computer failure
  Site-wide power failure (and eventual reboot)
  (Network partition)
  No common bugs, no malice

Core idea: replication
  *Two* servers (or more)
  Each replica keeps the state needed for the service
  If one replica fails, the others can continue

Example: fault-tolerant MapReduce master
  lab 1 workers are already fault-tolerant, but the master is not
    master is a "single point of failure"
  can we have two masters, in case one fails?
  [diagram: M1, M2, workers]
  state: worker list
         which jobs are done
         which workers are idle
         TCP connection state
         program counter

Big questions:
  What state to replicate?
  How to keep the replicas' state identical?
  When to cut over to the backup?
  Are anomalies visible at cut-over?
  How to repair / re-integrate?

Two main approaches:
  * State transfer
    "Primary" replica executes the service
    Primary periodically sends [new] state to the backups
    Draw on board:
      Primary              Backup
      op1 -> state'
      op2 -> state''
      op3 -> state'''
             ---------->
  * Replicated state machine
    All replicas execute all operations
    If same start state,
       same operations,
       same order,
       deterministic,
    then same end state
      Primary              Backup
      op1 ---------->

Choose a replication method for:
  * Replicate a virtual machine
    Asynchronous state transfer.
  * Replicate a key/value store
    Replicated state machine.
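The replicated-state-machine idea can be sketched in a few lines of Go. This is a minimal illustration, not the lab's code: the `Op`, `Replica`, and `Apply` names are invented, and forwarding is a plain function call standing in for an RPC. The point is only that two replicas that start empty and apply the same deterministic operations in the same order end in the same state.

```go
package main

import "fmt"

// Op is one deterministic operation on the key/value state machine.
type Op struct {
	Kind string // "Put" or "Append"
	Key  string
	Val  string
}

// Replica holds the state machine's state: a simple key/value map.
type Replica struct {
	data map[string]string
}

func NewReplica() *Replica {
	return &Replica{data: map[string]string{}}
}

// Apply executes one operation deterministically.
func (r *Replica) Apply(op Op) {
	switch op.Kind {
	case "Put":
		r.data[op.Key] = op.Val
	case "Append":
		r.data[op.Key] += op.Val
	}
}

func main() {
	primary, backup := NewReplica(), NewReplica()
	ops := []Op{
		{"Put", "x", "a"},
		{"Append", "x", "b"},
	}
	// The primary executes each op and "forwards" it to the backup;
	// in the lab the forwarding is an RPC, here it's a function call.
	for _, op := range ops {
		primary.Apply(op)
		backup.Apply(op)
	}
	fmt.Println(primary.data["x"], backup.data["x"]) // prints: ab ab
}
```

Note that the order matters: if the backup applied the Append before the Put, its state would diverge, which is why real replicated-state-machine protocols number operations.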
PRIMARY-BACKUP REPLICATION IN LAB 2

outline:
  simple key/value database
    Get(k), Put(k, v), Append(k, v)
  primary and backup
  replicate by having the primary send each operation to the backup
  tolerate network problems, including partition
    either keep going, correctly,
    or suspend operations until the network is repaired
  allow replacement of failed servers
  you implement essentially all of this

"view server" decides who primary (p) and backup (b) are
  main goal: avoid "split brain" -- disagreement about who the primary is
  clients and servers ask the view server
    they don't make independent decisions
  repair: view server can make an "idle" server the backup
    after the old backup becomes primary
    the primary initializes the new backup's state

Invariants:
  1. only one primary at a time!
  2. the primary must have the latest state!
  we will work out some rules to ensure these

view server maintains a sequence of "views"
  view #, primary, backup
  0:  --  --
  1:  S1  --
  2:  S1  S2
  3:  S2  --
  4:  S2  S3

view server monitors server liveness
  each server periodically sends a Ping RPC
  "dead" if it missed N Pings in a row
  "live" after a single Ping
  if more than two servers are Pinging, the extras are "idle" servers

view change rules:
  if the primary is dead,
    new view with the previous backup as primary
  if the backup is dead, or there is no backup,
    new view with a previously idle server as backup
  OK to have a view with just a primary, and no backup
    but -- if an idle server is available, make it the backup

Q: does the new primary always have an up-to-date replica of the state?
   only promote the previous backup
   i.e. don't make an idle server the primary

Q: can more than one server think it is primary?
   1: S1, S2
      net broken, so the view server thinks S1 is dead, but it's alive
   2: S2, --
   now S1 is alive and not aware of view #2, so S1 still thinks it is primary
   AND S2 is alive and thinks it is primary
   => split brain, no good

how to ensure only one server acts as primary?
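The view-change rules above can be sketched as a small transition function. This is an illustrative sketch, not the lab's actual view-server API: `View`, `nextView`, and the `live`/`idle` parameters are invented names. It captures the key decisions: a dead primary is replaced only by the previous backup (never an idle server, which might lack state), and a missing or dead backup is filled from the idle servers.

```go
package main

import "fmt"

// View names a primary and backup for a numbered configuration.
type View struct {
	Num     int
	Primary string
	Backup  string
}

// nextView applies the view-change rules given which servers are live:
//   - if the primary is dead, the previous backup becomes primary
//     (an idle server is never promoted directly to primary, so the
//     new primary always has up-to-date state)
//   - if the backup is dead or missing, a live idle server fills in
// It returns the current view unchanged if no change is needed.
func nextView(cur View, live map[string]bool, idle []string) View {
	v := cur
	changed := false
	if v.Primary != "" && !live[v.Primary] {
		v.Primary, v.Backup = v.Backup, ""
		changed = true
	}
	if v.Backup != "" && !live[v.Backup] {
		v.Backup = ""
		changed = true
	}
	if v.Backup == "" {
		for _, s := range idle {
			if live[s] && s != v.Primary {
				v.Backup = s
				changed = true
				break
			}
		}
	}
	if changed {
		v.Num = cur.Num + 1
	}
	return v
}

func main() {
	// View 2: S1 primary, S2 backup. S1 misses its Pings;
	// S3 is idle and live, so view 3 is S2 primary, S3 backup.
	v := View{Num: 2, Primary: "S1", Backup: "S2"}
	live := map[string]bool{"S2": true, "S3": true}
	v = nextView(v, live, []string{"S3"})
	fmt.Println(v.Num, v.Primary, v.Backup) // prints: 3 S2 S3
}
```

One detail the sketch omits: the real view server must not move to a new view until the current primary has acknowledged the current view, or the "latest state" invariant can be violated.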
  even though more than one may *think* it is primary
  "acts as" == executes and responds to client requests

the basic idea:
  1: S1 S2
  2: S2 --
  S1 still thinks it is primary
  but S1 must forward ops to S2
  S2 thinks S2 is primary
  so S2 must reject S1's forwarded ops

the rules:
  1. the primary in view i must have been the primary or backup in view i-1
  2. if you think you are primary, you must wait for the backup
     to accept each request
  3. if you think you are not the backup, reject forwarded requests
  4. if you think you are not the primary, reject direct client requests

so:
  before S2 hears about view #2
    S1 can process ops from clients, and S2 will accept forwarded requests
    S2 will reject ops from clients who have heard about view #2
  after S2 hears about view #2
    if S1 receives a client request, it will forward it, and S2 will reject it
      so S1 can no longer act as primary
    S1 will return an error to the client,
      the client will ask the view server for the new view,
      and the client will re-send to S2
  the true moment of switch-over occurs when S2 hears about view #2

how can a new backup get the state? e.g. all the keys and values
  if S2 is the backup in view i, but was not in view i-1,
    S2 should ask the primary to transfer the complete state

rule for state transfer:
  every operation (Put/Get/Append) must be either before or after the state transfer
  == state transfer must be atomic w.r.t. operations
  either the op is before, and the transferred state reflects the op,
  or the op is after, the transferred state doesn't reflect the op,
    and the primary forwards the op after the state

Q: does the primary need to forward Get()s to the backup?
   after all, Get() doesn't change anything, so why does the backup need to know?
   and the extra RPC costs time

Q: how could we make primary-only Get()s work?

Q: are there cases when the lab 2 protocol cannot make forward progress?
   The view service fails
   The primary fails before the new backup gets the state
   We will start fixing those in lab 3
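Rules 2-4 can be illustrated with a small sketch. This is not the lab's RPC interface: `Server`, `Forwarded`, and `Put` are invented names, and the forwarding RPC is a direct method call. It shows how a stale primary is fenced off: S1 still thinks it is primary, but rule 2 forces it to forward every op, and rule 3 lets S2 (which has heard about the new view) refuse, so S1 cannot complete any client request.

```go
package main

import (
	"errors"
	"fmt"
)

// View names a primary and backup for a numbered configuration.
type View struct {
	Num     int
	Primary string
	Backup  string
}

// Server tracks the view it currently believes in; me is its own name.
type Server struct {
	me   string
	view View
	data map[string]string
}

// Forwarded implements rule 3: a server that does not think it is
// the backup rejects forwarded operations.
func (s *Server) Forwarded(key, val string) error {
	if s.view.Backup != s.me {
		return errors.New("not backup: rejecting forwarded op")
	}
	s.data[key] = val
	return nil
}

// Put implements rules 2 and 4: only a server that thinks it is
// primary accepts client requests, and it must hear back from the
// backup before applying the op and replying to the client.
func (s *Server) Put(backup *Server, key, val string) error {
	if s.view.Primary != s.me {
		return errors.New("not primary: rejecting client op")
	}
	if s.view.Backup != "" {
		if err := backup.Forwarded(key, val); err != nil {
			// The backup refused: someone else is primary now,
			// so we must not act as primary either.
			return err
		}
	}
	s.data[key] = val
	return nil
}

func main() {
	// S1 is stuck in view 1 (S1 primary, S2 backup); S2 has learned
	// view 2, in which S2 is primary with no backup.
	s1 := &Server{me: "S1", view: View{1, "S1", "S2"}, data: map[string]string{}}
	s2 := &Server{me: "S2", view: View{2, "S2", ""}, data: map[string]string{}}
	// S2 rejects the forwarded op, so S1 cannot act as primary.
	fmt.Println(s1.Put(s2, "x", "1"))
}
```

The client sees S1's error, asks the view server for the current view, and retries at S2, which accepts the op directly; that is the switch-over described above.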