Distributed Systems Schedule

Overview

Please submit your answer for each day's paper question before the class on Piazza.

Lecture 2

#q1 [C#threads]

Will your lock server for lab 1 use mutexes? Condition variables? Why or why not? Give a specific example in your answer.

#q2 [RPC]

The RPC package described in the paper provides at-most-once semantics (see page 49, last paragraph). What kind of semantics do you think the yfs RPC implementation has before you change it by completing lab1? How does this difference affect the implementation of your lock server for lab 1?

Lecture 3

#q3 [Sequential Consistency by Lamport]

The definition of sequential consistency says the overall execution happens as if following a total order of READ/WRITE operations such that:

all CPUs (or processes/threads) see results consistent with the total order.
each CPU's (or process/thread's) instructions appear in-order in the total order.

Please argue why a system satisfying requirements R1 and R2 (defined in the paper) ensures sequential consistency: In other words, find a total order for a system satisfying R1 and R2 and argue that the total order you found ensures the properties listed above. Is the total order you find unique?

#q4 [Li:DSM]

Here's a strawman implementation of a distributed shared memory system: there are three nodes (N1,N2,N2) each having a full copy of all of the memory. A read request is satisifed by reading from the local copy of the memory (i.e. if a process executes on N1, then its reads are obtained from N1's local copy of the memory). A write is forwarded to all other nodes and the process issuing the write is allowed to continue without waiting for writes to finish at remote nodes.

Does this DSM implementation satisfy sequential consistency? Give concrete examples.

Lecture 4

#q5 [Bayou] Suppose you want to implement a Calendar application on top of Bayou that supports three simple operations "ADD_EVENT", "DELETE_EVENT", "READ_EVENT". Each of the operations takes a timeslot as its argument and adds/deletes/reads/ an event. For fault tolerance reasons, a user can access the Calendar application from any Bayou server. Suppose each user is only allowed to access his own private Calendar, what kind of "non-intuitive" or "anomalous" results your Calendar user might experience? Suppose two users share a calendar, what kind of "non-intuitive" results they might experience?

#q6 [COPS] Suppose an application client at data center D1 writes object x with version 2 (x_2) and then object y with version 3 (y_3). Suppose y_3 has propagated from data center D1 to data center D2 but x_2 has not yet arrived at D2. Suppose another application client data center D2 has just read Y_3, is it possible that it might read x_1 next? (If not, why not?) Will the client be blocked waiting for x_2 to arrive from D1? (If not, why not?)

Lecture 5

#q7 [SysR] What property must the system checkpoint operation guarantee? Atomicity? Consistency? How does System R guarantee it?

Lecture 6

#q10 [Snapshot]
Snapshot isolation (SI) differs from serilizatiability due to one anomaly that is possible under SI but not under serilizatiability. Describe the anomality and also give a concrete application for which the anomaly is undesirable.

Lecture 7

Lecture 8

#q13 [Paxos]
Is Paxos guranteed to terminate with a consensus if there are a majority of live nodes available? Why and why not?

#q14 [Spanner]
If Spanner adheres to 2PL for its read-only transactions, would it still need TrueTime for ensure serializability (or external consistency)? Why and why not?

Lecture 9

#q15 [MapReduce] In MapReduce, each Mapper saves intermediate key/value pairs in R partitions on its local disk. Contrast the pros and cons of this approach to the alternative of having Mappers directly send intermediate results in R reducers that shuffle and save intermediate results on reducers' local disk before feeding them to the user-defined reduce function.

#q16 [Dryad] Why does Dryad only allow acyclic graphs for describing the applications' communication pattern? i.e. why not cycles?

Lecture 10

#q17 [Treadmarks] Suppose that a simplified version of Treadmarks, called Dreadmarks, simply sent all modifications of variables between an acquire and a release to the next processor to acquire the same lock. No other modifications are sent. What changes does Treadmarks send that Dreadmarks does not? Outline a specific simple situation in which Treadmarks would provide more useful or intuitive memory behavior than Dreadmarks.

#q18 [Piccolo] Imagine a simplified version of Piccolo's table checkpointing mechanism where the master simply instructs all workers to save a copy of the current table partitions that they own. Explain why this checkpointing mechanism is unsatisfactory for the distributed crawler application.

Lecture 11

#q19 [PBFT] If faulty nodes produce random results as opposed to being actively malicious, what is the number of replicas needed to handle f bad nodes? Why?