Overview

Please print out your answers and hand them in during the class. Thanks!

Lecture 2

Q1: [C#threads]

It is important to pick the right lock granularity. If the lock is too "big", there won't be enough concurrency. If the lock is too "small", the result might not be correct (and might also introduce deadlocks). Explain the choice of lock granularity in Lab 1 for lock_server and argue why it is correct. Give an example of a fine-grained locking scheme that is incorrect for lock_server.
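As a concrete point of reference, a coarse-grained design can be sketched as a toy (illustrative Python, not the actual C++ lab code; class and method names are made up): a single mutex guards the entire lock table, and one condition variable handles all waiting.

```python
import threading

# Toy coarse-grained lock server: one "big" mutex protects the whole
# table of named locks, and one condition variable parks all waiters.
class LockServer:
    def __init__(self):
        self.mu = threading.Lock()            # guards all server state
        self.cv = threading.Condition(self.mu)
        self.held = {}                        # lock name -> currently held?

    def acquire(self, name):
        with self.mu:
            while self.held.get(name, False):
                self.cv.wait()                # sleep until some release
            self.held[name] = True

    def release(self, name):
        with self.mu:
            self.held[name] = False
            self.cv.notify_all()              # wake everyone; each re-checks
```

A finer-grained variant would protect each named lock with its own mutex; part of the exercise is deciding whether such a scheme can remain correct.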

Q2: [RPC]

The RPC package described in the paper provides at-most-once semantics (see page 49, last paragraph). What kind of semantics do you think the yfs RPC implementation has before you change it by completing Lab 1? How does this difference affect the implementation of your lock server for Lab 1?
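The mechanism behind at-most-once semantics can be pictured with a toy server-side dispatch (hypothetical Python, not the yfs code): the server remembers request ids and replays cached replies instead of re-executing handlers.

```python
# Toy at-most-once RPC dispatch: the server remembers each request id
# (xid) and replays the cached reply on a retransmission instead of
# re-executing the handler.
class AtMostOnceServer:
    def __init__(self, handler):
        self.handler = handler
        self.replies = {}               # xid -> cached reply

    def dispatch(self, xid, arg):
        if xid in self.replies:
            return self.replies[xid]    # duplicate: replay, don't re-run
        reply = self.handler(arg)
        self.replies[xid] = reply
        return reply
```

An at-least-once implementation would skip the table and simply re-execute the handler on every retransmission, which matters for handlers that are not idempotent.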

Lecture 3

Q1: [Sequential Consistency by Lamport]

The definition of sequential consistency says the overall execution happens as if following a total order of READ/WRITE operations such that: 1) the operations of each processor appear in the total order in the order specified by its program; and 2) each READ returns the value written by the most recent preceding WRITE to the same location in the total order.

Please argue why a system satisfying requirements R1 and R2 (defined in the paper) ensures sequential consistency: In other words, find a total order for a system satisfying R1 and R2 and argue that the total order you found ensures the properties listed above. Is the total order you find unique?
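For small finished executions, the two conditions above can be checked by brute force; a toy Python sketch (my own helper, not from the paper) that searches for a satisfying total order:

```python
from itertools import permutations

# Brute-force sequential-consistency check for tiny executions.
# histories[p] is process p's list of ("R", var, val) / ("W", var, val)
# operations; all locations start at 0. We search every total order that
# preserves each process's program order for one in which every READ
# returns the latest WRITE to the same location.
def sequentially_consistent(histories):
    ops = [(p, i) for p, hist in enumerate(histories)
                  for i in range(len(hist))]
    for perm in permutations(ops):
        seen = [0] * len(histories)   # next expected index per process
        mem = {}
        ok = True
        for p, i in perm:
            if i != seen[p]:          # violates program order
                ok = False
                break
            seen[p] += 1
            kind, var, val = histories[p][i]
            if kind == "W":
                mem[var] = val
            elif mem.get(var, 0) != val:   # READ saw a stale value
                ok = False
                break
        if ok:
            return True
    return False
```

The search is only feasible for executions this small; it is meant solely to make the two conditions concrete.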

Q2:[Li:DSM]

Here's a strawman implementation of a distributed shared memory system: there are three nodes (N1, N2, N3), each holding a full copy of all of the memory. A read request is satisfied by reading from the local copy of the memory (i.e. if a process executes on N1, its reads are served from N1's local copy). A write is forwarded to all other nodes, and the process issuing the write is allowed to continue without waiting for the write to finish at the remote nodes.

Does this DSM implementation satisfy sequential consistency? Give concrete examples.
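The forwarding behavior can be replayed as a deterministic toy (node names from the question; the delivery order shown is just one of many possible):

```python
# Replay one possible execution of the strawman DSM: each node applies
# its own write to x immediately, and forwarded writes from other nodes
# may arrive in different orders at different nodes.
def simulate():
    mem = {"N1": 0, "N2": 0, "N3": 0}   # each node's local copy of x
    mem["N1"] = 1                       # N1 issues x=1, applies locally
    mem["N2"] = 2                       # N2 issues x=2, applies locally
    # Forwarded writes arrive later, in per-node orders:
    mem["N1"] = 2                       # N2's write reaches N1
    mem["N2"] = 1                       # N1's write reaches N2
    mem["N3"] = 1                       # N1's write reaches N3 first...
    mem["N3"] = 2                       # ...then N2's write
    return mem["N1"], mem["N2"], mem["N3"]
```

In this replay N1 and N2 end up disagreeing on x even after every message is delivered; whether executions like this can be put in a single total order is what the question asks you to argue.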

Lecture 4

Q1:[Cox:Tra]

Why does Tra have two vector times (a modification time and a synchronization time)?

Q2: [Bayou] Suppose you want to implement a Calendar application on top of Bayou that supports three simple operations: "ADD_EVENT", "DELETE_EVENT", and "READ_EVENT". Each operation takes a timeslot as its argument and adds/deletes/reads an event. For fault-tolerance reasons, a user can access the Calendar application from any Bayou server. Suppose each user is only allowed to access his own private Calendar; what kind of "non-intuitive" or "anomalous" results might your Calendar user experience? Suppose two users share a calendar; what kind of "non-intuitive" results might they experience?

Lecture 5

Q1: [SysR] What property must the system checkpoint operation guarantee? Atomicity? Consistency? How does System R guarantee it?

Q2: [Cedar] Why doesn't FSD log data writes?

Lecture 6

Q1: [Book 9.6]
Beyond what point in the execution of the distributed 2-phase commit protocol is the transaction considered committed (no matter what happens later)?

Q2: [Book 9.6]

Why must the coordinator in two-phase commit log the outcome of its decision before sending out commit messages?
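A skeleton of the coordinator's control flow (toy Python; Participant and the log list are made-up stand-ins for real stable storage and messaging) shows where the decision record is forced to the log relative to the phase-2 messages:

```python
# Toy two-phase-commit coordinator. The question concerns the ordering
# of the log.append(decision) step and the phase-2 messages.
class Participant:
    def __init__(self, vote_yes=True):
        self.vote_yes = vote_yes
        self.outcome = None

    def prepare(self):                  # phase 1: cast a vote
        return self.vote_yes

    def finish(self, decision):         # phase 2: learn the outcome
        self.outcome = decision

def coordinator(participants, log):
    votes = [p.prepare() for p in participants]
    decision = "COMMIT" if all(votes) else "ABORT"
    log.append(decision)                # force decision to stable storage
    for p in participants:              # only then tell the participants
        p.finish(decision)
    return decision
```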

Lecture 8

Q1: [Hypervisor]
Suppose the primary fails due to a power failure, the backup takes over, and then power to the primary is restored. Can the recovered primary simply take over its previous role as the primary and resume execution? Why or why not?

Q2: [Paxos]
Is Paxos guaranteed to terminate with a consensus if a majority of nodes are alive? Why or why not?

Lecture 9

Q1: [MapReduce] In MapReduce, each Mapper saves intermediate key/value pairs in R partitions on its local disk. Contrast the pros and cons of this approach with the alternative of having Mappers directly send intermediate results to R reducers, which shuffle and save the intermediate results on the reducers' local disks before feeding them to the user-defined reduce function.
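The mapper-side layout in question can be sketched as a toy (illustrative Python; the hash choice is made up): every intermediate key is hashed into one of R buckets, so each mapper ends up with R local partitions, and reducer r later pulls partition r from every mapper.

```python
import hashlib

def partition(key, R):
    # Deterministic hash partitioning of an intermediate key into [0, R).
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return h % R

def map_output(pairs, R):
    # Bucket one mapper's intermediate key/value pairs into R partitions,
    # as each mapper would before writing them to its local disk.
    parts = [[] for _ in range(R)]
    for k, v in pairs:
        parts[partition(k, R)].append((k, v))
    return parts
```

All pairs sharing a key land in the same partition, so a single reducer sees every value for that key.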

Lecture 10

Q1: [Dryad] Why does Dryad only allow acyclic graphs for describing applications' communication patterns, i.e. why not cycles?

Q2: [Dryad] Describe a MapReduce program (or a series of MR programs) that solves the example SQL query in Section 2.1 of [Dryad]. If you assume input file formats different from the paper's ugriz.bin and neighbors.bin, explain them.

Lecture 11

Q1: [PBFT] If faulty nodes produce random results as opposed to being actively malicious, what is the number of replicas needed to handle f bad nodes? Why?

Lecture 12

Q1: [SUNDR]
In the simple straw-man, both fetch and modify operations are placed in the log and signed. Now consider an alternate design that signs and logs only modify operations. Does this allow a malicious server to break fetch-modify consistency or fork consistency? Why or why not? (Question courtesy of Frans Kaashoek.)

Lecture 13

Q1: [LBFS]
LBFS decides breakpoints based on the fingerprint of a 48-byte region. Why not just determine the breakpoints based on the actual content of the 48-byte region?
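The mechanism the question refers to can be sketched with a toy rolling window (illustrative Python; a naive polynomial hash stands in for LBFS's Rabin fingerprints, and the mask is shortened so the toy produces frequent breakpoints):

```python
WINDOW = 48
MASK = (1 << 3) - 1   # toy: expect a breakpoint roughly every 2^3 positions

def breakpoints(data):
    # Declare a breakpoint after position i whenever the hash of the
    # preceding 48-byte window matches the mask pattern. The toy
    # recomputes each window hash from scratch; real Rabin fingerprints
    # update incrementally as the window slides.
    cuts = []
    for i in range(WINDOW, len(data) + 1):
        h = 0
        for b in data[i - WINDOW:i]:
            h = (h * 31 + b) & 0xFFFFFFFF
        if h & MASK == MASK:
            cuts.append(i)
    return cuts
```

Note that a breakpoint at position i depends only on data[i-48:i], so inserting bytes earlier in a file shifts later breakpoints without otherwise changing them.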

Lecture 14

Question on Dynamo and BigTable (Courtesy of Alberto Lerner).
Dynamo and Bigtable support "incremental scalability", i.e. accepting a new server into an existing group and making it responsible for handling a portion of the requests. Questions: 1) How does each of these systems accept a new server into the group? 2) What amount and which portion of the data is assigned to it? 3) How does the assignment process work? During this process, what happens if a client accesses that data?
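For part 2), Dynamo-style partitioning can be pictured with a toy consistent-hashing ring (illustrative Python; server names and the hash are made up): a new server takes over only the arc of the ring that now ends at its position.

```python
import hashlib

def h(s):
    # Position of a key or server name on a 2^32-point hash ring.
    return int(hashlib.md5(s.encode()).hexdigest(), 16) % 2**32

def owner(key, servers):
    # A key is owned by the first server at or after it on the ring,
    # wrapping around past the top.
    pos = h(key)
    ring = sorted(servers, key=h)
    for s in ring:
        if pos <= h(s):
            return s
    return ring[0]
```

With this scheme, adding a server changes the owner only for keys that now land on the new server, which is one way to make the reassignment of data incremental.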