Now you've learned enough tools to implement distributed transactions.

In-class problem solving: design a transactional key-value store.

    Transaction t;
    t.begin_tx();
    v1 = t.get(key1);
    t.put(key2, v2);
    t.commit_tx();

Your transactions will run on top of the lab's extent service (we assume the extent service is shared across many machines). You will need to change the extent service implementation.

Half of the class is to design transactions supporting serializability; the other half is to design transactions supporting snapshot isolation. (A sketch of one possible serializable design appears at the end of these notes.)

The Percolator paper (OSDI '10)

What's the problem?
* Maintain Google's crawled document collection.
* It is not a simple collection of texts; transformation/computation is needed:
  - eliminate duplicates
  - invert hyperlinks
* The existing solution processes an entire doc collection (newly crawled + old ones) in a batch. Ideally, it should only need to process the new ones.

The challenge: maintain invariants across servers.

    insert_doc(doc) {
      content_table.put(doc.url, doc.contents);
      hash = md5sum(doc.contents);
      if (!(canonical_url = dups_table.get(hash))) {
        dups_table.put(hash, doc.url);
      } else if (canonical_url > doc.url) {
        dups_table.put(hash, doc.url);
      }
    }

What can go wrong in the above code?
* Failure example:
  - a doc is inserted into content_table, but the worker crashes before updating dups_table, so the doc never appears in the collection of canonical URLs.
* Concurrency example:
  - two concurrent inserts of nyt.com and nytimes.com with identical contents: both reads of dups_table miss, both write, and the final state depends on the interleaving.

Why do transactions help? Why do they need distributed transactions? (The tables are sharded across many tablet servers, so a single insert_doc touches multiple machines.)

Why does Percolator decide to support SI instead of serializability?
- SI avoids read locks, so transactions can read lots and lots of data cheaply.

Percolator is built on top of BigTable (a versioned, multi-column key-value store). How is their design different from one where the storage nodes themselves support two-phase commit?

Let's look at the pseudocode for Percolator in Figure 6.
* Prewrite is similar to the prepare phase in two-phase commit.
  - Why line 32? (check for write-write conflict)
  - Why line 34? (a concurrent two-phase commit may be in flight on overlapping data)
  (A simplified model of these two checks appears at the end of these notes.)
* Commit()
  - Can I move line 48 to after line 43?
  - Ignore line 53 for now.

How does Percolator handle failures?
* Data is replicated by BigTable.
* But the transaction coordinator (the Percolator worker) can fail. Consequence: locks held during two-phase commit are never cleaned up.

The idea: if other workers notice that a lock has been held for a long time, they clean it up.

Challenge: what if B thinks A is dead, but A is not dead and is still in the process of committing?

Percolator's solution:
- Every transaction designates one of its locks as the primary lock.
- All other locks in the transaction point to its primary lock.
- Commit replaces the primary lock with a write record.
- Cleanup erases the primary lock if it's still there.
- Access to the primary lock is synchronized by the underlying BigTable single-row transactions, so a racing commit and cleanup cannot both succeed.
(A sketch of this cleanup decision appears at the end of these notes.)
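
For the in-class problem, here is a minimal sketch of one way to get serializability: optimistic concurrency control with buffered writes and commit-time validation. Everything below is an assumption for illustration: the in-memory Store stands in for the lab's extent service, and the single commit_mu mutex stands in for whatever commit protocol you would actually run across machines.

    #include <cstdint>
    #include <map>
    #include <mutex>
    #include <optional>
    #include <string>

    // Hypothetical stand-in for the shared extent service. Each key keeps a
    // version counter so commit-time validation can detect conflicts.
    struct Store {
        struct Cell { std::string value; uint64_t version = 0; };
        std::map<std::string, Cell> cells;
        std::mutex commit_mu;  // stands in for a distributed commit protocol
    };

    class Transaction {
    public:
        explicit Transaction(Store* s) : store_(s) {}

        void begin_tx() { reads_.clear(); writes_.clear(); }

        std::optional<std::string> get(const std::string& key) {
            if (auto w = writes_.find(key); w != writes_.end())
                return w->second;                  // read your own writes
            std::lock_guard<std::mutex> g(store_->commit_mu);
            auto it = store_->cells.find(key);
            // Record the version observed (0 = key absent) so that a
            // concurrent create/update is caught at validation time.
            reads_[key] = (it == store_->cells.end()) ? 0 : it->second.version;
            if (it == store_->cells.end()) return std::nullopt;
            return it->second.value;
        }

        void put(const std::string& key, const std::string& value) {
            writes_[key] = value;                  // buffer until commit
        }

        // Validate that nothing we read has changed, then install all
        // buffered writes atomically. Returns false on conflict (abort).
        bool commit_tx() {
            std::lock_guard<std::mutex> g(store_->commit_mu);
            for (auto& [key, ver] : reads_) {
                auto it = store_->cells.find(key);
                uint64_t cur =
                    (it == store_->cells.end()) ? 0 : it->second.version;
                if (cur != ver) return false;      // a conflicting commit won
            }
            for (auto& [key, value] : writes_) {
                auto& cell = store_->cells[key];
                cell.value = value;
                ++cell.version;
            }
            return true;
        }

    private:
        Store* store_;
        std::map<std::string, uint64_t> reads_;     // key -> version observed
        std::map<std::string, std::string> writes_; // buffered puts
    };

Usage matches the problem statement (begin_tx, get, put, commit_tx); the caller retries the whole transaction whenever commit_tx() returns false.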
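
The two prewrite questions above (lines 32 and 34 of Figure 6) are easier to see in a simplified, single-machine model of one Percolator cell. This is a paraphrase for discussion, not the paper's code: in real Percolator, data, lock, and write are BigTable columns versioned by timestamp, and the whole function runs inside a BigTable single-row transaction.

    #include <cstdint>
    #include <map>
    #include <string>

    // One cell's three columns, each versioned by timestamp.
    struct Cell {
        std::map<uint64_t, std::string> data;   // start_ts -> staged value
        std::map<uint64_t, std::string> lock;   // start_ts -> primary's location
        std::map<uint64_t, uint64_t>    write;  // commit_ts -> start_ts made visible
    };

    // Simplified prewrite of one cell for a transaction with timestamp
    // start_ts. Mirrors the two aborts in Figure 6.
    bool prewrite(Cell& c, uint64_t start_ts, const std::string& value,
                  const std::string& primary) {
        // "Line 32": some transaction committed a write to this cell after
        // our start timestamp: a write-write conflict under SI, so abort.
        if (!c.write.empty() && c.write.rbegin()->first >= start_ts)
            return false;
        // "Line 34": a lock at any timestamp means another transaction's
        // two-phase commit is in flight on overlapping data, so abort.
        if (!c.lock.empty())
            return false;
        c.data[start_ts] = value;    // stage the value at our start timestamp
        c.lock[start_ts] = primary;  // every lock points at the primary lock
        return true;
    }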
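
Finally, a sketch of the cleanup decision, reusing the Cell model from the previous sketch. The names and structure are assumptions for illustration; the point from the notes is that the primary lock's state decides, atomically, whether a stale transaction is rolled forward or rolled back.

    enum class Resolution { RolledForward, RolledBack };

    // A worker found a stale lock at start_ts on cell 'stale', pointing at
    // cell 'primary'. In real Percolator each step that touches the primary
    // runs as a BigTable single-row transaction, so a racing commit and a
    // racing cleanup cannot both succeed.
    Resolution resolve_stale_lock(Cell& primary, Cell& stale, uint64_t start_ts) {
        if (primary.lock.count(start_ts)) {
            // Primary lock still present: the transaction never reached its
            // commit point. Erase the primary lock, dooming the transaction.
            primary.lock.erase(start_ts);
            stale.lock.erase(start_ts);
            stale.data.erase(start_ts);
            return Resolution::RolledBack;
        }
        // Primary lock gone. If it was replaced by a write record, the
        // transaction committed; roll the stale cell forward.
        for (auto& [commit_ts, data_ts] : primary.write) {
            if (data_ts == start_ts) {
                stale.write[commit_ts] = start_ts;
                stale.lock.erase(start_ts);
                return Resolution::RolledForward;
            }
        }
        // Otherwise the primary was already rolled back; clean up locally.
        stale.lock.erase(start_ts);
        stale.data.erase(start_ts);
        return Resolution::RolledBack;
    }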