Transactions (Single machine, local transactions) Transactions groups a set of read/write operations together begin_tx v <-- read(checking) v' <-- read(saving) write(checking, v-10) write(saving,v'+10) commit_tx Transactions Properties: ACID Atomicity, Consistency, Isolation, Durability Two goals: 1. handle failure (A, D). A machine crash and later re-starts. 2. handle concurrency (I) First goal first. ----- Failure handling --------------- begin_tx //buffer writes (No-Steal policy) end_tx commit_tx starts the commit protocol: write(checking, 90) to disk write(saving, 110) to disk In example, what happens when crash happens after first write, before second write. ---> after recovery, partial writes, money is lost Ideal property: all-or-nothing atomicity How to implement all-or-nothing atomicity: Write-ahead Logging One type of logging: redo logging append_log_entry(checking=90,saving=110) write(checking, 90) <------ does not have to immediately flush to disk write(saving, 110) <-------does not have to immediately flush to disk how to ensure each log entry can be written atomically? (each log entry is checksummed) how to recover from logs? replay from the beginning of time recovery takes too log periodic checkpoint how to checkpoint? naive way: stop all transactions, flush all storage state to disk, write a checkpoint entry to log 2nd approach: flush all storage state to disk, write a checkpoint entry to log Question: is 2nd approach correct? At recovery time, persistent state might contain partial writes from committed transactions. ---> some writes may be applied more than once --> okay? it's correct if log entries contain the after-image. not correct if log entries contain the operations. e.g. append_log_write(transfer 10 from checking to saving) Note that writes *during* a transaction cannot be flushed to disk otherwise, redo-logging + checkpoint would not be incorrect Second goal -------- Concurrency control -------------- T1: transfer $10 from checking to saving T2: withdaw $20 from checking T3: read the sum of checking, saving Bad interleaving T1: Read(C)=100, Read(S) Write(C, 90), Write(S, 110) T2: Read(C)=100 Write(C, 80) What's the expected outcome? Isolation: T1 happens either before or after T2 (sometimes called before-after atomicity) Ideal isolation semantics: serializability Definition: execution of a set of transactions is equivalent to some serial order - Two executions are *equivalent* if they have the same effect on database and produce the same output - typically, people mean strict serializability, i.e. the equivalent serial order cannot re-order commit-to-begin_tx ordering. Conflict serializability - An execution schedule is an ordering of read/write/commit/abort operations (executed by the local system) e.g. R_1(C), R_2(C), R_1(S), W_2(C), Commit2, W_1(C), W_1(S), Commit1 Two schedules are equivalent if they: * contain the same operations * order conflicting operations the same way: A pair of operations conflict if they access the same data item and one is a write. Examples: Serializable? R_1(C), R_1(S), R_2(C), W_2(C), Commit2, W_1(C), W_1(S), Commit1 No Serialiazable? R_1(C), R_1(S), R_3(C), R_3(S), Commit3, W_1(C), W_1(S), Commit1 equivalent to R_3(C), R_3(S), Commit3, R_1(C), R_1(S), W_1(C), W_1(S), Commit1 Serialiazable? R_1(C), R_1(S), W_1(C), R_3(C), R_3(S), Commit3, W_1(S), Commit1 no. How to check if a schedule is serializable? Check if Serialization graph is cycle-free How to ensure a serializable schedule? Locking-based approach Strawman solution 1: * Grab global lock when transaction starts * release global lock when transaction finishes committing Strawman solution 2: - grab lock on item X before read/write X - release lock on item X after read/write X Problem with strawman 2? Permits this non-serializable interleaving R_1(C), R_1(S), W_1(C), R_3(C), R_3(S), Commit3, W_1(S), Commit1 Look at W_1(C), if lock on C is released, then another transaction T' can read new value of C, but T' must be able to read new value of S that T has not yet written yet. If write lock is not held, ---> read of uncommitted value So, write lock must be held till the transaction commits How about read locks, must it also be held till transaction commits? R_3(C), R_1(C), R_1(S), W_1(S), W_1(S), Commit1, R_3(S), Commit3 This is a non-serializable schedule R_3(C) reads old value R_3(S) reads new value (non-repeatable reads) So, read locks must also be held till commit time 2-phase-locking (2PL) - a growing phase in which transaction is acquiring locks - a shrinking phase in which locks are released in practice growing phase is the entire transaction shrinking phase is during commit Optimization: use read/write locks instead of exclusive locks 2PL example: T1: Begin_tx R_lock(C) v<-R(C) R_Lock(S) v'<-R(S) write_lock(C) write v-10 to C (in T1's write-buffer) write_lock(S) write v'+10 to S (in T1's write-buffer) Commit_tx write (C=90) write (S=110) unlock(C) unlock(S) T3: begin_tx read_lock(C) v = read(C) read_lock(S) v = read(S) commit_tx unlock(C) unlock(S) Can R_1(C), R_1(S), W_1(C), R_3(C), R_3(S), Commit3, W_1(S), Commit1 ocurr? Proof sketch of 2PL: suppose exists some non-serializable schedule in this schedule, there must exist a pair of transactions such as T -> T' (according to conflicting access on item a) T' -> ... -> T (according to conflicting access on item b) 1) T releases lock of a 2) T' grabs lock of a 3) T' releases lock of b 4) T grabs lock of b according to non-serializable schedule 1 -> 2 3 -> 4 according to rules of 2PL 4 -> 1 2 -> 3 so, 4->1->2 2->3->4 a contradiction More on 2PL: * what is a lock is unavailable? * deadlock possible? * how to cope with deadlocks? - grab locks in order? not always possible - (central) system detects deadlock cycles and aborts involved transactions Other concurrency control methods: Optimistic concurrency control T1: Begin_tx v<-R(C) # remember read version c#=1 v'<-R(S) # remember read version s#=1 write v-10 to C (in T1's write-buffer) write v'+10 to S (in T1's write-buffer) Commit_tx Lock(C) Lock(S) validate C# still 1 validate S# still 1 write (C=90), version 2 write (S=110), version 2 unlock(C) unlock(S) *OCC has no deadlock problem The phantom problem: database has fancier ops than k/v store T1: begin_tx update employee (set salary=1.1*salary) where dept = "CS" commit_tx T2: begin_tx insert into employee ("carol", "CS") insert into employee ("mike", "CS") If only locking individual items T1: update Bob T1: update Herbert T2: inserted carol T2: inserted mike T1: update larry T1: update mike T1: ... Problem: cannot just lock items touched, but must also guarantee the non-existence of new ones! Solution: * predicate lock (lock the predicate statement "depart=cs") * range locking (lock the range of B-tree index on the department) * often ignored in practice Degree of isolation paper * degree 0 (read uncommitted) short locks * degree 1 (read committed) short read locks, long write locks * degree 2 (repeatable read) long read/write locks. (serializable unless you squint, default mysql isolation level) * degree 3 (serializable) long read/write locks + solving the phantom problem Why offer less than serializability? Performance (allow more concurrent operations. e.g. running a big analytics jobs on the database) Disadvantage of locking-based approach? * need to detect deadlocks * big read-only transactions are performance killers Multi-version concurrency control * Each data item is associated with multiple versions (not just version #) * multi-version transactions: - reads choose the appropriate version - at commit time, system validates if okay to make writes visible (by generating new versions) Snapshot isolation A transaction * reads a "snapshot" of database image * can commit only if there are no write-write conflict -----------------------------------------------------------> T1: begin_tx, Read(C), Write(C) begin_tx : T is assigned a start timestamp T.sts Read: T reads the biggest version i such that i < = T.sts Write: buffers writes, T.wset += {C} commit_tx: T is assigned a commit timestamp T.cts checks for all T' such that T'.cts \in [T.sts, T.cts] whether T'.wset and T.wset overlaps write item with version T.cts Recall our earlier bad interleaving: T1: Read(C) Read(S) Write(C, 90) Write(S, 110) T3: Read(C) Read(S) T3 reads old value of C and S (due to T3.sts < T1.cts) Does snapshot isolation implement serializability? No The write-skew problem R_1(A), R_2(B), W_1(B), W_2(A), C_1(A), C_2(B) non-serializable interleaving but, possible under SI