Concurrency lecture II

Multi-threaded programs:
- achieve speedup (utilize multiple CPU cores; overlap CPU processing with I/O)

Challenges:
- the executions of different threads interleave in arbitrary ways (many interleavings lead to incorrect results)
- we need to synchronize among threads to control which interleavings are allowed.

Synchronization primitives:

* Mutual exclusion (locking)

pthread_mutex_init
pthread_mutex_lock
pthread_mutex_unlock

Locks protect a critical section (only one thread is allowed in a CS at a time):

    precondition: invariant holds
    pthread_mutex_lock
        violate invariants
        restore invariants
    pthread_mutex_unlock
    postcondition: invariant holds

In practice, you (mentally) associate a lock with a (set of) shared variables.

* Lock granularity

typedef struct {
    char *name;
    int val;
} account;

account *accounts[100];

void transfer(int x, int y, int amount) {
    accounts[x]->val -= amount;
    accounts[y]->val += amount;
}

If there is no synchronization, what happens?

T1: read account x = 100
T2: read account x = 100
T1: write account x = 90
T2: write account x = 90
T1: read account y = 100
T1: increment account y = 110
T2: read account y = 110
T2: increment account y = 120

Lack of proper synchronization has resulted in many disasters. Recently, Flexcoin and Poloniex, two bitcoin exchanges, were shut down because of a synchronization bug. In their own words:

    The attacker successfully exploited a flaw in the code which allows transfers between flexcoin users. By sending thousands of simultaneous requests, the attacker was able to "move" coins from one user account to another until the sending account was overdrawn, before balances were updated.

The easy solution:

pthread_mutex_t m;   // initialized elsewhere with pthread_mutex_init(&m, NULL)

void transfer(int x, int y, int amount) {
    pthread_mutex_lock(&m);
    accounts[x]->val -= amount;
    accounts[y]->val += amount;
    pthread_mutex_unlock(&m);
}

The lock is associated with the entire array of accounts ---> coarse-grained, no concurrency.

Fine-grained locking:

typedef struct {
    char *name;
    int val;
    pthread_mutex_t m;
} account;

void transfer(int x, int y, int amount) {
    pthread_mutex_lock(&accounts[x]->m);
    pthread_mutex_lock(&accounts[y]->m);
    accounts[x]->val -= amount;
    accounts[y]->val += amount;
    pthread_mutex_unlock(&accounts[x]->m);
    pthread_mutex_unlock(&accounts[y]->m);
}

Problem: deadlock

T1: transfer(X, Y, 10)
T2: transfer(Y, X, 20)

T1: acquired X's lock
T2: acquired Y's lock
T1: blocked waiting for Y's lock to be released
T2: blocked waiting for X's lock to be released

Neither can make progress!

Solution: ensure locks are always acquired in the same global order, e.g. in order of account number:

void transfer(int x, int y, int amount) {
    if (x < y) {
        pthread_mutex_lock(&accounts[x]->m);
        pthread_mutex_lock(&accounts[y]->m);
    } else {
        pthread_mutex_lock(&accounts[y]->m);
        pthread_mutex_lock(&accounts[x]->m);
    }
    accounts[x]->val -= amount;
    accounts[y]->val += amount;
    pthread_mutex_unlock(&accounts[x]->m);
    pthread_mutex_unlock(&accounts[y]->m);
}

In general, deadlocks are much easier to debug than races: the program simply halts at the moment the deadlock happens.
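To see the ordering rule at work, here is a minimal driver (a sketch, not part of the lecture: the worker functions, iteration counts, and main are assumptions added for illustration). Two threads transfer in opposite directions at the same time; with the ordered-locking transfer above the program always finishes and the total balance is conserved, while the unordered version can deadlock.

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct {
    char *name;
    int val;
    pthread_mutex_t m;
} account;

account *accounts[100];

/* ordered-locking transfer from above */
void transfer(int x, int y, int amount) {
    if (x < y) {
        pthread_mutex_lock(&accounts[x]->m);
        pthread_mutex_lock(&accounts[y]->m);
    } else {
        pthread_mutex_lock(&accounts[y]->m);
        pthread_mutex_lock(&accounts[x]->m);
    }
    accounts[x]->val -= amount;
    accounts[y]->val += amount;
    pthread_mutex_unlock(&accounts[x]->m);
    pthread_mutex_unlock(&accounts[y]->m);
}

void *worker_xy(void *arg) {
    (void)arg;
    for (int i = 0; i < 100000; i++)
        transfer(0, 1, 1);      // many X -> Y transfers
    return NULL;
}

void *worker_yx(void *arg) {
    (void)arg;
    for (int i = 0; i < 100000; i++)
        transfer(1, 0, 1);      // many Y -> X transfers, concurrently
    return NULL;
}

int main(void) {
    for (int i = 0; i < 2; i++) {
        accounts[i] = malloc(sizeof(account));
        accounts[i]->name = NULL;
        accounts[i]->val = 100;
        pthread_mutex_init(&accounts[i]->m, NULL);
    }
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker_xy, NULL);
    pthread_create(&t2, NULL, worker_yx, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    /* total balance is conserved: still 200 */
    printf("x = %d, y = %d, total = %d\n",
           accounts[0]->val, accounts[1]->val,
           accounts[0]->val + accounts[1]->val);
    return 0;
}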
* Scheduling shared resources

Locking is a simple kind of resource scheduling -- one thread at a time. What about more complicated scheduling policies?
- we need a mechanism to block a thread until some condition is true.

Condition variables:

pthread_cond_init(pthread_cond_t *c, ...);

pthread_cond_wait(pthread_cond_t *c, pthread_mutex_t *m);
- atomically unlocks m and sleeps until the condition variable is signalled
- when this function returns, m is locked again.

pthread_cond_signal(pthread_cond_t *c);
- wakes up one waiting thread

pthread_cond_broadcast(pthread_cond_t *c);
- wakes up all waiting threads

Example: a channel that can buffer one value.

typedef struct {
    int *val;   // can buffer one value; NULL means empty
} channel;

channel c;

void chan_send(int *v) {
    if (c.val == NULL) {
        c.val = v;
    } else {
        // wait
    }
}

int *chan_recv() {
    if (c.val) {
        int *v = c.val;
        c.val = NULL;
        return v;
    } else {
        // wait
    }
}

It is inefficient to implement the waiting by polling (e.g. sleep(1) in a loop). Instead, add pthread_cond_wait and pthread_cond_signal, with a mutex m and condition variable cv protecting the channel:

pthread_mutex_t m;
pthread_cond_t cv;

void send(int *v) {
    pthread_mutex_lock(&m);
    while (c.val != NULL)               // slot full: wait for a receiver
        pthread_cond_wait(&cv, &m);
    c.val = v;
    pthread_cond_signal(&cv);           // wake a waiting receiver
    pthread_mutex_unlock(&m);
}

int *recv() {
    pthread_mutex_lock(&m);
    while (c.val == NULL)               // slot empty: wait for a sender
        pthread_cond_wait(&cv, &m);
    int *v = c.val;
    c.val = NULL;
    pthread_cond_signal(&cv);           // wake a waiting sender
    pthread_mutex_unlock(&m);
    return v;
}

Notes on the above code. The general pattern is:

T1:
    lock(&m);
    while (condition != true)
        cond_wait(&c, &m);
    ... do stuff ...
    unlock(&m);

T2:
    lock(&m);
    condition = true;
    cond_signal(&c);
    unlock(&m);

First, always use cond_wait in this pattern:

    while (condition is not true)
        pthread_cond_wait(&c, &m);

**Always** use "while" instead of "if":
- T1 waits
- T2 signals (but T1 does not run immediately)
- T3 acquires the mutex and makes the condition false again (this does not happen in the channel example, but can happen in more sophisticated programs)
- T1 starts running, even though the condition is now false

Another advantage of using "while": using cond_broadcast instead of cond_signal would also be correct here.

Second, why does cond_wait take a mutex argument and atomically unlock it? Suppose the code were:

    while (condition != true) {
        mutex_unlock(&m);
        cond_wait(&c);      // hypothetical cond_wait that does not take the mutex
        mutex_lock(&m);
    }

Problem:

T1: mutex_unlock()
T2: makes the condition true, cond_signal()   (the signal is dropped because no one is waiting yet)
T1: cond_wait()   (sleeps forever)

The signal is lost!

Third, the code doing cond_signal also needs to hold the lock. Suppose the code were:

    condition = true;   // making the condition true must be protected by the lock
    cond_signal(&c);

T1: lock
T1: while (condition != true)   // sees the condition is false, is about to wait
T2: condition = true
T2: cond_signal    // signal is dropped; T1 is not waiting yet
T1: waits

The signal is lost!
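Putting the three rules together, here is a complete one-slot channel assembled from the fragments above (a sketch: chan_init, the placement of the mutex and condition variable inside the struct, and the producer/consumer driver are additions for illustration, not from the lecture). A single condition variable is shared by the sender and receiver, which is fine here because there is exactly one of each; with many senders and receivers, pthread_cond_broadcast or two separate condition variables would be safer.

#include <pthread.h>
#include <stdio.h>

typedef struct {
    int *val;               // can buffer one value; NULL means empty
    pthread_mutex_t m;
    pthread_cond_t cv;
} channel;

void chan_init(channel *ch) {
    ch->val = NULL;
    pthread_mutex_init(&ch->m, NULL);
    pthread_cond_init(&ch->cv, NULL);
}

void chan_send(channel *ch, int *v) {
    pthread_mutex_lock(&ch->m);
    while (ch->val != NULL)             // slot full: wait for a receiver
        pthread_cond_wait(&ch->cv, &ch->m);
    ch->val = v;
    pthread_cond_signal(&ch->cv);       // wake a waiting receiver (signal under the lock)
    pthread_mutex_unlock(&ch->m);
}

int *chan_recv(channel *ch) {
    pthread_mutex_lock(&ch->m);
    while (ch->val == NULL)             // slot empty: wait for a sender
        pthread_cond_wait(&ch->cv, &ch->m);
    int *v = ch->val;
    ch->val = NULL;
    pthread_cond_signal(&ch->cv);       // wake a waiting sender
    pthread_mutex_unlock(&ch->m);
    return v;
}

static channel ch;

void *producer(void *arg) {
    (void)arg;
    static int data[5] = {1, 2, 3, 4, 5};
    for (int i = 0; i < 5; i++)
        chan_send(&ch, &data[i]);
    return NULL;
}

int main(void) {
    chan_init(&ch);
    pthread_t p;
    pthread_create(&p, NULL, producer, NULL);
    for (int i = 0; i < 5; i++)
        printf("received %d\n", *chan_recv(&ch));
    pthread_join(p, NULL);
    return 0;
}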