Concurrency lecture

What's concurrency?
- things happening "simultaneously"
  - e.g. multiple CPU cores concurrently executing instructions
  - e.g. the CPU and the network card concurrently doing processing

Why write concurrent programs?
- to speed up programs using multiple CPUs
- to speed up programs by interleaving CPU processing and I/O

How to write concurrent programs?

1. Use multiple processes

Process = process context + code, data, and stack

Draw two separate address spaces:

    code, data, stack          code, data, stack
    CPU state (registers)      CPU state (registers)
    kernel context:            kernel context:
      VM structures              VM structures
      file descriptors           file descriptors
      ...                        ...

Example program:

    #include <assert.h>
    #include <stdio.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int numbers[1000];
    int sum1 = 0;
    int sum2 = 0;

    int main() {
        pid_t pid = fork();
        assert(pid >= 0);
        if (pid != 0) {                     /* parent: sum the first half */
            for (int i = 0; i < 500; i++) {
                sum1 += numbers[i];
            }
        } else {                            /* child: sum the second half */
            for (int i = 0; i < 500; i++) {
                sum2 += numbers[500+i];
            }
            return 0;
        }
        waitpid(pid, NULL, 0);
        printf("sum is %d\n", sum1+sum2);   /* wrong: the parent's copy of sum2 is still 0 */
        return 0;
    }

The two processes concurrently sum the numbers. However, it is difficult to pass the results between them because they have separate address spaces: the child updates sum2 in its own copy of the data, so the parent prints only sum1.

How to communicate across processes? (inter-process communication)
- via pipes or the file system
- via kernel-supported shared-memory objects
- via UNIX sockets
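To see the pipe option concretely, here is a minimal sketch that fixes the example above by having the child send its partial sum back through a pipe. The pipefd variable name, the close() calls, and the assert-style error handling are illustrative choices, not from the lecture:

    #include <assert.h>
    #include <stdio.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int numbers[1000];

    int main() {
        int pipefd[2];                          /* pipefd[0]: read end, pipefd[1]: write end */
        assert(pipe(pipefd) == 0);

        pid_t pid = fork();
        assert(pid >= 0);
        if (pid == 0) {                         /* child: sum the second half, send it back */
            close(pipefd[0]);
            int sum2 = 0;
            for (int i = 0; i < 500; i++)
                sum2 += numbers[500+i];
            write(pipefd[1], &sum2, sizeof(sum2));
            return 0;
        }

        close(pipefd[1]);
        int sum1 = 0;                           /* parent: sum the first half */
        for (int i = 0; i < 500; i++)
            sum1 += numbers[i];

        int sum2;
        read(pipefd[0], &sum2, sizeof(sum2));   /* blocks until the child writes */
        waitpid(pid, NULL, 0);
        printf("sum is %d\n", sum1 + sum2);     /* now the full sum */
        return 0;
    }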
2. Single process, multiple threads

- Each thread has its own logical control flow.
- All threads share the same code, data, and kernel context.
- Each thread has its own stack (but stacks are not protected from the other threads).

Draw picture:
- one address space with code, data, and multiple stacks
- one kernel context (VM structures, descriptor table)
- multiple thread contexts (CPU state: data registers, condition codes, stack pointer, PC)

Threads vs. processes

Similarities:
* each has its own logical control flow
* they run concurrently with each other
* they can be context switched

Differences:
* threads share one address space (processes have separate address spaces)
* threads are cheaper than processes to create and to switch between

POSIX threads (pthreads) interface:
- thread creation: pthread_create
- waiting for a thread: pthread_join
- detaching a thread: pthread_detach

    #include <pthread.h>
    #include <stdio.h>

    int numbers[1000];
    int sum1 = 0;
    int sum2 = 0;

    void *run(void *arg);

    int main() {
        pthread_t tid;
        pthread_create(&tid, NULL, run, (void *)(numbers + 500));
        for (int i = 0; i < 500; i++) {
            sum1 += numbers[i];
        }
        /* if not joined, the peer thread must call
           pthread_detach(pthread_self()) to avoid a memory leak */
        pthread_join(tid, NULL);
        printf("sum is %d\n", sum1+sum2);
        return 0;
    }

    void *run(void *arg) {
        int *nums = (int *)arg;                 /* points at numbers[500] */
        for (int i = 0; i < 500; i++) {
            sum2 += nums[i];
        }
        return NULL;
    }

Draw picture of threaded execution:

    main thread                 peer thread
    call pthread_create
    pthread_create returns      run starts
    for loop                    for loop
                                run exits
    call pthread_join
    pthread_join returns

Challenges of multi-threaded programs

Three synchronization problems:
* races
* deadlock
* livelock

What's a race?
* Multiple threads make conflicting accesses (read-write or write-write) to shared variables.
* Different interleavings of the conflicting accesses result in violations of application invariants.

Go back to the earlier example, and turn sum1 and sum2 into one shared variable:

    #include <pthread.h>
    #include <stdio.h>

    int numbers[2] = {1, 1};
    int sum = 0;

    void *run(void *arg);

    int main() {
        pthread_t tid;
        pthread_create(&tid, NULL, run, (void *)(numbers + 1));
        for (int i = 0; i < 1; i++) {
            sum += numbers[i];
        }
        pthread_join(tid, NULL);
        printf("sum is %d\n", sum);
        return 0;
    }

    void *run(void *arg) {
        int *nums = (int *)arg;
        for (int i = 0; i < 1; i++) {
            sum += nums[i];
        }
        return NULL;
    }

What's the expected printed sum?
- 2

What's the actual sum?
- it can be 1

Why can the outcome be 1? Zoom in on the line sum += numbers[i]. The corresponding machine code, with the instructions marked 1-4:

    1: movq ...(,%rdi,4), %rcx     # load numbers[i]
    2: movq ...(%rip), %rdx        # load sum
    3: addq %rcx, %rdx             # add
    4: movq %rdx, ...(%rip)        # store the result back to sum

Two threads T and T' have a combinatorial number of interleavings, e.g.:

    T1, T2, T3, T4, T'1, T'2, T'3, T'4
    T1, T'1, T2, T'2, T3, T'3, T4, T'4

The second interleaving results in 1 because both threads execute instruction 2 before either executes instruction 4: both load sum while it is still 0, both compute 0 + 1 = 1, and both store 1, so one thread's update overwrites the other's.

Another example:

    #include <stdlib.h>

    typedef struct node {
        int v;
        struct node *next;
    } node;

    node *head = NULL;

    void list_insert(int v) {
        node *n = malloc(sizeof(node));
        n->v = v;
        n->next = head;
        head = n;
    }

    void list_delete() {
        node *n = head;
        head = n->next;
        free(n);
    }

What if two threads concurrently insert? (Write T's new node as n and T''s as n'.)

    T:  n->next = head
    T': n'->next = head
    T:  head = n
    T': head = n'

T's insertion of n is lost: head points at n', whose next is the old head, so n is unreachable.

What if an insert and a delete occur concurrently?

    T': n'->next = head     (insert: the new node points at the current head)
    T:  n = head            (delete: grab the current head)
    T:  head = n->next
    T:  free(n)             (frees the node that n' still points at)
    T': head = n'

Now the first node in the linked list points to a freed node!

Critical sections (CS)
* Conflicting accesses of shared variables constitute a critical section.
* In our earlier examples: sum += numbers[i], and insert/delete's accesses of head.

Protecting a CS
* We must make sure accesses within a CS are not interleaved.
* Equivalently, a CS must execute with *atomicity*.
* Atomicity is achieved via mutual exclusion (a mutex).

How?
* pthread_mutex_init(), pthread_mutex_lock(), pthread_mutex_unlock()
* Once a thread T has successfully locked a mutex, no other thread can lock it until T unlocks it.

Why protect CSs?
* Application invariants hold outside a CS, but may not hold inside it.

What are invariants?
- in the sum example: sum equals the sum of the numbers scanned so far
- in the linked list: the list structure is well-formed

The general pattern (see the sketches below):

    pthread_mutex_lock
        violate invariants
        restore invariants
    pthread_mutex_unlock
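As a concrete illustration of this pattern, here is the racy sum example with its critical section protected by a mutex. This is a minimal sketch: the name sum_lock and the use of the static initializer PTHREAD_MUTEX_INITIALIZER (standing in for a pthread_mutex_init() call with default attributes) are my choices, not from the lecture.

    #include <pthread.h>
    #include <stdio.h>

    int numbers[2] = {1, 1};
    int sum = 0;
    pthread_mutex_t sum_lock = PTHREAD_MUTEX_INITIALIZER;   /* assumed name */

    void *run(void *arg) {
        int *nums = (int *)arg;
        for (int i = 0; i < 1; i++) {
            pthread_mutex_lock(&sum_lock);      /* enter CS: load-add-store is now atomic */
            sum += nums[i];
            pthread_mutex_unlock(&sum_lock);    /* leave CS: invariant restored */
        }
        return NULL;
    }

    int main() {
        pthread_t tid;
        pthread_create(&tid, NULL, run, (void *)(numbers + 1));
        for (int i = 0; i < 1; i++) {
            pthread_mutex_lock(&sum_lock);
            sum += numbers[i];
            pthread_mutex_unlock(&sum_lock);
        }
        pthread_join(tid, NULL);
        printf("sum is %d\n", sum);             /* now always 2 */
        return 0;
    }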
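The same pattern applied to the linked list (list_lock is again an assumed name). Holding the lock across each insert's and delete's accesses of head means the lost-insert and freed-node interleavings shown above can no longer occur; the empty-list guard in list_delete is an addition the lecture code omitted.

    #include <pthread.h>
    #include <stdlib.h>

    typedef struct node {
        int v;
        struct node *next;
    } node;

    node *head = NULL;
    pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;  /* assumed name */

    void list_insert(int v) {
        node *n = malloc(sizeof(node));
        n->v = v;
        pthread_mutex_lock(&list_lock);     /* CS: the list is temporarily inconsistent */
        n->next = head;
        head = n;
        pthread_mutex_unlock(&list_lock);   /* list invariant restored */
    }

    void list_delete() {
        pthread_mutex_lock(&list_lock);
        node *n = head;
        if (n != NULL)                      /* guard: deleting from an empty list */
            head = n->next;
        pthread_mutex_unlock(&list_lock);
        free(n);                            /* free(NULL) is a no-op; n is unreachable here */
    }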