Concurrency lecture

What's concurrency?
- things happening "simultaneously"
  - e.g. multiple CPU cores concurrently executing instructions
  - e.g. the CPU and the network card concurrently doing processing

Why write concurrent programs?
- to speed up programs using multiple CPUs
- to speed up programs by interleaving CPU processing and I/O

How to write concurrent programs?

1. Use multiple processes

Process = process context + code, data, and stack

Draw two separate address spaces:

    code, data, stack          code, data, stack
    CPU state (registers)      CPU state (registers)
    kernel context:            kernel context:
      VM structures              VM structures
      file descriptors           file descriptors
      ...                        ...

Example program:

    #include <assert.h>
    #include <stdio.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int numbers[1000];
    int sum1 = 0;
    int sum2 = 0;

    int main() {
        pid_t pid = fork();
        assert(pid >= 0);
        if (pid != 0) {                     /* parent: sum the first half */
            for (int i = 0; i < 500; i++) {
                sum1 += numbers[i];
            }
        } else {                            /* child: sum the second half */
            for (int i = 0; i < 500; i++) {
                sum2 += numbers[500+i];
            }
            return 0;
        }
        waitpid(pid, NULL, 0);
        printf("sum is %d\n", sum1+sum2);   /* wrong: the parent's copy of sum2 is still 0 */
        return 0;
    }

The two processes concurrently sum the numbers. However, it is difficult to pass the results between them because they have separate address spaces: the child updates sum2 in its own copy of the data, so the parent prints only sum1.

How to communicate across processes? (inter-process communication)
- via pipes or the file system
- via kernel-supported shared-memory objects
- via UNIX sockets
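To see the pipe option concretely, here is a minimal sketch that fixes the example above by having the child send its partial sum back through a pipe. The pipefd variable name, the close() calls, and the assert-style error handling are illustrative choices, not from the lecture:

    #include <assert.h>
    #include <stdio.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int numbers[1000];

    int main() {
        int pipefd[2];                          /* pipefd[0]: read end, pipefd[1]: write end */
        assert(pipe(pipefd) == 0);

        pid_t pid = fork();
        assert(pid >= 0);
        if (pid == 0) {                         /* child: sum the second half, send it back */
            close(pipefd[0]);
            int sum2 = 0;
            for (int i = 0; i < 500; i++)
                sum2 += numbers[500+i];
            write(pipefd[1], &sum2, sizeof(sum2));
            return 0;
        }

        close(pipefd[1]);
        int sum1 = 0;                           /* parent: sum the first half */
        for (int i = 0; i < 500; i++)
            sum1 += numbers[i];

        int sum2;
        read(pipefd[0], &sum2, sizeof(sum2));   /* blocks until the child writes */
        waitpid(pid, NULL, 0);
        printf("sum is %d\n", sum1 + sum2);     /* now the full sum */
        return 0;
    }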
2. Single process, multiple threads

- Each thread has its own logical control flow.
- All threads share the same code, data, and kernel context.
- Each thread has its own stack (but stacks are not protected from the other threads).

Draw picture:
- one address space with code, data, and multiple stacks
- one kernel context (VM structures, descriptor table)
- multiple thread contexts (CPU state: data registers, condition codes, stack pointer, PC)

Threads vs. processes

Similarities:
* each has its own logical control flow
* they run concurrently with each other
* they can be context switched

Differences:
* threads share one address space (processes have separate address spaces)
* threads are cheaper than processes to create and to switch between

POSIX threads (pthreads) interface:
- thread creation: pthread_create
- waiting for a thread: pthread_join
- detaching a thread: pthread_detach

    #include <pthread.h>
    #include <stdio.h>

    int numbers[1000];
    int sum1 = 0;
    int sum2 = 0;

    void *run(void *arg);

    int main() {
        pthread_t tid;
        pthread_create(&tid, NULL, run, (void *)(numbers + 500));
        for (int i = 0; i < 500; i++) {
            sum1 += numbers[i];
        }
        /* if not joined, the peer thread must call
           pthread_detach(pthread_self()) to avoid a memory leak */
        pthread_join(tid, NULL);
        printf("sum is %d\n", sum1+sum2);
        return 0;
    }

    void *run(void *arg) {
        int *nums = (int *)arg;                 /* points at numbers[500] */
        for (int i = 0; i < 500; i++) {
            sum2 += nums[i];
        }
        return NULL;
    }

Draw picture of threaded execution:

    main thread                 peer thread
    call pthread_create
    pthread_create returns      run starts
    for loop                    for loop
                                run exits
    call pthread_join
    pthread_join returns

Challenges of multi-threaded programs

Three synchronization problems:
* races
* deadlock
* livelock

What's a race?
* Multiple threads make conflicting accesses (read-write or write-write) to shared variables.
* Different interleavings of the conflicting accesses result in violations of application invariants.

Go back to the earlier example, and turn sum1 and sum2 into one shared variable:

    #include <pthread.h>
    #include <stdio.h>

    int numbers[2] = {1, 1};
    int sum = 0;

    void *run(void *arg);

    int main() {
        pthread_t tid;
        pthread_create(&tid, NULL, run, (void *)(numbers + 1));
        for (int i = 0; i < 1; i++) {
            sum += numbers[i];
        }
        pthread_join(tid, NULL);
        printf("sum is %d\n", sum);
        return 0;
    }

    void *run(void *arg) {
        int *nums = (int *)arg;
        for (int i = 0; i < 1; i++) {
            sum += nums[i];
        }
        return NULL;
    }

What's the expected printed sum?
- 2

What's the actual sum?
- it can be 1

Why can the outcome be 1? Zoom in on the line sum += numbers[i]. The corresponding machine code, with the instructions marked 1-4:

    1: movq ...(,%rdi,4), %rcx     # load numbers[i]
    2: movq ...(%rip), %rdx        # load sum
    3: addq %rcx, %rdx             # add
    4: movq %rdx, ...(%rip)        # store the result back to sum

Two threads T and T' have a combinatorial number of interleavings, e.g.:

    T1, T2, T3, T4, T'1, T'2, T'3, T'4
    T1, T'1, T2, T'2, T3, T'3, T4, T'4

The second interleaving results in 1 because both threads execute instruction 2 before either executes instruction 4: both load sum while it is still 0, both compute 0 + 1 = 1, and both store 1, so one thread's update overwrites the other's.

Another example:

    #include <stdlib.h>

    typedef struct node {
        int v;
        struct node *next;
    } node;

    node *head = NULL;

    void list_insert(int v) {
        node *n = malloc(sizeof(node));
        n->v = v;
        n->next = head;
        head = n;
    }

    void list_delete() {
        node *n = head;
        head = n->next;
        free(n);
    }

What if two threads concurrently insert? (Write T's new node as n and T''s as n'.)

    T:  n->next = head
    T': n'->next = head
    T:  head = n
    T': head = n'

T's insertion of n is lost: head points at n', whose next is the old head, so n is unreachable.

What if an insert and a delete occur concurrently?

    T': n'->next = head     (insert: the new node points at the current head)
    T:  n = head            (delete: grab the current head)
    T:  head = n->next
    T:  free(n)             (frees the node that n' still points at)
    T': head = n'

Now the first node in the linked list points to a freed node!

Critical sections (CS)
* Conflicting accesses of shared variables constitute a critical section.
* In our earlier examples: sum += numbers[i], and insert/delete's accesses of head.

Protecting a CS
* We must make sure accesses within a CS are not interleaved.
* Equivalently, a CS must execute with *atomicity*.
* Atomicity is achieved via mutual exclusion (a mutex).

How?
* pthread_mutex_init(), pthread_mutex_lock(), pthread_mutex_unlock()
* Once a thread T has successfully locked a mutex, no other thread can lock it until T unlocks it.

Why protect CSs?
* Application invariants hold outside a CS, but may not hold inside it.

What are invariants?
- in the sum example: sum equals the sum of the numbers scanned so far
- in the linked list: the list structure is well-formed

The general pattern (see the sketches below):

    pthread_mutex_lock
        violate invariants
        restore invariants
    pthread_mutex_unlock
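As a concrete illustration of this pattern, here is the racy sum example with its critical section protected by a mutex. This is a minimal sketch: the name sum_lock and the use of the static initializer PTHREAD_MUTEX_INITIALIZER (standing in for a pthread_mutex_init() call with default attributes) are my choices, not from the lecture.

    #include <pthread.h>
    #include <stdio.h>

    int numbers[2] = {1, 1};
    int sum = 0;
    pthread_mutex_t sum_lock = PTHREAD_MUTEX_INITIALIZER;   /* assumed name */

    void *run(void *arg) {
        int *nums = (int *)arg;
        for (int i = 0; i < 1; i++) {
            pthread_mutex_lock(&sum_lock);      /* enter CS: load-add-store is now atomic */
            sum += nums[i];
            pthread_mutex_unlock(&sum_lock);    /* leave CS: invariant restored */
        }
        return NULL;
    }

    int main() {
        pthread_t tid;
        pthread_create(&tid, NULL, run, (void *)(numbers + 1));
        for (int i = 0; i < 1; i++) {
            pthread_mutex_lock(&sum_lock);
            sum += numbers[i];
            pthread_mutex_unlock(&sum_lock);
        }
        pthread_join(tid, NULL);
        printf("sum is %d\n", sum);             /* now always 2 */
        return 0;
    }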
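The same pattern applied to the linked list (list_lock is again an assumed name). Holding the lock across each insert's and delete's accesses of head means the lost-insert and freed-node interleavings shown above can no longer occur; the empty-list guard in list_delete is an addition the lecture code omitted.

    #include <pthread.h>
    #include <stdlib.h>

    typedef struct node {
        int v;
        struct node *next;
    } node;

    node *head = NULL;
    pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;  /* assumed name */

    void list_insert(int v) {
        node *n = malloc(sizeof(node));
        n->v = v;
        pthread_mutex_lock(&list_lock);     /* CS: the list is temporarily inconsistent */
        n->next = head;
        head = n;
        pthread_mutex_unlock(&list_lock);   /* list invariant restored */
    }

    void list_delete() {
        pthread_mutex_lock(&list_lock);
        node *n = head;
        if (n != NULL)                      /* guard: deleting from an empty list */
            head = n->next;
        pthread_mutex_unlock(&list_lock);
        free(n);                            /* free(NULL) is a no-op; n is unreachable here */
    }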