
Lec 5: Scheduling

Lecture notes courtesy of Eddie Kohler, Robert Morris and Frans Kaashoek

Scheduling

Basic scheduling algorithms

  • In its simplest form, the OS must decide which user process to run next from a queue of runnable processes.
  • Two possibilities for when to schedule:
    • cooperative (non-preemptive) scheduling: the running process decides when to yield the CPU; equivalently, the current job runs until it finishes or voluntarily gives up control before another is started.
    • preemptive scheduling: the kernel can stop a running (possibly misbehaving) process and force it to give up the CPU.
  • Preemptive scheduling is used in most modern operating systems for user-level applications; however, cooperative scheduling is still often used by the kernel when it manages its own internal jobs (the sketch below contrasts the two).
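  • A minimal sketch (not from the notes) contrasting the two; yield(), sched(), current, and the work functions are hypothetical kernel helpers, not a real API:

    /* Cooperative: the running process must explicitly hand back the CPU. */
    void compute(void) {
        while (work_remains()) {
            do_some_work();
            yield();                  /* voluntarily invoke the scheduler */
        }
    }

    /* Preemptive: the timer interrupt forces a scheduling decision,
     * whether or not the running process cooperates. */
    void timer_interrupt(void) {
        if (--current->ticks_left == 0)
            sched();                  /* quantum expired: pick another job */
    }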

  • Performance considerations for scheduling algorithms
    • Throughput: the rate of useful work done for a given workload. Ideal scenario (draw diagram): as offered load increases, throughput increases linearly until it hits capacity. The higher the resource utilization (i.e. the lower the overhead), the better the throughput.
    • Latency (or turnaround time): the total time it takes a job to complete, i.e. finish time minus arrival time. This includes all the time the job spends waiting for other jobs to complete.
    • Wait time: the time before the scheduler lets the given job start executing, i.e. start time minus arrival time.
    • Fairness: does each job get an equal share? (Or a share proportional to its priority?)
    • Average vs. worst case.
  • Scheduling is hard because different performance goals usually conflict with each other.

  • Algorithm #1: FCFS (first-come-first-served)
  • Organize the set of runnable processes (or jobs) in a FIFO queue.
  • A new job is added to the tail of the queue.
  • Run the job at the head of the queue until it finishes, then run the next job, etc.
  • Example:
    Let x refer to the context switching overhead
    
    Job   Arrival-Time  Amount-of-work  Start    Finish   Wait    Turn-around
    A     0             3                0        3         0       3
    B     1             5                3+x      8+x       2+x     7+x
    C     3             2                8+2x     10+2x     5+2x    7+2x
    
    Timeline: +---A----+----------B---------+----C----+
              0        3+x                 8+2x      10+2x
    
    Avg waiting time: (7+3x)/3 = 2.33+x
    Avg turn-around: (17+3x)/3
    
  • The context-switch overhead, x, is not negligible (flushing the TLB, saving/restoring registers, plus running the scheduling algorithm itself!).
  • Good: the scheduling cost of FCFS is low: O(1) to pick the next job.
  • Bad: average waiting time can be large, because one long-running job makes everyone behind it wait (see the simulation sketch below).
  • What if we want to optimize average waiting time?
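  • Aside: a minimal simulation sketch (not from the notes) of the FCFS example above; it reproduces the table's wait and turnaround formulas, with an arbitrary value standing in for x:

    #include <stdio.h>

    struct job { const char *name; double arrival, work; };

    int main(void) {
        const double x = 0.1;                /* context-switch overhead */
        struct job jobs[] = {                /* already in arrival order */
            {"A", 0, 3}, {"B", 1, 5}, {"C", 3, 2}
        };
        int n = sizeof jobs / sizeof jobs[0];
        double now = 0, wait_sum = 0, ta_sum = 0;

        for (int i = 0; i < n; i++) {
            if (i > 0) now += x;             /* switch to the next job */
            if (now < jobs[i].arrival)       /* CPU idle until arrival */
                now = jobs[i].arrival;
            double start = now, finish = start + jobs[i].work;
            wait_sum += start - jobs[i].arrival;
            ta_sum += finish - jobs[i].arrival;
            printf("%s: wait=%.2f turnaround=%.2f\n", jobs[i].name,
                   start - jobs[i].arrival, finish - jobs[i].arrival);
            now = finish;
        }
        printf("avg wait=%.2f avg turnaround=%.2f\n", wait_sum/n, ta_sum/n);
        return 0;
    }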

  • Algorithm #2: Shortest-job-first
  • When the time comes to dispatch a job, execute the one with the shortest running time.
  • This is idealistic: how do we predict a job's running time beforehand?
  • Example:
    Job   Arrival-Time  Amount-of-work  Start    Finish   Wait    Turn-around
    A     0             3                0        3         0       3
    B     1             5                5+2x     10+2x    4+2x     9+2x
    C     3             2                3+x      5+x       x       2+x
    
    Timeline: +---A----+--C--+------------B------------+
              0        3+x   5+2x                     10+2x
    
    Avg waiting time: (4+3x)/3 = 1.33+x
    Avg turn around time: (14+3x)/3 
    
  • Good: we can prove that shortest-job-first scheduling results in the lowest possible average waiting time among all non-preemptive scheduling algorithms.
    • Proof by contradiction (exchange argument): if a longer job ever runs immediately before a shorter one, swapping the two reduces total waiting time, so any non-SJF schedule can be improved.
  • Bad: need to predict the running time of a job somehow; a common heuristic is an exponential average of the job's past CPU bursts (see the sketch after this list).
  • Bad: more expensive, as the OS needs to keep the queue of jobs sorted by running time (worst case O(n) to insert a new job).
  • Bad: starvation (i.e. unfairness): a long-running job may never get an opportunity to run because short jobs are always preferred.
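  • A minimal sketch (not from the notes) of that prediction heuristic: estimate the next CPU burst as an exponential average of past bursts, tau' = ALPHA*measured + (1-ALPHA)*tau. ALPHA, the initial guess, and the burst values here are arbitrary:

    #include <stdio.h>

    #define ALPHA 0.5                 /* weight of the most recent burst */

    double predict_next(double tau, double measured) {
        return ALPHA * measured + (1.0 - ALPHA) * tau;
    }

    int main(void) {
        double tau = 10.0;                       /* initial estimate */
        double bursts[] = {6, 4, 6, 4, 13, 13};  /* measured CPU bursts */
        for (int i = 0; i < 6; i++) {
            printf("predicted=%.2f measured=%.0f\n", tau, bursts[i]);
            tau = predict_next(tau, bursts[i]);  /* fold in new sample */
        }
        return 0;
    }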

  • Algorithm #3: Round-robin
  • Preemptive RR: break long jobs into a series of slices, each of which executes for no more than a fixed amount of time (the quantum).
  • Keep a FIFO queue of runnable jobs; execute the job at the head for up to one quantum, then move the preempted job to the tail of the queue.
    Execution timeline (quantum = 1): A B A B C A B C B B
    
    Job   Arrival-Time  Amount-of-work  Start    Finish   Wait    Turn-around
    A     0             3                0        6+5x      0       6+5x
    B     1             5                1+x     10+9x      x       9+9x
    C     3             2                4+4x     8+7x    1+4x      5+7x
    
    Avg waiting time: (0 + x + (1+4x))/3 = (1+5x)/3 = 0.33+1.67x
    Avg. turn-around time: (20+21x)/3 
    
  • Good: short average waiting time (because each quantum is short, a job waits at most about n-1 quanta for its next slice).
  • Good: easy and efficient to implement.
  • Bad: more overhead for long jobs (due to more context switches).
  • Bad: longer average turn-around time than shortest-job-first (compare the figures above; a simulation sketch follows this list).
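  • A minimal sketch (not from the notes) replaying the round-robin example with quantum 1; it prints the slice order A B A B C A B C B B:

    #include <stdio.h>

    struct job { const char *name; int arrival, left; };

    int main(void) {
        struct job jobs[] = { {"A", 0, 3}, {"B", 1, 5}, {"C", 3, 2} };
        int n = 3, queue[64], head = 0, tail = 0;
        int t = 0, arrived = 0, done = 0;

        while (done < n) {
            /* enqueue jobs that have arrived by time t */
            while (arrived < n && jobs[arrived].arrival <= t)
                queue[tail++] = arrived++;
            if (head == tail) { t++; continue; }  /* CPU idle */
            int j = queue[head++];
            printf("%s ", jobs[j].name);          /* run one quantum */
            t++;
            jobs[j].left--;
            /* jobs arriving during the slice enqueue ahead of it */
            while (arrived < n && jobs[arrived].arrival <= t)
                queue[tail++] = arrived++;
            if (jobs[j].left > 0)
                queue[tail++] = j;                /* back to the tail */
            else
                done++;
        }
        printf("\n");
        return 0;
    }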

  • Algorithm #4: Priority scheduling
  • Priorities can be defined either externally or internally.
    1. Internal priorities: the priority of a process is computed from measurable internal factors such as time limits, memory requirements, the number of open files, etc.
    2. External priorities: set by criteria external to the OS.
  • Strict priority scheduling: processes with the lowest priority value (i.e. the highest priority) are serviced first; round-robin scheduling is employed among processes of equal priority.
  • Variants:
    1. Preemptive priority scheduling: a newly arrived process's priority is compared with that of the currently running process; if the new process has higher priority, the current process is preempted, otherwise it continues.
    2. Non-preemptive priority scheduling: a newly arrived higher-priority process is simply put at the head of the ready queue; the running process finishes its current burst.
  • Good: can potentially implement many different policies (e.g. emacs should be given preference over a backup job); see the run-queue sketch below.
  • Bad: starvation (a low-priority job might never gain access to the CPU).
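  • A minimal sketch (not from the notes) of strict priority with one FIFO run queue per level (level 0 is highest); round-robin within a level falls out naturally if each job is re-enqueued after its quantum:

    #include <stdio.h>

    #define NPRIO 4                       /* priority levels */
    #define QCAP  8

    static const char *q[NPRIO][QCAP];    /* circular FIFO per level */
    static int head[NPRIO], count[NPRIO];

    void enqueue(int prio, const char *name) {
        q[prio][(head[prio] + count[prio]++) % QCAP] = name;
    }

    /* pick from the highest-priority non-empty level */
    const char *pick_next(void) {
        for (int p = 0; p < NPRIO; p++)
            if (count[p] > 0) {
                const char *name = q[p][head[p]];
                head[p] = (head[p] + 1) % QCAP;
                count[p]--;
                return name;
            }
        return NULL;                      /* nothing runnable: idle */
    }

    int main(void) {
        enqueue(2, "backup");
        enqueue(0, "emacs");              /* emacs gets high priority */
        enqueue(2, "cc");
        for (const char *p; (p = pick_next()) != NULL; )
            printf("run %s\n", p);        /* emacs, then backup, then cc */
        return 0;
    }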

    Common Scheduling Pitfalls

      Pitfall: Priority Inversion

    • Assume policy is strict priority.
      Thread T1: low priority (e.g. check to see if I have new mail periodically).
      Thread T2: medium priority.
      Thread T3: high priority (e.g. real-time game).
    • T1: acquire(l)
      context switch to T3 (say T3 wakes up and needs to run: it has the highest priority)
      T3: acquire(l)... must wait for T1 to release(l)...
      context switch to T2 (again, T2 needs to run and has priority over T1)
      T2 computes for a while
      T3 is indefinitely delayed despite its high priority.
    • Can solve this if T3 lends its priority to the holder of the lock it is waiting for ("priority inheritance"), so T1 runs instead of T2 (see the sketch below).
    • This is really a multiple-scheduler problem, since locks schedule access to the locked resource.
      • i.e. the lock schedules access FCFS,
      • while the CPU schedules according to priority.
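    • A minimal sketch (not from the notes) of the donation step; the types and block_on() are hypothetical, and a real kernel would sleep inside block_on() until release() wakes it:

      struct thread { int base_prio, eff_prio; };
      struct lock   { struct thread *holder; };

      void block_on(struct lock *l);        /* assumed: sleep until free */

      void acquire(struct lock *l, struct thread *self) {
          while (l->holder) {
              /* donate: boost the holder (T1) so medium-priority
               * threads (T2) cannot starve it while we (T3) wait */
              if (l->holder->eff_prio < self->eff_prio)
                  l->holder->eff_prio = self->eff_prio;
              block_on(l);
          }
          l->holder = self;
      }

      void release(struct lock *l, struct thread *self) {
          l->holder = 0;
          self->eff_prio = self->base_prio; /* drop any donated priority */
          /* ... wake up waiters ... */
      }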

      Pitfall: Receive Livelock

    • Scheduling is not confined to user processes.
    • Typical UNIX execution contexts:
      • process, user half (e.g. web server)
      • process, kernel half (e.g. the read()/write() syscalls)
      • soft interrupts (e.g. TCP ACKs, ARP responses, etc.)
      • device (hard) interrupts (e.g. reception of a packet)
    • Rules for context transitions:
      • Hardware can force a hard interrupt at almost any time, except during another hard interrupt
      • A soft interrupt can be invoked at any time, except during a soft or hard interrupt
      • An interrupt must complete before the interrupted code can continue, since interrupts share its stack
      • The kernel never runs a user half if any kernel half wants to run
      • User halves are preemptable via the timer interrupt
    • (Implicit) scheduling priority structure: hard intr > soft intr > kernel half > user half
    • Why soft interrupts?
      • For time-consuming processing in response to hard interrupts, which should run relatively soon, and might not be associated with any specific kernel half (e.g. TCP ACKs, ARP responses, etc.)
      • Hard interrupt does time-critical stuff, schedules soft interrupt, more hard interrupts can occur during soft int.
      • Short-running hard interrupts avoid input overruns.
    • Receive interrupt (hard intr)
      • Copy incoming packet to memory
      • Append packet pointer to IP input queue (ipintrq)
      • Schedule IP soft interrupt (see the sketch after this list)
    • IP soft interrupt
      • For each packet in the IP input queue:
      • Decide what to do with it
      • E.g. deliver to a user program
        • Append packet pointer to socket Q
        • Wake up process
    • Transmit interrupt (hard intr)
      • Free last packet buffer
      • Copy packet from if_snd queue to h/w
      • Tell h/w to transmit
    • Network interface hardware (e.g. ethernet)
      • Executes concurrent to the main CPU
      • Has small receive buffer on interface h/w (a few packets)
      • Raises receive interrupt for each received packet
      • Has small output buffer (a few packets)
      • Raises transmit-complete interrupts when buffers open up
    • Why the input queue (ipintrq)?
    • Why the output queue (if_snd)?
    • Is this a good arrangement?
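    • A minimal sketch (not from the notes) of this hard/soft split; pkt, ip_input(), and schedule_softintr() are hypothetical stand-ins, and a real kernel would mask interrupts around the queue updates:

      #define QLEN 64

      struct pkt { char data[1514]; int len; };

      static struct pkt *ipintrq[QLEN];     /* the IP input queue */
      static int q_head, q_count;

      void schedule_softintr(void);         /* assumed to exist */
      void ip_input(struct pkt *p);         /* protocol processing */

      /* hard interrupt: do only the time-critical work, then defer */
      void rx_interrupt(struct pkt *p) {
          if (q_count == QLEN)
              return;                       /* queue full: drop early */
          ipintrq[(q_head + q_count++) % QLEN] = p;
          schedule_softintr();
      }

      /* soft interrupt: drain ipintrq with hard interrupts enabled */
      void ip_softintr(void) {
          while (q_count > 0) {
              struct pkt *p = ipintrq[q_head];
              q_head = (q_head + 1) % QLEN;
              q_count--;
              ip_input(p);                  /* e.g. append to socket Q,
                                               wake up the process */
          }
      }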

    • Draw diagram of web requests served vs. requests issued: under overload, throughput collapses because all CPU time goes to receive interrupts for packets that are later discarded anyway.
    • Possible solutions:
    • Ideal: when the IP input queue fills up, disable receive interrupts for the network device; when the queue is drained, re-enable them (sketched after this list).
    • Practical: polling. Whenever the IP input queue has room, poll the device for received packets.
    • General principle:
      • Don't spend time on new work before completing existing work.
      • Or give new work lower priority than partially-completed work.
      • Corollary: If you might discard, do it as early as possible.
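    • Extending the sketch above, the "ideal" fix is a small change; dev_disable_rx_intr() and dev_enable_rx_intr() are hypothetical device controls:

      void rx_interrupt(struct pkt *p) {
          if (q_count == QLEN) {
              dev_disable_rx_intr();        /* stop taking rx interrupts;
                                               the NIC buffers or drops  */
              return;
          }
          ipintrq[(q_head + q_count++) % QLEN] = p;
          schedule_softintr();
      }

      void ip_softintr(void) {
          while (q_count > 0) {
              ip_input(ipintrq[q_head]);
              q_head = (q_head + 1) % QLEN;
              q_count--;
          }
          dev_enable_rx_intr();             /* drained: accept again */
      }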

      Pitfall: Multiple Interacting Schedulers

    • Suppose you want your emacs to have priority over everything else. Give it high CPU priority. Does that mean nothing else will run if emacs wants to run?
    • No: the disk scheduler might not know to favor emacs's disk I/Os. A typical UNIX disk scheduler favors disk efficiency, not process priority.
    • Suppose emacs needs more memory. Other processes' dirty pages must be written out first, so emacs must wait; the disk scheduler doesn't know those writes are high priority.

      Pitfall: Server Processes

    • Suppose emacs uses X windows to display. The X server must serve requests from many clients.
    • Does it know that emacs's requests should be given priority?
    • Does the OS know to raise X's priority when it is serving emacs?
    • Similarly for DNS and NFS: does the network know to give emacs's NFS requests priority?