Lec 4: OS structure (cont)

Lecture notes courtesy of Eddie Kohler, Frans Kaashoek, Robert Morris

  • In the last lecture, we talked about
  • In this lecture, we will discuss

    Previous lecture: virtualize CPU and memory

  • We have virtualized CPU and memory w/ minimal performance overhead
  • Process is a (group of) thread(s) in a separate address space
  • OS needs to keep per-process state necessary for virtualizing CPU and memory
  • Structure of a process descriptor table:
    +--procdescriptor_t--------+
    |   ....                   |
    |registers (%eax,%esp,...) |
    |  address space           |
    |   ...                    |
    +--------------------------+
    

    Virtualize I/O

  • Three main considerations determine how OS handles I/O
  • For these reasons, modern OSes provide abstract interfaces for I/O devices.

  • What's a good interface for accessing I/O?
  • Important I/O devices: Hard Disk, CD-ROM, Network, and Keyboard
  • They differ in characteristics (random access vs. stream etc.). One interface for each device?
  • UNIX's idea: treat everything as a file.
  • Our previous interface for reading a file (in L3)
    int read_file(char *filename, int offset, char *buffer, int len)
    
  • Better interface:
    int read(int fd, char *buffer, int len)
    
  • Whether the file descriptor refers to a streaming or a random-access device, read returns the next len bytes.
  • The offset argument moves into a separate function, which handles random-access operations.
    off_t lseek(int fd, off_t offset, int whence)
    
    where whence is one of SEEK_SET, SEEK_CUR, SEEK_END.
  • Attempts to call lseek on a streaming file will fail.
  • The write interface is similar
    int write(int fd, const void *data, int len)
    

  • Before I/O can be accessed, a file descriptor needs to be created using open
    int open(const char *name, int mode);
    
  • Each process's descriptor contains a file descriptor table (otherwise, the OS does not know what fd=3 refers to!)
    +--procdescriptor_t--------+
    |   ....                   |
    |registers (%eax,%esp,...) |
    |  address space           |
    |file descriptor table     |
    +--------------------------+
    
    +--file descriptor table--------------+
    | ...                                 |
    | "wordfile.txt", O_RDONLY, offset=10 |
    | tcp_socket, O_RDWR,  ....           |
    | ...                                 |
    +-------------------------------------+
    
  • close frees up space in file descriptor table
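  • The calls above compose. As a rough sketch (not the lecture's own code), the L3-style read_file can be rebuilt on top of open, lseek, read and close, and a pipe can stand in for a streaming device to show lseek failing. The helper name lseek_fails_on_pipe is made up for this illustration, and the real POSIX calls use ssize_t/size_t where the notes use int.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* The old L3-style interface, rebuilt from the fd-based calls.
 * Returns the number of bytes read, or -1 on error. */
int read_file(const char *filename, int offset, char *buffer, int len)
{
    int fd = open(filename, O_RDONLY);
    if (fd < 0)
        return -1;
    /* Random access: position the per-descriptor offset explicitly. */
    if (lseek(fd, (off_t)offset, SEEK_SET) == (off_t)-1) {
        close(fd);
        return -1;
    }
    int n = (int)read(fd, buffer, (size_t)len);
    close(fd); /* frees the slot in the file descriptor table */
    return n;
}

/* Hypothetical helper: returns 1 if lseek on a pipe (a streaming
 * descriptor) fails with ESPIPE, 0 if it does not, -1 on setup error. */
int lseek_fails_on_pipe(void)
{
    int fds[2];
    if (pipe(fds) < 0)
        return -1;
    errno = 0;
    off_t r = lseek(fds[0], 0, SEEK_CUR);
    int failed = (r == (off_t)-1 && errno == ESPIPE);
    close(fds[0]);
    close(fds[1]);
    return failed;
}
```

  • Note the design choice: read itself never takes an offset, so the same call works on files, pipes and sockets; only seekable descriptors accept lseek.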

    Waiting on I/O

  • In L3, the OS wastes a lot of time doing nothing while I/O is slowly doing its thing
  • Recall the read_ide_sector function as part of sys_read
    void read_ide_sector(void)
    {
      /* spin until the disk status port reports ready and not busy */
      while ((inb(0x1F7) & 0xC0) != 0x40)
        ; /* do nothing */
    }
    
  • Busy waiting defeats our goal of high utilization.
  • Better alternative: make the process that is waiting on I/O yield to other processes.
  • Keep track of which processes are blocked (on I/O) and which are runnable.
    +--procdescriptor_t--------+
    |   ....                   |
    |registers (%eax,%esp,...) |
    |  address space           |
    |file descriptor table     |
    |process state             |
    +--------------------------+
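
  • A minimal C sketch of the descriptor drawn above, assuming fixed-size tables. All names here (NFILES, struct file_entry, fd_alloc, fd_free, pick_next, the P_* states) are invented for illustration and are not taken from minilab1.

```c
#include <stddef.h>

#define NFILES 16 /* assumed per-process limit on open files */

enum proc_state { P_FREE = 0, P_RUNNABLE, P_BLOCKED };

struct registers {
    unsigned eax, ebx, ecx, edx, esi, edi, ebp, esp, eip, eflags;
};

struct file_entry {
    const char *name; /* e.g. "wordfile.txt" */
    int mode;         /* e.g. O_RDONLY */
    long offset;      /* current read/write position */
    int in_use;
};

typedef struct procdescriptor {
    struct registers regs;           /* saved when the process is not running */
    void *address_space;             /* e.g. pointer to its page table */
    struct file_entry files[NFILES]; /* index into this array is the fd */
    enum proc_state state;
} procdescriptor_t;

/* Claim the lowest free slot; the slot index is the new fd. -1 if full. */
int fd_alloc(procdescriptor_t *p, const char *name, int mode)
{
    for (int fd = 0; fd < NFILES; fd++) {
        if (!p->files[fd].in_use) {
            p->files[fd] = (struct file_entry){ name, mode, 0, 1 };
            return fd;
        }
    }
    return -1;
}

/* What close does to the table: release the slot. */
int fd_free(procdescriptor_t *p, int fd)
{
    if (fd < 0 || fd >= NFILES || !p->files[fd].in_use)
        return -1;
    p->files[fd].in_use = 0;
    return 0;
}

/* Round-robin choice of the next runnable process after `current`.
 * Blocked processes (waiting on I/O) are skipped. -1 if none. */
int pick_next(const procdescriptor_t procs[], int nprocs, int current)
{
    for (int i = 1; i <= nprocs; i++) {
        int idx = (current + i) % nprocs;
        if (procs[idx].state == P_RUNNABLE)
            return idx;
    }
    return -1;
}
```

  • With this layout, a process that must wait for the disk would be marked P_BLOCKED and the kernel would switch to whatever pick_next returns, instead of spinning.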
    

    Basic Process interaction: Fork and Exec

  • Part of minilab1's task is to implement process creation
  • The application code that uses fork
    pid_t pid;
    pid = fork();
    if (pid == 0) {
      //do child code
    }else{
      //do parent code
    }
    
  • Fork can be implemented easily:
  • exec loads a program from file into memory
  • Isn't copying memory from parent to child wasteful if the child is going to execute a new program right away? (trick: copy-on-write)
  • Processes also need to synchronize with each other: kill, wait, ...
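  • Putting fork, exec and wait together: a small sketch using the standard POSIX calls. The function name run_program and the exit code 127 for a failed exec are choices made for this example, not something the lecture prescribes.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run the program at `path` in a child process and wait for it.
 * Returns the child's exit status, or -1 if it could not be obtained. */
int run_program(const char *path, char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1; /* fork failed: no child was created */

    if (pid == 0) {
        /* Child: replace this process image with the new program. */
        execv(path, argv);
        _exit(127); /* reached only if execv failed */
    }

    /* Parent: block until the child terminates. */
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

  • For example, run_program("/bin/sh", argv) with argv = {"sh", "-c", "exit 3", NULL} should return 3. Because the child calls execv immediately, almost none of the memory copied by fork is ever used, which is the case copy-on-write is meant to make cheap.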

    H/w Privilege

  • In L3, we talked about how the kernel can only properly isolate (enforce modularity) different application processes with h/w support. For example,
  • Applications are not only negligent, they could also be malicious.
  • How to prevent application processes from disabling timer interrupts, modifying page tables, directly communicating with h/w?
  • H/w support for privilege levels. Main idea: Running code has "privilege" associated with it. High-privilege code (the operating system) can run any instruction it likes. Low-privilege code (user code) can only run "safe" instructions.
  • x86 supports four privilege levels (0 to 3), of which a typical OS like Linux uses only two (0: most privileged, 3: least privileged).
  • The currently running code has a current privilege level (CPL), encoded in the low two bits of the Code Segment (CS) register.
  • The processor performs a check before executing each dangerous instruction, which looks like the following pseudo-code:
    if (CPL != 0)
      raise exception;
    else
      execute instruction;
    
  • What instructions should be "protected" (i.e. "dangerous")?
  • User-level applications can neither set the CPL nor directly execute "dangerous" instructions. If they try, the processor generates an exception and gives control back to the kernel.
  • Kernel must set the CPL correctly before jumping to user code.
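  • The check can be modeled in a few lines of C. This is a toy simulation of the lecture's simplified rule, not real hardware behavior: the instruction list and the selector values 0x08 and 0x1B are illustrative assumptions, and on real x86 the I/O instructions are governed by IOPL rather than a plain CPL == 0 test.

```c
#include <stdint.h>

/* A handful of made-up instruction tags for the simulation. */
enum instr { INSTR_ADD, INSTR_INB, INSTR_OUTB, INSTR_CLI, INSTR_LOAD_PAGE_TABLE };
enum outcome { EXECUTED, EXCEPTION };

struct cpu {
    uint16_t cs; /* code segment register; low two bits hold the CPL */
};

/* Extract the current privilege level from CS. */
unsigned cpl(const struct cpu *c)
{
    return c->cs & 0x3;
}

/* In this toy model everything except plain arithmetic is "dangerous". */
int is_dangerous(enum instr i)
{
    return i != INSTR_ADD;
}

/* The hardware check from the pseudo-code: dangerous instructions
 * raise an exception unless CPL is 0. */
enum outcome step(const struct cpu *c, enum instr i)
{
    if (is_dangerous(i) && cpl(c) != 0)
        return EXCEPTION;
    return EXECUTED;
}
```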

    Protected control transfer

  • Application processes cannot directly execute "dangerous" x86 instructions like inb/outb but must invoke OS services (functions) instead.
  • App cannot directly call kernel functions. Why?
  • Apps invoke kernel services (syscalls) through a protected control transfer, using the x86 instruction int
  • int generates a trap exception. (Intel classifies software-generated interrupts as exceptions)
  • Interrupts are handled through an interrupt gate or a trap gate
  • Each interrupt is associated with a number used for identifying its corresponding gate
  • When an interrupt occurs, the following steps are performed:
    1. Processor switches to a numerically lower (more privileged) privilege level, e.g. with int
    2. Processor performs a stack switch if switching to a numerically lower privilege level
    3. Processor saves the EFLAGS, CS, EIP registers on the stack
    4. Processor jumps to the interrupt/trap gate based on the interrupt #.
    5. Kernel stores the general registers (%eax,%ebx,...) into memory
    6. Kernel performs the rest of interrupt handling logic
    7. Kernel invokes the iret instruction to return to interrupted procedure (restores saved EIP, CS, EFLAGS, performs stack switching, resets privilege)
  • Note which of the tasks are done by h/w (processor) and which are done by the kernel.
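  • The hardware half of the steps above (saving EFLAGS, CS, EIP, jumping through the gate, and the matching iret) can be sketched as a toy simulation. Everything here is an assumption made for illustration: the struct layouts, the names set_gate, do_int and do_iret, and the small array standing in for the kernel stack. The stack switch and the gate's own privilege check are left out.

```c
#include <stdint.h>

#define NGATES 256
#define KSTACK_WORDS 8

struct cpu {
    uint32_t eip, eflags;
    uint16_t cs;                   /* low two bits are the CPL */
    uint32_t kstack[KSTACK_WORDS]; /* toy kernel stack */
    int ksp;                       /* number of words currently pushed */
};

struct gate {
    uint16_t cs;          /* code segment the handler runs in */
    uint32_t handler_eip; /* handler entry point */
    int present;
};

static struct gate idt[NGATES]; /* gate table indexed by interrupt # */

/* Kernel-side setup: install a handler for interrupt n. */
int set_gate(int n, uint16_t cs, uint32_t handler_eip)
{
    if (n < 0 || n >= NGATES)
        return -1;
    idt[n] = (struct gate){ cs, handler_eip, 1 };
    return 0;
}

/* Steps 1, 3 and 4: save EFLAGS, CS, EIP, then enter the gate. */
int do_int(struct cpu *c, int n)
{
    if (n < 0 || n >= NGATES || !idt[n].present)
        return -1;
    if (c->ksp + 3 > KSTACK_WORDS)
        return -1;
    c->kstack[c->ksp++] = c->eflags;
    c->kstack[c->ksp++] = c->cs;
    c->kstack[c->ksp++] = c->eip;
    c->cs = idt[n].cs;           /* CPL changes because CS changes */
    c->eip = idt[n].handler_eip; /* control continues in the handler */
    return 0;
}

/* Step 7: iret pops EIP, CS, EFLAGS, which also restores the old CPL. */
int do_iret(struct cpu *c)
{
    if (c->ksp < 3)
        return -1;
    c->eip = c->kstack[--c->ksp];
    c->cs = (uint16_t)c->kstack[--c->ksp];
    c->eflags = c->kstack[--c->ksp];
    return 0;
}
```

  • The point of the gate table is that user code picks only the interrupt number; the entry address and the new privilege level come from a table that only the kernel can write.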

    System call

  • On 32-bit x86, Linux invokes syscalls using int 0x80 (interrupt # 128)
  • The syscall number is passed in %eax
  • In the interrupt handling routine, the kernel invokes the right syscall function based on the syscall #.
  • The return value of the syscall is passed via %eax
  • Linux enables tracing of syscalls invoked by a user-level process, e.g. strace -p 6778, where 6778 is the process ID.
  • Tracing is implemented by invoking tracing functions upon entering and exiting the syscall handling routines.
  • Note that a user application usually uses a library wrapper function to access syscalls.
  • Minilab1 implements syscall differently from Linux (or any other typical modern OS). Does it use a single interrupt number for all its syscalls?
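  • Dispatching on the syscall number is usually a table lookup. The sketch below assumes a saved-register struct (here called trapframe) and two invented syscalls; the numbers, names and the fixed pid are hypothetical and do not match Linux's real table. Returning a negative error code for an unknown number mirrors the Linux convention of returning -ENOSYS.

```c
#include <stddef.h>

#define SYS_GETPID 0 /* hypothetical syscall numbers */
#define SYS_ADD    1
#define NSYSCALLS  2
#define ERR_NOSYS  (-38) /* stands in for -ENOSYS */

/* Registers saved by the kernel on entry (step 5 of interrupt handling). */
struct trapframe {
    long eax; /* in: syscall number, out: return value */
    long ebx; /* first argument */
    long ecx; /* second argument */
};

typedef long (*syscall_fn)(struct trapframe *);

static long sys_getpid(struct trapframe *tf)
{
    (void)tf;
    return 42; /* a fixed pid, just for the demo */
}

static long sys_add(struct trapframe *tf)
{
    return tf->ebx + tf->ecx;
}

static const syscall_fn syscall_table[NSYSCALLS] = {
    [SYS_GETPID] = sys_getpid,
    [SYS_ADD]    = sys_add,
};

/* Called from the int 0x80 handler: look up %eax, run the function,
 * and leave the result in %eax for the return to user code. */
void syscall_dispatch(struct trapframe *tf)
{
    long nr = tf->eax;
    if (nr < 0 || nr >= NSYSCALLS)
        tf->eax = ERR_NOSYS;
    else
        tf->eax = syscall_table[nr](tf);
}
```

  • The bounds check matters: the syscall number comes from untrusted user code, so the kernel must never index the table with it unchecked.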

    H/w vs. software fault isolation

  • H/w privilege level support allows us to forbid apps from executing "dangerous" instructions.
  • Software-enforced isolation idea 1: Why not parse app code to throw away "dangerous" instructions?
  • Software idea 2: why not create an intermediate "instruction set" (and an associated runtime system) to forbid "dangerous" instructions?
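  • To make idea 2 concrete, here is a toy stack machine whose runtime simply refuses a "dangerous" opcode. The instruction set (OP_PUSH, OP_ADD, OP_HALT, OP_OUTB) and the function vm_run are invented for this sketch; real systems built on this idea are far more elaborate.

```c
#define VM_STACK 16

enum op { OP_PUSH, OP_ADD, OP_HALT, OP_OUTB };

struct insn {
    enum op op;
    int arg; /* used by OP_PUSH only */
};

/* Interpret `n` instructions. On OP_HALT, store the top of the stack in
 * *result and return 0. Return -1 for a forbidden opcode, a stack
 * error, or a program that runs off the end. */
int vm_run(const struct insn *code, int n, int *result)
{
    int stack[VM_STACK];
    int sp = 0; /* number of values on the stack */

    for (int pc = 0; pc < n; pc++) {
        switch (code[pc].op) {
        case OP_PUSH:
            if (sp >= VM_STACK)
                return -1;
            stack[sp++] = code[pc].arg;
            break;
        case OP_ADD:
            if (sp < 2)
                return -1;
            sp--;
            stack[sp - 1] += stack[sp];
            break;
        case OP_HALT:
            if (sp < 1)
                return -1;
            *result = stack[sp - 1];
            return 0;
        default:
            /* OP_OUTB and anything unknown: the runtime never touches
             * the device on the program's behalf. */
            return -1;
        }
    }
    return -1;
}
```

  • The isolation here comes from the interpreter, not the processor: the program can only do what vm_run is willing to do for it, at the cost of interpreting every instruction.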

    Summary: Organization of a Modern Operating System

  • Picture of user-level processes, OS, syscall interfaces
  • Kernel runs with full privilege over the hardware.

  • Above is the traditional OS organization: monolithic OS
  • Kernel is a big program occupying a single address space
  • All kernel code runs w/ full h/w privilege (CPL=0)
  • good: fast, easy for sub-systems to cooperate (e.g. paging and file system) via simple function calls
  • bad: no isolation within kernel. One buggy component affects everything else.

  • Alternative organization: microkernel
  • Split up kernel subsystems into server processes
  • app communicates w/ servers via IPC
  • Kernel's task: implement fast IPC
  • good: simple/efficient kernel, sub-systems isolated, better-enforced modularity
  • bad: cross-sub-system optimization harder, lots of IPCs may be slow
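  • A rough user-level analogy of the app/server split, using two pipes as the IPC channel and a forked process as the "server". This is only a sketch under those assumptions: a real microkernel provides its own IPC primitive, and the service (squaring an integer) and the names server_loop and ipc_square are made up.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Server side: answer each int request with its square until EOF. */
static void server_loop(int req_fd, int rep_fd)
{
    int x;
    while (read(req_fd, &x, sizeof x) == (ssize_t)sizeof x) {
        int y = x * x;
        if (write(rep_fd, &y, sizeof y) != (ssize_t)sizeof y)
            break;
    }
}

/* Client side: start a server process, send one request over IPC,
 * and wait for the reply. Returns 0 on success, -1 on failure. */
int ipc_square(int x, int *out)
{
    int req[2], rep[2];
    if (pipe(req) < 0)
        return -1;
    if (pipe(rep) < 0) {
        close(req[0]); close(req[1]);
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0) {
        close(req[0]); close(req[1]);
        close(rep[0]); close(rep[1]);
        return -1;
    }
    if (pid == 0) {
        /* Server process: keep only its ends of the two channels. */
        close(req[1]); close(rep[0]);
        server_loop(req[0], rep[1]);
        _exit(0);
    }

    /* Client: one request message, one reply message. */
    close(req[0]); close(rep[1]);
    int ok = write(req[1], &x, sizeof x) == (ssize_t)sizeof x
          && read(rep[0], out, sizeof *out) == (ssize_t)sizeof *out;
    close(req[1]); /* EOF tells the server to exit */
    close(rep[0]);
    waitpid(pid, NULL, 0);
    return ok ? 0 : -1;
}
```

  • Each request costs two messages and two process switches where a monolithic kernel would make one function call, which is why fast IPC is the microkernel's central concern.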

  • Monolithic OS remains the most popular today.

    Digression: H/w emulation