Lec1: Understand OS concepts w/ the shell

Lecture notes courtesy of MIT 6.828 staff (Frans Kaashoek, Robert Morris)

Here's the simple pseudo-code for a shell:

while (1) {
  write (1, "$ ", 2); //print command prompt
  readcommand (command, args);   // parse user input
  if ((pid = fork ()) == 0) {  // fork a new process, if child
    exec (command, args, 0);
  } else if (pid > 0) {   // if parent
    wait (0);   // wait for child to terminate
  } else {
    perror ("Failed to fork");   // perror appends the error text and a newline
  }
}
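
The fork/exec/wait core of the loop above can be sketched in real C roughly as follows. This is a minimal sketch, not the lecture's implementation: run_command is a hypothetical helper name, and execvp() and waitpid() stand in for the pseudo-code's exec() and wait().

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv[0] with the NULL-terminated argument
 * vector argv and wait for it to finish.
 * Returns the child's exit status, or -1 if fork/wait failed or the
 * child was killed by a signal. */
int run_command(char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0) {                 /* fork failed, no child was created */
        perror("fork");
        return -1;
    }
    if (pid == 0) {                /* child: replace this process image */
        execvp(argv[0], argv);
        perror("execvp");          /* reached only if exec failed */
        _exit(127);
    }

    int status;                    /* parent: wait for this child */
    if (waitpid(pid, &status, 0) < 0) {
        perror("waitpid");
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A shell would call run_command() once per parsed input line, after printing the prompt and reading the command.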

O/S process abstractions

O/S I/O abstractions

I/O redirection

  • How does the shell implement:
    $ ls > tmp1
    
    just before exec insert:
    close(1);
    creat("tmp1", 0666);   // fd will be 1
    exec (command, args, 0);
    

    The kernel always allocates the lowest-numbered free file descriptor, 1 in this case. The shell could instead use dup2() to copy a file descriptor to a specific number.

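  • Good illustration of why Unix separates fork and exec, in contrast to CreateProcess on Windows: the shell can rearrange the child's file descriptors between fork and exec, and the program being run needs no special support for redirection.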
  • What if you run the shell itself with redirection?
    $ sh < script > tmp1
    
    If for example the file script contains
    echo one
    echo two
    
  • What if we want to redirect multiple FDs (stdout, stderr) for programs that print to both?
    $ ls f1 f2 nonexistent-f3 > tmp1 2> tmp1
    
    after creat, insert:
    close(2);
    creat("tmp1", 0666);   // fd will be 2
    
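    Why is this bad? Illustrate what's going on with the file descriptors. (The two creat calls produce two independent open files, each with its own offset, so output written through one descriptor overwrites output written through the other.) Better:
    close(2);
    dup(1);                // fd will be 2, sharing fd 1's offset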
    
    or, in Bourne shell syntax:
    $ ls f1 f2 nonexistent-f3 > tmp1 2>&1
    
  • Linux exposes each process and its FDs under /proc/PID/; the open descriptors appear as symbolic links in /proc/PID/fd (do "man proc" to learn more)
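
The redirection steps above can be combined into one sketch. It is illustrative only: run_redirected and its share_stderr flag are hypothetical names, open() with O_CREAT|O_TRUNC stands in for creat(), and dup2() replaces the close-then-dup idiom. With share_stderr set, fd 2 duplicates fd 1 (as in 2>&1); otherwise fd 2 is opened on the file separately (as in 2> tmp1).

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv with stdout redirected to `path`.
 * share_stderr != 0: fd 2 is a duplicate of fd 1 (shared offset).
 * share_stderr == 0: fd 2 is a second, independent open of `path`.
 * Assumes fds 0, 1 and 2 are open in the caller.
 * Returns the child's exit status, or -1 on failure. */
int run_redirected(char *const argv[], const char *path, int share_stderr)
{
    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        return -1;
    }
    if (pid == 0) {
        /* Child: rearrange descriptors between fork and exec. */
        int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0666);
        if (fd < 0 || dup2(fd, STDOUT_FILENO) < 0)
            _exit(126);
        if (fd != STDOUT_FILENO)
            close(fd);

        if (share_stderr) {
            /* Like 2>&1: both descriptors share one file offset. */
            if (dup2(STDOUT_FILENO, STDERR_FILENO) < 0)
                _exit(126);
        } else {
            /* Like 2> path: a separate open file with its own offset. */
            int fd2 = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0666);
            if (fd2 < 0 || dup2(fd2, STDERR_FILENO) < 0)
                _exit(126);
            if (fd2 != STDERR_FILENO)
                close(fd2);
        }
        execvp(argv[0], argv);
        _exit(127);                /* reached only if exec failed */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Running sh -c 'echo out; echo err >&2' through this helper shows the difference: with share_stderr set the file holds both lines, while with two separate opens the second write starts at offset 0 and overwrites the first.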

Pipelined commands

  • How do we run a series of programs on some data?
    $ sort < file.txt > tmp1
    $ uniq tmp1 > tmp2
    $ wc -l tmp2
    $ rm tmp1 tmp2
    
    can be more concisely done as:
    $ sort < file.txt | uniq | wc -l
    
  • A pipe is an O/S abstraction that implements a one-way communication channel. Here is an example of how a user program uses this abstraction:
    int fdarray[2];
    char buf[512];
    int n;
    
    pipe(fdarray);
    write(fdarray[1], "hello", 5);
    n = read(fdarray[0], buf, sizeof(buf));
    // buf[] now contains 'h', 'e', 'l', 'l', 'o'
    
  • File descriptors are inherited across fork(), so this also works:
    int fdarray[2];
    char buf[512];
    int n, pid;
    
    pipe(fdarray);
    pid = fork();
    if(pid > 0){
      write(fdarray[1], "hello", 5);
    } else {
      n = read(fdarray[0], buf, sizeof(buf));
    }
    
  • How does the shell implement pipelines (i.e., cmd1 | cmd2)? We want to arrange that the output of cmd1 is the input of cmd2.
  • The shell creates a process for each command in the pipeline, hooks up their stdin and stdout, and waits for the last process of the pipeline to exit. Here's a sketch of what the shell does, in the child process of the fork() we already have, to set up a pipe:
    int fdarray[2];
    
    if (pipe(fdarray) < 0) panic ("error");
    if ((pid = fork ()) == 0) {  // child (left end of pipe)
      close (1);
      tmp = dup (fdarray[1]);   // fdarray[1] is the write end, tmp will be 1
      close (fdarray[0]);       // close read end
      close (fdarray[1]);       // close fdarray[1]
      exec (command1, args1, 0);
    } else if (pid > 0) {        // parent (right end of pipe)
      close (0);
      tmp = dup (fdarray[0]);   // fdarray[0] is the read end, tmp will be 0
      close (fdarray[0]);
      close (fdarray[1]);       // close write end
      exec (command2, args2, 0);
    } else {
      printf ("Unable to fork\n");
    }
    
  • Why close the read end and the write end? (To ensure that every process starts with 3 file descriptors, and that reading from the pipe returns end of file after the first command exits.)
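
The pipeline setup above can be sketched as a compilable function. This is one possible arrangement, not the lecture's: run_pipeline is a hypothetical name, the caller forks one child per command (rather than exec'ing command2 in the forking process), and dup2() replaces close-then-dup. Note that every process closes both pipe ends once the descriptors are in place; otherwise the right-hand command would never see end of file.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run `left | right`.
 * Assumes fds 0, 1 and 2 are open in the caller.
 * Returns the exit status of `right`, or -1 on failure. */
int run_pipeline(char *const left[], char *const right[])
{
    int fds[2];
    if (pipe(fds) < 0) {
        perror("pipe");
        return -1;
    }

    pid_t lpid = fork();
    if (lpid < 0) {
        perror("fork");
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (lpid == 0) {               /* left command: stdout -> write end */
        dup2(fds[1], STDOUT_FILENO);
        close(fds[0]);
        close(fds[1]);
        execvp(left[0], left);
        _exit(127);
    }

    pid_t rpid = fork();
    if (rpid < 0) {
        perror("fork");
        close(fds[0]);
        close(fds[1]);
        waitpid(lpid, NULL, 0);
        return -1;
    }
    if (rpid == 0) {               /* right command: stdin <- read end */
        dup2(fds[0], STDIN_FILENO);
        close(fds[0]);
        close(fds[1]);
        execvp(right[0], right);
        _exit(127);
    }

    /* The caller must close its copies too, or the reader never sees EOF. */
    close(fds[0]);
    close(fds[1]);

    int status;
    waitpid(lpid, NULL, 0);
    if (waitpid(rpid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, calling it with echo hello on the left and grep -q absent on the right returns grep's failure status only because grep reaches end of file, which happens once all write ends of the pipe are closed.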