ANU crest

Research School of
Computer Science

Concurrent & Distributed
Systems
(comp2310, comp6310)

Lectures
<empty space> Schedule
<empty space> Contents
(incl. slides)
<empty space>
Laboratories
<empty space> Setup & manuals
<empty space> Weekly labs
<empty space> Assignments
<empty space>
Assessment
<empty space> Examination
<empty space>
Resources
<empty space> References
<empty space> Sources
<empty space> Description
<empty space> Forum

<empty space>

Laboratory 7

Fighting Zombies with Forks and Pipes


You are expected to use the man pages! Remember that man pages with the same title may appear in multiple sections. If you want to search all sections use "man -a".


Process Identifiers


Every Unix process is associated with a unique non-negative integer corresponding to its process ID. There are some special processes, such as process ID 0 that is usually associated with the scheduler, while process ID 1 (remember this number) is usually the init process that is invoked by the kernel at the end of the bootstrap procedure. Two useful functions that we will use in this lab are:
pid_t getpid(void);
pid_t getppid(void);
ACTIONS
  • (BEFORE LAB) Read the man page for these functions and determine what they do.
  • (BEFORE LAB) Execute the command top - what does it do?

The fork Function


The only way a new process is created by the Unix kernel is when an existing process calls the fork function. This function is called once but returns twice as two processes - the parent and the child process. The only difference between the two processes is the value of the return code, in the parent the return code gives the PID of the child, while in the return value of the child is 0. A small example code is given below:
#include <sys/types.h>
#include <unistd.h>
#include <stdio.h>
int main(void){
  pid_t fork_pid;
  printf(" Hello from main \n");
  fork_pid=fork();
  printf(" After fork %d\n",fork_pid);
  return 0;
}
ACTIONS
  • Compile the above code (using gcc) and verify that it does print out two "After fork" messages.
  • (AFTER LAB) Using the functions getpid and getppid modify the above code to obtain and printout the PID of both the parent and child process after the fork command. Thus verifying that the value of fork_pid in the parent process is indeed the PID of the child.
  • Modify the above code to print out different messages from the parent and child after fork.
  • The following program is identical to the above except that the output is now written to a file called "MY_FORK_FILE.txt":
    #include <sys/types.h>
    #include <unistd.h>
    #include <stdio.h>
    int main(void){
      pid_t fork_pid;
      FILE *my_file;
    
      my_file=fopen("MY_FORK_FILE.txt","w");
      fprintf(my_file," Hello from main \n");
      fork_pid=fork();
      fprintf(my_file," After fork %d\n",fork_pid);
    
      return 0;
    }
    
    Compile and run this code, then examine the contents of "MY_FORK_FILE.txt".
    • The first thing you should note is that the output from the child process appears in the same file as the parent process. One characteristic of fork is that all file descriptors that are open in the parent are duplicated in the child.
    • The second thing you should note is the number of times "Hello from main" appears in the file. Given your earlier experiments does this surprise you? (It should!).
  • We modify the code further as follows:
    #include <sys/types.h>
    #include <unistd.h>
    #include <stdio.h>
    int main(void){
      pid_t fork_pid;
      FILE *my_file;
    
      my_file=fopen("MY_FORK_FILE.txt","w");
      fprintf(my_file," Hello from main \n");
      fflush(my_file);
      fork_pid=fork();
      fprintf(my_file," After fork %d\n",fork_pid);
    
      return 0;
    }
    
    Compile and run this code. Examine the contents of "MY_FORK_FILE.txt", is this now in line with what you expected from your first experiments. Explain exactly what has happened. (It is important you understand this so verify your thoughts with your tutor.)

The wait and waitpid Functions


When a process terminates, either normally or abnormally, the parent is notified by the kernel sending the parent the SIGCHLD signal. (What's a signal you ask - do "man signal", and/or ask your tutor). The parent can choose to ignore this signal, or it can provide a function that is called when the signal occurs (an interrupt handler). The default action is for this signal to be ignored. Functions wait and waitpid permit the parent to check on the termination status of its children. These functions do one of the following:
  • block
  • return immediately with the termination status of a child
  • return immediately with and error
ACTIONS
In the following code we create two processes. The child process is forced to sleep for some period (5 seconds - see "man sleep"). To verify that the process is sleeping we have inserted a number of timestamps.
    #include <sys/types.h>
    #include <unistd.h>
    #include <stdio.h>
    #include <sys/time.h>
    #define SLEEP_TIME 5
    double get_elapsed_time();
    int main(void){
      pid_t fork_pid;
      double elp_time;
    
      elp_time = get_elapsed_time();
      printf("Hello from main   %14.6lf\n",elp_time);
    
      fork_pid=fork();
      if (fork_pid == 0) {
        elp_time = get_elapsed_time();
        printf("1st Hello from Child  %14.6lf\n",elp_time);
        sleep(SLEEP_TIME);
        elp_time = get_elapsed_time();
        printf("2nd Hello from Child  %14.6lf\n",elp_time);
      }else{
        elp_time = get_elapsed_time();
        printf("1st Hello from Parent %14.6lf\n",elp_time);
        elp_time = get_elapsed_time();
        printf("2nd Hello from Parent %14.6lf\n",elp_time);
      }
        elp_time = get_elapsed_time();
        printf("Good bye from %8d %14.6lf\n",fork_pid,elp_time);
      return 0;
    }
    double get_elapsed_time(void){
      /* Routine to get wall clock time since first call */
      struct timeval mytime;
      static double start_time=0.0e0;
      double new_time;
    
      /* this function gets seconds and microseconds since the epoch */
      gettimeofday(&mytime,NULL);
      new_time=(double)mytime.tv_sec + mytime.tv_usec*1.0e-06;
      if (start_time <= 0.0e0) start_time=new_time; return new_time-start_time; } 
  • Compile and run the above code. Verify that the parent does indeed complete substantially before the child.
  • Using the wait system call force the master to wait until the child process has completed until it prints out "2nd Hello from Parent".
  • (AFTER LAB) Modify the above code so that the master creates 2 child processes and waits for both children to complete before exiting itself. (Still use wait and not wait_pid).

Zombies


In Unix terminology a process that has terminated, but whose parent has not yet waited for it, is called a zombie, i.e. it is a process that is not quite dead - it can no longer execute instructions, but information concerning this process is still taking up space in the process tables. Because of the latter, and since often a process does not care what happens to its child, the question arises as to how can we avoid a child process becoming a zombie?

ACTIONS
Consider the following program:

    #include <unistd.h>
    #include <stdio.h>
    int main(void){
      printf(" Hello from main \n");
    
      if (fork() != 0){
        /* THIS IS THE PARENT */
        printf(" Hello from Parent \n");
        sleep(2);
      } else {
        printf("Parent PID of Child before sleep %d\n",getppid());
        sleep(4);
        printf("Parent PID of Child after sleep  %d\n",getppid());
      }
      return 0;
    }
    
  • Compile and run this code. Explain why the values printed out by the child process before and after the sleep(4) statement.
  • Noting the above, a parent who does not care about the termination status of a child process can avoid the creation of a zombied process by using fork twice. Explain how this is done.


The exec Functions


As demonstrated above, fork enables the user to create new processes. These processes are, however, a clone of the calling process. If we want to create a new process running a different executable we have to follow the initial fork command with a call to one of the exec family of functions. (Essentially members of this family differ in how they locate the executable, the treatment of command line arguments, and their treatment of current execution environment. Do man exec to view the details.) When a process calls exec that process is completely replaced by the new program, and the new program starts executing at its main function. The process ID does not change, but the current process (its text, data, heap and stack segments) are replaced with a brand new program from disk.

The following gives a short example of the use of exec.

    #include <unistd.h>
    #include <stdio.h>
    int main(void){
      int status;
      if (fork()){
        // Parent
      } else {
        // Child
        printf(" HELLO from Child \n");
        status=execl("/bin/ls","ls","-l","-t",'\0');
        if (status != 0)perror("execl error:");
      }
      return 0;
    }
    

ACTIONS

  • Compile and run the above code, verify that it works.
  • (AFTER LAB) Modify the above code so that it works when "/bin/ls" is replaced with "ls", i.e. which version of exec should you now be using?
  • (AFTER LAB) Add the "-r" option to the argument list.
  • Modify the above code so that instead of executing "ls" the child process picks up an executable (residing in your current working directory) that you have previously generated by compiling the following source code:
    #include <unistd.h>
    #include <stdio.h>
    int main(int argc, char *argv[]){
      int i,n,total;
      /* simple program to sum integers from 1..n, where n is given
         as a command line argument */
    
      if (argc != 2 ){
        printf("Error no input argument\n");
        exit(-1);
      }else{
        // convert the character string argument to an integer
        n = atoi(argv[1]);
      }
      total=0;
      for (i=1;i<=n;i++)total+=i;
      printf("value of n %d value of total %d \n",n,total);
      
      return 0;
    }
    
    test the code by passing (in the command line) a variety of different values for n.

The pipe Function


"Unix IPC (Interprocess Communication) has been, and continues to be, a hodgepodge of different approaches, few of which are portable across all Unix implementations. .... About the only form of IPC that we can count on, regardless of the Unix implementation, is half duplex pipes" (Stevens, Advanced Programming in the UNIX Environment).

Although common to all flavors of Unix, pipes have two limitations:

  • They are half-duplex.
  • They can only be used between processes that have common ancestry
(make sure you know what the above means - it was part of an exam question).

A pipe is like an I/O buffer administered by the kernel through which data can flow. We create a pipe by a call to the pipe function. Two file descriptors are returned, one that is used for reading and one that is used for writing to the pipe. The following illustrates a pipe in a single process.

    #include <sys/types.h>
    #include <unistd.h>
    #include <string.h>
    #include <stdio.h>
    int main(void){
      int status;
      char string1[]="Hello Mate";
      char string2[]="Get Lost Mate";
      int my_pipe[2];
      int len1,len2;
      status = pipe(my_pipe);
    
      len1 = strlen(string1);
      len2 = strlen(string2);
      printf(" Initial String1 \"%s\" Length %d\n",string1, len1);
      printf(" Initial String2 \"%s\" Length %d\n",string2, len2);
    
      status = write(my_pipe[1],string1, len1);
      printf("Write status %d \n",status);
      status = read(my_pipe[0],string2,len2);
      printf("Read status  %d \n",status);
      printf("Read String %20s\n",string2);
      return 0;
    }
    
the program declares two character strings that are initialized to some random strings. The program writes character string1 to the pipe and then reads from the pipe into character string2.

ACTIONS

  • Compile and run the above code. In both cases what is the value reported by status.
  • Modify the code to investigate what happens if we read less data from the pipe than was initially written.

In a single process a pipe is next to useless. Normally they are used together with fork to provide an IPC channel between a parent and child process. To do this two pipes (A and B) are defined before the process issues a fork command - this gives rise to 4 file descriptors; fd_A_read, fd_A_write, fd_B_read, and fd_B_write. After forking both the parent and child has copies of all these file descriptors. To turn pipe A into a channel that provides communication from

  • parent to child: parent closes fd_A_read and child closes fd_A_write. This provides a one-way channel from parent to child.
  • child to parent parent closes fd_A_write and child closes fd_A_read. This provides a one-way channel from child to parent
Thus with two pipes we can use one for communication from parent to child (e.g. pipe A) and the other (e.g. pipe B) for communication from child to parent. This is illustrated in the following program.
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <string.h>
    #include <sys/wait.h>
    int main(void){
      char string1[]="Hello Mate";
      char string2[]="XXXXXXXXXX";
      char string3[]="*Get Lost*";
      int fd_pc[2],fd_cp[2];
      int status;
      pipe(fd_pc);
      pipe(fd_cp);
    
      if (fork()) {
        /* PARENT */
        close(fd_pc[0]); /* close read  pc */
        close(fd_cp[1]); /* close write cp */
        write(fd_pc[1],string1,strlen(string1));
        printf("PARENT - string2 BEFORE pipe: %s\n",string2);
        read(fd_cp[0],string2,strlen(string1));
        printf("PARENT - string2 AFTER pipe : %s\n",string2);
        wait(&status);
      } else {
        /* CHILD */
        close(fd_pc[1]); /* close write pc*/
        close(fd_cp[0]); /* close read  cp*/
        printf("CHILD - string2 BEFORE pipe: %s\n",string2);
        read(fd_pc[0],string2,strlen(string1));
        printf("CHILD - string2 AFTER pipe : %s\n",string2);
        write(fd_cp[1],string3,strlen(string3));
      }
      return -1;
    }
    
ACTIONS
  • Compile and run the above code. Make sure you are happy with how it works.
  • You write the following test code to communicate between two processes using pipes:
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <string.h>
    #include <sys/wait.h>
    int main(int argc, char *argv[]){
      int n, *buf1, *buf2, len, total;
      int fd_pc[2],fd_cp[2];
      int status, i;
      pipe(fd_pc);
      pipe(fd_cp);
    
      if (argc != 2 ){
        printf("Error no input argument\n");
        exit(-1);
      }else{
        // convert the character string argument to an integer
        n = atoi(argv[1]);
      }
      len = sizeof(int)*n;
      buf1 = (int*) malloc(len);
      buf2 = (int*) malloc(len);
    
      if (fork()) {
        /* PARENT */
        close(fd_pc[0]); /* close read  pc */
        close(fd_cp[1]); /* close write cp */
    
        for (i=0; i<n; i++)buf1[i]=n;
        write(fd_pc[1],buf1,len);
        read(fd_cp[0],buf2,len);
        total=0;
        for (i=0; i<n; i++)total+=buf1[i]+buf2[i];
        printf("PARENT - total : %d\n",total);
        wait(&status);
      } else {
        /* CHILD */
        close(fd_pc[1]); /* close write pc*/
        close(fd_cp[0]); /* close read  cp*/
        for (i=0; i<n; i++)buf2[i]=n-i;
        write(fd_cp[1],buf2,len);
        read(fd_pc[0],buf1,len);
        total=0;
        for (i=0; i<n; i++)total+=buf1[i]+buf2[i];
        printf("CHILD - total : %d\n",total);
      }
      return -1;
    }
    
    Compile and run the above code. You will find that it works correctly for an input value of N = 1000, but there appears to be a problem when N = 2000. What is the problem and how can you fix it? (Again an exam question has related to this issue).
  • Write a program that establishes a ring of N processes linked via pipes, that is each process has two pipes, a read pipe from the process numbered ((N-1) mod N) and a write pipe to process ((N+1) mod N). The number of processes in the ring is read as input. Regardless of N, no process should require more than 13 file descriptors. When established your ring of processes should pass a message around the ring.
  • The above illustrates the use of pipes to communicate between child and parent processes. The following code illustrates how pipes can be used with fork and exec:

    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    int main(void){
       int fd[2];
       pipe(fd);
       if (fork()) {
          /* PARENT */
          dup2(fd[0], STDIN_FILENO);
          close(fd[0]);
          close(fd[1]);
          execlp("wc", "wc","-l",'\0');
       } else {
          /* CHILD */
          dup2(fd[1], STDOUT_FILENO);
          close(fd[0]);
          close(fd[1]);
          execlp("ls", "ls", "-l","-r","-t",'\0');
       }
       return -1;
    }
    
    explain what the above code is doing. (Try compiling and running it!)

Next Lab (for the brave)

Select, Semaphores, Signals and Sockets in Unix


© The Australian National University
ANU CRICOS Provider Code - 00120C
Thursday, 1 September 2011Uwe R. Zimmer

<empty space>