
COMP2100/2500
Lecture 21: Shell Programming I

Summary

An introduction to writing shell scripts using bash.

Aims


History lesson

Not so long ago, if you wanted to run a program on a computer, you would have to submit a job, in the form of a stack of punched cards. That stack contained a card for each line of your program, a card for each line of input data, and a number of cards that described how to run the program and access other devices attached to the computer. The instructions on those control cards were written using the computer's job control language, or JCL for short. Indeed, JCL is also the name of IBM's job control language. Here's an example (taken from Appendix 2 of IBM 360 Assembler Language Programming by Gopal K. Kapur, published by Wiley in 1970). Each line corresponds to a separate punched card.

// JOB PRACTICE
// ASSGN SYS012,X'182'
// ASSGN SYS014,X'01F'
// ASSGN SYS010,X'181'
// OPTION LINK
// EXEC ASSEMBLY
The set of cards for the program goes here
/*
// EXEC LNKEDT
// EXEC
The set of cards for the data goes here
/*
/&

The job control cards specified how to compile the program, which devices to use and how they should be configured (e.g. the recording density of the magnetic tape in the tape drive), and where in the stack of cards the program and data start and finish. More advanced job control languages included commands for specifying the name of the user running the job, accounting information (in the days of expensive shared machines accounting was strict, since time was money; the same is still true on cutting-edge supercomputers), the priority of the job, and so on.

With the rise of so-called on-line (i.e. interactive) systems, the job control language did not disappear, but the commands could now be entered directly into the computer by the user.

Today's shells are the descendants of the early job control languages. They incorporate many of the features we have discussed above, but also provide higher-level structuring facilities that we are used to in conventional imperative programming languages (such as Java). Indeed, we will be interested in the shell as a programming language.

The Shell

Here is a simplified hierarchical view of a computer system and its users.

(Diagram: users at the top, then the shell, then the operating system, then the hardware.)

Hardware is roughly characterized as those parts of the system which have an immediate physical reality.

The Operating System is low-level system software that is responsible for controlling the hardware and providing facilities to application programs.

The Shell conveys user requests to the OS.

In some systems, the shell is just one component of the operating system (and thus can control devices and processes directly). However, in Unix, the shell has no such special privileges, and has the status of a normal application program.


What is a shell?

In the schema shown, the shell is the program that acts as the user's interface to the operating system. When the user logs in to the computer system, they begin interacting with the shell program in order to start (and stop) application programs.

One of your tasks this semester is to move from being an ordinary user of the computer system to becoming an expert user; a necessary condition for becoming an effective software developer. You will need to become proficient at using the shell. We shall illustrate how the command line (shell) and the GUI can handle the same tasks in the third shell lecture.


What is a Shell Script?

A shell script is simply a text file containing a sequence of shell commands, which the shell reads and executes one after another, just as if you had typed them at the command line.

Which Shell?

“The nice thing about standards is that there are so many of them to choose from.”

Grace Murray Hopper, as quoted in the Unix Haters Handbook, p.10


There have been many different shell programs written for Unix systems, and most are available for you to try. They include:

  1. The Bourne Shell (sh) -- The classic early UNIX shell, written by Stephen Bourne at Bell Labs in the late 1970s (the Unix heroic years), replacing Ken Thompson's original shell.

  2. The C Shell (csh) -- A replacement shell with syntax like that of the C language, written by Bill Joy between 1978 and 1980 (then a graduate student at Berkeley). Joy also created vi and, much later, worked on the design of Java. Despite the good intent, csh turned out to be poorly designed and buggy. It should not be used today; it is considered harmful: "The csh is a tool utterly inadequate for programming, and its use for such purposes should be strictly banned!"

  3. The T C Shell (tcsh) -- A modernised C shell with command and file completion. (This is the default shell on student accounts.) Bill Joy's source code was the starting point; around sixty people contributed code in the years 1980-1998. (The 'T' stands for TENEX, an operating system that was once a rival to Unix.)

  4. The Bourne Again Shell (bash) -- A modernised Bourne shell incorporating many of the best features of sh and tcsh. Written by Brian Fox and Chet Ramey starting in 1989; the latest version dates from 2002. This is the standard shell on many Unix systems.

  5. Others such as ksh (David Korn, AT&T, the default on System V), zsh, rc (from Plan 9), . . . .

In this course we will focus on bash, because it is the standard shell on many Unix and Linux systems, it incorporates the best features of its predecessors, and (unlike csh) it is well suited to writing scripts.

Both csh and tcsh can also be used for scripting purposes. It's good to be able to read both styles (see the above remark regarding the C shell, though). Remember that your default login shell is tcsh, so to try out some of the things shown here on a command line, you will need to start a copy of bash.

If you are not sure which shell you are currently running, type ps -f at the command line to find out.


Our First Shell Script

It's traditional for the first program in any new language to be one which just writes the string “Hello world!” on the screen. So here it is in bash.

  1. First you must open a shell window. You can do this from the graphical interface by selecting the icon that looks like a terminal and a shell from the panel at the bottom of the screen.

  2. Create a file called hello in your bin directory. (From here on, stuff that you type at the command line is shown in blue.)

    [comp2100@partch]$ cd ~/bin
    [comp2100@partch]$ emacs hello

    The shell has a notion of current directory. When you open a new shell window, the current directory will be your home directory. Its path is something like /students/u1234567/. The cd command changes the current directory. It moves you around the directory tree. The tilde symbol ~ is a shorthand way of writing the path to your home directory. So cd ~/bin means “Move to the bin sub-directory of my home directory.”

  3. Type the following two lines into the opened file

    #!/bin/bash
    
    echo Hello world!

    Now save your shell script to the disk.

  4. Make the file executable.

    [comp2100@partch]$ chmod +x hello

    Every Unix file has permissions associated with it, which determine who is allowed to do what to it. The main permissions are read, write and execute, and they are set separately for each of three categories: the user who owns it, other users in the same group, and other users not in the same group. If you type the command ls -l into the shell, you will see a long listing of the contents of the current directory, including all these permissions. The default for new files depends on the system, but it will certainly have read and write permission for the user, and it will not have execute permission for anyone. The chmod command (short for “Change Mode”) modifies these permissions. You can only do it to files you own. (A sketch of what ls -l shows before and after chmod appears just after this list.)

    See the manual page for this command by typing man chmod. (The man command is extremely useful, although you'll probably get more information than you ever wanted. Learning to read man pages is another important task for you for this semester.)

  5. Run your script by typing

    [comp2100@partch]$ ./hello
    Hello world!
    [comp2100@partch]$
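
Here is the sketch promised in step 4: roughly what ls -l shows before and after the chmod (the exact owner, group, size and date will of course differ on your account):

[comp2100@partch]$ ls -l hello
-rw-r--r--   1 u1234567 students   31 May  3 15:43 hello
[comp2100@partch]$ chmod +x hello
[comp2100@partch]$ ls -l hello
-rwxr-xr-x   1 u1234567 students   31 May  3 15:43 hello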

You can also run the bash shell and interact with it just like we have been interacting with tcsh so far.
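
For example (the exact prompts on your machine may look a little different):

[comp2100@partch]$ bash
bash$ echo Hello world!
Hello world!
bash$ exit
[comp2100@partch]$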


Basic Shell Commands

Built in commands.

Some commands are a part of the shell.

External commands.

For any command which is not built in, the shell will search for a program with that name to run. This allows any program to be used as a command in a shell script.

Commands are separated by new lines or a semi-colon ;. For example the single input line

cd ..; ls

moves up to the parent directory and then lists its contents.

Command options are very important if you want your command to do exactly what you want (nothing less, nothing more). To find out about the options, use the man pages (or info, the more modern Unix/Linux documentation system). E.g., to find out what options can be used with the grep command, type man grep. The man pages can at first seem somewhat cryptic; they are written in a special "unixy" style, and the ability to read them is a hallmark of every professional user.
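
For example, here are a few standard grep options (hello is the script we write below, so substitute whatever file you want to search):

grep -i hello ~/bin/hello     # -i: ignore upper/lower case when matching
grep -n echo ~/bin/hello      # -n: prefix each matching line with its line number
grep -c echo ~/bin/hello      # -c: just print the number of matching lines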


Filters and Pipes

Many commands read a stream of input, transform it in some way, and produce a stream of output. Such commands are called filters.

(Diagram: a filter reads a stream of input and writes a stream of output.)

We can pipe the output of one program to be the input of another. The syntax is

program1 | program2

(Diagram: the output of program1 is fed directly into the input of program2.)

The Unix pipe mechanism is unidirectional: data flows only from the first program to the second, never in the other direction.


Examples of Pipes

Counting files (and directories).

If ls lists files in a directory, and wc -w counts words of input, then

ls | wc -w

counts the files in a directory!

Counting all your files:

The command ls -R lists the files in a directory, and all sub-directories. For example, here is part of the output produced by running it in my comp2100 directory:

[barnes@partch comp2100]$ ls -R
.:
assignments
bin
index.src.html
labs
lectures
misc
schedule
www
 
./assignments:
 
./bin:
build
...

and so on for several screens... You can see that the assignments directory was empty, but that the bin directory had a file or subdirectory in it called build.

So you can count all your files (and directories) with:

ls -R ~ | grep -v ':$' | wc -l

How does that work? The output of ls -R is piped into grep. The -v option of grep says select all those lines which do not match the regular expression. The regular expression :$ matches all lines which end with a colon. The dollar sign matches the end of a line. So the result of

ls -R ~ | grep -v ':$'

is just a list of all the file and directory names, one per line, starting at your home directory and going through all sub-directories. Piping that into wc -l (or wc -w) counts them.

Umm, except that the above is plain wrong. The output from ls also includes a lot of blank lines, and these are not stripped out by the call to grep. The quick way to fix this is to add another grep to the pipeline:

ls -R ~ | grep -v ':$' | grep -v '^$' | wc -l

Here, ^ stands for the beginning of a line, so the pattern ^$ means a blank line.
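
As an aside, a similar count can be obtained with the find command, which prints one pathname per line. The totals won't match exactly: find also lists the starting directory itself and the hidden "dot" files that ls skips by default.

find ~ | wc -l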


File Redirection

So far I've been a bit vague about “input” and “output”. Now it's time to fix that. Every Unix program takes its input from a stream called standard input — abbreviated as stdin — and sends its output to a stream called standard output — abbreviated as stdout. I'm not going to go into detail about what a stream is, but you should know that stdin is usually connected (indirectly) to the keyboard and stdout is usually connected (indirectly) to the terminal screen. But they don't have to be: writing command < file makes the command read its standard input from file instead of the keyboard, command > file sends its standard output to file instead of the screen (overwriting the file), and command >> file appends the output to the end of file.
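
For example (essay.txt and listing.txt are made-up filenames):

wc -w < essay.txt             # read standard input from essay.txt
ls -l > listing.txt           # send standard output to listing.txt (overwriting it)
ls -l /tmp >> listing.txt     # append more output to the end of listing.txt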

Redirection and piping can be combined, for example:

spell < lec-bash-1.src.html | sort -u > typos.txt

will find all the spelling mistakes in this lecture, sort them alphabetically (discarding duplicates), and store them in the file typos.txt.

Sometimes we don't care about the output of a command. There is a special file called /dev/null where such output can be sent. It is a digital black hole (this is a very nice metaphor invented by Ian).
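
For example, to run a command but throw away everything it would have printed:

ls -R ~ > /dev/null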


Variables

Assignment:

Variables do not need to be declared before use! This is one of the major differences between bash (and other scripting languages) and Java. This is one of the reasons that scripting languages are good for “quick and dirty” jobs.

Assignment has the form

variable=value

The assigned value must be a single word, or it must be quoted. For example,

bash$ x=hello

is OK, but you need to type

bash$  y="hello world"

The other big difference is that spaces matter here (unlike in Java). There must be no spaces before or after the = sign.

Note that in bash, as in csh, the set of shell variables is not the same as the set of environment variables. However, in bash (unlike in csh) there is a very strong link between these two sets. All environment variables can be used just like shell variables, but not all shell variables are passed on as environment variables to programs invoked by the shell — only those that have been explicitly exported.
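
A minimal sketch of the difference, in an interactive bash session (bash -c starts a child bash running the given command; the single quotes stop the outer shell from expanding $x itself):

bash$ x=hello
bash$ bash -c 'echo x is $x'
x is
bash$ export x
bash$ bash -c 'echo x is $x'
x is hello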


Expansion

So what are variables good for? So far, our scripts have just been a bunch of commands that we could have typed interactively, stored in a file to be run in “batch mode”. But scripts can be much more than that. Before a line of a script is executed, the shell performs all sorts of transformations — known as expansions — on it.

Variable expansion:

Before a command is executed, any instances of ${variable} or $variable are replaced by the value of variable. For example,

bash$ y="hello world"
bash$ echo y
y
bash$  echo $y
hello world
bash$  echo $yly possessions
possessions
bash$  echo ${y}ly possessions
hello worldly possessions

In the last example there, you need the braces, otherwise bash doesn't know where the variable name ends. This is what happened in the second-last example: bash looked for a variable called yly and couldn't find one. Unlike Java, it doesn't care if you use a variable you haven't declared; it just happily treats it as the empty string.

This is something to watch out for: if you make a spelling mistake in a variable name, Java will give you an error message, and you'll find it the first time you try to compile, but bash will happily run your program, giving possibly quite bizarre results. This is one of the reasons that languages like Java are superior to scripting languages for large programs.
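
If you want bash to catch this kind of mistake, you can tell it to treat references to unset variables as errors with set -u (the exact wording of the error message may vary between bash versions):

bash$ set -u
bash$ echo ${yly}
bash: yly: unbound variable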

Expression expansion:

Before a command is executed, any instances of $[expression] are replaced by the value of expression.

For example,

bash$ echo 2 + 3
2 + 3
bash$ echo $[2 + 3]
5
bash$ echo 2 + 3 = $[2 + 3]
2 + 3 = 5

This is another common surprise for Java programmers new to shell scripting. Expressions are only evaluated when you tell the shell to evaluate them.
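
A side note: the $[expression] form works in bash but is regarded as obsolete; the equivalent (and more portable) arithmetic expansion syntax is $((expression)):

bash$ echo $((2 + 3))
5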

Command expansion:

Before a command is executed, any instances of $(command) or `command` are replaced by the output of command. (In the second form that's the backquote character, usually found at the top left of the keyboard.)

For example,

bash$ echo This directory has $(ls | wc -w) files
This directory has 10 files

This is incredibly useful, but it can also lead to seriously cryptic code if overused.

Pathname expansion:

Any instances of “shell style” patterns, also known as globs (words containing ‘*’, ‘?’, or ‘[...]’), are replaced by the list of matching file names before the command is executed.

For example,

bash$ echo *

functions just like ls. (Actually, that's not true; the spacing of the output is different.) We'll see more uses for this later.
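
A few more illustrations of the patterns (the filenames here are hypothetical; each pattern expands to whatever actually matches in the current directory):

echo *.txt            # every name ending in .txt
echo lecture?.html    # ? matches exactly one character, e.g. lecture1.html
echo [A-Z]*           # names beginning with a capital letter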


Special Variables

A number of special environment variables have values when the shell starts:

${USER} The login name of the user.
${HOST} The name of the computer.

You can see the complete list by running the command env.

More special variables describe the parameters passed to the shell script:

${#} The number of parameters.
${0} The name of this shell script.
${1} The first parameter.
${*} A list of all the parameters.

Remember that ${0} can be expressed more simply as $0. Indeed, it's usually written the latter way.

The parameters passed to a script (or to any command you invoke in a shell script or interactively at the command line) are the things you type on the same line after the name of the command. The shell breaks them up by looking for spaces (unless you put something inside quote marks). For example, suppose the script params is:

#!/bin/bash

echo \${#} = ${#}
echo \${0} = ${0}
echo \${1} = ${1}
echo \${2} = ${2}
echo \${*} = ${*}

then we can type

bash$ params first second third
${#} = 3
${0} = params
${1} = first
${2} = second
${*} = first second third

Notice another new thing here: if we want to use a special (meta-) character just as itself, without its special meaning in shell language, we can “escape” it by putting a backslash before it. That's how I got it to print ${#}.


Return Codes

All commands return a number to the operating system when they finish, known as the return code or exit status. By convention, zero means success and a non-zero value indicates some kind of failure. The special variable ${?} holds the return code of the most recently executed command.

For example

bash$ diff file1 file2 > /dev/null; echo ${?}

will print 0 if the files are the same, and 1 if they differ.
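
The simplest illustration uses the built-in commands true and false, which do nothing except return 0 and 1 respectively:

bash$ true; echo ${?}
0
bash$ false; echo ${?}
1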


Control Structures

Bash has a full range of control structures: loops, conditionals and subroutines (functions). The way it handles tests for conditions is a little different than what you're used to in Java.

While loops

while first-command-list
do
  second-command-list
done

This repeatedly executes both command lists while the last command of the first list returns an exit code of 0. That is:

  1. Execute first-command-list.

  2. If the exit code from the last command was zero, then continue, otherwise stop.

  3. Execute second-command-list.

  4. Go back to step 1.

For example:

#!/bin/bash

while lpq | grep ${USER} > /dev/null
do
  sleep 10
done
echo All your print jobs have finished.

Conditionals

if command-list
then
  command-list
elif command-list
then
  command-list
...
else
  command-list
fi

This is pretty similar to the if-then-else construct in Java, except that the condition that decides whether the “then” part runs is the return code of the last command in the command list between if and then: a return code of 0 selects the “then” branch.

if diff ${file1} ${file2} > /dev/null
then
    echo Files are identical.
else
    echo "Files differ (or there's an error)."
fi
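
Conditions that look more like Java's boolean expressions are usually written with the test command, also spelled [ ... ], which is itself just an ordinary command whose return code encodes true (0) or false (non-zero). A small sketch (notes.txt is a made-up filename; the -f option tests whether it exists and is a regular file):

if [ -f notes.txt ]
then
    echo notes.txt exists.
else
    echo notes.txt is missing.
fi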

For Loops

These are very useful for scripts which have to process many files, or do the same thing for a whole list of arguments.

for variable in list
do
  command-list
done

Repeatedly execute command-list, with variable taking successive values from list.

For example:

for file in *.txt
do
   echo ${file} has $(cat ${file} | wc -w) words.
done

This will find all the text files (assumed to have the extension .txt) in the current directory and count the number of words in them.
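
For loops also combine nicely with the special parameter variables described earlier. Here is a hedged sketch of a script (call it each) that echoes every argument it was given on its own line:

#!/bin/bash

for arg in ${*}
do
  echo Argument: ${arg}
done

bash$ each red green blue
Argument: red
Argument: green
Argument: blue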


An Example Application: Birthday Reminders

Suppose I record my friends' birthdays in the file .birthdays in my home directory:

Charles Manson:11/12
Zsa Zsa Gabor:06/02
William H. Gates III:28/10

This reminder script will tell me whose birthday it is today.

#!/bin/bash

today=$(date +%d/%m)
lines=$(grep -n ${today} ~/.birthdays | cut -d: -f1)
for x in ${lines}
do
  echo -n "Today is "
  echo -n $(cut -d: -f1 ~/.birthdays | head -${x} | tail -1)
  echo \'s birthday!
done

How does this work?

  1. The first line runs the date command, and saves its output in the variable today. The argument to the date command tells it to print the date as a two-digit day-of-the-month number, followed by a slash, followed by a two-digit month-in-the-year number. This is the format I used for the birthday dates in the .birthdays file. (As usual, dates written American-style, with the month before the day, would cause trouble here.)

  2. The next line produces the line numbers within the file of all the people whose birthdays match today's date. (There might be more than one. This would be a little easier if we didn't have to allow for that possibility.) The grep command selects all lines from .birthdays which contain the string we just stored in ${today} and precedes each by its line number and a colon. The cut command divides its input up into fields using the delimiter character specified. Here that is the colon character; that's what the -d: option means. The -f1 option tells the cut command to only output the first of the fields it has divided each line into. So the output of the whole thing in $(...) is a list of the line numbers of the people whose birthday is today.

  3. The loop performs one iteration for each number in the list ${lines}. For each of these it first prints ‘Today is ’ and doesn't move to the next line. (That's what the -n option for echo does.)

  4. In the next line, the cut command takes the file ~/.birthdays and throws away everything after and including the first colon on each line. So it just keeps the names, and throws away the dates. The result of this is passed to the head command, which copies the first few lines of a file to its output and throws away the rest. How many? Well if there's no option, it's 10, but if you put a number there (with a minus sign before) then it takes exactly that many lines. So head -${x} after variable substitution is going to take all the lines up to and including the current value of $x. Finally the tail command throws away all but a few lines at the end of its input. With the option tail -1 it produces only the last line, which is the one we want, with the name of one of the people whose birthday is today.

  5. The last line isn't very interesting. The only thing to watch out for is that the apostrophe had to be escaped, because otherwise bash would think it was the beginning of a string.

The next thing is to put a line into your .login file to run this script every time you log in. Any ideas?


What Characterises a Scripting Language

Other scripting languages include:

The last three are large and popular programming languages which have at least partly outgrown their role as tools for writing quick-and-dirty solutions to small problems. They now incorporate object-oriented features, while retaining some of the convenience of shell scripts. However, they don't offer features such as design by contract, static type checking, and so on.


When Would You Use a Scripting Language?

When the task

Scripting languages — particularly Perl and PHP — are now used extensively to generate dynamic web pages.

____________________________________________________

Copyright © 2006, Jim Grundy & Ian Barnes & Richard Walker, The Australian National University
Version 2006.4, Wednesday, 3 May 2006, 15:43:03 +1000
Feedback & Queries to comp2100@cs.anu.edu.au