COMP2100/2500
Lecture 21: Shell Programming I
Summary
An introduction to writing shell scripts using bash.
Aims
Explain what the shell is, what scripting languages are, and what they are good for.
Introduce the basic structures and commands of bash, including filters, pipes, redirection, expansion, special variables and return codes.
History lesson
Not so long ago, if you wanted to run a program on a computer, you would have to submit a job, in the form of a stack of punched cards. That stack contained a card for each line of your program, a card for each line of input data, and a number of cards that described how to run the program and access other devices attached to the computer. The instructions on those control cards were written using the computer's job control language, or JCL for short. Indeed, JCL is also the name of IBM's job control language. Here's an example (taken from Appendix 2 of IBM 360 Assembler Language Programming by Gopal K. Kapur, published by Wiley in 1970). Each line corresponds to a separate punched card.
    // JOB PRACTICE
    // ASSGN SYS012,X'182'
    // ASSGN SYS014,X'01F'
    // ASSGN SYS010,X'181'
    // OPTION LINK
    // EXEC ASSEMBLY
    The set of cards for the program goes here
    /*
    // EXEC LNKEDT
    // EXEC
    The set of cards for the data goes here
    /*
    /&
The job control cards specified how to compile the program, which devices to use and how they should be configured (e.g. the recording density of the magnetic tape in the tape drive), and where in the stack of cards the program and data start and finish. More advanced job control languages included commands for specifying the name of the user running the job, accounting information (in the days of big, expensive machines shared by many users, accounting was strict because time was money; this is still the case for cutting-edge supercomputers), the priority of the job, and so on.
With the rise of so-called on-line (i.e. interactive) systems, the job control language did not disappear, but the commands could now be entered directly into the computer by the user.
Today's shells are the descendants of the early job control languages. They incorporate many of the features we have discussed above, but also provide higher-level structuring facilities that we are used to in conventional imperative programming languages (such as Java). Indeed, we will be interested in the shell as a programming language.
The Shell
Here is a simplified hierarchical view of a computer system and its users.
Hardware is roughly characterized as those parts of the system which have an immediate physical reality.
Memory
Processors
Disks
Network cabling
The Operating System is low-level system software that is responsible for controlling the hardware and providing facilities to application programs.
Organizes disks into file-systems and runs the file-system processes
Runs programs on processors
Allocates memory to processes
The Shell conveys user requests to the OS.
Selects programs to run
Selects input for programs
Collects output from programs
In some systems, the shell is just one component of the operating system (and thus can control devices and processes directly). However, in Unix, the shell has no such special privileges, and has the status of a normal application program.
What is a shell?
In the schema shown, the shell is the program that acts as the user's interface to the operating system. When the user logs in to the computer system, they begin interacting with the shell program in order to start (and stop) application programs.
For most users these days, and for most of you so far, that role has been performed by the facilities of the graphical desktop.
Unix predates graphical desktops, so in Unix ‘shell’ refers to a program that interprets typed user commands.
For expert users, the shell is a far faster and more powerful way to interact with the operating system than the graphical interface.
One of your tasks this semester is to move from being an ordinary user of the computer system to becoming an expert user, which is a necessary condition for becoming an effective software developer. You will need to become proficient at using the shell. In the third shell lecture we shall illustrate how the command line (shell) and the GUI can handle the same tasks.
What is a Shell Script?
A shell reads and executes user commands.
A sequence of commands can be saved in a file for reuse.
Such files are called shell scripts.
They are really programs in the shell language.
Which Shell?
“The nice thing about standards is that there are so many of them to choose from.”
Grace Murray Hopper, as quoted in the Unix Haters Handbook, p.10
There have been many different shell programs written for Unix systems, and most are available for you to try. They include:
The Bourne Shell (sh) -- The classic UNIX shell, written by Stephen Bourne (1970s, the Unix heroic years). It was not quite the first Unix shell: it replaced the earlier Thompson shell.
The C Shell (csh) -- A replacement shell with syntax like that of the C language, written by Bill Joy between 1978 and 1980 (then a student at Berkeley). Joy also created vi and, much later, worked on the design of Java. Despite the good intent, csh turned out to be poorly designed and buggy. It should not be used today. It's considered harmful. "The csh is a tool utterly inadequate for programming, and its use for such purposes should be strictly banned!"
The T C Shell (tcsh) -- A modernised C shell with command and file completion. (This is the default shell on student accounts.) Bill Joy's source code was the starting point; around sixty people contributed code in the years 1980-1998. ('T' stands for TENEX, once a rival of Unix.)
The Bourne Again Shell (bash) -- A modernised Bourne shell incorporating many of the best features of sh and tcsh. Written by Brian Fox and Chet Ramey starting in 1989, and still under active development. This is the standard shell on many Unix systems.
Others such as ksh (David Korn, AT&T, default on System V), zsh, rc (Plan 9) . . . .
In this course we will focus on bash because:
Bourne shell is the traditional Unix scripting language, and bash is the most widely installed version of a Bourne-style shell.
Bash is more expressive as a scripting language than csh or tcsh.
It's freely available (on Unix and Windows).
Both csh and tcsh can also be used for scripting purposes. It's good to be able to read both styles (see the above remark regarding the C shell, though). Remember that your default login shell is tcsh, so to try out some of the things shown here on a command line, you will need to start a copy of bash.
If you are not sure which shell your command line is running, type ps -f to find out; the shell appears in the list of processes.
Our First Shell Script
It's traditional for the first program in any new language to be one which just writes the string “Hello world!” on the screen. So here it is in bash.
First you must open a shell window. You can do this from the graphical interface by selecting the icon that looks like a terminal and a shell from the panel at the bottom of the screen.
Create a file called hello in your bin directory. (From here on, whatever follows the $ prompt is what you type at the command line.)
    [comp2100@partch]$ cd ~/bin
    [comp2100@partch]$ vim hello
The shell has a notion of current directory. When you open a new shell window, the current directory will be your home directory. Its path is something like /students/u1234567/. The cd command changes the current directory. It moves you around the directory tree. The tilde symbol ~ is a shorthand way of writing the path to your home directory. So cd ~/bin means “Move to the bin sub-directory of my home directory.”
Type the following two lines into the opened file
    #!/bin/bash
    echo Hello world!
Now save your shell script to the disk.
Make the file executable.
    [comp2100@partch]$ chmod +x hello
Every Unix file has permissions associated with it, which determine who is allowed to do what to it. The main permissions are read, write and execute, and they are set separately for each of three categories: the user who owns it, other users in the same group, and other users not in the same group. If you type the command ls -l into the shell, you will see a long listing of the contents of the current directory, including all these permissions. The default for new files depends on the system, but it will almost certainly have read and write permission for the user, and it will not have execute permission for anyone. The chmod command (short for “Change Mode”) modifies these permissions. You can only do it to files you own.
See the manual page for this command by typing man chmod. (The man command is extremely useful, although you'll probably get more information than you ever wanted. Learning to read man pages is another important task for you for this semester.)
Run your script by typing
    [comp2100@partch]$ ./hello
    Hello world!
    [comp2100@partch]$
You can also run the bash shell and interact with it just like we have been interacting with tcsh so far.
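The whole create, chmod, run sequence can also be replayed non-interactively. This is only a sketch: the scratch directory from mktemp stands in for ~/bin (my assumption, so that nothing in your home directory is touched), and printf writes the two-line script instead of an editor.

```shell
#!/bin/bash
# Scratch directory standing in for ~/bin (an assumption for this sketch).
dir=$(mktemp -d)

# Write the two-line script without opening an editor.
printf '%s\n' '#!/bin/bash' 'echo Hello world!' > "$dir/hello"

# A freshly created file is not executable until chmod says so.
if [ -x "$dir/hello" ]; then before=yes; else before=no; fi
chmod +x "$dir/hello"
if [ -x "$dir/hello" ]; then after=yes; else after=no; fi

# Run the script by its path and capture what it prints.
output=$("$dir/hello")
echo "executable before chmod: $before, after chmod: $after"
echo "$output"

rm -rf "$dir"
```

The [ -x file ] check asks the same question the shell asks when you try to run the file: do I have execute permission on it?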
Basic Shell Commands
Built in commands.
Some commands are a part of the shell.
echo just copies its arguments to the output.
cd changes the current directory.
read waits for input.
test performs comparisons and checks file types.
External commands.
For any command which is not built in, the shell will search for a program with that name to run. This allows any program to be used as a command in a shell script.
ls lists the contents of the current directory.
grep searches files for a regular expression.
sort sorts the lines of a file into order according to different criteria.
cat copies the contents of a file to the standard output.
Commands are separated by newlines or a semicolon (;). For example the single input line
    cd ..; ls
moves up to the parent directory and then lists its contents.
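If you are curious which kind of command you are dealing with, bash can tell you. The sketch below uses the bash built-in type with its -t option, which the lecture has not introduced; it prints builtin for commands that are part of the shell and file for programs found on disk.

```shell
#!/bin/bash
# Ask bash how it would run each of the commands listed above.
for cmd in echo cd read test ls grep sort cat
do
    echo "$cmd: $(type -t "$cmd")"
done

# Two commands on one line, separated by a semicolon.
kind_cd=$(type -t cd); kind_ls=$(type -t ls)
echo "cd is a $kind_cd, ls is a $kind_ls"
```

This assumes a plain bash with no aliases or functions shadowing these names, which is the normal situation inside a script.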
Command options are very important if you want a command to do exactly what you want (nothing less, nothing more). To find out about the options, use the man pages (or info, the more modern Unix/Linux documentation system). E.g., to find out which options can be used with the grep command, type man grep. The information in man pages can seem somewhat cryptic at first. They are written in a special "unixy" style; the ability to read them is a hallmark of every professional user.
Filters and Pipes
Many commands read user input, and produce output. Such commands are called filters.
We can pipe the output of one program to be the input of another. The syntax is
program1 | program2
A Unix pipe is unidirectional: data flows from program1 to program2 and never the other way.
Examples of Pipes
Counting files (and directories).
If ls lists files in a directory, and wc -w counts words of input, then
    ls | wc -w
counts the files in a directory!
Counting all your files:
The command ls -R lists the files in a directory, and all sub-directories. For example, here is part of the output produced by running it in my comp2100 directory:
    [barnes@partch comp2100]$ ls -R
    .:
    assignments  bin  index.src.html  labs  lectures  misc  schedule  www

    ./assignments:

    ./bin:
    build
    ...and so on for several screens...
You can see that the assignments directory was empty, but that the bin directory had a file or subdirectory in it called build.
So you can count all your files (and directories) with:
    ls -R ~ | grep -v ':$' | wc -l
How does that work? The output of ls -R is piped into grep. The -v option of grep says select all those lines which do not match the regular expression. The regular expression :$ matches all lines which end with a colon. The dollar sign matches the end of a line. So the result of
    ls -R ~ | grep -v ':$'
is just a list of all the file and directory names, one per line, starting at your home directory and going through all sub-directories. Piping that into wc -l (or wc -w) counts them.
Umm, except that the above is plain wrong. The output from ls also includes a lot of blank lines, and these are not stripped out by the call to grep. The quick way to fix this is to add another grep to the pipeline:
    ls -R ~ | grep -v ':$' | grep -v '^$' | wc -l
Here, ^ stands for the beginning of a line, so the pattern ^$ means a blank line.
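To see the difference between the two pipelines without wading through your whole home directory, here is a sketch that runs both on a tiny scratch tree. The directory comes from mktemp and the file names are made up for the illustration.

```shell
#!/bin/bash
# Scratch tree: one empty directory, one directory holding a single
# file, and one file at the top level (all names are made up).
dir=$(mktemp -d)
mkdir "$dir/assignments" "$dir/bin"
touch "$dir/bin/build" "$dir/index.src.html"

# First attempt: removes the "directory:" header lines only,
# so the blank separator lines are still counted.
naive=$(ls -R "$dir" | grep -v ':$' | wc -l)

# Repaired pipeline: removes the blank lines as well.
fixed=$(ls -R "$dir" | grep -v ':$' | grep -v '^$' | wc -l)

# $(( )) tidies away the padding that wc adds on some systems.
echo "first attempt: $((naive))"
echo "repaired:      $((fixed))"

rm -rf "$dir"
```

The repaired pipeline reports 4 here (two directories and two files); the first attempt reports a larger number because the blank lines are counted too.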
File Redirection
So far I've been a bit vague about “input” and “output”. Now it's time to fix that. Every Unix program takes its input from a stream called standard input — abbreviated as stdin — and sends its output to a stream called standard output — abbreviated as stdout. I'm not going to go into detail about what a stream is, but you should know that stdin is usually connected (indirectly) to the keyboard and stdout is usually connected (indirectly) to the terminal screen. But they don't have to be:
The output of a command can be redirected to go to a file:
    command > file
The input of a command can be redirected to come from a file:
    command < file
Redirection and piping can be combined, for example:
    spell < lec-bash-1.src.html | sort -u > typos.txt
will find all the spelling mistakes in this lecture, sort them alphabetically (discarding duplicates), and store them in the file typos.txt.
Sometimes we don't care about the output of a command. There is a special file called /dev/null where such output can be sent. It is a digital black hole (this is a very nice metaphor invented by Ian).
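Here is a small sketch that puts the three ideas together. It has the same shape as the spell example, but uses tr (which turns lower case into upper case) so that it runs anywhere; the file names are made up and live in a scratch directory.

```shell
#!/bin/bash
dir=$(mktemp -d)

# > sends standard output to a file instead of the screen.
printf '%s\n' pear apple pear fig > "$dir/words.txt"

# < takes standard input from a file; the pipe hands the result to
# sort -u, whose output is redirected into a second file.
tr 'a-z' 'A-Z' < "$dir/words.txt" | sort -u > "$dir/upper.txt"
result=$(cat "$dir/upper.txt")
echo "$result"

# Anything sent to /dev/null is gone: nothing is left to capture.
quiet=$(ls "$dir" > /dev/null)
echo "captured after discarding: '$quiet'"

rm -rf "$dir"
```

The duplicate pear disappears because of sort -u, so the result file holds APPLE, FIG and PEAR, one per line.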
Variables
Assignment:
Variables do not need to be declared before use! This is one of the major differences between bash (and other scripting languages) and Java. This is one of the reasons that scripting languages are good for “quick and dirty” jobs.
Assignment has the form
    variable=value
The assigned value must be a single word, or it must be quoted. For example,
    bash$ x=hello
is OK, but if the value contains a space you need to quote it:
    bash$ y="hello world"
The other big difference is that spaces matter here (unlike in Java). There must be no spaces before or after the = sign.
Note that in bash, as in csh, the set of shell variables is not the same as the set of environment variables. However, in bash (unlike in csh) there is a very strong link between these two sets. All environment variables can be used just like shell variables, but not all shell variables are passed on as environment variables to programs invoked by the shell — only those that have been explicitly exported.
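The difference between a shell variable and an environment variable is easiest to see by asking a child process what it can see. In this sketch the variable name greeting is made up, and sh -c starts a child shell that prints the variable, or the word unset if it did not inherit one.

```shell
#!/bin/bash
# A plain shell variable is visible in this shell only.
greeting=hello
child_before=$(sh -c 'echo "${greeting:-unset}"')

# Once exported it is an environment variable, so programs
# started by this shell receive a copy of it.
export greeting
child_after=$(sh -c 'echo "${greeting:-unset}"')

echo "child sees before export: $child_before"
echo "child sees after export:  $child_after"
```

The single quotes matter: they stop this shell from expanding ${greeting} itself, so the expansion really is done by the child.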
Expansion
So what are variables good for? So far, our scripts have just been a bunch of commands that we could have typed interactively, stored in a file to be run in “batch mode”. But scripts can be much more than that. Before a line of a script is executed, the shell performs all sorts of transformations — known as expansions — on it.
Variable expansion:
Before a command is executed, any instances of ${variable} or $variable are replaced by the value of variable. For example,
    bash$ y="hello world"
    bash$ echo y
    y
    bash$ echo $y
    hello world
    bash$ echo $yly possessions
    possessions
    bash$ echo ${y}ly possessions
    hello worldly possessions
In the last example there, you need the braces, otherwise bash doesn't know where the variable name ends. This is what happened in the second-last example: bash looked for a variable called yly and couldn't find one. Unlike Java, it doesn't care if you use a variable you haven't declared; it just happily treats it as the empty string.
This is something to watch out for: if you make a spelling mistake in a variable name, Java will give you an error message, and you'll find it the first time you try to compile, but bash will happily run your program, giving possibly quite bizarre results. This is one of the reasons that languages like Java are superior to scripting languages for large programs.
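Bash can be asked to be stricter. The sketch below uses set -u, an option the lecture does not otherwise cover, which makes the shell treat an unset variable as an error. The misspelt name nmae is deliberate, and the parentheses run the strict version in a subshell so that the rest of the script carries on.

```shell
#!/bin/bash
name="world"

# Default behaviour: the misspelt variable quietly expands to nothing.
lenient=$(echo "hello ${nmae}")

# With set -u the same typo stops the (sub)shell with an error.
if ( set -u; echo "hello ${nmae}" ) > /dev/null 2>&1
then strict=accepted
else strict=rejected
fi

echo "lenient result: '$lenient'"
echo "strict result:  $strict"
```

Putting set -u near the top of a script is a cheap way to catch this kind of spelling mistake the first time the script runs, though still not before it runs.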
Expression expansion:
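Before a command is executed, any instances of $[expression] are replaced by the value of expression. (Bash now treats $[expression] as an old, deprecated spelling; the standard way to write the same thing is $((expression)).)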
For example,
    bash$ echo 2 + 3
    2 + 3
    bash$ echo $[2 + 3]
    5
    bash$ echo 2 + 3 = $[2 + 3]
    2 + 3 = 5
This is another common surprise for Java programmers new to shell scripting. Expressions are only evaluated when you tell the shell to evaluate them.
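The same evaluation is available with the spelling $((expression)), which is the form you will meet in most scripts. A minimal sketch, with made-up variable names:

```shell
#!/bin/bash
# Without expansion the text is passed through untouched.
plain=$(echo 2 + 3)

# $(( )) evaluates integer arithmetic before the command runs.
sum=$((2 + 3))

# Inside $(( )) a variable can be named without the leading $.
n=7
doubled=$((n * 2))

echo "$plain"
echo "$sum"
echo "$doubled"
```

This prints 2 + 3, then 5, then 14. The arithmetic is integer only.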
Command expansion:
Before a command is executed, any instances of $(command) or `command` are replaced by the output of command. (In the second form that's the backquote character, usually found at the top left of the keyboard.)
For example,
    bash$ echo This directory has $(ls | wc -w) files
    This directory has 10 files
This is incredibly useful, but it can also lead to seriously cryptic code if overused.
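Command expansion is most often used to store a command's output in a variable. The sketch below does that in a scratch directory (created with mktemp, with made-up file names) and also shows that the $(...) form can be nested, which is awkward with backquotes.

```shell
#!/bin/bash
dir=$(mktemp -d)
touch "$dir/a" "$dir/b" "$dir/c"

# Capture the output of a pipeline in a variable.
count=$(ls "$dir" | wc -w)

# One substitution inside another: the name of the directory
# that contains the file a.
base=$(basename "$(dirname "$dir/a")")

echo "Directory $base has $((count)) files"
rm -rf "$dir"
```

The $((count)) is only there to tidy the padding that wc adds on some systems.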
Pathname expansion:
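Any instances of “shell style” patterns (words with ‘*’, ‘?’, and ‘[...]’) are replaced by the matching file names before the command is executed. These patterns are wildcards, often called globs, and are not true regular expressions of the kind grep uses.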
For example,
    bash$ echo *
functions just like ls. (Actually, that's not true; the spacing of the output is different.) We'll see more uses for this later.
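Here is a sketch of the three kinds of pattern at work on a scratch directory with made-up file names. Each echo runs inside $(cd ... && ...) so that the patterns are matched against that directory without changing where you are.

```shell
#!/bin/bash
dir=$(mktemp -d)
touch "$dir/a.txt" "$dir/b.txt" "$dir/notes.txt" "$dir/c.log"

star=$(cd "$dir" && echo *.txt)      # * matches any run of characters
single=$(cd "$dir" && echo ?.txt)    # ? matches exactly one character
class=$(cd "$dir" && echo [bc].*)    # [bc] matches one of b or c
nomatch=$(cd "$dir" && echo *.pdf)   # nothing matches: left as typed

echo "$star"
echo "$single"
echo "$class"
echo "$nomatch"

rm -rf "$dir"
```

Note the last case: by default, a pattern that matches nothing is passed to the command unchanged, which is a common source of confusing error messages.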
Special Variables
A number of special environment variables have values when the shell starts:
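    ${USER}    The login name of the user.
    ${HOST}    The name of the computer.
You can see the complete list by running the command env. (HOST is set by tcsh and inherited when you start bash from it; bash's own variable for the computer's name is HOSTNAME.)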
More special variables describe the parameters passed to the shell script:
    ${#}    The number of parameters.
    ${0}    The name of this shell script.
    ${1}    The first parameter.
    ${*}    A list of all the parameters.
Remember that ${0} can be expressed more simply as $0. Indeed, it's usually written the latter way.
The parameters passed to a script (or to any command you invoke in a shell script or interactively at the command line) are the things you type on the same line after the name of the command. The shell breaks them up by looking for spaces (unless you put something inside quote marks). For example, suppose the script params is:
    #!/bin/bash
    echo \${#} = ${#}
    echo \${0} = ${0}
    echo \${1} = ${1}
    echo \${2} = ${2}
    echo \${*} = ${*}
then we can type
    bash$ params first second third
    ${#} = 3
    ${0} = params
    ${1} = first
    ${2} = second
    ${*} = first second third
Notice another new thing here: if we want to use a special (meta-) character just as itself, without its special meaning in shell language, we can “escape” it by putting a backslash before it. That's how I got it to print ${#}.
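The effect of quote marks on how parameters are split is worth trying for yourself. In this sketch show_params is a made-up shell function; a function receives its parameters in exactly the same variables as a script does (apart from ${0}, which I leave out), so it can stand in for the params script without creating a file.

```shell
#!/bin/bash
# show_params is a hypothetical helper standing in for the params script.
show_params() {
    echo "count = $#"
    echo "first = $1"
    echo "all   = $*"
}

show_params first second third

# Quote marks keep a parameter containing a space in one piece.
unquoted=$(show_params hello world | head -n 1)
quoted=$(show_params "hello world" | head -n 1)
echo "without quotes: $unquoted"
echo "with quotes:    $quoted"
```

Without the quotes the shell sees two parameters; with them it sees one.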
Return Codes
All commands return a number to the operating system.
Most commands use this number to indicate success or failure. A return code of 0 means success. Anything else means failure.
There are some exceptions. An important one is the diff program. The line
    bash$ diff file1 file2
prints the differences between file1 and file2. It returns 0 for no differences, 1 if there were differences, and 2 if there was an error.
The special variable ${?} is the return code of the last command. (In passing: every command in Unix returns a value (non-negative integer), which is NOT to be confused with a command output.)
For example
    bash$ diff file1 file2 > /dev/null; echo ${?}
will print 0 if the files are the same, and 1 if they differ.
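A sketch that exercises all three cases for diff, using made-up files in a scratch directory. The thing to notice is that ${?} is read immediately after the command it describes, because every later command overwrites it.

```shell
#!/bin/bash
dir=$(mktemp -d)
echo same  > "$dir/file1"
echo same  > "$dir/file2"
echo other > "$dir/file3"

# Save each return code straight away, before it is overwritten.
diff "$dir/file1" "$dir/file2" > /dev/null; identical=$?
diff "$dir/file1" "$dir/file3" > /dev/null; different=$?
diff "$dir/file1" "$dir/missing" > /dev/null 2>&1; trouble=$?

echo "identical files: $identical"
echo "different files: $different"
echo "missing file:    $trouble"

rm -rf "$dir"
```

The 2>&1 in the third call also throws away the error message that diff writes when it cannot find a file.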
Control Structures
Bash has a full range of control structures: loops, conditionals and subroutines (functions). The way it handles tests for conditions is a little different from what you're used to in Java.
While loops
    while first-command-list
    do
        second-command-list
    done
This repeatedly executes both command lists while the last command of the first list returns an exit code of 0. That is:
Execute first-command-list.
If the exit code from the last command was zero, then continue, otherwise stop.
Execute second-command-list.
Go back to step 1.
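The four steps can be traced with a loop you can run anywhere. In this sketch the first command list is a single call to the test built-in, written in its [ ... ] form, which returns 0 while the comparison holds; the variable names are made up.

```shell
#!/bin/bash
# Count down from 3, recording each value of n as we go.
n=3
trace=""
while [ "$n" -gt 0 ]
do
    trace="$trace$n "
    n=$((n - 1))
done
echo "${trace}liftoff"
```

This prints 3 2 1 liftoff. When n reaches 0 the test returns a non-zero code, so the loop stops at step 2.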
For example:
    #!/bin/bash
    while lpq | grep ${USER} > /dev/null
    do
        sleep 10
    done
    echo All your print jobs have finished.
Conditionals
    if command-list
    then
        command-list
    elif command-list
    then
        command-list
    ...
    else
        command-list
    fi
This is pretty similar to the if-else construction in Java, except that the condition for choosing the “then” part or not is the return code of the last command in the command list between if and then.
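Any command can serve as the condition. Here is a minimal sketch that uses the test built-in, written in its [ ... ] form, to classify a path; describe is a made-up helper name and the paths live in a scratch directory.

```shell
#!/bin/bash
# describe is a hypothetical helper: it reports what kind of
# thing the path in its first parameter is.
describe() {
    if [ -d "$1" ]
    then
        echo "directory"
    elif [ -f "$1" ]
    then
        echo "regular file"
    else
        echo "missing or something else"
    fi
}

dir=$(mktemp -d)
touch "$dir/notes"
kind_dir=$(describe "$dir")
kind_file=$(describe "$dir/notes")
kind_none=$(describe "$dir/nothing-here")
echo "$kind_dir / $kind_file / $kind_none"
rm -rf "$dir"
```

Each [ ... ] returns 0 when its check succeeds, and that return code is all that if looks at.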
    if diff "${file1}" "${file2}" > /dev/null
    then
        echo "Files are identical."
    else
        echo "Files differ (or there's an error)."
    fi
For Loops
These are very useful for scripts which have to process many files, or do the same thing for a whole list of arguments.
    for variable in list
    do
        command-list
    done
Repeatedly execute command-list, with variable taking successive values from list.
For example:
    for file in *.txt
    do
        echo ${file} has $(cat ${file} | wc -w) words.
    done
This will find all the text files (assumed to have the extension .txt) in the current directory and count the number of words in each of them.
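One thing to watch for in loops like this is file names that contain spaces. The sketch below is a variation on the word-counting loop, run on made-up files in a scratch directory; the double quotes around the variable keep each name in one piece.

```shell
#!/bin/bash
dir=$(mktemp -d)
printf 'one two three\n' > "$dir/short.txt"
printf 'a b\nc d e\n'    > "$dir/long name.txt"

# Build up a one-line report: name=wordcount for each file.
report=""
for file in "$dir"/*.txt
do
    words=$(wc -w < "$file")
    report="$report$(basename "$file")=$((words)) "
done
echo "$report"

rm -rf "$dir"
```

Without the quotes, the file called long name.txt would be treated as two separate words, neither of which names a file.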
An Example Application: Birthday Reminders
Suppose I record my friends' birthdays in the file .birthdays in my home directory:
    Charles Manson:11/12
    Zsa Zsa Gabor:06/02
    William H. Gates III:28/10
This reminder script will tell me whose birthday it is today.
    #!/bin/bash
    today=$(date +%d/%m)
    lines=$(grep -n ${today} ~/.birthdays | cut -d: -f1)
    for x in ${lines}
    do
        echo -n "Today is "
        echo -n $(cut -d: -f1 ~/.birthdays | head -${x} | tail -1)
        echo \'s birthday!
    done
How does this work?
The first line runs the date command, and saves its output in the variable today. The argument to the date command tells it to print the date as a two-digit day-of-the-month number, followed by a slash, followed by a two-digit month-in-the-year number. This is the format I used for the birthday dates in the .birthdays file. (And as usual you're likely to run into trouble with American-style birthdays.)
The next line produces the line numbers within the file of all the people whose birthdays match today's date. (There might be more than one. This would be a little easier if we didn't have to allow for that possibility.) The grep command selects all lines from .birthdays which contain the string we just stored in ${today} and precedes each by its line number and a colon. The cut command divides its input up into fields using the delimiter character specified. Here that is the colon character; that's what the -d: option means. The -f1 option tells the cut command to only output the first of the fields it has divided each line into. So the output of the whole thing in $(...) is a list of the line numbers of the people whose birthday is today.
The loop performs one iteration for each number in the list ${lines}. For each of these it first prints ‘Today is ’ and doesn't move to the next line. (That's what the -n option for echo does.)
In the next line, the cut command takes the file ~/.birthdays and throws away everything after and including the first colon on each line. So it just keeps the names, and throws away the dates. The result of this is passed to the head command, which copies the first few lines of a file to its output and throws away the rest. How many? Well if there's no option, it's 10, but if you put a number there (with a minus sign before) then it takes exactly that many lines. So head -${x} after variable substitution is going to take all the lines up to and including the current value of $x. Finally the tail command throws away all but a few lines at the end of its input. With the option tail -1 it produces only the last line, which is the one we want, with the name of one of the people whose birthday is today.
The last line isn't very interesting. The only thing to watch out for is that the apostrophe had to be escaped, because otherwise bash would think it was the beginning of a string.
The next thing is to put a line into your .login file to run this script every time you log in. Any ideas?
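As an aside on the reminder script itself: there is more than one way to write it. The sketch below is an alternative of my own, not the script explained above. It reads the file one line at a time and lets read split each line at the colon, so no line numbers are needed. The function name birthdays_on and the scratch data file are made up for the illustration.

```shell
#!/bin/bash
# birthdays_on is a hypothetical helper: it reads "name:dd/mm" lines
# from the file named in $1 and prints a message for every entry
# whose date is exactly $2.
birthdays_on() {
    while IFS=: read -r name date
    do
        if [ "$date" = "$2" ]
        then
            echo "Today is ${name}'s birthday!"
        fi
    done < "$1"
}

# Sample data in a scratch file (it stands in for ~/.birthdays).
file=$(mktemp)
printf '%s\n' 'Charles Manson:11/12' 'Zsa Zsa Gabor:06/02' \
    'William H. Gates III:28/10' > "$file"

found=$(birthdays_on "$file" 06/02)
none=$(birthdays_on "$file" 01/01)
echo "$found"
rm -f "$file"

# Real use would pass today's date:
#   birthdays_on ~/.birthdays "$(date +%d/%m)"
```

Setting IFS=: just for the read command tells it to split at colons instead of spaces, so name receives everything before the first colon and date everything after it.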
What Characterises a Scripting Language
Variables do not need to be declared before they are used. (Some scripting languages do allow you to declare variables and then only use variables you have previously declared. Some don't allow you to declare variables even if you wanted to.)
There are not many basic data types. In some languages (such as shells) all data is represented as strings.
Literal strings do not need to be enclosed in special markers (quotes). Almost everything else does.
There are few features for structuring programs. Well, maybe: some scripting languages are more expressive than others.
Programs are interpreted rather than compiled. Well, maybe: some scripting languages (e.g. Perl) come with a compiler, although you don't have to use it.
Other scripting languages include:
awk
tcl
Perl
Python
PHP
The last three are large and popular programming languages which have at least partly outgrown their role as tools for writing quick-and-dirty solutions to small problems. All three now incorporate some object-oriented features, while retaining some of the convenience of shell scripts. However, they don't offer features such as design by contract, static type checking, and so on.
When Would You Use a Scripting Language?
When the task
is not logically complex
requires a great deal of string and text manipulation
can be accomplished largely by running existing programs if you can control their options, input and output
requires the manipulation of many files.
Scripting languages — particularly Perl and PHP — are now used extensively to generate dynamic web pages.
Copyright © 2006, Jim Grundy & Ian Barnes & Richard Walker, The Australian National University
Version 2006.4, Wednesday, 3 May 2006, 15:43:03 +1000
Feedback & Queries to
comp2100@cs.anu.edu.au