Lectures 21–23, 20 May & 21 May 2009
Topics: Hello World! in bash; invoking the script; bash commands; control structures (if-elif-else-fi, for-do-done, while-do-done) in bash; sed and awk.

© These lectures are based on the material created by Ian Barnes and Richard Walker for the ANU COMP2100 and COMP2500 Software Construction course in 2001–2005, and by Alexi Khorev in 2006–2007; edited by Chris Johnson, 2008–2009.
The origin of modern scripting languages lies in the job control languages of the early years (the 1960s and 70s), when programs were written on punched cards. If you wanted to run a program on a computer, you would have to submit a job, in the form of a stack of punched cards. That stack contained a card for each line of your program, a card for each line of input data, and a number of cards that described how to run the program and access other devices attached to the computer. The instructions on those control cards were written using the computer's job control language, or JCL for short. Indeed, JCL is also the name of IBM's job control language. Here's an example (taken from Appendix 2 of IBM 360 Assembler Language Programming by Gopal K. Kapur, published by Wiley in 1970). Each line corresponds to a separate punched card.
// JOB PRACTICE
// ASSGN SYS012,X'182'
// ASSGN SYS014,X'01F'
// ASSGN SYS010,X'181'
// OPTION LINK
// EXEC ASSEMBLY
...The set of cards for the program instructions goes here...
/*
// EXEC LNKEDT
// EXEC
...The set of cards for the data goes here...
/*
/&
The job control cards specified how to compile the program, which devices to use and how they should be configured (e.g. the recording density of the magnetic tape in the tape drive), and where in the stack of cards the program and data start and finish. More advanced JCL included commands for specifying the name of the user running the job, accounting information (in the days of expensive big machines with multiple users there was strict accounting — (machine) time was money; this is still the case for cutting-edge supercomputers), the priority of the job, and so on.
With the rise of so-called on-line (i.e. interactive) systems, the job control language did not disappear, but the commands could now be entered directly into the computer by the user.
Today's shells are the descendants of the early job control languages. They incorporate many of the features we have discussed above, but also provide higher-level structuring facilities that we are used to in conventional imperative programming languages (such as Java). Indeed, we will be interested in the shell as a programming language.
In modern operating systems, the shell plays the role of mediator between the hardware and the operating system kernel on one side, and the user and various user applications on the other.
In some systems, the shell is just one component of the operating system (and thus can control devices and processes directly). However, in Unix the shell has no such special privileges; it has the status of a normal application program (therefore, one can have several shells on one OS).
In the schema shown, the shell is the program that acts as the user's interface to the operating system. When the user logs in to the computer system, they begin interacting with the shell program in order to start (and stop) application programs.
One of your tasks this semester is to move from being an ordinary user of the computer system to becoming an expert, a budding professional user; this is a necessary condition for becoming an effective software developer. You will need to become proficient at using the shell. We shall illustrate how the command line (shell) and a GUI can handle the same tasks in the third shell lecture.
For a historical and cultural perspective on the significance of the command line interface (abbreviated to CLI from now on), read the book In the Beginning was the Command Line by Neal Stephenson (it's available on-line for free; just do a Google search). We will have more to say about this perspective in the third lecture.
What is a Shell Script?
abx% cmd ↩
cat file | tr -sc A-Za-z '\n' | sort | uniq -c | sort -n
myscript.sh or run_tests
#!/bin/sh
#
# junit command line program for Debian
# author: Takashi Okamoto <tora@debian.org>
# usage:
# junit [-text] <TestCase> output result with text.
# This mode is default.
# junit -awt <TestCase> output result with awt ui.
# junit -swing <TestCase> output result with swing ui.
TESTRUNNER=junit.textui.TestRunner
CLASSPATH=${CLASSPATH}:/usr/share/java/junit.jar:.
if [ "$#" = "0" ]; then
TESTRUNNER=junit.awtui.TestRunner
fi
if [ "$1" = "-text" ] ; then
shift;
if [ "$#" = "0" ] ; then
FLAG=false
fi
elif [ "$1" = "-swing" ] ; then
shift;
TESTRUNNER=junit.swingui.TestRunner
if [ "$#" != "0" ]; then
echo "-swing option should not have other arguments"
exit;
fi
elif [ "$1" = "-awt" ] ; then
shift
TESTRUNNER=junit.awtui.TestRunner
if [ "$#" != "0" ]; then
echo "-awt option should not have other arguments"
exit;
fi
fi
if [ "$1" = "-help" ] || [ "$FLAG" = "false" ] ; then
echo "junit 3.8.1 -- this version is modified by Takashi Okamoto <tora@debian.org> for Debian."
echo "Usage: junit "
echo " -text <TestCaseName> - using text user interface."
echo " -awt - using awt user interface."
echo " -swing - using swing user interface."
echo "TestCaseName is the name of the TestCase class"
exit
fi
exec java -classpath ${CLASSPATH} ${TESTRUNNER} ${1+"$@"}
“The nice thing about standards is that there are so many of them to choose from.”
(Grace Murray Hopper, as quoted in the Unix Haters Handbook, p.10)
There have been many different shell programs written for Unix systems, and most are available for you to try (all modern operating systems of the UNIX/Linux family come with several installed). To find out which are available on your computer, type cat /etc/shells; when I do it, the following list is returned:
sh — the UNIX shell, historically the first (Steve Bourne, 1970s); good for scripting, not so friendly on the command line

csh — an "improvement" on sh with added C-like syntax and better CLI support (Bill Joy, circa 1980); but according to "Csh Programming Considered Harmful": "The csh is a tool utterly inadequate for programming, and its use for such purposes should be strictly banned!"

tcsh — a real improvement on csh, with most of the bugs and design flaws removed; csh and tcsh together form the csh-like family of scripting languages, while the other shells listed here go under the sh-family (the "t" in tcsh stands for Tenex, once a rival to the UNIX operating system; the CLI history and name-completion mechanisms originated there)

ksh — a shell due to David Korn, AT&T's System V shell (it used to be proprietary, but a free clone now exists owing to GNU); a solid, robust, reliable shell

bash — the so-called Bourne Again SHell, a modern, actively supported, feature-rich, free shell, the default on all GNU UNIX-like OS (incl. FreeBSD, Linux, Cygwin); the one used in this course

zsh — an excellent modern shell with some unique CLI and even richer language features

There are a few others (rc, es, ...); to find out more about their differences, check out the article "Shell Programming Differences" on the comp.unix.shell newsgroup.
The default shell on the student system, and the one which is run when you open a terminal window, is bash (to make sure, type ps -f: one of the processes listed by this command is your command shell). All the examples and exercises are done in bash.
Hello World! in bash — 1

It's traditional for the first program in any new language to be one which just writes the string “Hello world!” on the screen. So here it is in bash.
shell-prompt> cd ~/bin
shell-prompt> vim hello

The shell has a notion of current directory. When you open a new shell window, the current directory will be your home directory. Its path is something like /students/u1234567/. The cd command changes the current directory. It moves you around the directory tree. The tilde symbol ~ is a shorthand way of writing the path to your home directory. So cd ~/bin means “Move to the bin sub-directory of my home directory.”
#!/bin/bash
echo Hello world!

Now save your shell script to disk.
Hello World! in bash — 2

shell-prompt> chmod +x hello

Every Unix file has permissions associated with it, which determine who is allowed to do what to it. The main permissions are read, write and execute, and they are set separately for each of three categories: the user who owns it, other users in the same group, and other users not in the same group. If you type the command ls -l into the shell, you will see a long listing of the contents of the current directory, including all these permissions. The default for new files depends on the system, but it will certainly have read and write permission for the user, and it will not have execute permission for anyone. The chmod command (short for “Change Mode”) modifies these permissions. You can only do it to files you own.
shell-prompt> ./hello
Hello world!
shell-prompt>
bash Shell Commands

Apart from standard control structures (see below), bash has a large number of built-in commands (they are part of the shell; some are common to all shells, and some are shell-specific):
read — reads input from stdin; when received, it can be assigned to a script variable

But the shell can also call any external command (any executable which can be run on your computer, including java):
More commands will be encountered later. Commands are separated by new lines or a semi-colon ;. For example,
cd ..; ls
moves up to the parent directory and then lists its contents.
Command options are very important if you want your command to do exactly what you want (nothing less, nothing more). To find out about the options, use the man pages (or the more modern Unix/Linux documentation system, info). E.g., to find out what options can be used with the grep command, type man grep. The man-page information can at first be somewhat cryptic. The Unix man pages are written in a special "unixy" style; the ability to read the man pages is a hallmark of every professional user.
Many commands read user input, and produce output. Such commands are called filters.
In UNIX, we can pipe the output of one program to be the input of another. The syntax is
program1 | program2
This is a very convenient facility of the shell, and you will see multiple examples of it, but there is one important limitation to its use: pipes are unidirectional; the flow of data cannot be reversed.
Counting files (and directories)
If ls lists files in a directory, and wc -w counts words of input, then
ls | wc -w
counts the files in a directory! Here, wc is the UNIX counting program. It can return the number of characters (-c option), words (-w option) and lines (-l option).
Counting all your files
The command ls -R lists the files in a directory, and all sub-directories. For example, here is part of the output produced by running it in my comp2100/2007/bash directory
abx% ls -R
bash.html  images

./images:
filter.png                 malevich.self-2d.jpg  shell.png
malevich.black-square.jpg  pipe.png
So you can count all your files (and directories) with:
ls -R ~ | grep -v ':$' | wc -l
How does that work? The output of ls -R is piped into grep. The -v option of grep says select all those lines which do not match the regular expression. The regular expression :$ matches all lines which end with a colon. The dollar sign matches the end of a line. So the result of
ls -R ~ | grep -v ':$'
is just a list of all the file and directory names, one per line, starting at your home directory and going through all sub-directories. Piping that into wc -l (or wc -w) would count them. But because the output also contains empty lines, they need to be weeded out. This is achieved with yet another pipe
ls -R ~ | grep -v ':$' | grep -v "^$" | wc -l
Now let's turn our attention to input/output. Every Unix program takes its input from a stream called standard input — abbreviated as stdin — and sends its output to a stream called standard output — abbreviated as stdout. We shall not go into detail about what a stream is (you may have some idea from your Java experience), but you should know that stdin is usually connected (indirectly) to the keyboard and stdout is usually connected (indirectly) to the terminal screen. But they don't have to be:
command > file — send the standard output of command to file (overwriting it)
command >> file — append the standard output of command to file
command < file — take the standard input of command from file
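For instance (hypothetical file names):

ls > files.txt     # overwrite files.txt with the directory listing
ls >> files.txt    # append a second listing to the same file
wc -l < files.txt  # count lines, reading from the file instead of the keyboard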
Redirection and piping can be combined, for example:
spell < bash.html | sort -u > typos.txt

will find all the spelling mistakes in this lecture, sort them alphabetically (discarding duplicates), and store them in the file typos.txt (assuming that the command spell exists, which it generally does).
Sometimes we don't care about the output of a command. There is a special file called /dev/null where such output can be sent. It is a digital black hole (this is a very nice metaphor invented by Ian).
Variables do not need to be declared before use! This is one
of the major differences between bash (and other
scripting languages) and Java. This is one of the reasons that
scripting languages are good for “quick and dirty” jobs.
There is no type system in bash — all
variables are essentially character strings,
but when used in an expression, arithmetic operations and
comparisons are permitted (only for values that are all digits; the
conversion is done implicitly).
Assignment has the form:
variable=value
The assigned value must be a single word, or it must be quoted. For example,
shell-prompt> x=hello
is OK, but you need to type
shell-prompt> y="hello world"
The other big difference is that spaces matter here (unlike
in Java).
There must be no spaces before or after the =
sign.
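A quick illustration of why the spaces matter (the second line is a deliberate error):

x=hello      # OK: assigns the string hello to x
x = hello    # error: bash tries to run a command named x with arguments = and hello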
Note that in bash, as in tcsh, the set of shell variables is not the same as the set of environment variables. However, in bash (unlike in tcsh) there is a very strong link between these two sets. All environment variables can be used just like shell variables, but not all shell variables are passed on as environment variables to programs invoked by the shell — only those that have been explicitly exported.
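A minimal sketch of the difference (the variable names are made up for the illustration):

MINE="shell only"                  # a plain shell variable
export OURS="passed to children"   # an exported (environment) variable
bash -c 'echo MINE=[$MINE] OURS=[$OURS]'   # a child shell
# prints: MINE=[] OURS=[passed to children]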
So what are variables good for? So far, our scripts have just been a bunch of commands that we could have typed interactively, stored in a file to be run in “batch mode”. But scripts can be much more than that. Before a line of a script is executed, the shell performs all sorts of transformations — known as expansions — on it. Expansions are evaluations; the syntax depends on whether we evaluate the value of a variable, an arithmetic expression, or the output of a command.
Variable expansion (substitution)
Before a command is executed, any instances of ${variable} or $variable are replaced by the value of variable. Eg,
shell-prompt> y="hello world"
shell-prompt> echo y
y
shell-prompt> echo $y
hello world
shell-prompt> echo $yly possessions
possessions
shell-prompt> echo ${y}ly possessions
hello worldly possessions
In the last example there, you need the braces, otherwise bash doesn't know where the variable name ends. This is what happened in the second-last example: bash looked for a variable called yly and couldn't find one. Unlike Java, it doesn't care if you use a variable you haven't declared; it just happily treats it as the empty string.
This is something to watch out for: if you make a spelling mistake in a variable name, Java will give you an error message, and you'll find it the first time you try to compile, but bash will happily run your program, giving possibly quite bizarre (and meaningless) results. This is one of the reasons that languages like Java are superior to scripting languages for large programs (another reason is the impossibility of defining complex data types).
Expansion with testing and substitution
bash is capable of performing quite sophisticated
string manipulations during the expansion. Some of them are
(this syntax is for bash, other shells may have slightly different
rules):
${var:-word} — evaluates to $var if it's not null or unset, otherwise evaluates to $word
${#var} — the length of the string $var
${var:offset} — the substring from $offset to the end of $var
${var: -offset} — the substring consisting of the last $offset characters of $var (note the space before the minus)
${var:offset:length} — the substring starting from $offset, of length $length
${var/str1/str2} — the first occurrence of the string str1 is substituted by str2
${var//str1/str2} — every occurrence of the string str1 is substituted by str2
${string#substring}, ${string##substring} — strips the shortest and longest match of $substring from the front of $string
${string%substring}, ${string%%substring} — strips the shortest and longest match of $substring from the back of $string
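For instance, here is how some of these behave on a sample string (try them at the prompt):

y="hello world"
echo ${#y}        # 11
echo ${y:6}       # world
echo ${y: -5}     # world (note the space before the minus)
echo ${y:0:5}     # hello
echo ${y/o/0}     # hell0 world
echo ${y//o/0}    # hell0 w0rld
echo ${y#hello }  # world
echo ${y% world}  # hello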
Expression expansion
Before a command is executed, any instances of $[expression] are replaced by the value of expression. For example,
shell-prompt> echo 2 + 3
2 + 3
shell-prompt> echo $[2 + 3]
5
shell-prompt> echo 2 + 3 = $[2 + 3]
2 + 3 = 5
This is another common surprise for Java programmers new to shell scripting. Expressions are only evaluated when you tell the shell to evaluate them.
Command expansion
Before a command is executed, any instances of $(command) or `command` are replaced by the output of command. (In the second form that's the backquote character, usually found at the top left of the keyboard.) For example,
shell-prompt> echo This directory has $(ls images/ | wc -w) files
This directory has 5 files
This is incredibly useful, but it can also lead to seriously cryptic code if overused.
Pathname expansion
Any instances of “shell style” regular expressions (words with ‘*’, ‘?’, and ‘[...]’) are replaced by possible matches before the command is executed. For example,
shell-prompt> echo *
functions just like ls. (Actually, that's not true; the spacing of the output is different.) We'll see more uses for this later.
Example: bigfiles1
A number of special environment variables have values when the shell starts (there are many more; you can change them or add your own when required):
${USER} — the login name of the user
${HOME} — the pathname of the home directory
${HOST} — the name of the computer
${SHELL} — the default shell
You can see the complete list by running the command env (the short version of printenv).
More special variables describe the parameters passed to the shell script ("command line arguments"):
${#} — the number of parameters
${0} — the name of this shell script
${1} — the first parameter
${*} — the list of all parameters
Remember that ${0} can be expressed more simply as
$0. Indeed, it's usually written the latter way.
If the operator shift is called inside the script, the $* list is truncated from the left: the old $2 becomes $1, the old $3 becomes $2, and so on (this is very useful for processing multiple options for a command script).
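A minimal sketch of shift at work (the script name shiftdemo is made up):

#!/bin/bash
# print and discard the parameters one at a time
while [ "${#}" != "0" ]
do
    echo "next parameter: ${1}"
    shift
done

Called as ./shiftdemo -a -b target, it prints each of the three arguments in turn.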
The parameters that are passed to a script (or to any command you
invoke in a shell script, or a command that you type interactively
at the command line)
are the things on the same line after the name of the
command. The shell breaks them up by looking for spaces (unless
you put something inside quote marks). The script or command refers to
these values by number as if the number were a variable
name. Parameter number
0 has a special meaning: it is the name of the command or script.
For example, suppose the script params consists of the commands:
#!/bin/bash
echo \${#} = ${#}
echo \${0} = ${0}
echo \${1} = ${1}
echo \${2} = ${2}
echo \${*} = ${*}
then we can type
shell-prompt> params first second third
${#} = 3
${0} = params
${1} = first
${2} = second
${*} = first second third
Notice another new thing here: if we want to use a special metacharacter just as itself, without its special meaning in the shell language, we can “escape” it by putting a backslash before it. That's how ${#} got printed, by using the string \${#}.
The metacharacters are a special set which, if unquoted, separate words (variable names). This set consists of
blank, `|', `&', `;', `(', `)', `<', `>'.
All commands return a numeric value called the status code to the operating system (in the range 0...255).
shell-prompt> diff file1 file2

prints the differences between file1 and file2. It returns 0 for no differences, 1 if there were differences, and 2 if there was an error.
For example
shell-prompt> diff file1 file2 > /dev/null; echo ${?}
will print 0 if the files are the same, or 1 if they differ.
Bash has a full range of control structures: loops, conditionals and subroutines (functions). The way it handles tests for conditions is a little different than what you're used to in Java.
While loops
while first-command-list
do
    second-command-list
done
This repeatedly executes both command lists while the last command of the first list returns an exit code of 0.
For example:
#!/bin/bash
while lpq | grep ${USER} > /dev/null
do
sleep 10
done
echo All your print jobs have finished.
Conditionals
if command-list
then
    command-list
elif command-list
then
    command-list
...
else
    command-list
fi
This is pretty similar to the if-then-else-end construction in Java, except that the condition for choosing the “then” part or not is the return code of the last command in the command list between if and then.
if diff ${file1} ${file2} > /dev/null
then
echo Files are identical.
else
echo "Files differ (or there's an error)."
fi
The double quotes protect the brackets and single quote in my echo string from being grabbed by the shell and causing an error. A (less readable) alternative is to protect each of these characters with a backslash ‘\’ character:
echo Files differ \(or there\'s an error\).
For loops
These are very useful for scripts which have to process many files, or do the same thing for a whole list of arguments.
for variable in list
do
    command-list
done
Repeatedly execute command-list, with variable taking successive values from list. For example:
for file in *.txt
do
echo ${file} has $(cat ${file} | wc -w) words.
done
This will find all the text files (assumed to have the extension .txt) in the current directory and count the number of words in them.
In all the control structures described above, the layout must be as depicted: each part of the control structure that begins with a keyword must begin on a new line. The newline separators can be replaced by semi-colons:
for file in *.txt ; do echo ${file} has $(cat ${file} | wc -w) words ; done
Example: bigfiles2
Other scripting languages include:
The last three are large and popular programming languages which have at least partly outgrown their role as tools for writing quick-and-dirty solutions to small problems. They now incorporate some object-oriented features, while retaining some of the convenience of shell scripts. However, they don't offer features such as design by contract, type checking, and so on.
When the task
Scripting languages — particularly Perl and PHP — are now used extensively to generate dynamic web pages. (Web programming is changing very fast, and the last statement is probably out of date, at least as far as Perl is concerned.)
bigfiles

An example with discussion: several versions of the bigfiles script.
This is all very nice, but the final script looks like an ordinary program written in an imperative programming language — too many "ordinary", banal control structures, etc. The power of shell programming lies in the effective use of various utilities: small programs which can do only a single task, but do it very well. The UNIX environment allows for seamless integration of such utilities, and shell programming serves as the glue for carefully chosen utilities combined together via the piping mechanism. And while the bigfiles script has been written with all that huffing and puffing...
The bigfiles code shows you the hard way to do it. Let's see the easy way. The find command does all of the hard work. The new find has the options -type and -printf, where the latter allows printing formatted output, much like the command printf (itself "borrowed" from C). The relevant format options which go with -printf include:

%k — prints the file size in 1K blocks (rounded up)
%p — prints the file name

(the total number of format options is about 50). The -type option allows you to specify the file type you are looking for (f is for a regular file, but it can also be a directory (d), a link (l), a socket (s), etc.).
The following works on GNU/Linux:

find . -type f -printf "%k\t%p\n" | sort -nr | head -10
[Careful here: there's the `good old' version of find that you'll find on Solaris, and there's GNU find, which has lots more options. You can do the same thing with the old find -- check out the -ls option -- but GNU find has that extra printf and %k stuff that's so useful. Just be aware that using the options that are GNU-find-specific makes your script less portable. As an exercise, use find's -ls option (together with the cut command) to solve the problem in a way that works portably across Solaris and GNU/Linux.]
What's the lesson? Look out for powerful commands that already exist.
birthdays script

Another simple text-processing example
The birthdays script is an example of using shell scripts to process files of text information. It introduces new commands — cut, head and tail. Study it; it's short and simple.
(For those who are young and don't know: Zsa Zsa Gabor is a Hungarian-born Hollywood actress; Charles Manson is a murderous cult leader from the late 1960s, also from California — don't worry, he is serving a life sentence and is now rather placid; but who is William H. Gates?)
The "new" important command used in this script:
cut — select portions of each line of a filecut -f selected-fields [-d delim] [-s] [file ...]
head — display first lines of a filehead [-n count] [file ...]
tail — display the last part of a filetail -n positive_or_negative_number [file ...]
Examples of their use are given in all bash scripts discussed in the lectures.
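For a quick taste, a couple of one-liners using the colon-delimited /etc/passwd as input:

cut -d: -f1 /etc/passwd | head -5   # the first five login names
tail -n 3 /etc/passwd               # the last three lines of the file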
Your lab 5 will introduce yet more system and built-in commands, including the important read and test built-ins, and an alternative form of testing (not using test explicitly, but using expression evaluation).
grep — print lines matching a pattern
find — walk a file hierarchy and do a lot of stuff to them
cut — select portions of each line of a file
head — display first lines of a file
tail — display the last part of a file
sort — sort lines of text files
diff — compare files line by line
echo — write arguments to the standard output
uniq — report or filter out repeated lines in a file
(forgotten how they are used? Look up the man pages!!)
tr — transliterates characters in its input. It reads from stdin; usage: tr [options] string1 [string2]. The first character in string1 is transliterated into the first character in string2, etc.
cat file | tr -sc A-Za-z '\n' | sort | uniq -c | sort -n
returns a sorted list of all words used in file, together with the frequency of their use (the option -c says to use the complement of the set of characters in string1, and -s says to squeeze runs of repeated characters (designated in string1) into just one, so that tr -sc A-Za-z '\n' compresses all non-letters into newlines).
comm — select or reject lines common to two files (an alternative to diff); comm f1 f2 outputs three columns (lines unique to f1, lines unique to f2, and lines common to both). The input files must be sorted.
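A tiny sketch (hypothetical file names; remember to sort first):

sort list1 > list1.s
sort list2 > list2.s
comm list1.s list2.s        # three columns: only-in-1, only-in-2, common
comm -12 list1.s list2.s    # suppress columns 1 and 2: print only the common lines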
xargs — a filter for building and executing command lines from stdin: it reads items from its input and supplies them as arguments to a given command, and so is also a tool for assembling the commands themselves (it works when backquotes don't, e.g. when the argument list would be too long).
strings — search a binary file for printable ASCII strings (can be useful for reverse engineering, or for finding meta-content in binary files, e.g. image files).
The shell scripting languages allow for effective code re-use in the form of function definitions. A function is defined in a code block (statements enclosed in curly brackets). A bash function definition is a "black box"; its internals are invisible to the outer parts of the script. If there is repetitive code performing a task which is subject to slight variations, the function feature of bash can be useful. The syntax is:
function fn() {
    command1
    command2
    ...
}
where either the keyword function or the round brackets () after the function name can be omitted (the fn() { ... } form is the more portable one). A function is triggered in the outer script by invoking its name, but the call can only follow the definition; it cannot precede it.
#!/usr/bin/bash
exclamation="Vivat, I won!"
echo -n "exclaiming from outside: "
echo ${exclamation}
f(){
    local exclamation
    exclamation="Alas, I lost."
    echo -n "exclaiming from inside: "
    echo ${exclamation}
}
f
echo -n "again exclaiming from outside: "
echo ${exclamation}
When run, the output will be

shell-prompt> ./my_script
exclaiming from outside: Vivat, I won!
exclaiming from inside: Alas, I lost.
again exclaiming from outside: Vivat, I won!
Functions may process arguments passed to them and return an exit status to the script for further processing (see details elsewhere).
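A minimal sketch (the function name is made up) of a function that takes an argument and reports its result through the exit status:

#!/bin/bash
# succeeds (status 0) when its first argument is a readable file
is_readable() {
    [ -r "$1" ]    # the exit status of test becomes the function's
}

if is_readable /etc/passwd
then
    echo "/etc/passwd is readable"
else
    echo "/etc/passwd is not readable"
fi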
Exercise: use bash functions to extend the cake script so that it can be used in debug mode (which only echoes messages on the command line, as discussed above) and in operational mode, when the actual emails are sent (check the mail command). The bigfiles example contains a discussion of how to implement a multi-mode (with debug) script.
Bash deals with regular expressions only to a limited extent (using the grep utility and the string-expansion mechanism). To make your scripts recognise and use REs better, one can use the Unix utilities sed and awk (many professional scripts use them extensively).
sed — 1

sed — stream editor, a powerful filter program. The syntax for this command is
sed 'list of sed commands' filenames
sed processes the input files line by line, independently. The quotes are almost always needed to protect the sed metacharacters from the shell (where they can also have meaning). The output is produced on the command line (stdout). The sed commands are usually matching and substitution instructions (including a simple variant of the regular-expression language) in accordance with which the processed lines are modified. Consider an example. Suppose I manually edited the group list file used to set up the permissions in the Subversion group repositories, which you use for the assignment work (from a1→a2, taking into account your requests for group changes). But I didn't change the "a1" in the original file:
shell-prompt> cat groups_a2.txt
[a1:/branches/group03] u4125148, u4222523
[a1:/branches/group04] u4222980, u4223071
[a1:/branches/group05] u4210391, u4234643
.......
Not to worry! I need not even open a text editor. First I rename the file
shell-prompt> mv groups_a2.txt groups_a1.txt
Then I run it through sed
shell-prompt> sed 's/a1/a2/' groups_a1.txt > groups_a2.txt
shell-prompt> cat groups_a2.txt
[a2:/branches/group03] u4125148, u4222523
[a2:/branches/group04] u4222980, u4223071
[a2:/branches/group05] u4210391, u4234643
.....
and I have the correct group file.
sed — 2

To create the Subversion repositories for the Assignment project in comp2100, one should use the command
svn copy http://svn/comp2100/a2/trunk/ http://svn/comp2100/a2/branches/groups01/
But for 50-odd groups, it would be smarter to use a script. First, I construct the group list group01, group02, ..., from the file groups_a2.txt, using the following command:
groups=$(sed -e 's/].*//' -e 's/.*group/group/' groups_a2.txt)
Here the -e option tells sed to treat the next token as an editing command, not as the file name from which sed reads the lines (it also allows several editing commands to be given). Using the list $groups and a short bash script with a for loop, one can easily create all those svn directories.
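A sketch of the whole job under those assumptions (note that svn copy on URLs also needs a log message, supplied here with -m):

#!/bin/bash
# extract the group names, then branch the trunk once per group
groups=$(sed -e 's/].*//' -e 's/.*group/group/' groups_a2.txt)
for g in ${groups}
do
    svn copy -m "create branch for ${g}" \
        http://svn/comp2100/a2/trunk/ \
        http://svn/comp2100/a2/branches/${g}/
done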
Apart from substitution, sed can be instructed to do (or not to do) a certain action if it finds a pattern in the input. The following example shows an implementation of a command newer, which lists all files in a directory that are newer than a specified file (it also shows that you can use the command-line arguments passed to the script inside a sed command):
ls -t | sed '/^'$1'$/q'

(the quotes around $1 expose it to the shell, which will replace it with the filename).
sed is good: it's fast, easy to use, and it can handle very long inputs. But it does everything line by line only; multi-line processing is hard and awkward. Here, to its aid comes awk.
awk — 1

awk (Aho, Weinberger and Kernighan) is a powerful stream processor and formatter, with syntax similar to C. In some respects it is more powerful than the shell (e.g., it has floating-point arithmetic, which the shell doesn't have). The usage is similar to sed:
awk [options] 'program' filenames
awk also reads the input one line at a time, but you can define what a line is (in awk it's called a record, and it can span several actual lines), and the line is automatically parsed into fields (the field separator can also be defined prior to and in the middle of processing), which can be dealt with separately. Like sed, awk does not alter the content of the input files. The awk program is different from sed's:
pattern { action }
pattern { action }
.....
For each pattern that matches the line, the corresponding action is performed. The pattern can be a regex or a boolean expression. The fields are referenced in the same way as command-line arguments in shell scripts: $0 is the entire input line, $1 the first field, etc.; instead of $* there is the NF built-in variable (the number of fields). The following script prints the list of users who have no passwords (imagine this nowadays):
awk -F: '$2 == ""' /etc/passwd
awk has two special patterns, BEGIN and END; followed by blocks in curly brackets, they define actions performed before and after all the input lines are processed. Very convenient for setting the built-in variables, and for performing some action over all the processed data. E.g.,
awk 'END { print NR }' files
does the same as cat files | wc -l.
awk — 2

The awk script

{ nc += length($0) + 1
  nw += NF }
END { print NR, nw, nc }

counts lines, words and characters, like the "full" wc. The full list of built-in variables in awk includes FS, OFS, RS, ORS, NR, NF, FILENAME and a few others.
By controlling RS (the record separator) and ORS (the output record separator), one can process input with complex formatting and produce custom-formatted output.
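For instance, a sketch that reads blank-line-separated paragraphs as single records and marks each one on output (the file name is made up):

awk 'BEGIN { RS = ""; ORS = "\n===\n" } { print }' notes.txt

Setting RS to the empty string puts awk into paragraph mode; ORS replaces the newline that print normally appends.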
With syntax similar to C, awk has a full set of control structures (if-else, for and while loops). It has arrays: the awk script backwards

{ line[NR] = $0 }
END { for (i = NR; i > 0; i--) print line[i] }

when called as awk -f backwards file, will print the lines from file in reversed order. The arrays are in fact associative. awk also has a set of built-in functions (like length(), seen above; they include mathematical functions).
There are a number of books and on-line tutorials on scripting, and on bash scripting in particular. The tutorial shown above is very detailed and thorough, and it has a lot of examples, which is especially useful for understanding how scripting works. Highly recommended!
Next, we consider five examples of how the same task can be accomplished by the command line and by the GUI, respectively. The following examples are taken from "The Pragmatic Programmer" by Hunt and Thomas.
Find all .java files modified more recently than your Makefile
| Shell | find . -name '*.java' -newer Makefile -print |
|---|---|
| GUI | Open the FileManager, navigate to the correct directory. Click on the Makefile, and note the modification time. Bring up Tools/Find, and enter *.java for the file specification. Select the date tab, and enter the date you noted for the Makefile in the first date field. Click OK. |
Construct a zip/jar/tar archive of a project source files
| Shell | zip archive.zip *.java –or– jar cvf archive.jar *.java –or– tar cvf archive.tar *.java |
|---|---|
| GUI | Bring up the WinZip utility, select Create New Archive in the menu. Enter its name, select the sources directory in the adding dialog. Set the filter to *.java. Click Add. Close the archive. |
Which Java files have not been changed in the last week?
| Shell | find . -name '*.java' -mtime +7 -print |
|---|---|
| GUI | Click and navigate to Find files, click the Named field and type in '*.java'. Select the Date Modified tab. Select Between. Click on the starting date and type in the date of the beginning of the project. Click on the ending date and type in the date of a week ago today (you may need to check a calendar). Click on Find Now. |
Of those files, which use the awt library?
| Shell | find . -name '*.java' -mtime +7 -print | xargs grep 'java.awt' |
|---|---|
| GUI | Load each file in the list from the previous example into an editor, and search for the string "java.awt". Write down the name of each file containing a match. |
Create a list of all unique package names explicitly imported by your code
| Shell | grep '^import ' *.java | sed -e 's/.*import *//' -e 's/;.*$//' | sort -u > list |
|---|---|
| GUI | (dreadful to even contemplate; Microsoft share price plummets) |

One evening, Master Foo and Nubi attended a gathering of programmers who had met to learn from each other. One of the programmers asked Nubi to what school he and his master belonged. Upon being told they were followers of the Great Way of Unix, the programmer grew scornful.
"The command-line tools of Unix are crude and backward", he scoffed. "Modern, properly designed operating systems do everything through a graphical user interface."
Master Foo said nothing, but pointed at the moon. A nearby dog began to bark at the master's hand.
"I don't understand you!" said the programmer.
Master Foo remained silent, and pointed at an image of Buddha. Then he pointed at a window.
"What are you trying to tell me?" asked the programmer.
Master Foo pointed at the programmer's head. Then he pointed at a rock.
"Why can't you make yourself clear?" demanded the programmer.
Master Foo frowned thoughtfully, tapped the programmer twice on the nose, and dropped him in a nearby trashcan.
As the programmer was attempting to extricate himself from the garbage, the dog wandered over and piddled on him.
At that moment, the programmer achieved enlightenment.
Webster's International Dictionary says:
Abstraction plays a vital role in the development of science. The human ability to abstract is therefore essential, yet only one third of modern adults have this ability (the so-called formal stage of human development, according to Jean Piaget; see this recent article, which discusses the teaching of abstraction to computer professionals).
Some computer technologies promote abstraction on the user level, while others may hinder it.
GUI: hallmark of technological progress, problem for mankind
The CLI and programming languages in general are adequate tools for capturing and using abstraction. We need complex notions and concepts because we need to solve complex problems. Abstractions are not just a means to grasp these complex notions and concepts; abstraction is the hallmark of our mental ability to be up to the task. A GUI, which lacks the ability to express abstraction, therefore degrades our ability to handle complex problems.

You could argue that visual means (including the GUI) do allow capturing abstractions. Consider abstract art, for example! Indeed, look at the picture on the next slide. What is it? The answer depends on additional information (part of which can be due to perception, and therefore be subjective). If this were a simple geometrical drawing, the interpretation would be simple, and the abstraction resolution would be easy. But it is, in fact, a piece of fine art, a rather famous picture. To understand its meaning ("how to read it") requires knowledge and understanding of the context (cultural, historic, etc.). The image alone does not reveal its context.
Images are not good at carrying generalized (abstract) meaning. When they do have abstract meaning, this can only be due to an associated context which is not part of the image. Abstraction appears as the result of a (maybe long) process of dealing with a variety of perceptions (images etc.). A single perception without such a process, or without a reference or context (which is a kind of database of all associated perceptions), does not allow one to develop an abstract definition.
[Image: Malevich's self-portrait]
By the way, this is the artist's self-portrait. Note that the head is also a black square, only smaller (microcosm and macrocosm).