COMP2100/2500
Lecture 23: Shell Programming III
Summary
More Unix commands for scripting
Functions in bash
sed and awk
Command Line Interface vs Graphical User Interface
Aims
Introduce more Unix commands used in scripting
Review advanced features of shell, describe sed and awk
Demonstrate the advantages of the CLI over the GUI
Discuss general issues about GUI and abstraction
Already known commands
grep
find
cut
head
tail
sort
printf
(forgotten how they are used? Me too! Look up the man pages)
Other useful commands
tr: transliterates characters in its input
Reads from stdin, usage: tr [options] string1 [string2]
The first character in string1 is transliterated into the first character in string2, etc.
cat file | tr -sc A-Za-z '\n' | sort | uniq -c | sort -n
returns a list of all words used in file, each with the number of times it occurs, sorted by that count
(option -c says to use the complement of the set of characters in string1,
and -s says to squeeze each run of repeated output characters into just one,
so that tr -sc A-Za-z '\n' turns every run of non-letters into a single newline)
diff: (seen already) compares two files, reports the differences
comm: comparison utility for sorted files
comm f1 f2 outputs three columns (lines unique to f1, lines unique to f2, and lines common to both)
xargs: builds and executes command lines from stdin. It reads the items arriving on stdin and appends them as arguments to the command named on its own command line, running that command as many times as needed (works when backquotes don't, for example when the list of arguments is too long for one command line)
strings: searches a binary file for printable ASCII strings
(can be useful for reverse engineering, or for finding meta-content in binary files such as images)
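A minimal sketch of comm, xargs and tr at work. The two word lists and all variable names are made up for the illustration; only the commands themselves come from the list above.

```shell
#!/bin/bash
# Hypothetical sample data: two small sorted word lists in a temporary directory.
tmp=$(mktemp -d)
printf 'apple\nbanana\ncherry\n' > "$tmp/f1"
printf 'banana\ncherry\ndate\n'  > "$tmp/f2"

# comm -12 suppresses columns 1 and 2, leaving only the lines common to both files.
common=$(comm -12 "$tmp/f1" "$tmp/f2")
echo "$common"

# comm -23 suppresses columns 2 and 3, leaving the lines found only in f1.
only_f1=$(comm -23 "$tmp/f1" "$tmp/f2")
echo "$only_f1"

# xargs turns the lines arriving on stdin into arguments of the given command:
# here the two file names become the arguments of a single wc -l call.
printf '%s\n' "$tmp/f1" "$tmp/f2" | xargs wc -l

# tr with two strings: the i-th character of the first is mapped to the i-th of the second.
upper=$(echo "shell" | tr a-z A-Z)
echo "$upper"

rm -r "$tmp"
```

Note that comm expects both inputs to be sorted already; with unsorted files its columns are not reliable.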
Functions in bash
The shell scripting languages allow for effective code re-use in the form of function definitions. A function is defined in a code block (statements enclosed in curly brackets). A bash function can serve as a "black box": variables declared with local inside it are invisible to the outer parts of the script (variables not declared local are global). If there is repetitive code for a task which is subject to slight variations, the function feature of bash can be useful. The syntax is:
function fn() {
    command1...
    command2...
    ...
}
where either the keyword function or the round brackets after the function name can be omitted (the form fn() { ... } without the keyword is the POSIX one and the most portable). A function is triggered in the outer script by invoking its name, but the invocation can only follow the definition, it cannot precede it. Eg, I put the following into the script my_script:
#!/usr/local/bin/bash
exclamation="Vivat, I won!"
echo -n "exclaiming from outside: "
echo ${exclamation}
f(){
    local exclamation
    exclamation="Alas, I lost."
    echo -n "exclaiming from inside: "
    echo ${exclamation}
}
f
echo -n "again exclaiming from outside: "
echo ${exclamation}
When run, the output will be
abx@eudyptula:~/comp2100/cvs/lectures$ ./my_script
exclaiming from outside: Vivat, I won!
exclaiming from inside: Alas, I lost.
again exclaiming from outside: Vivat, I won!
Functions may process arguments passed to them and return an exit status to the script for further processing.
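Arguments and exit status can be sketched as follows. The function is_even is a hypothetical helper invented for this illustration: inside a function the arguments are available as $1, $2, ... (just like the command line arguments of a script), and return sets the exit status.

```shell
#!/bin/bash
# is_even is a hypothetical helper: it reads its first argument as $1
# and reports the result through its exit status (0 means success/true).
is_even() {
    local n=$1
    if (( n % 2 == 0 )); then
        return 0
    else
        return 1
    fi
}

# The exit status can be tested directly, like that of any other command ...
if is_even 10; then
    echo "10 is even"
fi

# ... or captured from $? for later use.
status=0
is_even 7 || status=$?
echo "exit status for 7: $status"
```

The exit status is a small integer (0 to 255), so it suits yes/no answers and error codes; a function that has to hand back text usually prints it, and the caller captures it with $(...).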
Bash itself offers only limited support for regular expressions. To make your script recognise and use REs, one can use the Unix utilities sed and awk.
sed and awk
sed
sed (stream editor) is a powerful filter program. The syntax for this command is
sed 'list of sed commands' filenames
sed processes the input files line by line, independently. The quotes are almost always needed to protect the sed metacharacters from the shell (where they can also have a meaning). The output is written to standard output (stdout). The sed commands are usually matching and substitution instructions (including a simple variant of the regular expression language), according to which the processed lines are modified. Consider an example. Suppose I (after enduring considerable pain) manually edited the assignment group list (from a1 to a2, taking into account your requests for group change). But I didn't change the "a1" in the original file:
abx@eudyptula:~/comp2100/cvs/lectures$ cat groups_a2.txt
[a1:/branches/group03] u4125148, u4222523
[a1:/branches/group04] u4222980, u4223071
[a1:/branches/group05] u4210391, u4234643
.......
Not to worry! I need not even open a text editor. First I rename the file
abx@eudyptula:~/comp2100/cvs/lectures$ mv groups_a2.txt groups_a1.txt
Then I run it through sed
abx@eudyptula:~/comp2100/cvs/lectures$ sed 's/a1/a2/' groups_a1.txt > groups_a2.txt
abx@eudyptula:~/comp2100/cvs/lectures$ cat groups_a2.txt
[a2:/branches/group03] u4125148, u4222523
[a2:/branches/group04] u4222980, u4223071
[a2:/branches/group05] u4210391, u4234643
.....
and I have the correct group list.
To create the Subversion repositories for the assignment project in comp2100 one should use the command
svn copy http://svn/comp2100/a2/trunk/ http://svn/comp2100/a2/branches/groups01/
But for 50-odd groups, it would be smarter to use a script. First, I construct the group list group03, group04,...,group46 from the file groups_a2.txt, using the following command:
groups=$(sed -e 's/].*//' -e 's/.*group/group/' groups_a2.txt)
The -e option means that the next token is to be treated as a sed command, not as the name of a file from which sed reads the lines. Using the list $groups and a short bash script with a for loop, one can easily create all those svn directories.
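The for loop mentioned above might look like the sketch below. It is a dry run under stated assumptions: the three-line sample file stands in for the real groups_a2.txt, the variable names list, base and g are made up, and echo prints each svn command instead of running it.

```shell
#!/bin/bash
# Hypothetical stand-in for groups_a2.txt, so that the sketch can be run anywhere.
list=$(mktemp)
cat > "$list" <<'EOF'
[a2:/branches/group03] u4125148, u4222523
[a2:/branches/group04] u4222980, u4223071
[a2:/branches/group05] u4210391, u4234643
EOF

# Same sed command as in the text: strip everything from "]" on,
# then everything before "group".
groups=$(sed -e 's/].*//' -e 's/.*group/group/' "$list")

# Dry run: echo prints each svn command instead of executing it.
base=http://svn/comp2100/a2
for g in $groups; do
    echo svn copy "$base/trunk/" "$base/branches/$g/"
done

rm "$list"
```

$groups is deliberately left unquoted in the for line, so that the shell splits it into one word per group. Before removing the echo, keep in mind that a URL-to-URL svn copy commits immediately and will ask for a log message.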
Apart from substitution, sed can be instructed to perform (or not to perform) a certain action when it finds a pattern in the input. The following example shows an implementation of a command newer, which lists all files in a directory that are newer than a specified file, followed by the file itself (it also shows how a command line argument of the enclosing shell script can be spliced into the sed command):
ls -t | sed '/^'$1'$/q'
(the quotes are closed before $1 and reopened after it, which exposes $1 to the shell; the shell replaces it with the filename).
sed is good: it's fast, easy to use, and it can handle very long inputs. But it does everything line by line; multi-line processing is hard and awkward. Here, to its aid comes
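A few more pattern-triggered actions, as a minimal sketch. The sample input and the variable names are invented for the illustration; d, p and q are standard sed commands.

```shell
#!/bin/bash
# Hypothetical sample input, kept in a variable so that no files are needed.
input='# build settings
CC=gcc

# flags
CFLAGS=-O2'

# /pattern/d deletes every line that matches: here comments and empty lines.
settings=$(printf '%s\n' "$input" | sed -e '/^#/d' -e '/^$/d')
echo "$settings"

# With -n nothing is printed by default; /pattern/p prints only the matching lines.
comments=$(printf '%s\n' "$input" | sed -n '/^#/p')
echo "$comments"

# /pattern/q prints lines up to and including the first match, then quits
# (the same mechanism that newer relies on).
printf '%s\n' "$input" | sed '/^$/q'
```

The combination sed -n '/pattern/p' behaves much like grep 'pattern'.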
awk
awk (Aho, Weinberger and Kernighan) is a powerful stream processor and formatter, with a syntax similar to C. In some respects, it is more powerful than the shell (eg, it has floating-point arithmetic, which the shell doesn't have). The usage is similar to sed:
awk [options] 'program' filenames
awk also reads its input one line at a time, but you can define what the line is (in awk it's called a record, and it can span several actual lines), and the line is automatically parsed into fields (the field separator can also be defined before and in the middle of processing), which can be dealt with separately. Like sed, awk does not alter the content of the input files. The awk program is different from sed's:
pattern { action }
pattern { action }
.....
For each pattern that matches the line, the corresponding action is performed. The pattern can be a regex or a boolean expression. The fields are referenced in the same way as the command line arguments of shell scripts: $0 is the entire input line, $1 the first field, etc. (instead of $# there is the built-in variable NF). The following script prints the list of users who have no passwords (imagine this nowadays):
awk -F: '$2 == ""' /etc/passwd
awk has two special patterns, BEGIN and END; followed by blocks in curly brackets, they define actions performed before the first and after the last input line is processed. They are very convenient for setting the built-in variables and for performing some action over all processed data. Eg,
awk 'END { print NR }' files
does the same as cat files | wc -l.
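As a small sketch of BEGIN and END working together with a per-record action, the following computes an average. The colon-separated sample records and the variable names are made up for the illustration.

```shell
#!/bin/bash
# Hypothetical colon-separated records: name and marks.
data='ann:7
bob:9
eve:8'

# BEGIN runs before any input is read: a convenient place to set FS.
# The middle block (no pattern) runs once per record.
# END runs after the last record, when NR holds the total record count.
report=$(printf '%s\n' "$data" | awk '
BEGIN { FS = ":" }
      { sum += $2 }
END   { print NR " records, average " sum / NR }')
echo "$report"
```

Setting FS in a BEGIN block has the same effect here as the -F: option on the command line.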
The awk script
{
    nc += length($0) + 1
    nw += NF
}
END { print NR, nw, nc }
counts lines, words and characters like "full" wc. The main built-in variables in awk are:
- FILENAME: name of current input file
- FS: field separator character (default blank and tab)
- NF: number of fields in input record
- NR: (current) number of input records
- OFMT: output format for numbers
- OFS: output field separator string (blank)
- ORS: output record separator string (newline)
- RS: input record separator (newline)
By controlling RS and ORS one can process a complex formatted input, and produce a custom formatted output.
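A minimal sketch of what controlling the separators can do. The address book and the variable names are hypothetical; setting RS to the empty string is awk's "paragraph mode", in which records are separated by blank lines.

```shell
#!/bin/bash
# Hypothetical address book: records are separated by blank lines,
# and each field of a record sits on its own line.
book='Ann
Canberra

Bob
Sydney'

# RS = ""  : paragraph mode, a blank line ends a record.
# FS = "\n": each line of a record becomes one field.
# OFS      : placed between the arguments of print.
# ORS      : placed after each printed record.
table=$(printf '%s\n' "$book" | awk '
BEGIN { RS = ""; FS = "\n"; OFS = " lives in "; ORS = ".\n" }
      { print $1, $2 }')
echo "$table"
```

Each two-line record comes out as a single sentence, something that would be awkward to do with sed alone.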
With a syntax similar to C, awk has a full set of control structures (if-else, for and while loops). It has arrays: the awk script backwards
{ line[NR] = $0 }
END { for (i = NR; i > 0; i--) print line[i] }
when called as awk -f backwards file, prints the lines from file in reversed order. It also has associative arrays and a set of built-in functions (such as length(), seen above; they include mathematical functions).
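Associative arrays are indexed by strings rather than by numbers. A minimal sketch, with a made-up input text and made-up names count, w and freq, is a word frequency counter:

```shell
#!/bin/bash
# Hypothetical input text.
text='the cat saw the dog
the dog ran'

# count[] is indexed by the words themselves; no declaration is needed.
# The order of "for (w in count)" is unspecified, so the result is piped
# through sort: by count (numeric, descending), then by word.
freq=$(printf '%s\n' "$text" | awk '
{ for (i = 1; i <= NF; i++) count[$i]++ }
END { for (w in count) print count[w], w }' | sort -k1,1nr -k2)
echo "$freq"
```

This does in one awk program roughly what the tr | sort | uniq -c pipeline from the beginning of the lecture does with four processes, except that words here are simply whitespace-separated fields.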
Advanced Bash-Scripting Guide
This is an on-line book (or very long tutorial, the genre is unclear), very helpful, very detailed, very free. Use it!
CLI and GUI doing the same task
The following examples were taken from "The Pragmatic Programmer" by Hunt and Thomas.
Find all .java files modified more recently than your Makefile
Shell: find . -name '*.java' -newer Makefile -print
GUI: Open the FileManager, navigate to the correct directory. Click on the Makefile, and note the modification time. Bring up Tools/Find, and enter *.java for the file specification. Select the date tab, and enter the date you noted for the Makefile in the first date field. Click OK.
Construct a zip/jar/tar archive of the project's source files
Shell: zip archive.zip *.java
or
jar cvf archive.jar *.java
or
tar cvf archive.tar *.java
GUI: Bring up the WinZip utility, select Create New Archive in the menu. Enter its name, select the sources directory in the adding dialog. Set the filter to *.java. Click Add. Close the archive.
Which Java files have not been changed in the last week?
Shell: find . -name '*.java' -mtime +7 -print
GUI: Click and navigate to Find files, click the Named field and type in '*.java'. Select the Date Modified tab. Select Between. Click on the starting date and type in the starting date of the beginning of the project. Click on the ending date and type in the date of a week ago today (you may need to check with a calendar). Click on Find Now.
Of those files, which use the awt library?
Shell: find . -name '*.java' -mtime +7 -print |
xargs grep 'java.awt'
GUI: Load each file in the list from the previous example into an editor, and search for the string "java.awt". Write down the name of each file containing a match.
Not lurid enough, you say? Then consider this...
Create a list of all unique package names explicitly imported by your code
Shell: grep '^import ' *.java |
sed -e 's/.*import *//' -e 's/;.*$//' |
sort -u > list
GUI: (dreadful to even contemplate; Microsoft share price plummets)
A parable from Eric S. Raymond (The Art of Unix Programming)
Master Foo Discourses on the Graphical User Interface
One evening, Master Foo and Nubi attended a gathering of programmers who had met to learn from each other. One of the programmers asked Nubi to what school he and his master belonged. Upon being told they were followers of the Great Way of Unix, the programmer grew scornful.
"The command-line tools of Unix are crude and backward", he scoffed. "Modern, properly designed operating systems do everything through a graphical user interface."
Master Foo said nothing, but pointed at the moon. A nearby dog began to bark at the master's hand.
"I don't understand you!" said the programmer.
Master Foo remained silent, and pointed at an image of Buddha. Then he pointed at a window.
"What are you trying to tell me?" asked the programmer.
Master Foo pointed at the programmer's head. Then he pointed at the rock.
"Why can't you make yourself clear?" demanded the programmer.
Master Foo frowned thoughtfully, tapped the programmer twice on the nose, and dropped him in a nearby trashcan.
As the programmer was attempting to extricate himself from the garbage, the dog wandered over and piddled on him.
At that moment, the programmer achieved enlightenment.
GUI: hallmark of technological progress, problem for mankind
CLI and programming languages in general are adequate tools for capturing and using abstraction. We need complex notions and concepts because we need to solve complex problems. Abstractions are not just a means to grasp these complex notions and concepts; abstraction is the hallmark of our mental ability to be up to it. A GUI, which lacks the ability to express abstraction, therefore degrades our ability to handle complex problems. You can argue that visual means (including GUI) do allow capturing abstractions. Consider abstract art, for example! Indeed, look at this picture. What is it? The answer depends on additional information (part of which can be due to perception, and therefore subjective). If this were a simple geometrical drawing, the interpretation would be simple, and resolving the abstraction would be easy. But this is, in fact, a piece of fine art, a rather famous picture. To understand its meaning ("how to read it") requires knowledge and understanding of the context (cultural, historic, etc.). The image alone does not reveal its context.
Images are not good for having generalised (abstract) meaning. When they do have abstract meaning, this can only be due to associated context which is not part of the image.
By the way, this is the artist's self-portrait. Note that the head is also a black square, only smaller (microcosm and macrocosm).
Copyright © 2006, Alexei Khorev (standing on the shoulders of giants), The Australian National University
Version 2006.5, Wednesday, 10 May 2006, 15:33:06 +1000
Feedback & Queries to
comp2100@cs.anu.edu.au