ANU The Australian National University
____________________________________________________
[ANU] [DCS] [COMP2100] [Description] [Schedule] [Lectures] [Labs] [Homework] [Assignments] [Assessment] [PSP] [Eiffel] [Reading] [Help]
____________________________________________________

COMP2100
Lab Exam and sample solutions

Here are the lab exam questions together with some sample solutions. Note that there were often several ways to do them, particularly with the scripts, so just because your solution doesn't look like ours, doesn't mean it's wrong.

Question 1

In this question you will write two simple bash scripts.

Q 1(a)

Write a bash script called words1 to count the number of distinct words in a text file.

A word is a contiguous sequence of non-blank characters bordered by white space (or the beginning or end of the file). White space characters are spaces, tabs, form feeds, and new lines.

Two words are considered to be equivalent to each other if and only if they are the same length, and corresponding characters of the words are identical, disregarding the case of letters. For example, the words abc!De(F and abc!de(f are equivalent. Two words are considered to be distinct from each other if and only if they are not equivalent.

The script shall take the name of the file as its only command-line argument. After scanning the file it shall write one line to standard output containing the number of distinct words in that file, and the file's name.

For example, if the file file1.txt looks like this:

abc( def) gh@i
jkl aBc( deF)
GH@I mno

running the script should work as follows:

comp2100@partch ./words1 file1.txt
5 file1.txt

because there are five distinct words.

Hint: Use the tr program to (a) replace lower-case letters with the equivalent upper-case letters, and (b) replace all spaces, tabs and form feeds with new lines (so that each word is on a line by itself), then use sort to find distinct lines, then use grep to select all non-empty lines. Either pipe the result of this into wc or use the count option in grep.

Hint: You will probably need to consult the manpages for tr, sort, and grep. What regular expression matches any character?

Hint: You can write a tab as \t, a form feed as \f, and a new line as \n.

Note: The name of the script must be words1; the last character is the digit '1' (one).

[5 marks]

Solution

Here's one possible solution:

#!/bin/bash

words=$(cat ${1} | tr 'a-z\t\f ' 'A-Z\n\n\n' | sort -u | grep -c '.')
echo ${words} ${1}

This is not the only way to do it. An obvious variant is to split the two jobs done by the one call to tr, so that tr 'a-z\t\f ' 'A-Z\n\n\n' is replaced by tr 'a-z' 'A-Z' | tr '\t\f ' '\n'.

If you didn't know about the -c option for grep, you could have replaced grep -c '.' with grep '.' | wc -l.

If you didn't know that the pattern '.' matches any non-empty line, you could have done the same thing by using the -v option for grep (which reverses the logic of the selection) and saying grep -v '^$'. Since ^ represents the start of a line, and $ represents the end of a line, the pattern '^$' matches only empty lines. Reversing that with -v selects all non-empty lines.


Q 1(b)

Write a new script called words2 that loops through all the files given on the command line, counting distinct words and printing one line of output for each file, as in part (a). After it has done that, if there was more than one input file, it shall leave a blank line and then print a summary line giving the number of files read and the total number of distinct words across all input files.

You don't have to worry about what to do if a file doesn't exist.

For example, if the file file1.txt looks like this:

abc( def) gh@i
jkl aBc( deF)
GH@I mno

and the file file2.txt looks like this:

aBc( deF)
GH@I mno
pqr stu vwx

running the script should work as follows:

comp2100@partch ./words2 *.txt
5 file1.txt
7 file2.txt

2 files, 8 distinct words

Note that 5 + 7 ≠ 8. We conclude that there were 5 + 7 - 8 = 4 distinct words that occur in both file1.txt and file2.txt.

Hint: If you have succeeded with part (a), make a copy of your solution and use that as your starting point.

Hint: You might like to use some temporary files. If you do, it might be nice to include ${$} somewhere in the file name(s). Don't forget to clean up at the end of the script.

Hint: If you can't do part (a), you can still get marks for this part. Just count the number of words in each file with wc -w.

Hint: You may find this part easier than part (a), or not.

[5 marks]

Solution

Here's Richard's solution:

#!/bin/bash

rm -f tmp${$} tmptotal${$}

for f in ${*}
do
    cat ${f} | tr 'a-z\t\f ' 'A-Z\n\n\n' | sort -u > tmp${$}
    words=$(cat tmp${$} | grep -c '.')
    echo ${words} ${f}
    cat tmp${$} >> tmptotal${$}
done

if [ ${#} -gt 1 ]
then
    echo
    echo ${#} files, $(cat tmptotal${$} | sort -u | grep -c '.') distinct words
fi

rm -f tmp${$} tmptotal${$}

This uses a couple of temporary files tmp${$} and tmptotal${$}. The variable ${$} expands to the process ID of this shell. It's nice to use that as part of the file names, because then even if there are two processes running this same script at the same time, they won't clobber each other's temporary files because they will have different process IDs.

You dind't need to use temporary files if you didn't want to. An alternative solution for getting the totals would be to cat together all the files on the command line and then do the same thing to the resulting stream as you did to each file. (There's a minor issue here if a file doesn't end with a newline character.)


Question 2

In this question you will write two simple Eiffel programs that take data from a DICTIONARY and print it to the standard output. Imagine you are compiling the index of a book. For each entry you want in the index, you add an element to a dictionary with the key being the word or phrase, and the value being the page number on which it can be found.

In your home directory you will find two files. The first, index.txt, is a data file. The second, index0.e, is an Eiffel program that reads data from a file named on the command line and stores it in a DICTIONARY. Use this program as a starting point for this question.

Q 2(a)

Write an Eiffel program called index1 that prints out all the information in the data dictionary.

For example, if the file index.txt looks like this:

17 Bash scripting
23 Cascading Style Sheets
49 Eiffel
62 Assignment attempt
70 Data structures

then running the program should give something like this:

comp2100@partch index1 index.txt
Data structures 70
Cascading Style Sheets 23
Eiffel 49
Bash scripting 17
Assignment attempt 62

Notice that the order is all scrambled. Your program does not have to print the entries in any particular order, but it must print them all.

Your program must use an object of class ITERATOR from the Eiffel library. It must not loop through the data structure using a normal loop with a control variable. Have your iterator loop over the keys in the dictionary, not over the values.

Hint: You will need to look at the short forms of classes ITERATOR and DICTIONARY. How do you create an iterator for the keys in the dictionary?

Hint: Some of you may notice that class ITERATOR is an expanded class. Don't let this worry you. This makes no difference for the purposes of this exercise.

Note: The name of your class must be INDEX1; the last character is the digit '1' (one). This class must be in a file called index1.e. The name of the creation routine must be make.

[5 marks]

Solution

First, here's the "starting" version of the program, class INDEX0:

class

   INDEX0

      -- Starting point for 2004 COMP2100 Lab Exam Q2

creation
   make

feature
   
   make is
         -- Create a dictionary, read data into it from the file
         -- named on the command line, then print it out.
      local
         data: DICTIONARY [INTEGER, STRING]
         input: TEXT_FILE_READ
         value: INTEGER
         key: STRING
      do
         create data.make
         create input.connect_to (argument (1))
         from
            input.read_integer
         until
            input.end_of_input
         loop
            value := input.last_integer
            input.skip_separators
            input.read_line
            key := clone (input.last_string)
            data.add (value, key)
            input.read_integer
         end

         -- Insert code here to print the contents of 
         -- "data" to standard output.
         
      end

end -- class INDEX0

Here's a solution, with comments. First you need to declare an iterator. Since the keys are strings, it needs to be an iterator over strings. You could make it an attribute of the class, or a local in make. Probably a local is better, since it's not going to be used anywhere else.

iterator: ITERATOR [STRING]

Now here's the loop you need:

from
   iterator := data.get_new_iterator_on_keys
   iterator.start
until
   iterator.is_off
loop
   key := iterator.item
   value := data.at (key)
   std_output.put_string (key + " " + value.to_string + "%N")
   iterator.next
end

The first thing you needed to find was the feature get_new_iterator_on_keys in the documentation for class DICTIONARY. After that, it was just a matter of reading and understanding the documentation for class ITERATOR.

Here's the completed solution:

class

   INDEX1

      -- Solution to 2004 COMP2100 Lab Exam Q2(a)

creation
   make

feature
   
   make is
         -- Create a dictionary, read data into it from the file
         -- named on the command line, then print it out.
      local
         data: DICTIONARY [INTEGER, STRING]
         input: TEXT_FILE_READ
         value: INTEGER
         key: STRING
         iterator: ITERATOR [STRING]
      do
         create data.make
         create input.connect_to (argument (1))
         from
            input.read_integer
         until
            input.end_of_input
         loop
            value := input.last_integer
            input.skip_separators
            input.read_line
            key := clone (input.last_string)
            data.add (value, key)
            input.read_integer
         end

         from
            iterator := data.get_new_iterator_on_keys
            iterator.start
         until
            iterator.is_off
         loop
            key := iterator.item
            value := data.at (key)
            std_output.put_string (key + " " + value.to_string + "%N")
            iterator.next
         end
      end

end -- class INDEX1

Q 2(b)

Write a new Eiffel program called index2 that does exactly the same as index1 except that it prints out the index entries in alphabetical order.

For example if the file index.txt looks like this:

17 Bash scripting
23 Cascading Style Sheets
49 Eiffel
62 Assignment attempt
70 Data structures

then running the program should give this:

comp2100@partch index2 index.txt
Assignment attempt 62
Bash scripting 17
Cascading Style Sheets 23
Data structures 70
Eiffel 49

Notice that the entries are now listed in alphabetical order.

You may want to use your index1.e as a starting point. If you do, make sure that you copy it into a new file called index2.e before you start. Do not continue editing index1.e.

You will need to get all the keys from the dictionary into an array, then sort them. You must sort them using an object of class COLLECTION_SORTER from the Eiffel library. Do not write your own sorting routine.

Once you have sorted the keys, you must use an iterator to loop through them.

Hint: There is an easy way to get all the keys into an array, if you look at the documentation hard enough.

Hint: You will certainly want to look up the short form of class COLLECTION_SORTER.

Hint: You may notice that class COLLECTION_SORTER is an expanded class. Don't let this worry you. The only difference is that you don't have to create the sorter before you use it. Look at the example code at the top of the short form.

Hint: Class STRING inherits from class COMPARABLE, which means that it has an ordering already defined on it. If you find yourself writing complicated code to compare two strings, stop and think again. You don't need to do that; it's already done for you in class STRING.

Hint: You will need to look at the short form of class ARRAY to find out how to get an iterator for the elements of the array.

Note: The name of your class must be INDEX2. It must be in a file called index2.e. The name of the creation routine must be make.

[5 marks]

Solution

Here there are three steps.

  1. Get an array of strings containing the keys.

  2. Sort that array into alphabetical order.

  3. Loop through the sorted array and print out the keys and values.

For the first step, you need to declare an array of strings to hold the keys. Again this could be an attribute or a local in make. Probably better as a local, since it's not used anywhere else.

keys: ARRAY [STRING]

Now there are two ways to get the keys into this array. The easy way was to use the feature key_map_in of class DICTIONARY:

create keys.with_capacity (data.count, 1)
data.key_map_in (keys)

If you didn't spot that, you would have had to do it the hard way, by looping through the keys and copying the references one by one:

from
   create keys.make (1, 0)
   iterator := data.get_new_iterator_on_keys
   iterator.start
until
   iterator.is_off
loop
   keys.add_last (iterator.item)
   iterator.next
end

The second step is to sort the array using a COLLECTION_SORTER. First you have to declare one, again as a local in make. The things we're sorting are strings, so the declaration has to reflect that.

sorter: COLLECTION_SORTER [STRING]

Now it is just one line to do the sorting:

sorter.sort (keys)

The final step is to print out the sorted keys and corresponding values. This is very similar to part (a), but you have to loop through the array, not the dictionary.

from
   iterator := keys.get_new_iterator
   iterator.start
until
   iterator.is_off
loop
   key := iterator.item
   value := data.at (key)
   std_output.put_string (key + " " + value.to_string + "%N")
   iterator.next
end

Throughout all this, I've used local variables key of class STRING and value of class INTEGER, but you don't have to do that. The body of that last loop could just as well have been written:

std_output.put_string (iterator.item)
std_output.put_character (' ')
std_output.put_integer (data.at (iterator.item))
std_output.put_new_line
iterator.next

It's a matter of taste as to which you think is clearer.

Here's the completed program for part (b):

class

   INDEX2

      -- Solution to 2004 COMP2100 Lab Exam Q2(b)

creation
   make

feature
   
   make is
         -- Create a dictionary, read data into it from the file
         -- named on the command line, then print it out.
      local
         data: DICTIONARY [INTEGER, STRING]
         input: TEXT_FILE_READ
         value: INTEGER
         key: STRING
         keys: ARRAY [STRING]
         sorter: COLLECTION_SORTER [STRING]
         iterator: ITERATOR [STRING]
      do
         create data.make
         create input.connect_to (argument (1))
         from
            input.read_integer
         until
            input.end_of_input
         loop
            value := input.last_integer
            input.skip_separators
            input.read_line
            key := clone (input.last_string)
            data.add (value, key)
            input.read_integer
         end

         create keys.with_capacity (data.count, 1)
         data.key_map_in (keys)
         sorter.sort (keys)
         
         from
            iterator := keys.get_new_iterator
            iterator.start
         until
            iterator.is_off
         loop
            key := iterator.item
            value := data.at (key)
            std_output.put_string (key + " " + value.to_string + "%N")
            iterator.next
         end
      end

end -- class INDEX2
____________________________________________________
[ANU] [DCS] [COMP2100] [Description] [Schedule] [Lectures] [Labs] [Homework] [Assignments] [Assessment] [PSP] [Eiffel] [Reading] [Help]
____________________________________________________

Copyright © 2004, Ian Barnes & Richard Walker, The Australian National University
Feedback & Queries to comp2100@iwaki.anu.edu.au
Version 2004.2, 4 June 2004, 19:05:36