COMP2100 / COMP2500

Software Construction, Software Construction for Software Engineers

Practical Exam

Semester 1 2008

Write and test the following programs in Java and Bash. Develop and test your programs in whatever way you like; you are responsible for how they behave when I test them later. You should design your testing strategy carefully. You can use JUnit if you wish, or you can use a script, or you can test by hand. You can use DrJava, or Kate, or emacs, or Eclipse, or...

Copies of the data and program files have been provided in your top level home directory and in the directory /dept/dcs/comp2100/exam

Resources

The computer environment for this exam has no Internet access outside the local domain. Copies of, and links to, the normal resources have been provided as follows.

A copy of the class web site is at http://csitexam/student/comp2100

The full Java 5 JDK and API documentation is at

/dept/dcs/comp2100/public/www/jdk/index.html

/dept/dcs/comp2100/public/www/jdk/api/index.html

The full Java 6 JDK and API documentation is at

/usr/share/doc/Sun-java6-doc/html/index.html

/usr/share/doc/Sun-java6-doc/html/api/index.html

The full Java Tutorial is at

/dept/dcs/comp2100/public/www/tutorial/index.html

The Apache Ant Tutorial is at

/dept/dcs/comp2100/public/www/ant/index.html

The Advanced Bash Scripting Guide is at

/dept/dcs/comp2100/public/www/abs-guide/HTML/index.html

JUnit 4 (which includes JUnit 3.8) is at /dept/dcs/comp2100/public/junit.jar

A copy of the exam starting files and datasets is in the directory

/dept/dcs/comp2100/exam

  1. Bash: extracting information from files of text records [30 marks]
  2. Java: counting lines and computing the average length of lines [50 marks]
  3. Bash: extracting and joining information (harder) [10 marks]
  4. Java: counting characters, lines and paragraphs (harder) [10 marks]

Setting up your directories

Your home directory has been set up with 4 directories: q1, q2, q3, q4. Your solutions must be in the correct directory (answers to Question 1 in q1, etc.).


Question 1.

Bash: extracting information from files of text records [30 marks]

A librarian wants to extract useful information from a file containing records of loans of books.

The LoansList.tsv file contains records in the format

    memberID  date  time bookID
  

A single TAB character separates these fields (hence the suffix tsv: "tab-separated values").

Write two Bash scripts for these tasks. Leave your script files in directory q1.

  1. Task BookFreq. [15 marks] In a file called BookFreq, write a Bash script that expects no command arguments. It shall count how many times each book has been borrowed, as recorded in the file LoansList.tsv.

    The output shall be in order of bookID numbers.

    The output shall be in a format like the following sample (this is only part of the answer):

       6 1994212
      10 1996243
       1 1997273
       2 1997274
       4 1998284

    This output format has one bookID on each line. The line contains:

    1. the count of how many loans for that bookID in the LoansList
    2. the bookID
    There is optional white space before the count, then required white space (a tab or one or more spaces), then the bookID.

    Hint: think of a short, simple solution of only a few lines of Bash!

    Hints and suggestions:

    Your script must be in directory named q1, in a file named BookFreq (there is no filename extension). It must be written in Bash.

    You should test your script carefully: look at the contents of LoansList.tsv, run other commands, count by hand... to make sure that your answer is correct. Note that LoansList.tsv contains only a small number of memberIDs, but a larger number of loans and a large number of bookIDs.

  2. Task unPopular. [15 marks] In a file called unPopular, write a script that outputs a list of the least frequently borrowed books
    (that is, a list of the bookIDs that were borrowed the smallest number of times, with the count of how many times each has been borrowed). The command has one argument, which states how many of the least borrowed books will be listed. For example,
        %  ./unPopular 2 
          

    will output the count and bookID for 2 books only: that is, the books that have the 2 smallest counts, such as

    1 1997273
    2 1997274

    (Again, these are not the actual values of the answer that you will produce.)

    The results shall be in increasing order of frequency of borrowings.

    The argument to the command determines the number of books that are listed.

    % ./unPopular 5
    would give us the 5 least borrowed books, for example.

    The argument may be any integer greater than zero.

    The output shall be in exactly the same format as for part 1.

    You can make use of your BookFreq command in this script if you wish.

    Your script for this part must be in the same directory called q1 of your home directory, in a file called unPopular

    Hint: you may want to use the head command in your script.
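The Bash scripts are the required answers, but an independent count is a useful way to test them, as the testing advice for BookFreq suggests. Below is a minimal Java cross-check, assuming LoansList.tsv is in the current directory and every record has the four TAB-separated fields shown earlier. The class name LoanStats is hypothetical. With no argument it prints BookFreq-style output; with one argument N it prints the N least borrowed books, breaking ties by bookID (an assumption, since the exam text does not fix an order for ties).

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Cross-check for the Question 1 scripts; assumes LoansList.tsv records are
// memberID<TAB>date<TAB>time<TAB>bookID.
class LoanStats {

    // Loans per bookID. A TreeMap orders keys as Strings, which matches numeric
    // order only while every bookID has the same number of digits (assumption).
    static SortedMap<String, Integer> countByBook(List<String> records) {
        SortedMap<String, Integer> counts = new TreeMap<String, Integer>();
        for (String record : records) {
            String[] fields = record.split("\t");
            if (fields.length >= 4) { // skip blank or malformed lines
                Integer old = counts.get(fields[3]);
                counts.put(fields[3], old == null ? 1 : old + 1);
            }
        }
        return counts;
    }

    // The n least borrowed books, smallest count first, ties broken by bookID.
    static List<Map.Entry<String, Integer>> leastBorrowed(Map<String, Integer> counts, int n) {
        List<Map.Entry<String, Integer>> entries =
            new ArrayList<Map.Entry<String, Integer>>(counts.entrySet());
        Collections.sort(entries, new Comparator<Map.Entry<String, Integer>>() {
            public int compare(Map.Entry<String, Integer> a, Map.Entry<String, Integer> b) {
                int byCount = a.getValue().compareTo(b.getValue());
                return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
            }
        });
        return entries.subList(0, Math.min(Math.max(n, 0), entries.size()));
    }

    // "count bookID"; the leading spaces are allowed by the required format.
    static String format(Map.Entry<String, Integer> e) {
        return String.format("%7d %s", e.getValue(), e.getKey());
    }

    public static void main(String[] args) throws IOException {
        List<String> records = new ArrayList<String>();
        BufferedReader in = new BufferedReader(new FileReader("LoansList.tsv"));
        try {
            for (String line = in.readLine(); line != null; line = in.readLine()) {
                records.add(line);
            }
        } finally {
            in.close();
        }
        SortedMap<String, Integer> counts = countByBook(records);
        if (args.length == 0) { // like BookFreq
            for (Map.Entry<String, Integer> e : counts.entrySet()) {
                System.out.println(format(e));
            }
            return;
        }
        int n = Integer.parseInt(args[0]); // like unPopular N
        if (n <= 0) {
            System.err.println("N must be an integer greater than zero");
            System.exit(1);
        }
        for (Map.Entry<String, Integer> e : leastBorrowed(counts, n)) {
            System.out.println(format(e));
        }
    }
}
```

Comparing its output with a script's output using `diff -w` ignores differences in leading white space. For unPopular, a mismatch may only mean that tied books were listed in a different order.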



Question 2.

Java: Counting lines and the average length of lines [50 marks]

This task is similar to Homework 5, but read this question carefully for differences.
Your Java programs must compile standalone. Create no packages in your development environment. Leave your Java source file in directory q2.

You are given a starting point for this program in /dept/dcs/comp2100/exam/LineCount.java

Save your finished program in the subdirectory q2 under your top level directory, in a file named LineCount.java

Write the missing parts for a Java program called LineCount.

LineCount counts the number of "non-empty" lines of text in any number of input files, and computes the average length of these lines.

The program must be able to accept any number of command line arguments, each assumed to be the name of a text file ("text" means any kind of characters).

For each of those files it must open the file and count the number of lines of text.
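It must count all lines except those that are empty. An empty line is a line that contains no characters at all,

or is "logically" empty:

  1. it contains only spaces and tabs, or
  2. it is a comment line, that is, its first character is '#'.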

A line is defined in the normal Java sense, as in BufferedReader.readLine: a sequence of characters terminated by LF (line feed, '\n'), by CR (carriage return, '\r'), or by CR immediately followed by LF. The last line of a file may also end at the end of the file, with no terminator.

The length of a line is the number of characters in the line. It does not include any line-termination characters.
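To see this definition in action, the sketch below (hypothetical class ReadLineDemo) feeds a string containing all three kinds of terminator through BufferedReader.readLine and prints each line's length, which never includes the terminator.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Shows how BufferedReader.readLine splits text into lines: LF, CR and CR+LF
// all end a line, and the terminator is never part of the returned String.
class ReadLineDemo {

    static List<String> splitLines(String text) {
        List<String> lines = new ArrayList<String>();
        BufferedReader in = new BufferedReader(new StringReader(text));
        try {
            for (String line = in.readLine(); line != null; line = in.readLine()) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new RuntimeException(e); // not expected for in-memory text
        }
        return lines;
    }

    public static void main(String[] args) {
        // "ab" ends with CR+LF, "cd" with CR, "ef" with LF, "gh" at end of input.
        for (String line : splitLines("ab\r\ncd\ref\ngh")) {
            System.out.println(line.length() + " [" + line + "]");
        }
    }
}
```

All four lines have length 2. Note also that a final terminator does not create an extra empty line: "x\n" gives one line, while "x\n\n" gives two, the second of them empty.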

After scanning each file the program must write a line to the standard output containing

  1. the number of non-empty lines in that file, or if the number of non-empty lines is zero, the message "no lines in"
  2. the average length of these non-empty lines, or if the number of non-empty lines is zero, no output
  3. the filename

In the output for each file, the number of lines should be right-justified in a field of width 6, and the average should be printed with two decimal places, as in the examples below. Leave one space after each number.

If there was more than one file, then LineCount must leave a blank line after the last file's output, and then print a summary line giving the number of files read and the total number of lines of text, and the overall average line length.
If there is only one file (or no files) then no summary shall be produced.
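One way to produce these lines is String.format with %6d and %.2f. The sketch below (hypothetical class LineCountFormat) assumes the caller has already counted lines and characters. It reads "overall average line length" as total characters divided by total lines, not as the mean of the per-file averages; that is an interpretation. It passes Locale.US so that the decimal separator is always a point whatever the default locale, which is a design choice rather than part of the specification.

```java
import java.util.Locale;

class LineCountFormat {

    // One output line per file: count right-justified in width 6, one space,
    // average with two decimals, one space, file name.
    static String fileLine(int lines, long chars, String fileName) {
        if (lines == 0) {
            return "no lines in " + fileName;
        }
        return String.format(Locale.US, "%6d %.2f %s", lines, (double) chars / lines, fileName);
    }

    // Summary line, printed only when there was more than one file.
    // Printing 0.00 when no lines were counted at all is an assumption.
    static String summaryLine(int files, int lines, long chars) {
        double average = lines == 0 ? 0.0 : (double) chars / lines;
        return String.format(Locale.US, "%d files, %d lines of text, average line length %.2f",
                files, lines, average);
    }

    public static void main(String[] args) {
        System.out.println(fileLine(262, 11054, "book.html"));
        System.out.println(fileLine(0, 0, "emptyFile.txt"));
        System.out.println(); // blank line before the summary
        System.out.println(summaryLine(2, 262, 11054));
    }
}
```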

For example, suppose that I run the program in a directory containing copies of all the HTML files in the tutorial directory provided for this exam, /dept/dcs/comp2100/public/www/tutorial.
An HTML file counts as a text file.

In this example I run the program twice.
% java LineCount book.html
   262 42.19 book.html
% java LineCount *.html
   262 42.19 book.html
   413 38.65 index.html
  1193 61.16 reallybigindex.html
    64 46.34 search.html
   699 48.56 trailmap.html

5 files, 2632 lines of text, average line length 52.04

The program does not have to check that all files are what we would call text: we assume that whatever data is present should be counted. But it does have to watch out for errors like naming files that don't exist. If there is a problem with a file and it can't count the lines, it should print an error message instead of the usual output line, and then continue with any remaining files in the argument list.

% java LineCount foo.txt book.html
ERROR: cannot open file foo.txt
   262 42.19 book.html
% java LineCount emptyFile.txt
no lines in emptyFile.txt

Hints and explanations

Remember that the wildcards on the command line in my example are expanded by the shell, before the resulting list of arguments is passed to the Java program. For example, when I typed ‘java LineCount *.html’, the program actually saw 5 (five) arguments on the command line, not the single argument *.html.
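A tiny program makes the shell's wildcard expansion visible. The sketch below (hypothetical class ShowArgs) prints how many arguments Java received and what they are; run as `java ShowArgs *.html`, it should report one argument per matching file.

```java
// Prints what the Java program actually receives after the shell has
// expanded any wildcards such as *.html.
class ShowArgs {

    static String describe(String[] args) {
        return args.length + " argument(s)";
    }

    public static void main(String[] args) {
        System.out.println(describe(args));
        for (int i = 0; i < args.length; i++) {
            System.out.println("args[" + i + "] = " + args[i]);
        }
    }
}
```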

Detecting a non-existent file is easily solved. Just create an object of class File with your file name. Then ask the object if it exists. (Check the API documentation.) Alternatively, catch the exception when you try to do anything with this file.
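A sketch combining both approaches, as a hypothetical helper openOrReport. It uses isFile() rather than exists(), because exists() is also true for a directory, which FileReader cannot open. It prints the error to standard output so that the message appears in order with the other output lines; the exam does not say which stream to use.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;

class OpenCheck {

    // Returns a reader for the named file, or null after printing an error line.
    static BufferedReader openOrReport(String fileName) {
        File file = new File(fileName);
        // isFile() is false both for missing files and for directories.
        if (!file.isFile() || !file.canRead()) {
            System.out.println("ERROR: cannot open file " + fileName);
            return null;
        }
        try {
            return new BufferedReader(new FileReader(file));
        } catch (FileNotFoundException e) {
            // The file can still disappear between the check and the open.
            System.out.println("ERROR: cannot open file " + fileName);
            return null;
        }
    }

    public static void main(String[] args) throws IOException {
        for (String name : args) {
            BufferedReader in = openOrReport(name);
            if (in != null) {
                System.out.println("opened " + name);
                in.close();
            }
        }
    }
}
```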

As part of your testing of this program, run it over all the HTML files in the tutorial directory and check that the line counts and averages are similar to the ones I listed above. Before doing this, create some small text files as selected test cases, and count the lines by hand (or count them using the prototype script averageLen in the exam directory) to partly verify your results. This averageLen script does not produce the final total and overall average, and it may not work correctly for empty lines containing spaces.

A simple algorithm: take a line from the file and look at whether the first character is a '#' (if it is, the line is a comment and is not counted). If not, remove all spaces and tabs, and then check whether what remains is empty.
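The algorithm can be written as a single predicate; the sketch below calls it isCounted (a hypothetical name). It follows the hint literally: only a '#' in the very first column makes a comment, so a line such as "  # note" is counted.

```java
class LineRules {

    // True if the line should be counted: its first character is not '#',
    // and something is left after removing every space and tab.
    static boolean isCounted(String line) {
        if (line.startsWith("#")) {
            return false;
        }
        return line.replace(" ", "").replace("\t", "").length() > 0;
    }

    public static void main(String[] args) {
        String[] samples = { "", "   \t ", "# comment", "  # indented hash", "text" };
        for (String s : samples) {
            System.out.println(isCounted(s) + "\t[" + s + "]");
        }
    }
}
```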

Think about boundary cases: empty files, empty lines, lines containing only spaces and tabs, lines of different lengths after a '#'...
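Keeping the counting for one file in its own method, separate from opening files and printing, also makes these boundary cases easy to try on in-memory text. The sketch below (hypothetical class FileStats) repeats the rule from the algorithm hint and reads from any Reader.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class FileStats {
    int lines;  // number of counted (non-empty, non-comment) lines
    long chars; // total length of those lines, terminators excluded

    // The rule from the "simple algorithm" hint.
    static boolean isCounted(String line) {
        return !line.startsWith("#")
            && line.replace(" ", "").replace("\t", "").length() > 0;
    }

    static FileStats count(Reader source) throws IOException {
        FileStats stats = new FileStats();
        BufferedReader in = new BufferedReader(source);
        for (String line = in.readLine(); line != null; line = in.readLine()) {
            if (isCounted(line)) {
                stats.lines++;
                stats.chars += line.length();
            }
        }
        return stats;
    }

    // Convenience wrapper for test text held in a String.
    static FileStats countText(String text) {
        try {
            return count(new StringReader(text));
        } catch (IOException e) {
            throw new RuntimeException(e); // not expected for in-memory text
        }
    }

    public static void main(String[] args) {
        // Empty file, blank line, whitespace-only line, comment line,
        // CR+LF endings, and a mixture with no final newline.
        String[] cases = { "", "\n", " \t\n", "# comment\n", "ab\r\ncd", "x\n\n  y  \n#z" };
        for (String text : cases) {
            FileStats s = countText(text);
            System.out.println(s.lines + " lines, " + s.chars + " chars");
        }
    }
}
```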


You will be marked on good style and clarity of your coding as well as on the correctness.

Hints for style: Divide the program into parts and write separate methods for logically separate parts such as opening files, counting in one file, deciding whether a particular line should be counted, etc.


Question 3. [10 marks]

THIS IS A HARDER QUESTION. DO NOT ATTEMPT IT UNTIL YOU HAVE COMPLETED Question 1 and Question 2.

Bash: extract and join information

Leave your script file for this question in directory q3.

The librarian has a further request, which uses a second file containing the list of all bookIDs and book titles, and a third file containing a list of books that have been returned from loan.

In a file called Last2Borrowed, write a script that outputs, for each member, the 2 latest outstanding loans (or fewer, if the member has fewer outstanding loans). The output shall be in order of memberID: each memberID on a line by itself, followed by that member's outstanding loans, one loan per line, ordered by increasing date. Each loan line shall list the date (not the time), the bookID and the title of the book.

The second file, BookList.tsv (also in the exam directory), contains records in the format

     bookID   author   yearPublished   ISBN    title

The third file, ReturnsList.tsv, is in the same format as LoansList.tsv.

The meaning of ReturnsList.tsv is that each line states the date and time that a book was returned by a member. We assume that the loans and returns lists start from the beginning of the library system; that loans and returns are recorded accurately; that a book must be borrowed before it can be returned; and that it cannot be returned by any member other than the one who borrowed it (these are not reasonable assumptions in real life).

An outstanding loan is for a book that has been borrowed but has not yet been returned.

The command shall take 0 arguments.

For example:

    $ ./Last2Borrowed
    u1377093
        2006-08-06      2006413    Bead & Button, Issue 73, June 2006
    u3248221
        2006-08-07      1982066    The Cimabue Crucifix       
        2006-08-23      2004383    Creative Drawing Ideas
    u3334840
        2006-08-06      1996244    Fashion Memoir: Alaa
        2006-08-13      2000326    Denton Corker Marshall: rule playing and the ratbag element
    
(As before, this is not the result that your correct program will produce.) Note that a member may borrow, return and then re-borrow the same book.

You can make use of your other command scripts in this script if you wish, but you must copy them into this directory.

Hint: consider using a for-do-done Bash construct; use sort and cut, and keep results in temporary variables and temporary files if necessary, to rejoin records and to handle one member at a time. Consider counting the combined loan and return records for one book to determine whether the book is still on loan (an odd total means it is still out); consider a Bash test and an arithmetic remainder calculation (the % operator) to determine whether a number is odd or even. Consider the grep command to extract matching records.
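The parity rule can be checked independently of the Bash script. The Java sketch below (hypothetical class OutstandingLoans) counts loan and return records per member and book, treats an odd total as still on loan, and keeps the 2 most recent outstanding loans per member. It assumes ISO dates (yyyy-mm-dd, as in the example) and zero-padded times, so that text order is time order; that titles contain no TAB; and that members with no outstanding loans are left out, which the exam text does not settle. The TAB-separated layout only approximates the example's indentation.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Cross-check for Last2Borrowed; records are memberID<TAB>date<TAB>time<TAB>bookID.
class OutstandingLoans {
    static List<String> report(List<String> loans, List<String> returns, Map<String, String> titles) {
        Map<String, Integer> events = new HashMap<String, Integer>(); // loans + returns per member/book
        Map<String, String> lastLoan = new HashMap<String, String>(); // latest "date<TAB>time" per member/book
        for (String record : loans) {
            String[] f = record.split("\t");
            if (f.length >= 4) {
                String key = f[0] + "\t" + f[3];
                increment(events, key);
                String when = f[1] + "\t" + f[2];
                String previous = lastLoan.get(key);
                if (previous == null || when.compareTo(previous) > 0) {
                    lastLoan.put(key, when);
                }
            }
        }
        for (String record : returns) {
            String[] f = record.split("\t");
            if (f.length >= 4) {
                increment(events, f[0] + "\t" + f[3]);
            }
        }
        // An odd total means one more loan than returns: the book is still out.
        SortedMap<String, List<String>> byMember = new TreeMap<String, List<String>>();
        for (Map.Entry<String, Integer> e : events.entrySet()) {
            if (e.getValue() % 2 == 1 && lastLoan.containsKey(e.getKey())) {
                String[] memberBook = e.getKey().split("\t");
                List<String> list = byMember.get(memberBook[0]);
                if (list == null) {
                    list = new ArrayList<String>();
                    byMember.put(memberBook[0], list);
                }
                list.add(lastLoan.get(e.getKey()) + "\t" + memberBook[1]); // date, time, bookID
            }
        }
        List<String> out = new ArrayList<String>();
        for (Map.Entry<String, List<String>> e : byMember.entrySet()) {
            List<String> list = e.getValue();
            Collections.sort(list); // oldest first
            out.add(e.getKey());
            // The 2 latest loans, still listed in increasing date order.
            for (String loan : list.subList(Math.max(0, list.size() - 2), list.size())) {
                String[] f = loan.split("\t");
                String title = titles.containsKey(f[2]) ? titles.get(f[2]) : "?";
                out.add("\t" + f[0] + "\t" + f[2] + "\t" + title);
            }
        }
        return out;
    }

    static void increment(Map<String, Integer> counts, String key) {
        Integer old = counts.get(key);
        counts.put(key, old == null ? 1 : old + 1);
    }

    static List<String> readLines(String fileName) throws IOException {
        List<String> lines = new ArrayList<String>();
        BufferedReader in = new BufferedReader(new FileReader(fileName));
        try {
            for (String line = in.readLine(); line != null; line = in.readLine()) {
                lines.add(line);
            }
        } finally {
            in.close();
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> titles = new HashMap<String, String>();
        for (String record : readLines("BookList.tsv")) { // bookID author year ISBN title
            String[] f = record.split("\t");
            if (f.length >= 5) {
                titles.put(f[0], f[4]);
            }
        }
        for (String line : report(readLines("LoansList.tsv"), readLines("ReturnsList.tsv"), titles)) {
            System.out.println(line);
        }
    }
}
```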


Question 4. [10 marks]

Java: character, line and paragraph counts

THIS IS A HARDER QUESTION. DO NOT ATTEMPT IT UNTIL YOU HAVE A GOOD SOLUTION TO Q2.

Write an extended version of the line counting program in Java. The new version shall be called CLPCount. You must have separate programs for the 2 Java questions. CLPCount.java must be left in directory q4.

Input: the input is taken from the files named in command line arguments, as for the LineCount program.

Output: three counts, of the number of characters, lines, and paragraphs in each of the named text files;
and a summary total of characters, lines, and paragraphs, summed across all of the input, similarly to LineCount.

A paragraph is defined as a continuous sequence of text lines (non-empty lines, as defined for LineCount). That is, one or more empty or comment lines separate paragraphs. The first paragraph may start at the beginning of a file, or after empty or comment lines; the last paragraph may end at the end of a file, or be followed by empty or comment lines. A file may contain any number of paragraphs (including zero).
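A paragraph can be counted by noticing each text line that starts a new run of text lines. The sketch below (hypothetical class ParagraphCount) reuses the LineCount rule for text lines. It leaves out character counting, because the question does not say whether line terminators count as characters.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

class ParagraphCount {

    // Text line: first character is not '#' and not only spaces and tabs.
    static boolean isTextLine(String line) {
        return !line.startsWith("#")
            && line.replace(" ", "").replace("\t", "").length() > 0;
    }

    // Returns {textLines, paragraphs}.
    static int[] count(BufferedReader in) throws IOException {
        int lines = 0;
        int paragraphs = 0;
        boolean inParagraph = false;
        for (String line = in.readLine(); line != null; line = in.readLine()) {
            if (isTextLine(line)) {
                lines++;
                if (!inParagraph) {
                    paragraphs++; // first text line at the start or after a gap
                    inParagraph = true;
                }
            } else {
                inParagraph = false; // an empty or comment line ends the paragraph
            }
        }
        return new int[] { lines, paragraphs };
    }

    static int[] countText(String text) {
        try {
            return count(new BufferedReader(new StringReader(text)));
        } catch (IOException e) {
            throw new RuntimeException(e); // not expected for in-memory text
        }
    }

    public static void main(String[] args) {
        int[] r = countText("a\nb\n\n# note\nc\n\n\n");
        System.out.println(r[0] + " lines, " + r[1] + " paragraphs"); // 3 lines, 2 paragraphs
    }
}
```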