Write and test the following programs in Java and bash. Develop and test your programs in whatever way you like; you are responsible for how they behave when I test them later. You should design your testing strategy carefully. You can use JUnit if you wish, or you can use a script, or you can test by hand. You can use DrJava, or Kate, or emacs, or eclipse, or...
Copies of the data and program files have been provided in your top level
home directory and in the directory /dept/dcs/comp2100/exam
The computer environment for this exam has no Internet access outside the local domain. Copies and links to the normal resources have been created as follows.
A copy of the class web site is at http://csitexam/student/comp2100
The full Java 5 JDK and API documentation is at
/dept/dcs/comp2100/public/www/jdk/index.html
/dept/dcs/comp2100/public/www/jdk/api/index.html
The full Java 6 JDK and API documentation is at
/usr/share/doc/Sun-java6-doc/html/index.html
/usr/share/doc/Sun-java6-doc/html/api/index.html
The full Java Tutorial is at
/dept/dcs/comp2100/public/www/tutorial/index.html
The Apache Ant Tutorial is at
/dept/dcs/comp2100/public/www/ant/index.html
The Advanced Bash Scripting Guide is at
/dept/dcs/comp2100/public/www/abs-guide/HTML/index.html
JUnit 4 (includes JUnit 3.8) is at
/dept/dcs/comp2100/public/junit.jar
A copy of the exam starting files and datasets is in the directory
/dept/dcs/comp2100/exam
Your home directory has been set up with 4 directories: q1, q2, q3, q4. Your solutions must be in the correct directory (answers to question 1 in q1, etc.).
A librarian wants to extract useful information from a file containing records of loans of books.
The LoansList.tsv file contains records in the format
memberID date time bookID
A single TAB character separates these fields (hence the suffix tsv: "tab separated values").
Write two Bash scripts for these tasks. Leave your script files in directory q1.
BookFreq: write a Bash script which expects no command arguments. It shall count how many times each book has ever been borrowed, as recorded in the file LoansList.tsv.
The output shall be in order of bookID numbers.
The output shall be in the format like the following sample (this is not the whole of the answer)
6 1994212
10 1996243
1 1997273
2 1997274
4 1998284
This output format has one bookID on each line. The line contains the number of times that book has been borrowed, followed by the bookID.
Hint: think of a short, simple solution of a small number of bash script lines!
Hints and suggestions:
- uniq has a counting option; uniq works only if its input is sorted into order.
- cut works with tab-separated fields by default.
- sort sorts the input to the output, but it produces lexicographic ordering (i.e. dictionary ordering, not numerical order).
- sort has a switch (-n) to choose numerical order, another to produce the result in reverse order (-r), and a switch to select which fields to use as keys (-k).
Your script must be in the directory named q1, in a file named BookFreq (there is no filename extension). It must be written in Bash.
You should test your script carefully; look at the contents of the LoansList, run other commands, count by hand... to make sure that your answer is correct. Note that the LoansList has only a small number of memberIDs, but a larger number of loans and a large number of bookIDs.
unPopular: write a script that outputs a list of the books that are least frequently borrowed. The command
% ./unPopular 2
will output the count and bookID for 2 books only: that is, the books that have the 2 smallest counts, such as
1 1997273
2 1997274
(again, this is not the actual values of the answer that you will produce.)
The results shall be in increasing order of frequency of borrowings.
The argument to the command determines the number of books that are listed.
% ./unPopular 5
would give us the 5 least borrowed books, for example.
The argument may be any integer number that is greater than zero.
The output shall be in exactly the same format as for part 1.
You can make use of your BookFreq command in this script if you wish.
Your script for this part must be in the same directory, q1, of your home directory, in a file called unPopular.
Hint: you may want to use the head command in your script.
This task is similar to Homework 5, but read this question carefully for
differences.
Your Java programs must compile standalone. Create no packages in your development environment. Leave your Java program file in directory q2.
You are given a starting point for this program in the attachment
(this is a link): /dept/dcs/comp2100/exam/LineCount.java
Save your finished program in the subdirectory q2 under your top level directory, in a file named LineCount.java.
Write the missing parts for a Java program called LineCount.
LineCount counts the number of "non-empty" lines of text in any number of input files, and computes the average length of these lines.
The program must be able to accept any number of command line arguments, each assumed to be the name of a text file ("text" means any kind of characters).
For each of those files it must open the file and count the number of lines of text.
It must count all lines, except those that are empty. An empty line is a line that
- is visibly empty (contains no characters, or contains only whitespace characters (spaces and tabs))
or is "logically" empty:
- starts with a C or Bash shell comment marker, that is, the # character at the beginning of the line.
A line is defined in the normal Java sense as in BufferedReader.readLine: a sequence of characters terminated by CR (carriage return, '\r'), or LF (line feed, '\n'), or both together (CR followed by LF).
The length of a line is the number of characters in the line. It does not include any line-termination characters.
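The definition of a line can be checked with a small experiment. The sketch below is only an illustration (the class name LineTerminatorDemo and the method readLines are made up for this example, and it reads from a string rather than a file): it shows that BufferedReader.readLine accepts LF, CR, or CR followed by LF as a terminator, and that the returned string never includes the terminator.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, not part of the supplied starting files.
class LineTerminatorDemo {

    /** Splits text into lines in the same way BufferedReader.readLine does. */
    static List<String> readLines(String text) {
        List<String> lines = new ArrayList<String>();
        BufferedReader in = new BufferedReader(new StringReader(text));
        try {
            String line;
            while ((line = in.readLine()) != null) {
                lines.add(line); // the terminator has already been removed
            }
        } catch (IOException e) {
            // Not expected when reading from an in-memory string.
            throw new RuntimeException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        // Three lines, ended by LF, CR and CR+LF respectively.
        for (String line : readLines("ab\ncde\rf\r\n")) {
            System.out.println(line.length() + " " + line);
        }
    }
}
```

Running main prints the lengths 2, 3 and 1, so no terminator characters are counted.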
After scanning each file the program must write a line to the standard output containing
- the number of non-empty lines in that file, or if the number of non-empty lines is zero, the message "no lines in"
- the average length of these non-empty lines, or if the number of non-empty lines is zero, no output
- the filename
In the output for each file, the number of lines should be right-justified in a field of width 6. Leave one space after the numbers.
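One possible way to build such an output line is with String.format. This is a sketch under assumptions, not the required solution: the class OutputLineDemo and method formatResult are invented names, the two decimal places for the average are inferred from the sample output rather than stated in the specification, and the character total 11054 used in main is a made-up figure chosen to give the average 42.19.

```java
import java.util.Locale;

// Hypothetical helper, shown only to illustrate the field-width formatting.
class OutputLineDemo {

    /**
     * Builds the output line for one file.
     * lineCount  = number of non-empty lines,
     * totalChars = sum of the lengths of those lines.
     */
    static String formatResult(int lineCount, int totalChars, String fileName) {
        if (lineCount == 0) {
            return "no lines in " + fileName;
        }
        double average = (double) totalChars / lineCount;
        // %6d right-justifies the count in a field of width 6;
        // %.2f (an assumption taken from the sample) prints two decimals.
        return String.format(Locale.US, "%6d %.2f %s", lineCount, average, fileName);
    }

    public static void main(String[] args) {
        System.out.println(formatResult(262, 11054, "book.html"));
        System.out.println(formatResult(0, 0, "emptyFile.txt"));
    }
}
```

Locale.US is passed explicitly so that the decimal separator is a full stop regardless of the machine's locale settings.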
If there was more than one file, then LineCount must leave a blank line after the last file's output, and then print a summary line giving the number of files read, the total number of lines of text, and the overall average line length.
If there is only one file (or no files) then no summary shall be produced.
For example, suppose that I run the program in the directory containing a copy of all the HTML files in the tutorial directory of web pages provided for this exam, at /dept/dcs/comp2100/public/www/tutorial. In this example I run the program twice. A HTML file counts as a text file.
% java LineCount book.html
   262 42.19 book.html
% java LineCount *.html
   262 42.19 book.html
   413 38.65 index.html
  1193 61.16 reallybigindex.html
    64 46.34 search.html
   699 48.56 trailmap.html

5 files, 2632 lines of text, average line length 52.04
The program does not have to check that all files are what we would call text: we assume that whatever data is present should be counted. But it does have to watch out for errors like naming files that don't exist. If there is a problem with a file and it can't count the lines, it should print an error message instead of the usual output line, and then continue with any remaining files in the argument list.
% java LineCount foo.txt book.html
ERROR: cannot open file foo.txt
   262 42.19 book.html
% java LineCount emptyFile.txt
no lines in emptyFile.txt
Remember that the expansion of the wildcards on the command line in my example is done by the shell, before the resulting list of arguments is passed to the Java program. For example, when I typed ‘java LineCount *.html’, the program actually saw 5 (five) arguments on the command line, not 2 (two).
Detecting a non-existent file is easily solved. Just create an object of class File with your file name. Then ask the object if it exists. (Check the API documentation.) Alternatively, catch the exception when you try to do anything with this file.
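The first approach might look like the following sketch. The names FileCheckDemo, canBeCounted and errorMessage are hypothetical, and the extra isFile and canRead checks are an assumption that goes slightly beyond the hint; printing the error on standard output is also an assumption based on the sample above.

```java
import java.io.File;

// Hypothetical demo class illustrating the File-based existence check.
class FileCheckDemo {

    /** Returns true if the name refers to an existing, readable, ordinary file. */
    static boolean canBeCounted(String fileName) {
        File file = new File(fileName);
        return file.exists() && file.isFile() && file.canRead();
    }

    /** Error text in the form shown in the sample output. */
    static String errorMessage(String fileName) {
        return "ERROR: cannot open file " + fileName;
    }

    public static void main(String[] args) {
        for (String fileName : args) {
            if (!canBeCounted(fileName)) {
                System.out.println(errorMessage(fileName));
                continue; // carry on with the remaining arguments
            }
            System.out.println("would count lines in " + fileName);
        }
    }
}
```

Even with this check in place, opening or reading the file can still fail later, so catching the exception remains worthwhile.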
As part of your testing of this program, run it over all the html files in the tutorial directory and check that the line counts and averages are similar to the ones I listed above. Before doing this, create some small text files as selected test cases, and count the lines by hand (or count them using the prototype script averageLen in the exam directory) to partly verify your results. This averageLen script does not produce the final total and overall average, and it may not work correctly for empty lines containing spaces.
A simple algorithm: Take a line from the file, look at whether the first character is a '#', and if not, remove any instances of spaces and tabs, and then just check if it's empty.
Think about boundary cases: empty files, empty lines, lines containing only spaces and tabs, lines of different lengths after a '#'...
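The simple algorithm and its boundary cases can be sketched as a single method. This is one possible reading of the rules, with invented names (LineFilterDemo, isCountable); in particular it assumes that a '#' preceded by spaces does not make a line a comment, because the rule speaks of the '#' at the beginning of the line.

```java
// Hypothetical demo class: decides whether one line should be counted.
class LineFilterDemo {

    /** Returns true if the line is neither a comment line nor visibly empty. */
    static boolean isCountable(String line) {
        // Logically empty: '#' is the very first character.
        if (line.startsWith("#")) {
            return false;
        }
        // Visibly empty: nothing is left once spaces and tabs are removed.
        String stripped = line.replace(" ", "").replace("\t", "");
        return stripped.length() > 0;
    }

    public static void main(String[] args) {
        String[] samples = { "", "  \t ", "#", "# comment", "  # indented", "text" };
        for (String sample : samples) {
            System.out.println("[" + sample + "] counted: " + isCountable(sample));
        }
    }
}
```

Keeping this decision in its own method also makes it easy to test by hand or with JUnit before any file handling is written.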
You will be marked on good style and clarity of your coding as well as on the correctness.
Hints for style: Divide the program into parts and write separate methods for logically separate parts such as opening files, counting in one file, deciding whether a particular line should be counted, etc.
THIS IS A HARDER QUESTION. DO NOT ATTEMPT IT UNTIL YOU HAVE COMPLETED Question 1 and Question 2.
Leave your script file for this question in directory q3.
The librarian has a further request to provide information, using a second file containing the list of all bookIDs and book titles and a third, containing a list of books that have been returned from loan.
In a file called Last2Borrowed write a script that outputs a set of the 2 latest outstanding loans for each member. The output shall be in order of memberID, the memberID on a line by itself heading a list of the 2 outstanding loans, listing each loan on a separate line, ordered by increasing date. Each loan shall list the date (not the time), the bookID and the title of the book.
The second file, BookList.tsv (also in the exam directory), contains records in the format
bookID author yearPublished ISBN title
The third file, ReturnsList.tsv, is in the same format as LoansList.tsv. The meaning of the ReturnsList is that each line states the date and time that a book was returned by a member. We assume that the Loans and Returns lists start from the beginning of the library system; that loans and returns are recorded accurately; that a book must be borrowed before it can be returned; and that it cannot be returned by any member other than the one who borrowed it (these are not reasonable assumptions in real life). Note that a member may borrow, return and then re-borrow the same book.
An outstanding loan is for a book that has been borrowed but has not yet been returned.
The command shall take 0 arguments.
For example:
$ ./Last2Borrowed
u1377093
2006-08-06 2006413 Bead & Button, Issue 73, June 2006
u3248221
2006-08-07 1982066 The Cimabue Crucifix
2006-08-23 2004383 Creative Drawing Ideas
u3334840
2006-08-06 1996244 Fashion Memoir: Alaa
2006-08-13 2000326 Denton Corker Marshall: rule playing and the ratbag element
(As before, this is not the result that your correct program will produce.)
You can make use of your other command scripts in this script if you wish, but you must copy them into this directory.
Hint: consider using a for-do-done Bash construct; sort, cut, and use results in temporary variables and temporary files if necessary, to rejoin records and to handle one member at a time. Consider counting the sets of the loan and return records for one book to determine if the book is still on loan (an odd number of loans and returns); consider a Bash test and arithmetic calculation of a remainder (% operator) to determine whether a number is odd or even. Consider the command grep to extract matching records.
THIS IS A HARDER QUESTION. DO NOT ATTEMPT IT UNTIL YOU HAVE A GOOD SOLUTION TO Q2.
Write an extended version of the line counting program in Java. The new version shall be called CLPCount. You must have separate programs for the 2 Java questions. CLPCount.java must be left in directory q4.
Input: the input is taken from the files named in command line arguments, as for the LineCount program.
Output: three counts, of the number of characters, lines, and paragraphs in each of the named text files; and a summary total of characters, lines, and paragraphs, summed across all of the input similarly to LineCount.
A paragraph is defined as a continuous sequence of text lines (as defined for LineCount). That is, any number of empty or comment lines separate paragraphs. The first paragraph may start at the beginning of a file, or after empty or comment lines; the last paragraph may end at the end of a file, or have following empty or comment lines. A file may contain any number of paragraphs (including zero).
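The paragraph definition amounts to counting the transitions from a non-text line (or the start of the file) to a text line. The sketch below illustrates that idea only; ParagraphCountDemo, isTextLine and countParagraphs are invented names, the lines are supplied as a list rather than read from a file, and the text-line test repeats the LineCount rules as I read them.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class: counts paragraphs in a list of lines.
class ParagraphCountDemo {

    /** A text line is neither a '#' comment line nor blank (spaces and tabs only). */
    static boolean isTextLine(String line) {
        if (line.startsWith("#")) {
            return false;
        }
        return line.replace(" ", "").replace("\t", "").length() > 0;
    }

    /** Counts maximal runs of consecutive text lines. */
    static int countParagraphs(List<String> lines) {
        int paragraphs = 0;
        boolean insideParagraph = false;
        for (String line : lines) {
            boolean text = isTextLine(line);
            if (text && !insideParagraph) {
                paragraphs++; // a new paragraph starts on this line
            }
            insideParagraph = text;
        }
        return paragraphs;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList("", "one", "two", "# note", "  ", "three", "");
        System.out.println(countParagraphs(sample) + " paragraphs");
    }
}
```

In the sample, "one" and "two" form the first paragraph and "three" forms the second, so a file that starts or ends with empty or comment lines needs no special handling.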