COMP2100
Homework 12
Due in Lab 11, Week 13.
Continue filling in a new Time Recording Log and Weekly Time Use Summary each week.
Write the following program, following the enhanced PSP as described in Lecture 19 and filling in the Project Plan Summary and a Defect Recording Log. Use one of the Project Plan Summary forms that doesn't have greyed-out sections.
Write an Eiffel program called `commonest' that reads text from the standard input and then prints out a table of the 5 most common words, together with their number of occurrences, in descending order of frequency.
For example, suppose that the file stuff.txt contains the text:
    Eiffel! eiffel? eiffel `EIFFEL' eiFfeL java Java "JAVA" java pascal Pascal%$#@!&*PASCAL C, C; C. bash-bash BASIC COBOL Fortran66 Fortran77 Fortran90 SQL
Then we could have the following interaction with the finished program:
    [comp2100@karajan]$ commonest < stuff.txt
    5 eiffel
    4 java
    3 c
    3 pascal
    2 bash
Notice that the program ignores all punctuation and reduces all words to lower-case for comparison. So a word is defined as an unbroken sequence of letters (both upper- and lower-case) and digits, and words are separated by sequences of other stuff (anything else: punctuation, special symbols, white space, ends of lines etc).
Words with the same frequency should appear in alphabetical order and only the first five words should be printed.
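To check your understanding of the word definition before you start the Design phase, here is a minimal sketch of the splitting rule. It is written in Python rather than Eiffel, so it is an illustration of the specification and not a model answer. The function name split_words is made up for this example, and the sketch assumes that "letters" means the ASCII letters only.

```python
def split_words(text):
    """Return the lower-cased words of text, in order of appearance.

    A word is an unbroken run of ASCII letters and digits; every other
    character acts as a separator. (Restricting to ASCII is an assumption.)
    """
    words = []
    current = []  # characters of the word being collected
    for ch in text:
        if ch.isascii() and ch.isalnum():
            current.append(ch.lower())
        elif current:
            # A separator ends the current word, if there is one.
            words.append("".join(current))
            current = []
    if current:  # flush a word that runs to the end of the input
        words.append("".join(current))
    return words


print(split_words("`EIFFEL' bash-bash Fortran66 Pascal%$#@!&*PASCAL C,"))
# prints ['eiffel', 'bash', 'bash', 'fortran66', 'pascal', 'pascal', 'c']
```

Note that bash-bash counts as two occurrences of bash, and that digits stay attached to the letters around them, so Fortran66 is a single word.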
Hints
You will probably need to write more than one class. This is a decision to make during the Design phase of development. By the time you start the Coding phase, you should know exactly what classes you will write, what library classes you will use and how they will work together.
You may find the feature is_letter_or_digit of class CHARACTER and the feature to_lower of class STRING useful. I don't recommend using the feature split of class STRING, since its definition of a word is different from the one above.
To save yourself writing complicated and error-prone sorting routines, look at the library classes COLLECTION_SORTER and REVERSE_COLLECTION_SORTER. An ARRAY is a kind of collection, so you can use one of these to sort an array. The things you're sorting need to inherit from COMPARABLE, so if you want to sort an array of objects from a class you have written, you will need to give a valid definition for the < operator on this class.
Class STRING inherits from class COMPARABLE, which means that you can compare two strings using the <, <=, > and >= operators. The ordering is alphabetical.
It may make this whole thing easier if you split the task into stages. The first stage is to read the input and split it into words. The second stage is to do something with that list of words so that you end up with a relation between words and frequencies (the number of times a word occurs). The third stage is to sort this into descending order of frequency, and for words with the same frequency, by alphabetical order. The final stage is to print the first five. (It may make sense to do the last two stages together.)
The most crucial design decision is what data structure to use for storing the words and frequencies. I created a new class that had two attributes: a string for the word, and an integer for its frequency. But it's not easy to make this work. There may be better ways.
This is definitely the hardest homework for the whole semester. Treat it with respect. Take enough time in the Analysis and Design phases. When I did this exercise, all my estimates were out by a factor of two: it took twice as long and the code was twice as large as I thought. So I didn't spend enough time in the early phases and ended up finding lots of serious defects during the Test phase. Don't make the same mistake.
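The staged approach and the word-plus-frequency class described in the hints can be sketched as follows. This is again Python rather than Eiffel, so treat it as one possible shape for the design and not as the required one. The names WordCount, tally, top_entries and report are invented for this example. Defining the "less than" comparison on the class plays the role that inheriting from COMPARABLE plays in Eiffel. The sketch assumes that the first stage (splitting the input into lower-case words) has already been done.

```python
class WordCount:
    """A word paired with its number of occurrences (hypothetical class)."""

    def __init__(self, word, count):
        self.word = word
        self.count = count

    def __lt__(self, other):
        # "Less than" here means "printed earlier": a higher count comes
        # first, and equal counts fall back to alphabetical order.
        if self.count != other.count:
            return self.count > other.count
        return self.word < other.word


def tally(words):
    """Stage 2: relate each word to the number of times it occurs."""
    counts = {}
    for word in words:
        counts[word] = counts.get(word, 0) + 1
    return counts


def top_entries(counts, limit=5):
    """Stages 3 and 4: sort with the ordering above, keep the first few."""
    entries = [WordCount(word, n) for word, n in counts.items()]
    entries.sort()  # uses WordCount.__lt__
    return entries[:limit]


def report(words, limit=5):
    """Format the most common words as 'count word' lines."""
    return [f"{e.count} {e.word}" for e in top_entries(tally(words), limit)]


# The words of stuff.txt after splitting and lower-casing.
sample = (["eiffel"] * 5 + ["java"] * 4 + ["pascal"] * 3 + ["c"] * 3
          + ["bash"] * 2
          + ["basic", "cobol", "fortran66", "fortran77", "fortran90", "sql"])
print("\n".join(report(sample)))
```

Because the comparison already puts the most frequent word first, an ordinary ascending sort is enough here. If you define < the other way round (by increasing frequency), you would need a reverse sort, and the alphabetical tie-break would then need extra care.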
Extension tasks
Many writers in English use the apostrophe to abbreviate words like "I'm", "it's", "we're", "can't", and "wouldn't". As specified above, the program would split each of those into two words: "I" and "m", "it" and "s" and so on. Modify the requirements so that a single apostrophe in the middle of a word is kept, while quote marks before or after a word are removed along with all other punctuation. So running the program on the sentence:
"I won't, won't! WON'T!!", shouted the baby.
would give the results:
    3 won't
    1 baby
    1 i
    1 shouted
    1 the
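One way to pin down the modified requirement is to say that an apostrophe belongs to a word only when it has a letter or digit immediately on both sides. The Python sketch below tries out that reading on the sentence above. It is an interpretation and not the official definition. The names WORD, split_words_keeping_apostrophes and commonest are invented for this example, and only the plain ASCII apostrophe is treated as an apostrophe.

```python
import re
from collections import Counter

# One or more letters/digits, optionally followed by groups of
# "apostrophe + letters/digits". Leading and trailing quotes never match.
WORD = re.compile(r"[A-Za-z0-9]+(?:'[A-Za-z0-9]+)*")


def split_words_keeping_apostrophes(text):
    """Return lower-cased words, keeping apostrophes inside a word."""
    return [match.lower() for match in WORD.findall(text)]


def commonest(text, limit=5):
    """Return 'count word' lines, most frequent first, ties alphabetical."""
    counts = Counter(split_words_keeping_apostrophes(text))
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [f"{n} {word}" for word, n in ranked[:limit]]


print("\n".join(commonest('"I won\'t, won\'t! WON\'T!!", shouted the baby.')))
```

Under this reading, two apostrophes in a row are treated as separators, while a word such as rock'n'roll keeps both of its apostrophes. Whether that is what you want is a decision to record in your requirements.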
Deliverables
To get your marks, you must attend your registered lab group with your lab notebook containing:
A completed Time Log for the past week.
A completed Weekly Time Use Summary for the past week.
A printout of your completed program.
A completed PSP Project Plan Summary covering the development of the program, and including a Code Review phase, defect densities, defect injection and removal rates and process yield. Calculating the A/FR is optional, but recommended.
A completed Defect Recording Log with details of all defects found during the development.
A completed Code Review checklist.
These must all be securely located in your notebook: either stuck or stapled into a standard notebook, or punched and stored in a ring binder. No loose sheets!
Copyright © 2004, Ian Barnes, The Australian National University
Feedback & Queries to comp2100@iwaki.anu.edu.au
Version 2004.1, 10 May 2004, 10:34:37