COMP2100/2500
Assignment 1 Marking Guide
Background
Students were asked to add code to or modify the Java oops program to carry out the following seven tasks:
Remove paragraphs that consist of a single data element made up only of hyphens or only of underscores (by adding a new class TreeFixer);
Modify the TextRenderer to implement list continuations (the text:continue-numbering="true" attribute on ordered lists).
Gather information about the parent styles of automatically generated paragraph styles (by adding a new class StyleDecoder).
Extract title, author and affiliation metadata from documents (by adding a new class MetadataExtractor).
Produce HTML output (by adding a new class HtmlRenderer).
Fix the extra spaces appearing in the plain text output (by modifying class TextRenderer).
Rewrite the scanner with a new design and mode of operation.
You will need to read the Assignment 1 specification and the Assignment 1 Hints and FAQ carefully before you start marking. If you find any inconsistencies, please let me know.
The mark on each assignment should reflect the overall level of achievement attained, and I hope the guide below points you in that direction. It is important that a mark of 24/30 or higher truly reflects work at High Distinction level, that 21 reflects work at Distinction level, 18 a Credit and 15 a Pass.
Assign marks as follows:
Q1       3
Q2       2
Q3       4
Q4       4
Q5       5
Q6       2
Q7       4
Style    6
Total   30

But make sure that the overall mark really reflects their level of achievement.
The printout
The printout you will receive for each student consists of:
The usual header with information about lateness etc and a space for you to write some feedback comments and a final mark.
A section indicating any changes made to their code in the marker script. These are as follows:
Many students never ran their program on a document containing an anchor element, despite these being mentioned specifically in the FAQ discussion of Q5, and there clearly being a visitAnchor routine in every visitor. So they never found the error in the require assertion, which checked for text:anchor instead of text:a. This is really slack testing. Deduct two marks from any assignment where my script had to fix this.
One pair of students (u2562890) managed to mess up creating the “jarball” by omitting the name of the jar file. This caused the jar file to overwrite the first Java source file in the list: Assert.java, causing (of course) a compilation failure. I replaced this with the original Assert.java. One mark penalty.
Two pairs of students submitted assignments with incomplete (non-working) attempts at Q7. They consulted with me about this first. In one case (u4112785) the new scanner compiled OK but crashed, so I renamed it NewScanner.java and copied the original scanner in for compilation and testing. In the other case (u3355411) the new scanner doesn't compile, so I renamed the file ScannerNew.java.txt to prevent the compiler from looking at it. No penalty. In both cases please take a little time to look at the code (both for the scanner class and for its helpers, the various filters) and see if there is something in there worth a few marks.
One student (u3360551) did not complete Q5, and submitted an HTML renderer that doesn't compile. I renamed this to HtmlRenderer.java.txt so that the compiler wouldn't see it. No penalty (special circumstances). Please see if there is anything in the HTML renderer that is worth any marks.
A calculation of the number of new and changed lines of code. Most students who did not attempt Q7 are in the range 700–1000 lines. A few students have submitted over 2000 lines of code. I think this is probably excessive and should be penalised, particularly in the case of the group who submitted 3100 new or changed lines of code.
A complete listing of every new class, plus a diff listing for each changed class. The diff listing has the switches -w -U20, which means that whitespace is ignored in determining whether lines match, and that there are 20 lines of context around each change. I ran the diff output through grep -v ^- so that the original lines that have been removed or changed are not printed. (I found this confusing in the past.) Each new or changed line is marked with a + in the first column. (I've done this with the new classes too, so that every line the students wrote is marked with a +, whether it's in a new file or not. Let me know if you think this helps or not.)
A record of what happened when the marker script attempted to compile their program. Usually this says very little, but if there were problems, the output might help you. Almost everyone gets a warning about unchecked operations. This just means there is old (pre Java 1.5) code that uses collections without the new generics syntax. No penalty.
The results of running their program on a short test document test.sxw. The students did not have access to this document. The printout first shows what the program printed on the standard output, then the text file sample8.txt and then the HTML file sample8.html.
Note that this test document is fairly short and simple. Just because a student's program deals with it correctly does not mean they should automatically get full marks. Some quite poor programs may still perform well on this document.
1. Tree fixer (3 marks)
Their tree fixer should be a new class that implements Visitor. It should traverse the tree removing all paragraphs that satisfy the following conditions:
They have exactly one child; AND
That child is a data node; AND
The content of that data node is either all hyphens OR all underscores.
Consider the following points:
The test document has two such paragraphs, one (underscores) between the author's affiliation and the abstract, and the second (hyphens) between the abstract and the primary head. You can easily check in the plain text output whether they have been removed or not.
Some students may have the conditions a little wrong and may have removed all data elements consisting only of hyphens or underscores. There are two of these in the paragraph between the numbered and bullet point lists. They should not have been removed.
Check their tree fixer code for clarity and simplicity.
My solution works by marking nodes for deletion or possible deletion, then removing them afterwards. I print a diagnostic message on the standard output when I mark a node. They do not have to do this.
Removing elements from a vector while looping through them can lead to strange errors, like skipping over the node after the one that was removed (because all the remaining elements have their index reduced by one, but the loop counter still gets incremented). Watch out for this sort of subtle error; it probably won't show up in the test document. (You'd need two consecutive bad paragraphs.) One way around this is to loop through the nodes in descending order.
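As a reference point when reading their tree fixers, the conditions and the descending-order loop described above can be sketched as follows. This is only a minimal illustration, not the reference solution and not the oops code: SketchNode, its kind strings and TreeFixerSketch are hypothetical stand-ins for the real node and visitor classes, and the code is written for a current JDK rather than the one the students used.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the oops tree node classes; all names here are assumptions. */
final class SketchNode {
    final String kind;   // e.g. "paragraph", "data", "span"
    final String text;   // only meaningful for data nodes
    final List<SketchNode> children = new ArrayList<>();

    SketchNode(String kind, String text) {
        this.kind = kind;
        this.text = text;
    }

    SketchNode add(SketchNode child) {
        children.add(child);
        return this;
    }
}

final class TreeFixerSketch {
    /** True for a paragraph with exactly one child, a data node made only of '-' or only of '_'. */
    static boolean isRuleParagraph(SketchNode n) {
        if (!n.kind.equals("paragraph") || n.children.size() != 1) {
            return false;
        }
        SketchNode only = n.children.get(0);
        if (!only.kind.equals("data")) {
            return false;
        }
        // String.matches tests the whole string, so mixed content such as "-_-" fails both.
        return only.text.matches("-+") || only.text.matches("_+");
    }

    /** Removes rule paragraphs below parent, looping downwards so that a
     *  removal never shifts the index of a child that has not been visited yet. */
    static void fix(SketchNode parent) {
        for (int i = parent.children.size() - 1; i >= 0; i--) {
            SketchNode child = parent.children.get(i);
            if (isRuleParagraph(child)) {
                parent.children.remove(i);
            } else {
                fix(child);
            }
        }
    }
}
```

Two consecutive bad paragraphs are both removed by this loop, which is the case an ascending loop with an unconditional increment tends to get wrong. A hyphen-only data node inside a span is left alone because its parent paragraph's single child is the span, not a data node.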
2. List continuations (2 marks)
This only applies to the plain text renderer, not the HTML renderer. The first thing is to check the plain text output. The relevant section should look exactly like this:
1. Item one of the list.
This is an interruption in the middle of the list.
2. Item two of the list.
3. Item three of the list.

If item two has the number 1 in front of it, then the student has not succeeded with this part.
A few points to consider:
Look in class TextRenderer for the code for this, particularly in the methods visitOrderedList() and visitOrderedListItem(). Again look for simplicity and clarity.
Their code does not have to be able to handle nested ordered lists, and there are no bonus marks for doing that.
This is probably the easiest of the seven parts.
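A simple solution here usually amounts to a counter that is reset when a new ordered list starts, unless the list asks to continue numbering. The sketch below shows that idea in isolation. ListNumberingSketch and its method names are hypothetical; in a student's code the equivalent logic would presumably sit in visitOrderedList() and visitOrderedListItem(), with the flag taken from the text:continue-numbering attribute.

```java
/** Hypothetical helper showing one way to number non-nested ordered lists. */
final class ListNumberingSketch {
    private int nextNumber = 1;

    /** Call on entering an ordered list. The flag mirrors text:continue-numbering="true". */
    void startOrderedList(boolean continueNumbering) {
        if (!continueNumbering) {
            nextNumber = 1;   // a fresh list starts again from 1
        }
    }

    /** Call once per list item; returns the label to print and advances the counter. */
    String itemLabel() {
        String label = nextNumber + ". ";
        nextNumber++;
        return label;
    }
}
```

Because the counter is a single field, this handles only one level of ordered list, which is all the assignment asked for.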
3. Style decoder (4 marks)
This was in COMP2100 Assignment 2 in 2003, but in Eiffel.
Here they had to write a new visitor whose only task is to compile a lookup table of style “inheritance”. (See the assignment sheet.) This is pretty straightforward once you understand the task, but it's hard to get started.
Points to consider:
The first thing to look for is what the program writes on the standard output. There should be a section listing style information which should look like this:
Style information gathered:
P1->Author's Name
P11->Footnote
P10->References
P9->Table Head
P8->Displayed Equation
P7->Figure Caption
P6->Text body
P5->Text body
P4->Initial Body Text
P3->Categories
P2->Affiliation

The order that these are printed out is not important, but all eleven entries should be there.
The other test is that the metadata extractor in Q4 is able to find the author's name and affiliation. It can't do this without the style decoder working correctly.
Look at their style decoder implementation. It should be as clear and simple as possible. There is no need for fancy solutions. Clarity is all.
Some students will have prevented their style decoder from traversing the body of the document, since style definitions cannot be down there. This is OK: no penalty, no reward.
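To give a sense of how little code the lookup table itself needs, here is a minimal sketch. StyleTableSketch and its methods are hypothetical names, not taken from the oops code or from the reference solution, and the sketch leaves out the visitor that would call record() for each automatic style definition it meets. It uses Map.getOrDefault, so it assumes a current JDK.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical lookup table from automatic style names (P1, P2, ...) to their parent styles. */
final class StyleTableSketch {
    private final Map<String, String> parentOf = new HashMap<>();

    /** Records one automatic style and the named style it is derived from. */
    void record(String styleName, String parentStyleName) {
        parentOf.put(styleName, parentStyleName);
    }

    /** Returns the parent style, or the name itself if it is not a recorded automatic style. */
    String resolve(String styleName) {
        return parentOf.getOrDefault(styleName, styleName);
    }

    int size() {
        return parentOf.size();
    }
}
```

With the test document, record("P1", "Author's Name") followed by resolve("P1") would give back "Author's Name", which is exactly what the metadata extractor in Q4 needs.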
4. Metadata extractor (4 marks)
This was also in the 2003 assignment.
Again they had to write a new visitor class for this question. The task was to traverse the tree looking for paragraphs representing the document title, the author's name and the author's affiliation. This requires using the style inheritance lookup table from Question 3. Output should be written on the standard output after the visitor has finished traversing the tree. Their code has to be able to handle multiple authors, multiple affiliations etc, but I haven't tested that in the test document. The output should look like this:
Metadata information collected:
Title = "A Short Paper"
Author's Name = "Ian Barnes"
Affiliation = "Australian National University"

Some points to look out for:
It's OK if it says Author's Name instead.
It's OK if it's not lined up perfectly.
This visitor doesn't need to traverse the styles part of the document, only the body. Again some students will have implemented this for speed and/or elegance, but there is no reward (or penalty).
5. HTML renderer (5 marks)
This was probably the biggest part (except for Q7). Students had to write a new visitor that produces an HTML version of the document.
Some points to look out for:
Read the long discussion of this part in the FAQ.
Formatting of the HTML code produced is not important (although I requested that students put some newlines in at the end of elements just to make it a little more readable).
Look for the document metadata in META elements inside the HEAD element.
Look for the title inside a TITLE element in the HEAD element also.
Look for correct tags around all the different types of paragraphs, as set out in the table in the assignment sheet.
Selecting the tags for paragraphs with a big switch or if-then-else statement is poor style. Penalise this, and reward students who created a lookup table (or tables) of some sort for this.
Look for correct formatting of SPAN elements with bold or italic in them. There are examples in the paragraph after the primary head. The HTML code should look like this:
... Here is a <I>span in italics</I> and here is a <B>span in boldface</B>. ...
Check that the anchor element is processed correctly. It should produce this:
... Here is a hyperlink to the <A HREF="http://cs.anu.edu.au/student/comp2100">COMP2100 Home Page</A>.
Look to see if they processed the small caps in Meyer's name or the superscript in E = mc² correctly. If they did, write something nice about it, but no extra marks.
The code that handles the formatting for the SPAN elements isn't just in visitSpan(), but also in the methods that visit style descriptions, since the HTML renderer has to do something very similar to what the style decoder does in order to work out how to format a span. Some students may have elected to do this in the style decoder class rather than in the HTML renderer, and to add this information to their lookup table. This is a good solution.
Once again, look for simplicity and clarity in their code.
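The table-driven approach to choosing paragraph tags, as opposed to a big switch or if-then-else chain, might look something like the sketch below. The two entries in the map are placeholders chosen for illustration only; the real mapping is the table in the assignment sheet. ParagraphTagsSketch, wrap() and the fallback tag are likewise assumptions.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical table-driven choice of the HTML tag for a paragraph style. */
final class ParagraphTagsSketch {
    private static final Map<String, String> TAGS = new HashMap<>();
    static {
        // Placeholder entries only; substitute the table from the assignment sheet.
        TAGS.put("Title", "H1");
        TAGS.put("Text body", "P");
    }

    /** Assumed fallback for styles that are not in the table. */
    private static final String DEFAULT_TAG = "P";

    /** Wraps already-rendered paragraph content in the tag for its (parent) style. */
    static String wrap(String parentStyle, String content) {
        String tag = TAGS.getOrDefault(parentStyle, DEFAULT_TAG);
        return "<" + tag + ">" + content + "</" + tag + ">\n";
    }
}
```

Adding a new paragraph type then means adding one table entry rather than another branch, which is the property worth rewarding.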
6. Fix bad spaces (2 marks)
The text renderer they were given breaks data elements up into words, then adds a space character after every word it prints. This is usually OK, but wrong sometimes, particularly at the beginnings and ends of spans. The correct way to do the output is to only put a space in the output if there was one in the input. Then sometimes you replace a space character with a newline if the line is too long. Line breaks are only allowed where there was a space.
This may be hard to mark, partly because the output will be wrong if they just do Q6 but not Q7.
Here is part of what my text output looks like with Q6 but not Q7 done:
This is a paragraph of Initial Body Text. Here is aspan in italicsand here is aspan in boldface.This paragraph is long so that we can check that the line breaking algorithm works correctly in the text renderer. Here is an ordered list with an interruption and continuation.
...
Here is a line break. Here is an italic span containing only underscores (should not be removed):___and here is a bold span containing only hyphens (also should not be removed):-----.Here is a hyperlink to theCOMP2100 Home Page.

Note the missing spaces in a few places. These are because the scanner supplied incorrectly does a trim() on all data strings. Some students may have made that one-line change to the supplied
Some points to look out for:
There should still be no lines longer than 64 characters. Line breaking needs to be done by some sort of “counting back” to the last space character. The old method “See if the new word will fit. If it doesn't, print out the line, otherwise just add this word.” is no longer good enough.
Students who have not attempted this or Q7 will have some extra spaces in their output, particularly after the ‘M’ in “Meyer”.
Probably the quickest way to check what is going on with this and Q7 is to look at the list of new and changed lines of code and see whether they have modified the scanner. If they haven't, then they haven't attempted Q7.
The code for this question is probably in methods visitData() and addWord() of class TextRenderer.
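The "counting back" idea can be sketched as below: text is appended exactly as it arrives, and when the pending line grows past the limit, the last space within the limit is turned into a newline. This is a minimal illustration under stated assumptions, not the reference solution. LineBreakerSketch and its methods are hypothetical names, and the handling of a single word longer than the limit (break at the next space after it) is a choice made for the sketch.

```java
/** Hypothetical line breaker that only ever breaks where the input had a space. */
final class LineBreakerSketch {
    static final int MAX_WIDTH = 64;   // line length limit stated in this guide

    private final StringBuilder out = new StringBuilder();
    private final StringBuilder line = new StringBuilder();

    /** Appends text as given; no spaces are added between successive calls. */
    void append(String text) {
        line.append(text);
        while (line.length() > MAX_WIDTH) {
            // Count back from the limit to the last space that keeps the line short enough.
            int space = line.lastIndexOf(" ", MAX_WIDTH);
            if (space < 0) {
                // An over-long word: fall back to the first space after it, if any.
                space = line.indexOf(" ", MAX_WIDTH);
            }
            if (space < 0) {
                break;   // no legal break point yet; wait for more text
            }
            out.append(line, 0, space).append('\n');
            line.delete(0, space + 1);   // the space itself has become the newline
        }
    }

    /** Flushes the pending line and returns everything written so far. Call once, at the end. */
    String finish() {
        out.append(line);
        line.setLength(0);
        return out.toString();
    }
}
```

Appending "Here is a" and then "span" produces "Here is aspan", so any missing space in the output points back at the input (for example the scanner's trim()), not at the renderer.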
7. Scanner redesign/rewrite (4 marks)
This is much longer and harder than any of the other parts of the assignment. It was intended only for students who are aiming at getting very high marks in this course. The students had much more freedom in how they implemented this than in the other parts.
Look at my solution to get an idea of how this should work (although students may well have come up with alternative implementations of the details). The scanner no longer stores the complete input as a string. Instead it reads one token at a time from its input. The input is done with the Decorator pattern, with a series of filters wrapped around the Reader it is passed on creation. These filters remove comments, processing instructions, and Doctype declarations from the input, as well as normalising whitespace. In order that whitespace around a comment, PI or Doctype declaration gets merged correctly, the whitespace filter needs to be last in the chain (closest to the scanner, furthest from the actual input stream). Since there is no string to search for tokens in, the parsing of tags, attributes etc needs to be rewritten a bit. It's not really that hard, unless students try to allow for every possible error... This wasn't required. They can basically assume that the input is correct XML and let the program crash or (preferably) throw an exception if it isn't.
Some points to look out for:
The hard thing in the filters is that they have to look ahead for sequences of characters like <!-- (for a comment). This means each filter needs to do its own buffering. My solution was to use the library class PushbackReader, which is probably a bit wasteful. Most students won't have found this class, and will have written their own buffering code.
My first attempt used the mark() and reset() capabilities of class BufferedReader, but this failed to behave correctly if any filter encountered end of file while looking ahead. If a student's program crashed during parsing, it might have the same problem.
If Q6 and Q7 have both been done correctly, then the spacing in the text output should be perfect. No extra spaces and no missing spaces. Check it against my sample output.
It is possible to get correct output by changing just two lines of class Scanner (removing .trim() from two lines in the constructor). This is a smart thing to do in order to make testing Q6 easier, but it is not a solution to Q7 and should not get them any marks.
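For markers who have not worked with the Decorator pattern on readers, here is a minimal sketch of one filter in such a chain: a reader that wraps another reader and drops XML comments, using PushbackReader for the look-ahead. It is not the reference solution. CommentFilterSketch and its helper methods are hypothetical, it handles comments only (no processing instructions, Doctype declarations or whitespace), and an unterminated comment simply ends at end of input.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.PushbackReader;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

/** Hypothetical decorator that removes <!-- ... --> comments from the wrapped reader. */
final class CommentFilterSketch extends FilterReader {
    private static final String AFTER_OPEN = "!--";   // what follows '<' in a comment start
    private static final String CLOSE = "-->";
    private final PushbackReader source;

    CommentFilterSketch(Reader input) {
        super(new PushbackReader(input, AFTER_OPEN.length()));
        source = (PushbackReader) in;   // 'in' is the protected field set by FilterReader
    }

    @Override
    public int read() throws IOException {
        while (true) {
            int c = source.read();
            if (c != '<' || !consumeIfNext(AFTER_OPEN)) {
                return c;   // ordinary character, or -1 at end of input
            }
            skipPast(CLOSE);
        }
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        if (len == 0) return 0;
        int n = 0;
        while (n < len) {
            int c = read();
            if (c < 0) break;
            buf[off + n++] = (char) c;
        }
        return n == 0 ? -1 : n;
    }

    /** Consumes expected if it comes next; otherwise pushes back exactly what was read. */
    private boolean consumeIfNext(String expected) throws IOException {
        char[] seen = new char[expected.length()];
        int n = 0;
        boolean matched = true;
        while (matched && n < seen.length) {
            int c = source.read();
            if (c < 0) {          // end of input during look-ahead: nothing more to push back
                matched = false;
                break;
            }
            seen[n] = (char) c;
            matched = seen[n] == expected.charAt(n);
            n++;
        }
        if (matched) return true;
        source.unread(seen, 0, n);
        return false;
    }

    /** Reads and discards characters up to and including the terminator. */
    private void skipPast(String terminator) throws IOException {
        StringBuilder tail = new StringBuilder();
        int c;
        while ((c = source.read()) >= 0) {
            tail.append((char) c);
            if (tail.length() > terminator.length()) tail.deleteCharAt(0);
            if (tail.toString().equals(terminator)) return;
        }
    }

    /** Convenience for trying the filter on a string. */
    static String strip(String xml) {
        StringBuilder sb = new StringBuilder();
        try (Reader r = new CommentFilterSketch(new StringReader(xml))) {
            int c;
            while ((c = r.read()) >= 0) sb.append((char) c);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }
}
```

Note that only the characters actually read are pushed back when end of file is hit during look-ahead, which is the situation that caused trouble with mark() and reset(). In a full chain the other filters would wrap one another in the same way, with the whitespace filter outermost (closest to the scanner).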
Style (6 marks)
As well as looking at the correctness of their solution, I want you also to consider their coding style. We haven't discussed a coding standard in class yet, so there is no formal standard to judge them against, but they have been told about the following, either last year or by me (or both):
Consistent indentation. (But it can be a bit hard to tell, because I used diff -w and some of the original code wasn't indented correctly.)
Good use of blank lines to break long sequences of instructions into logical chunks.
Appropriate choice of identifiers: local variables, function arguments and loop counters can be single letter (often this is clearer) but classes, fields and methods should have meaningful names, even if this means that they are quite long.
Modularity: Rather than creating super-long methods, it is often better to break them up into smaller, easier-to-understand methods.
Repeated code should be avoided, although as mentioned in the assignment specification some major redesign is outside the scope of this assignment.
Javadoc header comments for all classes, methods and fields. These should be as short and clear as possible.
In-code comments are needed whenever there is something complex or difficult. Comments should be at a higher level of abstraction than the code. They should typically explain in a brief phrase what the next chunk of code is doing.
The main principle here is clarity. If you find it hard to understand their code, then give them a lower mark for style. Try to identify and give them feedback on what it is that makes their code hard to understand, so that they can improve next time.
Late Penalty
If an assignment was handed in late, this will be shown on the printout. The late penalty is simple: late assignments will be penalised six (6) marks.
Do not penalise assignments that are less than fifteen minutes late. What with this generosity and extensions granted, I think this only leaves one late assignment (u4131114).
Responsibilities
As a marker for this assignment, you are required to:
Mark each assignment given to you;
Make a photocopy of the first page of each marked assignment and return it to me;
Enter the marks you awarded into the FAIS database under the name a1, both for the student listed on the printout and for their partner whose name and number I have written on the front page for you.
Don't forget to photocopy the front pages.
Do not enter zero for a missing assignment, just leave the field blank.
I can only pay you to spend about seven and a half hours marking the 17 assignments given to you. That works out at about twenty-five minutes per student. If you find yourself needing more time than this, let me know.
Copyright © 2005, Ian Barnes, The Australian National University
Version 2005.1, Monday, 18 April 2005, 15:20:51 +1000
Feedback & Queries to
comp2100@cs.anu.edu.au