ANU The Australian National University



____________________________________________________

[ANU] [DCS] [COMP2100/2500] [Description] [Schedule] [Lectures] [Labs] [Homework] [Assignments] [COMP2500] [Assessment] [PSP] [Java] [Reading] [Help]

____________________________________________________

COMP2100/2500
Lecture 5: Project I

Summary

An introduction to the project, including discussion of XML.

Aims


The Project Software

Background to the project

This project was largely the idea of Tom Worthington.

The software we will be working on this semester is a prototype for a system that will solve part of this problem.

A preflight system is one where the author can submit their file and then see what the finished article will look like in the journal. If it looks wrong (for example, if the software mixes up the title and the author's name), the author can edit the file and resubmit.


Background to the technology


Last year's project

The projects for the last three years have been based on a program that can read an Open Office file, understand its structure, display it on screen and convert it to plain old ASCII text and HTML. That program was written in Eiffel. Alexei Khorev has been translating the core code into Java over the summer.

This translation gives us a good start.


What it's planned to do in future

Some or all of these might be parts of assignments in COMP2100 this semester.


What other possibilities there are...

These might be part of later work, by me or others. Anyone interested in doing an individual project (COMP3700 or honours)?


Introduction to XML

This section is based on Ramesh Sankaranarayana's introductory XML lecture from COMP3410 Information Technology in Electronic Commerce in 2000.

What is XML?


History


Elements


Attributes


Entities


Document Type Definitions (DTDs)


Open Office file format


Internal structure of the program

To open an Open Office file, we have to:

  1. Unzip it and extract the XML file we want (content.xml).

  2. Separate it into meaningful chunks called tokens, smoothing over irrelevant details like extra spaces, line endings and so on. This is called lexical analysis or just scanning.

  3. Work through the tokens, keeping track of the nesting of elements and building up a tree that represents the structure of the document. This is called parsing.
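The three steps above can be sketched in Java. This is a minimal illustration, not the project's code: the class name `MiniXml`, the `Token` and `Node` types and all method names are hypothetical. Step 1 uses the standard `java.util.zip.ZipFile` class. The scanner handles only a small subset of XML: tags and character data. Attributes are read past and discarded, `<?...?>` and `<!...>` constructs are skipped up to the first `>`, and entity references and CDATA sections are not handled at all.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.*;
import java.util.zip.*;

public class MiniXml {
    // Step 1: read the content.xml entry out of the zipped document.
    static String readContentXml(File document) throws IOException {
        try (ZipFile zip = new ZipFile(document)) {
            ZipEntry entry = zip.getEntry("content.xml");
            if (entry == null) throw new IOException("no content.xml in " + document);
            try (InputStream in = zip.getInputStream(entry)) {
                return new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }
        }
    }

    enum Kind { START, END, TEXT }
    record Token(Kind kind, String value) {}

    // Step 2: scanning. Produces START/END tokens for tags and TEXT tokens for
    // character data, collapsing runs of spaces and line endings to one space.
    static List<Token> scan(String xml) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < xml.length()) {
            if (xml.charAt(i) == '<') {
                int close = xml.indexOf('>', i);
                if (close < 0) throw new IllegalArgumentException("unterminated tag at " + i);
                String body = xml.substring(i + 1, close).trim();
                i = close + 1;
                if (body.startsWith("?") || body.startsWith("!")) continue; // <?xml ...?>, <!DOCTYPE ...>
                boolean end = body.startsWith("/");
                boolean empty = body.endsWith("/"); // <name/> is an empty element
                String inner = body.substring(end ? 1 : 0, empty ? body.length() - 1 : body.length()).trim();
                String name = inner.split("\\s+")[0]; // attributes are discarded
                if (!end) tokens.add(new Token(Kind.START, name));
                if (end || empty) tokens.add(new Token(Kind.END, name));
            } else {
                int next = xml.indexOf('<', i);
                if (next < 0) next = xml.length();
                String text = xml.substring(i, next).trim().replaceAll("\\s+", " ");
                if (!text.isEmpty()) tokens.add(new Token(Kind.TEXT, text));
                i = next;
            }
        }
        return tokens;
    }

    // Parse-tree node: an element (name set) or a text leaf (text set).
    static final class Node {
        final String name, text;
        final List<Node> children = new ArrayList<>();
        Node(String name, String text) { this.name = name; this.text = text; }
    }

    // Step 3: parsing. A stack of open elements tracks the nesting, so each
    // new node is attached to the innermost element that is still open.
    static Node parse(List<Token> tokens) {
        Deque<Node> open = new ArrayDeque<>();
        Node root = null;
        for (Token t : tokens) {
            switch (t.kind()) {
                case START -> {
                    Node n = new Node(t.value(), null);
                    if (open.isEmpty()) {
                        if (root != null) throw new IllegalArgumentException("second root element");
                        root = n;
                    } else {
                        open.peek().children.add(n);
                    }
                    open.push(n);
                }
                case END -> {
                    if (open.isEmpty() || !open.peek().name.equals(t.value()))
                        throw new IllegalArgumentException("unexpected </" + t.value() + ">");
                    open.pop();
                }
                case TEXT -> {
                    if (open.isEmpty()) throw new IllegalArgumentException("text outside root element");
                    open.peek().children.add(new Node(null, t.value()));
                }
            }
        }
        if (root == null || !open.isEmpty()) throw new IllegalArgumentException("missing or unclosed root element");
        return root;
    }

    public static void main(String[] args) {
        Node root = parse(scan("<?xml version=\"1.0\"?>\n<doc>\n  <title>Project   I</title>\n"
                + "  <p>Hello <b>XML</b><br/></p>\n</doc>"));
        System.out.println(root.name + " has " + root.children.size() + " children"); // prints "doc has 2 children"
    }
}
```

Keeping the scanner separate from the parser means the parser never deals with whitespace or angle brackets. It sees only a clean sequence of START, END and TEXT tokens, and the stack makes checking that elements are properly nested a one-line test.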

Once we have the parse tree we can write code to traverse it in different ways, extracting, processing or modifying the information stored in it.
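As an example of such a traversal, here is a sketch that converts a parse tree to plain text, the kind of conversion the existing program performs. To stay self-contained it lets the JDK's built-in DOM parser (`javax.xml.parsers.DocumentBuilder`) do the scanning and parsing instead of the project's own code. Treating `text:p` and `text:h` as the paragraph and heading elements is an assumption about the Open Office format, the namespace URIs in the sample are placeholders, and `TreeWalk`, `collectText` and `toPlainText` are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class TreeWalk {
    // Assumed names of the Open Office paragraph and heading elements.
    static boolean endsLine(String elementName) {
        return elementName.equals("text:p") || elementName.equals("text:h");
    }

    // Recursive depth-first traversal: text nodes are copied out in document
    // order, and a newline is written after each paragraph or heading.
    static void collectText(Node node, StringBuilder out) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            out.append(node.getNodeValue());
            return;
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            collectText(children.item(i), out);
        }
        if (node.getNodeType() == Node.ELEMENT_NODE && endsLine(node.getNodeName())) {
            out.append('\n');
        }
    }

    static String toPlainText(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        StringBuilder out = new StringBuilder();
        collectText(doc.getDocumentElement(), out);
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Placeholder namespace URIs; the parser is not namespace-aware here,
        // so prefixed names such as "text:p" are matched literally.
        String xml = "<office:document-content xmlns:office=\"urn:example:office\""
                + " xmlns:text=\"urn:example:text\"><office:body>"
                + "<text:h>Project I</text:h>"
                + "<text:p>Hello, <text:span>XML</text:span>!</text:p>"
                + "</office:body></office:document-content>";
        System.out.print(toPlainText(xml)); // prints two lines: "Project I" and "Hello, XML!"
    }
}
```

Other traversals, such as listing all headings or counting paragraphs, follow the same recursive pattern; only the work done at each node changes.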


XML References

____________________________________________________


Copyright © 2005, Ian Barnes, The Australian National University
Version 2005.2, Wednesday, 2 March 2005, 12:37:44 +1100
Feedback & Queries to comp2100@cs.anu.edu.au