Australian National University
Faculty of Engineering and Information Technology
INFS2052: Internet and Intranet Information Systems
Laboratory Exercise Week 5
The HyperText Transfer Protocol (HTTP)
In this tutorial the class can discuss some of the Web sites that
you have already visited for assignment 1, and the challenge of
discovering and describing their information structure. Tutorial
help on HTML scripting will be available if you want it.
The main task in the lab is to get some understanding of the HTTP
protocol in action, and the role of the proxy server or cache.
Discovering Web site information structure
I suggest about twenty minutes of class discussion at a time in the
session selected by the tutor. Topics for discussion:
- How did you discover and choose sites?
- What kinds of information structure have you seen
represented?
- Is it possible to explain or diagram these sites'
information structures? How?
More HTML and web page construction
Help is available from the tutor on request, and a small focussed group
session may be run by the tutor.
The xpaint tool can be used to create GIF images if you
want to include diagrams in your web pages (warning: use drawing tools
in moderation, they can make you excited).
xpaint is available at
/dept/dcs/infs2052/public/bin/xpaint
You can also use xfig and then export the file as a GIF
file. Any other drawing tool can be used if you wish, as long as
you save the file as a GIF file or a JPG file. You can use any
other system you like (Mac, PC) if you can transfer the files
yourself (even by email).
Doing HTTP by hand
The HyperText Transfer Protocol just uses printable ASCII
characters, so it can be generated by you at a keyboard or with a
text editor. The messages can be sent to an HTTP server because such
servers normally listen for a TCP/IP connection at port 80, and the
telnet program is able to make connections, send typed text,
and receive text replies.
The basic method is as follows:
- connect to a server or proxy at port 80. For example
% telnet online.anu.edu.au 80
- send a request such as GET or HEAD. For example
GET /people/Roger.Clarke/index.html HTTP/1.0
(empty line)
- analyse the response
The empty line after the request is necessary: in a later example you
will see that you can put headers in the request, and the blank line
signals the end of the request and its header lines.
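The three steps above can also be scripted. The following is a minimal Python sketch and not part of the original exercise: the helper names build_request, parse_status_line and fetch are hypothetical, and HTTP/1.0 is assumed as the protocol version. It is meant to show that a request is only ASCII text whose lines end in CR LF, and that the blank line closes the request and its headers.

```python
import socket


def build_request(method, path, headers=None, version="HTTP/1.0"):
    """Return the request as ASCII bytes: request line, headers, blank line."""
    lines = [f"{method} {path} {version}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    # The empty line marks the end of the request and its header lines.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def parse_status_line(response):
    """Split the first response line into (version, status code, reason)."""
    first_line = response.split(b"\n", 1)[0].decode("ascii", "replace").strip()
    version, code, reason = (first_line.split(" ", 2) + ["", ""])[:3]
    return version, int(code), reason


def fetch(host, path, port=80, method="GET"):
    """Open a TCP connection, send one request, read until the server closes."""
    with socket.create_connection((host, port), timeout=10) as conn:
        conn.sendall(build_request(method, path))
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)
```

Calling fetch("online.anu.edu.au", "/people/Roger.Clarke/index.html") would correspond to the telnet example above, assuming that host still answers on port 80; passing the result to parse_status_line gives the status code to analyse.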
- Get the infs2052 motd.html (message of the day) page from
the unit's home page in this way.
Systematically analyse the kinds of errors you get. Note in
particular which error responses appear, including
404 Document not found and the
(puzzling) Document moved messages.
- Roger Clarke's site (referenced in the example above) includes
a search index. Use Netscape to look at the
introductory page of the site and try searching for the name of
a person: "Johnson", for instance.
You can see the form of
the query string that was used in Netscape's Location line
(scroll through this with the arrow keys; widen your browser's
window if necessary).
Look at the source
of the original page to see where the pieces of information
come from, and the specification of the ACTION on the form
being submitted.
Only two pieces of information are
essential here: the query and the broker. By
editing the Location box and hitting Return, you can force a
modified URL with query back to the server to check this
out.
Now use telnet to construct and send the
corresponding GET command by hand.
- Look at the search form, and the form of the query it constructs,
for an Alta Vista search (from Elisa). Discover what URL is used for
the Next page in a multi-page search result. What do you expect
the server to do with this URL?
- A proxy such as wwwcache.anu is able to take requests for other
hosts' URLs and provide a copy, if it has one in its cache - or
to fetch one into its cache. You can explore part of the
interaction by connecting to wwwcache.anu.edu.au at port 80,
and making a request for a full URL (i.e.
http://some.host.somewhere/path). Using Netscape, access some
(small) page from a remote host - since your browser in the lab
uses wwwcache, this page will be loaded into wwwcache's cache.
Now connect to wwwcache and GET the document using the full
URL. What do the headers in the return message tell you?
- The cache is required to check whether its copy is up to date
before returning it to you. A conditional-request form of GET is
used by the proxy server to ask for a time-check and a fresh
copy if necessary, in one request. You can see the effect by
sending a conditional request to iwaki, including the
Last-modified time from a document that you have previously
retrieved.
(connect to iwaki port 80)
GET (URI) HTTP/1.0
If-modified-since: (cut and paste the timestamp from the
Last-modified: header in the document
you have previously GOT)
(empty line)
Try modifying this timestamp to different times. When do you
get a document in the response message body? When do you get a
simple short response?
- Experiment with the same idea with the proxy server.
Can
you relate the behaviours here to the operation of your
browser when you (a) go back through the history list or (b)
select a document that you have fetched before (notice the
status line at the bottom of the browser window: is it telling
the truth?).
What can you conclude about the operation of
the history list if you have a disk cache? (Your browser may
not be configured to use a local disk cache, but you can set it
to a small local cache while you are experimenting with small
documents.)
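The conditional-request tasks above can be reasoned about with a small sketch before you try them on a live server. The Python below is an illustration only: the function names conditional_get and server_decision are hypothetical, HTTP/1.0 is assumed, and server_decision is a simplified model of the usual comparison between the If-Modified-Since value and the document's Last-Modified time. It is not the code of iwaki, wwwcache or any real server, and real servers may behave differently (for example, some only answer 304 on an exact timestamp match).

```python
from email.utils import parsedate_to_datetime


def conditional_get(uri, last_modified, version="HTTP/1.0"):
    """Build a GET that asks for the body only if it changed after last_modified."""
    return (
        f"GET {uri} {version}\r\n"
        f"If-Modified-Since: {last_modified}\r\n"
        "\r\n"
    ).encode("ascii")


def server_decision(document_last_modified, if_modified_since):
    """Simplified model: 304 if the document is unchanged, otherwise 200."""
    try:
        changed_at = parsedate_to_datetime(document_last_modified)
        client_copy_at = parsedate_to_datetime(if_modified_since)
        # 200 means a full document follows; 304 is the short reply with no body.
        return 200 if changed_at > client_copy_at else 304
    except (TypeError, ValueError):
        # In this model an unparsable date is ignored and the document is sent.
        return 200


# Illustrative timestamps only; substitute the Last-modified value you received.
stamp = "Fri, 19 Mar 1999 00:36:02 GMT"
print(server_decision(stamp, stamp))                            # unchanged copy
print(server_decision(stamp, "Thu, 18 Mar 1999 00:36:02 GMT"))  # older copy
```

Under this model, pasting the timestamp back unchanged (or a later one) yields the short 304 reply, while an earlier timestamp yields a full 200 response with the document in the body. Comparing that prediction with what the server and the proxy actually return is the point of the experiment.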
You can use the rest of your lab time on your assignment.
Last modified: Fri Mar 19 11:36:02 EST 1999
Queries to: infs2052@iwaki.anu.edu.au