Australian National University

Faculty of Engineering and Information Technology

INFS2052: Internet and Intranet Information Systems

Laboratory Exercise Week 5
The HyperText Transfer Protocol (HTTP)

In this tutorial the class can discuss some of the Web sites that you have already visited for assignment 1, and the challenge of discovering and describing their information structure. Tutorial help on HTML authoring will be available if you want it.

The main task in the lab is to get some understanding of the HTTP protocol in action, and the role of the proxy server or cache.

Discovering Web site information structure

I suggest about twenty minutes of class discussion, at a point in the session chosen by the tutor. Topics for discussion:

More HTML and web page construction

Help is available from the tutor on request, and the tutor may run a small focussed group session.

The xpaint tool can be used to create GIF images if you want to include diagrams in your web pages (warning: use drawing tools in moderation; they can make you excited).
xpaint is available at

   
     /dept/dcs/infs2052/public/bin/xpaint   
   
   
You can also use xfig and then export the drawing as a GIF file. Any other drawing tool is fine, as long as you save the result as a GIF or JPEG file. You can also work on any other system you like (Mac, PC), provided you can transfer the files yourself (even by email).

Doing HTTP by hand

The HyperText Transfer Protocol uses only printable ASCII characters in its request and header lines, so you can type a request at the keyboard or prepare one with a text editor. You can send these messages to an HTTP server because such servers normally listen for TCP/IP connections on port 80, and the telnet program can open a connection, send the text you type, and display the text replies.

The basic method is as follows:

  1. connect to a server or proxy at port 80. For example
          % telnet online.anu.edu.au 80
          
  2. send a request such as GET or HEAD. For example
          GET /people/Roger.Clarke/index.html HTTP/1.0
          (empty line)
          
  3. analyse the response
The empty line after the request is necessary: in a later example you will see that you can put headers in the request, and the blank line signals the end of the request and its header lines.
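The telnet steps above can also be sketched in a few lines of Python, which makes the exact bytes of the request visible. This is a minimal sketch, assuming Python 3 and its standard socket module; the names build_request, parse_response and send_request are hypothetical helpers, not part of any course software. It speaks HTTP/1.0, so the server closes the connection after the response, and it assumes the response uses CR LF line endings as the protocol requires.

```python
import socket


def build_request(method, target, headers=None):
    """Return the bytes you would type into telnet: the request line,
    any header lines, and the empty line that ends the request."""
    lines = [f"{method} {target} HTTP/1.0"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    # HTTP lines end in CR LF; the extra CR LF is the required empty line.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def parse_response(raw):
    """Split a raw response into (status code, reason, headers, body).

    Header names are lower-cased because HTTP treats them case-insensitively;
    a repeated header simply overwrites the earlier value in this sketch."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    parts = status_line.split(" ", 2)  # e.g. ["HTTP/1.0", "404", "Not Found"]
    code = int(parts[1])
    reason = parts[2] if len(parts) > 2 else ""
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return code, reason, headers, body


def send_request(host, request, port=80):
    """Open a TCP connection as telnet does, send the request, and read
    until the server closes the connection (the HTTP/1.0 behaviour)."""
    chunks = []
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(request)
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)
```

For example, parse_response(send_request("online.anu.edu.au", build_request("GET", "/people/Roger.Clarke/index.html"))) repeats step 2, although that host may no longer answer. Many present-day servers host several sites at one address and expect a Host header, which you can pass as headers={"Host": "..."}.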
  1. Use this method to get the infs2052 motd.html (message of the day) page linked from the unit's home page.
    Analyse the errors you get systematically. Note in particular which error responses you receive, including 404 Document not found and the (puzzling) Document moved messages.
  2. Roger Clarke's site (referenced in the example above) includes a search index. Use Netscape to look at the introductory page of the site and try searching for the name of a person: "Johnson", for instance.
    You can see the form of the query string that was used in Netscape's Location field (scroll through it with the arrow keys; widen your browser's window if necessary).
    Look at the source of the original page to see where each piece of information comes from, and at the ACTION attribute of the form being submitted.
    Only two pieces of information are essential here: the query and the broker. To check this, edit the URL in the Location field and press Return; this sends your modified URL and query back to the server.
    Now use telnet to construct and send the corresponding GET command by hand.
  3. Look at the form for an Alta Vista search (from Elisa) and at the query it constructs. What URL is used for the Next page of a multi-page search result? What do you expect the server to do with this URL?
  4. A proxy such as wwwcache.anu accepts requests for other hosts' URLs and returns a copy from its cache if it has one, or otherwise fetches the document into its cache first. You can explore part of this interaction by connecting to wwwcache.anu.edu.au at port 80 and requesting a full URL (i.e. http://some.host.somewhere/path). Using Netscape, access some (small) page on a remote host; since your browser in the lab uses wwwcache, this page will be loaded into wwwcache's cache. Now connect to wwwcache and GET the document using the full URL. What do the headers in the response tell you?
  5. The cache is required to check whether its copy is up to date before returning it to you. The proxy server uses a conditional form of GET to ask, in a single request, for a time check and a fresh copy if necessary. You can see the effect by sending a conditional request to iwaki that includes the Last-Modified time from a document you have previously retrieved.
          (connect to iwaki port 80)
          GET (URI) HTTP/1.0
          If-Modified-Since: (cut and paste the timestamp from the
                             Last-Modified: header in the document
                             you have previously GOT)
          (empty line)
            
    Try changing this timestamp to different times. When do you get a document in the response message body? When do you get a simple short response?
  6. Experiment with the same idea with the proxy server.
    Can you relate the behaviour here to the operation of your browser when you (a) go back through the history list or (b) select a document that you have fetched before? (Notice the status line at the bottom of the browser window: is it telling the truth?)
    What can you conclude about the operation of the history list if you have a disk cache? (your browser may not be configured to use a local disk cache, but you can set it to a small local cache while you are experimenting with small documents)
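The request lines used in exercises 2, 4 and 5 can also be built programmatically, which helps when comparing what you type with what the browser sends. This is a minimal Python 3 sketch using only the standard library (urllib.parse, email.utils, datetime). The field names query and broker, the path /search, and all function names are hypothetical placeholders: read the real field names and ACTION from the page source. The expected_status function is a simplified model of the If-Modified-Since rule in the HTTP/1.0 specification (RFC 1945), assuming the document exists; it is not a description of what iwaki or wwwcache actually run.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime
from urllib.parse import urlencode, urlsplit


def search_target(action, **fields):
    """Form fields as a browser sends them for a GET form: the ACTION path,
    '?', then URL-encoded name=value pairs joined by '&'."""
    return action + "?" + urlencode(fields)


def proxy_request_line(url):
    """A proxy needs the full URL (host included), not just the path."""
    parts = urlsplit(url)
    if parts.scheme != "http" or not parts.netloc:
        raise ValueError("a proxy request needs a full http:// URL")
    return f"GET {url} HTTP/1.0"


def conditional_request(target, last_modified):
    """Request text for a conditional GET, reusing a Last-Modified value."""
    return (f"GET {target} HTTP/1.0\r\n"
            f"If-Modified-Since: {last_modified}\r\n"
            "\r\n")


def shifted(http_date, **delta):
    """Move an HTTP date by a timedelta, e.g. shifted(d, days=-1), keeping
    the same format as the Last-Modified header."""
    moved = parsedate_to_datetime(http_date) + timedelta(**delta)
    return format_datetime(moved.astimezone(timezone.utc), usegmt=True)


def expected_status(last_modified, if_modified_since, now=None):
    """Simplified model of the server's choice for a conditional GET:
    304 Not Modified (no body) if the document is no newer than the given
    date, otherwise 200 with the document. A date later than the server's
    current time is invalid, so the request is treated as an ordinary GET."""
    if now is None:
        now = datetime.now(timezone.utc)
    doc_time = parsedate_to_datetime(last_modified)
    ims_time = parsedate_to_datetime(if_modified_since)
    if ims_time > now:
        return 200
    return 304 if doc_time <= ims_time else 200
```

For exercise 5, print(conditional_request(path, shifted(last_modified, days=-1))) shows the lines to type into a telnet session, and the status line you actually receive can be compared with expected_status for the same two timestamps.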

You can use the rest of your lab time on your assignment.


Last modified: Fri Mar 19 11:36:02 EST 1999
Queries to : infs2052@iwaki.anu.edu.au