INFS2052 lecture 2.2

Browser-Server Interaction through HTML

Information Structure of Web Sites

Familiar examples of the way World Wide Web sites are structured

			Unit introduction
				info handout
			Syllabus and topics
			Lecture notes
				Module 1
					Lecture 1.1
				Module 2
					Lecture 2
					Lecture 3
					Lecture 4
			Assignments
				assignment 1
					specification
					additional notes
					assessment results
				assignment 2

... similarly

			Survey results
			Discipline rules on use of IT services		

These are static structuring methods.

Many sites employ more dynamic techniques - and can become so interactive and dynamic that they no longer have a static structural map.

Page structures for site structures

This information site supports the INFS3056 unit at ANU.
      INFS3056 Course introduction handout 
      Syllabus and lecture topics 
      Lecture notes 
      Assignments 
      Laboratory/tutorial Exercises 

You may be interested in the
      entry skill survey of students 
      Web pages created by members of the class 
	Select initial letter of author's name:
          A  B  C  D  E  F  G  H  I  J  K  L  ...
 

- also search index:

	This is a search Index. Enter search keywords: [              ]
            

Example of form as displayed by browser.

See http://info.anu.edu.au/elisa/elibrary/indexes1.html for example.

Basics of HTTP

HTTP - the Hyper Text Transfer Protocol

See also RFC 2068 HTTP/1.1
Tanenbaum section 7.6.2
and many other books about the Web.

HTTP is a simple protocol with few message types.

HTTP assumes a reliable transport service (usually TCP/IP, but not necessarily).

Message Formats

2 basic kinds of message: request and response

(all the interest is in the details of sub-formats within these)

Message sequencing

In its basic form, the protocol allows for one request and one response within a connection established at the transport layer.
Then both client and server close the connection.

    client   (CONNECT)      *  REQUEST            *  (CLOSE)
                \          /       \             /
                 \        /         \           /
                  \      /           \         / 
                   \    /             \       /
    server          * (ACCEPT)         * RESPONSE     (CLOSE)

Meaning

The request is received by the server which sends a response.
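
The sequence above can be sketched in Python. This is only an illustration: it assumes a local socket pair can stand in for the transport connection (no real network or HTTP server is involved), and the function names one_exchange and read_all are hypothetical.

```python
import socket


def read_all(sock):
    """Read from sock until the other side closes its sending direction."""
    chunks = []
    while True:
        chunk = sock.recv(4096)
        if not chunk:
            break
        chunks.append(chunk)
    return b"".join(chunks)


def one_exchange(request, response):
    """Carry one request and one response over a single connection."""
    # socketpair() stands in for CONNECT on the client and ACCEPT on the server.
    client, server = socket.socketpair()
    try:
        client.sendall(request)            # REQUEST
        client.shutdown(socket.SHUT_WR)    # client has nothing more to send
        seen_by_server = read_all(server)
        server.sendall(response)           # RESPONSE
        server.close()                     # server CLOSE
        seen_by_client = read_all(client)
    finally:
        client.close()                     # client CLOSE
        server.close()                     # harmless if already closed
    return seen_by_server, seen_by_client


req, resp = one_exchange(b"GET /index.html HTTP/1.0\r\n\r\n",
                         b"HTTP/1.0 200 Document follows\r\n\r\n")
print(req)
print(resp)
```

The messages here are small enough to fit in the socket buffers, so a single thread can play both roles in turn.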


It's not really that simple...

Form of HTTP messages: header/body

An HTTP message in either direction is in the form

  start line
  *message-header
  CRLF
  [message-body]

The start line denotes the type of message:
request or
status (response)

- see below
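
As a rough illustration of this layout, the following Python sketch splits a raw message at the first empty line. The function name split_message and the sample message are assumptions made for the example.

```python
def split_message(raw):
    """Split a raw HTTP message into (start_line, header_lines, body)."""
    # The first empty line (CRLF CRLF) separates the head from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    return lines[0], lines[1:], body


raw = (b"HTTP/1.0 200 Document follows\r\n"
       b"Content-type: text/html\r\n"
       b"\r\n"
       b"<html>...</html>")
start_line, header_lines, body = split_message(raw)
print(start_line)
print(header_lines)
print(body)
```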

The message-header lines are all in the same form as header lines in
RFC 822 (email):

name ":" [field-value] CRLF

e.g.

    Date: Mon, 04 Aug 1997 08:08:02 GMT
    Last-modified: Tue, 25 Feb 1997 06:43:22 GMT
    Server: NCSA/1.5.2
    Content-type: text/html
    Content-Length: 8247

and carry various information about the message.
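
A small Python sketch of reading such header lines follows. The function name parse_headers is hypothetical. Field names are lower-cased because HTTP header names are case-insensitive (note the mixed capitalisation in the example above).

```python
def parse_headers(lines):
    """Turn 'name: value' header lines into a dictionary."""
    headers = {}
    for line in lines:
        # Split at the first colon only: values such as dates contain colons.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers


headers = parse_headers([
    "Date: Mon, 04 Aug 1997 08:08:02 GMT",
    "Server: NCSA/1.5.2",
    "Content-type: text/html",
    "Content-Length: 8247",
])
print(headers["content-length"])
print(headers["date"])
```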

Requests and responses are each expected to carry somewhat different header fields.

Request

A request message is in the form of a Request line, optional header lines, and an optional message body.

For example, a simple request is

GET      /index.html   HTTP/1.1       CRLF

method      URI         HTTP-version   end of line

URL reminder

The URI (Uniform Resource Identifier) here may also be a full URL (Uniform Resource Locator): e.g.

GET     http://challender.anu.edu.au/index.html    HTTP/1.1
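
Both forms of the request line can be taken apart in the same way. The sketch below is an illustration only: parse_request_line is a hypothetical name, and it relies on the standard urllib.parse.urlsplit to separate host and path.

```python
from urllib.parse import urlsplit


def parse_request_line(line):
    """Split a request line into method, host (if given), path and version."""
    method, uri, version = line.split()
    parts = urlsplit(uri)
    return {
        "method": method,
        "host": parts.netloc or None,   # None when only a path was given
        "path": parts.path,
        "version": version,
    }


print(parse_request_line("GET /index.html HTTP/1.1"))
print(parse_request_line("GET http://challender.anu.edu.au/index.html HTTP/1.1"))
```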

Methods

  • OPTIONS
  • GET
  • HEAD
  • POST
  • PUT
  • DELETE
  • TRACE
  • extension-method

    Of these, the most widely used are GET, HEAD and POST.

    GET request

    (no message body, no header lines required)

    request a document to be sent from the server.

    The response may be the document itself

        HTTP/1.0 200 Document follows
        Date: Mon, 04 Aug 1997 08:18:10 GMT
        Server: NCSA/1.5.2
        Last-modified: Tue, 25 Feb 1997 06:43:22 GMT
        Content-type: text/html
        Content-length: 209
         
        <html><head>
        <title>ANU DCS Information for Students</title>
        </head>
        <body> here is the body of a very simple document </body>
        </html>
    

    Note that the start line includes the response code (200) and a short phrase giving its meaning for human readers (Document follows).

    header lines describe the document and server properties:
    date, content type etc.

    body of document is HTML in this case.
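
A client might pick the start line of such a response apart as in the following sketch. The function name parse_status_line is an assumption for the example.

```python
def parse_status_line(line):
    """Split a response start line into (version, code, reason phrase)."""
    # Split at most twice: the reason phrase may itself contain spaces.
    version, code, reason = line.split(" ", 2)
    return version, int(code), reason


print(parse_status_line("HTTP/1.0 200 Document follows"))
print(parse_status_line("HTTP/1.0 404 Not Found"))
```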

    Error responses may still carry a message body: for example, the dreaded 404:

        HTTP/1.0 404 Not Found
        Date: Mon, 04 Aug 1997 08:29:48 GMT
        Server: NCSA/1.5.2
        Content-type: text/html
         
        <HEAD><TITLE>404 Not Found</TITLE></HEAD>
        <BODY><H1>404 Not Found</H1>
        The requested URL /fred.html was not found on this server.
        </BODY>
    

    HEAD request

    (no message body, no header lines required)

    Requests only the header information, usually to determine the last-modified date or whether the document exists.
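
The difference between GET and HEAD can be sketched from the server's side. This is a simplified illustration, not how any particular server is written; build_response is a hypothetical name and the header set is cut down to two fields.

```python
def build_response(method, document):
    """Build a 200 response; HEAD gets the same headers but no body."""
    header_lines = [
        "HTTP/1.0 200 Document follows",
        "Content-type: text/html",
        "Content-length: {}".format(len(document)),
    ]
    head = ("\r\n".join(header_lines) + "\r\n\r\n").encode("ascii")
    if method == "HEAD":
        return head
    return head + document


page = b"<html></html>"
print(build_response("HEAD", page))
print(build_response("GET", page))
```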

    POST request

    send information from the client to the server in the form of a message body (more later)

    Response codes and content

    Response codes are all three-digit decimal numbers:

  • 100 range - informational
  • 200 range - successful
  • 300 range - redirection
  • 400 range - client errors
  • 500 range - server errors

    The content of the response includes headers (as shown above)
    and a message body - usually in the form of a document for display by the client.
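
The ranges listed above can be expressed as a short lookup on the first digit of the code. The function name status_class is hypothetical.

```python
def status_class(code):
    """Return the class of a three-digit HTTP response code."""
    classes = {
        1: "informational",
        2: "successful",
        3: "redirection",
        4: "client error",
        5: "server error",
    }
    if not 100 <= code <= 599:
        raise ValueError("not a valid HTTP response code: {}".format(code))
    return classes[code // 100]


print(status_class(200))
print(status_class(404))
```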

    Basic Server Functions

    Handling of GET, POST and HEAD

    A server may interpret any URL as referring to either a data file or a program to be executed.

    The configuration of the server determines what it does with a URL.

    The common case is that URLs with a pathname that includes the server directory

    	cgi-bin

    refer to a program for execution.

    For example,

    http://challender/cgi-bin/openday/webpages/webpage.pl

    is in fact the pathname of an executable script (in the Perl language) that generates the HTML document for display.
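
A server's decision could look something like the sketch below. This assumes the common configuration described above, with cgi-bin as the program directory; real servers make this configurable, and refers_to_program is a hypothetical name.

```python
from urllib.parse import urlsplit


def refers_to_program(url, script_dir="cgi-bin"):
    """Guess whether a URL names a program rather than a data file."""
    segments = urlsplit(url).path.split("/")
    # Look only at the directory components, not the final file name.
    return script_dir in segments[:-1]


print(refers_to_program("http://challender/cgi-bin/openday/webpages/webpage.pl"))
print(refers_to_program("http://challender.anu.edu.au/index.html"))
```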

    Interpretation of URL in GET and HEAD

    The server inspects the type of file and the context. For a data file, it returns headers (both GET and HEAD) and the file data as the body of the message (GET).
    For a program, it executes the program and returns headers (both GET and HEAD) and the program's result as the body of the message (GET).

    A URL may contain an optional query part up to 255 characters long:

    http://challender/cgi-bin/dostuff?widget+wadget+boff

    The server interprets the query string (following "?") as data to be given to the executing program
    - passed via an environment variable QUERY_STRING in UNIX.

    If the item is a data file then the query part is ignored.

    This provides a more powerful way for a client to interact with the server.
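
The executing program's side of this can be sketched in Python. The notes mention a Perl script; Python is used here only for illustration, and the function names are hypothetical. The sketch assumes keywords are separated by "+" as in the example URL.

```python
import os
from urllib.parse import unquote, urlsplit


def query_part(url):
    """Return the part of a URL that follows the '?'."""
    return urlsplit(url).query


def keywords(query_string):
    """Split a keyword query such as 'widget+wadget+boff' into words."""
    return [unquote(word) for word in query_string.split("+") if word]


def keywords_from_environment(environ=os.environ):
    """What the program would do: read QUERY_STRING from its environment."""
    return keywords(environ.get("QUERY_STRING", ""))


query = query_part("http://challender/cgi-bin/dostuff?widget+wadget+boff")
print(query)
print(keywords(query))
```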

    Interpretation of POST

    A POST message includes a message body, which can contain data of unlimited length.

    Its interpretation [RFC 2068] is to "provide a block of data to a process".

    The server interprets the URL in the same way as for GET (it identifies a program by pathname),
    but the data is delivered to the program (if any) via its standard input stream (like reading from a keyboard or an input file).

    As with GET, the response may be a further document, a document dynamically constructed by the program, a simple success response, or an error...

    The response is generated by the program writing HTML text to standard output.
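
A program handling POST might look like the sketch below. Several things here are assumptions not stated in these notes: the CONTENT_LENGTH environment variable and the leading Content-type line follow the usual CGI convention, and handle_post is a hypothetical name. The streams are passed in as parameters so the sketch can be tried without a server.

```python
import html
import io


def handle_post(environ, stdin, stdout):
    """Read the posted data from stdin and write an HTML reply to stdout."""
    # Assumption: the server supplies the body length in CONTENT_LENGTH.
    length = int(environ.get("CONTENT_LENGTH") or 0)
    data = stdin.read(length).decode("iso-8859-1")
    # Assumption: a header line and a blank line precede the HTML text.
    stdout.write("Content-type: text/html\n\n")
    stdout.write("<html><body>\n")
    stdout.write("<p>Received {} bytes: {}</p>\n".format(length, html.escape(data)))
    stdout.write("</body></html>\n")


# Stand-ins for the real standard input and output streams.
posted = b"name=<me>"
reply = io.StringIO()
handle_post({"CONTENT_LENGTH": str(len(posted))}, io.BytesIO(posted), reply)
print(reply.getvalue())
```

When run by a server, the same function would be called with os.environ, sys.stdin.buffer and sys.stdout.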


    Last modified: Tue Mar 30 11:26:19 EST 1999
    Queries to : infs2052.iwaki.anu.edu.au