INFS2052 lecture 2.4

Interaction state and document caching

Saving state in the interaction

The HTTP protocol is stateless:
the server does not normally remember a client from one request to the next.

Rationale:

- the client may go anywhere next, so there is often no session relationship established between client and server

- the protocol is (or was) lightweight, using ephemeral connections to lighten the load on the server

BUT

when there is a session, such as:

- a shopping cart

the client and server need some memory, i.e. some state information.

Saving state at server

The server is told where each request comes from,
and can save the identity of the requesting machine (its IP address) in a server database.

The server can then use this database to moderate subsequent requests.
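A minimal Python sketch of server-side state keyed on the client's IP address. The `carts` table and `add_to_cart` function are hypothetical stand-ins for the server database and the script that consults it. Note that the sketch treats every request arriving from the same IP address as coming from the same client.

```python
from collections import defaultdict

# Hypothetical server-side database: client IP address -> items in that client's cart
carts: dict[str, list[str]] = defaultdict(list)


def add_to_cart(client_ip: str, item: str) -> list[str]:
    """Record an item against the requesting machine and return its cart so far."""
    carts[client_ip].append(item)
    return list(carts[client_ip])
```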

PROBLEMS

Saving state at clients

The browser normally allows no long-term effect on the client system
except for saving explicitly requested files;

the original issue was trust.

Nevertheless, some session state can be saved at the client:

  1. the query-string mechanism can be used to preserve some session state
  2. extensions to HTTP and browser software allow cookies, with which the server can save state at the client in a safe, restricted way

Client-side state in query strings

The server can embed hidden information in the URLs of links in the document it sends,
or make use of hidden field values in forms.

The user's session state is thus encoded into the document currently held at the client,
and is returned to the server on each new request.

This method can also be used to provide a key for a server-side database which stores per-user information.
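A sketch of query-string state using Python's standard `urllib.parse` and `html` modules. The URL `/cgi-bin/cart` and the field names `session` and `items` are hypothetical. The state is written into a link (or into hidden form fields) and read back when the client follows the link; form fields submitted with POST would arrive in the request body rather than the query string, which this sketch does not handle.

```python
from html import escape
from urllib.parse import parse_qs, urlencode, urlsplit


def link_with_state(base_url: str, state: dict[str, str]) -> str:
    """Embed session state in a link URL as a query string."""
    return base_url + "?" + urlencode(state)


def hidden_fields(state: dict[str, str]) -> str:
    """Embed the same state as hidden fields in an HTML form."""
    return "\n".join(
        f'<INPUT TYPE="hidden" NAME="{escape(name)}" VALUE="{escape(value)}">'
        for name, value in state.items()
    )


def state_from_request(url: str) -> dict[str, str]:
    """Recover the state when the client follows the link (or submits the form with GET)."""
    return {name: values[0] for name, values in parse_qs(urlsplit(url).query).items()}
```

If `session` holds a key rather than the data itself, the server uses it to look up the per-user record in its own database.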

A key, password or registration name can be requested from the user at the start of each session,
by disallowing access beyond the server's gatekeeper document until one is supplied.

Client-side state in cookies - see RFC 2109

A cookie is an identifying tag stored by the server at the client.

RFC 2109, "HTTP State Management Mechanism", describes the methods evolved by browser and server developers and attempts to standardise them.

Cookie values are carried in message headers. For example, the response header

	Set-Cookie:  NETSCAPE_ID=c65ffb1e,c67de399

contains a tagging value sent by the server to the client as part of a response.

The value is saved in a file on the user's machine by the browser
(see your ~/.netscape/cookies file, for example);
the value is tagged with the server's domain name and the name contained in the cookie:

.netscape.com TRUE / FALSE 946645199 NETSCAPE_ID c65ffb1e,c67de399
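A sketch that splits such an entry into fields. The field layout (domain, whether all hosts in the domain may see the cookie, path, secure flag, expiry in seconds since 1970 UTC, name, value) is my reading of the commonly documented Netscape cookies-file format, not something stated in these notes; the function name is hypothetical.

```python
from datetime import datetime, timezone


def parse_cookie_file_line(line: str) -> dict:
    """Split one cookies-file entry into its seven fields (assumed layout, see lead-in)."""
    domain, all_hosts, path, secure, expires, name, value = line.split()
    return {
        "domain": domain,                    # server domain the cookie belongs to
        "all_hosts": all_hosts == "TRUE",    # may every host in the domain see it?
        "path": path,                        # URL path prefix the cookie applies to
        "secure": secure == "TRUE",          # send only over secure connections?
        "expires": datetime.fromtimestamp(int(expires), tz=timezone.utc),
        "name": name,
        "value": value,
    }
```

For the entry above, the expiry works out to 12:59:59 UTC on 31 December 1999, one second before midnight at UTC+11. The sketch splits on whitespace, so it would mis-handle a value containing spaces.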

When the client sends subsequent requests to that server, it includes a Cookie header in the request:

	Cookie:  NETSCAPE_ID=c65ffb1e,c67de399

The server can pass this information as additional parameters to a script or program that generates or modifies the response.
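A minimal sketch of both sides of the exchange in Python. The `sessions` table, the cookie name `SESSION_ID` and both functions are hypothetical; the parser is deliberately simple (it skips RFC 2109 `$`-attributes such as `$Version` and strips surrounding quotes) and is not a complete RFC 2109 implementation.

```python
import secrets

# Hypothetical server-side table: cookie value -> per-user data
sessions: dict[str, dict] = {}


def new_session_cookie(name: str = "SESSION_ID") -> str:
    """Create a session and return the Set-Cookie header line for the response."""
    value = secrets.token_hex(8)  # random tag, 16 hex characters
    sessions[value] = {}
    return f"Set-Cookie: {name}={value}"


def parse_cookie_header(header_value: str) -> dict[str, str]:
    """Split a request's Cookie header value into name -> value pairs."""
    cookies = {}
    for pair in header_value.split(";"):
        name, sep, value = pair.strip().partition("=")
        if sep and not name.startswith("$"):  # skip attributes such as $Version, $Path
            cookies[name] = value.strip('"')
    return cookies
```

The script handling the request would then look up `sessions[cookies["SESSION_ID"]]` to find the per-user data.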

Netiquette/security issues

Browsers usually allow users to selectively disable the storing of cookies, to prevent servers from accumulating user information in this way;
e.g. in Netscape:

Options->
Network preferences->
Protocols: Show alert before accepting a cookie

If users commonly block cookies, some services on offer become ineffective.

For the enterprising service provider, the usual implied contract of trust between service providers and consumers remains an essential part of the World Wide Web as a successful collaborative enterprise (see later for security and trust considerations).

HTTP performance issues

The Internet is not instant communication (!!)

Steps in making a document request:

  1. client requests a connection and sends the request to the server
    - network latency = milliseconds (LAN) to tens of seconds (world)
  2. request waits in the server's internal queue
    - depends on server processor speed and its dynamic workload
  3. server action: disk fetch and/or program execution
    - depends on server processor speed, operating system, disk: milliseconds
  4. server sends the response to the client
    - network latency again
    - effective network bandwidth: 5 MB/s to 500 KB/s (LAN), 1 KB/s (modem), down to 100 B/s
    - time depends on both latency and size of the response message
  5. client browser formats and displays the document
    - milliseconds to seconds
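The five steps can be added up into a rough estimate of response time. This sketch assumes the steps happen strictly one after another, ignores extra round trips for connection setup, and takes 1 KB as 1000 bytes; the sample figures in the usage note are assumptions chosen from the ranges above.

```python
def response_time(latency_s: float, queue_s: float, server_s: float,
                  size_bytes: int, bandwidth_bytes_per_s: float, display_s: float) -> float:
    """Rough total time for one request, adding up the five steps."""
    transfer_s = size_bytes / bandwidth_bytes_per_s  # size-dependent part of step 4
    return (latency_s                 # 1. request reaches the server
            + queue_s                 # 2. wait in the server's queue
            + server_s                # 3. disk fetch / program execution
            + latency_s + transfer_s  # 4. response latency plus transfer time
            + display_s)              # 5. formatting and display
```

With 0.2 s latency, 10 ms of server work and 0.5 s of display time, a 20 KB page over a 1 KB/s modem takes about 20.9 s, almost all of it transfer time; over a 500 KB/s LAN with 1 ms latency the same page takes about 0.55 s, dominated by display.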

Speed of response is very important in user psychology
and hence determines the effectiveness of any interaction.

The enterprise network and the Internet

The enterprise incurs network costs internally (not directly related to bandwidth)
and in going to the external network (often related to bandwidth).

HTTP design includes the possibility of caching

A cache is

a temporary storage area set aside within a computer's random access memory [or local disk space] to store information which is most frequently accessed in a computer application.

Prentice Hall Illustrated Dictionary of Computing

Networked information systems design includes considering caches at several levels, from the client's local store through intermediate (proxy) servers.

Normally a request is served from the nearest (hence fastest available) copy of the data entity:
if a copy exists in the local store, use it; otherwise use an intermediate copy; otherwise fetch the origin server's copy.

A copy is kept in each cache as it is fetched.
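A sketch of this nearest-copy-first lookup, with each cache level modelled as a Python dictionary. The function name and the `fetch_from_origin` callback are hypothetical; `caches[0]` stands for the local store and later entries for intermediate caches further from the client.

```python
from typing import Callable


def fetch(url: str, caches: list[dict[str, bytes]],
          fetch_from_origin: Callable[[str], bytes]) -> bytes:
    """Serve url from the nearest cache holding it; caches[0] is the local store."""
    for level, cache in enumerate(caches):
        if url in cache:
            body = cache[url]
            break
    else:  # no cache holds a copy: go to the origin server
        level, body = len(caches), fetch_from_origin(url)
    # Keep a copy in every cache nearer to the client than where it was found
    for cache in caches[:level]:
        cache[url] = body
    return body
```

This sketch never checks whether a cached copy is still current; that is the coherency problem discussed below.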

Cache storage space is managed locally at each level,
usually according to a simple predictor of the next anticipated use,
based on when each item was most recently used.
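Evicting by most recent use is usually called a least-recently-used (LRU) policy. A minimal Python sketch using `collections.OrderedDict`, with capacity counted in entries rather than bytes (a simplification):

```python
from collections import OrderedDict
from typing import Optional


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used entry."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.entries: OrderedDict[str, bytes] = OrderedDict()  # oldest use first

    def get(self, key: str) -> Optional[bytes]:
        if key not in self.entries:
            return None
        self.entries.move_to_end(key)  # mark as most recently used
        return self.entries[key]

    def put(self, key: str, value: bytes) -> None:
        if key in self.entries:
            self.entries.move_to_end(key)
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry
```

For example, with capacity 2, storing `a` and `b`, reading `a`, then storing `c` evicts `b`, because `a` was used more recently.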

Using a cache conflicts directly with the principle of providing up-to-date information on request: this is the problem of cache coherency.

HTTP/1.1 design goals include supporting caching to

eliminate the need to send requests in many cases, and the need to send full responses in many others

[RFC 2068 p.69]

Cache design for Web service

Client cache

Assume that the browser has caching enabled (a user choice).

When the user selects a URL in a link
(or types it into the location bar or the Open Location text field...):

if no document with this URL exists in browser RAM cache then
	if no document with this URL exists in browser disk cache then
		send request to proxy server (if any) or origin server;
		save response document in disk and RAM cache
	else
		copy from disk to RAM cache
format and display from RAM copy
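The same two-level lookup written as runnable Python. The two caches are modelled as dictionaries and `send_request` is a hypothetical callback standing for the request to the proxy or origin server.

```python
from typing import Callable


def open_url(url: str, ram_cache: dict[str, bytes], disk_cache: dict[str, bytes],
             send_request: Callable[[str], bytes]) -> bytes:
    """Return the RAM copy of url, filling the RAM and disk caches as needed."""
    if url not in ram_cache:
        if url not in disk_cache:
            document = send_request(url)  # proxy server (if any) or origin server
            disk_cache[url] = document
            ram_cache[url] = document
        else:
            ram_cache[url] = disk_cache[url]  # copy from disk to RAM cache
    return ram_cache[url]  # format and display from the RAM copy
```

A second selection of the same URL is then served from RAM without any network request, whether or not the document has since changed at the server.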

In Netscape the user can choose when the browser revalidates
the local copy with the proxy or origin server.

Revalidation in HTTP

Revalidation means checking with a server whether the local copy is up to date.

Mechanism: the client keeps the Last-modified date from the original response,

	Last-modified: Tue, 05 Aug 1997 23:08:16 GMT

and sends it back to the server in a conditional request:

	GET <url> HTTP/1.1
	If-modified-since: Tue, 05 Aug 1997 23:08:16 GMT

If the document has not changed since that date, the server replies with headers only:

	HTTP/1.0 304 Not modified
	Date: Thu, 14 Aug 1997 00:46:13 GMT
	Server: NCSA/1.5.2
	Last-modified: Tue, 05 Aug 1997 23:08:16 GMT
	Content-type: text/html
	Content-length: 2611

<end of message>

or, if it has changed, with the full new document:

	HTTP/1.0 200 Document follows
	Date: Thu, 14 Aug 1997 00:46:13 GMT
	Server: NCSA/1.5.2
	Last-modified: Sat, 09 Aug 1997 23:08:16 GMT
	Content-type: text/html
	Content-length: 1561

	<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
	<HEAD>
	<TITLE> etc.
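A sketch of both sides of this exchange using Python's standard `email.utils.parsedate_to_datetime` to read the HTTP dates. The function names are hypothetical. Following RFC 2068's description of If-Modified-Since, an unparseable date is treated as if the header were absent, so the full document is sent.

```python
from email.utils import parsedate_to_datetime
from typing import Optional


def revalidation_headers(cached_last_modified: Optional[str]) -> dict[str, str]:
    """Client side: extra headers for a conditional GET of a cached copy."""
    if cached_last_modified is None:
        return {}  # nothing to compare against, so a plain GET is needed
    return {"If-Modified-Since": cached_last_modified}


def status_for(if_modified_since: Optional[str], last_modified: str) -> int:
    """Server side: 304 if the document is unchanged since the given date, else 200."""
    if if_modified_since is None:
        return 200
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):  # older Pythons raise TypeError, newer ValueError
        return 200
    return 304 if parsedate_to_datetime(last_modified) <= since else 200
```

With the dates above, a document last modified on 5 August gives 304 Not modified, while one last modified on 9 August gives 200 with the new body.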

Proxy server

acts as a cache on behalf of a group of clients (e.g. an enterprise)

revalidation is based on controls supplied by the origin server;

see RFC 2068 section 13.
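A simplified sketch of how a cache might decide whether a stored response is still fresh, based on my reading of RFC 2068 section 13.2: a Cache-Control max-age value takes precedence over Expires; otherwise the lifetime is Expires minus Date; with neither, a heuristic fraction of the time since Last-Modified may be used. The 10% fraction is the "typical" setting the RFC mentions, used here as an assumption. All times are seconds since the epoch, and computing the response's current age (the RFC's Age header and delay corrections) is left out.

```python
from typing import Optional


def freshness_lifetime(max_age: Optional[float] = None,
                       expires: Optional[float] = None,
                       date: Optional[float] = None,
                       last_modified: Optional[float] = None) -> float:
    """Seconds a cached response may be served without revalidation (simplified)."""
    if max_age is not None:  # Cache-Control: max-age overrides Expires
        return max_age
    if expires is not None and date is not None:
        return max(0.0, expires - date)
    if last_modified is not None and date is not None:
        return (date - last_modified) / 10  # assumed heuristic: 10% of time since last change
    return 0.0  # no information: treat as stale and revalidate


def is_fresh(lifetime: float, current_age: float) -> bool:
    """A cached copy may be used without revalidation while younger than its lifetime."""
    return lifetime > current_age
```

A stale copy is not necessarily discarded: the proxy can revalidate it with a conditional GET and keep it if the origin server answers 304.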

Implications for Info Systems Design



Last modified: Tue Mar 30 11:23:29 EST 1999
Queries to : infs2052@iwaki.anu.edu.au