INFS2052
Internet and Intranet Information Systems
lecture notes - 2.1

Networked Hypertext - HTML

Early information transfer between networked computers

Early information transfer was in the form of

Distinguish client pull, server push.

FTP

FTP, and applications built over it (such as the ftp command), allowed a user to browse remote files, retrieve and save them locally, and view them.

Elements

Navigation

No support for

Hypertext

Hypertext existed in text form before computers:

Hypertext on computers added

Some successes, but:

Hypertext was used more once CD-ROM technology became widespread, because a larger information base could be delivered in one package (encyclopedia, technical manual, Standards, specialised government document collections - e.g. Health ROM).

Hypertext on networked computers

Networked computers and hypertext married well - timing was important

The successful HTML/HTTP system was an open system

Networked hypertext - HTML, HTTP

HTML

is the HyperText Markup Language defined and adopted by the World Wide Web.

HTML is a set of markup tags embedded as readable, text-editable tags in the text of a document.

HTML now contains a number of features that support networked hypertext, but it can also be used within a local system.
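As a rough illustration of tags embedded in ordinary text, the sketch below runs Python's standard html.parser module over a small document and collects the targets of its hypertext links. The sample page, its link targets, and the names LinkCollector and collect_links are invented for this example; they do not come from the lecture or from Tanenbaum.

```python
from html.parser import HTMLParser

# Hypothetical sample page: the title, text and link targets are invented.
SAMPLE = """<HTML>
<HEAD><TITLE>Sample page</TITLE></HEAD>
<BODY>
<H1>Welcome</H1>
<P>See the <A HREF="notes.html#intro">lecture notes</A> or the
<A HREF="http://www.example.org/">remote site</A>.</P>
</BODY>
</HTML>"""


class LinkCollector(HTMLParser):
    """Collect the HREF value of every <A> tag seen in the input."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag and attribute names in lower case.
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)


def collect_links(html_text):
    """Return the link targets found in html_text, in document order."""
    parser = LinkCollector()
    parser.feed(html_text)
    parser.close()
    return parser.links


if __name__ == "__main__":
    print(collect_links(SAMPLE))
```

The first link is local to the same site and the second names another host, which is the sense in which the same markup serves both a local system and networked hypertext.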

HTML is the subject of the rest of this lecture and is explored further in the lab classes. It is best learnt from descriptive documents (see the lab notes and the information resources page for this unit); this lecture is about context and a view of the technology from the side, not a full technical introduction.

HTTP

is the protocol which delivers documents from servers to clients.

HTTP is a simple protocol (originally very simple) that also supports services for efficient implementation, caching and proxies in particular.

HTTP is the subject of further lectures and a later lab class.

HTML - HyperText Markup Language - elements

An example of a HTML document from Tanenbaum illustrates some of the features:

(Tanenbaum fig 7-60 A sample scenario for obtaining a Web page)

The HTML document is the part that lies between <HEAD><TITLE>... and the end.

This also illustrates the HTTP protocol, showing that any program that issues a properly formatted HTTP GET request to a server will receive a HTTP reply - in this case a document with leading header and meta-information (the part starting with HTTP/1.0 200 Document...).

The C: T: and S: labels are part of the diagram, not part of the protocol.
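The exchange in the figure can be imitated without a browser. The following is a minimal sketch, not a full HTTP implementation: it builds the text of a GET request and splits a reply into status line, headers and body. The function names and the sample reply are assumptions made for this example, and the sketch expects a well-formed reply whose status line includes a reason phrase.

```python
def build_get_request(path, version="HTTP/1.0"):
    """Return a minimal GET request: a request line followed by a blank line."""
    return f"GET {path} {version}\r\n\r\n".encode("ascii")


def split_response(raw):
    """Split a raw HTTP reply into (status_code, reason, headers, body)."""
    # A blank line separates the header section from the document itself.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    # Status line has the form: version, numeric code, reason phrase.
    _version, code, reason = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(code), reason, headers, body


# Invented reply for illustration; a real server sends more header lines.
SAMPLE_REPLY = (
    b"HTTP/1.0 200 OK\r\n"
    b"Content-Type: text/html\r\n"
    b"\r\n"
    b"<HTML><HEAD><TITLE>Hello</TITLE></HEAD></HTML>"
)

if __name__ == "__main__":
    print(build_get_request("/index.html"))
    print(split_response(SAMPLE_REPLY))
```

To talk to a real server, the request bytes would be written to a TCP connection on the server's HTTP port and the reply read back until the server closes the connection.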

Significant elements in this document include

URL - Uniform Resource Locators

The URL (Uniform Resource Locator) is central to making HTML useful as a networked hypertext language.

The browser - and HTTP - interpret URLs as references to entities - resources - on the local filesystem or anywhere on the Internet.

The form of a URL is defined in RFC 1738.

URL Syntax

A URL is either

Either form may contain a trailing construct in the form

	 "#" identifier

(the label may be more complex in form than an identifier, but there is no simple name for it in the standard).

A relative URL

is a pathname in the form

Examples:

	"fred_nurk#work"
	"/fred"
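A relative URL only becomes usable once it is combined with the URL of the document that contains it. As a sketch of how the two examples above would resolve, the code below uses urljoin from Python's standard library; the base URL is a made-up address chosen for illustration.

```python
from urllib.parse import urljoin

# Hypothetical address of the document that contains the links.
BASE = "http://www.example.org/staff/index.html"


def resolve(relative, base=BASE):
    """Combine a relative URL with the URL of the containing document."""
    return urljoin(base, relative)


if __name__ == "__main__":
    # A name without a leading "/" replaces the last component of the base path.
    print(resolve("fred_nurk#work"))
    # A leading "/" starts again from the root of the same host.
    print(resolve("/fred"))
```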

An absolute URL

names a scheme, a host computer, and possibly a port, as well as a path:

scheme "://" host [":" port] [absolute-pathname] ["#" identifier]

The port number is where the server listens - port numbers are a world-wide convention for services (e.g. the mail server listens on port 25). For HTTP the default value is 80, which is used when the URL gives no port.
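The parts of an absolute URL can be pulled apart with urlsplit from Python's standard library. The sketch below is illustrative only: the describe function, the DEFAULT_PORTS table and the example addresses are assumptions for this example, with the table listing the conventional ports for a few of the schemes.

```python
from urllib.parse import urlsplit

# Conventional default ports, used when the URL does not name one.
DEFAULT_PORTS = {"http": 80, "ftp": 21, "telnet": 23, "gopher": 70}


def describe(url):
    """Return the scheme, host, port, path and fragment of an absolute URL."""
    parts = urlsplit(url)
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(parts.scheme)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "port": port,
        "path": parts.path,
        "fragment": parts.fragment,
    }


if __name__ == "__main__":
    # Hypothetical URLs: one with an explicit port, one relying on the default.
    print(describe("http://www.example.org:8080/docs/index.html#top"))
    print(describe("http://www.example.org/"))
```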

Schemes

The schemes recognised in URLs are defined in RFC 1738.

   ftp                     File Transfer protocol
   http                    Hypertext Transfer Protocol
   gopher                  The Gopher protocol
   mailto                  Electronic mail address
   news                    USENET news
   nntp                    USENET news using NNTP access
   telnet                  Reference to interactive sessions
   wais                    Wide Area Information Servers
   file                    Host-specific file names
   prospero                Prospero Directory Service
 
from RFC 1738

Interpretation of pathnames

A pathname need not be that of a file; it can name any resource.

In particular, embedding special punctuation into the pathname is used to give controlling parameters to search engines and other services (more later).
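One common form of such punctuation is the "?" that introduces a search part, with "&" and "=" separating named parameters. The sketch below shows how a service might recover those parameters using Python's standard library; the URL, the parameter names and the search_parameters function are invented for this example.

```python
from urllib.parse import urlsplit, parse_qs


def search_parameters(url):
    """Return (path, parameters) for a URL carrying a "?" search part."""
    parts = urlsplit(url)
    # parse_qs maps each parameter name to a list of its values.
    return parts.path, parse_qs(parts.query)


if __name__ == "__main__":
    # Hypothetical search URL; "q" and "max" are made-up parameter names.
    print(search_parameters("http://www.example.org/search?q=hypertext&max=10"))
```

Here the path names a service rather than a stored file, and the parameters after the "?" control what that service returns.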

Example URLs

Example URLs from RFC 2151 (Kessler & Shepard, "Internet & TCP/IP Tools & Utilities", June 1997):

  file://host/directory/file-name
       Identifies a specific file. E.g., the file htmlasst in the edu
     directory at host ftp.cs.da would be denoted, using the full URL
     form:  <URL:file://ftp.cs.da/edu/htmlasst>.
 
  ftp://user:password@host:port/directory/file-name
       Identifies an FTP site. E.g.:
     ftp://ftp.eff.org/pub/EFF/Policy/Crypto/*.
 
  gopher://host:port/gopher-path
       Identifies a Gopher site and menu path; a "00" at the start of
     the path indicates a directory and "11" indicates a file. E.g.:
     gopher://info.umd.edu:901/00/info/Government/Factbook92.
 
  http://host:port/directory/file-name?searchpart
       Identifies a WWW server location. E.g.:
     http://info.isoc.org/home.html.
 
  mailto:e-mail_address
       Identifies an individual's Internet mail address. E.g.:
     mailto:s.shepard@hill.com.
 
  telnet://user:password@host:port/
       Identifies a TELNET location (the trailing "/" is optional).
     E.g.: telnet://envnet:henniker@envnet.gsfc.nasa.gov.
 

HTML non-Standardisation and non-determinism

A browser is not bound to interpret HTML in the same way as every other browser. Display formats can differ.

A browser's other services are not defined by the standard protocol at all: history, bookmarks, etc. are part of browser individuality and market edge.


Last modified: Tue Mar 30 11:19:30 EST 1999
Queries to : infs2052@iwaki.anu.edu.au