Early information transfer was in the form of
Distinguish client pull from server push.
FTP and the applications built on it (ftp) allowed a user to browse remote files, retrieve them, save them locally and view them.
Hypertext existed in text form before computers:
Hypertext on computers added
Some successes, but:
Used more once CD-ROM technology became widespread - a larger information base in one package (encyclopedia, technical manual, standards, specialised government document collections - e.g. Health ROM)
Networked computers and hypertext married well - timing was important
The successful HTML/HTTP system was an open system
HTML is the HyperText Markup Language defined and adopted by the World Wide Web.
HTML is a set of markup tags embedded as readable, text-editable tags in the text of a document.
HTML now contains a number of features that support networked hypertext, but it can also be used within a local system.
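As a rough illustration of "markup tags embedded in the text", the sketch below feeds a tiny page to Python's standard html.parser module and lists the tags and link targets it finds. The sample page, its title and the link target notes.html#intro are invented for this example, and the names TagCollector and collect are hypothetical helpers, not part of any standard.

```python
from html.parser import HTMLParser

# Hypothetical sample page: the title, text and link target are made up.
SAMPLE = """<HTML>
<HEAD><TITLE>Sample page</TITLE></HEAD>
<BODY>
<H1>Welcome</H1>
<P>See the <A HREF="notes.html#intro">lecture notes</A>.</P>
</BODY>
</HTML>"""


class TagCollector(HTMLParser):
    """Record every start tag and every link target in a document."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        # html.parser passes tag and attribute names in lower case.
        self.tags.append(tag)
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)


def collect(document):
    """Return (list of start tags, list of HREF values) for a document."""
    parser = TagCollector()
    parser.feed(document)
    parser.close()
    return parser.tags, parser.links


if __name__ == "__main__":
    tags, links = collect(SAMPLE)
    print(tags)
    print(links)
```

The document itself stays ordinary text that can be edited by hand. The only networked part is the value of HREF, which also works unchanged if the page and its target sit on a local disk.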
HTML is the subject of the rest of this lecture and is explored further in the lab classes. It is best learnt from descriptive documents (see lab notes and information resources page for this unit) - this lecture is about context and a view of the technology from the side, not a full technical introduction.
HTTP is the protocol which delivers documents from servers to clients.
HTTP is a simple protocol (originally very simple) that also supports services for efficient implementation: caching and proxies in particular.
HTTP is the subject of further lectures and a later lab class.
An example of an HTML document from Tanenbaum illustrates some of the features:
(Tanenbaum fig 7-60 A sample scenario for obtaining a Web page)
The HTML document is the part that lies between <HEAD><TITLE>... and the end.
This also illustrates the HTTP protocol, showing that any program that issues a properly formatted HTTP GET request to a server will receive an HTTP reply - in this case a document with a leading header and meta-information (the part starting with HTTP/1.0 200 Document...)
The C:, T: and S: labels are part of the diagram, not part of the protocol.
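The point that any program can speak HTTP can be sketched in a few lines of Python. This is a minimal sketch, not a full client: the function names build_get_request, split_reply and fetch are hypothetical, the reply in the usage note is invented, and the code assumes a well-formed HTTP/1.0 reply whose status line has a version, a code and a reason phrase.

```python
import socket


def build_get_request(path, host=None):
    """Return the bytes of a minimal HTTP/1.0 GET request."""
    lines = ["GET {} HTTP/1.0".format(path)]
    if host is not None:
        # Optional in HTTP/1.0, but many servers expect it.
        lines.append("Host: {}".format(host))
    # An empty line marks the end of the request header.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def split_reply(reply):
    """Split a raw reply into (version, code, reason, headers, body)."""
    head, _, body = reply.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return version, int(code), reason, headers, body


def fetch(host, path="/", port=80):
    """Send one GET request and read until the server closes the connection."""
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(build_get_request(path, host))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return split_reply(b"".join(chunks))
```

For example, split_reply(b"HTTP/1.0 200 OK\r\nContent-Type: text/html\r\n\r\n<HTML></HTML>") separates the status line and header (the meta-information) from the document that follows the blank line. Only fetch touches the network; the other two functions work on plain bytes.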
Significant elements in this document include
The URL - uniform resource locator - is central to making HTML useful as a networked hypertext language.
The browser - and HTTP - interpret URLs as references to entities - resources - on the local filesystem or anywhere on the Internet.
The form of a URL is defined in RFC 1738.
A URL is either
Either form may contain a trailing construct in the form
"#" identifier
(the label may be more complex in form than an identifier, but there is no simple name for it in the standard).
is a pathname in the form
Examples:
"fred_nurk#work"
"/fred"
names a scheme, a host computer, and possibly a port, as well as a path:
scheme "://" host [":" port] [absolute-pathname] ["#" identifier]
The port number identifies where the server listens - port numbers are a world-wide convention for services (e.g. the mail server listens on port 25). For HTTP the default port is 80.
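The two URL forms can be explored with Python's standard urllib.parse module. The sketch below is an illustration under stated assumptions: the host www.example.org, the base document URL and the helper names describe and resolve are invented here, and the default port table lists only HTTP's port 80 as given above.

```python
from urllib.parse import urljoin, urlsplit

# Only the HTTP default is taken from the text above.
DEFAULT_PORTS = {"http": 80}

# Hypothetical URL of the document in which a pathname-only URL appears.
BASE = "http://www.example.org/staff/index.html"


def describe(url):
    """Break a full URL into scheme, host, port, path and identifier."""
    parts = urlsplit(url)
    port = parts.port
    if port is None:
        # No explicit port: fall back to the scheme's default, if known.
        port = DEFAULT_PORTS.get(parts.scheme)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "port": port,
        "path": parts.path,
        "identifier": parts.fragment,
    }


def resolve(reference, base=BASE):
    """Turn a pathname-only URL into a full URL relative to a base document."""
    return urljoin(base, reference)


if __name__ == "__main__":
    print(describe("http://www.example.org/docs/index.html#intro"))
    print(resolve("fred_nurk#work"))
    print(resolve("/fred"))
```

A pathname without a leading "/" is resolved against the directory of the base document, while one with a leading "/" replaces the whole path and keeps only the scheme and host of the base.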
The schemes recognised in URLs are defined in RFC 1738.
ftp        File Transfer protocol
http       Hypertext Transfer Protocol
gopher     The Gopher protocol
mailto     Electronic mail address
news       USENET news
nntp       USENET news using NNTP access
telnet     Reference to interactive sessions
wais       Wide Area Information Servers
file       Host-specific file names
prospero   Prospero Directory Service
from RFC 1738
A pathname need not be that of a file; it can name any resource.
In particular, embedding special punctuation into the pathname is used to give controlling parameters to search engines and other services (more later).
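One common convention for such parameters, assumed here rather than taken from these notes, is a "?" followed by name=value pairs joined with "&". The sketch below builds and reads back such a URL with Python's urllib.parse; the host search.example.org, the parameter names q and max, and the helper names are all invented for illustration.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_search_url(base, params):
    """Append parameters to a URL as ?name=value&name=value."""
    # urlencode escapes characters that are not safe in a URL,
    # e.g. a space becomes "+".
    return base + "?" + urlencode(params)


def read_search_part(url):
    """Recover the parameters a service would see in the URL."""
    return parse_qs(urlsplit(url).query)


if __name__ == "__main__":
    url = build_search_url(
        "http://search.example.org/find",
        {"q": "hypertext markup", "max": "10"},
    )
    print(url)
    print(read_search_part(url))
```

Each value comes back as a list because the same name may legitimately appear more than once in a search part.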
Example URLs from RFC 2151
(Kessler & Shepard, RFC 2151, Internet & TCP/IP Tools & Utilities, Informational, June 1997, page 34)
file://host/directory/file-name
Identifies a specific file. E.g., the file htmlasst in the edu
directory at host ftp.cs.da would be denoted, using the full URL
form: <URL:file://ftp.cs.da/edu/htmlasst>.
ftp://user:password@host:port/directory/file-name
Identifies an FTP site. E.g.:
ftp://ftp.eff.org/pub/EFF/Policy/Crypto/*.
gopher://host:port/gopher-path
Identifies a Gopher site and menu path; a "00" at the start of
the path indicates a directory and "11" indicates a file. E.g.:
gopher://info.umd.edu:901/00/info/Government/Factbook92.
http://host:port/directory/file-name?searchpart
Identifies a WWW server location. E.g.:
http://info.isoc.org/home.html.
mailto:e-mail_address
Identifies an individual's Internet mail address. E.g.:
mailto:s.shepard@hill.com.
telnet://user:password@host:port/
Identifies a TELNET location (the trailing "/" is optional).
E.g.: telnet://envnet:henniker@envnet.gsfc.nasa.gov.
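The user, password, host and port fields in these forms can be picked apart in the same way. A minimal sketch using Python's urllib.parse, applied to two of the example URLs quoted above; the helper name login_parts is hypothetical.

```python
from urllib.parse import urlsplit


def login_parts(url):
    """Return (scheme, user, password, host, port); missing parts are None."""
    parts = urlsplit(url)
    return (
        parts.scheme,
        parts.username,
        parts.password,
        parts.hostname,
        parts.port,
    )


if __name__ == "__main__":
    print(login_parts("telnet://envnet:henniker@envnet.gsfc.nasa.gov"))
    # mailto has no "//" part, so the address is carried in the path.
    print(urlsplit("mailto:s.shepard@hill.com").path)
```

Note that a password written into a URL is visible to anyone who can read the document, which is worth keeping in mind before using the user:password form.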
A browser is not bound to display HTML in the same way as any other browser. Display formats can differ.
A browser's other services are not defined by the standard protocol at all: history, bookmarks etc. are part of browser individuality and market edge.
Last modified: Tue Mar 30 11:19:30 EST 1999