Australian National University
Faculty of Engineering and Information Technology
INFS2052 Internet and Intranet Information Systems
Laboratory/tutorial class week 7
MIME, HTML and HTTP character encoding

Mode of tutorial class

The suggested mode of the tutorial class is that the tutor will divide the group into 4 to 6 teams. Each team will have copies of the following standards documents and tables:

   RFC 2045   MIME message formats
   RFC 2046   MIME media types
   RFC 2068   HTTP/1.1
   RFC 1738   syntax of URLs
   table of Latin-1 printable characters (from the HTML 3.2 standard)
   table of HTML 3.2 special character names (from the HTML 3.2 standard)
   table of hexadecimal, decimal and binary values for 0..255 decimal

The class will work, individually and in teams, through a number of problems on the representation and encoding of data in the formats used by MIME, HTML and HTTP. The second part of the class involves more team-based investigation of the standards supplied.

Introduction

Different encodings of data are used in the various open standards that underpin the protocols studied so far in the course. This tutorial has two purposes: to give students practice with these encodings, and to introduce the use of the standards documents themselves. The tutor may give a short tutorial on the encodings before these problems are tackled.

Part 1 - Translation

0. Identify in the MIME document (RFC 2045) the rules for the (1) quoted-printable and (2) base64 content-transfer-encodings (CTEs). Discuss these rules to ensure that all members of the team understand them. The object of these exercises is not speed; it is understanding by all members of the team.

1. Translate from quoted-printable to plain text, Latin-1 character set:
   a) -- text suppressed in the on-line version; come to the class --
   b) -- text suppressed in the on-line version; come to the class --

2. Translate from base64 to plain text:
   a) -- text suppressed in the on-line version; come to the class --
   b) -- text suppressed in the on-line version; come to the class --

3. Show how the plain-text results of questions 1(a) and 1(b) would be represented within an HTML document.

4. How would these texts be represented if they were included in an HTML query string, e.g. by being specified as the value of a HIDDEN field in a FORM with METHOD=GET?

5. What is the representation of the Registered symbol in Latin-1 (it appears midway down column 6 of the Latin-1 table) in:
   (a) 8-bit hex notation?
   (b) MIME quoted-printable?
   (c) MIME base64?
   (d) HTML - answer (1)?
   (e) HTML - answer (2)?
   (f) an HTTP query string?

Part 2 - Testing

The tutor will hand out some mystery translations to the teams. Once a team has decided on its answer, any member of the team, chosen by the tutor, must demonstrate the method used to find it.

Part 3 - MIME media types

1. As a team, create a list of all the media types and subtypes in the MIME media types RFC (RFC 2046). Add any media types that you know of from other standards or other sources. Present the list to the class and discuss any discrepancies. (Hint: what is the apparent status of text/http? Of text/html?) How would you find updated information?

2. Discuss the use of external-body as a subtype of the message media type (message/external-body). How does it operate? What advantages might it have for message senders? For mail message receivers?

3. Discuss the implications of content-transfer-encoding for the media types image/gif and text/html. Is 7-bit adequate for HTML?

Part 4 - Discussion questions

Using the RFCs and other materials supplied, the team has to find answers to the following questions where possible, for presentation to the class. For these answers you should be able to cite "chapter and verse" (i.e. the exact sections in the standards).

1. Are 8-bit characters allowed in URLs according to HTTP? According to the URL standard?

2. Why does HTML use a terminating ";" in its special character encodings, while MIME quoted-printable does not?

3. Why does MIME not have Content-Encoding, while HTTP does?

4. Why does MIME have Content-Transfer-Encoding, while HTTP forbids servers to send CTE-encoded data to clients?
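The following Python sketch can help check hand translations for Part 1. It uses only standard-library modules: quopri, base64, html.entities and urllib.parse. It works under several assumptions. All text is taken to be Latin-1 (ISO 8859-1), as question 1 specifies. A form submitted with METHOD=GET is assumed to send its values form-urlencoded in Latin-1. Python's HTML 4 entity names are assumed to match the HTML 3.2 Latin-1 table for code points 160 to 255. The function names (qp_to_text, base64_to_text, to_html, to_query_value, representations) are hypothetical helpers for illustration and do not come from any of the standards. The demonstration uses é (e acute) rather than the Registered symbol, so question 5 is still left to work out from the RFCs and tables.

```python
import base64
import html.entities
import quopri
import urllib.parse

# Assumption: all text is Latin-1 (ISO 8859-1), one byte per character.
CHARSET = "latin-1"


def qp_to_text(qp: str) -> str:
    """Decode quoted-printable (RFC 2045, section 6.7) into Latin-1 text."""
    return quopri.decodestring(qp.encode("ascii")).decode(CHARSET)


def base64_to_text(b64: str) -> str:
    """Decode base64 (RFC 2045, section 6.8) into Latin-1 text."""
    return base64.b64decode(b64).decode(CHARSET)


def to_html(text: str) -> str:
    """Escape markup characters and write non-ASCII characters as entities.

    Python's entity table follows HTML 4; for code points 160-255 its names
    are assumed to match the HTML 3.2 Latin-1 table handed out in class.
    """
    out = []
    for ch in text:
        code = ord(ch)
        name = html.entities.codepoint2name.get(code)
        if ch in '&<>"':
            out.append(f"&{name};")  # &amp; &lt; &gt; &quot;
        elif code > 126:
            # Prefer a named entity; fall back to a numeric reference.
            out.append(f"&{name};" if name else f"&#{code};")
        else:
            out.append(ch)  # printable ASCII passes through unchanged
    return "".join(out)


def to_query_value(text: str) -> str:
    """Form-urlencode a value as a METHOD=GET form would (assumed Latin-1)."""
    return urllib.parse.quote_plus(text, encoding=CHARSET)


def representations(ch: str) -> dict:
    """Show one Latin-1 character in each notation asked about in Part 1."""
    raw = ch.encode(CHARSET)
    name = html.entities.codepoint2name.get(ord(ch))
    return {
        "hex": raw.hex().upper(),
        "quoted-printable": quopri.encodestring(raw).decode("ascii"),
        "base64": base64.b64encode(raw).decode("ascii"),
        "html-named": f"&{name};" if name else None,
        "html-numeric": f"&#{ord(ch)};",
        "query-string": urllib.parse.quote_plus(ch, encoding=CHARSET),
    }


if __name__ == "__main__":
    print(qp_to_text("caf=E9"))                 # café
    print(base64_to_text("Y2Fm6Q=="))           # café
    print(to_html("caf\u00e9 & <b>"))           # caf&eacute; &amp; &lt;b&gt;
    print(to_query_value("caf\u00e9 au lait"))  # caf%E9+au+lait
    for label, value in representations("\u00e9").items():
        print(f"{label:>16}: {value}")
```

For é, representations() gives E9, =E9, 6Q==, &eacute;, &#233; and %E9. Once question 5 has been answered by hand, you can call it on the Registered symbol to check parts (a) to (f). The sketch uses quote_plus rather than quote for query values because form submission encodes a space as "+" rather than "%20".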