HTML, XML and XHTML
Derek Bridge
Department of Computer Science,
University College Cork
Aims:
- to understand the need for standards
- to appreciate the HTML 4.01 standard
- to see the motivation for XML
- to know the differences between HTML and XHTML
Web browsers
We can see the need for standards
Invention of the world wide web
Between 1989 and 1991, Tim Berners-Lee, working at CERN, invented
- the idea of the world wide web (to some extent)
WWW = hypermedia + the Internet
- HTTP (how web programs communicate)
- URLs (the locations of documents)
- HTML (a markup language)
- the first web browser (WorldWideWeb, renamed Nexus)
Tim Berners-Lee photo from W3C
Large hadron collider photo from CERN
A potted history of HTML standards
| Date | HTML standards | Web browsers |
| Early 1991 | | The first web browser |
| Late 1991 | The first published description of HTML | |
| 1992 | | Staff & students at NCSA develop the Mosaic browser |
| 1993 | The first proper specification of HTML (based on SGML) is published | The Netscape Communications Corporation is formed |
The World Wide Web Consortium (W3C)
- In 1994, Berners-Lee left CERN and founded the World Wide Web Consortium (W3C)
- W3C member organisations (e.g. companies) work together to develop web standards
- W3C's mission:
To lead the World Wide Web to its full potential by developing protocols and
guidelines that ensure long-term growth for the Web.
W3C logo from W3C
A potted history of HTML standards
| Date | HTML standards | Web browsers |
| 1994 | The W3C is formed | Netscape release the Netscape Navigator browser and the web 'takes off' |
| 1995 | The HTML 2.0 standard is published | But Netscape Navigator is 'ahead' of the standard |
| 1995 | The W3C starts work on HTML 3.0 and 3.1 | The first version of Internet Explorer is released |
| | | The browser wars begin |
A potted history of HTML standards
| Date | HTML standards | Web browsers |
| Early 1997 | The W3C publishes the HTML 3.2 standard | Browsers diverge widely from the standard |
| Late 1997 | The W3C publishes HTML 4.0, attempting to 'rein in' the browsers. (Minor changes are made in 1998 and 1999, the last version being HTML 4.01.) | |
| 1998 | The Web Standards Project starts | |
The Web Standards Project (WaSP)
- In 1998, the Web Standards Project was founded
- WaSP's mission:
The Web Standards Project is a grassroots coalition fighting for standards
which ensure simple, affordable access to web technologies for all
- They wanted to persuade suppliers of browsers to adhere to W3C recommendations
- Now they want to persuade authors to use the standards
A potted history of HTML standards
| Date | HTML standards | Web browsers |
| 1998 | The XML standard is published | Internet Explorer triumphs in the browser wars. (But its market share has declined since 2004) |
| 2000 | The XHTML 1.0 standard is published. | Browser support for the HTML & CSS standards grows throughout the period |
| 2001 | The XHTML 1.1 standard is published. | |
HTML 4.01
- HTML 4.01 tries to define a clear standard.
- It deprecates several non-standard extensions (e.g. the font tag)
deprecate:
to discourage use of a feature; the feature is retained in the
short-term for backwards compatibility but is likely to be
discontinued in the future
HTML 4.01
HTML 4.01 defines three variants:
- Strict HTML 4.01
  - Deprecates and (in some sense) disallows most (but not all) presentational markup.
    Authors are expected to use CSS instead
- Transitional HTML 4.01
  - Allows presentational markup in recognition of poor support for CSS at the time
    and in recognition of the large quantities of legacy web pages
- Frameset HTML 4.01
  - Like Transitional, but supporting frameset in place of body. (Ignore!)
Telling a browser which variant you are using
At the very start of your document, include one of the following:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Frameset//EN"
"http://www.w3.org/TR/html4/frameset.dtd">
Standards mode and quirks mode
- Older browsers may have deviated from the standards (bugs or
deliberate extensions)
- Authors may have written web pages in ways that cope with these bugs or exploit
these extensions
- How can newer browsers (which mostly do obey the standards)
successfully display such web pages?
- If there is a complete DOCTYPE declaration, the browser goes into
standards mode, i.e. it displays the page in the way the standard
specifies
- But if there is no DOCTYPE or an incomplete one, the browser goes into
quirks mode, i.e. it uses the old algorithms for displaying the page.
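The decision described above can be sketched in a few lines of Python. This is only a rough illustration: the function name rendering_mode and the regular expression are assumptions made for this example, and real browsers apply more detailed rules than "complete DOCTYPE or not".

```python
import re

# Assumption for this sketch: a "complete" DOCTYPE is one that has both a
# public identifier and a system identifier (the DTD URL), as in the
# declarations shown on the earlier slide.
COMPLETE_DOCTYPE = re.compile(
    r'\s*<!DOCTYPE\s+html\s+PUBLIC\s+"[^"]+"\s+"[^"]+"\s*>',
    re.IGNORECASE,
)


def rendering_mode(document):
    """Return 'standards' if the document starts with a complete DOCTYPE,
    otherwise 'quirks'. A simplification, not any browser's real algorithm."""
    return "standards" if COMPLETE_DOCTYPE.match(document) else "quirks"


STRICT_PAGE = (
    '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"\n'
    '"http://www.w3.org/TR/html4/strict.dtd">\n'
    "<html></html>"
)
# Incomplete: the DTD URL is missing.
INCOMPLETE_PAGE = (
    '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">\n'
    "<html></html>"
)
# No DOCTYPE at all.
BARE_PAGE = "<html></html>"
```

Under these assumptions, rendering_mode(STRICT_PAGE) gives "standards", while the incomplete and bare pages both give "quirks".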
The desire for extensibility
- HTML offers only a limited set of tags.
- But organisations need new markup languages for new purposes, e.g.
- Mathematicians want markup languages for mathematical expressions
- Chemists want markup languages for chemical formulae
- Composers want them for presenting musical scores
- Libraries want them for book catalogs; businesses for product inventories;
physicians for medical records...
- There was a risk of an avalanche of nonstandard markup languages
- There was a need for a standardised framework for creating new markup languages
Extensible Markup Language (XML)
- There was already such a framework, the Standard Generalised Markup Language
(SGML), but it was too complex
- So the W3C devised XML, based on SGML
- XML is not a markup language
- It is a metalanguage:
A language (or 'framework') for defining other languages
Example: a markup language for addresses
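As a rough illustration of what such a language might look like, here is a sketch in Python using the standard library's XML parser. The tag names (address, name, department, institution) are invented for this example; they are not part of any standard, and a real address language would be agreed by the organisations that use it.

```python
import xml.etree.ElementTree as ET

# A hypothetical address document. The tag names are assumptions made for
# this sketch; XML itself does not define them.
ADDRESS = """
<address>
  <name>Derek Bridge</name>
  <department>Department of Computer Science</department>
  <institution>University College Cork</institution>
</address>
"""


def address_parts(xml_text):
    """Parse the address and return (tag, text) pairs in document order."""
    root = ET.fromstring(xml_text)
    return [(child.tag, child.text) for child in root]
```

Because the markup says what each piece of text is, a program can pick out the institution (or any other part) without guessing from line position or punctuation.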
Example: a markup language for recipes
- Suppose chefs want to exchange recipes like this one:
Zuppa Inglese
- Class exercise: What tags might they use?
The Extensible HyperText Markup Language (XHTML)
- Markup languages defined using XML are increasingly the way that
organisations share data
- But where does that leave 'good old web pages'?
- HTML is not based on XML
- The W3C defined XHTML:
- XHTML is a markup language,
defined using XML, that is as close to HTML 4.01 as possible
- Like HTML 4.01, it has three variants: strict, transitional and frameset
Telling a browser which variant you are using
At the very start of your document, include one of the following:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">
In fact, this is what you should write
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
...
</html>
(And similarly for Transitional and Frameset)
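Since XHTML is defined using XML, a document that starts this way can be read by an ordinary XML parser. The sketch below uses Python's standard library to illustrate this. The head and body content and the function name root_info are assumptions added for the example; only the prologue and the html start tag come from the slide.

```python
import xml.etree.ElementTree as ET

# Prologue and root element as on the slide; the head and body are
# hypothetical filler so that there is something to parse.
DOCUMENT = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html
  PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <head><title>Example</title></head>
  <body><p>Hello</p></body>
</html>
"""


def root_info(document):
    """Return (namespace, element name, lang attribute) of the root element."""
    root = ET.fromstring(document)
    # ElementTree writes namespaced tags as "{namespace}name".
    namespace, _, local_name = root.tag[1:].partition("}")
    return namespace, local_name, root.get("lang")
```

Here root_info(DOCUMENT) reports the XHTML namespace from the xmlns attribute, the element name html, and the language en.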
What are the differences between HTML and XHTML?
In XHTML,
- elements must be properly nested
- end tags must be present (but you can use the shorthand for empty elements)
- all tags and attributes must be in lowercase
- all attribute values must be in double quotes, and
- one or two other things!
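Several of these rules are simply XML's well-formedness rules, so an XML parser can check them. The sketch below is illustrative only: the function name is_well_formed and the sample fragments are made up for this example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment):
    """Return True if the fragment parses as XML, False otherwise."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


# Hypothetical fragments, one per rule.
XHTML_STYLE = '<p class="intro">One<br /><em>two</em></p>'  # obeys the rules
BAD_NESTING = "<p><em>One</p></em>"       # elements overlap
MISSING_END_TAG = "<p>One<br>two</p>"     # br is never closed
UNQUOTED_VALUE = "<p class=intro>One</p>"  # attribute value not quoted
UPPERCASE_TAGS = "<P>One</P>"             # well-formed XML, but not XHTML
```

The parser accepts XHTML_STYLE and rejects the next three. It also accepts UPPERCASE_TAGS: the lowercase rule comes from the XHTML definition itself, not from XML well-formedness, so it needs a validator rather than a plain parser.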
What will we use?
- We've been so self-disciplined with our HTML that it hardly matters!
- Older browsers are a little uneven in their support for XHTML
- But it is the future (maybe)
- Jonathan Hedley's HTML Tidy Tool can tidy HTML, and output XHTML:
http://infohound.net/tidy/
- So let's use Strict XHTML 1.0
- For assessments, your work will be tested on Firefox, version 2.0:
http://www.mozilla.com/