Because elements are nested, web pages have a hierarchical structure.
Terminology:
tree
node
root leaf
parent child
ancestor descendant
sibling
<!DOCTYPE html>
<html>
<head>
<title>A simple document</title>
</head>
<body>
<p>
Some words.
</p>
<p>
More words
<em>and emphasised words</em>
and final words.
</p>
</body>
</html>
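As a rough illustration of the terminology, the same document can be annotated with comments naming the tree relationships. The comments are additions for this sketch only; they are not part of the original example.

```html
<!DOCTYPE html>
<html>                <!-- root element: every other element is its descendant -->
<head>                <!-- child of html, sibling of body -->
<title>A simple document</title>   <!-- child of head; its text is a leaf node -->
</head>
<body>                <!-- parent of both p elements -->
<p>Some words.</p>    <!-- the two p elements are siblings -->
<p>
More words
<em>and emphasised words</em>      <!-- child of p; body and html are its ancestors -->
and final words.
</p>
</body>
</html>
```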
You should ensure your HTML is well-formed.
lang attribute (it's a global attribute).
alt attribute (e.g. <img>).
<p> elements cannot contain other <p> elements.
<p> elements cannot contain <ul> or <ol> elements.
<ul> and <ol> elements can only contain zero, one or more <li> elements.
Nesting must be done properly.
<p>This is <em>correct</em></p>
<p>This is <em>incorrect</p></em>
Otherwise there is difficulty building the tree.
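The attribute and content rules listed above might look like this in practice. This is a minimal sketch: the file name wombat.jpg and the text content are hypothetical, chosen only for illustration.

```html
<!DOCTYPE html>
<html lang="en">  <!-- lang is a global attribute, so it can go on any element -->
<head>
<title>Well-formed example</title>
</head>
<body>
<!-- an img element carries an alt attribute describing the image -->
<img src="wombat.jpg" alt="A common wombat">

<!-- a list cannot sit inside a paragraph, so it follows the paragraph instead -->
<p>Some marsupials:</p>
<ul>
<li>wombats</li>
<li>koalas</li>
</ul>
</body>
</html>
```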
A web developer writes HTML to produce a nested list.
<ul>
<li>badgers</li>
<li>wombats</li>
<ul>
<li>common wombat</li>
<li>hairy-nosed wombat</li>
</ul>
<li>squirrels</li>
</ul>
Which rule is broken?
The inner <ul> is a direct child of the outer <ul>, but <ul> elements can only contain <li> elements. In the corrected version, the inner list sits inside the wombats <li>:
<ul>
<li>badgers</li>
<li>wombats
<ul>
<li>common wombat</li>
<li>hairy-nosed wombat</li>
</ul>
</li>
<li>squirrels</li>
</ul>
In fact, HTML5 has two sets of syntax rules!
XML syntax uses a very strict set of rules.
HTML syntax allows you to break the rules…
in certain cases.
In XML syntax, e.g.:
tags must be in lowercase;
each start tag must have an end tag or, for void elements, an extra slash;
attribute values must be quoted;
…and so on.
Ironically, it's easier to use the strict XML syntax than the HTML syntax!
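To make the contrast concrete, here is the same fragment written twice: once leaning on the leniency of the HTML syntax, and once following the XML-style rules listed above. The file name wombat.jpg is a hypothetical placeholder.

```html
<!-- HTML syntax: uppercase tags, omitted </p> end tags and
     unquoted attribute values are all tolerated -->
<P>First paragraph
<P>Second paragraph<BR>
<IMG SRC=wombat.jpg ALT=wombat>

<!-- XML syntax: lowercase tags, every element closed,
     void elements given a slash, attribute values quoted -->
<p>First paragraph</p>
<p>Second paragraph<br /></p>
<img src="wombat.jpg" alt="wombat" />
```

The second form has fewer special cases to remember, which is one reason the strict rules can be easier to follow.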
What does your browser do if your web page is not well-formed?
Browsers (almost) never give error messages.
They do their best to build the tree and display the page.
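As an illustration, consider the mis-nested example from earlier. The second half of this sketch shows the structure a parser following the HTML5 parsing rules is expected to build, written back out as markup; browser developer tools can be used to confirm what a particular browser actually does.

```html
<!-- As written by the author (mis-nested) -->
<p>This is <em>incorrect</p></em>

<!-- The tree an HTML5 parser is expected to build, serialised as markup -->
<p>This is <em>incorrect</em></p>
```

No error is reported to the user; the repair happens silently.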
If browsers don't give error messages,
how do you know if your page is well-formed or not?
You can validate your page:
https://validator.nu/
https://validator.w3.org/
A character encoding is the way the numbers that represent characters (their code points) are converted to bytes for storage and transmission.
ASCII | 7 bits for every character |
UTF-32 | 4 bytes for every character |
UTF-8 | 1 byte for ASCII characters and 2, 3 or 4 bytes for others |
Browsers and the HTML validator need to know which character encoding was used to create your web page.
Find out which encoding your editor is using and specify that character encoding in a <meta> element in the <head> of your HTML, e.g.:
<meta charset="utf-8" />
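In context, the declaration might sit like this. This is a minimal sketch that assumes the file is saved as UTF-8; the title and text are placeholders. Placing the <meta> element first in the <head> keeps it near the start of the file, where browsers look for it.

```html
<!DOCTYPE html>
<html lang="en">
<head>
<!-- declare the encoding before any other content in the head -->
<meta charset="utf-8" />
<title>A simple document</title>
</head>
<body>
<p>Some words.</p>
</body>
</html>
```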
Content-Type HTTP header.
<meta> element.
charset in HTTP responses
Since version 55 of Chrome, Google doesn't even look at the charset tag anymore. It assumes utf-8, no matter what you have written.
Other browsers are following suit.
&lt; | < |
&gt; | > |
&quot; | " |
&amp; | & |
&Aacute; | Á |
&aacute; | á |
&euro; | € |
&frac12; | ½ |
charset is UTF-8), then this is less relevant.
You can include them by name (if they have one), by hexadecimal number or by decimal number, e.g.
&aacute; | &#xE1; | &#225; |
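A short page using these references might look as follows. This is a sketch; the sentence content is made up for illustration.

```html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8" />
<title>Character references</title>
</head>
<body>
<!-- &lt; and &amp; stop the browser treating the text as markup -->
<p>Write &lt;em&gt; for emphasis, and &amp;amp; for an ampersand.</p>
<!-- three spellings of the same character: by name, hexadecimal and decimal -->
<p>&aacute; &#xE1; &#225;</p>
<!-- with a UTF-8 charset, the character can also be typed directly -->
<p>á</p>
</body>
</html>
```

The first paragraph displays the literal text `<em>` and `&amp;`, and the last two paragraphs display the same letter.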