URLs
Derek Bridge
Department of Computer Science,
University College Cork
Aims:
- to know the parts of a simple URL
- to know the difference between absolute and relative URLs
- to know what kind of URL to use for external and internal links
Without hyperlinks, it wouldn't be a web!
- A link is a connection from one Web resource to another
- One way to link to a resource is to say where it is located (its 'address')
- On the Web, these 'addresses' are called Uniform Resource Locators (URLs)
- <a href="url">text</a>
- Go to <a href="http://www.cs.ucc.ie/~dgb/courses/pwd.html">CS1109</a>
URLs
- A URL comprises a scheme and the 'rest', separated by a colon or by a colon and two forward slashes
- The scheme tells the client (e.g. your browser) what to do with the rest of the URL
- Some schemes:
  - http, e.g. http://www.cs.ucc.ie/~dgb/courses/pwd.html
  - https, e.g. https://uccshop.ie/checkout/onepage/
  - file, e.g. file:///Z://users/2016/dtab6/public_html/mypage.html
  - mailto, e.g. mailto:d.bridge@cs.ucc.ie
- Question: If you omit the scheme, what does your browser assume?
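To see the scheme being separated from the 'rest', here is a minimal sketch using Python's standard urllib.parse module on the example URLs from the list of schemes. It only illustrates where the split happens; it does not show what a browser does with each scheme.

```python
from urllib.parse import urlsplit

# The example URLs from the list of schemes.
examples = [
    "http://www.cs.ucc.ie/~dgb/courses/pwd.html",
    "https://uccshop.ie/checkout/onepage/",
    "file:///Z://users/2016/dtab6/public_html/mypage.html",
    "mailto:d.bridge@cs.ucc.ie",
]

# urlsplit treats everything before the first colon as the scheme.
schemes = [urlsplit(url).scheme for url in examples]

for url, scheme in zip(examples, schemes):
    print(f"{scheme:7} <- {url}")
```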
HTTP URLs
- In general, the format is:
http://user:password@hostname-or-IP-address:port/pathname?query#fragment
- We will focus on the main parts:
http://hostname-or-IP-address/pathname
- E.g.
http://www.cs.ucc.ie/~dgb/courses/pwd.html
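The parts of the general format can be picked apart with Python's urllib.parse. This is a sketch only: the first URL (user alice, password secret, host www.example.org, port 8080) is made up so that every part is present; the second is the example from these notes.

```python
from urllib.parse import urlsplit

# A made-up URL (an assumption for illustration) that uses every part.
full = urlsplit("http://alice:secret@www.example.org:8080/dirA/a.html?lang=en#top")

print(full.scheme)    # scheme
print(full.username)  # user
print(full.password)  # password
print(full.hostname)  # hostname-or-IP-address
print(full.port)      # port (as an integer)
print(full.path)      # pathname
print(full.query)     # query
print(full.fragment)  # fragment

# The example from the notes has only the main parts.
simple = urlsplit("http://www.cs.ucc.ie/~dgb/courses/pwd.html")
print(simple.hostname, simple.path)
```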
IP addresses
- Every device on a network has a hardware address (e.g. MAC address),
unique to that network (at least)
- But we need an address that is unique across the whole Internet
- Hence, every device that is connected to the Internet is assigned a unique
IP address (Simplification!)
- To send a message to a device you use its IP address
The Domain Name System (DNS)
- Numeric IP addresses are cumbersome for humans
- Hence, most computers (hosts) that are connected to the Internet also have one (or more) names (hostnames)
- E.g. www.cs.ucc.ie, cs1.ucc.ie, www.rte.ie
- DNS acts like a directory enquiries system: it automatically takes names and translates them into IP addresses
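A program can ask for the same name-to-address translation. The sketch below wraps Python's socket.gethostbyname, which hands the name to the operating system's resolver; the function name lookup is just a label chosen here. The real lookup is left commented out because it needs a network connection and the answer can change over time.

```python
import socket

def lookup(hostname: str) -> str:
    """Ask the system resolver (which normally uses DNS) for an IPv4 address."""
    return socket.gethostbyname(hostname)

# Needs a network connection, so it is left commented out:
# print(lookup("www.cs.ucc.ie"))

# A string that is already an IPv4 address is returned unchanged.
print(lookup("127.0.0.1"))
```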
Pathname
- Directories/folders are organised hierarchically
- A pathname is typically a sequence of directories/folders, ending with the file name
- E.g. /dirA/dirC/a.html
- Class exercise: Write down a pathname from root to b.gif
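The 'sequence of directories, ending with the file name' structure can be made visible with Python's pathlib. This is a small sketch using the example pathname /dirA/dirC/a.html; PurePosixPath only manipulates the text of the path and never touches a real file system.

```python
from pathlib import PurePosixPath

path = PurePosixPath("/dirA/dirC/a.html")

print(path.parts)   # the root, then each directory, then the file name
print(path.parent)  # the directory that contains the file
print(path.name)    # the file name on its own
```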
Two kinds of links, two kinds of URLs, but just three possibilities
| | Absolute URL | Relative URL |
|---|---|---|
| External link | hostname + absolute pathname | N/A |
| Internal link | absolute pathname | relative pathname |
Absolute URLs and relative URLs
- Absolute URLs use absolute pathnames: ones that give complete directions to the file, starting from the top of the file hierarchy (/)
- Relative URLs use relative pathnames: ones that give directions to the file, starting from the current document
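The three possibilities can be told apart mechanically. The following is a rough sketch, not a complete classifier: the function name classify is chosen here, and it assumes the href either has no scheme or is an http/https URL (it would mislabel something like a mailto: URL).

```python
from urllib.parse import urlsplit

def classify(href: str) -> str:
    """Say which of the three kinds of URL an href value is."""
    parts = urlsplit(href)
    if parts.netloc:                  # a hostname (or IP address) is present
        return "hostname + absolute pathname"
    if parts.path.startswith("/"):    # starts from the top of the hierarchy
        return "absolute pathname"
    return "relative pathname"        # starts from the current document

for href in ["http://www.rte.ie/news/index.html",
             "/dirA/dirC/dirD/c.html",
             "dirD/c.html",
             "../d.html"]:
    print(f"{href} -> {classify(href)}")
```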
Using an absolute URL for an external link
- E.g. <a href="http://www.rte.ie/news/index.html">RTE News</a>
- Since you are linking to a page on another computer, you must include:
  - the hostname (or IP address), e.g. www.rte.ie
  - an absolute pathname, e.g. /news/index.html
- These are also the URLs that you can see in (or type into) the Location box in your browser
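As a sketch of how the two required pieces fit together, the code below assembles the RTE News URL from its hostname and absolute pathname using urllib.parse.urlunsplit. The helper name external_url and the choice of the http scheme are assumptions made for this illustration.

```python
from urllib.parse import urlunsplit

def external_url(hostname: str, pathname: str) -> str:
    """Build an absolute URL from a hostname and an absolute pathname."""
    # The two empty strings are the query and the fragment, unused here.
    return urlunsplit(("http", hostname, pathname, "", ""))

url = external_url("www.rte.ie", "/news/index.html")
anchor = f'<a href="{url}">RTE News</a>'
print(anchor)
```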
Using an absolute URL for an internal link
- Since you are linking to a page on the same computer, you can omit the hostname/IP address
- But you include the absolute pathname of the file you're linking to (starting with /)
- E.g. suppose I am putting an anchor element into a.html that links to c.html:
<a href="/dirA/dirC/dirD/c.html">Click me</a>
- N.B. The forward slash at the start
- Question: You can include the hostname (or IP address). But why is it
better to omit it (as in these examples)?
- When you write an absolute URL in CS1109, yours will start with a forward slash, a tilde and your user id, which is shorthand for 'start from my public_html folder'
- E.g.
<a href="/~dtab6/images/birthday.jpg">My party!</a>
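When the hostname is omitted, the browser fills it in from the address of the page that contains the link. Python's urllib.parse.urljoin performs the same kind of resolution, so it can serve as a sketch of what happens. The host www.example.org is a placeholder: the notes do not say which server holds a.html.

```python
from urllib.parse import urljoin

# Address of the page containing the link (the host is a placeholder).
current_page = "http://www.example.org/dirA/dirC/a.html"

# The href starts with /, so the whole pathname of the current page is
# replaced, while the scheme and hostname are carried over.
resolved = urljoin(current_page, "/dirA/dirC/dirD/c.html")
print(resolved)
```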
Using a relative URL for an internal link
- In this case, you omit the hostname/IP address
- But you include the relative pathname of the file you're linking to, i.e. starting from the current document
- E.g. suppose I am putting an anchor element into a.html that links to c.html:
<a href="dirD/c.html">Click me</a>
- Recall that .. means go up one level to the parent directory/folder
- E.g. suppose I am putting an anchor element into a.html that links to d.html:
<a href="../d.html">Click me</a>
- N.B. No forward slash at the start
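Relative URLs are resolved against the directory of the current document. As a sketch, urllib.parse.urljoin reproduces this for the two examples in these notes; the host www.example.org is again a placeholder, and a.html is assumed to live at /dirA/dirC/a.html as in the earlier pathname example.

```python
from urllib.parse import urljoin

# Address of the page containing the links (the host is a placeholder).
current_page = "http://www.example.org/dirA/dirC/a.html"

# dirD/c.html: start in a.html's own directory, go down into dirD.
down = urljoin(current_page, "dirD/c.html")

# ../d.html: go up one level from a.html's directory, then find d.html.
up = urljoin(current_page, "../d.html")

print(down)
print(up)
```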
Class exercise
- You are editing a.html
- Using absolute URLs, write hyperlinks to b.html and b.gif
- Repeat, this time using relative URLs
Index files
- Omitting the file name (so the URL ends with a slash) gives the URL of a directory
- E.g. http://www.rte.ie/news/
- In this case, most servers look for a default file to display
- Most often, they look for a file called index.html, e.g. http://www.rte.ie/news/index.html
- If there is no such file, they'll probably just list the contents of the directory
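The default-file rule can be sketched in a few lines. This is a toy model, not how any particular server is implemented: the function name file_served is invented here, the default name index.html is only the most common choice, and real servers let administrators configure it.

```python
from urllib.parse import urlsplit

def file_served(url: str, default: str = "index.html") -> str:
    """Return the pathname a typical server would try to serve for a URL."""
    path = urlsplit(url).path or "/"   # no pathname at all means the root
    if path.endswith("/"):             # a directory URL: look for the default
        return path + default
    return path                        # a file URL: serve that file

print(file_served("http://www.rte.ie/news/"))
print(file_served("http://www.rte.ie/news/index.html"))
```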
File and folder names
- File names on Unix/Linux are case-sensitive; on Windows (and, by default, on Mac OS X) they are not
- It's best to use only lowercase, to avoid all problems
- Avoid space and punctuation characters in these names too
- Suppose your file is called my page.html (which has a space)
- URLs cannot contain spaces: they must be encoded as %20
- E.g. <a href="/my%20page.html">My page</a>
- The same applies to about 20 other characters too, e.g. <, >, @
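Rather than encoding such characters by hand, a program can do it. As a sketch, Python's urllib.parse.quote percent-encodes a pathname (leaving the / separators alone by default), and unquote reverses it.

```python
from urllib.parse import quote, unquote

# Encode the file name from the example above.
encoded = quote("/my page.html")
print(encoded)

# The other characters mentioned are encoded in the same way.
for ch in ["<", ">", "@"]:
    print(ch, "->", quote(ch))

# unquote turns the encoded form back into the original name.
print(unquote(encoded))
```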
Fragments
- We also know how to link to a particular point within a document
- E.g. at the bottom of the page, we can include a link back to the top
- How?