Introduction to HTTP
Derek Bridge
Department of Computer Science,
University College Cork
Aims:
- to know how URLs are structured
- to know the difference between absolute URLs and relative URLs
- to know how HTTP works
What makes the web tick
Diagram of web components from The TCP/IP Guide by Charles M. Kozierok
Web servers
Web server computers
- host web resources (text files, graphics files, programs,...)
- run web server software
The market leaders (www.netcraft.com):
Both free!
Uniform Resource Locators (URLs)
- IP addresses/hostnames are used to identify hosts
- URLs are used to locate resources (text files, graphics files, programs, ...)
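To make the structure of a URL concrete, here is a minimal sketch that splits one into its parts using Python's standard `urllib.parse` module. The URL is the one used as Example 1 in the embedded content section of these notes; the choice of Python is an assumption of this sketch, not part of the original notes.

```python
from urllib.parse import urlsplit

# URL taken from Example 1 in the embedded content section of these notes.
url = "http://www.cs.ucc.ie/cs1/mugshots-2007.html"
parts = urlsplit(url)

print(parts.scheme)    # the scheme: how to talk to the host
print(parts.hostname)  # the host that holds the resource
print(parts.port)      # None here, because the URL gives no explicit port
print(parts.path)      # the pathname of the resource on that host
```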
Schemes: HTTPS
- The HTTPS scheme is not a protocol
- HTTPS describes use of HTTP 'on top of' SSL/TLS 'on top of' TCP
- Secure Sockets Layer (SSL), now standardised as
Transport Layer Security (TLS), allows for:
- authentication of the identity of the server
- key exchange
- encryption/decryption using a key generated for this session
- SSL/TLS can also be used 'between', e.g., SMTP and TCP
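The layering described above can be sketched with Python's standard `socket` and `ssl` modules. The helper name `open_https_connection` is hypothetical and is defined but not called, so the sketch needs no network access. It only illustrates the order of the layers: TCP first, then TLS wrapped around it, with HTTP messages travelling over the wrapped socket.

```python
import socket
import ssl

def open_https_connection(hostname, port=443):
    """Hypothetical helper: open a TCP connection, then wrap it in TLS."""
    context = ssl.create_default_context()
    tcp_sock = socket.create_connection((hostname, port))  # the TCP layer
    # The TLS handshake happens here: the server is authenticated and
    # the two sides agree on keys for this session.
    return context.wrap_socket(tcp_sock, server_hostname=hostname)

# The default client context insists on authenticating the server.
context = ssl.create_default_context()
print(context.verify_mode == ssl.CERT_REQUIRED)
print(context.check_hostname)
```

Anything written to the socket returned by the helper would be encrypted before it reaches TCP, which is what 'HTTP on top of TLS on top of TCP' amounts to in practice.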
Pathnames
- Typically, a sequence of directories/folders, ending with the file name
- E.g.
/cs1/cs1102/labs.html
- Directories/folders are organised hierarchically
- The root directory/folder on a server is usually called
/
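As a small illustration of the hierarchy, the example pathname above can be taken apart with Python's standard `pathlib` module. This is only a sketch of how the pieces relate; web servers are free to map pathnames to resources in other ways.

```python
from pathlib import PurePosixPath

path = PurePosixPath("/cs1/cs1102/labs.html")

print(path.root)    # the root directory
print(path.parts)   # the root, then each directory, then the file name
print(path.parent)  # the directory that contains the file
print(path.name)    # the file name itself
```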
Class exercise
The current document is highlighted
- Give the absolute pathname for each of the following
- The current document
- Its base directory/folder
b.html
../a.html
../../dirB/a.html
../../dirA/dirA/a.html
- Give the relative pathname for
b.gif
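Answers to exercises of this kind can be checked mechanically. The sketch below uses `urljoin` from Python's standard library to resolve relative pathnames against a current document. The host `www.example.org` and the directory and file names are hypothetical and are deliberately not the ones in the exercise diagram.

```python
from urllib.parse import urljoin

# Hypothetical current document, not the one in the exercise.
base = "http://www.example.org/dirX/dirY/current.html"

for relative in ["b.html", "../a.html", "../../dirZ/a.html"]:
    # A plain name stays in the base directory; each ".." climbs one level.
    print(relative, "->", urljoin(base, relative))
```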
The HyperText Transfer Protocol (HTTP)
- HTTP uses the client-server model
- Servers listen on port 80 by default (port 443 for HTTPS)
- Transmission is (mostly) by TCP
Diagram of HTTP requests and responses from The TCP/IP Guide by Charles M. Kozierok
Requests and responses in more detail
- The user generates an HTTP request
- by typing the URL or by clicking on a link
- The client (browser)
- uses DNS to map the server hostname to its IP address (if necessary)
- establishes a TCP 'connection' with the server
- creates an HTTP request and sends it (using TCP)
- The web server
- receives the request
- takes action (e.g. locates the requested file, if it can)
- creates an HTTP response and sends it (using TCP)
- The browser
- receives the response
- takes action (e.g. displays the web page)
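The steps above can be sketched end to end on a single machine. The following is a toy illustration, not a real browser or production server: the class `HelloHandler` and the function `fetch` are hypothetical names, the server runs on the local machine on whatever free port the operating system offers, and the requested path `/index.html` is made up.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class HelloHandler(BaseHTTPRequestHandler):
    """Toy web server: answers every GET request with a tiny web page."""

    def do_GET(self):
        body = b"<html><body>Hello</body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demonstration quiet

def fetch(hostname, port, path):
    """Toy client: performs the browser's steps by hand."""
    # Map the server hostname to its IP address.
    ip_address = socket.gethostbyname(hostname)
    # Establish a TCP connection with the server.
    with socket.create_connection((ip_address, port)) as sock:
        # Create an HTTP request and send it over TCP.
        request = f"GET {path} HTTP/1.0\r\nHost: {hostname}\r\n\r\n"
        sock.sendall(request.encode("ascii"))
        # Receive the response until the server closes the connection.
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)

# Port 0 asks the operating system for any free port.
server = HTTPServer(("127.0.0.1", 0), HelloHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

response = fetch("localhost", server.server_port, "/index.html")
print(response.decode("ascii").splitlines()[0])  # the status line

server.shutdown()
server.server_close()
```

A real browser would go on to take action on the response, for example by rendering the page; here the sketch only prints the first line of what came back.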
Embedded content
- Suppose the web page contains embedded content (e.g. stylesheets, images)
- The server does not send all the content in one go
- The client receives the web page and then sends separate requests for the embedded content
- Example 1: http://www.cs.ucc.ie/cs1/mugshots-2007.html
- Example 2
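To see why several requests are needed, here is a minimal sketch of how a client might discover embedded content in a page it has just received. The class name `EmbeddedContentFinder`, the HTML fragment, and the file names are all made up for illustration; real browsers look for many more kinds of embedded content than images and stylesheets.

```python
from html.parser import HTMLParser

class EmbeddedContentFinder(HTMLParser):
    """Collects the URLs of images and stylesheets referenced by a page."""

    def __init__(self):
        super().__init__()
        self.resources = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img" and "src" in attrs:
            self.resources.append(attrs["src"])
        elif tag == "link" and attrs.get("rel") == "stylesheet" and "href" in attrs:
            self.resources.append(attrs["href"])

# Hypothetical web page with one stylesheet and two images.
page = """<html><head><link rel="stylesheet" href="style.css"></head>
<body><img src="alice.jpg"><img src="bob.jpg"></body></html>"""

finder = EmbeddedContentFinder()
finder.feed(page)

# Each entry would need its own HTTP request after the page itself arrives.
print(finder.resources)
```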
HTTP requests
- Request line (required): command (method), URL and HTTP version number
- Request header lines (largely optional): info about date, browser,...
- Request message body (optional): empty for most commands (methods)
Example HTTP request from The TCP/IP Guide by Charles M. Kozierok
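The three parts of a request can also be assembled by hand. The function `build_request` below is a hypothetical helper written for this illustration, and the `User-Agent` value is invented. The sketch assumes HTTP 1.1, where the `Host` header is one of the few header lines that is not optional.

```python
def build_request(method, path, host, headers=None, body=""):
    """Hypothetical helper: assemble the text of an HTTP/1.1 request."""
    lines = [f"{method} {path} HTTP/1.1"]   # request line
    lines.append(f"Host: {host}")           # required header in HTTP 1.1
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")    # further, optional header lines
    # A blank line separates the header lines from the message body.
    return "\r\n".join(lines) + "\r\n\r\n" + body

request = build_request(
    "GET",
    "/cs1/mugshots-2007.html",
    "www.cs.ucc.ie",
    headers={"User-Agent": "ExampleBrowser/1.0"},  # invented browser name
)
print(request)
```

For a GET request the message body is empty, so the text ends with the blank line.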
HTTP request commands (methods)
- GET: retrieve a file (95% of requests)
- HEAD: just retrieve header information for a file
- POST: submit data to a server
Other
- PUT: store enclosed document on server
- DELETE: remove the named resource from the server
- LINK/UNLINK: in HTTP 1.0, gone in HTTP 1.1
- TRACE: HTTP 'echo' for debugging (added in 1.1)
- CONNECT: used by proxies for tunneling (1.1)
- OPTIONS: request for server/proxy options (1.1)
HTTP responses
- Status line (required): HTTP version number, status code, short explanation of code
- Response header lines (largely optional): info about date, server,...
- Response message body (optional): the requested resource (web page, image,...); absent in, e.g., responses to HEAD and 304 Not Modified responses
Example HTTP response from The TCP/IP Guide by Charles M. Kozierok
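Going the other way, a response can be split back into its parts. The function `parse_response` is a hypothetical helper and the response bytes are made up for illustration. The sketch assumes a well-formed response whose status line includes a short explanation, and does not handle the many special cases a real client must.

```python
def parse_response(raw):
    """Hypothetical helper: split raw response bytes into their parts."""
    # The first blank line separates the status line and headers from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    # Status line: HTTP version, status code, short explanation.
    version, code, reason = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return version, int(code), reason, headers, body

# Made-up response for illustration.
raw = (b"HTTP/1.1 200 OK\r\n"
       b"Content-Type: text/html\r\n"
       b"Content-Length: 13\r\n"
       b"\r\n"
       b"<p>Hello</p>\n")

version, code, reason, headers, body = parse_response(raw)
print(version, code, reason)
print(headers)
print(body)
```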
HTTP response status codes
- 1XX: Informational (added in 1.1):
- e.g. 100 Continue, 101 Switching Protocols
- 2XX: Success:
- e.g. 200 OK, 206 Partial Content
- 3XX: Redirection:
- e.g. 301 Moved Permanently, 304 Not Modified
- 4XX: Client error:
- e.g. 400 Bad Request, 403 Forbidden, 404 Not Found
- 5XX: Server error:
- e.g. 500 Internal Server Error, 503 Service Unavailable
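Because the first digit determines the class, a client can react sensibly even to a code it has never seen. The sketch below shows one way to do that; the names `CLASSES` and `status_class` are hypothetical, while `HTTPStatus` comes from Python's standard library and supplies the short explanation for each known code.

```python
from http import HTTPStatus

# Class names as listed above, keyed by the first digit of the code.
CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client error",
    5: "Server error",
}

def status_class(code):
    """Hypothetical helper: name the class of a status code."""
    return CLASSES.get(code // 100, "Unknown")

for code in (100, 200, 304, 404, 503):
    print(code, HTTPStatus(code).phrase, "-", status_class(code))
```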