Web Development

Dr Derek Bridge
School of Computer Science & Information Technology
University College Cork

Lecture Objectives

  • learn about caching — clicks don't always result in requests
  • learn that browsers are not the only clients

Client-server

The web has machines running client software and machines running server software.

Repeated Requests

  • The user clicks on a link:
    the browser sends a GET request and receives a response.
  • Suppose the user presses the Back button later today, or clicks on the same link tomorrow. Must the browser send the same request again?

Browser Caching

  • Once it has received a resource, the browser can store a copy in its cache.
  • Then, subsequent requests for the same resource can be satisfied from the cache, without contacting the server.

What to Cache

HTTP response headers control what to cache:

A resource that the browser should not cache.
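
For example, a response whose Cache-Control header forbids storing the resource might look like this (only the relevant lines are shown):

HTTP/1.1 200 OK
Cache-Control: no-store
Content-Type: text/html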

How Long to Keep It

Headers also control how long the browser may keep a resource before discarding it (expressed in seconds):

A resource that the browser should cache for a week.
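
For example, the max-age directive gives the resource's lifetime in seconds, and a week is 604800 seconds (again, only the relevant lines are shown):

HTTP/1.1 200 OK
Cache-Control: max-age=604800
Content-Type: text/html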

Closing Remarks about Caching

  • The headers are more complicated than those shown here!
  • Caching is used all over the Internet, e.g. in forward proxies.

Clients

Browsers are not the only clients!

Browsers include Chrome, Edge, Firefox, Opera and Safari, and there are browsers for mobile devices too.

Visually-Impaired Users

Visually-impaired users may use a regular browser or a special browser, plus a screen reader, which sends the information from the web page to a speech synthesizer or a braille display.

Sometimes these browsers are voice-controlled.

Other Clients

harmful                      beneficial
denial of service attacks    dead link checkers
email harvesting             search engine crawlers

Denial of Service Attacks


repeat:
    send HTTP GET request to www.example.org
    receive HTTP response but discard it
    

Ouch!
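
A minimal Python sketch of this loop, assuming the third-party requests library (www.example.org is just a stand-in for the victim site):

import requests

while True:
    # Send the request; ignoring the return value discards the response.
    requests.get("http://www.example.org")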

Email Harvesting


list_of_urls = [www.example.org]
list_of_emails = []
while list_of_urls is not empty:
    remove a URL from list_of_urls
    send HTTP GET request to the URL
    receive HTTP response
    find all email addresses within the response
    insert the email addresses into list_of_emails
    find all hyperlinks within the response
    insert the URLs of the hyperlinks into list_of_urls
    

Now sell the list of email addresses to spammers!
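
The same algorithm as a Python sketch, again assuming the requests library; the two regular expressions are rough approximations of email addresses and hyperlinks, not full parsers:

import re
import requests

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
HYPERLINK = re.compile(r'href="(http[^"]*)"')

list_of_urls = ["http://www.example.org"]
list_of_emails = []
while list_of_urls:
    url = list_of_urls.pop()
    html = requests.get(url).text
    list_of_emails.extend(EMAIL.findall(html))    # harvest addresses
    list_of_urls.extend(HYPERLINK.findall(html))  # follow links
    # (a real harvester would also remember URLs it has already visited)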

Dead Link Checking


list_of_urls = [www.example.org]
list_of_dead_links = []
while list_of_urls is not empty:
    remove a URL from list_of_urls
    send HTTP GET request to the URL
    receive HTTP response
    if response status code is 404:
        insert the URL into list_of_dead_links
    else:
        find all hyperlinks within the response
        insert the URLs of the hyperlinks into list_of_urls
    

Print out the dead links for the web developer to fix!
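
As a Python sketch, under the same assumptions as the previous one (a real checker would treat timeouts and other error codes as dead links too):

import re
import requests

HYPERLINK = re.compile(r'href="(http[^"]*)"')

list_of_urls = ["http://www.example.org"]
list_of_dead_links = []
while list_of_urls:
    url = list_of_urls.pop()
    response = requests.get(url)
    if response.status_code == 404:
        list_of_dead_links.append(url)
    else:
        list_of_urls.extend(HYPERLINK.findall(response.text))

print(list_of_dead_links)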

Web Search Engines

Search engines use a crawler to build an index.

Indexes
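
The index maps each important word to the URLs of the pages that contain it. As a minimal sketch, such an index could be a Python dictionary (the words and URLs here are invented examples):

index = {
    "cork": {"http://www.example.org/a.html", "http://www.example.org/b.html"},
    "lectures": {"http://www.example.org/b.html"},
}

index["cork"]  # answering a query is then just a lookup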

Search Engine Crawlers


list_of_urls = [www.example.org]
while list_of_urls is not empty:
    remove a URL from list_of_urls
    send HTTP GET request to the URL
    receive HTTP response
    find all important words within the response
    for each important word:
        in the index entry for that word, insert the URL
    find all hyperlinks within the response
    insert the URLs of the hyperlinks into list_of_urls    
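
A Python sketch of the crawler, with every word standing in for the "important words" (same assumptions as the earlier sketches):

import re
from collections import defaultdict

import requests

HYPERLINK = re.compile(r'href="(http[^"]*)"')
WORD = re.compile(r"[A-Za-z]+")

index = defaultdict(set)  # maps each word to the set of URLs that contain it
list_of_urls = ["http://www.example.org"]
while list_of_urls:
    url = list_of_urls.pop()
    html = requests.get(url).text
    for word in WORD.findall(html):
        index[word.lower()].add(url)
    list_of_urls.extend(HYPERLINK.findall(html))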
    

Summary & Consequences

There are many different clients:
what they have in common is that they all send HTTP requests and receive HTTP responses.

Web Developers must endeavour to create web pages, web sites and web apps that are usable by many different kinds of clients!

G'luck!