HTTP response headers control what to cache:
Headers also control when to discard it (in seconds):
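The header examples are not reproduced here. As a rough sketch only, assuming the headers meant are the standard `Cache-Control` directives (for example `Cache-Control: public, max-age=3600`), a client could decide whether a stored copy is still usable like this. The function names `parse_cache_control` and `is_fresh` are made up for illustration.

```python
from typing import Dict, Optional


def parse_cache_control(value: str) -> Dict[str, Optional[str]]:
    """Split a Cache-Control header value into a {directive: argument} dict."""
    directives: Dict[str, Optional[str]] = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        # Directives without an argument (e.g. "public") map to None.
        directives[name.strip().lower()] = arg.strip().strip('"') or None
    return directives


def is_fresh(directives: Dict[str, Optional[str]], age_seconds: int) -> bool:
    """Return True if a stored response may be reused without asking the server."""
    # no-store: must not be kept at all.
    # no-cache: may be kept, but must be revalidated before reuse.
    if "no-store" in directives or "no-cache" in directives:
        return False
    max_age = directives.get("max-age")
    # max-age is a lifetime in seconds; without it this sketch never reuses.
    return max_age is not None and max_age.isdigit() and age_seconds < int(max_age)
```

For instance, `is_fresh(parse_cache_control("public, max-age=3600"), 10)` treats a ten-second-old copy as reusable. Real caches apply more rules than this (such as `Expires` and heuristic freshness).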
Browsers are not the only clients!
Some people use a regular browser or a special browser together with a screen reader, which sends the information from the web page to a speech synthesizer or a braille display.
Sometimes these browsers are voice-controlled.
harmful | beneficial |
---|---|
denial of service attacks | dead link checkers |
email harvesting | search engine crawlers |
repeat:
    send HTTP GET request to www.example.org
    receive HTTP response but discard it
Ouch!
list_of_urls = [www.example.org]
list_of_emails = []
while list_of_urls is not empty:
    remove a URL from list_of_urls
    send HTTP GET request to the URL
    receive HTTP response
    find all email addresses within the response
    insert the email addresses into list_of_emails
    find all hyperlinks within the response
    insert the URLs of the hyperlinks into list_of_urls
Now sell the list of email addresses to spammers!
list_of_urls = [www.example.org]
list_of_dead_links = []
while list_of_urls is not empty:
    remove a URL from list_of_urls
    send HTTP GET request to the URL
    receive HTTP response
    if response status code is 404:
        insert the URL into list_of_dead_links
    else:
        find all hyperlinks within the response
        insert the URLs of the hyperlinks into list_of_urls
Print out the dead links for the web developer to fix!
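The dead link checker above could be sketched in Python roughly as follows, using only the standard library. This is one possible reading, not a finished tool. The names `LinkExtractor`, `fetch` and `find_dead_links` are invented here, and the `seen` set and `max_pages` limit are additions that stop the loop from revisiting pages forever.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.error import HTTPError
from urllib.parse import urldefrag, urljoin
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch(url):
    """Send an HTTP GET request; return (status code, body text)."""
    try:
        with urlopen(url, timeout=10) as response:
            return response.status, response.read().decode("utf-8", errors="replace")
    except HTTPError as error:
        # urllib raises for 4xx/5xx responses; keep only the status code.
        return error.code, ""


def find_dead_links(start_url, fetch=fetch, max_pages=100):
    """Crawl from start_url and return the URLs that answered 404."""
    queue = deque([start_url])
    seen = {start_url}  # assumption: never queue the same URL twice
    dead_links = []
    visited = 0
    while queue and visited < max_pages:
        url = queue.popleft()
        visited += 1
        status, body = fetch(url)
        if status == 404:
            dead_links.append(url)
            continue
        parser = LinkExtractor()
        parser.feed(body)
        for href in parser.links:
            # Resolve relative links and drop "#fragment" parts.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if absolute.startswith(("http://", "https://")) and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return dead_links
```

Passing `fetch` as a parameter makes it possible to try the crawler against a fake site held in a dictionary, without touching the network. A real checker would probably also stay within one site and pause between requests.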
broccoli | p.2, p.17, … |
carrots | p.2, p.112, … |
cheese | p.6, p.17, p.28, … |
broccoli | URLs of pages mentioning broccoli |
carrots | URLs of pages mentioning carrots |
cheese | URLs of pages mentioning cheese |
list_of_urls = [www.example.org]
while list_of_urls is not empty:
    remove a URL from list_of_urls
    send HTTP GET request to the URL
    receive HTTP response
    find all important words within the response
    for each important word:
        in the index entry for that word, insert the URL
    find all hyperlinks within the response
    insert the URLs of the hyperlinks into list_of_urls
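As a hedged illustration of the crawler above, here is a small Python sketch that builds such an index as a dictionary from word to set of URLs. Everything named here is an assumption for the example: `PageParser`, `build_index`, the tiny `STOP_WORDS` set (standing in for "important words" by excluding a few common ones), and the `fetch` function, which the caller supplies and which returns a page's HTML as text.

```python
import re
from collections import defaultdict, deque
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical, deliberately tiny list of words considered unimportant.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "with"}


class PageParser(HTMLParser):
    """Collects hyperlink targets and lower-cased words from a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.words = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_data(self, data):
        self.words.extend(re.findall(r"[a-z]+", data.lower()))


def build_index(start_url, fetch, max_pages=100):
    """Crawl from start_url; return {word: set of URLs mentioning it}."""
    index = defaultdict(set)
    queue = deque([start_url])
    seen = {start_url}  # assumption: never queue the same URL twice
    fetched = 0
    while queue and fetched < max_pages:
        url = queue.popleft()
        fetched += 1
        parser = PageParser()
        parser.feed(fetch(url))
        for word in parser.words:
            if word not in STOP_WORDS:
                index[word].add(url)
        for href in parser.links:
            absolute = urljoin(url, href)
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return index
```

Looking up `index["cheese"]` then gives the URLs of pages mentioning cheese, just like the index entries shown earlier. Real search engines decide which words are important, and how to rank pages, in far more sophisticated ways.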
There are many different clients;
what they have in common is that they send HTTP requests and receive HTTP responses.
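To make that common ground concrete, here is a minimal sketch of the two things every client does: produce the text of an HTTP request, and read the status line of an HTTP response. The function names and the `example-client/0.1` User-Agent string are invented for illustration, and nothing is actually sent over the network.

```python
def build_get_request(host, path="/", user_agent="example-client/0.1"):
    """Return the bytes of a simple HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        f"User-Agent: {user_agent}",  # how a client identifies itself
        "Connection: close",
        "",  # blank line marks the end of the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_status_line(raw_response):
    """Return (version, status code, reason) from raw response bytes."""
    status_line = raw_response.split(b"\r\n", 1)[0].decode("ascii")
    version, code, *reason = status_line.split(" ", 2)
    return version, int(code), reason[0] if reason else ""
```

A browser, a screen reader setup, a dead link checker and a search engine crawler differ mainly in what they do with the response afterwards.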
Web developers must endeavour to create web pages, websites and web apps that are usable by many different kinds of clients!