CS2514 Lab 6
Exercise
You will be writing a movie recommender. Take a look at these files:
movies.txt
This file contains 10 movies, each with a unique id (from 0) and a title.
customers.txt
This file contains 10 customers, each with a unique id (from 0) and a name.
ratings.txt
Each line of this file has 3 numbers: a customer id, a movie id and a star-rating (from 1 to 5). For example, the first line of the file, 0|0|5, shows that customer 0 has given 5 stars to movie 0; the last line of the file, 9|8|4, shows that customer 9 has given 4 stars to movie 8.
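As an illustration of the ratings.txt format described above, here is a minimal sketch of how one line might be split into its three numbers. The class name RatingLine and its fields are hypothetical, not something the lab requires, and the sketch assumes every line has exactly three fields separated by the | character.

```java
// Hypothetical helper for illustration; the lab does not prescribe this class.
class RatingLine {
    final int customerId;
    final int movieId;
    final int stars;

    RatingLine(int customerId, int movieId, int stars) {
        this.customerId = customerId;
        this.movieId = movieId;
        this.stars = stars;
    }

    // Parses a line such as "0|0|5".
    // String.split takes a regular expression, so the pipe must be escaped.
    static RatingLine parse(String line) {
        String[] parts = line.trim().split("\\|");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Expected 3 fields: " + line);
        }
        return new RatingLine(
                Integer.parseInt(parts[0].trim()),
                Integer.parseInt(parts[1].trim()),
                Integer.parseInt(parts[2].trim()));
    }
}
```

For example, RatingLine.parse("9|8|4") yields customer 9, movie 8 and 4 stars.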
To recommend a movie to a customer, you score all the movies they have not already rated, and recommend the one with the highest score. (If several movies tie for the highest score, pick the first.)
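The selection rule can be sketched as a single pass over the movies. This is only an illustration: the class BestMoviePicker, the array-based parameters, and the reading of "the first" as "the lowest movie id" are all assumptions, and the scores are taken as already computed.

```java
// Hypothetical helper for illustration; names and array layout are assumptions.
class BestMoviePicker {
    // scores[m] is the score of movie m for this customer;
    // rated[m] is true if the customer has already rated movie m.
    // Returns the id of the best unrated movie, or -1 if every movie is rated.
    static int pickBest(double[] scores, boolean[] rated) {
        int best = -1;
        for (int m = 0; m < scores.length; m++) {
            if (rated[m]) {
                continue; // never recommend something already rated
            }
            // Strict '>' means an earlier movie wins a tie.
            if (best == -1 || scores[m] > scores[best]) {
                best = m;
            }
        }
        return best;
    }
}
```

Using a strict comparison is what makes the earlier of two equally scored movies win.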
How do you score the movies? Consider the people who have rated the movie. Take a
weighted average of their star ratings for that movie. That's the score. Here it is in
symbols. Suppose we want to recommend a movie to customer c. We score each movie
m that c has not rated using the following:
$$score(c, m) = \frac{\sum_{x \in S} w(c, x) \times r(x, m)}{\sum_{x \in S} w(c, x)} $$
where $S$ is the set of people who have rated movie $m$ and $r(x, m)$ is the number of stars
$x$ has given to $m$. (If $\sum_{x \in S} w(c, x)$ is zero, then the score is zero.)
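The weighted average above might be computed along the following lines. This is a sketch under assumptions: the class WeightedScore and the parallel-array representation are hypothetical, and the weights are simply passed in rather than computed here.

```java
// Hypothetical helper for illustration; the parallel arrays are an assumption.
class WeightedScore {
    // weights[i] is w(c, x_i) and stars[i] is r(x_i, m)
    // for the i-th person x_i in S, the people who rated movie m.
    static double score(double[] weights, int[] stars) {
        if (weights.length != stars.length) {
            throw new IllegalArgumentException("Arrays must have equal length");
        }
        double weightedSum = 0.0;
        double weightTotal = 0.0;
        for (int i = 0; i < weights.length; i++) {
            weightedSum += weights[i] * stars[i];
            weightTotal += weights[i];
        }
        // The handout defines the score as zero when the weights sum to zero.
        if (weightTotal == 0.0) {
            return 0.0;
        }
        return weightedSum / weightTotal;
    }
}
```

For instance, weights 1.0 and 0.5 with ratings of 5 and 2 stars give (5 + 1) / 1.5 = 4.0.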
What do we use for the weights, $w(c, x)$? We use the inverse of one plus the Euclidean distance between the two customers' ratings:
$$w(c, x) = \frac{1}{1 + \sqrt{\sum_{m \in M} (r(c, m) - r(x, m))^2}} $$
where $M$ is the set of movies that both $c$ and $x$ have rated, i.e. the movies they have in common.
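A minimal sketch of the weight calculation follows, assuming each customer's ratings are held in a map from movie id to stars. The class SimilarityWeight and the map representation are hypothetical. The handout does not say what to do when two customers have no movies in common; this sketch applies the formula literally, so the empty sum gives a weight of 1, which is an assumption.

```java
import java.util.Map;

// Hypothetical helper for illustration; the map representation is an assumption.
class SimilarityWeight {
    // Each map goes from movie id to the stars that customer gave.
    static double weight(Map<Integer, Integer> ratingsOfC,
                         Map<Integer, Integer> ratingsOfX) {
        double sumOfSquares = 0.0;
        for (Map.Entry<Integer, Integer> entry : ratingsOfC.entrySet()) {
            Integer starsFromX = ratingsOfX.get(entry.getKey());
            if (starsFromX != null) { // movie rated by both, so it is in M
                int diff = entry.getValue() - starsFromX;
                sumOfSquares += diff * diff;
            }
        }
        // Assumption: with no movies in common the sum is 0 and the weight is 1.
        return 1.0 / (1.0 + Math.sqrt(sumOfSquares));
    }
}
```

As a check, if the two customers differ by 3 stars on one shared movie and 4 on another, the distance is 5 and the weight is 1/6.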
Your program must contain a main method in a class called MovieRecommenderTester, but otherwise you decide what other classes you want.
If you wish, you can assume that there will be 10 customers (0-9) and 10 movies (0-9), allowing you to use arrays of size 10. However, you can gain more credit by being more general and allowing arbitrary numbers of customers and movies, which makes arrays less suitable. (Note though that this is quite a bit more difficult.)
The number of ratings per customer can vary and so storing each customer's ratings in arrays will not be the best solution. (In real recommenders, there are many customers and movies but relatively few ratings per customer, making arrays even less suitable.) So you will be thinking about using, e.g., lists for these.
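One possible shape for list-based storage is sketched below: each customer holds a list that grows as ratings are read in. The classes Rating and CustomerRatings are hypothetical, and other designs (for example, maps) would work equally well.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical classes for illustration; the lab leaves the design to you.
class Rating {
    final int movieId;
    final int stars;

    Rating(int movieId, int stars) {
        this.movieId = movieId;
        this.stars = stars;
    }
}

class CustomerRatings {
    // Grows as needed, so the number of ratings per customer can vary.
    private final List<Rating> ratings = new ArrayList<>();

    void add(int movieId, int stars) {
        ratings.add(new Rating(movieId, stars));
    }

    // Returns the stars given to the movie, or 0 if it has not been rated.
    // 0 is safe as a "not rated" marker because real ratings run from 1 to 5.
    int starsFor(int movieId) {
        for (Rating r : ratings) {
            if (r.movieId == movieId) {
                return r.stars;
            }
        }
        return 0;
    }

    boolean hasRated(int movieId) {
        return starsFor(movieId) != 0;
    }

    int count() {
        return ratings.size();
    }
}
```

The linear search in starsFor is acceptable here because each customer has only a few ratings.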
The files are text files. To read these in, you will need a FileReader and, optionally, a BufferedReader.
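Reading a text file line by line with these two classes might look like the sketch below. The class LineFileReader and the choice to skip blank lines are assumptions made for illustration. The try-with-resources statement closes the file even if reading fails.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not required by the lab.
class LineFileReader {
    // Returns the non-blank lines of the named file, in order.
    static List<String> readLines(String fileName) throws IOException {
        List<String> lines = new ArrayList<>();
        // BufferedReader wraps FileReader and adds readLine().
        try (BufferedReader in = new BufferedReader(new FileReader(fileName))) {
            String line;
            while ((line = in.readLine()) != null) { // null marks end of file
                if (!line.trim().isEmpty()) {        // assumption: skip blank lines
                    lines.add(line);
                }
            }
        }
        return lines;
    }
}
```

A call such as LineFileReader.readLines("ratings.txt") would then return one String per rating, ready to be split into numbers.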
When reading in the data from the files, you'll need to convert Strings to ints. This is done using Integer.parseInt, which is a class method. It will need to be inside a try with a catch for a NumberFormatException.
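The conversion could be wrapped as in the sketch below. The class SafeIntParser and the idea of returning a fallback value are assumptions; you might prefer to skip the bad line or stop the program instead.

```java
// Hypothetical helper for illustration; the fallback policy is an assumption.
class SafeIntParser {
    // Returns the int held in text, or fallback if text is not a valid int.
    static int parseOrDefault(String text, int fallback) {
        try {
            // Integer.parseInt is a class (static) method, so no object is needed.
            return Integer.parseInt(text.trim());
        } catch (NumberFormatException e) {
            System.err.println("Not a number: " + text);
            return fallback;
        }
    }
}
```

A fallback such as -1 works as an error marker here because ids start from 0 and star-ratings from 1.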
Here is the output of my program:
To customer 0, I recommend id: 9 title: Room (2015)
To customer 1, I recommend id: 8 title: The Martian (2015)
To customer 2, I recommend id: 8 title: The Martian (2015)
To customer 3, I recommend id: 8 title: The Martian (2015)
To customer 4, I recommend id: 8 title: The Martian (2015)
To customer 5, I recommend id: 2 title: Selma (2015)
To customer 6, I recommend id: 1 title: Inside Out (2015)
To customer 7, I recommend id: 2 title: Selma (2015)
To customer 8, I recommend id: 8 title: The Martian (2015)
To customer 9, I recommend id: 1 title: Inside Out (2015)
Submission
Deadline: 4pm, Friday 8th April 2016.
Put all your class definitions into a directory called lab6. The name must be exactly lab6, not Lab6, lab-6, Lab-6 or some other variant. Variants will not be graded.
- Open the Dolphin File Manager. Right-click on your lab6 directory. Choose Compress from the menu and As Zip File from the sub-menu. You should see a new icon appear, named lab6.zip.
- Open a console; use the cd command to move to the directory that contains lab6.zip and type: submit-2514 lab6.zip
Challenge exercise
There's no challenge exercise. You'll be too busy making your solution as fantastic as possible! And then get on with exam revision!