CS6120 Lab 05

Item-based collaborative filtering

Copy the script you wrote for Lab 4 and modify it so that it makes item-based, rather than user-based, predictions. Specifically, let's again predict user 1's rating for item 12, using k=20.

Of the items that user 1 has rated, you need the 20 that are most similar to item 12. To find them, use get_k_nearest_items, a function I have written for you (see the Appendix to this sheet).
Then make the prediction using the formula given in Lecture 5.
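The formula from Lecture 5 is not reproduced on this sheet. If it is the usual similarity-weighted average of the active user's own ratings for the neighbour items, i.e. pred(a, i) = Σ_j sim(i, j) × r(a, j) / Σ_j |sim(i, j)| over the k neighbours j, then the calculation can be sketched as below. That is an assumption, so check it against your lecture notes. The function name predict_item_based and the layout of its two arrays (item id => rating, item id => similarity) are hypothetical: they are not among the functions in the Appendix, and this sheet doesn't say how you obtain the similarity values, so you will need to fill the arrays in from your own script.

```php
<?php
/**
 * Hypothetical helper: item-based prediction as a similarity-weighted
 * average of the active user's ratings for the neighbour items.
 *
 * @param array $ratings item id => the active user's rating for that item
 * @param array $sims    item id => similarity between that item and the target item
 * @return float|null    the prediction, or null if there are no usable neighbours
 */
function predict_item_based(array $ratings, array $sims): ?float
{
    $numerator = 0.0;
    $denominator = 0.0;
    foreach ($sims as $j => $sim) {
        // Skip any neighbour the user has not rated.
        if (!isset($ratings[$j])) {
            continue;
        }
        $numerator += $sim * $ratings[$j];
        $denominator += abs($sim);
    }
    // No neighbours, or all similarities zero: nothing to average.
    return $denominator > 0 ? $numerator / $denominator : null;
}

// Made-up example: three neighbours of the target item.
$ratings = [3 => 4, 7 => 5, 9 => 2];
$sims    = [3 => 0.9, 7 => 0.5, 9 => 0.1];
printf("%.3f\n", predict_item_based($ratings, $sims)); // prints 4.200
```

Returning null rather than some default value leaves it to the caller to decide what to do when no prediction is possible.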

Test your script as we did in Lab 4.

Writing a test harness

Save a copy of this file: cs6120_mltd_1.txt.

Each line of the file contains: a user id, an item id, and the user's rating for the item.
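Step 1 of the loop described next needs to split each line into its three fields. The sheet doesn't say what separates the fields in cs6120_mltd_1.txt; the sketch below assumes spaces or tabs and integer ids (change the pattern if the file turns out to use commas, say). The name parse_rating_line is hypothetical.

```php
<?php
/**
 * Hypothetical helper: split one line of cs6120_mltd_1.txt into
 * [user id, item id, rating]. Assumes whitespace-separated fields and
 * integer ids; returns null for a blank or malformed line.
 */
function parse_rating_line(string $line): ?array
{
    // Remove the trailing newline, then split on any run of spaces/tabs.
    $fields = preg_split('/\s+/', trim($line));
    if (count($fields) !== 3) {
        return null;
    }
    return [(int) $fields[0], (int) $fields[1], (float) $fields[2]];
}

// Example with a made-up line: user 1 gave item 12 a rating of 4.
[$user_id, $item_id, $rating] = parse_rating_line("1\t12\t4\n");
echo "user $user_id, item $item_id, rating $rating\n";
```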

Write a PHP script that makes a prediction for each line in the file. Specifically, your script will do the following:

Open the file.
Use a while loop to read through the file, one line at a time.
In the body of the loop:
    1. From the line you have just read, extract the user id, the item id and the actual rating.
    2. Compute a predicted rating. (You can do this in the user-based way from Lab 4 or in the item-based way from this lab sheet.)
    3. Compute the absolute error between the actual and predicted ratings, and add it to a running total.
After the loop, compute and display the MAE (mean absolute error): the running total divided by the number of ratings you predicted.
Close the file.
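Putting those steps together, here is one possible shape for the harness, written as a function so that you can plug in either kind of prediction. It is a sketch under assumptions: the fields are taken to be whitespace-separated, the prediction is passed in as a callable $predict($user_id, $item_id), and compute_mae and $predict are hypothetical names. The sheet doesn't say what to do when no prediction can be made (for example, when there are no neighbours), so this sketch assumes the predictor returns null in that case and leaves such lines out of the MAE.

```php
<?php
/**
 * Sketch of the test harness: reads "user item rating" lines, asks a
 * prediction function for each, and returns the mean absolute error
 * (or null if no predictions at all were made).
 */
function compute_mae(string $filename, callable $predict): ?float
{
    $handle = fopen($filename, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $filename");
    }
    $total_error = 0.0;
    $count = 0;
    while (($line = fgets($handle)) !== false) {
        // 1. Extract user id, item id and actual rating (assumed whitespace-separated).
        $fields = preg_split('/\s+/', trim($line));
        if (count($fields) !== 3) {
            continue; // skip blank or malformed lines
        }
        $user_id = (int) $fields[0];
        $item_id = (int) $fields[1];
        $actual  = (float) $fields[2];

        // 2. Compute a predicted rating.
        $predicted = $predict($user_id, $item_id);
        if ($predicted === null) {
            continue; // no prediction possible: leave this line out of the MAE
        }

        // 3. Add the absolute error to the running total.
        $total_error += abs($actual - $predicted);
        $count++;
    }
    fclose($handle);
    return $count > 0 ? $total_error / $count : null;
}
```

For example, compute_mae('cs6120_mltd_1.txt', fn($u, $i) => 3.0) would give the MAE of a trivial predictor that always answers 3, a handy baseline against which to judge your user-based and item-based scores. (The fn syntax needs PHP 7.4 or later.)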

Once you get this far, you'll probably receive an error message that says "Fatal error: Maximum execution time exceeded", or words to that effect. This is because a PHP script run through your browser is stopped once it exceeds a time limit, and this script takes far longer than that to run. The solution is to run it from the command line instead, where PHP imposes no execution time limit by default:

Log in to cosmos and change to the directory where you are doing this work. Assuming you've called your script my_script.php, run it by typing the following at the cosmos command line:

php my_script.php

It will run, perhaps taking as long as 10-20 minutes (!). If you want to save its output in a file called, say, output.txt, rather than just displaying it on the screen, use the following instead:

php my_script.php > output.txt

Next steps

Bearing in mind that evaluating predictions and recommendations will form the basis of the CS6120 assignment, you might like to try some or all of the following either now or in your own time:

Experiment with different values for k in making item-based predictions.
Extend your script to make item-based recommendations (rather than predictions).
Read the Appendix and see if you can come up with other ways of making item-based predictions and recommendations.
Extend your test harness so that it makes several different types of predictions using different algorithms, e.g. user-based and item-based. It should then output a table of MAEs, one for each algorithm.
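For the last suggestion, one way to organise the comparison is to hold each algorithm as a labelled callable and compute all the MAEs in one pass over the test data. The sketch below assumes the test data has already been read into an array of [user id, item id, rating] triples and that each predictor returns null when it cannot predict; mae_table and print_mae_table are hypothetical names.

```php
<?php
/**
 * Sketch: compute one MAE per algorithm.
 * $test_set   list of [user id, item id, actual rating] triples
 * $algorithms label => callable($user_id, $item_id) returning float|null
 * Returns label => MAE (null if that algorithm made no predictions).
 */
function mae_table(array $test_set, array $algorithms): array
{
    $results = [];
    foreach ($algorithms as $label => $predict) {
        $total = 0.0;
        $n = 0;
        foreach ($test_set as [$user_id, $item_id, $actual]) {
            $predicted = $predict($user_id, $item_id);
            if ($predicted === null) {
                continue; // this algorithm could not predict this rating
            }
            $total += abs($actual - $predicted);
            $n++;
        }
        $results[$label] = $n > 0 ? $total / $n : null;
    }
    return $results;
}

/** Print the results as a two-column text table. */
function print_mae_table(array $results): void
{
    printf("%-25s %8s\n", 'Algorithm', 'MAE');
    foreach ($results as $label => $mae) {
        printf("%-25s %8s\n", $label, $mae === null ? 'n/a' : sprintf('%.4f', $mae));
    }
}
```

To try several values of k, you could build $algorithms in a loop, e.g. $algorithms["item-based, k=$k"] = fn($u, $i) => my_item_based_prediction($u, $i, $k); where my_item_based_prediction stands for your own item-based predictor (a hypothetical name here).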

Appendix

Here is a list of additional functions that I have written for your use. I assume that your script begins with the fragment of PHP that I showed you at the start of lab sheet 4.

$cf->get_k_nearest_items($k, $i_id, $a_id)

Of the items rated by user $a_id, returns the $k items most similar to item $i_id.

The result is an array containing the nearest neighbours, in no particular order. The array will have at most $k elements; it may have fewer if the user has rated fewer than $k items.
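To make the behaviour described above concrete, here is a stand-in that selects the k nearest items in the same spirit. It is not the provided function: it assumes you already have an array mapping each item the user has rated to that item's similarity with the target item, and it returns the ids of the most similar ones, most similar first (whereas get_k_nearest_items makes no promise about order). Excluding the target item itself is an assumption of this sketch, and the name k_nearest_items_sketch is hypothetical.

```php
<?php
/**
 * Hypothetical stand-in illustrating what get_k_nearest_items returns.
 * $similarities maps each item rated by the active user to its similarity
 * with target item $i_id (how the similarity is computed is not shown).
 * Returns the ids of at most $k items; fewer if the user rated fewer.
 */
function k_nearest_items_sketch(int $k, int $i_id, array $similarities): array
{
    unset($similarities[$i_id]);                         // assumption: never count the target itself
    arsort($similarities);                               // most similar first, keys (item ids) kept
    $nearest = array_slice($similarities, 0, $k, true);  // at most $k entries, keys preserved
    return array_keys($nearest);                         // just the item ids
}

// Made-up similarities to item 12 for items the user has rated.
print_r(k_nearest_items_sketch(2, 12, [3 => 0.2, 7 => 0.9, 9 => 0.5]));
```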