CS6120 Lab 05

Item-based collaborative filtering

Copy the script you wrote for Lab 4 and modify it so that it makes item-based predictions instead of user-based ones. Specifically, let's again predict user 1's rating for item 12, using k=20.

  1. Of the items rated by user 1, you need the 20 that are most similar to item 12. For this, I have written a function for you called get_k_nearest_items; see the Appendix of this sheet.
  2. Then make the prediction using the formula given in Lecture 5.
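The Lecture 5 formula is not reproduced on this sheet. A common form of item-based prediction is the similarity-weighted average of the active user's ratings of the neighbour items: pred(a, i) = sum over neighbours j of sim(i, j) * r(a, j), divided by the sum over j of |sim(i, j)|. The sketch below assumes that form, so check it against your Lecture 5 notes. The helper name predict_item_based and its input shape, a list of [similarity, rating] pairs, are hypothetical. get_k_nearest_items returns associative arrays instead, so you would first pull out each neighbour's similarity and user 1's rating of that neighbour.

```php
<?php
// Hypothetical helper (not one of the provided cf functions).
// $neighbours: list of [similarity, rating] pairs, where similarity is sim(i, j)
// and rating is the active user's rating of neighbour item j.
// Assumes the weighted-average formula; Lecture 5 may differ.
function predict_item_based(array $neighbours, ?float $fallback = null): ?float
{
    $numerator = 0.0;
    $denominator = 0.0;
    foreach ($neighbours as [$sim, $rating]) {
        $numerator += $sim * $rating;   // weight each rating by its similarity
        $denominator += abs($sim);      // normalise by total absolute similarity
    }
    // No usable neighbours: return the caller's fallback (e.g. the user's mean rating).
    if ($denominator == 0.0) {
        return $fallback;
    }
    return $numerator / $denominator;
}

// Example: three neighbours of item 12 that user 1 has rated.
$neighbours = [[0.9, 4], [0.5, 3], [0.2, 5]];
printf("Predicted rating: %.3f\n", predict_item_based($neighbours));
```

The $fallback parameter is a design choice: if none of the neighbours carries any similarity weight, the formula is undefined, and you must decide what to predict instead.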

Test your script, like we did in Lab 4.

Writing a test harness

Save a copy of this file: cs6120_mltd_1.txt.

Each line of the file contains a user id, an item id and that user's rating for the item.
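This sheet does not say what separates the three fields. The sketch below assumes they are separated by whitespace (spaces or tabs). If the file uses another delimiter, such as commas, change the pattern. parse_rating_line is a hypothetical helper name.

```php
<?php
// Hypothetical helper: split one line of cs6120_mltd_1.txt into
// [user id, item id, rating]. Assumes whitespace-separated fields.
function parse_rating_line(string $line): ?array
{
    $fields = preg_split('/\s+/', trim($line));
    if (count($fields) < 3) {
        return null; // blank or malformed line
    }
    return [(int) $fields[0], (int) $fields[1], (float) $fields[2]];
}

[$u_id, $i_id, $rating] = parse_rating_line("1\t12\t4\n");
echo "user $u_id, item $i_id, rating $rating\n";
```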

Write a PHP script that makes a prediction for each line in the file. Specifically, your script will do the following:

  1. Open the file
  2. Use a while loop to read through the file
  3. In the body of the loop:
    1. From the line of the file that you've just read, extract the user id, the item id and the actual rating
    2. Compute a predicted rating. (You can do this in a user-based way, as in Lab 4, or in an item-based way, as on this lab sheet.)
    3. Compute the absolute error between the actual and predicted ratings, and add it to a running total
  4. After the loop, compute and display the MAE (mean absolute error): the running total divided by the number of predictions made
  5. Close the file
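The steps above can be sketched as follows. This is a minimal sketch, not a reference solution. The function name evaluate_mae and the $predict callable are hypothetical: $predict stands in for your own user-based or item-based code and should return a predicted rating, or null if it cannot make one. As with the line parser, the fields are assumed to be whitespace-separated.

```php
<?php
// Sketch of the test harness. $predict(user id, item id) is a hypothetical
// stand-in for your prediction code; it returns a rating or null.
function evaluate_mae(string $path, callable $predict): ?float
{
    $handle = fopen($path, 'r');                          // 1. open the file
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    $total_error = 0.0;
    $count = 0;
    while (($line = fgets($handle)) !== false) {          // 2. read it line by line
        // 3.1 extract the fields (assumes whitespace-separated columns)
        $fields = preg_split('/\s+/', trim($line));
        if (count($fields) < 3) {
            continue;                                     // skip blank or malformed lines
        }
        $u_id = (int) $fields[0];
        $i_id = (int) $fields[1];
        $actual = (float) $fields[2];

        $predicted = $predict($u_id, $i_id);              // 3.2 predict
        if ($predicted === null) {
            continue;                                     // no prediction: leave it out of the MAE
        }
        $total_error += abs($actual - $predicted);        // 3.3 add |error| to the running total
        $count++;
    }
    $mae = $count > 0 ? $total_error / $count : null;     // 4. MAE = total / number of predictions
    fclose($handle);                                      // 5. close the file
    return $mae;
}

// Usage with a placeholder predictor that always guesses 3.5;
// replace it with a call to your own prediction code.
$path = 'cs6120_mltd_1.txt';
if (file_exists($path)) {
    $mae = evaluate_mae($path, function (int $u_id, int $i_id): float {
        return 3.5;
    });
    printf("MAE = %.4f\n", $mae);
}
```

Skipping lines for which no prediction can be made is one choice. Another is to fall back to a default, such as the user's mean rating, so that every line counts towards the MAE. Whichever you choose, report how many lines were actually predicted.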

Once you get this far, you'll probably receive an error message that says "Fatal error: Maximum execution time exceeded", or words to that effect. This is because a PHP script run through your browser is stopped once it exceeds a time limit, and this one will run for far longer than that. PHP run from the command line has no such limit by default, so the next paragraph explains how to run your script there.

Log in to cosmos and change to the directory where you are doing this work. Assuming you've called your script my_script.php, run it by typing the following at the cosmos command line:

php my_script.php

It will run, perhaps taking as long as 10-20 minutes (!). If you want to save its output in a file called, say, output.txt, rather than just displaying it on the screen, use the following instead:

php my_script.php > output.txt
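With the output redirected to output.txt, nothing appears on the screen while the script runs. One optional addition, which the lab does not require, is to write an occasional progress message to standard error. The > operator redirects only standard output, so these messages still reach your terminal. report_progress is a hypothetical helper name, and the reporting interval is arbitrary.

```php
<?php
// Hypothetical helper: every $every lines, print a progress message to STDERR.
// "> output.txt" captures only standard output, so this still shows on screen.
function report_progress(int $lines_done, int $every, float $start_time): bool
{
    if ($every <= 0 || $lines_done % $every !== 0) {
        return false; // not time to report yet
    }
    $elapsed = microtime(true) - $start_time;
    fwrite(STDERR, sprintf("%d lines processed, %.1f s elapsed\n", $lines_done, $elapsed));
    return true;
}

// Usage: call it once per line inside your harness loop.
// A dummy loop stands in for the real one here.
$start = microtime(true);
for ($n = 1; $n <= 2500; $n++) {
    report_progress($n, 1000, $start);
}
```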

Next steps

Bearing in mind that evaluating predictions and recommendations will form the basis of the CS6120 assignment, you might like to try some or all of the following either now or in your own time:

Appendix

Here is a list of additional functions that I have written for your use. I assume that your script begins with the fragment of PHP that I showed you at the start of lab sheet 4.
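The functions below differ in two ways. The first is whether they consider only the items rated by user $a_id. The second is whether they keep the $k most similar items, every item whose similarity exceeds $threshold, or both. To make these differences concrete, here is a hypothetical emulation. select_neighbours is not one of the provided functions. It works on a precomputed array of similarities, returns item ids rather than associative arrays, and sorts them most similar first, whereas the real functions return their neighbours in no particular order.

```php
<?php
// Hypothetical emulation of the Appendix's neighbour-selection rules.
// $similarities: item id => similarity to the target item (target excluded).
// $rated: item ids the active user has rated (mirrors the $a_id variants).
function select_neighbours(array $similarities, ?int $k = null,
                           ?float $threshold = null, ?array $rated = null): array
{
    if ($rated !== null) {
        // Keep only items the user has rated.
        $similarities = array_intersect_key($similarities, array_flip($rated));
    }
    if ($threshold !== null) {
        // Keep only items whose similarity strictly exceeds the threshold.
        $similarities = array_filter($similarities, function ($sim) use ($threshold) {
            return $sim > $threshold;
        });
    }
    arsort($similarities); // most similar first, keys (item ids) preserved
    if ($k !== null) {
        $similarities = array_slice($similarities, 0, $k, true); // keep item-id keys
    }
    return array_keys($similarities);
}

$sims = [3 => 0.8, 7 => 0.1, 15 => 0.5, 20 => 0.95];
$rated = [3, 7, 15];
print_r(select_neighbours($sims, 2, null, $rated)); // like get_k_nearest_items with $a_id
print_r(select_neighbours($sims, null, 0.4));       // like get_thresholded_nearest_items without $a_id
print_r(select_neighbours($sims, 2, 0.4, $rated));  // like get_k_thresholded_nearest_items with $a_id
```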

cf->get_k_nearest_items($k, $i_id, $a_id)

Of the items rated by user $a_id, returns the $k most similar items to item $i_id

The result is an array containing the nearest neighbours, in no particular order. The length of this array will be no more than $k and may be less than $k if the user has rated fewer than $k items

Each neighbour in the array is represented as an associative array, whose keys are as follows:

cf->get_k_nearest_items($k, $i_id)

Returns the $k most similar items to item $i_id

The result is an array containing the nearest neighbours, in no particular order. The length of this array will be no more than $k and may be less than $k if there are fewer than $k items in the database

Each neighbour in the array is represented as an associative array, whose keys are as follows:

cf->get_thresholded_nearest_items($threshold, $i_id, $a_id)

Of the items rated by user $a_id, returns all items whose degree of similarity to item $i_id exceeds $threshold

The result is an array containing the nearest neighbours, in no particular order.

Each neighbour in the array is represented as an associative array, whose keys are as follows:

cf->get_thresholded_nearest_items($threshold, $i_id)

Returns all items whose degree of similarity to item $i_id exceeds $threshold

The result is an array containing the nearest neighbours, in no particular order.

Each neighbour in the array is represented as an associative array, whose keys are as follows:

cf->get_k_thresholded_nearest_items($k, $i_id, $a_id)

Of the items rated by user $a_id, returns the $k most similar items to item $i_id provided their similarity to $i_id exceeds $threshold

The result is an array containing the nearest neighbours, in no particular order. The length of this array will be no more than $k and may be less than $k if the user has rated fewer than $k items, or if fewer than $k of the rated items have a similarity to $i_id that exceeds $threshold

Each neighbour in the array is represented as an associative array, whose keys are as follows:

cf->get_k_thresholded_nearest_items($k, $i_id)

Returns the $k most similar items to item $i_id provided their similarity to $i_id exceeds $threshold

The result is an array containing the nearest neighbours, in no particular order. The length of this array will be no more than $k and may be less than $k if fewer than $k items in the database have a similarity to $i_id that exceeds $threshold

Each neighbour in the array is represented as an associative array, whose keys are as follows: