Copy the script you wrote for Lab 4 and modify it so that it makes item-based, rather than user-based, predictions. Specifically, let's again predict user 1's rating for item 12 using k=20.
get_k_nearest_items: see this sheet's Appendix.
Test your script, as we did in Lab 4.
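One common way to turn the neighbours into an item-based prediction is a similarity-weighted average of the user's ratings for the neighbouring items. The sketch below assumes that formula (the one given in lectures may differ), and the helper name predict_from_item_neighbours and the neighbour data are made up for illustration. The data has the shape described in the Appendix.

```php
<?php
// Hypothetical helper: similarity-weighted average over item neighbours.
// Each neighbour is an associative array with keys j_id, i_j_sim and a_j_rating.
function predict_from_item_neighbours(array $neighbours): ?float
{
    $weighted_sum = 0.0;
    $sim_total = 0.0;
    foreach ($neighbours as $n) {
        $weighted_sum += $n['i_j_sim'] * $n['a_j_rating'];
        $sim_total += abs($n['i_j_sim']);
    }
    // No usable neighbours: no prediction can be made.
    if ($sim_total == 0.0) {
        return null;
    }
    return $weighted_sum / $sim_total;
}

// Made-up data standing in for the neighbours of item 12 among user 1's rated items.
$neighbours = [
    ['j_id' => 3, 'i_j_sim' => 0.5, 'a_j_rating' => 4],
    ['j_id' => 7, 'i_j_sim' => 0.25, 'a_j_rating' => 2],
    ['j_id' => 9, 'i_j_sim' => 0.25, 'a_j_rating' => 5],
];
$prediction = predict_from_item_neighbours($neighbours);
echo "Prediction: ", $prediction, "\n";
```

Returning null when there are no neighbours lets the calling code decide what to do in that case, for example skip the line or fall back to a default rating.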
Save a copy of this file: cs6120_mltd_1.txt.
Each line of the file contains: a user id, an item id, and the user's rating for the item.
Write a PHP script that makes a prediction for each line in the file. Specifically, your script will do the following:
Use a while loop to read through the file.
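A while loop over the file might be sketched as follows. This assumes the three fields on each line are separated by whitespace, which the sheet does not state, so check the file and adjust the split if needed. So that the sketch runs on its own, it writes a tiny stand-in file with made-up lines; in the lab you would open cs6120_mltd_1.txt instead.

```php
<?php
// Create a tiny stand-in file so this sketch runs on its own.
$path = tempnam(sys_get_temp_dir(), 'ratings');
file_put_contents($path, "1 12 4\n2 7 3\n5 12 5\n");

$rows = [];
$handle = fopen($path, 'r');
if ($handle === false) {
    exit("Could not open $path\n");
}
// fgets returns false at end of file, which ends the loop.
while (($line = fgets($handle)) !== false) {
    $line = trim($line);
    if ($line === '') {
        continue; // skip blank lines
    }
    // Assumption: fields are separated by whitespace (spaces or tabs).
    list($a_id, $i_id, $rating) = preg_split('/\s+/', $line);
    $rows[] = ['a_id' => (int) $a_id, 'i_id' => (int) $i_id, 'rating' => (float) $rating];
    // This is where a prediction for ($a_id, $i_id) would be made.
    echo "user $a_id, item $i_id, actual rating $rating\n";
}
fclose($handle);
unlink($path);
```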
Once you get this far, you'll probably receive an error message that says: Fatal error: Maximum execution time exceeded, or words to that effect. This is because you cannot use your browser to run a PHP script that takes as long to run as this one will. The next paragraph explains what to do...
Log in to cosmos. Change directory to wherever you are doing this work.
Let's assume you've called your script my_script.php. Then, to run it, type the following at the cosmos command line:
php my_script.php
It will run, perhaps taking as long as 10-20 minutes (!). If you want to save its output in a file called, say, output.txt, rather than just display it on the screen, then use the following instead:
php my_script.php > output.txt
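The command line works where the browser fails because PHP's command-line interface applies no execution time limit by default. As a quick check (a small sketch, not part of the lab exercise), the following prints which interface is running the script and the time limit in force:

```php
<?php
// PHP_SAPI is 'cli' when the script is run from the command line.
$sapi = PHP_SAPI;
// Maximum execution time in seconds; 0 means no limit.
$limit = (int) ini_get('max_execution_time');
echo "Running under: $sapi\n";
echo "Time limit: ", ($limit === 0 ? "none" : "$limit seconds"), "\n";
```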
Bearing in mind that evaluating predictions and recommendations will form the basis of the CS6120 assignment, you might like to try some or all of the following either now or in your own time:
Here is a list of additional functions that I have written for your use. I assume that your script begins with the fragment of PHP that I showed you at the start of lab sheet 4.
cf->get_k_nearest_items($k, $i_id, $a_id)
Of the items rated by user $a_id, returns the $k most similar items to item $i_id.
The result is an array containing the nearest neighbours, in no particular order.
The length of this array will be no more than $k and may be less than $k if the user has rated fewer than $k items.
Each neighbour in the array is represented as an associative array, whose keys are as follows:
j_id: the neighbour's item id
i_j_sim: the degree of similarity between $i_id and $j_id
a_j_rating: user $a_id's rating for item $j_id
cf->get_k_nearest_items($k, $i_id)
Returns the $k most similar items to item $i_id.
The result is an array containing the nearest neighbours, in no particular order.
The length of this array will be no more than $k and may be less than $k if there are fewer than $k items in the database.
Each neighbour in the array is represented as an associative array, whose keys are as follows:
j_id: the neighbour's item id
i_j_sim: the degree of similarity between $i_id and $j_id
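Because the neighbours come back in no particular order, you may want to sort them yourself, for example to print the most similar items first. A minimal sketch using PHP's usort, with made-up data in the shape described above:

```php
<?php
// Made-up neighbours, each with the keys j_id and i_j_sim.
$neighbours = [
    ['j_id' => 4, 'i_j_sim' => 0.31],
    ['j_id' => 9, 'i_j_sim' => 0.87],
    ['j_id' => 2, 'i_j_sim' => 0.55],
];
// Sort in descending order of similarity.
usort($neighbours, function ($x, $y) {
    return $y['i_j_sim'] <=> $x['i_j_sim'];
});
foreach ($neighbours as $n) {
    echo "item ", $n['j_id'], ": similarity ", $n['i_j_sim'], "\n";
}
```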
cf->get_thresholded_nearest_items($threshold, $i_id, $a_id)
Of the items rated by user $a_id, returns all items whose degree of similarity to item $i_id exceeds $threshold.
The result is an array containing the nearest neighbours, in no particular order.
Each neighbour in the array is represented as an associative array, whose keys are as follows:
j_id: the neighbour's item id
i_j_sim: the degree of similarity between $i_id and $j_id
a_j_rating: user $a_id's rating for item $j_id
cf->get_thresholded_nearest_items($threshold, $i_id)
Returns all items whose degree of similarity to item $i_id exceeds $threshold.
The result is an array containing the nearest neighbours, in no particular order.
Each neighbour in the array is represented as an associative array, whose keys are as follows:
j_id: the neighbour's item id
i_j_sim: the degree of similarity between $i_id and $j_id
cf->get_k_thresholded_nearest_items($k, $i_id, $a_id)
Of the items rated by user $a_id, returns the $k most similar items to item $i_id, provided their similarity to $i_id exceeds $threshold.
The result is an array containing the nearest neighbours, in no particular order.
The length of this array will be no more than $k and may be less than $k if the user has rated fewer than $k items.
Each neighbour in the array is represented as an associative array, whose keys are as follows:
j_id: the neighbour's item id
i_j_sim: the degree of similarity between $i_id and $j_id
a_j_rating: user $a_id's rating for item $j_id
cf->get_k_thresholded_nearest_items($k, $i_id)
Returns the $k most similar items to item $i_id, provided their similarity to $i_id exceeds $threshold.
The result is an array containing the nearest neighbours, in no particular order.
The length of this array will be no more than $k and may be less than $k if fewer than $k items have a similarity to $i_id that exceeds $threshold.
Each neighbour in the array is represented as an associative array, whose keys are as follows:
j_id: the neighbour's item id
i_j_sim: the degree of similarity between $i_id and $j_id
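To make the combined behaviour concrete, here is a sketch of what "the k most similar items, provided their similarity exceeds the threshold" means when applied to a plain array. It illustrates the semantics only and is not the code inside cf; the function name k_thresholded and the data are made up.

```php
<?php
// Hypothetical illustration: keep neighbours above $threshold, then the top $k.
function k_thresholded(array $neighbours, int $k, float $threshold): array
{
    // Keep only neighbours whose similarity strictly exceeds the threshold.
    $kept = array_filter($neighbours, function ($n) use ($threshold) {
        return $n['i_j_sim'] > $threshold;
    });
    // Most similar first (usort also reindexes the array from 0).
    usort($kept, function ($x, $y) {
        return $y['i_j_sim'] <=> $x['i_j_sim'];
    });
    // At most $k survive; fewer if not enough passed the threshold.
    return array_slice($kept, 0, $k);
}

$all = [
    ['j_id' => 4, 'i_j_sim' => 0.31],
    ['j_id' => 9, 'i_j_sim' => 0.87],
    ['j_id' => 2, 'i_j_sim' => 0.55],
    ['j_id' => 6, 'i_j_sim' => 0.12],
];
$top2 = k_thresholded($all, 2, 0.3); // three pass the threshold, two are kept
$few = k_thresholded($all, 5, 0.5);  // only two pass, so fewer than $k come back
echo count($top2), " and ", count($few), "\n";
```

The second call shows why the result can be shorter than $k: the threshold is applied first, and only the items that pass it are candidates for the top $k.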