CS6120 Assignment 2007-2008

Experiments with the MovieLens Datasets

Write a paper entitled Experiments with the MovieLens Datasets.

You are given five databases of movie ratings and five corresponding sets of test data, as follows:

Database name    Test data
cs6120_db1       cs6120_mltd_1.txt
cs6120_db2       cs6120_mltd_2.txt
cs6120_db3       cs6120_mltd_3.txt
cs6120_db4       cs6120_mltd_4.txt
cs6120_db5       cs6120_mltd_5.txt

A particular test dataset is to be used only with its corresponding database.

Use the data to run experiments, and present the details of the experiments and their results in your paper. Ideally, your experiments will make comparisons, e.g.:

Present your results in your paper. Give precise, concise explanations of what is being compared, what is being measured, and your experimental methodology.

Review the relevant research literature and discuss it in your paper. For example, you might discuss the usefulness of MAE as a measure of the quality of a prediction algorithm; or the difficulty of evaluating a recommendation algorithm; or other ways of evaluating prediction/recommendation systems. Your research may even help you to design new experiments (different algorithms, different formulae, different evaluation criteria, etc.) that you can run and whose results you can include in your paper.
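As a reminder of what MAE measures, here is a minimal sketch: it averages the absolute differences between predicted ratings and the actual ratings held back as test data. It is written in Python purely for illustration, and the same few lines carry over directly to a PHP script. The function name and the sample ratings are made up for the example; they are not part of the supplied databases or test files.

```python
def mean_absolute_error(predicted, actual):
    """Return the mean absolute error between two equal-length lists of ratings."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not predicted:
        raise ValueError("at least one rating is required")
    # Sum the absolute prediction errors, then divide by the number of predictions.
    total = sum(abs(p - a) for p, a in zip(predicted, actual))
    return total / len(predicted)


# Made-up ratings for illustration only; not taken from the assignment's data.
example_predicted = [4.0, 3.5, 2.0, 5.0]
example_actual = [5, 3, 2, 4]
print(mean_absolute_error(example_predicted, example_actual))
```

A lower MAE means the predictions are, on average, closer to the actual ratings. Whether that makes for a better recommender is one of the questions you might explore in your discussion.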

Both elements (experimental results and discussion of the research literature) must be present in your paper. However, those of you who are less comfortable with programming can, to a limited degree, compensate for less experimentation by including a deeper and broader discussion, with more extensive use of the research literature. Similarly, those of you who are less comfortable with analytic writing can, to a limited degree, compensate for less discussion by including more programming and rigorous experimentation.

Format and submission

Your paper should comprise the following:

  1. a short abstract (200 words or so);
  2. an introduction;
  3. one or more sections and subsections presenting your discussion of the research literature and your experimental results;
  4. a final section that offers conclusions and ideas for future work; and
  5. a list of references, i.e. sources cited in the body of your paper.

Feel free to use tables, diagrams, charts and graphs to make the presentation of your work more vivid.

Note that the references section is not a list of things you've read. It is a list of things you've cited in the body of your paper.

Your paper must not exceed 10 pages in length, including all tables, diagrams, charts and graphs, and including the list of references. Do not feel obliged to write 10 pages: quality beats quantity.

Your paper must be an MS-Word document (using Word for Mac is, of course, OK), and you should use this template. Do not alter the format at all: keep the font, the text size, the margins, etc. etc. etc. etc. etc. exactly as they are in the template.

In no way, shape or form should you submit work as if it were your own when some or all of it is not. To do so is plagiarism and will meet with severe penalties. In this assignment, you must not plagiarise the programs, results, writings or other efforts of another student or any other third party. Your papers and your PHP scripts will be checked for signs of plagiarism.

In reporting your exploration of the research literature, be careful to avoid inadvertent plagiarism (e.g. where 'paraphrases' of the source material are too close to the original) as well as deliberate plagiarism.

Small amounts of material may be quoted directly, where the exact wording of the original needs to be conveyed in your paper. But in these cases, the material must be presented within quotation marks; the quoted material must be followed by an immediate citation to your references section; and the work must be listed in your references section.

Even when not quoting directly, be scrupulous in using citations to acknowledge the influence of the research literature and to add support to claims that you make. Here again, a citation should be given immediately, and the work must be listed in your references section.

We will use the Harvard system for citations. Brief details are available here, for example: Harvard referencing. I can advise on matters of detail, if necessary.

Not only should you avoid plagiarism and any other forms of collusion, you must also avoid falsification and fabrication of results.

You may, of course, ask me questions. I may share questions and answers with the class, if I feel they are general matters, for example, of clarification. But I will also discuss with you questions that relate to your own PHP scripts, your experiments and your reading and, in the interest of giving you proper credit for your endeavours, these will not be shared with the class.

The assignment must be submitted by 10 a.m. Tuesday 18th March.

To submit:

  1. Create a folder whose name comprises your name and student id, as in this example: Hugh_Jeegoh_107123456
  2. Copy into this folder: and nothing else.
  3. Copy the folder into the CS6120 Submission Folder

When to start? Start now!

You may be running dozens and dozens of experiments. Each may take 10 minutes or more to run. So it's unwise to leave everything to the last minute. There is no way you can hope to run experiments and bring it all together in the final few hours before the deadline.

Good luck.