Derek Bridge

B.Sc. and M.Sc. Projects

Overview for 2023-2024

DGB1 (BScCS). kNN versus kRNN

This project is no longer available. It has been assigned to Matt Mallen.

You want to predict the selling price of a house, q. You have a dataset, D, containing examples of recently-sold houses and their selling prices.

One simple method is k nearest-neighbours (kNN): find the k houses in D that are most similar to q. Your prediction for q is the mean of the prices of the neighbours. One variant is to calculate a weighted mean, so that the more similar houses count for more in the prediction.
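The method above can be sketched in a few lines. This is a minimal illustration, not the project's required implementation: the toy dataset, the Euclidean distance function, and the inverse-distance weighting scheme are all assumptions for the example.

```python
import numpy as np

def knn_predict(q, X, prices, k=3, weighted=False):
    """Predict the price of query house q from the k most similar houses
    in the dataset (X, prices), using Euclidean distance as similarity.
    If weighted, closer neighbours count for more via 1/(distance + eps)."""
    dists = np.linalg.norm(X - q, axis=1)      # distance from q to every example
    nn = np.argsort(dists)[:k]                 # indices of the k nearest
    if not weighted:
        return prices[nn].mean()               # plain kNN: unweighted mean
    w = 1.0 / (dists[nn] + 1e-8)               # inverse-distance weights
    return np.average(prices[nn], weights=w)   # weighted mean

# Toy illustration: houses described by (floor area, number of bedrooms)
X = np.array([[100.0, 3], [120.0, 4], [80.0, 2], [150.0, 5]])
prices = np.array([200_000.0, 250_000.0, 160_000.0, 320_000.0])
q = np.array([105.0, 3])
print(knn_predict(q, X, prices, k=2))   # mean of the two nearest prices
```

In practice the features would need scaling before distances are meaningful, and the distance function itself is a design choice.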

But what happens if some members of D are unreliable? Maybe they are noisy (e.g. erroneous) or unrepresentative. Researchers have developed many algorithms for editing D: the algorithms try to identify and remove the problematic examples from D.

In this project, we will try a different approach, which I refer to as kRNN, k reliable-nearest-neighbours. It involves computing a reliability value for each example in D. Predictions, e.g. of house prices, are then based on some combination of the house prices of the most similar, reliable examples in D. I have several ideas for different ways of computing reliability, and several ideas for combining similarity with reliability.
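To make the notion of reliability concrete, here is one hypothetical way it could be computed; this is purely an illustration and not necessarily one of the ideas the project will pursue. It scores each example by how well its price is predicted from its own neighbours (leave-one-out), so noisy examples score low.

```python
import numpy as np

def loo_reliability(X, prices, k=3):
    """A hypothetical reliability score: compare each example's actual price
    to a leave-one-out kNN prediction from the rest of D. Examples whose
    price agrees with their neighbours score close to 1; noisy or
    unrepresentative ones score lower."""
    m = len(prices)
    rel = np.empty(m)
    for i in range(m):
        dists = np.linalg.norm(X - X[i], axis=1)
        dists[i] = np.inf                      # exclude the example itself
        nn = np.argsort(dists)[:k]
        err = abs(prices[nn].mean() - prices[i])
        rel[i] = 1.0 / (1.0 + err)             # map error into (0, 1]
    return rel

# Toy illustration with one deliberately noisy example
X = np.array([[1.0], [2.0], [3.0], [2.5]])
prices = np.array([100.0, 200.0, 300.0, 5000.0])
rel = loo_reliability(X, prices, k=2)
# the out-of-line example (price 5000) receives the lowest reliability
```

A kRNN prediction could then, for instance, weight each neighbour by some combination of its similarity to q and its reliability, though how best to combine the two is exactly what the project would investigate.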

In this project, we will build and evaluate systems that explore these ideas. A good student will try several of the ideas on multiple datasets with varying degrees of artificially-introduced noise. An excellent student will compare with editing algorithms and other ideas from the relevant literature.

Background reading

  • There is another description of kNN in my AI1 lecture notes.
  • RENN and BBNR are examples of the editing algorithms mentioned above.
  • Others have published papers on reliability in kNN.

Skills required

This project is suitable for a student who wants to study the research literature, has very good programming skills and is unafraid of mathematical notation.

DGB2 (BScCS). Predicting differences using kNN+ANN

This project is no longer available. It has been assigned to Oliver Linger.

Begin by reading the description of kNN in DGB1 above. Another way to predict house prices is to train an artificial neural network (ANN) on dataset D so that, given a house, it can predict its selling price.

There are, additionally, a number of intriguing ways of combining the two, and we will investigate some of these in this project.

Let's illustrate one of these ideas here. We have a dataset D = {<x1, p1>, <x2, p2>, ..., <xm, pm>} containing descriptions of houses x1, x2, ..., xm and their selling prices p1, p2, ..., pm. We can construct another dataset D' by taking pairs of examples, <xi, pi> and <xj, pj>, from D and inserting their difference <xi - xj, pi - pj> into D'. We then train a neural network on D'. This neural network, given a pair of houses, predicts the difference in their prices. So now, to predict the price of q, we find an example <x, p> in D (e.g. using kNN), we ask the neural network to predict the difference in price between q and x, and we apply this difference to x's price, p.
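The scheme above can be sketched as follows. This is a minimal sketch under assumptions of my own: the dataset is invented, and a linear least-squares model stands in for the ANN, since the point here is the difference-dataset construction and the prediction step, not the network itself.

```python
import numpy as np

def make_difference_dataset(X, prices):
    """Build D' from D = (X, prices): every ordered pair <xi, pi>, <xj, pj>
    with i != j contributes the difference <xi - xj, pi - pj>."""
    m = len(prices)
    pairs = [(i, j) for i in range(m) for j in range(m) if i != j]
    Xd = np.array([X[i] - X[j] for i, j in pairs])
    yd = np.array([prices[i] - prices[j] for i, j in pairs])
    return Xd, yd

def predict_by_difference(q, X, prices, diff_model):
    """Find q's nearest neighbour <x, p> in D, ask the difference model for
    the predicted price gap between q and x, and add that gap to p."""
    i = np.argmin(np.linalg.norm(X - q, axis=1))
    return prices[i] + diff_model(q - X[i])

# Toy illustration: one feature (floor area); a linear model learned by
# least squares plays the role of the neural network trained on D'.
X = np.array([[80.0], [100.0], [120.0], [150.0]])
prices = np.array([160_000.0, 200_000.0, 240_000.0, 300_000.0])
Xd, yd = make_difference_dataset(X, prices)
w, *_ = np.linalg.lstsq(Xd, yd, rcond=None)
model = lambda d: float(d @ w)
print(predict_by_difference(np.array([110.0]), X, prices, model))
```

Note that D' grows quadratically in the size of D, so a real implementation would likely sample pairs rather than enumerate them all.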

In this project, we will build and evaluate one or more systems of this kind, comparing them with kNN on its own and an ANN on its own. A good student will implement several of the variants (see the Background reading below) and will apply the ideas across a range of domains, e.g. datasets that contain images, or datasets where the target is something other than a number such as a price.

Background reading

Skills required

This project is suitable for a student who wants to study the research literature, has very good programming skills and is unafraid of mathematical notation.