DGB1 (BScCs). kNN versus kRNN
This project is no longer available. It has been assigned to Matt Mallen.
You want to predict the selling price of a house, q. You have a dataset, D, containing examples of recently sold houses and their selling prices.
One simple method is k nearest-neighbours (kNN): find the k houses in D that are most similar to q, and predict the mean of their prices. One variant calculates a weighted mean, so that the more similar houses count for more in the prediction.
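To make the two variants concrete, here is a minimal sketch, assuming houses are numeric feature vectors and similarity is inverse Euclidean distance; the names knn_predict, X, y and q are illustrative, not part of the project:

```python
import numpy as np

def knn_predict(X, y, q, k=3, weighted=False):
    """Predict the price of query q from its k nearest neighbours in (X, y)."""
    dists = np.linalg.norm(X - q, axis=1)  # distance from q to every example in D
    nn = np.argsort(dists)[:k]             # indices of the k most similar houses
    if not weighted:
        return y[nn].mean()                # plain kNN: unweighted mean of prices
    w = 1.0 / (dists[nn] + 1e-8)           # more similar => larger weight
    return np.average(y[nn], weights=w)    # weighted-mean variant

# Tiny made-up example: features are floor area (m^2) and bedroom count.
X = np.array([[90.0, 2.0], [120.0, 3.0], [60.0, 1.0]])
y = np.array([250_000.0, 340_000.0, 180_000.0])
print(knn_predict(X, y, q=np.array([100.0, 3.0]), k=2, weighted=True))
```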
But what happens if some members of D are unreliable? They might be noisy (e.g. contain errors) or unrepresentative. Researchers have developed many algorithms for editing D: these algorithms try to identify and remove the problematic examples from D.
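To illustrate the editing idea (and only as an illustration: the tolerance rule below is an assumption made for this sketch, not a description of any particular published algorithm), one ENN-style pass adapted to regression might drop every example whose own price disagrees too sharply with what its neighbours would predict for it. This reuses knn_predict from the sketch above:

```python
def edit_dataset(X, y, k=3, tol=0.25):
    """Return the indices of the examples kept after one editing pass."""
    keep = []
    for i in range(len(X)):
        mask = np.arange(len(X)) != i                # leave example i out of D
        pred = knn_predict(X[mask], y[mask], X[i], k=k)
        if abs(pred - y[i]) <= tol * abs(y[i]):      # do its neighbours roughly agree?
            keep.append(i)                           # yes: keep it; otherwise drop it
    return np.array(keep)
```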
In this project, we will try a different approach, which I refer to as kRNN: k reliable-nearest-neighbours. It involves computing a reliability value for each example in D. Predictions, e.g. of house prices, are then based on some combination of the prices of the most similar, reliable examples in D. I have several ideas for different ways of computing reliability, and several ideas for ways of combining similarity with reliability.
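The brief deliberately leaves these choices open; purely as one illustration, the sketch below scores an example as reliable when its own neighbours predict its price well, and then weights each neighbour by similarity times reliability. Both formulas are assumptions made for the sketch, and knn_predict is reused from above:

```python
def reliability(X, y, k=3):
    """Score each example in (0, 1]: high when its neighbours predict it well."""
    r = np.empty(len(X))
    for i in range(len(X)):
        mask = np.arange(len(X)) != i                # leave-one-out prediction
        pred = knn_predict(X[mask], y[mask], X[i], k=k)
        r[i] = 1.0 / (1.0 + abs(pred - y[i]) / max(abs(y[i]), 1e-8))
    return r

def krnn_predict(X, y, r, q, k=3):
    """kRNN sketch: weight the k nearest neighbours by similarity * reliability."""
    dists = np.linalg.norm(X - q, axis=1)
    nn = np.argsort(dists)[:k]
    w = r[nn] / (dists[nn] + 1e-8)       # one possible combination rule of many
    return np.average(y[nn], weights=w)
```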
We will build and evaluate systems that explore these ideas. A good student will try several of the ideas on multiple datasets with varying degrees of artificially introduced noise. An excellent student will also compare against editing algorithms and other ideas from the relevant literature.
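As a sketch of what one evaluation run might look like, using the functions above (the synthetic data, multiplicative noise model and mean-absolute-error metric are all stand-ins; a real study would use actual housing datasets):

```python
rng = np.random.default_rng(0)

def add_noise(y, fraction=0.2, scale=0.5):
    """Corrupt a random fraction of the prices with multiplicative noise."""
    y_noisy = y.copy()
    idx = rng.choice(len(y), size=int(fraction * len(y)), replace=False)
    y_noisy[idx] *= 1.0 + rng.normal(0.0, scale, size=len(idx))
    return y_noisy

def mae(X_tr, y_tr, X_te, y_te, predict):
    """Mean absolute error of a prediction function over the test queries."""
    return float(np.mean([abs(predict(X_tr, y_tr, q) - t) for q, t in zip(X_te, y_te)]))

# Synthetic stand-in data: four numeric features, prices roughly linear in them.
X_all = rng.uniform(0.0, 1.0, size=(200, 4))
y_all = 100_000 + 400_000 * X_all @ np.array([0.5, 0.2, 0.2, 0.1])
X_tr, X_te, y_tr, y_te = X_all[:150], X_all[150:], y_all[:150], y_all[150:]

y_noisy = add_noise(y_tr)          # artificially introduce noise into training prices
r = reliability(X_tr, y_noisy)     # reliability scores computed on the noisy data
print(mae(X_tr, y_noisy, X_te, y_te, lambda X, y, q: knn_predict(X, y, q, k=5)))
print(mae(X_tr, y_noisy, X_te, y_te, lambda X, y, q: krnn_predict(X, y, r, q, k=5)))
```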
Background reading
- There is another description of kNN in my AI1 lecture notes.
- RENN and BBNR are examples of the editing algorithms mentioned above.
- Others have published papers on reliability in kNN.
Skills required
This project is suitable for a student who wants to study the research literature, has very good programming skills, and is unafraid of mathematical notation.