(PATS) Pattern Analysis - Temporal & Spatial
... several projects possible. Many phenomena are not amenable to theory-driven modelling, because the underlying theory is not well understood or the mathematics is intractable, so pattern-based, data-driven modelling is often used instead. If historical patterns are recognisable, they can be used to infer future behaviour with reasonable probability, thus allowing appropriate action to be taken. Likewise, abnormal behaviour can provide advance warning of issues requiring attention, such as breakdown, virus, fraud, intrusion etc. Welcome to the world of statistics, augmented by developments in AI for partly automated detection of patterns and inference of correlations.
Obviously data mining techniques abound here and have applications ranging from predicting consumer behaviour to diagnostics within systems and processes.
Applications include: detecting the onset of catastrophic failure of a system component, or of a cyber attack (intrusion detection, denial of service and spammer attacks); control, e.g. resource allocation and scheduling in networks, be they computer, traffic or social; demand prediction, as in jobs in the IT sector; and indeed relevance metrics for text and graphics. Naturally, they can also be used to generate harmonious or discordant music (even allowing for taste!), based on analysis of similar style sequences... after all, music is just a spatio-temporal pattern!?
The application domain may be discrete, with a finite set of values, or continuous, with an infinite range of values (which for all practical purposes can be quantised to a finite alphabet). The student therefore need not get involved with the more esoteric aspects of time-series analysis, but can restrict themselves to simple finite patterns, which in the 1-D case reduce to strings over a simple alphabet, so that parsing techniques or even simple string-matching approaches are applicable. More advanced techniques include the use of combinations of basis functions, such as sinusoidal or restricted wavelet functions, but time constraints are likely to restrict this to the more mathematically oriented. A simpler approach is to use neural networks, which can approximate a very wide class of functions, and for which many good simulators and toolkits exist.
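As a rough illustration of the quantise-then-match idea above, here is a minimal Python sketch. The threshold values, the three-letter alphabet "LMH" (low, medium, high) and the function names are assumptions made purely for the example; a real project would choose them to suit its data.

```python
from bisect import bisect_right


def quantise(series, thresholds, alphabet):
    """Map each continuous reading to one symbol of a finite alphabet.

    thresholds must be sorted ascending, and the alphabet needs exactly
    one more symbol than there are thresholds.
    """
    if len(alphabet) != len(thresholds) + 1:
        raise ValueError("need one more symbol than thresholds")
    return "".join(alphabet[bisect_right(thresholds, x)] for x in series)


def find_pattern(symbols, pattern):
    """Return every start index at which pattern occurs (overlaps allowed)."""
    hits = []
    start = symbols.find(pattern)
    while start != -1:
        hits.append(start)
        start = symbols.find(pattern, start + 1)
    return hits


# Illustrative readings only, not real data.
readings = [0.1, 0.4, 0.9, 0.5, 0.2, 0.45, 0.95, 0.3]
symbols = quantise(readings, thresholds=[0.33, 0.66], alphabet="LMH")
rises = find_pattern(symbols, "LMH")  # positions of a low-medium-high rise
```

Once the series is a string, any ordinary string-matching or parsing tool can be applied to it, which is the point of restricting the problem to a finite alphabet.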
There should be plenty of variations to support a range of projects whilst catering for individual flair and preferences !-)
Creativity and insight, lateral thinking (or disregard for conventional wisdom) and reasonably good programming skills are required.
PATS - 1 Aptitude tests as a predictor of course success.
The idea is to correlate (using statistics, a neural network, or any other method) students' performance on an online aptitude test with their known performance on an already completed course or module, and then use the correlation to predict other students' likely performance in the same courses from their aptitude test results. Due care is to be taken with privacy and with ensuring untraceability of marks. NB: 2 basic obstacles:-
i) need lots of candidates to take the test for good stats - not always possible;
ii) quite a lot of work, with test design, programming & stats analysis - more suited to 2-person projects, which are currently not approved.
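The statistical route for PATS - 1 could start from something as simple as the following sketch, which computes a Pearson correlation coefficient and a least-squares line and then uses the line as a predictor. The scores shown are invented for illustration, and the function names are hypothetical; with real (and few) candidates, the reliability of the fit would need checking, as obstacle i) warns.

```python
from math import sqrt


def _sums(xs, ys):
    """Return means and centred sums of squares/products for paired data."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples of at least 2 points")
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return mx, my, sxx, syy, sxy


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two samples."""
    _, _, sxx, syy, sxy = _sums(xs, ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant sample")
    return sxy / sqrt(sxx * syy)


def fit_line(xs, ys):
    """Least-squares slope and intercept for predicting ys from xs."""
    mx, my, sxx, _, sxy = _sums(xs, ys)
    if sxx == 0:
        raise ValueError("cannot fit a line to a constant predictor")
    slope = sxy / sxx
    return slope, my - slope * mx


# Invented, anonymised scores: aptitude test result vs. course mark.
aptitude = [40, 55, 60, 70, 85]
marks = [45, 50, 62, 68, 80]

r = pearson_r(aptitude, marks)
slope, intercept = fit_line(aptitude, marks)
predicted_mark = slope * 65 + intercept  # prediction for a new candidate
```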
PATS - 2 SPAM MAPS ... (Ir-)relevance search.
(Ir-)relevance searches and spam blocking techniques are somewhat similar, in that both try to analyse text for relevance. The basic approaches adopted include: a) parsing and semantic analysis, which is considered too difficult or costly, and unreliable for natural language, given the shades of meaning and idiom, including sarcasm and irony; b) statistics on frequencies and juxtapositions of stemmed words, which generally work as well as a), but are equally inconsistent in identifying relevance, and afford an easy workaround for spammers. Neural and/or Bayesian methods show good promise; Bayesian statistics in particular has long been recognised as a reliable, consistent and transparent learning method. However, relevance searches also fall naturally into the area of pattern analysis. It is envisaged that a simple combination of traditional methods will be evaluated, initially through literature searches, and then with a simple proof-of-concept implementation (e.g. neural and/or Bayesian methods applied to stats of word occurrences within very broadly parsed structures). Where possible, information on traffic of similar messages will also be used to identify SPAM without rejecting important alerts. It could even be used to aid project selection!?
Students should have good programming skills and, if Bayesian methods are adopted, reasonable mathematical / statistical ability.
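To give a flavour of the Bayesian option, the following is a minimal naive Bayes sketch over raw word counts with add-one (Laplace) smoothing. It is only a starting point under stated assumptions: the class name and training messages are invented, words are split on whitespace with no stemming or parsing, and traffic information is not used at all.

```python
from collections import Counter
from math import log


class NaiveBayes:
    """Tiny multinomial naive Bayes text classifier (illustrative only)."""

    def __init__(self):
        self.word_counts = {}        # label -> Counter of word frequencies
        self.doc_counts = Counter()  # label -> number of training messages
        self.vocab = set()           # every word seen during training

    def train(self, text, label):
        words = text.lower().split()
        self.word_counts.setdefault(label, Counter()).update(words)
        self.doc_counts[label] += 1
        self.vocab.update(words)

    def classify(self, text):
        if not self.doc_counts:
            raise ValueError("classifier has not been trained")
        words = text.lower().split()
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, float("-inf")
        for label, counts in self.word_counts.items():
            # Log prior plus summed log likelihoods; add-one smoothing
            # keeps unseen words from zeroing the whole product.
            score = log(self.doc_counts[label] / total_docs)
            denom = sum(counts.values()) + len(self.vocab)
            for word in words:
                score += log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented training messages, purely for illustration.
nb = NaiveBayes()
nb.train("win cash prize now", "spam")
nb.train("cheap prize win win", "spam")
nb.train("project meeting at noon", "ham")
nb.train("submit project report now", "ham")
```

Because every decision is a sum of per-word log probabilities, it is easy to inspect which words pushed a message towards "spam", which is one sense in which the method is transparent.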
PATS - 3 Pattern analysis based prediction for supply chain management.
The aim is to anticipate demand from previous patterns, so as to achieve acceptable response times; in this sense it resembles many on-line real-time resource allocation problems, e.g. scheduling/memory allocation and QoS within any networked system. There are a number of fundamental approaches possible. Applications include any retail supply chain (e.g. e-comm with JIT stock control), where the aim should also be to minimise stock holding costs (expense and storage), but the emphasis is on pattern analysis, prediction and allocation rather than web-based e-comm.
PATS - 4 Time-based spectral analysis for identification of chemical or physical processes.
Physical & chemical processes typically emit (or absorb) energy during transitions. Such energy is generally emitted either as electromagnetic radiation (e.g. light, x-rays, radio or heat) or as sound, at frequencies characteristic of the underlying process. Examples include optical spectroscopy to identify traces of compounds in samples, mechanical strain in materials, and even acoustic properties prior to failure of a mechanical component. The latter applies even to events of seismic proportion such as earthquakes and volcanoes, which is supposedly why animals frequently become agitated and flee prior to such impending disasters, since they hear sounds beyond our frequency range. The aim in this project is to map the change in frequencies over time and use it as a signature to identify the underlying process, speech recognition being a particularly complex example. A much simpler goal is attempted here: to map the peaks and troughs of a spectrum (basically a histogram of frequencies, as seen in the graphic equaliser display on an audio system) over time. Data from novel real world applications are available.
PATS - 5 Signature verification.
Real-time pressure-sensitive signature verification is almost foolproof and virtually impossible to forge, because the characteristic speed and pressure over the trace of the signature are much more reliable than its visual outline. Legally, a written record of a signature is still acceptable, but relatively easy to forge. The aim of this project is to verify signatures written in the past, in an attempt to prevent fraud. As you know, these systems are commonplace and commercially available.
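One common way to compare two speed or pressure traces that were written at slightly different tempos is dynamic time warping (DTW). The brief above does not prescribe it; it is offered here only as one possible sketch. The traces, the threshold and the function names are all hypothetical, and a real threshold would have to be calibrated on genuine and forged samples.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric sequences."""
    if not a or not b:
        raise ValueError("both traces must be non-empty")
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # a advances, b holds
                cost[i][j - 1],      # b advances, a holds
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def looks_genuine(reference, candidate, threshold):
    """Accept the candidate if its warped distance is within the threshold."""
    return dtw_distance(reference, candidate) <= threshold


# Hypothetical pen-pressure samples taken along the signature trace.
reference = [1, 2, 3, 2, 1]
slower_copy = [1, 2, 2, 3, 3, 2, 1]  # same shape, written more slowly
different = [3, 1, 3, 1, 3]
```

DTW forgives differences in writing speed but not in the shape of the trace, which is why the slower copy above aligns perfectly while the differently shaped trace does not.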
PATS - 6 Character recognition.
Basically a simpler version of the above, for use in optical character recognition; it can be used to recognise basic restricted printed character sets, or extended for handwriting recognition. For the former, certain graphics filters with renormalisation (shift & stretch) to match a standard template might suffice as a first approximation. This approach could be extended through the use of more general pattern-matching techniques involving neural networks, and applied to recognition of characters printed by machine or by hand, or ultimately to handwriting. As you know, these systems are commonplace and commercially available.
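The shift & stretch renormalisation followed by template matching might look roughly like the sketch below. It assumes glyphs arrive as rectangular grids of '#' (ink) and '.' (blank), uses nearest-neighbour resampling to a 4x4 grid, and compares by counting differing pixels; the two-letter template set and all names are invented for the example.

```python
def crop(glyph):
    """Shift: trim blank rows and columns around the inked area."""
    rows = [r for r, row in enumerate(glyph) if "#" in row]
    if not rows:
        raise ValueError("glyph contains no ink")
    width = len(glyph[0])
    cols = [c for c in range(width) if any(row[c] == "#" for row in glyph)]
    return [row[cols[0]:cols[-1] + 1] for row in glyph[rows[0]:rows[-1] + 1]]


def stretch(glyph, size):
    """Stretch: resample to size x size by nearest-neighbour lookup."""
    h, w = len(glyph), len(glyph[0])
    return [
        "".join(glyph[r * h // size][c * w // size] for c in range(size))
        for r in range(size)
    ]


def normalise(glyph, size=4):
    return stretch(crop(glyph), size)


def distance(a, b):
    """Number of pixels that differ between two equal-sized grids."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def recognise(glyph, templates, size=4):
    """Return the name of the template closest to the normalised glyph."""
    g = normalise(glyph, size)
    return min(templates,
               key=lambda name: distance(g, normalise(templates[name], size)))


# Hypothetical templates and a shifted, enlarged sample of the letter L.
TEMPLATES = {
    "L": ["#..", "#..", "###"],
    "T": ["###", ".#.", ".#."],
}
sample = [
    "........",
    "..##....",
    "..##....",
    "..##....",
    "..##....",
    "..######",
    "..######",
    "........",
]
```

A fixed template set like this only copes with a restricted printed character set; variation in stroke shape is where the more general neural-network techniques would take over.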
PATS - 7 Biometric Identification
Might as well go the whole way and investigate other aspects of pattern recognition in biometrics: fingerprints, iris, voice pattern and face recognition. As you know, these systems are commonplace and commercially available.
To get a sample of what is available commercially:
http://www.neurotechnologija.com/
Coincidentally or otherwise, they also have some interest in robots, partly due to the use of neural networks in pattern recognition for scene recognition purposes, and also the use of neural networks for control & diagnostic purposes.
PATS - 8 Weather Patterns
Of daily if not hourly interest. Rather than build a global model, attempt to build and verify a local pattern-based predictor, which performs time-series correlations of past weather with local measurements (temperature, humidity, visibility etc.), for which local data is readily available, e.g. from the airport. You could possibly extend the predictive range by including cloud patterns or satellite weather maps. Of course, you could also extend it to climate prediction or other speculative interests, such as the usual weather folklore: red skies by night or morning, delight or warning?; the birds and the bees, and their nests in the trees, the fish in the sea, etc. (Note that there are sound scientific reasons for the red sky inferences, and it is an illustration of deductive reasoning, with comments on these observations recorded as far back as Biblical times, long before science could explain it.)
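One very simple local pattern-based predictor is an "analog" forecast: find the stretch of past readings that most resembles the latest readings and predict whatever followed it. The sketch below is one possible reading of the idea, not a prescribed method; the temperature values, the window length and the function name are invented for illustration.

```python
def analog_forecast(history, window=3):
    """Predict the next value from the most similar past window.

    Compares the latest `window` readings with every earlier window that
    is followed by a known reading, using squared Euclidean distance, and
    returns the reading that followed the best match.
    """
    if len(history) <= window:
        raise ValueError("history must be longer than the window")
    recent = history[-window:]
    best_next, best_dist = None, float("inf")
    # Stop early enough that each candidate has a known successor,
    # which also excludes the most recent window itself.
    for start in range(len(history) - window):
        candidate = history[start:start + window]
        dist = sum((x - y) ** 2 for x, y in zip(candidate, recent))
        if dist < best_dist:
            best_dist, best_next = dist, history[start + window]
    return best_next


# Invented temperature readings (degrees C), for illustration only.
temperatures = [10, 12, 15, 11, 10, 12, 15]
forecast = analog_forecast(temperatures, window=3)
```

The same function could be run over vectors of several measurements (temperature, humidity, visibility) by replacing the per-value difference with a distance between vectors; verifying it against held-back data would show whether it beats simply repeating the last reading.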