The following disclaimer applies, and is copied into the start of each relevant page, in case they are taken out of context.
This is just a brief overview of some techniques, presented entirely informally to give an appreciation of the basic principles by cutting away all the confusing details. It is therefore not to be taken as an exact, comprehensive summary, but merely as indicative of the underlying principles in the general case.
Artificial Neural Networks.
Don't be put off by artificial neural networks (ANNs). Whether or not you have any experience of Artificial Intelligence, they may appear forbidding and mathematical, but the basic principles are simple and there are good simulators out there for experimentation. After all, your brain is a neural network, so you already have some experience!
ANNs mimic biological brains to some extent, in that if sufficient sensory inputs are received, an output is generated. They also share many of the features of biological brains: they are self-training, degrade gracefully (!?) in the presence of damage or noisy and incomplete data, and are prone to errors somewhat analogous to optical illusions or mistaken recognition. However, they are generally nowhere near as complex as a biological brain in terms of the number of neurons and their interconnections, and brains themselves are not yet fully understood.
At a functional level they have been shown to be universal function approximators, which basically means they can learn to mimic any function. This makes them ideal for pattern recognition tasks and, by extension, control applications, since if they can recognise the situation, they can implement the appropriate control.
Another way of looking at them is through their similarity with fuzzy logic, where instead of crisp Boolean logic functions with sudden switches between states, there is a more gradual transition.
In some ways neural networks harness the advantages of both analogue and digital systems: analogue, in that the values have continuous rather than abrupt transitions; and digital, in that values are limited throughout the network, so that no strong input dominates or swamps all the others.
This leads naturally to a simple metaphorical, attribute-based explanation of a basic 3-layer feedforward neural network, which consists of an input layer, a hidden middle layer and a final output layer. Assume the network is trained, in which case all information passes forward from the input layer, through the middle layer, to the output layer.
Think of it as any diagnostic (or classification) facility, with the
- input layer, which is the source of the various samples or metrics, and which distributes them to the
- hidden middle layer, whose members are experts in diagnosing a particular attribute or condition from the collection of samples, and who then forward their results to the
- output layer, which is like a panel of combined ultimate decision makers and spokesmen, each in turn an expert in forming final conclusions based on the reports of the attribute experts in the previous stage.
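As a minimal sketch of the idea (the weights and numbers here are made up purely for illustration), the forward pass through such a 3-layer network is just two rounds of "weighted sum, then squash":

```python
import math

def sigmoid(x):
    # Squashes any weighted sum into (0, 1): a neuron's "confidence".
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs, weights, biases):
    # One layer: each neuron takes a weighted sum of all its inputs,
    # adds its bias, and reports its squashed confidence.
    return [sigmoid(sum(w * x for w, x in zip(ws, inputs)) + b)
            for ws, b in zip(weights, biases)]

def feedforward(inputs, hidden_w, hidden_b, out_w, out_b):
    # The input layer merely distributes the values; the hidden and
    # output layers do the actual computing.
    return layer(layer(inputs, hidden_w, hidden_b), out_w, out_b)

# Toy network: 2 inputs -> 2 hidden "experts" -> 1 output.
result = feedforward([0.5, 0.8],
                     hidden_w=[[1.0, -1.0], [0.5, 0.5]],
                     hidden_b=[0.0, 0.0],
                     out_w=[[1.0, 1.0]], out_b=[0.0])
```

Each hidden "expert" reports a confidence between 0 and 1, and the output neuron combines those reports in the same way.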
An illustrative example might be classification of fruit from images given to the input layer.
These images are passed to the hidden-layer experts, who are experts in estimating identifying attributes such as weight, size, shape, colour and texture from the images - yes, it's a bit of a stretch, but it suits the purpose. (The underlying reason for the hidden layer is to have enough layers to implement XOR and other functions that are not linearly separable, but never mind for now!) The middle-layer experts' confidences in, or measures of, these identifying attributes are then in turn passed to the output layer, whose members are experts in classifying fruit according to various weighted combinations of these middle-layer attributes; and so the network can classify fruit from a network of 2 layers of experts!
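To see why the hidden layer matters, here is a hand-wired sketch (weights chosen by hand, not trained) of a 2-layer network computing XOR, something no single neuron can do on its own:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def neuron(inputs, weights, bias):
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)

def xor_net(a, b):
    # Hidden expert 1 approximates OR, hidden expert 2 approximates NAND;
    # the output neuron ANDs them together, giving XOR overall.
    # Large weights make the sigmoid behave almost like a crisp switch.
    h_or   = neuron([a, b], [20, 20],  -10)
    h_nand = neuron([a, b], [-20, -20], 30)
    return neuron([h_or, h_nand], [20, 20], -30)

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, round(xor_net(a, b)))
```

A single neuron can only draw one straight dividing line through its inputs, which is not enough for XOR; the two hidden experts supply the two lines needed.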
Each expert has learned from experience (training) the importance, or weight, to allocate to each available input in reaching his conclusion, and correspondingly gives an output indicating his confidence level in his particular specialist decision. This has 2 effects: a) moderation, in that no matter how definite or strong an indicator is, the expert is limited to 100% confidence in his conclusion, so that one decision does not swamp the entire network; and b) robustness, in that his decision is based on a number of inputs and is therefore arguably quite robust, both logically and in the face of noise or missing data.
However, in practice, networks may train and settle down with their own unknown middle-layer attribute experts, and extracting rules or confidence measures from them is a non-trivial task; somewhat analogous to humans acting on a hunch, which may or may not be correct, while being unable to fully express the rationale. There always remains the risk that an ANN will fall for the equivalent of an optical illusion.
During training, feedback is used to alter the weights so as to minimise the overall error function. Once again the aim is to find the global optimum and avoid getting stuck in local optima, and so many of the standard optimisation techniques have been applied.
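That weight adjustment can be sketched for a single sigmoid neuron using gradient descent on the squared error (the "delta rule"; the learning rate and epoch count below are arbitrary illustrative choices):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train_neuron(samples, lr=0.5, epochs=5000):
    # Gradient descent on the squared error of one sigmoid neuron:
    # each weight is nudged against the slope of the error surface.
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, target in samples:
            y = sigmoid(w[0] * x[0] + w[1] * x[1] + b)
            # Derivative of 0.5*(y - target)^2 w.r.t. the weighted sum.
            delta = (y - target) * y * (1.0 - y)
            w = [wi - lr * delta * xi for wi, xi in zip(w, x)]
            b -= lr * delta
    return w, b

# Logical AND is linearly separable, so a single neuron can learn it.
data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
w, b = train_neuron(data)
predict = lambda x: sigmoid(w[0] * x[0] + w[1] * x[1] + b)
```

A full multi-layer network uses the same idea, with the error fed back through the layers (backpropagation) so that the hidden experts' weights are adjusted too.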
Likewise, since neural networks minimise an overall error function, they have been used to optimise the objective function of a solution by making the error function reflect the objective function, but this is quite challenging.
Temporal difference (TD) learning with neural networks has been used to learn to maximise the objective in board games such as backgammon.
Tricks of the trade.
- data preconditioning - generally, normalise or transform the input data to speed up training;
- bias terms - to help shift the neuron inputs into the right range for the activation function, so that the speed of learning via weight adjustment is optimised;
- split the data into representative training and validation/test sets for supervised learning.
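The first and last of these tricks might be sketched like this (the data values and the 80/20 split fraction are arbitrary illustrations):

```python
import random

def normalise(column):
    # Rescale one feature to [0, 1] so no input's raw range dominates.
    lo, hi = min(column), max(column)
    return [(v - lo) / (hi - lo) for v in column]

def split(samples, train_fraction=0.8, seed=1):
    # Shuffle, then hold back a portion as an unseen validation set.
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

grams = [120, 180, 95, 300, 150]           # e.g. fruit weights in grams
scaled = normalise(grams)                   # all values now in [0, 1]
labels = ["apple", "apple", "plum", "orange", "apple"]
train, valid = split(list(zip(scaled, labels)))
```

The network never sees the validation set during training, so its performance there indicates how well the learning generalises rather than memorises.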
However, it is best to use a good investigative simulator, such as:
http://www-ra.informatik.uni-tuebingen.de/SNNS/
More, if not far too much, information is available at the following link:
http://www.faqs.org/faqs/ai-faq/neural-nets/