Neural Networks
Lecture on 15.11.2007
Neuron Activation
- The axons of many neurons attach to a single neuron
- The neuron “sums” the activation of all its inputs
- If the summed input is above a threshold, the neuron fires
Perceptrons
- Output: sum of inputs*weights, passed through a threshold function
- Can learn AND and OR, but not XOR
- Can only represent linearly separable functions
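The points above can be sketched with the classic perceptron learning rule. This is a minimal illustration, not code from the lecture: the names `step` and `train_perceptron`, the learning rate and the epoch limit are all assumptions chosen for the example.

```python
def step(x):
    """Hard threshold: fire (1) if the weighted sum is above 0."""
    return 1 if x > 0 else 0


def train_perceptron(samples, epochs=100, lr=1.0):
    """Train a two-input perceptron; return (weights, bias, converged)."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for (x1, x2), target in samples:
            out = step(w[0] * x1 + w[1] * x2 + b)
            delta = target - out
            if delta != 0:
                # Move the decision line towards the misclassified example.
                errors += 1
                w[0] += lr * delta * x1
                w[1] += lr * delta * x2
                b += lr * delta
        if errors == 0:
            # A full pass without mistakes: the data is linearly separated.
            return w, b, True
    return w, b, False


inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
AND = list(zip(inputs, [0, 0, 0, 1]))
OR = list(zip(inputs, [0, 1, 1, 1]))
XOR = list(zip(inputs, [0, 1, 1, 0]))

and_converged = train_perceptron(AND)[2]
or_converged = train_perceptron(OR)[2]
xor_converged = train_perceptron(XOR)[2]
```

AND and OR each reach an error-free pass, while XOR cannot: no single straight line separates its two classes, so the loop runs out of epochs.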
Multi-layer NNs
- Add a ‘hidden’ layer!
- Can approximate any “reasonable” function (given enough nodes in the hidden layer)
- But how to train? This stopped research for ~30 years
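To see why a hidden layer helps, here is a small sketch of a network with two hidden units that computes XOR. The weights are set by hand for illustration (they are an assumption of this example, not learned and not taken from the lecture): one hidden unit acts as OR, the other as AND, and the output fires for "OR but not AND".

```python
def step(x):
    """Hard threshold: fire (1) if the weighted sum is above 0."""
    return 1 if x > 0 else 0


def xor_net(x1, x2):
    # Hidden layer: two perceptron-style units with hand-picked weights.
    h_or = step(x1 + x2 - 0.5)    # fires if at least one input is 1
    h_and = step(x1 + x2 - 1.5)   # fires only if both inputs are 1
    # Output unit: OR is on and AND is off.
    return step(h_or - h_and - 0.5)


outputs = [xor_net(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]]
```

Each unit on its own is still a linear separator; it is the combination through the hidden layer that makes XOR representable. Finding such weights automatically is the training problem raised above.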
Back-propagation
- Paul Werbos in 1974, further developed in 1986
- Requires the threshold function to be differentiable
- Error is ‘propagated’ backwards through the network: the weights of units that affected the output the most change the most
Threshold Function
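The notes do not say which function was shown here. A common differentiable choice is the logistic sigmoid, which is assumed in the sketch below; the function names are illustrative.

```python
import math


def sigmoid(x):
    """Smooth, differentiable replacement for a hard threshold."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_derivative(x):
    """Derivative of the sigmoid, expressed through its own output."""
    s = sigmoid(x)
    return s * (1.0 - s)


# Compare the analytic derivative with a finite-difference estimate at x = 0.7.
eps = 1e-6
numeric = (sigmoid(0.7 + eps) - sigmoid(0.7 - eps)) / (2 * eps)
analytic = sigmoid_derivative(0.7)
```

The derivative can be computed from the unit's output alone, which is convenient when the error is passed backwards through the network.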
Algorithm
1. Randomize weights
2. For each example e in training set
   - “Activate” network: O = net(e); T = ‘correct’ output; error = (T - O)
   - Calculate delta for each weight from hidden to output
   - Calculate delta for each weight from input to hidden
   - Update weights by delta
3. Goto 2 unless error low enough
For each example in the training set, you first make a forward pass and let the net calculate its output. Second, you determine the error for that example by comparing the output with the correct value. If the errors are below a chosen threshold, you can stop training; otherwise you continue with the backward pass, which is the innovative part: the error is passed backwards through the network and the weights are adjusted accordingly. You repeat this as long as the error is too high. But be careful not to overtrain the net, or it will only recognize the specific examples it was trained on.
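The procedure described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the lecture's own code: it assumes a sigmoid threshold function, one hidden layer, a single output, XOR as the training set, and weight updates after every example. The name `train_xor` and all parameter values (`hidden`, `lr`, `max_epochs`, `target_error`, `seed`) are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def train_xor(hidden=3, lr=0.5, max_epochs=10000, target_error=0.01, seed=0):
    rng = random.Random(seed)
    data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

    # Step 1: randomize weights. Each hidden unit holds [w_x1, w_x2, bias];
    # the output unit holds one weight per hidden unit plus a bias.
    w_h = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(hidden)]
    w_o = [rng.uniform(-1, 1) for _ in range(hidden + 1)]

    def forward(x):
        h = [sigmoid(w[0] * x[0] + w[1] * x[1] + w[2]) for w in w_h]
        o = sigmoid(sum(w_o[j] * h[j] for j in range(hidden)) + w_o[hidden])
        return h, o

    def mean_squared_error():
        return sum((t - forward(x)[1]) ** 2 for x, t in data) / len(data)

    history = [mean_squared_error()]
    for _ in range(max_epochs):
        for x, t in data:
            # Forward pass: activate the network.
            h, o = forward(x)
            # Backward pass: output delta, then hidden deltas computed
            # with the output weights as they were before this update.
            d_o = (t - o) * o * (1.0 - o)
            d_h = [d_o * w_o[j] * h[j] * (1.0 - h[j]) for j in range(hidden)]
            # Update hidden-to-output weights.
            for j in range(hidden):
                w_o[j] += lr * d_o * h[j]
            w_o[hidden] += lr * d_o
            # Update input-to-hidden weights.
            for j in range(hidden):
                w_h[j][0] += lr * d_h[j] * x[0]
                w_h[j][1] += lr * d_h[j] * x[1]
                w_h[j][2] += lr * d_h[j]
        history.append(mean_squared_error())
        # Stop once the error is low enough.
        if history[-1] < target_error:
            break
    return forward, history


forward, history = train_xor()
```

`history` records the mean squared error before training and after each pass over the data, so plotting it shows how the error develops. Whether the run reaches `target_error` depends on the random starting weights and the learning rate.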
Batch vs. Online Learning
If the weights are updated only after the deltas have been calculated for the entire training set, this is called batch learning.
If the weights are adjusted after each example, this is called online learning.
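The difference is easiest to see on a tiny model. The sketch below assumes a single linear unit with one weight `w` that should learn y = 2x; the data, the learning rate and the function names are made up for illustration.

```python
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]
lr = 0.05


def batch_epoch(w):
    """Sum the deltas over the whole training set, then update once."""
    total_delta = sum((y - w * x) * x for x, y in zip(xs, ys))
    return w + lr * total_delta


def online_epoch(w):
    """Update the weight immediately after every single example."""
    for x, y in zip(xs, ys):
        w += lr * (y - w * x) * x
    return w


# After one pass over the data the two schemes already give different weights.
w_batch_1 = batch_epoch(0.0)
w_online_1 = online_epoch(0.0)

# After many passes both settle on the same solution, w = 2.
w_batch, w_online = 0.0, 0.0
for _ in range(50):
    w_batch = batch_epoch(w_batch)
    w_online = online_epoch(w_online)
```

In the online version every example is evaluated with a weight that has already been changed by the previous examples, so the result depends on the order in which the examples are presented.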
Advantages of online learning
- often faster
- can be used when there is no fixed training set
- can be used in non-static environments
Disadvantages of online learning
- can get stuck in local minima and fail to find the global minimum
- no natural plausibility
Learning Rate
The question is: how fast do we move towards the error minimum?
- if we move too slowly, we'll never reach the minimum in reasonable time
- if we move too fast, we'll bounce around it
- solution: trial and error
- good first guess: 1 / √(set size)
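The effect of the learning rate can be sketched on a toy error function. Everything here is an assumption made for illustration: the error E(w) = (w - 3)^2 with its minimum at w = 3, the function name `descend`, and a hypothetical training set of 100 examples for the first-guess rule.

```python
import math


def descend(lr, steps=100, start=0.0):
    """Gradient descent on the toy error E(w) = (w - 3)^2."""
    w = start
    for _ in range(steps):
        grad = 2.0 * (w - 3.0)   # dE/dw
        w -= lr * grad
    return w


set_size = 100                            # hypothetical training set size
first_guess = 1.0 / math.sqrt(set_size)   # the rule of thumb gives 0.1

too_slow = descend(0.001)         # barely moves away from the start
too_fast = descend(1.0)           # jumps back and forth across the minimum
heuristic = descend(first_guess)  # settles close to w = 3
```

With a rate of 1.0 every step overshoots the minimum by exactly the distance it started from, so the weight keeps bouncing between two values. On a real network the error surface is not a simple parabola, so the rule of thumb is only a starting point for trial and error.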


