Topic DL02: Understanding Backpropagation
A warm welcome to the second topic of the deep learning series: understanding the working of backpropagation.
Let's start with the fundamentals below.
What is Backpropagation?
The basic concept behind backpropagation is to calculate error derivatives. After the forward pass through the network, we compute the error function and then update the weights using optimization techniques such as gradient descent, stochastic gradient descent, Adam, or RMSProp. The weights that minimize the error function are then taken as the parameters of our model.
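To make this concrete, here is a minimal sketch of a single gradient descent update for a toy one-weight model; the data, the learning rate, and the variable names are illustrative assumptions, not values from this article:

```python
import numpy as np

# Hypothetical toy data (assumed for illustration): the ideal weight is 2
x = np.array([1.0, 2.0, 3.0])   # inputs
y = np.array([2.0, 4.0, 6.0])   # desired outputs

w = 0.5     # randomly initialized weight; model output is w * x
lr = 0.01   # learning rate (assumed)

# Error function: E = sum((y - w*x)^2)
# Error derivative w.r.t. the weight: dE/dw = sum(-2 * x * (y - w*x))
grad = np.sum(-2 * x * (y - w * x))

# One gradient descent step: move the weight against the gradient
w = w - lr * grad
```

Stochastic gradient descent, Adam, and RMSProp all follow this same pattern; they differ only in how the update step is scaled and accumulated.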
Why do we need Backpropagation?
While designing a neural network, the first step is to initialize the weights with some random values. So it's not guaranteed that the weight values we have selected are correct, or that they fit our model best. Since we selected the initial weight values randomly, our model's output can be way off from the actual output, i.e. the error value (loss function) is huge. We therefore need a method to change the weight parameters so that the error becomes minimal.
Working of Backpropagation on a simple dataset
Let's understand how backpropagation works with an example:
Consider the below table:

Now, the output of your model when the value of ‘W’ is 3:

As you can observe, there is a difference between the actual output and the desired output:

Now let's change the value of ‘W’ and notice the error when ‘W’ = 4:

Now, if you notice, when we increased the value of ‘W’ the error increased. So there is no point in increasing the value of ‘W’ further. But what happens if we decrease the value of ‘W’? Consider the table below:

Now, let's recap what we did:
- We first initialized ‘W’ with some random value and propagated forward.
- Then we noticed that there was some error. To reduce that error, we propagated backwards and increased the value of ‘W’.
- After that, we noticed that the error had increased, which told us that we can't keep increasing the value of ‘W’.
- So we propagated backwards again, and this time we decreased the value of ‘W’.
- Now we noticed that the error had reduced.
So, we are trying to find the weight value for which the error becomes minimal. Basically, we need to figure out whether we should increase or decrease the weight value. Once we know that, we keep updating the weight in that direction until the error stops decreasing. You might reach a point where updating the weight further makes the error increase again; at that point you need to stop, and that is your final weight value.
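The tables themselves are not reproduced here, but as a minimal sketch of this search, assume a toy dataset where the desired output is 2 times the input (the inputs 0, 1, 2 and targets 0, 2, 4 are illustrative assumptions, not the article's exact table):

```python
import numpy as np

# Hypothetical toy data (assumed): desired output is 2 * input
x = np.array([0.0, 1.0, 2.0])
y_desired = np.array([0.0, 2.0, 4.0])

def total_error(w):
    """Sum of squared differences between the model output (w * x) and the desired output."""
    return np.sum((y_desired - w * x) ** 2)

# Try the weight values from the walkthrough: start at 3, increase to 4, then decrease to 2
for w in [3, 4, 2]:
    print(f"W = {w}: total squared error = {total_error(w)}")

# Output:
# W = 3: total squared error = 5.0
# W = 4: total squared error = 20.0
# W = 2: total squared error = 0.0
```

As in the walkthrough, increasing ‘W’ from 3 to 4 makes the error worse, while decreasing it reduces the error.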
Consider the graph below, which gives you an idea of when to increase the weight and when to decrease it:

How does Backpropagation work for a neural network?
Now let's see how backpropagation works for a neural network. Consider the below neural network:

The above network contains the following:
- Two inputs
- Two hidden neurons
- Two output neurons
- Two biases
Below are the steps involved in Backpropagation:
- Step — 1: Forward Propagation
- Step — 2: Backward Propagation
- Step — 3: Putting all the values together and calculating the updated weight value
Step — 1: Forward Propagation
Let's propagate forward.

We will repeat this process for the output layer neurons, using the output from the hidden layer neurons as inputs.

Now, let's see what the value of the error is:
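The worked figures with the exact numbers are not reproduced above, so here is a minimal sketch of the full forward pass and the resulting error for this 2-2-2 network, assuming sigmoid activations and a squared error loss; every numeric value below (inputs, weights w1..w8, biases b1 and b2, targets) is an illustrative assumption, not the article's exact figure:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative values (assumed)
i1, i2 = 0.05, 0.10                        # two inputs
w1, w2, w3, w4 = 0.15, 0.20, 0.25, 0.30    # input -> hidden weights
w5, w6, w7, w8 = 0.40, 0.45, 0.50, 0.55    # hidden -> output weights
b1, b2 = 0.35, 0.60                        # hidden and output biases
t1, t2 = 0.01, 0.99                        # desired (target) outputs

# Hidden layer: weighted sum plus bias, then sigmoid activation
out_h1 = sigmoid(w1 * i1 + w2 * i2 + b1)
out_h2 = sigmoid(w3 * i1 + w4 * i2 + b1)

# Output layer: the hidden outputs serve as inputs
out_o1 = sigmoid(w5 * out_h1 + w6 * out_h2 + b2)
out_o2 = sigmoid(w7 * out_h1 + w8 * out_h2 + b2)

# Total error: sum of squared errors over both output neurons
E_total = 0.5 * (t1 - out_o1) ** 2 + 0.5 * (t2 - out_o2) ** 2
```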

Step — 2: Backward Propagation
Now, we will propagate backwards. This way we will try to reduce the error by changing the values of weights and biases.
Consider the weight W5; we will calculate the rate of change of the total error w.r.t. the change in weight W5.

Since we are propagating backwards, the first thing we need to do is calculate the change in the total error w.r.t. the outputs O1 and O2.

Now, we will propagate further backwards and calculate the change in the output O1 w.r.t. its total net input.

Now, let's see how much the total net input of O1 changes w.r.t. W5.
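Putting these three partial derivatives together with the chain rule gives the gradient of the total error w.r.t. W5. Below is a sketch using the same assumed values as the forward-pass snippet (repeated here so this snippet runs on its own):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Same illustrative values as the forward-pass sketch (assumed)
i1, i2, b1, b2, t1 = 0.05, 0.10, 0.35, 0.60, 0.01
w1, w2, w3, w4 = 0.15, 0.20, 0.25, 0.30
w5, w6 = 0.40, 0.45

out_h1 = sigmoid(w1 * i1 + w2 * i2 + b1)
out_h2 = sigmoid(w3 * i1 + w4 * i2 + b1)
out_o1 = sigmoid(w5 * out_h1 + w6 * out_h2 + b2)

# Chain rule: dE/dW5 = (dE/dout_o1) * (dout_o1/dnet_o1) * (dnet_o1/dW5)
dE_dout_o1 = -(t1 - out_o1)               # derivative of 0.5 * (t1 - out_o1)^2
dout_o1_dnet_o1 = out_o1 * (1 - out_o1)   # derivative of the sigmoid
dnet_o1_dw5 = out_h1                      # net_o1 = w5*out_h1 + w6*out_h2 + b2

dE_dw5 = dE_dout_o1 * dout_o1_dnet_o1 * dnet_o1_dw5
```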

Step — 3: Putting all the values together and calculating the updated weight value
Now, let’s put all the values together:

Let’s calculate the updated value of W5:
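Continuing the sketch above (this reuses w5 and dE_dw5 from the previous snippet), the updated weight is obtained by subtracting the gradient scaled by a learning rate; the rate of 0.5 is an assumption for illustration:

```python
lr = 0.5   # learning rate (assumed)

# Gradient descent update: move W5 against the error gradient
w5_updated = w5 - lr * dE_dw5
```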

- Similarly, we can calculate the other weight values as well.
- After that, we will propagate forward again and calculate the output, and then calculate the error once more.
- If the error is at its minimum we will stop right there; otherwise we will propagate backwards again and update the weight values.
- This process keeps repeating until the error becomes minimal; a compact sketch of the full loop is shown after this list.
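As an end-to-end sketch of that loop, here is the whole 2-2-2 network trained with repeated forward and backward passes; all values are illustrative assumptions, and the biases are kept fixed for simplicity:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative 2-2-2 network (all values assumed)
x = np.array([0.05, 0.10])            # inputs
t = np.array([0.01, 0.99])            # targets
W_h = np.array([[0.15, 0.20],         # input -> hidden weights
                [0.25, 0.30]])
W_o = np.array([[0.40, 0.45],         # hidden -> output weights
                [0.50, 0.55]])
b_h, b_o = 0.35, 0.60
lr = 0.5

for epoch in range(10000):
    # Step 1: forward propagation
    out_h = sigmoid(W_h @ x + b_h)
    out_o = sigmoid(W_o @ out_h + b_o)

    # Step 2: backward propagation (chain rule, as derived above)
    delta_o = -(t - out_o) * out_o * (1 - out_o)       # output-layer error signal
    delta_h = (W_o.T @ delta_o) * out_h * (1 - out_h)  # hidden-layer error signal

    # Step 3: gradient descent weight updates
    W_o -= lr * np.outer(delta_o, out_h)
    W_h -= lr * np.outer(delta_h, x)

print("final error:", 0.5 * np.sum((t - out_o) ** 2))
```

After enough iterations the error shrinks toward its minimum, which is exactly the stopping condition described in the list above.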
Since I might not be an expert on the topic, if you find any mistakes in the article or have any suggestions for improvement, please mention them in the comments.