Why Do Neural Networks Need the Chain Rule? How do we apply it?

Jul 3 • 2 min read

Originally published at dev.to

Hello, I'm Ganesh. I'm building git-lrc, an AI code reviewer that runs on every commit. It is free, unlimited, and source-available on Github. Star git-lrc on GitHub to help more developers discover the project. Do give it a try and share your feedback for improving the product.

In the previous article, we introduced backpropagation and learned that neural networks improve by reducing prediction errors.

We also saw that backpropagation relies on two fundamental ideas:

The Chain Rule
Gradient Descent

But we haven't yet answered an important question:

How do we calculate the weights and biases that decrease the error?

To answer that, let's look at a very small neural network.

A Simple Neural Network

Imagine a neural network, similar to the previous example, with:

One input neuron
Two hidden neurons
One output neuron

Calculating the Last Bias in the Last Layer

Let's assume we already have the weights and biases of the hidden layer, and we only want to find the last bias, b3.

From gradient descent, we can update the last bias b3 using the partial derivative of the loss with respect to b3.

The error is measured with residuals, and the loss is the sum of squared residuals (SSR).
Residual = Observed - Predicted

SSR = Sum of (Observed - Predicted)^2
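As a rough illustration, the two formulas above can be sketched in Python. The function names and the sample values are hypothetical, picked only to keep the arithmetic easy to follow; they are not the training data used in this article.

```python
def residuals(observed, predicted):
    """Residual = Observed - Predicted, computed for each sample."""
    return [o - p for o, p in zip(observed, predicted)]


def ssr(observed, predicted):
    """Sum of squared residuals over all samples."""
    return sum(r ** 2 for r in residuals(observed, predicted))


# Hypothetical values for three samples (not the article's data).
observed = [0.0, 1.0, 0.0]
predicted = [0.5, 0.5, 0.5]

print(residuals(observed, predicted))  # [-0.5, 0.5, -0.5]
print(ssr(observed, predicted))        # 0.75
```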

We take 3 samples for training: the starting, middle, and ending values.

Finally, we calculate the SSR over these 3 samples.

Use of Chain Rule

So far, we have only described updating b3 with gradient descent.

Now we use the chain rule. The predicted value is generated from the weights and biases of the previous layers, plus the last bias:

Predicted = Top Layer + Bottom Layer + Bias (b3)
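To make the formula above concrete, here is a minimal sketch of the forward pass for this network shape. The article does not say which activation function the hidden neurons use, so softplus is an assumption here, and every weight and bias name (w1, b1, w2, b2, w3, w4) and value is hypothetical.

```python
import math


def softplus(x):
    """Assumed activation function; the article does not specify one."""
    return math.log(1.0 + math.exp(x))


def predict(x, w1, b1, w2, b2, w3, w4, b3):
    """Predicted = Top Layer + Bottom Layer + b3 for one input value."""
    top = softplus(x * w1 + b1) * w3      # contribution of the top hidden neuron
    bottom = softplus(x * w2 + b2) * w4   # contribution of the bottom hidden neuron
    return top + bottom + b3


# Hypothetical parameters, for illustration only.
params = dict(w1=1.0, b1=0.0, w2=1.0, b2=0.0, w3=1.0, w4=1.0)
print(predict(0.0, b3=0.0, **params))
print(predict(0.0, b3=0.5, **params))  # same output shifted up by 0.5
```

Changing b3 only shifts the prediction up or down, which is why the top and bottom terms can be treated as constants when differentiating with respect to b3.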

Using the chain rule, we can write the derivative of SSR with respect to b3 as:

dssr/db3 = dssr/dpredicted * dpredicted/db3

For dssr/dpredicted, looking at a single sample:

dssr/dpredicted = d((Observed - Predicted)^2)/dpredicted

Predicted is not a constant here: it is the variable we are differentiating with respect to. So we apply the chain rule to the inner term:

dssr/dpredicted = 2*(Observed - Predicted) * d(Observed - Predicted)/dpredicted

dssr/dpredicted = 2*(Observed - Predicted)*(-1)
dssr/dpredicted = -2*(Observed - Predicted)

For dpredicted/db3:

Predicted = Top Layer + Bottom Layer + Bias (b3)
Both Top Layer and Bottom Layer are constants for this calculation, because they do not depend on b3.
dpredicted/db3 = 1

Finally, dssr/db3 = -2*(Observed - Predicted) * 1
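One way to sanity-check this result is to compare it with a numerical slope. The sketch below does that for a single sample; hidden_sum is a hypothetical stand-in for Top Layer + Bottom Layer, and the function names and numbers are made up for illustration.

```python
def squared_residual(observed, hidden_sum, b3):
    """(Observed - Predicted)^2 for one sample, with Predicted = hidden_sum + b3."""
    predicted = hidden_sum + b3
    return (observed - predicted) ** 2


def analytic_slope(observed, hidden_sum, b3):
    """Chain rule result: dssr/dpredicted * dpredicted/db3."""
    predicted = hidden_sum + b3
    return -2.0 * (observed - predicted) * 1.0


def numeric_slope(observed, hidden_sum, b3, eps=1e-6):
    """Central finite difference, used only to check the formula."""
    up = squared_residual(observed, hidden_sum, b3 + eps)
    down = squared_residual(observed, hidden_sum, b3 - eps)
    return (up - down) / (2.0 * eps)


# Hypothetical sample: observed = 1.0, hidden_sum = 0.25, b3 = 0.0
print(analytic_slope(1.0, 0.25, 0.0))  # -1.5
print(numeric_slope(1.0, 0.25, 0.0))   # should be very close to -1.5
```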

Slope Calculation and Learning

Now we have 3 predicted values, one for each of the 3 samples, so we sum the derivative over all of them:

dssr/db3 = Σ(-2*(Observed-Predicted))

dssr/db3 = -2 * [(Observed1 - Predicted1) * 1 + (Observed2 - Predicted2) * 1 + (Observed3 - Predicted3) * 1]

dssr/db3 = -2 * [(Residual1) + (Residual2) + (Residual3)]

dssr/db3 = -2 * (ResidualSum)
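The summed form above translates almost directly into code. This is a small sketch, again using hidden_sums as a hypothetical name for the per-sample Top Layer + Bottom Layer values and made-up numbers rather than the article's training data.

```python
def slope_wrt_b3(observed, hidden_sums, b3):
    """dssr/db3 = -2 * sum of residuals, with Predicted = hidden_sum + b3."""
    residual_sum = sum(o - (h + b3) for o, h in zip(observed, hidden_sums))
    return -2.0 * residual_sum


# Hypothetical values for three samples.
observed = [0.0, 1.0, 0.0]
hidden_sums = [0.5, 0.5, 0.5]

print(slope_wrt_b3(observed, hidden_sums, b3=0.0))  # 1.0
```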

For our training data, starting from b3 = 0, I got slope = -15.7

step size = slope x learning rate

step size = -15.7 x 0.1 = -1.57

Gradient descent moves against the slope, so we subtract the step size:

new b3 = old b3 - step size

new b3 = 0 - (-1.57) = 1.57

Recalculating the slope with the new b3, we get:

slope = -6.26

step size = -6.26 x 0.1 = -0.626

new b3 = 1.57 - (-0.626) = 2.196

We repeat this calculation until the step size is close to 0.
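The whole loop can be sketched in a few lines of Python. It uses the standard gradient descent update, where the step size is subtracted from the current value (new b3 = old b3 - step size). The data, the stopping threshold min_step, and the max_iters cap are all hypothetical choices for illustration, so the numbers will not match the ones in this article.

```python
def slope_wrt_b3(observed, hidden_sums, b3):
    """dssr/db3 = -2 * sum of residuals, with Predicted = hidden_sum + b3."""
    return -2.0 * sum(o - (h + b3) for o, h in zip(observed, hidden_sums))


def fit_b3(observed, hidden_sums, b3=0.0, learning_rate=0.1,
           min_step=1e-4, max_iters=1000):
    """Repeat the update until the step size is close to 0."""
    for _ in range(max_iters):
        step_size = slope_wrt_b3(observed, hidden_sums, b3) * learning_rate
        if abs(step_size) < min_step:
            break
        b3 -= step_size  # move against the slope
    return b3


# Hypothetical data: per-sample Top Layer + Bottom Layer sums and observations.
observed = [0.0, 1.0, 0.0]
hidden_sums = [-1.0, -0.5, -1.0]

b3 = fit_b3(observed, hidden_sums)
print(round(b3, 2))
```

Because SSR is a quadratic function of b3, the slope shrinks steadily as b3 approaches the minimum, so the loop stops after a handful of iterations with this learning rate.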

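Final Result

We found the optimal value b3 ≈ 2.61. This is where the slope reaches 0: it starts at -15.7 and rises by about 6 for each unit of b3, since each of the 3 samples contributes 2.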

Conclusion

We were able to apply the chain rule, gradient descent, and backpropagation to a very small neural network.

In the next article, we will discuss how to calculate the remaining weights and biases in the same neural network.

Any feedback or contributors are welcome! It’s online, source-available, and ready for anyone to use.

⭐ Star git-lrc on GitHub
