Understanding Backpropagation: Chain Rule, SSR Gradients, and Weight Updates in Neural Networks


Jul 1 • 2 min read

— Originally published at dev.to

Hello, I'm Ganesh. I'm building git-lrc, an AI code reviewer that runs on every commit. It is free, unlimited, and source-available on GitHub. Star git-lrc on GitHub to help more developers discover the project. Do give it a try and share your feedback to help improve the product.

In the previous article, we derived the formula for optimizing b3. Now we will see how to derive the updates for the weights w3 and w4.

How Are the Weights Connected to the Previous Layer's Output?

In the previous derivation, we treated b3 as the variable and w3 and w4 as constants.

To find their values, we now need to treat w3 and w4 as variables as well.

The weights w3 and w4 multiply the activation-function outputs of the top and bottom neurons, respectively.

So the output of the top neuron's activation function is y1.

Since the activation is the softplus function:

x1 = input * w1 + b1

y1 = f(x1) = log(1 + e^x1)

Similarly, for the bottom neuron:

x2 = input * w2 + b2

y2 = f(x2) = log(1 + e^x2)

So the prediction is:

Predicted = y1 * w3 + y2 * w4 + b3
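The forward pass above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names `softplus` and `predict` and all parameter values in the example call are hypothetical choices made for this sketch.

```python
import math

def softplus(x):
    """Softplus activation: log(1 + e^x)."""
    return math.log(1.0 + math.exp(x))

def predict(inp, w1, b1, w2, b2, w3, w4, b3):
    """Forward pass: two hidden softplus neurons feeding one output sum."""
    x1 = inp * w1 + b1          # input to the top neuron
    x2 = inp * w2 + b2          # input to the bottom neuron
    y1 = softplus(x1)           # output of the top neuron
    y2 = softplus(x2)           # output of the bottom neuron
    return y1 * w3 + y2 * w4 + b3

# Hypothetical parameter values, for illustration only.
print(predict(0.5, w1=1.0, b1=0.0, w2=-1.0, b2=0.0, w3=1.0, w4=1.0, b3=0.0))
```

Note that w3 and w4 only scale y1 and y2, which is why the derivatives with respect to them turn out to be so simple.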

And the sum of squared residuals (SSR) over all data points is:

SSR = Σ (observed − predicted)²
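As a small sketch of the SSR formula, the function below sums the squared differences between observed and predicted values. The name `ssr` and the sample numbers are assumptions made for illustration.

```python
def ssr(observed, predicted):
    """Sum of squared residuals: sum of (observed - predicted)^2."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted))

# Hypothetical observed values and predictions, for illustration only.
observed = [0.0, 1.0, 0.0]
predicted = [0.1, 0.8, 0.3]
print(ssr(observed, predicted))  # about 0.14, up to floating-point rounding
```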

How Is Each Value Calculated?

Now we differentiate SSR with respect to w3, w4, and b3, using the chain rule.

dSSR/dw3 = dSSR/d(predicted) * d(predicted)/dw3

dSSR/dw4 = dSSR/d(predicted) * d(predicted)/dw4

dSSR/db3 = dSSR/d(predicted) * d(predicted)/db3

We can see that dSSR/d(predicted) is common to all three derivatives. For a single data point:

dSSR/d(predicted) = 2 * (Observed - Predicted) * -1 = -2 * (Observed - Predicted)

Now, for d(predicted)/dw3:

d(predicted)/dw3 = d(y1 * w3 + y2 * w4 + b3)/dw3 = y1

since all the remaining terms are constant w.r.t. w3.

Similarly, for d(predicted)/dw4:

d(predicted)/dw4 = d(y1 * w3 + y2 * w4 + b3)/dw4 = y2

since all the remaining terms are constant w.r.t. w4.

Now, for d(predicted)/db3:

d(predicted)/db3 = d(y1 * w3 + y2 * w4 + b3)/db3 = 1

since all the remaining terms are constant w.r.t. b3.

Finally, we get:

dSSR/dw3 = dSSR/d(predicted) * d(predicted)/dw3 = 2 * (Observed - Predicted) * -1 * y1 = -2 * (Observed - Predicted) * y1

dSSR/dw4 = dSSR/d(predicted) * d(predicted)/dw4 = 2 * (Observed - Predicted) * -1 * y2 = -2 * (Observed - Predicted) * y2

dSSR/db3 = dSSR/d(predicted) * d(predicted)/db3 = 2 * (Observed - Predicted) * -1 * 1 = -2 * (Observed - Predicted)

With more than one data point, each derivative is the sum of these terms over all data points.
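The three derivatives can be sketched in Python as follows. Because SSR = Σ (observed - predicted)², differentiating with respect to the prediction gives -2 * (observed - predicted) per data point, and that common factor is then multiplied by y1, y2, or 1. All function names, data, and parameter values here are hypothetical.

```python
import math

def softplus(x):
    """Softplus activation: log(1 + e^x)."""
    return math.log(1.0 + math.exp(x))

def hidden_outputs(inp, w1, b1, w2, b2):
    """Outputs y1 and y2 of the top and bottom hidden neurons."""
    return softplus(inp * w1 + b1), softplus(inp * w2 + b2)

def ssr(inputs, observed, params):
    """Sum of squared residuals for the whole data set."""
    w1, b1, w2, b2, w3, w4, b3 = params
    total = 0.0
    for inp, obs in zip(inputs, observed):
        y1, y2 = hidden_outputs(inp, w1, b1, w2, b2)
        pred = y1 * w3 + y2 * w4 + b3
        total += (obs - pred) ** 2
    return total

def output_layer_gradients(inputs, observed, params):
    """Return (dSSR/dw3, dSSR/dw4, dSSR/db3), summed over all data points."""
    w1, b1, w2, b2, w3, w4, b3 = params
    d_w3 = d_w4 = d_b3 = 0.0
    for inp, obs in zip(inputs, observed):
        y1, y2 = hidden_outputs(inp, w1, b1, w2, b2)
        pred = y1 * w3 + y2 * w4 + b3
        common = -2.0 * (obs - pred)   # dSSR/d(predicted) for this point
        d_w3 += common * y1            # d(predicted)/dw3 = y1
        d_w4 += common * y2            # d(predicted)/dw4 = y2
        d_b3 += common                 # d(predicted)/db3 = 1
    return d_w3, d_w4, d_b3

# Hypothetical data and parameters (w1, b1, w2, b2, w3, w4, b3).
inputs = [0.0, 0.5, 1.0]
observed = [0.0, 1.0, 0.0]
params = (1.0, 0.0, -1.0, 0.0, 0.5, 0.5, 0.0)
print(output_layer_gradients(inputs, observed, params))
```

One way to sanity-check formulas like these is to nudge w3, w4, or b3 by a tiny amount, recompute `ssr`, and compare the resulting slope with the analytic value.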

Improving the Prediction Through Learning

Now we calculate dSSR/dw3, dSSR/dw4, and dSSR/db3 using the current parameter values.

We then adjust the parameters step by step until these derivatives are close to 0, which is where the error is at its minimum.

Step size = derivative * Learning rate

New w3 = Old w3 - Step size for w3

Do the same for w4 and b3.
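The update rule can be sketched as a short gradient descent loop. This is only an illustration under stated assumptions: w1, b1, w2, and b2 are held fixed, the starting guesses of 0 for w3, w4, and b3 are arbitrary, and the function name `train_output_layer`, the data, the learning rate, and the step count are all hypothetical.

```python
import math

def softplus(x):
    """Softplus activation: log(1 + e^x)."""
    return math.log(1.0 + math.exp(x))

def train_output_layer(inputs, observed, w1, b1, w2, b2,
                       learning_rate=0.05, steps=1000):
    """Gradient descent on w3, w4, b3 only; hidden-layer parameters stay fixed."""
    # y1 and y2 do not depend on w3, w4, or b3, so compute them once.
    hidden = [(softplus(i * w1 + b1), softplus(i * w2 + b2)) for i in inputs]
    w3, w4, b3 = 0.0, 0.0, 0.0   # arbitrary starting guesses
    history = []                  # SSR before each update
    for _ in range(steps):
        d_w3 = d_w4 = d_b3 = 0.0
        total = 0.0
        for (y1, y2), obs in zip(hidden, observed):
            pred = y1 * w3 + y2 * w4 + b3
            total += (obs - pred) ** 2
            common = -2.0 * (obs - pred)   # dSSR/d(predicted)
            d_w3 += common * y1
            d_w4 += common * y2
            d_b3 += common
        history.append(total)
        # Step size = derivative * learning rate; new value = old value - step size.
        w3 -= d_w3 * learning_rate
        w4 -= d_w4 * learning_rate
        b3 -= d_b3 * learning_rate
    return w3, w4, b3, history

# Hypothetical data and fixed hidden-layer parameters.
w3, w4, b3, history = train_output_layer(
    [0.0, 0.5, 1.0], [0.0, 1.0, 0.0], w1=1.0, b1=0.0, w2=-1.0, b2=0.0)
print("SSR at start:", history[0], "SSR at end:", history[-1])
```

If the learning rate is too large, the steps can overshoot and the SSR can grow instead of shrinking, so a small value is the safer choice in a sketch like this.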

Conclusion

We now have an idea of how the output-layer weights w3 and w4 and the bias b3 are calculated. We can extend this idea to multiple layers.

In the next article, we will see how to calculate the weights across multiple layers.

Any feedback or contributors are welcome! It’s online, source-available, and ready for anyone to use.

⭐ Star git-lrc on GitHub
