Understanding Backpropagation: Chain Rule, SSR Gradients, and Weight Updates in Neural Networks

— Originally published at dev.to

Hello, I'm Ganesh. I'm building git-lrc, an AI code reviewer that runs on every commit. It is free, unlimited, and source-available on GitHub. Star git-lrc on GitHub to help more developers discover the project. Do give it a try and share your feedback to help improve the product.

In the previous article, we derived the formula for b3. Now we will see how to derive the weights w3 and w4.

How are the weights connected to the output of the previous layer?

In the previous derivation, we treated b3 as the variable and w3 and w4 as constants.

That means we now need to treat w3 and w4 as variables to find their values.

The weights w3 and w4 multiply the activation function outputs of the top and bottom neurons, respectively.

The output of the top neuron's activation function is y1.

Since the activation is the softplus function:

x1 = input * w1 + b1

y1 = f(x1) = log(1 + e^x1)

Similarly, for the bottom neuron:

x2 = input * w2 + b2

y2 = f(x2) = log(1 + e^x2)

So, the final prediction is

Predicted = y1 * w3 + y2 * w4 + b3
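The forward pass described above can be sketched in Python. This is a minimal illustration, assuming the natural logarithm in softplus; the function names `softplus` and `predict` and all numeric values are made up for the example and do not come from the article.

```python
import math

def softplus(x):
    """Softplus activation: log(1 + e^x). Fine for small x; exp overflows for very large x."""
    return math.log1p(math.exp(x))

def predict(inp, w1, b1, w2, b2, w3, w4, b3):
    """Forward pass: two hidden neurons with softplus, then a weighted sum plus b3."""
    x1 = inp * w1 + b1      # input to the top neuron
    x2 = inp * w2 + b2      # input to the bottom neuron
    y1 = softplus(x1)       # output of the top neuron
    y2 = softplus(x2)       # output of the bottom neuron
    return y1 * w3 + y2 * w4 + b3

# Hypothetical parameter values, chosen only for illustration.
print(predict(0.5, w1=2.0, b1=-1.0, w2=-3.0, b2=0.5, w3=0.4, w4=-0.7, b3=0.1))
```

Note that w3 and w4 only ever appear multiplied by y1 and y2, which is what makes their derivatives simple.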

The error is then measured by the sum of squared residuals (SSR):

SSR = Σ (observed − predicted)²
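As a small sketch of the SSR formula, the helper below sums the squared differences over all data points. The name `ssr` and the sample numbers are assumptions made for illustration only.

```python
def ssr(observed, predicted):
    """Sum of squared residuals over paired observed/predicted values."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted))

# Hypothetical data: three observations and a model that always predicts 0.5.
observed = [0.0, 1.0, 0.0]
predicted = [0.5, 0.5, 0.5]
print(ssr(observed, predicted))  # 0.75
```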

How Is Each Value Calculated?

Now, applying the chain rule, we take the derivative of SSR with respect to w3, w4, and b3.

dSSR/dw3 = dSSR/d(predicted) * d(predicted)/dw3

dSSR/dw4 = dSSR/d(predicted) * d(predicted)/dw4

dSSR/db3 = dSSR/d(predicted) * d(predicted)/db3

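We can see that dSSR/d(predicted) is common to all three derivatives. Differentiating SSR = Σ (observed - predicted)² with respect to predicted gives

dSSR/d(predicted) = Σ 2 * (observed - predicted) * (-1) = -2 * Σ (observed - predicted)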

Now, for d(predicted)/dw3

d(predicted)/dw3 = d(y1 * w3 + y2 * w4 + b3)/dw3 = y1

since all the remaining terms are constant with respect to w3.

Similarly, for d(predicted)/dw4

d(predicted)/dw4 = d(y1 * w3 + y2 * w4 + b3)/dw4 = y2

since all the remaining terms are constant with respect to w4.

Now, for d(predicted)/db3

d(predicted)/db3 = d(y1 * w3 + y2 * w4 + b3)/db3 = 1

since y1 * w3 and y2 * w4 are constant with respect to b3.

Finally, we get

dSSR/dw3 = dSSR/d(predicted) * d(predicted)/dw3 = Σ 2 * (observed - predicted) * (-1) * y1 = -2 * Σ (observed - predicted) * y1

dSSR/dw4 = dSSR/d(predicted) * d(predicted)/dw4 = Σ 2 * (observed - predicted) * (-1) * y2 = -2 * Σ (observed - predicted) * y2

dSSR/db3 = dSSR/d(predicted) * d(predicted)/db3 = Σ 2 * (observed - predicted) * (-1) * 1 = -2 * Σ (observed - predicted)
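One way to sanity-check a chain-rule result is to compare it with a finite-difference estimate. The sketch below computes dSSR/dw3 = -2 * Σ (observed - predicted) * y1 (and the matching expressions for w4 and b3) and compares the w3 value against a central difference. The data, the fixed hidden-layer parameters, and the function names are all hypothetical.

```python
import math

def softplus(x):
    return math.log1p(math.exp(x))

def hidden_outputs(inputs, w1, b1, w2, b2):
    """y1 and y2 for every input, with the hidden-layer parameters held fixed."""
    y1 = [softplus(x * w1 + b1) for x in inputs]
    y2 = [softplus(x * w2 + b2) for x in inputs]
    return y1, y2

def ssr(observed, y1, y2, w3, w4, b3):
    return sum((o - (a * w3 + b * w4 + b3)) ** 2
               for o, a, b in zip(observed, y1, y2))

def gradients(observed, y1, y2, w3, w4, b3):
    """Chain rule: dSSR/d(predicted) times d(predicted)/d(parameter), summed over the data."""
    d_w3 = d_w4 = d_b3 = 0.0
    for o, a, b in zip(observed, y1, y2):
        d_pred = -2.0 * (o - (a * w3 + b * w4 + b3))  # dSSR/d(predicted) for this point
        d_w3 += d_pred * a     # d(predicted)/dw3 = y1
        d_w4 += d_pred * b     # d(predicted)/dw4 = y2
        d_b3 += d_pred * 1.0   # d(predicted)/db3 = 1
    return d_w3, d_w4, d_b3

# Made-up data and parameter values, for illustration only.
inputs = [0.0, 0.5, 1.0]
observed = [0.0, 1.0, 0.0]
y1, y2 = hidden_outputs(inputs, w1=2.0, b1=-1.0, w2=-3.0, b2=0.5)
w3, w4, b3 = 0.4, -0.7, 0.1

analytic = gradients(observed, y1, y2, w3, w4, b3)

# Central finite difference for w3; it should agree closely with analytic[0].
eps = 1e-6
numeric_w3 = (ssr(observed, y1, y2, w3 + eps, w4, b3)
              - ssr(observed, y1, y2, w3 - eps, w4, b3)) / (2 * eps)
print(analytic[0], numeric_w3)
```

If the sign of the analytic gradient were flipped, the two printed numbers would disagree, so this kind of check catches sign mistakes quickly.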

Improving Predictions with Self-Learning

Now we calculate dSSR/dw3, dSSR/dw4, and dSSR/db3.

Then we adjust w3, w4, and b3 so that these derivatives move toward 0, which minimizes the error.

Step size = derivative * learning rate

New w3 = old w3 - step size for w3

Do the same for w4 and b3.
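The update rule can be sketched as a short gradient descent loop. This is an illustration under stated assumptions, not a definitive implementation: the data, the fixed hidden-layer parameters, the starting values, the learning rate of 0.05, and the step count are all hypothetical choices.

```python
import math

def softplus(x):
    return math.log1p(math.exp(x))

def ssr_and_gradients(observed, y1, y2, w3, w4, b3):
    """Return SSR and its derivatives with respect to w3, w4, and b3."""
    total = d_w3 = d_w4 = d_b3 = 0.0
    for o, a, b in zip(observed, y1, y2):
        residual = o - (a * w3 + b * w4 + b3)
        total += residual ** 2
        d_pred = -2.0 * residual   # dSSR/d(predicted) for this data point
        d_w3 += d_pred * a
        d_w4 += d_pred * b
        d_b3 += d_pred
    return total, (d_w3, d_w4, d_b3)

def train(observed, y1, y2, w3, w4, b3, learning_rate=0.05, steps=200):
    """Repeat: step size = derivative * learning rate; new value = old value - step size."""
    history = []
    for _ in range(steps):
        total, (d_w3, d_w4, d_b3) = ssr_and_gradients(observed, y1, y2, w3, w4, b3)
        history.append(total)
        w3 -= d_w3 * learning_rate
        w4 -= d_w4 * learning_rate
        b3 -= d_b3 * learning_rate
    total, _ = ssr_and_gradients(observed, y1, y2, w3, w4, b3)
    history.append(total)
    return (w3, w4, b3), history

# Made-up data; hidden-layer parameters are held fixed at illustrative values.
inputs = [0.0, 0.5, 1.0]
observed = [0.0, 1.0, 0.0]
y1 = [softplus(x * 2.0 - 1.0) for x in inputs]
y2 = [softplus(x * -3.0 + 0.5) for x in inputs]

(w3, w4, b3), history = train(observed, y1, y2, w3=0.0, w4=0.0, b3=0.0)
print("SSR before:", history[0], "SSR after:", history[-1])
```

The learning rate matters here: too large a value makes the steps overshoot and SSR grow instead of shrink, so in practice it is tuned for the data at hand.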

Conclusion

We now have an idea of how the weights are calculated for a single neuron. We can extend this idea to multiple layers.

In the next article, we will see how to calculate the weights for multiple layers.
