Hello, I'm Ganesh. I'm building git-lrc, an AI code reviewer that runs on every commit. It is free, unlimited, and source-available on GitHub. Star git-lrc on GitHub to help more developers discover the project. Do give it a try and share your feedback for improving the product.
In the previous article, we derived the formula for predicting b3. Now we will see how to derive the weights w3 and w4.
How Are the Weights Connected to the Previous Layer's Output?
In the previous derivation we treated b3 as the variable and w3 and w4 as constants.
To find the values of w3 and w4, we now need to treat them as variables as well.
The weights w3 and w4 multiply the activation function outputs of the top and bottom neurons, respectively.
So, the activation function output of the top neuron is y1.
As it is a softplus function:
x1 = input * w1 + b1
y1 = f(x1) = log(1 + e^x1)
Similarly, for the bottom neuron:
x2 = input * w2 + b2
y2 = f(x2) = log(1 + e^x2)
So, finally, the prediction is
Predicted = y1 * w3 + y2 * w4 + b3
and the error is measured by the sum of squared residuals (SSR):
SSR = Σ (observed − predicted)²
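The forward pass and the SSR described so far can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`softplus`, `forward`, `ssr`), the three sample data points, and the starting parameter values are all hypothetical and do not come from the article.

```python
import math

def softplus(x):
    """Softplus activation: log(1 + e^x)."""
    return math.log(1.0 + math.exp(x))

def forward(inp, w1, b1, w2, b2, w3, w4, b3):
    """Return (y1, y2, predicted) for a single input value."""
    x1 = inp * w1 + b1          # input to the top neuron
    x2 = inp * w2 + b2          # input to the bottom neuron
    y1 = softplus(x1)           # output of the top neuron
    y2 = softplus(x2)           # output of the bottom neuron
    predicted = y1 * w3 + y2 * w4 + b3
    return y1, y2, predicted

def ssr(inputs, observed, params):
    """Sum of squared residuals over all data points."""
    total = 0.0
    for inp, obs in zip(inputs, observed):
        _, _, pred = forward(inp, *params)
        total += (obs - pred) ** 2
    return total

# Hypothetical data and parameters, chosen only for illustration.
inputs = [0.0, 0.5, 1.0]
observed = [0.0, 1.0, 0.0]
params = (1.0, 0.0, -1.0, 0.0, 0.5, 0.5, 0.0)  # w1, b1, w2, b2, w3, w4, b3

print("SSR:", ssr(inputs, observed, params))
```

Keeping y1 and y2 in the return value of `forward` is a deliberate choice here, since the derivatives with respect to w3 and w4 need exactly those two values.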
How Is Each Value Calculated?
Now we apply the chain rule to the previous formula and take the derivative w.r.t. w3, w4 and b3.
dSSR/dw3 = dSSR/d(predicted) * d(predicted)/dw3
dSSR/dw4 = dSSR/d(predicted) * d(predicted)/dw4
dSSR/db3 = dSSR/d(predicted) * d(predicted)/db3
We can see that dSSR/d(predicted) is common to all three derivatives. Since SSR = Σ (observed − predicted)², for each data point we get
dSSR/d(predicted) = 2 * (Observed - Predicted) * -1 = -2 * (Observed - Predicted)
Now, for d(predicted)/dw3:
d(predicted)/dw3 = d(y1 * w3 + y2 * w4 + b3)/dw3 = y1
as all the remaining terms are constant w.r.t. w3.
Similarly, for d(predicted)/dw4:
d(predicted)/dw4 = d(y1 * w3 + y2 * w4 + b3)/dw4 = y2
as all the remaining terms are constant w.r.t. w4.
Now, for d(predicted)/db3:
d(predicted)/db3 = d(y1 * w3 + y2 * w4 + b3)/db3 = 1
as all the remaining terms are constant w.r.t. b3.
Now, finally, we get
dSSR/dw3 = dSSR/d(predicted) * d(predicted)/dw3 = 2 * (Observed - Predicted) * -1 * y1 = -2 * (Observed - Predicted) * y1
dSSR/dw4 = dSSR/d(predicted) * d(predicted)/dw4 = 2 * (Observed - Predicted) * -1 * y2 = -2 * (Observed - Predicted) * y2
dSSR/db3 = dSSR/d(predicted) * d(predicted)/db3 = 2 * (Observed - Predicted) * -1 * 1 = -2 * (Observed - Predicted)
With more than one data point, each derivative is the sum of these terms over all data points, because SSR itself is a sum.
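One way to sanity-check chain-rule derivatives like these is to compare them against a numerical estimate. The sketch below starts from SSR = Σ (observed − predicted)², so it uses dSSR/d(predicted) = -2 * (observed - predicted) for each data point, and compares the result with a central finite difference. The helper names, the sample data, and the parameter values are assumptions made up for this example.

```python
import math

def softplus(x):
    """Softplus activation: log(1 + e^x)."""
    return math.log(1.0 + math.exp(x))

def predict(inp, hidden, w3, w4, b3):
    """Return (y1, y2, predicted); hidden = (w1, b1, w2, b2)."""
    w1, b1, w2, b2 = hidden
    y1 = softplus(inp * w1 + b1)
    y2 = softplus(inp * w2 + b2)
    return y1, y2, y1 * w3 + y2 * w4 + b3

def ssr(inputs, observed, hidden, w3, w4, b3):
    """Sum of squared residuals over all data points."""
    return sum((obs - predict(inp, hidden, w3, w4, b3)[2]) ** 2
               for inp, obs in zip(inputs, observed))

def gradients(inputs, observed, hidden, w3, w4, b3):
    """Chain-rule derivatives of SSR w.r.t. w3, w4 and b3."""
    d_w3 = d_w4 = d_b3 = 0.0
    for inp, obs in zip(inputs, observed):
        y1, y2, pred = predict(inp, hidden, w3, w4, b3)
        common = -2.0 * (obs - pred)   # dSSR/d(predicted) for this point
        d_w3 += common * y1            # d(predicted)/dw3 = y1
        d_w4 += common * y2            # d(predicted)/dw4 = y2
        d_b3 += common * 1.0           # d(predicted)/db3 = 1
    return d_w3, d_w4, d_b3

# Hypothetical data and parameters, chosen only for illustration.
inputs = [0.0, 0.5, 1.0]
observed = [0.0, 1.0, 0.0]
hidden = (1.0, 0.0, -1.0, 0.0)         # w1, b1, w2, b2 held constant
w3, w4, b3 = 0.5, 0.5, 0.0

analytic = gradients(inputs, observed, hidden, w3, w4, b3)

# Central finite differences as an independent check.
eps = 1e-6
numeric = (
    (ssr(inputs, observed, hidden, w3 + eps, w4, b3)
     - ssr(inputs, observed, hidden, w3 - eps, w4, b3)) / (2 * eps),
    (ssr(inputs, observed, hidden, w3, w4 + eps, b3)
     - ssr(inputs, observed, hidden, w3, w4 - eps, b3)) / (2 * eps),
    (ssr(inputs, observed, hidden, w3, w4, b3 + eps)
     - ssr(inputs, observed, hidden, w3, w4, b3 - eps)) / (2 * eps),
)

print("analytic:", analytic)
print("numeric: ", numeric)
```

If the two printed tuples agree to several decimal places, the sign and the factors in the analytic derivatives are consistent with the SSR definition.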
Improving Prediction with Self-Learning
Now we calculate dSSR/dw3, dSSR/dw4, and dSSR/db3.
Then we try to bring each derivative close to 0, because that is where the SSR (the error) reaches its minimum.
Step size = derivative * Learning rate
New w3 = Old w3 - Step size for w3
Do the same for w4 and b3.
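The update rule can be sketched as a small gradient descent loop. This is a minimal illustration, not a definitive implementation: the learning rate of 0.01, the 200 iterations, the sample data, and all function names are assumptions chosen for the example, and w1, b1, w2, b2 are held fixed so that only w3, w4 and b3 are learned.

```python
import math

def softplus(x):
    """Softplus activation: log(1 + e^x)."""
    return math.log(1.0 + math.exp(x))

def predict(inp, hidden, w3, w4, b3):
    """Return (y1, y2, predicted); hidden = (w1, b1, w2, b2)."""
    w1, b1, w2, b2 = hidden
    y1 = softplus(inp * w1 + b1)
    y2 = softplus(inp * w2 + b2)
    return y1, y2, y1 * w3 + y2 * w4 + b3

def ssr(inputs, observed, hidden, w3, w4, b3):
    """Sum of squared residuals over all data points."""
    return sum((obs - predict(inp, hidden, w3, w4, b3)[2]) ** 2
               for inp, obs in zip(inputs, observed))

def gradients(inputs, observed, hidden, w3, w4, b3):
    """Derivatives of SSR w.r.t. w3, w4 and b3, summed over the data."""
    d_w3 = d_w4 = d_b3 = 0.0
    for inp, obs in zip(inputs, observed):
        y1, y2, pred = predict(inp, hidden, w3, w4, b3)
        common = -2.0 * (obs - pred)   # dSSR/d(predicted) for this point
        d_w3 += common * y1
        d_w4 += common * y2
        d_b3 += common
    return d_w3, d_w4, d_b3

def update(old_value, derivative, learning_rate):
    """New value = old value - step size, with step size = derivative * learning rate."""
    step_size = derivative * learning_rate
    return old_value - step_size

# Hypothetical data, parameters and hyperparameters.
inputs = [0.0, 0.5, 1.0]
observed = [0.0, 1.0, 0.0]
hidden = (1.0, 0.0, -1.0, 0.0)         # w1, b1, w2, b2 held constant
w3, w4, b3 = 0.5, 0.5, 0.0
learning_rate = 0.01

initial_ssr = ssr(inputs, observed, hidden, w3, w4, b3)

for _ in range(200):
    d_w3, d_w4, d_b3 = gradients(inputs, observed, hidden, w3, w4, b3)
    # All three derivatives are computed before any parameter changes.
    w3 = update(w3, d_w3, learning_rate)
    w4 = update(w4, d_w4, learning_rate)
    b3 = update(b3, d_b3, learning_rate)

final_ssr = ssr(inputs, observed, hidden, w3, w4, b3)
print("SSR before:", initial_ssr)
print("SSR after: ", final_ssr)
```

A small learning rate keeps each step short, so the SSR shrinks gradually instead of overshooting the minimum. In practice the loop would usually stop once the step sizes become very small rather than after a fixed number of iterations.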
Conclusion
We now have an idea of how the weights are calculated for a single neuron. We can extend this idea to multiple layers.
In the next article we will see how to calculate the weights for multiple layers.
