Exploring Linear Regression: Theory & Implementation
Introduction
Linear Regression is one of the fundamental techniques in Machine Learning and Forecasting. It models the relationship between an independent variable (X) and a dependent variable (Y) by fitting a straight line to the data. The equation of this line is:
[ Y = mX + c ]
where m is the slope and c is the intercept.
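For example, with a slope of m = 2 and an intercept of c = 1, an input of X = 3 gives Y = 2(3) + 1 = 7: each unit increase in X raises Y by the slope.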
In this article, we will implement Linear Regression in two ways:
- Using Scikit-Learn: A quick and efficient approach using the LinearRegression model.
- Without Scikit-Learn: Manually computing the parameters to understand the underlying mathematics.
Understanding Linear Regression
The Mathematics Behind Linear Regression
Linear Regression aims to minimize the sum of squared differences between predicted and actual values, known as the Least Squares Method. The resulting formulas for m and c are:
Slope (m):
[ m = \frac{ n \sum(XY) - \sum X \sum Y }{ n \sum X^2 - (\sum X)^2 } ]
Intercept (c):
[ c = \frac{\sum Y - m \sum X}{n} ]
Where:
- X = Independent variable (Years of Experience)
- Y = Dependent variable (Salary)
- n = Number of data points
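To see these formulas in action on a tiny made-up dataset (not the salary data): take X = (1, 2, 3) and Y = (2, 4, 6), so n = 3, ΣX = 6, ΣY = 12, Σ(XY) = 28, ΣX² = 14. Then:
[ m = \frac{3(28) - (6)(12)}{3(14) - 6^2} = \frac{12}{6} = 2, \quad c = \frac{12 - 2(6)}{3} = 0 ]
which recovers Y = 2X exactly, as expected for perfectly linear data.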
Implementing Linear Regression
Using Scikit-Learn
The easiest way to apply Linear Regression is through Scikit-Learn. Let's walk through the implementation:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error
# Load dataset
df = pd.read_csv("Salary_dataset.csv")
df = df.drop('Unnamed: 0', axis=1)
# Prepare data
X = df[["YearsExperience"]] # Independent variable
y = df["Salary"] # Dependent variable
# Split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train Linear Regression model
model = LinearRegression()
model.fit(X_train, y_train)
# Get parameters
m = model.coef_[0] # Slope
c = model.intercept_ # Intercept
print(f"Equation of the Line: Salary = {m:.2f} * Experience + {c:.2f}")
# Predictions
y_pred = model.predict(X_test)
# Predict salary for 10 years of experience (a DataFrame keeps the
# feature name consistent with training and avoids a warning)
experience = pd.DataFrame({"YearsExperience": [10]})
predicted_salary = model.predict(experience)
print(f"Predicted Salary for 10 Years Experience: ${predicted_salary[0]:,.2f}")
# Evaluate the model
mae = mean_absolute_error(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
print(f"Mean Absolute Error: {mae:.2f}")
print(f"Root Mean Squared Error: {rmse:.2f}")
# Plot results
plt.scatter(X_train, y_train, color='blue', label="Training Data")
plt.scatter(X_test, y_test, color='red', label="Testing Data")
plt.plot(X, model.predict(X), color='green', linewidth=2, label="Regression Line")
plt.xlabel("Years of Experience")
plt.ylabel("Salary")
plt.title("Salary Prediction using Linear Regression")
plt.legend()
plt.show()
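The FAQ below also mentions the R² score. Scikit-Learn exposes it directly on any fitted regressor, so a quick check using the same model and test split as above would be:
# R² on the held-out test set: 1.0 is a perfect fit,
# 0 means no better than predicting the mean of y
r2 = model.score(X_test, y_test)
print(f"R² Score: {r2:.3f}")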
Manually Implementing Linear Regression
For a deeper understanding, we manually calculate the slope and intercept:
import numpy as np
import pandas as pd
# Load dataset
data = pd.read_csv("Salary_dataset.csv")
data = data.drop('Unnamed: 0', axis=1)
# Prepare data
y = data['Salary']
X = data['YearsExperience']
# Compute parameters manually
n = len(X)
sum_X = X.sum()
sum_y = y.sum()
sum_Xy = (X * y).sum()
sum_X_squared = (X ** 2).sum()
# Compute Slope (m) and Intercept (c)
m = (n * sum_Xy - sum_X * sum_y) / (n * sum_X_squared - sum_X ** 2)
c = (sum_y - m * sum_X) / n
# Display the equation
print(f"Equation: Salary = {m:.2f} * Experience + {c:.2f}")
# Predict salary for 10 years of experience
experience = 10
predicted_salary = m * experience + c
print(f"Predicted Salary for 10 Years Experience: ${predicted_salary:,.2f}")
# Save predictions
data['Predicted_Salary'] = m * data['YearsExperience'] + c
data.to_csv("salary_predictions.csv", index=False)
print("Predictions saved to salary_predictions.csv")
Common Questions
1. When should I use Linear Regression?
Linear Regression is useful when the relationship between the independent and dependent variables is approximately linear, and when you want a fast, interpretable baseline model. A scatter plot or a correlation check, as in the snippet below, is a quick way to test this.
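A minimal sketch of that correlation check, assuming the salary DataFrame df from the implementation above:
# Pearson correlation: values close to +1 or -1 suggest a strong linear relationship
corr = df["YearsExperience"].corr(df["Salary"])
print(f"Pearson correlation: {corr:.3f}")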
2. What are the assumptions of Linear Regression?
- The relationship between X and Y is linear.
- No multicollinearity: independent variables should not be highly correlated (this only matters with multiple predictors; our example uses just one).
- Homoscedasticity: the variance of the errors is constant.
- Residuals follow a normal distribution (this and homoscedasticity can be checked visually with a residual plot, sketched below).
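A minimal residual-plot sketch, assuming the fitted model and test split from the Scikit-Learn section; residuals scattered evenly around zero, with no funnel or curve, support the last two assumptions:
# Residuals vs. predictions: look for an even band around zero
y_pred_test = model.predict(X_test)
residuals = y_test - y_pred_test
plt.scatter(y_pred_test, residuals, color='purple')
plt.axhline(0, color='black', linewidth=1)
plt.xlabel("Predicted Salary")
plt.ylabel("Residual")
plt.title("Residual Plot")
plt.show()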
3. How do I evaluate my model's performance?
You can use:
- Mean Absolute Error (MAE)
- Root Mean Squared Error (RMSE)
- R² Score (indicates how well the model explains the variation in Y)
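For reference, these metrics are defined as:
[ MAE = \frac{1}{n} \sum |Y_i - \hat{Y}_i| \qquad RMSE = \sqrt{ \frac{1}{n} \sum (Y_i - \hat{Y}_i)^2 } \qquad R^2 = 1 - \frac{\sum (Y_i - \hat{Y}_i)^2}{\sum (Y_i - \bar{Y})^2} ]
RMSE penalizes large errors more heavily than MAE because errors are squared before averaging, while R² compares the model's squared error against simply predicting the mean of Y.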
Conclusion
In this article, we explored Linear Regression in depth by implementing it with and without Scikit-Learn. By manually computing the parameters, we gained insight into the mathematics behind the algorithm. Additionally, we evaluated model performance using MAE and RMSE, and visualized the results.
Understanding Linear Regression is crucial as it is widely used in forecasting, economics, and predictive modeling. By mastering both the theoretical and practical aspects, you can build robust models for real-world applications!
References:
- Scikit-Learn Documentation
- Understanding Linear Regression
Tags: Python, Machine Learning, Linear Regression, Scikit-Learn, Forecasting