Overfitting and Underfitting

Originally published at dev.to

The terms overfitting and underfitting describe two ways a machine learning model can go wrong. This post looks at them in the context of classification algorithms. For example, suppose you want to classify a guy as attractive or not attractive. You can use different features to help with this classification: height, weight, salary, biceps size, and confidence. Before diving into overfitting and underfitting, we have to understand where in the machine learning process these terms are relevant. This will help you understand overfitting and underfitting, as well as a bit of the process of designing a classification algorithm.

A classification algorithm takes a set of values and predicts a class for them. For example, [height: 6'5'', weight: 190lbs, salary: 300k/year, biceps size: 18in, confidence: 95%] together make up a single data input. The machine learning algorithm will then classify this data input as either 'attractive' or 'not attractive'. It can do so only after it has been 'trained', meaning it has seen many data inputs along with their known classifications, so that future data inputs without classifications can be classified.

In the process of classifying a data input, you can tune the model in the hope of making better predictions. Both overfitting and underfitting hurt those predictions, so the goal is to find the sweet spot that avoids both.

When you add the values of your handsome man to the model, he lands somewhere in a multidimensional space, surrounded by similar men with slightly different values. Where he lands in this space determines his classification: attractive or not attractive. Say he lands near 10 men, but you only care about the 5 closest ones. If 3 of those 5 are attractive and 2 are not, then your man is classified as attractive.

This number 5 is known as 'k'. It tells the model how many of the nearest neighbors to look at in order to classify your man. This approach is called the k-nearest neighbors algorithm.
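The vote described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name knn_classify, the two made-up feature values per man, and the use of plain Euclidean distance are all hypothetical choices, not something the post specifies.

```python
from collections import Counter
from math import dist


def knn_classify(query, points, labels, k):
    """Return the majority label among the k points nearest to `query`."""
    # Rank the known men by Euclidean distance to the new one.
    order = sorted(range(len(points)), key=lambda i: dist(query, points[i]))
    # Count the labels of the k closest and keep the most common one.
    votes = Counter(labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Hypothetical data: 10 men, each reduced to two already-scaled feature values.
points = [(0.1, 0.0), (0.0, 0.2), (0.3, 0.0), (0.0, 0.4), (0.5, 0.0),
          (2.0, 2.0), (2.0, 3.0), (3.0, 2.0), (3.0, 3.0), (4.0, 4.0)]
labels = ["attractive", "not attractive", "attractive", "not attractive",
          "attractive", "not attractive", "not attractive", "attractive",
          "not attractive", "not attractive"]

new_man = (0.0, 0.0)
result = knn_classify(new_man, points, labels, k=5)
print(result)  # attractive (3 of the 5 nearest are attractive, 2 are not)
```

An odd k is used here so that a two-class vote cannot end in a tie.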

When k is too large, the model suffers from underfitting: the neighborhood is too broad to predict your man accurately. Your man could be sitting with the 3 most handsome men, but you told the model to look at the 10 nearest men. Instead of being classified as attractive, he is classified as not attractive, because the other 7 men in that neighborhood are not attractive.

When k is too small, the model suffers from overfitting: it relies on too few neighbors, so one or two unusual ones can decide the outcome. Suppose k is 2 and the closest 2 men sitting next to your man are not attractive, but there are 3 other attractive men sitting nearby. If you increase k to 5, your attractive man will correctly be classified as attractive.
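Both situations can be reproduced with a small nearest-neighbor vote. The sketch below is hypothetical throughout: the helper knn_classify, the coordinates, and the Euclidean distance are assumptions chosen only to mirror the counts in the two examples (3 attractive vs 7 not, and 2 not attractive vs 3 attractive).

```python
from collections import Counter
from math import dist


def knn_classify(query, points, labels, k):
    """Return the majority label among the k points nearest to `query`."""
    order = sorted(range(len(points)), key=lambda i: dist(query, points[i]))
    return Counter(labels[i] for i in order[:k]).most_common(1)[0][0]


your_man = (0.0, 0.0)

# Case 1, k too large: 3 attractive men sit right next to him,
# 7 unattractive men sit farther away.
points_a = [(0.1, 0.0), (0.0, 0.2), (0.3, 0.0),
            (2.0, 0.0), (0.0, 2.5), (3.0, 0.0), (0.0, 3.5),
            (4.0, 0.0), (0.0, 4.5), (5.0, 0.0)]
labels_a = ["attractive"] * 3 + ["not attractive"] * 7
case1_k3 = knn_classify(your_man, points_a, labels_a, k=3)
case1_k10 = knn_classify(your_man, points_a, labels_a, k=10)

# Case 2, k too small: the 2 closest men are not attractive,
# but 3 attractive men are only slightly farther away.
points_b = [(0.1, 0.0), (0.0, 0.2), (0.3, 0.0), (0.0, 0.4), (0.5, 0.0)]
labels_b = ["not attractive"] * 2 + ["attractive"] * 3
case2_k2 = knn_classify(your_man, points_b, labels_b, k=2)
case2_k5 = knn_classify(your_man, points_b, labels_b, k=5)

print(case1_k3, "|", case1_k10)  # attractive | not attractive
print(case2_k2, "|", case2_k5)   # not attractive | attractive
```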

The trick is to find the sweet spot so that k neighbors correctly represent your man.
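The post does not say how to find that sweet spot. One common option, shown here purely as an assumption, is to try several values of k and keep the one that classifies already-labeled men best. The sketch uses leave-one-out validation: each man is classified using all the others, and the fraction of correct answers is recorded. The data (a single made-up feature score per man, with one deliberately odd label in each group) and the function names are hypothetical.

```python
from collections import Counter
from math import dist


def knn_classify(query, points, labels, k):
    """Return the majority label among the k points nearest to `query`."""
    order = sorted(range(len(points)), key=lambda i: dist(query, points[i]))
    return Counter(labels[i] for i in order[:k]).most_common(1)[0][0]


def leave_one_out_accuracy(points, labels, k):
    """Classify each point from all the others; return the fraction correct."""
    correct = 0
    for i in range(len(points)):
        rest_points = points[:i] + points[i + 1:]
        rest_labels = labels[:i] + labels[i + 1:]
        if knn_classify(points[i], rest_points, rest_labels, k) == labels[i]:
            correct += 1
    return correct / len(points)


# Hypothetical labeled men: one feature score each, two groups,
# and one noisy label inside each group.
points = [(10,), (13,), (15,), (18,), (22,),
          (50,), (53,), (55,), (58,), (62,)]
labels = ["not attractive", "not attractive", "attractive",
          "not attractive", "not attractive",
          "attractive", "attractive", "not attractive",
          "attractive", "attractive"]

# Odd values of k avoid tied votes between the two classes.
scores = {k: leave_one_out_accuracy(points, labels, k) for k in (1, 3, 9)}
best_k = max(scores, key=scores.get)

print(scores)  # {1: 0.4, 3: 0.8, 9: 0.0}
print(best_k)  # 3
```

On this toy data the smallest k is thrown off by the noisy labels, the largest k ignores the groups entirely, and the middle value does best.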

All men are attractive; it's a matter of taste.
