“Why Averages Lie: Understanding Mean, Median & Mode with Real Data”

Leader posted Originally published at youngtechnologist.hashnode.dev 2 min read

???? Lecture 1 — Central Tendency of Data

Vyoma Data Science Initiative | Module 2: Statistics


???? Introduction

In any dataset, we often face a simple question:

Can we represent all this data using a single number?

Whether it’s exam scores, income levels, or age distributions — we try to summarize complexity into simplicity.

This idea leads us to central tendency.


???? What is Central Tendency?

Central tendency refers to the representative value of a dataset — the point around which the data tends to cluster.

It helps answer:

  • What is typical?
  • Where does the data concentrate?
  • What does this dataset “look like” overall?

???? Measures of Central Tendency

1️⃣ Mean (Average)

[ \text{Mean} = \frac{\sum x_i}{n} ]

  • Represents the balance point of the dataset
  • Highly sensitive to outliers

2️⃣ Median (Middle Value)

  • The middle value after sorting data
  • If even number of values → average of two middle values

???? More robust than mean in real-world datasets


3️⃣ Mode (Most Frequent Value)

  • The value that appears most often
  • Useful for both numerical and categorical data

⚠️ When Averages Mislead

Consider:

2, 3, 4, 5, 100
  • Mean = 22.8 ❌
  • Median = 4 ✅

???? The mean is distorted by an extreme value.


???? Insight

A single number cannot always capture reality.
The choice of measure defines the “truth” you see.


???? Practical Implementation (Python)

Using the Titanic dataset, we analyze the Age column.

???? Code

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

data = pd.read_csv(r"C:\Users\praso\vyomadatascience\module02\titanic.csv")

print(data.head())

print("Mean Age:", data["Age"].mean())
print("Mode Age:", data["Age"].mode())
print("Median Age:", data["Age"].median())

???? Optional Visualization

sns.histplot(data["Age"].dropna(), kde=True)
plt.title("Age Distribution (Titanic Dataset)")
plt.xlabel("Age")
plt.ylabel("Frequency")
plt.show()

???? Dataset Reference

You can download the Titanic dataset from:

(You can place it in your project folder: module02/titanic.csv)


???? GitHub Repo

Download the github repo of our course from here :

https://github.com/psjdeveloper/vyomadatascience


???? Final Reflection

Central tendency is not just about computing mean, median, or mode.

It is about understanding:

  • How data behaves
  • How summaries can distort reality
  • And how interpretation matters more than calculation

✍️ Closing Line

Data does not speak for itself.
The way we summarize it decides what it says.


1 Comment

2 votes
1

More Posts

What Is an Availability Zone Explained Simply

Ijay - Feb 12

Dashboard Operasional Armada Rental Mobil dengan Python + FastAPI

Masbadar - Mar 12

Bridging the Silence: Why Objective Data Outperforms Subjective Health Reports in Elderly Care

Huifer - Jan 27

Lecture 3 — The Art of Collecting Data

Prasoon Jadon - Mar 9

Uncertainty & Randomness — Why Data Is Never Perfec

Prasoon Jadon - Mar 7
chevron_left

Related Jobs

View all jobs →

Commenters (This Week)

3 comments
2 comments
1 comment

Contribute meaningful comments to climb the leaderboard and earn badges!