Uncertainty & Randomness — Why Data Is Never Perfect
By Vyoma Youth Society
If we collect data carefully, why are we still uncertain?
In the first lecture, we learned that data is not reality — it is a representation of reality.
Today, we take one step further and ask:
Why can’t data ever be perfectly certain?
Why do two measurements of the same thing give different results?
Why can’t we predict the future exactly?
Why does error always exist?
This brings us to two ideas:
uncertainty and randomness.
1️⃣ Every Measurement Has Error
No measurement is perfect.
A thermometer does not show the exact temperature of the room.
A survey does not show the exact opinion of society.
A step counter does not count every movement of your body.
Between reality and numbers, something is always lost.
So uncertainty is not a mistake.
It is unavoidable.
Uncertainty is part of measurement itself.
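To see this in action, here is a minimal sketch of a thermometer that always misses the true value a little. The true temperature (25.0), the error size (0.3), and the bell-shaped error are assumptions made only for illustration, not properties of any real device.

```python
import random

def measure(true_value, noise_sd, rng):
    """Return one noisy reading: the true value plus a random error."""
    return true_value + rng.gauss(0.0, noise_sd)

rng = random.Random(42)   # fixed seed so the run is repeatable
TRUE_TEMP = 25.0          # hypothetical "real" room temperature in °C
NOISE_SD = 0.3            # hypothetical typical size of the thermometer error

# Measure the same room five times.
readings = [measure(TRUE_TEMP, NOISE_SD, rng) for _ in range(5)]
for reading in readings:
    print(f"{reading:.2f}")
```

The room never changed, yet the five readings differ from each other. That gap between reading and reality is the measurement error.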
2️⃣ Randomness vs Determinism
Some systems are predictable:
- a calculator
- a clock
- simple machines
These are called deterministic systems.
Some systems are unpredictable:
- coin toss
- weather
- stock market
- human behavior
We usually treat these as random systems.
But here is a deep question:
Is randomness part of nature, or part of our ignorance?
Sometimes things look random because we do not know all the causes.
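A computer's "random" numbers are a small example of this question. The sketch below uses Python's pseudo-random generator, where the seed plays the role of the hidden cause. The function name toss_sequence and the seed values are made up for illustration.

```python
import random

def toss_sequence(seed, n=10):
    """Generate n pseudo-random coin tosses (0 or 1) from a given seed."""
    rng = random.Random(seed)
    return [rng.randint(0, 1) for _ in range(n)]

first = toss_sequence(seed=7)
second = toss_sequence(seed=7)   # same hidden cause, so the same tosses

print(first)            # looks random if you do not know the seed
print(first == second)  # prints True
```

To someone who does not know the seed, the tosses look unpredictable. To someone who knows it, they are fully determined. This does not settle the question for nature, but it shows how ignorance alone can look like randomness.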
3️⃣ Random Variable: Turning Randomness into Numbers
In data science, we turn the outcomes of uncertain events into numbers.
A rule that assigns a number to each possible outcome is called a random variable.
Examples:
- Coin toss → 0 (tails) or 1 (heads)
- Dice roll → 1 to 6
- Daily temperature → a number
We cannot predict the exact value.
But we can study how often values appear.
So instead of predicting one number,
we study a distribution of numbers.
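As a rough illustration, we can roll a simulated dice many times and record how often each face appears. The number of rolls (6000) and the seed are arbitrary choices for this sketch.

```python
import random
from collections import Counter

rng = random.Random(0)   # fixed seed so the run is repeatable
N_ROLLS = 6000           # arbitrary number of rolls for this sketch

# Each roll is one value of the random variable "face shown by the dice".
rolls = [rng.randint(1, 6) for _ in range(N_ROLLS)]
counts = Counter(rolls)

# The distribution: the share of rolls that landed on each face.
distribution = {face: counts[face] / N_ROLLS for face in range(1, 7)}
for face, share in distribution.items():
    print(face, f"{share:.3f}")
```

No single roll can be predicted, but each share should come out close to 1/6. That table of shares is the distribution.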
4️⃣ Order from Chaos: The Average
If we toss a fair coin once, the result is unpredictable.
If we toss it many times, something interesting happens.
The average of the results (counting heads as 1 and tails as 0) settles near 0.5.
This idea is called the law of large numbers.
It teaches us:
- individual events are uncertain
- large patterns are stable
Chaos at small scale, order at large scale.
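A small simulation can make this visible. The sketch below assumes a fair coin with heads counted as 1 and tails as 0, and tracks the average after each toss. The helper name running_average and the checkpoints are illustrative choices.

```python
import random

def running_average(n_tosses, rng):
    """Toss a fair coin n_tosses times; return the average after each toss."""
    heads = 0
    averages = []
    for i in range(1, n_tosses + 1):
        heads += rng.randint(0, 1)    # 1 = heads, 0 = tails
        averages.append(heads / i)    # share of heads so far
    return averages

rng = random.Random(1)   # fixed seed so the run is repeatable
averages = running_average(100_000, rng)

# Look at the average after more and more tosses.
for n in (10, 100, 1000, 10_000, 100_000):
    print(n, round(averages[n - 1], 3))
```

With few tosses the average can sit far from 0.5. As the number of tosses grows, it should drift closer and stay there.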
5️⃣ Variance: Measuring Uncertainty
Two datasets can have the same average but different uncertainty.
Example:
- Dataset A: values are close together
- Dataset B: values are spread out
Both can have the same mean, but:
- one is reliable
- one is risky
Variance measures how far the values spread around the mean.
The mean tells us the center.
The variance tells us the spread, and so the uncertainty.
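Here is a small worked example. Variance is the average of the squared distances from the mean. The two datasets below are invented for illustration, and the sketch uses the population variance from Python's statistics module.

```python
from statistics import mean, pvariance

# Invented example values: both datasets are centered on 50.
dataset_a = [49, 50, 50, 51, 50]   # values close together
dataset_b = [10, 90, 30, 70, 50]   # values spread out

for name, data in (("A", dataset_a), ("B", dataset_b)):
    # pvariance = average squared distance from the mean
    print(name, "mean:", mean(data), "variance:", pvariance(data))
```

Both means are 50, but the variance of A is 0.4 while the variance of B is 800. The center is the same; the uncertainty is very different.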
6️⃣ Why This Matters
Uncertainty affects decisions:
- weather prediction
- medical diagnosis
- economic policy
- social planning
A prediction reported without its uncertainty is dangerous.
Good data science does not remove uncertainty.
It understands and measures it.
Reflection Challenges
Challenge 1:
Observe one uncertain thing in your daily life (weather, mood, traffic, steps).
Write:
- what changes
- what seems stable
Challenge 2:
Answer in one paragraph:
Is randomness in the world, or in our knowledge?
Connection to Previous Lecture
Lecture 1:
Data is a representation of reality.
Lecture 2:
That representation is uncertain.
The next lecture will ask:
What is a model, and why do we need it?
Vyoma Data Science Initiative
Awareness • Mathematics • Open Knowledge