When "Latest" Isn't Actually Latest
TL;DR
In distributed systems, the last record your system processes is not always the most recent event that happened in reality.
Understanding the difference between Event Time and Processing Time is essential for avoiding data quality issues in streaming pipelines, change data capture (CDC) architectures, and event-driven systems.
The Assumption That Works... Until It Doesn't
Most developers start their careers working with a single application and a single database.
In that world, time feels simple.
- A customer updates their profile.
- The database stores the change.
- Everyone sees the updated record.
The sequence appears obvious because all operations happen within the same system.
Then distributed systems enter the picture.
A Simple Example
Imagine a customer updates their address at 10:00:00.
That change generates an event and begins traveling through a distributed architecture:
- Application
- Database
- CDC pipeline
- Kafka
- Consumer service
Everything looks normal.
Now imagine that an earlier address update for the same customer was delayed by a network issue or a retry.
The result?
The older event arrives after the newer one.
Your consumer processes the events in this order:
- New address
- Old address
If your logic assumes that the last processed record represents the latest truth, you've just overwritten correct information with stale data.
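The scenario above can be sketched in a few lines of Python. This is only an illustration, assuming each event carries an `event_time` field set by the source application. The field names, the timestamps, and both `apply_*` helpers are hypothetical and not taken from any particular library.

```python
from datetime import datetime

# Hypothetical events, listed in the order the consumer receives them.
# The timestamps are illustrative values, not real data.
events = [
    {"customer_id": 1, "address": "New address",
     "event_time": datetime(2024, 1, 1, 10, 0, 0)},
    {"customer_id": 1, "address": "Old address",
     "event_time": datetime(2024, 1, 1, 9, 59, 58)},  # delayed, arrives second
]


def apply_last_arrival_wins(state, event):
    """Naive logic: whatever arrives last overwrites the stored record."""
    state[event["customer_id"]] = event


def apply_latest_event_time_wins(state, event):
    """Guarded logic: overwrite only if the event is newer than the stored one."""
    current = state.get(event["customer_id"])
    if current is None or event["event_time"] > current["event_time"]:
        state[event["customer_id"]] = event


naive, guarded = {}, {}
for event in events:
    apply_last_arrival_wins(naive, event)
    apply_latest_event_time_wins(guarded, event)

print(naive[1]["address"])    # Old address (stale data won)
print(guarded[1]["address"])  # New address
```

Comparing event times assumes the source clocks can be trusted. Where several writers with skewed clocks are involved, a version number assigned by the source system may be a safer comparison key.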
The Real Problem: Time Has Multiple Meanings
This is why distributed systems distinguish between two important concepts:
Event Time
The moment something actually happened in the real world, typically recorded by the source system as a timestamp on the event itself.
Example:
- Customer changed address → 10:00:00
Processing Time
The moment a system receives or processes the event.
Example:
- Consumer receives event → 10:00:15
Most of the time, these timestamps are close enough that nobody notices.
The interesting cases appear when they aren't.
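One way to make the gap visible is to measure it. The sketch below computes the lag between the two timestamps from the example. The `processing_lag` helper and the `MAX_EXPECTED_LAG` threshold are hypothetical names chosen for illustration, and the date is arbitrary.

```python
from datetime import datetime, timedelta


def processing_lag(event_time: datetime, processing_time: datetime) -> timedelta:
    """Return how long after the real-world event the system handled it."""
    return processing_time - event_time


# Timestamps from the example above (the calendar date is illustrative).
event_time = datetime(2024, 1, 1, 10, 0, 0)        # customer changed address
processing_time = datetime(2024, 1, 1, 10, 0, 15)  # consumer received the event

lag = processing_lag(event_time, processing_time)
print(lag.total_seconds())  # 15.0

# Hypothetical threshold: flag events that arrive later than expected.
MAX_EXPECTED_LAG = timedelta(seconds=5)
is_late = lag > MAX_EXPECTED_LAG
print(is_late)  # True
```

In a real consumer the processing time would come from the consumer's own clock at the moment of handling rather than from a fixed value.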
Where Things Get Complicated
Several factors can cause events to arrive out of order:
- Network latency
- Retries
- Consumer lag
- Parallel processing
- System failures
- Distributed transactions
- CDC replication delays
As architectures become more distributed, these situations become normal rather than exceptional.
What About Kafka?
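Kafka guarantees ordering within a single partition: every record gets a sequential offset, and consumers read records in offset order. There is no ordering guarantee across partitions.
However, offsets only tell us the order in which records were appended to that partition.
They do not guarantee the order in which events actually occurred in the real world.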
That's an important distinction.
A perfectly ordered Kafka partition can still contain events that represent out-of-order business actions.
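To illustrate, the sketch below scans records that are already in offset order and reports the ones whose event time is older than an earlier record for the same key. Plain dictionaries stand in for consumed records, so no Kafka client is involved. The record layout and the `find_event_time_regressions` helper are assumptions made for this example.

```python
from datetime import datetime

# Hypothetical records from one partition, already in offset order.
records = [
    {"offset": 0, "key": "customer-1", "event_time": datetime(2024, 1, 1, 10, 0, 0)},
    {"offset": 1, "key": "customer-2", "event_time": datetime(2024, 1, 1, 10, 0, 3)},
    {"offset": 2, "key": "customer-1", "event_time": datetime(2024, 1, 1, 9, 59, 58)},
]


def find_event_time_regressions(records):
    """Return offsets whose event_time is older than an earlier record for the same key."""
    newest_seen = {}   # key -> newest event_time observed so far
    regressions = []
    for record in records:
        key, ts = record["key"], record["event_time"]
        if key in newest_seen and ts < newest_seen[key]:
            # Offset order says "later", event time says "earlier".
            regressions.append(record["offset"])
        else:
            newest_seen[key] = ts
    return regressions


print(find_event_time_regressions(records))  # [2]
```

The offsets here are perfectly sequential, yet offset 2 describes a business action that happened before offset 0.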
Why Data Engineers Care
Many expensive data quality problems are not caused by broken infrastructure.
- Pipelines may be healthy.
- Kafka may be running perfectly.
- Consumers may be processing records successfully.
The issue is often more subtle.
The system answers the right question using the wrong sequence of events.
When that happens, dashboards become inaccurate, customer profiles become inconsistent, and downstream decisions become unreliable.
Final Thought
Distributed systems force us to rethink a concept we usually take for granted: time.
The challenge isn't simply moving data from one system to another.
The challenge is preserving the meaning of that data while it travels through a network of independent components.
Because in distributed systems, the most recent record is not always the most recent event.
And understanding that difference is often the first step toward building reliable data systems.