The Symptom
I started noticing something strange in my observability stack:
- Integration tokens were being minted repeatedly
- My token endpoint showed activity even when no user interaction was happening
- Metrics suggested constant “traffic” to an otherwise idle system
At first glance, it looked like:
- A security issue
- A rogue client
- Or a broken API consumer
It was none of those.
The Root Cause
The issue came down to a subtle but critical architectural mistake:
I was using a non-shared cache in a multi-worker environment.
Stack involved:
- PHP-FPM (2 workers)
- APCu (in-memory cache)
- Token-based integration between services
⚙️ What Went Wrong
APCu is process-local, not shared.
That means:
Worker A cache ≠ Worker B cache
Each PHP-FPM worker had its own isolated memory.
The Cascade Effect
My token logic was straightforward:
if token not in cache:
    mint_new_token()
But in reality, the system behaved like this:
- Request hits Worker A → token exists → OK
- Next request hits Worker B → cache miss → mint new token
- Repeat across workers → continuous token regeneration
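The cascade above can be reproduced in a few lines. This is a minimal Python sketch (Python standing in for the PHP-FPM workers; `handle_request` and the token format are illustrative, not the real code): each worker gets its own dict, mimicking APCu's process-local storage.

```python
import itertools

# Each PHP-FPM worker has its own APCu-like dict, so a token cached
# by one worker is invisible to the other.
mint_counter = itertools.count(1)

def mint_new_token():
    return f"token-{next(mint_counter)}"

worker_caches = [{}, {}]  # Worker A and Worker B: process-local caches

def handle_request(worker_id):
    cache = worker_caches[worker_id]
    if "token" not in cache:  # cache miss on THIS worker only
        cache["token"] = mint_new_token()
    return cache["token"]

# Round-robin requests across the two workers:
tokens = [handle_request(i % 2) for i in range(4)]
print(tokens)  # two distinct tokens, even though one would suffice
```

With two workers you get two tokens; with N workers (and token expiry forcing re-mints), the duplication scales with worker count.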
Why Observability Looked “Wrong”
From the outside, it looked like traffic was hitting the token endpoint.
But in reality:
The system was generating its own traffic due to cache inconsistency.
This is a key lesson:
- Not all traffic is external
- Some is emergent behavior from system design
✅ The Fix
I switched from APCu to a shared cache that every worker reads from and writes to.
Now:
All workers → same cache → consistent token state
Result:
- Tokens minted once
- Reused across all workers
- Metrics stabilized instantly
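The same sketch with a single shared store shows why the fix works (again a hedged Python stand-in, with the dict playing the role of the shared cache backend):

```python
import itertools

mint_counter = itertools.count(1)

def mint_new_token():
    return f"token-{next(mint_counter)}"

shared_cache = {}  # stand-in for a cache all workers share

def handle_request(worker_id):
    # worker_id no longer matters: every worker sees the same state
    if "token" not in shared_cache:
        shared_cache["token"] = mint_new_token()
    return shared_cache["token"]

tokens = [handle_request(i % 2) for i in range(4)]
print(tokens)  # the same token every time; minted exactly once
```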
Production Hardening (What I Added Next)
Fixing the cache wasn’t enough — I hardened the system further.
1. Distributed Locking
To prevent race conditions:
if token exists:
    return token
acquire lock
re-check cache
mint token if still missing
release lock
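This double-checked pattern can be sketched as follows. Note the `threading.Lock` here is a single-process stand-in for a real distributed lock (e.g., an atomic add/set-if-not-exists in the shared cache); the names are illustrative.

```python
import threading

cache = {}
cache_lock = threading.Lock()  # stand-in for a distributed lock
mints = []

def mint_new_token():
    token = f"token-{len(mints) + 1}"
    mints.append(token)
    return token

def get_token():
    # Fast path: no lock needed on a cache hit
    token = cache.get("token")
    if token is not None:
        return token
    with cache_lock:
        # Re-check: another worker may have minted while we waited
        token = cache.get("token")
        if token is None:
            token = mint_new_token()
            cache["token"] = token
        return token

# Eight concurrent callers race for the token:
threads = [threading.Thread(target=get_token) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(mints))  # 1: the re-check under the lock prevents duplicate mints
```

The re-check after acquiring the lock is the important part: without it, every caller that lost the race would still mint its own token.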
2. TTL Buffering
Avoid edge expiration issues:
cache_ttl = token_expiry - safety_margin
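Concretely, with hypothetical numbers (the margin depends on your clock skew and request latency, so treat these values as an assumption):

```python
# Token is valid for 3600 s; evict it from cache 300 s early so a
# cached token is never handed out moments before it expires.
token_expiry = 3600    # seconds the issuer says the token lives
safety_margin = 300    # assumed buffer; tune to clock skew + latency

cache_ttl = token_expiry - safety_margin
print(cache_ttl)  # 3300
```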
3. Observability Metrics
I added:
token_cache_hits
token_cache_misses
token_mint_count
Now anomalies show up immediately.
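A minimal sketch of that instrumentation, with `Counter` standing in for a real metrics client and the minting call stubbed out:

```python
from collections import Counter

metrics = Counter()  # stand-in for a real metrics client
cache = {}

def get_token():
    token = cache.get("token")
    if token is not None:
        metrics["token_cache_hits"] += 1
        return token
    metrics["token_cache_misses"] += 1
    metrics["token_mint_count"] += 1
    token = "token-1"  # stand-in for the real minting call
    cache["token"] = token
    return token

for _ in range(5):
    get_token()
print(dict(metrics))  # 1 miss, 1 mint, then 4 hits
```

A healthy system shows `token_mint_count` staying near-flat while hits climb; a mint count tracking the miss count request-for-request is exactly the cache-locality failure described above.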
Key Takeaway
This wasn’t just a bug.
It was a distributed systems failure mode:
Cache locality + multi-worker architecture → inconsistent state → emergent traffic
⚡ Final Insight
If your system:
- Runs multiple workers
- Uses in-memory caching
- Relies on shared state
Then this rule applies:
If your cache isn’t shared, your state isn’t real.
Closing
This issue reinforced something critical in my engineering journey:
You don’t debug systems by staring at code —
you debug them by understanding how state flows across boundaries.
If you're building distributed APIs, token systems, or high-concurrency services —
this is one edge case worth designing for early.