Let me tell you about Sarah.
This is a fictional story. But I bet you’ll recognize it.
I’ve seen this pattern play out across different companies, different teams, different tech stacks. The details change. The progression doesn’t.
Week 1: The Perfect Start
Sarah’s building a notification system for an e-commerce platform.
First requirement: send an email when someone places an order.
Simple. She writes one function. Webhook comes in, format the email, hit SMTP, done.
The whole thing is maybe 50 lines. It works perfectly. Code review approves it. It ships.
Sarah’s thinking: “It’s just one notification type. I’ll add proper abstraction when we actually need it.”
You’ve thought this too. So have I.
Nothing wrong with it. Week 1, this is the right call.
Week 3: The First Copy-Paste
Product team loves the email notifications. Now they want SMS for order shipments.
Mike picks up the ticket.
He opens Sarah’s code. Sees the pattern. Makes sense. He follows it.
New handler. Receives the shipment webhook. Formats the SMS message. Connects to Twilio. Sends it.
He copies some of Sarah’s email formatting logic because customers should see consistent information. Has to adjust it for the 160-character SMS limit, but the core logic is the same.
Mike’s thinking: “There’s some duplication with the email code, but SMS is different enough that abstracting it would be premature. It’s only two notification types.”
Deadline is tomorrow. This ships.
Still nothing catastrophically wrong here. Two types, small duplication, it’s manageable.
Right?
Week 5: User Preferences
Customers start complaining.
“I don’t want SMS notifications.”
“Why am I getting emails for every status change?”
Sarah adds user preferences. Creates a database table. Updates her email handler to check if the user wants that particular notification before sending.
The handler triples in size.
Query the database. Check multiple preference flags. Handle the case where preferences don’t exist yet. Default values. Edge cases.
Sarah’s thinking: “This is getting messy, but the deadline is tomorrow and this works. I’ll refactor it next sprint.”
I cannot tell you how many times I’ve heard “next sprint.”
(Spoiler: next sprint never comes.)
Week 7: Two Ways to Do Everything
Mike needs to add notifications for order cancellations and delivery confirmations.
He realizes hardcoding email bodies isn’t going to scale.
So he builds a template system. Creates a templates directory. Writes a simple renderer. Updates his handlers to load templates, populate data, send.
It’s actually pretty clean.
Meanwhile, Sarah’s handlers still use string formatting. She doesn’t know Mike built a template system. Mike didn’t announce it in Slack. It just… exists now.
The codebase now has two different ways of generating notification content.
Sarah finds out later. Thinks: “I should probably switch to Mike’s templates… but my code is working and I’m slammed with other features.”
And she is. Three new features this sprint. No time to refactor working code.
Week 9: The Third Approach
Emma joins the team.
First task: add Slack notifications for the support team when high-value orders come in.
She opens the notification code. Finds Sarah’s inline approach. Finds Mike’s templates. Neither makes sense for Slack.
Slack needs structured JSON payloads, not formatted text.
So Emma does what any good engineer would do: she creates a “proper solution”.
Notification service class. Methods for each notification type. Handles destination-specific formatting internally. Clean. Testable. Well-designed.
She shows it to the team in standup.
Mike: “That’s nice, but I don’t have time to refactor my SMS code right now. Maybe later.”
Sarah: “I like it, but my code has been running in production for months. If it ain’t broke…”
Emma’s service class gets used for Slack notifications. Nothing else changes.
Now there are three ways to send notifications.
Week 12: The Chaos Compounds
Product wants:
- Push notifications for the mobile app
- Digest emails (daily order summaries)
- Ability to snooze notifications
Three developers. Three features. Same week.
Each one discovers the existing fragmentation. Each one makes their own call.
Developer A tries to extend Sarah’s inline approach. Adds push notification logic directly in the handler.
Developer B uses Mike’s templates but creates a new template format because the existing one doesn’t support digest layouts.
Developer C tries to use Emma’s service class but realizes it doesn’t handle scheduling or snoozing. So they add that logic directly in their handler instead.
The notification preferences table is now being updated by five different code paths.
Each developer added their own columns because they didn’t realize others had added similar fields. One stores preferences as JSON. Another uses boolean columns. Another created a separate preferences table with foreign keys.
I’ve seen this code review happen. Every PR gets approved. Every piece of code works.
Nobody did anything wrong.
And yet.
Week 15: Customer Complaints
Support tickets start flooding in.
“I’m getting duplicate notifications.”
“I disabled email but I’m still getting them.”
“I’m not getting notifications at all for important orders.”
Sarah investigates. Opens the codebase.
Six different code paths handle notifications. Some check preferences before sending. Some check during sending. Some don’t check at all because the developer assumed another layer was handling it.
She finds the bug. It’s in her original email handler. The preference check is wrong.
She fixes it. Deploys.
Three other notification types break. They were relying on her buggy behavior.
The team’s assessment of what a proper fix requires: “We need to stop and refactor everything first, or we’ll just make it worse.”
Management: “We don’t have time for a refactor. Just fix the bugs.”
Week 17: The Template Nightmare
Marketing wants to update email designs. New brand guidelines.
The developer assigned to this opens the codebase.
Templates are everywhere.
Some in a /templates directory. Some hardcoded as strings. Some in the database. Some fetched from an external CMS that one developer integrated without telling anyone.
There’s no single source of truth.
Worse: the data passed to templates is completely inconsistent.
Email templates expect order objects with certain fields. SMS templates expect a flattened structure. Push notifications expect a completely different format.
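Here’s a rough sketch of what one shared shape could look like, so every channel renders from the same data. Everything named here (`OrderContext`, its fields, the three render functions) is hypothetical and exists only for illustration; none of it comes from the team in the story.

```python
from dataclasses import dataclass

SMS_LIMIT = 160  # the single-message SMS limit mentioned back in Week 3


@dataclass(frozen=True)
class OrderContext:
    """Hypothetical shared shape handed to every renderer, whatever the channel."""
    order_id: str
    customer_name: str
    status: str
    total_cents: int

    @property
    def total_display(self) -> str:
        # Format money once so email, SMS, and push cannot disagree.
        dollars, cents = divmod(self.total_cents, 100)
        return f"${dollars}.{cents:02d}"


def render_email(ctx: OrderContext) -> str:
    return (
        f"Hi {ctx.customer_name},\n\n"
        f"Your order {ctx.order_id} is now {ctx.status}.\n"
        f"Total: {ctx.total_display}\n"
    )


def render_sms(ctx: OrderContext) -> str:
    text = f"Order {ctx.order_id} is now {ctx.status}. Total {ctx.total_display}."
    return text[:SMS_LIMIT]  # crude truncation, good enough for a sketch


def render_push(ctx: OrderContext) -> dict:
    return {
        "title": f"Order {ctx.status}",
        "body": f"Order {ctx.order_id}: {ctx.total_display}",
    }
```

The point isn’t the dataclass. It’s that a new field gets added in one place, and every channel sees the same thing.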
One design change requires touching dozens of files.
The developer estimates: “Two weeks, maybe three.”
Marketing: “It’s just a design update. How is that two weeks?”
Week 20: Performance Crisis
Black Friday.
The system crashes.
Investigation reveals: notification handlers are opening new database connections for every single notification sent.
Some handlers properly close connections. Some don’t.
Connection pools exhausted. Some handlers retry failed sends immediately and indefinitely, amplifying the problem during the outage. One handler spawns a goroutine for each notification but never limits concurrency.
The server runs out of memory processing a batch of 10,000 order confirmations.
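For contrast, here’s a minimal sketch of a bounded batch send. The story’s handler used goroutines; I’m using Python threads here to match the architecture document later in this post. `send_confirmation`, `send_batch`, and the cap of 8 workers are all assumptions for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_WORKERS = 8  # assumed cap; in practice, tie it to your connection pool size


def send_confirmation(order_id: int) -> str:
    """Placeholder for the real send; returns a status string."""
    return f"sent:{order_id}"


def send_batch(order_ids, send=send_confirmation, max_workers=MAX_WORKERS):
    # Work is queued, but at most max_workers sends run at the same time,
    # so a batch of 10,000 is handled by a fixed set of threads.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(send, order_ids))
```

Calling `send_batch(range(10_000))` returns 10,000 results in input order without ever running more than eight sends at once.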
Different developers made different assumptions about error handling.
Some silently swallow errors and log them. Some retry with exponential backoff. Some fail fast. Some store failed notifications in one database table for retry. Others use a different table. One developer integrated a third-party queue system that nobody else knew existed.
Notifications are getting lost between these systems.
I’ve been on calls where the CTO asks: “How many notification systems do we have?”
Nobody can answer.
Week 24: The Audit
Compliance team asks a simple question:
“Can you show us a record of all notifications sent to customer X in the past 90 days?”
The team cannot answer this.
Notification logs are scattered everywhere.
Some handlers log to stdout. Some to files. Some to a database table. Some don’t log at all.
The log formats are completely different. Some include the full message content. Some just log “notification sent” without details. There’s no correlation between the notification and the triggering event.
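To make the gap concrete, here’s a sketch of what a single log could look like, using SQLite from Python’s standard library. The `notification_log` table name comes from the architecture document later in this post; the columns and both helper functions are my assumptions, not a prescribed schema.

```python
import sqlite3
import uuid
from datetime import datetime, timedelta, timezone

SCHEMA = """
CREATE TABLE notification_log (
    id             INTEGER PRIMARY KEY,
    correlation_id TEXT NOT NULL,  -- ties the send back to the triggering event
    customer_id    TEXT NOT NULL,
    channel        TEXT NOT NULL,  -- 'email', 'sms', 'push', ...
    event_type     TEXT NOT NULL,  -- e.g. 'order_placed'
    status         TEXT NOT NULL,  -- 'sent' or 'failed'
    body           TEXT NOT NULL,  -- full message content, for audits
    sent_at        TEXT NOT NULL   -- ISO 8601, always UTC
)
"""


def log_notification(conn, customer_id, channel, event_type, status, body,
                     correlation_id=None, sent_at=None):
    """Write one row per delivery attempt and return its correlation ID."""
    sent_at = sent_at or datetime.now(timezone.utc)
    correlation_id = correlation_id or str(uuid.uuid4())
    conn.execute(
        "INSERT INTO notification_log "
        "(correlation_id, customer_id, channel, event_type, status, body, sent_at) "
        "VALUES (?, ?, ?, ?, ?, ?, ?)",
        (correlation_id, customer_id, channel, event_type, status, body,
         sent_at.isoformat(timespec="microseconds")),
    )
    return correlation_id


def notifications_for(conn, customer_id, days=90, now=None):
    """Answer the auditor: everything sent to one customer in the last N days."""
    now = now or datetime.now(timezone.utc)
    cutoff = (now - timedelta(days=days)).isoformat(timespec="microseconds")
    return conn.execute(
        "SELECT sent_at, channel, event_type, status, body FROM notification_log "
        "WHERE customer_id = ? AND sent_at >= ? ORDER BY sent_at",
        (customer_id, cutoff),
    ).fetchall()
```

With every handler writing through one function like this, the compliance question becomes a single query instead of weeks of reconstruction.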
The auditor asks: “How do you ensure notifications contain required legal disclosures?”
Each template was created independently. Some include required legal text. Some don’t. There’s no centralized enforcement.
I’ve seen this audit happen. Teams spend weeks reconstructing logs manually.
The Breaking Point
VP of Engineering asks for a simple feature:
“Add an unsubscribe link to all emails.”
The team estimates: Three weeks.
The VP is shocked.
“It’s just adding a link. How is that three weeks of work?”
The tech lead explains:
“We have seven different code paths that send emails. Each uses a different templating system. Some render templates on the server. Some fetch them from external systems. Some are hardcoded strings. We need to update each one individually, ensure the unsubscribe logic is consistent across all of them, add tracking for unsubscribe events, update the preferences system to handle unsubscribes properly, and test everything thoroughly because there’s no centralized testing strategy.”
Three weeks. For a link.
The VP asks the obvious question: “How did it get this bad?”
What Went Wrong?
Here’s the thing that kills me about this story.
Nobody made a catastrophically bad decision.
Sarah’s Week 1 implementation was appropriate. Mike’s template system was a reasonable improvement. Emma’s service class was a genuine attempt to bring order.
Every single developer was trying to do good work under deadline pressure.
The problem wasn’t the individual decisions.
It was the absence of a shared architectural vision.
Without clear boundaries and layers, each developer made reasonable local optimizations that created global chaos.
The “I’ll refactor it later” moments never came because there was never a good time to stop feature development.
The “let’s standardize this” conversations happened but never resulted in action because no one had time to migrate existing code.
The codebase evolved organically.
And organic growth without structure doesn’t produce a garden. It produces a weed-infested lot.
“But This Is Just a Communication Problem”
You might be thinking: the real issue was that developers didn’t communicate.
If Sarah and Mike had talked, they wouldn’t have built two different templating systems. If Emma had socialized her service class better, others would have adopted it.
Better standups. Better code reviews. Better documentation. That’s what was missing, not architecture.
This is seductive because it’s partially true.
But here’s why it misses the point: architecture IS communication.
It’s the most important form of communication for technical decisions.
Think about what actually happened in the story.
The team DID communicate. Mike showed his template system in code review. Emma presented her service class and got positive feedback. They had a meeting in Week 11 trying to align on standards.
The communication happened.
What didn’t happen was turning those conversations into durable, enforceable decisions.
This is the key difference:
Conversation says “we should probably do X.”
Architecture says “X is how we do things here, and here’s where it lives.”
“Conversation is ephemeral. Architecture is the artifact that persists after the meeting ends.”
When a new developer joins and asks “where should notification logic go?”, the answer shouldn’t require scheduling a meeting or hunting through Slack history.
It should be obvious from looking at the codebase.
Communication without architecture leads to the problem Emma faced. She built something good. People agreed it was good. And then… nothing changed.
Without architectural decisions being explicitly made (“from now on, all notifications go through NotificationService”), the good idea just becomes another option in an increasingly fragmented codebase.
Good communication can prevent chaos. But it can’t survive bad processes.
When developers are under deadline pressure, working on different features, joining the team at different times, communication will have gaps.
Architecture is the safety net for when communication fails.
It’s the shared context that makes it possible to work somewhat independently without creating complete divergence.
So yes, the team in our story could have communicated better.
But the solution isn’t “communicate more.”
It’s “communicate the architecture and make it stick.”
Document where things belong. Make architectural decisions explicit. Enforce them in code review. Build structure that persists beyond any individual conversation.
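One way to make a decision stick is to let a test enforce it. Here’s a sketch that flags any module importing a delivery library outside the one file that’s allowed to. The allowed path and the library list are assumptions for illustration; swap in whatever your own rule is.

```python
import ast
from pathlib import Path

# Hypothetical rule: only this module may talk to delivery libraries directly.
ALLOWED = Path("services/notification_service.py")
DELIVERY_LIBS = {"smtplib", "twilio"}


def delivery_imports(source: str) -> set:
    """Return the delivery libraries that a piece of Python source imports."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]
        else:
            continue
        for name in names:
            root = name.split(".")[0]  # 'twilio.rest' counts as 'twilio'
            if root in DELIVERY_LIBS:
                found.add(root)
    return found


def violations(project_root: Path) -> list:
    """List (file, library) pairs that bypass the notification service."""
    bad = []
    for path in sorted(project_root.rglob("*.py")):
        relative = path.relative_to(project_root)
        if relative == ALLOWED:
            continue
        for lib in sorted(delivery_imports(path.read_text(encoding="utf-8"))):
            bad.append((relative.as_posix(), lib))
    return bad
```

Drop `assert violations(Path(".")) == []` into the test suite and the rule no longer depends on a reviewer remembering it.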
Because at the end of the day, you can have all the Slack channels and standups and retros you want.
Without a shared architectural foundation, you’re just having the same conversations over and over while the codebase continues to fragment.
What Should Have Happened in Week 1
Sarah should have spent 30 minutes writing this:
# Notification System Architecture
## Where Things Live
- All notification logic → services/notification_service.py
- Templates → templates/ directory (Jinja2 format)
- Preference checks → services/preference_service.py
- Delivery logging → notification_log table
## How to Add a New Notification Type
1. Add template to templates/
2. Add method to NotificationService
3. Log delivery attempt (success or failure)
4. Add tests to test_notification_service.py
## Error Handling
- Retries: up to 3 retries with exponential backoff (delays of 1s, 2s, 4s)
- Failed sends → dead_letter_queue table
- All errors logged with correlation ID
## Preferences
- Check preferences BEFORE sending (not during)
- Default: all notifications enabled
- Unsubscribe → set all preferences to false
That’s it. 30 minutes of work. Would have saved months of chaos.
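And here’s roughly what the first version of that service could look like. Treat it as a sketch under assumptions: the in-memory lists stand in for the `notification_log` and `dead_letter_queue` tables, the `senders` and `preferences` arguments are hypothetical, and I’m reading the retry rule as one initial try followed by up to three retries with delays of 1s, 2s, 4s.

```python
import time
import uuid


class NotificationService:
    """Hypothetical single entry point for every notification."""

    MAX_RETRIES = 3
    BASE_DELAY = 1.0  # seconds; doubles on each retry: 1s, 2s, 4s

    def __init__(self, senders, preferences, sleep=time.sleep):
        self.senders = senders          # channel name -> callable(user_id, body)
        self.preferences = preferences  # (user_id, channel) -> bool
        self.sleep = sleep              # injectable so tests need not wait
        self.log = []                   # stand-in for the notification_log table
        self.dead_letters = []          # stand-in for the dead_letter_queue table

    def send(self, user_id, channel, body):
        correlation_id = str(uuid.uuid4())
        # Preferences are checked BEFORE sending; a missing entry means enabled.
        if not self.preferences.get((user_id, channel), True):
            self._record(correlation_id, user_id, channel, "skipped")
            return "skipped"
        # One initial attempt plus up to MAX_RETRIES retries.
        for attempt in range(self.MAX_RETRIES + 1):
            try:
                self.senders[channel](user_id, body)
            except Exception as exc:
                self._record(correlation_id, user_id, channel, f"error: {exc}")
                if attempt < self.MAX_RETRIES:
                    self.sleep(self.BASE_DELAY * 2 ** attempt)
            else:
                self._record(correlation_id, user_id, channel, "sent")
                return "sent"
        # Retries exhausted: park the message instead of retrying forever.
        self.dead_letters.append((correlation_id, user_id, channel, body))
        return "failed"

    def _record(self, correlation_id, user_id, channel, status):
        self.log.append({
            "correlation_id": correlation_id,
            "user_id": user_id,
            "channel": channel,
            "status": status,
        })
```

Adding SMS in Week 3 then means registering one more sender, not writing one more handler.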