The Problem
Documentation doesn’t fail because people are careless.
It fails because of latency.
- Code changes → docs updated later (maybe)
- Systems evolve → docs stay frozen
- New engineers → trust outdated information
Now scale that to reality:
- 33 microservices
- Multiple chains (EVM, Solana, Starknet, XRPL…)
- Independent deploy cycles
- Zero centralised ownership
At that point, documentation isn’t just outdated.
It’s fiction.
What I wanted was simple:
A knowledge base that watches the codebase and updates itself.
What Even Is a Knowledge Base?
Before we get into the build, let's get the basics right.
A knowledge base is a structured, queryable collection of information about a system — not code, not logs, but understanding.
It answers questions like:
- What does this service actually do?
- What APIs does it expose?
- How does data flow between components?
- What changed last month — and why?
In a microservices architecture, a knowledge base typically includes:
- Service docs — one file per service (purpose, APIs, models, behavior)
- Architecture doc — how services interact and depend on each other
- Sources registry — tracked repos, branches, last ingested commits
- Activity log — a timeline of changes for audits and onboarding
Together, these form a single source of truth for both humans and AI agents.
The problem?
It doesn’t stay true for long.
The Architecture
The system has three core components:
┌──────────────┐
│   Watcher    │
└──────┬───────┘
       │ detects changes
       ▼
┌──────────────┐
│ Ingestion AI │
└──────┬───────┘
       │ updates docs
       ▼
┌──────────────┐
│ Knowledge DB │
└──────┬───────┘
       │ PR for review
       ▼
   Engineers
### 1. The Watcher
A lightweight server (built with Elysia) runs a cron job every hour.
It reads a SOURCES.md file:
| Service | Repo | Branch | Last Commit |
| ----------- | ---- | ------ | ----------- |
| core-engine | ... | main | 9559715 |
For each service:
- Fetch latest commit from Gitea
- Compare with stored hash
If changed → mark for ingestion
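The comparison step can be sketched in TypeScript. This is a minimal sketch under stated assumptions, not the production watcher: `parseSources`, `detectChanges`, and `formatLog` are hypothetical names, and the latest hashes are passed in as a plain map so the Gitea call stays out of the example. It also assumes SOURCES.md stores abbreviated hashes, as in the table above, so it compares by prefix.

```typescript
// One row of SOURCES.md (columns follow the table shown in the article).
interface SourceRow {
  service: string;
  repo: string;
  branch: string;
  lastCommit: string;
}

type ChangeStatus =
  | { service: string; changed: true; from: string; to: string }
  | { service: string; changed: false; at: string };

// Parse the markdown table into rows, skipping the header and separator rows.
function parseSources(markdown: string): SourceRow[] {
  const rows: SourceRow[] = [];
  for (const line of markdown.split("\n")) {
    const trimmed = line.trim();
    if (!trimmed.startsWith("|")) continue;
    const cells = trimmed
      .replace(/^\|/, "")
      .replace(/\|$/, "")
      .split("|")
      .map((c) => c.trim());
    if (cells.length !== 4) continue;
    if (cells[0] === "Service") continue; // header row
    if (cells.every((c) => /^:?-+:?$/.test(c))) continue; // separator row
    const [service, repo, branch, lastCommit] = cells;
    rows.push({ service, repo, branch, lastCommit });
  }
  return rows;
}

// Compare stored hashes against the latest hashes (assumed to be fetched elsewhere).
function detectChanges(
  rows: SourceRow[],
  latestByService: Map<string, string>,
): ChangeStatus[] {
  const results: ChangeStatus[] = [];
  for (const row of rows) {
    const latest = latestByService.get(row.service);
    if (latest === undefined) continue; // nothing fetched for this service
    // Stored hashes are abbreviated, so a prefix match means "unchanged".
    const upToDate = row.lastCommit !== "" && latest.startsWith(row.lastCommit);
    if (upToDate) {
      results.push({ service: row.service, changed: false, at: row.lastCommit });
    } else {
      results.push({
        service: row.service,
        changed: true,
        from: row.lastCommit,
        to: latest.slice(0, 7),
      });
    }
  }
  return results;
}

// Render one status in the same shape as the cron log lines.
function formatLog(s: ChangeStatus): string {
  return s.changed
    ? `[cron] ↑ ${s.service}: ${s.from} → ${s.to} (changed)`
    : `[cron] = ${s.service}: ${s.at} (up to date)`;
}
```

Keeping the network call outside `detectChanges` makes the comparison easy to test without a Gitea instance; the cron handler would fill the map and then mark every `changed: true` entry for ingestion.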
Example logs:
```
[cron] ↑ core-engine: 9559715 → 393ca68 (changed)
[cron] ↑ core-daemon: bafe49e → e8b58de (changed)
[cron] = core-comms: e029062 (up to date)
```
---
### 2. The Ingestion Agent (The Brain)
This is where things get interesting.
For each changed repo:
1. Shallow clone
2. Generate a **targeted diff**
   ```bash
   git diff OLD_COMMIT NEW_COMMIT
   ```
3. Pass diff to an AI agent
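One detail worth spelling out: a depth-1 clone contains only the newest commit, so the old commit has to be fetched as well before `git diff` can compare the two. The sketch below builds the git argument lists for one way of doing that. It is an assumption-laden illustration, not the pipeline's actual commands: `buildDiffPlan` is a hypothetical helper, it assumes full-length hashes are available (fetching by hash does not accept abbreviated ones), and it assumes the server permits fetching a commit by hash.

```typescript
// Full object names only: 40 hex chars (SHA-1) or 64 (SHA-256).
const FULL_HASH = /^[0-9a-f]{40}([0-9a-f]{24})?$/;

// Build argv arrays for `git`, to be run in order (for example with
// child_process.execFile("git", args)). Nothing is executed here.
function buildDiffPlan(
  repoUrl: string,
  workDir: string,
  oldCommit: string,
  newCommit: string,
): string[][] {
  for (const hash of [oldCommit, newCommit]) {
    // Validation also keeps a stray "-..." value from being read as a flag.
    if (!FULL_HASH.test(hash)) {
      throw new Error(`expected a full commit hash, got "${hash}"`);
    }
  }
  return [
    ["init", workDir],
    ["-C", workDir, "remote", "add", "origin", repoUrl],
    // Fetch exactly the two commits, one level deep each.
    ["-C", workDir, "fetch", "--depth", "1", "origin", oldCommit, newCommit],
    // Diff compares the two trees, so no shared history is required.
    ["-C", workDir, "diff", oldCommit, newCommit],
  ];
}
```

Returning the plan as data instead of running it keeps the example side-effect free; if the server rejects fetch-by-hash, deepening an ordinary shallow clone until the old commit appears is a fallback.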
---
#### Why Diff-Based?
Instead of reading entire codebases (slow + expensive), the agent sees only:
> **what actually changed**
This reduces:
* token usage (~10x reduction)
* noise
* irrelevant processing
---
#### What the Agent Actually Does
This is the magic part.
It doesn’t just “update docs.”
It **understands changes semantically**:
* Detects new API endpoints
* Identifies modified data models
* Tracks service interaction changes
* Ignores refactors, tests, formatting noise
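Some of that noise can be removed before the agent ever sees the diff. The sketch below is a plain path filter, which is a different and much cruder technique than the semantic filtering described above: it drops whole files such as tests and lockfiles, and cannot recognise a refactor or a formatting-only change. `NOISE_PATTERNS`, `splitDiffByFile`, and `dropNoise` are hypothetical names, and the patterns are illustrative assumptions.

```typescript
// Paths treated as noise (illustrative; tune per repository).
const NOISE_PATTERNS: RegExp[] = [
  /(^|\/)(__tests__|tests?)\//, // test directories
  /\.(test|spec)\.[cm]?[jt]sx?$/, // JS/TS test files
  /(^|\/)(package-lock\.json|yarn\.lock|pnpm-lock\.yaml|bun\.lockb?)$/, // lockfiles
];

interface FilePatch {
  path: string;
  patch: string;
}

// Split `git diff` output into per-file sections.
// Each section starts with a "diff --git a/<path> b/<path>" header line.
function splitDiffByFile(diff: string): FilePatch[] {
  const files: FilePatch[] = [];
  for (const section of diff.split(/^(?=diff --git )/m)) {
    const header = /^diff --git a\/(.+?) b\/(.+)$/m.exec(section);
    if (!header) continue; // text before the first header, if any
    files.push({ path: header[2], patch: section });
  }
  return files;
}

// Keep only the sections whose path matches none of the noise patterns.
function dropNoise(diff: string): string {
  return splitDiffByFile(diff)
    .filter((f) => !NOISE_PATTERNS.some((p) => p.test(f.path)))
    .map((f) => f.patch)
    .join("");
}
```

A filter like this only trims the input; deciding whether a remaining change is a behaviour change or a refactor is still left to the agent.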
Then it:
* Updates the specific service doc
* Updates `architecture.md` if flows changed
* Maintains consistency across the system
---
### 3. The Knowledge Base
Everything lives in a dedicated repo (`core-context`).
After processing:
core-context/
├── services/
│   ├── core-engine.md
│   ├── core-daemon.md
├── architecture.md
├── SOURCES.md
└── ACTIVITY_LOG.md
Workflow:
1. Create branch:
```
auto/ingest-2026-04-21
```
2. Commit updates
3. Open PR:
   ingest: update docs (2026-04-21)
   Updated:
   - core-engine: 9559715 → 393ca68
   - core-daemon: bafe49e → e8b58de
4. Human review → merge
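The branch name, PR title, and PR body all follow from the date and the list of updated services, so they can be generated in one place. A minimal sketch, assuming the naming shown above; `ServiceUpdate` and `buildIngestPr` are hypothetical names, and using the UTC date is an assumption.

```typescript
interface ServiceUpdate {
  service: string;
  from: string; // previously ingested commit
  to: string; // newly ingested commit
}

interface IngestPr {
  branch: string;
  title: string;
  body: string;
}

// Derive branch name, PR title, and PR body for one ingestion run.
function buildIngestPr(date: Date, updates: ServiceUpdate[]): IngestPr {
  const day = date.toISOString().slice(0, 10); // YYYY-MM-DD, in UTC
  const lines = updates.map((u) => `- ${u.service}: ${u.from} → ${u.to}`);
  return {
    branch: `auto/ingest-${day}`,
    title: `ingest: update docs (${day})`,
    body: ["Updated:", ...lines].join("\n"),
  };
}
```

Opening the PR itself (through the Gitea API or a CLI) is left out here, since the article does not say how that call is made.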
What It Looks Like in Practice
[Deploy happens]
│
▼
[Watcher detects change]
│
▼
[Repo cloned + diff generated]
│
▼
[AI processes changes]
│
▼
[Docs updated automatically]
│
▼
[PR created]
│
▼
[Team reviews + merges]
Time from code change → documentation PR:
~1 hour at most (set by the hourly cron), plus review time
The Tricky Parts
Bun Compatibility
The Agent SDK spawns a subprocess.
Under Bun, that subprocess crashed on a DNS API that Bun had not implemented:
TypeError: q.addAddress is not a function
Fix: force the agent subprocess to run under Node.js instead of Bun
Token Limits
Early approach:
“Read entire repo”
Result:
- Massive token usage
- Rate limit issues
Fix: Diff-only ingestion
→ roughly 10x lower token usage
Environment Propagation
Overriding env for the subprocess replaced the inherited environment instead of extending it, which broke the runtime:
- Lost PATH
- Missing configs
Fix: let the SDK inherit the parent environment
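If a subprocess does need extra variables, the usual pattern is to extend the parent environment instead of replacing it. In Node's `child_process.spawn`, a supplied `env` object is used as the complete environment, which is how `PATH` goes missing; whether the Agent SDK passes `env` through the same way is an assumption here. `buildChildEnv` and `KB_REPO_DIR` are hypothetical names.

```typescript
type Env = Record<string, string | undefined>;

// Extend the parent environment with overrides instead of replacing it.
// Keys in `overrides` win; everything else (PATH, HOME, ...) is preserved.
function buildChildEnv(parent: Env, overrides: Record<string, string>): Env {
  return { ...parent, ...overrides };
}

// Example with a stand-in parent environment.
const exampleEnv = buildChildEnv(
  { PATH: "/usr/local/bin:/usr/bin", HOME: "/home/kb" },
  { KB_REPO_DIR: "/tmp/core-context" },
);
console.log(Object.keys(exampleEnv).sort().join(","));
```

In a real process the first argument would be `process.env`. Not passing `env` at all, as the fix above does, avoids the problem entirely when no overrides are needed.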
Why This Works
The key insight:
Documentation rot is a latency problem.
| Approach | Latency | Result |
| -------- | ------- | ------ |
| Manual updates | Days/weeks | Outdated docs |
| “We’ll update later” | Infinite | Dead docs |
| This system | ~1 hour | Docs stay current |
The Real Impact
This isn’t just “cool automation.”
It changes how teams operate:
- New engineers read reality, not stale docs
- No more “tribal knowledge bottlenecks”
- System understanding scales with codebase size
- Documentation becomes something you trust
Final Thought
I didn’t set out to build a documentation system.
I set out to eliminate the need for discipline in maintaining one.
Because the truth is:
The best documentation system isn’t the one people remember to update.
It’s the one that updates itself.
Built with Elysia, Bun, and the Anthropic Agent SDK. Running across 33 microservices in production.