Your Docs Are Lying. Mine Update Themselves.


## The Problem

Documentation doesn’t fail because people are careless.

It fails because of latency.

* Code changes → docs updated later (maybe)
* Systems evolve → docs stay frozen
* New engineers → trust outdated information

Now scale that to reality:

* 33 microservices
* Multiple chains (EVM, Solana, Starknet, XRPL…)
* Independent deploy cycles
* Zero centralised ownership

At that point, documentation isn’t just outdated.

It’s fiction.

What I wanted was simple:

> A knowledge base that watches the codebase and updates itself.


## What Even Is a Knowledge Base?

Before we get into the build, let's get the basics right.

A knowledge base is a structured, queryable collection of information about a system — not code, not logs, but understanding.

It answers questions like:

* What does this service actually do?
* What APIs does it expose?
* How does data flow between components?
* What changed last month — and why?

In a microservices architecture, a knowledge base typically includes:

* **Service docs** — one file per service (purpose, APIs, models, behavior)
* **Architecture doc** — how services interact and depend on each other
* **Sources registry** — tracked repos, branches, last ingested commits
* **Activity log** — a timeline of changes for audits and onboarding

Together, this becomes a single source of truth for both humans and AI agents.

The problem?

It doesn’t stay true for long.


## The Architecture

The system has three core components:

```
┌──────────────┐
│   Watcher    │
└──────┬───────┘
       │ detects changes
       ▼
┌──────────────┐
│ Ingestion AI │
└──────┬───────┘
       │ updates docs
       ▼
┌──────────────┐
│ Knowledge DB │
└──────┬───────┘
       │ PR for review
       ▼
   Engineers
```

### 1. The Watcher

A lightweight server (built with Elysia) runs a cron job every hour.

It reads a SOURCES.md file:


| Service     | Repo | Branch | Last Commit |
| ----------- | ---- | ------ | ----------- |
| core-engine | ...  | main   | 9559715     |
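A row of that table is easy to turn into a record the watcher can work with. A minimal sketch in TypeScript (the post doesn't show the real parsing code, and the repo path here is illustrative; header and separator rows are assumed to be skipped by the caller):

```typescript
// Parse one data row of the SOURCES.md table into a record.
interface SourceEntry {
  service: string;
  repo: string;
  branch: string;
  lastCommit: string; // short hash as stored in SOURCES.md
}

function parseSourceRow(row: string): SourceEntry | null {
  // Split on pipes, trim padding, and drop the empty edge cells
  // produced by the leading/trailing "|".
  const cells = row
    .split("|")
    .map((c) => c.trim())
    .filter((c) => c.length > 0);
  if (cells.length !== 4) return null;
  const [service, repo, branch, lastCommit] = cells;
  return { service, repo, branch, lastCommit };
}

console.log(parseSourceRow("| core-engine | gitea/org/core-engine | main | 9559715 |"));
```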

For each service:

* Fetch latest commit from Gitea
* Compare with stored hash

If changed → mark for ingestion
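The comparison itself is trivial; the only wrinkle is that SOURCES.md stores abbreviated hashes while the remote reports full ones. A sketch (illustrative helper, not the post's actual code):

```typescript
// Decide whether a service needs re-ingestion by comparing the short
// hash stored in SOURCES.md with the latest commit reported by Gitea.
function needsIngestion(storedShortHash: string, latestCommit: string): boolean {
  // SOURCES.md stores abbreviated hashes, so compare by prefix.
  return !latestCommit.startsWith(storedShortHash);
}

console.log(needsIngestion("9559715", "393ca68f00d1")); // changed → true
console.log(needsIngestion("e029062", "e029062aa117")); // up to date → false
```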

Example logs:

```
[cron] ↑ core-engine: 9559715 → 393ca68 (changed)
[cron] ↑ core-daemon: bafe49e → e8b58de (changed)
[cron] = core-comms: e029062 (up to date)
```

---

### 2. The Ingestion Agent (The Brain)

This is where things get interesting.

For each changed repo:

1. Shallow clone
2. Generate a **targeted diff**
   ```bash
   git diff OLD_COMMIT NEW_COMMIT
   ```

3. Pass diff to an AI agent
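The clone-and-diff step can be sketched as command construction. This is a hedged sketch, not the post's implementation: the URL and workdir are placeholders, and fetching arbitrary SHAs only works if the Git server permits it (a plain shallow clone may not contain `OLD_COMMIT`):

```typescript
// Build the shell commands for the shallow clone + targeted diff step.
// --filter=blob:none keeps the clone cheap; the two commits are then
// fetched explicitly so both sides of the diff are present.
function cloneAndDiffCommands(
  repoUrl: string,
  workdir: string,
  oldCommit: string,
  newCommit: string,
): string[] {
  return [
    `git clone --filter=blob:none --no-checkout ${repoUrl} ${workdir}`,
    `git -C ${workdir} fetch origin ${oldCommit} ${newCommit}`,
    `git -C ${workdir} diff ${oldCommit} ${newCommit}`,
  ];
}

const cmds = cloneAndDiffCommands(
  "https://gitea.example/org/core-engine.git",
  "/tmp/ingest/core-engine",
  "9559715",
  "393ca68",
);
console.log(cmds.join("\n"));
```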

---

#### Why Diff-Based?

Instead of reading entire codebases (slow + expensive), the agent sees only:

> **what actually changed**

This reduces:

* token usage (~10x reduction)
* noise
* irrelevant processing

---

#### What the Agent Actually Does

This is the magic part.

It doesn’t just “update docs.”

It **understands changes semantically**:

* Detects new API endpoints
* Identifies modified data models
* Tracks service interaction changes
* Ignores refactors, tests, formatting noise

Then it:

* Updates the specific service doc
* Updates `architecture.md` if flows changed
* Maintains consistency across the system
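The semantic understanding is the AI agent's job, but the "ignore tests and formatting noise" idea can also be pre-filtered cheaply before the diff ever reaches the model. A sketch with purely illustrative path rules:

```typescript
// Drop changed files that are unlikely to affect the docs before the
// diff reaches the agent. These patterns are illustrative only.
const NOISE_PATTERNS: RegExp[] = [
  /(^|\/)tests?\//,       // test directories
  /\.(test|spec)\.\w+$/,  // co-located test files
  /^\.(prettier|eslint)/, // formatter / linter configs
];

function isDocRelevant(changedPath: string): boolean {
  return !NOISE_PATTERNS.some((re) => re.test(changedPath));
}

const changed = [
  "src/api/routes.ts",
  "tests/routes.test.ts",
  "src/models/order.ts",
  ".prettierrc",
];
console.log(changed.filter(isDocRelevant)); // → ["src/api/routes.ts", "src/models/order.ts"]
```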

---

### 3. The Knowledge Base

Everything lives in a dedicated repo (`core-context`).

After processing:

```
core-context/
├── services/
│   ├── core-engine.md
│   └── core-daemon.md
├── architecture.md
├── SOURCES.md
└── ACTIVITY_LOG.md
```


Workflow:

1. Create branch:

   ```
   auto/ingest-2026-04-21
   ```

2. Commit updates

3. Open PR:

   ```
   ingest: update docs (2026-04-21)

   Updated:

   - core-engine: 9559715 → 393ca68
   - core-daemon: bafe49e → e8b58de
   ```

4. Human review → merge
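The branch name, PR title, and body all derive mechanically from the ingestion results. A sketch of that assembly step (function and field names are assumptions, not from the post):

```typescript
// Assemble branch name, PR title, and body from the ingestion results.
interface Update {
  service: string;
  from: string; // old short hash
  to: string;   // new short hash
}

function prMetadata(date: string, updates: Update[]) {
  return {
    branch: `auto/ingest-${date}`,
    title: `ingest: update docs (${date})`,
    body: ["Updated:", ...updates.map((u) => `- ${u.service}: ${u.from} → ${u.to}`)].join("\n"),
  };
}

const pr = prMetadata("2026-04-21", [
  { service: "core-engine", from: "9559715", to: "393ca68" },
  { service: "core-daemon", from: "bafe49e", to: "e8b58de" },
]);
console.log(pr.branch); // → auto/ingest-2026-04-21
console.log(pr.title);  // → ingest: update docs (2026-04-21)
```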

## What It Looks Like in Practice

```
[Deploy happens]
        │
        ▼
[Watcher detects change]
        │
        ▼
[Repo cloned + diff generated]
        │
        ▼
[AI processes changes]
        │
        ▼
[Docs updated automatically]
        │
        ▼
[PR created]
        │
        ▼
[Team reviews + merges]
```

Time from code change → documentation update:

~1 hour


## The Tricky Parts

### Bun Compatibility

The Agent SDK spawns a subprocess.

Bun broke due to missing DNS APIs:

```
TypeError: q.addAddress is not a function
```

Fix: Force execution via Node.js


### Token Limits

Early approach:

“Read entire repo”

Result:

* Massive token usage
* Rate limit issues

Fix: Diff-only ingestion
→ ~10x efficiency improvement


### Environment Propagation

Overriding env in subprocess broke runtime:

* Lost PATH
* Missing configs

Fix: Let SDK inherit environment naturally


## Why This Works

The key insight:

Documentation rot is a latency problem.

| Approach             | Latency    | Result          |
| -------------------- | ---------- | --------------- |
| Manual updates       | Days/weeks | Outdated docs   |
| “We’ll update later” | Infinite   | Dead docs       |
| This system          | ~1 hour    | Always accurate |

## The Real Impact

This isn’t just “cool automation.”

It changes how teams operate:

* New engineers read reality, not stale docs
* No more “tribal knowledge bottlenecks”
* System understanding scales with codebase size
* Documentation becomes something you trust

## Final Thought

I didn’t set out to build a documentation system.

I set out to eliminate the need for discipline in maintaining one.

Because the truth is:

The best documentation system isn’t the one people remember to update.
It’s the one that updates itself.


Built with Elysia, Bun, and the Anthropic Agent SDK. Running across 33 microservices in production.
