In my last post, I talked about what agents actually need from API responses — how minimal confirmations triggered verification loops and wasted more tokens than a richer response would have cost. That work happened inside Flowplane, where I was building MCP tools for an Envoy-based API platform.
But before the agent could deploy anything, it needed to know what the target APIs looked like. And that's where I hit a problem I've been hitting for years.
The Spec Problem
I've spent a lot of my career telling organisations they need OpenAPI specs. Good specs. Up-to-date specs. Machine-readable, version-controlled, reviewed-by-humans specs.
Almost nobody has them.
It's not because people don't care. It's because the process is painful. You either write YAML by hand — which is tedious and error-prone — or you generate specs from source code annotations, which only work if someone maintains the annotations. Spoiler: they don't.
With Flowplane, this wasn't a theoretical concern. I needed accurate schemas for APIs I was integrating with. Some had specs, most didn't, and the ones that did were missing fields, wrong on types, or a version behind. The usual story. But now I had a new consumer for those specs — an agent — and agents don't gracefully handle "close enough."
So I built Specwatch.
The Approach
Instead of trying to write specs or generate them from code, Specwatch figures out the schema from live traffic. Point it at any API, use the API normally, and it learns the schema from what it sees.
It runs as a local reverse proxy. No cloud, no agents, no sidecars. One CLI command:
npx specwatch start https://api.example.com --name "my-api"
That gives you a local proxy on localhost:8080. Use it instead of the real API — send your usual requests, run your test suite through it, whatever. When you're done:
npx specwatch export --name "my-api" -o openapi.yaml
You get an OpenAPI 3.1 spec. Or 3.0 — since a lot of tooling and API gateways still don't fully support 3.1, Specwatch can export either format. The more traffic you send through it, the better the spec gets — more fields, tighter types, higher confidence.
What I Didn't Expect to Be Hard
I assumed the hard part would be schema inference — figuring out types, handling nested objects, dealing with arrays of mixed types. And that was work, certainly. But the problem that surprised me most was path normalisation.
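To give a sense of what that inference work involves, here's a minimal sketch of turning an observed JSON value into a schema, with mixed-type arrays becoming a union of item schemas. This is illustrative only — the names and shape are my assumptions, not Specwatch's actual internals, and real inference also merges schemas across many samples:

```typescript
// Illustrative single-pass value-to-schema inference (not Specwatch's API).
type Schema =
  | { type: "string" | "number" | "integer" | "boolean" | "null" }
  | { type: "array"; items: Schema[] } // union of distinct item schemas
  | { type: "object"; properties: Record<string, Schema> };

function infer(value: unknown): Schema {
  if (value === null) return { type: "null" };
  if (typeof value === "boolean") return { type: "boolean" };
  if (typeof value === "number")
    return { type: Number.isInteger(value) ? "integer" : "number" };
  if (typeof value === "string") return { type: "string" };
  if (Array.isArray(value)) {
    // Deduplicate item schemas so a mixed array becomes a union (oneOf).
    const items: Schema[] = [];
    for (const v of value) {
      const s = infer(v);
      if (!items.some((i) => JSON.stringify(i) === JSON.stringify(s))) {
        items.push(s);
      }
    }
    return { type: "array", items };
  }
  const properties: Record<string, Schema> = {};
  for (const [k, v] of Object.entries(value as object)) {
    properties[k] = infer(v);
  }
  return { type: "object", properties };
}
```

Even this toy version has to decide things like "is 3 an integer or a number?" — decisions that compound once you start merging schemas from hundreds of responses.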
When you see /users/123 and /users/456, it's obvious those are the same endpoint with a path parameter. But what about /users/123/orders/789? Specwatch needs to figure out that both segments are parameters, and it needs to name them contextually — {userId} and {orderId}, not {param1} and {param2}. The naming comes from the preceding path segment. It sounds simple, but getting it right across a real API surface with dozens of endpoints took more iteration than I expected.
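The core of that heuristic can be sketched in a few lines. This version is a simplification of what the post describes — it treats numeric and UUID-shaped segments as parameters, whereas the real tool compares many observed paths; the function names are mine, not Specwatch's:

```typescript
// Sketch: normalise a concrete path into an OpenAPI path template,
// naming each parameter from the preceding literal segment.
const looksLikeId = (seg: string): boolean =>
  /^\d+$/.test(seg) ||
  /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i.test(seg);

// Naive singularisation for parameter naming: "users" -> "user".
const singular = (word: string): string =>
  word.endsWith("s") ? word.slice(0, -1) : word;

function normalizePath(path: string): string {
  const segments = path.split("/").filter(Boolean);
  const out = segments.map((seg, i) => {
    if (!looksLikeId(seg)) return seg;
    // Name the parameter from the preceding literal segment, if any.
    const prev = i > 0 && !looksLikeId(segments[i - 1]) ? segments[i - 1] : null;
    return prev ? `{${singular(prev)}Id}` : `{id}`;
  });
  return "/" + out.join("/");
}
```

So `/users/123/orders/789` becomes `/users/{userId}/orders/{orderId}`. The iteration the post mentions lives in the edge cases this sketch ignores: segments that are sometimes literals and sometimes IDs, irregular plurals, and APIs that put IDs first.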
There were other surprises too — handling PATCH semantics so partial-update bodies don't pollute required field detection, confidence scoring so you know which parts of the spec to trust and which need more traffic. Each one felt like a small thing until it wasn't.
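The PATCH problem is worth a concrete sketch, under the assumption (mine, not confirmed by the post beyond the general idea) that a field counts as required only if it appears in every full-representation body:

```typescript
// Sketch: required-field detection that excludes PATCH bodies, so a
// partial update like PATCH {"name": "b"} doesn't demote other fields.
interface Sample {
  method: string;
  body: Record<string, unknown>;
}

function requiredFields(samples: Sample[]): string[] {
  // Only full-representation bodies count toward "required".
  const full = samples.filter((s) => s.method !== "PATCH");
  if (full.length === 0) return [];
  const counts = new Map<string, number>();
  for (const s of full) {
    for (const key of Object.keys(s.body)) {
      counts.set(key, (counts.get(key) ?? 0) + 1);
    }
  }
  // Required = seen in every non-PATCH sample.
  return [...counts.entries()]
    .filter(([, n]) => n === full.length)
    .map(([k]) => k)
    .sort();
}
```

Without the filter, a single partial update would mark every omitted field as optional — exactly the kind of quiet spec corruption that's hard to notice until an agent relies on it.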
Why a Proxy?
I considered a few approaches. Browser extension, packet capture, log parsing. A reverse proxy hit the sweet spot: it works with any HTTP client, needs zero code changes, and the implementation stays simple.
The proxy is non-blocking — your response comes back first, and inference runs in the background. Zero latency impact on your actual work. And nothing sensitive gets stored. Raw request and response bodies are never persisted. Only the inferred schemas go into the local SQLite database on your machine.
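The respond-first pattern is simple to sketch in Node terms — this is my illustration of the general technique, not Specwatch's implementation:

```typescript
// Sketch: queue inference work so it runs only after the event loop has
// had a chance to finish pending I/O (like flushing the client response).
type Job = () => void;
const queue: Job[] = [];
let draining = false;

function enqueueInference(job: Job): void {
  queue.push(job);
  if (!draining) {
    draining = true;
    // setImmediate yields to the event loop before draining starts.
    setImmediate(drain);
  }
}

function drain(): void {
  const job = queue.shift();
  if (job) {
    job();
    setImmediate(drain); // one job per tick, so I/O stays responsive
  } else {
    draining = false;
  }
}
```

In the proxy's request handler, you'd pipe the upstream response straight back to the client and call `enqueueInference` with a copy of the captured bodies — the client never waits on schema work.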
Where This Came From
The inference engine is a TypeScript port of a Rust module I wrote for Flowplane. While porting it, I found and fixed a couple of bugs in the original — oneOf types were losing data because the implementation stored type names instead of full schemas, and the integer/number compatibility logic was backwards: widening integer to number should be compatible, but the Rust code had it the other way around.
Building Specwatch as a standalone tool made those issues obvious in a way they weren't inside the larger system. That's something I keep experiencing: pulling a piece out of a bigger project and making it work on its own always improves it. The constraints are clearer. The tests are more focused. You can't hide behind the complexity of the surrounding system.
What's Next
Specwatch is at v0.1.0. It works, it's tested, but where it goes from here depends on how people use it. The backlog has ideas — HAR file import, Postman collection ingestion, breaking change detection between API versions — but I'd rather let real usage shape the priorities than guess.
If you work with APIs that don't have specs — or have specs you don't trust — give it a try. It's MIT licensed and the whole thing runs locally.
github.com/rajeevramani/specwatch