We Ported NanoClaw from Claude to Codex. The Hard Part Was Not the Runtime.

NanoClaw started as an agent harness that was not just compatible with Claude, but heavily shaped by Claude while working inside its own repo. Porting it to Codex meant rewriting both the runtime and the instruction substrate that had been teaching the old agent how to extend the app.

Most "AI app ports" are not really ports.

Usually they are some combination of:

  • swap the SDK
  • change the model name
  • leave the runtime assumptions alone
  • declare victory

This was not that.

When Anthropic tightened Claude subscription usage for third-party harnesses, one thing became obvious fast:

Building a serious agent system on top of a first-party subscription runtime is not a stable foundation.

That did not instantly kill NanoClaw. But it did force the right question:

What parts of this system are actually our architecture, and what parts are just the residue of the agent runtime that originally helped build it?

That turned out to be the whole story.

NanoClaw Was Built Inside an Agent World

NanoClaw originally evolved in a Claude-native operating model.

The workflow looked roughly like this:

  • point Claude Code at the repo
  • give it a stack of markdown instructions
  • run command-driven setup flows
  • let it scaffold, extend, and refine the app from inside that environment

That means the old system was not merely "compatible with Claude."

It was, in a very real sense, built by an agent that also helped define the repo's conventions for how future work should happen.

So once we decided to make NanoClaw truly Codex-native, the job was not:

"replace Anthropic with OpenAI."

The real job was:

separate the application from the worldview of the agent that originally co-authored its architecture.

The Markdown Was Part of the Runtime

This is the part that matters if you have ever let an agent spend serious time extending a codebase.

NanoClaw did not just have docs. It had a markdown-based behavioral layer:

  • CLAUDE.md
  • .claude/skills/...
  • setup flows
  • installer guidance
  • capability docs
  • hidden conventions for how the repo should be extended

These files were not just passive notes.

They told the agent:

  • how to initialize the app
  • how to install integrations
  • how memory should work
  • how channels should be extended
  • how "skills" should be applied

So some of the .md files were effectively architecture, even if they were not executable TypeScript.

That meant the port had two distinct layers:

  1. Port the host/container runtime
  2. Audit and rewrite the instruction layer that had been shaping the system

If we only did the first, the app would still think in Claude-shaped terms.

What NanoClaw Actually Is

At its core, NanoClaw is a small Node.js orchestrator for running isolated assistants inside containers.

The host process owns:

  • channels
  • routing
  • SQLite state
  • scheduling
  • group registration
  • IPC
  • container lifecycle

The container owns:

  • the actual agent runtime
  • group-local context
  • active skills
  • delegated subagents
  • tool access through MCP

That architecture survived the port.

What changed was the substrate it was built on.
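The ownership split above can be sketched as types. This is an illustrative shape, not NanoClaw's actual API; every name here is hypothetical:

```typescript
// Hypothetical shapes for the host/container contract described above.
// The host owns channels, routing, state, and scheduling; the container
// only ever sees one group's context.

interface HostMessage {
  groupId: string;                          // which group's container handles this
  channel: "telegram" | "whatsapp" | "web"; // host-owned channel layer
  text: string;
}

interface ContainerTurn {
  sessionId: string | null; // null => fresh turn, otherwise resume
  prompt: string;           // built by the host from state + group instructions
}

// The host builds the turn; the container runs the agent backend against it.
function toTurn(msg: HostMessage, priorSession: string | null): ContainerTurn {
  return {
    sessionId: priorSession,
    prompt: `[${msg.channel}:${msg.groupId}] ${msg.text}`,
  };
}
```

The point of the boundary is that nothing in `HostMessage` knows which agent backend sits on the other side.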

What Was Actually Claude-Specific

Before the port, Claude-specific assumptions were everywhere:

  • Claude SDK query loop
  • CLAUDE.md instruction contract
  • .claude/ session and state layout
  • .claude/skills/ install model
  • old slash-command setup flows
  • Claude-era delegation semantics
  • remote control built around Claude tooling

And because the app had evolved with Claude actively working in the repo, those assumptions were reinforced both in code and in markdown.

So the first real task was not coding.

It was classification:

  • what is host application behavior?
  • what is Claude-specific runtime behavior?
  • what is a real feature that just happens to be wearing Claude-shaped names?
  • which markdown files are still runtime-significant?
  • which ones are just historical scaffolding now?

That step mattered more than any single code change.

If you classify the layers wrong, you either delete real capabilities or preserve the wrong abstractions.

The Port Was Not "Remove Claude, Add Codex"

A real port required four categories of work.

1. Replace the runtime contract

The old container runner was built around Claude-native execution.

The new one is built around Codex CLI running non-interactively inside the container.

That meant:

  • building prompt context from host state plus group instructions
  • invoking Codex correctly for fresh turns and resumed turns
  • streaming results back through a stable host/container protocol
  • preserving session continuity where possible

That sounds simple until you hit the reality that CLI surfaces diverge in annoying ways.

We hit bugs where:

  • flags worked on codex but not codex exec
  • fresh turns accepted options that codex exec resume rejected

Those are the kinds of portability problems that do not show up in architecture diagrams. They only show up when the host queue is already working and production message flow starts failing on resume paths.

2. Rewrite the instruction layer

This was the difference between a fake port and a real one.

The Claude-native repo had accumulated behavior through markdown:

  • skills
  • setup flow
  • integration guidance
  • memory conventions
  • extension patterns

We had to go file by file and make hard calls:

  • convert
  • replace
  • prune
  • delete

That meant moving from Claude-specific conventions to Codex-native ones:

  • CLAUDE.md became AGENTS.md
  • repo and group skills became SKILL.md based
  • old .claude/skills/ assumptions were removed
  • setup had to be rewritten around the runtime that actually exists today

This was not glamorous work, but it was essential.

If the repo still teaches the wrong mental model to the next agent that works in it, the architecture immediately starts drifting backwards.

3. Keep the real product features

One easy failure mode in a port is to treat every old feature as vendor-specific and quietly delete it.

That would have been wrong here.

Some things really were Claude-specific and needed to die:

  • Claude SDK integration
  • Claude remote control
  • old Claude-only marketplace conventions

But other things were just product capabilities that happened to live in a Claude-native system:

  • Telegram
  • WhatsApp
  • Web UI
  • subagent delegation
  • host skills

Those needed to be ported, not discarded.

The right question was not:

"Does this mention Claude?"

The right question was:

"Is this a real application capability, and does Codex have a legitimate equivalent or integration path?"

4. Make the host own the architecture

This is the actual lesson of the whole project.

The clean shape is:

  • NanoClaw owns orchestration
  • NanoClaw owns state
  • NanoClaw owns channels
  • NanoClaw owns setup
  • NanoClaw owns skills
  • Codex is the agent backend

If the host owns those things, the backend is replaceable.

If the backend owns those things, the host is just branding.

The Bugs That Proved the Point

The most interesting bugs in this kind of port are not random. They reveal exactly where your assumptions still belong to the old system.

Bug 1: The bot looked alive but was brain-dead

Telegram connected.
Messages were reaching the database.
Logs looked active.

But nothing was actually being processed.

Why?

Because WhatsApp was loading even when it was not really configured, entering a reconnect loop, and blocking the async startup path before the message loop was fully alive.

That produced a nasty half-working state:

  • inbound messages landed in SQLite
  • logs looked normal enough
  • the main message loop never really came online
  • the system looked alive while functionally dead

The fix was to stop treating channel startup as monolithic:

  • load channels only from real config/auth state
  • allow channel connection failure without killing overall startup
  • skip missing channels cleanly instead of blocking the process

That sounds basic. It was not basic in context. It was a symptom of a system that had grown around agent-era assumptions instead of explicit host orchestration.

Bug 2: Auth existed, but Codex still returned 401

This was my favorite bug in the whole port because it was pure systems work.

The host was logged into Codex.
The container was supposed to inherit auth.
But every in-container turn failed with:

401 Unauthorized: Missing bearer or basic authentication in header

The credentials were not absent.
They were misplaced.

The chain looked like this:

  • auth was copied into a nested .codex/.codex/auth.json
  • that path was mounted into the container
  • Codex expected auth relative to $HOME/.codex/auth.json
  • the file existed on disk, just not where the CLI actually looked

This is the kind of bug that disappears if you only talk in high-level diagrams.

It only becomes obvious when you trace:

  • host path
  • mounted path
  • container $HOME
  • actual CLI lookup conventions

The fix:

  • flatten the per-group Codex state layout
  • mount it directly to /home/node/.codex
  • migrate the old nested layout forward

After that, the 401 vanished and the live message path finally completed.
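The flattening step can be sketched as a small migration over the per-group state directory. The path layout mirrors the post; the helper is hypothetical, assuming the state directory is what gets bind-mounted to `/home/node/.codex`:

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// mountDir is the per-group state directory that gets bind-mounted to
// /home/node/.codex inside the container. The bug was an accidental
// extra .codex level nested underneath it.
function flattenCodexState(mountDir: string): boolean {
  const nested = path.join(mountDir, ".codex");
  if (!fs.existsSync(nested)) return false; // already flat, nothing to migrate

  // Move auth.json and friends up one level, to where the CLI
  // actually looks: $HOME/.codex/auth.json in the container.
  for (const entry of fs.readdirSync(nested)) {
    fs.renameSync(path.join(nested, entry), path.join(mountDir, entry));
  }
  fs.rmdirSync(nested); // now empty
  return true;
}
```

Running this once on each old group directory migrates the nested layout forward, and running it again is a no-op.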

Bug 3: Resume was not the same as fresh execution

Fresh turns worked.
Resume did not.

The reason was simple and annoying:

codex exec resume did not accept the same options as the fresh execution path.

That is a perfect example of why "we already integrated the CLI" is not a meaningful milestone.

You have to test:

  • fresh turns
  • resumed turns
  • session restoration
  • output handling after resume

Those are separate runtime contracts, even when they live behind the same binary.
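One way to survive that divergence is to guard the resume path explicitly. A sketch, where the specific unsupported flag is a placeholder rather than a documented list:

```typescript
// Flags accepted on fresh turns but rejected by `codex exec resume`.
// The flag name here is a stand-in -- populate this from testing your
// actual CLI version, not from memory of the docs.
const RESUME_UNSUPPORTED = new Set(["--some-fresh-only-flag"]);

function argsFor(mode: "fresh" | "resume", sessionId: string | null, options: string[]): string[] {
  if (mode === "fresh") {
    return ["exec", ...options];
  }
  // Resume path: drop options the resume subcommand rejects,
  // rather than failing the live message flow at runtime.
  const safe = options.filter((o) => !RESUME_UNSUPPORTED.has(o));
  return ["exec", "resume", sessionId ?? "", ...safe];
}
```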

What the System Looks Like Now

The current NanoClaw branch is not "Claude code with Codex paint."

It is a different runtime model.

Today it has:

  • a Codex-native container runner
  • AGENTS.md for project and group instructions
  • SKILL.md based skills
  • per-group Codex state
  • Telegram, WhatsApp, and Web UI channels
  • Codex-backed delegation through MCP
  • host-owned setup and orchestration

Most importantly, it no longer depends on the repo being interpreted through Claude-native markdown conventions to keep evolving correctly.

That was the actual finish line.

Not "the build passes."

Not "the logs look healthy."

But:

the application can now be operated, extended, and reasoned about as a Codex-native system.

The Real Lesson

The strongest takeaway from this work is that agent-built systems can end up with architecture embedded in places most teams do not normally treat as architecture.

In this case, that included:

  • markdown instruction files
  • setup flows
  • skill directories
  • implied extension workflows
  • hidden filesystem conventions

If you want to port a system like that, you have to audit all of it.

Not just the code.

Not just the SDK.

All of it.

Because once an agent has spent enough time building inside a repo, the repo starts encoding assumptions about that agent's worldview.

That is fascinating when it works.

It is dangerous when you need to switch runtimes.

If You Are Thinking About Doing This Yourself

My advice is:

  • inventory the instruction layer before touching the runtime
  • treat markdown conventions as potentially runtime-significant architecture
  • separate host ownership from backend ownership early
  • verify real CLI behavior instead of trusting docs from memory
  • test startup, auth, fresh turns, resumed turns, and recovery independently
  • do not call it a port if you quietly deleted capabilities that had real equivalents

The hard part is not replacing the model.

The hard part is identifying all the places where the old model had already shaped the application.

Once you do that, the work becomes much clearer:

you are not porting an SDK. You are reclaiming the architecture.
