The frustration that started everything
I'd been using Claude Code and OpenCode daily for months. The models were impressive — genuinely capable of understanding complex codebases, writing solid logic, catching bugs I missed. But there was a pattern I couldn't shake: give it a real task, something with more than two or three steps, and it would fall apart halfway through. Not because it was dumb. Because it had no discipline.
It would skip reading the relevant files and just start writing. It would forget what it decided three prompts ago. It would use the most expensive model for a task that needed a two-line answer. Every session felt like handing the wheel to someone brilliant who had no memory of the drive so far.
Task completion — on anything non-trivial — hovered around 20%. That's not a model problem. That's a structure problem.
What I actually built
AgentKit is an open-source workflow layer that sits on top of AI coding agents and gives them what they're missing: a memory, a plan, and a process they have to follow before touching your code.
It has five core layers:
- The Intelligent Skill Router classifies every prompt and injects only the relevant skills into context, cutting token usage by ~89% per session. The agent stops reading 45,000 tokens of documentation it doesn't need and reads the 5,000 it does. (A sketch of the routing idea follows this list.)
- The Project Memory Graph is a SQLite-backed knowledge graph that records every file, function, API route, and architectural decision across sessions. When you start a new session, AgentKit already knows what you were building, what you decided, and why. (Schema sketch below.)
- The Token Budget Intelligence layer automatically routes simple tasks to cheaper models and complex ones to powerful models. Against an all-Sonnet baseline, it cuts costs by around 60%. (Routing sketch below.)
- The Workflow Engine is the most important piece. It enforces a strict Research → Plan → Execute → Review → Ship state machine. The agent literally cannot edit a file without an approved plan. No shortcuts. No jumping ahead. (State-machine sketch below.)
- The Universal Platform Layer means none of this is locked to one tool. AgentKit installs across 11 platforms (Claude Code, OpenCode, Hermes, and more) with a single command.
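
To make the Skill Router idea concrete, here is a minimal TypeScript sketch of keyword-based skill selection under a token budget. The `Skill` shape, the trigger heuristic, and `routeSkills` are my illustration, not AgentKit's actual API; the real classifier is presumably smarter than substring matching, but the shape of the win is the same: inject a few thousand tokens instead of the whole library.

```typescript
// Hypothetical sketch: classify a prompt, inject only matching skills.
// Names (Skill, routeSkills) are illustrative, not AgentKit's real API.
interface Skill {
  name: string;
  triggers: string[]; // keywords suggesting the skill is relevant
  content: string;    // the documentation injected into context
  tokens: number;     // rough token count of `content`
}

function routeSkills(prompt: string, library: Skill[], budget = 5_000): Skill[] {
  const lower = prompt.toLowerCase();
  const relevant = library.filter((s) =>
    s.triggers.some((t) => lower.includes(t)),
  );
  // Fill up to the token budget instead of dumping the whole library.
  const selected: Skill[] = [];
  let used = 0;
  for (const skill of relevant) {
    if (used + skill.tokens > budget) break;
    selected.push(skill);
    used += skill.tokens;
  }
  return selected;
}
```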
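
The Project Memory Graph maps naturally onto two tables: nodes for entities (files, functions, routes, decisions) and edges for relationships between them. This is a hypothetical schema sketch using better-sqlite3; AgentKit's real schema isn't documented here, so every table and column name is an assumption.

```typescript
// Hypothetical schema sketch for a SQLite-backed project memory graph.
// Table and column names are illustrative, not AgentKit's actual schema.
import Database from "better-sqlite3";

const db = new Database("project-memory.db");

db.exec(`
  CREATE TABLE IF NOT EXISTS nodes (
    id    INTEGER PRIMARY KEY,
    kind  TEXT NOT NULL,   -- 'file' | 'function' | 'route' | 'decision'
    name  TEXT NOT NULL,
    notes TEXT             -- e.g. why an architectural decision was made
  );
  CREATE TABLE IF NOT EXISTS edges (
    src      INTEGER NOT NULL REFERENCES nodes(id),
    dst      INTEGER NOT NULL REFERENCES nodes(id),
    relation TEXT NOT NULL -- 'defines' | 'calls' | 'affects' ...
  );
`);

// Record an architectural decision and link it to the file it affects,
// so a fresh session can recover both the decision and the reasoning.
const insertNode = db.prepare(
  "INSERT INTO nodes (kind, name, notes) VALUES (?, ?, ?)",
);
const fileId = insertNode.run("file", "src/auth.ts", null).lastInsertRowid;
const decisionId = insertNode.run(
  "decision",
  "use-jwt-sessions",
  "Chose JWTs over server sessions to keep the API stateless.",
).lastInsertRowid;
db.prepare("INSERT INTO edges (src, dst, relation) VALUES (?, ?, ?)").run(
  decisionId, fileId, "affects",
);
```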
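
The Token Budget layer's core decision is cheap to make: estimate task complexity, pick a model tier. A toy version, with made-up heuristics and tier names standing in for whatever AgentKit actually uses:

```typescript
// Hypothetical complexity-based model routing. The heuristic and the
// tier names are illustrative stand-ins, not AgentKit's real logic.
type Model = "haiku" | "sonnet" | "opus";

function pickModel(task: string): Model {
  const multiStep = /refactor|migrate|design|architecture/i.test(task);
  const trivial = task.split(/\s+/).length < 20 && !multiStep;
  if (trivial) return "haiku";  // a two-line answer doesn't need a frontier model
  if (multiStep) return "opus"; // deep, multi-file work gets the big model
  return "sonnet";              // sensible middle default
}
```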
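
The Workflow Engine's guarantee, that the agent cannot edit a file without an approved plan, is at its core a state machine with one hard gate. Here is a sketch of how such a gate might look; the class and method names are mine, not AgentKit's.

```typescript
// Hypothetical sketch of the Research → Plan → Execute → Review → Ship
// gate. The enforcement rule (no edits without an approved plan) comes
// from the post; the types and API shape are my illustration.
type Phase = "research" | "plan" | "execute" | "review" | "ship";

const NEXT: Record<Phase, Phase | null> = {
  research: "plan",
  plan: "execute",
  execute: "review",
  review: "ship",
  ship: null,
};

class Workflow {
  private phase: Phase = "research";
  private planApproved = false;

  approvePlan(): void {
    if (this.phase !== "plan") throw new Error("Nothing to approve yet.");
    this.planApproved = true;
  }

  advance(): void {
    if (this.phase === "plan" && !this.planApproved) {
      throw new Error("Cannot execute without an approved plan.");
    }
    const next = NEXT[this.phase];
    if (next === null) throw new Error("Workflow already shipped.");
    this.phase = next;
  }

  editFile(path: string): void {
    // The hard gate: file edits are only legal in the execute phase.
    if (this.phase !== "execute") {
      throw new Error(`Refusing to edit ${path} outside execute phase.`);
    }
    // ...perform the edit...
  }
}
```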
The moment the benchmark landed
I ran the same tasks with the same model — Gemma 4 31b — with and without AgentKit active. Same prompts, same codebase, same evaluation criteria.
Without AgentKit: ~20% task completion.
With AgentKit: ~80%.
I ran it twice because I didn't believe it the first time. The difference is entirely structural. Planning before execution changes everything. Not the model. The process.
Where it is now
AgentKit is live, open source, and published on npm. We're at v0.5.x — early, but stable enough that developers are cloning it daily and the Skill-Sync Bridge is already auto-generating new skills from session experience.
The project is also starting to get attention from the developer community, including a benchmark writeup that landed on Dev.to.
The preview version (agentkit-preview) goes further — a Telegram bridge for 24/7 remote agent control from your phone, an approval gate system so the agent can never auto-execute without your confirmation, and a self-improving skill library that grows from your own sessions.
Try it
```bash
npx agentkit-ai@latest init
```
GitHub: https://github.com/Ajaysable123/AgentKit