The frustration that started everything
I'd been using Claude Code and OpenCode daily for months. The models were impressive — genuinely capable of understanding complex codebases, writing solid logic, catching bugs I missed. But there was a pattern I couldn't shake: give it a real task, something with more than two or three steps, and it would fall apart halfway through. Not because it was dumb. Because it had no discipline.
It would skip reading the relevant files and just start writing. It would forget what it decided three prompts ago. It would use the most expensive model for a task that needed a two-line answer. Every session felt like handing the wheel to someone brilliant who had no memory of the drive so far.
Task completion — on anything non-trivial — hovered around 20%. That's not a model problem. That's a structure problem.
What I actually built
AgentKit is an open-source workflow layer that sits on top of AI coding agents and gives them what they're missing: a memory, a plan, and a process they have to follow before touching your code.
It has five core layers:
- The Intelligent Skill Router classifies every prompt and injects only the relevant skills into context, cutting token usage by ~89% per session. The agent stops reading 45,000 tokens of documentation it doesn't need and reads the 5,000 it does. (A sketch of the routing idea follows this list.)
- The Project Memory Graph is a SQLite-backed knowledge graph that records every file, function, API route, and architectural decision across sessions. When you start a new session, AgentKit already knows what you were building, what you decided, and why. (Schema sketch below.)
- The Token Budget Intelligence layer automatically routes simple tasks to cheaper models and complex ones to powerful models. Against an all-Sonnet baseline, it cuts costs by around 60%. (Routing sketch below.)
- The Workflow Engine is the most important piece. It enforces a strict Research → Plan → Execute → Review → Ship state machine. The agent literally cannot edit a file without an approved plan. No shortcuts. No jumping ahead. (State-machine sketch below.)
- The Universal Platform Layer means none of this is locked to one tool. AgentKit installs across 11 platforms (Claude Code, OpenCode, Hermes, and more) with a single command.
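
To make the Skill Router idea concrete, here is a minimal TypeScript sketch of keyword-based skill selection under a token budget. The `Skill` shape, the trigger heuristic, and `routeSkills` are my illustration, not AgentKit's actual API; the real classifier is presumably smarter than substring matching, but the shape of the win is the same: inject a few thousand tokens instead of the whole library.

```typescript
// Hypothetical sketch: classify a prompt, inject only matching skills.
// Names (Skill, routeSkills) are illustrative, not AgentKit's real API.
interface Skill {
  name: string;
  triggers: string[]; // keywords suggesting the skill is relevant
  content: string;    // the documentation injected into context
  tokens: number;     // rough token count of `content`
}

function routeSkills(prompt: string, library: Skill[], budget = 5_000): Skill[] {
  const lower = prompt.toLowerCase();
  const relevant = library.filter((s) =>
    s.triggers.some((t) => lower.includes(t)),
  );
  // Fill up to the token budget instead of dumping the whole library.
  const selected: Skill[] = [];
  let used = 0;
  for (const skill of relevant) {
    if (used + skill.tokens > budget) break;
    selected.push(skill);
    used += skill.tokens;
  }
  return selected;
}
```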
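
The Project Memory Graph maps naturally onto two tables: nodes for entities (files, functions, routes, decisions) and edges for relationships between them. This is a hypothetical schema sketch using better-sqlite3; AgentKit's real schema isn't documented here, so every table and column name is an assumption.

```typescript
// Hypothetical schema sketch for a SQLite-backed project memory graph.
// Table and column names are illustrative, not AgentKit's actual schema.
import Database from "better-sqlite3";

const db = new Database("project-memory.db");

db.exec(`
  CREATE TABLE IF NOT EXISTS nodes (
    id    INTEGER PRIMARY KEY,
    kind  TEXT NOT NULL,   -- 'file' | 'function' | 'route' | 'decision'
    name  TEXT NOT NULL,
    notes TEXT             -- e.g. why an architectural decision was made
  );
  CREATE TABLE IF NOT EXISTS edges (
    src      INTEGER NOT NULL REFERENCES nodes(id),
    dst      INTEGER NOT NULL REFERENCES nodes(id),
    relation TEXT NOT NULL -- 'defines' | 'calls' | 'affects' ...
  );
`);

// Record an architectural decision and link it to the file it affects,
// so a fresh session can recover both the decision and the reasoning.
const insertNode = db.prepare(
  "INSERT INTO nodes (kind, name, notes) VALUES (?, ?, ?)",
);
const fileId = insertNode.run("file", "src/auth.ts", null).lastInsertRowid;
const decisionId = insertNode.run(
  "decision",
  "use-jwt-sessions",
  "Chose JWTs over server sessions to keep the API stateless.",
).lastInsertRowid;
db.prepare("INSERT INTO edges (src, dst, relation) VALUES (?, ?, ?)").run(
  decisionId, fileId, "affects",
);
```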
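
The Token Budget layer's core decision is cheap to make: estimate task complexity, pick a model tier. A toy version, with made-up heuristics and tier names standing in for whatever AgentKit actually uses:

```typescript
// Hypothetical complexity-based model routing. The heuristic and the
// tier names are illustrative stand-ins, not AgentKit's real logic.
type Model = "haiku" | "sonnet" | "opus";

function pickModel(task: string): Model {
  const multiStep = /refactor|migrate|design|architecture/i.test(task);
  const trivial = task.split(/\s+/).length < 20 && !multiStep;
  if (trivial) return "haiku";  // a two-line answer doesn't need a frontier model
  if (multiStep) return "opus"; // deep, multi-file work gets the big model
  return "sonnet";              // sensible middle default
}
```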
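
The Workflow Engine's guarantee, that the agent cannot edit a file without an approved plan, is at its core a state machine with one hard gate. Here is a sketch of how such a gate might look; the class and method names are mine, not AgentKit's.

```typescript
// Hypothetical sketch of the Research → Plan → Execute → Review → Ship
// gate. The enforcement rule (no edits without an approved plan) comes
// from the post; the types and API shape are my illustration.
type Phase = "research" | "plan" | "execute" | "review" | "ship";

const NEXT: Record<Phase, Phase | null> = {
  research: "plan",
  plan: "execute",
  execute: "review",
  review: "ship",
  ship: null,
};

class Workflow {
  private phase: Phase = "research";
  private planApproved = false;

  approvePlan(): void {
    if (this.phase !== "plan") throw new Error("Nothing to approve yet.");
    this.planApproved = true;
  }

  advance(): void {
    if (this.phase === "plan" && !this.planApproved) {
      throw new Error("Cannot execute without an approved plan.");
    }
    const next = NEXT[this.phase];
    if (next === null) throw new Error("Workflow already shipped.");
    this.phase = next;
  }

  editFile(path: string): void {
    // The hard gate: file edits are only legal in the execute phase.
    if (this.phase !== "execute") {
      throw new Error(`Refusing to edit ${path} outside execute phase.`);
    }
    // ...perform the edit...
  }
}
```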
The moment the benchmark landed
I ran the same tasks with the same model — Gemma 4 31b — with and without AgentKit active. Same prompts, same codebase, same evaluation criteria.
Without AgentKit: ~20% task completion.
With AgentKit: ~80%.
I ran it twice because I didn't believe it the first time. The difference is entirely structural. Planning before execution changes everything. Not the model. The process.
Where it is now
AgentKit is live, open source, and published on npm. We're at v0.5.x — early, but stable enough that developers are cloning it daily and the Skill-Sync Bridge is already auto-generating new skills from session experience.
The project is also starting to get attention from the developer community, including a benchmark writeup that landed on Dev.to.
The preview version (agentkit-preview) goes further — a Telegram bridge for 24/7 remote agent control from your phone, an approval gate system so the agent can never auto-execute without your confirmation, and a self-improving skill library that grows from your own sessions.
Try it
```bash
npx agentkit-ai@latest init
```
GitHub: https://github.com/Ajaysable123/AgentKit