I gave Claude Code 24 specialist roles, then used them to rebuild my own agency

— Originally published at dev.to

TL;DR — A few months ago I posted that I'd given Claude Code 16 specialist roles to ship features solo. I didn't stop at 16. It's 24 now — and the change I actually care about isn't the count. I split every auditor into a read-only finder that can't touch your code and a scoped fixer that changes only what was flagged. Six slash commands became eleven. The whole system is now a one-command plugin install. Then I did the obvious thing: I pointed the agent team at my own company and rebuilt Creative Brain's site with it — client zero. Same gather → delegate → verify loop I'd run on a client repo.


Where this started

If you read the first article, here's the one-paragraph recap: I'm a solo founder. I built a Claude Code plugin where one orchestrator session does nothing but gather context, delegate to a narrow specialist, and verify the output. Sixteen specialists, six slash commands, file-based memory in memory/, and safety hooks that block dangerous moves in the runtime. Idea to staging in under an hour, everything in the repo, no new SaaS.

That system worked. So I kept building it. This is the "I didn't stop at 16" update — and then the part I'm actually proud of: what the system let me do.

Fig 1 — One orchestrator gathers context and delegates; each specialist works in its own clean context window.


The one change that mattered most: split find from fix

Here's the most important thing I changed, and it's not the headcount.

Before, a single agent would "find the bug and fix it." Sounds efficient. In practice, an agent that finds a problem and patches it in the same breath gets ambitious: you ask it to tighten one auth check, and the report comes back as "found 14 issues" alongside three unrelated files it quietly "tidied up" while it was in there. Now your diff is a mystery and your review is useless.

Fig 2 — The anti-pattern: one agent that both finds and fixes has an unbounded blast radius.

So I split every auditor in two:

  • Finders are read-only. They scan the diff and hand back a structured report. They literally cannot touch your code — their tool allowlist has no write access.
  • Fixers take that report and change only what was flagged. Nothing else.

The loop becomes: find → I read → scoped fix. The audit is read-only, I read the report, and the fix does exactly one thing. Reviewable. Boring. Safe. Every agent's blast radius shrank to one job.
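
One way to check that a fix really stayed in scope is to compare the files the fixer changed against the files the finder flagged. The sketch below is a minimal illustration, not the plugin's actual code: the Finding shape, the out_of_scope helper, and the example paths are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """One issue from a read-only finder (hypothetical report shape)."""
    path: str
    line: int
    message: str


def out_of_scope(findings: list[Finding], changed_paths: list[str]) -> list[str]:
    """Return the files the fixer changed that no finding flagged."""
    flagged = {finding.path for finding in findings}
    return sorted(path for path in set(changed_paths) if path not in flagged)


# Hypothetical finder report and fixer diff, for illustration only.
report = [
    Finding("app/api/auth.ts", 42, "session check missing on POST"),
    Finding("lib/db.ts", 10, "unparameterized query"),
]
fixer_changes = ["app/api/auth.ts", "lib/utils.ts"]

print(out_of_scope(report, fixer_changes))  # ['lib/utils.ts']
```

In practice the changed paths could come from something like `git diff --name-only`, and a non-empty result would be the signal to reject the fix before reviewing it.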

Fig 3 — The split: a read-only finder reports issues, and a scoped fixer changes only what was flagged.

Fig 4 — Before, one move touched everything. After, each agent is scoped to exactly one job.

This is the kind of constraint that's invisible until you've been burned by its absence. If you run coding agents: do you actually constrain what your "fix" step is allowed to touch — or do you just hope?


What else changed since the last post

The numbers moved, and I want to be precise about them because the build status is now reconciled against the actual files on disk — not aspirational counts.

  • 16 specialists → 24. The roster grew, but mostly because the auditors doubled (a finder and a fixer where there used to be one combined agent), plus a few new roles.
  • 6 slash commands → 11. Added a rollback-author, a performance profiler, a BRD-to-roadmap command, and a changelog builder, among others.
  • A setup script → a one-command plugin install. It used to need a setup script. Now it's /plugin install and you've got the full roster, no scaffolding step.
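
Reconciling claimed counts against the files on disk can be as simple as counting definition files per directory. This is a rough sketch under assumptions: it supposes each agent and command is one Markdown file in an agents/ or commands/ directory, and the reconcile helper and the example agent names are hypothetical.

```python
import tempfile
from pathlib import Path


def count_definitions(root: Path, subdir: str) -> int:
    """Count Markdown definition files directly under root/subdir."""
    folder = root / subdir
    if not folder.is_dir():
        return 0
    return sum(1 for path in folder.glob("*.md") if path.is_file())


def reconcile(root: Path, claimed: dict[str, int]) -> dict[str, tuple[int, int]]:
    """Map each subdir whose on-disk count differs to (claimed, actual)."""
    mismatches = {}
    for subdir, expected in claimed.items():
        actual = count_definitions(root, subdir)
        if actual != expected:
            mismatches[subdir] = (expected, actual)
    return mismatches


# Demo against a throwaway directory: three agent files, no commands folder.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "agents").mkdir()
    for name in ("planner", "security-finder", "security-fixer"):
        (root / "agents" / f"{name}.md").write_text("# agent\n")
    demo = reconcile(root, {"agents": 3, "commands": 11})

print(demo)  # {'commands': (11, 0)}
```

Run against a real checkout, the call would look like `reconcile(Path("."), {"agents": 24, "commands": 11})`, and an empty result means the published numbers match the repo.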

Fig 5 — The 24 specialists in four bands — the audit band is where the finder/fixer doubling happened.

Fig 6 — Six commands became eleven: rollback, perf profiler, BRD→roadmap, changelog builder, and more.

That last bullet matters more than it sounds. A one-command install is the difference between "a thing I built for myself" and "a thing someone else can actually run." The whole roster — agents, commands, hooks, memory model — installs in one move.

Fig 7 — /plugin install brings the whole roster in one command. No setup script.


The rule I'd bet the whole thing on

Everything above rests on one principle, and it hasn't changed since the first article:

The safety gates live in the runtime, not the prompt. A prompt can be ignored. A runtime gate cannot.

Claude Code's hook system runs scripts at lifecycle events, and I use them to enforce hard limits — no force pushes, no rm -rf near root, no production deploy without my explicit approval, every write appended to an audit log. These aren't polite requests in a system prompt that a confident model can rationalize past. They're hooks. They run in the runtime layer. The AI cannot override them.
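
To make the idea concrete, here is a minimal sketch of the decision logic such a gate might run before a shell command executes. It assumes the hook receives a JSON event with tool_name and tool_input.command fields and that exit code 2 signals "block"; verify both against the current hook documentation. The patterns and the decide helper are illustrative, not the plugin's real rules.

```python
import re
from typing import Optional

# Illustrative patterns only; a production gate would need stricter parsing.
BLOCKED = [
    (re.compile(r"\bgit\s+push\b.*\s(--force|-f)\b"), "force push"),
    (
        re.compile(r"\brm\s+-(?=\w*r)(?=\w*f)\w+\s+/(\*|\s|$)"),
        "recursive delete at filesystem root",
    ),
]


def blocked_reason(command: str) -> Optional[str]:
    """Return why a command is blocked, or None if it may run."""
    for pattern, reason in BLOCKED:
        if pattern.search(command):
            return reason
    return None


def decide(event: dict) -> tuple[int, str]:
    """Return (exit_code, message) for one tool-call event (assumed shape)."""
    if event.get("tool_name") != "Bash":
        return 0, ""
    command = event.get("tool_input", {}).get("command", "")
    reason = blocked_reason(command)
    if reason:
        return 2, f"Blocked by hook: {reason}"  # assumed: 2 means "block"
    return 0, ""


safe = {"tool_name": "Bash", "tool_input": {"command": "git push origin main"}}
risky = {"tool_name": "Bash", "tool_input": {"command": "git push --force origin main"}}

print(decide(safe))   # (0, '')
print(decide(risky))  # (2, 'Blocked by hook: force push')
```

Wired up as a real hook script, the event would be read from stdin with `json.load(sys.stdin)`, the message written to stderr, and the process exited with the returned code. The point is that the decision happens in a script the model never gets to argue with.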

The finder/fixer split is the same idea pushed one level deeper. A finder being read-only isn't a sentence in its prompt asking it nicely not to edit files; its tool allowlist simply contains no write tools. The constraint is structural, not advisory. That's the whole reason I trust the system enough to point it at production.
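
The difference between an advisory and a structural constraint can be shown in a few lines. The sketch below models an allowlist check at dispatch time; the AgentSpec class, the dispatch function, the agent names, and the specific tool names are assumptions for illustration, not the runtime's actual interface.

```python
from dataclasses import dataclass

# Assumed names for write-capable tools; adjust to what the runtime exposes.
WRITE_TOOLS = frozenset({"Edit", "Write"})


class ToolNotAllowed(Exception):
    """Raised when an agent requests a tool outside its allowlist."""


@dataclass(frozen=True)
class AgentSpec:
    name: str
    allowed_tools: frozenset

    @property
    def read_only(self) -> bool:
        """True when the allowlist contains no write-capable tool."""
        return self.allowed_tools.isdisjoint(WRITE_TOOLS)


def dispatch(agent: AgentSpec, tool: str) -> str:
    """Refuse any tool call that is not on the agent's allowlist."""
    if tool not in agent.allowed_tools:
        raise ToolNotAllowed(f"{agent.name} may not use {tool}")
    return f"{agent.name} -> {tool}"


# Hypothetical finder/fixer pair.
finder = AgentSpec("security-finder", frozenset({"Read", "Grep", "Glob"}))
fixer = AgentSpec("security-fixer", frozenset({"Read", "Edit"}))

print(finder.read_only)         # True
print(dispatch(fixer, "Edit"))  # security-fixer -> Edit
try:
    dispatch(finder, "Edit")
except ToolNotAllowed as err:
    print(err)                  # security-finder may not use Edit
```

Nothing in the finder's prompt matters here: the write call fails because the tool is absent from the allowlist, whatever the model decides to attempt.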

Fig 8 — Hooks gate the tool lifecycle in the runtime layer — a dangerous command is blocked before it runs.


Then I made my own company client zero

This is the part I actually wanted to write about.

For years, Creative Brain was a creative + web development studio — good work, but priced and paced like every other shop: scoped in weeks, billed by the hour. The pivot to an AI-native agency wasn't a new logo. It was a new way of building. And before I sold "we ship with a gated AI agent team" to a single client, I made myself client zero.

So I pointed my own 24-specialist Claude Code team at creativebrain.ca and rebuilt the whole site:

  • a full /services architecture
  • a free /tools suite and a /seo-tools suite
  • a complete custom illustration system — no stock, no icon-library filler, hand-built SVGs in one coherent family
  • an actual design system, tokens and all, documented

The loop was exactly the one I'd run on a client repo. A planner spec'd it. Builders wrote it. Read-only finders audited it. I reviewed and shipped:

planner drafts the spec        → I approve
builders implement             → finders audit (read-only) → I review
I ship

gather → delegate → verify. What used to be a 2–3 week site project compressed into 3–4 days of focused sessions — because I wasn't doing the typing, I was doing the deciding.

Fig 9 — Client zero: the same gather→delegate→verify loop, pointed at the agency's own site.


The honest part (this hasn't changed either)

The AI did not replace the taste. Every design call, every "no, not that," every brand decision was mine. The agents gave me leverage, not judgment — and that's still the job. A few more honest limits, same as the first article:

  • It's slow on day one of a new repo. The specialists need a memory/ to read against. Day three, it flies; day one, you're hand-holding.
  • Migrations still make me nervous. I read every Supabase migration line by line. The auditor is a second opinion, not the final word.
  • It still can't design anything beautiful. Correct, accessible components — yes. Taste — no. That's on me.
  • It costs tokens. Twenty-four specialists running against a non-trivial diff eat budget. The trade vs. the alternative is still wildly favorable, but it isn't free.

None of that is a disclaimer. It's the reason the system is safe to run: I know exactly where it stops being trustworthy, and the gates are built around those edges.

Fig 10 — The receipt: a 2–3 week studio project compressed into 3–4 days of focused, decision-led sessions.


So what's the pitch now

Creative Brain isn't "a studio that uses AI." It's an agency where one person plus a gated agent team ships at a pace a traditional shop can't match — in fixed-price, fixed-scope Sprints, most of them 14 days, with the client owning 100% of the code on day one. The rebuilt site is the receipt: live, custom, and built by the exact system I've been posting about.

If you run an agency or a studio, here's the question the rebuild left me with: what's the one project you'd redo from scratch if your build time dropped to a fifth of what it is now?

And if you want to see this running on your stack — Next.js + Supabase or otherwise — I do free 15-minute walkthroughs. I'll look at your repo and tell you honestly whether the approach fits:

https://calendly.com/creativebrain-ca/free-mvp-strategy-call

The whole orchestrator pattern — the 24 agents, the finder/fixer split, the hooks, the memory model — is documented in the repo. Drop a question in the comments with your stack and I'll dig into the specifics.
