How We Built a 19-Agent AI Dev Team: foxdev and foxagentdev


Introduction

What if your code were audited by 10 specialized AI agents before every merge? What if another team of 9 AI agents could build, deploy, and monitor your infrastructure autonomously?

That's exactly what we built at Fox Digital: foxdev (the audit team) and foxagentdev (the build team). Together, they form a 19-agent AI development workflow that has shipped production-grade code — including a full fiscal API with 464 tests and a 93.4/100 quality score — in days instead of weeks.


foxdev v3.0 — The Audit Team (10 Agents)

foxdev is a team of 10 specialized LLM-powered agents, each responsible for a specific audit domain. Every code change passes through all 10 agents before it can merge.

  • ag01 Code Review: style consistency, design patterns, anti-patterns, DRY violations
  • ag02 Test Coverage: finds untested code paths, suggests edge cases, runs mutation testing
  • ag03 Security: OWASP checks, injection vectors, auth flaws, replay attack surfaces
  • ag04 Debug: traces error paths, identifies silent failures, validates retry logic
  • ag05 Refactor: identifies fat services, extracts traits, enforces Single Responsibility
  • ag06 Docs: PHPDoc coverage, README quality, ADR completeness, changelog accuracy
  • ag07 Performance: N+1 queries, missing indexes, cache opportunities, query plan optimization
  • ag08 Compliance: LGPD data handling, fiscal law compliance, data retention rules
  • ag09 Observability: structured logging, metrics instrumentation, tracing, alerting coverage
  • ag10 FinOps: cloud cost tracking, resource optimization, billing accuracy
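Every change passes through all ten agents. Below is a minimal Python sketch of that fan-out. It assumes each agent is an independent call that returns a domain score and a list of findings. The `DomainResult` type, the `run_agent` stub, and the domain keys are hypothetical. A real agent would send the diff to an LLM rather than return a fixed score.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import List

# Hypothetical IDs mirroring the ten foxdev audit domains listed above.
AGENTS = {
    "ag01": "code_review", "ag02": "test_coverage", "ag03": "security",
    "ag04": "debug", "ag05": "refactor", "ag06": "docs",
    "ag07": "performance", "ag08": "compliance", "ag09": "observability",
    "ag10": "finops",
}

@dataclass
class DomainResult:
    agent_id: str
    domain: str
    score: float                                   # 0-100 domain score
    findings: List[str] = field(default_factory=list)

def run_agent(agent_id: str, domain: str, diff: str) -> DomainResult:
    # Placeholder: a real agent would prompt an LLM with the diff and a
    # domain-specific checklist, then parse findings from its answer.
    return DomainResult(agent_id, domain, score=100.0)

def audit(diff: str) -> List[DomainResult]:
    # Send the same diff to all ten agents concurrently; none is skipped.
    with ThreadPoolExecutor(max_workers=len(AGENTS)) as pool:
        futures = [pool.submit(run_agent, aid, dom, diff) for aid, dom in AGENTS.items()]
        return [f.result() for f in futures]
```

Running the agents in parallel keeps wall-clock time near that of the slowest agent rather than the sum of all ten.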

FOXVERIFY Scoring

Each agent produces findings and a domain score. FOXVERIFY consolidates them into a single 0–100 quality score. The threshold for merge is 90.
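The post does not say how FOXVERIFY combines domain scores. One plausible sketch assumes a weighted mean of the 0–100 domain scores, with equal weights unless weights are given, and applies the stated merge threshold of 90. The names `consolidate` and `can_merge` are hypothetical.

```python
from typing import Dict, Optional

MERGE_THRESHOLD = 90.0  # from the post: merge requires a score of at least 90

def consolidate(domain_scores: Dict[str, float],
                weights: Optional[Dict[str, float]] = None) -> float:
    """Collapse per-agent domain scores (0-100) into a single 0-100 score.

    Assumption: a weighted arithmetic mean, equal weights by default.
    """
    if not domain_scores:
        raise ValueError("need at least one domain score")
    weights = weights or {agent: 1.0 for agent in domain_scores}
    total_weight = sum(weights[agent] for agent in domain_scores)
    weighted = sum(score * weights[agent] for agent, score in domain_scores.items())
    return round(weighted / total_weight, 1)

def can_merge(score: float) -> bool:
    return score >= MERGE_THRESHOLD
```

Weighting lets a team make, for example, security findings count more than documentation findings without changing the 0–100 scale.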

Score evolution on FOX NF-e:

Version   Score (0–100)
v1.0      70.0
v1.1      77.0
v1.4      93.0
v2.0      76.0
v2.1      77.9
v2.2      82.4
v2.3      84.9
v2.4      91.7
v2.5      93.4

We started at 70/100 on day one. After 9 audit-rework iterations over 3 days, we reached 93.4/100 with zero open findings.


foxagentdev — The Build Team (9 Agents + 8 Hooks)

foxagentdev is the autonomous build side. While foxdev audits, foxagentdev writes code, runs tests, fixes issues, and deploys.

The system runs on 9 specialized build agents with 8 lifecycle hooks:

  • SecurityScan — runs security checks before any code is committed
  • QualityCheck — pre-commit quality gate
  • Stop — emergency brake for unexpected states
  • PostToolUse — cleans up after each tool execution
  • memory-check — validates FoxMemory context before building
  • environment-selector — routes tasks to the correct environment (dev/staging/prod)
  • skill-router — assigns the right skill to the right agent
  • Plus one additional orchestration hook
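Hooks like these are typically ordered callbacks attached to lifecycle events, and any hook can veto the next step. The toy Python sketch below covers a subset of the hooks listed above. It assumes two events (`pre_build`, `pre_commit`) and keyword-based routing. The event names, the `Context` fields, and the checks inside each hook are illustrative placeholders, not foxagentdev's actual logic.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class Context:
    task: str
    diff: str = ""
    environment: str = "dev"
    skill: str = "general"
    blocked_reason: str = ""
    trace: List[str] = field(default_factory=list)

# A hook inspects or updates the context and returns False to block the step.
Hook = Callable[[Context], bool]

def memory_check(ctx: Context) -> bool:
    ctx.trace.append("memory-check")      # a real hook would load prior lessons here
    return True

def environment_selector(ctx: Context) -> bool:
    ctx.environment = "prod" if "release" in ctx.task else "dev"
    ctx.trace.append("environment-selector")
    return True

def skill_router(ctx: Context) -> bool:
    ctx.skill = "api" if "endpoint" in ctx.task else "general"
    ctx.trace.append("skill-router")
    return True

def security_scan(ctx: Context) -> bool:
    ctx.trace.append("SecurityScan")
    if "password=" in ctx.diff.replace(" ", ""):   # toy hardcoded-secret check
        ctx.blocked_reason = "possible hardcoded credential"
        return False
    return True

def quality_check(ctx: Context) -> bool:
    ctx.trace.append("QualityCheck")
    if "TODO" in ctx.diff:
        ctx.blocked_reason = "unresolved TODO in diff"
        return False
    return True

HOOKS: Dict[str, List[Hook]] = {
    "pre_build": [memory_check, environment_selector, skill_router],
    "pre_commit": [security_scan, quality_check],
}

def run_hooks(event: str, ctx: Context) -> bool:
    """Run the hooks for an event in order; stop at the first one that blocks."""
    for hook in HOOKS.get(event, []):
        if not hook(ctx):
            return False
    return True
```

Short-circuiting on the first blocking hook means a failed security scan never reaches the quality gate, which keeps the block reason unambiguous.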

foxagentdev manages 18 skills across the entire ecosystem and runs cron jobs for foxpresence GEO visibility, which scores 100/100 in our internal benchmarks.

The two systems coexist via COEXISTENCE.md — a protocol that prevents foxdev (auditor) and foxagentdev (builder) from conflicting during parallel execution.
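The contents of COEXISTENCE.md are not shown here. One common way to keep an auditor and a builder from colliding is a per-path lease: whoever holds the lease on a file owns it until release or expiry. This Python sketch assumes that model. `PathLeases`, the owner strings, and the 300-second TTL are hypothetical.

```python
import threading
import time
from typing import Dict, Tuple

class PathLeases:
    """Toy lease table: at most one owner per path at a time (hypothetical protocol)."""

    def __init__(self, ttl_seconds: float = 300.0):
        self._ttl = ttl_seconds
        self._lock = threading.Lock()
        self._owners: Dict[str, Tuple[str, float]] = {}   # path -> (owner, acquired_at)

    def acquire(self, path: str, owner: str) -> bool:
        now = time.monotonic()
        with self._lock:
            current = self._owners.get(path)
            if current and current[0] != owner and now - current[1] < self._ttl:
                return False          # another agent holds an unexpired lease
            self._owners[path] = (owner, now)
            return True

    def release(self, path: str, owner: str) -> None:
        with self._lock:
            if self._owners.get(path, (None, 0.0))[0] == owner:
                del self._owners[path]
```

The TTL guards against a crashed agent holding a file forever; the trade-off is that a long-running task must re-acquire its lease before it expires.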


The Workflow

  1. Developer writes a DSPy-format prompt (15–25 lines max — structured, not free-form)
  2. foxagentdev builds the feature: writes code, tests, documentation
  3. foxdev audits the result via FOXVERIFY
  4. Score < 90? Automatic rework loop — foxagentdev addresses all findings
  5. Score >= 90? Merge to main
  6. Every merge: FoxMemory saves lessons learned for continuous improvement
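The loop in steps 2–6 can be sketched as follows. The sketch assumes `build` and `audit` are supplied as callables and that a plain list can stand in for FoxMemory. The iteration cap and the escalation to a human are assumptions added for safety; the post does not state a limit.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

MERGE_THRESHOLD = 90.0   # step 5: merge at a score of 90 or higher

@dataclass
class AuditReport:
    score: float
    findings: List[str] = field(default_factory=list)

def run_workflow(
    prompt: str,
    build: Callable[[str, List[str]], str],   # foxagentdev: prompt + open findings -> code
    audit: Callable[[str], AuditReport],      # foxdev: code -> FOXVERIFY report
    memory: List[str],                        # stand-in for FoxMemory
    max_iterations: int = 10,                 # assumed cap, not from the post
) -> Tuple[str, AuditReport, int]:
    findings: List[str] = []
    for iteration in range(1, max_iterations + 1):
        code = build(prompt, findings)        # step 2, or step 4 on rework
        report = audit(code)                  # step 3
        if report.score >= MERGE_THRESHOLD:   # step 5
            memory.append(f"merged after {iteration} iteration(s): {prompt}")  # step 6
            return code, report, iteration
        findings = report.findings            # step 4: address every finding next round
    raise RuntimeError("score stayed below threshold; escalate to a human reviewer")
```

Passing the previous round's findings back into `build` is what turns a failed audit into a targeted rework rather than a fresh attempt.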

The key insight: for agent tasks, DSPy-format prompts dramatically outperformed free-form prompts in our workflow. Structured prompts reduce ambiguity, which lets agents operate more autonomously and produce more consistent results.
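DSPy expresses a structured prompt as a `dspy.Signature`: an instruction plus named input and output fields with descriptions. The plain-Python sketch below mimics that shape without depending on the library. It also adds a constraints section and enforces the 15–25 line budget from step 1. The `StructuredPrompt` class and the example task are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class StructuredPrompt:
    instruction: str
    inputs: Dict[str, str]     # field name -> description
    outputs: Dict[str, str]    # field name -> description
    constraints: List[str]

    def render(self) -> str:
        lines = [f"Task: {self.instruction}", "", "Inputs:"]
        lines += [f"- {name}: {desc}" for name, desc in self.inputs.items()]
        lines += ["", "Outputs:"]
        lines += [f"- {name}: {desc}" for name, desc in self.outputs.items()]
        lines += ["", "Constraints:"]
        lines += [f"- {c}" for c in self.constraints]
        return "\n".join(lines)

    def check_length(self, min_lines: int = 15, max_lines: int = 25) -> None:
        # Enforce the 15-25 line budget so prompts stay structured and short.
        n = len(self.render().splitlines())
        if not min_lines <= n <= max_lines:
            raise ValueError(f"prompt has {n} lines; expected {min_lines}-{max_lines}")

# Hypothetical example task, for illustration only.
example = StructuredPrompt(
    instruction="Add an endpoint that cancels an issued NF-e.",
    inputs={
        "access_key": "44-digit NF-e access key",
        "reason": "cancellation justification text",
        "tenant_id": "issuer account identifier",
    },
    outputs={
        "status": "authorizer response status",
        "protocol": "cancellation protocol number",
    },
    constraints=[
        "Validate the access key before any network call.",
        "Write tests for success and rejection paths.",
        "Log with structured fields, never raw XML.",
        "Do not change existing public routes.",
        "Update the changelog.",
    ],
)
example.check_length()
```

Naming every input and output forces the author to decide up front what the agent receives and must return, which is where most free-form ambiguity comes from.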


Real Results: FOX NF-e Project

Everything in the following list was built using the foxdev + foxagentdev workflow:

  • 464 automated tests, 1,109 assertions — built in 4 days
  • 93.4/100 quality score via FOXVERIFY
  • 0 open findings at ship time
  • 24 MCP tools implemented (Streamable HTTP, JSON-RPC 2.0)
  • 5,571 municipalities enriched with NFSe provider mapping
  • Full LGPD Art. 18 compliance endpoints
  • Prometheus metrics + webhook replay protection
  • Automatic contingency: EPEC + SVC-AN + SVC-RS + FS-DA
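The post lists webhook replay protection without details. The Python sketch below assumes a common pattern: sign the timestamp, nonce, and body with HMAC-SHA256, reject requests outside a time window, and remember the nonces seen inside that window. `ReplayGuard`, the message layout, and the 300-second tolerance are hypothetical choices.

```python
import hashlib
import hmac
import time
from typing import Dict, Optional

class ReplayGuard:
    """Toy verifier: HMAC signature + timestamp window + nonce cache."""

    def __init__(self, secret: bytes, tolerance_seconds: int = 300):
        self._secret = secret
        self._tolerance = tolerance_seconds
        self._seen: Dict[str, int] = {}          # nonce -> request timestamp

    def sign(self, timestamp: int, nonce: str, body: bytes) -> str:
        # Bind timestamp and nonce into the MAC so neither can be swapped.
        message = f"{timestamp}.{nonce}.".encode() + body
        return hmac.new(self._secret, message, hashlib.sha256).hexdigest()

    def verify(self, timestamp: int, nonce: str, body: bytes,
               signature: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        if abs(now - timestamp) > self._tolerance:
            return False                         # stale or future-dated request
        if not hmac.compare_digest(self.sign(timestamp, nonce, body), signature):
            return False                         # tampered body, nonce, or timestamp
        # Drop nonces whose timestamps have left the window; keeps memory bounded.
        self._seen = {n: t for n, t in self._seen.items()
                      if now - t <= self._tolerance}
        if nonce in self._seen:
            return False                         # replay inside the window
        self._seen[nonce] = timestamp
        return True
```

Pruning by request timestamp is safe: any request older than the window fails the timestamp check before the nonce cache is consulted, so forgotten nonces cannot be replayed.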

Benefits

  • Code quality is measured on every iteration, not judged subjectively (you have a score)
  • Security issues caught before production — ag03 runs on every PR
  • Documentation stays current — ag06 enforces it automatically
  • Compliance is proactive — ag08 flags LGPD issues before they become violations
  • New developers onboard faster — everything is documented, tested, and explained
  • Cost: ~$30–80/month in LLM API calls for the entire audit pipeline

Lessons Learned

  • Specialized agents outperform generalist agents: in our workflow, 10 focused agents beat 1 "do everything" agent every time
  • Scoring creates accountability — a number forces honesty about code quality
  • Memory systems prevent regressions — FoxMemory means mistakes get made once, not repeatedly
  • Human-in-the-loop for architecture, AI for implementation — the right division of responsibility
  • DSPy-format prompts are a game changer — structured prompts produce more reliable, reproducible results

Author

Paulo Fox — CEO at Fox Digital. Former contractor for SpaceX (2014–2017) and Google (2019–2022). MIT AI Strategy & GenAI. Creator of FOX NF-e, foxdev, and foxagentdev.
