Introduction
What if your code were audited by 10 specialized AI agents before every merge? What if another team of 9 AI agents could build, deploy, and monitor your infrastructure autonomously?
That's exactly what we built at Fox Digital: foxdev (the audit team) and foxagentdev (the build team). Together, they form a 19-agent AI development workflow that has shipped production-grade code — including a full fiscal API with 464 tests and a 93.4/100 quality score — in days instead of weeks.
foxdev v3.0 — The Audit Team (10 Agents)
foxdev is a team of 10 specialized LLM-powered agents, each responsible for a specific audit domain. Every code change passes through all 10 agents before it can merge.
- ag01 Code Review: style consistency, design patterns, anti-patterns, DRY violations
- ag02 Test Coverage: finds untested code paths, suggests edge cases, runs mutation testing
- ag03 Security: OWASP checks, injection vectors, auth flaws, replay attack surfaces
- ag04 Debug: traces error paths, identifies silent failures, validates retry logic
- ag05 Refactor: identifies fat services, extracts traits, enforces Single Responsibility
- ag06 Docs: PHPDoc coverage, README quality, ADR completeness, changelog accuracy
- ag07 Performance: N+1 queries, missing indexes, cache opportunities, query plan optimization
- ag08 Compliance: LGPD data handling, fiscal law compliance, data retention rules
- ag09 Observability: structured logging, metrics instrumentation, tracing, alerting coverage
- ag10 FinOps: cloud cost tracking, resource optimization, billing accuracy
FOXVERIFY Scoring
Each agent produces findings and a domain score. FOXVERIFY consolidates them into a single 0–100 quality score. A change can merge only when that score is at least 90.
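As a rough illustration of the consolidation step, the sketch below combines per-agent domain scores into one number and applies the merge threshold of 90. The unweighted mean, the `DomainResult` type, and the function names are assumptions made for this example; the actual FOXVERIFY formula is not described here and may weight domains differently.

```python
from dataclasses import dataclass

MERGE_THRESHOLD = 90.0  # from the text: a merge needs a score of at least 90


@dataclass(frozen=True)
class DomainResult:
    """One agent's output (hypothetical shape, for illustration only)."""
    agent: str          # e.g. "ag03 Security"
    score: float        # domain score on a 0-100 scale
    open_findings: int  # findings not yet resolved


def consolidate(results: list[DomainResult]) -> float:
    """Combine domain scores into a single 0-100 score.

    Assumption: a plain unweighted mean, rounded to one decimal place.
    """
    if not results:
        raise ValueError("at least one domain result is required")
    for r in results:
        if not 0.0 <= r.score <= 100.0:
            raise ValueError(f"{r.agent}: score {r.score} is outside 0-100")
    return round(sum(r.score for r in results) / len(results), 1)


def can_merge(results: list[DomainResult]) -> bool:
    """Gate the merge on the consolidated score meeting the threshold."""
    return consolidate(results) >= MERGE_THRESHOLD
```

With two domain scores of 95 and 88, this sketch yields 91.5 and allows the merge; with 80 and 90 it yields 85.0 and blocks it.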
Score evolution on FOX NF-e:
| Version | Score |
| --- | --- |
| v1.0 | 70.0 |
| v1.1 | 77.0 |
| v1.4 | 93.0 |
| v2.0 | 76.0 |
| v2.1 | 77.9 |
| v2.2 | 82.4 |
| v2.3 | 84.9 |
| v2.4 | 91.7 |
| v2.5 | 93.4 |
We started at 70/100 on day one. After 9 audit-rework iterations over 3 days, we reached 93.4/100 with zero open findings.
foxagentdev — The Build Team (9 Agents + 8 Hooks)
foxagentdev is the autonomous build side. While foxdev audits, foxagentdev writes code, runs tests, fixes issues, and deploys.
The system runs on 9 specialized build agents with 8 lifecycle hooks:
- SecurityScan — runs security checks before any code is committed
- QualityCheck — pre-commit quality gate
- Stop — emergency brake for unexpected states
- PostToolUse — cleans up after each tool execution
- memory-check — validates FoxMemory context before building
- environment-selector — routes tasks to the correct environment (dev/staging/prod)
- skill-router — assigns the right skill to the right agent
- One additional orchestration hook
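To make the idea of lifecycle hooks concrete, here is a minimal sketch of a hook registry that runs named hooks in order and stops at the first one that blocks. The hook names come from the list above; the `HookRegistry` class, the context dictionary, and the toy checks are hypothetical and do not reflect how foxagentdev actually wires its hooks.

```python
from typing import Callable, Optional

# A hook receives a context dict and returns True to continue, False to block.
Hook = Callable[[dict], bool]


class HookRegistry:
    """Hypothetical registry mapping hook names to callables."""

    def __init__(self) -> None:
        self._hooks: dict[str, Hook] = {}

    def register(self, name: str, hook: Hook) -> None:
        self._hooks[name] = hook

    def run(self, names: list[str], context: dict) -> tuple[bool, Optional[str]]:
        """Run hooks in the given order.

        Returns (True, None) if all pass, or (False, name) for the first
        hook that blocks.
        """
        for name in names:
            if not self._hooks[name](context):
                return False, name
        return True, None


# Toy checks, for illustration only; real hooks would be far more thorough.
registry = HookRegistry()
registry.register("SecurityScan", lambda ctx: "PASSWORD=" not in ctx["diff"])
registry.register("QualityCheck", lambda ctx: ctx["tests_passed"])

pre_commit = ["SecurityScan", "QualityCheck"]
ok, blocked_by = registry.run(pre_commit, {"diff": "x = 1", "tests_passed": True})
```

Running the security hook first means a change with an obvious secret is rejected before any quality check runs, which matches the "before any code is committed" intent described above.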
foxagentdev manages 18 skills across the entire ecosystem. It also runs foxpresence GEO visibility as cron jobs (scoring 100/100 in our internal benchmarks).
The two systems coexist via COEXISTENCE.md — a protocol that prevents foxdev (auditor) and foxagentdev (builder) from conflicting during parallel execution.
The Workflow
- Developer writes a DSPy-format prompt (15–25 lines max — structured, not free-form)
- foxagentdev builds the feature: writes code, tests, documentation
- foxdev audits the result via FOXVERIFY
- Score < 90? Automatic rework loop — foxagentdev addresses all findings
- Score >= 90? Merge to main
- Every merge: FoxMemory saves lessons learned for continuous improvement
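The build, audit, and rework steps above can be sketched as a simple loop. Here `build` and `audit` are hypothetical callables standing in for foxagentdev and foxdev, and the `max_iterations` cap is an assumption added so the loop always terminates. The FoxMemory step is left out for brevity.

```python
from typing import Callable

MERGE_THRESHOLD = 90.0  # from the text: score >= 90 merges, score < 90 reworks


def run_workflow(
    build: Callable[[str, list[str]], str],
    audit: Callable[[str], tuple[float, list[str]]],
    prompt: str,
    max_iterations: int = 10,  # assumption: the post does not state a cap
) -> tuple[bool, float, int]:
    """Build, audit, and rework until the score reaches the threshold.

    `build` takes the prompt plus the findings from the previous audit and
    returns an artifact; `audit` returns (score, findings) for an artifact.
    Returns (merged, final_score, iterations_used).
    """
    findings: list[str] = []
    score = 0.0
    for iteration in range(1, max_iterations + 1):
        artifact = build(prompt, findings)   # the first pass has no findings
        score, findings = audit(artifact)
        if score >= MERGE_THRESHOLD:
            return True, score, iteration    # gate passed: merge to main
    return False, score, max_iterations      # cap reached without passing
```

Feeding the previous findings back into `build` is what turns a plain retry into a rework loop: each pass is expected to address what the last audit reported.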
The key insight is that, in our experience, DSPy-format prompts outperform free-form prompts for agent tasks. Structured prompts reduce ambiguity, so agents can operate more autonomously and produce more consistent results.
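One way to picture a structured prompt is as a small typed record that renders to a fixed layout and rejects anything over the line budget. This sketch does not use the DSPy library itself, and the `TaskPrompt` class, its field names, and the label format are all hypothetical; only the 25-line ceiling comes from the workflow described above.

```python
from dataclasses import dataclass, field

MAX_LINES = 25  # from the text: prompts are 15-25 lines at most


@dataclass
class TaskPrompt:
    """Hypothetical structured prompt: a goal plus labeled lists."""
    goal: str
    inputs: list[str] = field(default_factory=list)
    outputs: list[str] = field(default_factory=list)
    constraints: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Render one labeled line per item, enforcing the line budget."""
        lines = [f"GOAL: {self.goal}"]
        sections = (
            ("INPUT", self.inputs),
            ("OUTPUT", self.outputs),
            ("CONSTRAINT", self.constraints),
        )
        for label, items in sections:
            lines.extend(f"{label}: {item}" for item in items)
        if len(lines) > MAX_LINES:
            raise ValueError(f"prompt has {len(lines)} lines, limit is {MAX_LINES}")
        return "\n".join(lines)
```

Because every line carries a label, an agent never has to guess whether a sentence is a requirement, an input, or background, which is the ambiguity that free-form prompts tend to leave open.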
Real Results: FOX NF-e Project
Everything in the following list was built using the foxdev + foxagentdev workflow:
- 464 automated tests, 1,109 assertions — built in 4 days
- 93.4/100 quality score via FOXVERIFY
- 0 open findings at ship time
- 24 MCP tools implemented (Streamable HTTP, JSON-RPC 2.0)
- 5,571 municipalities enriched with NFSe provider mapping
- Full LGPD Art. 18 compliance endpoints
- Prometheus metrics + webhook replay protection
- Automatic contingency: EPEC + SVC-AN + SVC-RS + FS-DA
Benefits
- Code quality goes up every iteration — measurable, not subjective (you have a score)
- Security issues caught before production — ag03 runs on every PR
- Documentation stays current — ag06 enforces it automatically
- Compliance is proactive — ag08 flags LGPD issues before they become violations
- New developers onboard faster — everything is documented, tested, and explained
- Cost: ~$30–80/month in LLM API calls for the entire audit pipeline
Lessons Learned
- Specialized agents outperform generalist agents — 10 focused agents beat 1 "do everything" agent every time
- Scoring creates accountability — a number forces honesty about code quality
- Memory systems prevent regressions — FoxMemory means mistakes get made once, not repeatedly
- Human-in-the-loop for architecture, AI for implementation — the right division of responsibility
- DSPy-format prompts are a game changer — structured prompts produce more reliable, reproducible results
Author
Paulo Fox — CEO at Fox Digital. Former contractor for SpaceX (2014–2017) and Google (2019–2022). MIT AI Strategy & GenAI. Creator of FOX NF-e, foxdev, and foxagentdev.