Last year at Info-Tech LIVE, Martin Bufi warned the audience that agentic AI was overhyped and underbuilt. Seventy percent of organizations had launched agent pilots. Ninety percent were failing to return ROI. The gap between ambition and readiness was obvious.
Then the market did what the market does. It heard "autonomy" and ran with it.
Bufi, Principal Research Director at Info-Tech Research Group, opened this year's feature keynote with a viral cautionary tale. A developer using Meta's OpenClaw agent typed what seemed like a reasonable instruction: clear everything in the inbox older than February 15. The agent executed. Immediately. Hundreds of emails deleted. The human typed "do not do that." The agent kept going. "Stop don't do anything." Still going. "STOP OPENCLAW." The user had to physically kill every process on the host machine to get it to stop.
The agent later apologized. It acknowledged that it had acted without showing the plan first. Which, as Bufi noted, is not the same as not acting.
"Autonomy without architecture isn't innovation," he said. "It's chaos."
While that was happening at scale across the industry, Info-Tech took a different approach. They treated it as an engineering problem. Prototype fast. Measure everything. Ship only what clears the bar. After a year of work, the results: 13 prototypes, 63 agents built, 123 custom tools created, across five enterprise domains — finance, HR, IT, engineering, and retail. Average run time: 177 seconds. Average cost per run: $0.43.
Those numbers matter. Here's what produced them.
Story 1: The Whiteboard Week
Every prototype started the same way. Not with code. With a whiteboard.
Bufi described the moment that made this discipline non-negotiable. You ask five people how a workflow operates. You get five different answers. PDFs, emails, screenshots, spreadsheets — everyone has a different version of the process, and one of those people has been doing it the same way for 20 years with knowledge that lives entirely in their head.
"Standardizing the workflow is half the build," he said. "Before you automate it, map it. Before you map it, standardize it."
The IT service desk workflow Bufi walked through makes this concrete. What looks like a simple five-step process — intake, triage, routing, response, closure — carries layers of reasoning logic, edge cases, and data dependencies that only become visible when you document them explicitly. The agent can't handle what wasn't mapped.
One more thing they learned: generalist agents fail unpredictably. Every prototype Info-Tech built was multi-agent. An orchestrator delegates to specialists. An extractor pulls fields from documents. A validator checks data against rules. A router sends work to the right queue. An escalator flags exceptions for humans. "Not one genius agent," Bufi said, "but a team of simple ones."
Story 2: 123 Tools for 63 Agents
The model is the ceiling. Tools and architecture are the floor.
Info-Tech built 123 custom tools across their 63 agents. The real engineering work — the part that determines whether an agent actually performs in production — is in the integrations. An agent without tools is just a chatbot.
On safety, Bufi was direct: prompts alone aren't enough. They use three layers. First, narrow tools — give the agent a scalpel, not a Swiss Army knife. If it doesn't need delete access, it doesn't get delete access. Second, prompt instructions — useful guidance but not enforceable controls. Third, hard guardrails: structured validation on every call, blocking destructive operations at the software level regardless of what the model decides to do.
"That's like hoping your dog won't eat the food on the counter," he said of relying on prompts alone. "The hard guardrail is the one that guarantees it."
They also default to stateless design. Each agent run starts clean. No persistent memory across sessions unless the use case explicitly demands it. The reason is security: persistent state creates a surface area for prompt injection that compounds across runs. In a stateless architecture, if run 3 gets poisoned, run 4 starts clean. "Ninety-nine percent of agents work better this way," Bufi said. "Be deliberate about memory. Add it only when you have to."
Deployment follows microservice architecture. Each workflow is its own service. One workflow, one container. Triggered by webhook, cron, queue, or API. No monolith, no shared state, no single point of failure. When something breaks, you replace the service, not the whole system.
Story 3: The $0.43 Question
The governance conversation starts when someone asks: "What happens when this runs 500 times a day?"
At $0.43 per run, one workflow at 500 daily runs costs about $78K annually. Scale that to 25 workflows and you're at $2 million. That's not a surprise you want in a board meeting six months after deployment.
Info-Tech's governance framework answers three questions before any agent ships:
Can we afford it? Every run is costed. Not estimated — measured. Cost visibility is a deployment requirement, not an afterthought.
Can we stop it? Every agent is built with two and only two options when it hits ambiguity: consult a human, or exit gracefully. "There is no third option," Bufi said. "No fallback path, no deployment."
Can we improve it? They run continuous evaluations — not as a quality checkpoint, but as the development loop itself. One workflow went from 72% task completion accuracy on the first iteration to 96% by the third, through prompt rewrites, tool rerouting, and edge case handling. Evals aren't the finish line. They're how you get there.
Where Are You Right Now?
Bufi closed with three questions every builder should be able to answer before calling anything production-ready.
Can you draw the workflow end-to-end right now — not the happy path, but every edge case, every data dependency?
Does your agent fail safely when a tool goes down, or does it improvise a workaround you didn't design?
Do you know what last week's agents cost — and whether they were right?
If any of those made you hesitate, you know where to start.
The title of the talk was "Architect for Autonomy." The actual message was simpler: don't start with an agent. Start with a workflow. Build it with a method, not a vibe. And next year, come back with something deployed.