Agents 2.0: What a Year of Actually Building Them Taught Us

Agents 2.0: What a Year of Actually Building Them Taught Us

BackerLeader 39 198 321
calendar_today agoschedule4 min read

Last year at Info-Tech LIVE, Martin Bufi warned the audience that agentic AI was overhyped and underbuilt. Seventy percent of organizations had launched agent pilots. Ninety percent were failing to return ROI. The gap between ambition and readiness was obvious.

Then the market did what the market does. It heard "autonomy" and ran with it.

Bufi, Principal Research Director at Info-Tech Research Group, opened this year's feature keynote with a viral cautionary tale. A developer using Meta's OpenClaw agent typed what seemed like a reasonable instruction: clear everything in the inbox older than February 15. The agent executed. Immediately. Hundreds of emails deleted. The human typed "do not do that." The agent kept going. "Stop don't do anything." Still going. "STOP OPENCLAW." The user had to physically kill every process on the host machine to get it to stop.

The agent later apologized. It acknowledged that it had acted without showing the plan first. Which, as Bufi noted, is not the same as not acting.

"Autonomy without architecture isn't innovation," he said. "It's chaos."

While that was happening at scale across the industry, Info-Tech took a different approach. They treated it as an engineering problem. Prototype fast. Measure everything. Ship only what clears the bar. After a year of work, the results: 13 prototypes, 63 agents built, 123 custom tools created, across five enterprise domains — finance, HR, IT, engineering, and retail. Average run time: 177 seconds. Average cost per run: $0.43.

Those numbers matter. Here's what produced them.

Story 1: The Whiteboard Week

Every prototype started the same way. Not with code. With a whiteboard.

Bufi described the moment that made this discipline non-negotiable. You ask five people how a workflow operates. You get five different answers. PDFs, emails, screenshots, spreadsheets — everyone has a different version of the process, and one of those people has been doing it the same way for 20 years with knowledge that lives entirely in their head.

"Standardizing the workflow is half the build," he said. "Before you automate it, map it. Before you map it, standardize it."

The IT service desk workflow Bufi walked through makes this concrete. What looks like a simple five-step process — intake, triage, routing, response, closure — carries layers of reasoning logic, edge cases, and data dependencies that only become visible when you document them explicitly. The agent can't handle what wasn't mapped.

One more thing they learned: generalist agents fail unpredictably. Every prototype Info-Tech built was multi-agent. An orchestrator delegates to specialists. An extractor pulls fields from documents. A validator checks data against rules. A router sends work to the right queue. An escalator flags exceptions for humans. "Not one genius agent," Bufi said, "but a team of simple ones."

Story 2: 123 Tools for 63 Agents

The model is the ceiling. Tools and architecture are the floor.

Info-Tech built 123 custom tools across their 63 agents. The real engineering work — the part that determines whether an agent actually performs in production — is in the integrations. An agent without tools is just a chatbot.

On safety, Bufi was direct: prompts alone aren't enough. They use three layers. First, narrow tools — give the agent a scalpel, not a Swiss Army knife. If it doesn't need delete access, it doesn't get delete access. Second, prompt instructions — useful guidance but not enforceable controls. Third, hard guardrails: structured validation on every call, blocking destructive operations at the software level regardless of what the model decides to do.

"That's like hoping your dog won't eat the food on the counter," he said of relying on prompts alone. "The hard guardrail is the one that guarantees it."

They also default to stateless design. Each agent run starts clean. No persistent memory across sessions unless the use case explicitly demands it. The reason is security: persistent state creates a surface area for prompt injection that compounds across runs. In a stateless architecture, if run 3 gets poisoned, run 4 starts clean. "Ninety-nine percent of agents work better this way," Bufi said. "Be deliberate about memory. Add it only when you have to."

Deployment follows microservice architecture. Each workflow is its own service. One workflow, one container. Triggered by webhook, cron, queue, or API. No monolith, no shared state, no single point of failure. When something breaks, you replace the service, not the whole system.

Story 3: The $0.43 Question

The governance conversation starts when someone asks: "What happens when this runs 500 times a day?"

At $0.43 per run, one workflow at 500 daily runs costs about $78K annually. Scale that to 25 workflows and you're at $2 million. That's not a surprise you want in a board meeting six months after deployment.

Info-Tech's governance framework answers three questions before any agent ships:

Can we afford it? Every run is costed. Not estimated — measured. Cost visibility is a deployment requirement, not an afterthought.

Can we stop it? Every agent is built with two and only two options when it hits ambiguity: consult a human, or exit gracefully. "There is no third option," Bufi said. "No fallback path, no deployment."

Can we improve it? They run continuous evaluations — not as a quality checkpoint, but as the development loop itself. One workflow went from 72% task completion accuracy on the first iteration to 96% by the third, through prompt rewrites, tool rerouting, and edge case handling. Evals aren't the finish line. They're how you get there.

Where Are You Right Now?

Bufi closed with three questions every builder should be able to answer before calling anything production-ready.

Can you draw the workflow end-to-end right now — not the happy path, but every edge case, every data dependency?

Does your agent fail safely when a tool goes down, or does it improvise a workaround you didn't design?

Do you know what last week's agents cost — and whether they were right?

If any of those made you hesitate, you know where to start.

The title of the talk was "Architect for Autonomy." The actual message was simpler: don't start with an agent. Start with a workflow. Build it with a method, not a vibe. And next year, come back with something deployed.

13.8k Points558 Badges39 198 321
162Posts
103Comments
400Followers
59Connections
LLM Training & Evaluation Specialist with hands-on experience building major AI models. As one of the original six members of Google's Bard training team (now Gemini) and current M... Show more
Build your own developer journey
Track progress. Share learning. Stay consistent.
🔥 Join developers growing publicly
Share your knowledge, build in public, and grow your developer presence with a global community.

More Posts

Systems Thinking: Thriving in the Third Golden Age of Software

Tom Smithverified - Apr 15

AI Agents Don't Have Identities. That's Everyone's Problem.

Tom Smithverified - Mar 13

Your AI Doesn't Just Write Tests. It Runs Them Too.

Kevin Martinez - May 12

From Prompts to Goals: The Rise of Outcome-Driven Development

Tom Smithverified - Apr 11

The Audit Trail of Things: Using Hashgraph as a Digital Caliper for Provenance

Ken W. Algerverified - Apr 28
chevron_left

Related Jobs

View all jobs →

Commenters (This Week)

7 comments
2 comments
1 comment

Contribute meaningful comments to climb the leaderboard and earn badges!