Your AI Agents Are Live. Now What?


Most companies deploying AI agents have no idea if they're working. Here's how to fix that.


You've built the agent. You've deployed it. Leadership is asking for results.

And you're staring at a dashboard full of logs that tells you how long the model took to respond and how many tokens it consumed — but nothing about whether the people using it are actually getting anything done.

That's the problem most engineering and product teams are sitting with right now. And it was the central theme of Pendomonium 2026, Pendo's annual product management conference held in Raleigh earlier this month.

Two sessions stood out for anyone building or shipping AI agents in an enterprise environment. One covered what happens when agents go wrong. The other covered how to know if they're going right.


When Agents Go Wrong: The Security Problem Nobody's Talking About

Gil Feig, CTO and co-founder of Merge, opened with a story that got the room's attention fast.

The head of AI safety at Meta deployed a personal agent to clean up her email. It ran haywire and deleted everything. She tweeted about it. The head of AI safety at Meta.

The point wasn't to embarrass anyone. It was to make clear that the failure modes in AI agents aren't exotic. They're mundane. And most of them happen at the same place: the moment an agent calls a tool.

Tool calling is what separates an AI agent from a chatbot. When an agent needs to do something beyond generating text — query a database, update a ticket, send an email, retrieve a file — it makes a tool call. That's where most security failures happen.
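Mechanically, a tool call is just the model emitting a structured request that your runtime executes on its behalf. A minimal sketch of that dispatch loop, with all names and the example tool invented for illustration:

```python
# Minimal sketch of a tool-call dispatch loop (all names hypothetical).
# The model emits a structured request; the runtime looks up the tool,
# executes it, and returns the result for the model's next turn.

def update_ticket(ticket_id: str, status: str) -> dict:
    """Example tool: pretend to update a ticket in an issue tracker."""
    return {"ticket_id": ticket_id, "status": status, "ok": True}

# Registry of tools the agent is allowed to call.
TOOLS = {"update_ticket": update_ticket}

def dispatch(tool_call: dict) -> dict:
    """Execute one tool call of the form {'name': ..., 'arguments': {...}}."""
    tool = TOOLS.get(tool_call["name"])
    if tool is None:
        return {"error": f"unknown tool {tool_call['name']!r}"}
    return tool(**tool_call["arguments"])

result = dispatch({"name": "update_ticket",
                   "arguments": {"ticket_id": "JIRA-123", "status": "done"}})
```

Every security failure described below happens somewhere inside that `dispatch` step, which is why it is the right place to attach identity and scope checks.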

Feig identified three primary failure modes:

Over-provisioning. You generated one API key with access to everything because it was easier. Your agent now has the ability to delete your entire Jira instance. It won't do that on purpose. But given enough edge cases, retries, and hallucinated intent, it might.

Missing identity. You're not logging which agent made which tool call. When something goes wrong — and something will go wrong — you can't reconstruct what happened, when, or why. You don't have security. You have plausible deniability.

Misconfigured credentials. You used service-level credentials when user-scoped credentials would have been safer. Now your agent can take actions on behalf of any user in the system, not just the one who initiated the request.

His framework for fixing this is straightforward. Every tool call needs to carry four things explicitly: who made it (tenant, user, actor type), what was done (tool called, resource touched), when (timestamps you can correlate across systems), and why (intent or reason code).

If you can't reconstruct a tool call from your logs, you don't have security. Full stop.
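The four fields map directly onto a log record. A sketch of what one audit line might look like, with field names invented for illustration rather than taken from any standard schema:

```python
# Sketch of an audit record carrying the four fields named above on every
# tool call: who, what, when, why. Field names are illustrative only.
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

@dataclass
class ToolCallAudit:
    tenant_id: str   # who: which customer/tenant
    user_id: str     # who: which user initiated the request
    actor_type: str  # who: "agent" vs "human"
    tool_name: str   # what: tool that was called
    resource: str    # what: resource touched
    timestamp: str   # when: correlatable across systems
    intent: str      # why: intent or reason code

def audit_line(tenant, user, actor, tool, resource, intent) -> str:
    """Serialize one tool call as a JSON log line."""
    record = ToolCallAudit(
        tenant_id=tenant, user_id=user, actor_type=actor,
        tool_name=tool, resource=resource,
        timestamp=datetime.now(timezone.utc).isoformat(),
        intent=intent,
    )
    return json.dumps(asdict(record))

line = audit_line("acme", "u_42", "agent", "update_ticket",
                  "JIRA-123", "close_resolved_ticket")
```

If a line like this exists for every tool call, reconstructing an incident is a log query rather than a forensic project.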

He also made a point worth repeating to any product manager who thinks this is an engineering problem: product is now responsible for defining what an agent is and is not allowed to do. You're the one thinking through the use cases. That means you're the one who needs to specify the minimum required scopes — not hand it off to security and assume they'll figure it out.

The practical guidance: create separate API credentials for each agent, scoped to exactly what that agent needs to do its job. No broad access. Treat your agents like new hires on their first day — give them access to what they need, not everything the company owns.
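In practice that means the runtime enforces a per-agent allowlist of scopes and refuses anything outside it. A hedged sketch, with agent names and scope strings invented for illustration:

```python
# Sketch of per-agent least-privilege scopes (all names hypothetical).
# Each agent gets its own credential bound to only the scopes it needs,
# and the runtime rejects any tool call that requires anything more.

AGENT_SCOPES = {
    "email-triage-agent": {"email:read", "email:label"},
    "ticket-agent": {"tickets:read", "tickets:update"},
}

def check_scope(agent: str, required_scope: str) -> bool:
    """Return True only if the agent's credential carries the scope."""
    return required_scope in AGENT_SCOPES.get(agent, set())

# The email agent can label mail, but it cannot touch tickets at all --
# the "new hire on their first day" posture described above.
can_label = check_scope("email-triage-agent", "email:label")
can_delete_tickets = check_scope("email-triage-agent", "tickets:delete")
```

The important property is the default: an agent not in the table, or a scope not in its set, fails closed.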


When Agents Are Running: The Measurement Problem Nobody's Solving

The afternoon session brought a different problem into focus. Jake Wills from Pendo and Brandon Stauber from Salesforce presented a two-pillar framework for measuring agentic ROI that's worth building into how you think about agent deployments from day one.

The honest observation they led with: just deploying agents isn't the hard part. Proving they're working is.

Most teams measure agents the way they measure infrastructure — uptime, latency, token cost, error rates. These are important. They are not ROI.

ROI in agentic systems comes from what humans do differently because the agent exists. That requires a second layer of measurement that most teams aren't capturing.

Pillar 1: Agent Performance

This is the layer most teams have, or think they have. It covers how often the agent is being used, what users are actually asking it to do (which is often not what you designed it for), where conversations drop off, and whether the tools the agent depends on are functioning correctly.

Pendo's Agent Analytics product handles this layer. It tracks conversations, surfaces common use cases, identifies failed tasks and ignored instructions, and flags when users are abandoning the agent mid-interaction. It also shows retention — are people coming back to the agent, or trying it once and reverting to the old process?

Pillar 2: Human Performance

This is the layer most teams are missing entirely.

After a user interacts with your agent, where do they go next? Are they re-doing the work the agent just completed? That's a trust problem — and a reliability problem. Are they moving on to higher-value tasks? That's the outcome you built the agent for.

The ROI of an AI agent is not measured in minutes saved on a single transaction. It's measured in whether humans are spending more time on work that matters. To make that case to a CFO or a board, you need behavioral data before and after deployment — a baseline you can point to.

Stauber offered a useful heuristic for where to start. Look for processes that are low-to-medium complexity for humans but high-friction. Not the hardest problems in the company — those carry too much risk of agent failure and too much organizational resistance. Start with the work that's tedious, repetitive, and well-defined. That's where agents prove their value fastest and where the ROI story is cleanest.

His framework: get started with initial experiments, establish a baseline of the business outcomes you're targeting, then measure whether human behavior actually shifts. Don't measure minutes saved. Measure whether your close rate went up, your onboarding time went down, or your support tickets dropped.
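The arithmetic behind that framework is simple but worth making explicit: capture the outcome metric before deployment, measure it again after, and report the relative shift. A sketch with invented numbers:

```python
# Sketch of a baseline comparison on a business outcome metric.
# The numbers here are invented purely to illustrate the calculation.

def percent_change(baseline: float, current: float) -> float:
    """Relative change from baseline, as a percentage."""
    return (current - baseline) / baseline * 100.0

# e.g., median customer onboarding time in days, before vs. after the agent
baseline_onboarding_days = 14.0
post_deploy_onboarding_days = 9.0

shift = percent_change(baseline_onboarding_days, post_deploy_onboarding_days)
# A negative shift means onboarding got faster -- the claim you can
# actually take to a CFO, unlike "minutes saved per transaction".
```

Without the baseline measurement taken before deployment, the `percent_change` call has nothing to compare against, which is why the baseline comes first in the framework.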


What This Looks Like in Practice

Quorum, a legislative engagement platform, shared a case study during the Agent Analytics session that illustrated both problems in one story.

They built an internal AI agent, deployed it to employees, and initially had no visibility into whether people were using it or what problems they were running into. Standard observability tools told them technical performance. They had no window into the user experience side.

When they deployed Agent Analytics, they discovered that only about 10% of users were engaging with a core feature they'd built the agent around. They could see exactly where conversations were failing, what users were actually asking for, and which parts of the agent's knowledge base were inadequate.

That data let them intervene with targeted guides and content fixes without waiting for another engineering cycle. They could also see which organizations were using the agent for use cases the product team hadn't anticipated — which fed directly into their product roadmap.

The broader point: the gap between "the agent is live" and "the agent is working" is where most enterprise AI investments quietly fail. Security failures get headlines. Adoption failures don't. But adoption failure is more common, more expensive, and more preventable.


The Takeaway for Engineering and Product Teams

Three things worth carrying back from Pendomonium:

First, build security into your agent architecture from day one. Scoped credentials, explicit identity on every tool call, and an audit trail you can actually use. Retrofitting security onto a deployed agent is significantly harder than building it in at the start.

Second, establish a behavioral baseline before you deploy. Know what the process looks like today — how long it takes, where humans spend their time, what the business outcome metrics are. You'll need that baseline to make the ROI case after deployment.

Third, measure what humans do after the agent interaction, not just what the agent does. That's where the actual value is, and it's what leadership will eventually ask you to prove.

The good news is that these are all solvable problems. The engineering and tooling exist. The harder part is building the organizational habit of treating agent deployment as the beginning of the work, not the end.
