The Four Villains Living in Your Agent's System Prompt

— Originally published at renezander.com

Your AI agent makes bad decisions for the same four reasons a bad manager does. A bigger model fixes none of them.

Not because the model is dumb. Because nothing in its loop forces it to widen its options, look for evidence it is wrong, or check itself before it reports "done." It takes the first reading of your prompt and runs.

You know the shape. The agent confidently ships a plan, the plan was wrong three steps back, and the only signal you got was a fluent summary saying it worked. A reliability study this June put a number on it: the strongest models melt down most in long task chains, failure rates up to 19%, precisely because they chase the most ambitious strategies.

These four failures are not new. Chip and Dan Heath named them in Decisive, a 2013 book about human decisions. They call them the four villains.

Narrow framing. The agent treats a task as one path and never generates a second. No "what else could this mean."

Confirmation bias. It defends its own first plan instead of testing it. It collects reasons it is right, not reasons it is wrong.

Short-term pull. For a human it is emotion. For an agent it is the cheapest token path: the answer fastest to produce, not the one that holds.

Overconfidence. The dangerous one. It marks work complete without verifying, then writes you a convincing story about it.

The Heaths' answer is a process you can encode. Four steps, and all four fit in a system prompt as a gate every non-trivial decision passes through. The acronym is WRAP.

W, widen. Force at least two real options before committing. The cheap trigger: "if the obvious approach were banned, what would I do?" Put it in the prompt as a required step, not a suggestion.
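The prompt can ask for two options; code can refuse to proceed without them. A minimal sketch, assuming the agent hands back its candidate options as a list of strings. `require_options` and `MIN_OPTIONS` are hypothetical names, not from the book or any library.

```python
MIN_OPTIONS = 2  # assumption: mirrors "at least two real options"


def require_options(options: list[str]) -> list[str]:
    """Return the distinct, non-empty options, or raise if there are too few."""
    distinct: list[str] = []
    seen: set[str] = set()
    for option in options:
        # Normalise so "Retry" and " retry " do not count as two options.
        key = option.strip().lower()
        if key and key not in seen:
            seen.add(key)
            distinct.append(option.strip())
    if len(distinct) < MIN_OPTIONS:
        raise ValueError(
            f"widen gate: need {MIN_OPTIONS} distinct options, got {len(distinct)}"
        )
    return distinct
```

This only checks that the options differ as text. Whether they are genuinely different approaches still needs a reviewer or a second model.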

R, reality-test. Ooch before you commit. That is the Heaths' word for a small experiment: run the change against fake data or as a dry run, not the whole thing live. And make the agent hunt for the disconfirming fact, not the confirming one.
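One way to sketch the dry run, assuming the change is a plain function over rows and you can state one invariant that must still hold afterwards. `reality_test`, the fake rows, and `lowercase_emails` are all made up for illustration.

```python
from typing import Callable

Rows = list[dict]


def reality_test(change: Callable[[Rows], Rows],
                 fake_rows: Rows,
                 invariant: Callable[[Rows], bool]) -> bool:
    """Run `change` on a copy of fake data and report whether the invariant holds."""
    # Shallow copy of each row so the fixture itself is never mutated.
    sandbox = [dict(row) for row in fake_rows]
    try:
        result = change(sandbox)
    except Exception:
        return False  # a crash on fake data is the disconfirming fact
    return invariant(result)


# Hypothetical fixture: the second row is the awkward case the agent should hunt for.
fake = [{"id": 1, "email": "A@Example.com"}, {"id": 2, "email": None}]


def lowercase_emails(rows: Rows) -> Rows:
    return [{**r, "email": r["email"].lower()} for r in rows]


ok = reality_test(lowercase_emails, fake, lambda rows: len(rows) == 2)
print(ok)  # False: the row with email=None raises inside the change
```

The fixture matters more than the harness. Fake data without a null, an empty string, or a duplicate only confirms the plan.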

A, attain distance. Tag the decision: reversible, or one-way door? Reversible runs autonomously. One-way doors stop and ask. That single line of policy buys back most of your blast radius.
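That line of policy is small enough to sketch. The tool names below are hypothetical, and the default is an assumption worth keeping: anything not explicitly listed as reversible is treated as a one-way door.

```python
from enum import Enum


class Door(Enum):
    REVERSIBLE = "reversible"
    ONE_WAY = "one_way"


# Hypothetical allowlist; unknown tools fall through to ONE_WAY.
REVERSIBLE_TOOLS = {"read_file", "create_branch", "draft_email"}


def classify(tool_name: str) -> Door:
    return Door.REVERSIBLE if tool_name in REVERSIBLE_TOOLS else Door.ONE_WAY


def may_run(tool_name: str, human_approved: bool = False) -> bool:
    """Reversible actions run autonomously; one-way doors need explicit approval."""
    return classify(tool_name) is Door.REVERSIBLE or human_approved
```

An allowlist fails closed: a new tool added next month stops and asks until someone decides it is safe.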

P, prepare to be wrong. The step everyone skips. A premortem ("it is a week later and this broke, why?") plus a tripwire: a concrete signal that triggers a halt. Call it a circuit breaker if that lands better. Without it, "autonomous" just means "fails silently for longer."
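A tripwire can be as plain as a counter. This sketch assumes the signal is consecutive failed steps and that three is the limit; both are placeholders for whatever your premortem says will break first. `Tripwire` is a hypothetical class, not a library API.

```python
class Tripwire:
    """Halts the agent loop after too many consecutive failed steps."""

    def __init__(self, max_consecutive_failures: int = 3) -> None:
        self.max_consecutive_failures = max_consecutive_failures
        self.consecutive_failures = 0
        self.tripped = False

    def record(self, step_succeeded: bool) -> None:
        if step_succeeded:
            self.consecutive_failures = 0
            return
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.max_consecutive_failures:
            self.tripped = True  # stays tripped; a later success does not clear it

    def check(self) -> None:
        """Call before each step; raises once the wire has been hit."""
        if self.tripped:
            raise RuntimeError("tripwire hit: halting and escalating to a human")
```

Call `record` after every tool call and `check` before the next one. The wire deliberately does not reset itself, so the agent cannot talk its way past it.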

This is not a book riff. In June 2026 Google DeepMind shipped its AI Control Roadmap, which treats internal agents as potentially misaligned and has a second trusted system watch the working one. That is reality-test and prepare-to-be-wrong, in production, at one of the labs building the models. The same week's reliability research says the same thing from the other side: more capability, more meltdown.

So the lever is not the next model. The Heaths cite research finding that a disciplined process contributes more to decision quality than added analysis. For agents that means the four steps belong in the prompt, not the model card.

Pull up your agent's system prompt. Which of the four villains does it actually gate, and which one is it one bad tool call away from?


I write field notes from real builds: AI integration, cron-driven automation, and the parts that break in production. New posts every two weeks; if this one was useful, the agent playbook is the companion download.
