The Four Villains Living in Your Agent's System Prompt


Jun 25 • 2 min read

— Originally published at renezander.com

Your AI agent makes bad decisions for the same four reasons a bad manager does. A bigger model fixes none of them.

Not because the model is dumb. Because nothing in its loop forces it to widen its options, look for evidence it is wrong, or check itself before it reports "done." It takes the first reading of your prompt and runs.

You know the shape. The agent confidently ships a plan, the plan was wrong three steps back, and the only signal you got was a fluent summary saying it worked. A reliability study this June put a number on it: the strongest models melt down most in long task chains, failure rates up to 19%, precisely because they chase the most ambitious strategies.

These four failures are not new. Chip and Dan Heath named them in Decisive, a 2013 book about human decisions. They call them the four villains.

Narrow framing. The agent treats a task as one path and never generates a second. No "what else could this mean."

Confirmation bias. It defends its own first plan instead of testing it. It collects reasons it is right, not reasons it is wrong.

Short-term pull. For a human it is emotion. For an agent it is the cheapest token path: the answer fastest to produce, not the one that holds.

Overconfidence. The dangerous one. It marks work complete without verifying, then writes you a convincing story about it.

The Heaths' answer is a process you can encode. Four steps, and all four fit in a system prompt as a gate every non-trivial decision passes through. The acronym is WRAP.
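To make "a gate in the system prompt" concrete, here is a minimal sketch. The wording of `WRAP_GATE` and the helper `build_system_prompt` are illustrative assumptions, not text from the book or from any particular agent framework.

```python
# Hypothetical gate wording; adapt it to your own agent and tools.
WRAP_GATE = """\
Before any non-trivial decision, complete all four steps in order:
1. WIDEN: list at least two real options.
2. REALITY-TEST: try the choice on fake data or a dry run, and look for the fact that would prove it wrong.
3. ATTAIN DISTANCE: tag the decision as reversible or one-way. Stop and ask before a one-way decision.
4. PREPARE TO BE WRONG: write a premortem and name a tripwire signal that halts the work.
"""


def build_system_prompt(base_prompt: str) -> str:
    """Append the WRAP gate to an existing system prompt."""
    return base_prompt.rstrip() + "\n\n" + WRAP_GATE


if __name__ == "__main__":
    print(build_system_prompt("You are a deployment agent."))
```

Keeping the gate as one named block makes it easy to see which of the four steps a given prompt actually contains.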

W, widen. Force at least two real options before committing. The cheap trigger: "if the obvious approach were banned, what would I do?" Put it in the prompt as a required step, not a suggestion.
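The prompt can ask for two options, but the harness can also refuse to proceed without them. A minimal sketch, assuming the agent returns its options as a list of strings; `require_options` is a hypothetical helper, and it only checks that the options differ textually, not that they are genuinely different approaches.

```python
def require_options(options: list[str], minimum: int = 2) -> list[str]:
    """Return the distinct options, or raise if the agent offered too few."""
    seen = set()
    distinct = []
    for option in options:
        # Normalize case and whitespace so trivial rewordings do not count twice.
        key = " ".join(option.lower().split())
        if key and key not in seen:
            seen.add(key)
            distinct.append(option.strip())
    if len(distinct) < minimum:
        raise ValueError(
            f"need at least {minimum} distinct options, got {len(distinct)}"
        )
    return distinct
```

A rejected call is the cue to send the agent back with the "if the obvious approach were banned" question.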

R, reality-test. Ooch before you commit (the Heaths' word for running a small experiment first): run the change against fake data or as a dry run, not the whole thing live. And make the agent hunt for the disconfirming fact, not the confirming one.
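One way to sketch this step: run the proposed change on a copy of fake data and report every check it violates, rather than every check it passes. The function `reality_test` and the shape of the data are assumptions made for illustration.

```python
import copy
from typing import Callable


def reality_test(
    change: Callable[[list], list],
    fake_data: list,
    must_hold: dict[str, Callable[[list], bool]],
) -> list[str]:
    """Run `change` on a copy of fake data and return the names of failed checks."""
    sample = copy.deepcopy(fake_data)  # the original fake data is never mutated
    result = change(sample)
    return [name for name, check in must_hold.items() if not check(result)]


if __name__ == "__main__":
    # Hypothetical change: drop users that have no email address.
    def drop_without_email(users):
        return [u for u in users if u["email"]]

    fake_users = [
        {"id": 1, "email": "a@example.com", "active": True},
        {"id": 2, "email": "", "active": True},
    ]
    checks = {
        "keeps every active user": lambda out: sum(u["active"] for u in out) == 2,
    }
    print(reality_test(drop_without_email, fake_users, checks))
```

An empty list means no disconfirming fact was found on the sample, which is weaker than proof but stronger than the agent's own summary.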

A, attain distance. Tag the decision: reversible, or one-way door? Reversible runs autonomously. One-way doors stop and ask. That single line of policy buys back most of your blast radius.
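That single line of policy is small enough to write down directly. A minimal sketch, assuming the agent attaches a tag to each decision; the names `Door` and `route` are hypothetical, and treating an untagged decision as one-way is a conservative choice added here, not something the article specifies.

```python
from enum import Enum
from typing import Optional


class Door(Enum):
    REVERSIBLE = "reversible"
    ONE_WAY = "one_way"


def route(door: Optional[Door]) -> str:
    """Decide whether a tagged decision may run without a human."""
    if door is Door.REVERSIBLE:
        return "run"
    # One-way doors, and anything the agent failed to tag, stop and ask.
    return "ask_human"
```

For example, `route(Door.REVERSIBLE)` returns `"run"`, while `route(Door.ONE_WAY)` and `route(None)` both return `"ask_human"`.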

P, prepare to be wrong. The step everyone skips. A premortem ("it is a week later and this broke, why?") plus a tripwire: a concrete signal that triggers a halt. Call it a circuit breaker if that lands better. Without it, "autonomous" just means "fails silently for longer."
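A tripwire can be as plain as a counter with a threshold. The sketch below is one possible shape, with the class name `Tripwire`, its fields, and the choice of "failed checks" as the signal all being assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Tripwire:
    premortem: str      # the written answer to "it is a week later and this broke, why?"
    max_failures: int   # the concrete signal: how many failed checks trigger a halt
    failures: int = 0
    halted: bool = False

    def record(self, ok: bool) -> bool:
        """Record one check result and return True once the run must halt."""
        if not ok:
            self.failures += 1
        if self.failures >= self.max_failures:
            self.halted = True  # stays tripped; successes do not reset it
        return self.halted


if __name__ == "__main__":
    wire = Tripwire(premortem="migration dropped rows silently", max_failures=2)
    for ok in (True, False, True, False):
        if wire.record(ok):
            print("halt and report:", wire.premortem)
            break
```

The point of the design is that the halt condition is decided before the work starts, not by the agent in the middle of it.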

This is not a book riff. In June 2026 Google DeepMind shipped its AI Control Roadmap, which treats internal agents as potentially misaligned and has a second trusted system watch the working one. That is reality-test and prepare-to-be-wrong, in production, at one of the labs building the models. The same week's reliability research says the same thing from the other side: more capability, more meltdown.
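The general pattern of one system watching another can be sketched in a few lines. This is not DeepMind's design, only an illustration of the idea as described above; `supervised_step`, `worker`, and `monitor` are hypothetical names, and in practice both callables would wrap model or policy calls.

```python
from typing import Callable


def supervised_step(
    worker: Callable[[str], str],
    monitor: Callable[[str, str], bool],
    task: str,
) -> str:
    """Let the worker propose an action, but release it only if the monitor approves."""
    proposal = worker(task)
    if not monitor(task, proposal):
        raise PermissionError(f"monitor blocked proposal for task: {task!r}")
    return proposal


if __name__ == "__main__":
    # Stand-ins for real model calls.
    def worker(task: str) -> str:
        return "archive logs older than 30 days"

    def monitor(task: str, proposal: str) -> bool:
        return "rm -rf" not in proposal

    print(supervised_step(worker, monitor, "free up disk space"))
```

The worker never grades its own proposal; the only path to execution runs through a check it does not control.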

So the lever is not the next model. The Heaths cite research by Dan Lovallo and Olivier Sibony in which a disciplined process contributed more to decision quality than added analysis. For agents that means the four steps belong in the prompt, not the model card.

Pull up your agent's system prompt. Which of the four villains does it actually gate, and which one is it one bad tool call away from?

I write field notes from real builds: AI integration, cron-driven automation, and the parts that break in production. New posts every two weeks; if this one was useful, the agent playbook is the companion download.

2 Comments

peponder • Jun 27, 2026

Nice read. It's easy to overlook prompt design until things start breaking. Have you found one fix that gives the biggest improvement?

René Zander • Jun 27

@peponder Yes. I had my agent build a system prompt, in the right place, that follows the WRAP framework. Ask your agent whether it thinks the framework makes sense for agents, but first understand it yourself and be convinced enough that it should also apply to how you communicate with your agent. I believe human biases inherently carry over into anything humans build or touch. That is what led me here. That said, not every decision has to go through the framework. WRAP is for the bigger, hard-to-revert decisions.
