Why your RAG pipeline, guardrails, and prompt engineering each fix a different hallucination — and why none of them fix the others.
The Frustration
You built a RAG system. Added re-ranking. Tuned the temperature. Even threw in a "don't make things up" prompt. And yet — your agent still hallucinates. So you add more retrieval chunks. Then guardrails. Then another LLM call to "verify" the first one. The hallucinations shift shape, but they never disappear.
Here's the hard truth: you're treating five different architectural failures as one "hallucination bug."
Type 1: Confabulation — The LLM Makes Facts Up
What it looks like: "According to a 2023 study by Stanford..." (no such study exists)
Why it happens: The model is a next-token predictor with no ground-truth anchor. It optimizes for plausibility, not accuracy.
The fix — Retrieval Grounding + Fact Verification:
- Every claim must cite a retrieved source
- Verification LLM checks claim ↔ source alignment
- Abstention gate: if no source supports it, output "I don't know"
Prompt engineering won't fix this. The model has no fact-checker in its weights. You need an external verification architecture.
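As a rough illustration of the citation check and abstention gate listed above, here is a minimal sketch. The `Claim` type, the `gate` function, and the 0.6 threshold are all hypothetical, and `lexical_support` is only a word-overlap stand-in for the verification LLM, which in a real system would judge whether the source actually entails the claim.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Claim:
    text: str
    source_id: Optional[str]  # ID of the retrieved chunk this claim cites, if any


def lexical_support(claim: str, source: str) -> float:
    """Stand-in for the verification LLM: fraction of claim words found in the source."""
    claim_words = set(claim.lower().split())
    source_words = set(source.lower().split())
    if not claim_words:
        return 0.0
    return len(claim_words & source_words) / len(claim_words)


def gate(claims, sources, threshold=0.6):
    """Return the answer text, or abstain if any claim lacks a supporting source."""
    for claim in claims:
        source = sources.get(claim.source_id)
        # No cited source, or the cited source does not back the claim: abstain.
        if source is None or lexical_support(claim.text, source) < threshold:
            return "I don't know"
    return " ".join(claim.text for claim in claims)
```

Abstaining on the whole answer is the strictest policy. Dropping only the unsupported claims is a softer alternative if partial answers are acceptable.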
Type 2: Attribution Error — Right Fact, Wrong Source
What it looks like: Cites Document A for a fact that actually came from Document B
Why it happens: Similarity search puts several look-alike chunks into the context, and the LLM attaches a citation without verifying which chunk actually supports the claim
The fix — Provenance Tracking + Citation Scoring:
- Chunk-level metadata: source ID, section, timestamp
- Citation scorer: cross-check generated citation against retrieved chunks
- Mismatch → flag or abstain
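The provenance check above might look something like the following sketch. The `Chunk` fields mirror the metadata in the list. `check_citation`, the `min_score` cutoff, and the word-overlap scorer are assumptions made for illustration. A production scorer would more likely use an entailment model or an embedding comparison.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    source_id: str
    section: str
    timestamp: str  # ISO 8601 date the chunk was ingested
    text: str


def overlap(claim: str, text: str) -> float:
    """Toy support score: fraction of claim words that appear in the chunk text."""
    words = set(claim.lower().split())
    if not words:
        return 0.0
    return len(words & set(text.lower().split())) / len(words)


def check_citation(claim, cited_source_id, chunks, min_score=0.5):
    """Return 'ok', 'mismatch', or 'unsupported' for one generated citation."""
    if not chunks:
        return "unsupported"
    # Find the retrieved chunk that best supports the claim.
    best = max(chunks, key=lambda chunk: overlap(claim, chunk.text))
    if overlap(claim, best.text) < min_score:
        return "unsupported"
    # The citation is only valid if it points at that best-supporting chunk.
    return "ok" if best.source_id == cited_source_id else "mismatch"
```

A `mismatch` result can be repaired by swapping in the best-supporting source ID, while `unsupported` is a case for flagging or abstaining.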
Type 3: Temporal Drift — The Truth Expired
What it looks like: "The latest React version is 18.2" (it's 19.x now)
Why it happens: Static knowledge base with no freshness signal; retrieval doesn't prioritize recency
The fix — Knowledge Freshness Timestamps + Update Pipelines:
- Every chunk tagged with last_updated
- Retrieval re-ranks by recency for time-sensitive queries
- Scheduled re-indexing pipeline for evolving domains
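One way to implement the recency re-ranking above is to decay each chunk's similarity score by its age. This is a sketch under assumptions: the `ScoredChunk` type, the exponential decay, and the 180-day half-life are illustrative choices, and the `time_sensitive` flag is presumed to come from some upstream query classifier.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ScoredChunk:
    text: str
    similarity: float   # relevance score from the retriever, in [0, 1]
    last_updated: date  # freshness tag stored with the chunk


def rerank(chunks, today, time_sensitive, half_life_days=180.0):
    """Sort chunks by similarity, decayed by age when the query is time-sensitive."""
    def score(chunk):
        if not time_sensitive:
            return chunk.similarity
        age_days = max((today - chunk.last_updated).days, 0)
        # The score halves every `half_life_days` (an assumed tuning knob).
        return chunk.similarity * 0.5 ** (age_days / half_life_days)

    return sorted(chunks, key=score, reverse=True)
```

With this scoring, a slightly less similar chunk updated last month outranks a closer match that is two years old, but only for queries flagged as time-sensitive.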
Type 4: Logical Inconsistency — Contradictory Claims in One Answer
What it looks like: "X is faster than Y" and "Y outperforms X" in the same response
Why it happens: The LLM generates text one token at a time, and nothing checks the finished output for global consistency
The fix — Structured Reasoning + Formal Verification:
- Chain-of-thought with explicit intermediate claims
- Consistency checker: compare all claims pairwise
- For critical domains: compile the claims into constraints for a SAT/SMT solver
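The pairwise consistency check can be sketched as follows. It assumes an earlier extraction step has already normalized phrases like "is faster than" and "outperforms" into structured comparisons on a named metric. The `Comparison` type and `contradictions` function are hypothetical names.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Comparison:
    better: str
    worse: str
    metric: str  # normalized metric name, e.g. "speed"


def contradictions(claims):
    """Return every pair of claims that order the same two items in opposite ways."""
    return [
        (a, b)
        for a, b in combinations(claims, 2)
        if a.metric == b.metric and a.better == b.worse and a.worse == b.better
    ]
```

A pairwise pass like this catches direct reversals only. A cycle such as X beats Y, Y beats Z, Z beats X has no contradictory pair, which is the kind of case where a solver earns its place.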
Type 5: Tool-Call Hallucination — Invalid Tools or Parameters
What it looks like: call_tool("get_weather", location="Mars") (the tool doesn't exist, or a parameter is invalid)
Why it happens: The LLM generates tool calls like it generates text — by pattern matching, not by schema validation
The fix — Tool Schema Validation + Execution Sandbox:
- Tool registry with JSON schemas
- Pre-execution validator: name + params must match schema
- Sandbox: tool runs in isolated env, output validated before passing back
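The pre-execution validator above could start as small as this sketch. The registry layout and `validate_call` are hypothetical, and the hand-rolled type check is a simplified stand-in for full JSON Schema validation, which a real system would delegate to a schema library.

```python
# Hypothetical registry: tool name -> required and optional parameters with types.
TOOL_REGISTRY = {
    "get_weather": {
        "required": {"location": str},
        "optional": {"units": str},
    },
}


def validate_call(name, params, registry=TOOL_REGISTRY):
    """Return a list of problems; an empty list means the call may execute."""
    spec = registry.get(name)
    if spec is None:
        return [f"unknown tool: {name}"]

    allowed = {**spec["required"], **spec["optional"]}
    problems = []
    for key in spec["required"]:
        if key not in params:
            problems.append(f"missing parameter: {key}")
    for key, value in params.items():
        if key not in allowed:
            problems.append(f"unexpected parameter: {key}")
        elif not isinstance(value, allowed[key]):
            problems.append(f"wrong type for {key}: expected {allowed[key].__name__}")
    return problems
```

Name and type checks reject invented tools and malformed arguments before anything runs. Value-level problems, such as a location the weather service cannot resolve, need an enum, a lookup, or the validation of tool output in the sandbox step.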
The Unified Architecture

Closing
Hallucination is not a bug you patch. It's a symptom of five different architectural gaps. Fix the architecture, and the "hallucination problem" dissolves into five solvable engineering problems.