Your LLM Is Lying to You in Production — So I Built a System to Catch It in Real Time

Your LLM Is Lying to You in Production — So I Built a System to Catch It in Real Time

posted 2 min read

LLMs don't throw exceptions when they hallucinate. They return 200 OK with a confident wrong answer.

That's the problem I set out to fix. Here's how I built Failure Intelligence Engine (FIE) — a real-time observability layer that detects, classifies, and explains LLM failures before your users notice them.


Why Standard Monitoring Isn't Enough

Latency dashboards, error rates, uptime checks — none of these tell you if your model just hallucinated a drug interaction or contradicted itself three times in a row.

LLMs fail silently. You need a layer that understands what the model said, not just whether it responded.


The 4-Stage Pipeline

Your App → [Prompt Guard] → [Signal Extraction] → [DiagnosticJury] → Dashboard

Stack: FastAPI + LangGraph + MongoDB + Groq + React

Stage 1 — Prompt Guard

9 detection layers (regex + pattern matching, zero LLM calls) screen every incoming prompt for injection attacks, jailbreaks, and role manipulation. Score ≥ 0.75 = immediate block. Fast enough to run on every request.

Stage 2 — Failure Signal Vector

For prompts that pass the guard, FIE samples the model multiple times and builds a signal vector:

failure_signal = {
    "entropy":           0.82,  # how random are the outputs?
    "agreement_score":   0.31,  # do outputs semantically agree?
    "high_failure_risk": True,
}

High entropy + low agreement = the model is guessing.

Stage 3 — Archetype Classification

Every inference gets a human-readable failure label:

  • STABLE — Confident and consistent
  • HALLUCINATION_RISK — Entropy ≥ 0.75
  • UNSTABLE_OUTPUT — Outputs contradict each other
  • LOW_CONFIDENCE — Model hedges consistently
  • MODEL_BLIND_SPOT — Adversarial prompt blocked

Stage 4 — DiagnosticJury

Three specialized agents deliberate on every flagged failure:

  • AdversarialSpecialist — was this a prompt attack?
  • LinguisticAuditor — any factual gaps or contradictions?
  • KnowledgeAuditor — does the output contradict known facts?

Their verdicts aggregate into a jury_confidence score and a plain-English failure_summary.


Wired Together with LangGraph

g.add_conditional_edges(
    "prompt_guard",
    route_after_guard,
    {"block": "block", "signal_extract": "signal_extract"},
)
g.add_edge("signal_extract", "jury_deliberate")

Blocked prompts short-circuit to END instantly. Safe prompts run the full pipeline. Typed state makes every node's contract explicit and the whole thing debuggable.


Monitor Any LLM in 3 Lines

from fie_sdk import FIEClient

client = FIEClient(api_key="fie-your-key")
result = client.monitor(
    prompt="What is the capital of France?",
    model_outputs=["Paris", "Paris", "Lyon"],
)
print(result.archetype)        # UNSTABLE_OUTPUT
print(result.failure_summary)  # "Inconsistent outputs — 2/3 agree"

What I Learned

  • Watch inputs, not just outputs — the guard layer catches most failures for free
  • Single metrics lie, vectors don't — entropy alone is noisy; combine it with agreement and drift
  • Multi-agent deliberation catches edge cases — jury disagreement is itself a signal

GitHub: github.com/AyushSingh110/Failure_Intelligence_System
Live: failure-intelligence-system.pages.dev
Install: pip install fie-sdk


Ayush Singh — building open-source LLM infrastructure. GitHub · dev.to

More Posts

How I Built a React Portfolio in 7 Days That Landed ₹1.2L in Freelance Work

Dharanidharan - Feb 9

Everyone says DeepSeek is cheaper, but I got tired of guessing the exact math. So I built a calculat

abarth23 - Apr 27

I’m a Senior Dev and I’ve Forgotten How to Think Without a Prompt

Karol Modelskiverified - Mar 19

Sovereign Intelligence: The Complete 25,000 Word Blueprint (Download)

Pocket Portfolioverified - Apr 1

I spent years trying to get AI agents to collaborate. Then Opus 4.6 and Codex 5.3 wrote the rules

snapsynapseverified - Apr 20
chevron_left

Related Jobs

View all jobs →

Commenters (This Week)

18 comments
1 comment
1 comment

Contribute meaningful comments to climb the leaderboard and earn badges!