Each /slop Is a Calibration Signal — AI-SLOP Detector v3.6.0 and the Claude Code Skill

Question

Each /slop Is a Calibration Signal — AI-SLOP Detector v3.6.0 and the Claude Code Skill

calendar_todayApr 30 • schedule5 min read

— Originally published at flamehaven.space

The Quiet Failure of AI Development

AI-assisted development has a quiet failure mode: the assistant that creates the pattern often becomes the assistant that reviews it.

When you and Claude work inside the same session, you drift together. The review criteria shift with the assistant's habits. After enough sessions, the same assistant that wrote the hollow function body is also the one approving the pull request. There is no external reference point — unless you build one.

That is the problem AI-SLOP Detector v3.6.0 addresses with the Claude Code skill.

Every time you run /slop inside a session, the scan result is recorded to a project-scoped history. When enough re-scan evidence accumulates, bounded self-calibration adjusts the detection weights for your codebase — automatically, without a manual command. The scanner does not drift with the session. It stays anchored to observed scan outcomes.

It does not get smarter every time. It builds calibration signal every time. That is a more accurate claim, and the distinction matters.

What the Skill Does

The Skill layer Quality Policy

Install:

cp -r claude-skills/slop-detector ~/.claude/skills/slop-detector
# restart Claude Code

Four slash commands become available:

Command	What it does
`/slop`	Full project scan — interprets findings, prioritizes fixes, proposes patch plan
`/slop-file [path]`	Per-file deep-dive — explains each metric, gives concrete fix per pattern
`/slop-gate`	Hard gate decision — PASS or FAIL, lists blocking files with deficit_score >= 70
`/slop-spar`	Adversarial validation — probes metric boundaries, catches calibration drift

The intended workflow inside a Claude session:

1. /slop               → baseline scan, identify top offenders
2. review findings     → Claude prioritizes by deficit_score
3. patch files         → fix patterns with Claude's help
4. /slop-file <path>   → verify improvement per file
5. /slop               → confirm project aggregate improved
6. /slop-gate          → gate decision before merge

Quality policy lives in the skill layer. You do not re-explain what CRITICAL_DEFICIT means or which patterns are critical on every session.

The LEDA Flywheel

This is the part that matters.

LEDA is not model retraining. It is bounded weight calibration based on repeated scan outcomes.

/slop runs slop-detector --project . --json — without --no-history. Every invocation auto-records results to ~/.slop-detector/history.db, tagged with a project_id (sha256 of cwd) so signals never mix across different repositories.

After every 10 re-scanned files, the tool runs the LEDA self-calibration loop automatically:

/slop called
    │
    ├─► scan result → recorded to history.db (project-scoped)
    │
    ├─► 10 re-scanned files milestone?
    │       └─► SelfCalibrator: 4D grid-search over run history
    │               (ldr × inflation × ddc × purity weights)
    │               └─► confidence gap > 0.10?
    │                       └─► .slopconfig.yaml updated silently
    │
    └─► next /slop → calibrated weights, sharper detection

The calibrator uses re-scanned files as signal — not raw record count. A file counts toward the milestone only when the tool has seen it improve or degrade across at least two runs. This prevents first-time project scans from triggering calibration on noise.

Constrained to Reality

Three constraints keep calibration bounded:

Domain-anchored — grid search is constrained to ±0.15 around domain baseline weights. Detection cannot drift outside the meaningful range for your project type.
Confidence gate — only applies when the top candidate weight set beats the second by > 0.10. Ambiguous signals produce no change.
Drift warnings — CalibrationResult.warnings flags any dimension that shifted > 0.25 from the anchor.

/slop-spar adds a separate adversarial layer: it probes known-pattern anchors, metric boundary cases, and existence conditions. When it detects that measured behavior has diverged from metric claims, it recommends --self-calibrate --apply-calibration explicitly.

What the Data Shows — and What We Won't Claim

Workflow telemetry, not empty claims

We will not tell you that AI-SLOP Detector improves code quality by X%.

We have not run a controlled study. We have not compared matched projects with and without the tool. Any number we put here would be a claim we cannot prove, and this tool is built specifically to catch that kind of thing.

What we do have: the tool scanning itself. Every time a core module was changed, it got re-scanned. N = 14,367 records across all projects in ~/.slop-detector/history.db.

This is not outcome evidence. It is workflow telemetry. Here is what the scan history shows for the eight most-improved files in this codebase:

File                   Scans  Worst → Best   Improvement
─────────────────────────────────────────────────────────
ddc.py                   86   87.8 →  11.0    -76.8 pts
placeholder.py           92   70.3 →   0.0    -70.3 pts
cross_file.py            89   70.3 →   5.0    -65.3 pts
ci_gate.py               88   69.3 →   6.2    -63.1 pts
cli.py                   88   68.4 →   8.4    -60.0 pts
ldr.py                   90   58.0 →   0.1    -57.9 pts
python_advanced.py       95   74.0 →  18.0    -55.9 pts
context_jargon.py        86   55.7 →   5.0    -50.7 pts
─────────────────────────────────────────────────────────
Source: self-scan, history.db — not an independent study

And the weekly project aggregate (avg deficit score):

Week      Avg Deficit   Critical Files   Note
────────────────────────────────────────────────────────
2026-W09     11.9            3           baseline
2026-W10     22.1           20           structural refactor spike
2026-W14     20.0           58           large feature addition
2026-W15     11.9           14           post-refactor recovery
2026-W17     12.2           13           current — stable CLEAN state
────────────────────────────────────────────────────────

The mechanism is not mysterious. Scan reveals structural problems → Claude sees exact pattern names and line references → Claude (or the developer) fixes them → rescan confirms improvement → LEDA registers the delta and adjusts detection weights accordingly.

The loop does not guarantee quality. It makes quality visible, then measurable, then improvable.

Whether that loop improves your codebase is something your history.db will tell you — not us.

Also in v3.6.0

System diagnostics & Protocol refinements

CI gate exit code fix. --ci-mode hard without --ci-report was returning exit 0 even on CRITICAL_DEFICIT files — a two-line fix in _evaluate_ci_gate() (commit 0d67997). This affected v3.1.1 through v3.5.0 on the specific path of using the gate without the reporting flag. A regression test at the subprocess level was added to prevent recurrence (commit 0208af4).

Pre-commit hooks rewritten. Three hook variants now use python -m slop_detector.cli as entry point (bypasses Windows .exe wrapper exit-code issue), and --severity high (nonexistent flag) replaced with --ci-mode:

repos:
  - repo: https://github.com/flamehaven01/AI-SLOP-Detector
    rev: v3.6.0
    hooks:
      - id: slop-detector           # hard gate
      # - id: slop-detector-warn    # report only
      # - id: slop-detector-patterns  # fast per-file

VS Code Extension v3.6.0. Version tracks core library. No behavior changes from v3.5.0.

The Shape of the Loop

An External reference point

The skill + LEDA loop is the external reference point. Detection weights stay grounded in observed scan outcomes — files that improved across re-scans, files that stayed problematic — rather than in what the assistant believes is correct at any given moment.

The loop does not guarantee quality. It makes quality visible, then measurable, then improvable.

We won't tell you what percentage your code will improve. That would make us the thing we are trying to detect.

The scanner is not Claude's opinion about code quality. It is a measurement that gets calibrated against reality, session by session. Your history.db will tell you the rest.

The Shape of the Loop

Links:

4 Comments

🔥 Join developers growing publicly

Share your knowledge, build in public, and grow your developer presence with a global community.

Join CoderLegion

chevron_left

Commenters (This Week)

Contribute meaningful comments to climb the leaderboard and earn badges!

J.Bruni · Answer 1 · 2026-05-02T04:44:36+0000

J.Bruni • May 1

Interesting idea. But isn’t this still kinda self referential if Claude is involved in the fixes too? How do you avoid bias creeping back in?

Flamehaven • May 2

@[J.Bruni] That is an incredibly sharp question, and honestly, the self-referential loop is the greatest danger in AI-assisted engineering. If AI detects the slop, and AI fixes the slop, you eventually end up optimizing for the AI's stylistic preferences, not actual code quality.

Here is exactly how AI-SLOP-DETECTOR breaks that loop structurally:

1) The tool is a Diagnostic Instrument, not an Auto-Fixer. AI-SLOP-DETECTOR is explicitly designed to be an X-Ray, not a surgeon. The purpose of the Claude Code integration (/slop) is not for the AI to blindly auto-generate fixes. It is to present structured, mathematical evidence to the human developer, who then directs the AI on how to patch it. AI measures; the human judges.

2) Math over Opinion (Structural Facts vs. Stylistic Vibes). The metrics aren't based on an LLM's "vibe check." They are grounded in deterministic parsing:

LDR (Logic Density): Measured by AST node counting.
DDC (Dependency Check): Verified via importlib.util.find_spec.
Complexity: Computed mathematically by radon. Claude cannot "hallucinate" or smooth-talk its way out of a 300-line function with a cyclomatic complexity of 45. The metrics are structural facts.

3) The Oracle is Human Behavior, Not AI Scoring. When the engine runs its self-calibration, its ground truth isn't "did the AI score improve?" The ground truth is human git commits.

If the tool flags an issue and the human developer accepts a fix (an improvement event), the weight is reinforced.
If the tool flags an issue but the human ignores it, it is logged as a false-positive candidate. The calibration anchor is entirely rooted in what the human actually decides to merge.

In short: Bias creeps in when you treat AI as the judge. We avoid it by stripping AI of its judicial power, reducing it to a highly articulate measuring tape, and keeping the human developer firmly as the oracle."

I am actually releasing version 3.7.0 today, which heavily solidifies this exact architectural philosophy. I would genuinely appreciate it if you could test it out in your workflow. I'm not looking for 'stars' or superficial praise—what I really need is actual, ruthless feedback from practitioners with sharp perspectives like yours.

Thank you again for the insightful critique. It’s exactly this kind of thinking that forces the tool to become better.

DuchessCodes · Answer 2 · 2026-05-02T18:44:22+0000

DuchessCodes • May 2

Interesting approach especially the focus on breaking the ‘same agent writes and reviews’ loop. The calibration angle is compelling, but I wouldn’t frame it as the solution to drift. External signals can come from multiple places (separate review passes, different models, human review), and this is one structured way to enforce that. Framing it as ‘one way to anchor evaluation’ might land stronger than positioning the loop itself as the anchor

Flamehaven • May 2

@[DuchessCodes] — Your framing is sharper than mine.
You are right: “one structured anchor” is the honest claim.
Not the solution to drift.

An anchor.

That distinction matters.

Two clarifications about this repo:

1. This is not an auto-fixer.

AI-SLOP Detector is a diagnostic instrument.

It surfaces structural evidence for a developer to judge:

▪️ AST-derived signals
▪️ metric breakdowns
▪️ line references
▪️ critical deficit patterns
▪️ disconnected or hollow structures

The point of the Claude Code Skill loop is not:
“AI found it, AI fixed it, AI approved it.”

That would collapse back into self-reference.

The point is:
scan → diagnose → patch → re-scan → gate → calibrate

AI can assist inside the loop. But AI does not get final authority.
AI measures. The human decides.

2. This is one layer in a review stack, not the stack.

External signals can come from many places:

▪️ independent human review
▪️ separate model passes
▪️ adversarial metric checks
▪️ CI gates
▪️ regression tests
▪️ domain-specific review tools

AI-SLOP Detector is one structured measurement layer in that stack.
SPAR is another layer I use for claim-aware review — checking whether the metric still supports the claim.

Each catches what the others miss.
We do not read one paper and claim to understand a field.
We should not run one tool and claim the code is clean.

Constant skepticism.
Never absolute trust.

So yes — I agree with your correction.

The stronger framing is not:
“This solves drift.”

It is:

“This gives AI-assisted development one measurable anchor outside the assistant’s immediate confidence loop.”

One more thing, I’ve learned from reading your security writing.

That same standard — mindset over tools — is what I’m trying to preserve here.

Thanks for pushing on it.

Critique like yours is what keeps tooling honest.

	The Sovereign Vault — A Comprehensive Guide to Protocol-Driven AI Ken W. Algerverified - Jun 4
	Helping Clients Move from Pilot to Production: The Agentic AI Governance Playbook Tom Smithverified - Jun 8
	It Gets Smarter Every Scan: AI-SLOP Detector v3.5.0 and the Self-Calibration Loop Flamehaven - Apr 13
	I spent years trying to get AI agents to collaborate. Then Opus 4.6 and Codex 5.3 wrote the rules snapsynapseverified - Apr 20
	I Wrote a Script to Fix Audible's Unreadable PDF Filenames snapsynapseverified - Apr 20

Each /slop Is a Calibration Signal — AI-SLOP Detector v3.6.0 and the Claude Code Skill

What the Skill Does

The LEDA Flywheel

What the Data Shows — and What We Won't Claim

Also in v3.6.0

The Shape of the Loop

4 Comments

Please log in to add a comment.

Please log in to add a comment.

Please log in to comment on this post.

More Posts

The Sovereign Vault — A Comprehensive Guide to Protocol-Driven AI

Helping Clients Move from Pilot to Production: The Agentic AI Governance Playbook

It Gets Smarter Every Scan: AI-SLOP Detector v3.5.0 and the Self-Calibration Loop

I spent years trying to get AI agents to collaborate. Then Opus 4.6 and Codex 5.3 wrote the rules

I Wrote a Script to Fix Audible's Unreadable PDF Filenames

More From Flamehaven

The Slop Gap: Why Rising Model Capability Hasn't Solved Code Quality

Loop Engineering, Not Loop Running: What Agent Loops Can Learn from Parcae

You Can’t Outsource Agent Judgment

Related Jobs

Commenters (This Week)

Welcome to Coder Legion

Connect with 4,762 amazing developers

Don't have an account? Sign up

OR

Each /slop Is a Calibration Signal — AI-SLOP Detector v3.6.0 and the Claude Code Skill

What the Skill Does

The LEDA Flywheel

What the Data Shows — and What We Won't Claim

Also in v3.6.0

The Shape of the Loop

4 Comments

Please log in to add a comment.

Please log in to add a comment.

Please log in to comment on this post.

More Posts

More From Flamehaven

Related Jobs

Commenters (This Week)