Debuggix Tested 9 Security Engines On Kubernetes Goat. 134 Raw Findings. Only 6 Were Real. Here Is

Jun 12 • 4 min read

A case study in alert fatigue: how test files, build artifacts, and intentional patterns generate false positives, and why AI filtering changes the equation.

The Debuggix team ran a full security scan on Kubernetes Goat, a deliberately vulnerable training project. The raw scan across 9 engines produced 134 findings. Two were critical severity. Thirty-two were high severity.

Then we ran the same scan through our AI filter.

Six findings required attention. The rest were false positives.

This is the alert fatigue crisis. A developer running a standard security scan receives 134 alerts. Most are noise. The developer either spends hours triaging or ignores the scanner entirely. Neither outcome makes the code safer.

Here is what the noise actually looks like.

Where The False Positives Came From

Test files (47 findings): Kubernetes Goat includes test files that contain example secrets and intentionally vulnerable patterns. The scanners flagged these as real issues. But test files never run in production. The findings were irrelevant.

Build scripts (23 findings): The project includes build scripts that download packages from external URLs. The scanners flagged these as dependencies on unverified sources. But build scripts run in a controlled CI environment. The findings were noise.

Intentional patterns (38 findings): Kubernetes Goat is designed to be vulnerable. The scanners correctly identified the vulnerabilities. But the project documentation clearly states that these vulnerabilities are intentional for training purposes. The findings were expected.

Documentation examples (26 findings): The project's README includes code examples that demonstrate insecure patterns. The scanners flagged these as real issues. But they are examples, not production code. The findings were misleading.

The Cost Of These False Positives

A developer running a standard security scan on Kubernetes Goat sees 134 findings. They do not know that 128 are false positives. They must investigate each one.

Investigating a finding takes about 2 minutes on average: reading the code, reading the documentation, and determining whether the finding applies to production.

134 findings at 2 minutes each is 268 minutes. Nearly 4.5 hours of developer time. For a training project.
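The arithmetic above can be checked in a few lines. This is a minimal sketch using the article's own figures (134 raw findings, 6 real ones, 2 minutes each). The helper name triage_minutes is illustrative, and applying the same 2-minute average to the filtered set is an assumption.

```python
def triage_minutes(findings: int, minutes_per_finding: float) -> float:
    """Total time to manually review every finding, in minutes."""
    return findings * minutes_per_finding


# Figures from the article: 134 raw findings, 6 after filtering, 2 minutes each.
raw_minutes = triage_minutes(134, 2)
filtered_minutes = triage_minutes(6, 2)  # assumes the same per-finding average

print(f"raw scan: {raw_minutes} minutes ({raw_minutes / 60:.2f} hours)")
print(f"filtered: {filtered_minutes} minutes")
```

The raw scan works out to 268 minutes, or about 4.47 hours. The filtered set takes 12 minutes under the same assumption.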

For a real project that the developer is responsible for, the cost is even higher. The developer cannot ignore findings because some might be real. They must triage everything.

This is why many developers stop running security scanners. The time cost exceeds the perceived benefit.

What Existing Tools Do About False Positives

Snyk provides prioritization features. Findings are ranked by severity and exploitability. But the developer still must review each finding. Snyk does not automatically know that a test file is irrelevant.

Semgrep allows custom rules. A developer can write rules that ignore certain directories or patterns. But this requires expertise. Most developers never write custom rules.

GitHub Advanced Security uses CodeQL, which produces fewer false positives than some alternatives. But CodeQL still flags test files and example code. The developer still triages.

Trivy focuses on CVEs, which have lower false positive rates than static analysis. But Trivy misses application logic flaws entirely. The developer gains low noise but loses detection breadth.

Gitleaks flags potential secrets. Some are real. Some are example keys. The developer decides.

None of these tools read your documentation. None understand that your test directory is not production. None know that you intentionally use a vulnerable pattern for training.

What AI Filtering Does Differently

Debuggix runs the same 9 engines. Then it applies an AI filter that reads the project's documentation.

The AI identifies test directories and treats findings there as lower priority. It recognizes build scripts and evaluates them with appropriate severity. It reads README files to understand intentional patterns.

When the AI has low confidence about a finding, it reports that uncertainty. The developer sees "70 percent confidence" and knows to review manually. When the AI has high confidence, it flags the finding as action required.

On Kubernetes Goat, the AI read the README. It saw that the project is deliberately vulnerable for training. It classified all intentional findings accordingly. The developer saw six real issues that required attention, not 134.

How Debuggix Reports Confidence

Each finding in a Debuggix report includes a confidence score from 0 to 100 percent.

90-100 percent confidence: The AI is certain this is a real issue. The project documentation does not indicate intentional use. The finding is not in a test directory. The finding is not in example code. Fix this.

70-89 percent confidence: The AI is fairly certain but there is some ambiguity. The finding might be intentional but the documentation is unclear. The developer should review.

50-69 percent confidence: The AI has identified a pattern but cannot determine context. The developer should investigate.

Below 50 percent confidence: The AI thinks this is likely a false positive but includes it for transparency. The developer can likely ignore.

On Kubernetes Goat, the 128 false positives all received confidence scores below 50 percent. The 6 real issues received scores above 90 percent. The developer knew exactly where to focus.
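The confidence bands above can be expressed as a simple lookup. This is a sketch of the thresholds as the article describes them, not Debuggix's actual report format or API. The function name and the action labels are hypothetical.

```python
def confidence_band(score: int) -> str:
    """Map a 0-100 confidence score to the action described for its band."""
    if not 0 <= score <= 100:
        raise ValueError(f"score must be between 0 and 100, got {score}")
    if score >= 90:
        return "fix"                     # 90-100: treated as a real issue
    if score >= 70:
        return "review"                  # 70-89: some ambiguity
    if score >= 50:
        return "investigate"             # 50-69: pattern found, context unclear
    return "likely false positive"       # below 50: shown for transparency


for score in (95, 70, 55, 20):
    print(score, "->", confidence_band(score))
```

A developer could sort a report by band and start with the "fix" group, which on Kubernetes Goat would contain the 6 real issues.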

What You Can Do Today Without AI

If you are not using an AI filter, you can still reduce false positives with these manual steps:

Step one: Configure your scanner to ignore test directories. Most scanners support ignore patterns. Add tests/, spec/, __tests__/, and testdata/ to your ignore list.

Step two: Separate development dependencies from production dependencies. A CVE in a testing library is lower priority than a CVE in a production library. Use dependency groups if your package manager supports them.

Step three: Document intentional patterns. If you use a deprecated algorithm for compatibility reasons, add a comment explaining why. A developer triaging a finding will see the comment and know to ignore it.

Step four: Run scanners in CI only on production branches. Running on every commit to every branch generates noise. Run on merge to main only.
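Step one can also be applied after the fact, to a list of findings a scanner has already produced. The sketch below is one possible approach, assuming each finding is a dict with a "path" key. That structure, the function names, and the sample findings are all hypothetical. Real scanners have their own ignore mechanisms, which are usually the better place for this.

```python
from pathlib import PurePosixPath

# Directory names from step one.
IGNORED_DIRS = {"tests", "spec", "__tests__", "testdata"}


def in_ignored_dir(path: str) -> bool:
    """True if any component of the path is one of the ignored directories."""
    return any(part in IGNORED_DIRS for part in PurePosixPath(path).parts)


def split_findings(findings):
    """Split findings into (keep, ignored) based on their file path."""
    keep, ignored = [], []
    for finding in findings:
        (ignored if in_ignored_dir(finding["path"]) else keep).append(finding)
    return keep, ignored


# Made-up sample findings, for illustration only.
sample = [
    {"rule": "hardcoded-secret", "path": "tests/fixtures/config.py"},
    {"rule": "hardcoded-secret", "path": "app/settings.py"},
    {"rule": "insecure-download", "path": "pkg/testdata/fetch.sh"},
]

keep, ignored = split_findings(sample)
print([f["path"] for f in keep])      # ['app/settings.py']
print(len(ignored), "findings set aside")
```

Matching on whole path components avoids accidentally ignoring a file such as app/contests/handler.py, which a plain substring check for "tests/" would catch.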

These steps reduce false positives but do not eliminate them. AI filtering eliminates more.

The Bottom Line

Alert fatigue is not a problem of detection. The scanners are working. They find vulnerabilities. They also find everything else.

The problem is filtering. Developers need a way to separate real threats from noise. The technology exists. It uses AI to read documentation and understand context.

Until that technology is standard, developers will continue to ignore security scanners. Not because they are careless. Because they cannot afford the time to triage false positives.

Debuggix is free for open source repositories. Paid plans for private repos start at $29 per month.

Try it: debuggix.space
