A case study in alert fatigue: how test files, build artifacts, and intentional patterns generate false positives, and why AI filtering changes the equation.
The Debuggix team ran a full security scan on Kubernetes Goat, a deliberately vulnerable training project. The raw scan across 9 engines produced 134 findings. Two were critical severity. Thirty-two were high severity.
Then we ran the same scan through our AI filter.
Six findings required attention. The rest were false positives.
This is the alert fatigue crisis. A developer running a standard security scan receives 134 alerts. Most are noise. The developer either spends hours triaging or ignores the scanner entirely. Neither outcome makes the code safer.
Here is what the noise actually looks like.
Where The False Positives Came From
Test files (47 findings): Kubernetes Goat includes test files that contain example secrets and intentionally vulnerable patterns. The scanners flagged these as real issues. But test files never run in production. The findings were irrelevant.
Build scripts (23 findings): The project includes build scripts that download packages from external URLs. The scanners flagged these as dependencies on unverified sources. But build scripts run in a controlled CI environment. The findings were noise.
Intentional patterns (38 findings): Kubernetes Goat is designed to be vulnerable. The scanners correctly identified the vulnerabilities. But the project documentation clearly states that these vulnerabilities are intentional for training purposes. The findings were expected.
Documentation examples (26 findings): The project's README includes code examples that demonstrate insecure patterns. The scanners flagged these as real issues. But they are examples, not production code. The findings were misleading.
The Cost Of These False Positives
A developer running a standard security scan on Kubernetes Goat sees 134 findings. They do not know that 128 are false positives. They must investigate each one.
Investigating a finding takes about 2 minutes on average: reading the code, reading the documentation, and determining whether the finding applies to production.
134 findings at 2 minutes each is 268 minutes. Nearly 4.5 hours of developer time. For a training project.
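The arithmetic above can be checked with a short sketch. The function name `triage_cost_minutes` is hypothetical, and the filtered figure assumes the same 2 minutes per finding applies to the six findings that remain after filtering.

```python
def triage_cost_minutes(findings: int, minutes_per_finding: float) -> float:
    """Total time needed to review every finding by hand."""
    return findings * minutes_per_finding


# Raw scan: 134 findings at 2 minutes each.
raw_minutes = triage_cost_minutes(134, 2)
raw_hours = raw_minutes / 60

# Filtered scan: 6 findings, assuming the same per-finding cost.
filtered_minutes = triage_cost_minutes(6, 2)

print(f"raw: {raw_minutes:.0f} minutes ({raw_hours:.2f} hours)")
print(f"filtered: {filtered_minutes:.0f} minutes")
```

Under these assumptions the raw scan costs 268 minutes of review and the filtered scan costs 12.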
For a real project that the developer is responsible for, the cost is even higher. The developer cannot ignore findings because some might be real. They must triage everything.
This is why many developers stop running security scanners: the time cost exceeds the perceived benefit.
Snyk provides prioritization features. Findings are ranked by severity and exploitability. But the developer still must review each finding. Snyk does not automatically know that a test file is irrelevant.
Semgrep allows custom rules. A developer can write rules that ignore certain directories or patterns. But this requires expertise. Most developers never write custom rules.
GitHub Advanced Security uses CodeQL, which produces fewer false positives than some alternatives. But CodeQL still flags test files and example code. The developer still triages.
Trivy focuses mainly on known CVEs in dependencies and container images, which have lower false positive rates than static analysis. But Trivy does not detect application logic flaws. The developer gains low noise but loses detection breadth.
Gitleaks flags potential secrets. Some are real. Some are example keys. The developer decides.
None of these tools read your documentation. None understand that your test directory is not production. None know that you intentionally use a vulnerable pattern for training.
What AI Filtering Does Differently
Debuggix runs the same 9 engines. Then it applies an AI filter that reads the project's documentation.
The AI identifies test directories and treats findings there as lower priority. It recognizes build scripts and evaluates them with appropriate severity. It reads README files to understand intentional patterns.
When the AI is not certain about a finding, it reports that uncertainty. The developer sees "70 percent confidence" and knows to review manually. When the AI has high confidence, it flags the finding as action required.
On Kubernetes Goat, the AI read the README. It saw that the project is deliberately vulnerable for training. It classified all intentional findings accordingly. The developer saw six real issues that required attention, not 134.
How Debuggix Reports Confidence
Each finding in a Debuggix report includes a confidence score from 0 to 100 percent.
90-100 percent confidence: The AI is certain this is a real issue. The project documentation does not indicate intentional use. The finding is not in a test directory. The finding is not in example code. Fix this.
70-89 percent confidence: The AI is fairly certain but there is some ambiguity. The finding might be intentional but the documentation is unclear. The developer should review.
50-69 percent confidence: The AI has identified a pattern but cannot determine context. The developer should investigate.
Below 50 percent confidence: The AI thinks this is likely a false positive but includes it for transparency. The developer can likely ignore it.
On Kubernetes Goat, the 128 false positives all received confidence scores below 50 percent. The 6 real issues received scores above 90 percent. The developer knew exactly where to focus.
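The four bands described above map naturally onto a small lookup. The sketch below is illustrative only and is not Debuggix's implementation: the function name `triage_band` and the returned labels are assumptions, and only the thresholds come from the text.

```python
def triage_band(confidence: int) -> str:
    """Map a 0-100 confidence score to the action described for its band."""
    if not 0 <= confidence <= 100:
        raise ValueError("confidence must be between 0 and 100")
    if confidence >= 90:
        return "fix"  # 90-100: treated as a real issue
    if confidence >= 70:
        return "review"  # 70-89: some ambiguity, review manually
    if confidence >= 50:
        return "investigate"  # 50-69: pattern found, context unknown
    return "likely false positive"  # below 50: reported for transparency


for score in (95, 70, 55, 20):
    print(score, triage_band(score))
```

Sorting a report by this band before reading it puts the "fix" findings first and the likely false positives last.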
What You Can Do Today Without AI
If you are not using an AI filter, you can still reduce false positives with these manual steps:
Step one: Configure your scanner to ignore test directories. Most scanners support ignore patterns. Add tests/, spec/, __tests__/, and testdata/ to your ignore list.
Step two: Separate development dependencies from production dependencies. A CVE in a testing library is lower priority than a CVE in a production library. Use dependency groups if your package manager supports them.
Step three: Document intentional patterns. If you use a deprecated algorithm for compatibility reasons, add a comment explaining why. A developer triaging a finding will see the comment and know to ignore it.
Step four: Run scanners in CI only on production branches. Running on every commit to every branch generates noise. Run on merge to main only.
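If a scanner lacks ignore patterns, steps one and two can be approximated by post-processing its output. The sketch below is a minimal example under stated assumptions: the `Finding` record and its fields are hypothetical, not any scanner's real output format, and the directory names are the ones listed in step one.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath

# Directory names from step one; extend to match your repository layout.
TEST_DIR_NAMES = {"tests", "spec", "__tests__", "testdata"}


@dataclass(frozen=True)
class Finding:
    """Hypothetical finding record, not a real scanner schema."""
    rule_id: str
    path: str
    dev_dependency: bool = False


def in_test_directory(path: str) -> bool:
    """True if any directory component of the path is a known test directory."""
    return any(part in TEST_DIR_NAMES for part in PurePosixPath(path).parent.parts)


def partition(findings):
    """Split findings into (review first, lower priority)."""
    review_first, lower_priority = [], []
    for finding in findings:
        if in_test_directory(finding.path) or finding.dev_dependency:
            lower_priority.append(finding)
        else:
            review_first.append(finding)
    return review_first, lower_priority


findings = [
    Finding("hardcoded-secret", "src/tests/fixtures/keys.py"),
    Finding("sql-injection", "src/app/db.py"),
    Finding("known-cve", "package-lock.json", dev_dependency=True),
]
review_first, lower_priority = partition(findings)
print([f.rule_id for f in review_first])
```

The lower-priority findings are kept rather than discarded, so a real issue that happens to sit in a test directory can still be reviewed later.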
These steps reduce false positives but do not eliminate them. AI filtering eliminates more.
The Bottom Line
Alert fatigue is not a problem of detection. The scanners are working. They find vulnerabilities. They also find everything else.
The problem is filtering. Developers need a way to separate real threats from noise. The technology exists. It uses AI to read documentation and understand context.
Until that technology is standard, developers will continue to ignore security scanners. Not because they are careless. Because they cannot afford the time to triage false positives.
Debuggix is free for open source repositories. Paid plans for private repos start at $29 per month.
Try it: debuggix.space