Built an Autonomous DFIR Agent SIFT-AEGIS — Here's What I Learned

Jun 16 • 2 min read

I just submitted to the SANS Find Evil! Hackathon and wanted to share what I built: SIFT-AEGIS, a fully autonomous Digital Forensics and Incident Response agent that investigates a real criminal case end-to-end without human intervention.
The Problem
DFIR investigations are slow and analyst-dependent. A skilled investigator working the M57-Patents case (a real NIST CFReDS corporate espionage scenario) might take hours to correlate memory artifacts, disk evidence, emails, and browser history into a coherent theory. I wanted to know: can an AI agent do this autonomously with the same rigor, and prove it with ground-truth benchmarks?
What I Built
SIFT-AEGIS runs on a SANS SIFT Workstation and connects to a custom Model Context Protocol (MCP) server that exposes 20 read-only forensic tools wrapping Volatility3. The key design decision: the write-block is enforced by the architecture, not by the prompt. The MCP server exposes zero write, delete, or shell tools, so the agent has no way to modify or destroy evidence through it.
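To make the "architectural, not prompt-based" idea concrete, here is a minimal sketch of an allowlist-only tool registry. It is not the actual SIFT-AEGIS server code: the tool names, the `build_command` helper, and the `ToolNotAllowed` exception are hypothetical, and only four of the 20 tools are shown. The plugin names and the `-q`, `-r json`, and `-f` options are standard Volatility3 CLI usage.

```python
# Hypothetical allowlist: agent-facing tool name -> Volatility3 plugin.
# Nothing outside this mapping can be invoked, so there is no write,
# delete, or shell capability to misuse.
READ_ONLY_TOOLS = {
    "process_list": "windows.pslist",
    "injected_code": "windows.malfind",
    "network_connections": "windows.netscan",
    "registry_key": "windows.registry.printkey",
}


class ToolNotAllowed(Exception):
    """Raised when the agent asks for a tool that is not registered."""


def build_command(tool: str, image_path: str) -> list[str]:
    """Return the argv for a registered tool.

    The result is an argument list (never a shell string), so the agent
    cannot smuggle extra commands in through the image path.
    """
    if tool not in READ_ONLY_TOOLS:
        raise ToolNotAllowed(tool)
    # -q suppresses progress output, -r json selects the JSON renderer.
    return ["vol", "-q", "-r", "json", "-f", image_path, READ_ONLY_TOOLS[tool]]
```

With this shape, a request such as `build_command("shell", ...)` fails before any process is started, regardless of what the prompt says.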
The investigation pipeline:

Iteration 1 — Memory forensics (process list, malfind injection detection, network connections, registry persistence)
Iteration 2 — Disk forensics (MFT timeline, email artifacts, browser history, document staging)
Iteration 3 — Cross-correlation, theory finalization, disconfirmation searches

The Competing Case Theory Engine scores three theories in parallel (Insider Threat, Authorized Business Research, Accidental Activity) and updates each confidence as evidence accumulates. Confidence in the leading theory rose from 70% to 99% across iterations.
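One simple way to score competing theories is a Bayesian-style update, sketched below. This is an assumption about how such an engine could work, not a description of the SIFT-AEGIS scoring code: the `update` function, the evidence labels, and every likelihood value are invented for illustration.

```python
THEORIES = ("Insider Threat", "Authorized Business Research", "Accidental Activity")


def update(scores: dict[str, float], likelihoods: dict[str, float]) -> dict[str, float]:
    """One update step: weight each theory by how well it explains the
    new evidence, then renormalize so the confidences sum to 1."""
    unnormalized = {t: scores[t] * likelihoods[t] for t in scores}
    total = sum(unnormalized.values())
    return {t: value / total for t, value in unnormalized.items()}


# Start with no preference between the three theories.
scores = {t: 1 / 3 for t in THEORIES}

# Hypothetical evidence items with made-up likelihoods per theory.
evidence = [
    ("confidential document attached to outbound email",
     {"Insider Threat": 0.8, "Authorized Business Research": 0.3, "Accidental Activity": 0.1}),
    ("document staged in an unusual directory before sending",
     {"Insider Threat": 0.7, "Authorized Business Research": 0.2, "Accidental Activity": 0.2}),
]

for label, likelihoods in evidence:
    scores = update(scores, likelihoods)
    print(label, {t: round(s, 3) for t, s in scores.items()})
```

Because all three theories are updated on every evidence item, a finding that fits the innocent explanations better pulls confidence away from the leading theory, which is what makes disconfirmation searches meaningful.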
Honest Benchmark Results
Against the 10-item M57-Patents ground truth:

Disk-Layer Precision: 1.0 (zero false positives)
Disk-Layer Recall: 0.8 (8/10 ground truth items found)
Disk-Layer F1: 0.8889
Hallucination Rate: 0.0
Total tool calls: 33
Self-corrections applied: 45
Runtime: ~6 minutes

I report two scores transparently: disk-layer only and full-finding (which also counts memory injection artifacts that have no ground-truth entry). Most submissions would hide the lower number; I report both and document why they differ.
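For readers who want to check the arithmetic, the disk-layer figures follow from the standard precision, recall, and F1 definitions with 8 true positives, 0 false positives, and 2 missed items. The sketch below reproduces them. The `memory_fp` count in the second call is a placeholder to show how the full-finding score is computed, not a number from the benchmark.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Standard definitions, with 0.0 returned where a ratio is undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Disk-layer score: 8 of 10 ground-truth items found, no false positives.
p, r, f1 = precision_recall_f1(tp=8, fp=0, fn=2)
print(p, r, round(f1, 4))  # 1.0 0.8 0.8889

# Full-finding score: memory findings have no ground-truth entry, so each
# one counts as a false positive. The count here is hypothetical.
memory_fp = 2
print(precision_recall_f1(tp=8, fp=memory_fp, fn=2))
```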
Biggest Technical Challenges

Volatility3 output noise — raw output has stderr, progress bars, and warnings that corrupted MCP responses. Fixed with a stdio wrapper.
Ground truth scope mismatch — M57 ground truth covers disk only. Memory injection findings (csrss.exe, winlogon.exe) are real artifacts but score as false positives. Solved with dual-score methodology.
Self-correction memoization — early iterations re-ran failing tools repeatedly. Fixed with a persistent failure tracker that marks degraded tools and routes to alternatives.
Autonomous polling — getting the agent to monitor a 6-minute investigation without human prompting required careful tool return message design.
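The first and third challenges can be illustrated together. The sketch below is an assumption about the general technique, not the project's code: `FailureTracker`, `run_tool`, and the two-failure threshold are hypothetical names and values. It captures stderr separately so progress bars and warnings never reach the parsed payload, and it marks a tool as degraded after repeated failures so the agent routes to an alternative. Persisting the failure counts to disk is left out for brevity.

```python
import json
import subprocess
import sys
from collections import Counter
from typing import Any, Optional


class FailureTracker:
    """Remembers failing tools so they are not retried indefinitely."""

    def __init__(self, max_failures: int = 2):
        self.max_failures = max_failures
        self.failures: Counter = Counter()

    def record_failure(self, tool: str) -> None:
        self.failures[tool] += 1

    def is_degraded(self, tool: str) -> bool:
        return self.failures[tool] >= self.max_failures

    def pick(self, candidates: list[str]) -> Optional[str]:
        """Return the first candidate that is not degraded, or None."""
        for tool in candidates:
            if not self.is_degraded(tool):
                return tool
        return None


def run_tool(name: str, argv: list[str], tracker: FailureTracker) -> Optional[Any]:
    """Run a tool and parse its stdout as JSON.

    stderr is captured separately and never parsed, so noise written
    there cannot corrupt the response. Any failure is recorded.
    """
    proc = subprocess.run(argv, capture_output=True, text=True)
    if proc.returncode != 0:
        tracker.record_failure(name)
        return None
    try:
        return json.loads(proc.stdout)
    except json.JSONDecodeError:
        tracker.record_failure(name)
        return None


# Demo with a stand-in process that writes noise to stderr and JSON to stdout.
NOISY_SCRIPT = "import sys; sys.stderr.write('Progress 50%'); print('[1, 2]')"
tracker = FailureTracker()
print(run_tool("process_list", [sys.executable, "-c", NOISY_SCRIPT], tracker))
```

In use, the agent would call `tracker.pick([...])` with a preferred tool followed by its fallbacks, and only run whichever name comes back.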

What I'd Do Differently
Start with ground truth benchmarking on day 1, not day 3. The benchmark revealed gaps that shaped the entire architecture — if I'd built it first I'd have made better tool coverage decisions earlier.
Stack
Python, FastMCP, Volatility3, OpenClaw, Gemini API, Google ADK, SIFT Workstation

Links

GitHub: https://github.com/ssurekumar01111-hue/sift-aegis
Demo: https://youtu.be/iaf47TIKkLw
Devpost: https://devpost.com/software/sift-aegis

Built solo in 4 days. Happy to answer questions about the MCP architecture, the self-correction loop, or the benchmarking methodology.

2 Comments

Surendra Kumar

HakoneMatate · Answer 1 · 2026-06-17T15:21:54+0000

HakoneMatate • Jun 17

Very cool project. What was the biggest challenge getting the agent to make reliable decisions on its own?

MorningStar47 • Jun 17

@HakoneMatate Thanks! The biggest challenge was preventing the agent from becoming overconfident based on a single piece of evidence.

In DFIR, one artifact rarely tells the full story. Early versions of SIFT-AEGIS would find something suspicious and immediately treat it as significant. To address that, I built a Promotion Integrity Engine that requires corroboration across multiple tools and forensic domains before a finding can be promoted.

Another challenge was balancing autonomy with forensic rigor. The agent needed enough freedom to generate investigative leads and explore evidence, while still maintaining a complete audit trail and evidence-grounded reasoning. That led to features like the Self-Correction Engine, Competing Case Theory Engine, and read-only MCP architecture.

The result is an agent that doesn't just collect artifacts—it continuously questions its own conclusions and looks for supporting or contradictory evidence before reaching a verdict.
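The corroboration rule described in this reply can be sketched as a small gate. This is only an illustration of the idea: the `Observation` type, the `can_promote` function, and the thresholds of two tools and two domains are assumptions, not details confirmed by the project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    finding_id: str  # which candidate finding this observation supports
    tool: str        # tool that produced it
    domain: str      # forensic domain, e.g. "memory", "disk", "email"


def can_promote(finding_id: str, observations: list[Observation],
                min_tools: int = 2, min_domains: int = 2) -> bool:
    """A finding is promoted only if distinct tools AND distinct
    forensic domains both support it."""
    relevant = [o for o in observations if o.finding_id == finding_id]
    tools = {o.tool for o in relevant}
    domains = {o.domain for o in relevant}
    return len(tools) >= min_tools and len(domains) >= min_domains


observations = [
    Observation("exfil-email", tool="email_artifacts", domain="email"),
    Observation("exfil-email", tool="mft_timeline", domain="disk"),
    Observation("odd-process", tool="process_list", domain="memory"),
]
print(can_promote("exfil-email", observations))  # True
print(can_promote("odd-process", observations))  # False
```

A single suspicious artifact, however striking, stays a lead until a second tool in a different domain agrees with it.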
