I Built an Autonomous DFIR Agent, SIFT-AEGIS: Here's What I Learned


I just submitted to the SANS Find Evil! Hackathon and wanted to share what I built: SIFT-AEGIS, a fully autonomous Digital Forensics and Incident Response agent that investigates a full forensic case scenario end-to-end without human intervention.
The Problem
DFIR investigations are slow and analyst-dependent. A skilled investigator working the M57-Patents case (a NIST CFReDS corporate espionage scenario) might take hours to correlate memory artifacts, disk evidence, emails, and browser history into a coherent theory. I wanted to know: can an AI agent do this autonomously with the same rigor, and prove it against ground-truth benchmarks?
What I Built
SIFT-AEGIS runs on a SANS SIFT Workstation and connects to a custom Model Context Protocol (MCP) server with 20 read-only forensic tools wrapping Volatility3. The key design decision: the write-block is enforced by the architecture, not by the prompt. The MCP server exposes no write, delete, or shell tools, so the agent has no way to alter or destroy evidence through it.
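To make the idea of an architectural write-block concrete, here is a minimal sketch, not the actual SIFT-AEGIS code. It assumes a fixed allowlist that maps exposed tool names to Volatility3 plugins. The tool names, the `build_command` helper, and the example path are hypothetical; the plugin identifiers and the `vol` flags (`-q`, `-r json`, `-f`) are standard Volatility3 ones. In a real MCP server each allowlisted entry would presumably be registered as its own tool.

```python
# Sketch only: tool names and build_command are hypothetical illustrations.
# The plugin identifiers and CLI flags are standard Volatility3 ones.
READ_ONLY_PLUGINS = {
    "pslist": "windows.pslist.PsList",
    "malfind": "windows.malfind.Malfind",
    "netscan": "windows.netscan.NetScan",
    "printkey": "windows.registry.printkey.PrintKey",
}


def build_command(tool: str, image_path: str) -> list[str]:
    """Map an exposed tool name to a Volatility3 argv list (no shell involved)."""
    if tool not in READ_ONLY_PLUGINS:
        # Anything outside the allowlist simply does not exist for the agent.
        raise PermissionError(f"tool not exposed: {tool!r}")
    # -q silences progress output, -r json selects the JSON renderer.
    return ["vol", "-q", "-r", "json", "-f", image_path, READ_ONLY_PLUGINS[tool]]


if __name__ == "__main__":
    # Hypothetical evidence path, used only to show the resulting command.
    print(build_command("pslist", "/cases/m57/memory.raw"))
```

Because the command is built as an argument list from a closed set of plugins, there is no code path that accepts arbitrary shell input, regardless of what the prompt says.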
The investigation pipeline:

Iteration 1 — Memory forensics (process list, malfind injection detection, network connections, registry persistence)
Iteration 2 — Disk forensics (MFT timeline, email artifacts, browser history, document staging)
Iteration 3 — Cross-correlation, theory finalization, disconfirmation searches

The Competing Case Theory Engine simultaneously scores three theories (Insider Threat, Authorized Business Research, Accidental Activity), updating confidence as evidence accumulates. Confidence in the leading theory rose from 70% to 99% across the three iterations.
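The post does not spell out how the engine updates its scores, so the following is only one plausible way to do it: a Bayes-rule update over the three theories. The function name, the starting confidences, and the likelihood values are all assumptions chosen for illustration, not figures from the investigation.

```python
def update_confidence(priors, likelihoods):
    """One Bayes-rule step: posterior is proportional to prior * likelihood."""
    unnormalized = {theory: priors[theory] * likelihoods[theory] for theory in priors}
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("evidence is impossible under every theory")
    # Normalize so the three confidences sum to 1 again.
    return {theory: value / total for theory, value in unnormalized.items()}


# Hypothetical starting confidences for the three competing theories.
confidence = {
    "insider_threat": 0.70,
    "authorized_research": 0.20,
    "accidental": 0.10,
}

# Hypothetical likelihoods: how probable one new piece of evidence is
# under each theory (values invented for this example).
evidence_likelihood = {
    "insider_threat": 0.90,
    "authorized_research": 0.10,
    "accidental": 0.05,
}

confidence = update_confidence(confidence, evidence_likelihood)
```

Scoring all theories in the same step means evidence that favors one theory automatically lowers the others, which is what makes disconfirmation searches meaningful.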
Honest Benchmark Results
Against the 10-item M57-Patents ground truth:

Disk-Layer Precision: 1.0 (zero false positives)
Disk-Layer Recall: 0.8 (8/10 ground truth items found)
Disk-Layer F1: 0.8889
Hallucination Rate: 0.0
Total tool calls: 33
Self-corrections applied: 45
Runtime: ~6 minutes

I report two scores: disk-layer only and full-finding (which also counts memory injection artifacts that have no ground truth entry). It would be easy to publish only the higher number; I reported both and documented why.
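The disk-layer figures can be checked by hand from the counts above: 8 true positives, 0 false positives, and 2 missed ground truth items. The helper below is a small sketch (the function name is my own) that reproduces them. The full-finding score would use the same formula with the out-of-scope memory findings counted as false positives.

```python
def score(true_pos: int, false_pos: int, false_neg: int):
    """Return (precision, recall, f1) from raw counts."""
    found = true_pos + false_pos
    expected = true_pos + false_neg
    precision = true_pos / found if found else 0.0
    recall = true_pos / expected if expected else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Disk-layer counts from the results above: 8 of 10 items found, no false positives.
precision, recall, f1 = score(true_pos=8, false_pos=0, false_neg=2)
print(precision, recall, round(f1, 4))  # 1.0 0.8 0.8889
```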
Biggest Technical Challenges

Volatility3 output noise — raw output has stderr, progress bars, and warnings that corrupted MCP responses. Fixed with a stdio wrapper.
Ground truth scope mismatch — M57 ground truth covers disk only. Memory injection findings (csrss.exe, winlogon.exe) are real artifacts but score as false positives. Solved with dual-score methodology.
Self-correction memoization — early iterations re-ran failing tools repeatedly. Fixed with a persistent failure tracker that marks degraded tools and routes to alternatives.
Autonomous polling — getting the agent to monitor a 6-minute investigation without human prompting required careful tool return message design.
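Two of the fixes above lend themselves to a short sketch. This is not the actual SIFT-AEGIS code: `run_clean`, `FailureTracker`, and the failure threshold are hypothetical names and values. The first function keeps stderr noise out of the response by capturing the two streams separately, assuming the tool prints JSON on stdout. The tracker is kept in memory for brevity; a persistent version would write its counts to disk.

```python
import json
import subprocess


def run_clean(cmd, timeout=300):
    """Run a tool and return parsed JSON from stdout only.

    stderr (progress bars, warnings) is captured separately so it can
    never leak into the payload returned to the agent.
    """
    proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    if proc.returncode != 0:
        raise RuntimeError(proc.stderr.strip()[-500:] or f"exit code {proc.returncode}")
    return json.loads(proc.stdout)


class FailureTracker:
    """Remembers failing tools so the agent stops retrying them."""

    def __init__(self, max_failures=2):
        self.max_failures = max_failures
        self.failures = {}

    def record_failure(self, tool):
        self.failures[tool] = self.failures.get(tool, 0) + 1

    def is_degraded(self, tool):
        return self.failures.get(tool, 0) >= self.max_failures

    def pick(self, candidates):
        """Return the first candidate that is not degraded, or None."""
        for tool in candidates:
            if not self.is_degraded(tool):
                return tool
        return None
```

A typical use would be to call `tracker.pick(["netscan", "netstat"])` before each step, run the chosen tool through `run_clean`, and call `record_failure` when it raises, so a tool that keeps failing is skipped in favor of its alternative.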

What I'd Do Differently
Start with ground truth benchmarking on day 1, not day 3. The benchmark revealed gaps that shaped the entire architecture — if I'd built it first I'd have made better tool coverage decisions earlier.
Stack
Python, FastMCP, Volatility3, OpenClaw, Gemini API, Google ADK, SIFT Workstation

Links

GitHub: https://github.com/ssurekumar01111-hue/sift-aegis
Demo: https://youtu.be/iaf47TIKkLw
Devpost: https://devpost.com/software/sift-aegis

Built solo in 4 days. Happy to answer questions about the MCP architecture, the self-correction loop, or the benchmarking methodology.
