Over the past few weeks, I've been building PhantomSOC, an autonomous incident response platform that doesn't just investigate security incidents but also learns from them.
Most AI-powered SOC tools can analyze alerts and generate reports. However, they rarely improve their investigation quality over time. That was the problem I wanted to solve.
The Problem
Security teams face thousands of alerts every day. False positives consume valuable analyst time, while real attacks can slip through the cracks. Even when investigations are completed, lessons learned often remain trapped in reports rather than improving future investigations.
I wanted to build a system capable of:
Investigating incidents autonomously
Evaluating the quality of its own investigations
Detecting blind spots and overconfidence
Learning from past mistakes
Automatically improving future investigations
The Architecture
PhantomSOC consists of three primary layers:
Layer 1 — SOC Triage Agent
The SOC Agent receives alerts and determines whether they are:
False positives
Suspicious activity
Escalation-worthy threats
It performs threat scoring and MITRE ATT&CK mapping, and it checks an investigation memory system for related past investigations.
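To make the triage step concrete, here is a minimal Python sketch of how a threat score could be mapped onto the three outcomes above, combined with a MITRE ATT&CK lookup and a recall query against a SQLite-backed memory. The score thresholds, the `TECHNIQUE_MAP` table, the `investigations` schema, and all function names are hypothetical illustrations, not PhantomSOC's actual code; the static table simply stands in for whatever scoring and mapping logic the real agent uses.

```python
from __future__ import annotations

import sqlite3
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    FALSE_POSITIVE = "false_positive"
    SUSPICIOUS = "suspicious"
    ESCALATE = "escalate"


# Hypothetical cutoffs on a 0-100 threat score; tune per environment.
FALSE_POSITIVE_MAX = 30
SUSPICIOUS_MAX = 70

# Hypothetical alert-type -> MITRE ATT&CK technique mapping (stand-in for real logic).
TECHNIQUE_MAP = {
    "brute_force": "T1110",
    "powershell_exec": "T1059.001",
    "data_exfil": "T1041",
}


@dataclass
class TriageResult:
    score: int
    verdict: Verdict
    technique: str | None
    related: list[str]  # summaries of past investigations with the same technique


def classify(score: int) -> Verdict:
    """Map a threat score onto one of the three triage outcomes."""
    if score <= FALSE_POSITIVE_MAX:
        return Verdict.FALSE_POSITIVE
    if score <= SUSPICIOUS_MAX:
        return Verdict.SUSPICIOUS
    return Verdict.ESCALATE


def recall_related(db: sqlite3.Connection, technique: str | None) -> list[str]:
    """Look up past investigation summaries that share the same technique."""
    if technique is None:
        return []
    rows = db.execute(
        "SELECT summary FROM investigations WHERE technique = ? ORDER BY id",
        (technique,),
    ).fetchall()
    return [row[0] for row in rows]


def triage(db: sqlite3.Connection, alert_type: str, score: int) -> TriageResult:
    """Score-based verdict plus technique mapping plus memory recall."""
    technique = TECHNIQUE_MAP.get(alert_type)
    return TriageResult(score, classify(score), technique, recall_related(db, technique))


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE investigations (id INTEGER PRIMARY KEY, technique TEXT, summary TEXT)"
    )
    db.execute(
        "INSERT INTO investigations (technique, summary) VALUES (?, ?)",
        ("T1110", "SSH brute force from a known scanner, closed as benign"),
    )
    result = triage(db, "brute_force", 82)
    print(result.verdict, result.technique, result.related)
```

Keeping the two cutoffs as separate named constants makes it easy to tune the false-positive threshold independently of the escalation threshold.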
Layer 2 — Phantom Forensic Agent
Escalated incidents are handed to the Phantom Forensic Agent, which performs digital forensics and incident response (DFIR).
This agent:
Reconstructs attack timelines
Extracts indicators of compromise
Maps attack chains
Generates stakeholder-specific reports
Produces autonomous incident response runbooks
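Two of these steps, timeline reconstruction and IOC extraction, are mechanical enough to sketch directly. The example below assumes events arrive as dicts with an ISO-8601 `ts` field and a free-text `message` field, and it uses deliberately simple regexes for IPv4 addresses and SHA-256 hashes. The event format, the patterns, and the function names are assumptions for illustration only.

```python
from __future__ import annotations

import re
from datetime import datetime

# Deliberately simple patterns for two common IOC types. A production extractor
# would cover more types (domains, URLs, MD5/SHA-1) and validate IPv4 octet ranges.
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
SHA256 = re.compile(r"\b[a-fA-F0-9]{64}\b")


def build_timeline(events: list[dict]) -> list[dict]:
    """Return events sorted chronologically by their ISO-8601 'ts' field."""
    return sorted(events, key=lambda e: datetime.fromisoformat(e["ts"]))


def extract_iocs(timeline: list[dict]) -> dict[str, list[str]]:
    """Collect unique indicators in order of first appearance."""
    iocs: dict[str, list[str]] = {"ipv4": [], "sha256": []}
    for event in timeline:
        for kind, pattern in (("ipv4", IPV4), ("sha256", SHA256)):
            for match in pattern.findall(event["message"]):
                if match not in iocs[kind]:
                    iocs[kind].append(match)
    return iocs


if __name__ == "__main__":
    events = [
        {"ts": "2026-01-10T09:20:00", "message": "Outbound connection to 203.0.113.7:443"},
        {"ts": "2026-01-10T09:14:00", "message": "Failed SSH logins from 198.51.100.23"},
    ]
    timeline = build_timeline(events)
    print([e["ts"] for e in timeline])
    print(extract_iocs(timeline))
```

Running extraction over the already-sorted timeline means each indicator is listed in the order it first appeared during the attack.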
Layer 3 — Learning Meta-Agent
This is the most interesting part of the project.
After each investigation:
The investigation is evaluated by an LLM judge
Quality scores are generated
Confidence drift is measured
Historical traces are queried through Arize Phoenix MCP
Blind spots are identified
Investigation playbooks are automatically updated
Future investigations immediately benefit from those improvements.
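One plausible way to express the drift measurement and playbook update is sketched below. It assumes each evaluation pairs the agent's self-reported confidence with the LLM judge's quality score (both on a 0 to 1 scale) and defines confidence drift as the average amount by which confidence exceeds judged quality. The CRITICAL and WARNING labels come from the results reported in this post, but the numeric cutoffs, the 0.70 quality floor, the `OK` label, and the `Check:` playbook format are hypothetical choices, not taken from PhantomSOC.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from statistics import mean

# Hypothetical cutoffs on the average overconfidence gap (confidence - quality).
WARNING_GAP = 0.10
CRITICAL_GAP = 0.25
# Hypothetical quality floor below which an investigation's blind spots are learned from.
QUALITY_FLOOR = 0.70


@dataclass
class Evaluation:
    investigation_id: str
    agent_confidence: float  # agent's self-reported confidence, 0..1
    judge_score: float       # LLM judge's quality score, 0..1
    blind_spots: list[str] = field(default_factory=list)  # gaps the judge flagged


def confidence_drift(evals: list[Evaluation]) -> tuple[float, str]:
    """Average amount by which confidence exceeds judged quality, plus a severity label."""
    gap = mean(e.agent_confidence - e.judge_score for e in evals)
    if gap >= CRITICAL_GAP:
        return gap, "CRITICAL"
    if gap >= WARNING_GAP:
        return gap, "WARNING"
    return gap, "OK"


def update_playbook(playbook: list[str], evals: list[Evaluation]) -> list[str]:
    """Return a new playbook with one check added per blind spot from low-scoring runs."""
    updated = list(playbook)
    for e in evals:
        if e.judge_score < QUALITY_FLOOR:
            for spot in e.blind_spots:
                step = f"Check: {spot}"
                if step not in updated:
                    updated.append(step)
    return updated
```

Because drift compares the agent's own confidence against an external judge, it can flag overconfidence even when the raw quality scores look acceptable.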
Why Arize Phoenix?
Instead of using Phoenix purely for observability, I used it as the foundation of the learning loop.
Every Gemini reasoning step is captured through OpenInference instrumentation.
The Learning Meta-Agent queries Phoenix MCP to answer questions like:
Which investigations scored below 70%?
Which blind spots appear repeatedly?
Where is the system consistently overconfident?
That information becomes the raw material for operational improvement, such as the automatic playbook updates.
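The three questions above amount to simple aggregations over evaluated traces. The sketch below runs them over plain Python dicts once trace data is in hand; the record fields (`id`, `tactic`, `confidence`, `quality`, `blind_spots`), the per-tactic grouping, and the thresholds are assumptions. It makes no claim about the Phoenix MCP tool interface and only illustrates the aggregation logic.

```python
from __future__ import annotations

from collections import Counter, defaultdict
from statistics import mean

# Hypothetical record shape: one flattened, already-evaluated investigation trace
# with fields id, tactic (MITRE ATT&CK tactic), confidence, quality, blind_spots.


def low_scoring(records: list[dict], threshold: float = 0.70) -> list[str]:
    """Which investigations scored below the threshold?"""
    return [r["id"] for r in records if r["quality"] < threshold]


def recurring_blind_spots(records: list[dict], min_count: int = 2) -> list[tuple[str, int]]:
    """Which blind spots appear in at least `min_count` investigations?"""
    # set() so a blind spot counts at most once per investigation
    counts = Counter(spot for r in records for spot in set(r["blind_spots"]))
    return [(spot, n) for spot, n in counts.most_common() if n >= min_count]


def overconfident_tactics(
    records: list[dict], min_gap: float = 0.15, min_cases: int = 2
) -> list[str]:
    """Where does confidence exceed quality on average, across enough cases?"""
    gaps: dict[str, list[float]] = defaultdict(list)
    for r in records:
        gaps[r["tactic"]].append(r["confidence"] - r["quality"])
    return sorted(
        tactic for tactic, g in gaps.items()
        if len(g) >= min_cases and mean(g) >= min_gap
    )
```

Requiring a minimum number of cases per tactic is one way to separate "consistently overconfident" from a single bad investigation.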
Results
After enabling the self-improvement loop:
DFIR quality score improved from 58% to 77%
SOC quality score improved from 50% to 75%
Confidence drift decreased from CRITICAL to WARNING
MITRE ATT&CK coverage doubled from 3 tactics to 6
Investigation memory successfully recalled related incidents
Executive reports and runbooks were generated automatically
Technology Stack
Google ADK
Gemini
Google Cloud Run
Google Cloud Storage
Arize Phoenix
Phoenix MCP
OpenInference
Python
SQLite
Pydantic
What I Learned
The most valuable lesson from this project was that observability becomes far more powerful when it is connected directly to learning.
Tracing alone tells you what happened.
Evaluation tells you whether it was good.
Learning systems use that information to continuously improve.
That's the direction I believe autonomous agents need to move toward.
Project Links
GitHub:
https://github.com/ssurekumar01111-hue/phantomsoc
Live Demo:
https://phantomsoc-745097138732.us-central1.run.app/dashboard
Demo Video:
https://youtu.be/mAJ5f7dyKsk
I'd love to hear feedback from the community and discuss ideas for making autonomous security operations more reliable, observable, and self-improving.
Built for the Google Cloud Rapid Agent Hackathon 2026, Arize track.