I Tested AWS DevOps Agent—Here’s What Happened When I Broke My EC2 Instance


How well does AI-powered incident response actually work? I decided to find out by deliberately stress-testing an EC2 instance and letting AWS DevOps Agent investigate — with zero hints from me.


What is AWS DevOps Agent?

AWS DevOps Agent (currently in public preview) is Amazon's autonomous AI agent for incident response. It connects to CloudWatch, logs, code repos, and third-party observability tools, then correlates data across all of them to find root causes — like an always-on, AI-powered on-call engineer.

During the preview, it's free with limits: 20 investigation hours per month, up to 10 Agent Spaces, and availability in us-east-1 only.


Setting Up

Setup involves creating an Agent Space — a logical boundary that defines what the agent can access.

I configured the basics: connected my AWS account, enabled the web app, and left optional integrations (Datadog, Slack, GitHub, MCP Servers) empty to test out-of-box capabilities.

The agent also supports Skills (custom runbooks) and Prevention (weekly analysis of past incidents for proactive recommendations). I left Skills empty and enabled Prevention.


Creating Chaos

I launched a t2.medium EC2 instance ("demo") and connected via SSM Session Manager. Then I ran the stress utility to spike CPU:

# Maximum chaos first
stress --cpu 2 --timeout 300

# Then single CPU worker
stress --cpu 1 --timeout 300

A t2.medium has 2 vCPUs with a 20% baseline performance per vCPU. Running stress would push the instance well above baseline, burn through CPU credits, and produce a clear spike in CloudWatch.
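You can verify the spike yourself before (or alongside) the agent with a standard CloudWatch CLI query — a sketch, with a placeholder instance ID and time window:

```shell
# Pull CPUUtilization for the stressed instance in 5-minute buckets.
# Instance ID and timestamps are placeholders, not the real ones from this test.
aws cloudwatch get-metric-statistics \
  --namespace AWS/EC2 \
  --metric-name CPUUtilization \
  --dimensions Name=InstanceId,Value=i-0123456789abcdef0 \
  --start-time 2025-01-01T13:45:00Z \
  --end-time 2025-01-01T14:15:00Z \
  --period 300 \
  --statistics Average Maximum \
  --region us-east-1
```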


The Investigation

I ran two investigations. The first was broad: "Investigate high CPU utilization across my compute resources." The second was targeted: "High CPU Usage on one of the EC2 Instances in us-east-1" with the exact incident timestamp.

The agent immediately identified the instance, noted it launched just ~15 minutes before the incident, and began gathering metrics in parallel.


What the Agent Found

The agent generated detailed CPU charts showing the escalation: average CPU rose from 2.8% → 28.3% → 58.5% across three 5-minute windows. Maximum hit 100% at 14:05, confirming full vCPU saturation.

It also tracked CPU credit burn rate — peaking at 5.83 credits per 5-minute window, roughly 2.9x the earn rate.
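That ratio checks out: a t2.medium earns 24 CPU credits per hour, i.e. 2 credits per 5-minute window. A quick sanity check of the agent's 2.9x figure:

```shell
# t2.medium earns 24 credits/hour -> 24 / 12 = 2 credits per 5-minute window
earn=2
burn=5.83   # peak burn rate reported by the agent
awk "BEGIN { printf \"%.1fx\n\", $burn / $earn }"   # prints 2.9x
```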

But the agent didn't stop at CPU. It simultaneously analyzed:

  • Network I/O — heavy 75MB ingress during bootstrap, dropping before CPU spike → not network-driven
  • EBS Disk I/O — boot activity settling before CPU spike → disk I/O was a result, not a cause
  • Status checks — all passing, no hardware failures
  • CWAgent metrics — none found, flagging a visibility gap
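That visibility gap is fixable. As a sketch (standard CloudWatch agent install paths; the "stress" pattern is specific to this demo), the agent's procstat plugin can publish per-process CPU metrics:

```shell
# Minimal CloudWatch agent config collecting per-process CPU and memory
# via the procstat plugin, matching the "stress" process from this test.
sudo tee /opt/aws/amazon-cloudwatch-agent/etc/config.json > /dev/null <<'EOF'
{
  "metrics": {
    "metrics_collected": {
      "procstat": [
        { "pattern": "stress", "measurement": ["cpu_usage", "memory_rss"] }
      ]
    }
  }
}
EOF

# Load the config and (re)start the agent
sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl \
  -a fetch-config -m ec2 \
  -c file:/opt/aws/amazon-cloudwatch-agent/etc/config.json -s
```

With this in place, the agent's "CWAgent metrics — none found" finding would instead show which process was eating the CPU.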


The Mitigation Plan — This Is Where It Got Interesting

The Mitigation plan tab said: "No mitigation action can be identified."

That sounds like a failure. It's actually the smartest part. The agent explained it found a strong correlation between SSM sessions and the CPU spike, but couldn't safely recommend action because:

  1. SSM Session Manager logging wasn't configured — actual shell commands were invisible
  2. CloudWatch Agent wasn't publishing per-process metrics yet
  3. IAM permissions prevented running diagnostic commands via SSM RunCommand

Instead of blindly suggesting "kill the process" or "stop the instance," the agent gave concrete next steps: run top or ps aux --sort=-%cpu via the console, configure SSM logging, and coordinate with the root user (it even identified the IP address). That's real engineering judgment.
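Once the IAM gap is closed, the same diagnostic can be run remotely via SSM RunCommand — a sketch with a placeholder instance ID:

```shell
# Run the agent's suggested diagnostic remotely via SSM RunCommand.
# Instance ID is a placeholder; requires ssm:SendCommand on the instance.
aws ssm send-command \
  --instance-ids "i-0123456789abcdef0" \
  --document-name "AWS-RunShellScript" \
  --parameters 'commands=["ps aux --sort=-%cpu | head -15"]' \
  --region us-east-1

# Fetch the output afterwards with:
#   aws ssm get-command-invocation --command-id <id> --instance-id i-0123456789abcdef0
```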


The Verdict

What impressed me: The autonomous correlation across CPU, network, disk, credits, and status checks — building a coherent timeline without any manual intervention. The agent thinks like an engineer, distinguishing cause from effect and knowing when it doesn't have enough evidence to act.

Bottom line: AWS DevOps Agent is a genuine shift from observability dashboards to operational reasoning. It won't replace engineers, but it dramatically compresses the time between "something is wrong" and "here's exactly what happened." Worth trying during the free preview.

