When Your Database Is On Fire at 2 AM, Ellie Shows Up

Question

When Your Database Is On Fire at 2 AM, Ellie Shows Up

calendar_todayJun 15 • schedule4 min read

Most database problems don't announce themselves in advance. They show up at 2 AM, when systems are down and the clock is running. And the DBA who knew this environment best left the company six months ago.

That's the reality Dave Page, CTO of pgEdge, has been watching play out for nearly 30 years in the Postgres community. It's also the problem he set out to fix with the pgEdge AI DBA Workbench, which moved to general availability last week.

The Staffing Problem Is the Database Problem

Ask most database teams where their biggest pain point is and you might expect the answer to be query performance, schema design, or index bloat. Page has a different take.

"Honestly, in my experience, it's finding staff that are experienced enough," he said.

That's the context behind the Workbench. Teams are being asked to manage more Postgres infrastructure with fewer qualified people. And the tools they've had available were built for a simpler era — one where you set manual thresholds and hoped you caught problems before users did.

The Workbench gives teams Ellie, an AI agent built directly into the monitoring system. She doesn't replace your DBA. But she works 24/7, and when something goes wrong, she can investigate 10 to 100 times faster than a human working through the problem manually.

"When your system's down and you're losing a million bucks an hour, Ellie helps you reduce that time," Page said.

What Ellie Actually Sees

The Workbench collects continuous snapshots of Postgres system views and stores them historically. If you have extensions like pg_stat_statements installed, Ellie has access to that data too — aggregate query stats across databases and users. Add the System Stats extension and she can also see operating system-level data: memory usage, CPU, disk mounts.

On top of that, she has access to a live RAG database of current product documentation, so she isn't limited by whatever the underlying model's training cutoff happens to be.

Every session is isolated to the individual user, so Ellie is working within your user context and permissions. Whatever Postgres exposes through system views for your session, she can see and query.

The result is an agent with genuine situational awareness — not a chatbot taking guesses based on a prompt, but something that can actually see the current state of your systems and correlate it with historical data.

Three-Tier Alerting, Anomaly Detection Included

One of the things Page said he always wanted to build but never had the technology for was anomaly-based alerting. With the Workbench, you don't need to manually configure thresholds for every metric. The system learns baselines and tells you when something is off.

"You don't need to do anything, just let it run," Page said.

The alerting runs across three tiers, with the final tier being an LLM that assesses cluster state and flags what needs attention. Every chart and graph in the system has a button that invokes an AI session to explain what you're looking at and suggest next steps.

Early on, the two most common issues the system has been surfacing are poorly indexed tables — showing up as queries running far longer than expected — and unexpectedly low cache hit ratios. Page noted the cache hit issue caught him off guard in frequency.

"Even on machines where you think, 'I've got plenty of memory on this box, it's going to be fine' — you look at the cache hit ratio and it's 8%," he said.

When Ellie identifies a fixable problem, she generates the SQL. You can execute it directly from the UI or copy it and run it yourself through your own change management process.

Built for Governance, Not Just Speed

One of the questions that comes up immediately with any AI agent touching production databases is: what are the guardrails?

The Workbench has an extensive RBAC system governing access not just to the system but to specific MCP tools and resources. LLM interactions are fully logged and traceable, which Page noted was built early for their own debugging purposes before becoming a governance feature.

And the system is explicit about human approval for any destructive action.

"We don't want Claude running amok," Page said.

The AI model layer is also flexible. The Workbench supports Anthropic, OpenAI, Voyage, Ollama, or any OpenAI API-compatible service. During the beta, the most common feedback was around proxy support — customers who route requests through a bearer-token proxy rather than connecting directly to an AI provider. That's now supported.

Free to Use, Open Source at the Core

The Workbench is available as a free download from GitHub and works with any Postgres environment running version 14 or above. pgEdge describes itself as a fiercely open-source company. The commercial play is straightforward: when larger teams are running the Workbench in production across hundreds of servers, they want a support contract and someone to call when something goes wrong.

Page has been building graphical management tools for Postgres since the early days of the community. The Workbench, he said, is the monitoring system he always wanted to build — finally made possible by technology that didn't exist until recently.

"It's all the lessons learned, things I've wanted to do for the past 20 years," he said.

Ben Fried, former CIO of Google and now a venture partner at Rally Ventures, put it more directly after seeing it: "I wish I'd had something like this years ago."

The Workbench is available at github.com/pgEdge/ai-dba-workbench.

2 Comments

🔥 Join developers growing publicly

Share your knowledge, build in public, and grow your developer presence with a global community.

Join CoderLegion

chevron_left

Tom Smithverified

15.6k Points • 657 Badges

Raleigh, NC • insightsfromanalytics.com

194Posts

119Comments

81Connections

LLM Training & Evaluation Specialist with hands-on experience building major AI models. As one of th... Show more

Commenters (This Week)

Contribute meaningful comments to climb the leaderboard and earn badges!

buildbasekit · Answer 1 · 2026-06-16T10:07:32+0000

This is the type of AI tooling that actually makes sense to me.

Not because it replaces DBAs, but because it reduces the time spent collecting context during incidents. The biggest challenge is rarely writing the SQL fix. It's figuring out what changed, where the bottleneck is, and how all the signals connect.

If Ellie can reliably shorten that investigation phase while keeping humans in control of production changes, that's a huge win for database teams.

	The Sovereign Vault — A Comprehensive Guide to Protocol-Driven AI Ken W. Algerverified - Jun 4
	Your Tech Stack Isn’t Your Ceiling. Your Story Is Karol Modelskiverified - Apr 9
	Europe Just Dropped the Hammer on AI: A Wake-Up Call? PrabashanaDev - Jul 15
	Delivering Database Changes Steve Fentonverified - Jul 22
	My Nginx Died at 2 AM and Nobody Noticed for 6 Hours. Now I Have a Watchdog Script BashSnippets - May 21

When Your Database Is On Fire at 2 AM, Ellie Shows Up

The Staffing Problem Is the Database Problem

What Ellie Actually Sees

Three-Tier Alerting, Anomaly Detection Included

Built for Governance, Not Just Speed

Free to Use, Open Source at the Core

2 Comments

Please log in to add a comment.

Please log in to comment on this post.

More Posts

The Sovereign Vault — A Comprehensive Guide to Protocol-Driven AI

Your Tech Stack Isn’t Your Ceiling. Your Story Is

Europe Just Dropped the Hammer on AI: A Wake-Up Call?

Delivering Database Changes

My Nginx Died at 2 AM and Nobody Noticed for 6 Hours. Now I Have a Watchdog Script

More From Tom Smithverified

Atsign's Kill Switch for AI Agents Isn't Software. It's an Identity You Can Revoke.

Snyk: Enterprises Can See a Third of Their Own AI Attack Surface. The Other Two-Thirds Is Where th

Cyera: Non-Human Identities Grew 480% in Six Months. Most Companies Have No Idea What They're Doing.

Related Jobs

Commenters (This Week)

Welcome to Coder Legion

Connect with 4,778 amazing developers

Don't have an account? Sign up

OR

When Your Database Is On Fire at 2 AM, Ellie Shows Up

The Staffing Problem Is the Database Problem

What Ellie Actually Sees

Three-Tier Alerting, Anomaly Detection Included

Built for Governance, Not Just Speed

Free to Use, Open Source at the Core

2 Comments

Please log in to add a comment.

Please log in to comment on this post.

More Posts

More From Tom Smithverified

Related Jobs

Commenters (This Week)