Your Test and Dev Data Is a Bigger Risk Than You Think

Jun 4 • 3 min read

Most developers know they shouldn't use production data in non-production environments. But knowing and doing are two different things — and the gap between them is getting more expensive.

According to Nick Mathison, Senior Product Manager for Delphix at Perforce Software, around 80% of organizations are still sharing production data downstream into non-production environments. That's not a new problem. What's new is that AI makes it significantly more dangerous.

AI Finds What You Missed

Mathison pointed to a recent example that frames the risk well. When the Mythos vulnerabilities were disclosed, researchers found that multiple low-severity issues could be chained together into a critical exploit. The same logic applies to data exposure.

"Before, you might have been okay with some risk," Mathison said. "You knew data was downstream. But now AI can find potentially divergent data points that — when put together — allow you to cross-reference and start to find these giant chains."

In other words, data that seemed harmless in isolation isn't harmless anymore. An AI agent combing through a non-production environment can connect dots that no human analyst would.
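To make the chaining idea concrete, here is a minimal sketch in Python. The two tables, their field names, and every record are invented for illustration; none of it comes from Mathison or from any real dataset. Each table looks low-risk on its own, but joining them on shared quasi-identifiers (here, ZIP code and birth date) attaches a name to a support ticket.

```python
# Hypothetical example: two "harmless" non-production tables.
# All names, fields, and records are invented for illustration.

support_tickets = [
    {"ticket_id": 101, "zip": "27601", "birth_date": "1984-03-12", "issue": "disputed medical bill"},
    {"ticket_id": 102, "zip": "27513", "birth_date": "1990-11-02", "issue": "password reset"},
]

newsletter_signups = [
    {"name": "A. Example", "zip": "27601", "birth_date": "1984-03-12"},
    {"name": "B. Sample", "zip": "27513", "birth_date": "1975-06-30"},
]


def link_records(left, right, keys):
    """Join two lists of dicts on shared quasi-identifier fields."""
    # Index the right-hand table by the tuple of key values.
    index = {}
    for row in right:
        index.setdefault(tuple(row[k] for k in keys), []).append(row)

    # Merge every left-hand row with each right-hand row sharing its keys.
    linked = []
    for row in left:
        for match in index.get(tuple(row[k] for k in keys), []):
            linked.append({**match, **row})
    return linked


linked = link_records(support_tickets, newsletter_signups, keys=("zip", "birth_date"))
for record in linked:
    print(record["name"], "->", record["issue"])
```

Neither table contains a name next to a sensitive issue, yet the join produces exactly that pairing for the one record where both quasi-identifiers match. An automated agent can try many such key combinations across many tables far faster than a person browsing the database.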

The People and Process Problem

When asked what most engineering teams get wrong about test data, Mathison didn't point to tooling. "Time and time again, it's a people and process problem," he said. "There are many tools out there that can do the job. At the end of the day, it's organizing your team in a way that shows the value of working this way."

Developers aren't oblivious to the problem either. Mathison said he asked this question at a user group recently and got a lot of knowing chuckles. "They know they're hiding their heads under the covers," he said. "They know it's there. It's just hard to track and monitor."

That's a pattern worth paying attention to. It's not ignorance — it's friction. Teams know the right approach, but the path to getting there feels expensive and disruptive.

Masking First, Synthetic Where It Fits

The most reliable fix, according to Mathison, is straightforward: mask production data before it ever leaves the production environment and pass only desensitized data downstream.

"Profile it, mask it in production, and then only share desensitized data into non-prod environments," he said. "Once it's masked, those teams are free to use it however they need — build the next feature, solve the next bug, iterate quickly."
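As a rough illustration of "mask it in production, share only desensitized data," the sketch below replaces sensitive columns with keyed, deterministic tokens before rows are exported. It is not how Delphix or any particular product works. The column names, the `MASKING_POLICY` mapping, the key handling, and the sample rows are all assumptions made up for this example. Deterministic tokens are used so the same email masks to the same value in every table, which keeps joins working downstream.

```python
import hashlib
import hmac

# Assumption: the key lives only in production and is never shipped downstream.
MASKING_KEY = b"replace-with-a-secret-kept-in-production"


def pseudonym(value: str, length: int = 12) -> str:
    """Keyed hash: the same input always yields the same token."""
    digest = hmac.new(MASKING_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:length]


def mask_email(value: str) -> str:
    # example.com is reserved for documentation use, so it cannot be registered.
    return f"user_{pseudonym(value)}@example.com"


def mask_name(value: str) -> str:
    return f"name_{pseudonym(value)}"


# Hypothetical policy: sensitive column name -> masking function.
MASKING_POLICY = {"email": mask_email, "full_name": mask_name}


def mask_row(row: dict) -> dict:
    """Return a copy of the row with every policy column replaced."""
    return {
        col: MASKING_POLICY[col](val) if col in MASKING_POLICY else val
        for col, val in row.items()
    }


# Invented sample rows standing in for two production tables.
customers = [
    {"id": 1, "full_name": "Jane Doe", "email": "jane@corp.test", "plan": "pro"},
    {"id": 2, "full_name": "John Roe", "email": "john@corp.test", "plan": "free"},
]
orders = [{"order_id": 9, "email": "jane@corp.test", "total": 42.0}]

masked_customers = [mask_row(r) for r in customers]
masked_orders = [mask_row(r) for r in orders]

# The masked email still matches across tables, so joins keep working.
print(masked_customers[0]["email"] == masked_orders[0]["email"])
```

Keyed pseudonymization like this is only one possible masking technique, and it is not the same as anonymization: anyone holding the key can recompute tokens for known inputs, so the key has to stay on the production side.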

Synthetic data is increasingly popular, especially with AI-generated datasets becoming easier to produce. But Mathison is direct about where it works and where it doesn't. Synthetic data is a good fit for greenfield projects and supplementing masked datasets. It breaks down when you're trying to replicate a full production database — complex relationships, correlated values, and real-world edge cases are hard to synthesize accurately. And when it comes to LLM training specifically, there's an added wrinkle: a model trained on AI-generated data can introduce a feedback loop that may not yield reliable results.

The practical takeaway: use masking as your primary strategy and treat synthetic data as a supplement rather than a replacement.

Shift Left on Data, Not Just Code

Developers have internalized shift-left thinking for testing and security. The same logic applies to data governance — it's just less common in practice.

Mathison described the target state as a self-service model. A platform team or data administration team manages the masking policies and makes compliant datasets available. Developers and testers pull what they need, when they need it, without opening a ticket or waiting on another team. "The upsides of getting access to that data when I need it, how I need it, just trumps the gotchas of making sure it's masked," Mathison said.

On the compliance side — GDPR, HIPAA, CCPA — Mathison is realistic about what developers can be expected to know. "You can't be an expert in everything," he said. The CISO and data administration teams own the policy. Developers need clear guardrails and tooling that make following those policies the path of least resistance.

One more watch-out: cloud costs. Teams moving to cloud-based data environments can see resource consumption scale quickly if access controls aren't balanced with spend management. It's the same pattern that bit organizations during the early lift-and-shift era.

What Good Looks Like

The bottom line comes down to three things: clear boundaries between production and non-production, a reliable masking process, and a self-service data delivery workflow that doesn't slow developers down.

Toil is the enemy. If the process for getting a compliant dataset is painful, developers will work around it. If it's fast and self-serve, they won't have to.

15 Comments

Author: Tom Smith

DuchessCodes · Answer 1 · 2026-06-05T01:19:44+0000

What stood out to me is how AI changes the definition of "low risk" data exposure. Information that looked harmless in isolation a few years ago can now be correlated at scale. The best takeaway here is that data governance needs the same shift-left mindset we've already adopted for testing and application security.

Ayush_SIngh · Answer 2 · 2026-06-05T04:02:46+0000

I think the biggest takeaway is that this is not really a technology problem. Most teams already know production data should not be used in non-production environments. The real challenge is making the secure option simple and easy to follow. When the process is slow or complicated, teams often choose speed over compliance.

yogirahul · Answer 3 · 2026-06-05T08:58:23+0000

yogirahul • Jun 5

The scariest part is that the same AI is being used both for attacks and for finding leaks.

Tom Smith • Jun 5

@[yogirahul] That's the part that keeps security teams up at night. The same capabilities that make AI useful for developers — pattern recognition, connecting disparate data points, operating at scale — are exactly what makes it effective for finding exposures. The attack surface and the toolset are evolving at the same time.

Steve Fenton · Answer 4 · 2026-06-05T19:10:58+0000

I've worked in a lot of places and only one of them did great test data management. All environments got a tenant with a complete set of artificial data and it was easy to reset it. It meant there was a known starting point for a number of scenarios, which made integration testing easy and also supported really solid demos of the software to prospects.

I've seen masking a few times and it's not entirely without risk. A new field gets added to the database but not to the set of masked columns. Or, even worse, email addresses get replaced with "fake emails" whose domain could still be registered, allowing someone to register the domain and recover accounts on your test server, which would be odd.

Just transferring production data to pre-production environments with no masking has to be one of the riskiest things people can do.
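Both failure modes described in this comment can be guarded against mechanically. The sketch below is one hedged way to do it, not a feature of any specific masking tool. The column sets, function names, and sample schema are hypothetical. The check fails closed when a column has no masking decision, and fake emails use the `.invalid` top-level domain, which is reserved and cannot be registered.

```python
# Hypothetical masking policy for one table; column names are invented.
MASKED_COLUMNS = {"email", "full_name", "phone"}
REVIEWED_SAFE_COLUMNS = {"id", "plan", "created_at"}


def unclassified_columns(table_columns):
    """Columns that nobody has marked as masked or reviewed-safe."""
    return sorted(set(table_columns) - MASKED_COLUMNS - REVIEWED_SAFE_COLUMNS)


def assert_schema_classified(table_columns):
    """Fail closed: refuse to export when a column has no decision."""
    unknown = unclassified_columns(table_columns)
    if unknown:
        raise ValueError(f"unclassified columns, refusing to export: {unknown}")


def fake_email(token: str) -> str:
    # .invalid is a reserved top-level domain, so nobody can register it.
    return f"{token}@masked.invalid"


# Invented schema: date_of_birth was added without a policy decision.
current_schema = ["id", "email", "full_name", "phone", "plan", "created_at", "date_of_birth"]
print(unclassified_columns(current_schema))
```

Running a check like `assert_schema_classified` in the export job, or in CI against migrations, turns "someone forgot to update the masking list" from a silent leak into a failed build.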

horushe · Answer 5 · 2026-06-06T02:24:17+0000

Really insightful piece on the escalating risks of using production data in non-production environments, especially with AI's ability to connect seemingly disparate data points. I agree with Mathison that the core issue is often 'people and process', not tooling. The recommendation to prioritize data masking over synthetic data, and shift left on data governance like we have for code, makes a lot of sense for enabling faster, safer development cycles.

buildbasekit · Answer 6 · 2026-06-06T12:53:29+0000

buildbasekit • Jun 6

The AI risk nobody talks about:

Giving your AI agent access to a dev database copied from production.

What looks like harmless test data to a human can become a surprisingly complete customer profile when AI starts connecting the dots.

"Non-production" doesn't mean "non-sensitive".

Tom Smith • Jun 6

@[buildbasekit] "Non-production doesn't mean non-sensitive" — that's the line right there. The environment label doesn't change what the data actually is. And you're right that the AI angle makes this qualitatively different. A human browsing a dev database might miss the connections. An agent won't.

buildbasekit • Jun 8

@[Tom Smith] Exactly.

A lot of teams treat data masking as a compliance requirement.

It becomes an engineering requirement the moment you start giving AI tools access to your internal environments.

The less an AI agent can see, the less damage it can do if something goes wrong.

Matt Allford · Answer 7 · 2026-06-09T06:36:34+0000

Data copied from production to lower environments, or copies of the production database placed in open locations, have been sources of several data breaches. I've worked with customers where non-production environments aren't as secured as production, or are accessible by more staff (by design), widening the attack surface.

It's not a trivial problem to solve, but it's one I see platform teams having a core responsibility for. As a developer, having a one-click/one-command process to get a sanitized database for the application I'm working on, without needing to think about it too much because the security defaults are baked in, would be amazing!

This would apply to ephemeral environments too, where applications might need functional databases for the environment to become live, and I want valid data, but I (usually) won't mind too much about the contents of the data.
