What I Learned Building a GDPR Compliance Tool (And What I'd Do Differently)


A few months ago I had an idea that seemed simple enough. Build a tool that scans a website, checks it against the key GDPR requirements, and spits out a report. How hard could it be?

Turns out, pretty hard. Not because the individual pieces were complicated, but because every decision you make early on has a way of coming back to bite you later. This is an honest account of how ClearlyCompliant came together, the wrong turns I took, and what I'd change if I started over today.


The Idea

I kept noticing the same thing on small business websites. Cookie banners that didn't actually block cookies. Contact forms with no privacy policy link. Privacy policies copied from a template that mentioned GDPR once and said nothing useful.

The businesses weren't ignoring compliance out of malice. They just had no easy way to know what they were getting wrong. Enterprise compliance tools cost hundreds per month. Legal consultants cost more. Most small businesses either crossed their fingers or paid someone to tell them they were probably fine.

There was a gap there. A simple, affordable scan that told you specifically what was wrong and why it mattered. One-off fee, no subscription, no jargon.

So I built it.


The First Mistake: Overcomplicating the Architecture

My first instinct was to do this properly. Task queue, worker processes, the works. I spent the better part of a week getting Celery and Redis set up before I stopped and asked myself whether I actually needed any of it.

The answer was no. For the scale I was targeting, Python's built-in threading module was completely sufficient. One less service to run, one less thing to monitor, one less thing to break at 2am.
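To give a sense of how little is involved, here is a minimal sketch of running a scan in a background thread and polling for the result. It is an illustration, not the code ClearlyCompliant runs: `run_scan`, `start_scan`, `get_job`, and the in-memory `_jobs` store are all hypothetical names, and a real app would probably persist results somewhere more durable.

```python
import threading
import uuid

# Hypothetical in-memory job store (assumption: results only need to
# survive for the lifetime of the process).
_jobs = {}
_jobs_lock = threading.Lock()


def run_scan(url):
    """Placeholder for the real scan; returns a dummy result."""
    return {"url": url, "checks_run": 0}


def _worker(job_id, url):
    try:
        result = run_scan(url)
        status = "done"
    except Exception as exc:  # record the failure instead of losing it
        result = {"error": str(exc)}
        status = "failed"
    with _jobs_lock:
        _jobs[job_id] = {"status": status, "result": result}


def start_scan(url):
    """Start a scan in a background thread and return (job_id, thread)."""
    job_id = uuid.uuid4().hex
    with _jobs_lock:
        _jobs[job_id] = {"status": "running", "result": None}
    thread = threading.Thread(target=_worker, args=(job_id, url), daemon=True)
    thread.start()
    return job_id, thread


def get_job(job_id):
    """Return a copy of the job record so callers cannot mutate the store."""
    with _jobs_lock:
        return dict(_jobs[job_id])
```

The web request that kicks off a scan can return the `job_id` straight away, and a status endpoint can call `get_job` until the status is no longer `"running"`. The trade-off is that jobs are lost if the process restarts, which is the kind of specific problem that would justify a proper queue later.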

The lesson I keep relearning: start with the simplest thing that works and add complexity only when you have a specific problem that requires it. Celery is a great tool. I didn't need it.


The Second Mistake: Underestimating PDF Generation

I assumed PDF generation would be the easy part. Grab WeasyPrint, write some HTML and CSS, done.

WeasyPrint is genuinely great at producing well-designed PDFs from HTML. The problem is that it depends on native libraries (the GTK stack, in my case), and that made my Windows development environment a nightmare. Hours of debugging later, I still couldn't get it working reliably.

I switched to ReportLab. It's lower level: you build the document programmatically rather than styling HTML, and the learning curve is steeper. But it's pure Python, it installed in seconds, and it works the same everywhere. The reports look professional and I have precise control over every element.

If I started over I'd go straight to ReportLab. WeasyPrint would have been fine on Linux, but I wasn't developing on Linux, and fighting your tooling is a tax on every hour you spend building.


The Part That Actually Worked: Using AI for Policy Analysis

Most of the GDPR checks are deterministic. Does a cookie banner exist? Is HTTPS enforced? Are there security headers? You're looking for specific things and either they're there or they aren't.
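A deterministic check really is just a yes/no lookup. The sketch below shows the idea for security headers and HTTPS. The header list is my assumption for illustration (the post doesn't spell out which headers the tool checks), and `check_security_headers` and `check_https` are hypothetical names.

```python
# Assumed list of headers to look for; the tool's actual list may differ.
EXPECTED_HEADERS = (
    "Strict-Transport-Security",
    "Content-Security-Policy",
    "X-Content-Type-Options",
)


def check_security_headers(response_headers):
    """Return True/False per expected header.

    HTTP header names are case-insensitive, so compare in lowercase.
    """
    present = {name.lower() for name in response_headers}
    return {name: name.lower() in present for name in EXPECTED_HEADERS}


def check_https(final_url):
    """True if the URL the site ended up serving (after redirects) is HTTPS."""
    return final_url.lower().startswith("https://")
```

Both functions take plain values (a mapping of response headers, the final URL after redirects), so they can be tested without touching the network.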

Privacy policy analysis is different. A privacy policy is a natural language document, and the question isn't just whether one exists but whether it actually covers what it's supposed to cover. Does it mention retention periods? Does it explain the lawful basis for processing? Does it tell users how to make a complaint to the ICO, the UK's data protection regulator?

You can't answer those questions with a regex.

I used the Claude API (Haiku model) to analyse the policy content against a structured prompt listing the required GDPR elements. It evaluates each one and returns a PRESENT, PARTIAL, or MISSING status with a one-sentence explanation. The results feed directly into the PDF report alongside the deterministic checks.

This was the part of the build I was least sure about going in and it ended up being one of the cleanest pieces of the whole system. The AI handles the ambiguity well and the structured prompt keeps the output consistent enough to parse reliably.
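Parsing the response is the part worth being defensive about. Here is a rough sketch of how that could look, assuming the prompt asks for one line per element in the form `element | STATUS | explanation`. That line format, the `UNKNOWN` fallback, and the name `parse_policy_findings` are assumptions for illustration, not the exact format the tool uses, and the API call itself is left out.

```python
import re

VALID_STATUSES = {"PRESENT", "PARTIAL", "MISSING"}

# Assumed line format: "<element> | <STATUS> | <one-sentence explanation>"
LINE_RE = re.compile(
    r"^\s*(?P<element>[^|]+?)\s*\|\s*(?P<status>[A-Za-z]+)\s*\|\s*(?P<why>.+?)\s*$"
)


def parse_policy_findings(model_text, required_elements):
    """Turn the model's text response into a dict keyed by element name."""
    findings = {}
    for line in model_text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue  # skip blank lines or any preamble around the list
        status = match["status"].upper()
        if status not in VALID_STATUSES:
            continue  # ignore lines with a status we did not ask for
        findings[match["element"]] = {
            "status": status,
            "explanation": match["why"],
        }
    # Flag anything the model did not report on rather than dropping it.
    for element in required_elements:
        findings.setdefault(
            element, {"status": "UNKNOWN", "explanation": "No verdict returned."}
        )
    return findings
```

Marking unreported elements as `UNKNOWN` instead of guessing means a malformed response shows up in the report as a gap, not as a false pass.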


The Part I Got Wrong: Scope Creep Avoidance

Early on I decided the MVP would not include remediation guidance. The report would tell you what was wrong but not how to fix it. Ship first, add that later.

That was the right call for getting to market. But "later" has a way of becoming "never" when you're already moving on to the next thing. The most consistent piece of feedback from early users has been that they want to know how to fix the issues the report flags, not just what they are.

It's on the roadmap. But if I had my time again I'd have built at least a basic version of it into the initial release. The guidance doesn't need to be comprehensive. Even a short paragraph per finding explaining what action to take would have meaningfully improved the product from day one.


The Part Nobody Tells You About: Finding the Privacy Policy URL

This sounds trivial. It wasn't.

The first version asked users to paste in their privacy policy URL manually. Most didn't know it offhand, had to go find it, and some just left the form. Friction kills conversions.

So I built auto-detection. Crawl the page, look for links matching common privacy policy patterns in the href or link text, fall back to trying common paths like /privacy-policy or /privacy if nothing matches.
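Roughly, the detection logic looks like the sketch below, using only the standard library. The regex and the function name `find_privacy_policy_candidates` are assumptions for illustration, and the fallback list only contains the two paths mentioned above.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumed pattern; a real list of patterns would likely be longer.
PRIVACY_RE = re.compile(r"privacy|data[\s_-]*protection", re.IGNORECASE)
FALLBACK_PATHS = ("/privacy-policy", "/privacy")


class _LinkCollector(HTMLParser):
    """Collects (href, link text) pairs from anchor tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def find_privacy_policy_candidates(page_url, html):
    """Return candidate policy URLs: matching links first, else fallback paths."""
    collector = _LinkCollector()
    collector.feed(html)
    matches = [
        urljoin(page_url, href)
        for href, text in collector.links
        if PRIVACY_RE.search(href) or PRIVACY_RE.search(text)
    ]
    if matches:
        return list(dict.fromkeys(matches))  # de-duplicate, keep page order
    return [urljoin(page_url, path) for path in FALLBACK_PATHS]
```

The fallback URLs are only guesses, so each one still needs an HTTP request to confirm it exists. Because this parses the raw HTML, it also won't see links that are injected by JavaScript.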

It works well for maybe 85% of sites. The edge cases are genuinely weird. Sites that host their privacy policy on a subdomain. Sites that use JavaScript to render the footer where the link lives. Sites that link to a third-party policy hosted on a completely different domain.

I keep improving it. But it was a reminder that "obvious" features often have long tails of edge cases that eat time you didn't budget for.


On Pricing

I went back and forth on this. Monthly subscription felt wrong for a compliance check that you might only need once or twice a year. Freemium felt like it would attract users who'd never convert. I settled on a one-off payment of £29.99.

It's early days but the conversion rate has been reasonable and support overhead is low because there are no ongoing billing issues to deal with. One-off pricing also removes the hesitation that comes with committing to yet another recurring charge.

I think it was the right call for this product. I might add a bulk option for agencies who want to run scans across multiple client sites, but the core model stays one-off.


What I'd Do Differently

Start with ReportLab, not WeasyPrint. Don't fight your tooling.

Build with threading first, not Celery. Add infrastructure when you have a specific reason to, not because it's the "proper" way.

Include at least basic remediation guidance from day one. Users want to know what to do, not just what's wrong.

Spend more time on the privacy policy auto-detection before launch. The edge cases are frustrating enough that they show up in feedback more than any other issue.

Write the content marketing pieces earlier. The product was live for a couple of weeks before I started writing about it. Every week without content is a week without potential organic traffic.


Where It Is Now

ClearlyCompliant is live at clearlycompliant.co.uk. It runs 23 GDPR checks across cookie consent, privacy policy content, forms, security headers, and third-party scripts, analyses the policy with AI, and delivers a PDF report by email. The whole scan takes a few minutes.

If you're building something in a regulated space I'd be interested to hear how you've handled the compliance side of it. Drop a comment below.


Hi, I’m Joe, a web developer and tech entrepreneur. I don’t just write code, I build projects that solve real problems and help businesses grow.