AI Watch Tester (AWT)

Every new project, same story: write login tests, write form validation tests, write navigation tests. Copy-paste from the last project, tweak selectors, pray nothing breaks.

After 25 years in IT, I decided to automate the boring part. I built AWT (AI Watch Tester) — an open-source tool where you enter a URL, and AI writes the tests for you.

How It Works

  1. Enter a URL — that's your only input
  2. AI scans the page — analyzes DOM structure + takes screenshots
  3. Generates test scenarios — login flows, form validation, navigation checks
  4. Runs them with Playwright — real browser, real clicks, real screenshots

No selectors to write. No test scripts to maintain. AI handles the planning, Playwright handles the execution.
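To make the split between planning and execution concrete, here is a minimal sketch of the execution half. It assumes the AI step emits a list of simple step dictionaries; that schema, the `run_scenario` helper, and the example URL and selectors are all hypothetical and are not AWT's actual format. The only real API it relies on is Playwright's sync `Page` methods `goto`, `fill`, `click`, and `is_visible`.

```python
from dataclasses import dataclass
from typing import Any, Protocol


class PageLike(Protocol):
    """The subset of Playwright's sync Page API this sketch uses."""

    def goto(self, url: str) -> Any: ...
    def fill(self, selector: str, value: str) -> Any: ...
    def click(self, selector: str) -> Any: ...
    def is_visible(self, selector: str) -> bool: ...


@dataclass
class StepResult:
    action: str
    passed: bool
    detail: str = ""


# Hypothetical AI-generated scenario (illustrative URL and selectors).
EXAMPLE_STEPS = [
    {"action": "goto", "url": "https://example.com/login"},
    {"action": "fill", "selector": "#username", "value": "demo"},
    {"action": "fill", "selector": "#password", "value": "demo-password"},
    {"action": "click", "selector": "#login-button"},
    {"action": "assert_visible", "selector": "#dashboard"},
]


def run_scenario(page: PageLike, steps: list[dict]) -> list[StepResult]:
    """Replay generated steps against a page, stopping at the first failure."""
    results: list[StepResult] = []
    for step in steps:
        action = str(step.get("action"))
        try:
            if action == "goto":
                page.goto(step["url"])
                passed = True
            elif action == "fill":
                page.fill(step["selector"], step["value"])
                passed = True
            elif action == "click":
                page.click(step["selector"])
                passed = True
            elif action == "assert_visible":
                passed = bool(page.is_visible(step["selector"]))
            else:
                raise ValueError(f"unsupported action: {action!r}")
        except Exception as exc:
            # Record the error and stop; later steps depend on this one.
            results.append(StepResult(action, False, str(exc)))
            break
        results.append(StepResult(action, passed))
        if not passed:
            break
    return results
```

With Playwright installed, a real page obtained from `sync_playwright()` (for example `p.chromium.launch().new_page()`) can be passed in as `page`. No AI call happens inside the loop, which is the point of the design: the model is consulted once to produce the steps, and replaying them is plain browser automation.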

"Can't Claude/GPT Just Do This with Computer Use?"

Fair question. I get it a lot.

Computer Use is a general-purpose GUI agent — it can click buttons and type text. But for E2E testing, you'd still need:

  • Docker environment setup
  • Screenshot pipeline management
  • Result parsing and storage
  • CI/CD integration
  • Scenario tracking across runs

And each test costs $0.50–2.00 because the AI processes every screenshot.

AWT uses AI only for test generation (analyzing what to test), then runs tests with Playwright, so there is no per-screenshot AI cost. A typical scan costs $0.002–0.03. Even comparing the top of that range with the cheapest Computer Use test, that's more than 10x cheaper.
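To see how those per-run figures add up, here is a small back-of-the-envelope sketch. The per-run ranges are the ones quoted above; the volume of 500 runs is a made-up number for illustration, and it assumes one AWT scan can be set against one Computer Use test, which is only a rough equivalence.

```python
# Per-run cost ranges in USD, taken from the figures quoted in this post.
COMPUTER_USE_PER_TEST = (0.50, 2.00)
AWT_PER_SCAN = (0.002, 0.03)


def projected_cost(per_run: tuple[float, float], runs: int) -> tuple[float, float]:
    """Scale a (low, high) per-run cost range to a number of runs."""
    low, high = per_run
    return (low * runs, high * runs)


if __name__ == "__main__":
    runs = 500  # hypothetical volume, not a measured figure
    for label, per_run in (
        ("Computer Use", COMPUTER_USE_PER_TEST),
        ("AWT", AWT_PER_SCAN),
    ):
        low, high = projected_cost(per_run, runs)
        print(f"{label}: ${low:,.2f} to ${high:,.2f} for {runs} runs")
```

At that volume the Computer Use range works out to $250–1,000, against $1–15 for AWT scans. Re-running already generated tests with Playwright adds no AI cost at all, so in practice the AWT figure scales with the number of scans rather than the number of test executions.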

Think of it this way: Computer Use is the hammer. AWT is the furniture store.

What Makes It Different

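             | AWT                                      | Playwright/Cypress | testRigor/Applitools
-------------|------------------------------------------|--------------------|---------------------
Test writing | AI writes them                           | You write them     | AI assists
Cost         | Free (MIT) + BYOK                        | Free               | $800+/mo
AI provider  | Your choice (OpenAI, Anthropic, Ollama*) | N/A                | Locked in
Local mode   | Yes (Ollama, experimental)               | Yes                | No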

*Ollama adapter is included but experimental — works best with larger models (70B+). Results may vary with smaller models.

The Honest Limitations

This is v1.0 by a solo developer. Let me be upfront:

  • ✅ Works well on simple login/form pages (SauceDemo, standard auth flows)
  • ⚠️ Complex SPAs with heavy dynamic content — still improving
  • ⚠️ No cancel button for long scans yet
  • ⚠️ Free plan is limited (5 pages per scan)

Tech Stack

  • Backend: Python, FastAPI, Playwright
  • Frontend: Next.js, TypeScript
  • Database: PostgreSQL (Supabase)
  • AI: OpenAI / Anthropic / Ollama adapters
  • License: MIT

Try It

Sign up → Settings → Enter your OpenAI key → Start scanning.

Ollama adapter is also included for local execution, though it's still experimental — best results with larger models.

What I Learned Building This

  1. AI is great at generating test plans, bad at executing them. That's why I separated generation (AI) from execution (Playwright). Trying to do both with AI is expensive and fragile.

  2. Language detection matters. My first users got Korean test scenarios on English sites. Lesson: always detect the target site's language before generating.

  3. Assertion validation is critical. AI sometimes generates structurally invalid assertions. A post-processing validator that auto-corrects the schema saved me from shipping broken tests.
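Lessons 2 and 3 both come down to small guard steps around the AI call, and can be sketched with the standard library alone. This is an illustration under stated assumptions, not AWT's actual code: `detect_page_language` simply reads the `lang` attribute of the `<html>` tag (one possible detection strategy), and the assertion schema, the allowed types, and the alias table in `normalize_assertion` are all hypothetical.

```python
from html.parser import HTMLParser
from typing import Optional


class _LangParser(HTMLParser):
    """Captures the lang attribute of the first <html> tag, if any."""

    def __init__(self) -> None:
        super().__init__()
        self.lang: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag == "html" and self.lang is None:
            for name, value in attrs:
                if name == "lang" and value:
                    self.lang = value.strip().lower()


def detect_page_language(html: str, default: str = "en") -> str:
    """Return the primary language subtag of the page, e.g. 'ko' for 'ko-KR'."""
    parser = _LangParser()
    parser.feed(html)
    return (parser.lang or default).split("-")[0]


# Hypothetical assertion schema: these names are assumptions for this sketch.
ALLOWED_TYPES = {"visible", "text_contains", "url_contains"}
TYPE_ALIASES = {
    "is_visible": "visible",
    "contains_text": "text_contains",
    "url": "url_contains",
}


def normalize_assertion(raw: dict) -> Optional[dict]:
    """Coerce a model-produced assertion into the schema, or return None."""
    kind = str(raw.get("type", "")).strip().lower()
    kind = TYPE_ALIASES.get(kind, kind)
    if kind not in ALLOWED_TYPES:
        return None  # unknown type: drop it rather than guess
    fixed = {"type": kind}
    if kind in ("visible", "text_contains"):
        # Accept a common alternative key the model might emit.
        selector = raw.get("selector") or raw.get("target")
        if not selector:
            return None
        fixed["selector"] = str(selector)
    if kind in ("text_contains", "url_contains"):
        expected = raw.get("expected") or raw.get("value")
        if not expected:
            return None
        fixed["expected"] = str(expected)
    return fixed
```

The detected language code would be passed into the generation prompt so scenarios come back in the site's language, and every generated assertion would go through `normalize_assertion` before being saved. Dropping assertions that cannot be repaired is a deliberately conservative choice here; a real validator might instead ask the model to regenerate them.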

Bug reports, feedback, and PRs are all welcome. What edge cases should I try next?
