AI Watch Tester (AWT)

Feb 25 • 2 min read

Every new project, same story: write login tests, write form validation tests, write navigation tests. Copy-paste from the last project, tweak selectors, pray nothing breaks.

After 25 years in IT, I decided to automate the boring part. I built AWT (AI Watch Tester) — an open-source tool where you enter a URL, and AI writes the tests for you.

How It Works

1. Enter a URL: that's your only input
2. AI scans the page: it analyzes the DOM structure and takes screenshots
3. AI generates test scenarios: login flows, form validation, navigation checks
4. Playwright runs them: real browser, real clicks, real screenshots

No selectors to write. No test scripts to maintain. AI handles the planning, Playwright handles the execution.
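To make the split between planning and execution concrete, here is a minimal sketch of how a generated scenario could be replayed without any AI call. It is not AWT's actual code: `Step`, `Scenario`, `Browser`, `run_scenario`, and `FakeBrowser` are hypothetical names, and the fake browser stands in for Playwright so the example runs on the standard library alone. A real driver would presumably wrap Playwright's `page.goto`, `page.fill`, `page.click`, and `page.inner_text`.

```python
from dataclasses import dataclass, field
from typing import List, Protocol, Tuple


@dataclass
class Step:
    action: str        # "goto", "fill", "click", or "expect_text" (hypothetical set)
    target: str = ""   # URL, or a selector chosen by the generator
    value: str = ""    # text to type, or text expected on the page


@dataclass
class Scenario:
    name: str
    steps: List[Step] = field(default_factory=list)


class Browser(Protocol):
    """The small surface the runner needs from a browser driver."""
    def goto(self, url: str) -> None: ...
    def fill(self, selector: str, value: str) -> None: ...
    def click(self, selector: str) -> None: ...
    def text(self, selector: str) -> str: ...


def run_scenario(scenario: Scenario, browser: Browser) -> bool:
    """Replay a generated scenario step by step; returns False on a failed check."""
    for step in scenario.steps:
        if step.action == "goto":
            browser.goto(step.target)
        elif step.action == "fill":
            browser.fill(step.target, step.value)
        elif step.action == "click":
            browser.click(step.target)
        elif step.action == "expect_text":
            if step.value not in browser.text(step.target):
                return False
        else:
            raise ValueError(f"unknown action: {step.action!r}")
    return True


class FakeBrowser:
    """Stand-in driver that records calls instead of opening a browser."""
    def __init__(self, page_text: str) -> None:
        self.page_text = page_text
        self.log: List[Tuple[str, ...]] = []

    def goto(self, url: str) -> None:
        self.log.append(("goto", url))

    def fill(self, selector: str, value: str) -> None:
        self.log.append(("fill", selector, value))

    def click(self, selector: str) -> None:
        self.log.append(("click", selector))

    def text(self, selector: str) -> str:
        return self.page_text


login = Scenario("login", [
    Step("goto", "https://example.test/login"),
    Step("fill", "#username", "demo_user"),
    Step("click", "#login-button"),
    Step("expect_text", "h1", "Products"),
])
print(run_scenario(login, FakeBrowser(page_text="Products")))
```

Because the runner only sees plain data, the same scenario can be stored, diffed between runs, and replayed in CI without paying for another AI call.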

"Can't Claude/GPT Just Do This with Computer Use?"

Fair question. I get it a lot.

Computer Use is a general-purpose GUI agent — it can click buttons and type text. But for E2E testing, you'd still need:

Docker environment setup
Screenshot pipeline management
Result parsing and storage
CI/CD integration
Scenario tracking across runs

And each test costs $0.50–2.00 because the AI processes every screenshot.

AWT uses AI only for test generation (deciding what to test), then runs the tests with Playwright, so there is no per-screenshot AI cost. A typical scan costs $0.002–0.03. Going by those ranges, that's at least 10x cheaper, and often 100x or more.
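Taking the two quoted price ranges at face value, the savings can be checked with a few lines of arithmetic. This is a rough sketch only: it compares a per-test price with a per-scan price, which are not strictly the same unit, and `savings_range` is a helper name made up for this example.

```python
# Cost ranges quoted in this post, in US dollars.
COMPUTER_USE = (0.50, 2.00)  # per test; the AI processes every screenshot
AWT_SCAN = (0.002, 0.03)     # per scan; AI is used only for test generation


def savings_range(expensive, cheap):
    """Return the (smallest, largest) cost ratio between two (low, high) ranges."""
    smallest = expensive[0] / cheap[1]  # cheapest expensive run vs. priciest cheap run
    largest = expensive[1] / cheap[0]   # priciest expensive run vs. cheapest cheap run
    return smallest, largest


low, high = savings_range(COMPUTER_USE, AWT_SCAN)
print(f"{low:.0f}x to {high:.0f}x cheaper")  # prints: 17x to 1000x cheaper
```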

Think of it this way: Computer Use is the hammer. AWT is the furniture store.

What Makes It Different

| | AWT | Playwright/Cypress | testRigor/Applitools |
| --- | --- | --- | --- |
| Test writing | AI writes them | You write them | AI assists |
| Cost | Free (MIT) + BYOK | Free | $800+/mo |
| AI provider | Your choice (OpenAI, Anthropic, Ollama*) | N/A | Locked in |
| Local mode | Yes (Ollama, experimental) | Yes | No |

*Ollama adapter is included but experimental — works best with larger models (70B+). Results may vary with smaller models.

The Honest Limitations

This is v1.0 by a solo developer. Let me be upfront:

✅ Works well on simple login/form pages (SauceDemo, standard auth flows)
⚠️ Complex SPAs with heavy dynamic content — still improving
⚠️ No cancel button for long scans yet
⚠️ Free plan is limited (5 pages per scan)

Tech Stack

Backend: Python, FastAPI, Playwright
Frontend: Next.js, TypeScript
Database: PostgreSQL (Supabase)
AI: OpenAI / Anthropic / Ollama adapters
License: MIT
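The "adapters" line is what makes the provider swappable. As an illustration of that pattern only (not AWT's real interface), the sketch below hides each provider behind one small protocol and picks the implementation by name. `AIAdapter`, `StubAdapter`, `ADAPTERS`, and `get_adapter` are all hypothetical, and the stub returns a canned string where a real adapter would call its provider's API.

```python
from typing import Callable, Dict, Protocol


class AIAdapter(Protocol):
    """Hypothetical interface: turn a page summary into a test plan."""
    def generate_plan(self, page_summary: str) -> str: ...


class StubAdapter:
    """Offline stand-in; a real adapter would call its provider's API here."""
    def __init__(self, provider: str) -> None:
        self.provider = provider

    def generate_plan(self, page_summary: str) -> str:
        return f"[{self.provider}] plan for: {page_summary}"


# Registry keyed by the provider name a user would pick in settings.
ADAPTERS: Dict[str, Callable[[], AIAdapter]] = {
    "openai": lambda: StubAdapter("openai"),
    "anthropic": lambda: StubAdapter("anthropic"),
    "ollama": lambda: StubAdapter("ollama"),
}


def get_adapter(provider: str) -> AIAdapter:
    """Look up an adapter by name, failing loudly on unknown providers."""
    try:
        return ADAPTERS[provider]()
    except KeyError:
        raise ValueError(f"unsupported provider: {provider!r}") from None


print(get_adapter("ollama").generate_plan("login page with two inputs"))
```

The rest of the pipeline depends only on `generate_plan`, so adding a provider means adding one registry entry.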

Try It

Cloud: https://ai-watch-tester.vercel.app
GitHub: https://github.com/ksgisang/AI-Watch-Tester

Sign up → Settings → Enter your OpenAI key → Start scanning.

Ollama adapter is also included for local execution, though it's still experimental — best results with larger models.

What I Learned Building This

AI is great at generating test plans, bad at executing them. That's why I separated generation (AI) from execution (Playwright). Trying to do both with AI is expensive and fragile.
Language detection matters. My first users got Korean test scenarios on English sites. Lesson: always detect the target site's language before generating.
Assert validation is critical. AI sometimes generates structurally invalid assertions. A post-processing validator that auto-corrects the schema saved me from shipping broken tests.
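The third lesson is the easiest to show in code. Below is a minimal sketch of a post-processing step that repairs common structural slips in a generated assertion and rejects what it cannot repair. The schema is an assumption made for illustration: the assertion types, the alias table, and the `type`/`expected` field names are hypothetical and not taken from AWT.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical schema: the only assertion types the runner understands.
ALLOWED_TYPES = {"text_visible", "url_contains", "element_exists"}

# Hypothetical near-misses a model might emit, mapped to the canonical type.
ALIASES = {
    "contains_text": "text_visible",
    "url": "url_contains",
    "exists": "element_exists",
}


def normalize_assertion(raw: Dict[str, Any]) -> Tuple[Dict[str, str], List[str]]:
    """Return (corrected assertion, list of fixes applied); raise if unfixable."""
    fixes: List[str] = []

    # Normalize case and whitespace, then map known aliases.
    kind = str(raw.get("type", "")).strip().lower()
    if kind in ALIASES:
        fixes.append(f"type {kind!r} -> {ALIASES[kind]!r}")
        kind = ALIASES[kind]
    if kind not in ALLOWED_TYPES:
        raise ValueError(f"unknown assertion type: {kind!r}")

    # Accept "value" as a misnamed "expected" field.
    expected = raw.get("expected")
    if expected is None and "value" in raw:
        expected = raw["value"]
        fixes.append("renamed 'value' to 'expected'")
    if expected is None or str(expected).strip() == "":
        raise ValueError("assertion has no expected value")

    return {"type": kind, "expected": str(expected)}, fixes


fixed, applied = normalize_assertion({"type": "Contains_Text", "value": "Welcome"})
print(fixed)
print(applied)
```

Returning the list of fixes alongside the corrected assertion keeps the auto-correction visible, so a run can log what was changed instead of silently rewriting the model's output.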

Bug reports, feedback, and PRs are all welcome. What edge cases should I try next?
