The Problem
The manual checklist for spotting AI video is dead. Counting fingers, studying shadows, looking for melted backgrounds: that advice stopped working the day frontier models started passing casual inspection. Veo and Sora output now survives a frame-by-frame eyeball test more often than most people want to admit.
The tools that replaced the checklist have a different problem. Most detectors return a single percentage. You paste a clip, you get "87 percent AI," and the UI never tells you how fragile that number is. A 2024 survey of video detection methods (arXiv 2407.10575) found that no single detection approach holds up across generators. Every new model release, every re-encode, and every compression pass moves the numbers. Detection is an arms race, and a one-number scoreboard hides which side is winning on your specific clip.
Meanwhile the cost of guessing wrong keeps climbing. The EU AI Act's Article 50 transparency obligations and labeling laws like California's SB 942 take effect across 2026. Fact-checkers need to vet footage before amplifying it, brand teams need to screen creator submissions before a logo lands on synthetic video, and creators licensing stock or UGC need to know what they are buying. All of them are currently choosing between a gut call and a black-box percentage.
What the AI Video Detector Does
Upload a clip and you get one of four verdict tiers (AI-Generated, Likely AI, Likely Real, or Real), backed by six independent signal families that each report their own status and explanation:
- Visual Artifact Analysis: per-frame generation fingerprints in texture and frequency patterns
- Temporal Coherence: frame-to-frame motion statistics that generators consistently get wrong
- AI Watermark Scan: SynthID-class invisible watermarks embedded by Google and others
- Content Provenance: C2PA content credentials, the cryptographically signed kind OpenAI attaches to Sora videos
- Facial Authenticity: a dedicated deepfake pass for face swaps and re-enactment artifacts
- File Forensics: container metadata and encoder signatures left behind by generation pipelines
The signals split into two classes, and the UI keeps them separate. Watermarks and credentials are definitive: a detected AI watermark, or a signed credential that names an AI generator, is effectively conclusive evidence of AI origin. The logic is one-way by design. Most genuine camera footage carries no credential and plenty of generators skip watermarking, so absence proves nothing. The other four signals are probabilistic. They accumulate evidence, and the verdict weighs them together.
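For the engineers, the one-way logic can be sketched in a few lines. This is an illustration and not the production aggregation: the function name, the field names, the single combined score, and the cut-off values are all assumptions made up for the example. Only the four tier names and the definitive/probabilistic split come from the description above.

```typescript
// Hypothetical sketch: names and thresholds are illustrative assumptions.
type Tier = "AI-Generated" | "Likely AI" | "Likely Real" | "Real";

interface DefinitiveSignals {
  watermarkFound: boolean;    // a SynthID-class watermark was detected
  aiCredentialFound: boolean; // a signed credential that names an AI generator
}

// Assumption: the four probabilistic signals are already combined
// into one score in [0, 1], where higher means more AI-like.
function decideTier(definitive: DefinitiveSignals, probabilisticScore: number): Tier {
  // One-way logic: a hit settles the verdict immediately.
  if (definitive.watermarkFound || definitive.aiCredentialFound) {
    return "AI-Generated";
  }
  // A miss adds no evidence either way, so the score is left untouched.
  // Placeholder cut-offs, not the product's tuned values.
  if (probabilisticScore >= 0.85) return "AI-Generated";
  if (probabilisticScore >= 0.5) return "Likely AI";
  if (probabilisticScore >= 0.15) return "Likely Real";
  return "Real";
}
```

The absence branch never subtracts from the score, which is the point: a clean watermark scan cannot pull a clip toward Real.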
A Typical Detection Run
Drop an MP4, WebM, or MOV up to 30 seconds and 150 MB. Format, duration, and size checks run in your browser before any upload, so an oversized file costs you nothing.
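A pre-upload check like this one can be a small pure function. The sketch below is a guess at the shape, not the shipped code: `ClipInfo`, `validateClip`, and the MIME type list are hypothetical, and treating 150 MB as 150 × 1024 × 1024 bytes is an assumption.

```typescript
// Hypothetical sketch of the pre-upload checks; names are illustrative.
interface ClipInfo {
  mimeType: string;
  sizeBytes: number;
  durationSeconds: number;
}

// MIME types commonly reported for MP4, WebM, and MOV files.
const ALLOWED_TYPES = ["video/mp4", "video/webm", "video/quicktime"];
const MAX_DURATION_SECONDS = 30;
const MAX_SIZE_BYTES = 150 * 1024 * 1024; // assumption: MB read as MiB

// Returns a list of human-readable problems; an empty list means the clip may upload.
function validateClip(clip: ClipInfo): string[] {
  const problems: string[] = [];
  if (ALLOWED_TYPES.indexOf(clip.mimeType) === -1) {
    problems.push("Unsupported format: use MP4, WebM, or MOV.");
  }
  if (!(clip.durationSeconds > 0)) {
    problems.push("Could not read the clip duration.");
  } else if (clip.durationSeconds > MAX_DURATION_SECONDS) {
    problems.push("Clip is longer than 30 seconds.");
  }
  if (clip.sizeBytes > MAX_SIZE_BYTES) {
    problems.push("File is larger than 150 MB.");
  }
  return problems;
}
```

In a browser, `mimeType` and `sizeBytes` would come from the selected `File` object (`file.type`, `file.size`), and `durationSeconds` from a `<video>` element's `duration` once its `loadedmetadata` event has fired.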
Your browser does the first pass locally. It extracts up to six representative frames, computes frame-to-frame motion statistics, and reads the container metadata for encoder fingerprints. Only those frames and the derived statistics upload. The full video never leaves your device.
The six dimensions come back with status chips reading CLEAN, SUSPICIOUS, FLAGGED, or UNAVAILABLE, plus the overall verdict tier. If the clip contains a visible face, the deepfake model runs as well, and a positive flag earns a separate Deepfake Suspected badge on top of the verdict.
Read the evidence panel. Mixed signals are normal on real-world clips, and each dimension explains what it found, why it matters, and what it cannot tell you.
Key Capabilities
Six Signals Beat One Score: Research on detector generalization (DeCoF, arXiv 2402.02085) found temporal coherence to be the most robust signal against generators the detector has never seen, because it targets how video moves through time rather than how any specific model paints pixels. We run it alongside five other layers so a clip has to fool all of them at once.
A SynthID Detector Built In: Google embeds SynthID watermarks in Veo output and OpenAI signs Sora videos with C2PA credentials. The detector scans for both on every run, which makes it one of the few places you can check a downloaded clip for either mark without writing code.
Four Honest Verdict Tiers: There is no accuracy percentage anywhere in the product, and we would gently suggest distrusting any detector that advertises one. Accuracy claims go stale with every generator release and every re-encode. Tiers plus visible evidence age better than a number that was measured on last year's models.
Browser-Side Privacy: Frame extraction, motion statistics, and file forensics all run client-side. The frames that do upload sit in temporary storage, get cleared after a short retention window, and never train anything.
A Deepfake Pass With Its Own Badge: Face-swap boundaries and re-enactment artifacts get a dedicated model rather than a share of one blended score. When a clip has no visible face, the dimension reports UNAVAILABLE and the UI says so plainly instead of guessing.
Refunds on Failure: A detection run costs 20 credits, and the charge confirms only when you receive a verdict. If the pipeline fails midway, credits come back automatically.
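The charge-on-verdict behavior resembles a hold, confirm, release pattern. Here is a minimal in-memory sketch of that idea. `CreditLedger` and `runWithCharge` are hypothetical names, and the real accounting lives in D1 and Durable Objects rather than a class like this.

```typescript
// Hypothetical in-memory ledger; illustrates the pattern, not the real storage.
class CreditLedger {
  private balance: number;
  private held = 0;

  constructor(initialCredits: number) {
    this.balance = initialCredits;
  }

  // Credits the user can still spend (balance minus pending holds).
  available(): number {
    return this.balance - this.held;
  }

  hold(amount: number): void {
    if (amount > this.available()) throw new Error("Insufficient credits");
    this.held += amount;
  }

  // Turn a hold into a real charge.
  confirm(amount: number): void {
    this.held -= amount;
    this.balance -= amount;
  }

  // Drop a hold without charging.
  release(amount: number): void {
    this.held -= amount;
  }
}

// Holds the cost up front, charges only if the pipeline returns a verdict.
async function runWithCharge<T>(
  ledger: CreditLedger,
  cost: number,
  pipeline: () => Promise<T>
): Promise<T> {
  ledger.hold(cost);
  try {
    const verdict = await pipeline();
    ledger.confirm(cost);
    return verdict;
  } catch (err) {
    ledger.release(cost); // pipeline failed midway: credits come back
    throw err;
  }
}
```

Holding first keeps two concurrent runs from spending the same 20 credits, while the `catch` branch makes the refund automatic rather than a support ticket.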
Stack Notes for the Engineers in the Room
The app is Next.js 15 deployed to Cloudflare Pages through OpenNext, sharing infrastructure with the rest of the Virality Predictor family: D1 with Drizzle ORM for accounting, Durable Objects for credit pool state, NextAuth v5 with Google OAuth, R2 for short-lived frame storage, and Tailwind CSS v4 with shadcn/ui.
The client does more work than you might expect. Canvas-based frame extraction picks representative frames with top-K frame-difference selection and a guaranteed floor, so even a static clip yields usable frames. Motion variance statistics and container metadata parsing also happen in the browser, which is what lets the full video stay on the device.
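Here is one way top-K selection with a floor could look once per-frame difference scores exist. Treat it as a sketch under assumptions: `selectFrames`, the `minDiff` cut-off, and the reading of "guaranteed floor" as "top up with evenly spaced frames until a minimum count is reached" are my interpretation for illustration, not a description of the shipped selector.

```typescript
// Hypothetical sketch: picks up to k frame indices with the largest
// frame-difference scores, then tops up to `floor` frames if too few qualify.
function selectFrames(
  diffs: number[],   // diffs[i] = difference score of candidate frame i vs. its predecessor
  k: number,         // maximum frames chosen by difference ranking
  floor: number,     // minimum frames to return when the clip has enough candidates
  minDiff = 1e-3     // assumption: scores below this count as "no motion"
): number[] {
  const n = diffs.length;

  // Rank candidates by difference, highest first, and keep the top k.
  const ranked = diffs
    .map((d, i) => ({ d, i }))
    .filter((c) => c.d >= minDiff)
    .sort((a, b) => b.d - a.d)
    .slice(0, k)
    .map((c) => c.i);

  const chosen = new Set<number>(ranked);
  const want = Math.min(floor, n);

  // Floor: spread extra picks evenly across the clip.
  for (let j = 0; j < want && chosen.size < want; j++) {
    chosen.add(Math.floor(((j + 0.5) * n) / want));
  }
  // Fallback if the evenly spaced picks collided with earlier choices.
  for (let i = 0; i < n && chosen.size < want; i++) {
    chosen.add(i);
  }

  return Array.from(chosen).sort((a, b) => a - b);
}
```

With a fully static clip every difference score is zero, the ranking step returns nothing, and the floor alone supplies the frames.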
Server-side, the frames fan out to specialist detection models. Each run makes up to twelve concurrent calls (an AI-image pass and a deepfake pass for each of up to six frames), gathered with Promise.allSettled and a success threshold. If more than half of the detection calls fail, the run aborts and refunds. A weighted aggregation engine then folds frame scores, temporal statistics, and forensics into per-dimension statuses and the final tier, with a safety bound that prevents gray-zone forensic evidence from pushing a verdict past Likely AI on its own.
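The fan-out and threshold step might look roughly like the sketch below. `gatherWithThreshold` is a hypothetical helper, and the detection calls are stand-ins passed in as functions. Only `Promise.allSettled` and the "more than half fail" rule come from the description above.

```typescript
// Hypothetical helper: runs every call concurrently and tolerates partial failure.
async function gatherWithThreshold<T>(
  calls: Array<() => Promise<T>>
): Promise<T[]> {
  // allSettled never rejects; it reports each call's outcome individually.
  const settled = await Promise.allSettled(calls.map((call) => call()));

  const values: T[] = [];
  for (const result of settled) {
    if (result.status === "fulfilled") values.push(result.value);
  }

  // Abort only when more than half of the calls failed.
  const failures = settled.length - values.length;
  if (failures > settled.length / 2) {
    throw new Error(
      "Detection aborted: " + failures + " of " + settled.length + " calls failed"
    );
  }
  return values;
}
```

Under this rule a run with twelve calls survives six failures and aborts on the seventh. The thrown error is what a caller would use to trigger the refund path.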
One number from our internal benchmarks, with that qualifier attached: AI-generated clips showed frame-to-frame motion variance roughly two orders of magnitude lower than handheld camera footage. That gap needs no knowledge of which generator made the clip, which is exactly why temporal analysis anchors the probabilistic side of the verdict. The aggregation thresholds are tuned on a small internal sample and will keep moving as generators do.
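For readers who want to poke at the idea, here is one plausible way to compute a frame-to-frame motion variance. It is a sketch under assumptions, not our benchmark code. It defines motion as the mean absolute pixel difference between consecutive grayscale frames and reports the population variance of those values. `meanAbsDiff` and `motionVariance` are hypothetical names.

```typescript
// Mean absolute per-pixel difference between two same-sized grayscale frames.
function meanAbsDiff(a: number[], b: number[]): number {
  if (a.length !== b.length || a.length === 0) {
    throw new Error("Frames must be non-empty and the same size");
  }
  let total = 0;
  for (let i = 0; i < a.length; i++) {
    total += Math.abs(a[i] - b[i]);
  }
  return total / a.length;
}

// Population variance of the motion between each consecutive pair of frames.
// Assumption: each frame is a flat array of grayscale values.
function motionVariance(frames: number[][]): number {
  const motion: number[] = [];
  for (let i = 1; i < frames.length; i++) {
    motion.push(meanAbsDiff(frames[i - 1], frames[i]));
  }
  if (motion.length < 2) return 0; // not enough pairs to measure spread

  const mean = motion.reduce((sum, m) => sum + m, 0) / motion.length;
  const squared = motion.reduce((sum, m) => sum + (m - mean) * (m - mean), 0);
  return squared / motion.length;
}
```

A perfectly even pan produces identical motion values and a variance of zero, while handheld jitter spreads them out. That spread is the quantity the benchmark comparison is about.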
Pricing
Sign in with Google to get starter credits and see the full flow. Each detection run costs 20 credits, charged only on a successful verdict, with automatic refunds on failure. Credit packs and subscription plans top up one shared balance that also covers the Viral Potential Predictor and the Video Maker. No credit card is required to sign in. Full details are at viralitypredictor.net/pricing.
Closing
The AI Video Detector is live at viralitypredictor.net/ai-video-detector. Drop in a clip you already know the origin of and check whether the evidence panel reaches the same conclusion you would.
I built this because every detector I tried gave me a confident percentage and no way to interrogate it. If you have shipped detection or provenance tooling, wrestled with C2PA in a real pipeline, or have opinions on which signals deserve more weight, I would genuinely like to hear them. Drop a comment below.