That "PayPal" Email Isn't From PayPal: Catching Unicode Homoglyph Spoofing in Gmail with Apps Script


A phishing email landed in my inbox from "Wix.com", or so it appeared. It looked exactly like a Wix notification. It wasn't: the "i" was actually U+0456 and the "o" was U+043E, both Cyrillic characters, and the actual sender was [address redacted], on a hijacked German pub's mail server.

(A note on notation: this article spells out Unicode code points like U+0456 instead of printing the characters themselves, because the characters are visually indistinguishable from Latin letters — which is the entire point of the attack.)

Gmail's filters let it through. SPF, DKIM, and DMARC all passed, because the attacker wasn't forging the sender address at all — they were forging the display name, which no authentication protocol checks.

So I built Unspoofer, a Google Apps Script that scans your inbox every 15 minutes and flags display-name spoofs. This post walks through how it works and the edge cases that turned out to be the hard part.

The attack: homoglyphs in the display name

An email's From header has two parts:

From: "Wix.com" <[address redacted]>

(where the display name's "i" and "o" are the Cyrillic look-alikes U+0456 and U+043E)

Email authentication (SPF/DKIM/DMARC) validates the domain of the address in angle brackets. The quoted display name is free text, so anything goes. Most mail clients, especially on mobile, show the display name prominently and hide the address.
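To make the two parts concrete, here is a minimal sketch of splitting a From header into display name, address, and domain. The function name parseFromHeader and the sample address are hypothetical (the real address is redacted above), and the sketch ignores escaped quotes and other corner cases of the header grammar.

```javascript
// Hypothetical helper: split a From header into its two parts.
// Assumes the simple shape: optional "display name" followed by <address>.
function parseFromHeader(from) {
  const raw = (from || '').trim();
  const open = raw.lastIndexOf('<');
  const close = raw.lastIndexOf('>');
  let displayName = '';
  let address = raw; // bare address with no display name
  if (open !== -1 && close > open) {
    // Everything before "<" is the display name; strip surrounding quotes
    displayName = raw.slice(0, open).trim().replace(/^"|"$/g, '');
    address = raw.slice(open + 1, close).trim();
  }
  const at = address.lastIndexOf('@');
  const domain = at === -1 ? '' : address.slice(at + 1).toLowerCase();
  return { displayName, address, domain };
}

// Illustrative input only; the local part "someone" is made up
const parsed = parseFromHeader('"Wix.com" <someone@bistro-pub.de>');
// parsed.displayName is what the reader sees; parsed.domain is what gets authenticated
```

Everything a spoof detector needs later comes from comparing parsed.displayName against parsed.domain.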

Attackers exploit this with homoglyphs: Unicode characters that look identical to Latin letters. Cyrillic U+0430 vs Latin a (U+0061). Greek omicron U+03BF vs Latin o (U+006F). Fullwidth U+FF50 vs p. To a human, "PayPal" written with Cyrillic a's is indistinguishable from the real thing. To a string comparison, they're completely different — which is exactly why naive keyword filters miss them.
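A short illustration of why keyword filters miss this, assuming nothing beyond standard JavaScript strings. The helper nonAsciiCodePoints is a hypothetical name used only here.

```javascript
// "paypal" with both a's replaced by Cyrillic U+0430
const spoofed = 'p\u0430yp\u0430l';
const genuine = 'paypal';

// Same length and same appearance on screen, yet not equal as strings
const sameLength = spoofed.length === genuine.length;
const isEqual = spoofed === genuine;

// Listing the non-ASCII code points exposes the substitution
function nonAsciiCodePoints(str) {
  return Array.from(str)
    .filter((ch) => ch.codePointAt(0) > 0x7f)
    .map((ch) => 'U+' + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0'));
}

const suspects = nonAsciiCodePoints(spoofed); // ['U+0430', 'U+0430']
```

A filter searching for the literal string "paypal" sees isEqual as false and moves on, which is the gap normalization closes.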

The detection logic

The core idea is simple:

  1. Normalize homoglyphs in the display name back to ASCII
  2. Check whether the normalized name impersonates a known brand
  3. Compare the brand's real domain against the actual sender domain
  4. Mismatch → flag it

Step 1: Homoglyph normalization

The normalization map covers ~80 characters: Cyrillic and Greek look-alikes (both cases), fullwidth Latin letters and digits, and dot look-alikes like the one-dot leader (U+2024) used to fake "wix.com" patterns:

const HOMOGLYPH_MAP = {
  '\u0430': 'a', // Cyrillic a
  '\u0441': 'c', // Cyrillic c
  '\u0435': 'e', // Cyrillic e
  '\u043E': 'o', // Cyrillic o
  '\u03BF': 'o', // Greek omicron
  '\u2024': '.', // one dot leader
  // ... ~80 entries total
};

function normalizeToAscii(str) {
  if (!str) return '';
  let result = '';
  for (let i = 0; i < str.length; i++) {
    result += HOMOGLYPH_MAP[str[i]] || str[i];
  }
  return result.toLowerCase();
}

After normalization, the spoofed display name collapses to plain ASCII "wix.com" and you can run ordinary string matching against it.
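As a worked example, the sketch below runs the spoofed name from the opening anecdote through a three-entry map and also records which characters were swapped, which can be handy when explaining an alert. SAMPLE_MAP and normalizeWithReport are hypothetical names, and the sketch assumes the full map contains an entry for U+0456.

```javascript
// Trimmed-down map for illustration; the full map has about 80 entries
const SAMPLE_MAP = {
  '\u0456': 'i', // Cyrillic i (Ukrainian/Belarusian)
  '\u043E': 'o', // Cyrillic o
  '\u2024': '.', // one dot leader
};

// Normalize and keep a record of every substitution made
function normalizeWithReport(str) {
  let normalized = '';
  const substitutions = [];
  for (const ch of str || '') {
    const replacement = SAMPLE_MAP[ch];
    if (replacement) {
      substitutions.push({ from: ch, to: replacement });
      normalized += replacement;
    } else {
      normalized += ch;
    }
  }
  return { normalized: normalized.toLowerCase(), substitutions };
}

// "Wix.com" written with Cyrillic i and o
const report = normalizeWithReport('W\u0456x.c\u043Em');
// report.normalized === 'wix.com', report.substitutions.length === 2
```

A name that needed any substitutions at all is already a signal worth noting, since legitimate senders rarely mix scripts inside one brand name.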

Step 2: Domain comparison

Once the display name is normalized, the script extracts the brand it claims to be and compares root domains:

// "Wix.com" (with Cyrillic i, o) <[address redacted]>
// normalized display name -> "wix.com" -> implied domain: wix.com
// actual sender domain    -> bistro-pub.de
// wix.com !== bistro-pub.de -> SPOOF

There are two matching paths: a curated list of ~50 commonly impersonated brands, and a generic fallback that extracts any domain-like pattern from the display name and checks it against the sender. The generic path catches impersonation of brands that aren't on any list — if your display name says you're somebrand.com but you're mailing from random-host.de, that's suspicious regardless of whether I've heard of the brand.
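The two paths could be sketched roughly as follows. The brand entries, the function names, the regular expression, and the choice to try the curated list first are all assumptions for illustration; the sketch also assumes both inputs are already normalized and reduced to root domains.

```javascript
// Hypothetical excerpt of a curated list: brand keyword -> legitimate root domain
const KNOWN_BRANDS = { paypal: 'paypal.com', wix: 'wix.com' };

// Generic path: pull the first domain-like token out of the normalized name
function extractImpliedDomain(normalizedName) {
  const match = /\b([a-z0-9-]+(?:\.[a-z0-9-]+)*\.[a-z]{2,})\b/.exec(normalizedName);
  return match ? match[1] : null;
}

// Curated path first, generic fallback second (ordering is an assumption)
function impliedDomainFor(normalizedName) {
  for (const [brand, domain] of Object.entries(KNOWN_BRANDS)) {
    if (normalizedName.includes(brand)) return domain;
  }
  return extractImpliedDomain(normalizedName);
}

// A name that implies one domain while the mail comes from another is a mismatch
function isSpoof(normalizedName, senderRootDomain) {
  const implied = impliedDomainFor(normalizedName);
  return implied !== null && implied !== senderRootDomain;
}

const verdict = isSpoof('wix.com', 'bistro-pub.de'); // true
```

Names that imply no domain at all, such as a person's name, return null from the generic path and are left alone.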

The edge cases are the real work

The happy path took an evening. Avoiding false positives took much longer. A spoof detector that cries wolf gets uninstalled within a week. Some of what came up:

Compound TLDs. Naive root-domain extraction turns leumi.co.il into co.il, which would match every Israeli company against every other. The fix: if the second-to-last segment is 2 characters or fewer, take three segments instead of two.

function extractRootDomain(domain) {
  const parts = domain.toLowerCase().split('.');
  if (parts.length <= 2) return domain.toLowerCase();
  const secondToLast = parts[parts.length - 2];
  if (secondToLast.length <= 2) return parts.slice(-3).join('.'); // leumi.co.il
  return parts.slice(-2).join('.');                               // wix.com
}
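A quick check of how this heuristic behaves on a few inputs. The function body repeats the logic above under a different name (rootDomainOf) only so the snippet runs on its own, and the sample hostnames are made up. The last sample shows a limit of the length rule: second-level labels longer than two characters, such as com.au, are not recognized as compound.

```javascript
// Same heuristic as extractRootDomain above, repeated so this snippet is self-contained
function rootDomainOf(domain) {
  const parts = domain.toLowerCase().split('.');
  if (parts.length <= 2) return parts.join('.');
  const secondToLast = parts[parts.length - 2];
  // Two-character second-level label (co, ac, ...) -> keep three segments
  return parts.slice(secondToLast.length <= 2 ? -3 : -2).join('.');
}

const samples = ['mail.wix.com', 'online.leumi.co.il', 'wix.com', 'shop.example.com.au'];
const roots = samples.map(rootDomainOf);
// roots: ['wix.com', 'leumi.co.il', 'wix.com', 'com.au']
```

If suffixes like com.au matter for your inbox, an explicit list of compound suffixes would be one way to cover them, at the cost of maintaining that list.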

Legitimate subdomains and sibling brands. mail.wix.com is real Wix. YouTube notifications come from google.com infrastructure. The detector needs a related-domains map so brand ↔ parent-company mail doesn't get flagged.
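One possible shape for such a map, treating the relationship as symmetric. The single entry reflects the YouTube example above; RELATED_DOMAINS and isRelatedDomain are hypothetical names, and the real map's contents are not shown in this post.

```javascript
// Hypothetical related-domains map: brand root domain -> domains allowed to send for it
const RELATED_DOMAINS = new Map([
  ['youtube.com', ['google.com']],
]);

// True when the two root domains are the same or listed as related, in either direction
function isRelatedDomain(brandDomain, senderRootDomain) {
  if (brandDomain === senderRootDomain) return true;
  const forBrand = RELATED_DOMAINS.get(brandDomain) || [];
  const forSender = RELATED_DOMAINS.get(senderRootDomain) || [];
  return forBrand.includes(senderRootDomain) || forSender.includes(brandDomain);
}

const youtubeViaGoogle = isRelatedDomain('youtube.com', 'google.com'); // true
```

Subdomains such as mail.wix.com need no entry here as long as both sides are reduced to root domains before the comparison.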

Your own domain in display names. Form services like Netlify Forms send notifications with your domain in the display name ("Form submission from yoursite.com") from their own infrastructure. That pattern is structurally identical to a spoof — except nobody phishes you by impersonating your own domain to you. The fix: skip the check when the implied domain equals the inbox owner's domain.
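A minimal sketch of that skip rule. The function names are hypothetical; in Apps Script the owner's address could plausibly come from Session.getEffectiveUser().getEmail(), but here it is passed in as a plain string so the logic stays testable.

```javascript
// Extract the lowercase domain from an email address ('' if there is none)
function domainOfEmail(email) {
  const value = email || '';
  const at = value.lastIndexOf('@');
  return at === -1 ? '' : value.slice(at + 1).toLowerCase();
}

// True when the display name implies the inbox owner's own domain,
// in which case the spoof check is skipped
function isOwnDomain(impliedDomain, ownerEmail) {
  const ownerDomain = domainOfEmail(ownerEmail);
  return ownerDomain !== '' && impliedDomain.toLowerCase() === ownerDomain;
}

const skip = isOwnDomain('yoursite.com', 'me@yoursite.com'); // true
```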

Short brand names. "x.com" appears inside thousands of innocent strings. Brands shorter than 4 characters require word-boundary matching.
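One way to sketch that rule: measure the label before the first dot, and fall back to boundary-aware matching when it is short. The threshold comes from the text above; the function names and the choice of what counts as a boundary are assumptions.

```javascript
// Escape regex metacharacters so a keyword like "x.com" is matched literally
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Long brand keywords use plain substring matching; short ones must stand alone
function nameMentionsBrand(normalizedName, keyword) {
  const brandLabel = keyword.split('.')[0]; // "x" for "x.com"
  if (brandLabel.length >= 4) return normalizedName.includes(keyword);
  // Require a non-alphanumeric character (or string edge) on both sides
  const pattern = new RegExp('(^|[^a-z0-9])' + escapeRegExp(keyword) + '($|[^a-z0-9])');
  return pattern.test(normalizedName);
}

const innocent = nameMentionsBrand('wix.com billing', 'x.com'); // false
const genuineHit = nameMentionsBrand('x.com security', 'x.com'); // true
```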

Abuse-friendly platforms. A surprising share of phishing comes from *.firebaseapp.com and *.appspot.com hosting. Some campaigns use Firebase with a custom domain, which hides the platform — but the DKIM selector (s=firebase1) in the raw headers still gives it away. Checking DKIM selectors against known abuse patterns catches these even when the visible domain looks unremarkable.
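A rough sketch of the selector check, working on the raw message text (in Apps Script, GmailMessage.getRawContent() returns it). The list contains only the one selector mentioned above; the function names are hypothetical, and the tag parsing is deliberately simple.

```javascript
// Selectors treated as suspicious; only firebase1 is taken from the text above
const SUSPICIOUS_SELECTORS = ['firebase1'];

// Collect the s= tag of every DKIM-Signature header in a raw message
function extractDkimSelectors(rawMessage) {
  // Headers end at the first blank line; ignore anything in the body
  const headerBlock = (rawMessage || '').split(/\r?\n\r?\n/)[0];
  // Unfold continuation lines (a line break followed by a space or tab)
  const unfolded = headerBlock.replace(/\r?\n[ \t]+/g, ' ');
  const selectors = [];
  for (const line of unfolded.split(/\r?\n/)) {
    if (!/^dkim-signature:/i.test(line)) continue;
    const tags = line.slice(line.indexOf(':') + 1);
    const match = /(?:^|;)\s*s=([^;\s]+)/i.exec(tags);
    if (match) selectors.push(match[1].toLowerCase());
  }
  return selectors;
}

function hasSuspiciousSelector(rawMessage) {
  return extractDkimSelectors(rawMessage).some((s) => SUSPICIOUS_SELECTORS.includes(s));
}

// Made-up headers for illustration
const sampleRaw =
  'DKIM-Signature: v=1; a=rsa-sha256; d=example.com;\r\n' +
  '\ts=firebase1; bh=abc;\r\n' +
  'Subject: hello\r\n' +
  '\r\n' +
  'body text\r\n';
const flaggedBySelector = hasSuspiciousSelector(sampleRaw); // true
```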

Why Apps Script

No servers, no OAuth app verification process, no forwarding your mail through a third party. The script runs inside your own Google account, reads your inbox via GmailApp, and its only outputs are a SPOOF-ALERT label and a star on flagged messages. A time-driven trigger fires every 15 minutes (~96 runs/day, safely under the 100/day quota), and the scan loop self-terminates before the 6-minute execution limit. A rolling cache of 10,000 processed message IDs prevents re-scanning.
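The scheduling pieces can be sketched as below. The 15-minute trigger uses the standard ScriptApp trigger builder; everything else (function names, the five-minute budget, keeping the cache in a Set, the injected clock) is an assumption made so the loop can be exercised outside Apps Script. Persisting the cache between runs is left out.

```javascript
const MAX_RUNTIME_MS = 5 * 60 * 1000; // stop roughly a minute before the 6-minute limit
const MAX_CACHE_SIZE = 10000;         // rolling window of processed message IDs

// Apps Script only: install the 15-minute trigger ('scanInbox' is a hypothetical handler name)
function installTrigger() {
  ScriptApp.newTrigger('scanInbox').timeBased().everyMinutes(15).create();
}

// Add an ID to the cache and evict the oldest entries beyond maxSize.
// A Set iterates in insertion order, so the first value is the oldest.
function remember(seen, id, maxSize) {
  seen.add(id);
  while (seen.size > maxSize) {
    seen.delete(seen.values().next().value);
  }
}

// Check unseen messages until the time budget runs out; returns flagged IDs
function scanWithBudget(ids, seen, checkMessage, now = Date.now, budgetMs = MAX_RUNTIME_MS) {
  const startedAt = now();
  const flagged = [];
  for (const id of ids) {
    if (now() - startedAt >= budgetMs) break; // leave the rest for the next run
    if (seen.has(id)) continue;               // already processed earlier
    if (checkMessage(id)) flagged.push(id);
    remember(seen, id, MAX_CACHE_SIZE);
  }
  return flagged;
}
```

Messages skipped when the budget runs out are simply picked up by the next trigger run, since they never entered the cache.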

Installation is copy-pasting five .gs files into script.google.com and running setup(); full instructions are in the repo. If you run Google Workspace for your company mail, it works there too, and that's where the display-name attacks tend to be most convincing: "CEO Name" <[address redacted]> asking for a wire transfer.

Takeaways

Display-name spoofing succeeds because authentication protocols verify the envelope while humans read the label. Any defense has to operate at the same layer the deception does: the rendered text. Homoglyph normalization is the key move — without it, string matching is blind to the entire attack class. And in detection systems generally, the detection logic is the easy 20%; suppressing false positives without creating blind spots is the 80%.

The code is MIT-licensed: github.com/yoelf22/unspoofer. Issues and PRs welcome — the brand list and homoglyph map both have room to grow.


Yoel Frischoff is a product strategist who has been shipping connected products for 30 years. He writes about connected-hardware strategy and IoT security at tangibles-book.com, where his book "Tangibles" and a set of free IoT security tools — a security scorecard, an ETSI-mapped requirements generator, and regulatory mapping for EU CRA / UK PSTI — are available.
