About the Scam Message Scanner

The Scam Message Scanner is the analytical engine behind every “Scan X email/message” page on this site. It is one of two parts of ScamSupport — the other being the long-form scam guides that explain the patterns in human language. Paste a message into any scanner page and a panel of 23 detection checks examines the text for the structural signals associated with UK consumer-scam campaigns: sender-domain typosquatting, brand-name versus URL hostname mismatch, urgency markers, credential requests, business-email-compromise patterns, fake regulatory claims, and the rest of the catalogue documented in “The 23 detection checks” section below. Each signal contributes to a verdict tier (from “Looks safe” through to “High-risk scam detected”) and the specific signals that fired are returned alongside the verdict so you can see why the scanner reached its conclusion.

Everything below is the published methodology. People use the scanner to decide whether to act on a message that might lead to financial loss or identity exposure, so inspectable reasoning matters more here than it does for most consumer products. Where we know the scanner is weaker (sophisticated targeted attacks, languages other than English, situations where past correspondence with the sender matters) we say so in the Limits section below. When the open-source community flags a class of message we get wrong, the “Report a false negative” route on the model training page feeds it directly into the next training cycle.

What the scanner does

Paste a suspicious email or SMS into any "Scan X email/message" page on this site. The scanner analyses the text against 23 detection patterns and returns one of five verdict tiers:

Looks safe — no scam patterns detected. Always verify the sender independently.
Low risk — weak signals only.
Possible scam — review carefully. At least one moderate signal fired.
Probable scam — multiple findings. Do not act on this message.
High-risk scam detected — strong signals. Do not click, do not reply, do not pay.

Each verdict comes with the specific findings that triggered it — so you can see why the scanner flagged the message and verify the signals against your own judgment.
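A minimal sketch of how findings could map to the five tiers. It assumes each finding is tagged weak, moderate or strong, and it reads the tier descriptions above literally:

- any strong finding gives a high-risk verdict;
- two or more moderate findings give "Probable scam";
- one moderate finding gives "Possible scam";
- weak findings alone give "Low risk".

The `verdictFor` function, the strength labels and these thresholds are hypothetical. The production rules in worker.js may weigh signals differently.

```javascript
// Hypothetical sketch: strength labels and thresholds are assumptions,
// not the rules in the production worker.js.
const TIERS = Object.freeze({
  SAFE: "Looks safe",
  LOW: "Low risk",
  POSSIBLE: "Possible scam",
  PROBABLE: "Probable scam",
  HIGH: "High-risk scam detected",
});

// findings: array of { check, strength, detail }, strength is "weak" | "moderate" | "strong"
function verdictFor(findings) {
  const count = (strength) => findings.filter((f) => f.strength === strength).length;
  const strong = count("strong");
  const moderate = count("moderate");
  const weak = count("weak");

  let verdict;
  if (strong > 0) verdict = TIERS.HIGH;
  else if (moderate >= 2) verdict = TIERS.PROBABLE;
  else if (moderate === 1) verdict = TIERS.POSSIBLE;
  else if (weak > 0) verdict = TIERS.LOW;
  else verdict = TIERS.SAFE;

  // Return the findings with the verdict so the user can see why it was reached.
  return { verdict, findings };
}

// Usage: two moderate findings give "Probable scam".
const result = verdictFor([
  { check: "urgency", strength: "moderate", detail: "within 24 hours" },
  { check: "credential-request", strength: "moderate", detail: "asks for PIN" },
]);
console.log(result.verdict); // prints: Probable scam
```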

The 23 detection checks

The scanner aggregates signals across:

Sender-domain analysis — Levenshtein typosquat distance against ~95 known legitimate UK brand domains
Brand-impersonation detection — brand name in text vs URL hostname mismatch
Suspicious-link analysis — typosquat domains, suspicious TLDs (.online, .live, .xyz, etc.), URL shorteners, excessive subdomains, IP-address URLs, punycode, brand-name in non-official hostnames
Urgency + threat markers — "within 24 hours", "final notice", "legal action", "account suspended"
Sensitive-information request — context-aware password/PIN/sort-code/account-number/seed-phrase/recovery-phrase detection
Business Email Compromise (BEC) — executive-impersonation + financial-mechanism + urgency three-way AND-gate
Investment-pitch detection — institutional sender + investment-vocab + low-friction CTA
Callback-pattern detection — brand + phone + callback ask + urgency/threat, with defensive-context exemption for legit fraud-alert messages
Job-offer / mentor-pitch detection — earnings claim + upfront fee + informal tone
Brand-suspended / account-suspended, prize-fraud, advance-fee fraud, heritage / inheritance, family-impersonation ("Hi Mum"), attachment-language, fake regulatory approval (FCA / FOS / etc. claims)
ML model — logistic-regression aggregator over the heuristic features for borderline-shape detection
Community DB + NCSC email-auth + threat-intelligence — supplementary async checks (best-effort)
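The sender-domain check compares a domain against known brand domains by Levenshtein (edit) distance. This is a minimal sketch under a few assumptions:

- the allow-list is a small illustrative sample (the real list of roughly 95 UK brand domains lives in worker.js);
- the threshold is an assumed 2 edits;
- `levenshtein` and `typosquatTarget` are hypothetical names, not the production functions.

```javascript
// Levenshtein distance: the minimum number of single-character insertions,
// deletions and substitutions that turn `a` into `b`. Uses one rolling row.
function levenshtein(a, b) {
  const row = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    let diag = row[0]; // distance for (i - 1, j - 1)
    row[0] = i;
    for (let j = 1; j <= b.length; j++) {
      const above = row[j]; // distance for (i - 1, j)
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      row[j] = Math.min(above + 1, row[j - 1] + 1, diag + cost);
      diag = above;
    }
  }
  return row[b.length];
}

// Illustrative sample only; not the production allow-list.
const KNOWN_DOMAINS = ["royalmail.com", "hmrc.gov.uk", "barclays.co.uk", "natwest.com"];

// Returns the brand domain a sender domain imitates, or null.
// maxDistance = 2 is an assumed threshold.
function typosquatTarget(senderDomain, maxDistance = 2) {
  const domain = senderDomain.toLowerCase();
  if (KNOWN_DOMAINS.includes(domain)) return null; // exact match is the real domain
  for (const known of KNOWN_DOMAINS) {
    const distance = levenshtein(domain, known);
    if (distance <= maxDistance) return { lookalikeOf: known, distance };
  }
  return null;
}

console.log(typosquatTarget("royalmall.com")); // { lookalikeOf: 'royalmail.com', distance: 1 }
```

A fixed edit threshold is more likely to over-flag short domains than long ones, so a real implementation might scale it by domain length.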
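Several of the suspicious-link signals can be read straight off a parsed URL. This minimal sketch uses the WHATWG `URL` class, which is built into browsers and Node. Its assumptions:

- the TLD and shortener lists are illustrative;
- the subdomain threshold is assumed;
- `linkFindings` and the finding names are hypothetical.

Because `URL` converts Unicode hostnames to their `xn--` punycode form, the punycode test also catches internationalised lookalike domains.

```javascript
// Illustrative lists and threshold; the production values in worker.js may differ.
const SUSPICIOUS_TLDS = [".online", ".live", ".xyz"];
const SHORTENERS = ["bit.ly", "tinyurl.com"];
const MAX_HOST_LABELS = 4; // e.g. "www.barclays.co.uk" has 4 labels

function linkFindings(rawUrl, brand = null, officialDomain = null) {
  let url;
  try {
    url = new URL(rawUrl);
  } catch {
    return ["unparseable-url"];
  }
  const host = url.hostname.toLowerCase();
  const labels = host.split(".");
  const findings = [];

  if (SUSPICIOUS_TLDS.some((tld) => host.endsWith(tld))) findings.push("suspicious-tld");
  if (SHORTENERS.includes(host)) findings.push("url-shortener");
  if (/^\d{1,3}(\.\d{1,3}){3}$/.test(host)) findings.push("ip-address-url");
  if (labels.some((label) => label.startsWith("xn--"))) findings.push("punycode");
  if (labels.length > MAX_HOST_LABELS) findings.push("excessive-subdomains");

  // Brand name appears in the hostname, but the host is not the brand's own domain.
  if (brand && officialDomain) {
    const official = host === officialDomain || host.endsWith("." + officialDomain);
    if (host.includes(brand.toLowerCase()) && !official) findings.push("brand-in-unofficial-host");
  }
  return findings;
}

console.log(linkFindings("https://royalmail-redelivery.xyz/pay", "royalmail", "royalmail.com"));
// [ 'suspicious-tld', 'brand-in-unofficial-host' ]
```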
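"Context-aware" suggests that a sensitive term alone is not enough, since genuine bank messages often say they will never ask for your PIN. This minimal sketch assumes the text is split into rough sentences. A sentence fires only when it pairs a request verb with a sensitive term and is not phrased as a warning. The word lists and the `sensitiveRequest` name are hypothetical. A similar defensive-context test could plausibly back the callback-pattern exemption for legitimate fraud alerts.

```javascript
// Hypothetical vocabularies for illustration only.
const SENSITIVE_TERM = /\b(password|pin|sort code|account number|seed phrase|recovery phrase)\b/i;
const REQUEST_VERB = /\b(confirm|enter|provide|send|reply with|verify|update)\b/i;
const DEFENSIVE = /\b(never|will not|won't) (ask|request)\b|\b(do not|don't|never) share\b/i;

// True if any sentence asks for a sensitive item without being a warning about it.
function sensitiveRequest(text) {
  return text
    .split(/(?<=[.!?])\s+/) // rough sentence split after ., ! or ?
    .some((s) => SENSITIVE_TERM.test(s) && REQUEST_VERB.test(s) && !DEFENSIVE.test(s));
}

console.log(sensitiveRequest("Please confirm your PIN to avoid suspension.")); // true
console.log(sensitiveRequest("We will never ask you to confirm your PIN.")); // false
```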
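The BEC check is described as a three-way AND-gate. Executive impersonation, a financial mechanism and urgency must all be present, presumably because each one alone appears routinely in legitimate business email. This is a minimal sketch with hypothetical phrase lists; the real vocabularies are in worker.js.

```javascript
// Hypothetical phrase lists for illustration only.
const EXEC_IMPERSONATION = [/\b(ceo|managing director|finance director)\b/i, /\bthis is the boss\b/i];
const FINANCIAL_MECHANISM = [/\b(bank|wire) transfer\b/i, /\bgift cards?\b/i, /\bnew bank details\b/i];
const URGENCY = [/\bwithin 24 hours\b/i, /\bfinal notice\b/i, /\burgent(ly)?\b/i];

const anyMatch = (patterns, text) => patterns.some((re) => re.test(text));

// Fires only when all three signal groups are present (logical AND).
function becFinding(text) {
  const signals = {
    executive: anyMatch(EXEC_IMPERSONATION, text),
    financial: anyMatch(FINANCIAL_MECHANISM, text),
    urgency: anyMatch(URGENCY, text),
  };
  return { fired: signals.executive && signals.financial && signals.urgency, signals };
}

console.log(becFinding("This is the CEO. I need a bank transfer sent within 24 hours.").fired); // true
console.log(becFinding("Invoice attached; please pay by bank transfer.").fired); // false
```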
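The ML layer is described as a logistic-regression aggregator over the heuristic features. This minimal sketch shows what such an aggregator computes: a weighted sum of feature values plus a bias, squashed into a score between 0 and 1 by the logistic (sigmoid) function. The feature names, weights and bias are made up for illustration; the trained coefficients are not reproduced here.

```javascript
// Made-up coefficients for illustration; not the trained model.
const WEIGHTS = { typosquat: 2.1, urgency: 0.9, credentialRequest: 1.7, suspiciousTld: 1.2 };
const BIAS = -3.0;

const sigmoid = (z) => 1 / (1 + Math.exp(-z));

// features: { featureName: value }, typically 0/1 flags from the heuristic checks.
// Missing features count as 0; features without a weight are ignored.
function scamProbability(features) {
  let z = BIAS;
  for (const [name, weight] of Object.entries(WEIGHTS)) {
    z += weight * (features[name] ?? 0);
  }
  return sigmoid(z);
}

console.log(scamProbability({}).toFixed(3)); // 0.047
console.log(scamProbability({ urgency: 1, credentialRequest: 1 }).toFixed(3)); // 0.401
```

Messages scoring near 0.5 are where a small change in evidence flips the outcome. That is plausibly the "borderline" region where an aggregator adds most over fixed rules.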

Validation

Each detection class is validated against a hand-crafted test set with labelled positive and negative cases. Tests run the actual production worker.js via a Node CLI harness (production-truth testing — no port, no shim).

Final cross-category results (17 May 2026)

Category	Cases	Precision	Recall	F1
Banking smishing (UK retail banks)	55	100%	100%	100%
HMRC phishing	50	96.55%	100%	98.25%
Parcel smishing (Royal Mail, Evri, DPD, etc.)	50	100%	100%	100%
Account-suspension (Amazon, PayPal, Apple, etc.)	50	100%	100%	100%
Gov-agency (DVLA, NHS, DWP, TV Licensing)	50	100%	100%	100%
Crypto wallet drainer	48	96.43%	93.10%	94.74%
Advance-fee / job-offer	48	96.43%	96.43%	96.43%
Macro-averaged	351	98.49%	98.50%	98.49%

Four of seven categories score a perfect 100% on precision, recall and F1. The other three (HMRC phishing, crypto wallet drainer and advance-fee / job-offer) sit just below. Their errors fall primarily on edge cases: novel scam patterns not yet covered, or genuinely ambiguous messages that even a human reviewer would struggle with.
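The macro-averaged row is the unweighted mean of the seven per-category figures, so each category counts equally regardless of its case count. This short check reproduces that row from the numbers published in the table:

```javascript
// Per-category precision, recall and F1 (%) as published in the results table.
const results = [
  { category: "Banking smishing", precision: 100, recall: 100, f1: 100 },
  { category: "HMRC phishing", precision: 96.55, recall: 100, f1: 98.25 },
  { category: "Parcel smishing", precision: 100, recall: 100, f1: 100 },
  { category: "Account-suspension", precision: 100, recall: 100, f1: 100 },
  { category: "Gov-agency", precision: 100, recall: 100, f1: 100 },
  { category: "Crypto wallet drainer", precision: 96.43, recall: 93.1, f1: 94.74 },
  { category: "Advance-fee / job-offer", precision: 96.43, recall: 96.43, f1: 96.43 },
];

// Macro average: plain mean over categories (each category weighted equally).
const mean = (key) => results.reduce((sum, r) => sum + r[key], 0) / results.length;

for (const key of ["precision", "recall", "f1"]) {
  console.log(key, mean(key).toFixed(2));
}
// precision 98.49
// recall 98.50
// f1 98.49
```

A micro average pools true and false positives across all 351 cases, so it would weight the larger categories more heavily. It would also need the underlying counts, which the table does not publish.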

Privacy — the scanner runs entirely in your browser

Your message text is processed inside your browser via a Web Worker
No message content is sent to SignalTools, our servers, or any third party
No analytics on the message content (Google Analytics tracks page visits only, with Consent Mode v2 set to denied by default)
No saved history of scanned messages
If you reload the page, your scan result and the message are gone

This is by deliberate design. Many "free scam-check" tools harvest the messages you paste into them. The SignalTools scanner does not.
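In a browser, a Web Worker receives the pasted text via `postMessage`, runs the checks off the main thread and posts the result back. This minimal sketch shows that message flow and makes no network request. Two parts are assumptions:

- `analyse` is a hypothetical stand-in for the real checks in worker.js;
- the `WorkerGlobalScope` guard exists only so the file can also be loaded under Node for testing.

```javascript
// Hypothetical stand-in for the real analysis in worker.js.
function analyse(text) {
  const findings = /within 24 hours/i.test(text) ? ["urgency"] : [];
  return { findings };
}

// Pure handler: message in, result out. Nothing is fetched, stored or logged.
function handleScanRequest(data) {
  return { id: data.id, result: analyse(String(data.text ?? "")) };
}

// Register the handler only when running inside a worker, not on the main thread or in Node.
if (typeof WorkerGlobalScope !== "undefined" && self instanceof WorkerGlobalScope) {
  self.onmessage = (event) => self.postMessage(handleScanRequest(event.data));
}

// Main-thread side (browser only, shown for context; showResult and pastedMessage are hypothetical):
// const worker = new Worker("worker.js");
// worker.onmessage = (event) => showResult(event.data.result);
// worker.postMessage({ id: 1, text: pastedMessage });
```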

Limits + caveats

The scanner's verdict is a signal, not a final judgement. A "Looks safe" result is not a guarantee the message is legitimate.
Always verify the sender independently before clicking links, sharing details, or making payments. Use the phone number on the back of your card or the published contact details from the brand's official website.
The scanner is calibrated against UK scam patterns. Scams aimed at other countries may be missed or under-scored.
Adversarial messages crafted specifically to evade the detection patterns can pass through.
The validation set is composite: hand-crafted cases with no real victim PII. The real-world false-negative rate is therefore unknown.

Open methodology

The full source is at github.com/AkVin-design/akh in signaltools-hub/scamsupport/worker.js. The validation cases are in signaltools-hub/tests/. Anyone can:

Reproduce the precision / recall numbers
Add additional test cases
Critique the detection rules
Propose patches via corrections@signaltools.org

Where the scanner sits in your toolkit

The scanner is one of three different things you might reach for, and getting the right tool for the situation matters:

The scanner is for the message in front of you right now. Paste it in, get a verdict and a breakdown, decide what to do. Best for individual messages where you want a fast structural read.
The scam guides are for understanding the pattern. If the scanner says “Probable scam” and you want to know why this particular brand is being impersonated this week, the relevant guide in All Scam Guides walks through the specific campaign currently running.
The Recover section is for what to do after the fact. If you have already clicked, replied or paid, the Recover walkthrough is the next stop, with the right sequence of bank calls, reports and protective registrations.

What the scanner does not try to do

It does not authenticate the sender at the email-protocol level. There is no DMARC / DKIM / SPF inspection of raw headers, because the scanner reads the visible text, not the message envelope. It does not replace a password manager’s domain-matched autofill for credential phishing, or your bank’s in-app fraud team for a transaction-specific question. It does not give you a personalised judgement based on your relationship with the sender; it cannot know whether you were expecting that invoice from a supplier you actually use. And it does not detect zero-day targeted attacks crafted specifically against you. The model is trained on patterns that recur at scale, which covers most fraud but not all of it.

For all of those, the scanner’s verdict is a starting point and not a final answer. A “Looks safe” verdict on a message asking you to do something unusual deserves a phone call to the brand on a number you found yourself, not from the message. A “Probable scam” verdict on a message that genuinely matches a recent legitimate interaction with the brand deserves the same.