How We Detect Phishing Without Seeing Your URLs: k-Anonymity in a Browser Extension

Every browser security extension faces the same tension: to protect you, it needs to know what sites you're visiting. But if it knows that, so does the company running it. Here's how we detect phishing without ever seeing your URLs.

The problem with blocklist-based detection

Many traditional browser security tools work the same way: they maintain a blocklist of known-bad URLs, and when you navigate somewhere, they send that URL to a central server for checking. This creates two problems. First, the server ends up with a record of every page you visit, which is surveillance infrastructure whether or not anyone intends to use it that way. Second, the approach is reactive: it only catches threats that have already been reported.

How k-anonymity works

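The technique is a form of k-anonymity: every request our server sees is consistent with many different URLs, so it cannot tell which one you visited. Google Safe Browsing's hash-prefix lookups follow a similar approach. When you navigate to a URL: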

1. We compute a SHA-256 hash of the URL locally. The URL never leaves your device.
2. We send only the first 8 hex characters (32 bits) of that hash to our backend.
3. Our server returns all known threat hashes starting with that prefix.
4. Your device checks locally whether the full hash matches any of them. Neither the URL nor its full hash is ever sent.
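The steps above can be sketched in a few lines. This is a minimal illustration, not our production code: it uses Node's built-in crypto module so it runs outside a browser (inside an extension, the Web Crypto call crypto.subtle.digest would play the same role), and sha256Hex, makeFakeBackend, and isKnownThreat are hypothetical names invented for the example. It also skips URL normalization, which a real client would need.

```javascript
const { createHash } = require('node:crypto');

const PREFIX_HEX_CHARS = 8; // 8 hex characters = 32 bits

// Full SHA-256 of the URL as lowercase hex. Computed on the client only.
function sha256Hex(url) {
  return createHash('sha256').update(url, 'utf8').digest('hex');
}

// Hypothetical stand-in for the backend. It receives only a prefix and
// returns every known threat hash that starts with that prefix.
function makeFakeBackend(threatUrls) {
  const threatHashes = threatUrls.map(sha256Hex);
  return {
    seenQueries: [], // everything the "server" ever learns about the client
    lookup(prefix) {
      this.seenQueries.push(prefix);
      return threatHashes.filter((hash) => hash.startsWith(prefix));
    },
  };
}

// Client side: send the prefix, then compare full hashes locally.
function isKnownThreat(url, backend) {
  const fullHash = sha256Hex(url);
  const candidates = backend.lookup(fullHash.slice(0, PREFIX_HEX_CHARS));
  return candidates.includes(fullHash);
}

// Usage: the backend object only ever records 8-character prefixes.
const backend = makeFakeBackend(['http://phish.example/login']);
console.log(isKnownThreat('http://phish.example/login', backend));
console.log(backend.seenQueries);
```

The point of the design is visible in seenQueries: the only thing that crosses the network boundary is the prefix, and the final match decision is made on the device.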

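With 8 hex characters there are 16^8 = 2^32, or about 4.3 billion, possible prefixes. Each prefix covers roughly 1 in 4.3 billion of all possible hashes, so a very large number of different URLs share any given prefix. The prefix cannot be inverted to recover the original URL. The most a server could do is test guesses: hash a candidate URL and check whether it lands on the same prefix, which is consistent with a visit to that URL but does not confirm one.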
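To make the prefix arithmetic concrete, here is a small sketch of how the size of the anonymity set (the "k" in k-anonymity) could be estimated. It assumes hashes are spread uniformly across prefixes; expectedUrlsPerPrefix is a hypothetical helper, and the URL count passed to it is an illustrative input, not a measurement.

```javascript
const BITS_PER_HEX_CHAR = 4;
const PREFIX_HEX_CHARS = 8;

// Number of distinct prefixes: 2^(8 * 4) = 2^32 = 4294967296.
const prefixCount = 2 ** (PREFIX_HEX_CHARS * BITS_PER_HEX_CHAR);

// Under a uniform-hash assumption, the average number of URLs that
// share one prefix is simply the URL population divided by prefixCount.
function expectedUrlsPerPrefix(totalUrls) {
  return totalUrls / prefixCount;
}

console.log(prefixCount);
console.log(expectedUrlsPerPrefix(prefixCount * 10));
```

The larger the population of URLs that could plausibly be visited, the more URLs hide behind each prefix. A shorter prefix would raise that number further, at the cost of the server returning more candidate hashes per lookup.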

How we proxy JavaScript APIs without breaking websites

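We also monitor JavaScript behavior, such as clipboard access, canvas fingerprinting, and eval abuse, using API proxying. When the content script loads, it wraps certain browser APIs with monitored versions that call through to the original. (In Chrome, the wrappers have to be installed in the page's own JavaScript context to see the page's calls, because content scripts otherwise run in an isolated world.) For example: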

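// In-memory record of which monitored APIs were called (no arguments, no return values).
const behaviorLog = [];

// Keep a reference to the native method so the wrapper can delegate to it.
const original = HTMLCanvasElement.prototype.toDataURL;
HTMLCanvasElement.prototype.toDataURL = function (...args) {
  behaviorLog.push({ type: 'canvas_fingerprint', timestamp: Date.now() });
  return original.apply(this, args); // call through so the page gets the real result
};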

The proxies always call through to the real API with the same receiver and arguments, so the page gets the same result it would have received without us. We observe the call pattern but not the values.
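The same pattern generalizes to other APIs. The sketch below is one way it could be factored into a reusable helper; monitorMethod and FakeCanvas are hypothetical names, and FakeCanvas is only a stand-in so the example runs outside a browser. In a page it would be applied to real prototypes such as HTMLCanvasElement.prototype.

```javascript
const behaviorLog = [];

// Replace proto[name] with a wrapper that records the call and delegates.
// Returns false when the API is absent so callers can skip it safely.
function monitorMethod(proto, name, eventType) {
  const original = proto[name];
  if (typeof original !== 'function') return false;
  proto[name] = function (...args) {
    try {
      // Only the event type and time are stored, never args or results.
      behaviorLog.push({ type: eventType, timestamp: Date.now() });
    } catch (_) {
      // A logging failure must not affect the page.
    }
    return original.apply(this, args);
  };
  return true;
}

// Stand-in for a browser class, used only to make the sketch runnable.
class FakeCanvas {
  constructor(label) {
    this.label = label;
  }
  toDataURL(type = 'image/png') {
    return `data:${type};${this.label}`;
  }
}

monitorMethod(FakeCanvas.prototype, 'toDataURL', 'canvas_fingerprint');
```

Wrapping the log call in try/catch is a deliberate choice here: if the monitoring code ever throws, the original method still runs and the page is unaffected.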

What we collect and what we don't

We log whether APIs were called, not what was passed to them. Cookie values, clipboard contents, form fields, and keystrokes are never logged. The behavioral analysis happens locally; only the resulting risk score is used.
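As an illustration of what "locally computed risk score" can mean, here is a minimal sketch that turns a log of event types into a single number. The event names other than canvas_fingerprint, the point values, and the riskScore function are all assumptions made up for this example; they are not our actual weights or scoring formula.

```javascript
// Hypothetical points per event type, on a 0 to 100 scale.
const WEIGHTS = new Map([
  ['canvas_fingerprint', 20],
  ['clipboard_read', 30],
  ['eval_call', 25],
]);

// Score depends only on WHICH monitored APIs were called, not on the
// values passed to them. Repeated calls of one type count once.
function riskScore(behaviorLog) {
  const seenTypes = new Set(behaviorLog.map((event) => event.type));
  let score = 0;
  for (const type of seenTypes) {
    score += WEIGHTS.get(type) ?? 0; // unknown event types add nothing
  }
  return Math.min(100, score);
}

const exampleLog = [
  { type: 'canvas_fingerprint', timestamp: 1 },
  { type: 'canvas_fingerprint', timestamp: 2 },
  { type: 'eval_call', timestamp: 3 },
];
console.log(riskScore(exampleLog));
```

Because the function only reads event.type, there is nothing sensitive in its input to begin with, and the only output is a number.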

Where heuristics fail

The hash lookup only catches URLs already in threat databases. A brand-new phishing site won't be there yet, and this is where behavioral detection fills the gap. But behavioral detection has limits of its own: sophisticated attackers can time attacks to avoid detection windows, and legitimate sites sometimes use patterns (eval, canvas) that trigger false positives. We tune thresholds to balance sensitivity against false alarms, but the arms race is real.
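The threshold trade-off can be shown with a toy example. The sample scores and labels below are invented purely for illustration (they are not measurements from real sites), and errorsAt is a hypothetical helper. The sample includes a legitimate site that scores higher than a malicious one, which is exactly the case where no threshold is error-free.

```javascript
// Made-up scores (0 to 100) with made-up ground-truth labels.
const samples = [
  { score: 15, malicious: false },
  { score: 55, malicious: false }, // e.g. a legitimate site using eval and canvas
  { score: 50, malicious: true },
  { score: 80, malicious: true },
];

// Count both kinds of mistakes when everything at or above the
// threshold is flagged.
function errorsAt(threshold, data) {
  let falsePositives = 0;
  let falseNegatives = 0;
  for (const { score, malicious } of data) {
    const flagged = score >= threshold;
    if (flagged && !malicious) falsePositives += 1;
    if (!flagged && malicious) falseNegatives += 1;
  }
  return { falsePositives, falseNegatives };
}

console.log(errorsAt(40, samples)); // lower threshold: more sensitive
console.log(errorsAt(60, samples)); // higher threshold: fewer false alarms
```

Lowering the threshold flags the legitimate site, while raising it misses the lower-scoring malicious one. Tuning moves the errors around rather than eliminating them.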

CyberXrai is a free Chrome extension that detects phishing and malicious scripts in real time — without sending your URLs to any server. Install it free.