Dirty History of reCAPTCHA
reCAPTCHA is “spyware” in a behavioral sense: not literally, but close enough to matter. It collects a wide range of device and behavior data (often without users noticing) and sends that data to Google for risk scoring and cross‑site profiling.
In the EU this raises clear GDPR and ePrivacy issues: personal data, profiling, cross‑border transfers, and the need for consent or other lawful bases.
How reCAPTCHA works
Origins: Early CAPTCHAs were obvious tests: type warped text or pick out traffic lights. They were annoying and often hard, but visible.
v2 (checkbox / challenge): runs JavaScript in your browser, may set cookies, and shows a visible challenge when Google’s risk analysis decides one is needed (site owners can only tune how strict it is). It collects signals that help build a device/browser fingerprint.
Invisible reCAPTCHA: quietly gathers signals in the background and only challenges sometimes.
v3: runs invisibly in the background and returns a risk score (0.0–1.0, where 1.0 is most likely human) for each action a page reports. Site owners set thresholds that determine what happens next.
Enterprise: similar but with more telemetry and integration options for big customers; still client‑side collection sent to Google.
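To make the v3 thresholds concrete: the page obtains a token, the site's server sends it to Google's siteverify endpoint, and the JSON that comes back carries `success`, `score`, and `action` fields. Below is a minimal Python sketch of the decision step only, assuming that response has already been fetched and parsed. The function name `decide`, the cutoffs 0.3 and 0.7, and the three outcome labels are illustrative assumptions, not values from Google or from this post.

```python
from typing import Any, Mapping

# Hypothetical thresholds chosen for illustration; each site picks its own.
BLOCK_BELOW = 0.3
CHALLENGE_BELOW = 0.7


def decide(verify_response: Mapping[str, Any], expected_action: str) -> str:
    """Map a parsed siteverify response to 'allow', 'challenge', or 'block'."""
    # A failed verification (bad or expired token) is treated as a block.
    if not verify_response.get("success", False):
        return "block"
    # The action name should match what this endpoint expects,
    # otherwise the token was issued for a different action.
    if verify_response.get("action") != expected_action:
        return "block"
    # Scores run from 0.0 (likely bot) to 1.0 (likely human).
    score = float(verify_response.get("score", 0.0))
    if score < BLOCK_BELOW:
        return "block"
    if score < CHALLENGE_BELOW:
        return "challenge"
    return "allow"


if __name__ == "__main__":
    sample = {"success": True, "score": 0.9, "action": "login"}
    print(decide(sample, "login"))
```

Note that the user never sees any of this: the score and the decision both happen out of sight, which is exactly why v3 feels different from the old visible tests.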
What kind of data is collected?
Rather than a long technical laundry list, think in categories:
Network and browser identifiers (IP, User‑Agent, cookies);
Interaction and timing data (mouse, touch, keystroke timing);
Fingerprinting signals (fonts, canvas/WebGL quirks, plugins);
Device and hardware metadata (screen size, CPU/GPU traits); and
Other page‑context signals (DOM interactions, cache/service worker status).
The exact mix depends on the version and settings, but together they can make a highly identifying “behavioral fingerprint.”
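To see why a handful of harmless-looking signals adds up to an identifier, here is a small Python sketch that folds a few of the categories above into one stable ID. This is only an illustration of the general idea. It is not how Google computes anything, and the function name `fingerprint`, the signal names, and the sample values are all made up for the example.

```python
import hashlib
import json


def fingerprint(signals: dict) -> str:
    """Derive a short, stable ID from a bag of device/browser signals."""
    # Serialize with sorted keys so the same signals always give the same
    # string, no matter what order they were collected in.
    canonical = json.dumps(signals, sort_keys=True, separators=(",", ":"))
    # Hash the canonical form; the first 16 hex characters are enough here.
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


if __name__ == "__main__":
    # Hypothetical visitors: identical except for one installed font.
    visitor_a = {
        "user_agent": "ExampleBrowser/1.0",
        "screen": "1920x1080",
        "fonts": ["Arial", "Fira Code"],
    }
    visitor_b = dict(visitor_a, fonts=["Arial"])

    print(fingerprint(visitor_a))
    print(fingerprint(visitor_b))
    print(fingerprint(visitor_a) == fingerprint(visitor_b))
```

The same signals produce the same ID on every visit with no cookie involved, and changing a single signal produces a different one. Real fingerprinting is fuzzier than an exact hash, but the principle is the same.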
Why people call it “spyware” (behavioral reasons)
Invisible, persistent collection: modern modes gather data continuously without obvious prompts.
Fingerprinting: combined signals can uniquely identify or link users over time.
Cross‑site profiling: the same Google script on many sites lets Google match behavior across the web.
Third‑party script risk: reCAPTCHA runs code hosted by Google inside the page, with broad access to browser APIs.
Cross‑border transfers: collected data is usually sent to Google servers outside the EEA, raising transfer and surveillance concerns.
Check out this video that inspired this post
EU legal implications
Personal data: fingerprints, IPs, and behavioral signals can be personal data under the GDPR (see Recital 30). Processing them triggers full GDPR obligations.
Lawful basis & consent: under the ePrivacy rules and the GDPR, non‑essential cookies and trackers generally require the user’s prior consent. Claiming “legitimate interest” is risky when profiling or cross‑site tracking is involved.
DPIA: using site‑wide profiling (e.g., v3) is likely a high-risk processing activity that requires a Data Protection Impact Assessment.
International transfers: after Schrems II, transfers to the US must be assessed and protected; controllers should document transfer impact assessments and safeguards (e.g., SCCs, additional measures).
Enforcement trends: European data protection authorities (e.g., CNIL and others) have warned site owners about transparency and consent; enforcement is already happening in some cases.
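The consent point above has a simple technical consequence: if the script never loads, nothing is collected. Here is a rough Python sketch of a server-side template helper that only emits the reCAPTCHA v3 script tag once the visitor has opted in. The helper name `recaptcha_script_tag` and the `consent_given` flag are hypothetical, and how consent is recorded (and whether a given setup is legally sufficient) is outside what this sketch shows.

```python
from urllib.parse import quote


def recaptcha_script_tag(consent_given: bool, site_key: str) -> str:
    """Return the v3 script tag only after consent; otherwise return nothing."""
    # Without consent, emit no tag at all, so the browser never contacts Google.
    if not consent_given:
        return ""
    # URL-encode the site key before placing it in the query string.
    key = quote(site_key, safe="")
    return (
        '<script src="https://www.google.com/recaptcha/api.js'
        f'?render={key}"></script>'
    )


if __name__ == "__main__":
    # "EXAMPLE_SITE_KEY" is a placeholder, not a real key.
    print(repr(recaptcha_script_tag(False, "EXAMPLE_SITE_KEY")))
    print(recaptcha_script_tag(True, "EXAMPLE_SITE_KEY"))
```

The design choice here is to gate the script itself rather than just hide a widget, because a loaded script can start collecting before the user clicks anything.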
Technically, reCAPTCHA collects broad behavioral and device signals that can be combined into identifying profiles and used for cross‑site risk scoring. That makes it behaviorally “spyware‑like,” and under EU law it usually triggers GDPR/ePrivacy obligations. Regulators are watching, and many site owners will need to do more than hope users don’t notice.