The Leaky Forms Dilemma

When Firefox and Safari started blocking third-party cookies and trackers, the tracking industry needed another way to follow users. One option was to collect user data as it is typed into a form. Yes, that’s right: grab the data even before the user clicks Submit.

Forms are ubiquitous on the internet, prompting you to sign up for newsletters, special offers, free trials and the like, but many users leave websites without submitting the form they started filling out, as many as two out of every three people in some cases. So websites are increasingly gathering this data as it’s being typed in, including user details, email addresses and even potential passwords.

In a study of the top 100,000 websites, European researchers found:

…that users’ email addresses are exfiltrated to tracking, marketing and analytics domains before form submission and without giving consent on 1,844 websites in the EU… and 2,950 websites in the US…

Leaky Forms: A Study of Email and Password Exfiltration Before Form Submission

(That 60% difference is the result of Europe’s tougher data protection laws.)
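For anyone who wants to check the arithmetic, here is a minimal sketch in Python. It assumes both crawls covered the full 100,000 sites, which is what the study's "top 100,000 websites" suggests; the variable names are purely illustrative.

```python
# Figures quoted from the study; the 100,000 denominator is an assumption
# based on the "top 100,000 websites" description.
SITES_CRAWLED = 100_000
EU_LEAKS = 1_844
US_LEAKS = 2_950

# Share of crawled sites that leaked email addresses, as a percentage.
eu_share = 100 * EU_LEAKS / SITES_CRAWLED
us_share = 100 * US_LEAKS / SITES_CRAWLED

# How many more leaking sites the US crawl found, relative to the EU crawl.
relative_gap = 100 * (US_LEAKS - EU_LEAKS) / EU_LEAKS

print(f"EU share: {eu_share:.2f}%")          # EU share: 1.84%
print(f"US share: {us_share:.2f}%")          # US share: 2.95%
print(f"US vs EU:  {relative_gap:.0f}% more")  # US vs EU:  60% more
```

So the "60%" is a relative figure: the US crawl turned up about 60% more leaking sites than the EU crawl, even though both are a small slice of the total.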

Okay, 1.8% and just under 3% of the sites crawled may not seem like much, but there are some big-name websites in there:

News websites such as usatoday.com, foxnews.com and independent.co.uk, appear high on the lists… Medical news and information websites webmd.com and healthline.com are other notable entries for their sensitive content.

High on the list are fashion/beauty, online shopping, general news, software/hardware and business websites. Public information, government/military, and games sites leaked less than 1%, while bottom of the list were porn sites:

… despite filling email fields on hundreds of websites categorized as Pornography, we have not a single email leak. [sic]

They also discovered:

While the majority of email addresses are sent to known tracking domains, we further identify 41 tracker domains that are not listed by any of the popular blocklists.

And that in spite of Europe’s tough General Data Protection Regulation (GDPR):

… email addresses or their hashes are sent to facebook.com on 21 distinct websites in the EU.

Sidebar: What’s a hash?
Hashing simply transforms a string of characters into a fixed-length value that, for practical purposes, uniquely represents the original string. Here are some samples from an online hash tool using the MD2 hash:

bob@example.com > c909341ff7663bc753748745a9897f07
rob@example.com > fbb257724fac4091cb5a99df9b7dc86a

Note how a single letter change results in a completely different hash.

greatbiglongusername@longdomainname.com > 6b4be23630ae846dec3ebeff3e552aed
g@eg.com > 991ec7869d8f97376a58294e4a5298cf

Every email address, no matter how big or small, gets the same (in this case) 32-character hash. That makes them easier to handle in a database.

The key point is that both bob@example.com and c909341ff7663bc753748745a9897f07 are one and the same thing. They both uniquely identify the user concerned.
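You can try this yourself with a few lines of Python. This is only a sketch: it swaps in SHA-256 for MD2 (Python's hashlib does not guarantee MD2 is available), so the hashes are 64 characters long rather than 32 and will not match the samples above. The `hash_email` helper and its trim-and-lowercase step are assumptions for illustration, not a description of what any particular tracker does.

```python
import hashlib


def hash_email(email: str) -> str:
    """Return the SHA-256 hex digest of an email address.

    Hypothetical helper: trimming and lowercasing first is an assumption,
    made so that ' Bob@Example.com ' and 'bob@example.com' hash alike.
    """
    normalised = email.strip().lower()
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    samples = (
        "bob@example.com",
        "rob@example.com",
        "greatbiglongusername@longdomainname.com",
        "g@eg.com",
    )
    for address in samples:
        digest = hash_email(address)
        # Every digest has the same length, whatever the input size.
        print(f"{address} > {digest} ({len(digest)} characters)")
```

Because the same address always produces the same hash, two unrelated websites that hash bob@example.com this way end up holding an identical value, and that is what lets a tracker recognise the same person in both places.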


The researchers’ paper is due to be presented at the 31st USENIX Security Symposium in August, but the work is ongoing and their website, Leaky Forms, contains even more up-to-date information, including this:

We found that unlike what is claimed, both Meta and TikTok collect hashed personal data when the user clicks links or buttons that in no way resemble a submit button. In fact, Meta and TikTok scripts don’t even try to recognize submit buttons, or listen to (form) submit events… That means Meta and TikTok Pixel collect hashed personal information, even when a user decides to abandon a form, and clicks a button/link to navigate away from the page.

In March 2022, we ran additional crawls of top 100K websites to detect leaks triggered by unrelated button or link clicks… We found that 8,438 (US) / 7,379 (EU) sites may leak to Meta when the user clicks on virtually any button or a link, after filling up a form. In addition, we found 154 (US) / 147 (EU) sites that may leak to TikTok in a similar manner.

Countermeasures?

What can be done to protect users from this sort of data harvesting? The short answer: not much at the moment. While Firefox and Safari block third-party cookies and cross-site tracking, when the researchers tested them against ten known exfiltration websites:

We found that neither Safari nor Firefox blocked email exfiltrations to tracking endpoints…

So the battle continues. Without stringent data protection laws, we’re reliant on a tech fix for a tech problem. Browser vendors must take steps to protect against scripts that harvest email addresses for tracking purposes. As the authors note:

We believe the scale of unconsented data collection uncovered in our study justifies a similar countermeasure for scripts that harvest email addresses.


