Detecting the FLoC opt-out HTTP Header

Federated Learning of Cohorts (FLoC) is a new technology available in Chrome 89 and later. Its purpose is to assign Chrome users to groups (or cohorts) based on their browsing history. These cohorts can be used for targeted advertising by Google. The technology is currently in the “origin trial” stage, where individual sites can opt in to test it. This allows Google to test the technology in the browser and to improve the underlying algorithms that assign users to cohorts.

As per the spec, it is possible for websites to serve a Permissions-Policy HTTP response header to opt out of the cohort computation:

Permissions-Policy: interest-cohort=()
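For anyone wanting to check a single response by hand, here is a minimal Python sketch of testing a Permissions-Policy header value for the opt-out. The function name `opts_out_of_floc` is made up for illustration, and the comma split is a simplification that assumes no quoted strings containing commas appear in the value; it is not a full structured-field parser.

```python
def opts_out_of_floc(header_value: str) -> bool:
    """Return True if a Permissions-Policy value contains interest-cohort=().

    Hypothetical helper: splits naively on commas, which is good enough for
    typical values but is not a complete structured-field parser.
    """
    for directive in header_value.split(","):
        name, sep, allowlist = directive.strip().partition("=")
        # An empty allowlist "()" disables the feature for every origin.
        if sep and name.strip().lower() == "interest-cohort" and allowlist.strip() == "()":
            return True
    return False


if __name__ == "__main__":
    print(opts_out_of_floc("geolocation=(self), interest-cohort=()"))
    print(opts_out_of_floc("interest-cohort=(self)"))
```

Unlike a plain substring search, this only matches when `interest-cohort` is the directive name and its allowlist is empty.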

Over the next few months it will be interesting to see how this header is adopted across the monthly crawl. With the help of @paulcalvano and @tunetheweb, a query and initial results have been gathered; they can be seen below:

SELECT
  _TABLE_SUFFIX AS client,
  COUNT(DISTINCT pageid) AS pages,
  pages_total,
  COUNT(DISTINCT pageid) / pages_total AS pages_pct
FROM
  `httparchive.summary_requests.2021_04_01_*`
JOIN (
  -- Total number of pages per client (desktop/mobile)
  SELECT
    _TABLE_SUFFIX,
    COUNT(DISTINCT pageid) AS pages_total
  FROM
    `httparchive.summary_requests.2021_04_01_*`
  GROUP BY
    _TABLE_SUFFIX)
USING (_TABLE_SUFFIX)
WHERE
  -- Only the main HTML document, with the opt-out present in its response headers
  LOWER(respOtherHeaders) LIKE '%interest-cohort=()%' AND
  firstHTML
GROUP BY
  _TABLE_SUFFIX,
  pages_total
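To make the logic of the query easier to follow, here is a rough Python sketch of the same aggregation over in-memory rows. The function name `opt_out_share` and the sample rows are invented for illustration; only the field names (`pageid`, `firstHTML`, `respOtherHeaders`) come from the query above.

```python
from collections import defaultdict


def opt_out_share(rows):
    """Return {client: (pages, pages_total, pages_pct)} for the given rows.

    Each row is a dict with the keys client, pageid, firstHTML and
    respOtherHeaders (a hypothetical in-memory stand-in for the table).
    """
    total = defaultdict(set)    # every distinct page seen per client
    matched = defaultdict(set)  # pages whose first HTML response opts out
    for row in rows:
        total[row["client"]].add(row["pageid"])
        headers = row["respOtherHeaders"].lower()
        if row["firstHTML"] and "interest-cohort=()" in headers:
            matched[row["client"]].add(row["pageid"])
    # Note: unlike the SQL, clients with zero matches are still reported here.
    return {
        client: (len(matched[client]), len(pages), len(matched[client]) / len(pages))
        for client, pages in total.items()
    }


if __name__ == "__main__":
    # Made-up sample rows, not real crawl data.
    sample = [
        {"client": "desktop", "pageid": 1, "firstHTML": True,
         "respOtherHeaders": "Permissions-Policy = interest-cohort=()"},
        {"client": "desktop", "pageid": 2, "firstHTML": True,
         "respOtherHeaders": "x-frame-options = DENY"},
    ]
    print(opt_out_share(sample))
```

The subquery in the SQL plays the role of `total` here: it counts all pages per client before the opt-out filter is applied, so the percentage is taken against the whole crawl.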

The latest crawl stats produce this:
[Image: Screenshot 2021-04-29 at 23.04.05]

Since this is a very new change, and sites only started adding the opt-out header midway through April, the latest crawl won’t show a complete picture. This query will be more useful in the coming months.

A link to @tunetheweb’s Google Sheet with the data can be found here.

An alternative source for the moment is Crawler.ninja, which can be found here and lists 2,354 sites as of the date of posting.


I don’t seem to be able to edit the post, so adding in another source for reference:

censys.io reports 3,001 sites with the HTTP header added.

Just bumping this post to see whether any other crawl data is now available for analysis. I assume we have a full month’s crawl data by now (or will very soon).

Yes, May 2021 data is available in BigQuery now.


Looks like there’s definitely a big jump for May, though it’s still relatively small in the grand scheme of things:

| client | pages | pages_total | pages_pct |
|---|---|---|---|
| mobile | 7,258 | 7,205,383 | 0.1007% |
| desktop | 11,229 | 6,242,688 | 0.1799% |
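As a quick sanity check, the pages_pct column can be recomputed from the pages and pages_total figures above. This is just a small Python sketch; the helper name `pct` is made up for illustration.

```python
# May 2021 figures copied from the results above: (pages, pages_total).
may_2021 = {
    "mobile": (7258, 7205383),
    "desktop": (11229, 6242688),
}


def pct(pages: int, pages_total: int) -> str:
    """Format pages / pages_total as a percentage with four decimal places."""
    return f"{pages / pages_total:.4%}"


if __name__ == "__main__":
    for client, (pages, pages_total) in may_2021.items():
        print(client, pct(pages, pages_total))
```

Both values round to the percentages reported in the results.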

The June crawl will be finished in a couple of weeks, so we can check again then.
