In a blog post on dev.to, I shared some insights into the state of SameSite cookie usage across the web using HTTP Archive data. This post shares the SQL queries and data I used to prepare that article.
Last year Google announced updates to Chrome that provide a way for developers to control how cross site cookies should work across their site. The goal is to improve user security and privacy by limiting which third parties can read cookies set while visiting a different site. It also defeats cross site request forgery attacks. The implementation is fairly simple, and only requires developers to add the SameSite attribute to their cookies.
Google changed the default behavior of SameSite attribute to secure cookies by default when Chrome 80 was released in February 2020. However it was rolled back in April 2020 to ensure stability in the initial stage of the COVID-19 response. Now they are planning to resume SameSite cookie enforcement with Chrome 84, which will be released on July 14th.
Creating the Cookies table
In the HTTP Archive, cookies are stored in the requests tables - which are quite large. For June 2020’s mobile dataset this table was 3.29 TB. Querying this table repeatedly would be expensive, so I ran the following to create a smaller table containing all of the Set-Cookie
response headers for all third parties. The query below uses a user defined function to extract the Set-Cookie headers to an array. It also excludes first party cookies. The resulting table httparchive.scratchspace.2020_06_01_mobile_set_cookies
is 28.5GB and contains 88,497,101 cookies.
-- Warning, this query processes 3.29TB of data!!!
-- The output of this query was saved to httparchive.scratchspace.2020_06_01_mobile_set_cookies
-- Please use that instead.
CREATE TEMPORARY FUNCTION getHeaders(payload STRING)
RETURNS ARRAY<STRING>
LANGUAGE js AS """
var $ = JSON.parse(payload);
var headers = $.response.headers;
var cookies=[];
var j=0;
for (i in headers) {
if (headers[i].name.toLowerCase().indexOf('set-cookie')!= -1) {
cookies[j] = headers[i].value;
j++;
}
}
try {
return cookies;
} catch (e) {
return [];
}
""";
SELECT page, url, set_cookie
FROM (
SELECT
page,
url,
getHeaders(payload) AS set_cookies
FROM
`httparchive.requests.2020_06_01_mobile`
WHERE
NET.HOST(page) != NET.HOST(url)
)
CROSS JOIN
UNNEST(set_cookies) AS set_cookie
SameSite Usage Across All Third Parties
Now that we have a smaller table to query, I wanted to summarize how many SameSite cookies are in use. The graph below shows that 32.9% of cookies were set with the SameSite attribute
The query behind this graph is:
SELECT
SUM(IF(LOWER(set_cookie) LIKE '%samesite%',1,0)) SameSiteCookies,
SUM(IF(LOWER(set_cookie) LIKE '%samesite%',0,1)) OtherCookies,
SUM(IF( LOWER(set_cookie) NOT LIKE "%samesite%"
AND LOWER(set_cookie) LIKE "%bytes were stripped]"
,1,0)
) StrippedCookies,
COUNT(DISTINCT page) Sites
FROM
`httparchive.scratchspace.2020_06_01_mobile_set_cookies`
Adding the “Secure” attribute to the query allowed me to look at this for both classes of cookies.
SELECT
IF(LOWER(set_cookie) LIKE "%secure%","secure","not secure") AS Secure,
SUM(IF(LOWER(set_cookie) LIKE "%samesite=lax%",1,0)) AS SameSiteLax,
SUM(IF(LOWER(set_cookie) LIKE "%samesite=strict%",1,0)) AS SameSiteStrict,
SUM(IF(LOWER(set_cookie) LIKE "%samesite=none%",1,0)) AS SameSiteNone,
SUM( IF(LOWER(set_cookie) NOT LIKE "%samesite=none%"
AND LOWER(set_cookie) NOT LIKE "%samesite=lax%"
AND LOWER(set_cookie) NOT LIKE "%samesite=strict%",1,0)
) AS SameSiteOther
FROM
`httparchive.scratchspace.2020_06_01_mobile_set_cookies`
WHERE
LOWER(set_cookie) LIKE "%samesite%"
GROUP BY
Secure
Identifying 3rd Parties with Incorrect SameSite Usage
Google’s new SameSite policy will default all cookies to SameSite=Lax unless the SameSite attribute is present. If a third party wants to set a more permissive setting, then they can only set SameSite=None if the Secure attribute is present. In the previous graph we can see that 2.65% of cookies were handing this incorrectly.
The following table breaks this down by third party hostname. The insecure SameSite=None cookies that these domains are setting will default to SameSite=Lax in Chrome 84.
SELECT
NET.REG_DOMAIN(url) host,
COUNT(*) insecure_SameSiteNone,
ROUND(
COUNT(*) /
(SELECT
COUNT(*)
FROM
`httparchive.scratchspace.2020_06_01_mobile_set_cookies`
WHERE
LOWER(set_cookie) LIKE "%samesite=none%"
AND LOWER(set_cookie) NOT LIKE "%secure%"
),4) percent_insecure_SameSiteNone,
COUNT(distinct page) websitesAffected
FROM
`httparchive.scratchspace.2020_06_01_mobile_set_cookies`
WHERE
LOWER(set_cookie) LIKE "%samesite=none%"
AND LOWER(set_cookie) NOT LIKE "%secure%"
GROUP BY
host
ORDER BY insecure_SameSiteNone DESC
SameSite Usage Per Third Party Domain
Next I wanted to look at the popular third party cookies to see how many had set SameSite attributes and how many have left it to the browser default.
The query for this also uses @patrickhulce’s third_party_web dataset to categorize third party domains :
SELECT
NET.REG_DOMAIN(url) host,
ThirdPartyTable.category AS thirdPartyCategory,
SUM(IF(LOWER(set_cookie) LIKE "%samesite=lax%",1,0)) AS SameSiteLax,
SUM(IF(LOWER(set_cookie) LIKE "%samesite=strict%",1,0)) AS SameSiteStrict,
SUM(IF(LOWER(set_cookie) LIKE "%samesite=none%",1,0)) AS SameSiteNone,
SUM(IF(LOWER(set_cookie) NOT LIKE "%samesite%",1,0)) AS NoSameSiteAttribute,
COUNT(*) total,
COUNT(distinct page) websites
FROM
`httparchive.scratchspace.2020_06_01_mobile_set_cookies`
LEFT JOIN
`lighthouse-infrastructure.third_party_web.2020_05_01` AS ThirdPartyTable
ON NET.HOST(url) = ThirdPartyTable.domain
GROUP BY
host, thirdPartyCategory
ORDER BY total DESC
All of the data used in the blog post is available in this Google Sheet.