I’m interested in exploring the relationship between various CDN providers and their sites’ XSS protections (maybe the CDN doesn’t add these headers, but it would still be cool to know). I altered the query from the Security 2019 page (8.37) to check but it’s not quite working as intended:
I basically replaced client with cdn where applicable and altered the requests dataset, but the directive portion is a bit confusing. I tried placing it in the selection (as something that comes from the final joined table) but that doesn’t run. In the current state, the percentages are all over 100% regardless of where the hundred is multiplied in (or not at all)
#standardSQL
# 08_37: SameSite cookies
CREATE TEMPORARY FUNCTION extractHeader(payload STRING, name STRING)
RETURNS STRING LANGUAGE js AS '''
try {
var $ = JSON.parse(payload);
var header = $._headers.response.find(h => h.toLowerCase().startsWith(name.toLowerCase()));
if (!header) {
return null;
}
return header.substr(header.indexOf(':') + 1).trim();
} catch (e) {
return null;
}
''';
SELECT
cdn,
total,
COUNT(DISTINCT page) AS pages,
ROUND(COUNT(DISTINCT page) * 100 / total, 2) AS pct
FROM
`httparchive.requests.2020_07_01_desktop`,
UNNEST(SPLIT(extractHeader(payload, 'Set-Cookie'), ';')) AS directive
JOIN
(SELECT url, cdn, COUNT(0) AS total FROM `httparchive.summary_pages.2020_07_01_*` GROUP BY cdn, url)
USING (url)
WHERE
STARTS_WITH(TRIM(directive), 'SameSite')
GROUP BY
cdn,
total
ORDER BY
pages / total DESC
Hi @slane. It looks like your query is JOINing the summary_pages table to identify the CDN, and the requests table to extract the cookie details. You may want to use the _cdn_provider column in the summary_requests table instead - since that will tell you the CDN that served the request (rather than the CDN that served the base HTML).
I wrote an analysis on SameSite cookies a few weeks ago, which you can read here - SameSite Cookies Analysis. I didn’t include CDN analysis in that, but I did extract all of the cookies for the June 2020 dataset into a separate table for anaysis. That table might be useful to you for this as well.
The table httparchive.scratchspace.2020_06_01_mobile_set_cookies contains all of the Set-Cookie headers for every single request. I would avoid counting the occurances from this since there are often multiple cookies set on a single request. However you can count the distinct request urls to get a feel for whether a SameSite cookie was set.
Here’s an example of a query counting the requests with a SameSite cookie per CDN. Note that I’m using my scratchspace cookies table, JOINing the summary_requests table, and counting the distinct urls. (processes 73GB)
SELECT
_cdn_provider,
COUNT(DISTINCT summary_requests.url) AS samesite_cookies
FROM
`httparchive.scratchspace.2020_06_01_mobile_set_cookies` AS cookies
INNER JOIN
`httparchive.summary_requests.2020_06_01_mobile` AS summary_requests
ON NET.HOST(page) = NET.HOST(summary_requests.url)
WHERE
LOWER(set_cookie) LIKE "%samesite%"
GROUP BY
_cdn_provider
ORDER BY samesite_cookies DESC
This gives you number of requests, but not %s. You’d need the number of requests served by each CDN to do that comparison. For that I would run the following (processes 2.8GB):
SELECT
_cdn_provider,
COUNT(*) AS requests
FROM
`httparchive.summary_requests.2020_06_01_mobile` AS summary_requests
GROUP BY
_cdn_provider
ORDER BY requests DESC
Then combine the results and calculate the %s. I did this and you can see the results below
Hi @paulcalvano, thanks so much for the analysis. Your results are definitely interesting, but I’m looking to get this done on a more recurring basis (to chart it over time) and so I’d prefer to do most of the work in one step. I altered the query to no longer JOIN two tables, but now I’m a bit lost on how to get the percentage. The following query should get the total URLs with the Same Site Cookie and organize by CDN, but I’m unsure about how to get the percentages in this case. Since the table I’m effectively utilizing here is only those URLs which contain the header, I don’t know how to count the total URLs to get a percentage. Any help in this fashion would be much appreciated.
SELECT
JSON_EXTRACT(payload, '$._cdn_provider') as cdn,
COUNT(0) as totalSS,
ROUND(COUNT(0) * 100 / _______, 2) as pct
FROM
`httparchive.requests.2020_07_01_desktop`,
UNNEST(SPLIT(extractHeader(payload, 'Set-Cookie'), ';')) AS directive
WHERE
STARTS_WITH(TRIM(directive), 'SameSite')
GROUP BY
cdn
ORDER BY
pct DESC
I haven’t tested this at all but at a glance, maybe something like this would do the trick?
SELECT
JSON_EXTRACT(payload, '$._cdn_provider') as cdn,
COUNTIF(STARTS_WITH(TRIM(directive), 'SameSite')) as totalSS,
COUNT(0) AS total,
ROUND(COUNTIF(STARTS_WITH(TRIM(directive), 'SameSite')) * 100 / COUNT(0), 2) as pct
FROM
`httparchive.requests.2020_07_01_desktop`,
UNNEST(SPLIT(extractHeader(payload, 'Set-Cookie'), ';')) AS directive
GROUP BY
cdn
ORDER BY
pct DESC
Worked perfectly @rviscomi, thanks! Just one minor typo though for anyone else who may attempt to run this, there should only be two parentheses after ‘SameSite’ right before the FROM line.
Also, @rviscomi any idea on how to narrow this down to only use base pages rather than including every URL? I was planning on using the value within headers from the payload to get the base URL, but I’m not quite sure how to make that work with the directive in this framework.