After the recent Twitter discussion of Barclays including a script from web.archive.org, here's a query that attempts to discover how many other sites are doing something similar:
SELECT
  pages.pageid,
  pages.url AS pages_url,
  requests.url AS requests_url
FROM
  `httparchive.summary_pages.2020_05_01_mobile` AS pages
JOIN
  `httparchive.summary_requests.2020_05_01_mobile` AS requests
ON
  pages.pageid = requests.pageid
WHERE
  requests.url LIKE "https://web.archive.org%"
  AND pages.url NOT LIKE "https://web.archive.org%"
GROUP BY
  pages.pageid,
  pages.url,
  requests.url
The result set has 4,364 resources requested from web.archive.org, and some pages include multiple resources from there: https://docs.google.com/spreadsheets/d/1Y-TLGPRlupaLKPYncF_MSw0x3rwCGgZDp63fC7x7Sb8/edit?usp=sharing
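To see which pages pull in the most resources from web.archive.org, something like the following might work. It is only a sketch against the same two tables; the `archive_resources` column name is made up for illustration, and I haven't checked its output against the spreadsheet above.

```sql
-- Count distinct web.archive.org resources per page, busiest pages first.
SELECT
  pages.pageid,
  pages.url AS pages_url,
  COUNT(DISTINCT requests.url) AS archive_resources  -- illustrative column name
FROM
  `httparchive.summary_pages.2020_05_01_mobile` AS pages
JOIN
  `httparchive.summary_requests.2020_05_01_mobile` AS requests
ON
  pages.pageid = requests.pageid
WHERE
  requests.url LIKE "https://web.archive.org%"
  AND pages.url NOT LIKE "https://web.archive.org%"
GROUP BY
  pages.pageid,
  pages.url
ORDER BY
  archive_resources DESC
```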
Deduplicating the pages shows there are 838 sites making the requests:
SELECT
  DISTINCT pages.url
FROM
  `httparchive.summary_pages.2020_05_01_mobile` AS pages
JOIN
  `httparchive.summary_requests.2020_05_01_mobile` AS requests
ON
  pages.pageid = requests.pageid
WHERE
  requests.url LIKE "https://web.archive.org%"
  AND pages.url NOT LIKE "https://web.archive.org%"
Results: https://docs.google.com/spreadsheets/d/12J_mrvR7t0fxU-t-SuduYhTYR94S0G6k1awuyrEMmQY/edit?usp=sharing
Both queries only check for resources requested over HTTPS, so there may be more that are requested via HTTP.
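One way to cover both schemes would be to swap the LIKE filters for a regular expression. This is an untested sketch that assumes BigQuery standard SQL (for `REGEXP_CONTAINS`), so the counts it returns may differ from the ones above.

```sql
-- Distinct pages requesting anything from web.archive.org over HTTP or HTTPS.
SELECT
  DISTINCT pages.url
FROM
  `httparchive.summary_pages.2020_05_01_mobile` AS pages
JOIN
  `httparchive.summary_requests.2020_05_01_mobile` AS requests
ON
  pages.pageid = requests.pageid
WHERE
  -- "https?" matches both the http:// and https:// schemes
  REGEXP_CONTAINS(requests.url, r"^https?://web\.archive\.org")
  -- exclude pages that are themselves served from web.archive.org
  AND NOT REGEXP_CONTAINS(pages.url, r"^https?://web\.archive\.org")
```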