Wappalyzer provides detection logic for over 1,000 web technologies, grouped into categories such as CMS and JavaScript frameworks.
We can query these detections in the technologies dataset, for example to count the number of WordPress websites:
#standardSQL
# Counts 906,033 distinct URLs
SELECT COUNT(DISTINCT url)
FROM `httparchive.technologies.2019_03_01_desktop`
WHERE app = 'WordPress'
This query processes only about 1 GB of data.
If we don’t care whether a page was tested with the desktop or the mobile client, we can use a wildcard in place of the client name in the table name:
#standardSQL
# Counts 1,213,389 distinct URLs
SELECT COUNT(DISTINCT url)
FROM `httparchive.technologies.2019_03_01_*`
WHERE app = 'WordPress'
Note that this processes about twice as much data, because it queries both the desktop and mobile tables.
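A wildcard query can also keep the two clients apart. The sketch below relies on BigQuery's _TABLE_SUFFIX pseudo-column, which holds the part of the table name matched by the * (here desktop or mobile), to break the WordPress count down per client. The aliases client and websites are arbitrary names chosen for illustration. Since it reads the same url and app columns as the wildcard query, it should process roughly the same amount of data.

```sql
#standardSQL
# Breaks the WordPress count down by client.
# _TABLE_SUFFIX is whatever the * matched: 'desktop' or 'mobile'.
SELECT
  _TABLE_SUFFIX AS client,
  COUNT(DISTINCT url) AS websites
FROM `httparchive.technologies.2019_03_01_*`
WHERE app = 'WordPress'
GROUP BY client
ORDER BY client
```

The desktop row should match the 906,033 from the first query. The two per-client counts can add up to more than the combined 1,213,389, because COUNT(DISTINCT url) over the wildcard counts a URL only once even if it appears in both tables.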
For a list of all possible app names, see https://www.wappalyzer.com/technologies
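The Wappalyzer list covers every technology it can detect, but not all of them necessarily show up in a given crawl. A query along these lines is one way to see which app values actually occur in the desktop table and how often; the LIMIT of 20 is an arbitrary choice for illustration.

```sql
#standardSQL
# Lists the 20 apps detected on the most distinct URLs in the desktop crawl
SELECT
  app,
  COUNT(DISTINCT url) AS websites
FROM `httparchive.technologies.2019_03_01_desktop`
GROUP BY app
ORDER BY websites DESC
LIMIT 20
```

BigQuery bills by the columns a query reads rather than by the rows that match its WHERE clause, so this query, which reads only the url and app columns, should process about as much data as the WordPress query on the same table.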