As of December 15th 2018, the HTTP Archive's desktop crawls cover the full list of desktop origins from the Chrome User Experience Report (CrUX), and mobile crawls will follow as of January 1, 2019. Each crawl uses the latest CrUX URL list available at the time of the crawl (November 2018 in this case).
That change makes the HTTP Archive dataset much more useful for web-wide analysis of all of the sites that get a non-trivial amount of traffic in any given month. The CrUX dataset is much cleaner than any of the Alexa lists. It records all page navigations and includes the full hostname (FQDN), not just the registered domain, so for Blogspot it lists each individual blog that gets sufficient traffic, not just blogspot.com. The data is skewed toward sites visited by Chrome users, so some long-tail sites in some regions may be under-represented.
W3Techs reports reverse-proxy usage across the web based on the Alexa top 10 million domains and puts CDN usage at around 10%. That seemed lower than I expected, so I decided to compare it with what the HTTP Archive sees using the CrUX origins list:
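```sql
#standardSQL
SELECT
  -- _base_page_cdn is WebPageTest's CDN detection for the base page.
  -- JSON_EXTRACT returns it as a quoted JSON string, so strip the quotes.
  RTRIM(LTRIM(JSON_EXTRACT(payload, '$._base_page_cdn'), '"'), '"') AS cdn,
  COUNT(*) AS freq
FROM
  `httparchive.pages.2018_12_15_desktop`
GROUP BY
  cdn
ORDER BY
  freq DESC
```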
Results:
Row | cdn | freq
---|---|---
1 | | 3163368
2 | Cloudflare | 307118
3 | | 155957
4 | Akamai | 51538
5 | Amazon CloudFront | 47140
6 | Fastly | 41848
7 | WordPress | 17993
8 | Incapsula | 17394
9 | Sucuri Firewall | 12961
10 | OVH CDN | 3278
11 | Cloudflare, Fastly | 2571
12 | Netlify | 2499
13 | Edgecast | 1815
14 | CDN | 1766
15 | CDNetworks | 1243
16 | Google, Cloudflare | 1198
17 | Amazon CloudFront, Cloudflare | 966
18 | ChinaNetCenter | 660
19 | Yunjiasu | 615
20 | Limelight | 608
21 | GoCache | 566
22 | Zenedge | 545
23 | Microsoft Azure | 522
24 | Instart Logic | 514
25 | Cedexis | 447
26 | section.io | 440
27 | StackPath | 402
28 | Level 3 | 393
29 | Highwinds | 344
30 | Azion | 328
31 | ChinaCache | 326
With 3840067 rows in the dataset, that gives us:
Row | cdn | freq | Percent
---|---|---|---
1 | | 3163368 | 82.38%
2 | Cloudflare | 307118 | 8.00%
3 | | 155957 | 4.06%
4 | Akamai | 51538 | 1.34%
5 | Amazon CloudFront | 47140 | 1.23%
6 | Fastly | 41848 | 1.09%
7 | WordPress | 17993 | 0.47%
8 | Incapsula | 17394 | 0.45%
9 | Sucuri Firewall | 12961 | 0.34%
10 | OVH CDN | 3278 | 0.09%
11 | Cloudflare, Fastly | 2571 | 0.07%
12 | Netlify | 2499 | 0.07%
13 | Edgecast | 1815 | 0.05%
14 | CDN | 1766 | 0.05%
15 | CDNetworks | 1243 | 0.03%
16 | Google, Cloudflare | 1198 | 0.03%
17 | Amazon CloudFront, Cloudflare | 966 | 0.03%
18 | ChinaNetCenter | 660 | 0.02%
19 | Yunjiasu | 615 | 0.02%
20 | Limelight | 608 | 0.02%
21 | GoCache | 566 | 0.01%
22 | Zenedge | 545 | 0.01%
23 | Microsoft Azure | 522 | 0.01%
24 | Instart Logic | 514 | 0.01%
25 | Cedexis | 447 | 0.01%
26 | section.io | 440 | 0.01%
27 | StackPath | 402 | 0.01%
28 | Level 3 | 393 | 0.01%
29 | Highwinds | 344 | 0.01%
30 | Azion | 328 | 0.01%
31 | ChinaCache | 326 | 0.01%
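If you would rather have BigQuery produce the Percent column directly instead of dividing each count by the row total by hand, a window function over the grouped counts can do it in a single pass. This is a sketch, not necessarily how the table above was produced: it assumes the same `httparchive.pages.2018_12_15_desktop` table and swaps the RTRIM/LTRIM quote stripping for `JSON_EXTRACT_SCALAR`, which returns the unquoted string value.

```sql
#standardSQL
SELECT
  -- JSON_EXTRACT_SCALAR returns the string without the surrounding JSON quotes
  -- (swapped in for the RTRIM/LTRIM approach; same result for string values).
  JSON_EXTRACT_SCALAR(payload, '$._base_page_cdn') AS cdn,
  COUNT(*) AS freq,
  -- SUM(COUNT(*)) OVER () adds up the per-group counts across all groups,
  -- i.e. the total number of rows in the table.
  ROUND(100 * COUNT(*) / SUM(COUNT(*)) OVER (), 2) AS percent
FROM
  `httparchive.pages.2018_12_15_desktop`
GROUP BY
  cdn
ORDER BY
  freq DESC
```

Because the window sums across every group, including the blank `cdn` rows, the denominator is the full row count (3840067 for this crawl) and the `percent` column should add up to roughly 100.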
Roughly 18% of origins with visitor activity in November 2018 use a CDN (100% minus the 82.38% with no CDN detected), and 8% of all origins are on Cloudflare. The results are only as accurate as WebPageTest's CDN detection: it may over-count Google (anything served by a Google Front End (GFE) is treated as being on a Google CDN) and will miss any CDNs it doesn't know about, but the numbers should be in the ballpark.