CDN Usage - December 2018


#1

As of December 15th, 2018, the HTTP Archive crawls the full list of desktop origins from the Chrome User Experience Report (CrUX) for its desktop crawls (mobile will be added as of January 1, 2019). The URL list used is the latest available at the time of the crawl (November 2018 in this case).

That change makes the HTTP Archive data set much more useful for across-the-web analysis of all the sites that get a non-trivial amount of traffic in any given month. The CrUX dataset is MUCH cleaner than any of the Alexa lists: it records all page navigations and includes the FQDN, not just the registered domain (so for blogspot, for example, it records each individual blog that gets sufficient traffic, not just blogspot.com). The data is skewed toward sites that get visits from Chrome, so some long-tail sites in some regions may be under-represented.

W3Techs reports reverse-proxy usage across the web using the Alexa 10M list of domains and shows CDN usage at around 10%. That seemed a bit lower than I expected, so I decided to compare it to what the HTTP Archive sees using the CrUX origins list:

#standardSQL
SELECT
  RTRIM(LTRIM(JSON_EXTRACT(payload, '$._base_page_cdn'), "\""), "\"") AS cdn,
  COUNT(*) AS freq
FROM
  `httparchive.pages.2018_12_15_desktop`
GROUP BY
  cdn
ORDER BY
  freq DESC

Results:

| Row | cdn | freq |
|---|---|---|
| 1 | (none) | 3163368 |
| 2 | Cloudflare | 307118 |
| 3 | Google | 155957 |
| 4 | Akamai | 51538 |
| 5 | Amazon CloudFront | 47140 |
| 6 | Fastly | 41848 |
| 7 | WordPress | 17993 |
| 8 | Incapsula | 17394 |
| 9 | Sucuri Firewall | 12961 |
| 10 | OVH CDN | 3278 |
| 11 | Cloudflare, Fastly | 2571 |
| 12 | Netlify | 2499 |
| 13 | Edgecast | 1815 |
| 14 | CDN | 1766 |
| 15 | CDNetworks | 1243 |
| 16 | Google, Cloudflare | 1198 |
| 17 | Amazon CloudFront, Cloudflare | 966 |
| 18 | ChinaNetCenter | 660 |
| 19 | Yunjiasu | 615 |
| 20 | Limelight | 608 |
| 21 | GoCache | 566 |
| 22 | Zenedge | 545 |
| 23 | Microsoft Azure | 522 |
| 24 | Instart Logic | 514 |
| 25 | Cedexis | 447 |
| 26 | section.io | 440 |
| 27 | StackPath | 402 |
| 28 | Level 3 | 393 |
| 29 | Highwinds | 344 |
| 30 | Azion | 328 |
| 31 | ChinaCache | 326 |

With 3840067 rows in the dataset, that gives us:

| Row | cdn | freq | Percent |
|---|---|---|---|
| 1 | (none) | 3163368 | 82.38% |
| 2 | Cloudflare | 307118 | 8.00% |
| 3 | Google | 155957 | 4.06% |
| 4 | Akamai | 51538 | 1.34% |
| 5 | Amazon CloudFront | 47140 | 1.23% |
| 6 | Fastly | 41848 | 1.09% |
| 7 | WordPress | 17993 | 0.47% |
| 8 | Incapsula | 17394 | 0.45% |
| 9 | Sucuri Firewall | 12961 | 0.34% |
| 10 | OVH CDN | 3278 | 0.09% |
| 11 | Cloudflare, Fastly | 2571 | 0.07% |
| 12 | Netlify | 2499 | 0.07% |
| 13 | Edgecast | 1815 | 0.05% |
| 14 | CDN | 1766 | 0.05% |
| 15 | CDNetworks | 1243 | 0.03% |
| 16 | Google, Cloudflare | 1198 | 0.03% |
| 17 | Amazon CloudFront, Cloudflare | 966 | 0.03% |
| 18 | ChinaNetCenter | 660 | 0.02% |
| 19 | Yunjiasu | 615 | 0.02% |
| 20 | Limelight | 608 | 0.02% |
| 21 | GoCache | 566 | 0.01% |
| 22 | Zenedge | 545 | 0.01% |
| 23 | Microsoft Azure | 522 | 0.01% |
| 24 | Instart Logic | 514 | 0.01% |
| 25 | Cedexis | 447 | 0.01% |
| 26 | section.io | 440 | 0.01% |
| 27 | StackPath | 402 | 0.01% |
| 28 | Level 3 | 393 | 0.01% |
| 29 | Highwinds | 344 | 0.01% |
| 30 | Azion | 328 | 0.01% |
| 31 | ChinaCache | 326 | 0.01% |
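
For anyone who wants the Percent column straight out of BigQuery rather than computing it afterwards, here is a rough sketch. It assumes the same table and `_base_page_cdn` field as the query above, and relies on wrapping the grouped `COUNT(*)` in an analytic `SUM(...) OVER ()` to get the dataset total. The `percent` alias is just an illustrative name, and I have not re-run this against the December crawl, so treat it as a starting point.

```sql
#standardSQL
-- Sketch: per-CDN share of all pages in the crawl.
-- SUM(COUNT(*)) OVER () totals the grouped counts across every row
-- of the result, which gives the denominator for the percentage.
SELECT
  RTRIM(LTRIM(JSON_EXTRACT(payload, '$._base_page_cdn'), "\""), "\"") AS cdn,
  COUNT(*) AS freq,
  ROUND(100 * COUNT(*) / SUM(COUNT(*)) OVER (), 2) AS percent  -- illustrative alias
FROM
  `httparchive.pages.2018_12_15_desktop`
GROUP BY
  cdn
ORDER BY
  freq DESC
```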

About 18% of origins with visitor activity in November 2018 use a CDN (100% − 82.38% = 17.62%), with Cloudflare alone accounting for 8% of all origins. The results are only as accurate as WebPageTest’s detection: they may over-count Google (anything served by a GFE is considered to be on a Google CDN) and miss any CDNs it doesn’t know about, but they should be in the ballpark.


#2

Tip: to get the string value without the quotes, use JSON_EXTRACT_SCALAR. :smile:
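
As a sketch of what that looks like, here is the query from the first post with the `RTRIM`/`LTRIM` wrapper swapped for `JSON_EXTRACT_SCALAR`. The table and JSON path are copied from that post. One thing to keep in mind: `JSON_EXTRACT_SCALAR` returns NULL when the path is missing or the value is not a scalar, so the grouping of empty values may not match the original query row for row.

```sql
#standardSQL
-- Sketch: JSON_EXTRACT_SCALAR returns the unquoted string value,
-- so the manual quote trimming is no longer needed.
SELECT
  JSON_EXTRACT_SCALAR(payload, '$._base_page_cdn') AS cdn,
  COUNT(*) AS freq
FROM
  `httparchive.pages.2018_12_15_desktop`
GROUP BY
  cdn
ORDER BY
  freq DESC
```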


#3

Very insightful @patmeenan, thanks.

So Cloudflare’s usage is nearly as much as rows 3 to 7 combined (Google + Akamai + CloudFront + Fastly + WordPress)!


#4

Ahh, good to know, thanks. Never understood why it wasn’t actually extracting the values themselves which seems a little nuts.


#5

@patmeenan for rows that have two CDNs in them, does that imply single domains load balancing between CDNs?


#6

No, those are generally layered, i.e. Cloudflare in front of CloudFront for some reason. The data is all collected from a single page load, and those responses carry the headers from both CDNs.
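
If you want each provider counted on its own regardless of layering, something along these lines might work. It is only a sketch: it assumes multiple CDNs are always joined with `", "` (as in the "Cloudflare, Fastly" rows above), and `provider` is just an illustrative alias. A layered page is counted once for each CDN in front of it, so the counts will add up to more than the number of pages.

```sql
#standardSQL
-- Sketch: split layered values such as "Cloudflare, Fastly" into one
-- row per provider, then count pages per individual provider.
SELECT
  provider,  -- illustrative alias for a single CDN name
  COUNT(*) AS freq
FROM
  `httparchive.pages.2018_12_15_desktop`,
  UNNEST(SPLIT(JSON_EXTRACT_SCALAR(payload, '$._base_page_cdn'), ', ')) AS provider
WHERE
  provider != ''  -- skip pages where no CDN was detected
GROUP BY
  provider
ORDER BY
  freq DESC
```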


#7

i see, thank you for the clarification.