Top Base Page CDNs for Top URLs

I recently started recording the CDN value in HTTP Archive. This is the CDN used for the base HTML page, as determined by WebPagetest (relevant code). This set of queries might be controversial, so if you see flaws or caveats, please comment. One caveat is that “Google” includes a lot of blogs that run on Blogger (so classifying that as using Google as a “CDN” is debatable).

It’s interesting how Akamai leads for the topmost sites, but when we look across all 300K URLs, Google and Cloudflare rise to the top. This might be due to the cost of entry.

Top 1,000 Websites

```
-- Share of base HTML pages per CDN among the top 1,000 URLs (BigQuery legacy SQL)
SELECT cdn, ROUND(ratio*1000)/10 AS percent
FROM (
  SELECT cdn, COUNT(*) AS total,
    RATIO_TO_REPORT(total) OVER() AS ratio
  FROM httparchive:runs.latest_pages
  WHERE rank <= 1000
  GROUP BY cdn
)
ORDER BY percent DESC;
```
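To make the query's arithmetic concrete, here is a minimal Python sketch of what it computes. It assumes the rows are available locally as hypothetical `(rank, cdn)` pairs rather than in BigQuery, and `cdn_share` is a hypothetical helper. `RATIO_TO_REPORT(total) OVER()` divides each CDN's count by the sum of all the group counts. The rounding step (multiply the ratio by 1000, round, divide by 10) turns that fraction into a percentage with one decimal place. The `max_rank` parameter plays the role of the `1000` in the `WHERE` clause, so the same function covers the 10,000, 100,000, and 300,000 variants below.

```python
import math
from collections import Counter


def cdn_share(pages, max_rank):
    """Percent of base pages per CDN among pages with rank <= max_rank.

    pages: iterable of (rank, cdn) tuples, a hypothetical local stand-in
    for rows of httparchive:runs.latest_pages.
    """
    # WHERE rank <= max_rank ... GROUP BY cdn -> COUNT(*) AS total
    totals = Counter(cdn for rank, cdn in pages if rank <= max_rank)
    grand_total = sum(totals.values())
    shares = []
    for cdn, total in totals.items():
        # RATIO_TO_REPORT(total) OVER(): this group's fraction of all rows
        ratio = total / grand_total
        # ROUND(ratio*1000)/10: percentage with one decimal place.
        # floor(x + 0.5) rounds halves up (x is never negative here);
        # Python's built-in round() would round halves to even instead.
        percent = math.floor(ratio * 1000 + 0.5) / 10
        shares.append((cdn, percent))
    # ORDER BY percent DESC; ties broken by name so the output is stable
    shares.sort(key=lambda row: (-row[1], row[0]))
    return shares
```

For example, `cdn_share([(1, "Akamai"), (2, "Google"), (3, "Cloudflare"), (4, "Akamai")], 1000)` gives Akamai 50.0 and the other two 25.0 each. The half-up rounding assumes BigQuery's `ROUND` rounds halves away from zero, which is the same thing for non-negative values.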

Top 10,000 Websites

(Change "1000" to "10000" in the query.)

Top 100,000 Websites

(Change "1000" to "100000" in the query.)

Top 300,000 Websites

(Change "1000" to "300000" in the query.)


@souders fascinating! Do you have a link to the set of rules that WPT uses to group these?

I’m also curious about lxdns.com. I’m not familiar with it, and the domain itself doesn’t return anything.

Here’s the WebPagetest code: http://code.google.com/p/webpagetest/source/browse/trunk/agent/browser/ie/pagetest/cdn.h

I updated the original post to clarify: this is the CDN value for the BASE HTML PAGE. I’m not currently recording the CDN for subresources.
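The rules in cdn.h amount to a lookup table that pairs hostname patterns with CDN names. Below is a minimal Python sketch of that idea. It assumes the classifier walks the names the host resolves through (its CNAME chain) and treats each pattern as a substring match. The table is an illustrative subset, partly drawn from domains mentioned in this thread, and is not a copy of cdn.h. `classify_cdn` is a hypothetical helper. DNS resolution is left out, so the chain is passed in directly.

```python
# Illustrative (pattern, CDN name) pairs; NOT a copy of WebPagetest's cdn.h.
CDN_PATTERNS = [
    (".akamai.net", "Akamai"),
    (".akamaiedge.net", "Akamai"),
    (".cloudfront.net", "Amazon CloudFront"),
    (".lxdns.com", "China Net Center"),
    (".panthercdn.com", "CDNetworks"),  # Panther was merged into CDNetworks
    ("ghs.google.com", "Google"),       # Blogger custom domains point here
]


def classify_cdn(cname_chain, patterns=CDN_PATTERNS):
    """Return the CDN for the first name in the chain matching a pattern.

    cname_chain: the host followed by the names it resolves through
    (hypothetical input; no DNS lookups happen here). Returns None when
    nothing matches, i.e. no CDN was detected.
    """
    for name in cname_chain:
        # Normalize case and drop a trailing root dot ("host.example.").
        name = name.lower().rstrip(".")
        for pattern, cdn in patterns:
            # Assumed matching rule: the pattern appears anywhere in the name.
            if pattern in name:
                return cdn
    return None
```

For example, `classify_cdn(["blog.example.com", "ghs.google.com"])` returns `"Google"`. This is the Blogger case from the original post: the page is labeled as served by Google even though Blogger is a hosting platform rather than a CDN in the usual sense.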

@igrigorik lxdns.com is owned by China Net Center, a CDN and hosting service provider in China. http://sydney.abongo.com/investigate/lxdns.com/whois

@souders I looked into cdn.h and would like to fix some entries for CDNetworks. ‘Panther’ (panthercdn.com) was merged into CDNetworks in 2009, and some suffixes are missing. Can you update cdn.h? I have uploaded my patch at: http://pastebin.com/pygV2ERr

@junhochoi thanks for the pointer. Re: patch: WPT is now on GitHub [1], so the easiest way would be to open a pull request there.

[1] https://github.com/WPO-Foundation/webpagetest

@igrigorik done. thanks. https://github.com/WPO-Foundation/webpagetest/pull/117

@souders

> This is the CDN used for the base HTML page

This is what makes your result set very interesting: it effectively shows the CDNs that accelerate dynamic content, since the base HTML is highly unlikely to pass through a CDN otherwise (barring the small percentage of sites whose HTML is itself cacheable).

The best data point is Amazon CloudFront, which launched its Whole Site Delivery service only in May 2012 and already has 0.1% of the top 10,000, whereas AT&T, which has been around for a while, is stuck at the same percentage.

A minor nit: Cotendo is now part of Akamai, so its 0.1% should be merged into Akamai’s share. I shall submit a pull request patterned after @junhochoi’s patch.
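Folding an acquired CDN into its new owner can also be done after the fact, because every percentage in one result set shares the same denominator. Below is a minimal Python sketch. The alias map is hypothetical, and only its Cotendo-to-Akamai and Panther-to-CDNetworks mappings come from this thread. `merge_aliases` is a hypothetical helper that takes `(cdn, percent)` rows shaped like the query's output.

```python
# Hypothetical alias map: acquired CDN -> current owner (pairs from this thread).
CDN_ALIASES = {
    "Cotendo": "Akamai",
    "Panther": "CDNetworks",
}


def merge_aliases(shares, aliases=CDN_ALIASES):
    """Combine (cdn, percent) rows whose CDNs now belong to the same owner."""
    merged = {}
    for cdn, percent in shares:
        owner = aliases.get(cdn, cdn)
        merged[owner] = merged.get(owner, 0.0) + percent
    # Re-round to one decimal to hide float noise from the additions,
    # then order by percent descending (ties by name), like the query.
    rows = [(cdn, round(total, 1)) for cdn, total in merged.items()]
    rows.sort(key=lambda row: (-row[1], row[0]))
    return rows
```

Summing percentages that were already rounded can be off by 0.1 compared with mapping the names before `GROUP BY`, for example with a `CASE` expression in the query. Fixing the name at the source, as a cdn.h patch does, avoids that discrepancy entirely.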