I recently started recording the CDN value in HTTP Archive. This is the CDN used for the base HTML page, as determined by WebPagetest (relevant code). This set of queries might be controversial. If you see flaws or caveats, please comment. One caveat is that “Google” includes a lot of blogs that run on Blogger (so classifying those as using Google as a “CDN” is debatable).
It’s interesting that Akamai leads for the topmost sites, but when we look across all 300K URLs, Google and Cloudflare rise to the top. This might be due to cost of entry.
Top 1,000 Websites
```
SELECT cdn, ROUND(ratio*1000)/10 AS percent FROM (
  SELECT cdn, COUNT(*) AS total, RATIO_TO_REPORT(total) OVER() AS ratio
  FROM [httparchive:runs.latest_pages] WHERE rank <= 1000
  GROUP BY cdn) ORDER BY percent DESC;
```
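For anyone who wants to sanity-check the arithmetic outside BigQuery, here is a rough Python sketch of what the query computes: count pages per CDN, divide by the total, and round to one decimal place. The `cdn_percentages` helper, the `(rank, cdn)` row shape, and the sample rows are all made up for illustration; they are not HTTP Archive data or its real schema.

```python
from collections import Counter

def cdn_percentages(pages, max_rank=1000):
    """Return (cdn, percent) pairs, largest share first.

    `pages` is an iterable of (rank, cdn) tuples; this shape is an
    assumption for the sketch, not the real table schema.
    """
    counts = Counter(cdn for rank, cdn in pages if rank <= max_rank)
    total = sum(counts.values())
    if total == 0:
        return []
    # Same formula as the query: round(ratio * 1000) / 10 keeps one decimal.
    # Python's round() sends exact .5 ties to the even neighbour, which may
    # differ from BigQuery's rounding in rare cases.
    shares = [(cdn, round(n / total * 1000) / 10) for cdn, n in counts.items()]
    return sorted(shares, key=lambda item: item[1], reverse=True)

# Hypothetical rows, not real measurements. The last row is outside the top 1,000.
sample = [(1, "Google"), (2, "Akamai"), (3, "Akamai"),
          (4, "Cloudflare"), (5000, "Cloudflare")]
result = cdn_percentages(sample)
```

Changing `max_rank` plays the same role as changing the `rank <= 1000` filter in the query.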
@souders I looked into cdn.h and want to fix some entries for CDNetworks. ‘Panther’ (panthercdn.com) was merged into CDNetworks in 2009 and some suffixes are missing… can you update cdn.h? I have uploaded my patch at: http://pastebin.com/pygV2ERr
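To illustrate why a missing suffix matters, here is a minimal Python sketch. It assumes detection works by matching a hostname against a table of known suffixes, which is what the mention of a missing suffix suggests. The table, the `detect_cdn` helper, and the placeholder entry are hypothetical and are not copied from cdn.h.

```python
# Hypothetical suffix table; the real entries live in WebPagetest's cdn.h.
CDN_SUFFIXES = {
    ".panthercdn.com": "CDNetworks",   # Panther was merged into CDNetworks
    ".example-cdn.net": "ExampleCDN",  # placeholder entry for illustration
}

def detect_cdn(hostname, table=CDN_SUFFIXES):
    """Return the CDN whose suffix matches `hostname`, or None if none does."""
    host = hostname.lower().rstrip(".")
    for suffix, cdn in table.items():
        if host.endswith(suffix):
            return cdn
    return None

with_entry = detect_cdn("img.panthercdn.com")
# Without the Panther suffix in the table, the same host goes undetected.
without_entry = detect_cdn("img.panthercdn.com",
                           table={".example-cdn.net": "ExampleCDN"})
```

With a scheme like this, any host whose suffix is absent from the table is simply counted as not using that CDN, so a missing entry lowers that CDN's reported share.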
This is what makes your result set very interesting: it clearly identifies the CDNs that provide acceleration of dynamic content (otherwise the base HTML is highly unlikely to pass through a CDN, barring the small percentage of cases where the HTML itself is cacheable).
The best data point is Amazon CloudFront, which launched its Whole Site Delivery service only in May 2012 and sees 0.1% of the top 10,000, whereas AT&T, which has been around for a while, is stuck at the same percentage.
A minor nit is that Cotendo is now a part of Akamai, so that 0.1% should also be merged into Akamai’s share. I shall submit a pull request patterned after @junhochoi’s patch.
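Until the detection list is updated, the merge can also be approximated after the fact on the query output. A minimal Python sketch, assuming per-CDN counts are already in hand; the alias map, the `merge_cdn_counts` helper, and the counts are hypothetical and are not real results.

```python
from collections import Counter

# Hypothetical alias map: old label -> label it should be reported under.
CDN_ALIASES = {"Cotendo": "Akamai"}

def merge_cdn_counts(counts, aliases=CDN_ALIASES):
    """Fold counts for acquired or renamed CDNs into their current label."""
    merged = Counter()
    for cdn, n in counts.items():
        merged[aliases.get(cdn, cdn)] += n
    return merged

# Made-up counts for illustration only.
raw = {"Akamai": 40, "Cotendo": 1, "Google": 30}
merged = merge_cdn_counts(raw)
```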