What's the popularity of different CDNs?

I always want know the rank of CDNs on Webpagetest, below I share my queries . I’m using all the tests starting from November of 2014.

SELECT cdn, num, ROUND(ratio*100) as percent FROM (
  SELECT cdn, COUNT(cdn) as num, RATIO_TO_REPORT(num) OVER() ratio   FROM (
     SELECT CASE 
       WHEN _cdn_provider IN ('')
         THEN 'None'
         ELSE 'CDN'
       END as cdn
      FROM httparchive:runs.2014_11_01_requests,  
           httparchive:runs.2014_11_15_requests,
           httparchive:runs.2014_11_01_requests_mobile,
           httparchive:runs.2014_11_15_requests_mobile
   ) GROUP BY cdn
) ORDER BY percent DESC

I thought that CDN was more used, and I hoped that CDNs would be above 30%. Considering only HTTP requests on CDN’s, it’s a surprise for me that google is first. Could Google be a CDN provider?

SELECT provider, round(100*ratio) as percent, num 
FROM (SELECT REGEXP_REPLACE(_cdn_provider,r'^$', 'None') as provider, count(*) as num, RATIO_TO_REPORT(num) OVER() ratio
 FROM httparchive:runs.2014_11_01_requests,  
      httparchive:runs.2014_11_15_requests,
      httparchive:runs.2014_11_01_requests_mobile,
      httparchive:runs.2014_11_15_requests_mobile
 WHERE  _cdn_provider != ''
GROUP BY provider   
)
ORDER BY num desc LIMIT 10 

Then, what’s the rank hostnames inside Google as CDN?

SELECT req_host, count(req_host) as num
FROM httparchive:runs.2014_11_01_requests,  
     httparchive:runs.2014_11_15_requests,
     httparchive:runs.2014_11_01_requests_mobile,
     httparchive:runs.2014_11_15_requests_mobile
WHERE _cdn_provider = 'Google'
GROUP BY req_host
ORDER BY num DESC
LIMIT 20

I’m curios. 3rd party objects are Google hosted like videos, jquery/angularjs, etc. could they be CDN hosted?

Then, discard Google and HTTP requests without CDN. … Considering only CDNs that users “pay” to delivery content, the rank is:

SELECT provider, round(100*ratio) as percent, num 
FROM (SELECT REGEXP_REPLACE(_cdn_provider,r'^$', 'None') as provider, count(*) as num, RATIO_TO_REPORT(num) OVER() ratio
  FROM httparchive:runs.2014_11_01_requests,  
       httparchive:runs.2014_11_15_requests,
       httparchive:runs.2014_11_01_requests_mobile,
       httparchive:runs.2014_11_15_requests_mobile
  WHERE _cdn_provider != 'Google' AND _cdn_provider != ''
GROUP BY provider   
)
ORDER BY num desc

1 Like

@fernandoike fun analysis, great work! Since the bulk of the post is analyzing CDN use, I’ve updated the title to reflect that, such that more people can find it in the future. If you feel it’s incorrect, please feel free to hit the revert button. Also, added syntax highlighting to the queries. :smile:

1 Like

Hey @igrigorik! Thanks for your revision. Your suggestion to title is better. Thanks! :smile:

Thanks for this smart idea here. Having the CDN data in WPT is fantastic, and keeping it in HTTP Archive is also brilliant. Thanks for the great data!

One interesting side effect of the synthetic test data is that you can’t see examples of the value of different CDNs for a given site. Many sites these days) often use 2 or more CDNs for the same content, but the test run will only see one at a time per resource (obviously).

Even so, I was curious about the distribution and size of CDN usage for top sites. I expanded the query to pull out the top ~100 sites by rank and then look at their resources to see which CDNs are used (sorry for the double query here - I need to brush up on my sql a bit!):

// rank, pageid and url for top ~100
SELECT rank, pageid, urlShort
	FROM httparchive:runs.2015_01_01_pages
WHERE rank < 110
ORDER BY rank ASC

// select alexa top 100
SELECT pageid, urlShort, req_host, _cdn_provider, respSize
      FROM httparchive:runs.2015_01_01_requests,  
           httparchive:runs.2015_01_01_requests_mobile
WHERE pageid IN (21833830,21833831,21833834,21833838,21833841,21833843,21833845,21833849,21833851,21833855,21833858,21833860,21833864,21833872,21833879,21833880,21833886,21833887,21833888,21833893,21833895,21833899,21833901,21833902,21833905,21833907,21833909,21833910,21833914,21833915,21833919,21833922,21833924,21833926,21833929,21833932,21833935,21833938,21833945,21833947,21833953,21833956,21833957,21833961,21833963,21833964,21833965,21833969,21833974,21833979,21833981,21833986,21833991,21833993,21834000,21834001,21834006,21834011,21834015,21834021,21834024,21834025,21834027,21834029,21834030,21834033,21834034,21834037,21834041,21834047,21834048,21834051,21834052,21834053,21834055,21834056,21834057,21834058,21834062,21834065,21834066,21834067,21834068,21834069,21834070,21834072,21834073,21834074,21834075,21834076,21834077,21834078,21834080,21834081,21834082,21834083,21834084,21834085,21834087,21834120,21834126,21834134,21834138,22325423,22325439,22325463)

When you remove Google, Yahoo and blanks, the data shows a lot of Akamai, as you’d expect, and quite a few others out there for these top ranked sites.

1 Like