Websites with cache-control: no-store, by CDN

I recently queried the HTTP Archive to get the number of websites (pages) that serve the HTML with response header cache-control contains no-store .

My query:

SELECT
  _TABLE_SUFFIX AS client,
  count(DISTINCT pageid) AS pages,
  pages_total,
  count(DISTINCT pageid) / pages_total AS pages_pct,
FROM
  `httparchive.summary_requests.2021_08_01_*`
JOIN
  (SELECT _TABLE_SUFFIX, COUNT(DISTINCT pageid) AS pages_total FROM `httparchive.summary_requests.2021_08_01_*` GROUP BY _TABLE_SUFFIX)
USING (_TABLE_SUFFIX)
WHERE
  LOWER(resp_cache_control) LIKE '%no-store%' AND
  firstHTML
GROUP BY
  _TABLE_SUFFIX,
  pages_total

Now, I’d like take a next step and get the number of pages with that header, by CDN.
Say the total is 2,000,000 pages with that header’s directive on mobile, how many of those are served by Akamai, Cloudflare, Fastly, etc?

But ideally I can get the % of pages that each CDN serves with that header’s directive (a bfcache killer :frowning_face: ), for example:

Akamai:

  • Total pages: 200,000
  • Pages with no-store: 100,000
  • % pages with no-store: 50%

My SQL is rusty at best and I can’t get it to work in BigQuery.
Anybody have some pointers for me? Txs !

1 Like

Hi @aaronpeters :wave:

I took a stab at this using the new all.pages table for CDN detection and all.requests table for no-store detection. My general approach is to use WITH subqueries for each of these pieces, then bring them together with a JOIN in the final SELECT statement.

WITH no_store AS (
  SELECT
    page
  FROM
    `httparchive.all.requests`,
    UNNEST(response_headers) AS header
  WHERE
    date = '2023-08-01' AND
    client = 'mobile' AND
    is_root_page AND
    is_main_document AND
    LOWER(header.name) = 'cache-control' AND
    REGEXP_CONTAINS(header.value, r'no-store')
),

cdns AS (
  SELECT
    rank,
    page,
    JSON_VALUE(summary, '$.cdn') AS cdn
  FROM
    `httparchive.all.pages`
  WHERE
    date = '2023-08-01' AND
    client = 'mobile' AND
    is_root_page AND
    JSON_VALUE(summary, '$.cdn') != ''
)


SELECT
  cdn,
  COUNTIF(no_store.page IS NOT NULL) AS pages_with_no_store,
  COUNT(0) AS total_pages,
  COUNTIF(no_store.page IS NOT NULL) / COUNT(0) AS pct_no_store
FROM
  cdns
LEFT JOIN
  no_store
USING
  (page)
GROUP BY
  cdn
HAVING
  total_pages >= 1000
ORDER BY
  total_pages DESC

Note that the we’re detecting CDNs based on WebPageTest.

Here are the results for CDNs with 1k+ sites, ordered by popularity:

cdn pages_with_no_store total_pages pct_no_store
Cloudflare 391,675 2,548,966 15.4%
Google 139,629 1,099,485 12.7%
Fastly 12,681 363,678 3.5%
Amazon CloudFront 66,242 297,571 22.3%
Akamai 43,628 115,532 37.8%
Automattic 5 105,874 0.0%
Sucuri Firewall 8,157 53,357 15.3%
Vercel 6,326 47,304 13.4%
Incapsula 11,806 38,333 30.8%
Microsoft Azure 4,926 20,522 24.0%
CDN 1,173 19,184 6.1%
Cloudflare, Google 4,858 16,901 28.7%
OVH CDN 3,669 14,450 25.4%
Netlify 282 13,806 2.0%
Cloudflare, Fastly 665 13,434 5.0%
Amazon CloudFront, Cloudflare 1,123 12,282 9.1%
Highwinds 5,875 11,414 51.5%
Zenedge 10,353 10,700 96.8%
GoCache 8 8,208 0.1%
section.io 4,041 7,820 51.7%
Azion 118 6,581 1.8%
Alibaba 595 5,531 10.8%
QUIC.cloud 405 4,214 9.6%
Fastly, Google 141 3,551 4.0%
Tencent 170 2,698 6.3%
BunnyCDN 27 2,528 1.1%
CDNetworks 314 1,338 23.5%

Here’s another view of the top 10 CDNs by % no-store:

cdn pages_with_no_store total_pages pct_no_store
Zenedge 10,353 10,700 96.8%
section.io 4,041 7,820 51.7%
Highwinds 5,875 11,414 51.5%
Akamai 43,628 115,532 37.8%
Incapsula 11,806 38,333 30.8%
Cloudflare, Google 4,858 16,901 28.7%
OVH CDN 3,669 14,450 25.4%
Microsoft Azure 4,926 20,522 24.0%
CDNetworks 314 1,338 23.5%
Amazon CloudFront 66,242 297,571 22.3%
Cloudflare 391,675 2,548,966 15.4%
Sucuri Firewall 8,157 53,357 15.3%

Does this answer your question?

1 Like

Also, here’s a sample of 100 of the most popular pages to use no-store:

SELECT
  rank,
  cdn,
  page
FROM
  cdns
JOIN
  no_store
USING
  (page)
ORDER BY
  rank,
  NET.REG_DOMAIN(page)
LIMIT
  100
rank cdn page
1,000 Akamai https://www.accuweather.com/
1,000 Akamai https://www.ad.nl/
1,000 Akamai https://m.alibaba.com/
1,000 Cloudflare https://altibbi.com/
1,000 Akamai https://sellercentral.amazon.com/
1,000 Fastly https://ameblo.jp/
1,000 Cloudflare https://www.ancestry.com/
1,000 Cloudflare https://bbs.animanch.com/
1,000 Cloudflare https://www.arbada.com/
1,000 Cloudflare https://m.beboo.ru/
1,000 Cloudflare https://www.bet365.com/
1,000 Cloudflare https://br.betano.com/
1,000 Amazon CloudFront https://www.bikewale.com/
1,000 Cloudflare https://in.bookmyshow.com/
1,000 Fastly https://www.buzzfeed.com/
1,000 Cloudflare https://www.canva.com/
1,000 Amazon CloudFront https://coinmarketcap.com/
1,000 Akamai https://www.coppel.com/
1,000 Akamai https://link.coupang.com/
1,000 Akamai https://www.creditkarma.com/
1,000 Amazon CloudFront, Cloudflare https://www.curseforge.com/
1,000 Cloudflare https://www.dagbladet.no/
1,000 Incapsula https://hiroba.dpoint.docomo.ne.jp/
1,000 Cloudflare https://www.dogdrip.net/
1,000 Cloudflare https://doublelist.com/
1,000 Cloudflare https://dualeotruyenz.com/
1,000 Akamai https://www.elcorteingles.es/
1,000 Cloudflare https://pl.escort.club/
1,000 Akamai https://www.espncricinfo.com/
1,000 Akamai https://www.etsy.com/
1,000 Akamai https://www.expressen.se/
1,000 Facebook https://web.facebook.com/
1,000 Facebook https://free.facebook.com/
1,000 Facebook https://www.facebook.com/
1,000 Facebook https://apps.facebook.com/
1,000 Facebook https://mbasic.facebook.com/
1,000 Facebook https://m.facebook.com/
1,000 Facebook https://business.facebook.com/
1,000 Cloudflare https://faphouse.com/
1,000 Google https://www.finn.no/
1,000 Cloudflare https://funpay.com/
1,000 Google https://play.google.com/
1,000 Google https://sites.google.com/
1,000 Google https://docs.google.com/
1,000 Google https://one.google.com/
1,000 Google https://mail.google.com/
1,000 Google https://accounts.google.com/
1,000 Google https://news.google.com/
1,000 Google https://myactivity.google.com/
1,000 Google https://translate.google.com/
1,000 Cloudflare https://ww1.goojara.to/
1,000 Akamai https://www.hln.be/
1,000 Amazon CloudFront https://www.imdb.com/
1,000 Amazon CloudFront https://m.imdb.com/
1,000 Cloudflare https://imhentai.xxx/
1,000 Akamai https://www.incometax.gov.in/
1,000 Fastly https://www.independent.co.uk/
1,000 Google https://m.indiamart.com/
1,000 Akamai https://navbharattimes.indiatimes.com/
1,000 Facebook https://www.instagram.com/
1,000 Cloudflare https://isaidub3.co/
1,000 Cloudflare https://jable.tv/
1,000 Akamai https://www.jiocinema.com/
1,000 Cloudflare https://www.jusbrasil.com.br/
1,000 Cloudflare https://jut.su/
1,000 Akamai https://www.kleinanzeigen.de/
1,000 Cloudflare https://m.ladies.de/
1,000 Akamai https://www.latamairlines.com/
1,000 Akamai https://www.lavanguardia.com/
1,000 Cloudflare https://www.learncbse.in/
1,000 Akamai https://m.lightinthebox.com/
1,000 Akamai https://today.line.me/
1,000 Microsoft Azure https://www.linkedin.com/
1,000 Cloudflare https://litnet.com/
1,000 Akamai https://www.lowes.com/
1,000 Cloudflare https://en.luxuretv.com/
1,000 Amazon CloudFront https://select.mamastar.jp/
1,000 Cloudflare https://www.mangago.me/
1,000 Google https://www.marktplaats.nl/
1,000 Cloudflare https://www.mediafire.com/
1,000 Amazon CloudFront https://articulo.mercadolibre.com.mx/
1,000 Amazon CloudFront https://www.mercadolibre.com.mx/
1,000 Google https://homepage.mintnav.com/
1,000 Cloudflare https://mobile-tracker-free.com/
1,000 Akamai https://www.moneycontrol.com/
1,000 Cloudflare https://moviesda4.net/
1,000 Microsoft Azure https://www.msn.com/
1,000 Akamai https://www.mundodeportivo.com/
1,000 Akamai https://www.myntra.com/
1,000 Cloudflare https://www.nairaland.com/
1,000 Akamai https://n.news.naver.com/
1,000 Akamai https://m.naver.com/
1,000 Azion https://www.netshoes.com.br/
1,000 Cloudflare https://olimpbet.kz/
1,000 Cloudflare https://chat.openai.com/
1,000 Fastly https://www.paypal.com/
1,000 Amazon CloudFront https://picrew.me/
1,000 Cloudflare https://www.pixiv.net/
1,000 Cloudflare https://www.rbc.ua/
1,000 Amazon CloudFront https://www.redfin.com/

Thanks @rviscomi you are the man! :pray:
The query looks great and … so simple :smiley:

Thanks for including some data in your response, that is very nice.

First comments on the data:

1,099,485 pages served by Google :
That sure is not companies who are using Google Cloud CDN for accelerating their website. I guess there are many, many Google web properties in HTTP Archive (.blogspot.com?)

“Cloudflare, Google”: does that signify multi-cdn?

What is “CDN”?

:clap: Automattic looks fine in this table

Thanks @aaronpeters glad I can help.

Here’s the WPT source where that detection is done, if interested:

You’re right, when I look at the most common domain names behind the Google CDN, most of them are blogspot.com:

SELECT
  APPROX_TOP_COUNT(NET.REG_DOMAIN(page), 10)
FROM
  `httparchive.all.pages`
WHERE
  date = '2023-08-01' AND
  client = 'mobile' AND
  is_root_page AND
  JSON_VALUE(summary, '$.cdn') = 'Google'
domain pages
blogspot.com 249,999
business.site 28,680
softonic.com 19,368
negocio.site 6,744
eatbu.com 5,908
softonic.com.br 3,253
repl.co 1,784
netlify.app 1,721
choiceqr.com 1,487
play-cricket.com 1,467

Yes IIRC that means it’s multi-CDN, with Cloudflare in front. @patmeenan can correct me if I’m wrong.

Thanks @rviscomi and yes, I know of the WPT logic for CDN detection.
That needs a clean up … NetDNA, HiberniaCDN and others should no longer be in there, and I wonder how WPT handles the case where Akamai has a CloudFront origin … which response header wins?
Anyway …

None “win”. As Rick mentioned, both will get listed. There’s no harm in the old, dead CDN’s being there but always happy to accept PR’s.

Got it Pat.
Will find the time for a PR soon.