CDN consumption in India

Amazon.in seems to use Akamai as one of its CDNs.

Is that true? If so, what’s their multi-CDN story? Digging into the CrUX data and running the following simple query tells us that amazon.in has indeed been using multiple CDNs:

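-- List the CDN providers recorded for amazon.in requests
SELECT page, JSON_EXTRACT_SCALAR(payload, "$._cdn_provider") AS cdn
FROM `httparchive.latest.requests_mobile`
JOIN `chrome-ux-report.country_in.201907`
-- CrUX origins have no trailing slash; HTTP Archive pages do
ON CONCAT(origin, '/') = page
WHERE url LIKE "https://www.amazon.in/%"
GROUP BY page, cdn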

This result made me a little curious. A few questions came up:

  1. Do sites switch CDNs or between CDNs and no-CDN?
  2. What are some top multi-CDN combinations trending in India?

To address these questions, I had to use the CrUX India country dataset, the httparchive.latest.requests_desktop dataset and, of course, BigQuery!

A couple of things about these datasets:

  1. chrome-ux-report.country_in.201908 -> This helps us pull out the top origins in a given country (India, in this case). Note that this dataset has no information on whether an origin uses a CDN. This is why we need #2 below.
  2. httparchive.latest.requests_desktop -> This lets us match the origins pulled from the first dataset against a list of pages (URLs) and check whether a CDN is enabled. If it is, which CDN or CDNs are in use?
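The matching step between the two datasets can be sketched in a few lines of Python. This is only an illustration of the join condition used in the queries (origin plus a trailing slash equals the page URL). The function name and the sample URLs are hypothetical and are not taken from either dataset.

```python
def match_origins_to_pages(origins, pages):
    """Return the pages whose URL equals a CrUX origin plus a trailing slash."""
    # Mirrors the SQL join condition CONCAT(origin, '/') = page
    home_pages = {origin + "/" for origin in origins}
    return [page for page in pages if page in home_pages]


# Hypothetical sample values, for illustration only
crux_origins = ["https://www.amazon.in", "https://shop.example.in"]
archive_pages = ["https://www.amazon.in/", "https://news.example.com/"]

matched = match_origins_to_pages(crux_origins, archive_pages)
print(matched)
```

Only pages that correspond to an origin in the country dataset survive the match, which is what restricts the analysis to origins seen in India.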

QUERY

SELECT COUNT(page) AS num_pages, cdn
FROM (
  SELECT
    page,
    -- Combine the distinct providers seen for a page into one sorted label
    STRING_AGG(DISTINCT provider, " | " ORDER BY provider) AS cdn
  FROM (
    SELECT
      page,
      -- A missing, empty or 'null' provider counts as a direct hit on the origin
      CASE
        WHEN JSON_EXTRACT_SCALAR(payload, "$._cdn_provider") IS NULL
          OR JSON_EXTRACT_SCALAR(payload, "$._cdn_provider") IN ('', 'null')
          THEN "Direct Origin"
        ELSE JSON_EXTRACT_SCALAR(payload, "$._cdn_provider")
      END AS provider
    FROM `httparchive.latest.requests_desktop`
    JOIN `chrome-ux-report.country_in.201908`
    ON CONCAT(origin, '/') = page
  )
  GROUP BY page
)
GROUP BY cdn
ORDER BY num_pages DESC
LIMIT 10
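To make the aggregation easier to follow, here is a minimal Python sketch of what the query is meant to do: normalize each request's provider, collapse the distinct providers per page into one " | " separated label, and count pages per label. The function names and the sample rows are hypothetical; the real data lives in BigQuery and this is not how the query itself is executed.

```python
from collections import Counter, defaultdict


def normalize(provider):
    """Map a missing, empty or 'null' provider to 'Direct Origin'."""
    return "Direct Origin" if provider in (None, "", "null") else provider


def top_cdn_combinations(requests, limit=10):
    """Count pages per sorted, de-duplicated CDN combination."""
    providers_by_page = defaultdict(set)
    for page, provider in requests:
        providers_by_page[page].add(normalize(provider))
    # One label per page, e.g. "Akamai | Direct Origin"
    labels = Counter(" | ".join(sorted(p)) for p in providers_by_page.values())
    return labels.most_common(limit)


# Hypothetical (page, provider) rows, for illustration only
sample_requests = [
    ("https://a.example/", "Akamai"),
    ("https://a.example/", ""),
    ("https://a.example/", "Akamai"),
    ("https://b.example/", None),
    ("https://b.example/", "null"),
    ("https://c.example/", "Akamai"),
    ("https://c.example/", None),
]

combinations = top_cdn_combinations(sample_requests)
print(combinations)
```

Because the label is built from all requests of a page, a page that loads some resources through a CDN and others straight from the origin ends up in a combined bucket such as "Akamai | Direct Origin".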

So, here are the results:

[Image: table view of the results]

[Image: donut graph view of the results]

Inference

  • Around 35% of pages requested in India hit both the origin directly and Google. It’s important to note that any request hitting Google’s GFE network is marked as “Google” in this case, so this is not necessarily Google’s CDN.
  • 17% of pages hit the origin directly. It will be interesting to check what percentage of these requests hit origins outside India.
  • 16.3% of pages switch between Cloudflare, Direct Origin and Google across their requests.
  • 5% of pages with Akamai and 4.3% with CloudFront seem a little too low, considering Akamai’s presence in India.

Inputs/Feedback/Corrections are welcome!
