How many scripts are being fetched from the non-HTML / non-root domain?
SELECT
NTH(50, quantiles(num_scripts,101)) median,
NTH(75, quantiles(num_scripts,101)) seventy_fifth,
NTH(90, quantiles(num_scripts,101)) ninetieth,
NTH(95, quantiles(num_scripts,101)) ninety_fifth
FROM (
SELECT req.pageid, COUNT(req.url) as num_scripts
FROM [httparchive:runs.latest_requests] as req JOIN (
SELECT HOST(url) self, pageid
FROM [httparchive:runs.latest_pages]
) as pages ON pages.pageid = req.pageid
WHERE HOST(req.url) != pages.self
AND LOWER(mimeType) CONTAINS "script"
GROUP BY req.pageid
)
Based on June 2013 data:
We can also ask the reverse (just remove the ! in the WHERE clause), which is to say… How many scripts are being loaded from the same host?
Are sites sharding scripts on sub-hostnames of HTML domain?
SELECT
NTH(50, quantiles(num_scripts,101)) median,
NTH(75, quantiles(num_scripts,101)) seventy_fifth,
NTH(90, quantiles(num_scripts,101)) ninetieth,
NTH(95, quantiles(num_scripts,101)) ninety_fifth
FROM (
SELECT req.pageid, COUNT(req.url) as num_scripts
FROM [httparchive:runs.latest_requests] as req JOIN (
SELECT DOMAIN(url) self, pageid
FROM [httparchive:runs.latest_pages]
) as pages ON pages.pageid = req.pageid
WHERE DOMAIN(req.url) = pages.self
AND LOWER(mimeType) CONTAINS "script"
GROUP BY req.pageid
)
A few more scripts in the tail, showing that some sites are indeed sharding on root domain…
So the median website has 13 scripts, 5 from the same host as the main page, and 8 from some other host. We now know the number of scripts, but from how many distinct hosts are they being fetched?
SELECT
NTH(50, quantiles(num_hosts,101)) median,
NTH(75, quantiles(num_hosts,101)) seventy_fifth,
NTH(90, quantiles(num_hosts,101)) ninetieth,
NTH(95, quantiles(num_hosts,101)) ninety_fifth
FROM (
SELECT req.pageid, COUNT(DISTINCT HOST(req.url), 1000000) as num_hosts
FROM [httparchive:runs.latest_requests] as req JOIN (
SELECT HOST(url) self, pageid
FROM [httparchive:runs.latest_pages]
) as pages ON pages.pageid = req.pageid
WHERE HOST(req.url) != pages.self
AND LOWER(mimeType) CONTAINS "script"
GROUP BY req.pageid
)
Looking at the median, the 8 scripts requested on hosts different than the main page are requested across 5 different domains. Yikes!