In w3c/resource-timing Pull Request #155, "Update recommended min UA Resource Timing buffer size to 250" by nicjansma, @igrigorik writes:
FWIW, I think the “what is the distribution of same-frame resources prior to onload” question is worth investigating a bit further. I believe we should have all the right bits in HTTP Archive to extract this: we have detailed data on initiator / frame, and we can filter out any requests initiated after onload fires
I wanted to help answer this question and post it here on the discussion group so everyone else can see my approach and the results.
Here’s my query:
#standardSQL
# Note: This query processes 287 GB.
CREATE TEMPORARY FUNCTION toInt(n STRING) AS (CAST(n AS INT64));
SELECT
  requests,
  COUNT(0) AS frequency
FROM (
  SELECT
    SUM(IF(toInt(JSON_EXTRACT(request.payload, '$._load_start')) < toInt(JSON_EXTRACT(page.payload, '$.pageTimings.onLoad')), 1, 0)) AS requests
  FROM
    `httparchive.pages.2018_05_01_desktop` AS page
  JOIN
    `httparchive.requests.2018_05_01_desktop` AS request
  ON
    page.url = request.page
  GROUP BY
    page.url)
GROUP BY
  requests
ORDER BY
  requests
See the results here
The tail is really long, so this chart clips it at 500 requests. The maximum number of requests before onload is 3,432!
There is a mode at 1 request and another around 50 requests. The median is 74 requests.
Out of curiosity I flipped the inequality and queried for the median number of requests per page on or after onload. The result is 3 requests. Either sites are doing a poor job of lazy loading, or HTTP Archive's tests are not well suited to triggering additional requests, for example by scrolling down the page to initiate below-the-fold resource loading.
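For anyone who wants to reproduce the two medians, here is a rough sketch of how they could be computed in a single pass. It is not the exact query I ran: it assumes the same tables and payload fields as the query above, the column names `before_onload` and `on_or_after_onload` are made up for illustration, and it uses BigQuery's `APPROX_QUANTILES`, which returns approximate values, so the results may differ slightly from the numbers quoted here.

```sql
#standardSQL
# Sketch only: median requests per page, split at onload.
# Assumes the same 2018_05_01 desktop tables and payload fields as the query above.
# Expect it to scan a similar amount of data.
CREATE TEMPORARY FUNCTION toInt(n STRING) AS (CAST(n AS INT64));
SELECT
  # 1000 buckets; offset 500 is the (approximate) median.
  APPROX_QUANTILES(before_onload, 1000)[OFFSET(500)] AS median_before_onload,
  APPROX_QUANTILES(on_or_after_onload, 1000)[OFFSET(500)] AS median_on_or_after_onload
FROM (
  SELECT
    # Requests that started before the page's onload event.
    SUM(IF(toInt(JSON_EXTRACT(request.payload, '$._load_start')) < toInt(JSON_EXTRACT(page.payload, '$.pageTimings.onLoad')), 1, 0)) AS before_onload,
    # Flipped inequality: requests that started on or after onload.
    SUM(IF(toInt(JSON_EXTRACT(request.payload, '$._load_start')) >= toInt(JSON_EXTRACT(page.payload, '$.pageTimings.onLoad')), 1, 0)) AS on_or_after_onload
  FROM
    `httparchive.pages.2018_05_01_desktop` AS page
  JOIN
    `httparchive.requests.2018_05_01_desktop` AS request
  ON
    page.url = request.page
  GROUP BY
    page.url)
```

Requests with a missing `_load_start` or pages with a missing `onLoad` value fall into neither bucket, since the comparison evaluates to NULL and `IF` then returns 0.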