FWIW, I think the “what is the distribution of same-frame resources prior to onload” question is worth investigating a bit further. I believe we should have all the right bits in HTTP Archive to extract this: we have detailed data on initiator/frame, and we can filter out any requests initiated after onload fires.
I wanted to help answer this question, so I'm posting my approach and the results here on the discussion group where everyone can see them.
Here’s my query:
```sql
#standardSQL
# Note: This query processes 287 GB.
# Helper to turn a JSON-extracted number into an integer.
CREATE TEMPORARY FUNCTION toInt(n STRING) AS (CAST(n AS INT64));

# Outer query: how many pages have each per-page request count.
SELECT
  requests,
  COUNT(0) AS frequency
FROM (
  # Inner query: per page, count the requests that started before onload.
  SELECT
    SUM(IF(toInt(JSON_EXTRACT(request.payload, '$._load_start')) < toInt(JSON_EXTRACT(page.payload, '$.pageTimings.onLoad')), 1, 0)) AS requests
  FROM
    `httparchive.pages.2018_05_01_desktop` AS page
  JOIN
    `httparchive.requests.2018_05_01_desktop` AS request
  ON
    page.url = request.page
  GROUP BY
    page.url)
GROUP BY
  requests
ORDER BY
  requests
```
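The query aggregates twice: first per page (how many requests started before onload), then across pages (how many pages share each count). The Python sketch below mirrors that logic in memory so the two levels are easy to see. It is not a BigQuery client; the input shapes (a dict of page URL to `onLoad` in ms, and `(page_url, load_start_ms)` pairs) and the function name are hypothetical, and the sample data is made up.

```python
from collections import Counter, defaultdict


def requests_before_onload_histogram(onload_ms, requests):
    """Mirror the query: count each page's requests that started before onload,
    then count how many pages share each per-page total.

    onload_ms: dict of page URL -> onLoad time in ms (hypothetical input shape)
    requests:  iterable of (page_url, load_start_ms) pairs (hypothetical input shape)
    Returns a list of (requests, frequency) pairs ordered by requests.
    """
    per_page = defaultdict(int)
    for page_url, load_start in requests:
        # Inner JOIN on page.url = request.page: drop requests with no page row.
        if page_url not in onload_ms:
            continue
        onload = onload_ms[page_url]
        # SUM(IF(_load_start < onLoad, 1, 0)): a missing value on either side
        # makes the SQL comparison NULL, which IF() treats as false (adds 0).
        started_before = (
            load_start is not None and onload is not None and load_start < onload
        )
        per_page[page_url] += 1 if started_before else 0
    # Outer query: GROUP BY requests with COUNT(0) AS frequency, ORDER BY requests.
    return sorted(Counter(per_page.values()).items())


# Made-up example data, not HTTP Archive results.
onload = {"https://a.example/": 1000, "https://b.example/": 500}
reqs = [
    ("https://a.example/", 0),
    ("https://a.example/", 200),
    ("https://a.example/", 1500),  # starts after onload, not counted
    ("https://b.example/", 0),
    ("https://b.example/", 600),   # starts after onload, not counted
]
print(requests_before_onload_histogram(onload, reqs))  # [(1, 1), (2, 1)]
```

Note that a page whose requests all start after onload still shows up with a count of 0, exactly as `SUM(IF(...))` produces in the SQL.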
See the results here.
The tail is really long, so this chart clips it at 500 requests. The maximum number of requests before onload is 3,432!
The distribution is bimodal, with one peak at 1 request and another around 50 requests. The median is 74 requests.
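Because the query returns a histogram of `(requests, frequency)` rows rather than one row per page, the median has to be read off the cumulative page counts. A minimal sketch, assuming the result rows have been exported as pairs sorted by `requests`; the function name is hypothetical and the sample rows are invented for illustration, not the real query output.

```python
def histogram_median(rows):
    """Median of a histogram given as (value, count) pairs sorted by value.

    With an even number of observations this returns the lower of the two
    middle values instead of averaging them.
    """
    total = sum(count for _, count in rows)
    if total == 0:
        raise ValueError("empty histogram")
    target = (total + 1) // 2  # 1-based rank of the (lower) median page
    seen = 0
    for value, count in rows:
        seen += count  # cumulative number of pages up to and including this value
        if seen >= target:
            return value


# Made-up (requests, frequency) rows, not the real query output.
rows = [(1, 40), (2, 10), (50, 30), (74, 25), (120, 20)]
print(histogram_median(rows))  # 50
```

Alternatively, BigQuery's `APPROX_QUANTILES` can compute an approximate median directly over the inner per-page subquery, which avoids exporting the histogram at all.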
Out of curiosity, I flipped the inequality (from `<` to `>=`) and queried for the median number of requests per page that start on or after onload. The result is 3 requests. Either sites are doing a poor job of lazy loading, or HTTP Archive isn't well suited to triggering additional requests, for example by scrolling down the page to start loading below-the-fold resources.