Requests per page before onload



In https://github.com/w3c/resource-timing/pull/155 @igrigorik writes:

FWIW, I think the “what is the distribution of same-frame resources prior to onload” question is worth investigating a bit further. I believe we should have all the right bits in HTTP Archive to extract this: we have detailed data on initiator / frame, and we can filter out any requests initiated after onload fires

I wanted to help answer this question and post it here on the discussion group so everyone else can see my approach and the results.

Here’s my query:

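#standardSQL
# Note: This query processes 287 GB.
# Payload fields are extracted as strings, so cast them to integers before comparing.
CREATE TEMPORARY FUNCTION toInt(n STRING) AS (CAST(n AS INT64));

# Outer query: how many pages (frequency) had each number of pre-onload requests.
SELECT
  requests,
  COUNT(0) AS frequency
FROM (
  # Inner query: per page, count the requests whose load start precedes the page's onLoad time.
  SELECT
    SUM(IF(toInt(JSON_EXTRACT(request.payload, '$._load_start')) < toInt(JSON_EXTRACT(page.payload, '$.pageTimings.onLoad')), 1, 0)) AS requests
  FROM
    `httparchive.pages.2018_05_01_desktop` AS page
  JOIN
    `httparchive.requests.2018_05_01_desktop` AS request
  ON
    page.url = request.page
  GROUP BY
    page.url)
GROUP BY
  requests
ORDER BY
  requests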
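
For anyone who wants to check the logic without running 287 GB through BigQuery, here is a minimal Python sketch of what the query does, run against made-up toy data. The URLs, timings, and function names are all hypothetical and only illustrate the two steps: count requests that start before onload for each page, then build the (requests, frequency) distribution.

```python
from collections import Counter

# Hypothetical toy data: page URL -> onLoad time in ms.
pages = {
    "https://a.example/": 1000,
    "https://b.example/": 500,
}

# Hypothetical toy data: (page URL, request load start in ms).
request_rows = [
    ("https://a.example/", 0),
    ("https://a.example/", 400),
    ("https://a.example/", 1200),  # starts after onLoad, not counted
    ("https://b.example/", 0),
    ("https://b.example/", 500),   # starts exactly at onLoad, not counted
]


def requests_before_onload(pages, request_rows):
    """Return {page URL: number of requests that started before onLoad}."""
    counts = {}
    for url, load_start in request_rows:
        if url not in pages:
            continue  # like the inner join: skip requests with no matching page
        counts.setdefault(url, 0)
        if load_start < pages[url]:
            counts[url] += 1
    return counts


def distribution(per_page_counts):
    """Return sorted (requests, frequency) rows, like the outer query."""
    return sorted(Counter(per_page_counts.values()).items())


per_page = requests_before_onload(pages, request_rows)
rows = distribution(per_page)
```

The strict `<` matches the inequality in the query, so a request that starts exactly at onload lands in the "on or after onload" bucket.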

See the results here

The tail is really long, so this chart clips it at 500 requests. The maximum number of requests before onload is 3,432!

There is a mode at 1 request and another around 50 requests. The median is 74 requests.
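
In case it helps anyone reproduce the summary stats, here is a rough Python sketch of one way to get a median and a mode from (requests, frequency) rows like the ones the query returns. The rows below are invented for illustration and are not the HTTP Archive results, and the helper names are my own.

```python
def weighted_median(rows):
    """Lower median of the values described by (value, frequency) rows."""
    rows = sorted(rows)
    total = sum(freq for _, freq in rows)
    if total == 0:
        raise ValueError("no observations")
    midpoint = (total + 1) // 2  # 1-based rank of the lower median
    running = 0
    for value, freq in rows:
        running += freq
        if running >= midpoint:
            return value


def mode(rows):
    """Value with the highest frequency (first one wins on ties)."""
    return max(rows, key=lambda row: row[1])[0]


# Hypothetical histogram: (requests before onload, number of pages).
toy_rows = [(1, 3), (2, 1), (50, 4), (74, 2), (500, 1)]

toy_median = weighted_median(toy_rows)
toy_mode = mode(toy_rows)
```

Working from the histogram avoids expanding it back into one row per page, which matters when the frequencies are in the hundreds of thousands.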

Out of curiosity, I flipped the inequality and queried for the median number of requests per page on or after onload. The result is 3 requests. Either sites are doing a poor job of lazy loading, or HTTP Archive's tests are not well suited to triggering additional requests, for example by scrolling down the page to initiate below-the-fold resource loading.