Requests per page before onload

In https://github.com/w3c/resource-timing/pull/155 @igrigorik writes:

FWIW, I think the “what is the distribution of same-frame resources prior to onload” question is worth investigating a bit further. I believe we should have all the right bits in HTTP Archive to extract this: we have detailed data on initiator / frame, and we can filter out any requests initiated after onload fires

I wanted to help answer this question and post it here on the discussion group so everyone else can see my approach and the results.

Here’s my query:

#standardSQL
# Note: This query processes 287 GB.
CREATE TEMPORARY FUNCTION toInt(n STRING) AS (CAST(n AS INT64));

SELECT
  requests,
  COUNT(0) AS frequency
FROM (
  SELECT
    # Count only the requests that started loading before the page's onload time.
    SUM(IF(toInt(JSON_EXTRACT(request.payload, '$._load_start')) < toInt(JSON_EXTRACT(page.payload, '$.pageTimings.onLoad')), 1, 0)) AS requests
  FROM
    `httparchive.pages.2018_05_01_desktop` AS page
  JOIN
    `httparchive.requests.2018_05_01_desktop` AS request
  ON
    page.url = request.page
  GROUP BY
    page.url)
GROUP BY
  requests
ORDER BY
  requests

See the results here:

https://docs.google.com/spreadsheets/d/16x-4uO90MZYxa6WtuOjZag-_aVsrA-dln0jjmVeYcdM/edit?usp=sharing

[Chart: distribution of the number of requests per page before onload]

The tail is really long, so this chart clips it at 500 requests. The maximum number of requests before onload is 3,432!

There is a mode at 1 request and another around 50 requests. The median is 74 requests.
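For anyone who wants to double-check a median against the query output, here is a minimal Python sketch. It assumes the results have been exported as (requests, frequency) pairs, which is the shape the query above returns. The function name `histogram_median` and the sample rows are made up for illustration. They are not the real results, and this is not necessarily how the 74 figure was computed.

```python
from typing import Iterable, Tuple


def histogram_median(rows: Iterable[Tuple[int, int]]) -> int:
    """Return the lower median of a histogram given as (value, frequency) pairs."""
    ordered = sorted(rows)
    total = sum(freq for _, freq in ordered)
    if total == 0:
        raise ValueError("histogram is empty")
    # 1-based rank of the lower median among all pages.
    midpoint = (total + 1) // 2
    running = 0
    for value, freq in ordered:
        running += freq
        if running >= midpoint:
            return value
    raise AssertionError("unreachable: cumulative count never reached midpoint")


# Hypothetical rows for illustration only, NOT the real query results.
sample = [(1, 5), (2, 1), (50, 4), (74, 3), (500, 2)]
print(histogram_median(sample))
```

Walking the cumulative frequency avoids expanding the histogram into one entry per page, which matters when the table covers hundreds of thousands of pages.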

Out of curiosity, I flipped the inequality and queried for the median number of requests per page on or after onload. The result is 3 requests. Either sites are doing a poor job of lazy loading, or HTTP Archive is not well suited to triggering additional requests, for example by scrolling down the page to initiate below-the-fold resource loading.
