Who initiates image downloads?

Internet Explorer emits a handy X-Download-Initiator header that describes the type of resource being fetched, what triggered the fetch, and some other metadata (see the MSDN blog post). With a bit of regex gymnastics, we can figure out how and why image requests are initiated…

SELECT initiator, reason, cnt,
  ROUND(ratio*100,2) AS percent,
  ROUND(byteRatio*100,2) AS bytePercent
FROM (
  -- Per (initiator, reason): request count, bytes, and each group's
  -- share of the total via RATIO_TO_REPORT.
  SELECT initiator, reason,
    COUNT(initiator) AS cnt, RATIO_TO_REPORT(cnt) OVER() AS ratio,
    SUM(respBodySize) AS bodySize, RATIO_TO_REPORT(bodySize) OVER() AS byteRatio
  FROM (
    -- Keep only the first word of the reason field.
    SELECT reqOtherHeaders, respBodySize, type, initiator,
      REGEXP_EXTRACT(reason, r'(\w+)') AS reason
    FROM (
      -- Pull the resource type, initiator and reason out of the
      -- X-Download-Initiator request header.
      SELECT url, reqOtherHeaders, respBodySize,
        REGEXP_EXTRACT(reqOtherHeaders, r'X-Download-Initiator\s=\s(\w+)') AS type,
        REGEXP_EXTRACT(reqOtherHeaders, r'"doc[^;]*;([^;]*);') AS initiator,
        REGEXP_EXTRACT(reqOtherHeaders, r'"doc[^;]*;[^;]*;([^;]*)') AS reason
      FROM [httparchive:runs.2015_04_01_requests]
    ) WHERE type = 'image'
  )
  GROUP BY initiator, reason
  -- Drop rare (initiator, reason) combinations.
  HAVING cnt > 100
)
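For anyone without BigQuery access, here is a rough Python sketch of what the query computes, run on a few made-up header strings. The sample X-Download-Initiator values (`scanner`, `parser`, `speculative`, `src`, `css`, `bg`) are hypothetical placeholders chosen only so that the query's regexes match; they are not IE's actual tokens. The sketch also assumes, as in standard SQL, that the RATIO_TO_REPORT shares are taken over the groups that survive `HAVING cnt > 100`.

```python
import re
from collections import defaultdict

# The same patterns the query passes to REGEXP_EXTRACT.
TYPE_RE = re.compile(r"X-Download-Initiator\s=\s(\w+)")
INITIATOR_RE = re.compile(r'"doc[^;]*;([^;]*);')
REASON_RE = re.compile(r'"doc[^;]*;[^;]*;([^;]*)')
FIRST_WORD_RE = re.compile(r"(\w+)")


def extract(pattern, text):
    """Rough stand-in for REGEXP_EXTRACT: first capture group, or None."""
    if text is None:
        return None
    match = pattern.search(text)
    return match.group(1) if match else None


def summarize(rows, min_count=100):
    """rows: iterable of (reqOtherHeaders, respBodySize) pairs.

    Returns {(initiator, reason): (cnt, percent, bytePercent)} for image requests.
    Assumes the ratios are taken over the groups that pass HAVING cnt > min_count.
    """
    counts = defaultdict(int)  # COUNT(initiator): rows with no initiator add 0
    sizes = defaultdict(int)   # SUM(respBodySize)
    for headers, body_size in rows:
        if extract(TYPE_RE, headers) != "image":  # WHERE type = 'image'
            continue
        initiator = extract(INITIATOR_RE, headers)
        reason = extract(FIRST_WORD_RE, extract(REASON_RE, headers))
        key = (initiator, reason)
        counts[key] += 1 if initiator is not None else 0
        sizes[key] += body_size
    kept = {key: cnt for key, cnt in counts.items() if cnt > min_count}
    total_cnt = sum(kept.values())
    total_bytes = sum(sizes[key] for key in kept)
    return {
        key: (
            cnt,
            round(cnt / total_cnt * 100, 2),
            round(sizes[key] / total_bytes * 100, 2) if total_bytes else 0.0,
        )
        for key, cnt in kept.items()
    }


# Hypothetical sample rows; the header tokens are placeholders, not IE's real values.
sample = (
    [('X-Download-Initiator = image; "doc=1;scanner;speculative"', 2000)] * 150
    + [('X-Download-Initiator = image; "doc=1;parser;src"', 1000)] * 150
    + [('X-Download-Initiator = image; "doc=1;css;bg"', 400)] * 20
    + [('X-Download-Initiator = script; "doc=1;parser;src"', 500)] * 50
)
for group, stats in sorted(summarize(sample).items()):
    print(group, stats)
```

With this sample the script request is filtered out by type, the 20-row CSS group falls below the 100-request cutoff, and the two remaining groups split the count 50/50 while the scanner group carries two thirds of the bytes.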

Results for the April 1st, 2015 “desktop” run:

  • ~43% of image fetches are initiated by the speculative HTML scanner, and they account for ~50% of transferred bytes.
  • ~37% are initiated by the document parser when it processes the src attribute of an img tag.
  • ~20% are initiated via CSS (“background-image”).

In total, this means ~80% of images (43% + 37%) are declared in the HTML markup, and they account for ~84% of transferred bytes. The remaining ~20% are declared via CSS. That said, a few caveats apply to these numbers:

  • These numbers cover the initial page load only: they don’t account for images fetched later, for example when a script injects an image in response to user input, or when a CSS rule with a new background-image becomes active.
  • This does not account for XHR-fetched images.
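As a quick check on the arithmetic, using only the rounded figures above: subtracting the scanner's ~50% byte share from the ~84% markup total implies that src-attribute fetches carry roughly 34% of bytes, leaving roughly 16% for CSS. Those two byte shares are derived estimates (subject to rounding), not numbers reported by the query, and they assume the three groups cover all image bytes, which their fetch shares summing to 100% suggests.

```python
# Rounded shares reported for the April 1st, 2015 desktop run (percent of fetches).
FETCH_SHARE = {"speculative scanner": 43, "img src": 37, "css": 20}
SCANNER_BYTES = 50  # scanner fetches' share of transferred bytes
MARKUP_BYTES = 84   # share of transferred bytes for all markup-declared images


def derived_shares():
    """Return (markup fetch share, implied img-src byte share, implied CSS byte share)."""
    markup_fetches = FETCH_SHARE["speculative scanner"] + FETCH_SHARE["img src"]
    src_bytes = MARKUP_BYTES - SCANNER_BYTES
    # Assumes the three groups together account for all image bytes.
    css_bytes = 100 - MARKUP_BYTES
    return markup_fetches, src_bytes, css_bytes


print(derived_shares())
```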

@igrigorik: the MSDN link is broken; I believe this is the one you were referring to :smile:

Ooops, good catch, fixed - thanks!

@igrigorik Shouldn’t that be FROM [httparchive:runs.2015_04_01_requests] instead of FROM [runs.2015_04_01_requests]?

Yep, I run queries within the project, which is why it worked for me… Fixed!

Great stats!

A few thoughts:
(1) I believe HA still uses IE 9? If so, note that IE 11 has gotten very aggressive with speculative image downloads, so the share from the speculative scanner may be much higher there.
(2) If you can spare the time, it would be interesting to see the stats for the median page (probably doable with a fancier query). Averages on HA are often a bit nuts.
(3) I think the src count would also capture JS-loaded images that set src, which includes most image galleries and lazy loaders. Maybe the initiator can hint whether an image was loaded by JS or declared in native HTML?
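Point (2) can be made concrete with a small sketch. The query pools every image request, so pages with many images weigh more, and a per-page median can come out quite different. The page names and counts here are hypothetical, and `pooled_and_median_share` is an illustrative helper, not part of any HTTP Archive tooling.

```python
from statistics import median


def pooled_and_median_share(pages):
    """pages: {page: (scanner_fetches, total_image_fetches)}.

    Returns (pooled share over all requests, median per-page share), in percent.
    """
    total_scanner = sum(scanner for scanner, _ in pages.values())
    total_images = sum(total for _, total in pages.values())
    pooled = 100 * total_scanner / total_images
    # Pages without any image fetches have no share, so skip them.
    per_page = [100 * scanner / total for scanner, total in pages.values() if total > 0]
    return pooled, median(per_page)


# Hypothetical pages: two small ones and one image-heavy page.
PAGES = {"page-a": (1, 2), "page-b": (3, 4), "page-c": (450, 500)}
pooled, med = pooled_and_median_share(PAGES)
print(f"pooled: {pooled:.2f}%  median page: {med:.2f}%")
```

Here the single image-heavy page pulls the pooled figure up to about 89.7%, while the median page sits at 75%.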

Keep the data flowing!