Resource Hints adoption


#1

There are various ways to use Resource Hints:

  • dns-prefetch
  • preconnect
  • prefetch
  • prerender
  • preload

How popular is each loading hint?

SELECT
  APPROX_TOP_COUNT(LOWER(pre), 5)
FROM (
  SELECT
    REGEXP_EXTRACT_ALL(body, '(?i)<link[^>]*rel=[\'"]?(dns-prefetch|preconnect|preload|prefetch|prerender)') AS pre
  FROM
    `httparchive.response_bodies.2018_08_15_desktop`),
  UNNEST(pre) AS pre

hint          frequency
dns-prefetch  1574860
preconnect    274090
preload       134417
prefetch      88896
prerender     5827
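
For anyone who wants to sanity-check the pattern locally before spending BigQuery quota, here is a minimal Python sketch of the same idea. It reuses the regular expression from the query; the sample HTML bodies and the `count_hints` helper are made up for illustration and are not HTTP Archive data.

```python
import re
from collections import Counter

# Same pattern as the BigQuery query; the single capture group is what gets returned.
HINT_RE = re.compile(
    r"""<link[^>]*rel=['"]?(dns-prefetch|preconnect|preload|prefetch|prerender)""",
    re.IGNORECASE,
)


def count_hints(bodies):
    """Tally resource hints across an iterable of HTML response bodies."""
    counts = Counter()
    for body in bodies:
        # findall yields only the captured hint name, similar to REGEXP_EXTRACT_ALL.
        counts.update(hint.lower() for hint in HINT_RE.findall(body))
    return counts


# Hypothetical sample bodies, not HTTP Archive data.
sample_bodies = [
    '<link rel="dns-prefetch" href="//cdn.example.com">'
    '<link rel=preconnect href="https://fonts.example.com">',
    "<LINK REL='Preload' href='/app.js' as='script'>",
    '<link href="/next.html" rel="prefetch">',
]

hint_counts = count_hints(sample_bodies)
print(hint_counts.most_common(5))
```

`Counter.most_common(5)` plays the role of `APPROX_TOP_COUNT(..., 5)` here, except that it is exact rather than approximate.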

#2

Awesome!

According to the Resource Hints specification, there are multiple ways to deliver these hints. The query above only shows how often they are specified in the document markup. The specification says:

"The resource hint link's MAY be specified in the document markup, MAY be provided via the HTTP Link header, and MAY be dynamically added to and removed from the document."

Now let’s explore how many links are being provided via HTTP Link headers.

First we need to extract the Link headers from the HTTP responses. Here’s a query that attempts to extract all the Link headers from httparchive.requests.2018_08_15_desktop. Note that this query processes 810GB of data.

CREATE TEMPORARY FUNCTION getHeaders(payload STRING)
RETURNS STRING
LANGUAGE js AS """
  try {
    var $ = JSON.parse(payload);
    var headers = $.response.headers;
    var st = headers.find(function(e) { 
      return e['name'].toLowerCase() === 'link'
    });
    return st['value'];
  } catch (e) {
    return '';
  }
""";
SELECT * FROM (
  SELECT page, url, getHeaders(payload) AS link
  FROM `httparchive.requests.2018_08_15_desktop`
)
WHERE link != ""
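
To make the UDF's logic easier to follow, here is a rough Python equivalent that can be run against a single payload. It assumes the payload is a HAR-style JSON entry with a `response.headers` list of name/value pairs, which is what the UDF above reads; the function name `get_link_header` and the sample payload are hypothetical.

```python
import json


def get_link_header(payload: str) -> str:
    """Return the value of the first Link response header, or '' if absent."""
    try:
        entry = json.loads(payload)
        for header in entry["response"]["headers"]:
            if header["name"].lower() == "link":
                return header["value"]
    except (ValueError, KeyError, TypeError, AttributeError):
        # Mirror the UDF's catch-all: malformed payloads yield an empty string.
        pass
    return ""


# Hypothetical HAR-style entry, not taken from the HTTP Archive tables.
sample_payload = json.dumps({
    "response": {
        "headers": [
            {"name": "Content-Type", "value": "text/css"},
            {"name": "Link", "value": "<https://fonts.gstatic.com>; rel=preconnect; crossorigin"},
        ]
    }
})

print(get_link_header(sample_payload))
```

Like the UDF, this only returns the first Link header it finds, so a response that sends several separate Link headers would be undercounted.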

Next we need to figure out what the rel attribute in the Link header is set to. In order to avoid querying this large table repeatedly, I saved it to a scratchspace table. If you want to explore it, the table is at httparchive.scratchspace.link_headers_2018_08_15_desktop and is 450MB.

There were some cases where the Link header ended with something like “rel=preconnect” and other cases where it contained “rel=preconnect; crossorigin”. That made it difficult to write a single regular expression to extract the rel attribute value. The solution I came up with was to use a combination of SUBSTR and REGEXP_REPLACE to remove everything except the rel attribute and then summarize it.

That query was:

SELECT
  REGEXP_REPLACE(
    REGEXP_REPLACE(
      REGEXP_REPLACE(SUBSTR(link, STRPOS(link, "rel=")), r",.*", ""),
      r";.*", ""),
    r"\"", "") AS link_rel,
  COUNT(*) AS freq
FROM `httparchive.scratchspace.link_headers_2018_08_15_desktop`
GROUP BY link_rel
ORDER BY freq DESC
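
The nested calls are a little hard to read, so here is the same chain as a minimal Python sketch, applied step by step. The `extract_link_rel` helper and the sample header values are hypothetical; the handling of a missing `rel=` reflects my understanding that BigQuery's SUBSTR starts from the first character when given position 0.

```python
import re
from collections import Counter


def extract_link_rel(link: str) -> str:
    """Reduce a Link header value to its first rel=... token."""
    start = link.find("rel=")
    # Assumption: STRPOS returns 0 when "rel=" is missing and SUBSTR then keeps the whole string.
    tail = link[start:] if start >= 0 else link
    tail = re.sub(r",.*", "", tail)  # drop any further comma-separated links
    tail = re.sub(r";.*", "", tail)  # drop trailing parameters such as crossorigin
    return tail.replace('"', "")     # strip double quotes around the value


# Hypothetical header values for illustration.
sample_links = [
    "<https://fonts.gstatic.com>; rel=preconnect; crossorigin",
    '</style.css>; rel="preload"; as=style, </app.js>; rel=preload; as=script',
    "<//cdn.example.com>; rel=dns-prefetch",
]

rel_counts = Counter(extract_link_rel(link) for link in sample_links)
for link_rel, freq in rel_counts.most_common():
    print(link_rel, freq)
```

One thing the second sample makes visible: when a header carries several comma-separated links, only the first `rel` is counted.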

The results from that query show 867,953 additional preconnects, 11,105 additional preloads and 3,406 additional dns-prefetches.

Also interesting to note: 822,568 of the preconnects (roughly 95% of them) were in the HTTP response headers of CSS loaded from fonts.googleapis.com.


#3

Do you know if resource hints headers on sub-resources actually work in any browser?


#4

I am fairly certain the resource hints header works in Chrome for subrequests. The Google fonts case was discussed quite actively. @yoav should know for sure.


#5

Thanks, I was planning on having a play to see what worked and what didn’t


#6

Resource Hints should definitely work on subresources in Chromium.


#7

Does prefetch work with SPAs? For example, could it be used to prefetch a subsequent profile page on LinkedIn while the user is on the feed, given that the LinkedIn SPA is built with Ember?


#8

In a recent talk I encouraged the use of link rel=preload for critical scripts. I was surprised to see that Chrome Platform Status said 25% of pages use link rel=preload. This seems high, but maybe it’s popular for other types of resources.

I wanted to specifically look for the percentage of sites that were using it for scripts. Paul Calvano was kind enough to create a temporary table that contained all the main HTML document response bodies from the Oct 15 2018 crawl. (The “main” HTML document is the first HTML document returned for a given URL, so does not include iframes.) I used the following regular expression to count how many of these used link rel=preload for scripts:

SELECT count(*)
FROM `httparchive.scratchspace.responsebodies_2018_10_15_desktop`
WHERE REGEXP_CONTAINS(body, '(?i)<link[^>]*rel=[\'"]?preload')
      AND REGEXP_CONTAINS(body, '(?i)<link[^>]*as=[\'"]?script')

The number turned out to be 12,469 out of 1,255,904 sites, so about 0.99% use link rel=preload for scripts. This is much less than the 25% cited by Chrome Platform Status. The difference could be due to link rel=preload being used for things other than scripts, or to iframes using link rel=preload.
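
Here is a small Python sketch of the two conditions in that query, for anyone who wants to test them against a few pages by hand. The sample markup and helper names are hypothetical. It also shows a caveat worth keeping in mind: the two REGEXP_CONTAINS checks are independent, so a page can match even when `rel=preload` and `as=script` sit on different link tags, which means the count is, if anything, an upper bound.

```python
import re

PRELOAD_RE = re.compile(r"""<link[^>]*rel=['"]?preload""", re.IGNORECASE)
AS_SCRIPT_RE = re.compile(r"""<link[^>]*as=['"]?script""", re.IGNORECASE)


def matches_query(body: str) -> bool:
    """True when both REGEXP_CONTAINS conditions hold for one HTML body."""
    return bool(PRELOAD_RE.search(body)) and bool(AS_SCRIPT_RE.search(body))


def share_percent(matching: int, total: int) -> float:
    """Percentage of `total` represented by `matching`."""
    return 100.0 * matching / total


# Hypothetical markup samples.
same_tag = '<link rel="preload" href="/app.js" as="script">'
style_only = '<link rel="preload" href="/main.css" as="style">'
split_tags = (
    '<link rel="preload" href="/font.woff2" as="font">'
    '<link rel="prefetch" href="/next.js" as="script">'
)

for name, body in [("same_tag", same_tag), ("style_only", style_only), ("split_tags", split_tags)]:
    print(name, matches_query(body))

print(round(share_percent(12469, 1255904), 2))
```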

Why did I impose those two conditions on my search?

  • I only searched for scripts because synchronous scripts inflict painful delays on rendering and the user experience. In another HA analysis, I showed that scripts consume the most CPU. In yet another HA analysis, I showed that synchronous scripts still outnumber async. It’d be good for sites with synchronous scripts to preload them to avoid those blocking delays.

  • I only searched the main HTML document because that’s the primary content on the page. I think there’s less benefit if a third party uses an iframe to link rel=preload their third party scripts. In fact, that’s probably a bad pattern because then those third party scripts may compete for bandwidth and CPU with the main page’s scripts. So I wanted to focus on whether the website owner was using link rel=preload to get its own scripts to download sooner.

It’s disappointing to see that only 1% of sites are using link rel=preload for their own scripts. But that means there’s an opportunity to be had! Link rel=preload is a great technique to get critical content downloaded sooner. Since synchronous scripts inflict painful delays, any page with synchronous scripts should consider using link rel=preload as a way to mitigate that pain.


#9

Make sure to look at the HTTP response headers as well, for header-based preload. My guess is that this is where most of it is coming from.


#10

Hey Steve, have you found that preloading synchronous scripts helps? I thought that the browser’s lookahead scanner would be catching those. Do you have some data on this? Thanks!


#12

I should have mentioned I also looked at this but it was extremely small. The query was:

SELECT count(*)  -- assumed: the post gives only the FROM/WHERE part
FROM `httparchive.summary_requests.2018_10_15_desktop` as r
WHERE respOtherHeaders like '%preload%'
      AND firstHtml = true
      AND REGEXP_EXTRACT(respOtherHeaders, r"link(.*)") is not null
      AND strpos(respOtherHeaders, "rel=preload") != 0
      AND strpos(respOtherHeaders, "as=script") != 0

The result was 1,670 sites (0.1%). Even removing the “as=script” condition, only 2,817 sites (0.2%) use the link rel=preload response header.
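
As a rough illustration of what those string conditions accept and reject, here is a Python sketch of the same checks. The `has_preload_link_header` helper is hypothetical, and the sample strings assume respOtherHeaders is a flat "name = value, name = value" string, which I have not verified against the table. The firstHtml filter is not modeled.

```python
import re


def has_preload_link_header(resp_other_headers: str, require_script: bool = True) -> bool:
    """Apply the WHERE clause's string conditions to one respOtherHeaders value."""
    checks = [
        "preload" in resp_other_headers,                           # like '%preload%'
        re.search(r"link(.*)", resp_other_headers) is not None,    # REGEXP_EXTRACT(...) is not null
        "rel=preload" in resp_other_headers,                       # strpos(...) != 0
    ]
    if require_script:
        checks.append("as=script" in resp_other_headers)           # strpos(...) != 0
    return all(checks)


# Hypothetical header strings; the "name = value" layout is an assumption.
script_preload = "link = </js/app.js>; rel=preload; as=script, x-powered-by = PHP/7.2"
style_preload = "link = </css/main.css>; rel=preload; as=style"
quoted_preload = 'link = </js/app.js>; rel="preload"; as="script"'

print(has_preload_link_header(script_preload))
print(has_preload_link_header(style_preload))
print(has_preload_link_header(style_preload, require_script=False))
print(has_preload_link_header(quoted_preload))
```

The last sample is the interesting one: these are plain, case-sensitive substring checks, so a header that quotes its values (`rel="preload"`) would be missed, and the real numbers may be slightly higher than reported.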


#13

It does but I don’t have data to share. It might be small if all you do is markup, but adding the link rel=preload HTTP response header will have a bigger impact.


#14

I started down this road after performance.now() but haven’t had time to finish it with the work I’ve got on at the moment.

I’m a bit skeptical of the benefits of preloading synchronous scripts, but I don’t yet have much data to support the argument either way ATM.

Preload is a tradeoff: whenever we use it to boost the priority of something, we’re implicitly decreasing the priority of something else (unless the network would be idle at that time).

So given the typical pattern I see in many elements – external CSS before external blocking JS – preloading the JS via headers potentially delays the CSS due to network contention, H2 priorities, etc., and execution of those scripts is blocked waiting for CSSOM construction. (I know sites should aim to have as little blocking JS as possible, but I think we all know the reality is rather different!)

I’ve often wondered if we should actually be preloading the CSS via HTTP headers so the browser can discover it before the HTML parser starts up. It’s on my list of things to test, along with where we should position the preload directives for various resource types within the document itself.
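
For reference, a header-based CSS preload would look something like the output of this minimal Python sketch. The `preload_link_header` helper and the stylesheet path are hypothetical; this only formats the header value and says nothing about whether it actually helps.

```python
def preload_link_header(url: str, as_type: str) -> str:
    """Format one Link header value that asks the browser to preload `url`."""
    return f"<{url}>; rel=preload; as={as_type}"


# Hypothetical stylesheet path.
css_header = preload_link_header("/css/main.css", "style")
print(f"Link: {css_header}")
```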

Hopefully will get more time to explore in Feb but please challenge my train of thought


#15

One thing to keep in mind is that many websites don’t load resources in the clean, deterministic way we all hope for. Instead, many sites use tag managers that may insert content that isn’t anticipated, including blocking scripts. This is another motivation to preload synchronous scripts that are first domain.