Awesome!
According to the resource hints specification there are multiple ways to use these resource hints. The query above shows us how often the resource hints are specified in the document markup.
The resource hint link's may be specified in the document markup,
MAY be provided via the HTTP Link header, and MAY be dynamically
added to and removed from the document.
Now let’s explore how many links are being provided via HTTP Link headers.
First we need to extract the Link headers from the HTTP responses. Here’s a query that attempts to extract all the Link headers from httparchive.requests.2018_08_15_desktop
. Note that this query processes 810GB of data.
CREATE TEMPORARY FUNCTION getHeaders(payload STRING)
RETURNS STRING
LANGUAGE js AS """
try {
var $ = JSON.parse(payload);
var headers = $.response.headers;
var st = headers.find(function(e) {
return e['name'].toLowerCase() === 'link'
});
return st['value'];
} catch (e) {
return '';
}
""";
SELECT * FROM (
SELECT page, url, getHeaders(payload) AS link
FROM `httparchive.requests.2018_08_15_desktop`
)
WHERE link != ""
Next we need to figure out what the rel attribute in the Link header is set to. In order to avoid querying this large table repeatedly, I saved it to a scratchspace table. If you want to explore it, the table is at httparchive.scratchspace.link_headers_2018_08_15_desktop
and is 450MB.
There were some cases where the Link header ended with something like “rel=preconnect” and other cases where it contained “rel=preconnect; crossorigin”. That made it difficult to write a regular expression to extract the rel attribute value. The solution I came up with was to use a combination of SUBSTR and REGEX_REPLACE to remove everything except the Rel attribute and then summarize it.
That query was:
SELECT REGEXP_REPLACE(
REGEXP_REPLACE(
REGEXP_REPLACE(
SUBSTR(link, STRPOS(link,"rel=")) , r",.*", "")
, r";.*","")
,r"\"","") link_rel, count(*) freq
FROM `httparchive.scratchspace.link_headers_2018_08_15_desktop`
GROUP BY link_rel
ORDER BY freq DESC
The results from that query show 867,953 additional preconnects, 11105 additional preloads and 3406 additional dns-prefetches
Also interesting to note - 822,568 of the preconnects were in the HTTP response headers of CSS loaded from fonts.googleapis.com.