Starting in the April 15, 2017 crawl, har tables will include a _third-parties
custom metric. This includes name and version information for ~100 of the most popular JavaScript libraries used on the web. This enables an exciting new area for analysis.
The format of the metric is a JSON-encoded array of library objects like this: [{"name":"jQuery","version":"1.4.2"}]
Here’s an example of a query that gets all library names and versions for all pages in the 4/15 crawl:
CREATE TEMPORARY FUNCTION parseJson(libs STRING)
RETURNS ARRAY<STRUCT<name STRING, version STRING>>
LANGUAGE js AS """
try {
return JSON.parse(libs);
} catch (e) {
return [];
}
""";
SELECT
url,
lib
FROM (
SELECT
url,
parseJson(TRIM(JSON_EXTRACT(payload, "$._third-parties"), '"')) AS libs
FROM
`httparchive.har.2017_04_15_chrome_pages`)
CROSS JOIN
UNNEST(libs) AS lib
ORDER BY
url;
Note that this uses the standard SQL dialect (rather than legacy). You can run the query here. Alternatively, you can browse the results of this query in the temporary httparchive:scratchspace.2017_04_15_js_libs table (a permanent table still TBD).
Using this scratch table, it’s really easy to get answers to questions like “How popular is jQuery?”:
SELECT
ROUND(jquery.count / total.count, 2)
FROM
(SELECT COUNT(0) AS count FROM `httparchive.scratchspace.2017_04_15_js_libs` WHERE lib.name = 'jQuery') AS jquery,
(SELECT COUNT(0) AS count FROM `httparchive.har.2017_04_15_chrome_pages`) AS total
It turns out that jQuery is used by 82% of the ~500K sites crawled by HTTP Archive!
How about the top 3 most popular jQuery versions?
SELECT
APPROX_TOP_COUNT(lib.version, 3)
FROM
`httparchive.scratchspace.2017_04_15_js_libs`
WHERE
lib.name = 'jQuery';
Results:
f0_.value f0_.count
1.12.4 80179
1.11.1 28507
1.11.3 26073
Version 1.12.4 is by far the leader. Interestingly, the latest version (3.2.1) is all the way down at number 42 on the list of popular versions.
There’s so much more to explore with this data, so dig in and share your findings!
Check out https://github.com/HTTPArchive/httparchive/issues/77 for more about how this works under the hood.
PS: We’re working on expanding the detection to platforms (Wordpress, etc) as well so stay tuned.