JavaScript Library Detection

It seems the previous queries don’t age well and there’s something wonky with the processing. Here are two new alternatives for accessing the top 10 JS libraries:

#standardSQL
CREATE TEMPORARY FUNCTION getJsLibs(payload STRING)
RETURNS ARRAY<STRUCT<name STRING, version STRING>>
LANGUAGE js AS """
  try {
    const $ = JSON.parse(payload);
    const libs = JSON.parse($['_third-parties']);
    return Array.isArray(libs) ? libs : [];
  } catch (e) {
    return [];
  }
""";

SELECT
  APPROX_TOP_COUNT(lib.name, 10)
FROM (
  SELECT
    url,
    getJsLibs(payload) AS libs
  FROM
    `httparchive.pages.2018_05_01_desktop`),
  UNNEST(libs) AS lib

This uses the new pages dataset (same data as har:YYYY_MM_DD_pages_*) and does all of the JSON parsing in a UDF.

#standardSQL
CREATE TEMPORARY FUNCTION getJsLibs(report STRING)
RETURNS ARRAY<STRUCT<name STRING, version STRING>>
LANGUAGE js AS """
  try {
    const $ = JSON.parse(report);
    const libs = $.audits['no-vulnerable-libraries'].extendedInfo.jsLibs;
    return Array.isArray(libs) ? libs.map(({name, version}) => ({name, version})) : [];
  } catch (e) {
    return [];
  }
""";

SELECT
  APPROX_TOP_COUNT(lib.name, 10)
FROM (
  SELECT
    url,
    getJsLibs(report) AS libs
  FROM
    `httparchive.lighthouse.2018_05_01_mobile`),
  UNNEST(libs) AS lib

This uses the Lighthouse JS vulnerability audit, which includes the same library detection logic. The differences being that it is only available for mobile pages, Lighthouse audits may fail at a higher rate so not all pages may be included, and it doesn’t require double-parsing the results. Note in the first query we needed two JSON.parse calls, so this way is a bit more straightforward (kind of).

Building on these new detection methods, here’s a query that tracks the frequency of jQuery:

#standardSQL
CREATE TEMPORARY FUNCTION getJsLibs(payload STRING)
RETURNS ARRAY<STRUCT<name STRING, version STRING>>
LANGUAGE js AS """
  try {
    const $ = JSON.parse(payload);
    const libs = JSON.parse($['_third-parties']);
    return Array.isArray(libs) ? libs : [];
  } catch (e) {
    return [];
  }
""";

SELECT
  date,
  client,
  SUM(IF(lib.name = 'jQuery', 1, 0)) AS jQuery
FROM (
  SELECT
    SUBSTR(_TABLE_SUFFIX, 0, 10) AS date,
    IF(ENDS_WITH(_TABLE_SUFFIX, 'desktop'), 'desktop', 'mobile') AS client,
    getJsLibs(payload) AS libs
  FROM
    `httparchive.pages.*`),
  UNNEST(libs) AS lib
GROUP BY
  date,
  client
ORDER BY
  date,
  client

image

It’s clear that there was some mysterious data loss between September 2017 and February 2018 - we’ll have to look into that. But looking past that, we can see a clear trend that jQuery detections are declining. It’s also clear that jQuery usage is lower on mobile but the change over time compared to desktop seems to be about equal.