JavaScript Library Detection


#1

Starting with the April 15, 2017 crawl, the har tables will include a _third-parties custom metric. It contains name and version information for ~100 of the most popular JavaScript libraries used on the web, which opens up an exciting new area for analysis.

The format of the metric is a JSON-encoded array of library objects like this: [{"name":"jQuery","version":"1.4.2"}]


Here’s an example of a query that gets all library names and versions for all pages in the 4/15 crawl:

    CREATE TEMPORARY FUNCTION parseJson(libs STRING)
    RETURNS ARRAY<STRUCT<name STRING, version STRING>>
    LANGUAGE js AS """
      try {
        return JSON.parse(libs);
      } catch (e) {
        return [];
      }
    """;

    SELECT
      url,
      lib
    FROM (
      SELECT
        url,
        parseJson(TRIM(JSON_EXTRACT(payload, "$._third-parties"), '"')) AS libs
      FROM
        `httparchive.har.2017_04_15_chrome_pages`)
    CROSS JOIN
      UNNEST(libs) AS lib
    ORDER BY
      url;

Note that this uses the standard SQL dialect (rather than legacy). You can run the query here. Alternatively, you can browse the results of this query in the temporary httparchive:scratchspace.2017_04_15_js_libs table (a permanent table still TBD).


Using this scratch table, it’s really easy to get answers to questions like “How popular is jQuery?”:

    SELECT
      ROUND(jquery.count / total.count, 2)
    FROM
      (SELECT COUNT(0) AS count FROM `httparchive.scratchspace.2017_04_15_js_libs` WHERE lib.name = 'jQuery') AS jquery,
      (SELECT COUNT(0) AS count FROM `httparchive.har.2017_04_15_chrome_pages`) AS total

It turns out that jQuery is used by 82% of the ~500K sites crawled by HTTP Archive!


How about the top 3 most popular jQuery versions?

    SELECT
      APPROX_TOP_COUNT(lib.version, 3)
    FROM
      `httparchive.scratchspace.2017_04_15_js_libs`
    WHERE
      lib.name = 'jQuery';

Results:

f0_.value	f0_.count
1.12.4		80179
1.11.1		28507
1.11.3		26073

Version 1.12.4 is by far the leader. Interestingly, the latest version (3.2.1) is all the way down at number 42 on the list of popular versions.

There’s so much more to explore with this data, so dig in and share your findings!


Check out https://github.com/HTTPArchive/httparchive/issues/77 for more about how this works under the hood.

PS: We’re working on expanding the detection to platforms (WordPress, etc.) as well, so stay tuned.


#2

Let’s dig deeper into the JS library data and try to answer some common questions. I’ll be adding to this thread with different analyses. First up…

What are the most popular libraries?

SELECT
  library.value AS library,
  library.count AS volume,
  ROUND(library.count / total.count, 3) AS coverage
FROM
  UNNEST((SELECT APPROX_TOP_COUNT(lib.name, 10) FROM `httparchive.scratchspace.2017_04_15_js_libs` WHERE lib.name IS NOT NULL)) AS library,
  (SELECT COUNT(0) AS count FROM `httparchive.har.2017_04_15_chrome_pages`) AS total
ORDER BY
  volume DESC

Results:

Row	library		volume	coverage
1	jQuery		394296	0.822	 
2	jQuery UI	104193	0.217	 
3	Modernizr	76339	0.159	 
4	Bootstrap	61711	0.129	 
5	yepnope		54589	0.114	 
6	FlexSlider	39465	0.082	 
7	SWFObject	23054	0.048	 
8	Underscore	19283	0.04	 
9	Google Maps	16091	0.034	 
10	Moment.js	14834	0.031

Run it on BigQuery

Querying for the top 10 libraries shows that jQuery is way out ahead and jQuery UI leads the race behind it.


http://jsfiddle.net/8e5uobnb/4/

Getting more than the top 10 is as easy as changing the APPROX_TOP_COUNT arguments. But because jQuery has so much coverage, plotting everything linearly makes the smaller libraries’ data impossible to read.


http://jsfiddle.net/8e5uobnb/5/

Plotting the full dataset on a logarithmically scaled chart reveals the long tail.
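To get a feel for why the log scale helps, here is a small JavaScript sketch that compares relative bar heights on a linear and a log10 scale, using three volumes from the top 10 table above. The function and variable names are illustrative only, and it assumes the log axis starts at 1.

```javascript
// Volumes copied from the top 10 table above.
const volumes = { 'jQuery': 394296, 'Bootstrap': 61711, 'Moment.js': 14834 };

// Height of each bar relative to the tallest one, on a given scale.
function relativeHeights(data, scale) {
  const max = Math.max(...Object.values(data).map(scale));
  const out = {};
  for (const [name, count] of Object.entries(data)) {
    out[name] = scale(count) / max;
  }
  return out;
}

const linear = relativeHeights(volumes, (v) => v);
const log = relativeHeights(volumes, (v) => Math.log10(v));

console.log(linear); // Moment.js is under 4% of jQuery's bar height
console.log(log);    // on a log scale it is roughly 75% of it
```

With the long tail of much smaller libraries the effect is even stronger, which is why the linear chart is hard to read.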


#3

Continuing with the JS library analysis, let’s look at…

What is the relationship between site ranking and library usage?

We start by constructing a query that will count the number of times a library is used in a range of Alexa ranks. So for example, it will tell us that React is used by 31 of the top 1000 sites. This data will be used to draw a histogram.

SELECT
  libs.name AS library,
  INTEGER(FLOOR(pages.rank / 1000) * 1000) AS bucket,
  COUNT(0) AS volume
FROM
  (SELECT url, lib.name AS name FROM httparchive:scratchspace.2017_04_15_js_libs WHERE lib.name IN ('jQuery', 'Google Maps', 'Bootstrap', 'Modernizr', 'Polymer', 'Angular', 'AngularJS', 'React')) AS libs JOIN
  (SELECT url, rank FROM httparchive:runs.latest_pages) AS pages ON pages.url = libs.url
WHERE
  pages.rank IS NOT NULL
GROUP BY
  library,
  bucket
ORDER BY
  bucket ASC

Run it on BigQuery

After a bit of pre-processing, we can feed the JSON data through a charting tool to visualize the results.
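The pre-processing step is not spelled out here, but it could look something like the sketch below: pivot the (library, bucket, volume) rows into one series per library. This is only an assumption about the shape a charting tool would want. The toSeries helper is hypothetical, and apart from React's 31 sites in the top 1000 (mentioned above), the volumes are made-up placeholders.

```javascript
// Rows shaped like the query output: one object per (library, bucket) pair.
// Only the React figure comes from the post; the jQuery volumes are placeholders.
const rows = [
  { library: 'jQuery', bucket: 1000, volume: 480 },
  { library: 'jQuery', bucket: 0, volume: 500 },
  { library: 'React', bucket: 0, volume: 31 },
];

// Group rows into one series per library, each a list of [bucket, volume]
// points sorted by bucket.
function toSeries(rows) {
  const byLibrary = new Map();
  for (const { library, bucket, volume } of rows) {
    if (!byLibrary.has(library)) byLibrary.set(library, []);
    byLibrary.get(library).push([Number(bucket), Number(volume)]);
  }
  return [...byLibrary].map(([name, data]) => ({
    name,
    data: data.sort((a, b) => a[0] - b[0]),
  }));
}

console.log(JSON.stringify(toSeries(rows)));
```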


http://jsfiddle.net/osq3Leur/

As expected from the previous post, jQuery dominates. But this chart tells us that the domination is pretty consistent across the rankings.

Peeling back jQuery and Modernizr, we can get a better look at how some of the other popular libraries are distributed.

Focusing now on Bootstrap and AngularJS, we can see that there is a noticeable bump in the top 50k ranked sites and usage decreases into the tail.

This is especially clear with React. There’s a huge spike in React usage in the top 10k sites.

On the other hand, Google Maps usage more strongly favors the tail of ranked sites over the top 50k, where usage is much lower. In fact, only 6 of the top 1000 sites use Google Maps. This is a good time to remind everyone that HTTP Archive only measures the home pages of these sites, so Maps usage on secondary pages would not be reflected.

Angular and Polymer adoption is low compared to the other libraries we looked at and noticeably sparse into the tail.


We can tweak our original query a bit to get a sense of library usage in the top 100 sites.

SELECT
  libs.name AS library,
  COUNT(0) AS volume
FROM
  (SELECT url, lib.name AS name FROM httparchive:scratchspace.2017_04_15_js_libs WHERE lib.name IN ('jQuery', 'Google Maps', 'Bootstrap', 'Modernizr', 'Polymer', 'Angular', 'AngularJS', 'React')) AS libs JOIN
  (SELECT url, rank FROM httparchive:runs.latest_pages) AS pages ON pages.url = libs.url
WHERE
  pages.rank < 100
GROUP BY
  library
ORDER BY
  volume DESC

Run it on BigQuery

Results:

library		volume
jQuery		47
Modernizr	6
React		3
Bootstrap	1

So the top 100 still favor jQuery, but at a much lower rate than the global average (47% vs 82%). Conversely, React has 3% usage in the top 100 as opposed to 0.4% overall.


#4

Taking your query - interactive Data Studio viz:

https://datastudio.google.com/org/aLzLLuH1QJC-2sBBmo7qdw/reporting/0ByGAKP3QmCjLS3E5Y1FVNEY5TTQ/page/cSrE


#5

Less popular sites (by Alexa rank) use AngularJS less, while Google Maps grows more popular:


#6

How are JS libraries changing over time?

Now that we have two crawls completed, let’s see what changed.

SELECT
  now.lib,
  now.volume,
  now.volume - previous.volume AS change,
  ROUND((now.volume - previous.volume) * 100 / previous.volume, 1) AS percent_change
FROM (
  SELECT
    lib.name AS lib,
    COUNT(0) AS volume
  FROM
    `httparchive.scratchspace.2017_04_15_js_libs`
  GROUP BY
    lib
) AS previous INNER JOIN (
  SELECT
    lib.name AS lib,
    COUNT(0) AS volume
  FROM
    `httparchive.scratchspace.2017_05_01_js_libs`
  GROUP BY
    lib
) AS now
ON previous.lib = now.lib
ORDER BY
  percent_change DESC

Run it on BigQuery and view the results on Google Sheets.

FuseJS had the biggest change relative to its volume. But in absolute terms, it gained just 3 more sites. The next most significant changes are to Angular and Zepto.js, which acquired 51 and 259 more sites respectively.

In absolute terms, Bootstrap gained the most sites with 397 more. On the other end, jQuery and jQuery UI lost the most sites, down 1,039 and 894 respectively. One could argue that the 7 biggest losers in web share during this period are all libraries that have been obsoleted in some way by recent advances in web standards:

  1. jQuery
  2. jQuery UI
  3. Modernizr
  4. yepnope
  5. FlexSlider
  6. SWFObject
  7. jQuery Tools

These are exciting changes to see and we’ll continue to monitor them as HTTP Archive continues to collect more data.
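As a sanity check on how the change and percent_change columns relate, here is a small JavaScript sketch of the same arithmetic. It assumes jQuery's 394,296 detections from the April 15 top 10 table (an APPROX_TOP_COUNT figure, so possibly approximate) as the previous volume, combined with the reported loss of 1,039. The percentChange helper is hypothetical.

```javascript
// Mirrors ROUND((now - previous) * 100 / previous, 1) from the query.
// Note: Math.round and BigQuery's ROUND can differ on exact .5 ties.
function percentChange(previous, now) {
  return Math.round(((now - previous) * 100 / previous) * 10) / 10;
}

// Assumption: the April 15 top 10 count is the "previous" volume, and the
// reported loss of 1039 sites gives the "now" volume.
const previous = 394296;
const now = previous - 1039;

console.log(now - previous);               // -1039
console.log(percentChange(previous, now)); // -0.3
```

So even the largest absolute loss is a fraction of a percent of jQuery's footprint, which is worth keeping in mind when reading the list above.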


#7

Hey Rick,

Did the format of _third-parties change? If you run the above query on the 2017-06-01 crawl, the query fails:

Error: Failed to coerce output value {"0":{"name":"Bootstrap","version":"3.3.6"},"1":{"name":"jQuery","version":"2.1.4"},"2":{"name":"jQuery UI","version":"1.10.2"}} to type ARRAY<STRUCT<name STRING, version STRING>>

Thanks!


#8

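Yeah that’s weird. It looks like the array was mistakenly stringified as an object, e.g. with indices as properties.

This updated UDF query should properly handle both the old and new formats. It passes the parsed value through Object.values, which returns the elements of an array and the property values of an index-keyed object alike, so the result always comes out as an array. (Array.from alone is not enough here: the object form has no length property, so it would come back empty.)

CREATE TEMPORARY FUNCTION parseJson(libs STRING)
RETURNS ARRAY<STRUCT<name STRING, version STRING>>
LANGUAGE js AS """
  try {
    // Works for both ["a", "b"] and {"0": "a", "1": "b"}.
    return Object.values(JSON.parse(libs));
  } catch (e) {
    return [];
  }
""";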

SELECT
  url,
  lib
FROM (
  SELECT
    url,
    parseJson(TRIM(JSON_EXTRACT(payload, "$._third-parties"), '"')) AS libs
  FROM
    `httparchive.har.2017_06_01_chrome_pages`)
CROSS JOIN
  UNNEST(libs) AS lib
ORDER BY
  url
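
To check how a normalization step behaves on both shapes outside of BigQuery, a quick local sketch like the one below can help. The normalizeLibs helper is hypothetical, and the sample strings are trimmed from the error message above.

```javascript
// Hypothetical helper: accept either the original array form or the
// index-keyed object form seen in the 2017-06-01 crawl.
function normalizeLibs(json) {
  try {
    const parsed = JSON.parse(json);
    if (Array.isArray(parsed)) return parsed;
    // Object.values returns values in ascending order for integer-like keys.
    return parsed && typeof parsed === 'object' ? Object.values(parsed) : [];
  } catch (e) {
    return [];
  }
}

const asArray = '[{"name":"jQuery","version":"2.1.4"}]';
const asObject = '{"0":{"name":"Bootstrap","version":"3.3.6"},"1":{"name":"jQuery","version":"2.1.4"}}';

console.log(normalizeLibs(asArray).length);  // 1
console.log(normalizeLibs(asObject).length); // 2
// For comparison: Array.from on an object with no length property is empty.
console.log(Array.from(JSON.parse(asObject)).length); // 0
```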

#9

It seems the previous queries don’t age well and there’s something wonky with the processing. Here are two new alternatives for accessing the top 10 JS libraries:

#standardSQL
CREATE TEMPORARY FUNCTION getJsLibs(payload STRING)
RETURNS ARRAY<STRUCT<name STRING, version STRING>>
LANGUAGE js AS """
  try {
    const $ = JSON.parse(payload);
    const libs = JSON.parse($['_third-parties']);
    return Array.isArray(libs) ? libs : [];
  } catch (e) {
    return [];
  }
""";

SELECT
  APPROX_TOP_COUNT(lib.name, 10)
FROM (
  SELECT
    url,
    getJsLibs(payload) AS libs
  FROM
    `httparchive.pages.2018_05_01_desktop`),
  UNNEST(libs) AS lib

This uses the new pages dataset (same data as har:YYYY_MM_DD_pages_*) and does all of the JSON parsing in a UDF.

#standardSQL
CREATE TEMPORARY FUNCTION getJsLibs(report STRING)
RETURNS ARRAY<STRUCT<name STRING, version STRING>>
LANGUAGE js AS """
  try {
    const $ = JSON.parse(report);
    const libs = $.audits['no-vulnerable-libraries'].extendedInfo.jsLibs;
    return Array.isArray(libs) ? libs.map(({name, version}) => ({name, version})) : [];
  } catch (e) {
    return [];
  }
""";

SELECT
  APPROX_TOP_COUNT(lib.name, 10)
FROM (
  SELECT
    url,
    getJsLibs(report) AS libs
  FROM
    `httparchive.lighthouse.2018_05_01_mobile`),
  UNNEST(libs) AS lib

This uses the Lighthouse JS vulnerability audit, which includes the same library detection logic. The differences are that it is only available for mobile pages, that Lighthouse audits may fail at a higher rate so not all pages may be included, and that it doesn’t require double-parsing the results. Note that the first query needed two JSON.parse calls, so this way is a bit more straightforward (kind of).
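
To illustrate the double-parsing in the first query, here is a local JavaScript sketch. It assumes the payload stores _third-parties as a JSON string nested inside the outer JSON, which is what the two JSON.parse calls imply. The sample library entry is made up for illustration.

```javascript
// Build a payload the way the first query expects it: the outer object is
// JSON, and its _third-parties field is itself a JSON-encoded string.
const sampleLibs = [{ name: 'jQuery', version: '1.12.4' }];
const payload = JSON.stringify({ '_third-parties': JSON.stringify(sampleLibs) });

// Same body as the first getJsLibs UDF above, as a plain function.
function getJsLibs(payload) {
  try {
    const $ = JSON.parse(payload);
    const libs = JSON.parse($['_third-parties']);
    return Array.isArray(libs) ? libs : [];
  } catch (e) {
    return [];
  }
}

// After one parse the field is still a string, hence the second parse.
console.log(typeof JSON.parse(payload)['_third-parties']); // "string"
console.log(getJsLibs(payload));
// A payload without the metric falls through to the catch block.
console.log(getJsLibs('{}')); // []
```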

Building on these new detection methods, here’s a query that tracks the frequency of jQuery:

#standardSQL
CREATE TEMPORARY FUNCTION getJsLibs(payload STRING)
RETURNS ARRAY<STRUCT<name STRING, version STRING>>
LANGUAGE js AS """
  try {
    const $ = JSON.parse(payload);
    const libs = JSON.parse($['_third-parties']);
    return Array.isArray(libs) ? libs : [];
  } catch (e) {
    return [];
  }
""";

SELECT
  date,
  client,
  SUM(IF(lib.name = 'jQuery', 1, 0)) AS jQuery
FROM (
  SELECT
    SUBSTR(_TABLE_SUFFIX, 0, 10) AS date,
    IF(ENDS_WITH(_TABLE_SUFFIX, 'desktop'), 'desktop', 'mobile') AS client,
    getJsLibs(payload) AS libs
  FROM
    `httparchive.pages.*`),
  UNNEST(libs) AS lib
GROUP BY
  date,
  client
ORDER BY
  date,
  client

It’s clear that there was some mysterious data loss between September 2017 and February 2018 - we’ll have to look into that. But looking past that, we can see a clear trend that jQuery detections are declining. It’s also clear that jQuery usage is lower on mobile but the change over time compared to desktop seems to be about equal.


#10

A year of JS library adoption data

Building on the query in the previous post to track jQuery usage over time, I wrote a general purpose query that gets the number of detections for all available libraries over the entire date range. You can see all of the results in this sheet:

It’s not editable, but feel free to make a copy and play with the pivot table to generate timeseries for different libraries.

I found something interesting when comparing the adoption of Angular and React:

React’s adoption was growing and ultimately peaked in September 2017, but then began to decline. In the most recent run (May 1, 2018) it’s almost back to where it was a year ago.

Contrast that with Angular adoption. It has been growing nonstop and is nearly 6x as big as it was last year.

Here’s the query used to build the sheet:

#standardSQL
CREATE TEMPORARY FUNCTION getJsLibs(payload STRING)
RETURNS ARRAY<STRUCT<name STRING, version STRING>>
LANGUAGE js AS """
  try {
    const $ = JSON.parse(payload);
    const libs = JSON.parse($['_third-parties']);
    return Array.isArray(libs) ? libs : [];
  } catch (e) {
    return [];
  }
""";

SELECT
  date,
  client,
  lib.name AS library,
  COUNT(DISTINCT url) AS frequency
FROM (
  SELECT
    REPLACE(SUBSTR(_TABLE_SUFFIX, 0, 10), '_', '-') AS date,
    IF(ENDS_WITH(_TABLE_SUFFIX, 'desktop'), 'desktop', 'mobile') AS client,
    url,
    getJsLibs(payload) AS libs
  FROM
    `httparchive.pages.*`),
  UNNEST(libs) AS lib
GROUP BY
  date,
  client,
  library
ORDER BY
  date,
  client,
  library

Warning: this query processes 280 GB.
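
The date and client columns in the query above are derived from each table’s suffix. A rough JavaScript equivalent of that logic is sketched below, assuming suffixes look like 2018_05_01_desktop or 2018_05_01_mobile. The parseSuffix name is hypothetical.

```javascript
// Mirrors the SUBSTR/REPLACE and ENDS_WITH logic from the query.
// Like the query, anything that does not end in "desktop" counts as mobile.
function parseSuffix(suffix) {
  return {
    date: suffix.slice(0, 10).replace(/_/g, '-'),
    client: suffix.endsWith('desktop') ? 'desktop' : 'mobile',
  };
}

console.log(parseSuffix('2018_05_01_desktop'));
// { date: '2018-05-01', client: 'desktop' }
```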

What else can we learn about JS libraries in this dataset?


#11

thx for the data warning!


#12

Throwing Vue into the mix is interesting. Seems it just passed React sometime in April.


#13

When you say “Angular” here, do we distinguish between AngularJS (1) and Angular (2=>6 etc)?


#14

Yes, “AngularJS” is listed separately. See https://docs.google.com/spreadsheets/d/1d_M7wAjRgEa7rFRHKc4xEBQC8LfHI9AcPy_SqewNeMs/edit#gid=553559059


#16

Has there been any discussion of accounting for libraries that don’t pollute global scope? The win.X detection used here won’t work for React when bundled via Webpack or Rollup, which would be the majority case at this point.

I’ve submitted PRs to improve React detection and add Preact detection that I think might be an approach worth taking for other plugins. Perhaps a single TreeWalker could be used and library detection nested within it? It would be good for performance.
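
To make the limitation concrete, here is a simplified JavaScript sketch of global-based detection. This is not the real detector code: the tests table and the mock window objects are stand-ins invented for illustration.

```javascript
// Simplified stand-in for global-based checks; NOT the real detector code.
const tests = {
  jQuery: (win) => (win.jQuery && win.jQuery.fn ? win.jQuery.fn.jquery : null),
  React: (win) => (win.React ? win.React.version : null),
};

// Run every test against a window-like object and collect the hits.
function detect(win) {
  const found = [];
  for (const [name, test] of Object.entries(tests)) {
    const version = test(win);
    if (version) found.push({ name, version });
  }
  return found;
}

// Mock of a page that loads libraries via script tags: globals are present.
const scriptTagPage = {
  jQuery: { fn: { jquery: '1.12.4' } },
  React: { version: '16.3.0' },
};
// Mock of a bundled page: React lives in module scope, so window is empty.
const bundledPage = {};

console.log(detect(scriptTagPage)); // both libraries found
console.log(detect(bundledPage));   // []
```

The second page may be using React just as heavily, but nothing on the global object gives it away.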


#17

Hi Rick, Noah,

I ran the same analysis from Jan to Sep 2018, and the results are a bit confusing:

The overall number of sites appears to have increased dramatically. I expect this is related to the switch to CrUX and the increase in scope. Is that a correct guess?

Apologies if this is answered elsewhere!

Best regards,


#18

@anthonyhogg Yes, we tripled our capacity in July, so the absolute count of JS library detections is expected to go up as well. For best results, I suggest looking at the number of detections as a percent of all websites in the crawl.
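
As a rough illustration of that normalization, the JavaScript sketch below divides detections by the total number of pages in each crawl. All of the numbers are placeholders rather than crawl results, and the share helper is hypothetical.

```javascript
// Placeholder numbers for illustration only, not real crawl results.
const crawls = [
  { date: '2018-06-01', detections: 2000, totalPages: 400000 },
  { date: '2018-07-01', detections: 6000, totalPages: 1200000 },
];

// Detections as a percentage of all pages in the crawl, to 2 decimals.
function share(detections, totalPages) {
  return Math.round((detections / totalPages) * 10000) / 100;
}

for (const { date, detections, totalPages } of crawls) {
  console.log(date, detections, share(detections, totalPages) + '%');
}
```

In this made-up case the raw count triples while the share stays at 0.5%, which is the kind of artifact the percentage view removes.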


#19

Hi Rick,

Thanks a lot, that clarifies it.

I was looking at normalized stacks and seeing a reversal of the vue/react trend, wanted to make sure I wasn’t misreading.

Anthony Hogg


#20

FYI as @developit notes, we’re also undercounting libraries that don’t leave any detectable traces in the global scope. That may contribute to the trend you were seeing (unrelated to the capacity increase in July).


#21

Makes sense, so the increased detection of React could be attributed to @developit’s PR (thanks!), and Vue could be stagnating because we’re just counting those that are adding themselves to window, pending a similar fix.

So in a nutshell these trends are to be treated very circumspectly?