JavaScript Library Detection


#1

Starting with the April 15, 2017 crawl, har tables will include a _third-parties custom metric. The metric contains name and version information for ~100 of the most popular JavaScript libraries used on the web, which opens up an exciting new area for analysis.

The format of the metric is a JSON-encoded array of library objects like this: [{"name":"jQuery","version":"1.4.2"}]
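To make the format concrete, here is a minimal JavaScript sketch that parses a value like the one above. `parseLibraries` is a hypothetical helper name used only for illustration, and the fallback to an empty array is an assumption about how you might want to treat bad input.

```javascript
// Sample value in the format described above.
const sample = '[{"name":"jQuery","version":"1.4.2"}]';

// Parse a _third-parties string into an array of {name, version} objects.
// Returns an empty array when the input is not valid JSON or not an array.
function parseLibraries(metric) {
  try {
    const parsed = JSON.parse(metric);
    return Array.isArray(parsed) ? parsed : [];
  } catch (e) {
    return [];
  }
}

for (const lib of parseLibraries(sample)) {
  console.log(`${lib.name} ${lib.version}`);
}
console.log(parseLibraries('not json').length);
```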


Here’s an example of a query that gets all library names and versions for all pages in the April 15 crawl:

    CREATE TEMPORARY FUNCTION parseJson(libs STRING)
    RETURNS ARRAY<STRUCT<name STRING, version STRING>>
    LANGUAGE js AS """
      try {
        return JSON.parse(libs);
      } catch (e) {
        return [];
      }
    """;

    SELECT
      url,
      lib
    FROM (
      SELECT
        url,
        parseJson(TRIM(JSON_EXTRACT(payload, "$._third-parties"), '"')) AS libs
      FROM
        `httparchive.har.2017_04_15_chrome_pages`)
    CROSS JOIN
      UNNEST(libs) AS lib
    ORDER BY
      url;

Note that this uses the standard SQL dialect (rather than legacy). You can run the query here. Alternatively, you can browse the results of this query in the temporary httparchive:scratchspace.2017_04_15_js_libs table (a permanent table still TBD).


Using this scratch table, it’s really easy to get answers to questions like “How popular is jQuery?”:

    SELECT
      ROUND(jquery.count / total.count, 2)
    FROM
      (SELECT COUNT(0) AS count FROM `httparchive.scratchspace.2017_04_15_js_libs` WHERE lib.name = 'jQuery') AS jquery,
      (SELECT COUNT(0) AS count FROM `httparchive.har.2017_04_15_chrome_pages`) AS total

It turns out that jQuery is used by 82% of the ~500K sites crawled by HTTP Archive!


How about the top 3 most popular jQuery versions?

    SELECT
      APPROX_TOP_COUNT(lib.version, 3)
    FROM
      `httparchive.scratchspace.2017_04_15_js_libs`
    WHERE
      lib.name = 'jQuery';

Results:

f0_.value	f0_.count
1.12.4		80179
1.11.1		28507
1.11.3		26073

Version 1.12.4 is by far the leader. Interestingly, the latest version (3.2.1) is all the way down at number 42 on the list of popular versions.
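If you want to reason about what APPROX_TOP_COUNT returns, an exact top-N count is easy to sketch in JavaScript. This is only an illustration: BigQuery's function is approximate, `topCount` is a hypothetical helper, and the input list below is made up rather than taken from the crawl.

```javascript
// Count occurrences of each value and return the n most frequent,
// shaped like the query output: [{value, count}, ...].
function topCount(values, n) {
  const counts = new Map();
  for (const v of values) {
    counts.set(v, (counts.get(v) || 0) + 1);
  }
  return [...counts.entries()]
    .map(([value, count]) => ({ value, count }))
    .sort((a, b) => b.count - a.count)
    .slice(0, n);
}

// Hypothetical input: one jQuery version string per page.
const versions = ['1.12.4', '1.11.1', '1.12.4', '1.11.3', '1.12.4', '1.11.1', '3.2.1'];
console.log(topCount(versions, 3));
```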

There’s so much more to explore with this data, so dig in and share your findings!


Check out https://github.com/HTTPArchive/httparchive/issues/77 for more about how this works under the hood.

PS: We’re working on expanding the detection to platforms (WordPress, etc.) as well, so stay tuned.


#2

Let’s dig deeper into the JS library data and try to answer some common questions. I’ll be adding to this thread with different analyses. First up…

What are the most popular libraries?

SELECT
  library.value AS library,
  library.count AS volume,
  ROUND(library.count / total.count, 3) AS coverage
FROM
  UNNEST((SELECT APPROX_TOP_COUNT(lib.name, 10) FROM `httparchive.scratchspace.2017_04_15_js_libs` WHERE lib.name IS NOT NULL)) AS library,
  (SELECT COUNT(0) AS count FROM `httparchive.har.2017_04_15_chrome_pages`) AS total
ORDER BY
  volume DESC

Results:

Row	library		volume	coverage
1	jQuery		394296	0.822	 
2	jQuery UI	104193	0.217	 
3	Modernizr	76339	0.159	 
4	Bootstrap	61711	0.129	 
5	yepnope		54589	0.114	 
6	FlexSlider	39465	0.082	 
7	SWFObject	23054	0.048	 
8	Underscore	19283	0.04	 
9	Google Maps	16091	0.034	 
10	Moment.js	14834	0.031

Run it on BigQuery

Querying for the top 10 libraries shows that jQuery is way out ahead and jQuery UI leads the race behind it.


http://jsfiddle.net/8e5uobnb/4/

Getting more than the top 10 is as easy as changing the APPROX_TOP_COUNT arguments. But because jQuery has so much coverage, plotting everything linearly makes the smaller libraries’ data impossible to read.


http://jsfiddle.net/8e5uobnb/5/

Plotting the full dataset on a logarithmic scale reveals the long tail.
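To see why the log scale helps, this small JavaScript sketch compares where each library from the top-10 results would land on a linear axis versus a log10 axis. The normalization (largest value at 1, log axis starting at 1) is an assumption made for illustration and is not taken from the linked charts; `axisPositions` is a hypothetical helper.

```javascript
// Volumes copied from the top-10 results earlier in this post.
const volumes = [
  ['jQuery', 394296],
  ['jQuery UI', 104193],
  ['Modernizr', 76339],
  ['Bootstrap', 61711],
  ['yepnope', 54589],
  ['FlexSlider', 39465],
  ['SWFObject', 23054],
  ['Underscore', 19283],
  ['Google Maps', 16091],
  ['Moment.js', 14834],
];

// Relative position of a value on a linear axis and on a log10 axis,
// both normalized so that max maps to 1.
function axisPositions(volume, max) {
  return {
    linear: volume / max,
    log: Math.log10(volume) / Math.log10(max),
  };
}

const maxVolume = volumes[0][1];
for (const [name, volume] of volumes) {
  const p = axisPositions(volume, maxVolume);
  console.log(`${name}: linear ${p.linear.toFixed(3)}, log ${p.log.toFixed(3)}`);
}
```

On the linear axis Moment.js sits at under 4% of jQuery's height, while on the log axis it reaches roughly three quarters of it, which is why the smaller libraries become readable.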


#3

Continuing with the JS library analysis, let’s look at…

What is the relationship between site ranking and library usage?

We start by constructing a query that will count the number of times a library is used in a range of Alexa ranks. So for example, it will tell us that React is used by 31 of the top 1000 sites. This data will be used to draw a histogram. Note that this query is written in the legacy SQL dialect.

SELECT
  libs.name AS library,
  INTEGER(FLOOR(pages.rank / 1000) * 1000) AS bucket,
  COUNT(0) AS volume
FROM
  (SELECT url, lib.name AS name FROM httparchive:scratchspace.2017_04_15_js_libs WHERE lib.name IN ('jQuery', 'Google Maps', 'Bootstrap', 'Modernizr', 'Polymer', 'Angular', 'AngularJS', 'React')) AS libs JOIN
  (SELECT url, rank FROM httparchive:runs.latest_pages) AS pages ON pages.url = libs.url
WHERE
  pages.rank IS NOT NULL
GROUP BY
  library,
  bucket
ORDER BY
  bucket ASC

Run it on BigQuery

After a bit of pre-processing, we can feed the JSON data through a charting tool to visualize the results.
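As a rough sketch of what that pre-processing could look like (not necessarily what was used for the charts here), the JavaScript below mirrors the query's bucketing and groups the result rows into one series per library. `bucketOf` and `toSeries` are hypothetical helpers and the rows are invented sample data.

```javascript
// Same bucketing as the query: round the rank down to a multiple of 1000.
// Ranks 1 to 999 land in bucket 0, ranks 1000 to 1999 in bucket 1000, and so on.
function bucketOf(rank) {
  return Math.floor(rank / 1000) * 1000;
}

// Group rows of {library, bucket, volume} into one series per library,
// each series being a list of [bucket, volume] points sorted by bucket.
function toSeries(rows) {
  const series = {};
  for (const { library, bucket, volume } of rows) {
    (series[library] = series[library] || []).push([bucket, volume]);
  }
  for (const points of Object.values(series)) {
    points.sort((a, b) => a[0] - b[0]);
  }
  return series;
}

// Hypothetical rows, not real query output.
const rows = [
  { library: 'React', bucket: 1000, volume: 12 },
  { library: 'jQuery', bucket: 0, volume: 600 },
  { library: 'React', bucket: 0, volume: 30 },
];
console.log(JSON.stringify(toSeries(rows)));
console.log(bucketOf(12345));
```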


http://jsfiddle.net/osq3Leur/

As expected from the previous post, jQuery dominates. But this chart tells us that the domination is pretty consistent across the rankings.

Peeling back jQuery and Modernizr, we can get a better look at how some of the other popular libraries are distributed.

Focusing now on Bootstrap and AngularJS, we can see that there is a noticeable bump in the top 50k ranked sites and usage decreases into the tail.

This is especially clear with React. There’s a huge spike in React usage in the top 10k sites.

On the other hand, Google Maps usage more strongly favors the tail of ranked sites over the top 50k, where usage is much lower. In fact, only 6 of the top 1000 sites use Google Maps. This is a good time to remind everyone that HTTP Archive only measures the home pages of these sites, so Maps usage on secondary pages would not be reflected.

Angular and Polymer adoption is low compared to the other libraries we looked at and noticeably sparse into the tail.


We can tweak our original query a bit to get a sense of library usage in the top 100 sites.

SELECT
  libs.name AS library,
  COUNT(0) AS volume
FROM
  (SELECT url, lib.name AS name FROM httparchive:scratchspace.2017_04_15_js_libs WHERE lib.name IN ('jQuery', 'Google Maps', 'Bootstrap', 'Modernizr', 'Polymer', 'Angular', 'AngularJS', 'React')) AS libs JOIN
  (SELECT url, rank FROM httparchive:runs.latest_pages) AS pages ON pages.url = libs.url
WHERE
  pages.rank < 100
GROUP BY
  library
ORDER BY
  volume DESC

Run it on BigQuery

Results:

library		volume
jQuery		47
Modernizr	6
React		3
Bootstrap	1

So the top 100 still favor jQuery, but at a much lower rate than the global average (47% vs 82%). Conversely, React has 3% usage in the top 100 as opposed to 0.4% overall.


#4

Taking your query, here’s an interactive Data Studio visualization:

https://datastudio.google.com/org/aLzLLuH1QJC-2sBBmo7qdw/reporting/0ByGAKP3QmCjLS3E5Y1FVNEY5TTQ/page/cSrE


#5

Less popular sites (by Alexa rank) use AngularJS less, while Google Maps becomes more popular:


#6

How are JS libraries changing over time?

Now that we have two crawls completed, let’s see what changed.

SELECT
  now.lib,
  now.volume,
  now.volume - previous.volume AS change,
  ROUND((now.volume - previous.volume) * 100 / previous.volume, 1) AS percent_change
FROM (
  SELECT
    lib.name AS lib,
    COUNT(0) AS volume
  FROM
    `httparchive.scratchspace.2017_04_15_js_libs`
  GROUP BY
    lib
) AS previous INNER JOIN (
  SELECT
    lib.name AS lib,
    COUNT(0) AS volume
  FROM
    `httparchive.scratchspace.2017_05_01_js_libs`
  GROUP BY
    lib
) AS now
ON previous.lib = now.lib
ORDER BY
  percent_change DESC

Run it on BigQuery and view the results on Google Sheets.
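For anyone who wants to check the arithmetic outside BigQuery, here is a minimal JavaScript sketch of the same comparison. `compareCrawls` is a hypothetical helper, and the library names and counts are invented for illustration.

```javascript
// Inner-join two Maps of library name to page count and compute, for each
// library present in both, the absolute and percentage change.
function compareCrawls(previous, now) {
  const rows = [];
  for (const [lib, volume] of now) {
    if (!previous.has(lib)) continue; // like the INNER JOIN, skip unmatched libraries
    const before = previous.get(lib);
    const change = volume - before;
    const percentChange = Math.round((change * 100 / before) * 10) / 10;
    rows.push({ lib, volume, change, percentChange });
  }
  return rows.sort((a, b) => b.percentChange - a.percentChange);
}

// Hypothetical counts for two crawls.
const previousCrawl = new Map([['LibA', 10], ['LibB', 1000], ['LibC', 500]]);
const currentCrawl = new Map([['LibA', 13], ['LibB', 990], ['LibD', 5]]);
console.log(compareCrawls(previousCrawl, currentCrawl));
```

Two things stand out in the sample: a library with a tiny volume can top the list with a gain of only 3 sites, and a library that appears in only one of the two crawls drops out of the result entirely because of the inner join.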

FuseJS had the biggest change relative to its volume. But in absolute terms, it gained just 3 more sites. The next most significant changes are to Angular and Zepto.js, which acquired 51 and 259 more sites respectively.

In absolute terms, Bootstrap gained the most sites with 397 more. On the other end, jQuery and jQuery UI lost the most sites, dropping by 1039 and 894 respectively. One could argue that the 7 biggest losers in web share during this period are all libraries that have been obsoleted in some way by recent advances in web standards:

  1. jQuery
  2. jQuery UI
  3. Modernizr
  4. yepnope
  5. FlexSlider
  6. SWFObject
  7. jQuery Tools

These are exciting changes to see and we’ll continue to monitor them as HTTP Archive continues to collect more data.


#7

Hey Rick,

Did the format of _third-parties change? If you run the above query on the 2017-06-01 crawl, the query fails:

Error: Failed to coerce output value {"0":{"name":"Bootstrap","version":"3.3.6"},"1":{"name":"jQuery","version":"2.1.4"},"2":{"name":"jQuery UI","version":"1.10.2"}} to type ARRAY<STRUCT<name STRING, version STRING>>

Thanks!


#8

Yeah that’s weird. It looks like the array was mistakenly stringified as an object, i.e. with indices as property names.

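This updated UDF query should properly handle both the old and new formats. It maps over the keys of the parsed value, so the result always comes out as an array whether the JSON is a real array or an object keyed by index.

CREATE TEMPORARY FUNCTION parseJson(libs STRING)
RETURNS ARRAY<STRUCT<name STRING, version STRING>>
LANGUAGE js AS """
  try {
    // Works for a real array and for an object keyed by index ("0", "1", ...).
    var parsed = JSON.parse(libs);
    return Object.keys(parsed).map(function(key) { return parsed[key]; });
  } catch (e) {
    return [];
  }
""";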

SELECT
  url,
  lib
FROM (
  SELECT
    url,
    parseJson(TRIM(JSON_EXTRACT(payload, "$._third-parties"), '"')) AS libs
  FROM
    `httparchive.har.2017_06_01_chrome_pages`)
CROSS JOIN
  UNNEST(libs) AS lib
ORDER BY
  url
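
To sanity-check the normalization outside BigQuery, here is a minimal JavaScript sketch. `toLibraryArray` is a hypothetical helper and the sample strings are shortened versions of the two formats discussed in this thread. One detail worth knowing: `Array.from` on a plain object that has index keys but no `length` property yields an empty array, so the sketch maps over `Object.keys` instead.

```javascript
// Convert either format of the metric into an array of {name, version} objects.
// Handles a JSON array as well as an object keyed by index ("0", "1", ...).
function toLibraryArray(json) {
  try {
    const parsed = JSON.parse(json);
    if (parsed === null || typeof parsed !== 'object') return [];
    return Object.keys(parsed).map(function (key) { return parsed[key]; });
  } catch (e) {
    return [];
  }
}

const oldFormat = '[{"name":"jQuery","version":"2.1.4"}]';
const newFormat = '{"0":{"name":"Bootstrap","version":"3.3.6"},"1":{"name":"jQuery","version":"2.1.4"}}';

console.log(toLibraryArray(oldFormat).length);
console.log(toLibraryArray(newFormat).map(function (lib) { return lib.name; }));

// For comparison: a plain object has no length, so Array.from sees zero items.
console.log(Array.from(JSON.parse(newFormat)).length);
```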