Average age of jQuery

In the JS library detection announcement Tuesday, I used jQuery as an example of the powerful queries you can run to answer complex questions about the web.

Ilya tweeted:

jQuery is used by 82% of the ~500K sites crawled by HTTP Archive: JavaScript Library Detection - Analysis - HTTP Archive - really old jQuery too… 1.12.4 dominates. yikes?

And a really great discussion followed about why this particular version is so popular. As it turns out, 1.x versions are compatible with older browsers and still actively* maintained. 1.12.4’s popularity is aided by the fact that it is the default version in WordPress.

* The jQuery release history on Wikipedia shows that 1.12.4 was last updated on May 20, 2016.

So that got me thinking. If we take the release data from Wikipedia and join it with the version data on HTTP Archive, we could better understand not only what the most popular versions are, but how old.

I wrote this script to extract the data into a succinct format to be used by BigQuery. Here’s the full extraction:

[
  {
    "version": "1.0",
    "date": "2006-8-26"
  },
  {
    "version": "1.1",
    "date": "2007-1-14"
  },
  {
    "version": "1.2",
    "date": "2007-9-10"
  },
  {
    "version": "1.3",
    "date": "2009-1-14"
  },
  {
    "version": "1.4",
    "date": "2010-1-14"
  },
  {
    "version": "1.5",
    "date": "2011-1-31"
  },
  {
    "version": "1.6",
    "date": "2011-5-3"
  },
  {
    "version": "1.7",
    "date": "2011-11-3"
  },
  {
    "version": "1.7.2",
    "date": "2012-3-21"
  },
  {
    "version": "1.8",
    "date": "2012-8-9"
  },
  {
    "version": "1.8.3",
    "date": "2012-11-13"
  },
  {
    "version": "1.9",
    "date": "2013-1-15"
  },
  {
    "version": "1.9.1",
    "date": "2013-2-4"
  },
  {
    "version": "1.10",
    "date": "2013-5-24"
  },
  {
    "version": "1.10.2",
    "date": "2013-7-3"
  },
  {
    "version": "1.11",
    "date": "2014-1-24"
  },
  {
    "version": "1.11.3",
    "date": "2015-4-28"
  },
  {
    "version": "1.12",
    "date": "2016-1-8"
  },
  {
    "version": "1.12.4",
    "date": "2016-5-20"
  },
  {
    "version": "2.0",
    "date": "2013-4-18"
  },
  {
    "version": "2.0.3",
    "date": "2013-7-3"
  },
  {
    "version": "2.1",
    "date": "2014-1-24"
  },
  {
    "version": "2.1.4",
    "date": "2015-4-28"
  },
  {
    "version": "2.2",
    "date": "2016-1-8"
  },
  {
    "version": "2.2.4",
    "date": "2016-5-20"
  },
  {
    "version": "3.0",
    "date": "2016-6-9"
  },
  {
    "version": "3.0.0",
    "date": "2016-6-9"
  },
  {
    "version": "3.1",
    "date": "2016-7-7"
  },
  {
    "version": "3.1.1",
    "date": "2016-9-23"
  }
]

Then I wrote a User-Defined Function (UDF) in BigQuery that basically just lets me use JSON data as if it were a table. (Let me know if there’s an easier way!). I reference this UDF and join it with the JS library popularity data to get a list of weighted ages for each version. If a particular version does not have a release (or last modified) date, then it falls back to the date of its minor version.

CREATE TEMPORARY FUNCTION getVersionDates()
RETURNS ARRAY<STRUCT<version STRING, date STRING>>
LANGUAGE js AS """
  // Adapted from https://en.wikipedia.org/wiki/JQuery#Release_history
  // See https://gist.github.com/rviscomi/31916a648679b64532d3612356807792
  return [{"version":"1.0","date":"2006-8-26"},{"version":"1.1","date":"2007-1-14"},{"version":"1.2","date":"2007-9-10"},{"version":"1.3","date":"2009-1-14"},{"version":"1.4","date":"2010-1-14"},{"version":"1.5","date":"2011-1-31"},{"version":"1.6","date":"2011-5-3"},{"version":"1.7","date":"2011-11-3"},{"version":"1.7.2","date":"2012-3-21"},{"version":"1.8","date":"2012-8-9"},{"version":"1.8.3","date":"2012-11-13"},{"version":"1.9","date":"2013-1-15"},{"version":"1.9.1","date":"2013-2-4"},{"version":"1.10","date":"2013-5-24"},{"version":"1.10.2","date":"2013-7-3"},{"version":"1.11","date":"2014-1-24"},{"version":"1.11.3","date":"2015-4-28"},{"version":"1.12","date":"2016-1-8"},{"version":"1.12.4","date":"2016-5-20"},{"version":"2.0","date":"2013-4-18"},{"version":"2.0.3","date":"2013-7-3"},{"version":"2.1","date":"2014-1-24"},{"version":"2.1.4","date":"2015-4-28"},{"version":"2.2","date":"2016-1-8"},{"version":"2.2.4","date":"2016-5-20"},{"version":"3.0","date":"2016-6-9"},{"version":"3.0.0","date":"2016-6-9"},{"version":"3.1","date":"2016-7-7"},{"version":"3.1.1","date":"2016-9-23"}];
""";

SELECT SUM(weighted_age) / SUM(count) AS average_age_days FROM (
SELECT
  version.value AS version,
  version.count,
  release.version AS release_version,
  release.date AS release_date,
  DATE_DIFF(CURRENT_DATE(), release.date, DAY) AS age,
  version.count * DATE_DIFF(CURRENT_DATE(), release.date, DAY) AS weighted_age
FROM
  (SELECT
    COUNT(0) AS count,
    lib.version AS value
  FROM
    `httparchive.scratchspace.2017_04_15_js_libs`
  WHERE
    lib.name = 'jQuery'
  GROUP BY
    lib.version) AS version
JOIN
  (SELECT
    version,
    CAST(date AS DATE) AS date
  FROM
    UNNEST(getVersionDates())) AS release
ON
  release.version = version.value OR
  (NOT version.value IN ("1.0", "1.1", "1.2", "1.3", "1.4", "1.5", "1.6", "1.7", "1.7.2", "1.8", "1.8.3", "1.9", "1.9.1", "1.10", "1.10.2", "1.11", "1.11.3", "1.12", "1.12.4", "2.0", "2.0.3", "2.1", "2.1.4", "2.2", "2.2.4", "3.0", "3.0.0", "3.1", "3.1.1")) AND
  release.version = REGEXP_EXTRACT(version.value, r'\d+\.\d+')
ORDER BY
  count DESC,
  version DESC
)

See it on BigQuery.

The results of the inner query show the breakdown of versions and ages:

Then the outer query averages it all up to get the final answer: 1214 days, or about 3.3 years!

Here’s a visualization of this data: https://jsfiddle.net/sp48f2j3/