Who are the top RUM analytics providers?

SELECT vendor, sites, ROUND(ratio*10000)/100 as percent FROM (
  SELECT     
    COUNT(DISTINCT pageid, 1000000) as sites, 
    RATIO_TO_REPORT(sites) OVER() AS ratio,
    CASE 
      WHEN REGEXP_MATCH(url, "http[s]*://[www.|ssl.]*google-analytics.com/ga.js") THEN "Google Analytics"
      WHEN REGEXP_MATCH(url, "http[s]*://rum-static.pingdom.net/prum") THEN "Pingdom"
      WHEN REGEXP_MATCH(url, "insight.torbit.com/.*.js") THEN "Torbit"
      WHEN url CONTAINS  "aksb.min.js" THEN "Akamai"
      WHEN url CONTAINS "mpulse.net/boomerang/" 
        OR url CONTAINS "lognormal.net/boomerang/" THEN "SOASTA"
      WHEN url CONTAINS "/gomez" 
        AND url CONTAINS "rum" 
        AND url CONTAINS ".js" THEN "Gomez"
      WHEN url CONTAINS "boomerang.js" THEN "boomerang.js"
      WHEN url CONTAINS "/eum/rum.js" THEN "New Relic"
    END as vendor
  FROM [httparchive:runs.latest_requests] 
  GROUP BY vendor
  HAVING vendor IS NOT NULL
)
ORDER BY sites desc

Note: Akamai dynamically injects their RUM script at the time when the HTML is being generated (with customer defined sampling rate). As a result, the absolute count will vary (and won’t report all sites).

3 Likes

@souders all merged into a single query now! The percent column is the relative market share for all the sites that have RUM installed. Trying to figure out how to add an absolute % as well, but that’s turning out to be a bit more tricky… /cc @fhoffa

Reran the query with 4/15/17 data:

Not much change with GA but Pingdom, SOASTA, and Akamai have grown substantially.

Would also be interesting to see the distribution of Alexa ranks for each vendor before and after…

Edit: Looking at the absolute numbers, GA now has fewer sites, but a similar % of vendorshare. Where did they go?

Was there a change in New Relic detection? Going from 6440 sites to 2 seems suspect.

1 Like

Changing the NewRelic detection to:

WHEN url CONTAINS “js-agent.newrelic.com” THEN “New Relic”

appears to work -

1 Like

To add a few more providers into the mix (as AppDynamics can be self-hosted it may under-report their numbers)

SELECT vendor, sites, ROUND(ratio*10000)/100 as percent FROM (
  SELECT     
    COUNT(DISTINCT pageid, 1000000) as sites, 
    RATIO_TO_REPORT(sites) OVER() AS ratio,
    CASE 
      WHEN REGEXP_MATCH(url, "http[s]*://[www.|ssl.]*google-analytics.com/ga.js") THEN "Google Analytics"
      WHEN REGEXP_MATCH(url, "http[s]*://rum-static.pingdom.net/prum") THEN "Pingdom"
      WHEN REGEXP_MATCH(url, "insight.torbit.com/.*.js") THEN "Torbit"
      WHEN url CONTAINS  "aksb.min.js" THEN "Akamai"
      WHEN url CONTAINS "mpulse.net/boomerang/" 
        OR url CONTAINS "lognormal.net/boomerang/" THEN "SOASTA"
      WHEN url CONTAINS "/gomez" 
        AND url CONTAINS "rum" 
        AND url CONTAINS ".js" THEN "Gomez"
      WHEN url CONTAINS "boomerang.js" THEN "boomerang.js"
      WHEN url CONTAINS "js-agent.newrelic.com" THEN "New Relic"
      WHEN url CONTAINS "spdcrv" THEN "SpeedCurve"
      WHEN url CONTAINS "nccrum.core.js" THEN "NCC Web Perf"
      WHEN url CONTAINS "cdn.appdynamics.com" THEN "AppDynamics"
    END as vendor
  FROM [httparchive:runs.latest_requests] 
  GROUP BY vendor
  HAVING vendor IS NOT NULL
)
ORDER BY sites desc

Looking for my own purposes and thought I’d share the results as of May.
image

I’m curious what others think about this:
For the sites I’ve inspected and companies I’ve worked with that explore RUM as a performance tool, many of them use Google Analytics AND something else - is this analysis a fair representation of what’s out there?

I’m going to experiment with the querying tool some more, but haven’t arrived at any different results with a simple reordering of the case statement.

1 Like

Another conclusion/question -
I reran the query without the “HAVING vendor IS NOT NULL” and ended up with about 82% of sites having no observable RUM presence. That’s not terribly surprising, but it’s interesting when thinking about market penetration of paid-for performance monitoring solutions.

Rerunning the query against latest tables (with larger origin count)…

SELECT vendor, sites, ROUND(ratio*10000)/100 as percent FROM (
  SELECT     
    COUNT(DISTINCT page, 1000000) as sites, 
    RATIO_TO_REPORT(sites) OVER() AS ratio,
    CASE 
      WHEN REGEXP_MATCH(url, "http[s]*://[www.|ssl.]*google-analytics.com/ga.js") THEN "Google Analytics"
      WHEN REGEXP_MATCH(url, "http[s]*://rum-static.pingdom.net/prum") THEN "Pingdom"
      WHEN REGEXP_MATCH(url, "insight.torbit.com/.*.js") THEN "Torbit"
      WHEN url CONTAINS  "aksb.min.js" THEN "Akamai"
      WHEN url CONTAINS "mpulse.net/boomerang/" 
        OR url CONTAINS "lognormal.net/boomerang/" THEN "SOASTA"
      WHEN url CONTAINS "/gomez" 
        AND url CONTAINS "rum" 
        AND url CONTAINS ".js" THEN "Gomez"
      WHEN url CONTAINS "boomerang.js" THEN "boomerang.js"
      WHEN url CONTAINS "js-agent.newrelic.com" THEN "New Relic"
      WHEN url CONTAINS "spdcrv" THEN "SpeedCurve"
      WHEN url CONTAINS "nccrum.core.js" THEN "NCC Web Perf"
      WHEN url CONTAINS "cdn.appdynamics.com" THEN "AppDynamics"
    END as vendor
  FROM [httparchive:response_bodies.2018_07_01_desktop]  
  GROUP BY vendor
  HAVING vendor IS NOT NULL
)
ORDER BY sites desc

image

1 Like

I would suggest one change from @igrigorik’s query above to use the requests dataset rather than response_bodies, which is 3x the size. Since the query doesn’t care about the responses themselves (just the URL) requests would do.

Hmm, just realized that we’re missing some GA tags in above query (analytics.js, gtag.js)…

SELECT vendor, sites, ROUND(ratio*10000)/100 as percent FROM (
  SELECT     
    COUNT(DISTINCT page, 1000000) as sites, 
    RATIO_TO_REPORT(sites) OVER() AS ratio,
    CASE 
      WHEN REGEXP_MATCH(url, "http[s]*://[www.|ssl.]*google-analytics.com/ga.js") THEN "Google Analytics"
      WHEN REGEXP_MATCH(url, "http[s]*://[www.|ssl.]*google-analytics.com/analytics.js") THEN "Google Analytics"
      WHEN REGEXP_MATCH(url, "http[s]*://[www.|ssl.]*googletagmanager.com/gtag/js") THEN "Google Analytics"
      WHEN REGEXP_MATCH(url, "http[s]*://rum-static.pingdom.net/prum") THEN "Pingdom"
      WHEN REGEXP_MATCH(url, "insight.torbit.com/.*.js") THEN "Torbit"
      WHEN url CONTAINS  "aksb.min.js" THEN "Akamai"
      WHEN url CONTAINS "mpulse.net/boomerang/" 
        OR url CONTAINS "lognormal.net/boomerang/" THEN "SOASTA"
      WHEN url CONTAINS "/gomez" 
        AND url CONTAINS "rum" 
        AND url CONTAINS ".js" THEN "Gomez"
      WHEN url CONTAINS "boomerang.js" THEN "boomerang.js"
      WHEN url CONTAINS "js-agent.newrelic.com" THEN "New Relic"
      WHEN url CONTAINS "spdcrv" THEN "SpeedCurve"
      WHEN url CONTAINS "nccrum.core.js" THEN "NCC Web Perf"
      WHEN url CONTAINS "cdn.appdynamics.com" THEN "AppDynamics"
    END as vendor
  FROM [httparchive:requests.2018_07_01_desktop]  
  GROUP BY vendor
)
ORDER BY sites desc

Using the above, a big bump for GA…

image

1 Like

I can’t see the inside view, but is this really accurately representing RUM by including all of the GA content? I know developers can create and send performance metrics through GA, but it seems like that’s more of a boundary case.

GA isn’t on the short list at all when I think about RUM (performance) in any context. I’d be happy to know otherwise.

Updated query:

  1. Renamed Akamai to Akamai (Legacy RUM)
  2. Renamed SOASTA to Akamai mPulse
  3. Added the lux.js URL for SpeedCurve
  4. Added CloudFlare’s RUM
SELECT vendor, sites, ROUND(ratio*10000)/100 as percent FROM (
  SELECT     
    COUNT(DISTINCT page, 1000000) as sites, 
    RATIO_TO_REPORT(sites) OVER() AS ratio,
    CASE 
      WHEN REGEXP_MATCH(url, "http[s]*://[www.|ssl.]*google-analytics.com/ga.js") THEN "Google Analytics"
      WHEN REGEXP_MATCH(url, "http[s]*://[www.|ssl.]*google-analytics.com/analytics.js") THEN "Google Analytics"
      WHEN REGEXP_MATCH(url, "http[s]*://[www.|ssl.]*googletagmanager.com/gtag/js") THEN "Google Analytics"
      WHEN REGEXP_MATCH(url, "http[s]*://rum-static.pingdom.net/prum") THEN "Pingdom"
      WHEN REGEXP_MATCH(url, "insight.torbit.com/.*.js") THEN "Torbit"
      WHEN url CONTAINS "aksb.min.js" THEN "Akamai (Legacy RUM)"
      WHEN url CONTAINS "go-mpulse.net/boomerang/" 
        OR url CONTAINS "lognormal.net/boomerang/" THEN "Akamai mPulse"
      WHEN url CONTAINS "/gomez" 
        AND url CONTAINS "rum" 
        AND url CONTAINS ".js" THEN "Gomez"
      WHEN url CONTAINS "boomerang.js" THEN "boomerang.js"
      WHEN url CONTAINS "js-agent.newrelic.com" THEN "New Relic"
      WHEN (url CONTAINS "spdcrv" OR url CONTAINS 'speedcurve.com/js/lux.js') THEN "SpeedCurve"
      WHEN url CONTAINS "nccrum.core.js" THEN "NCC Web Perf"
      WHEN url CONTAINS "cdn.appdynamics.com" THEN "AppDynamics"
      WHEN url CONTAINS "static.cloudflareinsights.com/beacon.min.js" THEN "CloudFlare"
    END as vendor
  FROM [httparchive:requests.2020_01_01_desktop]  
  GROUP BY vendor
)
ORDER BY sites desc

(note you have to run in Legacy SQL mode)

2020-01-01 results:

Row vendor sites percent
1 null 4280567 57.46
2 Google Analytics 2994638 40.2
3 New Relic 142304 1.91
4 Akamai mPulse 14058 0.19
5 Pingdom 5795 0.08
6 CloudFlare 3585 0.05
7 AppDynamics 3192 0.04
8 Akamai (Legacy RUM) 3047 0.04
9 SpeedCurve 1940 0.03
10 boomerang.js 761 0.01
11 NCC Web Perf 19 0.0
1 Like

HTTP Archive integrates Wappalyzer detections upstream in WebPageTest and they have an Analytics category. It’s not exhaustive for web perf tools but it’s great because the detections are open source and community-maintained. I’d definitely recommend filing a PR to get any major RUM tools added.

Here’s an example of querying the Wappalyzer data, which lives in the technologies dataset:

#standardSQL
SELECT
  app,
  COUNT(0) AS sites,
  COUNT(0) / total AS percent
FROM
  `httparchive.technologies.2020_01_01_desktop`,
  (SELECT COUNT(0) AS total FROM `httparchive.summary_pages.2020_01_01_desktop`)
WHERE
  category = 'Analytics'
GROUP BY
  app,
  total
ORDER BY
  sites DESC

This query processes 1.12 GB, which is much smaller than querying the requests tables.

app sites percent
Google Analytics 3087580 72.13%
Google Analytics Enhanced eCommerce 185820 4.34%
Hotjar 177381 4.14%
Yandex.Metrika 134906 3.15%
Matomo 117786 2.75%
New Relic 116995 2.73%
comScore 84665 1.98%
TrackJs 79999 1.87%
Liveinternet 77979 1.82%
Quantcast 48154 1.12%
Crazy Egg 42600 1.00%
SiteCatalyst 31086 0.73%
Statcounter 30389 0.71%
Mixpanel 27669 0.65%
Segment 27241 0.64%
Visual Website Optimizer 20877 0.49%
Intercom 18662 0.44%
Optimizely 16710 0.39%
Gemius 16113 0.38%
Clicky 15133 0.35%
WP-Statistics 12453 0.29%
Chartbeat 10578 0.25%
Mouse Flow 10551 0.25%
Inspectlet 6315 0.15%
BugSnag 6299 0.15%

Also modified the boomerang.js query a bit to match a bunch of URLs we found from open-source users like /boomerang-n.n.n.min.js and /boomerang:

SELECT vendor, sites, ROUND(ratio*10000)/100 as percent FROM (
  SELECT     
    COUNT(DISTINCT page, 1000000) as sites, 
    RATIO_TO_REPORT(sites) OVER() AS ratio,
    CASE 
      WHEN REGEXP_MATCH(url, "http[s]*://[www.|ssl.]*google-analytics.com/ga.js") THEN "Google Analytics"
      WHEN REGEXP_MATCH(url, "http[s]*://[www.|ssl.]*google-analytics.com/analytics.js") THEN "Google Analytics"
      WHEN REGEXP_MATCH(url, "http[s]*://[www.|ssl.]*googletagmanager.com/gtag/js") THEN "Google Analytics"
      WHEN REGEXP_MATCH(url, "http[s]*://rum-static.pingdom.net/prum") THEN "Pingdom"
      WHEN REGEXP_MATCH(url, "insight.torbit.com/.*.js") THEN "Torbit"
      WHEN url CONTAINS "aksb.min.js" THEN "Akamai (Legacy RUM)"
      WHEN url CONTAINS "go-mpulse.net/boomerang/" 
        OR url CONTAINS "lognormal.net/boomerang/" THEN "Akamai mPulse"
      WHEN url CONTAINS "/gomez" 
        AND url CONTAINS "rum" 
        AND url CONTAINS ".js" THEN "Gomez"
      WHEN REGEXP_MATCH(url, r".*boomerang[\d\-\w\.]*(\.js)?$") THEN "boomerang.js"
      WHEN url CONTAINS "js-agent.newrelic.com" THEN "New Relic"
      WHEN (url CONTAINS "spdcrv" OR url CONTAINS 'speedcurve.com/js/lux.js') THEN "SpeedCurve"
      WHEN url CONTAINS "nccrum.core.js" THEN "NCC Web Perf"
      WHEN url CONTAINS "cdn.appdynamics.com" THEN "AppDynamics"
      WHEN url CONTAINS "static.cloudflareinsights.com/beacon.min.js" THEN "CloudFlare"
    END as vendor
  FROM [httparchive:requests.2020_01_01_desktop]  
  GROUP BY vendor
)
ORDER BY sites desc

Which bumps it up quite a bit:

Row vendor sites percent
1 null 4280567 56.88
2 Google Analytics 2994638 39.79
3 New Relic 142304 1.89
4 boomerang.js 76162 1.01
5 Akamai mPulse 14058 0.19
6 Pingdom 5795 0.08
7 CloudFlare 3585 0.05
8 AppDynamics 3192 0.04
9 Akamai (Legacy RUM) 3047 0.04
10 SpeedCurve 1940 0.03
11 NCC Web Perf 19 0.0

Thanks Rick! I’ll take a look at updating Wappalyzer’s data too

1 Like