SELECT vendor, sites, ROUND(ratio*10000)/100 as percent FROM (
SELECT
COUNT(DISTINCT pageid, 1000000) as sites,
RATIO_TO_REPORT(sites) OVER() AS ratio,
CASE
WHEN REGEXP_MATCH(url, "http[s]*://[www.|ssl.]*google-analytics.com/ga.js") THEN "Google Analytics"
WHEN REGEXP_MATCH(url, "http[s]*://rum-static.pingdom.net/prum") THEN "Pingdom"
WHEN REGEXP_MATCH(url, "insight.torbit.com/.*.js") THEN "Torbit"
WHEN url CONTAINS "aksb.min.js" THEN "Akamai"
WHEN url CONTAINS "mpulse.net/boomerang/"
OR url CONTAINS "lognormal.net/boomerang/" THEN "SOASTA"
WHEN url CONTAINS "/gomez"
AND url CONTAINS "rum"
AND url CONTAINS ".js" THEN "Gomez"
WHEN url CONTAINS "boomerang.js" THEN "boomerang.js"
WHEN url CONTAINS "/eum/rum.js" THEN "New Relic"
END as vendor
FROM [httparchive:runs.latest_requests]
GROUP BY vendor
HAVING vendor IS NOT NULL
)
ORDER BY sites desc
Note: Akamai dynamically injects their RUM script at the time when the HTML is being generated (with customer defined sampling rate). As a result, the absolute count will vary (and won’t report all sites).
@souders all merged into a single query now! The percent column is the relative market share for all the sites that have RUM installed. Trying to figure out how to add an absolute % as well, but that’s turning out to be a bit more tricky… /cc @fhoffa
To add a few more providers into the mix (as AppDynamics can be self-hosted it may under-report their numbers)
SELECT vendor, sites, ROUND(ratio*10000)/100 as percent FROM (
SELECT
COUNT(DISTINCT pageid, 1000000) as sites,
RATIO_TO_REPORT(sites) OVER() AS ratio,
CASE
WHEN REGEXP_MATCH(url, "http[s]*://[www.|ssl.]*google-analytics.com/ga.js") THEN "Google Analytics"
WHEN REGEXP_MATCH(url, "http[s]*://rum-static.pingdom.net/prum") THEN "Pingdom"
WHEN REGEXP_MATCH(url, "insight.torbit.com/.*.js") THEN "Torbit"
WHEN url CONTAINS "aksb.min.js" THEN "Akamai"
WHEN url CONTAINS "mpulse.net/boomerang/"
OR url CONTAINS "lognormal.net/boomerang/" THEN "SOASTA"
WHEN url CONTAINS "/gomez"
AND url CONTAINS "rum"
AND url CONTAINS ".js" THEN "Gomez"
WHEN url CONTAINS "boomerang.js" THEN "boomerang.js"
WHEN url CONTAINS "js-agent.newrelic.com" THEN "New Relic"
WHEN url CONTAINS "spdcrv" THEN "SpeedCurve"
WHEN url CONTAINS "nccrum.core.js" THEN "NCC Web Perf"
WHEN url CONTAINS "cdn.appdynamics.com" THEN "AppDynamics"
END as vendor
FROM [httparchive:runs.latest_requests]
GROUP BY vendor
HAVING vendor IS NOT NULL
)
ORDER BY sites desc
Looking for my own purposes and thought I’d share the results as of May.
I’m curious what others think about this:
For the sites I’ve inspected and companies I’ve worked with that explore RUM as a performance tool, many of them use Google Analytics AND something else - is this analysis a fair representation of what’s out there?
I’m going to experiment with the querying tool some more, but haven’t arrived at any different results with a simple reordering of the case statement.
Another conclusion/question -
I reran the query without the “HAVING vendor IS NOT NULL” and ended up with about 82% of sites having no observable RUM presence. That’s not terribly surprising, but it’s interesting when thinking about market penetration of paid-for performance monitoring solutions.
Rerunning the query against latest tables (with larger origin count)…
SELECT vendor, sites, ROUND(ratio*10000)/100 as percent FROM (
SELECT
COUNT(DISTINCT page, 1000000) as sites,
RATIO_TO_REPORT(sites) OVER() AS ratio,
CASE
WHEN REGEXP_MATCH(url, "http[s]*://[www.|ssl.]*google-analytics.com/ga.js") THEN "Google Analytics"
WHEN REGEXP_MATCH(url, "http[s]*://rum-static.pingdom.net/prum") THEN "Pingdom"
WHEN REGEXP_MATCH(url, "insight.torbit.com/.*.js") THEN "Torbit"
WHEN url CONTAINS "aksb.min.js" THEN "Akamai"
WHEN url CONTAINS "mpulse.net/boomerang/"
OR url CONTAINS "lognormal.net/boomerang/" THEN "SOASTA"
WHEN url CONTAINS "/gomez"
AND url CONTAINS "rum"
AND url CONTAINS ".js" THEN "Gomez"
WHEN url CONTAINS "boomerang.js" THEN "boomerang.js"
WHEN url CONTAINS "js-agent.newrelic.com" THEN "New Relic"
WHEN url CONTAINS "spdcrv" THEN "SpeedCurve"
WHEN url CONTAINS "nccrum.core.js" THEN "NCC Web Perf"
WHEN url CONTAINS "cdn.appdynamics.com" THEN "AppDynamics"
END as vendor
FROM [httparchive:response_bodies.2018_07_01_desktop]
GROUP BY vendor
HAVING vendor IS NOT NULL
)
ORDER BY sites desc
I would suggest one change from @igrigorik’s query above to use the requests dataset rather than response_bodies, which is 3x the size. Since the query doesn’t care about the responses themselves (just the URL) requests would do.
Hmm, just realized that we’re missing some GA tags in above query (analytics.js, gtag.js)…
SELECT vendor, sites, ROUND(ratio*10000)/100 as percent FROM (
SELECT
COUNT(DISTINCT page, 1000000) as sites,
RATIO_TO_REPORT(sites) OVER() AS ratio,
CASE
WHEN REGEXP_MATCH(url, "http[s]*://[www.|ssl.]*google-analytics.com/ga.js") THEN "Google Analytics"
WHEN REGEXP_MATCH(url, "http[s]*://[www.|ssl.]*google-analytics.com/analytics.js") THEN "Google Analytics"
WHEN REGEXP_MATCH(url, "http[s]*://[www.|ssl.]*googletagmanager.com/gtag/js") THEN "Google Analytics"
WHEN REGEXP_MATCH(url, "http[s]*://rum-static.pingdom.net/prum") THEN "Pingdom"
WHEN REGEXP_MATCH(url, "insight.torbit.com/.*.js") THEN "Torbit"
WHEN url CONTAINS "aksb.min.js" THEN "Akamai"
WHEN url CONTAINS "mpulse.net/boomerang/"
OR url CONTAINS "lognormal.net/boomerang/" THEN "SOASTA"
WHEN url CONTAINS "/gomez"
AND url CONTAINS "rum"
AND url CONTAINS ".js" THEN "Gomez"
WHEN url CONTAINS "boomerang.js" THEN "boomerang.js"
WHEN url CONTAINS "js-agent.newrelic.com" THEN "New Relic"
WHEN url CONTAINS "spdcrv" THEN "SpeedCurve"
WHEN url CONTAINS "nccrum.core.js" THEN "NCC Web Perf"
WHEN url CONTAINS "cdn.appdynamics.com" THEN "AppDynamics"
END as vendor
FROM [httparchive:requests.2018_07_01_desktop]
GROUP BY vendor
)
ORDER BY sites desc
I can’t see the inside view, but is this really accurately representing RUM by including all of the GA content? I know developers can create and send performance metrics through GA, but it seems like that’s more of a boundary case.
GA isn’t on the short list at all when I think about RUM (performance) in any context. I’d be happy to know otherwise.
SELECT vendor, sites, ROUND(ratio*10000)/100 as percent FROM (
SELECT
COUNT(DISTINCT page, 1000000) as sites,
RATIO_TO_REPORT(sites) OVER() AS ratio,
CASE
WHEN REGEXP_MATCH(url, "http[s]*://[www.|ssl.]*google-analytics.com/ga.js") THEN "Google Analytics"
WHEN REGEXP_MATCH(url, "http[s]*://[www.|ssl.]*google-analytics.com/analytics.js") THEN "Google Analytics"
WHEN REGEXP_MATCH(url, "http[s]*://[www.|ssl.]*googletagmanager.com/gtag/js") THEN "Google Analytics"
WHEN REGEXP_MATCH(url, "http[s]*://rum-static.pingdom.net/prum") THEN "Pingdom"
WHEN REGEXP_MATCH(url, "insight.torbit.com/.*.js") THEN "Torbit"
WHEN url CONTAINS "aksb.min.js" THEN "Akamai (Legacy RUM)"
WHEN url CONTAINS "go-mpulse.net/boomerang/"
OR url CONTAINS "lognormal.net/boomerang/" THEN "Akamai mPulse"
WHEN url CONTAINS "/gomez"
AND url CONTAINS "rum"
AND url CONTAINS ".js" THEN "Gomez"
WHEN url CONTAINS "boomerang.js" THEN "boomerang.js"
WHEN url CONTAINS "js-agent.newrelic.com" THEN "New Relic"
WHEN (url CONTAINS "spdcrv" OR url CONTAINS 'speedcurve.com/js/lux.js') THEN "SpeedCurve"
WHEN url CONTAINS "nccrum.core.js" THEN "NCC Web Perf"
WHEN url CONTAINS "cdn.appdynamics.com" THEN "AppDynamics"
WHEN url CONTAINS "static.cloudflareinsights.com/beacon.min.js" THEN "CloudFlare"
END as vendor
FROM [httparchive:requests.2020_01_01_desktop]
GROUP BY vendor
)
ORDER BY sites desc
HTTP Archive integrates Wappalyzer detections upstream in WebPageTest and they have an Analytics category. It’s not exhaustive for web perf tools but it’s great because the detections are open source and community-maintained. I’d definitely recommend filing a PR to get any major RUM tools added.
Here’s an example of querying the Wappalyzer data, which lives in the technologies dataset:
#standardSQL
SELECT
app,
COUNT(0) AS sites,
COUNT(0) / total AS percent
FROM
`httparchive.technologies.2020_01_01_desktop`,
(SELECT COUNT(0) AS total FROM `httparchive.summary_pages.2020_01_01_desktop`)
WHERE
category = 'Analytics'
GROUP BY
app,
total
ORDER BY
sites DESC
This query processes 1.12 GB, which is much smaller than querying the requests tables.
Also modified the boomerang.js query a bit to match a bunch of URLs we found from open-source users like /boomerang-n.n.n.min.js and /boomerang:
SELECT vendor, sites, ROUND(ratio*10000)/100 as percent FROM (
SELECT
COUNT(DISTINCT page, 1000000) as sites,
RATIO_TO_REPORT(sites) OVER() AS ratio,
CASE
WHEN REGEXP_MATCH(url, "http[s]*://[www.|ssl.]*google-analytics.com/ga.js") THEN "Google Analytics"
WHEN REGEXP_MATCH(url, "http[s]*://[www.|ssl.]*google-analytics.com/analytics.js") THEN "Google Analytics"
WHEN REGEXP_MATCH(url, "http[s]*://[www.|ssl.]*googletagmanager.com/gtag/js") THEN "Google Analytics"
WHEN REGEXP_MATCH(url, "http[s]*://rum-static.pingdom.net/prum") THEN "Pingdom"
WHEN REGEXP_MATCH(url, "insight.torbit.com/.*.js") THEN "Torbit"
WHEN url CONTAINS "aksb.min.js" THEN "Akamai (Legacy RUM)"
WHEN url CONTAINS "go-mpulse.net/boomerang/"
OR url CONTAINS "lognormal.net/boomerang/" THEN "Akamai mPulse"
WHEN url CONTAINS "/gomez"
AND url CONTAINS "rum"
AND url CONTAINS ".js" THEN "Gomez"
WHEN REGEXP_MATCH(url, r".*boomerang[\d\-\w\.]*(\.js)?$") THEN "boomerang.js"
WHEN url CONTAINS "js-agent.newrelic.com" THEN "New Relic"
WHEN (url CONTAINS "spdcrv" OR url CONTAINS 'speedcurve.com/js/lux.js') THEN "SpeedCurve"
WHEN url CONTAINS "nccrum.core.js" THEN "NCC Web Perf"
WHEN url CONTAINS "cdn.appdynamics.com" THEN "AppDynamics"
WHEN url CONTAINS "static.cloudflareinsights.com/beacon.min.js" THEN "CloudFlare"
END as vendor
FROM [httparchive:requests.2020_01_01_desktop]
GROUP BY vendor
)
ORDER BY sites desc
Updated the query to use Standard SQL instead of Legacy SQL. Also added DynaTrace’s RUM.
SELECT
sites,
ROUND(sites / (SELECT COUNT(DISTINCT page) FROM `httparchive.requests.2020_06_01_desktop`) * 100 , 2) AS pct_sites,
vendor
FROM (
SELECT
COUNT(DISTINCT page) as sites,
CASE
WHEN REGEXP_CONTAINS(url, r"http[s]*://[www.|ssl.]*google-analytics.com/ga.js") THEN "Google Analytics"
WHEN REGEXP_CONTAINS(url, r"http[s]*://[www.|ssl.]*google-analytics.com/analytics.js") THEN "Google Analytics"
WHEN REGEXP_CONTAINS(url, r"http[s]*://[www.|ssl.]*googletagmanager.com/gtag/js") THEN "Google Analytics"
WHEN REGEXP_CONTAINS(url, r"http[s]*://rum-static.pingdom.net/prum") THEN "Pingdom"
WHEN REGEXP_CONTAINS(url, r"insight.torbit.com/.*.js") THEN "Torbit"
WHEN url LIKE "%aksb.min.js%" THEN "Akamai (Legacy RUM)"
WHEN url LIKE "%go-mpulse.net/boomerang/%"
OR url LIKE "%lognormal.net/boomerang/%" THEN "Akamai mPulse"
WHEN url LIKE "%/gomez%"
AND url LIKE "%rum%"
AND url LIKE "%.js%" THEN "Gomez"
WHEN REGEXP_CONTAINS(url, r".*boomerang[\d\-\w\.]*(\.js)?$") THEN "boomerang.js"
WHEN url LIKE "%js-agent.newrelic.com%" THEN "New Relic"
WHEN url LIKE "%spdcrv%" OR url LIKE '%speedcurve.com/js/lux.js%'THEN "SpeedCurve"
WHEN url LIKE "%nccrum.core.js%" THEN "NCC Web Perf"
WHEN url LIKE "%cdn.appdynamics.com%" THEN "AppDynamics"
WHEN url LIKE "%static.cloudflareinsights.com/beacon.min.js%" THEN "CloudFlare"
WHEN url LIKE "%ruxitagent%.js%" THEN "Dynatrace"
END as vendor
FROM `httparchive.requests.2020_06_01_desktop`
GROUP BY vendor
)
WHERE vendor IS NOT NULL
GROUP BY vendor, sites
ORDER BY sites DESC
Thanks @paulcalvano for adding Dynatrace. I don’t know how to identify DataDog OR Raygun’s RUM solutions but it will good to add those as well.
I remember @andydavies saying on a thread that he has a long list of RUM providers. @andydavies - are you able to list provider names here so that we can expand this analysis?