SELECT COUNT(distinct page) AS num FROM `httparchive.response_bodies.2019_10_01_desktop`
WHERE REGEXP_CONTAINS(LOWER(body), r'<meta name=["\']?monetization["\']?')
134
SELECT COUNT(distinct page) AS num FROM `httparchive.response_bodies.2020_10_01_desktop`
WHERE REGEXP_CONTAINS(LOWER(body), r'<meta name=["\']?monetization["\']?')
SELECT COUNT(distinct page) AS num FROM `httparchive.response_bodies.2021_10_01_desktop`
WHERE REGEXP_CONTAINS(LOWER(body), r'<meta\s+([^>\s]+\s+)*name=["\']?monetization["\']?')
The result is 1275 out of 5,531,644 pages, or 0.023 %, so it looks like it has increased a lot in one year!
Would anyone be interested to add a custom metric to more reliably detect this at runtime in WebPageTest? That would also make the analysis much cheaper as it wouldn’t depend on the response_bodies dataset!
Updated query to use blink_features. Pasting here for posterity.
#standardSQL
# returns the value of the monetization meta node
SELECT
yyyymmdd,
client,
COUNTIF(feature = 'HTMLMetaElementMonetization') AS meta,
COUNTIF(feature = 'HTMLLinkElementMonetization') AS link,
COUNTIF(feature IN ('HTMLMetaElementMonetization', 'HTMLLinkElementMonetization')) AS either,
COUNT(DISTINCT url) AS total,
COUNTIF(feature = 'HTMLMetaElementMonetization') / COUNT(DISTINCT url) AS meta_pct,
COUNTIF(feature = 'HTMLLinkElementMonetization') / COUNT(DISTINCT url) AS link_pct,
COUNTIF(feature IN ('HTMLMetaElementMonetization', 'HTMLLinkElementMonetization')) / COUNT(DISTINCT url) AS either_pct
FROM
`httparchive.blink_features.features`
WHERE
yyyymmdd = '2021-07-01'
GROUP BY
yyyymmdd,
client
ORDER BY
client,
either DESC