Usage of Web Monetization 2019 vs 2020

I wanted to check the adoption rate of Web Monetization, so I ran two queries against the HTTP Archive.

Results

2019-10-01: 134 pages out of 4,325,232 = 0.0031%
2020-10-01: 415 pages out of 5,506,819 = 0.0075%

So very few pages use it; more pages use the nonstandard, obsolete, no-op basefont element (0.0092% of pages). The year-over-year increase is ~142% (only two data points, though!), so if that rate continued it would take 6 years to reach >1% of pages and 10 years to reach >50% of pages.
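The extrapolation can be checked with a few lines of arithmetic. This is only a sketch: it assumes the 2019 to 2020 growth factor repeats unchanged every year, and `yearsToExceed` is a hypothetical helper name. Computed from the raw page counts, the factor comes out at about 2.43, in line with the ~142% figure up to rounding.

```javascript
// Page counts taken from the results above.
const share2019 = 134 / 4325232;
const share2020 = 415 / 5506819;

// Year-over-year growth factor of the share of pages.
const growth = share2020 / share2019;

// Hypothetical helper: whole years after 2020 until the share exceeds
// `threshold`, assuming the same growth factor applies every year.
function yearsToExceed(threshold) {
  let share = share2020;
  let years = 0;
  while (share <= threshold) {
    share *= growth;
    years += 1;
  }
  return years;
}

console.log("growth factor:", growth.toFixed(2));
console.log("years to exceed 1%:", yearsToExceed(0.01));
console.log("years to exceed 50%:", yearsToExceed(0.5));
```

With only two data points, this says more about compounding than about where adoption is actually heading.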

Queries

SELECT COUNT(distinct page) AS num FROM `httparchive.response_bodies.2019_10_01_desktop`
WHERE REGEXP_CONTAINS(LOWER(body), r'<meta name=["\']?monetization["\']?')

134

SELECT COUNT(distinct page) AS num FROM `httparchive.response_bodies.2020_10_01_desktop`
WHERE REGEXP_CONTAINS(LOWER(body), r'<meta name=["\']?monetization["\']?')

415

As @nhoizey pointed out on Twitter, the regex doesn’t account for pages that put the content attribute before the name attribute.

I reran the query for 2020, with r'<meta\s+([^>\s]+\s+)*name=["\']?monetization["\']?'

500 instead of 415, or 0.0091%.
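To illustrate the difference between the two patterns, here is a small sketch that runs both against sample markup. The patterns are transcribed as JavaScript regex literals (BigQuery itself uses RE2), and the markup samples and payment pointer are hypothetical placeholders.

```javascript
// The two patterns from the queries, written as JavaScript regex literals.
const original = /<meta name=["']?monetization["']?/;
const revised = /<meta\s+([^>\s]+\s+)*name=["']?monetization["']?/;

// Hypothetical markup samples; the payment pointer is a placeholder.
const nameFirst = '<meta name="monetization" content="$wallet.example/alice">';
const contentFirst = '<meta content="$wallet.example/alice" name="monetization">';

for (const html of [nameFirst, contentFirst]) {
  const body = html.toLowerCase(); // the queries apply LOWER(body) first
  console.log(html, "| original:", original.test(body), "| revised:", revised.test(body));
}
```

The original pattern only matches when name is the first attribute, while the revised one skips over any whitespace-separated attributes that come before it.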

That’s 20% more, that’s huge! 😅

I just ran the query with 2021-10 data:

SELECT COUNT(distinct page) AS num FROM `httparchive.response_bodies.2021_10_01_desktop`
WHERE REGEXP_CONTAINS(LOWER(body), r'<meta\s+([^>\s]+\s+)*name=["\']?monetization["\']?')

The result is 1,275 out of 5,531,644 pages, or 0.023%, so it looks like usage has increased a lot in one year!

Would anyone be interested in adding a custom metric to detect this more reliably at runtime in WebPageTest? That would also make the analysis much cheaper, as it wouldn’t depend on the response_bodies dataset!

For the 2021 Markup chapter of the Web Almanac, we created this query, which doesn’t depend on response_bodies. I believe the numbers add up too.

#standardSQL

# returns the value of the monetization meta node
CREATE TEMPORARY FUNCTION get_almanac_meta_monetization(almanac_string STRING)
RETURNS STRING LANGUAGE js AS '''
try {
    const almanac = JSON.parse(almanac_string); 
    if (Array.isArray(almanac) || typeof almanac != 'object') return '';
    let nodes = almanac["meta-nodes"]["nodes"];
    nodes = typeof nodes === "string" ? JSON.parse(nodes) : nodes;
    
    const filteredNode = nodes.filter(n => n.name && n.name.toLowerCase() == "monetization");    
    if (filteredNode.length === 0) {
      return "";
    }
    
    return filteredNode[0].content;
} catch (e) { 
  return "";
}
''';

SELECT
  client,
  COUNTIF(monetization != "") AS freq,
  COUNT(0) AS total,
  COUNTIF(monetization != "") / COUNT(0) AS pct
FROM (
  SELECT
    _TABLE_SUFFIX AS client,
    get_almanac_meta_monetization(JSON_EXTRACT_SCALAR(payload, '$._almanac')) AS monetization
  FROM
    `httparchive.pages.2021_07_01_*`
)
GROUP BY
  client
ORDER BY
  client,
  freq DESC

Source: almanac.httparchive.org/monetization.sql at main · HTTPArchive/almanac.httparchive.org · GitHub
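To see what the UDF returns, its body can be run outside BigQuery. The sketch below wraps the same logic in a plain function (`getAlmanacMetaMonetization` is just a camel-cased name for this example) and feeds it hypothetical payloads; the real `_almanac` metric contains many more fields, and the payment pointer is a placeholder.

```javascript
// Same logic as the UDF body above, wrapped in a plain function.
function getAlmanacMetaMonetization(almanacString) {
  try {
    const almanac = JSON.parse(almanacString);
    if (Array.isArray(almanac) || typeof almanac != "object") return "";
    let nodes = almanac["meta-nodes"]["nodes"];
    // The nodes may be stored as a JSON string or as an array.
    nodes = typeof nodes === "string" ? JSON.parse(nodes) : nodes;
    const filteredNode = nodes.filter(n => n.name && n.name.toLowerCase() == "monetization");
    if (filteredNode.length === 0) return "";
    return filteredNode[0].content;
  } catch (e) {
    // Malformed JSON or a missing "meta-nodes" key ends up here.
    return "";
  }
}

// Hypothetical payloads shaped like the `_almanac` custom metric.
const withMeta = JSON.stringify({
  "meta-nodes": {
    nodes: [
      { name: "viewport", content: "width=device-width" },
      { name: "Monetization", content: "$wallet.example/alice" },
    ],
  },
});
const withoutMeta = JSON.stringify({
  "meta-nodes": { nodes: [{ name: "viewport", content: "width=device-width" }] },
});

console.log(getAlmanacMetaMonetization(withMeta));
console.log(JSON.stringify(getAlmanacMetaMonetization(withoutMeta)));
console.log(JSON.stringify(getAlmanacMetaMonetization("not json")));
```

Because the name comparison is lowercased, a page that writes `name="Monetization"` is still counted, and anything unparsable falls back to an empty string, which the outer query treats as "not present".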

Oh that’s a better idea @imkevdev. If the monetization info is always in a meta element, querying the meta-nodes object should be sufficient.

Updated query to use blink_features. Pasting here for posterity.

#standardSQL

# counts pages that trigger the meta and link monetization Blink features
SELECT
  yyyymmdd,
  client,
  COUNTIF(feature = 'HTMLMetaElementMonetization') AS meta,
  COUNTIF(feature = 'HTMLLinkElementMonetization') AS link,
  # count each page once, even if it triggers both features
  COUNT(DISTINCT IF(feature IN ('HTMLMetaElementMonetization', 'HTMLLinkElementMonetization'), url, NULL)) AS either,
  COUNT(DISTINCT url) AS total,
  COUNTIF(feature = 'HTMLMetaElementMonetization') / COUNT(DISTINCT url) AS meta_pct,
  COUNTIF(feature = 'HTMLLinkElementMonetization') / COUNT(DISTINCT url) AS link_pct,
  COUNT(DISTINCT IF(feature IN ('HTMLMetaElementMonetization', 'HTMLLinkElementMonetization'), url, NULL)) / COUNT(DISTINCT url) AS either_pct
FROM
  `httparchive.blink_features.features`
WHERE
  yyyymmdd = '2021-07-01'
GROUP BY
  yyyymmdd,
  client
ORDER BY
  client,
  either DESC
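One thing to keep in mind when counting pages that use either feature: row counts and distinct-URL counts can differ if a page triggers both. The sketch below uses hypothetical rows and assumes the table holds one row per (url, feature) pair; `SomeOtherFeature` is a made-up feature name.

```javascript
// Hypothetical rows, assuming one row per (url, feature) pair.
const rows = [
  { url: "https://a.example/", feature: "HTMLMetaElementMonetization" },
  { url: "https://a.example/", feature: "HTMLLinkElementMonetization" },
  { url: "https://b.example/", feature: "HTMLMetaElementMonetization" },
  { url: "https://c.example/", feature: "SomeOtherFeature" },
];
const monetization = new Set(["HTMLMetaElementMonetization", "HTMLLinkElementMonetization"]);

const matching = rows.filter(r => monetization.has(r.feature));

// Counting matching rows, like COUNTIF(feature IN (...)).
const rowCount = matching.length;
// Counting distinct pages among the matching rows.
const pageCount = new Set(matching.map(r => r.url)).size;
// All distinct pages, like COUNT(DISTINCT url).
const total = new Set(rows.map(r => r.url)).size;

console.log({ rowCount, pageCount, total });
```

Here the page at a.example contributes two matching rows but is only one page, so the row count is 3 while the page count is 2.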