Higher level stats for structured data

Agreed.

I found a bug in Rick’s original query above. We need to JSON.parse the string or it’s just an array of letters and returns true for any non-null values.

I.e.

{"json_ld":false,"microdata":false...}

Is stringified to the following string array:

[{,",j,s,o,n,_,l,d,",:,f,a,l,s,e,,,",m,i,c,r,o,d,a,t,a,",:,f,a,l,s,e,...}]

which always returns True for some(i => i) except when the string is null (i.e. when the custom metric failed completely).

Using JSON.parse to convert the passed in string back to an Object to allow Object.values to work properly works:

CREATE TEMP FUNCTION HAS_STRUCTURED_DATA(data STRING) RETURNS BOOLEAN LANGUAGE js AS '''
try {
  return Object.values(JSON.parse(data)).some(i => i);
} catch {
  return false;
}
''';

WITH sd AS (
  SELECT
    client,
    root_page,
    JSON_QUERY(custom_metrics, '$.structured-data.structured_data.rendered.present') AS data,
    HAS_STRUCTURED_DATA(JSON_QUERY(custom_metrics, '$.structured-data.structured_data.rendered.present')) AS has_structured_data
  FROM
    `httparchive.all.pages`
  WHERE
    date = '2023-10-01'
)


SELECT
  client,
  COUNT(DISTINCT IF(has_structured_data, root_page, NULL)) / COUNT(DISTINCT root_page) AS pct
FROM
  sd
GROUP BY
  client

It gives the following results, which seems much more reasonable and in keeping with the Web Almanac’s split out results:

client pct
desktop 71.8%
mobile 72.6%

Still trying to recall what that really common was that I was thinking about and whether we fixed it or not… <meta... property="..."> might have been it (in which case it’s still there, and it’s correct that it is) but thought it was something even more common than that (most regularly used <meta> elements use name and content attributes rather than property).

3 Likes