Structured Data adoption


#1

We could query for patterns like <script type="application/ld+json">, e.g.:

#standardSQL
SELECT
  COUNT(0)
FROM
  `httparchive.response_bodies.2018_05_15_desktop`
WHERE
  body LIKE '%<script type="application/ld+json">%'

The result is 88,467 pages containing the JSON-LD signature. There are obviously several other forms of structured data, but this gives us a rough idea.

Changing the table in the query to 2017_05_15_desktop (one year ago), the result is 25,471. So adoption of structured data is definitely growing, at least for JSON-LD!
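
Both crawls can also be compared in a single run instead of editing the table name by hand. This is a minimal sketch that reuses only the two tables and the pattern mentioned above. The crawl labels are just illustrative string literals.

```sql
#standardSQL
-- Count JSON-LD matches in each crawl and return one row per crawl.
SELECT
  '2017_05_15' AS crawl,
  COUNT(0) AS jsonld
FROM
  `httparchive.response_bodies.2017_05_15_desktop`
WHERE
  body LIKE '%<script type="application/ld+json">%'
UNION ALL
SELECT
  '2018_05_15' AS crawl,
  COUNT(0) AS jsonld
FROM
  `httparchive.response_bodies.2018_05_15_desktop`
WHERE
  body LIKE '%<script type="application/ld+json">%'
ORDER BY
  crawl
```

Note that this scans the bodies of both tables, so it processes roughly as much data as running the two queries separately.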

With Lighthouse support in HTTP Archive, we will soon be able to more easily query for structured data usage and validity. There is a new audit being developed: https://github.com/GoogleChrome/lighthouse/issues/4359


#2

Hi Rick!

And for the microdata format we could query something like:

body LIKE '%itemtype="http%://schema.org%'

#3

Ah good idea. Modified query:

#standardSQL
SELECT
  SUM(IF(body LIKE '%<script type="application/ld+json">%', 1, 0)) AS jsonld,
  SUM(IF(REGEXP_CONTAINS(body, 'itemtype=[\'"]?https?://schema.org'), 1, 0)) AS microdata
FROM
  `httparchive.response_bodies.2018_05_15_desktop`

jsonld   microdata
88467    148960

And the 2017 equivalent:

jsonld   microdata
25471    56307

So we could say that both formats are growing rapidly and that microdata is the more popular of the two.
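
To put a number on "growing rapidly", the year-over-year ratios can be worked out from the four counts above. This is plain arithmetic on the reported results, not a new query against the archive, and the column names are only illustrative.

```sql
#standardSQL
-- Ratio of 2018 matches to 2017 matches for each format.
SELECT
  ROUND(88467 / 25471, 2) AS jsonld_growth,      -- JSON-LD: 2018 count / 2017 count
  ROUND(148960 / 56307, 2) AS microdata_growth   -- microdata: 2018 count / 2017 count
```

That works out to roughly 3.5x for JSON-LD and roughly 2.6x for microdata, so JSON-LD is growing faster, from a smaller base.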


#4

Nice :wink: Are we counting per origin or per document here?


#5

The query is a bit lazy: it searches through non-HTML resources as well, but the patterns are unlikely to match there, so we can simplify by saying “per document”. Keep in mind, though, that a single page can load multiple documents. For example, if a page has no structured data of its own but embeds an iframe that does, it’d count as 1 detection. And if 1,000 pages embed the same iframe (Facebook Like button, etc.), it’d be counted 1,000 times.
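
One possible way to get closer to a per-page count is to count distinct pages instead of rows. This is only a sketch: it assumes the response_bodies tables have a page column holding the URL of the page that triggered each request, alongside url and body. Check the table schema before relying on it.

```sql
#standardSQL
-- Count each page at most once per format, however many of its
-- response bodies match. Assumes a `page` column exists (see above).
SELECT
  COUNT(DISTINCT IF(body LIKE '%<script type="application/ld+json">%', page, NULL)) AS jsonld_pages,
  COUNT(DISTINCT IF(REGEXP_CONTAINS(body, 'itemtype=[\'"]?https?://schema.org'), page, NULL)) AS microdata_pages
FROM
  `httparchive.response_bodies.2018_05_15_desktop`
-- Optional and approximate: keep only the main document by adding
--   WHERE url = page
-- This assumes the main document URL equals the page URL, which
-- may not hold when the page redirects.
```

Without the optional filter, a match inside an embedded iframe is still credited to the embedding page, so this answers “how many pages load at least one matching document” rather than “how many pages mark up their own content”.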