I'm researching the state of structured data on the web in preparation for an upcoming Lighthouse audit. I came up with this query to extract a sample of JSON-LD snippets from the top sites:
SELECT
  rank,
  url,
  data
FROM (
  SELECT
    url,
    JSON_EXTRACT(
      REGEXP_EXTRACT(body, '(?i)<script type=[\'"]?application/ld\\+json[\'"]?>(.*)</script>'),
      '$') AS data
  FROM
    `httparchive.response_bodies.2018_04_15_desktop`
  WHERE
    body LIKE '%application/ld+json%')
JOIN
  `httparchive.summary_pages.2018_04_15_desktop`
USING (url)
WHERE
  data IS NOT NULL
  AND rank IS NOT NULL
ORDER BY
  rank
LIMIT 100
Warning: running queries like this one, which process response bodies, will consume your entire free monthly quota (1 TB).
The results are in this sheet.
Using this sample, we can see what the popular JSON-LD properties are and what kinds of validation would be most impactful in a Lighthouse audit.
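As a rough illustration of that kind of analysis, here is a minimal Python sketch that tallies how often each property appears across a list of JSON-LD strings. It assumes the `data` column has been exported from the sheet into a plain list of strings; the `count_properties` helper and the `sample` values are made up for illustration and are not taken from the actual results.

```python
import json
from collections import Counter


def count_properties(snippets):
    """Count how often each JSON-LD key appears across raw snippet strings.

    Returns (counts, invalid), where `invalid` is the number of snippets
    that could not be parsed as JSON.
    """
    counts = Counter()
    invalid = 0

    def walk(node):
        # Recurse through nested objects and arrays, counting every key.
        if isinstance(node, dict):
            for key, value in node.items():
                counts[key] += 1
                walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    for raw in snippets:
        try:
            walk(json.loads(raw))
        except (TypeError, ValueError):
            # json.JSONDecodeError is a ValueError; None raises TypeError.
            invalid += 1
    return counts, invalid


# Hypothetical sample values standing in for the exported `data` column.
sample = [
    '{"@context": "https://schema.org", "@type": "Organization", "name": "Example"}',
    '[{"@type": "WebSite", "url": "https://example.com", "potentialAction": {"@type": "SearchAction"}}]',
    '{"@type": "Person", "name": ',  # truncated on purpose, counted as invalid
]

counts, invalid = count_properties(sample)
for prop, n in counts.most_common(5):
    print(prop, n)
print("unparseable snippets:", invalid)
```

The count of unparseable snippets is tracked separately because malformed JSON is itself something an audit might want to flag.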