How can I find out how many websites use h5 and h6?
The response_bodies tables on BigQuery contain the raw HTML for each web page (in addition to other text-based resources like JS and CSS). You can select the number of distinct pages WHERE body LIKE '%<h6%' or similar, perhaps also taking H6-style capitalization into account.
That should be enough to get started, but let me know if you need help writing the query.
Be aware that the response bodies are very large and the entire dataset is 853 GB, so make sure you have enough free BigQuery quota.
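As a rough starting point, here is a minimal sketch of what such a query could look like, wrapped in Python so the matching logic can be tried on a few sample rows before spending any quota. The table name is a hypothetical placeholder, and the `page` column name is an assumption (only `body` is mentioned above); check both against the actual schema. It swaps LIKE for a case-insensitive regex so that `<H6>` is counted but a longer tag name that merely starts with `h6` is not. It is still a naive text match, so it will also count tags that appear inside comments or scripts.

```python
import re

# Hypothetical placeholder: substitute the response_bodies table for the crawl you want.
TABLE = "httparchive.response_bodies.YYYY_MM_DD_desktop"


def build_query(table: str, tag: str) -> str:
    """Build a BigQuery Standard SQL query counting distinct pages whose body
    contains an opening <tag>, case-insensitively.

    Assumes the table has `page` and `body` columns.
    """
    return (
        "SELECT COUNT(DISTINCT page) AS pages\n"
        f"FROM `{table}`\n"
        # (?i) makes the match case-insensitive; [\s>/] requires the tag name to end here.
        f"WHERE REGEXP_CONTAINS(body, r'(?i)<{tag}[\\s>/]')"
    )


def count_pages_with_tag(rows, tag: str) -> int:
    """Local equivalent of the query above, for (page, body) tuples."""
    pattern = re.compile(rf"<{tag}[\s>/]", re.IGNORECASE)
    return len({page for page, body in rows if pattern.search(body)})


if __name__ == "__main__":
    # Made-up sample rows, only to exercise the pattern locally.
    sample = [
        ("https://a.example/", "<h1>Title</h1><H6>Small print</H6>"),
        ("https://a.example/", "<h6 class='note'>Another resource, same page</h6>"),
        ("https://b.example/", "<p>No small headings here</p>"),
    ]
    print(build_query(TABLE, "h6"))
    print(count_pages_with_tag(sample, "h6"))
```

Calling `build_query(TABLE, "h5")` gives the corresponding query for h5. Either query has to read the whole `body` column, so the quota warning above applies.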
@zcorpan did some analysis that answers this question: Use of HTML elements