Use of custom elements with attributes

zcorpan · March 6, 2019, 11:57pm

As asked on Twitter: https://twitter.com/briankardell/status/1100081446528708611

I ran:

#standardSQL
SELECT * FROM (
SELECT
  COUNT(*) AS num,
  REGEXP_EXTRACT(r.body, r"(?i)(<[a-z]+-[a-z0-9_-]*\s+(?:[^>/]|/[^>])+>)") AS match
FROM
  `httparchive.response_bodies.2019_02_01_desktop` AS r
WHERE
  page = url
GROUP BY
  match
ORDER BY
  num DESC
) WHERE match IS NOT NULL

See https://gist.github.com/zcorpan/56d1040e1afaa883b610c342c2e7a437

briankardell · March 12, 2019, 10:09pm

I’m not skilled with this, and it seems potentially pricey to learn on the fly, but would it make sense to run this instead?

#standardSQL
SELECT * FROM (
SELECT
  COUNT(*) AS frequency,
  REGEXP_EXTRACT(r.body, r"(?i)(<[a-z]+-[a-z0-9_-]*\s+(?:[^>/]|/[^>])+>)") AS match,
  COUNT(DISTINCT url) AS urls
FROM
  `httparchive.response_bodies.2019_02_01_desktop` AS r
WHERE
  page = url
GROUP BY
  match
ORDER BY
  frequency DESC
) WHERE match IS NOT NULL

That way it would match the basic format and data of the better query in "Use of HTML elements". Maybe not entirely, since this still needs further processing to extract just the tag name and join up the counts… It feels like these two ‘ideas’, though, would be good ones to run and publish every so often in a place that is easy to find and discuss.
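
As a rough illustration of that further processing, here is a minimal Python sketch. It assumes the query results have been exported as (match, frequency) pairs. The function name `count_by_tag_name` and the sample rows are hypothetical and are not taken from the gist or from real HTTP Archive data.

```python
import re
from collections import Counter

# Same element-name shape as the query's pattern: letters, a hyphen, then
# letters, digits, underscores, or hyphens.
TAG_NAME = re.compile(r"<([a-z]+-[a-z0-9_-]*)", re.IGNORECASE)


def count_by_tag_name(rows):
    """Sum frequencies per lowercased tag name.

    `rows` is an iterable of (match, frequency) pairs, assumed to be the
    exported output of the query (an assumption, not a documented format).
    """
    totals = Counter()
    for match, frequency in rows:
        found = TAG_NAME.match(match)
        if found is None:
            continue  # skip rows that do not start with a custom element tag
        totals[found.group(1).lower()] += frequency
    return totals


# Made-up sample rows, for illustration only.
sample_rows = [
    ('<my-widget class="a">', 10),
    ('<My-Widget id="x" data-foo="1">', 5),
    ('<other-thing hidden>', 3),
]

for name, total in count_by_tag_name(sample_rows).most_common():
    print(name, total)
```

Note that REGEXP_EXTRACT returns only the first match in each response body, so the summed totals would count pages by their first matching tag, not every custom element on every page.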
