Use of deprecated HTML features on the web?

Can’t resist.

I took the list of deprecated elements and built a regex pattern from it in the DevTools console (where `$0` is the currently selected element): `Array.from($0.querySelectorAll('dfn code')).map(e => e.innerText).join('|')`, which results in: `applet|acronym|bgsound|dir|noframes|isindex|keygen|listing|menuitem|nextid|noembed|plaintext|rb|rtc|strike|xmp|basefont|big|blink|center|font|multicol|nobr|spacer|tt|marquee`.
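For anyone reproducing this outside the browser, here is a minimal sketch of the same join step, wrapped into the pattern string used in the query below. The helper name `toSqlPattern` is hypothetical, and the name list is just a subset of the one above.

```javascript
// Hypothetical helper: build the BigQuery (RE2) pattern from element names,
// mirroring the .map(...).join('|') step of the DevTools one-liner.
function toSqlPattern(names) {
  // (?i) is RE2's inline case-insensitive flag; "<" ties the match to a tag opening.
  return `(?i)<(${names.join('|')})`;
}

// A few names from the list; in DevTools they come from the `dfn code` elements.
const names = ['applet', 'acronym', 'bgsound', 'marquee'];
console.log(toSqlPattern(names)); // (?i)<(applet|acronym|bgsound|marquee)
```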

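I’m counting an element whenever its tag name directly follows a `<`. That’s not super robust (it also counts matches inside scripts and comments, plus any longer tag name that merely starts with one of these), but it’s a good place to start.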
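A small sketch of that matching rule, using JavaScript regexes (which take an `i` flag instead of RE2's inline `(?i)`). The stricter variant, which requires the name to end at whitespace, `/` or `>`, is my own hypothetical tweak and was not used for the numbers below.

```javascript
const NAMES = 'applet|acronym|font|center|tt|marquee';

// Loose rule, as in the query: "<" followed by one of the names, case-insensitive.
const loose = new RegExp(`<(${NAMES})`, 'gi');

// Hypothetical stricter rule: the name must be followed by whitespace, "/" or ">".
const strict = new RegExp(`<(${NAMES})(?=[\\s/>])`, 'gi');

// Return the lowercased tag name of every match; closing tags ("</font>") never match.
function countMatches(re, html) {
  return [...html.matchAll(re)].map(m => m[1].toLowerCase());
}

const sample = '<FONT color="red">a</FONT><font>b</font><fontx>c</fontx><center>d</center>';
console.log(countMatches(loose, sample));  // [ 'font', 'font', 'font', 'center' ]
console.log(countMatches(strict, sample)); // [ 'font', 'font', 'center' ]
```

The loose rule counts `<fontx>` as `font`; the lookahead drops it. Neither rule can tell markup from the same text inside a script or comment.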

```sql
#standardSQL
SELECT
  LOWER(tag) AS tag,
  COUNT(0) AS frequency,       -- total number of matches
  COUNT(DISTINCT url) AS urls  -- number of distinct response URLs with a match
FROM (
  SELECT
    url,
    REGEXP_EXTRACT_ALL(body, '(?i)<(applet|acronym|bgsound|dir|noframes|isindex|keygen|listing|menuitem|nextid|noembed|plaintext|rb|rtc|strike|xmp|basefont|big|blink|center|font|multicol|nobr|spacer|tt|marquee)') AS tags
  FROM
    `httparchive.response_bodies.2018_07_01_desktop`),
  UNNEST(tags) AS tag
GROUP BY
  tag
ORDER BY
  frequency DESC
```

WARNING: this query processes about 2.5 TB!
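If you want to sanity-check the counting logic before spending that much, here is a minimal local sketch of what the query computes over a few in-memory `(url, body)` records. The records are made up for illustration, and `tagStats` is a hypothetical name; it groups by the lowercased tag name.

```javascript
const DEPRECATED =
  'applet|acronym|bgsound|dir|noframes|isindex|keygen|listing|menuitem|nextid|noembed|' +
  'plaintext|rb|rtc|strike|xmp|basefont|big|blink|center|font|multicol|nobr|spacer|tt|marquee';

function tagStats(records) {
  const re = new RegExp(`<(${DEPRECATED})`, 'gi'); // same pattern as the query
  const stats = new Map(); // tag -> { frequency, urls: Set of URLs }
  for (const { url, body } of records) {
    for (const m of body.matchAll(re)) {
      const tag = m[1].toLowerCase(); // LOWER(tag)
      if (!stats.has(tag)) stats.set(tag, { frequency: 0, urls: new Set() });
      const s = stats.get(tag);
      s.frequency += 1; // COUNT(0)
      s.urls.add(url);  // COUNT(DISTINCT url)
    }
  }
  return [...stats]
    .map(([tag, s]) => ({ tag, frequency: s.frequency, urls: s.urls.size }))
    .sort((a, b) => b.frequency - a.frequency); // ORDER BY frequency DESC
}

// Made-up sample records standing in for rows of response_bodies.
const records = [
  { url: 'https://example.com/a.html', body: '<center><FONT size=2>x</FONT><font>y</font></center>' },
  { url: 'https://example.com/b.html', body: '<font face="Arial">z</font><tt>code</tt>' },
];
console.log(tagStats(records));
```

Here `font` ends up with a frequency of 3 across 2 URLs, which is the same frequency/urls split the table reports at scale.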

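| Tag | Frequency (total matches) | Distinct URLs |
|---|---:|---:|
| font | 5143594 | 223473 |
| center | 1070270 | 217903 |
| tt | 333876 | 41528 |
| nobr | 225042 | 30122 |
| big | 101907 | 18438 |
| strike | 98386 | 12867 |
| menuitem | 78198 | 9896 |
| xmp | 78162 | 62446 |
| plaintext | 65701 | 57276 |
| marquee | 49669 | 25423 |
| rb | 41385 | 4954 |
| dir | 32802 | 4690 |
| basefont | 27476 | 646 |
| acronym | 25012 | 2687 |
| applet | 12842 | 5128 |
| blink | 11097 | 4689 |
| noframes | 9850 | 4700 |
| spacer | 8141 | 688 |
| noembed | 7659 | 852 |
| bgsound | 5692 | 781 |
| listing | 2886 | 640 |
| rtc | 230 | 53 |
| nextid | 193 | 165 |
| keygen | 109 | 40 |
| isindex | 73 | 64 |
| multicol | 19 | 16 |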

Anyone else want to look into the deprecated attributes?

Edit: @HenriHelvetica pointed out that the original query didn’t include marquee. I’ve rerun the query and updated the table.
