Can’t resist.
I took the list of deprecated elements and created a regex pattern with `Array.from($0.querySelectorAll('dfn code')).map(e => e.innerText).join('|')`, which results in: `"applet|acronym|bgsound|dir|noframes|isindex|keygen|listing|menuitem|nextid|noembed|plaintext|rb|rtc|strike|xmp|basefont|big|blink|center|font|multicol|nobr|spacer|tt|marquee"`.
I’m matching elements if the tag name is preceded by a `<`. Not super robust but a good place to start.
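To make the "not super robust" caveat concrete, here is a minimal JavaScript sketch of the same matching idea. It is not the BigQuery query itself. The names `extractTags`, `loose`, and `strict` and the sample markup are hypothetical, and the `\b` variant is only one possible tightening that has not been run against the dataset.

```javascript
// Tag list copied from the post; everything else here is illustrative.
const deprecated = 'applet|acronym|bgsound|dir|noframes|isindex|keygen|listing|menuitem|nextid|noembed|plaintext|rb|rtc|strike|xmp|basefont|big|blink|center|font|multicol|nobr|spacer|tt|marquee';

// Same idea as the query: "<" followed by a deprecated tag name, case-insensitive.
const loose = new RegExp(`<(${deprecated})`, 'gi');

// Assumed stricter variant: require a word boundary right after the tag name.
const strict = new RegExp(`<(${deprecated})\\b`, 'gi');

// Return the lowercased tag names captured by the pattern (hypothetical helper).
function extractTags(html, pattern) {
  return Array.from(html.matchAll(pattern), m => m[1].toLowerCase());
}

// Made-up markup: <direction> and <bigbox> are not deprecated elements,
// but they start with "dir" and "big".
const sample = '<FONT size="2">hi</FONT> <direction>x</direction> <bigbox></bigbox> <center>';

console.log(extractTags(sample, loose));  // [ 'font', 'dir', 'big', 'center' ]
console.log(extractTags(sample, strict)); // [ 'font', 'center' ]
```

The loose pattern counts any tag whose name merely starts with a deprecated name, so some of the counts below may be inflated. The word boundary removes that class of false positive, though it would still match hyphenated custom element names such as a hypothetical `<tt-widget>`.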
#standardSQL
SELECT
  LOWER(tag) AS tag,
  COUNT(0) AS frequency,
  COUNT(DISTINCT url) AS urls
FROM (
  SELECT
    url,
    REGEXP_EXTRACT_ALL(body, '(?i)<(applet|acronym|bgsound|dir|noframes|isindex|keygen|listing|menuitem|nextid|noembed|plaintext|rb|rtc|strike|xmp|basefont|big|blink|center|font|multicol|nobr|spacer|tt|marquee)') AS tags
  FROM
    `httparchive.response_bodies.2018_07_01_desktop`),
  UNNEST(tags) AS tag
GROUP BY
  tag
ORDER BY
  frequency DESC
WARNING: this query processes 2.5 TB!
tag | frequency | urls |
---|---|---|
font | 5143594 | 223473 |
center | 1070270 | 217903 |
tt | 333876 | 41528 |
nobr | 225042 | 30122 |
big | 101907 | 18438 |
strike | 98386 | 12867 |
menuitem | 78198 | 9896 |
xmp | 78162 | 62446 |
plaintext | 65701 | 57276 |
marquee | 49669 | 25423 |
rb | 41385 | 4954 |
dir | 32802 | 4690 |
basefont | 27476 | 646 |
acronym | 25012 | 2687 |
applet | 12842 | 5128 |
blink | 11097 | 4689 |
noframes | 9850 | 4700 |
spacer | 8141 | 688 |
noembed | 7659 | 852 |
bgsound | 5692 | 781 |
listing | 2886 | 640 |
rtc | 230 | 53 |
nextid | 193 | 165 |
keygen | 109 | 40 |
isindex | 73 | 64 |
multicol | 19 | 16 |
Anyone else want to look into the deprecated attributes?
Edit: @HenriHelvetica pointed out that the original query didn’t include `marquee`. I’ve rerun the query and updated the table.