Use of deprecated HTML features on the web?


#1

The HTML standard contains a list of deprecated elements, attributes, and other features:
https://html.spec.whatwg.org/multipage/obsolete.html#non-conforming-features

Would anyone be willing to dig in and do some analysis on existing use of these elements on the web? :slight_smile:

For example, which are the most frequently used, and are there any common culprits / examples of where they’re being used? In particular, this question recently came up in the context of `<plaintext>`, and I’d love to understand where and why it’s still used on the web.


#2

Can’t resist.

I took the list of deprecated elements from the spec and built a regex pattern by running `Array.from($0.querySelectorAll('dfn code')).map(e => e.innerText).join('|')` in the DevTools console (`$0` is the currently selected element), which results in: `applet|acronym|bgsound|dir|noframes|isindex|keygen|listing|menuitem|nextid|noembed|plaintext|rb|rtc|strike|xmp|basefont|big|blink|center|font|multicol|nobr|spacer|tt|marquee`.

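I’m matching an element whenever its tag name is preceded by `<`. That isn’t super robust (it also counts any longer tag name that merely starts with one of these, and it matches inside scripts and comments), but it’s a good place to start.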
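To try the `<`-prefix matching locally on a single HTML string before running it against the full dataset, here is a small JavaScript sketch. It joins the same tag list into an alternation, as the DevTools snippet does. As my own assumption (not part of the BigQuery pattern), it also requires the name to be followed by whitespace, `/`, or `>`, so a longer name that only starts with a listed tag is not counted. `extractDeprecatedTags` is a hypothetical helper name.

```javascript
// Tag list copied from the regex in this post.
const DEPRECATED_TAGS = [
  "applet", "acronym", "bgsound", "dir", "noframes", "isindex", "keygen",
  "listing", "menuitem", "nextid", "noembed", "plaintext", "rb", "rtc",
  "strike", "xmp", "basefont", "big", "blink", "center", "font", "multicol",
  "nobr", "spacer", "tt", "marquee",
];

// Same alternation as the original pattern, case-insensitive, plus a lookahead
// (an assumption, not in the original query) so "<bigger>" is not counted as "big".
const pattern = new RegExp(`<(${DEPRECATED_TAGS.join("|")})(?=[\\s/>])`, "gi");

// Hypothetical helper: returns the lowercased tag name of every start-tag match.
// End tags like "</font>" never match because "/" follows the "<".
function extractDeprecatedTags(body) {
  return Array.from(body.matchAll(pattern), (m) => m[1].toLowerCase());
}

const sample = '<CENTER><font size="2">hi</font><bigger></bigger></CENTER>';
console.log(extractDeprecatedTags(sample)); // → [ 'center', 'font' ]
```

The lookahead still counts matches inside `<script>` blocks and comments, so it only narrows one of the two known sources of false positives.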

```sql
#standardSQL
SELECT
  LOWER(tag) AS tag,
  COUNT(0) AS frequency,
  COUNT(DISTINCT url) AS urls
FROM (
  SELECT
    url,
    REGEXP_EXTRACT_ALL(body, '(?i)<(applet|acronym|bgsound|dir|noframes|isindex|keygen|listing|menuitem|nextid|noembed|plaintext|rb|rtc|strike|xmp|basefont|big|blink|center|font|multicol|nobr|spacer|tt|marquee)') AS tags
  FROM
    `httparchive.response_bodies.2018_07_01_desktop`),
  UNNEST(tags) AS tag
GROUP BY
  tag
ORDER BY
  frequency DESC
```

WARNING: this query processes 2.5 TB!

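| tag | frequency | urls |
| --- | ---: | ---: |
| font | 5143594 | 223473 |
| center | 1070270 | 217903 |
| tt | 333876 | 41528 |
| nobr | 225042 | 30122 |
| big | 101907 | 18438 |
| strike | 98386 | 12867 |
| menuitem | 78198 | 9896 |
| xmp | 78162 | 62446 |
| plaintext | 65701 | 57276 |
| marquee | 49669 | 25423 |
| rb | 41385 | 4954 |
| dir | 32802 | 4690 |
| basefont | 27476 | 646 |
| acronym | 25012 | 2687 |
| applet | 12842 | 5128 |
| blink | 11097 | 4689 |
| noframes | 9850 | 4700 |
| spacer | 8141 | 688 |
| noembed | 7659 | 852 |
| bgsound | 5692 | 781 |
| listing | 2886 | 640 |
| rtc | 230 | 53 |
| nextid | 193 | 165 |
| keygen | 109 | 40 |
| isindex | 73 | 64 |
| multicol | 19 | 16 |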
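The two numeric columns come from different aggregates in the query: `frequency` is `COUNT(0)` over every regex match, while `urls` is `COUNT(DISTINCT url)`. A response with ten `<font>` tags therefore adds ten to the first column but only one to the second. Here is a small JavaScript sketch of that aggregation over in-memory `{ url, tags }` rows. The row shape is a hypothetical stand-in for the unnested query rows, not an HTTP Archive format.

```javascript
// Each row: a response URL and the (already lowercased) tags matched in its body.
// Returns one entry per tag, sorted by frequency, like the GROUP BY / ORDER BY above.
function aggregateTags(rows) {
  const byTag = new Map(); // tag -> { frequency, urls: Set of URLs }
  for (const { url, tags } of rows) {
    for (const tag of tags) {
      if (!byTag.has(tag)) byTag.set(tag, { frequency: 0, urls: new Set() });
      const entry = byTag.get(tag);
      entry.frequency += 1; // COUNT(0): one per match
      entry.urls.add(url); // COUNT(DISTINCT url): one per URL
    }
  }
  return Array.from(byTag, ([tag, { frequency, urls }]) => ({
    tag,
    frequency,
    urls: urls.size,
  })).sort((a, b) => b.frequency - a.frequency);
}

const result = aggregateTags([
  { url: "https://a.example/", tags: ["font", "font", "center"] },
  { url: "https://b.example/", tags: ["font"] },
]);
// font: frequency 3, urls 2; center: frequency 1, urls 1
console.log(result);
```

This is why, for example, `xmp` and `plaintext` have frequencies close to their URL counts (roughly one occurrence per URL), while `font` appears many times per URL.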

Anyone else want to look into the deprecated attributes?

Edit: @HenriHelvetica pointed out that the original query didn’t include marquee. I’ve rerun the query and updated the table.


#3

57K sites using plaintext, really? Wow! What the heck are they using it for…


#4

Clearly, centring elements isn’t hard: 1M+ uses of `<center>`. :eyes:


#5

BTW, we are now trying to deprecate Shadow DOM V0, which used `<content>` and `<shadow>`. Blink is now the only engine that supports Shadow DOM V0, and its usage is estimated at ~2% of page views, but usage of `<content>` seems much higher than that.
`<shadow>` was used for the “multiple shadow roots” feature, which was part of the Shadow DOM V0 spec, but the support code has already been removed from Blink, so `<shadow>` now behaves identically to `<content>`.

ElementCreateShadowRoot (representative of Shadow DOM V0 usage)
https://www.chromestatus.com/metrics/feature/timeline/popularity/456

HTMLContentElement
https://www.chromestatus.com/metrics/feature/timeline/popularity/1896


#6

I don’t know (yet) what url means in the data. Are these really different sites/domains, or unique URLs, which can be on a lot less domains?


#7

Here is the query I used to detect which sites were using HTML Imports:

```sql
#standardSQL
SELECT
  url
FROM
  `httparchive.pages.2018_07_15_desktop`
WHERE
  JSON_EXTRACT(payload, '$._blinkFeatureFirstUsed.Features.HTMLImports') IS NOT NULL
```

The HTTP Archive extracts the Blink and CSS feature usage info and includes it in the pages data. The mapping from feature number to feature name is here.
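The `JSON_EXTRACT(payload, '$._blinkFeatureFirstUsed.Features.<Name>') IS NOT NULL` check can be reproduced on a single page's payload. Below is a minimal JavaScript sketch. It assumes the payload is a JSON string whose `_blinkFeatureFirstUsed.Features` object is keyed by feature name, which is the shape implied by the JSONPath in the query. It does not rely on what value is stored under each key, and `featuresUsed` is a hypothetical name.

```javascript
// Returns the names in `featureNames` that appear under
// _blinkFeatureFirstUsed.Features in a page's payload JSON string.
// Assumption: the payload shape matches the JSONPath used in the queries.
function featuresUsed(payloadJson, featureNames) {
  const payload = JSON.parse(payloadJson);
  const features = payload?._blinkFeatureFirstUsed?.Features ?? {};
  // Treat a missing key, or a key holding null, as "not used".
  return featureNames.filter(
    (name) =>
      Object.prototype.hasOwnProperty.call(features, name) &&
      features[name] !== null,
  );
}

// Placeholder payload: the {} values stand in for whatever the real data stores.
const payload = JSON.stringify({
  _blinkFeatureFirstUsed: { Features: { HTMLContentElement: {}, HTMLImports: {} } },
});
console.log(featuresUsed(payload, ["ElementCreateShadowRoot", "HTMLContentElement", "HTMLImports"]));
// → [ 'HTMLContentElement', 'HTMLImports' ]
```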

It looks like 456 is “ElementCreateShadowRoot” and 1896 is “HTMLContentElement” (there is also “HTMLShadowElement”).

This will get you the pages that tripped ElementCreateShadowRoot (4,773 of them):

```sql
#standardSQL
SELECT
  url
FROM
  `httparchive.pages.2018_07_15_desktop`
WHERE
  JSON_EXTRACT(payload, '$._blinkFeatureFirstUsed.Features.ElementCreateShadowRoot') IS NOT NULL
```

This will get you the pages that tripped HTMLContentElement (89,175 of them):

```sql
#standardSQL
SELECT
  url
FROM
  `httparchive.pages.2018_07_15_desktop`
WHERE
  JSON_EXTRACT(payload, '$._blinkFeatureFirstUsed.Features.HTMLContentElement') IS NOT NULL
```

#8

@nhoizey in this context, `url` is the URL of the origin being tested. We test one page per origin, which is defined as a unique protocol, subdomain, and domain for a website. We’ve deduplicated the list of origins having the same host (subdomain and domain) and preferred the HTTPS version.

The page we test is always just the root path (`/`), i.e. the home/landing page.
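The origin deduplication rule described here (group by host, prefer HTTPS, test the root path) can be sketched in a few lines of JavaScript. This only illustrates the stated rule and is not the HTTP Archive's actual pipeline. `dedupeOrigins` is a hypothetical name, and "host" is taken to be the URL's hostname without the port.

```javascript
// Collapse a list of URLs to one test URL per hostname (subdomain + domain),
// preferring the https: version and always testing the root "/" path.
function dedupeOrigins(urls) {
  const byHost = new Map(); // hostname -> protocol chosen so far
  for (const raw of urls) {
    const { protocol, hostname } = new URL(raw);
    if (protocol !== "http:" && protocol !== "https:") continue;
    // Keep the first protocol seen, but let https: override http:.
    if (!byHost.has(hostname) || protocol === "https:") byHost.set(hostname, protocol);
  }
  return Array.from(byHost, ([hostname, protocol]) => `${protocol}//${hostname}/`);
}

console.log(dedupeOrigins([
  "http://www.example.com/",
  "https://www.example.com/about",
  "http://blog.example.com/",
]));
// → [ 'https://www.example.com/', 'http://blog.example.com/' ]
```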


#9

The most popular domain using `plaintext` is blogger.com, with 41,630 URLs. In a sample of 100 of those URLs, all were variations of the page www.blogger.com/followers.

For example: https://www.blogger.com/followers.g?blogID=6947649422197228177&colors=Cgt0cmFuc3BhcmVudBILdHJhbnNwYXJlbnQaByMxMDQzNWQiByMyMjg4YmIqByNmZmZmZmYyByMwMDAwMDA6ByMxMDQzNWRCByMyMjg4YmJKByM5OTk5OTlSByMyMjg4YmJaC3RyYW5zcGFyZW50&pageSize=21&origin=https://universodascalopsitas.blogspot.com/&usegapi=1&jsh=m;//scs/apps-static//js/k%3Doz.gapi.en_US.hfiMrY347qE.O/m%3D__features__/am%3DwQ/rt%3Dj/d%3D1/rs%3DAGLTcCMOrzLFQ_Qou2Cj9qH2b2vdRcf4zQ&bpli=1&pli=1

```html
<plaintext></plaintext>
```

It doesn’t even have any content.
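Even an empty `<plaintext></plaintext>` pair is odd markup. Per the HTML parsing spec, a `<plaintext>` start tag switches the tokenizer into the PLAINTEXT state, which it never leaves, so the `</plaintext>` that follows is treated as text rather than as an end tag. The toy JavaScript function below shows the effect by splitting a document at the first `<plaintext>` start tag. It is a simplification, not a real tokenizer: it ignores comments, scripts, and `>` inside attribute values, and `splitAtPlaintext` is a hypothetical name.

```javascript
// Toy illustration only: find the first <plaintext> start tag and treat
// everything after the end of that tag as literal text, the way the
// PLAINTEXT tokenizer state does. Not a real HTML tokenizer.
function splitAtPlaintext(html) {
  const match = /<plaintext(?=[\s/>])[^>]*>/i.exec(html);
  if (match === null) return { markup: html, text: "" };
  const end = match.index + match[0].length;
  return { markup: html.slice(0, end), text: html.slice(end) };
}

const { markup, text } = splitAtPlaintext("<p>Followers</p><plaintext></plaintext>");
console.log(markup); // <p>Followers</p><plaintext>
console.log(text); // </plaintext>
```

So on these Blogger pages the element isn't empty: a browser would render the literal string `</plaintext>` (plus anything after it) as text.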

The next most popular domain is google.com (20,617 URLs), but it’s actually just a login redirect that continues to the same followers page:

https://accounts.google.com/ServiceLogin?continue=https://www.blogger.com/followers.g?blogID%3D7419705059708931073…

Similarly, twitter.com is the most popular domain using `menuitem` (59,238 URLs). In a sample of those URLs, every one is the embedded Follow button:

https://platform.twitter.com/widgets/follow_button.bed9e19e565ca3b578705de9e73c29ed.en.html

```html
  <menu type="context" id="menu" data-scribe="component:contextmenu">
    <menuitem id="m-follow" label="Follow user"></menuitem>
    <menuitem id="m-profile" label="View user on Twitter"></menuitem>
    <menuitem id="m-tweet" label="Send Tweet to user"></menuitem>
  </menu>
```
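The per-domain breakdown in this post (blogger.com, google.com, twitter.com) amounts to grouping the matching URLs by domain and counting. Below is a rough JavaScript sketch of that step. It approximates the domain as the last two labels of the hostname, which is wrong for suffixes like `co.uk`; a careful analysis would use the Public Suffix List instead. `approxDomain` and `topDomains` are hypothetical names, and the sample URLs are made up.

```javascript
// Approximate "domain" as the last two hostname labels (www.blogger.com -> blogger.com).
// Simplification: this mishandles multi-label public suffixes such as co.uk.
function approxDomain(url) {
  const labels = new URL(url).hostname.split(".");
  return labels.slice(-2).join(".");
}

// Count URLs per approximate domain, most frequent first.
function topDomains(urls) {
  const counts = new Map();
  for (const url of urls) {
    const domain = approxDomain(url);
    counts.set(domain, (counts.get(domain) ?? 0) + 1);
  }
  return Array.from(counts).sort((a, b) => b[1] - a[1]);
}

console.log(topDomains([
  "https://www.blogger.com/followers.g?blogID=1",
  "https://www.blogger.com/followers.g?blogID=2",
  "https://accounts.google.com/ServiceLogin",
]));
// → [ [ 'blogger.com', 2 ], [ 'google.com', 1 ] ]
```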

#10

Ok, thanks for the detailed answer!