Use of deprecated HTML features on the web?

The HTML standard contains a list of deprecated elements, attributes, and other features:
https://html.spec.whatwg.org/multipage/obsolete.html#non-conforming-features

Would anyone be willing to dig in and do some analysis on existing use of these elements on the web? :slight_smile:

For example, which are the most frequently used, and are there any common culprits or examples of where they’re being used? In particular, this question recently came up in the context of <plaintext>, and I’d love to understand where and why it’s still used on the web.

Can’t resist.

I took the list of deprecated elements from the spec and built a regex pattern in the DevTools console, where $0 is the currently selected element: Array.from($0.querySelectorAll('dfn code')).map(e => e.innerText).join('|'), which results in: "applet|acronym|bgsound|dir|noframes|isindex|keygen|listing|menuitem|nextid|noembed|plaintext|rb|rtc|strike|xmp|basefont|big|blink|center|font|multicol|nobr|spacer|tt|marquee".

I’m matching elements if the tag name is preceded by a <. Not super robust but a good place to start.

#standardSQL
SELECT
  LOWER(tag) AS tag,
  COUNT(0) AS frequency,
  COUNT(DISTINCT url) AS urls
FROM (
  SELECT
    url,
    REGEXP_EXTRACT_ALL(body, '(?i)<(applet|acronym|bgsound|dir|noframes|isindex|keygen|listing|menuitem|nextid|noembed|plaintext|rb|rtc|strike|xmp|basefont|big|blink|center|font|multicol|nobr|spacer|tt|marquee)') AS tags
  FROM
    `httparchive.response_bodies.2018_07_01_desktop`),
  UNNEST(tags) AS tag
GROUP BY
  tag
ORDER BY
  frequency DESC

WARNING: this query processes 2.5 TB!

tag frequency urls
font 5143594 223473
center 1070270 217903
tt 333876 41528
nobr 225042 30122
big 101907 18438
strike 98386 12867
menuitem 78198 9896
xmp 78162 62446
plaintext 65701 57276
marquee 49669 25423
rb 41385 4954
dir 32802 4690
basefont 27476 646
acronym 25012 2687
applet 12842 5128
blink 11097 4689
noframes 9850 4700
spacer 8141 688
noembed 7659 852
bgsound 5692 781
listing 2886 640
rtc 230 53
nextid 193 165
keygen 109 40
isindex 73 64
multicol 19 16
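
To see why matching on a leading < is "not super robust", here is a minimal sketch that runs the same pattern over a made-up HTML string. The `strict` variant with a trailing `\b` is a hypothetical tightening, not something the query above uses, and the `sample` markup is invented for illustration.

```javascript
// Tag names copied from the pattern in the query above.
const DEPRECATED = 'applet|acronym|bgsound|dir|noframes|isindex|keygen|listing|menuitem|nextid|noembed|plaintext|rb|rtc|strike|xmp|basefont|big|blink|center|font|multicol|nobr|spacer|tt|marquee';

// Same rule as the query: a tag name directly preceded by "<", case-insensitive.
const loose = new RegExp(`<(${DEPRECATED})`, 'gi');

// Hypothetical stricter variant: the name must end at a word boundary,
// so "<bigdata" no longer counts as "<big".
const strict = new RegExp(`<(${DEPRECATED})\\b`, 'gi');

// Return every matched tag name, lowercased like LOWER(tag) in the query.
function extractTags(html, pattern) {
  return Array.from(html.matchAll(pattern), m => m[1].toLowerCase());
}

// Invented sample: two real deprecated tags plus two look-alike custom tags.
const sample = '<CENTER><font size="2">hi</font></CENTER><bigdata></bigdata><rbc></rbc>';

console.log(extractTags(sample, loose));  // [ 'center', 'font', 'big', 'rb' ]
console.log(extractTags(sample, strict)); // [ 'center', 'font' ]
```

Even the stricter variant would still match hyphenated names such as a hypothetical `<font-icon>`, and both patterns match text inside comments or scripts, so neither is a substitute for actually parsing the markup.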

Anyone else want to look into the deprecated attributes?

Edit: @HenriHelvetica pointed out that the original query didn’t include marquee. I’ve rerun the query and updated the table.


57K sites using plaintext, really? Wow! What the heck are they using it for…

Clearly, centring elements isn’t hard: 1M+ uses. :eyes:

BTW, we are now trying to deprecate Shadow DOM V0, which used <content> and <shadow>.
Now Blink is the only engine that supports Shadow DOM V0, and its usage is estimated to be ~2%
of page views, but usage of <content> seems much higher than that.
<shadow> was used for the “multiple shadow roots” feature, which was part of the Shadow DOM V0 spec,
but the support code has already been removed from Blink, so <shadow> now behaves identically to <content>.

ElementCreateShadowRoot (representative for Shadow DOM V0 usage)
https://www.chromestatus.com/metrics/feature/timeline/popularity/456

HTMLContentElement
https://www.chromestatus.com/metrics/feature/timeline/popularity/1896

I don’t know (yet) what url means in the data. Are these really different sites/domains, or unique URLs that could belong to far fewer domains?

Here is the query I used to detect which sites were using HTML Imports:

#standardSQL
SELECT
  url
FROM
  `httparchive.pages.2018_07_15_desktop`
WHERE
  JSON_EXTRACT(payload, '$._blinkFeatureFirstUsed.Features.HTMLImports') IS NOT NULL

The HTTP Archive extracts the Blink and CSS feature usage info and includes it in the pages data. The feature number to name matching is here.

It looks like 456 is “ElementCreateShadowRoot” and 1896 is “HTMLContentElement” (there is also “HTMLShadowElement”).

This will get you the pages that tripped ElementCreateShadowRoot (4,773 of them):

#standardSQL
SELECT
  url
FROM
  `httparchive.pages.2018_07_15_desktop`
WHERE
  JSON_EXTRACT(payload, '$._blinkFeatureFirstUsed.Features.ElementCreateShadowRoot') IS NOT NULL

This will get you the pages that tripped HTMLContentElement (89,175 of them):

#standardSQL
SELECT
  url
FROM
  `httparchive.pages.2018_07_15_desktop`
WHERE
  JSON_EXTRACT(payload, '$._blinkFeatureFirstUsed.Features.HTMLContentElement') IS NOT NULL
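
For anyone who wants to try the same check outside BigQuery, here is a rough sketch of what the JSON_EXTRACT ... IS NOT NULL filter does to a single payload. The `samplePayload` below is hypothetical and heavily trimmed, and the shape of the value stored under each feature name is an assumption; only the `_blinkFeatureFirstUsed.Features.<name>` path comes from the queries above.

```javascript
// Hypothetical, trimmed page payload. Real payloads hold many more fields,
// and the value stored per feature is assumed here, not taken from the data.
const samplePayload = JSON.stringify({
  _blinkFeatureFirstUsed: {
    Features: { HTMLContentElement: 1234.5 }
  }
});

// Rough equivalent of:
//   JSON_EXTRACT(payload, '$._blinkFeatureFirstUsed.Features.<name>') IS NOT NULL
function trippedFeature(payload, featureName) {
  try {
    const features = JSON.parse(payload)._blinkFeatureFirstUsed.Features;
    // "!= null" is false for both undefined (key missing) and null.
    return features[featureName] != null;
  } catch (e) {
    // Malformed JSON or a missing parent object counts as "not tripped".
    return false;
  }
}

console.log(trippedFeature(samplePayload, 'HTMLContentElement'));      // true
console.log(trippedFeature(samplePayload, 'ElementCreateShadowRoot')); // false
```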

@nhoizey in this context url is the URL of the origin being tested. We test one page per origin, which is defined as a unique protocol, subdomain, and domain for a website. We’ve deduplicated the list of origins having the same host (subdomain and domain) and preferred the HTTPS version.

The page we test is always just the root / path or home/landing page.
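
The deduplication rule described above can be sketched roughly as follows. This is only an illustration of "one origin per host, prefer HTTPS", not the actual HTTP Archive pipeline; `dedupeOrigins` and the example URLs are made up.

```javascript
// Keep one origin per host (subdomain + domain), preferring HTTPS over HTTP.
function dedupeOrigins(urls) {
  const byHost = new Map();
  for (const raw of urls) {
    const { protocol, host, origin } = new URL(raw);
    const current = byHost.get(host);
    // Keep the first origin seen for a host, but let HTTPS replace HTTP.
    if (!current || (protocol === 'https:' && !current.startsWith('https:'))) {
      byHost.set(host, origin);
    }
  }
  return Array.from(byHost.values());
}

const origins = dedupeOrigins([
  'http://example.com/',
  'https://example.com/about',
  'http://blog.example.com/',
]);

// Only the root path of each surviving origin would be tested.
console.log(origins.map(o => `${o}/`));
// [ 'https://example.com/', 'http://blog.example.com/' ]
```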

The most popular plaintext domain is blogger.com with 41,630 URLs. Out of a sample of 100 URLs, they are all variations of the page www.blogger.com/followers.

For example: https://www.blogger.com/followers.g?blogID=6947649422197228177&colors=Cgt0cmFuc3BhcmVudBILdHJhbnNwYXJlbnQaByMxMDQzNWQiByMyMjg4YmIqByNmZmZmZmYyByMwMDAwMDA6ByMxMDQzNWRCByMyMjg4YmJKByM5OTk5OTlSByMyMjg4YmJaC3RyYW5zcGFyZW50&pageSize=21&origin=https://universodascalopsitas.blogspot.com/&usegapi=1&jsh=m;//scs/apps-static//js/k%3Doz.gapi.en_US.hfiMrY347qE.O/m%3D__features__/am%3DwQ/rt%3Dj/d%3D1/rs%3DAGLTcCMOrzLFQ_Qou2Cj9qH2b2vdRcf4zQ&bpli=1&pli=1

<plaintext></plaintext>

It doesn’t even have any content.

The next most popular domain is google.com (20,617 URLs), but it’s actually just a redirect to the previous page.

https://accounts.google.com/ServiceLogin?continue=https://www.blogger.com/followers.g?blogID%3D7419705059708931073…

Similarly, twitter.com is the most popular domain that uses menuitem (59,238 URLs). Sampling those URLs, they’re all the embedded Follow button:

https://platform.twitter.com/widgets/follow_button.bed9e19e565ca3b578705de9e73c29ed.en.html

  <menu type="context" id="menu" data-scribe="component:contextmenu">
    <menuitem id="m-follow" label="Follow user"></menuitem>
    <menuitem id="m-profile" label="View user on Twitter"></menuitem>
    <menuitem id="m-tweet" label="Send Tweet to user"></menuitem>
  </menu>

Ok, thanks for the detailed answer!

For the Markup chapter of the Web Almanac I revisited this topic. Here are the latest results:

#standardSQL
# 03_01a: % of pages with deprecated elements
CREATE TEMPORARY FUNCTION containsDeprecatedElement(payload STRING)
RETURNS BOOLEAN LANGUAGE js AS '''
try {
  // payload is a JSON string; _element_count is itself a JSON-encoded
  // object keyed by element name.
  var $ = JSON.parse(payload);
  var elements = JSON.parse($._element_count);
  var deprecatedElements = new Set(["applet","acronym","bgsound","dir","frame","frameset","noframes","isindex","keygen","listing","menuitem","nextid","noembed","plaintext","rb","rtc","strike","xmp","basefont","big","blink","center","font","marquee","multicol","nobr","spacer","tt"]);
  // True if any element name on the page is in the deprecated set.
  return Object.keys(elements).some(e => deprecatedElements.has(e));
} catch (e) {
  // Missing or malformed metric: treat the page as having none.
  return false;
}
''';

SELECT
  _TABLE_SUFFIX AS client,
  COUNTIF(containsDeprecatedElement(payload)) AS pages,
  ROUND(COUNTIF(containsDeprecatedElement(payload)) * 100 / COUNT(0), 2) AS pct_pages
FROM
  `httparchive.pages.2019_07_01_*`
GROUP BY
  client

client pages % of pages with deprecated elements
mobile 804387 15.18%
desktop 705134 16.13%
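
To sanity-check the UDF logic without spending any BigQuery quota, the same function can be run locally against a few hand-made payloads. This is a sketch: the payloads are invented, and the only assumption carried over from the query is that `_element_count` is a JSON-encoded object keyed by element name.

```javascript
const DEPRECATED = new Set(["applet","acronym","bgsound","dir","frame","frameset","noframes","isindex","keygen","listing","menuitem","nextid","noembed","plaintext","rb","rtc","strike","xmp","basefont","big","blink","center","font","marquee","multicol","nobr","spacer","tt"]);

// Same logic as the containsDeprecatedElement UDF in query 03_01a.
function containsDeprecatedElement(payload) {
  try {
    const $ = JSON.parse(payload);
    const elements = JSON.parse($._element_count);
    return Object.keys(elements).some(e => DEPRECATED.has(e));
  } catch (e) {
    return false;
  }
}

// Invented payloads: two with deprecated elements, one without, one malformed.
const payloads = [
  JSON.stringify({ _element_count: JSON.stringify({ div: 10, center: 2 }) }),
  JSON.stringify({ _element_count: JSON.stringify({ div: 4, p: 7 }) }),
  JSON.stringify({ _element_count: JSON.stringify({ font: 1 }) }),
  'not json',
];

// Mirrors COUNTIF(...) and ROUND(COUNTIF(...) * 100 / COUNT(0), 2).
const pages = payloads.filter(p => containsDeprecatedElement(p)).length;
const pctPages = Math.round((pages * 100 / payloads.length) * 100) / 100;

console.log(pages, pctPages); // 2 50
```

Note that a page whose metric fails to parse still counts in the denominator, just as in the query.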

Top deprecated elements:

#standardSQL
# 03_01b: Top deprecated elements
CREATE TEMPORARY FUNCTION getElements(payload STRING)
RETURNS ARRAY<STRING> LANGUAGE js AS '''
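try {
  var $ = JSON.parse(payload);
  // _element_count is a JSON-encoded object keyed by element name.
  var elements = JSON.parse($._element_count);
  // One entry per distinct element name on the page.
  return Object.keys(elements);
} catch (e) {
  return [];
}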
''';

CREATE TEMPORARY FUNCTION isDeprecated(element STRING) AS (
  element IN ("applet","acronym","bgsound","dir","frame","frameset","noframes","isindex","keygen","listing","menuitem","nextid","noembed","plaintext","rb","rtc","strike","xmp","basefont","big","blink","center","font","marquee","multicol","nobr","spacer","tt")
);

SELECT
  _TABLE_SUFFIX AS client,
  element AS deprecated,
  COUNT(0) AS freq,
  SUM(COUNT(0)) OVER (PARTITION BY _TABLE_SUFFIX) AS total,
  ROUND(COUNT(0) * 100 / SUM(COUNT(0)) OVER (PARTITION BY _TABLE_SUFFIX), 2) AS pct
FROM
  `httparchive.pages.2019_07_01_*`,
  UNNEST(getElements(payload)) AS element
WHERE
  isDeprecated(element)
GROUP BY
  client,
  deprecated
ORDER BY
  freq DESC,
  client

client deprecated freq total pct
mobile center 421571 1010479 41.72
mobile font 390929 1010479 38.69
desktop center 363308 887408 40.94
desktop font 350413 887408 39.49
mobile marquee 63378 1010479 6.27
desktop marquee 46620 887408 5.25
desktop nobr 31002 887408 3.49
mobile nobr 29247 1010479 2.89
mobile big 24996 1010479 2.47
desktop big 23030 887408 2.6
mobile frame 18649 1010479 1.85
mobile frameset 18387 1010479 1.82
desktop frame 17147 887408 1.93
desktop frameset 16902 887408 1.9
mobile noframes 14531 1010479 1.44
mobile strike 14346 1010479 1.42
desktop strike 14270 887408 1.61
desktop noframes 11060 887408 1.25

I’m kind of surprised to see such a big change since my last analysis, namely center taking over the top spot from font and marquee jumping up the list. This could be due to the corpus changes that happened between these times, with our dataset reaching more of the tail of the web. The methodologies are also a bit different: before, I was querying the HTML itself with a regular expression, and now I’m using a custom metric that extracts each tag at runtime.
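
One concrete difference between the two methodologies, as far as the queries themselves show: the 2018 frequency column counts every regex match, while freq in 03_01b unnests Object.keys of the element counts, so each element is counted at most once per page. That makes freq closer in meaning to the earlier urls column than to frequency. A small sketch with invented per-page counts shows how the two tallies can rank elements differently:

```javascript
// Invented per-page element counts, shaped like the parsed _element_count metric.
const samplePages = [
  { font: 40, center: 1 },
  { center: 2 },
  { center: 1, div: 12 },
];

// occurrences: every instance counts (like the 2018 "frequency" column).
// pagesWith: each element counts once per page (like "freq" in 03_01b).
function tally(pages) {
  const occurrences = {};
  const pagesWith = {};
  for (const counts of pages) {
    for (const [tag, n] of Object.entries(counts)) {
      occurrences[tag] = (occurrences[tag] || 0) + n;
      pagesWith[tag] = (pagesWith[tag] || 0) + 1;
    }
  }
  return { occurrences, pagesWith };
}

const { occurrences, pagesWith } = tally(samplePages);
console.log(occurrences); // { font: 40, center: 4, div: 12 }
console.log(pagesWith);   // { font: 1, center: 3, div: 1 }
```

This does not by itself explain the shift in rankings; it only illustrates why the raw numbers from the two analyses are not directly comparable.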

You can explore the full results in this sheet.