Interesting outliers

When I wrote the markup chapter for the almanac in 2019, we found a number of outliers: data that was well “outside expectations”, often with the same errors repeated across several sites. Because we had some trouble getting the queries right initially, we cross-checked those manually until we were satisfied that the problems were not caused by how we were collecting the data. As a result, we wound up being able to reach out and help fix thousands of sites, because their problems had a common cause.

Each year when we write these, I see some outliers that make me very curious. Looking at the CSS data being used for the current almanac, a few jump out at me: sites that have a really excessive number of stylesheets (hundreds or even thousands), stylesheets that use orders of magnitude more images than is typical, and animations that are set to take millions of years.

I’m very curious to see a few sample URLs so I can investigate this further. It might or might not be relevant to the chapter, but it strikes me that it could be interesting to see whether we can get some kind of “outliers report” for these sorts of things in the future. Is that interesting to anyone else?