Annual / seasonal trends in HTTP Archive data

I saw a reduction in the number of sites between January and February 2022 for the technologies I’m watching, so I looked at the overall number of sites sampled in those months to determine whether the reduced count could be related to the number of sites sampled in a given month.

I think I see a temporary drop in the number of sampled sites from January to February every year: in 2019, 2020, 2021, and 2022.
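For anyone who wants to repeat this check, here is a minimal sketch of the comparison, assuming the monthly totals have already been exported into a dictionary keyed by (year, month). The function name and the input shape are hypothetical and not part of any HTTP Archive tooling.

```python
def jan_to_feb_changes(site_counts):
    """Return {year: fractional change in sampled sites from January to February}.

    site_counts is assumed to map (year, month) tuples to the number of
    sampled sites in that month.
    """
    changes = {}
    for (year, month), jan_count in sorted(site_counts.items()):
        if month != 1:
            continue
        feb_count = site_counts.get((year, 2))
        # Skip years with no February figure or an unusable January figure.
        if feb_count is None or jan_count == 0:
            continue
        changes[year] = (feb_count - jan_count) / jan_count
    return changes
```

A negative value for a year means fewer sites were sampled in February than in January of that year.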

Does the team have an idea why that may happen each year in the same fashion?

Reference: HTTP Archive: State of the Web

Thanks in advance!

My thoughts:

  • Based on my experience, eCommerce sites tend to see a drop in traffic in Jan compared to previous months.
  • Roughly 20% of sites are eCommerce sites (as per Ecommerce | 2021 | The Web Almanac by HTTP Archive). The actual share is probably higher, since Wappalyzer doesn’t detect all eCommerce sites.
  • My understanding is that the Jan crawl uses a URL list taken from the Dec CrUX data (that’s why you see the drop in Feb instead of Jan)… @rviscomi should be able to confirm this.

This may result in a seasonal drop in Feb every year, because an origin has to cross a certain traffic threshold to be included in CrUX. So my guess is that some percentage of eCommerce sites drop out of CrUX because of low traffic levels in Jan…

Thoughts?

When we start the crawl on February 1, the most recently available CrUX dataset is the one published on the second Tuesday of January, which includes data from the entire month of December. So the dips in the number of desktop URLs in February actually correspond with desktop users visiting fewer websites in December.
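The timing described above can be sketched in a few lines. This is only an illustration, and it assumes that every CrUX dataset is published on the second Tuesday of the month and covers the preceding calendar month (the post only states this for January). The function names are hypothetical.

```python
from datetime import date, timedelta


def second_tuesday(year, month):
    """Return the date of the second Tuesday of the given month."""
    first = date(year, month, 1)
    # date.weekday(): Monday is 0, Tuesday is 1.
    days_to_first_tuesday = (1 - first.weekday()) % 7
    return first + timedelta(days=days_to_first_tuesday + 7)


def previous_month(year, month):
    """Return (year, month) for the month before the given one."""
    return (year - 1, 12) if month == 1 else (year, month - 1)


def crux_month_for_crawl(crawl_start):
    """Return (year, month) of the CrUX data available when a crawl starts.

    Assumption: a dataset is published on the second Tuesday of each month
    and covers the whole preceding calendar month.
    """
    year, month = crawl_start.year, crawl_start.month
    if second_tuesday(year, month) > crawl_start:
        # This month's dataset is not out yet, so fall back to last month's.
        year, month = previous_month(year, month)
    # The dataset published in (year, month) covers the month before it.
    return previous_month(year, month)
```

Under these assumptions, `crux_month_for_crawl(date(2022, 2, 1))` resolves to December 2021, which matches the explanation that the February crawl reflects December usage.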


I’d suspected the same, but you provided more detail, including the Ecommerce-specific report. Thank you for that!

Thank you for being so responsive, @rviscomi. That confirms what I suspected.