I’d be really interested to see an analysis of how representative home pages are of the origins in the HTTP Archive. For as long as we’ve been tracking how the web is built, we’ve only been testing home pages. And from those tests we make general statements about the entire web.
My hypothesis is that a home page is usually built with the same frameworks as its secondary pages and tends to have similar UX characteristics.
One idea is to randomly sample some large number N of websites, discover secondary pages somehow (a crawler? parsing links on the home page? see the sketch below), and feed those secondary URLs into a one-time HTTP Archive test. When the test is complete, we can see where the home page falls in the distribution of UX characteristics across all of the secondary pages.
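For the link-parsing approach, here's a minimal sketch of what discovery could look like. It assumes we only want same-origin links and caps the number of pages per site; the `max_links` cutoff and the choice of `requests` + BeautifulSoup are placeholders for illustration, not a proposed implementation:

```python
# Rough sketch: fetch a home page and collect same-origin links as
# candidate "secondary pages". max_links is an arbitrary cap.
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup


def discover_secondary_pages(home_url, max_links=10):
    """Return up to max_links same-origin URLs linked from the home page."""
    html = requests.get(home_url, timeout=30).text
    soup = BeautifulSoup(html, "html.parser")
    origin = urlparse(home_url).netloc

    secondary = []
    for a in soup.find_all("a", href=True):
        # Resolve relative links and drop URL fragments.
        url = urljoin(home_url, a["href"]).split("#")[0]
        parsed = urlparse(url)
        # Keep same-origin pages that aren't the home page itself.
        if parsed.netloc == origin and parsed.path not in ("", "/"):
            if url not in secondary:
                secondary.append(url)
        if len(secondary) >= max_links:
            break

    return secondary
```

With per-site lists like this, the comparison step is just computing where the home page sits in each site's distribution, e.g. its percentile rank for LCP among the secondary pages. If home pages consistently land at an extreme of their site's distribution, that would be a strong signal that they're not representative.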
A caveat with this approach is that the secondary pages (or the home page, for that matter) might not actually be the pages users predominantly visit. It would be great to validate this somehow. Maybe we can take each discovered secondary URL and use the PageSpeed Insights API to see if page-level CrUX data is available; if not, that may suggest there is insufficient data (low popularity).
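If I'm reading the PSI v5 response format right, the API falls back to origin-level field data and sets `loadingExperience.origin_fallback` to `true` when a URL doesn't have enough CrUX data of its own, so a rough popularity check might look like this (the `has_page_level_crux` helper name is made up for illustration):

```python
# Sketch: use the PageSpeed Insights v5 API to check whether a URL has
# page-level CrUX (field) data, as a proxy for "enough real traffic".
import requests

PSI_ENDPOINT = "https://www.googleapis.com/pagespeedonline/v5/runPagespeed"


def has_page_level_crux(url, api_key):
    """True if PSI reports field data for this specific URL."""
    resp = requests.get(PSI_ENDPOINT, params={"url": url, "key": api_key})
    resp.raise_for_status()
    experience = resp.json().get("loadingExperience", {})
    # origin_fallback=true means PSI substituted origin-wide data because
    # this URL alone didn't have enough CrUX data. A missing
    # loadingExperience object means no field data at all.
    return bool(experience) and not experience.get("origin_fallback", False)
```

Filtering the discovered URLs through a check like this would let us weight the comparison toward pages users actually visit, rather than whatever happens to be linked from the home page.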
Especially as we prepare the Web Almanac this year, I’d like to bolster confidence that our dataset is representative of the web. Or if it turns out that home pages are not actually representative of a website, that’s really good info to have, and we should change our methodology accordingly.
Would this be useful? Any other ideas for how we might test this hypothesis?