How representative is a website's homepage?

I’d be really interested to see an analysis of how representative home pages are of the origins in the HTTP Archive. For as long as we’ve been tracking how the web is built, we’ve only been testing home pages, and from those tests we make general statements about the entire web.

My hypothesis is that a home page is usually built using the same frameworks as secondary pages and would tend to have similar UX characteristics.

One idea is to randomly sample some large N number of websites, discover secondary pages somehow (crawler? parsing links on the home page?), and feed those secondary URLs into a one-time HTTP Archive test. When the test is complete we can see how the home page fits into the distribution of UX characteristics for all of the secondary pages.
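The sampling step could be sketched roughly like this (a minimal sketch: the toy origin list stands in for the real HTTP Archive origin table, and the sample size is illustrative):

```python
import random

def sample_origins(origins, n, seed=42):
    """Randomly sample n origins for a one-off test run.

    Fixing the seed makes the sample reproducible, which matters if
    the secondary-page discovery happens in a later, separate step.
    """
    rng = random.Random(seed)
    return rng.sample(origins, min(n, len(origins)))

# Toy stand-in for the real origin list.
origins = [f"https://example{i}.com" for i in range(1000)]
batch = sample_origins(origins, 100)
print(len(batch))
```

Each sampled origin's home page would then go through link discovery, and the resulting secondary URLs would be submitted as a one-time test batch.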

A caveat with this approach is that the secondary pages (or the home page for that matter) might not actually be pages users predominantly visit. Would be great to validate this somehow. Maybe we can take each discovered secondary URL and use the PageSpeed Insights API to see if there is page-level CrUX data available. If not, that may suggest that there is insufficient data (low popularity).
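The popularity check could look something like this. It assumes the PageSpeed Insights v5 response shape, where `loadingExperience.origin_fallback` is set when the specific URL lacks enough field data and PSI falls back to origin-level CrUX; worth double-checking against the live API:

```python
def has_page_level_crux(psi_response: dict) -> bool:
    """Given a parsed PageSpeed Insights v5 API response, report
    whether page-level CrUX field data was available.

    PSI sets loadingExperience.origin_fallback when the specific URL
    has insufficient field data (i.e. low popularity) and the origin's
    data is returned instead.
    """
    le = psi_response.get("loadingExperience")
    if not le or not le.get("metrics"):
        return False  # no field data at all
    return not le.get("origin_fallback", False)

# Simulated responses; a real run would call
# GET https://www.googleapis.com/pagespeedonline/v5/runPagespeed?url=...
popular = {"loadingExperience": {
    "metrics": {"FIRST_CONTENTFUL_PAINT_MS": {}},
}}
obscure = {"loadingExperience": {
    "metrics": {"FIRST_CONTENTFUL_PAINT_MS": {}},
    "origin_fallback": True,
}}
print(has_page_level_crux(popular), has_page_level_crux(obscure))
```

Secondary URLs that fail this check could be flagged as low-traffic rather than dropped outright, so the sample's skew is at least measurable.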

Especially as we prepare the Web Almanac this year, I’d like to bolster confidence that our dataset is representative of the web. Or, if it turns out that home pages are not actually representative of a website, that’s really good info to have and we should change our methodology accordingly.

Would this be useful? Any other ideas for how we might test this hypothesis?

Should be easy enough to look for same-domain links and look at the rectangle for the anchor tag. Maybe favor anchor tags towards the middle of the page and a mix of text-only anchors and anchors for the larger images on the page. That should handle things like product pages and content pages anyway.
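The same-domain part of that is straightforward to sketch with the standard library. (The rectangle/position scoring would need a real browser, e.g. a WebPageTest custom script calling `getBoundingClientRect`, so this sketch only covers the link filtering.)

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

class LinkCollector(HTMLParser):
    """Collect href values from anchor tags."""
    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)

def same_domain_links(base_url, html):
    """Return absolute same-host URLs found on the page,
    deduplicated, fragment-stripped, excluding the base page itself."""
    parser = LinkCollector()
    parser.feed(html)
    base_host = urlparse(base_url).netloc
    seen, out = set(), []
    for href in parser.hrefs:
        absolute = urljoin(base_url, href)
        p = urlparse(absolute)
        if p.scheme in ("http", "https") and p.netloc == base_host:
            clean = absolute.split("#")[0]
            if clean not in seen and clean != base_url:
                seen.add(clean)
                out.append(clean)
    return out

page = '<a href="/products">Shop</a> <a href="https://other.com/ad">Ad</a>'
print(same_domain_links("https://example.com/", page))
```

Subdomain links (e.g. a `shop.` subdomain off the main origin) are excluded here by the strict host match; whether those count as "the same website" is a judgment call for the analysis.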

I’d probably only bother crawling 1-deep from the home page though.

Been meaning to build something like that for WPT for years but never got around to it.

I agree with Rick on this.

One thing to note is that many of these sites are gated behind a login, like Facebook. Those pages will look very different, and you won’t be able to access them unless you go through the trouble of creating accounts, running a login script, etc.

But whether the NYT home page profile differs from an article page is worth examining.

I don’t think that applies to many sites. Facebook et al. and their associated properties may indeed dominate much of the time spent on the web, but not the number of websites. Then again, the continuing shift to messaging services may make this moot in a couple of years.

I’d suggest focusing on websites that provide nav elements. These are now fairly well established in many frameworks and themes and would take a lot of the guesswork out of any analysis.
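Restricting link discovery to `<nav>` elements is easy to sketch with the standard library's HTML parser (a minimal sketch; the sample markup is made up):

```python
from html.parser import HTMLParser

class NavLinkParser(HTMLParser):
    """Collect hrefs that appear inside <nav> elements only."""
    def __init__(self):
        super().__init__()
        self.nav_depth = 0  # handles nested <nav> elements
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "nav":
            self.nav_depth += 1
        elif tag == "a" and self.nav_depth > 0:
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "nav" and self.nav_depth > 0:
            self.nav_depth -= 1

html = """
<nav><a href="/docs">Docs</a><a href="/pricing">Pricing</a></nav>
<main><a href="/random-deep-link">Buried link</a></main>
"""
parser = NavLinkParser()
parser.feed(html)
print(parser.links)  # only the two nav links
```

The obvious trade-off: sites whose primary navigation is built without `<nav>` (or behind a hamburger menu rendered by JS) would be missed, which itself biases the sample toward well-structured markup.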

Here are my observations:
Home pages are not representative of a particular brand’s speed, for a few reasons:

  • not all pages/content will have been served from a CDN
  • the home page might use one implementation (e.g. AEM) while search is implemented as an SPA
  • it varies by vertical - retail, travel, hospitality, banking, etc.

I’d suggest doing a crawl up to a certain limit.

Typically a first-level crawl plus a PageSpeed/Lighthouse run would be enough. But the risks here are:

  • a retail site could have 1,000+ sub-URLs for each product type
  • crawlers/robots may be blocked by the CDN itself

You may need to filter out deep-lying directory paths that can show up at one level of crawl depth, e.g.
…/product/gaming-console/xbox/xbox-games/sports/football/fifa-19, and instead select pages around 2-3 levels deep in the directory path.

Hope it makes sense.


From my experience auditing a lot of sites with clients, I find most of the common problems on the homepage. On the homepage, you’ll find:

  • server configuration,
  • the 3rd parties,
  • the way CSS and JS are included,
  • until now, the weight of the JS / CSS (but maybe route-splitting popularity will slowly change that),
  • the header / footer with hidden things,
  • if lazyloading images is deployed at all,
  • how images are managed in a responsive context,
  • the font problems and already deployed solutions

The homepage will differ from other pages because, for example, it will have way too many images. Other pages will have more specific problems, like a results page with a heavier JS framework, but globally I find 70% of the performance problems of a website by analyzing the homepage alone (a full-site audit takes days).
