Are Popular Websites Faster?

Many studies have looked at how quickly a website must load before customers lose focus. I usually cite stats like these: 40% of users expect a site to load in 3s, and 64% leave if there is no content after 4s. As Ilya has presented in the past, the goal is to get content to glass in less than one second. If you can at least partially paint the screen ASAP, you give the perception of fast content, and your customers will likely stay a bit longer.

Web performance geeks love this stuff, and we love helping websites get faster. As Wim Leers pointed out in his Performance Calendar post this year – the top websites have staff or at least a budget to perform optimizations – perhaps even a team of performance engineers whose sole focus is to make the site faster. In my last post on how fast the web is slowing down, I saw that highly ranked websites are slowing faster than the rest of the web. I ALSO noticed that these websites tend to load much faster than the “rest.” Are these sites more popular because they ARE fast, or did they become popular and then get faster (the proverbial chicken/egg question)?

To attempt to answer this question, I peered into the HTTP Archive. First, I found the distribution of the SpeedIndex (rounded to the nearest second) for various rank distributions:

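-- Count pages per SpeedIndex bucket (rounded to the nearest second) for ranks 501-1000
SELECT INTEGER(ROUND(SpeedIndex/1000)) AS SI_bucket, SUM(pages) AS pages FROM (
  SELECT SpeedIndex, COUNT(*) AS pages, rank
  FROM [httparchive:runs.2013_12_15_pages]
  WHERE rank > 500 AND rank <= 1000  -- change these bounds for each rank group
  GROUP BY SpeedIndex, rank
)
GROUP BY SI_bucket
ORDER BY SI_bucket;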

Graphing these, you see that top sites (lines for rank 1-100, 101-500 and 501-1000) peak at ~2 seconds and tail off quickly, while sites with rank over 100k have a lower peak at ~3s, and a larger shoulder in the slower direction. There are still sites with rank over 100k that load in 2 seconds, but the percentage as a group is lower.
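Because the rank groups contain very different numbers of sites, the curves are only comparable once each group's counts are turned into a percentage of that group. Here is a minimal Python sketch of that step, assuming the query results have been exported as a plain list of SpeedIndex values in milliseconds. The function name and the sample numbers are hypothetical, not HTTP Archive data.

```python
from collections import Counter

def speedindex_distribution(speed_indexes_ms):
    """Return {seconds_bucket: percent_of_sites} for SpeedIndex values in ms."""
    # Round half up to the nearest whole second (Python's round() would
    # round halves to the nearest even number instead).
    buckets = Counter(int(si / 1000 + 0.5) for si in speed_indexes_ms)
    total = sum(buckets.values())
    # Express each bucket as a share of the group so groups of different
    # sizes can be drawn on the same axis.
    return {sec: 100.0 * n / total for sec, n in sorted(buckets.items())}

# Hypothetical SpeedIndex samples in ms, for illustration only
top_sites = [1800, 2100, 2200, 2400, 3100]
long_tail = [2100, 2900, 3200, 3400, 4800, 6100, 7500, 9900]

print(speedindex_distribution(top_sites))
print(speedindex_distribution(long_tail))
```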

This graph displays the SpeedIndex (rounded to the nearest second) for all sites within a given popularity rank range. The same data for mobile pages has a slightly different distribution. (NOTE: the HTTP Archive mobile dataset has only 4800 sites.)

Top ranked mobile sites appear to have two peaks – one at 2-3 seconds and a rounded hump of sites around 7 seconds. In reality, I think the mobile and desktop graphs have the same shape, and rounding to the nearest second simply leaves just one point in the sharp “fast” peak on the highly ranked sites. The mobile sites do have a much larger “slow” shoulder after the initial peak of fast websites, indicating that the timings are more widely spread and that some sites are extremely slow on mobile.
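To illustrate how bucket width can hide the shape of a narrow peak, here is a small Python sketch on a synthetic sample (the values are made up, not taken from the HTTP Archive). Note that it floors each value to the lower edge of its bucket rather than rounding to the nearest second, which is a simplification of the query above.

```python
from collections import Counter

def histogram(values_ms, bucket_ms):
    """Count values per bucket, keyed by the bucket's lower edge in ms."""
    counts = Counter((v // bucket_ms) * bucket_ms for v in values_ms)
    return dict(sorted(counts.items()))

# Synthetic sample: a narrow fast peak between 2.0 s and 2.4 s,
# followed by a broad slow hump.
sample = [2000, 2100, 2200, 2300, 2400, 5500, 6200, 6900, 7400, 8100]

coarse = histogram(sample, 1000)  # one-second buckets
fine = histogram(sample, 250)     # quarter-second buckets

print(coarse)
print(fine)
```

With one-second buckets the whole fast peak lands in a single bucket, while the quarter-second buckets spread it over more than one point.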

Ok, so top ranked websites load faster on the desktop and on mobile. Can we quantify HOW much faster? If we grab the SpeedIndex Quantiles for the web and mobile web by rank – we can.

SELECT date,
  NTH(10, QUANTILES(SpeedIndex)) tenth,
  NTH(20, QUANTILES(SpeedIndex)) twentieth,
  NTH(30, QUANTILES(SpeedIndex)) thirtieth,
  NTH(40, QUANTILES(SpeedIndex)) fortieth,
  NTH(50, QUANTILES(SpeedIndex)) fiftieth,
  NTH(60, QUANTILES(SpeedIndex)) sixtieth,
  NTH(70, QUANTILES(SpeedIndex)) seventieth,
  NTH(80, QUANTILES(SpeedIndex)) eightieth,
  NTH(90, QUANTILES(SpeedIndex)) ninetieth
FROM (
  -- Convert createDate (seconds since epoch) to a year-month label
  SELECT STRFTIME_UTC_USEC(INTEGER(createDate*1000000), "%Y-%m") date, rank, SpeedIndex
  FROM [httparchive:runs.2013_11_01_pages], [httparchive:runs.2013_12_15_pages]
  WHERE rank > 500000 AND rank < 1000000  -- change these bounds for each rank group
)
GROUP BY date
ORDER BY date;

By changing the rank values and building up a table of quantile values by rank, we can see again that for almost every quantile the higher ranked websites are faster. The exception is the 10th percentile, where sites load in just over a second and there is no room for sites TO get faster.

This graph shows the measured SpeedIndex at each 10th percentile for various ranks. Sites that are very popular (low rank number) tend to load faster than less popular sites at every percentile.
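For anyone who would rather build the quantile table locally from exported SpeedIndex values, here is a hedged Python sketch. It uses the standard library's exact `statistics.quantiles`, so the numbers will not match BigQuery's approximate `QUANTILES` digit for digit. The group labels and sample values are hypothetical.

```python
from statistics import quantiles

def deciles(speed_indexes_ms):
    """Return the nine cut points (10th to 90th percentile) of the values."""
    # method="inclusive" treats the list as the whole population
    # rather than a sample drawn from a larger one.
    return quantiles(speed_indexes_ms, n=10, method="inclusive")

def decile_table(groups):
    """Map each rank-group label to its list of nine decile values."""
    return {label: deciles(values) for label, values in groups.items()}

# Hypothetical SpeedIndex samples in ms, keyed by rank group
groups = {
    "1-100": [1100, 1500, 1900, 2100, 2300, 2600, 3000, 3600, 4500, 6000],
    ">100k": [1200, 2200, 2900, 3300, 3800, 4400, 5200, 6500, 8500, 12000],
}

for label, row in decile_table(groups).items():
    print(label, [round(v) for v in row])
```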

If we measure the difference between each rank category and the dataset of all websites, we can see how much faster (or slower) each ranked set is than the whole. We can then determine the percentage difference for each set of websites (rank and percentile):

This graph tells us that the top 100 websites are in general ~30% faster than the majority of websites. In fact, in every quantile, websites in the top 5,000 outperform the whole (as seen in the chart below):

Mobile is quite similar: the top 1000 websites (of only 4800) are faster than the bulk of websites.
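The percentage difference described above is simple arithmetic: subtract the all-sites value from the rank group's value at the same percentile and divide by the all-sites value. A minimal Python sketch follows, with a hypothetical function name and made-up quantile values.

```python
def percent_difference(group_quantiles, all_quantiles):
    """Percent by which each group quantile differs from the all-sites quantile.

    Negative values mean the group is faster than the whole dataset.
    """
    return [
        100.0 * (group - overall) / overall
        for group, overall in zip(group_quantiles, all_quantiles)
    ]

# Hypothetical SpeedIndex quantiles in ms (e.g. 30th, 50th, 70th percentile)
all_sites = [2000, 3000, 4000]
top_100 = [1400, 2100, 2800]

print(percent_difference(top_100, all_sites))
```

A result of about -30 at a given percentile would correspond to the "~30% faster" reading of the chart.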

Top sites tend to be faster than all websites

No matter what distribution of websites you look at, the more popular websites trend faster. Even the slowest popular website is much faster than those that are less popular. On the web, the top 500 sites are nearly 1s faster (by the median), and on mobile it is closer to 1.5s faster. This is interesting data. If you are looking to make your website faster, studying the most popular websites and their optimizations would be a good way to get started. As Charles Caleb Colton said in the 1800s, “imitation is the sincerest of flattery.”

I plan to dig deeper into this to see if there are characteristics of the top 500 that point to faster load times in future studies. If anyone has any suggestions where to look first, please have a look, or post ideas in the comments.


It’s good to be cautious when using time values from synthetic testing. This is esp. true wrt the HTTP Archive data because each website is only tested 3 times.

I really like this study because it focuses on comparisons of time values rather than the absolute time values. Since this is synthetic testing the absolute time values are highly dependent on the hardware, connection speeds, geo location, etc. which may not be representative of the real world at large. But the test conditions within a crawl are fairly consistent, so comparisons are safe.

This analysis is also good because although the sample size is small per website, this study looks at a large number of websites (100s or 1000s) and is thus less likely to be affected by outliers (a website that happened to be overloaded when tested or further away geographically).

I’m always nervous about people drawing too many conclusions from the time values in HTTP Archive, but this analysis does a great job.

As far as “characteristics … that point to faster load times”, I recommend looking at the correlation charts for start render and load time. Speed Index is more closely associated with start render, so there we see CSS requests & transfer size as highly correlated variables for top 300K URLs. For top 1K URLs it’s max requests on 1 domain, total requests, image requests, & DOM elements.
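To check a candidate variable against Speed Index on your own export, a correlation coefficient can be computed directly. The sketch below assumes a plain Pearson correlation, which may or may not be what the HTTP Archive correlation charts use, and the sample numbers are invented for illustration.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(spread_x * spread_y)

# Hypothetical per-site values: number of CSS requests and SpeedIndex in ms
css_requests = [2, 4, 5, 8, 12]
speed_index = [1900, 2600, 2500, 4100, 5800]

print(round(pearson(css_requests, speed_index), 2))
```

A value near 1 would suggest the variable rises with Speed Index, a value near 0 little linear relationship. Correlation alone does not show which way the cause runs.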
