I know that the FAQ explains that each URL is visited 3 times and the median result is recorded. However, I noticed that the response to this post in the Meta section indicates that the monthly crawl can take several weeks to complete. Taking these two together, I am curious whether the repeat measurements mentioned in the FAQ are made consecutively after the initial visit or scattered throughout the crawl. If the 3 visits are made consecutively, are there methods to account for the differences in networking conditions between two sites measured at different times of day? For example, there is likely to be more traffic and congestion on the Internet at 10am than at 2am.
The FAQ actually needs to be updated. We’ve reduced the number of runs per test from 3 to 1 to be able to handle the 10x sample size increase last year.
Network conditions are a valid source of concern if what you’re trying to measure is affected by round trip time (RTT). In synthetic testing like this, however, it’s impossible to get a test that’s representative of a typical user experience for all ~5M websites each month. Some users are closer to the web server, some users have slower connections, etc. Similarly, some users have slower phones or laptops and so JS execution may take longer. So I’d recommend that you don’t read too closely into time-based measurements in synthetic testing.
If you want to understand how long it typically takes users to load a page, I strongly recommend using a real-user dataset like the Chrome UX Report instead.
To answer your underlying question though, when we did run the test 3 times, it was consecutively.
I actually use the httparchive.org results to benchmark a group of sites that we monitor more frequently (who monitors the monitor?), and over time the httparchive.org results can be considered indicative and a reasonable basis for comparison. However, since the number of tests per run was reduced from three to one, the variance for any particular site has gone up. This usually seems to come down to "network effects", which can be extremely difficult to track.
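For what it's worth, the variance increase you're seeing is expected statistically: the median of three noisy measurements has a noticeably smaller spread than a single measurement. Here's a rough simulation sketching the effect (the 2000 ms baseline and 300 ms jitter are made-up numbers, not anything from the HTTP Archive pipeline):

```python
import random
import statistics

random.seed(42)

def simulate_load_time():
    # Hypothetical page load: 2000 ms baseline plus Gaussian network jitter.
    return 2000 + random.gauss(0, 300)

# One measurement per test (current behaviour: 1 run).
single_runs = [simulate_load_time() for _ in range(1000)]

# Median of three consecutive measurements (former behaviour: 3 runs).
median_runs = [
    statistics.median(simulate_load_time() for _ in range(3))
    for _ in range(1000)
]

print(f"single-run stdev:  {statistics.stdev(single_runs):.0f} ms")
print(f"median-of-3 stdev: {statistics.stdev(median_runs):.0f} ms")
```

For normally distributed noise, the median of three samples has roughly two-thirds the standard deviation of a single sample, so going from 3 runs to 1 plausibly accounts for a good chunk of the extra site-to-site scatter, on top of genuine network effects.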