What's the web page speed in kbps?

SELECT type, pages, avg, min, first, tenth, twentieth, thirtieth, fortieth, median,
       sixtieth, seventieth, eightieth, ninetieth, ninety_ninth, max
FROM
(SELECT
  'desktop' type,
  COUNT(*) pages,
  AVG(((bytesTotal / 1024)/(fullyLoaded / 1000))) avg,
  MIN(((bytesTotal / 1024)/(fullyLoaded / 1000))) min,
  NTH(1, quantiles(((bytesTotal / 1024)/(fullyLoaded / 1000)), 101)) first,
  NTH(10, quantiles(((bytesTotal / 1024)/(fullyLoaded / 1000)), 101)) tenth,
  NTH(20, quantiles(((bytesTotal / 1024)/(fullyLoaded / 1000)), 101)) twentieth,
  NTH(30, quantiles(((bytesTotal / 1024)/(fullyLoaded / 1000)), 101)) thirtieth,
  NTH(40, quantiles(((bytesTotal / 1024)/(fullyLoaded / 1000)), 101)) fortieth,
  NTH(50, quantiles(((bytesTotal / 1024)/(fullyLoaded / 1000)), 101)) median,
  NTH(60, quantiles(((bytesTotal / 1024)/(fullyLoaded / 1000)), 101)) sixtieth,
  NTH(70, quantiles(((bytesTotal / 1024)/(fullyLoaded / 1000)), 101)) seventieth,
  NTH(80, quantiles(((bytesTotal / 1024)/(fullyLoaded / 1000)), 101)) eightieth,
  NTH(90, quantiles(((bytesTotal / 1024)/(fullyLoaded / 1000)), 101)) ninetieth,
  NTH(99, quantiles(((bytesTotal / 1024)/(fullyLoaded / 1000)), 101)) ninety_ninth,
  MAX(((bytesTotal / 1024)/(fullyLoaded / 1000))) max
FROM [httparchive:runs.latest_pages]),
(SELECT
  'mobile' type,
  COUNT(*) pages,
  AVG(((bytesTotal / 1024)/(fullyLoaded / 1000))) avg,
  MIN(((bytesTotal / 1024)/(fullyLoaded / 1000))) min,
  NTH(1, quantiles(((bytesTotal / 1024)/(fullyLoaded / 1000)), 101)) first,
  NTH(10, quantiles(((bytesTotal / 1024)/(fullyLoaded / 1000)), 101)) tenth,
  NTH(20, quantiles(((bytesTotal / 1024)/(fullyLoaded / 1000)), 101)) twentieth,
  NTH(30, quantiles(((bytesTotal / 1024)/(fullyLoaded / 1000)), 101)) thirtieth,
  NTH(40, quantiles(((bytesTotal / 1024)/(fullyLoaded / 1000)), 101)) fortieth,
  NTH(50, quantiles(((bytesTotal / 1024)/(fullyLoaded / 1000)), 101)) median,
  NTH(60, quantiles(((bytesTotal / 1024)/(fullyLoaded / 1000)), 101)) sixtieth,
  NTH(70, quantiles(((bytesTotal / 1024)/(fullyLoaded / 1000)), 101)) seventieth,
  NTH(80, quantiles(((bytesTotal / 1024)/(fullyLoaded / 1000)), 101)) eightieth,
  NTH(90, quantiles(((bytesTotal / 1024)/(fullyLoaded / 1000)), 101)) ninetieth,
  NTH(99, quantiles(((bytesTotal / 1024)/(fullyLoaded / 1000)), 101)) ninety_ninth,
  MAX(((bytesTotal / 1024)/(fullyLoaded / 1000))) max
FROM [httparchive:runs.latest_pages_mobile])

I always wondered what a good “page speed” metric could be. Since most pages have a payload measured in kilobytes and take a few seconds to load, KBps (kilobytes per second) could be a good one. The query above shows this “page speed” as KBps quantiles, along with min, max, avg and count, for both desktop and mobile pages.
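The per-page arithmetic in the query above can be sketched in a few lines of Python (field names follow the HTTP Archive schema; the sample values are made up for illustration):

```python
# Sketch of the per-page "speed" metric used in the query above:
# kilobytes transferred divided by seconds to fully loaded.

def page_speed_kbps(bytes_total, fully_loaded_ms):
    """Throughput in KB/s: (bytes / 1024) / (ms / 1000)."""
    return (bytes_total / 1024) / (fully_loaded_ms / 1000)

# e.g. a 1.5 MB page fully loaded in 6 seconds:
speed = page_speed_kbps(1_572_864, 6_000)
print(round(speed, 1))  # 256.0 KB/s
```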

Desktop and mobile aren’t comparable until 2013-06-15, as WebPageTest was running mobile unthrottled over WiFi on Fiber, as discussed in this tweet. Hopefully the next dump will be more accurate, with an emulated 3G network for mobile.


I’ve also run the query above over time, i.e. from HTTP Archive’s first dump to the latest one (2013-06-15), for desktop only, and got interesting data, as you can see in the chart below or in this shared spreadsheet.

What’s up with the sudden increase in 2013-04-01?

I believe that’s due to the switch to new connection profile:

Mar 19 2013 The default connection speed was increased from DSL (1.5
mbps) to Cable (5.0 mbps). This only affects IE (not iPhone).

So, what the graph shows is that most sites up to that point were limited by the DSL profile… And, I think even more importantly, it looks like it’s not the bandwidth but the RTTs (which we already knew, I guess :)). Prior to the switch, sites at the 90th percentile were maxing out at ~120 KBps (or ~1 Mbps, still below the 1.5 Mbps profile max). After the switch, we jump to ~250 KBps, almost double… The difference? A 28 ms RTT for Cable vs. 50 ms for DSL. Shorter round trips = faster ramp during slow start.
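The RTT effect is easy to see with an idealized slow-start model. This is a deliberately simplified sketch, with the congestion window doubling every RTT, no loss, no bandwidth cap, and an assumed initial window of 4 segments of 1460 bytes; none of these are WebPageTest settings:

```python
MSS = 1460  # bytes per segment (assumed, not a WPT setting)

def bytes_after(ms, rtt_ms, init_cwnd=4):
    """Bytes delivered after `ms` of pure slow start (no loss, no cap)."""
    total, cwnd, t = 0, init_cwnd, 0
    while t + rtt_ms <= ms:
        total += cwnd * MSS  # one window's worth per round trip
        cwnd *= 2            # cwnd doubles every RTT during slow start
        t += rtt_ms
    return total

# Same 500 ms window, DSL-like 50 ms RTT vs cable-like 28 ms RTT:
print(bytes_after(500, 50))  # fewer doublings fit in the window
print(bytes_after(500, 28))  # more doublings -> far more bytes
```

The absolute numbers are meaningless without a bandwidth cap, but the shape of the result is the point: with a shorter RTT, more doublings fit in the same wall-clock window, so the connection ramps up much faster.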


That’s interesting. I first started this query to get the median web page speed in KBps, and it’s been helpful so far for understanding and identifying HTTP Archive + WebPageTest connection setup issues/changes at certain points in time.
Regarding maxing out the current Cable profile, the tip of the tail has pages loading 370KB in 270ms, and 13MB in 11s, or ~1.3MBps (roughly double the Cable profile). I believe these might be repeat views counted as first views.

Not sure I follow this. Where are the 370KB and 13MB numbers coming from? (And we should be careful with bits vs bytes conversions here).

More generally, I’m still trying to grok whether this is a meaningful metric… For one, we’re not really measuring “speed” here, as “speed” depends on the amount of data downloaded, how the data is served (number of connections, etc.), client link properties, and so on. Rather, what the graph shows is the throughput of the page with respect to fixed connection properties.

So, for example: one could construct two pages with the same total amount of data but very different throughput, based on how and when those resources are downloaded (a trivial example: sequential vs. parallel downloads).

Turning the question on its head: which sites are unable to saturate the bandwidth link, and which sites are? E.g. excessive sharding, unnecessarily deferred downloads, etc… Of course, once again, we have to be careful with the data here, since we’re extrapolating these conclusions based on totalBytes / totalTime.


I looked at the tail with the following query in KB(ytes)ps and Mb(its)ps:

SELECT
  (bytesTotal / 1024) KiloBytes,
  (fullyLoaded / 1000) seconds,
  (bytesTotal / 1024)/(fullyLoaded / 1000) KBps,
  ((bytesTotal / 1024)/(fullyLoaded / 1000)) * 8 / 1024 Mbps
FROM [httparchive:runs.latest_pages]
ORDER BY KBps DESC
LIMIT 10

I intentionally omitted url in this query because some of them are porn websites.

The 370KB and 13MB figures come from the first two rows of that result.

If pages are loaded using the Cable profile, i.e. 5.0 Mb(its)ps, are these top 10 results (the tail) maxing out that bandwidth? Let me put it this way:

  • The second row is http://www.infinitydownline.com/, which has 59
    requests and 13.5MB of total data;
  • The big chunk is an autoplay video with a couple of minutes of
    playback, which alone accounts for 12MB of data;
  • My question is: how can 13.5MB of data, including a single 12MB
    resource, be downloaded in 11 seconds? That means 1300KB(ytes)ps, or
    10Mb(its)ps, double the 5Mb(its)ps Cable profile. With 5Mb(its)ps of
    bandwidth, this page should have taken around 20 seconds to load.
    Any chance the video got cached somehow and counted as a first
    (fresh) view by WPT?
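The back-of-the-envelope arithmetic above checks out; here it is worked in Python (the 13.5MB and 11s figures are approximate, taken from the row described above, and the conversions mix 1024- and 1000-based factors the same way the thread does):

```python
# 13.5 MB fully loaded in 11 s, expressed in KB/s and Mbit/s,
# compared against the 5 Mbit/s Cable profile.

bytes_total = 13.5 * 1024 * 1024  # ~13.5 MB (approximate figure)
fully_loaded_s = 11

kbps = (bytes_total / 1024) / fully_loaded_s  # kilobytes per second
mbits = kbps * 8 / 1024                       # megabits per second
print(round(kbps), round(mbits, 1))           # 1257 9.8

# Time to ship the same payload over a fully saturated 5 Mbit/s
# (decimal megabits) link -- close to the "around 20 seconds" estimate:
print(round(bytes_total * 8 / (5 * 1000 * 1000)))  # 23
```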

On the other tail, there are lots of 0 KB(ytes)ps entries, all coming from pages with 0 totalBytes and a totalTime of a few ms, which is clearly an error (server or WPT agent), unless those pages intentionally have no data (which would be weird for the top 300K sites in HTTP Archive).

I’m not convinced this is a meaningful metric either. It might not tell us web page “speed”, but it definitely tells us something about the HTTP Archive + WPT connection settings. Are these pages always tested from the same source (WPT agent location)? If so, there should be some consistency across tests, and totalBytes / totalTime might not be so useless.

Ah, the things you discover when you actually start looking at the data… :smile: I think you’re right: these are error cases, and possibly bugs in WPT.

For example, drilling into the infinitydownline.com case: http://httparchive.webpagetest.org/export.php?test=130618_0_3SG (select wpid column in pages)

If you open up the HAR, the FLV is actually being downloaded for 50+ seconds.

Which, of course, is counter to what “fully loaded” is supposed to report.

It would be nice if we had the error code logged in pages… that would help us track down and eliminate the bad apples. /cc @souders @pmeenan

@marcelduran a quick refactor of the original query… Can you sanity check it:

SELECT type,  
  NTH(10, quantiles(mbit_s, 101)) tenth,
  NTH(20, quantiles(mbit_s, 101)) twentieth,
  NTH(30, quantiles(mbit_s, 101)) thirtieth,
  NTH(40, quantiles(mbit_s, 101)) fortieth,
  NTH(50, quantiles(mbit_s, 101)) median,
  NTH(60, quantiles(mbit_s, 101)) sixtieth,
  NTH(70, quantiles(mbit_s, 101)) seventieth,
  NTH(80, quantiles(mbit_s, 101)) eightieth,
  NTH(90, quantiles(mbit_s, 101)) ninetieth,
  NTH(99, quantiles(mbit_s, 101)) ninety_ninth
FROM (
  SELECT type,
    ROUND(((kbytesTotal/fullyLoaded_s)*8/1000)*1000)/1000 mbit_s
  FROM (
    SELECT 'desktop' type,
      (bytesTotal/1024) kbytesTotal,
      (fullyLoaded/1000) fullyLoaded_s
    FROM [httparchive:runs.latest_pages]
  ) d, (
    SELECT 'mobile' type,
      (bytesTotal/1024) kbytesTotal,
      (fullyLoaded/1000) fullyLoaded_s
    FROM [httparchive:runs.latest_pages_mobile]
  ) m
)
GROUP BY type;
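One thing worth sanity-checking in the conversion: this query uses `*8/1000` (decimal megabits), while the earlier tail query used `*8/1024`; it would be good to agree on one convention. A Python mirror of this query’s inner expression, including its round-to-3-decimals step:

```python
# Python mirror of the refactored query's conversion, for sanity checking.
# Uses the *8/1000 (decimal megabit) divisor from this query, not the
# *8/1024 divisor from the earlier tail query.

def mbit_s(kbytes_total, fully_loaded_s):
    """KB/s -> Mbit/s, rounded to 3 decimals as in the query."""
    return round((kbytes_total / fully_loaded_s) * 8 / 1000 * 1000) / 1000

print(mbit_s(1536, 6))  # 1536 KB over 6 s -> 2.048
print(mbit_s(1000, 7))  # rounding kicks in  -> 1.143
```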


Histogram of the per-page average throughput (Mbit/s) for desktop:

SELECT mbits_bucket, COUNT(*) pages FROM (
  SELECT pageid,
    ROUND(((kbytesTotal/fullyLoaded_s)*8/1000)*10)/10 mbits_bucket
  FROM (
    SELECT pageid,
      (bytesTotal/1024) kbytesTotal,
      (fullyLoaded/1000) fullyLoaded_s
    FROM [httparchive:runs.latest_pages]
  )
)
GROUP BY mbits_bucket
ORDER BY mbits_bucket;
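The bucketing step of that histogram query (`ROUND(x*10)/10`, i.e. snapping each page’s Mbit/s figure to a 0.1-wide bucket) can be sketched in Python; the throughput values here are hypothetical:

```python
from collections import Counter

def bucket(mbit_s):
    # Snap to 0.1-wide buckets, like ROUND(x*10)/10 in the query.
    # (Python's round() uses banker's rounding at exact .5, unlike SQL
    # ROUND, but that doesn't matter for these sample values.)
    return round(mbit_s * 10) / 10

# Hypothetical per-page throughputs (Mbit/s):
speeds = [0.93, 0.97, 1.02, 1.51, 1.54, 4.98]
hist = Counter(bucket(s) for s in speeds)
for b in sorted(hist):
    print(b, hist[b])
# 0.9 1 / 1.0 2 / 1.5 2 / 5.0 1
```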

You should only look at tests that have a result of 0 (success) or 99999 (content error). That should help with the “0” cases. Looks like there is also something strange with the fully loaded time on that one site with an FLV that I’ll dig into. Opened an issue to track it: https://github.com/WPO-Foundation/webpagetest/issues/9

Without looking at the code, I stop storing response data after 1MB to avoid blowing out memory. It’s possible that the fully loaded time was getting truncated as a result.

@pmeenan looking at the pages schema, it doesn’t look like HTTP Archive currently logs the “run status” – unless I’m overlooking something obvious? It would be really nice if it did… /cc @souders

Could you add a bug to HTTP Archive for “run status”? I’m guessing we really want something else - like a fail/pass boolean, or maybe even better handling of error cases. It doesn’t seem obvious that (runstatus=0 OR runstatus=99999) are “good” results and everything else is bad.

The fix for the fully loaded time with really long-downloading resources was just pushed and I rolled out an update on the httparchive server as well. The 7/15 crawl should have better fully-loaded times for this case.

@pmeenan awesome, thanks! Any thoughts on reporting run status… or rather, what to report? It’s not clear to me why we would want to look at 99999 (content error)?

@souders I don’t know what all the WPT error codes are, but my vote would be to pass them through directly - this way we can also debug problems more easily.

99999 (content error) just means that something on the page errored (more often than not it’s a 404 for some random resource). It’s a little scary how broken the web is, but if you exclude 99999s you’ll be excluding a LOT of sites (certainly double-digit percentages).

btw, the list of result codes is here: https://sites.google.com/a/webpagetest.org/docs/using-webpagetest/result-codes