How many sites are supporting HTTP/2, and what is the trend? For those sites, how does page load performance compare if the page is loaded via HTTP/2 vs forced to downgrade to HTTP/1?
I’m sure it’s possible to answer this with HTTP Archive + BigQuery, though I don’t have a direct answer. Here’s a relevant project that comes to mind: http://isthewebhttp2yet.com/ (might be interesting to compare).
I would go with something like this:
```sql
SELECT reqHttpVersion, COUNT(reqHttpVersion) AS how_many
FROM
  [httparchive:runs.2016_05_15_requests_mobile],
  [httparchive:runs.2016_05_01_requests_mobile]
GROUP BY reqHttpVersion
ORDER BY how_many DESC
```
With Ilya’s great project (HTTP Archive + BigQuery) you can get answers to this interesting question.
Hmm, our HAR output doesn’t record the negotiated protocol of the associated socket, and it probably should! @patmeenan am I overlooking anything here?
Related: $.response.httpVersion seems to return some rather interesting garbage values in some cases… I’ll have to dig into that later.
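In the meantime, a quick way to eyeball those garbage values is to group the raw version strings (a sketch in legacy SQL; I’m assuming the runs tables carry a respHttpVersion column alongside reqHttpVersion):

```sql
-- List every distinct response HTTP version string and how often it
-- appears, so odd values stand out at the bottom of the list.
SELECT respHttpVersion, COUNT(*) AS cnt
FROM [httparchive:runs.2016_05_15_requests_mobile]
GROUP BY respHttpVersion
ORDER BY cnt DESC
```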
Doesn’t look like it is currently recorded. I’m adding stream IDs, weights, and dependencies right now, so I’ll make sure it gets included; it should be in the next crawl.
A few things to note though:
- This is on a request-by-request basis. There’s no page-level “this page was served over HTTP/2” because that doesn’t usually have much meaning given all of the third parties typically involved
- This will only tell you which sites served some content over HTTP/2, not how they would perform if you disabled HTTP/2
- You generally don’t want to get performance data from the HTTP/Archive
My recommendation would be to collect the list of “interesting” sites (sites that are on HTTP/2) from BigQuery once the fields are added and reliable, then run a separate performance-focused test where you test each site as-is and with HTTP/2 disabled.
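Once the protocol field lands in BigQuery, collecting that list of “interesting” sites could look roughly like this (a sketch: it assumes the later HAR-based requests tables and a _protocol field in the payload, and treats requests where url = page as the base document):

```sql
#standardSQL
-- Sketch: pages whose base document was served over HTTP/2, as candidates
-- for a separate as-is vs HTTP/2-disabled performance test.
SELECT DISTINCT page
FROM `httparchive.requests.2018_05_01_desktop`
WHERE url = page
  AND JSON_EXTRACT_SCALAR(payload, '$._protocol') = 'HTTP/2'
```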
I just pushed an update that will add several HTTP/2 fields to the HARs starting with the next crawl (on a per-request basis):
```
"_protocol": "HTTP/2",
"_http2_stream_id": "3",
"_http2_stream_dependency": "0",
"_http2_stream_weight": "256",
"_http2_stream_exclusive": "1"
```
Right now the only “protocol” reported is HTTP/2, but eventually I’ll roll in HTTP/1.1 and QUIC support.
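Once those fields show up, a quick sanity check might look like this (a sketch in legacy SQL against a later HAR table; note that legacy JSON_EXTRACT returns JSON-quoted strings, hence the quoted comparison value):

```sql
-- Distribution of HTTP/2 stream weights across all HTTP/2 requests.
SELECT
  JSON_EXTRACT(payload, '$._http2_stream_weight') AS weight,
  COUNT(*) AS requests
FROM [httparchive:har.2016_11_01_chrome_requests]
WHERE JSON_EXTRACT(payload, '$._protocol') = '"HTTP/2"'
GROUP BY weight
ORDER BY requests DESC
```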
Related to this topic, there’s an ongoing discussion about upgrading the spec (HAR 1.3?) and what information is lacking from its current iteration; the conversation started on the HAR Mailing List:
Following up on this discussion, I’ve created a GitHub repo, transferred the 1.2 spec over, and created a 1.3 proposal: https://github.com/ahmadnassri/har-spec
Of particular interest are Andrew’s suggestions on postData and timings improvements: https://github.com/ahmadnassri/har-spec/pulls
A GitHub repo in markdown format makes edits, suggestions, and community contribution a lot easier.
(I feel that GitHub is a great medium for allowing people to add suggestions and discussions, beyond a mailing list.) Happy to donate that repo to an official group/committee that can oversee collaborative work on the spec, but of course the browser vendors are key to this collaboration.
I’ve recently joined the Open API Initiative, which is shaping the future of the OpenAPI Spec (formerly known as Swagger), and we’re seeing some success in having an official group of committed owners from businesses directly interacting with the spec and leading its evolution.
perhaps we need something official around HAR as well to help steer the future of the spec and tooling around it.
what do you think is the appropriate channel to discuss? here, mailing list? github? elsewhere?
@ahmad big +1 to migrating the work to GitHub. Re: your repo, is @janodvarko open to revving the spec, etc.? Want to make sure we don’t end up with a “hard fork” by accident.
Just checking in… is there still not a way to use BigQuery to search on HTTP/2 adoption? I don’t see it in the schema of the most recent run but I could be missing something.
There aren’t going to be separate database fields in the schema; it will require querying against the full HAR dataset (I believe Ilya has a query example as a sticky).
I used this (example) to analyze HTTP/2
```sql
-- finds HTTP/2 hosts and request counts by host for each page
SELECT
  page,
  JSON_EXTRACT(payload, '$._host') AS host,
  JSON_EXTRACT(payload, '$._ip_addr') AS ipaddr,
  COUNT(*) AS urlcnt
FROM [httparchive:har.2016_11_01_chrome_requests]
WHERE JSON_EXTRACT(payload, '$._protocol') = '\"HTTP/2\"'
GROUP BY page, host, ipaddr
ORDER BY page, host, ipaddr, urlcnt DESC
```
Patrick, is there a design reason not to separate the contents of the requests into discrete fields? Right now BigQuery has to scan and parse the entire JSON payload to find fields such as _protocol. Perhaps the payload column could be augmented with additional columns?
I think these could be useful to break out.
- timings - JSON object as string
- several of the _ fields such as _protocol, _was_pushed, _host
The database schema would get out of control pretty quickly, because new fields get added regularly and there are a bunch of things reported in the JSON that aren’t even in the field list (breakdowns by mime type, script time on the main thread, etc.).
If something proves valuable for ongoing analysis then it’s worth looking at moving it into a dedicated field, but the JSON is there so ad-hoc analysis can be done first to prove the value.
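One way to run that workflow without re-scanning the full payload on every query is to extract the fields you care about into your own scratch table first (a sketch in standard SQL; the destination project and table name are made up):

```sql
#standardSQL
-- Hypothetical one-off extraction of a few popular HAR fields into a flat
-- table, so follow-up queries scan small columns instead of the JSON blob.
CREATE TABLE `myproject.scratch.h2_requests_2016_11_01` AS
SELECT
  page,
  url,
  JSON_EXTRACT_SCALAR(payload, '$._protocol')   AS protocol,
  JSON_EXTRACT_SCALAR(payload, '$._host')       AS host,
  JSON_EXTRACT_SCALAR(payload, '$._was_pushed') AS was_pushed
FROM `httparchive.har.2016_11_01_chrome_requests`
```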
I was not thinking of having every possible field broken out, just the ones that might be most popular for analysis.
However, who knows what we might consider popular.
I don’t suppose there is a usage-tracking log feature within BigQuery, so we could look at the various ways users are analyzing the data?
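BigQuery does keep per-project job history, but only for jobs run in your own project, so it wouldn’t show how other people query the httparchive dataset. For completeness, here is a sketch of what that self-audit looks like using the INFORMATION_SCHEMA.JOBS_BY_PROJECT view (requires appropriate project permissions):

```sql
#standardSQL
-- Most frequently run query texts in your own project over the last week.
SELECT query, COUNT(*) AS runs
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE job_type = 'QUERY'
  AND creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
GROUP BY query
ORDER BY runs DESC
LIMIT 20
```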
This question was originally asked in July 2016, and I don’t think we had a good answer to this pretty important question more than a year later. Does anyone have any updates or ideas?
Using the _protocol field in the HAR, here’s a query that computes the distribution of protocols over all requests:
```sql
#standardSQL
# This query processes 281 GB.
SELECT
  protocol,
  frequency,
  ROUND(frequency / SUM(frequency) OVER (), 4) AS pct
FROM (
  SELECT
    JSON_EXTRACT(payload, '$._protocol') AS protocol,
    COUNT(0) AS frequency
  FROM `httparchive.requests.2018_05_01_desktop`
  GROUP BY protocol
  ORDER BY frequency DESC)
```
I also modified the query to compute the distribution of protocols on a per-host basis:
```sql
#standardSQL
# This query processes 287 GB.
SELECT
  protocol,
  frequency,
  ROUND(frequency / SUM(frequency) OVER (), 4) AS pct
FROM (
  SELECT DISTINCT protocol, COUNT(0) AS frequency
  FROM (
    SELECT
      NET.HOST(url) AS host,
      JSON_EXTRACT(payload, '$._protocol') AS protocol
    FROM `httparchive.requests.2018_05_01_desktop`
    GROUP BY host, protocol)
  GROUP BY protocol)
ORDER BY frequency DESC
```
So 19% of hosts support HTTP/2 but those hosts are responsible for 38% of requests.
And finally, here’s a look at the most popular hosts grouped by protocol:
```sql
#standardSQL
# This query processes 287 GB.
SELECT
  NET.HOST(url) AS host,
  JSON_EXTRACT(payload, '$._protocol') AS protocol,
  COUNT(0) AS frequency
FROM `httparchive.requests.2018_05_01_desktop`
GROUP BY host, protocol
ORDER BY frequency DESC
LIMIT 100
```
I ran a timeseries of H2 adoption as a percent of all requests:
A few interesting observations:
- It didn’t get started until August 2016. Not sure of the significance of this date; maybe that’s just when the protocol field was exposed in WebPageTest (cc @patmeenan)
- Visually, the trend is slightly better than linear. This could be the start of exponential growth similar to HTTPS.
- Mobile seems to be outpacing desktop. Mobile does have fewer requests overall, but the gap is closing, not widening.
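For reference, a timeseries like that can be computed in one pass with a wildcard over the monthly tables (a sketch; it scans every crawl, so the bytes processed add up quickly):

```sql
#standardSQL
-- HTTP/2 share of all requests, per desktop crawl.
SELECT
  _TABLE_SUFFIX AS crawl,
  ROUND(COUNTIF(JSON_EXTRACT_SCALAR(payload, '$._protocol') = 'HTTP/2')
        / COUNT(0), 4) AS h2_pct
FROM `httparchive.requests.*`
WHERE _TABLE_SUFFIX LIKE '%_desktop'
GROUP BY crawl
ORDER BY crawl
```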
HTTP/2 couldn’t be tracked before the switch to Chrome. In general, mobile devices have more up-to-date browsers (none of the older Internet Explorer versions you still find on desktops, especially in corporate environments), but the difference is what I’d expect. Uptake is likely to increase as hosting becomes more unified and CDNs become standard. We’ve been running some tests to see if HTTP/2 brings much benefit to clients, and the big news is that at least it doesn’t slow things down. Developers will adopt it once they realise that it makes some of the optimisations obsolete.
I’d expect a larger discrepancy between mobile and desktop when it comes to IPv6, which is now standard on some (especially Asian) networks, but which I don’t think we cover. Well, it’s more of a client/infrastructure issue: serving to an IPv6 client shouldn’t really be a problem, but without the relevant DNS settings, it will be!