How many sites support HTTP/2, and what is the trend? For those sites, how does page load performance compare when the page is loaded over HTTP/2 versus forced to downgrade to HTTP/1.1?
I’m sure it’s possible to answer this with HTTP Archive + BigQuery, though I don’t have a direct answer. Here’s a relevant project that comes to mind: http://isthewebhttp2yet.com/ (it might be interesting to compare against).
I would go with something like this:
```sql
SELECT reqHttpVersion, COUNT(reqHttpVersion) AS how_many
FROM [httparchive:runs.2016_05_15_requests_mobile],
     [httparchive:runs.2016_05_01_requests_mobile]
GROUP BY reqHttpVersion
ORDER BY how_many DESC
```
With Ilya’s great project (HTTP Archive + BigQuery) you can get answers to this interesting question.
Hmm, our HAR output doesn’t record the negotiated protocol of the associated socket, and it probably should! @patmeenan am I overlooking anything here?
Related: $.response.httpVersion seems to return some rather interesting garbage values in some cases… I’ll have to dig into that later.
Doesn’t look like it is currently recorded. I’m in the middle of adding stream ID, weights, and dependencies right now, so I’ll make sure it gets included; it should be in the next crawl.
A few things to note though:
- This is on a request-by-request basis. There’s no page-level “this page was served over HTTP/2” because that doesn’t have much meaning given all of the third parties usually involved
- This will only tell you which sites served some content over HTTP/2, not how they would perform if you disabled HTTP/2
- You generally don’t want to get performance data from the HTTP Archive
My recommendation would be to collect the list of “interesting” sites (sites that are on HTTP/2) from BigQuery once the fields are added and reliable, then run a separate performance-focused test where you test each site as-is and again with HTTP/2 disabled.
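The first phase of that recommendation could be sketched like this, once a crawl with a per-request protocol field is available (the table name and the `_protocol` field are assumptions about a future crawl):

```sql
-- sketch: pages that served at least one request over HTTP/2
-- (assumes a per-request _protocol field in the HAR payload of a future crawl)
SELECT page, COUNT(*) AS h2_requests
FROM [httparchive:har.2016_11_01_chrome_requests]
WHERE JSON_EXTRACT(payload, '$._protocol') = '"HTTP/2"'
GROUP BY page
ORDER BY h2_requests DESC
```

The resulting page list could then feed the second, performance-focused phase, e.g. a WebPageTest run of each site as-is and another with HTTP/2 disabled in the browser (Chrome has a `--disable-http2` command-line switch for this).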
I just pushed an update that will add several HTTP/2 fields to the HARs starting with the next crawl (on a per-request basis):
```
"_protocol": "HTTP/2",
"_http2_stream_id": "3",
"_http2_stream_dependency": "0",
"_http2_stream_weight": "256",
"_http2_stream_exclusive": "1"
```
Right now the only “protocol” reported is HTTP/2, but eventually I’ll roll in HTTP/1.1 and QUIC support.
related to this topic, there’s an ongoing discussion about upgrading the spec (HAR 1.3?) and what information is lacking from its current iteration; the conversation started on the HAR mailing list:
following up on this discussion, I’ve created a GitHub repo, transferred the 1.2 spec over, and created a 1.3 proposal: https://github.com/ahmadnassri/har-spec
particularly of interest are Andrew’s suggestions on postData and timings improvements: https://github.com/ahmadnassri/har-spec/pulls
a GitHub repo in Markdown format makes edits, suggestions, and community contributions a lot easier.
(I feel that GitHub is a great medium for allowing people to add suggestions and hold discussions, beyond a mailing list.) happy to donate that repo to an official group/committee that can oversee collaborative work on the spec, but of course the browser vendors are key to this collaboration.
I’ve recently joined the Open API Initiative, which is shaping the future of the OpenAPI Specification (formerly known as Swagger), and we’re seeing some success with an official group of committed owners from businesses directly interacting with the spec and leading its evolution.
perhaps we need something official around HAR as well to help steer the future of the spec and tooling around it.
what do you think is the appropriate channel to discuss this? here, the mailing list, GitHub, elsewhere?
@ahmad big +1 to migrating the work to GitHub. Re your repo: is @janodvarko open to revving the spec, etc.? I want to make sure we don’t end up with a “hard fork” by accident.
Just checking in… is there still not a way to use BigQuery to search on HTTP/2 adoption? I don’t see it in the schema of the most recent run but I could be missing something.
There aren’t going to be separate database fields in the schema; it will require querying against the full HAR dataset (I believe Ilya has a query example pinned as a sticky).
I used this query, for example, to analyze HTTP/2:
```sql
-- finds HTTP/2 hosts and request counts by host for each page
SELECT page,
       JSON_EXTRACT(payload, '$._host') AS host,
       JSON_EXTRACT(payload, '$._ip_addr') AS ipaddr,
       COUNT(*) AS urlcnt
FROM [httparchive:har.2016_11_01_chrome_requests]
WHERE JSON_EXTRACT(payload, '$._protocol') = '\"HTTP/2\"'
GROUP BY page, host, ipaddr
ORDER BY page, host, ipaddr, urlcnt DESC
```
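The trend part of the original question could be sketched with the same pattern, repeated once per monthly crawl; this is only a sketch, and table names other than 2016_11_01 are assumptions about other crawls that carry the `_protocol` field:

```sql
-- sketch: share of all requests negotiated over HTTP/2, one row per crawl;
-- repeat the inner SELECT for each monthly table of interest
SELECT crawl, h2_requests / total AS h2_share
FROM (
  SELECT '2016_11_01' AS crawl,
         SUM(IF(JSON_EXTRACT(payload, '$._protocol') = '"HTTP/2"', 1, 0)) AS h2_requests,
         COUNT(*) AS total
  FROM [httparchive:har.2016_11_01_chrome_requests]
), (
  SELECT '2016_12_01' AS crawl,
         SUM(IF(JSON_EXTRACT(payload, '$._protocol') = '"HTTP/2"', 1, 0)) AS h2_requests,
         COUNT(*) AS total
  FROM [httparchive:har.2016_12_01_chrome_requests]
)
ORDER BY crawl
```

In legacy BigQuery SQL, comma-separated subqueries in the FROM clause behave as a UNION ALL, which is what stacks the per-crawl rows here.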
Patrick, is there a design reason not to separate out the contents of the requests into discrete fields? Right now BigQuery has to parse and scan the entire JSON response to find fields such as _protocol. Perhaps the payload column could be augmented with additional columns?
I think these could be useful to break out.
- timings - JSON object as string
- several of the _ fields such as _protocol, _was_pushed, _host
The database schema would get out of control pretty quickly because new fields get added very regularly and there are a bunch of things reported in the JSON that aren’t even in the field list (breakdowns by mime type, script time on the main thread, etc).
If something is proven to be valuable for ongoing analysis, then it’s worth looking at moving it into a dedicated field, but the JSON is there so ad-hoc analysis can be done first to prove the value.
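As an illustration of that ad-hoc workflow, a quick protocol breakdown can be pulled straight out of the raw JSON without any schema changes (a sketch against the same 2016_11_01 table used earlier in the thread):

```sql
-- ad-hoc sketch: request counts by negotiated protocol, parsed from the HAR JSON
SELECT JSON_EXTRACT(payload, '$._protocol') AS protocol,
       COUNT(*) AS requests
FROM [httparchive:har.2016_11_01_chrome_requests]
GROUP BY protocol
ORDER BY requests DESC
```

If a query like this ends up running constantly, that is the signal that `_protocol` has earned a dedicated column.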
I was not thinking of having every possible field broken out - just the ones that might be most popular for analysis.
However, who knows what we might consider popular.
I don’t suppose there is a usage-tracking log feature within BigQuery that would let us look at the various ways users are analyzing the data?
This question was originally asked in July 2016, and more than a year later I don’t think we have a good answer to this pretty important question. Does anyone have any updates or ideas?