HTTP/2 Adoption

How many sites support HTTP/2, and what is the trend? For those sites, how does page load performance compare when the page is loaded over HTTP/2 versus forced to downgrade to HTTP/1?


I’m sure it’s possible to answer this using httparchive+bigquery, though I don’t have a direct answer. Here’s a relevant project that comes to mind: http://isthewebhttp2yet.com/ (might be interesting to compare)

I would go with something like this:

SELECT reqHttpVersion, COUNT(reqHttpVersion) AS how_many
FROM [httparchive:runs.2016_05_15_requests_mobile],
     [httparchive:runs.2016_05_01_requests_mobile]
GROUP BY reqHttpVersion
ORDER BY how_many DESC

With Ilya’s great project (httparchive+BQ) you can get the answers to this interesting question.

Hmm, our HAR output doesn’t record the negotiated protocol of the associated socket, and it probably should! @patmeenan am I overlooking anything here?

Related: $.response.httpVersion seems to return some rather interesting garbage values in some cases… I’ll have to dig into that later.

Doesn’t look like it is currently recorded. I’m in there adding stream ID, weights and dependencies right now, so I’ll make sure it gets included and it should be in the next crawl.

A few things to note though:

  • This is on a request-by-request basis. There’s no page-level “this page was served over HTTP/2” flag, because that doesn’t have much meaning with all of the third parties usually involved.
  • This will only tell you which sites served some content over HTTP/2, not how they would perform if you disabled HTTP/2.
  • You generally don’t want to get performance data from the HTTP Archive.

My recommendation would be to collect the list of “interesting” sites (sites that are on HTTP/2) from bigquery once the fields are added and reliable then run a separate performance-focused test where you test the site as-is and with HTTP/2 disabled.
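The first step of that recommendation could look roughly like the sketch below, written in Python rather than BigQuery. The record shape (page URL plus protocol label) and the sample values are hypothetical; only the `HTTP/2` label mirrors the protocol value discussed in this thread.

```python
from collections import defaultdict

def pages_with_http2(requests):
    """Map each page with at least one HTTP/2 request to its HTTP/2 request share."""
    totals = defaultdict(int)
    h2 = defaultdict(int)
    for page, protocol in requests:
        totals[page] += 1
        if protocol == "HTTP/2":
            h2[page] += 1
    # Keep only pages that served some content over HTTP/2.
    return {page: h2[page] / totals[page] for page in totals if h2[page] > 0}

# Hypothetical per-request records: (page URL, negotiated protocol).
sample = [
    ("https://a.example/", "HTTP/2"),
    ("https://a.example/", "http/1.1"),
    ("https://b.example/", "http/1.1"),
]
candidates = pages_with_http2(sample)
```

The resulting page list is only the input to the separate performance test; it says nothing about how those pages perform with HTTP/2 disabled.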


I just pushed an update that will add several HTTP/2 fields to the HARs starting with the next crawl (on a per-request basis):

            "_protocol": "HTTP/2",
            "_http2_stream_id": "3",
            "_http2_stream_dependency": "0",
            "_http2_stream_weight": "256",
            "_http2_stream_exclusive": "1",

Right now the only “protocol” reported is HTTP/2, but eventually I’ll roll in HTTP/1.1 and QUIC support.
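For anyone reading these fields out of a HAR file directly, here is a minimal Python sketch. It assumes the custom fields sit at the top level of each entry under the standard `log.entries` array and that the values are strings, as in the sample above. The function name and the sample HAR are made up for illustration.

```python
import json

# Hypothetical two-entry HAR: one HTTP/2 request, one without the new fields.
HAR_JSON = """
{"log": {"entries": [
  {"_protocol": "HTTP/2",
   "_http2_stream_id": "3",
   "_http2_stream_dependency": "0",
   "_http2_stream_weight": "256",
   "_http2_stream_exclusive": "1"},
  {"request": {"url": "http://example.com/"}}
]}}
"""

def http2_streams(har):
    """Return stream details for entries that report _protocol == 'HTTP/2'."""
    streams = []
    for entry in har["log"]["entries"]:
        if entry.get("_protocol") != "HTTP/2":
            continue  # field absent: protocol not reported for this request
        streams.append({
            "stream_id": int(entry["_http2_stream_id"]),
            "dependency": int(entry["_http2_stream_dependency"]),
            "weight": int(entry["_http2_stream_weight"]),
            "exclusive": entry["_http2_stream_exclusive"] == "1",
        })
    return streams

streams = http2_streams(json.loads(HAR_JSON))
```

Entries without `_protocol` are skipped rather than counted as HTTP/1.1, since an absent field only means the protocol was not reported.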


Hey guys,

Related to this topic, there’s an ongoing discussion about upgrading the spec (HAR 1.3?) and what information is lacking from its current iteration. The conversation started on the HAR mailing list:
https://groups.google.com/forum/#!topic/http-archive-specification/0Nvyx65q3oI

Following up on that discussion, I’ve created a GitHub repo, transferred the 1.2 spec over and created a 1.3 proposal: https://github.com/ahmadnassri/har-spec

Of particular interest are Andrew’s suggestions on postData and timings improvements: https://github.com/ahmadnassri/har-spec/pulls

A GitHub repo in Markdown format makes edits, suggestions and community contributions a lot easier.

I feel that GitHub is a great medium for suggestions and discussion, beyond a mailing list. I’m happy to donate the repo to an official group or committee that can oversee collaborative work on the spec, but of course the browser vendors are key to this collaboration.

I’ve recently joined the Open API Initiative in forming the future of the Open API Spec (formerly known as Swagger), and we’re seeing some success in having an official group of committed owners from businesses directly interacting with the spec and leading its evolution.

Perhaps we need something official around HAR as well, to help steer the future of the spec and the tooling around it.

What do you think is the appropriate channel to discuss this? Here, the mailing list, GitHub, elsewhere?

@ahmad big +1 to migrating the work to GitHub. Re your repo: is @janodvarko open to revving the spec, etc.? Want to make sure we don’t end up with a ‘hard fork’ by accident.

Just checking in… is there still not a way to use BigQuery to search on HTTP/2 adoption? I don’t see it in the schema of the most recent run but I could be missing something.

There aren’t going to be separate database fields in the schema. It will require querying against the full HAR dataset (I believe Ilya has a query example as a sticky)

I used this (example) to analyze HTTP/2

-- finds HTTP/2 hosts and request counts by host for each page

SELECT page,  JSON_EXTRACT(payload, '$._host') as host,
JSON_EXTRACT(payload, '$._ip_addr') as ipaddr, count(*) as urlcnt
FROM [httparchive:har.2016_11_01_chrome_requests]
WHERE
JSON_EXTRACT(payload, '$._protocol') = '\"HTTP/2\"'
group by page,   host, ipaddr
order by page,   host, ipaddr, urlcnt desc;
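A note on the quoted literal in that filter: as far as I know, JSON_EXTRACT returns the matched value as JSON text, so a string value comes back with its double quotes included. The Python sketch below illustrates the same distinction with the standard `json` module; the payload is a made-up one-request example, not real crawl data.

```python
import json

# Hypothetical request payload, shaped like one row of the requests table.
payload = '{"_protocol": "HTTP/2", "_host": "www.example.com"}'

# The decoded value is the bare string.
value = json.loads(payload)["_protocol"]

# Re-serialised as JSON text, the same value carries its double quotes,
# which is the form the WHERE clause above compares against.
as_json_text = json.dumps(value)
```

Here `value` is 6 characters long and `as_json_text` is 8, the extra two being the quotes.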

Patrick, is there a design reason not to separate out the contents of the requests into discrete fields? Right now BigQuery has to scan and parse the entire JSON response to find fields such as _protocol. Perhaps the payload column could be augmented with additional columns?

I think these could be useful to break out.

  • request.method
  • request.url
  • request.headersize
  • request.headers.user-agent
  • response.headersize
  • response.bodysize
  • response.headers.cache-control
  • response.headers.content-type
  • response.content.mime-type
  • timings - JSON object as string
  • several of the _ fields such as _protocol, _was_pushed, _host

The database schema would get out of control pretty quickly because new fields get added very regularly and there are a bunch of things reported in the JSON that aren’t even in the field list (breakdowns by mime type, script time on the main thread, etc).

If something is proven to be valuable for ongoing analysis then it’s worth looking at moving into a dedicated field but the JSON is there so ad-hoc analysis can be done first to prove the value.

Thanks Patrick.

I was not thinking of having every possible field broken out - just the ones that might be most popular for analysis.

However - who knows what we might consider as popular :)

I don’t suppose there is a usage tracking log feature within BigQuery, so we could look at the various ways users are analyzing the data?

Happy Thanksgiving!

This question was originally asked in July 2016. More than a year later, I don’t think we have a good answer to this pretty important question. Does anyone have any updates or ideas?

Using the _protocol field in the HAR, here’s a query that computes the distribution of protocols over all requests:

#standardSQL
# This query processes 281 GB.
SELECT
  protocol,
  frequency,
  ROUND(frequency / SUM(frequency) OVER (), 4) AS pct
FROM (
  SELECT
    JSON_EXTRACT(payload, '$._protocol') AS protocol,
    COUNT(0) AS frequency
  FROM
    `httparchive.requests.2018_05_01_desktop`
  GROUP BY
    protocol)
ORDER BY
  frequency DESC

protocol    frequency  pct
"http/1.1"  30486682   0.6135
"HTTP/2"    18900424   0.3803
(empty)     252205     0.0051
"http/1.0"  53113      0.0011
"http/0.9"  92         0

I also modified the query to compute the distribution of protocols on a per-host basis:

#standardSQL
# This query processes 287 GB.
SELECT
  protocol,
  frequency,
  ROUND(frequency / SUM(frequency) OVER (), 4) AS pct
FROM (
  SELECT
    protocol,
    COUNT(0) AS frequency
  FROM (
    SELECT
      NET.HOST(url) AS host,
      JSON_EXTRACT(payload, '$._protocol') AS protocol
    FROM
      `httparchive.requests.2018_05_01_desktop`
    GROUP BY
      host,
      protocol)
  GROUP BY
    protocol)
ORDER BY
  frequency DESC

protocol    frequency  pct
"http/1.1"  832201     0.7686
"HTTP/2"    208280     0.1924
(empty)     32645      0.0301
"http/1.0"  9613       0.0089
"http/0.9"  36         0

So 19% of hosts support HTTP/2 but those hosts are responsible for 38% of requests.
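As a quick sanity check on those two figures, the shares can be recomputed from the counts in the result tables. This is just arithmetic in Python on the numbers posted above; the `(empty)` key is my label for the row with no protocol value.

```python
def shares(counts):
    """Return each count as a fraction of the total, rounded to 4 places."""
    total = sum(counts.values())
    return {key: round(value / total, 4) for key, value in counts.items()}

# Request counts by protocol, copied from the first result table.
requests_by_protocol = {
    "http/1.1": 30486682, "HTTP/2": 18900424, "(empty)": 252205,
    "http/1.0": 53113, "http/0.9": 92,
}
# Host counts by protocol, copied from the second result table.
hosts_by_protocol = {
    "http/1.1": 832201, "HTTP/2": 208280, "(empty)": 32645,
    "http/1.0": 9613, "http/0.9": 36,
}

request_share = shares(requests_by_protocol)
host_share = shares(hosts_by_protocol)
```

One caveat: the per-host query groups by host and protocol, so a host that served requests over both HTTP/2 and HTTP/1.1 is counted once under each protocol.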

And finally, here’s a look at the most popular hosts grouped by protocol:

#standardSQL
# This query processes 287 GB.
SELECT
  NET.HOST(url) AS host,
  JSON_EXTRACT(payload, '$._protocol') AS protocol,
  COUNT(0) AS frequency
FROM
  `httparchive.requests.2018_05_01_desktop`
GROUP BY
  host,
  protocol
ORDER BY
  frequency DESC
LIMIT
  100

host protocol frequency
www.facebook.com “HTTP/2” 1181019
www.google-analytics.com “HTTP/2” 817913
googleads.g.doubleclick.net “HTTP/2” 629224
fonts.gstatic.com “HTTP/2” 573159
www.google.com “HTTP/2” 452129
pagead2.googlesyndication.com “HTTP/2” 428947
tpc.googlesyndication.com “HTTP/2” 407214
www.youtube.com “HTTP/2” 329903
pbs.twimg.com “HTTP/2” 304498
connect.facebook.net “HTTP/2” 295173
securepubads.g.doubleclick.net “HTTP/2” 269169
fonts.googleapis.com “HTTP/2” 262295
cm.g.doubleclick.net “HTTP/2” 246085
apis.google.com “HTTP/2” 240016
stats.g.doubleclick.net “HTTP/2” 238326
cdn.shopify.com “HTTP/2” 209773
fonts.gstatic.com “http/1.1” 196236
ib.adnxs.com “http/1.1” 180842
mc.yandex.ru “http/1.1” 180436
us-u.openx.net “http/1.1” 165821
platform.twitter.com “HTTP/2” 156830
adservice.google.com “HTTP/2” 143828
x.bidswitch.net “http/1.1” 135007
pagead2.googlesyndication.com “http/1.1” 134647
maps.googleapis.com “HTTP/2” 131151
tags.bluekai.com “http/1.1” 123320
match.adsrvr.org “HTTP/2” 116983
syndication.twitter.com “HTTP/2” 116906
www.googletagmanager.com “HTTP/2” 114269
vk.com “HTTP/2” 110783
pixel.rubiconproject.com “http/1.1” 110088
cdnjs.cloudflare.com “HTTP/2” 109794
ps.eyeota.net “http/1.1” 108815
fonts.googleapis.com “http/1.1” 98488
s0.2mdn.net “HTTP/2” 96512
ajax.googleapis.com “HTTP/2” 91790
d.adroll.com “http/1.1” 91400
scontent.fsjc1-3.fna.fbcdn.net “HTTP/2” 89117
image2.pubmatic.com “http/1.1” 86412
sync.mathtag.com “http/1.1” 85389
www.gstatic.com “HTTP/2” 83826
use.typekit.net “HTTP/2” 83492
staticxx.facebook.com “HTTP/2” 82380
simage2.pubmatic.com “http/1.1” 82151
dpm.demdex.net “http/1.1” 82106
maxcdn.bootstrapcdn.com “http/1.1” 81956
pixel.mathtag.com “http/1.1” 79862
i.ytimg.com “HTTP/2” 79474
cm.g.doubleclick.net “http/1.1” 76038
s3.amazonaws.com “http/1.1” 72547
www.googleadservices.com “HTTP/2” 71935
scontent.cdninstagram.com “HTTP/2” 69554
i0.wp.com “HTTP/2” 67180
trk.vidible.tv “http/1.1” 66963
secure.adnxs.com “http/1.1” 66737
i1.wp.com “HTTP/2” 63931
i2.wp.com “HTTP/2” 63836
sb.scorecardresearch.com “http/1.1” 63234
ssl.google-analytics.com “HTTP/2” 61806
idsync.rlcdn.com “http/1.1” 58777
pp.userapi.com “HTTP/2” 58277
sync-tm.everesttech.net “HTTP/2” 58220
p.adsymptotic.com “http/1.1” 56253
scontent-sjc3-1.xx.fbcdn.net “HTTP/2” 53716
ap.lijit.com “http/1.1” 53216
ce.lijit.com “http/1.1” 52880
beacon.krxd.net “http/1.1” 52652
www.google.com “http/1.1” 52412
match.adsrvr.org “http/1.1” 52381
sync.search.spotxchange.com “http/1.1” 52097
ads.adaptv.advertising.com “http/1.1” 50807
eb2.3lift.com “http/1.1” 50687
px.moatads.com “http/1.1” 50454
counter.yadro.ru “http/1.1” 49756
video.fsjc1-3.fna.fbcdn.net “HTTP/2” 49029
stags.bluekai.com “http/1.1” 48930
an.yandex.ru “http/1.1” 48683
cdn.bannerflow.com “HTTP/2” 47688
pixel.advertising.com “HTTP/2” 46041
pixel.tapad.com “HTTP/2” 45691
bcp.crwdcntrl.net “http/1.1” 45053
platform.twitter.com “http/1.1” 43949
hm.baidu.com “http/1.1” 42990
ml314.com “http/1.1” 42890
p.rfihub.com “http/1.1” 42727
ocsp.digicert.com 40916
4.bp.blogspot.com “HTTP/2” 40046
idsync.rlcdn.com “HTTP/2” 40000
staticxx.facebook.com “http/1.1” 39980
1.bp.blogspot.com “HTTP/2” 39634
ajax.googleapis.com “http/1.1” 39488
2.bp.blogspot.com “HTTP/2” 39429
3.bp.blogspot.com “HTTP/2” 39308
bh.contextweb.com “http/1.1” 39277
abs.twimg.com “HTTP/2” 39074
b.scorecardresearch.com “http/1.1” 38727
s7.addthis.com “http/1.1” 38664
aa.agkn.com “http/1.1” 37002
bat.bing.com “HTTP/2” 36864
rtb-csync.smartadserver.com “http/1.1” 36658

It’s interesting to see some hosts serving over both HTTP/2 and HTTP/1.1 (e.g. fonts.gstatic.com, platform.twitter.com) and some hosts that are predominantly HTTP/1.1 (e.g. maxcdn.bootstrapcdn.com).
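To pick out the mixed-protocol hosts programmatically, something like this Python sketch would do. It uses a handful of rows copied from the table above, and the function name is made up. Since the table is cut off at 100 rows, a host that looks single-protocol here may still have traffic on the other protocol below the cutoff.

```python
from collections import defaultdict

# (host, protocol, frequency) rows copied from the result table above.
rows = [
    ("fonts.gstatic.com", "HTTP/2", 573159),
    ("fonts.gstatic.com", "http/1.1", 196236),
    ("platform.twitter.com", "HTTP/2", 156830),
    ("platform.twitter.com", "http/1.1", 43949),
    ("maxcdn.bootstrapcdn.com", "http/1.1", 81956),
]

def http2_share_by_host(rows):
    """Return the fraction of each host's listed requests served over HTTP/2."""
    per_host = defaultdict(dict)
    for host, protocol, frequency in rows:
        per_host[host][protocol] = frequency
    return {
        host: round(counts.get("HTTP/2", 0) / sum(counts.values()), 4)
        for host, counts in per_host.items()
    }

share = http2_share_by_host(rows)
# Hosts with a share strictly between 0 and 1 served over both protocols.
mixed = sorted(host for host, s in share.items() if 0 < s < 1)
```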


I had heard of HTTP and HTTPS but not HTTP/2, so it’s new to me.

I ran a timeseries of H2 adoption as a percent of all requests:

[Chart: HTTP/2 adoption as a percent of all requests over time]

Raw data
Query (Warning: 19.6 TB!!)

A few interesting observations:

  • It didn’t get started until August 2016. Not sure of the significance of this date. Maybe it was just when the protocol field was exposed in WebPageTest (cc @patmeenan)
  • Visually, the trend is slightly better than linear. This could be the start of exponential growth similar to HTTPS.
  • Mobile seems to be outpacing desktop. Mobile does have fewer requests overall, but the gap is closing, not widening.

HTTP/2 couldn’t be tracked before the switch to Chrome. In general, mobile devices have more up-to-date browsers, i.e. no Internet Explorer <= 1 that you still find on desktops, especially in corporate environments. But the difference is what I’d expect. Take-up is likely to increase as hosting becomes more unified and CDNs become standard. We’ve been running some tests to see if HTTP/2 brings much benefit to clients, and the big news is that at least it doesn’t slow things down. Developers will adopt it once they realise that it makes some of the optimisations obsolete.

I’d expect a larger discrepancy between mobile and desktop when it comes to IPv6, which is now standard on some networks, especially Asian ones, but which I don’t think we cover. Well, it’s more of a client/infrastructure issue: serving to an IPv6 client shouldn’t really still be a problem, but without the relevant DNS settings it will be! ;)


The latest stats show continued HTTP/2 adoption. It was 40% when we last checked in June 2018.

#standardSQL
# WARNING: 4.04 TB query!
SELECT
  _TABLE_SUFFIX AS client,
  ROUND(SUM(IF(JSON_EXTRACT_SCALAR(payload, '$._protocol') = 'HTTP/2', 1, 0)) * 100 / COUNT(0), 2) AS percent
FROM
  `httparchive.requests.2019_02_01_*`
GROUP BY
  client

client   percent
mobile   49.84
desktop  49.72

Hello Team. I may be wrong but I’m not sure the queries above work anymore. What’s the recommended way to query for HTTP/2 adoption these days?