Proposal: add a connection_id to requests

Could we add a connection_id to the requests table, and start populating that field by tracking which requests went over the same connection? The connection_id would only need to be unique per page. This would help in understanding the full distribution of requests per connection; as I understand it, currently only the average is available, as reqTotal/_connections.

FWIW, I am pretty sure the connection ID is in the raw HAR data and can be queried in BigQuery (there is already one reported by WPT that is unique per page and used for the “connection view” waterfalls).

Yep, as Pat said, we already track this data in the HAR dumps. A quick example:

SELECT socket, COUNT(socket) AS num_requests
FROM (
  SELECT
    JSON_EXTRACT(payload, '$._socket') AS socket,
    url, page
  FROM [har.android_feb_15_2016_requests]
  WHERE page = 'http://www.amazon.com/'
)
GROUP BY socket
ORDER BY num_requests DESC

Results:

Amazon.com: 49 sockets in total, sorted by number of requests per socket.
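
For anyone working from a downloaded HAR file instead of BigQuery, the same per-connection grouping could be sketched in Python roughly as follows. This is only an illustration under some assumptions: that each entry carries the WPT-style `_socket` field used in the query above (falling back to the optional `connection` field from HAR 1.2), and the `sample_har` data and both function names are made up for the example.

```python
from collections import Counter


def requests_per_connection(har):
    """Count how many requests went over each connection in one page's HAR."""
    counts = Counter()
    for entry in har["log"]["entries"]:
        # Assumption: WebPageTest HARs expose the connection as `_socket`;
        # the optional HAR 1.2 `connection` field is used as a fallback.
        conn = entry.get("_socket", entry.get("connection"))
        if conn is not None:
            counts[str(conn)] += 1
    return counts


def distribution(counts):
    """Map 'requests on a connection' -> 'number of connections with that count'."""
    return dict(sorted(Counter(counts.values()).items()))


# Hypothetical sample data, not taken from a real crawl.
sample_har = {
    "log": {
        "entries": [
            {"request": {"url": "http://example.com/"}, "_socket": 1},
            {"request": {"url": "http://example.com/a.css"}, "_socket": 1},
            {"request": {"url": "http://example.com/b.js"}, "_socket": 1},
            {"request": {"url": "http://cdn.example.com/c.png"}, "_socket": 2},
            {"request": {"url": "http://ads.example.com/d.gif"}, "_socket": 3},
        ]
    }
}

per_conn = requests_per_connection(sample_har)
dist = distribution(per_conn)
average = sum(per_conn.values()) / len(per_conn)
```

In this made-up sample the average is 5/3 requests per connection, while the distribution shows two connections carrying a single request and one carrying three, which is the kind of detail the average alone hides.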

Great! I was hoping it could be added to the MySQL dumps, so it’s good to know it’s already available in the HAR dumps.

Is this something that makes more sense to propose on the GitHub page? Or is the code that produces the MySQL dumps hosted somewhere else? Thanks!

Yes, please open a GitHub issue here: https://github.com/HTTPArchive/httparchive/issues

Done!
https://github.com/HTTPArchive/httparchive/issues/61