HTTP Archive collects millions of traces each month. These contain diagnostic information about what Chrome is doing under the hood. See about:tracing for more info.
For example, for each test ID we have a corresponding trace file available for download at http://httparchive.webpagetest.org/getgzip.php?test={TEST_ID}&compressed=1&file=1_trace.json.gz. The test IDs correspond to the wptid field in the summary_pages tables.
To do bulk analysis of the traces, we’d need to map the rows in the latest summary_pages table to download URLs, download and unzip each trace, and pass each one through a tool that can do the analysis.
Here’s a query that can do the ID mapping:
SELECT
  url,
  CONCAT(
    'http://httparchive.webpagetest.org/getgzip.php?test=',
    wptid,
    '&compressed=1&file=1_trace.json.gz'
  ) AS trace_url
FROM
  `httparchive.summary_pages.2019_04_01_mobile`
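For the download and unzip steps, here's a rough Python sketch of what handling a single trace might look like. It's only an illustration, not a tested pipeline: the function names are made up for this example, and it assumes the unzipped trace is either a JSON object with a traceEvents array or a bare JSON array of events, which is how Chrome traces are commonly laid out. Counting events by name is just a stand-in for whatever analysis tool you'd actually plug in.

```python
import gzip
import json
import urllib.request
from collections import Counter

# Same URL pattern as in the query above.
BASE = "http://httparchive.webpagetest.org/getgzip.php"


def trace_url(wptid):
    """Build the trace download URL for one test ID (the wptid field)."""
    return f"{BASE}?test={wptid}&compressed=1&file=1_trace.json.gz"


def parse_trace(body):
    """Turn a downloaded response body into a list of trace events.

    Assumption: the body is gzip data (checked via the gzip magic bytes);
    if it is not, it is treated as plain JSON.
    """
    if body[:2] == b"\x1f\x8b":
        body = gzip.decompress(body)
    data = json.loads(body)
    # Assumption: either {"traceEvents": [...]} or a bare list of events.
    if isinstance(data, dict):
        return data.get("traceEvents", [])
    return data


def fetch_trace(wptid, timeout=60):
    """Download and parse one trace. This makes one request per test ID."""
    with urllib.request.urlopen(trace_url(wptid), timeout=timeout) as resp:
        return parse_trace(resp.read())


def count_event_names(events):
    """Placeholder analysis: how often each event name appears."""
    return Counter(e.get("name") for e in events if isinstance(e, dict))
```

Something like count_event_names(fetch_trace(wptid)) would then run per row of the query result, with wptid taken from the table. It still costs one HTTP request per trace, so it's only reasonable for a small sample, not the full table.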
Alternatively, @patmeenan are these traces available somewhere that can be more easily transferred over the network, like GCS or FTP? Probably don’t want 5 million download requests hitting the server.
cc @slightlylate, who has shown an interest in doing this kind of bulk analysis