Analyzing traces

HTTP Archive collects millions of traces each month. These contain diagnostic information about what Chrome is doing under the hood. See about:tracing for more info.

For example, for each test ID we have a corresponding trace file available for download at http://httparchive.webpagetest.org/getgzip.php?test={TEST_ID}&compressed=1&file=1_trace.json.gz. The test IDs correspond to the wptid field in the summary_pages tables.

To do bulk analysis of the traces, we’d need to map the rows in the latest summary_pages table to download URLs, download and unzip each trace, and pass each one through a tool that can do the analysis.

Here’s a query that can do the ID mapping:

SELECT
  url,
  CONCAT(
    'http://httparchive.webpagetest.org/getgzip.php?test=',
    wptid,
    '&compressed=1&file=1_trace.json.gz'
  ) AS trace_url
FROM
  `httparchive.summary_pages.2019_04_01_mobile`

Alternatively, @patmeenan are these traces available somewhere that can be more easily transferred over the network, like GCS or FTP? Probably don’t want 5 million download requests hitting the server.

cc @slightlylate, who has shown an interest in doing this kind of bulk analysis

Yikes, please fetch them from GCS if possible, they are all in the “httparchive” bucket. That will prevent having to make round trips to the server.

All of the traces for a given crawl are stored together in a directory for that crawl.

For example:

gs://httparchive/traces-android-May_1_2019/
gs://httparchive/traces-chrome-May_1_2019/

The files themselves are .json.gz:

gs://httparchive/traces-android-May_1_2019/190501_Mx10_204X.json.gz

If you prefer, you can fetch them over HTTPS:

https://storage.googleapis.com/httparchive/traces-android-May_1_2019/190501_Mx10_204X.json.gz
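
In case it helps, here is a minimal Python sketch of fetching one trace over HTTPS and decoding it. It assumes the object name is the test ID plus `.json.gz`, as in the example above, and that a trace is either a JSON object with a `traceEvents` array or a bare JSON array of events. The helper names `gcs_trace_url`, `load_trace_events`, and `fetch_trace_events` are made up for illustration.

```python
import gzip
import json
import urllib.request

BUCKET_URL = "https://storage.googleapis.com/httparchive"


def gcs_trace_url(crawl_dir: str, test_id: str) -> str:
    # Assumption: objects are named "<test ID>.json.gz" inside the crawl directory.
    return f"{BUCKET_URL}/{crawl_dir}/{test_id}.json.gz"


def load_trace_events(gz_bytes: bytes) -> list:
    # Decompress the gzip payload and parse the JSON inside it.
    data = json.loads(gzip.decompress(gz_bytes))
    # Accept both layouts: {"traceEvents": [...]} or a bare list of events.
    if isinstance(data, dict):
        return data.get("traceEvents", [])
    return data


def fetch_trace_events(crawl_dir: str, test_id: str) -> list:
    # One HTTPS GET against the bucket; the whole trace is held in memory.
    with urllib.request.urlopen(gcs_trace_url(crawl_dir, test_id)) as resp:
        return load_trace_events(resp.read())
```

Calling `fetch_trace_events("traces-android-May_1_2019", "190501_Mx10_204X")` would then return the list of events for that test. Traces can be large, so for a full crawl it is probably better to copy the directory in bulk first and run `load_trace_events` on the local files.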

btw, I’m not sure where the documentation for it is right now but there was a Chromium map/reduce tool for processing traces in bulk a few years ago that could process the trace files directly from the HTTP Archive cloud storage bucket.
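
Until that documentation turns up, the map/reduce shape itself is simple enough to sketch in plain Python. This is not the Chromium tool, only an illustration of the idea: `map_trace` and `reduce_counts` are hypothetical names, and counting events by their `name` field is just an example metric.

```python
from collections import Counter
from functools import reduce


def map_trace(events):
    # Map step: summarize one trace as a count of events per name.
    # Entries that are not dicts or have no "name" field are skipped.
    return Counter(
        e["name"] for e in events if isinstance(e, dict) and "name" in e
    )


def reduce_counts(per_trace_counts):
    # Reduce step: merge the per-trace summaries into one total.
    return reduce(lambda a, b: a + b, per_trace_counts, Counter())
```

Given an iterable of parsed traces, `reduce_counts(map(map_trace, traces))` yields one combined counter. Because each trace is summarized independently, the map step can be spread across workers.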
