Analyzing traces

rviscomi · May 21, 2019, 5:02pm

HTTP Archive collects millions of traces each month. These contain diagnostic information about what Chrome is doing under the hood. See about:tracing for more info.

For example, for each test ID we have a corresponding trace file available for download at http://httparchive.webpagetest.org/getgzip.php?test={TEST_ID}&compressed=1&file=1_trace.json.gz. The test IDs correspond to the wptid field in the summary_pages tables.

To do bulk analysis of the traces, we’d need to map the rows in the latest summary_pages table to download URLs, download and unzip each trace, and pass it through a tool that can do the analysis.
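
As a rough illustration of the unzip-and-analyze part of that pipeline, here is a minimal Python sketch. It assumes each trace is either a bare JSON array of events or an object with a "traceEvents" array (the Chrome trace event format); the function names are hypothetical, and counting events by name is just a stand-in for whatever analysis you actually want to run.

```python
import gzip
import json
from collections import Counter


def load_trace_events(gz_bytes):
    """Decompress the bytes of a .json.gz trace and return its list of events."""
    data = json.loads(gzip.decompress(gz_bytes).decode("utf-8"))
    # Assumption: the trace is either a bare JSON array of events or an
    # object holding them under "traceEvents" (Chrome trace event format).
    if isinstance(data, dict):
        return data.get("traceEvents", [])
    return data


def count_event_names(events):
    """Tally events by their "name" field, skipping entries without one."""
    return Counter(
        e["name"] for e in events if isinstance(e, dict) and "name" in e
    )
```

The bytes passed to load_trace_events would come from whatever download step you use; keeping the parsing separate from the fetching makes it easy to test on a handful of local files before touching millions of traces.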

Here’s a query that can do the ID mapping:

SELECT
  url,
  CONCAT(
    'http://httparchive.webpagetest.org/getgzip.php?test=',
    wptid,
    '&compressed=1&file=1_trace.json.gz'
  ) AS trace_url
FROM
  `httparchive.summary_pages.2019_04_01_mobile`

Alternatively, @patmeenan, are these traces available somewhere that’s easier to transfer in bulk, like GCS or FTP? We probably don’t want 5 million download requests hitting the server.

cc @slightlylate, who has shown an interest in doing this kind of bulk analysis

patmeenan · May 21, 2019, 5:37pm

Yikes, please fetch them from GCS if possible, they are all in the “httparchive” bucket. That will prevent having to make round trips to the server.

They are stored with all of the traces for a given crawl in a directory for that crawl.

For example:

gs://httparchive/traces-android-May_1_2019/
gs://httparchive/traces-chrome-May_1_2019/

The files themselves are .json.gz:

gs://httparchive/traces-android-May_1_2019/190501_Mx10_204X.json.gz

If you prefer, you can fetch them over HTTPS:

https://storage.googleapis.com/httparchive/traces-android-May_1_2019/190501_Mx10_204X.json.gz
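
Going by the example paths above, building the two URL forms for a given trace could be sketched like this in Python. The trace_urls helper is hypothetical, and it assumes the file name is simply the test ID (the wptid) followed by .json.gz, which matches the example but isn't confirmed for every crawl.

```python
from urllib.parse import quote

# Bucket name taken from the example paths in this thread.
GCS_BUCKET = "httparchive"
GCS_HTTPS_BASE = "https://storage.googleapis.com/" + GCS_BUCKET


def trace_urls(crawl_dir, test_id):
    """Return (gs_url, https_url) for one trace in a crawl directory.

    Assumption: the object name is "<crawl_dir>/<test_id>.json.gz",
    as in the examples above.
    """
    name = f"{quote(crawl_dir)}/{quote(test_id)}.json.gz"
    return f"gs://{GCS_BUCKET}/{name}", f"{GCS_HTTPS_BASE}/{name}"


gs_url, https_url = trace_urls("traces-android-May_1_2019", "190501_Mx10_204X")
print(gs_url)
print(https_url)
```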

patmeenan · May 21, 2019, 5:38pm

btw, I’m not sure where the documentation for it is right now but there was a Chromium map/reduce tool for processing traces in bulk a few years ago that could process the trace files directly from the HTTP Archive cloud storage bucket.
