How to download the HTTP Archive data

igrigorik · February 25, 2016, 11:04pm

If you want to use your own or local tools to explore the HTTP Archive dataset, you’ll need a local copy of the relevant data. Before you rush ahead, note that the datasets can be very large (e.g. >1TB), so plan accordingly and read through the available options to determine the best one for your particular case.

Downloading the full HAR files

You can download the individual HAR files for each and every site crawled by HTTP Archive. Each HAR file contains the full log of the navigation, all of the associated metadata, and even the response bodies for text content-types (e.g. HTML, CSS, JavaScript).

Demo HAR file for google.com, as recorded by HTTP Archive on 01/01/2016:

https://gist.github.com/igrigorik/d76c301c2a1aed98a8a6

google.har.json

{
  "log": {
    "version": "1.1",
    "creator": {
      "name": "WebPagetest",
      "version": "2.18"
    },
    "pages": [{
      "startedDateTime": "2016-01-01T16:30:10.000+00:00",
      "title": "Run 1, First View for http://www.google.com/",

(The excerpt is truncated here; the full file is available at the gist URL above.)
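To get a feel for what a single HAR file holds, here is a minimal sketch that summarizes one parsed HAR document. It assumes the standard HAR layout (log.entries[].response.content with mimeType, size and an optional text body); the sample document and the summarize_har helper are made up for illustration and are not taken from HTTP Archive.

```python
import json
from collections import Counter

# Tiny hand-written HAR document, used only for illustration.
# Real HTTP Archive files are far larger and carry many more fields.
SAMPLE_HAR = """
{
  "log": {
    "version": "1.2",
    "pages": [{"id": "page_1", "title": "Example page"}],
    "entries": [
      {"request": {"url": "http://example.com/"},
       "response": {"status": 200,
                    "content": {"mimeType": "text/html; charset=UTF-8",
                                "size": 1200,
                                "text": "<html></html>"}}},
      {"request": {"url": "http://example.com/app.js"},
       "response": {"status": 200,
                    "content": {"mimeType": "application/javascript",
                                "size": 5400}}}
    ]
  }
}
"""


def summarize_har(har):
    """Count requests, content bytes, stored bodies, and requests per MIME type."""
    entries = har.get("log", {}).get("entries", [])
    by_type = Counter()
    content_bytes = 0
    bodies = 0
    for entry in entries:
        content = entry.get("response", {}).get("content", {})
        # Drop parameters such as "; charset=UTF-8" so types group together.
        mime = (content.get("mimeType") or "unknown").split(";")[0].strip()
        by_type[mime or "unknown"] += 1
        # Guard against negative placeholder sizes written by some tools.
        content_bytes += max(content.get("size", 0), 0)
        if "text" in content:
            bodies += 1
    return {
        "requests": len(entries),
        "content_bytes": content_bytes,
        "bodies": bodies,
        "by_type": dict(by_type),
    }


summary = summarize_har(json.loads(SAMPLE_HAR))
print(summary)
```

For a downloaded file, replace json.loads(SAMPLE_HAR) with json.load() on the opened file.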

To view the available HAR datasets for download, list the contents of the gs://httparchive bucket.

To download the full list of mobile (Chrome for Android) HAR files for a particular run:

$> gsutil -m rsync gs://httparchive/android-Jan_1_2016 .

Adjust the bucket name in the command above to match the dataset you would like to sync. Keep in mind that you might be downloading hundreds of thousands of individual HAR files to local disk, so plan accordingly!

Note: the denormalized HAR data is also available via BigQuery: httparchive:har dataset.
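Once a run has been synced to local disk, loading every file at once is rarely practical at this scale. The sketch below walks a directory and parses one HAR file at a time. The file extensions it looks for (.har, .json, .har.gz) are an assumption about how the files are named locally, so adjust them to whatever the sync actually produced; iter_hars and count_requests are hypothetical helper names.

```python
import gzip
import json
from pathlib import Path


def iter_hars(root):
    """Yield (path, parsed HAR) one file at a time so memory use stays bounded."""
    for path in Path(root).rglob("*"):
        # Assumed naming: plain JSON (.har / .json) or gzip-compressed (.har.gz).
        if path.name.endswith(".har.gz"):
            opener = gzip.open
        elif path.suffix in (".har", ".json"):
            opener = open
        else:
            continue
        try:
            with opener(path, "rt", encoding="utf-8") as fh:
                yield path, json.load(fh)
        except (OSError, EOFError, ValueError) as err:
            # Unreadable, truncated, or malformed files are reported and skipped.
            print(f"skipping {path}: {err}")


def count_requests(root):
    """Return (number of HAR files parsed, total number of request entries)."""
    files = 0
    requests = 0
    for _path, har in iter_hars(root):
        files += 1
        requests += len(har.get("log", {}).get("entries", []))
    return files, requests
```

Calling count_requests(".") in the directory used for the sync gives a quick sanity check that the download is complete and parseable before starting a longer analysis.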

Downloading the summary tables

HTTP Archive builds a set of summary tables from the above HAR dataset. These tables contain per-page aggregate statistics that power the “trends” and “stats” pages on the site. You can download them in both MySQL and CSV formats.

Note: the summary tables contain only a subset of the data in the HAR files, but still weigh in at ~5GB.
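At several gigabytes, the CSV dumps are best read as a stream rather than loaded whole. Here is a minimal sketch that finds the heaviest pages in one pass. The column names url and bytesTotal, and the presence of a header row, are assumptions for illustration; check them against the file you actually download (csv.DictReader accepts a fieldnames argument if there is no header).

```python
import csv
import heapq
import io


def top_pages_by_bytes(csv_file, n=3, url_col="url", bytes_col="bytesTotal"):
    """Stream a summary CSV and return the n largest (bytes, url) pairs."""

    def sizes(reader):
        for row in reader:
            try:
                yield int(row[bytes_col]), row[url_col]
            except (KeyError, TypeError, ValueError):
                # Skip rows with a missing or non-numeric size.
                continue

    # heapq.nlargest keeps only n items in memory while consuming the stream.
    return heapq.nlargest(n, sizes(csv.DictReader(csv_file)))


# Made-up sample rows standing in for a real summary dump.
SAMPLE_CSV = io.StringIO(
    "url,bytesTotal,reqTotal\n"
    "http://a.example/,1200000,40\n"
    "http://b.example/,350000,12\n"
    "http://c.example/,2900000,95\n"
    "http://d.example/,,7\n"
)

top = top_pages_by_bytes(SAMPLE_CSV, n=2)
print(top)
```

For a real dump, pass a file opened with open(path, newline="", encoding="utf-8") instead of the in-memory sample.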

Selective export via BigQuery

If you’re only interested in downloading a subset of the available data, consider using BigQuery to select and export the relevant subset. For example, you can export a subset of the available fields, or a subset of the sites (e.g. top 10K), and then use local tools to run your analysis.
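As a rough sketch of what such a selective export might look like from Python: the code below builds a query for a few fields and a row limit, then writes the result to a local CSV. The table name, the column names (url, bytesTotal, rank) and both helper functions are placeholders, not the actual HTTP Archive schema; run_export additionally assumes the third-party google-cloud-bigquery package is installed and credentials are configured.

```python
import csv
import re

_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")
_TABLE = re.compile(r"^[A-Za-z0-9_.:-]+$")


def build_query(table, fields, order_by=None, limit=None):
    """Build a SELECT over a subset of fields, optionally ordered and limited."""
    if not _TABLE.match(table):
        raise ValueError(f"unexpected table name: {table!r}")
    for name in list(fields) + ([order_by] if order_by else []):
        if not _IDENT.match(name):
            raise ValueError(f"unexpected column name: {name!r}")
    sql = f"SELECT {', '.join(fields)} FROM `{table}`"
    if order_by:
        sql += f" ORDER BY {order_by}"
    if limit is not None:
        sql += f" LIMIT {int(limit)}"
    return sql


def run_export(sql, out_path):
    """Run the query and write the rows to a local CSV file."""
    # Imported lazily so build_query works without the client library.
    from google.cloud import bigquery

    rows = bigquery.Client().query(sql).result()
    with open(out_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow([field.name for field in rows.schema])
        for row in rows:
            writer.writerow(list(row.values()))


# Placeholder table and columns; substitute the real ones before running.
query = build_query(
    "my-project.my_dataset.pages",
    ["url", "bytesTotal"],
    order_by="rank",
    limit=10000,
)
print(query)
```

Selecting only the needed columns and limiting the rows keeps both the query cost and the local file small, which is the main point of exporting through BigQuery instead of syncing the full dataset.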
