What are the effects of the COVID-19 pandemic on web usage and experience?

For those just getting started with these datasets, here’s a quick introduction.

Read @paulcalvano’s excellent Getting Started with BigQuery guide to get your environment set up. Paul has also completed a few sections of his Guided Tour, which goes into detail on each dataset (still a work in progress).

HTTP Archive

HTTP Archive is a monthly dataset of how the web is built, containing metadata from ~5 million web pages. This is considered “lab data” in that the results come from a single test of a page load, but those results contain a wealth of information about the page.

The March 2020 data is available in tables named “2020_03_01_desktop” and “2020_03_01_mobile”. For example, the 2020_03_01_mobile table in the summary_pages dataset contains summary data about 5,484,239 mobile pages.

Note that the table name corresponds with March 1, 2020 but the tests took place throughout the month of March.
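As a minimal sketch (assuming your BigQuery environment is set up per the guide above), here’s a query that counts the mobile pages in that table:

```sql
-- Count the mobile pages tested in the March 2020 crawl.
SELECT
  COUNT(0) AS pages
FROM
  `httparchive.summary_pages.2020_03_01_mobile`
```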

Other datasets exist with different kinds of information about each page (an example query against one of them follows the table):

| Dataset | Description |
|---|---|
| pages | JSON data about each page including loading metrics, CPU stats, optimization info, and more |
| requests | JSON data about each request including request/response headers, networking info, payload size, MIME type, etc. |
| response_bodies | (very expensive) Full payloads for text-based resources like HTML, JS, and CSS |
| summary_pages | A subset of high-level stats per page |
| summary_requests | A subset of high-level stats per request |
| technologies | A list of which technologies are used per page, as detected by Wappalyzer |
| blink_features | A list of which JavaScript, HTML, or CSS APIs are used per page |
| lighthouse | (mobile only) Full JSON Lighthouse report containing audit results in areas of accessibility, performance, mobile friendliness, SEO, and more |
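For instance, here’s a sketch of a query over the technologies dataset that ranks the most commonly detected technologies on mobile pages. It assumes the tables expose `app` (the detected technology name) and `url` columns; verify against the table schema in the BigQuery UI before running:

```sql
-- Ten most frequently detected technologies on mobile pages, March 2020.
-- Assumes `app` holds the Wappalyzer-detected technology name.
SELECT
  app,
  COUNT(DISTINCT url) AS pages
FROM
  `httparchive.technologies.2020_03_01_mobile`
GROUP BY
  app
ORDER BY
  pages DESC
LIMIT 10
```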

Chrome UX Report

The Chrome UX Report is a monthly dataset of how the web is experienced. This is considered “field data” in that it is sourced from real Chrome users. Data in this project captures the user experience with a small set of metrics, including time to first byte, first paint, first contentful paint, largest contentful paint, DOM content loaded, onload, first input delay, cumulative layout shift, and notification permission acceptance rates. You can query the data by origin (website), month, country, form factor (desktop/mobile/tablet), and effective connection type (4G, 3G, 2G, slow 2G, offline).

The most recent dataset is 202002 (February 2020). The March 2020 dataset will be released on April 14 and will include user experience data for the full calendar month of March.

Data for each metric is organized as a histogram, so you can measure the percentage of user experiences that fall within a given range of times, for example, how often users experience a TTFB between 0 and 200 ms. If you want fine-grained control over these ranges, you can query the all or country-specific datasets. If you want to query these ranges over time, you should use the experimental dataset, which is optimized for month-to-month analysis. See this post for more info.
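Here’s a sketch of such a histogram query against the all dataset, summing bin densities to get the share of “fast” first contentful paint experiences for a single origin (the origin is just an example; substitute your own):

```sql
-- Share of first contentful paint experiences under 1 second
-- for one origin in the February 2020 dataset.
SELECT
  SUM(fcp.density) AS fast_fcp_share
FROM
  `chrome-ux-report.all.202002`,
  UNNEST(first_contentful_paint.histogram.bin) AS fcp
WHERE
  origin = 'https://example.com'  -- example origin
  AND fcp.start < 1000
```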

If you don’t need fine-grained control of the histogram ranges, we summarize key “fast”, “average”, and “slow” percentages in the materialized dataset. This is also optimized for month-to-month analysis.
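A month-over-month query against the materialized dataset might look like the sketch below. The table and column names (metrics_summary, fast_fcp, avg_fcp, slow_fcp) are assumptions based on the dataset’s conventions; check the schema before running:

```sql
-- Monthly fast/average/slow FCP shares for one origin.
-- Table and column names are assumptions; verify against the schema.
SELECT
  date,
  fast_fcp,
  avg_fcp,
  slow_fcp
FROM
  `chrome-ux-report.materialized.metrics_summary`
WHERE
  origin = 'https://example.com'  -- example origin
ORDER BY
  date
```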


For even more info, check out the Web Almanac methodology for an in-depth explanation of how this transparency data is used for a large-scale research project.
