How can we explore the carbon impact of websites using the HTTP Archive / Web Almanac?

With the growing interest in ESG (Environmental, Social, and Corporate Governance), it was suggested that the HTTP Archive could be used to explore the energy consumption of the web. There may even be a point in the future when the Web Almanac includes a chapter on the topic.

@rviscomi suggested we use this message board to discuss the idea and perhaps make a proof of concept to see how much the dataset has to offer.

Do you (yes, you!) have ideas on how we can explore the carbon impact of websites using the HTTP Archive and perhaps share our findings later in an upcoming chapter of the Web Almanac?

The Website Carbon Calculator (by Tom Greenwood) uses a robust methodology for calculating the carbon emissions generated by a web page. Take a look at https://www.websitecarbon.com/. The results returned after you enter a URL are for a single page view on desktop, calculated at Document Complete, not Full Load.

When checking a page for carbon emission generation, I tend to use WebPageTest to grab the page weight at full load on both desktop and mobile and run those through the base formula employed by the Website Carbon Calculator.

The API at https://api.websitecarbon.com/ powers the Website Carbon Calculator, which adds additional information on top. For example, the calculator translates the power consumption of a web page (visited 10,000 times a month) into the number of cups of tea that could be made and the number of km an electric car could drive on the same amount of energy. The API integrates with the Green Web Foundation API to determine whether a site uses a green host, and this is factored into the carbon emissions calculation.

Used in combination, these tools calculate carbon emissions based on the weight at initial page load. Going forward, I recommend a more sophisticated method involving the emulation of user scrolling (possibly via Puppeteer) and the calculation of carbon emissions at 25% scroll, 50%, 75% and 100% in addition to initial full load.
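To make the scroll-checkpoint idea a bit more concrete, here is a minimal Python sketch of the bookkeeping only. It assumes some scroll emulation (for example Puppeteer) has already recorded the cumulative bytes transferred at each depth. The function name and the sample numbers are hypothetical, invented purely for illustration. Each cumulative figure could then be fed into whichever bytes-to-CO2 formula you prefer.

```python
from typing import Dict, List, Tuple

# Scroll depths proposed above; 0 stands for the initial full load.
CHECKPOINTS = [0, 25, 50, 75, 100]


def bytes_added_per_checkpoint(cumulative: Dict[int, int]) -> List[Tuple[int, int, int]]:
    """Return (scroll_percent, cumulative_bytes, bytes_added_since_previous) rows."""
    rows = []
    previous = 0
    for pct in CHECKPOINTS:
        total = cumulative[pct]
        if total < previous:
            # Cumulative transfer can never shrink as the user scrolls further.
            raise ValueError(f"cumulative bytes decreased at {pct}% scroll")
        rows.append((pct, total, total - previous))
        previous = total
    return rows


# Hypothetical measurements, NOT real data.
sample = {0: 1_500_000, 25: 1_900_000, 50: 2_400_000, 75: 2_400_000, 100: 3_100_000}

rows = bytes_added_per_checkpoint(sample)
for pct, total, added in rows:
    print(f"{pct:>3}% scroll: {total:>9} bytes total, {added:>9} bytes added")
```

In this made-up sample, more than half of the final page weight arrives only after the initial load, which is exactly the portion a calculation at initial load would miss.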

Here’s a related query from the CMS chapter of the 2020 Web Almanac:

#standardSQL
# Distribution of page weight, requests, and co2 grams per CMS web page
# https://gitlab.com/wholegrain/carbon-api-2-0/-/blob/b498ec3bb239536d3612c5f3d758f46e0d2431a6/includes/carbonapi.php
CREATE TEMP FUNCTION
  GREEN(url STRING) AS (FALSE); -- TODO: Investigate fetching from Green Web Foundation
CREATE TEMP FUNCTION
  adjustDataTransfer(val INT64) AS (val * 0.75 + 0.02 * val * 0.25);
CREATE TEMP FUNCTION
  energyConsumption(bytes FLOAT64) AS (bytes * 1.805 / 1073741824);
CREATE TEMP FUNCTION
  getCo2Grid(energy FLOAT64) AS (energy * 475);
CREATE TEMP FUNCTION
  getCo2Renewable(energy FLOAT64) AS (energy * 0.1008 * 33.4 + energy * 0.8992 * 475);
CREATE TEMP FUNCTION
  CO2(url STRING, bytes INT64) AS (
  IF
    (GREEN(url),
      getCo2Renewable(energyConsumption(adjustDataTransfer(bytes))),
      getCo2Grid(energyConsumption(adjustDataTransfer(bytes)))));

SELECT
  percentile,
  client,
  APPROX_QUANTILES(requests, 1000)[OFFSET(percentile * 10)] AS requests,
  ROUND(APPROX_QUANTILES(bytes, 1000)[OFFSET(percentile * 10)] / 1024 / 1024, 2) AS mbytes,
  APPROX_QUANTILES(co2grams, 1000)[OFFSET(percentile * 10)] AS co2grams
FROM (
  SELECT
    _TABLE_SUFFIX AS client,
    reqTotal AS requests,
    bytesTotal AS bytes,
    CO2(url, bytesTotal) AS co2grams
  FROM
    `httparchive.summary_pages.2020_08_01_*`
  JOIN (
    SELECT
      _TABLE_SUFFIX,
      url
    FROM
      `httparchive.technologies.2020_08_01_*`
    WHERE
      category = 'CMS')
  USING
    (_TABLE_SUFFIX, url)),
  UNNEST([10, 25, 50, 75, 90]) AS percentile
GROUP BY
  percentile,
  client
ORDER BY
  percentile,
  client

So while we don’t have host-specific data to know whether it’s green, we do estimate the grams of CO2 based on the page weight.
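To see what the query does to a single page, here is a minimal Python sketch that mirrors its temp functions one for one. The constants are copied from the query. The Python function names are my own, and reading 1.805 as kWh per GB and 475 / 33.4 as grams of CO2 per kWh is an assumption based on the linked source, not something the query itself states.

```python
BYTES_PER_GB = 1073741824  # 1024 ** 3, the divisor used in the query


def adjust_data_transfer(num_bytes: float) -> float:
    # Same weighting as adjustDataTransfer: 75% of the weight counted in full,
    # the remaining 25% counted at 2% of the weight.
    return num_bytes * 0.75 + 0.02 * num_bytes * 0.25


def energy_consumption(num_bytes: float) -> float:
    # Mirrors energyConsumption (assumed unit: kWh).
    return num_bytes * 1.805 / BYTES_PER_GB


def co2_grid(energy: float) -> float:
    # Mirrors getCo2Grid.
    return energy * 475


def co2_renewable(energy: float) -> float:
    # Mirrors getCo2Renewable: only the 10.08% share gets the lower factor.
    return energy * 0.1008 * 33.4 + energy * 0.8992 * 475


def co2_grams(num_bytes: int, green: bool = False) -> float:
    # Mirrors CO2(url, bytes); `green` stands in for the GREEN(url) stub,
    # which always returns FALSE in the query.
    energy = energy_consumption(adjust_data_transfer(num_bytes))
    return co2_renewable(energy) if green else co2_grid(energy)


two_mb = 2 * 1024 * 1024
print(round(co2_grams(two_mb), 3))
print(round(co2_grams(two_mb, green=True), 3))
```

For a 2 MB page this works out to about 1.26 g of CO2 on the grid path and about 1.15 g on the green path. Because GREEN always returns FALSE, every page in the query takes the grid path.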

I’d be curious to hear ideas for other environment-related insights we could derive from HTTP Archive.

Nice, thanks for the comments @joehoyle, @rviscomi.

One distinction I wonder about is the difference between the carbon impact of:

  • hosting a site
  • delivering a site to a user, perhaps addressed in page weight (bytes)
  • how the browser treats the site once it arrives (for example how script timers are treated, as in Edge’s sleeping tabs; “Using sleeping tabs… increases your battery life as a sleeping tab uses 37% less CPU on average than a non-sleeping tab.”)

Certainly lots to explore.

It’s probably not within scope, but according to the research I’m doing, about 98% of the pollution occurs at the device level: either in the creation of the page or, more likely for high-usage pages, in the use of the page.

Very interesting @gerrymcgovernireland. Is there a link to your research that you can share so we can learn more?

I haven’t published specifically on these issues yet, though I am planning to. I did publish a book last year on the general theme, called World Wide Waste. I can send you a copy if you’d like.

What I’ve been trying to do over the last couple of years is come up with some sort of model that calculates the CO2 of data. The more I dig into it, the more I end up focusing on the device that creates, stores, distributes, processes or allows the data to be consumed.

In the models I’m developing, 98% of the pollution / energy use occurs in the creation or the use of the data. For example, from discussions with a variety of content professionals, an average 1,000 words published on a webpage can take 20 hours’ worth of work. All of that work occurs on a device like a laptop or desktop. The energy consumed in creation tends to be vastly greater than the energy consumed in storage, for example. (A large part of the total CO2 Netflix is responsible for occurs during the production of their shows.)

If the content has low use, then the main impact will be in the creation. However, where you have heavy use of a webpage or piece of content, the main impact is within the user’s device. Mozilla have done some interesting work calculating their CO2 footprint. They found that only 2% of their CO2 was attributable to their workers, offices, etc. The other 98% occurred during use of the browser.

Okay team, thanks for the early ideas… I expect our exploration of the carbon impact of websites will be limited by what is visible in HTTP Archive data. What might those things include?

  • Page weight, because less page weight means fewer resources consumed to send and render a site
  • The prefers-reduced-data media feature in CSS, because data savings translate into reduced energy for rendering and display
  • The Save-Data HTTP header, because data savings translate into reduced energy for rendering and display
  • Script timers, because fewer script timers are associated with lower energy consumption (I don’t know where signals of script timers live in HTTP Archive data but I’m guessing others do)

I don’t think we need to worry just yet about whether the things above are well adopted. I’m just brainstorming a backlog of places to poke around.
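As a starting point for poking at two of those signals, here is a minimal Python sketch. It assumes you already have a page's stylesheet text and response headers in hand; how those map onto HTTP Archive tables is not shown. Both function names are hypothetical. One caveat that is my assumption rather than anything established in this thread: Save-Data is sent by the browser in the request, so the only trace a crawl can see is the server mentioning it in a response header such as Vary or Accept-CH, and treating that as evidence of support is just a heuristic.

```python
import re
from typing import Dict

# Matches an @media rule whose condition mentions prefers-reduced-data,
# with or without an explicit value.
REDUCED_DATA_RE = re.compile(
    r"@media[^{]*\(\s*prefers-reduced-data\s*(?::\s*(?:reduce|no-preference)\s*)?\)",
    re.IGNORECASE,
)


def uses_prefers_reduced_data(css: str) -> bool:
    """True if the stylesheet text contains a prefers-reduced-data media query."""
    return bool(REDUCED_DATA_RE.search(css))


def mentions_save_data(headers: Dict[str, str]) -> bool:
    """True if a Vary or Accept-CH response header lists Save-Data (heuristic)."""
    for name, value in headers.items():
        if name.lower() in ("vary", "accept-ch"):
            tokens = [token.strip().lower() for token in value.split(",")]
            if "save-data" in tokens:
                return True
    return False


css = "@media (prefers-reduced-data: reduce) { .hero { background-image: none; } }"
headers = {"Content-Type": "text/html", "Vary": "Accept-Encoding, Save-Data"}

print(uses_prefers_reduced_data(css))
print(mentions_save_data(headers))
```

A regex over raw CSS will also match inside comments, so counts from an approach like this are best read as an upper bound.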

What other bullets could be added to the list above?
