What is the addressable market of active web development?

Inspired by Malte’s tweet:

What percent of websites changed their tech stack in a given month/year?

Although it wasn’t published anywhere, I ran a similar analysis last year to understand the addressable market of the web. Here’s the query and what I found:

CREATE TEMP FUNCTION PARSE_LAST_MODIFIED(last_modified STRING) RETURNS DATE DETERMINISTIC AS (
  CAST(SAFE.PARSE_DATETIME('%a, %d %h %Y %T GMT', last_modified) AS DATE)
);

SELECT
  age,
  COUNT(0) AS n
FROM (
  SELECT
    page,
    -- Number of days from most recent Last-Modified resource to the test date.
    DATE_DIFF(
      ANY_VALUE(DATE(TIMESTAMP_SECONDS(startedDateTime))),
      MAX(PARSE_LAST_MODIFIED(resp_last_modified)),
    DAY) AS age
  FROM
    `httparchive.almanac.requests`
  WHERE
    date = '2021-07-01' AND
    -- Desktop only for simplicity. Mobile results are very similar.
    client = 'desktop' AND
    -- First-party resources only.
    NET.HOST(page) = NET.HOST(url)
  GROUP BY
    page)
GROUP BY
  age
ORDER BY
  age

This query measures first-party resource freshness as a proxy for addressability. Freshness is the number of days between the most recent Last-Modified date among a page’s first-party resources and the date of the test.

So if a page has updated its home hero image in the last week, the image’s Last-Modified date should reflect that and be used as the basis for comparison against the date of the test. Whether one week, one month, or one year is the threshold for freshness or addressability is entirely subjective.
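For anyone who wants to sanity-check the per-page logic outside BigQuery, here is a minimal Python sketch of the same idea. The function names, the example.com URLs, and the header values are made up for illustration, and comparing hostnames with `urlsplit` is only an approximation of what `NET.HOST` does in the query.

```python
from datetime import date, datetime
from urllib.parse import urlsplit

# Same shape as the '%a, %d %h %Y %T GMT' format used in the query.
LAST_MODIFIED_FORMAT = "%a, %d %b %Y %H:%M:%S GMT"


def parse_last_modified(value):
    """Return the header's date, or None if it doesn't parse (like SAFE.)."""
    try:
        return datetime.strptime(value, LAST_MODIFIED_FORMAT).date()
    except (TypeError, ValueError):
        return None


def page_age_days(page_url, test_date, resources):
    """Days from the newest first-party Last-Modified date to the test date.

    resources is an iterable of (url, last_modified_header) pairs.
    Returns None when no first-party resource has a parseable header.
    """
    page_host = urlsplit(page_url).hostname
    dates = [
        parsed
        for url, header in resources
        if urlsplit(url).hostname == page_host
        and (parsed := parse_last_modified(header)) is not None
    ]
    if not dates:
        return None
    return (test_date - max(dates)).days


# Hypothetical resources for a single page (illustrative values only).
resources = [
    ("https://example.com/hero.jpg", "Thu, 24 Jun 2021 08:00:00 GMT"),
    ("https://example.com/app.js", "Tue, 01 Jun 2021 12:30:00 GMT"),
    ("https://cdn.example.net/lib.js", "Wed, 30 Jun 2021 00:00:00 GMT"),  # third party, ignored
    ("https://example.com/old.css", "not a date"),  # unparseable, ignored
]
age = page_age_days("https://example.com/", date(2021, 7, 1), resources)
print(age)  # 7
```

The hero image is the freshest first-party resource here, so the page counts as 7 days old even though the script on the same host is a month old.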

(Sample size: 6,286,373 pages)

The results are really interesting:

  • 21% of pages updated a Last-Modified header on the same day as the test
  • 48% of pages were modified within 7 days
  • 63% of pages were modified within 30 days
  • 78% of pages were modified within 90 days
  • 89% of pages were modified within 365 days
  • The median page is 9 days old
  • The p75 age is 79 days
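The query only returns an (age, n) histogram, so figures like these come from a second step of cumulative sums. Here is a rough Python sketch of that step. The rows are toy numbers, not the real results, and dropping pages with a NULL age (no parseable Last-Modified header) from the denominator is an assumption, since the post doesn't say how those were treated.

```python
def known_ages(rows):
    """Keep only histogram rows with a non-NULL age (an assumption, see above)."""
    return [(age, n) for age, n in rows if age is not None]


def share_within(rows, max_age):
    """Fraction of pages whose age is between 0 and max_age days, inclusive."""
    rows = known_ages(rows)
    total = sum(n for _, n in rows)
    return sum(n for age, n in rows if 0 <= age <= max_age) / total


def percentile_age(rows, p):
    """Smallest age at which the cumulative share of pages reaches p."""
    rows = known_ages(rows)
    total = sum(n for _, n in rows)
    running = 0
    for age, n in sorted(rows):
        running += n
        if running >= p * total:
            return age
    return None


# Toy (age, n) rows shaped like the query output; NOT the real data.
rows = [(0, 2), (3, 3), (20, 2), (100, 2), (400, 1), (None, 4)]

print(share_within(rows, 7))       # 0.5
print(share_within(rows, 365))     # 0.9
print(percentile_age(rows, 0.5))   # 3
print(percentile_age(rows, 0.75))  # 100
```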

I was surprised to find that 21% of pages are less than a day old. Maybe some servers are setting the Last-Modified date of resources to the current time by default? I’d be curious to find out if anyone is aware of that happening in practice. If so, that would invalidate some of these results.

It’s also worth acknowledging the potential bias of the corpus of websites in this sample. These websites are sourced from the Chrome UX Report, which is a collection of websites actively visited by Chrome users in a given month. Perhaps this cohort is more likely to visit websites that are actively maintained. In a way, that makes the results more reliable, since the freshness of a website that no one visits isn’t very important. Still, because the dataset is based on Chrome usage, some kinds of websites may be underrepresented.

All that being said, one key takeaway from these results is that at the 90 day threshold, 78% of websites can be considered addressable. That is really, really good. It means that a huge part of the web is actively maintained.
