Inspired by Malte’s tweet:
What percent of websites changed their tech stack in a given month/year?
Although it wasn’t published anywhere, I ran a similar analysis last year to understand the addressable market of the web. Here’s the query and what I found:
```sql
-- Parse an HTTP Last-Modified value (e.g. "Sun, 06 Nov 1994 08:49:37 GMT") into a DATE.
-- %h is the abbreviated month and %T is %H:%M:%S; SAFE. returns NULL instead of an error for malformed values.
CREATE TEMP FUNCTION PARSE_LAST_MODIFIED(last_modified STRING) RETURNS DATE DETERMINISTIC AS (
  CAST(SAFE.PARSE_DATETIME('%a, %d %h %Y %T GMT', last_modified) AS DATE)
);

SELECT
  age,
  COUNT(0) AS n
FROM (
  SELECT
    page,
    -- Number of days from most recent Last-Modified resource to the test date.
    DATE_DIFF(
      ANY_VALUE(DATE(TIMESTAMP_SECONDS(startedDateTime))),
      MAX(PARSE_LAST_MODIFIED(resp_last_modified)),
      DAY) AS age
  FROM
    `httparchive.almanac.requests`
  WHERE
    date = '2021-07-01' AND
    -- Desktop only for simplicity. Mobile results are very similar.
    client = 'desktop' AND
    -- First-party resources only.
    NET.HOST(page) = NET.HOST(url)
  GROUP BY
    page)
GROUP BY
  age
ORDER BY
  age
```
This query measures first-party resource freshness as a proxy for addressability. Freshness is calculated as the number of days elapsed from the most recent first-party resource's `Last-Modified` date to the date of the test.
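To make the per-page calculation concrete, here is a minimal Python sketch of the same logic, assuming the resources for one page are available as `(url, Last-Modified)` pairs. The names `parse_last_modified` and `page_age` are hypothetical, and `urlsplit(...).hostname` only approximates BigQuery's `NET.HOST`. This is an illustration of the idea, not the code that produced these results.

```python
from datetime import date, datetime
from typing import Iterable, Optional, Tuple
from urllib.parse import urlsplit

# HTTP-date format, e.g. "Sun, 06 Nov 1994 08:49:37 GMT".
# %a and %b are locale-dependent; this assumes the default C/English locale.
LAST_MODIFIED_FORMAT = "%a, %d %b %Y %H:%M:%S GMT"


def parse_last_modified(value: Optional[str]) -> Optional[date]:
    """Return the date of a Last-Modified value, or None if it is missing
    or malformed (like SAFE.PARSE_DATETIME returning NULL)."""
    try:
        return datetime.strptime(value, LAST_MODIFIED_FORMAT).date()
    except (TypeError, ValueError):
        return None


def page_age(page: str, test_date: date,
             resources: Iterable[Tuple[str, Optional[str]]]) -> Optional[int]:
    """Days from the newest first-party Last-Modified date to test_date.

    Returns None when no first-party resource has a parseable header,
    mirroring MAX() over all-NULL values in the query.
    """
    page_host = urlsplit(page).hostname
    newest = None
    for url, last_modified in resources:
        # First-party only: same host as the page (approximates NET.HOST).
        if urlsplit(url).hostname != page_host:
            continue
        modified = parse_last_modified(last_modified)
        if modified is not None and (newest is None or modified > newest):
            newest = modified
    if newest is None:
        return None
    return (test_date - newest).days
```

For example, a page tested on 2021-07-01 whose newest first-party resource was last modified on 2021-06-24 gets an age of 7, regardless of any newer third-party resources.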
So if a page updated its homepage hero image in the last week, the image's `Last-Modified` date should reflect that, and that date would be used as the basis for comparison against the date of the test. Whether one week, one month, or one year is the threshold for freshness or addressability is entirely subjective.
(Sample size: 6,286,373 pages)
The results are really interesting:
[Chart annotation: "… `Last-Modified` header on the same day as the test"]

I was surprised to find that 21% of pages are 1 day old. Maybe some servers set the `Last-Modified` date of resources to the current time by default? I'd be curious to hear whether anyone is aware of that happening in practice. If so, that would invalidate some of these results.
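One hypothetical way to probe that suspicion, assuming you have both the `Last-Modified` and `Date` response headers for a resource, is to flag responses where the two timestamps are effectively identical. That would suggest the server stamped the current time rather than a real modification time. The function name and the 2-second tolerance below are illustrative assumptions, not part of the original analysis.

```python
from email.utils import parsedate_to_datetime


def looks_generated_at_response_time(last_modified, response_date,
                                     tolerance_seconds=2):
    """Hypothetical heuristic: True if Last-Modified is within
    tolerance_seconds of the response's Date header."""
    try:
        modified = parsedate_to_datetime(last_modified)
        served = parsedate_to_datetime(response_date)
        # Mixing naive and aware datetimes raises TypeError, so keep it in the try.
        delta = abs((served - modified).total_seconds())
    except (TypeError, ValueError):
        # Missing or malformed headers: don't flag the response.
        return False
    return delta <= tolerance_seconds
```

A match isn't proof, since a file can genuinely be modified moments before it's served, so at best this would give a rough estimate of how common the behavior is.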
It's also worth acknowledging the potential bias of the corpus of websites in this sample. These websites are sourced from the Chrome UX Report, which is a collection of websites actively visited by Chrome users in a given month. Perhaps this cohort is more likely to include websites that are actively maintained. In a way, that makes the results more reliable, since the freshness of a website that no one visits isn't very important. Still, because the corpus is based on Chrome usage, websites that Chrome users rarely visit may be underrepresented in the dataset.
All that being said, one key takeaway from these results is that at the 90-day threshold, 78% of websites can be considered addressable. That is really, really good: it means that a huge part of the web is actively maintained.
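Here's a sketch of how a cumulative share like that can be read off the query's `(age, n)` rows, assuming the output has been loaded into Python as a list of tuples. `share_within` is a hypothetical helper. The post doesn't say whether pages with a NULL age (no parseable first-party `Last-Modified` header) belong in the denominator, so that choice is exposed as a parameter.

```python
def share_within(rows, threshold_days, include_unknown=True):
    """Fraction of pages whose age is at most threshold_days.

    rows: (age, n) pairs as returned by the query. age is None for pages
    without a parseable first-party Last-Modified header; those pages count
    toward the total only if include_unknown is True. Negative ages (dates in
    the future) are counted as fresh.
    """
    total = 0
    fresh = 0
    for age, n in rows:
        if age is None:
            if include_unknown:
                total += n
            continue
        total += n
        if age <= threshold_days:
            fresh += n
    return fresh / total if total else 0.0
```

With toy rows `[(0, 30), (1, 20), (45, 25), (200, 15), (None, 10)]` (made-up numbers, not the real results), `share_within(rows, 90)` returns 0.75.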