It’s commonly said that the classic ‘www.’ subdomain is in steep decline, prompted by a general lack of need and by catering to mobile users. I agree that you do seem to see more ‘foo.com’ and less ‘www.foo.com’ each year. Can the HTTP Archive quantify this, perhaps by looking at the top 1 million domains by year and seeing what fraction either lacks a ‘www.’ subdomain or redirects away from it?
Hi @gwern.
At the end of 2018, the HTTP Archive changed the source of its URLs from Alexa to CrUX. Alexa URLs (before Dec 2018) include the `www` by default (`www.example.com`), while the CrUX URLs do not (`example.com` and `www.example.com` are both valid origins). Since the initial URL has changed, we don’t have a reliable comparison before 2018. Some websites would redirect from `www.example.com` to `example.com`, while others would serve the same content on both domains or redirect to another subdomain altogether (`en.example.com` or `m.example.com`).
Looking at the data from Dec 2018 onwards:
client | year | total | www | pct |
---|---|---|---|---|
desktop | 2022* | 1,246,032 | 511,012 | 41.01% |
desktop | 2021 | 5,824,858 | 2,508,805 | 43.07% |
desktop | 2020 | 6,018,707 | 2,716,807 | 45.14% |
desktop | 2019 | 4,291,086 | 2,086,970 | 48.64% |
desktop | 2018 | 3,840,067 | 1,815,773 | 47.28% |
mobile | 2022* | 1,613,142 | 651,794 | 40.41% |
mobile | 2021 | 7,957,652 | 3,350,268 | 42.10% |
mobile | 2020 | 7,157,942 | 3,203,779 | 44.76% |
mobile | 2019 | 5,181,871 | 2,490,108 | 48.05% |
mobile | 2018 | 1,247,333 | 286,662 | 22.98% |
* 2022 is sampled at 10% of corpus
Hope that helps.
Query:
```sql
#standardSQL
# Count of domains which use 'www' subdomain
WITH data_2022 AS (
  SELECT
    _TABLE_SUFFIX AS client,
    "2022" AS year,
    COUNT(DISTINCT url) AS total,
    COUNT(DISTINCT IF(REGEXP_CONTAINS(url, r"^https?://www\."), url, NULL)) AS www
  FROM
    `httparchive.pages.2022_12_01_*` TABLESAMPLE SYSTEM(10 PERCENT)
  GROUP BY
    client
),
data_2021 AS (
  SELECT
    _TABLE_SUFFIX AS client,
    "2021" AS year,
    COUNT(DISTINCT url) AS total,
    COUNT(DISTINCT IF(REGEXP_CONTAINS(url, r"^https?://www\."), url, NULL)) AS www
  FROM
    `httparchive.pages.2021_12_01_*` TABLESAMPLE SYSTEM(100 PERCENT)
  GROUP BY
    client
),
data_2020 AS (
  SELECT
    _TABLE_SUFFIX AS client,
    "2020" AS year,
    COUNT(DISTINCT url) AS total,
    COUNT(DISTINCT IF(REGEXP_CONTAINS(url, r"^https?://www\."), url, NULL)) AS www
  FROM
    `httparchive.pages.2020_12_01_*` TABLESAMPLE SYSTEM(100 PERCENT)
  GROUP BY
    client
),
data_2019 AS (
  SELECT
    _TABLE_SUFFIX AS client,
    "2019" AS year,
    COUNT(DISTINCT url) AS total,
    COUNT(DISTINCT IF(REGEXP_CONTAINS(url, r"^https?://www\."), url, NULL)) AS www
  FROM
    `httparchive.pages.2019_12_01_*` TABLESAMPLE SYSTEM(100 PERCENT)
  GROUP BY
    client
),
data_2018 AS (
  SELECT
    _TABLE_SUFFIX AS client,
    "2018" AS year,
    COUNT(DISTINCT url) AS total,
    COUNT(DISTINCT IF(REGEXP_CONTAINS(url, r"^https?://www\."), url, NULL)) AS www
  FROM
    `httparchive.pages.2018_12_15_*` TABLESAMPLE SYSTEM(100 PERCENT)
  GROUP BY
    client
),
combined_data AS (
  SELECT
    client,
    year,
    www,
    total,
    ROUND(100 * www / total, 2) AS pct
  FROM
    data_2022
  UNION ALL
  SELECT
    client,
    year,
    www,
    total,
    ROUND(100 * www / total, 2) AS pct
  FROM
    data_2021
  UNION ALL
  SELECT
    client,
    year,
    www,
    total,
    ROUND(100 * www / total, 2) AS pct
  FROM
    data_2020
  UNION ALL
  SELECT
    client,
    year,
    www,
    total,
    ROUND(100 * www / total, 2) AS pct
  FROM
    data_2019
  UNION ALL
  SELECT
    client,
    year,
    www,
    total,
    ROUND(100 * www / total, 2) AS pct
  FROM
    data_2018
)
SELECT
  client,
  year,
  total,
  www,
  pct
FROM
  combined_data
ORDER BY
  client,
  year DESC
```
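For anyone who wants to sanity-check the `www` test outside BigQuery, here is a minimal Python sketch of the same counting logic. It reuses the regular expression from the query; the function name `www_share` and the sample URLs are made up for illustration and are not part of the HTTP Archive dataset or tooling.

```python
import re

# Same pattern as the query: scheme, then a host starting with "www."
WWW_PATTERN = re.compile(r"^https?://www\.")


def www_share(urls):
    """Return (total, www, pct) over the distinct URLs, like the query's columns."""
    distinct = set(urls)  # mirrors COUNT(DISTINCT url)
    if not distinct:
        return 0, 0, 0.0
    www = sum(1 for url in distinct if WWW_PATTERN.search(url))
    return len(distinct), www, round(100 * www / len(distinct), 2)


# Hypothetical sample, including one duplicate that should be counted once.
sample_urls = [
    "https://www.example.com/",
    "http://www.example.org/",
    "https://example.com/",
    "https://m.example.com/",
    "https://example.net/",
    "https://www.example.com/",
]

total, www, pct = www_share(sample_urls)
print(total, www, pct)
```

Note that, like the query, this only looks at the URL as listed; it says nothing about whether the other variant of the host redirects.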
Oh, I didn’t know that about the data transition. Thanks for highlighting that - that could have been very misleading (although probably one would notice that `www` went off a cliff in December 2018 for no apparent reason).
Your 2018->2022 numbers seem clear, however: `www` is going out of fashion, at a remarkable rate of roughly 2 percentage points a year - at least for ‘desktop’ (47% → 41%).
Do you know what’s going on with your ‘mobile’ numbers there? Did Alexa cover mobile badly while CrUX had much better coverage? Obviously ‘mobile’ browsers didn’t really jump from 23% to 48% and then immediately reverse to start declining 48%->41%. I read this as the 23% being bogus, and the real number presumably more like 49-50%, and then declining ~2%/year like ‘desktop’.
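As a rough check on the rate of decline, the sketch below computes the average yearly change in percentage points from the desktop `pct` column of the table above. The variable and function names are illustrative, and the 2022 figure comes from a 10% sample, so treat the result as approximate.

```python
# Desktop www share (%) by year, copied from the results table.
desktop_pct = {2018: 47.28, 2019: 48.64, 2020: 45.14, 2021: 43.07, 2022: 41.01}


def avg_change_per_year(pct_by_year, start, end):
    """Average change in percentage points per year between two years."""
    return (pct_by_year[end] - pct_by_year[start]) / (end - start)


since_2018 = avg_change_per_year(desktop_pct, 2018, 2022)
since_2019 = avg_change_per_year(desktop_pct, 2019, 2022)

print(f"2018 -> 2022: {since_2018:+.2f} points/year")
print(f"2019 -> 2022: {since_2019:+.2f} points/year")
```

Depending on whether you start from 2018 or from the 2019 peak, this gives a decline of roughly 1.6 or 2.5 percentage points a year.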
The 2018 data is from the mid-December run, so it should already be from CrUX. Looking at a random sample of URLs, they seem correct; however, in January 2019 (~15 days later) the data is already drastically different, so I suspect the mobile data for Dec 2018 (both runs) is somehow incorrect.
client | year | www | total | pct |
---|---|---|---|---|
mobile | Jan 2019 | 1905679 | 4023592 | 47.36 |
desktop | Jan 2019 | 1813069 | 3831026 | 47.33 |
Query:
```sql
WITH data_jan_2019 AS (
  SELECT
    _TABLE_SUFFIX AS client,
    'Jan 2019' AS year,
    COUNT(DISTINCT url) AS total,
    COUNT(DISTINCT IF(REGEXP_CONTAINS(url, r"^https?://www\."), url, NULL)) AS www
  FROM
    `httparchive.pages.2019_01_01_*` TABLESAMPLE SYSTEM(100 PERCENT)
  GROUP BY
    client
)
SELECT
  client,
  year,
  www,
  total,
  ROUND(100 * www / total, 2) AS pct
FROM
  data_jan_2019
```
Note that the desktop dataset increased in size on Dec 15, 2018 while the mobile dataset lagged behind until Jan 1, 2019.
wait rick what does this all mean
@marry to clarify, the post that @imkevdev linked to earlier describes the dataset changes at the end of 2018:
- As of 2018_07_01 we started using URLs from the Chrome UX Report (CrUX). We switched mobile URLs to CrUX on 07_15. This increased our coverage from 500K URLs to ~1.3M.
- As of 2018_12_01 we have decreased the number of test runs per URL from 3 to 1. The loss of redundancy may affect the reliability of time-based performance metrics but these are not especially useful in synthetic tests. For accurate real-world performance data join with the CrUX dataset. This change reduces the time for the crawl to complete, allowing us to add more URLs.
- As of 2018_12_15 we have increased the desktop URLs to all CrUX URLs for desktop (3.9M).
- As of 2019_01_01 we are reducing the crawl frequency from semi-monthly to monthly. Combined with the reduced runs per URL, this change will enable us to afford testing the full CrUX corpus for both desktop and mobile. As of this crawl we will increase the mobile URLs to all CrUX URLs for mobile (4.2M).
So the rollout to use all CrUX URLs was applied to desktop on 2018_12_15 and mobile on 2019_01_01.
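One way to keep this in mind when querying is to encode the two rollout dates and only compare crawls where both clients already use the full CrUX URL list. A small Python sketch, with the dates taken from the changelog above and all names (`FULL_CRUX_START`, `has_full_crux_urls`, `is_comparable_crawl`) invented for illustration:

```python
from datetime import date

# First crawl that used all CrUX URLs, per client (from the changelog).
FULL_CRUX_START = {
    "desktop": date(2018, 12, 15),
    "mobile": date(2019, 1, 1),
}


def has_full_crux_urls(client, crawl_date):
    """True if this client's crawl already used the full CrUX URL list."""
    return crawl_date >= FULL_CRUX_START[client]


def is_comparable_crawl(crawl_date):
    """True if desktop and mobile were both on the full CrUX URL list."""
    return all(has_full_crux_urls(c, crawl_date) for c in FULL_CRUX_START)


for crawl in (date(2018, 12, 15), date(2019, 1, 1)):
    print(crawl, is_comparable_crawl(crawl))
```

By this check the 2018_12_15 crawl is not comparable across clients (mobile was still on the smaller URL list), while 2019_01_01 is the first crawl where both are.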