I noticed that there are fewer websites included in the HTTP Archive dataset than in the CrUX list. For example, of the CrUX top 10k, only 6531 websites appear in `httparchive.crawl.pages`. The difference seems far too high to be explained by crawler failures alone. Is there another reason for it?
This is the query I am using:
```sql
SELECT DISTINCT page
FROM httparchive.crawl.pages
JOIN `chrome-ux-report.all.202512`
ON NET.HOST(page) = NET.HOST(origin)
WHERE
  date = '2025-12-01'
  AND client = 'desktop'
  AND is_root_page
  AND experimental.popularity.rank <= 10000
```
There are a number of reasons for this, including:
- The ranks are shared between clients, and you're only looking at desktop. Mobile has more coverage, but even then it's far from 100%, for the other reasons below.
- Some sites block our crawler. As good net citizens we identify ourselves with PTST in the user agent, but the downside of that is a percentage of sites block us as "bot traffic".
- We crawl only from US datacenters, so some sites redirect us to the US version of the site, and we stop the crawl for that site because the origin changes (the US origin will likely already be in the CrUX list if it's popular enough).
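On the first point, you can check how much of the gap is explained by client by running a variant of your own query that counts matches for both clients. This is a sketch built from the same tables and columns as your original query; only the `GROUP BY` and the `client IN (...)` filter are new:

```sql
-- Compare desktop vs mobile coverage of the CrUX top 10k.
-- Ranks are shared across clients, so the same rank filter applies to both.
SELECT
  client,
  COUNT(DISTINCT page) AS matched_pages
FROM httparchive.crawl.pages
JOIN `chrome-ux-report.all.202512`
ON NET.HOST(page) = NET.HOST(origin)
WHERE
  date = '2025-12-01'
  AND client IN ('desktop', 'mobile')
  AND is_root_page
  AND experimental.popularity.rank <= 10000
GROUP BY client
```

If mobile's `matched_pages` is noticeably higher, that portion of the shortfall is just client coverage rather than blocking or redirects.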
See also this thread: Investigate missing Top 1k home pages · Issue #222 · HTTPArchive/data-pipeline · GitHub, where we explained that ~20% of sites are blocked. In that thread we also introduced the crawl_failures table, where you can see the rejection reasons for any missing entries.
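To see the rejection reasons aggregated, something like the following should work against that table. Note the table path and column names here (`url`, `failure_reason`) are assumptions for illustration; check the actual crawl_failures schema in the dataset before running:

```sql
-- Sketch: tally why origins were rejected in a given crawl.
-- Table path and column names are assumed — verify against the real schema.
SELECT
  failure_reason,          -- hypothetical column holding the rejection reason
  COUNT(*) AS failures
FROM httparchive.crawl.crawl_failures
WHERE date = '2025-12-01'
GROUP BY failure_reason
ORDER BY failures DESC
```

Filtering that table to the specific hosts missing from your join should tell you, origin by origin, whether the gap is blocking, redirects, or something else.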