I wanted to share an example query that shows how it’s still possible to infer site rank even though we switched over to the 1.3M unranked Chrome UX Report URLs.
The 1.3M URLs were explicitly selected to be on domains in the March 15, 2017 Alexa 1M list. In BigQuery, this list is accessible as `httparchive.urls.20170315`. We use this date because it's the last snapshot we have before Alexa switched to (what I consider to be) a much lower quality list.
Given that, every URL maps to a domain that has a rank in the Alexa 1M list. We can join any table with a URL field to the Alexa list to get its rank:
```sql
#standardSQL
SELECT
  Alexa_rank AS rank,
  url
FROM
  `httparchive.pages.2018_08_01_desktop`
JOIN
  `httparchive.urls.20170315`
ON
  NET.REG_DOMAIN(url) = Alexa_domain
WHERE
  JSON_EXTRACT(payload, '$._blinkFeatureFirstUsed.Features.V8SpeechRecognition_Start_Method') IS NOT NULL
ORDER BY
  rank
```
For example, this query shows us the domain rank and URL for all 4 pages that use some kind of speech recognition. Note that rank 46511 appears twice, across two separate URLs. That's because both URLs are on the same domain (mcs.gov.sa).
So we lose per-URL rank granularity, but we gain the ability to analyze multiple origins per domain, which is how we ended up with 1.3M URLs from a 1M-domain list. This is especially important for domains that host user-generated content, like WordPress or Blogger, and for large companies with many products under a single domain, like maps.google.com, mail.google.com, books.google.com, etc. Previously, we only knew that "google.com" was rank #1, so we'd simply crawl http://www.google.com and miss all of the other popular sites on that domain.
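To see this effect directly, the same join can be grouped by domain to count how many distinct URLs each ranked Alexa domain contributed. This is just a sketch using the same tables as the query above; I haven't tuned it, and column aliases are illustrative:

```sql
#standardSQL
-- Count how many crawled URLs map to each ranked Alexa domain.
-- Domains hosting many origins (e.g. google.com, blogspot.com)
-- should float to the top.
SELECT
  Alexa_rank AS rank,
  Alexa_domain AS domain,
  COUNT(DISTINCT url) AS num_urls
FROM
  `httparchive.pages.2018_08_01_desktop`
JOIN
  `httparchive.urls.20170315`
ON
  NET.REG_DOMAIN(url) = Alexa_domain
GROUP BY
  rank, domain
ORDER BY
  num_urls DESC
```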
We haven't done it yet, but at some point we will regenerate the list of URLs from newer Chrome UX Report datasets. We will probably also intersect it with the Alexa 1M to preserve coarse ranking info, though that depends on a few other factors, like our crawl capacity.