Number of domains in HTTPArchive

amirian · September 12, 2017, 4:13pm

Why are there only about 470K domains, even though the FAQ states that HTTP Archive crawls the Alexa Top 1M?

Along those lines, when crawling, do you use the Alexa file as updated that day? For example, if you run a crawl on 9/1, do you use the Alexa file from 9/1, or is there a fixed set that you scan?

rviscomi · September 13, 2017, 5:29pm

Hi @amirian, good questions!

HTTP Archive does use the Alexa Top 1M websites to seed its list of pages to crawl. However, from that list of 1M we only use the first 500K; that’s all we can fit in the two-week window between crawls! So then why are you seeing 470K instead of 500K? Approximately 30K of the tests fail each crawl. This could be due to an error in WebPageTest or the website itself. Check out issue #115, "Handle errors", in the HTTPArchive/legacy.httparchive.org repository on GitHub to learn more about these errors and how we plan to mitigate them.
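To make the arithmetic above concrete, here is a minimal sketch of the selection step, not HTTP Archive's actual crawler code. The function names `select_crawl_seed` and `successful_tests` are hypothetical, the site names are placeholders, and the failure count is the approximate 30K figure quoted above.

```python
from itertools import islice
from typing import Iterable, List

# From the post: only the first 500K entries of the 1M list are crawled.
CRAWL_LIMIT = 500_000


def select_crawl_seed(ranked_sites: Iterable[str], limit: int = CRAWL_LIMIT) -> List[str]:
    """Return the first `limit` entries of a rank-ordered site list (hypothetical helper)."""
    return list(islice(ranked_sites, limit))


def successful_tests(seed_count: int, failed_count: int) -> int:
    """Number of pages that end up in the crawl results after failed tests are dropped."""
    return seed_count - failed_count


if __name__ == "__main__":
    # Placeholder stand-in for a rank-ordered top-1M list.
    top_1m = (f"site{rank}.example" for rank in range(1, 1_000_001))
    seed = select_crawl_seed(top_1m)
    # Roughly 30K tests fail per crawl, according to the post.
    print(successful_tests(len(seed), 30_000))
```

Under these assumptions, a site ranked below 500,000 never reaches the seed list, which is why it would not appear in the results unless it is added separately.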

Alexa has stopped updating their list of 1M sites. HTTP Archive uses an older snapshot of the list for each of the crawls, so you should expect to see the same URLs and page ranks in all of the crawls this year.

The link above has more info about the Alexa list and some alternatives.

Yeribel · September 22, 2017, 2:58pm

If my classified ads website is not in the Alexa top 1M ranking, will I be unable to see it in HTTP Archive?

Or what else is taken into account?

rviscomi · September 24, 2017, 7:56pm

If my classified ads website is not in the Alexa top 1M ranking, will I be unable to see it in HTTP Archive?

Correct, by default only the first 500,000 websites in Alexa’s top 1 million list are included.

HTTP Archive does have a way to add URLs that are not in the top 500K. Go to http://httparchive.org/addsite.php and enter your website’s URL to add it to future crawls. Be advised that this functionality may not work on the future beta site.
