HTTP Archive does use the Alexa Top 1M websites to seed its list of pages to crawl. However, of those 1M we only use the first 500K, because that's all we can fit in the two-week window between crawls. So why are you seeing 470K instead of 500K? Approximately 30K tests fail in each crawl, due to an error either in WebPageTest or on the website itself. See issue #115, "Handle errors", in the HTTPArchive/legacy.httparchive.org repository on GitHub to learn more about these errors and how we plan to mitigate them.
If my classified ads website is not in Alexa's top 1 million, does that mean I won't be able to see it on HTTP Archive?
Correct. By default, only the first 500,000 websites in Alexa's top 1 million list are included.
However, HTTP Archive does have a way to add URLs that are not in the top 500K. Go to http://httparchive.org/addsite.php and enter your website's URL to add it to future crawls. Be advised that this feature may not be available on the upcoming beta site.
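If you have a copy of the Alexa top 1M list, you can roughly check whether a site falls inside the default 500,000 cutoff yourself. The sketch below assumes the list is a headerless CSV of `rank,domain` lines, which was the layout of Alexa's top-1m.csv download. The function names and the sample domain `example-classifieds.com` are hypothetical. It only checks rank, so it does not account for sites added manually or for tests that fail during a crawl.

```python
import csv
from typing import Iterable, Optional

# Default crawl size described above: the first 500,000 entries of the Alexa list.
DEFAULT_CUTOFF = 500_000


def alexa_rank(rows: Iterable[str], domain: str) -> Optional[int]:
    """Return the rank of `domain` from headerless 'rank,domain' CSV lines, or None if absent."""
    target = domain.strip().lower()
    for row in csv.reader(rows):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        rank_str, site = row[0], row[1]
        if site.strip().lower() == target:
            return int(rank_str)
    return None


def in_default_crawl(rows: Iterable[str], domain: str, cutoff: int = DEFAULT_CUTOFF) -> bool:
    """True if `domain` is listed at a rank no greater than `cutoff`."""
    rank = alexa_rank(rows, domain)
    return rank is not None and rank <= cutoff


if __name__ == "__main__":
    # Hypothetical in-memory sample; with a real file you would pass
    # open("top-1m.csv", newline="") as `rows` instead.
    sample = ["1,google.com", "2,youtube.com", "612345,example-classifieds.com"]
    for name in ("google.com", "example-classifieds.com", "not-listed.example"):
        print(name, in_default_crawl(sample, name))
```

A site ranked below 500,000 (or not listed at all) would need to be added through the form mentioned above to appear in future crawls.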