Hi guys,
Correct me if I’m wrong, but I checked how many distinct days per year appear in the tables (query below), and I was surprised to find that not every day is covered.
SELECT YEAR(date) year, COUNT(*) count
FROM (
  SELECT DATE(SEC_TO_TIMESTAMP(startedDateTime)) date
  FROM (TABLE_QUERY([httparchive:runs], 'REGEXP_MATCH(table_id, r"^.*requests$")'))
  GROUP BY date)
GROUP BY year
What am I doing wrong, if anything?
If this really is the case, almost every earlier query that compares by year is unreliable!
Regards,
Sebastian
It’s not working for me.
How about this?
select count(*) count, startedDateTime date
FROM [httparchive:runs.latest_requests]
GROUP BY date
The results are shown on the basis of days.
Can anyone help me clear up this issue?
HTTP Archive crawls are done twice a month, roughly: one crawl starts at the start of the month, and another is kicked off on the 15th. Each run takes multiple days to complete since we’re crawling lots of URLs with finite resources - i.e. we can’t crawl everything instantaneously and in parallel. For more details, see: http://httparchive.org/about.php
As such, the day on which a particular run for a given site was kicked off is basically meaningless. If you want to extrapolate trends, treat each YYYY_MM_DD_requests table as a snapshot for the start or the middle of a particular month, and compare on that basis.
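To illustrate the snapshot idea, here is a minimal Python sketch (not part of HTTP Archive's tooling) that reads the crawl date out of table names following the YYYY_MM_DD_requests pattern and counts snapshots per year. The table names in the example list and the helper names `snapshot_date` and `snapshots_per_year` are made up for illustration.

```python
import re
from collections import Counter
from datetime import date

# Matches names such as "2014_06_15_requests" (pattern taken from the thread).
TABLE_RE = re.compile(r"^(\d{4})_(\d{2})_(\d{2})_requests$")


def snapshot_date(table_id):
    """Return the crawl date encoded in a table name, or None if it does not match."""
    match = TABLE_RE.match(table_id)
    if match is None:
        return None
    year, month, day = map(int, match.groups())
    return date(year, month, day)


def snapshots_per_year(table_ids):
    """Count distinct crawl snapshots per year, ignoring non-matching tables."""
    dates = {d for d in map(snapshot_date, table_ids) if d is not None}
    counts = Counter(d.year for d in dates)
    return dict(sorted(counts.items()))


# Hypothetical table names, for illustration only.
tables = [
    "2014_06_01_requests",
    "2014_06_15_requests",
    "2015_01_01_requests",
    "2015_01_01_pages",   # not a requests table, skipped
    "latest_requests",    # no date in the name, skipped
]
print(snapshots_per_year(tables))
```

With roughly two crawls a month, a full year should come out at about 24 snapshots rather than 365 days, which is why counting distinct days per year looks sparse.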
Thanks, I was not sure about it.
Let’s be more inclusive and not use “guys”. Words like “folks”, “group”, and “team” are better choices.
I updated the HA FAQ to explicitly mention the crawls are done on the 1st and 15th of each month: http://httparchive.org/about.php#datagathered