Days per year in whole tables

Hi guys,

Correct me if I’m wrong, but I’ve checked how many days per year are in the tables (query below) and I was surprised that not every day is in the tables.

select year(date) year, count(*) count
from (
  SELECT date(SEC_TO_TIMESTAMP(startedDateTime)) date
  FROM (TABLE_QUERY([httparchive:runs], 'REGEXP_MATCH(table_id, r"^*.*requests$")'))
  GROUP BY date)
 group by year

What I’m doing wrong (if)?

If this is the case almost every query done before comparing by year is irrelevant!

Regards,

Sebastian

It’s not working for me. (_)
How about this?

select count(*) count, startedDateTime date
FROM [httparchive:runs.latest_requests]
GROUP BY date

The results are shown on the basis of days.

anyone can help me to make this issue cleared?

HTTP Archive crawls are done twice a month, roughly: one crawl starts at the start of the month, and another is kicked off on the 15th. Each run takes multiple days to complete since we’re crawling lots of URLs with finite resources - i.e. we can’t crawl everything instantaneously and in parallel. For more details, see: http://httparchive.org/about.php

As such, looking at the day of when a particular run for a given site was kicked off is basically meaningless. If you want to extrapolate trends, treat each YYYY_MM_DD_requests table as a snapshot for first or mid of a particular month and compare based on that.

1 Like

thanks, I was not sure about it

Let’s be more inclusive and not use “guys”. Words like “folks”, “group”, and “team” are better choices.

I updated the HA FAQ to explicitly mention the crawls are done on the 1st and 15th of each month: http://httparchive.org/about.php#datagathered

1 Like