I'm using the tables from the June 15, 2017 run and running into some strange numbers.
For example, when I run
select count(*) from httparchive:runs.2017_06_15_pages
I get 474,696 rows. To see whether any pageids were duplicated, I then ran
select count(distinct(pageid)) from httparchive:runs.2017_06_15_pages
and received 498,599 as the answer. My understanding is that there is one pageid per row.
I'm pretty stuck at this point. The second query shouldn't return a number larger than the first, at least based on my understanding from the docs. What am I missing?