How to include only all .EU domains?

Hi Guys,

How can I include only, for example, all .eu domains when querying httparchive:runs.latest_requests in Google BigQuery?
The clause WHERE url LIKE 'http://%.eu%/' is not working.

BigQuery has some restrictions, one of them being no support for “wildcard” syntax in strings. That said, you can use regular expressions or CONTAINS to achieve the same results. Or, for this particular case:

SELECT COUNT(url) FROM [httparchive:runs.2014_11_15_pages]
WHERE TLD(url) = '.eu'
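
For comparison, here is a rough sketch of the regular-expression route mentioned above, written in BigQuery legacy SQL. The pattern is my own assumption about what counts as a ".eu" URL (http or https, a hostname ending in .eu, an optional port, then a slash), so treat it as a starting point rather than an exact equivalent of TLD():

```sql
-- Sketch only: count pages whose hostname ends in ".eu".
-- The regular expression below is an illustrative assumption, not part of the original answer.
SELECT COUNT(url)
FROM [httparchive:runs.2014_11_15_pages]
WHERE REGEXP_MATCH(url, r'^https?://[^/]+\.eu(:\d+)?/')
```

Anchoring the match to the hostname avoids counting URLs that merely contain ".eu" somewhere in the path, which a plain CONTAINS '.eu' check would also pick up.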

That returns 1028 hosts for me. Once you have the hosts you can join them against the requests table, etc. For the docs, head to:
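
The join against the requests table mentioned above might look roughly like this. It is a minimal sketch in legacy SQL that assumes the pages and requests tables of the same run share a pageid column and that the requests table is named 2014_11_15_requests; check the table schemas before relying on it:

```sql
-- Sketch only: count requests made by pages hosted on .eu domains.
-- Assumes both tables expose a pageid column (verify against the schema).
SELECT COUNT(req.url) AS eu_requests
FROM [httparchive:runs.2014_11_15_requests] AS req
JOIN (
  -- Pages whose top-level domain is .eu
  SELECT pageid
  FROM [httparchive:runs.2014_11_15_pages]
  WHERE TLD(url) = '.eu'
) AS eu_pages
ON req.pageid = eu_pages.pageid
```

Filtering the pages first keeps the right-hand side of the join small, so only requests belonging to .eu pages are counted, regardless of which domain each individual request went to.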

Let’s be more inclusive and not use “guys”. Words like “folks”, “group”, and “team” are better choices.