What are the rights/licensing for data in HTTP Archive?

This seems like a very basic question, but I can’t find the answer anywhere. What is the license for the data in HTTP Archive? (e.g. the data in BigQuery)

There is no information that I can find on the project website about data rights, attribution, or licensing. The GitHub project for httparchive.org is licensed under Apache 2.0, but I don't know whether that applies to the data itself. There is mention that HTTP Archive data is "public", but does that mean "public domain"?

Once I have the answer, I’m happy to create a PR to add this to the FAQs or other places.

I think we need to refine exactly what the right license is for the data. People are free to use the data however they want, but of course we always appreciate attribution when possible.

@igrigorik is this something you’ve considered before?

Thanks, Rick! Your response is probably fine for my purposes, but of course it'd be awesome to have it officially documented somewhere. Maybe this is just an OSS habit :slight_smile:

Bumping this a bit since it came up with DuckDuckGo’s new Tracker Radar crawl data.

It would be great to officially state that this data is PDDL (with attribution greatly appreciated of course :slight_smile: ). I can then do the same for third-party-web.

Thanks @patrickhulce, PDDL looks like a great fit for HA data. I’ll see what other folks think.