I’m looking at open sourcing some code that compares the performance of several compression tools. It’s meant to be used as a repeatable benchmark. An important part of this is a static set of HTML, CSS, and JavaScript resources that serve as the test files to be compressed. These resources were scraped from real websites, so each is covered by its owner’s copyright.
So, I’m trying to figure out how I can make the static set of resources available alongside the source code for the project, without violating any copyrights.
It seems HTTP Archive is in a similar position: its source code can easily be open sourced, while the supporting data/content is copyrighted by third parties yet is still available from HTTP Archive.
What’s the trick to make this work?