How is copyright handled for HAR file content?

I’m looking at open sourcing some code that compares the performance of some compression tools. It’s meant to be used as a repeatable benchmark. An important part of this is a static set of HTML, CSS, and JavaScript resources that are used as the test files to be compressed. These resources have been scraped from real websites, and are therefore covered by their owners’ copyrights.

So, I’m trying to figure out how I can make the static set of resources available alongside the source code for the project, without violating any copyrights.

It seems HTTP Archive is in a similar position: source code that can easily be open sourced, alongside supporting data/content that is copyrighted by third parties yet still available from HTTP Archive.

What’s the trick to make this work?

You probably actually want a lawyer, but using a collection of MIT/BSD-licensed frameworks or their generated documentation sites as the test files might suffice.

Same disclaimer and advice as @aranjedeath.

Another strategy to consider: instead of freezing a particular snapshot, provide code to retrieve the data? E.g. it could crawl sites and snapshot resources; it could fetch a snapshot from HTTP Archive and extract relevant files from that, etc.
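For the "extract relevant files" part, here is a rough sketch of what that could look like in Python. It assumes you already have a HAR file on disk and that it includes response bodies (not every capture does); per the HAR format, bodies live under `log.entries[].response.content.text`, optionally base64-encoded. The function name `extract_resources` and the list of MIME types are my own choices for illustration, not anything HTTP Archive provides.

```python
import base64
import json

# MIME types treated as benchmark inputs (an assumption; adjust as needed).
WANTED_TYPES = (
    "text/html",
    "text/css",
    "text/javascript",
    "application/javascript",
)


def extract_resources(har):
    """Return {url: body_bytes} for HTML, CSS, and JavaScript responses in a parsed HAR."""
    resources = {}
    for entry in har.get("log", {}).get("entries", []):
        content = entry.get("response", {}).get("content", {})
        # Drop parameters such as "; charset=utf-8" before comparing.
        mime = content.get("mimeType", "").split(";")[0].strip().lower()
        text = content.get("text")
        if mime not in WANTED_TYPES or text is None:
            continue  # wrong type, or the capture did not store the body
        if content.get("encoding") == "base64":
            body = base64.b64decode(text)
        else:
            body = text.encode("utf-8")
        resources[entry["request"]["url"]] = body
    return resources


def load_har(path):
    """Parse a HAR file (plain JSON) from disk."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

You would call it as something like `extract_resources(load_har("snapshot.har"))`, where `snapshot.har` is a placeholder filename. That way the repository ships only the script and a list of URLs or snapshot identifiers, and each user rebuilds the test set locally. The trade-off is repeatability: live sites change, so pinning to a specific archived snapshot matters if the benchmark results need to be comparable.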