Ads influence on a site

Hey all, I am enjoying all of the data in this amazing archive, but I am afraid there is a lot of ad “noise” in the data. I guess the data includes all the network calls for the ads and banners running on the domains, and they can take up some 40% of the site’s performance (e.g., one banner can have a “waterfall” of ads inside its JS and call around 10 different servers until it gets an ad).
So, a few questions:
1- Does the data include network calls from ad technologies?
2- How can one make a “clean” test/query and understand the real performance of the web (without the ads)?

Thanks

  1. Yes.
  2. Tricky. As in, you’d have to figure out which requests are “ad requests” and then account for those in your analysis. Practically speaking, for more reliable results you’d want to run your own crawl with your own logic to block such requests… which we don’t do (and we have no plans to do so).
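
For the first part (figuring out which requests are “ad requests” and accounting for them), here is a rough Python sketch. It assumes you have already exported a page’s requests as (URL, bytes) pairs. The function names and the tiny host list are hypothetical and for illustration only; a real analysis would need a maintained blocklist. Note that this only separates out ad bytes after the fact. It cannot tell you how the page would have loaded without the ads.

```python
from urllib.parse import urlsplit

# Hypothetical, deliberately tiny list of ad-serving domains (illustration only).
AD_HOST_SUFFIXES = ("doubleclick.net", "googlesyndication.com", "adnxs.com")


def is_ad_request(url):
    """Return True if the URL's host is a listed ad domain or a subdomain of one."""
    host = (urlsplit(url).hostname or "").lower()
    return any(host == s or host.endswith("." + s) for s in AD_HOST_SUFFIXES)


def split_page_weight(requests):
    """Sum bytes for ad and non-ad requests.

    `requests` is an iterable of (url, size_in_bytes) pairs for one page.
    """
    ad_bytes = 0
    total_bytes = 0
    for url, size in requests:
        total_bytes += size
        if is_ad_request(url):
            ad_bytes += size
    return {
        "total_bytes": total_bytes,
        "ad_bytes": ad_bytes,
        "non_ad_bytes": total_bytes - ad_bytes,
        # Share of page weight attributed to ads; 0.0 for an empty page.
        "ad_share": ad_bytes / total_bytes if total_bytes else 0.0,
    }


if __name__ == "__main__":
    # Made-up sample data, not real measurements.
    sample = [
        ("https://example.com/index.html", 50000),
        ("https://securepubads.g.doubleclick.net/tag/js/gpt.js", 30000),
        ("https://cdn.example.com/app.js", 20000),
    ]
    print(split_page_weight(sample))
```

Matching on the host suffix (rather than a plain substring) avoids counting unrelated hosts whose names merely contain an ad domain.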

Thanks a lot for the swift reply!
(BTW, I still haven’t run any queries in “httparchive”, so I am not familiar with its capabilities. Excuse me for any funny questions :slight_smile: )

3- Do you think I can spot network calls from ad blockers, in order to select only pages loaded with an ad blocker? I think I will suggest this query in suggestions.