Missing: number of requested domains per page


#1

I can’t find information about the number of distinct domains requested by a page, which used to be the most interesting data point for me here. Is this information gone, or have I overlooked it?


#2

In the shuffle of launching the beta website, the legacy website is temporarily housed at legacy.httparchive.org, which is where you can find older metrics like the number of domains: https://legacy.httparchive.org/trends.php#numDomains&maxDomainReqs
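To make the two metrics in that link concrete, here is a minimal sketch of how they could be computed from one page's list of request URLs. The names `numDomains` and `maxDomainReqs` come from the trends URL. The definitions used here are an assumption, not HTTP Archive's actual code: a "domain" is taken to be the request's hostname, and the second metric is the largest number of requests made to any single hostname. The function name `domain_stats` is hypothetical.

```python
from collections import Counter
from urllib.parse import urlsplit


def domain_stats(request_urls):
    """Return (num_domains, max_domain_reqs) for one page's requests.

    Assumption: a "domain" is the URL's hostname (urlsplit lower-cases it).
    URLs without a host, such as data: URIs, are ignored.
    """
    hosts = (urlsplit(url).hostname for url in request_urls)
    counts = Counter(host for host in hosts if host)
    num_domains = len(counts)                          # distinct hostnames
    max_domain_reqs = max(counts.values(), default=0)  # busiest single hostname
    return num_domains, max_domain_reqs


requests = [
    "https://www.example.com/",
    "https://www.example.com/app.js",
    "https://cdn.example.net/lib.js",
    "https://www.google-analytics.com/analytics.js",
]
print(domain_stats(requests))  # (3, 2)
```

Grouping by hostname rather than by registrable domain means `www.example.com` and `cdn.example.com` count as two domains. A registrable-domain count would need a Public Suffix List lookup, which this sketch does not attempt.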

Curious to hear what you use the data for.


#3

There it is, thanks! I really like the new layout and hope you will include this stat.

I’m half developer, half tech journalist. I think including content from many different domains is a serious privacy problem that many developers don’t seem to recognize, so I have used this number (which I find shockingly high) in several stories and news pieces I wrote to raise awareness.


#4

I think it would be really great if there were a flag noting the nature of each domain: which ones are considered third-party, and what type they are (JS libraries, Advertising, Analytics, Fraud prevention, Social media, Live Chat, etc.).

Google already has a starting point for a database of known domains. Could this be worked into a future version of WPT so that these domains get flagged in the JSON content?

We could run a query that looks up each domain in some sort of reference table, but that would be an expensive query just to do this sort of basic analysis of the composition of the sites being crawled.

See:

https://developers.google.com/web/updates/2017/05/devtools-release-notes#badges

https://stackoverflow.com/questions/45720950/how-do-chrome-devtools-3rd-party-badges-get-added

Examples:

native.sharethrough.com is Advertising

*.outbrain.com is Advertising

*.livefyre.com is Advertising

(cdn3.optimizely.com|cdn.optimizely.com) is CX optimization

mouseflow.com is CX optimization

*.howtank.com is Live Chat

*.providesupport.com is Live Chat

*.livestatserver.com is Analytics

*.pardot.com is Marketing automation

*.optnmnstr.com is Abandonment Detection / eCommerce

*.riskified.com is Fraud detection
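As a rough illustration of the reference-table idea, here is a minimal sketch that classifies the hostnames a page requests using the example patterns above. Everything in it is hypothetical: the table name, the function names, and the matching rule. It is not how DevTools badges or WPT work. It assumes that `*.example.com` matches any subdomain of `example.com` but not the bare domain, that any other pattern must match the hostname exactly, and that the `(cdn3.optimizely.com|cdn.optimizely.com)` alternation is split into two exact entries.

```python
from urllib.parse import urlsplit

# Hypothetical reference table built from the examples above.
# "*.example.com" matches any subdomain of example.com (not example.com itself);
# any other pattern must equal the hostname exactly.
THIRD_PARTY_PATTERNS = [
    ("native.sharethrough.com", "Advertising"),
    ("*.outbrain.com", "Advertising"),
    ("*.livefyre.com", "Advertising"),
    ("cdn3.optimizely.com", "CX optimization"),
    ("cdn.optimizely.com", "CX optimization"),
    ("mouseflow.com", "CX optimization"),
    ("*.howtank.com", "Live Chat"),
    ("*.providesupport.com", "Live Chat"),
    ("*.livestatserver.com", "Analytics"),
    ("*.pardot.com", "Marketing automation"),
    ("*.optnmnstr.com", "Abandonment Detection / eCommerce"),
    ("*.riskified.com", "Fraud detection"),
]


def classify_host(hostname, patterns=THIRD_PARTY_PATTERNS):
    """Return the category of the first pattern matching hostname, or None."""
    host = hostname.lower().rstrip(".")
    for pattern, category in patterns:
        if pattern.startswith("*."):
            # pattern[1:] is e.g. ".outbrain.com", so only subdomains match.
            if host.endswith(pattern[1:]):
                return category
        elif host == pattern:
            return category
    return None


def classify_requests(request_urls, patterns=THIRD_PARTY_PATTERNS):
    """Map each distinct hostname in a page's requests to a category or None."""
    hosts = {urlsplit(url).hostname for url in request_urls}
    hosts.discard(None)  # drop host-less URLs such as data: URIs
    return {host: classify_host(host, patterns) for host in sorted(hosts)}


print(classify_requests([
    "https://www.example.com/",
    "https://widgets.outbrain.com/outbrain.js",
    "https://beacon.riskified.com/beacon.js",
]))
```

A linear scan is fine for a dozen patterns. A real list with thousands of entries would more likely store exact hosts in a dict and check each suffix of the hostname against it, which keeps per-request lookups cheap when tagging every request in a crawl.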


#5

@LesMurphy, that reminds me that we have an open feature request for something very similar: https://github.com/HTTPArchive/httparchive.org/issues/12. Would you be interested in contributing in any way?


#6

Somehow I missed seeing that post. See my reply there. Thanks!