The data set is HTTP Archive, which I query with BigQuery. It covers 1.3 million pages and their response bodies, including CSS/JS subresources, plus other data (e.g. which URLs trigger a particular Chrome use counter). You can set up an account and run queries like this yourself; Google charges for usage beyond 1 TB or so. More info here:
https://httparchive.org/faq#how-do-i-use-bigquery-to-write-custom-queries-over-the-data
I ran this query:
#standardSQL
SELECT
  Alexa_rank AS rank,
  r.page AS page
FROM
  `httparchive.response_bodies.2018_10_01_desktop` AS r
JOIN
  `httparchive.urls.20170315`
ON
  NET.REG_DOMAIN(r.page) = Alexa_domain
WHERE
  REGEXP_CONTAINS(r.body, r"(?i)-(webkit|moz)-appearance\s*:\s*menulist-textfield\b")
ORDER BY
  rank
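To check locally what the regular expression in the WHERE clause does and does not catch, here is a rough sketch in Python. It assumes that Python's `re` module and BigQuery's RE2 engine treat this particular pattern the same way. The `matches` helper and the sample strings are made up for illustration and are not taken from the data set.

```python
import re

# Same pattern as the one passed to REGEXP_CONTAINS in the query.
# Assumption: Python's `re` and BigQuery's RE2 agree on this syntax.
PATTERN = re.compile(
    r"(?i)-(webkit|moz)-appearance\s*:\s*menulist-textfield\b"
)


def matches(body: str) -> bool:
    """Return True if the pattern occurs anywhere in `body` (hypothetical helper)."""
    return PATTERN.search(body) is not None


# Hypothetical response-body snippets, invented for this sketch.
samples = [
    "input { -webkit-appearance: menulist-textfield; }",  # prefixed, matches
    "INPUT{-MOZ-APPEARANCE:MENULIST-TEXTFIELD}",          # case-insensitive
    "input { appearance: menulist-textfield; }",          # unprefixed
    "input { -webkit-appearance: menulist; }",            # different value
    "input { -webkit-appearance: menulist-textfields; }", # no word boundary
]

for css in samples:
    print(matches(css), css)
```

Only the vendor-prefixed forms are counted, so a page that uses the unprefixed `appearance` property alone would not show up in the results.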
Results: