Just after seeing @igrigorik’s tweet about Chrome 36 and Mixed Content Blocking, I wondered how many websites were serving Mixed Content (a firstHtml page loaded over HTTPS that then fetches resources over insecure HTTP).
I wanted to gauge the interest in adding a rule to detect such an issue with www.dareboost.com (a website testing tool).
I started by reading up on current browser policies; here are some useful links if you want to know more:
An article about the Firefox policy, with useful background: https://blog.mozilla.org/tanvi/2013/04/10/mixed-content-blocking-enabled-in-firefox-23/
You can test your current browser's policy here: https://www.ssllabs.com/ssltest/viewMyClient.html
I first searched the pages table for entries whose url field starts with https. There were very few of them (18!). So I looked directly in the requests table instead, for requests with a true firstHtml field and a url starting with https. I found 9185 entries.
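For reference, a minimal sketch of that requests-table lookup (legacy BigQuery SQL, against the same tables as the full query further down):

```sql
# Count firstHtml requests served over HTTPS
SELECT COUNT(requests.pageid) AS https_firsthtml_count
FROM [httparchive:runs.latest_requests] requests
WHERE requests.firstHtml = 1
  AND requests.url LIKE ("https%")
```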
Next, for all the matching pageids of these HTTPS firstHtml requests, I looked for requests that were not HTTPS, excluding 301 and 302 status codes so that the following redirect scheme is not detected as Mixed Content:
http://example.com — 301|302 -----> https://example.com
SELECT pages.pageid, pages.url, pages.rank, COUNT(requests.url) AS resource_count
FROM [httparchive:runs.latest_requests] requests
JOIN EACH (
  SELECT url, rank, pageid FROM [httparchive:runs.latest_pages]
) pages ON pages.pageid = requests.pageid
WHERE requests.status != 301 AND requests.status != 302
AND requests.pageid IN (
  # Here we have the pageids for which firstHtml is an HTTPS resource - 9185 entries
  SELECT pages.pageid FROM [httparchive:runs.latest_requests] requests
  JOIN EACH (
    SELECT rank, pageid FROM [httparchive:runs.latest_pages]
  ) pages ON pages.pageid = requests.pageid
  WHERE requests.firstHtml = 1 AND requests.url LIKE ("https%")
)
AND requests.url NOT LIKE ("https%")
GROUP BY pages.pageid, pages.url, pages.rank
ORDER BY pages.rank ASC;
Among the 9185 home pages served over SSL, it shows that 1031 are using Mixed Content (11%).
As I found websites such as paypal.jp or usbank.com in the results, I first thought I had made a mistake, so I manually checked some HARs downloaded from httparchive.org… That confirmed the Mixed Content issues.
But it still disappoints me that there are so many mixed content websites. Did I miss something?
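Besides downloading HARs, a single suspicious site can also be double-checked directly in BigQuery. A sketch (the LIKE pattern on pages.url is an assumption for illustration; adjust it to the exact URL stored in the pages table):

```sql
# List the insecure resources fetched by one HTTPS page
SELECT requests.url, requests.status
FROM [httparchive:runs.latest_requests] requests
JOIN EACH (
  SELECT pageid, url FROM [httparchive:runs.latest_pages]
  WHERE url LIKE ("https://www.usbank.com%")
) pages ON pages.pageid = requests.pageid
WHERE requests.url NOT LIKE ("https%")
  AND requests.status != 301 AND requests.status != 302
```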
NB:
- I make no distinction between active and passive mixed content.
- Some cases are probably excluded due to the WHERE clause on the status field, but I did not find another simple way to exclude false positives caused by requests with a non-SSL redirect occurring before reaching firstHtml pages over SSL.
- I also found that for some websites the issues have since been fixed. It would be interesting to compare the current results with the next run's.