How Many Sites Are Still Using AppCache?


#1

The Application Cache has been deprecated and removed from the web standards. While some browsers still support it, that support is going away. For example, starting with Firefox 44, a console warning advises developers to use Service Workers instead. In Chrome v68, when an HTTP page loads with AppCache configured, the browser warns that v69 will restrict AppCache to secure contexts only.

And this evening I saw the intent to fully deprecate AppCache in Blink.

After reading this, I started wondering how many sites are still using AppCache today…

Finding AppCache Manifests in Response Bodies
The application cache is enabled by including a manifest attribute on a page's html element. That manifest then lists the resources that should be cached. I decided to see if I could craft a regular expression to both identify the presence of the manifest and extract the manifest filename. The following looked promising:

r'<html.*manifest=["](.*)["] '

I used that regular expression in the following query, which outputs a list of all URLs where such a manifest is defined. (Note: this query processes 2.5 TB of data, which exceeds the free tier by 1.5 TB.)

SELECT page, url, appcache_manifest
FROM (
  SELECT page, url,
         -- Capture the double-quoted manifest attribute value that follows <html
         REGEXP_EXTRACT(LOWER(body), r'<html.*manifest=["](.*)["] ') AS appcache_manifest
  FROM `httparchive.response_bodies.2018_08_01_desktop`
)
WHERE appcache_manifest IS NOT NULL

The results look like this, and you can see that cache.appcache seems to be a popular manifest name. I also found examples like cache.manifest, mainfest.appcache, offline.appcache and others.

I only found 268 pages out of 1,275,374 pages in the HTTP Archive that matched this pattern. 65 of them are HTTP pages, and 203 of them are HTTPS. That seemed low, so I decided to check another way.
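
One possible contributor to the low count (an assumption, not something confirmed against the data) is that the pattern only matches a double-quoted manifest value that is followed by a space. The minimal Python sketch below applies the same pattern to a few made-up HTML snippets. Python's re module stands in for BigQuery's RE2 engine here, and the sample strings and the extract_manifest helper are hypothetical.

```python
import re

# Same pattern as in the query above. Python's re is used in place of
# BigQuery's RE2; for this simple pattern the two should behave alike.
PATTERN = re.compile(r'<html.*manifest=["](.*)["] ')

def extract_manifest(body):
    """Return the captured manifest value, or None when the pattern does not match."""
    match = PATTERN.search(body.lower())
    return match.group(1) if match else None

# Hypothetical snippets, not taken from the HTTP Archive data.
samples = {
    "double quotes, more attributes follow": '<html manifest="cache.appcache" lang="en">',
    "double quotes, manifest is last": '<html lang="en" manifest="cache.appcache">',
    "single quotes": "<html manifest='cache.appcache' lang='en'>",
}

for label, html in samples.items():
    print(label, "->", extract_manifest(html))
```

Only the first snippet yields a manifest name. The other two return None, because the closing quote is followed by ">" rather than a space, or because the value uses single quotes. Pages written in either of those styles would be missed by the query.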

Lighthouse to the Rescue
Lighthouse has an audit that indicates whether AppCache is used. It also has a ServiceWorker audit, so I decided to give that a try. Here's a query that aggregates the scores for the AppCache and ServiceWorker audits and groups them by protocol. (Note: this query processes 197 GB of data.)

SELECT SUBSTR(url, 0, 5) AS p,  -- URL prefix used as the protocol: "http:" or "https"
       JSON_EXTRACT_SCALAR(report, "$.audits.appcache-manifest.score") AS appcache,
       JSON_EXTRACT_SCALAR(report, "$.audits.service-worker.score") AS serviceWorkers,
       COUNT(*) AS pages
FROM `httparchive.lighthouse.2018_08_01_mobile`
GROUP BY p, appcache, serviceWorkers

The results look like this:
[image: raw query results grouped by protocol, AppCache score, and ServiceWorker score]

When we clean up the output and display it in a cross-tabulated format, it's a lot easier to analyze. From the results below, we can see that there are 1,126 sites still using AppCache (and only 23 of them are using ServiceWorkers). 301 of the sites using AppCache are still serving content over HTTP, which means they will experience the deprecation first.

There are 14,844 sites that registered a service worker though, which is promising! The 346 sites trying to register service workers on HTTP pages are interesting, since a service worker will not run on insecure pages. Perhaps there's more to investigate here…
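
The clean-up step mentioned above, turning the grouped rows into a cross-tabulated view, could be sketched roughly as follows in Python. The row shape mirrors the query's output columns (p, appcache, serviceWorkers, and a count), but the counts are placeholders rather than HTTP Archive results, and the cross_tab and print_table helpers are hypothetical names. How the score values map to pass or fail depends on the Lighthouse version, so the sketch treats them as opaque strings.

```python
from collections import defaultdict

# Hypothetical rows in the shape the query returns:
# (p, appcache, serviceWorkers, count). Counts are placeholders only.
rows = [
    ("http:", "0", "0", 3),
    ("http:", "1", "0", 40),
    ("https", "0", "1", 2),
    ("https", "1", "0", 50),
    ("https", "1", "1", 10),
]

def cross_tab(rows):
    """Pivot grouped rows into {(appcache, serviceWorkers): {protocol: count}}."""
    table = defaultdict(dict)
    for protocol, appcache, service_workers, count in rows:
        # str() keeps the keys sortable even if a score comes back as NULL/None.
        table[(str(appcache), str(service_workers))][protocol] = count
    return dict(table)

def print_table(table, protocols=("http:", "https")):
    """Print one line per score combination, with one column per protocol."""
    header = " ".join(f"{p:>8}" for p in protocols)
    print(f"{'appcache':>8} {'sw':>4} {header}")
    for (appcache, sw), counts in sorted(table.items()):
        cells = " ".join(f"{counts.get(p, 0):>8}" for p in protocols)
        print(f"{appcache:>8} {sw:>4} {cells}")

print_table(cross_tab(rows))
```

Each output line is one combination of audit scores, with the HTTP and HTTPS counts side by side, which makes it easier to compare protocols than the flat grouped list.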


#2

Thank you very much for your analysis, Paul!

Love the format, great writing! I’m also very grateful for the queries – we’ll definitely draw inspiration from your work when we start looking into usage.


#3

Really interesting, I thought more sites were using AppCache. Maybe most of them already migrated.

It looks like I now have to migrate https://play.esviji.com/ from its old-fashioned manifest.appcache to a Service Worker with the Cache API.

Deprecating things that used to work puts pressure on little pet projects I might not have time to update…