How many resources return Last-Modified and/or ETag values?

SELECT revalidation, resource_count, ROUND(ratio*100) as percent FROM (
  SELECT revalidation, COUNT(url) as resource_count, RATIO_TO_REPORT(resource_count) OVER() ratio FROM (
    SELECT url, CASE
      WHEN resp_etag != "" AND resp_last_modified != "" THEN "etag + last-modified"
      WHEN resp_etag != "" THEN "etag only"
      WHEN resp_last_modified != "" THEN "last-modified only"
      ELSE "no revalidation"
      END revalidation
    FROM httparchive:runs.2014_11_01_requests
  ) GROUP BY revalidation
) ORDER BY percent DESC

Results for the Nov 1st, 2014 run:

  • 23% of resources do not return any revalidation values - no ETag or Last-Modified.
  • Very few resources use ETag-only (just 4%), whereas Last-Modified-only accounts for 30%.
  • Most resources return both Last-Modified and ETag values (43%).
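
For context, a validator is what lets a client turn a full download into a cheap conditional request once its cached copy goes stale. A minimal Python sketch of that mapping, assuming a hypothetical helper `conditional_headers` that receives the stored response headers as a plain dict (the helper name and the dict shape are illustrative, not part of any HTTP Archive tooling):

```python
def conditional_headers(stored):
    """Build revalidation request headers from a cached response's headers.

    `stored` is a hypothetical dict mapping response header names to values.
    """
    # Header names are case-insensitive, so normalize them before lookup.
    lowered = {name.lower(): value for name, value in stored.items()}
    request = {}
    if lowered.get("etag"):
        # An ETag value is sent back in If-None-Match.
        request["If-None-Match"] = lowered["etag"]
    if lowered.get("last-modified"):
        # A Last-Modified date is sent back in If-Modified-Since.
        request["If-Modified-Since"] = lowered["last-modified"]
    # An empty dict means no conditional request can be made.
    return request
```

With both headers present the client can send both conditions; with neither, the helper returns an empty dict and the only option left is to fetch the full body again.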


It’s of course GOOD to provide a validator, so the troubling part of these results is the 23% of responses that don’t return any revalidation values. Digging deeper into those, we see that about half (49%) have a Cache-Control header that dictates they must ALWAYS be re-fetched from origin, so revalidation is arguably less critical for them. About a quarter have no Cache-Control header at all - these are the ones the website owner really needs to focus on, as caching behavior is left to the browser implementation and the cached copy can’t be revalidated. The remaining quarter also need help: they ARE intended to be cached, but once the cache lifetime expires there is no way to revalidate them, so they have to be downloaded again in full.

SELECT cache_header, round(100*ratio) as percent, resource_count FROM (
SELECT count(*) as resource_count, RATIO_TO_REPORT(resource_count) OVER() ratio, 
       CASE WHEN resp_cache_control = "" THEN "no cache header"
            WHEN (resp_cache_control contains "no-cache" OR resp_cache_control contains "no-store" OR resp_cache_control contains "must-revalidate" OR resp_cache_control contains "max-age=0") THEN "explicit NO caching"
            ELSE "other"
       END as cache_header
FROM httparchive:runs.2014_11_01_requests
WHERE (resp_etag = "" AND resp_last_modified = "")
GROUP BY cache_header
) ORDER BY resource_count DESC
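
To spot-check individual responses against these buckets, the CASE expression can be mirrored outside BigQuery. The following is a rough Python sketch under a few assumptions: `cache_bucket` and `NO_CACHE_DIRECTIVES` are hypothetical names, the directive list uses the hyphenated `must-revalidate` spelling defined by HTTP, and the value is lowercased before matching, which the query itself does not do.

```python
# Directives treated as "explicit NO caching" (hypothetical constant name).
NO_CACHE_DIRECTIVES = ("no-cache", "no-store", "must-revalidate", "max-age=0")


def cache_bucket(cache_control):
    """Bucket a Cache-Control header value into the query's three labels."""
    # Treat a missing header (None) the same as an empty one.
    value = (cache_control or "").lower()
    if not value:
        return "no cache header"
    # Plain substring matching, mirroring the CONTAINS checks in the query.
    if any(directive in value for directive in NO_CACHE_DIRECTIVES):
        return "explicit NO caching"
    return "other"
```

For example, `cache_bucket("public, max-age=3600")` falls into "other", the group that is meant to be cached but cannot be revalidated once it expires.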