SELECT policy, COUNT(policy) as policy_count,
RATIO_TO_REPORT(policy_count) OVER() as ratio FROM (
SELECT resp_cache_control, resp_expires,
CASE
WHEN resp_cache_control CONTAINS "no-store"
THEN "no-store"
WHEN resp_cache_control CONTAINS "no-cache"
OR resp_cache_control CONTAINS "max-age=0"
OR resp_expires = "-1"
THEN "non-cacheable"
WHEN LENGTH(resp_cache_control) = 0
AND LENGTH(resp_expires) = 0
THEN "undefined"
ELSE "cacheable"
END as policy
FROM [runs.latest_requests]
WHERE firstHtml = true
) GROUP BY policy
ORDER BY policy_count DESC
For 2014-06-15 run, the stats are:

Note that “non-cacheable” means “no-store” was not specified, but the reverse does not hold.
Hi, Ilya. Can you clarify what you’re trying to measure? I see “firstHtml = true”. Is this the caching policies only for the main HTML document? If so, the high percentage of “no-store” and “non-cacheable” is understandable. The high percentage of “undefined” is bad - it’s better to specify explicit caching headers rather than leave it up to browser heuristics (which vary by browser).
Yep, good catch, updated the title of the post. Was trying to get a feel for CC policies of the HTML documents, and I’m with you on relying on heuristic caching.
A deeper look at the docs that do specify the max-age:
SELECT
INTEGER(REGEXP_EXTRACT(resp_cache_control, r'max-age=(\d+)')) age,
count(pageid) cnt,
RATIO_TO_REPORT(cnt) OVER() as ratio
FROM [httparchive:runs.latest_requests]
WHERE firstHtml = true
AND resp_cache_control CONTAINS 'max-age'
GROUP BY age
HAVING cnt > 500
ORDER BY age asc
50% mark the doc as non-cacheable, followed by spikes at 3s (hmm), 10m, 1hr, and 1 day.