SELECT INTEGER(REGEXP_EXTRACT(resp_cache_control, r'max-age=(\d+)')) age, count(pageid) cnt
FROM [httparchive:runs.latest_requests_mobile]
GROUP BY age
HAVING cnt > 500
ORDER BY cnt desc
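For anyone who wants to check what that regex picks up before running it on the full table, here is a rough local sketch in Python. It is not part of the query; the helper name `extract_max_age` and the sample header values are made up for illustration, and it only assumes the same `max-age=(\d+)` pattern used above.

```python
import re

# Same pattern as in the query above: capture the digits after "max-age=".
MAX_AGE_RE = re.compile(r"max-age=(\d+)")


def extract_max_age(cache_control):
    """Return max-age in seconds, or None when the header has no max-age."""
    if not cache_control:
        return None
    match = MAX_AGE_RE.search(cache_control)
    return int(match.group(1)) if match else None


# Hypothetical sample Cache-Control values, for illustration only.
samples = ["public, max-age=31536000", "no-cache", "max-age=0, must-revalidate", None]
ages = [extract_max_age(s) for s in samples]
```

Headers without a max-age directive come back as `None`, which corresponds to the NULL group in the query results.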
Based on the Aug 15, 2013 data (mobile): 40% of the requests don't specify max-age, and for the ones that do:
SELECT bucket, total, ROUND(ratio*100) as percent FROM (
SELECT bucket, SUM(CNT) total, RATIO_TO_REPORT(total) OVER() ratio FROM (
SELECT cnt, CASE
WHEN age < 60 THEN "less than 60s"
WHEN age >= 60 AND age < 60*10 THEN "1-10min"
WHEN age >= 60*10 AND age < 60*60 THEN "10min-1hr"
WHEN age >= 60*60 AND age < 60*60*24 THEN "1hr-1d"
WHEN age >= 60*60*24 AND age < 60*60*24*30 THEN "1d-30d"
WHEN age >= 60*60*24*30 AND age < 60*60*24*365 THEN "30d-365d"
WHEN age >= 60*60*24*365 THEN "year+"
ELSE "missing"
END bucket
FROM (
SELECT INTEGER(REGEXP_EXTRACT(resp_cache_control, r'max-age=(\d+)')) age, count(pageid) cnt
FROM [httparchive:runs.latest_requests_mobile]
GROUP BY age
)
) GROUP BY bucket
) ORDER BY percent desc
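The bucketing is easy to get subtly wrong at the edges, so here is a small Python sketch of the same idea that can be checked locally. It is only an illustration: the names `bucket_for` and `bucket_shares` are hypothetical, and it assumes half-open ranges so that an age sitting exactly on a boundary (60 seconds, one day, and so on) still lands in a bucket instead of falling through to "missing".

```python
from collections import Counter

MINUTE, HOUR, DAY = 60, 60 * 60, 60 * 60 * 24

# (exclusive upper bound in seconds, label), checked in ascending order.
BUCKETS = [
    (MINUTE, "less than 60s"),
    (10 * MINUTE, "1-10min"),
    (HOUR, "10min-1hr"),
    (DAY, "1hr-1d"),
    (30 * DAY, "1d-30d"),
    (365 * DAY, "30d-365d"),
]


def bucket_for(age):
    """Map an age in seconds (or None) to a bucket label."""
    if age is None:
        return "missing"
    for upper, label in BUCKETS:
        if age < upper:
            return label
    return "year+"


def bucket_shares(age_counts):
    """Sum counts per bucket and return each bucket's rounded percentage."""
    totals = Counter()
    for age, cnt in age_counts:
        totals[bucket_for(age)] += cnt
    grand_total = sum(totals.values())
    return {label: round(100 * total / grand_total) for label, total in totals.items()}
```

Feeding it `(age, cnt)` pairs shaped like the inner query's output gives the per-bucket percentages, which is roughly what the SUM plus RATIO_TO_REPORT step does.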
What's the difference between the “expAge” field and max-age the way you extract it here?
The reason I ask is that if I use expAge instead of max-age as follows:
SELECT bucket, total, ROUND(ratio*100) as percent FROM (
SELECT bucket, SUM(CNT) total, RATIO_TO_REPORT(total) OVER() ratio FROM (
SELECT cnt, CASE
WHEN age < 60 THEN "less than 60s"
WHEN age >= 60 AND age < 60*10 THEN "1-10min"
WHEN age >= 60*10 AND age < 60*60 THEN "10min-1hr"
WHEN age >= 60*60 AND age < 60*60*24 THEN "1hr-1d"
WHEN age >= 60*60*24 AND age < 60*60*24*30 THEN "1d-30d"
WHEN age >= 60*60*24*30 AND age < 60*60*24*365 THEN "30d-365d"
WHEN age >= 60*60*24*365 THEN "year+"
ELSE "missing"
END bucket
FROM (
SELECT expAge age, count(pageid) cnt
FROM [httparchive:runs.latest_requests_mobile]
GROUP BY age
)
) GROUP BY bucket
) ORDER BY percent desc
Mobile
Desktop
Or is it that the data has changed since the last time you ran these queries? So I ran the same query on the August 2013 data set.
You should run the same query on the Aug 2013 dataset to see if there have been any big changes.
AFAIK, the expAge column tries to approximate freshness based on the HTTP heuristic caching algorithm, which is why the numbers would be higher (max-age is just one, albeit preferred, way to specify resource freshness). For more details, see: http://httparchive.org/about.php#charts (under Uncacheable resources).
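To make the difference concrete, here is a hedged Python sketch of how a freshness lifetime can be derived when max-age is absent. This is not HTTP Archive's actual expAge code, which I have not checked; it just follows the general HTTP caching rules: use max-age if present, otherwise Expires minus Date, otherwise a heuristic of 10% of the time since Last-Modified. The function name `freshness_lifetime`, the lowercase-keyed header dict, and the 10% fraction are assumptions for illustration, and it ignores directives such as no-store and no-cache.

```python
import re
from email.utils import parsedate_to_datetime


def _seconds_between(later, earlier):
    """Seconds from `earlier` to `later` (HTTP date strings), or None if unparseable."""
    try:
        delta = parsedate_to_datetime(later) - parsedate_to_datetime(earlier)
    except (TypeError, ValueError):
        return None
    return int(delta.total_seconds())


def freshness_lifetime(headers):
    """Approximate freshness lifetime in seconds from a dict of lowercase header names."""
    # 1. An explicit max-age wins.
    match = re.search(r"max-age=(\d+)", headers.get("cache-control", ""))
    if match:
        return int(match.group(1))
    date = headers.get("date")
    # 2. Otherwise Expires minus Date; an invalid Expires counts as already expired.
    if date and "expires" in headers:
        seconds = _seconds_between(headers["expires"], date)
        return max(0, seconds) if seconds is not None else 0
    # 3. Otherwise a heuristic: 10% of the time since Last-Modified (assumed fraction).
    if date and "last-modified" in headers:
        seconds = _seconds_between(date, headers["last-modified"])
        return max(0, seconds // 10) if seconds is not None else 0
    return 0
```

A response with no Cache-Control at all can still get a non-zero lifetime through steps 2 and 3, which would explain why bucketing on expAge shows fewer "missing" entries than bucketing on max-age alone.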