What is the Cache-Control max-age distribution?

SELECT INTEGER(REGEXP_EXTRACT(resp_cache_control, r'max-age=(\d+)')) age, count(pageid) cnt
FROM [httparchive:runs.latest_requests_mobile]
GROUP BY age
HAVING cnt > 500
ORDER BY cnt desc

Based on the Aug 15, 2013 data (mobile), 40% of requests don’t specify max-age. For the ones that do:

SELECT bucket, total, ROUND(ratio*100) as percent FROM (
  SELECT bucket, SUM(CNT) total, RATIO_TO_REPORT(total) OVER() ratio FROM (
    SELECT cnt, CASE
      WHEN age < 60                                 THEN "less than 60s"
      WHEN age >= 60 AND age < 60*10                  THEN "1-10min"
      WHEN age >= 60*10 AND age < 60*60               THEN "10min-1hr"
      WHEN age >= 60*60 AND age < 60*60*24            THEN "1hr-1d"
      WHEN age >= 60*60*24 AND age < 60*60*24*30      THEN "1d-30d"
      WHEN age >= 60*60*24*30 AND age < 60*60*24*365  THEN "30d-365d"
      WHEN age >= 60*60*24*365                        THEN "year+"
      ELSE "missing"
      END bucket
    FROM (
      SELECT INTEGER(REGEXP_EXTRACT(resp_cache_control, r'max-age=(\d+)')) age, count(pageid) cnt
      FROM [httparchive:runs.latest_requests_mobile]
      GROUP BY age
    )
  ) GROUP BY bucket
) ORDER BY percent desc
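
For anyone who wants to sanity-check the bucketing outside BigQuery, here is a rough Python sketch of the same idea: pull max-age out of a Cache-Control value with the same regex, then assign it to a bucket. The helper names (`extract_max_age`, `bucket_for`, `distribution`) are made up for illustration, and it assumes half-open ranges, so boundary values such as 60, 3600 or 86400 go into the higher bucket.

```python
import re
from collections import Counter

# Same pattern as the REGEXP_EXTRACT call in the query.
MAX_AGE_RE = re.compile(r"max-age=(\d+)")

MINUTE, HOUR, DAY = 60, 60 * 60, 60 * 60 * 24

# (exclusive upper bound in seconds, label), checked in order.
# Assumption: a value equal to a bound belongs to the next bucket up.
BUCKETS = [
    (MINUTE, "less than 60s"),
    (10 * MINUTE, "1-10min"),
    (HOUR, "10min-1hr"),
    (DAY, "1hr-1d"),
    (30 * DAY, "1d-30d"),
    (365 * DAY, "30d-365d"),
]


def extract_max_age(cache_control):
    """Return max-age in seconds, or None if the header has no max-age."""
    if not cache_control:
        return None
    match = MAX_AGE_RE.search(cache_control)
    return int(match.group(1)) if match else None


def bucket_for(age):
    """Map an age in seconds (or None) to a bucket label."""
    if age is None:
        return "missing"
    for upper, label in BUCKETS:
        if age < upper:
            return label
    return "year+"


def distribution(headers):
    """Return {bucket: rounded percent} for an iterable of header values."""
    counts = Counter(bucket_for(extract_max_age(h)) for h in headers)
    total = sum(counts.values())
    return {bucket: round(100 * n / total) for bucket, n in counts.items()}
```

For example, `bucket_for(extract_max_age("public, max-age=3600"))` lands in "1hr-1d", and a header with no max-age lands in "missing".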

Mobile:

Desktop:


What's the difference between the “expAge” field and max-age as you extract it here?

The reason I ask is that I get different results if I use expAge instead of max-age, as follows:

SELECT bucket, total, ROUND(ratio*100) as percent FROM (
  SELECT bucket, SUM(CNT) total, RATIO_TO_REPORT(total) OVER() ratio FROM (
	SELECT cnt, CASE
	  WHEN age < 60                                 THEN "less than 60s"
	  WHEN age >= 60 AND age < 60*10                  THEN "1-10min"
	  WHEN age >= 60*10 AND age < 60*60               THEN "10min-1hr"
	  WHEN age >= 60*60 AND age < 60*60*24            THEN "1hr-1d"
	  WHEN age >= 60*60*24 AND age < 60*60*24*30      THEN "1d-30d"
	  WHEN age >= 60*60*24*30 AND age < 60*60*24*365  THEN "30d-365d"
	  WHEN age >= 60*60*24*365                        THEN "year+"
	  ELSE "missing"
	  END bucket
	FROM (
	  SELECT expAge age, count(pageid) cnt
	  FROM [httparchive:runs.latest_requests_mobile]
	  GROUP BY age
	)
  ) GROUP BY bucket
) ORDER BY percent desc

Mobile

Desktop

Or is it just that the data has changed since you last ran these queries? To check, I ran the same query on the August 2013 data set:

Desktop:

Mobile:

You should run the same query on the Aug 2013 dataset to see if there have been any big changes.

AFAIK, the expAge column tries to approximate freshness based on the HTTP heuristic caching algorithm, which is why the numbers would be higher (max-age is just one, albeit preferred, way to specify resource freshness). For more details, see: http://httparchive.org/about.php#charts (under Uncacheable resources).
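
To illustrate why a freshness estimate can cover more requests than max-age alone, here is a minimal Python sketch of the usual order of precedence in HTTP caching (RFC 7234): max-age first, then Expires minus Date, then a heuristic based on Last-Modified. This is not HTTP Archive's actual expAge code. The function name `freshness_lifetime`, the lowercase header keys, and the 10% heuristic fraction (the typical value the RFC mentions) are assumptions for illustration, and directives like no-store are ignored.

```python
import re
from datetime import timezone
from email.utils import parsedate_to_datetime

# Assumption: heuristic freshness is 1/10 of the time since Last-Modified.
HEURISTIC_DIVISOR = 10


def _parse_http_date(value):
    """Parse an HTTP date header into an aware datetime, or None."""
    if not value:
        return None
    try:
        parsed = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def freshness_lifetime(headers):
    """Seconds a response may be treated as fresh, or None if unknown.

    `headers` is a dict with lowercase header names.
    """
    # 1. An explicit max-age wins.
    match = re.search(r"max-age=(\d+)", headers.get("cache-control", ""))
    if match:
        return int(match.group(1))

    date = _parse_http_date(headers.get("date"))

    # 2. Otherwise fall back to Expires minus Date.
    expires = _parse_http_date(headers.get("expires"))
    if date and expires:
        return max(0, int((expires - date).total_seconds()))

    # 3. Otherwise estimate from how long ago the resource last changed.
    last_modified = _parse_http_date(headers.get("last-modified"))
    if date and last_modified:
        age = int((date - last_modified).total_seconds())
        return max(0, age // HEURISTIC_DIVISOR)

    return None
```

A response with only Date and Last-Modified headers gets no value from the max-age regex, but still gets a non-empty estimate here, which is one way the two distributions could differ.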

The last two charts were based on the Aug 2013 data set :)

Thanks a lot for the clarification as this is exactly what I was looking for…