What is the Cache-Control maxage distribution?

igrigorik · August 22, 2013, 10:12pm

SELECT INTEGER(REGEXP_EXTRACT(resp_cache_control, r'max-age=(\d+)')) age, count(pageid) cnt
FROM [httparchive:runs.latest_requests_mobile]
GROUP BY age
HAVING cnt > 500
ORDER BY cnt desc
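For anyone who wants to check the extraction outside BigQuery, here is a minimal Python sketch of the same regex applied to a single header value. The function name `extract_max_age` is made up for illustration; like the query, it is case-sensitive and only picks up the first `max-age=` directive.

```python
import re

# Same pattern as the query: the run of digits right after "max-age=".
MAX_AGE_RE = re.compile(r"max-age=(\d+)")

def extract_max_age(cache_control):
    """Return max-age in seconds, or None when the directive is absent."""
    if not cache_control:
        return None
    match = MAX_AGE_RE.search(cache_control)
    return int(match.group(1)) if match else None

print(extract_max_age("public, max-age=3600"))  # 3600
print(extract_max_age("no-cache"))              # None
print(extract_max_age(None))                    # None
```

Responses with no match come back as None, which corresponds to the NULL age group in the query output.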

Based on Aug 15, 2013 data (mobile)… 40% of the requests don’t specify maxage, and for the ones that do:

igrigorik · August 22, 2013, 10:25pm

SELECT bucket, total, ROUND(ratio*100) as percent FROM (
  SELECT bucket, SUM(CNT) total, RATIO_TO_REPORT(total) OVER() ratio FROM (
    SELECT cnt, CASE
      WHEN age < 60                                  THEN "less than 60s"
      WHEN age >= 60 AND age < 60*10                 THEN "1-10min"
      WHEN age >= 60*10 AND age < 60*60              THEN "10min-1hr"
      WHEN age >= 60*60 AND age < 60*60*24           THEN "1hr-1d"
      WHEN age >= 60*60*24 AND age < 60*60*24*30     THEN "1d-30d"
      WHEN age >= 60*60*24*30 AND age < 60*60*24*365 THEN "30d-365d"
      WHEN age >= 60*60*24*365                       THEN "year+"
      ELSE "missing"
      END bucket
    FROM (
      SELECT INTEGER(REGEXP_EXTRACT(resp_cache_control, r'max-age=(\d+)')) age, count(pageid) cnt
      FROM [httparchive:runs.latest_requests_mobile]
      GROUP BY age
    )
  ) GROUP BY bucket
) ORDER BY percent desc
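The bucketing can also be sketched in Python to sanity-check the boundaries on a small sample. This is only an illustration, not the query itself: the names `bucket` and `distribution` are hypothetical, and the sketch assumes each lower bound is inclusive, so an exact value such as 3600 lands in "1hr-1d" rather than "missing".

```python
from collections import Counter

MINUTE, HOUR, DAY = 60, 60 * 60, 60 * 60 * 24

# (exclusive upper bound in seconds, label), checked in ascending order.
LIMITS = [
    (MINUTE, "less than 60s"),
    (10 * MINUTE, "1-10min"),
    (HOUR, "10min-1hr"),
    (DAY, "1hr-1d"),
    (30 * DAY, "1d-30d"),
    (365 * DAY, "30d-365d"),
]

def bucket(age):
    """Map a max-age in seconds (or None) to one of the query's labels."""
    if age is None:
        return "missing"
    for upper, label in LIMITS:
        if age < upper:
            return label
    return "year+"

def distribution(ages):
    """Return each bucket's share of requests as a rounded percentage."""
    counts = Counter(bucket(age) for age in ages)
    total = sum(counts.values())
    return {label: round(100 * n / total) for label, n in counts.items()}

# Made-up sample values, one per request.
print(distribution([None, 30, 3600, 86400]))
```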

Mobile:

Desktop:

pganti · July 17, 2014, 11:41pm

What's the difference between the “expAge” field and the max-age value the way you extract it here?

The reason I ask is that the results look different if I use expAge instead of max-age, as follows:

SELECT bucket, total, ROUND(ratio*100) as percent FROM (
  SELECT bucket, SUM(CNT) total, RATIO_TO_REPORT(total) OVER() ratio FROM (
	SELECT cnt, CASE
	  WHEN age < 60                                  THEN "less than 60s"
	  WHEN age >= 60 AND age < 60*10                 THEN "1-10min"
	  WHEN age >= 60*10 AND age < 60*60              THEN "10min-1hr"
	  WHEN age >= 60*60 AND age < 60*60*24           THEN "1hr-1d"
	  WHEN age >= 60*60*24 AND age < 60*60*24*30     THEN "1d-30d"
	  WHEN age >= 60*60*24*30 AND age < 60*60*24*365 THEN "30d-365d"
	  WHEN age >= 60*60*24*365                       THEN "year+"
	  ELSE "missing"
	  END bucket
	FROM (
	  SELECT expAge age, count(pageid) cnt
	  FROM [httparchive:runs.latest_requests_mobile]
	  GROUP BY age
	)
  ) GROUP BY bucket
) ORDER BY percent desc

Mobile

Desktop

Or is it that the data has changed since the last time you ran these queries? To check, I ran the same query on the August 2013 data set:

Desktop:

Mobile:

igrigorik · July 18, 2014, 6:31am

You should run the same query on the Aug 2013 dataset to see if there have been any big changes.

AFAIK, the expAge column tries to approximate freshness based on the HTTP heuristic caching algorithm, which is why the numbers would be higher (max-age is just one, albeit preferred, way to specify resource freshness). For more details, see: http://httparchive.org/about.php#charts (under Uncacheable resources).
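To illustrate why a freshness estimate can exceed what max-age alone reports, here is a rough Python sketch of the usual HTTP fallback order: max-age first, then Expires minus Date, then a heuristic based on Last-Modified. This is not HTTP Archive's actual expAge calculation. The function name `freshness_lifetime`, the lowercase-keyed `headers` dict, and the 10% fraction (the value the HTTP caching spec mentions as typical) are all assumptions of this sketch.

```python
import re
from email.utils import parsedate_to_datetime

def freshness_lifetime(headers):
    """Estimate freshness in seconds; None if nothing usable is present.

    `headers` is a dict keyed by lowercase header name (sketch assumption).
    """
    # 1. An explicit max-age wins.
    match = re.search(r"max-age=(\d+)", headers.get("cache-control", ""))
    if match:
        return int(match.group(1))

    date = headers.get("date")
    expires = headers.get("expires")
    last_modified = headers.get("last-modified")
    try:
        # 2. Otherwise fall back to Expires relative to Date.
        if date and expires:
            delta = parsedate_to_datetime(expires) - parsedate_to_datetime(date)
            return max(0, int(delta.total_seconds()))
        # 3. Heuristic: 10% of the time since Last-Modified (assumed fraction).
        if date and last_modified:
            delta = parsedate_to_datetime(date) - parsedate_to_datetime(last_modified)
            return max(0, int(delta.total_seconds()) // 10)
    except (TypeError, ValueError):
        # Unparseable dates: treat as no usable freshness information.
        pass
    return None

print(freshness_lifetime({
    "date": "Thu, 15 Aug 2013 12:00:00 GMT",
    "expires": "Thu, 15 Aug 2013 13:00:00 GMT",
}))  # 3600
```

A response like the one above has no max-age at all, so it counts as "missing" in the max-age query while still getting a non-zero freshness estimate.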

pganti · July 19, 2014, 6:16am

The last two charts were based on the Aug 2013 data set.

Thanks a lot for the clarification as this is exactly what I was looking for…
