Caching HTML: Mobile vs. Desktop

Once again, @igrigorik's awesome post on HTML caching got me thinking about mobile caching. When I took his two queries and modified them for mobile, there were minor differences, but the general trend was similar. (The larger differences are likely due to the smaller mobile sample size.)

This got me wondering: how do sites treat their mobile caching differently from their desktop caching? I ran a few tests and found that the overall mobile and desktop results were similar, but that some values were missing. Certainly there would be outlying sites that treat mobile significantly differently from desktop. But how many? And how? So I mashed up Ilya's query with GuyPo's (from his m. study) to grab the Cache-Control header and max-age from the first HTML response on desktop and mobile and compare them:

// Compare the first HTML response's Cache-Control header and max-age
// between the desktop and mobile HTTP Archive runs (BigQuery legacy SQL).
SELECT dData.url, dData.age, mData.age, dData.resp_cache_control, mData.resp_cache_control
FROM
  // Desktop: each page joined to its first HTML response (status 200)
  (SELECT pages.pageid AS pid, url, urlhash, wptid, fHtml, fReq, fStatus, loc, age, resp_cache_control
   FROM [httparchive:runs.latest_pages] AS pages
   JOIN
     (SELECT pageid, MAX(firstHtml) AS fHtml, MAX(firstReq) AS fReq, MAX(status) AS fStatus,
             MAX(resp_location) AS loc,
             // max-age in seconds; NULL when the header has no max-age directive
             INTEGER(REGEXP_EXTRACT(resp_cache_control, r'max-age=(\d+)')) AS age,
             resp_cache_control
      FROM [httparchive:runs.latest_requests]
      WHERE firstHtml = true AND status = 200
      GROUP BY resp_cache_control, age, pageid) AS reqs
   ON reqs.pageid = pages.pageid) AS dData
JOIN
  // Mobile: the same subquery against the mobile tables
  (SELECT pages.pageid AS pid, url, wptid, fHtml, fReq, fStatus, loc, age, resp_cache_control
   FROM [httparchive:runs.latest_pages_mobile] AS pages
   JOIN
     (SELECT pageid, MAX(firstHtml) AS fHtml, MAX(firstReq) AS fReq, MAX(status) AS fStatus,
             MAX(resp_location) AS loc,
             INTEGER(REGEXP_EXTRACT(resp_cache_control, r'max-age=(\d+)')) AS age,
             resp_cache_control
      FROM [httparchive:runs.latest_requests_mobile]
      WHERE firstHtml = true AND status = 200
      GROUP BY resp_cache_control, age, pageid) AS reqs
   ON reqs.pageid = pages.pageid) AS mData
ON mData.url = dData.url
// Edit this condition to split the matched sites into categories
// (e.g. dData.age = mData.age, or dData.resp_cache_control != mData.resp_cache_control).
// Rows where either age is NULL never satisfy != or =.
WHERE dData.age != mData.age
GROUP BY dData.url, dData.age, mData.age, dData.resp_cache_control, mData.resp_cache_control
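For anyone post-processing the results outside BigQuery, the max-age extraction and the `!=` comparison can be mirrored in a few lines of Python. This is a minimal sketch, assuming each Cache-Control value is available as a string (or `None` when the header is absent); `extract_max_age` and `ages_differ` are hypothetical helper names, not part of HTTP Archive.

```python
import re
from typing import Optional

# Same pattern as the query's REGEXP_EXTRACT: the digits following "max-age=".
MAX_AGE_RE = re.compile(r"max-age=(\d+)")


def extract_max_age(cache_control: Optional[str]) -> Optional[int]:
    """Return max-age in seconds, or None (SQL NULL) if the header or directive is missing."""
    if cache_control is None:
        return None
    match = MAX_AGE_RE.search(cache_control)
    return int(match.group(1)) if match else None


def ages_differ(desktop_cc: Optional[str], mobile_cc: Optional[str]) -> Optional[bool]:
    """Mimic `dData.age != mData.age`: None (unknown) when either age is NULL."""
    desktop_age = extract_max_age(desktop_cc)
    mobile_age = extract_max_age(mobile_cc)
    if desktop_age is None or mobile_age is None:
        return None
    return desktop_age != mobile_age


print(extract_max_age("private, max-age=300"))   # 300
print(ages_differ("max-age=60", "max-age=600"))  # True
print(ages_differ("no-cache", "max-age=600"))    # None, so the WHERE clause drops this row
```

Like the query's pattern, this regex is case-sensitive and only picks up the first `max-age=` in the header.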

Of the 4672 sites that match in the two databases, 3645 (78%) have the same Cache-Control response header. 996 (21%) have the same max-age (mData.age = dData.age). I then changed the WHERE clause to further break down the sites into various categories.
What I am interested in are the sites that are outside the norm. There are certainly legitimate reasons to cache longer (or shorter) on a mobile device than on desktop. So, let's look into the 1027 (22%) sites that are caching differently on mobile vs. desktop:

Table 1: Breakdown of sites with different cache control headers for Mobile and Desktop.

Let's look through these one by one (ignoring headers with the same max-age, because that sounds kind of boring):

322 have different cache headers but no max-age values. Of these:

Table 2: Breakdown of different cache control headers with no max-age values for mobile or desktop.

The first two lines of Table 2 show sites that have a cache control header only on mobile (69) or only on desktop (85), but not on the other version (that's 3.3% of all sites). A large number are different by only a few characters, and glancing at the results, they are generally missing commas between parameters. Then there are 1.5% of sites whose cache control headers are longer on either mobile or desktop because more parameters were added to one version.
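The categories above (header on one version only, a few characters different, extra parameters) can be approximated mechanically. This is a sketch under my own assumptions, not how the table was actually built: directives are compared as lowercased sets with commas and whitespace treated alike, and `classify_difference` is a hypothetical helper.

```python
from typing import List, Optional


def tokens(cache_control: str) -> List[str]:
    """Lowercased directives, treating commas and whitespace alike as separators."""
    return cache_control.replace(",", " ").lower().split()


def classify_difference(desktop_cc: Optional[str], mobile_cc: Optional[str]) -> str:
    # Treat an empty header the same as a missing one.
    desktop_cc = desktop_cc or None
    mobile_cc = mobile_cc or None
    if desktop_cc == mobile_cc:
        return "identical"
    if desktop_cc is None:
        return "mobile only"
    if mobile_cc is None:
        return "desktop only"
    desktop_set, mobile_set = set(tokens(desktop_cc)), set(tokens(mobile_cc))
    if desktop_set == mobile_set:
        return "formatting only"  # e.g. a missing comma between directives
    if desktop_set < mobile_set or mobile_set < desktop_set:
        return "extra directives"  # one version adds directives to the other
    return "other"


print(classify_difference("private, no-cache", "private no-cache"))  # formatting only
```

Comparing sets rather than raw strings means directive order is also ignored, which may lump a few reordered headers in with the missing-comma cases.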

Table 3: When Cache-Control max-age values differ

In Table 3, the top and bottom lines show the two extremes, where the mobile and desktop max-age values differ by more than 15 minutes in one direction or the other. 1.1% of the websites studied fall into these. Another 1.6% of sites have max-age values that differ by more than 2 minutes (but less than 15 minutes).
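A rough way to bin the max-age gap, following the thresholds described for Table 3 (2 and 15 minutes). This is a sketch assuming both ages are in seconds; the post does not say which bin a gap of exactly 15 minutes falls into, so placing it in the 2-15 minute bin is my assumption, and `bucket_age_difference` is a hypothetical helper.

```python
TWO_MINUTES = 2 * 60
FIFTEEN_MINUTES = 15 * 60


def bucket_age_difference(desktop_age: int, mobile_age: int) -> str:
    """Bin a desktop/mobile max-age gap (in seconds) by size and direction."""
    diff = mobile_age - desktop_age  # positive means mobile caches longer
    gap = abs(diff)
    if gap == 0:
        return "same"
    if gap > FIFTEEN_MINUTES:
        size = "over 15 min"
    elif gap > TWO_MINUTES:
        size = "2-15 min"
    else:
        size = "up to 2 min"
    longer = "mobile" if diff > 0 else "desktop"
    return f"{longer} longer by {size}"


print(bucket_age_difference(0, 3600))   # mobile longer by over 15 min
print(bucket_age_difference(600, 300))  # desktop longer by 2-15 min
```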

Tables 4 and 5: Sites with max-age values for only mobile or only desktop, broken down by max-age.

In Tables 4 and 5, we see a breakdown of mobile max-ages when there is a mobile max-age but no desktop max-age (and of desktop max-ages when there is no mobile max-age). Most are under 5 minutes, but interestingly, 51 sites (1.1% of all sites) set a max-age between 5 minutes and 1 day on one version only. 25 sites set a max-age of more than 1 day on one version while not specifying one on the other! That's 0.54% of all sites studied.
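The one-sided case can be sketched the same way: keep only pairs where exactly one version sets a max-age, then bin that value at 5 minutes and 1 day. This assumes ages in seconds with `None` for "not set"; `one_sided_max_age` is a hypothetical helper, and treating exactly 1 day as part of the 5 min to 1 day bin is my assumption.

```python
from typing import Optional, Tuple

FIVE_MINUTES = 5 * 60
ONE_DAY = 24 * 60 * 60


def one_sided_max_age(desktop_age: Optional[int], mobile_age: Optional[int]) -> Optional[Tuple[str, str]]:
    """Return (side, bucket) when exactly one version sets max-age, otherwise None."""
    if (desktop_age is None) == (mobile_age is None):
        return None  # both set (Table 3 territory) or neither set (Table 2)
    if mobile_age is None:
        side, age = "desktop", desktop_age
    else:
        side, age = "mobile", mobile_age
    if age < FIVE_MINUTES:
        bucket = "under 5 min"
    elif age <= ONE_DAY:
        bucket = "5 min to 1 day"
    else:
        bucket = "over 1 day"
    return side, bucket


print(one_sided_max_age(None, 86400 * 7))  # ('mobile', 'over 1 day')
```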

In conclusion, cache control headers and the max-age for caching can (and probably should) vary between mobile and desktop sites. In our sample of 4672 sites, 22% have headers that differ between mobile and desktop. However, there are no real patterns in the data that identify an ideal caching length, and ~10% of sites have cache control headers or max-age values that are extremely different between their mobile and desktop offerings. This goes to show that developers should (1) add cache headers and (2) review the values regularly to ensure that all of the sites they deliver have cache headers that make sense for both mobile and desktop.


@doug_sillars awesome analysis.

A large number are different by only a few characters, and glancing at the results – they are generally missing commas between parameters.

That makes me curious… would modern UAs cache resources with these headers? Technically, they are incorrect values, so I wouldn't be surprised if the UAs ignore them entirely.

Good question. I think the browser will handle this properly. This appears to be an artifact of how the different agents record the data. I checked 3 sites that differed by a comma in the HTTP Archive (using Chrome DevTools, emulating a Nexus 4), and the headers look like this:

for both mobile and desktop. Perhaps the desktop version does not append a comma? For example, Shutterfly.com has the comma on mobile but not on desktop:

http://httparchive.org/viewsite.php?pageid=17328081#requests
http://mobile.httparchive.org/viewsite.php?pageid=304776#requests

Note that Shutterfly under Chrome emulation does not redirect to the 'm.' site, even when emulating an iPhone 4, but it does in the HTTP Archive data.

FWIW, all the HA scans are done with IE, so I guess my actual question is: how lenient are the different UAs in parsing Cache-Control headers (i.e., do they enforce commas)? The UAs may also display the header values differently, but that's a different story…
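To make "enforcing commas" concrete, here are two hypothetical parsers for the same header value. This is not modeled on any particular browser, whose actual behavior I haven't checked; it only shows what a strict, comma-separated reading of the directive list would do versus a lenient one. Quoted-string values containing commas or spaces are not handled (a simplifying assumption).

```python
import re
from typing import Dict, Iterable, Optional

# Simplified HTTP "token"; quoted-string values are only handled without spaces/commas.
TOKEN = r"[!#$%&'*+.^_`|~0-9A-Za-z-]+"
DIRECTIVE_RE = re.compile(rf'^({TOKEN})(?:=({TOKEN}|"[^"]*"))?$')


def _collect(parts: Iterable[str]) -> Dict[str, Optional[str]]:
    directives: Dict[str, Optional[str]] = {}
    for part in parts:
        match = DIRECTIVE_RE.match(part.strip())
        if match:  # anything that is not exactly one well-formed directive is dropped
            directives[match.group(1).lower()] = match.group(2)
    return directives


def parse_strict(value: str) -> Dict[str, Optional[str]]:
    """Directives must be comma-separated; 'private max-age=0' is one invalid element."""
    return _collect(value.split(","))


def parse_lenient(value: str) -> Dict[str, Optional[str]]:
    """Commas and whitespace both separate directives, so a missing comma is tolerated."""
    return _collect(value.replace(",", " ").split())


print(parse_strict("private max-age=0"))   # {}
print(parse_lenient("private max-age=0"))  # {'private': None, 'max-age': '0'}
```

Under the strict reading, a header with a missing comma would effectively carry no directives at all, which is why the answer to the leniency question matters for these sites.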