Sites that deliver Images using gzip/deflate encoding

From the standard best practices for reducing payload we know the following rule:
Don’t use gzip for image or other binary files.

However, let’s see if this holds true in practice, and if not, which sites do use gzip/deflate for their images, using the following sample query:

select count(requestid) as ct, domain(url) as dom from [httparchive:runs.latest_requests] where 
(resp_content_type='image/png' or resp_content_type='image/jpeg' or resp_content_type='image/jpg')
and resp_content_encoding in ('gzip', 'deflate') 
group by dom
having ct > 333
order by ct desc limit 20

This query returns the following domains:

We can drill down into specific URLs for a given domain using the following query:

select url from [httparchive:runs.latest_requests] where resp_content_type='image/jpeg' and resp_content_encoding in ('gzip', 'deflate') and domain(url) = 'facebook.com'

The top one happens to be ‘external.ak.fbcdn.net’, which looks like Facebook but is served from Akamai. The funny thing is that we can verify that the “Content-Encoding: gzip” header is actually added by the Akamai config, not by the original content publisher, as seen here:

Content Publisher: http://www.webpagetest.org/result/140416_1P_64S/1/details/#request1
FB Shared Content: http://www.webpagetest.org/result/140416_BB_675/1/details/#request1
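As a quick spot check outside WebPageTest, you can also request any of these images directly with an Accept-Encoding header and see how the response comes back. A minimal Python sketch, with IMAGE_URL as a placeholder rather than one of the actual assets above:

import urllib.request
# Request an image while advertising gzip/deflate support, then report how the
# response is actually encoded. IMAGE_URL is a placeholder, not a real asset.
IMAGE_URL = "http://example.com/some/image.jpg"
req = urllib.request.Request(IMAGE_URL, headers={"Accept-Encoding": "gzip, deflate"})
with urllib.request.urlopen(req) as resp:
    print("Content-Type:    ", resp.headers.get("Content-Type"))
    print("Content-Encoding:", resp.headers.get("Content-Encoding", "(none)"))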

Let’s pick SlideShare as an example: they also use Akamai as their CDN, but their origin is the one setting the gzip header on the JPEG (maybe a misconfiguration on their end):

Origin: http://www.webpagetest.org/result/140416_33_69J/1/details/#request1

CDN: http://www.webpagetest.org/result/140416_6T_69Q/1/details/#request1

Finally, given that facebook.com itself appeared in the top 20 domains doing this, I picked a profile pic URL: http://www.webpagetest.org/result/140416_S2_SSX/1/details/#request1

I was wondering whether we pay any decompression cost (higher on mobile relative to desktop), so I ran the same query against the mobile dataset. It returns 0 records, which is good news, as no sites in the HTTP Archive mobile crawl gzip their images:

select count(requestid) as ct, domain(url) as dom from [httparchive:runs.latest_requests_mobile] where 
(resp_content_type='image/png' or resp_content_type='image/jpeg' or resp_content_type='image/jpg')
and resp_content_encoding in ('gzip', 'deflate') 
group by dom
having ct > 333
order by ct desc limit 20

Net net of all this: I can conclude that most of these are just misconfigurations (as in, the generating endpoint compresses everything indiscriminately, without regard to content type).

Is there any other harm I am not aware of when compressing images?

@pganti nice analysis.

Is there any other harm I am not aware of when compressing images?

Unnecessary CPU overhead on the server and client… and in some cases, extra bytes due to gzip headers. :slight_smile:
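To put a rough number on the “extra bytes” part: the gzip container alone costs about 18 bytes of header and trailer, and deflate rarely shrinks the already-entropy-coded JPEG data. A minimal sketch, where photo.jpg is a placeholder for any ordinary JPEG:

import gzip
# Gzip an already-compressed JPEG and compare sizes. photo.jpg is a placeholder.
raw = open("photo.jpg", "rb").read()
gz = gzip.compress(raw, 9)
print("original:", len(raw), "bytes")
print("gzipped: ", len(gz), "bytes")
print("delta:   ", len(gz) - len(raw), "bytes")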

Interesting finding. A couple of small comments:
(1) It actually looks like Facebook adds the gzip at the Facebook origin, as you can see from requesting the image through www.facebook.com (which is not, as far as I can tell, on Akamai): http://www.webpagetest.org/result/140416_P8_8b5b21a97ca0921a78924ef2305bbdf4/1/details/#request1

(2) That said, you can make Akamai gzip your images, but it’s not the default (a customer can state which content types should be compressed).

I’ve actually seen cases where gzipping jpeg images reduced their size. However, arguably that’s just the result of incorrect encoding of the images, for instance leaving metadata in there. I can envision cases where you have to keep metadata on your images for various reasons, and it’s plausible that in those cases gzip might help, but that seems like a very rare scenario.

By nature, some mandatory JPEG headers, like the quantization tables, are relatively easy to compress with gzip, but once you enter the body of the image, which is Huffman-encoded, gzip can no longer work efficiently.
This can be tested using gzthermal: look for the DQT (Define Quantization Table) and SOFx (Start of Frame) markers in the compressed stream (edit: unlike PNG, JPEG markers are not in ASCII, but their position in the file can be retrieved with tools like JPEGsnoop).
Here is a sample heatmap view produced by gzthermal when fed with a .jpg.gz file:
http://encode.ru/threads/1889-gzthermal-pseudo-thermal-view-of-Gzip-Deflate-compression-efficiency?p=37584&viewfull=1#post37584
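You can approximate the same split without gzthermal by cutting the file at the first Start-of-Scan marker (0xFFDA) and gzipping the two parts separately. A rough Python sketch, assuming a baseline JPEG with a single scan (photo.jpg is a placeholder):

import gzip
# Split at the first SOS marker: everything before it is headers/tables,
# everything after is (mostly) the Huffman-coded scan data.
data = open("photo.jpg", "rb").read()
sos = data.find(b"\xff\xda")
assert sos > 0, "no Start-of-Scan marker found"
for name, part in (("headers (DQT/SOF/etc.)", data[:sos]), ("entropy-coded scan", data[sos:])):
    gz = gzip.compress(part, 9)
    print(f"{name}: {len(part)} -> {len(gz)} bytes ({100 * len(gz) / len(part):.0f}%)")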

It is possible to losslessly recompress JPEGs.

The fastest solution (apart from the usual sequential-to-progressive scan conversion) would be to switch to arithmetic coding instead of Huffman encoding (saves about 7%), but arithmetic coding is not well supported (historically a patent issue). The huge advantage of this technique is that it is damn easy to re-encode back to Huffman if compatibility with legacy JPEG decoders is important.
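If you want to try the arithmetic-coding route, jpegtran can do the lossless re-encode, provided your libjpeg build was compiled with arithmetic support. A hedged sketch driving it from Python, with the file names as placeholders:

import subprocess
# Losslessly re-encode photo.jpg with arithmetic coding, keeping all markers.
# Assumes a jpegtran build that supports the -arithmetic switch.
subprocess.run(
    ["jpegtran", "-arithmetic", "-copy", "all", "-outfile", "photo-arith.jpg", "photo.jpg"],
    check=True,
)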

A more involved option would be to use dedicated tools like PackJPG: http://www.elektronik.htw-aalen.de/packjpg/

Unfortunately, gzip/deflate is way too old to include such a sophisticated recompression scheme; it does not even deal specifically with Base64-encoded (HTML/CSS-inlined) blobs, since it is only a generic data compressor.
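The Base64 point is easy to check empirically: base64 inflates the image by about 33%, and gzip’s Huffman coding claws most of that back (the 64-symbol alphabet packs into roughly 6 bits per character), but it typically does not get back below the original binary size. A minimal sketch, photo.jpg again being a placeholder:

import base64, gzip
# Compare the raw JPEG, its Base64 encoding, and gzip applied on top of the Base64.
raw = open("photo.jpg", "rb").read()
b64 = base64.b64encode(raw)
print("raw JPEG:      ", len(raw))
print("base64:        ", len(b64))
print("gzipped base64:", len(gzip.compress(b64, 9)))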

Paddy,

cool study.

If you cut out the ct>333 limit for mobile, there are sites with Gzip for images:

The mobile data set is so small that even if you remove the

and resp_content_encoding in ('gzip', 'deflate') 

line entirely, the max number of responses from a single domain is only ~800.

Note above that I am showing the host (not the domain). external.ak.fbcdn.net is the culprit on the web side as well. Is it possible that these images get this header added because they are used externally? Or are they perhaps tested differently?

@doug_sillars: Good catch. I put in the filter of 333 for a certain traffic level, but you are right in identifying mobile sites that do this. FB seems to be the main one.

@guypod: Thank you for the clarification. It does seem clear that it’s an origin issue. I shall notify our friends at FB to take some action on this.

The two Facebook images you reference are the same image, but they aren’t the same size. That makes comparing them kind of pointless.

@josephscott: I understand, but the key point of that example is to look at who is adding the header in the first place, not so much the content itself. Even if you change the size, the header remains.

FB has been notified of the issue and is actively investigating a fix. I hope that by the next crawl we will see fb removed from the list.