File size and compression savings

Continuing the discussion from How many text-based resources are served without compression?:

Ilya’s post got me thinking - how much can compression save for each file?

So, I added a new column to his query to compute the average size per request in KB:

  ROUND(megabytes/resource_count*1024, 3) as average_size_kb,

What I discovered was that uncompressed HTML files average 2 KB smaller than those that ARE compressed:

This implies there are a LOT of small HTML files out there. So, I added another filter to break out files under 500 bytes (I figure a 500-byte file fits into one packet and is not going to benefit as much from compression).

 FROM (
  SELECT respSize,
    REGEXP_EXTRACT(LOWER(resp_content_type), r'(script|json|html|css|xml|plain)') AS type,
    IF(respSize > 500, INTEGER(1), INTEGER(0)) AS size,
    IF(REGEXP_MATCH(LOWER(resp_content_encoding), r'gzip|deflate'), INTEGER(1), INTEGER(0)) AS compressed
  FROM [httparchive:runs.latest_requests]
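For context, here is roughly what the full query looks like. Only the subquery above is quoted from the original; the outer SELECT is my approximation of Ilya's aggregation, not the exact original:

    SELECT type, size, compressed,
        COUNT(*) AS resource_count,
        ROUND(SUM(respSize)/1048576, 2) AS megabytes,
        ROUND(SUM(respSize)/COUNT(*)/1024, 3) AS average_size_kb
    FROM (
      SELECT respSize,
        REGEXP_EXTRACT(LOWER(resp_content_type), r'(script|json|html|css|xml|plain)') AS type,
        IF(respSize > 500, INTEGER(1), INTEGER(0)) AS size,
        IF(REGEXP_MATCH(LOWER(resp_content_encoding), r'gzip|deflate'), INTEGER(1), INTEGER(0)) AS compressed
      FROM [httparchive:runs.latest_requests]
    )
    WHERE type IS NOT NULL
    GROUP BY type, size, compressed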

The results were surprising. Of 10.1M requests, 2.8M (27%) are under 500 bytes!

Broken out by type, the share of requests that are this small:

- CSS: 19%
- HTML: 49%
- JSON: 53%
- plain: 61%
- XML: 58%

The files <500 bytes are highlighted in orange in the table below:

While interesting, this still did not answer my initial question: how much can we save per file (on average)? If we assume that the files >500 bytes are evenly distributed between compressed and uncompressed, we can take the difference between the average sizes as the potential savings per file. As you can see in the table above, the potential savings per file type varies from 27% (for script) to 66% (for XML).
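Spelling out the per-file calculation (the column names here are illustrative, not from the query itself):

    (avg_size_uncompressed_kb - avg_size_compressed_kb) / avg_size_uncompressed_kb AS potential_savings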

If we apply these average savings to the aggregate totals, we find that compression would reduce these files by 14.1 GB!
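In other words, summing over the file types (again with illustrative names):

    SUM(uncompressed_megabytes * potential_savings) AS estimated_total_savings_mb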

Doing the same math for mobile: only 21% of requests are <500 bytes (BETTER!). These are shown in green below.
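If I recall the dataset layout correctly, the only change needed to run the same query against the mobile crawl is the source table:

    FROM [httparchive:runs.latest_requests_mobile]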

We see similar savings potential: from 27% for scripts to nearly 50% for CSS and JSON. Interestingly, plain text files show little savings potential.

One more interesting piece of data: comparing average file sizes for web vs. mobile, mobile requests skew toward higher payloads than web.


@doug_sillars interesting!

I was under the impression that respBodySize and respSize were supposed to represent the decompressed vs. compressed size of the asset… which would make answering this question pretty easy. That said, looking at the actual data, it appears that they report the exact same numbers. Hmm…
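A quick sanity check along those lines (assuming both columns exist on the requests table):

    SELECT COUNT(*) AS total_requests,
        SUM(IF(respSize = respBodySize, 1, 0)) AS identical_sizes
    FROM [httparchive:runs.latest_requests]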

@souders what is the distinction between these two vars, and how does HA currently calculate gzip savings (column in pages table)?

Just a question: how is Potential_savings actually computed? Servers are free to use any gzip compression level; some may favor lower levels to reduce CPU consumption, while others may deliver static content that has been heavily compressed beforehand (Google Zopfli, KZIP+kzip2gz, 7za…).
Could it be possible to roughly guess the compression level that was used?
I suppose that some compression engines are not based on the common zlib/gzip code, especially front-end appliances dedicated to traffic optimization.

Hi,

In my data above, the potential savings comes from comparing the average sizes: if gzipped files are in general 40% smaller than those that are not gzipped, that is the potential savings.

In HA and WebPageTest, I am not sure what method the potential savings calculation uses.

Doug