Continuing the discussion from How many text-based resources are served without compression?:
Ilya’s post got me thinking - how much can compression save for each file?
So, I added a new column to his query:
ROUND(megabytes/resource_count*1024, 3) as average_size_kb,
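Ilya's full query isn't reproduced in this post, but since the new column refers to megabytes and resource_count, it has to sit in an outer SELECT wrapped around the aggregate that produces those columns. Roughly this shape (the inner aggregate below is just a stand-in for his query, not the exact original):

```sql
-- Sketch only: the inner query stands in for Ilya's original aggregate,
-- which classifies each request by type and compression and rolls up
-- total megabytes and request counts per group. Legacy BigQuery SQL.
SELECT type, compressed, megabytes, resource_count,
  ROUND(megabytes / resource_count * 1024, 3) AS average_size_kb
FROM (
  SELECT
    REGEXP_EXTRACT(LOWER(resp_content_type), r'(script|json|html|css|xml|plain)') AS type,
    IF(REGEXP_MATCH(LOWER(resp_content_encoding), r'gzip|deflate'), INTEGER(1), INTEGER(0)) AS compressed,
    ROUND(SUM(respSize) / 1048576, 2) AS megabytes,
    COUNT(*) AS resource_count
  FROM [httparchive:runs.latest_requests]
  GROUP BY type, compressed
)
ORDER BY type, compressed
```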
What I discovered was that uncompressed HTML files average 2 KB smaller than those that ARE compressed:
This implies there are a LOT of small HTML files out there. So, I added another filter to break out files under 500 bytes (I figure a 500-byte file fits into one packet and isn't going to benefit as much from compression).
FROM (
  SELECT respSize,
    REGEXP_EXTRACT(LOWER(resp_content_type), r'(script|json|html|css|xml|plain)') AS type,
    IF(respSize > 500, INTEGER(1), INTEGER(0)) AS size,
    IF(REGEXP_MATCH(LOWER(resp_content_encoding), r'gzip|deflate'), INTEGER(1), INTEGER(0)) AS compressed
  FROM [httparchive:runs.latest_requests]
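The snippet above is only the inner half of the query; the whole thing would slot together along these lines (the outer SELECT and grouping here are a sketch, not the exact original):

```sql
-- Sketch of the complete breakout query (outer aggregation reconstructed
-- around the fragment above). Legacy BigQuery SQL.
SELECT type, size, compressed,
  COUNT(*) AS resource_count,
  ROUND(SUM(respSize) / 1048576, 2) AS megabytes,
  ROUND(SUM(respSize) / COUNT(*) / 1024, 3) AS average_size_kb
FROM (
  SELECT respSize,
    REGEXP_EXTRACT(LOWER(resp_content_type), r'(script|json|html|css|xml|plain)') AS type,
    -- size = 1 when the response is larger than ~one packet (500 bytes)
    IF(respSize > 500, INTEGER(1), INTEGER(0)) AS size,
    -- compressed = 1 when the response was served gzip- or deflate-encoded
    IF(REGEXP_MATCH(LOWER(resp_content_encoding), r'gzip|deflate'), INTEGER(1), INTEGER(0)) AS compressed
  FROM [httparchive:runs.latest_requests]
)
WHERE type IS NOT NULL
GROUP BY type, size, compressed
ORDER BY type, size, compressed
```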
The results were surprising. Of 10.1M requests, 2.8M are under 500 bytes! (27%)
An extremely small (<500 byte) payload accounts for:

* 19% of all CSS requests
* 49% of HTML
* 53% of JSON
* 61% of plain text
* 58% of XML
The files <500 bytes are highlighted in orange in the table below:
While interesting, this still did not answer my initial question: how much can we save per file (on average)? If we assume that the files >500 bytes are distributed similarly between compressed and uncompressed, the difference in average size gives the potential savings per file. As you can see in the table above, the potential savings per file type varies from 27% (for Script) to 66% (for XML).
If we apply these average savings to the aggregate totals, we find that compression would reduce these files by 14.1 GB!
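To spell the arithmetic out: per-file savings is (uncompressed average - compressed average) / uncompressed average, and the aggregate projection applies that ratio to each type's uncompressed bytes. Here is a sketch of how that could be expressed as a query, assuming the per-(type, compressed) averages for files >500 bytes have already been rolled up into something like a by_type table (all names below are illustrative, not from the query above):

```sql
-- Sketch only: by_type stands for a per-(type, compressed) aggregate of
-- files > 500 bytes, exposing avg_size_kb and total_megabytes.
SELECT
  u.type AS type,
  -- Per-file savings: relative size difference between the average
  -- uncompressed file and the average compressed file of the same type.
  ROUND(100 * (u.avg_size_kb - c.avg_size_kb) / u.avg_size_kb, 1) AS savings_pct,
  -- Aggregate projection: apply that ratio to all uncompressed bytes of the type.
  ROUND(u.total_megabytes * (u.avg_size_kb - c.avg_size_kb) / u.avg_size_kb, 1) AS potential_savings_mb
FROM by_type u
JOIN by_type c ON u.type = c.type
WHERE u.compressed = 0
  AND c.compressed = 1
```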
Doing the same math for mobile: only 21% of requests are <500 bytes (BETTER!). These are shown in green below.
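Running the mobile numbers only requires pointing the same query at the mobile requests table; a minimal sketch, with the mobile table name assumed rather than taken from this post:

```sql
-- Same breakout as before, pointed at the mobile crawl.
-- (The _mobile table name is an assumption; adjust to the dataset's naming.)
SELECT type, size, compressed,
  COUNT(*) AS resource_count,
  ROUND(SUM(respSize) / 1048576, 2) AS megabytes
FROM (
  SELECT respSize,
    REGEXP_EXTRACT(LOWER(resp_content_type), r'(script|json|html|css|xml|plain)') AS type,
    IF(respSize > 500, INTEGER(1), INTEGER(0)) AS size,
    IF(REGEXP_MATCH(LOWER(resp_content_encoding), r'gzip|deflate'), INTEGER(1), INTEGER(0)) AS compressed
  FROM [httparchive:runs.latest_requests_mobile]
)
WHERE type IS NOT NULL
GROUP BY type, size, compressed
```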
We see similar savings potential, ranging from 27% for scripts to nearly 50% for CSS and JSON. Interestingly, plain text files show little savings potential.
One more interesting piece of data: comparing average file sizes for web vs. mobile, mobile requests skew toward higher payloads than web.