How many text files are not served with gzip?


#1

The last bounty went well so let’s leave this one up to the community to answer :cowboy_hat_face:


Video walkthrough of getting started with HTTP Archive on BigQuery
#2

Something like

select resp_content_encoding, count(*) as total
from httparchive:runs.latest_requests
where mimeType in ('text/html', 'text/css', 'text/plain', 'image/svg+xml')
group by resp_content_encoding
order by total desc

With the full list of text mimeTypes in the WHERE clause, this will give a breakdown that includes the responses not using gzip, br, etc.
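To check what that query returns without a BigQuery account, here is a sketch that runs the same GROUP BY against a tiny in-memory SQLite table. The rows are made up, only the two columns the query uses are modelled, and storing a missing header as an empty string is an assumption. A secondary sort key is added so ties come out in a fixed order:

```python
import sqlite3

# Hypothetical toy rows standing in for httparchive:runs.latest_requests;
# only the two columns the query touches are modelled.
rows = [
    ("text/html", "gzip"),
    ("text/html", ""),        # assumed: no Content-Encoding header stored as ''
    ("text/css", "gzip"),
    ("text/css", "br"),
    ("text/plain", ""),
    ("image/svg+xml", "gzip"),
    ("image/png", ""),        # not a text type, filtered out below
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE requests (mimeType TEXT, resp_content_encoding TEXT)")
conn.executemany("INSERT INTO requests VALUES (?, ?)", rows)

# Same shape as the BigQuery query: keep text-like types, then count per encoding.
breakdown = conn.execute(
    """
    SELECT resp_content_encoding, COUNT(*) AS total
    FROM requests
    WHERE mimeType IN ('text/html', 'text/css', 'text/plain', 'image/svg+xml')
    GROUP BY resp_content_encoding
    ORDER BY total DESC, resp_content_encoding
    """
).fetchall()

print(breakdown)
```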

A query for the complete list of mimeTypes shows some pretty weird ones in use (including 'image/Double Braid Polyester Rope9.jpg'):

select mimeType, count(*)
from httparchive:runs.latest_requests
group by mimeType
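Before grouping, it may help to normalize the values: lower-case them, drop any parameters such as `; charset=utf-8`, and set aside anything that doesn't look like `type/subtype`. Here is a hypothetical helper for doing that on exported values. Whether the HTTP Archive mimeType column actually carries parameters or mixed case is an assumption:

```python
def normalize_mime(raw):
    """Lower-case a MIME type and drop parameters such as '; charset=utf-8'.

    Returns None for values that don't look like 'type/subtype' at all.
    """
    if raw is None:
        return None
    essence = raw.split(";", 1)[0].strip().lower()
    kind, sep, subtype = essence.partition("/")
    # Reject values with no slash, an empty half, or embedded whitespace
    # (e.g. what looks like a file name pasted into the header).
    if not sep or not kind or not subtype or any(c.isspace() for c in essence):
        return None
    return essence


print(normalize_mime("Text/HTML; charset=UTF-8"))
print(normalize_mime("image/Double Braid Polyester Rope9.jpg"))
```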

#3

Oh, you haven’t heard of the image/Double Braid Polyester Rope9.jpg spec?


#4

select
sum(if(resp_content_encoding = "gzip",1,0)) gzipCount,
sum(if(resp_content_encoding != "gzip",1,0)) nogzipCount,

from (
SELECT
resp_content_encoding,
type

from httparchive.runs.latest_requests
where (type contains "text" || type contains "html" || type contains "css" || type contains "js" || type contains "JSON")

)

51.6% use gzip

If we limit to files >1KB (adding "AND respSize>1000" to the WHERE clause), the results are much more respectable.


Since files under 1KB fit into a single packet, you can generally skip the compression step for them.
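Another reason small files matter less: gzip wraps its output in a fixed header and trailer (at least 18 bytes), so very small bodies can come out larger than they went in, while larger, repetitive text shrinks well. A quick sketch with Python's standard `gzip` module; the CSS snippets are made up:

```python
import gzip

small = b"body{color:red}"          # 15 bytes, well under 1KB
large = b"body{color:red}\n" * 200  # 3200 bytes of repetitive CSS

small_gz = gzip.compress(small)
large_gz = gzip.compress(large)

# The fixed gzip framing outweighs any savings on the tiny input.
print(len(small), "->", len(small_gz))
print(len(large), "->", len(large_gz))
```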


#5

I wonder if you want to include the other compression options in the count: brotli, SDCH, deflate?

(Though I couldn't find any responses with SDCH, so I might have the wrong value?)
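A response counts as compressed if its Content-Encoding names any compression scheme. The header is a comma-separated list of codings and the values are case-insensitive, so a sketch that classifies raw header values might look like this. The set of schemes is just the ones mentioned in this thread, not an exhaustive registry:

```python
# Schemes discussed in this thread; other registered codings exist.
COMPRESSED = {"gzip", "br", "deflate", "sdch"}


def is_compressed(content_encoding):
    """True if any coding listed in a Content-Encoding value is a compression scheme."""
    if not content_encoding:
        return False  # header absent or empty: body served as-is
    codings = [c.strip().lower() for c in content_encoding.split(",")]
    return any(c in COMPRESSED for c in codings)


for value in ["gzip", "BR", "identity", "", None]:
    print(repr(value), is_compressed(value))
```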


#6

Change the first select to:

sum(if(resp_content_encoding = "gzip",1,0)) gzipCount,
sum(if(resp_content_encoding = "br",1,0)) brotliCount,
sum(if(resp_content_encoding = "deflate",1,0)) deflateCount,
count(*) total,

SDCH had 0.


#7

Of course, the question was how many text files DO NOT use compression. The sum of gzip, Brotli and deflate is 82%, meaning 18% are not compressed.
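Put differently, the uncompressed share is one minus the compressed share, which avoids chaining `!=` conditions. A minimal sketch; the counts are hypothetical, chosen only to land on the 82% figure:

```python
def uncompressed_share(encoding_counts, total):
    """Fraction of responses not using gzip, br or deflate."""
    compressed = sum(encoding_counts.get(k, 0) for k in ("gzip", "br", "deflate"))
    return 1 - compressed / total


# Hypothetical counts: 780 + 30 + 10 = 820 compressed out of 1000.
counts = {"gzip": 780, "br": 30, "deflate": 10}
share = uncompressed_share(counts, 1000)
print(round(share, 2))
```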


#8

I would update that query to also include requests where type="xml". The values of the type column also did not contain js or json, since they were represented with type="script".

Here’s a query that looks at all of the type values, and summarizes the percentage of uncompressed content:

SELECT type,
       COUNT(*) total,
       SUM(IF(resp_content_encoding!="gzip" AND resp_content_encoding!="deflate" AND resp_content_encoding!="brotli",1,0)) uncompressed_text,
       ROUND(SUM(IF(resp_content_encoding!="gzip" AND resp_content_encoding!="deflate" AND resp_content_encoding!="brotli",1,0)) / COUNT(*),2) percent_uncompressed
FROM httparchive.runs.2017_09_15_requests  
GROUP BY type
ORDER BY total DESC

Looking at the results, you can see that the textual type values are script, html, css, xml and text. 27% of script resources are uncompressed, along with 61% of html, 28% of css, 47% of xml, etc.
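One thing worth checking in these queries: chained `!=` tests never match a NULL, because `NULL != "gzip"` evaluates to NULL and `IF` then takes its false branch. Whether HTTP Archive stores a missing Content-Encoding header as NULL or as an empty string is an assumption I haven't verified; if it is NULL, those responses silently drop out of the uncompressed count. Here is a small SQLite reproduction with made-up rows (SQLite has no `IF`, so `CASE WHEN` stands in for it, and `br` is used as the Brotli value):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE requests (type TEXT, resp_content_encoding TEXT)")
conn.executemany(
    "INSERT INTO requests VALUES (?, ?)",
    [("script", "gzip"), ("script", None), ("script", ""), ("css", "br")],
)

# Chained != tests: a NULL encoding makes the whole condition NULL, so it is not counted.
naive = conn.execute(
    """
    SELECT SUM(CASE WHEN resp_content_encoding != 'gzip'
                     AND resp_content_encoding != 'deflate'
                     AND resp_content_encoding != 'br' THEN 1 ELSE 0 END)
    FROM requests
    """
).fetchone()[0]

# NULL-safe version: treat a missing header the same as an empty one.
safe = conn.execute(
    """
    SELECT SUM(CASE WHEN COALESCE(resp_content_encoding, '')
                         NOT IN ('gzip', 'deflate', 'br') THEN 1 ELSE 0 END)
    FROM requests
    """
).fetchone()[0]

print(naive, safe)
```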


If we combine the textual content and look at the overall percentage of uncompressed text resources, we get approximately 36% of text-based content not being compressed.

SELECT SUM(IF(type="script" OR type="html" OR type="css" OR type="xml" OR type="text",1,0)) text,
       SUM(IF(type="script" OR type="html" OR type="css" OR type="xml" OR type="text",IF(resp_content_encoding!="gzip" AND resp_content_encoding!="deflate" AND resp_content_encoding!="brotli",1,0),0)) uncompressed_text,
       ROUND(SUM(IF(type="script" OR type="html" OR type="css" OR type="xml" OR type="text",IF(resp_content_encoding!="gzip" AND resp_content_encoding!="deflate" AND resp_content_encoding!="brotli",1,0),0)) /SUM(IF(type="script" OR type="html" OR type="css" OR type="xml" OR type="text",1,0)),2) UncompressedText
FROM httparchive.runs.2017_09_15_requests
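Note that this combined figure divides all uncompressed text requests by all text requests, so it is weighted by request volume: a type with many requests pulls the result toward its own rate, and it is not the same as averaging the per-type percentages. A quick check with hypothetical counts (the rates loosely echo the ones above, the totals are made up):

```python
# Hypothetical per-type counts: (total requests, uncompressed requests).
per_type = {
    "script": (1000, 270),  # 27%
    "html": (200, 122),     # 61%
    "css": (400, 112),      # 28%
}

# What the combined query computes: pooled uncompressed / pooled total.
overall = sum(u for _, u in per_type.values()) / sum(t for t, _ in per_type.values())

# Unweighted mean of the per-type rates, for comparison.
mean_of_rates = sum(u / t for t, u in per_type.values()) / len(per_type)

print(round(overall, 3), round(mean_of_rates, 3))
```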


Michael Gooding and Gareth Hughes gave a Fluent presentation (June 2017) in which they looked at similar data for gzip and Brotli compression per domain, to understand not just the percentage of resources compressed overall but also the histograms of resources compressed per URL. The slides are here: https://www.slideshare.net/GarethHughes3/tldr-web-performance-workshop (slides 35 and 37 cover this).


#9

The value for Brotli in this field is actually br, not brotli.
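So a `"brotli"` test never matches, and every Brotli response ends up counted as uncompressed. A tiny illustration with made-up values:

```python
AS_WRITTEN = {"gzip", "deflate", "brotli"}  # value used in the earlier queries
CORRECTED = {"gzip", "deflate", "br"}       # value actually stored for Brotli


def count_uncompressed(encodings, compressed_values):
    """Count responses whose encoding is not in the given set (missing treated as '')."""
    return sum(1 for e in encodings if (e or "") not in compressed_values)


sample = ["gzip", "br", "br", "", "deflate"]
print(count_uncompressed(sample, AS_WRITTEN), count_uncompressed(sample, CORRECTED))
```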


#10

Doh! How did I miss that :slight_smile: