I recently ran into issues on a proxy with websites that respond with Content-Encoding: deflate to requests sent with Accept-Encoding set to ‘gzip,deflate’ (older versions of IIS, for instance).
So I was curious:
select count(url) from [httparchive:runs.latest_requests]
where firstHtml=true and resp_content_encoding = "deflate";
It returns 541, i.e. 0.18% of all firstHtml requests.
Is there any easy way to tell whether Deflate content is actually Raw Deflate or HTTP/1.1 Deflate (deflate data inside a zlib-formatted stream)? (Even outside the HttpArchive context.)
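For reference, the two variants differ only in framing. Per RFC 1950, a zlib stream is a 2-byte header (CMF, FLG), followed by the raw deflate data (RFC 1951), followed by a 4-byte Adler-32 checksum of the uncompressed data, stored big-endian. The C++ sketch below builds both forms by hand using only stored (uncompressed) deflate blocks, so it needs no compression library. The function names are hypothetical, and it is only meant to show the byte layout, not to act as a real encoder.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Adler-32 checksum of the *uncompressed* data (RFC 1950, section 8.2).
std::uint32_t adler32(const std::string& data) {
    const std::uint32_t kMod = 65521;
    std::uint32_t a = 1, b = 0;
    for (unsigned char c : data) {
        a = (a + c) % kMod;
        b = (b + a) % kMod;
    }
    return (b << 16) | a;
}

// "Raw deflate" (RFC 1951) made of stored (uncompressed) blocks only.
// Each block: one header byte (bit 0 = BFINAL, bits 1-2 = BTYPE 00, padding 0),
// then LEN and NLEN (one's complement of LEN), both little-endian, then the bytes.
Bytes rawDeflateStored(const std::string& data) {
    Bytes out;
    std::size_t pos = 0;
    do {
        const std::size_t len = std::min<std::size_t>(data.size() - pos, 65535);
        const bool last = (pos + len == data.size());
        const std::uint16_t len16 = static_cast<std::uint16_t>(len);
        const std::uint16_t nlen16 = static_cast<std::uint16_t>(~len16);
        out.push_back(static_cast<std::uint8_t>(last ? 0x01 : 0x00));
        out.push_back(static_cast<std::uint8_t>(len16 & 0xFF));
        out.push_back(static_cast<std::uint8_t>(len16 >> 8));
        out.push_back(static_cast<std::uint8_t>(nlen16 & 0xFF));
        out.push_back(static_cast<std::uint8_t>(nlen16 >> 8));
        out.insert(out.end(), data.begin() + pos, data.begin() + pos + len);
        pos += len;
    } while (pos < data.size());
    return out;
}

// zlib stream (RFC 1950) = 2-byte header + raw deflate data + Adler-32 trailer.
Bytes zlibStream(const std::string& data) {
    // CMF 0x78: CM = 8 (deflate), CINFO = 7 (32K window).
    // FLG 0x01: FLEVEL = 0, FDICT = 0, FCHECK = 1 so that 0x7801 is a multiple of 31.
    Bytes out = {0x78, 0x01};
    const Bytes body = rawDeflateStored(data);
    out.insert(out.end(), body.begin(), body.end());
    const std::uint32_t sum = adler32(data);
    for (int shift = 24; shift >= 0; shift -= 8) {
        out.push_back(static_cast<std::uint8_t>(sum >> shift));  // big-endian
    }
    return out;
}
```

For the input "abc", rawDeflateStored gives 01 03 00 FC FF 61 62 63, and zlibStream prepends 78 01 and appends the trailer 02 4D 01 27. Those two leading header bytes are the only thing a quick check on the start of the body can look at.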
AFAIK, no. You’d have to sniff the response body, and even then we couldn’t: the crawler runs IE, so WinINet handles all the decompression, etc., and we never see the raw bytes.
Thanks a lot for your answer.
Suppose I use a proxy, so I do have access to the raw bytes. I did not find any easy way to detect Raw Deflate in Java, nor with any other utility.
I found a low-level solution in KDE Core that checks for a zlib header:
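// d: first chunk of the (non-empty) response body, e.g. a QByteArray
bool zlibHeader = true;
// Read bytes as unsigned: with a signed char, FLG values >= 0x80
// (e.g. 0x9C in the common "78 9C" header) go negative and fail the check.
const unsigned char firstChar = static_cast<unsigned char>(d[0]);
if ((firstChar & 0x0f) != 8) {
    // In a zlib header, CM should be 8 (cf RFC 1950)
    zlibHeader = false;
} else if (d.size() > 1) {
    const unsigned char flg = static_cast<unsigned char>(d[1]);
    if ((firstChar * 256 + flg) % 31 != 0) { // Not a multiple of 31? invalid zlib header then
        zlibHeader = false;
    }
}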
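A standalone version of the same test in standard C++, for use outside KDE/Qt, could look like the sketch below. hasZlibHeader is a hypothetical name; besides the CM and multiple-of-31 checks from the KDE snippet, it also rejects CINFO values above 7, which RFC 1950 does not allow. It reads both bytes as unsigned values: on platforms where char is signed, keeping them in a plain char turns FLG bytes such as 0x9C negative, and the very common 78 9C header would then fail the multiple-of-31 test.

```cpp
#include <cstdint>
#include <vector>

// Strict zlib header check (RFC 1950, section 2.2) on the first bytes of a body.
// Bytes are read as unsigned, so FLG values >= 0x80 (e.g. 0x9C in "78 9C") work.
bool hasZlibHeader(const std::vector<std::uint8_t>& body) {
    if (body.size() < 2) {
        return false;  // too short to tell; this sketch just reports "no header"
    }
    const unsigned cmf = body[0];
    const unsigned flg = body[1];
    if ((cmf & 0x0Fu) != 8) {
        return false;  // CM must be 8 (deflate)
    }
    if ((cmf >> 4) > 7) {
        return false;  // CINFO = log2(window size) - 8 must not exceed 7
    }
    return (cmf * 256 + flg) % 31 == 0;  // FCHECK: CMF*256 + FLG is a multiple of 31
}
```

For example, 78 9C, 78 01 and 78 DA (the usual zlib headers) pass, while CB 48 (a raw deflate stream starting with a final fixed-Huffman block) is rejected.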
If I'm not mistaken, zlibHeader = false would mean Raw Deflate for sure?
If so, I will probably take a shot at the 541 results!
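On the "for sure?" part: a false result only says the body does not start with a valid zlib header. Raw deflate is the usual explanation, but a body that is not deflate data at all would fail the check too. The opposite mistake, a raw deflate stream that happens to pass, is limited by the deflate bit layout (RFC 1951, section 3.2.3): the first byte holds BFINAL in bit 0 and BTYPE in bits 1-2, so a low nibble of 8 is only possible for a non-final stored block whose first padding bit is set. The sketch below makes that explicit; the names are hypothetical, and it assumes the first byte of the body is the first byte of the deflate data.

```cpp
#include <cstdint>

// First block header of a raw deflate stream (RFC 1951, section 3.2.3).
// Bits are packed starting from the least significant bit of the first byte.
struct BlockHeader {
    bool isFinal;   // BFINAL (bit 0)
    unsigned type;  // BTYPE (bits 1-2): 0 stored, 1 fixed Huffman, 2 dynamic Huffman, 3 reserved
};

BlockHeader firstBlockHeader(std::uint8_t firstByte) {
    return BlockHeader{(firstByte & 0x01u) != 0, (firstByte >> 1) & 0x03u};
}

// Could a raw deflate stream starting with this byte pass the CM == 8 test?
// (firstByte & 0x0F) == 8 needs bits 0-2 clear and bit 3 set, i.e. a
// non-final stored block whose first padding bit is 1.
bool rawDeflateCouldPassCmCheck(std::uint8_t firstByte) {
    const BlockHeader h = firstBlockHeader(firstByte);
    const bool paddingBit3 = (firstByte & 0x08u) != 0;
    return !h.isFinal && h.type == 0 && paddingBit3;
}
```

zlib, for one, writes those padding bits as zero, so raw deflate bodies produced that way start with a byte whose low nibble is never 8 and always fail the CM test. That supports reading zlibHeader = false as raw deflate, with the caveat above about bodies that are not deflate at all.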
Seems right, but I’m not an expert on the subject. Sounds like a good StackOverflow question for some compression gurus!
Sounds fair! Thanks again.
I hope to come back soon with the results.