I wanted to know the percentage of responses that had a status code of 200, 304, 404, etc. Here’s the query:
SELECT status, round(100*ratio) as percent, num
FROM (
  SELECT status, count(*) as num, RATIO_TO_REPORT(num) OVER() ratio
  FROM [httparchive:runs.2014_11_15_requests]
  GROUP BY status
)
ORDER BY num desc
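For anyone who wants to sanity-check what the query computes, here is a minimal Python sketch of the same calculation: count the responses per status, divide each count by the total, and sort by count. The function name `status_percentages` and the sample data are made up for illustration and are not taken from the HTTP Archive tables.

```python
from collections import Counter


def status_percentages(statuses):
    """Return (status, percent, num) rows, most frequent status first."""
    counts = Counter(statuses)      # same role as count(*) ... GROUP BY status
    total = sum(counts.values())    # the denominator RATIO_TO_REPORT divides by
    rows = [
        (status, round(100 * num / total), num)
        for status, num in counts.items()
    ]
    # same role as ORDER BY num desc
    return sorted(rows, key=lambda row: row[2], reverse=True)


# Hypothetical sample: 7 responses with 200, 2 with 304, 1 with 404.
sample = [200] * 7 + [304] * 2 + [404]
for status, percent, num in status_percentages(sample):
    print(status, percent, num)
```

One caveat: Python's `round` rounds exact halves to the nearest even integer, so a value such as 2.5 may come out differently here than it does in the query.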
What's with the “0” response code? Does that mean a literal 0, or is it some kind of error code?
It would also be interesting to fork httparchive so that it maintains a browser cache, and then look at the trend of 304s. My hypothesis is that the low number of 304s is explained by the fact that nothing is cached between crawls, right?