How does one go about finding out the median # of application/json or /xml requests for pages within http archive? Wish to break that down to first party and third party stats as well.
Anyone happen to have this data handy?
How does one go about finding out the median # of application/json or /xml requests for pages within http archive? Wish to break that down to first party and third party stats as well.
Anyone happen to have this data handy?
Hi akhil,
I may have some data handy. Here are 4 queries you can run on BigQuery:
Percentiles for a specific month for all pages (50% percentile is median)
SELECT PERCENTILE_DISC(total_requests, 0) OVER() p0, PERCENTILE_DISC(total_requests, 0.1) OVER() p10, PERCENTILE_DISC(total_requests, 0.25) OVER() p25, PERCENTILE_DISC(total_requests, 0.5) OVER() p50, PERCENTILE_DISC(total_requests, 0.75) OVER() p75, PERCENTILE_DISC(total_requests, 0.9) OVER() p90, PERCENTILE_DISC(total_requests, 1) OVER() p100 FROM ( SELECT pageid, COUNT(case when mimeType LIKE "%json%" then 1 else null end) total_requests FROM httparchive.summary_requests.2018_12_15_desktop GROUP BY pageid ) LIMIT 1
Percentiles for a specific months for pages with at least 1 JSON request:
SELECT PERCENTILE_DISC(total_requests, 0) OVER() p0, PERCENTILE_DISC(total_requests, 0.1) OVER() p10, PERCENTILE_DISC(total_requests, 0.25) OVER() p25, PERCENTILE_DISC(total_requests, 0.5) OVER() p50, PERCENTILE_DISC(total_requests, 0.75) OVER() p75, PERCENTILE_DISC(total_requests, 0.9) OVER() p90, PERCENTILE_DISC(total_requests, 1) OVER() p100 FROM ( SELECT pageid, COUNT(*) total_requests FROM httparchive.summary_requests.2018_12_15_desktop WHERE mimeType LIKE "%json%" GROUP BY pageid ) LIMIT 1
Average JSON requests and pages that send at least 1 JSON request over time:
SELECT t1.table_name as table_suffix, total_requests, total_json_requests, total_json_requests/total_requests as percent_json_requets, total_json_requests/total_json_page_requests as percent_json_requets_json_pages, total_pages, json_pages, json_pages/total_pages as percent_json_pages, total_requests/total_pages AS requests_per_page, total_json_requests/total_pages AS json_requests_per_page, total_json_requests/json_pages AS json_requests_per_json_page FROM (SELECT _TABLE_SUFFIX as table_name, COUNT(*) total_json_requests, COUNT(DISTINCT pageid) json_pages FROM `httparchive.summary_requests.201*` WHERE LOWER(mimeType) LIKE "%json%" AND _TABLE_SUFFIX LIKE '%desktop' AND _TABLE_SUFFIX BETWEEN '5_00' AND '8_13' GROUP BY table_name ) t1 JOIN (SELECT _TABLE_SUFFIX as table_name, COUNT(*) total_requests, COUNT(DISTINCT pageid) total_pages FROM `httparchive.summary_requests.201*` WHERE _TABLE_SUFFIX LIKE '%desktop' AND _TABLE_SUFFIX BETWEEN '5_00' AND '8_13' GROUP BY table_name ) t2 ON t1.table_name = t2.table_name JOIN ( SELECT table_name, SUM(json_page_requests) total_json_page_requests FROM ( SELECT _TABLE_SUFFIX as table_name, pageid, COUNT(*) json_page_requests FROM `httparchive.summary_requests.201*` WHERE _TABLE_SUFFIX LIKE '%desktop' AND _TABLE_SUFFIX BETWEEN '5_00' AND '8_13' GROUP BY table_name, pageid HAVING json_page_requests > 0 AND COUNT(CASE WHEN LOWER(mimeType) LIKE '%json%' THEN 1 ELSE NULL END) > 0 ) GROUP BY table_name ) t3 ON t1.table_name = t3.table_name ORDER BY table_suffix
I have some plots for the last query using the average # of JSON requests (similar plots can be made using median).
Percent of HTTP Archive pages that send at least 1 JSON request (JSON Pages/All Pages)
Average # of JSON Requests per JSON Pages (Pages that send at least 1 JSON request)
Percent of JSON requests (JSON/All Requests)
Note: These plots do not start at 0 (and do not end at 100) so they are out of scale.
Hope this gives a starting point.
This is great and just what I was looking for @SantiagoVargas. Thank you!
Wow I can’t believe I haven’t known about this function until now! I’ve always used APPROX_QUANTILES
with OFFSET
.