Median # of application/json or /xml requests within a page

#1

How does one go about finding out the median # of application/json or /xml requests for pages within http archive? Wish to break that down to first party and third party stats as well.

Anyone happen to have this data handy?

#2

Hi akhil,

I may have some data handy. Here are 4 queries you can run on BigQuery:

Percentiles for a specific month for all pages (50% percentile is median)

SELECT 
PERCENTILE_DISC(total_requests, 0) OVER() p0,
PERCENTILE_DISC(total_requests, 0.1) OVER() p10,
PERCENTILE_DISC(total_requests, 0.25) OVER() p25,
PERCENTILE_DISC(total_requests, 0.5) OVER() p50,
PERCENTILE_DISC(total_requests, 0.75) OVER() p75,
PERCENTILE_DISC(total_requests, 0.9) OVER() p90,
PERCENTILE_DISC(total_requests, 1) OVER() p100
FROM (
SELECT pageid,
COUNT(case when mimeType LIKE "%json%" then 1 else null end) total_requests
FROM httparchive.summary_requests.2018_12_15_desktop
GROUP BY pageid
) LIMIT 1

Percentiles for a specific months for pages with at least 1 JSON request:

SELECT 
PERCENTILE_DISC(total_requests, 0) OVER() p0,
PERCENTILE_DISC(total_requests, 0.1) OVER() p10,
PERCENTILE_DISC(total_requests, 0.25) OVER() p25,
PERCENTILE_DISC(total_requests, 0.5) OVER() p50,
PERCENTILE_DISC(total_requests, 0.75) OVER() p75,
PERCENTILE_DISC(total_requests, 0.9) OVER() p90,
PERCENTILE_DISC(total_requests, 1) OVER() p100
FROM (
SELECT pageid,
COUNT(*) total_requests
FROM httparchive.summary_requests.2018_12_15_desktop
WHERE mimeType LIKE "%json%"
GROUP BY pageid
) LIMIT 1

Average JSON requests and pages that send at least 1 JSON request over time:

SELECT 
t1.table_name as table_suffix,
total_requests,
total_json_requests,
total_json_requests/total_requests as percent_json_requets,
total_json_requests/total_json_page_requests as percent_json_requets_json_pages,
total_pages,
json_pages,
json_pages/total_pages as percent_json_pages,
total_requests/total_pages AS requests_per_page,
total_json_requests/total_pages AS json_requests_per_page,
total_json_requests/json_pages AS json_requests_per_json_page
FROM (SELECT
  _TABLE_SUFFIX as table_name,
  COUNT(*) total_json_requests,
  COUNT(DISTINCT pageid) json_pages
FROM
  `httparchive.summary_requests.201*`
WHERE
  LOWER(mimeType) LIKE "%json%" AND _TABLE_SUFFIX LIKE '%desktop' AND _TABLE_SUFFIX BETWEEN '5_00' AND '8_13'
GROUP BY table_name
) t1 
JOIN (SELECT
  _TABLE_SUFFIX as table_name,
  COUNT(*) total_requests,
  COUNT(DISTINCT pageid) total_pages
FROM
  `httparchive.summary_requests.201*`
WHERE
  _TABLE_SUFFIX LIKE '%desktop' AND _TABLE_SUFFIX BETWEEN '5_00' AND '8_13'
GROUP BY table_name
) t2 ON t1.table_name = t2.table_name
JOIN (
  SELECT
    table_name,
    SUM(json_page_requests) total_json_page_requests
  FROM (
    SELECT
      _TABLE_SUFFIX as table_name,
      pageid,
      COUNT(*) json_page_requests
    FROM
      `httparchive.summary_requests.201*`
    WHERE
      _TABLE_SUFFIX LIKE '%desktop' AND _TABLE_SUFFIX BETWEEN '5_00' AND '8_13'
    GROUP BY table_name, pageid
    HAVING json_page_requests > 0 AND COUNT(CASE WHEN LOWER(mimeType) LIKE '%json%' THEN 1 ELSE NULL END) > 0
  ) 
  GROUP BY table_name
) t3 ON t1.table_name = t3.table_name
ORDER BY table_suffix

I have some plots for the last query using the average # of JSON requests (similar plots can be made using median).

Percent of HTTP Archive pages that send at least 1 JSON request (JSON Pages/All Pages)
json_pages_percent

Average # of JSON Requests per JSON Pages (Pages that send at least 1 JSON request)
json_requests_json_page

Percent of JSON requests (JSON/All Requests)
percent_json_requests

Note: These plots do not start at 0 (and do not end at 100) so they are out of scale.

Hope this gives a starting point.

1 Like
#3

This is great and just what I was looking for @SantiagoVargas. Thank you!

1 Like
#4

Wow I can’t believe I haven’t known about this function until now! I’ve always used APPROX_QUANTILES with OFFSET.