How do we retrieve the complete record (e.g. onLoad, bytesTotal, reqTotal) corresponding to the median onload time of a URL (e.g. http://www.uber.com/) across runs?
The queries in the report on the site allows grouping by runs using wild card character (table names suffixed with the date - httparchive.summary_pages.*)… Though I was able to use group by URL and APPROX_QUANTILES across tables, need some advise in retrieving the other attributes corresponding to the median value.
Was trying to understand the variability in performance of pages across runs.
IIUC you’re looking for the date at which the OL metric is the median for all crawls, then you’d like to get the other stats for that date.
The trick is to use a subquery in the WHERE clause that gets the median OL time.
SELECT
SUBSTR(_TABLE_SUFFIX, 0, 10) AS date,
onLoad,
bytesTotal,
reqTotal
FROM
`httparchive.summary_pages.*`
WHERE
url = 'http://www.uber.com/'
AND ENDS_WITH(_TABLE_SUFFIX, 'desktop')
AND onLoad = (
SELECT
APPROX_QUANTILES(onLoad, 1001)[OFFSET(501)]
FROM
`httparchive.summary_pages.*`
WHERE
url = 'http://www.uber.com/'
AND ENDS_WITH(_TABLE_SUFFIX, 'desktop'))
Thanks @rviscomi - it helped.
Also included another nested subquery, to allow for fetching of details corresponding to the median run for multiple urls in the query.
SELECT
SUBSTR(_TABLE_SUFFIX, 0, 10) AS date,
url,
onLoad,
bytesTotal,
reqTotal
FROM
`httparchive.summary_pages.*`
WHERE
_TABLE_SUFFIX >= '2018_01_01'
AND ENDS_WITH(_TABLE_SUFFIX, 'desktop')
AND url IN ('http://www.uber.com/',
'http://www.sbi.co.in/',
'http://www.onlinesbi.com/')
AND CONCAT(url, CAST(onLoad AS STRING)) IN (
SELECT
CONCAT(url, CAST(p50onLoad AS STRING)) AS urlPlusOnLoad
FROM (
SELECT
url,
COUNT(url) AS urlcount,
APPROX_QUANTILES(onLoad, 1001)[
OFFSET
(501)] AS p50onLoad
FROM
`httparchive.summary_pages.*`
WHERE
_TABLE_SUFFIX >= '2018_01_01'
AND ENDS_WITH(_TABLE_SUFFIX, 'desktop')
AND url IN ('http://www.uber.com/',
'http://www.sbi.co.in/',
'http://www.onlinesbi.com/')
GROUP BY
url
ORDER BY
url ) )