Median onload time of a URL across runs


#1

How do we retrieve the complete record (e.g. onLoad, bytesTotal, reqTotal) corresponding to the median onload time of a URL (e.g. http://www.uber.com/) across runs?

The queries in the report on the site allows grouping by runs using wild card character (table names suffixed with the date - httparchive.summary_pages.*)… Though I was able to use group by URL and APPROX_QUANTILES across tables, need some advise in retrieving the other attributes corresponding to the median value.

Was trying to understand the variability in performance of pages across runs.

Any help is appreciated.


#2

IIUC you’re looking for the date at which the OL metric is the median for all crawls, then you’d like to get the other stats for that date.

The trick is to use a subquery in the WHERE clause that gets the median OL time.

SELECT
  SUBSTR(_TABLE_SUFFIX, 0, 10) AS date,
  onLoad,
  bytesTotal,
  reqTotal
FROM
  `httparchive.summary_pages.*`
WHERE
  url = 'http://www.uber.com/'
  AND ENDS_WITH(_TABLE_SUFFIX, 'desktop')
  AND onLoad = (
  SELECT
    APPROX_QUANTILES(onLoad, 1001)[OFFSET(501)]
  FROM
    `httparchive.summary_pages.*`
  WHERE
    url = 'http://www.uber.com/'
    AND ENDS_WITH(_TABLE_SUFFIX, 'desktop'))
date onLoad bytesTotal reqTotal
2015_10_15 9974 5179852 71

https://bigquery.cloud.google.com/savedquery/226352634162:566dab5afa194ca8b6d89ffc23787d1c


#3

Thanks @rviscomi - it helped.
Also included another nested subquery, to allow for fetching of details corresponding to the median run for multiple urls in the query.

SELECT
  SUBSTR(_TABLE_SUFFIX, 0, 10) AS date,
  url,
  onLoad,
  bytesTotal,
  reqTotal
FROM
  `httparchive.summary_pages.*`
WHERE
  _TABLE_SUFFIX >= '2018_01_01'
  AND ENDS_WITH(_TABLE_SUFFIX, 'desktop')
  AND url IN ('http://www.uber.com/',
    'http://www.sbi.co.in/',
    'http://www.onlinesbi.com/')
  AND CONCAT(url, CAST(onLoad AS STRING)) IN (
  SELECT
    CONCAT(url, CAST(p50onLoad AS STRING)) AS urlPlusOnLoad
  FROM (
    SELECT
      url,
      COUNT(url) AS urlcount,
      APPROX_QUANTILES(onLoad, 1001)[
    OFFSET
      (501)] AS p50onLoad
    FROM
      `httparchive.summary_pages.*`
    WHERE
      _TABLE_SUFFIX >= '2018_01_01'
      AND ENDS_WITH(_TABLE_SUFFIX, 'desktop')
      AND url IN ('http://www.uber.com/',
        'http://www.sbi.co.in/',
        'http://www.onlinesbi.com/')
    GROUP BY
      url
    ORDER BY
      url ) )

image