Size of document cookies

    SELECT  
     NTH(50, quantiles(numbytes)) median,
     NTH(75, quantiles(numbytes)) seventy_fifth,
     NTH(90, quantiles(numbytes)) ninetieth,
     NTH(95, quantiles(numbytes)) ninety_fifth
    FROM (  
      SELECT respCookieLen as numbytes
      FROM [httparchive:runs.latest_requests]
      WHERE firsthtml = true
    )

Based on the July 1, 2013 crawl:

Not as big as I feared. But there are still some big ones out there:

    select respCookieLen, pageid, url from [httparchive:runs.latest_requests]
    where firsthtml=true order by respCookieLen desc limit 10

My heart skipped a beat when I initially thought the 10th largest cookie was from “google.com”. Then I looked closer. :wink:

Well, the good news is that the tail is indeed very long, but it decays very quickly! Once you get over 1.5K, we’re looking at single-digit numbers of sites (with outliers at 10K). Quick histogram:

    SELECT cookie_bucket, COUNT(*) pages FROM (
      SELECT
        ROUND(respCookieLen/10)*10 cookie_bucket
      FROM [httparchive:runs.latest_requests]
      WHERE firsthtml = true
    )
    GROUP BY cookie_bucket
    ORDER BY cookie_bucket;
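
For anyone who wants to sanity-check the bucketing outside BigQuery, here is a minimal Python sketch of the same idea: round each cookie size to the nearest multiple of 10, then count pages per bucket. The sample sizes are made up for illustration and are not from the crawl, and `cookie_bucket` and `histogram` are hypothetical helper names. It assumes halves round up (5 lands in bucket 10), which I believe matches `ROUND` in the query, so the sketch avoids Python's built-in `round()`, which rounds halves to the nearest even number.

```python
import math
from collections import Counter


def cookie_bucket(num_bytes, width=10):
    """Round a non-negative size to the nearest multiple of `width` (halves round up)."""
    return int(math.floor(num_bytes / width + 0.5)) * width


def histogram(sizes, width=10):
    """Return (bucket, pages) pairs sorted by bucket, like GROUP BY + ORDER BY."""
    counts = Counter(cookie_bucket(size, width) for size in sizes)
    return sorted(counts.items())


# Hypothetical cookie sizes in bytes, NOT real HTTP Archive data.
sample_sizes = [0, 4, 5, 14, 15, 16, 1503, 10240]

for bucket, pages in histogram(sample_sizes):
    print(bucket, pages)
```

Passing a larger `width` (say 100) gives coarser buckets, which may be easier to read once you are out in the long tail.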