Evite is delivering 1MB+ in HTTP cookies, the bulk of which is due to 150+ TowerData tracking requests with large cookies… Is that a bug, or a recurring pattern?
SELECT * FROM ( SELECT *, SUM(reqHeaders+respHeaders+respBody) OVER(PARTITION BY pages.url) as total FROM ( SELECT pages.url, pages.rank, COUNT(*) as numRequests, SUM(reqHeadersSize) as reqHeaders, SUM(respHeadersSize) as respHeaders, SUM(respBodySize) as respBody, FROM httparchive:runs.2014_11_15_pages as pages JOIN ( SELECT pageid, url, reqHeadersSize, respHeadersSize, respBodySize FROM httparchive:runs.2014_11_15_requests WHERE url CONTAINS 'rlcdn.com' ) as req ON req.pageid = pages.pageid WHERE pages.rank IS NOT NULL GROUP BY pages.url, pages.rank ) ORDER BY rank, total DESC ) WHERE total > 1000000
There are 1306 sites that incur 1MB+ due to rlcdn.com (TowerData’s CDN)… The Atlantic, Shutterfly, Evite, Daily Record, and the list goes on. These sites account for ~10% of pages on which TowerData is used. Most pages only make a few requests to their CDN:
This difference is suspect… I wonder if this is a (really costly) misconfiguration for 10% of the sites, or if the ~1MB+ of trackers is a “feature”. In either case though… ouch!
Followup question: is TowerData an outlier, or is this a common pattern?
It does appear to be an outlier. There are over 300+ third-party domains which are used across 1000+ pages and rlcdn.com incurs the highest overhead.
E.g. reading above chart… google.com is the most widely referenced third-party hostname: used by ~125K pages and accounts for ~700K requests. The median number of requests is 2, and 95th percentile is 18. For these requests, the median request and response header size is ~2KB, and ~18KB for 95th percentile.
SELECT origin, COUNT(DISTINCT pageid) as pages, SUM(numReq) as reqTotal, /* number of requests */ NTH(50, QUANTILES(numReq)) as numReq_50, NTH(95, QUANTILES(numReq)) as numReq_95, /* request headers bytesize */ NTH(50, QUANTILES(reqHeaders)) as reqHeaders_50, NTH(95, QUANTILES(reqHeaders)) as reqHeaders_95, /* response headers bytesize */ NTH(50, QUANTILES(respHeaders)) as respHeaders_50, NTH(95, QUANTILES(respHeaders)) as respHeaders_95, /* response body bytesize */ NTH(50, QUANTILES(respBody)) as respBody_50, NTH(95, QUANTILES(respBody)) as respBody_95, FROM ( SELECT origin, pageid, COUNT(*) as numReq, SUM(reqHeadersSize) as reqHeaders, SUM(reqBodySize) as reqBody, SUM(respHeadersSize) as respHeaders, SUM(respBodySize) as respBody FROM ( SELECT req.pageid as pageid, DOMAIN(req.url) as origin, reqHeadersSize, respHeadersSize, IF (INTEGER(respBodySize) IS NULL, 0, INTEGER(respBodySize)) as respBodySize, IF (INTEGER(reqBodySize) IS NULL, 0, INTEGER(reqBodySize)) as reqBodySize FROM httparchive:runs.2014_11_15_requests as req JOIN ( SELECT pageid, url FROM httparchive:runs.2014_11_15_pages ) as pages ON req.pageid = pages.pageid /* restrict analysis to non-origin domains */ WHERE DOMAIN(req.url) != DOMAIN(pages.url) ) GROUP BY pageid, origin ) GROUP EACH BY origin /* restrict to origins used by 1000+ distinct pages */ HAVING pages > 1000 ORDER BY reqTotal desc