Evite is delivering 1MB+ in HTTP cookies, the bulk of which is due to 150+ TowerData tracking requests with large cookies… Is that a bug, or a recurring pattern?
SELECT * FROM (
SELECT *, SUM(reqHeaders+respHeaders+respBody) OVER(PARTITION BY pages.url) as total FROM (
SELECT pages.url, pages.rank,
COUNT(*) as numRequests,
SUM(reqHeadersSize) as reqHeaders,
SUM(respHeadersSize) as respHeaders,
SUM(respBodySize) as respBody,
FROM httparchive:runs.2014_11_15_pages as pages JOIN (
SELECT pageid, url, reqHeadersSize, respHeadersSize, respBodySize
FROM httparchive:runs.2014_11_15_requests
WHERE url CONTAINS 'rlcdn.com'
) as req ON req.pageid = pages.pageid
WHERE pages.rank IS NOT NULL
GROUP BY pages.url, pages.rank
)
ORDER BY rank, total DESC
)
WHERE total > 1000000
There are 1306 sites that incur 1MB+ due to rlcdn.com (TowerData’s CDN)… The Atlantic, Shutterfly, Evite, Daily Record, and the list goes on. These sites account for ~10% of pages on which TowerData is used. Most pages only make a few requests to their CDN:
This difference is suspect… I wonder if this is a (really costly) misconfiguration for 10% of the sites, or if the ~1MB+ of trackers is a “feature”. In either case though… ouch!
Followup question: is TowerData an outlier, or is this a common pattern?
It does appear to be an outlier. There are over 300+ third-party domains which are used across 1000+ pages and rlcdn.com incurs the highest overhead.
E.g. reading above chart… google.com is the most widely referenced third-party hostname: used by ~125K pages and accounts for ~700K requests. The median number of requests is 2, and 95th percentile is 18. For these requests, the median request and response header size is ~2KB, and ~18KB for 95th percentile.
SELECT origin,
COUNT(DISTINCT pageid) as pages,
SUM(numReq) as reqTotal,
/* number of requests */
NTH(50, QUANTILES(numReq)) as numReq_50,
NTH(95, QUANTILES(numReq)) as numReq_95,
/* request headers bytesize */
NTH(50, QUANTILES(reqHeaders)) as reqHeaders_50,
NTH(95, QUANTILES(reqHeaders)) as reqHeaders_95,
/* response headers bytesize */
NTH(50, QUANTILES(respHeaders)) as respHeaders_50,
NTH(95, QUANTILES(respHeaders)) as respHeaders_95,
/* response body bytesize */
NTH(50, QUANTILES(respBody)) as respBody_50,
NTH(95, QUANTILES(respBody)) as respBody_95,
FROM (
SELECT origin, pageid,
COUNT(*) as numReq,
SUM(reqHeadersSize) as reqHeaders,
SUM(reqBodySize) as reqBody,
SUM(respHeadersSize) as respHeaders,
SUM(respBodySize) as respBody
FROM (
SELECT req.pageid as pageid, DOMAIN(req.url) as origin,
reqHeadersSize, respHeadersSize,
IF (INTEGER(respBodySize) IS NULL, 0, INTEGER(respBodySize)) as respBodySize,
IF (INTEGER(reqBodySize) IS NULL, 0, INTEGER(reqBodySize)) as reqBodySize
FROM httparchive:runs.2014_11_15_requests as req JOIN (
SELECT pageid, url
FROM httparchive:runs.2014_11_15_pages
) as pages ON req.pageid = pages.pageid
/* restrict analysis to non-origin domains */
WHERE DOMAIN(req.url) != DOMAIN(pages.url)
)
GROUP BY pageid, origin
)
GROUP EACH BY origin
/* restrict to origins used by 1000+ distinct pages */
HAVING pages > 1000
ORDER BY reqTotal desc