1MB+ of HTTP overhead due to TowerData cookies


#1

Evite is delivering 1MB+ in HTTP cookies, the bulk of which is due to 150+ TowerData tracking requests with large cookies… Is that a bug, or a recurring pattern?

SELECT * FROM (
  SELECT *, SUM(reqHeaders+respHeaders+respBody) OVER(PARTITION BY pages.url) as total FROM (
    SELECT pages.url, pages.rank,
      COUNT(*) as numRequests,
      SUM(reqHeadersSize) as reqHeaders,
      SUM(respHeadersSize) as respHeaders,
      SUM(respBodySize) as respBody,
    FROM httparchive:runs.2014_11_15_pages as pages JOIN (
      SELECT pageid, url, reqHeadersSize, respHeadersSize, respBodySize
      FROM httparchive:runs.2014_11_15_requests
      WHERE url CONTAINS 'rlcdn.com'
    ) as req ON req.pageid = pages.pageid
    WHERE pages.rank IS NOT NULL
    GROUP BY pages.url, pages.rank
  ) 
  ORDER BY rank, total DESC
) 
WHERE total > 1000000

There are 1306 sites that incur 1MB+ due to rlcdn.com (TowerData’s CDN)… The Atlantic, Shutterfly, Evite, Daily Record, and the list goes on. These sites account for ~10% of pages on which TowerData is used. Most pages only make a few requests to their CDN:

This difference is suspect… I wonder if this is a (really costly) misconfiguration for 10% of the sites, or if the ~1MB+ of trackers is a “feature”. In either case though… ouch!


Followup question: is TowerData an outlier, or is this a common pattern?

It does appear to be an outlier. There are over 300+ third-party domains which are used across 1000+ pages and rlcdn.com incurs the highest overhead.

E.g. reading above chart… google.com is the most widely referenced third-party hostname: used by ~125K pages and accounts for ~700K requests. The median number of requests is 2, and 95th percentile is 18. For these requests, the median request and response header size is ~2KB, and ~18KB for 95th percentile.

SELECT origin,
  COUNT(DISTINCT pageid) as pages,
  SUM(numReq) as reqTotal,  
  /* number of requests */
  NTH(50, QUANTILES(numReq)) as numReq_50,
  NTH(95, QUANTILES(numReq)) as numReq_95,
  /* request headers bytesize */
  NTH(50, QUANTILES(reqHeaders)) as reqHeaders_50,
  NTH(95, QUANTILES(reqHeaders)) as reqHeaders_95,
  /* response headers bytesize */
  NTH(50, QUANTILES(respHeaders)) as respHeaders_50,
  NTH(95, QUANTILES(respHeaders)) as respHeaders_95,
  /* response body bytesize */
  NTH(50, QUANTILES(respBody)) as respBody_50,
  NTH(95, QUANTILES(respBody)) as respBody_95,
FROM (
  SELECT origin, pageid,
    COUNT(*) as numReq,
    SUM(reqHeadersSize) as reqHeaders, 
    SUM(reqBodySize) as reqBody,
    SUM(respHeadersSize) as respHeaders, 
    SUM(respBodySize) as respBody
  FROM (
    SELECT req.pageid as pageid, DOMAIN(req.url) as origin,
      reqHeadersSize, respHeadersSize, 
      IF (INTEGER(respBodySize) IS NULL, 0, INTEGER(respBodySize)) as respBodySize,
      IF (INTEGER(reqBodySize) IS NULL, 0, INTEGER(reqBodySize)) as reqBodySize
    FROM httparchive:runs.2014_11_15_requests as req JOIN (
      SELECT pageid, url
      FROM httparchive:runs.2014_11_15_pages
    ) as pages ON req.pageid = pages.pageid
    /* restrict analysis to non-origin domains */
    WHERE DOMAIN(req.url) != DOMAIN(pages.url)
  ) 
  GROUP BY pageid, origin
) 
GROUP EACH BY origin
/* restrict to origins used by 1000+ distinct pages */
HAVING pages > 1000
ORDER BY reqTotal desc

HTTP Archive turns 7!
#3

BTW vdopia.com is the new towerCDN :slight_smile: