Too Many Redirects: Your Customers are Not Ping Pong Balls


#1

A few weeks ago, I looked at the cost of starting a website load with a redirect, and discovered that mobile websites that kick off page load with a redirect generally have a SpeedIndex that is 2500 higher than similar sites. I called this the “SpeedIndex Tax.”

As I speak to other people in the performance field, I have heard rumors of sites that start off with 5 or 6 redirects. I picture this as customers being tossed like a ping-pong ball from server to server. But what is the actual maximum number of ‘ping-pong’ bounces a site will do before loading content? @andydavies and I tried to discover this at the 11th hour before our talk last year at Velocity, but ran into issues (that I’ll outline below). I thought that it would be really fun to revisit this problem, and discover the sites that ping-pong their customers around on their servers the most. So I went back to BigQuery and began to dig in.

In the HTTPArchive, all requests have a requestId, but since multiple devices are testing sites at once, the requestIds for a single site are not sequential. This means that if id 1234567 is the first request for a site, 1234568 is not always the second. So walking the request ID sequentially was not going to work (this is where Andy and I gave up last year). However, I can get the request ID of the first request, and the requestId of the first HTML page, and calculate the difference:

SELECT
  firstrequest.pageid,
  firstrequest.url,
  (firsthtml.requestid - firstrequest.requestid) AS requestdiff
FROM [httparchive:runs.2015_01_01_requests_mobile] firstrequest
JOIN EACH (
  SELECT pageid, firstHtml, requestid
  FROM [httparchive:runs.2015_01_01_requests_mobile]
  WHERE firstHtml = true
) firsthtml ON firstrequest.pageid = firsthtml.pageid
WHERE firsthtml.firstHtml = true AND firstrequest.firstReq = true
ORDER BY requestdiff DESC
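For anyone who wants to check the join logic away from BigQuery, here is a minimal Python sketch of the same calculation. The column names match the query above, but the sample rows and the `request_diffs` helper are made up for illustration and are not HTTP Archive data.

```python
# Hypothetical sample rows using the same column names as the requests table.
# The values are invented for illustration only.
rows = [
    {"pageid": 1, "requestid": 100, "url": "http://a.example/", "firstReq": True, "firstHtml": False},
    {"pageid": 1, "requestid": 103, "url": "http://m.a.example/", "firstReq": False, "firstHtml": True},
    {"pageid": 2, "requestid": 101, "url": "http://b.example/", "firstReq": True, "firstHtml": True},
]


def request_diffs(rows):
    """Return (pageid, url, requestdiff) tuples, largest difference first."""
    # Index the first request and the first HTML response of every page.
    first_req = {r["pageid"]: r for r in rows if r["firstReq"]}
    first_html = {r["pageid"]: r for r in rows if r["firstHtml"]}
    diffs = [
        (pageid, req["url"], first_html[pageid]["requestid"] - req["requestid"])
        for pageid, req in first_req.items()
        if pageid in first_html  # inner join: skip pages with no first HTML row
    ]
    return sorted(diffs, key=lambda d: d[2], reverse=True)


diffs = request_diffs(rows)
```

A page whose first request is also its first HTML response gets a difference of 0, which is the "no redirect" case.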

In between these 2 requests, there might be no additional requests for that site (302 -> html). Those are not so exciting :smile:. We are looking for the sites with the largest number of additional requests in between (since those would all have to be redirects of some sort).

Now, admittedly, there are a lot of reasons why the first request and the first HTML page could have a large difference in requestId:

  1. The site is hosted far away from Dulles, the RTT is long (I do see a large percentage of foreign websites in this list.)
  2. It could be that another device was testing a site with a bunch of small files in between the 2 requests.
  3. Or, the page in question has a huge number of redirects.

So, the goal is to find the #3 data. As you might expect, there are a few sites with a large delta, quickly dropping to small values. If we sort the sites from highest delta to lowest, we get a chart like below:

So, the site we are looking for, the “I can’t believe that this site redirects THAT many times” site, is over to the left somewhere. There are 46 sites with a requestId difference >10 and 365 sites with a requestId difference >4 in this data set.

Here’s where I got into a bit of brute force in my hunt for “the site.” I have all of the page ids, so I can just count the # of steps in between for each one:

SELECT pageid, requestid, url, status, firstHtml, firstReq
FROM [httparchive:runs.2015_01_01_requests_mobile] AS requests
WHERE pageid =
ORDER BY requestid ASC;
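Counting the steps by eye gets tedious, so here is a rough Python sketch of the counting step, assuming the rows for one page have already been fetched with the query above. The `count_redirects` helper and the sample rows are hypothetical. It treats any 3xx status between the first request and the first HTML response as a redirect.

```python
def count_redirects(page_rows):
    """Count 3xx responses from the first request up to the first HTML response."""
    ordered = sorted(page_rows, key=lambda r: r["requestid"])
    count = 0
    started = False
    for r in ordered:
        if r["firstReq"]:
            started = True
        if not started:
            continue  # ignore anything logged before the first request
        if r["firstHtml"]:
            break  # reached the first HTML page, stop counting
        if 300 <= r["status"] < 400:
            count += 1
    return count


# Hypothetical rows for a single page (values invented for illustration).
sample_page = [
    {"requestid": 10, "status": 301, "firstReq": True, "firstHtml": False},
    {"requestid": 12, "status": 302, "firstReq": False, "firstHtml": False},
    {"requestid": 15, "status": 302, "firstReq": False, "firstHtml": False},
    {"requestid": 16, "status": 200, "firstReq": False, "firstHtml": True},
    {"requestid": 17, "status": 200, "firstReq": False, "firstHtml": False},
]

redirects = count_redirects(sample_page)
```

Note that the requestIds in the sample are deliberately not consecutive: the count depends on the rows for the page, not on the raw requestId difference.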

Here are the results of the first 46 (where the difference in requestId is 10 or greater):

It should be evident why I stopped checking once the requestId difference dropped to 10. The record holder for the mobile site with the greatest number of redirects wins out at 10! It is for a job search company in Russia. To double check, I re-ran this site in WebPageTest.org on a Moto G. Here is a screenshot of the waterfall:

Now there are 11 redirects before code is rendered! A full 12s of customer ping-pong before even the first HTML page is requested. (Here is an appropriate place to insert a This is Spinal Tap reference, but I will leave that to the reader.)

Of the 2 sites with 6 redirects, again we have another Russian site, but also a popular toy company (note that they redirect all the image requests too!). Again, this is a Moto G in WebPageTest:

In response to my last post, I received a tweet reminding developers to cache their 302s for the return visit.

Here is the cached view of the waterfall (and you can guess how well those 30xs are cached):

OUCH! 19 requests: 74% are redirects.

I believe that it is likely that there are more mobile sites that tax their users with 6-9 redirects before loading any useful content. I am sure someone with stronger SQL-fu than I have could work out a query to automatically test all of the sites with a large delta between the first request and first HTML.

However, the brute force method of testing all of the pages has answered the question “what is the maximum number of redirects before loading content?” The answer is 10 (or 11, as seen on the Moto G).


#2

@doug_sillars instead of relying on requestids, why not follow the chain based on status and resp_location? An example query that singles out pages that force a redirect:

SELECT url, status, resp_location 
FROM [runs.2015_01_15_requests] 
WHERE firstReq = true AND status != 200

Given the above results, you can then query the requests table for resp_location = url and repeat the same logic to follow the chain.
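To make the "repeat the same logic" part concrete, here is a minimal Python sketch of walking the chain once the rows for a page are in memory. It assumes resp_location holds an absolute URL that matches the url of the next request exactly, which may not hold for relative Location headers. The `follow_chain` helper, the `max_hops` cap, and the sample data are all hypothetical.

```python
def follow_chain(requests_by_url, start_url, max_hops=20):
    """Return the list of URLs visited, starting at start_url."""
    chain = [start_url]
    seen = {start_url}
    url = start_url
    while len(chain) <= max_hops:
        row = requests_by_url.get(url)
        # Stop when the URL is unknown, is not a 3xx, or has no Location.
        if row is None or not (300 <= row["status"] < 400) or not row["resp_location"]:
            break
        url = row["resp_location"]
        if url in seen:
            break  # redirect loop, stop here
        seen.add(url)
        chain.append(url)
    return chain


# Hypothetical rows keyed by url (values invented for illustration).
sample = {
    "http://a.example/": {"status": 301, "resp_location": "http://www.a.example/"},
    "http://www.a.example/": {"status": 302, "resp_location": "http://m.a.example/"},
    "http://m.a.example/": {"status": 200, "resp_location": ""},
}

chain = follow_chain(sample, "http://a.example/")
hops = len(chain) - 1  # number of redirects before the final response
```

The number of redirects is the chain length minus one, so this avoids depending on requestid ordering at all.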

/cc @fhoffa perhaps you have some tips for how to best script such a query?
/cc @stevesoudersorg requests table has a redirectUrl field, but it’s empty for all records. Bug?