Mobile requests that redirect to 'm.' or '.mobi'

I saw a tweet last week where Brad Frost asked about how many sites do a redirect to “m.” sites. I thought it would be interesting to try to figure this out (and teach myself SQL at the same time):

SELECT  pages.pageid, pages.url, lib.url, numRedirects, rank 
FROM [httparchive:runs.2013_09_01_pages_mobile] AS pages JOIN (
  SELECT pageid, url  
  FROM [httparchive:runs.2013_09_01_requests_mobile]  
  WHERE url LIKE 'http://m.%/' 
  GROUP BY pageid, url  
) AS lib ON lib.pageid = pages.pageid
WHERE numRedirects >0
ORDER BY rank asc

This gives us a list with 752 results:

I know this list is not exhaustive, but limiting to urls that end in a ‘/’ seems like a good compromise to exclude all the other attributes that are requested from http://m.* sites. For example, we miss YouTube in this query:

But in order to follow a trend, this is probably adequate.

Changing the dates of the data sets shows that this is growing from 670 YoY, and from 739 6 months ago.

You can also track ‘http://%.mobi%/’ for redirects to .mobi domains.

It appears to have 16 results, but the first 2 entries are duplicate as Foxnews.com does a series of redirects:

In fact, if you type foxnews.com on your mobile, you are redirected 3 times:
foxnews.com -> www.foxnews.com -> foxnews.mobi -> www.foxnews.mobi

HTTP Archive starts with the “www” prefix, so we just see two.

I forgot to consider redirects to “https://m.*”

There are a few of these too (with a similar growth pattern YoY):

09/2012 17
03/2013 18
09/2013 27

1 Like

@doug_sillars great stuff! Quick thought: to work around the trailing params, you could first normalize the URL via HOST() [1], and then do the LIKE match… that would also take into account the HTTPS redirects.

[1] https://developers.google.com/bigquery/query-reference#urlfunctions

@igrigorik Thanks for the pointers. This helped, but did open another can of worms. In many cases, there multiple requests to one m. domain. Here is Youtube, for example:

The trick I used to get around that is using firsthtml =true. The count is remarkably similar, but the list is FAR more accurate!

SELECT lib.rank,lib.url, HOST(requests.url) as host 
FROM [httparchive:runs.2013_09_01_requests_mobile] AS requests JOIN (
    SELECT rank, pageid, url, numRedirects, domain(url) as domain2  
    FROM [httparchive:runs.2013_09_01_pages_mobile] 
    GROUP BY rank, pageid, url, domain2 , numRedirects
  ) AS lib ON lib.pageid = requests.pageid
WHERE numRedirects >0
AND ((HOST(requests.url)  LIKE 'm.%') OR (HOST(requests.url)  LIKE '%.mobi'))
AND requests.firstHtml = true
     
  // this line forces results to be the from the same domain ... 
  // The domain compare is interesting, if you do the != domains 
  // you see where the website sends users to a different domain.    
  //AND lib.domain2 == DOMAIN(requests.url)

ORDER BY lib.rank asc;

Hi everyone,

Where exactly can I run these SQL queries if I’d like to write my own?

The FAQ does not mention this.

Thanks,
Vince

Vince,

Ilya has a great post on getting started. I got my instance running from this, and his talk at Velocity Santa Clara from Summer 2013. That video is embedded:

http://bigqueri.es/t/getting-started-with-bigquery-http-archive/2

Great topic, I missed this conversation before but have been looking to tackle the same thing. I felt like the pattern matching path is full of naming convention pitfalls, so took a different approach, correlating redirects seen in HTTP Archive Mobile and not in HTTP Archive.

Here’s the post: http://www.guypo.com/mobile/rwd-and-one-web-adoption-rates/

And here’s the query for achieving a similar goal to what you specified:

SELECT ((mData.fStatus=302 OR mData.fStatus=301) && dData.fStatus=200) as diff, count(*)
FROM
  (SELECT pages.pageid as pid,url,urlhash,wptid,fHtml,fReq,fStatus,loc 
   FROM [httparchive:runs.latest_pages] as pages JOIN
   (select pageid,MAX(firstHtml) as fHtml,MAX(firstReq) as fReq,MAX(status) fStatus, MAX(resp_location) as loc
    from [httparchive:runs.latest_requests]
    WHERE firstReq=true
    group by pageid) as reqs ON
   reqs.pageid = pages.pageid) as dData
 JOIN
  (SELECT pages.pageid as pid,url,wptid,fHtml,fReq,fStatus,loc 
   FROM [httparchive:runs.latest_pages_mobile] as pages JOIN
   (select pageid,MAX(firstHtml) as fHtml,MAX(firstReq) as fReq,MAX(status) fStatus, MAX(resp_location) as loc  
   from [httparchive:runs.latest_requests_mobile]
   WHERE firstReq=true
   group by pageid) as reqs ON
   reqs.pageid = pages.pageid) as mData
 ON mData.url=dData.url
group by diff
;

You can achieve a similar result if you compare whether the first request was also the first HTML instead of comparing redirects, meaning replacing the first line of the previous query with this one:

SELECT (mData.fHtml=false && dData.fHtml=true) as diff, count(*)

Guy,
Totally agree on naming pitfalls.I came up with 16.5% of sites redirecting - so right in the same ballpark as your data (and probably missing 3% or so due to the aforementioned naming pitfalls). It is always cool to see 2 completely different approaches come up with the same number :).

Unfortunately, I think the number is even higher. In your post, you see 27% (1326) of sites redirect on BOTH mobile and desktop - and assume that these are not mobile specific redirects.

However, it looks like 377 of those actually redirect to a different URL for mobile vs. desktop. That’s another 8% of sites with mobile specific redirects (28% total). Here is my code (This is a BIG query, but I like seeing the urls to make sure I am doing it correctly.)

SELECT dData.url, drequesturl, mData.url,mrequesturl
FROM
  (SELECT pages.pageid as pid,url,urlhash,wptid,drequesturl,fHtml,fReq,fStatus,loc 
   FROM [httparchive:runs.latest_pages] as pages JOIN
   (select pageid,url as drequesturl, MAX(firstHtml) as fHtml,MAX(firstReq) as fReq,MAX(status) fStatus, MAX(resp_location) as loc
    from [httparchive:runs.latest_requests]
    WHERE 
    firstHtml = true
    group by pageid, drequesturl) as reqs ON
   (reqs.pageid = pages.pageid)
   Where drequesturl != pages.url 
   )as dData
 JOIN
  (SELECT pages.pageid as pid,url,wptid,mrequesturl,fHtml,fReq,fStatus,loc 
   FROM [httparchive:runs.latest_pages_mobile] as pages JOIN
   (select pageid, url as mrequesturl, MAX(firstHtml) as fHtml,MAX(firstReq) as fReq,MAX(status) fStatus, MAX(resp_location) as loc  
   from [httparchive:runs.latest_requests_mobile]
   WHERE 
   firstHtml = true  
   group by pageid, mrequesturl) as reqs ON
   (reqs.pageid = pages.pageid )
   Where mrequesturl != pages.url) as mData
 ON mData.url=dData.url
 where mrequesturl != drequesturl

;