Why is emulated Android page size so much bigger than iPhone?


#1

Several people on Twitter commented on the increase in page size when we switched from iPhone to emulated Android. Here’s the chart:

I started my investigation by going to the stats page. There’s a new feature there: “Compare two runs”. I compared Feb 15 2016 vs Mar 1 2016 - the last iPhone crawl and the first emulated Android crawl.

The top chart gives a good overview:

Overall size is 1684 kB (Android) vs 1284 kB (iPhone). Why is Android 400 kB bigger?

  • images (+186 kB): Android is 974 kB vs 788 kB on iPhone.
  • video (+120 kB): Android has 120 kB average video size compared to 0 kB for iPhone.
  • JavaScript (+94 kB): Android is 401 kB vs iPhone’s 307 kB.

Those three content-types add up to 400 kB. Now the question is, why are those areas larger?
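
As a quick sanity check, the arithmetic above can be reproduced in a few lines of Python. The figures are the averages quoted in this post; the variable names are only illustrative.

```python
# Average kB per page, as quoted above: (Android, iPhone).
sizes_kb = {
    "images": (974, 788),
    "video": (120, 0),
    "JavaScript": (401, 307),
}

# Per-content-type increase when moving from iPhone to emulated Android.
deltas = {name: android - iphone for name, (android, iphone) in sizes_kb.items()}
for name, delta in deltas.items():
    print(f"{name}: +{delta} kB")

total_delta = 1684 - 1284  # overall Android size minus overall iPhone size
print("sum of the three deltas:", sum(deltas.values()))
print("overall difference:", total_delta)
```

The sum of the three deltas and the overall difference should print the same value, which is why these three content types account for the whole increase.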


#2

About video, there’s a difference. The Android browser supports MP4 but not M3U8 (TS), while iPhone Safari supports M3U8 (TS) but not MP4. An MP4 is much larger and is always delivered as a single file, whereas M3U8 content is served as a series of TS stream segments.


#3

Very interesting! It might also be that the HTTP Archive code isn’t reporting those files correctly, with the correct content-type. It would be great if you could find an example website that has these iPhone M3U8 (TS) files and see whether they show up in HTTP Archive, and what size and content-type they have. Please share any examples here.


#4

Looking at both runs, I see a few notable differences:

  • Feb 15 run has 17% GIF images while Mar 1 has 33%
  • Feb 15 run has 9% errors while Mar 1 has a huge 21%

These differences remain even when the URLs are restricted to the top 1000.


#5

About GIF images:
Select AVG(reqGif), AVG(bytesGif) FROM [httparchive:runs.2016_02_15_pages_mobile] where rank<1000;
AVG(reqGif)=5.23
AVG(bytesGif)=22447

Select AVG(reqGif), AVG(bytesGif) FROM [httparchive:runs.2016_03_01_pages_mobile] where rank<1000;
AVG(reqGif)=5.07
AVG(bytesGif)=29908

Am I missing something?
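
To put those two results side by side, here is a small Python sketch that computes the relative change between the runs. It uses only the four averages above; `pct_change` is an illustrative helper, not part of HTTP Archive.

```python
# Averages returned by the two queries above (rank < 1000).
feb = {"reqGif": 5.23, "bytesGif": 22447}  # 2016_02_15 run (iPhone)
mar = {"reqGif": 5.07, "bytesGif": 29908}  # 2016_03_01 run (emulated Android)


def pct_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100


req_change = pct_change(feb["reqGif"], mar["reqGif"])
bytes_change = pct_change(feb["bytesGif"], mar["bytesGif"])
print(f"GIF requests per page: {req_change:+.1f}%")
print(f"GIF bytes per page:    {bytes_change:+.1f}%")
```

With these numbers, the average GIF request count drops by about 3% while average GIF bytes rise by about 33%.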


#6

The iPhones are running iOS 4 which at this point is pretty ancient and a good amount of the web is just broken on it. I expect the bulk of the difference comes from sites that actually work correctly in the Android emulation but didn’t load correctly on iOS 4 (so the new stats better represent reality and we have been undercounting as more of the web broke for iOS 4).

We are running both in parallel until the end of June, so I pulled a few sites from the 5/15 crawl of each (the URL links go to the WPT comparison UI for the runs):

www.youtube.com

Android (1038 KB) iOS (534 KB)

www.cnn.com

Android (3119 KB) iOS (1215 KB) (WTF! 3MB for the mobile page? OUCH!)
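
For a rough sense of scale, the two comparisons above can be turned into ratios. This is only a sketch over the four numbers quoted here; the dictionary layout is my own.

```python
# Page weights in KB from the comparison above: (Android, iOS).
pages = {
    "www.youtube.com": (1038, 534),
    "www.cnn.com": (3119, 1215),
}

ratios = {}
for site, (android_kb, ios_kb) in pages.items():
    ratios[site] = android_kb / ios_kb
    extra_kb = android_kb - ios_kb
    print(f"{site}: +{extra_kb} KB, {ratios[site]:.2f}x the iOS size")
```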


#7

Still digging into it, but some preliminary results…

There is an increase across the board. Results for May 1st runs and query below:

SELECT * FROM 
  (SELECT 
    'iPhone',
    ROUND(AVG(bytes)) average,
    NTH(10, quantiles(bytes,101)) tenth,
    NTH(20, quantiles(bytes,101)) twentieth,
    NTH(30, quantiles(bytes,101)) thirtieth,
    NTH(40, quantiles(bytes,101)) fortieth,
    NTH(50, quantiles(bytes,101)) fiftieth,
    NTH(60, quantiles(bytes,101)) sixtieth,
    NTH(70, quantiles(bytes,101)) seventieth,
    NTH(80, quantiles(bytes,101)) eightieth,
    NTH(90, quantiles(bytes,101)) ninetieth,
    NTH(95, quantiles(bytes,101)) ninety_fifth,
    NTH(99, quantiles(bytes,101)) ninety_ninth
   FROM (
     SELECT 
      ROUND(bytesTotal / 1024) AS bytes
      FROM  [httparchive:runs.2016_05_01_pages_mobile]
    )
  ), 
  (SELECT 'android',
    ROUND(AVG(bytes)) average,
    NTH(10, quantiles(bytes,101)) tenth,
    NTH(20, quantiles(bytes,101)) twentieth,
    NTH(30, quantiles(bytes,101)) thirtieth,
    NTH(40, quantiles(bytes,101)) fortieth,
    NTH(50, quantiles(bytes,101)) fiftieth,
    NTH(60, quantiles(bytes,101)) sixtieth,
    NTH(70, quantiles(bytes,101)) seventieth,
    NTH(80, quantiles(bytes,101)) eightieth,
    NTH(90, quantiles(bytes,101)) ninetieth,
    NTH(95, quantiles(bytes,101)) ninety_fifth,
    NTH(99, quantiles(bytes,101)) ninety_ninth
   FROM (
    SELECT 
      ROUND(INTEGER(JSON_EXTRACT(payload, '$._bytesIn')) / 1024) AS bytes
    FROM [httparchive:har.2016_05_01_android_pages]
   )
  )

We can compute the byte diffs and the results are a wee bit scary:

Spreadsheet with all of the results: https://docs.google.com/spreadsheets/d/1LiQ180eIKxe_Nq4rUZUWr8YBC60ukzB5xKXn-BcDWSE/edit#gid=1543584302

SELECT url, (androidKBytes-iphoneKBytes) as diff, androidKBytes, iphoneKBytes 
FROM (
  SELECT
    url,
    ROUND(INTEGER(JSON_EXTRACT(payload, '$._bytesIn')) / 1024) AS androidKBytes,
    iphoneKBytes
  FROM [httparchive:har.2016_05_01_android_pages] as android
  JOIN 
    (SELECT 
      url as iphonePage,
      ROUND(bytesTotal / 1024) AS iphoneKBytes
      FROM [httparchive:runs.2016_05_01_pages_mobile]
    ) as iphone
  ON url = iphonePage
)
ORDER BY diff desc
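
For anyone who wants to follow the logic of the diff query outside BigQuery, here is a rough Python equivalent. The two dictionaries stand in for the Android and iPhone tables and contain made-up sizes for placeholder URLs; only the join, subtract, and sort steps mirror the query.

```python
# Hypothetical per-page sizes in KB, keyed by URL (not real crawl data).
android_kb = {"http://a.example/": 2100, "http://b.example/": 640, "http://c.example/": 900}
iphone_kb = {"http://a.example/": 700, "http://b.example/": 610, "http://d.example/": 300}


def page_diffs(android, iphone):
    """Inner-join the two crawls on URL and sort by size difference, largest first."""
    rows = [
        {
            "url": url,
            "diff": android[url] - iphone[url],
            "androidKBytes": android[url],
            "iphoneKBytes": iphone[url],
        }
        for url in android.keys() & iphone.keys()  # URLs present in both crawls
    ]
    return sorted(rows, key=lambda row: row["diff"], reverse=True)


for row in page_diffs(android_kb, iphone_kb):
    print(row)
```

As with the JOIN in the query, pages that appear in only one of the two crawls are dropped from the comparison.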

Digging a bit deeper into some of the above results shows that the main culprits are video and GIFs… For example, consider asu.edu:

SELECT 
  INTEGER(JSON_EXTRACT_SCALAR(payload, '$._bytesIn')) as bytes,
  url
FROM [httparchive:har.2016_05_01_android_requests]
WHERE page = 'http://www.asu.edu/'
ORDER BY bytes desc
LIMIT 5


Here are the top requests, by weight, for the heaviest pages:

SELECT 
  page,
  ROUND(INTEGER(JSON_EXTRACT_SCALAR(payload, '$._bytesIn'))/1024) as kbytes,
  url
FROM [httparchive:har.2016_05_01_android_requests]
WHERE page IN (
  SELECT url FROM (
    SELECT url, (androidKBytes-iphoneKBytes) as diff, androidKBytes, iphoneKBytes 
    FROM (
      SELECT
        url,
        ROUND(INTEGER(JSON_EXTRACT(payload, '$._bytesIn')) / 1024) AS androidKBytes,
        iphoneKBytes
      FROM [httparchive:har.2016_05_01_android_pages] as android
      JOIN 
        (SELECT 
          url as iphonePage,
          ROUND(bytesTotal / 1024) AS iphoneKBytes
          FROM [httparchive:runs.2016_05_01_pages_mobile]
        ) as iphone
      ON url = iphonePage
    )
    ORDER BY diff desc
    ) 
  LIMIT 100
)
ORDER BY kbytes desc
LIMIT 100


#8

SELECT * FROM [httparchive:runs.2016_03_01_requests_mobile] where ext like 'mp4%';
I ran this query and found that requestid=30954175 is an MP4, but its mimetype is ‘application/javascript’. I checked it on my Chrome emulated Galaxy S5, and there its mimetype is ‘video/mp4’.
Maybe something is blocking your crawler?
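
If it helps to look for more cases like this, here is a minimal Python sketch that flags rows whose extension and reported MIME type disagree. The `mimeType` field name and the extension-to-type mapping are my assumptions, and the second row is made up for contrast; only the first row comes from the query result above.

```python
# Expected MIME type prefix for a few extensions (illustrative, not exhaustive).
EXPECTED_PREFIX = {"mp4": "video/", "gif": "image/"}


def mismatched(rows):
    """Return the rows whose reported MIME type does not match their extension."""
    bad = []
    for row in rows:
        expected = EXPECTED_PREFIX.get(row["ext"])
        if expected and not row["mimeType"].startswith(expected):
            bad.append(row)
    return bad


rows = [
    # The request reported above.
    {"requestid": 30954175, "ext": "mp4", "mimeType": "application/javascript"},
    # A made-up row showing what a correctly labelled MP4 would look like.
    {"requestid": 1, "ext": "mp4", "mimeType": "video/mp4"},
]

for row in mismatched(rows):
    print(row["requestid"], row["ext"], row["mimeType"])
```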