How much video do we find on webpages?
Last week, I was looking at the HTTPArchive, and I noticed with surprise that video averages 25% of the bytes on a page (September 15, 2017 crawl):
That’s lazy pie chart reading - the math says 23.2% of all webpage traffic is video. That’s a lot more than I expected! The problem here is the math of averages (many of you may have read Ilya’s post “The Average Page is a Myth”). Because video files are so incredibly large, they completely skew the “averages” in the HTTPArchive.
We all know that videos are big files. But how big are they? The average video file is 44x larger than the average JPEG (I used 2 MB for the video size because the legend is “off the charts” and, yeah… I know… averages).
(On mobile, the average video response is nearly double - at ~3,200 KB!)
So any page that has video will be predominantly video by weight (in bytes). But how many sites actually have video requests on them?
The stats show that only 6% of desktop sites (4% on mobile) have any video at all - yet, they are the elephant in the room. Just 6 out of every 100 sites are enough to tip the scales of the “average” website to 23% video.
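To see how so few sites can move the average so far, here is a small Python sketch. The page weights are made-up round numbers (not HTTP Archive data): 94 pages with no video and 6 pages that each carry a large video payload. It compares video's share of all bytes with the video bytes on the median page.

```python
from statistics import median

# Hypothetical page weights in KB (illustrative, not HTTP Archive data).
NON_VIDEO_KB = 2000   # assumed non-video bytes on every page
VIDEO_KB = 10000      # assumed video bytes on a page that has video


def video_share(pages):
    """Fraction of all bytes across `pages` that are video."""
    total_video = sum(video for _, video in pages)
    total_bytes = sum(other + video for other, video in pages)
    return total_video / total_bytes


# 94 pages without video, 6 pages with video: (other_kb, video_kb)
pages = [(NON_VIDEO_KB, 0)] * 94 + [(NON_VIDEO_KB, VIDEO_KB)] * 6

aggregate_share = video_share(pages)
median_video_kb = median(video for _, video in pages)

print(f"video share of all bytes: {aggregate_share:.1%}")
print(f"median video KB per page: {median_video_kb}")
```

With these assumed numbers, video works out to roughly 23% of all bytes even though the median page has no video at all.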
Into the Data For More Information
Video is still on a minority of sites. But is the use of video on sites increasing? Time to dig in.
-- BigQuery legacy SQL. NTH(50, QUANTILES(x)) gives an approximate median.
SELECT date,
  NTH(50, QUANTILES(speedIndex)) medianSI,
  NTH(50, QUANTILES(reqVideo)) medianVideoReq,
  NTH(50, QUANTILES(bytesVideo)) medianVideoBytes,
  NTH(50, QUANTILES(bytesTotal)) mediantotalBytes,
  COUNT(*) siteCount
FROM
  -- createDate is in seconds; STRFTIME_UTC_USEC expects microseconds.
  (SELECT STRFTIME_UTC_USEC(INTEGER(createDate*1000000), "%Y-%m") date, speedIndex, reqVideo, bytesVideo, bytesTotal
   FROM
     -- In legacy SQL, a comma-separated table list is a UNION ALL.
     [httparchive:runs.2016_03_01_pages_mobile],
     [httparchive:runs.2017_03_01_pages_mobile],
     [httparchive:runs.2016_04_01_pages_mobile],
     [httparchive:runs.2017_04_01_pages_mobile],
     [httparchive:runs.2016_05_01_pages_mobile],
     [httparchive:runs.2017_05_01_pages_mobile],
     [httparchive:runs.2016_06_01_pages_mobile],
     [httparchive:runs.2017_06_01_pages_mobile],
     [httparchive:runs.2016_07_01_pages_mobile],
     [httparchive:runs.2017_07_01_pages_mobile],
     [httparchive:runs.2016_08_01_pages_mobile],
     [httparchive:runs.2017_08_01_pages_mobile],
     [httparchive:runs.2016_09_01_pages_mobile],
     [httparchive:runs.2017_09_01_pages_mobile],
     [httparchive:runs.2016_10_01_pages_mobile],
     [httparchive:runs.2016_11_01_pages_mobile]
   WHERE reqVideo > 0)
GROUP BY date
ORDER BY date;
So, how has video grown over time? I re-ran the query above with:
reqVideo=0
and figured out the percentage of sites with video (video/(video + not video)), and plotted the number of sites with video over the last 14 months. There is a slow upward trend in sites containing video:
We can also look at how page weight has changed over time. The data (mobile sites) shows that the median webpage is (still) growing, but that sites with video are growing 10x faster than those without. (Yes, this is based on the slope of the linear trendline. It is not a great fit, but growth is clearly larger for sites containing video than for sites without.)
This chart also shows the disparity in tonnage between sites with and without video: in the months surveyed, sites with embedded video were a median of 480% larger (minimum 445%, maximum 655%).
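For readers who want to reproduce the trendline comparison, here is a minimal Python sketch of an ordinary least-squares slope and of a "percent larger" figure. The monthly values are hypothetical placeholders, not the HTTP Archive medians, and `ols_slope` and `percent_larger` are names introduced here for illustration. The post does not say whether "480% larger" means (a - b) / b or a / b; the sketch assumes the former.

```python
from statistics import median


def ols_slope(ys):
    """Least-squares slope of ys against month index 0, 1, 2, ..."""
    n = len(ys)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


def percent_larger(a, b):
    """How much larger a is than b, as a percentage of b (assumed definition)."""
    return (a - b) / b * 100


# Hypothetical median page weights in KB per month (placeholders only).
with_video = [6000, 6100, 6200, 6300]
without_video = [1000, 1010, 1020, 1030]

slope_ratio = ols_slope(with_video) / ols_slope(without_video)
monthly_diff = [percent_larger(v, n) for v, n in zip(with_video, without_video)]

print(f"slope ratio: {slope_ratio:.1f}x")
print(f"median % larger: {median(monthly_diff):.0f}%")
```

Comparing slopes only makes sense when both series cover the same months, which is why the two lists here are the same length and are paired month by month.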
What does the presence of videos on sites lead to in terms of load times? Again, graphing the median mobile SpeedIndex over time:
Sites without video load about 28% faster than sites with video, and this has held pretty constant over the last 14 months.
So, in the grand scheme of things, we have not really learned that much yet: video files are big, and big numbers skew averages. Sites that load more big files are slower to load.
What Types of Video?
What can we learn about the video files? Using the mobile data set, there are 78k requests with MimeType containing “video” and 82k requests with type “video”.
SELECT ext, mimeType, url, urlShort, respSize, _cdn_provider, type
FROM [httparchive:runs.latest_requests_mobile]
WHERE type CONTAINS "video"
ORDER BY type DESC
We’ll use the larger dataset with type.
Let’s look at the file extensions (with count, and 50/95th percentile response size):
As I would expect, it is mostly MP4, with a smattering of other formats, including 33 flv Flash files (check out that 95th percentile - 42 MB of Flash sent to an Android phone!).
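The per-extension summary described above (count, median, and 95th percentile of response size) could be computed from exported query rows roughly like this. This is a sketch, not the method used for the post: the rows are invented sample values, `summarize_by_ext` is a hypothetical helper, and the percentile interpolation may differ slightly from what BigQuery reports.

```python
from collections import defaultdict
from statistics import quantiles


def summarize_by_ext(rows):
    """Return {ext: (count, p50, p95)} for (ext, resp_size) rows."""
    sizes = defaultdict(list)
    for ext, resp_size in rows:
        sizes[ext].append(resp_size)
    summary = {}
    for ext, values in sizes.items():
        if len(values) == 1:
            # A single sample is its own median and 95th percentile.
            p50 = p95 = values[0]
        else:
            # 99 cut points: index 49 is the 50th percentile, 94 the 95th.
            cuts = quantiles(values, n=100, method="inclusive")
            p50, p95 = cuts[49], cuts[94]
        summary[ext] = (len(values), p50, p95)
    return summary


# Hypothetical (ext, respSize in bytes) rows standing in for query output.
rows = [("mp4", 100), ("mp4", 200), ("mp4", 300), ("flv", 5000)]
summary = summarize_by_ext(rows)
for ext, (count, p50, p95) in sorted(summary.items()):
    print(ext, count, p50, p95)
```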
Where is the video coming from?
Looking at the video traffic from a CDN perspective, most of the requests come from Facebook:
But - where is YouTube?
YouTube: We missed all the embedded videos!
Embedded YouTube videos do not download any video content at page load (unless autoplay is turned on). The super familiar video with the red play button has not been part of our analysis up until now:
Building a simple page with a YouTube video, and running through Webpagetest shows us what to look for:
A quick search (NOTE: the search here is for desktop):
SELECT
  host,
  COUNT(*) AS cnt,
  type
FROM (
  SELECT
    HOST(url) AS host, type
  FROM
    [httparchive:runs.latest_requests_mobile]
  WHERE
    url CONTAINS "www-embed-player.js"
)
GROUP BY host, type
ORDER BY cnt DESC
shows:
Joining this to the number of websites, we find 32k desktop and 27k mobile sites - nearly double the 17k (3.8%) of mobile sites that download video content! (Vimeo adds another ~4k sites on mobile and desktop.) If we assume that the sites with YouTube/Vimeo embedded videos and sites with downloaded video content do not overlap, we approach 10% of mobile sites containing video!
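The roughly-10% figure can be checked with a few lines of Python using only the rounded numbers quoted above. The total number of mobile sites is not stated in the text, so the sketch derives it from "17k is 3.8%", which makes it an estimate rather than a crawl count.

```python
# Figures quoted in the text (mobile sites, rounded to the nearest thousand).
downloaded_video_sites = 17_000   # stated as 3.8% of mobile sites
youtube_sites = 27_000
vimeo_sites = 4_000               # "~4k" in the text

# Total mobile sites implied by "17k is 3.8%"; an estimate, not a crawl count.
total_mobile_sites = downloaded_video_sites / 0.038

# Assumes the three groups do not overlap, as the text does.
sites_with_video = downloaded_video_sites + youtube_sites + vimeo_sites
share = sites_with_video / total_mobile_sites

print(f"implied total mobile sites: {total_mobile_sites:,.0f}")
print(f"share with any video: {share:.1%}")
```

With these rounded inputs the share lands a little above 10%. Any overlap between the groups would pull it back down.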
Conclusion
While only 3.8% of mobile websites download video content - these sites sway the ‘average’ website KB breakdown to 23% video, again demonstrating the dangers of using averages to compute meaningful data.
Further digging into sites with video, we discover that including sites with YouTube or Vimeo embedded videos increases the number of mobile sites with video content to ~10%.
Video is here to stay, and we need to begin looking at how using video on our websites affects performance.