Yes, codesplitting is a great best practice for sites that combine all their scripts into a single, monolithic script. I wanted to find out: How many sites need codesplitting?
The hard part isn’t writing a query, it’s defining “need codesplitting”. Here are some filters I applied:
- I only look at 1st party resources. In my experience teams don’t try to combine 3rd party scripts into their single payload.
- I only look at sites with at least 200K of 1st party JS. I chose this as the threshold where combining into a single file would cause the problems codesplitting solves. Although, it’s easy to change that in the query and run other thresholds.
Given those filters, I created a histogram of the ratio of the largest 1st party script over the total 1st party JS size. That’s hard to understand so let’s look at the query and results and revisit the histogram.
Here’s the query:
SELECT biggest_percent_bucket, round(100*count(*)/303067) as percent_of_sites, count(*) as num
FROM (
SELECT pageid,
max(JS_KB) as biggest_JS_KB,
sum(JS_KB) as total_JS_KB,
round(100 * max(JS_KB) / sum(JS_KB)) as biggest_percent_of_total,
floor(0.9 + (10 * (max(JS_KB)/sum(JS_KB))))*10 as biggest_percent_bucket
FROM (
SELECT p.pageid as pageid,
round(respSize/1024) as JS_KB
FROM `httparchive.summary_pages.2018_10_01_desktop` as p,
`httparchive.summary_requests.2018_10_01_desktop` as r
WHERE p.pageid = r.pageid
AND ( resp_content_type like "%application/javascript%" OR resp_content_type like "%text/javascript%" OR resp_content_type like "%application/x-javascript%" OR resp_content_type like "%application/json" )
AND round(respSize/1024) > 0
AND STRPOS(NET.REG_DOMAIN(r.url),REGEXP_EXTRACT(NET.REG_DOMAIN(p.url), r'([\w-]+)'))>0
)
GROUP BY pageid
)
WHERE total_JS_KB > 200
GROUP BY biggest_percent_bucket
ORDER BY biggest_percent_bucket asc
Here are the results:
The column definitions are:
- “biggest_percent_bucket” is the percentage of the biggest 1st party script out of the total size of all 1st party scripts. This is broken into buckets 1-10, 11-20, etc.
- “num” is the number of sites that fall into the bucket
- “percent_of_sites” is the percent of sites that fall into the bucket
For example, the last row of the table says: 10% of sites have a monolithic 1st party script that represents more than 90% of the 1st party total JS size. The second-to-last row says: 5% of sites have a monolithic script that is 81-90% of the total JS size. Etc.
I’m not sure why my query produces a “0.0” row, but it’s only 11 sites so I didn’t worry about. Tips on doing better bucketing are appreciated.
If you’d like to explore other “JS total size” thresholds just edit the where condition “total_JS_KB > 200”. If you’d like to include 3rd party scripts just remove the where condition “STRPOS(NET.REG_DOMAIN(r.url),REGEXP_EXTRACT(NET.REG_DOMAIN(p.url)”.