I recently learned that importScripts() is slow to load when given more than one URL, at least in the Chromium implementation (and in the current spec, though there is a pending PR to fix that). When multiple URLs are given, the browser downloads them serially, which makes the operation latency-bound. That's bad.
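To make "latency-bound" a bit more concrete, here's a toy model of serial vs. parallel downloads. The round-trip time and transfer times below are made-up numbers, purely for illustration, and the model ignores connection setup, bandwidth contention, and the fact that the scripts still have to execute in order.

```python
# Toy latency model; every number here is an assumption for illustration only.
RTT_MS = 100  # assumed round-trip time per request
script_transfer_ms = [20, 35, 15]  # assumed transfer times for three hypothetical scripts

# Serial: each download starts only after the previous one finishes,
# so every script pays its own round trip.
serial_ms = sum(RTT_MS + t for t in script_transfer_ms)

# Parallel: all requests go out at once, so the slowest one dominates.
parallel_ms = max(RTT_MS + t for t in script_transfer_ms)

print(serial_ms, parallel_ms)
```

In this simplified model, the serial cost grows with every extra URL, while the parallel cost stays close to that of the single slowest script.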
A discussion ensued, during which folks asked: “Should we even bother optimizing them? Or can we just tell folks to move to module workers?”
IMO, a large part of the equation is optimizing existing content. If we made importScripts faster, how much content would benefit from that?
So, as you can probably guess from the fact that I’m writing this here, I turned to HA for help (with @rviscomi’s assistance, because my regex skills are… not great).
I wanted to search for response bodies that contain importScripts(...), and then count the instances that have more than a single URL in them (and would therefore benefit from the optimization).
Here’s the query I ended up with:
SELECT
  *
FROM (
  SELECT
    URL,
    ARRAY_LENGTH(SPLIT(import, ',')) AS num_urls
  FROM (
    SELECT
      bodies.url AS URL,
      REGEXP_EXTRACT_ALL(bodies.body, r'importScripts\(([^)]*)\)') AS imports
    FROM
      `httparchive.response_bodies.2020_04_01_mobile` AS bodies
    WHERE
      bodies.body LIKE '%importScripts(%'
  ),
  UNNEST(imports) AS import
)
WHERE
  num_urls > 1
While the final query does plow through 12.4TB of data, the (many) iterations on that query were done using the httparchive.sample_data.response_bodies_desktop_10k table, which is significantly smaller.
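For anyone who wants to poke at the matching logic without touching BigQuery at all, here's a rough local equivalent in Python. It uses the same regex and the same comma split as the query. The function name and the sample worker body are hypothetical, made up for this sketch.

```python
import re

# Same pattern as the query: capture everything between
# "importScripts(" and the next ")".
IMPORT_SCRIPTS_RE = re.compile(r"importScripts\(([^)]*)\)")


def count_urls_per_call(body: str) -> list[int]:
    """Return the comma-separated argument count for each importScripts(...) call."""
    return [len(args.split(",")) for args in IMPORT_SCRIPTS_RE.findall(body)]


# Hypothetical worker script body, for illustration only.
sample_body = """
importScripts('a.js');
importScripts('b.js', 'c.js', 'd.js');
"""

counts = count_urls_per_call(sample_body)
# Keep only the calls the optimization would help, like WHERE num_urls > 1.
multi_url_calls = [n for n in counts if n > 1]
print(counts, multi_url_calls)
```

Note that this is a heuristic: splitting on commas will miscount a call whose URL itself contains a comma, and the regex stops at the first closing parenthesis, so arguments with nested parentheses get truncated.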
The results show over 30K current worker scripts that would benefit from that optimization! That, in my book, means making it faster is worth our while.