For the JavaScript chapter of the Web Almanac, one of the metrics I’m analyzing is the percentage of sites that ship sourcemaps. It sounds like a simple premise, but answering that question is a bit tricky. I’d love some help from those who are more familiar with how sourcemaps work and how Chrome measures them.
Here’s what I came up with:
```sql
#standardSQL
SELECT
  COUNT(DISTINCT page) AS freq,
  ROUND(COUNT(DISTINCT page) * 100 / (
    SELECT COUNT(DISTINCT page)
    FROM `httparchive.almanac.summary_response_bodies`
  ), 2) AS pct
FROM
  `httparchive.almanac.summary_response_bodies`
WHERE
  type = 'script' AND
  -- permissive: matches the string anywhere in the script body
  body LIKE '%sourceMappingURL%' AND
  -- first-party only: script and page share a registrable domain
  NET.REG_DOMAIN(page) = NET.REG_DOMAIN(url)
```
It’s querying a processed `summary_response_bodies` table, which differs from other HTTP Archive tables on BigQuery in a couple of ways. It combines the `response_bodies` for the July 2019 crawl with their corresponding `summary_requests` values, which in this case makes it much easier to select only the response bodies for JS resources. It’s also much more efficient to query thanks to the table’s partitioning and clustering, relatively new BigQuery features that change how the data is stored and accessed. Other than that, the fields are the same as the ones you’re used to in the existing datasets.
After a chat with @addyosmani, we identified a few gotchas with this approach:
- developers may have source mapping working in a local dev environment but not set up properly in production, so the `sourceMappingURL` might point to an invalid resource
- even if the sourcemap does load, it may still be invalid
- the detection is very permissive and would include false positives for any script that happens to contain that string, whether or not it’s actually declaring a sourcemap URL
- one of the conditions is that the script is a first-party resource on the same domain as the website, and it’s not clear whether this is unnecessarily restrictive
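One way to cut down the false positives from the `LIKE` match is to count the string only when it appears in a sourcemap comment on its own line. Here is a minimal Python sketch of that stricter check, assuming the script body is available as a string; the function name `declared_sourcemap_url` is hypothetical. It also accepts the legacy `//@` prefix. A similar pattern could presumably go into the query via BigQuery’s `REGEXP_CONTAINS` (with the `(?m)` flag, since BigQuery uses RE2 syntax), though that hasn’t been tested against the table.

```python
import re
from typing import Optional

# Matches a sourcemap declaration comment on its own line, e.g.
#   //# sourceMappingURL=app.min.js.map
# The legacy "//@" prefix is accepted too. re.MULTILINE makes ^ and $ anchor
# at every line; [ \t\r]* tolerates trailing spaces and CRLF line endings.
SOURCEMAP_COMMENT = re.compile(
    r"^[ \t]*//[#@][ \t]*sourceMappingURL=(\S+)[ \t\r]*$", re.MULTILINE
)


def declared_sourcemap_url(body: str) -> Optional[str]:
    """Return the URL from the last sourcemap comment in `body`, if any.

    Only comment-form declarations count, so a script that merely mentions
    the string "sourceMappingURL" (e.g. inside a bundler's own code) is not
    reported. Taking the last match follows the convention that the
    declaration sits at the end of the file.
    """
    matches = SOURCEMAP_COMMENT.findall(body)
    return matches[-1] if matches else None


if __name__ == "__main__":
    minified = "var a=1;\n//# sourceMappingURL=app.min.js.map\n"
    mention = 'var s="sourceMappingURL"; console.log(s);\n'
    print(declared_sourcemap_url(minified))  # app.min.js.map
    print(declared_sourcemap_url(mention))   # None
```

Anchoring the match to a whole line trades a little recall for precision: a declaration squeezed onto the same line as minified code (`foo();//# sourceMappingURL=...`) would be missed.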
So I’m curious to hear if anyone knows of a more robust way to detect that the sourcemap is not only declared and loaded, but also valid, and how we might query that. Here’s what I can think of to get started:
- also check the `X-SourceMap` header as an alternative way to declare the URL
- extract the URL from the declaration and compare it with known requests on that page
- check the HTTP status of requests for files ending in the
- Chrome/Blink feature counter?
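Several of those ideas could be combined into a per-script classification: declared, loaded (the map URL was requested with a 2xx status), and valid. This is a minimal Python sketch under several assumptions: `page_requests` is a hypothetical per-page mapping from request URL to `(status, body)`; the header check covers the current `SourceMap` name as well as the deprecated `X-SourceMap`; and giving the headers precedence over the comment is an assumption here, not documented browser behaviour. The validity test is only structural (Source Map v3 `version`, `sources` and `mappings`, or `sections` for index maps) and does not decode the mappings. Inline `data:` URLs aren’t handled.

```python
import json
from typing import Mapping, Optional, Tuple
from urllib.parse import urljoin


def sourcemap_url(script_url: str,
                  headers: Mapping[str, str],
                  comment_url: Optional[str]) -> Optional[str]:
    """Return the absolute sourcemap URL declared for a script, or None.

    Checks the SourceMap header, then the deprecated X-SourceMap header,
    then the URL from a sourceMappingURL comment (this precedence is an
    assumption). Relative URLs are resolved against the script's URL.
    """
    lowered = {name.lower(): value for name, value in headers.items()}
    declared = lowered.get("sourcemap") or lowered.get("x-sourcemap") or comment_url
    if not declared or not declared.strip():
        return None
    return urljoin(script_url, declared.strip())


def is_valid_sourcemap(text: str) -> bool:
    """Structural check against the Source Map v3 format (no VLQ decoding)."""
    if text.startswith(")]}'"):
        # Optional XSSI-protection prefix allowed by the v3 format: skip that line.
        text = text.partition("\n")[2]
    try:
        data = json.loads(text)
    except ValueError:  # json.JSONDecodeError subclasses ValueError
        return False
    if not isinstance(data, dict) or data.get("version") != 3:
        return False
    if isinstance(data.get("sections"), list):
        return True  # index map: the mappings live inside each section
    return isinstance(data.get("sources"), list) and isinstance(data.get("mappings"), str)


def sourcemap_status(script_url: str,
                     headers: Mapping[str, str],
                     comment_url: Optional[str],
                     page_requests: Mapping[str, Tuple[int, str]]) -> str:
    """Classify one script as 'none', 'declared', 'loaded' or 'valid'."""
    url = sourcemap_url(script_url, headers, comment_url)
    if url is None:
        return "none"
    response = page_requests.get(url)
    if response is None or not 200 <= response[0] < 300:
        return "declared"  # declared, but never requested or not a 2xx response
    if not is_valid_sourcemap(response[1]):
        return "loaded"
    return "valid"


if __name__ == "__main__":
    requests = {
        "https://example.com/js/app.js.map":
            (200, '{"version": 3, "sources": ["app.ts"], "mappings": "AAAA"}'),
        "https://example.com/js/vendor.js.map": (404, "Not Found"),
    }
    print(sourcemap_status("https://example.com/js/app.js", {}, "app.js.map", requests))
    print(sourcemap_status("https://example.com/js/vendor.js", {}, "vendor.js.map", requests))
    print(sourcemap_status("https://example.com/js/lib.js",
                           {"X-SourceMap": "/js/app.js.map"}, None, requests))
```

This prints `valid`, `declared` and `valid`. One caveat: browsers generally fetch sourcemaps only when developer tools are open, so the crawl may not contain these requests at all, in which case the declared URLs would have to be fetched separately before a check like this could run.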