Detecting JavaScript sourcemaps

For the JS chapter of the Web Almanac, one of the metrics I’m analyzing is the percentage of sites that ship sourcemaps. It sounds like a simple premise, but answering that question is a bit tricky. I’d love to get some help from those who are more familiar with how sourcemaps work and how Chrome measures them.

Here’s what I came up with:

#standardSQL
SELECT
  COUNT(DISTINCT page) AS freq,
  ROUND(COUNT(DISTINCT page) * 100 /
    (SELECT COUNT(DISTINCT page) FROM `httparchive.almanac.summary_response_bodies`), 2) AS pct
FROM
  `httparchive.almanac.summary_response_bodies`
WHERE
  type = 'script' AND
  body LIKE '%sourceMappingURL%' AND
  NET.REG_DOMAIN(page) = NET.REG_DOMAIN(url)

It’s querying a processed summary_response_bodies table which is different from other HTTP Archive tables on BigQuery in a couple of ways. It combines the response_bodies for the July 2019 crawl with their corresponding summary_requests values. In this case, that makes it so much easier to select only the response bodies for JS resources. It’s also a much more efficient way to query thanks to the partitioning and clustering this table employs, a relatively new BigQuery feature that changes the way the data is stored and accessed. Other than that, the fields are all the same as what you’d be used to with existing datasets.

After a chat with @addyosmani, a few gotchas with this approach came up:

  • developers may have source mapping working in a local dev environment but not set up properly in production, so the sourceMappingURL might point to an invalid resource
  • even if the sourcemap does load, it may still be invalid
  • the detection is very permissive and would include false positives for any script that happens to have that string, whether or not it’s actually declaring a sourcemap URL
  • one of the conditions is that the script is a first-party resource on the same domain as the website, and it’s not clear whether this is unnecessarily restrictive
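
To illustrate the false-positive gotcha, here is a minimal Python sketch of a stricter match than `LIKE '%sourceMappingURL%'`. It is not what the query above does. The function name `find_source_map_url` and the rule that the comment must occupy a whole line are assumptions made for illustration, and real-world scripts may declare the comment in ways this pattern misses.

```python
import re

# Matches a `//# sourceMappingURL=...` line comment (or the legacy `//@` form),
# or the `/*# sourceMappingURL=... */` block form, when it fills a whole line.
# Assumption: a real declaration sits on its own line.
SOURCE_MAP_COMMENT = re.compile(
    r"^[ \t]*(?://[#@][ \t]*sourceMappingURL=(\S+)"
    r"|/\*[#@][ \t]*sourceMappingURL=([^\s*]+)[ \t]*\*/)[ \t\r]*$",
    re.MULTILINE,
)


def find_source_map_url(body):
    """Return the URL from the last sourceMappingURL comment, or None."""
    matches = list(SOURCE_MAP_COMMENT.finditer(body))
    if not matches:
        return None
    last = matches[-1]
    # Exactly one of the two groups is set, depending on the comment style.
    return last.group(1) or last.group(2)
```

A script that merely mentions the string, such as `var s = "sourceMappingURL";`, yields `None` here, whereas the `LIKE` filter would count it.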

So I’m curious to hear if anyone knows of a more robust way to detect that the sourcemap is not only declared and loaded, but also valid, and how we might query that. Here’s what I can think of to get started:

  • also check the X-SourceMap header as an alternative way to declare the URL
  • extract the URL from the declaration and compare with known requests on that page
  • check the HTTP status of requests for files ending in the .map extension
  • Chrome/Blink feature counter?
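
As a rough sketch of the first three ideas, the Python below resolves a declared sourcemap URL against the script URL and looks it up in a page's known requests. Everything here is hypothetical: the function names, the `headers` dict, and the `requests` mapping of URL to HTTP status are illustrative stand-ins, not HTTP Archive fields. Preferring the header over the comment is also an assumption, and whether sourcemap requests appear in the crawl data at all is a separate question.

```python
from urllib.parse import urljoin


def declared_map_url(script_url, headers, comment_url):
    """Resolve the declared sourcemap URL against the script URL, or None.

    `headers` is a dict of the script's response headers; `comment_url` is
    whatever was parsed from a sourceMappingURL comment (or None).
    """
    lowered = {name.lower(): value for name, value in headers.items()}
    # Assumption: a SourceMap / X-SourceMap header wins over the comment.
    raw = lowered.get("sourcemap") or lowered.get("x-sourcemap") or comment_url
    if not raw:
        return None
    # Relative declarations are resolved relative to the script itself.
    return urljoin(script_url, raw.strip())


def map_request_status(script_url, headers, comment_url, requests):
    """Return the HTTP status of the sourcemap request, or None if not seen.

    `requests` is a hypothetical dict of request URL -> status for the page.
    """
    resolved = declared_map_url(script_url, headers, comment_url)
    if resolved is None:
        return None
    return requests.get(resolved)
```

A declaration that resolves to a URL with a non-200 status, or to no request at all, would be a candidate for the "declared but not working" bucket.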

I’ve been working on adding source map gathering to Lighthouse, so this thread caught my attention :)

also check the X-SourceMap header as an alternative way to declare the URL

FWIW, I have a feeling this is much less used, maybe even negligible. Would be good to know for sure.

extract the URL from the declaration and compare with known requests on that page
check the HTTP status of requests for files ending in the .map extension

Unfortunately, Blink will not make any requests for a source map. For example, the only way that the DevTools frontend app gets source map data is by listening for Debugger.scriptParsed events, and downloading the sourceMapURL. The DevTools protocol sets this field by looking for a sourceMappingURL comment and for the SourceMap / X-SourceMap headers (the latter is deprecated).

Chrome/Blink feature counter?

Maybe, but the data is already in Chrome (via the scriptParsed events). I don’t know a lot about HTTP Archive, but maybe there’s a table that has DevTools protocol events? If not, there should be :)


Don’t forget they can be inlined too - came across an example where 75% of a third-party script was base64 inlined source maps

In this case it seemed to be so Sentry could re-assemble their stack traces, don’t know why they didn’t just upload the source maps to Sentry!


Listening for Debugger.scriptParsed would’ve been a good use case for our Almanac custom metrics, as a way to flag source map usage. Maybe for the September crawl?


Don’t forget they can be inlined too - came across an example where 75% of a third-party script was base64 inlined source maps

Good point. I should have mentioned that the sourceMapURL field of Debugger.scriptParsed can also contain a data URL.
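
For the inlined case, a small Python sketch of how a data URL value might be decoded and given a very loose sanity check. The function names are hypothetical. The check (a JSON object with `version` 3 and a string `mappings`) is an assumption about what a minimally plausible source map looks like; it is not full validation and would reject index maps that use `sections` instead.

```python
import base64
import json
from urllib.parse import unquote_to_bytes


def decode_inline_source_map(source_map_url):
    """Return the payload bytes of a data: URL, or None for external URLs."""
    if not source_map_url.lower().startswith("data:"):
        return None
    header, sep, payload = source_map_url[5:].partition(",")
    if not sep:
        return b""  # malformed data URL: no payload separator
    if header.lower().endswith(";base64"):
        try:
            return base64.b64decode(payload)
        except ValueError:
            return b""  # undecodable base64 payload
    return unquote_to_bytes(payload)


def looks_like_source_map(raw):
    """Loose check: a JSON object with version 3 and a string `mappings`."""
    try:
        data = json.loads(raw)
    except ValueError:  # covers invalid JSON and undecodable bytes
        return False
    return (
        isinstance(data, dict)
        and data.get("version") == 3
        and isinstance(data.get("mappings"), str)
    )
```

Comparing `len(decode_inline_source_map(url))` with the size of the script body would give the kind of "75% of the script is source map" figure mentioned above.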
