Web Apps vs Web Sites comparative numbers

Hello all, after a brief conversation on Twitter (https://twitter.com/rick_viscomi/status/1028043709961695232), Rick Viscomi suggested I post a thread here.

I’m interested to know what %age of the web is really super client-side apps, and what %age is good old-fashioned server-generated HTML. (We know, for example, that WordPress powers many sites, and apart from some image-carousel/lightbox JS candy, my gut feeling is that most WP themes pretty much generate from PHP and the client doesn’t do much with the DOM after that.)

Amelia suggested “what you really need to test is how much the DOM changes between the initial HTML parse and the point when the content stabilizes (if it ever stabilizes). And then create some sort of cut-off to distinguish client-side rendering from script-based progressive enhancement”, pointing out this Lighthouse test https://developers.google.com/web/tools/lighthouse/audits/no-js
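
Amelia’s cut-off idea could be sketched as a tiny classifier. A minimal sketch, assuming we can capture an element count when the initial HTML has parsed and again once the page settles; the function name and the thresholds here are invented for illustration, not measured values:

```javascript
// Hypothetical sketch of the "how much does the DOM change" cut-off.
// In a browser you might capture the two counts like this:
//   const initial = document.querySelectorAll('*').length; // at DOMContentLoaded
//   const settled = document.querySelectorAll('*').length; // at onload (or later)
function classifyDomGrowth(initialNodes, settledNodes) {
  if (initialNodes === 0) return 'client-rendered';
  const ratio = settledNodes / initialNodes;
  if (ratio >= 2.0) return 'client-rendered';        // DOM mostly built by JS
  if (ratio > 1.1) return 'progressively-enhanced';  // script added some content
  return 'server-rendered';                          // DOM arrived in the HTML
}

// e.g. an SPA shell of 30 nodes that grows to 900 elements:
console.log(classifyDomGrowth(30, 900)); // "client-rendered"
```

Where exactly to draw the lines is the open question, but the shape of the metric would be something like this.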

I think if a page only emits a “you need JavaScript” message as a noscript fallback, it should go into the “Web app only” bucket, but I have no idea if that’s doable.

Does anyone have any other ideas, and know how to write the query and pull the levers to find this sort of info?


The “nothing-but-a-<noscript>” approach would get some positives, but it’s the server-rendered web apps that are tricky.

We could detect known frameworks, but that might mistakenly count WordPress sites where a plugin uses React for a widget, etc.

There’s probably not a silver bullet solution, but some ideas:

  • If the first child of the body is <div id="root">, it’s probably a web app

  • If the URL has a hashbang (or the page includes <meta name="fragment" content="!">), it’s probably a web app

  • If the URL loads 5 MB of JS, I really hope it’s a web app
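
For what it’s worth, the first two heuristics could be checked statically against the response body. A rough sketch (the regexes and the id list are my own guesses, not a tested detector):

```javascript
// Guess at a static check for the "empty mount point" and "hashbang" heuristics.
function looksLikeClientApp(html, url) {
  // 1. Body starts with an empty mount point such as <div id="root"> or <div id="app">
  const emptyMount =
    /<body[^>]*>\s*<div[^>]+id=["'](root|app)["'][^>]*>\s*<\/div>/i.test(html);
  // 2. Hashbang URL, or the old AJAX-crawling meta fragment tag
  const hashbang = url.includes('#!') ||
    /<meta[^>]+name=["']fragment["'][^>]+content=["']!["']/i.test(html);
  return emptyMount || hashbang;
}

console.log(looksLikeClientApp(
  '<html><body><div id="root"></div><script src="bundle.js"></script></body></html>',
  'https://example.com/')); // true
```

It would miss apps that server-render into the mount point, which is exactly the tricky case above.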


Really interesting question. I don’t know of any HTTP Archive indicators for DOM activity between the initial parse and another timing metric. And I suspect that many non-client-side pages will still have a noscript fallback.

Another thought I had was to look at DOM elements at DOM Content Loaded vs onLoad vs Fully Loaded and try to look for patterns. But HAR files don’t collect that level of detail.

Could using Wappalyzer to detect the presence of certain frameworks be useful here? Or useful combined with another heuristic? Using Wappalyzer to Analyze CPU Times Across JS Frameworks

Just thinking out loud in case it helps…

One thought might be to look at all of the same-origin links on the page (and maybe click handlers). Presumably the SPAs will all have something like the same URL with a hash or other delineation, while classic web sites will navigate to distinct URLs (as he hand-waves about the specifics).

Collecting all of the links from a page is something we could add to the json results easily enough or if you can come up with a robust detection that can run in js we can add it as a custom metric (assuming it can’t be detected with the data we already collect).
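
Sketching what such a custom metric could compute once it has the hrefs; the 50% cut-off and the names are arbitrary choices for illustration:

```javascript
// Hypothetical link-based SPA heuristic. In a page, the hrefs could come from:
//   [...document.querySelectorAll('a[href]')].map(a => a.getAttribute('href'))
function classifyLinks(pageUrl, hrefs) {
  const page = new URL(pageUrl);
  let sameOrigin = 0;
  let hashOnly = 0;
  for (const href of hrefs) {
    let u;
    try { u = new URL(href, pageUrl); } catch { continue; } // skip unparseable hrefs
    if (u.origin !== page.origin) continue;
    sameOrigin++;
    // Same path, only the fragment differs: likely an in-page (SPA) route
    if (u.pathname === page.pathname && u.hash) hashOnly++;
  }
  return { sameOrigin, hashOnly, spaLike: sameOrigin > 0 && hashOnly / sameOrigin > 0.5 };
}

const r = classifyLinks('https://example.com/', [
  'https://example.com/#/about', '#/contact', 'https://other.com/page', '/pricing'
]);
console.log(r.spaLike); // true (2 of 3 same-origin links are hash-only)
```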

If the change in DOM size from the initial HTML doc to the final page is a good indicator, you could do two test runs: one normal and one with scripts blocked (e.g. all URLs containing “.js”). Then compare the two DOM sizes. This would be wiggly: e.g., inline script blocks would still run, scripts without “.js” in the URL would still run, etc.
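
For the blocked run, a WebPageTest script along these lines should do it (going from memory on the syntax; the command and its parameters are tab-separated):

```
block	.js
navigate	https://example.com/
```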

If HA can’t do such a run we’d be happy to let you use SpeedCurve.

“This would be wiggly” < Wiggly is fine, I think; I don’t see ‘web app’ vs ‘web site’ as entirely different things, but as two ends of a spectrum.

“If HA can’t do such a run we’d be happy to let you use SpeedCurve.” < yay. How?

Not sure if comparing the DOM is reliable: we’ve moved a lot of navigation into JSON that is loaded asynchronously because the navigation tree can often be the largest part of the DOM on an individual page. But it’s a good place to start.

Otherwise there are probably particular events to look for, especially use of fetch. I suppose that comparing the new homepage, which is very appy, with the legacy one might be interesting.

My friend Dan Shappir suggested that a good proxy to measure on might be use of History API, eg presence of a local router.
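
One possible (untested) way to probe for a local router: many client-side routers monkey-patch history.pushState, and a wrapped function no longer stringifies with “[native code]”. It would miss routers that call pushState without wrapping it, so treat it as a guess:

```javascript
// Guess at a router heuristic: a monkey-patched history.pushState loses the
// "[native code]" marker when stringified. Written as a pure function so it
// can run outside a browser; in a page you'd pass history.pushState.
function pushStateWrapped(pushStateFn) {
  return !/\[native code\]/.test(Function.prototype.toString.call(pushStateFn));
}

console.log(pushStateWrapped(Math.max));       // false: native, not wrapped
console.log(pushStateWrapped(() => 'route'));  // true: user-defined wrapper
```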

I’m not sure that would work; the very server-rendered Ruby on Rails framework uses the History API in its Turbolinks progressive enhancement, as seen on sites like GitHub.