The topic of WordPress performance has come up on this forum in the past, but we didn’t really have a good answer for it at the time. We’re looking to integrate Wappalyzer into our data processing pipeline, but there is still some engineering work required to make that a reality.
We can still emulate Wappalyzer’s WordPress detection directly in BigQuery. For example:
SELECT DISTINCT page FROM `httparchive.har.2017_10_15_chrome_requests_bodies` WHERE REGEXP_CONTAINS(body, "<link rel=[\"']stylesheet[\"'] [^>]+wp-(?:content|includes)")
This query consumes 76% of your free monthly quota! Tip: go to bit.ly/ha50 for an additional 10TB free while supplies last.
This query takes one of the signals in Wappalyzer (a stylesheet link with wp-content or wp-includes in the resource URL) and finds all pages on which that link is used. There are ~90k pages (18%) matching this pattern. It’s not perfect, and other signals are needed to definitively determine if a site uses WordPress, but it’s a good start considering BuiltWith estimates the top 500K to be somewhere between 20-30% WordPress. For example, https://techcrunch.com/ is a known WordPress site tracked by HTTP Archive, but it’s not in the list of 90k despite having links containing the detected keywords, because none of them are
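To see what this signal does and does not catch without scanning any response bodies, the same pattern can be tried against a few literal strings. This is only a sketch: the four snippets below are hypothetical and not taken from any real page. It illustrates that the pattern expects `rel` to be the first attribute of the `<link>` tag, which is one way a WordPress page could slip past this particular signal.

```sql
#standardSQL
-- Sketch only: the snippets are made-up examples, not real page content.
SELECT
  snippet,
  REGEXP_CONTAINS(
    snippet,
    r"""<link rel=["']stylesheet["'] [^>]+wp-(?:content|includes)"""
  ) AS matches
FROM
  UNNEST([
    -- rel comes first and the href contains wp-content: matches = true
    "<link rel='stylesheet' id='theme-css' href='/wp-content/themes/example/style.css'>",
    -- same kind of resource, but href comes before rel: matches = false
    '<link href="/wp-includes/css/dashicons.min.css" rel="stylesheet">',
    -- wp-includes appears in a script tag, not a stylesheet link: matches = false
    '<script src="/wp-includes/js/jquery/jquery.js"></script>',
    -- a stylesheet link with no WordPress path: matches = false
    '<link rel="stylesheet" href="/assets/site.css">'
  ]) AS snippet
```

Because this only reads the inline array, it should not touch the `requests_bodies` table at all, so it is a cheap way to experiment with variations of the pattern before running anything against the full dataset.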
Don’t run the `requests_bodies` detection query above yourself. Instead, you can go directly to the results, which have been saved to a scratchspace table: https://bigquery.cloud.google.com/table/httparchive:scratchspace.wordpress?tab=preview
You can join this table with the har/runs datasets to find out more about WordPress performance. For example:
SELECT url IN (SELECT url FROM `httparchive.scratchspace.wordpress`) AS wordpress, ROUND(APPROX_QUANTILES(bytesTotal, 1001)[OFFSET(501)] / 1024) AS totalKB FROM `httparchive.runs.2017_10_15_pages` GROUP BY wordpress
We get 1590 KB for wordpress=false and 1897 KB for wordpress=true. So this simple example suggests that the median WordPress page is about 19% heavier than the median non-WordPress page. This also raises more questions, like what makes it heavier (scripts, images, videos, etc.) and what needs to be done to fix it (minification, caching, service workers, etc.).
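One possible starting point for the "what makes it heavier" question is to split the median by content type, reusing the shape of the query above. This is a rough sketch, not a finished analysis: it assumes the pages table has per-type byte columns named `bytesJS`, `bytesCSS` and `bytesImg`, so check the table schema before running it.

```sql
#standardSQL
-- Sketch only: bytesJS, bytesCSS and bytesImg are assumed column names;
-- confirm them against the schema of the pages table first.
SELECT
  url IN (SELECT url FROM `httparchive.scratchspace.wordpress`) AS wordpress,
  COUNT(0) AS pages,
  -- Approximate medians, computed the same way as totalKB above.
  ROUND(APPROX_QUANTILES(bytesJS, 1001)[OFFSET(501)] / 1024) AS jsKB,
  ROUND(APPROX_QUANTILES(bytesCSS, 1001)[OFFSET(501)] / 1024) AS cssKB,
  ROUND(APPROX_QUANTILES(bytesImg, 1001)[OFFSET(501)] / 1024) AS imgKB
FROM
  `httparchive.runs.2017_10_15_pages`
GROUP BY
  wordpress
```

Note that medians of individual content types won’t add up to the median total, so this is better read as a per-type comparison between the two groups than as a decomposition of the difference in total page weight.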
I’d really love the help of the HTTP Archive community to dig into this data more and find other interesting conclusions about the WordPress ecosystem. @amedina and I will be travelling to Nashville in a couple of weeks to share these findings at WordCamp US.
I’ll be updating this thread with more analyses as they happen. Feel free to reply if you’ve found anything interesting!
Some areas for exploration should you need any ideas:
- Do images tend to be less optimized?
- Is there a greater reliance on third parties? What is the relative effect?
- Do WordPress pages tend to have more JS vulnerabilities? (see the new vulnerability audits in the Lighthouse results)
- Do WordPress pages tend to have more a11y audit failures? (again, see Lighthouse data)
- Join with the origins in the Chrome UX Report to find out relative real user performance.
- Are WordPress websites more or less likely to do “coinjacking” (background currency mining)?
We could ask these questions (and many many more) of any group of websites. Some WordPress-specific areas to explore include detected plugins and themes and how they influence performance, WordPress hosting services, and enterprise/VIP sites compared to other WordPress sites.