Measuring CMS host TTFB in CrUX

I’m currently at WordCamp EU in Berlin and today I’m sharing some research I’ve done into an upcoming metric in CrUX, time to first byte (TTFB) AKA server response time. This metric is incredibly useful as a barometer for hosting providers because it ignores the variability of front end performance and focuses only on network setup and backend response time. So things like oversubscribed shared hosts and slow database lookups should stick out like a sore thumb.

I wrote up a doc at bit.ly/ttfb-crux-wp with the methodology for my research, combining the CrUX data with HTTP Archive to understand which origins represent known WordPress sites.

I’d love to use this thread to share my methodology and collaborate on ways to improve the analysis. I’d be especially interested in adding more detections for various hosts (BlueHost, Host Gator, GoDaddy, etc) for a more complete picture of the landscape.

Excerpts from the methodology below:


Here’s an example of a query to get fast/avg/slow TTFB for all CMSs:

SELECT
  app,
  client,
  COUNT(DISTINCT origin) AS n,
  SUM(IF(ttfb.start < 200, ttfb.density, 0))
     / SUM(ttfb.density) AS fast,
  SUM(IF(ttfb.start >= 200 AND ttfb.start < 1000, ttfb.density, 0))
     / SUM(ttfb.density) AS avg,
  SUM(IF(ttfb.start >= 1000, ttfb.density, 0))
     / SUM(ttfb.density) AS slow
FROM
  `chrome-ux-report.all.201906`,
  UNNEST(experimental.time_to_first_byte.histogram.bin) AS ttfb
JOIN
  (SELECT _TABLE_SUFFIX AS client, *
   FROM `httparchive.technologies.2019_05_01_*`
   WHERE category = 'CMS')
ON
  client = IF(form_factor.name = 'desktop', 'desktop', 'mobile') AND
  CONCAT(origin, '/') = url
GROUP BY
  app,
  client
ORDER BY
  n DESC

Note that 201906 doesn’t exist publicly yet, so this query will fail.

app client n fast avg slow
WordPress mobile 940,191 19.22% 36.51% 44.27%
Drupal mobile 84,303 25.16% 45.09% 29.75%
Joomla mobile 70,750 9.67% 53.32% 37.01%
Squarespace mobile 21,482 29.88% 55.05% 15.07%
Wix mobile 21,233 2.35% 68.66% 28.99%
1C-Bitrix mobile 20,485 15.10% 60.36% 24.53%
TYPO3 CMS mobile 18,443 29.11% 48.60% 22.29%
Adobe Experience Manager mobile 10,142 31.77% 44.81% 23.42%
DNN mobile 7,341 29.14% 47.99% 22.86%
Weebly mobile 7,267 28.44% 56.20% 15.35%
DataLife Engine mobile 6,895 13.40% 67.85% 18.75%
Jimdo mobile 6,665 15.54% 58.62% 25.84%
Microsoft SharePoint mobile 6,099 24.39% 44.42% 31.19%
Liferay mobile 4,895 25.16% 48.28% 26.56%
Concrete5 mobile 3,498 17.53% 40.16% 42.32%
Sitefinity mobile 3,317 29.00% 44.84% 26.16%
Craft CMS mobile 2,908 9.86% 51.52% 38.62%
Contao mobile 2,565 14.32% 65.04% 20.64%
SPIP mobile 2,290 20.66% 57.02% 22.32%
Plone mobile 2,038 28.60% 46.27% 25.13%
SilverStripe mobile 1,994 13.08% 50.42% 36.49%
Contentful mobile 1,955 22.20% 51.11% 26.69%
MODX mobile 1,908 20.18% 58.79% 21.04%
Tilda mobile 1,687 32.51% 55.94% 11.54%
eZ Publish mobile 1,514 25.15% 47.94% 26.91%
Business Catalyst mobile 1,388 35.45% 49.49% 15.06%
Sitecore mobile 1,066 20.00% 51.05% 28.95%

Things get interesting when you try to look at individual hosts. Based on an approach used by Pantheon, we can look at response headers as an indicator of where the site is hosted. Many hosts leave “calling cards” in the response headers, which we can sniff out and ID. Here’s an example of looking at three WordPress hosts using these header patterns:

  • Automattic: automattic.com/jobs
  • Pantheon: x-pantheon-styx
  • WP Engine: wpe-

SELECT
  CASE 
   WHEN platform = 'automattic.com/jobs' THEN 'Automattic'
   WHEN platform = 'x-pantheon-styx-hostname' THEN 'Pantheon'
   WHEN platform = 'wpe-backend' THEN 'WP Engine'
   ELSE NULL
  END AS platform,
  client,
  COUNT(DISTINCT origin) AS n,
  SUM(IF(ttfb.start < 200, ttfb.density, 0)) / SUM(ttfb.density) AS fast,
  SUM(IF(ttfb.start >= 200 AND ttfb.start < 1000, ttfb.density, 0)) / SUM(ttfb.density) AS avg,
  SUM(IF(ttfb.start >= 1000, ttfb.density, 0)) / SUM(ttfb.density) AS slow
FROM
  `chrome-ux-report.all.201906`,
  UNNEST(experimental.time_to_first_byte.histogram.bin) AS ttfb
JOIN
  (SELECT _TABLE_SUFFIX AS client, url FROM `httparchive.technologies.2019_05_01_*` WHERE app = 'WordPress')
ON
  client = IF(form_factor.name = 'desktop', 'desktop', 'mobile') AND
  CONCAT(origin, '/') = url
JOIN
  (SELECT _TABLE_SUFFIX AS client, url, REGEXP_EXTRACT(LOWER(respOtherHeaders), '(automattic.com/jobs|x-pantheon-styx-hostname|wpe-backend)') AS platform FROM `httparchive.summary_requests.2019_05_01_*`)
USING
  (client, url)
WHERE
  platform IS NOT NULL
GROUP BY
  platform,
  client
ORDER BY
  n DESC
platform client n fast avg slow
WP Engine desktop 42,611 32.99% 40.41% 26.60%
WP Engine mobile 40,033 26.23% 46.59% 27.18%
Automattic desktop 15,755 10.73% 72.15% 17.12%
Automattic mobile 15,716 10.73% 69.62% 19.65%
Pantheon desktop 3,193 26.25% 44.68% 29.07%
Pantheon mobile 2,874 22.04% 42.66% 35.30%
1 Like

Super excited about the insights this (TTFB, that is) will unlock! It’s a unique lens on server performance that’s not available with any other existing benchmarks of hosting providers.

@patmeenan I believe we store PTR records as part of the HA crawl? I suspect that should give us a good base signal for identifying various hosting providers (and CDNs).

Other options we should explore…

  • Leverage IP ranges, e.g. https://github.com/client9/ipcat/blob/master/datacenters.csv
  • Augment detection with host specific fingerprints (like examples above)
    • The benefit of this approach is that a site may be using a CDN and the IP+DNS will point to the CDN, but the fingerprint can still help us identify the origin hosting provider.
    • Q: are we aware of any existing datasets for such identification?
2 Likes

Yep (if available). We record the ptr, cname and NS records for the base page (at least in theory, I don’t know how strongly it has been tested):

            "_base_page_dns_server": "ns-cloud-e1.googledomains.com",
            "_base_page_ip_ptr": "",
            "_base_page_cname": "",

Regarding the matching of automattic.com/jobs, I think we’ve got some sites on WordPress.com that don’t provide that in the response headers. I’m going to check on that.

2 Likes

It won’t catch a whole lot, but BlueHost uses the same favicon for its sites by default, and a lot of site owners never change it.

We (HubSpot) just added a header to sites powered by our CMS.

x-powered-by: HubSpot

Example: https://www.hubspot.com/

1 Like