How does web font usage vary by country?

#1

I was running an audit of a Japanese website today and was surprised to see that it loads over 1 MB of fonts. Here’s the content breakdown provided by WebPageTest:

| MIME Type | Bytes | Uncompressed |
| --- | --- | --- |
| image | 5,437,450 | 5,437,450 |
| font | 1,025,064 | 1,025,064 |
| js | 287,016 | 313,603 |
| css | 143,103 | 143,103 |
| html | 15,828 | 15,048 |
| other | 13,882 | 13,882 |

Yeah, images are the biggest issue (5.4 MB!) but this post is specifically about the fonts. Here’s the info we have about each font file:

| Resource | Bytes Downloaded |
| --- | --- |
| …/fonts/NotoSansCJKjp-DemiLight.woff2 | 483.8 KB |
| …/fonts/NotoSansCJKjp-Medium.woff2 | 487.2 KB |
| https://fonts.gstatic.com/s/roboto/v18/…qEu92Fr1Mu4mxK.woff2 | 15.0 KB |
| https://fonts.gstatic.com/s/roboto/v18/…2Fr1MmWUlfBBc4.woff2 | 15.1 KB |

So there were two Roboto files loaded from the Google Fonts CDN for a total of only 30 KB. The big issue is the Noto Sans CJK files.
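To put the font weight in context, here is a rough Python sketch of the arithmetic. The byte counts are copied from the WebPageTest breakdown above; the variable names are only for illustration.

```python
# Byte counts copied from the WebPageTest content breakdown above.
bytes_by_type = {
    "image": 5_437_450,
    "font": 1_025_064,
    "js": 287_016,
    "css": 143_103,
    "html": 15_828,
    "other": 13_882,
}

total_bytes = sum(bytes_by_type.values())

# Share of the total page weight per MIME type, as a percentage.
shares = {
    mime: round(100 * size / total_bytes, 1)
    for mime, size in bytes_by_type.items()
}

# Font weight expressed in KB (1 KB = 1024 bytes here).
font_kb = bytes_by_type["font"] / 1024

print(f"total: {total_bytes:,} bytes")
print(f"fonts: {shares['font']}% of page weight ({font_kb:.1f} KB)")
print(f"images: {shares['image']}% of page weight")
```

By this count, fonts are roughly 15% of the page weight and images close to 79%.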

Noto Sans CJK and Noto Serif CJK comprehensively cover Simplified Chinese, Traditional Chinese, Japanese, and Korean in a unified font family. This includes the full coverage of CJK Ideographs with variation support for 4 regions, Kangxi radicals, Japanese Kana, Korean Hangul, and other CJK symbols and letters in the Basic Multilingual Plane of Unicode. It also provides limited coverage of CJK Ideographs in Plane 2 of Unicode as necessary to support standards from China and Japan.

https://www.google.com/get/noto/help/cjk/

Also:

…be aware that the web latency for large fonts, such as for Noto Sans CJK, can be large.

https://www.google.com/get/noto/help/guidelines/

In a State of the Web episode on web fonts, my guest Dave Crossland talked about the challenges of loading CJK fonts and how they can be 100x larger than a European font. Here’s the relevant part of the transcript:

The biggest challenge has been for Chinese, Japanese, and Korean fonts. A typical font for Indian languages can maybe be two or three times larger than a European font. But for East Asia, it can be a hundred times bigger.

That was a big wind-up but that leads me to my web transparency question: if CJK fonts are so huge, how does the median number of font KB compare across countries? We should expect to see Chinese, Japanese, and Korean websites have more font bytes, right?

I adapted the following query from the CrUX Cookbook. The country-specific CrUX tables contain origins for popular websites visited by Chrome users. Because the HTTP Archive crawls the home pages of all CrUX origins, we can JOIN these datasets together in BigQuery and answer our web transparency question.

#standardSQL
WITH
  countries AS (
  SELECT *, 'ad' AS country_code, 'Andorra' AS country FROM `chrome-ux-report.country_ad.201903` UNION ALL
  SELECT *, 'ae' AS country_code, 'United Arab Emirates' AS country FROM `chrome-ux-report.country_ae.201903` UNION ALL
  SELECT *, 'af' AS country_code, 'Afghanistan' AS country FROM `chrome-ux-report.country_af.201903` UNION ALL
  SELECT *, 'ag' AS country_code, 'Antigua and Barbuda' AS country FROM `chrome-ux-report.country_ag.201903` UNION ALL
  SELECT *, 'ai' AS country_code, 'Anguilla' AS country FROM `chrome-ux-report.country_ai.201903` UNION ALL
  SELECT *, 'al' AS country_code, 'Albania' AS country FROM `chrome-ux-report.country_al.201903` UNION ALL
  SELECT *, 'am' AS country_code, 'Armenia' AS country FROM `chrome-ux-report.country_am.201903` UNION ALL
  SELECT *, 'ao' AS country_code, 'Angola' AS country FROM `chrome-ux-report.country_ao.201903` UNION ALL
  SELECT *, 'ar' AS country_code, 'Argentina' AS country FROM `chrome-ux-report.country_ar.201903` UNION ALL
  SELECT *, 'as' AS country_code, 'American Samoa' AS country FROM `chrome-ux-report.country_as.201903` UNION ALL
  SELECT *, 'at' AS country_code, 'Austria' AS country FROM `chrome-ux-report.country_at.201903` UNION ALL
  SELECT *, 'au' AS country_code, 'Australia' AS country FROM `chrome-ux-report.country_au.201903` UNION ALL
  # All ~200 countries...
  # See the link to the query below for the unabridged version.
)

SELECT
  _TABLE_SUFFIX AS client,
  country,
  COUNT(0) AS urls,
  # Approximate median, converted from bytes to KB (despite the column name).
  ROUND(APPROX_QUANTILES(bytesFont, 1001)[OFFSET(501)] / 1024, 2) AS median_font_bytes
FROM
  countries
JOIN
  `httparchive.summary_pages.2019_02_01_*`
ON
  CONCAT(origin, '/') = url
GROUP BY
  client,
  country
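
| client | country | median_font_bytes (KB) | urls |
| --- | --- | --- | --- |
| desktop | China | 0 | 23138 |
| desktop | Japan | 14.54 | 560364 |
| desktop | Korea | 70.35 | 143057 |
| mobile | China | 1.17 | 13342 |
| mobile | Japan | 10.54 | 653001 |
| mobile | Korea | 51.29 | 131356 |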

Query | Results
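For anyone who wants to see the logic of the query outside BigQuery, here is a minimal Python sketch of the same join and aggregation. The sample rows are made up (they are not CrUX or HTTP Archive data), the function name is hypothetical, and it computes an exact median where the query uses an approximate quantile.

```python
from statistics import median

# Hypothetical sample rows, NOT real CrUX or HTTP Archive data.
# CrUX side: (country, origin) pairs.
crux_origins = [
    ("Japan", "https://example.jp"),
    ("Japan", "https://shop.example.jp"),
    ("Korea", "https://example.kr"),
    ("Korea", "https://uncrawled.example.kr"),  # no matching page below
]

# HTTP Archive side: home page URL -> font bytes.
pages = {
    "https://example.jp/": 0,
    "https://shop.example.jp/": 20480,
    "https://example.kr/": 71680,
}


def median_font_kb(origins, page_font_bytes):
    """Join origins to pages and return the median font KB per country."""
    by_country = {}
    for country, origin in origins:
        url = origin + "/"  # mirrors CONCAT(origin, '/') = url
        if url in page_font_bytes:  # inner join: skip origins without a page
            by_country.setdefault(country, []).append(page_font_bytes[url])
    return {
        country: round(median(values) / 1024, 2)
        for country, values in by_country.items()
    }


result = median_font_kb(crux_origins, pages)
print(result)
```

The trailing slash matters because CrUX origins have no path while HTTP Archive URLs end in `/`; without it the join would match nothing.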

[Image: Asia, desktop]

[Image: Asia, mobile]

So according to the stats in the Page Weight report, the median font weight on both desktop and mobile is around 100 KB. China and Japan load far fewer font bytes than that, although China’s sample size is much smaller. Interestingly, Korea loads more font bytes than the other two countries, at 70 KB for desktop.

So why would CJK websites load fewer font bytes? Maybe it’s because the web font files are just so prohibitively huge that it’s not even worth it and they rely on system fonts. Korea is known to have some of the fastest internet in the world, so maybe the download cost is more tolerable. Does anyone have any other insights either from the data or real world experience?

#2

Interestingly, the HTTP Archive test machines (and WebPageTest machines) have all of the Noto system fonts installed, so a font rule that prefers a local version and falls back to a web font is even more important for CJK.
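As an illustration of that kind of rule, here is a small Python sketch that builds an `@font-face` block listing `local()` sources before the `url()` source. The family name, local font names, and file path are all placeholders, not values taken from the audited site.

```python
def font_face_rule(family, local_names, web_font_url, font_format="woff2"):
    """Build an @font-face rule that tries installed fonts before downloading."""
    # local() sources come first so an installed copy wins over the download.
    sources = [f'local("{name}")' for name in local_names]
    sources.append(f'url("{web_font_url}") format("{font_format}")')
    src = ",\n       ".join(sources)
    return (
        "@font-face {\n"
        f'  font-family: "{family}";\n'
        f"  src: {src};\n"
        "}"
    )


rule = font_face_rule(
    family="Body CJK",  # hypothetical family name
    # Assumed local names; check what the target systems actually report.
    local_names=["Noto Sans CJK JP Medium", "NotoSansCJKjp-Medium"],
    web_font_url="/fonts/example-cjk.woff2",  # hypothetical path
)
print(rule)
```

The browser walks the `src` list in order, so the web font is only fetched when none of the `local()` names match an installed font.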

#3

Oh, yeah that’s interesting. How and why is Noto used locally? I’d imagine it’s fairly unusual for end users to have it installed and might skew the lab results. Should run a test to be sure…

#4

It is installed locally under the assumption that people in the CJK countries will have local system fonts installed and not just Latin.

#5

Much as I like good typography, this is proof that web fonts really are a solution in search of a problem. One of my customers is a huge fan of Helvetica and insists on using it for all corporate websites. However, waivers have been given to the East Asian websites precisely because of this problem. Add to this, of course, the inbuilt mismatch of applying a typeface designed for alphabets to ideograms and you see “bias” at work. Use system fonts wherever you can. Where web fonts really must be used, it is possible to dynamically assess the characters required for a site, or even for a single page, and create a much smaller font file: the large size is mostly due to the many rarely used glyphs.
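A rough sketch of that subsetting idea in Python: collect the code points a page actually uses and build a command line for fontTools' `pyftsubset`. The page text, font path, and output name are placeholders, and the sketch only builds the command rather than running it.

```python
def required_codepoints(text):
    """Return the sorted, unique code points used in text, skipping whitespace."""
    return sorted({ord(ch) for ch in text if not ch.isspace()})


def pyftsubset_command(font_path, text, output_path):
    """Build (but do not run) a pyftsubset command for the given text."""
    unicodes = ",".join(f"U+{cp:04X}" for cp in required_codepoints(text))
    return [
        "pyftsubset",
        font_path,
        f"--unicodes={unicodes}",
        "--flavor=woff2",  # WOFF2 output needs the brotli module installed
        f"--output-file={output_path}",
    ]


page_text = "日本語のウェブフォント"  # hypothetical page text
cmd = pyftsubset_command("source-font.otf", page_text, "subset.woff2")
print(" ".join(cmd))
```

A real pipeline would extract the text from rendered pages and would need to regenerate the subset whenever the content changes, which is the main cost of this approach.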

Fortunately, some of the proposed improvements to CSS should reduce the need for web fonts in the future.

#6

I think we can agree that websites should have the control to stylize text to fit their brand/identity and ensure that it looks consistent across devices. It’s on them to ensure that they do it responsibly and efficiently, just like all other aspects of web dev (image optimization, JS usage, mobile friendliness, etc).

Can you elaborate on why you think this is a case for system fonts? For example, Roboto looks great in Latin languages and doesn’t support CJK. That’s by design and why Noto exists. Web devs can serve one or the other using localization techniques.

Also, anecdotally, I’ve heard that CJK system fonts can have poor legibility and web fonts are often the preferred way to counter that. I’ll try to loop in some people with more familiarity on the subject for their perspectives.

I actually think things like variable fonts and color fonts will increase the desire for web fonts, giving developers and designers more control over how their text appears.

Anyway, thanks for sharing your perspective. It’s always great to hear other POV and have a healthy discussion.

#7

I believe Google Fonts is blocked in China. Wouldn’t that be a big part of the issue here? That was my real world experience a couple years ago, and I don’t think that’s changed.

That wouldn’t explain why it’s so low in Japan however.

#8

All devices have system fonts for the relevant character sets and local fonts will always be faster… Typefaces for Latin (and Cyrillic) won’t work for all languages. For example, some characters must have serifs so it doesn’t make sense to try to force one on them, though there are usually equivalents.

Browsers have for years been adept at automatic substitution, though mixing, say, Arial and CJK on the same page can cause problems. At least it used to look odd on Windows when you did this.

We really need the concept of performance budgets for typefaces, say 40 KB max. Variable fonts are pretty neat but I suspect we’ll see more and more functionality moved into CSS so that you only “need” to ship minimal typeface definitions and let the browser do faux bold, faux italic, etc. Designers will hate it but it’s the best approach for the user. I know of one site where 7 fonts of the same typeface are used for the homepage, and they make up about 50% of the page load before onload.
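A font budget like that is easy to check mechanically. Here is a minimal Python sketch, using the 40 KB figure suggested here and the per-file sizes from the first post as sample input; the function and key names are made up for illustration.

```python
BUDGET_KB = 40.0  # the budget suggested in this post

# Per-file sizes in KB, copied from the resource table in the first post.
font_kb = {
    "NotoSansCJKjp-DemiLight.woff2": 483.8,
    "NotoSansCJKjp-Medium.woff2": 487.2,
    "Roboto file 1": 15.0,
    "Roboto file 2": 15.1,
}


def check_font_budget(fonts, budget_kb):
    """Compare the total font weight against a budget, all values in KB."""
    total = round(sum(fonts.values()), 1)
    return {
        "total_kb": total,
        "over_by_kb": round(max(0.0, total - budget_kb), 1),
        "within_budget": total <= budget_kb,
    }


report = check_font_budget(font_kb, BUDGET_KB)
print(report)

# The two Roboto files on their own would fit within the budget.
roboto_only = {name: kb for name, kb in font_kb.items() if name.startswith("Roboto")}
print(check_font_budget(roboto_only, BUDGET_KB))
```

A check like this could run in CI against lab test results and fail the build when the total goes over budget.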

And, working with multiple languages including German, which is famous for long words: I’d really like to see hyphenation support improve in browsers, less glamorous but far more important in my view. Fortunately, Chromium is finally getting it but the results in all browsers still leave a lot to be desired.