Building on the technique in “Analyzing stylesheets with a JS-based parser” to tokenize stylesheets, and given how popular my tweet about z-index values was, let’s turn our attention to the font-weight property. What are the top numeric font-weight values?
The query that answers this question has two parts: extracting all numeric font weights from each stylesheet or inline style block, then aggregating them by frequency. The first part is a BigQuery UDF written in JavaScript:
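```sql
#standardSQL
CREATE TEMPORARY FUNCTION getNumericWeights(css STRING)
RETURNS ARRAY<NUMERIC> LANGUAGE js AS '''
try {
  // Walk the parsed stylesheet, collecting integer font-weight values.
  var reduceValues = (values, rule) => {
    // At-rules like @media contain nested rules: recurse into them.
    if ('rules' in rule) {
      return rule.rules.reduce(reduceValues, values);
    }
    // Skip nodes that have neither nested rules nor declarations.
    if (!('declarations' in rule)) {
      return values;
    }
    rule.declarations.forEach(d => {
      // Keep font-weight values that start with an integer;
      // keywords like "bold" parse to NaN and are skipped.
      if (d.property.toLowerCase() == 'font-weight' &&
          !isNaN(parseInt(d.value))) {
        values.push(parseInt(d.value));
      }
    });
    return values;
  };
  var $ = JSON.parse(css);
  return $.stylesheet.rules.reduce(reduceValues, []);
} catch (e) {
  // Any error (e.g. invalid JSON) yields no weights for this row.
  return [];
}
''';
```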
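For each style declaration, we check whether the property is font-weight, then whether the value starts with an integer (so keywords like bold are skipped). If so, we append the parsed value to the output array. Rules nested inside at-rules such as @media are handled by recursing into their rules array.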
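To make the traversal concrete, here is a sketch of the same reduce logic run in plain JavaScript against a hand-written AST. The AST shape (stylesheet.rules, nested rules for at-rules, declarations carrying type, property and value) is an assumption modeled on what a parser like reworkcss/css produces. One difference from the UDF is deliberate: reworkcss/css also places comment nodes, which have no property, inside declarations. If the parsed CSS contains such nodes, d.property.toLowerCase() in the UDF would throw and the catch would return an empty array for the whole stylesheet, so this sketch skips any declaration without a string property.

```javascript
// Sketch of the UDF's traversal, run outside BigQuery on a hand-written AST.
// The AST shape is an assumption modeled on reworkcss/css-style output.
function getNumericWeights(ast) {
  const reduceValues = (values, rule) => {
    // At-rules such as @media hold their own nested rules: recurse.
    if ('rules' in rule) {
      return rule.rules.reduce(reduceValues, values);
    }
    // Nodes without declarations contribute nothing.
    if (!('declarations' in rule)) {
      return values;
    }
    for (const d of rule.declarations) {
      // Guard (not in the original UDF): skip comment nodes with no property.
      if (typeof d.property !== 'string') continue;
      // parseInt reads a leading integer; 'bold' yields NaN and is skipped.
      const n = parseInt(d.value, 10);
      if (d.property.toLowerCase() === 'font-weight' && !isNaN(n)) {
        values.push(n);
      }
    }
    return values;
  };
  return ast.stylesheet.rules.reduce(reduceValues, []);
}

const sample = {
  stylesheet: {
    rules: [
      { type: 'rule', selectors: ['h1'], declarations: [
        { type: 'declaration', property: 'font-weight', value: '700' },
      ] },
      { type: 'media', media: '(min-width: 600px)', rules: [
        { type: 'rule', selectors: ['p'], declarations: [
          { type: 'comment', comment: ' lighter on wide screens ' },
          { type: 'declaration', property: 'FONT-WEIGHT', value: '300' },
        ] },
      ] },
      { type: 'rule', selectors: ['b'], declarations: [
        { type: 'declaration', property: 'font-weight', value: 'bold' },
      ] },
    ],
  },
};

console.log(getNumericWeights(sample)); // [ 700, 300 ]
```

The keyword value bold is dropped, the comment node is skipped, and the uppercase property name still matches because of the lowercasing.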
We can query the parsed CSS and get all numeric weights by calling this function:
```sql
SELECT
  weight,
  COUNT(0) AS freq
FROM
  `httparchive.almanac.parsed_css`,
  -- Flatten each stylesheet's array into one row per weight.
  UNNEST(getNumericWeights(css)) AS weight
GROUP BY
  weight
ORDER BY
  freq DESC
```
The only tricky part is the UNNEST, which turns rows of weight arrays into rows of individual weights so that we can group them. Also note the comma after the table name: it is an implicit CROSS JOIN, pairing each row with the elements of its own array.
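The UNNEST-and-count step can be pictured with in-memory arrays. In this sketch (the function name weightFrequencies is made up), each inner array stands in for one row's getNumericWeights(css) output; flattening plays the role of UNNEST, and a Map does the GROUP BY and COUNT.

```javascript
// Sketch of what UNNEST + GROUP BY + COUNT computes, on in-memory data.
// Each element of `rows` stands in for one stylesheet's array of weights.
function weightFrequencies(rows) {
  const counts = new Map();
  // Flattening yields one entry per weight, like the UNNEST cross join.
  for (const weight of rows.flat()) {
    counts.set(weight, (counts.get(weight) ?? 0) + 1);
  }
  // GROUP BY weight ... ORDER BY freq DESC (sort is stable, so ties
  // keep first-seen order).
  return [...counts.entries()]
    .map(([weight, freq]) => ({ weight, freq }))
    .sort((a, b) => b.freq - a.freq);
}

const rows = [
  [400, 700, 400],
  [],          // a stylesheet with no numeric weights contributes nothing
  [700, 300],
];
console.log(weightFrequencies(rows));
// 400 and 700 appear twice each, then 300 once
```

The empty array mirrors how a comma join with UNNEST drops rows whose array is empty.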
Putting the two together:
```sql
#standardSQL
CREATE TEMPORARY FUNCTION getNumericWeights(css STRING)
RETURNS ARRAY<NUMERIC> LANGUAGE js AS '''
try {
  // Walk the parsed stylesheet, collecting integer font-weight values.
  var reduceValues = (values, rule) => {
    // At-rules like @media contain nested rules: recurse into them.
    if ('rules' in rule) {
      return rule.rules.reduce(reduceValues, values);
    }
    // Skip nodes that have neither nested rules nor declarations.
    if (!('declarations' in rule)) {
      return values;
    }
    rule.declarations.forEach(d => {
      // Keep font-weight values that start with an integer.
      if (d.property.toLowerCase() == 'font-weight' &&
          !isNaN(parseInt(d.value))) {
        values.push(parseInt(d.value));
      }
    });
    return values;
  };
  var $ = JSON.parse(css);
  return $.stylesheet.rules.reduce(reduceValues, []);
} catch (e) {
  return [];
}
''';

SELECT
  weight,
  COUNT(0) AS freq
FROM
  `httparchive.almanac.parsed_css`,
  UNNEST(getNumericWeights(css)) AS weight
GROUP BY
  weight
ORDER BY
  freq DESC
```
Here are the top 10 results:
weight | freq |
---|---|
400 | 247,039,354 |
700 | 192,970,429 |
500 | 93,801,941 |
300 | 80,280,387 |
600 | 75,440,689 |
900 | 22,721,321 |
800 | 20,621,637 |
100 | 17,690,041 |
200 | 11,522,690 |
440 | 1,031,610 |
For the first 9, the results are unsurprising. 400 is the numeric version of normal, the default weight, and the second most popular weight, 700, is the numeric version of bold. The rest of the top 9 lean toward heavier weights, with 900 and 800 each more popular than 100 and 200.
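Because the UDF only counts numeric values, declarations written as normal or bold never reach the table, even though they mean 400 and 700. If you wanted to fold them in, a hypothetical variant (not what the query above does) could map those two keywords before parsing. bolder and lighter are relative to the inherited weight, so they have no fixed number and are left unmapped.

```javascript
// Hypothetical variant: treat the absolute keywords as their numeric
// equivalents. In CSS, `normal` is 400 and `bold` is 700; `bolder` and
// `lighter` depend on the parent's weight, so they are not mapped.
const KEYWORD_WEIGHTS = new Map([['normal', 400], ['bold', 700]]);

function toNumericWeight(value) {
  // Drop a trailing !important, trim whitespace, lowercase keywords.
  const v = value.replace(/!important\s*$/i, '').trim().toLowerCase();
  if (KEYWORD_WEIGHTS.has(v)) return KEYWORD_WEIGHTS.get(v);
  const n = parseInt(v, 10);
  return isNaN(n) ? null : n; // null: not countable as a number
}

for (const v of ['400', 'bold', 'Normal', 'bolder', '700 !important']) {
  console.log(JSON.stringify(v), '->', toNumericWeight(v));
}
```

A Map is used instead of a plain object so that inputs like "constructor" cannot match inherited object properties.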
Things get weird after that. The 10th most popular weight is 440. This… seems like a typo. You can imagine hitting 4-4-0 instead of 4-0-0. But a typo with over a million instances? A quick search for font-weight: 440 brought me to a relevant pull request.
It seems WordPress core accidentally included a 440 weight. Whether that accounts for the roughly 1M instances of 440 in HTTP Archive is inconclusive, although it wouldn’t be impossible to query.
One interesting way to view the results is to cap the weights at 1000:
The chart shows a few negative weights, which are invalid: CSS only accepts numeric font-weight values from 1 to 1000, so browsers ignore those declarations. There are many weights between 0 and 100, which seem like accidents; I can’t imagine text being legible at that level of lightness. And several weights with 1k+ occurrences sit exactly midway between the gridlines, weights like 350.
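A hypothetical bucketing function makes those categories explicit. It uses the CSS Fonts Level 4 range check (a numeric font-weight must be between 1 and 1000 inclusive); the bucket names are invented for this sketch.

```javascript
// Hypothetical buckets for numeric font-weight values. The 1..1000 range
// check follows CSS Fonts Level 4; the bucket names are invented here.
function classifyWeight(w) {
  if (w < 1 || w > 1000) return 'invalid';       // declaration is dropped
  if (w < 100) return 'below 100';               // valid, but very light
  if (w % 100 === 0) return 'multiple of 100';   // 100, 200, ... 1000
  return 'between hundreds';                     // e.g. 350, 440, 321
}

for (const w of [-6, 0, 50, 350, 400, 440, 1069]) {
  console.log(w, classifyWeight(w));
}
```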
It’s worth mentioning that positive weights not evenly divisible by 100 are perfectly valid, as long as they don’t exceed 1000. With variable fonts, it’s possible to have a font-weight of 321 (there are 15 cases of that). Does that mean this chart shows the popularity of variable fonts? Not necessarily. My guess is that most of these are accidental.
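Even without variable fonts, a weight like 321 still renders: the CSS Fonts font-matching algorithm picks the nearest available static face. Below is a simplified sketch of that algorithm’s font-weight step. It ignores variable weight ranges and synthetic bolding, and the three-face family is hypothetical.

```javascript
// Simplified font-weight step of the CSS Fonts font-matching algorithm,
// for static faces only. Returns the face weight that would be picked.
function matchWeight(desired, faces) {
  const asc = [...faces].sort((a, b) => a - b);
  const atOrBelowDesc = asc.filter(w => w <= desired).reverse();
  const belowDesc = asc.filter(w => w < desired).reverse();
  const atOrAboveAsc = asc.filter(w => w >= desired);
  const aboveAsc = asc.filter(w => w > desired);
  let order;
  if (desired >= 400 && desired <= 500) {
    // Upward until 500, then downward, then above 500.
    order = [
      ...atOrAboveAsc.filter(w => w <= 500),
      ...belowDesc,
      ...atOrAboveAsc.filter(w => w > 500),
    ];
  } else if (desired < 400) {
    // Downward first, then upward.
    order = [...atOrBelowDesc, ...aboveAsc];
  } else {
    // Above 500: upward first, then downward.
    order = [...atOrAboveAsc, ...belowDesc];
  }
  return order.length > 0 ? order[0] : null;
}

const faces = [300, 400, 700]; // hypothetical family with three static faces
console.log(matchWeight(321, faces)); // 300
console.log(matchWeight(450, faces)); // 400
console.log(matchWeight(600, faces)); // 700
```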
Removing the 1k cap reveals many crazy font-weight values:

(Both axes on a log scale)
The largest value is 100,200,300,400,500,000,000,000,000, which occurs 58 times. It seems the developer thought they were declaring a font-weight with multiple fallback values, but then things went off the rails and it got multiplied by a trillion for some reason.
Similarly, the next largest value, with 42 occurrences, is 300,400,500,600,700,000,000. The same thought process seems to have entered a developer’s mind before the weight was arbitrarily multiplied by a million.
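With thousands separators in the way, these multipliers are easy to miscount. A quick BigInt sketch divides each observed value by the digits of the apparent fallback list (100200300400500 and 300400500600700, following the guess in the text, which the data itself doesn’t confirm) to get the exact factor. The helper name multiplier is made up for this sketch.

```javascript
// These values exceed Number.MAX_SAFE_INTEGER, so integer arithmetic on
// plain JS numbers is not guaranteed to be exact; BigInt is.
// Returns observed / digits, or null if digits does not divide observed.
function multiplier(observed, digits) {
  return observed % digits === 0n ? observed / digits : null;
}

// Observed values from the text, without thousands separators, paired
// with the guessed "fallback list" digits.
const cases = [
  [100200300400500000000000000n, 100200300400500n],
  [300400500600700000000n, 300400500600700n],
];
for (const [observed, digits] of cases) {
  console.log(`${observed} = ${digits} x ${multiplier(observed, digits)}`);
}
```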
The least popular values that occur only once are:
weight | freq |
---|---|
660,000 | 1 |
666,666 | 1 |
1,270 | 1 |
4,009 | 1 |
1,069 | 1 |
700,500 | 1 |
10,000,000,000,000,000,000 | 1 |
41,000 | 1 |
5,050 | 1 |
550,000 | 1 |
55,000 | 1 |
-1,000 | 1 |
-6 | 1 |
300,480 | 1 |
6,060 | 1 |
So you could say -6 is actually the loneliest number.