Ilya recently tweeted about Eric Lawrence’s important article on making sure to specify a Charset for HTML documents. Without one, the browser may have to restart parsing the HTML, or may display it with errors. As Eric says, “for functionality and performance reasons, it is a best-practice to specify the encoding using HTTP response headers.”
Specifying the Charset is done within the Content-Type response header. Here’s a quick check on how many websites do this.
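To make "within the Content-Type response header" concrete, here is a minimal Python sketch that pulls the charset parameter out of a header value. The function name `charset_from_content_type` and the sample header values are assumptions made for illustration; they are not part of the HTTP Archive data or queries.

```python
from email.message import Message
from typing import Optional


def charset_from_content_type(header_value: str) -> Optional[str]:
    """Return the charset parameter of a Content-Type value, or None if absent."""
    # Hypothetical helper: the standard library's header parser does the work.
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_charset() lowercases the value and returns None when
    # the header carries no charset parameter.
    return msg.get_content_charset()


print(charset_from_content_type("text/html; charset=UTF-8"))  # utf-8
print(charset_from_content_type("text/html"))  # None
```

A response like the second one leaves the browser to work out the encoding some other way, which is the case Eric's article warns about.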
The number of HTML documents in the most recent crawl is 291,356:
```
select count(*) as num from httparchive:runs.2014_02_01_requests where firsthtml=true
```
Of those, the number that specify Charset is 194,368:
```
select count(*) as num from httparchive:runs.2014_02_01_requests
where firsthtml=true and resp_content_type like "%charset%"
```
Conclusion: Across the top ~300K websites, 67% (194,368 of 291,356) specify Charset in an HTTP header.
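The percentage follows directly from the two query results. As a quick check, here is the arithmetic in Python; the variable names are only illustrative.

```python
# Counts reported by the two queries in this post.
total_html_docs = 291_356
with_charset = 194_368

# Share of HTML documents whose Content-Type header mentions a charset.
share = with_charset / total_html_docs
print(f"{share:.1%}")
```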