How many sites specify Charset in headers?

Ilya recently tweeted about Eric Lawrence’s important article on making sure to specify a Charset for HTML documents. Without it, the browser may have to restart parsing the HTML or may display it with errors. As Eric says, “for functionality and performance reasons, it is a best-practice to specify the encoding using HTTP response headers.”

Specifying the Charset is done within the Content-Type response header. Here’s a quick check on how many websites do this.

The number of HTML documents in the most recent crawl is 291,356:

```
select count(*) as num from httparchive:runs.2014_02_01_requests
where firsthtml=true
```

Of those, the number that specify Charset is 194,368:

```
select count(*) as num from httparchive:runs.2014_02_01_requests
where firsthtml=true and resp_content_type like "%charset%"
```

Conclusion: Across the top ~300K websites, 67% (194,368 of 291,356) specify Charset in an HTTP header.
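If you want to check a single response by hand, here is a minimal Python sketch that pulls the charset parameter out of a Content-Type header value. The function name `charset_from_content_type` is my own, and it relies on the standard library's `email.message.Message` parser rather than anything HTTP Archive uses. It is also stricter than the `like "%charset%"` substring match in the queries above, so counts produced this way could differ slightly.

```python
from email.message import Message


def charset_from_content_type(header_value):
    """Return the lowercased charset parameter of a Content-Type value, or None.

    Hypothetical helper for illustration; not part of the HTTP Archive tooling.
    """
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_charset() returns None when there is no charset parameter.
    return msg.get_content_charset()


if __name__ == "__main__":
    for value in ("text/html; charset=UTF-8", "text/html"):
        print(value, "->", charset_from_content_type(value))
```

A value of `None` means the header names a media type but no encoding, which is the case the article warns about.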

Just for fun, in one query:

```
select
  count(*) as num,
  resp_content_type like "%charset%" has_charset,
  ratio_to_report(num) over() ratio
from httparchive:runs.2014_02_01_requests
where firsthtml=true
group by has_charset
```