Character encoding checker

Check whether the page encoding is declared correctly and whether the HTTP header and the HTML agree.

Free
No sign-up
Instant results

This check only covers encoding. For a full picture of your page, run a page audit.

For issues across your whole site — duplicate titles, orphan pages, broken internal links — run a site audit.

What is character encoding and why it matters

Character encoding (charset) determines how bytes in an HTML document are converted into characters on screen. If the encoding is specified incorrectly or is missing, text on the page may display as garbled characters. The standard for the modern web is UTF-8.
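
To make the bytes-to-characters step concrete, here is a minimal Python sketch that decodes the same bytes under two different encodings. The sample word is only an illustration; any non-ASCII text behaves the same way.

```python
# The same bytes, read under two different encodings.
text = "Привет"
raw = text.encode("utf-8")  # 12 bytes: each Cyrillic letter takes 2 bytes in UTF-8

as_utf8 = raw.decode("utf-8")           # matches how the bytes were written
as_cp1251 = raw.decode("windows-1251")  # wrong guess: every byte becomes its own character

print(len(raw))    # 12
print(as_utf8)     # the original word
print(as_cp1251)   # 12 unrelated characters (mojibake)
```

Nothing about the bytes changes between the two calls. Only the declared encoding differs, which is why a wrong or missing declaration is enough to garble a page.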

What this tool checks

  • Content-Type header charset — the server-level declaration
  • Meta charset in HTML — the page-level declaration (both forms: <meta charset="UTF-8"> and the legacy <meta http-equiv="Content-Type">)
  • Header/meta consistency — when both are set, they should agree
  • Charset position — meta charset must appear within the first 1024 bytes per HTML5 spec
  • BOM detection — UTF-8 Byte Order Mark at file start
  • Actual encoding match — counts U+FFFD replacement characters to detect systemic mojibake
  • UTF-8 recommendation — legacy encodings flagged
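
The checks listed above can be approximated in a few lines of Python. This is a rough sketch under stated assumptions, not the tool's actual implementation: the function name check_declarations, the regular expressions, and the report keys are all hypothetical, and a real checker would use a proper HTML parser rather than a regex.

```python
import codecs
import re

# Matches <meta charset="..."> and the legacy http-equiv form, up to the closing ">".
META_RE = re.compile(
    rb"""<meta[^>]+charset\s*=\s*["']?\s*([A-Za-z0-9_.:\-]+)[^>]*>""",
    re.IGNORECASE,
)
HEADER_RE = re.compile(r"""charset\s*=\s*["']?([A-Za-z0-9_.:\-]+)""", re.IGNORECASE)


def normalize(label):
    """Map an encoding label to Python's canonical codec name, or None if absent."""
    if not label:
        return None
    try:
        return codecs.lookup(label).name
    except LookupError:
        return label.lower()  # unknown to Python; keep the raw label


def check_declarations(content_type, body):
    """Report on the charset declarations of one response.

    content_type: value of the Content-Type header, or None if absent.
    body: raw response bytes, not yet decoded.
    """
    header_match = HEADER_RE.search(content_type or "")
    header_charset = normalize(header_match.group(1)) if header_match else None

    meta_match = META_RE.search(body)
    meta_charset = normalize(meta_match.group(1).decode("ascii")) if meta_match else None

    # Decode the way a browser would: header first, then meta, else assume UTF-8.
    declared = header_charset or meta_charset or "utf-8"
    try:
        text = body.decode(declared, errors="replace")
    except LookupError:
        text = body.decode("utf-8", errors="replace")

    return {
        "header_charset": header_charset,
        "meta_charset": meta_charset,
        "consistent": None in (header_charset, meta_charset) or header_charset == meta_charset,
        "meta_in_first_1024_bytes": meta_match is not None and meta_match.end() <= 1024,
        "has_utf8_bom": body.startswith(codecs.BOM_UTF8),
        "replacement_chars": text.count("\ufffd"),
    }


page = b'<!DOCTYPE html><html><head><meta charset="UTF-8"><title>x</title></head></html>'
report = check_declarations("text/html; charset=windows-1251", page)
print(report["consistent"])  # False: the header and the meta tag disagree
```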

Why encoding matters for SEO

  • Garbled content isn't indexed correctly — Google indexes what it reads; if it reads mojibake, that's what goes into the index
  • Non-ASCII content (Cyrillic, Chinese, accented Latin, emoji) is the sensitive case — pure ASCII pages rarely have encoding issues
  • User metrics suffer — users see "???? ????" instead of words and bounce immediately
  • UTF-8 is the encoding the HTML standard expects — every modern framework, CMS, and editor defaults to it. Legacy encodings create compatibility problems

Good vs bad examples

Good — charset as the very first element in <head>:

<!DOCTYPE html>
<html>
<head>
  <meta charset="UTF-8">
  <title>...</title>
</head>

Good — server sending charset in Content-Type header:

Content-Type: text/html; charset=UTF-8
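
How the header gets set depends on the server or framework. As one hedged illustration, a minimal Python WSGI application could send it like this; the name app and the page content are placeholders.

```python
def app(environ, start_response):
    """Minimal WSGI application that declares UTF-8 in the Content-Type header."""
    html = (
        '<!DOCTYPE html><html><head><meta charset="UTF-8">'
        "<title>Привет</title></head><body>Привет</body></html>"
    )
    body = html.encode("utf-8")  # the bytes on the wire must really be UTF-8
    headers = [
        ("Content-Type", "text/html; charset=UTF-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]

# To try it locally with the standard library:
#   from wsgiref.simple_server import make_server
#   make_server("", 8000, app).serve_forever()
```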

Bad — charset declared late (after more than 1024 bytes of scripts/styles):

<head>
  <script src="/huge-analytics.js"></script>
  <style>/* ... 2 KB of CSS ... */</style>
  <meta charset="UTF-8">  <!-- too late, browser may miss it -->

Bad — header and meta disagree:

Content-Type: text/html; charset=windows-1251
...
<meta charset="UTF-8">  <!-- header wins, HTML lies -->
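
The precedence rule in this example can be written down as a tiny Python sketch. The function name effective_charset is hypothetical, and the sketch deliberately ignores a BOM and browser-specific guessing.

```python
def effective_charset(header_charset, meta_charset):
    """Pick the declaration to follow when the HTTP header wins over the meta tag."""
    if header_charset:
        return header_charset
    return meta_charset  # may be None: the browser then has to guess


print(effective_charset("windows-1251", "UTF-8"))  # windows-1251
print(effective_charset(None, "UTF-8"))            # UTF-8
print(effective_charset(None, None))               # None
```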

Bad — wrong declaration causes mojibake:

<meta charset="UTF-8">
...
Привет  →  ������ (file actually saved as windows-1251)
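
This failure can be reproduced and measured in Python by decoding windows-1251 bytes as UTF-8 and counting the U+FFFD replacement characters. The helper name replacement_ratio is an assumption made for this sketch; the threshold at which a page counts as broken is a judgment call and is not defined here.

```python
def replacement_ratio(raw, declared_encoding):
    """Share of characters that decode to U+FFFD under the declared encoding."""
    text = raw.decode(declared_encoding, errors="replace")
    if not text:
        return 0.0
    return text.count("\ufffd") / len(text)


raw = "Привет".encode("windows-1251")  # 6 bytes, one per letter

print(raw.decode("utf-8", errors="replace"))   # six U+FFFD replacement characters
print(replacement_ratio(raw, "utf-8"))         # 1.0
print(replacement_ratio(raw, "windows-1251"))  # 0.0
```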

Common mistakes

  • No charset declaration — browser guesses, usually wrong on non-ASCII pages
  • Using legacy encodings (windows-1251, ISO-8859-1, windows-1252) — they still work for the specific languages they were designed for but break on mixed-language content
  • Header/meta mismatch — the HTTP header wins, so the HTML declaration is misleading rather than just redundant
  • Charset meta tag placed late in <head> — browsers start decoding bytes as they arrive; a late declaration may miss the 1024-byte window
  • Declaring UTF-8 but saving as something else — the most common source of mojibake. Fix the file, not the declaration
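  • BOM in HTML files — unnecessary for UTF-8 and sometimes harmful: when a server or template engine concatenates files, a BOM from an included file can land in the middle of the page as a stray invisible character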
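
For the BOM case in the list above, detection comes down to checking the first three bytes. A minimal Python sketch follows; the helper name split_bom is hypothetical.

```python
import codecs


def split_bom(data):
    """Return (payload, had_bom) for bytes that may start with a UTF-8 BOM."""
    if data.startswith(codecs.BOM_UTF8):  # b"\xef\xbb\xbf"
        return data[len(codecs.BOM_UTF8):], True
    return data, False


with_bom = codecs.BOM_UTF8 + "<!DOCTYPE html>".encode("utf-8")
payload, had_bom = split_bom(with_bom)

print(had_bom)                             # True
print(payload.decode("utf-8"))             # <!DOCTYPE html>
print(repr(with_bom.decode("utf-8")[0]))   # '\ufeff': the BOM survives a plain UTF-8 decode
print(with_bom.decode("utf-8-sig"))        # <!DOCTYPE html>
```

Python's utf-8-sig codec strips a leading BOM while decoding, which is convenient when reading files of unknown origin.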

Frequently asked questions

Why does my page show garbled characters?
This usually happens because the actual file encoding doesn't match what the meta tag or HTTP header declares. For example, the file is saved in windows-1251 but the meta tag specifies UTF-8. The solution is to standardize everything on UTF-8: the file bytes, the meta tag, and the header.

Why use UTF-8 instead of windows-1251?
UTF-8 is a universal encoding that covers the characters of all world languages, emoji, and special symbols. Windows-1251 covers only Cyrillic and basic Latin. The HTML standard specifies UTF-8, and the vast majority of modern websites use it.

Where should the charset be declared: in the HTTP header or in the meta tag?
Both, and they should match. The HTTP header Content-Type: text/html; charset=UTF-8 is the most reliable, because it tells the browser the encoding before a single HTML byte is parsed. The meta tag <meta charset="UTF-8"> is the HTML-level safety net and must appear within the first 1024 bytes of the document per HTML5 spec. If header and meta disagree, the header wins.

Why does the meta charset have to appear within the first 1024 bytes?
Browsers start decoding bytes as they receive them. If they're halfway through parsing the document using a guessed encoding and only then encounter a <meta charset> declaring a different one, they have to restart decoding, and some browsers just don't. The HTML5 spec sets the 1024-byte cutoff as the reliable window. Practical consequence: put the charset declaration as the very first element inside <head>, before <title> and any scripts or styles.

I declared UTF-8 but the text is still garbled. Why?
This is mojibake: the file bytes don't match the declared encoding. Most common cause: the file was originally saved in windows-1251, windows-1252, or ISO-8859-1, and someone switched the declaration to UTF-8 without actually converting the bytes. Fix: re-save the file as actual UTF-8 (most editors have "Save with encoding" or "UTF-8 without BOM"). Editing the meta tag alone doesn't fix the file; it only misinforms the browser. Command line: iconv -f windows-1251 -t UTF-8 file.html > file.utf8.html (iconv writes to standard output, so redirect it to a new file).
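
Where iconv isn't available, the same conversion can be sketched in Python. The function name convert_to_utf8 and its default source encoding are assumptions for illustration; the sketch skips files that already decode as UTF-8 so that running it twice doesn't double-encode them.

```python
from pathlib import Path


def convert_to_utf8(path, source_encoding="windows-1251"):
    """Rewrite a file in place as UTF-8 (no BOM). Returns True if it was converted."""
    raw = Path(path).read_bytes()
    try:
        raw.decode("utf-8")
        return False  # already valid UTF-8 (or pure ASCII): leave it alone
    except UnicodeDecodeError:
        pass
    text = raw.decode(source_encoding)  # raises if the bytes don't fit this encoding
    Path(path).write_bytes(text.encode("utf-8"))
    return True

# Example call (hypothetical file name):
#   convert_to_utf8("file.html")
```

Keep a backup before converting in place: the source encoding has to be known, because the bytes alone can't reliably tell windows-1251 from windows-1252.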