Character encoding checker
Check if the page encoding is correctly specified and matches between the header and HTML
What character encoding is and why it matters
Character encoding (charset) determines how the bytes of an HTML document are turned into the characters you see on screen. If the encoding is missing or declared incorrectly, text on the page may display as garbled characters, known as mojibake. The standard for the modern web is UTF-8.
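A quick way to see what the charset controls is to encode one string to bytes and then decode those same bytes twice: once with the right charset and once with a wrong one. The Python choice and the sample word are purely illustrative.

```python
# The same four characters produce five bytes in UTF-8: "é" needs two.
text = "café"
data = text.encode("utf-8")

right = data.decode("utf-8")         # decoded with the charset the bytes were written in
wrong = data.decode("windows-1252")  # each UTF-8 byte read as its own legacy character

print(data)   # b'caf\xc3\xa9'
print(right)  # café
print(wrong)  # cafÃ©
```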
What this tool checks
- Content-Type header charset: the server-level declaration
- Meta charset in HTML: the page-level declaration, in both forms: <meta charset="UTF-8"> and the legacy <meta http-equiv="Content-Type">
- Header/meta consistency: when both are set, they should agree
- Charset position: meta charset must appear within the first 1024 bytes per the HTML5 spec
- BOM detection: UTF-8 Byte Order Mark at file start
- Actual encoding match: counts U+FFFD replacement characters to detect systemic mojibake
- UTF-8 recommendation: legacy encodings flagged
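The header, meta and consistency checks above come down to extracting two charset labels and comparing them. The following is a minimal Python sketch, assuming the header value and the HTML have already been fetched. The function and class names are hypothetical. Comparing labels as plain lowercase strings is a simplification: a real checker would map aliases such as utf8 and UTF-8 to the same encoding.

```python
from html.parser import HTMLParser


def charset_from_content_type(value):
    """Return the lowercased charset parameter of a Content-Type value, or None."""
    for param in value.split(";")[1:]:
        name, _, val = param.strip().partition("=")
        if name.strip().lower() == "charset" and val.strip():
            return val.strip().strip("\"'").lower()
    return None


class MetaCharsetFinder(HTMLParser):
    """Collects charset declarations from both <meta> forms, in document order."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = {name: (value or "") for name, value in attrs}  # names arrive lowercased
        if "charset" in attrs:                                   # <meta charset="...">
            self.found.append(attrs["charset"].strip().lower())
        elif attrs.get("http-equiv", "").lower() == "content-type":  # legacy form
            charset = charset_from_content_type(attrs.get("content", ""))
            if charset:
                self.found.append(charset)


def check_consistency(header_value, html_text):
    """Return (header charset, first meta charset, whether they agree)."""
    header_cs = charset_from_content_type(header_value) if header_value else None
    finder = MetaCharsetFinder()
    finder.feed(html_text)
    meta_cs = finder.found[0] if finder.found else None
    # Only a conflict when both are present; labels are compared as plain strings.
    agree = header_cs is None or meta_cs is None or header_cs == meta_cs
    return header_cs, meta_cs, agree


print(check_consistency("text/html; charset=windows-1251",
                        '<head><meta charset="UTF-8"></head>'))
# ('windows-1251', 'utf-8', False)
```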
Why encoding matters for SEO
- Garbled content isn't indexed correctly: Google indexes what it reads, so if it reads mojibake, that's what goes into the index
- Non-ASCII content (Cyrillic, Chinese, accented Latin, emoji) is the sensitive case: pure ASCII pages rarely have encoding issues
- User metrics suffer: visitors who see "???? ????" instead of words are likely to leave right away
- UTF-8 is the encoding the HTML standard requires authors to use, and most modern frameworks, CMSs and editors default to it. Legacy encodings create compatibility problems wherever they meet UTF-8 content
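The ASCII point can be checked directly. ASCII characters encode to the same bytes in UTF-8 and in the common legacy encodings, while Cyrillic does not. The following is a small Python sketch with illustrative sample strings.

```python
ascii_text = "Hello, world"
cyrillic_text = "Привет"

# ASCII characters get the same single bytes in UTF-8 and in the common legacy
# encodings, so a wrong charset guess changes nothing on a pure-ASCII page.
ascii_bytes = {ascii_text.encode(enc) for enc in ("utf-8", "windows-1251", "windows-1252")}
same_for_ascii = len(ascii_bytes) == 1

# Non-ASCII text is encoded differently in each charset, so the guess matters.
cyr_utf8 = cyrillic_text.encode("utf-8")          # two bytes per letter
cyr_1251 = cyrillic_text.encode("windows-1251")   # one byte per letter

print(same_for_ascii)                # True
print(len(cyr_utf8), len(cyr_1251))  # 12 6
```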
Good vs bad examples
Good — charset as the very first element in <head>:
<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<title>...</title>
</head>
Good — server sending charset in Content-Type header:
Content-Type: text/html; charset=UTF-8
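On the server side this is just a response header. It only helps if the body really is encoded the way the header says. The following is a minimal sketch using Python's standard-library WSGI interface. The app and page are hypothetical, and in practice you would more likely set this in your web server or framework configuration. To serve the app for real, pass it to wsgiref.simple_server.make_server.

```python
from wsgiref.util import setup_testing_defaults

PAGE = '<!DOCTYPE html>\n<html><head><meta charset="UTF-8"><title>Привет</title></head></html>'


def app(environ, start_response):
    """Minimal WSGI app that announces UTF-8 and really encodes the body as UTF-8."""
    body = PAGE.encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/html; charset=UTF-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


# Call the app directly with a fake request to inspect what it would send.
environ = {}
setup_testing_defaults(environ)
sent = {}


def start_response(status, headers):
    sent["status"] = status
    sent["headers"] = dict(headers)


body = b"".join(app(environ, start_response))
print(sent["headers"]["Content-Type"])  # text/html; charset=UTF-8
```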
Bad — charset declared late (after many kilobytes of scripts/styles):
<head>
<script src="/huge-analytics.js"></script>
<style>/* ... 2 KB of CSS ... */</style>
<meta charset="UTF-8"> <!-- too late, browser may miss it -->
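One rough way to test this position rule on raw bytes is sketched below in Python. The spec's wording is that the whole declaring element must fit within the first 1024 bytes, so the sketch compares where the matched tag ends. The regex and function name are hypothetical, and a regex is only an approximation of a real HTML tokenizer.

```python
import re

PRESCAN_LIMIT = 1024  # the HTML spec's window for the charset declaration

# Simplification: a regex, not an HTML tokenizer, so comments or script text
# containing "<meta ... charset" could produce false matches.
META_WITH_CHARSET = re.compile(rb"<meta\b[^>]*\bcharset[^>]*>", re.IGNORECASE)


def charset_in_prescan_window(raw):
    """True if the first <meta> mentioning charset ends within the first 1024 bytes."""
    match = META_WITH_CHARSET.search(raw)
    return match is not None and match.end() <= PRESCAN_LIMIT


good = b'<!DOCTYPE html><html><head><meta charset="UTF-8"><title>t</title>'
late = b"<head><style>" + b"/* padding */" * 200 + b'</style><meta charset="UTF-8">'

print(charset_in_prescan_window(good))  # True
print(charset_in_prescan_window(late))  # False
```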
Bad — header and meta disagree:
Content-Type: text/html; charset=windows-1251
...
<meta charset="UTF-8"> <!-- header wins, HTML lies -->
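The header wins because browsers choose the encoding in a fixed order: a byte order mark first, then the HTTP header, then the meta declaration. The following is a simplified Python model of that order, assuming the charset labels have already been extracted. The function name is hypothetical. Real browsers also map labels through the WHATWG Encoding Standard (for example, ISO-8859-1 is decoded as windows-1252), which this sketch skips.

```python
import codecs


def effective_encoding(raw, header_charset=None, meta_charset=None):
    """Simplified browser precedence: BOM, then HTTP header, then <meta>, then a guess."""
    boms = [
        (codecs.BOM_UTF8, "utf-8"),
        (codecs.BOM_UTF16_BE, "utf-16-be"),
        (codecs.BOM_UTF16_LE, "utf-16-le"),
    ]
    for bom, name in boms:
        if raw.startswith(bom):
            return name, "BOM"
    if header_charset:
        return header_charset.lower(), "header"
    if meta_charset:
        return meta_charset.lower(), "meta"
    return None, "browser guess"


raw = '<meta charset="UTF-8"><p>Привет</p>'.encode("utf-8")  # bytes really are UTF-8
encoding, source = effective_encoding(raw, header_charset="windows-1251", meta_charset="UTF-8")

print(encoding, source)      # windows-1251 header
print(raw.decode(encoding))  # <meta charset="UTF-8"><p>РџСЂРёРІРµС‚</p>
```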
Bad — wrong declaration causes mojibake:
<meta charset="UTF-8">
...
Привет → ������ (file actually saved as windows-1251: each of its six bytes is invalid as UTF-8 and becomes the U+FFFD replacement character)
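This kind of mojibake is detectable. Decoding the bytes with the declared charset turns every invalid byte sequence into U+FFFD, so a high share of replacement characters points to a systemic mismatch. The following is a Python sketch of that idea. The function name and the 1% threshold are assumptions for illustration. A page can also legitimately contain a few U+FFFD characters, so treat the result as a signal rather than proof.

```python
MOJIBAKE_RATIO = 0.01  # hypothetical threshold, not this tool's real setting


def replacement_stats(raw, declared="utf-8"):
    """Decode as the declared charset and count U+FFFD replacement characters."""
    text = raw.decode(declared, errors="replace")
    return text.count("\ufffd"), len(text)


raw = "<p>Привет</p>".encode("windows-1251")  # saved as windows-1251...
bad, total = replacement_stats(raw)           # ...but decoded as the declared UTF-8
looks_broken = total > 0 and bad / total > MOJIBAKE_RATIO

print(bad, total, looks_broken)  # 6 13 True
```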
Common mistakes
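- No charset declaration: the browser falls back to a guess, typically a locale-dependent default such as windows-1252, which garbles non-ASCII text whenever the guess is wrong
- Using legacy encodings (windows-1251, ISO-8859-1, windows-1252): each covers only a limited character set, so they work for a specific language but break on mixed-language content or emoji
- Header/meta mismatch: the HTTP header wins, so the HTML declaration is misleading rather than just redundant
- Charset meta tag placed late in <head>: browsers start decoding bytes as they arrive, and a late declaration may miss the 1024-byte window
- Declaring UTF-8 but saving as something else: one of the most common sources of mojibake. Fix the file, not the declaration
- BOM in HTML files: unnecessary for UTF-8, and it can cause trouble. A BOM overrides both the header and the meta declaration. When templates or includes are concatenated, a BOM in the middle of the output can show up as stray characters (ï»¿ when read as windows-1252) or unexpected whitespace. In PHP, a BOM before <?php counts as output and can trigger "headers already sent" errors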
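Detecting and removing a UTF-8 BOM only requires checking the first three bytes (EF BB BF). The following is a Python sketch in which strip_utf8_bom is a hypothetical helper name. It also shows what those bytes look like when misread, and Python's built-in utf-8-sig codec, which drops a leading BOM while decoding.

```python
import codecs


def strip_utf8_bom(raw):
    """Return (bytes without a leading UTF-8 BOM, whether one was removed)."""
    if raw.startswith(codecs.BOM_UTF8):
        return raw[len(codecs.BOM_UTF8):], True
    return raw, False


page = codecs.BOM_UTF8 + b"<!DOCTYPE html>"

cleaned, had_bom = strip_utf8_bom(page)
print(had_bom, cleaned)                        # True b'<!DOCTYPE html>'

# The same three BOM bytes, misread as windows-1252, show up as visible junk.
print(codecs.BOM_UTF8.decode("windows-1252"))  # ï»¿

# Python's "utf-8-sig" codec drops a leading BOM while decoding.
print(page.decode("utf-8-sig"))                # <!DOCTYPE html>
```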
Frequently asked questions
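Of the two ways to declare the encoding, the HTTP header Content-Type: text/html; charset=UTF-8 is the more reliable: it tells the browser the encoding before a single HTML byte is parsed. The meta tag <meta charset="UTF-8"> is the HTML-level safety net and must appear within the first 1024 bytes of the document per the HTML5 spec. If header and meta disagree, the header wins (a BOM at the start of the file, if present, overrides both).

Browsers start decoding bytes as they arrive. If a browser has already begun with one encoding and then reaches a <meta charset> declaring a different one, it has to restart decoding, and some browsers just don't. The HTML5 spec sets the 1024-byte cutoff as the reliable window. Practical consequence: put the charset declaration as the very first element inside <head>, before <title> and any scripts or styles.

If a page declares UTF-8 and still shows mojibake, the file was most likely saved in a legacy encoding such as windows-1251, windows-1252 or ISO-8859-1, and someone switched the declaration to UTF-8 without actually converting the bytes. Fix: re-save the file as actual UTF-8 (most editors offer a "Save with encoding" option or a "UTF-8 without BOM" choice). Just editing the meta tag doesn't fix the file; you're lying to the browser. Command line: iconv -f windows-1251 -t UTF-8 file.html > file.utf8.html. iconv writes to standard output, and redirecting straight back into file.html would truncate the file before it is read.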
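The same re-save can be scripted when iconv isn't available. The following Python sketch does the same job as the iconv command. The function name is hypothetical, and you still have to know (or correctly guess) the source encoding, because decoding with the wrong legacy codec rarely fails loudly.

```python
import tempfile
from pathlib import Path


def convert_to_utf8(src, dst, source_encoding="windows-1251"):
    """Re-encode src (assumed to be in source_encoding) as UTF-8 into dst.

    Returns False without writing if src already decodes as UTF-8 (including
    pure ASCII), because converting it again would double-encode the text.
    """
    data = Path(src).read_bytes()
    try:
        data.decode("utf-8")
        return False
    except UnicodeDecodeError:
        pass
    # Raises only on bytes undefined in source_encoding; it cannot prove the guess right.
    text = data.decode(source_encoding)
    Path(dst).write_bytes(text.encode("utf-8"))  # bytes, so line endings stay untouched
    return True


# Demo on a throwaway file saved as windows-1251.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "page.html"
    dst = Path(tmp) / "page.utf8.html"
    src.write_bytes("<p>Привет</p>".encode("windows-1251"))
    converted = convert_to_utf8(src, dst)
    result = dst.read_bytes()

print(converted, result.decode("utf-8"))  # True <p>Привет</p>
```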