Character encoding checker

Check if the page encoding is correctly specified and matches between the header and HTML

What is character encoding and why it matters

Character encoding (charset) determines how the bytes of an HTML document are turned into the characters you see on screen. If the encoding is missing or declared incorrectly, text on the page may display as garbled characters, known as mojibake. The standard for the modern web is UTF-8.
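To make "bytes to characters" concrete, here is a small Python illustration (not part of the checker itself). The same word is stored as different bytes in UTF-8 and in windows-1252, and reading UTF-8 bytes as windows-1252 produces the classic "Ã©" style of mojibake.

```python
# Same text, two encodings: the bytes differ, so the declared charset matters.
text = "café"

utf8_bytes = text.encode("utf-8")            # é becomes two bytes: C3 A9
legacy_bytes = text.encode("windows-1252")   # é becomes one byte: E9

print(utf8_bytes)                            # b'caf\xc3\xa9'
print(legacy_bytes)                          # b'caf\xe9'

# Decoding with the right charset restores the text...
print(utf8_bytes.decode("utf-8"))            # café
# ...decoding with the wrong one maps each byte to an unrelated character.
print(utf8_bytes.decode("windows-1252"))     # cafÃ©
```

Nothing in the bytes says which encoding produced them, which is why the page has to declare it.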

What this tool checks

Content-Type header charset — the server-level declaration
Meta charset in HTML — the page-level declaration (both forms: <meta charset="UTF-8"> and the legacy <meta http-equiv="Content-Type">)
Header/meta consistency — when both are set, they should agree
Charset position — meta charset must appear within the first 1024 bytes per HTML5 spec
BOM detection — UTF-8 Byte Order Mark at file start
Actual encoding match — counts U+FFFD replacement characters to detect systemic mojibake
UTF-8 recommendation — legacy encodings flagged
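The header, consistency, and UTF-8 checks in the list above boil down to parsing and comparing charset labels. A minimal Python sketch of that part, assuming the raw Content-Type header value is already available; `header_charset`, `canonical`, `classify`, and `same_encoding` are hypothetical helper names, not this tool's code. One caveat: Python's codec registry is not the WHATWG label table browsers use (browsers, for example, treat "iso-8859-1" as windows-1252), so treat this as an approximation.

```python
import codecs
from email.message import Message
from typing import Optional


def header_charset(content_type: str) -> Optional[str]:
    """Return the charset parameter of a Content-Type value (lowercased), or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def canonical(charset: str) -> Optional[str]:
    """Resolve an alias to Python's canonical codec name, or None if unknown."""
    try:
        return codecs.lookup(charset).name  # "UTF8" -> "utf-8", "windows-1251" -> "cp1251"
    except LookupError:
        return None


def classify(charset: Optional[str]) -> str:
    """Hypothetical verdict labels for one declared charset."""
    if charset is None:
        return "missing"
    name = canonical(charset)
    if name is None:
        return "unknown"
    return "ok" if name == "utf-8" else "legacy"


def same_encoding(a: str, b: str) -> bool:
    """True when two labels name the same codec (header/meta consistency)."""
    name_a = canonical(a)
    return name_a is not None and name_a == canonical(b)


print(header_charset("text/html; charset=UTF-8"))                    # utf-8
print(classify(header_charset("text/html; charset=windows-1251")))  # legacy
print(same_encoding("utf8", "UTF-8"))                                # True
```

Comparing canonical names rather than raw strings matters: "utf8", "UTF-8", and "utf-8" are all the same encoding and should not be reported as a mismatch.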

Why encoding matters for SEO

Garbled content isn't indexed correctly — Google indexes what it reads; if it reads mojibake, that's what goes into the index
Non-ASCII content (Cyrillic, Chinese, accented Latin, emoji) is the sensitive case — pure ASCII pages rarely have encoding issues
User metrics suffer — users see "???? ????" instead of words and bounce immediately
UTF-8 is the encoding the HTML standard requires — most modern frameworks, CMSs, and editors default to it, and legacy encodings create compatibility problems

Good vs bad examples

Good — charset as the very first element in <head>:

<!DOCTYPE html>
<html>
<head>
  <meta charset="UTF-8">
  <title>...</title>
</head>
<body>...</body>
</html>

Good — server sending charset in Content-Type header:

Content-Type: text/html; charset=UTF-8
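How you send that header depends on your stack. As one illustration, here is a minimal stdlib-only Python WSGI app (an assumed setup, not a recommendation of any framework) that sends the header and encodes the body with the same charset it announces. In nginx the equivalent is the `charset utf-8;` directive; in Apache it is `AddDefaultCharset UTF-8`.

```python
# Minimal WSGI app (stdlib only): the charset in the header must match the
# encoding actually used for the body bytes.
PAGE = """<!DOCTYPE html>
<html>
<head>
  <meta charset="UTF-8">
  <title>Привет</title>
</head>
<body><p>Привет, мир</p></body>
</html>
"""


def app(environ, start_response):
    body = PAGE.encode("utf-8")  # same encoding the header announces
    start_response("200 OK", [
        ("Content-Type", "text/html; charset=UTF-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


# To try it locally (blocks until interrupted):
# from wsgiref.simple_server import make_server
# make_server("127.0.0.1", 8000, app).serve_forever()
```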

Bad — charset declared late (after more than 1 KB of scripts/styles):

<head>
  <script src="/huge-analytics.js"></script>
  <style>/* ... 2 KB of CSS ... */</style>
  <meta charset="UTF-8">  <!-- too late, browser may miss it -->
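A rough Python sketch of the position check, assuming you already have the raw response bytes. The regex is a simplification that covers both `<meta charset>` and the legacy `http-equiv` form; it is not the HTML spec's prescan algorithm (it does not skip comments or parse attributes properly), and `meta_charset_position` is a hypothetical name.

```python
import re
from typing import Optional, Tuple

# Simplified pattern covering both <meta charset="..."> and the legacy
# <meta http-equiv="Content-Type" content="text/html; charset=...">.
# It is NOT the HTML spec's prescan algorithm (no comment or attribute parsing).
META_CHARSET = re.compile(
    rb"""<meta\b[^>]*?charset\s*=\s*["']?\s*([A-Za-z0-9._:-]+)""",
    re.IGNORECASE,
)


def meta_charset_position(html: bytes, window: int = 1024) -> Optional[Tuple[str, int, bool]]:
    """Return (charset, end offset of the <meta> tag, fits in window) or None."""
    match = META_CHARSET.search(html)
    if match is None:
        return None
    close = html.find(b">", match.end())
    end = close + 1 if close != -1 else len(html)
    # The spec asks for the whole element to be serialized within the first 1024 bytes.
    return match.group(1).decode("ascii").lower(), end, end <= window
```

Offsets are counted in bytes, not characters: a `<title>` full of Cyrillic uses two bytes per letter in UTF-8, which is one more reason to put the charset first.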

Bad — header and meta disagree:

Content-Type: text/html; charset=windows-1251
...
<meta charset="UTF-8">  <!-- header wins, HTML lies -->
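Why the header wins can be sketched as a simplified version of the order browsers follow in the HTML spec's encoding sniffing: a byte order mark first, then the HTTP header, then `<meta>`, then a guess. The function name and the windows-1252 fallback are assumptions for illustration; real browsers choose the fallback from locale and heuristics, and the spec has further special cases omitted here.

```python
import codecs
from typing import Optional, Tuple

# Byte Order Marks the HTML spec checks before anything else.
BOMS = (
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16be"),
    (codecs.BOM_UTF16_LE, "utf-16le"),
)


def effective_encoding(
    body: bytes,
    header_charset: Optional[str],
    meta_charset: Optional[str],
    fallback: str = "windows-1252",  # assumption: real browsers pick this from locale/heuristics
) -> Tuple[str, str]:
    """Simplified precedence: BOM, then HTTP header, then <meta>, then a guess."""
    for bom, name in BOMS:
        if body.startswith(bom):
            return name, "BOM"
    if header_charset:
        return header_charset.lower(), "HTTP header"
    if meta_charset:
        return meta_charset.lower(), "meta"
    return fallback, "fallback guess"


# The "header and meta disagree" case: the meta tag never gets a say.
print(effective_encoding(b"<!DOCTYPE html>", "windows-1251", "UTF-8"))  # ('windows-1251', 'HTTP header')
```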

Bad — wrong declaration causes mojibake:

<meta charset="UTF-8">
...
Привет  →  ������ (file actually saved as windows-1251)
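You can reproduce this case in Python: encode the word as windows-1251, then decode it as UTF-8 the way a browser obeying the meta tag would. The `is_valid_utf8` helper is a hypothetical name; the replacement-character count is the same signal the "actual encoding match" check looks for.

```python
raw = "Привет".encode("windows-1251")  # what the editor actually wrote to disk


def is_valid_utf8(data: bytes) -> bool:
    """Strict check: any byte sequence that is not well-formed UTF-8 fails."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# What a browser honouring <meta charset="UTF-8"> renders: U+FFFD for each bad sequence.
as_declared = raw.decode("utf-8", errors="replace")
replacements = as_declared.count("\ufffd")

print(as_declared, replacements)   # ������ 6
print(is_valid_utf8(raw))          # False
print(raw.decode("windows-1251"))  # Привет: the bytes were fine, the declaration was wrong
```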

Common mistakes

No charset declaration — the browser falls back to a guess, often a locale-based default, which is frequently wrong on non-ASCII pages
Using legacy encodings (windows-1251, ISO-8859-1, windows-1252) — they still work for specific languages but break on mixed content
Header/meta mismatch — the HTTP header wins, so the HTML declaration is misleading rather than just redundant
Charset meta tag placed late in <head> — browsers start decoding bytes as they arrive; a late declaration may miss the 1024-byte window
Declaring UTF-8 but saving as something else — the most common source of mojibake. Fix the file, not the declaration
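BOM in HTML files — unnecessary for UTF-8, and harmful when templates concatenate files (a BOM in an included fragment ends up mid-page as an invisible U+FEFF character) or when PHP sends a file's BOM as output before headers, causing "headers already sent" errors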
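A small Python sketch of the BOM problem with includes, assuming a template engine that simply concatenates fragment bytes (a hypothetical setup). A fragment saved as "UTF-8 with BOM" drops an invisible U+FEFF into the middle of the page; stripping the leading BOM avoids it.

```python
import codecs

BOM = codecs.BOM_UTF8  # b'\xef\xbb\xbf'


def has_utf8_bom(data: bytes) -> bool:
    return data.startswith(BOM)


def strip_utf8_bom(data: bytes) -> bytes:
    """Drop a leading UTF-8 BOM; leave everything else untouched."""
    return data[len(BOM):] if has_utf8_bom(data) else data


# Hypothetical include: a header fragment saved as "UTF-8 with BOM" is
# concatenated into the middle of a page by a template engine.
fragment = BOM + "<p>Привет</p>".encode("utf-8")
page = b"<body>" + fragment + b"</body>"
clean_page = b"<body>" + strip_utf8_bom(fragment) + b"</body>"

print("\ufeff" in page.decode("utf-8"))        # True: invisible U+FEFF inside <body>
print("\ufeff" in clean_page.decode("utf-8"))  # False
```

To fix a file on disk, `path.write_bytes(strip_utf8_bom(path.read_bytes()))` with a `pathlib.Path` is enough. When reading files in Python, the `utf-8-sig` codec also skips a leading BOM.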

Frequently asked questions

Why does the page show garbled characters?

This usually happens due to a mismatch between the actual file encoding and what's specified in the meta tag or HTTP header. For example, the file is saved in windows-1251 but UTF-8 is specified in the meta tag. The solution is to standardize everything to UTF-8.

Why is UTF-8 better than windows-1251?

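UTF-8 is a universal encoding that can represent every Unicode character: all world languages, emojis, and special symbols. Windows-1251 is a single-byte encoding limited to 256 characters (ASCII, Cyrillic, and a handful of symbols), so it cannot represent accented Latin letters, Chinese, or emoji. The HTML standard requires UTF-8 for conforming documents, and the overwhelming majority of websites already use it.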
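A quick way to see the difference is to try encoding a few samples in Python (the `fits` helper is a hypothetical name). Windows-1251 handles Russian but fails on accented Latin, Chinese, and emoji, while UTF-8 handles all of them.

```python
def fits(text: str, encoding: str) -> bool:
    """True if every character of text can be encoded in the given encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


samples = ["Привет", "café", "日本語", "🙂"]
coverage = {s: (fits(s, "windows-1251"), fits(s, "utf-8")) for s in samples}

for s, (legacy_ok, utf8_ok) in coverage.items():
    print(f"{s}: windows-1251={legacy_ok}, utf-8={utf8_ok}")
```

A page that mixes Russian text with a French name or an emoji simply cannot be saved correctly in windows-1251.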

Where should encoding be specified — in HTML or the server header?

Both, and they should match. The HTTP header Content-Type: text/html; charset=UTF-8 is the most reliable — it tells the browser the encoding before a single HTML byte is parsed. The meta tag <meta charset="UTF-8"> is the HTML-level safety net and must appear within the first 1024 bytes of the document per HTML5 spec. If header and meta disagree, the header wins.

Why must <meta charset> be in the first 1024 bytes?

Browsers start decoding bytes as they receive them, and before parsing they typically prescan only the first 1024 bytes for an encoding declaration. If a <meta charset> declaring a different encoding turns up after that, the browser is already partway through decoding with a guessed encoding: it may reparse the page from the start, or it may ignore the late declaration. The HTML5 spec sets the 1024-byte cutoff as the reliable window. Practical consequence: put the charset declaration as the very first element inside <head>, before <title> and any scripts or styles.

Why does my page show ? or � instead of accented letters?

Mojibake: the file bytes don't match the declared encoding. Most common cause: the file was originally saved in windows-1251, windows-1252, or ISO-8859-1, and someone switched the declaration to UTF-8 without actually converting the bytes. Fix: re-save the file as actual UTF-8 (most editors have "Save with encoding" or "UTF-8 without BOM"). Just editing the meta tag doesn't fix the file; you're lying to the browser. Command line: iconv -f windows-1251 -t UTF-8 file.html > file.utf8.html (iconv writes to standard output, so redirect to a new file; redirecting onto the input file truncates it before it is read).
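If you'd rather script the conversion than use iconv, here is a minimal Python sketch. It assumes you know the real source encoding (windows-1251 as the default is an assumption), and it uses a heuristic guard: a file that already decodes as valid UTF-8 is left alone, so running it twice cannot double-encode the text. `convert_to_utf8` is a hypothetical name.

```python
import tempfile
from pathlib import Path


def convert_to_utf8(path: Path, source_encoding: str = "windows-1251") -> bool:
    """Re-encode a file from a known legacy encoding to UTF-8 without BOM.

    Returns False (and changes nothing) if the file already decodes as UTF-8,
    so running it twice cannot double-encode the text.
    """
    data = path.read_bytes()
    try:
        data.decode("utf-8")
        return False  # already valid UTF-8 (pure ASCII counts too)
    except UnicodeDecodeError:
        pass
    text = data.decode(source_encoding)  # you must know the real source encoding
    path.write_bytes(text.encode("utf-8"))
    return True


# Demo on a throwaway file saved the legacy way.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "page.html"
    demo.write_bytes("<p>Привет</p>".encode("windows-1251"))
    first_run = convert_to_utf8(demo)
    second_run = convert_to_utf8(demo)  # no-op: the file is UTF-8 now
    converted = demo.read_bytes().decode("utf-8")

print(first_run, second_run, converted)  # True False <p>Привет</p>
```

After converting, update any `<meta http-equiv="Content-Type">` or `<meta charset>` in the file and the server's Content-Type header so they all say UTF-8.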
