Sitemap.xml checker
Check if your sitemap is accessible, contains valid URLs, and has no errors
Check results
This check only covers sitemap.xml. For a full picture of your page, run a page audit.
For issues across your whole site — duplicate titles, orphan pages, broken internal links — run a site audit.
Want us to fix what we found? Our team can help.
What is sitemap.xml and why it matters
A sitemap is a file that lists the URLs on your site so search engines can discover them without crawling the entire site link by link. It's particularly important for large sites, new sites with few inbound links, sites with deep pages not linked from the homepage, and content that updates frequently. Google and Bing both use the sitemap to decide what to crawl and how often. The spec comes from sitemaps.org; supported formats are XML (<urlset> and <sitemapindex>), RSS 2.0, and Atom.
What this tool checks
- Sitemap discovery — all Sitemap: directives in robots.txt, plus the default /sitemap.xml fallback
- Multiple Sitemap directives — robots.txt can list several; all are parsed
- Sitemap reference in robots.txt — missed opportunity if present at /sitemap.xml but not referenced
- Format validation — recognized as urlset, sitemapindex, RSS, or Atom
- Sitemap Index handling — nested sitemaps checked for accessibility (first 10)
- URL count — sitemaps.org spec caps a single file at 50,000 URLs
- lastmod presence — helps search engines prioritize recrawl
- Future dates in lastmod — usually clock-skew or timezone bug
- All-same-date lastmod — likely CMS stamping generation time rather than edit time
- Foreign-domain URLs — Google rejects sitemaps referencing URLs outside the site
- HTTP URLs on HTTPS site — mixed protocol causes duplicate-content issues
- Current page membership — is the URL you're auditing included in the sitemap
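The discovery step above can be sketched in a few lines. This is an illustrative sketch, not the checker's actual implementation; the function name and fallback logic are our assumptions based on the behavior described.

```python
# Hypothetical sketch of sitemap discovery: collect every "Sitemap:"
# directive from robots.txt, falling back to /sitemap.xml if none exist.
from urllib.parse import urljoin

def discover_sitemaps(robots_txt: str, site_url: str) -> list[str]:
    sitemaps = []
    for line in robots_txt.splitlines():
        key, _, value = line.partition(":")
        # The Sitemap directive is case-insensitive per common practice
        if key.strip().lower() == "sitemap" and value.strip():
            sitemaps.append(value.strip())
    # No directives found: try the conventional default location
    return sitemaps or [urljoin(site_url, "/sitemap.xml")]

robots = """User-agent: *
Sitemap: https://example.com/posts-sitemap.xml
Sitemap: https://example.com/products-sitemap.xml"""
print(discover_sitemaps(robots, "https://example.com/"))
```

Note that `partition(":")` splits on the first colon only, so the `https://` in the sitemap URL survives intact.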
Good vs bad examples
Good — minimal valid XML sitemap:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>https://example.com/</loc>
<lastmod>2026-04-10</lastmod>
</url>
</urlset>
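If you generate your sitemap programmatically, the standard library is enough. A minimal sketch using Python's `xml.etree.ElementTree`, following the sitemaps.org 0.9 schema (the values are the example's, not real data):

```python
# Build the minimal sitemap above with the standard library; namespacing
# the elements yields the required xmlns attribute on <urlset>.
import xml.etree.ElementTree as ET

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
ET.register_namespace("", NS)  # emit a default xmlns, not a prefix

urlset = ET.Element(f"{{{NS}}}urlset")
url = ET.SubElement(urlset, f"{{{NS}}}url")
ET.SubElement(url, f"{{{NS}}}loc").text = "https://example.com/"
ET.SubElement(url, f"{{{NS}}}lastmod").text = "2026-04-10"

xml = ET.tostring(urlset, encoding="unicode", xml_declaration=True)
print(xml)
```

Using the library (rather than string concatenation) also escapes special characters in URLs for free.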
Good — Sitemap Index for large sites (split by content type, each file under the 50K URL cap):
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<sitemap><loc>https://example.com/posts-sitemap.xml</loc></sitemap>
<sitemap><loc>https://example.com/products-sitemap.xml</loc></sitemap>
</sitemapindex>
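The splitting logic behind a Sitemap Index is simple chunking. A sketch under the spec's 50,000-URL cap; the `sitemap-N.xml` naming is hypothetical:

```python
# Split a large URL list into files under the 50,000-URL cap and
# produce the list of child-sitemap URLs for the index file.
def split_into_sitemaps(urls, base="https://example.com", cap=50_000):
    chunks = [urls[i:i + cap] for i in range(0, len(urls), cap)]
    index_entries = [f"{base}/sitemap-{n}.xml"
                     for n in range(1, len(chunks) + 1)]
    return chunks, index_entries

urls = [f"https://example.com/p/{i}" for i in range(120_000)]
chunks, index = split_into_sitemaps(urls)
print(len(chunks), index)  # 120,000 URLs fit in 3 files
```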
Good — multiple Sitemap directives in robots.txt (supported by all major search engines):
User-agent: *
Disallow:
Sitemap: https://example.com/posts-sitemap.xml
Sitemap: https://example.com/products-sitemap.xml
Sitemap: https://example.com/categories-sitemap.xml
Bad — sitemap with URLs on a different domain (Google rejects):
<url><loc>https://other-domain.com/page</loc></url>
Bad — mixed HTTP and HTTPS URLs:
<url><loc>https://example.com/page1</loc></url>
<url><loc>http://example.com/page2</loc></url> <!-- should be https -->
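Both bad cases above reduce to comparing each `<loc>` against the site's own scheme and host. A sketch of such a check (function name is ours, not the checker's):

```python
# Flag <loc> URLs on a foreign domain, or plain-http URLs on an
# https site, by comparing parsed components against the site URL.
from urllib.parse import urlparse

def audit_locs(locs: list[str], site_url: str) -> list[str]:
    site = urlparse(site_url)
    issues = []
    for loc in locs:
        u = urlparse(loc)
        if u.netloc != site.netloc:
            issues.append(f"foreign domain: {loc}")
        elif site.scheme == "https" and u.scheme == "http":
            issues.append(f"http on https site: {loc}")
    return issues

issues = audit_locs(
    ["https://example.com/page1",
     "http://example.com/page2",
     "https://other-domain.com/page"],
    "https://example.com/",
)
print(issues)
```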
Common mistakes
- No Sitemap directive in robots.txt — even if /sitemap.xml works, non-Google crawlers (Bing, Yandex, DuckDuckGo) rely on robots.txt for discovery
- Cross-domain URLs in a single sitemap — Google rejects. For multi-domain content, use per-domain sitemaps
- Mixing HTTP and HTTPS URLs — search engines treat them as different URLs, causing duplicate content
- Stale or all-same lastmod values — if every URL has the same <lastmod> value (CMS stamps generation time), search engines downweight the signal
- Missing lastmod entirely — crawlers can't tell which pages changed recently
- Single sitemap over 50,000 URLs or over 50 MB uncompressed — split it via a Sitemap Index, or gzip it (also spec-supported, though our checker currently fetches the uncompressed file)
- Invalid XML — most often an unescaped & in a URL; write it as &amp;
- Including noindex or non-canonical URLs — sitemaps should list the canonical version you want indexed, not every variant
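The invalid-XML case is easy to avoid in code. The standard library's `xml.sax.saxutils.escape` handles `&`, `<`, and `>` (the example URL is illustrative):

```python
# Escape an ampersand in a query-string URL before embedding it in XML.
from xml.sax.saxutils import escape

raw = "https://example.com/search?q=a&b=c"
print(f"<loc>{escape(raw)}</loc>")
# <loc>https://example.com/search?q=a&amp;b=c</loc>
```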
Frequently asked questions
Can a site have more than one sitemap?
Yes — we parse every Sitemap: line in robots.txt. Plugins like Yoast for WordPress commonly generate several segmented sitemaps (post-sitemap.xml, page-sitemap.xml, category-sitemap.xml) and list each in robots.txt. All are fetched and processed.