Sitemap.xml checker

Check if your sitemap is accessible, contains valid URLs, and has no errors

Free
No sign-up
Instant results


This check only covers sitemap.xml. For a full picture of your page, run a page audit.

For issues across your whole site — duplicate titles, orphan pages, broken internal links — run a site audit.

Want us to fix what we found? Our team can help.

What is sitemap.xml and why it matters

A sitemap is a file that lists the URLs on your site so search engines can discover them without following your entire link graph page by page. It's particularly important for large sites, new sites without many inbound links, sites with deep pages not linked from the homepage, and content that updates frequently. Google and Bing both use the sitemap to decide what to crawl and how often. The spec comes from sitemaps.org; supported formats are XML (<urlset> and <sitemapindex>), RSS 2.0, and Atom.
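XML is by far the most common format, but a feed can serve the same purpose. As a sketch, a minimal Atom 1.0 feed used as a sitemap could look like this (crawlers read each entry's <link href> as the URL and <updated> as the freshness hint):

<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example Site</title>
  <id>https://example.com/</id>
  <updated>2026-04-10T00:00:00Z</updated>
  <entry>
    <title>Example page</title>
    <link href="https://example.com/page"/>
    <id>https://example.com/page</id>
    <updated>2026-04-10T00:00:00Z</updated>
  </entry>
</feed>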

What this tool checks

  • Sitemap discovery — all Sitemap: directives in robots.txt, plus the default /sitemap.xml fallback (see the discovery sketch after this list)
  • Multiple Sitemap directives — robots.txt can list several; all are parsed
  • Sitemap reference in robots.txt — missed opportunity if present at /sitemap.xml but not referenced
  • Format validation — recognized as urlset, sitemapindex, RSS, or Atom
  • Sitemap Index handling — nested sitemaps checked for accessibility (first 10)
  • URL count — sitemaps.org spec caps a single file at 50,000 URLs
  • lastmod presence — helps search engines prioritize recrawl
  • Future dates in lastmod — usually clock-skew or timezone bug
  • All-same-date lastmod — likely CMS stamping generation time rather than edit time
  • Foreign-domain URLs — Google rejects sitemaps referencing URLs outside the site
  • HTTP URLs on HTTPS site — mixed protocol causes duplicate-content issues
  • Current page membership — is the URL you're auditing included in the sitemap
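To make the discovery step concrete, here is a minimal sketch in Python of how a checker might collect sitemap URLs (the function name and fallback behavior are illustrative, not a description of this tool's exact implementation):

import urllib.request
from urllib.parse import urljoin

def discover_sitemaps(site: str) -> list[str]:
    # Gather every "Sitemap:" directive from robots.txt; the field name
    # is case-insensitive and the value is an absolute URL.
    found = []
    try:
        with urllib.request.urlopen(urljoin(site, "/robots.txt"), timeout=10) as resp:
            for line in resp.read().decode("utf-8", errors="replace").splitlines():
                name, _, value = line.partition(":")
                if name.strip().lower() == "sitemap" and value.strip():
                    found.append(value.strip())
    except OSError:
        pass  # a missing robots.txt is not an error; fall back below
    # Default location tried when robots.txt references nothing
    return found or [urljoin(site, "/sitemap.xml")]

Called against a site whose robots.txt lists several sitemaps, this returns all of them; otherwise it falls back to the single default path.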

Good vs bad examples

Good — minimal valid XML sitemap:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2026-04-10</lastmod>
  </url>
</urlset>

Good — Sitemap Index for large sites (split by content type, each file under the 50K URL cap):

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>https://example.com/posts-sitemap.xml</loc></sitemap>
  <sitemap><loc>https://example.com/products-sitemap.xml</loc></sitemap>
</sitemapindex>
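Telling these formats apart comes down to the root element. A classification sketch, again assuming Python's standard library:

import xml.etree.ElementTree as ET

def sitemap_format(xml_text: str) -> str:
    # ElementTree writes namespaced tags as "{uri}tag"; stripping the
    # namespace leaves the bare element name to match on.
    root = ET.fromstring(xml_text)
    tag = root.tag.rsplit("}", 1)[-1].lower()
    return {
        "urlset": "urlset",              # plain XML sitemap
        "sitemapindex": "sitemapindex",  # index of other sitemaps
        "rss": "RSS",                    # RSS 2.0 feed
        "feed": "Atom",                  # Atom 1.0 feed
    }.get(tag, "unknown")

When the root is sitemapindex, each nested <loc> is fetched and validated the same way (this checker samples the first 10 for accessibility).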

Good — multiple Sitemap directives in robots.txt (supported by all major search engines):

User-agent: *
Disallow:
Sitemap: https://example.com/posts-sitemap.xml
Sitemap: https://example.com/products-sitemap.xml
Sitemap: https://example.com/categories-sitemap.xml

Bad — sitemap with URLs on a different domain (Google rejects):

<url><loc>https://other-domain.com/page</loc></url>

Bad — mixed HTTP and HTTPS URLs:

<url><loc>https://example.com/page1</loc></url>
<url><loc>http://example.com/page2</loc></url>  <!-- should be https -->
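Both of these problems reduce to comparing each <loc> against the audited site's host and scheme. A hedged sketch of that comparison (it treats www and bare hostnames as different, which a production checker would normalize first):

from urllib.parse import urlparse

def loc_issues(site: str, locs: list[str]) -> list[str]:
    # Flag sitemap URLs that don't match the audited site's host or scheme
    base = urlparse(site)
    issues = []
    for loc in locs:
        u = urlparse(loc)
        if u.hostname != base.hostname:
            issues.append(f"foreign domain: {loc}")          # Google rejects these
        elif base.scheme == "https" and u.scheme == "http":
            issues.append(f"http URL on https site: {loc}")  # duplicate-content risk
    return issues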

Common mistakes

  • No Sitemap directive in robots.txt — even if /sitemap.xml works, non-Google crawlers (Bing, Yandex, DuckDuckGo) rely on robots.txt for discovery
  • Cross-domain URLs in a single sitemap — Google rejects. For multi-domain content, use per-domain sitemaps
  • Mixing HTTP and HTTPS URLs — search engines treat them as different URLs, causing duplicate content
  • Stale or all-same lastmod values — if every URL has the same <lastmod> value (CMS stamps generation time), search engines downweight the signal; see the lastmod sketch after this list
  • Missing lastmod entirely — crawlers can't tell which pages changed recently
  • Single sitemap over 50,000 URLs or over 50 MB uncompressed — split it via a Sitemap Index. Gzip compression is also spec-supported and saves bandwidth, but both limits apply to the uncompressed file (and this checker currently fetches uncompressed files)
  • Invalid XML — most often an unescaped & in a URL; write &amp; instead
  • Including noindex or non-canonical URLs — sitemaps should list the canonical version you want indexed, not every variant
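The lastmod heuristics above are easy to express in code. A sketch, assuming Python (the W3C Datetime format used in sitemaps allows both plain dates and full timestamps; the Z-replacement keeps pre-3.11 Pythons working):

from datetime import datetime, timezone

def lastmod_warnings(lastmods: list[str]) -> list[str]:
    warnings = []
    now = datetime.now(timezone.utc)
    parsed = []
    for raw in lastmods:
        try:
            dt = datetime.fromisoformat(raw.replace("Z", "+00:00"))
        except ValueError:
            warnings.append(f"unparseable lastmod: {raw}")
            continue
        if dt.tzinfo is None:
            dt = dt.replace(tzinfo=timezone.utc)  # assume UTC for date-only values
        parsed.append(dt)
        if dt > now:
            warnings.append(f"lastmod in the future: {raw}")  # clock skew or timezone bug
    if len(parsed) > 1 and len(set(parsed)) == 1:
        # Every URL carries the same timestamp: almost certainly the file's
        # generation time rather than per-page edit times
        warnings.append("all lastmod values are identical")
    return warnings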

Frequently asked questions

Is a sitemap required?
Not formally — search engines can find pages through links. But for large sites and new pages, a sitemap significantly speeds up indexing. Google recommends having one.

What is a Sitemap Index?
A Sitemap Index is a map of maps. Instead of page URLs, it contains links to other sitemap files. It's used on large sites where the URL count exceeds the single-file limit of 50,000 URLs.

How often should the sitemap be updated?
Whenever pages are added, removed, or significantly updated. Most CMS platforms (WordPress, Shopify, Wix, Ghost) regenerate the sitemap automatically on every content change. For static sites, most generators (Next.js, Hugo, Astro, Gatsby) create it at build time. If you maintain the file manually, update it with every structural change — a stale sitemap can be worse than none, because search engines learn to distrust its lastmod signals.

Can robots.txt list more than one sitemap?
Yes — the sitemaps.org protocol and all major search engines (Google, Bing, Yandex, DuckDuckGo) support multiple Sitemap: lines in robots.txt. Plugins like Yoast for WordPress commonly generate several segmented sitemaps (post-sitemap.xml, page-sitemap.xml, category-sitemap.xml) and list each in robots.txt. This tool fetches and processes all of them.

Does a sitemap guarantee indexing?
No. A sitemap tells search engines "here are URLs you might want to crawl." Whether they actually crawl and index each URL still depends on content quality, site authority, crawl budget, and whether the URL is allowed by robots.txt / meta-robots. The sitemap helps with discovery, not ranking. Pages that are noindex, blocked in robots.txt, or point to a different canonical won't get indexed regardless of sitemap listing.