Robots.txt checker

Check if robots.txt is accessible on your site, whether it blocks important pages, and if it links to a sitemap

Free
No sign-up
Instant results

Check results

This check only covers robots.txt. For a full picture of your page, run a page audit.

For issues across your whole site — duplicate titles, orphan pages, broken internal links — run a site audit.

Want us to fix what we found? Our team can help.

What is robots.txt and why it matters

The robots.txt file is a plain-text file at the root of your domain (/robots.txt) that tells search-engine crawlers which paths they may or may not fetch. Crawlers read it before anything else — a single bad line can block the entire site from Google. It also points crawlers at your sitemap, which helps Bing, Yandex, and smaller engines discover URLs they otherwise wouldn't. robots.txt controls crawling, not indexing — a blocked page can still appear in search results if linked from elsewhere.
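Below is a rough sketch of how a crawler applies these rules, using Python's standard urllib.robotparser; the domain and paths are placeholders:

from urllib import robotparser

# A well-behaved crawler fetches /robots.txt first, then consults it before every request.
rp = robotparser.RobotFileParser()
rp.set_url("https://example.com/robots.txt")   # robots.txt always lives at the domain root
rp.read()                                      # fetch and parse the file

# Ask whether a given bot may fetch a given URL
print(rp.can_fetch("Googlebot", "https://example.com/admin/settings"))  # False if /admin/ is disallowed
print(rp.can_fetch("*", "https://example.com/blog/some-post"))          # True if the path is not blocked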

What this tool checks

  • File presence — whether /robots.txt responds with HTTP 200
  • HTML instead of text — some servers serve the homepage at /robots.txt when the file doesn't exist
  • File size — Google ignores content after the first 500 KB
  • Entire-site block — Disallow: / under User-agent: *
  • Sitemap directive — helps non-Google crawlers discover the sitemap
  • Deprecated Host directive — superseded by 301 redirects and rel=canonical years ago
  • Crawl-delay — slows crawling; ignored by Google, respected by Bing
  • Wildcard-group rules — Disallow rules under User-agent: * that apply to every crawler
  • Per-bot blocks for major Western search — Googlebot and Bingbot (Bingbot also powers DuckDuckGo and Yahoo)
  • Current URL match — whether the URL you're auditing matches a Disallow rule
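As a rough illustration, here is how a few of the checks above could be scripted with Python's standard library; the heuristics are deliberately naive and the domain is a placeholder:

import urllib.error
import urllib.request

def check_robots(domain):
    """Run a handful of naive robots.txt checks against a domain."""
    url = f"https://{domain}/robots.txt"
    try:
        with urllib.request.urlopen(url) as resp:
            status, body = resp.status, resp.read()
    except urllib.error.HTTPError as err:
        status, body = err.code, b""
    text = body.decode("utf-8", errors="replace")
    lines = [line.strip() for line in text.splitlines()]
    return {
        "present": status == 200,                                                      # file presence
        "looks_like_html": text.lstrip().lower().startswith(("<!doctype", "<html")),   # HTML instead of text
        "oversized": len(body) > 500 * 1024,                                           # content past ~500 KB is ignored
        "blocks_entire_site": any(line.lower() == "disallow: /" for line in lines),    # naive entire-site block check
        "has_sitemap": any(line.lower().startswith("sitemap:") for line in lines),     # Sitemap directive
        "has_crawl_delay": any(line.lower().startswith("crawl-delay:") for line in lines),
    }

print(check_robots("example.com"))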

Good vs bad examples

Good — minimal valid robots.txt for most content sites:

User-agent: *
Disallow:
Sitemap: https://example.com/sitemap.xml

Good — blocking admin areas but allowing everything else:

User-agent: *
Disallow: /admin/
Disallow: /api/internal/
Sitemap: https://example.com/sitemap.xml

Bad — entire site blocked from search engines (staging-site pattern left on production):

User-agent: *
Disallow: /

Bad — Bingbot fully blocked (loses Bing, DuckDuckGo, and Yahoo traffic):

User-agent: Bingbot
Disallow: /

Bad — important resources blocked (prevents Google from rendering the page correctly):

User-agent: *
Disallow: /css/
Disallow: /js/
Disallow: /images/
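To see how these examples behave in practice, you can feed them to Python's standard robots.txt parser; a small sketch with placeholder URLs:

from urllib import robotparser

good_rules = [
    "User-agent: *",
    "Disallow: /admin/",
    "Disallow: /api/internal/",
]
bad_rules = [
    "User-agent: *",
    "Disallow: /",
]

good = robotparser.RobotFileParser()
good.parse(good_rules)
print(good.can_fetch("Googlebot", "https://example.com/blog/post"))    # True: public pages stay crawlable
print(good.can_fetch("Googlebot", "https://example.com/admin/users"))  # False: only /admin/ is blocked

bad = robotparser.RobotFileParser()
bad.parse(bad_rules)
print(bad.can_fetch("Googlebot", "https://example.com/"))              # False: the whole site is blocked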

Common mistakes

  • File missing entirely — crawlers fetch every accessible URL, wasting crawl budget on utility pages
  • Staging Disallow: / left on production — the biggest catastrophe. The site falls out of search, and the problem is often noticed weeks later by the SEO team
  • Blocking CSS, JS, or images — Google needs these to render the page. Blocking them can trigger mobile-usability issues
  • Using robots.txt to deindex — robots.txt blocks crawling, not indexing. Blocked pages can still appear in search (sans snippet) if externally linked. Use a noindex meta tag instead
  • Forgetting the Sitemap directive — not critical for Google (Search Console handles it) but matters for Bing, Yandex, DuckDuckGo
  • Per-bot Disallow with different intent — unique rules for a single crawler (Googlebot, Bingbot) are rarely needed and easy to misconfigure
  • HTML returned at /robots.txt — the server serves the homepage or a custom 404 page with a 200 status. Google treats this as "no robots.txt present", but because the URL still returns 200, your analytics and monitoring may look fine

Frequently asked questions

Is robots.txt required?

Not formally — the site will work without it. But Google Search Console recommends having one. Without robots.txt, crawlers will scan all accessible pages, including utility pages.

Does blocking a page in robots.txt remove it from search results?

No. Robots.txt only prevents crawling. A page blocked in robots.txt can still be indexed if other sites link to it. To prevent indexing, use the noindex meta tag.

How quickly do robots.txt changes take effect?

Google re-crawls robots.txt every few days by default. To verify changes faster, use the robots.txt report in Google Search Console — it shows the version Google last fetched and lets you request a recrawl.

Do I need a Sitemap directive in robots.txt?

For Google alone, not strictly — Search Console's sitemap-submission tool is the primary channel. But the Sitemap: directive in robots.txt is the main discovery path for Bing, Yandex, DuckDuckGo, and smaller crawlers that don't have mainstream submission tools. It's a one-line addition with zero downside.

What's the difference between robots.txt and noindex?

robots.txt blocks crawling — crawlers don't fetch the URL. noindex blocks indexing — crawlers fetch the URL but search engines don't show it in results. If you want a page to stay out of search results, use <meta name="robots" content="noindex"> and don't block the URL in robots.txt (crawlers need to fetch the page to see the noindex directive). A common mistake is blocking a page in robots.txt in the hope of deindexing it — the page simply stops being crawled and may still appear in search without a snippet.
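As a rough sketch of how the two signals differ (placeholder URL, standard library only, naive meta-tag matching):

import re
import urllib.request
from urllib import robotparser
from urllib.parse import urlparse

url = "https://example.com/private/report"   # placeholder URL
origin = "{0.scheme}://{0.netloc}".format(urlparse(url))

# 1. Crawling: is the URL blocked by robots.txt?
rp = robotparser.RobotFileParser()
rp.set_url(origin + "/robots.txt")
rp.read()
crawlable = rp.can_fetch("*", url)

# 2. Indexing: does the page itself carry a noindex robots meta tag?
noindex = False
if crawlable:   # if crawling is blocked, search engines never see the noindex tag
    html = urllib.request.urlopen(url).read().decode("utf-8", errors="replace")
    noindex = bool(re.search(r'<meta[^>]+name=["\']robots["\'][^>]*noindex', html, re.I))

print("crawlable:", crawlable, "| noindex:", noindex)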