Robots.txt checker

Check if robots.txt is accessible on your site, whether it blocks important pages, and if it links to a sitemap

Free
No sign-up
Instant results

Check results

This check only covers robots.txt. For a full picture of your page, run a page audit.

For issues across your whole site — duplicate titles, orphan pages, broken internal links — run a site audit.

Want us to fix what we found? Our team can help.

What is robots.txt and why it matters

The robots.txt file is a plain-text file at the root of your domain (/robots.txt) that tells search-engine crawlers which paths they may or may not fetch. Crawlers read it before anything else — a single bad line can block the entire site from Google. It also points crawlers at your sitemap, which helps Bing, Yandex, and smaller engines discover URLs they otherwise wouldn't. robots.txt controls crawling, not indexing — a blocked page can still appear in search results if linked from elsewhere.
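Below is a rough sketch of how a crawler applies these rules, using Python's standard urllib.robotparser; the domain and paths are placeholders:

from urllib import robotparser

# A well-behaved crawler fetches /robots.txt first, then consults it before every request.
rp = robotparser.RobotFileParser()
rp.set_url("https://example.com/robots.txt")   # robots.txt always lives at the domain root
rp.read()                                      # fetch and parse the file

# Ask whether a given bot may fetch a given URL
print(rp.can_fetch("Googlebot", "https://example.com/admin/settings"))  # False if /admin/ is disallowed
print(rp.can_fetch("*", "https://example.com/blog/some-post"))          # True if the path is not blocked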

What this tool checks

  • File presence — whether /robots.txt responds with HTTP 200
  • HTML instead of text — some servers serve the homepage at /robots.txt when the file doesn't exist
  • File size — Google ignores content after the first 500 KB
  • Entire-site block — Disallow: / under User-agent: *
  • Sitemap directive — helps non-Google crawlers discover the sitemap
  • Deprecated Host directive — superseded by 301 redirects and rel=canonical years ago
  • Crawl-delay — slows crawling; ignored by Google, respected by Bing
  • Wildcard-group rules — Disallow rules under User-agent: * that apply to every crawler
  • Per-bot blocks for major Western search — Googlebot and Bingbot (Bingbot also powers DuckDuckGo and Yahoo)
  • Current URL match — whether the URL you're auditing matches a Disallow rule
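As a rough illustration, here is how a few of the checks above could be scripted with Python's standard library; the heuristics are deliberately naive and the domain is a placeholder:

import urllib.error
import urllib.request

def check_robots(domain):
    """Run a handful of naive robots.txt checks against a domain."""
    url = f"https://{domain}/robots.txt"
    try:
        with urllib.request.urlopen(url) as resp:
            status, body = resp.status, resp.read()
    except urllib.error.HTTPError as err:
        status, body = err.code, b""
    text = body.decode("utf-8", errors="replace")
    lines = [line.strip() for line in text.splitlines()]
    return {
        "present": status == 200,                                                      # file presence
        "looks_like_html": text.lstrip().lower().startswith(("<!doctype", "<html")),   # HTML instead of text
        "oversized": len(body) > 500 * 1024,                                           # content past ~500 KB is ignored
        "blocks_entire_site": any(line.lower() == "disallow: /" for line in lines),    # naive entire-site block check
        "has_sitemap": any(line.lower().startswith("sitemap:") for line in lines),     # Sitemap directive
        "has_crawl_delay": any(line.lower().startswith("crawl-delay:") for line in lines),
    }

print(check_robots("example.com"))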

Good vs bad examples

Good — minimal valid robots.txt for most content sites:

User-agent: *
Disallow:
Sitemap: https://example.com/sitemap.xml

Good — blocking admin areas but allowing everything else:

User-agent: *
Disallow: /admin/
Disallow: /api/internal/
Sitemap: https://example.com/sitemap.xml

Bad — entire site blocked from search engines (staging-site pattern left on production):

User-agent: *
Disallow: /

Bad — Bingbot fully blocked (loses Bing, DuckDuckGo, and Yahoo traffic):

User-agent: Bingbot
Disallow: /

Bad — important resources blocked (prevents Google from rendering the page correctly):

User-agent: *
Disallow: /css/
Disallow: /js/
Disallow: /images/
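To see how these examples behave in practice, you can feed them to Python's standard robots.txt parser; a small sketch with placeholder URLs:

from urllib import robotparser

good_rules = [
    "User-agent: *",
    "Disallow: /admin/",
    "Disallow: /api/internal/",
]
bad_rules = [
    "User-agent: *",
    "Disallow: /",
]

good = robotparser.RobotFileParser()
good.parse(good_rules)
print(good.can_fetch("Googlebot", "https://example.com/blog/post"))    # True: public pages stay crawlable
print(good.can_fetch("Googlebot", "https://example.com/admin/users"))  # False: only /admin/ is blocked

bad = robotparser.RobotFileParser()
bad.parse(bad_rules)
print(bad.can_fetch("Googlebot", "https://example.com/"))              # False: the whole site is blocked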

Common mistakes

  • File missing entirely — crawlers fetch every accessible URL, wasting crawl budget on utility pages
  • Staging Disallow: / left on production — the biggest catastrophe. The site falls out of search, and the problem is often noticed weeks later by the SEO team
  • Blocking CSS, JS, or images — Google needs these to render the page. Blocking them can trigger mobile-usability issues
  • Using robots.txt to deindex — robots.txt blocks crawling, not indexing. Blocked pages can still appear in search (sans snippet) if externally linked. Use a noindex meta tag instead
  • Forgetting the Sitemap directive — not critical for Google (Search Console handles it) but matters for Bing, Yandex, DuckDuckGo
  • Per-bot Disallow with different intent — unique rules for a single crawler (Googlebot, Bingbot) are rarely needed and easy to misconfigure
  • HTML returned at /robots.txt — the server serves the homepage or a custom 404 page with a 200 status. Google treats this as "no robots.txt present", but because the URL still returns 200, your analytics and monitoring may look fine

Frequently asked questions

Is robots.txt required?

Not formally — the site will work without it. But Google Search Console recommends having one. Without robots.txt, crawlers will scan all accessible pages, including utility pages.

Does blocking a page in robots.txt remove it from search results?

No. Robots.txt only prevents crawling. A page blocked in robots.txt can still be indexed if other sites link to it. To prevent indexing, use the noindex meta tag.

How quickly do robots.txt changes take effect?

Google re-crawls robots.txt every few days by default. To verify changes faster, use the robots.txt report in Google Search Console — it shows the version Google last fetched and lets you request a recrawl.

Do I need a Sitemap directive in robots.txt?

For Google alone, not strictly — Search Console's sitemap-submission tool is the primary channel. But the Sitemap: directive in robots.txt is the main discovery path for Bing, Yandex, DuckDuckGo, and smaller crawlers that don't have mainstream submission tools. It's a one-line addition with zero downside.

What's the difference between robots.txt and noindex?

robots.txt blocks crawling — crawlers don't fetch the URL. noindex blocks indexing — crawlers fetch the URL but search engines don't show it in results. If you want a page to stay out of search results, use <meta name="robots" content="noindex"> and don't block the URL in robots.txt (crawlers need to fetch the page to see the noindex directive). A common mistake is blocking a page in robots.txt in the hope of deindexing it — the page simply stops being crawled and may still appear in search without a snippet.
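As a rough sketch of how the two signals differ (placeholder URL, standard library only, naive meta-tag matching):

import re
import urllib.request
from urllib import robotparser
from urllib.parse import urlparse

url = "https://example.com/private/report"   # placeholder URL
origin = "{0.scheme}://{0.netloc}".format(urlparse(url))

# 1. Crawling: is the URL blocked by robots.txt?
rp = robotparser.RobotFileParser()
rp.set_url(origin + "/robots.txt")
rp.read()
crawlable = rp.can_fetch("*", url)

# 2. Indexing: does the page itself carry a noindex robots meta tag?
noindex = False
if crawlable:   # if crawling is blocked, search engines never see the noindex tag
    html = urllib.request.urlopen(url).read().decode("utf-8", errors="replace")
    noindex = bool(re.search(r'<meta[^>]+name=["\']robots["\'][^>]*noindex', html, re.I))

print("crawlable:", crawlable, "| noindex:", noindex)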