Robots.txt Generator

Generate a valid robots.txt file to control how search engine crawlers access your website. Configure rules and download instantly.

robots.txt Preview
User-agent: *
Allow: /

What is robots.txt?

The robots.txt file tells search engine crawlers which pages or sections of your website they may or may not crawl. It must be placed in the root of your domain (e.g., https://example.com/robots.txt); a file in a subdirectory is ignored. Use Disallow rules to keep crawlers out of admin panels, staging paths, or duplicate content. Add your sitemap URL as well (conventionally at the bottom of the file) to help crawlers find all your pages.

The Complete Guide to robots.txt for SEO

The robots.txt file is a simple but powerful tool for controlling how search engine crawlers navigate your website. Placed in your domain root, it acts as a set of instructions to any bot that visits, telling it which pages it is welcome to crawl and which should be left alone. A correctly configured robots.txt keeps well-behaved crawlers out of areas such as admin or staging paths, reduces crawling of duplicate or low-quality content, and helps allocate your crawl budget efficiently on large websites. It is not a security mechanism, and on its own it does not guarantee that a page stays out of search results.

How Robots.txt Works

When a search engine crawler like Googlebot arrives at your domain, it requests /robots.txt before crawling other pages (crawlers typically cache the file for a while; Google generally refreshes its copy within 24 hours). The crawler reads the file and follows its instructions. The file uses a simple directive syntax: User-agent specifies which bot the rules apply to, Disallow blocks access to paths that begin with the given value, and Allow explicitly permits access (useful for overriding a broader Disallow). The wildcard * in the User-agent field applies the rules to every crawler that has no more specific group of its own.
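As a rough illustration of the directive syntax described above, the following Python sketch parses a small rule set with the standard library's urllib.robotparser and asks whether a few URLs may be fetched. The rules, the example.com URLs, and the ExampleBot name are assumptions made up for this example. Python's parser applies the first matching rule in file order, whereas Google applies the most specific (longest) matching rule, so the Allow line is placed first here to get the same answer under both.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only.
RULES = """\
User-agent: *
Allow: /admin/help/
Disallow: /admin/
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())


def is_allowed(url, agent="ExampleBot"):
    """Return True if the parsed rules let `agent` fetch `url`."""
    return parser.can_fetch(agent, url)


for url in (
    "https://example.com/",
    "https://example.com/admin/settings",
    "https://example.com/admin/help/faq",
):
    print(url, "->", "allowed" if is_allowed(url) else "blocked")
```

The Allow line carves an exception out of the broader Disallow: /admin/ rule, which is the override case mentioned above.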

It is critical to understand that robots.txt is a protocol based on voluntary compliance — it tells well-behaved crawlers (Googlebot, Bingbot, etc.) not to access certain areas, but it does not technically prevent access. Malicious bots routinely ignore robots.txt entirely. For truly sensitive content, use server-level access controls or password protection rather than relying on robots.txt.

What to Disallow in robots.txt

Common paths to disallow include: /admin/ and /wp-admin/ for CMS admin panels, /login and /account/ for user account pages, /cart and /checkout/ for e-commerce transaction pages, /search for internal search result pages, /api/ for API endpoints, staging or preview subdirectories, and any parameter-driven URLs that generate duplicate content without adding value to search results.
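A generator for such a file can be very small. The sketch below is one possible approach, not the code behind this tool: build_robots_txt is a hypothetical helper, and the sitemap URL is a placeholder.

```python
def build_robots_txt(disallow_paths, sitemap_url=None, user_agent="*"):
    """Build robots.txt text from a list of path prefixes to disallow."""
    lines = [f"User-agent: {user_agent}"]
    if disallow_paths:
        lines += [f"Disallow: {path}" for path in disallow_paths]
    else:
        # With nothing to block, allow everything explicitly.
        lines.append("Allow: /")
    if sitemap_url:
        # A blank line separates the rule group from the sitemap reference.
        lines += ["", f"Sitemap: {sitemap_url}"]
    return "\n".join(lines) + "\n"


print(build_robots_txt(
    ["/admin/", "/wp-admin/", "/login", "/account/",
     "/cart", "/checkout/", "/search", "/api/"],
    sitemap_url="https://example.com/sitemap.xml",  # placeholder URL
))
```

Keep in mind that each Disallow value is a prefix match: Disallow: /cart also matches /cartoon, so add a trailing slash when you mean a directory.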

A common mistake is disallowing CSS and JavaScript files. If Googlebot cannot fetch your stylesheets and scripts, it cannot render the page the way visitors see it and may misjudge its content and layout. Unless you have a specific reason to block them, leave CSS, JavaScript, and image files crawlable.

Crawl Budget and Large Websites

Google allocates a "crawl budget" to each website — the number of pages Googlebot will crawl within a given timeframe. For most small to medium websites this is not a concern, as Google will crawl all pages readily. For large websites with tens of thousands or millions of pages, crawl budget becomes important. Disallowing low-value pages (paginated archives, filtered parameter pages, duplicate content) through robots.txt helps ensure Googlebot spends its crawl budget on your most important, unique content rather than wasting it on pages that add no search value.

Robots.txt vs. Meta Noindex

Robots.txt and the meta noindex tag serve different purposes and should not be confused. Robots.txt prevents crawling — Googlebot will not visit the page at all. Meta noindex prevents indexing — Googlebot can visit and crawl the page, but will not include it in search results. If you block a page with robots.txt, Google cannot discover the noindex tag on that page — so the page may still appear in search results (without a snippet) if other sites link to it. For pages you want definitively excluded from search results, use noindex and allow crawling so Google can read the instruction.
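The conflict described above (a noindex tag on a page that crawlers are not allowed to read) can be detected mechanically. The following is a minimal sketch using only the Python standard library; noindex_is_hidden and RobotsMetaFinder are hypothetical names, and the check only looks at the meta robots tag, not at an X-Robots-Tag HTTP header.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaFinder(HTMLParser):
    """Records whether the page has <meta name="robots"> containing noindex."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name = (attr.get("name") or "").lower()
        content = (attr.get("content") or "").lower()
        if name == "robots" and "noindex" in content:
            self.noindex = True


def noindex_is_hidden(robots_txt, url, html, agent="Googlebot"):
    """True when the page says noindex but robots.txt stops `agent` reading it."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    finder = RobotsMetaFinder()
    finder.feed(html)
    return finder.noindex and not rules.can_fetch(agent, url)
```

A True result means the noindex instruction can never be seen by that crawler, so the Disallow rule should be lifted for that URL if the goal is to keep it out of search results.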

The Sitemap Directive

The Sitemap: directive tells crawlers where to find your sitemap XML file. It is independent of the User-agent groups, so it can appear anywhere in robots.txt, although placing it at the bottom is the usual convention. This matters because any crawler that reads your robots.txt can then discover your sitemap automatically, including search engines you have never submitted it to manually. Include the full absolute URL: Sitemap: https://example.com/sitemap.xml. If you have multiple sitemaps or a sitemap index, you can include multiple Sitemap directives, one per line.
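To show how a crawler might pick up these lines, here is a small sketch that uses RobotFileParser.site_maps() from Python's standard library (available since Python 3.8). The file contents and sitemap URLs are invented for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt with two sitemap references.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search

Sitemap: https://example.com/sitemap-pages.xml
Sitemap: https://example.com/sitemap-posts.xml
"""


def list_sitemaps(robots_txt):
    """Return every Sitemap URL declared in the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when the file declares no sitemaps.
    return parser.site_maps() or []


for sitemap in list_sitemaps(ROBOTS_TXT):
    print(sitemap)
```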

Testing Your robots.txt

After uploading your robots.txt file, check it in Google Search Console's robots.txt report (available under Settings → robots.txt), which replaced the older robots.txt Tester. The report shows which robots.txt files Google found for your site, when they were last fetched, and any warnings or errors raised while parsing them. To see whether a specific URL is allowed or blocked, run it through the URL Inspection tool. Recheck after any change to make sure your rules work as intended and you have not accidentally blocked important pages.
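A quick local check before uploading can catch obvious mistakes. The sketch below compares a draft file against a list of URLs you expect to be crawlable or blocked; find_rule_surprises is a hypothetical helper and the draft rules are invented. It is only an approximation of Googlebot's behaviour: Python's parser does not understand * or $ inside paths and applies the first matching rule rather than the most specific one.

```python
from urllib.robotparser import RobotFileParser


def find_rule_surprises(robots_txt, expectations, agent="Googlebot"):
    """Return URLs whose allowed/blocked status differs from what was expected.

    `expectations` maps URL -> True (should be crawlable) or False (blocked).
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [
        url
        for url, should_allow in expectations.items()
        if parser.can_fetch(agent, url) != should_allow
    ]


# Hypothetical draft that blocks the blog by mistake.
DRAFT = """\
User-agent: *
Disallow: /admin/
Disallow: /blog
"""

surprises = find_rule_surprises(DRAFT, {
    "https://example.com/": True,
    "https://example.com/admin/users": False,
    "https://example.com/blog/first-post": True,
})
print(surprises)
```

An empty list means every URL behaved as expected; anything else points at a rule worth revisiting before the file goes live.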

robots.txt for Specific Crawlers

You can write separate rule blocks for different crawlers by specifying their User-agent names. A crawler follows the group that names it most specifically and falls back to the * group only if no group names it. For example, to let Google's image crawler access your images while blocking other bots from them, use two groups: User-agent: Googlebot-Image / Allow: /images/ and User-agent: * / Disallow: /images/. You can also block specific bots by name while keeping rules permissive for the search engines you care about. Bots that site owners commonly block include AhrefsBot, SemrushBot, DotBot, MJ12bot, and PetalBot; these crawlers can consume bandwidth and server resources while contributing little to most sites' search visibility. Blocking only works for bots that choose to honor robots.txt.
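The effect of per-crawler groups can be checked with a short script. This is a sketch under assumed rules: the file below is an invented example that combines an image-crawler group, a group blocking two named bots, and a fallback group, and it is evaluated with Python's urllib.robotparser rather than any search engine's own parser.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rule set with three groups.
RULES = """\
User-agent: Googlebot-Image
Allow: /images/

User-agent: AhrefsBot
User-agent: MJ12bot
Disallow: /

User-agent: *
Disallow: /images/
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())


def may_fetch(agent, path):
    """Return True if `agent` may fetch `path` on the example site."""
    return parser.can_fetch(agent, "https://example.com" + path)


for agent, path in [
    ("Googlebot-Image", "/images/logo.png"),
    ("Bingbot", "/images/logo.png"),
    ("Bingbot", "/about"),
    ("AhrefsBot", "/about"),
]:
    print(agent, path, "->", "allowed" if may_fetch(agent, path) else "blocked")
```

Listing several User-agent lines above one set of rules, as in the second group, applies those rules to each named bot without repeating them.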