Robots.txt Generator

Introduction

robots.txt is the root-level crawl policy file for a host. It tells compliant crawlers which URL paths they may fetch and which paths they should skip. That is useful for shaping crawl traffic around admin areas, account flows, internal search pages, faceted duplicates, or short-lived staging hosts. It is not a privacy control, and it does not guarantee that a blocked URL will disappear from search results.

This generator turns that policy into a reviewable draft. You can start from a preset, define one or more user-agent groups, add allow and disallow paths, include sitemap URLs, keep an optional host line, attach notes as comments, and export the same draft as plain text, CSV, DOCX, or JSON.

That makes it useful in a few different situations. A developer can block a staging environment before launch. A site owner can keep crawlers away from checkout, account, or search-result pages while leaving the public catalog open. A team inheriting a messy file can paste the old text into the import box and turn it into a clearer table for review.

The tool stays focused on drafting and inspection. It does not fetch your live /robots.txt, test whether every crawler will obey every line, or publish anything to the site root. What it gives you is a structured file draft, warning hints, and synchronized output views so you can catch mistakes before deployment.

All processing happens in the browser. The generated text, imported rules, table rows, JSON view, and export files are assembled locally, which is useful when the draft contains internal paths you would rather not send to a remote service.

Technical Details

The current Robots Exclusion Protocol standard, RFC 9309, centers on User-agent, Allow, and Disallow. Matching is path-based, case-sensitive, and resolved by the most specific (longest) matching rule; when equally specific allow and disallow rules conflict, the allow rule wins. Google also documents support for Sitemap inside robots.txt, while fields such as Crawl-delay sit outside Google's supported rule set.
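
To make that selection rule concrete, the small sketch below picks the most specific matching rule for a path and lets an allow win a tie. It assumes plain prefix rules without wildcard handling and is not part of the generator itself.

    def decide(path, allow_rules, disallow_rules):
        """Return True if the path is crawlable under most-specific-rule matching."""
        best_len, allowed = -1, True  # no matching rule means the path stays crawlable
        rules = [(r, True) for r in allow_rules] + [(r, False) for r in disallow_rules]
        for rule, is_allow in rules:
            if path.startswith(rule) and len(rule) >= best_len:
                # A strictly longer rule always wins; on a tie, the allow rule wins.
                if len(rule) > best_len or is_allow:
                    best_len, allowed = len(rule), is_allow
        return allowed

    # The longer allow rule reopens /media/press/ inside the blocked /media/ tree.
    print(decide("/media/press/kit.zip", ["/media/press/"], ["/media/"]))     # True
    print(decide("/media/internal/raw.mov", ["/media/press/"], ["/media/"]))  # False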

This generator builds its output from explicit sections. Each section holds one crawler token, zero or more allow paths, zero or more disallow paths, an optional crawl-delay value, and an optional note. Bare rule entries are normalized into root-relative paths, duplicate entries are removed, and sitemap lines can be expanded from site-relative input into full URLs by using the site URL field as the base.
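
The generator's own normalization code is not exposed, but a minimal sketch of the behavior described above, with illustrative function names, might look like this:

    from urllib.parse import urljoin, urlparse

    def normalize_paths(lines):
        """Force a leading slash, trim whitespace, and drop duplicates while keeping order."""
        seen, out = set(), []
        for raw in lines:
            path = raw.strip()
            if not path:
                continue
            if not path.startswith("/"):
                path = "/" + path
            if path not in seen:
                seen.add(path)
                out.append(path)
        return out

    def expand_sitemaps(entries, site_url):
        """Turn site-relative sitemap entries into absolute URLs using the site URL as base."""
        return [e if urlparse(e).scheme else urljoin(site_url, e)
                for e in (x.strip() for x in entries) if e]

    print(normalize_paths(["admin/", "/cart/", "admin/"]))                # ['/admin/', '/cart/']
    print(expand_sitemaps(["/sitemap.xml"], "https://www.example.com"))   # ['https://www.example.com/sitemap.xml']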

The advanced switches affect how the draft is rendered, not how the protocol itself works. You can prepend a generated comment with timestamp and host, keep an explicit blank Disallow: line when a section is intentionally open, sort rules for cleaner diffs, and insert blank lines between user-agent groups for readability.
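
For instance, with all four switches turned on, one section left intentionally open, and one restricted section, the rendered draft might look roughly like this. The header wording and exact layout are illustrative, not the generator's fixed format.

    # Generated robots.txt draft for www.example.com (timestamp added by the tool)

    User-agent: *
    Disallow:

    User-agent: ExampleBot
    Allow: /public/
    Disallow: /archive/
    Disallow: /private/

    Sitemap: https://www.example.com/sitemap.xml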

Import follows the same practical model. The parser reads User-agent, Allow, Disallow, Crawl-delay, Sitemap, and the first Host line it encounters. A leading comment is carried forward as the note for the next section. If the pasted text contains no user-agent group at all, the tool stops with an error instead of trying to infer a policy from stray lines.
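
As a rough, standalone approximation of that import pass (not the tool's actual parser), a sketch could look like the following:

    def parse_robots(text):
        """Rough robots.txt import: group rules under User-agent sections, keep the first Host."""
        sections, sitemaps, host, pending_note = [], [], None, None
        current = None
        for raw in text.splitlines():
            line = raw.strip()
            if not line:
                continue
            if line.startswith("#"):
                pending_note = line.lstrip("#").strip()  # carried into the next section
                continue
            field, _, value = line.partition(":")
            field, value = field.strip().lower(), value.strip()
            if field == "user-agent":
                current = {"userAgent": value, "allow": [], "disallow": [],
                           "crawlDelay": None, "note": pending_note}
                pending_note = None
                sections.append(current)
            elif field in ("allow", "disallow") and current:
                current[field].append(value)
            elif field == "crawl-delay" and current:
                current["crawlDelay"] = value
            elif field == "sitemap":
                sitemaps.append(value)
            elif field == "host" and host is None:
                host = value
        if not sections:
            raise ValueError("no User-agent group found")
        return {"host": host, "sitemaps": sitemaps, "sections": sections}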

Policy Draft Flow

site URL -> host + sitemap normalization -> user-agent sections

sections -> optional comments + sorted rules -> finished robots.txt text

same draft -> directive table + JSON payload + CSV/DOCX exports

Directive coverage and standards status for this robots.txt generator, listing how the tool handles each directive and where it stands against the standard:

User-agent: Supports preset crawler names and custom tokens, one section at a time. Core REP syntax; crawlers may combine matching user-agent groups when they interpret the final file.
Allow and Disallow: Accepts one path per line, normalizes missing leading slashes, keeps wildcard-friendly strings, and can sort the final rules. Core REP behavior; specificity matters more than line order when crawlers evaluate matching rules.
Sitemap: Accepts multiple entries, expands site-relative values from the site URL, and appends them after the rule groups. Widely supported; Google expects fully qualified sitemap URLs and treats the field separately from user-agent groups.
Crawl-delay: Allows a global default and per-section values, then carries them into the text, table, and JSON views. Compatibility-only; Google does not support it, and Yandex has ignored the directive since February 22, 2018.
Host: Cleans the host value from the site URL or manual input and appends the line when present. Not part of RFC 9309; treat it as an optional compatibility line, not as the main way to declare the canonical host.
Review surfaces produced by the generator, with what each shows and when to use it:

Robots.txt: The final plain-text file, including comments, user-agent groups, the optional host line, and sitemap lines. Best for final copy review before publishing the file to the site root.
Directive Table: One normalized row per section with allow, disallow, crawl-delay, and note summaries. Best for fast comparison across crawler groups and for easy CSV or DOCX handoff.
JSON: The draft policy as structured data, including host, sitemaps, sections, and generation time. Best for structured review, automation handoff, or archiving the current draft state.
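
The JSON surface is easiest to picture with a small example. The sketch below shows a plausible payload shape as a Python literal; the field names are illustrative only, so check the actual export for the exact keys the generator emits.

    draft = {
        "host": "www.example.com",                        # illustrative key names throughout
        "sitemaps": ["https://www.example.com/sitemap.xml"],
        "sections": [
            {
                "userAgent": "*",
                "allow": ["/public/"],
                "disallow": ["/admin/", "/cart/"],
                "crawlDelay": None,                       # directive omitted for this section
                "note": "Default policy for all crawlers",
            }
        ],
        "generatedAt": "2024-01-01T00:00:00Z",            # generation timestamp (format illustrative)
    }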

Everyday Use & Decision Guide

Start with the crawl posture you actually want. If the host should be publicly crawlable, keep the wildcard section simple and block only the paths that create noise or exposure. If the host is temporary or private, a full-site disallow is easier to verify than a long list of supposedly sensitive folders.

Use multiple sections only when crawler behavior really differs. If every crawler should see the same policy, one wildcard block is easier to review than several repeated sections. When you do split the draft by crawler, make the reason obvious. Common examples are a specific partner bot, a media bot, or a legacy crawler that still gets a different rule set.

Allow rules work best as narrow exceptions inside a broader disallow. A classic pattern is to block a larger path tree and reopen one public subpath or file. If you are only trying to keep crawlers away from admin, cart, checkout, login, account, or on-site search pages, you often need only a short disallow list plus sitemap lines that point toward the public content you do want discovered.
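
A short version of that pattern, blocking one path tree while reopening a single public subpath, looks like this (paths are illustrative):

    User-agent: *
    # /media/press/ stays crawlable because the allow rule is more specific
    Disallow: /media/
    Allow: /media/press/
    Disallow: /admin/
    Disallow: /checkout/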

Sitemaps deserve more attention than they usually get. A clean sitemap line gives crawlers a direct route to the URL inventory you want indexed, which is especially helpful on larger sites or sites with deep navigation. The host line is different. The generator supports it because some workflows still expect it, but the field sits outside the core standard and should not carry more weight than redirects, canonical signals, or host-specific deployment choices.

Be cautious with Crawl-delay. It is easy to add, but it is only a request to compatible bots. Google ignores it, Yandex no longer honors it, and other crawlers may interpret it differently or not at all. If your real goal is rate control, use server-side protections or crawler-specific webmaster settings rather than assuming this line will solve the problem by itself.

Step-by-Step Guide

  1. Enter the site URL first so the tool can derive the host value and expand site-relative sitemap entries.
  2. Choose a preset only if it already resembles your real policy. Otherwise start from Custom.
  3. Create one wildcard section or one crawler-specific section for each genuinely different policy.
  4. Add one allow or disallow path per line, and use allow rules only when you need a clear exception inside a broader block.
  5. Add one or more sitemap URLs for the public content you want discovered, then decide whether the optional host line belongs in your workflow.
  6. Open the advanced panel if you want a generated header comment, explicit allow-all placeholder, sorted rules, section spacing, or an import pass from an existing file.
  7. Review the warnings, then compare the plain-text file with the table and JSON views to make sure the draft still tells one consistent story.
  8. After export, publish the final file as UTF-8 plain text at the exact host root, using /robots.txt for that protocol and host, and test that it is publicly reachable.
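
After publishing, a quick smoke test with Python's standard library can confirm the live file is reachable and spot-check a few paths. Note that urllib.robotparser applies a simpler first-match rule rather than the full longest-match logic of RFC 9309, so treat this as a reachability check, not a definitive compliance test; the host and paths below are placeholders.

    import urllib.robotparser

    rp = urllib.robotparser.RobotFileParser()
    rp.set_url("https://www.example.com/robots.txt")
    rp.read()  # downloads and parses the live file

    for path in ("/", "/admin/", "/cart/"):
        url = "https://www.example.com" + path
        print(path, "->", "crawlable" if rp.can_fetch("*", url) else "blocked")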

Interpreting Results

The summary box is a quick read, not a full audit. Crawling open means the current draft does not contain any disallow rule. Selective crawling means some paths are blocked. All crawling blocked is a high-risk flag triggered when the draft contains a full-site disallow such as / or /* in one of the sections. Before you publish a draft with that label, check whether the full block applies to the wildcard group or only to one named crawler.
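
For example, these two drafts both contain a full-site disallow, but only the first blocks every compliant crawler:

    # Draft A: blocks all crawlers
    User-agent: *
    Disallow: /

    # Draft B: blocks only ExampleBot; everything else stays open
    User-agent: *
    Disallow:

    User-agent: ExampleBot
    Disallow: /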

The warning panel is there to catch false confidence. A blank host value does not make the file invalid, but the tool reminds you because some teams still expect that line. Missing sitemap entries are more consequential on content-heavy sites because they remove an easy discovery signal. A crawl-delay warning means exactly what it says: the draft includes a line that not every major crawler uses.

The three output views should agree with each other. If the text view, directive table, and JSON payload seem to describe different policies, stop and review the sections before export. The text view is the final publication surface. The table is the fastest way to compare groups side by side. The JSON view is the cleanest structured record of the current draft, including the generation timestamp.

After publication, the live check still matters. Google documents that robots.txt applies only to the protocol, host, and port where the file is served, so a correct file on the www host does not govern the bare domain or another subdomain. A clean-looking draft is only the first half of the job; the file must also be placed at the right root path and served successfully.
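
A quick way to confirm that each origin serves its own file is to request /robots.txt from both host variants; the hosts below are placeholders.

    import urllib.request

    # robots.txt is scoped per protocol, host, and port, so check each origin separately.
    for origin in ("https://www.example.com", "https://example.com"):
        try:
            with urllib.request.urlopen(origin + "/robots.txt", timeout=10) as resp:
                first_line = resp.read().decode("utf-8", errors="replace").splitlines()[:1]
                print(origin, resp.status, first_line)
        except Exception as exc:
            print(origin, "not reachable:", exc)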

How to read the main result signals in the generator, with the next check for each:

Summary label: A fast posture read for the current section set. Next, confirm which crawler block caused the label before treating it as a site-wide conclusion.
Allow and disallow badges: Rule counts, not quality scores. Next, review whether each rule belongs to the right crawler group and whether any exception needs an allow line.
Warnings: Compatibility or completeness hints surfaced by the current draft. Next, decide whether each warning is intentional, harmless, or a publishing blocker.
JSON timestamp: The moment this draft state was exported. Next, use it to distinguish one review snapshot from another when multiple versions circulate.

Worked Examples

Blocking a staging host before launch

A team preparing a preview environment starts from the block-all preset and keeps a wildcard user-agent. The summary immediately highlights the high-risk posture, which is exactly what they want in this case. Before shipping, they still need to make sure the file lives on the staging host itself and not only on the production domain.

Keeping production crawlers out of private paths

An ecommerce site wants product pages crawled but does not want bots wasting time on /admin, /account, /cart, /checkout, or on-site search results. The team keeps one wildcard section, leaves the public site open, adds disallow lines for those private areas, and keeps the sitemap lines pointed at the main content indexes. The directive table is the quickest place to confirm that the policy is still focused instead of overblocking.
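
The finished file for that policy stays short and easy to audit (host and sitemap values are placeholders):

    User-agent: *
    Disallow: /admin/
    Disallow: /account/
    Disallow: /cart/
    Disallow: /checkout/
    Disallow: /search

    Sitemap: https://www.example.com/sitemap_index.xml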

Cleaning up an inherited file for review

A marketing team inherits a hand-edited file full of comments, repeated user-agent blocks, and scattered sitemap lines. They paste the text into the import panel, let the tool rebuild the sections, then review the result in the table and JSON views. Even if the import is not a perfect editorial reconstruction of every comment, it gives them a far cleaner starting point for deciding what should stay, merge, or be removed.

FAQ:

Does blocking a path in robots.txt hide it from search results?

No. It blocks compliant crawlers from fetching the path, but search engines can still learn that the URL exists through links or other signals. Use stronger controls such as authentication or a noindex method when your goal is removal from search results.

Why does the generator warn when the host field is blank?

Because some teams still include a host line in their review process. An empty host does not make the file invalid. It simply means the draft is omitting an optional, non-standard line.

Why would I keep a blank Disallow: line?

It makes an intentionally open section explicit. The crawler outcome is still allow-all, but the empty line can make the draft easier for humans to review because the openness is written down instead of implied by missing rules.

Can I paste sitemap paths that begin with a slash?

Yes. When the site URL is present, the tool expands site-relative sitemap entries into full URLs before writing the final file.

Why did import fail even though the text looked like robots.txt?

The parser requires at least one User-agent group. If the pasted text has only comments, sitemap lines, or partial directives with no crawler block, the tool stops rather than guessing what the intended policy should be.

Glossary:

User-agent group
A block of rules that applies to one crawler token or to the wildcard token *.
Longest match
The standard rule-selection approach where the most specific matching path wins when multiple allow and disallow rules apply.
Sitemap line
A pointer to a sitemap or sitemap index URL that helps crawlers discover the URLs you want them to know about.
Crawl-delay
A crawler-specific pacing hint expressed in seconds. It is not universally supported.
Root-level robots file
The plain-text file served as /robots.txt for one exact protocol, host, and port combination.

References: