
A robots policy is one of the first crawl signals a well-behaved crawler reads. Before requesting ordinary pages, the crawler looks for /robots.txt at the root of the exact origin, chooses the user-agent group that applies to its product token, and treats the listed path rules as public instructions about which URLs it should fetch or skip.
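
As a minimal sketch of the lookup described above, the robots.txt location for any page can be derived by keeping the scheme, host, and port and replacing everything else. The function name `robots_url_for` is a hypothetical helper for illustration, not part of the tool.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url_for(page_url: str) -> str:
    """Return the robots.txt URL that governs page_url (same scheme, host, port)."""
    parts = urlsplit(page_url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"not an absolute http(s) URL: {page_url!r}")
    # Keep scheme and netloc (host plus any explicit port);
    # drop the path, query string, and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url_for("https://www.example.com/shop/cart?id=1"))
# prints https://www.example.com/robots.txt
```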

The file is small, but the planning around it is not trivial. Account screens, internal search results, cart paths, faceted category URLs, generated calendars, preview hosts, and old test folders can all create crawl waste or confusing discovery signals. A useful policy keeps those areas out of routine fetching while leaving public pages, shared assets, and sitemap hints easy for compliant crawlers to find.

User-agent group
The crawler identity the following rules apply to, with * used as the wildcard group when no narrower group is needed.
Path rule
Allow and Disallow lines compare against URL paths, so spelling, slashes, and rule specificity change the result.
Sitemap hint
A full sitemap URL that helps crawlers discover intended public URLs but does not override a blocking rule.
Common robots.txt planning situations and safer companion controls

| Situation | Robots.txt can help with | Use another control when |
| --- | --- | --- |
| Public site launch | Keeping the default crawl policy open and pointing crawlers to sitemap files. | A page must be hidden from search results entirely. |
| Staging or maintenance host | Publishing a broad Disallow: / rule for a temporary host. | The host contains private data that needs login protection. |
| Duplicate or low-value paths | Discouraging crawlers from fetching search, cart, checkout, or account URLs. | Canonical URLs, redirects, or indexing rules are needed for search appearance. |

The common mistake is treating robots.txt as a privacy or removal system. It is not. The file is public, non-compliant bots can ignore it, and a disallowed URL can still be discovered from links or other signals. Private content needs authentication, and search-result removal usually needs noindex, deletion, redirects, or a removal workflow rather than a crawl hint alone.

[Figure: robots.txt at a glance. Host root: /robots.txt, scoped to the same scheme, host, and port. Crawler group: User-agent: * with Allow: /public/ and Disallow: /admin/; the longest matching path wins. Discovery hint: Sitemap: with a full URL, placed outside groups. Public crawl guidance, not access control.]

A useful policy is short, grouped by real crawler differences, and easy to audit before publication. It should make dangerous choices obvious, especially a full-site block, and it should pair crawl restrictions with sitemap hints so crawlers can still find the URL inventory that is meant to be discovered.

How to Use This Tool:

Start with the crawl posture you want, then review the generated text and publish checks before copying the file to the site root.

  1. Choose a Preset. Standard (allow all) opens the site, Block all crawlers fits staging or maintenance hosts, Hide admin/search adds common private and duplicate paths, Throttle polite crawlers adds a delay hint, and Custom leaves the policy under your control.
  2. Set Site URL to the exact origin you plan to publish from. The host display and any sitemap line that starts with a slash use this value as their base.
  3. Add Sitemap URLs one per line. Use full URLs for final output, or use leading slash paths only when they should resolve against the current Site URL.
  4. Open User-agent sections to edit crawler tokens, Allow paths, Disallow paths, optional Crawl-delay, and the comment that appears above that group. Use a named crawler section only when its policy really differs from the wildcard group.
  5. Use Advanced for the legacy Host directive, fallback delay, generated header comment, allow-all placeholder, sorted path rules, blank lines between sections, or Import robots.txt.
  6. If import shows No user-agent rules detected, paste a complete robots file with at least one User-agent group. Comments, sitemap lines, and loose directives are not enough to rebuild editable sections.
  7. Review Robots.txt, Directive Table, Publish Check, and JSON. Treat Robots.txt as the publication draft and Publish Check as the list of warnings to fix or deliberately accept.

Interpreting Results:

The summary label describes the current crawl posture. Crawling open means the generated sections contain no disallow paths. Selective crawling means at least one path is blocked. All crawling blocked means a section contains Disallow: / or Disallow: /*, which is normal for a staging host and risky on a production host.
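
The three labels above can be read as a small decision rule. The following is only a sketch of that rule as the text describes it; the `sections` shape (a list of dicts with a `disallow` list) is an assumption made for illustration and may not match the tool's internal data.

```python
def crawl_posture(sections):
    """Classify a draft using the three summary labels described above.

    sections: list of dicts, each with an optional 'disallow' list of paths
    (a hypothetical shape chosen for this sketch).
    """
    # Blank entries are ignored: an empty Disallow value blocks nothing.
    disallowed = [
        path.strip()
        for section in sections
        for path in section.get("disallow", [])
        if path.strip()
    ]
    if any(path in ("/", "/*") for path in disallowed):
        return "All crawling blocked"
    if disallowed:
        return "Selective crawling"
    return "Crawling open"


print(crawl_posture([{"user_agent": "*", "disallow": ["/admin/", "/search"]}]))
# prints Selective crawling
```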

Publish Check is a review aid, not proof that the live site is configured correctly. A missing sitemap warning means discovery may be weaker for content-heavy sites. A blank host warning points to a legacy review expectation, not a core Robots Exclusion Protocol requirement. A crawl-delay warning matters because crawler support differs, and Google documents that it ignores that field.
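
To make the three warnings concrete, here is a rough sketch of how such checks could be expressed. The function name, parameters, and warning wording are all assumptions for illustration; the tool's actual checks and messages may differ.

```python
def publish_warnings(host, sitemaps, sections, fallback_delay=0):
    """Return review warnings for a draft (hypothetical checks and wording)."""
    warnings = []
    if not [s for s in sitemaps if s.strip()]:
        warnings.append(
            "No sitemap listed; discovery may be weaker for content-heavy sites."
        )
    if not host.strip():
        warnings.append(
            "Host is blank; this is a legacy review expectation, "
            "not a core protocol requirement."
        )
    uses_delay = fallback_delay > 0 or any(
        section.get("crawl_delay", 0) > 0 for section in sections
    )
    if uses_delay:
        warnings.append(
            "Crawl-delay is set; crawler support differs and Google ignores it."
        )
    return warnings


draft_sections = [{"user_agent": "*", "disallow": ["/admin/"], "crawl_delay": 10}]
for warning in publish_warnings("", [], draft_sections):
    print("-", warning)
```

An empty list corresponds to the "No blocking checks detected" state, which still says nothing about what the live site actually serves.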

How to read robots.txt generator result signals

| Result signal | Meaning | Verification step |
| --- | --- | --- |
| Allow and Disallow counts | Counts of rendered path rules across user-agent sections. | Open Directive Table and confirm each path sits under the intended crawler token. |
| Blocks all crawlers | At least one section blocks every path for its matching crawler group. | Check whether the blocking group is the wildcard * group before publishing to production. |
| No blocking checks detected | The current draft has no built-in warnings for host, sitemap, or delay settings. | Still load the final file at the exact protocol, host, and port that crawlers will use. |
| JSON | A structured snapshot of the same policy settings. | Use it for review or handoff, but publish the plain text from Robots.txt. |

After deployment, visit the live /robots.txt file and test representative URLs in your crawler or search-console tools. A correct file on the HTTPS www origin does not govern the HTTP version, the bare domain, another subdomain, or a different port.
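
The scoping rule can be checked mechanically. This sketch compares scheme, host, and port, and treats an omitted port as the scheme default (80 for http, 443 for https); the helper names are hypothetical.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def robots_scope(url):
    """Return the (scheme, host, port) triple that a robots.txt file is scoped to."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    # An omitted port falls back to the scheme's default.
    port = parts.port or DEFAULT_PORTS.get(scheme)
    return (scheme, host, port)


def same_robots_scope(url_a, url_b):
    """True when both URLs are governed by the same robots.txt file."""
    return robots_scope(url_a) == robots_scope(url_b)


live_file = "https://www.example.com/robots.txt"
for candidate in (
    "https://www.example.com/products/",   # same origin
    "http://www.example.com/products/",    # different scheme
    "https://example.com/products/",       # bare domain
    "https://www.example.com:8443/admin",  # different port
):
    print(candidate, same_robots_scope(live_file, candidate))
```

Only the first candidate prints True; each of the others needs its own file or a redirect.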

Technical Details:

The Robots Exclusion Protocol is a host-scoped rule system. Crawlers fetch a UTF-8 plain-text file named /robots.txt, select the most relevant user-agent group, and compare URL paths against the rules in that group. The rules govern fetching, not indexing, authorization, ranking, or canonical choice.

Path matching starts at the beginning of the URL path and is normally case-sensitive. When more than one rule matches, the most specific rule wins, where specificity means the longest matching path. If an Allow and a Disallow rule are equally specific, the less restrictive Allow result is commonly used. An empty Disallow: line means nothing is blocked for that group.
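
The precedence rules in the paragraph above fit in a few lines. This is a simplified sketch that handles plain prefix rules only: it ignores the * and $ pattern characters and percent-encoding, and the `(directive, value)` rule shape is an assumption for illustration.

```python
def is_allowed(path, rules):
    """Decide whether path may be fetched under one user-agent group.

    rules: list of (directive, value) pairs, for example
    [("Disallow", "/admin/"), ("Allow", "/admin/help/")].
    """
    best_length = -1
    allowed = True  # no matching rule means the path may be fetched
    for directive, value in rules:
        if not value:
            continue  # an empty Disallow: value blocks nothing
        if not path.startswith(value):
            continue  # matching starts at the path start and is case-sensitive
        is_allow = directive.lower() == "allow"
        # The longest matching rule wins; on a tie, Allow wins.
        if len(value) > best_length or (len(value) == best_length and is_allow):
            best_length = len(value)
            allowed = is_allow
    return allowed


group = [("Disallow", "/admin/"), ("Allow", "/admin/help/")]
print(is_allowed("/admin/report", group))           # prints False
print(is_allowed("/admin/help/index.html", group))  # prints True
```

Because matching is case-sensitive, `/Admin/report` is not covered by `Disallow: /admin/` in this sketch, which is a common source of surprises on case-insensitive servers.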

Rule Core:

Core robots.txt rules and review boundaries

| Element | Core behavior | Review boundary |
| --- | --- | --- |
| /robots.txt | Applies to the same scheme, host, and port where the file is served. | Publish separate files or redirects for each canonical host that crawlers may request. |
| User-agent | Starts a rule group for a crawler token such as *, Googlebot, or Bingbot. | Bot-specific groups should repeat any general restrictions they still need. |
| Allow | Permits a matching path, often as a narrow exception inside a broader block. | Useful when /admin/help/ should remain crawlable while /admin/ is blocked. |
| Disallow | Asks compliant crawlers not to fetch a matching path. | Does not hide the path, protect the content, or remove an already known URL from search. |
| * and $ | Represent wildcard matching and end-of-path anchoring for crawlers that support the standard syntax. | Test crawler-specific behavior before relying on complex patterns. |
| Sitemap | Lists a full sitemap or sitemap index URL outside the user-agent groups. | Should point to URL inventories that belong to the declared site and are meant for discovery. |
| Crawl-delay and Host | Common legacy or crawler-specific lines outside the core RFC rule set. | Do not depend on them for rate limiting, canonical host selection, or security. |
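
For the * and $ characters, one way to reason about a pattern is to translate it into a regular expression. The sketch below assumes the common interpretation (* matches any run of characters, a trailing $ anchors the end of the URL path and query); individual crawlers may differ, so treat it as a testing aid rather than a reference implementation. The helper names are hypothetical.

```python
import re


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regular expression."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape literal pieces and join them with ".*" wherever a "*" appeared.
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def pattern_matches(pattern, path):
    """True when pattern matches path, starting from the first character."""
    return pattern_to_regex(pattern).match(path) is not None


print(pattern_matches("/*.pdf$", "/files/report.pdf"))      # prints True
print(pattern_matches("/*.pdf$", "/files/report.pdf?x=1"))  # prints False
```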

Matching Examples:

Examples of robots.txt path matching outcomes

| Requested path | Relevant rules | Expected crawl result |
| --- | --- | --- |
| /admin/report | Disallow: /admin/ | Blocked for the matching group. |
| /admin/help/index.html | Disallow: /admin/ and Allow: /admin/help/ | Allowed because the allow path is more specific. |
| /search?q=blue | Disallow: /search | Blocked because matching begins at the path start. |
| / | Disallow: | Allowed because the disallow value is empty. |

Transformation Core:

The generated draft is assembled from the visible site URL, sitemap list, user-agent sections, and advanced rendering choices. Plain path entries are normalized toward root-relative paths, duplicate path entries are collapsed, optional sorting makes diffs easier to review, and sitemap entries that begin with a slash are expanded from the selected site origin.

Site URL -> host display + base for relative Sitemap lines

User-agent sections -> comments + User-agent + Allow + Disallow + Crawl-delay

Advanced choices -> Host line + header comment + sorting + section spacing

Final draft -> plain robots.txt + directive rows + publish warnings + JSON snapshot
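
The pipeline above can be sketched as a single rendering function. Everything here is an illustrative assumption rather than the tool's code: the function and parameter names, the `sections` dict shape, and the exact line order. It covers path normalization, duplicate removal, optional sorting, the allow-all placeholder, section spacing, and slash-relative sitemap expansion, but omits the Host line and header comment.

```python
def normalize_path(entry):
    """Give a plain path entry a leading slash."""
    entry = entry.strip()
    return entry if entry.startswith("/") else "/" + entry


def unique(items):
    """Drop duplicates while keeping first-seen order."""
    return list(dict.fromkeys(items))


def render_robots(site_url, sitemaps, sections, sort_directives=False,
                  pad_sections=True, allow_all_placeholder=False):
    origin = site_url.rstrip("/")
    blocks = []
    for section in sections:
        lines = []
        if section.get("comment"):
            lines.append("# " + section["comment"])
        lines.append("User-agent: " + section.get("user_agent", "*"))
        for directive in ("Allow", "Disallow"):
            entries = section.get(directive.lower(), [])
            paths = unique(normalize_path(p) for p in entries if p.strip())
            if sort_directives:
                paths = sorted(paths)
            if directive == "Disallow" and not paths and allow_all_placeholder:
                lines.append("Disallow:")  # blank value: nothing is blocked
            lines += [f"{directive}: {p}" for p in paths]
        if section.get("crawl_delay", 0) > 0:
            lines.append(f"Crawl-delay: {section['crawl_delay']}")
        blocks.append("\n".join(lines))
    # Sitemap entries that begin with a slash are expanded from the site origin.
    sitemap_lines = [
        "Sitemap: " + (origin + s if s.startswith("/") else s)
        for s in unique(s.strip() for s in sitemaps if s.strip())
    ]
    body = ("\n\n" if pad_sections else "\n").join(blocks)
    if sitemap_lines:
        body += "\n\n" + "\n".join(sitemap_lines)
    return body + "\n"


print(render_robots(
    "https://www.example.com/",
    ["/sitemap.xml"],
    [{"user_agent": "*", "allow": ["public/"],
      "disallow": ["/search", "admin/", "/admin/"], "comment": "Default policy"}],
    sort_directives=True,
))
```

In this sketch the duplicate `admin/` entry collapses into a single `Disallow: /admin/` line, and `/sitemap.xml` is rendered as `Sitemap: https://www.example.com/sitemap.xml`.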

Limitations and Privacy Notes:

Robots.txt is intentionally public and advisory. The generated file can reduce crawl waste for compliant crawlers, but it cannot publish the file, verify the live response, enforce crawler behavior, protect sensitive paths, or remove URLs from search results.

  • Keep secrets, customer data, private staging paths, and unreleased URLs out of comments and path names when possible. Anyone can read a published robots file.
  • Drafting, import parsing, and exports happen in the browser. If you copy, bookmark, or share the edited page URL, the current policy settings may be included in that shared browser state.
  • Use authentication, server rules, noindex, removals, redirects, or canonical tags when the goal is privacy, de-indexing, or search-result cleanup rather than crawl pacing.

Worked Examples:

Staging host before launch

A preview host on a staging subdomain uses Block all crawlers. Robots.txt shows a wildcard group with Disallow: /, the summary reads All crawling blocked, and Publish Check warns if no sitemap is present. That warning can be accepted for a staging host, but the same draft should not be copied to production.

Commerce site with duplicate paths

A public store starts from Hide admin/search for its canonical HTTPS host, keeps the sitemap URL, and reviews the disallow list for /admin, /login, /account, /checkout, /cart, /cgi-bin/, and /search. The summary should read Selective crawling, and Directive Table should show those paths under the wildcard user-agent unless a named crawler needs different rules.

Import that cannot rebuild sections

A pasted file containing only comments, Host:, and Sitemap: lines returns No user-agent rules detected. Add a group such as User-agent: * with the intended Allow or Disallow lines, import again, then check Robots.txt and Publish Check before exporting.
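
The import behavior in this example can be approximated with a small parser. This is a sketch under assumptions, not the tool's parser: the function name and the returned dict shape are hypothetical, and it only mirrors the documented outcome that text without any User-agent group cannot be rebuilt into sections.

```python
def parse_robots(text):
    """Parse pasted robots.txt text into editable sections (illustrative shape)."""
    sections, sitemaps = [], []
    current = None
    last_was_agent = False
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "sitemap":
            sitemaps.append(value)  # sitemap lines live outside groups
            continue
        if field == "user-agent":
            if not last_was_agent:
                # Consecutive User-agent lines share one group.
                current = {"user_agents": [], "allow": [],
                           "disallow": [], "crawl_delay": None}
                sections.append(current)
            current["user_agents"].append(value)
            last_was_agent = True
            continue
        last_was_agent = False
        if current is None:
            continue  # loose directive before any group, e.g. Host:
        if field in ("allow", "disallow") and value:
            current[field].append(value)
        elif field == "crawl-delay":
            current["crawl_delay"] = value
    if not sections:
        raise ValueError("No user-agent rules detected")
    return {"sections": sections, "sitemaps": sitemaps}


pasted = "# note\nHost: www.example.com\nSitemap: https://www.example.com/sitemap.xml\n"
try:
    parse_robots(pasted)
except ValueError as error:
    print(error)  # prints No user-agent rules detected
```

Adding `User-agent: *` and a `Disallow:` line to the pasted text gives the parser a group to attach rules to, so the same call returns one section instead of raising.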

FAQ:

Does robots.txt hide private pages?

No. It is a public crawl request for compliant bots. Put private pages behind authentication or other access controls instead of listing sensitive paths in a public file.

Why can a blocked URL still appear in search?

A search engine may learn the URL from links even if it does not crawl the content. If the goal is removal from search, remove the block long enough for a noindex directive to be seen, require login, delete the page, or use the relevant removal process.

Should every bot get its own section?

Usually no. A wildcard User-agent: * section is easier to audit when the same policy applies to all compliant crawlers. Add named sections only when rules, delays, or comments need to differ.

What should I enter for sitemap lines?

Use full sitemap or sitemap index URLs when possible. If you enter a path that starts with /, the draft expands it against Site URL, so check the final Sitemap: lines before publishing.

Can Crawl-delay control Googlebot?

No. Google documents that it does not support Crawl-delay. The line may matter for some other crawlers, but server controls and crawler consoles are better for serious rate management.

Does import validate a live robots file?

No. Import parses pasted text into editable sections. After exporting, load the live /robots.txt URL and test representative paths with the crawler tools you rely on.

Glossary:

User-agent group
A set of crawl rules for one crawler token or the wildcard token *.
Allow
A path rule that permits crawling, often as a narrow exception inside a broader disallow.
Disallow
A path rule asking compliant crawlers not to fetch matching URLs.
Sitemap
A discovery hint that points crawlers to a sitemap or sitemap index URL.
Crawl-delay
A nonstandard pacing hint in seconds that some crawlers ignore.
Host directive
A legacy host preference line that is not part of the core RFC rule set.
Noindex
An indexing instruction used when the goal is keeping a page out of search results.
