Robots.txt Generator

Preset:

Choose Standard (allow all), Block all crawlers, Hide admin/search, Throttle polite crawlers, or Custom.

Site URL:

Enter the exact origin, such as https://www.example.com.

Sitemap URLs:

Enter one full URL per line; entries that start with a slash are resolved against the Site URL origin.

User-agent sections:

Add one section per crawler policy, then open rows to edit paths, delay, and comments.

User-agent:

Use * for all crawlers, or pick a named bot for a targeted rule group.

Custom user-agent:

Enter the exact user-agent product token, for example ExampleBot.

Allow paths:

Enter one root-relative path per line, such as / or /public/.

Disallow paths:

Enter one path per line; plain paths are normalized with a leading slash.

Crawl-delay (seconds):

Enter seconds between requests, or 0 to omit Crawl-delay for this section.

Comment:

Write the review note that should appear above this section.

Host directive:

Enter a hostname like www.example.com, or leave blank to omit Host.

Fallback crawl-delay (seconds):

Enter seconds for a fallback delay; section-level values override it.

Add header comment:

Turn on to add a generated # comment header above the first directive.

Allow-all placeholder:

Turn on to emit Disallow: with a blank value for open sections.

Sort directives:

Turn on to alphabetize each section's path rules before export.

Blank lines between sections:

Turn on for one blank line between rendered crawler sections.

Import robots.txt:

Paste plain robots.txt content, then choose Import to rebuild the form fields.



A robots policy is one of the first crawl signals a well-behaved crawler reads. Before requesting ordinary pages, the crawler looks for /robots.txt at the root of the exact origin, chooses the user-agent group that applies to its product token, and treats the listed path rules as public instructions about which URLs it should fetch or skip.

The file is small, but the planning around it is not trivial. Account screens, internal search results, cart paths, faceted category URLs, generated calendars, preview hosts, and old test folders can all create crawl waste or confusing discovery signals. A useful policy keeps those areas out of routine fetching while leaving public pages, shared assets, and sitemap hints easy for compliant crawlers to find.

User-agent group: The crawler identity the following rules apply to, with * used as the wildcard group when no narrower group is needed.
Path rule: Allow and Disallow values are matched against URL paths, so spelling, slashes, and rule specificity change the result.
Sitemap hint: A full sitemap URL that helps crawlers discover intended public URLs but does not override a blocking rule.

Common robots.txt planning situations and safer companion controls
Situation	Robots.txt can help with	Use another control when
Public site launch	Keeping the default crawl policy open and pointing crawlers to sitemap files.	A page must be hidden from search results entirely.
Staging or maintenance host	Publishing a broad `Disallow: /` rule for a temporary host.	The host contains private data that needs login protection.
Duplicate or low-value paths	Discouraging crawlers from fetching search, cart, checkout, or account URLs.	Canonical URLs, redirects, or indexing rules are needed for search appearance.

The common mistake is treating robots.txt as a privacy or removal system. It is not. The file is public, non-compliant bots can ignore it, and a disallowed URL can still be discovered from links or other signals. Private content needs authentication, and search-result removal usually needs noindex, deletion, redirects, or a removal workflow rather than a crawl hint alone.

A useful policy is short, grouped by real crawler differences, and easy to audit before publication. It should make dangerous choices obvious, especially a full-site block, and it should pair crawl restrictions with sitemap hints so crawlers can still find the URL inventory that is meant to be discovered.

How to Use This Tool:

Start with the crawl posture you want, then review the generated text and publish checks before copying the file to the site root.

Choose a Preset. Standard (allow all) opens the site, Block all crawlers fits staging or maintenance hosts, Hide admin/search adds common private and duplicate paths, Throttle polite crawlers adds a delay hint, and Custom leaves the policy under your control.
Set Site URL to the exact origin you plan to publish from. The host display and any sitemap line that starts with a slash use this value as their base.
Add Sitemap URLs one per line. Use full URLs for the final output, or use leading-slash paths only when they should resolve against the current Site URL.
Open User-agent sections to edit crawler tokens, Allow paths, Disallow paths, optional Crawl-delay, and the comment that appears above that group. Use a named crawler section only when its policy really differs from the wildcard group.
Use Advanced for the legacy Host directive, fallback delay, generated header comment, allow-all placeholder, sorted path rules, blank lines between sections, or Import robots.txt.
If import shows No user-agent rules detected, paste a complete robots file with at least one User-agent group. Comments, sitemap lines, and loose directives are not enough to rebuild editable sections.
Review Robots.txt, Directive Table, Publish Check, and JSON. Treat Robots.txt as the publication draft and Publish Check as the list of warnings to fix or deliberately accept.

Interpreting Results:

The summary label describes the current crawl posture. Crawling open means the generated sections contain no disallow paths. Selective crawling means at least one path is blocked. All crawling blocked means a section contains Disallow: / or Disallow: /*, which is normal for a staging host and risky on a production host.
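The three summary labels follow a simple rule that can be sketched as a small function. This is a hypothetical Python sketch, not the tool's own code: it assumes each section is a dict with a `disallow` list, and it treats an empty Disallow value (the allow-all placeholder) as blocking nothing.

```python
# Hypothetical sketch: sections are assumed to be dicts with a "disallow" list.
FULL_BLOCK_VALUES = {"/", "/*"}

def summary_label(sections):
    """Classify a draft as open, selective, or fully blocked."""
    # Empty values ("Disallow:" placeholders) block nothing, so skip them.
    disallow_paths = [p for s in sections for p in s.get("disallow", []) if p]
    if any(p in FULL_BLOCK_VALUES for p in disallow_paths):
        return "All crawling blocked"
    if disallow_paths:
        return "Selective crawling"
    return "Crawling open"
```

A staging draft such as `[{"user_agent": "*", "disallow": ["/"]}]` is classified as fully blocked, while a draft whose only rule is the blank placeholder stays open.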

Publish Check is a review aid, not proof that the live site is configured correctly. A missing sitemap warning means discovery may be weaker for content-heavy sites. A blank host warning points to a legacy review expectation, not a core Robots Exclusion Protocol requirement. A crawl-delay warning matters because crawler support differs, and Google documents that it ignores that field.
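A minimal sketch of those three checks follows, assuming the draft is available as a sitemap list, a host string, and a list of per-section delays. The function name and the warning wording are hypothetical; the real Publish Check may word or order its warnings differently.

```python
def publish_warnings(sitemaps, host, section_delays, fallback_delay=0):
    """Return review warnings for a draft (hypothetical wording)."""
    warnings = []
    if not sitemaps:
        warnings.append("No Sitemap line: discovery may be weaker on content-heavy sites.")
    if not host.strip():
        warnings.append("Host is blank: a legacy expectation, not a core REP requirement.")
    # Any positive delay, fallback or per section, triggers the support caveat.
    if fallback_delay > 0 or any(d > 0 for d in section_delays):
        warnings.append("Crawl-delay is set: crawler support differs and Google ignores it.")
    return warnings
```

An empty result corresponds to the "No blocking checks detected" status; each returned string would become one Warning row to fix or deliberately accept.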

How to read robots.txt generator result signals
Result signal	Meaning	Verification step
`Allow` and `Disallow` counts	Counts of rendered path rules across user-agent sections.	Open `Directive Table` and confirm each path sits under the intended crawler token.
`Blocks all crawlers`	At least one section blocks every path for its matching crawler group.	Check whether the blocking group is the wildcard `*` group before publishing to production.
`No blocking checks detected`	The current draft has no built-in warnings for host, sitemap, or delay settings.	Still load the final file at the exact protocol, host, and port that crawlers will use.
`JSON`	A structured snapshot of the same policy settings.	Use it for review or handoff, but publish the plain text from `Robots.txt`.

After deployment, visit the live /robots.txt file and test representative URLs in your crawler or search-console tools. A correct file on the HTTPS www origin does not govern the HTTP version, the bare domain, another subdomain, or a different port.
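Because each file is scoped to one scheme, host, and port, the governing robots.txt URL can be derived from any page URL with Python's standard `urllib.parse`. This sketch keeps the host and any explicit port as written and does not normalize default ports or host case, so treat it as an illustration rather than a full origin comparison.

```python
from urllib.parse import urlsplit, urlunsplit

def robots_url(page_url):
    """Return the /robots.txt URL that governs page_url (same scheme, host, port)."""
    parts = urlsplit(page_url)
    # netloc carries the host plus any explicit port; path, query, fragment are replaced.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

for url in [
    "https://www.example.com/shop?id=1",
    "http://www.example.com/shop",
    "https://example.com/shop",
    "https://www.example.com:8443/shop",
]:
    print(robots_url(url))
```

Each of the four page URLs maps to a different robots.txt file, which is why every canonical host and protocol variant needs its own file or redirect.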

Technical Details:

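The Robots Exclusion Protocol is a host-scoped rule system. Crawlers fetch a UTF-8 plain-text file named /robots.txt, pick the group whose User-agent line matches their product token (compared case-insensitively), combine the rules if several groups name that token, fall back to the * group when no named group matches, and compare URL paths against the rules in the selected group. The rules govern fetching, not indexing, authorization, ranking, or canonical choice.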
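Group selection can be sketched as follows, assuming the file has already been parsed into `(user_agents, rules)` pairs in file order. The `select_rules` name and the tuple shape are hypothetical. The sketch follows the RFC 9309 order: case-insensitive token match with every matching group combined, then the `*` group, then no rules at all.

```python
def select_rules(groups, product_token):
    """groups: list of (user_agents, rules) pairs in file order (hypothetical shape)."""
    token = product_token.lower()
    # Collect every group naming this crawler's token, compared case-insensitively.
    matched = [rules for agents, rules in groups
               if token in (a.lower() for a in agents)]
    if not matched:
        # No named group: fall back to the wildcard group, if any.
        matched = [rules for agents, rules in groups if "*" in agents]
    # Matching groups are combined into one rule list; no groups means no rules.
    return [rule for rules in matched for rule in rules]

groups = [
    (["*"], [("disallow", "/admin/")]),
    (["Googlebot"], [("disallow", "/private/")]),
]
print(select_rules(groups, "googlebot"))
print(select_rules(groups, "ExampleBot"))
```

Note that the Googlebot group does not inherit the `/admin/` rule from the wildcard group, which is why bot-specific groups must repeat any general restrictions they still need.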

Path matching starts at the beginning of the URL path and is normally case-sensitive. When more than one rule matches, the most specific rule wins, meaning the one with the longest matching path. If an Allow and a Disallow rule are equally specific, RFC 9309 says the Allow rule should be used. An empty Disallow: line means nothing is blocked for that group.
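The longest-match rule can be sketched with plain prefix matching. This hypothetical `is_allowed` helper ignores `*` and `$` wildcards and percent-encoding normalization, and it assumes the path passed in already includes any query string, as in `/search?q=blue`.

```python
def is_allowed(path, rules):
    """rules: list of (kind, value) pairs, kind being "allow" or "disallow"."""
    best_len, allowed = -1, True  # no matching rule: the path is allowed
    for kind, value in rules:
        if not value:
            continue  # an empty value such as "Disallow:" matches nothing
        if path.startswith(value):
            length = len(value.encode("utf-8"))  # RFC 9309 compares octets
            # A longer match wins; on a tie, Allow beats Disallow.
            if length > best_len or (length == best_len and kind == "allow"):
                best_len, allowed = length, kind == "allow"
    return allowed

rules = [("disallow", "/admin/"), ("allow", "/admin/help/")]
print(is_allowed("/admin/report", rules))
print(is_allowed("/admin/help/index.html", rules))
```

The two calls reproduce the first two matching examples: `/admin/report` is blocked, while `/admin/help/index.html` is allowed by the longer Allow path.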

Rule Core:

Core robots.txt rules and review boundaries
Element	Core behavior	Review boundary
`/robots.txt`	Applies to the same scheme, host, and port where the file is served.	Publish separate files or redirects for each canonical host that crawlers may request.
`User-agent`	Starts a rule group for a crawler token such as `*`, `Googlebot`, or `Bingbot`.	Bot-specific groups should repeat any general restrictions they still need.
`Allow`	Permits a matching path, often as a narrow exception inside a broader block.	Useful when `/admin/help/` should remain crawlable while `/admin/` is blocked.
`Disallow`	Asks compliant crawlers not to fetch a matching path.	Does not hide the path, protect the content, or remove an already known URL from search.
`*` and `$`	Represent wildcard matching and end-of-path anchoring for crawlers that support the standard syntax.	Test crawler-specific behavior before relying on complex patterns.
`Sitemap`	Lists a full sitemap or sitemap index URL outside the user-agent groups.	Should point to URL inventories that belong to the declared site and are meant for discovery.
`Crawl-delay` and `Host`	Common legacy or crawler-specific lines outside the core RFC rule set.	Do not depend on them for rate limiting, canonical host selection, or security.
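For crawlers that support `*` and `$`, one way to implement matching is to translate each rule value into a regular expression: `*` becomes `.*`, a trailing `$` anchors the end, and everything else is escaped literally. This sketch assumes `$` is only meaningful as the last character. Empty values should be handled before calling it, because an empty pattern would match every path.

```python
import re

def pattern_to_regex(value):
    """Translate a robots.txt path value using * and $ into a compiled regex."""
    anchored = value.endswith("$")
    if anchored:
        value = value[:-1]
    # Escape literal text; each * becomes ".*" (any run of characters).
    body = ".*".join(re.escape(part) for part in value.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def rule_matches(path, value):
    # re.match anchors at the start of the path, mirroring robots.txt prefix matching.
    return pattern_to_regex(value).match(path) is not None

print(rule_matches("/report.pdf", "/*.pdf$"))
print(rule_matches("/report.pdf?x=1", "/*.pdf$"))
```

Substituting `rule_matches(path, value)` for a plain prefix test in a longest-match check extends it to wildcards, but crawler-specific precedence for wildcard patterns should still be tested before relying on it.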

Matching Examples:

Examples of robots.txt path matching outcomes
Requested path	Relevant rules	Expected crawl result
`/admin/report`	`Disallow: /admin/`	Blocked for the matching group.
`/admin/help/index.html`	`Disallow: /admin/` and `Allow: /admin/help/`	Allowed because the longer Allow path is more specific.
`/search?q=blue`	`Disallow: /search`	Blocked because matching begins at the path start.
`/`	`Disallow:`	Allowed because the disallow value is empty.

Transformation Core:

The generated draft is assembled from the visible site URL, sitemap list, user-agent sections, and advanced rendering choices. Path entries that lack a leading slash are given one, duplicate path entries are collapsed, optional sorting makes diffs easier to review, and sitemap entries that begin with a slash are expanded against the selected site origin.
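The normalization steps can be sketched in Python as below. The function names are hypothetical, and the sketch assumes path lines and sitemap lines arrive as lists of strings; `urljoin` from the standard library resolves a leading-slash sitemap entry against the Site URL.

```python
from urllib.parse import urljoin

def normalize_paths(lines, sort=False):
    """Add missing leading slashes, drop blanks and duplicates, optionally sort."""
    paths = []
    for raw in lines:
        path = raw.strip()
        if path:
            paths.append(path if path.startswith("/") else "/" + path)
    unique = list(dict.fromkeys(paths))  # drop duplicates, keep first-seen order
    return sorted(unique) if sort else unique

def expand_sitemaps(lines, site_url):
    """Resolve leading-slash sitemap entries against the Site URL origin."""
    out = []
    for raw in lines:
        entry = raw.strip()
        if entry:
            out.append(urljoin(site_url, entry) if entry.startswith("/") else entry)
    return out

print(normalize_paths(["admin/", "/search", "/admin/"]))
print(expand_sitemaps(["/sitemap.xml"], "https://www.example.com"))
```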

Site URL -> host display + base for relative Sitemap lines

User-agent sections -> comments + User-agent + Allow + Disallow + Crawl-delay

Advanced choices -> Host line + header comment + sorting + section spacing

Final draft -> plain robots.txt + directive rows + publish warnings + JSON snapshot
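The rendering stage can be sketched as a single function over a small section model. Everything here is an assumption for illustration: the `Section` fields, the parameter names, and especially the placement of Host and Sitemap lines after the user-agent groups, which the tool may order differently.

```python
from dataclasses import dataclass, field

# Hypothetical section model; field names are assumptions, not the tool's schema.
@dataclass
class Section:
    user_agent: str
    allow: list = field(default_factory=list)
    disallow: list = field(default_factory=list)
    crawl_delay: float = 0
    comment: str = ""

def render(sections, sitemaps=(), host="", header="",
           blank_lines=True, allow_all_placeholder=True):
    """Render sections plus Host/Sitemap lines into robots.txt text."""
    blocks = []
    for s in sections:
        lines = [f"# {c}" for c in s.comment.splitlines()]  # review note above the group
        lines.append(f"User-agent: {s.user_agent}")
        lines += [f"Allow: {p}" for p in s.allow]
        lines += [f"Disallow: {p}" for p in s.disallow]
        if not s.disallow and allow_all_placeholder:
            lines.append("Disallow:")  # empty value: nothing is blocked
        if s.crawl_delay > 0:
            lines.append(f"Crawl-delay: {s.crawl_delay:g}")  # 0 omits the line
        blocks.append("\n".join(lines))

    sep = "\n\n" if blank_lines else "\n"
    # Assumption: Host and Sitemap lines follow all user-agent groups.
    trailer = [f"Host: {host}"] if host else []
    trailer += [f"Sitemap: {url}" for url in sitemaps]
    parts = ([f"# {header}"] if header else []) + [sep.join(blocks), "\n".join(trailer)]
    return sep.join(p for p in parts if p) + "\n"

print(render([Section("*", disallow=["/admin/"])],
             sitemaps=["https://www.example.com/sitemap.xml"]))
```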

Limitations and Privacy Notes:

Robots.txt is intentionally public and advisory. The generated file can reduce crawl waste for compliant crawlers, but it cannot publish the file, verify the live response, enforce crawler behavior, protect sensitive paths, or remove URLs from search results.

Keep secrets, customer data, private staging paths, and unreleased URLs out of comments and path names when possible. Anyone can read a published robots file.
Drafting, import parsing, and exports happen in the browser. If you copy, bookmark, or share the edited page URL, the current policy settings may be included in that shared browser state.
Use authentication, server rules, noindex, removals, redirects, or canonical tags when the goal is privacy, de-indexing, or search-result cleanup rather than crawl pacing.

Worked Examples:

Staging host before launch

A preview host on a staging subdomain uses Block all crawlers. Robots.txt shows a wildcard group with Disallow: /, the summary reads All crawling blocked, and Publish Check warns if no sitemap is present. That warning can be accepted for a staging host, but the same draft should not be copied to production.

Commerce site with duplicate paths

A public store starts from Hide admin/search for its canonical HTTPS host, keeps the sitemap URL, and reviews the disallow list for /admin, /login, /account, /checkout, /cart, /cgi-bin/, and /search. The summary should read Selective crawling, and Directive Table should show those paths under the wildcard user-agent unless a named crawler needs different rules.

Import that cannot rebuild sections

A pasted file containing only comments, Host:, and Sitemap: lines returns No user-agent rules detected. Add a group such as User-agent: * with the intended Allow or Disallow lines, import again, then check Robots.txt and Publish Check before exporting.
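The import rule can be sketched as a small parser that collects consecutive User-agent lines into one group and attaches the Allow and Disallow lines that follow. The `parse_robots` name and the dict shape are hypothetical, and the sketch skips Crawl-delay and other extension lines; it raises an error with the same message when no User-agent group is found.

```python
def parse_robots(text):
    """Group robots.txt lines into sections; raise if no User-agent group exists."""
    sections, current, last_was_agent = [], None, False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding space
        if ":" not in line:
            continue  # blank or comment-only lines do not end a User-agent run
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()
        if key == "user-agent":
            # Consecutive User-agent lines share one group; otherwise start a new one.
            if current is None or not last_was_agent:
                current = {"user_agents": [], "allow": [], "disallow": []}
                sections.append(current)
            current["user_agents"].append(value)
            last_was_agent = True
            continue
        last_was_agent = False
        if current is not None and key in ("allow", "disallow"):
            current[key].append(value)
    if not sections:
        raise ValueError("No user-agent rules detected")
    return sections
```

Passing the comments-only file from this example raises the error, while adding `User-agent: *` with its rules returns one editable section.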

FAQ:

Does robots.txt hide private pages?

No. It is a public crawl request for compliant bots. Put private pages behind authentication or other access controls instead of listing sensitive paths in a public file.

Why can a blocked URL still appear in search?

A search engine may learn the URL from links even if it does not crawl the content. If the goal is removal from search, remove the block long enough for a noindex directive to be seen, require login, delete the page, or use the relevant removal process.

Should every bot get its own section?

Usually no. A wildcard User-agent: * section is easier to audit when the same policy applies to all compliant crawlers. Add named sections only when rules, delays, or comments need to differ.

What should I enter for sitemap lines?

Use full sitemap or sitemap index URLs when possible. If you enter a path that starts with /, the draft expands it against Site URL, so check the final Sitemap: lines before publishing.

Can Crawl-delay control Googlebot?

No. Google documents that it does not support Crawl-delay. The line may matter for some other crawlers, but server controls and crawler consoles are better for serious rate management.

Does import validate a live robots file?

No. Import parses pasted text into editable sections. After exporting, load the live /robots.txt URL and test representative paths with the crawler tools you rely on.

Glossary:

User-agent group: A set of crawl rules for one crawler token or the wildcard token *.
Allow: A path rule that permits crawling, often as a narrow exception inside a broader disallow.
Disallow: A path rule asking compliant crawlers not to fetch matching URLs.
Sitemap: A discovery hint that points crawlers to a sitemap or sitemap index URL.
Crawl-delay: A nonstandard pacing hint in seconds that some crawlers ignore.
Host directive: A legacy host preference line that is not part of the core RFC rule set.
Noindex: An indexing instruction used when the goal is keeping a page out of search results.

References:

RFC 9309: Robots Exclusion Protocol, RFC Editor, August 2022.
How Google Interprets the robots.txt Specification, Google Search Central, last updated 2026-04-14.
Introduction to robots.txt, Google Search Central, last updated 2025-12-10.
Sitemaps XML Format, sitemaps.org, last updated 2016-11-21.
How to create a robots.txt file for your website, Simplified Guide.