A robots.txt file tells cooperative crawlers which parts of a site they may request and where to find supporting crawl metadata such as sitemap locations. It matters because crawl control is usually a balancing act: you want bots to discover public pages efficiently without wasting effort on admin paths, duplicate search pages, or staging areas that should stay out of routine crawl flows.
This generator turns that policy work into a structured draft. It can start from presets, build multiple user-agent sections, normalize allow and disallow paths, add sitemap and host lines, emit crawl-delay values where you want them, and sort directives before producing three synchronized outputs: the final text file, a directive table, and a JSON representation of the same rules.
That is useful when you are drafting a new policy, cleaning up a hand-written file, or importing an existing ruleset to make it easier to inspect. A site owner might begin with a standard allow-all preset, add a protected section for /admin and /search, then check the warnings panel before copying the text into the site root.
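A minimal draft in the spirit of that example might look like the following. This is illustrative only; the exact header comment, ordering, and spacing depend on the generator's settings:

```text
# Generated robots.txt draft (illustrative)
User-agent: *
Allow: /
Disallow: /admin
Disallow: /search

Sitemap: https://example.com/sitemap.xml
```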
The boundaries are just as important as the convenience. A crawler policy is not the same thing as access control, and a blocked path can still appear in search results if other signals expose the URL. The package itself reinforces that cautious reading with warnings about missing sitemap lines, a blank host field, and the uneven support that Crawl-delay receives across search engines.
This implementation is also intentionally narrower than the full universe of crawler directives. It focuses on user-agent groups, allow and disallow rules, crawl-delay, sitemap lines, an optional host line, and import parsing for those constructs. It does not try to become a general crawler-governance workbench for every vendor-specific extension.
Start with the simplest question: are you opening crawling, selectively shaping it, or blocking it? The summary line answers that immediately. "Crawling open" means the current sections impose no disallow rules, "Selective crawling" means some paths are restricted, and "All crawling blocked" appears when the ruleset effectively shuts the door for the matching crawlers.
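The three summary states can be sketched as a small classifier. The section shape (dicts with `allow` and `disallow` path lists) is an assumption for illustration, not the tool's internal representation:

```python
def summarize(sections):
    """Classify a draft as open, selective, or fully blocked.

    sections: list of dicts with "allow" and "disallow" path lists.
    """
    disallows = [p for s in sections for p in s["disallow"]]
    if not disallows:
        return "Crawling open"
    # A root disallow with no allow carve-outs shuts the door entirely.
    if any(p == "/" for p in disallows) and not any(s["allow"] for s in sections):
        return "All crawling blocked"
    return "Selective crawling"

print(summarize([{"allow": [], "disallow": []}]))          # Crawling open
print(summarize([{"allow": [], "disallow": ["/admin"]}]))  # Selective crawling
print(summarize([{"allow": [], "disallow": ["/"]}]))       # All crawling blocked
```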
The presets are useful only as starting positions. Standard gives a permissive base, Block all is useful for draft or staging environments, Hide admin/search blocks common private areas, and Throttle polite crawlers adds rate-limiting hints. Once the preset is applied, every field remains editable, so the safest habit is to treat the preset as a rough shape and then read the generated text line by line.
Use one section per crawler family when policy actually differs. If Googlebot, Bingbot, and all other crawlers should behave the same way, one wildcard section is cleaner than three almost-identical blocks. If a particular bot needs a different allow path or delay value, split it into its own section so the difference is obvious in both the rendered text and the directive table.
The host and sitemap fields deserve more attention than people often give them. Sitemap lines help crawlers find index files quickly, while the host field is an optional helper that some environments still want even though it sits outside the core RFC. If those fields are blank, the package warns you before publication because the file may still be syntactically valid while being operationally incomplete for the policy you intended.
Import is most valuable when you inherit a messy file. Paste the existing text, let the parser rebuild sections, and then review what came across cleanly. That is especially useful if comments, host lines, sitemap entries, and user-agent groups have been mixed together in a way that is hard to scan quickly in raw text.
The generator works from a normalized section list. Each section has one user-agent token, zero or more allow paths, zero or more disallow paths, an optional crawl-delay value, and an optional note. When the final file is assembled, the package can insert a generated header comment, add the note as a comment above the section, write the user-agent line, then emit allow and disallow directives in sorted order if sorting is enabled.
The package also distinguishes between an empty rule set and an intentionally open rule set. If a section has no allow or disallow entries and the allow-all placeholder setting is enabled, it writes a blank Disallow: line. That keeps the generated file explicit about allowing crawling rather than simply omitting directives and leaving readers to infer the intent.
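Putting those two paragraphs together, rendering one section could be sketched like this. The function name and section shape are assumptions for illustration; only the output format follows the behavior described above:

```python
def render_section(section, sort_rules=True, allow_all_placeholder=True):
    """Render one user-agent group as robots.txt lines (illustrative sketch)."""
    lines = []
    if section.get("note"):
        lines.append(f"# {section['note']}")  # note becomes a comment above the group
    lines.append(f"User-agent: {section['agent']}")
    allow = sorted(section["allow"]) if sort_rules else section["allow"]
    disallow = sorted(section["disallow"]) if sort_rules else section["disallow"]
    lines += [f"Allow: {p}" for p in allow]
    lines += [f"Disallow: {p}" for p in disallow]
    # An intentionally open section gets an explicit blank Disallow line.
    if not allow and not disallow and allow_all_placeholder:
        lines.append("Disallow:")
    if section.get("crawl_delay") is not None:
        lines.append(f"Crawl-delay: {section['crawl_delay']}")
    return "\n".join(lines)

print(render_section({"agent": "*", "allow": [], "disallow": ["/search", "/admin"]}))
```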
Host and sitemap lines are appended after all user-agent groups. The host value is cleaned from the site URL or from manual input, while sitemap entries are normalized so leading slashes can be expanded against the supplied site URL. The resulting JSON export mirrors that assembled policy, including the preset name, cleaned host, normalized sitemap list, sorted section rules, note fields, and a generation timestamp.
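The cleaning steps for host and sitemap values can be approximated with the standard library. These helper names are assumptions for illustration, not the package's API:

```python
from urllib.parse import urljoin, urlparse

def normalize_sitemaps(entries, site_url):
    """Expand site-relative sitemap entries against the site URL (sketch)."""
    out = []
    for entry in entries:
        entry = entry.strip()
        if not entry:
            continue
        # Leading-slash entries are expanded; absolute URLs pass through unchanged.
        out.append(urljoin(site_url, entry) if entry.startswith("/") else entry)
    return out

def clean_host(site_url):
    """Reduce a full site URL to the bare host for the optional Host line."""
    return urlparse(site_url).netloc

print(normalize_sitemaps(["/sitemap.xml"], "https://example.com"))
print(clean_host("https://example.com/some/path"))
```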
The import path is deliberately practical rather than ambitious. The parser reads lines, keeps leading comments as pending notes, creates a new section when it sees User-agent:, appends Allow, Disallow, and Crawl-delay directives to the current section, and captures the first host line plus all sitemap lines it encounters. If it finds no user-agent groups at all, it stops with an import error instead of pretending that the file can be reconstructed safely.
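The import behavior described above can be sketched as a line-oriented parser. This is a simplified reconstruction under assumed data shapes, not the package's actual importer:

```python
def parse_robots(text):
    """Rebuild sections from robots.txt text, in the spirit of the importer."""
    sections, host, sitemaps, pending_note = [], None, [], []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            pending_note.append(line.lstrip("# "))  # kept as a pending note
            continue
        key, _, value = line.partition(":")
        key, value = key.strip().lower(), value.strip()
        if key == "user-agent":
            sections.append({"agent": value, "allow": [], "disallow": [],
                             "crawl_delay": None,
                             "note": " ".join(pending_note) or None})
            pending_note = []
        elif key in ("allow", "disallow") and sections:
            sections[-1][key].append(value)
        elif key == "crawl-delay" and sections:
            sections[-1]["crawl_delay"] = value
        elif key == "sitemap":
            sitemaps.append(value)
        elif key == "host" and host is None:  # first host line wins
            host = value
    if not sections:
        raise ValueError("import error: no User-agent groups found")
    return {"sections": sections, "host": host, "sitemaps": sitemaps}
```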
All generation and export steps stay in the browser. The file text, table rows, JSON payload, CSV output, DOCX export, and copied row snippets are assembled locally. That is useful for privacy, but it also means responsibility stays with the user to validate the final draft against the live site, especially because crawler support differs across directives and search engines.
| Directive or element | How this package handles it | Compatibility note |
|---|---|---|
| User-agent | Creates one explicit section per selected crawler token, with support for custom values. | Specific groups are easier to audit than duplicated wildcard sections. |
| Allow and Disallow | Accepts one path per line and can sort rules before rendering or export. | These are core robots constructs and the most important policy surface in the tool. |
| Crawl-delay | Supports both a global default and per-section values. | Support varies by crawler, and Google explicitly does not honor it. |
| Sitemap | Normalizes absolute or site-relative entries and appends them after rule groups. | Useful for crawler discovery and broadly supported by major engines. |
| Host | Emits a cleaned host line when present. | This sits outside RFC 9309 and should be treated as an optional helper rather than a universal rule. |
| Package warning or behavior | What it means | Why it matters |
|---|---|---|
| Blank host warning | The package warns when the host field is empty. | A valid file can still be missing deployment details the user expected to include. |
| No sitemap warning | The package warns when no sitemap URL is present. | Crawlers may still work, but sitemap discovery becomes less explicit. |
| Crawl-delay warning | The package warns that some crawlers ignore the directive. | Prevents users from assuming delay controls are consistently enforced everywhere. |
| Import requires user-agent groups | If the parser finds no user-agent rules, import fails. | Stops a malformed or partial text block from being turned into misleading output. |
| Block-all summary | The summary changes to All crawling blocked when a section blocks everything for its crawler. | Draws attention to the most operationally risky draft state before publication. |
| View | What it contains | Exports available |
|---|---|---|
| Robots.txt | The final assembled plain-text policy with comments, user-agent groups, host, and sitemap lines. | Clipboard copy and text download. |
| Directive Table | One row per normalized section with summaries of allow, disallow, crawl-delay, and notes. | CSV copy, CSV download, DOCX export, and per-row copy. |
| JSON | The preset, cleaned host, sitemap list, normalized section objects, and generation timestamp. | Clipboard copy and JSON download. |
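As a rough sketch, a JSON export for the hide-admin/search scenario could be shaped like the fragment below. The field names here are illustrative assumptions, not the tool's documented schema:

```json
{
  "preset": "hide-admin-search",
  "host": "example.com",
  "sitemaps": ["https://example.com/sitemap.xml"],
  "sections": [
    {
      "userAgent": "*",
      "allow": ["/"],
      "disallow": ["/admin", "/search"],
      "crawlDelay": null,
      "note": "Public site, private admin paths"
    }
  ],
  "generatedAt": "2024-01-01T00:00:00Z"
}
```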
The text view is the source of truth because it shows exactly what would be published as robots.txt. The table is easier to scan for section-by-section differences, and the JSON view is better for automation or structured review. If those three views do not seem to tell the same story, stop and resolve the mismatch before you use the output.
The summary badges are quick signals, not a policy audit. A high disallow count does not automatically mean the draft is good, and a blocks-all badge does not tell you whether the file is appropriate for production or only for staging. What matters is whether the right crawler groups are getting the right paths under the right deployment circumstances.
The warnings panel is the best place to catch false confidence. A draft can look neat and still be operationally weak if it omits sitemap lines, relies on crawl-delay for Google, or uses a host line as though every crawler will treat it the same way. The package warns precisely because those issues are easy to miss in a clean-looking text block.
The biggest interpretation trap is confusing crawl control with indexing control or security. A disallow rule does not hide a URL from everyone on the internet, and it is not a substitute for authentication, authorization, or a noindex strategy when indexing behavior is the actual goal.
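A quick sanity check on what a draft actually blocks can be run locally with Python's standard-library parser, independent of any generator. The draft text and URLs here are illustrative:

```python
from urllib import robotparser

# An illustrative draft in the shape this generator produces.
draft = """\
User-agent: *
Disallow: /admin
Disallow: /search
"""

rp = robotparser.RobotFileParser()
rp.parse(draft.splitlines())

# Confirms crawl behavior only; it says nothing about indexing or security.
print(rp.can_fetch("*", "https://example.com/admin/users"))  # False
print(rp.can_fetch("*", "https://example.com/blog/post"))    # True
```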
A team selects the block-all preset for a staging site, keeps a wildcard user-agent, and leaves a note that explains the environment. The summary flips to All crawling blocked, which is exactly what they want for a short-lived pre-release site. Before publication they still need to confirm that the file is in the root directory and that the server returns a normal success status for it.
A site owner starts with the protect-sensitive preset and then adds a sitemap line for the public content tree. The directive table makes it easy to confirm that /admin, /login, and /search are disallowed while the rest of the site remains open. That is a good example of using the tool for selective crawl shaping rather than blanket blocking.
A marketing team inherits a text file with comments, several user-agent blocks, and a few sitemap lines scattered through the document. They paste it into the import box, let the tool rebuild the normalized sections, and then use the table and JSON views to check whether the original intent is still coherent. That review is often faster than editing the raw text blind.
Does a disallow rule keep a page out of search results? Not necessarily. Crawl blocking and index suppression are different problems, and a disallowed URL can still appear in search if other signals expose it.

Why does the package warn about Crawl-delay? Because support is uneven. Some crawlers honor it, while Google explicitly documents that it does not.

Is the Host line part of the standard? No. The package can emit it as an optional helper, but it sits outside RFC 9309 and should be treated as compatibility-sensitive.

What happens if imported text contains no User-agent lines? Import fails with an error instead of fabricating a ruleset, because the package needs explicit crawler groups to rebuild the draft safely.