Introduction:

Robots.txt files are small rule lists that tell automated crawlers which parts of a site they should visit and which paths they should skip. A robots.txt generator for staging sites helps you draft a policy that matches your intent and spot accidental blocks before you publish.

Teams use robots rules to keep test areas out of public discovery, to focus crawl attention on important pages, and to reduce wasted requests on duplicates such as internal search results. Robots.txt is guidance, not a lock, so anything truly private still needs proper access control.

You describe the site you are working on and write simple rules for each crawler group you care about. The generator turns that plan into a ready-to-publish file and a readable summary you can sanity-check before it goes live.

For example, a shop can allow product pages, block account and checkout areas, and include a sitemap reference so catalog changes are found sooner. A staging environment can use a stricter policy that blocks everything while still keeping production open to crawlers.

Keep notes explaining why a rule exists and revisit them whenever the site structure changes. If your rules mention sensitive paths, remember that robots.txt is public and should not contain secrets.

Technical Details:

A robots.txt file is a plain text policy that groups directives into sections for a specific User-agent token, such as * or Googlebot. Within each section, Allow and Disallow lines describe which Uniform Resource Locator (URL) path patterns a crawler may fetch, and Crawl-delay suggests a minimum pause between requests for crawlers that honor it.

This generator builds the final file by cleaning inputs, normalizing rule lines, and then emitting directives in a predictable order. Duplicate rules inside a section are removed, and you can optionally alphabetize Allow and Disallow lines to make diffs easier to review.

The summary banner is computed from rule counts and a simple full block check, and it labels policies as Crawling open, Selective crawling, or All crawling blocked. Warnings appear when Host is empty, when no Sitemap URLs are present, or when any Crawl-delay is set because some crawlers ignore it.

Processing pipeline

  1. Trim the User-agent value, defaulting to * when blank.
  2. Split Allow and Disallow text into lines, trimming whitespace and dropping empties.
  3. Normalize each rule by adding a leading slash unless it is an absolute URL, * wildcard, or $ anchor.
  4. De-duplicate rules within each section while keeping the first occurrence.
  5. Coerce Crawl-delay to a nonnegative number and round to two decimals.
  6. If a section delay is zero, inherit the Global Crawl-delay value.
  7. Optionally sort Allow and Disallow rules with locale-aware string comparison.
  8. If no rules exist, optionally emit a blank Disallow line as an allow-all placeholder.
  9. Write each section as note comment, User-agent, then Allow, Disallow, and Crawl-delay lines.
  10. Optionally insert a blank line between sections for readability.
  11. Append Host and each Sitemap line after all sections, then trim trailing blanks.
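
A minimal sketch of steps 2 through 7 in TypeScript; the helper names here (normalizeRule, normalizeRules, effectiveDelay) are illustrative, not the generator's actual internals.

// Step 3: add a leading slash unless the rule is an absolute URL,
// a * wildcard pattern, or a $ anchor pattern.
function normalizeRule(rule: string): string {
  const r = rule.trim();
  const isAbsoluteUrl = /^[a-z][a-z0-9+.-]*:\/\//i.test(r);
  if (r === "" || r.startsWith("/") || r.startsWith("*") || r.startsWith("$") || isAbsoluteUrl) {
    return r;
  }
  return "/" + r;
}

// Steps 2, 4, and 7: split into lines, trim, drop empties, normalize,
// de-duplicate keeping the first occurrence, and optionally sort.
function normalizeRules(text: string, sort: boolean): string[] {
  const rules = text
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line !== "")
    .map(normalizeRule);
  const unique = [...new Set(rules)];
  return sort ? unique.sort((a, b) => a.localeCompare(b)) : unique;
}

// Steps 5 and 6: coerce to a nonnegative number, round to two decimals,
// and inherit the global delay when the section delay is zero.
function effectiveDelay(sectionDelay: number, globalDelay: number): number {
  const clamp = (d: number) => (Number.isFinite(d) && d > 0 ? Math.round(d * 100) / 100 : 0);
  const section = clamp(sectionDelay);
  return section > 0 ? section : clamp(globalDelay);
}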

Symbols and directive meanings

Directive names and what they represent in a robots.txt policy
Symbol Meaning Unit/Datatype Source
User-agent Selects the crawler group the section targets. string Input
Allow Path pattern explicitly permitted for the selected agent. string per line Input
Disallow Path pattern the selected agent should avoid fetching. string per line Input
Crawl-delay Suggested delay between requests for crawlers that honor it. seconds (number) Input or derived
Host Optional preferred host value used by some crawlers. host string Input, cleaned
Sitemap Absolute sitemap location to help crawlers discover pages. URL string Input, normalized
# Comment line written above a section or at the file start. text line Derived
* Wildcard token in agent names or rule patterns where supported. character Input
$ End anchor in rule patterns for crawlers that honor it. character Input

Worked example

Scenario: you are preparing a staging site and want to block an admin area, publish a sitemap, and apply a polite request delay.

  • Site URL: https://www.example.com
  • Host directive: https://www.example.com
  • Sitemap URLs: /sitemap.xml and sitemap-index.xml
  • Global Crawl-delay: 2.345
  • Section: User-agent: *, Allow: /, Disallow: admin, Crawl-delay: 0, note “Block admin while testing”

The generator normalizes the delay to two decimals, then applies it as the effective delay because the section delay is zero.

d_in = 2.345
d_rounded = round(d_in × 100) / 100 = round(234.5) / 100 = 2.35
d_section = 0, d_global = 2.35, so d_effective = d_global = 2.35

It also cleans Host to www.example.com, prefixes the disallow rule as /admin, and expands sitemaps to absolute URLs.

# robots.txt generated for www.example.com on 2026-02-14T00:00:00.000Z
# Block admin while testing
User-agent: *
Allow: /
Disallow: /admin
Crawl-delay: 2.35

Host: www.example.com
Sitemap: https://www.example.com/sitemap.xml
Sitemap: https://www.example.com/sitemap-index.xml
The header timestamp is generated at runtime, so it will differ on each run when the header comment is enabled.
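
A rough sketch of that cleanup, using the WHATWG URL API; the function names are hypothetical and the real generator may differ in detail.

// Strip the protocol and any leading or trailing slashes from the Host value.
function cleanHost(host: string): string {
  return host
    .trim()
    .replace(/^[a-z][a-z0-9+.-]*:\/\//i, "")
    .replace(/^\/+|\/+$/g, "");
}

// Expand /path or relative sitemap entries against the Site URL base;
// absolute URLs pass through unchanged, and invalid input is left as-is.
function normalizeSitemap(entry: string, siteUrl: string): string {
  try {
    return new URL(entry.trim(), siteUrl).toString();
  } catch {
    return entry.trim();
  }
}

cleanHost("https://www.example.com/");                        // "www.example.com"
normalizeSitemap("/sitemap.xml", "https://www.example.com");  // "https://www.example.com/sitemap.xml"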

Status labels and what triggers them

How the policy status label is determined
Status label Trigger in this generator Interpretation Action cue
Crawling open No Disallow rules are present. Crawlers are not explicitly blocked by path rules. Consider adding blocks for private or duplicate areas.
Selective crawling At least one Disallow rule is present. Some paths are blocked, others may remain reachable. Review for accidental blocks and keep sitemaps up to date.
All crawling blocked Any section contains Disallow: / or Disallow: /*. The policy blocks all crawlers for the affected User-agent group. Use for maintenance or staging, remove before launch if unintended.

These labels are a convenience summary of the generated rules and do not guarantee how a specific crawler will behave.
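
As a sketch, the trigger logic from this table can be written as a small classifier over the normalized sections; the section shape shown here is assumed for illustration.

type StatusLabel = "Crawling open" | "Selective crawling" | "All crawling blocked";

function statusLabel(sections: { disallow: string[] }[]): StatusLabel {
  const blocksAll = sections.some((s) => s.disallow.some((d) => d === "/" || d === "/*"));
  if (blocksAll) return "All crawling blocked";
  const hasDisallow = sections.some((s) => s.disallow.length > 0);
  return hasDisallow ? "Selective crawling" : "Crawling open";
}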

Presets and key parameters

Preset policies and their defaults
Preset Intent Section defaults Global Crawl-delay (s)
Standard (allow all) Open crawling with a simple baseline policy. User-agent: *, Allow: /, no disallows, note “Default access for all crawlers”. 0
Block all crawlers Block crawling for maintenance or staging. User-agent: *, Disallow: /, note “Maintenance or staging; disallow everything”. 0
Hide admin/search Block common private and duplicate paths. Disallow includes /admin, /login, /account, /checkout, /cart, /cgi-bin/, /search. 0
Throttle polite crawlers Suggest a slower crawl rate where honored. User-agent: *, Allow: /, section Crawl-delay set. 10
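
If you mirror these presets in code, they reduce to plain configuration data. The identifiers below are illustrative, the User-agent for the last two presets is assumed to be *, and only the defaults listed above are included.

const presets = {
  standardAllowAll: {
    globalCrawlDelay: 0,
    sections: [{ userAgent: "*", allow: ["/"], disallow: [], note: "Default access for all crawlers" }],
  },
  blockAllCrawlers: {
    globalCrawlDelay: 0,
    sections: [{ userAgent: "*", allow: [], disallow: ["/"], note: "Maintenance or staging; disallow everything" }],
  },
  hideAdminSearch: {
    globalCrawlDelay: 0,
    sections: [{ userAgent: "*", allow: [], disallow: ["/admin", "/login", "/account", "/checkout", "/cart", "/cgi-bin/", "/search"] }],
  },
  // This preset also sets the section Crawl-delay; the exact per-section value is not restated here.
  throttlePoliteCrawlers: {
    globalCrawlDelay: 10,
    sections: [{ userAgent: "*", allow: ["/"], disallow: [] }],
  },
};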

Units, precision, and determinism

  • Crawl-delay unit: seconds, stored as a JavaScript number.
  • Rounding: delays are rounded to two decimals using Math.round(value × 100) / 100.
  • Lower bound: negative or non-finite delays are treated as 0.
  • Determinism: with header comments disabled, identical inputs produce identical outputs.
  • Heads-up: When the header comment is enabled, the timestamp changes every run and can make diffs noisy.

Validation and bounds

Input validation rules and normalization behaviors
  • Site URL (url string): expected to include the protocol (e.g., https://). Placeholder: https://www.example.com
  • Host directive (string): protocol is stripped if present; leading and trailing slashes are removed. Warning when empty: "Host directive is empty; set it for Bing/Yandex clarity." Placeholder: www.example.com
  • Sitemap URLs (multi-line string): one URL per line; /path and relative paths expand from the Site URL base. Warning when empty: "Add at least one Sitemap URL so crawlers find your index." Placeholder: https://www.example.com/sitemap.xml
  • Global Crawl-delay (number, min 0, step 0.1): rounded to 2 decimals.
  • User-agent (string): defaults to * when blank; the preset list includes common bots. Custom value placeholder: "Enter custom user-agent"
  • Allow paths (multi-line string): one path per line; supports * and $ tokens for crawlers that honor them. Placeholders: / and /public/
  • Disallow paths (multi-line string): a leading slash is added when missing, unless the rule starts with * or $. Placeholders: /admin and /search
  • Section Crawl-delay (number, min 0, step 0.1): falls back to the Global value when 0; rounded to 2 decimals. Warning when any delay is set: "Crawl-delay is ignored by some crawlers (e.g., Googlebot)."
  • Note (string): rendered as a # comment above the section. Placeholder: "e.g., Blocked cart pages"
  • Import robots.txt (multi-line string): parses User-agent, Allow, Disallow, Crawl-delay, Sitemap, and Host. Messages: "Paste robots.txt content to import." and "No user-agent rules detected." Placeholder: "Paste robots.txt here"

Input and output formats

Supported inputs and generated outputs
  • Rule lists: newline-separated strings for Allow and Disallow; output as robots.txt text with LF newlines, trimmed lines, and duplicates removed; no rounding applies.
  • Delays: numbers ≥ 0 for Global and per-section Crawl-delay; output as Crawl-delay directives in decimal seconds stored as a JavaScript number, rounded to 2 decimal places.
  • Sitemaps: absolute URLs, /path, or relative paths; output as Sitemap directives, normalized to absolute URLs when a valid Site URL base exists; no rounding applies.
  • Policy snapshot: current settings and normalized sections; output as a JSON policy view, pretty-printed with 2-space indentation and a generated_at timestamp; no rounding applies.
  • Directive summary: normalized sections with allow and disallow summaries; output as a table, CSV, and DOCX summary with CSV headers "#", "User-agent", "Allow rules", "Disallow rules", "Crawl-delay (s)", and "Note"; no rounding applies.
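
A small sketch of the CSV summary export with those headers; the row shape is an assumption, and quoting is simplified to escape double quotes, commas, and newlines only.

interface SummaryRow {
  idx: number;
  userAgent: string;
  allowSummary: string;
  disallowSummary: string;
  crawlDelay: number | "";
  note: string;
}

function toCsv(rows: SummaryRow[]): string {
  const headers = ["#", "User-agent", "Allow rules", "Disallow rules", "Crawl-delay (s)", "Note"];
  const quote = (value: string | number): string => {
    const s = String(value);
    return /[",\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
  };
  const lines = rows.map((r) =>
    [r.idx, r.userAgent, r.allowSummary, r.disallowSummary, r.crawlDelay, r.note].map(quote).join(",")
  );
  return [headers.map(quote).join(","), ...lines].join("\n");
}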

Import behavior

  • Lines starting with # are treated as a note for the next User-agent section.
  • Each User-agent: line starts a new section.
  • Allow, Disallow, and Crawl-delay lines attach to the current section, creating one if needed.
  • Only the first Host value is used when multiple exist.
  • Sitemap lines are collected and then normalized using the current Site URL base.
  • Unknown directives are ignored, and import fails when no User-agent lines are detected.
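
A simplified parser that follows these rules; the type names, the case-insensitive directive matching, and the exact error handling are assumptions for illustration.

interface ParsedSection {
  userAgent: string;
  allow: string[];
  disallow: string[];
  crawlDelay: number;
  note: string;
}

interface ParsedPolicy {
  sections: ParsedSection[];
  host: string;
  sitemaps: string[];
}

function parseRobotsTxt(text: string): ParsedPolicy {
  const policy: ParsedPolicy = { sections: [], host: "", sitemaps: [] };
  let pendingNote = "";
  let sawUserAgent = false;

  // Attach rules to the most recent section, creating a default one if needed.
  const currentSection = (): ParsedSection => {
    if (policy.sections.length === 0) {
      policy.sections.push({ userAgent: "*", allow: [], disallow: [], crawlDelay: 0, note: pendingNote });
      pendingNote = "";
    }
    return policy.sections[policy.sections.length - 1];
  };

  for (const raw of text.split(/\r?\n/)) {
    const line = raw.trim();
    if (line === "") continue;
    if (line.startsWith("#")) {
      pendingNote = line.replace(/^#\s*/, ""); // comment becomes the next section's note
      continue;
    }
    const match = line.match(/^([A-Za-z-]+)\s*:\s*(.*)$/); // colon-delimited directives only
    if (!match) continue;
    const directive = match[1].toLowerCase();
    const value = match[2].trim();

    switch (directive) {
      case "user-agent":
        sawUserAgent = true;
        policy.sections.push({ userAgent: value || "*", allow: [], disallow: [], crawlDelay: 0, note: pendingNote });
        pendingNote = "";
        break;
      case "allow":
        if (value) currentSection().allow.push(value);
        break;
      case "disallow":
        if (value) currentSection().disallow.push(value);
        break;
      case "crawl-delay":
        currentSection().crawlDelay = Number(value) || 0;
        break;
      case "host":
        if (!policy.host) policy.host = value; // only the first Host value is used
        break;
      case "sitemap":
        policy.sitemaps.push(value); // normalized later against the Site URL base
        break;
      default:
        break; // unknown directives are ignored
    }
  }

  if (!sawUserAgent) throw new Error("No user-agent rules detected.");
  return policy;
}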

Networking, storage, and privacy

This package contains no fetch or XHR calls and does not write to local or session storage in its script. Copy and download actions operate on the generated text and summaries.

Robots.txt content and path lists can reveal internal structure, so treat drafts and published files as public configuration data.

Performance and complexity

For S sections and R total rules, normalization runs in O(S + R), and optional sorting adds O(R log R). Output generation is linear in the number of emitted lines.

Security considerations

Robots.txt is public and advisory. Do not rely on it to protect sensitive content, and avoid including secrets or private tokens in rule text.

Assumptions and limitations

  • Heads-up: A valid robots.txt does not guarantee a crawler will obey it.
  • Heads-up: Blocking a path does not remove content that is already public elsewhere.
  • Host is treated as optional and may be ignored by many crawlers.
  • Crawl-delay is included when set, even though support varies by crawler.
  • Rule normalization prefixes missing slashes, which can change intent if you expected a literal token.
  • Absolute URLs in Allow or Disallow are preserved, but not all crawlers interpret them.
  • Sorting rules improves reviewability, but some crawler implementations may be sensitive to ordering.
  • The status label is a heuristic based on rule presence, not a full crawler simulation.
  • Import expects a colon-delimited directive format and ignores unknown lines.
  • Header timestamps improve traceability but reduce output stability for version control diffs.

Edge cases and error sources

  • Non-ASCII characters in paths are treated as raw text and are not normalized.
  • Unicode normalization is not applied, so visually similar strings may compare differently.
  • Grapheme clusters are not interpreted, and trimming operates on code units.
  • NaNs and infinities in Crawl-delay are coerced to 0 and omitted.
  • Signed zero and negative values are treated as 0 after coercion.
  • Denormals and extremely small positive delays may round down to 0.
  • Rounding ties follow Math.round, which can bump 0.005 to 0.01.
  • IPv6 hosts in the Site URL rely on the platform URL parser, including bracket handling.
  • Trailing slashes matter for string comparisons and may affect crawler matching behavior.
  • Wildcard DNS is not resolved or validated because no network lookups are performed.
  • Non-convergent roots are not applicable because there is no iterative numeric solving.
  • Floating-point drift is limited to delay rounding and typical JavaScript number behavior.
  • Race conditions are unlikely because generation is synchronous, but rapid edits can change timestamps.
  • PRNG caveats are not applicable because no random generation is used.
  • Stale cache effects are not applicable because the script does not implement caching.

Standards and references

Robots directives are commonly described by the IETF Robots Exclusion Protocol standard (RFC 9309). URL parsing behavior follows the WHATWG URL Standard, and timestamps use ISO 8601 formatting via toISOString() in UTC.

Text handling relies on the Unicode Standard for character encoding and comparison behaviors.

Step-by-Step Guide:

Robots.txt policies guide crawlers toward the right pages and away from areas you do not want crawled. This flow helps you produce a clean draft you can publish.

  1. Pick a Preset if you want a sensible starting policy.
  2. Enter the Site URL including the protocol, such as https://.
  3. Review the optional Host value, especially if you want Bing or Yandex clarity.
  4. Add Sitemap URLs, one per line, using absolute URLs or paths like /sitemap.xml.
  5. Create one User-agent section per crawler family that needs different rules.
  6. List Allow and Disallow paths, one per line, and let the generator prefix missing slashes.
  7. Set Crawl-delay per section or use the global delay as a fallback for sections set to zero.
  8. Heed the Crawl-delay warning if you rely on delays, since some crawlers ignore them.
  9. Copy or download the generated text and publish it at /robots.txt on your site.

Compact example: block the admin area while allowing everything else.

User-agent: *
Allow: /
Disallow: /admin
  • Disable the header comment when you want stable output for version control.
  • Keep “Sort directives” enabled for predictable ordering across edits.
  • Use one sitemap per line, and include a sitemap index if you have many sitemaps.
  • Add a short note above each section to explain why the rules exist.

Pro tip: generate a staging policy that blocks everything, then switch to a selective policy only when you are ready for discovery.
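
For example, the block-all preset described above reduces a staging policy to a short file:

# Maintenance or staging; disallow everything
User-agent: *
Disallow: /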

FAQ:

Is my data stored?

The script contains no network calls and does not write to local or session storage. Treat drafts as public if you copy them into tickets or share them widely.

Robots.txt is meant to be publicly readable once published.

How accurate is the result?

It accurately reflects the rules you enter plus normalization like adding leading slashes and rounding delays. Real crawler behavior varies, so test important URLs with the crawlers you care about.

Can it work offline?

Generation and import parsing run locally. If the page and its assets are already loaded, it can continue without a network connection.

What formats are supported?

You can copy or download robots.txt text, view a directive summary table, and export the policy as CSV, DOCX, or JSON.

Formats reflect the same normalized rules shown in the output.

How do I import rules?

Paste an existing robots.txt into the import area and run import. Each User-agent starts a new section, comment lines become the next section note, and Host and Sitemap lines are captured when present.

What does Crawl-delay mean?

It is a suggested pause in seconds between requests for a given crawler group. Some crawlers ignore it, so it should be treated as a best effort hint rather than a guarantee.

How do I validate a CSR?

This page is focused on robots.txt policies and does not validate Certificate Signing Requests. Use a certificate or PKI inspector for CSR checks.

What does a borderline result mean?

There is no borderline score. The status label is based on whether any Disallow rules exist and whether any section uses Disallow: / or Disallow: /*, which is treated as blocking all crawling.

Is there a cost?

This package does not include pricing or licensing text. Any cost or license terms depend on the site that hosts it, so check that site’s terms of use.

Troubleshooting:

  • The Host value stays blank: make sure the Site URL includes https:// so it can be parsed.
  • Sitemap paths do not expand: verify the Site URL is a valid absolute URL.
  • A path looks wrong after generation: remember missing leading slashes are added automatically.
  • Output changes every time: disable the header comment to remove the timestamp.
  • Rules appear out of order: enable sorting to alphabetize Allow and Disallow lines.
  • Import merges content unexpectedly: import starts a new section on every User-agent: line.

Blocking issue: import says “No user-agent rules detected.”

Check that your text includes at least one User-agent: line and that directives use a colon separator like Disallow: and Allow:.

Advanced Tips:

  • Tip: Use separate sections for crawlers that need different access, rather than mixing exceptions.
  • Tip: Prefer narrow Allow exceptions only when a broader Disallow is necessary for duplicates.
  • Tip: Keep sorting enabled and disable header timestamps to make reviews and diffs predictable.
  • Tip: Add one sitemap index line, then keep detailed sitemaps referenced from that index.
  • Tip: For staging, block all crawling and remove sitemap lines so crawlers do not queue URLs.
  • Tip: Treat robots.txt as public signage and avoid listing sensitive endpoints you would not publish.

Glossary:

User-agent
Crawler identifier that selects which section applies.
Allow
Path pattern explicitly permitted within a section.
Disallow
Path pattern a crawler should avoid fetching.
Crawl-delay
Suggested pause in seconds between crawler requests.
Host directive
Optional host hint used by some crawlers.
Sitemap directive
Absolute sitemap URL that helps crawlers find pages.
Wildcard *
Pattern token that can match multiple characters.
Anchor $
Pattern token that anchors a match to the end.
Header comment
Generated note that can include host and timestamp.
ISO 8601 timestamp
Standard date time string produced by toISOString().