Robots.txt files are small rule lists that tell automated crawlers which parts of a site they should visit and which paths they should skip. A robots.txt generator for staging sites helps you draft a policy that matches your intent and spot accidental blocks before you publish.
Teams use robots rules to keep test areas out of public discovery, to focus crawl attention on important pages, and to reduce wasted requests on duplicates such as internal search results. Robots.txt is guidance, not a lock, so anything truly private still needs proper access control.
You describe the site you are working on and write simple rules for each crawler group you care about. The generator turns that plan into a ready-to-publish file and a readable summary you can sanity-check before it goes live.
For example, a shop can allow product pages, block account and checkout areas, and include a sitemap reference so catalog changes are found sooner. A staging environment can use a stricter policy that blocks everything while still keeping production open to crawlers.
Keep notes explaining why a rule exists and revisit them whenever the site structure changes. If your rules mention sensitive paths, remember that robots.txt is public and should not contain secrets.
A robots.txt file is a plain text policy that groups directives into sections for a specific User-agent token, such as * or Googlebot. Within each section, Allow and Disallow lines describe which Uniform Resource Locator (URL) path patterns a crawler may fetch, and Crawl-delay suggests a minimum pause between requests for crawlers that honor it.
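For example, a minimal file with a single section for all crawlers might look like the following (the blocked path is illustrative):

```
User-agent: *
Allow: /
Disallow: /private/
Crawl-delay: 5
```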
This generator builds the final file by cleaning inputs, normalizing rule lines, and then emitting directives in a predictable order. Duplicate rules inside a section are removed, and you can optionally alphabetize Allow and Disallow lines to make diffs easier to review.
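A minimal sketch of that normalization step, written in TypeScript with illustrative names (not the tool's actual API), could look like this:

```ts
// Split a rule textarea into clean rule lines: trim, drop empties,
// add a leading slash unless the pattern starts with /, * or $,
// remove duplicates, and optionally sort with locale-aware comparison.
function normalizeRules(raw: string, sortRules = false): string[] {
  const lines = raw
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => (/^[*$\/]/.test(line) ? line : `/${line}`));
  const unique = Array.from(new Set(lines));
  return sortRules ? unique.sort((a, b) => a.localeCompare(b)) : unique;
}
```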
The summary banner is computed from rule counts and a simple full block check, and it labels policies as Crawling open, Selective crawling, or All crawling blocked. Warnings appear when Host is empty, when no Sitemap URLs are present, or when any Crawl-delay is set because some crawlers ignore it.
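As a rough sketch of that logic (the section shape and function names here are assumptions, not the tool's real code):

```ts
// Classify the overall policy and collect the advisory warnings described above.
type SectionRules = { disallow: string[]; crawlDelay: number };

function statusLabel(sections: SectionRules[]): string {
  const hasDisallow = sections.some((s) => s.disallow.length > 0);
  if (!hasDisallow) return "Crawling open";
  const fullBlock = sections.some((s) =>
    s.disallow.some((rule) => rule === "/" || rule === "/*")
  );
  return fullBlock ? "All crawling blocked" : "Selective crawling";
}

function warnings(host: string, sitemaps: string[], sections: SectionRules[], globalDelay: number): string[] {
  const notes: string[] = [];
  if (!host) notes.push("Host directive is empty; set it for Bing/Yandex clarity.");
  if (sitemaps.length === 0) notes.push("Add at least one Sitemap URL so crawlers find your index.");
  if (globalDelay > 0 || sections.some((s) => s.crawlDelay > 0)) {
    notes.push("Crawl-delay is ignored by some crawlers (e.g., Googlebot).");
  }
  return notes;
}
```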
For each section, the generator works through these steps (a sketch of the emission order follows the symbols table below):

1. Clean the User-agent value, defaulting to * when blank.
2. Split the Allow and Disallow text into lines, trimming whitespace and dropping empties.
3. Add a leading slash to any rule that does not already start with a slash, * wildcard, or $ anchor.
4. Coerce each Crawl-delay to a nonnegative number and round to two decimals.
5. Fall back to the global delay as the effective Crawl-delay value when the section delay is zero.
6. Optionally sort Allow and Disallow rules with locale-aware string comparison.
7. Emit an empty Disallow line as an allow-all placeholder when a section has no disallow rules.
8. Write each section as User-agent, then Allow, Disallow, and Crawl-delay lines.
9. Append Host and each Sitemap line after all sections, then trim trailing blanks.

| Symbol | Meaning | Unit/Datatype | Source |
|---|---|---|---|
| User-agent | Selects the crawler group the section targets. | string | Input |
| Allow | Path pattern explicitly permitted for the selected agent. | string per line | Input |
| Disallow | Path pattern the selected agent should avoid fetching. | string per line | Input |
| Crawl-delay | Suggested delay between requests for crawlers that honor it. | seconds (number) | Input or derived |
| Host | Optional preferred host value used by some crawlers. | host string | Input, cleaned |
| Sitemap | Absolute sitemap location to help crawlers discover pages. | URL string | Input, normalized |
| # | Comment line written above a section or at the file start. | text line | Derived |
| * | Wildcard token in agent names or rule patterns where supported. | character | Input |
| $ | End anchor in rule patterns for crawlers that honor it. | character | Input |
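A minimal sketch of the per-section emission order described in the steps above, assuming a simplified section shape (field names are illustrative):

```ts
type NormalizedSection = {
  note?: string;        // rendered as a # comment above the section
  userAgent: string;
  allow: string[];
  disallow: string[];
  crawlDelay: number;   // effective delay after fallback and rounding
};

// Emit one section: note, User-agent, Allow, Disallow, Crawl-delay.
// Host and Sitemap lines are appended once, after all sections.
function emitSection(s: NormalizedSection): string {
  const lines: string[] = [];
  if (s.note) lines.push(`# ${s.note}`);
  lines.push(`User-agent: ${s.userAgent || "*"}`);
  for (const rule of s.allow) lines.push(`Allow: ${rule}`);
  if (s.disallow.length === 0) {
    lines.push("Disallow:"); // empty Disallow acts as an allow-all placeholder
  } else {
    for (const rule of s.disallow) lines.push(`Disallow: ${rule}`);
  }
  if (s.crawlDelay > 0) lines.push(`Crawl-delay: ${s.crawlDelay}`);
  return lines.join("\n");
}
```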
Scenario: you are preparing a staging site and want to block an admin area, publish a sitemap, and apply a polite request delay.
- Site URL: https://www.example.com
- Sitemap URLs: https://www.example.com/sitemap.xml and sitemap-index.xml
- Global Crawl-delay: 2.345
- Section: User-agent: *, Allow: /, Disallow: admin, Crawl-delay: 0, note “Block admin while testing”

The generator normalizes the delay to two decimals, then applies it as the effective delay because the section delay is zero.
It also cleans Host to www.example.com, prefixes the disallow rule as /admin, and expands sitemaps to absolute URLs.
```
# robots.txt generated for www.example.com on 2026-02-14T00:00:00.000Z
# Block admin while testing
User-agent: *
Allow: /
Disallow: /admin
Crawl-delay: 2.35
Host: www.example.com
Sitemap: https://www.example.com/sitemap.xml
Sitemap: https://www.example.com/sitemap-index.xml
```
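A sketch of the host cleaning and sitemap expansion used in this example, leaning on the WHATWG URL parser (the function names are illustrative, not the tool's API):

```ts
// Strip a protocol prefix and surrounding slashes from the Host value.
function cleanHost(input: string): string {
  return input.replace(/^[a-z][a-z0-9+.-]*:\/\//i, "").replace(/^\/+|\/+$/g, "");
}

// Resolve a sitemap entry against the Site URL; absolute URLs pass through,
// /path and relative paths expand from the base.
function normalizeSitemap(entry: string, siteUrl: string): string {
  try {
    return new URL(entry, siteUrl).toString();
  } catch {
    return entry; // leave unparseable entries as typed
  }
}

// normalizeSitemap("sitemap-index.xml", "https://www.example.com")
//   -> "https://www.example.com/sitemap-index.xml"
```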
| Status label | Trigger in this generator | Interpretation | Action cue |
|---|---|---|---|
| Crawling open | No Disallow rules are present. | Crawlers are not explicitly blocked by path rules. | Consider adding blocks for private or duplicate areas. |
| Selective crawling | At least one Disallow rule is present. | Some paths are blocked, others may remain reachable. | Review for accidental blocks and keep sitemaps up to date. |
| All crawling blocked | Any section contains Disallow: / or Disallow: /*. | The policy blocks all crawlers for the affected User-agent group. | Use for maintenance or staging, remove before launch if unintended. |
These labels are a convenience summary of the generated rules and do not guarantee how a specific crawler will behave.
| Preset | Intent | Section defaults | Global Crawl-delay (s) |
|---|---|---|---|
| Standard (allow all) | Open crawling with a simple baseline policy. | User-agent: *, Allow: /, no disallows, note “Default access for all crawlers”. | 0 |
| Block all crawlers | Block crawling for maintenance or staging. | User-agent: *, Disallow: /, note “Maintenance or staging; disallow everything”. | 0 |
| Hide admin/search | Block common private and duplicate paths. | Disallow includes /admin, /login, /account, /checkout, /cart, /cgi-bin/, /search. | 0 |
| Throttle polite crawlers | Suggest a slower crawl rate where honored. | User-agent: *, Allow: /, section Crawl-delay set. | 10 |
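Expressed as plain data, the presets above might be modeled like this (the object shape is an assumption for illustration, not the tool's internal format):

```ts
const PRESETS = {
  "Standard (allow all)": {
    globalCrawlDelay: 0,
    sections: [{ userAgent: "*", allow: ["/"], disallow: [], note: "Default access for all crawlers" }],
  },
  "Block all crawlers": {
    globalCrawlDelay: 0,
    sections: [{ userAgent: "*", allow: [], disallow: ["/"], note: "Maintenance or staging; disallow everything" }],
  },
  "Hide admin/search": {
    globalCrawlDelay: 0,
    sections: [{ userAgent: "*", allow: [], disallow: ["/admin", "/login", "/account", "/checkout", "/cart", "/cgi-bin/", "/search"], note: "" }],
  },
  "Throttle polite crawlers": {
    globalCrawlDelay: 10,
    sections: [{ userAgent: "*", allow: ["/"], disallow: [], crawlDelay: 10, note: "" }],
  },
};
```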
Delays are rounded to two decimals as Math.round(value × 100) / 100.

| Field | Type | Min | Max | Step/Pattern | Error Text | Placeholder |
|---|---|---|---|---|---|---|
| Site URL | url string | — | — | Expected to include protocol (e.g., https://). | — | https://www.example.com |
| Host directive | string | — | — | Protocol stripped if present; leading and trailing slashes removed. | Warning when empty: “Host directive is empty; set it for Bing/Yandex clarity.” | www.example.com |
| Sitemap URLs | multi-line string | — | — | One URL per line; /path and relative paths expand from Site URL base. | Warning when empty: “Add at least one Sitemap URL so crawlers find your index.” | https://www.example.com/sitemap.xml |
| Global Crawl-delay | number | 0 | — | Step 0.1; rounded to 2 decimals. | — | — |
| User-agent | string | — | — | Defaults to * when blank; preset list includes common bots. | — | Custom value: “Enter custom user-agent” |
| Allow paths | multi-line string | — | — | One path per line; supports * and $ tokens for crawlers that honor them. | — | / and /public/ |
| Disallow paths | multi-line string | — | — | Leading slash is added when missing, unless the rule starts with * or $. | — | /admin and /search |
| Section Crawl-delay | number | 0 | — | Step 0.1; falls back to Global when 0; rounded to 2 decimals. | Warning when any delay set: “Crawl-delay is ignored by some crawlers (e.g., Googlebot).” | — |
| Note | string | — | — | Rendered as a # comment above the section. | — | “e.g., Blocked cart pages” |
| Import robots.txt | multi-line string | — | — | Parses User-agent, Allow, Disallow, Crawl-delay, Sitemap, Host. | “Paste robots.txt content to import.” and “No user-agent rules detected.” | “Paste robots.txt here” |
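A short sketch of the delay coercion and fallback described in the table above (illustrative helpers, assuming the stated Math.round formula):

```ts
// Coerce a delay to a nonnegative number and round to two decimals.
function coerceDelay(value: unknown): number {
  const n = Number(value);
  if (!Number.isFinite(n) || n < 0) return 0;
  return Math.round(n * 100) / 100;
}

// A section delay of 0 falls back to the global Crawl-delay.
function effectiveDelay(sectionDelay: number, globalDelay: number): number {
  return sectionDelay > 0 ? sectionDelay : globalDelay;
}
```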
| Input | Accepted families | Output | Encoding/Precision | Rounding |
|---|---|---|---|---|
| Rule lists | Newline-separated strings for Allow and Disallow. | robots.txt text | LF newlines, trimmed lines, duplicates removed. | Not applicable |
| Delays | Numbers ≥ 0 for Global and per-section Crawl-delay. | Crawl-delay directives | Decimal seconds as a JavaScript number. | 2 decimal places |
| Sitemaps | Absolute URLs, /path, or relative paths. | Sitemap directives | Normalized to absolute URLs when a valid Site URL base exists. | Not applicable |
| Policy snapshot | Current settings and normalized sections. | JSON policy view | Pretty-printed with 2-space indentation and generated_at timestamp. | Not applicable |
| Directive summary | Normalized sections with allow and disallow summaries. | Table, CSV, and DOCX summary | CSV headers: “#”, “User-agent”, “Allow rules”, “Disallow rules”, “Crawl-delay (s)”, “Note”. | Not applicable |
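As a sketch of the CSV summary export using the headers listed above (the row shape and quoting approach are assumptions):

```ts
type SummaryRow = { idx: number; userAgent: string; allow: string; disallow: string; delay: number; note: string };

// Build a CSV string with one quoted cell per column.
function toCsv(rows: SummaryRow[]): string {
  const headers = ["#", "User-agent", "Allow rules", "Disallow rules", "Crawl-delay (s)", "Note"];
  const quote = (v: string | number) => `"${String(v).replace(/"/g, '""')}"`;
  const lines = [headers.map(quote).join(",")];
  for (const r of rows) {
    lines.push([r.idx, r.userAgent, r.allow, r.disallow, r.delay, r.note].map(quote).join(","));
  }
  return lines.join("\n");
}
```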
When importing an existing file, the parser applies these rules (a minimal parser sketch follows below):

- Comment lines starting with # are treated as a note for the next User-agent section.
- Each User-agent: line starts a new section.
- Allow, Disallow, and Crawl-delay lines attach to the current section, creating one if needed.
- A single Host value is used when multiple exist.
- Sitemap lines are collected and then normalized using the current Site URL base.
- A warning is shown when no User-agent lines are detected.

This package contains no fetch or XHR calls and does not write to local or session storage in its script. Copy and download actions operate on the generated text and summaries.
Robots.txt content and path lists can reveal internal structure, so treat drafts and published files as public configuration data.
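The parser sketch below follows the import rules listed above; it is a simplified illustration (error handling and edge cases omitted), not the tool's actual implementation:

```ts
type ParsedSection = { userAgent: string; note?: string; allow: string[]; disallow: string[]; crawlDelay: number };

function parseRobots(text: string) {
  const sections: ParsedSection[] = [];
  const sitemaps: string[] = [];
  let host = "";
  let pendingNote: string | undefined;
  let current: ParsedSection | undefined;

  // Create an implicit section when directives appear before any User-agent line.
  const ensureSection = (): ParsedSection => {
    if (current === undefined) {
      current = { userAgent: "*", note: pendingNote, allow: [], disallow: [], crawlDelay: 0 };
      sections.push(current);
    }
    return current;
  };

  for (const raw of text.split(/\r?\n/)) {
    const line = raw.trim();
    if (!line) continue;
    if (line.startsWith("#")) { pendingNote = line.slice(1).trim(); continue; }
    const [key, ...rest] = line.split(":");
    const value = rest.join(":").trim();
    switch (key.trim().toLowerCase()) {
      case "user-agent":
        current = { userAgent: value || "*", note: pendingNote, allow: [], disallow: [], crawlDelay: 0 };
        pendingNote = undefined;
        sections.push(current);
        break;
      case "allow": ensureSection().allow.push(value); break;
      case "disallow": ensureSection().disallow.push(value); break;
      case "crawl-delay": ensureSection().crawlDelay = Number(value) || 0; break;
      case "sitemap": sitemaps.push(value); break;
      case "host": host = value; break;
    }
  }
  // Callers can warn "No user-agent rules detected." when no User-agent: lines were found.
  return { sections, sitemaps, host };
}
```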
For S sections and R total rules, normalization runs in O(S + R), and optional sorting adds O(R log R). Output generation is linear in the number of emitted lines.
Robots.txt is public and advisory. Do not rely on it to protect sensitive content, and avoid including secrets or private tokens in rule text.
- Host is treated as optional and may be ignored by many crawlers.
- Crawl-delay is included when set, even though support varies by crawler.
- Delays of zero are treated as 0 and omitted from the output.
- Negative or invalid delay values become 0 after coercion.
- Blank delay inputs default to 0.
- Rounding uses Math.round, which can bump 0.005 to 0.01.

Robots directives are commonly described by the IETF Robots Exclusion Protocol standard (RFC 9309). URL parsing behavior follows the WHATWG URL Standard, and timestamps use ISO 8601 formatting via toISOString() in UTC.
Text handling relies on the Unicode Standard for character encoding and comparison behaviors.
Robots.txt policies guide crawlers toward the right pages and away from areas you do not want crawled, and this flow helps you produce a clean draft you can publish.
- Enter the Site URL with its protocol, such as https://.
- Add at least one sitemap, for example /sitemap.xml.
- Publish the generated file as /robots.txt on your site.

Compact example: block the admin area while allowing everything else.
```
User-agent: *
Allow: /
Disallow: /admin
```
Pro tip: generate a staging policy that blocks everything, then switch to a selective policy only when you are ready for discovery.
The script contains no network calls and does not write to local or session storage. Treat drafts as public if you copy them into tickets or share them widely.
Robots.txt is meant to be publicly readable once published.

It accurately reflects the rules you enter plus normalization like adding leading slashes and rounding delays. Real crawler behavior varies, so test important URLs with the crawlers you care about.
Generation and import parsing run locally. If the page and its assets are already loaded, it can continue without a network connection.
You can copy or download robots.txt text, view a directive summary table, and export the policy as CSV, DOCX, or JSON.
Formats reflect the same normalized rules shown in the output.

Paste an existing robots.txt into the import area and run import. Each User-agent starts a new section, comment lines become the next section note, and Host and Sitemap lines are captured when present.
It is a suggested pause in seconds between requests for a given crawler group. Some crawlers ignore it, so it should be treated as a best effort hint rather than a guarantee.
This page is focused on robots.txt policies and does not validate Certificate Signing Requests. Use a certificate or PKI inspector for CSR checks.
There is no borderline score. The status label is based on whether any Disallow rules exist and whether any section uses Disallow: / or Disallow: /*, which is treated as blocking all crawling.
This package does not include pricing or licensing text. Any cost or license terms depend on the site that hosts it, so check that site’s terms of use.
- Check that the Site URL includes a protocol such as https:// so it can be parsed.
- Make sure each section starts with a User-agent: line.

Blocking issue: import says “No user-agent rules detected.”
Check that your text includes at least one User-agent: line and that directives use a colon separator like Disallow: and Allow:.
Timestamps in the generated header and the JSON export's generated_at field come from toISOString().