Sitemap XML generator inputs
Enter the canonical origin, for example https://www.example.com.
Use absolute URLs or root-relative paths. Optional fields can be separated with pipes, tabs, or CSV commas.
Prefer per-URL dates from your CMS or export. Omit lastmod when you do not know the real update date.
Use YYYY-MM-DD unless you have verified time-level modification data.
Choose whether to emit changefreq and priority tags in addition to loc and lastmod.
This is a crawl hint, not a crawl schedule command.
Use one decimal place. The protocol default is 0.5.
Leave on for a cleaner sitemap unless you are auditing a raw export.
Input order preserves your source export; URL order creates predictable diffs.
Leave off when query parameters identify real canonical pages.
Optional. The normal sitemap namespace is enough for most publishing workflows.

Introduction:

Search crawlers discover pages through links first, but a sitemap gives them a deliberate inventory to check. The XML format is especially useful when a site has pages that are hard to reach from navigation, recently changed content, separate sections managed by different systems, or enough URLs that a plain manual list becomes risky.

A good sitemap is not a promise that every page will be indexed. It is a clean statement of which canonical URLs belong to one host and which optional facts, such as a last modified date, are reliable enough to share. If a CMS or hosting platform already publishes a reliable sitemap, that maintained source is usually safer than a second, hand-built list. Crawlers still evaluate robots rules, redirects, canonical tags, page quality, duplicate content, and normal crawl constraints before deciding what to fetch or show in search results.

The most common mistakes happen before the XML is even written. A site owner may mix www and non-www URLs, include a staging host, paste campaign links with tracking parameters, or fill every lastmod field with the generation date. Those choices can produce well-formed XML while still creating a weak crawl handoff.

[Figure: URL inventory filtered through same-host, dedupe, and limit checks before urlset XML is produced]

Three ideas matter most when preparing a sitemap inventory: location, freshness, and intent. Location means each URL should belong to the host that will publish or submit the sitemap. Freshness means lastmod should describe a meaningful page change, not a scheduled export. Intent means the list should contain the URLs you actually want search engines to consider as canonical, not every possible path a server can return.

Sitemap fields and common mistakes
Field | What it tells crawlers | Common mistake
loc | The absolute URL to consider for discovery. | Listing duplicates, tracking URLs, old protocol variants, or the wrong host.
lastmod | The date or date-time of a significant page update. | Using today's date for unchanged pages.
changefreq | An optional hint about expected change frequency. | Treating it as a crawl schedule.
priority | An optional same-site importance hint from 0.0 to 1.0. | Expecting it to improve ranking against other sites.

Large sites also have a mechanical boundary: one sitemap file is limited to 50,000 URLs and 50 MB uncompressed. Bigger inventories need multiple sitemap files and usually a sitemap index. Smaller sites still benefit from the same discipline because duplicate or stale URLs can make troubleshooting harder after submission.
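
The split into multiple files plus an index can be sketched in Python as follows. This is a minimal illustration, not the generator's own code: chunk_urls, build_sitemap_index, and the sitemap-N.xml file names are hypothetical, and the example.com URLs are placeholders.

```python
from xml.etree import ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_FILE = 50_000  # protocol limit per sitemap file


def chunk_urls(urls, size=MAX_URLS_PER_FILE):
    """Split a URL list into consecutive chunks of at most `size` entries."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_sitemap_index(sitemap_urls):
    """Return sitemap index XML that lists each child sitemap file."""
    root = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for url in sitemap_urls:
        entry = ET.SubElement(root, "sitemap")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(root, encoding="unicode")


# Placeholder inventory that is too large for a single file.
urls = [f"https://www.example.com/page-{n}" for n in range(120_000)]
chunks = chunk_urls(urls)

# Hypothetical file naming: one child sitemap per chunk.
index_xml = build_sitemap_index(
    f"https://www.example.com/sitemap-{i + 1}.xml" for i in range(len(chunks))
)
```

Each chunk would then be written as its own urlset file, and only the index needs to be submitted or listed in robots.txt.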

A sitemap works best beside robots rules, redirects, canonical tags, internal links, and webmaster-tool checks. It helps crawlers find and revisit preferred URLs, but it should not be used to hide architecture problems or to compensate for pages that are blocked, redirected incorrectly, or duplicated across several canonical forms. After publication, the most useful checks are simple: fetch the public file, confirm that it returns sitemap XML, list it in robots.txt when appropriate, and watch webmaster-tool feedback for fetch or parsing errors.
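
Those post-publication checks can be scripted. The sketch below assumes Python's standard library only; fetch_bytes, sitemap_root_kind, and robots_txt_line are hypothetical helper names, and the check confirms only that the response parses as sitemap XML, not that the listed pages are reachable.

```python
from urllib.request import urlopen
from xml.etree import ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def fetch_bytes(url, timeout=10):
    """Download the published file (requires network access)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def sitemap_root_kind(xml_data):
    """Return 'urlset' or 'sitemapindex' for sitemap XML, else None."""
    try:
        root = ET.fromstring(xml_data)
    except ET.ParseError:
        return None  # not well-formed XML, e.g. an HTML error page
    for kind in ("urlset", "sitemapindex"):
        if root.tag == f"{{{SITEMAP_NS}}}{kind}":
            return kind
    return None


def robots_txt_line(sitemap_url):
    """Format the robots.txt directive that points crawlers at the sitemap."""
    return f"Sitemap: {sitemap_url}"
```

A typical use would be sitemap_root_kind(fetch_bytes("https://www.example.com/sitemap.xml")), where the URL is a placeholder for the published location.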

How to Use This Tool:

Start with the host that owns the sitemap, then paste or load the URL inventory and review the generated checks before publishing the XML.

  1. Set Site origin to the exact http or https origin for the sitemap, using the site's canonical host and protocol. Relative paths will use this origin.
  2. Paste one URL or root-relative path per line in URL inventory, or choose Browse TXT/CSV. A row may include optional fields after the URL, in the order lastmod, changefreq, priority.
  3. Choose Lastmod handling. Use per-line values when they come from a CMS or crawl export, apply a default only when missing rows truly share that date, or choose Omit lastmod when dates are uncertain.
  4. Set Optional hint tags. Global hints applies one changefreq and priority to every included row, Per-line with fallback respects row values when valid, and Omit hints writes only URL and date fields.
  5. Leave De-duplicate URLs on for a publication draft. Use Output order for stable diffs, Strip tracking query params when campaign parameters are not canonical, and Include schema location when a validator workflow expects it.
  6. Check the summary, URL Inventory, and Publish Check. If the draft is blocked, fix the origin, add at least one valid same-host URL, split oversized files, or correct invalid default dates before copying Sitemap XML.

Interpreting Results:

The summary is a readiness signal, not the full audit. Sitemap draft blocked means the XML is missing a required condition. Sitemap ready with review notes means XML can be produced, but one or more rows, dates, hints, exclusions, or handoff checks still deserve attention.

  • Sitemap XML is the text to publish only after blockers and important review notes are resolved.
  • URL Inventory shows which rows are Included, Review, or Excluded, with row-level notes for fragments, tracking cleanup, duplicate removal, invalid dates, cross-host URLs, and parse failures.
  • Publish Check compares the draft against host scope, entry count, size limits, lastmod policy, optional hints, XML escaping, source exclusions, and discovery handoff.
  • Passing checks do not prove that the listed pages are crawlable, canonical, indexable, or reachable over live HTTP. Test representative URLs after publishing and compare Search Console or webmaster-tool feedback.

Technical Details:

The standard XML sitemap format uses a urlset root element in the sitemap namespace. Each url entry must contain one absolute loc value. The optional lastmod, changefreq, and priority elements add metadata, but they do not replace the required URL.
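
As an illustration of that structure, the following Python sketch writes a urlset with one required loc per entry and the optional elements only when a value is supplied. build_urlset and the dict-based entry shape are assumptions made for this example, not the generator's internals.

```python
from xml.etree import ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
OPTIONAL_FIELDS = ("lastmod", "changefreq", "priority")


def build_urlset(entries):
    """Build urlset XML from dicts with a required 'loc' and optional fields."""
    root = ET.Element("urlset", xmlns=SITEMAP_NS)
    for entry in entries:
        url = ET.SubElement(root, "url")
        ET.SubElement(url, "loc").text = entry["loc"]  # required child
        for field in OPTIONAL_FIELDS:
            if entry.get(field) is not None:  # written only when supplied
                ET.SubElement(url, field).text = str(entry[field])
    # ElementTree escapes &, <, and > in element text during serialization.
    body = ET.tostring(root, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


xml_text = build_urlset([
    {"loc": "https://www.example.com/"},
    {"loc": "https://www.example.com/search?q=a&page=2", "lastmod": "2026-04-12"},
])
```

The second entry shows why escaping matters: the ampersand in the query string is serialized as an entity, so the file stays well-formed.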

Canonicalization matters because crawlers compare sitemap entries with the rest of the site's signals. A path such as /pricing becomes useful only after it is resolved to a full URL. Query strings may identify real content, but analytics parameters and fragments usually do not belong in a canonical sitemap URL. When an absolute URL already includes a protocol, review it against the site's chosen canonical protocol before publishing.
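
A rough Python sketch of this resolution and cleanup follows. canonical_loc is a hypothetical helper, and the TRACKING_PARAMS set is an assumed list chosen for illustration; which parameters a real workflow should strip depends on the site.

```python
from urllib.parse import parse_qsl, urlencode, urljoin, urlsplit, urlunsplit

# Assumed tracking parameters for this example only.
TRACKING_PARAMS = {
    "utm_source", "utm_medium", "utm_campaign", "utm_term", "utm_content",
    "gclid", "fbclid",
}


def canonical_loc(raw, origin, strip_tracking=True):
    """Resolve `raw` against `origin`; return None when it leaves the host."""
    parts = urlsplit(urljoin(origin, raw))
    if parts.netloc.lower() != urlsplit(origin).netloc.lower():
        return None  # cross-host URL: belongs in a different sitemap
    query = parse_qsl(parts.query, keep_blank_values=True)
    if strip_tracking:
        query = [(k, v) for k, v in query if k.lower() not in TRACKING_PARAMS]
    # Rebuild with an empty fragment; urlencode may re-encode query values.
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path or "/", urlencode(query), "")
    )
```

Note that the sketch keeps whatever protocol an absolute URL already carries, so an http row on an https site still needs the manual review described above.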

Transformation Core:

Sitemap XML transformation rules
Stage | Rule | Result Cue
Origin check | The origin must parse as http or https. | An invalid origin blocks the draft.
Row parsing | Each non-empty, non-comment line is split as a URL plus optional date, frequency, and priority fields. | Pipe, tab, and CSV-style comma rows are accepted.
URL resolution | Root-relative paths use the selected origin, while absolute URLs must stay on the selected host. | Different hosts are excluded from the XML.
URL cleanup | Fragments are removed. Common tracking query parameters are removed only when that option is enabled. | The inventory note records cleanup that affects an included row.
Optional fields | lastmod must be a valid date or date-time. changefreq must match a protocol value. priority is constrained to 0.0 through 1.0. | Invalid dates are omitted or replaced by the selected policy, and invalid hints fall back when fallback mode is active.
XML writing | Included values are entity-escaped before loc, lastmod, changefreq, and priority are written. | A URL containing & remains valid XML text.
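
The row-parsing stage might look like the Python sketch below. It is an approximation under stated assumptions: parse_inventory is a hypothetical name, the # comment marker is assumed because the page does not say which marker is recognized, and the delimiter preference order is a design choice for this example.

```python
FIELDS = ("loc", "lastmod", "changefreq", "priority")


def parse_inventory(text):
    """Yield (line_number, row_dict) for each non-empty, non-comment line."""
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):  # '#' is an assumed marker
            continue
        # Prefer pipes, then tabs, then commas, so a comma inside a URL
        # does not split a row that already uses another delimiter.
        for delimiter in ("|", "\t", ","):
            if delimiter in stripped:
                cells = [cell.strip() for cell in stripped.split(delimiter)]
                break
        else:
            cells = [stripped]  # a bare URL or path with no optional fields
        yield number, dict(zip(FIELDS, cells))
```

Keeping the original line number with each row makes it possible to report exclusions against the pasted inventory rather than against the filtered output.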

Protocol and Review Limits:

Sitemap protocol limits and review responses
Check | Limit or Rule | Response
URLs per sitemap | 50,000 | Split larger inventories into multiple sitemap files.
Uncompressed file size | 50 MB | Split the file or publish a compressed version while keeping the uncompressed size within the limit.
loc length | 2,048 characters | Shorten, canonicalize, or remove overlong URLs before publishing.
Host scope | one host | Generate a separate sitemap for a different host or subdomain.
Google hint handling | advisory | Expect changefreq and priority to be ignored by Google; accurate URLs and dates matter more.
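
The numeric limits in this table are easy to check before publishing. The following Python sketch is one way to do it; check_limits is a hypothetical helper, and it treats a loc of exactly 2,048 characters as too long because the protocol asks for fewer than 2,048.

```python
MAX_URLS = 50_000
MAX_BYTES = 50 * 1024 * 1024  # 50 MB, measured on the uncompressed file
MAX_LOC_CHARS = 2_048


def check_limits(locs, xml_text):
    """Return a list of problems; an empty list means within the limits."""
    problems = []
    if len(locs) > MAX_URLS:
        problems.append(f"{len(locs)} URLs exceeds {MAX_URLS}: split the sitemap")
    size = len(xml_text.encode("utf-8"))  # the limit applies to bytes
    if size > MAX_BYTES:
        problems.append(f"{size} bytes exceeds {MAX_BYTES}: split the sitemap")
    for loc in locs:
        if len(loc) >= MAX_LOC_CHARS:
            problems.append(f"loc too long ({len(loc)} characters): {loc[:60]}...")
    return problems
```

The sketch does not cover host scope or hint handling, which need the URL checks rather than simple counts.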

A row such as /docs/getting-started?utm_source=newsletter | 2026-04-12 | weekly | 0.7 is resolved under the selected origin, the tracking parameter is removed when that cleanup option is on, the W3C-compatible date is kept, and the remaining values are emitted as one escaped url entry.

Comment lines and blank lines are skipped before row validation, so a working inventory can keep short notes while the XML stays limited to included URLs. Sorting by URL or path depth changes only the order of emitted entries; it does not override exclusion rules, date validation, hint fallback, duplicate handling, or the one-host sitemap boundary.
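
De-duplication and ordering can be sketched separately from validation, which mirrors the point that sorting changes only the order of entries. In this Python example the function names and the mode strings "input", "url", and "depth" are assumptions for illustration.

```python
from urllib.parse import urlsplit


def dedupe(locs):
    """Drop repeated URLs while keeping the first occurrence's position."""
    return list(dict.fromkeys(locs))


def path_depth(loc):
    """Count non-empty path segments, so '/' is 0 and '/docs/api' is 2."""
    return len([segment for segment in urlsplit(loc).path.split("/") if segment])


def order_locs(locs, mode="input"):
    """Order URLs by 'input', 'url', or 'depth' without dropping any entry."""
    if mode == "url":
        return sorted(locs)
    if mode == "depth":
        # Shallow pages first; ties fall back to alphabetical order.
        return sorted(locs, key=lambda loc: (path_depth(loc), loc))
    return list(locs)
```

Sorting by URL gives stable diffs between runs, while input order keeps the sequence of the source export.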

Privacy Notes:

Pasted inventory text and selected TXT, CSV, or TSV files are read in the current browser session to build the sitemap output. The generator does not crawl listed URLs, fetch page content, submit the sitemap to search engines, or verify live HTTP status.

  • Remove staging URLs, private paths, tokens, and campaign parameters before sharing XML, JSON, CSV, or document exports.
  • Treat generated sitemap.xml as public once published because it advertises URLs you want crawlers to discover.
  • Run a separate crawl, link check, or webmaster-tool validation when live reachability and indexing feedback matter.

Worked Examples:

A small documentation site sets Site origin to its canonical web origin and pastes /, /pricing, and /docs/getting-started | 2026-04-12 | weekly | 0.7. The URL Inventory shows three Included rows, Publish Check passes host scope and protocol limits, and Sitemap XML contains three loc entries.

A CMS export contains a blog-host page while the origin is the main site host. That row appears as Excluded because the host does not match. The right fix is to generate a separate sitemap for the blog host, not to force the row into the main site's file.

A row such as /sale#signup | 2026-02-31 | daily | 1.2 needs review before publishing. The fragment is removed, the invalid lastmod is omitted or replaced depending on Lastmod handling, and the priority is constrained to the valid sitemap range. Check the row note and Publish Check before copying the XML.
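
The field checks in this example can be approximated in Python as shown below. This is a simplified sketch: valid_lastmod accepts only full dates and ISO date-times (it is stricter than W3C Datetime on partial dates and more lenient on separators), and clamp_priority assumes that an out-of-range value is clamped rather than discarded, which the page does not specify.

```python
from datetime import date, datetime

# The seven changefreq values defined by the sitemap protocol.
CHANGEFREQ_VALUES = {"always", "hourly", "daily", "weekly", "monthly", "yearly", "never"}


def valid_lastmod(value):
    """Return True for a real calendar date or an ISO date-time string."""
    try:
        if len(value) == 10:
            date.fromisoformat(value)  # raises for dates like 2026-02-31
        else:
            # Older Python versions do not parse a trailing 'Z'.
            datetime.fromisoformat(value.replace("Z", "+00:00"))
        return True
    except ValueError:
        return False


def clamp_priority(value, default="0.5"):
    """Constrain priority to 0.0-1.0 and format it with one decimal place."""
    try:
        number = float(value)
    except (TypeError, ValueError):
        return default  # unparseable hint: fall back to the protocol default
    return f"{min(1.0, max(0.0, number)):.1f}"
```

For the row above, valid_lastmod("2026-02-31") is false because February has no 31st day, and clamp_priority("1.2") yields "1.0".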

FAQ:

Can I mix subdomains in one sitemap?

No. The inventory is constrained to the host in Site origin. Use a separate sitemap for each host, for example one for www.example.com and another for blog.example.com.

Should every URL have today's lastmod date?

Only use today's date when the page content actually changed today. If the date is uncertain, choose Omit lastmod or use per-line dates from a reliable CMS, crawl, or deployment source.

Why did a URL disappear from Sitemap XML?

Open URL Inventory. Rows can be excluded for missing values, parse failures, cross-host URLs, overlong loc values, or duplicate URLs when De-duplicate URLs is on.

Does a passing Publish Check mean the pages will be indexed?

No. Publish Check reviews sitemap structure and handoff risks. Indexing still depends on crawl access, canonical signals, redirects, page quality, and search engine decisions.

Glossary:

urlset
The root XML element for a standard sitemap file.
loc
The required absolute URL for one sitemap entry.
lastmod
An optional date or date-time for a meaningful page change.
changefreq
An optional protocol hint for likely update frequency.
priority
An optional same-site importance hint from 0.0 to 1.0.
sitemap index
A file that lists multiple sitemap files for larger inventories.

References: