{{ summaryTitle }}
{{ summaryPrimary }}
{{ summaryLine }}
{{ badge.label }}
Sitemap XML generator inputs
Enter the canonical origin, for example https://www.example.com.
Use absolute URLs or root-relative paths. Optional fields can be separated with pipes, tabs, or CSV commas.
{{ sourceHint }}
{{ fileError }}
Prefer per-URL dates from your CMS or export. Omit lastmod when you do not know the real update date.
Use YYYY-MM-DD unless you have verified time-level modification data.
Choose whether to emit changefreq and priority tags in addition to loc and lastmod.
This is a crawl hint, not a crawl schedule command.
Use one decimal place. The protocol default is 0.5.
Leave on for a cleaner sitemap unless you are auditing a raw export.
{{ dedupeEnabled ? 'On' : 'Off' }}
Input order preserves your source export; URL order creates predictable diffs.
Leave off when query parameters identify real canonical pages.
{{ stripTrackingEnabled ? 'On' : 'Off' }}
Optional. The standard sitemap namespace is enough for most publishing workflows.
{{ schemaLocationEnabled ? 'On' : 'Off' }}

          
Line Status loc lastmod changefreq priority Note Copy
Add at least one URL or path to build a sitemap.
{{ row.line }} {{ row.status }} {{ row.loc || row.source }} {{ row.lastmod || '' }} {{ row.changefreq || '' }} {{ row.priority || '' }} {{ row.note }}
Check Status Detail Copy
{{ row.check }} {{ row.status }} {{ row.detail }}

          
Customize
Advanced

Introduction:

An XML sitemap lists canonical URLs that a site owner wants crawlers to discover. It does not force a crawler to fetch every page, and it does not decide which pages deserve to rank. Its value is practical: it gives search systems a clean inventory of crawlable URLs, plus optional dates and hints that can help with discovery and review.

Good sitemap work starts with the URL list, not the XML wrapper. The pages should belong to the same host, use the canonical protocol and host spelling, and avoid fragments or tracking parameters that do not identify distinct pages. A messy export can still produce XML, but the result may send crawlers toward duplicates, stale paths, or pages that should have been kept out of the crawl inventory.

Figure: URL inventory filtered through sitemap rules into urlset XML, with review checks.

lastmod deserves special care. It should describe a real page modification, not the day the sitemap was regenerated. When dates are missing or uncertain, leaving them out is often better than filling every URL with today's date. Search systems can distrust stale or unrealistic date signals, and optional fields such as changefreq and priority are only hints.
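When auditing per-URL dates, a small check can flag values that are not valid dates before they reach the XML. This is a minimal sketch in Python; the function name is illustrative, and it accepts the YYYY-MM-DD form plus ISO date-times, a practical subset of the W3C datetime formats the protocol allows:

```python
from datetime import date, datetime

def is_valid_lastmod(value: str) -> bool:
    """Accept YYYY-MM-DD, or an ISO date-time (a W3C datetime subset)."""
    # Plain calendar date, e.g. 2026-05-01
    try:
        date.fromisoformat(value)
        return True
    except ValueError:
        pass
    # Date-time such as 2026-05-01T12:30:00+00:00; a trailing 'Z' means UTC
    try:
        datetime.fromisoformat(value.replace("Z", "+00:00"))
        return True
    except ValueError:
        return False

print(is_valid_lastmod("2026-05-01"))   # True
print(is_valid_lastmod("05/01/2026"))   # False: not a W3C date
```

A row that fails a check like this is better published without lastmod than with a guessed date.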

A sitemap also sits beside other crawl controls. A Sitemap: line in robots.txt can tell crawlers where the file lives, while robots rules still decide which paths crawlers may fetch. A valid sitemap is therefore a publication aid, not proof that every listed page is accessible, indexed, or preferred over another canonical URL.

Technical Details:

The standard XML sitemap format wraps URL entries in a urlset element using the sitemap namespace. Each entry needs a loc value. The optional lastmod, changefreq, and priority elements add page-update and crawl-hint information, but the URL itself remains the core record.

Protocol limits matter before publication. A single sitemap file may contain up to 50,000 URLs and must stay at or below 50 MB uncompressed. Each loc value must be shorter than 2,048 characters, and all URLs in one sitemap belong to the same host. XML values also need entity escaping, so characters such as ampersands in query strings must be represented safely in the final XML.
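The ampersand in a query string is the most common escaping failure. A minimal sketch using the Python standard library shows the transformation:

```python
from xml.sax.saxutils import escape

raw = "https://www.example.com/search?q=shoes&color=red"
# escape() converts &, <, and > into XML entities so the value
# cannot break the surrounding <loc> element.
print(escape(raw))
# https://www.example.com/search?q=shoes&amp;color=red
```

The escaping applies to every text value in the file, including lastmod and hint fields, although URLs are where unescaped characters usually appear.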

Sitemap Structure Rules:

Sitemap XML structure and field rules
Element or Rule | Requirement | Reader Check
urlset | Required root element for the XML sitemap namespace. | The generated XML starts with urlset and a sitemap protocol namespace.
url | Parent element for one page URL entry. | Each included inventory row becomes one URL entry unless it is excluded or de-duplicated.
loc | Required absolute page URL, constrained to the selected site host and shorter than 2,048 characters. | Cross-host rows and overlong URLs are excluded in URL Inventory.
lastmod | Optional W3C-compatible date or date-time for the page's last meaningful change. | Invalid date values are omitted or replaced by the selected default-date policy.
changefreq | Optional crawl hint using always, hourly, daily, weekly, monthly, yearly, or never. | Invalid per-line values fall back to the selected global value when fallback mode is active.
priority | Optional same-site priority hint from 0.0 to 1.0; the protocol default is 0.5. | Entered values are clamped to the valid range and formatted to one decimal place.

The transformation from inventory to XML has four main stages. Raw rows are split into a URL plus optional fields. Relative paths are resolved against the chosen origin. Fragments are removed because sitemap URLs identify pages, not in-page anchors. The remaining values are written as escaped XML so query strings and special characters do not break the document.
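The four stages can be sketched in a few lines of Python. The function name and the pipe-separated row format mirror this article's examples; the origin value is an assumption:

```python
from urllib.parse import urljoin, urldefrag
from xml.sax.saxutils import escape

ORIGIN = "https://www.example.com"  # assumed canonical origin

def row_to_url_entry(row: str) -> str:
    """Turn one 'URL | lastmod' row into a sitemap <url> entry."""
    # Stage 1: split the raw row into a URL plus optional fields.
    fields = [f.strip() for f in row.split("|")]
    # Stage 2: resolve root-relative paths against the chosen origin.
    loc = urljoin(ORIGIN, fields[0])
    # Stage 3: drop fragments; sitemap URLs identify pages, not anchors.
    loc, _fragment = urldefrag(loc)
    # Stage 4: write escaped XML so special characters stay safe.
    lines = ["  <url>", f"    <loc>{escape(loc)}</loc>"]
    if len(fields) > 1 and fields[1]:
        lines.append(f"    <lastmod>{escape(fields[1])}</lastmod>")
    lines.append("  </url>")
    return "\n".join(lines)

print(row_to_url_entry("/pricing#plans | 2026-04-24"))
```

A production generator adds the changefreq and priority handling, limit checks, and de-duplication described elsewhere in this article, but the core pipeline is the same.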

Validation and Publication Limits:

Sitemap validation and blocking conditions
Check | Blocking Condition | Practical Response
Site origin | The origin is missing, malformed, or does not use http or https. | Fix Site origin before reviewing row-level errors.
URL entries | No valid same-host rows remain after parsing and exclusions. | Check for cross-host URLs, bad paths, blank lines, or duplicate-only input.
Protocol limits | Included rows exceed 50,000 URLs or XML exceeds 50 MB uncompressed. | Split the inventory into multiple sitemap files before publishing.
Default date | Apply one default date is active and the default date is invalid. | Use YYYY-MM-DD or switch to per-line dates or omitted dates.
Optional hints | Hints are valid protocol fields, but major search engines may ignore them. | Treat changefreq and priority as review signals, not crawl commands.
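When an inventory exceeds the 50,000-URL limit, the usual response is to split it into protocol-sized batches and list the resulting files in a sitemap index. A sketch of the batching step (filenames in the comment are illustrative):

```python
LIMIT = 50_000  # sitemap protocol maximum URLs per file

def chunk(urls, size=LIMIT):
    """Split an oversized URL inventory into protocol-sized batches."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]

urls = [f"https://www.example.com/page-{n}" for n in range(120_000)]
batches = chunk(urls)
print([len(b) for b in batches])  # [50000, 50000, 20000]
# Each batch is written to its own file (sitemap-1.xml, sitemap-2.xml, ...)
# and a sitemap index file then lists those child sitemap URLs.
```

The 50 MB uncompressed limit works the same way: measure the serialized bytes per file and split again if a batch is still too large.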

The optional schema-location setting adds XML Schema Instance attributes for validator workflows. It does not change the URL inventory, limits, date policy, or same-host checks. For most publication paths, the sitemap namespace and valid URL entries carry the important meaning.

Everyday Use & Decision Guide:

Start with Site origin set to the exact canonical origin, such as https://www.example.com. Then paste one URL or path per line in URL inventory. Root-relative paths are convenient for CMS exports, while full URLs are useful when you need to catch host mismatches before publication.

Use per-line lastmod values when your source export knows the real update date for each page. Choose Omit lastmod when the dates are unknown. Apply one default date is best reserved for a narrow batch where all missing rows genuinely changed on the same date.

  • Leave De-duplicate URLs on for a publishable sitemap. Turn it off only when auditing a raw export for duplicate causes.
  • Use Strip tracking query params when campaign parameters created duplicate URLs, but leave it off when query strings identify real canonical pages.
  • Choose URL order for predictable diffs, Shallow paths first for human review, or Input order to match the source export.
  • Open URL Inventory before copying XML. A row marked Review can still be included, but its note tells you what changed.
  • Use Publish Check as the final pause point. Any Blocked status should be fixed before the file is placed on the site.

The most common mistake is treating optional hints as commands. A weekly changefreq does not make crawlers visit weekly, and a priority of 1.0 does not outrank another site. Use those values only when they help your own same-site inventory review.

After the checks pass, publish the XML at the intended sitemap URL, add or confirm the Sitemap: directive in robots.txt, or submit the sitemap in Search Console or a comparable crawler console.
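The robots.txt entry is a single line. The path sitemap.xml here is only a convention; any URL on the host works as long as it matches where the file is actually published:

```
Sitemap: https://www.example.com/sitemap.xml
```

The Sitemap: line is independent of User-agent groups, so it can sit anywhere in robots.txt, and a file may list more than one sitemap or sitemap index.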

Step-by-Step Guide:

  1. Enter the canonical Site origin. If the origin is invalid, the summary switches to Sitemap draft blocked and Publish Check reports a host-scope blocker.
  2. Paste rows into URL inventory or load a TXT, CSV, or TSV file. Use URL | lastmod | changefreq | priority when you already have optional fields.
  3. Set Lastmod handling. Use per-line dates for CMS exports, choose a default date for one known batch, or omit dates when you cannot verify them.
  4. Choose Optional hint tags. Use per-line values with fallback when rows have mixed hints, global hints for one uniform sitemap, or omit hints when you only want loc and trusted dates.
  5. Keep De-duplicate URLs on unless duplicate review is the goal. If needed, open Advanced to set Output order, Strip tracking query params, or Include schema location.
  6. Review URL Inventory. Fix excluded cross-host URLs, invalid dates, duplicate rows, and overlong locations before relying on the XML.
  7. Open Publish Check. Clear every Blocked row and decide whether any Review row is acceptable for your publishing workflow.
  8. Use Sitemap XML as the copy-ready output once the included URL count, byte size, lastmod count, and review notes match the intended sitemap.

Interpreting Results:

Sitemap XML ready means the current inputs produce at least one included URL, the origin is valid, and the basic protocol limits are not exceeded. Sitemap ready with review notes means XML exists, but at least one row was changed, warned, excluded, or de-duplicated. Sitemap draft blocked means the output should not be published yet.

How to read sitemap generator result signals
Result Cue | What It Means | What to Do
loc count | The number of included URL entries written to the XML. | Compare it with the expected crawl inventory count.
excluded count | Rows were dropped because they were invalid, cross-host, overlong, or duplicate under the current settings. | Open URL Inventory and review each note before publishing.
lastmod count | Included URLs with accepted date or date-time values. | Make sure those dates reflect real page updates, not a blanket regeneration date.
Protocol limits | The sitemap is checked against 50,000 URLs and 50 MB uncompressed size. | Split the inventory when either limit is exceeded.
Optional hints | changefreq and priority are present or omitted under the selected hint policy. | Do not read hint presence as crawler scheduling or ranking control.

A valid XML file does not prove that the URLs return 200, point to self-canonical pages, or are allowed by robots rules. Before submitting a production sitemap, spot-check representative URLs and confirm that excluded rows were intentionally left out.

Worked Examples:

A small marketing site can start with Site origin set to https://www.example.com and rows such as / | 2026-05-01 | weekly | 1.0, /pricing | 2026-04-24 | monthly | 0.8, and /contact | 2026-03-29 | monthly | 0.6. Sitemap XML should contain three url entries, and Publish Check should pass protocol limits while marking optional hints for review.
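For those three rows, the generated file should look roughly like this; whitespace and attribute details may differ from generator to generator:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com/</loc>
    <lastmod>2026-05-01</lastmod>
    <changefreq>weekly</changefreq>
    <priority>1.0</priority>
  </url>
  <url>
    <loc>https://www.example.com/pricing</loc>
    <lastmod>2026-04-24</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.8</priority>
  </url>
  <url>
    <loc>https://www.example.com/contact</loc>
    <lastmod>2026-03-29</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.6</priority>
  </url>
</urlset>
```

Note that the root-relative paths have been resolved against the origin and each row became exactly one url entry.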

A campaign export might contain /docs/getting-started?utm_source=newsletter and /docs/getting-started?utm_medium=email. With Strip tracking query params and De-duplicate URLs enabled, the normalized location becomes one canonical docs URL, one duplicate is excluded, and URL Inventory shows why the count dropped.
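The normalization in that example can be sketched in Python. The utm_ prefix is the common tracking-parameter convention; a real generator may strip a longer list:

```python
from urllib.parse import urlparse, urlunparse, parse_qsl, urlencode

TRACKING_PREFIXES = ("utm_",)  # assumed tracking-parameter convention

def strip_tracking(url: str) -> str:
    """Remove utm_* query parameters, keeping other query strings intact."""
    parts = urlparse(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if not k.startswith(TRACKING_PREFIXES)]
    return urlunparse(parts._replace(query=urlencode(kept)))

rows = [
    "https://www.example.com/docs/getting-started?utm_source=newsletter",
    "https://www.example.com/docs/getting-started?utm_medium=email",
]
# De-duplication keeps the first normalized URL and drops later copies.
unique = list(dict.fromkeys(strip_tracking(u) for u in rows))
print(unique)  # ['https://www.example.com/docs/getting-started']
```

Because the filter only matches the tracking prefix, a query string that identifies a real canonical page, such as ?page=2, survives the normalization.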

A troubleshooting case begins with https://blog.example.com/post-1 in the inventory while Site origin is https://www.example.com. The row is excluded as cross-host, excluded increases, and Publish Check tells you to review source exclusions. Fix it by changing the origin only if the sitemap is truly for the blog host, or by removing the cross-host URL from the www sitemap.
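The cross-host check behind that exclusion reduces to a scheme-and-host comparison. A minimal sketch, assuming the same origin as above:

```python
from urllib.parse import urlparse

ORIGIN = "https://www.example.com"  # the configured Site origin

def same_host(url: str) -> bool:
    """A row is included only when scheme and host match the origin."""
    o, u = urlparse(ORIGIN), urlparse(url)
    return (u.scheme, u.netloc) == (o.scheme, o.netloc)

print(same_host("https://www.example.com/pricing"))   # True
print(same_host("https://blog.example.com/post-1"))   # False: cross-host
```

Note that the comparison is exact, so www.example.com and example.com count as different hosts, as the sitemap protocol requires.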

FAQ:

Can I include URLs from more than one host?

No for this generator. Included rows are constrained to the host in Site origin, and cross-host URLs are marked excluded in URL Inventory.

Should every URL have lastmod?

Only when the date is real. Use per-line dates from a reliable export, apply a default date only for a known same-date batch, or choose Omit lastmod when the update date is uncertain.

Why did a duplicate URL disappear?

De-duplicate URLs keeps the first identical normalized URL and excludes later copies. Turn it off only when you want the inventory table to show duplicate source rows without removing them.

Do changefreq and priority control crawling?

No. They are optional sitemap hints. The Publish Check tab marks them for review because major search engines may ignore those fields.

Why is the sitemap blocked even though XML appears?

The blocker comes from a required publish check, such as an invalid origin, no included URLs, an invalid default date, more than 50,000 included URLs, or output above 50 MB uncompressed.

Glossary:

XML sitemap
A structured list of URLs and optional metadata that helps crawlers discover site pages.
Site origin
The protocol and host that define which URLs belong in this sitemap, such as https://www.example.com.
loc
The required sitemap element that contains the absolute URL for one page.
lastmod
An optional date or date-time for the page's last meaningful modification.
changefreq
An optional hint about likely page-change frequency, not a crawler schedule.
priority
An optional same-site priority hint from 0.0 to 1.0.
