You learn how to create and submit an XML sitemap to Google Search Console. You see “Success.” You assume Google’s crawling everything now. Then you check back in three weeks and half your pages still aren’t indexed. Or worse, Google’s crawling 10,000 pages you never wanted indexed in the first place, burning through crawl budget on pagination filters and archive URLs while your best content sits undiscovered.
This happens because most people treat XML sitemaps like a checkbox. Generate it, submit it, forget it. But sitemaps aren’t magic. They’re a crawl efficiency mechanism. And when you don’t understand how search engines actually use them, you end up creating indexation problems instead of solving them.
This guide explains how to create and submit an XML sitemap by breaking down the system underneath it—what sitemaps actually do, how crawlers prioritize them, what to include versus exclude, and how to maintain them as your site evolves. Not just the steps. The structure.
An XML sitemap is a structured file that lists URLs on your website along with metadata about when they were last updated, how frequently they change, and (theoretically) how important they are relative to other pages.
You’re essentially giving search engines a declared list of URLs you want crawled.
But here’s what most content gets wrong: A sitemap is not a discovery mechanism. Google doesn’t need your sitemap to find your pages. Crawlers follow internal links. They parse your site structure. They monitor your RSS feeds. If a page is properly linked within your site, Google will find it eventually.
So why do sitemaps exist?
They exist to influence crawl prioritization. When you submit a sitemap, you’re telling search engines “these are the URLs I care about, and here’s when they were last updated.” The crawler reads that, compares it to what it already knows about your site, and decides whether those URLs are worth crawling based on your site’s crawl budget allocation.
Crawl budget is the number of pages Google is willing to crawl on your site within a given timeframe. It’s determined by your site’s authority, server performance, and perceived content quality. Low-authority sites get less crawl budget. High-authority sites with fast servers get more.
If you have 10,000 pages but Google only crawls 500 per day, your sitemap helps determine which 500 get prioritized. If your sitemap is full of junk—duplicate pages, parameter URLs, archived content nobody searches for—you’re directing crawl budget away from pages that actually matter.
That’s the tradeoff most people miss. Inclusion feels safer than exclusion. But when you include everything, you dilute crawl focus.
XML sitemaps are machine-readable files designed for search engines. HTML sitemaps are human-readable pages designed for site navigation. They serve different purposes.
An HTML sitemap helps users navigate your site structure. It’s a page on your website with links organized by category. Some SEOs argue it helps with internal linking, and it can—but mostly for sites with poor navigation architecture. If your site structure is clean, HTML sitemaps are optional.
XML sitemaps, on the other hand, are required for any site that wants efficient crawling. They’re not displayed to users. They’re read by bots. And they follow a specific format that search engines parse automatically.
You need an XML sitemap. You probably don’t need an HTML sitemap unless your site has 10,000+ pages with weak internal linking.
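When you submit a sitemap to Google Search Console or Bing Webmaster Tools, the search engine fetches the file (e.g., yoursite.com/sitemap.xml), reads the URLs it lists, compares them to what it already knows about your site, and decides which ones are worth crawling.
Notice what’s missing: There’s no guarantee.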
Google can ignore your sitemap entirely. If your sitemap lists URLs that are blocked by robots.txt, return 404 errors, redirect to other pages, or contain thin content, Google will stop trusting your sitemap. Over time, if your sitemap consistently includes low-quality URLs, crawlers will rely more on internal link signals and less on your declared list.
This is why sitemap accuracy matters. It’s not just about having one. It’s about maintaining one that reflects your actual site priorities.
Several variables determine whether a search engine actually crawls the URLs in your sitemap:
Site authority – Low-authority sites get smaller crawl budgets, so sitemaps become more important as a crawl efficiency tool. High-authority sites can afford sloppier sitemaps because Google crawls them aggressively anyway.
Internal linking strength – If a URL is deeply linked within your site, Google prioritizes it over sitemap declarations. Strong internal links override sitemaps. Weak internal links make sitemaps more influential.
Content update frequency – If your sitemap’s lastmod dates are accurate and show recent updates, crawlers check those pages more often. If your lastmod dates never change or are clearly wrong, crawlers ignore them.
Server performance – If your sitemap file is slow to load or your server throttles requests, crawlers will reduce how often they fetch it. Fast sitemap delivery improves crawl frequency.
Historical trust – If you’ve submitted sitemaps in the past that were full of 404s, redirects, or blocked URLs, Google reduces its reliance on your sitemaps. Trust degrades over time with repeated inaccuracies.
The pattern here: Sitemaps work best when they reflect reality. They stop working when they become wishful thinking.
Most sitemap guides tell you to include every indexable URL. That’s wrong.
A better principle: Only include URLs you actively want crawled and indexed.
This means excluding:
Sorting and filter parameter URLs (e.g., ?sort=price-low-to-high)
Internal search result pages (e.g., /search?q=keyword)
The goal is crawl focus. Every URL in your sitemap competes for crawl budget. If you include 5,000 URLs but only 500 actually matter for your business, you’re diluting the signal.
For large sites—ecommerce stores with 50,000 products, content hubs with 10 years of blog archives—this gets more complex. You need to prioritize strategically.
If your site has more than 10,000 URLs, you should break your sitemap into multiple files organized by category or priority level.
Instead of one massive sitemap.xml file, create:
sitemap-index.xml (the master file that lists all other sitemaps)
sitemap-products.xml (active product pages)
sitemap-blog.xml (recent blog content)
sitemap-categories.xml (high-value category pages)
sitemap-archive.xml (older content that still has value but doesn’t need frequent crawling)
This structure gives you granular control. You can update sitemap-products.xml daily while only updating sitemap-archive.xml monthly. You can submit different sitemaps to different search engines if needed. And you can monitor crawl stats per sitemap to see which content types Google prioritizes.
For ecommerce specifically, exclude:
For content sites, exclude:
Date-based archive pages (e.g., /2019/03/)
The cleaner your sitemap, the more efficiently crawlers navigate your site.
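The exclusion rules above can be expressed as a small filter that runs before any URL is written to the sitemap. This is a minimal sketch, not a complete rule set: the patterns and the `belongs_in_sitemap` name are assumptions for illustration, and your own URL scheme will need its own rules.

```python
import re
from urllib.parse import urlsplit

# Hypothetical exclusion rules; adjust the patterns to your own URL scheme.
DATE_ARCHIVE = re.compile(r"^/\d{4}/\d{2}/?$")  # e.g. /2019/03/
EXCLUDED_PREFIXES = ("/search",)                 # internal search results


def belongs_in_sitemap(url: str) -> bool:
    """Return True if the URL passes the exclusion rules above."""
    parts = urlsplit(url)
    if parts.query:  # any query string, e.g. ?sort=price-low-to-high
        return False
    if parts.path.startswith(EXCLUDED_PREFIXES):
        return False
    if DATE_ARCHIVE.match(parts.path):
        return False
    return True


candidates = [
    "https://example.com/products/blue-widget",
    "https://example.com/products?sort=price-low-to-high",
    "https://example.com/search?q=keyword",
    "https://example.com/2019/03/",
]
kept = [u for u in candidates if belongs_in_sitemap(u)]
```

Here only the product page survives. Treating every query string as excluded is deliberately strict; if some parameterized URLs are canonical on your site, that rule would need an allowlist.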
There are three main approaches: manual creation, CMS plugins, and programmatic generation. Which one you use depends on your site’s size and technical complexity.
If you have fewer than 50 pages and a static site, you can create an XML sitemap manually.
The basic structure looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/page-1</loc>
    <lastmod>2026-02-15</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.8</priority>
  </url>
  <url>
    <loc>https://example.com/page-2</loc>
    <lastmod>2026-02-10</lastmod>
    <changefreq>weekly</changefreq>
    <priority>1.0</priority>
  </url>
</urlset>
Save this as sitemap.xml and upload it to your site’s root directory.
Important notes on XML tags:
<loc> – The full URL (required)
<lastmod> – Last modified date in YYYY-MM-DD format (optional but recommended)
<changefreq> – How often the page changes: always, hourly, daily, weekly, monthly, yearly, never (optional and mostly ignored by Google)
<priority> – A value from 0.0 to 1.0 indicating relative importance (optional; Google ignores this)
For small sites, this works. For anything larger, manual management becomes unsustainable.
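Once hand-editing stops being practical, the same structure can be produced by a short script. The sketch below uses Python's standard `xml.etree.ElementTree`; the `build_sitemap` helper and the page list are assumptions for illustration. It writes only `<loc>` and `<lastmod>`, leaving out the two tags Google pays little or no attention to.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages: list[tuple[str, str]]) -> bytes:
    """Build a sitemap from (url, lastmod) pairs; lastmod is YYYY-MM-DD."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc  # special characters are escaped on output
        ET.SubElement(url, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)


xml_bytes = build_sitemap([
    ("https://example.com/page-1", "2026-02-15"),
    ("https://example.com/page-2", "2026-02-10"),
])

# Writing the file is left to the caller, for example:
# with open("sitemap.xml", "wb") as fh:
#     fh.write(xml_bytes)
```

Letting the XML library do the serialization avoids the most common hand-written mistake: an unescaped `&` in a URL.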
Most modern content management systems have sitemap plugins or built-in sitemap generators.
WordPress: Yoast SEO, Rank Math, and All in One SEO all generate sitemaps automatically. They update dynamically as you publish new content. Configuration options let you exclude post types, taxonomies, or individual pages.
Shopify: Has a built-in sitemap at yourstore.com/sitemap.xml. It includes products, collections, pages, and blog posts automatically. You can’t customize it directly, but you can use apps like Sitemap NoIndex to exclude specific pages.
Webflow: Generates sitemaps automatically for published pages. Limited customization options.
Wix, Squarespace, Weebly: All generate sitemaps automatically. You generally can’t control what’s included, which is fine for small sites but problematic for large ones.
The advantage of plugins: Automation. You don’t have to manually update the sitemap every time you publish content.
The disadvantage: Lack of control. Most plugins include everything by default. If you have thin tag pages or parameter URLs, the plugin will add them to your sitemap unless you explicitly configure exclusions.
If you’re using a plugin, audit it. Don’t assume it’s doing what you want.
For large sites, dynamic sites, or headless setups, you need programmatic sitemap generation.
This typically involves:
Serving the generated XML at a public URL (e.g., /sitemap.xml)
Most modern frameworks have sitemap libraries:
Next.js: next-sitemap package
Django: the built-in django.contrib.sitemaps app
Ruby on Rails: sitemap_generator gem
Laravel: spatie/laravel-sitemap package
For JavaScript-heavy sites (React, Vue, Angular), make sure your sitemap generation happens server-side or during build time. Client-side rendering doesn’t work for sitemaps: crawlers need the XML file immediately accessible.
If your site has more than 50,000 URLs, you’ll need a sitemap index file that links to multiple sitemaps.
Example sitemap index structure:
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://example.com/sitemap-products.xml</loc>
    <lastmod>2026-02-15</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://example.com/sitemap-blog.xml</loc>
    <lastmod>2026-02-14</lastmod>
  </sitemap>
</sitemapindex>
This scales to 50,000 sitemaps × 50,000 URLs = 2.5 billion URLs. If you need more than that, you have bigger problems than sitemap structure.
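To make the 50,000-URL limit concrete, here is a rough sketch of splitting a large URL list into child sitemaps and building the index that points at them. The `chunk_urls` and `build_index` helpers and the `sitemap-products-N.xml` naming scheme are assumptions for illustration, not a standard.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_FILE = 50_000  # per-file limit from the sitemap protocol


def chunk_urls(urls: list[str], size: int = MAX_URLS_PER_FILE) -> list[list[str]]:
    """Split a URL list into consecutive chunks of at most `size` entries."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_index(sitemap_urls: list[str], lastmod: str) -> bytes:
    """Build a sitemap index pointing at each child sitemap file."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for loc in sitemap_urls:
        entry = ET.SubElement(index, "sitemap")
        ET.SubElement(entry, "loc").text = loc
        ET.SubElement(entry, "lastmod").text = lastmod
    return ET.tostring(index, encoding="UTF-8", xml_declaration=True)


urls = [f"https://example.com/product/{n}" for n in range(120_000)]
chunks = chunk_urls(urls)
# Hypothetical naming scheme for the child files.
children = [
    f"https://example.com/sitemap-products-{i + 1}.xml" for i in range(len(chunks))
]
index_xml = build_index(children, "2026-02-15")
```

With 120,000 product URLs this yields three child files (50,000, 50,000, and 20,000 entries) and an index with three entries. In practice each child would carry its own lastmod rather than a shared one.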
Before submitting your sitemap, validate it.
Common errors that break sitemaps:
Use a sitemap validator before submitting:
If your sitemap has errors, fix them before submitting. Repeated submission of broken sitemaps trains crawlers to ignore you.
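A quick local check can catch the most basic problems before the file ever reaches a search engine. The sketch below is an assumption-laden starting point, not a full validator: `validate_sitemap` is a hypothetical helper that only checks well-formedness, the root element, the 50,000-URL limit, absolute `<loc>` values, and the shape of `<lastmod>`.

```python
import re
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
DATE_PREFIX = re.compile(r"^\d{4}-\d{2}-\d{2}")  # YYYY-MM-DD, optionally followed by a time


def validate_sitemap(xml_text: str) -> list[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]
    if root.tag != f"{NS}urlset":
        return [f"unexpected root element: {root.tag}"]

    problems = []
    entries = root.findall(f"{NS}url")
    if len(entries) > 50_000:
        problems.append(f"too many URLs: {len(entries)}")
    for entry in entries:
        loc = (entry.findtext(f"{NS}loc") or "").strip()
        parts = urlsplit(loc)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            problems.append(f"loc is not an absolute URL: {loc!r}")
        lastmod = entry.findtext(f"{NS}lastmod")
        if lastmod is not None and not DATE_PREFIX.match(lastmod.strip()):
            problems.append(f"bad lastmod for {loc}: {lastmod!r}")
    return problems
```

It does not fetch any URL, so 404s, redirects, and robots.txt blocks still need a separate crawl-based check.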
Once your sitemap is generated and validated, submit it to search engines.
In Google Search Console, open the Sitemaps report and enter your sitemap URL (e.g., sitemap.xml or https://example.com/sitemap.xml)
Google will fetch your sitemap and start processing the URLs. This doesn’t mean instant indexation. It means Google has queued your URLs for crawling.
Check back in a few days to see the Discovered vs. Indexed count. If you submitted 1,000 URLs but only 200 are indexed, that’s a signal. Either:
This is where ongoing monitoring matters.
Bing typically crawls sitemaps faster than Google for new sites. Its reach also extends beyond Bing itself: Bing powers DuckDuckGo and other privacy-focused engines, so a Bing submission covers those too. (Russia and China are separate cases, served mainly by Yandex and Baidu.)
IndexNow is a protocol that lets you instantly notify search engines when content changes instead of waiting for them to crawl your sitemap.
Instead of submitting a static sitemap and hoping crawlers check it, you ping an API endpoint whenever you publish or update a page.
Example API call:
POST https://api.indexnow.org/indexnow
{
  "host": "example.com",
  "key": "your-api-key",
  "keyLocation": "https://example.com/your-api-key.txt",
  "urlList": [
    "https://example.com/new-page"
  ]
}
Bing and Yandex support this. Google doesn’t (yet). For high-frequency publishing—news sites, ecommerce stores launching products daily—IndexNow reduces time-to-index significantly.
If you’re on WordPress, plugins like Rank Math support IndexNow integration automatically.
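If you are not on a CMS with a plugin, the same call can be made from a publish hook. The sketch below builds the request with Python's standard library and deliberately stops short of sending it. The `build_indexnow_request` helper is hypothetical, and it assumes the key file lives at the site root under the key's name, as in the example above.

```python
import json
import urllib.request

INDEXNOW_ENDPOINT = "https://api.indexnow.org/indexnow"


def build_indexnow_request(host: str, key: str, urls: list[str]) -> urllib.request.Request:
    """Prepare (but do not send) an IndexNow submission for the given URLs."""
    payload = {
        "host": host,
        "key": key,
        # Assumes the key file is hosted at the site root under the key's name.
        "keyLocation": f"https://{host}/{key}.txt",
        "urlList": urls,
    }
    return urllib.request.Request(
        INDEXNOW_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )


request = build_indexnow_request(
    "example.com", "your-api-key", ["https://example.com/new-page"]
)
# To actually submit, uncomment the next line (this performs a network call):
# urllib.request.urlopen(request)
```

Keeping request construction separate from sending makes the payload easy to log and test before it goes anywhere.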
You can also declare your sitemap in your robots.txt file:
User-agent: *
Sitemap: https://example.com/sitemap.xml
This helps crawlers discover your sitemap automatically, even if you haven’t manually submitted it to search consoles. It’s good hygiene, but not a replacement for GSC submission—manual submission gives you monitoring tools.
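To confirm the declaration is actually readable, you can parse the file the way a crawler would. This small sketch uses `urllib.robotparser` from Python's standard library (its `site_maps()` method needs Python 3.8 or later); the robots.txt content is the example above, inlined as a string so nothing is fetched.

```python
from urllib.robotparser import RobotFileParser

robots_txt = """\
User-agent: *
Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())
declared = parser.site_maps()  # list of Sitemap URLs, or None if there are none
```

Against a live site you would call `parser.set_url(...)` and `parser.read()` instead of `parse()`, which does perform a network request.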
Most people submit their sitemap once and forget about it. That’s where the system breaks down.
Your site evolves. You publish new content. You delete old pages. You restructure categories. You update product listings. If your sitemap doesn’t reflect these changes, it becomes increasingly inaccurate—and crawlers start ignoring it.
Here’s the maintenance system:
Monthly: Audit sitemap accuracy
Quarterly: Review crawl efficiency
After major site changes: Regenerate and resubmit
URL structure changes (e.g., /blog/post-name to /content/post-name)
After content launches: Update immediately
The goal isn’t perfection. It’s sustained accuracy. A sitemap that’s 95% accurate and updated regularly is more valuable than a perfect sitemap that never changes.
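One way to keep the monthly audit cheap is to diff the sitemap against the set of URLs your site currently considers live and indexable. The sketch below assumes you can produce both sets yourself (for example from the sitemap file and from your CMS database); `audit_sitemap` is a hypothetical helper, not part of any tool.

```python
def audit_sitemap(sitemap_urls: set[str], live_urls: set[str]) -> dict[str, list[str]]:
    """Compare the sitemap against the URLs that should currently be indexed."""
    return {
        # Listed in the sitemap but no longer live (deleted, redirected, noindexed).
        "stale": sorted(sitemap_urls - live_urls),
        # Live and indexable but absent from the sitemap.
        "missing": sorted(live_urls - sitemap_urls),
    }


report = audit_sitemap(
    sitemap_urls={"https://example.com/blog/old-post", "https://example.com/pricing"},
    live_urls={"https://example.com/pricing", "https://example.com/content/new-post"},
)
```

In this example the report flags `/blog/old-post` as stale and `/content/new-post` as missing, which is the kind of drift a URL restructure tends to leave behind.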
Submitting a sitemap is easy. Knowing whether it’s actually helping is harder.
Here’s what to track:
Indexation rate – Compare the number of URLs in your sitemap to the number of indexed pages in Google Search Console. If you submitted 1,000 URLs but only 400 are indexed, investigate why.
Crawl frequency – GSC’s Crawl Stats report shows how often Google crawls your site. If you submit an updated sitemap but crawl frequency doesn’t increase, your sitemap might not be influencing crawler behavior.
Time to indexation – For new content, measure how long it takes from publication to Google indexation. If it’s consistently 7+ days, your sitemap might not be getting crawled frequently enough—or your site’s crawl budget is too low.
Sitemap errors – GSC flags errors like 404s, blocked URLs, and redirect chains. If errors persist across multiple crawls, fix them. Repeated errors degrade sitemap trust.
Coverage issues – GSC’s coverage report shows which URLs Google discovered but didn’t index, and why. Common reasons: duplicate content, low quality, crawled but not indexed (usually a quality signal). If sitemap URLs are showing up as “Crawled – currently not indexed,” that’s usually a content quality problem, not a sitemap problem. “Discovered – currently not indexed” is different: Google knows the URL exists but hasn’t crawled it yet, which points to crawl budget or server capacity.
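The first and third metrics above are simple enough to track in a spreadsheet or a few lines of code. A minimal sketch, assuming you copy the submitted and indexed counts out of Search Console by hand; both function names are hypothetical.

```python
from datetime import date


def indexation_rate(submitted: int, indexed: int) -> float:
    """Share of submitted sitemap URLs that are indexed, as a fraction of 1."""
    if submitted <= 0:
        raise ValueError("submitted must be positive")
    return indexed / submitted


def days_to_index(published: date, first_indexed: date) -> int:
    """Whole days between publication and first appearance in the index."""
    return (first_indexed - published).days


rate = indexation_rate(submitted=1000, indexed=400)
lag = days_to_index(date(2026, 2, 1), date(2026, 2, 9))
```

For the example in the text, 400 indexed out of 1,000 submitted is a rate of 0.4, and a page published on February 1 and indexed on February 9 took 8 days, past the 7-day mark mentioned above.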
If your sitemap isn’t improving indexation velocity or crawl efficiency, the issue is usually one of three things:
Sitemaps amplify efficiency. They don’t create it from nothing.
If a URL has a noindex tag or is blocked by robots.txt, don’t include it in your sitemap. Google explicitly warns against this. It wastes crawl budget and reduces sitemap trust.
If your lastmod dates never change or are clearly wrong (e.g., all pages show the same date), Google will ignore them. Only include lastmod if you’re tracking actual content updates.
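One way to keep lastmod honest is to change it only when the page content actually changes. The sketch below does that with a content hash. It is an illustration under assumptions: the record layout and the `update_lastmod` helper are hypothetical, and hashing the raw HTML will also react to template changes, which you may or may not want.

```python
import hashlib


def update_lastmod(record: dict, new_content: str, today: str) -> dict:
    """Return an updated record; lastmod changes only if the content hash does."""
    digest = hashlib.sha256(new_content.encode("utf-8")).hexdigest()
    if digest == record.get("hash"):
        return record  # unchanged content keeps its old lastmod
    return {**record, "hash": digest, "lastmod": today}


page = {"loc": "https://example.com/page-1", "hash": None, "lastmod": None}
page = update_lastmod(page, "<h1>Pricing</h1>", today="2026-02-10")
same = update_lastmod(page, "<h1>Pricing</h1>", today="2026-02-15")
changed = update_lastmod(page, "<h1>New pricing</h1>", today="2026-02-15")
```

Regenerating the sitemap on February 15 leaves `same` dated February 10, while `changed` moves to February 15. That avoids the "every page shows today's date" pattern that teaches crawlers to ignore the field.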
Your sitemap should only include canonical URLs. If you have duplicate content with canonical tags pointing elsewhere, don’t include the non-canonical versions in your sitemap. This confuses crawlers about which version to prioritize.
Pagination (e.g., /blog/page/2, /products?page=3) rarely needs to be in a sitemap unless each page has unique, valuable content. Most paginated pages are thin and waste crawl budget. Google no longer uses rel="next" and rel="prev" as an indexing signal, so rely on plain, crawlable links between paginated pages instead, and if you implement infinite scroll, back it with proper anchor links.
If your site relies heavily on JavaScript (React, Vue, Angular), make sure your sitemap URLs are server-side rendered or pre-rendered at build time. Google can render JavaScript, but it’s slower and less reliable. If a URL requires JS execution to display content, it might not get indexed even if it’s in your sitemap.
If you have multiple language or regional versions of the same content (e.g., example.com/en/page and example.com/es/page), use hreflang annotations inside your sitemap to tell search engines which version to show to which audience.
Example (the enclosing <urlset> element must also declare xmlns:xhtml="http://www.w3.org/1999/xhtml"):
<url>
  <loc>https://example.com/en/page</loc>
  <xhtml:link rel="alternate" hreflang="es" href="https://example.com/es/page"/>
  <xhtml:link rel="alternate" hreflang="en" href="https://example.com/en/page"/>
</url>
This prevents duplicate content issues and ensures the right version ranks in the right region.
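Writing these annotations by hand gets error-prone quickly, because every language version needs its own `<url>` entry listing all alternates, itself included. A sketch of generating them with Python's standard library follows; `build_hreflang_sitemap` and the input shape (one dict per page, mapping language code to URL) are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XHTML_NS = "http://www.w3.org/1999/xhtml"
ET.register_namespace("", SITEMAP_NS)       # serialize sitemap tags without a prefix
ET.register_namespace("xhtml", XHTML_NS)    # serialize link tags as xhtml:link


def build_hreflang_sitemap(groups: list[dict[str, str]]) -> bytes:
    """Each group maps a language code to the URL of that language version."""
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for group in groups:
        for url in group.values():
            entry = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
            ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = url
            # Every version lists all alternates, including itself.
            for lang, href in group.items():
                ET.SubElement(entry, f"{{{XHTML_NS}}}link",
                              rel="alternate", hreflang=lang, href=href)
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)


xml_bytes = build_hreflang_sitemap([
    {"en": "https://example.com/en/page", "es": "https://example.com/es/page"},
])
```

For the two-language example this produces two `<url>` entries, each carrying both `xhtml:link` alternates, with the `xmlns:xhtml` declaration placed on the root element.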
If your ecommerce site has faceted navigation (e.g., /products?color=red&size=large), these URLs should not be in your sitemap unless each filter combination has unique, valuable content. Most filter URLs are thin and create indexation bloat. Use canonical tags to consolidate them, and keep them out of your sitemap.
XML sitemaps are a crawl efficiency tool, not a magic indexation button. They work best when they reflect your site’s actual priority structure and are maintained as your site evolves.
Core principles:
Checklist:
If your sitemap is 95% accurate, updated monthly, and strategically excludes low-value pages, you’re in the top 10% of sites. Most sitemaps are bloated, outdated, and ignored by crawlers. Yours doesn’t have to be.
1. Do I need an XML sitemap if my site is small?
If your site has fewer than 10 pages and strong internal linking, technically no. Google will find everything through links. But there’s no downside to having one, and it speeds up discovery for new pages. For any site that publishes content regularly or has more than 50 pages, yes—you need a sitemap.
2. How often should I update my sitemap?
It depends on how often your content changes. News sites and ecommerce stores should update daily or use dynamic sitemap generation. Blogs that publish weekly can update weekly. Static sites can update monthly. The key is consistency—update it whenever your site structure changes significantly.
3. Can I have multiple sitemaps?
Yes. For large sites, breaking your sitemap into multiple files (products, blog, categories) improves organization and crawl efficiency. Use a sitemap index file to link them all together.
4. What’s the difference between sitemap priority and actual crawl priority?
Sitemap priority tags are deprecated—Google ignores them. Actual crawl priority is determined by internal linking, content quality, update frequency, and site authority. Strong internal links override sitemap declarations.
5. Why are some URLs in my sitemap not getting indexed?
Common reasons: low content quality, duplicate content, thin pages, blocked by robots.txt or noindex tags, or your site doesn’t have enough crawl budget. Check Google Search Console’s coverage report for specific reasons.
6. Should I include images in my sitemap?
If images are important for your SEO strategy (e.g., ecommerce product photos, visual content sites), yes. Create a separate image sitemap or add image tags to your main sitemap. For most sites, this is optional.
7. Do I need to submit my sitemap to multiple search engines?
Yes. Submit to Google Search Console and Bing Webmaster Tools at minimum. If you target specific regions, also submit to Yandex (Russia), Baidu (China), or Naver (South Korea).
8. What happens if I submit a sitemap with errors?
Google will flag the errors in Search Console and may ignore affected URLs. Repeated submission of broken sitemaps degrades trust, meaning crawlers rely less on your sitemap over time. Always validate before submitting.
9. Can I block certain pages from my sitemap but still have them indexed?
Yes. Your sitemap is advisory, not mandatory. Pages with strong internal links can still get indexed even if they’re not in your sitemap. Excluding them just deprioritizes them in crawl order.
10. How do I know if my sitemap is actually being used by Google?
Check Google Search Console’s sitemap report. It shows how many URLs were submitted, how many were discovered, and how many were indexed. If discovered and indexed numbers are close to submitted numbers, your sitemap is working. If there’s a large gap, investigate why.
11. Should I use changefreq tags in my sitemap?
They’re optional and mostly ignored by Google. Only include them if you have accurate data on how often pages actually change. Incorrect changefreq tags don’t harm anything, but they don’t help either.
12. What file format should my sitemap use?
XML is standard. You can gzip compress it to .xml.gz to reduce file size and bandwidth. Both formats are accepted by all major search engines.
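Compression is a one-liner in most languages. A minimal Python sketch using the standard `gzip` module, with a tiny inline sitemap standing in for a real file:

```python
import gzip

sitemap_xml = (
    b'<?xml version="1.0" encoding="UTF-8"?>\n'
    b'<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
    b"<url><loc>https://example.com/page-1</loc></url>\n"
    b"</urlset>\n"
)

compressed = gzip.compress(sitemap_xml)  # bytes you would save as sitemap.xml.gz
restored = gzip.decompress(compressed)   # what a crawler sees after decompression
```

Keep in mind that the protocol's size limit (50 MB) applies to the uncompressed file, so gzip saves bandwidth but does not let you fit more URLs into one sitemap.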