Search engines such as Google can make use of XML sitemaps to discover content. Such sitemaps are useful if a site:
- Has dynamic content
- Has pages that aren't easily discovered during the crawl process (e.g. rich media)
- Is new and has few links to it
- Has a large archive of content pages that are not well linked to each other, or are not linked at all
The XML sitemap protocol is an open standard, documented at http://www.sitemaps.org.
According to the specification:
- You can provide multiple sitemap files
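- Each sitemap file can contain at most 50,000 URLs
- Each sitemap file must be no larger than 50 MB uncompressed (the limit was 10 MB in earlier versions of the protocol)
- Sitemap files may be compressed with gzip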
- Multiple sitemap files should be listed in a sitemap index file
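As a rough illustration of these rules, the following Python sketch splits a list of URLs into several gzip-compressed sitemap files and builds a sitemap index that lists them. The function names, the `sitemap-N.xml.gz` file naming and the `base_url` parameter are assumptions made for this example, not part of the protocol. The sketch only enforces the per-file URL count; it does not check the file size limit.

```python
import gzip
import xml.etree.ElementTree as ET

# XML namespace used by sitemap and sitemap index files
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_FILE = 50_000  # per-file URL limit from the protocol


def build_sitemap(urls):
    """Return the XML bytes of one sitemap file listing the given URLs."""
    if len(urls) > MAX_URLS_PER_FILE:
        raise ValueError("too many URLs for a single sitemap file")
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


def build_index(sitemap_urls):
    """Return the XML bytes of a sitemap index listing the sitemap files."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for url in sitemap_urls:
        ET.SubElement(ET.SubElement(index, "sitemap"), "loc").text = url
    return ET.tostring(index, encoding="utf-8", xml_declaration=True)


def build_all(urls, base_url, max_urls=MAX_URLS_PER_FILE):
    """Return {filename: bytes} for all sitemap files plus the index.

    The file names and base_url layout are hypothetical choices for this sketch.
    """
    files = {}
    for n, start in enumerate(range(0, len(urls), max_urls), start=1):
        part = urls[start:start + max_urls]
        # Each sitemap file is stored gzip-compressed
        files[f"sitemap-{n}.xml.gz"] = gzip.compress(build_sitemap(part))
    # The index is built from the sitemap file names collected so far
    files["sitemap-index.xml"] = build_index([base_url + name for name in files])
    return files
```

Calling `build_all(urls, "http://example.com/")` returns a dictionary that maps file names to file contents, which could then be written to the web root.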
The location of a sitemap file determines which URLs it may contain.
“A Sitemap file located at http://example.com/catalog/sitemap.xml can include any URLs starting with http://example.com/catalog/ but can not include URLs starting with http://example.com/images/.” – see http://www.sitemaps.org/protocol.php#location
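The location rule can be approximated with a simple prefix check. The sketch below is a simplified reading of the rule rather than a full implementation of the specification; the function name `allowed_in_sitemap` is made up for this example. It assumes that a page URL must share the scheme and host of the sitemap and that its path must start with the directory containing the sitemap file.

```python
from urllib.parse import urlsplit


def allowed_in_sitemap(sitemap_url, page_url):
    """Return True if page_url may be listed in the sitemap at sitemap_url."""
    sm = urlsplit(sitemap_url)
    page = urlsplit(page_url)
    # Scheme and host (including any port) must match
    if (sm.scheme, sm.netloc.lower()) != (page.scheme, page.netloc.lower()):
        return False
    # Directory that contains the sitemap file, always ending with "/"
    base_dir = sm.path.rsplit("/", 1)[0] + "/"
    # Treat a URL without a path as the site root
    page_path = page.path or "/"
    return page_path.startswith(base_dir)
```

With the URLs from the quotation, `allowed_in_sitemap("http://example.com/catalog/sitemap.xml", "http://example.com/catalog/item.html")` returns `True`, while the same sitemap checked against `http://example.com/images/logo.png` returns `False`. A sitemap placed at the site root can list any URL on that host, which is why the root is the usual location.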
See http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=156184 for Google specifics.