Sitemaps: Is there such a thing as too big?

I use the Google XML-Sitemap plugin in all my WordPress installations. It’s a great way to tell Google, Yahoo and Bing when my site updates, and for them to easily see what’s new. It’s a nice set-and-forget kind of plugin.

I’ve been noticing something strange on one of my sites lately – the sitemap keeps timing out and producing nothing but a blank screen. That’s never good – so I’ve been digging into the site to suss out the issue.

I think part of the problem is that this particular site has over 2,700 blog posts. Building the sitemap for that site, including all the tags and categories, was killing the process.

My question today is this: should we be limiting the number of items we put in our sitemap.xml file? Should I cut it off at an arbitrary number, such as 25, or leave all 2,700 in there, knowing that only the most recent 5 or so posts are the ones I really want Google to see?
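
To make the trade-off concrete, here is a minimal Python sketch of what "cutting it off at 25" could look like. It is not how the plugin works internally; the build_sitemap function, the example.com URLs and the one-post-per-day dates are all made up for illustration. For what it's worth, the sitemaps.org protocol allows up to 50,000 URLs in a single file, so 2,700 entries is well within the format's limits and the cap is a choice rather than a requirement.

```python
from datetime import date, timedelta
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(posts, limit=None):
    """Return sitemap XML for the `limit` most recently modified posts.

    `posts` is an iterable of (url, last_modified_date) pairs.
    limit=None keeps every post. (Hypothetical helper, not a plugin API.)
    """
    newest_first = sorted(posts, key=lambda post: post[1], reverse=True)
    if limit is not None:
        newest_first = newest_first[:limit]

    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, lastmod in newest_first:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
        ET.SubElement(entry, "lastmod").text = lastmod.isoformat()
    return ET.tostring(urlset, encoding="unicode")


# Made-up data: 2,700 posts, one per day, on a placeholder domain.
posts = [
    (f"https://example.com/post-{i}/", date(2011, 1, 26) - timedelta(days=i))
    for i in range(2700)
]

capped = build_sitemap(posts, limit=25)  # only the 25 newest posts
full = build_sitemap(posts)              # all 2,700 posts
```

Writing the result to a static sitemap.xml whenever a post is published, instead of building it on every request, would be one way to sidestep the timeout whichever limit I pick.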

2 thoughts on “Sitemaps: Is there such a thing as too big?”

  1. You can manually upload a sitemap to Google with Webmaster Tools. I’m not sure if this plugin is creating the sitemap dynamically, hence the timeout? So that might be a solution.

    I find that even on a non-WordPress site Google only follows a certain number of the links in the sitemap we provide, e.g. our sitemap contains 2,000 URLs but Google says it is only using 1,100 of them. Maybe this is due to dead links?

    On a blogging site, I think WordPress does a great job of directing the spider around to pages like archives, categories, etc. I’ve read that when two pages might be duplicates, Google will link to the one provided in the sitemap.

    Sitemaps also come in handy with sections of your site that aren’t included in navigation but you still want to index. So maybe save a certain number of links for that purpose.

  2. I have also seen LARGE sites (100,000+ pages) that have everything in the sitemap but don’t get everything into the search index. I imagine this is a classic problem for large college websites.