How to fix: Google not indexing Commerce Server 2007 pages

Alternate Titles:

  • ASP.NET 2.0 URL Rewriting Causes HTTP 500 Errors for GoogleBot
  • How to disappear from Google, Yahoo, MSN etc. with Commerce Server 2007 in less than a week
  • Cannot use a leading .. to exit above the top directory

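Tools You’ll Need:

  • Firefox (http://getfirefox.com)
  • User Agent Switcher Plug-in (http://chrispederick.com/work/user-agent-switcher/)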

Background and Fix:

Several factors can keep Commerce Server pages from being indexed; finding the root cause is the first step.

To do this, change the User Agent string in your web browser to the Googlebot’s UA string. I accomplished this by using Firefox (http://getfirefox.com) and the User Agent Switcher Plug-in (http://chrispederick.com/work/user-agent-switcher/). You’ll need to add the Googlebot UA to the plug-in (it’s "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"), then surf to your site using this newly created User Agent. Once you do that you’ll "see" what the Googlebot "sees" and can troubleshoot.

First, sometime in 2006 Google changed the way Googlebot reports itself via the User Agent string. Before the change it had been handled by ASP.NET’s default.browser definitions, and everything worked fine. After the change it started being recognized as a generic Mozilla UA. ASP.NET erroneously handles clients using this UA with the default Mozilla/1.0 settings (which assume NO COOKIES), so it inserts the session ID into the URL and issues a 302 redirect (content temporarily moved) instead of the 200 response that Googlebot expects.

To get a real picture of what the problem is, open Firefox and switch your User Agent to Googlebot (see above). Surf to your website and you should see the error that Google is seeing (you may have to turn off your custom errors to see the actual error message, but you already knew that). You’ll probably see an error like "Cannot use a leading .. to exit above the top directory".
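For reference, turning off custom errors while you troubleshoot might look like the fragment below. This is a minimal sketch, assuming your custom error settings live in the application's web.config; your existing customErrors element may carry other attributes or child elements that you will want to restore afterwards.

```xml
<!-- web.config: temporary troubleshooting setting (assumed layout) -->
<configuration>
  <system.web>
    <!-- "Off" shows the detailed ASP.NET error page to every client,
         so the real exception is visible while browsing as Googlebot.
         Do not leave this enabled on a production site. -->
    <customErrors mode="Off" />
  </system.web>
</configuration>
```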

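This is actually three problems in one:

  • the Mozilla UA detection issue
  • the cookie handling issue
  • the tagwriter bug in System.Web.UI.Html32TextWriter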

To correct the first part of this problem, the Mozilla UA detection issue, add a genericmozilla5.browser file to your application. The best write-up I’ve seen on this can be found here: http://www.kowitz.net/archive/2006/12/11/asp.net-2.0-mozilla-browser-detection-hole.aspx
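As a rough idea of what such a file contains, here is a trimmed-down sketch. It assumes the file is saved as App_Browsers/genericmozilla5.browser in the application root. The regular expression and the short capability list are illustrative and follow my recollection of the linked write-up, which sets many more capabilities, so treat that article as the authoritative version.

```xml
<!-- App_Browsers/genericmozilla5.browser (illustrative sketch, not the full file) -->
<browsers>
  <browser id="GenericMozilla5" parentID="Mozilla">
    <identification>
      <!-- Matches UA strings shaped like
           "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" -->
      <userAgent match="Mozilla/5\.(?'minor'\d+).*[Cc]ompatible; ?(?'browser'.+); ?\+?(http://.+)\)" />
    </identification>
    <capabilities>
      <capability name="majorversion" value="5" />
      <capability name="minorversion" value="${minor}" />
      <capability name="browser" value="${browser}" />
      <!-- Tell ASP.NET this client accepts cookies, so no session ID goes into the URL -->
      <capability name="cookies" value="true" />
      <!-- Use the standard writer instead of Html32TextWriter -->
      <capability name="tagwriter" value="System.Web.UI.HtmlTextWriter" />
    </capabilities>
  </browser>
</browsers>
```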

To address the second part of this problem, the cookie handling issue, change your web.config file like so:

   <authentication mode="Forms">
      <forms ... cookieless="UseCookies" />
   </authentication>
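If the identifier showing up in your URLs is the session ID rather than the forms authentication ticket, the session state element has a cookieless attribute of its own. The fragment below is a sketch under that assumption; it is not part of the original fix, and any other sessionState attributes you already use should stay in place.

```xml
<!-- web.config: assumes the application uses ASP.NET session state -->
<configuration>
  <system.web>
    <!-- Always store the session ID in a cookie, never in the URL -->
    <sessionState cookieless="UseCookies" />
  </system.web>
</configuration>
```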

The third part of the problem was already addressed in the genericmozilla5.browser file (above) but is worth going into in a little detail. ASP.NET 2.0 has a bug in the tagwriter that ASP.NET falls back to for these clients, System.Web.UI.Html32TextWriter, and it trips up many ASP.NET 2.0 URL rewriters. You’ll notice that we’re using the System.Web.UI.HtmlTextWriter class for tagwriting in our custom .browser file, so by following the instructions above you’ll have already corrected for this issue, too.

Next, you’ll need to run IISRESET (or at least recycle the application’s AppPool) to make the changes take effect (specifically the addition of the .browser file).

Then, re-test your application (still with Firefox using the Googlebot UA), and your problem should now be solved.

At this point, change your web.config back to show custom errors, generate a Google Sitemap, and submit it to Google for indexing. Expect indexing to take at least a few days, and possibly the better part of a week.
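If you have not built a sitemap before, a minimal one looks roughly like this. The URLs are placeholders (example.com and the page names are assumptions, not taken from any real site), and a real Commerce Server catalog would list one url entry per product or category page.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- sitemap.xml: minimal sketch using placeholder URLs -->
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/</loc>
  </url>
  <url>
    <loc>http://www.example.com/products.aspx</loc>
  </url>
</urlset>
```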

You can thank me later.

www.JoeLevi.com
