Search Engines (Google, Yahoo, MSN, Ask, etc.) index web pages by spidering them and applying a secret algorithm to determine page ranking.
Spiders (or “bots”) do this for each indexable page on the internet on every indexable domain. That’s where a lot of web developers get caught between usability and Search Engine Optimization.
A page called “default.aspx” may reside on my server, but users (including spiders) might be able to reach it by going to http://www.JoeLevi.com or by going to http://JoeLevi.com. The file is the same file with the same content, but spiders see it as two separate pages on two separate domains. This can negatively influence that page’s ranking in two ways:
- by having the same content on multiple sites, Search Engines may think you’re trying to spam them, and might reject your page entirely or apply some type of “negative points” in their secret algorithm;
- by splitting the relevance of your page across however many instances of that page exist in the Search Engine’s index; in other words, if you have 2 domains pointing to the same page, the relevancy of the copies adds up to 100%, so in theory each copy gets only half (now imagine if you have 5 domains, plus their “www” variants… you can see why this article is titled “Preventing SEO Schizophrenia”)
How to attack the problem
First, it’s just fine to have multiple domains, and it’s still a best practice to point both your bare domain and its www. prefix to your site. It’s also still a good idea to register the .com, the .net, and whatever misspellings of your domain name you think users might attempt when trying to get to your site.
BUT, and here’s the kicker, you can’t just point them all to the same site and serve identical content on each one. That fragments your Search Engine placement and risks having your pages removed for spamming. So how do we address that?
301 Moved Permanently Response Status
HTTP (Hypertext Transfer Protocol), like most protocols, uses status codes and headers to carry session data and metadata. A 200 status, for example, means the page was found and all is well. A 301 status, on the other hand, means that the page has permanently moved, and the accompanying Location header tells the client (which may be a web surfer or a Search Engine bot) where the new page is located. This is a key point: if the client is a spider, it can then update the Search Engine’s index to effectively combine the multiple entries into the one page that it truly is.
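To see the 301 for yourself, you can request a page without following redirects and inspect what comes back. The following is only a rough sketch: the class and method names (RedirectCheck, IsPermanentRedirectTo, Report) are made up for illustration, and the string comparison is deliberately simple, so a trailing slash or a difference in path casing would need extra handling.

```csharp
using System;
using System.Net;

public static class RedirectCheck
{
    // Pure helper: true when the status is 301 and the Location header
    // matches the URL we expect to be redirected to (ignoring case).
    public static bool IsPermanentRedirectTo(HttpStatusCode status, string location, string expectedUrl)
    {
        return status == HttpStatusCode.MovedPermanently
            && string.Equals(location, expectedUrl, StringComparison.OrdinalIgnoreCase);
    }

    // Requests the URL without following redirects and prints the result.
    public static void Report(string url, string expectedUrl)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        request.AllowAutoRedirect = false; // we want to see the 301 itself
        request.Method = "HEAD";           // headers are enough for this check

        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        {
            string location = response.Headers["Location"];
            Console.WriteLine("{0} -> {1} {2}", url, (int)response.StatusCode, location);
            Console.WriteLine("Canonical redirect: {0}",
                IsPermanentRedirectTo(response.StatusCode, location, expectedUrl));
        }
    }
}
```

Calling RedirectCheck.Report("http://JoeLevi.com/", "http://www.JoeLevi.com/") from a console application would show whether the bare domain answers with a 301 that points at the www host.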
While you can add this header response to each and every page on each and every site, I’d recommend a little code magic instead.
Here are ASP.NET C# and VB implementations to place in the Page_Load event of a MasterPage, or of a control loaded across all pages on your site (a header control, for example).
ASP.NET C# Implementation
    // Read the host name the client asked for.
    string strServerName = Request.ServerVariables["SERVER_NAME"];
    // Redirect only when the host lacks "www" and is not a prototype or admin host.
    if ( (strServerName != null) &&
         (strServerName.IndexOf("www") == -1) &&
         (strServerName.IndexOf("prototype") == -1) &&
         (strServerName.IndexOf("admin") == -1) )
    {
        Response.Status = "301 Moved Permanently";
        Response.AddHeader("Location", "http://www." + Request.ServerVariables["HTTP_HOST"] + Request.ServerVariables["URL"]);
        Response.End(); // stop processing so the rest of the page is not rendered
    }
ASP.NET VB Implementation
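    ' Redirect to the www host when the requested host name lacks "www".
    If InStr(Request.ServerVariables("SERVER_NAME"), "www") = 0 Then
        Response.Status = "301 Moved Permanently"
        Response.AddHeader("Location", "http://www." & Request.ServerVariables("HTTP_HOST") & Request.ServerVariables("URL"))
        Response.End() ' stop processing so the rest of the page is not rendered
    End If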
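The host check in either version can also be pulled out into a plain function, which makes it easy to test without a web server. This is a minimal sketch, not part of the implementations above: the CanonicalHost class and GetRedirectUrl method are hypothetical names, and it assumes "www", "prototype" and "admin" are subdomain prefixes, so it matches them at the start of the host name rather than anywhere inside it.

```csharp
using System;

public static class CanonicalHost
{
    // Host prefixes that should be left alone (assumed to be subdomains;
    // the names come from the C# example above).
    private static readonly string[] ExemptPrefixes = { "www.", "prototype.", "admin." };

    // Returns the URL to redirect to, or null when no redirect is needed.
    public static string GetRedirectUrl(string host, string pathAndQuery)
    {
        if (string.IsNullOrEmpty(host))
        {
            return null; // nothing sensible to redirect to
        }

        foreach (string prefix in ExemptPrefixes)
        {
            if (host.StartsWith(prefix, StringComparison.OrdinalIgnoreCase))
            {
                return null; // already on www, or on an exempt host
            }
        }

        // Bare domain: send the client to the www variant of the same path.
        return "http://www." + host + pathAndQuery;
    }
}
```

With this in place, GetRedirectUrl("JoeLevi.com", "/default.aspx") returns "http://www.JoeLevi.com/default.aspx", while GetRedirectUrl("www.JoeLevi.com", "/default.aspx") returns null. The Page_Load code then only has to send the 301 when the result is not null.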