by Alan Rabinowitz
Like it or not, the world’s most popular Search Engine is old and has bugs. Since 1998 Google has been unable to fix its weak canonical handling and its 302 hijack problem, both of which can drag down a website’s rankings if they go unnoticed.
One thing Google does not give SEOs credit for is the fact that we fix sites. We make pages work for Google and conform to its webmaster guidelines, while the average layman web designer or developer remains clueless about SEO and these “Google bugs”.
The canonical fix:
Since CMS systems and blogs are everywhere, we have entered a time when programmers keep SEOs quite busy. We need to make pages search engine friendly, and with CMS systems, shopping carts, and some blogs, we have the ability to rewrite URLs to do exactly that.
Google’s Bug:
Google can sometimes lose track of the correct path to a site and trigger its aggressive duplicate content filter, which can effectively ban the website automatically. Google sees the following as four different, duplicate pages:
domain.com
domain.com/index.html
www.domain.com
www.domain.com/index.html
While Google has given us tools in Google Webmaster Tools that may help eliminate these issues, a canonical fix on the site itself is the best solution.
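One quick way to see whether those four variants have been collapsed into a single canonical URL is to request each one and look at the raw status codes. The following is only a minimal sketch, using Python’s standard library and the placeholder domain.com from the list above:

import http.client

# domain.com is a placeholder; swap in the real host names before running.
VARIANTS = [
    ("domain.com", "/"),
    ("domain.com", "/index.html"),
    ("www.domain.com", "/"),
    ("www.domain.com", "/index.html"),
]

def check(host, path):
    # Request the URL without following redirects and report the raw
    # status code plus any Location header it answers with.
    conn = http.client.HTTPConnection(host, timeout=10)
    conn.request("GET", path)
    resp = conn.getresponse()
    location = resp.getheader("Location", "")
    conn.close()
    return resp.status, location

for host, path in VARIANTS:
    status, location = check(host, path)
    extra = " -> " + location if location else ""
    print("http://" + host + path + ": " + str(status) + extra)

A clean canonical setup shows exactly one variant answering 200 and the other three answering 301 to that single form; any additional 200 means the duplicate paths are still live.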
To make this even worse, with some CMS systems the original dynamic URLs remain reachable even after rewrites are in place. So if we rewrite a URL to appear as a static .html file, we need to make sure there is no way to reach the dynamic version, such as index.php?variable&id=12345, as well.
So we need to remove the potential canonical and duplicate content issue. Google is an old application, with the potential for lots of legacy code.
All links to the home page should always go directly to the root URL and not to the file name. So we never want to link to domain.com/index.html; instead we want to link to http://domain.com/. To take this further, we want to make sure the site does not expose two paths to the same page: there should be no way for anyone to reach the original dynamic URL, only the rewritten static one.
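To audit a page for links that still point at the index file, a short script can pull the HTML and flag any offending href. This is only a rough sketch built on Python’s standard library; http://domain.com/ is a placeholder for whichever page you want to check:

import urllib.request
from html.parser import HTMLParser

class IndexLinkFinder(HTMLParser):
    # Collects every href that still points at an index file.
    def __init__(self):
        super().__init__()
        self.offenders = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value and value.split("?")[0].endswith(("index.html", "index.php")):
                self.offenders.append(value)

# Placeholder page; point this at any page whose links you want to audit.
page = "http://domain.com/"
html = urllib.request.urlopen(page, timeout=10).read().decode("utf-8", errors="replace")

finder = IndexLinkFinder()
finder.feed(html)
for href in finder.offenders:
    print("link points at the index file instead of the bare URL:", href)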
When using URL rewrites, it is imperative that we only use one form for the URLs. So if we have both:
domain.com/extension.html
and
index.php?option=com_content&task=view&id=13&Itemid=26
Google may penalize the site, because it can consider the two paths duplicate content, and the dynamic one may still exist on the site if we are not careful when implementing our rewrites; a check like the one sketched below confirms that only one of the two still answers with content.
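A quick way to verify the rewrite was implemented safely is to request both forms without following redirects and compare the raw responses. Here is a minimal sketch, again standard-library Python, using the two placeholder URLs from the example above:

import http.client
from urllib.parse import urlsplit

# Both URLs are the placeholder examples from above; substitute real ones.
STATIC_URL = "http://domain.com/extension.html"
DYNAMIC_URL = "http://domain.com/index.php?option=com_content&task=view&id=13&Itemid=26"

def raw_status(url):
    # Return the status code and Location header without following redirects.
    parts = urlsplit(url)
    path = parts.path + ("?" + parts.query if parts.query else "")
    conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    conn.request("GET", path)
    resp = conn.getresponse()
    location = resp.getheader("Location", "")
    conn.close()
    return resp.status, location

static_status, _ = raw_status(STATIC_URL)
dynamic_status, dynamic_target = raw_status(DYNAMIC_URL)
print("static :", static_status)
print("dynamic:", dynamic_status, dynamic_target)

The safe result is a 200 on the static URL and a 301 from the dynamic URL back to it; a 200 on both means the duplicate path is still live.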
Keep in mind that the best way to look at a system like this is to assume it treats us all as bad actors, as spammers. So we need to leave as little room for technical error as possible, or something that was never meant to be spam will be treated as spam, the site will not rank, and no one will ever figure out why.
The 302 hijack:
This is an issue where a site shows up in Google’s results pages but under someone else’s URL, namely a 302 redirect. Some sites are spammy about 302 redirects and use them to make their own sites appear larger, so a link may take the form of a URL like domain.com/click/webfirm/b.123.c.456.html.
Google sees the URL above as a page, but it is really a script that returns a 302 response. A 302 status code means “Found”, a temporary redirect. For some reason Google cannot tell the difference between the redirecting page and the site it redirects to. So instead of crediting the target site with a backlink, it thinks the target domain is just another domain’s internal page, and it gives the site issuing the 302 credit for the real site’s pages.
This effect is also called a “Domain Killer”, meaning that if you get hit with these, your rankings will plummet. So again, we need to watch for this, even though it is something we have little control over. The sites that hijack other sites this way are what we call PageRank hoarders: selfish webmasters who believe that using a 302 will keep their PageRank on their own domain. In fact, what these sites really are is potential domain killers. Like it or not, both Amazon and Alexa pass 302 redirects to every site listed in the Alexa traffic portal.
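When reviewing links of this kind, it helps to see exactly which status code they pass. The sketch below, again standard-library Python and using the hypothetical redirect-style URL from above as a stand-in, prints the raw status and target of each suspect link:

import http.client
from urllib.parse import urlsplit

# The URL below is the hypothetical redirect-style link from above; replace
# it with any links you suspect of passing 302s at your pages.
SUSPECT_LINKS = [
    "http://domain.com/click/webfirm/b.123.c.456.html",
]

for url in SUSPECT_LINKS:
    parts = urlsplit(url)
    path = parts.path + ("?" + parts.query if parts.query else "")
    conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    conn.request("GET", path)
    resp = conn.getresponse()
    # A 301 passes the link cleanly; a 302 aimed at your domain is the
    # hijack pattern described above.
    print(url, "answers", resp.status, "->", resp.getheader("Location", ""))
    conn.close()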
Google is the only major Search Engine that does not handle 302s well, and the only major Search Engine with aggressive duplicate content filters. The problem with Google’s duplicate content filter is that if we syndicate a competitor’s website, it may eventually get removed by the filters and never show up again in Google.