I am amazed at the number of supplemental pages that are caused by Google’s duplicate content filter. Many good sites do NOT realize that they in fact have duplicate content problems. Here’s how to resolve them.
We have seen good quality sites lose good internal pages because they change currency, have a dynamic naming convention, use a similar contact form for hundreds of products/listings, or have been duplicated by competitors.
These are not sites trying to spam Google with “Page Spam”. They are simply among the thousands, and probably millions, of troubled sites falling victim to a filter that most webmasters agree is a tad “overly aggressive”.
I now tell my clients to think of “Some” Search Engines as a Spam Paranoid Grandma.
This sounds funny, and it is, but it is also somewhat true. The SEs must distinguish spam from credible content, so they implement a filtering system that tries to deter spam. These filters have an accuracy percentage, meaning the SEs know some innocent sites will be hurt in order to punish the majority of bad apples. What error rate they consider acceptable is unknown.
As a webmaster, I have seen too many good quality sites hurt by duplicates, and it’s now becoming a practice for effective competitor removal (negative SEO). Filters tend to penalize relentlessly, meaning that once hurt, the site or page may be dead forever. It’s the kiss of Google Death.
I will admit I spend too much time talking about this filter. Sorry Matt, but I believe this one is hurting too many decent sites and increasing the size of the supplemental index. It simply does not determine the original source of content accurately enough. It guesses, and sometimes attributes the content to the more authoritative site.
Although some of this is the fault of the filter’s programmers, some webmasters are simply unaware of the duplicate content filter altogether.
To me, this is the biggest headache of all the filters, as it really hurts a website’s rankings and acts as a penalty. If you develop dynamic sites, you must be very careful to ensure that the same page can never be reproduced more than once and that multiple URL paths never lead to the same page.
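To make the “multiple URL paths” problem concrete, here is a minimal Python sketch that groups URLs from a crawl or server log that most likely serve the same page. The parameter names in `IGNORED_PARAMS` are my own hypothetical examples, not a standard list, so swap in whatever your site actually uses.

```python
from collections import defaultdict
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical query parameters assumed NOT to change the page content.
IGNORED_PARAMS = {"currency", "sessionid", "sort"}


def normalize_url(url):
    """Reduce a URL to a comparison key so equivalent URLs collapse together."""
    parts = urlsplit(url)
    # Drop the ignored parameters and sort the rest so order does not matter.
    query = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
    )
    # Treat "/widgets/" and "/widgets" as the same path.
    path = parts.path.rstrip("/") or "/"
    # The fragment is dropped because it is never sent to the server.
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path, urlencode(query), "")
    )


def find_duplicate_groups(urls):
    """Return {normalized_url: [original urls]} for keys reached by 2+ URLs."""
    groups = defaultdict(list)
    for url in urls:
        groups[normalize_url(url)].append(url)
    return {key: members for key, members in groups.items() if len(members) > 1}
```

Feed `find_duplicate_groups` every URL you can find for your site. Any group it returns is a page that can be reached by more than one address, which is exactly what you want to eliminate.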
1) Determine which pages are duplicated.
2) Determine if a disallow in robots.txt can be used and makes sense.
3) Use a CANONICAL tag.
4) Avoid using different URLs for the same page whenever possible.
5) Ensure mod_rewrite rules are working properly and that there is no way to reach the page through its dynamic URL. If there is, disallow access via robots.txt.
6) Disallow robots via the robots meta tag.
7) If the page changes currencies, use “if else” statements to add a robots “noindex” meta tag, or the proper canonical, to the alternate versions.
8) After removing the duplicate pages, create newly named URLs for the old pages and 301 redirect the old URLs to them.
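Steps 3, 6 and 7 can be sketched in a few lines. This is only a rough Python illustration, assuming the currency is switched with a hypothetical `currency` query parameter; your site may do it through a cookie or a path segment instead, in which case the “if else” test changes but the idea stays the same.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def head_tags_for(url, prefer_noindex=False):
    """Return the extra <head> tags for a page requested at `url`.

    Assumes (hypothetically) that currency variants differ only by a
    "currency" query parameter and that the URL without it is the original.
    """
    parts = urlsplit(url)
    params = parse_qsl(parts.query, keep_blank_values=True)
    is_currency_variant = any(key == "currency" for key, _ in params)

    if not is_currency_variant:
        # The original page: nothing extra, let it be indexed normally.
        return []
    if prefer_noindex:
        # Step 6: keep robots from indexing the variant via the meta tag.
        return ['<meta name="robots" content="noindex, follow">']
    # Steps 3 and 7: point the variant at the original with a canonical tag.
    remaining = [(key, value) for key, value in params if key != "currency"]
    canonical = urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(remaining), "")
    )
    return ['<link rel="canonical" href="{}">'.format(escape(canonical, quote=True))]
```

For example, a request for `https://example.com/item?id=5&currency=EUR` gets a canonical tag pointing at `https://example.com/item?id=5`, while the plain URL gets no extra tags. Pick either the canonical or the noindex for a given page rather than mixing the two.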