by Alan Rabinowitz
For those of you who do not know, SEO Image is one of the most plagiarized websites. Our content is stolen and rewritten every day by new and novice SEO companies throughout the world.
One issue we have is that these novice SEO companies not only copy word for word, but some cause the same effect that the proxy search portals do: they trip the duplicate content filter. For those of you new to my blog, I have been very anti-duplicate-content-filter since its unleashing in 2005 as an overly aggressive filter.
So, to take this further: proxy sites let searchers mask the IP address they are searching from, since the proxy server will allow someone to access a site that has banned their region from accessing it. I do not want to get too technical about proxy servers; you can read more at Wikipedia. The problem with proxy servers is that they cache the websites browsed through them and then allow search engines to spider those cached copies, so the proxies appear to be larger websites (page spam) and rank better, driving clicks to their paid ads.
The Google URL Removal tool is a sure way of removing proxy duplicates. While we feel the duplicate content filter will remove most copies, the proxy search results concern us because they are used by Black Hat SEOs to try to hurt other websites' rankings.
There is one easy way to remove the proxy servers with Google's URL Removal tool. First, you need to be able to deny IP ranges from accessing your website, either in Windows IIS Administration or via .htaccess on Linux servers.
The steps:
- Find the proxy indexed in Google with your content.
- Find the reverse DNS using DNS Stuff to determine the IP address; we generally block both the name servers and the IP by C class (XXX.XXX.C-Class.XXX, i.e. the first three octets). If the IP does not work, try our Server Header Checker Tool.
- Using your .htaccess file or IIS Administration, deny access to the IP range by the C class of the IP.
- Click the link in the Google search results and see if it returns a 403 Forbidden code.
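As a sketch, the .htaccess deny rule in the step above might look like this (Apache 2.2-style mod_authz_host syntax; the 123.45.67. range is a placeholder for the proxy's actual C class, not a real address):

```apache
# Deny the proxy's C class (first three octets) -- 123.45.67. is a placeholder
Order Allow,Deny
Allow from all
Deny from 123.45.67.
```

The trailing dot after the third octet matches the entire C-class range, so every host the proxy operates from in that block gets a 403.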
This is where it gets tricky: if you get the 403 code, the site will no longer be duplicating you. However, if the site uses a frameset or an iframe, then you will NOT be able to use the Google URL Removal Tool, as it will see a 200 "OK" header and assume the page still exists.
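To verify what a crawler would actually see, here is a minimal Python sketch (the URL is a placeholder, not a real proxy) that fetches a page and reports whether the status code should satisfy the removal tool:

```python
# Check what status code Google would receive for a proxy copy of your page
# after you have denied its IP range. The URL below is a placeholder.
import urllib.error
import urllib.request


def fetch_status(url):
    """Return the HTTP status code for url, including error codes like 403."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.getcode()
    except urllib.error.HTTPError as err:
        return err.code


def removal_eligible(status):
    """The removal tool acts only when the page no longer resolves normally:
    a 403 (or 404/410) qualifies; a 200 means it still looks live."""
    return status in (403, 404, 410)


if __name__ == "__main__":
    status = fetch_status("http://proxy.example.com/cached-copy.html")
    if removal_eligible(status):
        print(status, "- blocked; the removal tool should accept it")
    else:
        print(status, "- still live; check whether a frameset is masking the 403")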
Use the URL Removal Tool and check off "anything associated with this domain". If the site does not use frames, you will get it removed; if it does use frames, Google gets a 200 code and will NOT remove the site despite the frame. You can try to access the framed page directly and submit that URL, but it generally will not help.
All in all, how much proxy servers can hurt rankings is unknown. We believe they influence some sites' rankings, but they may not be the full story. Another issue with proxy servers is that they can 302-hijack sites if they are configured poorly.
We have not found any code that can ban proxy servers outright, even ones that use nph-proxy.cgi. If you have any way to block proxy servers as a rule, then please leave a comment.
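Lacking a real ban, one partial heuristic (a sketch only, assuming Apache with mod_rewrite enabled) is to 403 any request that announces itself with the headers many proxies add. This is far from a rule: plenty of proxies, including nph-proxy.cgi setups, send neither header, and some legitimate clients do send them.

```apache
# Partial heuristic, not a real proxy ban: deny requests carrying
# the Via or X-Forwarded-For headers that some proxies add.
RewriteEngine On
RewriteCond %{HTTP:Via} !^$ [OR]
RewriteCond %{HTTP:X-Forwarded-For} !^$
RewriteRule .* - [F]
```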