Google Remove URL – One for the Good Guys!

SEO Image
December 1, 2006 by Alan Rabinowitz

For those of you who do not know, SEO Image is one of the most plagiarized websites. Our content is stolen and rewritten every day by new and novice SEO companies throughout the world.

One issue we have is that these novice SEO companies not only copy word for word, but some trigger the same effect that the proxy search portals do: the duplicate content filter. For those of you new to my blog, I have been very anti-duplicate-content-filter since its unleashing in 2005 as an overly aggressive filter.

To take this further: proxy sites are a way for searchers to mask the IP they are browsing from, since the proxy server will let someone access a site that has banned their region. I do not want to get too technical with proxy servers; you can read more at Wikipedia. The problem with proxy servers is that they cache the websites that are searched and then allow search engines to spider those caches, so the proxies can appear as larger websites (page spam), rank better, and collect clicks on their paid ads.

The Google URL Removal tool is a sure way of removing proxy duplicates. Since we feel the duplicate content filter will remove most copies, the proxy search results concern us because they are used by black hat SEOs to try and hurt other websites' rankings.

There is one easy way to remove the proxy servers with Google's Remove URL tool. First, you need to be able to deny IP ranges from accessing your website, in either Windows IIS Administration or .htaccess for Linux servers.
First Step:

  1. Find the proxy indexed in Google with your content
  2. Find the reverse DNS using DNS Stuff to determine the IP. We generally block the name servers and the IP by C class (XXX.XXX.C-Class.XXX). If the IP does not work, try our Server Header Checker Tool.
  3. Using your .htaccess file or IIS Administration, deny access to the IP ranges by the C class of the IP.
  4. Click the link in the Google Search Results and see if it returns a 403 Forbidden Code.
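For the .htaccess route in step 3, the deny rules can be sketched roughly as below. This is a hedged example for an Apache server of that era, not the author's exact configuration; the 203.0.113.0/24 range is a documentation-only placeholder, so substitute the C class you actually identified in step 2:

```apache
# Deny the proxy's C class (placeholder range shown; replace it with
# the range you found via the reverse DNS lookup in step 2).
Order Allow,Deny
Allow from all
Deny from 203.0.113.0/24
```

With `Order Allow,Deny`, the Deny directives are evaluated last and win for matching addresses, so requests from the blocked range receive exactly the 403 Forbidden that step 4 checks for.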

This is where it gets tricky. If you get the 403 code, the site will no longer be duplicating you. However, if the site uses a frameset or iframe, then you will NOT be able to use the Google URL Removal Tool, because Google will see a 200 "OK" header and assume the page still exists.
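A quick way to verify what Google would see is to check the status line yourself. The sketch below is not the author's Server Header Checker Tool, just a minimal stand-in built on Python's standard library:

```python
# Minimal header check: return the HTTP status a server answers with,
# so you can confirm a banned proxy now yields 403 instead of 200.
from urllib.request import Request, urlopen
from urllib.error import HTTPError

def check_status(url: str) -> int:
    """Send a HEAD request and return the HTTP status code."""
    try:
        with urlopen(Request(url, method="HEAD"), timeout=10) as resp:
            return resp.status
    except HTTPError as err:
        # urllib raises on 4xx/5xx; the status code rides on the exception.
        return err.code

# e.g. after the ban, check_status("http://proxy.example/nph-proxy.cgi")
# should come back 403 rather than 200.
```

If the proxy wraps your page in a frameset, remember that the outer frame page is what answers here, which is why it can still return 200 even though the framed content is blocked.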

Use the URL Removal Tool and check off "anything associated with this domain". If the site does not use frames, you will get it removed; if it does have frames, Google gets a 200 code and will NOT remove the site despite the frame. You can try to access the framed page directly and submit that URL, but it generally will not help.

All in all, the ability of proxy servers to hurt rankings is unknown. We believe it will affect some sites' rankings, but that may not be the full story. Another issue with proxy servers is that they can 302-hijack sites if they are set up poorly.

We have not found any code that can ban proxy servers as a class, even ones that use nph-proxy.cgi. If you have any way to block proxy servers as a rule, then please leave a comment.
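For what it's worth, one partial heuristic (my own assumption, not a rule from this post) is to refuse requests that arrive carrying proxy-identifying headers. It will not catch anonymizing proxies that strip these headers, and it can also block legitimate visitors behind corporate proxies:

```apache
# Reject requests that announce themselves as proxied
# (a Via or X-Forwarded-For header is present).
RewriteEngine On
RewriteCond %{HTTP:Via} !^$ [OR]
RewriteCond %{HTTP:X-Forwarded-For} !^$
RewriteRule .* - [F]
```

The `[F]` flag returns 403 Forbidden, so any proxy-cached copy the spider revisits should eventually show the same blocked status as the IP bans above. Treat this as a trade-off, not a fix: anonymous proxies designed to hide themselves send neither header.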


4 thoughts on “Google Remove URL – One for the Good Guys!”

  1. More SEO says:

    Hi Alan,
    This is really a thought-provoking issue. I did not believe it before reading your article. Thanks for waking us up. This is really a valuable post. Thanks again. Keep on posting…

  2. Thanks for the very informative article, our main site has been “proxy-hijacked” and we are currently battling to get it back at its normal position.

  3. Clare Ross says:

    I am not sure if this is entirely relevant or will be of use to you, but I recently installed LFD/CSF security on my server and was immediately concerned at the number of sites getting their IPs blocked due to too many connections.

    I was thinking that when people are trying to do a site over, as you explained, they make multiple connections, so I set a limit on the number of connections possible via LFD. If they go over a certain number they get blocked, and I have found attempts are slowing down.

    Host support thinks these were people scraping my sites, I am hoping the problem will be over.

  4. Alan says:

    Clare Ross

    This post is referring to other sites that scrape content and are potential triggers for what I call “Regional Replacement” (changing a site's ranking by associating the content with a different region) – OR – duplicating the content in hopes of devaluing the site in Google search, as the effect does not occur in Yahoo and MSN. The proxies being indexed is done intentionally by a third party, as there is no way to physically spider the proxy pages unless they are linked from another site or submitted to Google.

    An attack on a server from simultaneous connections is usually called a DoS attack (Denial of Service), and the goal there is to take the server down, not the Google rankings.

    I did notice that once I banned a specific region from one of our sites, that region (according to Alexa) now accounts for 35% of the traffic, so I have to wonder what is going on and how and why.

    There are scraper sites that send out spiders to copy content so they can get AdSense or other paid-search clicks. The goal is to steal relevant content, hoping Google will find their sites more relevant and rank them so people will click the paid search ads; ironically, it's usually Google AdSense. So this is a bit of a different type of scraper than the proxy sites. I would recommend you permanently ban them at the server level. You most likely will not be able to remove them from Google with the URL Removal Tool, as the content-scraping method is not always live; it's more of a copy.

    The other scraper theory for negative SEO is to try and associate the site with low-quality websites (by scraping content and linking back). This can make Google think you are a spammer and potentially auto-penalize you or set you up for manual review.

    You can consider filing a DMCA complaint if enough content has been stolen, but that will NOT get rankings back (if lost) and can be time-consuming. There are other methods of defense, but I do not believe anything is foolproof.

    Since we do not own the search engines, and since Google never responds to these situations, we have to assume that there can be evidence both for and against these theories, and they can be washed out as theory. What you cannot claim to be theory is the intentional indexing of sites in multiple proxies and their sudden drop in rankings at the same time. This is more common than Matt Cutts and Google will admit to or even address. If the effect is minor, then we can call it “Stale Content”, which is common for business sites that get scraped and do not update their content frequently.

Comments are closed.