How to Avoid Duplicate Content Penalties
If you've been involved with search engine optimization (SEO) for even a short time, you've probably thought about duplicate content. And while you can fix problems on websites you manage, getting the owners of scraper sites to stop their practices is next to impossible. The good news is that there is something you can do to increase the chances that your site will be treated as the original source of content.
Website scraping technology has advanced to the point where websites can be filled with content with little need for human interaction. Many of these setups simply subscribe to RSS feeds and then wait for new content to be published, which makes blogs particularly vulnerable to being scraped.
Fortunately, there is one piece of information that you know and that scrapers can't possibly know in advance: the exact time when you publish new content. This information is important because it gives you a small window in which to claim new content as your own. That is, if you can notify the search engines that your site has new content before another site can copy it and claim it as theirs, then the search engines are more likely to consider you the proper owner.
The best way to notify search engines is to “ping” them as soon as you publish content. I recommend using Pingoat. I also ping Google to let them know that my sitemap has changed. To do that, you'll first need to create an account and upload a sitemap. Once you've bookmarked the correct page, pinging is as easy as a single click.
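If you would rather script the ping than click a bookmark, the sketch below shows roughly what a blog ping looks like on the wire. It assumes your ping service accepts the common weblogUpdates.ping XML-RPC call, which takes a blog name and a blog URL. The endpoint address and the function names are placeholders of mine, not anything specific to Pingoat or Google, so check your service's documentation before relying on it.

```python
import xmlrpc.client

# Hypothetical endpoint: replace it with the XML-RPC address of your ping service.
PING_ENDPOINT = "http://rpc.example.com/"


def build_ping_request(blog_name: str, blog_url: str) -> str:
    """Return the XML-RPC request body for a weblogUpdates.ping call."""
    return xmlrpc.client.dumps(
        (blog_name, blog_url), methodname="weblogUpdates.ping"
    )


def send_ping(blog_name: str, blog_url: str, endpoint: str = PING_ENDPOINT):
    """Send the ping over the network and return the service's reply.

    Assumption: the service implements weblogUpdates.ping. Services that do
    usually answer with a small struct describing success or failure.
    """
    server = xmlrpc.client.ServerProxy(endpoint)
    return server.weblogUpdates.ping(blog_name, blog_url)
```

Calling send_ping("My Blog", "http://example.com/") right after you publish would send the notification. build_ping_request is separate so you can inspect the exact request without touching the network.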
The final step in all of this is to confirm that your copy of the content is indeed being indexed before other copies. The easiest way I've found to do this is to set up an alert with Google. If you ensure that each of your posts includes something unique to you, such as your blog's name or your own name, you can use an alert to notify you as soon as Google indexes the content. As long as your copy shows up first, you should be good.
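One way to make sure every post carries that unique marker is to add it automatically at publish time. The sketch below is only an illustration: the function names and the signature text are hypothetical, and it assumes your posts are available as plain strings before they go out. The second helper wraps the signature in double quotes, which is how an exact-phrase search is written, so the result can be pasted in as the alert's search term.

```python
def add_signature(post_body: str, signature: str) -> str:
    """Append the signature line unless the post already contains it."""
    if signature in post_body:
        return post_body
    return post_body.rstrip() + "\n\n" + signature + "\n"


def alert_query(signature: str) -> str:
    """Wrap the signature in double quotes for an exact-phrase search."""
    # Strip embedded double quotes so the phrase stays one quoted unit.
    return '"' + signature.replace('"', "") + '"'


# Example signature; use whatever phrase is unique to your own blog.
SIGNATURE = "Originally published at Example Blog"
signed_post = add_signature("Hello world.\n", SIGNATURE)
query = alert_query(SIGNATURE)
```

Because scrapers that copy the full feed will copy the signature too, the same alert reports both your post and the copies, which lets you see which one was indexed first.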
When I follow the above steps, I get a notification from Google within an hour or so that my new post has been found. Shortly after that, I get a few more notifications from Google listing the scrapers that have also picked up my content.