Just recently I spoke with someone who had been outsourcing their blog maintenance to a company and wasn’t seeing results. I did a little digging and found that the company was creating blogs and distributing the same articles across multiple clients. Not only was the client upset to find they weren’t receiving original content, but they were also concerned about their search rankings and possible penalties. While this particular situation isn’t typical, it inspired me to write a post on a troubling issue most content marketers face: duplicate content.
Most often, duplicate content is a result of content scrapers ‘stealing’ original content without permission. It can be extremely frustrating to see your content across the web without getting credit for it and even more frustrating to see duplicates of your content ranking above your original in the search results. Spotting duplicates isn’t always easy, but luckily there are some free tools available to help.
How to spot it
Duplicate Content Checkers
You can look for duplicates of your articles with a tool like PlagSpotter or CopyScape. Both of these tools can help you identify duplicate content, and both can be upgraded for a fee if you need to check more than one URL at a time.
Both WordPress and Google Analytics offer a Trackbacks feature, which notifies you when someone has linked to your blog. If you've included internal links or a link back to your website in your blog article, Trackbacks can also help identify when your content has been scraped by another site.
Google Webmaster Tools is another great resource for pinpointing scraped or duplicated content, along with other issues on your website. By downloading a list of links, you can go through and determine whether any of them have resulted from scraped content. You can find the list of links in the ‘Search Traffic’ section of your Webmaster Tools account.
You can also set up a Google Alert for your blog article’s title, so whenever that title is spotted across the web, you will be notified by email. Just make sure to put the title in quotation marks in the search query, so that it only returns results with that exact phrasing.
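For example, if your article were titled “5 Ways to Repurpose Old Blog Content” (a made-up title just for illustration), the alert query would simply be:

"5 Ways to Repurpose Old Blog Content"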
How to get it removed
There aren’t a whole lot of options when it comes to getting duplicate content removed, but if you are adamant about having it taken down, there are two methods you can try.
• Contact the host and ask them to either remove the duplicate content entirely or add a noindex tag so that it won't come up in the search results (see the example tag after this list).
• In more extreme circumstances, you can ask Google to remove the duplicate content from its search results by filing a request under the Digital Millennium Copyright Act (DMCA).
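For reference, the noindex tag mentioned above is a standard robots meta tag that the host would place in the <head> section of the offending page. A typical version looks like this:

<meta name="robots" content="noindex">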
Since content scraping is fairly common and can be very difficult to undo, the best practice is to take steps to prevent scraped copies from taking credit away from your website or outranking you in the search results.
Get it indexed before someone else does
Sometimes scrapers take your content immediately after it’s published. If your content is indexed first, Google is more likely to consider it the original. Google Webmaster Tools has a handy feature that allows you to submit a URL for indexing. This doesn't guarantee that your URL will be indexed immediately, but in many cases it helps speed up the process.
Go to ‘Fetch as Googlebot’ in your Webmaster Tools account and ‘fetch’ the new URL. When Googlebot has finished fetching the URL, you will see a ‘submit to index’ prompt, which you can then click. Typically, it takes around 12-14 hours to see your page indexed in the search results.
Include a link to the original source
If your content is scraped, any links back to your website go with it, which can earn you backlinks and help readers find the original source. In most cases, the sites that scrape content are lower-quality sites and therefore provide lower-quality links; nevertheless, those backlinks can give you a small boost in the search results (as long as the sites aren’t pornographic or pure spam). Always include a link back to your website, whether it points to a previous article or your homepage. Just make sure to use natural anchor text, like your company name or the article’s name, so that the backlinks don’t appear spammy to the search engines.
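As a quick illustration (the URL here is just a placeholder), a natural-looking link in your article can be as simple as:

<a href="http://www.yoursite.com/your-original-article.html">Your Article's Name</a>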
Use a rel="canonical" link
Canonical links are useful when there are sets of duplicate content, typically duplicate content that appears internally. For instance, say you sell a product that comes in a variety of colors, with a different page for each color. You would use the canonical link to specify the page that is most authoritative and that you want to rank in the search results above the others. The same applies to content across different domains. Include a canonical link on your blog article, pointing to itself, so that if your content is scraped, the original version will appear to be the preferred version. To add the rel="canonical" link, add the snippet below to the <head> section of the page.
<link rel="canonical" href="http://www.yoursite.com/your-original-article.html"/>
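Likewise, in the product-color example above, each color variant page would carry a canonical link pointing to the one page you want to rank (the URLs below are placeholders):

<!-- placed on each color variant page, e.g. product-red.html and product-blue.html -->
<link rel="canonical" href="http://www.yoursite.com/product.html"/>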
Nobody likes to see their content on another site, especially when it outranks the original article. Unfortunately, the battle against scrapers and duplicate content is an ongoing one, but the good news is that the search engines are working harder to come up with a resolution. In the meantime, it’s up to you to stay vigilant about your content’s whereabouts and take preventative action so that you get the credit you rightfully deserve.
If you have any questions about duplicate content or how to deal with it once you’ve found it, feel free to contact me.