The theme of this post is: don’t create multiple pages, subdomains, or domains with substantially duplicate content. Almost every day, as I visit new blogs on the internet, I spot duplicated content. The most common case I witness is bloggers who set up free blogs on WordPress.com, where blogger-initiated advertising and duplicate content are not allowed, and then create a mirror site on a free Blogger blog containing all the same content, so they can collect the meager income provided by Google Adsense. The second most common case is articles from article directories republished on multiple sites. The third is very similar content on multiple sites that differs only in that a few words or paragraphs have been added to the core text.
What constitutes duplicate content?
Duplicate content is content that can be accessed on more than one URL.
“Duplicate content generally refers to substantive blocks of content within or across domains that either completely match other content or are appreciably similar. Mostly, this is not deceptive in origin. Examples of non-malicious duplicate content could include discussion forums that generate both regular and stripped-down pages targeted at mobile devices, store items shown or linked via multiple distinct URLs, and printer-only versions of web pages.
If your site contains multiple pages with largely identical content, there are a number of ways you can indicate your preferred URL to Google. (This is called “canonicalization.”) However, in some cases, content is deliberately duplicated across domains in an attempt to manipulate search engine rankings or win more traffic. Deceptive practices like this can result in a poor user experience, when a visitor sees substantially the same content repeated within a set of search results.”
Why is duplicate content an issue?
Duplicate content is one of the biggest issues in SEO. If search engine spiders can’t tell which version of a web page or document is the original, or canonical, version, the consequence is less-than-ideal search visibility. Most duplicate content is created by sploggers who scrape content by subscribing to RSS feeds. Some duplicate content is created by the authors of the content themselves, and the latter is what this article focuses on.
Search engines are designed to provide the most relevant results to those who use them. When a blog fails to make the ascent to the top of the rankings and SERPs (search engine results pages), the issue of duplicate content often arises. Search engines like Google, Yahoo, Bing, and Ask have developed tools and filters that locate and filter out web pages containing duplicated content, in order to deliver the most relevant and timely results to searchers. Duplicate content does not have to be identical to be spotted and removed by a search engine crawler; web pages with a similarity of over 60% will very likely be detected, impeding any ranking success a blogger is aiming to enjoy.
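As a rough illustration of the kind of similarity scoring described above, the sketch below compares two texts word by word using Python’s standard-library difflib and flags pairs above a 60% threshold. This is only an illustrative metric: the filters search engines actually use are proprietary, and the 60% figure is this post’s rule of thumb, not a documented cutoff.

```python
from difflib import SequenceMatcher

def similarity(text_a: str, text_b: str) -> float:
    """Return a rough 0..1 similarity score between two texts,
    comparing them as sequences of words."""
    return SequenceMatcher(None, text_a.split(), text_b.split()).ratio()

# Two near-identical sentences, the way a mirror blog duplicates a post.
original = "Don't create multiple pages or domains with substantially duplicate content."
mirrored = "Don't create multiple pages, subdomains, or domains with substantially duplicate content."

score = similarity(original, mirrored)
print(f"similarity: {score:.0%}")
if score > 0.60:  # the 60% threshold from the post is illustrative, not Google's
    print("likely to be flagged as duplicate content")
```

Near-identical texts like these score well above the threshold, while genuinely rewritten articles fall far below it.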
Matt Cutts of Google introduces the canonical link element
Whenever content on a site can be found at multiple URLs, it should be canonicalized for search engines. This can be accomplished with a 301 redirect to the correct URL, with the rel=canonical link element, or in some cases with the Parameter Handling tool in Google Webmaster Central. The ways of properly handling cross-domain content duplication are covered in Handling legitimate cross-domain content duplication on the official Google Webmaster Central Blog.
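The rel=canonical signal mentioned above is simply a link element in the page’s head. As a minimal sketch, the snippet below uses Python’s standard-library html.parser to extract whatever canonical URL a page declares; the example.com URL is a made-up placeholder.

```python
from html.parser import HTMLParser

class CanonicalFinder(HTMLParser):
    """Collect the href of any <link rel="canonical"> element in a page."""
    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "link" and attrs.get("rel") == "canonical":
            self.canonical = attrs.get("href")

# A mirrored page pointing search engines back at the original URL.
page = """
<html><head>
  <link rel="canonical" href="https://example.com/original-post/" />
</head><body>Mirrored copy of the post.</body></html>
"""

finder = CanonicalFinder()
finder.feed(page)
print(finder.canonical)  # the URL search engines should treat as the original
```

If the duplicate page declares the original’s URL this way, search engines are told to consolidate ranking signals onto that single canonical URL.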
Get with the program, please!
On my regular read around today I came across the following comment relating to traffic generation and link building.
“Submit some of your more popular posts to article directories in order to gain greater exposure”
Let me just make myself 100% clear on this statement…

“It is false. Do not submit any content from your site/blog to article marketing directories; if you do, it will be labeled duplicate content and your page will no doubt be thrown into the supplementary index.” — Tim Grice in SEO – Some Common Newbie Mistakes
1. It seems clear to me that those creating duplicate content mirror blogs on WordPress.com and Blogger (blogspot) blogs are motivated by greed, and fall into the group who are deliberately duplicating content across domains in an attempt to manipulate search engine rankings and/or secure more traffic. I report all such sites when I encounter them.
The types of blogs allowed and not allowed on the WordPress.com blogging platform and the Terms of Service prevent using a WordPress.com blog as a publicly available, indexed duplicate content blog. WordPress.com Staff will suspend or delete all duplicate content blogs reported to them. If you have exported your content out of a blog on another blogging platform such as Blogger, Israblog, LiveJournal, Movable Type, Typepad, Posterous, Spaces, Tapuz, Vox, or Yahoo! 360, and then imported it into a free hosted WordPress.com blog, change the visibility on the original blog to “private” so there is no duplicate content issue. If you don’t, then my understanding is that the first content to be indexed will be considered the original, and all other copies will be considered duplicates.
2. Ezinearticles and most article directories do accept articles that have been previously published elsewhere, provided you are the person who holds the copyright to them. However, Hubpages, Buzzle, Ehow and Knol do not allow duplicate content. They want only unique content on their sites and will delete your articles, and your account, if you persist. It seems to me that anyone who can write can also rewrite. So smart bloggers are not duplicating content and letting copies in article directories outrank their blog content in the SERPs.
3. Reputable blog directories do not allow duplicate content sites to be registered. If one does slip in under the radar and is reported to site Admin, they will delete the site from their directory.
4. When syndicating content via RSS, create a different version of the article for each outlet, rather than posting the identical article everywhere.
Further reading: Six Easy Ways to Eliminate Pesky Duplicate Content
There are many free plagiarism checkers you can use online. Copyscape is one: it lets you detect duplicate content and check whether your articles are original.
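For a quick self-check before reaching for an external tool, the overlap between two texts can be estimated with word shingles and a Jaccard score, a standard textbook technique for near-duplicate detection. This is a sketch of the general idea only, not how Copyscape or plagium actually work.

```python
def shingles(text: str, n: int = 3) -> set:
    """Break text into its set of overlapping n-word shingles."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def jaccard(a: str, b: str, n: int = 3) -> float:
    """Jaccard overlap of two texts' shingle sets: 0 = disjoint, 1 = identical."""
    sa, sb = shingles(a, n), shingles(b, n)
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

post = "Search engines filter out pages that duplicate content already indexed elsewhere."
scraped = "Search engines filter out pages that duplicate content already indexed elsewhere."
print(f"overlap: {jaccard(post, scraped):.0%}")  # identical texts score 100%
```

A verbatim scrape scores 100%, lightly spun copies score high, and a genuine rewrite scores near zero, which is why rewriting rather than republishing sidesteps the duplicate content filter.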
plagium (beta) – Track plagiarism by pasting your original text.
I rely on search engines to do research for my contracted work and before creating and publishing blog posts, and I resent paging through screen after screen of duplicated content in the SERPs. I think it is a good strategy for search engines to penalize sites with duplicate content by omitting them from the search results. Google’s algorithm will continue to be adjusted over time to fit one simple goal: return the most relevant, helpful pages for any particular search. Really? Then why isn’t Google doing a better job? Duplicate Content in the SERPs Sucks!