Duplicate Content and SEO

By Marios Alexandrou, Toronto SEO Consultant.

Note that this is an update of a May 2, 2006 article with more data and more commentary.

Several months ago, a subsidiary of my current employer created a very robust recipe search tool with over 10,000 recipes. The tool generated some excitement among various other business groups, which wanted to incorporate it into their own websites. When I heard this, I decided to monitor what would happen with these different deployments, because the sites were, in effect, publishing duplicate content, something search engines are generally believed to dislike.

Of course, the question is what it means that search engines don't like duplicate content. My take is that when they encounter duplicate content, the search engines want to make sure only one copy is shown in the results for any given search term. After all, it does the user little good to see five different sites all with the exact same content. Confirming that two pages are duplicates is easy; that's the sort of thing computers are good at. Even if the sites look slightly different because of their branding, I think search engines can still determine that the real content of a page is a duplicate.
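To make the "computers are good at this" point concrete, here is a minimal sketch of one common textbook approach: break each page's text into overlapping word sequences (shingles) and measure how much two pages overlap (Jaccard similarity). This is only an illustration, not a claim about how Google or any other search engine actually detects duplicates. The function names, the shingle size of 3, and the sample page text are all my own assumptions.

```python
import re


def shingles(text, size=3):
    """Return the set of overlapping word sequences of length `size` in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < size:
        # Very short text: treat the whole thing as a single shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def similarity(text_a, text_b, size=3):
    """Jaccard similarity of two texts' shingle sets, from 0.0 to 1.0."""
    set_a, set_b = shingles(text_a, size), shingles(text_b, size)
    if not set_a and not set_b:
        return 1.0  # two empty texts are trivially identical
    return len(set_a & set_b) / len(set_a | set_b)


# Hypothetical sample pages: the same recipe under two different brand headers.
RECIPE = ("Whisk two eggs with one cup of milk then fold in the flour "
          "and a pinch of salt")
PAGE_A = "Brand A Kitchen " + RECIPE
PAGE_B = "Brand B Recipes " + RECIPE
UNRELATED = ("Quarterly earnings rose sharply as the company expanded "
             "into new markets abroad")

if __name__ == "__main__":
    print("same recipe, different branding:", similarity(PAGE_A, PAGE_B))
    print("unrelated pages:", similarity(PAGE_A, UNRELATED))
```

The two branded pages score high because only the few shingles touching the header differ, while the unrelated page shares no three-word sequence at all. A real system would also need to strip navigation and templates before comparing, which this sketch does not attempt.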

However, the challenge that search engines face is figuring out which copy is the original. This is important so that the original is displayed in search results while the duplicate is discarded. From what I've read, this decision is made by considering a number of factors (all speculation of course) including which copy was found first, which copy comes from an older site, and which copy comes from a more “respectable” site.

The duplicate content I'm reporting on should provide some good insights. Why? Because these sites aren't attempting to spam the search engines. Instead, they're typical efforts by business units to brand something on the web with no consideration of the SEO consequences. You might also ask why, as someone involved with SEO, I'm not doing anything about it. Partly because the sites are eventually going to disappear as they are replaced by yet another recipe search, and partly because examining the data should be quite educational.

So without further ado, here's the data I've collected over the course of a few months. The table below shows the number of pages Google reported as indexed for each site on each date. Note that I labeled the sites with a letter rather than the actual domain. Other things to keep in mind:

All sites are subdomains of a parent site, i.e. the parent site is www.something.com and these sites are subdomain.something.com.
The parent site of Site A is the oldest and most optimized. It ranks well with Google and other search engines.
Other than Site A's, all the parent sites are well indexed but haven't had all SEO issues addressed.
Site C is considered the flagship version of this recipe search tool, i.e. it gets the offline and online press.

Date	Site A	Site B	Site C	Site D	Site E	Site F
10-Mar-06	795	36	33,600	11,000	199	13,700
21-Mar-06	12,700	40	21,700	99	653	10,500
29-Mar-06	12,000	27	23,300	73	9,810	967
12-Apr-06	14,800	38	15,000	27	10,100	885
17-Apr-06	16,600	39	40,700	35	15,400	12,400
2-May-06	30,900	28	25,900	31	11,700	13,000
5-May-06	15,200	25	19,400	19	9,880	11,200
18-May-06	84,800	25	24,100	17,700	9,360	11,400
22-May-06	107,000	31	26,200	17,600	836	788
30-May-06	14,500	28	10,700	17,600	526	17,200
12-Jun-06	11,000	24	9,120	21,600	442	20,500

There are some interesting things happening here.

Even though Site A was poorly indexed at the beginning while Sites C, D, and F were well indexed, Site A still managed to have its content accepted by Google.
Something strange happened near the end of May: Site A had more pages indexed than actually exist on the site. I suspected a URL parameter issue, but the problem appears to have fixed itself.
Even though we consider Site C to be the flagship version of the search tool, Google has most recently decreased its number of indexed pages and seems to favor Site D and Site F.
Site E seems to be the clearest indication of what happens to a site with duplicate content: for just under a month it has had very few pages indexed.
And yet, contrary to the statement above, Site D went from having next to no pages indexed to now being the leader.
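For anyone who wants to check these observations against the numbers, here is a rough sketch that summarizes three of the columns from the table (Sites A, D, and E). The counts are copied from the table; the `SAMPLES` layout and the `summarize` helper are just my own illustration, not part of any tool used to collect the data.

```python
from datetime import date

# (sample date, Site A, Site D, Site E) copied from the table above.
SAMPLES = [
    (date(2006, 3, 10), 795, 11000, 199),
    (date(2006, 3, 21), 12700, 99, 653),
    (date(2006, 3, 29), 12000, 73, 9810),
    (date(2006, 4, 12), 14800, 27, 10100),
    (date(2006, 4, 17), 16600, 35, 15400),
    (date(2006, 5, 2), 30900, 31, 11700),
    (date(2006, 5, 5), 15200, 19, 9880),
    (date(2006, 5, 18), 84800, 17700, 9360),
    (date(2006, 5, 22), 107000, 17600, 836),
    (date(2006, 5, 30), 14500, 17600, 526),
    (date(2006, 6, 12), 11000, 21600, 442),
]

# Column positions within each SAMPLES row.
COLUMNS = {"A": 1, "D": 2, "E": 3}


def summarize(samples, column):
    """Return first, last, and peak indexed-page counts for one site column."""
    series = [(row[0], row[column]) for row in samples]
    peak_day, peak = max(series, key=lambda point: point[1])
    return {
        "first": series[0][1],
        "last": series[-1][1],
        "peak": peak,
        "peak_day": peak_day,
    }


if __name__ == "__main__":
    for site, column in COLUMNS.items():
        print("Site", site, summarize(SAMPLES, column))
```

The summary lines up with the list above: Site A peaks on 22 May before falling back, Site D ends the period at its highest count, and Site E ends far below its mid-April peak.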

Even though I've made some observations, I'm finding it difficult to draw any solid conclusions. Part of the problem may be that I have only 3 months' worth of data. I would have expected 3 months to be long enough for Google to figure things out. Perhaps Google is giving each site the benefit of the doubt for now.

1 Comment

Rod

I realise this is old, but did you ever draw any further conclusions on this? Did the rankings stabilise in the subsequent months? I find there is a lot of conflicting advice about duplicate content, so it would be nice to have some actual data to base an opinion off.
