A bit about the author...

Khalid is a web designer and developer on a mission to make stuff look beautiful and feel interactively alive on the web. He has been obsessed with design for as long as he can remember, and spends his leisure time drawing and sketching.

How to resolve duplicate content issues and the canonical link element explained

Posted on 19th of February 2011, in Articles, by Khalid Majid Ali

The question webmasters, search engine optimizers and web developers ask most these days is: how do we resolve duplicate content issues?

Today we’ll answer this question in detail and learn about a relatively new link element, the canonical link element, whose purpose and use are explained later in this article.

We’ll begin by learning what the duplicate content issue really is, why it should be avoided and how to resolve it.

What is duplicate content?

To answer this question, let’s assume that we own a website called www.ourwebsite.com, with its index pointing to index.html. When search engines crawl our website, they index all the content they are allowed to access, usually text and images.

We commonly understand that all of the following links refer to the same webpage.

http://www.ourwebsite.com
http://www.ourwebsite.com/
http://www.ourwebsite.com/index.html
http://ourwebsite.com
http://ourwebsite.com/
http://ourwebsite.com/index.html

The problem is that search engine crawlers treat each of the above-mentioned links as a separate webpage, because each of them has a different URL.

When the crawlers index the content at each of these URLs, they’ll flag our website as having duplicate content, because to them the exact same content is replicated across six different pages on our website.

What we need to do here is redirect each of the above-mentioned links to a single URL, for example “http://www.ourwebsite.com/”.
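To make the idea concrete, here is a minimal Python sketch of the mapping we are after: every variant in the list above collapses to one preferred URL. The function name `preferred_url` and the two constants are assumptions made for this illustration. They are not part of any search engine or web server.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumptions for this sketch: the www host is the preferred one,
# and index.html is the file served for a directory.
PREFERRED_HOST = "www.ourwebsite.com"
BARE_HOST = "ourwebsite.com"
INDEX_FILE = "index.html"


def preferred_url(url):
    """Map any known variant of a page URL to its single preferred form."""
    parts = urlsplit(url)

    # Treat the host without www as the www host.
    host = parts.netloc.lower()
    if host == BARE_HOST:
        host = PREFERRED_HOST

    # An empty path and a trailing index.html both mean the directory itself.
    path = parts.path or "/"
    if path.endswith("/" + INDEX_FILE):
        path = path[: -len(INDEX_FILE)]

    return urlunsplit((parts.scheme, host, path, parts.query, parts.fragment))


variants = [
    "http://www.ourwebsite.com",
    "http://www.ourwebsite.com/",
    "http://www.ourwebsite.com/index.html",
    "http://ourwebsite.com",
    "http://ourwebsite.com/",
    "http://ourwebsite.com/index.html",
]

# All six variants collapse to one URL.
print({preferred_url(u) for u in variants})  # {'http://www.ourwebsite.com/'}
```

The sketch only covers the six variants discussed here. A real website may have more, such as different letter casing or tracking parameters.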

Why should we avoid duplicate content?

We should avoid duplicate content because it hurts our webpage’s reputation. To understand how this happens, we must learn a bit about page rank and how it works.

We won’t go very deep into page ranking in this article. All we need to know is that every webpage available on the internet has a page rank, or PR. It’s a numeric value between 0 and 10, which the search engines assign to a webpage depending mainly on the number of external links pointing to that page, along with some other, more complex factors.

To make the concept of page rank simple to understand, let’s think of the search engine as a teacher, our website as a student, each page of our website as a test and the page rank as the score that the teacher gives the student on a particular test. The maximum score the student can achieve on a test is 10 and the minimum is of course 0.

Why is page rank so important?

Let’s try to answer this question with an example. Suppose there are two pages on the internet that discuss statistics about 1975 sports coupes, and the content on these pages is very similar. If someone searches for 1975 sports coupes in a search engine, the page with the higher page rank will be listed higher in the search results.

Note: The above was just an example to show why page rank is important; the order of search results depends on a number of factors other than page rank. One thing is for sure though: the higher a webpage’s page rank, the better!

How does duplicate content affect page rank?

If the search engines consider a particular webpage to have duplicate content because it’s accessible via six different URLs, and they treat each URL as a separate page, then they rank each of those pages separately as well.

Think of this as the Horcruxes that Lord Voldemort created. Each one split his soul and made him weaker. Duplicate content weakens the reputation of a webpage in the same way.

This simply means that if someone links to our webpage as “http://ourwebsite.com” and someone else links to it as “http://www.ourwebsite.com/index.html”, then the two backlinks point to different versions of our webpage. We lose precious backlink count for our preferred URL this way.

All in all, our preferred page “http://www.ourwebsite.com/” does not get the respect and reputation it is due. That is why duplicate content issues should be resolved.
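The splitting effect can be shown with a toy tally in Python. This is a deliberately simplified model: the backlink list is made-up data for illustration, and real search engines weigh links in far more complex ways than counting them.

```python
from collections import Counter

PREFERRED = "http://www.ourwebsite.com/"

# The six URL variants of our home page discussed in this article.
VARIANTS = {
    "http://www.ourwebsite.com",
    "http://www.ourwebsite.com/",
    "http://www.ourwebsite.com/index.html",
    "http://ourwebsite.com",
    "http://ourwebsite.com/",
    "http://ourwebsite.com/index.html",
}

# Hypothetical backlinks from other websites (invented for this example).
backlinks = [
    "http://ourwebsite.com",
    "http://www.ourwebsite.com/index.html",
    "http://www.ourwebsite.com/",
    "http://ourwebsite.com/",
]

# If every URL is treated as a separate page, the links are split up.
split_counts = Counter(backlinks)
print(split_counts[PREFERRED])  # 1

# If every variant is credited to the preferred URL, they add up.
combined = sum(1 for link in backlinks if link in VARIANTS)
print(combined)  # 4
```

In this toy example the preferred URL is credited with one backlink instead of four.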

How to resolve duplicate content issues?

First of all, we should check whether our website is accessible via multiple versions of a URL, with and without www for example. If it is, then we’ll get down to fixing it.

Basically, what we need to do is make each webpage accessible via only one URL. To do that, we’ll need to 301 redirect (a permanent redirect) all unwanted versions of the URL to our preferred URL.

In other words, if our preferred URL is “http://www.ourwebsite.com/” and someone links to the webpage as “http://ourwebsite.com”, then our web server should redirect the request to “http://www.ourwebsite.com/”.
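How the redirect is configured depends on the web server in use. As one possible illustration, here is a minimal sketch written as Python WSGI middleware. The names `redirect_to_preferred` and `site` are hypothetical, and the sketch assumes the www host is preferred and that index.html is the directory index.

```python
# Assumption for this sketch: the www host is our preferred host.
PREFERRED_HOST = "www.ourwebsite.com"
INDEX_FILE = "index.html"


def redirect_to_preferred(app):
    """Wrap a WSGI app so that unwanted URL variants get a 301 redirect."""

    def middleware(environ, start_response):
        host = environ.get("HTTP_HOST", PREFERRED_HOST)
        path = environ.get("PATH_INFO") or "/"

        # A trailing index.html is the same page as the directory itself.
        target_path = path
        if path.endswith("/" + INDEX_FILE):
            target_path = path[: -len(INDEX_FILE)]

        if host != PREFERRED_HOST or target_path != path:
            scheme = environ.get("wsgi.url_scheme", "http")
            location = f"{scheme}://{PREFERRED_HOST}{target_path}"
            query = environ.get("QUERY_STRING", "")
            if query:
                location += "?" + query
            # 301 tells browsers and crawlers that the move is permanent.
            start_response(
                "301 Moved Permanently",
                [("Location", location), ("Content-Length", "0")],
            )
            return [b""]

        # The request already uses the preferred URL: serve it normally.
        return app(environ, start_response)

    return middleware


def site(environ, start_response):
    """Placeholder application standing in for the real website."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"home page"]


app = redirect_to_preferred(site)
```

To try it locally, `app` can be served with `make_server` from Python’s built-in `wsgiref.simple_server` module. On a production website the same rule would more commonly live in the web server’s own configuration.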

There are always special cases though. For example, we may not have access to reconfigure how our web server handles a request. That’s where the canonical link element comes in.

What is a canonical link element?

It is a link element that goes in the head section of our webpage, just like a stylesheet link element. This link element gives the robots our preferred URL of the page they are currently on.


<link rel="canonical" href="http://www.ourwebsite.com/" />

When we add the canonical link element to our webpage, even if someone has linked to it as “http://ourwebsite.com”, the robots will give priority to the URL in the href attribute of the canonical link element. This solves a lot of duplicate content problems for us.
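To see the element from a robot’s point of view, here is a minimal Python sketch that reads the preferred URL out of a page using the standard library’s `html.parser` module. The names `CanonicalFinder` and `find_canonical` and the sample page are assumptions made for this illustration. Real crawlers are far more sophisticated.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Remember the href of the first <link rel="canonical"> element."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # rel may hold several space-separated values.
        rel_values = (attributes.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel_values and self.canonical is None:
            self.canonical = attributes.get("href")


def find_canonical(html):
    """Return the canonical URL declared in the page, or None if absent."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


# A made-up page, as it might be served at http://ourwebsite.com/index.html
page = """
<html>
  <head>
    <title>Our website</title>
    <link rel="canonical" href="http://www.ourwebsite.com/" />
  </head>
  <body>Welcome!</body>
</html>
"""

print(find_canonical(page))  # http://www.ourwebsite.com/
```

Whichever URL variant the page was fetched from, the declared preferred URL is the same.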

However, if you have access to your web server’s configuration, it is recommended that you resolve as many of the duplicate content issues as you can via 301 redirects before using the canonical link element.

The canonical link element is useful in many other ways. Matt Cutts, a Google engineer, has discussed the canonical link element in an extremely informative video presentation. Do watch it to learn more about how this element can be used.

Conclusion

Today we learnt about the duplicate content issue and how to resolve it. We also discussed page rank and found out why it is important. Finally, we looked into the canonical link element and its uses.

Hope you found this post informative. Thank you for reading.

Comments

Samuel Webb

20th of February 2011, 2:14 AM

excellent post

insomniac

20th of February 2011, 8:49 AM

Wow! Do all search engines support the canonical tag?

Khalid Majid Ali

21st of February 2011, 12:39 AM

@insomniac All the major search engines (Google, Yahoo, Bing and Ask) officially support the canonical link element.

SEO engineer

1st of September 2020, 9:08 AM

This is informative