
Is Your E-commerce System Harming Your Search Engine Rankings?

www.altruik.com

Hamlet Batista
Chief Search Strategist hbatista@altruik.com

Table of Contents
What Is Duplicate Content?
How Duplicate Content Affects Your Search Engine Rankings
How To Put An End To Duplicate Content So You Can Reclaim Your Ranking
When Duplicate Content Is Not Really Duplicate At All
Sound Like Too Much Manual Labor?
Will You Profit From Addressing Duplicate Content Issues?
Here's What You Should Do Next

As an online retailer, your search engine strategy is your business strategy. Have you noticed your search engine rankings slipping away recently? Do you wonder what the cause might be? It is critical that every page selling your products ranks as highly as possible in search engines like Google. That's why it is important that you optimize your site for search engine spiders, especially if you are using a CMS (content management system).

There is a hidden danger, an issue that affects the majority of e-commerce websites, which most business owners don't know about until it is too late. That problem is duplicate content. If you have multiple copies of the same page, different URLs that point to the same content, or navigation systems that track your users, there is a good chance that you have an issue with duplicate content, too.

Most Content Management Systems, as useful as they are, surprisingly are not designed with SEO in mind. Your CMS features tools that make finding products easier for visitors to your website. But those same features that duplicate product pages into multiple categories often make it difficult for Google to crawl, index, and rank all of the pages on your site.


Duplicate content causes serious problems because it:
- Weakens the rank of your most popular pages
- Sends Google on a wild goose chase, causing it to abandon your site altogether
- Blocks large portions of your website from getting indexed
- Prevents your most profitable pages from reaching the top of Google's rankings
- Cripples your best link-building efforts

Duplicate content problems are like leaky faucets. As more sites link to your duplicate URLs, the reputation and rank of your top-selling product pages go down the drain. Products that once ranked very high suddenly begin tumbling down the rankings, and your competition gains the upper hand. The question now is: how can you identify duplicate content and patch up the leaks that are ruining your search engine rankings? Keep on reading, because we're going to teach you what most people don't know about the mess their CMS is leaving behind. Your priority is to patch these leaks before they drown your entire online business. With the right tools, you can build an even stronger search presence.

If you have a duplicate content problem, huge portions of your website might not be in Google's index.


What Is Duplicate Content?


First, the basics. Duplicate content is any page on the Internet that is either exactly the same as or nearly identical to another page. Google compares the text of multiple pages to determine a match. If the written content is exactly the same, or almost exactly the same, Google considers the newer page to be duplicate content.

Most duplicate content is created when your CMS allows visitors (and Google) to access the same page from different URLs. Let's say your online store has a category for shoes and another category for all products in the color black. The same pair of black shoes can be accessed from two different category combinations: one in which the user selects shoes first, and another in which the user chooses black first.
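The screenshots from the original figure are not reproduced here; as a hypothetical stand-in (the store domain and paths are invented for this example), the two category paths might produce URLs like:

http://www.example-store.com/shoes/black/womens-leve-black-leather
http://www.example-store.com/black/shoes/womens-leve-black-leather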

These pages are almost identical. The URLs are different but lead you to the same product, the Jessica Simpson Women's Leve Black Leather shoe. In each example, users selected various categories in different orders and were able to access the same content via different paths.

Google's search engine robot crawls your website like a nosy visitor, following each link for every category. It will find the same page twice, once under one combination of categories, and again under the other. You don't actually have two copies of the same page, but your CMS setup certainly makes it look like you do.

Duplicate content is also created when:


- You use multiple subdomains. Google thinks you have duplicate content when you put the same page on http://example.com as you do on http://www.example.com. That "www." makes a big difference to Google.
- Your CMS creates separate pages for different product colors. Google can't tell the difference between an image of a blue shoe and a red shoe (it relies on textual descriptions). It will conclude that one of these pages is a duplicate.
- Your CMS dynamically generates pages as your users click on links. A good example of this is a calendar that creates a new page every time you click on the "next month" link.
- You, or people linking to your pages, add extra parameters to URLs (sometimes for tracking), creating multiple URLs that direct Google to the same page over and over again.

As you can see, duplicate content can arise from a variety of sources. Each of these is another leak in your faucet, and each creates a number of nasty problems, both for Google's search engine robot and for other search engines. Sometimes it sends the robot on an endless chase that Google eventually abandons, and at other times, it simply dilutes the reputation of all your affected pages. When only a small portion of your site makes it into the search engine rankings, your overall ranking suffers.

Page reputation is diluted when the same content is accessible through multiple URLs. You can recapture reputation and prevent duplicate content by consolidating non-canonical versions with 301 redirects. Source: Google's SEO Report Card, Google Webmaster Central

In the next section, we'll show you how duplicate content prevents your most profitable pages from making it to the top of Google's rankings.

Glossary
301 Redirect: An HTTP status code (Moved Permanently) that automatically redirects users to a specific URL.
200: An HTTP status code (OK) indicating a successful request; the content is returned.

How Duplicate Content Affects Your Search Engine Rankings


Now we're ready to see how duplicate content affects Google's impression of your content, wreaking havoc on your search rankings in the process.

How duplicate content dilutes the ranking of your top-rated pages


Let's say you just wrote a popular article that went viral. Would you rather see the entire article getting a million views, or would you prefer to split the article in two and assign 500,000 views to each section? If you chose the former option, you're on the right track. As a single page receives more views, its chances of receiving natural links increase. More people share the page, blog about it, and link to it.

You want your website listed in the prime real estate of the results page. Splitting links will dilute the rankings of your strongest pages.

When you have multiple versions of the same article, video, or page, Google splits your reputation between all of the pages. Your duplicate pages siphon off a large portion of your inbound links, and it takes longer for your article, video, or page to rank highly in Google. No matter how many links you get, some of them are going down the drain.

100% of your inbound links should go to the same page. In Google's eyes, that gives you 100% of the reputation.

How duplicate content sends Google on a wild goose chase


Another problem to consider can be even more tragic for your search rankings. What might happen if Google decides that your website is composed mostly of duplicate content? The short answer is that it will stop indexing your pages and move on to other websites. Here is what Matt Cutts, head of Google's Webspam team, has to say about duplicate content:

"Imagine we crawl three pages from a site, and then we discover that the two other pages were duplicates of the third page. We'll drop two out of the three pages and keep only one, and that's why it looks like it has less good content. So we might tend to not crawl quite as much from that site... [T]he fact that you had duplicate content and we discarded those pages meant you missed an opportunity to have other pages with good, unique quality content show up in the index."

There are a number of scenarios in which Google's robot will give up crawling your website, leaving vast numbers of pages completely out of the index and your site flagged as mostly spam. Here are some of the most common, caused by your CMS:

- Your CMS creates a calendar that generates a new page for a new month every time you click on the "next month" link. Because your website keeps generating a new link every time Googlebot follows the "next month" link, Googlebot keeps following this link as long as it can and eventually times out.
- Your website features a guided navigation shopping cart with categories for different brands and types of products. Because the products and categories are linked to each other (often in very complex ways), Googlebot keeps following the links in circles until it times out.
- Your website uses a session ID in its URLs to track users who have cookies disabled (jsessionid is a common example of an in-URL session ID that gets indexed as duplicate content). If these IDs are present in the path_info portion of your URL, they are particularly dangerous.

This last one can be particularly nasty. When a search engine bot crawls the site, it acts like a user with browser cookies disabled. Each time Googlebot requests a page, it is given a new page with a new jsessionid. This quickly causes the bot to see millions of pages that are identical, differing only in the URL: an infinite space that Googlebot treats as duplicate content. Once Googlebot understands that it is going in circles (or down an endless drain, like the calendar example), it concludes that your site is composed mostly of duplicate content and stops crawling your website. This is a very bad thing, and it can cause large portions of your site to go unnoticed. You can make vast improvements in your search engine rankings by tackling just this problem alone.
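To make the session-ID scenario concrete, a hypothetical illustration (the URLs are invented): Googlebot might request the same product page twice and receive

http://www.example.com/products/chewy-bone;jsessionid=A1B2C3
http://www.example.com/products/chewy-bone;jsessionid=D4E5F6

Each request mints a fresh ID, so the crawler never runs out of "new" URLs that all lead to identical content.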

How duplicate content cripples your best link building efforts


Consider another example. A Doggy Care website using a CMS creates two URLs for a dog bone: one under the category "food" and another under the category "treats." To a search engine, the result is once again duplicate content. That's only the half of it. What happens when customers really like the dog bone and want to tell others about it? They link to it on their website. However, because there are two different pages created by the CMS for the same dog bone, they might link to either one of them. A product that would have received 100 links only receives half that. The rest leak over to the duplicate page. When you're trying to rank highly in Google, you must avoid wasting your links and reputation on duplicate pages. If these duplicates make it into Google's index, they will almost certainly be filtered out of the rankings.


Now that you understand how duplicate content can harm your search engine rankings, we want to show you what you can do to stop your CMS from creating so much of it. You'll be happy to know that all of these problems can be solved, and you can use automated tools to help you handle most of them.

How To Put An End To Duplicate Content So You Can Reclaim Your Ranking
As you have already seen, duplicate content problems happen all on their own. If you don't do something to address them before they affect your ranking, your competitors will gain the edge. There are solutions to duplicate content problems, and we'll take a look at how to solve the most dangerous ones.

Is Your Content Accessible From Multiple Subdomains?


As we discussed earlier, when your website is accessible from multiple subdomains (for example, both example.com and www.example.com), Google treats the content on one of the subdomains as a duplicate. It can also happen when your CMS uses multiple URLs to point to the same content. If Google follows the links http://www.dogtoys.com/chewybone.php and http://www.dogtoys.com/bones/chewybone.php to the same page, Google will index a duplicate page for one of the URLs. But the fix is relatively easy.

You just need to tell Google which subdomain contains the original source material. There are two ways to do this:
1. Implement 301 redirects to send people to the right subdomain with the original content (a sketch follows below).
2. Or use Google's Webmaster Tools to choose which domain contains the original content.
This process is sometimes called canonicalization.
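If you run an Apache server, the redirect can live in your .htaccess file. A minimal sketch, assuming mod_rewrite is enabled and using a hypothetical domain; adapt the host names to your own site:

# Send every request for the bare domain to the www subdomain
# with a permanent (301) redirect
RewriteEngine On
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]

Other web servers offer equivalent mechanisms for issuing 301 redirects.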

When this page is selected in the search engine results page, users are automatically directed to the canonical URL.

Once you have indicated to Google where it can find the original content, it will no longer index the duplicate subdomain. Because your primary (canonical) page will be the only page that can get links and reputation, it will start performing much better in the search engine rankings. Congratulations: you've just fixed one of the leaks in your faucet.

Are some pages near duplicates of others? What to do when your product descriptions differ by only a few words.
This problem usually affects online retailers who sell many different versions of the same product. Perhaps you sell a golden chocolate basket, a silver chocolate basket, and a bronze chocolate basket. If the only difference between one product description and the next is the color or the image, you need to indicate this to Google so that it does not conclude that you have duplicate content. You can do this by using the rel=canonical link tag on the pages with the near-identical content. Make sure you place this tag somewhere in the <head> section of these near-duplicate pages, just as you would with meta tags. Whenever you use this tag, you are telling Google that the current page is either a duplicate or a near-duplicate, and that the original page can be found at the address you have specified. Here's an example.
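The figure showing the tag is not reproduced here; as a stand-in, a minimal sketch using the hypothetical chocolate-basket pages, with the golden version treated as the original:

<!-- Placed in the <head> of the silver and bronze basket pages -->
<link rel="canonical" href="http://www.example.com/golden-chocolate-basket" />

The href should point at whichever version of the page you want Google to treat as canonical.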

Do your URLs contain extra parameters for tracking and sorting? They might accidentally convince Google that you have a duplicate content problem.

Some shopping carts add parameters to your URLs for the purposes of sorting, dividing products into pages by category, and tracking users. Google's search engine robot unwittingly follows all of these URLs, and it keeps finding more duplicate content. If you don't tell Google which parameters to ignore, Googlebot will keep spinning its proverbial wheels.
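A hypothetical illustration (the URLs are invented): every one of these addresses returns the same product listing, but each looks like a separate page to a crawler:

http://www.example.com/shoes?sort=price
http://www.example.com/shoes?sort=price&page=1
http://www.example.com/shoes?sort=price&page=1&trackingid=af217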


Here's what you can do: Google Webmaster Tools allows users to define which parameters Google should ignore when crawling a website.

How to stop Google from going on a wild goose chase.


Sometimes Google finds large sections of your website that contain links to pages with no original content. This is called the "infinite space" problem, because Googlebot gets stuck in these sections, continually crawling the same series of dynamically generated pages or URLs with session IDs and tracking parameters, over and over again. As we discussed, often the culprit is the jsessionid parameter. Thankfully, there is a way to stop it. Google knows about the infinite space problem and will tell you if your website has this issue when you log in to Google Webmaster Tools. Specifically, it will list which links lead to an infinite space and offer a few tips to patch things up.

Once you've found the links that lead to an infinite space, do one of the following:
- Set the rel attribute in the suspicious link to nofollow. When you do this, your new link should look like the following:
<a href="http://www.calendar.com/nextmonth.php" rel="nofollow">next month</a>

- Block the infinite space URLs in your robots.txt file (see the sketch after this list).
- Make it impossible for search engines to extract these URLs. You can do this by hiding them within JavaScript.
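For the robots.txt option, a minimal sketch, assuming the calendar pages and session IDs follow the hypothetical URL patterns used earlier in this paper (Googlebot understands the * wildcard, though not every crawler does):

# Keep crawlers out of the infinite calendar
User-agent: *
Disallow: /nextmonth.php
# Skip any URL carrying a session ID in its path
Disallow: /*;jsessionid=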

Now that you have the tools to clean up duplicate content, in the next section we'll consider a few important cases where duplicate content is not only acceptable, but necessary.


When Duplicate Content Is Not Really Duplicate At All


Sometimes you end up with exact duplicate pages for legitimate reasons. This is no crime, of course, but it does require you to let Google know so that your site may be indexed appropriately by the search engine robot. It also prevents your website from being flagged as mostly duplicate content. Here's the fix:

Use a 301 redirect if you have duplicate pages that just can't be avoided.
Using a 301 redirect not only sends your users to the canonical page, it also tells Google that the page is an exact or near duplicate. Google continues to crawl your site because you are no longer using up its bandwidth unnecessarily. A sketch follows below.
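A minimal sketch, again assuming an Apache server and reusing the hypothetical dogtoys.com URLs from earlier; the mod_alias Redirect directive maps the duplicate path to the canonical page:

# Permanently redirect the duplicate category URL to the canonical page
Redirect 301 /bones/chewybone.php http://www.dogtoys.com/chewybone.php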

There are also two minor cases worth understanding where duplicate content can actually help your rankings. Keep in mind, these are very specific and do not apply to every website.

Sometimes you don't need to consolidate your duplicate pages. Here's how to know when...
As you've learned, in most cases it is beneficial for a single page to garner the highest possible rank. After all, if this page features one of your bestselling products, you are practically guaranteed more sales. But there is one case when using canonical tags and giving all of your reputation to a single page isn't the best idea. When your visitors really care about your product's attributes (e.g., the product's color), it might be smart to separate your pages. Let's return to the example about shoes. If your online store offers the same shoe in multiple colors, and you have found that customers are specifically searching for products in the color turquoise, you might benefit from treating each color of the product as a separate page.

You don't have to worry about localized content on international domains.


What happens when you host the same content on different regional servers and international domains? For example, suppose you copy the same content on http://www.example.com to your local servers at http://www.example.fr. Will the content make it into the search engine results page abroad, or will it also be deemed duplicate content? In this case, there is nothing to worry about. When the same content is hosted on different international domains, search engines like Google do not consider it duplicate content. That said, the issues concerning subdomains that we discussed previously also apply to your international websites. That means you will have to go through the time-consuming task of canonicalizing your URLs so that they all point to the same international pages, just like you did on your home website domain.


Both your shoes pages and your turquoise pages will get traffic from color-based searches. Your competitors are probably doing the same thing. Whenever you separate your pages, you need to make them stand on their own. Your product page for the turquoise shoes must be distinct enough from the page for the black shoes to pass Google's duplicate content filter. Otherwise, Google will not rank the page at all. It is not enough to swap out a few words and reorganize paragraphs to create a new description. Google is too smart for that. You'll need to rewrite each new product description from scratch.

Once again, it bears repeating that this is an exceptional case. You must really understand your customers, and more importantly, pay attention to their search behavior. If your customers are not typically searching for different variations of the same product, it is safe to use canonical tags and consolidate duplicate content. But if they usually search for items by their color, size, weight, etc., you should keep the pages separate and write new descriptions to individualize the content.

Sound Like Too Much Manual Labor? There's Good News. Most Of It Can Be Automated.
It doesn't matter if you own one website or many websites on several international domains. By now, you have the knowledge to understand and tackle the problem of duplicate content. However, you have probably realized just how time-consuming the process of consolidating your content can be. Do you really want to go through every duplicate or near-duplicate page, every subdomain, and every extra parameter in your URLs? You are a businessperson, so like us your answer will be an emphatic NO! You have better ways of spending your valuable time. Luckily for you, we developed our Lighthouse software originally to solve our own duplicate content problems. We were slaving away, consolidating content for one of our clients, and we simply grew tired of the whole process. You can manually implement only so many 301 redirects before you start thinking, "There has to be a better way." Lighthouse does everything we've discussed so far, and it does a few more things beyond the scope of this paper. Here is a quick rundown:



- Automated 301 redirects and rel=canonical tags. Lighthouse spots your duplicate pages and automatically implements 301 redirects and rel=canonical tags.
- Automated robots.txt analysis. Lighthouse finds and corrects problems with sitemap accessibility, infinite spaces, and crawl delays.

We understand how all of this can seem like a huge project at first. That's why we'd like to show you a way to measure the direct business benefit you'll get from tackling each of these issues head on. In the next section, you'll learn what you need to know before you decide to launch an all-out assault on your website's duplicate content.

Will You Profit From Addressing Duplicate Content Issues? Here's a Surefire Way to Know.
It's one thing to suspect you have a problem. It's quite another to know the severity of the problem and identify where it is located. You wouldn't fix a faucet that isn't leaking, so why would you tackle a duplicate content problem that is practically nonexistent? We want to show you how to measure the direct business benefit you'll get from patching up the leaks your CMS leaves behind. It works wonders for us, and we are sure it will for you too.

Step one: establish a baseline for measurement. First, determine how many pages your site has. Add up your product pages, category pages, and ancillary pages. The total number is your real number of site pages. Consider two key monthly metrics: 1) revenue per page (total site revenue divided by the number of pages indexed in Google) and 2) searches per page (total search clicks to your site divided by the number of pages indexed in Google).

Step two: implement the change and wait. Fix your duplicate content issues, or hire a professional to do it for you. Then sit back and wait at least one month before you make another measurement. Sometimes it takes a while before Google returns to crawl the extra pages on your site.

Step three: look for an increase in active pages and traffic. What you measure next depends on your goals. If you are looking primarily for increased revenue, as we all are, you want to see an increase in the number of pages indexed. Compare your revenue per page before and after the duplicate content fix. The second metric to consider is search clicks per page. You'll notice an increase here if your site suffered from duplicate pages that divided your audience and your links, reducing your primary page reputation in Google. If all went well, your canonical pages will rank higher, and as they perform better you'll also increase your revenue.


You should see an increase in the number of unique pages receiving regular search traffic, as well as an overall increase in traffic to your website. This increase usually happens because Google crawled more of your website and more of your pages made it onto the search engine results page.
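To make the math concrete, a quick illustration with purely hypothetical numbers:

Revenue per page (before) = $50,000 monthly revenue / 1,000 pages indexed = $50
Revenue per page (after) = $120,000 monthly revenue / 2,000 pages indexed = $60

Here the fix brought 1,000 additional pages into Google's index, and because those pages now earn search traffic of their own, both total revenue and revenue per page moved up.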

Here's What You Should Do Next


By now you realize that duplicate content problems happen all on their own, and it is up to you to stop them before you lose your rankings to the competition. Even if you take care of every piece of duplicate content today, you will still have to deal with it periodically in the future. The more content you add to your website, the more likely duplicate pages will pop up. It's nice to have a way to constantly keep it in check.

Most business owners wait until their next website redesign to start tackling their duplicate content problems, but this approach comes at a huge cost. Each low-ranking page amounts to customers who never made it to your store. Can you really afford to lose a single sale between now and your next redesign?

Automatic duplicate content management is the only solution that makes sense. When you allow our Lighthouse software to consolidate your content as you create it, your pages start to rank better right out of the gate. You don't have to stop what you are doing to handle a situation that can easily get out of control. It's something we like to call peace of mind.

If you are interested in ridding your site of duplicate content problems for good, we encourage you to give us a call. We'll tell you more about Lighthouse and how you can use it to take care of your duplicate content automatically. Why go through page after page when software can do all the dirty work? We created Lighthouse because you've got better things to do.

