Is Your E-commerce System Harming Your Search Engine Rankings?

Hamlet Batista
Chief Search Strategist

Table of Contents
What Is Duplicate Content?
How Duplicate Content Affects Your Search Engine Rankings
How To Put An End To Duplicate Content So You Can Reclaim Your Ranking
When Duplicate Content Is Not Really Duplicate At All
Sound Like Too Much Manual Labor?
Will You Profit From Addressing Duplicate Content Issues?
Here’s What You Should Do Next

As an online retailer, your search engine strategy is your business strategy. It is critical that every page selling your products ranks as highly as possible in search engines like Google. Have you noticed your search engine rankings slipping away recently? Do you wonder what the cause might be?

There is a hidden danger, an issue that affects the majority of e-commerce websites, which most business owners don’t know about until it is too late. That problem is duplicate content. If you have multiple copies of the same page, or different URLs that point to the same content, there is a good chance that you have an issue with duplicate content, especially if you are using a CMS (content management system).

Most Content Management Systems, as useful as they are, surprisingly are not designed with SEO in mind. Your CMS features tools that make finding products easier for visitors to your website, and navigation systems that track your users, too. But those same features that duplicate product pages into multiple categories often make it difficult for Google to crawl, index, and rank all of the pages on your site. That’s why it is important that you optimize your site for search engine spiders.

Copyright © 2011 Altruik, Inc.

Duplicate content causes serious problems because it:
• Weakens the rank of your most popular pages
• Sends Google on a wild goose chase, causing it to abandon your site altogether
• Blocks large portions of your website from getting indexed
• Prevents your most profitable pages from reaching the top of Google’s rankings
• Cripples your best link-building efforts

Duplicate content problems are like leaky faucets. As more sites link to your duplicate URLs, the reputation and rank of your top-selling product pages go down the drain. Products that once ranked very high suddenly begin tumbling down the rankings. If you have a duplicate content problem, huge portions of your website might not be in Google’s index, and your competition gains the upper hand.

Your priority is to patch these leaks before they drown your entire online business. With the right tools, you can build an even stronger search presence. The question now is: how can you identify duplicate content and patch up the leaks that are ruining your search engine rankings? Keep on reading because we’re going to teach you what most people don’t know about the mess their CMS is leaving behind.

What Is Duplicate Content?

First, the basics. Duplicate content is any page on the Internet that is either exactly the same or nearly identical to another page. Google compares the text of multiple pages to determine a match. If the written content is exactly the same, or almost exactly the same, Google considers the newer page to be duplicate content.

Most duplicate content is created when your CMS allows visitors (and Google) to access the same page from different URLs. Let’s say your online store has a category for “shoes” and another category for all products in the color “black.” The same pair of black shoes can be accessed from two different category combinations, one in which the user selects “shoes” first, and another in which the user chooses “black” first.

In each example, users selected various categories in different orders and were able to access the same content via different paths. The URLs are different but lead you to the same product, Jessica Simpson Women’s Leve Black Leather shoe. These pages are almost identical.

Google’s search engine robot crawls your website like a nosy visitor, following each link for every category. It will find the same page twice, once under one combination of categories, and another under the other. It will conclude that one of these pages is a duplicate. You don’t actually have two copies of the same page, but your CMS setup certainly makes it look like you do.

Duplicate content is also created when:
• You use multiple subdomains. Google thinks you have duplicate content when you put the same page on “http://example.…” as you do on “http://www.example.…”. That “www.” makes a big difference to Google.
• Your CMS creates separate pages for different product colors. Google can’t tell the difference between an image of a blue shoe and a red shoe (it relies on textual descriptions).
• You, or people linking to your pages, add extra parameters to URLs (sometimes for tracking).
• Your CMS dynamically generates pages as your users click on links, creating multiple URLs that direct Google to the same page over and over again. A good example of this is a calendar that creates a new page every time you click on the “next month” link.

As you can see, duplicate content can arise from a variety of sources. Each of these is another leak in your faucet. It creates a number of nasty problems, both for Google’s search engine robot and for other search engines. Sometimes it sends the robot on an endless chase that Google eventually abandons, and at other times, it simply dilutes the reputation of all your affected pages. When only a small portion of your site makes it into the search engine rankings, your overall ranking suffers.

In the next section, we’ll show you how duplicate content prevents your most profitable pages from making it to the top of Google’s rankings.

“Page reputation is diluted when the same content is accessible through multiple URLs. You can recapture reputation and prevent duplicate content by consolidating non-canonical versions with 301 redirects.”
Source: Google’s SEO Report Card, Google Webmaster Central

Glossary
301 Redirect: An HTTP status code. Automatically redirects users to a specific URL.
200: An HTTP status code. A successful request; content is returned.
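To make the URL-level causes above concrete, here is a minimal Python sketch that groups URLs differing only by a “www.” prefix, by tracking parameters, or by parameter order. The hostnames and the tracking parameter names are assumptions chosen for illustration, not values from any particular CMS. It does not detect the category-path duplicates described earlier, which require comparing page text.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit

# Hypothetical tracking parameters; replace with the ones your own URLs carry.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "ref"}


def normalize(url: str) -> str:
    """Return a comparison key that ignores 'www.', tracking parameters, and parameter order."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    query = sorted(
        (name, value)
        for name, value in parse_qsl(parts.query)
        if name not in TRACKING_PARAMS
    )
    key = host + (parts.path or "/")
    if query:
        key += "?" + urlencode(query)
    return key


def find_duplicates(urls):
    """Group URLs by normalized key and keep only groups with more than one member."""
    groups = defaultdict(list)
    for url in urls:
        groups[normalize(url)].append(url)
    return {key: members for key, members in groups.items() if len(members) > 1}


# Illustrative input only; example.com is a placeholder domain.
urls = [
    "http://example.com/shoes?color=black",
    "http://www.example.com/shoes?color=black&utm_source=mail",
    "http://example.com/hats",
]
duplicates = find_duplicates(urls)
```

Each key in `duplicates` maps to the list of URLs that a crawler could treat as separate pages even though they serve the same content.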

How Duplicate Content Affects Your Search Engine Rankings

Now we’re ready to see how duplicate content affects Google’s impression of your content, wreaking havoc on your search rankings in the process. You want your website listed in the prime real estate of the results page. Splitting links will dilute rankings of your strongest pages.

How duplicate content dilutes the ranking of your top-rated pages

Let’s say you just wrote a popular article that went viral. Would you rather see the entire article getting a million views, or would you prefer to split the article in two and assign 500,000 views to each section? If you chose the former option, you’re on the right track. As a single page receives more views, it increases the chances of receiving natural links. More people share the page, blog about it, and link to it.

When you have multiple versions of the same article, video, or page, Google splits your reputation between all of the pages, and it takes longer for your article, video, or page to rank highly in Google. When you’re trying to rank highly in Google, you must avoid wasting your links and reputation on duplicate pages.

How duplicate content cripples your best link building efforts

Consider another example. A “Doggy Care” website using a CMS creates two URLs for “dog bone,” one under the category “food” and another under the category “treats.” To a search engine, the result is once again duplicate content.

What happens when customers really like the dog bone and want to tell others about it? They link to it on their website. However, because there are two different pages created by the CMS for the same dog bone, they might link to either one of them. A product that would have received 100 links only receives half that. The rest leak over to the duplicate page. Your duplicate pages siphon off a large portion of your inbound links. No matter how many links you get, some of them are going down the drain. In Google’s eyes, 100% of your inbound links should go to the same page; that gives you 100% of the reputation.

How duplicate content sends Google on a wild goose chase

Another problem to consider can be even more tragic for your search rankings. What might happen if Google decides that your website is composed mostly of duplicate content? The short answer is that it will stop indexing your pages and move on to other websites, leaving vast numbers of pages completely out of the index and your site flagged as mostly spam. Here is what Matt Cutts, head of Google’s Webspam team, has to say about duplicate content:

“Imagine we crawl three pages from a site, and then we discover that the two other pages were duplicates of the third page. We’ll drop two out of the three pages and keep only one, and that’s why it looks like it has less good content. So we might tend to not crawl quite as much from that site… [T]he fact that you had duplicate content and we discarded those pages meant you missed an opportunity to have other pages with good, unique quality content show up in the index.”

If these duplicates make it into Google’s index, they will almost certainly be filtered out of the rankings. That’s only the half of it. There are a number of scenarios in which Google’s robot will give up crawling your website. Here are some of the most common ones caused by your CMS:

• Your CMS creates a calendar that generates a new page for a new month every time you click on the “next month” link. Because your website keeps generating a new link every time Googlebot follows the “next month” link, Googlebot keeps following this link as long as it can and eventually times out.
• Your website features a “guided navigation” shopping cart with categories for different brands and types of products. Because the products and categories are linked to each other (often in very complex ways), Googlebot keeps following the links in circles until it times out.
• Your website uses a session ID in its URLs to track users who have cookies disabled (“jsessionid” is a common example of an in-URL session ID that gets indexed as duplicate content). If these IDs are present in the path_info portion of your URL, they are particularly dangerous. When a search engine bot crawls the site, it acts like a user with browser cookies disabled. Each time Googlebot requests a page, it is given a new page with a new jsessionid. This quickly causes the bot to “see” millions of pages that are identical, differing only in the URL: an “infinite” space that Googlebot treats as duplicate content.

This last one can be particularly nasty, and it can cause large portions of your site to go unnoticed. Once Googlebot understands that it is going in circles (or down an endless drain like the calendar example), it concludes that your site is composed mostly of duplicate content and stops crawling your website. This is a very bad thing.

Now that you understand how duplicate content can harm your search engine rankings, we want to show you what you can do to stop your CMS from creating so much of it. You’ll be happy to know that all of these problems can be solved, and you can use automated tools to help you handle most of them. You can make vast improvements in your search engine rankings by tackling just this problem alone.

Source: Google’s SEO Report Card, Google Webmaster Central
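The session ID case can be illustrated with a short Python sketch. It assumes the ID appears as a “;jsessionid=…” path parameter, which is the common servlet-container form; the URLs and ID values below are invented for illustration.

```python
import re

# Matches a ";jsessionid=..." path parameter, stopping before "?", "#", "/" or another ";".
JSESSIONID = re.compile(r";jsessionid=[^?#/;]*", re.IGNORECASE)


def strip_session_id(url: str) -> str:
    """Remove an in-URL jsessionid so that otherwise identical URLs compare equal."""
    return JSESSIONID.sub("", url)


# Two crawls of the same page, each handed a fresh (made-up) session ID.
crawled = [
    "http://www.example.com/catalog;jsessionid=A1B2C3?page=2",
    "http://www.example.com/catalog;jsessionid=D4E5F6?page=2",
]
unique_pages = {strip_session_id(url) for url in crawled}
```

Both crawled URLs collapse to a single entry in `unique_pages`, which is the page count a search engine should ideally see.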

How To Put An End To Duplicate Content So You Can Reclaim Your Ranking

As you have already seen, duplicate content problems happen all on their own. If you don’t do something to address them before they affect your ranking, your competitors will gain the edge. There are solutions to duplicate content problems, and we’ll take a look at how to solve the most dangerous ones.

Is Your Content Accessible From Multiple Subdomains?

As we discussed earlier, when your website is accessible from multiple subdomains (for example, both “example.…” and “www.example.…”), Google treats the content on one of the subdomains as a duplicate. It can also happen when your CMS uses multiple URLs to point to the same content. If Google follows two different links for the “chewybone.php” page (one of them beginning with “http://www.”) to the same page, Google will index a duplicate page for one of the URLs.

But the fix is relatively easy. You just need to tell Google which subdomain contains the original source material. This process is sometimes called “canonicalization.” When this page is selected in the search engine results page, users are automatically directed to the canonical URL. There are two ways to do this:
1. Implement 301 redirects to send people to the right subdomain with the original content.
2. Or use Google’s Webmaster Tools to choose which domain contains the original content.
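As an illustration of the 301 redirect option, here is a minimal sketch written as Python WSGI middleware. It is not tied to any particular CMS or web server. The canonical hostname, the hard-coded “http” scheme, and the stand-in `store` application are all assumptions made for the example.

```python
CANONICAL_HOST = "www.example.com"  # assumption: the subdomain holding the original content


def canonical_redirect(app):
    """WSGI middleware: answer requests for any other host with a 301 to the canonical host."""
    def middleware(environ, start_response):
        host = environ.get("HTTP_HOST", "")
        if host and host != CANONICAL_HOST:
            # Rebuild the same path and query string on the canonical host.
            location = f"http://{CANONICAL_HOST}{environ.get('PATH_INFO', '/')}"
            query = environ.get("QUERY_STRING", "")
            if query:
                location += "?" + query
            start_response(
                "301 Moved Permanently",
                [("Location", location), ("Content-Length", "0")],
            )
            return [b""]
        return app(environ, start_response)
    return middleware


def store(environ, start_response):
    """Hypothetical stand-in for the real shop application; always returns 200."""
    body = b"product page"
    start_response(
        "200 OK",
        [("Content-Type", "text/plain"), ("Content-Length", str(len(body)))],
    )
    return [body]


app = canonical_redirect(store)
```

A request for the non-canonical host receives a 301 pointing at the same path on the canonical host, while requests that already use the canonical host pass through and return 200. In practice the same rule is often configured in the web server rather than in application code.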

Once you have indicated to Google where it can find the original content, it will no longer index your subdomain. Because your primary (canonical) page will be the only page that can get links and reputation, it will start performing much better in the search engine rankings. Congratulations: you’ve just fixed one of the leaks in your faucet.

Are some pages near duplicates of others? What to do when your product descriptions only differ by a few words.

This problem usually affects online retailers who sell many different versions of the same product. Here’s an example. Perhaps you sell a golden chocolate basket, a silver chocolate basket, and a bronze chocolate basket. If the only difference between one product description and the next is the color or the image, you need to indicate this to Google so that it does not conclude that you have duplicate content.

You can do this by using the “rel=canonical” link tag on the pages with the near-identical content. Make sure you place this tag somewhere in the <head> section of these near-duplicate pages, just as you would with meta tags. Whenever you use this tag, you are telling Google that the current page is either a duplicate or a near-duplicate, and the original page can be found at the address you have specified.

Do your URLs contain extra parameters for tracking and sorting? They might accidentally convince Google that you have a duplicate content problem.

Some shopping carts add parameters to your URLs for the purposes of sorting, dividing products into pages by category, and tracking users. Google’s search engine robot unwittingly follows all of these URLs, and it keeps finding more duplicate content. If you don’t tell Google which parameters to ignore, Googlebot will keep spinning its proverbial wheels. Here’s what you can do:
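For the “rel=canonical” tag described above, a small Python helper shows the markup that belongs in the <head> of each near-duplicate page. The product URLs are hypothetical; only the tag format itself is standard.

```python
from html import escape


def canonical_link_tag(canonical_url: str) -> str:
    """Build the <link rel="canonical"> tag for the <head> of a near-duplicate page."""
    return f'<link rel="canonical" href="{escape(canonical_url, quote=True)}">'


# Hypothetical URLs: the silver and bronze pages both point at the golden page.
CANONICAL = "http://www.example.com/chocolate-basket?color=golden"
tags = {
    variant: canonical_link_tag(CANONICAL)
    for variant in ("silver", "bronze")
}
```

Every variant page carries the same tag, so whichever copy a crawler reaches, it is told where the original lives. Escaping the URL keeps characters such as “&” valid inside the attribute.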

Google Webmaster Tools allows users to define what parameters Google should ignore when crawling a website. Often the culprit is the “jsessionid” parameter.

How to stop Google from going on a wild goose chase.

Sometimes Google finds large sections of your website that contain links to pages with no original content. This is called the “infinite space” problem because Googlebot gets “stuck” in these spaces, continually crawling the same series of dynamically generated pages or URLs with session IDs and tracking parameters, over and over again.

Thankfully, there is a way to stop it. Google knows about the “infinite space” problem, and will tell you if your website has this issue when you log in to Google Webmaster Tools. Specifically, it will list which links lead to an “infinite space,” and offers a few tips to patch things up. Once you’ve found the links that lead to an infinite space, do one of the following:
• Set the “rel” attribute in the suspicious link to “nofollow”. When you do this, your new link should look like the following: <a href="http://www.calendar.php" rel="nofollow">next month</a>
• Block the infinite space URLs in your robots.txt file.
• Make it impossible for search engines to extract these URLs. You can do this by hiding them within JavaScript.

Now that you have the tools to clean up duplicate content, in the next section we’ll consider a few important cases where duplicate content is not only acceptable, but necessary.
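The robots.txt option can be checked before deployment with Python’s standard `urllib.robotparser` module. The rule below, which blocks a “/calendar.php” path, and the example.com URLs are assumptions based on the calendar example; your own infinite-space URLs will differ.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content blocking the calendar's infinite space.
ROBOTS_TXT = """\
User-agent: *
Disallow: /calendar.php
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_crawlable(url: str, agent: str = "Googlebot") -> bool:
    """Report whether the parsed robots.txt rules allow the given agent to fetch the URL."""
    return parser.can_fetch(agent, url)


calendar_allowed = is_crawlable("http://www.example.com/calendar.php?month=2011-07")
shoes_allowed = is_crawlable("http://www.example.com/shoes")
```

With these rules the calendar URL is blocked while ordinary product pages stay crawlable, which is the outcome to confirm before publishing the file.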

When Duplicate Content Is Not Really Duplicate At All

Sometimes you end up with exact duplicate pages for legitimate reasons. This is no crime, but it does require you to let Google know so that your site may be indexed appropriately by the search engine robot.

Here’s the fix: Use a 301 redirect if you have duplicate pages that just can’t be avoided. Using a 301 redirect not only sends your users to the canonical page, it also tells Google that the page is an exact or near duplicate. It also prevents your website from being flagged as mostly duplicate content. Google continues to crawl your site because you are no longer using up its bandwidth unnecessarily.

There are also two minor cases worth understanding where duplicate content can actually help your rankings. Keep in mind, of course, that these are very specific and do not apply to every website.

You don’t have to worry about localized content on international domains.

What happens when you host the same content on different regional servers and international domains? For example, suppose you copy the same content on http://www.example.… to your local servers at http://www.example.… Will the content make it into the search engine results page abroad, or will it also be deemed “duplicate content”?

In this case, there is nothing to worry about. When hosted on different international domains, search engines like Google do not consider the same content as duplicate content. That said, the issues concerning subdomains that we discussed previously also apply to your international websites. That means you will have to go through the time-consuming task of canonicalizing your URLs so that they all point to the same international pages, just like you did on your home website domain.

Sometimes you don’t need to consolidate your duplicate pages. Here’s how to know when.

As you’ve learned, in most cases it is beneficial for a single page to garner the highest possible rank. After all, if this page features one of your bestselling products, you are practically guaranteed more sales. But there is one case when using canonical tags and giving all of your reputation to a single page isn’t the best approach. When your visitors really care about your products’ attributes (e.g., the product’s color), it might be smart to separate your pages.

Let’s return to the example about shoes. If your online store offers the same shoe in multiple colors, and you have found that customers are specifically searching for products in the color “turquoise,” you might benefit from treating each color of the product as a separate page.

Both your “shoes” and your “turquoise” pages will get traffic from color-based searches. However, whenever you separate your pages, you need to make them stand on their own. Your product page for the “turquoise shoes” must be distinct enough from the page for the “black shoes” to pass Google’s duplicate content filter. Otherwise, Google will not rank the page at all.

You’ll need to rewrite each new product description from scratch. It is not enough to swap out a few words and reorganize paragraphs to create a new description. Google is too smart for that.

Once again, it bears repeating that this is an exceptional case. You must really understand your customers, and more importantly, pay attention to their search behavior. If your customers are not typically searching for different variations of the same product, it is safe to use canonical tags and consolidate duplicate content. But if they usually search for items by their color, size, weight, etc., you should keep the pages separate and write new descriptions to individualize the content.

Sound Like Too Much Manual Labor? There’s Good News. Most Of It Can Be Automated.

By now, you have the knowledge to understand and tackle the problem of duplicate content. It doesn’t matter if you own one website or many websites on several international domains. Your competitors are probably doing the same thing. You have probably also realized just how time-consuming the process of consolidating your content can be.

Do you really want to go through every duplicate or near-duplicate page, every subdomain, and every extra parameter in your URLs? You are a businessperson, so like us your answer will be an affirmative NO! You have far better ways of spending your valuable time.

Luckily for you, we developed our Lighthouse software originally to solve our own duplicate content problems. We were slaving away, consolidating content for one of our clients, and we simply grew tired of the whole process. You can manually implement only so many 301 redirects before you start thinking, “There has to be a better way.”

Lighthouse does everything we’ve discussed so far, and it does a few more things beyond the scope of this paper. Here is a quick rundown:

• Automated 301 redirects and “rel=canonical” tags. Lighthouse spots your duplicate pages and automatically implements 301 redirects and rel=canonical tags.
• Automated robots.txt analysis. Lighthouse finds and corrects problems with sitemap accessibility, infinite spaces, and crawl delays.

It works wonders for us, and we are sure it will for you too.

Will You Profit From Addressing Duplicate Content Issues? Here’s a Surefire Way to Know.

We understand how all of this can seem like a huge project at first. It’s one thing to suspect you have a problem. It’s quite another to know the severity of the problem and identify where it is located. You wouldn’t fix a faucet that isn’t leaking, so why would you tackle a duplicate content problem that is practically nonexistent? That’s why we’d like to show you a way to measure the direct business benefit you’ll get from patching up the leaks your CMS leaves behind. In this section, you’ll learn what you need to know before you decide to launch an all-out assault on your website’s duplicate content.

Step one: establish a baseline for measurement. First, determine how many pages your site has. Add up your product pages, category pages, and ancillary pages. The total number is your real number of site pages. Consider two key monthly metrics: 1) Revenue per page (total site revenue ÷ pages indexed in Google) and 2) Searches per page (total search clicks to your site ÷ pages indexed in Google).

Step two: implement the change and wait. Fix your duplicate content issues, or hire a professional to do it for you. Then sit back and wait at least one month before you make another measurement. Sometimes it takes a while before Google returns to crawl the extra pages on your site.

Step three: look for an increase in active pages and traffic. What you measure next depends on your goals. If all went well, you want to see an increase in the number of pages indexed. If you are looking primarily for increased revenue, as we all are, compare your revenue per page before and after the duplicate content fix. You’ll notice an increase here if your site suffered from duplicate pages that divided your audience and your links, reducing your primary page reputation in Google. Your canonical pages will rank higher, and as they perform better you’ll also increase your revenue.

The second metric to consider is search clicks per page. You should see an increase in the number of unique pages receiving regular search traffic as well as an overall increase in traffic to your website. This increase usually happens because Google crawled more of your website and more of your pages made it onto the search engine results page.

Here’s What You Should Do Next

By now you realize that duplicate content problems happen all on their own, and it is up to you to stop them before you lose your rankings to the competition.

Most business owners wait until their next website redesign to start tackling their duplicate content problems, but this approach comes at a huge cost. Each low-ranking page amounts to customers who never made it to your store. Can you really afford to lose a single sale between now and your next redesign?

Automatic duplicate content management is the only solution that makes sense. Even if you take care of every piece of duplicate content today, you still will have to deal with it periodically in the future. The more content you add to your website, the more likely duplicate pages will pop up. It’s nice to have a way to constantly keep it in check.

When you allow our Lighthouse software to consolidate your content as you create it, your pages start to rank better right out of the gate. You don’t have to stop what you are doing to handle a situation that can easily get out of control. It’s something we like to call “peace of mind.”

Why go through page after page when software can do all the dirty work? We created Lighthouse because you’ve got better things to do. If you are interested in ridding your site of duplicate content problems for good, we encourage you to give us a call. We’ll tell you more about Lighthouse and how you can use it to take care of your duplicate content automatically.
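For readers who want to track the two monthly metrics from the measurement steps (revenue per page and search clicks per page), the arithmetic can be sketched in a few lines of Python. All figures below are made up purely to show the calculation; they are not results from any real site.

```python
def per_page(total: float, pages_indexed: int) -> float:
    """Divide a monthly total by the number of pages indexed in Google."""
    if pages_indexed <= 0:
        raise ValueError("pages_indexed must be positive")
    return total / pages_indexed


def summarize(month: dict) -> dict:
    """Compute both per-page metrics for one month of data."""
    return {
        "revenue_per_page": per_page(month["revenue"], month["pages_indexed"]),
        "clicks_per_page": per_page(month["search_clicks"], month["pages_indexed"]),
    }


# Invented example figures: the month before the fix and a month measured after it.
before = {"revenue": 50_000.0, "search_clicks": 20_000, "pages_indexed": 1_000}
after = {"revenue": 66_000.0, "search_clicks": 27_500, "pages_indexed": 1_100}

baseline = summarize(before)
follow_up = summarize(after)
```

Comparing `baseline` with `follow_up` gives the before-and-after view described in step three; the comparison is only meaningful if both months use the same definition of “pages indexed.”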
