www.altruik.com
Hamlet Batista
Chief Search Strategist hbatista@altruik.com
Table of Contents
What Is Duplicate Content?
How Duplicate Content Affects Your Search Engine Rankings
How To Put An End To Duplicate Content So You Can Reclaim Your Ranking
When Duplicate Content Is Not Really Duplicate At All
Sound Like Too Much Manual Labor?
Will You Profit From Addressing Duplicate Content Issues?
Here's What You Should Do Next
As an online retailer, your search engine strategy is your business strategy. Have you noticed your search engine rankings slipping away recently? Do you wonder what the cause might be? It is critical that every page selling your products ranks as highly as possible in search engines like Google. That's why it is important to optimize your site for search engine spiders, especially if you are using a CMS (content management system). There is a hidden danger, an issue that affects the majority of e-commerce websites, which most business owners don't know about until it is too late. That problem is duplicate content. If you have multiple copies of the same page, different URLs that point to the same content, or navigation systems that track your users, there is a good chance that you have an issue with duplicate content, too. Most content management systems, as useful as they are, are surprisingly not designed with SEO in mind. Your CMS features tools that make finding products easier for visitors to your website. But those same features, which duplicate product pages into multiple categories, often make it difficult for Google to crawl, index, and rank all of the pages on your site.
Duplicate content causes serious problems because it:
- Weakens the rank of your most popular pages
- Sends Google on a wild goose chase, causing it to abandon your site altogether
- Blocks large portions of your website from getting indexed
- Prevents your most profitable pages from reaching the top of Google's rankings
- Cripples your best link-building efforts
Duplicate content problems are like leaky faucets. As more sites link to your duplicate URLs, the reputation and rank of your top-selling product pages go down the drain. Products that once ranked very high suddenly begin tumbling down the rankings, and your competition gains the upper hand. The question now is: how can you identify duplicate content and patch up the leaks that are ruining your search engine rankings? Keep on reading, because we're going to teach you what most people don't know about the mess their CMS is leaving behind. Your priority is to patch these leaks before they drown your entire online business. With the right tools, you can build an even stronger search presence.
If you have a duplicate content problem, huge portions of your website might not be in Google's index.
These pages are almost identical. The URLs are different but lead you to the same product, the Jessica Simpson Women's Leve Black Leather shoe. In each example, users selected various categories in different orders and were able to access the same content via different paths. Copyright 2011 Altruik, Inc.
Google's search engine robot crawls your website like a nosy visitor, following each link in every category. It will find the same page twice: once under one combination of categories, and again under the other. You don't actually have two copies of the same page, but your CMS setup certainly makes it look like you do.
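For instance (these URLs are hypothetical and shown purely for illustration), a guided-navigation CMS might expose the very same product page at several category paths:

```
http://www.example-store.com/shoes/womens/black/jessica-simpson-leve
http://www.example-store.com/womens/black/shoes/jessica-simpson-leve
http://www.example-store.com/black/shoes/womens/jessica-simpson-leve
```

To Googlebot, each of these is a separate URL, so one product page gets crawled and indexed as three.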
As you can see, duplicate content can arise from a variety of sources. Each of these is another leak in your faucet. It creates a number of nasty problems, both for Googles search engine robot and for other search engines. Sometimes it sends the robot on an endless chase that Google eventually abandons, and at other times, it simply dilutes the reputation of all your affected pages. When only a small portion of your site makes it into the search engine rankings, your overall ranking suffers.
Page reputation is diluted when the same content is accessible through multiple URLs. You can recapture reputation and prevent duplicate content by consolidating non-canonical versions with 301 redirects. Source: Google's SEO Report Card, Google Webmaster Central. In the next section, we'll show you how duplicate content prevents your most profitable pages from making it to the top of Google's rankings.
Glossary
301 Redirect: An HTTP status code that automatically redirects users to a specified URL.
200: An HTTP status code indicating a successful request; the content is returned.
You want your website listed in the prime real estate of the results page. Splitting links dilutes the rankings of your strongest pages.
When you have multiple versions of the same article, video, or page, Google splits your reputation between all of the pages. Your duplicate pages siphon off a large portion of your inbound links, and it takes longer for your article, video, or page to rank highly in Google. No matter how many links you get, some of them are going down the drain.
All of your inbound links should go to the same page. In Google's eyes, that gives you 100% of the reputation.
Source: Google's SEO Report Card, Google Webmaster Central.
Consider three common ways your CMS can create an endless supply of duplicate URLs:
1. Your CMS creates a calendar that generates a new page for the next month every time the "next month" link is clicked. Because your website keeps generating a new link every time Googlebot follows the "next month" link, Googlebot keeps following it as long as it can and eventually times out.
2. Your website features a guided-navigation shopping cart with categories for different brands and types of products. Because the products and categories are linked to each other (often in very complex ways), Googlebot keeps following the links in circles until it times out.
3. Your website uses a session ID in its URLs to track users who have cookies disabled (jsessionid is a common example of an in-URL session ID that gets indexed as duplicate content). These IDs are particularly dangerous when they appear in the path_info portion of your URL.
This last one can be particularly nasty. When a search engine bot crawls the site, it acts like a user with browser cookies disabled. Each time Googlebot requests a page, it is given a new page with a new jsessionid. The bot quickly sees millions of pages that are identical, differing only in the URL: an infinite space that Googlebot treats as duplicate content.
Once Googlebot understands that it is going in circles (or down an endless drain, as in the calendar example), it concludes that your site is composed mostly of duplicate content and stops crawling your website. This is a very bad thing, and it can cause large portions of your site to go unnoticed. You can make vast improvements in your search engine rankings by tackling this problem alone.
Now that you understand how duplicate content can harm your search engine rankings, we want to show you what you can do to stop your CMS from creating so much of it. You'll be happy to know that all of these problems can be solved, and you can use automated tools to help you handle most of them.
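If your site runs on a Java servlet container (the usual source of jsessionid URLs), one common remedy, assuming a Servlet 3.0-compatible container such as Tomcat 7, is to force cookie-only session tracking in your web.xml so session IDs never get written into URLs:

```xml
<!-- web.xml: track sessions with cookies only, so crawlers
     never receive jsessionid-rewritten duplicate URLs -->
<session-config>
    <tracking-mode>COOKIE</tracking-mode>
</session-config>
```

The trade-off is that users with cookies disabled lose their sessions, which is usually acceptable, since search engine bots behave exactly like those users.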
How To Put An End To Duplicate Content So You Can Reclaim Your Ranking
As you have already seen, duplicate content problems happen all on their own. If you don't do something to address them before they affect your ranking, your competitors will gain the edge. There are solutions to duplicate content problems, and we'll take a look at how to solve the most dangerous ones.
If the same content lives on more than one subdomain, you need to tell Google which subdomain contains the original source material. There are two ways to do this: 1. Implement 301 redirects to send people to the right subdomain with the original content. 2. Use Google's Webmaster Tools to choose which domain contains the original content. This process is sometimes called canonicalization.
When this page is selected in the search engine results page, users are automatically directed to the canonical URL.
Once you have indicated to Google where it can find the original content, it will no longer index your subdomain. Because your primary (canonical) page will be the only page that can get links and reputation, it will start performing much better in the search engine rankings. Congratulations: youve just fixed one of the leaks in your faucet.
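As a sketch of the 301-redirect option, assuming an Apache server with mod_rewrite enabled and purely hypothetical host names, an .htaccess rule consolidating a duplicate subdomain onto the canonical www host could look like this:

```apacheconf
# 301-redirect every URL on shop.example.com to the
# same path on the canonical host www.example.com
RewriteEngine On
RewriteCond %{HTTP_HOST} ^shop\.example\.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
```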
If the only difference between near-identical product pages is an attribute such as the color or the image, you need to indicate this to Google so that it does not conclude that you have duplicate content. You can do this by using the rel=canonical link tag on the pages with the near-identical content. Make sure you place this tag in the <head> section of these near-duplicate pages, just as you would with meta tags. Whenever you use this tag, you are telling Google that the current page is either a duplicate or a near-duplicate, and that the original page can be found at the address you have specified. Do your URLs contain extra parameters for tracking and sorting? They might accidentally convince Google that you have a duplicate content problem.
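For illustration, the rel=canonical tag described above might look like this on a near-duplicate product page (the URL is hypothetical):

```html
<head>
  <!-- Tells Google the original version of this page lives here -->
  <link rel="canonical" href="http://www.example-store.com/chocolate-basket" />
</head>
```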
Are some pages near duplicates of others? What to do when your product descriptions differ by only a few words.
This problem usually affects online retailers who sell many different versions of the same product. Perhaps you sell a golden chocolate basket, a silver chocolate basket, and a bronze chocolate basket. If the only difference between one product description and the next is the color, Google may treat the pages as duplicates and filter out all but one.
Some shopping carts add parameters to your URLs for sorting, dividing products into pages by category, and tracking users. Google's search engine robot unwittingly follows all of these URLs, and it keeps finding more duplicate content. If you don't tell Google which parameters to ignore, Googlebot will keep spinning its proverbial wheels. Here's what you can do:
Google Webmaster Tools allows users to define what parameters Google should ignore when crawling a website.
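The parameter-laden duplicates you would declare there look something like the following hypothetical URLs, all of which return the same product listing:

```
http://www.example-store.com/shoes?sort=price&page=2
http://www.example-store.com/shoes?page=2&sort=price
http://www.example-store.com/shoes?sort=price&page=2&sessionid=8f2a91
```

Telling Google to ignore sessionid (and, where appropriate, sort) collapses these into a single crawlable URL.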
Once you've found the links that lead to an infinite space, do one of the following: Set the rel attribute of the suspicious link to nofollow. When you do this, your new link should look like the following:
<a href="http://www.calendar.com/nextmonth.php" rel="nofollow">next month</a>
Alternatively, block the infinite-space URLs in your robots.txt file, or make it impossible for search engines to extract these URLs by hiding them within JavaScript. Now that you have the tools to clean up duplicate content, in the next section we'll consider a few important cases where duplicate content is not only acceptable, but necessary.
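For the robots.txt option, a rule covering the calendar example above might look like this (the path is illustrative only):

```
# robots.txt: keep all crawlers out of the infinite calendar space
User-agent: *
Disallow: /nextmonth.php
```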
Use a 301 redirect if you have duplicate pages that just can't be avoided.
Using a 301 redirect not only sends your users to the canonical page; it also tells Google that the page is an exact or near duplicate. Google continues to crawl your site because you are no longer using up its bandwidth unnecessarily. There are also two minor cases worth understanding where duplicate content can actually help your rankings. Keep in mind, these are very specific and do not apply to every website.
Sometimes you don't need to consolidate your duplicate pages. Here's how to know when...
As you've learned, in most cases it is beneficial for a single page to garner the highest possible rank. After all, if this page features one of your bestselling products, you are practically guaranteed more sales. But there is one case when using canonical tags and giving all of your reputation to a single page isn't the best idea. When your visitors really care about your product's attributes (e.g., the product's color), it might be smart to separate your pages. Let's return to the example about shoes. If your online store offers the same shoe in multiple colors, and you have found that customers are specifically searching for products in the color turquoise, you might benefit from treating each color of the product as a separate page.
Both your standard shoe pages and your turquoise pages will get traffic from color-based searches. Your competitors are probably doing the same thing. Whenever you separate your pages, you need to make them stand on their own. Your product page for the turquoise shoes must be distinct enough from the page for the black shoes to pass Google's duplicate content filter. Otherwise, Google will not rank the page at all. It is not enough to swap out a few words and reorganize paragraphs to create a new description. Google is too smart for that. You'll need to rewrite each new product description from scratch.
Once again, it bears repeating that this is an exceptional case. You must really understand your customers, and more importantly, pay attention to their search behavior. If your customers are not typically searching for different variations of the same product, it is safe to use canonical tags and consolidate duplicate content. But if they usually search for items by their color, size, weight, etc., you should keep the pages separate and write new descriptions to individualize the content.
Sound Like Too Much Manual Labor? There's Good News. Most Of It Can Be Automated.
It doesn't matter if you own one website or many websites on several international domains. By now, you have the knowledge to understand and tackle the problem of duplicate content. However, you have probably realized just how time-consuming the process of consolidating your content can be. Do you really want to go through every duplicate or near-duplicate page, every subdomain, and every extra parameter in your URLs? You are a businessperson, so like us your answer will be an emphatic NO! You have better ways of spending your valuable time. Luckily for you, we developed our Lighthouse software originally to solve our own duplicate content problems. We were slaving away, consolidating content for one of our clients, and we simply grew tired of the whole process. You can manually implement only so many 301 redirects before you start thinking, "There has to be a better way." Lighthouse does everything we've discussed so far, and it does a few more things beyond the scope of this paper. Here is a quick rundown:
- Automated 301 redirects and rel=canonical tags. Lighthouse spots your duplicate pages and automatically implements 301 redirects and rel=canonical tags.
- Automated robots.txt analysis. Lighthouse finds and corrects problems with sitemap accessibility, infinite spaces, and crawl delays.
We understand how all of this can seem like a huge project at first. That's why we'd like to show you a way to measure the direct business benefit you'll get from tackling each of these issues head on. In the next section, you'll learn what you need to know before you decide to launch an all-out assault on your website's duplicate content.
Will You Profit From Addressing Duplicate Content Issues? Here's a Surefire Way to Know.
It's one thing to suspect you have a problem. It's quite another to know the severity of the problem and identify where it is located. You wouldn't fix a faucet that isn't leaking, so why would you tackle a duplicate content problem that is practically nonexistent? We want to show you how to measure the direct business benefit you'll get from patching up the leaks your CMS leaves behind. It works wonders for us, and we are sure it will for you too.

Step one: establish a baseline for measurement. First, determine how many pages your site has. Add up your product pages, category pages, and ancillary pages. The total is your real number of site pages. Then consider two key monthly metrics: 1) revenue per page (total site revenue divided by pages indexed in Google) and 2) searches per page (total search clicks to your site divided by pages indexed in Google). For example, $100,000 in monthly revenue across 2,000 indexed pages works out to $50 of revenue per page.

Step two: implement the change and wait. Fix your duplicate content issues, or hire a professional to do it for you. Then sit back and wait at least one month before you make another measurement. Sometimes it takes a while before Google returns to crawl the extra pages on your site.

Step three: look for an increase in active pages and traffic. What you measure next depends on your goals. If you are looking primarily for increased revenue, as we all are, you want to see an increase in the number of pages indexed. Compare your revenue per page before and after the duplicate content fix. The second metric to consider is search clicks per page. You'll notice an increase here if your site suffered from duplicate pages that divided your audience and your links, reducing your primary pages' reputation in Google. If all went well, your canonical pages will rank higher, and as they perform better you'll also increase your revenue. You should see an increase in the number of unique pages receiving regular search traffic as well as an overall increase in traffic to your website. This increase usually
happens because Google crawled more of your website and more of your pages made it onto the search engine results page.