Black Hat SEO Crash Course V1.
Introduction

If you have spent any significant amount of time online, you have likely come across the term Black Hat at one time or another. The term is usually surrounded by negative comments. This book is here to address those comments and provide some insight into the real life of a Black Hat SEO professional. To give you some background, my name is Brian. I've been involved in internet marketing for close to 10 years now, the last 7 of which have been dedicated to Black Hat SEO. As we will discuss shortly, you can't be a great Black Hat without first becoming a great White Hat marketer. With the formalities out of the way, let's get into the meat of things, shall we?

What is Black Hat SEO?

The million dollar question that everyone has an opinion on: what exactly is Black Hat SEO? The answer depends largely on who you ask. Ask most White Hats and they immediately quote the Google Webmaster Guidelines like a bunch of lemmings. Have you ever really stopped to think about it, though? Google publishes those guidelines because they know, as well as you and I do, that they have no way of detecting or preventing most of what they preach so loudly. They rely on droves of webmasters to blindly repeat everything they say because they are an internet powerhouse and have everyone brainwashed into believing anything they are told. This is actually a good thing. It means that the vast majority of internet marketers and SEO professionals are completely blind to the vast array of tools at their disposal, tools that not only increase traffic to their sites but also make us all millions in revenue every year.

The second argument you are likely to hear is the age-old "the search engines will ban your sites if you use Black Hat techniques." Sure, this is true if you have no understanding of the basic principles or practices. If you jump in with no knowledge, you are going to fail. I'll give you the secret, though. Ready? Don't use Black Hat techniques on your White Hat domains. Not directly, at least. You aren't going to build doorway or cloaked pages on your money site; that would be idiotic. Instead you buy several throwaway domains, build your doorways on those, and cloak/redirect the traffic to your money sites. You lose a doorway domain? Who cares, build 10 to replace it. It isn't rocket science, just common sense. A search engine can't possibly penalize you for outside influences that are beyond your control. They can't penalize you for incoming links, nor can they penalize you for sending traffic to your domain from doorway pages outside of that domain. If they could, I would simply point doorway pages and spam links at my competitors to knock them out of the SERPS. See... common sense.

So again, what is Black Hat SEO? In my opinion, Black Hat SEO and White Hat SEO are almost no different. White Hat webmasters spend time carefully finding link partners to increase rankings for their keywords; Black Hats do the same thing, but we write automated scripts to do it while we sleep. White Hat SEOs spend months perfecting the on-page SEO of their sites for maximum rankings; Black Hat SEOs use content generators to spit out thousands of generated pages and see which version works best. Are you starting to see a pattern here? You should: Black Hat SEO and White Hat SEO are one and the same, with one key difference. Black Hats are lazy. We like things automated. Have you ever heard the phrase "Work smarter, not harder"? We live by those words.
Why spend weeks or months building pages only to have Google slap them down with some obscure penalty? If you have spent any time on webmaster forums you have heard that story time and time again: a webmaster plays by the rules, does nothing outwardly wrong or evil, yet their site is completely gone from the SERPS (Search Engine Results Pages) one morning for no apparent reason. It's frustrating; we've all been there. Months of work gone and nothing to show for it. I got tired of it, as I am sure you are. That's when it came to me: who elected the search engines the "internet police"? I certainly didn't, so why play by their rules? In the following pages I'm going to show you why the search engines' rules make no sense, and further, I'm going to discuss how you can use that information to your advantage.

Search Engine 101

As we discussed earlier, every good Black Hat must be a solid White Hat, so let's start with the fundamentals. This section is going to get technical as we discuss how search engines work and delve into ways to exploit those inner workings. Let's get started, shall we?

Search engines match queries against an index that they create. The index consists of the words in each document, plus pointers to their locations within the documents. This is called an inverted file. A search engine or IR (Information Retrieval) system comprises four essential modules:

∗A document processor
∗A query processor
∗A search and matching function
∗A ranking capability

While users focus on "search," the search and matching function is only one of the four modules. Each of these four modules may cause the expected or unexpected results that consumers get when they use a search engine.

Document Processor

The document processor prepares, processes, and inputs the documents, pages, or sites that users search against. The document processor performs some or all of the following steps:

∗Normalizes the document stream to a predefined format.
∗Breaks the document stream into desired retrievable units.
∗Isolates and meta tags sub document pieces.
∗Identifies potential indexable elements in documents.
∗Deletes stop words.
∗Stems terms.
∗Extracts index entries.
∗Computes weights.
∗Creates and updates the main inverted file against which the search engine searches in order to match queries to documents.
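Before walking through the individual steps in detail, it may help to see the inverted file idea in code. The following is a minimal Python sketch of my own (not any engine's actual implementation): it lower-cases a couple of toy documents, splits them into terms, and records which documents each term appears in and at which positions.

```python
import re
from collections import defaultdict

def tokenize(text):
    # Normalize to lowercase and keep only alpha-numeric runs.
    return re.findall(r"[a-z0-9]+", text.lower())

def build_inverted_file(docs):
    # term -> {doc_id: [positions]}: the words in each document plus
    # pointers to their locations, as described above.
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for position, term in enumerate(tokenize(text)):
            index[term][doc_id].append(position)
    return index

docs = {
    1: "The swift brown fox jumped over the lazy dog.",
    2: "My old lady swears that she saw the lazy dog jump over the swift brown fox.",
}
index = build_inverted_file(docs)
print(dict(index["fox"]))   # {1: [3], 2: [15]}
print(dict(index["lazy"]))  # {1: [7], 2: [8]}
```

A real engine layers normalization, stop word removal, stemming and weighting on top of this, which is exactly what the numbered steps that follow describe.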
Step 4: Identify elements to index. Identifying potential indexable elements in documents dramatically affects the nature and quality of the document representation that the engine will search against. In designing the system, we must define the word "term." Is it the alpha-numeric characters between blank spaces or punctuation? If so, what about non-compositional phrases (phrases in which the separate words do not convey the meaning of the phrase, like "skunk works" or "hot dog"), multi-word proper names, or inter-word symbols such as hyphens or apostrophes that can denote the difference between "small business men" versus "small-business men"? Each search engine depends on a set of rules that its document processor must execute to determine what action is to be taken by the "tokenizer," i.e. the software used to define a term suitable for indexing.

Step 5: Deleting stop words. This step helps save system resources by eliminating from further processing, as well as potential matching, those terms that have little value in finding useful documents in response to a customer's query. This step used to matter much more than it does now, when memory has become so much cheaper and systems so much faster, but since stop words may comprise up to 40 percent of text words in a document, it still has some significance. A stop word list typically consists of those word classes known to convey little substantive meaning, such as articles (a, the), conjunctions (and, but), interjections (oh, but), prepositions (in, over), pronouns (he, it), and forms of the "to be" verb (is, are). To delete stop words, an algorithm compares index term candidates in the documents against a stop word list and eliminates certain terms from inclusion in the index for searching.

Step 6: Term Stemming. Stemming removes word suffixes, perhaps recursively in layer after layer of processing. The process has two goals. In terms of efficiency, stemming reduces the number of unique words in the index, which in turn reduces the storage space required for the index and speeds up the search process. In terms of effectiveness, stemming improves recall by reducing all forms of the word to a base or stemmed form. For example, if a user asks for analyze, they may also want documents which contain analysis, analyzing, analyzer, analyzes, and analyzed. Therefore, the document processor stems document terms to analy- so that documents which include various forms of analy- will have equal likelihood of being retrieved; this would not occur if the engine only indexed variant forms separately and required the user to enter all. Of course, stemming does have a downside. It may negatively affect precision in that all forms of a stem will match, when, in fact, a successful query for the user would have come from matching only the word form actually used in the query.

Systems may implement either a strong stemming algorithm or a weak stemming algorithm. A strong stemming algorithm will strip off both inflectional suffixes (-s, -es, -ed) and derivational suffixes (-able, -aciousness, -ability), while a weak stemming algorithm will strip off only the inflectional suffixes (-s, -es, -ed).

Step 7: Extract index entries. Having completed steps 1 through 6, the document processor extracts the remaining entries from the original document. For example, the following paragraph shows the full text sent to a search engine for processing:

Milosevic's comments, carried by the official news agency Tanjug, cast doubt over the governments at the talks, which the international community has called to try to prevent an all-out war in the Serbian province. "President Milosevic said it was well known that Serbia and Yugoslavia were firmly committed to resolving problems in Kosovo, which is an integral part of Serbia, peacefully in Serbia with the participation of the representatives of all ethnic communities," Tanjug said. Milosevic was speaking during a meeting with British Foreign Secretary Robin Cook, who delivered an ultimatum to attend negotiations in a week's time on an autonomy proposal for Kosovo with ethnic Albanian leaders from the province. Cook earlier told a conference that Milosevic had agreed to study the proposal.

Steps 1 to 6 reduce this text for searching to the following:

Milosevic comm carri offic new agen Tanjug cast doubt govern talk interna commun call try prevent all-out war Serb province President Milosevic said well known Serbia Yugoslavia firm commit resolv problem Kosovo integr part Serbia peace Serbia particip representa ethnic commun Tanjug said Milosevic speak meeti British Foreign Secretary Robin Cook deliver ultimat attend negoti week time autonomy propos Kosovo ethnic Alban lead province Cook earl told conference Milosevic agree study propos.

The output of step 7 is then inserted and stored in an inverted file that lists the index entries and an indication of their position and frequency of occurrence. The specific nature of the index entries, however, will vary based on the decision in Step 4 concerning what constitutes an "indexable term." More sophisticated document processors will have phrase recognizers, as well as Named Entity recognizers and Categorizers, to insure index entries such as Milosevic are tagged as a Person and entries such as Yugoslavia and Serbia as Countries.

Step 8: Term weight assignment. Weights are assigned to terms in the index file. The simplest of search engines just assign a binary weight: 1 for presence and 0 for absence. The more sophisticated the search engine, the more complex the weighting scheme. Measuring the frequency of occurrence of a term in the document creates more sophisticated weighting, with length-normalization of frequencies still more sophisticated. Extensive experience in information retrieval research over many years has clearly demonstrated that the optimal weighting comes from use of "tf/idf." This algorithm measures the frequency of occurrence of each term within a document. Then it compares that frequency against the frequency of occurrence in the entire database.

Not all terms are good "discriminators"; that is, all terms do not single out one document from another very well. A simple example would be the word "the." This word appears in too many documents to help distinguish one from another. A less obvious example would be the word "antibiotic." In a sports database, when we compare each document to the database as a whole, the term "antibiotic" would probably be a good discriminator among documents, and therefore would be assigned a high weight. Conversely, in a database devoted to health or medicine, "antibiotic" would probably be a poor discriminator, since it occurs very often. The TF/IDF weighting scheme assigns higher weights to those terms that really distinguish one document from the others.
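To make the tf/idf idea a little more concrete, here is a toy Python version of the weighting (my own simplification; production engines use length-normalized and far more elaborate variants). A term scores higher the more often it occurs in a document and lower the more documents in the collection contain it, which is why "the" ends up nearly worthless while "antibiotic" stands out in a sports collection.

```python
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())

def tf_idf(term, doc, collection):
    counts = Counter(tokenize(doc))
    tf = counts[term] / max(1, sum(counts.values()))                # frequency within this document
    containing = sum(1 for d in collection if term in tokenize(d))  # how many documents use the term
    idf = math.log((1 + len(collection)) / (1 + containing)) + 1    # rarer terms get a bigger boost
    return tf * idf

sports_db = [
    "the final score was tied at the whistle",
    "the match went to overtime after a late goal",
    "team doctor prescribed an antibiotic for the injured striker",
]
print(round(tf_idf("the", sports_db[2], sports_db), 3))         # ~0.111: "the" occurs in every document
print(round(tf_idf("antibiotic", sports_db[2], sports_db), 3))  # ~0.188: a genuine discriminator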
Query Processor

Query processing has seven possible steps, though a system can cut these steps short and proceed to match the query to the inverted file at any of a number of places during the processing. Document processing shares many steps with query processing. More steps and more documents make the process more expensive in terms of computational resources and responsiveness. However, the longer the wait for results, the higher the quality of results. Thus, search system designers must choose what is most important to their users: time or quality. Publicly available search engines usually choose time over very high quality, having too many documents to search against.

The steps in query processing are as follows (with the option to stop processing and start matching indicated as "Matcher"):

∗Tokenize query terms.
∗Recognize query terms vs. special operators. ————————> Matcher
∗Delete stop words.
∗Stem words.
∗Create query representation. ————————> Matcher
∗Expand query terms.
∗Compute weights. ————————> Matcher

Step 1: Tokenizing. As soon as a user inputs a query, the search engine, whether a keyword-based system or a full natural language processing (NLP) system, must tokenize the query stream, i.e. break it down into understandable segments. Usually a token is defined as an alpha-numeric string that occurs between white space and/or punctuation.

Step 2: Parsing. Since users may employ special operators in their query, including Boolean, adjacency, or proximity operators, the system needs to parse the query first into query terms and operators. These operators may occur in the form of reserved punctuation (e.g. quotation marks) or reserved terms in specialized format (e.g. AND, OR). In the case of an NLP system, the query processor will recognize the operators implicitly in the language used no matter how the operators might be expressed (e.g. prepositions, conjunctions, ordering).

Steps 3 and 4: Stop list and stemming. Some search engines will go further and stop-list and stem the query, similar to the processes described above in the Document Processor section. The stop list might also contain words from commonly occurring querying phrases, such as "I'd like information about." However, since most publicly available search engines encourage very short queries, as evidenced in the size of the query window provided, the engines may drop these two steps.

Step 5: Creating the query. How each particular search engine creates a query representation depends on how the system does its matching. If a statistically based matcher is used, then the query must match the statistical representations of the documents in the system. Good statistical queries should contain many synonyms and other terms in order to create a full representation. If a Boolean matcher is utilized, then the system must create logical sets of the terms connected by AND, OR, or NOT. An NLP system will recognize single terms, phrases, and Named Entities. If it uses any Boolean logic, it will also recognize the logical operators from Step 2 and create a representation containing logical sets of the terms to be AND'd, OR'd, or NOT'd.

At this point, a search engine may take the list of query terms and search them against the inverted file. In fact, this is the point at which the majority of publicly available search engines perform the search.

Step 6: Query expansion. Since users of search engines usually include only a single statement of their information needs in a query, it becomes highly probable that the information they need may be expressed using synonyms, rather than the exact query terms, in the documents which the search engine searches against. Therefore, more sophisticated systems may expand the query into all possible synonymous terms and perhaps even broader and narrower terms. This process approaches what search intermediaries did for end users in the earlier days of commercial search systems. Back then, intermediaries might have used the same controlled vocabulary or thesaurus used by the indexers who assigned subject descriptors to documents. Today, resources such as WordNet are generally available, or specialized expansion facilities may take the initial query and enlarge it by adding associated vocabulary.

Step 7: Query term weighting (assuming more than one query term). The final step in query processing involves computing weights for the terms in the query. Sometimes the user controls this step by indicating either how much to weight each term or simply which term or concept in the query matters most and must appear in each retrieved document to ensure relevance. Leaving the weighting up to the user is not common, because research has shown that users are not particularly good at determining the relative importance of terms in their queries. They can't make this determination for several reasons. First, they don't know what else exists in the database, and document terms are weighted by being compared to the database as a whole. Second, most users seek information about an unfamiliar subject, so they may not know the correct terminology. Few search engines implement system-based query weighting, but some do an implicit weighting by treating the first term(s) in a query as having higher significance, and the engines use this information to provide a list of documents/pages to the user. After this final step, the expanded, weighted query is searched against the inverted file of documents.

Search and Matching Function

How systems carry out their search and matching functions differs according to which theoretical model of information retrieval underlies the system's design philosophy. Since making the distinctions between these models goes far beyond the goals of this article, we will only make some broad generalizations in the following description of the search and matching function. Searching the inverted file for documents meeting the query requirements, referred to simply as "matching," is typically a standard binary search, no matter whether the search ends after the first two, five, or all seven steps of query processing. While the computational processing required for simple, unweighted, non-Boolean query matching is far simpler than when the model is an NLP-based query within a weighted, Boolean model, it also follows that the simpler the document representation, the query representation, and the matching algorithm, the less relevant the results, except for very simple queries, such as one-word, non-ambiguous queries seeking the most generally known information.

Having determined which subset of documents or pages matches the query requirements to some degree, a similarity score is computed between the query and each document/page based on the scoring algorithm used by the system. Scoring algorithms' rankings are based on the presence/absence of query term(s), term frequency, tf/idf, Boolean logic fulfillment, or query term weights. Some search engines use scoring algorithms not based on document contents, but rather on relations among documents or past retrieval history of documents/pages.
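Pulling the query-processing steps above together, here is a toy Python pipeline of my own: it tokenizes the query, drops stop words (including filler from phrases like "I'd like information about"), applies a crude weak stemmer, and then runs a plain Boolean AND match against the documents. The stop list and the stemmer are deliberately simplistic stand-ins for what a real engine would use.

```python
import re

# Toy stop list; note it also covers words from common query phrases
# such as "I'd like information about", as described above.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "are",
              "i", "d", "like", "information", "about"}

def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())

def weak_stem(term):
    # Weak (inflectional-only) stemming: strip -ed, -es, -s.
    for suffix in ("ed", "es", "s"):
        if term.endswith(suffix) and len(term) > len(suffix) + 2:
            return term[: -len(suffix)]
    return term

def process_query(query):
    return [weak_stem(t) for t in tokenize(query) if t not in STOP_WORDS]

def boolean_and_match(query, docs):
    terms = process_query(query)
    results = []
    for doc_id, text in docs.items():
        doc_terms = {weak_stem(t) for t in tokenize(text)}
        if all(term in doc_terms for term in terms):  # every query term must be present
            results.append(doc_id)
    return results

docs = {
    1: "The swift brown fox jumped over the lazy dog",
    2: "My old lady swears that she saw the lazy dog jump over the swift brown fox",
    3: "A lazy afternoon by the pool",
}
print(process_query("I'd like information about foxes and dogs"))  # ['fox', 'dog']
print(boolean_and_match("lazy dogs jumped", docs))                 # [1, 2]
```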
After computing the similarity of each document in the subset of documents, the system presents an ordered list to the user. The sophistication of the ordering of the documents again depends on the model the system uses, as well as the richness of the document and query weighting mechanisms. For example, search engines that only require the presence of any alpha-numeric string from the query occurring anywhere, in any order, in a document would produce a very different ranking than one by a search engine that performed linguistically correct phrasing for both document and query representation and that utilized the proven tf/idf weighting scheme. More sophisticated systems will go even further at this stage and allow the user to provide some relevance feedback or to modify their query based on the results they have seen. If either of these are available, the system will then adjust its query representation to reflect this value-added feedback and re-run the search with the improved query to produce either a new set of documents or a simple reranking of documents from the initial search. However the search engine determines rank, the ranked results list goes to the user, who can then simply click and follow the system's internal pointers to the selected document/page.

What Document Features Make a Good Match to a Query

We have discussed how search engines work, but what features of a query make for good matches? Let's look at the key features and consider some pros and cons of their utility in helping to retrieve a good representation of documents/pages.

Term frequency: How frequently a query term appears in a document is one of the most obvious ways of determining a document's relevance to a query. While most often true, several situations can undermine this premise. First, many words have multiple meanings; they are polysemous. Think of words like "pool" or "fire." Many of the non-relevant documents presented to users result from matching the right word, but with the wrong meaning. Also, in a collection of documents in a particular domain, such as education, common query terms such as "education" or "teaching" are so common and occur so frequently that an engine's ability to distinguish the relevant from the non-relevant in a collection declines sharply. Search engines that don't use a tf/idf weighting algorithm do not appropriately down-weight the overly frequent terms, nor are higher weights assigned to appropriate distinguishing (and less frequently occurring) terms, e.g. "early childhood."

Location of terms: Many search engines give preference to words found in the title or lead paragraph or in the meta data of a document. Some studies show that the location in which a term occurs in a document or on a page indicates its significance to the document. Terms occurring in the title of a document or page that match a query term are therefore frequently weighted more heavily than terms occurring in the body of the document. Similarly, query terms occurring in section headings or the first paragraph of a document may be more likely to be relevant.

Proximity of query terms: When the terms in a query occur near to each other within a document, it is more likely that the document is relevant to the query than if the terms occur at greater distance. While some search engines do not recognize phrases per se in queries, some clearly rank documents higher if the query terms occur adjacent to one another or in closer proximity, as compared to documents in which the terms occur at a distance.

Date of Publication: Some search engines assume that the more recent the information is, the more likely it will be useful or relevant to the user. The engines therefore present results beginning with the most recent and moving to the less current.

Length: While length per se does not necessarily predict relevance, it is a factor when used to compute the relative merit of similar pages. So, in a choice between two documents both containing the same query terms, the document that contains a proportionately higher occurrence of the term relative to the length of the document is assumed more likely to be relevant.

Proper nouns: Proper nouns sometimes have higher weights, since so many searches are performed on people, places, or things. While this may be useful, if the search engine assumes that you are searching for a name instead of the same word as a normal everyday term, then the search results may be peculiarly skewed. Imagine getting information on "Madonna," the rock star, when you were looking for pictures of Madonnas for an art history class.

Popularity: Google and several other search engines add popularity to link analysis to help determine the relevance or value of pages. Pages referred to by many other pages, or that have a high number of "in-links," are treated as more valuable. Popularity also utilizes data on the frequency with which a page is chosen by all users as a means of predicting relevance. While popularity is a good indicator at times, it assumes that the underlying information need remains the same.
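Nobody outside the engines knows how these signals are actually combined, but as a thought experiment, here is a tiny Python scorer of my own that folds three of the features above into one number: length-normalized term frequency, a boost for query terms in the title, and a proximity bonus when the terms sit close together in the body. The weights are arbitrary and purely illustrative.

```python
import re

def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())

def score(query, title, body):
    q = set(tokenize(query))
    body_tokens = tokenize(body)
    positions = [i for i, t in enumerate(body_tokens) if t in q]
    tf = len(positions) / max(1, len(body_tokens))            # length-normalized term frequency
    title_boost = 2.0 * sum(t in q for t in tokenize(title))  # terms in the title count extra
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    proximity = 1.0 / (1 + min(gaps)) if gaps else 0.0        # reward query terms close together
    return tf + title_boost + proximity

page_a = ("Dog training tips", "training a lazy dog takes patience and time")
page_b = ("Household pets", "a dog sleeps all day and training it is hopeless")
for title, body in (page_a, page_b):
    print(title, round(score("dog training", title, body), 3))
```

Page A wins easily here because the query terms sit in its title and close together in its body, which is the intuition behind the location and proximity features described above.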
As you saw in the above pages, search engines are simple text parsers. They take a series of words and try to reduce them to their core meaning. They can't understand text, nor do they have the capability of discerning between grammatically correct text and complete gibberish. This will of course change over time as search engines evolve and the cost of hardware falls, but we Black Hats will evolve as well, always aiming to stay at least one step ahead.

Now that we have covered how a search engine works, we can discuss methods to take advantage of them. Let's start with content. Let's discuss the basics of generating content as well as some software used to do so, but first, we need to understand duplicate content.

Duplicate Content

I've read seemingly hundreds of forum posts discussing duplicate content, none of which gave the full picture, leaving me with more questions than answers. I decided to spend some time doing research to find out exactly what goes on behind the scenes. Here is what I have discovered.

Most people are under the assumption that duplicate content is looked at on the page level, when in fact it is far more complex than that. A widely passed around myth on webmaster forums is that duplicate content is viewed by search engines as a percentage: as long as you stay below the threshold, you pass by penalty free. Simply saying that "by changing 25 percent of the text on a page it is no longer duplicate content" is not a true or accurate statement. It's a nice thought; it's just too bad that it is completely wrong. Let's examine why that is.

To gain some understanding we need to take a look at the k-shingle algorithm that may or may not be in use by the major search engines (my money is that it is in use). The shingling algorithm essentially finds word groups within a body of text in order to determine the uniqueness of the text. Before we get to this point the search engine has already stripped all tags and HTML from the page, leaving just plain text behind for us to take a look at. The first thing they do is strip out all stop words like and, the, of, to. They also strip out all fill words, leaving us only with action words which are considered the core of the content.

I've seen the following used as an example, so let's use it here as well. Let's suppose that you have a page that contains the following text:

The swift brown fox jumped over the lazy dog.

Once this is done, the following "shingles" are created from the above text (I'm going to include the stop words for simplicity):

The swift brown fox
swift brown fox jumped
brown fox jumped over
fox jumped over the
jumped over the lazy
over the lazy dog

These are essentially like unique fingerprints that identify this block of text. The search engine can now compare this "fingerprint" to other pages in an attempt to find duplicate content. As duplicates are found, a "duplicate content" score is assigned to the page. If too many "fingerprints" match other documents, the score becomes high enough that the search engines flag the page as duplicate content, thus sending it to supplemental hell or, worse, deleting it from their index completely.

Now let's look at a second document:

My old lady swears that she saw the lazy dog jump over the swift brown fox.

The above gives us the following shingles:

my old lady swears
old lady swears that
lady swears that she
swears that she saw
that she saw the
she saw the lazy
saw the lazy dog
the lazy dog jump
lazy dog jump over
dog jump over the
jump over the swift
over the swift brown
the swift brown fox

Comparing these two sets of shingles we can see that only one matches ("the swift brown fox"). Thus it is unlikely that these two documents are duplicates of one another. No one but Google knows what the percentage match must be for these two documents to be considered duplicates, but some thorough testing would sure narrow it down.

So what can we take away from the above examples? First and foremost, we quickly begin to realize that duplicate content is far more difficult than saying "document A and document B are 50 percent similar". Second, we can see that people adding "stop words" and "filler words" to avoid duplicate content are largely wasting their time. You can't simply add generic stop words here and there and expect to fool anyone. It's the "action" words that should be the focus. Changing action words without altering the meaning of a body of text may very well be enough to get past these algorithms. Then again, there may be other mechanisms at work that we can't yet see rendering that impossible as well. I suggest experimenting and finding what works for you in your situation.

That last paragraph is the really important part when generating content. Remember, we're dealing with a computer algorithm here, not some supernatural power. There is no magic involved in SEO, just raw data and numbers. Everything you do should be from the standpoint of a scientist. Think through every decision using logic and reasoning. Always split test and perform controlled experiments.
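Here is a small Python sketch of the comparison we just walked through, using 4-word shingles and a simple overlap ratio. It reproduces the two shingle sets above; the real engines' shingle size, hashing, and duplicate threshold are unknown, so treat this purely as an illustration of the mechanics.

```python
def shingles(text, k=4):
    words = text.lower().replace(".", "").split()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}

doc_a = "The swift brown fox jumped over the lazy dog."
doc_b = "My old lady swears that she saw the lazy dog jump over the swift brown fox."

a, b = shingles(doc_a), shingles(doc_b)
shared = a & b
similarity = len(shared) / len(a | b)   # shared shingles relative to all distinct shingles

print(shared)                 # {'the swift brown fox'}
print(round(similarity, 3))   # 1 of 18 distinct shingles match -> 0.056, nowhere near a duplicate
```

Swap a few of the "action" words in doc_b and re-run it, and you can watch the overlap move, which is exactly the kind of controlled experiment suggested above.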
What Makes A Good Content Generator?

Now we understand how a search engine parses documents on the web, and we also understand the intricacies of duplicate content and what it takes to avoid it. Now it is time to check out some basic content generation techniques.

One of the more commonly used text spinners is known as Markov. Markov isn't actually intended for content generation; it's actually something called a Markov Chain, which was developed by the mathematician Andrey Markov. The algorithm takes each word in a body of content and changes the order based on the algorithm. This produces largely unique text, but it's also typically VERY unreadable. The quality of the output really depends on the quality of the input, and the other issue with Markov is the fact that it will likely never pass a human review for readability. If you don't shuffle the Markov chains enough, you also run into duplicate content issues because of the nature of shingling as discussed earlier. Some people may be able to get around this by replacing words in the content with synonyms. Some popular software that uses Markov chains includes RSSGM and YAGC, both of which are pretty old and outdated at this point. They are worth taking a look at just to understand the fundamentals, but there are FAR better packages out there. I personally stopped using Markov back in 2006 or 2007 after developing my own proprietary content engine.
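For those who have never looked under the hood of one of these spinners, here is a bare-bones, order-1 Markov chain text generator in Python. This is my own minimal version, not RSSGM or YAGC: each word is followed by a randomly chosen word that was seen to follow it in the source text, which is exactly why the output is statistically "unique" yet reads like gibberish to a human reviewer.

```python
import random
from collections import defaultdict

def build_chain(text):
    # Map each word to the list of words observed to follow it.
    chain = defaultdict(list)
    words = text.split()
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return chain

def generate(chain, length=15, seed=None):
    random.seed(seed)
    word = random.choice(list(chain))
    output = [word]
    for _ in range(length - 1):
        followers = chain.get(word)
        # Dead end (a word that never had a successor): jump to a random word.
        word = random.choice(followers) if followers else random.choice(list(chain))
        output.append(word)
    return " ".join(output)

source = ("the swift brown fox jumped over the lazy dog "
          "the lazy dog slept while the swift fox ran over the hill")
print(generate(build_chain(source)))
# Output statistically resembles the source but reads like gibberish to a human reviewer.
```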
So, we've talked about the old methods of doing things. What works today? Now and in the future, LSI is becoming more and more important. LSI stands for Latent Semantic Indexing. It sounds complicated, but it really isn't. LSI is basically just a process by which a search engine can infer the meaning of a page based on the content of that page. For example, let's say they index a page and find words like atomic bomb, Manhattan Project, Germany, and Theory of Relativity. The idea is that the search engine can process those words, find relational data and determine that the page is about Albert Einstein. So, ranking for a keyword phrase is no longer as simple as having content that talks about and repeats the target keyword phrase over and over like the good old days. This isn't 1999; you can't fool the search engines by simply repeating a keyword over and over in the body of your pages (I wish it were still that easy). Now we need to make sure we have other key phrases that the search engine thinks are related to the main key phrase.

So if Markov is easy to detect and LSI is starting to become more important, which software works, and which doesn't?

Software

Fantomaster Shadowmaker: This is probably one of the oldest and most commonly known high end cloaking packages being sold. It's also one of the most out of date. For $3,000.00 you basically get a clunky, outdated interface for slowly building HTML pages. I know, I'm being harsh, but I was really let down by this software. The software is SLOW; it takes days just to setup a few decent pages. Unless things change drastically I would avoid this one.

SEC (Search Engine Cloaker): Another well known paid script. This one is of good quality and with work does provide results. The content engine is mostly manual, making you build sentences which are then mixed together for your content. Remember, we're lazy! If you understand SEO and have the time to dedicate to creating the content, the pages built last a long time. I do have two complaints. The first is that the content engine doesn't do anything to address LSI; it simply splices unrelated sentences together from random sources while tossing in your keyword randomly. The other gripe is the ip cloaking: their ip list is terribly out of date, only containing a couple thousand ip's as of this writing.

SSEC or Simplified Search Engine Content: This is one of the best IP delivery systems on the market. This is also the fastest page builder I have come across: you can easily put together several thousand sites, each with hundreds of pages of content, in just a few hours. It's flexible, and you can also mix and match the content sources, giving you the ultimate in control.

BlogSolution: Unfortunately, BlogSolution falls short in almost every important area.

Blog Cloaker: Another solid offering from the guys that developed SSEC. This again is an ip cloaking solution with the same industry leading ip list as SSEC. Think BlogSolution, but faster.

Cloaking automatically gets a bad reputation, but have you ever visited a web site with your cell phone and been automatically directed to the mobile version of the site? Guess what, that's cloaking, and that in itself isn't very black hat; even Google cloaks. There are two main approaches: user agent cloaking and ip based cloaking. Let's discuss some common and not so common methods for doing so, as well as how people try to detect and circumvent them.

Every browser or crawler that visits your site identifies itself with a user agent. A cell phone, for example, might report something like:

Mozilla/5.0 (SymbianOS/9.1; U; en-us) AppleWebKit/413 (KHTML, like Gecko) Safari/413

Knowing this, we can tell the difference between a mobile phone visiting our page and a regular visitor viewing our page with Internet Explorer or Firefox, for example. We can then write a script that will show different information to those users based on their user agent. For example, I can show a search engine bot a keyword targeted page full of key phrases related to what I want to rank for. When a human visits that same page I can show an ad, an affiliate product, or even an iframe of an offer, so I can make some money. See the power and potential here?

The problem is that user agents are trivially easy to spoof. Using a simple browser plug-in I can make the script think that I am a Google search engine bot, thus rendering your cloaking completely useless; it's easy to get around that. So, what else can we do if user agents are so easy to spoof?

IP Cloaking

Every visitor to your web site must first establish a connection with an ip address. These ip addresses resolve to dns servers which in turn identify the origin of that visitor. Every search engine crawler must identify itself with a unique signature viewable by reverse dns lookup, so this gives us a sure fire method for identifying and cloaking based on ip address. Once we have that information, we can then show different pages to different users based on the ip they visit our page with. This also means that we don't rely on the user agent at all, so there is no way to circumvent ip based cloaking (although some caution must be taken, as we will discuss). The most difficult part of ip cloaking is compiling a list of known search engine ip's. Luckily, software like Blog Cloaker and SSEC already does this for us.

So how can we detect ip cloaking? Every major search engine maintains a cache of the pages it indexes. This cache is going to contain the page as the search engine bot saw it at indexing time, which means your competition can view your cloaked page by clicking on the cache in the SERPS. That's ok: the use of the meta tag noarchive in your pages forces the search engines to show no cached copy of your page in the search results, so you avoid snooping web masters. The only other method of detection involves ip spoofing. Basically you configure a computer to act as if it is using one of Google's ip's when it visits a page. This would allow you to connect as though you were a search engine bot, but the problem here is that the data for the page would be sent to the ip you are spoofing, which isn't on your computer, so you are still out of luck; that is a very difficult and time consuming thing to pull off. The lesson here? If you are serious about this, use ip cloaking. It is very difficult to detect and by far the most solid option.
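As an illustration of the two approaches just described, here is a small Python sketch of my own (none of the packages above publish their code): a naive user agent check, plus the reverse-DNS-and-forward-confirm lookup that verifies whether an IP really belongs to a crawler. The sample IP and page file names are made up for the example.

```python
import socket

def looks_like_googlebot_ua(user_agent):
    # User agent cloaking: trivial to implement, and just as trivial to spoof.
    return "googlebot" in user_agent.lower()

def verify_crawler_ip(ip, allowed_suffixes=(".googlebot.com", ".google.com")):
    # IP cloaking: reverse DNS the visiting IP, check the hostname, then
    # resolve that hostname forward again and confirm it maps back to the IP.
    try:
        host = socket.gethostbyaddr(ip)[0]
        if not host.endswith(allowed_suffixes):
            return False
        return ip in socket.gethostbyname_ex(host)[2]
    except (socket.herror, socket.gaierror):
        return False

def choose_page(ip, user_agent):
    if looks_like_googlebot_ua(user_agent) and verify_crawler_ip(ip):
        return "keyword_targeted_page.html"   # what the crawler sees
    return "landing_page_with_ads.html"       # what a human visitor sees

# 66.249.66.1 is used here only as an example of a crawler-range IP.
print(choose_page("66.249.66.1", "Mozilla/5.0 (compatible; Googlebot/2.1)"))
```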
Link Building

As we discussed earlier, Black Hats are basically White Hats, only lazy! As we build pages, we also need links to get those pages to rank. The key here is to have a very diverse pool of links: so diverse that a search engine would have to discount links completely in order to filter ours out. Let's look at some of the ways we gather them.

Forums and Guest books: The internet contains millions of forums and guest books all ripe for the picking. Software packages like Xrumer made this a VERY popular way to gather back links, so much so that most forums now have methods in place to detect and reject these types of links. While most forums are heavily moderated (at least the active ones), that still leaves you with thousands in which you can drop links where no one will likely notice or even care. We're talking about abandoned forums, old guest books, etc. You can get links dropped on active forums as well, but it takes some more creativity: putting up a post related to the topic and dropping your link in the BB code for a smiley, for example.

EDU links: A couple years ago Black Hats noticed an odd trend. Universities and government agencies with very high ranking web sites often have very old message boards they have long forgotten about, but that still have public access. We took advantage of that by posting millions of links to our pages on these abandoned sites. This gave a HUGE boost to rankings and made some very lucky Viagra spammers millions of dollars. The effectiveness of this approach has diminished over time.

Link Networks: Also known as link farms, these have been popular for years. Page A links to page B, page B links to page C, then back to A. Most are very simplistic in nature and are pretty easy to detect because of the limited range of ip's involved; it doesn't take much processing to figure out that there are only a few people involved with all of the links. Take a look at Link Exchange for example: they have over 300 servers all over the world with thousands of ip's, so it would be almost impossible to detect.

Trackback: Another method of communication used by blogs. Trackbacks are basically a method in which one blog can tell another blog that it has posted something related to, or in response to, an existing blog post. As black hats, we see that as an opportunity to inject links to thousands of our own pages by automating the process and sending out trackbacks to as many blogs as we can. Most blogs these days have software in place that greatly limits or even eliminates trackback spam, but some people still use them and are still successful.

Blog ping: This one is quite old but still widely used. Blog indexing services set up a protocol in which a web site can send a ping whenever new pages are added to a blog. They can then send over a bot that grabs the page content for indexing and searching, or simply to add as a link in their blog directory. Black Hats exploit this by writing scripts that send out massive numbers of pings to various services in order to entice bots to crawl their pages. This method certainly drives the bots, but in the last couple years it has lost most of its power as far as getting pages to rank. It's still a viable tool, though.
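The ping itself is just an XML-RPC call. Here is a minimal Python sketch using the standard weblogUpdates.ping method; the service URL is a placeholder, and the exact response fields vary by service, so treat the flerror check as an assumption rather than a guarantee.

```python
import xmlrpc.client

def send_blog_ping(service_url, blog_name, blog_url):
    # The same weblogUpdates.ping notification a normal blog platform sends
    # when a new post is published.
    server = xmlrpc.client.ServerProxy(service_url)
    response = server.weblogUpdates.ping(blog_name, blog_url)
    # Many services respond with a struct like {'flerror': False, 'message': '...'}.
    return isinstance(response, dict) and not response.get("flerror", True)

# Placeholder service URL -- substitute a real ping service endpoint.
if send_blog_ping("http://rpc.example-ping-service.com/", "My Blog", "http://example.com/blog"):
    print("ping accepted")
```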
Money Making Strategies

We now have a solid understanding of how a search engine works, content generation, cloaking, software to avoid, software that is pure gold, and even link building strategies. So how do you pull all of it together to make some money?

Affiliate Marketing: We all know what an affiliate program is. There are literally tens of thousands of affiliate programs with millions of products to sell. The most difficult part of affiliate marketing is getting well qualified, targeted traffic, and that again is where good software and cloaking come into play. Some networks and affiliates allow direct linking. Direct Linking is where you setup your cloaked pages with all of your product keywords, then redirect straight to the merchant or affiliate's sales page. This often results in the highest conversion rates, but as I said, some affiliates don't allow Direct Linking. That's where Landing Pages come in. Landing pages give us a place to send and clean our traffic; they also prequalify the buyer and make sure the quality of the traffic sent to the affiliate is as high as possible. After all, we want to make money, but we also want to keep a strong relationship with the affiliate so we can get paid. You can either build your own landing pages (which we are far too lazy to do) or use something like Landing Page Builder, which automates everything for us. You load up your money keyword list, setup a template with your ads or offers, then send all of your doorway/cloaked traffic to the index page. The Landing Page Builder shows the best possible page with ads based on what the incoming user searched for and the traffic you send it, and it automates the difficult tasks we all hate. Couldn't be easier.

Conclusion

As we can see, Black Hat marketing isn't all that different from White Hat marketing. We automate the difficult and time consuming tasks so we can focus on the important tasks at hand. I would like to thank you for taking the time to read this. You can follow me on my personal blog over at http://www.blackhat360.com , which I plan to update often. In the meantime, be sure to register and post on the forums if you have any comments or questions.