
Black Hat SEO Crash Course V1.

Introduction

If you have spent any significant amount of time online, you have likely come across the term
Black Hat at one time or another. This term is usually associated with many negative comments. This
book is here to address those comments and provide some insight into the real life of a Black Hat SEO
professional. To give you some background, my name is Brian. I've been involved in internet marketing
for close to 10 years now, the last 7 of which have been dedicated to Black Hat SEO. As we will
discuss shortly, you can't be a great Black Hat without first becoming a great White Hat marketer. With
the formalities out of the way, let's get into the meat of things, shall we?

What is Black Hat SEO?

The million dollar question that everyone has an opinion on. What exactly is Black Hat SEO?
The answer here depends largely on who you ask. Ask most White Hats and they immediately quote
the Google Webmaster Guidelines like a bunch of lemmings. Have you ever really stopped to think
about it though? Google publishes those guidelines because they know as well as you and I that they
have no way of detecting or preventing what they preach so loudly. They rely on droves of webmasters
to blindly repeat everything they say because they are an internet powerhouse and they have everyone
brainwashed into believing anything they tell them. This is actually a good thing though. It means that
the vast majority of internet marketers and SEO professionals are completely blind to the vast array of
tools at their disposal that not only increase traffic to their sites, but also make us all millions in
revenue every year.
The second argument you are likely to hear is the age-old “the search engines will ban your sites if you
use Black Hat techniques.” Sure, this is true if you have no understanding of the basic principles or
practices. If you jump in with no knowledge you are going to fail. I'll give you the secret though.
Ready? Don't use black hat techniques on your White Hat domains. Not directly at least. You aren't
going to build doorway or cloaked pages on your money site, that would be idiotic. Instead you buy
several throw away domains, build your doorways on those and cloak/redirect the traffic to your money
sites. You lose a doorway domain, who cares? Build 10 to replace it. It isn't rocket science, just common
sense. A search engine can't possibly penalize you for outside influences that are beyond your control.
They can't penalize you for incoming links, nor can they penalize you for sending traffic to your domain
from other doorway pages outside of that domain. If they could, I would simply point doorway pages
and spam links at my competitors to knock them out of the SERPS. See..... Common sense.

So again, what is Black Hat SEO? In my opinion, Black Hat SEO and White Hat SEO are
almost no different. White hat web masters spend time carefully finding link partners to increase
rankings for their keywords, Black Hats do the same thing, but we write automated scripts to do it while
we sleep. White hat SEO's spend months perfecting the on page SEO of their sites for maximum
rankings, black hat SEO's use content generators to spit out thousands of generated pages to see which
version works best. Are you starting to see a pattern here? You should: Black Hat SEO and White Hat
SEO are one and the same, with one key difference. Black Hats are lazy. We like things automated. Have
you ever heard the phrase "Work smarter not harder?" We live by those words. Why spend weeks or
months building pages only to have Google slap them down with some obscure penalty? If you have
spent any time on web master forums you have heard that story time and time again. A web master
plays by the rules, does nothing outwardly wrong or evil, yet their site is completely gone from the
SERPS (Search Engine Results Pages) one morning for no apparent reason. It's frustrating, we've all
been there. Months of work gone and nothing to show for it. I got tired of it as I am sure you are. That's
when it came to me. Who elected the search engines the "internet police"? I certainly didn't, so why
play by their rules? In the following pages I'm going to show you why the search engines' rules make no
sense, and further I'm going to discuss how you can use that information to your advantage.

Search Engine 101

As we discussed earlier, every good Black Hat must be a solid White Hat. So, let's start with the
fundamentals. This section is going to get technical as we discuss how search engines work and delve
into ways to exploit those inner workings. Let's get started, shall we?

Search engines match queries against an index that they create. The index consists of the words in each
document, plus pointers to their locations within the documents. This is called an inverted file. A search
engine or IR (Information Retrieval) system comprises four essential modules:
∗ A document processor
∗ A query processor
∗ A search and matching function
∗ A ranking capability

While users focus on "search," the search and matching function is only one of the four modules. Each
of these four modules may cause the expected or unexpected results that consumers get when they use a
search engine.
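Before we break down each module, here's a quick look at what the inverted file itself might look like in miniature. This is just my own toy Python sketch, not any engine's actual code: each term points to the documents and positions where it occurs.

# Minimal inverted index: term -> list of (doc_id, position) pointers.
from collections import defaultdict

def build_inverted_index(docs):
    index = defaultdict(list)
    for doc_id, text in docs.items():
        for position, word in enumerate(text.lower().split()):
            index[word].append((doc_id, position))
    return index

docs = {
    1: "the swift brown fox jumped over the lazy dog",
    2: "the lazy dog slept in the sun",
}
index = build_inverted_index(docs)
print(index["lazy"])   # [(1, 7), (2, 1)] -- which documents contain "lazy" and where
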
Document Processor
The document processor prepares, processes, and inputs the documents, pages, or sites that users search
against. The document processor performs some or all of the following steps:
∗ Normalizes the document stream to a predefined format.
∗ Breaks the document stream into desired retrievable units.
∗ Isolates and meta-tags sub-document pieces.
∗ Identifies potential indexable elements in documents.
∗ Deletes stop words.
∗ Stems terms.
∗ Extracts index entries.
∗ Computes weights.
∗ Creates and updates the main inverted file against which the search engine searches in order to
match queries to documents.
Step 4: Identify elements to index. Identifying potential indexable elements in documents dramatically
affects the nature and quality of the document representation that the engine will search against. In
designing the system, we must define the word "term." Is it the alpha-numeric characters between
blank spaces or punctuation? If so, what about non-compositional phrases (phrases in which the
separate words do not convey the meaning of the phrase, like "skunk works" or "hot dog"), multi-word
proper names, or inter-word symbols such as hyphens or apostrophes that can denote the difference
between "small business men" versus "small-business men." Each search engine depends on a set of
rules that its document processor must execute to determine what action is to be taken by the
"tokenizer," i.e. the software used to define a term suitable for indexing.

Step 5: Deleting stop words. This step helps save system resources by eliminating from further
processing, as well as potential matching, those terms that have little value in finding useful documents
in response to a customer's query. This step mattered more in the past, before memory became so much
cheaper and systems so much faster, but since stop words may comprise up to 40
percent of text words in a document, it still has some significance. A stop word list typically consists of
those word classes known to convey little substantive meaning, such as articles (a, the), conjunctions
(and, but), interjections (oh, but), prepositions (in, over), pronouns (he, it), and forms of the "to be"
verb (is, are). To delete stop words, an algorithm compares index term candidates in the documents
against a stop word list and eliminates certain terms from inclusion in the index for searching.
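A toy version of that stop word step looks something like this (the stop list below is my own tiny example; real engines use much larger ones):

# Drop stop words before indexing; only content-bearing terms survive.
STOP_WORDS = {"a", "the", "and", "but", "oh", "in", "over", "he", "it", "is", "are"}

def remove_stop_words(terms):
    return [t for t in terms if t.lower() not in STOP_WORDS]

print(remove_stop_words("the fox jumped over the lazy dog".split()))
# ['fox', 'jumped', 'lazy', 'dog']
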

Step 6: Term Stemming. Stemming removes word suffixes, perhaps recursively in layer after layer of
processing. The process has two goals. In terms of efficiency, stemming reduces the number of unique
words in the index, which in turn reduces the storage space required for the index and speeds up the
search process. In terms of effectiveness, stemming improves recall by reducing all forms of the word to
a base or stemmed form. For example, if a user asks for analyze, they may also want documents which
contain analysis, analyzing, analyzer, analyzes, and analyzed. Therefore, the document processor stems
document terms to analy- so that documents which include various forms of analy- will have equal
likelihood of being retrieved; this would not occur if the engine only indexed variant forms separately
and required the user to enter all. Of course, stemming does have a downside. It may negatively affect
precision in that all forms of a stem will match, when, in fact, a successful query for the user would
have come from matching only the word form actually used in the query.

Systems may implement either a strong stemming algorithm or a weak stemming algorithm. A strong
stemming algorithm will strip off both inflectional suffixes (-s, -es, -ed) and derivational suffixes (-able,
-aciousness, -ability), while a weak stemming algorithm will strip off only the inflectional suffixes (-s, -
es, -ed).
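To make the weak versus strong distinction concrete, here's a crude suffix stripper. A real engine would use something like the Porter algorithm; this is only an illustration, and the suffix lists are my own:

# Weak stemming strips inflectional suffixes; strong stemming also strips derivational ones.
INFLECTIONAL = ("ies", "es", "ed", "s")
DERIVATIONAL = ("ability", "aciousness", "able", "ization", "izing", "ize")

def weak_stem(word):
    for suffix in INFLECTIONAL:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[:-len(suffix)]
    return word

def strong_stem(word):
    for suffix in DERIVATIONAL:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[:-len(suffix)]
    return weak_stem(word)

print(weak_stem("analyzed"), strong_stem("readability"))   # analyz read
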

Step 7: Extract index entries. Having completed steps 1 through 6, the document processor extracts the
remaining entries from the original document. For example, the following paragraph shows the full text
sent to a search engine for processing:

Milosevic's comments, carried by the official news agency Tanjug, cast doubt over the
governments at the talks, which the international community has called to try to prevent an
all-out war in the Serbian province. "President Milosevic said it was well known that Serbia
and Yugoslavia were firmly committed to resolving problems in Kosovo, which is an
integral part of Serbia, peacefully in Serbia with the participation of the representatives of
all ethnic communities," Tanjug said. Milosevic was speaking during a meeting with British
Foreign Secretary Robin Cook, who delivered an ultimatum to attend negotiations in a
week's time on an autonomy proposal for Kosovo with ethnic Albanian leaders from the
province. Cook earlier told a conference that Milosevic had agreed to study the proposal.

Steps 1 to 6 reduce this text for searching to the following:


Milosevic comm carri offic new agen Tanjug cast doubt govern talk interna commun call try
prevent all-out war Serb province President Milosevic said well known Serbia Yugoslavia
firm commit resolv problem Kosovo integr part Serbia peace Serbia particip representa
ethnic commun Tanjug said Milosevic speak meeti British Foreign Secretary Robin Cook
deliver ultimat attend negoti week time autonomy propos Kosovo ethnic Alban lead
province Cook earl told conference Milosevic agree study propos.

The output of step 7 is then inserted and stored in an inverted file that lists the index entries and an
indication of their position and frequency of occurrence. The specific nature of the index entries,
however, will vary based on the decision in Step 4 concerning what constitutes an "indexable term."
More sophisticated document processors will have phrase recognizers, as well as Named Entity
recognizers and Categorizers, to ensure index entries such as Milosevic are tagged as a Person and
entries such as Yugoslavia and Serbia as Countries.
Step 8: Term weight assignment. Weights are assigned to terms in the index file. The simplest of search
engines just assign a binary weight: 1 for presence and 0 for absence. The more sophisticated the search
engine, the more complex the weighting scheme. Measuring the frequency of occurrence of a term in
the document creates more sophisticated weighting, with length-normalization of frequencies still more
sophisticated. Extensive experience in information retrieval research over many years has clearly
demonstrated that the optimal weighting comes from use of "tf/idf." This algorithm measures the
frequency of occurrence of each term within a document. Then it compares that frequency against the
frequency of occurrence in the entire database.
Not all terms are good "discriminators" — that is, all terms do not single out one document from
another very well. A simple example would be the word "the." This word appears in too many
documents to help distinguish one from another. A less obvious example would be the word
"antibiotic." In a sports database when we compare each document to the database as a whole, the term
"antibiotic" would probably be a good discriminator among documents, and therefore would be
assigned a high weight. Conversely, in a database devoted to health or medicine, "antibiotic" would
probably be a poor discriminator, since it occurs very often. The TF/IDF weighting scheme assigns
higher weights to those terms that really distinguish one document from the others.
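Here's the textbook version of tf/idf in a few lines of Python. Engines tweak the exact formula, so treat this strictly as a sketch of the idea:

import math
from collections import Counter

def tf_idf(term, doc_terms, all_docs):
    # tf: frequency of the term within this document, length-normalized.
    tf = Counter(doc_terms)[term] / len(doc_terms)
    # idf: penalize terms that appear in many documents across the collection.
    containing = sum(1 for d in all_docs if term in d)
    idf = math.log(len(all_docs) / (1 + containing))
    return tf * idf

docs = [
    "antibiotic cured the infection".split(),
    "the game went into overtime".split(),
    "the team won the game".split(),
]
# "antibiotic" is rare in this toy collection, so it earns a higher weight than "the".
print(tf_idf("antibiotic", docs[0], docs), tf_idf("the", docs[0], docs))
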
Query Processor
Query processing has seven possible steps, though a system can cut these steps short and proceed to
match the query to the inverted file at any of a number of places during the processing. Document
processing shares many steps with query processing. More steps and more documents make the process
more expensive for processing in terms of computational resources and responsiveness. However, the
longer the wait for results, the higher the quality of results. Thus, search system designers must choose
what is most important to their users — time or quality. Publicly available search engines usually
choose time over very high quality, having too many documents to search against.
The steps in query processing are as follows (with the option to stop processing and start matching
indicated as "Matcher"):
∗ Tokenize query terms.
∗ Recognize query terms vs. special operators.
————————> Matcher
∗ Delete stop words.
∗ Stem words.
∗ Create query representation.
————————> Matcher
∗ Expand query terms.
∗ Compute weights.
-- -- -- -- -- -- -- --> Matcher
Step 1: Tokenizing. As soon as a user inputs a query, the search engine -- whether a keyword-based
system or a full natural language processing (NLP) system -- must tokenize the query stream, i.e.,
break it down into understandable segments. Usually a token is defined as an alpha-numeric string that
occurs between white space and/or punctuation.
Step 2: Parsing. Since users may employ special operators in their query, including Boolean, adjacency,
or proximity operators, the system needs to parse the query first into query terms and operators. These
operators may occur in the form of reserved punctuation (e.g., quotation marks) or reserved terms in
specialized format (e.g., AND, OR). In the case of an NLP system, the query processor will recognize
the operators implicitly in the language used no matter how the operators might be expressed (e.g.,
prepositions, conjunctions, ordering).
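A toy version of that tokenize-and-parse step, treating quoted phrases and AND/OR/NOT as the only operators (purely illustrative):

import re

OPERATORS = {"AND", "OR", "NOT"}

def parse_query(query):
    # Quoted strings become phrase tokens; reserved words become operators.
    tokens = re.findall(r'"[^"]+"|\S+', query)
    terms = [t.strip('"') for t in tokens if t not in OPERATORS]
    operators = [t for t in tokens if t in OPERATORS]
    return terms, operators

print(parse_query('"black hat" AND cloaking NOT ethics'))
# (['black hat', 'cloaking', 'ethics'], ['AND', 'NOT'])
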
At this point, a search engine may take the list of query terms and search them against the inverted file.
In fact, this is the point at which the majority of publicly available search engines perform the search.
Steps 3 and 4: Stop list and stemming. Some search engines will go further and stop-list and stem the
query, similar to the processes described above in the Document Processor section. The stop list might
also contain words from commonly occurring querying phrases, such as, "I'd like information about."
However, since most publicly available search engines encourage very short queries, as evidenced in
the size of query window provided, the engines may drop these two steps.
Step 5: Creating the query. How each particular search engine creates a query representation depends
on how the system does its matching. If a statistically based matcher is used, then the query must match
the statistical representations of the documents in the system. Good statistical queries should contain
many synonyms and other terms in order to create a full representation. If a Boolean matcher is utilized,
then the system must create logical sets of the terms connected by AND, OR, or NOT.
An NLP system will recognize single terms, phrases, and Named Entities. If it uses any Boolean logic,
it will also recognize the logical operators from Step 2 and create a representation containing logical
sets of the terms to be AND'd, OR'd, or NOT'd.

At this point, a search engine may take the query representation and perform the search against the
inverted file. More advanced search engines may take two further steps.
Step 6: Query expansion. Since users of search engines usually include only a single statement of their
information needs in a query, it becomes highly probable that the information they need may be
expressed using synonyms, rather than the exact query terms, in the documents which the search engine
searches against. Therefore, more sophisticated systems may expand the query into all possible
synonymous terms and perhaps even broader and narrower terms.
This process approaches what search intermediaries did for end users in the earlier days of commercial
search systems. Back then, intermediaries might have used the same controlled vocabulary or thesaurus
used by the indexers who assigned subject descriptors to documents. Today, resources such as WordNet
are generally available, or specialized expansion facilities may take the initial query and enlarge it by
adding associated vocabulary.
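A bare-bones expansion step might look like the following. The synonym table is hand-made purely for illustration; WordNet or a thesaurus would fill that role in a real system:

# Toy query expansion with a hand-made synonym table.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "cheap": ["inexpensive", "budget", "affordable"],
}

def expand_query(terms):
    expanded = list(terms)
    for term in terms:
        expanded.extend(SYNONYMS.get(term, []))
    return expanded

print(expand_query(["cheap", "car", "insurance"]))
# ['cheap', 'car', 'insurance', 'inexpensive', 'budget', 'affordable', 'automobile', 'vehicle']
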
Step 7: Query term weighting (assuming more than one query term). The final step in query processing
involves computing weights for the terms in the query. Sometimes the user controls this step by
indicating either how much to weight each term or simply which term or concept in the query matters
most and must appear in each retrieved document to ensure relevance.
Leaving the weighting up to the user is not common, because research has shown that users are not
particularly good at determining the relative importance of terms in their queries. They can't make this
determination for several reasons. First, they don't know what else exists in the database, and document
terms are weighted by being compared to the database as a whole. Second, most users seek information
about an unfamiliar subject, so they may not know the correct terminology.
Few search engines implement system-based query weighting, but some do an implicit weighting by
treating the first term(s) in a query as having higher significance. The engines use this information to
provide a list of documents/pages to the user.
After this final step, the expanded, weighted query is searched against the inverted file of documents.

Search and Matching Function


How systems carry out their search and matching functions differs according to which theoretical
model of information retrieval underlies the system's design philosophy. Since making the distinctions
between these models goes far beyond the goals of this guide, we will only make some broad
generalizations in the following description of the search and matching function.
Searching the inverted file for documents meeting the query requirements, referred to simply as
"matching," is typically a standard binary search, no matter whether the search ends after the first two,
five, or all seven steps of query processing. While the computational processing required for simple,
unweighted, non-Boolean query matching is far simpler than when the model is an NLP-based query
within a weighted, Boolean model, it also follows that the simpler the document representation, the
query representation, and the matching algorithm, the less relevant the results, except for very simple
queries, such as one-word, non-ambiguous queries seeking the most generally known information.
Having determined which subset of documents or pages matches the query requirements to some
degree, a similarity score is computed between the query and each document/page based on the scoring
algorithm used by the system. Scoring algorithms base their rankings on the presence/absence of query
term(s), term frequency, tf/idf, Boolean logic fulfillment, or query term weights. Some search engines
use scoring algorithms not based on document contents, but rather, on relations among documents or
past retrieval history of documents/pages.
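As one example of a scoring algorithm, here's a bare-bones cosine similarity between a query and a document built from raw term counts. It's only a sketch of the general idea, not what any particular engine runs:

import math
from collections import Counter

def cosine_similarity(query_terms, doc_terms):
    q, d = Counter(query_terms), Counter(doc_terms)
    shared = set(q) & set(d)
    dot = sum(q[t] * d[t] for t in shared)
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(sum(v * v for v in d.values()))
    return dot / norm if norm else 0.0

doc = "black hat seo relies on automation and cloaking".split()
print(cosine_similarity("seo cloaking".split(), doc))   # 0.5 for this toy example
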

After computing the similarity of each document in the subset of documents, the system presents an
ordered list to the user. The sophistication of the ordering of the documents again depends on the model
the system uses, as well as the richness of the document and query weighting mechanisms. For example,
search engines that only require the presence of any alpha-numeric string from the query occurring
anywhere, in any order, in a document would produce a very different ranking than one by a search
engine that performed linguistically correct phrasing for both document and query representation and
that utilized the proven tf/idf weighting scheme.
However the search engine determines rank, the ranked results list goes to the user, who can then simply
click and follow the system's internal pointers to the selected document/page.
More sophisticated systems will go even further at this stage and allow the user to provide some
relevance feedback or to modify their query based on the results they have seen. If either of these are
available, the system will then adjust its query representation to reflect this value-added feedback and
re-run the search with the improved query to produce either a new set of documents or a simple re-
ranking of documents from the initial search.

What Document Features Make a Good Match to a Query


We have discussed how search engines work, but what document features make for good matches to a query?
Let's look at the key features and consider some pros and cons of their utility in helping to retrieve a
good representation of documents/pages.
Term frequency: How frequently a query term appears in a document is one of the most
obvious ways of determining a document's relevance to a query. While most often true,
several situations can undermine this premise. First, many words have multiple meanings
— they are polysemous. Think of words like "pool" or "fire." Many of the non-relevant
documents presented to users result from matching the right word, but with the wrong
meaning.

Also, in a collection of documents in a particular domain, such as education, common query terms such
as "education" or "teaching" are so common and occur so frequently that an engine's ability to
distinguish the relevant from the non-relevant in a collection declines sharply. Search engines that don't
use a tf/idf weighting algorithm do not appropriately down-weight the overly frequent terms, nor are
higher weights assigned to appropriate distinguishing (and less frequently-occurring) terms, e.g., "early-
childhood."
Location of terms: Many search engines give preference to words found in the title or lead
paragraph or in the meta data of a document. Some studies show that the location — in
which a term occurs in a document or on a page — indicates its significance to the
document. Terms occurring in the title of a document or page that match a query term are
therefore frequently weighted more heavily than terms occurring in the body of the
document. Similarly, query terms occurring in section headings or the first paragraph of a
document may be more likely to be relevant.
Link analysis: Search engines also weight pages by how other pages link to them. Pages that
are referred to by many other pages, or that have a high number of "in-links," are generally
treated as more important and rank more highly.

Popularity: Google and several other search engines add popularity to link analysis to help
determine the relevance or value of pages. Popularity utilizes data on the frequency with
which a page is chosen by all users as a means of predicting relevance. While popularity is
a good indicator at times, it assumes that the underlying information need remains the same.

Date of Publication: Some search engines assume that the more recent the information is,
the more likely that it will be useful or relevant to the user. The engines therefore present
results ordered from the most recent to the least current.

Length: While length per se does not necessarily predict relevance, it is a factor when used
to compute the relative merit of similar pages. So, in a choice between two documents both
containing the same query terms, the document that contains a proportionately higher
occurrence of the term relative to the length of the document is assumed more likely to be
relevant.

Proximity of query terms: When the terms in a query occur near to each other within a
document, it is more likely that the document is relevant to the query than if the terms occur
at greater distance. While some search engines do not recognize phrases per se in queries,
some search engines clearly rank documents in results higher if the query terms occur
adjacent to one another or in closer proximity, as compared to documents in which the terms
occur at a distance.

Proper nouns: These sometimes have higher weights, since so many searches are performed on
people, places, or things. While this may be useful, if the search engine assumes that you
are searching for a name instead of the same word as a normal everyday term, then the
search results may be peculiarly skewed. Imagine getting information on "Madonna," the
rock star, when you were looking for pictures of Madonnas for an art history class.

Summary

Now that we have covered how a search engine works, we can discuss methods to take
advantage of them. Let's start with content. As you saw in the above pages, search engines
are simple text parsers. They take a series of words and try to reduce them to their core
meaning. They can't understand text, nor do they have the capability of discerning between
grammatically correct text and complete gibberish. This of course will change over time as
search engines evolve and the cost of hardware falls, but we black hats will evolve as well
always aiming to stay at least one step ahead. Lets discuss the basics of generating content
as well as some software used to do so, but first, we need to understand duplicate content. A
widely passed around myth on web master forums is that duplicate content is viewed by
search engines as a percentage. As long as you stay below the threshold, you pass by penalty
free. It's a nice thought, it's just too bad that it is completely wrong.
Duplicate Content

I’ve read seemingly hundreds of forum posts discussing duplicate content, none of which
gave the full picture, leaving me with more questions than answers. I decided to spend some
time doing research to find out exactly what goes on behind the scenes. Here is what I have
discovered.

Most people are under the assumption that duplicate content is looked at on the page level
when in fact it is far more complex than that. Simply saying that “by changing 25 percent
of the text on a page it is no longer duplicate content” is not a true or accurate statement.
Let's examine why that is.

To gain some understanding we need to take a look at the k-shingle algorithm that may or
may not be in use by the major search engines (my money is that it is in use). I’ve seen the
following used as an example, so let's use it here as well.
Let’s suppose that you have a page that contains the following text:

The swift brown fox jumped over the lazy dog.

Before we get to this point the search engine has already stripped all tags and HTML from
the page leaving just this plain text behind for us to take a look at.

The shingling algorithm essentially finds word groups within a body of text in order to
determine the uniqueness of the text. The first thing they do is strip out all stop words like
and, the, of, to. They also strip out all filler words, leaving us only with action words which
are considered the core of the content. Once this is done the following “shingles” are
created from the above text. (I'm going to include the stop words for simplicity)

The swift brown fox


swift brown fox jumped
brown fox jumped over
fox jumped over the
jumped over the lazy
over the lazy dog

These are essentially like unique fingerprints that identify this block of text. The search
engine can now compare this “fingerprint” to other pages in an attempt to find duplicate
content. As duplicates are found a “duplicate content” score is assigned to the page. If too
many “fingerprints” match other documents the score becomes high enough that the search
engines flag the page as duplicate content, thus sending it to supplemental hell or, worse,
deleting it from their index completely.
Now let's compare that against a second block of text:

My old lady swears that she saw the lazy dog jump over the swift brown fox.

The above gives us the following shingles:

my old lady swears


old lady swears that
lady swears that she
swears that she saw
that she saw the
she saw the lazy
saw the lazy dog
the lazy dog jump
lazy dog jump over
dog jump over the
jump over the swift
over the swift brown
the swift brown fox

Comparing these two sets of shingles we can see that only one matches (“the swift brown fox”). Thus it
is unlikely that these two documents are duplicates of one another. No one but Google knows what the
percentage match must be for these two documents to be considered duplicates, but some thorough
testing would sure narrow it down ;).
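If you want to play with the idea yourself, here's a small Python sketch of the shingling comparison described above. It's my own illustration of the general technique, not Google's implementation:

# Build k-word shingles and compare two texts by the overlap of their shingle sets.
def shingles(text, k=4):
    words = text.lower().replace(".", "").split()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}

def shingle_overlap(text_a, text_b, k=4):
    a, b = shingles(text_a, k), shingles(text_b, k)
    return len(a & b) / len(a | b)   # Jaccard similarity: 0.0 = fully unique, 1.0 = identical

a = "The swift brown fox jumped over the lazy dog."
b = "My old lady swears that she saw the lazy dog jump over the swift brown fox."
print(shingle_overlap(a, b))   # low score: only "the swift brown fox" is shared
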

So what can we take away from the above examples? First and foremost, we quickly begin to realize
that duplicate content detection is far more complex than saying “document A and document B are 50
percent similar”. Second, we can see that people adding “stop words” and “filler words” to avoid duplicate
content are largely wasting their time. It’s the “action” words that should be the focus. Changing action
words without altering the meaning of a body of text may very well be enough to get past these
algorithms. Then again there may be other mechanisms at work that we can’t yet see rendering that
impossible as well. I suggest experimenting and finding what works for you in your situation.
The last paragraph here is the really important part when generating content. You can't simply add generic
stop words here and there and expect to fool anyone. Remember, we're dealing with a computer
algorithm here, not some supernatural power. Everything you do should be from the standpoint of a
scientist. Think through every decision using logic and reasoning. There is no magic involved in SEO,
just raw data and numbers. Always split test and perform controlled experiments.

What Makes A Good Content Generator?

Now that we understand how a search engine parses documents on the web, as well as the
intricacies of duplicate content and what it takes to avoid it, it is time to check out some basic
content generation techniques.
One of the more commonly used text spinners is known as Markov. Markov isn't actually intended for
content generation; it's something called a Markov Chain, developed by the mathematician Andrey
Markov. The generator looks at which words follow which in a body of content, then walks those
word-to-word transitions to spit out new text in a shuffled order. This produces largely unique text,
but it's also typically VERY unreadable. The quality of the output really depends on the quality of
the input. The other issue with
Markov is the fact that it will likely never pass a human review for readability. If you don't shuffle the
Markov chains enough you also run into duplicate content issues because of the nature of shingling as
discussed earlier. Some people may be able to get around this by replacing words in the content with
synonyms. I personally stopped using Markov back in 2006 or 2007 after developing my own
proprietary content engine. Some popular software that uses Markov chains includes RSSGM and
YAGC, both of which are pretty old and outdated at this point. They are worth taking a look at just to
understand the fundamentals, but there are FAR better packages out there.
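For reference, a bare-bones word-level Markov generator looks something like this. It's a sketch of the technique only, nowhere near a production content engine:

import random
from collections import defaultdict

def build_chain(text):
    # Map each word to the list of words observed to follow it.
    chain = defaultdict(list)
    words = text.split()
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return chain

def generate(chain, length=20):
    word = random.choice(list(chain))
    output = [word]
    for _ in range(length - 1):
        followers = chain.get(word)
        if not followers:
            break
        word = random.choice(followers)
        output.append(word)
    return " ".join(output)

source = "the quality of the output really depends on the quality of the input text you feed the chain"
print(generate(build_chain(source)))   # unique-ish, but usually barely readable
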

So, we've talked about the old methods of doing things, but this isn't 1999, you can't fool the search
engines by simply repeating a keyword over and over in the body of your pages (I wish it were still that
easy). So what works today? Now and in the future, LSI is becoming more and more important. LSI
stands for Latent Semantic Indexing. It sounds complicated, but it really isn't. LSI is basically just a
process by which a search engine can infer the meaning of a page based on the content of that page. For
example, let's say they index a page and find words like atomic bomb, Manhattan Project, Germany, and
Theory of Relativity. The idea is that the search engine can process those words, find relational data and
determine that the page is about Albert Einstein. So, ranking for a keyword phrase is no longer as simple
as having content that talks about and repeats the target keyword phrase over and over like the good old
days. Now we need to make sure we have other key phrases that the search engine thinks are related to
the main key phrase.
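If you want to see the core of the LSI idea in action, the sketch below (my own toy example using the scikit-learn library) reduces tf-idf vectors with a truncated SVD so that documents sharing related vocabulary end up close together in "concept space":

# Toy latent-semantic experiment: tf-idf vectors reduced by truncated SVD.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

docs = [
    "atomic bomb manhattan project germany theory of relativity",
    "einstein physics relativity nobel prize",
    "football season touchdown quarterback",
]
vectors = TfidfVectorizer().fit_transform(docs)
concepts = TruncatedSVD(n_components=2).fit_transform(vectors)
print(concepts)   # the two documents sharing "relativity" should sit near each other
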
So if Markov is easy to detect and LSI is starting to become more important, which software works, and
which doesn't?

Software

Fantomaster Shadowmaker: This is probably one of the oldest and most commonly known high end
cloaking packages being sold. It's also one of the most out of date. For $3,000.00 you basically get a
clunky outdated interface for slowly building HTML pages. I know, I'm being harsh, but I was really
let down by this software. The content engine doesn't do anything to address LSI. It simply splices
unrelated sentences together from random sources while tossing in your keyword randomly. Unless
things change drastically I would avoid this one.

SEC (Search Engine Cloaker): Another well known paid script. This one is of good quality and with
work does provide results. The content engine is mostly manual making you build sentences which are
then mixed together for your content. If you understand SEO and have the time to dedicate to creating
the content, the pages built last a long time. I do have two complaints. The software is SLOW. It takes
days just to setup a few decent pages. That in itself isn't very black hat. Remember, we're lazy! The
other gripe is the ip cloaking. Their ip list is terribly out of date only containing a couple thousand ip's
as of this writing.
SSEC or Simplified Search Engine Content: This is one of the best IP delivery systems on the market.
Their ip list is updated daily and contains close to 30,000 ip's. The member only forums are the best in
the industry. The subscription is worth it just for the information contained there. The content engine is
also top notch. It's flexible, so you can choose to use their proprietary scraped content system which
automatically scrapes search engines for your content, or you can use custom content similar in fashion
to SEC above, but faster. You can also mix and match the content sources giving you the ultimate in
control. This is the only software as of this writing that takes LSI into account directly from within the
content engine. This is also the fastest page builder I have come across. You can easily put together
several thousand sites each with hundreds of pages of content in just a few hours. Support is top notch,
and the knowledgeable staff really knows what they are talking about. This one gets a gold star from
me.
BlogSolution: Sold as an automated blog builder, BlogSolution falls short in almost every important
area. The blogs created are not wordpress blogs, but rather a proprietary blog software specifically
written for BlogSolution. This “feature” means your blogs stand out like a sore thumb in the eyes of the
search engines. They don't blend in at all leaving footprints all over the place. The licensing limits you
to 100 blogs which basically means you can't build enough to make any decent amount of money. The
content engine is a joke as well using rss feeds and leaving you with a bunch of easy to detect duplicate
content blogs that rank for nothing.
Blog Cloaker: Another solid offering from the guys that developed SSEC. This is the natural evolution
of that software. This mass site builder is based around wordpress blogs. This software is the best in the
industry hands down. The interface has the feel of a system developed by real professionals. You have
the same content options seen in SSEC, but with several different redirection types including header
redirection, JavaScript, meta refresh, and even iframe. This again is an ip cloaking solution with the
same industry leading ip list as SSEC. The monthly subscription may seem daunting at first, but the
price of admission is worth every penny if you are serious about making money in this industry. It
literally does not get any better than this.
Cloaking

So what is cloaking? Cloaking is simply showing different content to different people based on different
criteria. Cloaking automatically gets a bad reputation, but that is based mostly on ignorance of how it
works. There are many legitimate reasons to Cloak pages. In fact, even Google cloaks. Have you ever
visited a web site with your cell phone and been automatically directed to the mobile version of the site?
Guess what, that's cloaking. How about web pages that automatically show you information based on
your location? Guess what, that's cloaking. So, based on that, we can break cloaking down into two
main categories, user agent cloaking and ip based cloaking.
User Agent cloaking is simply a method of showing different pages or different content to visitors based
on the user agent string they visit the site with. A user agent is simply an identifier that every web
browser and search engine spider sends to a web server when they connect to a page. Above we used the
example of a mobile phone. A Nokia cell phone, for example, will have a user agent similar to:

User-Agent: Mozilla/5.0 (SymbianOS/9.1; U; [en]; Series60/3.0 NokiaE60/4.06.0) AppleWebKit/413
(KHTML, like Gecko) Safari/413

Knowing this, we can tell the difference between a mobile phone visiting our page and a regular visitor
viewing our page with Internet Explorer or Firefox for example. We can then write a script that will
show different information to those users based on their user agent.
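A minimal user agent cloak looks something like this. The page names and bot list are placeholders of mine, and real setups usually live in PHP or a server rewrite rule, but the logic is the same:

# Serve different content depending on the User-Agent header.
# seo_page.html / visitor_page.html are hypothetical placeholders.
BOT_SIGNATURES = ("googlebot", "bingbot", "slurp")

def choose_page(user_agent):
    agent = user_agent.lower()
    if any(bot in agent for bot in BOT_SIGNATURES):
        return "seo_page.html"       # keyword-targeted page for crawlers
    if "symbianos" in agent or "mobile" in agent:
        return "mobile_page.html"    # legitimate use: mobile version of the site
    return "visitor_page.html"       # ads/offer page for human visitors

print(choose_page("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"))
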

Sounds good, doesn't it? Well, it works for basic things like mobile and non mobile versions of pages,
but it's also very easy to detect, fool, and circumvent. Firefox for example has a handy plug-in that
allows you to change your user agent string to anything you want. Using that plug-in I can make the
script think that I am a Google search engine bot, thus rendering your cloaking completely useless. So,
what else can we do if user agents are so easy to spoof?

IP Cloaking

Every visitor to your web site must first establish a connection with an ip address. These ip addresses
can be resolved through reverse dns lookup to host names, which in turn identify the origin of that
visitor. Every search engine crawler must identify itself with a unique signature viewable by reverse
dns lookup. This means we have a surefire
method for identifying and cloaking based on ip address. This also means that we don't rely on the user
agent at all, so there is no way to circumvent ip based cloaking (although some caution must be taken as
we will discuss). The most difficult part of ip cloaking is compiling a list of known search engine ip's.
Luckily software like Blog Cloaker and SSEC already does this for us. Once we have that information,
we can then show different pages to different users based on the ip they visit our page with. For
example, I can show a search engine bot a keyword targeted page full of key phrases related to what I
want to rank for. When a human visits that same page I can show an ad, or an affiliate product so I can
make some money. See the power and potential here?
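The bot check itself is usually a reverse-then-forward DNS confirmation on the connecting ip. Here's a rough sketch of that check; it's my own simplification, not the code inside SSEC or Blog Cloaker:

import socket

def is_google_crawler(ip_address):
    # Reverse lookup: genuine Googlebot IPs resolve to googlebot.com / google.com hosts.
    try:
        host = socket.gethostbyaddr(ip_address)[0]
    except OSError:
        return False
    if not (host.endswith(".googlebot.com") or host.endswith(".google.com")):
        return False
    # Forward-confirm: the host name must resolve back to the same IP (blocks spoofed hosts).
    try:
        return ip_address in socket.gethostbyname_ex(host)[2]
    except OSError:
        return False

page = "seo_page.html" if is_google_crawler("66.249.66.1") else "visitor_page.html"
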

So how can we detect ip cloaking? Every major search engine maintains a cache of the pages it indexes.
This cache is going to contain the page as the search engine bot saw it at indexing time. This means
your competition can view your cloaked page by clicking on the cache in the SERPS. That's ok, it's easy
to get around that. The use of the meta tag noarchive in your pages forces the search engines to show no
cached copy of your page in the search results, so you avoid snooping web masters. The only other
method of detection involves ip spoofing, but that is a very difficult and time consuming thing to pull
off. Basically you configure a computer to act as if it is using one of Google's ip's when it visits a page.
This would allow you to connect as though you were a search engine bot, but the problem here is that
the data for the page would be sent to the ip you are spoofing which isn't on your computer, so you are
still out of luck.
The lesson here? If you are serious about this, use ip cloaking. It is very difficult to detect and by far the
most solid option.

Link Building

As we discussed earlier, Black Hats are basically White Hats, only lazy! As we build pages, we also
need links to get those pages to rank. Let's discuss some common and not so common methods for
doing so.

Blog ping: This one is quite old, but still widely used. Blog indexing services setup a protocol in which
a web site can send a ping whenever new pages are added to a blog. They can then send over a bot that
grabs the page content for indexing and searching, or simply to add as a link in their blog directory.
Black Hats exploit this by writing scripts that send out massive numbers of pings to various services in
order to entice bots to crawl their pages. This method certainly drives the bots, but in the last couple
years it has lost most of its power as far as getting pages to rank.
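The ping itself is nothing more than a tiny XML-RPC call to the service's weblogUpdates.ping method. A sketch, with a placeholder service URL and blog details:

# Send a weblogUpdates.ping announcing a "new" blog post.
# The service URL and blog details below are placeholders.
import xmlrpc.client

def send_ping(service_url, blog_name, blog_url):
    server = xmlrpc.client.ServerProxy(service_url)
    return server.weblogUpdates.ping(blog_name, blog_url)

# Example (hypothetical endpoint):
# send_ping("http://ping.example.com/RPC2", "My Blog", "http://myblog.example.com/")
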
Trackback: Another method of communication used by blogs, trackbacks are basically a method in
which one blog can tell another blog that it has posted something related to or in response to an existing
blog post. As a black hat, we see that as an opportunity to inject links to thousands of our own pages by
automating the process and sending out trackbacks to as many blogs as we can. Most blogs these days
have software in place that greatly limits or even eliminates trackback spam, but it's still a viable tool.
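Under the hood a trackback is just an HTTP POST of a few form fields to the target post's trackback URL. Roughly (the URLs below are placeholders):

# Send a trackback ping: a form-encoded POST with url, title, excerpt, blog_name.
from urllib import parse, request

def send_trackback(trackback_url, my_url, title, excerpt, blog_name):
    data = parse.urlencode({
        "url": my_url,
        "title": title,
        "excerpt": excerpt,
        "blog_name": blog_name,
    }).encode("utf-8")
    with request.urlopen(request.Request(trackback_url, data=data)) as response:
        return response.read()   # the target blog replies with a small XML success/error body

# send_trackback("http://blog.example.com/trackback/123", "http://mysite.example.com/page",
#                "Related post", "A short excerpt...", "My Blog")
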

EDU links: A couple years ago Black Hats noticed an odd trend. Universities and government agencies
with very high ranking web sites often times have very old message boards they have long forgotten
about, but that still have public access. We took advantage of that by posting millions of links to our
pages on these abandoned sites. This gave a HUGE boost to rankings and made some very lucky
Viagra spammers millions of dollars. The effectiveness of this approach has diminished over time.

Forums and Guest books: The internet contains millions of forums and guest books all ripe for the
picking. While most forums are heavily moderated (at least the active ones), that still leaves you with
thousands in which you can drop links where no one will likely notice or even care. We're talking about
abandoned forums, old guest books, etc. Now, you can get links dropped on active forums as well, but it
takes some more creativity. Putting up a post related to the topic on the forum and dropping your link in
the BB code for a smiley, for example. Software packages like Xrumer made this a VERY popular way
to gather back links. So much so that most forums have methods in place to detect and reject these types
of links. Some people still use them and are still successful.
Link Networks: Also known as link farms, these have been popular for years. Most are very simplistic
in nature. Page A links to page B, page B links to page C, then back to A. These are pretty easy to detect
because of the limited range of ip's involved. It doesn't take much processing to figure out that there are
only a few people involved with all of the links. So, the key here is to have a very diverse pool of links.
Take a look at Link Exchange for example. They have over 300 servers all over the world with
thousands of ip's, so it would be almost impossible to detect. A search engine would have to discount
links completely in order to filter these links out.

Money Making Strategies

We now have a solid understanding of cloaking, how a search engine works, content generation,
software to avoid, software that is pure gold and even link building strategies. So how do you pull all of
it together to make some money?
Landing Pages: A landing page system such as Landing Page Builder generates its pages based on
the traffic you send it. You load up your money keyword list, setup a template with your ads or offers,
then send all of your doorway/cloaked traffic to the index page. The Landing Page Builder shows the
best possible page with ads based on what the incoming user searched for. Couldn't be easier, and it
automates the difficult tasks we all hate.

Affiliate Marketing: We all know what an affiliate program is. There are literally tens of thousands of
affiliate programs with millions of products to sell. The most difficult part of affiliate marketing is
getting well qualified targeted traffic. That again is where good software and cloaking come into play.
Some networks and affiliates allow direct linking. Direct Linking is where you setup your cloaked
pages with all of your product keywords, then redirect straight to the merchant or affiliates sales page.
This often results in the highest conversion rates, but as I said, some affiliates don't allow Direct
Linking. So, again, that's where Landing Pages come in. Either building your own (which we are far
too lazy to do), or by using something like Landing Page Builder which automates everything for us.
Landing pages give us a place to send and clean our traffic, they also prequalify the buyer and make
sure the quality of the traffic sent to the affiliate is as high as possible. After all, we want to make
money, but we also want to keep a strong relationship with the affiliate so we can get paid.
Conclusion

As we can see, Black Hat Marketing isn't all that different from White Hat marketing. We automate the
difficult and time consuming tasks so we can focus on the important tasks at hand. I would like to thank
you for taking the time to read this. I plan to update often. In the meantime, you can follow me on my
personal blog over at http://www.blackhat360.com . Be sure to register and post on the forums if you
have any comments or questions.
