To attract prudent legitimate advertisers who do not want to betoo closely connected to the spammers, many syndicators havealso split their operations into two or more layers, which areconnected by multiple redirections, to obfuscate the connectionbetween the advertisers and the spammers. Since thesesyndicators are typically smaller companies, they often join forcesthrough traffic aggregation to attract sufficient traffic providersand advertisers.We model this end-to-end search spamming business with thefive-layer double-funnel illustrated in Figure 1: tens of thousandsof
advertisers
(Layer #5) pay a handful of
syndicators
(Layer #4)to display their ads. The syndicators buy traffic from a smallnumber of
aggregators
(Layer #3), who in turn buy traffic fromweb spammers to insulate syndicators and advertisers from spampages. The spammers set up hundreds to thousands of redirectiondomains (Layer #2), create millions of doorway pages (Layer #1)that fetch ads from these redirection domains, and widely spamthe URLs of these doorways at public forums. If any such URLsare promoted into top search results and are clicked by users, allclick-through traffic is funneled back through the aggregators,who then de-multiplex the traffic to the right syndicators.Sometimes there is a chain of redirections between theaggregators and the syndicators due to multiple layers of trafficaffiliate programs, but almost always one domain at the end of each chain is responsible for redirecting to the target advertiser’swebsite.
AggregatorsSyndicatorAdvertiserAdvertiserDoorway
Doorway
Doorway
RedirectionDomainRedirectionDomain
AdsDisplayClick-Thru
SyndicatorDoorway
Layer #1Layer #5Layer #2Layer #3Layer #4
Figure 1: Spam Double-Funnel
In the case of AdSense-based spammers, the single domaingooglesyndication.com plays the role of the middle three layers,responsible for serving ads, receiving click-through traffic, andredirecting to advertisers. Specifically, browsers fetch AdSenseads from the redirection domain googlesyndication.com anddisplay them on the doorway pages; ads click-through traffic goesinto the aggregator domain googlesyndication.com beforereaching advertisers’ websites.
3.
SPAMMER-TARGETED KEYWORDS
To study the common characteristics of redirection spam, ourfirst step was to discover the keywords and categories heavilytargeted by redirection spammers. In this section, we describe ourmethodology for deriving 10 spammer-targeted categories and abenchmark of 1,000 keywords, which serve as the basis for theanalyses presented in Section 4.Redirection spammers often use their targeted keywords as theanchor text of their spam links at public forums, exploiting atypical algorithm by which common search engines index andrank URLs. For example, the anchor text for the spam URLhttp://coach-handbag-top.blogspot.com/ is typically “coachhandbag”. Therefore, we collect spammer-targeted keywords byextracting all the anchor text from a large number of spammedforums and ranking the keywords by their frequencies.Between June and August of 2006, we manually investigatedspam reports from multiple sources including search userfeedback, heavily spammed forum types, online spam discussionforums, etc. We compiled a list of 323 keywords that returnedspam URLs among the top 50 results at one of the three majorsearch engines. We then queried these keywords at all threesearch engines, extracted the top-50 results, scanned them with anearlier version of Search Ranger, and identified 4,803 uniqueredirection-spam URLs.Next, we issued a “
link:
” query on each of the 4,803 URLs andretrieved 35,878 unique pages that contained at least one of thesespam URLs. From these pages, we collected a total of 1,132,099unique keywords, with a total of 6,026,699 occurrences, andranked the keywords by their occurrence counts. The top-5keywords are all drugs-related: “
phentermine
” (8,117), “
viagra
”(6,438), “
cialis
” (6,053), “
tramadol
” (5,788), and “
xanax
”(5,663). Among the top one hundred, 74 are drugs-related, 16 areringtone-related, and 10 are gambling-related.Among the above 1,132,099 keywords, we could select a toplist, say top 1000, for our subsequent analyses. However, weobserved that keywords related to drugs and ringtones dominatethe top-1000 list. Since it would be useful to study spammers whotarget different categories, we decided to construct our benchmark by manually selecting ten of the most prominent categories fromthe list. They are:1.
Drugs
: phentermine, viagra, cialis, tramadol, xanax, etc.2.
Adult
: porn, adult dating, sex, etc.3.
Gambling
: casino, poker, roulette, texas holdem, etc.4.
Ringtone
: verizon ringtones, free polyphonic ringtones, etc.5.
Money
: car insurance, debt consolidation, mortgage, etc.6.
Accessories
: rolex replica, authentic gucci handbag, etc.7.
Travel
: southwest airlines, cheap airfare, hotels las vegas, etc.8.
Cars
: bmw, dodge viper, audi monmouth new jersey, etc.9.
Music
: free music downloads, music lyrics, 50 cent mp3, etc.10.
Furniture
: bedroom furniture, ashley furniture, etc.We then selected the top-100 keywords from each category toform our first benchmark of 1,000 spammer-targeted search terms.
4.
REDIRECTION-SPAM ANALYSIS
In late September 2006, we submitted the 1,000 keywords tothe Search Ranger system, which retrieved the top-50 results fromall three major search engines. In total, we collected 101,585unique URLs from 1,000x50x3=150,000 search results. With a setof approximately 500 known-spammer redirection domains andAdSense IDs at that time, the system identified
12,635
uniquespam URLs, which accounted for
11.6%
of all the top-50appearances. (The actual redirection-spam density should behigher because some of the doorway pages had been deactivated,which were no longer causing URL redirections when we scannedthem.) We first give a brief analysis of per-category spamdensities in Section 4.1 and then focus on the double-funnelanalysis for the remainder of this section.
4.1
Spam Density Analysis
Figure 2 compares the per-category spam densities across the10 spammer-targeted categories. The numbers range from
2.7%
Add a Comment