Inside The Scam Jungle: A Closer Look at 419 Scam Email Operations

Isacenkova et al.
EURASIP Journal on Information Security 2014, 2014:4

http://jis.eurasipjournals.com/content/2014/1/4
RESEARCH Open Access
Inside the scam jungle: a closer look at 419 scam email operations
Jelena Isacenkova1*, Olivier Thonnard2, Andrei Costin1, Aurélien Francillon1 and David Balzarotti1
Abstract
419 scam (also referred to as Nigerian scam) is a popular form of fraud in which the fraudster tricks the victim into
paying a certain amount of money under the promise of a future, larger payoff.
Using a public dataset, in this paper, we study how these forms of scam campaigns are organized and evolve over time.
In particular, we discuss the role of phone numbers as important identifiers to group messages together and depict
the way scammers operate their campaigns. In fact, since the victim has to be able to contact the criminal, both email
addresses and phone numbers need to be authentic and they are often unchanged and re-used for a long period of
time. We also present in detail several examples of 419 scam campaigns, some of which last for several years,
representing them graphically and discussing their characteristics.
Keywords: Email spam; Nigerian scam; 419 scam; Cyber crime; Campaigns
*Correspondence: jelena.isachenkova@eurecom.fr
1 Eurecom, Route des Chappes, Sophia Antipolis, France
Full list of author information is available at the end of the article
© 2014 Isacenkova et al.; licensee Springer. This is an Open Access article distributed under the terms of the Creative Commons
Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction
in any medium, provided the original work is properly cited.

1 Introduction

Nigerian scam [1], also called ‘419 scam’ as a reference to the 419 section in the Nigerian penal code [2], has been
a known problem for several decades. The name encompasses many variations of this type of scam, like advance
fee fraud, fake lottery, black money scam, etc. Originally, the 419 scam phenomenon started by postal mail, and
then evolved into a business run via fax first, and email later. 419 scam is a popular form of fraud in which the
fraudster tricks the victim into paying a certain amount of money under the promise of a future, larger payoff. The
prosecution of such criminal activity is complicated [3] and can often be evaded by criminals. As a result, reports
of such crime still appear in social media, and online communities, e.g., 419scam.org [4], exist to mitigate the
risk and help users identify scam messages.

Nowadays, 419 scam is often perceived as a particular type of spam. However, while most of the spam is
sent today in bulk through botnets and by compromised machines, 419 scam activities are still largely performed
in a manual way. Moreover, the underlying business and operation models differ. Spammers trap their victims
through engineering effort, whereas scammers rely on human factors: pity, greed, and social engineering
techniques. Scammers use very primitive tools (if any) compared with other forms of spam where operations are
often completely automated. A distinctive characteristic of email fraud is the communication channel set up to
reach the victim: from this point of view, scammers tend to use emails and/or phone numbers as their main
contacts [5], while other forms of spam are more likely to forward their victims to specific URLs. For instance, a
previous study of spam campaigns [6] (in which scam was considered a subset of spam) indicates that 59% of
spam messages contain a URL. However, even though 419 scam messages got eclipsed by the large amount of spam
sent by botnets, they still pose a persistent problem that causes substantial personal financial losses for a number
of victims all around the world.

The traditional spam and scam (not 419) scenarios have already been thoroughly studied (e.g., [6,7]), where a big
part of existing unsolicited bulk email identification techniques rely on high volumes of similar messages. However,
419 messages are more likely to be sent in fewer copies and from webmail accounts. Thus, criminals aim to stay
unnoticed by the traditional spam filters and avoid drawing attention to abused webmail accounts. The exact
distribution methods of 419 scam messages have not been studied as deeply as, for example, the distribution of
botnet spam. However, based on Microsoft Security Intelligence
Reports [8], 419 scam messages constitute on average 8% of email spam traffic (data over the last 5 years).

A recent study by Costin et al. [5] describes the use of phone numbers in a number of malicious activities. The
authors show that the phone numbers used by scammers are often active for a long period of time and are reused
over and over in different emails, making them an attractive feature to link together scam messages and identify
possible campaigns. In this work, we test this hypothesis by using phone numbers and other email features to
automatically detect and study scam campaigns in a public dataset. To the best of our knowledge, this is the
first in-depth study of 419 email campaigns. While a preliminary version of this study was published in [9], this is
an extended version of the study.

Our goal is to study how scammers orchestrate their scam campaigns, by looking at the interconnections
between email accounts, phone numbers, and email topics used by scammers. To this aim, we use a novel
multi-criteria decision algorithm to effectively cluster scam emails that share a minimal number of
commonalities, even in the presence of more volatile features. Because of this set of commonalities, scam emails
originating from the same scammer(s) are likely to be grouped together, enabling us to gain insights into scam
campaigns. Additionally, we evaluate the quality and consistency of our clustering results. To this aim, we
perform a threshold sensitivity analysis, and evaluate the homogeneity of clusters using the graph compactness and
the Adjusted Rand Index as metrics.

In our analysis, we have identified over 1,000 different campaigns and, for most of them, phone numbers
represent the cornerstone that allows us to link the different pieces together. We also discovered some larger-scale
campaigns (so-called ‘macro-clusters’), which are made of loosely inter-connected scam clusters reflecting different
operations of the same scammers. We believe these can be attributed to different scam runs orchestrated by the
same criminal groups, as we observe the same phone numbers or email accounts being reused across different
sub-campaigns.

As demonstrated by our experiments, our methods and findings could be leveraged to pro-actively identify new
scam operations (or variants of previous ones) by quickly associating a new scam to ongoing campaigns. We believe
that this would facilitate the work of law enforcement agencies in the prosecution of scammers. Our approach
could also be leveraged to improve investigations of other cybercrime schemes by logging and investigating various
groups of cybercriminals based on their online activities. In this regard, our methodology has already proved its
utility in the context of other security investigations, such as in the analysis of rogue AV campaigns [10], spam
botnet operations [11], and targeted attacks [12].

The rest of the paper is organized as follows: We start by describing the scam dataset (Section 3), to which we apply
our cluster analysis technique to extract scam campaigns, and compare the usage of email addresses and phone
numbers (Section 4). In Section 5, we focus on a number of individual campaigns to present their characteristics.
Finally, we draw our conclusions in Section 6.

2 Related work

Scammers employ various techniques to harvest money from ingenuous victims. Tive [13] introduced the tricks
of 419 fee fraud and the philosophy of the tricksters behind it. Stajano and Wilson [14] studied a number of scam
techniques and demonstrated the importance of security engineering operations. Herley [15] analysed attack
decisions as binary classification problems, studying the case of 419 scammers. The author looks into the economic
aspects of adversaries, trying to understand how scammers find viable victims out of millions of users, so that their
business still remains profitable. A brief summary of 419 scam schemes was presented by Buchanan and Grant [3],
indicating that Internet growth has facilitated the spread of cyber fraud. They also emphasized the difficulties of
adversary prosecution - one of the main reasons why 419 scam is still an issue today. A more recent work by Oboh
et al. [16] discussed the same problem of prosecution in a more global context, taking the Netherlands as an example.

Another work by Goa et al. [17] proposed an ontology model for scam 419 email text mining, demonstrating
high precision in detection. A work by Pathak et al. [6] analysed email spam campaigns sent by botnets,
describing their patterns and characteristics. The authors also showed that 15% of the spam messages contained a
phone number. A recent patent has been published by Coomer [18] on a technique that detects scam and spam emails
through phone number analysis. This is the first mention of phone numbers being used for identifying scam,
but with no technical implementation details. Costin et al. [5] studied the role of phone numbers in various online
fraud schemes and empirically demonstrated its significance in the 419 scam domain. Our work extends the study
by focusing on scam email and campaign characterization that relies on phone numbers and email addresses used by
scammers.

3 Dataset

In this section, we describe the dataset we used for analyzing 419 scam campaigns and provide some statistics
of the scam messages. There are various sources of scam often reported by users and aggregated afterwards by
dedicated communities, forums, and other online activity groups. The data chosen for our analysis come from
419scam.org - a 419 scam aggregator - as it provides a large set of preprocessed data: email bodies, headers, and
some already extracted email attributes, like the scam category and the phone numbers. Note that IP address
data are absent. We downloaded the emails for a period spanning from January 2009 until August 2012.

In our study, we also exploited the fact that the phone numbers can indicate a geographical location, typically
the country where the phone is registered. Although it does not prove the origin of the message or the scammer,
it still references a country of a scam operation, and improves the victim’s level of confidence in the received
message. For example, receiving a new partnership offer from the UK could seem suspicious if the phone contact
has a Nigerian prefix, as could a fake lottery notification with contact details originating from an African country
while the victim is in Europe. Moreover, as shown in a previous study [5], 419 scam mobile phone numbers are
precise in indicating the country of residence of the phone owner (scammer) as few roaming cases were found.
Therefore, the phone attribute is precise enough to indicate geographical origins.

The resulting dataset consists of 36,761 messages with 11,768 unique phone numbers. The general statistics of
the data are shown in Table 1. A first thing to notice is that the number of unique email addresses is three times
bigger than the number of unique phone numbers, emphasizing how easy it is to acquire mailboxes for malevolent
purposes. However, the ratio is still quite low, indicating rather cheap and easy access to phone numbers.
Another specificity of this dataset is that each message can contain several email addresses and phone numbers,
where the number of different email addresses can be as high as five per message: the from address, the reply
address, and other email addresses indicated in the body of the message. Hence, although we collected only 36,761
messages, we extracted 112,961 email addresses from them.

Table 1 General statistics table
Description                    Numbers
Scam messages                  36,761
Unique messages                26,250
Total email addresses          112,961
Unique email addresses         34,723
Total phone numbers            41,320
Total unique phone numbers     11,768
Number of countries            12

In our dataset, we did not notice any significant bursts of scam messages (verified on a monthly basis) during
the three-year span, suggesting that the email messages were evenly distributed over time. It is also important
to note that the dataset is mostly limited to the European and African regions (with only a few Asian
samples), which is due to the way the website owners are collecting and classifying the data. Nevertheless, the
geographical distribution of the mentioned continents is reflected in our dataset, excluding only some minor
actors.

To better understand the dataset, we look at the time during which emails and phones were advertised by
scammers in scam messages. Seventy-one percent of the email addresses in our dataset were used only during 1 day.
The remaining ones were used for an average duration of 79 days each. Phone numbers are longer-lived than email
addresses: 51% of the phone numbers were used only for 1 day; the rest were used on average for 174 days (around 6
months), which makes the phone number an important feature in our data clustering analysis.

Table 2 summarizes the phone number geographical distribution. UK numbers are the most common: per Table 2,
they outnumber Nigerian numbers by roughly 1.4 to 1 and numbers from Benin, the third biggest group, by about
three to one. Spain and the Netherlands are the leading countries in continental Europe. Note that the UK
should be considered as a special case. As reported by 419scam.org and Costin et al. [5], all UK phone numbers
in this dataset belong to personal numbering services - services used for forwarding phone calls to other phone
numbers and serving as a masking service of the real destination for the callee. In our dataset, 44% of the phone
numbers are of this kind (all with a UK prefix), another 44% are mobile phone numbers, 12% are fixed lines [5],
and less than 1% of the phones are non-existent.

Table 2 Phones by countries
Country           Total phones   Total (%)
United Kingdom    4,499          43
Nigeria           3,121          30
Benin             1,448          14
South Africa      562            5
Spain             372            4
Netherlands       263            3
Ivory Coast       89             1
China             68             1
Senegal           47             0.5
Togo              11             0.1
Indonesia         1              0.01

Figure 1 Scam email categories over time.

The initial 419 scam messages are also labeled by 419scam.org [4] with a category. Around 64% of the emails
are assigned to the category named ‘419 scam’, which is a subcategory of 419 scam and refers specifically to
financial fraud types, e.g., transactions, lost funds, etc. As reported by [4], most of the remaining emails (24%)
belong to the ‘Fake lottery’ category. However, this distribution has been changing over time as shown in Figure 1.
In particular, a big difference can be observed between 2009
and 2011, when ‘419 scam’ became the dominant category. As of August 2012, there were five times
more ‘419 scam’ emails than ‘fake lottery’ letters. This might be due to an outdated categorization process,
as scam topics - like spam - may change and evolve over time. For this reason, in the next section, we describe our
process to automatically identify the scam topics based on the word frequencies in the messages. We also observe
that most of the ‘fake lottery’ scams are associated with European phone numbers, suggesting that this category
is sent to a targeted audience. In the majority of ‘419 scam’ cases, scammers use African phone numbers, with
the UK share being equivalent to the Nigerian one (Figure 2). Also notice that Benin becomes a much bigger player
starting from the beginning of 2011. Hence, the geographical targets may vary with the topics and objectives
pursued by the accomplices.

Figure 2 ‘419 scam’ category phone numbers over time by countries.

4 Data analysis methods

In this section, we describe the methods used for identifying groups of similar 419 scam emails that are believed
to be part of the same campaigns, and then present the results. We use two different metrics to evaluate the quality
and consistency of the created clusters (campaigns). Finally, we extract the most frequently repeated keywords from
the body of the scam messages in order to improve their topic categorization.

4.1 Scam email clustering: the TRIAGE approach

To identify groups of scam emails that are likely part of a campaign orchestrated by the same group of people, we
have clustered all scam messages using TRIAGE - a software framework for security data mining that takes
advantage of multi-criteria data analysis to group events based on subsets of common elements (subsequently called
features). Thanks to this multi-criteria clustering approach, TRIAGE identifies complex patterns in data, unveiling
sometimes varying relationships among series of connected or disparate events. TRIAGE is best described as a
security tool designed for intelligence extraction, helping to determine the patterns and behaviors of the
intruders (i.e., the tactics, techniques and procedures, or TTPs), highlighting ‘how’ they operate rather than
‘what’ they do. The framework [19] has already demonstrated its utility in the context of other security
investigations, e.g., rogue AV campaigns [10], spam botnets [11] and targeted attacks [12].

Figure 3 illustrates the TRIAGE workflow, as applied to our scam dataset. In step 1, a number of email
characteristics (or features) are selected and defined as decision criteria for linking the emails. Such
characteristics include the sender email address (the from), the email subject, the sending date, the reply address
(as found in the email header), the phone number and any other email address found in the message itself (email
body). In step 2, TRIAGE builds relationships among all email samples with respect to the selected features using
appropriate similarity metrics. More specifically, we used various string-oriented similarity measures commonly used
in information retrieval, such as the Levenshtein similarity (for the subject) and the N-gram similarity (for features
such as from, reply, email body). As shown in [20], N-gram similarity can be seen as a generalization of Levenshtein and
has proved to work better with short strings. However, N- Table 3 shows the particular set of weights used for
gram can be computationally expensive. For this reason, this analysis, in which we emphasize the importance of
we decided to use it solely for email addresses, for which the phone numbers and the email subjects. The features
it is somehow more critical to have a reliable compari- related to the sender email addresses were given a medium
son method, than for email subjects which tend also to be importance, whereas the sending date was given a much
longer and thus more expensive to compare. For features lower importance. The fused similarity value (or aggre-
as phone and date, we simply used the equality compar- gated value) is then used as input to a classical graph
ison method - as this method appeared to us to be the clustering algorithm, such as minimal cut or connected
most appropriate to capture the semantics between two components, which are able to recover clusters of arbi-
different feature values in this particular case. trary size and shape.
At step 3, the individual feature similarities are fused At this stage, it is important to stress that, except maybe
using an aggregation model reflecting a high-level behav- for the phone number used by the scammer (unless this
ior defined by the analyst, who can impose, e.g., that at number is fake), none of the other features included in the
least k highly similar email features (out of n) are required clustering analysis can be considered alone to be sufficient
to attribute different samples to the same campaign. The to attribute scam emails to the same criminals. For exam-
tool allows to assign different weights to the features, so ple, the fact that the same (or similar) sender email address
as to give higher or lower importance to certain features. was used in two scam messages, even sent on the same
Figure 3 TRIAGE workflow on scam dataset.


Table 3 Weights of individual features
Feature       Importance
Phone         0.30
From          0.12
Reply         0.18
Subject       0.25
Email body    0.10
Date          0.05

correlated features. The result of this sensitivity analysis is shown in Figure 4, which represents the total number
of clusters (MDCs) found by the algorithm for increasing values of the decision threshold. The best trade-off
between quality and completeness of the clustering process is usually obtained for threshold values corresponding
to the maximum number of clusters [19], i.e., we chose here to set the threshold at 0.35. Given the set of
importances and interactions defined above, we can easily verify
that the outcome of the Choquet aggregation will exceed
this threshold for combinations of any two features involv-
date, does not necessarily mean that these two messages ing similarities for the phone number and at least one
originate from the same individual(s). Only one or two other feature (besides the date). Any coalition of three (or
common features can appear merely due to a coincidence, more) similar features will also exceed the threshold and
to a commonly seen pattern, or perhaps to some com- will lead to the formation of a cluster.
monly used technique shared by scammers. This moti-
vates our choice for a multi-criteria fusion model that 4.1.1 k-additive Choquet integral
incorporates various weights defined according to a fuzzy The Choquet integral is defined with respect to a so-called
linguistic quantifier (also referred to as Regular Increasing fuzzy measure. Given a set of criteria N (e.g., email fea-
Monotone or RIM quantifier [21,22]), which allows us to tures), a fuzzy measure is simply a set function used to
model fusion strategies such as at least k strong correla- define, in some sense, the importance (or strength) of
tions are required to link two scam emails. Note that other any subset belonging to the power set of N . More for-
fuzzy linguistic models may be considered to model the mally, a fuzzy measure (alternatively called a capacity in
portion of criteria to be satisfied in the aggregation step, the literature) is defined as follows:
e.g., few, some, half, many, or most of the features can be
required to be strongly correlated. Definition. (Fuzzy measure): Let $N = \{1, 2, \ldots, n\}$ be
The TRIAGE framework provides advanced aggregation the index set of n criteria. A fuzzy measure [25] (or capac-
modelling capabilities, such as the Choquet integral - a ity [26]) is a set function $v : 2^N \to [0, 1]$ which is
fuzzy integral that aggregates a set of scores by not only monotonic (i.e., $v(A) \le v(B)$ whenever $A \subset B$) and sat-
taking into account importance factors assigned to indi- isfies $v(\emptyset) = 0$. The measure is normalized if in addition
vidual criteria, but also interactions among subsets of $v(N) = 1$.
criteria [23,24]. This enables us to also include interac-
tions among groups of criteria (i.e., email features), like In multi-criteria decision making, a fuzzy measure is
synergies and redundancies. For this analysis, we have thus a set of $2^n$ real values where each value can be viewed
assigned synergies to coalitions of features involving at as the degree of importance of a combination of criteria
least the phone number of the scammer, so as to boost the (also called a coalition, in particular in game theory).
overall similarity between emails having the same phone However, since the exponential complexity of fuzzy
number, plus one additional feature in common. Inversely, measures makes them impractical for a decision maker
some redundancy was put on certain combinations of to define them manually, Grabisch proposed a sub-model
email address-related features, such as (reply, email body), called k-order additive fuzzy measures, or shorter k-
in order to diminish their redundancy effect on the overall additive fuzzy measures [27]. The idea is to construct
similarity score. The definition of this aggregation model a fuzzy measure where the interaction among criteria is
and its parameters must be guided by domain expert limited to groups of size k (or less). For example, in a
knowledge; in our case, we leveraged the insights gained 2-additive fuzzy measure, we can only model pairwise
previously by analyzing the role of phone numbers and interactions among criteria, but no interactions in groups
email addresses in such scam operations in [5]. of 3 or more.
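To make the aggregation model concrete, the following Python sketch computes a 2-additive Choquet integral in the Shapley-value and interaction-index form that this section attributes to Grabisch [28]. It is an illustration only. The Shapley values are taken to be the feature importances of Table 3, which is an assumption; the pairwise interaction indices are hypothetical values invented here, since the text does not list the ones actually used; and the names `choquet_2additive` and `is_valid` are not part of TRIAGE.

```python
# Shapley values: assumed equal to the feature importances of Table 3.
SHAPLEY = {"phone": 0.30, "from": 0.12, "reply": 0.18,
           "subject": 0.25, "body": 0.10, "date": 0.05}

# Pairwise interaction indices: HYPOTHETICAL values for illustration only.
# Positive = synergy (both scores must be high), negative = redundancy.
INTERACTIONS = {
    frozenset(("phone", "subject")): 0.20,
    frozenset(("phone", "reply")): 0.15,
    frozenset(("reply", "body")): -0.10,
}


def linear_coefficient(i, shapley, interactions):
    """phi_i - 0.5 * sum over j of |I_ij| for criterion i."""
    load = sum(abs(v) for pair, v in interactions.items() if i in pair)
    return shapley[i] - 0.5 * load


def is_valid(shapley, interactions):
    """Monotonicity condition: every linear coefficient is non-negative."""
    return all(linear_coefficient(i, shapley, interactions) >= 0
               for i in shapley)


def choquet_2additive(z, shapley, interactions):
    """Fuse per-feature similarities z (feature -> value in [0, 1]) into one score."""
    score = 0.0
    for pair, value in interactions.items():
        i, j = tuple(pair)
        if value > 0:    # synergy: conjunctive term (minimum of the two scores)
            score += min(z[i], z[j]) * value
        elif value < 0:  # redundancy: disjunctive term (maximum of the two scores)
            score += max(z[i], z[j]) * abs(value)
    for i in shapley:    # linear part, weighted by the reduced Shapley value
        score += z[i] * linear_coefficient(i, shapley, interactions)
    return score
```

With these illustrative numbers, two emails that match only on phone number and subject obtain a fused score of about 0.475, while a match on the phone number alone yields 0.125. Because the Shapley values sum to 1, a pair that matches on every feature scores 1.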
Figure 4 Sensitivity analysis on the decision threshold used in the TRIAGE clustering.

As an outcome (step 4), TRIAGE identifies multi-dimensional clusters (MDCs), which in this analysis are
clusters of scam emails in which any pair of emails is linked by a number of common traits. As explained in
[19], a decision threshold can be chosen such that undesired links between attacks are eliminated, i.e., to drop
any irrelevant connection that could result from a combination of small values or an insufficient number of

In multi-criteria decision analysis (MCDA), k-additive fuzzy measures have proved to provide a good trade-off
between complexity and flexibility of the model. Instead of $2^n - 2$ values, they require only
$\sum_{i=1}^{k} \binom{n}{i}$ values to be defined. 1-additive fuzzy measures are just ordinary additive measures
(for which only n values are needed), but they are usually too restrictive for an accurate representation of
complex problems¹. In practice, 2-additivity seems to be the best compromise
between low complexity and richness of the model [27]. In this case, only n(n + 1)/2 values need to be defined.

From a practical viewpoint, a decision maker may thus consider defining a 2-additive fuzzy measure since it
appears to be a good trade-off between complexity of the model (number of values to determine) and its
effectiveness and flexibility to include interactions (synergies or redundancies) among pairs of features. To do
this, a commonly used approach consists in calculating the Choquet integral directly from the values provided by
the decision maker for the interaction indices $I_2(A)$, using the expression of $C_v$ given by Grabisch in [28]:

$$ C_v(z) = \sum_{i,j \in N \mid I_{ij} > 0} (z_i \wedge z_j)\, I_{ij} + \sum_{i,j \in N \mid I_{ij} < 0} (z_i \vee z_j)\, |I_{ij}| + \sum_{i \in N} z_i \left[ \phi_i - \frac{1}{2} \sum_{j \neq i} |I_{ij}| \right] \qquad (1) $$

where $\phi_i$ is the Shapley value of $v$ as defined in [29] (the Shapley values are directly related to the
importances of individual features), $I_{ij}$ is the interaction index between criteria i and j (as defined in
[30]), and $z$ is the vector of pairwise similarities obtained from the comparison of email features for any pair
of scam messages.

As pointed out by Grabisch in [28], the expression (1) here above is remarkable for two reasons:

- It explains clearly the meaning of the interaction index and Shapley value: a positive interaction
  induces a conjunctive aggregation of scores, while a negative interaction induces a disjunctive aggregation
  (it is sufficient that one score is high). Clearly, the Shapley value is the linear part of the model, while
  interaction is the nonlinear part.
- Coefficients are non-negative, and if the capacity is normalized, they sum up to 1. In other words, it
  means that the Choquet integral is a convex combination of the scores $z_i$ on all criteria, and of all
  disjunctive and conjunctive combinations of scores on pairs of criteria. Hence, the coefficient of a given
  term can be interpreted as the percentage of contribution of such term to the overall score. This
  aspect is highly appreciated in practice because it allows an analyst to understand each decision
  made by the algorithm when calculating a global score between two entities.

Note that when defining interaction indices for pairs of criteria, the only conditions one must verify are the
additivity and monotonicity of the resulting measure. That is, it is sufficient to verify that the following
quantity is always non-negative ∀i ∈ N:

$$ \phi_i - \frac{1}{2} \sum_{j \neq i} |I_{ij}| \ge 0 \qquad (2) $$

4.2 Clustering results and experimental validation

The TRIAGE clustering tool identified 1,040 clusters that consist of at least 5 scam emails correlated by various
combinations of features. Because of the way these clus- the inter-cluster variability (i.e., the separability between
ters are generated (i.e., the multi-criteria aggregation), clusters).
we anticipate that these email clusters represent different To evaluate the quality of our clustering results, we have
campaigns, potentially organized by the same individuals - examined their overall compactness, broken down by indi-
as emails within the same cluster share several common vidual features. The graph compactness (Cp ) is a cluster
traits. validity index that indicates how ‘compact’ (or homoge-
Table 4 provides some global statistics computed across neous) the clusters are, based on their intra-connectivity
the top-250 largest scam campaigns. In over half of these characteristics. Cp reflects the average edge similarity
campaigns, scammers are using only two distinct phone between two objects of the cluster. More formally, for any
numbers, but they still make use of more than five dif- cluster Ck , we calculate a normalized compactness index,
ferent mailboxes to get the answers from their victims. as proposed in [32]:
$$ C_{p_k} = \frac{\sum_{i=1}^{N_k - 1} \sum_{j=i+1}^{N_k} \omega(i, j)}{N_k (N_k - 1)/2} , \qquad (3) $$
Most scam campaigns are rather long-lived (lasting on
average about a year). We note that cluster sizes are small
on average, indicating that there are many small, isolated
campaigns and only a few dozens of messages belong
to the same campaign. This might be also an artefact
of the data collection process; nevertheless, we antici-
pate that this could also reflect the scammers’ behavior where ω(i, j) is a positive weight function reflecting node
who may want to stay ‘under the radar’. Indeed, bulk similarities in the graph, and Nk is the number of members
amounts of the same emails would have more poten- within cluster k.
tial to compromise their scamming operations, as this Figure 5 depicts graphically the overall Cp for the top 50
would become visible for content-based spam filters and, clusters. Besides a few exceptions, most clusters have an
hence, would get blocked on the earlier stages of email average compactness value above 1.5, which in most cases
filtering. is associated with a combination of at least three strongly
correlated features.
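The compactness index of Equation (3) is simply the average edge weight over all pairs of cluster members. The short Python sketch below shows the computation for a single weight function; the function name and the toy similarity values are illustrative assumptions, and how the per-feature breakdown of Figure 5 is combined into an overall value is not specified here, so the sketch does not attempt it.

```python
from itertools import combinations


def compactness(members, weight):
    """Normalized compactness of one cluster, following Equation (3).

    members: list of node identifiers belonging to the cluster
    weight:  function (i, j) -> non-negative similarity of the edge i-j
    """
    n = len(members)
    if n < 2:
        raise ValueError("compactness needs at least two members")
    # Sum omega(i, j) over every unordered pair, then divide by the pair count.
    total = sum(weight(a, b) for a, b in combinations(members, 2))
    return total / (n * (n - 1) / 2)


# Toy example: three emails with made-up pairwise similarities.
toy_sims = {("a", "b"): 1.0, ("a", "c"): 0.5, ("b", "c"): 0.0}
toy_cp = compactness(["a", "b", "c"], lambda i, j: toy_sims[(i, j)])
```

In the toy example the three edges sum to 1.5 over three pairs, so `toy_cp` is 0.5.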
4.2.1 Assessment of clustering results Since the TRIAGE tool is keeping track of all individual
As data clustering is essentially an unsupervised classifica- links in the similarity graphs, it is also possible to com-
tion approach, it is important to assess clustering results pute the proportion of emails that are linked by specific
by means of objective criteria, so as to validate the sound- combinations of features within clusters. This can be very
ness of the results. There are mainly two approaches to useful to understand the reasons behind the formation of
perform this validation: external or internal evaluation. the clusters, and hence provide insights into ‘stable’ (less
When there is no ‘ground truth’ information, we usu- volatile) features used by scammers when performing new
ally perform an internal evaluation of the clusters valid- campaigns.
ity to confirm that the obtained results cannot have From Table 5, we can observe that the top combination
reasonably occurred ‘by chance’, or as an artefact of of features that tend to link scam emails (in 13% of the
the clustering algorithm. Various cluster validity indices cases) involves the phone number, the subject, and also all
have been proposed to assess the quality and consis- three email addresses (from, reply, email body) used by
tency of clustering results [31]. In graph clustering, the scammers. To confirm our intuition about the impor-
most indices are based on the comparison of intra- tance of certain features (phone numbers and, to a lesser
cluster connectivity (i.e., the compactness of clusters) and extent, email addresses) and their effective role in iden-
tifying campaigns, we look at all similarity links within
clusters. We observe that the features mainly respon-
Table 4 Global statistics of the top 250 clusters sible for linking scam messages in the clusters involve
Statistic Average Median Maximum phone numbers (in 88% cases), followed by the reply email
Number of emails 38 28 376 address (for 66% of the links). Not surprisingly, the from
address (which can be easily spoofed) changes much more
Number of from 13.9 9 181
often and is used as linking feature in only 46% of cluster
Number of replies 6.2 5 56 formations.
Number of subjects 9.9 7 114 On the other hand, external validation techniques aim
Number of phones 2.5 2 34 at comparing the recovered clusters to an a priori known
Duration (in days) 396 340 1,454 structure. Given the knowledge of the ground truth class
Number of dates (distinct) 27.9 22 259
assignments and the clustering algorithm assignments of
the same samples, external validation metrics measure the
Compactness 2.5 2.4 5.0
similarity of the two assignments based solely on clusters
Figure 5 Overall compactness of the top 50 clusters (broken down by feature).
labels, and by ignoring permutations. Examples of well-known external validation metrics include:

• The (Adjusted) Rand Index, which measures the similarity of two clustering assignments, ignoring permutations and with chance normalization;
• The Mutual Information based scores, which compare two partitions based on (normalized) entropy measurements;
• The Homogeneity, Completeness and V-measure, which all measure some degree of agreement between two clustering assignments by defining an intuitive metric using conditional entropy analysis.

Table 5 Top coalitions of features across all clusters
Coalition                                   Percentage (%)
(phone, subject, from, reply, email body)   13
(phone, reply, email body)                  12
(phone, subject, reply, email body)         11
(phone, from, reply, email body)            7
(phone, subject)                            6
(phone, from)                               5
(phone, reply)                              4
(phone, reply, subject)                     4
(phone, reply, subject, from)               4
others                                      33

To further evaluate our clustering results, we have used here the Adjusted Rand Index (ARI) to compare different sets of clusters obtained by varying the parameters of the data fusion model used as input to the TRIAGE algorithm. The results of this comparison are shown in Table 6. Note that ARI has the following properties:

• A bounded range in [−1, 1]: negative values are bad (independent labellings), similar clusterings have a positive ARI, and 1.0 is the perfect match score.
• Random (uniform) label assignments have an ARI score close to 0.0, thanks to the ‘corrected-for-chance’ adjustment made in order to reduce the effect of random labellings.

Overall, ARI and compactness validation indices indicate that most clustering results are similar, although they obviously show some differences from each other. Still, we observe that removing one or the other feature can greatly impact the data partitioning results, leading in general to fewer, coarse-grained MDClusters.

Table 6 Experimental results of different feature sets and clustering algorithms
Analysis             Threshold   ARI    Compactness   MDC     Clusters   Emails (% of total)   Features
Choquet              0.35        -      3.08          1,036   19,866     13,779 (39%)          6
Choquet, no phone    0.35        0.53   2.64          912     16,745     17,753 (50%)          5
Choquet, no emails   0.35        0.52   2.2           868     13,458     14,884 (42%)          4
Choquet, no subj.    0.35        0.55   2.63          1,034   17,918     17,287 (49%)          5
WOWA                 0.5         -      3.3           1,012   21,775     16,920 (48%)          6
WOWA, no phone       0.6         0.56   2.98          1,393   18,032     11,103 (31%)          5

Further, we compare our results to results achieved by applying a different aggregation function. Weighted averaging functions, such as Ordered Weighted Averaging (OWA) or the weighted mean, can be quite convenient aggregation functions when we deal with data fusion tasks, in which criteria of interest are expressed with numerical values (usually, in [0, 1]^n). However, the weights used in weighted mean (WM) and the ones defined for the OWA operator play very different roles. Torra proposed in [33] a generalization of both WM and OWA, called Weighted OWA (WOWA). This aggregation function combines the advantages of both types of averaging functions by allowing the user to quantify the reliability of the information sources with a vector p (as the weighted mean does), and at the same time, to weight the values in relation to their relative position with a second vector w (as the OWA operator).

The two analyses that include all features (Table 6), Choquet and WOWA, are more similar - with an ARI of 0.68 - which seems to validate our selection of the feature set. Here we used a different method in order to compare
our analysis with another one, as there is no existing previous work to which we could compare.

Since we have no ground truth, it is hard to reach a final conclusion on which analysis is providing the best results. However, the various quality measures presented above (Cp and ARI), combined with the data set coverage (proportion of clustered emails), provide a good indication of the quality of the results and good confidence in the parameters chosen.

4.3 On the role of emails and phone numbers
One could wonder about the longevity of these key scam features; hence, we also looked at phone numbers and email addresses from a time perspective. Figure 6 represents the usage of the same email addresses and phone numbers over time. The Y-axis shows the density of the features, which indicates their distribution in time on a 100% scale. As mentioned before, a larger part of them is used for only 1 day, so there is a slight concentration on the left side of the plot. However, the phone numbers are more often reused over time than email addresses. This could be explained by the easy access to new mailboxes offered by many free email providers. Phone numbers, in contrast, probably still require some financial investment compared with emails.

We checked the domain names of email addresses used in our scam dataset and found that the top 100 belong to webmail providers from all over the world. This finding suggests that email messages sent from such accounts would bypass sender-based anti-spam filters that are widely deployed today.

If we draw a scatter plot of from email addresses against phone numbers, on a per-cluster basis (Figure 7), we find that these two parameters are uncorrelated. Scammers change email addresses more often than phone numbers, and even larger clusters of scammers sometimes maintain few email addresses.

4.4 Clouds of words
419scam.org [4], as mentioned before, categorizes the scam emails into 10 categories. We presented their shares in the dataset in Section 3. Since the provided categorization is rather general, we wanted to evaluate by ourselves the scam categories present in our dataset by measuring the word frequencies in the body of the scam messages. Hence, to extract some additional knowledge from the clustered data, we create a list of the most frequently repeated keywords (after removing all stop words) and group them into meaningful categories. Stop words are ‘words which are filtered out prior to, or after, processing of natural language data’ [34]. Examples of such words in English are short function words such as a, which, this, and was. By removing such words from the text, we took into account only words containing some knowledge about the topic discussed in the message.

As a result, we identified three big categories within the clusters: money transfer and bank-related fraud schemes (54%), fake lottery scam (22%), and fake delivery services (11%). The remaining 13% of the clusters are uncategorized. The distribution is quite similar to the one provided by the data source, except that the delivery services are separated into other categories. The so-called general 419 scam category corresponds to messages about lost bank payments, compensations, and investment proposals. We grouped them together as it is challenging to clearly separate them due to a large number of shared keywords.
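The keyword extraction described in Section 4.4 can be illustrated with a minimal Python sketch. It is not the tooling used in the study: the stop-word list, the category keyword sets, and the function names `keyword_counts` and `categorize` are hypothetical and chosen only for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOP_WORDS = {"a", "an", "and", "the", "this", "was", "which", "is", "of",
              "to", "in", "for", "your", "you", "we", "at", "are"}

# Hypothetical keyword sets per category; the study does not list its keywords.
CATEGORY_KEYWORDS = {
    "money transfer / bank": {"bank", "transfer", "fund", "payment"},
    "fake lottery": {"lottery", "winner", "prize", "draw"},
    "fake delivery": {"delivery", "courier", "parcel", "package"},
}

def keyword_counts(bodies):
    """Count words over all message bodies after removing stop words."""
    counts = Counter()
    for body in bodies:
        words = re.findall(r"[a-z]+", body.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts

def categorize(counts):
    """Pick the category whose keywords occur most often, if any occur at all."""
    scores = {cat: sum(counts[w] for w in kws)
              for cat, kws in CATEGORY_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "uncategorized"
```

Calling `categorize(keyword_counts(bodies))` on the message bodies of one cluster yields a single label for that cluster; clusters whose messages contain none of the listed keywords fall into the uncategorized group.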
Figure 6 Duration of phone numbers and emails used by scammers, in days.

5 Characterization of campaigns
This section provides deeper insights into 419 scam campaign orchestration. We present several typical scam campaigns and show the connections between clusters, which are possibly run by the same group of scammers due to multiple strong interconnections among scam emails belonging to the same cluster.
Figure 7 Number of distinct From addresses versus the number of phones used in the clusters. Each node represents a cluster. Node size
indicates the number of emails.
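The per-cluster comparison plotted in Figure 7 can be approximated with a short Python sketch. The input layout (a dictionary mapping a cluster identifier to a list of from-address and phone pairs) and the use of the Pearson coefficient are assumptions made for illustration; the text does not state which correlation measure, if any, was computed.

```python
from math import sqrt

def distinct_counts(clusters):
    """clusters: dict of cluster id -> list of (from_address, phone) pairs.
    Returns one (distinct from addresses, distinct phones) point per cluster."""
    points = []
    for emails in clusters.values():
        froms = {f for f, _ in emails}
        phones = {p for _, p in emails}
        points.append((len(froms), len(phones)))
    return points

def pearson(points):
    """Pearson correlation of the (x, y) points; 0.0 if either axis is constant."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in points)
    var_x = sum((x - mean_x) ** 2 for x, _ in points)
    var_y = sum((y - mean_y) ** 2 for _, y in points)
    if var_x == 0 or var_y == 0:
        return 0.0  # coefficient undefined; treated here as no linear relation
    return cov / sqrt(var_x * var_y)
```

A coefficient near zero over all clusters would be consistent with the observation that the number of from addresses and the number of phone numbers are uncorrelated.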
5.1 Scam campaign examples
Here we characterize 419 scam campaigns by looking at how they are operated. For this purpose we use data visualization tools to plot the clustered data in an organized manner and look at the ‘big picture’. Campaign graphs likely reflect the organization of campaigns and their maintenance over time. Interestingly, various campaigns have different operational structures and manage resources differently, as depicted by the examples in Figure 8.

Figures 8, 9, 10 and 11 show examples of different scam campaigns identified by TRIAGE. These diagrams were created using graph visualization tools developed in the VIS-SENSE project (footnote 2). The graph diagrams are drawn using a circular layout that represents the various dates on which scam messages were sent. The dates are laid out starting from 9 o’clock (far left in the graph) and are growing clockwise. The other cluster nodes, which highlight other email features and their relationships, are drawn using a force-directed node placement algorithm. The big nodes in the graphs refer to phone numbers and from addresses. The smaller nodes represent mostly subjects and email addresses found in the reply and from fields, or in the message content.

Figure 8 Examples of other scam campaign structures. (A) Distinct sub-campaigns, connected through a phone number. (B) A diverse iPhone scam campaign. (C) International campaign operated in China, UK, and the Netherlands. (D) Spanish lottery campaign often changing phone numbers and email addresses.

5.1.1 ESKOM campaign
Figure 9 is an example of a 419 scam campaign quite likely orchestrated by the same cyber criminals. This campaign actually consists of two sub-campaigns: firstly, a 1-year fake lottery campaign located in the upper-left part of the graph (Figure 9); secondly, a 1.5-year campaign impersonating ESKOM Holdings, an electricity company in South Africa.

Even though scammers changed the topic of their scam, they kept re-using the very same phone number (represented in the center of the diagram). A noteworthy aspect of this campaign, shared with other campaigns we found, is that it relies on a few from email addresses (i.e., the bigger nodes in the figure). A set of email addresses for reply and body was used in this campaign; however, when the scam topic switched, the set of mailboxes and subjects also changed. Also, we observe that the load of the scam campaign is well distributed over time, and does not exhibit very high peaks on specific dates, hence keeping very low volumes of emails sent. Finally, the from email accounts used by scammers in this case are mostly Gmail accounts. As we have no sender IP information, we could not verify if these were spoofed or not. However, in case these are genuine email accounts, this suggests that scammers use such webmail accounts for long periods of time while staying unnoticed by the email providers.

5.1.2 Sify-Rolex campaign
A similar campaign, presented in Figure 8A, illustrates the roles of email addresses and phone numbers in 419 scam. This campaign, which lasted for 1.5 years, changed topic five times at a frequency of 1 to 2 months, which is visible in the figure by looking at the larger subgroups placed around the circle. These shorter sub-campaigns were most probably run by the same group of scammers as the same phone number was reused over all campaigns. Conversely, we observe that the email addresses and subjects were completely changed as scammers were moving from one campaign to another.

Moreover, these email addresses were often selected to match the campaign topic and subjects, probably to make the scam messages appear authentic.
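The observation that one phone number persists across sub-campaigns while email addresses and subjects change can be checked mechanically. The following Python sketch assumes each email is a dictionary with a hypothetical `topic` label and feature fields such as `phone` and `from`; neither the field names nor the function `stable_values` come from the study.

```python
from collections import defaultdict

def stable_values(emails, feature, topic_key="topic"):
    """Return the values of `feature` that appear in every topic of the cluster.

    emails: list of dicts, each with a topic label and feature fields
    (the layout is an assumption made for this sketch).
    """
    topics_by_value = defaultdict(set)
    all_topics = set()
    for email in emails:
        all_topics.add(email[topic_key])
        topics_by_value[email[feature]].add(email[topic_key])
    # A value is "stable" if it was seen in all sub-campaign topics.
    return {v for v, topics in topics_by_value.items() if topics == all_topics}
```

For a cluster shaped like the one described above, `stable_values(emails, "phone")` would return the reused phone number, while `stable_values(emails, "from")` would return an empty set.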
Figure 9 Lotteries (between 9 and 12 o’clock) and ESKOM Holdings impersonation.
5.1.3 iPhone campaign
While we observed a large number of such easily distinguishable campaigns, we also identified a very different, more ‘chaotic’ type of patterns reflecting a very different modus operandi, as demonstrated with the graphical illustration in Figure 8B. This diagram shows a cluster representing a recent campaign of iPhone-related scams, which lasted for over 1.5 years. The communication infrastructure of the scammers operating this campaign is much more diverse - around 85 unique email accounts were used in this campaign. Moreover, it relies on a large number of ‘disposable’ email addresses, which are typically used only once and seldom reused for a long period of time. As opposed to previous examples, however, the same or quite similar subjects and from email address were often reused, as well as the very same two phone numbers.

5.1.4 Internationally-operated campaigns
Some scam campaigns still lack organization and do not always exhibit very clearly separated patterns, as illustrated by two campaigns depicted in Figures 8C and 8D. Both seem to use fewer email addresses but many different phone numbers that change over time. The first one is an international campaign that started with Chinese phone numbers (top-left part), then moved to UK-based anonymous proxy numbers (top-right part), and ended with Dutch-based phone numbers. Almost all analyzed features changed over time, except a single from address. Interestingly, most of the phone numbers were regularly switched over time.

We also note that both campaigns exploit fake lottery topics. For example, the second one represents a Spanish fake lottery campaign that uses topic-related email addresses (again, in order to look more legitimate), and scammers leverage up to 11 different Spanish phone numbers, which are also regularly changed over time in the campaign. In the middle of the diagram, we can see a larger node representing an email subject that has been reused in a large number of scam emails during this campaign, showing that scammers were probably reusing the same fake lottery email template for all these emails. However, this type of scam cluster illustrates well the challenge of identifying such dynamic campaigns, in which the links between scam emails originating from the same criminal group are constantly changing over time. This supports our choice of using a multi-dimensional clustering tool, which can not only take into account multiple features but can also identify groups of emails that are linked by varying sets of commonalities (i.e., more volatile features). These complex patterns and this volatility in email attributes can also suggest that cyber criminals operate in separate groups, where each group manages its own set of mailboxes and phone number(s); however, these groups are somehow federated and are collaborating with each other, for example by sharing the same email templates, same distribution lists, or exchanging new scam topics.

5.2 Macro-clusters: connecting sub-campaigns
At the next step, we looked at scam campaigns from a broader perspective: by searching for loosely interconnected clusters. The goal was to pinpoint possibly larger-scale campaigns, which are made of weakly interconnected scam operations (i.e. different scam runs). For this purpose, we only used email addresses and phone numbers, since the other attributes are not considered as personally identifiable information. In fact, we looked for clusters that share at least one email address and/or phone number, and used this information to build so-called macro-clusters.

Figure 10 An example of macro-cluster, which consists of 6 different scam campaigns. The nodes laid in clockwise fashion reflect the timeline of the campaigns.

As a result, we identified a set of 845 isolated clusters, and another set of 195 connected clusters, where the latter consists of 62 macro-clusters. The characteristics of the top six macro-campaigns are shown in Table 7. These macro-clusters are particularly interesting as they consist of a set of scam campaigns that appear to be loosely interconnected and therefore could also be orchestrated by the same cybercriminals. In fact, the links between different scam clusters were considered too weak by the clustering algorithm, because of the decision scheme and thresholds set as parameters, and thus these various scam runs were eventually grouped into separate clusters. However, these weak links can be easily recovered, and it is then up to the analyst to investigate how meaningful these interconnections really are. Indeed, we believe that it is much easier for a cyber investigator to start
agentpalchri03@aim.com
enquirydept18@att.net
info_desk@mcom.com
euscs@aim.com
globalworld2009@aim.com payoffice77@aim.com
benjamin.baker.11@ucl.ac.uk
zeffreetaylor@aim.com
Ibreach.Capital@ausi.com
enquirydept20@att.net
Figure 11 An example of macro-cluster, which consists of 14 sub-campaigns. The nodes laid in clockwise fashion reflect the timeline of the
campaigns.
from a set of really meaningful scam clusters, to gradually increase the decision thresholds up to the point where she can decide herself to stop merging data clusters, as it might not be meaningful any more to attribute further different campaigns to the same group due to a lack of evidence.
Macro-clusters usually span long time periods and exhibit various bursts of emails reflecting different campaigns, which use various topics and can even be operated in different countries. An example of a macro-campaign is illustrated in Figure 10, which consists of six different scam campaigns of various sizes that include UK and Nigerian phone numbers.
We can easily distinguish them in the diagram as they appear as separate subgroups, each one having one or two bigger nodes (representing phone numbers reused multiple times) and a tail of connected nodes representing a series of from email addresses.
Table 7 Macro-clusters, mean values of attributes

Number  Number of campaigns  Phones  Mailboxes  Subjects  Duration (years)  Countries  Topics
1       14                   44      677        223       4                 4          Lottery, lost funds, investments
2       43                   163     1,127      463       4                 7          Lottery, banks, diplomats, FBI
3       6                    18      128        80        4                 4          Lottery
4       5                    8       111        51        3.5               2          Packaging, lottery, loans
5       6                    7       201        96        1                 1          Lottery, UPS & WU delivery
6       4                    7       82         33        2                 1          Diplomats, monetary and payments scam
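The Countries column of Table 7 counts the countries of the phone numbers observed in each macro-cluster. A minimal sketch of how such a breakdown could be computed from international calling codes is given below. The function names and the sample numbers are hypothetical, and the sketch takes each prefix at face value, so a forwarding number is attributed to the country that issued it rather than to the place where the call is answered.

```python
from collections import Counter

# International calling codes of the countries named in Figures 12 and 13.
CALLING_CODES = {
    "+225": "Ivory Coast",
    "+228": "Togo",
    "+229": "Benin",
    "+234": "Nigeria",
    "+27": "South Africa",
    "+31": "Netherlands",
    "+32": "Belgium",
    "+34": "Spain",
    "+44": "UK",
    "+62": "Indonesia",
    "+86": "China",
}


def country_of(phone):
    """Return the country for a number written as +<code><subscriber>."""
    # Check three-digit codes (e.g. +234) before two-digit ones (e.g. +31).
    for prefix_len in (4, 3):
        country = CALLING_CODES.get(phone[:prefix_len])
        if country is not None:
            return country
    return "unknown"


def country_breakdown(cluster_phones):
    """Count phone numbers per country for each macro-cluster id."""
    return {
        cluster_id: Counter(country_of(p) for p in phones)
        for cluster_id, phones in cluster_phones.items()
    }


# Hypothetical sample: placeholder numbers, not taken from the dataset.
sample = {
    1: ["+31600000001", "+31600000002", "+2347000000003"],
    2: ["+447000000004", "+22900000005"],
}
for cluster_id, counts in country_breakdown(sample).items():
    print(cluster_id, dict(counts))
```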
Notice that campaigns in this case are well separated with respect to phone numbers and emails, which are dedicated to each campaign (or operation), and the overlaps between campaigns are quite limited. However, there is a small node just in the center that indicates how these are interconnected (through a common from email address). Some contact details were also reused, and we used this reuse to group the campaigns together. Altogether, these campaigns lasted for almost 3.5 years. Over this rather long time period, scammers have sent emails using 51 distinct subjects and 8 different phone numbers. This diversity of the topics suggests that there might be some competition among them, as they try to cover different online trick schemes instead of specializing in a single one.
Another example of a macro-campaign is illustrated in Figure 11, which consists of 14 sub-campaigns that can be more or less identified in the diagram as separate groups revolving around different phone numbers. Each one has a few dedicated phone numbers (44 in total) and its own set of from, reply, and embedded email addresses. However, in this case it appears that scammers were operating these different scam runs sequentially, sometimes reusing certain resources of previous campaigns. Hence, in forensic investigations, it might sometimes be necessary to look at weaker links that may connect together some individuals or criminal groups that could be crime associates.
5.3 Geographical distribution of campaigns
To better understand how scammers operate geographically, we look at the data from a different angle. We have represented the scam email distribution per country for three subsets of our original data in Figure 12: (i) for the complete dataset (light grey), (ii) for scam clusters (dark grey), and (iii) for macro-clusters (black). As we can see, most campaigns identified through scam clusters originate either from African countries or from anonymized UK numbers. The difference between the light grey and dark grey bars in Figure 12 probably indicates a large number of stealthier or isolated scammers, as they do not form any cluster. Those quite likely refer to unorganized, opportunistic scammers, or maybe smaller gangs that operate in a loosely organized fashion.
Another interesting point is that macro-clusters (black bars) cover African and most of the European campaigns, forming bigger clusters potentially pointing to large organized groups of accomplices.
Figure 12 Scam email distribution by country for the largest scam clusters. (Bar chart: number of emails per country of the phone numbers in emails, for all emails, clusters, and macro-clusters. Countries shown: Belgium, Benin, China, Indonesia, Ivory Coast, Netherlands, Nigeria, South Africa, Spain, Togo, UK.)
Figure 13 Top six macro-clusters: country distribution versus phone number count. (Bar chart: number of phones per macro-cluster 1 to 6, broken down by country: Benin, China, Netherlands, Nigeria, South Africa, Spain, UK.)
Organizing such macro-campaigns might be more expensive and difficult to operate, requiring more people to coordinate in various locations and using different
languages. Yet, these macro-campaigns are likely to be much more profitable, especially for the top-level leaders of these gangs.
We next look in more detail into the specific origins of macro-campaigns. Figure 13 shows the country distribution, versus phone number count, for the top 6 macro-campaigns. The last three campaigns are almost exclusively based in Africa, and moreover in only one or two countries, assuming that anonymised UK phone numbers are most probably used by scammers located in Africa and hiding behind these European phone numbers. The first three campaigns are more biased towards Europe, with strong connections to Nigeria and Benin. From an in-depth analysis, we conclude that these groups are competing in several ‘fake lottery’-related scams, with the second group leading the pack and covering most of the countries.
In comparison to the previous study of scam campaigns [5], we observed far fewer UK and Nigerian numbers per group, and confirm that large-scale scam campaigns can be distributed over several continents. Indeed, the largest macro-campaign identified (#2) seems to be orchestrated by people distributed in many countries, if we also assume that mobile phones are rarely used outside their country of origin (as highlighted by previous research [5]).
6 Conclusions
In this study, we empirically demonstrated the existence of 419 scam campaigns and the crucial role that phone numbers and email addresses play in the 419 email scam business, in contrast to other cybercriminal schemes where email addresses are often spoofed and phone numbers rarely used. With the help of a multi-dimensional clustering technique for grouping similar emails, we identified around 1,000 419 scam campaigns.
We presented in detail several examples of 419 scam campaigns, some of which lasted for several years, representing them graphically and discussing their characteristics. Finally, we uncovered the existence of macro-campaigns, groups of loosely linked campaigns that are probably run by the same people. We found that some of these macro-campaigns are geographically spread over several countries, both African and European. We showed that the infrastructure, orchestration, and modus operandi of such campaigns differ from traditional spam campaigns: campaigns are long and scarce, yet they extensively use anonymization tools like webmail accounts for hiding IPs and anonymous proxy phone numbers. Our analysis has unveiled a high diversity in scam orchestration methods, showing that scammer(s) can work on various topics within a campaign, thus probably competing with each other over trendy scam topics.
We believe that our methods and findings could be leveraged to improve investigations of various cyber crime schemes other than scam campaigns as well.
Competing interests
The authors do not have any competing interests.
Acknowledgements
This research was partly supported by the European Commission’s Seventh Framework Programme (FP7 2007-2013) under grant agreement no. 257495 (VIS-SENSE) and no. 257007 (SysSec). The opinions expressed in this paper are those of the authors and do not necessarily reflect the views of the European Commission.
Author details
1 Eurecom, Route des Chappes, Sophia Antipolis, France. 2 Symantec Research Labs, Sophia Antipolis, France.
Received: 30 August 2013 Accepted: 16 December 2013 Published: 22 January 2014
References
1. Definition of Nigerian Scam. http://en.wikipedia.org/wiki/Nigerian_scam. Accessed 3 January 2014
2. Definition of 419 Scam. http://www.419scam.org/419scam.htm. Accessed 3 January 2014
3. J Buchanan, AJ Grant, Investigating and prosecuting Nigerian Fraud. United States Attorneys’ Bulletin. 49(6) (2001)
4. 419 Scam – fake lottery Fraud phone directory. http://www.419scam.org/419-by-phone.htm. Accessed 3 January 2014
5. A Costin, J Isacenkova, M Balduzzi, A Francillon, D Balzarotti, The role of phone numbers in understanding cyber-crime, in 11th International Conference on Privacy, Security and Trust (PST 2013) (Tarragona, Catalonia, 10-12 July, 2013)
6. A Pathak, F Qian, YC Hu, ZM Mao, S Ranjan, Botnet spam campaigns can be long lasting: evidence, implications, and analysis, in SIGMETRICS ’09 Proceedings of the Eleventh International Joint Conference on Measurement and Modeling of Computer Systems Seattle, WA, 15-19 June (ACM, New York, 2009), pp. 13–24
7. DS Anderson, C Fleizach, S Savage, GM Voelker, Spamscatter: Characterizing internet scam hosting infrastructure. PhD thesis, University of California, San Diego 2007
8. Microsoft Security Intelligence Report (2008-2012), http://www.microsoft.com/security/sir/archive/default.aspx. Accessed 3 January 2014
9. J Isacenkova, O Thonnard, A Costin, D Balzarotti, A Francillon, F Eurecom, Inside the SCAM jungle: a closer look at 419 scam email operations, in International Workshop on Cyber Crime (IWCC 2013) (The Westin Hotel, San Francisco, CA, 24 May 2013)
10. M Cova, C Leita, O Thonnard, AD Keromytis, M Dacier, An analysis of rogue AV campaigns, in Proceedings of the 13th International Conference on Recent Advances in Intrusion Detection (Springer-Verlag, RAID’10, Berlin, Heidelberg, 2010), pp. 442–463. [http://portal.acm.org/citation.cfm?id=1894166.1894196]
11. O Thonnard, M Dacier, A strategic analysis of spam botnets operations, in Proceedings of the 8th Annual Collaboration, Electronic messaging, Anti-Abuse and Spam Conference, CEAS ’11 (ACM, New York, 2011), pp. 162–171
12. O Thonnard, L Bilge, G O’Gorman, S Kiernan, M Lee, Industrial espionage and targeted attacks: understanding the characteristics of an escalating threat, in RAID, Amsterdam, The Netherlands, 12-14 September (Springer, New York, 2012), pp. 64–85
13. C Tive, 419 Scam: Exploits of the Nigerian Con Man. (iUniverse, New York, 2006)
14. F Stajano, P Wilson, Understanding scam victims: seven principles for systems security. Commun. ACM. 54(3), 70–75 (2011)
15. C Herley, Why do Nigerian Scammers say they are from Nigeria?, in Proceedings of the Workshop on the Economics of Information Security (Berlin, Germany, 25-26 June 2012)
16. J Oboh, Y Schoenmakers, Nigerian advance fee Fraud in transnational perspective. Policing Multiple Commun. 15, 235 (2010)
17. Y Gao, G Zhao, Knowledge-based information extraction: a case study of recognizing emails of Nigerian frauds, in Policing multiple communities, NLDB’05 (Springer-Verlag, Berlin, Heidelberg, 2005), pp. 161–172
18. C Graham, J Nicholas, Method and system for employing phone number analysis to detect and prevent spam and e-mail scams. U.S. Patent 7917655, 29 March 2011. http://www.patentlens.net/patentlens/patent/US_7917655/en/
19. O Thonnard, A multi-criteria clustering approach to support attack attribution in cyberspace. PhD thesis, École Doctorale d’Informatique, Télécommunications et Électronique de Paris, 2010
20. G Kondrak, N-gram similarity and distance, in Proceedings of the 12th Conference on String Processing and Information Retrieval Buenos Aires, 2-4 November (Springer-Verlag Berlin, Heidelberg, 2005), pp. 115–126
21. LA Zadeh, A computational approach to fuzzy quantifiers in natural languages. Comput. & Math. Appl. 9, 149–184 (1983)
22. RR Yager, Quantifier guided aggregation using OWA operators. Int. J. Intell. Syst. 11, 49–73 (1996). http://dx.doi.org/10.1002/(SICI)1098-111X(199601)11:1<49::AID-INT3>3.0.CO;2-Z
23. M Grabisch, C Labreuche, A decade of application of the Choquet and Sugeno integrals in multi-criteria decision aid. Ann. Operations Res (2009). http://dx.doi.org/10.1007/s10479-009-0655-8
24. V Torra, Y Narukawa, Modeling Decisions: Information Fusion and Aggregation Operators. (Springer, Berlin, 2007)
25. M Sugeno, Theory of fuzzy integrals and its applications. PhD thesis, Tokyo Institute of Technology, 1974
26. G Choquet, Theory of capacities. Annales de l’Institut Fourier. 5, 131–295 (1953)
27. M Grabisch, k-order additive discrete fuzzy measures and their representation. Fuzzy Sets Syst. 92(2), 167–189 (1997)
28. M Grabisch, C Labreuche, A decade of application of the Choquet and Sugeno integrals in multi-criteria decision aid. Ann. Operations Res. 175(1) (2010)
29. L Shapley, A value for n-person games, in Contributions to the Theory of Games, Vol. II, Volume 28 of Annals of Mathematics Studies, ed. by H Kuhn, A Tucker (Princeton University Press, Princeton, NJ, 1953), pp. 307–317
30. T Murofushi, S Soneda, Techniques for reading fuzzy measures (iii) Interaction index, in Proceedings of the 9th Fuzzy Systems Symposium (Sapporo, Japan, May 1993), pp. 693–696
31. M Halkidi, Y Batistakis, M Vazirgiannis, On clustering validation techniques. J. Intell. Inf. Syst. 17(2-3), 107–145 (2001)
32. F Boutin, M Hascoet, Cluster validity indices for graph partitioning, in 8th International Conference on Information Visualization (IV) (London, 14-16 July 2004)
33. V Torra, The weighted {OWA} operator. Int. J. Intell. Syst. 12(2), 153–166 (1997)
34. Stop Words. http://en.wikipedia.org/wiki/Stop_words. Accessed 3 January 2014
doi:10.1186/1687-417X-2014-4
Cite this article as: Isacenkova et al.: Inside the scam jungle: a closer look at 419 scam email operations. EURASIP Journal on Information Security 2014 2014:4.
