Abstract— Phishing is a threat in which users are sent fake emails that urge them to click a link (URL) that leads to a phisher's website. At that site, users' account information could be lost. Many technical and non-technical solutions have been proposed to fight phishing attacks. To stop such attacks, it is important to select the correct feature(s) for detecting phishing emails. Thus, the current work presents a new method for selecting the more efficient feature for detecting phishing emails. The best features can be extracted from the email's body (content). Keywords and URLs are known features that can be extracted from the email's body. These two features are highly relevant to three general aspects of email: the email's sender, the email's content, and the email's receiver. In this work, three effectiveness criteria were derived from these aspects. These criteria were used to evaluate the efficiency of the Keywords and URLs features in detecting phishing emails by measuring their Effectiveness Metric (EM) values. The experimental results, obtained by analyzing more than 8000 ham (legitimate) and phishing emails from two different datasets, show that relying on the URLs feature to detect phishing emails will predominantly give more precise results than relying on the Keywords feature.

Keywords— phishing, Keywords feature, URLs feature, ham emails, phishing emails, effectiveness metric.

I. INTRODUCTION

Phishing is an attack that makes Internet users reveal their personal information to an unauthorised party. Most phishing attacks start when users receive fake emails asking them to click a URL (link) to update their account information. Once clicked, this URL takes the user to a fake website where he/she will most probably lose control over his/her account information. According to an Anti-Phishing Working Group report, the number of URLs used to host phishing attacks increased from 164,023 in the first quarter of 2012 to 175,229 in the second quarter of the same year [1].

To detect phishing emails, it is important to choose the right detection feature(s). Across the various available anti-phishing solutions, a considerable number of features have been suggested to best classify ham (legitimate) and phishing emails. However, in many cases these features are inappropriately chosen, because they are selected based on the author's intuition about their effectiveness in the email classification process [2].

This work presents a method for choosing the most efficient feature for detecting phishing emails. The importance of the selected feature is determined by calculating its Effectiveness Metric (EM) value based on three criteria, which are derived from and related to three general aspects of email: the email's sender, the email's content, and the email's receiver.

The rest of this paper is organized as follows. In Section II, we provide a background on feature selection propositions. Section III describes the feature selection process used in this work. This is followed by the process of calculating the features' EM values (Section IV). In Section V, we discuss the most important results obtained in the current work. Finally, the conclusion and suggestions for future work are presented in Section VI.

II. BACKGROUND ON FEATURE SELECTION PROPOSITIONS

An email generally consists of two parts, a header and a body. The header is a set of structured fields such as from, to, subject, and routing information. The body is the actual content of the email, which is the part users are most concerned with and interact with. The main features for detecting phishing emails can be extracted from these two parts of the email. Such features are identified in [3] and presented in Table I. A detailed description of these features can be found in many studies such as [2],[4].

TABLE I. Main categories of phishing emails detection features

Email's Part                                      Feature / Set of Features
------------------------------------------------  -------------------------
features extracted from email's header            subject-based features
                                                  sender-based features
                                                  behaviour-based features
features extracted from email's body (content)    URL-based features
                                                  keyword-based features
                                                  form-based features
                                                  script-based features
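As an illustration of the header/body split described in Section II, the following minimal Python sketch separates a raw email into its header fields and body and then pulls the two body features discussed in this work, URLs and keywords. It is not the extraction procedure used in this paper: the function names, the URL regular expression, and the short keyword list are all assumptions introduced here for illustration only.

```python
import re
from email import message_from_string

# Assumed, simplified URL pattern; real phishing URLs may need a richer parser.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)

# Hypothetical keyword list, chosen only for this example.
SAMPLE_KEYWORDS = ("update", "verify", "account")


def split_email(raw):
    """Return (header, body): header fields as a dict, body as plain text."""
    msg = message_from_string(raw)
    # Header part: structured fields such as From, To, Subject.
    header = {name.lower(): value for name, value in msg.items()}
    # Body part: join the text/plain parts if the message is multipart.
    if msg.is_multipart():
        parts = [
            part.get_payload()
            for part in msg.walk()
            if part.get_content_type() == "text/plain"
        ]
        body = "\n".join(parts)
    else:
        body = msg.get_payload()
    return header, body


def extract_body_features(body):
    """Collect URL-based and keyword-based features from the email body."""
    urls = URL_PATTERN.findall(body)
    lowered = body.lower()
    keywords = [word for word in SAMPLE_KEYWORDS if word in lowered]
    return {"urls": urls, "keywords": keywords}
```

Calling `split_email` on a raw message and passing the resulting body to `extract_body_features` yields one dictionary per email, which could then feed whatever scoring step is applied to each feature.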
REFERENCES
[1] http://docs.apwg.org/reports/apwg_trends_report_q2_2012.pdf
[2] Toolan, F. & Carthy, J. 2010. Feature selection for spam and phishing
detection. In: eCrime Researchers Summit (eCrime). IEEE, 1-12.
[3] Hamid, I. R. A. & Abawajy, J. 2011. Hybrid feature selection for
phishing email detection. Algorithms and architectures for parallel
processing. Springer.