
Distinguishing Attacks from Legitimate Authentication Traffic at Scale


Paras Joshi, Parth Arora, Abhisek Mohapatra
2020201028, 2020201054, 2020201020

1 Introduction
In today’s world, passwords are the most commonly used means of authentication
and security: they are easy to implement, convenient and friendly for users,
and simple to maintain and remember.
There are mainly two types of password-based attacks: online attacks and
offline attacks.
Examples of offline password attacks include eavesdropping on the
communication channel and recording or manipulating the conversations taking
place on it. Offline attacks are generally faster than online attacks, since
no online authentication server is involved, which allows faster access to
data to and from the database. Offline attacks are also computationally faster
because the password-cracking mechanism runs offline; this usually happens in
a depth-first attack, where a single user is targeted at a time.
Online password attacks can be broadly classified into targeted
(depth-first) and untargeted (breadth-first) attacks. In a targeted attack,
most guesses are run against a small number of accounts that are of
particular interest to the attacker. The email or social networking accounts
of celebrities, for example, are obvious targets. In an untargeted attack,
guesses are spread across a huge number of accounts. Here the attacker's
interest is not in any particular individual so much as in gaining
information or access from whichever accounts respond. This can be seen in
the way malicious or spam emails are sent to accounts that are prominent or
in the limelight rather than to those that are hidden or securely filtered.

2 Abstract
This paper shows how to accurately estimate the odds that an observation x
indicates that a request is malicious. Our main assumptions are that
successful malicious logins are a small fraction of the total, and that the
distribution of x in the admissible traffic is static, stationary, or very
slowly changing. From these we show how to estimate the ratio of bad-to-good
traffic among any set of requests. This leads to two questions: how to find
the subsets of traffic that are mostly admissible, and how to use the
least-attacked subsets to estimate the distribution of x over the admissible
data; the odds ratio is then calculated from the resulting estimates.
A sensitivity analysis afterwards shows that even if we fail to find a subset
with little attack traffic, our estimate of the odds is quite robust.

2.1 Absence of Labels


We are dealing with a large number of incoming authentication requests that
are generally an unknown mixture of benign and malicious traffic. There will
be failures and successes among both the legitimate and malicious traffic. It
would be ideal to use supervised learning algorithms given a labelled training
set. However, there is no reasonable way to obtain reliable labels to
categorize each incoming authentication request, since for something like
unauthorized login attempts we never really know the ground truth.
The inability to get labels obviously rules out supervised learning approaches.
It also means that we will be unable to compute performance metrics such as ac-
curacy, F1-score, true and false positives, precision and recall, etc. This greatly
complicates the question of evaluation. Since they lack labels, unsupervised
approaches to classification must make assumptions about the classes and how
they differ. The better the assumptions match reality, of course the better the
performance will be, but without labels we can’t directly measure how well an
assumption holds.
In order to deal with the unlabelled data set, we need to make some reasonable
assumptions about the network traffic. However, determining this set of
assumptions poses a huge challenge, since it requires knowing things like the
degree to which attack and benign traffic originate from the same addresses,
and whether an attacker controls only a small number of IP addresses or a
large pool of them. It is very difficult to find a reliable pattern for
detecting attack traffic in the incoming stream, because attacker behaviour is
highly unpredictable, inconsistent, notoriously volatile, and subject to
change.
We now find ourselves in a dilemma. Without labels, we have no choice but to
rely on assumptions for our block decisions. However, without labels we also
can’t compute the metrics needed to test how well those assumptions are doing.
Hence, anything we might learn from a particular set of logs can’t necessarily

be relied upon elsewhere.
Considering the above challenges, in this paper we make only one assumption
about attack traffic:

Assumption (A1): Malicious logins are a small fraction of total logins. This
should be inherently true under guessing attacks. First, any service for which
it is not true (i.e., where admissible logins do not outnumber malicious
logins by a large proportion) would offer little utility to its users. Second,
suppose the attacker's per-guess success rate is at best about 0.1%; if
admissible requests succeed 95% of the time and traffic is a 50/50 mix of
benign and malicious, then malicious logins are outnumbered by benign ones by
about 950:1.
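As a quick check of this arithmetic, the following Python snippet reproduces
the 950:1 figure using the illustrative numbers above (a 50/50 traffic mix, a
95% benign success rate, and a 0.1% per-guess attacker success rate); the
values are assumptions from the text, not measurements:

```python
# Sanity check of the 950:1 claim under the stated (illustrative) assumptions.
benign_share, malicious_share = 0.5, 0.5   # 50/50 traffic mix
benign_success_rate = 0.95                 # admissible requests succeed 95% of the time
attacker_success_rate = 0.001              # 0.1% per-guess success rate

benign_logins = benign_share * benign_success_rate          # 0.475
malicious_logins = malicious_share * attacker_success_rate  # 0.0005
print(benign_logins / malicious_logins)                     # 950.0
```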

2.2 Our Attack on the Problem


To handle this problem, let x be a categorical feature observed with each
request. In order to decide whether a request is malicious, we want the odds
that it is, which can be calculated by the formula below:

\[
\frac{P(\text{mal} \mid x)}{P(\overline{\text{mal}} \mid x)}
= \frac{P(x \mid \text{mal})}{P(x \mid \overline{\text{mal}})}
\cdot \frac{P(\text{mal})}{P(\overline{\text{mal}})}
= \Theta(x) \cdot \Psi \tag{1}
\]

This might allow us to decide to lock an account, block requests from an IP
address, or take other defensive action.
Since x is a categorical feature whose observed distribution is a mixture of
its distributions in the benign and malicious traffic, we can write $P(x)$ as
follows:

\[
P(x) = \alpha \cdot P(x \mid \overline{\text{mal}}) + (1-\alpha) \cdot P(x \mid \text{mal})
\]

where $0 \le \alpha \le 1$. The problem is that we don't know
$P(x \mid \text{mal})$, $P(x \mid \overline{\text{mal}})$, or $\alpha$. While
the malicious traffic is non-stationary, we expect the benign traffic,
$P(x \mid \overline{\text{mal}})$, as the aggregate of the independent actions
of a very large number of users, to vary slowly, if at all. We begin by
calculating $P(\text{mal})/P(\overline{\text{mal}})$, or $\Psi$:

\[
\Psi = \frac{P(\text{mal})}{P(\overline{\text{mal}})} = \frac{1-\alpha}{\alpha}
\]

Using this deduction we can identify subsets of the data where the malicious
traffic is very small and the traffic is mostly clean, i.e., $\alpha \approx 1$.
In such subsets the observed distribution closely approximates that of the
benign traffic: $P(x) \approx P(x \mid \overline{\text{mal}})$.

With this observation we can estimate $\Theta(x)$ in equation (1) as:

\[
\Theta(x) = \frac{P(x \mid \text{mal})}{P(x \mid \overline{\text{mal}})}
= \frac{P(x) - \hat{\alpha} \cdot \hat{P}(x \mid \overline{\text{mal}})}{(1-\hat{\alpha}) \cdot \hat{P}(x \mid \overline{\text{mal}})} \tag{2}
\]

where $\hat{\ }$ denotes an estimated value. One thing to note here: to
simplify the presentation, we describe our approach for a single categorical
feature, and we assume that the distribution of features in the benign
traffic is independent (A2) and slowly varying:

\[
P(x, y \mid \overline{\text{mal}}) = P(x \mid \overline{\text{mal}}) \cdot P(y \mid \overline{\text{mal}}) \tag{3}
\]

Coming back to the equation (2), we represent the two component ratios of the
likelihood ratio test as Ψ and Θ(x). Ψ is the ratio of bad-to-good traffic (often
called the pre-observation odds); Θ(x) is the ratio of how common x is in the
malicious traffic over how common it is in legitimate traffic (often called the
likelihood ratio). The product, Θ(x) · Ψ, is often called the post-observation
odds.
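To make the mechanics concrete, here is a minimal Python sketch of how the two
ratios combine into post-observation odds and how odds map back to a
probability; the function names and numeric values are our own illustrative
assumptions, not from the paper:

```python
# Minimal sketch: combine the pre-observation odds (psi) with the likelihood
# ratio (theta) to get post-observation odds, then convert odds to probability.

def post_observation_odds(theta_x: float, psi: float) -> float:
    """Post-observation odds that a request is malicious, given
    theta_x = P(x | mal) / P(x | benign) and psi = P(mal) / P(benign)."""
    return theta_x * psi

def odds_to_probability(odds: float) -> float:
    """Convert odds o = P/(1-P) back to the probability P."""
    return odds / (1.0 + odds)

psi = 0.02       # hypothetical: 1 malicious request per 50 benign ones
theta_x = 40.0   # hypothetical: x is 40x more common in malicious traffic

odds = post_observation_odds(theta_x, psi)  # 0.8
print(odds_to_probability(odds))            # ~0.44: worth a closer look
```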

In addition, we make the following assumptions. Malicious logins are a small
fraction of total logins (A1), which follows from the fact that legitimate
users succeed at least 95% of the time while attackers fail at least 99% of
the time. Moreover, we assume that the rate of login failures from benign
users is independent of feature categories (A3), which means that the benign
failure rate should be approximately constant when users are grouped by
features such as browser version or IP address pool. This assumption does not
hold for malicious traffic.

3 Model Estimates of the Test Statistics


A. Estimating $P(\text{mal})/P(\overline{\text{mal}})$

An authentication server handles login requests, consisting of L successful
logins and F failed attempts. Initially we know nothing about the traffic on
the network, so there is an unknown mix of legitimate and malicious traffic.
Let $L_b$ denote benign logins and $L_m$ malicious logins, so the combined
number of logins is

\[
L = L_b + L_m
\]

Similarly, let $F_b$ denote benign fails and $F_m$ malicious fails, so the
combined number of fails is

\[
F = F_b + F_m
\]

To estimate the last term in the likelihood ratio test, the ratio of bad
traffic to good for subset k is:

\[
\Psi(k) = \frac{P(\text{mal})(k)}{P(\overline{\text{mal}})(k)} = \frac{1-\alpha(k)}{\alpha(k)} = \frac{L_m + F_m}{L_b + F_b}
\]
Consider the ratio of fails to logins. Under our assumption A1,
$L_m \ll L_b$ (i.e., $L_m/L_b \approx 0$), we get:

\[
\frac{F}{L} = \frac{F_b + F_m}{L_b + L_m} = \frac{F_b/L_b + F_m/L_b}{1 + L_m/L_b} \approx \frac{F_b}{L_b} + \frac{F_m}{L_b}
\]
So if legitimate requests succeed 95% of the time, attackers succeed on 0.1%
of guesses, and traffic is a 50/50 mix of malicious and benign, then
$1 + L_m/L_b = 1.0011 \approx 1$.
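A two-line check of that number, under the same illustrative mix:

```python
Lb = 0.5 * 0.95     # benign logins per unit of traffic
Lm = 0.5 * 0.001    # malicious logins per unit of traffic
print(1 + Lm / Lb)  # 1.00105..., i.e., ~1.0011, so the denominator is ~1
```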
Benign traffic is the aggregate of a large number of users (hundreds of
millions) acting independently. For a large enough number of requests,
$F_b/L_b \approx \text{const}$. This is because login failures from legitimate
users happen independently at some rate p, through typos, misremembered
passwords or usernames, etc., which we assume occur independently across the
population. So a total of $L_b + F_b$ benign requests should result in
$p \cdot (L_b + F_b) = F_b$ failures. The ratio of benign fails to logins
should therefore be fairly constant over large enough collections of
authentication requests. Hence, if we divide our data into large-enough
subsets of requests, we get for any given subset:

\[
\frac{F(k)}{L(k)} \approx c + \frac{F_m(k)}{L_b(k)}
\]

where c = p/(1 − p) and k is just an index over subsets. The subsets can
be any grouping of the login request data so long as they have enough entries
to ensure that the confidence interval is small relative to p. For example, the
subsets can have request data from particular time slots, requests that come
from particular IP-address ranges, requests with particular user-agent strings,
or requests against particular sets of accounts. Combinations of those things are
also possible, e.g., we could form a subset of request data during a particular
time slot, from a particular range of IP addresses against a certain collection of
accounts. We’ll take subsets that are grouped by time intervals (e.g., hour or
day) as well as by features such as browser and IP address; to avoid unnecessarily
encumbering the notation we don’t explicitly add an index for time.
We are assuming that the benign failure rate p is independent across features;
that is, $F_b(k)/L_b(k) \approx \text{const}$.
We expect the benign traffic from one particular range of IP addresses to fail
at the same rate as any other. We are not assuming the same of the malicious
traffic, and we are not assuming the same of the observed traffic. Any excess
in the observed ratio $F(k)/L(k)$ above the baseline is due to malicious
traffic. That is, if a subset has no malicious traffic, $F_m(k) = 0$ and then
$F(k)/L(k) = c = p/(1-p)$. Further, since $F_m(k)$ must be non-negative,
malicious traffic can cause increases in $F(k)/L(k)$ above the baseline value
c, but cannot cause decreases. Given c, we can estimate the malicious traffic
observed in a particular subset:

\[
F_m(k) = L_b(k) \cdot \left(\frac{F(k)}{L(k)} - c\right) \approx F(k) - c \cdot L(k)
\]

So we could estimate the amount of malicious traffic in a time slot, or from a
range of IP addresses, or from a particular user-agent, if we can determine
this single number $c = p/(1-p)$. This value should also be very stable over
time, as we do not expect the rate at which users mistype and misremember
passwords to change significantly. Hence, we have a potentially accurate
estimate of the amount of malicious attack traffic if we can determine this
single constant c, which, at least in principle, need only be measured once.
Also, the larger the subsets we use, the smaller the confidence interval, so
there is a significant advantage to having large subsets. The ratio of bad
traffic to good for any subset, which we need in the likelihood ratio test,
can now be estimated as:

\[
\frac{P(\text{mal})(k)}{P(\overline{\text{mal}})(k)} \approx \frac{F_m(k)}{L_b(k) + F_b(k)} = \frac{F(k) - c \cdot L(k)}{L(k)\,(1+c)} \tag{4}
\]
This involves only quantities that we observe directly (i.e., F(k) and L(k)) and
p = c/(1 + c) (to be estimated).
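The following Python sketch implements the estimate of $F_m(k)$ and equation
(4) for a single subset; the counts and the baseline c are hypothetical, and
the clamp at zero is our own guard for noisy subsets:

```python
# Sketch of equation (4): estimate the bad-to-good ratio for a subset k from
# the observed fail/login counts F(k), L(k) and the constant c = p/(1-p).

def estimate_malicious_fails(F_k: int, L_k: int, c: float) -> float:
    """Fm(k) ~ F(k) - c * L(k); clamped at 0 since a count cannot be negative."""
    return max(0.0, F_k - c * L_k)

def estimate_bad_to_good(F_k: int, L_k: int, c: float) -> float:
    """Psi(k) = P(mal)(k)/P(benign)(k) ~ (F(k) - c*L(k)) / (L(k) * (1 + c))."""
    return estimate_malicious_fails(F_k, L_k, c) / (L_k * (1.0 + c))

# Hypothetical subset: 10,000 logins, 900 fails, benign baseline c = 0.05.
print(estimate_bad_to_good(F_k=900, L_k=10_000, c=0.05))  # ~0.038
```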

B. Estimating $c = p/(1-p)$ and $P(x \mid \overline{\text{mal}})$

To get an accurate estimate of the constant $p/(1-p)$, we estimate the rate at
which users enter either their password or username incorrectly. Several
observations are relevant:

• It is likely that a site where users need to log in only on a yearly basis
or so will have a much higher benign failure rate than a social networking or
email site where a majority log in every day.
• Sites that force mandatory password expiration every 90 days likely have
higher fail rates than those that do not.
• Also, we can expect benign failures that come from mobile clients to occur
at a very different rate from clients that have a conventional keyboard.
• The mix of traffic from mobile to conventional clients is probably very site
dependent (e.g., many people probably access Facebook or Twitter using apps on
their phone, while it is likely that few people access a government tax site
in this way).

• Finally, users may be more reluctant to allow a browser to auto-fill
credentials for a bank site than for a video-streaming service or online
newspaper; this might make password submission at the latter sites more
automated and thus less subject to error.

For all of these reasons it seems preferable to estimate p on a per-site basis
and to update this estimate periodically. Malicious traffic can increase
$F(k)/L(k)$ above c but not decrease it. Thus, a simple upper-bound estimate
for our constant is:

\[
\hat{c} = \min_k \frac{F(k)}{L(k)} \ge c \tag{5}
\]

Here the minimization is taken over all subsets large enough to satisfy our
assumption that the confidence interval is small enough to ensure that
$F_b/L_b$ is approximately constant.
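A minimal Python sketch of equation (5), assuming subsets are supplied as
(fails, logins) pairs; the size threshold MIN_REQUESTS and the hourly counts
are invented for illustration:

```python
# Sketch of equation (5): c-hat is the minimum fail/login ratio over all
# subsets that are large enough for the ratio to be stable.

MIN_REQUESTS = 5_000  # assumed floor keeping the confidence interval small

def estimate_c(subsets: dict[str, tuple[int, int]]) -> float:
    """subsets maps a subset key to (F(k), L(k)); returns c-hat = min F(k)/L(k)."""
    ratios = [F / L for F, L in subsets.values() if F + L >= MIN_REQUESTS]
    return min(ratios)

# Hypothetical hourly subsets (fails, logins): the quietest hour drives c-hat.
hourly = {"02:00": (310, 6_000), "09:00": (900, 11_000), "14:00": (1_400, 9_000)}
c_hat = estimate_c(hourly)
p_hat = c_hat / (1.0 + c_hat)  # estimated benign per-request failure rate
print(c_hat, p_hat)            # ~0.0517, ~0.0491
```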
Once we have this we can estimate $\hat{p} = \hat{c}/(1+\hat{c})$. Call $k_0$
the index of the subset of our data that achieves the minimum; note that $k_0$
is the subset where $F_m(k_0)/L_b(k_0)$ is minimized. It is easy to show that
$F_m(k)/L_b(k)$ is monotonically decreasing in $\alpha(k)$, and thus this
subset contains the least attack traffic among all the subsets. Hence, in
finding our estimate $\hat{c}$ we have also found an estimate for the
distribution of x over the benign traffic:

\[
P(x \mid \overline{\text{mal}}) \approx P(x)(k_0) \tag{6}
\]

We have dropped the k-index since, by assumption A3, the distribution of x in
the benign traffic is stationary and independent of features such as browser
version, IP address, and time. If we over-estimate c then we wrongly count
some malicious traffic as benign. This means that we under-estimate the
numerator and over-estimate the denominator in
$P(\text{mal})/P(\overline{\text{mal}})$. Thus our estimate is conservative,
in the sense of erring toward considering traffic benign rather than
malicious.
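A sketch of how the estimate in equation (6) might be computed in practice:
treat the least-attacked subset $k_0$ as a stand-in for benign traffic and
take the empirical distribution of the feature inside it. The browser-family
counts below are invented:

```python
# Sketch of equation (6): P(x | benign) estimated as the empirical
# distribution of a categorical feature inside the least-attacked subset k0.

from collections import Counter

def benign_feature_distribution(feature_values: list[str]) -> dict[str, float]:
    """Empirical distribution of a categorical feature in subset k0."""
    counts = Counter(feature_values)
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()}

# Hypothetical browser-family observations from the quietest subset:
k0_browsers = ["chrome"] * 700 + ["firefox"] * 200 + ["other"] * 100
print(benign_feature_distribution(k0_browsers))
# {'chrome': 0.7, 'firefox': 0.2, 'other': 0.1}
```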

C. Estimating $P(x \mid \text{mal})/P(x \mid \overline{\text{mal}})$

We need to evaluate $P(x \mid \text{mal})/P(x \mid \overline{\text{mal}})$,
since we need it for the likelihood ratio test.
Now, let us assume that the chosen subset has proportions $(1-\alpha(k))$ of
attack traffic and $\alpha(k)$ of benign traffic. So we have:

\[
P(x)(k) = \alpha(k) \cdot P(x \mid \overline{\text{mal}}) + (1-\alpha(k)) \cdot P(x \mid \text{mal})(k)
\]

Now, we can drop the k-index on $P(x \mid \overline{\text{mal}})$ since it is
constant across subsets. From the bad-to-good ratio we obtain $\alpha(k)$ via:

\[
\frac{P(\text{mal})(k)}{P(\overline{\text{mal}})(k)} = \frac{1-\alpha(k)}{\alpha(k)} \tag{7}
\]

So, we can now calculate $\alpha(k)$ for any subset, assuming that we have
first estimated $P(\text{mal})(k)/P(\overline{\text{mal}})(k)$. Now, the
malicious portion of the traffic is:

\[
P(x \mid \text{mal})(k) = \frac{P(x)(k) - \alpha(k) \cdot P(x \mid \overline{\text{mal}})}{1 - \alpha(k)}
\]

The benign traffic in this particular subset is $P(x \mid \overline{\text{mal}})$
weighted by $\alpha(k)$. So now the overall likelihood ratio is:

\[
\Theta(x)(k) = \frac{P(x \mid \text{mal})(k)}{P(x \mid \overline{\text{mal}})}
= \frac{P(x)(k) - \hat{\alpha}(k) \cdot \hat{P}(x \mid \overline{\text{mal}})}{(1-\hat{\alpha}(k)) \cdot \hat{P}(x \mid \overline{\text{mal}})} \tag{8}
\]

where $\hat{\alpha}(k)$ is an estimate of $\alpha(k)$.
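Equations (7) and (8) then reduce to a few lines of code. In this sketch,
$\hat{\Psi}(k)$ is assumed to come from equation (4); all numbers are
illustrative:

```python
# Sketch of equations (7) and (8): recover alpha-hat(k) from the estimated
# bad-to-good ratio Psi-hat(k), then form the likelihood ratio Theta-hat.

def alpha_from_psi(psi_k: float) -> float:
    """Psi(k) = (1 - alpha(k)) / alpha(k)  =>  alpha(k) = 1 / (1 + Psi(k))."""
    return 1.0 / (1.0 + psi_k)

def theta(p_x_k: float, p_x_benign: float, alpha_k: float) -> float:
    """Equation (8): Theta(x)(k) =
    (P(x)(k) - alpha(k) * P(x|benign)) / ((1 - alpha(k)) * P(x|benign))."""
    return (p_x_k - alpha_k * p_x_benign) / ((1.0 - alpha_k) * p_x_benign)

psi_k = 0.04                     # hypothetical bad-to-good ratio from eq. (4)
alpha_k = alpha_from_psi(psi_k)  # ~0.9615
# x observed in 8% of this subset but only 2% of benign traffic:
print(theta(p_x_k=0.08, p_x_benign=0.02, alpha_k=alpha_k))  # ~79
```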

4 Overall Algorithm
First, we estimate the quantities associated with the benign traffic, since
they are mostly stable and do not need to be re-estimated repeatedly. From the
stream of incoming requests, we select a time interval and divide the incoming
data stream into S subsets using specific grouping strategies. The overall
algorithm can be summarized as:
1. In the first step of the algorithm, we calculate the estimated ratio of
benign fails to benign logins, i.e., $\hat{c} = \min_k F(k)/L(k)$.

2. Using $\hat{c}$, we calculate the estimate $\hat{p} = \hat{c}/(1+\hat{c})$,
which is the benign failure rate.

3. We estimate $\hat{P}(x \mid \overline{\text{mal}})$ with the help of equation (6).


4. Using equation (4), for a received request that belongs to subset k
computed over the time interval, we estimate
$\hat{\Psi}(k) = \hat{P}(\text{mal})(k)/\hat{P}(\overline{\text{mal}})(k)$.
5. In the next step, we calculate $\hat{\alpha}(k)$ using equation (7).

6. Next, we estimate
$\hat{\Theta}(x)(k) = \hat{P}(x \mid \text{mal})(k)/\hat{P}(x \mid \overline{\text{mal}})$
for each value of the categorical feature x using equation (8).
7. Finally, we calculate the post-observation odds as the product
$\hat{\Theta}(x)(k) \cdot \hat{\Psi}(k)$.
This allows calculation of the likelihood that any particular request is
malicious. There are many different ways of acting on this information, such
as denying those requests for which the likelihood exceeds a threshold value.
We can also keep track of IP addresses from which the odds of requests being
malicious are high, and accounts with high average odds can be flagged for
audit or subjected to additional authentication measures such as CAPTCHA
challenges.
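Putting the seven steps together, a minimal end-to-end sketch might look as
follows; subset keys, feature values, and all counts are hypothetical, and
this is an illustration of the flow rather than the paper's implementation:

```python
# End-to-end sketch of the seven steps above, on invented data.

from collections import Counter

def run_pipeline(subsets, k, x):
    """subsets: key -> (F, L, feature_values); scores feature value x seen in
    subset k, returning the post-observation odds Theta(x)(k) * Psi(k)."""
    # Steps 1-2: c-hat = min_k F(k)/L(k); p-hat = c-hat/(1+c-hat) if needed.
    c_hat, k0 = min((F / L, key) for key, (F, L, _) in subsets.items())
    # Step 3: benign feature distribution from the least-attacked subset k0.
    counts = Counter(subsets[k0][2])
    total = sum(counts.values())
    p_benign = {v: n / total for v, n in counts.items()}
    # Step 4: Psi-hat(k) from equation (4).
    F_k, L_k, xs = subsets[k]
    psi_k = max(0.0, F_k - c_hat * L_k) / (L_k * (1.0 + c_hat))
    # Step 5: alpha-hat(k) from equation (7).
    alpha_k = 1.0 / (1.0 + psi_k)
    # Step 6: Theta-hat(x)(k) from equation (8).
    p_x_k = Counter(xs)[x] / len(xs)
    p_x_benign = p_benign.get(x, 1e-9)  # tiny floor if x absent from k0
    theta_x = (p_x_k - alpha_k * p_x_benign) / ((1.0 - alpha_k) * p_x_benign)
    # Step 7: post-observation odds.
    return theta_x * psi_k

# Hypothetical data: a quiet subset (mostly benign) and a busier one.
subsets = {
    "quiet-hour": (250, 5_000, ["chrome"] * 900 + ["other"] * 100),
    "busy-hour": (900, 6_000, ["chrome"] * 500 + ["other"] * 500),
}
print(run_pipeline(subsets, k="busy-hour", x="other"))  # ~4.5: suspicious
```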

5 Conclusion
The problem addressed in this paper is protecting an authentication server
from online password attacks, focusing mainly on password-guessing attacks.
The algorithm proposed in this paper estimates the ratio of malicious-to-benign
traffic and hence the odds that an observed request is malicious. The scope of
the algorithm can be judged from the fact that it makes scarcely any
assumptions about the properties of the attacker and avoids base-rate neglect.
Following these rules, we identify the subsets of the request data with the
least attack traffic, which help us estimate the distribution of the observed
values over legitimate data and calculate the odds ratio.

6 Suggestions
• As with any statistical technique, it is important that we use features
that distinguish well between attack and benign traffic.
• A main source of error is the possibility that we estimate c from a subset
that is not entirely benign and contains some attack traffic. We can examine
how this affects the value that we get for $\hat{c}$ and then how that
inaccuracy propagates through the other quantities.
• The model is based entirely on taking a large dataset and estimating the
parameter values from it, using assumptions rather than prediction. We could
therefore use machine learning methods to first pre-process the data based on
various output values, and then perform the analysis and estimation of the
model.
