/  13
 
Entropy Bounds for Traffic Confirmation
Luke O’Connor
October 21, 2008
Abstract
Consider an open MIX-based anonymity system with
participants and abatch size of 
b
. Assume a global passive adversary who targets a given participantAlice with a set
R
A
of 
m
communicating partners. Let
(
R
A
| B
t
) denote theentropy of 
R
A
as calculated by the adversary given
t
message sets (MIX batches)where Alice is a sender in each message set. Our main result is to express therate at which the anonymity of Alice (as measured by
R
A
) degrades over time as afunction of the main parameters
,
b
and
m
. Assuming
m,b<
, we prove thatthere is a threshold
t
=
O
(
m
ln
) such that when
t
=
t
+ 5
cm
, for any integer
c>
0, thenPr(
(
R
A
| B
t
) = 0)
>
1
e
c
(1)where the probabilities are computed over the random communication model. Thusonce the attacker has collected a threshold
t
of messages sets, each additional5
m
message sets collected decreases
(
R
A
| B
t
) towards zero geometrically. Weprovide formulas for computing the implied constant in
t
=
O
(
m
ln
) and alsoto improve the constant 5 in
t
=
t
+ 5
cm
.When
N >O
(
bm
ln
m
), or in general for large
, the threshold can be improvedto
t
=
O
(
m
(ln
b
+ ln
m
)), which is independent of 
. Further, if 
b
=
O
(
m
) thenthe threshold simplifies to
t
=
O
(
m
ln
m
) and this bound is asymptotically optimalsince the coupon collector problem indicates that
t
= Ω(
m
ln
m
).
The author can be contacted via email at
lukejamesoconnor@gmail.com
.
1
 
1 Introduction
Traffic analysis is a collection of techniques for inferring communication relationshipswithout having access to the content that is being communicated.
Traffic confirmation 
is a special subcase that attempts to infer communication relationships amongst userswhose communications are mediated through an anonymity system. We assume that thatthe underlying anonymity system is a threshold MIX, with batch size
b
, supporting
participants, each of whom may act as both a sender or receiver. We consider the class of traffic confirmation attacks where an attacker targets a specific user Alice and attemptsto infer her communication partners from observing her communication patterns throughthe MIX. The attacker is assumed to be a global passive adversary, and is therefore ableto determine the set of senders and receivers for each message set (batch) submitted tothe MIX. Let
R
A
be the set of Alice’s communicating partners where
|R
A
|
=
m
, and let
(
R
A
| B
t
) be the entropy of 
R
A
after observing
t
rounds of the MIX. Intuition suggeststhat
(
R
A
| B
t
) will tend to zero as
t
increases since the attacker can ignore messagesets where Alice is not a sender.Our main results are to express the rate at which the anonymity of Alice (as measuredby
R
A
) degrades over time as a function of the main parameters
,
b
and
m
. Assuming
m,b <
, we prove that there is a threshold
t
=
O
(
m
ln
) such that when
t
=
t
+ 5
cm
, for any integer
c >
0, thenPr(
(
R
A
| B
t
) = 0)
>
1
e
c
,
(2)where the probabilities are computed over the random communication model (RCM) [1].This result has the following interpretation. Once the attacker has collected a threshold
t
of messages sets, each additional 5
m
message sets collected decreases
(
R
A
| B
t
)towards zero at a geometric rate governed by 1
/e
. The constant 5 is an upper bound andcan be improved in practice. Using the benchmarking parameters
= 20
,
000,
b
= 50and
m
= 20, our results yield that when
t
= 292 + 22
c
the bound of (2) applies. When
t
= 292 + 22
·
10 = 512, for example, then Pr(
(
R
A
| B
512
) = 0)
>
0
.
9999.With a threshold of 
t
=
O
(
m
ln
), the number of messages required to identify thecommunicating partners of Alice increases as the number of participants increases, albeitonly logarithmically. This is somewhat counterintuitive since larger values of 
shouldactually decrease the likelihood of having hitting sets different from
R
A
. We show that if 
N > O
(
bm
ln
m
) then the threshold can be improved to
t
=
O
(
m
(ln
b
+ln
m
)), which isindependent of 
. Further, if 
b
=
O
(
m
) then the threshold simplifies to
t
=
O
(
m
ln
m
)and this bound is asymptotically optimal since the coupon collector problem indicatesthat
t
= Ω(
m
ln
m
).The motivation for our work was to investigate if there are relatively simple rela-tionships between the main parameters and the level of anonymity they provided as thenumber of observations by the attacker increases. Previous work on traffic confirmation2
 
[5, 1, 2, 3, 6, 7] has not convincingly undercovered these fundamental relationships ina manner that is instructive for both system users and designers. This is somewhatsurprising given the simplicity and symmetry of the RCM. Some of the previous workmight be best described as “proof by plotting”, often because the bounds on degradinganonymity are derived from system simulation or the evaluation of complex formula thathave no known closed form.
2 Preliminaries
Let the total set of communicating users be
, where
|U|
=
and
is large, sayat least several tens of thousands. The set of potential senders will be denoted as
and similarly define
R
for the recipients. We assume that
|S|
=
|R|
=
, meaningthat each user may send sender or receive messages. For each sender
, let
R
R
denote the communicating partners of 
. Communication between the users is mediatedby a MIX that operates over a series of rounds, indexed by a parameter
t
1 whichcan be thought of as symbolizing time. At each round the MIX collects a message set(batch)
B
t
consisting of 
b
messages sent by a collection of senders
t
. After processingthe MIX delivers the messages of 
B
t
to a collection of recipients
R
t
. It is assumed thatthe MIX perfectly hides the communication patterns between the members of 
t
and
R
t
.The value of 
b
=
|
B
t
|
is referred to as the threshold of the MIX. When an example ishelpful to clarify a point we will use
= 20
,
000,
b
= 50 and
m
= 20, which will bereferred to as the
standard parameters
. These parameters have been used for previouslyfor benchmarking search algorithms [5, 1, 6].The attacker is assumed to be a global passive adversary, and can therefore determine
t
and
R
t
for each message set
B
t
collected by the MIX. A collection of 
t
message setsobserved by the attacker will be denoted as
B
t
=
{
B
1
,B
2
,...,B
t
}
, where each
B
i
corre-sponds to a batch. To analyze traffic confirmation attacks we also require a model of thetraffic patterns amongst the users of the MIX. We will adopt the
random communication model 
(RCM) as proposed in [1], which has been used to study the disclosure attack forexample. The defining properties of the model are:
Each message batch is
B
t
formed by
b
distinct senders, which determines a receiverset
R
t
for the batch.
When a sender, other than Alice, sends a message then the recipient is chosenuniformly amongst all
users.
The user targetted by the attacker, Alice, has
m
regular communicating peers,denoted by
R
A
=
{
A
1
,A
2
,...,A
m
} ⊂ R
. The recipient of each message sent byAlice is chosen uniformly from
R
A
.3

Share & Embed

More from this user

Add a Comment

Characters: ...