ABSTRACT
This paper presents a traitor tracing method dedicated to video content distribution. It is based on a two-level
approach with probabilistic traitor tracing codes. Codes are concatenated and decoded successively: the first
one is used to decrease the decoding complexity and the second to accuse users. We use the well-known Tardos
fingerprinting code for the accusation process and a Boneh-Shaw code with replication scheme to reduce the
search space of users. This method reduces the computation time compared to classical Tardos
code decoding. We present a method to select suspect groups of users and compare it to a more complex
two-level Tardos code which follows the same construction.
1. INTRODUCTION
Traitor tracing aims at preventing unauthorized redistribution of multimedia content by embedding individual
sequences of bits within each authorized copy. Each sequence is unique and makes it possible to accuse users who
share content illegally. A watermarking scheme is used to embed the sequence inside the multimedia content.
Consequently, attacks can be carried out on the watermarking side, e.g. by adding noise to the content, or directly
on the sequences in the case of a collusion. Collusion is the process used by dishonest users, called colluders, in
an attempt to forge untraceable content by observing differences between their copies. The set of all user
sequences, which is called a code, needs a particular construction to be robust against collusion attacks.
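As an illustration of the collusion process described above, the following Python sketch forges a sequence from several copies. It assumes the usual marking-assumption model of Boneh and Shaw [2], in which colluders can only alter positions where their copies differ. The helper names (`collude_random`, `detectable_positions`) and the toy sequences are hypothetical and are not taken from the paper.

```python
import random


def detectable_positions(copies):
    """Return the positions where the colluders' copies differ."""
    length = len(copies[0])
    return [i for i in range(length) if len({c[i] for c in copies}) > 1]


def collude_random(copies, rng):
    """Forge a sequence by taking, at each position, the bit of a random colluder.

    Positions where all copies agree are left unchanged automatically,
    so the forged sequence respects the marking assumption.
    """
    length = len(copies[0])
    return [rng.choice(copies)[i] for i in range(length)]


# Toy example (hypothetical data): three colluders, six-bit sequences.
rng = random.Random(0)
copies = [
    [0, 1, 1, 0, 1, 0],
    [0, 1, 0, 1, 1, 0],
    [0, 1, 1, 1, 1, 0],
]
forged = collude_random(copies, rng)
print("positions the colluders can see:", detectable_positions(copies))
print("forged sequence:", forged)
```

Only the positions where the copies differ can carry a choice by the colluders, which is why the code construction matters.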
The main trend in traitor tracing focuses on probabilistic codes proposed by G. Tardos [1], based on the marking
assumption first introduced with Boneh-Shaw codes [2]. The goal of probabilistic traitor tracing codes is to reach
a short code length with a fixed error probability. The error is defined as the sum of the probability of missed
detection, i.e. failing to catch colluders, and the probability of accusing an innocent user. The most important
constraint is to guarantee a very low false alarm probability for a known maximum number of colluders. The Tardos
code is the first code with length in $O(c^2 \log(\epsilon_1^{-1}))$ [3], theoretically proved for random codes, where $c$ is the
number of colluders and $\epsilon_1$ the false alarm probability.
Recent contributions focus on proving tight bounds on the minimal code length [4] or on improving
knowledge of the code in terms of achievable capacity under particular decoding assumptions [5, 6]. Other
approaches using new tracing functions for simple or joint decoding have been introduced [7, 8] and experimentally
proven reliable against worst-case strategies from an information-theoretic viewpoint [9, 10] and for fixed collusion sizes. Other
improvements have reduced the memory consumption [11] and the decoding complexity [12] of the detection
process.
Current fingerprinting applications deal with a large number of users, and distributors need a fast
detection process. The decoding time of Tardos codes is linear in the number of users. Indeed, we
have to compute a correlation score between the colluded sequence and each user sequence to decide whether that user is
guilty or not. It implies searching for a colluder inside a large group of innocent users, and the colluded sequence
gives no information to locate the colluder inside this group. Some improvements have been made in this
domain with a two-level approach [12], where a hierarchical construction with two Tardos codes is used in order
to reduce the detection time in the user space. This method ensures good decoding performance but increases
the memory consumption, as new sequences for groups of users and the Tardos key need to be stored.
Further author information: (Send correspondence to M.D)
E-mail: mathieu.desoubeaux@orange-ftgroup.com
In this paper we present a two-level approach dedicated to improving the detection of colluders with the
well-known Tardos fingerprinting scheme. Our objective is to reduce the memory and complexity impact of a
two-level method while keeping good performance at the decoding side. A Boneh-Shaw code with replication
scheme, called a BS-RS, is concatenated with a Tardos code thanks to a q-ary alphabet. It requires using two
different keys for the embedding of each symbol of a binary Tardos sequence. A group of users is associated
with each transition of column types [2] of the BS-RS code. In Section 3, an improvement at the decoding side is
presented by organizing the detection of users with a statistical analysis of the BS-RS code. Indeed, we can first
detect the most probable groups of users hosting colluders. The efficiency of the detection is strongly dependent
on the Tardos code length because it constrains the replication factor of the BS-RS code, which is directly linked
with the error probability. In Section 4, experiments confirm this point and show improvements compared
to the original Tardos detection process. We also compare our method with the two-level Tardos method, which
ensures the best known decoding rates.
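To make the structure of a Boneh-Shaw code with replication more concrete, here is a minimal Python sketch of the standard construction from [2]: with n groups and a replication factor r, each codeword consists of n - 1 blocks of r identical bits, and the group with index g has zeros in the first g blocks and ones afterwards. The function name `bs_rs_codeword` and the 0-based indexing are illustrative assumptions. The secret permutation of columns used in practice is deliberately left out.

```python
def bs_rs_codeword(group, n_groups, r):
    """Unpermuted Boneh-Shaw codeword of one group (0-based index).

    The codeword has n_groups - 1 blocks of r bits. Block b is all
    ones when b >= group and all zeros otherwise, so group 0 gets the
    all-ones word and the last group gets the all-zeros word.
    """
    word = []
    for block in range(n_groups - 1):
        bit = 1 if block >= group else 0
        word.extend([bit] * r)
    return word


# Toy example: 4 groups, replication factor 3.
for g in range(4):
    print(g, bs_rs_codeword(g, 4, 3))
```

Two adjacent groups differ in exactly one block, which is why a jump in the number of 1's between neighbouring blocks points to a group.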
At the decoding side, an accusation score $S_j(Y, T_j)$ is computed for each user, where $Y$ denotes the colluded
sequence. A user is accused if $S_j > Z$, with $Z = Bc \log(\epsilon_1^{-1})$ and $m = Ac^2 \log(\epsilon_1^{-1})$. The value of $A$ can be
different under particular assumptions: e.g. with a Gaussian assumption on the score distributions, this constant
can be set to $2\pi^2$, whereas in the original Tardos scheme, $A$ is set to 100.
In our experiments, we use the following symmetric decoding function, which is independent of the
collusion strategy [14]:
$$S_j = \sum_{i=1}^{m} U(y_i, x_{ji}, p_i), \quad \text{with}$$
$$U(1, 1, p) = \sqrt{(1 - p)/p}, \qquad U(1, 0, p) = -\sqrt{p/(1 - p)},$$
$$U(0, 1, p) = -\sqrt{(1 - p)/p}, \qquad U(0, 0, p) = \sqrt{p/(1 - p)}.$$
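The symmetric score function above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `u` and `accusation_score` are hypothetical, sequences are plain lists of bits, and the values of p are chosen by hand rather than drawn from the Tardos density.

```python
import math


def u(y, x, p):
    """Symmetric score of one position: y is the colluded bit, x the user bit."""
    if x == 1:
        g = math.sqrt((1 - p) / p)
    else:
        g = math.sqrt(p / (1 - p))
    # The score is positive when the user bit matches the colluded bit.
    return g if y == x else -g


def accusation_score(y_seq, x_seq, p_seq):
    """Sum of the per-position scores over the whole code length."""
    return sum(u(y, x, p) for y, x, p in zip(y_seq, x_seq, p_seq))


# Toy example: with p = 0.5 every position contributes +1 or -1.
y_seq = [1, 0, 1, 1]
x_seq = [1, 0, 0, 1]
p_seq = [0.5, 0.5, 0.5, 0.5]
print(accusation_score(y_seq, x_seq, p_seq))
```

The score has to be computed once per user, which is the source of the linear decoding cost that the two-level approach tries to reduce.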
At the decoding side with the original BS-RS, the accusation of a user is made with a null hypothesis test,
i.e. all users for which the null hypothesis is rejected are accused. Approaches with list decoding are used in the
literature for concatenated schemes with BS-RS as inner code and an error-correcting code as outer code [15]. With
list decoding, users with the closest codeword to the pirated sequence are accused. Concatenated codes are used
to reduce the length of the code if we know that the maximum number of colluders will not exceed a bound c.
Many different outer codes are used in the literature, but our goal is to reduce the computation time, so we do
not consider complex error-correcting codes with large alphabets as outer codes. The security of such a process
depends on the key used for the permutation of types. The knowledge about the secret key obviously increases
with the number of colluders.
3. TWO-LEVEL CODES
3.1 TT code
In this paper, a two-level Tardos code is called a TT code. As far as we know, the only work which describes the
use of two-level Tardos codes is the paper of Zhang et al. [12]. The two-level construction is dedicated to reducing
the detection complexity of the classical one-level code as well as the false alarm probability.
They mention that their method increases the length of the code, which allows us to assume that they used a
sequential construction: the first code, for the selection of groups of users, is separated from the code used
to accuse them. They prove that the false positive probability is reduced, but the false negative probability is
not studied, so the significant impact of the probability of missed detection is not considered. Because Tardos
codes do not guarantee that all colluders will be above the threshold, the missed detection probability will clearly
increase in their approach. Indeed, the proof of the bound depends on the sum of the colluders' scores and it does not
guarantee catching more than one colluder. The length of their total code is larger than that of our code because they do
not change the cardinality of the code for the construction. In our implementation, we use two-level codes with
equal lengths. We double the cardinality of the code to construct a concatenated code with symbols embedded
in the same position inside the content as the original Tardos scheme.
• Weight based decision: we define $w_{b_i}$, the weight of block $b_i$, as the number of 1's detected inside the
colluded sequence from position $(i-1)r$ to $ir$. We compute the weights of all blocks, $w = \{w_{b_1}, \dots, w_{b_{n_g-1}}\}$,
and the sequence of values $dw = \{dw_1, \dots, dw_{n-1}\}$ with $dw_i = |w_{b_{i+1}} - w_{b_i}|$.
• Hyper-geometric based decision: the probability of error is bounded by computing the probability of having
$k_1$ 1's inside a block $b_i$ knowing that we observe $k_2$ 1's inside the block $b_{i+1}$ (cf. Ref. 2, Chap. 2, Lemma V.2).
• Hypothesis testing based decision: another solution could be a hypothesis testing rule as a function of
the probabilistic model, but it is out of the scope of this paper and would be more complex than the Tardos
process. Our objective is to present the least complex solution and to avoid computing more correlation
scores.
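The weight based decision from the list above can be sketched as follows. The function names are hypothetical, and ranking the transitions by decreasing weight difference is an assumption about how the values dw are used to order the groups. The mapping from a transition index to a group of users is not shown.

```python
def block_weights(y, r):
    """Number of 1's in each consecutive block of r bits of the colluded sequence."""
    return [sum(y[i:i + r]) for i in range(0, len(y), r)]


def weight_differences(weights):
    """Absolute difference between the weights of adjacent blocks."""
    return [abs(b - a) for a, b in zip(weights, weights[1:])]


def rank_transitions(y, r):
    """Indices of block transitions, largest weight jump first (assumed ordering)."""
    dw = weight_differences(block_weights(y, r))
    return sorted(range(len(dw)), key=lambda i: dw[i], reverse=True)


# Toy example: 4 blocks with replication factor r = 4.
y = [0, 0, 0, 0,  0, 0, 0, 0,  1, 0, 1, 1,  1, 1, 1, 1]
print(block_weights(y, 4))                      # weights per block
print(weight_differences(block_weights(y, 4)))  # jumps between adjacent blocks
print(rank_transitions(y, 4))                   # most suspicious transition first
```

The whole procedure only counts bits in the colluded sequence, so it needs no correlation score against user sequences.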
There are two ways of assigning users to groups: the distributor can decide either to assign consecutive transactions
linearly inside groups or to distribute consecutive transactions uniformly over the $N_g$ groups. A transaction is
associated with one user. If a colluder wants to forge illegal content, he can buy several copies and then collude
them. If the distributor decides to assign consecutive transactions linearly, then there will be several colluder
sequences inside the same group. This could be important information because the colluders do not know when
the distributor will switch the assignment of transactions to the next group. It can reduce the detection complexity because,
if all colluder sequences are in the same group, we are assured to decode this group first. Therefore colluders
have to take into account the time at which they get copies. This information cannot be exploited with a Tardos code because it
is not a structured code; it is fully random.
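The two assignment policies can be contrasted with a short Python sketch. It interprets "uniformly" as a round-robin distribution, which is an assumption, and the function names and the toy sizes are hypothetical.

```python
def assign_linear(n_users, n_groups):
    """Consecutive transactions fill one group before moving to the next."""
    size = -(-n_users // n_groups)  # ceiling division: users per group
    return [t // size for t in range(n_users)]


def assign_round_robin(n_users, n_groups):
    """Consecutive transactions are spread over the groups in turn (assumed policy)."""
    return [t % n_groups for t in range(n_users)]


# Toy example: 6 transactions, 3 groups; a colluder buys transactions 2 and 3.
linear = assign_linear(6, 3)
spread = assign_round_robin(6, 3)
print("linear groups of the two copies:", linear[2], linear[3])
print("round-robin groups of the two copies:", spread[2], spread[3])
```

With the linear policy, copies bought back to back tend to land in the same group, whereas the round-robin policy separates them.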
3.4 Security
Two secrets are unknown to the colluders: the watermarking secret keys used to embed symbols inside
the content and the secret key of the code, i.e. the sequence of probabilities $p_i$. Under these hypotheses, the
colluders do not know whether the embedded symbol is a 1 or a 0. In our case the observation of colluders, i.e. the
number of symbols they observe, is changed due to the two-level approach. It introduces a perturbation in
the knowledge of colluders and therefore in their potential strategies. This is not the common assumption, however:
in most related papers [1, 2], colluders have access to their symbol values and forge a sequence under this
assumption. In this context we assume that colluders can attack both codes independently.
1. Attack on the Tardos code: if a simple decoder is used [6], which theoretically maximizes the rate of the
code, the best attack consists in minimizing the expectation over f(p) of the mutual information between
one user sequence and the output sequence Y. It is a complex process for colluders; an estimation of this
strategy for up to 9 colluders is given in [10].
2. Attack on the BS-RS code: it amounts to reducing the differences between adjacent weights. The fact that block
weights will increase from 0 to r is impossible to avoid, but if the weights increase smoothly, no tracing
algorithm will succeed (cf. Ref. 3, Lemma 3.3).
4. EXPERIMENTAL RESULTS
In our experiments, we assume that colluders do not know which symbols they have in their sequences. We
test the BST code only with the random strategy. We choose a Tardos code length m of 2048 symbols.
It is approximately the length of the Tardos code for 2 colluders under the Gaussian assumption [14], with the
constant A equal to $2\pi^2$, n set to $10^5$ and $\epsilon_1$ to $10^{-6}$. We test a set of three different collusion sizes c = {2, 6, 9}.
The BST code is compared with the TT code; the TT code gives an upper bound for the best results for the
simple decoder. The worst performance is given by the classical decoding, which consists in selecting the groups
at random; we call it the rand code. It is linked with the hypergeometric law presented in Section 3.3 and gives
the upper bound of the decoding. Performance is reported in Tab. 1 as the number of bad groups
decoded, i.e. groups containing no colluder. We give the expectation and the standard deviation of this
error for 1000 realisations of the tracing algorithm. Inevitably, the expectation of the error increases with the
number of colluders and decreases with the replication factor r. For 2 colluders, the decoding error is null for
r set to 64 bits, which corresponds to 33 groups. If the colluders are in two different groups,
then we only have to consider these two groups in expectation and the search space of users is reduced by 93
percent. However, this number of groups may not be optimal: we can consider more
groups and accept errors while obtaining better performance in expectation. Indeed, decoding with more groups,
even with a higher expectation and standard deviation of the error, could be better than decoding with a small number of
groups. For low values of the replication factor, the expectation and the standard deviation of the error follow those of the
rand selection because there is no information to improve the decoding. Collusions of size 6 and 9 increase the
error of the BST code, but also that of the TT code. However, performance is still interesting, e.g. for 9 colluders
and r equal to 128 the expected search space of users is reduced by almost 30 percent.
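The relation between the replication factor, the number of groups and the search-space reduction quoted above can be checked with a small Python sketch. The formula m / r + 1 for the number of groups is inferred from the quoted pairs (r = 64 with 33 groups, r = 32 with 65 groups), and groups are assumed to have equal size. The function names are hypothetical.

```python
def number_of_groups(m, r):
    """Groups addressable by a BS-RS code of length m with replication r (inferred)."""
    return m // r + 1


def search_space_reduction(groups_to_decode, n_groups):
    """Fraction of users skipped when only some equal-size groups are decoded."""
    return 1.0 - groups_to_decode / n_groups


m = 2048
n_groups = number_of_groups(m, 64)
reduction = search_space_reduction(2, n_groups)
print(n_groups, round(100 * reduction, 1))
```

For r = 64 and two suspect groups, the sketch gives a reduction slightly under 94 percent, close to the 93 percent reported in the text.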
In Fig. 3 and Fig. 4 we compare the TT code and the BST code with regard to the number of user detections.
In Fig. 3 we observe that up to 65 groups (r equal to 32) the expectation of the number of detected users
of the BST code is almost the same as for the TT code. If the replication factor decreases further, then the BST code is less
efficient than the TT code. In Fig. 4, for bigger collusions, the performance of the TT code seems to be bounded
for a number of groups above 300, and BST codes achieve their minimum for a small number of groups. Compared
to the TT code, our method is faster. In Fig. 5, where the processing time is on the vertical axis and the number of
groups on the horizontal axis, the decoding time of the TT code grows linearly with the number of groups. For
example, if we take 256 groups of users, the BST group ordering is 3000 times faster than the TT ordering.
[Plot: curves BST c=2 and TT c=2; horizontal axis: number of groups.]
Figure 3. Expectation of the number of users to consider with the BST and the TT code as a function of the replication
value for c = 2 and m = 2048.
[Plot: curves BST c=6, TT c=6, BST c=9 and TT c=9; vertical axis: expectation of the number of users to decode (x 10^4); horizontal axis: number of groups.]
Figure 4. Expectation of the number of users to consider with the BST and the TT code as a function of the replication
value for c = 6, c = 9 and m = 2048.
[Plot: curves BST and TT; vertical axis: time (sec); horizontal axis: number of groups.]
Figure 5. Time consumption of BST and TT group selection functions for 1000 trials and m = 2048.
REFERENCES
[1] Tardos, G., “Optimal probabilistic fingerprint codes,” J. ACM 55(2), 1–24 (2008).
[2] Boneh, D. and Shaw, J., “Collusion-secure fingerprinting for digital data,” IEEE Transactions on Information
Theory 44, 1897–1905 (Sept. 1998).
[3] Peikert, C., shelat, A., and Smith, A., “Lower bounds for collusion-secure fingerprinting,” in [Proceedings of
the fourteenth annual ACM-SIAM symposium on Discrete algorithms ], 472–479, Society for Industrial and
Applied Mathematics, Baltimore, Maryland (2003).
[4] Skoric, B., Vladimirova, T., Celik, M., and Talstra, J., “Tardos fingerprinting is better than we thought,”
IEEE Transactions on Information Theory 54(8), 3663–3676 (2008).
[5] Amiri, E. and Tardos, G., “High rate fingerprinting codes and the fingerprinting capacity,” in [Proceedings
of the twentieth Annual ACM-SIAM Symposium on Discrete Algorithms ], SODA ’09, 336–345, Society for
Industrial and Applied Mathematics, Philadelphia, PA, USA (2009).
[6] Huang, Y.-W. and Moulin, P., “On the saddle-point solution and the large-coalition behavior of fingerprinting
games,” CoRR (2010).
[7] Furon, T. and Perez-Freire, L., “EM decoding of Tardos traitor tracing codes,” in [Proceedings of the 11th
ACM workshop on Multimedia and security], 99–106, ACM, Princeton, New Jersey, USA (2009).
[8] Meerwald, P. and Furon, T., “Towards Joint Tardos Decoding: The ’Don Quixote’ Algorithm,” in [Information
Hiding], Pevny, Tomas, Filler, and Tomas, eds., Springer-Verlag, Prague, Czech Republic (2011).
[9] Moulin, P., “Universal fingerprinting: Capacity and Random-Coding exponents,” arXiv:0801.3837 (Jan.
2008).
[10] Furon, T. and Perez-Freire, L., “Worst case attacks against binary probabilistic traitor tracing codes,” in
[Proceedings of First IEEE International Workshop on Information Forensics and Security], 46–50 (2009).
[11] Nuida, K., Hagiwara, M., Watanabe, H., and Imai, H., “Optimization of memory usage in Tardos's fingerprinting
codes,” Information Hiding, 9th Intl Workshop, IH 2007, Saint Malo, France, Jun 11–13, 2007, Revised
Selected Papers 4567, 12 (2006).
[12] Zhang, H., Havyarimana, V., and Qiaoliang, L., “Two-level Tardos fingerprinting code,” in [2010 International
Conference on Educational and Information Technology], (Sept. 2010).
[13] Wang, Z., Wu, M., Zhao, H., Trappe, W., and Liu, K., “Anti-collusion forensics of multimedia fingerprinting
using orthogonal modulation,” IEEE Transactions on Image Processing 14(6), 804–821 (2005).
[14] Skoric, B., Katzenbeisser, S., and Celik, M. U., “Symmetric Tardos fingerprinting codes for arbitrary alphabet
sizes,” Des. Codes Cryptography 46(2), 137–166 (2008).
[15] Schaathun, H., “The Boneh-Shaw fingerprinting scheme is better than we thought,” IEEE Transactions on
Information Forensics and Security 1, 248–255 (June 2006).
[16] Anthapadmanabhan, N. P. and Barg, A., “Two-level fingerprinting codes,” CoRR abs/0905.0417 (2009).
[17] Xie, F., Furon, T., and Fontaine, C., “On-off keying modulation and Tardos fingerprinting,” in [Proceedings
of the 10th ACM workshop on Multimedia and security], 101–106, ACM, Oxford, United Kingdom (2008).