distance between a randomly selected vector from $\mathbb{F}_2^n$ and the code $C$. Clearly $R_a \le R$.

Linear codes are codes for which $C$ is a linear vector subspace of $\mathbb{F}_2^n$. If $C$ has dimension $k$, we call $C$ a linear code of length $n$ and dimension $k$ (and codimension $n-k$), or we say that $C$ is an $[n,k]$ code. Each linear code $C$ of dimension $k$ has a basis consisting of $k$ vectors. Writing the basis vectors as rows of a $k \times n$ matrix $G$, we obtain a generator matrix of $C$. Each codeword can be written as a linear combination of rows from $G$. There are $2^k$ codewords in an $[n,k]$ code.

Given $x, y \in \mathbb{F}_2^n$, we define their dot product $x \cdot y = x_1 y_1 + x_2 y_2 + \dots + x_n y_n$, all operations in $GF(2)$. We say that $x$ and $y$ are orthogonal if $x \cdot y = 0$. Given a code $C$, the dual code of $C$, denoted as $C^\perp$, is the set of all vectors $x$ that are orthogonal to all vectors in $C$. The dual code of an $[n,k]$ code is an $[n,n-k]$ code with an $(n-k) \times n$ generator matrix $H$ with the property that
$$Hx = 0 \Leftrightarrow x \in C. \tag{1}$$

The matrix $H$ is called the parity check matrix of $C$. For any $x \in \mathbb{F}_2^n$, the vector $s = Hx$ is called the syndrome of $x$. For each syndrome $s \in \mathbb{F}_2^{n-k}$, the set $C(s) = \{x \in \mathbb{F}_2^n \mid Hx = s\}$ is called a coset. Note that $C(0) = C$. Obviously, cosets associated with different syndromes are disjoint. Also, from elementary linear algebra we know that every coset can be written as $C(s) = x + C$, where $x \in C(s)$ is arbitrary. Thus, there are $2^{n-k}$ disjoint cosets, each consisting of $2^k$ vectors. Any member of the coset $C(s)$ with the smallest Hamming weight is called a coset leader and will be denoted as $e_L(s)$.
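The definitions above can be made concrete on a small code. The following is a minimal sketch, assuming the binary $[7,4]$ Hamming code as the example (this choice, the particular matrices `G` and `H`, and the helper names `syndrome` and `encode` are illustrative assumptions, not taken from the text). It enumerates the codewords from a generator matrix, partitions $\mathbb{F}_2^7$ into cosets by syndrome, and picks one coset leader per coset.

```python
from itertools import product

# Assumed example: generator and parity check matrices of the [7,4] Hamming code.
G = [
    [1, 1, 1, 0, 0, 0, 0],
    [1, 0, 0, 1, 1, 0, 0],
    [0, 1, 0, 1, 0, 1, 0],
    [1, 1, 0, 1, 0, 0, 1],
]
H = [
    [0, 0, 0, 1, 1, 1, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [1, 0, 1, 0, 1, 0, 1],
]
k, n = len(G), len(G[0])


def syndrome(x):
    """Return s = Hx over GF(2) as a tuple of n - k bits."""
    return tuple(sum(h * xi for h, xi in zip(row, x)) % 2 for row in H)


def encode(u):
    """Linear combination of the rows of G selected by the k bits of u."""
    return tuple(sum(ui * g for ui, g in zip(u, col)) % 2 for col in zip(*G))


# The 2^k codewords, generated from G.
codewords = {encode(u) for u in product((0, 1), repeat=k)}

# Partition F_2^n into cosets C(s) = {x : Hx = s}.
cosets = {}
for x in product((0, 1), repeat=n):
    cosets.setdefault(syndrome(x), []).append(x)

# A coset leader e_L(s) is any minimum-weight member of C(s).
leaders = {s: min(members, key=sum) for s, members in cosets.items()}

zero = (0,) * (n - k)
print("C(0) equals the span of G:", set(cosets[zero]) == codewords)
print("number of cosets:", len(cosets))
print("coset sizes:", {len(members) for members in cosets.values()})
print("leader weights:", sorted(sum(e) for e in leaders.values()))
```

For this code there are $2^{n-k} = 8$ cosets of $2^k = 16$ vectors each, matching the counts stated above. Exhaustive enumeration is only practical for very small $n$.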
1) Lemma: Given a coset $C(s)$, for any $x \in C(s)$, $d(x,C) = w(e_L(s))$. Moreover, if $d(x,C) = d(x,c)$ for some $c \in C$, the vector $x - c$ is a coset leader.

Proof: $d(x,C) = \min_{c \in C} w(x - c) = \min_{y \in C(s)} w(y) = w(e_L(s))$. The second equality follows from the fact that if $c$ goes through the code $C$, $x - c$ goes through all members of the coset $C(s)$.
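Lemma 1 can be checked by exhaustive search on a small code. The sketch below again assumes the $[7,4]$ Hamming code as an example; the matrix `H` and the names `syndrome`, `distance`, and `lemma_holds` are hypothetical helpers introduced for illustration. For every $x$, it compares the distance to the nearest codeword with the minimum weight found in the coset of $x$, and checks that $x - c$ is itself a minimum-weight member of that coset.

```python
from itertools import product

# Assumed example: parity check matrix of the [7,4] Hamming code.
H = [
    [0, 0, 0, 1, 1, 1, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [1, 0, 1, 0, 1, 0, 1],
]
n = len(H[0])


def syndrome(x):
    """Return s = Hx over GF(2)."""
    return tuple(sum(h * xi for h, xi in zip(row, x)) % 2 for row in H)


def distance(u, v):
    """Hamming distance d(u, v)."""
    return sum(a != b for a, b in zip(u, v))


words = list(product((0, 1), repeat=n))
code = [c for c in words if not any(syndrome(c))]

# w(e_L(s)) for each syndrome s: the minimum weight inside the coset C(s).
leader_weight = {}
for x in words:
    s = syndrome(x)
    leader_weight[s] = min(leader_weight.get(s, n), sum(x))


def lemma_holds(x):
    """Check d(x, C) = w(e_L(s)) and that x - c is a coset leader."""
    s = syndrome(x)
    nearest = min(code, key=lambda c: distance(x, c))
    diff = tuple(a ^ b for a, b in zip(x, nearest))  # x - c over GF(2)
    return (
        distance(x, nearest) == leader_weight[s]
        and syndrome(diff) == s
        and sum(diff) == leader_weight[s]
    )


print(all(lemma_holds(x) for x in words))
```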
2) Lemma: If $C$ is an $[n,k]$ code with an $(n-k) \times n$ parity check matrix $H$ and covering radius $R$, then any syndrome $s \in \mathbb{F}_2^{n-k}$ can be written as a sum of at most $R$ columns of $H$, and $R$ is the smallest such number. Thus, we can also define the covering radius as the maximal weight of all coset leaders.
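As an illustration of Lemma 2, the covering radius can be computed in two ways and the results compared. This is a minimal sketch under stated assumptions: the $3 \times 5$ matrix `H` below is an arbitrary small example of a $[5,2]$ code chosen here (it does not come from the text), and `min_columns` and `dist_to_code` are hypothetical helper names.

```python
from itertools import combinations, product

# Assumed example: parity check matrix of a small [5,2] binary code.
H = [
    [1, 0, 0, 1, 0],
    [0, 1, 0, 1, 1],
    [0, 0, 1, 0, 1],
]
r, n = len(H), len(H[0])
columns = [tuple(row[j] for row in H) for j in range(n)]


def syndrome(x):
    """Return s = Hx over GF(2)."""
    return tuple(sum(h * xi for h, xi in zip(row, x)) % 2 for row in H)


def min_columns(s):
    """Smallest number of columns of H whose GF(2) sum equals s."""
    for t in range(n + 1):
        for subset in combinations(columns, t):
            total = tuple(sum(bits) % 2 for bits in zip(*subset)) if subset else (0,) * r
            if total == s:
                return t
    raise ValueError("s is not a sum of columns of H")


words = list(product((0, 1), repeat=n))
code = [c for c in words if not any(syndrome(c))]


def dist_to_code(x):
    """d(x, C): Hamming distance from x to the nearest codeword."""
    return min(sum(a != b for a, b in zip(x, c)) for c in code)


# Covering radius from its definition: the largest distance to the code.
R_from_distance = max(dist_to_code(x) for x in words)
# Covering radius from Lemma 2: the worst-case number of columns needed.
R_from_columns = max(min_columns(s) for s in product((0, 1), repeat=r))

print(R_from_distance, R_from_columns)
```

Both quantities agree for this example, as the lemma predicts.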
Proof: Any $x \in \mathbb{F}_2^n$ belongs to exactly one coset $C(s)$, and from Lemma 1 we know that $d(x,C) = w(e_L(s))$. But the weight $w(e_L(s))$ is the smallest number of columns in $H$ that must be added to obtain $s$, since $He$ is the sum of the columns of $H$ indexed by the nonzero coordinates of $e$.

III. LINEAR CODES FOR STEGANOGRAPHY
The behavior of a steganographic algorithm can be sketched in the following way:
1) a covermedium is processed to extract a sequence of symbols $v$, sometimes called coverdata;
2) $v$ is modified into $s$ to embed the message $m$; $s$ is sometimes called the stegodata;
3) modifications on $s$ are translated on the covermedium to obtain the stegomedium.

Here, we assume that the detectability of the embedding increases with the number of symbols that must be changed to go from $v$ to $s$ (see [5] for some examples of this framework).

Syndrome coding deals with this number of changes. The key idea is to use some syndrome computation to embed the message into the coverdata. In fact, such a scheme uses a linear code $C$, more precisely its cosets, to hide $m$. A word $s$ hides the message $m$ if $s$ lies in a particular coset of $C$, related to $m$. Since cosets are uniquely identified by the so-called syndromes, embedding consists exactly in searching for a word $s$ with syndrome $m$ that is close enough to $v$.
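The search described in the last sentence can be sketched directly as an exhaustive scan of the coset with syndrome $m$. The example below is only an illustration under assumptions made here: the code is the binary $[7,4]$ Hamming code, the cover vector `v` and message `m` are arbitrary sample values, and `embed_by_search` is a hypothetical name. A practical scheme would replace the scan with a decoding algorithm.

```python
from itertools import product

# Assumed example: parity check matrix of the [7,4] Hamming code.
H = [
    [0, 0, 0, 1, 1, 1, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [1, 0, 1, 0, 1, 0, 1],
]


def syndrome(x):
    """Return Hx over GF(2)."""
    return tuple(sum(h * xi for h, xi in zip(row, x)) % 2 for row in H)


def distance(u, v):
    """Number of coordinates in which u and v differ."""
    return sum(a != b for a, b in zip(u, v))


def embed_by_search(v, m):
    """Return a word with syndrome m that is closest to v (brute force)."""
    coset = (s for s in product((0, 1), repeat=len(v)) if syndrome(s) == tuple(m))
    return min(coset, key=lambda s: distance(s, v))


v = (1, 0, 1, 1, 0, 0, 1)   # sample coverdata (assumed)
m = (1, 1, 0)               # sample message (assumed)
s = embed_by_search(v, m)

print("stegodata:", s)
print("changes:", distance(v, s))
print("recovered message:", syndrome(s))
```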
A. Matrix Encoding
We first set up the notation and describe properly the matrix encoding framework and its inherent problems. Let $v \in \mathbb{F}_q^n$ denote the coverdata and $m \in \mathbb{F}_q^r$ the message. We are looking for two mappings, embedding $\mathrm{Emb}$ and extraction $\mathrm{Ext}$, such that

$$\forall (v,m) \in \mathbb{F}_q^n \times \mathbb{F}_q^r, \quad \mathrm{Ext}(\mathrm{Emb}(v,m)) = m, \tag{2}$$
$$\forall (v,m) \in \mathbb{F}_q^n \times \mathbb{F}_q^r, \quad d(v, \mathrm{Emb}(v,m)) \le T. \tag{3}$$

Equation (2) means that we want to recover the message in all cases; (3) means that we allow the modification of at most $T$ coordinates in the vector $v$.

From Error Correcting Codes (Section II), it is quite easy to show that the scheme defined by
$$\mathrm{Emb}(v,m) = v + D(m - E(v)) \tag{4}$$
$$\mathrm{Ext}(y) = E(y) = y \times H^t \tag{5}$$

where $D$ and $E$ denote respectively the decoding function and the syndrome function (that is, $D(s)$ is a word of minimum weight with syndrome $s$), makes it possible to embed messages of length $r = n-k$ in a coverdata of length $n$, while modifying at most $T = R$ elements of the coverdata.

The parameter $\frac{n-k}{R}$ represents the embedding efficiency, that is, the number of embedded symbols per embedding change. Linking symbols with bits is not simple, as naive solutions lead to bad results in terms of efficiency. For example, if elements of $\mathbb{F}_q$ are viewed as blocks of $L$ bits, modifying a symbol roughly leads to $\frac{L}{2}$ bit flips on average and $L$ in the worst case.

A problem raised by the matrix encoding, as presented above, is that any position in the coverdata $v$ can be changed. In some cases, it is more reasonable to keep some coordinates unchanged because they would produce artifacts that are too large in the stegodata.
(IJCSIS) International Journal of Computer Science and Information Security, Vol. 8, No. 8, November 2010. http://sites.google.com/site/ijcsis/ ISSN 1947-5500