

Capacity of Steganographic Channels

Jeremiah J. Harmsen, Member, IEEE, and William A. Pearlman, Fellow, IEEE

Abstract—This work investigates a central problem in steganography, that is: How much data can safely be hidden without being detected? To answer this question a formal definition of steganographic capacity is presented. Once this has been defined a general formula for the capacity is developed. The formula is applicable to a very broad spectrum of channels due to the use of an information-spectrum approach. This approach allows for the analysis of arbitrary steganalyzers as well as non-stationary, non-ergodic encoder and attack channels.

After the general formula is presented, various simplifications are applied to gain insight into example hiding and detection methodologies. Finally, the context and applications of the work are summarized in a general discussion.

Index Terms—Steganographic capacity, stego-channel, steganalysis, steganography, information theory, information spectrum

This work was carried out at Rensselaer Polytechnic Institute and was supported by the Air Force Research Laboratory, Rome, NY.
J. Harmsen is now with Google Inc. in Mountain View, CA 94043, USA; E-mail:
W. Pearlman is with the Elec. Comp. and Syst. Engineering Dept., Rensselaer Polytechnic Institute, Troy, NY 12180-3590, USA; E-mail:

I. INTRODUCTION

A. Background

Shannon's pioneering work provides bounds on the amount of information that can be transmitted over a noisy channel. His results show that capacity is an intrinsic property of the channel itself. This work takes a similar viewpoint in seeking to find the amount of information that may be transferred over a stego-channel as seen in Figure 1.

The stego-channel is equivalent to the classic channel with the addition of the detection function and attack channel. For the classic channel, a transmission is considered successful if the decoder properly determines which message the encoder has sent. In the stego-channel a transmission is successful not only if the decoder properly determines the sent message, but if the detection function is not triggered as well.

This additional constraint on the channel use leads to the fundamental view that the capacity of a stego-channel is an intrinsic property of both the channel and the detection function. That is, the properties of the detection function influence the capacity just as much as the noise in the channel.

B. Previous Work

There have been a number of applications of information theory to the steganographic capacity problem [1], [2], [3]. These works give capacity results under distortion constraints on the hider as well as active adversary. The additional constraint that the stego-signal retain the same distribution as the cover-signal serves as the steganalysis detection function.

Somewhat less work exists exploring capacity with arbitrary detection functions. These works are written from a steganalysis perspective [4], [5] and accordingly give heavy consideration to the detection function.

This work differs from previous work in a number of aspects. Most notable is the use of information-spectrum methods that allow for the analysis of arbitrary detection algorithms and channels. This eliminates the need to restrict interest to detection algorithms that operate on sample averages or behave consistently. Instead the detection functions may be instantaneous, that is, the properties of a detector for $n$ samples need not have any relation to the same detector for $n + 1$ samples. Additionally, the typical restriction that the channel under consideration be consistent, ergodic or stationary is also

Another substantial difference is the presence of noise before the detector. This placement enables the modeling of common signal processing distortions such as compression, quantization, etc. The location of the noise adds complexity not only because of confusion at the decoder, but also a signal, carefully crafted to avoid detection, may be corrupted into one that will trigger the detector.

Finally, the consideration of a cover-signal and distortion constraint in the encoding function is omitted. This is due to the view that steganographic capacity is a property of the channel and the detection function. This viewpoint, along with the above differences, make a direct comparison to previous work somewhat difficult, although possible with a number of simplifications explored in Section V.

C. Groundwork

This chapter lays the groundwork for determining the amount of information that may be transferred over the channel shown in Figure 1. Here, the adversary's goal is to disrupt any steganographic communication between the encoder and decoder. To accomplish this a steganalyzer is used to intercept steganographic messages, and an attack function may alter the signal.

We now formally define each of the components in the system, beginning with the random variable notation.

1) Random Variables: Random variables are denoted by capital letters, e.g. $X$. Realizations of these random variables are denoted as lowercase letters, e.g. $x$. Each random variable is defined over a domain denoted with a script $\mathcal{X}$. A sequence of $n$ random variables is denoted with $X^n = (X_1, \ldots, X_n)$. Similarly, an $n$-length sequence of random variable realizations is denoted $x = (x_1, \ldots, x_n) \in \mathcal{X}^n$. The probability of $X$ taking value $x \in \mathcal{X}$ is $p_X(x)$.

Following a signal through Figure 1 we begin in the space of $n$-length stego-signals denoted $\mathcal{X}^n$. The signal then undergoes

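[Figure: block diagram. Encoder $f_n(m)$, signal $X^n$, Noise $W^n(y|x)$, signal $Y^n$, Attack Channel $A^n(z|y)$, signal $Z^n$, Decoder $\phi_n(z)$; the steganalyzer $g_n(y)$ observes $Y^n$; the noise and attack blocks together are labelled Encoder-Attack Channel $Q^n(z|x)$.]

Fig. 1. Active System with Noise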

some distortion as it travels through the encoder-channel. This results in an element from the corrupted stego-signal space of $\mathcal{Y}^n$. Finally, the signal is attacked to produce the attacked stego-signal in space $\mathcal{Z}^n$.

2) Steganalyzer: The steganalyzer is a function $g_n : \mathcal{Y}^n \to \{0, 1\}$ that classifies a sequence of signals from $\mathcal{Y}^n$ into one of two categories: containing steganographic information, and not containing steganographic information. The function is defined as follows for all $y \in \mathcal{Y}^n$,

$$g_n(y) = \begin{cases} 1, & \text{if } y \text{ is steganographic} \\ 0, & \text{if } y \text{ is not steganographic} \end{cases} \tag{1}$$

The specific type of function may be that of support vector machine or a Bayesian classifier, etc.

A steganalyzer sequence is denoted as,

$$\mathbf{g} := \{g_1, g_2, g_3, \ldots\}, \tag{2}$$

where $g_n : \mathcal{Y}^n \to \{0, 1\}$.

The set of all $n$ length steganalyzers is denoted $\mathcal{G}_n$.

3) Permissible Set: For any steganalyzer $g_n$, the space of signals $\mathcal{Y}^n$ is split into the permissible set and the impermissible set, defined below.

The permissible set $P_{g_n} \subseteq \mathcal{Y}^n$ is the inverse image of 0 under $g_n$. That is,

$$P_{g_n} := g_n^{-1}(\{0\}) = \{y \in \mathcal{Y}^n : g_n(y) = 0\}. \tag{3}$$

The permissible set is the set of all signals of $\mathcal{Y}^n$ that the given steganalyzer, $g_n$, will classify as non-steganographic.

Since each steganalyzer has a binary range, a steganalyzer sequence may be completely described by a sequence of permissible sets. To denote a steganalyzer sequence in such a way the following notation is used,

$$\mathbf{g} = \{P_1, P_2, P_3, \ldots\},$$

where $P_n \subseteq \mathcal{Y}^n$ is the permissible set for $g_n$.

4) Impermissible Set: The impermissible set $I_{g_n} \subseteq \mathcal{Y}^n$ is the inverse image of 1 under $g_n$. That is,

$$I_{g_n} := g_n^{-1}(\{1\}) = \{y \in \mathcal{Y}^n : g_n(y) = 1\}. \tag{4}$$

For a given $g_n$ the impermissible set is the set of all signals in $\mathcal{Y}^n$ that $g_n$ will classify as steganographic.

[Figure: the sets $P_{g_n} = g_n^{-1}(\{0\})$ and $I_{g_n} = g_n^{-1}(\{1\})$ shown as the inverse images of the two values in $\{0, 1\}$.]

Fig. 2. Permissible and Impermissible Sets

Example 1: Consider the illustrative sum steganalyzer defined for the binary channel outputs ($\mathcal{Y} = \{0, 1\}$). The steganalyzer is defined for $y = (y_1, \ldots, y_n)$ as,

$$g_n(y) = \begin{cases} 1, & \text{if } \sum_{i=1}^{n} y_i > \frac{n}{2} \\ 0, & \text{else} \end{cases} \tag{5}$$

The permissible sets for $n = 1, 2, 3, 4$ are shown in Table I.

TABLE I: SUM STEGANALYZER PERMISSIBLE SETS
P1 = {(0)}
P2 = {(0,0),(0,1),(1,0)}
P3 = {(0,0,0),(1,0,0),(0,1,0),(0,0,1)}
P4 = {(0,0,0,0),(1,0,0,0),(0,1,0,0),(0,0,1,0),(0,0,0,1),(1,1,0,0),(1,0,1,0),(1,0,0,1),(0,1,1,0),(0,1,0,1),(0,0,1,1)}

5) Memoryless Steganalyzers: A memoryless steganalyzer, $\mathbf{g} = \{g_n\}_{n=1}^{\infty}$, is one where each $g_n$ is defined for $y = (y_1, y_2, \ldots, y_n)$ as,

$$g_n(y) = \begin{cases} 1, & \text{if } \exists\, i \in \{1, 2, \ldots, n\} \text{ such that } g(y_i) = 1 \\ 0, & \text{if } g(y_i) = 0 \ \forall\, i \in \{1, 2, \ldots, n\} \end{cases}$$

where $g \in \mathcal{G}_1$ is said to specify $g_n$ (and $\mathbf{g}$). To denote a steganalyzer sequence is memoryless the following notation will be used $\mathbf{g} = \{g\}$.

The analysis of the memoryless steganalyzer is motivated by the current real world implementation of detection systems. As an example we may consider each $y_i$ to be a digital image sent via email. When sending $n$ emails, the hider attaches one of the $y_i$'s to each message. The entire sequence of images is considered to be $y$. Typically steganalyzers do not make use of entire sequence $y$. Instead each image is sequentially processed by a given steganalyzer $g$, where if any of the $y_i$ trigger the detector the entire sequence of emails is treated as steganographic.

Clearly for a memoryless steganalyzer $g_n$, defined by $g$ we have that,

$$P_{g_n} = P_g \times P_g \times \cdots \times P_g \tag{7}$$

That is, the permissible set of $g_n$ is defined by the $n$-dimensional product of $P_g$.

D. Channels

We now define two channels. The first models inherent distortions occurring between the encoder and detection function, such as the compression of the stego-signal. The second models a malicious attack by an active adversary such as a cropping or additive noise. Both of these distortions are considered to be outside the control of the encoder.

1) Encoder-Noise Channel: The encoder-noise channel is denoted as $W^n$ where $W^n : \mathcal{Y}^n \times \mathcal{X}^n \to [0, 1]$ and has the following property for all $x \in \mathcal{X}^n$,

$$W^n(y|x) := \Pr\{Y^n = y \mid X^n = x\}.$$

The channel represents the conditional probabilities of the steganalyzer receiving $y \in \mathcal{Y}^n$ when $x \in \mathcal{X}^n$ is sent.

The random variable, $Y$, resulting from transmitting $X$ through the channel $W$ will be denoted as $X \to Y$.

We denote an arbitrary encoder-noise channel as the sequence of transition probabilities,

$$\mathbf{W} := \{W^1, W^2, W^3, \ldots\}.$$

2) Attack Channel: The attack function maps $A^n : \mathcal{Y}^n \to \mathcal{Z}^n$ as,

$$A^n(z|y) = \Pr\{Z^n = z \mid Y^n = y\}. \tag{8}$$

The attack channel may be deterministic or probabilistic.

Similarly to the encoder-noise channel, we denote an arbitrary attack channel as the sequence of transition probabilities,

$$\mathbf{A} := \{A^1, A^2, A^3, \ldots\}.$$

3) Encoder-Attack Channel: The encoder-attack channel or channel is a function $Q^n : \mathcal{X}^n \to \mathcal{Z}^n$, defined to model the effect of both the encoder-noise and attack channel. Thus,

$$Q^n(z|x) = \sum_{y \in \mathcal{Y}^n} A^n(z|y)\, W^n(y|x). \tag{9}$$

The specification of $Q^n$ by $A^n$ and $W^n$ is denoted $Q^n = A^n \circ W^n$.

The arbitrary encoder-attack channel is a sequence of transition probabilities,

$$\mathbf{Q} = \{Q^1, Q^2, Q^3, \ldots\}. \tag{10}$$

We will express the dependence between the arbitrary encoder-noise and attack channels and the arbitrary encoder-attack channel as $\mathbf{Q} = \mathbf{A} \circ \mathbf{W}$.

4) Memoryless Channels: In the case where channel distortions act independently and identically on each input letter $x_i$, we say it is a memoryless channel. In this instance the $n$-length transition probabilities can be written as,

$$W^n(y|x) = \prod_{i=1}^{n} W(y_i|x_i), \tag{11}$$

where $W$ is said to define the channel. To denote a channel is memoryless and defined by $W$ we will write $\mathbf{W} = \{W\}$.

E. Encoder and Decoder

The purpose of the encoder and decoder is to transmit and receive information across a channel. The information to be transferred is assumed to be from a uniformly distributed message set denoted $\mathcal{M}_n$, with a cardinality of $M_n$.

The encoding function maps a message to a stegosignal, i.e. $f_n : \mathcal{M}_n \to \mathcal{X}^n$. The element of $\mathcal{X}^n$ to which the $i$th message maps is called the codeword for $i$ and is denoted, $u_i$. The collection of codewords, $\mathcal{C}_n = \{u_1, \ldots, u_{M_n}\}$ is called the code. The rate, $R_n$, of an encoding function is given as $\frac{1}{n} \log M_n$.

The decoding function, $\phi_n : \mathcal{Z}^n \to \mathcal{M}_n$, maps a corrupted stegosignal to a message. The decoder is defined by the set of decoding regions for the each message. The decoding regions, $D_1, \ldots, D_{M_n}$, are disjoint sets that cover $\mathcal{Z}^n$ and defined such that,

$$\phi_n^{-1}(\{m\}) = D_m := \{F \subseteq \mathcal{Z}^n : \phi_n(z) = m,\ \forall\, z \in F\},$$

for $m = 1, \ldots, M_n$.

Next, two important terms are presented that allow for the analysis of steganographic systems. The first is the probability the decoder makes a mistake, called the probability of error. The second is the probability the steganalyzer is triggered, called the probability of detection. In both cases they are calculated for a given code $\mathcal{C} = \{u_1, \ldots, u_{M_n}\}$, encoder-channel $W^n$, attack-channel $A^n$ and impermissible set $I_{g_n}$ (corresponding to some $g_n$).

The probability of error in decoding the message can be found as,

$$\epsilon_n = \frac{1}{M_n} \sum_{i=1}^{M_n} Q^n(D_i^c \mid u_i), \tag{12}$$

where $Q^n = A^n \circ W^n$.

Similarly the probability of detection for the steganalyzer is calculated as,

$$\delta_n = \frac{1}{M_n} \sum_{i=1}^{M_n} W^n(I_{g_n} \mid u_i). \tag{13}$$

F. Stego-Channel

A steganographic channel or stego-channel is a triple $(\mathbf{W}, \mathbf{g}, \mathbf{A})$, where $\mathbf{W}$ is an arbitrary encoder-noise channel, $\mathbf{g}$ is a steganalyzer sequence, and $\mathbf{A}$ is an arbitrary attack channel. To reinforce the notion that a stego-channel is defined by a sequence of triples we will typically write $(\mathbf{W}, \mathbf{g}, \mathbf{A}) = \{(W^n, g_n, A^n)\}_{n=1}^{\infty}$.

1) Discrete Stego-Channel: A discrete stego-channel is one where at least one of the following holds: $|\mathcal{X}| < \infty$, $|\mathcal{Y}| < \infty$, $|\mathcal{Z}| < \infty$, or $|P_{g_n}| < \infty\ \forall n$.

2) Discrete Memoryless Stego-Channel: A discrete memoryless stego-channel (DMSC) is a stego-channel where,
1) $(\mathbf{W}, \mathbf{g}, \mathbf{A})$ is discrete
2) $\mathbf{W}$ is memoryless
3) $\mathbf{g}$ is memoryless
4) $\mathbf{A}$ is memoryless

A DMSC is said to be defined by the triple $(W, g, A)$ and will be denoted $(\mathbf{W}, \mathbf{g}, \mathbf{A}) = \{(W, g, A)\}$.

G. Steganographic Capacity

The secure capacity tells us how much information can be transferred with arbitrarily low probabilities of error and detection.

An $(n, M_n, \epsilon_n, \delta_n)$-code (for a given stego-channel) consists of an encoder and decoder. The encoder and decoder are capable of transferring one of $M_n$ messages in $n$ uses of the channel with an average probability of error of less than (or equal to) $\epsilon_n$ and a probability of detection of less than (or equal to) $\delta_n$.

1) Secure Capacity: A rate $R$ is said to be securely achievable for a stego-channel $(\mathbf{W}, \mathbf{g}, \mathbf{A}) = \{(W^n, g_n, A^n)\}_{n=1}^{\infty}$, if there exists a sequence of $(n, M_n, \epsilon_n, \delta_n)$-codes such that:
1) $\lim_{n\to\infty} \epsilon_n = 0$
2) $\lim_{n\to\infty} \delta_n = 0$
3) $\liminf_{n\to\infty} \frac{1}{n} \log M_n \ge R$

The secure capacity of a stego-channel $(\mathbf{W}, \mathbf{g}, \mathbf{A})$ is denoted as $C(\mathbf{W}, \mathbf{g}, \mathbf{A})$. This is defined as the supremum of all securely achievable rates for $(\mathbf{W}, \mathbf{g}, \mathbf{A})$.

H. $(\epsilon, \delta)$-Secure Capacity

A rate $R$ is said to be $(\epsilon, \delta)$-securely achievable for a stego-channel $(\mathbf{W}, \mathbf{g}, \mathbf{A}) = \{(W^n, g_n, A^n)\}_{n=1}^{\infty}$, if there exists a sequence of $(n, M_n, \epsilon_n, \delta_n)$-codes such that:
1) $\limsup_{n\to\infty} \epsilon_n \le \epsilon$
2) $\limsup_{n\to\infty} \delta_n \le \delta$
3) $\liminf_{n\to\infty} \frac{1}{n} \log M_n \ge R$

II. SECURE CAPACITY FORMULA

A. Information-Spectrum Methods

The information-spectrum method [6], [7], [8], [9], [10] is a generalization of information theory created to apply to systems where either the channel or its inputs are not necessarily ergodic or stationary. Its use is required in this work because the steganalyzer is not assumed to have any ergodic or stationary properties.

The information-spectrum method uses the general source (also called general sequence) defined as,

$$\mathbf{X} := \left\{ X^n = \left( X_1^{(n)}, X_2^{(n)}, \ldots, X_n^{(n)} \right) \right\}_{n=1}^{\infty}, \tag{14}$$

where each $X_m^{(n)}$ is a random variable defined over alphabet $\mathcal{X}$. It is important to note that the general source makes no assumptions about consistency, ergodicity, or stationarity.

The information-spectrum method also uses two novel quantities defined for sequences of random variables, called the lim sup and lim inf in probability.

The limsup in probability of a sequence of random variables, $\{Z_n\}_{n=1}^{\infty}$, is defined as,

$$\text{p-}\limsup_{n\to\infty} Z_n := \inf\left\{ \alpha : \lim_{n\to\infty} \Pr\{Z_n > \alpha\} = 0 \right\}.$$

Similarly, the liminf in probability of a sequence of random variables, $\{Z_n\}_{n=1}^{\infty}$, is,

$$\text{p-}\liminf_{n\to\infty} Z_n := \sup\left\{ \beta : \lim_{n\to\infty} \Pr\{Z_n < \beta\} = 0 \right\}.$$

The spectral sup-entropy rate of a general source $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ is defined as,

$$\overline{H}(\mathbf{X}) := \text{p-}\limsup_{n\to\infty} \frac{1}{n} \log \frac{1}{p_{X^n}(X^n)}. \tag{15}$$

Analogously, the spectral inf-entropy rate of a general source $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ is defined as,

$$\underline{H}(\mathbf{X}) := \text{p-}\liminf_{n\to\infty} \frac{1}{n} \log \frac{1}{p_{X^n}(X^n)}. \tag{16}$$

The spectral entropy rate has a number of natural properties such as for any $\mathbf{X}$, $\overline{H}(\mathbf{X}) \ge \underline{H}(\mathbf{X}) \ge 0$ [6, Thm. 1.7.2].

The spectral sup-mutual information rate for the pair of general sequences $(\mathbf{X}, \mathbf{Y}) = \{(X^n, Y^n)\}_{n=1}^{\infty}$ is defined as,

$$\overline{I}(\mathbf{X}; \mathbf{Y}) := \text{p-}\limsup_{n\to\infty} \frac{1}{n}\, i(X^n; Y^n), \tag{17}$$

$$i(X^n; Y^n) := \log \frac{p_{Y^n|X^n}(Y^n|X^n)}{p_{Y^n}(Y^n)}. \tag{18}$$

Likewise the spectral inf-mutual information rate for the pair of general sequences $(\mathbf{X}, \mathbf{Y}) = \{(X^n, Y^n)\}_{n=1}^{\infty}$ is defined as,

$$\underline{I}(\mathbf{X}; \mathbf{Y}) := \text{p-}\liminf_{n\to\infty} \frac{1}{n}\, i(X^n; Y^n). \tag{19}$$

B. Information-Spectrum Results

This section lists some of the fundamental results from information-spectrum theory [6] that will be used in the remainder of the paper.

$$\underline{H}(\mathbf{X}) \le \liminf_{n\to\infty} \frac{1}{n} H(X^n) \tag{20}$$

$$\underline{I}(\mathbf{X}; \mathbf{Y}) \le \overline{H}(\mathbf{Y}) - \overline{H}(\mathbf{Y}|\mathbf{X}) \tag{21}$$

$$\underline{I}(\mathbf{X}; \mathbf{Y}) \ge \underline{H}(\mathbf{X}) - \overline{H}(\mathbf{Y}|\mathbf{X}) \tag{22}$$

C. Secure Sequences

1) Secure Input Sequences: For a given stegochannel $(\mathbf{W}, \mathbf{g}, \mathbf{A})$, a general source $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ is called $\delta$-secure if the resulting $\mathbf{Y} = \{Y^n\}_{n=1}^{\infty}$ satisfies,

$$\limsup_{n\to\infty} \Pr\{g_n(Y^n) = 1\} \le \delta, \tag{23}$$

or either of the following equivalent conditions,

$$\limsup_{n\to\infty} p_{Y^n}(I_{g_n}) \le \delta, \tag{24}$$

or

$$\liminf_{n\to\infty} p_{Y^n}(P_{g_n}) \ge 1 - \delta. \tag{25}$$

The set, $S_\delta$, of all general sources that are $\delta$-secure is defined as,

$$S_\delta := \left\{ \mathbf{X} : \limsup_{n\to\infty} \sum_{x \in \mathcal{X}^n} W^n(I_{g_n}|x)\, p_{X^n}(x) \le \delta \right\}, \tag{26}$$

where $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$.

The set for $\delta = 0$ is called secure input set and denoted $S_0$.

2) Secure Output Sequences: For a given steganalyzer sequence $\mathbf{g} = \{g_n\}_{n=1}^{\infty}$, a general sequence $\mathbf{Y} = \{Y^n\}_{n=1}^{\infty}$ is called $\delta$-secure if,

$$\limsup_{n\to\infty} \Pr\{g_n(Y^n) = 1\} \le \delta, \tag{27}$$

The set, $T_\delta$, of all $\delta$-secure general output sequences is defined as,

$$T_\delta := \left\{ \mathbf{Y} = \{Y^n\}_{n=1}^{\infty} : \limsup_{n\to\infty} p_{Y^n}(I_{g_n}) \le \delta \right\}. \tag{28}$$

The set for $\delta = 0$ is called secure output set and denoted $T_0$.

D. $(\epsilon, \delta)$-Secure Capacity

We are now prepared to derive the first fundamental result, the $(\epsilon, \delta)$-Secure Capacity. This capacity will make use of the following definition,

$$J(R|\mathbf{X}) := \limsup_{n\to\infty} \Pr\left\{ \frac{1}{n}\, i(X^n; Z^n) \le R \right\} = \limsup_{n\to\infty} \Pr\left\{ \frac{1}{n} \log \frac{Q^n(Z^n|X^n)}{p_{Z^n}(Z^n)} \le R \right\}.$$

The proof is the general $\epsilon$-capacity proof given by Han [6], [7], with the restriction to the secure input set.

Theorem 2.1 ($(\epsilon, \delta)$-Secure Capacity): The $(\epsilon, \delta)$-secure capacity $C(\epsilon, \delta|\mathbf{W}, \mathbf{g}, \mathbf{A})$ of a stegochannel $(\mathbf{W}, \mathbf{g}, \mathbf{A})$ is given by,

$$C(\epsilon, \delta|\mathbf{W}, \mathbf{g}, \mathbf{A}) = \sup_{\mathbf{X} \in S_\delta} \sup\{R : J(R|\mathbf{X}) \le \epsilon\}, \tag{29}$$

for any $0 \le \epsilon < 1$ and $0 \le \delta < 1$.

Proof: This proof is based on [6], [7]. Let $C = \sup_{\mathbf{X} \in S_\delta} \sup\{R : J(R|\mathbf{X}) \le \epsilon\}$, and $Q^n = A^n \circ W^n$.

Achievability: Choose any $\epsilon \ge 0$ and $\delta > 0$.

Let $R = C - 3\gamma$, for any $\gamma > 0$. By the definition of $C$ we have that there exists an $\mathbf{X} \in S_\delta$ such that,

$$\sup\{R : J(R|\mathbf{X}) \le \epsilon\} \ge C - \gamma = R + 2\gamma. \tag{30}$$

Similarly we may find an $R' > R + \gamma$ such that $J(R'|\mathbf{X}) \le \epsilon$. As $J(R|\mathbf{X})$ is monotonically increasing,

$$J(R + \gamma|\mathbf{X}) \le \epsilon. \tag{31}$$

Next by letting $M_n = e^{nR}$ we have that,

$$\liminf_{n\to\infty} \frac{1}{n} \log M_n \ge R.$$

Using Feinstein's Lemma [11] we have that there exists an $(n, M_n, \epsilon_n)$-code with,

$$\epsilon_n \le \Pr\left\{ \frac{1}{n} \log \frac{Q^n(Z^n|X^n)}{p_{Z^n}(Z^n)} \le \frac{1}{n} \log M_n + \gamma \right\} + e^{-n\gamma}. \tag{32}$$

As $\frac{1}{n} \log M_n = R$ for all $n$ we have,

$$\epsilon_n \le \Pr\left\{ \frac{1}{n} \log \frac{Q^n(Z^n|X^n)}{p_{Z^n}(Z^n)} \le R + \gamma \right\} + e^{-n\gamma}. \tag{33}$$

Taking the lim sup of each side we have,

$$\limsup_{n\to\infty} \epsilon_n \le J(R + \gamma|\mathbf{X}), \tag{34}$$

$J(R + \gamma|\mathbf{X}) \le \epsilon$ shows that $\limsup_{n\to\infty} \epsilon_n \le \epsilon$.

Finally since $\mathbf{X} \in S_\delta$ we have that,

$$\limsup_{n\to\infty} p_{Z^n}(I_{g_n}) \le \delta. \tag{35}$$

Converse: Let $R > C$, and choose $\gamma > 0$ such that $R - 2\gamma > C$. Assume that $R$ is $(\epsilon, \delta)$-achievable, so there exists an $(n, M_n, \epsilon_n, \delta_n)$-code such that,

$$\liminf_{n\to\infty} \frac{1}{n} \log M_n \ge R, \tag{36}$$

$$\limsup_{n\to\infty} \epsilon_n \le \epsilon, \tag{37}$$

and

$$\limsup_{n\to\infty} \delta_n \le \delta. \tag{38}$$

Let $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ where each $X^n$ is a uniform distribution over codewords $\mathcal{C}_n$, and let $\mathbf{Z}$ be the corresponding channel output. Since $R - 2\gamma > C \ge \sup\{R : J(R|\mathbf{X}) \le \epsilon\}$,

$$J(R - 2\gamma|\mathbf{X}) > \epsilon. \tag{39}$$

The Feinstein Dual [6], [7] states that for a uniformly distributed input $X^n$ over a $(n, M_n, \epsilon_n)$-code and output $Z^n$ corresponding to channel $\mathbf{Q}$, the following holds for all $n$,

$$\epsilon_n \ge \Pr\left\{ \frac{1}{n} \log \frac{Q^n(Z^n|X^n)}{p_{Z^n}(Z^n)} \le \frac{1}{n} \log M_n - \gamma \right\} - e^{-n\gamma} \tag{40}$$

Using the property of lim inf we have that for all $n > n_0$ that,

$$\frac{1}{n} \log M_n \ge R - \gamma. \tag{41}$$

Thus for $n > n_0$ we have,

$$\epsilon_n \ge \Pr\left\{ \frac{1}{n} \log \frac{Q^n(Z^n|X^n)}{p_{Z^n}(Z^n)} \le R - 2\gamma \right\} - e^{-n\gamma}. \tag{42}$$

Taking the lim sup of both sides, and considering (39), we see that,

$$\limsup_{n\to\infty} \epsilon_n > \epsilon. \tag{43}$$

A fundamental assumption in the above proof is that the encoder has a knowledge of the detection function. From a steganalysis perspective this allows one to determine the "worst-case scenario" for the amount of information that may be sent through a channel.

E. Secure Capacity

The next result deals with a special case of $(\epsilon, \delta)$-secure capacity, namely the one where $\epsilon = \delta = 0$. The secure capacity is the maximum amount of information that may be sent over a channel with arbitrarily small probabilities of error and detection.

The four potential formulations for our model are shown in Figure 3. The capacity of the stego-channel $(\mathbf{W}, \mathbf{g}, \mathbf{A})$ is shown in Theorem 2.2 to follow and specialized to the other cases in Theorems 2.3, 2.4 and 2.5.

The results of these capacities are summarized in Table II.

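[Figure: the four stego-channel configurations. $(\mathbf{W}, \mathbf{g}, \mathbf{A})$: $W^n(y|x)$, $g_n(y)$, $A^n(y|z)$. $(\cdot, \mathbf{g}, \mathbf{A})$: $X^n = Y^n$, $g_n(y)$, $A^n(y|z)$, $Z^n$. $(\mathbf{W}, \mathbf{g}, \cdot)$: $X^n$, $W^n(y|x)$, $g_n(y)$, $Y^n = Z^n$. $(\cdot, \mathbf{g}, \cdot)$: $X^n = Y^n = Z^n$, $g_n(y)$.]

Fig. 3. Stegochannels

TABLE II

Secure Capacity    Noise    Attack    Thm.
$C(\mathbf{W}, \mathbf{g}, \mathbf{A}) = \sup_{\mathbf{X} \in S_0} \underline{I}(\mathbf{X}; \mathbf{Z})$    $\mathbf{W}$    $\mathbf{A}$    2.2
$C(\cdot, \mathbf{g}, \mathbf{A}) = \sup_{\mathbf{Y} \in T_0} \underline{I}(\mathbf{Y}; \mathbf{Z})$    Noiseless    $\mathbf{A}$    2.3
$C(\mathbf{W}, \mathbf{g}) = \sup_{\mathbf{X} \in S_0} \underline{I}(\mathbf{X}; \mathbf{Y})$    $\mathbf{W}$    Passive    2.4
$C(\cdot, \mathbf{g}) = \sup_{\mathbf{Y} \in T_0} \underline{H}(\mathbf{Y})$    Noiseless    Passive    2.5

Theorem 2.2 (Secure Capacity): The secure capacity $C(\mathbf{W}, \mathbf{g}, \mathbf{A})$ of a stegochannel $(\mathbf{W}, \mathbf{g}, \mathbf{A})$ is given by,

$$C(\mathbf{W}, \mathbf{g}, \mathbf{A}) = \sup_{\mathbf{X} \in S_0} \underline{I}(\mathbf{X}; \mathbf{Z}). \tag{44}$$

Proof: We apply Theorem 2.1 with $\epsilon = 0$ and $\delta = 0$. This gives,

$$C(\mathbf{W}, \mathbf{g}, \mathbf{A}) = C(0, 0|\mathbf{W}, \mathbf{g}, \mathbf{A}) \tag{45a}$$

$$= \sup_{\mathbf{X} \in S_0} \sup\{R : J(R|\mathbf{X}) \le 0\} \tag{45b}$$

$$= \sup_{\mathbf{X} \in S_0} \sup\left\{ R : \limsup_{n\to\infty} \Pr\left\{ \frac{1}{n}\, i(X^n; Z^n) \le R \right\} \le 0 \right\} \tag{45c}$$

$$= \sup_{\mathbf{X} \in S_0} \underline{I}(\mathbf{X}; \mathbf{Z}) \tag{45d}$$

Here the last line is due to the definition of p-lim inf.

Theorem 2.3 (Noiseless Encoder, Active Adversary): The secure capacity of a stego-channel, $(\cdot, \mathbf{g}, \mathbf{A})$, with a noiseless-encoder and active adversary, denoted $C(\cdot, \mathbf{g}, \mathbf{A})$, is given by,

$$C(\cdot, \mathbf{g}, \mathbf{A}) = \sup_{\mathbf{Y} \in T_0} \underline{I}(\mathbf{Y}; \mathbf{Z}). \tag{46}$$

Proof: Apply Theorem 2.2 with $\mathbf{X} = \mathbf{Y}$ and $S_0 = T_0$.

Theorem 2.4 (Passive Adversary): The secure channel capacity with a passive adversary, denoted $C(\mathbf{W}, \mathbf{g})$, of a stego-channel $(\mathbf{W}, \mathbf{g}, \cdot)$ is given by,

$$C(\mathbf{W}, \mathbf{g}) = \sup_{\mathbf{X} \in S_0} \underline{I}(\mathbf{X}; \mathbf{Y}). \tag{47}$$

Proof: Since the adversary is passive, we have that Z =

Theorem 2.5 (Noiseless Encoder, Passive Adversary): The secure capacity of a stego-channel $(\cdot, \mathbf{g}, \cdot)$, with a noiseless-encoder and passive adversary, denoted $C(\cdot, \mathbf{g})$, is given by,

$$C(\cdot, \mathbf{g}) = \sup_{\mathbf{X} \in S_0} \underline{I}(\mathbf{X}; \mathbf{Y}). \tag{48}$$

Proof: Since the adversary is passive, we have that $\mathbf{Z} = \mathbf{Y}$, and since there is no encoder noise we have that $\mathbf{X} = \mathbf{Y}$ and $S_0 = T_0$.

F. Strong Converse

A stego-channel $(\mathbf{W}, \mathbf{g}, \mathbf{A})$ is said to satisfy the $\epsilon$-strong converse property if for any $R > C(0, \delta|\mathbf{W}, \mathbf{g}, \mathbf{A})$, every $(n, M_n, \epsilon_n, \delta_n)$-code with,

$$\liminf_{n\to\infty} \frac{1}{n} \log M_n \ge R,$$

and

$$\limsup_{n\to\infty} \delta_n \le \delta,$$

we have,

$$\lim_{n\to\infty} \epsilon_n = 1.$$

Thus if a channel satisfies the $\epsilon$-strong converse,

$$C(\epsilon, \delta|\mathbf{W}, \mathbf{g}, \mathbf{A}) = C(0, \delta|\mathbf{W}, \mathbf{g}, \mathbf{A}), \tag{49}$$

for any $\epsilon \in [0, 1)$.

Theorem 2.6 ($\epsilon$-Strong Converse): A stego-channel $(\mathbf{W}, \mathbf{g}, \mathbf{A})$ satisfies the $\epsilon$-strong converse property (for a fixed $\delta$) if and only if,

$$\sup_{\mathbf{X} \in S_\delta} \underline{I}(\mathbf{X}; \mathbf{Z}) = \sup_{\mathbf{X} \in S_\delta} \overline{I}(\mathbf{X}; \mathbf{Z}). \tag{50}$$

This proof is essentially the $\epsilon$-strong converse [6], [7] with a restriction to the secure input set. See details in Appendix A

G. Bounds

We now derive a number of useful bounds on the spectral-entropy of an output sequence in relation to the permissible set. These bounds will then be used to prove general bounds for steganographic systems, and see further application in Chapter III.

Theorem 2.7 (Spectral inf-entropy bound): For a discrete $\mathbf{g} = \{P_n\}_{n=1}^{\infty}$ with corresponding secure output set $T_0$,

$$\sup_{\mathbf{Y} \in T_0} \underline{H}(\mathbf{Y}) = \liminf_{n\to\infty} \frac{1}{n} \log |P_n| \tag{51}$$

See Appendix B for proof.

Theorem 2.8 (Spectral sup-entropy bound): For discrete $\mathbf{g} = \{P_n\}_{n=1}^{\infty}$ with corresponding secure output set $T_0$,

$$\sup_{\mathbf{Y} \in T_0} \overline{H}(\mathbf{Y}) = \limsup_{n\to\infty} \frac{1}{n} \log |P_n| \tag{52}$$

See Appendix C for proof.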

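H. Capacity Bounds

This section presents a number of fundamental bounds on the secure capacity of a stego-channel based on the properties of that channel.

We make use of the following lemma,

Lemma 2.1: For a stego-channel $(\mathbf{W}, \mathbf{g}, \mathbf{A})$ the following hold,

$$\underline{I}(\mathbf{X}; \mathbf{Z}) \le \underline{I}(\mathbf{X}; \mathbf{Y}), \tag{53}$$

$$\underline{I}(\mathbf{X}; \mathbf{Z}) \le \underline{I}(\mathbf{Y}; \mathbf{Z}). \tag{54}$$

Proof: We note that the general distributions form a Markov chain, $\mathbf{X} \to \mathbf{Y} \to \mathbf{Z}$ (footnote 1). A property of the inf-information rate [7] is,

$$\underline{I}(\mathbf{X}; \mathbf{Z}) \le \underline{I}(\mathbf{X}; \mathbf{Y}), \tag{55}$$

when $\mathbf{X} \to \mathbf{Y} \to \mathbf{Z}$.

Since $\mathbf{X} \to \mathbf{Y} \to \mathbf{Z}$ implies $\mathbf{Z} \to \mathbf{Y} \to \mathbf{X}$ we also have,

$$\underline{I}(\mathbf{X}; \mathbf{Z}) \le \underline{I}(\mathbf{Y}; \mathbf{Z}). \tag{56}$$

Footnote 1: $\mathbf{X} \to \mathbf{Y} \to \mathbf{Z}$ is said to hold when for all $n$, $X^n$ and $Z^n$ are conditionally independent given $Y^n$.

The first capacity bound gives an upper bound based on the sup-entropy of the secure input set.

Theorem 2.9 (Input Sup-Entropy Bound): For a stego-channel $(\mathbf{W}, \mathbf{g}, \mathbf{A})$ the secure capacity is bounded as,

$$C(\mathbf{W}, \mathbf{g}, \mathbf{A}) \le \sup_{\mathbf{X} \in S_0} \overline{H}(\mathbf{X}) \tag{57}$$

Proof: Using (21) and the property that $\overline{H}(\mathbf{X}|\mathbf{Z}) \ge 0$ we have,

$$C(\mathbf{W}, \mathbf{g}, \mathbf{A}) = \sup_{\mathbf{X} \in S_0} \underline{I}(\mathbf{X}; \mathbf{Z})$$

$$\le \sup_{\mathbf{X} \in S_0} \overline{H}(\mathbf{X}) - \overline{H}(\mathbf{X}|\mathbf{Z})$$

$$\le \sup_{\mathbf{X} \in S_0} \overline{H}(\mathbf{X})$$

The next theorem gives two upper bounds on the capacity based on the sup-entropy of the secure input and output sets.

Theorem 2.10 (Output Sup-Entropy Bounds): For a stego-channel $(\mathbf{W}, \mathbf{g}, \mathbf{A})$ the secure capacity is bounded as,

$$C(\mathbf{W}, \mathbf{g}, \mathbf{A}) \le \sup_{\mathbf{X} \in S_0} \overline{H}(\mathbf{Y}) \tag{59a}$$

$$\le \sup_{\mathbf{Y} \in T_0} \overline{H}(\mathbf{Y}) \tag{59b}$$

Proof: Using (21) and the property that $\overline{H}(\mathbf{Z}|\mathbf{X}) \ge 0$ we have,

$$C(\mathbf{W}, \mathbf{g}, \mathbf{A}) = \sup_{\mathbf{X} \in S_0} \underline{I}(\mathbf{X}; \mathbf{Z})$$

$$\le \sup_{\mathbf{X} \in S_0} \underline{I}(\mathbf{X}; \mathbf{Y})$$

$$\overset{(21)}{\le} \sup_{\mathbf{X} \in S_0} \overline{H}(\mathbf{Y}) - \overline{H}(\mathbf{Y}|\mathbf{X})$$

$$\le \sup_{\mathbf{X} \in S_0} \overline{H}(\mathbf{Y})$$

$$\le \sup_{\mathbf{Y} \in T_0} \overline{H}(\mathbf{Y})$$

Here the final line follows since if $\mathbf{X} \in S_0$ and $\mathbf{X} \to \mathbf{Y}$ then $\mathbf{Y} \in T_0$.

The next corollary specializes the above theorem when the permissible set is finite.

Corollary 2.1 (Discrete Permissible Set Bound): For a given discrete stego-channel $(\mathbf{W}, \mathbf{g}, \mathbf{A}) = \{(W^n, P_{g_n}, A^n)\}_{n=1}^{\infty}$ the secure capacity is bounded from above as,

$$C(\mathbf{W}, \mathbf{g}, \mathbf{A}) \le \limsup_{n\to\infty} \frac{1}{n} \log |P_{g_n}| \tag{61}$$

Proof: Combining Theorem 2.8 and line (59b) of Theorem 2.10 gives the desired result.

The next theorem provides an intuitive result dealing with the capacity of two stego-channels having related steganalyzers.

Theorem 2.11 (Permissible Set Relation): For two stego-channels, $(\mathbf{W}, \mathbf{g}, \mathbf{A})$ and $(\mathbf{W}, \mathbf{v}, \mathbf{A})$, if $P_{g_n} \subseteq P_{v_n}$ for all but finitely many $n$, then,

$$C(\mathbf{W}, \mathbf{g}, \mathbf{A}) \le C(\mathbf{W}, \mathbf{v}, \mathbf{A}). \tag{62}$$

Proof: Let $\{f_n\}_{n=1}^{\infty}$ and $\{\phi_n\}_{n=1}^{\infty}$ be a sequence of encoding and decoding functions that achieves $C(\mathbf{W}, \mathbf{g}, \mathbf{A})$. Such a sequence exists by the definition of secure capacity. The following definitions will be used for $i = 1, \ldots, M_n$,

$$u_i = f_n(i), \qquad D_i = \phi_n^{-1}(\{i\}).$$

The probability of error for this sequence is given by (12),

$$\epsilon_n = \frac{1}{M_n} \sum_{i=1}^{M_n} Q^n(D_i^c \mid u_i),$$

where $Q^n = A^n \circ W^n$.

Clearly, this value is independent of the permissible sets and if $\epsilon_n \to 0$ for the stego-channel $(\mathbf{W}, \mathbf{g}, \mathbf{A})$ then it also goes to zero for $(\mathbf{W}, \mathbf{v}, \mathbf{A})$.

Next we know that the probability of detection for $(\mathbf{W}, \mathbf{g}, \mathbf{A})$ is given by (13),

$$\delta_n^g = \frac{1}{M_n} \sum_{i=1}^{M_n} W^n(I_{g_n} \mid u_i),$$

and that $\delta_n^g \to 0$.

Since $P_{g_n} \subseteq P_{v_n}$ for all $n > N$, we have that $I_{g_n} \supseteq I_{v_n}$ if $n > N$ and thus,

$$W^n(I_{g_n}|x) \ge W^n(I_{v_n}|x), \quad \forall\, n > N,\ x \in \mathcal{X}^n. \tag{63}$$

Using this we may bound the probability of detection for $(\mathbf{W}, \mathbf{v}, \mathbf{A})$ and $n > N$ as,

$$\delta_n^v = \frac{1}{M_n} \sum_{i=1}^{M_n} W^n(I_{v_n} \mid u_i) \overset{(63)}{\le} \frac{1}{M_n} \sum_{i=1}^{M_n} W^n(I_{g_n} \mid u_i)$$

Since $\delta_n^g \to 0$ we see that $\delta_n^v \to 0$ as well.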

[Figure: Encoder $f_n(m)$, Noise $W^n(y|x)$, Decoder $\phi_n(y)$, with Detection g $g_n(y)$ and Detection v $v_n(y)$.]

Fig. 4. Composite steganalyzer

[Figure: Noise A $A(y|x)$ and Noise B $B(z|y)$, with Detection g $g_n(y)$ and Detection v $v_n(z)$.]

Fig. 5. Two Noise Channel

[Figure: $X^n = Y^n = Z^n$; message $M_n$, Encoder $f_n(m)$, Decoder $\phi_n(z)$, estimate $\hat{M}_n$, with Detection $g_n(y)$.]

Fig. 6. Noiseless Stego-Channel

I. Applications

1) Composite steganalyzers: This final theorem of the previous section is intuitively pleasing and leads to some immediate results. An example of this is the composite steganalyzer pictured in Figure 4.

In this system two steganalyzers, $\mathbf{g}$ and $\mathbf{v}$, are used sequentially on the corrupted stego-signal. If either of these steganalyzers are triggered, the message is considered steganographic. We will denote the composite stego-channel of this system as $(\mathbf{W}, \mathbf{h}, \mathbf{A})$.

As one would expect the capacity of the composite channel, $C(\mathbf{W}, \mathbf{h}, \mathbf{A})$, is smaller than either $C(\mathbf{W}, \mathbf{g}, \mathbf{A})$ or $C(\mathbf{W}, \mathbf{v}, \mathbf{A})$. This is shown in the next theorem.

Theorem 2.12 (Composite Stego-Channel): For a composite stego-channel $(\mathbf{W}, \mathbf{h}, \mathbf{A})$ defined by $\mathbf{g}$ and $\mathbf{v}$, the following inequality holds,

$$C(\mathbf{W}, \mathbf{h}, \mathbf{A}) \le \min\{C(\mathbf{W}, \mathbf{g}, \mathbf{A}),\ C(\mathbf{W}, \mathbf{v}, \mathbf{A})\}. \tag{65}$$

Proof: We first show that $C(\mathbf{W}, \mathbf{h}, \mathbf{A}) \le C(\mathbf{W}, \mathbf{g}, \mathbf{A})$.

The permissible set of the composite is equal to the intersection of the base detection functions,

$$P_{h_n} = P_{g_n} \cap P_{v_n}, \quad \forall n, \tag{66}$$

thus we have that $P_{h_n} \subseteq P_{g_n}$ and we may apply Theorem 2.11 to state,

$$C(\mathbf{W}, \mathbf{h}, \mathbf{A}) \le C(\mathbf{W}, \mathbf{g}, \mathbf{A}).$$

The above argument may be applied using $P_{h_n} \subseteq P_{v_n}$ to show $C(\mathbf{W}, \mathbf{h}, \mathbf{A}) \le C(\mathbf{W}, \mathbf{v}, \mathbf{A})$.

2) Two Noise Systems: We briefly present and discuss an interesting case that is somewhat counter-intuitive. Consider the channel shown in Figure 5. In this case there is distortion $A$ after the encoder and a second distortion, $B$, before the second steganalyzer. In the previous section it was shown that in the composite steganalyzer the addition of a second steganalyzer (Figure 5) lowers the capacity of the stego-channel. A surprising result for the two noise system is that this may not be the case: in fact, the addition of a second distortion may increase the capacity of a stego-channel!

To see this consider the two steganalyzers $\mathbf{g}$ and $\mathbf{v}$. Assume that $\mathbf{g}$ classifies signals with positive means as steganographic, while $\mathbf{v}$ classifies signals with negative means as steganographic. If these detection functions were in series, the permissible set (of the composite detection function) is empty as a signal cannot have a positive and negative mean. Now consider a specific, deterministic distortion $B^n(-y|y) = 1$. Now we may send any signal we wish, as long as its mean is positive. So in some instances, it is possible for the addition of a distortion to actually increase the capacity.

III. NOISELESS CHANNELS

This section investigates the capacity of the noiseless stego-channel shown in Figure 6. In this system there is no encoder-noise and the adversary is passive. This means that not only does the decoder receive exactly what the encoder sends, but the steganalyzer does as well.

This section finds the secure capacity of this system, and then derives a number of intuitive bounds relating to this capacity.

A. Secure Noiseless Capacity

Theorem 3.1 (Secure Noiseless Capacity): For a discrete noiseless channel $(\cdot, \mathbf{g}, \cdot)$ the secure capacity is given by,

$$C(\cdot, \mathbf{g}) = \liminf_{n\to\infty} \frac{1}{n} \log |P_{g_n}| \tag{67}$$

Proof: The proof follows directly from Theorem 2.5 and Theorem 2.7.

Example 2 (Capacity of the Sum Steganalyzer): We now use this result to find the secure noiseless capacity of the parity steganalyzer of Example 1. The size of the permissible set for $n$ is equal to the number of different ways we may arrange up to $n/2$ 1s into $n$ positions.

$$|P_{g_n}| = \sum_{i:\, 0 \le i \le \frac{n}{2}} \binom{n}{i}. \tag{68}$$

For $n$ even $|P_{g_n}| = 2^{n-1} + \frac{1}{2}\binom{n}{n/2}$ and for $n$ odd, $|P_{g_n}| = 2^{n-1}$. Applying the noiseless Theorem,

$$C(\cdot, \mathbf{g}) = \liminf_{n\to\infty} \frac{1}{n} \log |P_{g_n}| = \lim_{n\to\infty} \frac{1}{n} \log 2^{n-1} = 1 \text{ bit/use}. \tag{69a}$$

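B. $\epsilon$-Strong Converse for Noiseless Channels

We now present a fundamental result for discrete noiseless channels regarding the $\epsilon$-strong converse property. It gives the necessary and sufficient conditions for a noiseless stego-channel to satisfy the $\epsilon$-strong converse property.

Theorem 3.2 (Noiseless $\epsilon$-Strong Converse): A discrete noiseless stego-channel $(\cdot, \mathbf{g}, \cdot)$ satisfies the $\epsilon$-strong converse property if and only if,

$$C(\cdot, \mathbf{g}) = \lim_{n\to\infty} \frac{1}{n} \log |P_{g_n}|. \tag{70}$$

Proof: Since the channel is noiseless, $\mathbf{X} = \mathbf{Y} = \mathbf{Z}$, we have,

$$\sup_{\mathbf{X} \in S_0} \underline{I}(\mathbf{X}; \mathbf{Z}) = \sup_{\mathbf{Y} \in T_0} \underline{H}(\mathbf{Y}), \tag{71}$$

$$\sup_{\mathbf{X} \in S_0} \overline{I}(\mathbf{X}; \mathbf{Z}) = \sup_{\mathbf{Y} \in T_0} \overline{H}(\mathbf{Y}). \tag{72}$$

First assume that the stego-channel satisfies the $\epsilon$-strong converse property. This gives,

$$\sup_{\mathbf{Y} \in T_0} \underline{H}(\mathbf{Y}) = \sup_{\mathbf{X} \in S_0} \underline{I}(\mathbf{X}; \mathbf{Z}) = \sup_{\mathbf{X} \in S_0} \overline{I}(\mathbf{X}; \mathbf{Z}) = \sup_{\mathbf{Y} \in T_0} \overline{H}(\mathbf{Y})$$

The capacity is then,

$$C(\cdot, \mathbf{g}) = \sup_{\mathbf{Y} \in T_0} \underline{H}(\mathbf{Y}) = \liminf_{n\to\infty} \frac{1}{n} \log |P_{g_n}| = \sup_{\mathbf{Y} \in T_0} \overline{H}(\mathbf{Y}) = \limsup_{n\to\infty} \frac{1}{n} \log |P_{g_n}| = \lim_{n\to\infty} \frac{1}{n} \log |P_{g_n}|$$

Here the final line results as the lim inf and lim sup coincide.

For the other direction assume that $C(\cdot, \mathbf{g}) = \lim_{n\to\infty} \frac{1}{n} \log |P_{g_n}|$, thus we have,

$$C(\cdot, \mathbf{g}) = \sup_{\mathbf{X} \in S_0} \underline{I}(\mathbf{X}; \mathbf{Z}) = \sup_{\mathbf{Y} \in T_0} \underline{H}(\mathbf{Y}) = \lim_{n\to\infty} \frac{1}{n} \log |P_{g_n}| = \limsup_{n\to\infty} \frac{1}{n} \log |P_{g_n}| = \sup_{\mathbf{Y} \in T_0} \overline{H}(\mathbf{Y}) = \sup_{\mathbf{X} \in S_0} \overline{I}(\mathbf{X}; \mathbf{Z})$$

Thus, $\sup_{\mathbf{X} \in S_0} \underline{I}(\mathbf{X}; \mathbf{Z}) = \sup_{\mathbf{X} \in S_0} \overline{I}(\mathbf{X}; \mathbf{Z})$ and by Theorem 2.6 the stego-channel satisfies the $\epsilon$-strong-converse

Example 3 (Sum Steganalyzer): We now determine if the sum steganalyzer satisfies the $\epsilon$-strong converse.

From Example 2 the size of the permissible set is,

$$|P_{g_n}| = \begin{cases} 2^{n-1} + \frac{1}{2}\binom{n}{n/2}, & \text{for even } n \\ 2^{n-1}, & \text{for odd } n \end{cases} \tag{75}$$

We will make use of Stirling's approximation,

$$n! = \sqrt{2\pi}\, n^{n + \frac{1}{2}} e^{-n + \lambda_n}, \tag{76}$$

where $1/(12n + 1) < \lambda_n < 1/(12n)$.

For $n$ even,

$$|P_{g_n}| = 2^{n-1} + \frac{1}{2} \frac{n!}{(n - \frac{1}{2}n)!\,(\frac{1}{2}n)!} \tag{77}$$

$$= 2^{n-1} + \frac{1}{2} \frac{\sqrt{2\pi}\, n^{n + \frac{1}{2}} e^{-n + \lambda_n}}{\left( \sqrt{2\pi}\, (n/2)^{\frac{n}{2} + \frac{1}{2}} e^{-\frac{n}{2} + \lambda_{n/2}} \right)^2} \tag{78}$$

$$\le 2^{n-1} \left( 1 + \frac{2e}{\sqrt{2\pi n}} \right) \tag{79}$$

This gives,

$$\limsup_{n\to\infty} \frac{1}{n} \log |P_{g_n}| \le \limsup_{n\to\infty} \frac{1}{n} \log 2^{n-1} \left( 1 + \frac{2e}{\sqrt{2\pi n}} \right) \tag{80}$$

$$= 1 \tag{81}$$

This shows,

$$\liminf_{n\to\infty} \frac{1}{n} \log |P_{g_n}| = 1 \ge \limsup_{n\to\infty} \frac{1}{n} \log |P_{g_n}|. \tag{82}$$

Since the liminf and limsup coincide the limit is indeed a true one. Thus, this stego-channel satisfies the $\epsilon$-strong converse.

C. Properties of the Noiseless DMSC

In this section we briefly investigate the secure capacity of the discrete memoryless stego-channel (cf. I-F2).

Theorem 3.3 (Noiseless DMSC Secure Capacity): For the stego-channel $(\cdot, \mathbf{g}, \cdot)$ with $\mathbf{g} = \{g\}$, the secure capacity is given by,

$$C(\cdot, \mathbf{g}) = \log |P_g|, \tag{83}$$

and furthermore this stego-channel satisfies the strong

Proof: As the channel is noiseless and the input alphabet is finite we may use Theorem 3.1,

$$C(\cdot, \mathbf{g}) = \liminf_{n\to\infty} \frac{1}{n} \log |P_{g_n}|. \tag{84}$$

Note that by (7) we have for all $n$,

$$\frac{1}{n} \log |P_{g_n}| = \frac{1}{n} \log \left| P_g \times P_g \times \cdots \times P_g \right| = \frac{1}{n} \log |P_g|^n = \log |P_g|.$$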

Fig. 7. Additive Noise Channel Active Adversary (blocks: Encoder, Noise $N_e^n$, Detection $g_n(y)$, Attack $N_a^n$, Decoder; signals $X^n$, $Y^n = X^n + N_e^n$, $Z^n = Y^n + N_a^n$).

Fig. 8. AWGN Channel Active Adversary (blocks: Encoder, Noise $\mathcal{N}(0, \sigma_e^2)$, Detection $g_n(y)$, Attack $\mathcal{N}(0, \sigma_a^2)$, Decoder).

$$C(\cdot, g) = \log|P_g|. \tag{85}$$

We also have that

$$C(\cdot, g) = \liminf_{n\to\infty} \frac{1}{n}\log|P_{g_n}| = \log|P_g| = \lim_{n\to\infty} \frac{1}{n}\log|P_{g_n}|,$$

thus by Theorem 3.2 the stego-channel satisfies the strong converse.

In this section we evaluate the capacity of particular stego-channel, shown in Figure 7. In this channel both the encoder-noise and attack-noise are additive and independent from the channel input.

A. Additive Noise

Denote the sum of two general sequences $\mathbf{X} = \{X^n = (X_1^{(n)}, \ldots, X_n^{(n)})\}_{n=1}^{\infty}$, and $\mathbf{Y} = \{Y^n = (Y_1^{(n)}, \ldots, Y_n^{(n)})\}_{n=1}^{\infty}$ as,

$$\mathbf{X} + \mathbf{Y} := \{X^n + Y^n = (X_1^{(n)} + Y_1^{(n)}, \ldots, X_n^{(n)} + Y_n^{(n)})\}_{n=1}^{\infty}.$$

Letting the encoder-noise be denoted as $\mathbf{N}_e = \{N_e^n\}_{n=1}^{\infty}$ and the attack-noise denoted as $\mathbf{N}_a = \{N_a^n\}_{n=1}^{\infty}$ we have the following relations,

$$\mathbf{Y} = \mathbf{X} + \mathbf{N}_e$$
$$\mathbf{Z} = \mathbf{Y} + \mathbf{N}_a = \mathbf{X} + \mathbf{N}_e + \mathbf{N}_a = \mathbf{X} + \mathbf{N}$$

where $\mathbf{N} = \{N^n\}_{n=1}^{\infty} = \mathbf{N}_e + \mathbf{N}_a$.

As noises are independent from the stego-signal we may use the following simplifications,

$$p_{Z^n|X^n}(X^n + N^n | X^n) = p_{N^n}(N^n),$$

leading to the following simplifications in spectral-entropies,

$$H(\mathbf{Z}|\mathbf{X}) = H(\mathbf{N}), \tag{88}$$

$$H(\mathbf{Z}|\mathbf{X}) = H(\mathbf{N}). \tag{89}$$

We now use these simplifications to present a useful capacity result for additive noise channels.

Theorem 4.1: For additive noise stego-channel defined with $\mathbf{N}_e + \mathbf{N}_a = \mathbf{N}$, if $\mathbf{N}$ satisfies the strong converse (i.e. $H(\mathbf{N}) = H(\mathbf{N})$) then the capacity is,

$$C(W, g, A) = \sup_{\mathbf{X}\in S_0} \{H(\mathbf{Z})\} - H(\mathbf{N}) \tag{90}$$

Proof: First we find a lower bound as,

$$C(W, g, A) \ge \sup H(\mathbf{Z}) - H(\mathbf{Z}|\mathbf{X}) \tag{91}$$

$$= \sup_{\mathbf{X}\in S_0} \{H(\mathbf{Z})\} - H(\mathbf{N}) \tag{92}$$

Next we upperbound the capacity as,

$$C(W, g, A) \le \sup_{\mathbf{X}\in S_0} \{H(\mathbf{Z}) - H(\mathbf{Z}|\mathbf{X})\} \tag{93}$$

$$= \sup \{H(\mathbf{Z})\} - H(\mathbf{N}) \tag{94}$$

By assumption $H(\mathbf{N}) = H(\mathbf{N})$ and combining (92) and (94) we have the desired result.

B. AWGN Example

The general formula of the previous section is now applied to the commonly found additive white Gaussian noise channel. The detector is motivated by the use of spread spectrum steganography[12] or more generally stochastic modulation[13].

The encoder-noise and attack-channel to be considered are additive white Gaussian noise (AWGN). Thus for a stego-signal, $x = (x_1, \ldots, x_n)$, the corrupted stego-signal is given

$$y = (x_1 + n_1, \ldots, x_n + n_n),$$

where each $n_i \sim \mathcal{N}(0, \sigma_e^2)$, and all are independent.

The transition probabilities of the encoder-noise are given by,

$$W^n(y|x) = \frac{1}{(2\pi\sigma_e^2)^{n/2}} \exp\left(-\frac{1}{2\sigma_e^2}\sum_{i=1}^{n}(y_i - x_i)^2\right). \tag{95}$$

Similarly, the attack-channel is AWGN as $\mathcal{N}(0, \sigma_a^2)$ so the transition probabilities are,

$$A^n(z|y) = \frac{1}{(2\pi\sigma_a^2)^{n/2}} \exp\left(-\frac{1}{2\sigma_a^2}\sum_{i=1}^{n}(z_i - y_i)^2\right). \tag{96}$$

1) Variance Steganalyzer: In stochastic modulation, a pseudo-noise is modulated by a message and added to the cover signal. This is done, as the presence of noise in signal processing applications is a common occurrence.

If the passive adversary has knowledge of the distribution of the coversignal and suspects that the hider is using stochastic modulation, it expects the variance of a stegosignal will differ from a coversignal. If the passive adversary knows the variance of the cover-distribution it could design a steganalyzer to trigger if the variance of a test signal is higher than expected.

For example when testing the signal $y = (y_1, \ldots, y_n)$ the variance steganalyzer operates as,

$$g_n(y) = \begin{cases} 1, & \text{if } \frac{1}{n}\sum_{i=1}^{n} y_i^2 > c \\ 0, & \text{else} \end{cases} \tag{97}$$

Thus, if the empirical variance of a test signal is above a certain threshold, the signal is considered steganographic.

2) Additive Gaussian Channel Active Adversary: In this section we derive the capacity under an active adversary. Assume that the adversary uses an additive i.i.d. Gaussian noise with variance $\sigma_a^2$ while the encoder noise is additive i.i.d. Gaussian with $\sigma_e^2$.

Let $\mathbf{N}_e = \{N_e\}$² where $N_e \sim \mathcal{N}(0, \sigma_e^2)$ and $\mathbf{N}_a = \{N_a\}$ where $N_a \sim \mathcal{N}(0, \sigma_a^2)$.

² Recall that for a general sequence, $\mathbf{X} = \{X^n = (X_1^{(n)}, \ldots, X_n^{(n)})\}_{n=1}^{\infty}$, when $\mathbf{X} = \{X\}$ is written it means that each $X_i$ is independent and identically distributed as $X$.

Let $\mathbf{N} = \mathbf{N}_e + \mathbf{N}_a = \{N^n = N_e^n + N_a^n\}_{n=1}^{\infty}$. Since both $\mathbf{N}_e$ and $\mathbf{N}_a$ are i.i.d. as $\mathcal{N}(0, \sigma_e^2)$ and $\mathcal{N}(0, \sigma_a^2)$, respectively, their sum is i.i.d. as $\mathcal{N}(0, \sigma_e^2 + \sigma_a^2)$, i.e. $\mathbf{N} = \{N\}$ with $N \sim \mathcal{N}(0, \sigma_e^2 + \sigma_a^2)$.

Since $\mathbf{N} = \{N\}$ with $N \sim \mathcal{N}(0, \sigma_e^2 + \sigma_a^2)$ we have the following relations,

$$H(\mathbf{N}) = H(\mathbf{N}) = H(N) = \frac{1}{2}\log 2\pi e\left(\sigma_a^2 + \sigma_e^2\right). \tag{98}$$

Since $H(\mathbf{N}) = H(\mathbf{N})$ we see that the noise sequence satisfies the strong converse property.

3) Active Adversary Capacity: We now derive the secure capacity of the above stego-channel. Since the noises are i.i.d. the general sequence $\mathbf{N}$ will satisfy the strong converse and allow the use of Theorem 4.1.

The formal proof is then followed by a discussion of the results and a description using the classic sphere packing intuition.

Theorem 4.2: For the stego-channel $(W, g, A) = \{(W^n, g_n, A^n)\}_{n=1}^{\infty}$ with $W^n$ and $A^n$ defined by (95) and (96) respectively, and $g_n$ defined by (97) the secure capacity is,

$$C(W, g, A) = \frac{1}{2}\log\frac{c + \sigma_a^2}{\sigma_e^2 + \sigma_a^2}. \tag{99}$$

Proof: From Theorem 4.1 and (98) we have,

$$C(W, g, A) = \sup_{\mathbf{X}\in S_0} \{H(\mathbf{Z})\} - H(\mathbf{N}) \tag{100}$$

$$= \sup_{\mathbf{X}\in S_0} \{H(\mathbf{Z})\} - \frac{1}{2}\log 2\pi e\left(\sigma_a^2 + \sigma_e^2\right). \tag{101}$$

Achievability:

Let $\mathbf{X} = \{X\}$ where $X \sim \mathcal{N}(0, c - \sigma_e^2)$. Thus $\mathbf{Y} = \mathbf{X} + \mathbf{N}_e = \{Y\}$ with $Y = X + N_e$. By addition of independent Gaussians, $Y \sim \mathcal{N}(0, c)$. This gives,

$$\Pr\left\{\frac{1}{n}\sum_{i=1}^{n}\left(Y_i^{(n)}\right)^2 > c\right\} \to 0, \tag{102}$$

and we see that $\mathbf{X} \in S_0$. Similarly, $\mathbf{Z} = \mathbf{N}_a + \mathbf{Y} = \{Z\}$ with $Z = X + N_e + N_a$. Again by addition of independent Gaussians we have $Z \sim \mathcal{N}(0, c + \sigma_a^2)$.

This allows for a lower bound of,

$$C(W, g, A) = \sup_{\mathbf{X}\in S_0} H(\mathbf{Z}) - \frac{1}{2}\log 2\pi e(\sigma_e^2 + \sigma_a^2) \tag{103a}$$

$$\ge H(Z) - \frac{1}{2}\log 2\pi e(\sigma_e^2 + \sigma_a^2) \tag{103b}$$

$$= \frac{1}{2}\log\frac{c + \sigma_a^2}{\sigma_e^2 + \sigma_a^2} \tag{103c}$$

Converse:

To find the upperbound we will make use of a number of simple lemmas:

Lemma 4.1: For a given stego-channel with secure input distribution set $S_0$ and secure output distribution set $T_0$, the following holds,

$$\sup_{\mathbf{X}\in S_0} H(\mathbf{Z}) \le \sup_{\mathbf{Y}\in T_0} H(\mathbf{Z}). \tag{104}$$

Proof: By definition for any $\mathbf{X} \in S_0$ and $\mathbf{X} \to \mathbf{Y}$, we have $\mathbf{Y} \in T_0$.

Lemma 4.2: For $Y^n = (Y_1^{(n)}, Y_2^{(n)}, \ldots, Y_n^{(n)})$ let $K_{ij}^{(n)}$ be the covariance between $Y_i^{(n)}$ and $Y_j^{(n)}$, that is $K_{ij}^{(n)} := E\{Y_i^{(n)} Y_j^{(n)}\}$. For the stego-channel defined above, if $\mathbf{Y} = \{Y^n\}_{n=1}^{\infty} \in T_0$ we have for any $\gamma > 0$ there exists some N such that for all n > N,

$$\frac{1}{n}\sum_{i=1}^{n} K_{ii}^{(n)} + \sigma_a^2 < c + \sigma_a^2 + \gamma. \tag{105}$$

Proof: It suffices to show,

$$\frac{1}{n}\sum_{i=1}^{n} K_{ii}^{(n)} < c + \gamma, \tag{106}$$

for all n greater than some N.

To show this, assume that no such N exists, thus we have a subsequence $n_k$ such that,

$$\frac{1}{n_k}\sum_{i=1}^{n_k} K_{ii}^{(n_k)} \ge c + \gamma. \tag{107}$$

This means that,

$$\frac{1}{n_k}\sum_{i=1}^{n_k} K_{ii}^{(n_k)} = E\left\{\frac{1}{n_k}\sum_{i=1}^{n_k} y_i^2\right\} \ge c + \gamma,$$

which in turn implies that,

$$\Pr\{g_{n_k}(Y^{n_k}) = 0\} \to 0.$$

This is a contradiction as it shows $\mathbf{Y} = \{Y^n\}_{n=1}^{\infty} \notin T_0$.

Lemma 4.3: For any $Z^n = (Z_1, \ldots, Z_n)$ with $C_{ij} = E\{Z_i Z_j\}$,

$$H(Z^n) \le \frac{1}{2}\log(2\pi e)^n\left(\frac{1}{n}\sum_{i=1}^{n} C_{ii}\right)^n. \tag{108}$$

Proof: From [14, Chap. 9.6] we have,

$$H(Z^n) \le \frac{1}{2}\log(2\pi e)^n \prod_{i=1}^{n} C_{ii}. \tag{109}$$

The result follows from application of the arithmetic-geometric

Lemma 4.4: For the above stego-channel, any $\mathbf{Y} \in T_0$ and any $\epsilon > 0$ we have,

$$\liminf_{n\to\infty} \frac{1}{n} H(Z^n) < \frac{1}{2}\log 2\pi e(c + \sigma_a^2) + \epsilon, \tag{110}$$

where $\mathbf{Z} = \{Z^n\}_{n=1}^{\infty}$ and $\mathbf{Y} \to \mathbf{Z}$.

Proof: Let any $\epsilon > 0$ be given and choose $\gamma > 0$ such that,

$$\gamma \le (c + \sigma_a^2)\left(e^{2\epsilon} - 1\right),$$

this gives,

$$\frac{1}{2}\log 2\pi e\left(c + \sigma_a^2 + \gamma\right) \le \frac{1}{2}\log 2\pi e(c + \sigma_a^2) + \epsilon. \tag{111}$$

Letting $C_{ij}^{(n)} = E\{Z_i^{(n)} Z_j^{(n)}\}$ and $K_{ij}^{(n)} = E\{Y_i^{(n)} Y_j^{(n)}\}$ we note that $Z_i^{(n)} = Y_i^{(n)} + N_a$. This gives,

$$C_{ii} = K_{ii} + \sigma_a^2. \tag{112}$$

This gives,

$$\frac{1}{n} H(Z^n) \overset{(L4.3)}{\le} \frac{1}{2n}\log(2\pi e)^n\left(\frac{1}{n}\sum_{i=1}^{n} C_{ii}^{(n)}\right)^n \tag{113}$$

$$= \frac{1}{2n}\log(2\pi e)^n\left(\frac{1}{n}\sum_{i=1}^{n} K_{ii}^{(n)} + \sigma_a^2\right)^n$$

$$\overset{(L4.2)}{<} \frac{1}{2n}\log(2\pi e)^n\left(c + \sigma_a^2 + \gamma\right)^n \tag{115}$$

$$\overset{(111)}{\le} \frac{1}{2}\log 2\pi e(c + \sigma_a^2) + \epsilon \tag{116}$$

The inequality of (115) holds for all but a finite number of n by Lemma 4.2.

We now show the upperbound:

Beginning with the specialization of Theorem 4.1,

$$C(W, g, A) = \sup_{\mathbf{X}\in S_0} \{H(\mathbf{Z})\} - \frac{1}{2}\log 2\pi e(\sigma_e^2 + \sigma_a^2) \tag{117a}$$

$$\overset{(L4.1)}{\le} \sup_{\mathbf{Y}\in T_0} \{H(\mathbf{Z})\} - \frac{1}{2}\log 2\pi e(\sigma_e^2 + \sigma_a^2) \tag{117b}$$

$$\overset{(20)}{\le} \sup_{\mathbf{Y}\in T_0} \liminf_{n\to\infty} \frac{1}{n} H(Z^n) - \frac{1}{2}\log 2\pi e(\sigma_e^2 + \sigma_a^2) \tag{117c}$$

$$\overset{(L4.4)}{<} \frac{1}{2}\log\frac{c + \sigma_a^2}{\sigma_e^2 + \sigma_a^2} + \epsilon \tag{117d}$$

Combining (103c) and (117d) we have for any $\epsilon > 0$,

$$\frac{1}{2}\log\frac{c + \sigma_a^2}{\sigma_e^2 + \sigma_a^2} \le C(W, g, A) < \frac{1}{2}\log\frac{c + \sigma_a^2}{\sigma_e^2 + \sigma_a^2} + \epsilon,$$

and we see that $C(W, g, A) = \frac{1}{2}\log\frac{c + \sigma_a^2}{\sigma_e^2 + \sigma_a^2}$.

4) Noise Cases: We now use this theorem to investigate the behavior of the capacity under different noise conditions.

GAUSSIAN ADDITIVE NOISE CAPACITIES

| Channel | Secure Capacity | Encoder Noise | Attack Noise |
|---|---|---|---|
| $C(W, g, A)$ | $\frac{1}{2}\log\frac{c + \sigma_a^2}{\sigma_e^2 + \sigma_a^2}$ | $\sigma_e^2$ | $\sigma_a^2$ |
| $C(W, g)$ | $\frac{1}{2}\log\frac{c}{\sigma_e^2}$ | $\sigma_e^2$ | 0 |
| $C(\cdot, g, A)$ | $\frac{1}{2}\log\frac{c + \sigma_a^2}{\sigma_a^2}$ | 0 | $\sigma_a^2$ |
| $C(\cdot, g)$ | $\lim_{\sigma^2\to 0} \frac{1}{2}\log\frac{c + \sigma^2}{2\sigma^2}$ | 0 | 0 |

Fig. 9. AWGN Channel Passive Adversary (blocks: Encoder $f_n(m)$, Noise $\mathcal{N}(0, \sigma^2)$, Detection $g_n(y)$, Decoder $\phi_n(y)$).

5) Large Attack Case: We first consider the case where $\sigma_a^2$ is much larger than both $c$ and $\sigma_e^2$. This gives,

$$C(W, g, A) = \frac{1}{2}\log\frac{c + \sigma_a^2}{\sigma_e^2 + \sigma_a^2} \approx \frac{1}{2}\log\frac{\sigma_a^2}{\sigma_a^2} = 0.$$

Thus when the attack noise is large enough the capacity of the stego-channel goes to zero. Intuitively this is due to the fact that the variance steganalyzer places a power constraint (of c) on any signals it allows to pass. If the attack noise is much larger than c, a message simply cannot be transmitted with enough power to overcome that noise and $\epsilon_n \to 0$ is impossible.

6) Large Encoder-Noise Case: Next we consider the case where $\sigma_e^2 \ge c$.

Since $\frac{c + \sigma_a^2}{\sigma_e^2 + \sigma_a^2} \le 1$, we have $\log\frac{c + \sigma_a^2}{\sigma_e^2 + \sigma_a^2} \le 0$. This gives,

$$C(W, g, A) = \frac{1}{2}\log\frac{c + \sigma_a^2}{\sigma_e^2 + \sigma_a^2} \le 0$$

As capacity is always greater or equal to zero we see that the capacity of this system is indeed zero. This is because no matter what codeword is sent, the encoder-noise will corrupt it into the impermissible set and the steganalyzer will be triggered, that is $\delta_n$ −→

This case illuminates the importance of the additional constraint in communication over a stego-channel, as even if $\epsilon \to 0$ the capacity of the stego-channel is still zero.

7) Noiseless Case: Consider the noiseless case where $\sigma_e^2 = \sigma_a^2 = \sigma^2$ and $\sigma^2 \to 0$. This gives,

$$\lim_{\sigma^2\to 0} C(W, g, A) = \lim_{\sigma^2\to 0} \frac{1}{2}\log\frac{c + \sigma^2}{\sigma^2 + \sigma^2} = \infty$$

Thus we see that since the channel is noiseless, and the permissible set size (as well as input and output alphabets) is uncountable (thus infinite) and the capacity is unbounded.

8) Geometric Intuition: In this section we present some geometric intuition to the previous results, similar to the case of the classic additive Gaussian noise[14], [15].

We will consider the case of only an encoder-noise of $\sigma^2$, shown in Figure 9.
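Theorem 4.2 and the noise cases 5) to 7) in runnable form, together with the variance steganalyzer of (97); this is an added example, not code from the paper. The Monte Carlo harness `detection_rate`, the function names and the sample values (c = 1 and the chosen noise variances) are assumptions made for illustration, and capacities are reported in bits (log base 2).

```python
import math
import random


def variance_steganalyzer(y, c):
    """g_n(y) of (97): 1 (steganographic) if the empirical power exceeds c."""
    return 1 if sum(v * v for v in y) / len(y) > c else 0


def secure_capacity(c, var_e, var_a):
    """Theorem 4.2 in bits per use, floored at zero since capacity is never
    negative. var_e = var_a = 0 is the noiseless limit (unbounded capacity)."""
    if var_e + var_a == 0:
        return math.inf
    return max(0.0, 0.5 * math.log2((c + var_a) / (var_e + var_a)))


def detection_rate(signal_var, var_e, c, n=1000, trials=100, seed=1):
    """Fraction of trials in which the detector fires on Y = X + N_e, with
    X i.i.d. N(0, signal_var) and N_e i.i.d. N(0, var_e) (constructed data)."""
    rng = random.Random(seed)
    sd_x, sd_e = math.sqrt(signal_var), math.sqrt(var_e)
    hits = 0
    for _ in range(trials):
        y = [rng.gauss(0, sd_x) + rng.gauss(0, sd_e) for _ in range(n)]
        hits += variance_steganalyzer(y, c)
    return hits / trials


if __name__ == "__main__":
    c = 1.0
    cases = [
        ("general   (sigma_e^2=0.25, sigma_a^2=0.25)", 0.25, 0.25),
        ("passive   (sigma_a^2=0)", 0.25, 0.0),
        ("5) large attack noise (sigma_a^2=1e6)", 0.25, 1e6),
        ("6) encoder noise >= c (sigma_e^2=2)", 2.0, 0.5),
        ("7) noiseless limit", 0.0, 0.0),
    ]
    for label, var_e, var_a in cases:
        print(f"{label:45s} C = {secure_capacity(c, var_e, var_a):.3g} bits/use")

    # Stego-signal whose total power (0.5 + 0.25) stays under the threshold c.
    print("detection rate, power 0.75 < c:", detection_rate(0.5, 0.25, c))
    # Case 6: even the all-zero codeword is pushed out of the permissible set.
    print("detection rate, sigma_e^2 = 2 > c:", detection_rate(0.0, 2.0, c))
```

The last line is the operational reading of case 6): with encoder noise above the threshold the detector fires on every trial regardless of the codeword, which is why the capacity is zero even though decoding errors are not the limiting factor.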

From the above theorem we see that,

$$C(W, g) = \frac{1}{2}\log\frac{c}{\sigma^2}. \tag{118}$$

The most basic element will be the volume of an n dimensional sphere of radius r. In this case the volume is equal to $A_n r^n$ where $A_n$ is a constant dependent only on the dimension n.

The fundamental question is what is the capacity of the stego-channel, or how many codewords can we reliably use. To answer this, we must consider the two constraints on a secure system: error probability and detection probability.

9) Error Probability: Since we have that $\mathcal{X}^n = \mathcal{Y}^n = \mathbb{R}^n$, we may view each codeword as a point in $\mathbb{R}^n$. When we transmit a given codeword we may think of the addition of noise as moving the point around in that space. Since the power of the noise is $\sigma^2$, the probability that the received codeword has moved more than $\sqrt{n\sigma^2}$ away from where it started goes to zero as $n \to \infty$. Thus we know that if we transmit a codeword, it will likely be contained in a sphere (centered on the codeword) of radius $\sqrt{n\sigma^2}$.

This means that if we receive a signal inside such a sphere, it is likely that the transmitted codeword was the center of that sphere. In this manner we can define a coding system.

We know that for secure capacity the probability of error must go to zero. We also know that each codeword has an associated sphere that the received signal will fall inside. Thus if we choose the codewords such that their spheres do not overlap, there will be no confusion in decoding and the probability of error will go to zero.

10) Detection Probability: We begin by looking at the permissible set. The permissible set for our $g_n$ is given by,

$$P_{g_n} = \left\{y \in \mathcal{Y}^n : \sum_{i=1}^{n} y_i^2 < nc\right\}. \tag{119}$$

Clearly the permissible set is a sphere of radius $\sqrt{nc}$ centered at the origin. If a test signal falls inside this sphere it is classified as non-steganographic, whereas if it is outside it is considered steganographic.

The second criteria for a secure system is that the probability of detection go to zero. If we were to place each codeword such that its sphere was inside the permissible set, we know that the probability of detection will go to zero.

11) Capacity: From the above we know that the codeword spheres cannot overlap (to ensure no errors), and we also know that all the codeword spheres must fit inside the permissible set (to ensure no detection). Thus if we calculate the number of non-overlapping spheres we may pack into the permissible set, we will have a general idea of the number of codewords we can use.

Since the volume of the permissible set is $A_n (nc)^{n/2}$ and the volume of each codeword sphere is $A_n (n\sigma^2)^{n/2}$ we can place

$$\frac{A_n (nc)^{n/2}}{A_n (n\sigma^2)^{n/2}} = \left(\frac{c}{\sigma^2}\right)^{n/2},$$

non-overlapping sphere inside the permissible set.

Thus using the center of each sphere as a codeword, we have $M_n$ codewords where $M_n = \left(\frac{c}{\sigma^2}\right)^{n/2}$.

If we consider the capacity as $C(W, g) = \lim \frac{1}{n}\log M_n$ we have,

$$C(W, g) = \lim \frac{1}{n}\log\left(\frac{c}{\sigma^2}\right)^{n/2} \tag{120a}$$

$$= \frac{1}{2}\log\frac{c}{\sigma^2}, \tag{120b}$$

which agrees with the result of Theorem 4.2.

V. PREVIOUS WORK REVISITED

A. Cachin Perfect Security

In Cachin's definition of perfect security[16] the cover-signal distribution and the stego-signal distribution are each required to be independent and identically distributed. This gives the following secure-input set,

$$S_0 = \left\{\mathbf{X} = \{X\} : \lim_{n\to\infty} \frac{1}{n} D(S^n \| X^n) = 0\right\}. \tag{121}$$

The i.i.d. property means that $D(S^n \| X^n) = nD(S \| X)$ so we see that the above is equivalent to,

$$S_0 = \{\mathbf{X} = \{X\} : D(S \| X) = 0\} \tag{122}$$

$$= \{\mathbf{X} = \{X\} : p_S = p_X\} \tag{123}$$

Since Cachin's definition does not model noise, we may consider it as noiseless and apply Theorem 3.1,

$$C(W, g) = \sup H(X) = H(S). \tag{124}$$

This result states that in a system that is perfectly secure (in Cachin's definition) the limit on the amount of information that may be transferred each channel use is equal to the entropy of the source. This is intuitive because in Cachin's definition the output distribution of the encoder is constrained to be equal to the cover distribution.

B. Empirical Distribution Steganalyzer

The empirical distribution steganalyzer is motivated by the fact that the empirical distribution from a stationary memoryless source converges to the actual distribution of that source. Accordingly, if the empirical distribution of the test signal converges to the cover-signal distribution it is considered to be non-steganographic.

Assume that $p_S$ is a discrete distribution over the finite alphabet $\mathcal{S}$. Let a sequence, $\{s^n\}_{n=1}^{\infty}$ with each $s^n \in \mathcal{S}^n$ be used to specify the steganalyzer for a test signal x as,

$$g_n(x) = \begin{cases} 0 & \text{if } P_{[s^n]} = P_{[x]}, \\ 1 & \text{if } P_{[s^n]} \ne P_{[x]}. \end{cases} \tag{125}$$

where $P_{[x]}$ is the empirical distribution of x.

The permissible set for $g_n$ is equal to the type class of $P_{[s^n]}$, i.e.,

$$P_{g_n} = T(P_{[s^n]}) := \left\{x \in \mathcal{X}^n : P_{[x]} = P_{[s^n]}\right\}. \tag{126}$$

Theorem 5.1 (Empirical Distribution Steganalyzer Capacity):

$$C(W, g) = H(S). \tag{127}$$

Fig. 10. Moulin Stego-channel (blocks: Source, Encoder $f_N(m, s)$, Noise $A^n(y|x)$, Detection $X \sim p_S$, Decoder $\phi_N(y)$; distortion constraints $D_1$, $D_2$; decoder output $\hat{M}$).

Fig. 11. Equivalent Stego-channel (blocks: Source, Encoder $f_n(m, s)$, Decoder $\phi_n(y)$; message $M$, decoder output $\hat{M}$).

Proof: Since the channel is noiseless we may apply Theorem 3.1.

$$C(W, g) = \liminf_{n\to\infty} \frac{1}{n}\log|P_{g_n}| \tag{128a}$$

$$= \liminf_{n\to\infty} \frac{1}{n}\log|T(s^n)| \tag{128b}$$

$$= H(S) \tag{128c}$$

Here we have used the fact that the permissible set for the empirical distribution detection function is the type class in (128b). Additionally, by Varadarajan's Theorem[17], $P_{[s^n]}(x) \to p_S(x)$ almost surely (here the convergence is uniform in x as well). This allows for the use of the type class-entropy bound from Theorem D.1 that provides the final

C. Moulin Steganographic Capacity

Moulin's formulation[2], [3] of the stego-channel is shown in Figure 10. This is somewhat different than the formulation shown in Figure 1; most notable is the presence of distortion constraints and an absence of a distortion function prior to the steganalyzer. Additionally, an explicit steganalyzer is not defined and a hypothetical $X \sim p_S$ is used. In order to have the two formulations coincide a number of simplifications are needed for each model.

For our model,
• The stego-channel is noiseless
• The steganalyzer is the empirical distribution

For Moulin's model,
• Passive adversary ($D_2 = 0$)
• No distortion constraint on encoder ($D_1 = \infty$)

These changes produce the stego-channel shown in Figure 11.

Theorem 5.2: For the stego-channel shown in Figure 11, the capacities of this work and Moulin's agree. That is,

$$C(W, g) = C^{STEG}(\infty, 0) = H(S). \tag{129}$$

Proof: Theorem 5.1 shows $C(W, g) = H(S)$.

We now show Moulin's capacity is equal to this value. In the case of a passive adversary ($D_2 = 0$), the following is the capacity of the stego-channel[2],

$$C^{STEG}(D_1, 0) = \sup_{Q \in \mathcal{Q}} H(X|S) \tag{130}$$

where a $p \in \mathcal{Q}$ is feasible if,

$$\sum_{s,x} p(x|s)\,p_S(s)\,d(s, x) \le D_1, \tag{131}$$

$$\sum_{s} p(x|s)\,p_S(s) = p_S(x). \tag{132}$$

The capacity can be found for unbounded $D_1$ as,

$$C^{STEG}(\infty, 0) = \sup H(X|S) \tag{133a}$$

$$= H(S) - \min I(S; X) \tag{133b}$$

$$= H(S) \tag{133c}$$

where the final line comes from choosing $p(x) = p_S(x)$.

A framework for evaluating the capacity of steganographic channels under an active adversary has been introduced. The system considers a noise corrupting the signal before the detection function in order to model real-world distortions such as compression, quantization, etc.

Constraints on the encoder dealing with distortion and a cover-signal are not considered. Instead, the focus is to develop the theory necessary to analyze the interplay between the channel and detection function that results in the steganographic

The method uses an information-spectrum approach that allows for the analysis of arbitrary detection functions and channels. This provides machinery necessary to analyze a very broad range of steganographic channels.

In addition to offering insight into the limits of performance for steganographic algorithms, this formulation of capacity can be used to analyze a different, and fundamentally important, facet of steganalysis. While false alarms and missed signals have rightfully dominated the steganalysis literature, very little is known about the amount of information that can be sent past these algorithms. This work presents a theory to shed light onto this important quantity called steganographic capacity.

A stego-channel $(W, g, A)$ satisfies the ε-strong converse property (for a fixed $\delta$) if and only if,

$$\sup_{\mathbf{X}\in S_\delta} I(\mathbf{X};\mathbf{Z}) = \sup_{\mathbf{X}\in S_\delta} I(\mathbf{X};\mathbf{Z}). \tag{A.134}$$

Proof: First assume $\sup_{\mathbf{X}\in S_\delta} I(\mathbf{X};\mathbf{Z}) = \sup_{\mathbf{X}\in S_\delta} I(\mathbf{X};\mathbf{Z})$. Let $R = C(0, \delta|W, g, A) + 3\gamma$ with $\gamma > 0$. Consider an $(n, M_n, \epsilon_n, \delta_n)$-code with,

$$\liminf_{n\to\infty} \frac{1}{n}\log M_n \ge R,$$

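and

$$\limsup_{n\to\infty} \delta_n \le \delta.$$

Let $\mathbf{X}$ represent the uniform input due to this code and $\mathbf{Z}$ the output after the channel Q = AX. From the Feinstein Dual [6], [7] we know,

$$\epsilon_n \ge \Pr\left\{\frac{1}{n} i(X^n; Z^n) \le \frac{1}{n}\log M_n - \gamma\right\} - e^{-n\gamma}. \tag{A.135}$$

We also know there exists $n_0$ such that for all $n > n_0$ that,

$$\frac{1}{n}\log M_n \ge R - \gamma, \tag{A.136}$$

so for $n > n_0$,

$$\epsilon_n \ge \Pr\left\{\frac{1}{n} i(X^n; Z^n) \le R - 2\gamma\right\} - e^{-n\gamma}. \tag{A.137}$$

We now show that the probability term above tends to 1. Using Theorem 2.2 we have,

$$R = C(0, \delta|W, g, A) + 3\gamma \tag{A.138}$$

$$= \sup_{\mathbf{X}\in S_\delta} I(\mathbf{X};\mathbf{Z}) + 3\gamma \tag{A.139}$$

$$= \sup_{\mathbf{X}\in S_\delta} I(\mathbf{X};\mathbf{Z}) + 3\gamma \tag{A.140}$$

Rewriting gives,

$$R - 2\gamma = \sup_{\mathbf{X}\in S_\delta} I(\mathbf{X};\mathbf{Z}) + \gamma. \tag{A.141}$$

By the definition of $I(\mathbf{X};\mathbf{Z})$ we finally have,

$$\lim_{n\to\infty} \Pr\left\{\frac{1}{n} i(X^n; Z^n) \le R - 2\gamma\right\} = 1, \tag{A.142}$$

which together with A.137 shows that $\lim_{n\to\infty} \epsilon_n = 1$.

For the other direction assume,

$$\lim \epsilon_n = 1, \tag{A.143}$$

and,

$$\limsup \delta_n \le \delta. \tag{A.144}$$

Set $R = C(0, \delta|W, g, A) + \gamma$ for any $\gamma > 0$ and set $M_n = e^{nR}$. Clearly,

$$\liminf_{n\to\infty} \frac{1}{n}\log M_n = R > C(0, \delta|W, g, A).$$

For any $\mathbf{X} \in S_\delta$ (and its corresponding $\mathbf{Z}$), using Feinstein's Lemma [11] we have an $(n, M_n, \epsilon_n)$-code satisfying,

$$\epsilon_n \le \Pr\left\{\frac{1}{n} i(X^n; Z^n) \le R + \gamma\right\} + e^{-n\gamma}. \tag{A.145}$$

From the error assumption we see that,

$$\lim_{n\to\infty} \Pr\left\{\frac{1}{n} i(X^n; Z^n) \le R + \gamma\right\} = 1. \tag{A.146}$$

This means that,

$$R + \gamma \ge I(\mathbf{X};\mathbf{Z}), \tag{A.147}$$

and since $\mathbf{X} \in S_\delta$ is arbitrary we have,

$$R + \gamma \ge \sup_{\mathbf{X}\in S_\delta} I(\mathbf{X};\mathbf{Z}). \tag{A.148}$$

Substituting we have that,

$$\sup_{\mathbf{X}\in S_\delta} I(\mathbf{X};\mathbf{Z}) \le R + \gamma \tag{A.149}$$

$$= C(0, \delta|W, g, A) + 2\gamma \tag{A.150}$$

$$= \sup_{\mathbf{X}\in S_\delta} I(\mathbf{X};\mathbf{Z}) + 2\gamma \tag{A.151}$$

As $\gamma$ is arbitrarily close to 0 we have,

$$\sup_{\mathbf{X}\in S_\delta} I(\mathbf{X};\mathbf{Z}) \le \sup_{\mathbf{X}\in S_\delta} I(\mathbf{X};\mathbf{Z}). \tag{A.152}$$

Also, by definition,

$$\sup_{\mathbf{X}\in S_\delta} I(\mathbf{X};\mathbf{Z}) \ge \sup_{\mathbf{X}\in S_\delta} I(\mathbf{X};\mathbf{Z}), \tag{A.153}$$

showing equality and completing the proof.

SPECTRAL INF-ENTROPY BOUND

For a discrete $g = \{P_n\}_{n=1}^{\infty}$ with corresponding secure output set $T_0$,

$$\sup_{\mathbf{Y}\in T_0} H(\mathbf{Y}) = \liminf_{n\to\infty} \frac{1}{n}\log|P_n|$$

Proof: Let $U(A)$ represent the uniform distribution on a set A.

Since $\mathbf{Y}^* = \{U(P_n)\}_{i=1}^{\infty} \in T_0$ we have,

$$\sup H(\mathbf{Y}) \ge H(\mathbf{Y}^*) = \liminf_{n\to\infty} \frac{1}{n}\log|P_n| \tag{B.154}$$

Now assume there exists $\mathbf{Y} \in T_0$ with $\mathbf{Y} = \{\bar{Y}^n\}_{n=1}^{\infty}$, such

$$H(\mathbf{Y}) = H(\mathbf{Y}^*) + 3\gamma, \tag{B.155}$$

for any $\gamma > 0$.

This means that,

$$\lim_{n\to\infty} \Pr\left\{\frac{1}{n}\log\frac{1}{p_{\bar{Y}^n}(\bar{Y}^n)} < H(\mathbf{Y}^*) + 2\gamma\right\} = 0 \tag{B.156}$$

By (B.154) we have $H(\mathbf{Y}^*) = \liminf_{n\to\infty} \frac{1}{n}\log|P_n|$ and from the definition of lim inf we may find a subsequence indexed by $k_n$ such that,

$$H(\mathbf{Y}^*) + 2\gamma \ge \frac{1}{k_n}\log|P_{k_n}| + \gamma. \tag{B.157}$$

For any $k_n$ (B.157) holds and we have,

$$\Pr\left\{\frac{1}{k_n}\log\frac{1}{p_{\bar{Y}^{k_n}}(\bar{Y}^{k_n})} < \frac{1}{k_n}\log|P_{k_n}| + \gamma\right\} \le \Pr\left\{\frac{1}{k_n}\log\frac{1}{p_{\bar{Y}^{k_n}}(\bar{Y}^{k_n})} < H(\mathbf{Y}^*) + 2\gamma\right\}. \tag{B.158}$$

Applying this result to (B.156) we have,

$$\lim_{n\to\infty} \Pr\left\{\frac{1}{k_n}\log\frac{1}{p_{\bar{Y}^{k_n}}(\bar{Y}^{k_n})} < \frac{1}{k_n}\log|P_{k_n}| + \gamma\right\} = 0. \tag{B.159}$$

For any $\epsilon > 0$ and n greater than some $n_0$,

$$\Pr\left\{p_{\bar{Y}^{k_n}}(\bar{Y}^{k_n}) > \frac{e^{-k_n\gamma}}{|P_{k_n}|}\right\} < \epsilon. \tag{B.160}$$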

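Let,

$$A_{k_n} = \left\{y \in \mathcal{Y}^n : p_{\bar{Y}^{k_n}}(\bar{Y}^{k_n}) > \frac{e^{-k_n\gamma}}{|P_{k_n}|}\right\}, \tag{B.161}$$

and for all $n > n_0$, we have $p_{\bar{Y}^{k_n}}(A_{k_n}) < \epsilon$.

For $n > n_0$ we may calculate the probability of the permissible set (for the subsequence) as,

$$p_{\bar{Y}^{k_n}}(P_{k_n}) = \sum_{y\in P_{k_n}} p_{\bar{Y}^{k_n}}(y) \tag{B.162a}$$

$$= \sum_{y\in P_{k_n}\cap A_{k_n}^c} p_{\bar{Y}^{k_n}}(y) + \sum_{y\in P_{k_n}\cap A_{k_n}} p_{\bar{Y}^{k_n}}(y) \tag{B.162b}$$

$$\le \sum_{y\in P_{k_n}} \frac{e^{-k_n\gamma}}{|P_{k_n}|} + \sum_{y\in A_{k_n}} p_{\bar{Y}^{k_n}}(y) \tag{B.162c}$$

$$< e^{-k_n\gamma} + \epsilon \tag{B.162d}$$

This shows $p_{\bar{Y}^{k_n}}(P_{k_n}) \not\to 1$ and we have a contradiction as $\mathbf{Y} \notin T_0$.

For discrete $g = \{P_n\}_{n=1}^{\infty}$ with corresponding secure output set $T_0$,

$$\sup_{\mathbf{Y}\in T_0} H(\mathbf{Y}) = \limsup_{n\to\infty} \frac{1}{n}\log|P_n|$$

Proof: Since $\mathbf{Y}^* = \{U(P_n)\}_{i=1}^{\infty} \in T_0$ we have,

$$\sup H(\mathbf{Y}) \ge H(\mathbf{Y}^*) \tag{C.163a}$$

$$= \limsup_{n\to\infty} \frac{1}{n}\log|P_n| \tag{C.163b}$$

Now assume there exists $\mathbf{Y} \in T_0$, with $\mathbf{Y} = \{\bar{Y}^n\}_{n=1}^{\infty}$ such

H(Y) = H(Y∗) + , (C.164)

for any $\gamma > 0$.

This means that,

$$\lim_{n\to\infty} \Pr\left\{\frac{1}{n}\log\frac{1}{p_{\bar{Y}^n}(\bar{Y}^n)} > H(\mathbf{Y}^*) + \frac{\gamma}{2}\right\} = 0 \tag{C.165}$$

By the definition of lim sup for some subsequence $k_n$ we have,

$$\frac{1}{k_n}\log|P_{k_n}| + \gamma > H(\mathbf{Y}^*) + \frac{\gamma}{2} \tag{C.166}$$

$$\lim_{n\to\infty} \Pr\left\{\frac{1}{k_n}\log\frac{1}{p_{\bar{Y}^{k_n}}(\bar{Y}^{k_n})} > \frac{1}{k_n}\log|P_{k_n}| + \gamma\right\} = 0.$$

For any $\epsilon > 0$ letting,

$$A_{k_n} = \left\{y \in \mathcal{X}^n : p_{\bar{Y}^{k_n}}(\bar{Y}^{k_n}) < \frac{e^{-k_n\gamma}}{|P_{k_n}|}\right\} \tag{C.168}$$

we may find $n_0$ where for $n > n_0$,

$$p_{\bar{Y}^{k_n}}(A_{k_n}) < \epsilon. \tag{C.169}$$

For $n > n_0$ the probability of the permissible set (in this subsequence) is,

$$p_{\bar{Y}^{k_n}}(P_{k_n}) = \sum_{x\in P_{k_n}} p_{\bar{Y}^{k_n}}(y) \tag{C.170a}$$

$$= \sum_{y\in P_{k_n}\cap A_{k_n}^c} p_{\bar{Y}^{k_n}}(y) + \sum_{y\in P_{k_n}\cap A_{k_n}} p_{\bar{Y}^{k_n}}(y) \tag{C.170b}$$

$$\le \frac{e^{-k_n\gamma}}{|P_{k_n}|} \sum_{y\in P_{k_n}\cap A_{k_n}^c} 1 + \sum_{y\in P_{k_n}\cap A_{k_n}} p_{\bar{Y}^{k_n}}(y) \tag{C.170c}$$

$$< e^{-k_n\gamma} + \epsilon \tag{C.170d}$$

showing it is impossible for $\mathbf{Y} \in T_0$.

Theorem D.1: Let $(p_1, p_2, \ldots)$ be a sequence of types defined over the finite alphabet $\mathcal{X}$ where $p_n \in \mathcal{P}_n$. Assume this sequence satisfies the following:
1) $p_n \to p$
2) $p_n \ll p$, $\forall n$

Then,

$$\lim_{n\to\infty} \frac{1}{n}\log|T(p_n)| = H(p). \tag{D.171}$$

Proof: We first show,

$$\liminf_{n\to\infty} \frac{1}{n}\log|T(p_n)| \ge H(p). \tag{D.172}$$

A sharpening of Stirling's approximation states that for $\frac{1}{12n+1} < \lambda_n < \frac{1}{12n}$,

$$n! = \sqrt{2\pi}\, n^{n+\frac{1}{2}} e^{-n} e^{\lambda_n}.$$

Let the empirical distribution, $p_n$ be specified by $(n_1, \ldots, n_{K_n})$. That is, if we enumerate the outcomes as $(a_1, \ldots, a_{K_n})$ we have that,

$$p_n(a_i) = \frac{n_i}{n}.$$

By definition $\sum_{i=1}^{K_n} n_i = n$, and from the above condition of absolute continuity we have that $K_n \le s(p)$ for all n, where $s(p)$ is the support of the final distribution.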

$$\log|T(p_n)| = \log\frac{n!}{n_1!\, n_2! \cdots n_{K_n}!}$$

$$= \log\frac{\sqrt{2\pi}\, n^{n+\frac{1}{2}} e^{-n} e^{\lambda_n}}{\prod_{i=1}^{K_n} \sqrt{2\pi}\, n_i^{n_i+\frac{1}{2}} e^{-n_i} e^{\lambda_{n_i}}}$$

$$= n\log n - \sum_{i=1}^{K_n} n_i\log n_i + \log\sqrt{2\pi n}\, e^{\lambda_n} - \sum_{i=1}^{K_n} \log\sqrt{2\pi n_i}\, e^{\lambda_{n_i}}$$

$$\ge nH(p_n) - K_n\log\sqrt{2\pi n}\, e^{\frac{1}{12}}$$

This implies that,

$$\frac{1}{n}\log|T(p_n)| \ge H(p_n) - \frac{s(p)}{n}\log\sqrt{2\pi n}\, e^{\frac{1}{12}}.$$

Taking the lim inf of each side,

$$\liminf_{n\to\infty} \frac{1}{n}\log|T(p_n)| \ge \liminf_{n\to\infty} H(p_n) = H(p). \tag{D.173}$$

Now we have from the type class upper-bound[14] that,

$$\limsup_{n\to\infty} \frac{1}{n}\log|T(p_n)| \le \limsup_{n\to\infty} H(p_n). \tag{D.174}$$

Combining with (D.173) gives the desired result.

The support of the Center for Integrated Transmission and Exploitation (CITE) and the Information Directorate of the Air Force Research Laboratory, Rome, NY is gratefully acknowledged. The authors also wish to thank the anonymous reviewers for their helpful and constructive comments.

REFERENCES

[1] P. Moulin and J. A. O'Sullivan, "Information-theoretic analysis of information hiding," IEEE Transactions on Information Theory, vol. 49, no. 3, pp. 563–593, Mar. 2003.
[2] P. Moulin and Y. Wang, "New results on steganographic capacity," in Proc. CISS Conference, Princeton, NJ, Mar. 2004.
[3] Y. Wang and P. Moulin, "Perfectly secure steganography: Capacity, error exponents, and code constructions," IEEE Transactions on Information Theory, vol. 54, no. 6, pp. 2706–2722, Jun. 2008.
[4] R. Chandramouli and N. Memon, "Steganography capacity: a steganalysis perspective," in Proc. SPIE Electronic Imaging 5022, Santa Clara, CA, Jan. 21–24, 2003.
[5] I. S. Moskowitz, L. Chang, and R. E. Newman, "Capacity is the wrong paradigm," in Proceedings of the 2002 workshop on New security paradigms. ACM Press, 2002, pp. 114–126. [Online]. Available:
[6] T. S. Han, Information-Spectrum Methods in Information Theory, ser. Applications of Mathematics. Berlin, Germany: Springer-Verlag, 2003, vol. 50.
[7] S. Verdú and T. S. Han, "A general formula for channel capacity," IEEE Trans. on Information Theory, vol. 40, no. 4, pp. 1147–1157, Jul. 1994.
[8] T. S. Han and S. Verdú, "Generalizing the Fano inequality," IEEE Trans. on Information Theory, vol. 40, no. 4, pp. 1247–1251, Jul. 1994.
[9] T. S. Han, "Hypothesis testing with the general source," IEEE Trans. on Information Theory, vol. 46, no. 7, pp. 2415–2427, Nov. 2000.
[10] ——, "An information-spectrum approach to source coding theorems with a fidelity criterion," IEEE Trans. on Information Theory, vol. 43, no. 4, pp. 1145–1164, Jul. 1997.
[11] A. Feinstein, "A new basic theorem of information theory," IEEE Trans. on Information Theory, vol. 4, no. 4, pp. 2–22, Sep. 1954.
[12] L. M. Marvel, C. G. Boncelet, Jr, and C. T. Retter, "Spread spectrum image steganography," IEEE Trans. Image Processing, vol. 8, no. 8, pp. 1075–1083, Aug. 1999.
[13] J. Fridrich and M. Goljan, "Digital image steganography using stochastic modulation," in Proc. SPIE Electronic Imaging 5022, Santa Clara, CA, Jan. 21–24, 2003.
[14] T. M. Cover and J. A. Thomas, Elements of information theory. Wiley-Interscience, 1991.
[15] C. E. Shannon, "Communication in the presence of noise," Proceedings of the I.R.E., vol. 37, pp. 10–21, Jan. 1949.
[16] C. Cachin, An Information-Theoretic Model for Steganography, ser. Lecture Notes in Computer Science. New York: Springer-Verlag, 1998, vol. 1525. [Online]. Available:
[17] R. M. Dudley, Real Analysis and Probability. Cambridge, UK: Cambridge University Press, 2002.

Jeremiah Harmsen is currently an engineer at Google Inc. in Mountain View, CA. Jeremiah received B.S. degrees in Electrical Engineering and Computer Engineering (2001), an M.S. degree in electrical engineering (2003), an M.S. degree in mathematics (2005) and a Ph.D. in electrical engineering (2005) from Rensselaer Polytechnic Institute, Troy, NY. Jeremiah received the Rensselaer Wynant James Prize in Electrical Engineering, Rensselaer Founders Award and Rensselaer Electrical Computer and Systems Departmental Service Award. He is a member of Tau Beta Pi and Eta Kappa Nu.

William Pearlman is Professor of Electrical, Computer and Systems Engineering and Director of the Center for Image Processing Research, Rensselaer Polytechnic Institute, Troy, NY. Previously, he held industrial positions at Lockheed Missiles and Space Company and GTE-Sylvania before joining the Electrical and Computer Engineering Department, University of Wisconsin, Madison, WI, in 1974, and then moved to Rensselaer in 1979. He has spent sabbaticals at the Technion in Israel, Haifa, and Delft University of Technology, Delft, The Netherlands. He received a National Research Council associateship to spend six months in 2000 at the Air Force Research Laboratory, Rome, NY. In the summer of 2001, he was a Visiting Professor at the University of Hannover, Germany, where he was awarded a prize for outstanding work in the field of picture coding. His research interests are in data compression of images, video, and audio, digital signal processing, information theory, and digital communications theory. Dr. Pearlman is a Fellow of SPIE. He has been a past Associate Editor for Coding for the IEEE TRANSACTIONS ON IMAGE PROCESSING and served on many IEEE and SPIE conference committees, assuming the chairmanship of SPIE's Visual Communications and Image Processing in 1989 (VCIP89). He was a keynote speaker at VCIP2001 and at the Picture Coding Symposium 2001 in Seoul, Korea. He has received the IEEE Circuits and Systems Society 1998 Video Transactions Best Paper Award and the IEEE Signal Processing Society 1998 Best Paper Award in the Area of Multidimensional Signal and Image Processing.