Weighted Entropy
DOI: 10.15559/17-VMSTA85
arXiv:1710.10798v1 [math.PR] 30 Oct 2017
Abstract This paper represents an extended version of an earlier note [10]. The concept of weighted entropy takes into account values of different outcomes, i.e., makes entropy context-dependent, through the weight function. We analyse analogs of the Fisher information inequality and entropy power inequality for the weighted entropy and discuss connections with weighted Lieb's splitting inequality. The concepts of rates of the weighted entropy and information are also discussed.
Keywords Weighted entropy, Gibbs inequality, Ky-Fan inequality, Fisher information
inequality, entropy power inequality, Lieb’s splitting inequality, rates of weighted entropy and
information
2010 MSC 94A17
1 Introduction
This paper represents an extended version of an earlier note [10].1 We also follow
earlier publications discussing related topics: [20, 21, 19, 18]. The Shannon entropy
∗ Corresponding author.
1 AMCTA-2017, Talks at Conference “Analytical and Computational Methods in Probability Theory
and its Applications”
© 2017 The Author(s). Published by VTeX. Open access article under the CC BY license.
www.i-journals.org/vmsta
234 M. Kelbert et al.
is context-free, i.e., does not depend on the nature of outcomes $x_i$ or $x$, but only upon probabilities $p(x_i)$ or values $f(x)$. This gives the notion of entropy a great flexibility, which explains its successful applications. However, in many situations it seems insufficient, and the context-free property appears as a drawback. Viz., suppose you learn news about severe weather conditions in an area far away from your place. Such conditions usually do not happen; an event like this has a small probability $p \ll 1$ and conveys a high information $-\log p$. At the same time you hear that a tree near your parking lot in town has fallen and damaged a number of cars. The probability of this event is also low, so the amount of information is again high. However, the value of this information for you is higher than that of the first event. Considerations of this character can motivate a study of weighted information and entropy, making them context-dependent.
Definition 1.1. Let us define the weighted entropy (WE) as
\[ h^w_\phi(p) = -\sum_i \phi(x_i)\,p(x_i)\log p(x_i). \tag{1.2} \]
For $\phi(x) \equiv 1$ we get the normal SDE $h(f^{\mathrm{No}}_C) = \frac{1}{2}\log\big[(2\pi e)^d \det C\big]$.
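As a quick numerical sketch of Definition 1.1 (natural logarithms; the two-outcome distribution and the weight function `phi_value` below are hypothetical choices, echoing the severe-weather example above):

```python
import math

# Weighted entropy per (1.2), natural logarithm.
def weighted_entropy(p, phi):
    return -sum(phi(x) * px * math.log(px) for x, px in p.items() if px > 0)

p = {"calm": 0.99, "storm": 0.01}                     # hypothetical source
phi_flat = lambda x: 1.0                              # phi == 1: Shannon entropy
phi_value = lambda x: 10.0 if x == "storm" else 1.0   # rare outcome valued 10x

h_plain = weighted_entropy(p, phi_flat)      # ~0.0560 nats
h_weighted = weighted_entropy(p, phi_value)  # ~0.4705 nats
```

With $\phi \equiv 1$ the WE reduces to the Shannon entropy; upweighting the rare, high-value outcome raises the WE accordingly.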
In this note we give a brief introduction to the concept of the weighted entropy. We do not always give proofs, referring the reader to the quoted original papers. Some basic properties of the WE and WDE have been presented in [20]; see also the references therein to early works on the subject. Applications of the WE and WDE to the security quantification of information systems are discussed in [15]. Other domains range from the stock market to image processing; see, e.g., [6, 9, 12, 14, 23, 26].
Throughout this note we assume that the series and integrals in (1.2)–(1.3) and the subsequent equations converge absolutely, without stressing it every time again. To unify the presentation, we will often use integrals $\int_{\mathcal X}\cdot\,d\mu$ relative to a reference $\sigma$-finite measure $\mu$ on a Polish space $\mathcal X$ with a Borel $\sigma$-algebra $\mathfrak X$. In this regard, the acronym PM/DF (probability mass/density function) will be employed. Usual measurability assumptions will also be in place for the rest of the presentation. We also assume that the WF $\phi > 0$ on an open set in $\mathcal X$.
In some parts of the presentation, the sums and integrals comprising a PM/DF will be written as expectations: this will make it easier to explain/use assumptions and properties involved. Viz., Eqns (1.2)–(1.3) can be given as $h^w_\phi(p) = -E\,\phi(X)\log p(X)$ and $h^w_\phi(f) = -E\,\phi(X)\log f(X)$ where $X$ is a random variable (RV) with the PM/DF $p$ or $f$. Similarly, in (1.4), $\alpha_\phi(C) = E\,\phi(X)$ and $\Phi_{C,\phi} = E\,\phi(X)XX^{\rm T}$ where $X \sim {\rm N}(0, C)$.
Given two non-negative functions $f, g$ (typically, PM/DFs), define the weighted Kullback–Leibler divergence (or the relative WE, briefly RWE) as
\[ D^w_\phi(f\|g) = \int_{\mathcal X} \phi(x) f(x) \log\frac{f(x)}{g(x)}\,d\mu(x). \tag{2.1} \]
Then $D^w_\phi(f\|g) \ge 0$. Moreover, $D^w_\phi(f\|g) = 0$ iff $\phi(x)\big[\frac{g(x)}{f(x)} - 1\big] = 0$ $f$-a.e.
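The quantity (2.1) can be evaluated numerically for one concrete pair of densities; the Gaussian pair and the weight $\phi(x) = 1 + x^2$ below are arbitrary illustrative choices (simple Riemann-sum quadrature, natural logarithms):

```python
import math

# Discretized RWE (2.1) via a Riemann sum (illustration only).
def rwe(f, g, phi, xs, dx):
    return sum(phi(x) * f(x) * math.log(f(x) / g(x)) * dx for x in xs)

f = lambda x: math.exp(-x * x / 2) / math.sqrt(2 * math.pi)          # N(0, 1)
g = lambda x: math.exp(-(x - 1) ** 2 / 4) / math.sqrt(4 * math.pi)   # N(1, 2)
phi = lambda x: 1 + x * x                                            # weight

dx = 0.001
xs = [-10 + dx * k for k in range(20001)]
d = rwe(f, g, phi, xs, dx)   # closed form for this pair: ~0.1931
assert d >= 0
```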
where $\nabla$ stands for the gradient w.r.t. the parameter vector $\theta$, and
\[ A_\phi(\theta) = \log \int \phi(x) h(x) \exp\langle\theta, T(x)\rangle\,dx. \tag{2.5} \]
Theorems 2.1 and 2.2 from [20] offer the following assertion:
\[ h^w_\phi(\lambda_1 f_1 + \lambda_2 f_2) \ge \lambda_1 h^w_\phi(f_1) + \lambda_2 h^w_\phi(f_2). \tag{3.1} \]
Equality holds iff $\phi(x)[f_1(x) - f_2(x)] = 0$ for $(\lambda_1 f_1 + \lambda_2 f_2)$-a.a. $x$.
(b) However, the RWE functional $(f, g) \mapsto D^w_\phi(f\|g)$ is convex: given two pairs of PDFs $(f_1, f_2)$ and $(g_1, g_2)$,
\[ \lambda_1 D^w_\phi(f_1\|g_1) + \lambda_2 D^w_\phi(f_2\|g_2) \ge D^w_\phi(\lambda_1 f_1 + \lambda_2 f_2 \,\|\, \lambda_1 g_1 + \lambda_2 g_2), \tag{3.2} \]
with equality iff $\lambda_1\lambda_2 = 0$ or $\phi(x)[f_1(x) - f_2(x)] = \phi(x)[g_1(x) - g_2(x)] = 0$ $\mu$-a.e.
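A small grid-based sanity check of the concavity bound (3.1); the Gaussian components and the weight function are hypothetical choices:

```python
import math

# Grid check of the concavity bound (3.1) for a two-component mixture.
dx = 0.001
xs = [-12 + dx * k for k in range(24001)]
phi = lambda x: 1 + abs(x)           # hypothetical weight function

def gaussian(m, s):
    return lambda x: math.exp(-(x - m) ** 2 / (2 * s * s)) / (s * math.sqrt(2 * math.pi))

def we(f):                            # h^w_phi(f), Riemann sum
    return -sum(phi(x) * f(x) * math.log(f(x)) * dx for x in xs if f(x) > 0)

f1, f2 = gaussian(-1, 1), gaussian(2, 1.5)
lam = 0.3
mix = lambda x: lam * f1(x) + (1 - lam) * f2(x)
lhs, rhs = we(mix), lam * we(f1) + (1 - lam) * we(f2)
assert lhs >= rhs   # (3.1)
```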
For t = 0 we obtain ϕ ≡ 1, and (4.4) coincides with (4.1). Theorem 4.1 is related
to the maximisation property of the weighted Gaussian entropy which takes the form
of Theorem 4.2. Cf. Example 3.2 in [20].
Theorem 4.2. Let $f(x)$ be a PDF on $\mathbb{R}^d$ with mean 0 and $(d \times d)$ covariance matrix $C$. Let $f^{\mathrm{No}}_C(x)$ stand for the Gaussian PDF, again with mean 0 and covariance matrix $C$. Define $(d \times d)$ matrices
\[ \Phi = \int_{\mathbb{R}^d} \phi(x)\,xx^{\rm T} f(x)\,dx, \qquad \Phi^{\mathrm{No}}_C = \int_{\mathbb{R}^d} \phi(x)\,xx^{\rm T} f^{\mathrm{No}}_C(x)\,dx, \tag{4.5} \]
and assume that
\[ \log\big[(2\pi)^d(\det C)\big] \int_{\mathbb{R}^d} \phi(x)\big[f(x) - f^{\mathrm{No}}_C(x)\big]\,dx + {\rm tr}\,\big[C^{-1}\big(\Phi^{\mathrm{No}}_C - \Phi\big)\big] \le 0. \]
Then $h^w_\phi(f) \le h^w_\phi(f^{\mathrm{No}}_C)$, with equality iff $\phi(x)[f(x) - f^{\mathrm{No}}_C(x)] = 0$ a.e.
Theorems 4.1 and 4.2 are a part of a series of so-called weighted determinantal inequalities; see [20, 22]. Here we will focus on a weighted version of Hadamard's inequality asserting that for a $(d \times d)$ positive-definite matrix $C = (C_{ij})$, $\det C \le \prod_{j=1}^d C_{jj}$, or $\log\det C \le \sum_{j=1}^d \log C_{jj}$. Cf. [20], Theorem 3.7. Let $f^{\mathrm{No}}_{C_{jj}}$ stand for the Gaussian PDF on $\mathbb R$ with zero mean and variance $C_{jj}$. Set:
\[ \alpha = \int_{\mathbb{R}^d} \phi(x) f^{\mathrm{No}}_C(x)\,dx \quad (\text{cf. (1.4)}). \]
\[
D^w_\phi(f_{\theta_1}\|f_{\theta_2}) = \frac{1}{2}\,J^w_\phi(X, \theta_1)(\theta_2 - \theta_1)^2 + E_{\theta_1}\big[\phi(X) D_\theta \log f_{\theta_1}(X)\big](\theta_1 - \theta_2) - \frac{1}{2}\,E_{\theta_1}\bigg[\phi(X)\frac{D^2_\theta f_{\theta_1}(X)}{f_{\theta_1}(X)}\bigg](\theta_2 - \theta_1)^2 + o\big(|\theta_1 - \theta_2|^2\big)\,E_{\theta_1}\phi(X), \tag{5.2}
\]
where $D_\theta$ stands for $\frac{\partial}{\partial\theta}$.
Hence
\[ E_{\theta_1}\big[\phi\, D^2_\theta \log f_{\theta_1}\big] = E_{\theta_1}\bigg[\phi\,\frac{D^2_\theta f_{\theta_1}}{f_{\theta_1}}\bigg] - J^w_\phi(f_{\theta_1}). \tag{5.5} \]
Therefore the claimed result, i.e., (5.2), is achieved.
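For $\phi \equiv 1$ the expansion (5.2) reduces to the familiar second-order behaviour of the Kullback–Leibler divergence. A sketch for the exponential family with rate $\theta$, whose Fisher information is $1/\theta^2$ (a standard fact, assumed here rather than taken from the text):

```python
import math

# KL divergence between exponentials with rates t1, t2 (closed form):
# D(f_t1 || f_t2) = log(t1/t2) + t2/t1 - 1; leading term (1/2) J(t1) (dt)^2
# with J(t) = 1/t^2.
def kl_exp(t1, t2):
    return math.log(t1 / t2) + t2 / t1 - 1

t1, dt = 1.0, 0.05
exact = kl_exp(t1, t1 + dt)
quadratic = 0.5 * (1 / t1 ** 2) * dt ** 2
assert abs(exact - quadratic) < 1e-4   # agreement up to o(dt^2)
```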
with equality if and only if either $X$ is Gaussian or $Y_n = X_{n-k}$ for some $k$, that is, the filtering operation is a pure delay. Clearly, a possible extension of the EPI gives more flexibility in signal processing. We are interested in the weighted entropy power inequality (WEPI)
\[ \kappa := \exp\frac{2h^w_\phi(X_1)}{d\,E\phi(X_1)} + \exp\frac{2h^w_\phi(X_2)}{d\,E\phi(X_2)} \le \exp\frac{2h^w_\phi(X)}{d\,E\phi(X)}. \tag{6.5} \]
Note that (6.5) coincides with (6.2) when $\phi \equiv 1$. For $d = 1$, set
\[ \alpha = \tan^{-1}\bigg[\exp\bigg(\frac{h^w_\phi(X_2)}{E\phi(X_2)} - \frac{h^w_\phi(X_1)}{E\phi(X_1)}\bigg)\bigg], \qquad Y_1 = \frac{X_1}{\cos\alpha}, \qquad Y_2 = \frac{X_2}{\sin\alpha}. \tag{6.6} \]
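For $\phi \equiv 1$ and $d = 1$, (6.5) is the classical EPI, which holds with equality for independent Gaussian summands. A sketch computing $\kappa$ and the angle $\alpha$ of (6.6) in that case:

```python
import math

# phi == 1, d = 1: (6.5) reduces to the classical EPI, with equality for
# independent Gaussians X1 ~ N(0, s1^2), X2 ~ N(0, s2^2).
s1, s2 = 1.0, 2.0
h = lambda s: 0.5 * math.log(2 * math.pi * math.e * s * s)   # Gaussian entropy

kappa = math.exp(2 * h(s1)) + math.exp(2 * h(s2))            # = 2*pi*e*(s1^2 + s2^2)
rhs = math.exp(2 * h(math.sqrt(s1 ** 2 + s2 ** 2)))          # X = X1 + X2
alpha = math.atan(math.exp(h(s2) - h(s1)))                   # (6.6): atan(s2/s1)
assert abs(kappa - rhs) < 1e-9       # equality in the Gaussian case
```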
Theorem 6.1. Given independent RVs $X_1, X_2 \in \mathbb{R}^1$ with PDFs $f_1, f_2$, and the weight function $\phi$, set $X = X_1 + X_2$. Assume the following conditions:
(i)
\[ E\phi(X_i) \ge E\phi(X) \text{ if } \kappa \ge 1, \quad E\phi(X_i) \le E\phi(X) \text{ if } \kappa \le 1, \qquad i = 1, 2. \tag{6.7} \]
(ii) With $Y_1$, $Y_2$ and $\alpha$ as defined in (6.6),
\[ (\cos\alpha)^2 h^w_{\phi_c}(Y_1) + (\sin\alpha)^2 h^w_{\phi_s}(Y_2) \le h^w_\phi(X), \tag{6.8} \]
where $\phi_c(x) = \phi(x\cos\alpha)$, $\phi_s(x) = \phi(x\sin\alpha)$ and
\[ h^w_{\phi_c}(Y_1) = -E\big[\phi_c(Y_1)\log f_{Y_1}(Y_1)\big], \qquad h^w_{\phi_s}(Y_2) = -E\big[\phi_s(Y_2)\log f_{Y_2}(Y_2)\big]. \tag{6.9} \]
Then the WEPI holds.
Paying homage to [13] we call (6.8) weighted Lieb’s splitting inequality (WLSI).
In some cases the WLSI may be effectively checked.
Proof. Note that
\[ h^w_\phi(X_1) = h^w_{\phi_c}(Y_1) + E\phi(X_1)\log\cos\alpha, \qquad h^w_\phi(X_2) = h^w_{\phi_s}(Y_2) + E\phi(X_2)\log\sin\alpha. \tag{6.10} \]
Using (6.8), we have the following inequality:
\[ h^w_\phi(X) \ge (\cos\alpha)^2\big[h^w_\phi(X_1) - E\phi(X_1)\log\cos\alpha\big] + (\sin\alpha)^2\big[h^w_\phi(X_2) - E\phi(X_2)\log\sin\alpha\big]. \tag{6.11} \]
Furthermore, recalling the definition of $\kappa$ in (6.5) we obtain
\[ h^w_\phi(X) \ge \frac{1}{2\kappa}\,E\phi(X_1)\log\kappa\,\exp\frac{2h^w_\phi(X_1)}{E\phi(X_1)} + \frac{1}{2\kappa}\,E\phi(X_2)\log\kappa\,\exp\frac{2h^w_\phi(X_2)}{E\phi(X_2)}. \tag{6.12} \]
By virtue of assumption (6.7), we derive
\[ h^w_\phi(X) \ge \frac{1}{2}\,E\phi(X)\log\kappa. \tag{6.13} \]
The definition of $\kappa$ in (6.5) leads directly to the result.
Example 6.2. Let $d = 1$ and $X_1 \sim {\rm N}(0, \sigma_1^2)$, $X_2 \sim {\rm N}(0, \sigma_2^2)$. Then the WLSI (6.8) takes the following form:
\[
\log\big[2\pi\big(\sigma_1^2 + \sigma_2^2\big)\big]\,E\phi(X) + \frac{\log e}{\sigma_1^2 + \sigma_2^2}\,E\big[X^2\phi(X)\big]
\ge (\cos\alpha)^2 \log\frac{2\pi\sigma_1^2}{(\cos\alpha)^2}\,E\phi(X_1) + \frac{(\cos\alpha)^2\log e}{\sigma_1^2}\,E\big[X_1^2\phi(X_1)\big]
+ (\sin\alpha)^2 \log\frac{2\pi\sigma_2^2}{(\sin\alpha)^2}\,E\phi(X_2) + \frac{(\sin\alpha)^2\log e}{\sigma_2^2}\,E\big[X_2^2\phi(X_2)\big]. \tag{6.14}
\]
Example 6.3. Let $d = 1$, $X = X_1 + X_2$, where $X_1 \sim {\rm U}[a_1, b_1]$ and $X_2 \sim {\rm U}[a_2, b_2]$ are independent. Denote $\Phi(x) = \int_0^x \phi(u)\,du$ and $L_i = b_i - a_i$, $i = 1, 2$. The WDE is $h^w_\phi(X_i) = \frac{\Phi(b_i) - \Phi(a_i)}{L_i}\log L_i$. Then the inequality $\kappa \ge (\le)\,1$ takes the form $L_1^2 + L_2^2 \ge (\le)\,1$. Suppose for definiteness that $L_2 \ge L_1$ or, equivalently, $C_1 := a_2 + b_1 \le a_1 + b_2 =: C_2$. Inequalities (6.7) take the form
\[ \frac{\Phi(b_1) - \Phi(a_1)}{L_1},\ \frac{\Phi(b_2) - \Phi(a_2)}{L_2} \ \ge (\le)\ E\phi(X). \tag{6.15} \]
The WLSI takes the form
\[ -\Lambda + \log(L_1 L_2)\,E\phi(X) \ge (\cos\alpha)^2\,\frac{\Phi(b_1) - \Phi(a_1)}{L_1}\,\log\frac{L_1}{\cos\alpha} + (\sin\alpha)^2\,\frac{\Phi(b_2) - \Phi(a_2)}{L_2}\,\log\frac{L_2}{\sin\alpha}, \tag{6.16} \]
where
\[ \Lambda = \frac{\log L_1}{L_2}\big[\Phi(C_1) - \Phi(C_2)\big] + \frac{1}{L_1 L_2}\bigg[\int_A^{C_1} \phi(x)(x - A)\log(x - A)\,dx + \int_{C_2}^B \phi(x)(B - x)\log(B - x)\,dx\bigg], \qquad A = a_1 + a_2,\ B = b_1 + b_2. \tag{6.17} \]
Finally, define $\Phi^*(x) = \int_0^x u\,\phi(u)\,du$ and note that
\[ E\phi(X) = \frac{1}{L_1 L_2}\Big\{\Phi^*(C_1) - \Phi^*(A) - \Phi^*(B) + \Phi^*(C_2) - A\big[\Phi(C_1) - \Phi(A)\big] + L_1\big[\Phi(C_2) - \Phi(C_1)\big] + B\big[\Phi(B) - \Phi(C_2)\big]\Big\}. \tag{6.18} \]
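A numeric check of (6.18) for a concrete pair of uniform distributions; the weight $\phi(x) = 1 + x$ is a hypothetical choice for which $\Phi$ and $\Phi^*$ are elementary, and $E\phi(X) = 1 + EX_1 + EX_2$ can be computed directly:

```python
# Check of (6.18) for X1 ~ U[0,1], X2 ~ U[0,2] and phi(x) = 1 + x,
# so Phi(x) = x + x^2/2 and Phi*(x) = x^2/2 + x^3/3.
a1, b1, a2, b2 = 0.0, 1.0, 0.0, 2.0
L1, L2 = b1 - a1, b2 - a2
A, B = a1 + a2, b1 + b2
C1, C2 = a2 + b1, a1 + b2          # here L2 >= L1, so C1 <= C2

Phi = lambda x: x + x * x / 2
Phistar = lambda x: x * x / 2 + x ** 3 / 3

lhs = (Phistar(C1) - Phistar(A) - Phistar(B) + Phistar(C2)
       - A * (Phi(C1) - Phi(A)) + L1 * (Phi(C2) - Phi(C1))
       + B * (Phi(B) - Phi(C2))) / (L1 * L2)

direct = 1 + (a1 + b1) / 2 + (a2 + b2) / 2   # E[1 + X1 + X2]
assert abs(lhs - direct) < 1e-12
```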
where $\varepsilon > 0$ and $\bar\phi > 0$ are constants. Then there exists $\varepsilon_0 > 0$ such that for any WF $\phi$ satisfying (7.1) with $0 < \varepsilon < \varepsilon_0$ the WLSI holds true. Hence, checking the WEPI reduces to condition (6.7).
For a RV $Z$, $\gamma > 0$ and an independent Gaussian RV ${\rm N} \sim {\rm N}(0, {\rm I}_d)$ define
\[ M(Z; \gamma) = E\big\|Z - E\big[Z \mid Z\sqrt{\gamma} + {\rm N}\big]\big\|^2, \tag{7.2} \]
where $\|\cdot\|$ stands for the Euclidean norm. According to [24, 8] the differential entropy satisfies
\[ h(Z) = h({\rm N}) + \frac{1}{2}\int_0^\infty \big[M(Z; \gamma) - {\mathbf 1}_{\{\gamma < 1\}}\big]\,d\gamma. \tag{7.3} \]
For $Z = Y_1, Y_2, X_1 + X_2$ assume the following conditions:
\[ E\big|\log f_Z(Z)\big| < \infty, \qquad E\|Z\|^2 < \infty, \tag{7.4} \]
and the uniform integrability: for independent ${\rm N}, {\rm N}' \sim {\rm N}(0, {\rm I})$ and any $\gamma > 0$ there exists an integrable function $\xi(Z, {\rm N})$ such that
\[ \bigg|\log E\bigg[f_Z\Big(Z + \frac{{\rm N} - {\rm N}'}{\sqrt{\gamma}}\Big) \,\Big|\, Z, {\rm N}\bigg]\bigg| \le \xi(Z, {\rm N}). \tag{7.5} \]
Theorem 7.2. Let $d = 1$ and assume conditions (7.4), (7.5). Let $\gamma_0$ be a point of continuity of $M(Z; \gamma)$, $Z = Y_1, Y_2, X_1 + X_2$. Suppose that there exists $\delta > 0$ such that
\[ M(X_1 + X_2; \gamma_0) \ge M(Y_1; \gamma_0)(\cos\alpha)^2 + M(Y_2; \gamma_0)(\sin\alpha)^2 + \delta. \tag{7.6} \]
Suppose that for some $\bar\phi > 0$ the WF satisfies
\[ \big|\phi(x) - \bar\phi\big| < \varepsilon. \tag{7.7} \]
Then there exists $\varepsilon_0 = \varepsilon_0(\gamma_0, \delta, f_1, f_2)$ such that for any WF satisfying (7.7) with $\varepsilon < \varepsilon_0$ the WLSI holds true.
Proof. For a constant WF $\bar\phi$, the following inequality is valid (see [8], Lemma 4.2, or [24], Eqns (9) and (10)):
\[ (\cos\alpha)^2 h^w_{\bar\phi}(Y_1) + (\sin\alpha)^2 h^w_{\bar\phi}(Y_2) \le h^w_{\bar\phi}(Y_1\cos\alpha + Y_2\sin\alpha). \tag{7.8} \]
However, in view of Theorem 4.1 from [8], the representation (7.3) and inequality (7.6) imply, under conditions (7.4) and (7.5), a stronger inequality:
\[ (\cos\alpha)^2 h^w_{\bar\phi}(Y_1) + (\sin\alpha)^2 h^w_{\bar\phi}(Y_2) + c_0\delta \le h^w_{\bar\phi}(Y_1\cos\alpha + Y_2\sin\alpha). \tag{7.9} \]
Here $c_0 > 0$, and the term of order $\delta$ appears from integration in (7.3) in a neighbourhood of the continuity point $\gamma_0$. Define $\phi^*(x) = |\phi(x) - \bar\phi|$. It is easy to check that
\[ h^w_{\phi^*}(Z) < c_1\varepsilon, \qquad Z = X_1, X_2, X_1 + X_2. \tag{7.10} \]
From (7.9) and (7.10) we obtain that for $\varepsilon$ small enough
\[ (\cos\alpha)^2 h^w_\phi(Y_1) + (\sin\alpha)^2 h^w_\phi(Y_2) \le h^w_\phi(Y_1\cos\alpha + Y_2\sin\alpha), \tag{7.11} \]
i.e., the WLSI holds true.
where $\varepsilon > 0$ and $\bar\phi > 0$ are constants. Then there exists $\varepsilon_0 = \varepsilon_0(\mu_0, \mu_1, \sigma_0^2, \sigma_2^2) > 0$ such that for any $0 < \varepsilon \le \varepsilon_0$, the WLSI (6.8) with the WF $\phi$ holds true.
Proof. Let $\alpha$ be as in (6.6); to check (6.8), we use Stein's formula: for $Z \sim {\rm N}(0, \sigma^2)$. Here
\[ h_\pm(X_i) = -E\big[{\mathbf 1}\big(X_i \in {\mathcal A}_{i\pm}\big)\log f^{\mathrm{No}}_{X_i}(X_i)\big], \qquad i = 1, 2, \tag{7.15} \]
and
\[
\log\big[2\pi\big(\sigma_1^2 + \sigma_2^2\big)\big]\,\bar\phi + \frac{\log e}{\sigma_1^2 + \sigma_2^2}\,E\big[X^2\big]\,\bar\phi
\ \ge\ (\cos\alpha)^2\log\frac{2\pi\sigma_1^2}{(\cos\alpha)^2}\,\bar\phi + (\sin\alpha)^2\log\frac{2\pi\sigma_2^2}{(\sin\alpha)^2}\,\bar\phi + \delta. \tag{7.18}
\]
Combining (7.18) and (7.19) one gets (7.17). Now, to check (6.8) with WF $\phi$, in view of (7.17) it suffices to verify
\[
\bigg|\log\big[2\pi\big(\sigma_1^2 + \sigma_2^2\big)\big]\big[E\phi(X) - \bar\phi\big] + \frac{\log e}{\sigma_1^2 + \sigma_2^2}\,E\big[X^2\big(\phi(X) - \bar\phi\big)\big]
- (\cos\alpha)^2\log\frac{2\pi\sigma_1^2}{(\cos\alpha)^2}\big[E\phi(X_1) - \bar\phi\big] - \frac{(\cos\alpha)^2\log e}{\sigma_1^2}\,E\big[X_1^2\big(\phi(X_1) - \bar\phi\big)\big]
- (\sin\alpha)^2\log\frac{2\pi\sigma_2^2}{(\sin\alpha)^2}\big[E\phi(X_2) - \bar\phi\big] - \frac{(\sin\alpha)^2\log e}{\sigma_2^2}\,E\big[X_2^2\big(\phi(X_2) - \bar\phi\big)\big]\bigg| < \delta. \tag{7.20}
\]
We check (7.20) by brute force, claiming that each term in (7.20) has absolute value $< \delta/6$ when $\varepsilon$ is small enough. For the terms containing $E[\phi(Z) - \bar\phi]$, $Z = X, X_1, X_2$, this follows since $|\phi - \bar\phi| < \varepsilon$. For the terms containing the factor $E[Z^2(\phi(Z) - \bar\phi)]$, we use Stein's formula (7.13) and the condition that $|\phi''(x)| \le \varepsilon\phi(x)$.
Similar assertions can be established for other examples of PDFs $f_1(x)$ and $f_2(x)$, e.g. uniform, exponential, Gamma, Cauchy, etc.
\[ f_{Z|X+Y}(x, y\,|\,u) = \frac{f_1(x) f_2(y)\,{\mathbf 1}(x + y = u)}{\int_{\mathbb{R}^n} f_1(v) f_2(u - v)\,dv}. \tag{8.1} \]
\[ \Xi := \Xi_{\phi_1,\phi_2}(X, Y) = M_\phi\,J^w_{\phi_1}(X)\,G_\phi\,(I - M_\phi G_\phi)^{-1} M_\phi G_\phi\,J^w_{\phi_2}(Y)\,G_\phi - J^w_{\phi_1}(X). \tag{8.4} \]
Proof. We use the same methodology as in Theorem 1 from [27]. Recalling Corollary 4.8, (iii) in [20], substitute $P := [1, 1]$. Therefore for $Z = (X, Y)$, $J^w(Z)$ is an $m \times m$ matrix
\[ J^w_\theta(Z) = \begin{pmatrix} J^w_{\phi_1}(X) & M_\phi \\ M_\phi & J^w_{\phi_2}(Y) \end{pmatrix}. \tag{8.6} \]
Next, we need the following well-known expression for the inverse of a block matrix:
\[ \begin{pmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{pmatrix}^{-1} = \begin{pmatrix} C_{11}^{-1} + D_{12} D_{22}^{-1} D_{21} & -D_{12} D_{22}^{-1} \\ -D_{22}^{-1} D_{21} & D_{22}^{-1} \end{pmatrix}, \tag{8.7} \]
where
\[ D_{22} = C_{22} - C_{21} C_{11}^{-1} C_{12}, \qquad D_{12} = C_{11}^{-1} C_{12}, \qquad D_{21} = C_{21} C_{11}^{-1}, \]
with $C_{11} = J^w_{\phi_1}(X)$, $C_{22} = J^w_{\phi_2}(Y)$, and $C_{12} = C_{21} = M_\phi$.
By using the Schwarz inequality, we derive
\[ M_\phi^2 \le J^w_{\phi_1}(X)\,J^w_{\phi_2}(Y), \quad\text{or}\quad M_\phi G_\phi \le I, \tag{8.8} \]
with equality iff $f_X^{(1)}(x) = \frac{\partial\log f_1(x)}{\partial\theta} \propto \frac{\partial\log f_2(y)}{\partial\theta} = f_Y^{(1)}(y)$.
Define
\[ \delta := J^w_{\phi_2}(Y) - \widetilde M_\phi\,\big[J^w_{\phi_1}(X)\big]^{-1} M_\phi = (I - M_\phi G_\phi)\,J^w_{\phi_2}(Y) \ \Rightarrow\ \delta^{-1} = \big[J^w_{\phi_2}(Y)\big]^{-1}(I - M_\phi G_\phi)^{-1}. \tag{8.9} \]
Thus, owing to (8.7), particularly for $P = [1, 1]$, we can write
\[
P\big[J^w_\phi(Z)\big]^{-1}P^{\rm T} = \big[J^w_{\phi_1}(X)\big]^{-1} + \big[J^w_{\phi_1}(X)\big]^{-1}M_\phi\,\delta^{-1}M_\phi\,\big[J^w_{\phi_1}(X)\big]^{-1} - \big[J^w_{\phi_1}(X)\big]^{-1}M_\phi\,\delta^{-1} - \delta^{-1}M_\phi\,\big[J^w_{\phi_1}(X)\big]^{-1} + \delta^{-1}. \tag{8.10}
\]
Substituting (8.9) in the above expression, we have
\[
P\big[J^w_\phi(Z)\big]^{-1}P^{\rm T} = \big[J^w_{\phi_1}(X)\big]^{-1} + G_\phi(I - M_\phi G_\phi)^{-1}M_\phi\big[J^w_{\phi_1}(X)\big]^{-1} - G_\phi(I - M_\phi G_\phi)^{-1} - \big[J^w_{\phi_2}(Y)\big]^{-1}(I - M_\phi G_\phi)^{-1}M_\phi\big[J^w_{\phi_1}(X)\big]^{-1} + \big[J^w_{\phi_2}(Y)\big]^{-1}(I - M_\phi G_\phi)^{-1}. \tag{8.11}
\]
Consequently, simplifying (8.11) yields
\[ P\big[J^w_\phi(Z)\big]^{-1}P^{\rm T} = \Big\{\big[J^w_{\phi_1}(X)\big]^{-1} + \big[J^w_{\phi_2}(Y)\big]^{-1} + \Xi_{\phi_1,\phi_2}(X, Y)\Big\}(I - M_\phi G_\phi)^{-1}. \tag{8.12} \]
By using Corollary 3.4, (iii) from [20], we obtain the property claimed in (8.5):
\[ \big[J^w_\phi(X + Y)\big]^{-1} \le \Big\{\big[J^w_{\phi_1}(X)\big]^{-1} + \big[J^w_{\phi_2}(Y)\big]^{-1} + \Xi_{\phi_1,\phi_2}(X, Y)\Big\}(I - M_\phi G_\phi)^{-1}. \tag{8.5} \]
where
\[ \Lambda(\gamma) = \frac{2}{d}\,\frac{d}{d\gamma}\,\frac{h^w_\phi(Z)}{E[\phi(Z)]}. \tag{9.3} \]
In view of (9.2) the concavity of the WEP is equivalent to the inequality
\[ \frac{d}{d\gamma}\,\Lambda(\gamma)^{-1} \ge 1. \tag{9.4} \]
In the spirit of the WEP, we shall present a new proof of the concavity of the EP. Regarding this, let us apply the WFII (8.5) to $\phi \equiv 1$. Then a straightforward computation gives
\[ \frac{d}{d\gamma}\,\frac{d}{{\rm tr}\,J(Z)} \ge 1. \tag{9.5} \]
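In the Gaussian case the criterion (9.5) holds with equality: for $Z = X + \sqrt{\gamma}\,{\rm N}$ with $X \sim {\rm N}(0, \sigma^2)$, $d = 1$, one has $J(Z) = 1/(\sigma^2 + \gamma)$ (a standard fact, assumed here), so $d/{\rm tr}\,J(Z) = \sigma^2 + \gamma$. A one-line numeric confirmation:

```python
# Gaussian case of (9.5): 1/J(Z) = s^2 + gamma, so its derivative in gamma
# equals exactly 1 (equality in the criterion).
s2 = 2.0
inv_J = lambda g: s2 + g      # 1 / J(Z) for Z = X + sqrt(gamma) N, X ~ N(0, s2)
dg = 1e-6
deriv = (inv_J(1.0 + dg) - inv_J(1.0)) / dg
assert abs(deriv - 1.0) < 1e-6
```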
Theorem 9.1 (A weighted De Bruijn's identity). Let $X \sim f_X$ be a RV in $\mathbb{R}^d$, with a PDF $f_X \in C^2$. For a standard Gaussian RV ${\rm N} \sim {\rm N}(0, {\rm I}_d)$ independent of $X$, and given $\gamma > 0$, define the RV $Z = X + \sqrt{\gamma}\,{\rm N}$ with PDF $f_Z$. Let $V_r$ be the $d$-sphere of radius $r$ centered at the origin, with surface denoted by $S_r$. Assume that for a given WF $\phi$ and all $\gamma \in (0, 1)$ the relations
\[ \int \big|f_Z(x)\ln f_Z(x)\big|\,dx < \infty, \qquad \int \big\|\nabla f_Z(y)\ln f_Z(y)\big\|\,dy < \infty \tag{9.6} \]
and
\[ \lim_{r\to\infty} \int_{S_r} \phi(y)\log f_Z(y)\,\nabla f_Z(y)\cdot dS_r = 0 \tag{9.7} \]
hold.
Then there exists $\delta = \delta(\varepsilon)$ such that for any WF $\phi$ for which there exists $\bar\phi > 0$ with $|\phi - \bar\phi| < \delta$ and $|\nabla\phi| < \delta$, the WEP (9.1) is a concave function of $\gamma$. Under the milder assumption
\[ \frac{d}{d\gamma}\,\frac{d}{{\rm tr}\,J(Z)}\bigg|_{\gamma=0} \ge 1 + \varepsilon, \tag{9.11} \]
By a straightforward calculation,
\[ \psi(\gamma)^{-1} = \frac{2}{d}\,\big[E\phi(Z)\big]^{-2}\bigg[\bigg(\frac{d}{d\gamma}h^w_\phi(Z)\bigg)E\phi(Z) - h^w_\phi(Z)\,\frac{d}{d\gamma}E\phi(Z)\bigg], \]
\[ \frac{d}{d\gamma}h^w_\phi(Z) = \frac{1}{2}\,{\rm tr}\,J^w_\phi(Y) - \frac{1}{2}\,\frac{d}{d\gamma}E\big[\phi(Y)\big] + \frac{1}{2}\,R(\gamma). \tag{9.13} \]
These formulas imply
\[ \psi(\gamma) = \frac{d}{{\rm tr}\,J_\phi(Z)} + o(\delta), \tag{9.14} \]
since $1 - \delta < E\phi(Z) < 1 + \delta$ and $\big|{\rm tr}\,J^w_\phi(Z) - {\rm tr}\,J(Z)\big| < \delta\,{\rm tr}\,J(Z)$.
Next,
\[ \frac{d}{d\gamma}E\big[\phi(Z)\big] = \frac{1}{2}\int \phi(y)\,\Delta f_Z(y)\,dy, \tag{9.15} \]
and using the Stokes formula one can bound this term by $\delta$. Finally, $|R(\gamma)| \le \delta$ in view of (9.7), which leads to the claimed result.
This section follows [18]. The concept of a rate of the WE or WDE emerges when we work with outcomes in a context of a discrete-time random process (RP):
\[ h^w_{\phi_n}(p_n) = -E\big[\phi_n\big(X_0^{n-1}\big)\log p_n\big(X_0^{n-1}\big)\big] := E\,I^w_{\phi_n}\big(X_0^{n-1}\big). \tag{10.1} \]
Here the WF $\phi_n$ is made dependent on $n$: two immediate cases are where (a) $\phi_n(x_0^{n-1}) = \sum_{j=0}^{n-1}\psi(x_j)$ and (b) $\phi_n(x_0^{n-1}) = \prod_{j=0}^{n-1}\psi(x_j)$ (an additive and a multiplicative WF, respectively). Next, $X_0^{n-1} = (X_0, \ldots, X_{n-1})$ is a random string generated by an RP. For simplicity, let us focus on RPs taking values in a finite set $\mathcal X$. The symbol ${\rm P}$ stands for the probability measure of $X$, and $E$ denotes the expectation under ${\rm P}$. For an RP with IID values, the joint probability of a sample $x_0^{n-1} = (x_0, \ldots, x_{n-1})$ is $p_n(x_0^{n-1}) = \prod_{j=0}^{n-1} p(x_j)$, $p(x) = {\rm P}(X_j = x)$ being the probability of an individual outcome $x \in \mathcal X$. In the case of a Markov chain, $p_n(x_0^{n-1}) = \lambda(x_0)\prod_{j=1}^{n-1} p(x_{j-1}, x_j)$. Here $\lambda(x)$ gives an initial distribution and $p(x, y)$ is the transition probability on $\mathcal X$; to reflect this fact, we will sometimes use the notation $h^w_{\phi_n}(p_n, \lambda)$. The quantity
In the IID case, the WI and WE admit the following representations. Define $S(p) = -E[\log p(X)]$ and $H^w_\psi(p) = -E[\psi(X)\log p(X)]$ to be the SE and the WE of the one-digit distribution (the capital letter is used to make it distinct from $h^w_{\phi_n}$, the multi-time WE).
(A) For an additive WF:
\[ I^w_{\phi_n}\big(x_0^{n-1}\big) = -\sum_{j=0}^{n-1}\psi(x_j)\sum_{l=0}^{n-1}\log p(x_l) \tag{10.2} \]
and
\[ h^w_{\phi_n}(p_n) = n(n-1)S(p)\,E\psi(X) + nH^w_\psi(p) := n(n-1)A_0 + nA_1. \tag{10.3} \]
(B) For a multiplicative WF:
\[ h^w_{\phi_n}(p_n) = nH^w_\psi(p)\,\big[E\psi(X)\big]^{n-1} := B_0^{n-1} \times nB_1. \tag{10.5} \]
The values $A_0$, $B_0$ and their analogs in a general situation are referred to as primary rates, and $A_1$, $B_1$ as secondary rates.
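The identities (10.3) and (10.5) can be verified by exhaustive enumeration for a short IID string; the Bernoulli source and the one-letter weight $\psi$ below are hypothetical choices:

```python
import math
from itertools import product

p = {0: 0.7, 1: 0.3}               # hypothetical IID source
psi = {0: 1.0, 1: 2.0}             # hypothetical one-letter weight
n = 4

S = -sum(px * math.log(px) for px in p.values())             # Shannon entropy
H = -sum(psi[x] * px * math.log(px) for x, px in p.items())  # one-letter WE
Epsi = sum(psi[x] * px for x, px in p.items())

h_add = h_mult = 0.0
for s in product(p, repeat=n):      # all strings of length n
    pn = math.prod(p[x] for x in s)
    log_pn = math.log(pn)
    h_add -= sum(psi[x] for x in s) * pn * log_pn    # additive WF
    h_mult -= math.prod(psi[x] for x in s) * pn * log_pn  # multiplicative WF

assert abs(h_add - (n * (n - 1) * S * Epsi + n * H)) < 1e-10   # (10.3)
assert abs(h_mult - n * H * Epsi ** (n - 1)) < 1e-10           # (10.5)
```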
Here ${\rm P}(y\,|\,x_{-\infty}^{-1})$ is the conditional PM/DF for $X_0 = y$ given $x_{-\infty}^{-1}$, an infinite past realization of $X$. An assumption upon WFs $\phi_n$, called asymptotic additivity (AA), is that
\[ \lim_{n\to\infty}\frac{1}{n}\,\phi_n\big(X_0^{n-1}\big) = \alpha, \quad \text{P-a.s. and/or in } L^2({\rm P}). \tag{10.7} \]
Eqns (10.6), (10.7) lead to the identification of the primary rate: $A_0 = \alpha S$.
Theorem 10.1. Given an ergodic RP $X$, consider the WI $I^w_{\phi_n}(X_0^{n-1})$ and the WE $h^w_{\phi_n}(p_n)$ as defined in (10.2), (10.3). Suppose that convergence in (10.7) holds P-a.s. Then:
(I) We have that
\[ \lim_{n\to\infty}\frac{I^w_{\phi_n}(X_0^{n-1})}{n^2} = \alpha S, \quad \text{P-a.s.} \tag{10.8} \]
(II) Furthermore,
(a) suppose that the WFs $\phi_n$ exhibit convergence (10.7), P-a.s., with a finite $\alpha$, and $|\phi_n(X_0^{n-1})/n| \le c$ where $c$ is a constant independent of $n$. Suppose also that convergence in Eqn (10.6) holds true. Then we have that
\[ \lim_{n\to\infty}\frac{h^w_{\phi_n}(p_n)}{n^2} = \alpha S. \tag{10.9} \]
(b) Likewise, convergence in Eqn (10.9) holds true whenever convergences (10.7) and (10.6) hold P-a.s. and $|\log p_n(X_0^{n-1})/n| \le c$ where $c$ is a constant.
(c) Finally, suppose that convergence in (10.6) and (10.7) holds in $L^2({\rm P})$, with finite $\alpha$ and $S$. Then again, convergence in (10.9) holds true.
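A Monte Carlo sketch of the convergence (10.8) for an IID source with an additive WF; the source and $\psi$ are hypothetical choices, and $\alpha = E\psi(X)$ here by the law of large numbers:

```python
import math, random

# I^w_n / n^2 should approach alpha * S for phi_n(x) = sum_j psi(x_j).
random.seed(7)
p = {0: 0.7, 1: 0.3}
psi = {0: 1.0, 1: 2.0}
n = 200_000

xs = random.choices(list(p), weights=list(p.values()), k=n)
# (10.2): I^w = -[sum_j psi(x_j)] * [sum_l log p(x_l)]
I_w = sum(psi[x] for x in xs) * (-sum(math.log(p[x]) for x in xs))
S = -sum(px * math.log(px) for px in p.values())
alpha = sum(psi[x] * px for x, px in p.items())
assert abs(I_w / n ** 2 - alpha * S) < 0.02
```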
Example 10.2. Clearly, the condition of stationarity cannot be dropped. Indeed, let $\phi_n(x_0^{n-1}) = \alpha n$ be an additive WF and $X$ be a (non-stationary) Gaussian process with covariances $C = \{C_{ij},\ i, j \in \mathbb{Z}^1_+\}$. Let $f_n$ be the $n$-dimensional PDF of the vector $(X_1, \ldots, X_n)$. Then
\[ h^w_{\phi_n}(f_n) = \frac{\alpha n}{2}\big[n\log(2\pi e) + \log\det(C_n)\big]. \tag{10.10} \]
Suppose that the eigenvalues $\lambda_1 \le \cdots \le \lambda_j \le \cdots \le \lambda_n$ of $C_n$ have the order $\lambda_j \approx cj$. Then by Stirling's formula the second term in (10.10) dominates, and the proper scaling of $h^w_{\phi_n}(f_n)$ is $(n^2\log n)^{-1}$ instead of $n^{-2}$ as $n \to \infty$.
Theorem 10.1 can be considered as an analog of the SMB theorem for the primary WE rate in the case of an AA WF. A specification of the secondary rate $A_1$ is given in Theorem 10.3 for an additive WF. The WE rates for multiplicative WFs are studied in Theorem 10.4 for the case where $X$ is a stationary ergodic Markov chain on $\mathcal X$.
Theorem 10.3. Suppose that $\phi_n(x_0^{n-1}) = \sum_{j=0}^{n-1}\psi(x_j)$. Let $X$ be a stationary RP with the property that for all $i \in \mathbb{Z}$ there exists the limit
and the last series converges absolutely. Then $\lim_{n\to\infty}\frac{1}{n}H^w_{\phi_n}(p_n) = A_1$.
\[
h^w_{\phi_n}(f_n) = \frac{1}{2}\,E\Bigg[\prod_{j=0}^{n-1}\psi(X_j)\bigg(X_0^2 - 2\alpha X_0 X_1 + \big(1 + \alpha^2\big)X_1^2 - 2\alpha X_1 X_2 + \big(1 + \alpha^2\big)X_2^2 - \cdots + \big(1 + \alpha^2\big)X_{n-2}^2 - 2\alpha X_{n-2}X_{n-1} + X_{n-1}^2 - 2\log\frac{\sqrt{1 - \alpha^2}}{(2\pi)^{n/2}}\bigg)\Bigg]. \tag{10.15}
\]
Acknowledgments
The article was prepared within the framework of the Academic Fund Program at
the National Research University Higher School of Economics (HSE) and supported
within the subsidy granted to the HSE by the Government of the Russian Federation
for the implementation of the Global Competitiveness Programme.
References
[1] Cover, T., Thomas, J.: Elements of Information Theory. John Wiley, New York (2006).
MR2239987
[2] Dembo, A.: Simple proof of the concavity of the entropy power with respect to
added Gaussian noise. IEEE Trans. Inf. Theory 35(4), 887–888 (1989). MR1013698.
doi:10.1109/18.32166
[3] Dembo, A., Cover, T., Thomas, J.: Information theoretic inequalities. IEEE Trans. Inf.
Theory 37, 1501–1518 (1991). MR1134291. doi:10.1109/18.104312
[4] Frizelle, G., Suhov, Y.: An entropic measurement of queueing behaviour in a class
of manufacturing operations. Proc. R. Soc. Lond., Ser A 457, 1579–1601 (2001).
doi:10.1098/rspa.2000.0731
[5] Frizelle, G., Suhov, Y.: The measurement of complexity in production and other com-
mercial systems. Proc. R. Soc. Lond., Ser A 464, 2649–2668 (2008). doi:10.1098/
rspa.2007.0275
[6] Guiasu, S.: Weighted entropy. Rep. Math. Phys. 2, 165–179 (1971). MR0289206.
doi:10.1016/0034-4877(71)90002-4
[7] Kelbert, M., Suhov, Y.: Information Theory and Coding by Example. Cambridge Univer-
sity Press, Cambridge (2013). MR3137525. doi:10.1017/CBO9781139028448
[8] Kelbert, M., Suhov, Y.: Continuity of mutual entropy in the limiting signal-to-noise ratio
regimes. In: Stochastic Analysis, pp. 281–299. Springer, Berlin (2010). MR2789089.
doi:10.1007/978-3-642-15358-7_14
[9] Kelbert, M., Stuhl, I., Suhov, Y.: Weighted entropy and optimal portfolios for risk-averse Kelly investments. Aequ. Math. 91 (2017, in press)
[10] Kelbert, M., Stuhl, I., Suhov, Y.: Weighted entropy and its use in Computer Science and beyond. Lecture Notes in Computer Science (in press)
[11] Khan, J.F., Bhuiyan, S.M.: Weighted entropy for segmentation evaluation. Opt. Laser
Technol. 57, 236–242 (2014). doi:10.1016/j.optlastec.2013.07.012
[12] Lai, W.K., Khan, I.M., Poh, G.S.: Weighted entropy-based measure for image segmenta-
tion. Proc. Eng. 41, 1261–1267 (2012). doi:10.1016/j.proeng.2012.07.309
[13] Lieb, E.: Proof of entropy conjecture of Wehrl. Commun. Math. Phys. 62, 35–41 (1978).
MR0506364. doi:10.1007/BF01940328
[14] Nawrocki, D.N., Harding, W.H.: State-value weighted entropy as a measure of investment risk. Appl. Econ. 18, 411–419 (1986). doi:10.1080/00036848600000038
[15] Paksakis, C., Mermigas, S., Pirourias, S., Chondrokoukis, G.: The role of weighted en-
tropy in security quantification. Int. Journ. Inf. Electron. Eng. 3(2), 156–159 (2013)
[16] Rioul, O.: Information theoretic proofs of entropy power inequality. IEEE Trans. Inf.
Theory 57(1), 33–55 (2011). MR2810269. doi:10.1109/TIT.2010.2090193
[17] Shockley, K.R.: Using weighted entropy to rank chemicals in quantitative high through-
put screening experiments. J. Biomol. Screen. 19, 344–353 (2014). doi:10.1177/
1087057113505325
[18] Suhov, Y., Stuhl, I.: Weighted information and entropy rates (2016). arXiv:1612.09169v1
[19] Suhov, Y., Sekeh, S.: An extension of the Ky-Fan inequality. arXiv:1504.01166
[20] Suhov, Y., Stuhl, I., Sekeh, S., Kelbert, M.: Basic inequalities for weighted entropy. Aequ.
Math. 90(4), 817–848 (2016). MR3523101. doi:10.1007/s00010-015-0396-5
[21] Suhov, Y., Sekeh, S., Kelbert, M.: Entropy-power inequality for weighted entropy.
arXiv:1502.02188
[22] Suhov, Y., Yasaei Sekeh, S., Stuhl, I.: Weighted Gaussian entropy and determinant inequalities. arXiv:1505.01753v1
[23] Tsui, P.-H.: Ultrasound detection of scatterer concentration by weighted entropy. Entropy
17, 6598–6616 (2015). doi:10.3390/e17106598
[24] Verdú, S., Guo, D.: A simple proof of the entropy-power inequality. IEEE Trans. Inf.
Theory 52(5), 2165–2166 (2006). MR2234471. doi:10.1109/TIT.2006.872978
[25] Villani, C.: A short proof of the “concavity of entropy power”. IEEE Trans. Inf. Theory
46, 1695–1696 (2000). MR1768665. doi:10.1109/18.850718
[26] Yang, L., Yang, J., Peng, N., Ling, J.: Weighted information entropy: A method for esti-
mating the complex degree of infrared images’ backgrounds. In: Kamel, M., Campilho,
A. (eds.) Image Analysis and Recognition, vol. 3656, Springer, Berlin/Heidelberg,
pp. 215–22 (2005). MR3157460. doi:10.1007/978-3-642-39094-4
[27] Zamir, R.: A proof of the Fisher information inequality via a data processing argument.
IEEE Trans. Inf. Theory 44(3), 1246–1250 (1998). MR1616672. doi:10.1109/18.669301