
On maximization of measured f-divergence

between a given pair of quantum states


Keiji Matsumoto
National Institute of Informatics,
2-1-2, Hitotsubashi, Chiyoda-ku, Tokyo 101-8430
keiji@nii.ac.jp
March 25, 2015

1 Introduction
This paper deals with the maximization of the classical $f$-divergence between the distributions of the measurement outputs of a given pair of quantum states. The $f$-divergence $D_f$ between probability density functions $p_1$ and $p_2$ over a discrete set is defined as
\[ D_f(p_1\|p_2) := \sum_x p_2(x)\, f\!\left(\frac{p_1(x)}{p_2(x)}\right), \]
if $p_2(x) > 0$ for all $x$. (The definition for the general case is given later.)
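For concreteness, the definition is easy to evaluate numerically. The sketch below (function names are ours, not the paper's) computes $D_f$ for discrete distributions with $p_2(x)>0$, taking $f_{\mathrm{KL}}(r)=r\ln r$ as an example:

```python
import numpy as np

def f_divergence(p1, p2, f):
    """D_f(p1||p2) = sum_x p2(x) f(p1(x)/p2(x)), assuming p2(x) > 0 for all x.
    (The general case needs the limit terms introduced later in the text.)"""
    p1, p2 = np.asarray(p1, dtype=float), np.asarray(p2, dtype=float)
    return float(np.sum(p2 * f(p1 / p2)))

# Kullback-Leibler divergence via f_KL(r) = r ln r:
p1 = np.array([0.5, 0.3, 0.2])
p2 = np.array([0.4, 0.4, 0.2])
kl = f_divergence(p1, p2, lambda r: r * np.log(r))
```

With $f_{\mathrm{KL}}$ this reduces to $\sum_x p_1(x)\ln(p_1(x)/p_2(x))$, the Kullback-Leibler divergence discussed below.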
This problem is important since $D_f$ has good operational meanings. For some choices of $f$, the operational meaning is well known. For example,
\[ f_{\mathrm{KL}}(r) := r\ln r, \qquad f_{\mathrm{KL2}}(r) := -\ln r \]
correspond to the Kullback-Leibler divergence. Also,
\[ f_\alpha(r) := \begin{cases} r^\alpha, & \alpha \notin [0,1], \\ -r^\alpha, & 0 < \alpha < 1, \end{cases} \tag{1} \]
corresponds to the Rényi-type relative entropy (or a monotone function of it). These play a key role in the theory of large deviations, and are thus used extensively in the asymptotic analysis of the error probability of decoding, hypothesis testing, and so on.
Not only these choices of $f$, but others also have good operational meanings. If $f$ is a proper closed convex function with $\operatorname{dom} f \supset (0,\infty)$, then $D_f(p_1\|p_2)$ is the optimal gain of a certain Bayes decision problem. For each $f$ satisfying the above-mentioned conditions, there is a pair of real-valued functions $w_1$ and $w_2$ on a decision space, representing the gain of a decision $d$, with
\[ D_f(p_1\|p_2) = \sup_{d(\cdot)} \sum_x \bigl( w_1(d(x))\,p_1(x) + w_2(d(x))\,p_2(x) \bigr). \tag{2} \]
Conversely, for each pair $(w_1(\cdot), w_2(\cdot))$, there is a proper closed convex function $f$ satisfying the above identity.
Also, by (2) and the celebrated randomization criterion [6][7], there is a Markov map which sends $(p,q)$ to $(p',q')$ if and only if $D_f(p\|q) \geq D_f(p'\|q')$ holds for every proper closed convex function $f$ with $\operatorname{dom} f \supset (0,\infty)$.
In quantum information and statistics, we make decisions based on the data obtained from measurements performed on given quantum states. Thus, an honest quantum version of the concept is
\[ D_f^{\min}(\rho_1\|\rho_2) := \sup_{M:\,\mathrm{POVM}} D_f\bigl(P^M_{\rho_1}\big\|P^M_{\rho_2}\bigr), \tag{3} \]
where POVM is short for positive operator valued measure, and $P^M_{\rho_\tau}$ is the probability distribution obtained by applying the measurement $M$ to the state $\rho_\tau$ ($\tau \in \{1,2\}$). (The underlying Hilbert space, denoted by $\mathcal{H}$, is finite-dimensional throughout the paper.) A trouble with $D_f^{\min}(\rho_1\|\rho_2)$ is that the definition involves an intractable maximization of a non-linear functional of the POVM. (If such maximizations were tractable, the quantum asymptotic theory of hypothesis testing and so on would have been easier; as is well known, those problems were solved in a quite non-trivial manner.) So far this optimization had been solved only for $f_{1/2}$ and $f(r) = |1-r|$, which correspond to the fidelity and the total variation, respectively.
The purpose of the present paper is to advance the study further, by investigating the properties of this quantity, rewriting the maximization problem in a more tractable form, and giving closed formulas in some special cases. One of the two core results is Theorem 6: if the convex conjugate $f^*$ of $f$ is operator convex, $D_f^{\min}(\rho_1\|\rho_2)$ can be written as the supremum of a concave function of a Hermitian operator on $\mathcal{H}$ (if the operators could move over an extended Hilbert space, the result would not have been as useful), namely,
\[ D_f^{\min}(\rho_1\|\rho_2) = \sup\bigl\{ \operatorname{tr}\rho_1 T - \operatorname{tr}\rho_2 f^*(T)\,;\ \operatorname{spec} T \subset \operatorname{dom} f^* \bigr\}. \]
This maximization problem is solved straightforwardly for $f = f_2$ and $f_{-1}$.


The other core result is Theorem 11: if $f^*$ is operator convex, $\operatorname{dom} f^*$ is unbounded from below, and $\ker\rho_1$ is non-trivial, one can reduce the optimization problem to one on $\operatorname{supp}\rho_1$, namely,
\[ D_f^{\min}(\rho_1\|\rho_2) = D_f^{\min}(\rho_1\|\pi_1\rho_2\pi_1) - f^*(-\infty)\,(1 - \operatorname{tr}\rho_2\pi_1), \]
where $\pi_1$ is the projection onto $\operatorname{supp}\rho_1$. Using this, a closed formula for pure states is obtained, in case $f$ satisfies the above-mentioned conditions.
Using the above results, we compute the limit
\[ \lim_{\theta'\to\theta} \frac{1}{(\theta'-\theta)^2}\, D_f^{\min}(\rho_\theta\|\rho_{\theta'}), \]
where $\{\rho_\theta\}_{\theta\in\mathbb{R}}$ is a family of parameterized states. In the classical case, this equals the Fisher information, which plays a key role in the asymptotic theory of estimation of the parameter $\theta$. In case $\rho_\theta > 0$ for all $\theta$, it has been folklore that this equals a constant multiple of the SLD (symmetric logarithmic derivative) Fisher information, which plays an important role in the asymptotic theory of statistics. We analyze the case where $\rho_\theta$ is not full-rank, and show that this statement requires a non-trivial correction.

2 Classical $f$-divergence
In this section, we summarize known or trivial facts about the classical $f$-divergence and convex functions. Below, $f$ takes values in $\mathbb{R}\cup\{-\infty,+\infty\}$, and $\operatorname{dom} f := \{r\,;\ f(r) < \infty\}$. $f$ is convex if and only if $\{(r_1,r_2)\,;\ r_2 \geq f(r_1)\}$ is convex. This definition is equivalent to the convention of setting $f(r) := \infty$ outside the domain. A convex function $f$ is proper if and only if $f$ is nowhere $-\infty$ and not identically $+\infty$, and is closed if and only if $\{(r_1,r_2)\,;\ r_2 \geq f(r_1)\}$ is closed, or equivalently, $f$ is lower semicontinuous.
Let $f$ be a proper closed convex function whose effective domain contains $(0,\infty)$. Define $g(r_1,r_2)$ by
\[ g(r_1,r_2) := \begin{cases} r_2\, f(r_1/r_2), & r_1/r_2 \in \operatorname{dom} f,\ r_2 > 0, \\ \lim_{\varepsilon\downarrow 0} \varepsilon f(r_1/\varepsilon), & r_1 > 0,\ r_2 = 0, \\ 0, & r_1 = r_2 = 0, \\ +\infty, & \text{otherwise}. \end{cases} \]
Note $g$ is proper, closed, convex, and positively homogeneous (see p. 35 and p. 67 of [5]). Especially,
\[ \lim_{\varepsilon\downarrow 0} \varepsilon f(1/\varepsilon) > -\infty. \tag{4} \]
If $r_2 > 0$, $(r_1,r_2)$ is a member of $\operatorname{dom} g$ if and only if $r_1/r_2 \in \operatorname{dom} f$.


Then the $f$-divergence $D_f(P_1\|P_2)$ of positive finite measures $P_1$ and $P_2$ is defined by
\[ D_f(P_1\|P_2) := \int g\bigl(p_1(x), p_2(x)\bigr)\, \mathrm{d}\mu(x), \]
where $p_1$ and $p_2$ are the density functions of $P_1$ and $P_2$ with respect to a measure $\mu$ which majorizes them ($\mu$ may be taken as $P_1 + P_2$).
If $p_2(x) = 0$ and $p_1(x) > 0$,
\[ g(p_1(x), p_2(x)) = \lim_{\varepsilon\downarrow 0} \varepsilon f\!\left(\frac{p_1(x)}{\varepsilon}\right) = p_1(x) \lim_{\varepsilon\downarrow 0} \varepsilon f(1/\varepsilon), \]
and if $p_1(x) = p_2(x) = 0$, $g(p_1(x), p_2(x)) = 0$. Therefore,
\[ D_f(P_1\|P_2) = \int_{\operatorname{supp} p_2} p_2(x)\, f\!\left(\frac{p_1(x)}{p_2(x)}\right) \mathrm{d}\mu(x) + P_1\bigl((\operatorname{supp} p_2)^c\bigr) \lim_{\varepsilon\downarrow 0} \varepsilon f(1/\varepsilon), \]
where $B^c$ is the complement of the set $B$.

By Corollary 13.5.1 of [5],
\[ g(r_1,r_2) = \sup_{w\in W_f} \sum_{\tau\in\{1,2\}} w_\tau\, r_\tau, \tag{5} \]
where
\[ W_f = \bigl\{ w = (w_\tau)_{\tau\in\{1,2\}}\,;\ w_1 \leq w_1',\ w_2 \leq -f^*(w_1'),\ \exists w_1' \in \operatorname{dom} f^* \bigr\}, \]
and $f^*$ is the convex conjugate of $f$,
\[ f^*(t) := \sup_r\, \bigl(tr - f(r)\bigr). \]

Therefore,
\begin{align*} D_f(P_1\|P_2) &= \int \Bigl( \sup_{w\in W_f} \sum_{\tau\in\{1,2\}} w_\tau\, p_\tau(x) \Bigr)\, \mathrm{d}\mu(x) \\ &= \sup\Bigl\{ \int \sum_{\tau\in\{1,2\}} w_\tau(x)\, p_\tau(x)\, \mathrm{d}\mu(x)\,;\ w(\cdot):\text{ measurable, into } W_f \Bigr\}. \tag{6} \end{align*}

Remark 1 (6) indicates (2). To see this, use $W_f$ as the decision space, and let the gain of the decision $w$ be $w_\tau$ when the true probability distribution is $P_\tau$.
The correspondence between $D_f$ and $W_f$ is one-to-one, but two different convex functions $f$ and $f_0$ may correspond to the same $f$-divergence, $D_f = D_{f_0}$. To see this, observe that $w_2 = -f^*(w_1)$ defines the upper-right border of $W_f$, and that $W_f$ is unbounded below and towards the left. Hence, we call the choice of $f$ standard if $f^*$ is (strictly) monotone increasing on $\operatorname{dom} f^*$. If $f$ is not standard, we define $f_0^*$ so that $f_0^*(t) = f^*(t)$ at any $t$ with
\[ f^*(t') > f^*(t), \quad \forall t' > t, \tag{7} \]
and $f_0^*(t) = \infty$ otherwise. $f_0$ is defined by $f_0 := f_0^{**}$. Then $\operatorname{dom} f_0^*$ is the set of all members of $\operatorname{dom} f^*$ with (7). $\operatorname{dom} f_0^*$ may be empty, but then $f(r) = \infty$ on the positive half-line, and such cases are not interesting; so we suppose otherwise. It is not difficult to see that $f_0$ and $f$ coincide on the positive half-line. Let $r > 0$. Since
\[ tr - f^*(t) \leq t'r - f^*(t') \]
holds for any $t'$ with $t' > t$ and $f^*(t') \leq f^*(t)$,
\begin{align*} f(r) &= \sup_t\, \bigl(tr - f^*(t)\bigr) = \sup_{t\in\operatorname{dom} f_0^*} \bigl(tr - f^*(t)\bigr) \\ &= \sup_{t\in\operatorname{dom} f_0^*} \bigl(tr - f_0^*(t)\bigr) = f_0(r). \end{align*}
If $f_0 \neq f$, how does $f_0$ differ from $f$? By assumption, $f^*$ is not monotone increasing. Also, we exclude the uninteresting case where $f^*$ is monotone non-increasing. Then $\min_t f^*(t)$ exists. Let $a_0$ be the largest number such that $f^*(a_0) = \min_t f^*(t)$. Then
\[ \operatorname{dom} f_0^* = [a_0,\infty) \cap \operatorname{dom} f^*. \]
Let $r < 0$. Then since $f_0^*$ is monotone increasing on $\operatorname{dom} f_0^* = [a_0,\infty)\cap\operatorname{dom} f^*$,
\[ f_0(r) = \sup_{t\in[a_0,\infty)\cap\operatorname{dom} f^*} \bigl(tr - f^*(t)\bigr) = r a_0 - f^*(a_0). \]
Thus, $f_0$ is affine on the negative half-line.
For example, both of
\[ f_{\mathrm{TV}}(r) := |1-r|, \qquad f_{\mathrm{TV2}}(r) := \begin{cases} |1-r|, & r \geq 0, \\ \infty, & r < 0, \end{cases} \]
correspond to the total variation distance,
\[ D_{f_{\mathrm{TV}}}(P\|Q) = D_{f_{\mathrm{TV2}}}(P\|Q) = \|P - Q\|_1. \]
The former is standard but the latter is not:
\[ f_{\mathrm{TV}}^*(t) = \begin{cases} \infty, & t < -1, \\ t, & -1 \leq t \leq 1, \\ \infty, & t > 1, \end{cases} \qquad f_{\mathrm{TV2}}^*(t) = \begin{cases} -1, & t < -1, \\ t, & -1 \leq t \leq 1, \\ \infty, & t > 1. \end{cases} \]
Define
\[ \hat f(r) := g(1, r), \qquad r \geq 0, \]
and extend it to the negative half-line so that it becomes standard. Also, define $\hat g$ in a parallel manner to the definition of $g$, replacing $f$ by $\hat f$. Then,
\[ \hat g(r_1, r_2) = g(r_2, r_1), \tag{8} \]
if $r_1 \geq 0$, as will be checked below.
If $r_1/r_2 \in \operatorname{dom}\hat f$ and $r_2 > 0$, the relation is checked by easy computation. If $r_1 = r_2 = 0$, both sides of (8) are $0$. Finally, if $r_1 = 0$ and $r_2 > 0$,
\[ \hat g(0, r_2) = r_2\, \hat f(0) = r_2\, g(1,0) = g(r_2, 0), \]
and the relation is checked. (8) means
\[ D_f(P_1\|P_2) = D_{\hat f}(P_2\|P_1). \]

3 Sufficient conditions for $D_f^{\min} < \infty$

Now we proceed to a quantum analogue of the $f$-divergence. Let $\{\rho_\tau\}_{\tau\in\{1,2\}}$ be a pair of positive operators on a finite-dimensional Hilbert space $\mathcal{H}$. We define $D_f^{\min}$ by (3). The notation comes from the following fact. If $D_f^{Q}$ is a real-valued function of $\{\rho_\tau\}_{\tau\in\{1,2\}}$ which coincides with $D_f$ on any commutative subalgebra and is monotone non-increasing under application of unital completely positive (CP) maps, then
\[ D_f^{\min}(\rho_1\|\rho_2) \leq D_f^{Q}(\rho_1\|\rho_2). \]
(If the statement were untrue, it would obviously contradict the monotonicity of $D_f^{Q}$.)

Remark 2 Here, $D_f$ and $D_f^{\min}$ have operational meanings only if the entities are normalized. But extending them to general positive finite measures and positive operators makes the mathematical statements easier, since we sometimes consider submatrices of the input states, especially when a measurement process is involved.

In this section we determine when $D_f^{\min}$ stays finite. Note this result also gives, due to the above-mentioned fact, a sufficient condition for all quantum versions of the $f$-divergence to become infinite. Before analyzing non-trivial cases, we first present the cases where $D_f^{\min}$ is trivially finite.
The first trivial case is the one where $W_f$ is "bounded from above", i.e., there are finite numbers $a_1$ and $a_2$ with
\[ W_f \subset (-\infty, a_1] \times (-\infty, a_2]. \tag{9} \]
(This is the case if $\operatorname{dom} f$ is a finite interval.) Then by (6), for normalized inputs,
\[ D_f(P_1\|P_2) \leq a_1 + a_2. \]
Thus, $D_f^{\min}$ is also finite. Therefore, we consider the cases where the above assumption is false.

Another trivial case is the one where $\rho_1 > 0$ and $\rho_2 > 0$. Then there are constants $b > 0$ and $b' > 0$ ($b \geq b'$) with
\[ \rho_1 - b\rho_2 \leq 0, \tag{10} \]
\[ \rho_1 - b'\rho_2 \geq 0, \tag{11} \]
respectively. For later use, $b$ and $b'$ are defined as the smallest number with (10) and the largest number with (11), respectively. Then, denoting by $p^M_\tau$ the density of $P^M_{\rho_\tau}$ with respect to $P^M_{\rho_1} + P^M_{\rho_2}$,
\[ \sup_x \frac{p^M_1(x)}{p^M_2(x)} \leq \sup_{0\leq M\leq 1_{\mathcal{H}}} \frac{\operatorname{tr} M\rho_1}{\operatorname{tr} M\rho_2} \leq b, \qquad \inf_x \frac{p^M_1(x)}{p^M_2(x)} \geq \inf_{0\leq M\leq 1_{\mathcal{H}}} \frac{\operatorname{tr} M\rho_1}{\operatorname{tr} M\rho_2} \geq b'. \]
Hence, using the assumption that $f$ is finite on $(0,\infty)$, we have the asserted statement. Thus, the question is the case where at least one of the states is not full-rank.
Observe that by (6) and (3),
\[ D_f^{\min}(\rho_1\|\rho_2) \geq w_1\operatorname{tr}\rho_1\pi + w_2\operatorname{tr}\rho_2\pi + w_1'\operatorname{tr}\rho_1(1-\pi) + w_2'\operatorname{tr}\rho_2(1-\pi) \]
holds for any projection $\pi$ and any $w, w' \in W_f$.
Suppose $\ker\rho_1$ is non-trivial, and let $\pi$ be the projection onto $\ker\rho_1$. Then the above inequality shows that $D_f^{\min}(\rho_1\|\rho_2)$ can be bounded only if $w_2$ stays finite, i.e.,
\[ W_f \subset \mathbb{R}\times(-\infty, a_2]. \tag{12} \]
On the other hand, suppose this inclusion is true. To avoid the trivial case, suppose also that $W_f$ is not bounded from above, and that $\rho_2$ is full-rank. (If the latter is false, the parallel argument as above shows $D_f^{\min} = \infty$.) By the latter assumption, there is $b > 0$ with (10), so that $p^M_1(x) \leq b\, p^M_2(x)$ for every measurement $M$. Moreover, (12) means $-\inf_t f^*(t) < \infty$, i.e., $f(0) = \sup_t(-f^*(t)) < \infty$. Hence the likelihood ratio $p^M_1/p^M_2$ stays in $[0,b]$, on which the convex function $f$ is bounded by $\max\{f(0), f(b)\}$, and
\[ D_f\bigl(P^M_{\rho_1}\big\|P^M_{\rho_2}\bigr) \leq \max\{f(0), f(b)\}\operatorname{tr}\rho_2, \]
uniformly in $M$; thus $D_f^{\min}(\rho_1\|\rho_2) < \infty$.
Exchanging $\rho_1$ and $f$ with $\rho_2$ and $\hat f$ (here, we suppose $f$ is standard without loss of generality), we obtain all the cases, summarized as below.
Theorem 3 Suppose $f$ is a proper closed convex function with $\operatorname{dom} f \supset (0,\infty)$. Then $D_f^{\min}(\rho_1\|\rho_2) < \infty$ holds if $\rho_1 > 0$, $\rho_2 > 0$. Also, $D_f^{\min}(\rho_1\|\rho_2) < \infty$ is equivalent to (9) if $\ker\rho_1 \neq \{0\}$, $\ker\rho_2 \neq \{0\}$; to (12) if $\ker\rho_1 \neq \{0\}$, $\rho_2 > 0$; and to
\[ W_f \subset (-\infty, a_1] \times \mathbb{R}, \tag{13} \]
if $\rho_1 > 0$, $\ker\rho_2 \neq \{0\}$.

Note (12) and (13) are equivalent to
\[ f^*(-\infty) > -\infty \]
and
\[ \operatorname{dom} f^* \subset (-\infty, a_1], \]
respectively.

4 Continuity of $D_f^{\min}$
Since $D_f^{\min}$ is jointly convex almost by definition, it is continuous in the interior of $\operatorname{dom} D_f^{\min}$, i.e., at the points where $\rho_1 > 0$ and $\rho_2 > 0$. By applying well-known facts of convex analysis, a sort of continuity at the edge is easily proved. For a given convex function $f$, we denote $\operatorname{epi} f := \{(r_1,r_2)\,;\ r_2 \geq f(r_1)\}$.
Using (6), the definition (3) is rewritten as
\begin{align*} \sup_{M'} D_f\bigl(P^{M'}_{\rho_1}\big\|P^{M'}_{\rho_2}\bigr) &= \sup_{M',\,w}\Bigl\{ \int \sum_{\tau\in\{1,2\}} w_\tau(x)\, \mathrm{d}P^{M'}_{\rho_\tau}(x)\,;\ w(\cdot):\text{ measurable function into } W_f \Bigr\} \\ &= \sup_{M',\,w}\Bigl\{ \int \sum_{\tau\in\{1,2\}} w_\tau(x)\operatorname{tr}\rho_\tau M'(\mathrm{d}x)\,;\ w(\cdot):\text{ measurable function into } W_f \Bigr\} \\ &= \sup_{M} \sum_{\tau\in\{1,2\}} \operatorname{tr}\rho_\tau\Bigl( \int_{w\in W_f} w_\tau\, M(\mathrm{d}w) \Bigr), \end{align*}
where $M(B) := M'\bigl(w^{-1}(B)\bigr)$ is a POVM over the topological $\sigma$-field on $W_f \subset \mathbb{R}^2$.
Let $\mathcal{W}_{\mathcal{H},f}$ be the set of pairs of operators such that
\[ (W_\tau)_{\tau\in\{1,2\}} \in \mathcal{W}_{\mathcal{H},f} \iff \exists M:\text{POVM},\quad W_\tau \leq \int_{w\in W_f} w_\tau\, M(\mathrm{d}w). \]
Then
\begin{align*} D_f^{\min}(\rho_1\|\rho_2) &:= \sup_M D_f\bigl(P^M_{\rho_1}\big\|P^M_{\rho_2}\bigr) \\ &= \sup_{\vec W\in\mathcal{W}_{\mathcal{H},f}} \sum_{\tau\in\{1,2\}} \operatorname{tr}(\rho_\tau W_\tau), \tag{14} \end{align*}
where
\[ \vec W = (W_\tau)_{\tau\in\{1,2\}} = (W_1, W_2). \]
Lemma 4 $D_f^{\min}$ is a proper closed convex function which is positively homogeneous.

Proof. That $D_f^{\min}$ is proper, convex, and positively homogeneous is trivial, so we prove it is closed. By (14),
\[ \operatorname{epi} D_f^{\min} = \operatorname{epi} \sup_{\vec W\in\mathcal{W}_{\mathcal{H},f}} h_{\vec W} = \bigcap_{\vec W\in\mathcal{W}_{\mathcal{H},f}} \operatorname{epi} h_{\vec W}, \]
where
\[ h_{\vec W}(\rho_1,\rho_2) := \sum_{\tau\in\{1,2\}} \operatorname{tr}(\rho_\tau W_\tau). \]
Since each $\operatorname{epi} h_{\vec W}$ is closed, $\bigcap_{\vec W\in\mathcal{W}_{\mathcal{H},f}} \operatorname{epi} h_{\vec W}$ is also closed. Thus we have the assertion.

Proposition 5 For any $\rho_1 \geq 0$ and $\rho_2 \geq 0$, and for any $X_1, X_2 \geq 0$,
\[ \lim_{s\downarrow 0} D_f^{\min}(\rho_1 + sX_1\|\rho_2 + sX_2) = D_f^{\min}(\rho_1\|\rho_2). \]

Proof. Since $D_f^{\min}$ is closed, it is continuous on any (finite-dimensional) simplex inside $\operatorname{dom} D_f^{\min}$ by Theorem 10.2 of [5]. Applying this fact to the line segment connecting $(\rho_1 + sX_1, \rho_2 + sX_2)$ and $(\rho_1, \rho_2)$, we have the assertion.

5 Simplifying the optimization problem

Define $\mathcal{W}^0_{\mathcal{H},f}$ as the set of pairs of operators such that $(W_\tau)_{\tau\in\{1,2\}} \in \mathcal{W}^0_{\mathcal{H},f}$ is equivalent to the existence of a measurable set and a projection valued measure (PVM) $E$ over it in the Hilbert space $\mathcal{H}$ with
\[ W_\tau \leq \int_{w\in W_f} w_\tau\, E(\mathrm{d}w). \]
Obviously, $\mathcal{W}^0_{\mathcal{H},f} \subset \mathcal{W}_{\mathcal{H},f}$ and, by the Naimark extension theorem, for any $(W_\tau)_{\tau\in\{1,2\}} \in \mathcal{W}_{\mathcal{H},f}$, there is a separable Hilbert space $\mathcal{K}$ and an isometry $V$ from $\mathcal{H}$ into $\mathcal{K}$ such that
\[ \vec W \in V^\dagger\, \mathcal{W}^0_{\mathcal{K},f}\, V. \]
Therefore, if
\[ V^\dagger\, \mathcal{W}^0_{\mathcal{K},f}\, V \subset \mathcal{W}^0_{\mathcal{H},f} \tag{15} \]
for any separable Hilbert space $\mathcal{K}$ and any isometry $V$ from $\mathcal{H}$ into $\mathcal{K}$, we have
\[ D_f^{\min}(\rho_1\|\rho_2) = \sup_{\vec W\in\mathcal{W}^0_{\mathcal{H},f}} \sum_{\tau\in\{1,2\}} \operatorname{tr}(\rho_\tau W_\tau). \]
If one of the following two conditions is true, it is easy to prove that this is the case:

(I) $f^*$ is operator convex on $\operatorname{dom} f^*$
(II) $\hat f^*(t)\ \bigl(= -f^{*-1}(-t)\bigr)$ is operator convex on $\operatorname{dom}\hat f^*$

Observe $(W_\tau)_{\tau\in\{1,2\}} \in V^\dagger\,\mathcal{W}^0_{\mathcal{K},f}\,V$ if and only if
\begin{align*} W_1 &\leq V^\dagger\Bigl( \int_{w\in W_f} w_1\, E(\mathrm{d}w) \Bigr) V = V^\dagger\Bigl( \int_{t\in\operatorname{dom} f^*} t\, E'(\mathrm{d}t) \Bigr) V = V^\dagger T V, \\ W_2 &\leq V^\dagger\Bigl( \int_{w\in W_f} w_2\, E(\mathrm{d}w) \Bigr) V \leq -V^\dagger\Bigl( \int_{t\in\operatorname{dom} f^*} f^*(t)\, E'(\mathrm{d}t) \Bigr) V = -V^\dagger f^*(T)\, V, \end{align*}
where $E$ is a PVM in $\mathcal{K}$, $E'$ is the marginal of $E$, and
\[ T := \int_{t\in\operatorname{dom} f^*} t\, E'(\mathrm{d}t). \]
If (I) holds, i.e., $f^*$ is operator convex, then by a Jensen-type inequality,
\[ V^\dagger f^*(T)\, V \geq f^*\bigl(V^\dagger T V\bigr). \]
Therefore, defining a PVM $E''$ by the spectral decomposition $V^\dagger T V = \int_{s\in\operatorname{dom} f^*} s\, E''(\mathrm{d}s)$,
\[ W_1 \leq \int s\, E''(\mathrm{d}s), \qquad W_2 \leq -\int f^*(s)\, E''(\mathrm{d}s). \]
Also, observe that $\operatorname{spec} V^\dagger T V$ lies in the closed convex hull of $\operatorname{spec} T \subset \operatorname{dom} f^*$. Thus, (15) is verified. In the case that (II) holds, (15) is checked in an almost parallel manner.
$\mathcal{W}^0_{\mathcal{H},f}$ can be expressed in the following way:
\[ \mathcal{W}^0_{\mathcal{H},f} = \bigl\{ \vec W\,;\ W_1 \leq T,\ W_2 \leq -f^*(T),\ \operatorname{spec} T \subset \operatorname{dom} f^* \bigr\}. \]
This is checked by noticing that $\vec W \in \mathcal{W}^0_{\mathcal{H},f}$ is equivalent to
\[ W_1 \leq T, \qquad W_2 \leq \int_{w\in W_f} w_2\, E(\mathrm{d}w) \leq -\int_{w\in W_f} f^*(w_1)\, E(\mathrm{d}w) = -f^*(T). \]
Here, $E$ is an arbitrary PVM over $\mathcal{H}$.


As we analyze later, $f_\alpha$ ($\alpha \leq \tfrac12$ or $\alpha \geq 2$) and $f_{\mathrm{KL2}}$ are examples where (I) holds, and $f_\alpha$ ($\tfrac12 \leq \alpha < 1$ or $\alpha > 1$) and $f_{\mathrm{KL}}$ are examples where (II) holds. Summarizing the argument so far, we have:

Theorem 6 Let $f$ be a proper convex function with $\operatorname{dom} f \supset (0,\infty)$. If (15) is true, then $\mathcal{W}^0_{\mathcal{H},f} = \mathcal{W}_{\mathcal{H},f}$ and thus
\begin{align*} D_f^{\min}(\rho_1\|\rho_2) &= \sup\bigl\{ \operatorname{tr}\rho_1 T - \operatorname{tr}\rho_2 f^*(T)\,;\ \operatorname{spec} T \subset \operatorname{dom} f^* \bigr\} \tag{16} \\ &= \sup\bigl\{ \operatorname{tr}\rho_1 (f^*)^{-1}(S) - \operatorname{tr}\rho_2 S\,;\ \operatorname{spec} S \subset \operatorname{dom} (f^*)^{-1} \bigr\}. \end{align*}
Also, instead of all POVMs, we only have to optimize over all PVMs. If the supremum is achieved, it is achieved by the PVM given by the spectral decomposition of the Hermitian operator $T$ or $S$ attaining the maximum above. (15) is true if (I) or (II) is true.
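Formula (16) can be checked numerically by direct maximization over Hermitian $T$. The sketch below is our illustration (not the paper's method): it parametrizes a real symmetric $T$, applies `fstar` to the eigenvalues, and maximizes $G(T)$ with `scipy.optimize.minimize`. Here $f_2^*(t) = t^2/4$ is used on all of $\mathbb{R}$ for simplicity; the maximizer turns out to be positive semidefinite, inside $\operatorname{dom} f_2^*$ (see Section 8.1 for the closed form).

```python
import numpy as np
from scipy.optimize import minimize

def dmin_via_16(rho1, rho2, fstar, dim):
    """Maximize G(T) = tr(rho1 T) - tr(rho2 f*(T)) over real symmetric T,
    a brute-force numerical version of (16)."""
    idx = np.triu_indices(dim)

    def neg_G(x):
        T = np.zeros((dim, dim))
        T[idx] = x
        T = T + np.triu(T, 1).T              # rebuild the symmetric matrix
        lam, U = np.linalg.eigh(T)
        fT = (U * fstar(lam)) @ U.T          # f*(T) via the spectral decomposition
        return -(np.trace(rho1 @ T) - np.trace(rho2 @ fT))

    res = minimize(neg_G, np.zeros(dim * (dim + 1) // 2))
    return float(-res.fun)

rho1 = np.array([[0.7, 0.1], [0.1, 0.3]])
rho2 = np.diag([0.6, 0.4])
val = dmin_via_16(rho1, rho2, lambda t: t**2 / 4.0, 2)
```

For this choice of $f^*$ the objective is a concave quadratic, so the default BFGS iteration converges to the unique stationary point, matching the closed-form value derived in Section 8.1.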

6 The stationary point

In this section, unless otherwise mentioned, we suppose that condition (I) holds. This means that
\[ G(T) := \operatorname{tr}\rho_1 T - \operatorname{tr}\rho_2 f^*(T) \tag{17} \]
is concave in $T$, and its supremum equals $D_f^{\min}$. The domain of $G$ is
\[ \{T\,;\ \operatorname{spec} T \subset \operatorname{dom} f^*\}. \]
In this section we focus on the easy case where a stationary point $T_0$ of $G$, i.e., $T_0$ with $\partial G(T_0)/\partial T = 0$, exists in $\operatorname{dom} G$. (Note $f^*$ is necessarily differentiable, being operator convex.) In this case, the supremum of $G$ is achieved at $T_0$.
Below, we suppose $f^*$ is differentiable. Hence the Fréchet derivative $\mathrm{D}f^*(T)$ of $f^*$, i.e., the linear transform on $\mathcal{B}(\mathcal{H})$ with
\[ \frac{\|f^*(T+X) - f^*(T) - \mathrm{D}f^*(T)(X)\|_2}{\|X\|_2} \to 0, \quad \text{as } \|X\|_2 \to 0, \]
is given, in a basis which diagonalizes $T$, by
\[ \mathrm{D}f^*(T)(X) = \bigl[ f^{*[1]}(\lambda_i, \lambda_j)\, X_{i,j} \bigr], \tag{18} \]
where $\lambda_i$ ($i = 1,\ldots$) are the eigenvalues of $T$, and
\[ f^{*[1]}(s,t) := \begin{cases} \dfrac{f^*(s) - f^*(t)}{s - t}, & s \neq t, \\[2pt] f^{*\prime}(s), & s = t. \end{cases} \]
An important consequence of this formula is that $\mathrm{D}f^*(T)(\cdot)$ is self-adjoint with respect to the inner product $\operatorname{tr} XY$:
\begin{align*} \operatorname{tr} Y\, \mathrm{D}f^*(T)(X) &= \sum_{i,j} Y_{j,i}\, f^{*[1]}(\lambda_i, \lambda_j)\, X_{i,j} \\ &= \sum_{i,j} f^{*[1]}(\lambda_i, \lambda_j)\, Y_{j,i}\, X_{i,j} = \operatorname{tr} X\, \mathrm{D}f^*(T)(Y). \end{align*}
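Formula (18) can be implemented and sanity-checked directly. For $f^*(t) = t^2$ the first divided difference is $f^{*[1]}(s,t) = s + t$, so $\mathrm{D}f^*(T)(X) = TX + XT$ exactly; the sketch below (names are ours) verifies this:

```python
import numpy as np

def frechet_derivative(f, df, T, X):
    """D f(T)(X) via first divided differences f^[1](lam_i, lam_j) in the
    eigenbasis of the Hermitian matrix T -- formula (18)."""
    lam, U = np.linalg.eigh(T)
    Xb = U.conj().T @ X @ U
    F = np.array([[df(a) if np.isclose(a, b) else (f(a) - f(b)) / (a - b)
                   for b in lam] for a in lam])
    return U @ (F * Xb) @ U.conj().T

T = np.array([[1.0, 0.5], [0.5, 2.0]])
X = np.array([[0.0, 1.0], [1.0, 0.0]])
D = frechet_derivative(lambda t: t**2, lambda t: 2 * t, T, X)
```

The entrywise multiplication `F * Xb` is exactly the Hadamard-product form of (18) in the eigenbasis of $T$.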

With all definitions and assumptions made, we now proceed to the analysis of the maximum point of $G(T)$. The most tractable case is that there is a stationary point of $G(T)$ in $\operatorname{dom} G$: if
\begin{align*} \frac{\mathrm{d}}{\mathrm{d}s} G(T_0 + sX)\Big|_{s=0} &= \operatorname{tr} X\rho_1 - \operatorname{tr}\{\rho_2\, \mathrm{D}f^*(T_0)(X)\} \\ &= \operatorname{tr} X\bigl(\rho_1 - \mathrm{D}f^*(T_0)(\rho_2)\bigr) = 0 \end{align*}
holds for any Hermitian matrix $X$, then $T_0$ achieves the maximum. (Here we used the fact that $\mathrm{D}f^*(T_0)(\cdot)$ is self-adjoint.) Thus, we have
\[ \rho_1 = \mathrm{D}f^*(T_0)(\rho_2). \tag{19} \]
Therefore,
\begin{align*} D_f^{\min}(\rho_1\|\rho_2) &= \operatorname{tr}\rho_1 T_0 - \operatorname{tr}\rho_2 f^*(T_0) \\ &= \operatorname{tr} T_0\, \mathrm{D}f^*(T_0)(\rho_2) - \operatorname{tr}\rho_2 f^*(T_0) \\ &= \operatorname{tr} \mathrm{D}f^*(T_0)(T_0)\,\rho_2 - \operatorname{tr}\rho_2 f^*(T_0) \\ &= \operatorname{tr}\bigl\{\mathrm{D}f^*(T_0)(T_0) - f^*(T_0)\bigr\}\rho_2 \\ &= \operatorname{tr}\bigl\{ T_0\, f^{*\prime}(T_0) - f^*(T_0) \bigr\}\rho_2 \\ &= \operatorname{tr} f\bigl(f^{*\prime}(T_0)\bigr)\rho_2. \tag{20} \end{align*}
Here, the second identity holds since $\mathrm{D}f^*(T_0)(\cdot)$ is self-adjoint, the fifth identity holds due to (18), and the last identity is due to
\[ f(x) = f^{**}(x) = \sup_t\, \bigl(tx - f^*(t)\bigr). \]

We give a sufficient condition that the supremum is achieved in the interior of the domain, or equivalently, that $T_0$ with (19) exists:
\[ \sup_{t\in\operatorname{dom} f^*} f^{*\prime}(t) > b, \tag{21} \]
\[ \inf_{t\in\operatorname{dom} f^*} f^{*\prime}(t) < b', \tag{22} \]
where $b$ and $b'$ are as in (10) and (11). We also define $t_\star$ and $t'_\star$ by
\[ f^{*\prime}(t_\star) = b, \qquad f^{*\prime}(t'_\star) = b', \]
respectively. Then for each given $T$, we define $\pi_+$ and $\pi_-$ as the projections onto the sum of eigenspaces corresponding to the eigenvalues greater than $t_\star$ and less than $t'_\star$, respectively. Also, let $\pi := 1 - \pi_+ - \pi_-$, and define
\begin{align*} G_+ &:= \operatorname{tr}\rho_1 T\pi_+ - \operatorname{tr}\rho_2 f^*(T)\pi_+, \\ G_0 &:= \operatorname{tr}\rho_1 T\pi - \operatorname{tr}\rho_2 f^*(T)\pi, \\ G_- &:= \operatorname{tr}\rho_1 T\pi_- - \operatorname{tr}\rho_2 f^*(T)\pi_-. \end{align*}
Then $G(T) = G_+ + G_0 + G_-$,
\begin{align*} G_+ &\leq \operatorname{tr}\rho_1 T\pi_+ - \operatorname{tr}\rho_2\bigl( b(T - t_\star) + f^*(t_\star) \bigr)\pi_+ \\ &= \operatorname{tr}(\rho_1 - b\rho_2)\, T\pi_+ + \operatorname{tr}\rho_2\,(b t_\star - f^*(t_\star))\pi_+ \\ &\leq \operatorname{tr}(\rho_1 - b\rho_2)\, t_\star\pi_+ + \operatorname{tr}\rho_2\,(b t_\star - f^*(t_\star))\pi_+ \\ &= \operatorname{tr}\rho_1\,(t_\star\pi_+) - \operatorname{tr}\rho_2\, f^*(t_\star)\pi_+, \end{align*}
and
\[ G_- \leq \operatorname{tr}\rho_1\,(t'_\star\pi_-) - \operatorname{tr}\rho_2\, f^*(t'_\star)\pi_-. \]
Therefore,
\[ G(T) \leq G\bigl(t_\star\pi_+ + \pi T\pi + t'_\star\pi_-\bigr). \]
To summarize:

Lemma 7 Suppose $f$ is a proper closed convex function with $\operatorname{dom} f \supset (0,\infty)$. Suppose also that assumption (I) is true. Then, the $T$ in (16) can be restricted to the set of all Hermitian operators with $\operatorname{spec} T \subset [t'_\star, t_\star]$.

If $\mathcal{H}$ is finite-dimensional and $\rho_1 > 0$, $\rho_2 > 0$, this lemma means that the supremum can be restricted to the interior of $\operatorname{dom} G$, and that the supremum of the differentiable concave function $G(\cdot)$ is achieved by a point satisfying (19). Thus the solution $T_0$ to (19) achieves the supremum.
So far, we have supposed that assumption (I) is true. Now let us consider the case where (II) holds and (I) does not. A trivial approach is to exchange $\rho_1$ and $\rho_2$, and apply all the analysis replacing $f$ by $\hat f$. This amounts to the change of variable from $T$ to $S := f^*(T)$. But sometimes the use of the variable $T$ is preferable for technical reasons. In such cases, we can still use (19), for the following reason. Since assumption (II) says that $\hat f^*(t) = -f^{*-1}(-t)$ is operator convex, it is continuously differentiable. Thus a stationary point with respect to $S$ is also a stationary point with respect to $T$.

7 Detailed analysis for the non-full-rank case

In this section we treat the case where condition (I) holds, $\ker\rho_1$ is non-trivial, and $\operatorname{dom} f^*$ is not bounded from below.
By the following lemma, whose proof is given in Appendix A, $f^*$ in fact is operator monotone increasing.

Lemma 8 Suppose $f$ is proper, closed, convex, standard, and $\operatorname{dom} f \supset (0,\infty)$. Suppose also that $f^*$ is operator convex, $\operatorname{dom} f^*$ is not bounded from below, and $f^*(-\infty) > -\infty$. Then $f^*$ is operator monotone.

Below, $T_{\operatorname{supp}\rho_1}$ is the operator on $\operatorname{supp}\rho_1$ with $T_{\operatorname{supp}\rho_1}|e\rangle = \pi_1 T|e\rangle$ for $|e\rangle \in \operatorname{supp}\rho_1$. (It is essentially identical to $\pi_1 T\pi_1$, except that $\pi_1 T\pi_1$ is defined on $\mathcal{H}$.)
Lemma 9 Suppose the domain of a function $h$ is unbounded from below and $h(-\infty)$ is finite. Then, if $X \leq 0$ is supported on $\ker\rho_1$ and its eigenvalues are negative on $\ker\rho_1$,
\[ \lim_{s\to\infty} \operatorname{tr}\rho_2\, h(T+sX) = \operatorname{tr}(\rho_2)_{\operatorname{supp}\rho_1}\, h(T_{\operatorname{supp}\rho_1}) + h(-\infty)\operatorname{tr}\rho_2(1-\pi_1). \tag{23} \]

Proof. Let $r_s^\tau$ and $|\varphi_s^\tau\rangle$ ($\|\varphi_s^\tau\| = 1$) be the $\tau$-th eigenvalue and eigenvector of $T + sX$, respectively. Since
\[ \lim_{s\to\infty} s^{-1}(T + sX) = X, \]
$\lim_{s\to\infty} r_s^\tau/s =: \chi^\tau$ and $\lim_{s\to\infty} |\varphi_s^\tau\rangle =: |\varphi_\infty^\tau\rangle$ give a complete set of eigenvalues and eigenvectors of $X$. Without loss of generality, let $\chi^\tau$ be $0$ if $\tau \leq \operatorname{rank}\rho_1$, and negative if $\tau > \operatorname{rank}\rho_1$.
Let $t_i$ and $|e_i\rangle$ be the $i$-th eigenvalue and eigenvector of $T_{\operatorname{supp}\rho_1}$. Observe, if $\tau \leq \operatorname{rank}\rho_1$, we have
\begin{align*} r_s^\tau\, \langle e_i|\varphi_s^\tau\rangle &= \langle e_i|(T + sX)|\varphi_s^\tau\rangle \\ &= \langle e_i|T\pi_1|\varphi_s^\tau\rangle + \langle e_i|T(1-\pi_1)|\varphi_s^\tau\rangle \\ &= t_i\langle e_i|\pi_1|\varphi_s^\tau\rangle + \langle e_i|T(1-\pi_1)|\varphi_s^\tau\rangle \\ &= t_i\langle e_i|\varphi_s^\tau\rangle + \langle e_i|T(1-\pi_1)|\varphi_s^\tau\rangle, \end{align*}
implying
\[ \lim_{s\to\infty} (r_s^\tau - t_i)\langle e_i|\varphi_s^\tau\rangle = \lim_{s\to\infty} \langle e_i|T(1-\pi_1)|\varphi_s^\tau\rangle = 0. \]
Therefore, if $\langle e_i|\varphi_\infty^\tau\rangle \neq 0$, $r_s^\tau \to t_i$ holds, implying that $|\varphi_\infty^\tau\rangle$ is a member of the eigenspace for the eigenvalue $t_i$. Also, since $\langle\varphi_\infty^\tau|\varphi_\infty^{\tau'}\rangle = 0$ ($\tau \neq \tau'$), the number of such $\tau$ equals the dimension of this eigenspace. Therefore,
\begin{align*} &\lim_{s\to\infty} \sum_{\tau\leq\operatorname{rank}\rho_1} h(r_s^\tau)\,|\varphi_s^\tau\rangle\langle\varphi_s^\tau| + \lim_{s\to\infty} \sum_{\tau>\operatorname{rank}\rho_1} h(r_s^\tau)\,|\varphi_s^\tau\rangle\langle\varphi_s^\tau| \\ &= \sum_i h(t_i)\,|e_i\rangle\langle e_i| + \lim_{s\to\infty} \sum_{\tau>\operatorname{rank}\rho_1} h(s\chi^\tau)\,|\varphi_\infty^\tau\rangle\langle\varphi_\infty^\tau| \\ &= h(T_{\operatorname{supp}\rho_1}) + h(-\infty)(1-\pi_1), \end{align*}
and taking the trace against $\rho_2$ gives (23).

Lemma 10 Suppose $h$ is monotone increasing and operator convex. Suppose also that its domain is unbounded from below and $h(-\infty)$ is finite. Then, if $X \leq 0$ is supported on $\ker\rho_1$ and its eigenvalues are negative on $\ker\rho_1$,
\[ \lim_{s\to\infty} \operatorname{tr}\rho_2\, h(T+sX) \geq \operatorname{tr}(\rho_2)_{\operatorname{supp}\rho_1}\, h(T_{\operatorname{supp}\rho_1}) + h(-\infty)\operatorname{tr}\rho_2(1-\pi_1). \tag{24} \]

Proof. Since $h$ is operator convex, by Exercise V.2.2 of [2],
\begin{align*} \bigl(h(T+sX)\bigr)_{\operatorname{supp}\rho_1} &\geq h\bigl((T+sX)_{\operatorname{supp}\rho_1}\bigr) = h(T_{\operatorname{supp}\rho_1}), \\ \bigl(h(T+sX)\bigr)_{\ker\rho_1} &\geq h\bigl((T+sX)_{\ker\rho_1}\bigr) \geq h(-\infty)\, 1_{\ker\rho_1}. \tag{25} \end{align*}
Here the last inequality holds because $h$ is monotone increasing.
Let $r_s^\tau$ and $|\varphi_s^\tau\rangle$ be the $\tau$-th eigenvalue and eigenvector of $T + sX$, respectively. Since
\[ \lim_{s\to\infty} s^{-1}(T+sX) = X, \]
$|\varphi_s^\tau\rangle$ converges to one of the eigenvectors of $X$. Since $\pi_1$ is the projection onto $\ker X$, either $\lim_{s\to\infty}\pi_1|\varphi_s^\tau\rangle = 0$ or $\lim_{s\to\infty}(1-\pi_1)|\varphi_s^\tau\rangle = 0$ holds. Therefore,
\[ \lim_{s\to\infty} \pi_1\, h(T+sX)\,(1-\pi_1) = \sum_\tau \Bigl\{ \lim_{s\to\infty} h(r_s^\tau) \Bigr\} \lim_{s\to\infty} \pi_1|\varphi_s^\tau\rangle\langle\varphi_s^\tau|(1-\pi_1) = 0, \]
for $h(-\infty)$ is finite. Combining this with (25), we have the assertion.

Theorem 11 Suppose $f$ is proper, closed, convex, standard, and $\operatorname{dom} f \supset (0,\infty)$. Suppose also that $f^*$ is operator convex and $\operatorname{dom} f^*$ is not bounded from below. Then if $\rho_1$ is not full-rank,
\[ D_f^{\min}(\rho_1\|\rho_2) = D_f^{\min}(\rho_1\|\pi_1\rho_2\pi_1) - f^*(-\infty)\operatorname{tr}\rho_2(1-\pi_1). \tag{26} \]

Therefore, the problem reduces to the optimization of $G(T)$ restricting $T$ to the support of $\rho_1$. Especially, if $\rho_1$ is a rank-1 state, $\rho_1 = |\varphi_1\rangle\langle\varphi_1|$, and $\langle\varphi_1|\rho_2|\varphi_1\rangle \neq 0$,
\begin{align*} D_f^{\min}\bigl(|\varphi_1\rangle\langle\varphi_1|\,\big\|\,\rho_2\bigr) &= \sup_{t\in\operatorname{dom} f^*} \bigl\{ t - \langle\varphi_1|\rho_2|\varphi_1\rangle\, f^*(t) \bigr\} - f^*(-\infty)\bigl(1 - \langle\varphi_1|\rho_2|\varphi_1\rangle\bigr) \\ &= \hat f\bigl(\langle\varphi_1|\rho_2|\varphi_1\rangle\bigr) - f^*(-\infty)\bigl(1 - \langle\varphi_1|\rho_2|\varphi_1\rangle\bigr). \tag{27} \end{align*}

Proof. By Theorem 3, we only have to prove the assertion for the case $f^*(-\infty) > -\infty$ (otherwise, $D_f^{\min}(\rho_1\|\rho_2) = \infty$).
By Lemma 8, $f^*$ is operator monotone. Thus, if $X < 0$ is supported on $\ker\rho_1$, $G(T+sX)$, where $G$ is as in (17), is non-decreasing in $s$. Therefore,
\[ \sup_s G(T+sX) = \lim_{s\to\infty} G(T+sX). \]
Hence, by Lemmas 9 and 10, we have
\begin{align*} \sup_{X<0:\ \operatorname{supp} X = \ker\rho_1} \lim_{s\to\infty} G(T+sX) &= \operatorname{tr}\pi_1\rho_1\pi_1\, T - \operatorname{tr}(\rho_2)_{\operatorname{supp}\rho_1}\, f^*(T_{\operatorname{supp}\rho_1}) \\ &\quad - f^*(-\infty)\operatorname{tr}\rho_2(1-\pi_1), \end{align*}
indicating (26).

8 Examples
8.1 Rényi type, $f_\alpha$
Here we consider $f_\alpha$, which is defined by (1) on the non-negative half-line. On the negative half-line, we define it so that it becomes standard, i.e., so that $f_\alpha^*$ is monotone increasing on $\operatorname{dom} f_\alpha^*$. Since $\alpha = 0$ and $1$ give trivial functions, we omit these cases. The relation
\[ \hat f_\alpha = f_{1-\alpha} \]
turns out to be quite useful.
If $0 < \alpha < 1$,
\[ f_\alpha^*(t) = \begin{cases} \infty, & t > 0, \\ (1-\alpha)\,\alpha^{\alpha/(1-\alpha)}\,(-t)^{-\alpha/(1-\alpha)}, & t \leq 0. \end{cases} \]
Thus, if $0 < \alpha \leq \tfrac12$, condition (I) is satisfied, and if $\tfrac12 \leq \alpha < 1$, condition (II) is satisfied. If $\alpha < 0$,
\[ f_\alpha^*(t) = \begin{cases} \infty, & t \geq 0, \\ (\alpha-1)\,(-\alpha)^{\alpha/(1-\alpha)}\,(-t)^{-\alpha/(1-\alpha)}, & t < 0, \end{cases} \]
and condition (I) is satisfied. If $\alpha > 1$, condition (II) is satisfied, and
\[ f_\alpha^*(t) = \begin{cases} (\alpha-1)\,\alpha^{-\alpha/(\alpha-1)}\, t^{\alpha/(\alpha-1)}, & t > 0, \\ \infty, & t \leq 0. \end{cases} \]
Thus, if $\alpha \geq 2$, condition (I) is satisfied in addition.
For all values of $\alpha$ ($\neq 0, 1$), $f_\alpha^{*\prime}(t)$ ranges over all of the positive half-line $(0,\infty)$. Thus, by (21) and (22), the supremum is achieved by $T_0$ with (19).
In the cases $\alpha = -1$ and $2$, we can solve the problem "explicitly". Observe $f_2^*$ is operator convex on $\operatorname{dom} f_2^*$:
\[ f_2^*(t) = \begin{cases} \tfrac14 t^2, & t \geq 0, \\ \infty, & t < 0. \end{cases} \]
By (19), we have a Lyapunov equation
\[ \mathrm{D}f_2^*(T_0)(\rho_2) = \tfrac14\bigl(T_0\rho_2 + \rho_2 T_0\bigr) = \rho_1. \]

If $\rho_2 > 0$, this can be solved for $T_0$,
\[ T_0 = 4\int_{-\infty}^0 e^{s\rho_2}\,\rho_1\, e^{s\rho_2}\,\mathrm{d}s \geq 0, \]
and in a basis where $\rho_2$ is diagonal,
\[ T_{0,i,j} = \frac{4\,\rho_{1,i,j}}{\rho_{2,i,i} + \rho_{2,j,j}}. \]
Thus, this solution $T_0$ has its spectrum in $\operatorname{dom} f_2^*$. By (20),
\begin{align*} D_{f_{-1}}^{\min}(\rho_2\|\rho_1) = D_{f_2}^{\min}(\rho_1\|\rho_2) &= \tfrac14\operatorname{tr}\rho_2 T_0^2 = \tfrac12\operatorname{tr}\rho_1 T_0 \\ &= 2\operatorname{tr}\rho_1\int_{-\infty}^0 e^{s\rho_2}\,\rho_1\, e^{s\rho_2}\,\mathrm{d}s \\ &= 2\sum_{i,j} \frac{|\rho_{1,i,j}|^2}{\rho_{2,i,i} + \rho_{2,j,j}}. \end{align*}
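In the eigenbasis of $\rho_2$ the Lyapunov equation is solved entrywise, which gives a direct numerical recipe (a sketch of the formulas above; function and variable names are ours):

```python
import numpy as np

def dmin_f2(rho1, rho2):
    """D^min_{f_2}(rho1||rho2): solve (1/4)(T rho2 + rho2 T) = rho1 in the
    eigenbasis of rho2, where T_{ij} = 4 rho1_{ij} / (lam_i + lam_j),
    then return (1/2) tr(rho1 T)."""
    lam, U = np.linalg.eigh(rho2)
    R = U.conj().T @ rho1 @ U                      # rho1 in the eigenbasis of rho2
    T = 4.0 * R / (lam[:, None] + lam[None, :])    # entrywise Lyapunov solution
    return float(0.5 * np.trace(R @ T).real)

rho1 = np.array([[0.7, 0.1], [0.1, 0.3]])
rho2 = np.diag([0.6, 0.4])
val = dmin_f2(rho1, rho2)
```

For commuting states this reduces to the classical value $\sum_x p_1(x)^2/p_2(x)$, a useful sanity check.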

If $\alpha = \tfrac12$,
\[ f_{1/2}^*(t) = \begin{cases} \infty, & t > 0, \\ -\dfrac{1}{4t}, & t \leq 0, \end{cases} \]
and (19) reads
\[ \mathrm{D}f_{1/2}^*(T_0)(\rho_2) = \tfrac14\, T_0^{-1}\rho_2\, T_0^{-1} = \rho_1. \]
Thus
\[ T_0 = -\tfrac12\,\rho_2^{1/2}\Bigl(\rho_2^{1/2}\rho_1\rho_2^{1/2}\Bigr)^{-1/2}\rho_2^{1/2}, \]
and
\[ D_{f_{1/2}}^{\min}(\rho_1\|\rho_2) = -\tfrac12\operatorname{tr}\rho_2\,(-T_0)^{-1} = -\operatorname{tr}\sqrt{\rho_2^{1/2}\rho_1\rho_2^{1/2}}, \]
which is minus the fidelity between $\rho_1$ and $\rho_2$, as expected.
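The closed form is straightforward to evaluate; the sketch below computes $-\operatorname{tr}\sqrt{\rho_2^{1/2}\rho_1\rho_2^{1/2}}$ with a numpy-only PSD square root (helper names are ours):

```python
import numpy as np

def psd_sqrt(A):
    """Square root of a positive semidefinite Hermitian matrix."""
    w, V = np.linalg.eigh(A)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.conj().T

def dmin_f_half(rho1, rho2):
    """D^min_{f_{1/2}}(rho1||rho2) = -tr sqrt(rho2^{1/2} rho1 rho2^{1/2}),
    i.e. minus the fidelity between rho1 and rho2."""
    s = psd_sqrt(rho2)
    return -float(np.trace(psd_sqrt(s @ rho1 @ s)).real)

rho1 = np.diag([0.7, 0.3])
rho2 = np.diag([0.6, 0.4])
val = dmin_f_half(rho1, rho2)
```

For commuting states this is $-\sum_x \sqrt{p_1(x)p_2(x)}$, minus the classical Bhattacharyya coefficient.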


By Theorem 3, $D_{f_\alpha}^{\min}(\rho_1\|\rho_2) < \infty$ for any $\rho_1 > 0$ if $\alpha < 0$, for any $\rho_2 > 0$ if $\alpha > 1$, and for any $\rho_1 \geq 0$, $\rho_2 \geq 0$ if $0 < \alpha < 1$.
If $0 < \alpha \leq \tfrac12$, $f_\alpha^*$ is operator monotone, and thus by (27),
\begin{align*} D_{f_\alpha}^{\min}\bigl(|\varphi_1\rangle\langle\varphi_1|\,\big\|\,\rho_2\bigr) &= \hat f_\alpha\bigl(\langle\varphi_1|\rho_2|\varphi_1\rangle\bigr) - f_\alpha^*(-\infty)\bigl(1 - \langle\varphi_1|\rho_2|\varphi_1\rangle\bigr) \\ &= -\langle\varphi_1|\rho_2|\varphi_1\rangle^{1-\alpha}, \end{align*}
and
\[ D_{f_\alpha}^{\min}\bigl(|\varphi_1\rangle\langle\varphi_1|\,\big\|\,|\varphi_2\rangle\langle\varphi_2|\bigr) = -|\langle\varphi_1|\varphi_2\rangle|^{2(1-\alpha)}. \]
This means, if $\tfrac12 \leq \alpha < 1$, using $\hat f_\alpha = f_{1-\alpha}$,
\[ D_{f_\alpha}^{\min}\bigl(|\varphi_1\rangle\langle\varphi_1|\,\big\|\,|\varphi_2\rangle\langle\varphi_2|\bigr) = -|\langle\varphi_1|\varphi_2\rangle|^{2\alpha}. \]

8.2 On the Chernoff and Hoeffding bounds
The results on the Rényi-type quantities, especially when the two states are pure, give another way to compute the Chernoff bound and the Hoeffding bound, whose classical counterparts are
\begin{align*} C(p_1\|p_2) &:= \sup\Bigl\{ -\lim_{n\to\infty}\tfrac1n\ln(\alpha_{1,n} + \alpha_{2,n}) \Bigr\} \\ &= \sup_{0<\alpha<1}\bigl\{ -\ln\bigl(-D_{f_\alpha}(p_1\|p_2)\bigr) \bigr\} \end{align*}
and
\begin{align*} H_r(p_1\|p_2) &:= \sup\Bigl\{ -\lim_{n\to\infty}\tfrac1n\ln\alpha_{1,n}\,;\ \limsup_{n\to\infty}\tfrac1n\ln\alpha_{2,n} \leq -r \Bigr\} \\ &= \sup_{0<\alpha<1} \frac{-\alpha r - \ln\bigl(-D_{f_\alpha}(p_1\|p_2)\bigr)}{1-\alpha}, \end{align*}
respectively, where the first supremum in each case runs over sequences of tests. Here, $\alpha_{1,n}$ is the probability that the test mistakenly judges the true distribution ($= p_1^{\otimes n}$, in fact) as being $p_2^{\otimes n}$, and $\alpha_{2,n}$ is the error in the other direction. Their quantum counterparts are defined by replacing distributions by states, and the explicit forms of these quantities are
\begin{align*} C(\rho_1\|\rho_2) &= \sup_{0<\alpha<1}\bigl\{ -\ln\bigl(-D^P_{f_\alpha}(\rho_1\|\rho_2)\bigr) \bigr\}, \\ H_r(\rho_1\|\rho_2) &= \sup_{0<\alpha<1} \frac{-\alpha r - \ln\bigl(-D^P_{f_\alpha}(\rho_1\|\rho_2)\bigr)}{1-\alpha}, \end{align*}
where
\[ D^P_{f_\alpha}(\rho_1\|\rho_2) := -\operatorname{tr}\rho_1^\alpha\rho_2^{1-\alpha}, \qquad 0 < \alpha < 1. \]
(See [1][3][4].)
We confirm these celebrated results for the case where $\rho_\tau = |\varphi_\tau\rangle\langle\varphi_\tau|$. In fact, we have
\begin{align*} C(\rho_1\|\rho_2) &= \lim_{n\to\infty} \sup_{0<\alpha<1} -\tfrac1n\ln\bigl(-D^{\min}_{f_\alpha}(\rho_1^{\otimes n}\|\rho_2^{\otimes n})\bigr), \\ H_r(\rho_1\|\rho_2) &= \lim_{n\to\infty} \sup_{0<\alpha<1} \frac{-\alpha r - \tfrac1n\ln\bigl(-D^{\min}_{f_\alpha}(\rho_1^{\otimes n}\|\rho_2^{\otimes n})\bigr)}{1-\alpha}. \end{align*}
"$\geq$" is trivial. The achievability is not so straightforward in general, since the optimal measurement differs for each $\alpha$. However, since our states are pure, the supremum is approximated arbitrarily well by a sequence of measurements independent of $\alpha$, and thus the equality holds. Also, the RHS can be computed explicitly:
\[ -\tfrac1n\ln\bigl(-D^{\min}_{f_\alpha}(\rho_1^{\otimes n}\|\rho_2^{\otimes n})\bigr) = \begin{cases} -2(1-\alpha)\ln|\langle\varphi_1|\varphi_2\rangle|, & 0 < \alpha \leq \tfrac12, \\ -2\alpha\ln|\langle\varphi_1|\varphi_2\rangle|, & \tfrac12 < \alpha < 1. \end{cases} \]
The supremum is achieved at $\alpha = 0, 1$, and the known result is confirmed. Interestingly, even though they give the same supremum, the two quantities differ for almost every value of $\alpha$.
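The quantity $D^P_{f_\alpha}$ and the Chernoff bound are easy to evaluate numerically; the following sketch optimizes over a grid of $\alpha$ (the grid search is our shortcut, not the paper's method):

```python
import numpy as np

def mpow(rho, a):
    """Fractional power of a PSD matrix via its spectral decomposition."""
    w, V = np.linalg.eigh(rho)
    return (V * np.clip(w, 0.0, None)**a) @ V.conj().T

def chernoff(rho1, rho2, num=999):
    """C(rho1||rho2) = sup_{0<a<1} -ln tr(rho1^a rho2^{1-a}), on an alpha grid."""
    alphas = np.linspace(0.001, 0.999, num)
    vals = [-np.log(np.trace(mpow(rho1, a) @ mpow(rho2, 1.0 - a)).real)
            for a in alphas]
    return float(max(vals))

rho1 = np.diag([0.7, 0.3])
rho2 = np.diag([0.3, 0.7])
C = chernoff(rho1, rho2)
```

For this symmetric commuting pair the optimum sits at $\alpha = 1/2$, giving $C = -\ln\bigl(2\sqrt{0.21}\bigr)$; for pure states the optimal value is $-\ln|\langle\varphi_1|\varphi_2\rangle|^2$, as confirmed in the text.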

8.3 Kullback-Leibler divergence
Let
\[ f_{\mathrm{KL}}(r) := \begin{cases} r\ln r, & r \geq 0, \\ \infty, & r < 0. \end{cases} \]
Then
\[ D_{f_{\mathrm{KL}}}(p_1\|p_2) = \begin{cases} \displaystyle\int p_1(x)\ln\frac{p_1(x)}{p_2(x)}\,\mathrm{d}\mu(x), & \text{if } \operatorname{supp} p_1 \subset \operatorname{supp} p_2, \\ \infty, & \text{otherwise}, \end{cases} \]
is the Kullback-Leibler divergence.
Observe
\[ f_{\mathrm{KL}}^*(t) = e^{t-1} \]
is not operator convex. However,
\[ f_{\mathrm{KL2}}^*(t) = \hat f_{\mathrm{KL}}^*(t) = \begin{cases} -1 - \ln(-t), & t < 0, \\ \infty, & t \geq 0, \end{cases} \]
is operator convex. Both of
\[ \mathrm{D}f_{\mathrm{KL}}^*(T_0)(\rho_2) = \int_0^1 e^{s(T_0-1)}\,\rho_2\, e^{(1-s)(T_0-1)}\,\mathrm{d}s = \rho_1, \]
\[ \mathrm{D}f_{\mathrm{KL2}}^*(S_0)(\rho_1) = \int_0^\infty (s\mathbf{1} - S_0)^{-1}\,\rho_1\,(s\mathbf{1} - S_0)^{-1}\,\mathrm{d}s = \rho_2 \]
are difficult to solve explicitly. But using these solutions,
\[ D^{\min}_{f_{\mathrm{KL}}}(\rho_1\|\rho_2) = \operatorname{tr}\rho_2\, T_0'\ln T_0' = \operatorname{tr}\rho_1\ln S_0', \]
where
\[ T_0' := e^{T_0 - 1}, \qquad S_0' := -S_0. \]
Also, applying Theorem 3, $D^{\min}_{f_{\mathrm{KL}}}(\rho_1\|\rho_2)$ is finite only if $\rho_2 > 0$.

8.4 Total variation distance

The total variation distance $\|\rho_1 - \rho_2\|_1$ equals, as is well known, $D^{\min}_{f_{\mathrm{TV}}}(\rho_1\|\rho_2)$. We confirm this result using our method. $f_{\mathrm{TV}}^*$ is operator convex on $\operatorname{dom} f_{\mathrm{TV}}^*$. Here, it is important to choose the standard $f$, since $f_{\mathrm{TV2}}^*$, for example, is not operator convex. Then
\begin{align*} D^{\min}_{f_{\mathrm{TV}}}(\rho_1\|\rho_2) &= \sup_{T:\ -1\leq\operatorname{spec} T\leq 1} \bigl( \operatorname{tr}\rho_1 T - \operatorname{tr}\rho_2 T \bigr) \\ &= \sup_{T:\ -1\leq\operatorname{spec} T\leq 1} \operatorname{tr}(\rho_1 - \rho_2)\, T = \|\rho_1 - \rho_2\|_1. \end{align*}
Note that $f_{\mathrm{TV}}^*$ satisfies neither (21) nor (22). Indeed, the supremum is achieved by a $T$ whose eigenvalues are at both ends of the domain of $f_{\mathrm{TV}}^*$.
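Numerically, the optimal $T$ is the sign of $\rho_1 - \rho_2$, whose eigenvalues sit at the endpoints $\pm1$ of $\operatorname{dom} f_{\mathrm{TV}}^*$, as the following sketch (names are ours) illustrates:

```python
import numpy as np

def tv_dist(rho1, rho2):
    """||rho1 - rho2||_1 as the sum of absolute eigenvalues; this equals
    sup over -1 <= spec T <= 1 of tr((rho1 - rho2) T), attained at
    T = sign(rho1 - rho2)."""
    w = np.linalg.eigvalsh(rho1 - rho2)
    return float(np.sum(np.abs(w)))

rho1 = np.diag([0.7, 0.3])
rho2 = np.array([[0.5, 0.2], [0.2, 0.5]])
val = tv_dist(rho1, rho2)
```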

9 Relation to quantum Fisher information
Consider a parameterized family $\{p_\theta\}_{\theta\in\mathbb{R}}$ of probability density functions over a finite set, and suppose $\theta \to p_\theta$ is smooth and $\operatorname{supp} p_\theta \subset \operatorname{supp} p_{\theta'}$. Suppose also $f$ is a convex function with all the good features. Then using the Taylor expansion of $f$,
\[ \lim_{\theta'\to\theta} \frac{1}{(\theta'-\theta)^2}\bigl( D_f(p_\theta\|p_{\theta'}) - D_f(p_\theta\|p_\theta) \bigr) = \frac12 f''(1)\, J_\theta, \]
where
\[ J_\theta := \sum_x \frac{\bigl(\partial p_\theta(x)/\partial\theta\bigr)^2}{p_\theta(x)} \]
is the Fisher information of the family $\{p_\theta\}_{\theta\in\mathbb{R}}$. This quantity characterizes the asymptotic behaviour of estimates of $\theta$, and the above-mentioned relation is also important in deriving those results, and in relating estimation and hypothesis testing.
Thus exploring its quantum analogue is also of interest. Let $\{\rho_\theta\}_{\theta\in\mathbb{R}}$ be a family of density operators, and suppose $\theta \to \rho_\theta$ is smooth. Then our task here is to evaluate $D^{\min}_f(\rho_\theta\|\rho_{\theta'})$ up to $(\theta'-\theta)^2$ for small $|\theta'-\theta|$. Naively exchanging the order of limit and optimization, we have
\[ \lim_{\theta'\to\theta} \frac{1}{(\theta'-\theta)^2}\bigl( D^{\min}_f(\rho_\theta\|\rho_{\theta'}) - D^{\min}_f(\rho_\theta\|\rho_\theta) \bigr) = \frac12 f''(1)\,\max_M J^M_\theta, \]
where $J^M_\theta$ is the Fisher information of the family $\{P^M_{\rho_\theta}\}_{\theta\in\mathbb{R}}$, and $P^M_{\rho_\theta}$ is the distribution of the data of the measurement $M$ applied to $\rho_\theta$. If this identity is true, we can use the well-known identity
\[ \max_M J^M_\theta = J^S_\theta. \]
Here,
\[ J^S_\theta := \operatorname{tr}\rho_\theta\bigl(L^S_\theta\bigr)^2 \]
is the SLD Fisher information, and $L^S_\theta$, called the symmetric logarithmic derivative of $\{\rho_\theta\}_{\theta\in\mathbb{R}}$, is defined as a Hermitian operator satisfying the equation
\[ \frac{\partial\rho_\theta}{\partial\theta} = \frac12\bigl( \rho_\theta L^S_\theta + L^S_\theta\rho_\theta \bigr). \]
The SLD Fisher information, like its classical analogue, characterizes well the asymptotic behaviour of the optimal estimate of the unknown parameter $\theta$. Thus, by $D^{\min}_f(\rho_\theta\|\rho_\theta) = f(1)$, we would obtain
\[ \lim_{\theta'\to\theta} \frac{1}{(\theta'-\theta)^2}\bigl( D^{\min}_f(\rho_\theta\|\rho_{\theta'}) - D^{\min}_f(\rho_\theta\|\rho_\theta) \bigr) = \frac12 f''(1)\, J^S_\theta. \tag{28} \]
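The SLD and $J^S_\theta$ are obtained by solving the defining Lyapunov equation in the eigenbasis of $\rho_\theta$ (a sketch for full-rank $\rho_\theta$; names are ours):

```python
import numpy as np

def sld_fisher(rho, drho):
    """Solve drho = (rho L + L rho)/2 for the SLD L in the eigenbasis of rho
    (rho > 0 assumed), then return J^S = tr(rho L^2)."""
    lam, U = np.linalg.eigh(rho)
    dR = U.conj().T @ drho @ U
    L = 2.0 * dR / (lam[:, None] + lam[None, :])   # SLD in the eigenbasis
    return float(np.trace(np.diag(lam) @ L @ L).real)

# Commuting (classical) family: J^S reduces to the classical Fisher information.
theta = 0.25
rho  = np.diag([theta, 1.0 - theta])
drho = np.diag([1.0, -1.0])
J = sld_fisher(rho, drho)   # = 1/theta + 1/(1-theta)
```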

If all members of $\{\rho_\theta\}_{\theta\in\mathbb{R}}$ are supported on the same subspace, it is not very difficult to make the above argument rigorous, since $p^M_\theta$ does not vanish. If $\rho_\theta$ has a non-trivial kernel, however, the remainder term of the Taylor expansion is not necessarily bounded, due to the $1/p^M_\theta$-factor in the definition of the $f$-divergence. For example, suppose the $\rho_\theta$ ($\theta\in\mathbb{R}$) are pure states. Then $D^{\min}_{f_\alpha}$ ($\alpha > 1$, $\alpha < 0$), $D^{\min}_{f_{\mathrm{KL}}}$ and $D^{\min}_{f_{\mathrm{KL2}}}$ diverge, and (28) is never true. On the other hand, in the case that $f^*(-\infty)$ is finite, $f = f_\alpha$ ($0 < \alpha \leq 1/2$) for example, it is easy to see that the LHS of (28) equals a constant multiple of $J^S_\theta$. Hence, the naive argument as above is not completely rigorous, but not totally false. Below, we give a deeper analysis of this issue. As it will turn out, (28) requires some non-trivial correction when the rank of $\rho_\theta$ is neither full nor 1.
To use Theorem 11, we suppose that assumption (I) holds and that $\operatorname{dom}f^{*}$ is unbounded from below. Also, suppose that $f$ is three times continuously differentiable in a neighbourhood of $1$, and that $f''(1)>0$. (The last assumption is necessary for the $f$-divergence not to be constant around $\rho_{\theta'}=\rho_{\theta}$.) This means that $t_{0}:=f'(1)$ lies in the interior of $\operatorname{dom}f^{*}$, and

$$f^{*}(t)=t\,(f')^{-1}(t)-f\left((f')^{-1}(t)\right)$$

is three times continuously differentiable in a neighbourhood of $t_{0}$. Using the above identity,

$$f^{*}(t_{0})=t_{0}-f(1),\qquad (f^{*})'(t_{0})=1,\qquad (f^{*})''(t_{0})=\frac{1}{f''(1)}>0.$$

For simplicity, we also suppose that the rank of $\rho_{\theta}$ is constant for all $\theta$, and that the map $\theta\mapsto\rho_{\theta}$ is three times continuously differentiable.
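These conjugate identities can be sanity-checked on a concrete $f$. Below is an illustration (not part of the proof) with $f(t)=t\ln t$, for which $f^{*}(s)=e^{s-1}$, $f''(1)=1$ and $t_{0}=f'(1)=1$; the derivatives of $f^{*}$ are taken by central differences:

```python
import math

# f(t) = t ln t:  f'(t) = ln t + 1, f''(1) = 1, t0 = f'(1) = 1.
# Convex conjugate: f*(s) = sup_t (s t - f(t)) = exp(s - 1).
f = lambda t: t * math.log(t)
fstar = lambda s: math.exp(s - 1)

t0 = 1.0  # = f'(1)

# f*(t0) = t0 - f(1)
assert abs(fstar(t0) - (t0 - f(1.0))) < 1e-12

# (f*)'(t0) = 1 and (f*)''(t0) = 1/f''(1) = 1, via central differences.
h = 1e-4
d1 = (fstar(t0 + h) - fstar(t0 - h)) / (2 * h)
d2 = (fstar(t0 + h) - 2 * fstar(t0) + fstar(t0 - h)) / h**2
assert abs(d1 - 1.0) < 1e-7
assert abs(d2 - 1.0) < 1e-5
```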
By Theorem 11, we only have to maximize

$$G\left(T;\rho_{\theta},\rho_{\theta'}\right):=\operatorname{tr}\rho_{\theta}T-\operatorname{tr}\rho_{\theta'}f^{*}(T)-f^{*}(-\infty)\left(1-\operatorname{tr}\rho_{\theta'}\Pi\right),$$

where $T$ moves over all the Hermitian operators living in the support $\mathcal{H}_{\theta}$ of $\rho_{\theta}$, $\Pi$ is the projector onto $\mathcal{H}_{\theta}$, and $\operatorname{spec}T$ is a subset of $\operatorname{dom}f^{*}$. Observe that, if $\theta=\theta'$, the optimal $T$ is $t_{0}\Pi$. Thus, we put

$$T'=t_{0}\Pi+Y.$$
Then, using Lemma 7, we have, for small $|\theta-\theta'|$,

$$\|Y\|\leq r_{0}^{-1}\left\|\rho_{\theta}-\rho_{\theta'}\right\|\sup_{t\in[t_{0}-c,\,t_{0}+c]}\frac{1}{(f^{*})''(t)}\qquad (29)$$

as detailed in Appendix B. Here, $r_{0}$ is the smallest eigenvalue of $\rho_{\theta}$ and $c$ is a positive number such that

$$\sup_{t\in[t_{0}-c,\,t_{0}+c]}\frac{1}{(f^{*})''(t)}<\infty.$$

Such $c>0$ exists since $f^{*}$ is three times continuously differentiable in a neighbourhood of $t_{0}$ and $(f^{*})''(t_{0})>0$.
Since $f^{*}$ is three times continuously differentiable in a neighbourhood of $t_{0}$ by assumption,

$$f^{*}(t_{0}+\delta)=f^{*}(t_{0})+(f^{*})'(t_{0})\,\delta+\frac{1}{2}(f^{*})''(t_{0})\,\delta^{2}+\frac{1}{6}(f^{*})'''(t_{0}')\,\delta^{3}=f^{*}(t_{0})+\delta+\frac{\delta^{2}}{2f''(1)}+\frac{1}{6}(f^{*})'''(t_{0}')\,\delta^{3},$$

for a certain $t_{0}'$ between $t_{0}+\delta$ and $t_{0}$. Due to the fact that $t_{0}\Pi$ commutes with $Y$ and $t_{0}=f(1)+f^{*}(t_{0})$,

$$\begin{aligned}
G\left(t_{0}\Pi+Y;\rho_{\theta},\rho_{\theta'}\right)
&=\operatorname{tr}\rho_{\theta}\left(t_{0}\Pi+Y\right)-\operatorname{tr}\rho_{\theta'}f^{*}\left(t_{0}\Pi+Y\right)-f^{*}(-\infty)\left(1-\operatorname{tr}\rho_{\theta'}\Pi\right)\\
&=\operatorname{tr}\rho_{\theta}\left(t_{0}\Pi+Y\right)-\operatorname{tr}\rho_{\theta'}\left(f^{*}(t_{0})\Pi+Y+\frac{1}{2f''(1)}Y^{2}\right)-f^{*}(-\infty)\left(1-\operatorname{tr}\rho_{\theta'}\Pi\right)+C\\
&=f(1)+\left(f^{*}(t_{0})-f^{*}(-\infty)\right)\left(1-\operatorname{tr}\rho_{\theta'}\Pi\right)+\frac{1}{2f''(1)}\left\{\operatorname{tr}\rho_{\theta'}Y_{0}^{2}-\operatorname{tr}\rho_{\theta'}\left(Y_{0}-Y\right)^{2}\right\}+C\\
&\leq f(1)+\left(f^{*}(t_{0})-f^{*}(-\infty)\right)\left(1-\operatorname{tr}\rho_{\theta'}\Pi\right)+\frac{1}{2f''(1)}\operatorname{tr}\rho_{\theta'}Y_{0}^{2}+C,
\end{aligned}$$
where $C$ is bounded from above as

$$|C|\leq\frac{1}{6}\sup_{t'\in[t_{0}-\|Y\|,\,t_{0}+\|Y\|]}\left|(f^{*})'''(t')\right|\,\|Y\|^{3}=O\left(|\theta'-\theta|^{3}\right),$$

and $Y_{0}$ is the solution to the Lyapunov equation

$$\Pi\left(\rho_{\theta}-\rho_{\theta'}\right)\Pi-\frac{1}{2f''(1)}\left(\Pi\rho_{\theta'}\Pi\,Y_{0}+Y_{0}\,\Pi\rho_{\theta'}\Pi\right)=0.\qquad (30)$$
Since $G\left(t_{0}\Pi+Y_{0};\rho_{\theta},\rho_{\theta'}\right)$ coincides with the last line above except for $O\left(|\theta'-\theta|^{3}\right)$ terms (here observe that $\|Y_{0}\|=O\left(|\theta'-\theta|\right)$), we have

$$\sup_{Y}G\left(t_{0}\Pi+Y;\rho_{\theta},\rho_{\theta'}\right)-f(1)=\left(f^{*}(t_{0})-f^{*}(-\infty)\right)\left(1-\operatorname{tr}\rho_{\theta'}\Pi\right)+\frac{1}{2f''(1)}\operatorname{tr}\rho_{\theta'}Y_{0}^{2}+O\left(|\theta'-\theta|^{3}\right).$$
Since the rank of $\rho_{\theta}$ is preserved while $\theta$ moves, we can write

$$\rho_{\theta}=A_{\theta}A_{\theta}^{\dagger},$$

where the family $\{A_{\theta}\}_{\theta\in\mathbb{R}}$ satisfies

$$\frac{\partial}{\partial\theta}A_{\theta}=\frac{1}{2}L_{\theta}^{S}A_{\theta}.$$

Then

$$A_{\theta'}=A_{\theta}+\frac{1}{2}L_{\theta}^{S}A_{\theta}\left(\theta'-\theta\right)+C_{1},$$

where $C_{1}$ is $O\left((\theta'-\theta)^{2}\right)$. Define the notations

$$L_{\theta}^{S,1}:=\Pi L_{\theta}^{S}\Pi,\qquad L_{\theta}^{S,2}:=(1-\Pi)L_{\theta}^{S}\Pi+\Pi L_{\theta}^{S}(1-\Pi).$$

Then, since

$$\Pi\rho_{\theta'}\Pi=\rho_{\theta}+\frac{1}{2}\left(\theta'-\theta\right)\left(\rho_{\theta}L_{\theta}^{S,1}+L_{\theta}^{S,1}\rho_{\theta}\right)+C_{2},$$

we have, by (30),

$$Y_{0}=-f''(1)\left(\theta'-\theta\right)L_{\theta}^{S,1}+C_{3},$$

where $C_{2}$ and $C_{3}$ are $O\left((\theta'-\theta)^{2}\right)$.
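The leading-order formula for $Y_{0}$ can be checked numerically. In the sketch below (our own full-rank qubit example, where $\Pi=\mathbf{1}$ and $L^{S,1}_\theta=L^S_\theta$, and with $f''(1)=1$ as for $f(t)=t\ln t$), equation (30) is solved with SciPy and compared against $-f''(1)(\theta'-\theta)L^{S,1}_\theta$:

```python
import numpy as np
from scipy.linalg import solve_sylvester

def rho(t):
    # full-rank qubit family; Pi = 1, so L^{S,1} is the SLD itself
    return np.diag([(1 + t) / 2, (1 - t) / 2])

theta, dtheta = 0.3, 1e-3  # theta' = theta + dtheta
f2 = 1.0                   # f''(1), e.g. for f(t) = t ln t

# SLD at theta: rho L + L rho = 2 drho
L = solve_sylvester(rho(theta), rho(theta), 2 * np.diag([0.5, -0.5]))

# Lyapunov equation (30): rho' Y0 + Y0 rho' = 2 f''(1) (rho_theta - rho_theta')
rho2 = rho(theta + dtheta)
Y0 = solve_sylvester(rho2, rho2, 2 * f2 * (rho(theta) - rho2))

# Leading order: Y0 = -f''(1) (theta' - theta) L^{S,1} + O((theta'-theta)^2)
assert np.linalg.norm(Y0 + f2 * dtheta * L) < 10 * dtheta**2
```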

Also,

$$\begin{aligned}
(1-\Pi)\,\rho_{\theta'}\,(1-\Pi)
&=(1-\Pi)\left(\frac{1}{2}L_{\theta}^{S}A_{\theta}\left(\theta'-\theta\right)+C_{1}\right)\left(\frac{1}{2}L_{\theta}^{S}A_{\theta}\left(\theta'-\theta\right)+C_{1}\right)^{\dagger}(1-\Pi)\\
&=\left(\frac{1}{2}L_{\theta}^{S,2}A_{\theta}\left(\theta'-\theta\right)+(1-\Pi)C_{1}\right)\left(\frac{1}{2}L_{\theta}^{S,2}A_{\theta}\left(\theta'-\theta\right)+(1-\Pi)C_{1}\right)^{\dagger}\\
&=\frac{1}{4}\left(\theta'-\theta\right)^{2}L_{\theta}^{S,2}\,\rho_{\theta}\,L_{\theta}^{S,2}+C_{4},
\end{aligned}$$

where $C_{4}$ is $O\left((\theta'-\theta)^{3}\right)$. After all, since $D_{f}^{\min}(\rho\|\rho)=f(1)$,

$$\lim_{\theta'\to\theta}\frac{1}{(\theta'-\theta)^{2}}\left[D_{f}^{\min}\left(\rho_{\theta}\|\rho_{\theta'}\right)-D_{f}^{\min}\left(\rho_{\theta}\|\rho_{\theta}\right)\right]=\frac{1}{4}\left(f^{*}(t_{0})-f^{*}(-\infty)\right)\operatorname{tr}\rho_{\theta}\left(L_{\theta}^{S,2}\right)^{2}+\frac{f''(1)}{2}\operatorname{tr}\rho_{\theta}\left(L_{\theta}^{S,1}\right)^{2}.$$
If $\rho_{\theta}$ is full rank (in this case, $L_{\theta}^{S,2}=0$) or pure (in this case, $L_{\theta}^{S,1}=0$), the above limit equals a constant multiple of $J_{\theta}^{S}$, though the constant differs depending on which case it is. But if the rank of $\rho_{\theta}$ is neither full nor 1, the result is a weighted sum of two components of the SLD Fisher information: one is concerned with the change on the support of $\rho_{\theta}$, and the other with the change on the kernel of $\rho_{\theta}$.
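The two components can be seen concretely in a small numerical sketch (our own example, not from the text): for the pure qubit family $|\psi_\theta\rangle=(\cos\theta,\sin\theta)$, the support block $L^{S,1}_\theta=\Pi L^S_\theta\Pi$ vanishes and the off-diagonal block carries all of $J^S_\theta=4$:

```python
import numpy as np

theta, h = 0.4, 1e-6

def rho(t):
    v = np.array([np.cos(t), np.sin(t)])
    return np.outer(v, v)

# For a pure-state family, L^S = 2 * d rho/d theta solves the SLD equation.
drho = (rho(theta + h) - rho(theta - h)) / (2 * h)
LS = 2 * drho

Pi = rho(theta)  # projector onto the one-dimensional support
I = np.eye(2)
LS1 = Pi @ LS @ Pi                              # support block
LS2 = (I - Pi) @ LS @ Pi + Pi @ LS @ (I - Pi)   # off-diagonal block

JS = float(np.trace(rho(theta) @ LS @ LS))

assert np.linalg.norm(LS1) < 1e-6                               # L^{S,1} = 0 for pure states
assert abs(float(np.trace(rho(theta) @ LS2 @ LS2)) - JS) < 1e-6  # off-diagonal block carries J^S
assert abs(JS - 4.0) < 1e-4                                     # J^S = 4 for this family
```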

10 Conclusions and further questions

Using tools from convex analysis and matrix analysis, we have studied the maximization, over all measurements, of the classical $f$-divergence between the output distributions of a given pair of quantum states. The following questions remain open:

1. What is the necessary and sufficient condition on $f$ such that (15) holds?

2. Rewrite the condition that $f^{*}$ is operator monotone as a condition on $f$.

References

[1] K. M. R. Audenaert, J. Calsamiglia, Ll. Masanes, R. Munoz-Tapia, A. Acin, E. Bagan, and F. Verstraete, "Discriminating states: the quantum Chernoff bound," Physical Review Letters 98, 160501 (2007).

[2] R. Bhatia, Matrix Analysis, Springer (1997).

[3] F. Hiai, M. Mosonyi, D. Petz, and C. Beny, "Quantum f-divergences and error correction," Reviews in Mathematical Physics 23, 691-747 (2011).

[4] H. Nagaoka, "The converse part of the theorem for quantum Hoeffding bound," quant-ph/0611289.

[5] R. T. Rockafellar, Convex Analysis, Princeton University Press (1970).

[6] H. Strasser, Mathematical Theory of Statistics, de Gruyter (1985).

[7] E. Torgersen, Comparison of Statistical Experiments, Cambridge University Press (1991).

[8] D. Luenberger, Optimization by Vector Space Methods, Wiley (1969).

A The proof of Lemma 8

If $\operatorname{dom}f^{*}$ is the whole real line, then, being operator convex, $f^{*}$ is a quadratic function. Since in addition $f^{*}$ is monotone increasing (as a real-valued function), it is an affine function with positive slope, contradicting $f^{*}(-\infty)>-\infty$. Thus $\operatorname{dom}f^{*}=(-\infty,a)$ or $(-\infty,a]$.

If $\operatorname{dom}f^{*}=(-\infty,a]$, then $h(t):=f^{*}(-t+a)$ is monotone non-increasing, $\operatorname{dom}h=(0,\infty)$ or $[0,\infty)$, and $h(\infty)>-\infty$. Since $h$ is monotone non-increasing and proper, $h(\infty)<\infty$. Therefore, $h(\infty)$ is finite, and trivially

$$\lim_{t\to\infty}\frac{h(t)}{t}=0.$$

Thus, by Proposition 8.4 of [3], $h$ can be written as

$$h(t)=h(0)+\beta t-\int_{(0,\infty)}\frac{t}{t+\lambda}\,d\mu(\lambda)$$

using $\beta\in\mathbb{R}$ and a non-negative measure $\mu$ with $\int_{(0,\infty)}\frac{1}{1+\lambda}\,d\mu(\lambda)<\infty$. Since

$$\frac{h(t)-h(0)}{t}=\beta-\int_{(0,\infty)}\frac{1}{t+\lambda}\,d\mu(\lambda)$$

and

$$\lim_{t\to\infty}\int_{(0,\infty)}\frac{1}{t+\lambda}\,d\mu(\lambda)=0$$

by Lebesgue's dominated convergence theorem, we have $\beta=0$: if $\beta>0$, there would be $t>0$ such that $h(t)>h(0)$, contradicting the assumption that $h$ is monotone non-increasing, while $\beta<0$ would contradict $\lim_{t\to\infty}h(t)/t=0$. Since the functions

$$t\mapsto-\frac{t}{t+\lambda}=-1+\frac{\lambda}{t+\lambda}$$

are operator monotone non-increasing, $h$ is operator monotone decreasing, implying the assertion.

If $\operatorname{dom}f^{*}=(-\infty,a)$, due to the above argument, $f^{*}$ is operator monotone increasing on $(-\infty,a-\varepsilon]$ for any $\varepsilon>0$. Suppose the spectra of $A_{1}$ and $A_{2}$ are subsets of $(-\infty,a)$, and $A_{1}\leq A_{2}$. Then, since $A_{1}-\varepsilon\mathbf{1}\leq A_{2}-\varepsilon\mathbf{1}$,

$$f^{*}(A_{1}-\varepsilon\mathbf{1})\leq f^{*}(A_{2}-\varepsilon\mathbf{1}).$$

Letting $\varepsilon\to 0$, we have $f^{*}(A_{1})\leq f^{*}(A_{2})$, meaning that $f^{*}$ is operator monotone increasing on $(-\infty,a)$.
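The operator monotonicity used above can be probed numerically. The sketch below (illustration only) checks, for a random pair $A\le B$ of positive semi-definite matrices, that $g(A)\ge g(B)$ in the operator order for $g(t)=-t/(t+\lambda)$:

```python
import numpy as np

rng = np.random.default_rng(0)

def matfunc(g, A):
    # apply a scalar function to a symmetric matrix via eigendecomposition
    w, V = np.linalg.eigh(A)
    return V @ np.diag(g(w)) @ V.T

lam = 0.7
g = lambda t: -t / (t + lam)  # = -1 + lam/(t + lam)

X = rng.standard_normal((4, 4))
A = X @ X.T                    # A >= 0
Y = rng.standard_normal((4, 4))
B = A + Y @ Y.T                # B >= A

# operator monotone non-increasing: A <= B implies g(A) >= g(B)
diff = matfunc(g, A) - matfunc(g, B)
assert np.min(np.linalg.eigvalsh(diff)) > -1e-10
```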

B Proof of (29)

Let $b$ and $b'$ be as in (10) and (11), respectively, with $\rho_{1}=\rho_{\theta}$ and $\rho_{2}=\rho_{\theta'}$. Then

$$1-b\leq\left\|\mathbf{1}-\rho_{\theta}^{-1/2}\rho_{\theta'}\rho_{\theta}^{-1/2}\right\|=\left\|\rho_{\theta}^{-1/2}\left(\rho_{\theta}-\rho_{\theta'}\right)\rho_{\theta}^{-1/2}\right\|\leq r_{0}^{-1}\left\|\rho_{\theta}-\rho_{\theta'}\right\|.$$

By rearranging terms, we have

$$b\geq 1-r_{0}^{-1}\left\|\rho_{\theta}-\rho_{\theta'}\right\|,\qquad b\leq 1+r_{0}^{-1}\left\|\rho_{\theta}-\rho_{\theta'}\right\|.$$

In an almost parallel manner,

$$b'\geq 1-r_{0}^{-1}\left\|\rho_{\theta}-\rho_{\theta'}\right\|.$$

Then, by Lemma 7, with $t_{b}$ and $t_{b'}$ defined by $(f^{*})'(t_{b})=b$ and $(f^{*})'(t_{b'})=b'$,

$$\|Y\|\leq\max\left\{|t_{b}-t_{0}|,\,|t_{b'}-t_{0}|\right\}\leq\sup_{t:\,(f^{*})'(t)\in[b',\,b]}\frac{1}{(f^{*})''(t)}\,\max\left\{|b-1|,\,|b'-1|\right\}\leq r_{0}^{-1}\left\|\rho_{\theta}-\rho_{\theta'}\right\|\sup_{t:\,(f^{*})'(t)\in[b',\,b]}\frac{1}{(f^{*})''(t)}.$$

Since $(f^{*})'$ is continuous, when $r_{0}^{-1}\left\|\rho_{\theta}-\rho_{\theta'}\right\|$ is small enough, the set of all $t$ with $(f^{*})'(t)\in[b',b]$ is a subset of $[t_{0}-c,t_{0}+c]$. Therefore, we have (29).
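The middle step of the first chain, pulling $\rho_{\theta}^{-1/2}$ out at the cost of $r_{0}^{-1}$, is an elementary norm bound. A numeric sketch on random full-rank states (our own illustration; $\|\cdot\|$ is the operator norm):

```python
import numpy as np

rng = np.random.default_rng(1)

def rand_state(d):
    X = rng.standard_normal((d, d))
    M = X @ X.T + 0.1 * np.eye(d)  # full rank, positive definite
    return M / np.trace(M)

d = 4
rho1, rho2 = rand_state(d), rand_state(d)

w, V = np.linalg.eigh(rho1)
r0 = w.min()                                  # smallest eigenvalue of rho1
rho1_inv_sqrt = V @ np.diag(w**-0.5) @ V.T

opnorm = lambda M: np.linalg.norm(M, 2)       # operator (spectral) norm
lhs = opnorm(rho1_inv_sqrt @ (rho1 - rho2) @ rho1_inv_sqrt)
assert lhs <= opnorm(rho1 - rho2) / r0 + 1e-12
```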
