Mădălin Guță
School of Mathematics
University of Nottingham
The old paradigm
Quantum Mechanics up to the 80’s
Quantum measurements have random results
Only probability distributions can be predicted
Perform measurements on huge ensembles
Observe averages
Old Paradigm
It makes no sense to talk about individual systems
E. Schrödinger [1952]: We are not experimenting with single
particles, any more than we can raise Ichthyosauria in the zoo
The new paradigm
Individual quantum systems are carriers of a new type of information
Delft qubit [2003] [Naik et al, Nature, 2006]
[Häffner et al, Nature, 2005]
[Monroe, Nature, 2002]
Quantum Information Science
Quantum Information
quantum entropy
correlations (entanglement) between quantum systems
capacity of quantum channel for information transmission
Quantum Computation
algorithms for quantum computers (e.g. Shor’s factoring algorithm)
error correction theory
different practical implementations of quantum circuits
(ion traps, photons, solid state...)
Quantum Filtering and Control
stochastic evolution and continuous time measurements
protecting quantum systems from ‘decoherence’
steering systems towards a desired state
Quantum Probability and Statistics
unified framework for classical and quantum stochastics
measurement design for optimal statistical inference
use probabilistic ideas in operator algebra theory
Quantum measurements
Every quantum system has an associated Hilbert space, e.g. C^d
Density matrix (quantum state): encodes all information about the preparation of the system

    ρ = ( ρ_{11} ρ_{12} ... ρ_{1d} )
        ( ρ_{21} ρ_{22} ... ρ_{2d} )
        (  ...    ...        ...   )
        ( ρ_{d1} ρ_{d2} ... ρ_{dd} )   ≥ 0,   Tr(ρ) = 1

A measurement M with values in Ω = {1, 2, ..., k} is given by a 'positive operator valued measure'

    M_i ∈ M(C^d),   M_i ≥ 0,   Σ_{i=1}^{k} M_i = 1

Statistical interpretation: the outcome X is random, and the probability that X = i when the system is prepared in state ρ is

    P_ρ^{(M)}(X = i) = Tr(ρ M_i)
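The Born rule P(X = i) = Tr(ρ M_i) is easy to check numerically. A minimal numpy sketch (the state and the two-outcome projective POVM below are illustrative choices, not taken from the slides):

```python
import numpy as np

# A qubit (d = 2) in an illustrative state rho, measured with the two-outcome
# PVM given by the projections onto the standard basis vectors.
rho = np.array([[0.75, 0.25],
                [0.25, 0.25]])                     # rho >= 0, Tr(rho) = 1
M = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]     # M_i >= 0, sum_i M_i = 1

assert np.allclose(sum(M), np.eye(2))              # POVM normalisation
probs = [float(np.trace(rho @ Mi).real) for Mi in M]   # P(X = i) = Tr(rho M_i)
print(probs)                                       # [0.75, 0.25]
```

The probabilities are just the diagonal entries of ρ in the measurement basis, as expected for a projective measurement.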
Quantum Statistics
Quantum mechanics makes predictions about the direct map:

    M : ρ → P_ρ^{(M)}

What if ρ is not known? A quantum statistical model (experiment) is a family of states indexed by a parameter θ belonging to a space Θ

    Q := {ρ_θ : θ ∈ Θ}

For each M we obtain a classical statistical model

    Q_M := {P_{ρ_θ}^{(M)} : θ ∈ Θ}

and we can apply 'classical' statistical tools to solve inverse problems like

    X ∼ P_{ρ_θ}^{(M)}  →  θ̂(X) ≈ θ

Questions:
for which measurements is θ identifiable?
which measurements are optimal for a given statistical problem?
how much statistical information does Q contain?
can we develop a theory of statistical models at the quantum level?
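A hedged numerical sketch of the inverse problem X ∼ P_θ → θ̂(X) ≈ θ (the model and estimator are illustrative choices, not from the slides): the one-parameter qubit family ρ_θ = (1 + θσ_z)/2 measured in the σ_z eigenbasis gives P(X = +1) = (1 + θ)/2, and the empirical mean of the outcomes estimates θ.

```python
import numpy as np

# Illustrative model: rho_theta = (1 + theta*sigma_z)/2 measured in the
# sigma_z eigenbasis, so P(X = +1) = (1 + theta)/2 and E(X) = theta.
rng = np.random.default_rng(0)
theta = 0.3                                   # true (unknown) parameter
n = 100_000                                   # number of independent measurements
x = rng.choice([1, -1], size=n, p=[(1 + theta) / 2, (1 - theta) / 2])
theta_hat = x.mean()                          # unbiased estimator theta_hat(X)
print(abs(theta_hat - theta) < 0.01)          # typically True for n this large
```

The standard deviation of θ̂ is √(1 − θ²)/√n ≈ 0.003 here, so the estimate is close to θ with high probability.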
Motivation/Applications
Quantum measurement:

    ρ_θ ∼∼∼→ [ M : measurement apparatus ] → X ∼ P_θ^{(M)} (result) → θ̂(X) (estimator)

Quantum Engineering
statistical validation through measurements of new quantum states and devices
quantum state/process estimation
Quantum Information and Computation
encoding and decoding information with quantum states
state discrimination
Statistics
extend statistical decision theory to noncommutative models
connections with Quantum Probability, Quantum Control...
History of Quantum Statistics
R.L. Stratonovich
[1966] Transmission rates for quantum channels
C. W. Helstrom
[1967–1976] "Quantum Detection and Estimation Theory"
V. P. Belavkin
[1975] Optimal multiple hypothesis testing
[1976] Generalised uncertainty relations
History of Quantum Statistics
A. S. Holevo
[1972] noncommutative statistical decision theory
[1982] “Probabilistic and Statistical Aspects of Quantum Theory”
The Japanese School
H. Nagaoka, A. Fujiwara, M. Hayashi, K. Matsumoto
[1987] Differential geometric aspects of quantum state estimation
[1996] Quantum Fisher information and asymptotic estimation
R. D. Gill
[1998] Asymptotic bounds and optimal quantum estimation
[2001] Statistical approach to Bell inequalities
Useful references
BOOKS
1. C. W. Helstrom, Quantum Detection and Estimation Theory (1976)
2. A. S. Holevo: Probabilistic and statistical aspects of quantum theory (1982)
3. M.A. Nielsen and I.L. Chuang: Quantum computation and quantum
information (2000)
ONLINE LECTURE NOTES
1. R. Gill et al.: Quantum Statistics [book draft]
http://www.math.leidenuniv.nl/~gill/teaching/quantum/pages from Qbook.pdf
2. H. Maassen: Quantum Probability Theory
http://www.math.ru.nl/~maassen/lectures/qp.pdf
3. N. P. Landsman: Lecture notes on C*-algebras, Hilbert C*-modules and Quantum Mechanics
http://xxx.lanl.gov/pdf/math-ph/9807030
PAPERS
1. Artiles, L., Gill, R., Guta, M., An invitation to quantum tomography, J. Royal Statist. Soc. B, 67, (2005), 109–134.
2. Barndorff-Nielsen, O. E., Gill, R., Jupp, P. E., On quantum statistical inference (with discussion), J. R. Statist. Soc. B, 65, (2003), 775–816.
3. Guta, M., Janssens, B., Kahn, J., Optimal estimation of qubit states with continuous time measurements, Commun. Math. Phys., 277, (2008), 127–160.
Color code
red is used for keywords
brown is used for notions which are defined in appendices
Quantum Mechanics as noncommutative probability theory

    Classical                                   Concept                  Quantum
    (Ω, Σ, ν), measure space                    'Space'                  H, Hilbert space
    L^∞(Ω, Σ, ν), bounded random variables      Observables              B(H), bounded selfadjoint operators
    p ∈ L^1(Ω, Σ, ν), probability densities     States                   ρ ∈ T_1(H), density matrices
    (p, f) → ∫ p(ω)f(ω) ν(dω)                   Pairing = expectations   (ρ, A) → Tr(ρA)
    L^∞(Ω, Σ, ν) = L^1(Ω, Σ, ν)*                Duality                  B(H) = T_1(H)*
    T : L^1(Ω, Σ, ν) → L^1(Ω, Σ, ν),            Transformations          C : T_1(H) → T_1(H),
      randomisations (positive, normalised)                                quantum channels (completely positive, normalised)
Hilbert spaces
Inner product space
Hilbert space
Orthonormal basis
Physical examples
Inner product spaces
Definition
An inner product over a C-linear space V is a map ⟨·, ·⟩ : V × V → C satisfying the following conditions for all u, v, w ∈ V, λ ∈ C:
⟨u, u⟩ ≥ 0 for all u ∈ V, and ⟨u, u⟩ = 0 if and only if u = 0
⟨u, v + w⟩ = ⟨u, v⟩ + ⟨u, w⟩
⟨u, λv⟩ = λ⟨u, v⟩
⟨u, v⟩ = ⟨v, u⟩*

Example
C^n: n-tuples u := (u_1, u_2, ..., u_n) of complex numbers with

    ⟨u, v⟩ = Σ_{j=1}^{n} u_j* v_j

C[a, b]: continuous complex-valued functions on [a, b] with

    ⟨f, g⟩ = ∫_a^b f(x)* g(x) dx
Hilbert spaces
Definition (Hilbert space)
An inner product space (H, ⟨·, ·⟩) is called a Hilbert space if it is complete with respect to the norm ‖h‖ := √⟨h, h⟩.

Example
L²([a, b]): the space of square integrable functions on [a, b] with

    ⟨f, g⟩ := ∫ f(x)* g(x) dx

L²(Ω, Σ, P): the space of square integrable random variables on (Ω, Σ, P) with

    ⟨X, Y⟩ := E(X*Y) = ∫ X(ω)* Y(ω) P(dω)
Orthonormal basis (ONB) in a separable Hilbert space
Definition
Let (H, ⟨·, ·⟩) be a Hilbert space. A sequence of vectors {e_k}_{1≤k≤N} is an ONB of H if its linear span is dense in H and ⟨e_i, e_j⟩ = δ_{i,j} for all i, j.
H is separable if and only if it has a countable ONB.

Properties
Any vector x ∈ H has a unique decomposition x = Σ_k x_k e_k, where x_k = ⟨e_k, x⟩ are the Fourier coefficients w.r.t. the ONB {e_k}. The following Parseval equality holds:

    ‖x‖² = Σ_k |⟨e_k, x⟩|²

If K is a closed subspace of H and K^⊥ its orthogonal complement, then x has a unique decomposition x = y + y^⊥ with y ∈ K and y^⊥ ∈ K^⊥, and ‖x‖² = ‖y‖² + ‖y^⊥‖². The vector y is called the orthogonal projection of x onto K and satisfies

    y = arg min_{z ∈ K} ‖z − x‖
Direct sum and tensor products of Hilbert spaces
Definition
Let H_1, H_2 be Hilbert spaces.
1. The direct sum H_1 ⊕ H_2 is the Hilbert space consisting of ordered pairs h_1 ⊕ h_2 ≡ (h_1, h_2) ∈ H_1 × H_2 with inner product

    ⟨g_1 ⊕ g_2, h_1 ⊕ h_2⟩ = ⟨g_1, h_1⟩ + ⟨g_2, h_2⟩

H_1 and H_2 can be seen as orthogonal complements in H_1 ⊕ H_2 by identifying h_1 ∈ H_1 with h_1 ⊕ 0 and h_2 ∈ H_2 with 0 ⊕ h_2.
2. The tensor product H_1 ⊗ H_2 is the Hilbert space obtained as the norm completion of the algebraic tensor product H_1 ⊙ H_2 w.r.t. the inner product

    ⟨g_1 ⊗ g_2, h_1 ⊗ h_2⟩ := ⟨g_1, h_1⟩ ⟨g_2, h_2⟩

If {e_i} and {f_j} are ONBs in H_1 and H_2, then {e_i ⊗ f_j} is an ONB in H_1 ⊗ H_2.
Which Hilbert space corresponds to a given quantum system?
C² for a spin, 2-level system, qubit
C² ⊗ C² ⊗ ··· ⊗ C² for n qubits
L²(R) for a particle in one dimension, harmonic oscillator
F = ⊕_{n=0}^{∞} H^{⊗_s n} for bosonic many particle systems, quantum noise
L²(Ω, Σ, ν) for square integrable random variables on (Ω, Σ, ν)
Hilbert space Operators
Bounded operators
The adjoint
Selfadjoint operators
Unbounded selfadjoint operators
Bounded selfadjoint operators
Definition
Let H be a Hilbert space. A linear map A : H → H is called a bounded linear operator on H if

    ‖A‖ := sup_{h ≠ 0} ‖Ah‖ / ‖h‖ < ∞.

The space of bounded operators on H is denoted B(H).

Example (exercise)
Any linear transformation of C^d is bounded.
The shift S_y given by (S_y f)(x) = f(x − y) is a bounded operator on L²(R).
The Volterra operator (Tf)(s) = ∫_0^s k(s, t) f(t) dt with |k(s, t)| < C is a bounded operator on L²([0, 1]).

Theorem
(B(H), ‖·‖) is a Banach algebra, i.e. a Banach space which is also an algebra and satisfies ‖AB‖ ≤ ‖A‖ ‖B‖ for all A, B ∈ B(H).
Adjoint, selfadjoint, C*-property
Definition
Let A ∈ B(H). The adjoint A* of A is defined by

    ⟨g, A*h⟩ = ⟨Ag, h⟩

A is called selfadjoint if A = A*.

Lemma (C*-property)
Let A ∈ B(H). Then ‖A*‖² = ‖A‖² = ‖A*A‖.

Proof.
From

    ‖Ah‖² = ⟨Ah, Ah⟩ = ⟨h, A*Ah⟩ ≤ ‖h‖² ‖A*A‖

we get ‖A‖² ≤ ‖A*A‖.
Together with ‖A*A‖ ≤ ‖A‖ ‖A*‖ this implies ‖A‖ ≤ ‖A*‖.
A similar argument shows that ‖A*‖ ≤ ‖A‖.
Examples of bounded selfadjoint operators
Let H = C^d and let {e_i} be the standard basis in C^d. Then B(H) ≡ M(C^d) by identifying A with the matrix [A_{i,j}], where

    A_{i,j} = ⟨e_i, A e_j⟩

Then A ∈ B(H) is selfadjoint iff A_{i,j} = A_{j,i}* (hermitian matrix).

The Pauli matrices

    σ_x = ( 0 1 )    σ_y = ( 0 −i )    σ_z = ( 1  0 )
          ( 1 0 )          ( i  0 )          ( 0 −1 )

are selfadjoint and form a basis of M(C²) together with the identity 1.

Let K be a closed subspace of H, and let x = y + y^⊥ be the unique orthogonal decomposition of x ∈ H, with y ∈ K and y^⊥ ∈ K^⊥.
The orthogonal projection P_K onto K, defined by P : x → y, is a selfadjoint operator satisfying P = P²:

    ⟨x_1, P x_2⟩ = ⟨x_1, y_2⟩ = ⟨y_1, y_2⟩ = ⟨P x_1, P x_2⟩
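The statement that {1, σ_x, σ_y, σ_z} form a basis of M(C²) can be checked numerically: any hermitian 2×2 matrix A expands as A = a_0 1 + a_x σ_x + a_y σ_y + a_z σ_z with real coefficients a = Tr(Aσ)/2. A minimal sketch (the matrix A is an illustrative choice):

```python
import numpy as np

# Expand a hermitian 2x2 matrix in the basis {1, sigma_x, sigma_y, sigma_z};
# since Tr(sigma_i sigma_j) = 2 delta_{ij}, the coefficients are Tr(A s)/2.
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]])
sz = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)

A = np.array([[2.0, 1 - 1j], [1 + 1j, 0.0]])          # hermitian: A = A*
coeffs = [np.trace(A @ s).real / 2 for s in (I2, sx, sy, sz)]
B = sum(c * s for c, s in zip(coeffs, (I2, sx, sy, sz)))
print(np.allclose(A, B))                              # True
```

The coefficients are real precisely because A is hermitian; for a general matrix the same formula gives complex coefficients.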
Examples of unbounded selfadjoint operators
Definition
An (unbounded) linear operator on H is defined as a linear map R : D(R) → H, whose domain D(R) is a dense linear subspace of H.
The domain of R* consists of those h for which there exists g := R*(h) so that

    ⟨Rk, h⟩ = ⟨k, g⟩,   ∀k ∈ D(R)

R is selfadjoint if D(R) = D(R*) and R = R* on their common domain.

Example
The position and momentum operators Q and P are selfadjoint on L²(R).

    Q : h → (Qh)(x) = x h(x),   with domain D(Q) = {h : ∫ |x h(x)|² dx < ∞}
    P : f → −i df/dx,   with domain D(P) = {f : f(b) − f(a) = ∫_a^b g(x) dx, g ∈ H}
Spectral Theorem
Spectral Theorem in ﬁnite dimensions
Spectrum and resolvent
Projection valued measures
Spectral Theorem for bounded selfadjoint operators
Continuous functional calculus
Spectral Theorem: multiplication operator form
L
∞
functional calculus
Multiplicity Theory
Spectral theorem in finite dimensions
Theorem (diagonalisation = spectral theorem)
Let A be a selfadjoint operator on C^d. Then there exists an ONB of eigenvectors of A:

    A f_k = λ_k f_k,   k = 1, ..., d

where λ_k ∈ R are the eigenvalues of A.
Let P_k be the one-dimensional projections associated to f_k; then

    A = Σ_{k=1}^{d} λ_k P_k = ( λ_1 0  ... 0  )
                              ( 0  λ_2 ... 0  )
                              ( ...        ... )
                              ( 0  0  ... λ_d )

Remark
If A = A* ∈ M(C^d) then ‖A‖ = max(|λ_1|, ..., |λ_d|).
If A, B are selfadjoint and commute, i.e. AB = BA, then they have a common eigenbasis, so they can be diagonalised simultaneously. If AB ≠ BA no such basis exists.
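The finite-dimensional spectral theorem can be exercised directly with numpy's hermitian eigensolver: diagonalise A and rebuild it from its eigenvalues and rank-one eigenprojections (the matrix is an illustrative choice):

```python
import numpy as np

# Diagonalise a selfadjoint matrix and rebuild it as A = sum_k lambda_k P_k.
A = np.array([[2.0, 1.0], [1.0, 2.0]])          # A = A*
lam, F = np.linalg.eigh(A)                      # columns of F are the ONB f_k
P = [np.outer(F[:, k], F[:, k].conj()) for k in range(2)]   # projections P_k
A_rebuilt = sum(l * p for l, p in zip(lam, P))
print(np.allclose(A, A_rebuilt), lam)           # True [1. 3.]
```

Note that `eigh` (for hermitian matrices) is the right tool here; the general `eig` does not guarantee an orthonormal eigenbasis.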
Resolvent and spectrum
Definition
Let A ∈ B(H). A complex number α is said to be in the resolvent set ρ(A) if α1 − A is a bijection with bounded inverse.
The spectrum of A is defined as σ(A) = C \ ρ(A).

Properties
The spectrum σ(A) is
contained in the set {α ∈ C : |α| ≤ ‖A‖}
compact
nonempty
If A is selfadjoint then σ(A) ⊂ R and r(A) := sup_{λ ∈ σ(A)} |λ| = ‖A‖.

Example (exercise)
The matrix σ_+ := ( 0 1 ; 0 0 ) has spectrum σ(σ_+) = {0}.
Let f ∈ C([0, 1]). The multiplication operator M_f ∈ B(L²([0, 1])) has spectrum σ(M_f) = {y : f(x) = y for some x ∈ [0, 1]}.
If U is unitary (UU* = U*U = 1) then σ(U) ⊂ {λ ∈ C : |λ| = 1}.
Projection valued measure (PVM)
Definition
Let {A_n} be a sequence of operators in B(H). We say that
A_n converges in norm to A ∈ B(H) if lim_{n→∞} ‖A_n − A‖ = 0
A_n converges strongly to A ∈ B(H) if lim_{n→∞} ‖(A_n − A)h‖ = 0 for any h ∈ H

Definition
A projection valued measure (PVM) over a measure space (Ω, Σ) is a map P : Σ → B(H) which satisfies
P(E) is an orthogonal projection for each E ∈ Σ
P is σ-additive: for any countable family {E_i} of mutually disjoint sets, P(∪_{i=1}^{∞} E_i) = Σ_{i=1}^{∞} P(E_i) (sum converging strongly)
P(Ω) = 1
For any unit vector h ∈ H we define the probability measure on (Ω, Σ)

    P_h(E) = ⟨h, P(E)h⟩
Examples of projection valued measures
Example
Let {e_i}_{i=1,...,N} be an ONB in H. Then the corresponding orthogonal projections P_i define a PVM over {1, ..., N} with P(E) = Σ_{i ∈ E} P_i.
The measure P_h is given by P_h(i) = |⟨h, e_i⟩|².
Let H = L²(Ω, Σ, ν) and let P(E) be the projection onto the subspace of functions with support in E:

    P(E) : f → f χ_E

The measure P_h is given by P_h(dω) = |h(ω)|² ν(dω).
Spectral Theorem
Theorem (Spectral Theorem)
Let A ∈ B(H) be selfadjoint. Then there exists a PVM P over R such that

    A = ∫ λ P(dλ)

in the sense that ⟨h, Ah⟩ = ∫ λ P_h(dλ) for every h ∈ H.
The PVM is supported by the spectrum: P(σ(A)) = 1.

Example (exercise)
The multiplication operator M_x : f(x) → x f(x) on L²([0, 1]) does not have any eigenvalue but has a 'continuous' PVM with

    P(E) : f → f χ_E

the projection onto the subspace of L²([0, 1]) functions with support in E.
Main steps of the proof
Continuous functional calculus: define f(A) for f ∈ C(σ(A))
Spectral Theorem, multiplication operator form: A is unitarily equivalent to a multiplication operator on L²
L^∞ calculus: define f(A) for f ∈ L^∞(σ(A), µ), such that P(E) = χ_E(A)
Proof of the Spectral Theorem (I)
Theorem (Continuous functional calculus)
There is a unique map φ : C(σ(A)) → B(H) with the properties
(i) φ is a C*-algebra morphism, i.e.

    φ(fg) = φ(f)φ(g),   φ(λf) = λφ(f),   φ(f̄) = φ(f)*,   φ(1) = 1

(ii) φ is isometric: ‖φ(f)‖ = ‖f‖_∞
(iii) let Id be the function Id(λ) = λ; then φ(Id) = A

Proof.
1. Let P(λ) = Σ_{n=1}^{p} a_n λ^n be a polynomial and let P(A) := Σ_{n=1}^{p} a_n A^n. Then (i) and (iii) are satisfied for polynomials by choosing φ(P) = P(A).
2. Show that σ(P(A)) = {P(λ) : λ ∈ σ(A)} (exercise).
3. Show that ‖P(A)‖ = sup_{λ ∈ σ(A)} |P(λ)|:

    ‖P(A)‖² = ‖P(A)* P(A)‖ = ‖(P̄P)(A)‖ = sup_{λ ∈ σ((P̄P)(A))} |λ| = sup_{λ ∈ σ(A)} |P(λ)|²

(using step 2 in the last equality).
4. Extend φ by continuity from polynomials to the whole of C(σ(A)).
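In finite dimensions the functional calculus is concrete: f(A) = Σ_k f(λ_k) P_k via the spectral decomposition. A sketch (illustrative matrix, with f = exp checked against the defining power series):

```python
import math
import numpy as np

# Functional calculus in finite dimensions: phi(f) = f(A) = sum_k f(lambda_k) P_k.
# Here f = exp, checked against the power series sum_n A^n / n!.
A = np.array([[1.0, 1.0], [1.0, 1.0]])            # selfadjoint, sigma(A) = {0, 2}
lam, F = np.linalg.eigh(A)
expA = F @ np.diag(np.exp(lam)) @ F.conj().T      # phi(exp)
series = sum(np.linalg.matrix_power(A, n) / math.factorial(n) for n in range(20))
print(np.allclose(expA, series))                  # True
```

For polynomials this reduces to φ(P) = P(A) exactly as in step 1 of the proof.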
Proof of the Spectral Theorem (II)
Definition
Let h ∈ H be a unit vector. Then f → ⟨h, f(A)h⟩ is a positive linear functional on C(σ(A)), and by the Riesz–Markov Theorem there exists a probability measure µ_h on σ(A) such that ⟨h, f(A)h⟩ = ∫_{σ(A)} f(λ) µ_h(dλ).
The measure µ_h is called the spectral measure associated to h.

Definition
A vector h is cyclic for A ∈ B(H) if the span of {A^n h}_{n=0}^{∞} is dense in H.

Theorem
Let A ∈ B(H) be selfadjoint with cyclic vector h. Then there exists a unitary U : H → L²(σ(A), µ_h) such that

    (U A U^{−1} f)(λ) = λ f(λ)
Proof of the Spectral Theorem (III)
Proof.
1. Define U by U : φ(f)h → f for all f ∈ C(σ(A)). Then U is norm preserving by

    ‖φ(f)h‖² = ⟨h, φ(f)* φ(f)h⟩ = ⟨h, φ(f̄f)h⟩ = ∫ |f(λ)|² µ_h(dλ)

2. Since h is cyclic and C(σ(A)) is dense in L², U can be extended to a unitary operator.
3. Check that U A U^{−1} acts as multiplication by λ on functions in C(σ(A)):

    (U A U^{−1} f)(λ) = [U A φ(f)h](λ) = [U φ(Id)φ(f)h](λ) = [U φ(Id·f)h](λ) = λ f(λ)
Proof of the Spectral Theorem (IV)
Remark
In general there may not exist a cyclic vector, for example if A has a degenerate eigenvalue, i.e. there exist at least two linearly independent eigenvectors for the same eigenvalue.

Theorem (Spectral Theorem, multiplication operator form)
Let A ∈ B(H) be selfadjoint. Then there exist unit vectors {h_i}_{i=1}^{N} in H and a unitary operator U : H → ⊕_{i=1}^{N} L²(R, µ_{h_i}) such that

    (U A U^{−1} f)_i(λ) = λ f_i(λ),   i ≥ 1

Proof.
Using Zorn's lemma we can split H into a direct sum of subspaces H_i such that A leaves each H_i invariant, i.e. Ah ∈ H_i for all h ∈ H_i.
For each i there exists a vector h_i ∈ H_i which is cyclic for A|_{H_i}.
We then apply the previous Theorem to each cyclic subspace.
Proof of the Spectral Theorem (V)
Theorem (L^∞ functional calculus)
Let µ be a probability measure on σ(A) such that µ ∼ {µ_{h_i}}_{i≥1}, i.e.

    µ(E) = 0 iff µ_{h_i}(E) = 0 for all i

Then there exists a unique morphism φ̃ : L^∞(σ(A), µ) → B(H) such that
(i) φ̃ is an extension of φ : C(σ(A)) → B(H) (Continuous Functional Calculus)
(ii) φ̃ is isometric
(iii) P(E) = φ̃(χ_E)
(iv) φ̃ is normal, i.e. it is continuous with respect to the weak* topologies on L^∞(σ(A), µ) and B(H)
Proof of the Spectral Theorem (V)
Proof of (i)–(iii).
1. By the previous Theorem, for any f ∈ C(σ(A)) we have

    φ(f) = U^{−1} [⊕_i M_i(f)] U

where M_i(f) is the multiplication by f on L²(σ(A), µ_{h_i}).
2. This map can be extended to L^∞(σ(A), µ). Indeed, since µ ≫ µ_{h_i}, the operator M_i(f) : g → fg is well defined on L²(σ(A), µ) (exercise).
3. φ̃ is isometric: for any f ∈ L^∞(σ(A), µ) there is an i such that ‖f‖_∞ = ‖M_i(f)‖.
4. We have

    f(A) := φ̃(f) = ∫ f(λ) P(dλ),   f ∈ L^∞(σ(A), µ)

where the spectral projections of A are

    P(E) = χ_E(A) = φ̃(χ_E) = U^{−1} [⊕_i M_i(χ_E)] U
Further spectral analysis: multiplicity theory
The choice of cyclic vectors h_i and spectral measures µ_{h_i} is not unique, and it is not clear how to use them to answer the following natural question:

Question: Given two selfadjoint operators A, B, does there exist a unitary V such that A = V B V^{−1}?

Answer in finite dimensions: two selfadjoint matrices are unitarily equivalent iff they have the same spectrum and the same multiplicities for each eigenvalue.

Theorem (Hahn–Hellinger)
Any selfadjoint operator is unitarily equivalent to the multiplication operator on

    ⊕_{i=1}^{ℵ_0} ⊕_{k=1}^{i} L²(σ(A), µ_i^{(A)})

where all measures µ_i^{(A)} are mutually disjoint.
Two operators are unitarily equivalent if and only if all their measures are equivalent, µ_i^{(A)} ∼ µ_i^{(B)}, i.e. they have the same sets of measure zero.

Reference: V. S. Sunder, Functional Analysis: Spectral Theory, Birkhäuser, (1998)
Trace-class operators
The trace
Polar decomposition
Trace-class operators
Duality between T_1(H) and B(H)
The trace
Definition
The trace of a positive operator A ∈ B(H) is defined by

    Tr(A) = Σ_k ⟨e_k, A e_k⟩

where {e_k} is an ONB. The trace has the following properties:
Tr(A) is independent of the ONB {e_k}
Tr(A + B) = Tr(A) + Tr(B)
Tr(λA) = λ Tr(A), λ ≥ 0
Tr(U A U*) = Tr(A) for all unitaries U
if 0 ≤ A ≤ B then Tr(A) ≤ Tr(B)
The trace is independent of the ONB
Proof.
Given the ONB {e_k} define Tr_e(A) = Σ_k ⟨e_k, A e_k⟩. Let {f_j} be another ONB. Then

    Tr_e(A) = Σ_k ⟨e_k, A e_k⟩ = Σ_k ‖A^{1/2} e_k‖²
            = Σ_k ( Σ_j |⟨f_j, A^{1/2} e_k⟩|² )
            = Σ_j ( Σ_k |⟨A^{1/2} f_j, e_k⟩|² )
            = Σ_j ‖A^{1/2} f_j‖² = Tr_f(A)

where the sums can be exchanged since all terms are positive.
The other properties are left as an exercise.
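Basis independence of the trace is easy to verify numerically: evaluate Σ_k ⟨e_k, A e_k⟩ in the standard basis and in a random unitary basis (a sketch with an illustrative positive matrix; the random basis comes from a QR factorisation):

```python
import numpy as np

# Compute Tr_e(A) in two different ONBs of C^3 and check they agree.
rng = np.random.default_rng(1)
X = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
A = X.conj().T @ X                              # A >= 0
E = np.eye(3)                                   # standard ONB {e_k}
F, _ = np.linalg.qr(rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3)))
tr_e = sum((E[:, k].conj() @ A @ E[:, k]).real for k in range(3))
tr_f = sum((F[:, k].conj() @ A @ F[:, k]).real for k in range(3))
print(np.isclose(tr_e, tr_f))                   # True
```

The columns of the QR factor F form an ONB because F is unitary; the two sums agree up to floating-point error.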
The polar decomposition
Definition
An operator W ∈ B(H) is called a partial isometry if both WW* and W*W are orthogonal projections.
The absolute value of B ∈ B(H) is defined by |B| = √(B*B) ≥ 0.

Theorem (polar decomposition)
Let B ∈ B(H). Then there exists a partial isometry W such that B = W|B|.
W is uniquely determined by the condition that Ker(W) = Ker(B).

Sketch of the proof.
The map W : Ran(|B|) → Ran(B) given by W : |B|h → Bh is well defined since

    ‖|B|h‖² = ⟨|B|h, |B|h⟩ = ⟨h, |B|²h⟩ = ⟨h, B*Bh⟩ = ‖Bh‖²

Extend W to an isometry from the closure of Ran(|B|) to the closure of Ran(B), and to zero on Ran(|B|)^⊥.
Since Bh = 0 ⇔ |B|h = 0, we have Ker(W) = Ker(|B|) = Ker(B).
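In finite dimensions the polar decomposition can be read off the singular value decomposition B = U S V†: then |B| = V S V† and W = U V† (here a unitary, hence in particular a partial isometry). A sketch with an illustrative matrix:

```python
import numpy as np

# Polar decomposition B = W |B| from the SVD B = U S Vh:
# |B| = Vh* S Vh and W = U Vh.
B = np.array([[0.0, 2.0], [1.0, 0.0]])
U, S, Vh = np.linalg.svd(B)
absB = Vh.conj().T @ np.diag(S) @ Vh            # |B| = sqrt(B* B)
W = U @ Vh
print(np.allclose(B, W @ absB))                 # True
print(np.allclose(absB @ absB, B.conj().T @ B)) # True: |B|^2 = B* B
```

For a singular B the same formula still yields a valid polar decomposition; the uniqueness condition Ker(W) = Ker(B) then singles out the partial-isometry version rather than this unitary one.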
Trace-class operators
Definition
The space of trace-class operators is

    T_1(H) = {τ ∈ B(H) : ‖τ‖_1 := Tr|τ| < ∞}.

Properties
1. T_1(H) is a Banach space.
2. Let A ∈ T_1(H) be selfadjoint. Then A has a complete basis of eigenvectors e_i with eigenvalues λ_i such that A = Σ_i λ_i P_{e_i} and ‖A‖_1 = Σ_i |λ_i|.
3. If A ∈ T_1(H) and B ∈ B(H) then

    A*, AB, BA ∈ T_1(H)
    Tr(AB) = Tr(BA)
    Tr(A*) = Tr(A)*
    |Tr(AB)| ≤ ‖A‖_1 · ‖B‖

Remark
Point 2 is a particular case of the following: any selfadjoint compact operator has discrete spectrum with λ_i → 0.
Point 3 can be proved using point 2 and the polar decomposition (exercise).
Point 1: proof of the triangle inequality
Let A, B ∈ T_1. We will show that ‖A + B‖_1 ≤ ‖A‖_1 + ‖B‖_1. Consider the polar decompositions

    A + B = U|A + B|,   A = V|A|,   B = W|B|

Then

    Tr(|A + B|) = Σ_k ⟨e_k, U*(A + B)e_k⟩ ≤ Σ_k |⟨e_k, U*V|A|e_k⟩| + Σ_k |⟨e_k, U*W|B|e_k⟩|

Now by applying the Cauchy–Schwarz inequality twice

    Σ_k |⟨e_k, U*V|A|e_k⟩| ≤ Σ_k ‖|A|^{1/2} V*U e_k‖ ‖|A|^{1/2} e_k‖
                           ≤ ( Σ_k ‖|A|^{1/2} V*U e_k‖² )^{1/2} ( Σ_k ‖|A|^{1/2} e_k‖² )^{1/2}

The sums on the right side are equal to Tr(U*V|A|V*U) and Tr(|A|). Using the fact that U, V are partial isometries one can show that Tr(U*V|A|V*U) ≤ Tr(|A|), hence the left side is smaller than ‖A‖_1.
B(H) = T_1(H)*
Definition
Let V be a Banach space. The dual V* is the space of continuous linear maps t : V → C. V* is a Banach space when endowed with the norm

    ‖t‖ = sup_{‖v‖=1} |t(v)|

Theorem
The space (B(H), ‖·‖) is the dual of T_1(H) with the pairing

    T_1(H) × B(H) ∋ (τ, A) → Tr(τA)

Sketch of the proof.
1. Show that B(H) ⊂ T_1(H)*.
Let B ∈ B(H). Since |Tr(Bτ)| ≤ Tr(|τ|)‖B‖, the linear functional τ → Tr(τB) is bounded on T_1(H).
2. Show that T_1(H)* ⊂ B(H).
Let ℓ ∈ T_1(H)*. Then ℓ(|h⟩⟨k|) = ⟨k, Bh⟩ = Tr(B|h⟩⟨k|) for some B ∈ B(H).
Use the fact that finite rank operators are ‖·‖_1-dense in T_1(H).
States, Observables and Measurements
States and observables in Quantum Mechanics
The weak* topology
Measurements as (completely) positive maps
Positive operator valued measures
Naimark's dilation Theorem
States and observables in quantum mechanics
Definition
Let H be the Hilbert space associated to a quantum system.
A (bounded) observable is defined as a selfadjoint operator A ∈ B(H).
A density matrix is a positive trace-class operator ρ such that Tr(ρ) = 1.
A state on B(H) is a linear functional ϕ : B(H) → C of the form

    ϕ(A) = Tr(ρA)

where ρ is a density matrix.

Lemma (exercise)
Let A = ∫_{σ(A)} λ P(dλ) be an observable and let ϕ be a state with density matrix ρ. Then

    P_ρ(E) = ϕ(χ_E(A)) = Tr(P(E)ρ),   E ∈ Σ ∩ σ(A)

defines a probability distribution over σ(A).
Probabilistic interpretations
Probabilistic interpretation for measurements of observables
If we measure the observable A = ∫ λ P(dλ) of a system prepared in state ϕ with density matrix ρ, we obtain a random result X ∈ σ(A) with distribution P_ρ.

Probabilistic interpretation for mixtures of states
Recall that any selfadjoint τ ∈ T_1(H) has the spectral decomposition τ = Σ_i λ_i P_i. In particular, if τ is a density matrix then λ_i ≥ 0 and Σ_i λ_i = 1.
The space of density matrices (states) S(H) is convex, and its extremal points are the one dimensional projections |h⟩⟨h|, called pure states.
If a system is prepared randomly in state ρ_i with probability µ_i (µ_i ≥ 0, Σ µ_i = 1) and i is unknown, then the corresponding state is

    ρ = Σ_i µ_i ρ_i
The weak* topology
Definition
Let V be a Banach space. The weak* topology on the dual V* is defined by the convergence criterion (on nets):

    ℓ_n →(w*) ℓ   iff   ℓ_n(v) → ℓ(v) for all v ∈ V.

Example
L^∞(Ω, Σ, µ) = L^1(Ω, Σ, µ)*:
    f_n →(w*) f iff ∫ p(ω) f_n(ω) µ(dω) → ∫ p(ω) f(ω) µ(dω) for all p ∈ L^1(Ω, Σ, µ)
B(H) = T_1(H)*:
    A_n →(w*) A iff Tr(τ A_n) → Tr(τ A) for all τ ∈ T_1(H)

Theorem
Let V be a Banach space. The linear functionals on V* which are continuous with respect to the weak* topology are precisely those given by V ⊂ V**, i.e. ṽ(ℓ) := ℓ(v) for v ∈ V and ℓ ∈ V*.
Weak* continuity of φ̃
Recall
The L^∞ functional calculus Theorem associates to the selfadjoint operator A a morphism φ̃ : L^∞(σ(A), µ) → B(H).

Lemma
φ̃ is continuous with respect to the weak* topologies.

Sketch of the proof.
Let ρ ∈ S(H). The measure P_ρ is dominated by µ. Indeed

    µ(E) = 0 ⇒ P(E) = 0 ⇒ P_ρ(E) = Tr(ρP(E)) = 0

Thus P_ρ has density p_ρ = dP_ρ/dµ ∈ L^1(σ(A), µ).
If f_n →(w*) f then Tr(ρ φ̃(f_n)) = ∫ f_n(ω) p_ρ(ω) µ(dω) → ∫ f(ω) p_ρ(ω) µ(dω).
Measurements as (completely) positive unital maps (I)
Definition
Let φ : L^∞(σ(A), µ) → B(H) be a weak* continuous morphism (previously denoted φ̃). We define

    φ_* : T_1(H) → L^1(σ(A), µ)

by the duality

    Tr(τ φ(f)) = ∫ f(ω) p_τ(ω) µ(dω),   p_τ := φ_*(τ)

φ_* has the following properties:
it is linear and positive, i.e. φ_*(τ) ≥ 0 if τ ≥ 0
it is normalised, i.e. p_ρ is a probability density if ρ is a density matrix
Measurements as (completely) positive unital maps (II)
Theorem
Let M : L^∞(Ω, Σ, µ) → B(H) be a linear map such that:
M is positive, i.e. M(f) ≥ 0 if f ≥ 0
M is unital, i.e. M(1) = 1
M is continuous with respect to the weak* topologies
Then there exists a linear map M_* : T_1(H) → L^1(Ω, Σ, µ) which satisfies

    Tr(τ M(f)) = ∫ p_τ(ω) f(ω) µ(dω),   p_τ := M_*(τ)

and is
positive, i.e. M_*(τ) ≥ 0 for τ ≥ 0
normalised, i.e. p_ρ is a probability density if ρ is a density matrix
Conversely, any linear map M_* with these properties has a dual M.

Hints for the proof.
⇒: show that f → Tr(τ M(f)) is weak* continuous and hence is given by some p_τ.
⇐: show that M_* is ‖·‖_1-continuous. Then define M(f) as an element of the dual of T_1(H).
Measurements as (completely) positive unital maps (III)
Definition (general definition of a measurement)
Let B(H) be the algebra of observables of a quantum system. A measurement with outcomes in the measure space (Ω, Σ) is given by a dual pair (M, M_*) as above. The result X ∈ Ω of M has probability density

    p_ρ := M_*(ρ) ∈ L^1(Ω, Σ, µ)

Example
Let r_i be 3 coplanar unit vectors in R³ forming 120 degree angles. The triad, or 'Mercedes-Benz', measurement on C² consists of the 3 operators

    M_i = (1/3)(1 + r_i · σ) = (1/3) ( 1 + r_{i,z}          r_{i,x} − i r_{i,y} )
                                     ( r_{i,x} + i r_{i,y}  1 − r_{i,z}         )

Randomised measurement: let M, N be two measurements with outcomes in (Ω, Σ). Then R(f) := λM(f) + (1 − λ)N(f) defines a measurement obtained by randomly choosing M or N with probabilities (λ, 1 − λ).
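The triad example above can be built and checked numerically; a sketch placing the three unit vectors in the x-z plane (an illustrative choice of plane) and measuring the pure state |0⟩⟨0|:

```python
import numpy as np

# The 'Mercedes-Benz' triad: three coplanar unit vectors at 120 degrees,
# here in the x-z plane, with POVM elements M_i = (1 + r_i . sigma)/3.
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]])
sz = np.array([[1, 0], [0, -1]], dtype=complex)

angles = [0, 2 * np.pi / 3, 4 * np.pi / 3]
r = [(np.sin(t), 0.0, np.cos(t)) for t in angles]
M = [(np.eye(2) + rx * sx + ry * sy + rz * sz) / 3 for rx, ry, rz in r]

print(np.allclose(sum(M), np.eye(2)))        # True: sum_i M_i = 1
rho = np.array([[1.0, 0.0], [0.0, 0.0]])     # pure state |0><0|
p = [float(np.trace(rho @ Mi).real) for Mi in M]
print([round(x, 3) for x in p])              # [0.667, 0.167, 0.167]
```

Normalisation holds because the three vectors sum to zero; each M_i has eigenvalues 2/3 and 0, so it is positive but not a projection, which is exactly why a POVM is needed here rather than a PVM.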
Positive operator valued measures (POVM)
Definition
Let (Ω, Σ) be a measure space. A map M : Σ → B(H) is called a positive operator valued measure (POVM) if it has the following properties
positivity: M(E) ≥ 0 for all E ∈ Σ
σ-additivity: M(∪_i E_i) = Σ_i M(E_i) (in the sense of strong convergence) for any countable family of mutually disjoint sets E_i ∈ Σ
normalisation: M(Ω) = 1

Theorem
Let M : L^∞(Ω, Σ, µ) → B(H) be a measurement.
Then the operators M(E) := M(χ_E) form a POVM over (Ω, Σ).
Conversely, for every POVM {M(E) : E ∈ Σ} over (Ω, Σ) with values in B(H) there exists a probability measure µ and a measurement M : L^∞(Ω, Σ, µ) → B(H) with M(χ_E) = M(E).
Measurements and POVM's
Proof.
⇒: we only need to prove the σ-additivity of {M(E)}.

Lemma
Let M_n be an increasing net of positive operators converging to a bounded operator M w.r.t. the weak* topology. Then M_n converges strongly to M, and M is the least upper bound l.u.b.(M_n).

Since M is weak* continuous we have

    M(∪_i E_i) = w*-lim_{k→∞} Σ_{i=1}^{k} M(E_i).

By the previous lemma, Σ_{i=1}^{k} M(E_i) → M(∪_i E_i) strongly.

⇐: Given {M(E)} we construct M_* : T_1(H) → L^1(Ω, Σ, µ) as follows.
For every density matrix τ define the probability measure µ_τ(E) := Tr(τM(E)).
We only need to find a common dominating measure. Let ρ be a density matrix with strictly positive eigenvalues and let µ = µ_ρ.
Then µ_τ ≪ µ because µ(E) = 0 ⇒ M(E) = 0 ⇒ Tr(τM(E)) = 0.
Thus M_* : τ → p_τ := dµ_τ/dµ ∈ L^1(Ω, Σ, µ) has the desired properties.
Naimark's dilation Theorem
Example
Let {P(E)} be a PVM with values in B(K) and let V : H → K be an isometry. Then {M(E) := V* P(E) V} is a POVM with values in B(H).

Theorem (Naimark's dilation Theorem)
Let M : L^∞(Ω, Σ, µ) → B(H) be a measurement. There exists a projection valued measure P : Σ → B(K) and an isometry V : H → K such that

    M(E) = V* P(E) V,   E ∈ Σ

Remark
Naimark's Theorem is a consequence of Stinespring's Theorem for commutative C*-algebras.
Since V is isometric we can identify H with VH ⊂ K and write

    M(E) = P_H P(E) P_H
Proof of Naimark's Theorem for finite measure spaces
Let Ω = {1, . . . , n} and POVM {M_1, . . . , M_n}.
Define the (positive) inner product over the direct sum of n copies of H:

    ⟨h̄, k̄⟩_M = ⟨(h_1, ..., h_n), (k_1, ..., k_n)⟩_M := Σ_{i=1}^{n} ⟨h_i, M_i k_i⟩

Let K be the Hilbert space (⊕_{i=1}^{n} H)/N where

    N := {h̄ ∈ ⊕_{i=1}^{n} H : ‖h̄‖_M = 0}

is the space of null vectors of ⟨·, ·⟩_M.
Define V : H → K by V : h → (h, ..., h). Then V is an isometry:

    ⟨Vh, Vk⟩ = ⟨h̄, k̄⟩_M = Σ_i ⟨h, M_i k⟩ = ⟨h, k⟩

Let P_i ∈ B(K) be the orthogonal projection onto the i-th copy of H.
Verify that V* P_i V = M_i:

    ⟨h, V* P_i V k⟩ = ⟨Vh, P_i Vk⟩ = ⟨h̄, P_i k̄⟩_M = ⟨h, M_i k⟩
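A hedged numerical sketch of an explicit Naimark dilation for the triad POVM: stack the square roots, V = [√M_1; ...; √M_n] : C² → C^{2n}; then V*V = Σ_i M_i = 1 (so V is an isometry) and M_i = V* P_i V with P_i the projection onto the i-th block. This block form is a standard variant of the quotient construction above, not the construction verbatim.

```python
import numpy as np

# Naimark dilation for a finite POVM on C^2 via stacked square roots.
def psd_sqrt(A):
    # square root of a positive semidefinite hermitian matrix via eigh
    w, U = np.linalg.eigh(A)
    return U @ np.diag(np.sqrt(np.clip(w, 0, None))) @ U.conj().T

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)
angles = [0, 2 * np.pi / 3, 4 * np.pi / 3]
M = [(np.eye(2) + np.sin(t) * sx + np.cos(t) * sz) / 3 for t in angles]

V = np.vstack([psd_sqrt(Mi) for Mi in M])             # 6 x 2 isometry
print(np.allclose(V.conj().T @ V, np.eye(2)))         # True: V* V = 1
for i, Mi in enumerate(M):
    P_i = np.zeros((6, 6), dtype=complex)
    P_i[2 * i:2 * i + 2, 2 * i:2 * i + 2] = np.eye(2) # projection onto block i
    print(np.allclose(V.conj().T @ P_i @ V, Mi))      # True for each i
```

The check V* P_i V = √M_i · √M_i = M_i is exactly the dilation identity of the theorem, realised on K = C² ⊕ C² ⊕ C².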
Further topics related to measurements
Measurements are ‖·‖_1 contractive
Bures (fidelity) distance on density matrices
Measurements are contractive w.r.t. the Bures–Hellinger distance
Convex structure of the measurement space. Extremal measurements
In finite dimensions measurements have densities
Measurements are ‖·‖_1 contractive maps
Lemma
Let M_* : T_1(H) → L^1(Ω, Σ, µ) be a measurement. Let ρ, τ be density matrices and p_ρ := M_*(ρ), p_τ := M_*(τ). Then

    ‖p_ρ − p_τ‖_1 ≤ ‖ρ − τ‖_1

Proof.
Note that if f, g are probability densities

    ‖f − g‖_1 = ∫ |f(ω) − g(ω)| µ(dω) = 2 sup_E ∫_E (f(ω) − g(ω)) µ(dω)

Similarly, if ρ, τ are density matrices then we can write ρ − τ = δ_+ − δ_−, where δ_± are positive operators with orthogonal supports. Then

    ‖ρ − τ‖_1 = Tr(|ρ − τ|) = Tr(δ_+ + δ_−) = 2 Tr(δ_+) = 2 sup_M Tr(M(ρ − τ))

where the supremum is taken over all operators 0 ≤ M ≤ 1.
Finally,

    ‖p_ρ − p_τ‖_1 = 2 sup_E ∫_E (p_ρ(ω) − p_τ(ω)) µ(dω) = 2 sup_E Tr((ρ − τ)M(E)) ≤ ‖ρ − τ‖_1
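The contraction ‖p_ρ − p_τ‖_1 ≤ ‖ρ − τ‖_1 can be seen numerically; a sketch for the standard-basis measurement on two illustrative qubit states (the trace norm of the hermitian difference is the sum of the absolute eigenvalues):

```python
import numpy as np

# Check ||p_rho - p_tau||_1 <= ||rho - tau||_1 for the standard basis PVM.
rho = np.array([[0.8, 0.2], [0.2, 0.2]])
tau = np.array([[0.5, 0.0], [0.0, 0.5]])
M = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]      # two-outcome PVM

p_rho = np.array([np.trace(rho @ Mi).real for Mi in M])
p_tau = np.array([np.trace(tau @ Mi).real for Mi in M])
l1_classical = np.abs(p_rho - p_tau).sum()          # ||p_rho - p_tau||_1
l1_quantum = np.abs(np.linalg.eigvalsh(rho - tau)).sum()   # Tr|rho - tau|
print(round(float(l1_classical), 3), round(float(l1_quantum), 3))  # 0.6 0.721
print(bool(l1_classical <= l1_quantum))             # True
```

The gap between the two norms reflects the information lost by measuring in a single basis: the off-diagonal part of ρ − τ does not contribute to the outcome distribution.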
Bures (fidelity) distance on density matrices
Definition
Let ρ, τ be two density matrices on H. The Bures (or fidelity) distance between ρ and τ is defined as

    b(ρ, τ) := ( 2 − 2 ‖ρ^{1/2} τ^{1/2}‖_1 )^{1/2} = ( 2 − 2 Tr(√(ρ^{1/2} τ ρ^{1/2})) )^{1/2}

Definition
Let ρ be a density matrix on H. A purification of ρ is any pure state P_ψ = |ψ⟩⟨ψ| on an extended space H ⊗ K such that ρ = Tr_K(P_ψ), or equivalently Tr(ρA) = ⟨ψ, (A ⊗ 1)ψ⟩ for all A ∈ B(H).

Theorem
The fidelity F(ρ_1, ρ_2) := Tr(√(ρ_1^{1/2} ρ_2 ρ_1^{1/2})) is equal to max |⟨ψ_1, ψ_2⟩| where the maximum is taken over all purifications ψ_1, ψ_2 of ρ_1, ρ_2.
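The fidelity and Bures distance can be computed directly from the definition; a sketch with illustrative qubit states, taking matrix square roots via `eigh` since all matrices involved are positive semidefinite:

```python
import numpy as np

# F(rho1, rho2) = Tr sqrt(rho1^{1/2} rho2 rho1^{1/2}); b = sqrt(2 - 2F).
def psd_sqrt(A):
    w, U = np.linalg.eigh(A)
    return U @ np.diag(np.sqrt(np.clip(w, 0, None))) @ U.conj().T

def fidelity(r1, r2):
    s = psd_sqrt(r1)
    return np.trace(psd_sqrt(s @ r2 @ s)).real

rho1 = np.array([[0.9, 0.0], [0.0, 0.1]])
rho2 = np.array([[0.5, 0.4], [0.4, 0.5]])
F = fidelity(rho1, rho2)
b = np.sqrt(2 - 2 * F)
print(0.0 <= F <= 1.0)                        # True
print(np.isclose(fidelity(rho1, rho1), 1.0))  # True: F(rho, rho) = 1
```

For commuting states the fidelity reduces to the classical Bhattacharyya coefficient Σ √(p_k q_k) of the eigenvalue distributions, which links this slide to the Hellinger distance below.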
Fidelity and transition probability
Sketch of the proof.
1. Let τ ∈ T_1(H). Then max |Tr(Uτ)| = Tr(|τ|), with the maximum taken over all unitaries. (Use the polar decomposition τ = V|τ| with a unitary V.)
2. Let ρ_i = Σ_k λ_k^{(i)} |e_k^{(i)}⟩⟨e_k^{(i)}| be the spectral decompositions of ρ_i. Any purification of ρ_i is of the Schmidt form

    ψ_i = Σ_k √(λ_k^{(i)}) e_k^{(i)} ⊗ f_k^{(i)} ∈ H ⊗ H

with {f_k^{(1)}} and {f_k^{(2)}} orthonormal sets in H (K can be taken to be H).
3. There exist unitaries U_i : f_k^{(i)} → e_k^{(1)} for i = 1, 2 and V : e_k^{(2)} → e_k^{(1)}. Check that

    ⟨ψ_1, ψ_2⟩ = Tr(ρ_1^{1/2} ρ_2^{1/2} U_2^T V V^T U_1^T)

4. Optimise over U_2 and use point 1 to obtain the equality

    max |⟨ψ_1, ψ_2⟩| = Tr(|ρ_1^{1/2} ρ_2^{1/2}|)
Measurements are contractive w.r.t. the Bures–Hellinger distance
Definition
Let p, q be two probability densities in L^1(Ω, Σ, µ). The Hellinger distance between p and q is defined as

    h(p, q) := ‖√p − √q‖_2 = ( 2 − 2 ∫ √(p(ω) q(ω)) µ(dω) )^{1/2}

Theorem
Let M_* : T_1(H) → L^1(Ω, Σ, µ) be a measurement. Let ρ, τ be density matrices and p_ρ := M_*(ρ), p_τ := M_*(τ). Then

    h(p_ρ, p_τ) ≤ b(ρ, τ)
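The inequality h(p_ρ, p_τ) ≤ b(ρ, τ) can be checked numerically; a sketch for the standard-basis measurement on two illustrative qubit states, with the Bures distance computed from the fidelity as above:

```python
import numpy as np

# Check h(p_rho, p_tau) <= b(rho, tau) for the standard basis measurement.
def psd_sqrt(A):
    w, U = np.linalg.eigh(A)
    return U @ np.diag(np.sqrt(np.clip(w, 0, None))) @ U.conj().T

def bures(r1, r2):
    s = psd_sqrt(r1)
    F = np.trace(psd_sqrt(s @ r2 @ s)).real
    return np.sqrt(max(2 - 2 * F, 0.0))

rho = np.array([[0.8, 0.2], [0.2, 0.2]])
tau = np.array([[0.5, 0.0], [0.0, 0.5]])
p = np.diag(rho).real                        # outcome probabilities, standard basis
q = np.diag(tau).real
h = np.sqrt(np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))   # Hellinger distance
print(bool(h <= bures(rho, tau)))            # True
```

As with the ‖·‖_1 contraction, equality generally fails for a single fixed basis: the measurement discards the coherences of the states.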
Proof of contractivity with respect to the Bures–Hellinger distance
Proof in the case of a discrete measure space.
Let Ω = {1, . . . , n} and the measurement POVM {M_1, . . . , M_n}.
The theorem is equivalent to Σ_{k=1}^{n} √(p_k) √(q_k) ≥ F(ρ, τ).
By Naimark's Theorem we can embed H into a larger space K such that the measurement is given by a PVM {P_1, . . . , P_n}.
This operation leaves F(ρ, τ) invariant:
F(ρ, τ) = sup_{ψ,φ} |⟨ψ, φ⟩|, where ψ, φ ∈ K ⊗ K are purifications of ρ, τ.
Thus it suffices to show that for any ψ, φ

    Σ_{k=1}^{n} √(p_k) √(q_k) ≥ |⟨ψ, φ⟩|

But p_k = ⟨ψ, (P_k ⊗ 1)ψ⟩ = ‖(P_k ⊗ 1)ψ‖² and q_k = ⟨φ, (P_k ⊗ 1)φ⟩ = ‖(P_k ⊗ 1)φ‖².
By using Cauchy–Schwarz we finally get

    Σ_{k=1}^{n} √(p_k) √(q_k) = Σ_{k=1}^{n} ‖(P_k ⊗ 1)ψ‖ ‖(P_k ⊗ 1)φ‖ ≥ Σ_{k=1}^{n} |⟨(P_k ⊗ 1)ψ, (P_k ⊗ 1)φ⟩|
                              ≥ |⟨ψ, Σ_{k=1}^{n} (P_k ⊗ 1)φ⟩| = |⟨ψ, φ⟩|
Extremal measurements

Definition
A subset C of a vector space V is convex if for any u ≠ v ∈ C, the vectors
w := λu + (1 − λ)v are in C for all 0 < λ < 1.
w ∈ C is called an extremal point of C if it cannot be decomposed as above.

Problem. Characterise the extremal points of the convex set of measurements
M : L^∞(Ω, Σ, µ) → B(H).

Theorem
Let {M_1, . . . , M_n} be the POVM of a measurement M with values in
Ω = {1, . . . , n}. Let

M_i = Σ_{j=1}^{r_i} m^{(i)}_j |v^{(i)}_j⟩⟨v^{(i)}_j|

with ⟨v^{(i)}_j, v^{(i)}_k⟩ = δ_{j,k} and m^{(i)}_j > 0. Then M is extremal if and only if the
rank-one operators {|v^{(i)}_j⟩⟨v^{(i)}_k| : i = 1, . . . , n; j, k = 1, . . . , r_i} are linearly
independent.

References:
K.R. Parthasarathy: Inf. Dim. Analysis, Quantum Probability Rel. Topics, 2, 557–568, (1999)
G.M. D'Ariano et al: J. Phys. A: Math. Gen. 38, 5979–5991, (2005)
63
Extremal measurements: solution in the case of a finite POVM

Sketch of the proof.

1. M is not extremal iff there exist selfadjoint operators {D_1, . . . , D_n} (not all
   equal to zero) such that Σ_i D_i = 0 and M_i ± D_i ≥ 0 for all i.
   Indeed, {M_1 ± D_1, . . . , M_n ± D_n} are POVMs and

   M_i = ½(M_i + D_i) + ½(M_i − D_i)

2. M_i ± D_i ≥ 0 implies Ker(M_i) ⊂ Ker(D_i).
   This follows from ⟨h, (M_i ± D_i)h⟩ ≥ 0 by writing h = αh_1 + βh_2 with
   h_1 ∈ Ker(M_i) and h_2 ∈ Ker(M_i)^⊥.

3. If M_i has spectral decomposition M_i = Σ_{j=1}^{r_i} m^{(i)}_j |v^{(i)}_j⟩⟨v^{(i)}_j| with m^{(i)}_j > 0,
   then point 2. implies that D_i can be expressed as

   D_i = Σ_{j,k} d^{(i)}_{j,k} |v^{(i)}_j⟩⟨v^{(i)}_k|

   Then Σ_i D_i = 0 is equivalent to the linear dependence of the |v^{(i)}_j⟩⟨v^{(i)}_k|.
64
In finite dimensions measurements have densities

Lemma (measurement density)
Let {M(E)} be a POVM over (Ω, Σ) with values in M(C^d). Then there exists
a measure µ on (Ω, Σ) and a positive density function
m ∈ L^1(Ω, Σ, µ) ⊗ M(C^d) such that

M(E) = ∫_E m(ω) µ(dω),   E ∈ Σ

Moreover µ can be chosen such that ‖m(ω)‖ ≤ 1, almost surely.

Proof.
Let tr(A) := Tr(A)/d and define the probability measure µ(E) := tr(M(E)).
The matrix element M_{ij}(E) := ⟨e_i, M(E)e_j⟩ is a measure on Ω, dominated by
µ. Thus there exists a density m_{ij} ∈ L^1(Ω, Σ, µ) such that

M_{ij}(E) = ∫_E m_{ij}(ω) µ(dω).

Moreover from tr(M(E)) = µ(E) = ∫_E tr(m(ω)) µ(dω) it follows that
tr(m(ω)) = 1, µ-almost surely. In particular m is bounded.
65
Notions of statistical inference

Statistical models
Parametric estimation
Fisher information
Cramér–Rao bound
Efficient estimators
Repeated coin toss example
66
What is statistical inference?

Given some random data X from an unknown distribution, one aims to make
an 'educated guess' about some property of the underlying distribution.

Example
- Density estimation: given X_1, . . . , X_n independent identically distributed
  (i.i.d.) with unknown density p ∈ L^1([0, 1]), estimate the value of p(x) for
  some x ∈ [0, 1]
- Hypothesis testing: given X drawn from either P_0 or P_1, decide from which
  of the two distributions it comes
- Confidence intervals: together with an estimator θ̂(x) of θ, provide a
  neighbourhood C of θ̂(x) such that θ belongs to C with probability p
- Sufficient statistic: can data X ∼ P_θ be 'summarised' into a 'simpler'
  statistic f(X) without losing information about θ?
- Optimality: how do we compare the performance of estimators and which
  are the optimal ones?
- Asymptotics: what happens in the limit of a 'large number of data'?
67
Statistical models

Definition
Let Θ be a parameter space. A statistical model (experiment) over Θ is a
family {P_θ : θ ∈ Θ} of probability distributions on a measure space (Ω, Σ).

Example
- Repeated coin toss: X_1, . . . , X_n i.i.d. with P_θ([X_i = 1]) = θ and
  P_θ([X_i = 0]) = 1 − θ. The joint distribution is:

  P^n_θ([X_1 = x_1, . . . , X_n = x_n]) = Π_{i=1}^n P_θ([X_i = x_i]) = θ^{Σ_i x_i} (1 − θ)^{n − Σ_i x_i}

- Gaussian shift on R^k: family of Gaussian distributions N(θ, V) with
  unknown mean θ ∈ R^k and known k × k covariance matrix V
- Tomography: an unknown probability density p over R^2 is probed through
  its marginals along random directions φ in the plane. For each φ we get data
  X ∼ R[p](x, φ), where R[p] is the Radon transform

  R[p](x, φ) = ∫ p(x cos φ + t sin φ, x sin φ − t cos φ) dt
68
Parametric estimation

Problem
Given
- an (open) subset Θ of R^k
- data X ∼ P_θ, with P_θ a probability distribution on (Ω, Σ), and θ ∈ Θ
- a loss function W : Θ × Θ → R_+, e.g. W(θ̂, θ) = ‖θ − θ̂‖^2

devise an estimator θ̂ = θ̂(X) such that the risk

R(θ̂, θ) := E_θ(W(θ̂(X), θ))

is small.

Remark
- The same problem can be formulated for 'nonparametric' Θ, and/or for the
  estimation of a function t = t(θ)
- In general the estimator may be randomised, for example
  θ̂ = θ̂(X, U), where U is an additional random variable with fixed, known
  distribution
- if X = x choose θ̂ ∼ K(x, ·), where K : Ω × Σ_Θ → [0, 1] is a Markov kernel
69
Unbiased estimators

Definition
Let {P_θ : θ ∈ Θ ⊂ R^k} be a parametric statistical model and let X ∼ P_θ.
An estimator θ̂(X) is called unbiased if E_θ(θ̂(X)) = θ for all θ.

Example
- Let X_1, . . . , X_n be i.i.d. Bernoulli with P_θ([X = 1]) = θ and
  P_θ([X = 0]) = 1 − θ. Then X̄ = (Σ X_i)/n is an unbiased estimator of θ
- Let Y_1, . . . , Y_n be i.i.d. normally distributed with P_θ = N(θ, V). Then
  Ȳ = (Σ Y_i)/n is an unbiased estimator of θ

Remark (exercise)
Let {P_θ : θ ∈ Θ ⊂ R^k} be a parametric statistical model and let X ∼ P_θ. The mean
square error of θ̂(X) can be written as the sum of a variance term and a squared bias term

E_θ((θ̂ − θ)^2) = ∫ (θ̂ − θ)^2 p_θ(dθ̂) = ∫ (θ̂ − E_θ(θ̂))^2 p_θ(dθ̂) + (θ − E_θ(θ̂))^2 = V(θ̂) + B(θ̂)^2

If θ̂ is unbiased then the mean square error is equal to V(θ̂)
70
Fisher information matrix

Let {P_θ : θ ∈ Θ ⊂ R^k} be a parametric statistical model with P_θ probability
measures on (Ω, Σ) dominated by µ.

Smooth model
Throughout the following we will assume that the densities p_θ = dP_θ/dµ satisfy sufficient
'regularity conditions' allowing for differentiation w.r.t. θ and for the exchange of
integral and derivative.

Definition
Let ℓ_θ := log p_θ be the log likelihood and let ℓ̇_{θ,i} := ∂ℓ_θ/∂θ_i be the score
function(s). The Fisher information matrix is defined by

I_{i,j}(θ) := E_θ( ℓ̇_{θ,i} ℓ̇_{θ,j} ) = ∫ p_θ(ω)^{−1} (∂p_θ/∂θ_i)(∂p_θ/∂θ_j) µ(dω)
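For a discrete model the defining integral becomes a finite sum, which makes the Fisher information easy to approximate numerically. A minimal sketch (assuming numpy; the Bernoulli map and the value θ = 0.3 are illustrative), using a central finite difference for the score:

```python
import numpy as np

def fisher_info(p, theta, eps=1e-6):
    """Fisher information of a discrete model theta -> probability vector p(theta),
    via a finite-difference approximation of dp/dtheta."""
    p0 = p(theta)
    dp = (p(theta + eps) - p(theta - eps)) / (2 * eps)
    return np.sum(dp**2 / p0)   # sum_omega p^{-1} (dp/dtheta)^2

bernoulli = lambda t: np.array([1 - t, t])   # P([X=0]), P([X=1])
theta = 0.3
I = fisher_info(bernoulli, theta)
```

For the Bernoulli model this should reproduce the closed form I(θ) = 1/(θ(1 − θ)) derived later in the coin-toss example.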
71
Properties of the Fisher information matrix

- I(θ) is a positive definite real k × k matrix
- I(θ) is additive for products of independent models (exercise):
  if P_θ = P^{(1)}_θ × P^{(2)}_θ then I(θ) = I^{(1)}(θ) + I^{(2)}(θ)
- The Hellinger distance between infinitesimally close densities p_θ and p_{θ+dθ}
  is determined by the Fisher information

  h(p_θ, p_{θ+dθ})^2 = ∫ ( √(p_θ(ω)) − √(p_{θ+dθ}(ω)) )^2 µ(dω) = ¼ I(θ)(dθ)^2 + o((dθ)^2)

- The Fisher information matrix defines a Riemannian metric on Θ and the
  corresponding geodesic distance is the Bhattacharya distance

  d(p_{θ_1}, p_{θ_2}) = 2 arccos( ∫ √(p_{θ_1}(ω)) √(p_{θ_2}(ω)) µ(dω) )

- Let q_θ be the probability density of a randomisation Y of X (randomised
  statistic, Markov kernel), where X ∼ P_θ. Then

  d(q_{θ_1}, q_{θ_2}) ≤ d(p_{θ_1}, p_{θ_2})  and  h(q_{θ_1}, q_{θ_2}) ≤ h(p_{θ_1}, p_{θ_2})

- I(θ) is the unique metric contracting under all randomisations
72
The Cramér–Rao bound

Theorem (Cramér–Rao)
The following matrix inequality holds for any unbiased estimator θ̂

E_θ((θ̂ − θ)^2) = Var(θ̂) ≥ I(θ)^{−1}

where I(θ) is the Fisher information matrix.

Proof.
Let θ be one dimensional. The general case is left as an exercise.
By Cauchy–Schwarz

Var(θ̂) I(θ) = E_θ((θ̂ − θ)^2) E_θ(ℓ̇_θ^2) ≥ | E_θ((θ̂ − θ) ℓ̇_θ) |^2

The right side is

E_θ((θ̂ − θ) ℓ̇_θ) = E_θ(θ̂ ℓ̇_θ) − θ E_θ(ℓ̇_θ)
  = ∫ θ̂(ω) (dp_θ/dθ)(ω) µ(dω) − θ ∫ (dp_θ/dθ)(ω) µ(dω)
  = (d/dθ) ∫ θ̂(ω) p_θ(ω) µ(dω) − θ (d/dθ) ∫ p_θ(ω) µ(dω) = (d/dθ) E_θ(θ̂) = 1
73
Remarks on the Cramér–Rao bound

- One can similarly define unbiased estimators ĝ of g(θ) for a function
  g : Θ → R^p. The Cramér–Rao bound in this case is

  Var(ĝ) ≥ J(θ) I(θ)^{−1} J(θ)^T

  where J(θ)_{l,i} = ∂g(θ)_l/∂θ_i is the p × k Jacobian matrix.
- For certain models there exist no unbiased estimators, e.g. the binomial
  distribution b(θ, n) and the function g(θ) = θ^{−1} (exercise).
- Even if unbiased estimators exist, their variance may be too big.
- The Cramér–Rao bound is in general not attainable, but it becomes an
  equality if and only if the distributions form an exponential family

  p_θ = exp( Σ_{i=1}^s η_i(θ) ĝ_i(ω) − B(θ) ) h(ω)
74
Asymptotic efficiency

The theory of asymptotic efficiency shows that the Cramér–Rao bound is
asymptotically attained in the following sense.

Definition
Let {P_θ : θ ∈ Θ ⊂ R^k} be a parametric statistical model. Let X_1, . . . , X_n be
i.i.d. with distribution P_θ. An estimator θ̂_n = θ̂_n(X_1, . . . , X_n) is called
asymptotically efficient if

√n (θ̂_n − θ) −→_D N(0, I(θ)^{−1})

In particular, if θ is one dimensional then n E_θ((θ̂_n − θ)^2) → I(θ)^{−1}.

Theorem
Under regularity conditions, the maximum likelihood estimator

θ̂_n(X_1, . . . , X_n) = arg max_τ Π_{i=1}^n p_τ(X_i)

is asymptotically efficient.
75
Repeated coin toss example

Let P_θ be the Bernoulli distribution: P_θ([X = 1]) = θ and P_θ([X = 0]) = 1 − θ.
Let X_1, . . . , X_n be i.i.d. with distribution P_θ. Then

- X̄_n := (Σ_{i=1}^n X_i)/n is an unbiased estimator of θ.
- Var(X̄_n) = Var(X)/n, where Var(X) = θ(1 − θ)^2 + (1 − θ)(0 − θ)^2 = θ(1 − θ)
- The Fisher information is I(θ) = θ^{−1} + (1 − θ)^{−1} = 1/(θ(1 − θ))
- Thus X̄_n attains the Cramér–Rao bound.
- Moreover by the Central Limit Theorem we have

  √n (θ̂_n − θ) = (1/√n) Σ_{i=1}^n (X_i − θ) −→_D N(0, Var(X)) = N(0, θ(1 − θ))

  Hence θ̂_n is asymptotically efficient
- The maximum likelihood estimator is obtained by setting the derivative of the
  log-likelihood to zero

  (d/dθ) log( θ^{Σ_i X_i} (1 − θ)^{n − Σ_i X_i} ) = (Σ_i X_i)/θ − (n − Σ_i X_i)/(1 − θ) = 0

  with solution θ̂_n = X̄_n !
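The coin-toss claims are easy to confirm by simulation. The following Monte Carlo sketch (assuming numpy; θ, n and the number of repetitions are illustrative) checks that the mean-squared error of X̄_n matches the Cramér–Rao bound θ(1 − θ)/n.

```python
import numpy as np

rng = np.random.default_rng(0)
theta, n, reps = 0.3, 1000, 2000

# reps independent experiments, each of n coin tosses
X = rng.binomial(1, theta, size=(reps, n))
theta_hat = X.mean(axis=1)           # the MLE / sample mean for each experiment

mse = np.mean((theta_hat - theta)**2)
cr = theta * (1 - theta) / n         # Cramer-Rao bound I(theta)^{-1} / n
```

With these settings the ratio mse/cr should be close to 1, up to Monte Carlo noise of a few percent.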
76
Hypothesis testing

Problem
Let {P_0, P_1} be a binary statistical model over (Ω, Σ). Given X ∼ P_i, decide
which of the two hypotheses is true, i = 0 or i = 1.

The test t : Ω → {0, 1} is 'good' if its error probabilities are small
- type I error  P_0([t(X) = 1])
- type II error P_1([t(X) = 0])

There are two main approaches to optimality
1. fix a level α ∈ (0, 1) and look for a test that minimises
   β := P_1([t(X) = 0]) under the constraint P_0([t(X) = 1]) ≤ α
2. fix a prior (π_0, π_1) and find a test that minimises the average error
   probability P^π_e := π_0 P_0([t(X) = 1]) + π_1 P_1([t(X) = 0])

Remark
One can extend the problem to
- more hypotheses {P_1, . . . , P_k}
- composite hypotheses: θ ∈ Θ_0 vs θ ∈ Θ_1, where {Θ_0, Θ_1} is a partition of
  Θ and X ∼ P_θ
- randomised tests t = t(X, U) with U uniform on [0, 1]
77
Optimal tests

Let {P_0, P_1} be a binary statistical model over (Ω, Σ) and let p_0 and p_1 be the
densities of P_0 and P_1 w.r.t. a probability measure µ.

Lemma (Neyman–Pearson lemma)
Let α ∈ (0, 1) be a fixed level. Then there exists a constant k such that the
likelihood ratio test

t(ω) := { 0 if p_0(ω)/p_1(ω) > k
        { 1 if p_0(ω)/p_1(ω) ≤ k

satisfies P_0([t(X) = 1]) = α and minimises the type II error P_1([t(X) = 0])
among the α-level tests.

Lemma (optimal Bayes test)
Let (π_0, π_1) be a (nondegenerate) prior distribution. Then the likelihood ratio
test

t(ω) := { 0 if p_0(ω)/p_1(ω) > π_1/π_0
        { 1 if p_0(ω)/p_1(ω) ≤ π_1/π_0

has minimal average error

P^π_e := π_0 P_0([t(X) = 1]) + π_1 P_1([t(X) = 0]) = ½ ( 1 − ‖π_1 p_1 − π_0 p_0‖_1 )
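For a finite sample space the optimal Bayes test and its error formula can be checked directly. A small sketch (assuming numpy; the two three-point distributions and the uniform prior are illustrative):

```python
import numpy as np

p0 = np.array([0.5, 0.3, 0.2])
p1 = np.array([0.2, 0.3, 0.5])
pi0, pi1 = 0.5, 0.5

# likelihood-ratio (Bayes) test: decide 1 when pi0*p0 <= pi1*p1
t = (pi0 * p0 <= pi1 * p1).astype(int)

# average error of the test vs the closed-form expression
err = pi0 * p0[t == 1].sum() + pi1 * p1[t == 0].sum()
err_formula = 0.5 * (1 - np.abs(pi1 * p1 - pi0 * p0).sum())
```

Ties (where π_0 p_0 = π_1 p_1) may be broken either way without changing the average error, which is why the test above can use a non-strict inequality.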
78
Asymptotics: Stein's Lemma and Chernoff's bound

Let {P_0, P_1} be a binary statistical model and let X_1, . . . , X_n be i.i.d. with X_k ∼ P_i.

Theorem (Stein's Lemma)
Let t_n(X_1, . . . , X_n) be the most powerful level α test. Then

lim_{n→∞} (1/n) log P^n_1([t_n = 0]) = −D(p_0, p_1)

where D(p_0, p_1) is the relative entropy

D(p_0, p_1) = ∫ p_0(ω) log(p_0/p_1) µ(dω).

Theorem (Chernoff's bound)
Let (π_0, π_1) be a nondegenerate prior and let t_n(X_1, . . . , X_n) be the optimal Bayes
test. Then

lim_{n→∞} (1/n) log P^π_{e,n} = −C(p_0, p_1)

where C(p_0, p_1) is the Chernoff distance

C(p_0, p_1) = −log( inf_{0≤s≤1} ∫ p_0^s(ω) p_1^{1−s}(ω) µ(dω) )
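The Chernoff distance involves a one-dimensional optimisation over s ∈ [0, 1], which can be done numerically. A sketch assuming numpy and scipy (the two Bernoulli-type distributions are illustrative):

```python
import numpy as np
from scipy.optimize import minimize_scalar

p0 = np.array([0.8, 0.2])
p1 = np.array([0.3, 0.7])

def log_integral(s):
    # log of  sum_omega p0(omega)^s * p1(omega)^{1-s}
    return np.log(np.sum(p0**s * p1**(1 - s)))

res = minimize_scalar(log_integral, bounds=(0, 1), method='bounded')
chernoff = -res.fun                        # C(p0, p1)

# the relative entropy D(p0, p1), which upper-bounds the Chernoff distance
D = np.sum(p0 * np.log(p0 / p1))
```

The comparison 0 < C(p_0, p_1) ≤ D(p_0, p_1) reflects that the symmetric-error (Bayes) exponent is never better than the asymmetric Stein exponent.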
79
The quantum Cramér–Rao bound

Quantum statistical models
Quantum state estimation
The L^2(ρ) Hilbert space
The quantum Fisher–Helstrom information matrix
Quantum Cramér–Rao bound(s)
The quantum Cramér–Rao bound is achievable for Θ ⊂ R
Achievability of the quantum Cramér–Rao bound for Θ ⊂ R^k with k > 1
The right Cramér–Rao bound
The Holevo bound
80
Quantum statistical models

Definition
Let Θ be a parameter space. A quantum statistical model (experiment) over Θ
is a family {ρ_θ : θ ∈ Θ} of density matrices ρ_θ ∈ T_1(H) for a given space H.

Example
- qubit states: indexed by the Bloch vector r = (r_x, r_y, r_z) ∈ R^3 with ‖r‖ ≤ 1

  ρ_r = ½ ( 1 + r_z       r_x − i r_y )
        ( r_x + i r_y     1 − r_z     )

- coherent spin states: ρ^n_r = ρ_r ⊗ · · · ⊗ ρ_r, for ‖r‖ = 1 (pure states)
- unitary family: ρ_t = exp(−iHt) ρ exp(iHt) for t ∈ R, H selfadjoint
- quantum exponential family:

  ρ_θ = e^{−k(θ)} exp( Σ_i γ̄_i(θ) T*_i ) ρ_0 exp( Σ_i γ_i(θ) T_i ),   γ_i(θ) ∈ C, T_i ∈ B(H)

- Gaussian states of a quantum harmonic oscillator Φ(z, V) with mean
  z ∈ C and complex 2 × 2 'covariance matrix' V
81
Quantum state estimation

Problem
Given
- a quantum statistical model {ρ_θ : θ ∈ Θ}
- a loss function W : Θ × Θ → R_+, e.g. ‖θ̂ − θ‖^2 for Θ ⊂ R^k, or ‖ρ̂ − ρ‖_1 if Θ ⊂ S(H), etc.

design a measurement M and an estimator θ̂(X), where X is the outcome of
the measurement, such that

R(M, θ̂, θ) = E_θ(W(θ̂(X), θ))

is small.

[Diagram (quantum measurement): the state ρ_θ is fed into a measurement
apparatus M, which produces a result X ∼ P^M_θ, from which the estimator θ̂(X)
is computed.]

Remark
- the same problem can be formulated for estimating a function g(θ)
- the main quantum feature is the optimisation over the measurement step
- measurement and estimator can be 'bundled' into a measurement with
  values in Θ
82
The L^2(ρ) Hilbert space

Definition
Let ρ be a positive operator in T_1(H). On the R-linear space of bounded
selfadjoint operators B(H)_sa define the inner product

⟨A, B⟩_ρ := Tr(ρ A ◦ B),   A ◦ B = ½(AB + BA)

L^2_R(ρ) is the Hilbert space completion of B(H)_sa with respect to ⟨·, ·⟩_ρ

Remark
- A, B ∈ B(H) correspond to the same vector in L^2_R(ρ) if Tr(ρ(A − B)^2) = 0
  (relevant when ρ has eigenvalues equal to zero).
- It can be shown that each vector in L^2_R(ρ) can be identified with (the
  equivalence class of) a square summable operator w.r.t. ρ, i.e. an unbounded
  symmetric linear operator satisfying

  Σ_i λ_i ‖X e_i‖^2 < ∞

  where ρ = Σ_i λ_i |e_i⟩⟨e_i| is the spectral decomposition of ρ
- equivalently, X is square summable iff X√ρ is a Hilbert–Schmidt operator,
  i.e. ‖X√ρ‖_2^2 = Tr((X√ρ)*(X√ρ)) = ‖X‖_ρ^2 < ∞
83
The quantum Fisher–Helstrom information matrix

Let {ρ_θ : θ ∈ Θ} be a parametric quantum statistical model with ρ_θ ∈ T_1(H) and
Θ ⊂ R^k open. Let (L^2_R(ρ_θ), ⟨·, ·⟩_θ) be the L^2 space w.r.t. ρ_θ.

Assume that
- θ → ρ_θ is differentiable as a function with values in T_1(H)
- the linear functional on B(H)

  A → (∂/∂θ_i) Tr(A ρ_θ) = Tr( (∂ρ_θ/∂θ_i) A )

  can be extended to a continuous functional on L^2_R(ρ_θ) for all i = 1, . . . , k

Then by the Riesz Theorem there exists a unique vector 𝓛_{θ,i} ∈ L^2_R(ρ_θ), called the
symmetric logarithmic derivative (s.l.d.), such that

Tr( (∂ρ_θ/∂θ_i) A ) = ⟨𝓛_{θ,i}, A⟩_θ = Tr( (ρ_θ ◦ 𝓛_{θ,i}) A )

or equivalently,

∂ρ_θ/∂θ_i = 𝓛_{θ,i} ◦ ρ_θ

The quantum Fisher–Helstrom information matrix is defined as

H(θ)_{i,j} = ⟨𝓛_{θ,i}, 𝓛_{θ,j}⟩_θ
84
Example (exercise)
Let ρ_r ∈ M(C^2) be the state with Bloch vector r represented in polar
coordinates r ↔ (r, θ, φ)

ρ_r = ½ ( 1 + r cos θ       r sin θ e^{−iφ} )
      ( r sin θ e^{iφ}      1 − r cos θ     )  = ½(1 + r·σ)

The symmetric logarithmic derivatives are the solutions of

∂ρ_r/∂r = 𝓛_r ◦ ρ_r,   ∂ρ_r/∂θ = 𝓛_θ ◦ ρ_r,   ∂ρ_r/∂φ = 𝓛_φ ◦ ρ_r

and are given by

𝓛_r = (σ·r/r − r)/(1 − r^2),   𝓛_θ = (∂r/∂θ)·σ,   𝓛_φ = (∂r/∂φ)·σ.

The quantum Fisher–Helstrom information matrix is

H(r) = diag( 1/(1 − r^2),  r^2,  r^2 sin^2 θ )
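The defining equation ∂ρ/∂θ_i = 𝓛 ◦ ρ is a Lyapunov-type equation, so the s.l.d. can be computed numerically. A sketch assuming numpy/scipy, for the simplest case of a qubit with Bloch vector (r, 0, 0) and the radial parameter (the value r = 0.6 is illustrative):

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

sx = np.array([[0.0, 1.0], [1.0, 0.0]])
r = 0.6
rho = 0.5 * (np.eye(2) + r * sx)    # qubit state with Bloch vector (r, 0, 0)
drho = 0.5 * sx                      # d rho / d r

# the SLD solves  rho L + L rho = 2 drho  (i.e. drho = L o rho)
L = solve_continuous_lyapunov(rho, 2 * drho)

# quantum Fisher information  H_rr = <L, L>_rho = Tr(rho L^2)
H_rr = np.real(np.trace(rho @ L @ L))
```

For this one-parameter family the result should reproduce the first diagonal entry of H(r), namely 1/(1 − r²).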
85
Properties of the quantum Fisher–Helstrom information matrix

- H(θ) is a real positive definite matrix
- additivity: if ρ_θ = ρ^{(1)}_θ ⊗ ρ^{(2)}_θ then H(θ) = H^{(1)}(θ) + H^{(2)}(θ) (exercise)
- metric: the Bures (fidelity) distance between infinitesimally close states ρ_θ
  and ρ_{θ+dθ} is given by the quantum Fisher–Helstrom information

  b(ρ_θ, ρ_{θ+dθ})^2 = ¼ H(θ)(dθ)^2 + o((dθ)^2)

- contractivity: let C : T_1(H) → T_1(K) be a quantum channel
  (completely positive, trace preserving linear map).
  Let τ_θ := C(ρ_θ) be the quantum model obtained by applying the
  'quantum randomisation' C to ρ_θ. Then

  b(ρ_{θ_1}, ρ_{θ_2}) ≥ b(τ_{θ_1}, τ_{θ_2}),  and  H(ρ_θ) ≥ H(τ_θ)

- unlike the classical case, H is not the unique contractive metric. Such
  metrics are in one-to-one correspondence with operator monotone
  functions f : R_+ → R (i.e. f(A) ≥ f(B) for all A ≥ B ≥ 0 in B(H))
  satisfying f(t) = t f(t^{−1}) and f(1) = 1

Reference: D. Petz, Linear Algebra Appl. 244, 81–96, (1996)
86
Quantum Cramér–Rao bound (I)

Theorem
Let Q := {ρ_θ : θ ∈ Θ ⊂ R^k} be a quantum statistical model with ρ_θ ∈ B(H)
and denote by H(θ) the associated quantum Fisher information matrix.
Let M be a measurement with outcomes in (Ω, Σ) and let P^{(M)}_θ := M_*(ρ_θ).
Let P_M := {P^{(M)}_θ : θ ∈ Θ} be the classical model associated to (Q, M), and let
I_M(θ) be its Fisher information matrix. Then the matrix inequality

I_M(θ) ≤ H(θ)

holds, and in particular, for any unbiased estimator θ̂ of θ we have

Var(θ̂) ≥ I_M(θ)^{−1} ≥ H(θ)^{−1}

Remark
- In the last display, the left inequality is the 'classical' Cramér–Rao bound.
- The right inequality follows by applying the operator monotone function
  f(x) = −x^{−1} to the previous inequality I_M(θ) ≤ H(θ).
- A function f is called operator monotone if f(A) ≤ f(B) for all A, B ∈ B(H)
  satisfying 0 ≤ A ≤ B. Not all monotone functions are operator monotone
  (exercise)
87
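The information inequality I_M(θ) ≤ H(θ) can be illustrated for the one-parameter qubit family ρ_θ = (1 + θσ_x)/2 measured in a rotated basis. A sketch assuming numpy (the parameter value and measurement angle are illustrative; the formulas for the outcome probabilities follow from Tr(ρ_θ M_±)):

```python
import numpy as np

theta, a = 0.5, 0.7            # model parameter and measurement angle
H = 1 / (1 - theta**2)         # quantum Fisher information of rho_theta = (1 + theta*sx)/2

# projective measurement along cos(a)*sx + sin(a)*sz: two-outcome probabilities
p = np.array([(1 + theta * np.cos(a)) / 2, (1 - theta * np.cos(a)) / 2])
dp = np.array([np.cos(a) / 2, -np.cos(a) / 2])   # d p / d theta

I_M = np.sum(dp**2 / p)        # classical Fisher information of the outcomes
```

For a = 0 the measurement is along σ_x itself and I_M attains H; for any other angle the classical information is strictly smaller.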
Proof: the case of a PVM

1. Suppose first that M is a PVM. The general case is reduced to this by
Naimark's theorem (next page).
We show that there exists an isometry I : L^2_R(p_θ) → L^2_R(ρ_θ) such that
I*(𝓛_{θ,i}) = ℓ̇_{θ,i}, which implies I_M(θ) ≤ H(θ).

- Let L^2_R(p_θ) = {f : Ω → R : E_θ(f^2) < ∞} be the Hilbert space with inner
  product ⟨f, g⟩_θ = E_θ(fg)
- The score functions ℓ̇_{θ,i} are elements of L^2_R(p_θ) and I_M(θ)_{i,j} = ⟨ℓ̇_{θ,i}, ℓ̇_{θ,j}⟩_θ
- Recall that we defined M : L^∞(Ω, Σ, µ) → B(H). Then

  ⟨f, g⟩_θ = E_θ(fg) = Tr(ρ_θ M(fg)) = ⟨M(f), M(g)⟩_θ

  so M can be extended to an isometry I : L^2_R(p_θ) → L^2_R(ρ_θ)
- we show that I*(𝓛_{θ,i}) = ℓ̇_{θ,i}. Indeed for every f ∈ L^2(p_θ)

  ⟨f, ℓ̇_{θ,i}⟩_θ = ∫ f(ω) (∂p_θ/∂θ_i)(ω) µ(dω) = ∂E_θ(f)/∂θ_i
              = ∂Tr(ρ_θ M(f))/∂θ_i = Tr( (∂ρ_θ/∂θ_i) M(f) ) = ⟨I(f), 𝓛_{θ,i}⟩_θ
88
Proof: the Naimark Theorem argument

Then I_M(θ) ≤ H(θ) since

Σ_{i,j} c_i c_j I_M(θ)_{i,j} = ‖ Σ_i c_i ℓ̇_{θ,i} ‖²_θ = ‖ I*( Σ_i c_i 𝓛_{θ,i} ) ‖²_θ
                          ≤ ‖ Σ_i c_i 𝓛_{θ,i} ‖²_θ = Σ_{i,j} c_i c_j H(θ)_{i,j}

2. Now let M be a general measurement given by a POVM on H.

- By Naimark's Theorem there exists an isometry V : H → K such that
  M(B) = V* P(B) V, with {P(B)} a PVM.
- The map A → V A V* : B(H)_sa → B(K)_sa extends to an isometric isomorphism
  O : L^2(ρ_θ) → L^2(ρ̃_θ), where ρ̃_θ := V ρ_θ V* is the embedded state
  (exercise).
  In particular 𝓛̃_{θ,i} = O 𝓛_{θ,i} O^{−1} and H(ρ̃_θ) = H(ρ_θ) = H(θ)
- When measuring ρ̃_θ with {P(B)} we get the same distribution P_θ as when
  measuring ρ_θ with {M(B)}, and hence the same Fisher information.
- We can now apply the proof for the PVM case
89
Quantum Cramér–Rao bound (II)

Theorem (Helstrom, Belavkin, Holevo)
Let Q := {ρ_θ : θ ∈ Θ ⊂ R^k} be a quantum statistical model with ρ_θ ∈ B(H)
and denote by H(θ) the associated quantum Fisher information matrix.
Let M be an unbiased measurement with values in Θ, i.e. the result θ̂ ∼ P^{(M)}_θ
is an unbiased estimator of θ.
Define the operators

X^M_i = ∫ x_i M(dx),   i = 1, . . . , k

as elements of L^2(ρ_θ), and the 'quantum covariance matrix'

V^M(θ)_{i,j} := ⟨X^M_i − θ_i, X^M_j − θ_j⟩_θ

Then

Var(θ̂) ≥ V^M(θ) ≥ H(θ)^{−1}
90
Proof of the quantum Cramér–Rao Theorem (II)

1. We first prove Var(θ̂) ≥ V^M(θ)

- We use again Naimark's theorem (M(dx) = V* P(dx) V) to obtain

  X^M_i = ∫ x_i M(dx) = V* ( ∫ x_i P(dx) ) V = V* X^P_i V

- Let Y^M(c) := Σ_i c_i (X^M_i − θ_i) and Y^P(c) := Σ_i c_i (X^P_i − θ_i).
  Then (exercise)

  c^T V^M(θ) c = Tr( ρ_θ (Y^M_c)^2 ) = Tr( ρ_θ (V* Y^P_c V)^2 )
             = Tr( ρ̃_θ Y^P_c V V* Y^P_c ) ≤ Tr( ρ̃_θ (Y^P_c)^2 )
             = E_θ( ( Σ_i c_i (θ̂_i − θ_i) )^2 ) = c^T Var(θ̂) c
91
Proof of the quantum Cramér–Rao Theorem (II)

2. We now prove the second inequality for one dimensional θ. The general case
is left as an exercise.

- By Cauchy–Schwarz we have

  ‖𝓛_θ‖²_θ ‖Y^M‖²_θ ≥ |⟨𝓛_θ, Y^M⟩_θ|²

- Since H(θ) = ‖𝓛_θ‖²_θ and V^M(θ) = ‖Y^M‖²_θ, it suffices to show that
  ⟨𝓛_θ, Y^M⟩_θ = 1.
- By using the isomorphism O : L^2(ρ_θ) → L^2(ρ̃_θ), the isometry
  I : L^2(p_θ) → L^2(ρ̃_θ), the fact that I*(𝓛̃_θ) = ℓ̇_θ, and Y^P = I(f) for
  f(x) = x − θ, we get

  ⟨𝓛_θ, Y^M⟩_θ = ⟨𝓛̃_θ, Y^P⟩_θ = ⟨𝓛̃_θ, I(f)⟩_θ = ⟨I*(𝓛̃_θ), f⟩_θ
              = ∫ ℓ̇_θ(x)(x − θ) p_θ(x) µ(dx) = 1.
92
The quantum Cramér–Rao bound is asymptotically achievable for Θ ⊂ R

By measuring 𝓛_{θ_0} in the state ρ_{θ_0}, for some fixed θ_0, we obtain a random variable L
with mean and variance

E_{θ_0}(L) = Tr(ρ_{θ_0} 𝓛_{θ_0}) = 0,   Var_{θ_0}(L) = Tr(ρ_{θ_0} 𝓛²_{θ_0}) = H(θ_0)

Then

θ̂ := L/H(θ_0) + θ_0

is a locally unbiased estimator of θ around θ_0, since

E_θ(θ̂) = θ_0 + Tr(ρ_θ 𝓛_{θ_0})/H(θ_0)
       = θ_0 + dθ · Tr( (dρ_θ/dθ) 𝓛_{θ_0} )/H(θ_0) + o(dθ)
       = θ_0 + dθ · Tr( ρ_{θ_0} 𝓛²_{θ_0} )/H(θ_0) + o(dθ) = θ + o(dθ)

and its variance is

Var(θ̂) = Var(L)/H(θ_0)² = H(θ_0)^{−1}
93
The quantum Cramér–Rao bound is asymptotically achievable for Θ ⊂ R

However the measurement depends on θ_0 and is only 'locally optimal'. The
argument can be made rigorous in the asymptotic framework using an
adaptive measurement procedure:

1. Consider n independent, identically prepared quantum systems. The
   corresponding statistical model is Q_n := {ρ_θ^{⊗n} : θ ∈ Θ}
2. The s.l.d. is given by the sum of the individual s.l.d.'s

   𝓛^{(n)}_θ = 𝓛_θ ⊗ 1 ⊗ · · · ⊗ 1 + · · · + 1 ⊗ · · · ⊗ 𝓛_θ

   and the Fisher–Helstrom information is H^{(n)}(θ) = n H(θ)
3. We perform a simple measurement (e.g. separate, identical,
   informationally complete measurements on each system) on a small
   fraction ñ ≪ n of the systems and compute a rough estimator θ̃_n of θ
4. On the rest of the systems we measure the s.l.d. 𝓛^{(n)}_θ at θ = θ̃_n and
   compute the locally unbiased θ̂_n.
5. This estimator is efficient:

   √n (θ̂_n − θ) −→_D N(0, H(θ)^{−1})
94
Achievability of the quantum Cramér–Rao bound for Θ ⊂ R^k with k > 1

1. If the s.l.d.'s commute with each other, i.e. [𝓛_{θ,i}, 𝓛_{θ,j}] = 0 for all
   i, j = 1, . . . , k, then they can be measured simultaneously (exercise) and
   the previous argument leads to an efficient estimator θ̂_n.
2. However, if the s.l.d.'s do not commute with each other, there may not
   exist locally unbiased estimators which achieve the quantum Cramér–Rao
   bound.
   Asymptotically, the variance H(θ)^{−1} can be achieved iff the weaker form
   of commutativity holds: Tr(ρ_θ [𝓛_{θ,i}, 𝓛_{θ,j}]) = 0.
3. Although the bound is in general not achievable, it is sharp in the sense
   that if V(M, θ) ≥ K^{−1}(θ) for all locally unbiased measurements, then
   H(θ)^{−1} ≥ K^{−1}(θ).
4. What is a 'good estimator' in this case?
   The answer depends on the particular form of the loss function. If
   G ∈ M(R^k) is a positive matrix we define the loss function

   W(θ̂, θ) = Σ_{i,j} (θ̂_i − θ_i) G_{i,j} (θ̂_j − θ_j) = (θ̂ − θ)^T G (θ̂ − θ)

   The risk is given by R(θ̂, θ, G) = E_θ W(θ̂, θ) = Tr(G Var(θ̂)) and the optimal
   measurement procedure will depend on G...
95
The right logarithmic derivative

Definition
1. Let ρ ∈ T_1(H) be a state. Define L^2_+(ρ) to be the complex Hilbert space
   obtained as the completion of B(H) with respect to the inner product

   (X, Y)_ρ := Tr(ρ Y X*)

2. Let Q := {ρ_θ : θ ∈ Θ ⊂ R^k} be a quantum statistical model on H.
   Assume that
   - ρ_θ is differentiable in T_1(H)
   - the functional A → ∂Tr(ρ_θ A)/∂θ_i = Tr( (∂ρ_θ/∂θ_i) A ) on B(H) can be extended to a
     continuous linear functional on L^2_+(ρ_θ).
   The right logarithmic derivative L_{θ,i} is defined as the vector in L^2_+(ρ_θ)
   satisfying Tr( (∂ρ_θ/∂θ_i) A ) = (L_{θ,i}, A)_θ, or equivalently,

   ∂ρ_θ/∂θ_i = ρ_θ L_{θ,i}

3. The right information matrix is defined by

   J(θ)_{i,j} = (L_{θ,i}, L_{θ,j})_θ
96
The right Cramér–Rao bound

Theorem (Yuen and Lax, Belavkin)
Let Q := {ρ_θ : θ ∈ Θ ⊂ R^k} be a quantum statistical model with ρ_θ ∈ B(H)
and denote by J(θ) the associated right information matrix.
Let M be an unbiased measurement with values in Θ, i.e. the result θ̂ ∼ P^{(M)}_θ
is an unbiased estimator of θ.
Define the operators

X^M_i = ∫ x_i M(dx),   i = 1, . . . , k

as elements of L^2_+(ρ_θ), and the 'right quantum covariance matrix'

V^M_+(θ)_{i,j} := (X^M_i − θ_i, X^M_j − θ_j)_θ

Then

Var(θ̂) ≥ V^M_+(θ) ≥ J(θ)^{−1}

where all matrices are considered as elements of M(C^k).
97
Comparison of the symmetric and right (left) Cramér–Rao bounds

1. If θ is one dimensional then the symmetric bound is at least as informative
   as the right bound:

   H(θ) ≤ J(θ)

   Indeed the variance H(θ)^{−1} is achieved by measuring 𝓛_θ (locally unbiased
   measurement), hence the right bound implies that H(θ)^{−1} ≥ J(θ)^{−1}
2. For certain models the right bound is better than the symmetric one. For
   example in the case of mixed Gaussian states of a harmonic oscillator
   G(z, V) with fixed V and unknown z, the right bound is achieved in the
   sense that for any fixed positive matrix G there exists an unbiased
   estimator ẑ such that

   Tr(G Var(ẑ)) = Tr(G J(z)^{−1})

   The measurements leading to these estimators depend however on G, and
   are incompatible with each other.
98
The Holevo bound for quadratic risk

Let Q = {ρ_θ : θ ∈ Θ ⊂ R^k} be a quantum statistical model on H.
Let W(θ̂, θ) be a quadratic loss function, i.e.

W(θ̂, θ) = Σ_{i,j} (θ̂_i − θ_i) G_{ij} (θ̂_j − θ_j) = (θ̂ − θ)^T G (θ̂ − θ)

The risk of an unbiased estimator θ̂ is given by

R(θ̂, θ, G) = E_θ(W(θ̂, θ)) = Σ_{i,j} G_{ij} E_θ((θ̂_i − θ_i)(θ̂_j − θ_j)) = Tr(G Var(θ̂))

Theorem (Holevo bound)
Let M(dθ̂) be an unbiased measurement. Then

Tr(G Var(θ̂)) ≥ inf_{X_θ} { Tr( √G Re(Z(X_θ)) √G ) + Tr( | √G Im(Z(X_θ)) √G | ) }

where X_θ := (X_{θ,1}, . . . , X_{θ,k}), with X_{θ,i} symmetric elements of L^2_+(ρ_θ) satisfying

Tr(ρ_θ X_{θ,i}) = 0,   Tr( (∂ρ_θ/∂θ_i) X_{θ,j} ) = δ_{i,j},

and Z(X_θ)_{i,j} := (X_{θ,i}, X_{θ,j})_θ = Tr(ρ_θ X_{θ,j} X_{θ,i}).
99
Proof of the Holevo bound

For simplicity we take G = 1. The general case is left as an exercise.

- It is enough to prove the bound for the special X_θ of the form

  X_{θ,i} = ∫ x_i M(dx) − θ_i

- Check the duality between X_{θ,j} and ∂ρ_θ/∂θ_i:

  Tr( (∂ρ_θ/∂θ_i) X_{θ,j} ) = (∂/∂θ_i) Tr(ρ_θ X_{θ,j}) − Tr( ρ_θ ∂X_{θ,j}/∂θ_i )
                          = (∂/∂θ_i) Tr(ρ_θ X_{θ,j}) + Tr(ρ_θ) δ_{i,j} = δ_{i,j}

- As in the Cramér–Rao bound (II) it can be shown that

  Var(θ̂) ≥ Z(X_θ)

- Lemma (proof left as exercise): if V is a real symmetric k × k matrix, Z is a
  hermitian (complex) matrix and V ≥ Z, then

  Tr(V) ≥ Tr(Re(Z)) + Tr(|Im Z|)

- Apply the Lemma with V = Var(θ̂) and Z = Z(X_θ)
100
The Holevo bound is achievable (asymptotically)

1. The Holevo bound is achieved in the case of quantum Gaussian shift
   models, i.e. Gaussian states of quantum oscillators with unknown means
   and fixed, known covariance. This will be discussed in detail in the
   following sections.
2. The Holevo bound is achieved asymptotically for i.i.d. models of finite
   dimensional states, i.e. ρ_θ ⊗ · · · ⊗ ρ_θ with ρ_θ ∈ M(C^d).
   The measurement consists of a two step adaptive procedure (as in the
   case of a one-dimensional parameter), with the difference that in the second
   step one needs to perform a joint measurement (not separable) on the
   n − ñ systems. The measurement can be understood by showing that the
   n particle model 'converges' to a Gaussian model for which the solution is
   known.
   A proof based on Cramér–Rao analysis is given for d = 2 in
   M. Hayashi and K. Matsumoto: arXiv:quant-ph/0411073
   For the general case d < ∞ the result follows from the theory of 'local
   asymptotic normality' developed in
   J. Kahn and M. Guță: arXiv:0804.3876 [quant-ph]
101
Covariant measurements
Group covariant quantum statistical models
Covariant measurements
The covariant quantum estimation problem
Optimal covariant measurements
Structure of covariant measurements
The optimal measurement in the case of irreducible representations
Example: estimation of pure states
Reference:
A. S. Holevo: Probabilistic and statistical aspects of quantum theory (1982)
102
Covariant quantum statistical models

Definition (covariant statistical models)
Let G be a group of transformations of a set Θ and denote the action by
θ → gθ for θ ∈ Θ, g ∈ G.
Let U : G → U(H) be a unitary representation of G on H.
A quantum statistical model {ρ_θ : θ ∈ Θ} on H is called covariant if

ρ_{gθ} = U(g) ρ_θ U(g)*,   g ∈ G, θ ∈ Θ

Example
- the set of pure states ρ_P = P, with P a one dimensional projection in C^d, is
  covariant under the action of SU(d) given by P → U P U*
- shift parameter: the time evolved states ρ_t := exp(−iHt) ρ exp(iHt) are
  covariant with respect to the representation of R given by U(t) = exp(iHt)
- orientation parameter: let U : SO(3) → U(H) be a unitary representation and
  let n → gn be the action on S^2 by rotations.
  The model {ρ_n := U(g) ρ_{n_0} U(g)* : n ∈ S^2} is covariant,
  provided that ρ_{n_0} = U(g) ρ_{n_0} U(g)* for all g s.t. gn_0 = n_0
103
Covariant measurements

Definition (covariant measurements)
Let G be a group of (measurable) transformations of a measure space (Ω, Σ)
and denote the action by ω → gω for ω ∈ Ω, g ∈ G.
Let U : G → U(H) be a unitary representation of G on H.
A measurement on H with outcomes in Ω is called covariant if

U(g)* M(B) U(g) = M(g^{−1}B)

where gB = {ω : ω = gω′, ω′ ∈ B}.

Example
- Let Q, P be the position and momentum of a quantum particle, so that
  exp(−ixP) Q exp(ixP) = Q − x1. The measurement of Q is covariant with
  respect to U(x) := exp(ixP) (exercise).
- The triad measurement is covariant with respect to a unitary
  representation of S(3) (exercise).
104
The covariant quantum estimation problem

Problem (covariant quantum estimation)
Given
- an action of G on Θ
- a unitary representation U : G → U(H)
- a covariant model {ρ_θ : θ ∈ Θ} on H
- an invariant loss function: W(θ̂, θ) = W(gθ̂, gθ) for all g ∈ G

find the 'optimal' measurement for estimating θ.

Remark
Let θ̂ be the result of a measurement M. The risk at θ is R(θ, M) := E_θ W(θ̂, θ).
By optimal we mean a measurement that minimises the maximum risk

R(M) = sup_θ R(θ, M)

Alternatively, we can look for a measurement that minimises the Bayesian risk

R(π, M) = ∫_Θ π(dθ) R(θ, M)

for a prior π on Θ that is invariant under the action of G.
105
Transitive actions of compact groups

From now on we will assume for simplicity that
- Θ ⊂ R^k is a smooth manifold
- G is a compact Lie group
- the action of G on Θ is continuous and transitive, i.e.
  for any θ ∈ Θ there exists g ∈ G such that θ = gθ_0, for some fixed θ_0
- H is finite dimensional

Remark
- On a compact Lie group there is a unique left (and right) invariant
  probability measure µ, i.e. µ(A) = µ(gA) = µ(Ag), called the Haar measure
- Let G_0 = {g : gθ_0 = θ_0} ⊂ G be the stationary group of θ_0. Transitivity
  implies Θ ≅ G/G_0.
- on Θ there is a unique invariant measure ν given by
  ν(B) = µ({g : gθ_0 ∈ B}) (exercise)
106
Optimal covariant measurements

Theorem (covariant measurements achieve the optimal risk)
In the covariant quantum estimation problem the minima of the Bayesian risk
R(π, M) and of the maximum risk R(M) are achieved on a covariant measurement.
Moreover, if M is covariant, then

R(π, M) = R(M) = R(θ, M)   for all θ
107
Optimal covariant measurements: proof

For any measurement M and g ∈ G we can define a new measurement M_g by

M_g(B) = U(g)* M(gB) U(g)

Using the covariance of {ρ_θ : θ ∈ Θ} and the invariance of W(θ̂, θ) we get

R(M_g, θ) = ∫ W(θ̂, θ) Tr( ρ_θ M_g(dθ̂) ) = ∫ W(θ̂, θ) Tr( ρ_θ U(g)* M(g dθ̂) U(g) )
         = ∫ W(θ̂, gθ) Tr( U(g) ρ_θ U(g)* M(dθ̂) ) = R(M, gθ)

In particular if M is covariant then R(M, θ) = R(M, gθ)

Thus R(ν, M) = R(ν, M_g), since ν is an invariant measure on Θ
108
Optimal covariant measurements: proof

The averaged measurement

M̄(B) = ∫_G M_{g^{−1}}(B) µ(dg)

is covariant and

R(M̄, ν) = ∫ R(M_{g^{−1}}, ν) µ(dg) = R(M, ν)    (5.1)

Moreover

R(M) = sup_θ R(M, θ) ≥ R(M, ν) = R(M̄, ν)    (5.2)

(5.1) and (5.2) say that the covariant measurement M̄ is at least as good as M
109
Structure of covariant measurements
Theorem (structure of covariant measurements)
Let m_0 ∈ B(H) be a positive operator which commutes with the operators {U(g) : g ∈ G_0} and satisfies
∫_G U(g) m_0 U(g)* µ(dg) = 1.
Define m(θ) := U(g) m_0 U(g)* where g : θ_0 → θ. Then
M(B) := ∫_B m(θ) ν(dθ), B ∈ Σ(Θ)
is (the POVM of) a covariant measurement.
Conversely, any covariant measurement is of this form.
Remark
By considering B such that ν(B) is small enough we get M(B) < 1.
Thus a covariant measurement on a finite dimensional space cannot be projection valued.
110
Structure of covariant measurements: proof
1. Direct statement
Note that m(θ) is well defined due to the property m_0 = U(g) m_0 U(g)* for g ∈ G_0.
Positivity and σ-additivity follow directly from the definitions
Using ν(B) = µ({g : gθ_0 ∈ B}) we obtain the normalisation
∫_Θ M(dθ) = ∫_Θ m(θ) ν(dθ) = ∫_G U(g) m_0 U(g)* µ(dg) = 1
2. Converse
We apply the measurement density Lemma to obtain that M(dθ) = m(θ) ν(dθ) where m(θ) is a unique positive operator density (ν-almost surely)
The covariance implies
∫_B U(g)* m(θ) U(g) ν(dθ) = ∫_{g⁻¹B} m(θ) ν(dθ) = ∫_B m(g⁻¹θ) ν(dθ)
and since the density is unique, we obtain U(g)* m(θ) U(g) = m(g⁻¹θ)
Choose m_0 = m(θ_0) and check that it satisfies the conditions (exercise)
111
Covariant measurements for irreducible representations
Deﬁnition (irreducible representation)
A unitary representation U : G → B(H) is called irreducible (irrep) if the only subspaces of H that are invariant under U are H and {0}.
Lemma (Schur's lemma)
Let U : G → B(H) be an irreducible representation. An operator A ∈ B(H) commutes with U(g) for all g ∈ G iff A = c1 for some c ∈ C.
Proposition (measurement seed for irreps)
There exists a one-to-one correspondence between covariant measurements with respect to an irreducible representation U : G → B(H) and density matrices s_0 commuting with {U(g) : g ∈ G_0}:
M(dθ) = d U(g) s_0 U(g)* ν(dθ), d = dim(H)   (5.3)
112
Covariant measurements for irreducible representations
Proof.
All irreps of a compact group are finite dimensional.
The expression (5.3) defines a measurement iff m_0 := d s_0 satisfies the normalisation
∫ U(g) m_0 U(g)* µ(dg) = 1   (5.4)
Since the integral (5.4) commutes with all U(g), and U is irreducible, it is proportional to 1 for arbitrary m_0 (Schur's Lemma).
By taking the trace on both sides of (5.4)
∫ Tr(U(g) m_0 U(g)*) µ(dg) = Tr(m_0) = Tr(1) = d
hence s_0 = m_0/d is a density matrix.
113
The optimal measurement in the case of irreducible representations
Proposition (optimal seed for irreps)
Let U : G → B(H) be an irreducible representation of G acting on Θ. Let {ρ_θ : θ ∈ Θ} be a covariant quantum statistical model on H = C^d.
1. The risk of a covariant measurement M(dθ) = d s_θ ν(dθ) is equal to
R(M) = d Tr(W_0 s_0)
where W_0 is the positive operator
W_0 = ∫ W(θ, θ_0) U(g)* ρ_{θ_0} U(g) ν(dθ).
2. The optimal covariant measurement has 'seed' s_0 given by
s_0 = (1/d_min) P_min
where P_min is the projection onto the eigenspace of W_0 corresponding to the minimal eigenvalue, and d_min is the dimension of that eigenspace.
114
The optimal measurement in the case of irreducible representations
Proof.
As shown before, we can restrict to covariant measurements and the risk is
R(M) = R(M, θ_0) = d ∫ W(θ, θ_0) Tr(ρ_{θ_0} s(θ)) ν(dθ) = d ∫ W(θ, θ_0) Tr(U(g)* ρ_{θ_0} U(g) s_0) ν(dθ) = d Tr(W_0 s_0)
where W_0 is the positive operator (exercise: verify self-adjointness)
W_0 = ∫ W(θ, θ_0) U(g)* ρ_{θ_0} U(g) ν(dθ)
The minimum of Tr(W_0 s_0) over all density matrices s_0 is achieved at P_min/d_min, with P_min the eigenprojection corresponding to the minimal eigenvalue of W_0 (exercise)
115
Example: estimation of pure states
Let Q := {ρ_θ = |θ⟩⟨θ| : θ ∈ Θ} be the family of pure states where
|θ⟩ = Σ_{i=1}^d θ_i |e_i⟩ ∈ C^d and Σ_{i=1}^d |θ_i|² = 1
Let W(θ̂, θ) = 1 − |⟨θ̂, θ⟩|² be the fidelity distance
Remark (exercise)
Strictly speaking, the state determines the vector |θ⟩ only up to a phase factor. This can be taken into account by fixing the phase of one of the coefficients
The quantum statistical model Q is covariant with respect to the (irreducible) representation of the special unitary group SU(d) on C^d
The loss function W(θ̂, θ) is invariant under the action of SU(d)
116
Example: estimation of pure states
Theorem (optimal estimation of pure states)
The optimal covariant measurement for the above quantum estimation problem is
M(dθ) = d |θ⟩⟨θ| ν(dθ)
where ν is the unique SU(d)-invariant measure on Θ.
117
Example: estimation of pure states
Proof.
Let θ_0 = (1, 0, . . . , 0). The stationary group of θ_0 is G_0 ≅ SU(d−1), consisting of unitaries U acting on C^{d−1} = Lin{e_2, . . . , e_d} and leaving e_1 fixed.
According to the Proposition 'measurement seed for irreps'
s_0 = λ P_0 + ((1 − λ)/(d − 1)) P_0^⊥, λ ∈ [0, 1]
where P_0 = |θ_0⟩⟨θ_0|.
According to the Proposition 'optimal seed for irreps' we need to optimise the affine functional λ → Tr(W_0 s_0), thus the minimum is achieved at one of the extremal points λ = 0 or λ = 1.
By direct calculation one can verify that λ = 1 gives the minimum (exercise)
118
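The normalisation of the optimal POVM above can be checked numerically: sampling |θ⟩ from the SU(d)-invariant measure ν (normalised complex Gaussian vectors are Haar distributed), the Monte Carlo average of d|θ⟩⟨θ| should reproduce the identity. A minimal sketch, assuming NumPy is available:

```python
import numpy as np

rng = np.random.default_rng(0)
d, nsamp = 3, 200_000

# |theta> sampled from the invariant measure nu: normalised complex
# Gaussian vectors are uniformly (Haar) distributed pure states
psi = rng.normal(size=(nsamp, d)) + 1j * rng.normal(size=(nsamp, d))
psi /= np.linalg.norm(psi, axis=1, keepdims=True)

# Monte Carlo approximation of  d * int |theta><theta| nu(d theta)
M = d * np.einsum('ki,kj->ij', psi, psi.conj()) / nsamp
print(np.abs(M - np.eye(d)).max())   # close to 0
```

The deviation from the identity shrinks like 1/√nsamp, consistent with the Schur-lemma normalisation of slide 113.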
Quantum harmonic oscillators and Gaussian states
The quantum harmonic oscillator/quantum particle
Creation and annihilation operators, phase-shift operator
Coherent states
Squeezed states
Thermal equilibrium states
All Gaussian states
Reference:
U. Leonhardt, Measuring the quantum state of light, Cambridge University Press, 1997
119
The quantum harmonic oscillator
Deﬁnition (position and momentum)
A quantum harmonic oscillator/quantum particle is characterised by its position and momentum (unbounded) observables acting on L²(R) as
Q : h → xh(x), h ∈ D(Q)
P : h → −i dh/dx, h ∈ D(P)
Formally, Q and P satisfy the Heisenberg commutation relation
[Q, P] = i1
Lemma (exercise)
Let T : L²(R) → L²(R) be the (unitary) Fourier transform
T[f](p) = (1/√(2π)) ∫ f(q) e^{−ipq} dq,  T^{−1}[g](q) = (1/√(2π)) ∫ g(p) e^{ipq} dq
The operators Q and P are Fourier transforms of each other:
Q = T P T*,  P = T* Q T
120
Weyl operators
Theorem (Baker-Hausdorff formula)
Let F, G be operators such that [F, G] commutes with both F and G. Then
exp(F + G) = exp(−(1/2)[F, G]) exp(F) exp(G)
Definition (Weyl operators)
The Weyl operators are defined as U(a) := exp(iaP) and V(b) := exp(ibQ).
From the Baker-Hausdorff formula we get the Weyl relations
U(a) V(b) = exp(iab) V(b) U(a)
Alternatively we will use the displacement operators
D(q, p) := exp(ipQ − iqP) = exp(−ipq/2) exp(ipQ) exp(−iqP)
or equivalently D(q, p) = exp(ipq/2) exp(−iqP) exp(ipQ)
121
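The Weyl relation can be illustrated numerically using the matrix representation of Q and P in the number basis (introduced a few slides below). This is only a sketch assuming NumPy/SciPy: the relation is exact in infinite dimension, and the Fock cutoff Nc is an assumption whose truncation error on low-lying states is negligible for small a, b.

```python
import numpy as np
from scipy.linalg import expm

Nc = 80                                        # Fock cutoff (truncation)
Aop = np.diag(np.sqrt(np.arange(1, Nc)), 1)    # annihilation operator
Q = (Aop + Aop.conj().T) / np.sqrt(2)
P = (Aop - Aop.conj().T) / (1j * np.sqrt(2))

a, b = 0.7, 0.4
U = expm(1j * a * P)                           # U(a) = exp(iaP)
V = expm(1j * b * Q)                           # V(b) = exp(ibQ)

vac = np.zeros(Nc, dtype=complex); vac[0] = 1.0
lhs = U @ (V @ vac)                            # U(a)V(b)|0>
rhs = np.exp(1j * a * b) * (V @ (U @ vac))     # e^{iab} V(b)U(a)|0>
print(np.linalg.norm(lhs - rhs))               # ~ 0 up to truncation error
```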
Projective unitary representation of R²
Remark (projective unitary representation of R²)
Note that U(a) and V(b) act as displacement operators
U(a) Q U(a)* = Q + a1,  V(b) P V(b)* = P − b1
The unitaries D(q, p) satisfy
D(q, p) D(q′, p′) = exp(i(pq′ − qp′)/2) D(q + q′, p + p′),
hence we have a projective unitary representation of R².
The theory of covariant measurements can be extended to projective unitary representations; in particular the statistical model
ρ_{q,p} = D(q, p) ρ_{0,0} D(q, p)*, (q, p) ∈ R²
is covariant w.r.t. displacements in R².
122
Weyl/CCR algebra
Definition (Weyl/CCR algebra)
The C*-algebra generated by the Weyl operators S(a, b) is called the Weyl or CCR (canonical commutation relations) algebra.
Lemma (irreducibility of the defining representation)
The representation of the CCR algebra on L²(R) is irreducible (exercise).
Hint: verify that if ⟨g, U(a)V(b)f⟩ = 0 for all (a, b) then g = 0 or f = 0, by using properties of the Fourier transform.
Theorem (von Neumann's uniqueness Theorem)
All weakly continuous (w.r.t. (a, b)) irreducible representations of the Weyl algebra are unitarily equivalent to each other.
Proof.
Look up page 225 in A. S. Holevo, Probabilistic and statistical aspects of quantum theory (1982)
123
Creation and annihilation operators
Definition (Fock basis)
The Fock, or number, O.N. basis of L²(R) is defined by
ψ_n(x) = H_n(x) e^{−x²/2} / (√π 2^n n!)^{1/2}, n ≥ 0
where H_n are the Hermite polynomials. We will denote the vectors ψ_n by |n⟩
Definition (creation and annihilation operators)
The creation and annihilation operators on L²(R) are defined as
a* := (Q − iP)/√2 and a := (Q + iP)/√2
and satisfy the commutation relation [a, a*] = 1.
124
Number and phase-shift operators
Lemma (ladder operators)
|n⟩ are the eigenvectors of the number operator
N := a*a = (P² + Q² − 1)/2 and N|n⟩ = n|n⟩
a and a* act as 'ladder operators' on {|n⟩ : n ≥ 0}:
a*|n⟩ = √(n+1) |n+1⟩,  a|n⟩ = √n |n−1⟩,  a|0⟩ = 0
Lemma
The phase-shift unitary Γ(φ) := exp(−iφN) acts as
Γ(φ)* a Γ(φ) = a exp(−iφ)
or equivalently as a rotation of the phase space variables
Γ(φ)* (Q, P)ᵀ Γ(φ) = R(φ) (Q, P)ᵀ, where R(φ) := ( cos φ  sin φ ; −sin φ  cos φ )
Proof. By differentiating Γ(φ)* a Γ(φ) w.r.t. φ we get −i Γ(φ)* a Γ(φ)
125
Coherent states
Definition (vacuum and coherent states)
The vector |0⟩ is called the vacuum or ground state.
The displacement operators D(q, p) can be rewritten in complex form
D(z) := exp(z a* − z̄ a), z := (q + ip)/√2 ∈ C
|z⟩ := D(z)|0⟩ is called a coherent vector and
|z⟩ = exp(−|z|²/2) Σ_{n=0}^∞ (z^n/√(n!)) |n⟩
In particular, N has a Poisson distribution with intensity |z|² w.r.t. |z⟩⟨z|
P_{|z⟩⟨z|}(n) = |⟨z|n⟩|² = exp(−|z|²) |z|^{2n}/n!
Γ(φ) acts on coherent vectors as a phase shift
Γ(φ)|z⟩ = |e^{−iφ} z⟩
126
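The Fock expansion and the Poisson photon statistics can be cross-checked against the defining formula |z⟩ = D(z)|0⟩ in a truncated Fock space. A numerical sketch assuming NumPy/SciPy; the cutoff Nc = 60 is an assumption of the check:

```python
import numpy as np
from scipy.linalg import expm
from math import factorial

Nc = 60
A = np.diag(np.sqrt(np.arange(1, Nc)), 1)     # annihilation operator a
z = 1.0 + 0.5j

D = expm(z * A.conj().T - np.conj(z) * A)     # displacement operator D(z)
vac = np.zeros(Nc, dtype=complex); vac[0] = 1.0
coh = D @ vac                                 # |z> = D(z)|0>

n = np.arange(Nc)
fact = np.array([float(factorial(k)) for k in n])
series = np.exp(-abs(z)**2 / 2) * z**n / np.sqrt(fact)   # Fock expansion
pois = np.exp(-abs(z)**2) * abs(z)**(2*n) / fact         # Poisson weights

print(np.abs(coh - series).max())             # ~ 0
print(np.abs(np.abs(coh)**2 - pois).max())    # ~ 0
```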
Overcompleteness of the coherent states
Lemma (overcompleteness of the coherent states)
The wave function of the coherent vector |z⟩ is
ψ_z(x) = ψ_0(x − q) exp(ipx − ipq/2), z = (q + ip)/√2
The inner product of two coherent vectors is
⟨z|z′⟩ = exp(−|z − z′|²/2) exp(i Im(z̄ z′))
The coherent states form an overcomplete set of projections
∫ (dq dp/2π) |z⟩⟨z| = 1, z = (q + ip)/√2
In particular the linear span of the coherent vectors is dense in L²(R)
Proof.
The first two items follow from the definitions.
The overcompleteness can be checked by taking matrix elements w.r.t. the Fock basis.
127
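Taking matrix elements w.r.t. the Fock basis can be done numerically: a Riemann sum over phase space of the coherent projections, using the Fock coefficients ⟨n|z⟩ from the previous slide, should reproduce the identity on a low-lying Fock block. A sketch assuming NumPy; the grid, domain and cutoff are assumptions of the check:

```python
import numpy as np
from math import factorial

Nf = 6                                      # check the top Fock block only
h = 0.04
grid = np.arange(-7, 7, h) + h/2            # midpoint grid over phase space
Qg, Pg = np.meshgrid(grid, grid)
z = ((Qg + 1j*Pg) / np.sqrt(2)).ravel()

# coherent coefficients c_n(z) = <n|z> = exp(-|z|^2/2) z^n / sqrt(n!)
n = np.arange(Nf)
fact = np.array([float(factorial(k)) for k in n])
C = np.exp(-np.abs(z)**2 / 2)[:, None] * z[:, None]**n / np.sqrt(fact)

# Riemann sum for  int dq dp/(2 pi) |z><z|  in the Fock basis
M = (C.T @ C.conj()) * h * h / (2*np.pi)
print(np.abs(M - np.eye(Nf)).max())         # small
```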
Squeezed states
Definition (displaced squeezed vacuum)
The unitary operator S(ξ) is called a squeezing operator
S(ξ) := exp((ξ/2)(a² − a*²)), ξ ∈ R
The vector states |φ, ξ, z⟩ := Γ(φ) S(ξ) D(z) |0⟩ are called (pure) squeezed states.
Lemma (squeezing of coordinates)
S(ξ) has the following action on Q and P
S(ξ)* Q S(ξ) = Q e^{−ξ},  S(ξ)* P S(ξ) = P e^{ξ}
Consequently, squeezing, phase shifting and displacing together have the action
Ad[D(z)* Γ(φ)* S(ξ)*] : (Q, P)ᵀ → ( cos φ e^{−ξ}  sin φ e^{ξ} ; −sin φ e^{−ξ}  cos φ e^{ξ} ) (Q, P)ᵀ + (q, p)ᵀ 1
Proof. By differentiating S(ξ)* Q S(ξ) w.r.t. ξ we obtain −S(ξ)* Q S(ξ)
128
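The squeezing action on the coordinates implies that the squeezed vacuum S(ξ)|0⟩ has quadrature variances e^{−2ξ}/2 and e^{2ξ}/2. A truncated-Fock-space sketch assuming NumPy/SciPy (the cutoff is an assumption):

```python
import numpy as np
from scipy.linalg import expm

Nc = 80
A = np.diag(np.sqrt(np.arange(1, Nc)), 1)
Ad = A.conj().T
Q = (A + Ad) / np.sqrt(2)
P = (A - Ad) / (1j * np.sqrt(2))

xi = 0.4
S = expm((xi/2) * (A @ A - Ad @ Ad))          # squeezing operator S(xi)
vac = np.zeros(Nc); vac[0] = 1.0
psi = S @ vac                                 # squeezed vacuum S(xi)|0>

varQ = (psi.conj() @ Q @ Q @ psi).real
varP = (psi.conj() @ P @ P @ psi).real
print(varQ, np.exp(-2*xi)/2)                  # Var(Q) = e^{-2 xi}/2
print(varP, np.exp(+2*xi)/2)                  # Var(P) = e^{+2 xi}/2
print(varQ * varP)                            # minimum uncertainty: 1/4
```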
Gaussian states
Definition (Gaussian state)
The quadrature observables are defined by
X_φ := Q cos φ + P sin φ = Γ(φ)* Q Γ(φ)
A state ρ is called Gaussian if X_φ has a Gaussian distribution for all φ
Lemma
A Gaussian state ρ is completely characterised by the mean values (q, p) := (Tr(ρQ), Tr(ρP)) and the 'covariance matrix' of (Q, P)
V_ρ := ( Tr(ρ(Q−q)²)  Tr(ρ(Q−q)∘(P−p)) ; Tr(ρ(Q−q)∘(P−p))  Tr(ρ(P−p)²) ) = Tr(ρ X Xᵀ)
where
X = ( Q − q1 ; P − p1 )
In particular, the distribution of X_φ is
N(q cos φ + p sin φ, [R(φ) V_ρ R(φ)ᵀ]_{11})
129
Characterisation of Gaussian states
Proof.
Any (classical) Gaussian distribution is uniquely determined by its mean and variance.
The mean and variance of X_φ are
Tr(ρ X_φ) = Tr(ρ (Q cos φ + P sin φ)) = q cos φ + p sin φ
and
Tr(ρ (X_φ − q cos φ − p sin φ)²) = Tr(ρ [R(φ) X Xᵀ R(φ)ᵀ]_{11}) = [R(φ) V_ρ R(φ)ᵀ]_{11}
The fact that there can be only one Gaussian state with a given mean and variance can best be seen by associating to ρ, in a one-to-one fashion, the Wigner function W_ρ, which in the case of Gaussian states is just the Gaussian N((q, p), V).
Wigner functions will be studied in the next section.
130
The squeezed states are Gaussian
Lemma (the vacuum state is Gaussian)
The vacuum state is a Gaussian state with each quadrature having distribution N(0, 1/2). The covariance matrix of (Q, P) is
V_{|0⟩} = ( Tr(|0⟩⟨0|Q²)  Tr(|0⟩⟨0|Q∘P) ; Tr(|0⟩⟨0|Q∘P)  Tr(|0⟩⟨0|P²) ) = ( 1/2  0 ; 0  1/2 )
Corollary (all squeezed (coherent) states are Gaussian)
The squeezed and coherent states are Gaussian.
The distribution of X_{φ′} with respect to ρ := |φ, ξ, z⟩⟨φ, ξ, z| is the marginal along the direction φ′ of the bivariate Gaussian N((q, p), V) with covariance matrix
V = ( Tr(ρQ²)  Tr(ρQ∘P) ; Tr(ρQ∘P)  Tr(ρP²) ) = (1/2) R(φ) ( e^{−2ξ}  0 ; 0  e^{2ξ} ) R(φ)ᵀ
Moreover these are the only states of minimum uncertainty (exercise)
Var_ρ(Q) Var_ρ(P) = 1/4
131
Thermal equilibrium states
Definition (thermal equilibrium state)
The thermal equilibrium state at inverse temperature β > 0 is defined by
ρ_β := (1 − e^{−β}) Σ_{n=0}^∞ e^{−nβ} |n⟩⟨n|
Remark
The thermal equilibrium states are faithful, i.e. all eigenvalues of ρ_β are strictly positive
The thermal equilibrium states are invariant under phase shifts
The number operator N has a geometric distribution with mean N̄ = (e^β − 1)^{−1}:
P_{ρ_β}(n) = (1 − e^{−β}) e^{−nβ}
132
Thermal states as mixtures of coherent states
Lemma
The thermal equilibrium state is a mixture of coherent states with Gaussian weight of variance σ² = 1/(e^β − 1):
ρ_β = (1/(2πσ²)) ∫ e^{−(q²+p²)/(2σ²)} |z⟩⟨z| dq dp, z = (q + ip)/√2
Proof.
By rotation (phase) symmetry it suffices to verify that the diagonal matrix elements of the right and left hand sides coincide:
(1/σ²) ∫_0^∞ e^{−r²/(2σ²)} e^{−r²/2} ((r²/2)^n / n!) d(r²/2) = (1 − e^{−β}) e^{−nβ}
133
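The diagonal-element identity in the proof can be confirmed by direct numerical integration, after substituting t = r²/2. A sketch assuming SciPy (β = 0.8 is an arbitrary choice):

```python
import numpy as np
from scipy.integrate import quad
from math import factorial

beta = 0.8
sigma2 = 1.0 / (np.exp(beta) - 1.0)            # sigma^2 = 1/(e^beta - 1)

diffs = []
for n in range(6):
    # (1/sigma^2) int_0^inf e^{-t/sigma^2} e^{-t} t^n/n! dt, with t = r^2/2
    val, _ = quad(lambda t, n=n: np.exp(-t/sigma2) * np.exp(-t)
                  * t**n / factorial(n) / sigma2, 0, np.inf)
    target = (1 - np.exp(-beta)) * np.exp(-n*beta)   # geometric weights
    diffs.append(abs(val - target))
print(max(diffs))    # ~ 0
```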
The thermal equilibrium states are Gaussian
Corollary
The thermal equilibrium state ρ_β is a centered Gaussian state with covariance matrix
( Tr(ρQ²)  Tr(ρQ∘P) ; Tr(ρQ∘P)  Tr(ρP²) ) = (coth(β/2)/2) ( 1  0 ; 0  1 ),
where coth(β/2) = (e^{β/2} + e^{−β/2}) / (e^{β/2} − e^{−β/2}).
134
All Gaussian states
Theorem (general form of a Gaussian state)
Any Gaussian state of a quantum harmonic oscillator is of the form
ρ = D(z)* Γ(φ)* S(ξ)* ρ_β S(ξ) Γ(φ) D(z),
i.e. a displaced, rotated, squeezed thermal state. The corresponding bivariate Gaussian is N((q, p), V) with covariance matrix
V = (coth(β/2)/2) R(φ) ( e^{−2ξ}  0 ; 0  e^{2ξ} ) R(φ)ᵀ
A positive real matrix V is the covariance matrix of a Gaussian state iff Det(V) ≥ 1/4
Proof: exercise.
135
Estimation of Gaussian states
Gaussian shift models
Gaussian estimation problems
One-dimensional Gaussian shift
Two-dimensional Gaussian shift
The optimal covariant measurement
Reference:
A. S. Holevo, Probabilistic and statistical aspects of quantum theory (1982)
136
Gaussian shift models
Definition
Denote by G(z, V) the density matrix of the Gaussian state with displacement z and covariance matrix V.
A quantum Gaussian shift model is a family of the form
Q_V := {G(θ, V) : θ ∈ Θ ⊂ C}
where Θ is a real linear subspace of C and V is a fixed and known covariance matrix.
Remark
In the case of a single quantum oscillator considered so far, there are only two possible types of Gaussian shift models: the one-dimensional and the two-dimensional (full) shift.
Equivalence with displaced thermal equilibrium states
Lemma
By applying an appropriate (unitary) squeezing operation we can transform the model Q_V into an equivalent model consisting of displaced thermal equilibrium (or coherent) states.
Proof.
By the Theorem on the general form of a Gaussian state we have
G(z, V) = D(z)* Γ(φ)* S(ξ)* ρ_β S(ξ) Γ(φ) D(z)
We have
S(ξ) Γ(φ) D(z) Γ(φ)* S(ξ)* = D(z′),
where
z′ = e^{ξ} Re(e^{iφ} z) + i e^{−ξ} Im(e^{iφ} z)
From the above equations we get
G(z′, σ² 1) = S(ξ) Γ(φ) G(z, V) Γ(φ)* S(ξ)*, σ² = Det(V)^{1/2}
Gaussian estimation problems
We will consider the following two estimation problems:
1. estimation of the parameter θ in the one-dimensional Gaussian shift
{ρ_θ := exp(−iθP) ρ_β exp(iθP) : θ ∈ R}
with quadratic risk
R(θ̂, θ) = E_θ((θ̂ − θ)²)
2. estimation of the parameter θ = (q, p) in the two-dimensional Gaussian shift
{ρ_θ := D(q, p) ρ_β D(q, p)* : (q, p) ∈ R²}
with quadratic risk
R(θ̂, θ, G) = E_θ((θ̂ − θ)ᵀ G (θ̂ − θ))
By rotation symmetry of ρ_β we can pass to a coordinate system in which G = Diag(g_q, g_p) is diagonal and write
R(θ̂, θ, G) = g_q E_θ((q̂ − q)²) + g_p E_θ((p̂ − p)²)
139
One-dimensional Gaussian shift
Theorem (optimal estimation for the one-dimensional shift)
Let
ρ_θ := exp(−iθP) ρ_β exp(iθP), θ ∈ R,
be a Gaussian shift with 0 < β < ∞ fixed and known.
The symmetric logarithmic derivative L_θ defined by
dρ_θ/dθ = L_θ ∘ ρ_θ
is equal to 2(Q − θ1)/coth(β/2) and the quantum Cramér-Rao bound is achieved by measuring Q. The resulting unbiased estimator θ̂ has risk
R(θ̂, θ) = Var_θ(θ̂) = H(θ)^{−1} = coth(β/2)/2
140
One-dimensional Gaussian shift: proof
Proof.
We have
dρ_θ/dθ = −i[P, ρ_θ] = −i exp(−iθP) [P, ρ_β] exp(iθP)
Thus the s.l.d. is of the form L_θ = exp(−iθP) L_0 exp(iθP) where L_0 is the solution of
−i[P, ρ_β] = L_0 ∘ ρ_β
By writing the matrix elements w.r.t. the Fock basis we get
−i (e^{−nβ} − e^{−mβ}) ⟨m|P|n⟩ = ((e^{−nβ} + e^{−mβ})/2) ⟨m|L_0|n⟩
with solution L_0 = 2Q/coth(β/2). Hence L_θ = 2(Q − θ1)/coth(β/2)
The Helstrom-Fisher information is H(θ) = Tr(ρ_θ L_θ²) = 2/coth(β/2)
The result θ̂ of measuring Q is an unbiased estimator of θ and Var_θ(θ̂) = H(θ)^{−1}
141
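The equation defining L_0 can be verified entrywise in a truncated Fock representation: with ρ_β diagonal and geometric, −i[P, ρ_β] and L_0 ∘ ρ_β agree exactly up to rounding. A NumPy sketch (cutoff and β are assumptions of the check):

```python
import numpy as np

Nc = 40
Aop = np.diag(np.sqrt(np.arange(1, Nc)), 1)
Q = (Aop + Aop.conj().T) / np.sqrt(2)
P = (Aop - Aop.conj().T) / (1j * np.sqrt(2))

beta = 0.7
n = np.arange(Nc)
rho = np.diag((1 - np.exp(-beta)) * np.exp(-n*beta))  # truncated rho_beta

L0 = 2 * Q * np.tanh(beta/2)                   # L_0 = 2Q/coth(beta/2)
lhs = -1j * (P @ rho - rho @ P)                # -i [P, rho_beta]
rhs = 0.5 * (L0 @ rho + rho @ L0)              # L_0 o rho_beta (Jordan product)
print(np.abs(lhs - rhs).max())                 # ~ 0 (entrywise identity)
```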
One dimensional Gaussian shift as a covariant family
Remark
The s.l.d.'s {L_θ : θ ∈ R} form a commutative family and in this case the Cramér-Rao bound can be achieved 'in one shot', not only in the sense of locally unbiased measurements, which provide only asymptotic optimality
The Helstrom-Fisher information and the risk do not depend on θ due to the fact that we deal with a covariant family. The methods developed for covariant measurements with respect to compact groups can be extended to R and lead to the same optimal measurement (see Holevo)
The same result can be obtained in the case of coherent states ('β = ∞') with the difference that L_θ is not uniquely defined as an operator but only as an element of L²(ρ_θ)
142
Two-dimensional Gaussian shift
Theorem (optimal estimation for the two-dimensional shift)
Let
ρ_θ := D(q, p) ρ_β D(q, p)*, θ = (q, p) ∈ R²
be a Gaussian shift model with 0 < β < ∞ fixed and known.
Let us fix the quadratic risk for an estimator θ̂ = (q̂, p̂)
R(θ̂, θ, G) = g_q E_θ((q̂ − q)²) + g_p E_θ((p̂ − p)²)
1. The following covariant measurement is optimal for the above estimation problem
M(dq̂ dp̂) = |ξ, ẑ⟩⟨ξ, ẑ| dq̂ dp̂/(2π)
where |ξ, ẑ⟩ is the pure squeezed state with ẑ = (q̂ + ip̂)/√2 and squeezing parameter e^{−2ξ} = √(g_p/g_q)
2. The components of the corresponding estimator θ̂ = (q̂, p̂) are unbiased estimators of q and p respectively, and their covariance matrix achieves the lower bound in the right Cramér-Rao bound.
143
Simple implementation of the optimal measurement (I)
The measurement M(dq̂ dp̂) can be dilated to a PVM as follows.
Let Q, P be the coordinates of the system and let Q′, P′ be the coordinates of an independent copy of the oscillator. On the joint space L²(R) ⊗ L²(R) we have
(Q, P) ≡ (Q ⊗ 1, P ⊗ 1),  (Q′, P′) ≡ (1 ⊗ Q′, 1 ⊗ P′)
Define the rotated coordinates (50% beamsplitter transformation)
(Q_1, P_1) = (1/√2)(Q + Q′, P + P′),  (Q_2, P_2) = (1/√2)(Q − Q′, P − P′)
and note that P_1 commutes with Q_2. Thus we can define the PVM
E(dq̂ dp̂) = E_Q̃(dq̂) E_P̃(dp̂)
with Q̃ := Q − Q′ and P̃ := P + P′
144
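Since Q̃ and P̃ commute and the joint state is a product of Gaussian states, the outcome statistics of this PVM can be simulated classically. For a coherent signal with ancilla in the vacuum (a choice made for this sketch), q̂ = Q − Q′ and p̂ = P + P′ are unbiased with one extra vacuum half-unit of variance in each coordinate. A NumPy sketch:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
q0, p0 = 1.3, -0.7     # coherent signal: each quadrature ~ N(mean, 1/2)

Q_sys = rng.normal(q0, np.sqrt(0.5), n)    # system Q samples
Q_anc = rng.normal(0.0, np.sqrt(0.5), n)   # ancilla (vacuum) Q' samples
P_sys = rng.normal(p0, np.sqrt(0.5), n)
P_anc = rng.normal(0.0, np.sqrt(0.5), n)

q_hat = Q_sys - Q_anc    # outcome of Qtilde = Q - Q'
p_hat = P_sys + P_anc    # outcome of Ptilde = P + P'

print(q_hat.mean(), q_hat.var())   # ~ (q0, 1): unbiased, variance 1/2 + 1/2
print(p_hat.mean(), p_hat.var())   # ~ (p0, 1)
```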
Simple implementation of the optimal measurement (II)
If the oscillator (Q′, P′) is prepared in the state |ξ⟩ then we obtain the effective measurement M on the first copy of L²(R)
Tr(ρ M(B)) = Tr(ρ ⊗ |ξ⟩⟨ξ| E(B)), B ∈ Σ(R²)
The measurement M is uniquely fixed by its characteristic functions
φ^M_ρ(u, v) := Tr(ρ M(e^{iuq̂+ivp̂})) = Tr(ρ ⊗ |ξ⟩⟨ξ| E(e^{iuq̂+ivp̂})), ρ ∈ T_1(L²(R))
Thus it is enough to show that the right hand side is equal to
∫ e^{iuq̂+ivp̂} Tr(ρ |ẑ, ξ⟩⟨ẑ, ξ|) dq̂ dp̂/(2π)
Finally, up to a density argument, it is enough to show this for rank one ρ of the form |z_1, ξ⟩⟨z_2, ξ|. In this case the equality reduces to a computation with Gaussian integrals (exercise)
145
Proof of the optimal estimation for the two-dimensional shift Theorem
It is easy to see that the measurement is unbiased and has the same risk for all θ
To show that the measurement is optimal one has to show that Tr(G Var(θ̂)) achieves the smallest possible value.
For this it suffices to show that the right Cramér-Rao bound is achieved in the sense that
Tr(G Var_θ(θ̂)) = Tr(G J_θ^{−1})
Both sides can be explicitly computed, similarly to the one-dimensional case.
146
Alternative proof using covariant measurements
Lemma
Any covariant measurement w.r.t. the displacement operators has a POVM of the form
M(dq̂ dp̂) = D(ẑ) ρ D(ẑ)* dq̂ dp̂/(2π), ẑ = (q̂ + ip̂)/√2
where ρ is an arbitrary state.
Proof.
Extension of the results on covariant measurements for irreducible representations to the case of the non-compact (but abelian) group R²
Theorem
The covariant measurement which minimises the risk R(θ̂, θ, G) is given by
M(dq̂ dp̂) = |ξ⟩⟨ξ| dq̂ dp̂/(2π), with e^{−2ξ} = √(g_p/g_q).
147
Alternative proof using covariant measurements
Since the risk is affine w.r.t. ρ, the minimum is achieved on an extremal point, so we can restrict to pure states ρ = |ψ⟩⟨ψ|
Let M be the measurement with seed ψ. Then M can be dilated to a PVM on two oscillators as before, by simply replacing |ξ⟩ by |ψ⟩, and
E_θ(q̂) = ∫ q̂ Tr(ρ_θ ⊗ |ψ⟩⟨ψ| E_Q̃(dq̂) E_P̃(dp̂)) = Tr(ρ_θ ⊗ |ψ⟩⟨ψ| Q̃) = Tr(ρ_θ Q) − ⟨ψ|Q′|ψ⟩ = q − ⟨ψ|Q′|ψ⟩
Similarly, E_θ(p̂) = p + ⟨ψ|P′|ψ⟩.
The measurement is thus unbiased up to a constant shift.
The risk of M is constant as a function of θ and equal to
R(M, G) = g_q [Tr(ρ_β Q²) + Tr(|ψ⟩⟨ψ| Q′²)] + g_p [Tr(ρ_β P²) + Tr(|ψ⟩⟨ψ| P′²)]
= (coth(β/2)/2) Tr(G) + [g_q ⟨ψ, Q′²ψ⟩ + g_p ⟨ψ, P′²ψ⟩]
148
Alternative proof using covariant measurements
We have
(g_q ⟨ψ, Q′²ψ⟩ + g_p ⟨ψ, P′²ψ⟩)/2 ≥ √(g_q g_p) √(⟨ψ, Q′²ψ⟩⟨ψ, P′²ψ⟩) ≥ (1/2)√(g_q g_p)
where we used Heisenberg's uncertainty relation in the second inequality.
Equalities are obtained iff |ψ⟩ is a minimum uncertainty state, i.e. a pure squeezed state |ψ⟩ = |ξ⟩, and if
g_q e^{−2ξ} = g_p e^{2ξ}, i.e. e^{−2ξ} = √(g_p/g_q)
The minimum risk is
R(G) = (coth(β/2)/2) Tr(G) + √(Det G)
149
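The minimisation over the squeezing parameter ξ can be reproduced numerically. A sketch assuming SciPy; the values of g_q, g_p and β are arbitrary choices:

```python
import numpy as np
from scipy.optimize import minimize_scalar

g_q, g_p, beta = 2.0, 0.5, 1.0

def risk(xi):
    # R(M, G) = coth(beta/2)/2 * Tr(G) + g_q e^{-2 xi}/2 + g_p e^{2 xi}/2
    return (g_q + g_p) / (2*np.tanh(beta/2)) \
           + g_q*np.exp(-2*xi)/2 + g_p*np.exp(2*xi)/2

res = minimize_scalar(risk, bounds=(-3, 3), method='bounded')
print(np.exp(-2*res.x), np.sqrt(g_p/g_q))    # optimal squeezing e^{-2 xi}
print(res.fun, (g_q + g_p)/(2*np.tanh(beta/2)) + np.sqrt(g_q*g_p))
```

The minimiser satisfies e^{−2ξ} = √(g_p/g_q) and the minimum equals coth(β/2)/2 · Tr(G) + √(Det G), as stated above.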
Wigner functions and quantum homodyne tomography
Hilbert-Schmidt operators
Isometry between T_2 and L²(R²)
The Wigner function
Examples of Wigner functions
Quantum homodyne tomography
Estimation of matrix elements using pattern functions
Reference:
A. S. Holevo, Probabilistic and statistical aspects of quantum theory (1982)
L. Artiles, R. D. Gill and M. Guţă, J. Royal Stat. Soc. B, 67, 109-134, (2005)
150
Hilbert-Schmidt operators
Definition (Hilbert-Schmidt operators)
Let H be a Hilbert space. The class of Hilbert-Schmidt operators is defined by
T_2(H) := {τ ∈ B(H) : Tr(|τ|²) < ∞}
with ‖τ‖_2 = Tr(|τ|²)^{1/2}.
Properties
T_2(H) is a Hilbert space with inner product
⟨τ, σ⟩_2 = Tr(τ*σ)
The finite rank operators are dense in T_2(H)
Any τ ∈ T_2(H) has a singular value decomposition
τ = Σ_{i=1}^∞ µ_i |e_i⟩⟨f_i|,
where µ_i ≥ 0, {e_i} and {f_i} are O.N. bases, and Σ_i |µ_i|² = ‖τ‖_2² < ∞.
The trace-class operators T_1(H) form a subset of T_2(H)
151
Isometry between T_2(L²(R)) and L²(R²)
Proposition (isometry between T_2(L²(R)) and L²(R²))
1. Let ψ_1, ψ_2 be vectors in L²(R). Then the function
(u, v) → W̃_{|ψ_2⟩⟨ψ_1|}(u, v) := ⟨ψ_1, exp(−i(uQ + vP)) ψ_2⟩
is square integrable, i.e. W̃(u, v) ∈ L²(R²)
2. If {e_i} is an ONB of L²(R) then the functions
(1/√(2π)) W̃_{|e_i⟩⟨e_j|}(u, v), i, j ≥ 1
form an ONB of L²(R²)
3. The transformation
τ → (1/√(2π)) W̃_τ(u, v) := (1/√(2π)) Tr(τ exp(−i(uQ + vP)))
maps the Hilbert space T_2(L²(R)) unitarily onto L²(R²).
152
Isometry between T_2(L²(R)) and L²(R²)
Proof.
1. Using the definition of the Weyl operators we have
(1/√(2π)) ⟨ψ_1, exp(−i(uQ + vP)) ψ_2⟩ = (1/√(2π)) e^{iuv/2} ∫ ψ̄_1(x) e^{−iux} ψ_2(x − v) dx
= (1/(2π)) e^{iuv/2} ∫∫ ψ̄_1(x) ψ̃_2(y) e^{ixy} e^{−i(yv+ux)} dx dy
where ψ̃_2 is the Fourier transform of ψ_2.
Since ψ_1, ψ_2 are square integrable, ψ̄_1(x) ψ̃_2(y) e^{ixy} is a square integrable function of (x, y) and the double integral is its Fourier transform.
2. If {e_i}_i is an ONB in L²(R) then {ē_i(x) ẽ_j(y) e^{ixy}}_{i,j} is an ONB in L²(R²). The result follows from the fact that the Fourier transform is a unitary operator on L²(R²).
3. This is a consequence of 2. and of the fact that the finite rank operators are dense in T_2(L²(R)).
153
Wigner function
Definition (Wigner function)
Let τ ∈ T_2(L²(R)).
The 'characteristic function' of τ is defined as
W̃_τ(u, v) := Tr(τ exp(−i(uQ + vP))), (u, v) ∈ R²
The Wigner function of τ is defined as
W_τ(q, p) = (1/(2π)²) ∫∫ exp(iuq) exp(ivp) W̃_τ(u, v) du dv
154
(some) Properties of Wigner functions
Let τ ∈ T_2(L²(R)). Then W_τ is a square integrable function and
W_{τ*}(q, p) = W̄_τ(q, p)
(Overlap formula) Let τ_1, τ_2 ∈ T_2(L²(R)). Then
Tr(τ_1* τ_2) = 2π ∫∫ W̄_{τ_1}(q, p) W_{τ_2}(q, p) dq dp
Let ρ be a density matrix. Then the one-dimensional marginal along the direction φ of the Wigner function W_ρ is equal to the probability density of the quadrature X_φ in the state ρ:
p^{(X_φ)}_ρ(q) = ∫_{−∞}^∞ W_ρ(q cos φ − p sin φ, q sin φ + p cos φ) dp
W_ρ is a quasi-probability distribution of Q and P: its marginals are probability densities but W_ρ may take negative values.
Displacements, phase rotations and squeezing act in the obvious way on the space of Wigner functions.
155
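The overlap formula can be tested on two Gaussian states, the vacuum and a coherent state, whose Wigner functions are the Gaussian densities N((0,0), (1/2)·I) and N((q_0, p_0), (1/2)·I); the expected value is |⟨0|z_0⟩|² = e^{−|z_0|²}. A NumPy sketch on a finite grid (grid parameters are assumptions):

```python
import numpy as np

h = 0.02
x = np.arange(-8, 8, h) + h/2
Qg, Pg = np.meshgrid(x, x)

def wigner_coherent(q0, p0):
    # Wigner function of a coherent state: Gaussian N((q0, p0), diag(1/2, 1/2))
    return np.exp(-((Qg - q0)**2 + (Pg - p0)**2)) / np.pi

q0, p0 = 1.1, -0.6
lhs = 2*np.pi * np.sum(wigner_coherent(0, 0) * wigner_coherent(q0, p0)) * h*h
rhs = np.exp(-(q0**2 + p0**2) / 2)   # |<0|z0>|^2 with z0 = (q0 + i p0)/sqrt(2)
print(lhs, rhs)                      # agree
```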
Examples of Wigner functions
[Figure: surface plots of four Wigner functions]
Wigner function of a squeezed state
Wigner function of a single-photon-added state
Wigner function of a 'Schrödinger cat state' (superposition of two coherent vectors)
Wigner function of the one-photon state ψ_1
156
Quantum homodyne tomography
Quantum homodyne tomography is a measurement technique developed in quantum optics for the estimation of the state of a quantum oscillator (monochromatic mode of light)
G. Breitenbach, S. Schiller and J. Mlynek, Measurement of the quantum states of squeezed light, Nature 387, 471-475 (1997)
157
Quantum homodyne tomography: the measurement procedure
Measurement procedure
1. One chooses a random, uniformly distributed phase φ ∈ [0, π]
2. The quadrature X_φ is measured on a quantum system prepared in the state ρ. This is a so-called homodyne measurement (see figure)
3. Steps 1. and 2. are repeated on independent copies of ρ and one collects i.i.d. data (Φ_1, X_1), . . . , (Φ_n, X_n) with probability density
p_ρ(φ, x) = p(φ) p_ρ(x|φ) = (1/π) p^{X_φ}_ρ(x), x ∈ R, φ ∈ [0, π]
[Figure: homodyne detection scheme. The signal and a local oscillator with amplitude z = |z|e^{iφ} are mixed at a 50% beam splitter; two detectors record the photocurrents I_1, I_2, and the difference I_2 − I_1 is distributed according to p_ρ(x|φ)]
158
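The data-generating mechanism above can be simulated directly for a Gaussian state, since X_φ ∼ N(q cos φ + p sin φ, [R(φ)V R(φ)ᵀ]_{11}). A sketch assuming NumPy; the squeezed-vacuum parameters are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 400_000
q0, p0, xi = 0.8, -0.3, 0.3
V = 0.5 * np.diag([np.exp(-2*xi), np.exp(2*xi)])   # squeezed-state covariance

phi = rng.uniform(0, np.pi, n)                     # uniformly random phase
mean = q0*np.cos(phi) + p0*np.sin(phi)
var = np.cos(phi)**2 * V[0, 0] + np.sin(phi)**2 * V[1, 1]  # [R V R^T]_11
x = rng.normal(mean, np.sqrt(var))                 # homodyne samples X_phi

sel = np.abs(phi - np.pi/4) < 0.01                 # bin around phi = pi/4
print(x[sel].var(), (V[0, 0] + V[1, 1]) / 2)       # ~ equal
```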
Quantum homodyne tomography: the Radon transform
Definition (Radon transform)
Let W_ρ : R² → R be the Wigner function of ρ. The Radon transform of W_ρ is the function on R × [0, π] given by
R[W_ρ](q, φ) := ∫_{−∞}^∞ W_ρ(q cos φ − p sin φ, q sin φ + p cos φ) dp = p^{X_φ}_ρ(q)
[Diagram: Q and P cannot be measured simultaneously; instead, the state ρ, the Wigner function W_ρ(q, p) and the quadrature densities p_ρ(x|φ) determine each other. The characteristic function W̃_ρ(u, v) = Tr[ρ exp(−iuQ − ivP)] connects ρ to W_ρ through the Fourier transform over R², while the Radon transform R[f](x, φ) = ∫_{−∞}^∞ f(x cos φ + t sin φ, x sin φ − t cos φ) dt connects W_ρ to p_ρ(x|φ)]
Remark
From the above diagram we conclude that the map ρ → p_ρ(x, φ) is injective and hence ρ is identifiable from the data with distribution P^n_ρ.
If one aims at estimating W_ρ rather than ρ, the problem is closely related to the 'classical' positron emission tomography method, with the difference that W_ρ has intrinsic quantum properties, e.g. it may take negative values.
159
Quantum homodyne tomography: the statistical problem
Problem
Given i.i.d. data (X_1, Φ_1), . . . , (X_n, Φ_n) with distribution P_ρ, construct an estimator ρ̂_n of ρ such that
ρ̂_n is consistent, i.e. d(ρ̂_n, ρ) → 0 as n → ∞ for some relevant distance d, e.g. norm-one, fidelity...
R(ρ̂_n, ρ) := E_ρ d(ρ̂_n, ρ) is small for all states ρ
Similar problems arise for the estimation of W_ρ or of some functional of ρ, e.g. Tr(ρ²)
Remark
If some information is available about ρ we can encode it in the parametrisation ρ = ρ_θ for θ ∈ Θ
If the dimension of Θ is infinite, we deal with a nonparametric estimation problem for which the risk R(ρ̂_n, ρ) may decrease more slowly than the rate 1/n typical for parametric problems
A more restrictive statistical model {ρ_θ : θ ∈ Θ} leads to faster estimation rates but one should avoid 'model misspecification'
160
Estimation of matrix elements using pattern functions
Lemma
Let (X, Φ) ∼ P_ρ. Let w̃_ρ(s|φ) := W̃_ρ(s cos φ, s sin φ) and define
F_{i,j}(x, φ) := √(π/2) T^{−1}[ w̃_{|j⟩⟨i|}(−s|φ) |s| ](x)
where T^{−1} is the inverse Fourier transform with respect to s. Then
E_ρ(F_{i,j}(X, Φ)) = ρ_{i,j}.
Moreover F_{j,k}(x, φ) is of the form
F_{j,k}(x, φ) = f_{j,k}(x) exp(i(j − k)φ)
with f_{j,k} bounded oscillatory functions called pattern functions.
161
Estimation of matrix elements using pattern functions
Proof.
We have
ρ_{i,j} = Tr(ρ |j⟩⟨i|) = (1/(2π)) ∫∫ W̃_ρ(u, v) W̃_{i,j}(−u, −v) du dv
= (1/(2π)) ∫_{−∞}^∞ ∫_0^π w̃_ρ(s|φ) w̃_{i,j}(−s|φ) |s| dφ ds
= (1/π) ∫_{−∞}^∞ ∫_0^π F_{i,j}(x, φ) p_ρ(x|φ) dφ dx
= E_ρ(F_{i,j}(X, Φ))
In the first equality we used the isometry between T_2(L²(R)) and L²(R²), with W̃_{i,j} := W̃_{|j⟩⟨i|}
In the third equality we used the relation w̃(s|φ) = T[p(x|φ)](s).
The dependence of F_{j,k} on φ follows from the definition of W̃_{j,k}.
162
Pattern functions
[Figure: plots of the pattern functions f_{5,5}, f_{20,20}, f_{5,20} and f_{10,30} as functions of q]
Pattern functions for different matrix elements
163
Estimators based on pattern functions
Definition (pattern function estimator for matrix elements)
Let (X_1, Φ_1), . . . , (X_n, Φ_n) be i.i.d. with distribution P_ρ. The pattern function estimator of ρ_{j,k} is
ρ̃^{(n)}_{j,k} := (1/n) Σ_{i=1}^n f_{j,k}(X_i) e^{i(j−k)Φ_i}
Theorem (consistency of truncated pattern function estimators)
Let ρ̂^{(n)} be the density matrix estimator with matrix elements
ρ̂^{(n)}_{j,k} := ρ̃^{(n)}_{j,k} if j, k ≤ d(n), and ρ̂^{(n)}_{j,k} := 0 if max{j, k} > d(n),
where d(n) is the effective dimension of the estimator and satisfies d(n) ↑ ∞ as n → ∞ and d(n) = o(n^{3/7}).
Then ρ̂^{(n)} is consistent with respect to the ‖·‖_2 distance, i.e.
lim_{n→∞} E_ρ ‖ρ̂^{(n)} − ρ‖_2² = 0
164
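The estimator above is a plain empirical average, which can be sketched in a few lines. This is a minimal illustration: the name `pattern_estimate` and the toy pattern functions used below are hypothetical, and a real implementation would plug in the actual bounded oscillatory functions f_{j,k}.

```python
import cmath

def pattern_estimate(samples, f_jk, j, k):
    """Empirical average (1/n) sum_i f_{j,k}(X_i) exp(i(j-k) Phi_i).

    samples : list of (x, phi) pairs drawn from P_rho
    f_jk    : the bounded pattern function f_{j,k} (supplied by the user)
    """
    n = len(samples)
    return sum(f_jk(x) * cmath.exp(1j * (j - k) * phi) for x, phi in samples) / n
```

For j = k the phase factor is identically 1 and the estimate reduces to the sample mean of f_{j,j}(X).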
Estimators based on pattern functions

Proof.
We write the risk as

    E(‖ρ̂⁽ⁿ⁾ − ρ‖₂²) = Σ_{j,k=0}^{∞} E_ρ(|ρ̂_{j,k} − ρ_{j,k}|²)
                     = Σ_{j,k=0}^{d(n)} E_ρ(|ρ̃_{j,k} − ρ_{j,k}|²) + Σ_{max{j,k}>d(n)} |ρ_{j,k}|².

The variance term (first) decreases with n but increases with d, while the bias term (second) decreases with d. Thus, if the variance is controlled while increasing d with n we obtain the consistency result.
Now

    E_ρ(|ρ̃_{j,k} − ρ_{j,k}|²) = (1/n) ∫ |F_{j,k}(x, φ) − ρ_{j,k}|² p(x|φ) (dφ/π) dx ≤ (1/n) ‖f_{j,k}‖₂²

The result follows from the bound (see J. Royal Stat. Soc. B, 67, 109–134, (2005))

    Σ_{j,k=0}^{d} ‖f_{j,k}‖₂² = O(d^{7/3})
165
Dependence of the risk on dimension d

The graphs below show the risk as a function of dimension d for two estimation methods (pattern functions and sieve maximum likelihood) and several choices of n.
The tradeoff between bias and variance is reflected in the existence of a minimum at a certain ‘oracle’ dimension d*(n) which depends on ρ.

[Figure: risk versus dimension d for n = 100, 200, 400, 800, 1600, 3200, 6400, 12800, together with a plot of log(min L² risk) against log(n) for the pattern function and maximum likelihood estimators]
166
Methods for choosing the dimension d(n)

Deterministic choice of dimension
Suppose that ρ belongs to a ‘nice’ class of states, e.g.

    ρ ∈ C(α, β) := {τ : Tr(τ e^{αN}) ≤ β},   α, β > 0

Then

    Σ_{max{j,k}>d} |ρ_{j,k}|² ≤ C(α, β) e^{−αd}

and the risk is upper bounded by

    E_ρ(‖ρ̂_n − ρ‖₂²) ≤ C d^{7/3}/n + C′ e^{−αd}

By choosing d = (1/α) log n we get

    E_ρ(‖ρ̂_n − ρ‖₂²) = O((log n)^{7/3}/n)

which is only slightly worse than the parametric rate 1/n
167
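The bias-variance tradeoff behind this choice can be explored numerically. A minimal sketch, under the assumption C = C′ = α = 1 (the constants and function names are illustrative, not from the original): minimising the upper bound over integer d produces a dimension that grows roughly logarithmically in n.

```python
import math

def risk_bound(d, n, alpha=1.0, c_var=1.0, c_bias=1.0):
    # variance term C d^{7/3}/n plus bias term C' exp(-alpha d)
    return c_var * d ** (7.0 / 3.0) / n + c_bias * math.exp(-alpha * d)

def best_dimension(n, alpha=1.0, d_max=500):
    # integer minimiser of the bound; grows roughly like (1/alpha) log n
    return min(range(1, d_max), key=lambda d: risk_bound(d, n, alpha))
```

For example, `best_dimension` increases slowly as the sample size n grows, reflecting the logarithmic choice d = (1/α) log n above.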
Methods for choosing the dimension d(n)

Data driven choice of dimension
The previous method selects a d which works for all states in a certain class. It would be nicer to adapt the dimension to the particular state by making use of the data.
There are many ‘model selection’ methods, e.g. penalised maximum likelihood, cross-validation, Akaike’s information criterion, thresholding...
In general one would like to find

    d* = argmin_d R(d) := argmin_d E_ρ ‖ρ̂⁽ⁿ⁾_d − ρ‖₂²

where ρ̂⁽ⁿ⁾_d is the estimator of dimension d for n samples.
However R(d) depends on ρ, so we can try to estimate it and minimise the estimator.
168
Methods for choosing the dimension d(n)

The risk as a function of d is

    R(d) = E_ρ ‖ρ̂⁽ⁿ⁾_d − ρ‖₂² = Σ_{j,k=0}^{d} E_ρ |ρ̃⁽ⁿ⁾_{j,k}|² − 2 Σ_{j,k=0}^{d} |ρ_{j,k}|² + ‖ρ‖₂²

We try to change this into an expression depending only on the data (not on ρ_{j,k}), which is an estimator of the risk.

The last term does not depend on d and can be dropped without changing the minimum
The first term can be estimated from the data by simply taking Σ_{j,k=0}^{d} |ρ̃⁽ⁿ⁾_{j,k}|²
The second term can be estimated unbiasedly by

    −(2/(n(n−1))) Σ_{j,k=0}^{d} Σ_{a≠b} F_{j,k}(X_a, Φ_a) F̄_{j,k}(X_b, Φ_b)

Thus our estimator of R(d) − ‖ρ‖₂² is

    Σ_{j,k=0}^{d} |ρ̃⁽ⁿ⁾_{j,k}|² − (2/(n(n−1))) Σ_{j,k=0}^{d} Σ_{a≠b} F_{j,k}(X_a, Φ_a) F̄_{j,k}(X_b, Φ_b)
169
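A runnable sketch of this risk estimator, under simplifying assumptions: a single generic pattern function f is reused for every pair (j, k), whereas in the text each pair has its own f_{j,k}; all names are illustrative. The sum over a ≠ b is computed via the identity Σ_{a≠b} F(X_a)F̄(X_b) = |Σ_a F(X_a)|² − Σ_a |F(X_a)|².

```python
import cmath

def F(f, j, k, x, phi):
    # F_{j,k}(x, phi) = f(x) exp(i(j-k) phi); one generic f for all (j,k) here
    return f(x) * cmath.exp(1j * (j - k) * phi)

def cv_risk_estimate(samples, f, d):
    """Data-driven estimate of R(d) - ||rho||_2^2 (requires n >= 2 samples)."""
    n = len(samples)
    total = 0.0
    for j in range(d + 1):
        for k in range(d + 1):
            vals = [F(f, j, k, x, phi) for x, phi in samples]
            s = sum(vals)
            term1 = abs(s / n) ** 2                          # |rho~_{j,k}|^2
            # unbiased estimate of 2|rho_{j,k}|^2 from the pairs a != b
            cross = abs(s) ** 2 - sum(abs(v) ** 2 for v in vals)
            total += term1 - 2.0 * cross / (n * (n - 1))
    return total
```

Minimising `cv_risk_estimate` over d mimics the data-driven choice of dimension; since ‖ρ‖₂² is a constant, the minimiser is unchanged.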
Methods for choosing the dimension d(n)

[Figure: L² norm of the error as a function of the dimension D; L² error, theoretical and for the CV estimator, for 1000 observations]

The graph represents the risk R(d) as a function of dimension d (red) and the estimated risk for three samples of data from a squeezed state (blue)
170
Quantum Statistics

Quantum mechanics makes predictions about the direct map:

    M : ρ −→ P_ρ^(M)

What if ρ is not known?
A quantum statistical model (experiment) is a family of states indexed by a parameter θ belonging to a space Θ

    Q := {ρ_θ : θ ∈ Θ}

For each M we obtain a classical statistical model P^(M) := {P_{ρ_θ}^(M) : θ ∈ Θ} and we can apply ‘classical’ statistical tools to solve inverse problems like

    X ∼ P_ρ^(M) −→ θ̂(X) ≈ θ

Questions:
for which measurements is θ identifiable?
which measurements are optimal for a given statistical problem?
how much statistical information does Q contain?
can we develop a theory of statistical models at the quantum level?
6

Motivation/Applications

    ρ_θ ∼∼∼→ [measurement apparatus M] −→ measurement result X ∼ P_θ^(M) −→ estimator θ̂(X)

Quantum Engineering
statistical validation through measurements of new quantum states and devices
quantum state/process estimation
Quantum Information and Computation
encoding and decoding information with quantum states
state discrimination
Statistics
extend statistical decision theory to noncommutative models
connections with Quantum Probability, Quantum Control...
7

History of Quantum Statistics

[1966] Transmission rates for quantum channels
    R. L. Stratonovich
[1967–1976] “Quantum Detection and Estimation Theory”
    C. W. Helstrom
[1975] Optimal multiple hypothesis testing
[1976] Generalised uncertainty relations
    V. P. Belavkin
8

History of Quantum Statistics

[1972] noncommutative statistical decision theory
[1982] “Probabilistic and Statistical Aspects of Quantum Theory”
    A. S. Holevo
The Japanese School (H. Nagaoka, A. Fujiwara, K. Matsumoto, M. Hayashi)
[1987] Differential geometric aspects of quantum state estimation
[1996] Quantum Fisher information and asymptotic estimation
[1998] Asymptotic bounds and optimal quantum estimation
[2001] Statistical approach to Bell inequalities
    R. D. Gill
9

Useful references

BOOKS
1. C. W. Helstrom: Quantum Detection and Estimation Theory (1976)
2. A. S. Holevo: Probabilistic and statistical aspects of quantum theory (1982)
3. M. Nielsen and I. Chuang: Quantum computation and quantum information (2000)

ONLINE LECTURE NOTES
1. H. Maassen: Quantum Probability Theory, http://www.math.ru.nl/∼maassen/lectures/qp.pdf
2. N. P. Landsman: Lecture notes on C∗ algebras, Hilbert C∗ modules and Quantum Mechanics, http://xxx.lanl.gov/pdf/math-ph/9807030
3. R. D. Gill et al.: Quantum Statistics [book draft], http://www.math.leidenuniv.nl/∼gill/teaching/quantum/pages from Qbook.pdf

PAPERS
1. Barndorff-Nielsen O. E., Gill R. D., Jupp P. E., On quantum statistical inference (with discussion), J. Royal Statist. Soc. B, 65, 775–816, (2003)
2. Artiles L., Gill R. D., Guta M., An invitation to quantum tomography, J. Royal Statist. Soc. B, 67, 109–134, (2005)
3. Guta M., Janssens B., Kahn J., Optimal estimation of qubit states with continuous time measurements, Commun. Math. Phys., 277, 127–160, (2008)
10

Color code

red is used for keywords
brown is used for notions which are defined in appendices
11

Quantum Mechanics as noncommutative probability theory

                          Classical                               Quantum
‘Space’                   (Ω, Σ, ν) measure space                 H Hilbert space
Observables               L∞(Ω, Σ, ν) bounded random variables    B(H) bounded selfadjoint operators
States                    p ∈ L1(Ω, Σ, ν) probability densities   ρ ∈ T1(H) density matrices
Pairing = expectations    (p, f) → ∫ p(ω)f(ω)ν(dω)                (ρ, A) → Tr(ρA)
Duality                   L∞(Ω, Σ, ν) = L1(Ω, Σ, ν)∗              B(H) = T1(H)∗
Transformations           T : L1 → L1 randomisations,             C : T1(H) → T1(H) quantum channel,
                          positive normalised                     completely positive normalised
12

Hilbert spaces

Inner product space
Hilbert space
Orthonormal basis
Physical examples
13

Inner product spaces

Definition
An inner product over a C-linear space V is a map ⟨·,·⟩ : V × V → C satisfying the following conditions for all u, v, w ∈ V, λ ∈ C:
    ⟨u, v + w⟩ = ⟨u, v⟩ + ⟨u, w⟩
    ⟨u, λv⟩ = λ⟨u, v⟩
    ⟨u, v⟩ = conj(⟨v, u⟩)
    ⟨u, u⟩ ≥ 0 for all u ∈ V, and ⟨u, u⟩ = 0 if and only if u = 0.

Example
Cⁿ: n-tuples u := (u₁, u₂, …, u_n) of complex numbers with ⟨u, v⟩ = Σ_{j=1}^n ū_j v_j
C[a, b]: continuous complex valued functions on [a, b] with ⟨f, g⟩ = ∫_a^b conj(f(x)) g(x) dx
14

Hilbert spaces

Definition (Hilbert space)
An inner product space (H, ⟨·,·⟩) is called a Hilbert space if it is complete with respect to the norm ‖h‖ := √⟨h, h⟩.

Example
L²([a, b]): the space of square integrable functions on [a, b] with ⟨f, g⟩ := ∫ conj(f(x)) g(x) dx
L²(Ω, Σ, P): the space of square integrable random variables on (Ω, Σ, P) with

    ⟨X, Y⟩ := E(X̄Y) = ∫ conj(X(ω)) Y(ω) P(dω)
15
Orthonormal basis (ONB) in a separable Hilbert space

Definition
Let (H, ⟨·,·⟩) be a Hilbert space. A sequence of vectors {e_k}_{1≤k≤N} is an ONB of H if its linear span is dense in H and ⟨e_i, e_j⟩ = δ_{i,j} for all i, j. H is separable if and only if it has a countable ONB.

Properties
Any vector x ∈ H has a unique decomposition x = Σ_k x_k e_k where x_k = ⟨e_k, x⟩ are the Fourier coefficients w.r.t. the ONB {e_k}. The following Parseval equality holds

    ‖x‖² = Σ_k |⟨e_k, x⟩|².

If K is a closed subspace of H and K⊥ its orthogonal complement, then x has a unique decomposition x = y + y⊥ with y ∈ K and y⊥ ∈ K⊥, and ‖x‖² = ‖y‖² + ‖y⊥‖². The vector y is called the orthogonal projection of x onto K and satisfies

    y = argmin_{z∈K} ‖z − x‖
16
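The inner products defined on C[a, b] above can be checked numerically. A minimal sketch (midpoint-rule quadrature; the function name `inner` is illustrative): sin and cos are orthogonal in L²([0, 2π]), while ⟨sin, sin⟩ = π.

```python
import math

def inner(f, g, a, b, n=20_000):
    """Midpoint-rule approximation of <f, g> = integral_a^b conj(f(x)) g(x) dx."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h).conjugate() * g(a + (i + 0.5) * h)
               for i in range(n)) * h
```

The same routine approximates Fourier coefficients ⟨e_k, x⟩ when the e_k are trigonometric basis functions.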
Direct sum and tensor products of Hilbert spaces

Definition
Let H₁, H₂ be Hilbert spaces.
1. The direct sum H₁ ⊕ H₂ is the Hilbert space consisting of ordered pairs h₁ ⊕ h₂ ≡ (h₁, h₂) ∈ H₁ × H₂ with inner product

    ⟨g₁ ⊕ g₂, h₁ ⊕ h₂⟩ = ⟨g₁, h₁⟩ + ⟨g₂, h₂⟩

H₁ and H₂ can be seen as orthogonal complements in H₁ ⊕ H₂ by identifying h₁ ∈ H₁ with h₁ ⊕ 0 and h₂ ∈ H₂ with 0 ⊕ h₂.
2. The tensor product H₁ ⊗ H₂ is the Hilbert space obtained as the norm completion of the algebraic tensor product H₁ ⊙ H₂ w.r.t. the inner product

    ⟨g₁ ⊗ g₂, h₁ ⊗ h₂⟩ := ⟨g₁, h₁⟩⟨g₂, h₂⟩

If {e_i} and {f_j} are ONBs in H₁ and H₂ then {e_i ⊗ f_j} is an ONB in H₁ ⊗ H₂
17
Which Hilbert space corresponds to a given quantum system?

C² for a spin, 2 level system, qubit
C² ⊗ C² ⊗ · · · ⊗ C² for n qubits
L²(R) for a particle in one dimension, harmonic oscillator
F = ⊕_{n=0}^∞ H^{⊗s n} for bosonic many particle systems, quantum noise
L²(Ω, Σ, ν) square integrable random variables on (Ω, Σ, ν)
18
Hilbert space Operators

Bounded operators
The adjoint
Selfadjoint operators
Unbounded selfadjoint operators
19

Bounded selfadjoint operators

Definition
Let H be a Hilbert space. A linear map A : H → H is called a bounded linear operator on H if

    ‖A‖ := sup_{h≠0} ‖Ah‖/‖h‖ < ∞.

The space of bounded operators on H is denoted B(H).

Example (exercise)
Any linear transformation of C^d is bounded
The shift S_y given by (S_y f)(x) = f(x − y) is a bounded operator on L²(R)
The Volterra operator Tf(s) = ∫₀^s k(s, t) f(t) dt with |k(s, t)| < C is a bounded operator on L²([0, 1])

Theorem
(B(H), ‖·‖) is a Banach algebra, i.e. a Banach space which is also an algebra and satisfies ‖A · B‖ ≤ ‖A‖ ‖B‖ for all A, B ∈ B(H)
20

Adjoint, selfadjoint, C∗ property

Definition
Let A ∈ B(H). The adjoint A∗ of A is defined by

    ⟨g, A∗h⟩ = ⟨Ag, h⟩

A is called selfadjoint if A = A∗

Lemma (C∗ property)
Let A ∈ B(H). Then ‖A∗‖² = ‖A‖² = ‖A∗A‖.

Proof.
From ‖Ah‖² = ⟨Ah, Ah⟩ = ⟨h, A∗Ah⟩ ≤ ‖h‖² ‖A∗A‖ we get ‖A‖² ≤ ‖A∗A‖. Together with ‖A∗A‖ ≤ ‖A∗‖ ‖A‖ this implies ‖A‖ ≤ ‖A∗‖. A similar argument shows that ‖A∗‖ ≤ ‖A‖
21

Examples of bounded selfadjoint operators

Let H = C^d and let {e_i} be the standard basis in C^d. Then B(H) ≡ M(C^d) by identifying A with the matrix [A_{i,j}], where A_{i,j} = ⟨e_i, A e_j⟩. Then A ∈ B(H) is selfadjoint iff A_{i,j} = conj(A_{j,i}) (hermitian matrix)

The Pauli matrices

    σ_x = ( 0 1 ; 1 0 )   σ_y = ( 0 −i ; i 0 )   σ_z = ( 1 0 ; 0 −1 )

are selfadjoint and form a basis of M(C²) together with the identity 1.

Let K be a closed subspace of H. Let x = y + y⊥ be the unique orthogonal decomposition of x ∈ H, with y ∈ K and y⊥ ∈ K⊥. The orthogonal projection P_K onto K defined by P : x → y is a selfadjoint operator satisfying P = P²:

    ⟨x₁, Px₂⟩ = ⟨y₁ + y₁⊥, y₂⟩ = ⟨y₁, y₂⟩ = ⟨Px₁, x₂⟩ = ⟨Px₁, Px₂⟩
22
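The statement that {1, σ_x, σ_y, σ_z} is a basis of M(C²) can be checked concretely: every 2×2 matrix A expands as A = a₀1 + a_xσ_x + a_yσ_y + a_zσ_z with a_µ = Tr(Aσ_µ)/2. A minimal sketch using nested lists (helper names are illustrative):

```python
# 2x2 complex matrices as nested lists; the Pauli basis of M(C^2)
I2 = [[1, 0], [0, 1]]
SX = [[0, 1], [1, 0]]
SY = [[0, -1j], [1j, 0]]
SZ = [[1, 0], [0, -1]]

def trace(A):
    return A[0][0] + A[1][1]

def matmul(A, B):
    return [[sum(A[i][t] * B[t][j] for t in range(2)) for j in range(2)]
            for i in range(2)]

def pauli_coefficients(A):
    """Coefficients a_mu in A = a0*1 + ax*sx + ay*sy + az*sz, via a_mu = Tr(A s_mu)/2."""
    return [trace(matmul(A, S)) / 2 for S in (I2, SX, SY, SZ)]
```

For a hermitian A the four coefficients come out real, mirroring the selfadjointness criterion A_{i,j} = conj(A_{j,i}).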
Examples of unbounded selfadjoint operators

Definition
An (unbounded) linear operator on H is defined as a linear map R : D(R) → H, whose domain D(R) is a dense linear subspace of H. The domain of R∗ consists of those h for which there exists g := R∗(h) so that

    ⟨Rk, h⟩ = ⟨k, g⟩,   ∀k ∈ D(R)

R is selfadjoint if D(R) = D(R∗) and R = R∗ on their common domain.

Example
The position and momentum operators Q and P are selfadjoint on L²(R).

    Q : h → (Qh)(x) = xh(x)   with domain D(Q) = {h : ∫ |xh(x)|² dx < ∞}
    P : f → −i df/dx          with domain D(P) = {f : f(b) − f(a) = ∫_a^b g(x) dx, g ∈ H}
23
Spectral Theorem

Spectral Theorem in finite dimensions
Spectrum and resolvent
Projection valued measures
Spectral Theorem for bounded selfadjoint operators
Continuous functional calculus
Spectral Theorem: multiplication operator form
L∞ functional calculus
Multiplicity Theory
24

Spectral theorem in finite dimensions

Theorem (diagonalisation = spectral theorem)
Let A be a selfadjoint operator on C^d. Then there exists an ONB of eigenvectors of A:

    A f_k = λ_k f_k,   k = 1, …, d

where λ_k ∈ R are the eigenvalues of A. Let P_k be the one-dimensional projections associated to f_k, then

    A = Σ_{k=1}^d λ_k P_k = diag(λ₁, λ₂, …, λ_d)

Remark
If A = A∗ ∈ M(C^d) then ‖A‖ = max(|λ₁|, …, |λ_d|)
If A, B are selfadjoint and commute, i.e. AB = BA, then they have a common eigenbasis, so they can be diagonalised simultaneously. If AB ≠ BA no such basis exists.
25
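A minimal numerical check of the finite-dimensional spectral theorem, using σ_x for concreteness (helper names are illustrative): its eigenvalues are ±1 with spectral projections P_± = (1 ± σ_x)/2, and A = Σ_k λ_k P_k is recovered.

```python
# spectral decomposition of sigma_x: eigenvalues +1, -1 with projections
# P_pm = (1 +/- sigma_x)/2 onto the eigenvectors (1, +/-1)/sqrt(2)
I2 = [[1.0, 0.0], [0.0, 1.0]]
SX = [[0.0, 1.0], [1.0, 0.0]]

def matmul(A, B):
    return [[sum(A[i][t] * B[t][j] for t in range(2)) for j in range(2)]
            for i in range(2)]

def lincomb(a, A, b, B):
    return [[a * A[i][j] + b * B[i][j] for j in range(2)] for i in range(2)]

P_plus = lincomb(0.5, I2, 0.5, SX)
P_minus = lincomb(0.5, I2, -0.5, SX)

# A = sum_k lambda_k P_k with lambda_+ = +1, lambda_- = -1
reconstructed = lincomb(1.0, P_plus, -1.0, P_minus)
```

The projections are idempotent and mutually orthogonal, as the spectral theorem requires.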
Resolvent and spectrum

Definition
Let A ∈ B(H). A complex number α is said to be in the resolvent set ρ(A) if α1 − A is a bijection with bounded inverse. The spectrum of A is defined as σ(A) = C \ ρ(A).

Properties
The spectrum σ(A) is
    contained in the set {α ∈ C : |α| ≤ ‖A‖}
    compact
    nonempty
If A is selfadjoint then σ(A) ⊂ R and r(A) := sup_{λ∈σ(A)} |λ| = ‖A‖

Example (exercise)
The matrix σ₊ := ( 0 1 ; 0 0 ) has spectrum σ(σ₊) = {0}
Let f ∈ C([0, 1]). The multiplication operator M_f ∈ B(L²([0, 1])) has spectrum σ(M_f) = {y : f(x) = y for some x ∈ [0, 1]}
If U is unitary (UU∗ = U∗U = 1) then σ(U) ⊂ {λ ∈ C : |λ| = 1}
26

Projection valued measure (PVM)

Definition
Let {A_n} be a sequence of operators in B(H). We say that
    A_n converges in norm to A ∈ B(H) if lim_{n→∞} ‖A_n − A‖ = 0
    A_n converges strongly to A ∈ B(H) if lim_{n→∞} ‖(A_n − A)h‖ = 0 for any h ∈ H

Definition
A projection valued measure (PVM) over a measure space (Ω, Σ) is a map P : Σ → B(H) which satisfies
    P(E) is an orthogonal projection for each E ∈ Σ
    P is σ-additive: for any countable family {E_i} of mutually disjoint sets P(∪_{i=1}^∞ E_i) = Σ_{i=1}^∞ P(E_i) (sum converging strongly)
    P(Ω) = 1
For any unit vector h ∈ H we define the probability measure on (Ω, Σ)

    P_h(E) = ⟨h, P(E)h⟩
27

Examples of projection valued measures

Example
Let {e_i}_{i=1,…,N} be an ONB in H. Then the corresponding orthogonal projections P_i define a PVM over {1, …, N} with P(E) = Σ_{i∈E} P_i. The measure P_h is given by

    P_h(i) = |⟨h, e_i⟩|²

Let H = L²(Ω, Σ, ν) and let P(E) be the projection onto the subspace of functions with support in E

    P(E) : f → f · χ_E

The measure P_h is given by P_h(dω) = |h(ω)|² ν(dω)
28

Spectral Theorem

Theorem (Spectral Theorem)
Let A ∈ B(H) be selfadjoint. Then there exists a PVM P over R such that

    A = ∫ λ P(dλ)

in the sense that ⟨h, Ah⟩ = ∫ λ P_h(dλ) for every h ∈ H. The PVM is supported by the spectrum: P(σ(A)) = 1.

Example (exercise)
The multiplication operator M_x : f(x) → xf(x) on L²([0, 1]) does not have any eigenvalue but has a ‘continuous’ PVM with

    P(E) : f → f · χ_E

the projection onto the subspace of L²([0, 1]) functions with support in E
29

Main steps of the proof

Continuous functional calculus: define f(A) for f ∈ C(σ(A))
Spectral Theorem, multiplication operator form: A is unitarily equivalent to a multiplication operator on L²
L∞ calculus: define f(A) for f ∈ L∞(σ(A), µ) such that P(E) = χ_E(A)
30

Proof of the Spectral Theorem (I)

Theorem (Continuous functional calculus)
There is a unique map φ : C(σ(A)) → B(H) with the properties
(i) φ is a C∗ algebra morphism, i.e.
    φ(λf) = λφ(f), φ(fg) = φ(f)φ(g), φ(f̄) = φ(f)∗, φ(1) = 1
(ii) φ is isometric: ‖φ(f)‖ = ‖f‖
(iii) let Id be the function Id(λ) = λ, then φ(Id) = A

Proof.
1. Let P(λ) = Σ_{n=1}^p a_n λⁿ be a polynomial and let P(A) := Σ_{n=1}^p a_n Aⁿ. Then (i) and (iii) are satisfied for polynomials by choosing φ(P) = P(A)
2. Show that σ(P(A)) = {P(λ) : λ ∈ σ(A)} (exercise)
3. Show that ‖P(A)‖ = sup_{λ∈σ(A)} |P(λ)|:

    ‖P(A)‖² = ‖P(A)∗P(A)‖ = ‖(P̄P)(A)‖ = sup_{λ∈σ((P̄P)(A))} |λ| = sup_{λ∈σ(A)} |(P̄P)(λ)|

4. Extend φ by continuity from polynomials to the whole C(σ(A))
31

Proof of the Spectral Theorem (II)

Definition
Let h ∈ H be a unit vector. Then f → ⟨h, f(A)h⟩ is a positive linear functional on C(σ(A)) and by the Riesz–Markov Theorem there exists a probability measure µ_h on σ(A) such that

    ⟨h, f(A)h⟩ = ∫_{σ(A)} f(λ) µ_h(dλ)

The measure µ_h is called the spectral measure associated to h.

Definition
A vector h is cyclic for A ∈ B(H) if the span of {Aⁿh}_{n=0}^∞ is dense in H.

Theorem
Let A ∈ B(H) be selfadjoint with cyclic vector h. Then there exists a unitary U : H → L²(σ(A), µ_h) such that

    (UAU⁻¹f)(λ) = λf(λ)
32

Proof of the Spectral Theorem (III)

Proof.
1. Define U by U : φ(f)h → f for all f ∈ C(σ(A)). Then U is norm preserving by

    ‖φ(f)h‖² = ⟨h, φ(f)∗φ(f)h⟩ = ⟨h, φ(f̄f)h⟩ = ∫ |f(λ)|² µ_h(dλ)

2. Since h is cyclic and C(σ(A)) is dense in L², U can be extended to a unitary operator
3. Check that UAU⁻¹ acts as multiplication by λ on functions in C(σ(A))

    (UAU⁻¹f)(λ) = [UAφ(f)h](λ) = [Uφ(Id)φ(f)h](λ) = [Uφ(Id·f)h](λ) = λf(λ)
33

Proof of the Spectral Theorem (IV)

Remark
In general there may not exist a cyclic vector, for example if A has a degenerate eigenvalue, i.e. there exist at least two linearly independent eigenvectors.

Theorem (Spectral Theorem, multiplication operator form)
Let A ∈ B(H) be selfadjoint. Then there exist unit vectors {h_i}_{i=1}^N in H and a unitary operator U : H → ⊕_{i=1}^N L²(R, µ_{h_i}) such that

    (UAU⁻¹f)_i(λ) = λf_i(λ),   i ≥ 1

Proof.
Using Zorn's lemma we can split H into a direct sum of subspaces H_i such that A leaves each H_i invariant, i.e. Ah ∈ H_i for all h ∈ H_i. For each i there exists a vector h_i ∈ H_i which is cyclic for A↾H_i. We then apply the previous Theorem for each cyclic subspace.
34

Proof of the Spectral Theorem (V)

Theorem (L∞ functional calculus)
Let µ be a probability measure on σ(A) such that µ ∼ {µ_{h_i}}_{i≥1}, i.e. µ(E) = 0 iff µ_i(E) = 0 for all i.
Then there exists a unique morphism φ̃ : L∞(σ(A), µ) → B(H) such that
(i) φ̃ is an extension of φ : C(σ(A)) → B(H) (Continuous Functional Calculus)
(ii) φ̃ is isometric
(iii) P(E) = φ̃(χ_E)
(iv) φ̃ is normal, i.e. it is continuous with respect to the weak∗ topologies on L∞(σ(A), µ) and B(H)
35

Proof of the Spectral Theorem (V)

Proof of (i)–(iii).
1. By the previous Theorem, for any f ∈ C(σ(A)) we have φ(f) = U⁻¹[⊕_i M_i(f)]U where M_i(f) is the multiplication by f on L²(σ(A), µ_{h_i})
2. This map can be extended to L∞(σ(A), µ): since µ_{h_i} is dominated by µ, the operator M_i(f) : g → f · g is well defined on L²(σ(A), µ_{h_i}) for f ∈ L∞(σ(A), µ) (exercise)
3. φ̃ is isometric: for any f ∈ L∞(σ(A), µ) there is an i such that ‖f‖_∞ = ‖M_i(f)‖
4. We have

    f(A) := φ̃(f) = ∫ f(λ) P(dλ),   f ∈ L∞(σ(A), µ)

where the spectral projections of A are

    P(E) = χ_E(A) = φ̃(χ_E) = U⁻¹[⊕_i M_i(χ_E)]U
36

Further spectral analysis: multiplicity theory

The choice of cyclic vectors h_i and spectral measures µ_{h_i} is not unique, and it is not clear how to use them to answer the following natural question:

Question: Given two selfadjoint operators A, B, does there exist a unitary V such that A = VBV⁻¹?

Answer in finite dimensions: two selfadjoint matrices are unitarily equivalent if they have the same spectrum and the same multiplicities for each eigenvalue.

Theorem (Hahn–Hellinger)
Any selfadjoint operator is unitarily equivalent to the multiplication operator on

    ⊕_{i=1}^{ℵ₀} ( ⊕_{k=1}^{i} L²(σ(A), µ_i^{(A)}) )

where all measures µ_i^{(A)} are mutually disjoint. Two operators are unitarily equivalent if and only if all their measures are equivalent, µ_i^{(A)} ∼ µ_i^{(B)}, i.e. they have the same sets of measure zero.

Reference: V. S. Sunder, Functional Analysis: Spectral Theory, Birkhäuser (1998)
37
Traceclass operators

The trace
Polar decomposition
Traceclass operators
Duality between T1(H) and B(H)
38

The trace

Definition
The trace of a positive operator A ∈ B(H) is defined by

    Tr(A) = Σ_k ⟨e_k, A e_k⟩

where {e_k} is an ONB. The trace is independent of the basis and has the following properties
    Tr(A) is independent of the ONB {e_i}
    Tr(A + B) = Tr(A) + Tr(B)
    Tr(λA) = λTr(A), λ ≥ 0
    Tr(UAU∗) = Tr(A) for all unitaries U
    if 0 ≤ A ≤ B then Tr(A) ≤ Tr(B)
39
The trace is independent of the ONB

Proof.
Given the ONB {e_k} define Tr_e(A) = Σ_k ⟨e_k, A e_k⟩. Let {f_j} be another ONB. Then

    Tr_e(A) = Σ_k ⟨e_k, A e_k⟩ = Σ_k ‖A^{1/2} e_k‖²
            = Σ_k Σ_j |⟨f_j, A^{1/2} e_k⟩|²
            = Σ_j Σ_k |⟨A^{1/2} f_j, e_k⟩|²
            = Σ_j ‖A^{1/2} f_j‖²
            = Tr_f(A)

where the sums can be exchanged since all terms are positive. The other properties are left as an exercise.
40
The polar decomposition

Definition
An operator W ∈ B(H) is called a partial isometry if both WW∗ and W∗W are orthogonal projections
The absolute value of B ∈ B(H) is defined by |B| = √(B∗B) ≥ 0

Theorem (polar decomposition)
Let B ∈ B(H). Then there exists a partial isometry W such that B = W|B|. W is uniquely determined by the condition that Ker(W) = Ker(B).

Sketch of the proof.
The map W : Ran(|B|) → Ran(B) given by W : |B|h → Bh is well defined since

    ‖Bh‖² = ⟨Bh, Bh⟩ = ⟨h, B∗Bh⟩ = ⟨h, |B|²h⟩ = ‖|B|h‖²

Extend W to an isometry from the closure of Ran(|B|) to the closure of Ran(B), and to zero on Ran(|B|)⊥. Since |B|h = 0 ⇔ Bh = 0, we have Ker(W) = Ker(|B|) = Ker(B)
41
Traceclass operators

Definition
The space of traceclass operators is

    T1(H) = {τ ∈ B(H) : ‖τ‖₁ := Tr|τ| < ∞}.

Properties
1. T1(H) is a Banach space
2. Let A ∈ T1(H) be selfadjoint. Then A has a complete basis of eigenvectors e_i with eigenvalues λ_i such that A = Σ_i λ_i P_{e_i} and ‖A‖₁ = Σ_i |λ_i|
3. If A ∈ T1(H) and B ∈ B(H) then
    A∗, AB, BA ∈ T1(H)
    Tr(AB) = Tr(BA)
    Tr(A∗) = conj(Tr(A))
    |Tr(AB)| ≤ ‖A‖₁ · ‖B‖

Remark
Point 2. is a particular case of the following: any selfadjoint compact operator has discrete spectrum with λ_i → 0. Point 3. can be proved using point 2. and the polar decomposition (exercise)
42
Point 1.: proof of the triangle inequality

Let A, B ∈ T1. We will show that ‖A + B‖₁ ≤ ‖A‖₁ + ‖B‖₁. Consider the polar decompositions

    A + B = U|A + B|,   A = V|A|,   B = W|B|

Then

    Tr|A + B| = Σ_k ⟨e_k, U∗(A + B)e_k⟩ ≤ Σ_k |⟨e_k, U∗V|A|e_k⟩| + Σ_k |⟨e_k, U∗W|B|e_k⟩|

Now by applying the Cauchy–Schwarz inequality twice

    Σ_k |⟨e_k, U∗V|A|e_k⟩| ≤ Σ_k ‖|A|^{1/2}V∗Ue_k‖ · ‖|A|^{1/2}e_k‖
        ≤ (Σ_k ‖|A|^{1/2}V∗Ue_k‖²)^{1/2} · (Σ_k ‖|A|^{1/2}e_k‖²)^{1/2}

The sums on the right side are equal to Tr(U∗V|A|V∗U) and Tr(|A|). Using the fact that U, V are partial isometries one can show that Tr(U∗V|A|V∗U) ≤ Tr(|A|), hence the left side is smaller than ‖A‖₁
43
B(H) = T1(H)∗

Definition
Let V be a Banach space. The dual V∗ is the space of continuous linear maps t : V → C. V∗ is a Banach space when endowed with the norm

    ‖t‖ = sup_{‖v‖=1} |t(v)|

Theorem
The space (B(H), ‖·‖) is the dual of T1(H) with the pairing

    T1(H) × B(H) ∋ (τ, A) → Tr(τA)

Sketch of the proof.
1. Show that B(H) ⊂ T1(H)∗. Let B ∈ B(H). Since |Tr(Bτ)| ≤ Tr|τ| ‖B‖, the linear functional τ → Tr(τB) is bounded on T1(H)
2. Show that T1(H)∗ ⊂ B(H). Let ℓ ∈ T1(H)∗. Then ℓ(|h⟩⟨k|) = ⟨k, Bh⟩ = Tr(B|h⟩⟨k|) for some B ∈ B(H). Use the fact that finite rank operators are ‖·‖₁ dense in T1(H)
44
States, Observables and Measurements

States and observables in Quantum Mechanics
The weak∗ topology
Measurements as (completely) positive maps
Positive operator valued measures
Naimark’s dilation Theorem
45

States and observables in quantum mechanics

Definition
Let H be the Hilbert space associated to a quantum system
    A (bounded) observable is defined as a selfadjoint operator A ∈ B(H)
    A density matrix is a positive traceclass operator ρ such that Tr(ρ) = 1
    A state on B(H) is a linear functional ϕ : B(H) → C of the form ϕ(A) = Tr(ρA) where ρ is a density matrix.

Lemma (exercise)
Let A = ∫_{σ(A)} λP(dλ) be an observable and let ϕ be a state with density matrix ρ. Then

    P_ρ(E) = ϕ(χ_E(A)) = Tr(P(E)ρ),   E ∈ Σ ∩ σ(A)

defines a probability distribution over σ(A).
46

Probabilistic interpretations

Probabilistic interpretation for measurements of observables
If we measure the observable A = ∫ λP(dλ) of a system prepared in state ϕ with density matrix ρ, we obtain a random result X ∈ σ(A) with distribution P_ρ.

Probabilistic interpretation for mixtures of states
Recall that any selfadjoint τ ∈ T1(H) has the spectral decomposition τ = Σ λ_i P_i. In particular if τ is a density matrix then λ_i ≥ 0 and Σ_i λ_i = 1.
The space of density matrices (states) S(H) is convex and its extremal points are the one dimensional projections |h⟩⟨h| called pure states.
If a system is prepared randomly in state ρ_i with probability µ_i (µ_i ≥ 0, Σ µ_i = 1) and i is unknown, then the corresponding state is ρ = Σ_i µ_i ρ_i
47

The weak∗ topology

Definition
Let V be a Banach space. The weak∗ topology on the dual V∗ is defined by the convergence criterion (on nets):

    ℓ_n →(w∗) ℓ  iff  ℓ_n(v) → ℓ(v) for all v ∈ V.

Example
L∞(Ω, Σ, µ) = L1(Ω, Σ, µ)∗:
    f_n →(w∗) f iff ∫ p(ω)f_n(ω)µ(dω) → ∫ p(ω)f(ω)µ(dω) for all p ∈ L1(Ω, Σ, µ)
B(H) = T1(H)∗:
    A_n →(w∗) A iff Tr(τA_n) → Tr(τA) for all τ ∈ T1(H)

Theorem
Let V be a Banach space. The linear functionals on V∗ which are continuous with respect to the weak∗ topology are precisely those of V ⊂ V∗∗, v → ṽ(ℓ) := ℓ(v) for v ∈ V and ℓ ∈ V∗.
48

weak∗ continuity of φ̃

Recall
The L∞ functional calculus Theorem associates to the selfadjoint operator A a morphism φ̃ : L∞(σ(A), µ) → B(H)

Lemma
φ̃ is continuous with respect to the weak∗ topology.

Sketch of the proof.
Let ρ ∈ S(H). P_ρ is dominated by µ. Indeed

    µ(E) = 0 ⇒ P(E) = 0 ⇒ P_ρ(E) = Tr(ρP(E)) = 0

Thus P_ρ has density p_ρ = dP_ρ/dµ ∈ L1(σ(A), µ).
If f_n →(w∗) f then Tr(ρφ̃(f_n)) = ∫ f_n(ω)p_ρ(ω)µ(dω) → ∫ f(ω)p_ρ(ω)µ(dω)
49

Measurements as (completely) positive unital maps (I)

Definition
Let φ : L∞(σ(A), µ) → B(H) be a weak∗ continuous morphism (previously denoted φ̃). We define φ∗ : T1(H) → L1(σ(A), µ) by the duality

    Tr(τφ(f)) = ∫ f(ω)p_τ(ω)µ(dω),   p_τ := φ∗(τ)

φ∗ has the following properties:
    it is linear and positive, i.e. φ∗(τ) ≥ 0 if τ ≥ 0
    it is normalised, i.e. p_ρ is a probability density if ρ is a density matrix
50

Measurements as (completely) positive unital maps (II)

Theorem
Let M : L∞(Ω, Σ, µ) → B(H) be a linear map such that:
    M is positive, i.e. M(f) ≥ 0 if f ≥ 0
    M is unital, i.e. M(1) = 1
    M is continuous with respect to the w∗ topology
Then there exists a linear map M∗ : T1(H) → L1(Ω, Σ, µ) which satisfies

    Tr(τM(f)) = ∫ p_τ(ω)f(ω)µ(dω),   p_τ := M∗(τ)

and is
    positive, i.e. M∗(τ) ≥ 0 for τ ≥ 0
    normalised, i.e. p_ρ is a probability density if ρ is a density matrix
Conversely, any linear map M∗ with these properties has a dual M.

Hints for the proof.
⇒ : show that f → Tr(τM(f)) is weak∗ continuous and hence is given by some p_τ
⇐ : show that M∗ is ‖·‖₁ continuous. Then define M(f) as an element of the dual of T1(H)
51
Measurements as (completely) positive unital maps (III)

Definition (general definition of a measurement)
Let B(H) be the algebra of observables of a quantum system. A measurement with outcomes in the measure space (Ω, Σ) is given by a dual pair (M, M∗) as above. The result X ∈ Ω of M has probability density p_ρ := M∗(ρ) ∈ L1(Ω, Σ, µ)

Example
Let r_i be 3 coplanar unit vectors in R³ forming 120 degree angles. The triad, or Mercedes-Benz measurement on C² consists of 3 operators

    M_i = (1/3)(1 + r_i · σ) = (1/3) ( 1 + r_{i,z}    r_{i,x} − i r_{i,y} ; r_{i,x} + i r_{i,y}    1 − r_{i,z} )

randomised measurement: Let M, N be two measurements with outcomes in (Ω, Σ). Then R(f) := λM(f) + (1 − λ)N(f) defines a measurement obtained by randomly choosing M or N with probabilities (λ, 1 − λ)
52
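The triad example above can be verified concretely. A minimal sketch (the choice of the x–z plane for the three unit vectors is an assumption made here for concreteness; any coplanar triple at 120 degrees works): the three elements are positive, rank one, and sum to the identity.

```python
import math

def trine_povm():
    """Three POVM elements M_i = (1/3)(1 + r_i . sigma) for unit vectors r_i
    at 120-degree angles in the x-z plane (an illustrative choice of plane)."""
    elements = []
    for angle in (math.pi / 2, math.pi / 2 + 2 * math.pi / 3, math.pi / 2 + 4 * math.pi / 3):
        rx, ry, rz = math.cos(angle), 0.0, math.sin(angle)
        elements.append([[(1 + rz) / 3, (rx - 1j * ry) / 3],
                         [(rx + 1j * ry) / 3, (1 - rz) / 3]])
    return elements
```

Since the three unit vectors sum to zero, the σ-terms cancel and Σ_i M_i = 1, while each M_i has determinant 0 and trace 2/3, hence is positive of rank one.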
53 . Then the operators M(E ) := M(χE ) form a POVM over (Ω. Σ) with values in B(H) there exists a probability measure µ and a measurement M : L∞ (Ω. µ) → B(H) with M(χE ) = M(E ).Positive operator valued measures (POVM) Deﬁnition Let (Ω. for every POVM {M(E ) : E ∈ Σ} over (Ω. Σ. Σ). Conversely. Σ) be a measure space. Σ. A map M : Σ → B(H) is called a positive operator valued measure (POVM) if it has the following properties positivity: M(E ) ≥ 0 for all E ∈ Σ P σadditivity: M(∪i Ei ) = i M(Ei ) (in the sense of strong convergence) for any countable family of mutually disjoint sets Ei ∈ Σ normalisation: M(Ω) = 1 Theorem Let M be a measurement M : L∞ (Ω. µ) → B(H).
Σµ) has the desired properties dµ 54 . Pk k X i=1 k→∞ M(Ei ).(Mn ). µ) as follows.r. Since M is weak∗ continuous we have M(∪i Ei ) = w ∗ . We only need to ﬁnd a common dominating measure.lim By the previous lemma. ⇐: Given {M(E )} we construct M∗ : T1 (H) → L1 (Ω.b. Thus M∗ : τ → pτ := dµτ ∈ L1 (Ω.u. i=1 M(Ei ) → M(∪i Ei ) strongly. Lemma Let Mn be an increasing net of positive operators converging to a bounded operator M w. Then µτ µ because µ(E ) = 0 ⇒ M(E ) = 0 ⇒ Tr(τ M(E )) = 0. Then Mn converges strongly to M and M is the least upper bound l.Measurements and POVM’s Proof. ⇒: we only need to prove the σadditivity of {M(E )}. Σ.t the weak∗ topology. Let ρ be a density matrix with strictly positive eigenvalues and let µ = µρ .. For every density matrix τ deﬁne the probability measure µτ (E ) := Tr(τ M(E )).
Then {M(E ) := V ∗ P(E )V } is a POVM with values in B(H) Theorem (Naimark’s dilation Theorem) Let M : L∞ (Ω. E ∈Σ Remark Naimark’s Theorem is a consequence of Stinespring’s Theorem for commutative C∗ algebras Since V is isometric we can identifying H with V H ⊂ K and write M(E ) = PH P(E )PH 55 .Naimark’s dilation Theorem Example Let {P(E )} be a PVM with values in B(K) and let V : H → K be an isometry. Σ. There exists a projection valued measure P : Σ → B(K) and an isometry V : H → K such that M(E ) = V ∗ P(E )V . µ) → B(H) be a measurement.
.Proof of Naimark’s Theorem for ﬁnite measure spaces Let Ω = {1. Mi k 56 . Vk = h. kn ) M := hi . . n} and POVM {M1 . h). . . . Deﬁne the (positive) inner product over the direct sum of d copies of H: ¯ ¯ h. . . Mi h = h. k M n X i=1 = (h1 . . . . M = 0} Deﬁne V : H → K by V : h → (h. . . . h i Let Pi ∈ B(K) e the orthogonal projection onto the i’s copy of H Verify that V ∗ Pi V = Mi : ¯ ¯ h. Mi ki Let K be the Hilbert space (⊕n H)/N where i=1 ¯ ¯ N := {h ∈ ⊕n H : h i=1 is the space of null vectors of ·. V ∗ Pi Vk = Vh. h. . Then V is an isometry: X ¯ k M= ¯ Vh. . Pi Vk = hPi k M = h. . (k1 . . · M. . . hn ). Mn }. .
Further topics related to measurements

- Measurements are ‖·‖_1-contractive
- Bures (fidelity) distance on density matrices
- Measurements are contractive w.r.t. the Bures–Hellinger distance
- Convex structure of the space of measurements; extremal measurements
- In finite dimensions measurements have densities

57
Measurements are ‖·‖_1-contractive maps

Lemma. Let M_* : T_1(H) → L^1(Ω, Σ, µ) be a measurement. Let ρ, τ be density matrices and p_ρ := M_*(ρ), p_τ := M_*(τ). Then
  ‖p_ρ − p_τ‖_1 ≤ ‖ρ − τ‖_1.

Proof. Note that if f, g are probability densities
  ‖f − g‖_1 = ∫ |f(ω) − g(ω)| µ(dω) = 2 sup_E ∫_E (f(ω) − g(ω)) µ(dω).
Similarly, if ρ, τ are density matrices then we can write ρ − τ = δ_+ − δ_− where δ_± are positive operators with orthogonal supports. Then
  ‖ρ − τ‖_1 = Tr|ρ − τ| = Tr(δ_+ + δ_−) = 2 Tr(δ_+) = 2 sup_M Tr(M(ρ − τ)),
where the supremum is taken over all operators 0 ≤ M ≤ 1. Finally,
  ‖p_ρ − p_τ‖_1 = 2 sup_E ∫_E (p_ρ(ω) − p_τ(ω)) µ(dω) = 2 sup_E Tr((ρ − τ)M(E)) ≤ ‖ρ − τ‖_1.

58
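The lemma can be sanity-checked numerically on random states and a random POVM. A sketch (NumPy; the random-model generators below are illustrative helpers, not library routines):

```python
import numpy as np

rng = np.random.default_rng(0)

def rand_density(d):
    # random density matrix: G G* normalised to unit trace
    G = rng.normal(size=(d, d)) + 1j*rng.normal(size=(d, d))
    A = G @ G.conj().T
    return A / np.trace(A).real

def rand_povm(d, n):
    # random POVM: positive A_i rescaled as M_i = S^{-1/2} A_i S^{-1/2}, S = sum A_i
    As = []
    for _ in range(n):
        G = rng.normal(size=(d, d)) + 1j*rng.normal(size=(d, d))
        As.append(G @ G.conj().T)
    S = sum(As)
    w, v = np.linalg.eigh(S)
    S_ih = (v / np.sqrt(w)) @ v.conj().T
    return [S_ih @ A @ S_ih for A in As]

def trace_norm(X):
    return np.linalg.svd(X, compute_uv=False).sum()

d, n = 3, 5
rho, tau = rand_density(d), rand_density(d)
povm = rand_povm(d, n)
p_rho = np.array([np.trace(rho @ M).real for M in povm])
p_tau = np.array([np.trace(tau @ M).real for M in povm])
l1_classical = np.abs(p_rho - p_tau).sum()
l1_quantum = trace_norm(rho - tau)
assert l1_classical <= l1_quantum + 1e-10   # ||p_rho - p_tau||_1 <= ||rho - tau||_1
```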
Bures (fidelity) distance on density matrices

Definition. Let ρ, τ be two density matrices on H. The Bures (or fidelity) distance between ρ and τ is defined as
  b(ρ, τ) := (2 − 2‖ρ^{1/2} τ^{1/2}‖_1)^{1/2} = (2 − 2 Tr(√(ρ^{1/2} τ ρ^{1/2})))^{1/2}.

Definition. Let ρ be a density matrix on H. A purification of ρ is any pure state P_ψ = |ψ⟩⟨ψ| on an extended space H ⊗ K such that ρ = Tr_K(P_ψ), or equivalently Tr(ρA) = ⟨ψ, A ⊗ 1 ψ⟩ for all A ∈ B(H).

Theorem. The fidelity F(ρ_1, ρ_2) := Tr(√(ρ_1^{1/2} ρ_2 ρ_1^{1/2})) is equal to max |⟨ψ_1, ψ_2⟩|, where the maximum is taken over all purifications ψ_1, ψ_2 of ρ_1, ρ_2.

59
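The two expressions for the fidelity in the definition, ‖ρ^{1/2}τ^{1/2}‖_1 and Tr√(ρ^{1/2}τρ^{1/2}), can be checked to agree numerically. A sketch (NumPy; `sqrtm_psd` is a small helper written for this illustration, not a library routine):

```python
import numpy as np

def sqrtm_psd(A):
    # matrix square root of a positive semidefinite Hermitian matrix
    w, v = np.linalg.eigh(A)
    return (v * np.sqrt(np.clip(w, 0, None))) @ v.conj().T

rng = np.random.default_rng(2)
G1 = rng.normal(size=(3, 3)) + 1j*rng.normal(size=(3, 3))
G2 = rng.normal(size=(3, 3)) + 1j*rng.normal(size=(3, 3))
rho = G1 @ G1.conj().T; rho /= np.trace(rho).real
tau = G2 @ G2.conj().T; tau /= np.trace(tau).real

s = sqrtm_psd(rho)
# trace norm of rho^{1/2} tau^{1/2} = sum of its singular values
F_trace_norm = np.linalg.svd(s @ sqrtm_psd(tau), compute_uv=False).sum()
# Tr sqrt(rho^{1/2} tau rho^{1/2})
F_sqrt = np.trace(sqrtm_psd(s @ tau @ s)).real
assert np.isclose(F_trace_norm, F_sqrt)

b = np.sqrt(2 - 2*F_sqrt)    # Bures distance
assert 0 <= b <= np.sqrt(2)
```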
Fidelity and transition probability

Sketch of the proof.
1. Let τ ∈ T_1(H). Then max_U |Tr(Uτ)| = Tr|τ|, with the maximum taken over all unitaries (use the polar decomposition τ = V|τ| with a unitary V).
2. Let ρ_i = Σ_k λ_k^{(i)} |e_k^{(i)}⟩⟨e_k^{(i)}| be the spectral decompositions of ρ_i. Any purification of ρ_i is of the Schmidt form
  ψ_i = Σ_k √(λ_k^{(i)}) e_k^{(i)} ⊗ f_k^{(i)} ∈ H ⊗ H,
with {f_k^{(1)}} and {f_k^{(2)}} orthonormal sets in H (K can be taken to be H).
3. There exist unitaries U_i : f_k^{(i)} ↦ e_k^{(i)} for i = 1, 2 and V : e_k^{(1)} ↦ e_k^{(2)}. Check that
  ⟨ψ_1, ψ_2⟩ = Tr(ρ_1^{1/2} ρ_2^{1/2} U_2 V V^T U_1^T).
4. Optimise over U_2 and use point 1 to obtain the equality
  max |⟨ψ_1, ψ_2⟩| = Tr|ρ_1^{1/2} ρ_2^{1/2}|.

60
Measurements are contractive w.r.t. the Bures–Hellinger distance

Definition. Let p, q be two probability densities in L^1(Ω, Σ, µ). The Hellinger distance between p and q is defined as
  h(p, q) := ‖√p − √q‖_2 = (2 − 2 ∫ √(p(ω) q(ω)) µ(dω))^{1/2}.

Theorem. Let M_* : T_1(H) → L^1(Ω, Σ, µ) be a measurement. Let ρ, τ be density matrices and p_ρ := M_*(ρ), p_τ := M_*(τ). Then
  h(p_ρ, p_τ) ≤ b(ρ, τ).

61
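The theorem can be illustrated numerically: for random states and a random POVM, the Hellinger distance of the outcome distributions never exceeds the Bures distance of the states. A sketch (NumPy; the random-model generators are illustrative helpers):

```python
import numpy as np

rng = np.random.default_rng(3)

def sqrtm_psd(A):
    w, v = np.linalg.eigh(A)
    return (v * np.sqrt(np.clip(w, 0, None))) @ v.conj().T

def fidelity(rho, tau):
    s = sqrtm_psd(rho)
    return np.trace(sqrtm_psd(s @ tau @ s)).real

def rand_density(d):
    G = rng.normal(size=(d, d)) + 1j*rng.normal(size=(d, d))
    A = G @ G.conj().T
    return A / np.trace(A).real

def rand_povm(d, n):
    As = []
    for _ in range(n):
        G = rng.normal(size=(d, d)) + 1j*rng.normal(size=(d, d))
        As.append(G @ G.conj().T)
    S = sum(As)
    w, v = np.linalg.eigh(S)
    S_ih = (v / np.sqrt(w)) @ v.conj().T
    return [S_ih @ A @ S_ih for A in As]

d, n = 3, 4
rho, tau = rand_density(d), rand_density(d)
povm = rand_povm(d, n)
p = np.array([np.trace(rho @ M).real for M in povm])
q = np.array([np.trace(tau @ M).real for M in povm])
h = np.sqrt(max(2 - 2*np.sum(np.sqrt(np.clip(p, 0, None)*np.clip(q, 0, None))), 0.0))
b = np.sqrt(max(2 - 2*fidelity(rho, tau), 0.0))
assert h <= b + 1e-10   # h(p_rho, p_tau) <= b(rho, tau)
```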
Proof of contractivity with respect to the Bures–Hellinger distance

Proof in the case of a discrete measure space. Let Ω = {1, ..., n} and the measurement POVM {M_1, ..., M_n}. The theorem is equivalent to
  Σ_{k=1}^n √(p_k q_k) ≥ F(ρ, τ).

By Naimark's Theorem we can embed H into a larger space K such that the measurement is given by a PVM {P_1, ..., P_n}. This operation leaves F(ρ, τ) invariant.

F(ρ, τ) = sup_{ψ,φ} |⟨ψ, φ⟩|, where ψ, φ ∈ K ⊗ K are purifications of ρ, τ. Thus it suffices to show that for any ψ, φ
  Σ_{k=1}^n √(p_k q_k) ≥ |⟨ψ, φ⟩|.

But p_k = ⟨ψ, P_k ⊗ 1 ψ⟩ = ‖P_k ⊗ 1 ψ‖² and q_k = ⟨φ, P_k ⊗ 1 φ⟩ = ‖P_k ⊗ 1 φ‖². By using Cauchy–Schwarz we finally get
  Σ_{k=1}^n √(p_k q_k) = Σ_{k=1}^n ‖P_k ⊗ 1 ψ‖ · ‖P_k ⊗ 1 φ‖
   ≥ Σ_{k=1}^n |⟨P_k ⊗ 1 ψ, P_k ⊗ 1 φ⟩| ≥ |Σ_{k=1}^n ⟨ψ, P_k ⊗ 1 φ⟩| = |⟨ψ, φ⟩|.

62
Extremal measurements

Definition. A subset C of a vector space V is convex if for any u ≠ v ∈ C the vectors w := λu + (1−λ)v are in C for all 0 < λ < 1. w ∈ C is called an extremal point of C if it cannot be decomposed as above.

Problem. Characterise the extremal points of the convex set of measurements M : L^∞(Ω, Σ, µ) → B(H).

Theorem. Let {M_1, ..., M_n} be the POVM of a measurement M with values in Ω = {1, ..., n}. Let
  M_i = Σ_{j=1}^{r_i} m_j^{(i)} |v_j^{(i)}⟩⟨v_j^{(i)}|
with ⟨v_j^{(i)}, v_k^{(i)}⟩ = δ_{j,k} and m_j^{(i)} > 0. Then M is extremal if and only if the rank-one operators {|v_j^{(i)}⟩⟨v_k^{(i)}| : i = 1, ..., n; j, k = 1, ..., r_i} are linearly independent.

References: K.R. Parthasarathy, Inf. Dim. Analysis, Quantum Probability Rel. Topics, 557–568 (1999); G.M. D'Ariano et al, J. Phys. A: Math. Gen. 38, 5979–5991 (2005)

63
Extremal measurements: solution in the case of a finite POVM

Sketch of the proof.
1. M is not extremal iff there exist selfadjoint operators {D_1, ..., D_n} (not all equal to zero) such that Σ_i D_i = 0 and M_i ± D_i ≥ 0 for all i. Indeed, {M_1 ± D_1, ..., M_n ± D_n} are POVMs and M_i = ½(M_i + D_i) + ½(M_i − D_i).
2. M_i ± D_i ≥ 0 implies Ker(M_i) ⊂ Ker(D_i). This follows from ⟨h, (M_i ± D_i) h⟩ ≥ 0 by writing h = αh_1 + βh_2 with h_1 ∈ Ker(M_i) and h_2 ∈ Ker(M_i)^⊥.
3. If M_i has spectral decomposition M_i = Σ_{j=1}^{r_i} m_j^{(i)} |v_j^{(i)}⟩⟨v_j^{(i)}| with m_j^{(i)} > 0, then point 2 implies that D_i can be expressed as D_i = Σ_{j,k} d_{j,k}^{(i)} |v_j^{(i)}⟩⟨v_k^{(i)}|.
Then Σ_i D_i = 0 is equivalent to the linear dependence of the operators |v_j^{(i)}⟩⟨v_k^{(i)}|.

64
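The linear-independence criterion is easy to test numerically by stacking the flattened rank-one operators |v_j^{(i)}⟩⟨v_k^{(i)}| and computing a matrix rank. A sketch (NumPy; `extremality_flags` is an illustrative name chosen here):

```python
import numpy as np

def extremality_flags(povm, tol=1e-9):
    """Collect the rank-one operators |v_j^(i)><v_k^(i)| built from the
    eigenvectors (nonzero eigenvalues only) of each POVM element, and test
    their linear independence via the rank of the stacked flattened matrix."""
    flags = []
    for M in povm:
        w, v = np.linalg.eigh(M)
        vecs = [v[:, j] for j in range(len(w)) if w[j] > tol]
        for a in vecs:
            for b in vecs:
                flags.append(np.outer(a, b.conj()).flatten())
    F = np.array(flags)
    return np.linalg.matrix_rank(F, tol=tol) == len(flags)

# projective measurement in the standard basis: the flags are independent -> extremal
pvm = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]
assert extremality_flags(pvm)

# 'coin flip' POVM M_1 = M_2 = I/2: the flags repeat -> not extremal
trivial = [np.eye(2)/2, np.eye(2)/2]
assert not extremality_flags(trivial)
```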
In finite dimensions measurements have densities

Lemma (measurement density). Let {M(E)} be a POVM over (Ω, Σ) with values in M(C^d). Then there exists a measure µ on (Ω, Σ) and a positive density function m ∈ L^1(Ω, Σ, µ) ⊗ M(C^d) such that
  M(E) = ∫_E m(ω) µ(dω),  E ∈ Σ.
Moreover µ can be chosen such that ‖m(ω)‖ ≤ 1 µ-almost surely; in particular m is bounded.

Proof. Let tr(A) := Tr(A)/d and define the probability measure µ(E) := tr(M(E)). The matrix element M_{ij}(E) := ⟨e_i, M(E) e_j⟩ is a measure on Ω, dominated by µ. Thus there exists a density m_{ij} ∈ L^1(Ω, Σ, µ) such that
  M_{i,j}(E) = ∫_E m_{i,j}(ω) µ(dω).
Moreover from tr(M(E)) = µ(E) = ∫_E tr(m(ω)) µ(dω) it follows that tr(m(ω)) = 1 almost surely.

65
Notions of statistical inference

Statistical models
Parametric estimation
Fisher information
Cramér–Rao bound
Efficient estimators
Repeated coin toss example

66
What is statistical inference?

Given some random data X from an unknown distribution, one aims to make an 'educated guess' about some property of the underlying distribution.

Example
- Density estimation: given X_1, ..., X_n independent identically distributed (i.i.d.) with unknown density p ∈ L^1([0,1]), estimate the value of p(x) for some x ∈ [0,1].
- Hypothesis testing: given X drawn from either P_0 or P_1, decide from which of the two distributions it comes.
- Confidence intervals: together with an estimator θ̂(x) of θ, provide a neighbourhood C of θ̂(x) such that θ belongs to C with probability p.
- Sufficient statistic: can data X ∼ P_θ be 'summarised' into a 'simpler' statistic f(X) without losing information about θ?
- Optimality: how do we compare the performance of estimators and which are the optimal ones?
- Asymptotics: what happens in the limit of 'large number of data'?

67
Statistical models

Definition. Let Θ be a parameter space. A statistical model (experiment) over Θ is a family {P_θ : θ ∈ Θ} of probability distributions on a measure space (Ω, Σ).

Example
- Repeated coin toss: X_1, ..., X_n i.i.d. with P_θ([X_i = 1]) = θ and P_θ([X_i = 0]) = 1 − θ. The joint distribution is
  P_θ^n([X_1 = x_1, ..., X_n = x_n]) = Π_{i=1}^n P_θ([X_i = x_i]) = θ^{Σ_i x_i} (1 − θ)^{n − Σ_i x_i}.
- Gaussian shift on R^k: family of Gaussian distributions N(θ, V) with unknown mean θ ∈ R^k and known k × k covariance matrix V.
- Tomography: an unknown probability density p over R² is probed through its marginals along random directions φ in the plane. For each φ we get data X ∼ R[p](x, φ), where R[p] is the Radon transform
  R[p](x, φ) = ∫ p(x cos φ + t sin φ, x sin φ − t cos φ) dt.

68
Parametric estimation

Problem. Given
- an (open) subset Θ of R^k,
- data X ∼ P_θ with P_θ a probability distribution on (Ω, Σ) and θ ∈ Θ,
- a loss function W : Θ × Θ → R_+, e.g. W(θ̂, θ) = ‖θ̂ − θ‖²,
devise an estimator θ̂ = θ̂(X) such that the risk
  R(θ̂, θ) := E_θ(W(θ̂(X), θ))
is small.

Remark
- The same problem can be formulated for 'nonparametric' Θ, and/or estimation of a function t = t(θ).
- In general the estimator may be randomised, for example θ̂ = θ̂(X, U) where U is an additional random variable with fixed, known distribution: if X = x choose θ̂ ∼ K(x, ·), where K : Ω × Σ_Θ → [0, 1] is a Markov kernel.

69
Unbiased estimators

Definition. Let {P_θ : θ ∈ Θ ⊂ R^k} be a parametric statistical model and let X ∼ P_θ. An estimator θ̂(X) is called unbiased if E_θ(θ̂(X)) = θ for all θ.

Example
- Let X_1, ..., X_n be i.i.d. Bernoulli with P_θ([X = 1]) = θ and P_θ([X = 0]) = 1 − θ. Then X̄ = (Σ X_i)/n is an unbiased estimator of θ.
- Let Y_1, ..., Y_n be i.i.d. normally distributed with P_θ = N(θ, V). Then Ȳ = (Σ Y_i)/n is an unbiased estimator of θ.

Remark (exercise). Let {P_θ : θ ∈ Θ ⊂ R^k} be a parametric statistical model and let X ∼ P_θ. The mean square error of θ̂(X) can be written as the sum of a variance and a bias term:
  E_θ((θ̂ − θ)²) = ∫ (θ̂ − θ)² p_θ(dθ̂) = ∫ (θ̂ − E_θ(θ̂))² p_θ(dθ̂) + (E_θ(θ̂) − θ)² = V(θ̂) + B(θ̂)².
If θ̂ is unbiased then the mean square error is equal to V(θ̂).

70
Fisher information matrix

Let {P_θ : θ ∈ Θ ⊂ R^k} be a parametric statistical model with P_θ probability measures on (Ω, Σ) dominated by µ.

Smooth model. Throughout the following we will assume that the densities p_θ = dP_θ/dµ satisfy sufficient 'regularity conditions' allowing for differentiation w.r.t. θ and exchangeability of integral and derivative.

Definition. Let ℓ_θ := log p_θ be the log-likelihood and let ℓ̇_{θ,i} := ∂ℓ_θ/∂θ_i be the score function(s). The Fisher information matrix is defined by
  I_{i,j}(θ) := E_θ(ℓ̇_{θ,i} ℓ̇_{θ,j}) = ∫ (∂p_θ/∂θ_i)(∂p_θ/∂θ_j) p_θ(ω)^{-1} µ(dω).

71
Properties of the Fisher information matrix

- I(θ) is a positive definite real k × k matrix.
- I(θ) is additive for products of independent models (exercise): if P_θ = P_θ^{(1)} × P_θ^{(2)} then I(θ) = I^{(1)}(θ) + I^{(2)}(θ).
- The Hellinger distance between infinitesimally close densities p_θ and p_{θ+dθ} is determined by the Fisher information:
  h²(p_θ, p_{θ+dθ}) = ∫ (√(p_θ(ω)) − √(p_{θ+dθ}(ω)))² µ(dω) = ¼ I(θ)(dθ)² + o((dθ)²).
- The Fisher information matrix defines a Riemannian metric on Θ, and the corresponding geodesic distance is the Bhattacharya distance
  d(p_{θ_1}, p_{θ_2}) = 2 arccos(∫ √(p_{θ_1}(ω)) √(p_{θ_2}(ω)) µ(dω)).
- Let q_θ be the probability density of a randomisation Y of X (randomised statistic, Markov kernel), where X ∼ P_θ. Then d(q_{θ_1}, q_{θ_2}) ≤ d(p_{θ_1}, p_{θ_2}) and h(q_{θ_1}, q_{θ_2}) ≤ h(p_{θ_1}, p_{θ_2}).
- I(θ) is the unique metric contracting under all randomisations.

72
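The local expansion h²(p_θ, p_{θ+dθ}) = ¼ I(θ)(dθ)² + o((dθ)²) can be checked directly for the Bernoulli model, where I(θ) = 1/(θ(1−θ)). A short sketch:

```python
import math

# Bernoulli model: p_theta = (theta, 1 - theta), Fisher information I = 1/(theta(1-theta))
theta, dtheta = 0.3, 1e-4
I = 1/(theta*(1 - theta))

# squared Hellinger distance between p_theta and p_{theta+dtheta}
h2 = (math.sqrt(theta) - math.sqrt(theta + dtheta))**2 \
   + (math.sqrt(1 - theta) - math.sqrt(1 - theta - dtheta))**2

# agrees with (1/4) I dtheta^2 up to o(dtheta^2)
assert abs(h2 - 0.25*I*dtheta**2) < 1e-3 * 0.25*I*dtheta**2
```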
The Cramér–Rao bound

Theorem (Cramér–Rao). The following matrix inequality holds for any unbiased estimator θ̂:
  E_θ((θ̂ − θ)²) = Var(θ̂) ≥ I(θ)^{-1},
where I(θ) is the Fisher information matrix.

Proof. Let θ be one dimensional; the general case is left as an exercise. By Cauchy–Schwarz
  Var(θ̂) · I(θ) = E_θ((θ̂ − θ)²) · E_θ(ℓ̇_θ²) ≥ |E_θ((θ̂ − θ) ℓ̇_θ)|².
The right side is
  E_θ((θ̂ − θ) ℓ̇_θ) = E_θ(θ̂ ℓ̇_θ) − θ E_θ(ℓ̇_θ)
   = ∫ θ̂(ω) (dp_θ/dθ)(ω) µ(dω) − θ ∫ (dp_θ/dθ)(ω) µ(dω)
   = (d/dθ) ∫ θ̂(ω) p_θ(ω) µ(dω) − θ (d/dθ) ∫ p_θ(ω) µ(dω)
   = (d/dθ) E_θ(θ̂) = 1.

73
Remarks on the Cramér–Rao bound

- One can similarly define unbiased estimators ĝ of g(θ) for a function g : Θ → R^p. The Cramér–Rao bound in this case is
  Var(ĝ) ≥ J(θ) I(θ)^{-1} J(θ)^T,
where J(θ)_{l,i} = ∂g(θ)_l/∂θ_i is the p × k Jacobian matrix.
- For certain models there exist no unbiased estimators, e.g. the binomial distribution b(θ, n) and the function g(θ) = θ^{-1} (exercise).
- Even if unbiased estimators exist, their variance may be too big. The Cramér–Rao bound is in general not attainable, but it becomes an equality if and only if the distributions form an exponential family
  p_θ(ω) = exp(Σ_{i=1}^s η_i(θ) ĝ_i(ω) − B(θ)) h(ω).

74
.d. Xn ) = arg max τ n Y i=1 pτ (Xi ) is asymptotically eﬃcient. . Xn ) is called asymptotically eﬃcient if √ L ˆ n(θn − θ) −→ N(0.Asymptotic eﬃciency The theory of asymptotic eﬃciency shows that the Cram´rRao bound is e asymptotically attained in the following sense. . Xn be ˆ ˆ i. . Let X1 . An estimator θn = θn (X1 . Deﬁnition Let {Pθ : θ ∈ Θ ⊂ Rk } be a parametric statistical model. 75 . . with distribution Pθ . . if θ is one dimensional then nEθ ((θn − θ)2 ) → I (θ)−1 .i. . the maximum likelihood estimator ˆ θn (X1 . . . . I (θ)−1 ) ˆ In particular. Theorem Under regularity conditions. . .
Repeated coin toss example

Let P_θ be the Bernoulli distribution: P_θ([X = 1]) = θ and P_θ([X = 0]) = 1 − θ. Let X_1, ..., X_n be i.i.d. with distribution P_θ. Then X̄_n := (Σ_{i=1}^n X_i)/n is an unbiased estimator of θ, and
  Var(X̄_n) = Var(X)/n,  Var(X) = θ(1 − θ)² + (1 − θ)(0 − θ)² = θ(1 − θ).
The Fisher information is
  I(θ) = θ^{-1} + (1 − θ)^{-1} = 1/(θ(1 − θ)).
Thus X̄_n attains the Cramér–Rao bound. Moreover by the Central Limit Theorem we have
  √n (θ̂_n − θ) = (1/√n) Σ_{i=1}^n (X_i − θ) → N(0, Var(X)) = N(0, θ(1 − θ))  in law.
Hence θ̂_n is asymptotically efficient.

The maximum likelihood estimator is obtained by differentiating the likelihood:
  (d/dθ) Π_{i=1}^n p_θ(X_i) = (d/dθ) θ^{Σ_i X_i} (1 − θ)^{n − Σ_i X_i} = 0
  ⇔ (Σ_i X_i)/θ − (n − Σ_i X_i)/(1 − θ) = 0,
with solution θ̂_n = X̄_n!

76
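The claim that X̄_n is unbiased and attains the Cramér–Rao bound can be verified exactly (no simulation needed) by summing over the binomial distribution of Σ_i X_i:

```python
import math

# Exact check for the Bernoulli model: Var(Xbar_n) = theta(1-theta)/n = (n I(theta))^{-1}
theta, n = 0.3, 10
pmf = [math.comb(n, k) * theta**k * (1 - theta)**(n - k) for k in range(n + 1)]
mean = sum(p * k/n for k, p in enumerate(pmf))
var = sum(p * (k/n - mean)**2 for k, p in enumerate(pmf))
I = 1/(theta*(1 - theta))

assert abs(mean - theta) < 1e-12     # Xbar_n is unbiased
assert abs(var - 1/(n*I)) < 1e-12    # attains the Cramer-Rao bound
```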
Hypothesis testing

Problem. Let {P_0, P_1} be a binary statistical model over (Ω, Σ). Given X ∼ P_i, decide which of the two hypotheses is true, i = 0 or i = 1.

The test t : Ω → {0, 1} is 'good' if its error probabilities are small:
- type I error: P_0([t(X) = 1])
- type II error: P_1([t(X) = 0])

There are two main approaches to optimality:
1. fix a level α ∈ (0, 1) and look for a test that minimises β := P_1([t(X) = 0]) under the constraint P_0([t(X) = 1]) ≤ α;
2. fix a prior π_0, π_1 and find a test that minimises the average error probability
  P_e^π := π_0 P_0([t(X) = 1]) + π_1 P_1([t(X) = 0]).

Remark. One can extend the problem to:
- more hypotheses {P_1, ..., P_k};
- composite hypotheses: θ ∈ Θ_0 vs θ ∈ Θ_1, where {Θ_0, Θ_1} is a partition of Θ and X ∼ P_θ;
- randomised tests t = t(X, U) with U uniform on [0, 1].

77
Optimal tests

Let {P_0, P_1} be a binary statistical model over (Ω, Σ) and let p_0 and p_1 be the densities of P_0 and P_1 w.r.t. a probability measure µ.

Lemma (Neyman–Pearson lemma). Let α ∈ (0, 1) be a fixed level. Then there exists a constant k such that the likelihood ratio test
  t(ω) := 0 if p_0(ω)/p_1(ω) > k;  1 if p_0(ω)/p_1(ω) ≤ k
satisfies P_0([t(X) = 1]) = α and minimises the type II error P_1([t(X) = 0]) among the α-level tests.

Lemma (optimal Bayes test). Let π_0, π_1 be a (non-degenerate) prior distribution. Then the likelihood ratio test
  t(ω) := 0 if p_0(ω)/p_1(ω) > π_1/π_0;  1 if p_0(ω)/p_1(ω) ≤ π_1/π_0
has minimal average error
  P_e^π := π_0 P_0([t(X) = 1]) + π_1 P_1([t(X) = 0]) = ½ (1 − ‖π_1 p_1 − π_0 p_0‖_1).

78
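The optimal Bayes error formula can be confirmed by brute force on a small sample space, comparing every deterministic test against ½(1 − ‖π_1 p_1 − π_0 p_0‖_1). A sketch (the three-point model below is an illustrative choice):

```python
from itertools import product

# two discrete densities on a 3-point sample space, and a prior
p0 = [0.5, 0.3, 0.2]
p1 = [0.1, 0.3, 0.6]
pi0, pi1 = 0.4, 0.6

# minimum average error over all deterministic tests t: Omega -> {0, 1}
best = min(
    sum(pi0*p0[w] for w in range(3) if t[w] == 1)     # type I contribution
    + sum(pi1*p1[w] for w in range(3) if t[w] == 0)   # type II contribution
    for t in product([0, 1], repeat=3)
)
l1 = sum(abs(pi1*p1[w] - pi0*p0[w]) for w in range(3))
assert abs(best - 0.5*(1 - l1)) < 1e-12   # matches the closed-form Bayes error
```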
Asymptotics: Stein's Lemma and Chernoff's bound

Let {P_0, P_1} be a binary statistical model and let X_1, ..., X_n be i.i.d. with X_k ∼ P_i.

Theorem (Stein's Lemma). Let t_n(X_1, ..., X_n) be the most powerful level-α test. Then
  lim_{n→∞} (1/n) log P_1^n([t_n = 0]) = −D(p_0, p_1),
where D(p_0, p_1) is the relative entropy
  D(p_0, p_1) = ∫ p_0(ω) log(p_0/p_1) µ(dω).

Theorem (Chernoff's bound). Let π_0, π_1 be a non-degenerate prior and let t_n(X_1, ..., X_n) be the optimal Bayes test. Then
  lim_{n→∞} (1/n) log P_e^{π,n} = −C(p_0, p_1),
where C(p_0, p_1) is the Chernoff distance
  C(p_0, p_1) = −log inf_{0≤s≤1} ∫ p_0(ω)^s p_1(ω)^{1−s} µ(dω).

79
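Chernoff's bound can be probed numerically for two Bernoulli hypotheses: the exact Bayes error for n i.i.d. samples is computable through the binomial sufficient statistic, and the non-asymptotic Chernoff bound guarantees −(1/n)log(2 P_e^n) ≥ C, with convergence to C as n grows. A sketch (parameter values, grid size and the 0.05 slack are illustrative choices):

```python
import math

# testing Bernoulli(0.2) vs Bernoulli(0.6) with equal priors
p0, p1, n = 0.2, 0.6, 400

# Chernoff distance C = -log inf_s sum_x p0(x)^s p1(x)^{1-s}, via grid search over s
C = -math.log(min(p0**s * p1**(1 - s) + (1 - p0)**s * (1 - p1)**(1 - s)
                  for s in [i/1000 for i in range(1001)]))

# exact Bayes error for n i.i.d. samples (sufficient statistic: number of ones)
pe = 0.5 * sum(min(math.comb(n, k) * p0**k * (1 - p0)**(n - k),
                   math.comb(n, k) * p1**k * (1 - p1)**(n - k))
               for k in range(n + 1))
rate = -math.log(2*pe) / n
# non-asymptotic Chernoff bound: rate >= C; asymptotically rate -> C
assert C - 1e-9 <= rate <= C + 0.05
```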
The quantum Cramér–Rao bound

Quantum statistical models
Quantum state estimation
The L²(ρ) Hilbert space
The quantum Fisher–Helstrom information matrix
Quantum Cramér–Rao bound(s)
The quantum Cramér–Rao bound is achievable for Θ ⊂ R
Achievability of the quantum Cramér–Rao bound for Θ ⊂ R^k with k > 1
The right Cramér–Rao bound
The Holevo bound

80
Quantum statistical models

Definition. Let Θ be a parameter space. A quantum statistical model (experiment) over Θ is a family {ρ_θ : θ ∈ Θ} of density matrices ρ_θ ∈ T_1(H) for a given Hilbert space H.

Example
- qubit states: indexed by r = (r_x, r_y, r_z) ∈ R³ such that ‖r‖ ≤ 1:
  ρ_r = ½ [[1 + r_z, r_x − i r_y], [r_x + i r_y, 1 − r_z]]
- coherent spin states: ρ_r^n = ρ_r ⊗ ... ⊗ ρ_r, for ‖r‖ = 1 (pure states)
- unitary family: ρ_t = exp(−iHt) ρ exp(iHt) for t ∈ R, H selfadjoint
- quantum exponential family:
  ρ_θ = e^{−k(θ)} exp(Σ_i γ_i(θ)* T_i*) ρ_0 exp(Σ_i γ_i(θ) T_i),  γ_i(θ) ∈ C, T_i ∈ B(H)
- Gaussian states of a quantum harmonic oscillator Φ(z, V) with mean z ∈ C and complex 2 × 2 'covariance matrix' V

81
Quantum state estimation

Problem. Given
- a quantum statistical model {ρ_θ : θ ∈ Θ},
- a loss function W : Θ × Θ → R_+, e.g. ‖θ̂ − θ‖² for Θ ⊂ R^k, or ‖ρ̂ − ρ‖_1 if Θ ⊂ S(H), etc.,
design a measurement M and an estimator θ̂(X), where X is the outcome of the measurement, such that
  R(M, θ̂, θ) = E_θ(W(θ̂(X), θ))
is small.

[Diagram (quantum measurement): state ρ_θ → measuring apparatus M → measurement outcome X ∼ P_θ^{(M)} → estimator θ̂(X)]

Remark
- The same problem can be formulated for estimating a function g(θ).
- The main quantum feature is the optimisation over measurements step.
- Measurement and estimator can be 'bundled' into a measurement with values in Θ.

82
The L²(ρ) Hilbert space

Definition. Let ρ be a positive operator in T_1(H). On the R-linear space of bounded selfadjoint operators B(H)_sa define the inner product
  ⟨A, B⟩_ρ := Tr(ρ A ∘ B),  A ∘ B = ½(AB + BA).
L²_R(ρ) is the Hilbert space completion of B(H)_sa with respect to ⟨·,·⟩_ρ.

Remark
- A, B ∈ B(H) correspond to the same vector in L²_R(ρ) if Tr(ρ(A − B)²) = 0 (relevant when ρ has eigenvalues equal to zero).
- It can be shown that each vector in L²_R(ρ) can be identified with (the equivalence class of) a square summable operator w.r.t. ρ, i.e. an unbounded symmetric linear operator satisfying Σ_i λ_i ‖X e_i‖² < ∞, where ρ = Σ_i λ_i |e_i⟩⟨e_i| is the spectral decomposition of ρ.
- Equivalently, X is square summable iff X√ρ is a Hilbert–Schmidt operator, i.e.
  ‖X√ρ‖_2² = Tr((X√ρ)*(X√ρ)) = ‖X‖_ρ² < ∞.

83
The quantum Fisher–Helstrom information matrix

Let {ρ_θ : θ ∈ Θ} be a parametric statistical model with ρ_θ ∈ T_1(H) and Θ ⊂ R^k open. Let (L²_R(ρ_θ), ⟨·,·⟩_θ) be the L² space w.r.t. ρ_θ.

Assume that
- θ ↦ ρ_θ is differentiable as a function with values in T_1(H);
- the linear functional on B(H)
  A ↦ (∂/∂θ_i) Tr(A ρ_θ) = Tr(∂ρ_θ/∂θ_i A)
can be extended to a continuous functional on L²_R(ρ_θ) for all i = 1, ..., k.

Then by the Riesz Theorem there exists a unique vector L_{θ,i} ∈ L²_R(ρ_θ), called the symmetric logarithmic derivative (s.l.d.), such that
  Tr(∂ρ_θ/∂θ_i A) = ⟨L_{θ,i}, A⟩_θ for all A, or equivalently ∂ρ_θ/∂θ_i = L_{θ,i} ∘ ρ_θ.

The quantum Fisher–Helstrom information matrix is defined as
  H(θ)_{i,j} = ⟨L_{θ,i}, L_{θ,j}⟩_θ = Tr((ρ_θ ∘ L_{θ,i}) L_{θ,j}).

84
Example (exercise)

Let ρ_r ∈ M(C²) be the state with Bloch vector r represented in polar coordinates r ↔ (r, θ, φ):
  ρ_r = ½ (1 + r·σ) = ½ [[1 + r cos θ, r sin θ e^{−iφ}], [r sin θ e^{iφ}, 1 − r cos θ]].

The symmetric logarithmic derivatives are the solutions of
  ∂ρ_r/∂r = L_r ∘ ρ_r,  ∂ρ_r/∂θ = L_θ ∘ ρ_r,  ∂ρ_r/∂φ = L_φ ∘ ρ_r,
and, writing n = r/r for the unit Bloch vector, are given by
  L_r = (n·σ − r·1)/(1 − r²),  L_θ = r (∂n/∂θ)·σ,  L_φ = r (∂n/∂φ)·σ.

The quantum Fisher–Helstrom information matrix is
  H(r) = diag( 1/(1 − r²), r², r² sin²θ ).

85
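The s.l.d.'s and the Fisher–Helstrom matrix of the qubit model can be computed numerically by solving ∂ρ = L ∘ ρ in the eigenbasis of ρ (there L_{jk} = 2(∂ρ)_{jk}/(λ_j + λ_k)) and checking against the closed form H(r) = diag(1/(1−r²), r², r² sin²θ). A sketch (NumPy; the coordinate values are arbitrary choices):

```python
import numpy as np

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]])
sz = np.array([[1, 0], [0, -1]], dtype=complex)

def sld(rho, drho):
    # solve drho = (1/2)(L rho + rho L) in the eigenbasis of rho
    w, v = np.linalg.eigh(rho)
    d = v.conj().T @ drho @ v
    L = 2 * d / (w[None, :] + w[:, None])
    return v @ L @ v.conj().T

def dot_sigma(a):
    return a[0]*sx + a[1]*sy + a[2]*sz

r, th, ph = 0.6, 0.7, 1.1
n_vec  = np.array([np.sin(th)*np.cos(ph), np.sin(th)*np.sin(ph), np.cos(th)])
dn_dth = np.array([np.cos(th)*np.cos(ph), np.cos(th)*np.sin(ph), -np.sin(th)])
dn_dph = np.array([-np.sin(th)*np.sin(ph), np.sin(th)*np.cos(ph), 0.0])

rho = 0.5*(np.eye(2) + r*dot_sigma(n_vec))
# analytic partial derivatives of rho w.r.t. (r, theta, phi)
drho = [0.5*dot_sigma(n_vec), 0.5*r*dot_sigma(dn_dth), 0.5*r*dot_sigma(dn_dph)]
L = [sld(rho, d) for d in drho]
# H_{ij} = Tr(d_i rho L_j) = <L_i, L_j>_theta
H = np.array([[np.trace(di @ Lj).real for Lj in L] for di in drho])
H_expected = np.diag([1/(1 - r**2), r**2, r**2*np.sin(th)**2])
assert np.allclose(H, H_expected, atol=1e-10)
```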
Properties of the quantum Fisher–Helstrom information matrix

- H(θ) is a real positive definite matrix.
- additivity: if ρ_θ = ρ_θ^{(1)} ⊗ ρ_θ^{(2)} then H(θ) = H(θ)^{(1)} + H(θ)^{(2)} (exercise).
- metric: the Bures (fidelity) distance between infinitesimally close states ρ_θ and ρ_{θ+dθ} is given by the quantum Fisher–Helstrom information:
  b(ρ_θ, ρ_{θ+dθ})² = ¼ H(θ)(dθ)² + o((dθ)²).
- contractivity: let C : T_1(H) → T_1(K) be a quantum channel (completely positive, trace preserving linear map) and let τ_θ := C(ρ_θ) be the quantum model obtained by applying the 'quantum randomisation' C to ρ_θ. Then b(ρ_{θ_1}, ρ_{θ_2}) ≥ b(τ_{θ_1}, τ_{θ_2}) and H(ρ_θ) ≥ H(τ_θ).
- Unlike the classical case, H is not the unique contractive metric. Such metrics are in one-to-one correspondence with operator monotone functions f : R_+ → R (i.e. f(A) ≥ f(B) for all A ≥ B ≥ 0 in B(H)) satisfying f(t) = t f(t^{-1}) and f(1) = 1.

Reference: D. Petz, Linear Algebra Appl. 244, 81–96 (1996)

86
Quantum Cramér–Rao bound (I)

Theorem. Let Q := {ρ_θ : θ ∈ Θ ⊂ R^k} be a quantum statistical model with ρ_θ ∈ B(H) and denote by H(θ) the associated quantum Fisher information matrix. Let M be a measurement with outcomes in (Ω, Σ) and let P_θ^{(M)} := M_*(ρ_θ). Let P_M := {P_θ^{(M)} : θ ∈ Θ} be the classical model associated to (Q, M) and let I_M(θ) be its Fisher information matrix. Then the matrix inequality
  I_M(θ) ≤ H(θ)
holds, and in particular, for any unbiased estimator θ̂ of θ we have
  Var(θ̂) ≥ I_M(θ)^{-1} ≥ H(θ)^{-1}.

Remark. In the last display, the left inequality is the 'classical' Cramér–Rao; the right inequality follows from applying the operator monotone function f(x) = −x^{-1} to the previous inequality I_M(θ) ≤ H(θ). A function f is called operator monotone if f(A) ≤ f(B) for all A, B ∈ B(H) satisfying 0 ≤ A ≤ B. Not all monotone functions are operator monotone (exercise).

87
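The inequality I_M(θ) ≤ H(θ) can be illustrated on a simple one-parameter qubit family, where both sides are available in closed form. A sketch (the model and the measurement family are illustrative choices; here I_M(r) = cos²α/(1 − r² cos²α) for a projective measurement tilted by angle α from the eigenbasis):

```python
import numpy as np

# one-parameter family rho_r = (1/2)(1 + r sigma_z), with H(r) = 1/(1 - r^2)
r = 0.5
H = 1/(1 - r**2)

I_vals = []
for alpha in np.linspace(0.0, np.pi/2, 7):
    c = np.cos(alpha)
    # projective measurement along an axis tilted by alpha from z:
    # outcome probabilities (1 +/- r cos(alpha))/2
    p = np.array([(1 + r*c)/2, (1 - r*c)/2])
    dp = np.array([c/2, -c/2])          # derivatives w.r.t. r
    I_vals.append(np.sum(dp**2 / p))    # classical Fisher information

assert all(I <= H + 1e-12 for I in I_vals)   # I_M(r) <= H(r) for every alpha
# the quantum bound is attained by measuring in the eigenbasis (alpha = 0)
assert np.isclose(I_vals[0], H)
```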
Proof: the case of a PVM

1. Suppose first that M is a PVM. (The general case is reduced to this by Naimark's theorem, next page.) We show that there exists an isometry I : L²_R(p_θ) → L²_R(ρ_θ) such that I*(L_{θ,i}) = ℓ̇_{θ,i}, which implies I_M(θ) ≤ H(θ).

Let L²_R(p_θ) = {f : Ω → R : E_θ(f²) < ∞} be the Hilbert space with inner product ⟨f, g⟩_θ = E_θ(fg). The score functions ℓ̇_{θ,i} are elements of L²_R(p_θ) and I_M(θ)_{i,j} = ⟨ℓ̇_{θ,i}, ℓ̇_{θ,j}⟩_θ.

Recall that we defined M : L^∞(Ω, Σ, µ) → B(H). Then
  ⟨f, g⟩_θ = E_θ(f·g) = Tr(ρ_θ M(f·g)) = ⟨M(f), M(g)⟩_θ,
so M can be extended to an isometry I : L²_R(p_θ) → L²_R(ρ_θ).

We show that I*(L_{θ,i}) = ℓ̇_{θ,i}. Indeed for every f ∈ L²_R(p_θ)
  ⟨f, ℓ̇_{θ,i}⟩_θ = ∫ f(ω) (∂p_θ/∂θ_i)(ω) µ(dω) = ∂E_θ(f)/∂θ_i
   = ∂Tr(ρ_θ M(f))/∂θ_i = Tr(∂ρ_θ/∂θ_i M(f)) = ⟨I(f), L_{θ,i}⟩_θ.

88
Proof: the Naimark Theorem argument

Then I_M(θ) ≤ H(θ) since
  Σ_{i,j} c_i c_j I_M(θ)_{i,j} = ‖Σ_i c_i ℓ̇_{θ,i}‖²_θ = ‖I*(Σ_i c_i L_{θ,i})‖²_θ ≤ ‖Σ_i c_i L_{θ,i}‖²_θ = Σ_{i,j} c_i c_j H(θ)_{i,j}.

2. Now let M be a general measurement given by a POVM on H. By Naimark's Theorem there exists an isometry V : H → K such that M(B) = V* P(B) V, with {P(B)} a PVM. The map V · V* : B(H)_sa → B(K)_sa extends to an isometric isomorphism O : L²_R(ρ_θ) → L²_R(ρ̃_θ), where ρ̃_θ := V ρ_θ V* is the embedded state (exercise). In particular L̃_{θ,i} = O(L_{θ,i}) and H(ρ̃_θ) = H(ρ_θ) = H(θ).

When measuring ρ_θ with {P(B)} we get the same distribution P_θ as when measuring ρ_θ with {M(B)}, and hence the same Fisher information. We can now apply the proof for the PVM case.

89
Quantum Cramér–Rao bound (II)

Theorem (Helstrom, Belavkin, Holevo). Let Q := {ρ_θ : θ ∈ Θ ⊂ R^k} be a quantum statistical model with ρ_θ ∈ B(H) and denote by H(θ) the associated quantum Fisher information matrix. Let M be an unbiased measurement with values in Θ, i.e. the result θ̂ ∼ P_θ^{(M)} is an unbiased estimator of θ. Define the operators
  X_i^M = ∫ x_i M(dx),  i = 1, ..., k,
as elements in L²_R(ρ_θ), and the 'quantum covariance matrix'
  V^M(θ)_{i,j} := ⟨X_i^M − θ_i, X_j^M − θ_j⟩_θ.
Then
  Var(θ̂) ≥ V^M(θ) ≥ H(θ)^{-1}.

90
Proof of quantum Cramér–Rao Theorem (II)

1. We first prove Var(θ̂) ≥ V^M(θ). We use again Naimark's theorem (M(dx) = V* P(dx) V) to obtain
  X_i^M = ∫ x_i M(dx) = V* (∫ x_i P(dx)) V = V* X_i^P V.
Let Y^M(c) := Σ_i c_i (X_i^M − θ_i) and Y^P(c) := Σ_i c_i (X_i^P − θ_i). Then (exercise)
  c^T V^M(θ) c = Tr(ρ_θ (Y_c^M)²) = Tr(ρ_θ (V* Y_c^P V)²) = Tr(ρ̃_θ Y_c^P V V* Y_c^P)
   ≤ Tr(ρ̃_θ (Y_c^P)²) = E_θ((Σ_i c_i (θ̂_i − θ_i))²) = c^T Var(θ̂) c.

91
Proof of quantum Cramér–Rao Theorem (II)

2. We now prove the second inequality for one-dimensional θ; the general case is left as an exercise. Since H(θ) = ‖L_θ‖²_θ and V^M(θ) = ‖Y^M‖²_θ, by Cauchy–Schwarz we have
  ‖L_θ‖²_θ · ‖Y^M‖²_θ ≥ |⟨L_θ, Y^M⟩_θ|²,
so it suffices to show that ⟨L_θ, Y^M⟩_θ = 1. By using the isomorphism O : L²_R(ρ_θ) → L²_R(ρ̃_θ), the isometry I : L²_R(p_θ) → L²_R(ρ̃_θ), the fact that I*(L̃_θ) = ℓ̇_θ and Y^P = I(f) for f(x) = x − θ, we get
  ⟨L_θ, Y^M⟩_θ = ⟨L̃_θ, Y^P⟩_θ̃ = ⟨L̃_θ, I(f)⟩_θ̃ = ⟨I*(L̃_θ), f⟩_θ = ⟨ℓ̇_θ, f⟩_θ
   = ∫ ℓ̇_θ(x)(x − θ) p_θ(x) µ(dx) = 1.

92
The quantum Cramér–Rao bound is asymptotically achievable for Θ ⊂ R

By measuring L_{θ_0} in the state ρ_{θ_0} for some fixed θ_0 we obtain a random variable L with mean and variance
  E_{θ_0}(L) = Tr(ρ_{θ_0} L_{θ_0}) = 0,  Var_{θ_0}(L) = Tr(ρ_{θ_0} L²_{θ_0}) = H(θ_0).
Then
  θ̂ := L/H(θ_0) + θ_0
is a locally unbiased estimator of θ around θ_0, since
  E_θ(θ̂) = θ_0 + Tr(ρ_θ L_{θ_0})/H(θ_0)
   = θ_0 + Tr((dρ_θ/dθ) L_{θ_0}) dθ / H(θ_0) + o(dθ)
   = θ_0 + Tr(ρ_{θ_0} L²_{θ_0}) dθ / H(θ_0) + o(dθ) = θ + o(dθ),
and its variance is
  Var(θ̂) = Var(L)/H(θ_0)² = H(θ_0)^{-1}.

93
The quantum Cramér–Rao bound is asymptotically achievable for Θ ⊂ R

However the measurement depends on θ_0 and is only 'locally optimal'. The argument can be made rigorous in the asymptotic framework using an adaptive measurement procedure:
1. Consider n independent, identically prepared quantum systems. The corresponding statistical model is Q_n := {ρ_θ^{⊗n} : θ ∈ Θ}.
2. The s.l.d. is given by the sum of the individual s.l.d.'s,
  L_θ^{(n)} = L_θ ⊗ 1 ⊗ ... ⊗ 1 + ... + 1 ⊗ ... ⊗ L_θ,
and the Fisher–Helstrom information is H^{(n)}(θ) = n H(θ).
3. We perform a simple measurement (e.g. separate, informationally complete measurements on each system) on a small fraction ñ ≪ n of the systems and compute a rough estimator θ̃_n of θ.
4. On the rest of the systems we measure the s.l.d. L_θ at θ = θ̃_n and compute the locally unbiased θ̂_n.
5. This estimator is efficient: √n (θ̂_n − θ) → N(0, H(θ)^{-1}) in law.

94
Achievability of the quantum Cramér–Rao bound for Θ ⊂ R^k with k > 1

1. If the s.l.d.'s commute with each other, i.e. [L_{θ,i}, L_{θ,j}] = 0 for all i, j = 1, ..., k, then they can be measured simultaneously (exercise) and the previous argument leads to an efficient estimator θ̂_n.
2. Asymptotically, the variance H(θ)^{-1} can be achieved iff the weaker form of commutativity holds: Tr(ρ_θ [L_{θ,i}, L_{θ,j}]) = 0.
3. If the s.l.d.'s do not commute with each other, there may not exist locally unbiased estimators which achieve the quantum Cramér–Rao bound. However, although the bound is in general not achievable, it is sharp in the sense that if V(M, θ) ≥ K^{-1}(θ) for all locally unbiased measurements, then H(θ)^{-1} ≥ K^{-1}(θ).
4. What is a 'good estimator' in this case? The answer depends on the particular form of the loss function. If G ∈ M(R^k) is a positive matrix we define the loss function
  W(θ̂, θ) = Σ_{i,j} (θ̂_i − θ_i) G_{i,j} (θ̂_j − θ_j) = (θ̂ − θ)^T G (θ̂ − θ).
The risk is given by R(θ̂, θ, G) = E_θ W(θ̂, θ) = Tr(G Var(θ̂)), and the optimal measurement procedure will depend on G.

95
The right logarithmic derivative

Definition.
1. Let ρ ∈ T_1(H) be a state. Define L²_+(ρ) to be the complex Hilbert space obtained as the completion of B(H) with respect to the inner product (X, Y)_ρ := Tr(ρ Y X*).
2. Let Q := {ρ_θ : θ ∈ Θ ⊂ R^k} be a quantum statistical model on H. Assume that ρ_θ is differentiable in T_1(H) and that the functional A ↦ ∂Tr(ρ_θ A)/∂θ_i = Tr(∂ρ_θ/∂θ_i A) on B(H) can be extended to a continuous linear functional on L²_+(ρ_θ). The right logarithmic derivative L_{θ,i}^+ is defined as the vector in L²_+(ρ) satisfying
  Tr(∂ρ_θ/∂θ_i A) = (L_{θ,i}^+, A)_θ, or equivalently ∂ρ_θ/∂θ_i = ρ_θ L_{θ,i}^+.
3. The right information matrix is defined by
  J(θ)_{i,j} = (L_{θ,i}^+, L_{θ,j}^+)_θ.

96
The right Cramér–Rao bound

Theorem (Yuen and Lax, Belavkin). Let Q := {ρ_θ : θ ∈ Θ ⊂ R^k} be a quantum statistical model with ρ_θ ∈ B(H) and denote by J(θ) the associated right information matrix. Let M be an unbiased measurement with values in Θ, i.e. the result θ̂ ∼ P_θ^{(M)} is an unbiased estimator of θ. Define the operators
  X_i^M = ∫ x_i M(dx),  i = 1, ..., k,
as elements in L²_+(ρ_θ), and the 'right quantum covariance matrix'
  V_+^M(θ)_{i,j} := (X_i^M − θ_i, X_j^M − θ_j)_θ.
Then
  Var(θ̂) ≥ V_+^M(θ) ≥ J(θ)^{-1},
where all matrices are considered as elements of M(C^k).

97
Comparison of the symmetric and right (left) Cramér–Rao bounds

1. If θ is one dimensional then the symmetric bound is at least as informative as the right bound: H(θ) ≤ J(θ). Indeed the variance H(θ)^{-1} is achieved by measuring L_θ (locally unbiased measurement), hence the right bound implies that H(θ)^{-1} ≥ J(θ)^{-1}.
2. For certain models the right bound is better than the symmetric one. For example, in the case of mixed Gaussian states of a harmonic oscillator G(z, V) with fixed V and unknown z, the right bound is achieved in the sense that for any fixed positive matrix G there exists an unbiased estimator ẑ such that
  Tr(G Var(ẑ)) = Tr(G J(θ)^{-1}).
The measurements leading to these estimators depend however on G, and are incompatible with each other.

98
The Holevo bound for quadratic risk

Let Q = {ρ_θ : θ ∈ Θ ⊂ R^k} be a quantum statistical model on H. Let W(θ̂, θ) be a quadratic loss function:
  W(θ̂, θ) = Σ_{i,j} (θ̂_i − θ_i) G_{ij} (θ̂_j − θ_j) = (θ̂ − θ)^T G (θ̂ − θ).
The risk of an unbiased estimator θ̂ is given by
  R(θ̂, G) = E_θ(W(θ̂, θ)) = Σ_{i,j} G_{ij} E_θ((θ̂_i − θ_i)(θ̂_j − θ_j)) = Tr(G Var(θ̂)).

Theorem (Holevo bound). Let M(dθ̂) be an unbiased measurement. Then
  Tr(G Var(θ̂)) ≥ inf_{X_θ} { Tr(√G Re(Z(X_θ)) √G) + Tr|√G Im(Z(X_θ)) √G| },
where X_θ := (X_{θ,1}, ..., X_{θ,k}) with X_{θ,j} symmetric elements of L²_+(ρ_θ) satisfying
  Tr(ρ_θ X_{θ,i}) = 0,  Tr(∂ρ_θ/∂θ_i X_{θ,j}) = δ_{i,j},
and
  Z(X_θ)_{i,j} := (X_{θ,i}, X_{θ,j})_θ = Tr(ρ_θ X_{θ,j} X_{θ,i}).

99
Proof of the Holevo bound

For simplicity we take G = 1; the general case is left as an exercise. It is enough to prove the bound for special X_θ of the form
  X_{θ,i} = ∫ x_i M(dx) − θ_i.
Check the duality between X_{θ,j} and ∂ρ_θ/∂θ_i (using Tr(ρ_θ X_{θ,j}) = 0 and ∂X_{θ,j}/∂θ_i = −δ_{i,j}):
  Tr(∂ρ_θ/∂θ_i X_{θ,j}) = (∂/∂θ_i) Tr(ρ_θ X_{θ,j}) − Tr(ρ_θ ∂X_{θ,j}/∂θ_i) = 0 + δ_{i,j} = δ_{i,j}.

As in the Cramér–Rao bound (II) it can be shown that
  Var(θ̂) ≥ Z(X_θ).

Lemma (proof left as exercise). If V is a real symmetric k × k matrix, Z is a hermitian (complex) matrix and V ≥ Z, then
  Tr(V) ≥ Tr(Re(Z)) + Tr|Im(Z)|.

Apply the Lemma with V = Var(θ̂) and Z = Z(X_θ).

100
The Holevo bound is achievable (asymptotically)

1. The Holevo bound is achieved in the case of quantum Gaussian shift models, i.e. Gaussian states of quantum oscillators with unknown means and fixed, known covariance. This will be discussed in detail in the following sections.
2. The Holevo bound is achieved asymptotically for i.i.d. models of finite dimensional states, i.e. ρ_θ ⊗ ... ⊗ ρ_θ with ρ_θ ∈ M(C^d). The measurement consists of a two-step adaptive procedure (as in the case of a one-dimensional parameter), with the difference that in the second step one needs to perform a joint measurement (not separable) on the n − ñ systems. The measurement can be understood by showing that the n-particle model 'converges' to a Gaussian model for which the solution is known.

A proof based on Cramér–Rao analysis is given for d = 2 in M. Hayashi and K. Matsumoto: arXiv:quant-ph/0411073. For the general case d < ∞ the result follows from the theory of 'local asymptotic normality' developed in J. Kahn and M. Guţă: arXiv:0804.3876.

101
Covariant measurements

Group covariant quantum statistical models
Covariant measurements
The covariant quantum estimation problem
Optimal covariant measurements
Structure of covariant measurements
The optimal measurement in the case of irreducible representations
Example: estimation of pure states

Reference: A. S. Holevo: Probabilistic and Statistical Aspects of Quantum Theory (1982)

102
Covariant quantum statistical models

Definition (covariant statistical models). Let G be a group of transformations of a set Θ and denote the action by θ ↦ gθ for θ ∈ Θ, g ∈ G. Let U : G → U(H) be a unitary representation of G on H. A quantum statistical model {ρ_θ : θ ∈ Θ} on H is called covariant if
  ρ_{gθ} = U(g) ρ_θ U(g)*,  θ ∈ Θ, g ∈ G.

Example
- The set of pure states ρ_P = P, with P a one-dimensional projection in C^d, is covariant under the action of SU(d) given by P ↦ U P U*.
- Shift parameter: the time evolved states ρ_t := exp(−iHt) ρ exp(iHt) are covariant with respect to the representation of R given by U(t) = exp(iHt).
- Orientation parameter: let U : SO(3) → U(H) be a unitary representation and let n ↦ gn be the action on S² by rotations. The model {ρ_n := U(g) ρ_{n_0} U(g)* : n = g n_0 ∈ S²} is covariant, provided that ρ_{n_0} = U(g) ρ_{n_0} U(g)* for all g s.t. g n_0 = n_0.

103
Covariant measurements

Definition (covariant measurements). Let G be a group of (measurable) transformations of a measure space (Ω, Σ) and denote the action by ω ↦ gω for ω ∈ Ω, g ∈ G. Let U : G → U(H) be a unitary representation of G on H. A measurement M on H with outcomes in Ω is called covariant if
  U(g)* M(B) U(g) = M(g^{-1}B),
where gB = {ω' : ω' = gω, ω ∈ B}.

Example
- Let Q, P be the position and momentum of a quantum particle, so that exp(−ixP) Q exp(ixP) = Q − x1. The measurement of Q is covariant with respect to U(x) := exp(ixP) (exercise).
- The triad measurement is covariant with respect to a unitary representation of S(3) (exercise).

104
The covariant quantum estimation problem

Problem (covariant quantum estimation)
Given
- an action of G on Θ,
- a unitary representation U : G → U(H),
- a covariant model {ρθ : θ ∈ Θ} on H,
- an invariant loss function: W(θ̂, θ) = W(gθ̂, gθ) for all g ∈ G,
find the 'optimal' measurement for estimating θ.

Remark
Let θ̂ be the result of a measurement M. The risk at θ is R(θ, M) := Eθ W(θ̂, θ). By optimal we mean a measurement that minimises the maximum risk

R(M) = sup_θ R(θ, M).

Alternatively, we can look for a measurement that minimises the Bayesian risk

R(π, M) = ∫_Θ π(dθ) R(θ, M)

for a prior π on Θ that is invariant under the action of G.
Transitive actions of compact groups

From now on we will assume for simplicity that
- Θ ⊂ R^k is a smooth manifold,
- G is a compact Lie group,
- the action of G on Θ is continuous and transitive, i.e. for any θ ∈ Θ there exists g ∈ G such that θ = g θ0 for some fixed θ0,
- H is finite dimensional.

Remark
On a compact Lie group there is a unique left (and right) invariant probability measure, μ(A) = μ(gA) = μ(Ag), called the Haar measure. Let G0 = {g : g θ0 = θ0} ⊂ G be the stationary group of θ0. Transitivity implies Θ ≅ G/G0. On Θ there is a unique invariant measure ν given by ν(B) = μ({g : g θ0 ∈ B}) (exercise).
Optimal covariant measurements

Theorem (covariant measurements achieve the optimal risk)
In the covariant quantum estimation problem the minima of the Bayesian risk R(π, M) and of the maximum risk R(M) are achieved on a covariant measurement. Moreover, if M is covariant then R(π, M) = R(M) = R(θ, M) for all θ.
Optimal covariant measurements: proof

For any measurement M and g ∈ G we can define a new measurement M^g by

M^g(B) = U(g)* M(gB) U(g).

Using the covariance of {ρθ : θ ∈ Θ} and the invariance of W(θ̂, θ) we get

R(M^g, θ) = ∫ W(θ̂, θ) Tr(ρθ M^g(dθ̂))
          = ∫ W(θ̂, θ) Tr(ρθ U(g)* M(d gθ̂) U(g))
          = ∫ W(gθ̂, gθ) Tr(U(g) ρθ U(g)* M(d gθ̂))
          = R(M, gθ).

Thus R(ν, M^g) = R(ν, M), since ν is the invariant measure on Θ. In particular, if M is covariant then R(M, θ) = R(M, gθ).
Optimal covariant measurements: proof

The averaged measurement

M̄(B) = ∫_G M^{g⁻¹}(B) μ(dg)

is covariant, and

R(M̄, ν) = ∫ R(M^{g⁻¹}, ν) μ(dg) = R(M, ν).   (5.1)

Moreover,

R(M) = sup_θ R(M, θ) ≥ R(M, ν) = R(M̄, ν) = R(M̄).   (5.2)

(5.1) and (5.2) say that the covariant measurement M̄ is at least as good as M.
Structure of covariant measurements

Theorem (structure of covariant measurements)
Let m0 ∈ B(H) be a positive operator which commutes with the operators {U(g) : g ∈ G0} and satisfies

∫_G U(g) m0 U(g)* μ(dg) = 1.

Define m(θ) := U(g) m0 U(g)* where g : θ0 → θ. Then

M(B) := ∫_B m(θ) ν(dθ),  B ∈ Σ(Θ)

is (the POVM of) a covariant measurement. Conversely, any covariant measurement is of this form.

Remark
By considering B such that ν(B) is small enough we get M(B) < 1. Thus a covariant measurement on a finite dimensional space cannot be projection valued.
Structure of covariant measurements: proof

1. Direct statement
Note that m(θ) is well defined due to the property m0 = U(g) m0 U(g)* for g ∈ G0. Positivity and σ-additivity follow directly from the definitions. Using ν(B) = μ({g : g θ0 ∈ B}) we obtain the normalisation

∫_Θ M(dθ) = ∫_Θ m(θ) ν(dθ) = ∫_G U(g) m0 U(g)* μ(dg) = 1.

2. Converse
We apply the measurement density Lemma to obtain that M(dθ) = m(θ) ν(dθ), where m(θ) is a unique (ν-almost surely) positive operator density. The covariance implies

∫_B U(g)* m(θ) U(g) ν(dθ) = ∫_{g⁻¹B} m(θ) ν(dθ) = ∫_B m(g⁻¹θ) ν(dθ),

and since the density is unique we obtain U(g)* m(θ) U(g) = m(g⁻¹θ). Choose m0 = m(θ0) and check that it satisfies the conditions (exercise).
Covariant measurements for irreducible representations

Definition (irreducible representation)
A unitary representation U : G → B(H) is called irreducible (irrep) if the only subspaces of H that are invariant under U are H and {0}.

Lemma (Schur lemma)
Let U : G → B(H) be an irreducible representation. An operator A ∈ B(H) commutes with U(g) for all g ∈ G iff A = c1 for some c ∈ C.

Proposition (measurement seed for irreps)
There exists a one-to-one correspondence between covariant measurements with respect to an irreducible representation U : G → B(H) and density matrices s0 commuting with {U(g) : g ∈ G0}:

M(dθ) = d U(g) s0 U(g)* ν(dθ),  d = dim(H).   (5.3)
Covariant measurements for irreducible representations

Proof. All irreps of a compact group are finite dimensional. The expression (5.3) defines a measurement iff m0 := d s0 satisfies the normalisation

∫ U(g) m0 U(g)* μ(dg) = 1.   (5.4)

Since the integral in (5.4) commutes with U(g) for arbitrary m0, and U is irreducible, it is proportional to 1 (Schur's Lemma). By taking the trace on both sides of (5.4),

∫ Tr(U(g) m0 U(g)*) μ(dg) = Tr(m0) = Tr(1) = d,

hence s0 = m0/d is a density matrix.
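The Schur-lemma normalisation step can be sanity-checked numerically. Below is a minimal Monte Carlo sketch (an illustration, not part of the lecture): for the defining irrep of U(2), the Haar average of U m0 U* over random unitaries should converge to (Tr(m0)/d) 1. Haar samples are drawn via the standard QR trick; the specific seed operator m0 is an arbitrary choice.

```python
# Monte Carlo check: for an irreducible representation the average
# int U(g) m0 U(g)* mu(dg) is proportional to the identity (Schur's lemma),
# with proportionality constant Tr(m0)/d fixed by taking the trace.
import numpy as np

rng = np.random.default_rng(0)
d = 2

def haar_unitary(d):
    """Haar-random d x d unitary via QR of a complex Ginibre matrix."""
    z = (rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    return q * (np.diag(r) / np.abs(np.diag(r)))

m0 = np.array([[1.5, 0.3], [0.3, 0.5]])   # arbitrary positive 'seed' operator
avg = np.zeros((d, d), dtype=complex)
n_samples = 20000
for _ in range(n_samples):
    u = haar_unitary(d)
    avg += u @ m0 @ u.conj().T
avg /= n_samples

expected = (np.trace(m0) / d) * np.eye(d)  # Schur: proportional to identity
print(np.max(np.abs(avg - expected)))      # small Monte Carlo error
```

The residual decreases like 1/sqrt(n_samples), as expected for a Monte Carlo average.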
The optimal measurement in the case of irreducible representations

Proposition (optimal seed for irreps)
Let U : G → U(H) be an irreducible representation of G acting on Θ, and let {ρθ : θ ∈ Θ} be a covariant quantum statistical model on H = C^d.

1. The risk of a covariant measurement M(dθ) = d s(θ) ν(dθ) is equal to R(M) = d Tr(W0 s0), where W0 is the positive operator

W0 = ∫ W(θ, θ0) U(g)* ρ_{θ0} U(g) ν(dθ).

2. The optimal covariant measurement has 'seed' s0 given by

s0 = (1/d_min) P_min,

where P_min is the projection onto the eigenspace of W0 corresponding to the minimal eigenvalue, and d_min is the dimension of that eigenspace.
The optimal measurement in the case of irreducible representations

Proof. As shown before, we can restrict to covariant measurements, and the risk is

R(M) = R(M, θ0) = d ∫ W(θ, θ0) Tr(ρ_{θ0} s(θ)) ν(dθ)
     = d ∫ W(θ, θ0) Tr(U(g)* ρ_{θ0} U(g) s0) ν(dθ) = d Tr(W0 s0),

where W0 is the positive operator (exercise: verify self-adjointness)

W0 = ∫ W(θ, θ0) U(g)* ρ_{θ0} U(g) ν(dθ).

The minimum of Tr(W0 s0) over all density matrices s0 is achieved at P_min/d_min, with P_min the eigenprojection corresponding to the minimal eigenvalue of W0 (exercise).
Example: estimation of pure states

Let Q := {ρθ = |θ⟩⟨θ| : θ ∈ Θ} be the family of pure states, where

|θ⟩ = Σ_{i=1}^d θi |ei⟩ ∈ C^d  and  Σ_{i=1}^d |θi|² = 1.

Let W(θ̂, θ) = 1 − |⟨θ̂, θ⟩|² be the fidelity distance.

Remark (exercise)
Strictly speaking, the state determines the vector |θ⟩ only up to a phase factor. This can be taken into account by fixing the phase of one of the coefficients.

The quantum statistical model Q is covariant with respect to the (irreducible) representation of the special unitary group SU(d) on C^d. The loss function W(θ̂, θ) is invariant under the action of SU(d).
Example: estimation of pure states

Theorem (optimal estimation of pure states)
The optimal covariant measurement for the above quantum estimation problem is

M(dθ) = d |θ⟩⟨θ| ν(dθ),

where ν is the unique SU(d)-invariant measure on Θ.
Example: estimation of pure states

Proof. Let θ0 = (1, 0, . . . , 0). The stationary group of θ0 is G0 ≅ SU(d − 1), consisting of unitaries acting on C^{d−1} = Lin{e2, . . . , ed} and leaving e1 fixed. According to the Proposition 'measurement seed for irreps',

s0 = λ P0 + ((1 − λ)/(d − 1)) P0^⊥,  λ ∈ [0, 1],

where P0 = |θ0⟩⟨θ0|. According to the Proposition 'optimal seed for irreps' we need to optimise the affine functional λ → Tr(W0 s0); thus the minimum is achieved at one of the extremal points λ = 0 or λ = 1. By direct calculation one can verify that λ = 1 is the minimum (exercise).
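Although not stated on the slide, working out the risk of this optimal measurement gives the known value (d − 1)/(d + 1), e.g. 1/3 for qubits. A short Monte Carlo sketch (an illustration with NumPy; Haar-random pure states are drawn by normalising complex Gaussian vectors):

```python
# Monte Carlo evaluation of the risk of M(dtheta) = d |theta><theta| nu(dtheta)
# at theta0 = e_1: R = int (1 - f) * d * f nu(dtheta), f = |<theta, theta0>|^2,
# which evaluates to (d - 1)/(d + 1).
import numpy as np

rng = np.random.default_rng(1)
d, n = 2, 200000

psi = rng.standard_normal((n, d)) + 1j * rng.standard_normal((n, d))
psi /= np.linalg.norm(psi, axis=1, keepdims=True)   # Haar-random pure states

theta0 = np.zeros(d)
theta0[0] = 1.0
f = np.abs(psi @ theta0) ** 2                       # fidelities with theta0

risk = d * np.mean((1 - f) * f)
print(risk)   # approx 1/3 for d = 2
```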
Quantum harmonic oscillators and Gaussian states

The quantum harmonic oscillator/quantum particle
Creation and annihilation operators, number and phase-shift operators
Coherent states
Squeezed states
Thermal equilibrium states
All Gaussian states

Reference: U. Leonhardt, Measuring the quantum state of light, Cambridge University Press, 1997
The quantum harmonic oscillator

Definition (position and momentum)
A quantum harmonic oscillator/quantum particle is characterised by its position and momentum (unbounded) observables acting on L²(R) as

Q : h → x h(x),    h ∈ D(Q),
P : h → −i dh/dx,  h ∈ D(P).

Formally, Q and P satisfy the Heisenberg commutation relation [Q, P] = i1.

Lemma (exercise)
Let F : L²(R) → L²(R) be the (unitary) Fourier transform

F[f](p) = (1/√2π) ∫ f(q) e^{−ipq} dq,   F⁻¹[g](q) = (1/√2π) ∫ g(p) e^{ipq} dp.

The operators Q, P are Fourier transforms of each other: Q = F P F*, P = F* Q F.
Weyl operators

Theorem (Baker–Hausdorff formula)
Let F, G be operators such that [F, G] commutes with both F and G. Then

exp(F + G) = exp(−½ [F, G]) exp(F) exp(G).

Definition (Weyl operators)
The Weyl operators are defined as U(a) := exp(iaP) and V(b) := exp(ibQ). From the Baker–Hausdorff formula we get the Weyl relations

U(a) V(b) = exp(iab) V(b) U(a).

Alternatively we will use the displacement operators

D(q, p) := exp(ipQ − iqP) = exp(−ipq/2) exp(ipQ) exp(−iqP).
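The Baker–Hausdorff formula can be verified exactly on a finite dimensional example. The 3 x 3 'Heisenberg' matrices F ∝ E_12, G ∝ E_23 have a commutator proportional to E_13, which commutes with both, so the hypothesis of the theorem holds exactly (this matrix model is an illustration, not the oscillator itself):

```python
# Exact check of exp(F + G) = exp(-[F, G]/2) exp(F) exp(G) on 3x3
# Heisenberg matrices, where [F, G] is central.
import numpy as np
from scipy.linalg import expm

F = np.array([[0, 1.3, 0], [0, 0, 0], [0, 0, 0]])    # F ~ 1.3 * E_12
G = np.array([[0, 0, 0], [0, 0, -0.7], [0, 0, 0]])   # G ~ -0.7 * E_23
C = F @ G - G @ F                                    # proportional to E_13

# C is central among F, G, as required by the theorem
assert np.allclose(C @ F, F @ C) and np.allclose(C @ G, G @ C)

lhs = expm(F + G)
rhs = expm(-0.5 * C) @ expm(F) @ expm(G)
print(np.max(np.abs(lhs - rhs)))   # at rounding-error level
```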
Projective unitary representation of R²

Remark (projective unitary representation of R²)
Note that U(a) and V(b) act as displacement operators:

U(a) Q U(a)* = Q + a1,   V(b) P V(b)* = P − b1.

The unitaries D(q, p) satisfy

D(q, p) D(q', p') = exp((i/2)(pq' − qp')) D(q + q', p + p'),

hence we have a projective unitary representation of R². In particular, the statistical model ρ_{q,p} = D(q, p) ρ_{0,0} D(q, p)* is covariant w.r.t. displacements in R². The theory of covariant measurements can be extended to projective unitary representations.
Weyl/CCR algebra

Definition (Weyl/CCR algebra)
The C*-algebra generated by the S(a, b) is called the Weyl or CCR (canonical commutation relations) algebra.

Theorem (von Neumann's uniqueness Theorem)
All weakly continuous (w.r.t. (a, b)) irreducible representations of the Weyl algebra are unitarily equivalent to each other.

Proof. Look up page 225 in A. S. Holevo, Probabilistic and statistical aspects of quantum theory (1982).

Lemma (irreducibility of the defining representation)
The representation of the CCR on L²(R) is irreducible (exercise).
Hint: verify, by using properties of the Fourier transform, that if ⟨g, U(a)V(b) f⟩ = 0 for all (a, b), then g = 0 or f = 0.
Creation and annihilation operators

Definition (Fock basis)
The Fock, or number, O.N. basis in L²(R) is defined by

ψn(x) = Hn(x) e^{−x²/2} / (√π 2^n n!)^{1/2},  n ≥ 0,

where Hn are the Hermite polynomials. We will denote the vectors ψn by |n⟩.

Definition (creation and annihilation operators)
The creation and annihilation operators on L²(R) are defined as

a* = (Q − iP)/√2  and  a = (Q + iP)/√2,

and satisfy the commutation relation [a, a*] = 1.
Number and phase-shift operators

Lemma (ladder operators)
The |n⟩ are the eigenvectors of the number operator N := a*a = (P² + Q² − 1)/2, with N|n⟩ = n|n⟩. Moreover, a and a* act as 'ladder operators' on {|n⟩ : n ≥ 0}:

a*|n⟩ = √(n+1) |n+1⟩,   a|n⟩ = √n |n−1⟩,   a|0⟩ = 0.

Lemma
The phase-shift unitary Γ(φ) := exp(−iφN) acts as

Γ(φ)* a Γ(φ) = a exp(−iφ),

or equivalently as a rotation of the phase space variables

Γ(φ)* (Q, P)ᵀ Γ(φ) = R(φ) (Q, P)ᵀ,   R(φ) = ( cos φ  sin φ ; −sin φ  cos φ ).

Proof. By differentiating Γ(φ)* a Γ(φ) w.r.t. φ we get d/dφ [Γ(φ)* a Γ(φ)] = −i Γ(φ)* a Γ(φ), whose solution is a exp(−iφ).
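The ladder relations can be checked directly in a truncated Fock basis. The truncation to nmax + 1 levels is an artefact of the sketch (the commutator [a, a*] = 1 is spoiled only in the top corner of the matrix):

```python
# Truncated Fock-space check of the ladder relations.  In the basis
# |0>, ..., |nmax> the annihilation operator has entries <n-1|a|n> = sqrt(n).
import numpy as np

nmax = 20
a = np.diag(np.sqrt(np.arange(1, nmax + 1)), k=1)   # annihilation operator
adag = a.conj().T                                   # creation operator
N = adag @ a                                        # number operator

n = 5
e = np.eye(nmax + 1)
assert np.allclose(a @ e[:, n], np.sqrt(n) * e[:, n - 1])        # a|n> = sqrt(n)|n-1>
assert np.allclose(adag @ e[:, n], np.sqrt(n + 1) * e[:, n + 1]) # a*|n> = sqrt(n+1)|n+1>
assert np.allclose(np.diag(N), np.arange(nmax + 1))              # N|n> = n|n>

comm = a @ adag - adag @ a
print(np.allclose(comm[:-1, :-1], np.eye(nmax)))   # [a, a*] = 1 away from the cut-off
```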
Coherent states

Definition (vacuum and coherent states)
The vector |0⟩ is called the vacuum or ground state. The displacement operators D(q, p) can be rewritten in complex form

D(z) := exp(z a* − z̄ a),   z := (q + ip)/√2 ∈ C.

|z⟩ := D(z)|0⟩ is called a coherent vector, and

|z⟩ = exp(−|z|²/2) Σ_{n=0}^∞ (z^n/√n!) |n⟩.

In particular, N has a Poisson distribution with intensity |z|² w.r.t. |z⟩:

P_z(n) = |⟨z, n⟩|² = exp(−|z|²) |z|^{2n}/n!.

Γ(φ) acts on coherent vectors as a phase shift: Γ(φ)|z⟩ = |e^{−iφ} z⟩.
Overcompleteness of the coherent states

Lemma (overcompleteness of the coherent states)
- The wave function of the coherent vector |z⟩ is ψ_z(x) = ψ0(x − q) exp(ipx − ipq/2), z = (q + ip)/√2.
- The inner product of two coherent vectors is

⟨z', z⟩ = exp(−|z − z'|²/2) exp(i Im(z̄' z)).

- The coherent states form an overcomplete set of projections:

∫ |z⟩⟨z| dq dp/(2π) = 1,   z = (q + ip)/√2.

In particular, the linear span of the coherent vectors is dense in L²(R).

Proof. The first two items follow from the definitions. The overcompleteness can be checked by taking matrix elements w.r.t. the Fock basis.
Squeezed states

Definition (displaced squeezed vacuum)
The unitary operator

S(ξ) := exp( (ξ/2)(a² − a*²) ),  ξ ∈ R,

is called a squeezing operator. The vector states |φ, ξ, z⟩ := Γ(φ) S(ξ) D(z) |0⟩ are called (pure) squeezed states.

Lemma (squeezing of coordinates)
S(ξ) has the following action on Q and P:

S(ξ)* Q S(ξ) = Q e^{−ξ},   S(ξ)* P S(ξ) = P e^{ξ}.

Consequently, squeezing, phase shifting and displacing has the action

Ad[D(z)* Γ(φ)* S(ξ)*] : (Q, P)ᵀ → ( cos φ e^{−ξ}  sin φ e^{ξ} ; −sin φ e^{−ξ}  cos φ e^{ξ} ) (Q, P)ᵀ + (q, p)ᵀ 1.

Proof. By differentiating S(ξ)* Q S(ξ) w.r.t. ξ we obtain d/dξ [S(ξ)* Q S(ξ)] = −S(ξ)* Q S(ξ), whose solution is Q e^{−ξ}.
Gaussian states

Definition (Gaussian state)
The quadrature observables are defined by X_φ := Q cos φ + P sin φ = Γ(φ)* Q Γ(φ). A state ρ is called Gaussian if X_φ has a Gaussian distribution for all φ.

Lemma
A Gaussian state ρ is completely characterised by the mean values (q, p) := (Tr(ρQ), Tr(ρP)) and the 'covariance matrix' of (Q, P),

V_ρ := ( Tr(ρ(Q − q)²)          Tr(ρ(Q − q) ∘ (P − p)) ;
         Tr(ρ(Q − q) ∘ (P − p))  Tr(ρ(P − p)²) ) = Tr(ρ X Xᵀ),

where X = (Q − q1, P − p1)ᵀ. In particular, the distribution of X_φ is N(q cos φ + p sin φ, [R(φ) V_ρ R(φ)ᵀ]₁₁).
Characterisation of Gaussian states

Proof. Any (classical) Gaussian distribution is uniquely determined by its mean and variance. The mean and variance of X_φ are

Tr(ρ X_φ) = Tr(ρ(Q cos φ + P sin φ)) = q cos φ + p sin φ

and

Tr(ρ (X_φ − (q cos φ + p sin φ))²) = Tr(ρ [R(φ) X Xᵀ R(φ)ᵀ]₁₁) = [R(φ) V_ρ R(φ)ᵀ]₁₁.

The fact that there can be only one Gaussian state with a given mean and covariance can best be seen by associating to ρ, in a one-to-one fashion, the Wigner function W_ρ, which in the case of Gaussian states is just the Gaussian N((q, p), V). Wigner functions will be studied in the next section.
The squeezed states are Gaussian

Lemma (the vacuum state is Gaussian)
The vacuum state is a Gaussian state with each quadrature having distribution N(0, 1/2). The covariance matrix of (Q, P) is

V_0 = ( ⟨0, Q² 0⟩  ⟨0, Q ∘ P 0⟩ ; ⟨0, Q ∘ P 0⟩  ⟨0, P² 0⟩ ) = ( 1/2  0 ; 0  1/2 ).

Corollary (all squeezed (coherent) states are Gaussian)
The squeezed and coherent states are Gaussian: the distribution of X_{φ'} with respect to ρ := |φ, ξ, z⟩⟨φ, ξ, z| is the marginal along the direction φ' of the bivariate Gaussian N((q, p), V) with covariance matrix

V = (1/2) R(φ) ( e^{−2ξ}  0 ; 0  e^{2ξ} ) R(φ)ᵀ.

Moreover, these are the only states of minimum uncertainty (exercise):

Var_ρ(Q) Var_ρ(P) = 1/4.
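A quick numerical confirmation of the vacuum variances, using truncated Q and P built from the ladder operators (the truncation is harmless here because only the lowest Fock levels contribute):

```python
# Check Var(Q) = Var(P) = 1/2 in the vacuum, so Var(Q) Var(P) = 1/4
# (minimum uncertainty), in the convention [Q, P] = i of these notes.
import numpy as np

nmax = 30
a = np.diag(np.sqrt(np.arange(1, nmax + 1)), k=1)
Q = (a + a.conj().T) / np.sqrt(2)
P = (a - a.conj().T) / (1j * np.sqrt(2))

vac = np.zeros(nmax + 1)
vac[0] = 1.0
varQ = np.real(vac @ (Q @ Q) @ vac)   # <0|Q^2|0>
varP = np.real(vac @ (P @ P) @ vac)   # <0|P^2|0>
print(varQ, varP, varQ * varP)        # 0.5 0.5 0.25 (up to rounding)
```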
Thermal equilibrium states

Definition (thermal equilibrium state)
The thermal equilibrium state at inverse temperature β > 0 is defined by

ρ_β := (1 − e^{−β}) Σ_{n=0}^∞ e^{−nβ} |n⟩⟨n|.

Remark
- The thermal equilibrium states are faithful, i.e. all eigenvalues of ρ_β are strictly positive.
- The thermal equilibrium states are invariant under phase shifts.
- The number operator N has a geometric distribution, P_{ρ_β}(n) = (1 − e^{−β}) e^{−nβ}, with mean N̄ = (e^β − 1)⁻¹.
Thermal states as mixtures of coherent states

Lemma
The thermal equilibrium state is a mixture of coherent states with a Gaussian weight of variance σ² = 1/(e^β − 1):

ρ_β = ∫ (1/(2πσ²)) e^{−(q² + p²)/(2σ²)} |z⟩⟨z| dq dp,   z = (q + ip)/√2.

Proof. By rotation (phase) symmetry it suffices to verify that the diagonal matrix elements on the right and left side coincide:

∫_0^∞ (1/σ²) e^{−r²/(2σ²)} e^{−r²/2} ((r²/2)^n/n!) d(r²/2) = (1 − e^{−β}) e^{−nβ}.
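The diagonal-element identity in the proof can be verified by direct numerical integration (a sketch; the substitution s = r²/2 and a simple trapezoid rule are numerical conveniences):

```python
# Check: int_0^inf (1/sigma^2) exp(-s/sigma^2) exp(-s) s^n / n! ds
#        = (1 - exp(-beta)) exp(-n beta),  with sigma^2 = 1/(e^beta - 1),
# i.e. the Gaussian mixture of Poisson laws is the geometric law of rho_beta.
import numpy as np
from math import factorial

beta = 0.8
sigma2 = 1.0 / (np.exp(beta) - 1.0)

s = np.linspace(0.0, 60.0, 200001)   # s = r^2/2; integrand decays fast
ds = s[1] - s[0]
for n in [0, 1, 3, 7]:
    integrand = (1 / sigma2) * np.exp(-s / sigma2) * np.exp(-s) * s ** n / factorial(n)
    lhs = ds * (integrand.sum() - 0.5 * (integrand[0] + integrand[-1]))  # trapezoid
    rhs = (1 - np.exp(-beta)) * np.exp(-n * beta)
    print(n, lhs, rhs)   # agree to integration accuracy
```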
The thermal equilibrium states are Gaussian

Corollary
The thermal equilibrium state ρ_β is a centered Gaussian state with covariance matrix

( Tr(ρQ²)  Tr(ρQ ∘ P) ; Tr(ρQ ∘ P)  Tr(ρP²) ) = (coth(β/2)/2) ( 1  0 ; 0  1 ),

where coth(β/2) = (e^{β/2} + e^{−β/2})/(e^{β/2} − e^{−β/2}).
All Gaussian states

Theorem (general form of a Gaussian state)
Any Gaussian state of a quantum harmonic oscillator is of the form

ρ = D(z)* Γ(φ)* S(ξ)* ρ_β S(ξ) Γ(φ) D(z),

i.e. a displaced, rotated, squeezed thermal state. The corresponding bivariate Gaussian is N((q, p), V) with covariance matrix

V = (coth(β/2)/2) R(φ) ( e^{−2ξ}  0 ; 0  e^{2ξ} ) R(φ)ᵀ.

A positive real matrix V is the covariance matrix of a Gaussian state iff Det(V) ≥ 1/4.

Proof: exercise.
Estimation of Gaussian states

Gaussian shift models
Gaussian estimation problems
One dimensional Gaussian shift
Two dimensional Gaussian shift
The optimal covariant measurement

Reference: A. S. Holevo, Probabilistic and statistical aspects of quantum theory (1982)
Gaussian shift models

Definition
Denote by G(z, V) the density matrix of the Gaussian state with displacement z and covariance matrix V. A quantum Gaussian shift model is a family of the form

Q_V := {G(θ, V) : θ ∈ Θ ⊂ C},

where Θ is a real linear subspace of C and V is a fixed and known covariance matrix.

Remark
In the case of a single quantum oscillator considered so far, there are only two possible types of Gaussian shift models: the one-dimensional and the two-dimensional (full) shift.
Equivalence with displaced thermal equilibrium states

Lemma
By applying an appropriate (unitary) squeezing operation we can transform the model Q_V into an equivalent model consisting of displaced thermal equilibrium (or coherent) states.

Proof. By the Theorem on the general form of a Gaussian state we have

G(z, V) = D(z)* Γ(φ)* S(ξ)* ρ_β S(ξ) Γ(φ) D(z).

We have S(ξ) Γ(φ) D(z) Γ(φ)* S(ξ)* = D(z'), where z' = e^ξ Re(e^{iφ} z) + i e^{−ξ} Im(e^{iφ} z). From the above equations we get

G(z', σ² 1) = S(ξ) Γ(φ) G(z, V) Γ(φ)* S(ξ)*,   σ² = Det(V)^{1/2}.
Gaussian estimation problems

We will consider the following two estimation problems:

1. Estimation of the parameter θ in the one-dimensional Gaussian shift {ρθ := exp(−iθP) ρ_β exp(iθP) : θ ∈ R} for the quadratic risk

R(θ̂, θ) = E_θ((θ̂ − θ)²).

2. Estimation of the parameter θ = (q, p) in the two-dimensional Gaussian shift {ρθ := D(q, p) ρ_β D(q, p)* : (q, p) ∈ R²} for the quadratic risk

R(θ̂, θ, G) = E_θ((θ̂ − θ)ᵀ G (θ̂ − θ)).

By rotation symmetry of ρ_β we can pass to the coordinate system in which G = Diag(g_q, g_p) is diagonal, and write

R(θ̂, θ, G) = g_q E_θ((q̂ − q)²) + g_p E_θ((p̂ − p)²).
One-dimensional Gaussian shift

Theorem (optimal estimation for the one-dimensional shift)
Let ρθ := exp(−iθP) ρ_β exp(iθP), θ ∈ R, be a Gaussian shift with 0 < β < ∞ fixed and known. The symmetric logarithmic derivative L_θ defined by

dρθ/dθ = L_θ ∘ ρθ

is equal to 2(Q − θ1)/coth(β/2), and the quantum Cramér–Rao bound is achieved by measuring Q. The resulting unbiased estimator θ̂ has risk

R(θ̂, θ) = Var_θ(θ̂) = H(θ)⁻¹ = coth(β/2)/2.
One-dimensional Gaussian shift: proof

Proof. We have

dρθ/dθ = −i[P, ρθ] = −i exp(−iθP) [P, ρ_β] exp(iθP).

Thus the s.l.d. is of the form L_θ = exp(−iθP) L0 exp(iθP), where L0 is the solution of −i[P, ρ_β] = L0 ∘ ρ_β. By writing the matrix elements w.r.t. the Fock basis we get

−i (e^{−nβ} − e^{−mβ}) ⟨m, P n⟩ = ((e^{−nβ} + e^{−mβ})/2) ⟨m, L0 n⟩,

with solution L0 = 2Q/coth(β/2). Hence L_θ = 2(Q − θ1)/coth(β/2). The Helstrom–Fisher information is

H(θ) = Tr(ρθ L_θ²) = 2/coth(β/2).

The result θ̂ of measuring Q is an unbiased estimator of θ, and Var_θ(θ̂) = H(θ)⁻¹.
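The SLD equation at θ = 0 can be checked as a matrix identity in a truncated Fock basis. Since P, L0 are tridiagonal and ρ_β is diagonal, the truncation does not corrupt the retained matrix elements:

```python
# Check -i [P, rho_beta] = L0 o rho_beta with L0 = 2 Q / coth(beta/2),
# where "o" is the symmetrised (Jordan) product.
import numpy as np

nmax, beta = 25, 0.7
a = np.diag(np.sqrt(np.arange(1, nmax + 1)), k=1)
Q = (a + a.conj().T) / np.sqrt(2)
P = (a - a.conj().T) / (1j * np.sqrt(2))

n = np.arange(nmax + 1)
rho = np.diag((1 - np.exp(-beta)) * np.exp(-n * beta))   # thermal state (truncated)

coth = (np.exp(beta / 2) + np.exp(-beta / 2)) / (np.exp(beta / 2) - np.exp(-beta / 2))
L0 = 2 * Q / coth

lhs = -1j * (P @ rho - rho @ P)
rhs = 0.5 * (L0 @ rho + rho @ L0)
print(np.max(np.abs(lhs - rhs)))   # at rounding-error level
```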
One-dimensional Gaussian shift as a covariant family

Remark
- The s.l.d.'s {L_θ : θ ∈ R} form a commutative family, and in this case we can achieve the Cramér–Rao bound 'in one shot', not only in the sense of locally unbiased measurements, which provide only asymptotic optimality.
- The Helstrom–Fisher information and the risk do not depend on θ, due to the fact that we deal with a covariant family. The methods developed for covariant measurements with respect to compact groups can be extended to R and lead to the same optimal measurement (see Holevo).
- The same result can be obtained in the case of coherent states ('β = ∞'), with the difference that L_θ is not uniquely defined as an operator but only as an element of L²(ρθ).
Two-dimensional Gaussian shift

Theorem (optimal estimation for the two-dimensional shift)
Let ρθ := D(q, p) ρ_β D(q, p)*, θ = (q, p) ∈ R², be a Gaussian shift model with 0 < β < ∞ fixed and known. Let us fix the quadratic risk for an estimator θ̂ = (q̂, p̂):

R(θ̂, θ, G) = g_q E_θ((q̂ − q)²) + g_p E_θ((p̂ − p)²).

1. The following covariant measurement is optimal for the above estimation problem:

M(dq̂ dp̂) = |ξ, ẑ⟩⟨ξ, ẑ| dq̂ dp̂/(2π),

where |ξ, ẑ⟩ is the pure squeezed state with ẑ = (q̂ + ip̂)/√2 and squeezing parameter e^{−2ξ} = √(g_p/g_q).

2. The components of the corresponding estimator θ̂ = (q̂, p̂) are unbiased estimators of q and respectively p, and their covariance matrix achieves the lower bound in the right Cramér–Rao bound.
Simple implementation of the optimal measurement (I)

The measurement M(dq̂ dp̂) can be dilated to a PVM as follows. Let Q, P be the coordinates of the system and let Q', P' be the coordinates of an independent copy of the oscillator. On the joint space L²(R) ⊗ L²(R) we have

(Q, P) ≡ (Q ⊗ 1, P ⊗ 1),   (Q', P') ≡ (1 ⊗ Q, 1 ⊗ P).

Define the rotated coordinates (50% beamsplitter transformation)

(Q1, P1) = (Q + Q', P + P')/√2,   (Q2, P2) = (Q − Q', P − P')/√2,

and note that P1 commutes with Q2. Thus we can define the PVM

E(dq̂ dp̂) = E^{Q̃}(dq̂) · E^{P̃}(dp̂),  with Q̃ := Q − Q' and P̃ := P + P'.
Simple implementation of the optimal measurement (II)

If the oscillator (Q', P') is prepared in the state |ξ⟩, then we obtain the effective measurement M on the first copy of L²(R):

Tr(ρ M(B)) = Tr(ρ ⊗ |ξ⟩⟨ξ| E(B)),   B ∈ Σ(R²), ρ ∈ T1(L²(R)).

The measurement M is uniquely fixed by its characteristic functions

φ_M(u, v) := Tr(ρ M(e^{iuq̂ + ivp̂})) = Tr(ρ ⊗ |ξ⟩⟨ξ| E(e^{iuq̂ + ivp̂})).

Thus it is enough to show that the right hand side is equal to

∫ e^{iuq̂ + ivp̂} Tr(ρ |ξ, ẑ⟩⟨ξ, ẑ|) dq̂ dp̂/(2π).

Finally, up to a density argument, it is enough to show this for rank one ρ of the form |z1⟩⟨z2|. In this case the equality reduces to a computation with Gaussian integrals (exercise).
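For the symmetric case g_q = g_p (ancilla in the vacuum, ξ = 0) the outcome statistics of the commuting pair (Q − Q', P + P') on a coherent signal state can be simulated classically, because for a product of Gaussian states the distribution of each commuting observable is the convolution of the two Gaussian marginals. A hedged sketch (the displacement values are arbitrary):

```python
# Simulated 'double homodyne' outcomes for a coherent state displaced by
# (q, p), with the ancilla oscillator in the vacuum: each of Q, Q', P, P'
# contributes variance 1/2, so q_hat, p_hat are unbiased with variance 1.
import numpy as np

rng = np.random.default_rng(2)
n = 100000
q, p = 1.0, -0.5   # displacement of the signal state

q_hat = rng.normal(q, np.sqrt(0.5), n) - rng.normal(0.0, np.sqrt(0.5), n)
p_hat = rng.normal(p, np.sqrt(0.5), n) + rng.normal(0.0, np.sqrt(0.5), n)

print(q_hat.mean(), q_hat.var())   # approx 1.0 and 1.0 (vacuum noise added)
print(p_hat.mean(), p_hat.var())   # approx -0.5 and 1.0
```

The extra unit of variance relative to a single homodyne measurement is the price for measuring both quadratures jointly.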
Proof of the optimal estimation for the two-dimensional shift Theorem

It is easy to see that the measurement is unbiased and has the same risk for all θ. To show that the measurement is optimal one has to show that Tr(G Var(θ̂)) achieves the smallest possible value. For this it suffices to show that the right Cramér–Rao bound is achieved, in the sense that

Tr(G Var_θ(θ̂)) = Tr(G J_θ⁻¹).

Both sides can be explicitly computed, similarly to the one-dimensional case.
Alternative proof using covariant measurements

Lemma
Any covariant measurement w.r.t. the displacement operators has a POVM of the form

M(dq̂ dp̂) = D(ẑ) ρ' D(ẑ)* dq̂ dp̂/(2π),   ẑ = (q̂ + ip̂)/√2,

where ρ' is an arbitrary state.

Proof. Extension of the results on covariant measurements for irreducible representations to the case of the non-compact (but abelian) group R².

Theorem
The covariant measurement which minimises the risk R(θ̂, θ, G) is given by the seed ρ' = |ξ⟩⟨ξ|, i.e.

M(dq̂ dp̂) = D(ẑ) |ξ⟩⟨ξ| D(ẑ)* dq̂ dp̂/(2π),  with e^{−2ξ} = √(g_p/g_q).
Alternative proof using covariant measurements

Since the risk is affine w.r.t. ρ', the minimum is achieved on an extremal point, so we can restrict to pure states ρ' = |ψ⟩⟨ψ|. Let M be the measurement with seed ψ. Then M can be dilated to a PVM on two oscillators as before, by simply replacing |ξ⟩ by |ψ⟩, and

E_θ(q̂) = ∫ q̂ Tr(ρθ ⊗ |ψ⟩⟨ψ| E^{Q̃}(dq̂) E^{P̃}(dp̂)) = Tr(ρθ ⊗ |ψ⟩⟨ψ| Q̃) = Tr(ρθ Q) − ⟨ψ, Q ψ⟩ = q − ⟨ψ, Q ψ⟩.

Similarly, E_θ(p̂) = p + ⟨ψ, P ψ⟩. The measurement is thus unbiased up to a constant shift. The risk of M is constant as a function of θ and equal to

R(M, θ, G) = g_q [Tr(ρ_β Q²) + ⟨ψ, Q² ψ⟩] + g_p [Tr(ρ_β P²) + ⟨ψ, P² ψ⟩]
           = (coth(β/2)/2) Tr(G) + [g_q ⟨ψ, Q² ψ⟩ + g_p ⟨ψ, P² ψ⟩].
Alternative proof using covariant measurements

We have

g_q ⟨ψ, Q² ψ⟩ + g_p ⟨ψ, P² ψ⟩ ≥ 2 √(g_q g_p) √(⟨ψ, Q² ψ⟩ ⟨ψ, P² ψ⟩) ≥ √(g_q g_p),

where we used Heisenberg's uncertainty relation ⟨ψ, Q² ψ⟩⟨ψ, P² ψ⟩ ≥ 1/4 in the second inequality. Equalities are obtained if ψ is a minimum uncertainty state, i.e. a pure squeezed state ψ = |ξ⟩, and if g_q e^{−2ξ} = g_p e^{2ξ}, i.e. e^{−2ξ} = √(g_p/g_q). The minimum risk is

R(G) = (coth(β/2)/2) Tr(G) + √(Det G).
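The seed-dependent part of the risk can be minimised numerically as a check: with ⟨ξ, Q² ξ⟩ = e^{−2ξ}/2 and ⟨ξ, P² ξ⟩ = e^{2ξ}/2 for the squeezed vacuum, a simple grid scan (the weights g_q, g_p are arbitrary choices) recovers √(g_q g_p) and the optimal squeezing:

```python
# Grid minimisation of g_q <Q^2> + g_p <P^2> over squeezed vacua:
# minimum sqrt(g_q g_p) = sqrt(det G), attained at e^{-2 xi} = sqrt(g_p/g_q).
import numpy as np

gq, gp = 3.0, 0.4
xi = np.linspace(-2.0, 2.0, 400001)
seed_term = 0.5 * (gq * np.exp(-2 * xi) + gp * np.exp(2 * xi))

i = np.argmin(seed_term)
print(seed_term[i], np.sqrt(gq * gp))         # both approx 1.0954
print(np.exp(-2 * xi[i]), np.sqrt(gp / gq))   # both approx 0.3651
```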
Wigner functions and quantum homodyne tomography

Hilbert–Schmidt operators
Isometry between T2 and L²(R²)
The Wigner function
Examples of Wigner functions
Quantum homodyne tomography
Estimation of matrix elements using pattern functions

References: A. S. Holevo, Probabilistic and statistical aspects of quantum theory (1982); L. Artiles, R. Gill and M. Guță, J. Royal Stat. Soc. B 67, 109–134 (2005)
Hilbert–Schmidt operators

Definition (Hilbert–Schmidt operators)
Let H be a Hilbert space. The class of Hilbert–Schmidt operators is defined by

T2(H) := {τ ∈ B(H) : Tr(τ*τ) < ∞},  with ‖τ‖2 = Tr(τ*τ)^{1/2}.

Properties
- T2(H) is a Hilbert space with inner product ⟨τ, σ⟩2 = Tr(τ*σ).
- The finite rank operators are dense in T2(H).
- Any τ ∈ T2(H) has a singular value decomposition

τ = Σ_{i=1}^∞ μi |ei⟩⟨fi|,

where μi ≥ 0, {ei} and {fi} are ONBs, and Σ μi² = ‖τ‖2² < ∞.
- The trace-class operators T1(H) form a subset of T2(H).
Isometry between T2(L²(R)) and L²(R²)

Proposition (isometry between T2(L²(R)) and L²(R²))
1. Let ψ1, ψ2 be vectors in L²(R). Then the function

(u, v) → W̃_{|ψ2⟩⟨ψ1|}(u, v) := ⟨ψ1, exp(−i(uQ + vP)) ψ2⟩

is square integrable, i.e. W̃(u, v) ∈ L²(R²).

2. If {ei} is an ONB of L²(R), then the functions (1/√2π) W̃_{|ei⟩⟨ej|}, i, j ≥ 1, form an ONB of L²(R²).

3. The transformation

τ → (1/√2π) W̃_τ(u, v) := (1/√2π) Tr(τ exp(−i(uQ + vP)))

maps the Hilbert space T2(L²(R)) unitarily onto L²(R²).
Isometry between T2(L²(R)) and L²(R²)

Proof.
1. Using the definition of the Weyl operators we have

(1/√2π) ⟨ψ1, exp(−i(uQ + vP)) ψ2⟩ = (1/√2π) e^{iuv/2} ∫ ψ̄1(x) e^{−iux} ψ2(x − v) dx
   = e^{iuv/2} (1/2π) ∫∫ ψ̄1(x) ψ̃2(y) e^{ixy} e^{−i(yv + ux)} dx dy,

where ψ̃2 is the Fourier transform of ψ2. Since ψ1, ψ2 are square integrable, ψ̄1(x) ψ̃2(y) e^{ixy} is a square integrable function of (x, y), and the double integral is its Fourier transform.

2. If {ei}i is an ONB in L²(R), then {ei(x) ẽj(y) e^{ixy}}_{i,j} is an ONB in L²(R²). The result follows from the fact that the Fourier transform is a unitary operator on L²(R²).

3. This is a consequence of 2. and of the fact that the finite rank operators are dense in T2(L²(R)).
Wigner function

Definition (Wigner function)
Let τ ∈ T2(L²(R)). The 'characteristic function' of τ is defined as

W̃_τ(u, v) := Tr(τ exp(−i(uQ + vP))),  (u, v) ∈ R².

The Wigner function of τ is defined as

W_τ(q, p) = (1/(2π)²) ∫∫ exp(iuq) exp(ivp) W̃_τ(u, v) du dv.
(some) Properties of Wigner functions

- Let τ ∈ T2(L²(R)). Then W_τ is a square integrable function and W_{τ*}(q, p) is the complex conjugate of W_τ(q, p).
- (Overlap formula) Let τ1, τ2 ∈ T2(L²(R)). Then

Tr(τ1* τ2) = 2π ∫∫ W̄_{τ1}(q, p) W_{τ2}(q, p) dq dp.

- Let ρ be a density matrix. Then the one-dimensional marginal along the direction φ of the Wigner function W_ρ is equal to the probability density of the quadrature X_φ in the state ρ:

p_ρ^{X_φ}(q) = ∫_{−∞}^∞ W_ρ(q cos φ − p sin φ, q sin φ + p cos φ) dp.

- W_ρ is a quasi-probability distribution of Q and P: its marginals are probability densities, but W_ρ may take negative values.
- Displacements, phase rotations and squeezing act in the obvious way on the space of Wigner functions.
Examples of Wigner functions

[Figure: four surface plots — the Wigner function of a squeezed state, of a single-photon-added state, of a 'Schrödinger cat state' (superposition of two coherent vectors), and of the one photon state ψ1.]
Quantum homodyne tomography

Quantum homodyne tomography is a measurement technique developed in quantum optics for the estimation of the state of a quantum oscillator (a monochromatic mode of light).

Reference: G. Breitenbach, S. Schiller and J. Mlynek, Measurement of the quantum states of squeezed light, Nature 387, 471–475 (1997)
Quantum homodyne tomography: the measurement procedure
Measurement procedure
1. One chooses a random, uniformly distributed phase φ ∈ [0, π].
2. The quadrature X_φ is measured on a quantum system prepared in the state ρ. This is a so-called homodyne measurement (see figure).
3. Steps 1. and 2. are repeated on independent copies of ρ, and one collects i.i.d. data (Φ1, X1), . . . , (Φn, Xn) with probability density

p_ρ(φ, x) = p(φ) · p_ρ(x|φ) = (1/π) p_ρ^{X_φ}(x),  x ∈ R, φ ∈ [0, π].
[Figure: homodyne detection scheme — the signal mode is mixed with a local oscillator z = |z| e^{iφ} at a 50/50 beam splitter; two detectors record the intensities I1, I2, and the rescaled difference (I2 − I1)/|z| has distribution ∼ p_ρ(x|φ).]
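For a phase-invariant Gaussian state the homodyne data of the procedure above can be generated classically, since X_φ ~ N(0, coth(β/2)/2) for every φ. A minimal data-simulation sketch (the thermal state and β = 1 are illustrative choices):

```python
# Simulated homodyne tomography data (Phi_i, X_i) for a thermal state:
# Phi uniform on [0, pi], X_phi ~ N(0, coth(beta/2)/2) independent of phi.
import numpy as np

rng = np.random.default_rng(3)
n, beta = 50000, 1.0
var = ((np.exp(beta / 2) + np.exp(-beta / 2))
       / (np.exp(beta / 2) - np.exp(-beta / 2))) / 2   # coth(beta/2)/2

phi = rng.uniform(0.0, np.pi, n)        # step 1: random phase
x = rng.normal(0.0, np.sqrt(var), n)    # step 2: quadrature outcome

print(x.mean(), x.var(), var)           # sample moments vs coth(beta/2)/2
```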
Quantum homodyne tomography: the Radon transform

One cannot measure Q and P simultaneously: QP − PQ = i1. The quadratures X_φ = cos φ Q + sin φ P have probability density p_ρ(x|φ).

Definition (Radon transform)
Let W_ρ : R² → R be the Wigner function of ρ. The Radon transform of W_ρ is the function on R × [0, π] given by

R[W_ρ](q, φ) := ∫_{−∞}^∞ W_ρ(q cos φ − p sin φ, q sin φ + p cos φ) dp = p_ρ^{X_φ}(q).

For a general function,

R[f](x, φ) = ∫_{−∞}^∞ f(x cos φ + t sin φ, x sin φ − t cos φ) dt.

[Figure: diagram relating ρ, its characteristic function W̃_ρ(u, v) = Tr[ρ exp(−iuQ − ivP)], the Wigner function W_ρ(q, p), and the quadrature densities p_ρ(x|φ), via the Fourier transforms F1, F2 and the Radon transform R.]

Remark
From the above diagram we conclude that the map ρ → p_ρ(x, φ) is injective, and hence ρ is identifiable from the data with distribution P_ρ^n. If one aims at estimating W_ρ rather than ρ, the problem is closely related to the 'classical' positron emission tomography method, with the difference that W_ρ has intrinsic quantum properties, e.g. it may take negative values.
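The marginal property of the Radon transform can be checked on the vacuum, whose Wigner function is the Gaussian W_0(q, p) = (1/π) exp(−(q² + p²)) in the convention of these notes (variance 1/2 per quadrature); a grid sketch:

```python
# Grid check: the Radon transform of the vacuum Wigner function equals,
# for every phi, the N(0, 1/2) quadrature density (1/sqrt(pi)) exp(-q^2).
import numpy as np

p = np.linspace(-8.0, 8.0, 4001)
dp = p[1] - p[0]

def radon_vacuum(q, phi):
    W = np.exp(-((q * np.cos(phi) - p * np.sin(phi)) ** 2
                 + (q * np.sin(phi) + p * np.cos(phi)) ** 2)) / np.pi
    return W.sum() * dp

q = 0.9
for phi in [0.0, 0.7, 2.1]:
    print(radon_vacuum(q, phi), np.exp(-q ** 2) / np.sqrt(np.pi))
```

By rotation invariance of the vacuum, the result is independent of phi, which the three printed pairs confirm.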
Quantum homodyne tomography: the statistical problem
Problem
Given i.i.d. data (X1, Φ1), . . . , (Xn, Φn) with distribution P_ρ, construct an estimator ρ̂n of ρ such that
- ρ̂n is consistent, i.e. d(ρ̂n, ρ) → 0 as n → ∞ for some relevant distance d, e.g. norm-one, fidelity...
- R(ρ̂n, ρ) := E_ρ d(ρ̂n, ρ) is small for all states ρ.
Similar problems arise for the estimation of W_ρ, or of some functional of ρ, e.g. Tr(ρ²).
Remark
- If some information is available about ρ, we can encode it in the parametrisation ρ = ρθ for θ ∈ Θ.
- If the dimension of Θ is infinite, we deal with a nonparametric estimation problem, for which the risk R(ρ̂n, ρ) may decrease slower than the 1/n typical for parametric problems.
- A more restrictive statistical model {ρθ : θ ∈ Θ} leads to faster estimation rates, but one should avoid "model misspecification".
Φ) ∼ Pρ .k (x. Φ)) = ρi. 161 . Then Eρ (Fi.Estimation of matrix elements using pattern functions Lemma f Let (X . Moreover Fj.k (x. φ) = fj.j . φ) := F [wj j (−sφ)s](x) ˜ 2 where F −1 is the inverse Fourier transform with respect to s.k bounded oscillatory functions called pattern functions. φ) is of the form Fj.k (x) expi(j−k)φ with fj.j (X . s sin φ) and deﬁne ˜ = r π −1 Fij (x. Let wρ (sφ) := Wρ (s cos φ.
Estimation of matrix elements using pattern functions

Proof. We have

ρ_{i,j} = Tr(ρ |j⟩⟨i|) = (1/2π) ∫∫ W̃_ρ(u, v) W̃_{|i⟩⟨j|}(−u, −v) du dv
        = (1/2π) ∫_0^π ∫_{−∞}^∞ w_ρ(s|φ) w_{i,j}(−s|φ) |s| ds dφ
        = (1/π) ∫_0^π ∫_{−∞}^∞ F_{i,j}(x, φ) p_ρ(x|φ) dx dφ = E_ρ(F_{i,j}(X, Φ)).

In the first equality we used the isometry between T2(L²(R)) and L²(R²); in the second, polar coordinates; in the third, the relation w(s|φ) = F[p(x|φ)](s). The dependence of F_{j,k} on φ follows from the definition of w_{j,k}.
Pattern functions

[Figure: plots of the pattern functions (e.g. f_{5,20}, f_{10,30}, f_{20,30}) as functions of q — pattern functions for different matrix elements.]
Estimators based on pattern functions

Definition (pattern function estimator for matrix elements)
Let (X1, Φ1), . . . , (Xn, Φn) be i.i.d. with distribution P_ρ. The pattern function estimator of ρ_{j,k} is

ρ̃_{j,k}^{(n)} := (1/n) Σ_{i=1}^n f_{j,k}(Xi) e^{i(j−k)Φi}.

Theorem (consistency of truncated pattern function estimators)
Let ρ̂^{(n)} be the density matrix estimator with matrix elements

ρ̂_{j,k}^{(n)} := ρ̃_{j,k}^{(n)} if j, k ≤ d(n),  and 0 if max{j, k} > d(n),

where d(n) is the effective dimension of the estimator and satisfies d(n) ↑ ∞ as n → ∞ and d(n) = o(n^{3/7}). Then ρ̂^{(n)} is consistent with respect to the ‖·‖2 distance, i.e.

lim_{n→∞} E_ρ ‖ρ̂^{(n)} − ρ‖2² = 0.
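The averaging structure of the estimator can be sketched in a few lines. Note that the `pattern_fn` below is a hypothetical placeholder, NOT the true f_{j,k} (computing those requires the oscillator wavefunctions); the point is the empirical-average form ρ̂_{j,k} = (1/n) Σ_i f_{j,k}(X_i) e^{i(j−k)Φ_i} and the fact that a real, (j, k)-symmetric pattern function makes the estimate automatically Hermitian:

```python
# Skeleton of the pattern-function estimator on a 5 x 5 block, with a
# placeholder pattern function (NOT the true f_{j,k}).
import numpy as np

rng = np.random.default_rng(4)
n, d = 2000, 5
x = rng.normal(0.0, np.sqrt(0.5), n)     # homodyne samples (vacuum, illustrative)
phi = rng.uniform(0.0, np.pi, n)

def pattern_fn(j, k, x):
    # hypothetical placeholder: real and symmetric in (j, k)
    return np.cos((j + k) * x) * np.exp(-x ** 2 / 2)

rho_hat = np.zeros((d, d), dtype=complex)
for j in range(d):
    for k in range(d):
        rho_hat[j, k] = np.mean(pattern_fn(j, k, x) * np.exp(1j * (j - k) * phi))

print(np.allclose(rho_hat, rho_hat.conj().T))   # Hermitian by construction
```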
Estimators based on pattern functions

Proof.
We write the risk as

    E_ρ(‖ρ̂^{(n)} − ρ‖₂²) = Σ_{j,k=0}^{d(n)} E_ρ(|ρ̃_{j,k} − ρ_{j,k}|²) + Σ_{max{j,k}>d(n)} |ρ_{j,k}|²

The variance term (first) decreases with n but increases with d, while the bias term (second) decreases with d. Thus, if the variance is controlled while increasing d with n, we obtain the consistency result. Now

    E_ρ(|ρ̃_{j,k} − ρ_{j,k}|²) = (1/n) · (1/π) ∫∫ |F_{j,k}(x, φ) − ρ_{j,k}|² p(x, φ) dφ dx ≤ ‖f_{j,k}‖²/n

The result follows from the bound (see J. Royal Stat. Soc. B, 67, 109–134 (2005))

    Σ_{j,k=0}^{d} ‖f_{j,k}‖² = O(d^{7/3})

since d(n) = o(n^{3/7}) implies d(n)^{7/3}/n → 0.
Dependence of the risk on dimension d

The graphs below show the risk as a function of dimension d for two estimation methods (pattern functions and sieve maximum likelihood) and several choices of n. The tradeoff between bias and variance is reflected in the existence of a minimum at a certain 'oracle' dimension d*(n) which depends on ρ.

[Figure: panels of risk versus d for n = 100, 200, 400, 800, 1600, 3200, 6400, 12800, and a panel of log(min(L₂)) versus log(n) comparing the optimal pattern function and maximum likelihood estimators]
Methods for choosing the dimension d(n)

Deterministic choice of dimension
Suppose that ρ belongs to a 'nice' class of states, e.g.

    ρ ∈ E(α, β) := {τ : Tr(τ e^{αN}) ≤ β},   α, β > 0

Then

    Σ_{max{j,k}>d} |ρ_{j,k}|² ≤ C(α, β) e^{−αd}

and the risk is upper bounded by

    E_ρ(‖ρ̂_n − ρ‖₂²) ≤ C d^{7/3}/n + C′ e^{−αd}

By choosing d = (1/α) log n we get

    E_ρ(‖ρ̂_n − ρ‖₂²) = O((log n)^{7/3}/n)

which is only slightly worse than the parametric rate 1/n.
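A quick numerical sanity check of this tradeoff (the constants C = C′ = α = 1 are arbitrary choices for the sketch, not values from the text): minimising the bound C d^{7/3}/n + C′ e^{−αd} over a grid gives a dimension of the same order as the rule d = (1/α) log n.

```python
import numpy as np

def risk_bound(d, n, alpha=1.0, C=1.0, C2=1.0):
    """Upper bound on the risk: variance term C d^{7/3}/n plus bias term C2 e^{-alpha d}."""
    return C * d ** (7.0 / 3.0) / n + C2 * np.exp(-alpha * d)

n = 100_000
d_grid = np.arange(1, 60)
d_star = d_grid[np.argmin(risk_bound(d_grid, n))]   # grid minimiser of the bound
d_rule = np.log(n)                                  # the d = (1/alpha) log n rule

print(d_star, round(float(d_rule), 1))
```

The grid minimiser and the logarithmic rule do not coincide exactly, but both keep the risk within a constant factor of the minimum, which is all the O((log n)^{7/3}/n) rate requires.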
Methods for choosing the dimension d(n)

Data driven choice of dimension
The previous method selects a dimension d which works for all states in a certain class. It would be nicer to adapt the dimension to the particular state by making use of the data. In general one would like to find

    d* = argmin_d R(d) := argmin_d E_ρ ‖ρ̂_d^{(n)} − ρ‖₂²

where ρ̂_d^{(n)} is the estimator of dimension d for n samples. However R(d) depends on ρ, so we can try to estimate it and minimise the estimator. There are many 'model selection' methods, e.g. penalised maximum likelihood, Akaike's information criterion, cross-validation, thresholding.
Methods for choosing the dimension d(n)

The risk as a function of d is

    R(d) = E_ρ ‖ρ̂_d^{(n)} − ρ‖₂² = Σ_{j,k=0}^{d} E_ρ |ρ̃_{j,k}^{(n)}|² − 2 Σ_{j,k=0}^{d} |ρ_{j,k}|² + ‖ρ‖₂²

We try to change this into an expression depending only on the data (not on ρ_{j,k}).

The last term does not depend on d and can be dropped without changing the minimum.

The first term can be estimated from the data by simply taking Σ_{j,k=0}^{d} |ρ̃_{j,k}^{(n)}|².

The second term can be estimated unbiasedly by

    −(2/(n(n−1))) Σ_{j,k=0}^{d} Σ_{a≠b} F_{j,k}(X_a, Φ_a) F̄_{j,k}(X_b, Φ_b)

Thus our estimator of the risk is

    R̂(d) = Σ_{j,k=0}^{d} |ρ̃_{j,k}^{(n)}|² − (2/(n(n−1))) Σ_{j,k=0}^{d} Σ_{a≠b} F_{j,k}(X_a, Φ_a) F̄_{j,k}(X_b, Φ_b)
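The estimator R̂(d) depends only on the data, so it can be computed directly. A minimal sketch follows, again assuming a user-supplied `pattern_fn`, and using the identity Σ_{a≠b} F_a F̄_b = |Σ_a F_a|² − Σ_a |F_a|² to avoid an explicit double loop over pairs:

```python
import numpy as np

def risk_estimate(X, Phi, pattern_fn, d):
    """Cross-validation style estimate R_hat(d) of the risk R(d).

    pattern_fn(j, k, x) is assumed to evaluate the pattern function f_{j,k}.
    """
    n = len(X)
    R = 0.0
    for j in range(d):
        for k in range(d):
            # F_{j,k}(X_a, Phi_a) for all samples a
            F = pattern_fn(j, k, X) * np.exp(1j * (j - k) * Phi)
            # first term: |rho~_{j,k}^{(n)}|^2
            R += abs(F.mean()) ** 2
            # second term: unbiased pair estimate of |rho_{j,k}|^2,
            # via sum_{a != b} F_a conj(F_b) = |sum_a F_a|^2 - sum_a |F_a|^2
            pair_sum = abs(F.sum()) ** 2 - (np.abs(F) ** 2).sum()
            R -= 2.0 * pair_sum / (n * (n - 1))
    return R
```

A data-driven dimension is then obtained as d̂ = argmin_d R̂(d) over a range of candidate values of d.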
Methods for choosing the dimension d(n)

[Figure: L₂ error, theoretical and for the CV estimator, for 1000 observations]

The graph represents the risk R(d) as a function of dimension d (red) and the estimated risk for three samples of data from a squeezed state (blue).