Columbia, SC 29208 USA. Email: fenner@cse.sc.edu. This material is based upon work supported
by the National Science Foundation under Grant Nos. CCF-0515269 and CCF-0915948. Any opinions,
findings, and conclusions or recommendations expressed in this material are those of the author and do not
necessarily reflect the views of the National Science Foundation (NSF).
Contents

1 Week 1: Overview  7
    Brief, vague history of quantum mechanics, informatics, and the combination of the two  7
    Implementations of Quantum Computers (the Bad News)  8
    Implementations of Quantum Cryptography (the Good News)  8

2 Week 1: Preliminaries  9
    Just Enough Linear Algebra to Understand Just Enough Quantum Mechanics  9
    The Complex Numbers  9
    The Exponential Map  10
    Vector Spaces  10
    Matrices  11
    Adding and Multiplying Matrices  11
    The Identity Matrix  11
    Nonsingular Matrices  11
    Determinant and Trace  11
    Hilbert Spaces  12
    Example  12
    Orthogonality and Normality  13

3 Week 2: Preliminaries  15
    Linear Transformations and Matrices  15
    Adjoints  16
    Gram-Schmidt Orthonormalization  17
    Hermitean and Unitary Operators  18
    L(H) is a Hilbert space  19

4 Week 2: Preliminaries  19
    Dirac Notation  20
    Change of (Orthonormal) Basis  21
    Unitary Conjugation  21
    Back to Quantum Physics: The Double Slit Experiment  22

5 Week 3: Unitary conjugation  24
    Invariance under Unitary Conjugation: Trace and Determinant  24
    Orthogonal Subspaces, Projection Operators  24
    Fundamentals of Quantum Mechanics  28
    Physical Systems and States  28
    Time Evolution of an Isolated System  28
    Projective Measurement  29

6 Week 3: Projective measurements (cont.)  29
    A Perfect Example: Electron Spin  31

7 Week 4: Qubits  34
    Qubits  34
    Back to Electron Spin  34

8 Week 4: Density operators  39
    Density Operators  39
    Properties of the Pauli Operators  41
    Single-Qubit Unitary Operators  42
    A direct translation  45
    From unitaries to rotations  46
    From rotations to unitaries  47

9 Week 5: The exponential map  48
    The Exponential Map (Again)  48
    Upper Triangular Matrices and Schur Bases  50
    Eigenvectors, Eigenvalues, and the Characteristic Polynomial  51
    Eigenvectors and Eigenvalues of Normal Operators  52
    Positive Operators  55
    Commuting Operators  57

10 Week 5: Tensor products  59
    Tensor Products and Combining Physical Systems  59
    Back to Combining Physical Systems  61
    The No-Cloning Theorem  63
    Quantum Circuits  63

11 Week 6: Quantum gates  66
    Quantum Circuits Versus Boolean Circuits  69
    Why Clean?  73

12 Week 6: Measurement gates  74
    Measurement gates  74
    Bell States and Quantum Teleportation  75
    Dense Coding  78

13 Week 7: Basic quantum algorithms  80
    Black-Box Problems  80
    Deutsch's Problem and the Deutsch-Jozsa Problem  80

14 Week 7: Simon's problem  87
    Simon's Problem  87
    Linear Algebra over Z_2  89
    Back to Simon's Problem  91
    Shor's Algorithm for Factoring  92
    Modular Arithmetic  93
    Factoring Reduces to Order Finding  94

15 Week 8: Factoring and order finding (cont.)  95
    Geometric series  98
    The Quantum Fourier Transform  98

16 Week 8: Shor's algorithm (cont.)  101
    Analysis of Shor's Algorithm  102

17 Week 9: Best rational approximations  108
    The Continued Fraction Algorithm  108
    Implementing the QFT  109

18 Week 9: Approximate QFT  113
    Exact versus Approximate  113
    The Cauchy-Schwarz inequality  114
    A Hilbert Space Is a Metric Space  115

19 Midterm Exam  121

20 Week 10: Grover's algorithm  123
    Quantum Search  123
    Some Variants of Quantum Search  125

21 Week 10: Quantum search lower bound  126
    A Lower Bound on Quantum Search  126

22 Week 11: Quantum cryptography  131
    Quantum Cryptographic Key Exchange  131

23 Week 11: Basic quantum information  137
    Norms of Operators  137
    POVMs  138
    Mixed States  139
    One-Qubit States and the Bloch Sphere  142

24 Week 12: Quantum operations  144
    The Partial Trace  144
    Open Systems and Quantum Operations  145
    Equivalence of the Coupled-Systems and Operator-Sum Representations  147
    A Normal Form for the Kraus Operators  149
    Quantum Operations Between Different Hilbert Spaces  151
    General Measurements  151
    Completely Positive Maps  154

25 Week 12: Distance and fidelity  159
    Distance Measures  159
    Trace Distance and Fidelity of Operators  161
    Properties of the Trace Distance  162
    Properties of the Fidelity  167
    Comparing Trace Distance and Fidelity  167

26 Week 13: Quantum error correction  168
    Quantum Error Correction  168
    The Quantum Bit-Flip Channel  169
    The Quantum Phase-Flip Channel  173
    The Shor Code  175

27 Week 13: Error correction (cont.)  180
    Quantum Error Correction: The General Theory  180
    Discretization of Errors  184

28 Week 14: Fault tolerance  186
    Fault-Tolerant Quantum Computation  186

29 Week 14: Entanglement and Bell inequalities  186

30 Final Exam  188
1 Week 1: Overview
Brief, vague history of quantum mechanics, informatics, and the combination of the
two.
Quantum Theory The foundations of quantum mechanics were established by committee: Niels Bohr, Albert Einstein, Werner Heisenberg, Erwin Schrödinger, Max Planck, Louis de Broglie, Max Born, John von Neumann, Paul A.M. Dirac, Wolfgang Pauli, and others over the first half of the 20th century. The theory provides extremely accurate descriptions of the world at the atomic and subatomic levels, where classical (i.e., Newtonian) physics and electrodynamics break down. Examples: stability of atoms, black body radiation, sharp spectral absorption lines, etc.
Informatics Broadly, this is the study of all aspects of information: its storage, transmission, and manipulation (i.e., computation). It includes what is commonly called Computer Science in the US, as well as Information Theory. Foundations of Computer Science were laid at about the same time as quantum mechanics by Gottlob Frege, David Hilbert, Alonzo Church, Haskell Curry, Kurt Gödel, John Barkley Rosser, Alan Turing, Jacques Herbrand, Emil Post, Stephen Kleene, and others, who were developing a formal notion of algorithm or effective procedure to understand problems in the foundations of mathematics. Foundations of computability culminated in the Church-Turing thesis. Largely independently, the field of Information Theory started in 1948 with Claude Shannon's paper, "A Mathematical Theory of Communication." Information theory deals with quantifying information and understanding how it can be stored and transmitted, both securely and otherwise. Shannon defined the notion of information entropy, somewhat analogously to physical entropy, and proved engineering-related results about compression and noisy transmission that are in common use today.
Quantum Information and Computation The physicist Richard Feynman first suggested the idea of a quantum computer and what it could be used for. Charles Bennett (80s?) showed that reversible computation (with no heat dissipation or entropy increase) was possible at least in principle. Paul Benioff (80s) showed how quantum dynamics could be used to simulate classical (reversible) computation. David Deutsch (80s) defined the Quantum Turing Machine (QTM) and quantum circuits as theoretical models of a quantum computer. Further foundational work was done by Bernstein & Vazirani, Yao, and others (quantum complexity theory). Bennett and Gilles Brassard (1984) proposed a scheme for unconditionally secure cryptographic key exchange based on quantum mechanical principles, using polarized photons. Deutsch & Jozsa and Simon (early 90s) gave toy problems on which quantum computers performed provably better than classical ones. A big breakthrough came in the mid 1990s when Peter Shor showed how a quantum computer can factor large integers quickly (1994), as well as compute discrete logarithms (these would break the security of most public key encryption schemes in use today). Grover (1996?) proposed a completely different quantum algorithm to quadratically speed up list search. Calderbank & Shor and Steane (1996?) showed that good quantum error-correcting codes exist and that fault-tolerant quantum computation is possible. This led to the threshold theorem (D. Aharonov, A. Yu. Kitaev(?)), which states that there is a constant ε_0 > 0 (current rough estimates are around 10^-4) such that if the noise associated with each gate can be kept below ε_0, then any quantum computation can be carried out with arbitrarily small probability of error. This theorem shows that noise is not a fundamental impediment to quantum computation.
Implementations of Quantum Computers (the Bad News). There are several proposals for physical devices implementing the elements of quantum computation. Each has its own strengths and weaknesses. In recent years, ion traps look the most promising. We're still far off from a viable, scalable, robust prototype.

Nuclear Magnetic Resonance (NMR) Quantum bits are nuclei of atoms (hydrogen?) arranged on an organic molecule. The value of the bit is given by the spin of the nucleus. Nuclear spins can be controlled by electromagnetic pulses of the right frequency and duration. Main advantage: spins are well shielded from the outside by the electron clouds surrounding them, so they stay coherent for a long time. Main disadvantage: since the nuclei need to be on the same molecule to control the distances between them, NMR does not scale well. Homay Valafar will talk about NMR toward the end of the course.

Ions in traps Qubits are ions kept equally spaced in a row (a couple of inches apart) by an oscillating electric field. Laser pulses can control the states of the ions.

Quantum dots Qubits are particles (electrons?) kept in nanoscopic wells on the surface of a silicon chip. Main advantage: easy to control and fabricate (solid state). Main disadvantage: short decoherence times.

Optical schemes Qubits are polarized photons traveling through mirrors, lenses, crystals, and the vacuum. Main advantages: photons don't decay and their polarizations are easy to measure; computation is at the speed of light. Main disadvantage: hard to get photons to interact with each other.

Superconducting/Josephson junctions I don't know much about this, except that it presumably needs temperatures close to absolute zero.
Implementations of Quantum Cryptography (the Good News). Quantum crypto not only works in the real world, but works just fine on fiber optic networks already in place. British Telecom (mid 1990s?) demonstrated the BB84 quantum key exchange protocol using cable laid across Lake Geneva in Switzerland. I believe the scheme has also been demonstrated to work with photons through the air over modest distances (a few kilometers?). It is now feasible to use the fiber optic cable already in place to implement quantum crypto in the network of a major city (New York banks are already using it(?)). It still won't work over really large distances without classical repeaters (quantum amplification is theoretically impossible).
2 Week 1: Preliminaries
Just Enough Linear Algebra to Understand Just Enough Quantum Mechanics. We let
Z denote the set of integers, Q denote the set of rational numbers, R denote the set of real
numbers, and C denote the set of complex numbers.
The Complex Numbers. C is the set of all numbers of the form z = x + iy, where x, y ∈ R and i^2 = −1. We often represent z as the point (x, y) in the plane. The complex conjugate (or adjoint) of z is

    z^* = z̄ = x − iy.

Note that x = (z + z^*)/2 is the real part of z (Re(z)) and y = (z − z^*)/2i is the imaginary part of z (Im(z)). The norm or absolute value of z is

    |z| = |z^*| = √(x^2 + y^2) ≥ 0,

with equality holding iff z = 0. If z_1, z_2 ∈ C, it's easy to check that |z_1 z_2| = |z_1| |z_2|. It's not quite so easy to check that

    |z_1 + z_2| ≤ |z_1| + |z_2|,    (1)

but see the Cauchy-Schwarz section of the Background Material for a proof. (1) is an example of a triangle inequality.
Exercise 2.1 Check that (z_1 z_2)^* = z_1^* z_2^*, (z_1 + z_2)^* = z_1^* + z_2^*, and (z_1^*)^* = z_1 for all z_1, z_2 ∈ C.
If z ≠ 0, then the argument of z (arg(z)) is defined as the angle that z makes with the positive real axis. Our convention will be that 0 ≤ arg(z) < 2π. It is known that arg(z_1 z_2) = arg(z_1) + arg(z_2) up to a multiple of 2π.

The real numbers R form a subset of C consisting of those complex numbers with 0 imaginary part, namely,

    R = {z ∈ C : z = z^*}.

The unit circle in C is the set of all z of unit norm, i.e., {z ∈ C : |z| = 1}.
C is an algebraically closed field. That is, every polynomial of positive degree with coefficients in C has a root in C, in fact n of them counted with multiplicity, where n is the degree of the polynomial. This is equivalent to saying that every polynomial over C is a product of linear (i.e., degree 1) factors. This fact is known as the Fundamental Theorem of Algebra.

Every polynomial over R can be factored into real polynomial factors of degrees 1 and 2. This implies that any odd-degree real polynomial has at least one real root.
The Exponential Map. For any z ∈ C, we can define e^z = exp(z) by the usual power series:

    e^z = 1 + z + z^2/2! + z^3/3! + ··· + z^k/k! + ··· ,    (2)

which converges for all z.
Exercise 2.2 Show that for any real θ,

    e^{iθ} = cos θ + i sin θ.    (3)

This very important identity is known as Euler's formula, named after the Swiss mathematician Leonhard Euler. It connects the exponential and trigonometric functions. [Hint: Compare the power series for e^{iθ} with those for sin θ and cos θ. Other proofs exist.]
Exercise 2.3 Show that for all θ ∈ R,

    cos θ = (e^{iθ} + e^{−iθ})/2,
    sin θ = (e^{iθ} − e^{−iθ})/2i.

Thus we can express the basic trigonometric functions in terms of exponentials. [Hint: Use Euler's formula and the fact that cos(−θ) = cos θ and sin(−θ) = −sin θ.]
By Exercise 2.2, we have e^z = e^x (cos y + i sin y). The unit circle is the set {e^{iθ} : θ ∈ R}.
Vector Spaces. We'll deal with finite dimensional vector spaces only. Much of quantum mechanics requires infinite dimensional spaces, but thankfully, the QM that relates to information and computation only requires finite dimensions. So all our vector spaces are finite dimensional.

Our vector spaces will usually be over C, the field of complex numbers, but sometimes they will be over R (i.e., real vector spaces), and when we do information theory, we will need to look at bit vectors (vectors in spaces over the two-element field Z_2 = {0, 1}).
In a vector space, vectors can be added to each other and multiplied by scalars, obeying the usual rules. If V is an n-dimensional vector space and B = {b_1, . . . , b_n} is a basis for V, then every v ∈ V is written as a linear combination of basis vectors:

    v = a_1 b_1 + ··· + a_n b_n,

where a_1, . . . , a_n are unique scalars. Thus we can identify the vector v with the column vector

    (a_1, . . . , a_n)^T,

which we may also write as (a_1, . . . , a_n). Under this identification, vector addition and scalar multiplication are componentwise.

The vector (0, . . . , 0) is the zero vector, denoted by 0.
Matrices. For integers m, n > 0, an m × n matrix is a rectangular array of scalars with m rows and n columns. If A is such a matrix and 1 ≤ i ≤ m and 1 ≤ j ≤ n, we denote the (i, j)th entry of A (i.e., the scalar in the ith row and jth column) as [A]_ij or A[i, j]. The former notation is useful if the matrix is given by a more complicated expression.
Adding and Multiplying Matrices.
Exercise 2.4 Find two 2 × 2 matrices A and B such that AB = 0 (the zero matrix), but BA ≠ 0.
The Identity Matrix.
Nonsingular Matrices.
Determinant and Trace. If A is an n × n matrix, the trace of A (denoted tr A) is defined as the sum of all the diagonal elements of A, i.e.,

    tr A = Σ_{i=1}^n [A]_ii.

The trace has three fundamental properties:

1. tr I = n, where I is the n × n identity matrix.

2. tr(A + aB) = tr A + a tr B, for n × n matrices A and B and scalar a. (The trace is linear.)

3. tr(AB) = tr(BA) for any n × n matrices A and B.

In fact, tr is the only function from n × n matrices to scalars that satisfies (1)-(3) above.
Exercise 2.5 (Challenging) Prove this last statement.
Exercise 2.6 Show that for any integers m, n ≥ 1, if A is an n × m matrix and B is an m × n matrix, then

    tr(AB) = tr(BA) .    (4)

This verifies item (3) above about the trace. We will use this fact frequently.
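Equation (4) can be checked concretely, even for non-square matrices. A small Python sketch (the helper functions matmul and trace are mine, written for matrices as lists of rows):

```python
# Trace and the cyclic property tr(AB) = tr(BA) of Equation (4).
def matmul(A, B):
    """Multiply matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def trace(A):
    """Sum of diagonal entries of a square matrix."""
    return sum(A[i][i] for i in range(len(A)))

A = [[1, 2, 3],
     [4, 5, 6]]          # 2 x 3
B = [[7, 8],
     [9, 10],
     [11, 12]]           # 3 x 2

# AB is 2 x 2 and BA is 3 x 3, yet their traces agree.
assert trace(matmul(A, B)) == trace(matmul(B, A))
```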
Hilbert Spaces. A vector space H over C is a Hilbert space if it has a scalar product ⟨·|·⟩ : H × H → C that behaves as follows for all u, v, w ∈ H and a ∈ C:

1. ⟨u|v + aw⟩ = ⟨u|v⟩ + a⟨u|w⟩ (⟨·|·⟩ is linear in the second argument).

2. ⟨u|v⟩ = ⟨v|u⟩^*.

3. ⟨u|u⟩ ≥ 0 (in particular, it is real), with equality iff u = 0.

Example. The basic example is H = C^n, the set of all n-tuples of complex numbers, viewed as column vectors, where vector addition and scalar multiplication are componentwise:

    (u_1, . . . , u_n)^T + (v_1, . . . , v_n)^T = (u_1 + v_1, . . . , u_n + v_n)^T  and  a(u_1, . . . , u_n)^T = (au_1, . . . , au_n)^T.

We define the Hermitean inner product for all vectors u = (u_1, . . . , u_n) and v = (v_1, . . . , v_n) as

    ⟨u|v⟩ = u_1^* v_1 + ··· + u_n^* v_n = Σ_{i=1}^n u_i^* v_i.
In this example, u and v can be expressed as linear combinations over the standard basis {e_1, . . . , e_n}, where

    e_i = (0, . . . , 0, 1, 0, . . . , 0)^T,    (5)

where the 1 occurs in the ith row.

Exercise 2.8 Check that the three properties of a Hermitean form are satisfied in this example.

Note that if we restrict the u_i and v_i to be real numbers, then this is just the familiar dot product of two real vectors. Also note that in this example,

    ‖u‖ = √⟨u|u⟩ = √(u_1^* u_1 + ··· + u_n^* u_n) = √(|u_1|^2 + ··· + |u_n|^2).
Orthogonality and Normality. In a genuine sense, the example above is the only example that really matters. First some more definitions. Two vectors u, v in a Hilbert space H are orthogonal or perpendicular if ⟨u|v⟩ = 0. A vector u is normal, or a unit vector, if ‖u‖ = 1. A set of vectors v_1, . . . , v_k ∈ H is an orthonormal set if each vector is a unit vector and different vectors are orthogonal. That is, for all 1 ≤ i, j ≤ k, we have

    ⟨v_i|v_j⟩ = δ_ij = { 1 if i = j,
                         0 if i ≠ j.

This equation also defines the expression δ_ij, which is called the Kronecker delta.

A basis for H is an orthonormal basis if it is an orthonormal set. Orthonormal bases are special and have nice properties that make them preferable to other bases. From now on we will assume that all our bases (for Hilbert spaces) are orthonormal unless I say otherwise, and I won't.
In the example above, e_1, . . . , e_n clearly form an orthonormal basis. We'll see later that every Hilbert space has an orthonormal basis (lots of them, in fact). But let's get back to our example. If we fix an orthonormal basis B = {β_1, . . . , β_n} for a Hilbert space H, then we can write two vectors u, v ∈ H in terms of B as

    u = Σ_{i=1}^n u_i β_i  and  v = Σ_{j=1}^n v_j β_j,

for some unique scalars u_1, . . . , u_n, v_1, . . . , v_n ∈ C. Let's see what ⟨u|v⟩ is.
    ⟨u|v⟩ = ⟨u_1 β_1 + ··· + u_n β_n | v⟩
          = Σ_i u_i^* ⟨β_i|v⟩                      (conjugate linearity in the first argument)
          = Σ_i u_i^* ⟨β_i | v_1 β_1 + ··· + v_n β_n⟩
          = Σ_i u_i^* Σ_{j=1}^n v_j ⟨β_i|β_j⟩      (linearity in the second argument)
          = Σ_{i,j} u_i^* v_j ⟨β_i|β_j⟩
          = Σ_{i,j} u_i^* v_j δ_ij                 (the basis is orthonormal)
          = Σ_{i=1}^n u_i^* v_i.

In other words, ⟨u|v⟩ is exactly the quantity of our example above, if we identify u with the tuple (u_1, . . . , u_n) ∈ C^n and v with the tuple (v_1, . . . , v_n) ∈ C^n.
3 Week 2: Preliminaries
Linear Transformations and Matrices. Let U and V be vector spaces. A linear map is a function T : U → V such that, for all vectors u, v ∈ U and scalar a,

    T(u + av) = Tu + aTv.

The vector addition and scalar multiplication on the left-hand side are in U, and on the right-hand side in V. If {α_1, . . . , α_n} is a basis for U and {β_1, . . . , β_m} is a basis for V, then T can be expressed uniquely in matrix form with respect to these bases: For each 1 ≤ j ≤ n, we write Tα_j uniquely as a linear combination of the β_i:

    Tα_j = Σ_{i=1}^m a_ij β_i,    (6)
where each a_ij is a scalar. Now let A be the m × n matrix whose (i, j)th entry is a_ij. Expressing any u ∈ U with respect to the first basis (of U) as

    u = Σ_{j=1}^n u_j α_j = (u_1, . . . , u_n)^T,

we get

    Tu = T( Σ_{j=1}^n u_j α_j )
       = Σ_{j=1}^n u_j Tα_j                  (by linearity)
       = Σ_j u_j ( Σ_{i=1}^m a_ij β_i )      (by (6))
       = Σ_i ( Σ_j a_ij u_j ) β_i
       = ( Σ_j a_1j u_j, . . . , Σ_j a_mj u_j )^T
       = A (u_1, . . . , u_n)^T,

expressed with respect to the second basis (of V). Thus applying T to a vector u amounts to multiplying the corresponding matrix on the left with the corresponding column vector on the right.

Conversely, given bases for U and for V, an m × n matrix defines a unique linear map T whose action on a vector u is given above.

Thus, linear maps and matrices are interchangeable.
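The correspondence can be seen in a few lines of code: applying the matrix is applying the linear map, and linearity holds automatically. A Python sketch (the helper apply is mine):

```python
# A matrix acts as a linear map: (Au)_i = sum_j a_ij u_j, and T(u + a v) = Tu + a Tv.
def apply(A, u):
    return [sum(A[i][j] * u[j] for j in range(len(u))) for i in range(len(A))]

A = [[1, 2],
     [3, 4],
     [5, 6]]             # a 3 x 2 matrix: a linear map from C^2 to C^3
u, v, a = [1, -1], [2, 5], 3

# Linearity: T(u + a v) = Tu + a Tv, computed both ways.
lhs = apply(A, [ui + a * vi for ui, vi in zip(u, v)])
rhs = [x + a * y for x, y in zip(apply(A, u), apply(A, v))]
assert lhs == rhs
```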
Linear maps (with the same domain and codomain) can be added and multiplied by scalars thus:

    (T_1 + T_2)u = T_1 u + T_2 u,
    (aT)u = a(Tu).

The two equations above define T_1 + T_2 and aT respectively (a a scalar) by showing how they map an arbitrary vector u. This makes the set of all such linear maps a vector space in its own right.
If U and V are Hilbert spaces and the {α_j} and {β_i} are orthonormal bases, then each entry a_ij can be expressed as a scalar product in V:

    ⟨β_i|Tα_j⟩ = ⟨β_i | a_1j β_1 + ··· + a_mj β_m⟩ = Σ_{k=1}^m a_kj ⟨β_i|β_k⟩ = a_ij.

One upshot of this is that a linear map T is completely determined by the quantities ⟨β_i|Tα_j⟩ for all i and j.
Adjoints. If A is any m × n matrix over C, the adjoint of A (denoted A^† or A^*) is the n × m matrix obtained by taking the transpose of A and then taking the complex conjugate of each entry. That is,

    [A^†]_ij = ([A]_ji)^*,

for all 1 ≤ i ≤ n and 1 ≤ j ≤ m.

Note the following:

1. (A^†)^† = A.

2. (A + aB)^† = A^† + a^* B^†.
An important special case is u^†, where u = (u_1, . . . , u_n) is a column vector (i.e., an n × 1 matrix). We have

    u^† = (u_1^*, . . . , u_n^*),

a 1 × n row vector. Then for any column vectors u, v ∈ C^n,

    ⟨u|v⟩ = Σ_{i=1}^n u_i^* v_i = u^† v .    (7)

Here, we identify the 1 × 1 matrix u^† v with its single scalar entry.

More generally, if H and J are Hilbert spaces and A : H → J is a linear map, the adjoint of A is the unique linear map A^† : J → H satisfying

    ⟨u|Av⟩ = ⟨A^† u|v⟩

for all u ∈ J and v ∈ H. Note that the left-hand side is the scalar product in J, and the right-hand side is the scalar product in H.

If we pick any orthonormal bases for H and J, then these two definitions of the adjoint coincide.

Exercise 3.1 (Challenging) Prove this fact.
Gram-Schmidt Orthonormalization. We prefer orthonormal bases for our Hilbert spaces. Here we show that they actually exist, and in abundance. Let H be an n-dimensional Hilbert space and let {b_1, . . . , b_n} be any basis (not necessarily orthonormal) for H. For i = 1 to n in order, define

    x_i = b_i − Σ_{k=1}^{i−1} ⟨y_k|b_i⟩ y_k,
    y_i = x_i / ‖x_i‖.
This is known as the Gram-Schmidt procedure. We'll see that {y_1, . . . , y_n} is an orthonormal basis. It's not obvious that the y_i are even well-defined, since we need to establish that ‖x_i‖ in the denominator is nonzero. We can prove the following facts simultaneously by induction on i for 1 ≤ i ≤ n; that is, assuming that all the facts are true for all j < i, we prove all the facts for i:

1. x_i ≠ 0 (and thus ‖x_i‖ > 0).

2. ‖y_i‖ = 1.

3. {b_1, . . . , b_i}, {x_1, . . . , x_i}, and {y_1, . . . , y_i} are each linearly independent sets of vectors which span the same subspace of H.

4. ⟨y_i|b_i⟩ = ⟨b_i|y_i⟩ > 0.

5. ⟨y_j|y_i⟩ = 0 for all j < i.
For the last item, we compute

    ⟨y_j|y_i⟩ = ⟨y_j|x_i⟩ / ‖x_i‖
              = (1/‖x_i‖) ( ⟨y_j|b_i⟩ − Σ_{k<i} ⟨y_k|b_i⟩ ⟨y_j|y_k⟩ )
              = ( ⟨y_j|b_i⟩ − ⟨y_j|b_i⟩ ) / ‖x_i‖
              = 0,

where the middle step uses the inductive hypothesis that ⟨y_j|y_k⟩ = δ_jk for all j, k < i.

It turns out (we won't prove this) that given a basis b_1, . . . , b_n, there is only one list y_1, . . . , y_n satisfying all of items (2)-(5) above.
Hermitean and Unitary Operators.

Definition 3.2 If H and J are Hilbert spaces, we let L(H, J) denote the space of all linear maps from H to J. We abbreviate L(H, H) by L(H), the space of all linear operators on H, with identity element I. Note that L(H, J) is a vector space over C.

A map A ∈ L(H) is Hermitean (or self-adjoint) if A^† = A. A map A is unitary if AA^† = I (equivalently, A^† A = I).

For any u, v ∈ H, we have the following easy facts:

- If A is Hermitean, then ⟨u|Av⟩ = ⟨Au|v⟩. This follows immediately from the fact that ⟨u|Av⟩ = ⟨A^† u|v⟩.

- If A and B are Hermitean, then so is A + B.

- If A is Hermitean and a is real, then aA is Hermitean.

- If A is Hermitean, then so is A^†.

- If A is unitary, then ⟨Au|Av⟩ = ⟨u|v⟩, that is, A preserves the scalar product. To see this, we just compute ⟨Au|Av⟩ = ⟨A^† Au|v⟩ = ⟨Iu|v⟩ = ⟨u|v⟩. Note that A^† = A^{−1} in this case.

- I is both Hermitean and unitary.

More on all this later.
L(H) is a Hilbert space. In Definition 3.2, we mentioned that L(H, J) is a vector space over C. In fact, its dimension is the product of the dimensions of H and of J: Suppose H has dimension n and J has dimension m. Given orthonormal bases for each space, an element of L(H, J) corresponds to an m × n matrix. You can think of this matrix as a vector with mn components which just happen to be arranged in a 2-dimensional array rather than a single column. The vector addition and scalar multiplication operations on these matrices are componentwise, just as with vectors, so L(H, J) has dimension mn.

There is a natural inner product that one can define on L(H, J) that makes it into an mn-dimensional Hilbert space. For all A, B ∈ L(H, J), define

    ⟨A|B⟩ := tr(A^† B) .    (8)

This is known as the Hilbert-Schmidt inner product on L(H, J). It looks similar to the expression u^† v for the inner product of vectors u and v (Equation (7)), except that A^† B is not a scalar but an operator in L(H), and so we take the trace to get a scalar result.

Exercise 3.3 Show that L(H, J), together with its Hilbert-Schmidt inner product, satisfies all the axioms of a Hilbert space. [Hint: You can certainly just verify the axioms directly. Alternatively, represent operators as matrices with respect to some fixed orthonormal bases of H and J, respectively, then show that if A and B are m × n matrices, then tr(A^† B) is the usual inner product of A and B on C^{mn}, where we identify each matrix with the mn-dimensional vector of all its entries.]

The Hilbert-Schmidt inner product interacts nicely with composition of linear maps (or equivalently, matrix multiplication).

Exercise 3.4 Let H, J, and K be Hilbert spaces. Verify directly that for any A ∈ L(H, K), B ∈ L(J, K), and C ∈ L(H, J),

    ⟨A|BC⟩ = ⟨B^† A|C⟩ = ⟨A C^†|B⟩ .    (9)

This means that you can move a right or left factor from one side of the inner product to the other, provided you take its adjoint. [Hint: Use Exercise 2.6 along with basic properties of adjoints. You may assume that A, B, and C are all matrices of appropriate dimensions.]
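Equation (8) can be checked directly against the entrywise description in the hint to Exercise 3.3. A Python sketch (the helpers dagger, matmul, and hs are mine):

```python
# The Hilbert-Schmidt inner product <A|B> = tr(A^dagger B) on complex matrices.
def dagger(A):
    """Conjugate transpose of a matrix given as a list of rows."""
    return [[A[i][j].conjugate() for i in range(len(A))] for j in range(len(A[0]))]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def hs(A, B):
    """<A|B> = tr(A^dagger B)."""
    P = matmul(dagger(A), B)
    return sum(P[i][i] for i in range(len(P)))

A = [[1 + 1j, 2 + 0j], [0 + 1j, 1 - 1j]]
B = [[2 + 0j, 1j], [1 + 0j, 3 + 0j]]

# tr(A^dagger B) equals the entrywise sum_{ij} a_ij^* b_ij, i.e. the usual
# inner product on C^{mn} after flattening each matrix into a vector.
entrywise = sum(A[i][j].conjugate() * B[i][j] for i in range(2) for j in range(2))
assert abs(hs(A, B) - entrywise) < 1e-12
```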
4 Week 2: Preliminaries
Exercise 4.1 Let b_1 = (3, 0, 4), b_2 = (3, 1, 2), and b_3 = (0, 1, 1). Perform the Gram-Schmidt procedure above on {b_1, b_2, b_3} to find the corresponding {x_1, x_2, x_3} and {y_1, y_2, y_3}.
Dirac Notation. In what follows, we fix an n-dimensional Hilbert space H and some orthonormal basis for it, so we can identify vectors with column vectors in the usual way. Recall that for column vectors u, v ∈ H, we have

    ⟨u|v⟩ = u^† v .

Paul Dirac suggested a notation which reconciles the two sides of this equation: we let |v⟩ denote the column vector v, and we let ⟨u| denote the row vector u^† = |u⟩^†. Then ⟨u|v⟩ is literally the matrix product of the row vector ⟨u| with the column vector |v⟩.

We can also multiply in the other order, obtaining the outer product |u⟩⟨v|, an n × n matrix. For example, for standard basis vectors e_i and e_j,

    |e_i⟩⟨e_j| = e_i e_j^†

is the n × n matrix whose entries are all 0 except for a single 1, where the 1 is in the ith row and jth column. This matrix is sometimes denoted E_ij. Notice that if A is a linear map H → H whose corresponding matrix has entries a_ij, then by the equation above we must have

    A = Σ_{i,j} a_ij E_ij = Σ_{i,j} a_ij |e_i⟩⟨e_j| ,

where both indices in the summation run from 1 to n. In particular, the identity operator is given by

    I = Σ_i |e_i⟩⟨e_i| .
Change of (Orthonormal) Basis. Let H be as before, and let {e_1, . . . , e_n} and {f_1, . . . , f_n} be two orthonormal bases for H. There is a unique linear map U ∈ L(H) mapping the first basis to the second, i.e., Ue_i = f_i for all 1 ≤ i ≤ n. Now for each 1 ≤ i, j ≤ n we have

    ⟨e_i|U^† Ue_j⟩ = ⟨Ue_i|Ue_j⟩ = ⟨f_i|f_j⟩ = δ_ij = ⟨e_i|e_j⟩ = ⟨e_i|Ie_j⟩.

Since the linear map U^† U is completely determined by the quantities ⟨e_i|U^† Ue_j⟩, this shows that U^† U = I, i.e., U is unitary.

Now let A ∈ L(H) be any operator. The (i, j)th entry of the matrix representing A with respect to the f-basis is

    ⟨f_i|Af_j⟩ = ⟨Ue_i|AUe_j⟩ = ⟨e_i|U^† AUe_j⟩.
The right-hand side is the (i, j)th entry of the matrix representing the operator U^† AU with respect to the e-basis.

To summarize, if M_A and M'_A are the matrices representing the operator A with respect to the e- and f-bases respectively, then

    M'_A = M_U^† M_A M_U ,

where M_U is the matrix representing the operator U with respect to the e-basis.

Thus, changing orthonormal basis amounts to unitary conjugation of the corresponding matrices.
Back to Quantum Physics: The Double Slit Experiment. It's been known since early in the 20th century that light comes in discrete packets (particles) called photons. People have observed individual photons hitting a photoelectric detector (or a photographic plate) at specific times and pinpoint locations, causing local electric currents in the detector (or dots to appear on the plate).
On the other hand, light also exhibits wavelike properties. In the double slit exper
iment, light from a laser beam is shined on an opaque barrier with two small openings
close to each other (on the order of the wavelength of the light). A screen is placed on
the other side of the barier. What you see on the screen are alternating bands of light
and darka standard interference pattern caused by the light waves from the two slits
interfering constructively and destructively with each other. This is easily visible to the
naked eye. If you block one of the slits, then the interference pattern goes away and you
just see a smoothly contoured, glowing blob on the screen.
Here is a plausible (though ultimately wrong) explanation in terms of photons: the
photons somehow are changing phase in time, and the photons that go through the top
slit are interfering with the photons going through the bottom slit.
Let's see why this is wrong. Now alter the experiment as follows: make the light
source extremely dim, so that it emits on average only one photon per second, and
replace the screen with a photographic plate (or photodetector) that will register where
each photon hits. The photons appear to hit the plate at random places, but if you run the
experiment a long time (thousands or millions of photons), you see that, statistically, the
distribution of photon hits resembles the same wavy interference pattern as before. That
is, the probability of a photon hitting any given location is proportional to the intensity of
the light at that location in the original experiment.

We can't say that the photons are interfering with each other, since one photon goes
through long before the next one comes. The only explanation is that each photon is
somehow passing through both slits at the same time and interfering with itself on the
other side. This cannot be explained at all by classical physics, which asserts that the
photon, being a particle, must travel through either the upper slit or the lower slit, but
not both. Indeed, if you put detectors at both slits, the photon will only be detected at (at
most) one slit or the other, not both.

Another thing that classical physics cannot explain is the random behavior of the
photons at the plate. You can send two identical photons, of exactly the same frequency
and moving in exactly the same direction, and they will wind up at different locations at
the plate. So the behavior of the photons is not deterministic but inherently random.

Quantum mechanics is needed to explain both these phenomena, as follows: each
photon does indeed correspond to a wave that goes through both slits, but the amplitude
of this wave at any location is related to the probability of the photon being at that location.
These waves interfere with each other and cause the interference pattern in the statistical
distribution of photons at the plate.

So the two hallmarks of quantum mechanics are: (i) nondeterminism (inherent
randomness) and (ii) interference of probabilities. More later.
5 Week 3: Unitary conjugation

Invariance under Unitary Conjugation: Trace and Determinant. If A and U are n × n
matrices and U is unitary, then by Equation (4) of Exercise 2.6,

    tr(UAU†) = tr(U†UA) = tr(IA) = tr A.

In other words, the tr function is invariant under unitary conjugation, i.e., if matrices A
and B are unitarily conjugate, then their traces are equal. This means that the tr function is
really a function of the underlying operator and does not depend on which orthonormal
basis you use to represent the operator as a matrix. (In fact, it does not depend on any
basis, orthonormal or otherwise.)
It's worth looking at what the trace looks like in Dirac notation. If A is an operator and
{e_1, ..., e_n} is an orthonormal basis, then we know that [A]_ij = ⟨e_i|Ae_j⟩ = ⟨e_i|A|e_j⟩ for the
matrix of A with respect to this basis. So,

    tr A = Σ_{i=1}^n [A]_ii = Σ_i ⟨e_i|A|e_i⟩,    (10)

and this quantity does not depend on the particular orthonormal basis we choose.
Similarly, the determinant function det is also invariant under unitary conjugation.
This follows from the fact that det(AB) = (det A)(det B) and det(A⁻¹) = (det A)⁻¹ for any
nonsingular A. For A and U as above, we have

    det(UAU†) = det(UAU⁻¹) = (det U)(det A)(det U)⁻¹ = det A.

So like the trace, det is really a function of the operator and does not depend on the basis
used to represent the operator as a matrix.
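These invariances are easy to check numerically. The sketch below (using NumPy; the random seed and dimension are arbitrary choices for illustration) builds a random unitary from a QR decomposition and compares trace and determinant before and after conjugation:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))

# A random unitary U: the Q factor of the QR decomposition of a random complex matrix.
U, _ = np.linalg.qr(rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3)))

B = U @ A @ U.conj().T   # unitarily conjugate A

tr_equal = np.allclose(np.trace(B), np.trace(A))
det_equal = np.allclose(np.linalg.det(B), np.linalg.det(A))
```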
Here are some other invariants under unitary conjugation. In each case, U is an
arbitrary unitary operator.

The adjoint. For any A, clearly (UAU†)† = UA†U†. In particular, if A is Hermitean
(A† = A), then (UAU†)† = UA†U† = UAU†, so UAU† is also Hermitean.

Being unitary. If A is unitary, then (UAU†)(UAU†)† = UAU†UA†U† = UAA†U† =
UU† = I, so UAU† is also unitary.
Orthogonal Subspaces, Projection Operators. Again, let H be an n-dimensional Hilbert
space, and let V, W ⊆ H be subspaces of H. V and W are mutually orthogonal if ⟨v|w⟩ = 0
for every v ∈ V and w ∈ W.

Exercise 5.1 Show that if V and W are mutually orthogonal, then no nonzero vector can
be in V ∩ W.
There is a natural one-to-one correspondence between the subspaces of H and certain
linear operators on H known as projection operators.

Definition 5.2 An (orthogonal) projection operator or projector on H is a linear map P ∈ L(H)
such that

1. P† = P, and
2. P² = P.

Exercise 5.4 Show that if P and Q are projection operators and PQ = 0, then QP = 0 as
well, and P + Q is a projection operator. [Hint: To show that QP = 0, take the adjoint of
both sides of the equation PQ = 0.]
Given a projection operator P on H, let V be the image of P, that is, V = img P := {Pv :
v ∈ H}. Then it is easy to check that V is a subspace of H, and we say that P projects onto
V. Notice that if u ∈ V then there is a v such that Pv = u, and so

    Pu = PPv = Pv = u.

That is, P fixes every vector in V, and so clearly we also have V = {u ∈ H : Pu = u}.

Not only does P project onto V but it does so orthogonally. This means that P moves
any vector v perpendicularly onto V, or more precisely, ⟨u|Pv − v⟩ = 0 for any u ∈ V, where
Pv − v is the vector representing the net movement from v to Pv. To see that ⟨u|Pv − v⟩ = 0,
we write u = Pw for some w and just calculate:

    ⟨u|Pv − v⟩ = ⟨Pw|Pv − v⟩ = ⟨Pw|Pv⟩ − ⟨Pw|v⟩ = ⟨P†Pw|v⟩ − ⟨Pw|v⟩ = ⟨P²w|v⟩ − ⟨Pw|v⟩ = ⟨Pw|v⟩ − ⟨Pw|v⟩ = 0.
Conversely, given a subspace V ⊆ H, choose an orthonormal basis {b_1, ..., b_k} for V and
extend it to an orthonormal basis {b_1, ..., b_n} for all of H. Let

    P = Σ_{i=1}^k |b_i⟩⟨b_i|.

Then P† = P and P² = P, so P is a projector. Furthermore, P fixes each of the basis
vectors b_1, ..., b_k and so it fixes each vector in V. P annihilates all the other b_{k+1}, ..., b_n,
and so Pv ∈ V for all v ∈ H. Thus P projects orthogonally onto V.
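This construction can be carried out numerically. The sketch below (a NumPy illustration; the two spanning vectors are arbitrary choices, not from the notes) orthonormalizes a spanning set via QR, which performs a Gram-Schmidt-style orthonormalization, and checks the projector properties:

```python
import numpy as np

# A two-dimensional subspace V of C^3 spanned by two vectors.
v1 = np.array([1, 0, 1], dtype=complex)
v2 = np.array([0, 1, 1j], dtype=complex)

# Orthonormal basis b_1, b_2 of V: columns of the Q factor of a QR decomposition.
B, _ = np.linalg.qr(np.column_stack([v1, v2]))

# P = sum_i |b_i><b_i| projects orthogonally onto V.
P = B @ B.conj().T

hermitean = np.allclose(P, P.conj().T)     # P^dagger = P
idempotent = np.allclose(P @ P, P)         # P^2 = P
fixes_V = np.allclose(P @ v1, v1) and np.allclose(P @ v2, v2)
dim_V = np.trace(P).real                   # should equal dim V = 2
```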
Exercise 5.5 Let V be a subspace of H and let P be its corresponding projection operator.
Show that dim V = tr P. [Hint: Consider the matrix construction just above.]

Exercise 5.6 Suppose u = |u⟩ is a unit vector in H (i.e., ⟨u|u⟩ = 1). Show that |u⟩⟨u| is a
projection operator. What subspace does it project onto? What is tr |u⟩⟨u|?
Exercise 5.7 Find the 3 × 3 matrix for the projector P that projects orthogonally onto the
two-dimensional subspace of C³ spanned by v₁ = (1, 1, 0) and v₂ = (2, 0, i). P is the
unique operator satisfying: (i) P² = P = P†, (ii) Pv₁ = v₁, (iii) Pv₂ = v₂, and (iv) tr P = 2.
[Hint: If y₁ and y₂ are orthogonal unit vectors, then |y₁⟩⟨y₁| + |y₂⟩⟨y₂| projects onto the
subspace spanned by y₁ and y₂. Use Gram-Schmidt to find y₁ and y₂ given v₁ and v₂.
When you find P, check items (i) through (iv) above.]
Exercise 5.8 Let V and W be subspaces of H with corresponding projection operators P
and Q, respectively. Prove that V and W are mutually orthogonal if and only if PQ = 0.
[Hint: For the forward direction, consider ‖PQv‖² for any vector v ∈ H. For the reverse
direction, consider ⟨Pv|Qw⟩ for any vectors v, w ∈ H, and move the P to the right-hand
side of the bracket.]

If V is a subspace of H, we define the orthogonal complement of V (denoted V⊥) to be

    V⊥ = {u ∈ H : (∀v ∈ V)[⟨u|v⟩ = 0]}.

V⊥ is clearly a subspace of H.

Exercise 5.9 Show that if V is a subspace of H with corresponding projection operator P,
then I − P is the projection operator corresponding to V⊥.
A complete set of orthogonal projectors is a collection {P_i : i ∈ I} of nonzero projectors on
H such that

1. P_i P_j = 0 for all i, j ∈ I with i ≠ j, and
2. Σ_{i∈I} P_i = I (the identity map).

Here, I is any finite set of distinct labels. We may have I = {1, ..., k} for some k, but there
are other possibilities, including real numbers, or labels that are not numbers at all. We
will see later that condition (1) is actually redundant; it follows from condition (2).

Taking the trace of both sides of item (2), we get

    Σ_{i∈I} tr P_i = tr Σ_{i∈I} P_i = tr I = n.

Since each P_i ≠ 0, its trace is a positive integer (Exercise 5.5), so there can be at most n
many projection operators in any complete set, where n = dim H.
For each i ∈ I, let V_i be the subspace that P_i projects onto. By Exercise 5.8, the V_i are all
pairwise mutually orthogonal. Furthermore, the V_i together span all of H: for any v ∈ H,

    v = Iv = Σ_{i∈I} P_i v,

but P_i v ∈ V_i for each i, so v is the sum of vectors from the V_i.
Exercise 5.10 Let {P_i : i ∈ I} be a complete set of orthogonal projectors over H, and let
v ∈ H be any vector. Show by direct calculation that

    ‖v‖² = Σ_{a∈I} ‖P_a v‖².

Exercise 5.11 Suppose P and Q are projection operators on H projecting onto subspaces
V ⊆ H and W ⊆ H, respectively. Show that if P and Q commute, that is, if PQ = QP, then
PQ is a projection operator projecting onto V ∩ W.
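A complete set of orthogonal projectors is easy to exhibit concretely. The sketch below (a NumPy illustration with an arbitrarily chosen block structure and test vector; it is a numerical sanity check, not a proof) splits C⁴ into three mutually orthogonal subspaces and checks both conditions, plus the Pythagorean identity of Exercise 5.10:

```python
import numpy as np

n = 4
eye = np.eye(n)
# Three mutually orthogonal subspaces, spanned by standard basis vectors.
blocks = [[0], [1, 2], [3]]
P = [sum(np.outer(eye[i], eye[i]) for i in idx) for idx in blocks]

# Condition (2): the projectors sum to the identity.
sums_to_I = np.allclose(sum(P), np.eye(n))

# Condition (1): P_i P_j = 0 for i != j.
pairwise_zero = all(np.allclose(P[i] @ P[j], 0)
                    for i in range(3) for j in range(3) if i != j)

# ||v||^2 = sum_a ||P_a v||^2 for an arbitrary vector v.
v = np.array([1, 2j, -1, 0.5])
norms_ok = np.isclose(np.linalg.norm(v) ** 2,
                      sum(np.linalg.norm(p @ v) ** 2 for p in P))
```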
Fundamentals of Quantum Mechanics. We now know enough math to present the
fundamental principles of quantum mechanics. For now, I will abide by the Copenhagen
interpretation of quantum mechanics, first put forward by Niels Bohr. This is the
best-known interpretation and is easy to work with, albeit somewhat unsatisfying
philosophically. Another well-known interpretation is the Everett interpretation, a.k.a. the
many-worlds interpretation or the unitary interpretation. More on that later. There are
still other interpretations, but there are no conflicts between any of these interpretations;
they all use the same math and lead to the same predictions.
Physical Systems and States. A physical system is some part of nature, for example,
the position of an electron orbiting an atom, the electric field surrounding the earth, the
speed of a train, etc. The last two are macroscopic, dealing with big objects with lots
of mass, momentum, and energy. Although in principle quantum mechanics covers all
these systems, it is most conveniently applied to microscopic systems like the first.

The most basic principle of quantum mechanics relevant to us is that to every physical
system S there corresponds a Hilbert space H = H_S, called the state space of S.¹ At any
given point in time, the system is in some state, which we define as a unit vector |ψ⟩ ∈ H.
The state of the system may change with time, depending on the forces (internal and
external) applied to the system. We may write the state of the system at time t as |ψ(t)⟩.
Time Evolution of an Isolated System. Let's assume that our system S is isolated, i.e., it
is not interacting with any other systems. The state of S evolves in time, but this evolution
is linear in the following sense: for any two times t₁, t₂ ∈ R, there is a linear operator
U = U(t₂, t₁) ∈ L(H) such that if the system is in the state |ψ(t₁)⟩ at time t₁ then at time
t₂ the system will be in the state

    |ψ(t₂)⟩ = U|ψ(t₁)⟩.

The operator U only depends on the system (its internal forces) and on the times t₁ and
t₂, but not on the particular state the system happens to be in. That is, the single operator
U describes how the system evolves from any state at t₁ to the resulting state at t₂. Note
that t₁ and t₂ are arbitrary; t₂ does not necessarily have to come after t₁.

Since U maps states to states, it must be norm-preserving. From this one can show
that it must preserve the scalar product. That is, U must be unitary. Here are some other
basic, intuitive facts:

1. U(t, t) = I for any time t. (If no time elapses, then the state has no time to change.)

¹ In general, H may be infinite-dimensional. The systems we care about, however, are all bounded, which
means they correspond to finite-dimensional spaces.
2. U(t₁, t₂) = U(t₂, t₁)⁻¹ = U(t₂, t₁)†.

Projective Measurements. The other basic physical process a quantum system can
undergo is measurement.² A projective measurement is given by a complete set of
orthogonal projectors {P_k}_{k∈I}; the labels k ∈ I are the possible outcomes of the
measurement. If the measurement is performed while the system is in state |ψ⟩, then the
probability of getting outcome k is

    Pr[k] = ⟨ψ|P_k|ψ⟩ = ⟨P_kψ|P_kψ⟩ = ‖P_k|ψ⟩‖².    (11)

Furthermore, immediately after the measurement, the state of the system will be

    |ψ_k⟩ = P_k|ψ⟩ / ‖P_k|ψ⟩‖ = P_k|ψ⟩ / √⟨ψ|P_k|ψ⟩,    (12)

where k is the outcome of the measurement.
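Equations (11) and (12) can be tried out on a small concrete case. The sketch below (a NumPy illustration; the particular state is an arbitrary choice) uses the two-outcome measurement {|0⟩⟨0|, |1⟩⟨1|} on C²:

```python
import numpy as np

# A complete set of projectors on C^2: P_0 = |0><0|, P_1 = |1><1|.
P0 = np.array([[1, 0], [0, 0]], dtype=complex)
P1 = np.array([[0, 0], [0, 1]], dtype=complex)

psi = np.array([3, 4j]) / 5          # a unit vector in C^2

# Equation (11): Pr[k] = ||P_k |psi>||^2.
pr = [np.linalg.norm(P @ psi) ** 2 for P in (P0, P1)]

# Equation (12): post-measurement state, assuming outcome k = 0 occurred.
psi0 = P0 @ psi / np.linalg.norm(P0 @ psi)
```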
6 Week 3: Projective measurements (cont.)

A number of points need to be emphasized and clarified.

² There are other, more general types of measurement, but these can actually be implemented using
projective measurements on larger systems, so these other measurements really aren't more general than
projective measurements.
• The outcome of the projective measurement is intrinsically random. You can prepare
the system S in the exact same state |ψ⟩ twice, perform the exact same projective
measurement both times, and get different outcomes. The only things that we can
predict from our experiments are the statistics of the outcomes. If we know the state
|ψ⟩ of the system when the measurement is performed, then in principle we can
compute Pr[k] for each outcome k, and then if we run the same experiment many
times (say a million times), then we can expect to see outcome k occur about a Pr[k]
fraction of the time. This is indeed what happens.

• There can be at most a finite, discrete number of possible outcomes associated with
any projective measurement: no more than the dimension of H (at least for bounded
systems).
• The probabilities defined by (11) are certainly nonnegative, but we need to check
that they sum to 1. We have

    Σ_{k∈I} Pr[k] = Σ_k ⟨ψ|P_k|ψ⟩ = ⟨ψ| (Σ_k P_k) |ψ⟩ = ⟨ψ|I|ψ⟩ = ⟨ψ|ψ⟩ = ‖|ψ⟩‖² = 1,

since |ψ⟩ is a unit vector.
• Performing a projective measurement in general disturbs the system being measured.
The measurement actually consists of an interaction between the system and
the measuring apparatus, and one cannot be affected without affecting the other.
This disturbance of the system being measured is not just a practical matter of us
not building our instruments delicately enough; it is fundamental and unavoidable
physical reality, sometimes referred to as collapse of the wave function.
• Suppose that we perform the measurement above on S in state |ψ⟩ and get outcome
k, so that the state becomes |ψ_k⟩ = P_k|ψ⟩/‖P_k|ψ⟩‖ as in (12), then we immediately
repeat the same measurement. The probability of getting any outcome j ∈ I from
the second measurement is

    Pr[j] = ‖P_j|ψ_k⟩‖² = ‖P_jP_k|ψ⟩‖² / ‖P_k|ψ⟩‖² = δ_kj.

That is, we see the outcome k again with certainty, and the state immediately after
the second measurement is

    P_k|ψ_k⟩ / ‖P_k|ψ_k⟩‖ = |ψ_k⟩ / ‖|ψ_k⟩‖ = |ψ_k⟩,

unchanged from after the first measurement. So the first measurement changes the
state to be consistent with whatever the outcome is, so that repetitions of the same
measurement will always yield the same outcome (provided, of course, that the state
does not evolve between measurements).
• If |ψ⟩ is a state and θ ∈ R, then e^{iθ}|ψ⟩ is also a state. The unit-norm scalar e^{iθ} is
known as a phase factor. Note that

1. if U is unitary, then obviously U e^{iθ}|ψ⟩ = e^{iθ} U|ψ⟩, and

2. for the projective measurement {P_k}_{k∈I} above, the probability of seeing k when
the system is in state e^{iθ}|ψ⟩ is

    ‖P_k e^{iθ}|ψ⟩‖² = |e^{iθ}|² ‖P_k|ψ⟩‖² = ‖P_k|ψ⟩‖²,

that is, the same as for the state |ψ⟩, and finally,

3. if outcome k occurs, then the state after the measurement is

    P_k e^{iθ}|ψ⟩ / ‖P_k e^{iθ}|ψ⟩‖ = e^{iθ} P_k|ψ⟩ / ‖P_k|ψ⟩‖ = e^{iθ}|ψ_k⟩.

This means that the phase factor just goes along for the ride and does not affect the
statistics of any projective measurement (or any other type of measurement, either).
The states |ψ⟩ and e^{iθ}|ψ⟩ are physically indistinguishable, and so we can choose overall
phase factors arbitrarily in defining a state, or we are free to ignore them as we wish.
More on this later.
We'll now see how this all plays out for a two-dimensional system.
A Perfect Example: Electron Spin. Rotating objects possess angular momentum. The
angular momentum of an object is a vector in R³ that depends on the distribution of
mass in the object and how the object is rotating. For any given object, the length of its
angular momentum vector is proportional to the speed of the rotation (in revolutions per
minute, say), and the vector's direction is pointing (roughly) along the axis of rotation
in the direction given by the right-hand rule: a disk rotating counterclockwise in the
x, y-plane has its angular momentum vector pointing in the positive z-direction. A Frisbee
thrown by a right-handed person (using the usual backhand flip) rotates clockwise when
viewed from above, so its angular momentum vector points down toward the ground.

If a rotating object carries a net electric charge, then it has a magnetic moment vector
that is proportional to the angular momentum times the net charge. Shooting an object
with a magnetic moment through a nonuniform magnetic field imparts a force to the object,
causing it to deflect and change direction. The deflection force is along the axis given
by the gradient of the field and is proportional to the component of the magnetic
moment along that gradient axis. You can measure the component of the magnetic moment
along the gradient axis this way by seeing the amount of deflection.

Electrons deflect when shot through a nonuniform magnetic field as well, so they
possess magnetic moment. This can only mean that they have angular momentum as
well, even though, being elementary particles, they have no constituent parts that can
rotate around one another. This is just one of the many bizarre aspects of the microscopic
world.

Figure 1: Stern-Gerlach experiment: The electron beam comes in from the left, passes
through a nonuniform field between the two probes, and splits into two beams. The field
gradient is oriented along the axis of the probes, which is here given by the +z-direction.
In the Stern-Gerlach experiment, randomly oriented electrons are shot through a
nonuniform magnetic field whose gradient is oriented in the +z-direction (vertically). According
to classical physics, we would expect the electrons to deflect by random amounts, causing a
smooth up-down spread in the beam. Instead, what we actually observe is the beam split
into two sharp beams of roughly equal intensity: one going up, the other going down (see
Figure 1). So each electron only goes up the same amount or down the same amount. This
experiment amounts to a projective measurement of the spin of an electron, at least in the
z-direction. There are two possible outcomes: spin-up and spin-down. It is natural then
to model the physical system of electron spin as a two-dimensional Hilbert space, with
an orthonormal basis {|↑⟩, |↓⟩}, where |↑⟩ = (1, 0)^T is the spin-up state and |↓⟩ = (0, 1)^T is
the spin-down state. We may also write |↑⟩ and |↓⟩ as |+z⟩ and |−z⟩, respectively, to make
clear along what axis the spin is aligned. The projectors in the projective measurement
are then

    P_↑ = P_{+z} = |↑⟩⟨↑| = [ 1  0 ]
                            [ 0  0 ],

which projects onto the space spanned by |↑⟩ and corresponds to the spin-up outcome,
and

    P_↓ = P_{−z} = |↓⟩⟨↓| = [ 0  0 ]
                            [ 0  1 ],

which projects onto the space spanned by |↓⟩ and corresponds to the spin-down outcome.
As we'll see in a little bit, a two-dimensional Hilbert space actually suffices for modeling
electron spin.
7 Week 4: Qubits

Qubits. In digital information processing, the basic unit of information is the bit, short
for binary digit. Each bit has two distinct states that we care about: 0 and 1. In quantum
information processing, we use bits as well, but we regard them as quantum systems that
have two states |0⟩ and |1⟩ that form an orthonormal basis for a two-dimensional Hilbert
space.³ Such systems are called quantum bits, or qubits for short. Any two-dimensional
Hilbert space will do to model a qubit. This is why it is useful to consider the electron
spin example. In fact, electron spin is one proposed way to implement a qubit: |↑⟩ is
identified with |0⟩ and |↓⟩ with |1⟩.⁴ We'll return to the electron spin example, but what
we say applies generally to any system with a two-dimensional Hilbert space (sometimes
called a two-level system), which can then in principle be used to implement a qubit.
To emphasize this point, we'll use |0⟩ and |1⟩ to stand for |↑⟩ and |↓⟩, respectively, and we'll
let the projectors P₀ and P₁ stand for P_↑ and P_↓, respectively.
Back to Electron Spin. Using the Stern-Gerlach apparatus oriented in a particular
direction, we can prepare electrons to have spins in that direction. We simply retain one
emerging beam and discard the other. Figure 2 shows electrons being prepared to spin in
one direction in the x, z-plane, then measured in the +z-direction.

The general state of an electron spin is

    |ψ⟩ = α|0⟩ + β|1⟩.

That is, it is some linear combination of spin-up and spin-down. We would now like to
determine which linear combinations correspond to which spin directions (in 3-space).
Since |ψ⟩ is a unit vector, we have

    1 = ⟨ψ|ψ⟩
      = (α*⟨0| + β*⟨1|)(α|0⟩ + β|1⟩)
      = α*α⟨0|0⟩ + α*β⟨0|1⟩ + β*α⟨1|0⟩ + β*β⟨1|1⟩
      = |α|² + |β|².

Indeed, the probability of seeing |0⟩ (spin-up) is

    ⟨ψ|P₀|ψ⟩ = ⟨ψ|0⟩⟨0|ψ⟩ = α*α = |α|².

And similarly, the probability of seeing |1⟩ (spin-down) is |β|². Since phase factors don't
matter, we can assume from now on that α ∈ R and α ≥ 0, because we can multiply |ψ⟩
by the right phase factor, namely e^{−i arg(α)}.

³ Throughout this section, the symbols in the bras and kets (e.g., ↑, +z, 0, 1, etc.) are only labels and are
not meant to represent vectors on their own, so the kets and bras are always needed.
⁴ Another system with a two-dimensional Hilbert space is photon polarization, where we can take as our
basis the state |↔⟩ (horizontal polarization) and the state |↕⟩ (vertical polarization). All other polarization
states (e.g., slanted or circular) are linear combinations of these two.
Figure 2: Electrons are prepared by the tilted apparatus on the left to spin at an angle θ
from the +z-axis. These are then fed into a vertical apparatus.
Now consider the state |ψ⟩ of an electron prepared with spin at angle θ from the +z-axis,
as in Figure 2. If we measure its spin along the z-axis and score the outcome as +1 for
spin-up and −1 for spin-down, then the average score is

    Pr[↑] − Pr[↓] = ⟨ψ|P_{+z}|ψ⟩ − (1 − ⟨ψ|P_{+z}|ψ⟩) = 2⟨ψ|P_{+z}|ψ⟩ − 1 = 2α² − 1.    (13)

This must be cos θ, so solving for α in terms of θ and remembering that α ≥ 0, we get

    α = √((1 + cos θ)/2) = cos(θ/2).

Since 0 ≤ |β|² = 1 − α², we have |β| = sin(θ/2). Thus,

    β = e^{iφ} sin(θ/2),

for some real φ with 0 ≤ φ < 2π. In experiments, these relative intensities are actually
observed.
It is worth mentioning at this point that for any α ≥ 0 and β ∈ C such that α² + |β|² = 1,
there are 0 ≤ θ ≤ π and 0 ≤ φ < 2π such that

    α = cos(θ/2),
    β = e^{iφ} sin(θ/2),

giving the general spin state as

    |ψ⟩ = cos(θ/2)|0⟩ + e^{iφ} sin(θ/2)|1⟩.

Furthermore, θ and φ are uniquely determined by |ψ⟩ except when α = 0 or β = 0, in
which case θ = π or θ = 0, respectively, but φ is completely undetermined.
Now look at the case where θ = π/2, that is, the spin is pointing in the +x direction (to
the right). We get

    |+x⟩ = |ψ_{π/2,φ}⟩ = cos(π/4)|0⟩ + e^{iφ} sin(π/4)|1⟩ = (|0⟩ + e^{iφ}|1⟩)/√2.
36
We are free to adjust the phase factor of 1) to absorb the e
i
above. That is, without
changing the physics, we redene
5
1) := e
i
1).
By the phaseadjustment we now get
+x) =
/2
_
= ) =
0) +1)
2
.
The corresponding onedimensional projector is
P
+x
= P
= +x)+x =
_
0) +1)
2
__
0 +1
2
_
=
1
2
(0)0 +0)1 +1)0 +1)1),
which has matrix form (1/2)
_
1 1
1 1
_
.
Now we consider the state |+y⟩ representing spin in the +y direction. A +y spin has
no +z-component, so if |+y⟩ is measured along the z-axis, we get Pr[↑] = Pr[↓] = 1/2, as
with |+x⟩. Thus,

    |+y⟩ = (|0⟩ + e^{iφ}|1⟩)/√2 = (1/√2)(1, e^{iφ})^T,

for some 0 ≤ φ < 2π. If we now measure a +y spin in the +x direction, we should again
get equal probabilities of spin-left and spin-right, since the spin is perpendicular to the
x-axis. Thus we should have

    1/2 = Pr[→] = ⟨+y|P_→|+y⟩ = (1/4)(1, e^{−iφ}) [ 1  1 ; 1  1 ] (1, e^{iφ})^T = (1 + cos φ)/2.

So cos φ = 0, and it follows that φ ∈ {π/2, 3π/2} and so e^{iφ} = ±i. It does not matter which
value we choose; the math and physics is equivalent either way. So we'll set φ = π/2,
whence we get

    |+y⟩ = (|0⟩ + i|1⟩)/√2.
The corresponding projector is

    P_{+y} = |+y⟩⟨+y| = ((|0⟩ + i|1⟩)/√2)((⟨0| − i⟨1|)/√2) = (1/2)(|0⟩⟨0| − i|0⟩⟨1| + i|1⟩⟨0| + |1⟩⟨1|),

which has matrix form

    (1/2) [ 1  −i ]
          [ i   1 ].

⁵ Mathematicians may not like doing this, but physicists and computer scientists aren't bothered by it.
Let's review:

    |+x⟩ = (|0⟩ + |1⟩)/√2 = (1/√2)(1, 1)^T,    (14)

    |+y⟩ = (|0⟩ + i|1⟩)/√2 = (1/√2)(1, i)^T,    (15)

    |+z⟩ = |0⟩ = (1, 0)^T.    (16)
The corresponding projectors are

    P_{+x} = |+x⟩⟨+x| = (1/2) [ 1  1 ]  = (1/2)(I + X),
                              [ 1  1 ]

    P_{+y} = |+y⟩⟨+y| = (1/2) [ 1  −i ] = (1/2)(I + Y),
                              [ i   1 ]

    P_{+z} = |+z⟩⟨+z| = [ 1  0 ] = (1/2)(I + Z),
                        [ 0  0 ]

where

    X = σ_x = σ₁ = 2P_{+x} − I = [ 0  1 ]
                                 [ 1  0 ],    (17)

    Y = σ_y = σ₂ = 2P_{+y} − I = [ 0  −i ]
                                 [ i   0 ],    (18)

    Z = σ_z = σ₃ = 2P_{+z} − I = [ 1   0 ]
                                 [ 0  −1 ].    (19)

X, Y, and Z are known as the Pauli spin matrices. More on them later.
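Equations (17) through (19) can be checked directly from the states (14) through (16). A sketch with NumPy (purely a numerical verification of the algebra above):

```python
import numpy as np

s2 = np.sqrt(2)
plus_x = np.array([1, 1]) / s2
plus_y = np.array([1, 1j]) / s2
plus_z = np.array([1, 0], dtype=complex)

def proj(v):
    """Outer product |v><v|."""
    return np.outer(v, v.conj())

I2 = np.eye(2)

# Equations (17)-(19): each Pauli matrix is 2 P_{+axis} - I.
X = 2 * proj(plus_x) - I2
Y = 2 * proj(plus_y) - I2
Z = 2 * proj(plus_z) - I2
```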
Now consider a general spin state, written in terms of θ and φ:

    |ψ⟩ = |ψ_{θ,φ}⟩ = cos(θ/2)|0⟩ + e^{iφ} sin(θ/2)|1⟩ = (cos(θ/2), e^{iφ} sin(θ/2))^T.

(Recall that 0 ≤ θ ≤ π and 0 ≤ φ < 2π are arbitrary.) The direction of this spin is given by
a vector s = (x_s, y_s, z_s) in 3-space with Cartesian coordinates x_s, y_s, z_s ∈ R. How do we
find x_s, y_s, z_s? We know that these values are the average deflections observed when the
spin is measured in the +x, +y, or +z axes, respectively. So generalizing Equation (13),
we must have

    x_s = 2⟨ψ|P_{+x}|ψ⟩ − 1 = ⟨ψ|X|ψ⟩ = cos(θ/2) sin(θ/2)(e^{iφ} + e^{−iφ}) = sin θ cos φ,    (20)

    y_s = 2⟨ψ|P_{+y}|ψ⟩ − 1 = ⟨ψ|Y|ψ⟩ = cos(θ/2) sin(θ/2)(−i)(e^{iφ} − e^{−iφ}) = sin θ sin φ,    (21)

    z_s = 2⟨ψ|P_{+z}|ψ⟩ − 1 = ⟨ψ|Z|ψ⟩ = cos²(θ/2) − sin²(θ/2) = cos θ.    (22)

Thus s is exactly the point on the unit sphere whose spherical coordinates are (θ, φ).⁶
Exercise 7.1 Verify Equations (20)-(22) using matrix multiplication and trig.

Exercise 7.2 What is the spin direction corresponding to the state (√3|0⟩ − |1⟩)/2? Express
your answer as simply as possible.

Exercise 7.3 What spin state corresponds to the direction s = (2/3, 2/3, 1/3)? Express
your answer as simply as possible.

Exercise 7.4 (Very useful!) Show that if |ψ⟩ is a general spin state corresponding to the
direction s = (x_s, y_s, z_s) as described above, then

    |ψ⟩⟨ψ| = (1/2)(I + x_s X + y_s Y + z_s Z).    (23)

The right-hand side is sometimes written as (1/2)(I + s · σ), abusing the dot product
notation.
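For one particular (θ, φ), Equations (20) through (23) can be spot-checked numerically; this is a sanity check, not a substitute for the proofs the exercises ask for. A sketch with NumPy (the angles are arbitrary choices):

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2)

theta, phi = 1.1, 2.3    # an arbitrary spin direction in spherical coordinates
psi = np.array([np.cos(theta / 2), np.exp(1j * phi) * np.sin(theta / 2)])

# Equations (20)-(22): the components of s are the Pauli expectation values.
s = np.real([psi.conj() @ M @ psi for M in (X, Y, Z)])

spherical = (np.sin(theta) * np.cos(phi),
             np.sin(theta) * np.sin(phi),
             np.cos(theta))

# Equation (23): |psi><psi| = (1/2)(I + s . sigma).
rho = np.outer(psi, psi.conj())
rho_from_s = 0.5 * (I2 + s[0] * X + s[1] * Y + s[2] * Z)
```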
8 Week 4: Density operators

Density Operators. One problem with using a vector |ψ⟩ to represent a physical state
is that the vector carries more information than is physically relevant, namely, an overall
phase factor. The physically relevant portion of |ψ⟩ is really just the one-dimensional
subspace that it spans (which does not depend on any phase factors), or equivalently,
the projector |ψ⟩⟨ψ| that orthogonally projects onto that subspace. For this and other
reasons, one may define the state of a system to be a one-dimensional projection operator
ρ = |ψ⟩⟨ψ| instead of a vector |ψ⟩. This alternate view of states is known as the density
operator formalism, and ρ is known as a density operator or density matrix. Besides the
advantage of discarding the physically irrelevant phase information, this formalism has
other advantages that we will see later when we discuss quantum information theory. For
many of the tasks at hand, however, either formalism will suffice, and we will use both as
is convenient.

We need to describe the two basic physical processes that we have discussed, time
evolution and projective measurement, in terms of the density operator formalism.

⁶ Each vector s on the unit sphere can be described using spherical coordinates, i.e., two angles θ and φ,
where 0 ≤ θ ≤ π is the angle between s and the +z axis (the latitude of s, measured down from the North
Pole), and 0 ≤ φ < 2π is the angle one would have to swivel the x, z-plane counterclockwise around the
+z axis until it hits s (the longitude of s, measured east of Greenwich, i.e., east of the x, z-plane). If s has
spherical coordinates (θ, φ), then its Cartesian coordinates are (sin θ cos φ, sin θ sin φ, cos θ).
Time evolution of an isolated system. In the original formalism, time evolution is
described by a unitary operator U such that any state |ψ⟩ evolves to a state U|ψ⟩ in the
given interval of time. In the new density operator formalism, the state ρ = |ψ⟩⟨ψ|
would evolve under U to the new state

    ρ′ = UρU†.    (24)

To see why this is so, we merely observe that the new state should be |ψ′⟩⟨ψ′|, where
|ψ′⟩ = U|ψ⟩. We get

    ρ′ = |ψ′⟩⟨ψ′| = (U|ψ⟩)(U|ψ⟩)† = U|ψ⟩⟨ψ|U† = UρU†.
Projective measurement. Suppose we are given a complete set {P_k : k ∈ I} of projectors
corresponding to a projective measurement. In the original formalism, if the system
is in state |ψ⟩ before the measurement, then the probability of outcome k is ⟨ψ|P_k|ψ⟩.
Since this probability is physically relevant (we can collect statistics over many
identical experiments), we had better get the same probability in the new formalism:
when the state of the system is ρ = |ψ⟩⟨ψ| before the measurement, the probability
of outcome k is given by

    Pr[k] = tr(P_kρ) = ⟨P_k|ρ⟩,    (25)

where the right-hand side refers to the Hilbert-Schmidt inner product on L(H) (see
Equation (8)). To see that this is the same as in the original formulation, we can use
the form of the trace given by Equation (10), where we choose an orthonormal basis
{e_1, ..., e_n} such that e_1 = |ψ⟩. We then get

    tr(P_kρ) = Σ_{i=1}^n ⟨e_i|P_kρe_i⟩ = Σ_i ⟨e_i|P_k|ψ⟩⟨ψ|e_i⟩
             = Σ_i ⟨e_i|P_ke_1⟩⟨e_1|e_i⟩ = ⟨e_1|P_ke_1⟩ = ⟨ψ|P_k|ψ⟩.

Alternatively, we can use the commuting property of the trace (Equation (4)) to get

    tr(P_kρ) = tr(P_k|ψ⟩⟨ψ|) = tr(⟨ψ|P_k|ψ⟩) = ⟨ψ|P_k|ψ⟩.
Assuming the outcome is k, the state after the measurement should be ρ_k = |ψ_k⟩⟨ψ_k|,
where |ψ_k⟩ = P_k|ψ⟩/‖P_k|ψ⟩‖. This simplifies:

    |ψ_k⟩⟨ψ_k| = P_k|ψ⟩⟨ψ|P_k / ‖P_k|ψ⟩‖² = P_k|ψ⟩⟨ψ|P_k / Pr[k] = P_kρP_k / tr(P_kρ) = P_kρP_k / ⟨ψ|P_k|ψ⟩.

Thus, the post-measurement state after outcome k is given by

    ρ_k = P_kρP_k / Pr[k] = P_kρP_k / tr(P_kρ) = P_kρP_k / ⟨ψ|P_k|ψ⟩.    (26)

Note that tr(P_kρ) = tr(P_k²ρ) = tr(P_kρP_k), so the denominator in (26), i.e., the probability
of getting the outcome k, is the trace of the numerator. (Obviously, ρ_k is
undefined if Pr[k] = 0, but if that's the case, we'd never see outcome k.)
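Equations (25) and (26) can be spot-checked against the vector formalism on a small example. A sketch with NumPy (the specific state is an arbitrary choice):

```python
import numpy as np

P0 = np.array([[1, 0], [0, 0]], dtype=complex)   # projector |0><0|
psi = np.array([0.6, 0.8j])
rho = np.outer(psi, psi.conj())

# Equation (25): Pr[0] = tr(P_0 rho), which should equal <psi|P_0|psi>.
pr0 = np.trace(P0 @ rho).real
pr0_vec = (psi.conj() @ P0 @ psi).real

# Equation (26): post-measurement density operator for outcome 0.
rho0 = P0 @ rho @ P0 / pr0
```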
Exercise 8.1 Show that if |ψ₁⟩ and |ψ₂⟩ are unit vectors, and ρ₁ = |ψ₁⟩⟨ψ₁| and ρ₂ =
|ψ₂⟩⟨ψ₂|, then

    ⟨ρ₁|ρ₂⟩ = tr(ρ₁ρ₂) = |⟨ψ₁|ψ₂⟩|².
Properties of the Pauli Operators. The operators X, Y, Z defined in (17)-(19) play a
prominent role in quantum mechanics and quantum informatics. Here we'll present their most
important properties in one place for ease of reference. All of these facts are easy to verify,
and we leave that for the exercises.

1. X² = Y² = Z² = I.
2. (a) XY = −YX = iZ.
   (b) YZ = −ZY = iX.
   (c) ZX = −XZ = iY.
3. X, Y, and Z are all both Hermitean and unitary.
4. tr X = tr Y = tr Z = 0.
5. det X = det Y = det Z = −1.

Exercise 8.2 Verify all the above equations.

Note that there is a cyclic symmetry among the Pauli matrices. If we simultaneously
substitute X → Y, Y → Z, and Z → X everywhere in the equations above, we get the same
equations. We won't pursue it here, but you can use the Pauli operators to represent the
quaternions H.
The four 2 × 2 matrices I, X, Y, Z (also denoted σ₀, σ₁, σ₂, σ₃, respectively) form a basis
for the space L(C²) of all operators over C² (i.e., 2 × 2 matrices over C). That is, for any
2 × 2 matrix A, there are unique coefficients a₀, a₁, a₂, a₃ ∈ C such that

    A = a₀I + a₁X + a₂Y + a₃Z = Σ_{i=0}^3 a_iσ_i.    (27)

The coefficients can often be found by inspection, but there is a brute-force method to find
them:
Exercise 8.3

1. Verify that the set {σ₀/√2, σ₁/√2, σ₂/√2, σ₃/√2} is an orthonormal basis for L(C²)
with respect to the Hilbert-Schmidt inner product.

2. Show that the coefficients in Equation (27) satisfy a_i = (1/2)⟨σ_i|A⟩ = (1/2) tr(σ_iA)
for all 0 ≤ i ≤ 3. [Hint: Use the previous item.]

This implies that

    ⟨σ_i|σ_j⟩ = 2δ_ij    (28)

for all i, j ∈ {0, 1, 2, 3}.
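The brute-force coefficient formula is easy to try out numerically; the sketch below (using NumPy, with an arbitrarily chosen matrix A) computes the coefficients a_i = (1/2) tr(σ_iA) and rebuilds A from its Pauli expansion:

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.array([[1, 0], [0, -1]], dtype=complex)
sigma = [I2, X, Y, Z]

A = np.array([[1, 2 - 1j], [3j, -4]])

# Coefficients of the Pauli expansion: a_i = (1/2) tr(sigma_i A).
a = [0.5 * np.trace(s @ A) for s in sigma]

# Reconstruct A from Equation (27): A = sum_i a_i sigma_i.
A_rebuilt = sum(ai * s for ai, s in zip(a, sigma))
```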
Exercise 8.4 Show that if A = xX + yY + zZ for real numbers x, y, z such that x² + y² + z² = 1,
then A² = I. Thus A is both Hermitean and unitary.
Single-Qubit Unitary Operators. In this topic, we show that applying any unitary
operator to a one-qubit system amounts to a rigid rotation in R³, and conversely, any rigid
rotation in R³ corresponds to a unitary operator. We've seen that a general one-qubit state
can be written, up to an overall phase factor, as

    |ψ⟩ = |ψ_{θ,φ}⟩ = cos(θ/2)|0⟩ + e^{iφ} sin(θ/2)|1⟩,

for some 0 ≤ θ ≤ π and some 0 ≤ φ < 2π, and that this state corresponds uniquely
(and vice versa) to the point s on the unit sphere in R³ with spherical coordinates (θ, φ)
(and thus with Cartesian coordinates (x_s, y_s, z_s) = (sin θ cos φ, sin θ sin φ, cos θ)). (Think
of s as the spin direction of an electron, for example.) The unit sphere in question here
is known as the Bloch sphere. We'll now show that the action of a unitary operator U on
one-qubit states amounts to a rigid rotation S_U of the Bloch sphere.
It's slightly more convenient to work in the density operator formalism, using
Equation (23). Given any one-qubit unitary operator U, we define the map S_U as follows: For
any point s = (x_s, y_s, z_s) on the Bloch sphere (s is a vector in R³ of length 1), let

    ρ_s = (1/2)(I + x_sX + y_sY + z_sZ)

be the corresponding one-qubit state, according to Equation (23). Then let

    ρ_t = Uρ_sU†.

Since ρ_s is a one-qubit (pure) state, so is ρ_t, and thus

    ρ_t = (1/2)(I + x_tX + y_tY + z_tZ),

for some unique t = (x_t, y_t, z_t) on the Bloch sphere. We now define S_U(s) to be this t.⁷

⁷ There's no reason that we have to restrict the vector s to be on the Bloch sphere. We can define S_U in
precisely the same way for any s ∈ R³, giving a map from all of R³ to R³. From the sequel, it will be evident
that this is a linear map.
It is immediate from the definition that for unitaries U and V we have S_{UV} = S_U S_V.

To show that S_U rotates the sphere rigidly, we first show that S_U preserves dot products
of vectors on the Bloch sphere, that is, S_U(r)·S_U(s) = r·s for any r and s on the Bloch
sphere. This implies that S_U is a rigid map of the Bloch sphere onto itself, but it does not
imply that S_U is a rotation, because S_U might be orientation-reversing. We'll see that S_U
preserves orientation (a.k.a. chirality, a.k.a. handedness), so that it must be a rotation.⁸
Let r = (r_1, r_2, r_3) and s = (s_1, s_2, s_3) be any two points on the Bloch sphere, with
corresponding states ρ_r = (1/2)Σ_{i=0}^{3} r_i σ_i and ρ_s = (1/2)Σ_{j=0}^{3} s_j σ_j as above, where we
define r_0 = s_0 = 1. Recall that the dot product of r and s is r·s = r_1s_1 + r_2s_2 + r_3s_3. Let's
compute ⟨ρ_r|ρ_s⟩ using Equation (28):

⟨ρ_r|ρ_s⟩ = ⟨(1/2)Σ_{i=0}^{3} r_i σ_i | (1/2)Σ_{j=0}^{3} s_j σ_j⟩ = (1/4)Σ_{i,j} r_i s_j ⟨σ_i|σ_j⟩ = (1/4)Σ_{i,j} r_i s_j (2δ_{ij})
         = (1/2)Σ_{i=0}^{3} r_i s_i = (1/2)(1 + Σ_{i=1}^{3} r_i s_i) = (1 + r·s)/2,

so

r·s = 2⟨ρ_r|ρ_s⟩ − 1.    (29)
Since r and s were arbitrary, we should also have

S_U(r)·S_U(s) = 2⟨ρ_{S_U(r)}|ρ_{S_U(s)}⟩ − 1,

but now,

⟨ρ_{S_U(r)}|ρ_{S_U(s)}⟩ = tr((Uρ_rU†)†(Uρ_sU†)) = tr(Uρ_rρ_sU†) = tr(ρ_rρ_s) = ⟨ρ_r|ρ_s⟩,

and so S_U(r)·S_U(s) = r·s as we wanted.
Now is perhaps a good time to clear up some confusion that may arise about points
on the Bloch sphere. Letting |ψ_r⟩ and |ψ_s⟩ be such that ρ_r = |ψ_r⟩⟨ψ_r| and ρ_s = |ψ_s⟩⟨ψ_s|,
then combining Equation (29) with Exercise 8.1 above, we get

r·s = 2|⟨ψ_r|ψ_s⟩|² − 1.

Thus ⟨ψ_r|ψ_s⟩ = 0 iff r·s = −1. In other words, qubit states that are orthogonal in the Hilbert
space correspond to antipodal, or opposite, points on the Bloch sphere. We kind of knew
this already, since the two possible outcomes of the Stern-Gerlach spin measurement
(in any direction) are opposite spins (e.g., |↑⟩ and |↓⟩), and must (as with any projective
measurement) correspond to orthogonal states.

⁸A linear map A from Rⁿ to Rⁿ preserves orientation iff det A > 0, and it reverses orientation iff det A < 0.
Before showing that S_U must preserve orientation, we'll show that for any rigid rotation
S of the Bloch sphere, there is a unitary U such that S = S_U. Geometrically, any rotation S
of the unit sphere can be decomposed into a sequence of three simple rotations:

1. a counterclockwise rotation S_z(γ) about the +z axis through an angle γ, where
0 ≤ γ < 2π,
2. a counterclockwise rotation S_y(β) about the +y axis through an angle β, where
0 ≤ β ≤ π, and
3. another counterclockwise rotation S_z(α), about the +z axis, this time through an
angle α, where 0 ≤ α < 2π.

The last two rotations have the effect of moving the North Pole (i.e., the point (0, 0, 1)) to
an arbitrary point on the sphere (with spherical coordinates (β, α)) in a standard way. The
only remaining freedom left in choosing S is an initial rotation that fixes the North Pole,
i.e., the first rotation above. The three angles α, β, γ are uniquely determined by S (except
when β = 0 or β = π), and are called the Euler angles of S.
So S = S_z(α) S_y(β) S_z(γ), and so to implement S, we only need to find unitaries for
rotations around the +z and +y axes. For any angle θ, define

R_z(θ) = [ e^{−iθ/2}     0     ]
         [     0     e^{iθ/2}  ] .    (30)

R_z(θ) is obviously unitary, and

R_z(θ)(|0⟩ + |1⟩) = e^{−iθ/2}|0⟩ + e^{iθ/2}|1⟩ ≡ |0⟩ + e^{iθ}|1⟩,

and so if U = R_z(θ), then S_U = S_z(θ). (Here and elsewhere, we use the expression A ≡ B
to mean that A and B may differ only by an overall phase factor, i.e., there exists an angle
θ ∈ R such that A = e^{iθ}B.) For any angle θ, define
R_y(θ) = [ cos(θ/2)  −sin(θ/2) ]
         [ sin(θ/2)   cos(θ/2) ] .    (31)

R_y(θ) is unitary, and it is straightforward to show that if U = R_y(θ), then S_U = S_y(θ).
Thus any rotation S can be realized as S_U for some unitary U. Later, we will see a direct
way of translating between a 1-qubit unitary U and its corresponding rotation S_U. For
completeness, we define a unitary corresponding to a counterclockwise rotation through θ about
the x-axis:

R_x(θ) = R_y(π/2) R_z(θ) R_y(−π/2) = [  cos(θ/2)   −i sin(θ/2) ]
                                     [ −i sin(θ/2)   cos(θ/2)  ] .    (32)
Now to show that S_U must preserve orientation, we show that the orientation-reversing
map M that maps each point (x, y, z) on the Bloch sphere to its antipodal point (−x, −y, −z)
is not of the form S_U for any unitary U. This suffices, because if S is any orientation-
reversing rigid map of the Bloch sphere, then S⁻¹ is also rigid and orientation-reversing,
which means that the map T = MS⁻¹ is orientation-preserving and hence a rotation.
Therefore, T = S_V for some unitary V, as we just showed. But then M = TS, and so
if we assume that S = S_W for some unitary W, we then have M = S_V S_W = S_{VW}, a
contradiction.
Suppose M = S_U for some unitary U. Then, since U must reverse the directions of all
spins, we must have, for example, U|0⟩ = U|+z⟩ ≡ |−z⟩ = |1⟩ and U|1⟩ = U|−z⟩ ≡ |+z⟩ =
|0⟩. Expressing U as a matrix in the {|+z⟩, |−z⟩} basis, we must then have

U = [    0     e^{iβ} ]
    [ e^{iα}     0    ]

for some α, β ∈ R. Now consider the state

|ψ⟩ = (1/√2)(|0⟩ + e^{i(α−β)/2}|1⟩) = (1/√2) [      1       ]
                                             [ e^{i(α−β)/2} ] ,

which corresponds to some point p on the equator of the Bloch sphere. We have

U|ψ⟩ = (1/√2) [    0     e^{iβ} ] [      1       ]   (1/√2) [ e^{i(α+β)/2} ]   (e^{i(α+β)/2}/√2) [      1       ]
              [ e^{iα}     0    ] [ e^{i(α−β)/2} ] =        [    e^{iα}    ] =                   [ e^{i(α−β)/2} ] ≡ |ψ⟩.

So U does not change the state |ψ⟩ by more than a phase factor, and thus S_U leaves the point
p fixed, which means that S_U ≠ M, a contradiction.⁹
Example. X, Y, and Z are unitary, so what are S_X, S_Y, and S_Z? It's easy to check that
X = iR_x(π), Y = iR_y(π), and Z = iR_z(π), and so (since phase factors don't matter) S_X, S_Y,
and S_Z are rotations about the x-, y-, and z-axes, respectively, through the angle π.
Exercise 8.5 Prove the claim that if U = R_y(θ), then S_U = S_y(θ). [Hint: First check that
R_y(θ)|+y⟩ ≡ |+y⟩, and thus S_U fixes the point (0, 1, 0) where the Bloch sphere intersects
the +y-axis. Then check that R_y(θ)|+z⟩ = R_y(θ)|0⟩ = cos(θ/2)|0⟩ + sin(θ/2)|1⟩ = |ψ_{θ,0}⟩, so
S_U moves the point (0, 0, 1) to the point (sin θ, 0, cos θ). Finally, check that R_y(θ)|+x⟩ =
R_y(θ)|ψ_{π/2,0}⟩ = cos(θ/2 + π/4)|0⟩ + sin(θ/2 + π/4)|1⟩ = |ψ_{θ+π/2,0}⟩, so S_U moves the point
(1, 0, 0) to (cos θ, 0, −sin θ).]
A direct translation. Here we give (without proof) a direct way to pass between the 2×2
unitary matrix U and the corresponding rotation S_U (a 3×3 real matrix), and vice versa,
using the Pauli matrices. First we give some basic facts about rotations of Rⁿ in general
and of R³ in particular (some of which we've seen before). You can skip this list if you
want.

⁹This result has physical significance. It says that there is no single physical process that can reverse the
spin of any isolated electron.
1. An n×n matrix S with real entries gives a rigid (length- and angle-preserving)
transformation of Rⁿ iff

S^T S = I,    (33)

or equivalently, SS^T = I. (Here S^T denotes the transpose of S, and I denotes the
n×n identity matrix.) In this case, we also say that S is an orthogonal matrix. Notice
that, since S is real, S^T = S†.

2. A rotation S of R³ rotates counterclockwise about some axis through some angle θ,
where 0 ≤ θ ≤ π. The angle θ satisfies

tr S = 1 + 2 cos θ,    (34)

and the axis points along the vector

n = ([S]_{32} − [S]_{23}, [S]_{13} − [S]_{31}, [S]_{21} − [S]_{12}),    (35)

which has norm ‖n‖ = 2 sin θ.

From unitaries to rotations. Given a one-qubit unitary U, the corresponding rotation
can be computed entrywise:

S_U = (1/2) [ ⟨X|UXU†⟩  ⟨X|UYU†⟩  ⟨X|UZU†⟩ ]         [ ⟨XU|UX⟩  ⟨XU|UY⟩  ⟨XU|UZ⟩ ]
            [ ⟨Y|UXU†⟩  ⟨Y|UYU†⟩  ⟨Y|UZU†⟩ ] = (1/2) [ ⟨YU|UX⟩  ⟨YU|UY⟩  ⟨YU|UZ⟩ ]
            [ ⟨Z|UXU†⟩  ⟨Z|UYU†⟩  ⟨Z|UZU†⟩ ]         [ ⟨ZU|UX⟩  ⟨ZU|UY⟩  ⟨ZU|UZ⟩ ] ,

recalling that ⟨A|B⟩ = tr(A†B). That is, for i, j ∈ {1, 2, 3},

[S_U]_{ij} = (1/2) tr(σ_i U σ_j U†) = (1/2)⟨σ_i U|U σ_j⟩.
Exercise 8.6 Find the 3×3 matrix S_U where

U = (1/5) [ 3   4i ]
          [ 4i   3 ] .

Also find n (exact expression) and θ (decimal approximation to three significant digits).
From rotations to unitaries. Now given some rotation S of R³ (S is a 3×3 real matrix),
we find a U such that S = S_U. There are two cases:

If tr S ≠ −1, then S = S_U for exactly those U satisfying

U ≡ (1/(2√(1 + tr S))) [ (1 + tr S)I + i(([S]_{23} − [S]_{32})X + ([S]_{31} − [S]_{13})Y + ([S]_{12} − [S]_{21})Z) ]
  = cos(θ/2) I − i(n·σ⃗)/(4 cos(θ/2))
  = cos(θ/2) I − i sin(θ/2)(m·σ⃗) = e^{−iθ(m·σ⃗)/2},

where θ and n are given by Equations (34) and (35), respectively, and m := n/‖n‖
is the normalized version of n. Here we use the fact that √(1 + tr S) = 2 cos(θ/2). I'll
explain the last equation in the chain more fully next time. These expressions give
the unique U with positive trace such that S_U = S (and in addition, det U = 1).
If tr S = −1, then any one of the following three alternatives is a characterization of
all U such that S = S_U, provided it is well-defined:

U ≡ (([S]_{11} + 1)X + [S]_{21}Y + [S]_{31}Z) / √(2([S]_{11} + 1)),

U ≡ ([S]_{12}X + ([S]_{22} + 1)Y + [S]_{32}Z) / √(2([S]_{22} + 1)),

U ≡ ([S]_{13}X + [S]_{23}Y + ([S]_{33} + 1)Z) / √(2([S]_{33} + 1)).

The i-th expression above (for i ∈ {1, 2, 3}) is well-defined iff [S]_{ii} ≠ −1. This is true
for at least one of the three for any rotation S of R³ with trace −1.
Exercise 8.7 Find a U such that

S_U = (1/169) [ 151   24   72 ]
              [  24  137  −96 ]
              [ −72   96  119 ] .

Also give n and cos θ.
9 Week 5: The exponential map

The Exponential Map (Again). Equation (2) defines e^z via a power series for all scalars
z. We can use the same power series to extend the definition to operators.

Definition 9.1 Let A be an operator in L(H) or an n×n matrix. Define

e^A = I + A + A²/2! + A³/3! + ⋯ + A^k/k! + ⋯ = Σ_{k=0}^{∞} A^k/k!.    (36)

(A⁰ = I by convention.)
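Definition 9.1 can be computed directly by truncating the series; a short NumPy sketch (the function name is ours) compares the partial sums against the closed form e^{itX} = cos(t)I + i sin(t)X, which follows from X² = I:

```python
import numpy as np

def expm_series(A, terms=40):
    """Partial sums of Equation (36): sum over k of A^k / k!."""
    total = np.zeros_like(A, dtype=complex)
    term = np.eye(A.shape[0], dtype=complex)   # A^0 = I by convention
    for k in range(terms):
        total = total + term
        term = term @ A / (k + 1)              # A^{k+1}/(k+1)!
    return total

X = np.array([[0, 1], [1, 0]], dtype=complex)
t = 0.8
# Since X^2 = I, the series splits into cosine and sine parts.
assert np.allclose(expm_series(1j * t * X),
                   np.cos(t) * np.eye(2) + 1j * np.sin(t) * X)
```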
If A is an operator or matrix, then so is e^A. The sum in (36) converges absolutely¹⁰
for all A. The exponential map has many useful properties. Here's one of the most useful,
which generalizes the familiar rule that e^{z_1+z_2} = e^{z_1}e^{z_2} for scalars z_1, z_2.
Proposition 9.2 If operators A and B commute (i.e., AB = BA), then

e^{A+B} = e^A e^B.

Proof. This closely mirrors the standard proof for scalars. We manipulate the power
series directly. Since A commutes with B, we can expand and rearrange factors in the
expression (A+B)^k to arrive at an operator version of the Binomial Theorem:

(A + B)^k = Σ_{j=0}^{k} (k choose j) A^j B^{k−j},

for all integers k ≥ 0. So,

e^{A+B} = Σ_{k=0}^{∞} (A+B)^k/k! = Σ_k (1/k!) Σ_{j=0}^{k} (k choose j) A^j B^{k−j}
        = Σ_k (1/k!) Σ_{j=0}^{k} (k!/(j!(k−j)!)) A^j B^{k−j}
        = Σ_k Σ_{j=0}^{k} A^j B^{k−j} / (j!(k−j)!)
        = Σ_{j=0}^{∞} Σ_{ℓ=0}^{∞} A^j B^ℓ / (j! ℓ!)    (setting ℓ := k − j)
        = ( Σ_{j=0}^{∞} A^j/j! )( Σ_{ℓ=0}^{∞} B^ℓ/ℓ! ) = e^A e^B.    □

¹⁰We won't delve deeply into what it means for an infinite sequence of operators A_1, A_2, … to converge
(absolutely or otherwise). One easy way to express the notion of convergence (among several equivalent
ways) is to say that there exists an operator A such that for all vectors v, the sequence of vectors A_1v, A_2v, …
converges to Av. The operator A, if it exists, must be unique, and we write A = lim_{n→∞} A_n. Convergence
of an infinite series of operators is equivalent to the convergence of the sequence of partial sums, as usual.
Absolute convergence, which we don't bother to define here, implies that you can regroup and rearrange
terms in the sum freely without worry.
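The commutativity hypothesis in Proposition 9.2 really matters: for non-commuting operators the identity fails. A quick NumPy check of both directions (the function name is ours):

```python
import numpy as np

def expm_series(A, terms=40):
    """Partial sums of the operator exponential, Equation (36)."""
    total = np.zeros_like(A, dtype=complex)
    term = np.eye(A.shape[0], dtype=complex)
    for k in range(terms):
        total = total + term
        term = term @ A / (k + 1)
    return total

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

A, B = 0.5j * Z, 0.25j * Z          # multiples of Z commute
assert np.allclose(expm_series(A + B), expm_series(A) @ expm_series(B))

C = 0.5j * X                        # X and Z do not commute
assert not np.allclose(expm_series(A + C), expm_series(A) @ expm_series(C))
```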
We'll leave the other properties of e^A as exercises.
Exercise 9.3 Verify the following for any operators or square matrices A and B and any
a ∈ R:

1. e^0 = I, where 0 is the zero operator. [Hint: Inspect the power series.]
2. e^{−A} = (e^A)^{−1}. [Hint: Use the previous item and Proposition 9.2.]
3. e^{A†} = (e^A)†. [Hint: You may use the fact that the adjoint of an infinite (convergent)
sum is the sum of the adjoints. We know this already for finite sums.]
4. If A is Hermitean, then e^{iA} is unitary. [Hint: Use the previous two items and the
fact that (iA)† = −iA.]
5. A commutes with e^A. [Hint: Inspect the power series.]
6. If A and B commute, then so do e^A and e^B. [Hint: Use Proposition 9.2.]
7. If U is unitary, then e^{UAU†} = U e^A U†.
Thus a Schur basis always exists for any linear operator. The proof of Theorem 1.1
uses the fact that every operator has an eigenvalue, which we'll discuss in the next topic.

One key property of an upper triangular matrix is that its determinant is just the
product of its diagonal entries: if T is upper triangular, then

det T = Π_{i=1}^{n} [T]_{ii}.    (37)
Exercise 9.6 Show that if A and B are both upper triangular matrices, then so is AB, and
for each 1 ≤ i ≤ n, we have [AB]_{ii} = [A]_{ii}[B]_{ii}, that is, the diagonal entries just multiply
individually.

Exercise 9.7 Show that if A is a nonsingular, upper triangular matrix, then A⁻¹ is upper
triangular. What are the diagonal entries of A⁻¹ in terms of those of A?
Exercise 9.8 Show that if A is upper triangular, then so is e^A, and we have [e^A]_{ii} = e^{[A]_{ii}}
for all 1 ≤ i ≤ n. [Hint: Use the results of Exercise 9.6 and Equation (36) defining e^A.]

Exercise 9.9 (One of my favorites.) Show that if A is any operator, then det e^A = e^{tr A}.
[Hint: Pick a Schur basis for A, then use the previous exercise and Equation (37).]

Lower triangular matrices are defined analogously and have similar properties. A
matrix is diagonal if it is both upper and lower triangular, i.e., all its non-diagonal entries
are zero.
Eigenvectors, Eigenvalues, and the Characteristic Polynomial. Let A ∈ L(H) be an
operator. A nonzero vector v ∈ H such that Av = λv, where λ ∈ C, is called an eigenvector
of A, and λ is its corresponding eigenvalue. Likewise, a scalar λ is an eigenvalue of A if
it is the eigenvalue of some eigenvector of A. If λ is an eigenvalue of A, then we have
0 = Av − λv = (A − λI)v, for some nonzero vector v. That is, the operator A − λI maps
the nonzero vector v to 0, which means that A − λI is singular, which in turn means that
det(A − λI) = 0. Conversely, if λ is a scalar such that det(A − λI) = 0, then A − λI is singular,
and so it maps some nonzero vector v to 0, and so we have (A − λI)v = 0, or equivalently,
Av = λv. Thus v is an eigenvector of A with eigenvalue λ. Thus the eigenvalues of A are
exactly those scalars λ such that det(A − λI) = 0.

Let's write A − λI in matrix form with respect to some (any) orthonormal basis. Setting
a_{ij} = [A]_{ij}, we get

A − λI = [ a_{11}−λ   a_{12}    ⋯   a_{1n}   ]
         [ a_{21}     a_{22}−λ  ⋯   a_{2n}   ]
         [   ⋮          ⋮       ⋱     ⋮      ]
         [ a_{n1}     a_{n2}    ⋯   a_{nn}−λ ] ,
where n = dim H. Fixing all the a_{ij} to be constant and considering λ to be a variable, we
see that det(A − λI) is a polynomial in λ with degree n. This is the characteristic polynomial
of A, and we denote it char_A(λ). From our considerations above, the eigenvalues of A
are precisely the roots of the polynomial char_A. Since C is algebraically closed (see the
second lecture), char_A has n roots, and so A has exactly n eigenvalues, not necessarily all
distinct. The (multi)set of eigenvalues of A is known as the spectrum of A.
Exercise 9.10 We know that char_A is basis-independent because it is defined in terms of
basis-independent things. Show directly that

char_A(λ) = char_{UAU†}(λ),

for any operator A, unitary operator U, and scalar λ.
The fact that A has at least one eigenvalue is the key ingredient in the proof that A has a
Schur basis (Background Material, Theorem 1.1). So now that we know that a Schur basis
for A really exists, let's assume that we chose a Schur basis for A above, and so a_{ij} = 0 for
all i > j, and hence A − λI is also upper triangular. So taking the determinant, which is
just the product of the diagonal entries, we get

char_A(λ) = det(A − λI) = Π_{i=1}^{n} (a_{ii} − λ).    (38)

From (38) it is clear that the eigenvalues of A (the roots of char_A) are exactly a_{11}, …, a_{nn}.
This is true because we chose a basis making the matrix representing A upper triangular,
but from this we get two useful, basis-independent facts: If λ_1, …, λ_n are the eigenvalues
of A counted with multiplicities, then

tr A = Σ_{i=1}^{n} λ_i,  and  det A = Π_{i=1}^{n} λ_i.
Some of the coefficients of the polynomial char_A are familiar. If we expand (38) and
group together powers of (−λ), we get

char_A(λ) = (−λ)^n + (a_{11} + a_{22} + ⋯ + a_{nn})(−λ)^{n−1} + ⋯ + a_{11}a_{22}⋯a_{nn}    (39)
          = (−λ)^n + (tr A)(−λ)^{n−1} + ⋯ + det A.    (40)

The constant term is det A, which can also be seen by noting that this term is

char_A(0) = det(A − 0·I) = det A.
Exercise 9.11 Find the eigenvalues of the 2×2 matrix

A = [ 3  1 ]
    [ 4  2 ] .

Find eigenvectors corresponding to each eigenvalue.

What are the eigenvalues of A†? We have

char_{A†}(λ) = det(A† − λI) = det((A − λ*I)†) = (det(A − λ*I))* = (char_A(λ*))*,

and so char_{A†}(λ) = 0 if and only if char_A(λ*) = 0. Thus the eigenvalues of A† are the
complex conjugates of the eigenvalues of A.
Eigenvectors and Eigenvalues of Normal Operators. Suppose A is a Hermitean opera-
tor. Then for any eigenvector v of A with eigenvalue λ, we have

λ⟨v|v⟩ = ⟨v|λv⟩ = ⟨v|Av⟩ = ⟨A†v|v⟩ = ⟨Av|v⟩ = ⟨λv|v⟩ = λ*⟨v|v⟩.

Since ⟨v|v⟩ = ‖v‖² > 0, we get λ = λ*, that is, every eigenvalue of a Hermitean operator is
real. Now suppose v_1 and v_2 are eigenvectors of A with eigenvalues λ_1 and λ_2, respectively.
Then

λ_2⟨v_1|v_2⟩ = ⟨v_1|Av_2⟩ = ⟨Av_1|v_2⟩ = λ_1*⟨v_1|v_2⟩ = λ_1⟨v_1|v_2⟩.

Thus if λ_1 ≠ λ_2, then this can only be because ⟨v_1|v_2⟩ = 0. In other words, eigenvectors of
a Hermitean operator with distinct eigenvalues must be orthogonal.
An eigenbasis for an operator A is an orthonormal basis of eigenvectors of A. In such a
basis, A is given by a diagonal matrix. If A is Hermitean, then A has an eigenbasis. This
can be proved directly by a routine induction on the dimension of A, but it is also a special
case of a more general result.

An operator (or matrix) A is normal if it commutes with its adjoint, i.e., AA† = A†A.
Note that normality of matrices is unitarily invariant (if M is normal then any unitary
conjugate of M is normal), and hence independent of the choice of orthonormal basis.
This can be verified directly, or just by observing that normality is defined in terms of
properties of the operator A itself, and is therefore basis-independent. Notice that all
Hermitean operators and all unitary operators are normal. We know that A has a Schur
basis, in which A is represented by an upper triangular matrix. In Theorem 1.2 of the
Background Material, we show that if a matrix is both normal and upper triangular, then
it is diagonal. Hence A is represented in this basis as a diagonal matrix. That is, any
Schur basis of a normal operator A is an eigenbasis of A, and the diagonal elements of the
(diagonal) matrix representing A in this basis are the eigenvalues of A. This is known as
the Spectral Theorem for Normal Operators.
If U ∈ L(H) is unitary, then U is normal, and choosing an eigenbasis for U, we represent
it as a diagonal matrix D. For each 1 ≤ i ≤ n, let d_i = [D]_{ii}. Since DD† = I (D is unitary),
we have

1 = [I]_{ii} = [DD†]_{ii} = d_i d_i* = |d_i|²,

and thus the eigenvalues of U all lie on the unit circle in C.
The next lemma generalizes what we showed for Hermitean operators.

Lemma 9.12 If A ∈ L(H) is normal, and v_1 and v_2 are eigenvectors of A with distinct eigenvalues,
then ⟨v_1|v_2⟩ = 0.

Proof. Let 𝓑 be an eigenbasis for A. We'll label the elements of 𝓑 according to their
eigenvalues. Let λ_1, …, λ_k be all the distinct eigenvalues of the elements of 𝓑 (without
repetition). The vectors in 𝓑 can be listed as follows:

b_{11}, …, b_{1m_1}, b_{21}, …, b_{2m_2}, …, b_{k1}, …, b_{km_k},

where b_{i1}, …, b_{im_i} are the basis vectors with eigenvalue λ_i, for 1 ≤ i ≤ k. We write any
vector v ∈ H as

v = Σ_{i=1}^{k} Σ_{j=1}^{m_i} a_{ij} b_{ij},

for some scalar coefficients a_{ij}. Now suppose that v is an eigenvector of A with eigenvalue
λ. If 1 ≤ r ≤ k and 1 ≤ s ≤ m_r, then we have

λ a_{rs} = ⟨b_{rs}|λv⟩ = ⟨b_{rs}|Av⟩ = Σ_{i=1}^{k} Σ_{j=1}^{m_i} a_{ij} ⟨b_{rs}|Ab_{ij}⟩ = Σ_{i,j} a_{ij} λ_i ⟨b_{rs}|b_{ij}⟩ = a_{rs} λ_r.
So either λ = λ_r or a_{rs} = 0. Since some coefficient of v is nonzero, we must have
λ ∈ {λ_1, …, λ_k} (so {λ_1, …, λ_k} is the entire spectrum of A, not counting duplicates), and
if i is such that λ = λ_i, then v = Σ_{j=1}^{m_i} a_{ij} b_{ij}, all the other coefficients being zero. Thus
any two eigenvectors of A with different eigenvalues are spanned by disjoint sets of basis
vectors in 𝓑, and therefore must be orthogonal.    □
If A is an operator and λ is an eigenvalue of A, then we define the eigenspace of A with
respect to λ as

E_λ(A) = {v ∈ H : Av = λv}.

This is a subspace of H with positive dimension.
Corollary 9.13 If A is normal, then its eigenspaces are mutually orthogonal and span H. The
dimension of each eigenspace is the same as the multiplicity of the corresponding eigenvalue.

We'll omit the proof of the following useful corollary:

Corollary 9.14 If A is normal, then there is a unique set {(P_1, λ_1), …, (P_k, λ_k)}, such that the
λ_j ∈ C are all distinct, the set {P_1, …, P_k} is a complete set of orthogonal projectors, and

A = λ_1P_1 + λ_2P_2 + ⋯ + λ_kP_k.    (41)

Furthermore, λ_1, …, λ_k are the distinct eigenvalues of A, and each P_j orthogonally projects onto
E_{λ_j}(A).

The right-hand side of Equation (41) is called the spectral decomposition of A.
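For a Hermitean matrix (a special case of normal), the spectral decomposition can be built directly from an eigenbasis. A NumPy sketch using `numpy.linalg.eigh`, with an example whose eigenvalues are distinct so each projector has rank 1:

```python
import numpy as np

# A Hermitean operator with distinct eigenvalues (1 and 4).
A = np.array([[2, 1 - 1j], [1 + 1j, 3]], dtype=complex)
evals, V = np.linalg.eigh(A)       # columns of V form an eigenbasis

# P_j = v_j v_j† projects onto the eigenspace E_{lambda_j}(A).
projectors = [np.outer(V[:, j], V[:, j].conj()) for j in range(2)]

# Equation (41): A = lambda_1 P_1 + lambda_2 P_2.
A_rebuilt = sum(lam * P for lam, P in zip(evals, projectors))
assert np.allclose(A_rebuilt, A)

# The projectors form a complete set of orthogonal projectors.
assert np.allclose(sum(projectors), np.eye(2))
assert np.allclose(projectors[0] @ projectors[1], 0)
```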
Exercise 9.15 Show that if A is a normal operator with spectral decomposition

A = λ_1P_1 + λ_2P_2 + ⋯ + λ_kP_k

as in Corollary 9.14, then for any integer m ≥ 0,

A^m = λ_1^m P_1 + λ_2^m P_2 + ⋯ + λ_k^m P_k.

(We define A⁰ = I by convention.) [Hint: Induction on m.]
Exercise 9.16 Show that if A is a normal operator with spectral decomposition

A = λ_1P_1 + λ_2P_2 + ⋯ + λ_kP_k

as in Corollary 9.14, then

e^A = e^{λ_1}P_1 + e^{λ_2}P_2 + ⋯ + e^{λ_k}P_k.

[Hint: Use the last exercise.]

We'll be dealing with normal operators almost exclusively from now on.

Exercise 9.17 We know that any Hermitean operator is normal with real eigenvalues.
Prove the converse: any normal operator with only real eigenvalues is Hermitean. [Hint:
Use an eigenbasis.]

Exercise 9.18 We know that any unitary operator is normal with eigenvalues on the unit
circle in C. Prove the converse: any normal operator with all eigenvalues on the unit circle
is unitary. [Hint: Use an eigenbasis.]
Positive Operators. An operator A ∈ L(H) is positive or positive semidefinite (written
A ≥ 0) if ⟨v|Av⟩ ≥ 0 for all v ∈ H. We say that A is strictly positive or positive definite
(written A > 0) if ⟨v|Av⟩ > 0 for all nonzero v ∈ H.
Exercise 9.19 Verify that if A ≥ 0 and B ≥ 0 are positive operators and a ≥ 0 is a
nonnegative real number, then A + B ≥ 0 and aA ≥ 0.
Exercise 9.20 (A bit challenging) Show for any operator A ∈ L(H) that A is Hermitean
if and only if ⟨v|Av⟩ ∈ R for all v ∈ H. (Thus every positive operator is Hermitean and
hence normal.) [Hint: The forward direction is easy. For the reverse direction, consider
the matrix elements of A with respect to some orthonormal basis b_1, …, b_n. Consider
three types of cases:

1. v = b_k for some k. What does this tell you about the diagonal elements [A]_{kk}?
2. v = b_k + b_j for some k ≠ j. This allows you to relate [A]_{kj} and [A]_{jk} in some way.
3. v = b_k + ib_j for the same k, j above. This allows you to relate [A]_{kj} and [A]_{jk} further.]
Exercise 9.21 Show that A ≥ 0 if and only if A is normal and all its eigenvalues are
nonnegative real numbers. [Hint: Use the previous exercise.]

Exercise 9.22 Show that if A ≥ 0 and tr A = 0, then A = 0. [Hint: Use the previous
exercise.]
Exercise 9.23 Show that the following are equivalent for any operator A:

1. A > 0.
2. A is normal, and all its eigenvalues are positive reals.
3. A ≥ 0 and A is nonsingular.
You may have noticed that you can determine a lot about a normal operator by its
spectrum. This is not too surprising, because

• most properties we've been looking at of the underlying matrices are basis-invariant
(i.e., invariant under unitary conjugation),
• every normal operator is representable by a diagonal matrix in some basis, and
• the spectrum of a diagonal matrix is just the set of diagonal elements of the matrix.
Each entry in the following table is easily checked by representing the operator as a
diagonal matrix with respect to an eigenbasis.

A normal operator is …          … iff all its eigenvalues are …
nonsingular (invertible)        nonzero
Hermitean                       real
unitary                         on the unit circle
positive                        nonnegative
strictly positive               positive
a projector                     either 0 or 1
the identity                    1
the zero operator               0
If A ≥ 0 is a positive operator, then there exists a unique positive operator B ≥ 0 such
that B² = A. We denote B by A^{1/2} or by √A. To see that B exists, we decompose
A = λ_1P_1 + ⋯ + λ_kP_k uniquely according to Corollary 9.14. Since A ≥ 0, we have
λ_j ≥ 0 for 1 ≤ j ≤ k. Now let

B = √λ_1 P_1 + ⋯ + √λ_k P_k.

B has eigenvalues √λ_1, …, √λ_k ≥ 0, so B ≥ 0. By Exercise 9.15, we get

B² = (√λ_1)² P_1 + ⋯ + (√λ_k)² P_k = A.
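The construction of √A translates directly into code: take square roots of the eigenvalues in an eigenbasis. A NumPy sketch for a Hermitean positive example:

```python
import numpy as np

# A positive operator: eigenvalues 1 and 3, both nonnegative.
A = np.array([[2, 1], [1, 2]], dtype=complex)
evals, V = np.linalg.eigh(A)

# B = sum over j of sqrt(lambda_j) P_j, assembled as V diag(sqrt(lambda)) V†.
B = V @ np.diag(np.sqrt(evals)) @ V.conj().T

assert np.allclose(B @ B, A)                 # B^2 = A
assert np.all(np.linalg.eigvalsh(B) >= 0)    # B >= 0
```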
To show uniqueness, suppose that B, C ≥ 0 are such that B² = C² = A. Using Corollary 9.14
again, decompose

B = β_1P_1 + ⋯ + β_kP_k,    C = γ_1Q_1 + ⋯ + γ_ℓQ_ℓ.

So,

B² = β_1²P_1 + ⋯ + β_k²P_k = A = γ_1²Q_1 + ⋯ + γ_ℓ²Q_ℓ = C².

Note that the β_j are distinct and nonnegative (same with the γ_j), and therefore so are
the β_j² (same with the γ_j²). Then since the decomposition of A from Corollary 9.14 is
unique, we must have {(P_1, β_1²), …, (P_k, β_k²)} = {(Q_1, γ_1²), …, (Q_ℓ, γ_ℓ²)}. Hence k = ℓ and,
after reindexing, P_j = Q_j and β_j = γ_j for all j, and so B = C.
Exercise 9.24 Show that if A ≥ 0 and U is unitary, then UAU† ≥ 0 and √(UAU†) = U√A U†.
[Hint: Show that U√A U† ≥ 0 and that (U√A U†)² = UAU†.]

If A is any operator, then A†A ≥ 0.

A matrix is block diagonal if it consists of square blocks A_1, …, A_k down the main
diagonal and zeros elsewhere. We write such a matrix as diag(A_1, …, A_k); it can be defined
recursively by

diag(A_1, …, A_k) = [ A_1            0          ]
                    [  0    diag(A_2, …, A_k)   ] .
Two easy but important properties of (block) diagonal matrices are

1. diag(A_1, …, A_k)† = diag(A_1†, …, A_k†), and
2. diag(A_1, …, A_k) diag(B_1, …, B_k) = diag(A_1B_1, …, A_kB_k), provided each A_j is the
same size as B_j.
Theorem 9.27 If A, B ∈ L(H) are normal and AB = BA, then A and B share a common
eigenbasis.

Proof. Suppose A and B are commuting operators (normal or otherwise), i.e., AB = BA.
If v is an eigenvector of A with eigenvalue λ, then

ABv = BAv = B(λv) = λBv,

that is, Bv is also an eigenvector of A with eigenvalue λ (assuming Bv ≠ 0). Therefore,
B maps any eigenspace E_λ(A) into itself. Let 𝓑 = (b_1, …, b_n) be an eigenbasis for A,
ordered so that b_1, …, b_{m_1} all have eigenvalue λ_1 (and span E_{λ_1}(A)), b_{m_1+1}, …, b_{m_1+m_2}
all have eigenvalue λ_2, and so on. Since B maps each eigenspace of A into itself, B must be
in block-diagonal form with respect to 𝓑:

B = diag(B_1, …, B_k),

where each B_j is an m_j × m_j matrix. Since B is normal, we get
diag(B_1B_1†, …, B_kB_k†) = diag(B_1, …, B_k) diag(B_1†, …, B_k†) = BB† = B†B = diag(B_1†B_1, …, B_k†B_k),

and thus for each 1 ≤ j ≤ k, we have B_jB_j† = B_j†B_j. So each B_j matrix is normal, which
means there is an m_j × m_j unitary matrix U_j such that U_jB_jU_j† is a diagonal matrix. Set

U = diag(U_1, …, U_k).
U is evidently unitary, and UBU† = diag(U_1B_1U_1†, …, U_kB_kU_k†) is a diagonal matrix. We
claim that UAU† is also a diagonal matrix (with respect to 𝓑). Note that by our choice of
b_1, …, b_n, A = diag(λ_1I_1, …, λ_kI_k), where each I_j is the identity matrix of size m_j. So the
matrix

UAU† = diag(λ_1U_1I_1U_1†, …, λ_kU_kI_kU_k†) = diag(λ_1I_1, …, λ_kI_k) = A,

which is already diagonal with respect to 𝓑. Thus U simultaneously diagonalizes A and
B, and so A and B share a common eigenbasis.    □
The proof technique of Theorem 9.27 can be combined with mathematical induction
to prove the more general result:

Theorem 9.28 If A_1, …, A_s are normal operators, any two of which commute, then there is a
common eigenbasis for A_1, …, A_s.
10 Week 5: Tensor products

Tensor Products and Combining Physical Systems. Suppose we have two physical
systems S and T with state spaces H_S and H_T, respectively, and we want to consider the
two systems together as a single system ST. What is the state space of ST? Quantum
mechanics says that the state space of ST is completely determined by H_S and H_T via a
construction called the tensor product. We'll first describe the tensor product of matrices,
then we'll discuss the tensor product in a basis-independent way.

Let A be an m×n matrix and let B be an r×s matrix (m, n, r, s are arbitrary positive
integers). The tensor product of A and B (also called the outer product or the direct product
or the Kronecker product) is the mr × ns matrix given in block form by
A ⊗ B = [ a_{11}B  a_{12}B  ⋯  a_{1n}B ]
        [ a_{21}B  a_{22}B  ⋯  a_{2n}B ]
        [    ⋮        ⋮     ⋱     ⋮    ]
        [ a_{m1}B  a_{m2}B  ⋯  a_{mn}B ] .
We collect the standard, easily verifiable properties of the ⊗ operation here in one
place. For any matrices A, B, C, D and scalars a, b ∈ C, the following equations hold
provided the operations involved are well-defined:

1. a ⊗ b = ab, where we identify scalars with 1×1 matrices as usual.
2. A ⊗ (B + aC) = A ⊗ B + a(A ⊗ C) and (A + aB) ⊗ C = A ⊗ C + a(B ⊗ C), that is, ⊗
is bilinear (linear in both arguments).
3. (A ⊗ B) ⊗ C = A ⊗ (B ⊗ C), that is, ⊗ is associative.
4. (A ⊗ B)(C ⊗ D) = AC ⊗ BD.
5. (A ⊗ B)† = A† ⊗ B†.
Exercise 10.1 Give the 4×4 matrices for I ⊗ X, X ⊗ I, X ⊗ Y, and Z ⊗ Z.

Exercise 10.2 Show that if A and B are Hermitean (respectively, unitary), then A ⊗ B is
Hermitean (respectively, unitary).
A special case is when u = (u_1, …, u_m) ∈ C^m and v = (v_1, …, v_n) ∈ C^n are column
vectors. Then

u ⊗ v = (u_1v, u_2v, …, u_mv) = (u_1v_1, …, u_1v_n, u_2v_1, …, u_mv_n) ∈ C^{mn},

regarded as a column vector.
If {e_1, …, e_m} and {f_1, …, f_n} are the standard bases for C^m and C^n respectively as in
Equation (5), then it is clear that {e_i ⊗ f_j : 1 ≤ i ≤ m & 1 ≤ j ≤ n} is the standard basis
for C^{mn}. If w = (w_1, …, w_m) ∈ C^m and x = (x_1, …, x_n) ∈ C^n, then for the standard inner
product we have

⟨u ⊗ v | w ⊗ x⟩ = (u ⊗ v)†(w ⊗ x) = u†w · v†x = ⟨u|w⟩⟨v|x⟩.

From this it is easy to see that if {b_1, …, b_m} and {c_1, …, c_n} are any orthonormal bases for
C^m and C^n, respectively, then {b_i ⊗ c_j : 1 ≤ i ≤ m & 1 ≤ j ≤ n} is an orthonormal basis
for C^{mn}. Indeed, we have

⟨b_i ⊗ c_j | b_k ⊗ c_ℓ⟩ = ⟨b_i|b_k⟩⟨c_j|c_ℓ⟩ = δ_{ik}δ_{jℓ},

which is 1 if i = k and j = ℓ, and is 0 otherwise.
This last bit suggests that we can define the tensor product in a basis-independent way,
applied to (abstract) vectors and operators. If H and J are Hilbert spaces, then we can
define a Hilbert space H ⊗ J (the tensor product of H and J) together with a bilinear map
⊗ : H × J → H ⊗ J, mapping any pair of vectors u ∈ H and v ∈ J to a vector u ⊗ v ∈ H ⊗ J,
such that if {b_1, …, b_m} and {c_1, …, c_n} are orthonormal bases for H and J, respectively,
then {b_i ⊗ c_j : 1 ≤ i ≤ m & 1 ≤ j ≤ n} is an orthonormal basis for H ⊗ J. We'll call such
a basis a product basis. We won't do it here, but it can be shown that these two rules
(bilinearity and the basis rule) define in essence the Hilbert space H ⊗ J uniquely. Notice
that the basis rule implies that the dimension of H ⊗ J is the product of the dimensions of
H and J.
It's worth pointing out that not all vectors in H ⊗ J are of the form u ⊗ v for u ∈ H and
v ∈ J. For example, the column vector (1, 0, 0, 1) = e_1 + e_4 cannot be written as the single
tensor product of two 2-dimensional column vectors. It can, however, be written as the
sum of two tensor products:

(1, 0, 0, 1) = (1, 0) ⊗ (1, 0) + (0, 1) ⊗ (0, 1).

In general a vector in H ⊗ J may not be a tensor product, but it is always a linear
combination of them (which is clear by our discussion about bases, above), i.e., the tensor
products span the space H ⊗ J.
We're not done overloading the ⊗ symbol. Given the definition of H ⊗ J just described,
we can extend ⊗ to apply to operators as well as vectors. For example, we can extend it
to a map ⊗ : L(H) × L(J) → L(H ⊗ J) by defining the action of an operator A ⊗ B on a
vector u ⊗ v ∈ H ⊗ J:

(A ⊗ B)(u ⊗ v) = Au ⊗ Bv.

One can show that this definition is consistent, and since H ⊗ J is spanned by vectors of
the form u ⊗ v, this defines the operator A ⊗ B uniquely by linearity. We could define ⊗
on dual vectors and other kinds of linear maps, e.g., mapping from one space to another
space.

Picking orthonormal bases for H and J allows us to represent objects such as vectors,
dual vectors, operators, or what have you, in both spaces as matrices. When we do
this, the abstract and matrix-based notions of ⊗ completely coincide, as is the case with
the other linear-algebraic constructs that we've seen, e.g., adjoint, trace, et cetera. This
idea (that the two notions should coincide) guides us in any further extensions of the ⊗
operation that we may wish to use.
Back to Combining Physical Systems. If S and T are physical systems with state spaces
H_S and H_T as before, then the state space of the combined system is H_{ST} = H_S ⊗ H_T. If
|ψ⟩_S is a state of S and |φ⟩_T is a state of T (we occasionally add subscripts to make clear
which state goes with which system), then |ψ⟩_S ⊗ |φ⟩_T is a state of ST, which we interpret
as saying, "The system S is in state |ψ⟩_S, and the system T is in state |φ⟩_T." (We'll often
drop the ⊗ and write |ψ⟩_S ⊗ |φ⟩_T simply as |ψ⟩_S|φ⟩_T, or even just |ψ, φ⟩ if the meaning
is clear. The same holds for bras as well as kets.) As we've seen, however, there can be
states of ST that can't be written as a single tensor product, for example, the two-qubit
state (|0⟩|0⟩ + |1⟩|1⟩)/√2. These states are called entangled states, whereas states of the form
|ψ⟩_S ⊗ |φ⟩_T are called separable states or tensor product states. More on this later.
How does this look in the density operator formalism? Easy answer: exactly the same, at least for separable states. Let ρ_S = |ψ⟩⟨ψ| be the density operator corresponding to |ψ⟩ of system S, and let ρ_T = |φ⟩⟨φ| be the density operator corresponding to |φ⟩ of system T (subscripts dropped). Then the density operator for the combined system should be

    ρ_ST = (|ψ⟩ ⊗ |φ⟩)(|ψ⟩ ⊗ |φ⟩)† = (|ψ⟩ ⊗ |φ⟩)(⟨ψ| ⊗ ⟨φ|) = |ψ⟩⟨ψ| ⊗ |φ⟩⟨φ| = ρ_S ⊗ ρ_T.
So we take the tensor product of the density operators just as we would do with the vectors in the original formulation. For the two-qubit entangled state example (|0⟩|0⟩ + |1⟩|1⟩)/√2 above, which we abbreviate as (|00⟩ + |11⟩)/√2, the density operator is

    ((|00⟩ + |11⟩)/√2)((⟨00| + ⟨11|)/√2)
      = (1/2) (1, 0, 0, 1)ᵀ (1, 0, 0, 1)
      = (1/2) [ 1 0 0 1 ]
              [ 0 0 0 0 ]
              [ 0 0 0 0 ]
              [ 1 0 0 1 ].
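As a numerical aside (a NumPy sketch with illustrative states, not part of the text), the density matrix above is just an outer product, and for a separable state the density operator factors into a Kronecker product:

```python
import numpy as np

ket0 = np.array([1, 0], dtype=complex)
ket1 = np.array([0, 1], dtype=complex)

# Entangled two-qubit state (|00> + |11>)/sqrt(2) as a length-4 vector.
phi = (np.kron(ket0, ket0) + np.kron(ket1, ket1)) / np.sqrt(2)

# Its density operator |phi><phi| is the outer product.
rho = np.outer(phi, phi.conj())
expected = 0.5 * np.array([[1, 0, 0, 1],
                           [0, 0, 0, 0],
                           [0, 0, 0, 0],
                           [1, 0, 0, 1]])
assert np.allclose(rho, expected)

# For a separable state, the density operator factors as a tensor product.
psi, chi = ket0, (ket0 + ket1) / np.sqrt(2)
rho_sep = np.outer(np.kron(psi, chi), np.kron(psi, chi).conj())
assert np.allclose(rho_sep, np.kron(np.outer(psi, psi.conj()),
                                    np.outer(chi, chi.conj())))
```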
If S and T are isolated from each other (and the outside world), then each evolves in time according to a unitary operator, say U for system S and V for system T. U and V are called local operations. Obviously, U⊗V is the unitary giving the time evolution of the combined system. If S and T are brought together so that they interact, then the unitary giving the evolution of the combined system ST might not be expressible as a single tensor product of unitaries for S and T respectively.
Exercise 10.3 Let H_S and H_T be Hilbert spaces, and let P_1, …, P_k ∈ L(H_S) be a complete set of orthogonal projectors for H_S. Show that P_1⊗I, …, P_k⊗I is a complete set of orthogonal projectors for H_S ⊗ H_T, where I is the identity operator on H_T. (The latter set represents a projective measurement on the system S when viewed from the combined system ST.)
Continuing the idea of Exercise 10.3, let P_1, …, P_k ∈ L(H_S) and Q_1, …, Q_ℓ ∈ L(H_T) be complete sets of orthogonal projectors for systems S and T, respectively. Suppose that the combined system ST is in some arbitrary state |ψ⟩ and that Alice measures system S using the first set of projectors. Then the exercise illustrates how she is actually measuring system ST with projectors P_1⊗I, …, P_k⊗I, where I is the identity operator on H_T. She'll see some outcome i with some probability, and the state of ST will collapse to some |ψ_i⟩ according to the usual rules. If Bob now measures system T using the second set of projectors when ST is in state |ψ_i⟩ (which is tantamount to measuring ST with projectors I⊗Q_1, …, I⊗Q_ℓ),
U(|ψ⟩|0⟩) = |ψ⟩|ψ⟩ and U(|φ⟩|0⟩) = |φ⟩|φ⟩, whence

    ⟨ψ|φ⟩ = (U(|ψ⟩|0⟩))†(U(|φ⟩|0⟩)) = (⟨ψ|⟨ψ|)(|φ⟩|φ⟩) = ⟨ψ|φ⟩²,

and thus ⟨ψ|φ⟩ = ⟨ψ|φ⟩², which implies ⟨ψ|φ⟩ is either 0 or 1, i.e., |ψ⟩ and |φ⟩ are either orthogonal or colinear. But clearly we can choose |ψ⟩ and |φ⟩ such that this is not the case.
Quantum Circuits. The quantum circuit has become the de facto standard theoretical model of quantum computation. It is equivalent to the other standard model (the quantum Turing machine, or QTM), but it is easier to work with and represent visually. Quantum circuits are closely analogous to classical Boolean circuits, and we'll compare them occasionally.
A quantum circuit consists of some number of qubits, called a quantum register, represented by horizontal wires. The qubits start in some designated state, representing the input to the circuit. From time to time, we may act on one or more qubits in the circuit by applying a quantum gate, which is just a unitary operator applied to the corresponding qubits. A typical circuit with a two-qubit register is shown in Figure 3. To keep track, we
Figure 3: Sample quantum circuit with two qubits. Time moves from left to right in the figure. The gate H is applied first to the first qubit, then CNOT is applied to both qubits.
number the qubits in the register from top to bottom, so that the topmost qubit is the first, etc. At any given time, the register is in some quantum state |ψ⟩ ∈ H ⊗ ⋯ ⊗ H = H^⊗n, where H is here the state space of a single qubit, and n is the number of qubits in the register. We choose an orthonormal basis for H^⊗n by taking tensor products of the individual one-qubit basis vectors |0⟩ and |1⟩. We call this basis the computational basis for the register. For example, a typical computational basis vector in H^⊗5 is

    |0⟩ ⊗ |0⟩ ⊗ |1⟩ ⊗ |0⟩ ⊗ |1⟩ = |0⟩|0⟩|1⟩|0⟩|1⟩ = |00101⟩.

In this state, the first, second, and fourth qubits are 0, and the third and fifth qubits are 1. The state space of an n-qubit register has dimension 2^n, with computational basis vectors representing all the 2^n possible values of n bits, listed in the usual binary order:

    |00⋯00⟩, |00⋯01⟩, |00⋯10⟩, …, |11⋯11⟩.
In the circuit diagram, the state of the register evolves in time from left to right. In Figure 3, for example, the first gate that is applied is the leftmost gate, i.e., the H gate applied to the first qubit. Here, we are not using H as a variable to describe any one-qubit gate; rather, we use H to denote a useful one-qubit gate, known as the Hadamard gate, given by

    H = (1/√2) [ 1  1 ]
               [ 1 −1 ].

Note that

    H|0⟩ = (|0⟩ + |1⟩)/√2,
    H|1⟩ = (|0⟩ − |1⟩)/√2,

or more succinctly,

    H|b⟩ = (|0⟩ + (−1)^b |1⟩)/√2,

for any b ∈ {0, 1}. Clearly, H = (X + Z)/√2 and H² = I. We also have, up to phase, H = R_{(1,0,1)/√2}(π), and so H rotates the Bloch sphere 180 degrees around the line through (1, 0, 1), swapping the +z-axis with the +x-axis.
Note that although it looks as if we are only applying H to the first qubit, we are really transforming the state |ψ⟩ ∈ H ⊗ H of the entire two-qubit register via the unitary H⊗I, where I is the one-qubit identity operator, representing the fact that we are not acting on the second qubit. Suppose that the initial state of the register is |00⟩. After the H gate is applied, the state becomes

    |ψ₁⟩ = (H⊗I)|00⟩ = ((|0⟩ + |1⟩)/√2) ⊗ |0⟩ = (|00⟩ + |10⟩)/√2.
11 Week 6: Quantum gates
The next gate in Figure 3 is another very useful, two-qubit gate called a controlled NOT or CNOT gate, acting on both qubits. In a CNOT gate, the small black dot connects to the control qubit (here, the first qubit) and the ⊕ end connects to the target qubit. If the control is |0⟩, then the target does not change; if the control is |1⟩, then the target's Boolean value is flipped |0⟩ ↔ |1⟩ (logical NOT). The control qubit is unchanged regardless. Schematically, for any a, b ∈ {0, 1}, the control qubit carries a through unchanged and the target qubit carries b to a ⊕ b (here, ⊕ represents bitwise exclusive OR, i.e., bitwise addition modulo 2).
The matrix for the CNOT gate above, with the first qubit being the control and the second being the target, is

    P₀⊗I + P₁⊗X = |0⟩⟨0|⊗I + |1⟩⟨1|⊗X = [ 1 0 0 0 ]
                                        [ 0 1 0 0 ]
                                        [ 0 0 0 1 ]
                                        [ 0 0 1 0 ].
Here X is the usual Pauli X operator, which swaps |0⟩ with |1⟩, and hence represents logical NOT. If the control and target qubits were reversed, then the gate would be

    I⊗P₀ + X⊗P₁ = [ P₀ P₁ ] = [ 1 0 0 0 ]
                  [ P₁ P₀ ]   [ 0 0 0 1 ]
                              [ 0 0 1 0 ]
                              [ 0 1 0 0 ].
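Both CNOT matrices can be assembled from the projectors exactly as written; here is a brief NumPy check (illustrative, not part of the text):

```python
import numpy as np

I = np.eye(2)
X = np.array([[0, 1], [1, 0]])
P0 = np.array([[1, 0], [0, 0]])   # |0><0|
P1 = np.array([[0, 0], [0, 1]])   # |1><1|

# CNOT with qubit 1 as control: P0 (x) I + P1 (x) X
cnot = np.kron(P0, I) + np.kron(P1, X)
assert np.allclose(cnot, [[1, 0, 0, 0],
                          [0, 1, 0, 0],
                          [0, 0, 0, 1],
                          [0, 0, 1, 0]])

# CNOT with qubit 2 as control: I (x) P0 + X (x) P1
cnot_rev = np.kron(I, P0) + np.kron(X, P1)
assert np.allclose(cnot_rev, [[1, 0, 0, 0],
                              [0, 0, 0, 1],
                              [0, 0, 1, 0],
                              [0, 1, 0, 0]])
```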
After the CNOT gate is applied to the state |ψ₁⟩ in Figure 3, the new and final state of the circuit is

    |ψ₂⟩ = CNOT|ψ₁⟩ = CNOT((|00⟩ + |10⟩)/√2) = (|00⟩ + |11⟩)/√2.
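Since the whole two-qubit circuit of Figure 3 is a product of 4×4 matrices, it can be simulated in a few lines. A NumPy sketch:

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
I = np.eye(2)
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]])

ket00 = np.array([1, 0, 0, 0])      # |00>
psi1 = np.kron(H, I) @ ket00        # (|00> + |10>)/sqrt(2)
psi2 = CNOT @ psi1                  # (|00> + |11>)/sqrt(2)
assert np.allclose(psi2, np.array([1, 0, 0, 1]) / np.sqrt(2))
```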
Keep in mind that the CNOT gate (as with any quantum gate) acts linearly on the superposition (|00⟩ + |10⟩)/√2. Another useful one-qubit gate is

    T = [ 1  0        ]
        [ 0  e^{iπ/4} ].
For some obscure reason, this gate is known as the π/8 gate, maybe because, up to phase,

    T = R_z(π/4) = [ e^{−iπ/8}  0         ]
                   [ 0          e^{iπ/8}  ].
We have T² = S, and T rotates the Bloch sphere counterclockwise 45 degrees about the +z-axis. Notice that T is the only one-qubit gate we've seen so far that does not map all axes (i.e., the x-, y-, and z-axes) to axes in the Bloch sphere. I'd call the three gates Z, S, and T conditional phase-shift gates: they leave the Boolean value of the qubit unchanged while introducing various phase factors conditioned on the qubit having Boolean value 1.
Here's another two-qubit classical gate, the SWAP gate:

    SWAP = [ 1 0 0 0 ]
           [ 0 0 1 0 ]
           [ 0 1 0 0 ]
           [ 0 0 0 1 ]

(There are two common circuit depictions; the first is mine and other people's, and the second is the one the textbook uses.) The SWAP gate just exchanges the Boolean values of the two qubits it acts on, fixing |00⟩ and |11⟩ but mapping |01⟩ to |10⟩ and vice versa.
Exercise 11.2 This is an entirely classical exercise. Show that a cascade of three CNOT gates with alternating control and target qubits implements the SWAP gate. [Hint: Rather than multiplying matrices, which can be time-consuming, just compare what the two circuits do to the four possible basis states.]
Exercise 11.3 Do Exercise 4.16 on pages 178-179 of the text.
Exercise 11.4 This is a nonclassical exercise in several parts. It will help you to simplify circuits by inspection, based on some circuit identities. It mirrors Exercises 4.13 and 4.17-4.20 on pages 177-180 of the text. An item may use previous items.

1. Verify directly that HXH = Z and that HZH = X (oh yes, and that HYH = −Y).

2. Verify that the CZ (controlled-Z) gate is symmetric: it acts the same regardless of which qubit is taken as the control and which as the target. What is the matrix of this gate? The same is true for the CS and CT gates.
3. Show that, for any unitary gates U and A, conjugating the target qubit of a controlled-A gate by U yields a controlled-(UAU†) gate:

    (I⊗U)(controlled-A)(I⊗U†) = controlled-(UAU†).

[Hint: Consider separately the case when the control qubit is |0⟩ and when it is |1⟩. To show equality of two linear operators generally, you only need to show that they both act the same on the vectors of some basis.]
4. Construct a CZ gate using a single CNOT gate and two H gates. Similarly, construct a CNOT gate using a single CZ gate and two H gates.

5. Using the previous items, show that a CNOT gate with control and target qubits exchanged equals the original CNOT gate conjugated by H gates on both qubits. Note that gates acting on separate qubits commute, and so it doesn't matter which of the gates is applied first, and the order can be freely switched, provided that there are no gates in between that connect the qubits together. You can think of the gates as being applied simultaneously if you like.
Finally, we introduce a three-qubit classical gate known as the Toffoli gate, which is really a controlled controlled NOT gate: the first two qubits carry a and b through unchanged, and the third qubit carries c to c ⊕ (a ∧ b). There are two control qubits and one target qubit. The control qubits are unchanged, and the target is flipped if and only if both of the controls are 1.
Quantum Circuits Versus Boolean Circuits. Are quantum circuits with unitary gates as powerful as classical Boolean circuits? You may have already noticed some similarities and differences between the two circuit models:
- Both types of circuits carry bit values on wires which are acted on by gates.

- Quantum gates can create superpositions from basis states, but Boolean gates are classical, mapping Boolean input values to definite Boolean output values.

- A Boolean gate may take some number of inputs (usually one or two), and has one output, which can be freely copied into any number of wires, and thus the number of wires from layer to layer may change. In quantum circuits, quantum gates are operators mapping the state space into itself, and so a gate always has the same number of outputs as inputs. Thus the number of qubits never changes, and each qubit retains its identity throughout the circuit.
- Boolean gates may lose information from inputs to output, i.e., the input values are not uniquely recoverable from the output value (e.g., an AND gate or an OR gate). Any quantum unitary gate U can always be undone (at least theoretically) by applying U†.
Figure 4: A full implementation of the circuit C. Inputs and ancilla values are restored by undoing the computation after copying the outputs to fresh qubits. The locations of the output register and the ancilla are swapped for ease of display. The circuit implementing D†, the inverse of D, is an exact mirror image of the circuit for D. The values on the qubits intermediate between the D and D⁻¹
    ⟨ψ₁|ψ₂⟩ = ⟨ψ₁|ψ₂⟩⟨00⋯0|00⋯0⟩
            = (⟨ψ₁|⟨00⋯0|)(|ψ₂⟩|00⋯0⟩)
            = (⟨ψ₁|⟨00⋯0|C†)(C(|ψ₂⟩|00⋯0⟩))    (since C is unitary)
            = ((⟨ψ₁|C̃†)⟨00⋯0|)((C̃|ψ₂⟩)|00⋯0⟩)    (by the definition of C̃)
            = ⟨ψ₁|C̃†C̃|ψ₂⟩⟨00⋯0|00⋯0⟩
            = ⟨ψ₁|C̃†C̃|ψ₂⟩.
Thus C̃ preserves the inner product on H and so must be unitary. This justifies our suppressing the ancilla when we use C̃ as a new unitary gate in another circuit. We are really using C.

Suppose a three-qubit register is in the state

    Σ_{x∈{0,1}³} α_x |x⟩,

where Σ_x |α_x|² = 1. For each i = 1, 2, 3, give an expression for the probability of seeing 1 when the i-th qubit is measured, and give the post-measurement state in each case.
Based on the discussion after Exercise 10.3, we may measure several different qubits at once, since the actual chronological order of the measurements does not matter. Here's a completely typical example: we decide to measure qubits 2, 3, and 5 of an n-qubit system (where n ≥ 5, obviously). The state |ψ⟩ of an n-qubit system can always be expressed as a linear combination of basis states:

    |ψ⟩ = Σ_{x∈{0,1}^n} α_x |x⟩,    (42)

where each α_x is a scalar in ℂ, and

    Σ_{x∈{0,1}^n} |α_x|² = ⟨ψ|ψ⟩ = 1.    (43)

If we measure qubits 2, 3, and 5 when the system is in state |ψ⟩, what is the probability that we will see, say, 101, i.e., 1 for qubit 2, 0 for qubit 3, and 1 for qubit 5? The corresponding projector is P = I⊗P₁⊗P₀⊗I⊗P₁⊗I⊗⋯⊗I, where I is the single-qubit identity operator.
The probability is then

    Pr[101] = ⟨ψ|P|ψ⟩ = Σ_{x : x₂x₃x₅ = 101} |α_x|²,

where we are letting x_j denote the j-th bit of x. That is, we only retain those terms in the sum in (43) in which the corresponding bits of x match the outcome. Upon seeing 101, the post-measurement state will be

    |ψ_post⟩ = P|ψ⟩/√Pr[101] = (1/√Pr[101]) Σ_{x : x₂x₃x₅ = 101} α_x |x⟩.
We will often measure several qubits at once, so this example will come in handy.
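Here is a NumPy sketch of this computation for a random 5-qubit state. The state is an illustrative assumption, and bit 1 of x is taken to be the most significant bit, matching the top-to-bottom numbering of qubits:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
# Random normalized 5-qubit state; alpha[x] is the amplitude of |x>.
alpha = rng.normal(size=2**n) + 1j * rng.normal(size=2**n)
alpha /= np.linalg.norm(alpha)

def bit(x, j):
    """j-th bit of x, counting from 1 at the top (most significant)."""
    return (x >> (n - j)) & 1

# Pr[qubits 2, 3, 5 read 1, 0, 1] = sum of |alpha_x|^2 over matching x.
matching = [x for x in range(2**n)
            if (bit(x, 2), bit(x, 3), bit(x, 5)) == (1, 0, 1)]
prob = sum(abs(alpha[x])**2 for x in matching)

# Post-measurement state: keep the matching terms and renormalize.
post = np.zeros_like(alpha)
post[matching] = alpha[matching] / np.sqrt(prob)
assert np.isclose(np.linalg.norm(post), 1.0)
```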
Bell States and Quantum Teleportation. Recall the circuit of Figure 3. Let B be the two-qubit unitary operator realized by this circuit. The four states obtained by applying B to the four computational basis states are known as the Bell states and form the Bell basis:

    |Φ⁺⟩ := B|00⟩ = (|00⟩ + |11⟩)/√2,    (44)
    |Ψ⁺⟩ := B|01⟩ = (|01⟩ + |10⟩)/√2,    (45)
    |Φ⁻⟩ := B|10⟩ = (|00⟩ − |11⟩)/√2,    (46)
    |Ψ⁻⟩ := B|11⟩ = (|01⟩ − |10⟩)/√2.    (47)
These states are also called EPR states or EPR pairs. In a sense we will quantify later, these states represent maximally entangled pairs of qubits. EPR is an acronym for Einstein, Podolsky, and Rosen, who coauthored a paper describing apparent paradoxes in the rules of quantum mechanics involving pairs of qubits in states such as these. Suppose a pair of electrons is prepared whose spins are in one of the Bell states, say |Φ⁺⟩. (There are actual physical processes that can do this.) The electrons can then (theoretically) be separated by a great distance, the first taken by Alice to a lab at UC Berkeley in California and the second taken by Bob to a lab at MIT in Massachusetts. If Alice measures her spin first, she'll see 0 or 1 with equal probability. Same with Bob if he measures his spin first. But if Alice measures her spin first and sees, say, 0, then according to the standard Copenhagen interpretation of quantum mechanics (which we are using), the state of the two spins collapses to |00⟩, so if Bob measures his spin afterwards, he will see 0 with certainty. So Alice's measurement seems to affect Bob's somehow. Einstein called this phenomenon "spooky action at a distance." We'll talk about this more later, time permitting.
Philosophical problems aside, entangled pairs of qubits can be used in interesting and subtle ways. One of the earliest discovered uses of EPR pairs is to teleport an unknown quantum state across a distance using only classical communication, in a process called quantum teleportation. Suppose Alice and Bob share two qubits in the state |Φ⁺⟩ as above, which may have been distributed to them long ago. Suppose also that Alice has another qubit in some arbitrary, unknown state

    |ψ⟩ = α|0⟩ + β|1⟩.

She wants Bob to have this state. She could mail her electron to Bob, but this won't work because the state |ψ⟩ of the electron is very delicate and will be destroyed if the package is bumped, screened with X-rays, etc. Instead, she can transfer this state to Bob with only a phone call. No quantum states need to be physically transported between Alice and Bob. Here's how it works: The state of the three qubits initially is

    |ψ⟩|Φ⁺⟩ = (α|0⟩ + β|1⟩)(|00⟩ + |11⟩)/√2.    (48)
Alice possesses the first two qubits; Bob possesses the third. Alice applies the inverse B† to her two qubits. She then measures each qubit in the computational basis, getting Boolean values b₁ and b₂ for the first and second qubits, respectively. She then calls Bob on the phone and tells him the values she observed, i.e., b₁ and b₂. Bob then does the following with his qubit (the third qubit): (i) if b₂ = 1, then Bob applies an X gate; (ii) then, if b₁ = 1, he applies a Z gate.
Figure 5: Quantum teleportation of a single qubit. Alice possesses the first qubit in some arbitrary, unknown state |ψ⟩. The second and third qubits are an EPR pair prepared in the state |Φ⁺⟩ sometime in the past, with the second qubit given to Alice and the third to Bob. Alice applies B† to her two qubits.
Note that

    |00⟩ = (|Φ⁺⟩ + |Φ⁻⟩)/√2,
    |01⟩ = (|Ψ⁺⟩ + |Ψ⁻⟩)/√2,
    |10⟩ = (|Ψ⁺⟩ − |Ψ⁻⟩)/√2,
    |11⟩ = (|Φ⁺⟩ − |Φ⁻⟩)/√2,
so the initial state of (48) is

    (1/2)[(|Φ⁺⟩ + |Φ⁻⟩)α|0⟩ + (|Ψ⁺⟩ + |Ψ⁻⟩)α|1⟩ + (|Ψ⁺⟩ − |Ψ⁻⟩)β|0⟩ + (|Φ⁺⟩ − |Φ⁻⟩)β|1⟩]
      = (1/2)[|Φ⁺⟩(α|0⟩ + β|1⟩) + |Ψ⁺⟩(α|1⟩ + β|0⟩) + |Φ⁻⟩(α|0⟩ − β|1⟩) + |Ψ⁻⟩(α|1⟩ − β|0⟩)].    (49)

Going back to Equations (44-47), applying B† maps |Φ⁺⟩ to |00⟩ and so on. So after Alice applies B†
Figure 6: Dense coding. The EPR pair is initially distributed between Alice and Bob, with Alice getting the first qubit. The stuff above the dotted line belongs to Alice, and the rest belongs to Bob. The qubit crosses the dotted line when Alice sends it to Bob.
to the state in (49) and normalizing, we get

    |ψ₀₀⟩ = |00⟩(α|0⟩ + β|1⟩) = |00⟩|ψ⟩,
    |ψ₀₁⟩ = |01⟩(α|1⟩ + β|0⟩) = |01⟩(X|ψ⟩),
    |ψ₁₀⟩ = |10⟩(α|0⟩ − β|1⟩) = |10⟩(Z|ψ⟩),
    |ψ₁₁⟩ = |11⟩(α|1⟩ − β|0⟩) = |11⟩(XZ|ψ⟩).
We see that Bob's qubit is now in one of four possible states: |ψ⟩, X|ψ⟩, Z|ψ⟩, or XZ|ψ⟩, depending on whether the values measured by Alice are 00, 01, 10, or 11, respectively. Now Bob simply uses the information about b₁ and b₂ to undo the Pauli operators on his qubit, yielding |ψ⟩ in every case.

This scenario can be used to teleport an n-qubit state from Alice to Bob by teleporting each qubit separately, just as above.

Note that Alice must tell Bob the values b₁ and b₂ so that Bob can recover |ψ⟩ reliably. This means that quantum states cannot be teleported faster than the speed of light. Also note that after the protocol is finished, Alice no longer possesses |ψ⟩. She can't, because that would violate the No-Cloning Theorem. Finally, note that the EPR state that Alice and Bob shared before the protocol no longer exists. It is used up, and can't be used to teleport additional states. Thus, teleporting an n-qubit state needs n separate EPR pairs.
Dense Coding. In quantum teleportation, with the help of an EPR pair, Alice can substitute transmitting a qubit to Bob with transmitting two classical bits. There is a converse to this: with the help of an EPR pair, Alice can substitute transmitting two classical bits to Bob with transmitting a single qubit. This inverse trade-off is known as dense coding.

Figure 6 illustrates how dense coding works. Alice has two classical bits b₁ and b₂ that she wants to communicate to Bob. She also shares an EPR pair with Bob in state |Φ⁺⟩ as before. If b₂ = 1, Alice applies X to her half of the EPR pair, otherwise she does nothing. Then, if b₁ = 1, she applies Z to her qubit, otherwise she does nothing. She then sends
her qubit to Bob. Bob now has both qubits. He applies B† to them and measures. Letting |β_{b₁b₂}⟩ denote the state of the pair after Alice's encoding, we have

    |β₀₀⟩ = |Φ⁺⟩,
    |β₀₁⟩ = (X⊗I)|Φ⁺⟩ = (|10⟩ + |01⟩)/√2 = |Ψ⁺⟩,
    |β₁₀⟩ = (Z⊗I)|Φ⁺⟩ = (|00⟩ − |11⟩)/√2 = |Φ⁻⟩,
    |β₁₁⟩ = (ZX⊗I)|Φ⁺⟩ = (|01⟩ − |10⟩)/√2 = |Ψ⁻⟩.

So Alice is just preparing one of the four Bell states. So when Bob applies B† to |β_{b₁b₂}⟩, he gets |b₁b₂⟩, yielding b₁b₂ upon measurement.
Note that, as before, the EPR pair is consumed in the process.
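Dense coding admits an equally short sanity check (a NumPy sketch covering all four values of b₁b₂):

```python
import numpy as np

I = np.eye(2)
X = np.array([[0, 1], [1, 0]])
Z = np.array([[1, 0], [0, -1]])
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
CNOT = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]])
B = CNOT @ np.kron(H, I)
phi_plus = np.array([1, 0, 0, 1]) / np.sqrt(2)

for b1 in (0, 1):
    for b2 in (0, 1):
        # Alice encodes (b1, b2) on her half: X if b2 = 1, then Z if b1 = 1.
        enc = np.linalg.matrix_power(Z, b1) @ np.linalg.matrix_power(X, b2)
        state = np.kron(enc, I) @ phi_plus
        # Bob applies B-dagger and measures, seeing b1 b2 with certainty.
        out = B.conj().T @ state
        expect = np.zeros(4)
        expect[2 * b1 + b2] = 1
        assert np.allclose(np.abs(out), expect)
```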
Exercise 12.2 Recall the two-qubit swap operator SWAP satisfying SWAP|a⟩|b⟩ = |b⟩|a⟩ for all a, b ∈ {0, 1}. Show that the four Bell states are eigenvectors of SWAP. What are the corresponding eigenvalues? For this and other reasons, the states |Φ⁺⟩, |Φ⁻⟩, and |Ψ⁺⟩ are often called symmetric states, triplet states, or spin-1 states, while the state |Ψ⁻⟩ is often called the antisymmetric state, the singlet state, or the spin-0 state.
13 Week 7: Basic quantum algorithms
Black-Box Problems. Many quantum algorithms solve what are called black-box problems. Typically, we are given some Boolean function f : {0,1}^n → {0,1}^m and we want to answer some question about the function as a whole, for example, "Is f constant?", "Is f the zero function?", "Is f one-to-one?", etc. We are allowed to feed an input x ∈ {0,1}^n to f and get back the output f(x). The input x is called a query to f and f(x) is the query answer. Other than making queries to f, we are not allowed to inspect f in any way, hence the black-box nature of the function. (A black-box function f is sometimes called an oracle.) Generally, we would like to answer our question by making as few queries to f as we can, since queries may be expensive.
In the context of quantum computing, the function f is most naturally given to us as a classical, unitary gate U_f that acts on two quantum registers (the first with n qubits and the second with m qubits) and behaves as follows for all x ∈ {0,1}^n and y ∈ {0,1}^m:

    U_f|x, y⟩ = |x, y ⊕ f(x)⟩.

This is reasonable, given the restriction that unitary quantum gates must be reversible. U_f is called an f-gate. To solve a black-box problem involving f, we are allowed to build a quantum circuit using f-gates as well as the other usual unitary gates. Each occurrence of an f-gate in the circuit counts as a query to f, so the number of queries is the number of f-gates in the circuit. The difference between classical queries to f and quantum queries to f is that we can feed a superposition of several classical inputs (basis states) into the f-gate, obtaining a corresponding superposition of the results. We'll see in a minute that we can use this idea, known as quantum parallelism, to get more information out of f in fewer queries than any classical computation.
Deutsch's Problem and the Deutsch-Jozsa Problem. The first indication that quantum computation may be strictly more powerful than classical computation came with a black-box problem posed by David Deutsch: Given a one-bit Boolean function f : {0,1} → {0,1}, is f constant, that is, is f(0) = f(1)? There are four possible functions {0,1} → {0,1}: the constant zero function, the constant one function, the identity function, and the negation function. Deutsch's task is to determine whether f falls among the first two or the last two. Classically, it is clear that determining which is the case requires two queries to f, since we need to know both f(0) and f(1). Quantally, however, we can get by with only one query to f. Define

    |+⟩ := H|0⟩ = (|0⟩ + |1⟩)/√2,    (50)
    |−⟩ := H|1⟩ = (|0⟩ − |1⟩)/√2,    (51)

where H is the Hadamard gate. The states |+⟩ and |−⟩ correspond to the states |+x⟩ and |−x⟩ we defined earlier when we were discussing the Bloch sphere. If we feed these states into U_f like so:
(|+⟩ into the first qubit and |−⟩ into the second), then the progression of states through the circuit from left to right is

    |+⟩|−⟩ = (|0⟩ + |1⟩)(|0⟩ − |1⟩)/2
           = (|00⟩ − |01⟩ + |10⟩ − |11⟩)/2
     →^{U_f} (|0, f(0)⟩ − |0, 1⊕f(0)⟩ + |1, f(1)⟩ − |1, 1⊕f(1)⟩)/2
           =: |ψ_out⟩.
If f is constant, i.e., if f(0) = f(1) = y for some y ∈ {0,1}, then

    |ψ_out⟩ = (|0, y⟩ − |0, 1⊕y⟩ + |1, y⟩ − |1, 1⊕y⟩)/2
            = (|0⟩ + |1⟩)(|y⟩ − |1⊕y⟩)/2
            = (−1)^y (|0⟩ + |1⟩)(|0⟩ − |1⟩)/2
            = (−1)^y |+⟩|−⟩.

If f is not constant, i.e., if f(0) = y = 1⊕f(1) for some y ∈ {0,1}, then

    |ψ_out⟩ = (|0, y⟩ − |0, 1⊕y⟩ + |1, 1⊕y⟩ − |1, y⟩)/2
            = (|0⟩ − |1⟩)(|y⟩ − |1⊕y⟩)/2
            = (−1)^y (|0⟩ − |1⟩)(|0⟩ − |1⟩)/2
            = (−1)^y |−⟩|−⟩.

Now suppose we apply the Hadamard gate H to the first qubit of |ψ_out⟩. We obtain

    |φ⟩ := (H⊗I)|ψ_out⟩ = ±|0⟩|−⟩ if f is constant,
                          ±|1⟩|−⟩ if f is not constant.

So now we measure the first qubit of |φ⟩. We get 0 with certainty if f is constant, and we get 1 with certainty otherwise. We can prepare the initial state |+⟩|−⟩ by applying two Hadamards and a Pauli X gate. The full circuit is in Figure 7. We only use the f-gate once, but in superposition. That is the key point.
Deutsch and Jozsa generalized this idea to a function f : {0,1}^n → {0,1} with n inputs and one output. The corresponding (n+1)-qubit U_f gate maps |x⟩|y⟩ to |x⟩|y ⊕ f(x)⟩: the n-qubit register carries x through unchanged, and the last qubit carries y to y ⊕ f(x).
Figure 7: The full circuit for Deutsch's problem. The second qubit is not used after it emerges from the f-gate.
We say that f is balanced if the number of inputs x such that f(x) = 0 is equal to the number of inputs x such that f(x) = 1, namely, 2^{n−1}. The Deutsch-Jozsa problem is as follows: We are given f as above as a black-box gate, and we know (we are promised) that f is either constant or balanced, and we want to determine which is the case. Answering this question classically requires 2^{n−1} + 1 queries to f in the worst case, since it is possible that f is balanced but the first 2^{n−1} queries may all yield the same answer. Quantally, we can do much better; one query to f suffices.
The setup is similar to what we just did, but instead of using an (n+1)-qubit f-gate directly, it is easier to work with an n-qubit inversion f-gate I_f defined as follows for every x ∈ {0,1}^n:

    I_f|x⟩ = (−1)^{f(x)} |x⟩.

That is, I_f leaves the values of the qubits alone but flips the sign iff f(x) = 1. We've defined I_f on computational basis vectors. Since I_f is linear, this defines I_f on all vectors in the state space of n qubits. I_f can be implemented cleanly (and easily) using U_f thus: add an ancilla qubit starting in state |0⟩, apply X then H to it (putting it in state |−⟩), apply U_f to all n+1 qubits, then apply H then X to the ancilla, returning it to |0⟩.
For any input state |x⟩ where x ∈ {0,1}^n, the progression of states through the circuit from left to right is

    |x, 0⟩ →^X |x, 1⟩
           →^H |x⟩|−⟩
           →^{U_f} |x⟩(|f(x)⟩ − |1⊕f(x)⟩)/√2 = (−1)^{f(x)} |x⟩(|0⟩ − |1⟩)/√2 = (−1)^{f(x)} |x⟩|−⟩
           →^H (−1)^{f(x)} |x⟩|1⟩
           →^X (−1)^{f(x)} |x, 0⟩
as advertised. Since only one f-gate is used to implement I_f, each occurrence of I_f in a circuit amounts to one occurrence of U_f in the circuit.
To determine whether f is constant or balanced, we use the following n-qubit circuit: each of the n qubits starts in state |0⟩, a Hadamard gate is applied to each qubit before and after I_f, and all qubits are measured at the end. Before we view the progression of states, let's see what happens when we apply a column of n Hadamard gates all at once to n qubits in the state |x⟩, for any x = x₁x₂⋯xₙ ∈ {0,1}^n. (We denote the n-fold Hadamard operator as H^⊗n.) Noting that, for all b ∈ {0,1},

    H|b⟩ = (1/√2)(|0⟩ + (−1)^b |1⟩) = (1/√2) Σ_{c∈{0,1}} (−1)^{bc} |c⟩,
we get

    |x⟩ →^{H^⊗n} ⨂_{i=1}^n H|x_i⟩
        = (1/2^{n/2}) ⨂_{i=1}^n Σ_{y_i∈{0,1}} (−1)^{x_i y_i} |y_i⟩
        = (1/2^{n/2}) Σ_{y₁∈{0,1}} ⋯ Σ_{yₙ∈{0,1}} (−1)^{x₁y₁ + ⋯ + xₙyₙ} |y₁⟩ ⋯ |yₙ⟩
        = (1/2^{n/2}) Σ_{y∈{0,1}^n} (−1)^{x·y} |y⟩,

where x·y = x₁y₁ + ⋯ + xₙyₙ denotes the standard dot product of two n-bit vectors x = x₁⋯xₙ and y = y₁⋯yₙ.
Now let's view the progression of states of the circuit above.

    |00⋯0⟩ →^{H^⊗n} (1/2^{n/2}) Σ_{x∈{0,1}^n} |x⟩    (because (00⋯0)·x = 0)    (52)
           →^{I_f} (1/2^{n/2}) Σ_{x∈{0,1}^n} (−1)^{f(x)} |x⟩    (53)
           →^{H^⊗n} (1/2^n) Σ_{x∈{0,1}^n} (−1)^{f(x)} Σ_{y∈{0,1}^n} (−1)^{x·y} |y⟩    (54)
           = (1/2^n) Σ_{x,y∈{0,1}^n} (−1)^{f(x) + x·y} |y⟩    (55)
           = (1/2^n) Σ_{y∈{0,1}^n} ( Σ_{x∈{0,1}^n} (−1)^{f(x) + x·y} ) |y⟩    (56)
Suppose first that f is constant, and we let |ψ_const⟩ denote this last state. Then (−1)^{f(x)} = ±1 independent of x, and so we can bring it outside the sum:

    |ψ_const⟩ = ±(1/2^n) Σ_{y∈{0,1}^n} ( Σ_{x∈{0,1}^n} (−1)^{x·y} ) |y⟩
              = ±(1/2^n) ( Σ_x (−1)^0 ) |0^n⟩ ± (1/2^n) Σ_{y≠0^n} ( Σ_x (−1)^{x·y} ) |y⟩
              = ±|0^n⟩ ± (1/2^n) Σ_{y≠0^n} ( Σ_x (−1)^{x·y} ) |y⟩.
Since

    1 = ⟨ψ_const|ψ_const⟩ = 1 + (1/2^{2n}) Σ_{y≠0^n} | Σ_x (−1)^{x·y} |²,

we must have Σ_x (−1)^{x·y} = 0 for all y ≠ 0^n (see footnote 11), and thus

    |ψ_const⟩ = ±|0^n⟩.

When we measure the qubits in state |ψ_const⟩, we will see 0^n with certainty.
Now suppose that f is balanced, and we let |ψ_bal⟩ denote the state of (56). Again separating the |0^n⟩ term from the rest, we get

    |ψ_bal⟩ = (1/2^n) ( Σ_{x∈{0,1}^n} (−1)^{f(x)} ) |0^n⟩ + (1/2^n) Σ_{y≠0^n} ( Σ_{x∈{0,1}^n} (−1)^{f(x) + x·y} ) |y⟩.

But f is balanced, and so Σ_x (−1)^{f(x)} = 0, because each term contributes +1 for f(x) = 0 and −1 for f(x) = 1. Thus,

    |ψ_bal⟩ = (1/2^n) Σ_{y≠0^n} ( Σ_{x∈{0,1}^n} (−1)^{f(x) + x·y} ) |y⟩.
Footnote 11: Here's another way to see that Σ_x (−1)^{x·y} = 0 for all y ≠ 0^n: If y ≠ 0^n, then one of y's bits is 1. For convenience, let's assume that the first bit of y is 1, and we let y′ denote the remaining n−1 bits of y. Then, writing x = x₁x′,

    Σ_{x∈{0,1}^n} (−1)^{x·y} = Σ_{x₁∈{0,1}} Σ_{x′∈{0,1}^{n−1}} (−1)^{x₁ + x′·y′}
                             = ( Σ_{x₁∈{0,1}} (−1)^{x₁} ) ( Σ_{x′∈{0,1}^{n−1}} (−1)^{x′·y′} )
                             = 0 · ( Σ_{x′} (−1)^{x′·y′} ) = 0.
When we measure the qubits in state |ψ_bal⟩, we see 0^n with probability zero. So we never see 0^n, but instead we'll see some random y ≠ 0^n.

To summarize: when we measure the qubits, if we see 0^n, then we know that f is constant; if we see anything else, then we know that f is balanced.
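A sketch of the whole Deutsch-Jozsa procedure in NumPy, representing I_f as the diagonal matrix its definition suggests (the sample functions are illustrative):

```python
import numpy as np
from functools import reduce

n = 3
H1 = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
Hn = reduce(np.kron, [H1] * n)

def deutsch_jozsa(f):
    # I_f is diagonal: I_f|x> = (-1)^{f(x)} |x>.
    If = np.diag([(-1.0)**f(x) for x in range(2**n)])
    state = np.zeros(2**n)
    state[0] = 1                              # |0...0>
    state = Hn @ If @ Hn @ state
    # f is constant iff we would measure 0^n with certainty.
    return "constant" if np.isclose(abs(state[0]), 1) else "balanced"

assert deutsch_jozsa(lambda x: 0) == "constant"
assert deutsch_jozsa(lambda x: 1) == "constant"
assert deutsch_jozsa(lambda x: x & 1) == "balanced"          # last bit of x
assert deutsch_jozsa(lambda x: (x >> 2) & 1) == "balanced"   # first bit of x
```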
Exercise 13.1 (Challenging) Let f : {0,1}^n → {0,1} be a Boolean function. We've implemented an I_f gate using U_f and a few standard gates. Show how to implement U_f given a single I_f gate and some standard gates. Thus U_f and I_f are computationally equivalent. Your circuit is allowed to depend on the value of f(0^n), that is, you can have one circuit that works assuming f(0^n) = 0 and another (slightly different) circuit that works assuming f(0^n) = 1. [Hint: Build a quantum circuit with three registers: n input qubits; one output qubit; n ancilla qubits. Assume x ∈ {0,1}^n is the classical input. Using one Hadamard and n Toffoli gates, convert the input state |x, y, 0^n⟩ into the superposition

    (|x, 0, 0^n⟩ + (−1)^y |x, 1, x⟩)/√2.

Then feed the ancilla register into I_f, then undo what you did before applying I_f. What state do you wind up with? What else do you need to do, if anything?]
Exercise 13.2 Here are some more circuit equalities for you to verify. Remember that circuits represent linear operators, and thus to show that two circuits are equal, it suffices to show that they act the same on the vectors of some basis, e.g., the computational basis.

1. Check the circuit identity expressing the doubly controlled gate as a circuit built from CNOT gates and the one-qubit gates H, S, T, and T†. Combining this with item 1 above gives a circuit implementing the Toffoli gate using only CNOT, H, T, and T† (explicitly, by using T⁷ instead, because T⁸ = I). The Nielsen & Chuang textbook has a closely similar implementation of the Toffoli gate on page 182, but it's not optimal; it has one more gate than is necessary. [Hint: It will help first to transform this equation into an equivalent one by applying H gates on the third qubit to both sides of both circuits, i.e., unitarily conjugating both sides of the equation by I⊗I⊗H. This has the effect of canceling out both the H gates in the right-hand circuit, and the left-hand side becomes the doubly controlled Z gate, which flips the overall sign of the state (i.e., gives an e^{iπ} = −1 phase change) iff all three qubits are 1. The advantage of doing this is that now nothing in the right-hand circuit creates any superpositions; each gate maps a computational basis state to a computational basis state, up to a phase factor. Now proceed by cases, considering the possible 0,1-combinations of the values of the three qubits, adding up the overall phase angles generated. You can simplify the task further by noticing a few general facts:

   - A 0 on the control qubit of a CNOT gate eliminates the gate.
   - Adjacent T and T† gates cancel.]
Furthermore, you are restricted to expressing U as the product of a sequence of operators, all of which are either H or T. [Hint: You are trying to find a U such that UXU† = H. X gives a rotation of the Bloch sphere about the x-axis, and H gives a rotation about the line ℓ through the point with spherical coordinates (π/4, 0) (Cartesian coordinates (1/√2, 0, 1/√2)). So U† (applied first) moves ℓ to the x-axis, then X (applied second) rotates around the x-axis, then U (applied last) moves the x-axis back to ℓ, the net effect of all three being a rotation about ℓ. One possibility for U is a (π/4)-rotation about the y-axis, but you must implement this using just H and T, the latter giving a π/4-rotation about the z-axis.]
14 Week 7: Simon's problem
Simon's Problem. The Deutsch-Jozsa problem is hard to decide classically, requiring exponentially many (in n) queries to f in the worst case. But there is a sense in which this problem is easy classically: if we pick inputs to f at random and query f on those inputs, we quickly learn the right answer with high probability. If we ever see f output different values, then we know for certain that f is balanced, since it is nonconstant. Conversely, if f is balanced and we make 100 random queries to f, then the chance that f gives the same answer to all our queries is exceedingly small: 2^{−99}. So we have an efficient randomized algorithm for finding the answer: Make m uniformly and independently random queries to f, where m is, say, 100. If the answers are all the same, output "constant"; otherwise, output "balanced." We will never output "balanced" incorrectly. We might output "constant" incorrectly, but only with probability 2^{1−m}, i.e., exponentially small in m. This algorithm runs in time polynomial in n and m.

As with classical computation, quantum circuits can simulate classical randomized computation. We won't pursue that line further here, though. Instead, we'll now see a black-box problem, Simon's problem, that

- can be solved efficiently with high probability on a quantum computer, but

- cannot be solved efficiently by a classical computer, even by a randomized algorithm that is allowed a probability of error slightly below 1/2.
In Simon's problem, we are given a black-box Boolean function $f : \{0,1\}^n \to \{0,1\}^m$, for some $n \le m$. We are also given the promise that there is an $s \in \{0,1\}^n$ such that for all distinct $x, y \in \{0,1\}^n$,
$$f(x) = f(y) \iff x \oplus y = s.$$
This condition determines $s$ uniquely: either $s = 0^n$ and $f$ is one-to-one, or $s \ne 0^n$, in which case $f$ is two-to-one with $f(x) = f(x \oplus s)$ for all $x$, and $s$ is the unique nonzero input such that $f(s) = f(0)$. Our task is to find $s$.
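For concreteness, here is one standard way to build a function satisfying Simon's promise (an illustration, not from the notes: map each pair $\{x, x \oplus s\}$ to its smaller member, so $m = n$ here):

```python
def make_simon_f(n, s):
    """Return f: {0,1}^n -> {0,1}^n (bit strings as ints) satisfying
    Simon's promise: f(x) = f(y) iff x = y or x XOR y = s."""
    # Canonical representative of the pair {x, x XOR s}. If s = 0 the
    # pairs are singletons and f is one-to-one.
    return lambda x: min(x, x ^ s)

f = make_simon_f(3, 0b101)
```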
The function $f$ is given to us via the gate $U_f$ as before, such that $U_f|x, y\rangle = |x, y \oplus f(x)\rangle$ for all $x \in \{0,1\}^n$ and $y \in \{0,1\}^m$. Consider the following quantum algorithm with two quantum registers: an $n$-qubit input register and an $m$-qubit output register.

1. We start with the two registers in the all-zero state $|0^n, 0^m\rangle$.

2. We then apply $H^{\otimes n}$ to the input register, obtaining the state $2^{-n/2}\sum_{x \in \{0,1\}^n} |x, 0^m\rangle$.

3. We then apply $U_f$ to get the new state $2^{-n/2}\sum_{x \in \{0,1\}^n} |x, f(x)\rangle$.

4. We apply $H^{\otimes n}$ to the first register again to get the state
$$|\psi_{\mathrm{out}}\rangle = 2^{-n}\sum_{x,y \in \{0,1\}^n} (-1)^{x \cdot y}\,|y, f(x)\rangle.$$

5. We now measure the first register (all $n$ qubits), obtaining some value $y \in \{0,1\}^n$.
Exercise 14.1 Draw the quantum circuit implementing the algorithm above.
What $y$ do we get in the last step? Note that $f(x) = f(x \oplus s)$ for all $x$, and that as $x$ ranges through all of $\{0,1\}^n$, so does $x \oplus s$. Thus we can rewrite $|\psi_{\mathrm{out}}\rangle$ as a split sum and combine terms in pairs:
$$\begin{aligned}
|\psi_{\mathrm{out}}\rangle &= \tfrac{1}{2}\bigl(|\psi_{\mathrm{out}}\rangle + |\psi_{\mathrm{out}}\rangle\bigr) \\
&= 2^{-n-1}\Bigl[\sum_{x,y}(-1)^{x\cdot y}\,|y, f(x)\rangle + \sum_{x,y}(-1)^{(x\oplus s)\cdot y}\,|y, f(x\oplus s)\rangle\Bigr] \\
&= 2^{-n-1}\Bigl[\sum_{x,y}(-1)^{x\cdot y}\,|y, f(x)\rangle + \sum_{x,y}(-1)^{(x\oplus s)\cdot y}\,|y, f(x)\rangle\Bigr] \\
&= 2^{-n-1}\sum_{x,y}\bigl[(-1)^{x\cdot y} + (-1)^{x\cdot y + s\cdot y}\bigr]\,|y, f(x)\rangle \\
&= 2^{-n-1}\sum_{x,y}(-1)^{x\cdot y}\bigl[1 + (-1)^{s\cdot y}\bigr]\,|y, f(x)\rangle \\
&= 2^{-n}\sum_{x,y \,:\, s\cdot y \text{ even}}(-1)^{x\cdot y}\,|y, f(x)\rangle.
\end{aligned}$$
The basis states $|y, f(x)\rangle$ for which $s \cdot y$ is odd cancel out, and we are left with a superposition of only states where $s \cdot y$ is even, with probability amplitudes differing only by a phase factor. So in Step 5 we will see an arbitrary such $y \in \{0,1\}^n$, uniformly at random. If $s = 0^n$, then $s \cdot y$ is even for all $y$, so each $y \in \{0,1\}^n$ will be seen with probability $2^{-n}$. If $s \ne 0^n$, then $s \cdot y$ is even for exactly half of the $y \in \{0,1\}^n$, each of which will be seen with probability $2^{1-n}$.
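This measurement distribution can be checked by brute force for small $n$ (a sketch that just expands the formula for $|\psi_{\mathrm{out}}\rangle$ above, with names of my choosing):

```python
from collections import defaultdict

def simon_distribution(f, n):
    """Measurement distribution of the first register in Simon's
    algorithm, computed from |psi_out> = 2^-n sum_{x,y} (-1)^{x.y} |y, f(x)>."""
    amp = defaultdict(float)                    # (y, f(x)) -> amplitude
    for x in range(2 ** n):
        for y in range(2 ** n):
            sign = (-1) ** bin(x & y).count("1")   # (-1)^{x.y}
            amp[(y, f(x))] += sign / 2 ** n
    prob = defaultdict(float)                   # marginal over second register
    for (y, fx), a in amp.items():
        prob[y] += a * a
    return prob

# With s = 101, only y with s.y even should have nonzero probability.
s = 0b101
prob = simon_distribution(lambda x: min(x, x ^ s), 3)
```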
How does this help us find $s$? If $s \ne 0^n$ and we get some $y$ in Step 5, then we know that $s \cdot y$ is even, which eliminates half the possibilities for $s$. Repeating the algorithm gives us more such $y$'s, and the $y$'s we can see are exactly the elements of the orthogonal complement $V^\perp$ of a subspace $V$ of some bit vector space $A$ (here, $V = \{0^n, s\}$ inside $A = \mathbb{Z}_2^n$). If $A$ has dimension $n$ and $V \subseteq A$ is a subspace of dimension $k$, then $V^\perp$ has dimension $n - k$, and $(V^\perp)^\perp = V$ as before.
Not everything works the same over $\mathbb{Z}_2$ as over $\mathbb{C}$. Here are some differences:

- An $n$-dimensional vector space over $\mathbb{Z}_2$ is finite, with exactly $2^n$ elements, one for each possible linear combination of the basis vectors.
- There is no notion of positive definite. We can have $x \cdot x = 0$ but $x \ne 0$ (i.e., $x$ has a positive but even number of 1's). The norm of a vector cannot be defined in the same way as with $\mathbb{C}$; however, a useful norm-like quantity associated with each bit vector $x$ is the number of 1's in $x$, known as the Hamming weight of $x$ and denoted $\mathrm{wt}(x)$.
- The concepts of unit vector and orthonormal basis don't work over $\mathbb{Z}_2$ like they do over $\mathbb{C}$, and there is no Gram-Schmidt procedure.
- Mutually orthogonal subspaces may have nonzero vectors in their intersection. Indeed, it may be the case that $V \subseteq V^\perp$ for nontrivial $V$.
- $\mathbb{Z}_2$ is not algebraically closed. This means, for example, that a square matrix may not have any eigenvectors or eigenvalues.
Exercise 14.2 Let
$$A = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix} \quad\text{and}\quad B = \begin{pmatrix} 1 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 1 \end{pmatrix}$$
be bit matrices. Compute $AB$, $\mathrm{tr}\,A$, $\mathrm{tr}\,B$, $\det A$, and $\det B$. All arithmetic is in $\mathbb{Z}_2$.
Exercise 14.3 Find the two $2 \times 2$ matrices over $\mathbb{Z}_2$ that have no eigenvalues or eigenvectors. (Challenging) Prove that there are only two.
Let $A$ be an $m \times n$ matrix (over any field $F$). The rank of $A$, denoted $\mathrm{rank}\,A$, is the maximum number of linearly independent columns of $A$ (or rows; it does not matter). Equivalently, it is the dimension of the span of the columns of $A$ (or rows; it does not matter). An $m \times n$ matrix $A$ has full rank if $\mathrm{rank}\,A = \min(m, n)$. A square matrix is invertible if and only if it has full rank. The kernel of $A$, denoted $\ker A$, is the set of column vectors $v \in F^n$ such that $Av = 0$. The kernel of $A$ is a subspace of $F^n$. Its dimension is known as the nullity of $A$. A standard theorem in linear algebra is that the sum of the rank and the nullity of $A$ is equal to the number of columns of $A$, i.e., $n$. The rank of any given bit matrix $A$ is easy to compute; you can use Gaussian elimination, for example. If the nullity of $A$ is positive, it is also easy to find a nonzero bit vector $v$ such that $Av = 0$ (the right-hand side is the zero vector, a bit vector).
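Gaussian elimination over $\mathbb{Z}_2$ is short to code if we store each row as a Python int, with bit $j$ holding the entry in column $j$ (a sketch; the function names are mine):

```python
def gf2_rref(rows):
    """Reduced row echelon form over Z_2. Rows are ints; returns the
    nonzero reduced rows and their pivot columns."""
    rref, pivot_cols = [], []
    for row in rows:
        for r, c in zip(rref, pivot_cols):      # clear existing pivots
            if (row >> c) & 1:
                row ^= r
        if row:
            c = row.bit_length() - 1            # new pivot column
            for i, r in enumerate(rref):        # clear it from old rows
                if (r >> c) & 1:
                    rref[i] = r ^ row
            rref.append(row)
            pivot_cols.append(c)
    return rref, pivot_cols

def gf2_kernel_vector(rows, n):
    """A nonzero v (as an n-bit int) with A v = 0 over Z_2, or None if
    the kernel is trivial. len(rref) is rank A; n - rank A is the nullity."""
    rref, pivot_cols = gf2_rref(list(rows))
    free = [c for c in range(n) if c not in pivot_cols]
    if not free:
        return None
    fc = free[0]
    v = 1 << fc                                 # set one free variable to 1
    for r, c in zip(rref, pivot_cols):          # back-substitute pivots
        if (r >> fc) & 1:
            v |= 1 << c
    return v
```

For example, the rows $010, 101, 111$ all satisfy $y \cdot 101 = 0$, so $101$ is in the kernel of the matrix they form.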
Back to Simon's Problem. If we run the quantum algorithm above $k$ times for some $k \le n$, we get $k$ independent, uniformly random vectors $y_1, \ldots, y_k \in \mathbb{Z}_2^n$ such that the following $k$ linear equations hold:
$$y_1 \cdot s = 0, \quad \ldots, \quad y_k \cdot s = 0.$$
Let $A$ be the $k \times n$ bit matrix whose rows are the $y_i$. Then the above can be expressed as the single equation $As = 0$, where $0$ denotes the zero vector in $\mathbb{Z}_2^k$. Thus, $s \in \ker A$.
The whole solution to Simon's problem is as follows: Run the algorithm above $n$ times, obtaining $y_1, \ldots, y_n \in \mathbb{Z}_2^n$. Let $A$ be the $n \times n$ bit matrix whose rows are the $y_i$.

1. If $\mathrm{rank}\,A < n - 1$, then give up (i.e., output "I don't know").

2. If $\mathrm{rank}\,A = n$, i.e., if $A$ is invertible, then output $0$.

3. Otherwise, $\mathrm{rank}\,A = n - 1$. Find the unique $s \ne 0$ such that $As = 0$. Using the $U_f$ gate two more times, compute $f(0)$ and $f(s)$. If they are equal, then output $s$; otherwise, output $0$.

Several things need explaining here. For one thing, the algorithm may fail to find $s$, outputting "I don't know." We'll see that this is reasonably unlikely to happen. For another thing, if we find that $A$ is invertible in Step 2, then we know that $s = A^{-1}0 = 0$, so our output is correct. Finally, in Step 3 we know that such an $s$ exists and is unique: the nullity of $A$ is $n - \mathrm{rank}\,A = n - (n-1) = 1$, so $\ker A$ is a one-dimensional space, which thus has $2^1 = 2$ elements, one of which is the zero vector. The final check is to determine which of these is the correct output. So if the algorithm does output an answer, that answer is always correct. Such a randomized algorithm (with low failure probability) is called a Las Vegas algorithm, as opposed to a Monte Carlo algorithm, which is allowed to give a wrong answer with low probability.
What are the chances of the algorithm failing? If the algorithm fails, then $\mathrm{rank}\,A < n - 1$, which certainly implies that the matrix formed from the first $n - 1$ rows of $A$ has rank less than $n - 1$. So if we bound the latter probability, we bound the probability of failure. For $1 \le k \le n$, let $A_k$ be the bit matrix formed from the first $k$ rows of $A$. Each row of $A$ is a uniformly random bit vector in the space $S = \{0, s\}^\perp$, and
$$\Pr[\mathrm{rank}\,A_{n-1} = n-1] = \Pr[\mathrm{rank}\,A_1 = 1]\prod_{k=2}^{n-1}\Pr[\mathrm{rank}\,A_k = k \mid \mathrm{rank}\,A_{k-1} = k-1].$$
Clearly, $\mathrm{rank}\,A_1 = 1$ iff its row is a nonzero bit vector in $S$, and so
$$\Pr[\mathrm{rank}\,A_1 = 1] = \frac{|S| - 1}{|S|} \ge \frac{2^{n-1} - 1}{2^{n-1}} = 1 - 2^{1-n}.$$
Now what is $\Pr[\mathrm{rank}\,A_k = k \mid \mathrm{rank}\,A_{k-1} = k-1]$ for $k > 1$? If $\mathrm{rank}\,A_{k-1} = k-1$, then the rows of $A_{k-1}$ are linearly independent, and thus span a $(k-1)$-dimensional subspace $D \subseteq S$ that has $2^{k-1}$ elements. Assuming this, $A_k$ will have full rank iff its last row is linearly independent of the other rows, i.e., the last row is an element of $S \setminus D$. Thus,
$$\Pr[\mathrm{rank}\,A_k = k \mid \mathrm{rank}\,A_{k-1} = k-1] = \frac{|S| - |D|}{|S|} \ge \frac{2^{n-1} - 2^{k-1}}{2^{n-1}} = 1 - 2^{k-n}.$$
Putting this together, we have
$$\Pr[\mathrm{rank}\,A_{n-1} = n-1] \ge \prod_{k=1}^{n-1}(1 - 2^{k-n}) = \prod_{k=1}^{n-1}(1 - 2^{-k}) = p_{n-1},$$
where we define
$$p_m := \prod_{k=1}^{m}(1 - 2^{-k}) \qquad (57)$$
for all $m \ge 0$. Clearly, $1 = p_0 > p_1 > \cdots > p_n > \cdots > 0$, and it can be shown that if $p := \lim_{m\to\infty} p_m$, then $1/4 < p < 1/3$. Thus the chances are better than $1/4$ that $A_{n-1}$ will have full rank, and so the algorithm will fail with probability less than $3/4$. This seems high, but if we repeat the whole process $r$ times independently, then the chances that we will fail on all $r$ trials is less than $(3/4)^r$, which goes to zero exponentially in $r$. The expected number of trials necessary to succeed at least once is thus at most $\sum_{r=1}^{\infty}(r/4)(3/4)^{r-1} = 4$.
Shor's Algorithm for Factoring. In the early 1990s, Peter Shor showed how to factor an integer $N$ on a quantum computer in time polynomial in $\lg N$ (which is roughly the number of bits needed to represent $N$ in binary). All known classical algorithms for factoring run exponentially slower than this (depending on your definition of "exponentially slower"). Although it has not been shown that no fast classical factorization algorithm exists, it is widely believed that this is the case (and RSA security depends on this being the case). Shor's algorithm is the single most important quantum algorithm to date, because of its implications for public-key cryptography. Using similar techniques, Shor also gave quantum algorithms for quickly solving the discrete logarithm problem, which also has cryptographic (actually cryptanalytic) implications. To do Shor's algorithm correctly, we need a couple more mathematical detours.
Modular Arithmetic. If $a$ and $m$ are integers and $m > 0$, then we can divide $a$ by $m$ and get two integer results: quotient and remainder. Put another way, there are unique integers $q, r$ such that $0 \le r < m$ and $a = qm + r$. We let $a \bmod m$ denote the number $r$. For any integer $m > 1$, we let $\mathbb{Z}_m = \{0, 1, \ldots, m-1\} = \{a \bmod m : a \in \mathbb{Z}\}$, and we define addition and multiplication in $\mathbb{Z}_m$ just as in $\mathbb{Z}$ except that we take the result mod $m$. Our previous discussion about $\mathbb{Z}_2$ is a special case of this. Arithmetic in $\mathbb{Z}_m$ resembles arithmetic in $\mathbb{Z}$ in several ways:

- Both operations are associative and commutative.
- Multiplication distributes over addition, i.e., $x(y + z) = xy + xz$ in $\mathbb{Z}_m$.
- $0$ is the additive identity, and $1$ is the multiplicative identity of $\mathbb{Z}_m$.
- A unique additive inverse (negation) $-x \in \mathbb{Z}_m$ exists for each element $x \in \mathbb{Z}_m$, such that $x + (-x) = 0$. In fact, $-0 = 0$, and $-x = m - x$ if $x \ne 0$. Clearly, $-(-x) = x$, and $(-x)y = -xy$ in $\mathbb{Z}_m$. Subtraction is defined as addition of the negation as usual: $x - y = x + (-y)$.
- A multiplicative inverse (reciprocal) may or may not exist for any given element $x \in \mathbb{Z}_m$ (that is, a $b \in \mathbb{Z}_m$ such that $xb = 1$ in $\mathbb{Z}_m$). If it does, it is unique and written $x^{-1}$ or $1/x$, and we say that $x$ is invertible or a unit. If $x$ is a unit, then so is $x^{-1}$, and $(x^{-1})^{-1} = x$. $0$ is never a unit, but $1$ and $-1$ are always units. Division can be defined as multiplication by the reciprocal as usual, provided the denominator is a unit: $x/y = x(1/y)$, provided $1/y$ exists.
- We define exponentiation as usual: $x^n$ is the product of $x$ with itself $n$ times, where $x \in \mathbb{Z}_m$ and $n \in \mathbb{Z}$ with $n > 0$. We let $x^0 = 1$ by convention. If $x$ is a unit, then we can define $x^{-n} = (1/x)^n$ as usual.

We let $\mathbb{Z}_m^*$ be the set of all units in $\mathbb{Z}_m$. $\mathbb{Z}$ has only two units ($1$ and $-1$), but $\mathbb{Z}_m$ may have many units other than $\pm 1$. The units of $\mathbb{Z}_m$ are exactly those elements $x$ that are relatively prime to $m$ (i.e., $\gcd(x, m) = 1$). If $m$ is prime, then all nonzero elements of $\mathbb{Z}_m$ are units. In any case, $\mathbb{Z}_m^*$ contains $1$ and is closed under multiplication and reciprocals, but not necessarily under addition.
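A brute-force illustration of units and reciprocals in $\mathbb{Z}_m$ (toy code, fine for small $m$; the names are mine):

```python
from math import gcd

def units(m):
    """The set Z_m^* of units of Z_m: the elements coprime to m."""
    return {x for x in range(1, m) if gcd(x, m) == 1}

def inverse(x, m):
    """Multiplicative inverse of a unit x in Z_m, by exhaustive search."""
    for b in range(1, m):
        if (x * b) % m == 1:
            return b
    raise ValueError(f"{x} is not a unit mod {m}")
```

For prime $m = 7$ every nonzero element is a unit; for $m = 12$ only $1, 5, 7, 11$ are.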
Exercise 14.4 What is $\mathbb{Z}_{30}^*$? Pair the elements of $\mathbb{Z}_{30}^*$ with their multiplicative inverses.
For any $x \in \mathbb{Z}_m^*$ we define the order of $x$ in $\mathbb{Z}_m^*$ to be the least $r > 0$ such that $x^r = 1$. Such an $r$ must exist: The elements of the sequence $1, x, x^2, x^3, \ldots$ are all in $\mathbb{Z}_m^*$, which is finite, so by the Pigeonhole Principle there must exist some $0 \le s < t$ such that $x^s = x^t$. Multiplying both sides by $x^{-s}$, we get $1 = x^{-s}x^s = x^{-s}x^t = x^{t-s}$, and incidentally, $t - s > 0$.
Factoring Reduces to Order Finding. Shor's algorithm does not factor $N$ directly. Instead it solves the problem of finding the order of an element $x \in \mathbb{Z}_N^*$. This is enough, as we will now see.

Let $N$ be a large composite integer, and let $x$ be an element of $\mathbb{Z}_N^*$. Suppose that you had at your disposal a black box into which you could feed $x$ and $N$, and the box would promptly output the order of $x$ in $\mathbb{Z}_N^*$. Then you could use this box to find a nontrivial factor of $N$ quickly and with high probability via the following (classical!) Las Vegas algorithm:
1. Input: a composite integer $N > 0$.

2. If $N$ is even, then output 2 and quit.

3. If $N = a^b$ for some integers $a, b \ge 2$, then output $a$ and quit. (To see that this can be done quickly, note that if $a, b \ge 2$ and $a^b = N$, then $2^b \le a^b = N$ and so $2 \le b \le \lg N$. For each $b$, you can try finding an integer $a$ such that $a^b = N$ by binary search.)

4. (At this point, $N$ is odd and not a power. This means that $N$ has at least two distinct odd prime factors; in particular, there are odd, coprime $p, q > 1$ such that $N = pq$.) Pick a random nonzero $x \in \mathbb{Z}_N$.

5. Compute $\gcd(x, N)$ with the Euclidean Algorithm. If $\gcd(x, N) > 1$, then output $\gcd(x, N)$ and quit.

6. (At this point, $x \in \mathbb{Z}_N^*$.) Use the order-finding black box to find the order $r$ of $x$ in $\mathbb{Z}_N^*$.

7. If $r$ is odd, then give up (i.e., output "I don't know" and quit).

8. ($r$ is even.) Compute $y = x^{r/2}$ in $\mathbb{Z}_N$. If $y = -1$ (in $\mathbb{Z}_N$), then give up.

9. ($y \ne -1$.) Compute $\gcd(y - 1, N)$ and output the result.

Shor's quantum algorithm provides the order-finding black box for this reduction.
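Here is a toy rendering of the whole reduction in Python, with a brute-force order finder standing in for the quantum black box (the names and the retry loop are my own choices; a real run would plug Shor's algorithm into `order_box`):

```python
import math, random

def brute_order(x, N):
    """Classical stand-in for the order-finding black box."""
    r, y = 1, x % N
    while y != 1:
        y, r = (y * x) % N, r + 1
    return r

def find_factor(N, order_box, max_tries=100):
    """Las Vegas reduction from Factoring to Order Finding (Steps 1-9)."""
    if N % 2 == 0:
        return 2                                       # Step 2
    for b in range(2, N.bit_length() + 1):             # Step 3: power check
        a = round(N ** (1.0 / b))
        for cand in (a - 1, a, a + 1):                 # guard rounding error
            if cand >= 2 and cand ** b == N:
                return cand
    for _ in range(max_tries):
        x = random.randrange(2, N)                     # Step 4
        g = math.gcd(x, N)                             # Step 5
        if g > 1:
            return g
        r = order_box(x, N)                            # Step 6
        if r % 2:
            continue                                   # Step 7: give up, retry
        y = pow(x, r // 2, N)                          # Step 8
        if y == N - 1:
            continue                                   # y = -1 in Z_N: give up
        return math.gcd(y - 1, N)                      # Step 9
    return None
```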
15 Week 8: Factoring and order finding (cont.)

This algorithm (really a randomized reduction of Factoring to Order Finding) is clearly efficient (polynomial time in $\lg N$), given black-box access to Order Finding. We need to check two things: (i) the algorithm, if it does not give up, outputs a nontrivial factor of $N$, and (ii) the probability of it giving up is not too big: at most $1 - \varepsilon$ for some constant $\varepsilon > 0$, say.

Notation 15.1 For $a, b \in \mathbb{Z}$, we let $a \mid b$ mean that $a$ divides $b$, or that $b$ is a multiple of $a$; precisely, there is a $c \in \mathbb{Z}$ such that $ac = b$. Clearly, if $a > 0$, then $a \mid b$ iff $b = 0$ in $\mathbb{Z}_a$. We write $a \nmid b$ to mean that $a$ does not divide $b$.
Anything the algorithm outputs in Steps 2, 3, or 5 is clearly correct. The only other output step is Step 9. We claim that $\gcd(y - 1, N)$ is a nontrivial factor of $N$: We have $y \ne -1$ in $\mathbb{Z}_N$ by assumption, or equivalently, $N \nmid y + 1$. Also, $y \ne 1$ in $\mathbb{Z}_N$, since otherwise $x^{r/2} = 1$ in $\mathbb{Z}_N$, which contradicts the fact that $r$ is the least such exponent. Thus $N \nmid y - 1$. Yet we have $y^2 = x^r = 1$ in $\mathbb{Z}_N$, which means that $N \mid y^2 - 1 = (y+1)(y-1)$. So $N$ divides $(y+1)(y-1)$ but neither of its two factors. The only way this can happen is when $y + 1$ includes some, but not all, of the prime factors of $N$, and likewise with $y - 1$. Thus $1 < \gcd(y - 1, N) < N$, and so we output a nontrivial factor of $N$ in Step 9.
The algorithm could give up in Steps 7 or 8. Giving up in Step 7 means that $r$ is odd. We show that at most half the elements of $\mathbb{Z}_N^*$ have odd order, and so the algorithm gives up in Step 7 with probability at most 1/2. In fact, we show that if $x \in \mathbb{Z}_N^*$ has odd order $r$, then $-x$ in $\mathbb{Z}_N$ (which is also in $\mathbb{Z}_N^*$) has order $2r$. So at least one element of each pair $\pm x$ has even order, and so we're done, since $\mathbb{Z}_N^*$ is made up of such disjoint pairs. First, we have
$$(-x)^{2r} = (-1)^{2r}x^{2r} = \bigl((-1)^2\bigr)^r(x^r)^2 = 1^r 1^2 = 1,$$
where all arithmetic is in $\mathbb{Z}_N$. So $-x$ has order at most $2r$. Now suppose that $-x$ has order $s < 2r$. Then $1 = (-x)^s = (-1)^s x^s$. We must have $s \ne r$, for otherwise this would become $1 = (-1)^r x^r = (-1)^r = -1$, since $r$ is odd (and since $N > 2$, we have $1 \ne -1$ in $\mathbb{Z}_N$). Now since $0 < s < 2r$ but $s \ne r$, we must have $x^s \ne 1$, and because $(-1)^s x^s = 1$, we cannot have $(-1)^s = 1$. Thus $(-1)^s = x^s = -1$. But now,
$$-1 = (-1)^r = (x^s)^r = x^{rs} = (x^r)^s = 1^s = 1,$$
contradiction. Therefore, $-x$ has order $2r$.
We claim that, if the algorithm makes it to Step 8, then it gives up in this step at most half the time. We won't prove the claim, since that would get us too much into number-theoretic waters, but we'll give some reasonable evidence that it is true. Recall that by Step 8, we have $N = pq$, where $p$ and $q$ are odd and coprime. Define a map $d : \mathbb{Z}_N \to \mathbb{Z}_p \times \mathbb{Z}_q$ such that $d(x) = (x \bmod p, x \bmod q)$ for all $x \in \mathbb{Z}_N$. Here are some easy-to-prove facts about $d$. To avoid confusion, for any $n > 1$ we'll use $+_n$ and $\cdot_n$ to denote addition and multiplication in $\mathbb{Z}_n$, respectively. Let $x, y \in \mathbb{Z}_N$ be arbitrary, and let $d(x) = (x_1, x_2)$ and $d(y) = (y_1, y_2)$.
- $d(x +_N y) = (x_1 +_p y_1,\; x_2 +_q y_2)$.
- $d(x \cdot_N y) = (x_1 \cdot_p y_1,\; x_2 \cdot_q y_2)$.
- $d(1) = (1, 1)$.
- $d(-1) = (-1, -1)$. More generally, $d(-x) = (-x_1, -x_2)$.
- $x \in \mathbb{Z}_N^*$ if and only if $x_1 \in \mathbb{Z}_p^*$ and $x_2 \in \mathbb{Z}_q^*$.
It turns out that $d$ is a bijection from $\mathbb{Z}_N$ to $\mathbb{Z}_p \times \mathbb{Z}_q$. This is a consequence of the following classic theorem in number theory:

Theorem 15.2 (Chinese Remainder Theorem (dyadic version)) Let $p, q > 0$ be coprime and let $N = pq$. Define $d : \mathbb{Z}_N \to \mathbb{Z}_p \times \mathbb{Z}_q$ by $d(x) = (x \bmod p, x \bmod q)$. Then $d$ is a bijection, i.e., for every $x_1 \in \mathbb{Z}_p$ and $x_2 \in \mathbb{Z}_q$, there exists a unique $x \in \mathbb{Z}_N$ such that $d(x) = (x_1, x_2)$.
I'll include the proof here for you to read on your own if you want, but I won't present it in class.

Proof. Set $\tilde{p} = p \bmod q$ and $\tilde{q} = q \bmod p$. Since $\gcd(p, q) = 1$, we also have $\gcd(\tilde{p}, q) = \gcd(p, q) = 1$, and hence $\tilde{p} \in \mathbb{Z}_q^*$ and $\tilde{q} \in \mathbb{Z}_p^*$. Let $\tilde{p}^{-1}$ and $\tilde{q}^{-1}$ be the reciprocals of $\tilde{p}$ in $\mathbb{Z}_q$ and of $\tilde{q}$ in $\mathbb{Z}_p$, respectively. Given any $x_1 \in \mathbb{Z}_p$ and $x_2 \in \mathbb{Z}_q$, let $x = (x_1\tilde{q}^{-1}q + x_2\tilde{p}^{-1}p) \bmod N$ (normal arithmetic in $\mathbb{Z}$). Clearly, $x \in \mathbb{Z}_N$. Then letting $d(x) = (y_1, y_2)$, we get
$$y_1 = [(x_1\tilde{q}^{-1}q + x_2\tilde{p}^{-1}p) \bmod N] \bmod p = (x_1\tilde{q}^{-1}q + x_2\tilde{p}^{-1}p) \bmod p = x_1\tilde{q}^{-1}q \bmod p = x_1\tilde{q}^{-1}\tilde{q} \bmod p = x_1 \bmod p = x_1,$$
and similarly,
$$y_2 = [(x_1\tilde{q}^{-1}q + x_2\tilde{p}^{-1}p) \bmod N] \bmod q = (x_1\tilde{q}^{-1}q + x_2\tilde{p}^{-1}p) \bmod q = x_2\tilde{p}^{-1}p \bmod q = x_2\tilde{p}^{-1}\tilde{p} \bmod q = x_2 \bmod q = x_2.$$
Thus $d(x) = (x_1, x_2)$, which proves that $d$ is surjective. To see that $d$ is injective, let $x, y \in \mathbb{Z}_N$ be such that $d(x) = d(y) = (x_1, x_2)$. Then $d(x -_N y) = (x_1 -_p x_1, x_2 -_q x_2) = (0, 0)$, and so we have $(x - y) \bmod p = (x - y) \bmod q = 0$, or equivalently, $p \mid x - y$ and $q \mid x - y$. But since $p$ and $q$ are coprime, we must have $N \mid x - y$, and so
$$x = x \bmod N = y \bmod N = y,$$
which shows that $d$ is an injection. $\Box$
We won't discuss it here, but given $x_1, x_2$, one can quickly (and classically) compute inverses in $\mathbb{Z}_n$, and thus find the unique $x$ such that $d(x) = (x_1, x_2)$, using the Extended Euclidean Algorithm.
Exercise 15.3 In this exercise, you will prove some standard results about the cardinality of $\mathbb{Z}_n^*$ for any integer $n > 1$. For any such $n$, the Euler totient function is defined as
$$\varphi(n) := |\mathbb{Z}_n^*|,$$
which is the number of elements of $\mathbb{Z}_n$ that are relatively prime to $n$. By convention, $\varphi(1) := 1$.

1. Show that if $a, b > 0$ are coprime, then $\varphi(ab) = \varphi(a)\varphi(b)$. [Hint: Show that the bijection $d$ defined in Theorem 15.2 above (with $p = a$ and $q = b$) matches elements of $\mathbb{Z}_{ab}^*$ with elements of $\mathbb{Z}_a^* \times \mathbb{Z}_b^*$ and vice versa.]

2. Show that if $n$ is some power of a prime $p$, then $\varphi(n) = n(p-1)/p$. [Hint: An element $x \in \mathbb{Z}_n$ is relatively prime to $n$ iff $x$ is not a multiple of $p$.]

3. Conclude that if $n = q_1^{e_1}q_2^{e_2}\cdots q_k^{e_k}$ is the prime factorization of $n$, where $q_1 < q_2 < \cdots < q_k$ are all prime and $e_1, e_2, \ldots, e_k > 0$, then
$$\varphi(n) = \prod_{j=1}^{k} q_j^{e_j - 1}(q_j - 1).$$
Dividing this by $n$, we get
$$\frac{\varphi(n)}{n} = \prod_{j=1}^{k}\left(1 - \frac{1}{q_j}\right). \qquad (58)$$

4. Using Equation (58), prove that for all integers $n > 0$,
$$\frac{\varphi(n)}{n} \ge \frac{1}{1 + \lg n}. \qquad (59)$$
[Hint: Use the fact that if $n = q_1^{e_1}\cdots q_k^{e_k}$ is the prime factorization of $n$ as above, then $k \le \lg n$ (why?) and the fact that $q_j \ge j + 1$ for all $1 \le j \le k$ (why?).]
Exercise 15.4 (Challenging) Show that $\varphi(n)/n \ge 1/\lg n$ for all integers $n > 1$ except 2 and 6. [Hint: For $\varepsilon > 0$, let $n_\varepsilon$ ...]

... if $x$ is a random element of $\mathbb{Z}_N^*$ with even order, then $x^{r/2}$ is at least as likely to be one of the other square roots of 1 as $-1$, where $r$ is the order of $x$. Thus Step 8 gives up with probability at most 1/2.
So the whole reduction succeeds in outputting a nontrivial factor of $N$ with probability at least 1/4. As with Simon's algorithm, we can expect to run this reduction about four times to find such a factor. Running it additional times decreases the likelihood of failure exponentially.
Geometric series. This elementary fact will be useful in what is to come.

Proposition 15.5 For any $r \in \mathbb{C}$ such that $r \ne 1$, and for any integer $n \ge 0$,
$$\sum_{i=0}^{n-1} r^i = \frac{r^n - 1}{r - 1}.$$

You can prove this by induction on $n$. If $n = 0$, then both sides are 0. Now assume the equation holds for fixed $n \ge 0$. Then
$$\sum_{i=0}^{n} r^i = r^n + \sum_{i=0}^{n-1} r^i = r^n + \frac{r^n - 1}{r - 1} = \frac{r^n(r - 1) + r^n - 1}{r - 1} = \frac{r^{n+1} - 1}{r - 1}.$$
The sum $\sum_{i=0}^{n-1} r^i$ is called a finite geometric series with ratio $r$.
The Quantum Fourier Transform. The Fourier transform is of fundamental importance in many areas of science, math, and engineering. For example, it is used in signal processing to pick out component frequencies in a periodic signal (and we will see how this applies to Shor's order-finding algorithm). The auditory canal inside your ear acts as a natural Fourier transformer, allowing your brain to register different frequencies (of musical notes, say) inherent in the sound waves entering the ear.
A quantum version of the Fourier transform, known as the quantum Fourier transform or QFT, is a crucial ingredient in Shor's algorithm.
Let $m > 1$ be an integer. We will define the $m$-dimensional discrete Fourier transform.$^{12}$ $\mathrm{DFT}_m$ is a linear map $\mathbb{C}^m \to \mathbb{C}^m$ that takes a vector $x = (x_0, \ldots, x_{m-1}) \in \mathbb{C}^m$ and maps it to the vector $y = (y_0, \ldots, y_{m-1}) \in \mathbb{C}^m$ satisfying
$$y_j = \frac{1}{\sqrt{m}}\sum_{k=0}^{m-1} e^{2\pi i jk/m}\,x_k$$
for all $0 \le j < m$.$^{13}$

Set $\omega_m := e^{2\pi i/m}$. Clearly, $m$ is the least positive integer such that $\omega_m^m = 1$. We call $\omega_m$ the principal $m$th root of unity. Note that $\omega_m^a = \omega_m^{a \bmod m}$ for any $a \in \mathbb{Z}$, so we can consider the exponent of $\omega_m$ to be an element of $\mathbb{Z}_m$.

The matrix corresponding to $\mathrm{DFT}_m$ is the $m \times m$ matrix whose $(j,k)$th entry is $[\mathrm{DFT}_m]_{jk} = \omega_m^{jk}/\sqrt{m}$. To check that $\mathrm{DFT}_m$ is unitary, we verify that $(\mathrm{DFT}_m)^\dagger\,\mathrm{DFT}_m$ has diagonal entries 1 and off-diagonal entries 0. For general $j, k$, we have
$$[(\mathrm{DFT}_m)^\dagger\,\mathrm{DFT}_m]_{jk} = \frac{1}{m}\sum_{\ell \in \mathbb{Z}_m}\omega_m^{-\ell j}\omega_m^{\ell k} = \frac{1}{m}\sum_{\ell \in \mathbb{Z}_m}\omega_m^{\ell(k-j)}. \qquad (60)$$
If $j = k$, then the right-hand side is $(1/m)\sum_{\ell \in \mathbb{Z}_m} 1 = 1$. Now suppose $j \ne k$. Then $0 < |k - j| < m$, and so $\omega_m^d \ne 1$, where $d = k - j$. To see that the sum on the right-hand side of (60) is 0, notice that it is a finite geometric series with ratio $\omega_m^d \ne 1$, and so we have
$$\sum_{\ell \in \mathbb{Z}_m}(\omega_m^d)^\ell = \frac{(\omega_m^d)^m - 1}{\omega_m^d - 1} = \frac{(\omega_m^m)^d - 1}{\omega_m^d - 1} = 0,$$
because $(\omega_m^m)^d = 1^d = 1$.
Naively applying $\mathrm{DFT}_m$ to a vector in $\mathbb{C}^m$ requires $\Theta(m^2)$ scalar arithmetic operations. A much faster method, known as the Fast Fourier Transform (FFT), can do this with $O(m \lg m)$ scalar arithmetic operations. The FFT was described by Cooley & Tukey in 1965, but the same idea can be traced back to Gauss. It uses divide-and-conquer, and is easiest to describe when $m$ is a power of 2. The FFT is also easily parallelizable: it can be computed by an arithmetic circuit of width $m$ and depth $\lg m$ called a butterfly network. Because of this, the FFT has been rated as the second most useful algorithm ever, second only to fast sorting. Besides its use in digital signal processing, it is also used to ...
$^{12}$There are continuous versions of the Fourier transform.

$^{13}$There is some variation in the definition of $\mathrm{DFT}_m$ in different sources; for example, there may be a minus sign in the exponent of $e$, or there may be no factor $1/\sqrt{m}$.

For $n$ qubits, write $\mathrm{QFT}_n$ for the $n$-qubit gate whose matrix is $\mathrm{DFT}_{2^n}$, and abbreviate $e_k(z) := e^{2\pi i z/2^k}$, so that $\mathrm{QFT}_n|x\rangle = 2^{-n/2}\sum_{y \in \mathbb{Z}_{2^n}} e_n(xy)|y\rangle$. Interestingly, this sum factors completely:
$$\mathrm{QFT}_n|x\rangle = \frac{1}{2^{n/2}}\,(|0\rangle + e_1(x)|1\rangle) \otimes (|0\rangle + e_2(x)|1\rangle) \otimes \cdots \otimes (|0\rangle + e_n(x)|1\rangle) = \frac{1}{2^{n/2}}\bigotimes_{k=1}^{n}(|0\rangle + e_k(x)|1\rangle).$$
Exercise 15.6 (Challenging) Verify this fact.
Before we describe a circuit for $\mathrm{QFT}_n$, we will sketch out and analyze Shor's quantum algorithm for order finding, which is a Monte Carlo algorithm. This description and the $\mathrm{QFT}_n$ circuit layout later on are adapted with modifications from a paper by Cleve & Watrous in 2000.

1. Input: $N > 1$ and $a \in \mathbb{Z}_N^*$ with $a > 1$. (The algorithm attempts to find the order of $a$ in $\mathbb{Z}_N^*$.) Let $n = \lceil\lg N\rceil$.

2. Initialize a $2n$-qubit register and an $n$-qubit register in the state $|0\rangle|0\rangle$. Here we will label the basis states of a register with nonnegative integers via their usual binary representations.

3. Apply a Hadamard gate to each qubit of the first register, obtaining the state
$$(H^{\otimes 2n} \otimes I)|0\rangle|0\rangle = \frac{1}{2^n}\sum_{x \in \mathbb{Z}_{2^{2n}}} |x\rangle|0\rangle.$$

4. Apply a classical quantum circuit for modular exponentiation that sends $|x\rangle|0\rangle$ to $|x\rangle|a^x \bmod N\rangle$, obtaining the state
$$|\psi\rangle = \frac{1}{2^n}\sum_{x \in \mathbb{Z}_{2^{2n}}} |x\rangle|a^x \bmod N\rangle. \qquad (61)$$
(We can imagine that $N$ and $a$ are hard-coded into the circuit, which means that the circuit must be built in a preprocessing step after the inputs $N$ and $a$ are known. Alternatively, we can keep $N$ and $a$ in separate quantum registers that don't change during the course of the computation, then feed them into this circuit when they're needed.)

5. (Optional) Measure the second register in the computational basis, obtaining some classical value $w \in \mathbb{Z}_N$, which is ignored.$^{14}$

6. Apply $\mathrm{QFT}_{2n}$ to the first register.

7. Measure the first register (in the computational basis), obtaining some value $y \in \mathbb{Z}_{2^{2n}}$. (This ends the quantum part of the algorithm.)

8. Find the smallest coprime integers $k$ and $r > 0$ such that
$$\left|\frac{y}{2^{2n}} - \frac{k}{r}\right| \le 2^{-2n-1}. \qquad (62)$$
(This can be done classically using continued fractions. See below.)

9. Classically compute $a^r \bmod N$. If the result is 1, then output $r$. Otherwise, give up.
16 Week 8: Shor's algorithm (cont.)

Let $R$ be the order of $a$ in $\mathbb{Z}_N^*$. The whole key to proving that Shor's algorithm works is to show that in Step 9 the algorithm outputs $R$ with high probability. First, we'll show that a single run of the algorithm above outputs $R$ with probability at least $4/(\pi^2 n) - O(n^{-2})$, and so if we run the algorithm about $\pi^2 n/4$ times, we will succeed with high probability. The actual single-run success probability is usually much higher than $4/(\pi^2 n)$, but $4/(\pi^2 n)$ is a good enough approximate lower bound, and it is easier to derive than a tighter lower bound. After the analysis, we'll discuss how the quantum Fourier transform and the (classical) continued fraction algorithm used in Step 8 are implemented.

$^{14}$Since we ignore the result of the measurement, this step is entirely superfluous; the algorithm would work just as well without it. Including this step, however, collapses the state, which simplifies the analysis greatly and allows us to ignore the second register altogether.
Shor's algorithm, if it succeeds, is guaranteed to output some $r > 0$ such that $a^r = 1$ in $\mathbb{Z}_N$. It is possible, although very unlikely, that $r$ is a multiple of $R$ but not equal to it. If we run the algorithm until it succeeds $k$ times and take the gcd of the $k$ results, then the chances of not getting $R$ are at most $(1 - 4/(\pi^2 n))^k$, which decreases exponentially with $k$. If we only want to find a nontrivial factor of $N$, then we use this algorithm to implement the black box in the Factoring-to-Order-Finding reduction. As the next exercise shows, we don't need to worry about the exact value returned by the black box if it succeeds.
Exercise 16.1 Suppose that on input $N$ and $x \in \mathbb{Z}_N^*$, the black box used in the reduction from Factoring to Order Finding is only guaranteed to output some $r$ with $0 < r < 2^{2n}$ such that $x^r = 1$ in $\mathbb{Z}_N$, where $n = \lceil\lg N\rceil$. Show how to modify the reduction slightly so that it succeeds with the same probability as it did before, when the box always outputted the order of $x$ in $\mathbb{Z}_N^*$. [Hint: Let $R$ be the order of $x$ in $\mathbb{Z}_N^*$. First, given any multiple $r$ of $R$, show how to find an odd multiple of $R$ (that is, a number of the form $cR$ where $c$ is odd) that is no bigger than $r$. Second, show that the probability of success of the reduction is the same if the black box returns some odd multiple of $R$.]
Analysis of Shor's Algorithm. Let $R$ be the order of $a$ in $\mathbb{Z}_N^*$. We can express $x$ uniquely as $qR + s$ with $s \in \mathbb{Z}_R$, and note that, owing to the periodicity of $a^x \bmod N$,
$$a^x \bmod N = a^{qR+s} \bmod N = (a^R)^q a^s \bmod N = 1^q a^s \bmod N = a^s \bmod N.$$
Then letting
$$M := \left\lfloor \frac{2^{2n}}{R} \right\rfloor,$$
we rewrite the state $|\psi\rangle$ of Equation (61) as
$$|\psi\rangle = \frac{1}{2^n}\left[\sum_{q \in \mathbb{Z}_M}\sum_{s \in \mathbb{Z}_R} |qR + s\rangle|a^s \bmod N\rangle + \sum_{s=0}^{(2^{2n} \bmod R)-1} |MR + s\rangle|a^s \bmod N\rangle\right].$$
Now when the second register is measured in the next step, we obtain some $w = a^s \bmod N$ corresponding to some unique $s \in \mathbb{Z}_R$. The state after this measurement then collapses to either
$$\frac{1}{\sqrt{M+1}}\left[\sum_{q \in \mathbb{Z}_{M+1}} |qR + s\rangle\right]|w\rangle,$$
if $0 \le s < 2^{2n} \bmod R$, or to
$$\frac{1}{\sqrt{M}}\left[\sum_{q \in \mathbb{Z}_M} |qR + s\rangle\right]|w\rangle,$$
if $2^{2n} \bmod R \le s < R$. It does not really matter which is the case, as the analysis is nearly identical and the conclusions (particularly Corollary 16.4, below) are the same either way, so for simplicity, we'll assume the latter case applies.$^{15}$ Also, the second register will no longer participate in the algorithm, so we can ignore it from now on. To summarize, the post-measurement state of the first register is then given as
$$|\varphi\rangle = \frac{1}{\sqrt{M}}\sum_{q \in \mathbb{Z}_M} |qR + s\rangle. \qquad (63)$$
The next step of the algorithm applies $\mathrm{QFT}_{2n}$ to this state to obtain
$$\begin{aligned}
|\varphi'\rangle = \mathrm{QFT}_{2n}|\varphi\rangle &= \frac{1}{\sqrt{M}}\sum_{q \in \mathbb{Z}_M} \mathrm{QFT}_{2n}|qR + s\rangle &(64)\\
&= \frac{1}{2^n\sqrt{M}}\sum_{q \in \mathbb{Z}_M}\sum_{y \in \mathbb{Z}_{2^{2n}}} e_{2n}((qR + s)y)\,|y\rangle &(65)\\
&= \frac{1}{2^n\sqrt{M}}\sum_{y}\Bigl[\sum_{q} e_{2n}((qR + s)y)\Bigr]|y\rangle &(66)\\
&= \frac{1}{2^n\sqrt{M}}\sum_{y} e_{2n}(sy)\Bigl[\sum_{q} e_{2n}(qRy)\Bigr]|y\rangle. &(67)
\end{aligned}$$
Finally, the first register is measured, obtaining $y \in \mathbb{Z}_{2^{2n}}$ with probability
$$\Pr[y] = \left|\frac{e_{2n}(sy)}{2^n\sqrt{M}}\sum_{q} e_{2n}(qRy)\right|^2 = \frac{|e_{2n}(sy)|^2}{2^{2n}M}\left|\sum_{q} e_{2n}(qRy)\right|^2 = \frac{1}{2^{2n}M}\left|\sum_{q \in \mathbb{Z}_M} e_{2n}(qRy)\right|^2.$$
We'll show that $\Pr[y]$ spikes when $y/2^{2n}$ is close to a multiple of $1/R$, but first some intuition. Permit me an acoustical analogy. Think of the column vector $|\varphi\rangle$ as a periodic signal with period $R$, i.e., the entries at indices $x = qR + s$ (for integral $q$) have value $1/\sqrt{M}$, and all the other entries are 0. The frequency of this signal is then $1/R$, and since the Fourier transform is good at picking out frequencies, we'd expect to see a spike in the probability amplitude of the Fourier-transformed state $|\varphi'\rangle$ of Equation (67) right around the frequencies $1/R, 2/R, 3/R, \ldots$, with $1/R$ being the fundamental component of the signal and the others being overtones. This is exactly what happens, and it is the whole point of using the QFT. The larger the signal sample, the sharper and narrower the spikes will be. We choose a sample of length $2^{2n}$, which is at least $N^2$, giving us at least $N^2/R \ge N$ periods of the function. This turns out to give us sufficiently sharp spikes to approximate $R$ with high probability.

$^{15}$For the former case, just substitute $M+1$ for $M$ in the analysis to follow.
We now concentrate on the scalar quantity
$$\sum_{q \in \mathbb{Z}_M} e_{2n}(qRy) \qquad (68)$$
in the expression for $\Pr[y]$, above. For every $y \in \mathbb{Z}_{2^{2n}}$ define
$$s_y := \begin{cases} Ry \bmod 2^{2n} & \text{if } Ry \bmod 2^{2n} \le 2^{2n-1}, \\ (Ry \bmod 2^{2n}) - 2^{2n} & \text{if } Ry \bmod 2^{2n} > 2^{2n-1}. \end{cases}$$
That is, $s_y$ is the remainder of $Ry$ divided by $2^{2n}$ with least absolute value. We have $|s_y| \le 2^{2n-1}$, and in addition, $s_y \bmod 2^{2n} = Ry \bmod 2^{2n}$, and thus
$$e_{2n}(qRy) = e_{2n}(qs_y) \qquad (69)$$
for all $q$, and so (68) becomes
$$\sum_{q \in \mathbb{Z}_M} e_{2n}(qRy) = \sum_{q \in \mathbb{Z}_M} e_{2n}(qs_y) = \begin{cases} \dfrac{e_{2n}(Ms_y) - 1}{e_{2n}(s_y) - 1} & \text{if } s_y \ne 0, \\[1ex] M & \text{if } s_y = 0, \end{cases} \qquad (70)$$
noting that $\sum_{q \in \mathbb{Z}_M} e_{2n}(qs_y)$ is a finite geometric series with ratio $e_{2n}(s_y) \ne 1$, provided $s_y \ne 0$.
If $|s_y|$ is small, then $Ry/2^{2n}$ is close to an integer, and so $y/2^{2n}$ is close to an integer multiple of $1/R$, which makes Step 8 of the algorithm more likely to find $r = R$. So we want to show that $|s_y|$ is small with reasonably high probability. The following claim shows that if $|s_y|$ is small enough, then (70) has large absolute value. This is true intuitively because the terms of the sum on the left are all pointing roughly in the same direction in the complex plane, and so they add constructively. (Conversely, if $|s_y|$ is large, then the terms in the sum wrap around the unit circle many times, mostly canceling each other out and giving (70) a small absolute value.)

Definition 16.2 We say that $y \in \mathbb{Z}_{2^{2n}}$ is okay if $|s_y| \le R/2$.

Claim 16.3 If $y$ is okay, then
$$\left|\sum_{q \in \mathbb{Z}_M} e_{2n}(qRy)\right| \ge 2M/\pi.$$
Proof. Fix $y$ and suppose that $|s_y| \le R/2$. If $s_y = 0$, then the claim clearly holds by (70), so assume $s_y \ne 0$. Starting from Equation (70) and using Exercise 2.3, we have
$$\left|\frac{e_{2n}(Ms_y) - 1}{e_{2n}(s_y) - 1}\right| = \left|\frac{e_{2n}(Ms_y/2)\,[e_{2n}(Ms_y/2) - e_{2n}(-Ms_y/2)]}{e_{2n}(s_y/2)\,[e_{2n}(s_y/2) - e_{2n}(-s_y/2)]}\right| = \left|\frac{e_{2n}(Ms_y/2) - e_{2n}(-Ms_y/2)}{e_{2n}(s_y/2) - e_{2n}(-s_y/2)}\right| = \left|\frac{2i\sin(M\theta)}{2i\sin\theta}\right| = \left|\frac{\sin(M\theta)}{\sin\theta}\right|,$$
where $\theta := \pi s_y/2^{2n}$. Since we have
$$|Ms_y| \le \frac{2^{2n}}{R}\,|s_y| \le \frac{2^{2n}}{R}\cdot\frac{R}{2} = 2^{2n-1},$$
we know that $|\theta| \le \pi/(2M)$ and $|M\theta| \le \pi/2$. This gives
$$\left|\frac{\sin(M\theta)}{\sin\theta}\right| = \frac{\sin|M\theta|}{\sin|\theta|} \ge \frac{\sin|M\theta|}{|\theta|}.$$
It is readily checked that the function $\frac{\sin x}{x}$ is decreasing in the interval $(0, \pi/2]$, so
$$\frac{\sin|M\theta|}{|\theta|} \ge \frac{\sin(\pi/2)}{\pi/(2M)} = \frac{2M}{\pi},$$
as desired. $\Box$
Corollary 16.4 If $y$ is okay, then
$$\Pr[y] \ge \frac{4M}{\pi^2 2^{2n}} \ge \frac{4}{\pi^2 R} - O(2^{-2n}).$$

So for each individual okay $y \in \mathbb{Z}_{2^{2n}}$, we get a relatively large (but still exponentially small) probability of seeing that particular $y$. We'll need the additional fact that there are many okay $y$. The following claim is obvious, so we'll give it without proof:

Claim 16.5 For every $k \in \mathbb{Z}_R$, there exists $y \in \mathbb{Z}_{2^{2n}}$ such that $Ry$ is in the closed interval $[2^{2n}k - R/2,\ 2^{2n}k + R/2]$. Each such $y$ is okay.

These intervals are pairwise disjoint for different $k$. This means there are at least $R$ many okay $y$. By Corollary 16.4, the chances of finding an okay $y$ in Step 7 of the algorithm are then
$$\Pr[y \text{ is okay}] = \sum_{y \text{ is okay}} \Pr[y] \ge R\left(\frac{4}{\pi^2 R} - O(2^{-2n})\right) \ge \frac{4}{\pi^2} - O(2^{-n}).$$
Let $y$ be the value measured in Step 7, and suppose that $y$ is okay. Then there is some least $k_y \in \mathbb{Z}$ such that
$$2^{2n}k_y - R/2 \le Ry \le 2^{2n}k_y + R/2. \qquad (71)$$
(Actually, $k_y$ is unique satisfying (71), because the intervals don't overlap.) Dividing by $2^{2n}R$ and rearranging, (71) becomes
$$\left|\frac{y}{2^{2n}} - \frac{k_y}{R}\right| \le 2^{-2n-1},$$
and so $k_y/R$ satisfies Equation (62). Now Step 8 produces the least $k$ and $r$ satisfying (62), so we have two possible issues to address:
105
1. The k and r found in Step 8 are such that k/r ≠ k_y/R.

2. The k and r found in Step 8 satisfy k/r = k_y/R, but r < R because the fraction k_y/R
is not in lowest terms (k/r is always given in lowest terms).
It turns out that the first issue never arises. To see this, first notice that k/r and k_y/R must
be close to each other, because they are both close to y/2^{2n}:

    | k_y/R − k/r | ≤ | k_y/R − y/2^{2n} | + | y/2^{2n} − k/r | ≤ 2^{−2n}, (72)

since both k/r and k_y/R satisfy (62). Now suppose for the sake of contradiction that
k/r ≠ k_y/R. Recall that R < N ≤ 2^n, and also note that r ≤ R by the minimality of r. Then
we also have

    0 ≠ | k_y/R − k/r | = | k_y r − kR | / (rR) ≥ 1/(rR) > 2^{−2n},

which contradicts (72). (The first inequality comes from the fact that k_y r − kR is a nonzero
integer; the second comes from the fact that r ≤ R < 2^n.) Thus if y is okay, we must have
k/r = k_y/R.
The second issue is more of a problem. It arises when k_y and R are not coprime,
whence k = k_y/g and r = R/g, where g = gcd(k_y, R) > 1.

Definition 16.6 We say that y ∈ Z_{2^{2n}} is good if y is okay and k_y is relatively prime to R.
Claim 16.7 If Step 7 produces a good y, then r = R is found in Step 8.

Proof. Let y ∈ Z_{2^{2n}} be good. We have k/r = k_y/R, since y is okay. But since both fractions
are in lowest terms, we must have k = k_y and r = R. □

Claim 16.8 There are at least φ(R) many good y ∈ Z_{2^{2n}}.

(Recall that φ(n) is Euler's totient function, defined in Exercise 15.3.)

Proof. Claim 16.5 says that every k ∈ Z_R is equal to k_y for some okay y ∈ Z_{2^{2n}}. There are
φ(R) many k coprime with R, so there are at least φ(R) many good y. □
Now we can combine all our claims to get our main Theorem 16.9, below.

Theorem 16.9 The probability that r = R is found in Step 8 is at least 4/(π²n) − O(n^{−2}).

Proof. By Claim 16.7, it suffices to show that a good y is found in Step 7 with probability
at least 4/(π²n) − O(n^{−2}). By Claim 16.8 and Equation (59), there are at least

    φ(R) ≥ R/(1 + lg R) ≥ R/(1 + lg N) ≥ R/(n + 1)

many good y. By Corollary 16.4, each good y occurs with probability at least about
4/(π²R), and so

    Pr[y is good] ≥ (R/(n + 1)) · ( 4/(π²R) − O(2^{−2n}) ) ≥ 4/(π²n) − O(n^{−2}).  □
There are some tricks to (modestly) boost the probability of success of Shor's algo-
rithm while keeping the number of repetitions of the whole quantum computation to a
minimum. For example, if an okay y is returned in Step 8 that is not good, it may be that
gcd(k_y, R) is reasonably small, in which case R is a small multiple of r. If you can only
afford to run the quantum portion of the computation once, then in Step 9, you could
try computing a^r, a^{2r}, a^{3r}, …, a^{nr} (all mod N) and return the least exponent yielding 1, if
there is one. If not, you could try relaxing the distance bound 2^{−(2n+1)} in (62) to something
bigger, in the hope that the y you found, if not okay, is close to okay (if y is not okay,
it is more likely than a random y to be close to one that is). If you can afford to run the
quantum computation twice, obtaining r_1 and r_2 respectively in Step 8, then taking the
least common multiple lcm(r_1, r_2) yields R with much higher probability than you can get
by running the quantum computation just once.

This concludes the analysis of Shor's algorithm. The only things left are (i) to show
how the QFT is implemented efficiently with a quantum circuit, and (ii) to describe how
Step 8 is implemented by a classical algorithm. We'll take these in reverse order.
17 Week 9: Best rational approximations

The Continued Fraction Algorithm. The book illustrates continued fractions as part of
the order-finding algorithm, with Theorem 5.1 on page 229, and Box 5.3 on the next page.
We actually don't need to talk about continued fractions explicitly. All we need is to find
an efficient classical algorithm to implement Step 8, which we'll do directly now.

For any real numbers a < b, there are infinitely many rational numbers in the interval
[a, b]. We want to find one with smallest denominator and numerator.

Definition 17.1 Let a, b ∈ R with 0 < a < b. Define d to be the least positive denominator
of any fraction in [a, b]. Now define n ∈ Z to be least such that n/d ∈ [a, b]. We call the
fraction n/d the simplest rational interpolant,^{16} or SRI, of a and b, and we denote it SRI(a, b).

The fraction k/r found in Step 8 is just SRI((2y − 1)/2^{2n+1}, (2y + 1)/2^{2n+1}).
Here is a simple, efficient, recursive algorithm to find SRI(a, b) for positive rational
a < b. Each step will include a comment explaining why it is correct.

SRI(a, b):

Input: Rational numbers a, b with 0 < a < b, each given in numerator/denominator
form, where both numerator and denominator are in binary.

Base Case: If a ≤ 1 ≤ b, then return 1 = 1/1. (Clearly, this is the simplest possible
fraction!)

First Recursive Case: If 1 < a, then

1. Let q = ⌈a⌉ − 1 be the largest integer strictly less than a.
2. Recursively compute r = SRI(a − q, b − q).
3. Return r + q.

(Obviously, shifting the interval [a, b] by an integral amount shifts the SRI by the same
amount. Also note that q ≥ a/2, a fact that will be useful later.)

Second Recursive Case: Otherwise, b < 1.

1. Recursively compute r = SRI(1/b, 1/a).
2. Return 1/r.
^{16} I'm making this term up. I'm sure there must be an official name for it, but I haven't found what it is.
(We claim that SRI(a, b) = 1/SRI(1/b, 1/a). Write n/d := SRI(a, b) and n′/d′ :=
SRI(1/b, 1/a); we must show that d′/n′ = n/d. Since n/d ∈ [a, b], we clearly have
d/n ∈ [1/b, 1/a], and so d′ ≤ n by the minimality of d′. Similarly, since d′/n′ ∈ [a, b],
we have d ≤ n′ by the minimality of d. In fact, a fraction of least denominator in an
interval also has least numerator among all fractions in that interval (one can see this,
e.g., via the Stern–Brocot tree), so the same reasoning gives n′ ≤ d and n ≤ d′. Hence
n′ = d and d′ = n, so d′/n′ = n/d, and SRI(a, b) = 1/SRI(1/b, 1/a).)
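The three cases above transcribe directly into a short program over exact rationals (this is essentially Exercise 17.3, below). Here is a minimal Python sketch:

```python
import math
from fractions import Fraction

def sri(a: Fraction, b: Fraction) -> Fraction:
    """Simplest rational interpolant of the interval [a, b], for 0 < a < b."""
    assert 0 < a < b
    if a <= 1 <= b:                  # base case: 1/1 is the simplest fraction
        return Fraction(1)
    if a > 1:                        # first recursive case: shift down by an integer
        q = math.ceil(a) - 1         # largest integer strictly less than a
        return sri(a - q, b - q) + q
    return 1 / sri(1 / b, 1 / a)     # second recursive case: b < 1, so invert
```

For example, sri(Fraction(2, 7), Fraction(1, 2)) returns 1/2, the fraction of least denominator in [2/7, 1/2].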
The comments suggest that the SRI algorithm is correct as long as it halts. It does halt,
and quickly, too. Let the original inputs be a = a_0 = n_0/d_0 and b = b_0 = N_0/D_0, given as
fractions in lowest terms (n_0, d_0, N_0, and D_0 are all positive integers). Similarly, for 0 < k,
let a_k = n_k/d_k and b_k = N_k/D_k be respectively the first and second arguments to the kth
recursive call to SRI. We consider the product P_k := n_k d_k N_k D_k and how it changes with
k. If the kth recursive call occurs in the second case, then the numerators and denominators
are simply swapped, so P_k = P_{k−1}. If the kth recursive call occurs in the first case, then
d_k = d_{k−1} and D_k = D_{k−1}, but

    n_k = n_{k−1} − q d_{k−1} ≤ n_{k−1} − (a_{k−1}/2) d_{k−1} = n_{k−1}/2,

and

    N_k = N_{k−1} − q D_{k−1} < N_{k−1},

where q = ⌈a_{k−1}⌉ − 1 ≥ a_{k−1}/2. Thus in this case, P_k < P_{k−1}/2. The two recursive cases
alternate, so P_k decreases by at least half with every other recursive call. Since P_k > 0,
we must hit the base case after at most 2 lg P_0 = 2(lg n_0 + lg d_0 + lg N_0 + lg D_0) recursive
calls. For each k ≥ 0, lg P_k approximates the size of the input (in bits) up to an additive
constant, and this size never increases from call to call, so the whole algorithm is clearly
polynomial time.
Exercise 17.2 What is SRI(7/25, 3/10)?

Exercise 17.3 (Challenging) Using your favorite programming language, implement the
SRI algorithm above. You can decide to accept either exact rational or floating-point
inputs.
Implementing the QFT. Recall that for all x ∈ Z_{2^n},

    QFT_n|x⟩ = 2^{−n/2} Σ_{y∈Z_{2^n}} e_n(xy)|y⟩.

It was Peter Shor who first showed how to implement QFT_n efficiently with a quantum
circuit, in the same paper as his factoring algorithm.

[Figure 8 (circuit diagram): the top n − m wires enter QFT_{n−m} while the bottom m wires pass through; then P_{n,m} acts on all n wires; then QFT_m acts on the bottom m wires; finally an unnamed qubit-permuting gate swaps the two blocks of wires.]

Figure 8: QFT_n in terms of QFT_{n−m} and QFT_m.

The following recursive description
is taken from Cleve & Watrous (2000). When n = 1, you can easily check that QFT_1 = H,
i.e., the one-qubit Hadamard gate. Now suppose that n > 1 and let 1 ≤ m < n be an
integer. QFT_n can be decomposed into a circuit using QFT_{n−m}, QFT_m, and two other
subcircuits, as shown in Figure 8. The P_{n,m} gate acts on two numbers, an (n − m)-bit
number x ∈ Z_{2^{n−m}} and an m-bit number y ∈ Z_{2^m}, such that

    P_{n,m}|x, y⟩ = e_n(xy)|x, y⟩.

That is, P_{n,m} adjusts the phase of |x, y⟩ by e^{2πixy/2^n}. (The P stands for "phase.") In the
figure, x is fed to P_{n,m} in the upper n − m qubits, and y in the lower m qubits. We'll
see shortly how P_{n,m} can be implemented in terms of simple gates. The unnamed gate
on the far right of the figure merely serves to move the qubits around, bringing the top
n − m qubits to the bottom and bringing the bottom m qubits to the top.^{17} These qubit-
permuting gates can be left out when recursively expanding QFT_{n−m} and QFT_m, as long
as you keep track of which qubit is which and adjust the elementary gates accordingly.

Many recursive decompositions are possible, based on the choice of m at each stage.
Shor's original circuit for QFT_n is obtained by recursively decomposing with m = 1
throughout. A smaller-depth circuit is achieved by a divide-and-conquer approach,
letting m be roughly n/2 each time.
Let's check that the decomposition of Figure 8 is correct. Given any n-bit number
x ∈ Z_{2^n}, we split its binary representation into its n − m high-order bits x_h ∈ Z_{2^{n−m}} and
its m low-order bits x_ℓ ∈ Z_{2^m}. So we have x = x_h 2^m + x_ℓ, and we can write |x⟩ as
|x_h 2^m + x_ℓ⟩ or as |x_h⟩|x_ℓ⟩. Applying QFT_n to |x⟩ gives

    QFT_n|x⟩ = 2^{−n/2} Σ_{y∈Z_{2^n}} e_n(xy)|y⟩ = 2^{−n/2} Σ_y e_n((x_h 2^m + x_ℓ)y)|y⟩. (73)

^{17} This, as well as any other rearrangement of qubits, can always be achieved by two layers of swap gates.
Proving this makes a great exercise.
Expressing each y as y_h 2^{n−m} + y_ℓ for unique y_h ∈ Z_{2^m} and y_ℓ ∈ Z_{2^{n−m}}, (73) becomes

    2^{−n/2} Σ_y e_n((x_h 2^m + x_ℓ)(y_h 2^{n−m} + y_ℓ))|y⟩ = 2^{−n/2} Σ_y e_{n−m}(x_h y_ℓ) e_m(x_ℓ y_h) e_n(x_ℓ y_ℓ)|y⟩. (74)

(Notice that there is no x_h y_h exponent, since it is multiplied by 2^n.) Now let's see what
happens when the right-hand circuit of Figure 8 acts on |x⟩. We have

    |x⟩ = |x_h⟩|x_ℓ⟩
    → (QFT_{n−m})  2^{−(n−m)/2} Σ_{y_ℓ∈Z_{2^{n−m}}} e_{n−m}(x_h y_ℓ) |y_ℓ⟩|x_ℓ⟩
    → (P_{n,m})  2^{−(n−m)/2} Σ_{y_ℓ} e_{n−m}(x_h y_ℓ) e_n(y_ℓ x_ℓ) |y_ℓ⟩|x_ℓ⟩
    → (QFT_m)  2^{−n/2} Σ_{y_ℓ} Σ_{y_h∈Z_{2^m}} e_{n−m}(x_h y_ℓ) e_n(x_ℓ y_ℓ) e_m(x_ℓ y_h) |y_ℓ⟩|y_h⟩
    → (permute)  2^{−n/2} Σ_{y_h} Σ_{y_ℓ} e_{n−m}(x_h y_ℓ) e_n(x_ℓ y_ℓ) e_m(x_ℓ y_h) |y_h⟩|y_ℓ⟩
    = 2^{−n/2} Σ_{y∈Z_{2^n}} e_{n−m}(x_h y_ℓ) e_m(x_ℓ y_h) e_n(x_ℓ y_ℓ) |y⟩,

where we set y := y_h 2^{n−m} + y_ℓ. This agrees with (74), so the decomposition is correct.

The only new two-qubit gate we need is the controlled phase gate CP(θ), which for
θ ∈ R is given in the computational basis by

    CP(θ) :=
    [ 1  0  0  0 ]
    [ 0  1  0  0 ]
    [ 0  0  1  0 ]
    [ 0  0  0  e^{2πiθ} ].

Owing to the symmetry between the control and target qubits, we will display this gate
as a vertical line joining the two qubit wires, where we place the value θ somewhere nearby.
Our θ values will always be of the form 2^{−k} for integers k > 0. Notice that for any a, b ∈ Z_2,

    CP(2^{−k})|a⟩|b⟩ = e_k(ab)|a⟩|b⟩. (75)
It is easiest to think of P_{n,m} as acting on two quantum registers: the first with n − m
qubits and the second with m qubits. What gates do we need to implement P_{n,m}? Let's
consider P_{n,m} applied to the state |x⟩|y⟩ = |x_1 x_2 ⋯ x_{n−m}⟩|y_1 y_2 ⋯ y_m⟩, where x_1, …, x_{n−m}
and y_1, …, y_m are all bits in Z_2. We have

    x/2^{n−m} = 0.x_1 x_2 ⋯ x_{n−m} = Σ_{j=1}^{n−m} x_j 2^{−j}

and

    y/2^m = 0.y_1 y_2 ⋯ y_m = Σ_{k=1}^{m} y_k 2^{−k},
where the "decimal" expansions are actually base 2. Multiplying these two quantities
gives

    xy/2^n = Σ_{j=1}^{n−m} Σ_{k=1}^{m} x_j y_k 2^{−j−k},

and so

    e_n(xy) = exp(2πixy/2^n) = Π_{j,k} exp(2πi x_j y_k 2^{−j−k}) = Π_{j,k} e_{j+k}(x_j y_k).

And thus we get

    P_{n,m}|x⟩|y⟩ = ( Π_{j,k} e_{j+k}(x_j y_k) ) |x⟩|y⟩.

Recalling (75), notice that for each j and k, we can get the (j, k)th factor in the product
above if we connect the jth qubit of the first register (carrying x_j) with the kth qubit of the
second register (carrying y_k) by a CP(2^{−j−k}) gate (which then acts on the state |x_j⟩|y_k⟩ to
give an overall phase contribution of e_{j+k}(x_j y_k)). So to implement P_{n,m} we just need to do
this for all 1 ≤ j ≤ n − m and all 1 ≤ k ≤ m. That's it. All these gates combine to give
the correct overall phase shift of e_n(xy). The order of the gates does not matter, because
they all commute with each other (they are all diagonal matrices in the computational
basis). For example, Figure 9 shows the P_{9,4} circuit.
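The claim that the CP(2^{−j−k}) gates multiply out to the single phase e_n(xy) is easy to verify numerically. Here is a minimal Python sketch for P_{9,4} (the helper names are mine, not from the notes):

```python
import cmath, math

def e(n, z):
    """The phase e_n(z) = exp(2*pi*i*z / 2^n) used throughout the notes."""
    return cmath.exp(2j * math.pi * z / 2 ** n)

def pnm_phase_via_cp(x, y, n, m):
    """Product of the CP(2^-(j+k)) contributions e_{j+k}(x_j * y_k),
    where x_j, y_k are the bits of x and y, most significant first."""
    phase = 1 + 0j
    for j in range(1, n - m + 1):
        xj = (x >> (n - m - j)) & 1
        for k in range(1, m + 1):
            yk = (y >> (m - k)) & 1
            phase *= e(j + k, xj * yk)
    return phase

n, m = 9, 4
for x in range(2 ** (n - m)):
    for y in range(2 ** m):
        assert cmath.isclose(pnm_phase_via_cp(x, y, n, m), e(n, x * y))
```

The loop checks all 2^{n−m} · 2^m register contents, i.e., every diagonal entry of P_{9,4}.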
Exercise 17.4 Give two complete decompositions of QFT_4 as circuits, the first using m = 1
throughout, and the second using m = 2 for the initial decomposition. Both circuits should
use only H and CP(2^{−k}) gates for k > 0. Do not cross wires except at the end of the entire
circuit!
[Figure 9 (circuit diagram): the CP gates of P_{9,4}, with values θ = 1/4, 1/8, 1/16, 1/32, 1/64, 1/128, 1/256, 1/512, connecting the five qubits of the first register to the four qubits of the second.]

Figure 9: The circuit implementing P_{9,4}. CP(θ) gates are grouped according to the values
of θ. Within each group, gates act on disjoint pairs of qubits, so they can form a single
layer of gates acting in parallel.
Exercise 17.5 (Challenging) Asymptotically, what is the size (number of elementary gates)
of QFT_n when decomposed using m = 1 throughout (Shor's circuit)? What is the size
using the divide-and-conquer method with m ≈ n/2 throughout? The same questions for
the depth (minimum possible number of layers of gates acting on disjoint sets of qubits).
Use big-O notation. In all cases, you can ignore the qubit-permuting gates. [Hint: Find
recurrence equations satisfied by the size and the depth in each case.]
Actually, there is another way to implement P_{n,m}: Classically compute xy as an n-bit
binary integer, then for each k ∈ {1, 2, …, n}, send the kth qubit of the result through
the gate P(2^{−k}). There are fast parallel circuits for multiplication, with polynomial size
and depth O(lg n). This log-depth implementation of P_{n,m}, together with the divide-and-
conquer decomposition method for QFT_n, gives an O(n)-depth, polynomial-size circuit
that exactly implements QFT_n.
18 Week 9: Approximate QFT

Exact versus Approximate. The QFT_n circuit we described above for Shor's algorithm
blithely uses CP(2^{−k}) gates where k ranges between 2 and n. If Shor's algorithm is to
significantly outperform the best classical factoring algorithms, then n must be on the
order of 10³ and above, which means that we will be using gates that produce conditional
phase shifts of 2π/2^{1000} or less. No one in their right mind imagines that we could ever
tune our instruments so precisely as to produce so small a phase shift, which is required
for any exact implementation of QFT_{1000}. The bottom line is that implementing QFT_n
exactly for large n will just never be feasible.

Fortunately, an exact implementation is unnecessary for Shor's algorithm or for any
other probabilistic quantum algorithm that uses the QFT. We can actually tolerate a lot
of imprecision in the implementation of the CP(2^{−k}) gates. In fact, if k ≫ lg n, then
CP(2^{−k}) is close enough to the identity operator that we can omit these gates entirely.
The resulting circuit is much smaller and produces a good approximation to QFT_n that can
be used in Shor's algorithm: good enough so that the probability of finding R in Step 8 of
the algorithm is at worst only slightly smaller than with the exact implementation, thus
requiring only a few more repetitions of the algorithm to produce R with high probability.

In the next few topics, we'll make this all quantitative. The concepts and techniques
we introduce are useful in other contexts. Before we do, we need a basic inequality known
as the Cauchy–Schwarz inequality.
The Cauchy–Schwarz inequality. We mentioned this inequality early in the course as
proving the triangle inequality for complex scalars, but this is the first time since then that
we actually need it. We'll use it here to bound the effects of unitary errors in implementing
a quantum circuit. We'll use it again in other contexts.

Theorem 18.1 (Cauchy–Schwarz Inequality) Let H be a Hilbert space. For any vectors u, v ∈ H,

    |⟨u|v⟩| ≤ ‖u‖·‖v‖,

with equality holding if and only if u and v are linearly dependent.
Proof. There are many ways to prove this theorem. The Nielsen & Chuang textbook
has a proof in Box 2.1 on page 68, which we loosely paraphrase here. See the Background
material for another proof. Equality clearly holds if u and v are linearly dependent, since
then one vector is a scalar multiple of the other. So assume that u and v are linearly
independent. By the Gram–Schmidt procedure, we can find orthonormal vectors b_1, b_2
such that b_1 = u/‖u‖ and b_2 = (v − ⟨b_1|v⟩b_1)/‖v − ⟨b_1|v⟩b_1‖. We thus have

    u = a b_1,
    v = c b_1 + d b_2,

for some a, c, d ∈ C with a > 0 and d > 0. We now get

    ‖u‖·‖v‖ = a(|c|² + d²)^{1/2} > a(|c|²)^{1/2} = a|c| = |ac| = |⟨u|v⟩|.  □

Exercise 18.2 Show that ‖u + v‖ ≤ ‖u‖ + ‖v‖ for any two vectors u, v ∈ H, with equality
holding if and only if one is a nonnegative scalar times the other. This is another example
of a triangle inequality. [Hint: Use Cauchy–Schwarz (Theorem 18.1) and the fact that
ℜ[z] ≤ |z| for any z ∈ C.]
A Hilbert Space Is a Metric Space. For any two vectors u, v ∈ H, the Euclidean distance
between u and v is defined as

    d(u, v) := ‖u − v‖.

The function d satisfies the following axioms:

1. d(u, v) ≥ 0,
2. d(u, v) = 0 iff u = v,
3. d(u, v) = d(v, u), and
4. d(u, v) ≤ d(u, w) + d(w, v) for any w ∈ H.

These are the axioms for a metric on the set H. The last item is known as the triangle
inequality, which can be seen as follows:

    d(u, v) = ‖u − v‖ = ‖u − w + w − v‖ ≤ ‖u − w‖ + ‖w − v‖ = d(u, w) + d(w, v),

where the inequality follows from Exercise 18.2. All the other axioms are straightforward.
Suppose that you could run an ideal quantum algorithm to produce a state |ψ⟩ that
you then subject to some projective measurement. You would get certain probabilities
for the various possible outcomes. Suppose instead that you actually ran an imperfect
implementation of the algorithm and produced a state |ψ′⟩ that was close to |ψ⟩ in Euclidean
distance, and you subjected |ψ′⟩ to the same projective measurement. The next proposition
shows that the probabilities of the outcomes are close to those of the ideal situation.

Proposition 18.3 Let {P_a : a ∈ I} be some complete set of orthogonal projectors on H. Let
u, v ∈ H be any two unit vectors, and let Pr_u[a] and Pr_v[a] be the probability of seeing outcome
a ∈ I when measuring the state u and v respectively using this complete set. Then for every
outcome a ∈ I,

    | Pr_u[a] − Pr_v[a] | ≤ 2 d(u, v).
Proof. We have

    | Pr_u[a] − Pr_v[a] | = | ⟨u|P_a|u⟩ − ⟨v|P_a|v⟩ |
    = | ⟨u|P_a|u⟩ − ⟨u|P_a|v⟩ + ⟨u|P_a|v⟩ − ⟨v|P_a|v⟩ |
    = | ⟨u|P_a|u − v⟩ + ⟨u − v|P_a|v⟩ |
    ≤ | ⟨u|P_a|u − v⟩ | + | ⟨u − v|P_a|v⟩ |
    = | ⟨P_a u|u − v⟩ | + | ⟨u − v|P_a v⟩ |
    ≤ ‖P_a u‖·‖u − v‖ + ‖u − v‖·‖P_a v‖
    ≤ 2‖u − v‖.

The second inequality is an application of Cauchy–Schwarz (Theorem 18.1); the third
follows from the fact that ‖Pw‖ ≤ ‖w‖ = 1 for any projector P and unit vector w (see
Exercise 5.10). □
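The bound is easy to probe numerically. The sketch below draws random unit vectors in C² and measures them against the complete set {|0⟩⟨0|, |1⟩⟨1|}; the outcome probabilities always differ by at most 2‖u − v‖ (an illustrative check only, with my own helper names):

```python
import math, random

random.seed(1)

def rand_unit():
    # a random unit vector in C^2
    v = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(2)]
    norm = math.sqrt(sum(abs(z) ** 2 for z in v))
    return [z / norm for z in v]

def check_once():
    u, v = rand_unit(), rand_unit()
    dist = math.sqrt(sum(abs(a - b) ** 2 for a, b in zip(u, v)))
    # measuring in the computational basis: Pr[a] = |amplitude_a|^2
    for a in range(2):
        assert abs(abs(u[a]) ** 2 - abs(v[a]) ** 2) <= 2 * dist + 1e-12

for _ in range(1000):
    check_once()
```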
The next definition extends the notion of distance to operators. Here we give one of
many possible ways to do this.

Definition 18.4 Let A ∈ L(H) be an operator. The operator norm of A is defined as

    ‖A‖ := sup_{v∈H : ‖v‖=1} ‖Av‖ = sup_{v≠0} ‖Av‖/‖v‖.

This norm is also sometimes called the spectral norm on operators.

‖A‖ is thus the maximum of ‖Av‖ taken over all unit vectors v. Don't confuse ‖A‖,
which is a scalar, with |A| = √(A†A), which is an operator. Suppose that A ≥ 0, with
orthonormal eigenbasis {b_j}_{1≤j≤n}, where λ_1 ≥ λ_2 ≥ ⋯ ≥ λ_n ≥ 0 are the eigenvalues of A.
We claim that ‖A‖ = λ_1, i.e., ‖A‖ is the largest
eigenvalue of A. To see why, let v = (v_1, …, v_n) be any unit column vector with respect
to this basis {b_j}_{1≤j≤n}. Then we have

    ‖Av‖² = ⟨Av|Av⟩ = Σ_{j=1}^{n} λ_j² |v_j|² = Σ_j λ_j² a_j,

where we set a_j := |v_j|². We have a_j ≥ 0 for all 1 ≤ j ≤ n, and since v is a unit vector, we
have Σ_j a_j = 1. So,

    ‖Av‖² = Σ_{j=1}^{n} λ_j² a_j
    = λ_1² a_1 + Σ_{j=2}^{n} λ_j² a_j
    = λ_1² ( 1 − Σ_{j=2}^{n} a_j ) + Σ_{j=2}^{n} λ_j² a_j
    = λ_1² + Σ_{j=2}^{n} (λ_j² − λ_1²) a_j.

Since λ_j² − λ_1² ≤ 0 for all 2 ≤ j ≤ n, the right-hand side is clearly maximized by setting
a_2 = ⋯ = a_n = 0 (and so a_1 = 1). So we must have ‖A‖ = ‖Ab_1‖ = λ_1, as
claimed.
The next property follows from the claim, above.

9. If A and B are operators (not necessarily over the same space), then ‖A ⊗ B‖ =
‖A‖·‖B‖. In particular, ‖A ⊗ I‖ = ‖A‖ and ‖I ⊗ B‖ = ‖B‖.

This property is useful when we take the norm of a single gate in a circuit. The unitary
operator corresponding to the action of the gate is generally of the form U ⊗ I, where U
corresponds to the gate acting on the space of its own qubits, and the identity I acts on the
qubits not involved with the gate. Property 9 says that we can ignore the I when taking
the norm of this operator.
To prove Property 9, we first prove that |A ⊗ B| = |A| ⊗ |B|. To show this, we only need
to verify two things: (i) (|A| ⊗ |B|)² = (A ⊗ B)†(A ⊗ B), and (ii) |A| ⊗ |B| ≥ 0. Item (i) is
Exercise 18.7, below. For (ii), let λ_1, …, λ_n be the eigenvalues of |A| and μ_1, …, μ_m those
of |B|; the eigenvalues of |A| ⊗ |B| are then the products λ_j μ_k for all 1 ≤ j ≤ n
and 1 ≤ k ≤ m. Since all the λ_j and μ_k are nonnegative, the diagonal entries of |A| ⊗ |B|
(in an eigenbasis) are all nonnegative. Hence, |A| ⊗ |B| ≥ 0, which proves (ii), and thus |A ⊗ B| = |A| ⊗ |B|.

Now the largest eigenvalue of |A| ⊗ |B| is clearly λμ, where λ = max(λ_1, …, λ_n) = ‖A‖
and μ = max(μ_1, …, μ_m) = ‖B‖ by the claim. Since |A ⊗ B| = |A| ⊗ |B|, the product λμ is
also the largest eigenvalue of |A ⊗ B|, and so using the claim again, we get Property 9.

Exercise 18.7 Verify by direct calculation that (|A| ⊗ |B|)² = (A ⊗ B)†(A ⊗ B).
While we're on the subject, one more property of the operator norm will find use later
on. If you want, you can skip down to after the proof of Claim 18.8, below, and refer back
to it later when you need to.

10. ‖A‖ = ‖UAU†‖ for any unitary U.

Since unitarily conjugate operators have the same spectrum, Claim 18.8 implies that |A|
and |UAU†| have the same largest eigenvalue, whence Property 10 follows from the claim
above, because

    |UAU†| = √((UAU†)†(UAU†)) = √(U|A|²U†) = U√(|A|²)U† = U|A|U†.  □
Now we consider an arbitrary idealized quantum circuit C with m many unitary gates,
which basically consists of a succession of unitary operators U_1, …, U_m applied to some
initial state |init⟩, producing the state |ψ⟩ = U_m ⋯ U_1|init⟩, which is then projectively mea-
sured somehow. When implementing C we might implement each gate U_j imperfectly,
getting some unitary V_j instead, where hopefully, V_j is close to U_j. I will call this a unitary
error. The actual circuit produces the state |ψ′⟩ = V_m ⋯ V_1|init⟩. Assuming d(U_j, V_j) ≤ ε
for all 1 ≤ j ≤ m, what can we say about d(|ψ⟩, |ψ′⟩)?

Classical calculations are often numerically unstable, and errors may compound mul-
tiplicatively. Fortunately for us, unitary errors only compound additively rather than
multiplicatively, so we can tolerate a fair amount of imperfection in our gates: only
O(lg n) bits of precision per gate for a circuit with a polynomially bounded (in n) number
of gates.
Back to the question above. Using the basic properties of the operator norm listed
above, we get

    d(|ψ⟩, |ψ′⟩) = ‖(U_m ⋯ U_1 − V_m ⋯ V_1)|init⟩‖
    ≤ ‖U_m ⋯ U_1 − V_m ⋯ V_1‖ · ‖|init⟩‖
    = ‖U_m ⋯ U_1 − V_m ⋯ V_1‖.

The operator inside the ‖·‖ on the right can be expressed as a telescoping sum:

    U_m ⋯ U_1 − V_m ⋯ V_1 = Σ_{k=1}^{m} U_m ⋯ U_{k+1} (U_k − V_k) V_{k−1} ⋯ V_1. (76)

Therefore,

    ‖U_m ⋯ U_1 − V_m ⋯ V_1‖ = ‖ Σ_{k=1}^{m} U_m ⋯ U_{k+1} (U_k − V_k) V_{k−1} ⋯ V_1 ‖
    ≤ Σ_k ‖U_m ⋯ U_{k+1} (U_k − V_k) V_{k−1} ⋯ V_1‖
    = Σ_k ‖U_k − V_k‖
    ≤ mε,

and so d(|ψ⟩, |ψ′⟩) ≤ mε.

Suppose we want the probability of some outcome to differ from the ideal probability
by no more than some δ > 0. Then by Proposition 18.3, it suffices that 2mε ≤ δ, or that

    ε ≤ δ/(2m).
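Here is a small sketch illustrating the additive compounding with single-qubit phase gates diag(1, e^{iα}) acting on |+⟩ = (|0⟩ + |1⟩)/√2; the gate set and parameters are illustrative only, not from the notes:

```python
import cmath, math, random

random.seed(7)
m, eps = 20, 1e-3

alphas = [random.uniform(0, 2 * math.pi) for _ in range(m)]
errs = [random.uniform(-eps, eps) for _ in range(m)]   # per-gate phase errors

def run(angles):
    # apply diag(1, e^{i a}) gates, in order, to |+> = (1/sqrt 2, 1/sqrt 2)
    state = [1 / math.sqrt(2), 1 / math.sqrt(2)]
    for a in angles:
        state = [state[0], state[1] * cmath.exp(1j * a)]
    return state

ideal = run(alphas)
actual = run([a + e for a, e in zip(alphas, errs)])

dist = math.sqrt(sum(abs(p - q) ** 2 for p, q in zip(ideal, actual)))
# ||U_j - V_j|| = |e^{i e_j} - 1| for diagonal phase gates
gate_dist_sum = sum(abs(cmath.exp(1j * e) - 1) for e in errs)
assert dist <= gate_dist_sum + 1e-12   # errors compound at most additively
```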
For example, the entire quantum circuit for Shor's algorithm has size polynomial in n;
let's say at most cn^k gates for some constants c and k. (I'm not sure, but I believe that k ≤ 3.
The dominant contribution is not the QFT but rather the classical modular exponentiation
circuit.) The algorithm produces a good y (one that will lead to finding R) with probability
at least 4/(π²n), ignoring a quadratically small correction term. We could settle instead
for a success probability of at least 2/(π²n), say, which would require up to twice as many
trials on average for success. But then, choosing δ := 4/(π²n) − 2/(π²n) = 2/(π²n), we
could implement each gate to within an error (operator distance) of

    ε_Shor := (2/(π²n)) / (2cn^k) = 1/(π²cn^{k+1}) = Ω(n^{−(k+1)})
away from the ideal. This has major implications for the QFT part of the circuit. The QFT
has size Θ(n²), uses n Hadamard gates, and the rest of the gates are CP(2^{−j}) gates, where
2 ≤ j ≤ n. (We can do without the swap gates by keeping track of which qubit is which,
and rearranging the bits of the y value that we measure.) Note that for any θ ∈ R,

    CP(θ) − I =
    [ 0  0  0  0 ]
    [ 0  0  0  0 ]
    [ 0  0  0  0 ]
    [ 0  0  0  e^{2πiθ} − 1 ]
    = 2ie^{πiθ} ·
    [ 0  0  0  0 ]
    [ 0  0  0  0 ]
    [ 0  0  0  0 ]
    [ 0  0  0  sin(πθ) ].

It follows that

    d(CP(θ), I) = ‖CP(θ) − I‖ = 2|sin(πθ)| ≤ 2πθ.

This means that if 2π·2^{−j} ≤ ε_Shor, or equivalently,

    j ≥ lg(2π/ε_Shor) = (k + 1) lg n + O(1),

then any CP(2^{−j}) gate in the QFT circuit is close enough to I that we can just omit it. It's
easy to see that most of the QFT gates are like this and can be omitted, shrinking the QFT
portion of the circuit from quadratic size to nearly linear size (O(n lg n) gates) in n. This
fact was first observed by Coppersmith.

For n = 10³ and assuming k = 3, we can get by with implementing each gate with error
O(n^{−4}), which is on the order of one part per trillion. This is still a very tall order, but
unlike 2^{−1000} it is at least close to the realm of sanity. Optimizing other aspects of Shor's
algorithm and its analysis increases the error tolerance considerably.
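The cutoff can be computed directly; in the sketch below, c := 1 is my own assumption for illustration, since the notes leave the constant unspecified:

```python
import math

n, k, c = 1000, 3, 1.0   # c = 1 is an illustrative assumption

eps_shor = 1 / (math.pi ** 2 * c * n ** (k + 1))   # allowed per-gate error
j_cutoff = math.log2(2 * math.pi / eps_shor)       # omit CP(2^-j) for j above this

# j_cutoff = (k+1) lg n + lg(2 pi^3), about 45.8 here, so only O(lg n)
# of the n possible CP exponents survive; eps_shor is about 1e-13.
```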
19 Midterm Exam

Do all problems. Hand in your answers in class on Wednesday, March 28, just as you would
a homework problem set. The only difference between this exam and the homeworks is
that you may not discuss exam questions or answers with anyone inside or outside of
class except me. It goes without saying that if you do, you have cheated and I'll have to
summarily fail you, which is my usual policy about cheating. I know you won't, though,
so I'll sleep well at night.

All questions with Roman numerals carry equal weight, but may not be of equal
difficulty.

Recall that for two vectors or operators a, b, we say that a ∼ b if there is a phase factor
e^{iθ}, where θ ∈ R, such that a = e^{iθ}b.

I) (Rotating the Bloch sphere) Find a unit vector n = (x, y, z) ∈ R³ on the Bloch sphere
and an angle θ ∈ [0, 2π) such that

    R_n(θ)|+x⟩ ∼ |+y⟩,
    R_n(θ)|+y⟩ ∼ |+z⟩,

where R_n(θ) is defined in Exercise 9.4, and |+x⟩, |+y⟩, and |+z⟩ are given by Equa-
tions (14–16). Give the 2×2 matrix corresponding to your solution in the standard
computational basis, simplified as much as possible. What can you say about
R_n(θ)|+z⟩? There are exactly two possible solutions to this problem.
II) (Phase factors and density operators) Let U and V be unitary operators over H. It is
easy to see that if U ∼ V, then UρU† = VρV† for every density operator ρ. […]
show how they can alter their protocol to faithfully teleport |ψ⟩. [Hint: Use the
previous item.]
V) (A black-box problem) Suppose f : (Z_2)^n → Z_2 is such that there is some s =
s_1 ⋯ s_n ∈ (Z_2)^n such that for all x = x_1 ⋯ x_n ∈ (Z_2)^n,

    f(x) = s · x,

where s · x = ( Σ_{j=1}^{n} s_j x_j ) mod 2 is the standard dot product of s and x over Z_2.
Recall the inversion gate I_f such that

    I_f|x⟩ = (−1)^{f(x)}|x⟩

for all x ∈ (Z_2)^n. The following describes a circuit that uses I_f once to find s:

(a) Initialize an n-qubit register in the state |0^n⟩.
(b) Apply a Hadamard gate H to each of the n qubits. (This is a single layer.)
(c) Apply I_f to the n qubits.
(d) Apply a Hadamard gate H to each of the n qubits. (This is a single layer.)
(e) Measure the n qubits in the computational basis, obtaining some y ∈ (Z_2)^n.

Do the following:

(a) Draw the circuit described above.
(b) Give the state of the n qubits after each unitary gate, or layer of gates, is
applied.
(c) Show that y = s with certainty.
(d) Show how to find s classically by evaluating f on exactly n elements of (Z_2)^n.
20 Week 10: Grover's algorithm

Quantum Search. You are given an array A[1 … N] of N values, one of which is a
recognizable target value t. You want to find the position w of t in the list. The values are
not necessarily sorted or arranged in any particular way. Classically, the best you can do
in the worst case is to probe all A[j] for 1 ≤ j ≤ N, and find the target on the last probe. On
average, you will need about N/2 probes before finding the target with high probability.
With a quantum algorithm, you can find the target with (extremely) high probability
using only O(√N) probes. (Here we identify positions with strings in {0, 1}^n, where
N = 2^n; a probe is an application of the gate I_f for the function f with f(x) = 1 iff x = w;
and U = H^{⊗n} is one layer of Hadamard gates.) It turns out that we can't do better than
this in the worst case. Grover's algorithm now works as follows:

1. Initialize an n-qubit register in the state |0^n⟩.

2. Apply U to get the state |s⟩ = U|0^n⟩. We call |s⟩ the start state. Note that x := ⟨w|s⟩ =
⟨s|w⟩ > 0. We'll assume that x < 1, or equivalently, that |s⟩ and |w⟩ are linearly
independent; otherwise, |s⟩ ∼ |w⟩ and we can skip the next step entirely. For U
implemented with Hadamards as above, this assumption clearly holds.
3. Apply G to |s⟩ exactly ⌊π/(4 sin⁻¹ x)⌋ many times, where

    G := U I_0 U† I_f

is known as the Grover iterate.

4. Measure the n qubits in the computational basis, obtaining a value y ∈ {0, 1}^n.

We'll show that y = w with high probability. Note that if x = 1/√N, then ⌊π/(4 sin⁻¹ x)⌋ ≈
π/(4x) = (π/4)√N. Let's first expand the Grover iterate. Since I_f = I − 2|w⟩⟨w| and
UI_0U† = 2U|0^n⟩⟨0^n|U† − I = 2|s⟩⟨s| − I, we have

    G = U I_0 U† I_f
    = (2|s⟩⟨s| − I)(I − 2|w⟩⟨w|)
    = −I + 2|s⟩⟨s| + 2|w⟩⟨w| − 4x|s⟩⟨w|.

Applying the right-hand side to |s⟩ and |w⟩ immediately gives us

    G|s⟩ = (1 − 4x²)|s⟩ + 2x|w⟩,
    G|w⟩ = −2x|s⟩ + |w⟩.

So we see that G|s⟩ and G|w⟩ are both (real) linear combinations of |s⟩ and |w⟩. Thus G maps
the plane spanned by |s⟩ and |w⟩ into itself, and all intermediate states of the algorithm lie
in this plane. Thus we can now restrict our attention to this two-dimensional subspace S.

Using Gram–Schmidt, we pick an orthonormal basis for S, with |w⟩ being one vector
and

    |r⟩ := |r′⟩/‖|r′⟩‖, where |r′⟩ := |s⟩ − ⟨w|s⟩|w⟩ = |s⟩ − x|w⟩ and ‖|r′⟩‖ = √(1 − x²),

being the other.
It is easily checked that ⟨r|w⟩ = 0. Let 0 < θ < π/2 be such that x = sin θ. Expressing |s⟩
in the {|r⟩, |w⟩} basis, we get

    |s⟩ = √(1 − x²)|r⟩ + x|w⟩ = cos θ|r⟩ + sin θ|w⟩ = ( cos θ , sin θ )ᵀ.

Let's express G with respect to the same {|r⟩, |w⟩} basis. Note that restricted to the subspace
S, the identity I has the same effect as the orthogonal projector P_S = |r⟩⟨r| + |w⟩⟨w| projecting
onto S: they both fix all vectors in S. It follows that, restricted to S,

    G = −P_S + 2|s⟩⟨s| + 2|w⟩⟨w| − 4x|s⟩⟨w|
    = −|r⟩⟨r| − |w⟩⟨w| + 2(cos θ|r⟩ + sin θ|w⟩)(cos θ⟨r| + sin θ⟨w|) + 2|w⟩⟨w| − 4 sin θ(cos θ|r⟩ + sin θ|w⟩)⟨w|
    = (2cos²θ − 1)|r⟩⟨r| − 2 cos θ sin θ|r⟩⟨w| + 2 sin θ cos θ|w⟩⟨r| + (1 − 2sin²θ)|w⟩⟨w|
    = [ cos(2θ)  −sin(2θ) ]
      [ sin(2θ)   cos(2θ) ].

Geometrically, if we identify |r⟩ with the point (1, 0) ∈ R² and |w⟩ with the point (0, 1) ∈ R²,
then |s⟩ is the point in the first quadrant of the unit circle forming angle θ with |r⟩. Also,
G is seen to give a counterclockwise rotation of the circle through angle 2θ. We want the
state to wind up as close to |w⟩ as possible, which makes an angle π/2 with |r⟩. Applying
G m times puts the state at an angle (2m + 1)θ from |r⟩, so we solve

    (2m + 1)θ = π/2  ⟹  m = π/(4θ) − 1/2 = π/(4 sin⁻¹ x) − 1/2.

Rounding to the nearest integer gives m = ⌊π/(4 sin⁻¹ x)⌋, which is the number of times
we apply G to |s⟩. The final state is within an angle θ of |w⟩, so the probability of getting
w as the result of the measurement is at least cos²θ = 1 − x² = 1 − 2^{−n} = 1 − 1/N (if
x = 2^{−n/2}), which is exponentially close to 1.

Interestingly, if we apply G too many times, then we start drifting away from |w⟩, and
the probability of getting w in the measurement will start going down again, to about zero
at 2m applications; then it will oscillate back to one at about 3m, then close to zero again
at 4m, et cetera.
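Since the whole evolution takes place in the plane spanned by |r⟩ and |w⟩, the algorithm can be simulated with ordinary trigonometry. A small sketch (n = 10 is an arbitrary illustrative choice) confirms both the near-certain success at j = m and the overshooting just described:

```python
import math

n = 10
x = 2 ** (-n / 2)          # <w|s> for a single target among N = 2^n
theta = math.asin(x)
m = math.floor(math.pi / (4 * theta))

def success_prob(j):
    # after j Grover iterates, the state makes angle (2j+1)*theta with |r>
    return math.sin((2 * j + 1) * theta) ** 2

assert success_prob(m) > 1 - 2 ** (-n)   # near-certain success at j = m
assert success_prob(2 * m) < 0.01        # overshooting ruins the success probability
```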
Some Variants of Quantum Search. An obvious variant is to assume that f(x) = 1 for at
most one x, rather than for exactly one x. For this variant, one can run Grover's algorithm
just as before, but check that the final result y is such that f(y) = 1, using one more probe
of f. If not, then you can conclude that f is the constant zero function, and you'd be wrong
with exponentially small probability.

Another variant is when there are exactly k many x such that f(x) = 1, where k is
known, and your job is to find the location of any one of them. This is the subject of the
next exercise.
Exercise 20.1 (Somewhat challenging) Show that if there are exactly k many x such that
f(x) = 1, where $0 < k < 2^n$ is known, then one of the targets can be found, with high
probability, using $O(\sqrt{N/k})$ probes to f. [Hint: Let $U = H^{\otimes n}$, let
$$|s\rangle = U|0^n\rangle = 2^{-n/2} \sum_{x \in \{0,1\}^n} |x\rangle$$
be the start state, and let $G = U I_0 U^\dagger I_f = (2|s\rangle\langle s| - I)\, I_f$ be the Grover iterate, all as
before. Run Grover's algorithm as before, applying G some number of times to $|s\rangle$. To see
how many times to apply G:

1. Define the state $|w\rangle$ to be an equal superposition of all target locations:
$$|w\rangle := \frac{1}{\sqrt{k}} \sum_{x : f(x)=1} |x\rangle.$$

2. Likewise, define the state $|r\rangle$ to be the superposition of all nontarget locations:
$$|r\rangle := \frac{1}{\sqrt{2^n - k}} \sum_{x : f(x)=0} |x\rangle.$$
Notice that $|r\rangle$ and $|w\rangle$ are orthogonal unit vectors.

3. Let $x := \langle s|w\rangle$ as before. Show that now, $x = \sqrt{k}/2^{n/2}$.

4. Define $0 < \theta < \pi/2$ such that $x = \sin\theta$, just as before, and show that $|s\rangle = \cos\theta\,|r\rangle + \sin\theta\,|w\rangle$, just as before.

5. (The crucial step) Show directly that
$$G|r\rangle = \cos(2\theta)\,|r\rangle + \sin(2\theta)\,|w\rangle,$$
$$G|w\rangle = -\sin(2\theta)\,|r\rangle + \cos(2\theta)\,|w\rangle,$$
just as before. Note that $G = (2|s\rangle\langle s| - I)\, I_f \ne (2|s\rangle\langle s| - I)(I - 2|w\rangle\langle w|)$, so the
calculation must be a bit different from before. You might observe that $I_f$ has the
same effect as $I - 2|w\rangle\langle w|$ within the space spanned by $|r\rangle$ and $|w\rangle$, but you can't use
this fact until you establish that G maps this space into itself. Better to just do the
calculations above directly.

6. Conclude that G maps the space spanned by the orthonormal set $\{|r\rangle, |w\rangle\}$ into itself,
and its matrix looks the same as before.

7. Conclude that $\lfloor \pi/(4\theta) \rfloor$ is the right number of applications of G, since measuring
the qubits in a state close to $|w\rangle$ returns some target location with high probability.
Show that $\pi/(4\theta) = O(\sqrt{N/k})$.]
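The k-target analysis in the exercise can be checked numerically. The following is a small simulation sketch (not from the notes; all names and parameters are illustrative) that applies the Grover iterate $\lfloor \pi/(4\theta) \rfloor$ times to the uniform superposition and measures the total probability mass on the targets:

```python
import numpy as np

# Toy simulation of Grover search with k targets among N = 2^n items.
# The state is a dense length-N real vector; G = (2|s><s| - I) I_f.
n, k = 10, 4
N = 2 ** n
rng = np.random.default_rng(0)
targets = rng.choice(N, size=k, replace=False)
is_target = np.zeros(N, dtype=bool)
is_target[targets] = True

s = np.full(N, 1 / np.sqrt(N))        # |s> = uniform superposition
psi = s.copy()
theta = np.arcsin(np.sqrt(k / N))     # sin(theta) = sqrt(k)/2^(n/2)
iters = int(np.pi / (4 * theta))      # floor of pi/(4 theta)

for _ in range(iters):
    psi = np.where(is_target, -psi, psi)   # oracle I_f: flip sign on targets
    psi = 2 * s * (s @ psi) - psi          # reflection 2|s><s| - I

p_success = float(np.sum(psi[is_target] ** 2))
```

With n = 10 and k = 4 this reaches a success probability above 0.99 after only 12 iterations, compared with roughly N/k = 256 expected classical probes.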
21 Week 10: Quantum search lower bound

A Lower Bound on Quantum Search. The number of probes to the function f in Grover's
search algorithm is asymptotically tight. That is, no quantum algorithm can find a unique
target in a search space of size N with high probability using $o(\sqrt{N})$ probes to f. Suppose
a quantum circuit C on $r \ge n$ qubits makes m probes: it applies unitaries $U_0, U_1, \ldots, U_m$,
interleaved with m oracle gates, each acting on a designated group of n of the qubits, and
then measures those n qubits. First run C with the oracle for the constant zero function,
so that each oracle gate $I_z^{(j)}$ is just the identity, and let $|\psi^{(j)}\rangle$ be the state of the r qubits
immediately after the application of $U_j$. That is,
$$|\psi^{(j)}\rangle = U_j I_z^{(j)} U_{j-1} \cdots U_1 I_z^{(1)} U_0 |0\rangle = U_j U_{j-1} \cdots U_1 U_0 |0\rangle.$$
In particular, $|\psi^{(m)}\rangle$ is the final state. For $0 \le j < m$ we can factor $|\psi^{(j)}\rangle$ uniquely as
$$|\psi^{(j)}\rangle = \sum_{x \in \{0,1\}^n} |x\rangle\, |\psi_x^{(j)}\rangle,$$
where the first ket in each term represents a basis state of the n qubits entering the (j + 1)st
$I_f$ gate, and the second ket is a (not necessarily unit) vector representing the other $r - n$
qubits. Likewise, we uniquely factor $|\psi^{(m)}\rangle$ as
$$|\psi^{(m)}\rangle = \sum_{x \in \{0,1\}^n} |x\rangle\, |\psi_x^{(m)}\rangle,$$
where here the first ket in each term represents a basis state of the n qubits that are about
to be measured, and the second ket is a vector representing the $r - n$ unmeasured qubits.

Since $|\psi^{(j)}\rangle$ is a state, we have, for all $0 \le j \le m$,
$$1 = \langle\psi^{(j)}|\psi^{(j)}\rangle = \sum_{x \in \{0,1\}^n} \langle\psi_x^{(j)}|\psi_x^{(j)}\rangle = \sum_x \left\| |\psi_x^{(j)}\rangle \right\|^2. \qquad (77)$$
Let $w \in \{0,1\}^n$ be arbitrary. Now we run C again with $I_w$ gates. For $0 \le j \le m$, define
$$|\varphi_w^{(j)}\rangle := U_j I_w^{(j)} U_{j-1} \cdots U_1 I_w^{(1)} U_0 |0\rangle$$
to be the state of the circuit just after the application of $U_j$. We claim that there are many
values of w for which $|\varphi_w^{(j)}\rangle$ does not differ too much from $|\psi^{(j)}\rangle$, for any $1 \le j \le m$. For
each $0 \le j \le m$, define the error vector
$$|E_w^{(j)}\rangle := |\varphi_w^{(j)}\rangle - |\psi^{(j)}\rangle.$$
We want to show that enough of the vectors $|E_w^{(j)}\rangle$ have small norm. For each j, define
$$D^{(j)} := \sum_{w \in \{0,1\}^n} \left\| |E_w^{(j)}\rangle \right\|^2.$$

Claim 21.1 $D^{(j)} \le 4j^2$ for all $0 \le j \le m$.
Proof. We proceed by induction on j. For j = 0, we have $|\varphi_w^{(0)}\rangle = U_0|0\rangle = |\psi^{(0)}\rangle$ and
thus $|E_w^{(0)}\rangle = 0$ for all w, and so the claim clearly holds. Now for the inductive case where
$0 \le j < m$, we want to express $|E_w^{(j+1)}\rangle$ in terms of $|E_w^{(j)}\rangle$. We have, for all w,
$$\begin{aligned}
|\varphi_w^{(j+1)}\rangle &= U_{j+1} I_w^{(j+1)} |\varphi_w^{(j)}\rangle = U_{j+1} I_w^{(j+1)} \left( |\psi^{(j)}\rangle + |E_w^{(j)}\rangle \right) \\
&= U_{j+1} I_w^{(j+1)} \left( \sum_{x \in \{0,1\}^n} |x\rangle\, |\psi_x^{(j)}\rangle \right) + U_{j+1} I_w^{(j+1)} |E_w^{(j)}\rangle \\
&= U_{j+1} \left( \sum_x (I_w |x\rangle)\, |\psi_x^{(j)}\rangle \right) + U_{j+1} I_w^{(j+1)} |E_w^{(j)}\rangle \\
&= U_{j+1} \left( \sum_x \left( |x\rangle - 2|w\rangle\langle w|x\rangle \right) |\psi_x^{(j)}\rangle \right) + U_{j+1} I_w^{(j+1)} |E_w^{(j)}\rangle \\
&= U_{j+1} |\psi^{(j)}\rangle - 2 U_{j+1} |w\rangle\, |\psi_w^{(j)}\rangle + U_{j+1} I_w^{(j+1)} |E_w^{(j)}\rangle \\
&= |\psi^{(j+1)}\rangle - 2 U_{j+1} |w\rangle\, |\psi_w^{(j)}\rangle + U_{j+1} I_w^{(j+1)} |E_w^{(j)}\rangle.
\end{aligned}$$
Subtracting, we get
$$|E_w^{(j+1)}\rangle = |\varphi_w^{(j+1)}\rangle - |\psi^{(j+1)}\rangle = U_{j+1} \left( I_w^{(j+1)} |E_w^{(j)}\rangle - 2|w\rangle\, |\psi_w^{(j)}\rangle \right),$$
whence
$$\left\| |E_w^{(j+1)}\rangle \right\|^2 = \left\| I_w^{(j+1)} |E_w^{(j)}\rangle - 2|w\rangle\, |\psi_w^{(j)}\rangle \right\|^2 \le \left( \left\| |E_w^{(j)}\rangle \right\| + 2 \left\| |\psi_w^{(j)}\rangle \right\| \right)^2.$$
Expanding and summing over $w \in \{0,1\}^n$, we have
$$D^{(j+1)} \le D^{(j)} + 4 \sum_{w \in \{0,1\}^n} \left\| |E_w^{(j)}\rangle \right\| \left\| |\psi_w^{(j)}\rangle \right\| + 4 \sum_{w \in \{0,1\}^n} \left\| |\psi_w^{(j)}\rangle \right\|^2 = D^{(j)} + 4\langle\alpha|\beta\rangle + 4,$$
where we have used Equation (77) for the last term, and where $\alpha$ and $\beta$ are $2^n$-dimensional
column vectors whose entries, indexed by w, are $\| |E_w^{(j)}\rangle \|$ and $\| |\psi_w^{(j)}\rangle \|$, respectively. We
can apply Cauchy-Schwarz to $\langle\alpha|\beta\rangle$:
$$\langle\alpha|\beta\rangle \le \|\alpha\| \cdot \|\beta\| = \left( \sum_w \left\| |E_w^{(j)}\rangle \right\|^2 \right)^{1/2} \left( \sum_w \left\| |\psi_w^{(j)}\rangle \right\|^2 \right)^{1/2} = \sqrt{D^{(j)}} \cdot 1 = \sqrt{D^{(j)}},$$
using (77) again. Plugging this in above and using the inductive hypothesis, we have
$$D^{(j+1)} \le D^{(j)} + 4\sqrt{D^{(j)}} + 4 \le 4j^2 + 8j + 4 = 4(j+1)^2,$$
which proves the claim. $\Box$
Now for j = m, the claim asserts that $\sum_{w \in \{0,1\}^n} \| |E_w^{(m)}\rangle \|^2 \le 4m^2$. This implies that
$\| |E_w^{(m)}\rangle \|^2 > 4m^2/2^{n-1}$ for fewer than $2^{n-1}$ many w, i.e., for more than half of the w, we
have $\| |E_w^{(m)}\rangle \|^2 \le 4m^2/2^{n-1}$. Using a similar argument with Equation (77), we must have
$\| |\psi_w^{(m)}\rangle \|^2 \le 1/2^{n-1}$ for more than half of the w. Thus there is some $w \in \{0,1\}^n$ such that
both of these inequalities hold. Fix such a w. The final state of the circuit when run with
target w is $|\varphi_w^{(m)}\rangle$, and we can factor it as
$$|\varphi_w^{(m)}\rangle = \sum_{x \in \{0,1\}^n} |x\rangle\, |\varphi_{w,x}^{(m)}\rangle,$$
where (as with $|\psi^{(m)}\rangle$) the first ket represents the n qubits that are about to be measured,
and the second ket represents the other qubits (and is not necessarily a unit vector). The
probability of seeing w as the outcome of the measurement when the state is $|\varphi_w^{(m)}\rangle$ is then
$\Pr[w] = \| |\varphi_{w,w}^{(m)}\rangle \|^2$, but this value is quite small, provided m is not too large:
$$\begin{aligned}
\left\| |\varphi_{w,w}^{(m)}\rangle \right\| &= \left\| |\varphi_{w,w}^{(m)}\rangle - |\psi_w^{(m)}\rangle + |\psi_w^{(m)}\rangle \right\| \\
&\le \left\| |\varphi_{w,w}^{(m)}\rangle - |\psi_w^{(m)}\rangle \right\| + \frac{\sqrt{2}}{2^{n/2}} \\
&= \left\| |w\rangle \left( |\varphi_{w,w}^{(m)}\rangle - |\psi_w^{(m)}\rangle \right) \right\| + \frac{\sqrt{2}}{2^{n/2}} \\
&= \left\| (|w\rangle\langle w| \otimes I) \left( |\varphi_w^{(m)}\rangle - |\psi^{(m)}\rangle \right) \right\| + \frac{\sqrt{2}}{2^{n/2}} \\
&= \left\| (|w\rangle\langle w| \otimes I)\, |E_w^{(m)}\rangle \right\| + \frac{\sqrt{2}}{2^{n/2}} \\
&\le \left\| |E_w^{(m)}\rangle \right\| + \frac{\sqrt{2}}{2^{n/2}} \\
&\le \frac{(2m+1)\sqrt{2}}{2^{n/2}}.
\end{aligned}$$
And so we get that $\Pr[w] \le (2m+1)^2/2^{n-1} = O(m^2/2^n)$. So finally, if $m = o(2^{n/2})$, we
have $\Pr[w] = o(1)$, i.e., $\Pr[w]$ approaches zero as n gets large, and the circuit likely won't
find w.
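Claim 21.1 can be sanity-checked numerically for a concrete circuit. Here is a small sketch (illustrative, not from the notes) that takes Grover's own iterate as the circuit: the unperturbed run replaces every oracle gate by the identity, the perturbed runs use the oracle $I_w$ for each target w, and the bound $D^{(j)} \le 4j^2$ is verified directly:

```python
import numpy as np

# Check D^(j) = sum_w || |phi_w^(j)> - |psi^(j)> ||^2 <= 4 j^2 when every
# U_j is the reflection 2|s><s| - I and the oracle is I_w (sign flip at w).
n = 6
N = 2 ** n
s = np.full(N, 1 / np.sqrt(N))

psi_plain = s.copy()              # run with the zero oracle (I_w -> identity)
phi = np.tile(s, (N, 1))          # row w = the run with oracle gate I_w
m = 8
for j in range(1, m + 1):
    psi_plain = 2 * s * (s @ psi_plain) - psi_plain
    phi[np.arange(N), np.arange(N)] *= -1        # apply I_w to each run
    phi = 2 * np.outer(phi @ s, s) - phi         # apply 2|s><s| - I to each
    D_j = np.sum(np.linalg.norm(phi - psi_plain, axis=1) ** 2)
    assert D_j <= 4 * j ** 2 + 1e-9              # Claim 21.1
```

At j = 1 the bound is met with equality here ($D^{(1)} = 4$), which is why a tiny floating-point tolerance is included.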
22 Week 11: Quantum cryptography

Quantum Cryptographic Key Exchange. If Alice and Bob share knowledge of a secret
string r of random bits, then Alice can send a message m with the same number of bits
as r to Bob over a channel subject to eavesdropping with perfect secrecy, i.e., no third party
(Eve), monitoring the channel with no knowledge of r, can gain any knowledge about m
whatsoever. This scheme, known as a one-time pad, works as follows:

1. Alice computes c = m ⊕ r, the bitwise exclusive OR of m and r. The message m is
called the cleartext or plaintext, and c is called the ciphertext.

2. Alice transmits the ciphertext c to Bob over the channel, which we'll assume is
publicly readable, e.g., a newspaper or an internet bulletin board.

3. Bob gets c and computes m = c ⊕ r, thus recovering the cleartext m.

All Eve sees is c = m ⊕ r, and since she doesn't know r, which is assumed to be uniformly
random, the bits of c look completely random to her: all possible c's are equally likely if
all possible r's are equally likely. Hence the perfect secrecy.
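The three steps above can be sketched in a few lines (an illustrative sketch; byte strings stand in for bit strings, with each byte carrying 8 of the bits of r):

```python
import secrets

# One-time pad: c = m xor r and m = c xor r, with r uniform and as long as m.
m = b"ATTACK AT DAWN"
r = secrets.token_bytes(len(m))                   # the shared secret string r
c = bytes(mi ^ ri for mi, ri in zip(m, r))        # Alice: ciphertext
decoded = bytes(ci ^ ri for ci, ri in zip(c, r))  # Bob: recover the cleartext
assert decoded == m
```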
It's called a one-time pad for a reason: r cannot be reused to send another message.
Suppose Alice sends another message m′, encrypted as c′ = m′ ⊕ r. Then Eve can compute
$$c \oplus c' = (m \oplus r) \oplus (m' \oplus r) = m \oplus m' \oplus r \oplus r = m \oplus m'.$$
If m and m′ are both uncompressed files of English text, then they have enough redundancy
that Eve can gain significant knowledge of m and m′ from m ⊕ m′ alone.

Show that if B and C are orthonormal bases of an n-dimensional Hilbert space such that
$|\langle b|c\rangle| = |\langle b'|c'\rangle|$ for any b, b′ ∈ B and c, c′ ∈ C (such bases are called mutually unbiased),
then $1/\sqrt{n}$ is the common value of $|\langle b|c\rangle|$ for any b ∈ B and c ∈ C. [Hint: Express a vector
from C as a linear combination of vectors from B. What can you say about the coefficients?]
A d-dimensional Hilbert space cannot have a mutually unbiased collection of more
than d + 1 orthonormal bases. If d is a power of a prime number, then d + 1 mutually
unbiased bases can be constructed, but it is an open problem to determine how many
mutually unbiased bases there can be when d is not a prime power. Even the case where
d = 6 is open. Anyway, for the one-qubit case where d = 2, the bases $\{|{+x}\rangle, |{-x}\rangle\}$,
$\{|{+y}\rangle, |{-y}\rangle\}$, and $\{|{+z}\rangle, |{-z}\rangle\}$ are mutually unbiased. (Other collections of three mutually
unbiased bases can be obtained from these three by applying some unitary operator U to
every vector (the same for all the vectors). Applying U does not change the inner product
of any pair of vectors.) BB84 uses two of these three, say, $\{|{+z}\rangle, |{-z}\rangle\}$ and $\{|{+x}\rangle, |{-x}\rangle\}$. We'll
denote the first of these by ↕, consisting of spin-up (↑) and spin-down (↓) states, and the
second by ↔, consisting of spin-right (→) and spin-left (←) states. The two vectors of
each basis encode the two possible bit values: in the ↕ basis, ↑ encodes 0 and ↓ encodes 1;
in the ↔ basis, → encodes 0 and ← encodes 1. Here is the protocol:
Sending qubits. Alice and Bob repeat the following for j running from 1 to n, where n is
some large number. The random choices made at one iteration are independent of
those made at other iterations.

1. Alice chooses a bit $b_j \in \{0, 1\}$ uniformly at random. She also chooses $B_j$ to be
one of the bases ↕ or ↔ uniformly at random, independent of $b_j$. She prepares
a qubit in a state $|q_j\rangle$ encoding the bit $b_j$ in the basis $B_j$ (i.e., $|q_j\rangle$ is either ↑ or →
for $b_j = 0$, and either ↓ or ← for $b_j = 1$), and sends the qubit $|q_j\rangle$ to Bob across
the quantum channel.

2. Bob receives the qubit sent from Alice, chooses a basis $C_j$ from {↕, ↔} uniformly
at random, and measures the qubit projectively using $C_j$, obtaining a bit value
$c_j$ according to the same encoding scheme described above.

This ends the quantum part of the protocol. All further communication between
Alice and Bob is classical and uses the public, classical channel.
Discarding uncorrelated bits. Note that if the quantum channel faithfully transmits all
of Alice's qubits to Bob unaltered, then $b_j = c_j$ with certainty whenever Alice's basis
was the same as Bob's, i.e., whenever $B_j = C_j$; otherwise $b_j$ and $c_j$ are completely
uncorrelated (because ↕ and ↔ are mutually unbiased).

1. For each $1 \le j \le n$, Bob tells Alice the basis $C_j$ he used to measure $c_j$.

2. Alice replies to Bob with the set $C = \{j \in \{1, \ldots, n\} : B_j = C_j\}$ ("C" stands for
"correlated"). Let k be the size of C. Note that k is expected to be about n/2,
because each $B_j$ and $C_j$ were chosen independently.

3. Alice and Bob discard the results of all trials where $B_j \ne C_j$. Alice retains the
bits $b_j$ and Bob retains the bits $c_j$, for all $j \in C$. If the quantum channel was not
tampered with, then $b_j = c_j$ for all $j \in C$.
Security check. 1. Alice chooses a subset $S \subseteq C$ uniformly at random ("S" stands for
"security check"). For example, she decides to put j into S with probability 1/2
independently for each $j \in C$. The set S is expected to have size about k/2.

2. Alice sends S to Bob along with the value of $b_j$ for each $j \in S$.

3. Bob checks whether $b_j = c_j$ for every $j \in S$. If so, he tells Alice that they can
accept the protocol, in which case, Alice and Bob respectively discard the bits
$b_j$ and $c_j$ where $j \in S$ and retain the rest of the bits $b_j$ and $c_j$ for $j \in C - S$ (about
k/2 or about n/4 many bits). On the other hand, if there are any discrepancies,
then Bob tells Alice that they should reject the protocol, in which case, all bits
are discarded and they start over with an entirely new run of the protocol.
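With no noise and no eavesdropper, the sending and sifting phases above can be simulated classically, since measuring in the wrong basis simply yields a fair coin flip (a toy sketch; all names are illustrative):

```python
import random

# Toy BB84 run without Eve: 'z'/'x' stand for the two mutually unbiased bases;
# measuring in the preparation basis returns the bit, otherwise a coin flip.
random.seed(1)
n = 2000
alice_bits  = [random.randrange(2) for _ in range(n)]
alice_bases = [random.choice("zx") for _ in range(n)]
bob_bases   = [random.choice("zx") for _ in range(n)]
bob_bits = [b if ab == bb else random.randrange(2)
            for b, ab, bb in zip(alice_bits, alice_bases, bob_bases)]

C = [j for j in range(n) if alice_bases[j] == bob_bases[j]]   # correlated set
S = set(j for j in C if random.randrange(2) == 0)             # security check
assert all(alice_bits[j] == bob_bits[j] for j in C)           # channel intact
key = [alice_bits[j] for j in C if j not in S]                # about n/4 bits
```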
Note that if the quantum channel is not tampered with, then Alice and Bob will accept
the protocol. Also notice that any third party monitoring the classical communication
between Alice and Bob knows nothing of the bits that Alice and Bob eventually retain.
We'll explain why there is a good chance that Eve will be caught and the protocol rejected
if she tries to eavesdrop on the quantum channel during the initial qubit communication.
For technical simplicity, we will assume that there is only one way that Eve can
eavesdrop on the quantum channel: she can choose to measure some qubit in either of
the bases ↕ or ↔, then send along to Bob some qubit that she prepares based on her
measurement. This is not a general proof of security then, because Eve could do other
things: measure a qubit in some arbitrary basis, or even couple the qubit to another
quantum system, let the combined system evolve, make a measurement in the combined
system, then send along some qubit to Bob based on that. She could even make correlated
measurements involving several of the sent qubits together. It takes some work to show
that Eve's chances of being caught are not significantly reduced by these more general
attacks, and we won't show the more general proof here.
If Eve happens to measure a qubit $|q_j\rangle$ in the same basis $B_j$ that Alice used, then this is
very good for Eve: she knows the encoded bit with certainty, and the post-measurement
state is still $|q_j\rangle$, i.e., Eve did not alter it. So she can simply retransmit the post-measurement
qubit to Bob. In this case, if $j \in S$, then this qubit won't provide any evidence of tampering;
if $j \in C - S$, then Eve knows one of the secret bits that Alice and Bob share, assuming
they accept the protocol.
With probability 1/2, however, Eve measures $|q_j\rangle$ in the wrong basis, the one other
than $B_j$. In this case, she gets a bit value uncorrelated with $b_j$, but even worse (for Eve),
her measurement alters the qubit so as to lose any information about $b_j$. She has to send
a qubit to Bob, and at this point she cannot tell that she has chosen the wrong basis, so
the best she can do is what she did before: resend the post-measurement qubit to Bob. If
$j \in C$, then Bob will measure Eve's altered qubit $|r_j\rangle$ using $B_j$, and since $|r_j\rangle$ is in the wrong
basis, which is mutually unbiased with $B_j$, Bob's result $c_j$ will be completely random and
uncorrelated with Alice's $b_j$. If $j \in S$ and $b_j \ne c_j$, then Eve is caught and the protocol is
rejected.
To summarize, for each qubit $|q_j\rangle$ that Eve decides to eavesdrop on, Eve will get caught
measuring the qubit if and only if

1. she chooses the wrong basis (the one other than $B_j$), and

2. $j \in C$ (i.e., Bob uses $B_j$ to do his measurement and the bit is not discarded as
uncorrelated), and

3. Bob measures a value $c_j \ne b_j$, and

4. $j \in S$ (i.e., this is one of the bits Alice and Bob use for the security check).

Each of these four things happens with probability 1/2, conditioned on the event that
the things above it all happened. This makes the chances of Eve being caught on behalf
of this qubit to be $(1/2)^4 = 1/16$. If Eve decides to eavesdrop on qubits $|q_{j_1}\rangle, \ldots, |q_{j_\ell}\rangle$
for $1 \le j_1 < \cdots < j_\ell \le n$, then her probability of escaping detection entirely is
$$(1 - 1/16)^\ell < e^{-\ell/16},$$
which decreases exponentially in $\ell$ and is less than 1/e for $\ell \ge 16$. So Eve cannot eavesdrop
on more than 16 qubits without a high probability of being caught. If $n \gg 16$, this is a
negligible fraction of the roughly n/4 bits retained by Alice and Bob if they accept the
protocol.
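The 1/16 figure can be checked by a quick Monte Carlo sketch over the four independent coin flips (illustrative only):

```python
import random

# Each eavesdropped qubit gets Eve caught iff four independent events, each of
# probability 1/2, all occur: wrong basis, j in C, Bob's bit flips, j in S.
random.seed(0)
trials = 200_000
caught = sum(
    1 for _ in range(trials)
    if all(random.randrange(2) == 0 for _ in range(4))
)
freq = caught / trials    # should be close to 1/16 = 0.0625
assert abs(freq - 1 / 16) < 0.005
```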
Exercise 22.3 Suppose that instead of the security check given above, Alice and Bob decide
to do the following alternate security check:

1. Alice and Bob each compute the parities $b = \bigoplus_{j \in C} b_j$ and $c = \bigoplus_{j \in C} c_j$ of their
current respective bits.

2. They compare b and c over the public channel.

3. If $b \ne c$, then Alice and Bob reject the protocol and start over. Otherwise, they agree
on some $j_0 \in C$ (it doesn't matter which), discard $b_{j_0}$ and $c_{j_0}$, and retain the rest
of the bits $b_j$ and $c_j$ for $j \in C - \{j_0\}$ as their shared secret, accepting the protocol.
[If they didn't discard one of the bits, then someone monitoring the public channel
would know the parity of Alice's and Bob's shared bits. Discarding a bit removes
this information.]

How many bits on average do Alice and Bob retain in this altered protocol, assuming they
accept it? What are Eve's chances of being caught if she eavesdrops on $\ell$ of the qubits,
where $\ell > 0$?
In practice, polarized photons are used as qubits for the quantum communication
phase. Alice may generate these photons by a light-emitting diode (LED) and can send
them to Bob through fiber optic cable. Photon polarization has a two-dimensional state
space and so can serve as a qubit. The three standard mutually unbiased bases for photon
polarization (each given with its two possible states) are:

1. rectilinear (horizontal, vertical),

2. diagonal (northeast-southwest, northwest-southeast), and

3. circular (clockwise, counterclockwise).

Photons have the advantage that their polarization is easy to measure and is insensitive
to certain common sources of noise, e.g., stray electric and magnetic fields.

One technical problem is making sure that only one photon is sent at a time. Alice
sends a pulse of electric current through the LED, which emits light in a burst of coherent
photons with intensity (expected number of photons) proportional to the strength of the
current. If more than one photon is sent at a time (in identical quantum states), then Eve
could conceivably catch one of the photons and measure it, letting the other photon(s)
go through to Bob as if nothing had been tampered with. To reduce the probability of a
multiphoton burst, the current Alice sends through the LED must be exceedingly weak:
about one tenth the energy of a single photon, say. Then the expected number of photons
sent each time is about 1/10. This means that about nine times out of ten, no photons are
emitted at all. If $\mu > 0$ is the ratio of the current energy divided by the energy of a single
photon (in this example, $\mu = 0.1$), then the number of photons emitted in any given burst
satisfies a Poisson distribution with mean $\mu$; that is, the probability that k photons are
emitted is
$$f(k, \mu) = \frac{e^{-\mu} \mu^k}{k!},$$
where k is any nonnegative integer. If $\mu = 0.1$, then $f(2, 0.1) = e^{-0.1}(0.1)^2/2 \approx 0.005$,
or about one twentieth the probability of a single photon. More photons occur with
rapidly diminishing probability. So if we ignore the times when no photons are emitted
(Bob tells Alice that he did not receive a photon), the chances of multiple photons is
small: about 1/20. The smaller $\mu$ is, the smaller this probability will be, but the tradeoff
is that we have to wait longer for a single photon.
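The Poisson figures quoted above are easy to verify (a quick check with $\mu = 0.1$):

```python
import math

# Photon counts in a weak coherent pulse: Poisson with mean mu = 0.1.
mu = 0.1
f = lambda k: math.exp(-mu) * mu ** k / math.factorial(k)

p_none = f(0)                                        # about 0.905: usually nothing
p_multi_given_some = (1 - f(0) - f(1)) / (1 - f(0))  # P(>=2 | >=1), about 1/20
assert abs(p_none - 0.905) < 0.001
assert abs(p_multi_given_some - 0.05) < 0.002
```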
Of course, the quantum channel could also be subject to random, nonmalicious noise,
which would cause discrepancies between Alice's and Bob's bits. One subtlety is to make
the protocol tolerate a certain amount of noise but still detect malicious tampering with
high probability.
23 Week 11: Basic quantum information

We now start our discussion of quantum information. One of the major uses of quantum
information theory is to analyze how noise can disrupt a quantum computation and how
to make the computation resistant to it. The textbook discusses quantum information
in earnest in Chapters 8-12, with quantum error correction in Chapter 10 and quantum
information theory in Chapter 12. Quantum information is one of the textbook's real
strong points, and I will assume you will read starting with Chapter 8. The lectures will
fill in some background and reiterate points in the text. In the next few topics, we will be
using the density operator formalism almost exclusively. We really have no choice about
this once we generalize our notion of state to include mixed states.
Norms of Operators. Recall the definition of the Hilbert-Schmidt inner product on L(H)
(Equation (8)). This suggests another way to define the norm of an operator:
$$\|A\|_2 := \langle A|A\rangle^{1/2} = \left( \mathrm{tr}(A^\dagger A) \right)^{1/2}.$$
This norm, known as the Euclidean norm, the $L_2$ norm, or the Hilbert-Schmidt norm, satisfies
all ten of the properties satisfied by the operator norm of Definition 18.4 except property 4;
in fact, $\|I\|_2 = \sqrt{n}$. More generally, for any real $p \ge 1$ we can define
$$\|A\|_p := \left( \mathrm{tr}\,|A|^p \right)^{1/p} = \left( \sum_{j=1}^n s_j^p \right)^{1/p}, \qquad (78)$$
where $s_1, \ldots, s_n \ge 0$ are the eigenvalues of $|A| = \sqrt{A^\dagger A}$, which are called the singular values of A.
(If p is not an integer, then technically, we have not yet defined $|A|^p$, because A is an
operator. For now, you can ignore the middle expression in the equation above and use
the right-hand side for the definition of $\|A\|_p$.) For p = 2, we get the Euclidean norm.
The $L_1$ norm $\|A\|_1 = \mathrm{tr}\,|A|$ is also called the trace norm and is often useful. In addition, we
could define the $L_\infty$ norm
$$\|A\|_\infty = \lim_{p \to \infty} \|A\|_p = \max(s_1, \ldots, s_n),$$
but this is precisely the operator norm $\|A\|$ of Definition 18.4, because as p gets large, the
largest term in the sum in (78) starts to dominate.
Exercise 23.1 Show that if A is an operator on an n-dimensional space, and $1 \le p \le q$
are real numbers, then $\|A\|_p \ge \|A\|_q$. Also show that $n\|A\| \ge \|A\|_1$. Thus all these
norms are within a factor of n of each other. What is $\|I\|_p$? [Hint: For the first part, fix
$s_1, \ldots, s_n \ge 0$ and differentiate the expression $\left( \sum_{j=1}^n s_j^p \right)^{1/p}$ with respect to p, and show
that the derivative is always negative or zero.]
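These inequalities are easy to spot-check numerically on a random matrix (a sketch; the singular values are obtained from an SVD):

```python
import numpy as np

# Spot-check: ||A||_p is nonincreasing in p, and ||A||_1 <= n ||A||_infinity.
rng = np.random.default_rng(0)
n = 5
A = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
s = np.linalg.svd(A, compute_uv=False)      # singular values of A

def norm_p(p):
    return float(np.sum(s ** p) ** (1 / p))

vals = [norm_p(p) for p in (1, 1.5, 2, 4, 16)]
assert all(a >= b - 1e-9 for a, b in zip(vals, vals[1:]))   # nonincreasing
assert norm_p(1) <= n * float(s.max()) + 1e-9               # within factor n
```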
POVMs. Let S be a physical system with state space $H_S$. Often, we want to get some
classical information about the current state of S. We can perform a projective measurement
on $H_S$, obtaining various possible outcomes with various probabilities. This is not
the only way to get information about the state of S, however. We could instead couple
the system S with another system T in some known state in the state space $H_T$, let
the combined system ST evolve for a while, then make a projective measurement of the
combined system, i.e., on the space $H_S \otimes H_T$. This approach is more general and can get
information that cannot be obtained by a projective measurement on $H_S$ itself.

Recall that mathematically, a projective measurement on a Hilbert space H corresponds
to a complete set of orthogonal projectors $\{P_j : j \in I\}$, where I is the set of possible outcomes.
We'll now relax this restriction a bit.

Definition 23.2 Let H be a Hilbert space. A positive operator-valued measure, or POVM, on
H is a set $M = \{M_j : j \in I\}$ where I is some finite or countably infinite set (the possible
outcomes), each $M_j \ge 0$ is a positive operator in L(H), and
$$\sum_{j \in I} M_j = I,$$
the identity operator on H. If $\rho \in L(H)$ is a state, then measuring $\rho$ with respect to M
yields outcome $j \in I$ with probability $\Pr[j] := \mathrm{tr}(M_j \rho)$.
We need to check that the Pr[j] really form a probability distribution. Fix a state
$\rho \in L(H)$. For each $j \in I$, we have
$$\Pr[j] = \mathrm{tr}(M_j \rho) = \mathrm{tr}\!\left( \sqrt{M_j} \sqrt{M_j}\, \rho \right) = \mathrm{tr}\!\left( \sqrt{M_j}\, \rho\, \sqrt{M_j} \right).$$
Since $\rho \ge 0$ and $\sqrt{M_j}$ is self-adjoint, for any vector v we have
$$\left\langle v \left| \sqrt{M_j}\, \rho\, \sqrt{M_j} \right| v \right\rangle = \langle u|\rho|u\rangle \ge 0,$$
where we set $|u\rangle := \sqrt{M_j}\, |v\rangle$. Thus $\sqrt{M_j}\, \rho\, \sqrt{M_j} \ge 0$, and so $\Pr[j] = \mathrm{tr}(\sqrt{M_j}\, \rho\, \sqrt{M_j}) \ge 0$.
Furthermore,
$$\sum_{j \in I} \Pr[j] = \sum_j \mathrm{tr}(M_j \rho) = \mathrm{tr}\!\left( \left( \sum_j M_j \right) \rho \right) = \mathrm{tr}(I\rho) = \mathrm{tr}\,\rho = 1.$$
So the Pr[j] do form a probability distribution. It is important to note that the only
properties of $\rho$ that we used here are that $\rho \ge 0$ and that $\mathrm{tr}\,\rho = 1$. This is important,
because we are about to expand our definition of state to include mixed states, which
may no longer be projection operators, but are still positive and have unit trace.

Notice that for a POVM, we don't specify the post-measurement state. This is okay;
quite often, we don't care what the post-measurement state is; we only care about the
outcomes and their statistics, and a POVM provides the most general means of measuring
a quantum system if we don't care about the state after the measurement. We'll show in a
bit that a POVM is equivalent to what we described above: coupling the system to another
system, letting the combined system evolve, then making a projective measurement on
the combined system. Notice that a projective measurement on H is just a special case of
a POVM where the $M_j$ are projectors projecting onto mutually orthogonal subspaces.
Exercise 23.3 Consider the following three-outcome POVM:
$$M_1 = \frac{1}{4}\begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix}, \quad M_2 = \frac{1}{4}\begin{pmatrix} 1 & i \\ -i & 1 \end{pmatrix}, \quad M_3 = \frac{1}{4}\begin{pmatrix} 1 & -1-i \\ -1+i & 2 \end{pmatrix}.$$
Let $\rho = |\psi\rangle\langle\psi|$, where
$$|\psi\rangle = \frac{1}{5}\begin{pmatrix} 4 \\ 3i \end{pmatrix}.$$
What is the probability of each of the three outcomes when $\rho$ is measured using the POVM
above? Find a unit vector $|\varphi\rangle$ such that when $|\varphi\rangle$ is measured with the POVM, the
second outcome occurs with probability 0. (Challenging part) Prove that the first outcome
occurs with positive probability no matter what the state is. What is the essential property
of $M_1$ that makes this true? Clearly, $M_2$ does not share this property. Does $M_3$? [Hint:
When computing the probabilities above, you can save yourself some calculation by using
the fact that $\mathrm{tr}(M_j |\psi\rangle\langle\psi|) = \langle\psi|M_j|\psi\rangle$ for all j.]
Mixed States.

Definition 23.4 Let $A_1, \ldots, A_k$ be scalars, vectors, operators, matrices, etc., all of the same
type. A convex linear combination of $A_1, \ldots, A_k$ is a value of the form
$$\sum_{i=1}^k p_i A_i,$$
where the $p_i$ are real scalars, each $p_i \ge 0$, and $\sum_{i=1}^k p_i = 1$. In other words, $p_1, \ldots, p_k$
form a probability distribution.
Suppose Alice has a lab where she can prepare several states $\rho_1 = |\psi_1\rangle\langle\psi_1|, \ldots, \rho_k = |\psi_k\rangle\langle\psi_k| \in L(H)$, and she flips coins and decides to prepare a state chosen at random
from this set, where each $\rho_i$ is chosen with probability $p_i$. She then sends the state she
prepared to Bob, without telling him what it is. What can Bob find out about the state
that Alice sent him? He can, most generally, perform a measurement corresponding to
some POVM $\{M_j : j \in I\}$. The probability of obtaining any outcome j, taken over both the
POVM and Alice's random choice, is then
$$\Pr[j] = \sum_{i=1}^k \Pr[j \mid \rho = |\psi_i\rangle\langle\psi_i|] \cdot \Pr[\rho = |\psi_i\rangle\langle\psi_i|] = \sum_i \mathrm{tr}(M_j \rho_i)\, p_i = \mathrm{tr}(M_j \rho),$$
where $\rho = \sum_{i=1}^k p_i \rho_i$ is a convex combination of the $\rho_i$ with the associated probabilities.
So all Bob can ever determine physically about Alice's choice is given by the single operator $\rho$,
which is called a mixed state. By definition, a mixed state is any nontrivial convex linear
combination of one-dimensional projectors. By nontrivial we mean that all probabilities
are strictly less than 1, or equivalently, there are at least two probabilities that are nonzero.

Mathematically, a mixed state behaves in many ways much like a state of the form $|\psi\rangle\langle\psi|$
for some unit vector $|\psi\rangle$ (i.e., a one-dimensional projector). It represents the state of
a quantum system about which we have incomplete information, or which we are not
describing completely. Completely described states, which up until now we have been
dealing with exclusively, are of the form $|\psi\rangle\langle\psi|$ for unit vectors $|\psi\rangle$. From now on we will
call these latter states pure states, and when we use the word "state" unqualified, we will
mean either a pure or mixed state. Both kinds of states are convex combinations of pure
states, trivial or otherwise. A mixed state is then some nontrivial probabilistic mixture, or
weighted average, of pure states.
Let's verify that if $\rho$ is any state (say, $\rho = \sum_{i=1}^k p_i \rho_i$, where the $p_i$ form a probability
distribution and the $\rho_i$ are all pure states), we have $\rho \ge 0$ and $\mathrm{tr}\,\rho = 1$. For positivity, let v
be any vector. Then
$$\langle v|\rho|v\rangle = \sum_{i=1}^k p_i \langle v|\rho_i|v\rangle \ge 0,$$
because all the $\rho_i$ are positive operators. Thus $\rho \ge 0$. By linearity of the trace, we have
$$\mathrm{tr}\,\rho = \sum_{i=1}^k p_i\, \mathrm{tr}\,\rho_i = \sum_i p_i = 1,$$
because all the $\rho_i$ have unit trace. The converse of this is also true.
Proposition 23.5 If $\rho \in L(H)$ is such that $\rho \ge 0$ and $\mathrm{tr}\,\rho = 1$, then $\rho$ is a convex linear
combination of one-dimensional projectors that project onto mutually orthogonal subspaces.

Proof. Suppose $\rho \ge 0$ and $\mathrm{tr}\,\rho = 1$. Since $\rho$ is normal, it has an eigenbasis $\{|\psi_1\rangle, \ldots, |\psi_n\rangle\}$.
With respect to this eigenbasis, $\rho$ is represented by the matrix $\mathrm{diag}(p_1, \ldots, p_n)$ for some
$p_1, \ldots, p_n$, and so $\rho = \sum_{i=1}^n p_i |\psi_i\rangle\langle\psi_i|$. Since $\rho \ge 0$, all the $p_i$ are nonnegative real, and
further $1 = \mathrm{tr}\,\rho = \sum_i p_i$. So $\rho$ is a convex combination of $|\psi_1\rangle\langle\psi_1|, \ldots, |\psi_n\rangle\langle\psi_n|$, which
project onto mutually orthogonal, one-dimensional subspaces. $\Box$
Thus we get the following two corollaries:

Corollary 23.6 An operator $\rho \in L(H)$ is a state (i.e., a convex combination of one-dimensional
projectors) if and only if $\rho$ is positive and has unit trace.

Corollary 23.7 An operator $\rho \in L(H)$ is a state if and only if $\rho$ is normal and its eigenvalues
form a probability distribution.
Measuring a mixed state with a POVM has exactly the same mathematical form as
with a pure state. Recall that the only two properties of the state $\rho$ we used to show
that the measurement makes sense are that $\rho \ge 0$ and $\mathrm{tr}\,\rho = 1$, both of which are true of
any mixed state. Similarly, unitary time evolution of a mixed state has exactly the same
mathematical form as with a pure state. Indeed, if $\rho = \sum_i p_i \rho_i$ is some mixture of pure
states, then evolving $\rho$ via a unitary operator U should be equivalent to evolving each $\rho_i$
by U and taking the same mixture of the results. By linearity, this gives
$$\sum_i p_i \left( U \rho_i U^\dagger \right) = U \left( \sum_i p_i \rho_i \right) U^\dagger = U \rho U^\dagger.$$
Finally, we won't bother proving it, but Equations (25) and (26), which describe projective
measurements, are equally valid for mixed states as well as for pure states.
Different probability distributions of pure states can yield the same state $\rho$, but if they
do, they are physically indistinguishable, that is, no physical experiment can tell one
distribution from the other with positive probability. However, for any state $\rho$, there is a
preferred mix of pure states that yields $\rho$, namely, the eigenstates $|\psi_1\rangle\langle\psi_1|, \ldots, |\psi_n\rangle\langle\psi_n|$
used in the proof of Proposition 23.5, with their respective eigenvalues as probabilities.
These states are distinguished by the fact that they are pairwise orthogonal. We will call this
preferred probability distribution the eigenvalue distribution of $\rho$.
It's time for an example. Alice may send Bob a single qubit in state $|0\rangle$ with probability
1/2 and state $|+\rangle = (|0\rangle + |1\rangle)/\sqrt{2}$ with probability 1/2. The resulting mixed state is
$$\rho = \frac{1}{2}|0\rangle\langle 0| + \frac{1}{2}|+\rangle\langle +| = \frac{1}{4}\begin{pmatrix} 3 & 1 \\ 1 & 1 \end{pmatrix},$$
whose eigenvectors are
$$|\psi_1\rangle = \frac{1}{\sqrt{4 - 2\sqrt{2}}}\begin{pmatrix} 1 \\ \sqrt{2} - 1 \end{pmatrix} \text{ with eigenvalue } p_1 = (2 + \sqrt{2})/4,$$
$$|\psi_2\rangle = \frac{1}{\sqrt{4 - 2\sqrt{2}}}\begin{pmatrix} \sqrt{2} - 1 \\ -1 \end{pmatrix} \text{ with eigenvalue } p_2 = (2 - \sqrt{2})/4.$$
Thus $\rho = p_1 |\psi_1\rangle\langle\psi_1| + p_2 |\psi_2\rangle\langle\psi_2|$. So if Carol prepares $|\psi_1\rangle\langle\psi_1|$ with probability $p_1$ and
$|\psi_2\rangle\langle\psi_2|$ with probability $p_2$, then ships her state to Bob, then Bob (who doesn't see who
the sender is) can't tell with any advantage over guessing who sent him the state.
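The eigendecomposition in this example can be verified numerically:

```python
import numpy as np

# rho = (|0><0| + |+><+|)/2 should have eigenvalues (2 -+ sqrt(2))/4.
ket0 = np.array([1.0, 0.0])
plus = np.array([1.0, 1.0]) / np.sqrt(2)
rho = 0.5 * np.outer(ket0, ket0) + 0.5 * np.outer(plus, plus)

evals = np.sort(np.linalg.eigvalsh(rho))
expected = np.array([(2 - np.sqrt(2)) / 4, (2 + np.sqrt(2)) / 4])
assert np.allclose(evals, expected)
assert abs(np.trace(rho) - 1) < 1e-12
```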
Exercise 23.8 Do a similar analysis as that above, this time assuming Alice sends
$(4|0\rangle + 3|1\rangle)(4\langle 0| + 3\langle 1|)/25$ with probability 1/2 and $(4|0\rangle - 3|1\rangle)(4\langle 0| - 3\langle 1|)/25$ with probability
1/2.
Exercise 23.9 Prove that any convex combination of states (pure or mixed) is a state.
One-Qubit States and the Bloch Sphere. Recall that we have a nice geometrical representation
of one-qubit pure states: for each one-qubit pure state $\rho$ there correspond unique
$x, y, z \in \mathbb{R}$ such that $x^2 + y^2 + z^2 = 1$ and $\rho = (I + xX + yY + zZ)/2$, and conversely, for any
point (x, y, z) on the unit sphere in $\mathbb{R}^3$ (the Bloch sphere), the operator $(I + xX + yY + zZ)/2$ is
a one-qubit pure state.
Can we characterize general one-qubit states in a similarly geometrical way? Yes. Let
$\rho = \sum_{i=1}^k p_i \rho_i$ be any one-qubit state, where the $\rho_i$ are one-qubit pure states and the $p_i$
form a probability distribution as usual. For $1 \le i \le k$, let $(x_i, y_i, z_i)$ be the point on the
Bloch sphere such that $\rho_i = (I + x_i X + y_i Y + z_i Z)/2$. Then by linearity we have
$$\rho = \sum_{i=1}^k p_i \rho_i = \sum_i p_i \left( \frac{I + x_i X + y_i Y + z_i Z}{2} \right) = \frac{I + xX + yY + zZ}{2},$$
where $(x, y, z) := \sum_{i=1}^k p_i (x_i, y_i, z_i) \in \mathbb{R}^3$. That is, $\rho$ corresponds geometrically to the
point $(x, y, z) \in \mathbb{R}^3$ that is the convex combination of all the points $(x_i, y_i, z_i)$, weighted by
the same probabilities $p_i$ used to weight $\rho$ in terms of the $\rho_i$. We note that
$$\sqrt{x^2 + y^2 + z^2} = \|(x, y, z)\| = \left\| \sum_{i=1}^k p_i (x_i, y_i, z_i) \right\| \le \sum_i p_i = 1,$$
and the inequality is strict iff there are at least two distinct points $(x_i, y_i, z_i)$ on the sphere
with $p_i > 0$. This means that the point (x, y, z) is somewhere on or inside the Bloch sphere.
The surface points of the Bloch sphere correspond to the pure states, and the points in the
interior correspond to mixed states. A one-qubit unitary U rotates a mixed state in the
interior just as it does points on the surface of the sphere (it rotates all of $\mathbb{R}^3$, in fact).
We can get some important facts about $\rho$ based on its geometry. For example, if
$\rho = (I + xX + yY + zZ)/2$, then let $r = \|(x, y, z)\| = (x^2 + y^2 + z^2)^{1/2} \le 1$ be the distance from
(x, y, z) to the origin. Then the eigenvalues of $\rho$ are $(1 \pm r)/2$, and if $r > 0$, the corresponding
eigenvectors are the states corresponding to the antipodal points $\pm(x, y, z)/r$ on the surface
of the sphere. ($(I + (x/r)X + (y/r)Y + (z/r)Z)/2$ has eigenvalue $(1 + r)/2$, while $(I - (x/r)X - (y/r)Y - (z/r)Z)/2$ has eigenvalue $(1 - r)/2$, which are the two probabilities in
the eigenvalue distribution of $\rho$.) These are the points where the line through (0, 0, 0) and
(x, y, z) intersects the surface of the sphere. (If r = 0, then (x, y, z) is the origin, $\rho = I/2$,
and every vector is an eigenvector with eigenvalue 1/2.)
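A numeric sketch of the eigenvalue claim, using a point strictly inside the Bloch ball:

```python
import numpy as np

# For rho = (I + xX + yY + zZ)/2 with r = |(x, y, z)| <= 1, the eigenvalues
# of rho should be (1 - r)/2 and (1 + r)/2.
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2)

rng = np.random.default_rng(1)
v = rng.normal(size=3)
v *= 0.7 / np.linalg.norm(v)          # a point with r = 0.7 inside the ball
x, y, z = v
r = np.linalg.norm(v)
rho = (I2 + x * X + y * Y + z * Z) / 2
evals = np.sort(np.linalg.eigvalsh(rho))
assert np.allclose(evals, [(1 - r) / 2, (1 + r) / 2])
```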
Exercise 23.10 Prove all the assertions in the paragraph above. [Hint: You could certainly
compute the eigenvectors and eigenvalues of $\rho$ by brute force if you had to. Alternatively,
you might note that if you let $\rho_1 = |\psi_1\rangle\langle\psi_1| = (I + (x/r)X + (y/r)Y + (z/r)Z)/2$ and $\rho_2 = |\psi_2\rangle\langle\psi_2| = (I - (x/r)X - (y/r)Y - (z/r)Z)/2$, then $\langle\psi_1|\psi_2\rangle = 0$ because the corresponding
points are antipodal, and further, $\rho$ is a convex combination of $\rho_1$ and $\rho_2$. What are the
coefficients of this combination in terms of r? What does the matrix of $\rho$ look like in the
$\{|\psi_1\rangle, |\psi_2\rangle\}$ basis?]
24 Week 12: Quantum operations

The Partial Trace. We sometimes have a system T that we are interested in coupled
with another system S that we are not interested in, producing an entangled state in the
combined system ST. Since we only care about system T, does it make sense to ask, "what
is the state of T?" even though it is entangled with S? The partial trace operator lets us
do just that.

Let $H_S$ and $H_T$ be Hilbert spaces. There is a unique linear map $\mathrm{tr}_S : L(H_S \otimes H_T) \to L(H_T)$ such that for every $A \in L(H_S)$ and $B \in L(H_T)$,
$$\mathrm{tr}_S(A \otimes B) = (\mathrm{tr}\,A)\,B. \qquad (79)$$
The map tr
S
is an example of a partial trace. When we apply tr
S
, we often say that we
are tracing out the system S. There can be only one linear map satisfying (79), because
L(H
S
H
T
) is spanned by tensor products of operators. Suppose H
S
has dimension m
and H
T
has dimension n. Suppose some operator C L(H
S
H
T
) is written in block
matrix form with respect to some product basis:
    C = [ B_11  B_12  ···  B_1m
          B_21  B_22  ···  B_2m
           ⋮     ⋮    ⋱    ⋮
          B_m1  B_m2  ···  B_mm ],
where each block B_ij is an n × n matrix. Then we can also write C uniquely as

    C = Σ_{i,j=1}^m E_ij ⊗ B_ij,

where E_ij is the m × m matrix whose (i, j)th entry is 1 and all other entries 0. The partial trace of C is then given in matrix form as

    tr_S(C) = Σ_{i,j} (tr E_ij) B_ij = Σ_{i=1}^m B_ii,

which is the sum of all the diagonal blocks of C and is an n × n matrix.
We may alternatively trace out the system T via the unique linear map tr_T : L(H_S ⊗ H_T) → L(H_S) that satisfies

    tr_T(A ⊗ B) = (tr B)A

for any A ∈ L(H_S) and B ∈ L(H_T). If C is as above, then

    tr_T(C) = Σ_{i,j=1}^m (tr B_ij) E_ij = [ tr B_11  tr B_12  ···  tr B_1m
                                             tr B_21  tr B_22  ···  tr B_2m
                                               ⋮        ⋮      ⋱     ⋮
                                             tr B_m1  tr B_m2  ···  tr B_mm ],
which is an m × m matrix.
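Both block-matrix formulas are easy to check numerically. Here is a small NumPy sketch of mine (the helper names are not from the notes); it verifies (79) and its tr_T analogue on a tensor product:

```python
import numpy as np

def trace_out_S(C, m, n):
    """tr_S: the sum of the diagonal n x n blocks of the mn x mn matrix C."""
    return np.trace(C.reshape(m, n, m, n), axis1=0, axis2=2)

def trace_out_T(C, m, n):
    """tr_T: the m x m matrix whose (i, j) entry is tr(B_ij)."""
    return np.trace(C.reshape(m, n, m, n), axis1=1, axis2=3)

# Sanity check on a tensor product: tr_S(A (x) B) = (tr A)B and tr_T(A (x) B) = (tr B)A.
rng = np.random.default_rng(0)
A = rng.standard_normal((2, 2))
B = rng.standard_normal((3, 3))
C = np.kron(A, B)
assert np.allclose(trace_out_S(C, 2, 3), np.trace(A) * B)
assert np.allclose(trace_out_T(C, 2, 3), np.trace(B) * A)
```

The reshape exposes the block structure: `C.reshape(m, n, m, n)[i, :, j, :]` is exactly the block B_ij from the text.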
The partial trace operator extends in a similar way to combinations of several systems
at once. Intuitively, tracing out a system is a bit like averaging over the system. In tensor
algebra, the partial trace operators and the (total) trace operator are called contractions.
If system ST is in some separable (i.e., tensor product) state ρ = ρ_S ⊗ ρ_T ∈ L(H_S ⊗ H_T), where ρ_S ∈ L(H_S) and ρ_T ∈ L(H_T) are states in S and T, respectively, then tr_S ρ = (tr ρ_S)ρ_T = ρ_T and tr_T ρ = (tr ρ_T)ρ_S = ρ_S. So we can say unequivocally that the system S is in state tr_T ρ and the system T is in state tr_S ρ. If ρ is entangled, then we can still say that system S is in state tr_T ρ and T is in tr_S ρ, but now these two states (called reduced states) are (nontrivially) mixed states, even if ρ itself is a pure state. Thus by tracing out one or the other system, we will lose some information about the state of the remaining system if the original combined state was entangled.
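As a concrete illustration (an example of my own, not from the notes): tracing out either qubit of the pure entangled state (|00⟩ + |11⟩)/√2 leaves the maximally mixed state I/2.

```python
import numpy as np

# Bell state |phi+> = (|00> + |11>)/sqrt(2); its density matrix is pure.
phi = np.array([1, 0, 0, 1]) / np.sqrt(2)
rho = np.outer(phi, phi.conj())

# Trace out either qubit using the block-matrix recipe (m = n = 2).
rho_T = np.trace(rho.reshape(2, 2, 2, 2), axis1=0, axis2=2)  # tr_S rho
rho_S = np.trace(rho.reshape(2, 2, 2, 2), axis1=1, axis2=3)  # tr_T rho

# Both reduced states are maximally mixed even though rho is pure.
assert np.allclose(rho_S, np.eye(2) / 2)
assert np.allclose(rho_T, np.eye(2) / 2)
```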
Open Systems and Quantum Operations. A closed quantum system is one that does not interact with the outside world. Closed systems evolve unitarily. An open quantum system does couple with one or more other systems (collectively called the environment) that we wish to ignore. By considering open systems, we will obtain a powerful formalism for describing what can happen to a quantum system that may interact with its environment. This formalism, the formalism of quantum operations, is general enough to encompass both unitary evolution and measurements.

There are two equivalent views of a quantum operation: one, the coupled-systems representation, where we include the environment then trace it out, and the other, the operator-sum representation, where we simply apply operators to states in the system without mentioning the environment. We'll show that these two views are equivalent. The coupled-systems view is more physically intuitive, while the operator-sum view is more mathematically convenient.
We now formally describe a quantum operation E on some system S according to the coupled-systems view. We imagine S (state space H_S of dimension n) in some state ρ. We now consider another system T (whose state space H_T has dimension N), in some known or prepared pure state |0⟩⟨0|, initially isolated from system S. The combined state of TS is then |0⟩⟨0| ⊗ ρ initially.^18 We now couple T and S together and let the combined system TS evolve according to some unitary operator U ∈ L(H_T ⊗ H_S), resulting in the state U(|0⟩⟨0| ⊗ ρ)U†. We now forget the system T by tracing it out, obtaining the final state

    ρ′ = E(ρ) := tr_T[U(|0⟩⟨0| ⊗ ρ)U†] ∈ L(H_S).    (80)

Because all the components making up the definition of E in (80) are linear maps, E itself is linear, mapping L(H_S) into L(H_S), and it depends implicitly on the system T, its initial state |0⟩⟨0|, and U.
^18 The textbook puts the auxiliary system T on the right, whereas we put it on the left. The two ways are equivalent, but ours will be more consistent with a block matrix representation we'll use later when we prove equivalence of the two views.
At first blush, the operator-sum formulation of the quantum operation E looks completely different. We pick some finite collection of operators K_1, ..., K_N ∈ L(H_S) (for some N ≥ 1) that are completely arbitrary except that we must have

    Σ_{j=1}^N K_j† K_j = I.    (81)

We then define, for any ρ ∈ L(H_S),

    ρ′ = E(ρ) := Σ_{j=1}^N K_j ρ K_j† ∈ L(H_S).    (82)

Defined this way, the operation E is evidently a linear map from L(H_S) to itself, and it depends implicitly on the choice of K_1, ..., K_N, which are called Kraus operators. We'll show in a minute that the two definitions of E just described are equivalent.
Exercise 24.1 Verify that if ρ is a state (i.e., ρ ≥ 0 and tr ρ = 1), then the ρ′ defined in (82) is also a state.
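For a concrete instance of (81) and (82), here is a NumPy sketch using the amplitude-damping channel as an assumed example set of Kraus operators; it also checks the claim of Exercise 24.1 numerically on one state (a spot check, not a proof):

```python
import numpy as np

# An assumed example set of Kraus operators (amplitude damping, gamma = 0.3).
g = 0.3
K = [np.array([[1, 0], [0, np.sqrt(1 - g)]]),
     np.array([[0, np.sqrt(g)], [0, 0]])]

# Completeness condition (81): sum_j K_j^dagger K_j = I.
assert np.allclose(sum(k.conj().T @ k for k in K), np.eye(2))

def channel(rho):
    """Operator-sum map (82): rho -> sum_j K_j rho K_j^dagger."""
    return sum(k @ rho @ k.conj().T for k in K)

# The image of a state is again a state: positive with trace one.
rho = np.array([[0.5, 0.5], [0.5, 0.5]])      # the pure state |+><+|
out = channel(rho)
assert np.isclose(np.trace(out), 1.0)
assert np.all(np.linalg.eigvalsh(out) >= -1e-12)
```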
The next exercise shows that quantum operations include unitary evolution.

Exercise 24.2 Show that unitary evolution of the system S through a unitary operator U ∈ L(H_S) is a legitimate quantum operation. Argue with respect to both views of quantum operations.
For another example, suppose we make a projective measurement on the system S in state ρ using some complete set {P_1, ..., P_k} of orthogonal projectors in L(H_S), but we don't bother to look at what the outcome of the measurement is. Then for all we know, the post-measurement state of S will be a mixture of the post-measurement states corresponding to all the possible outcomes, weighted by their probabilities. That is, using Equation (26), the state of S after this information-free measurement should be^19

    ρ′ = Σ_{j=1}^k Pr[j] · (P_j ρ P_j / Pr[j]) = Σ_j P_j ρ P_j.
This looks like the operator-sum representation of a quantum operation (Equation (82)), and indeed we have

    I = Σ_{j=1}^k P_j = Σ_j P_j² = Σ_j P_j† P_j,

because the P_j form a complete set of projectors. Thus P_1, ..., P_k satisfy Equation (81), and this information-free measurement is a quantum operation.
^19 A minor technical point: To be well-defined, the first sum in the next equation is really only over those j for which Pr[j] > 0. However, the second sum is over all j. The two sums are still equal, because if Pr[j] = tr(P_j ρ P_j) = 0 for some j, then P_j ρ P_j = 0 by Exercise 9.22.
Equivalence of the Coupled-Systems and Operator-Sum Representations. First we'll show that every quantum operation defined by the coupled-systems definition has an operator-sum representation. Suppose that E : L(H_S) → L(H_S) is defined, for all ρ ∈ L(H_S), as

    E(ρ) = tr_T[U(|0⟩⟨0| ⊗ ρ)U†],
where T is a system with state space H
T
, 0) H
T
is a unit vector, and U L(H
T
H
S
) is
unitary. Let n = dim(H
S
) and let N = dim(H
T
). Well pick a product basis for H
T
H
S
so that we can work directly with matrices. Let {e
1
), . . . , e
N
)} be an orthonormal basis
for H
T
and let {f
1
), . . . , f
n
)} be an orthonormal basis for H
S
. We can choose these bases
arbitrarily, so well assume that e
1
) = 0). With respect to the product basis {e
i
) f
j
) :
1 i N & 1 j n}, the operator U can be written uniquely in block matrix form as
U =
N
a,b=1
E
ab
B
ab
,
where each B_ab is an n × n matrix, and each E_ab = |e_a⟩⟨e_b| is the N × N matrix whose (a, b)th entry is 1 and all the other entries are 0. Noting that E_ab E_cd = |e_a⟩⟨e_b|e_c⟩⟨e_d| = δ_bc E_ad, we have

    U(|e_1⟩⟨e_1| ⊗ ρ)U† = Σ_{a,b,c,d=1}^N (E_ab ⊗ B_ab)(E_11 ⊗ ρ)(E_cd ⊗ B_cd)†    (83)
                        = Σ_{a,b,c,d} E_ab E_11 E_dc ⊗ B_ab ρ B_cd†    (84)
                        = Σ_{a,c} E_ac ⊗ B_a1 ρ B_c1†.    (85)
Tracing out the first component of each tensor product, and using the fact that tr E_ac = δ_ac, we get

    E(ρ) = tr_T[U(|e_1⟩⟨e_1| ⊗ ρ)U†] = Σ_{a,c=1}^N (tr E_ac) B_a1 ρ B_c1† = Σ_{a=1}^N B_a1 ρ B_a1†,    (86)

which has the form of (82) if we let K_a = B_a1. We're done if (81) holds. Since U is unitary,
we have

    I = U†U = Σ_{a,b,c,d=1}^N (E_ab ⊗ B_ab)†(E_cd ⊗ B_cd) = Σ_{a,b,c,d} E_ba E_cd ⊗ B_ab† B_cd = Σ_{a,b,d} E_bd ⊗ B_ab† B_ad = Σ_{b,d} E_bd ⊗ (Σ_a B_ab† B_ad).
Abusing notation by using the same I to represent identity matrices of different dimensions, we also have

    I = Σ_{b=1}^N E_bb ⊗ I = Σ_{b,d=1}^N δ_bd E_bd ⊗ I.

By uniqueness of the two decompositions of I, we must have

    Σ_{a=1}^N B_ab† B_ad = δ_bd I

for all 1 ≤ b, d ≤ N. Finally, setting b = d = 1, we get

    Σ_{a=1}^N B_a1† B_a1 = I,

which means that (81) holds and we have a legitimate operator-sum representation of E(ρ).
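The computation above can be replayed numerically: take a random unitary U on H_T ⊗ H_S (environment on the left, as in the notes), read off the blocks B_a1 of the first block column, and check (81) and (86). A NumPy sketch of mine:

```python
import numpy as np

rng = np.random.default_rng(1)
N, n = 3, 2                                   # dim H_T = N, dim H_S = n
# A random unitary on H_T (x) H_S via QR decomposition.
U, _ = np.linalg.qr(rng.standard_normal((N * n, N * n))
                    + 1j * rng.standard_normal((N * n, N * n)))

# The Kraus operators are the blocks B_a1 of the first block column of U.
K = [U[a * n:(a + 1) * n, 0:n] for a in range(N)]
assert np.allclose(sum(k.conj().T @ k for k in K), np.eye(n))   # equation (81)

# Check equation (86): tr_T[U(|0><0| (x) rho)U^dagger] = sum_a B_a1 rho B_a1^dagger.
rho = np.diag([0.25, 0.75]).astype(complex)
zero = np.zeros((N, N)); zero[0, 0] = 1       # |0><0| on H_T
big = U @ np.kron(zero, rho) @ U.conj().T
lhs = np.trace(big.reshape(N, n, N, n), axis1=0, axis2=2)       # trace out T
assert np.allclose(lhs, sum(k @ rho @ k.conj().T for k in K))
```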
We'll now show the other direction. Suppose we are given an operator-sum representation of E in the form of some collection K_1, ..., K_N of Kraus operators such that Σ_{j=1}^N K_j† K_j = I. We want to find a coupled-systems representation of E. As before, we will fix some orthonormal basis {|f_j⟩}_{1≤j≤n} of H_S, so that we can talk about matrices instead of operators. Define K to be the nN × n matrix

    K = [ K_1
           ⋮
          K_N ].
The condition that Σ_{j=1}^N K_j† K_j = I can be written in block matrix form as

    K†K = [ K_1†  ···  K_N† ] [ K_1
                                 ⋮
                                K_N ] = I.    (87)

Here we are multiplying an n × nN matrix on the left and an nN × n matrix on the right to get the n × n identity matrix. Consider the columns of K as nN-dimensional column vectors. Equation (87) is equivalent to saying that the columns of K form an orthonormal set. By Gram–Schmidt, we can take these column vectors as the first n vectors in an orthonormal basis for C^{nN}. We assemble these basis vectors as the columns of an nN × nN matrix U written in block form by
    U = [ B_11  B_12  ···  B_1N
          B_21  B_22  ···  B_2N
           ⋮     ⋮    ⋱    ⋮
          B_N1  B_N2  ···  B_NN ],
where each B_ab is an n × n matrix, and the first n columns of U form K, i.e., K_a = B_a1 for 1 ≤ a ≤ N. The orthonormality of the columns of U is equivalent to the equation U†U = I, i.e., to U being unitary. Running the computation of the first direction with this U, Equation (86) gives tr_T[U(|0⟩⟨0| ⊗ ρ)U†] = Σ_a B_a1 ρ B_a1† = Σ_a K_a ρ K_a† = E(ρ), so U yields a coupled-systems representation of E.

Note that tracing out T erases any unitary action on T: for any unitary V ∈ L(H_T), tr_T[(V ⊗ I)(A ⊗ B)(V ⊗ I)†] = tr_T(VAV† ⊗ B) = (tr(VAV†))B = (tr A)B = tr_T(A ⊗ B), and hence, by linearity, tr_T[(V ⊗ I)C(V ⊗ I)†] = tr_T C for every C ∈ L(H_T ⊗ H_S). In other words, if we eventually trace out the environment T, then it doesn't matter if we evolve its state unitarily or not. Let's conjugate Equations (83–85) by V ⊗ I, where V is an N × N unitary matrix and I is the n × n identity matrix. Noting that V ⊗ I = Σ_{a,b=1}^N [V]_ab E_ab ⊗ I, we get
    (V ⊗ I)U(|e_1⟩⟨e_1| ⊗ ρ)U†(V ⊗ I)†
        = Σ_{a,b,c,d,e,f,g,h=1}^N ([V]_ef E_ef ⊗ I)(E_ab ⊗ B_ab)(E_11 ⊗ ρ)(E_cd ⊗ B_cd)†([V]_gh E_gh ⊗ I)†
        = Σ_{a,...,h} [V]_ef [V]*_gh (E_ef E_ab E_11 E_dc E_hg) ⊗ (B_ab ρ B_cd†)
        = Σ_{a,c,e,g} [V]_ea [V]*_gc E_eg ⊗ B_a1 ρ B_c1†.
Tracing out T, we have

    E(ρ) = tr_T[U(|e_1⟩⟨e_1| ⊗ ρ)U†]
         = tr_T[(V ⊗ I)U(|e_1⟩⟨e_1| ⊗ ρ)U†(V ⊗ I)†]
         = Σ_{a,c,e,g} ([V]_ea [V]*_gc tr E_eg) B_a1 ρ B_c1†
         = Σ_{a,c,e} [V]_ea [V]*_ec B_a1 ρ B_c1†
         = Σ_e (Σ_a [V]_ea B_a1) ρ (Σ_c [V]_ec B_c1)†
         = Σ_e K′_e ρ (K′_e)†,

where

    K′_e := Σ_{a=1}^N [V]_ea B_a1 = Σ_{a=1}^N [V]_ea K_a

for all 1 ≤ e ≤ N. So these equations give us the effect of V on the Kraus operators.
Exercise 24.3 Show by direct calculation that if K_1, ..., K_N ∈ L(H_S) are operators such that Σ_{j=1}^N K_j† K_j = I, and for all 1 ≤ j ≤ N we define K′_j := Σ_{a=1}^N [V]_ja K_a for some fixed N × N unitary matrix V, then

    Σ_{j=1}^N K′_j† K′_j = I,

and for every ρ ∈ L(H_S),

    Σ_{j=1}^N K′_j ρ K′_j† = Σ_{j=1}^N K_j ρ K_j†.
So we are allowed to choose V to be any unitary matrix we want without affecting the quantum operation. We'll pick a specific V as follows: Given any set of Kraus operators K_1, ..., K_N, let T be the N × N matrix whose (i, j)th entry is

    [T]_ij := ⟨K_j, K_i⟩ = tr(K_j† K_i),

for 1 ≤ i, j ≤ N. Note that [T]_ij = ⟨K_j, K_i⟩ = ⟨K_i, K_j⟩* = [T]*_ji, and so T is a Hermitean matrix. Since T is normal, we can choose V so that VTV† is diagonal. Defining K′_j := Σ_{a=1}^N [V]_ja K_a as before, we get, for all 1 ≤ i, j ≤ N,
    ⟨K′_i, K′_j⟩ = Σ_{a,b=1}^N [V]*_ia [V]_jb ⟨K_a, K_b⟩ = Σ_{a,b} [V]_jb [T]_ba [V†]_ai = [VTV†]_ji = λ_j δ_ij

for some values λ_1, ..., λ_N, because VTV† is diagonal. So the nonzero K′_j are pairwise orthogonal with respect to the Hilbert–Schmidt inner product, and hence linearly independent. Since the K′_j occupy an n²-dimensional space, there can only be at most n² many of them that are nonzero.

Hence we have a normal form for the operator-sum representation of a quantum operation: Any quantum operation on an n-dimensional state space may be represented by N ≤ n² many Kraus operators that are pairwise orthogonal.

Exercise 24.4 Explain why the values λ_1, ..., λ_N above are all nonnegative reals.
Quantum Operations Between Different Hilbert Spaces. We have restricted our attention to quantum operations of the form E : L(H) → L(H), that is, linear maps that map operators of a space to operators of the same space. This restriction is unnecessary, and it is easy to imagine quantum operations mapping states in one space to states in another. The partial trace operator itself is a good example of such a thing. The operator-sum view is the easiest way to characterize these more general quantum operations. We will satisfy ourselves with the following general definition, without going into the details of why it is the best one. It certainly coincides with our previous view in the case where the two spaces are the same.

Definition 24.5 Let H and J be Hilbert spaces. A quantum operation from H to J is a linear map E : L(H) → L(J) such that there exists an integer N > 0 and linear maps K_j : H → J for 1 ≤ j ≤ N satisfying the completeness condition Σ_{j=1}^N K_j† K_j = I_H, such that for every ρ ∈ L(H),

    E(ρ) = Σ_{j=1}^N K_j ρ K_j†.

Here, I_H denotes the identity map on H. The K_j are known as Kraus operators.
Recall that for any linear map K : H → J, the adjoint K† is a linear map from J to H.

General Measurements. Recall the textbook's measurement postulate (Postulate 3), which describes a measurement of a system by a set I of possible outcomes together with a collection {M_m}_{m∈I} of measurement operators satisfying

    Σ_{m∈I} M_m† M_m = I.

(Thus the M_m satisfy the same completeness condition as the Kraus operators did previously.) I use I here again to describe the set of possible outcomes. According to the postulate, when a system in state ρ is measured using {M_m}, the probability of seeing an outcome m ∈ I is given by tr(M_m ρ M_m†), and the post-measurement state, assuming outcome m occurred, is M_m ρ M_m† / tr(M_m ρ M_m†).
All of this is fine except that it is not general enough. There are legitimate physical measurements that do not take this form. The measurements described by the book are all guaranteed to produce pure states after the measurement, assuming that a pure state was measured. There are, however, more "imprecise" measurements that may yield mixed states after the measurement, even if the pre-measurement state was pure.

As before with quantum operations, there are two equivalent views of a general measurement: the coupled-systems view and the operator-sum view. The textbook gives the operator-sum view. I'll describe both views, pointing out how the true operator-sum view differs from the text, but I'll omit the proof of equivalence, which is very similar to what I did earlier with quantum operations. If you want a chance to practice index gymnastics yourself, I'll leave the details of the proof to you as an exercise.

We'll only consider finitary measurements here, i.e., measurements with only a finite set of possible outcomes. One can generalize our analysis to infinitary measurements as well.
In the coupled-systems view, a general measurement on a system S with state space H_S proceeds as follows, assuming ρ is the pre-measurement state of S:

1. Prepare another system T with (finite dimensional) state space H_T in some initial pure state |0⟩⟨0|.

2. Couple T with S, and let the combined system evolve unitarily according to some unitary U ∈ L(H_T ⊗ H_S), producing the state U(|0⟩⟨0| ⊗ ρ)U†.

3. Perform a projective measurement on the system TS, using some complete set {P^(m) : m ∈ I} of orthogonal projectors in L(H_T ⊗ H_S). I is the (finite) set of possible outcomes. By the usual rules, the probability of seeing any outcome m ∈ I is Pr[m] = tr[P^(m) U(|0⟩⟨0| ⊗ ρ)U†], and the post-measurement state of TS is

    ρ^TS_m := P^(m) U(|0⟩⟨0| ⊗ ρ)U† (P^(m))† / Pr[m] = P^(m) U(|0⟩⟨0| ⊗ ρ)(P^(m) U)† / tr[P^(m) U(|0⟩⟨0| ⊗ ρ)(P^(m) U)†].

4. Trace out the system T of the post-measurement state to obtain the post-measurement state of S:

    ρ^S_m := tr_T[P^(m) U(|0⟩⟨0| ⊗ ρ)(P^(m) U)†] / tr[P^(m) U(|0⟩⟨0| ⊗ ρ)(P^(m) U)†].
Remember that the projectors satisfy

    Σ_{m∈I} P^(m) = Σ_{m∈I} (P^(m))† P^(m) = I ∈ L(H_T ⊗ H_S).

This means that we have

    Σ_{m∈I} (P^(m) U)† P^(m) U = U† (Σ_{m∈I} (P^(m))† P^(m)) U = U†U = I

as well.
In the operator-sum view, a general measurement M of system S is described by a (finite) set I of possible outcomes, and for each outcome m ∈ I a finite list M^(m)_1, ..., M^(m)_N of operators in L(H_S),^20 all satisfying

    Σ_{m∈I} Σ_{j=1}^N (M^(m)_j)† M^(m)_j = I.

When system S in state ρ is measured according to M, the probability of seeing an outcome m is given by

    Pr[m] := tr(Σ_{j=1}^N M^(m)_j ρ (M^(m)_j)†) = tr(M^(m) ρ),    (88)

where we set M^(m) := Σ_{j=1}^N (M^(m)_j)† M^(m)_j. If m is observed, then the post-measurement state of S is

    ρ_m := E_m(ρ)/Pr[m] := (Σ_{j=1}^N M^(m)_j ρ (M^(m)_j)†)/Pr[m].    (89)
Here, we define E_m(ρ) to be the numerator of the right-hand side. Note that tr E_m(ρ) = Pr[m] ≤ 1. Any operator ρ ≥ 0 such that tr ρ ≤ 1 will be called a partial state or an incomplete state. The incompleteness of E_m(ρ) is a reflection of the fact that it might not occur with certainty; its occurrence is conditioned on the outcome of the measurement being m. Similarly, we'll call E_m an incomplete quantum operation, since it might not be applied with certainty. E_m is clearly linear and maps positive operators to positive operators, but it is not (necessarily) trace-preserving. The map ρ ↦ Σ_{m∈I} E_m(ρ) is, however, a trace-preserving (i.e., complete) quantum operation.
I'll finish with two remarks about Equations (88) and (89): First, it's easy to see that the operators M^(m) of (88) form a POVM, and the converse is also true: any POVM arises from some general measurement where the post-measurement state is neglected. To see this, let {M^(m)}_{m∈I} be any (finitary) POVM. If we define measurement elements K^(m) := √(M^(m)) for each m ∈ I, then these elements form the operator-sum view of a generalized measurement, one operator per outcome, and the resulting outcome probabilities are the same as with the given POVM. Second, Postulate 3 only allows one operator per outcome, and so it is the special case of (89) where N = 1.

^20 Actually, the lists could contain different numbers of operators, but we can assume they are all the same length by padding shorter lists with copies of the zero operator.
Completely Positive Maps. This is another optional topic. We've seen that every (complete) quantum operation E maps states to states; equivalently, it has two properties:

1. E preserves positivity, i.e., if A ≥ 0 then E(A) ≥ 0, and

2. E is trace-preserving, i.e., tr E(A) = tr A.

We'll see shortly that the converse does not hold. That is, there are linear maps satisfying (1) and (2) above that are not legitimate quantum operations according to Definition 24.5. To get a characterization, we need to strengthen (1) a bit. We say that E is positive if (1) holds, i.e., if E maps positive operators to positive operators. The stronger condition we need is that E be completely positive, a condition that we now explain.
Quantum operations are linear maps, and we can form tensor products of these linear maps just as we can with any linear maps. So, given two linear maps E : L(H) → L(J) and F : L(K) → L(M) (H, J, K, and M are Hilbert spaces), we define E ⊗ F as usual to be the unique linear map L(H ⊗ K) → L(J ⊗ M) that takes A ⊗ B to E(A) ⊗ F(B) for every A ∈ L(H) and B ∈ L(K).

For every Hilbert space H we have the identity map I : L(H) → L(H) defined by I(A) = A for all A ∈ L(H). I is certainly a quantum operation, given by the single Kraus operator I ∈ L(H). The next definition gives the strengthening of property (1) that we need:

Definition 24.6 Let H and J be Hilbert spaces. A linear map E : L(H) → L(J) is completely positive if for every Hilbert space K, the map I ⊗ E : L(K ⊗ H) → L(K ⊗ J) is positive, where I is the identity map on L(K).
If E is completely positive as in Definition 24.6, then E is also positive: If ρ ≥ 0 is any operator in either L(H) or L(J), then it is easy to check that ρ ≥ 0 if and only if I ⊗ ρ ≥ 0, where I ∈ L(K) is the identity operator. Then by assumption, if ρ ≥ 0, then

    (I ⊗ E)(I ⊗ ρ) = I(I) ⊗ E(ρ) = I ⊗ E(ρ) ≥ 0,

and thus E(ρ) ≥ 0. This means that E is positive. Therefore, complete positivity is at least as strong a condition as positivity.

It may be counterintuitive, but there are maps E that are positive but not completely positive. Here's a great example. Fix some orthonormal basis for H so that we can
identify operators on H with matrices. Now consider the transpose operator T that takes any square matrix to its transpose (not the adjoint, just the transpose), i.e., T(A) = Aᵀ for any matrix A. With respect to the chosen basis, we can think of T as a map from operators in L(H) to operators in L(H), and it is clearly a linear map. T is also positive: For any square matrix A, it is easily checked that if A ≥ 0 then Aᵀ ≥ 0 as well (if A is normal, then so is Aᵀ, and both matrices have the same spectrum). T is not completely positive, however, provided dim(H) ≥ 2. Suppose H = K is the state space of a single qubit, and we fix the standard computational basis {|0⟩, |1⟩} for H. Consider the matrix

    A = |Φ⁺⟩⟨Φ⁺| = (1/2) [ 1 0 0 1
                          0 0 0 0
                          0 0 0 0
                          1 0 0 1 ] ≥ 0.
Applying I ⊗ T (sometimes called the partial transpose) to A means taking the transpose of each 2 × 2 block (the T part), but not rearranging the blocks at all (the I part). Thus,

    (I ⊗ T)(A) = (1/2) [ 1 0 0 0
                         0 0 1 0
                         0 1 0 0
                         0 0 0 1 ] = (1/2) SWAP,

where we recall that the two-qubit SWAP operator swaps the qubits, i.e., SWAP|a⟩|b⟩ = |b⟩|a⟩ for any a, b ∈ {0, 1}. The eigenvalues of SWAP are 1, 1, 1, −1 (see Exercise 12.2), and so SWAP is not a positive operator. This shows that I ⊗ T is not a positive map, and so T is not completely positive.
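The partial-transpose computation is easy to reproduce in NumPy (a sketch of mine); the reshape/transpose trick below transposes each 2 × 2 block in place without moving the blocks:

```python
import numpy as np

# A = |phi+><phi+| in the computational basis, a positive rank-one operator.
phi = np.array([1, 0, 0, 1]) / np.sqrt(2)
A = np.outer(phi, phi)

# (I (x) T)(A): transpose each 2 x 2 block B_ij in place.
PT = A.reshape(2, 2, 2, 2).transpose(0, 3, 2, 1).reshape(4, 4)

# The result is SWAP/2, whose eigenvalues are 1, 1, 1, -1 up to the factor 1/2 ...
assert np.allclose(np.sort(np.linalg.eigvalsh(2 * PT)), [-1, 1, 1, 1])
# ... so I (x) T maps a positive operator to a non-positive one:
# the transpose map is positive but not completely positive.
assert np.linalg.eigvalsh(PT).min() < 0
```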
Theorem 24.7 Let H and J be Hilbert spaces, and let E : L(H) → L(J) be linear. E is a quantum operation (Definition 24.5) if and only if E is trace-preserving and completely positive.
Proof. First the forward direction. Let E be a quantum operation given in the operator-sum representation by Kraus operators K_1, ..., K_N (linear maps from H to J) such that Σ_{j=1}^N K_j† K_j = I_H, where I_H is the identity operator on H, and E(ρ) = Σ_j K_j ρ K_j† for any ρ ∈ L(H). (Recall that each K_j† is a linear map from J to H. See page 17.) We have

    tr E(ρ) = Σ_{j=1}^N tr(K_j ρ K_j†) = tr((Σ_j K_j† K_j)ρ) = tr ρ

for any ρ, so E is trace-preserving.

We show that E is completely positive in two easy steps: (1) we show that any quantum operation is a positive map, and (2) we show that if K is any Hilbert space and I is the identity on L(K), then I ⊗ E is also a quantum operation, hence I ⊗ E is positive, hence
E is completely positive. Step 1 was done in the case where H = J in Exercise 24.1. The general case is similar: If ρ ∈ L(H) is any positive operator and |v⟩ ∈ J is any vector, then we want to show that ⟨v|E(ρ)|v⟩ ≥ 0, which shows that E(ρ) is a positive operator in L(J), hence E is a positive map. For 1 ≤ j ≤ N, define |u_j⟩ := K_j†|v⟩ ∈ H. We now have

    ⟨v|E(ρ)|v⟩ = Σ_{j=1}^N ⟨v|K_j ρ K_j†|v⟩ = Σ_j (K_j†|v⟩)† ρ (K_j†|v⟩) = Σ_j ⟨u_j|ρ|u_j⟩ ≥ 0

as desired, since ρ ≥ 0. Because E is an arbitrary quantum operation, this shows that every quantum operation is a positive map.
For Step 2, let K be any Hilbert space, let I_K ∈ L(K) be the identity operator on K, and let I ∈ L(L(K)) be the identity map on L(K). We want to show that I ⊗ E : L(K ⊗ H) → L(K ⊗ J) is a quantum operation, so we must come up with Kraus operators for I ⊗ E: For 1 ≤ j ≤ N, define L_j = I_K ⊗ K_j. Each L_j is a linear map from K ⊗ H to K ⊗ J, and for completeness, we have

    Σ_{j=1}^N L_j† L_j = Σ_j (I_K ⊗ K_j)†(I_K ⊗ K_j) = I_K ⊗ (Σ_j K_j† K_j) = I_K ⊗ I_H,

which is the identity map on K ⊗ H. Finally, if σ ∈ L(K) and ρ ∈ L(H) are arbitrary operators and we set τ := σ ⊗ ρ, we have

    (I ⊗ E)(τ) = (I ⊗ E)(σ ⊗ ρ) = σ ⊗ E(ρ) = Σ_{j=1}^N (I_K ⊗ K_j)(σ ⊗ ρ)(I_K ⊗ K_j)† = Σ_j L_j τ L_j†.    (90)

Both sides of Equation (90) are linear in τ, and so (90) extends to arbitrary τ ∈ L(K ⊗ H). This shows that I ⊗ E is a quantum operation.
Now the reverse direction. Suppose E : L(H) → L(J) is trace-preserving and completely positive. We need to come up with Kraus operators for E. Let n := dim(H), and let I be the identity map on L(H). Fix an orthonormal basis {|e_1⟩, ..., |e_n⟩} for H. Taking the product of this basis with itself, we get a basis {|e_{i,j}⟩ : 1 ≤ i, j ≤ n} for H ⊗ H, where for convenience we define |e_{i,j}⟩ := |e_i⟩ ⊗ |e_j⟩. Define the vector

    |v⟩ := Σ_{i=1}^n |e_{i,i}⟩ = Σ_i |e_i⟩ ⊗ |e_i⟩ ∈ H ⊗ H.

The operator |v⟩⟨v| ∈ L(H ⊗ H) is clearly positive, and so by assumption, the operator

    Ω := (I ⊗ E)(|v⟩⟨v|) ∈ L(H ⊗ J)

is also positive. (We are letting K = H.) We have |v⟩⟨v| = Σ_{i,j=1}^n |e_i⟩⟨e_j| ⊗ |e_i⟩⟨e_j| = Σ_{i,j} E_ij ⊗ E_ij, where we let E_ij := |e_i⟩⟨e_j| as usual. Thus

    Ω = Σ_{i,j=1}^n E_ij ⊗ E(E_ij).
Because Ω ≥ 0, we can choose some eigenbasis {|g_1⟩, ..., |g_N⟩} for Ω, where N := n² = dim(H ⊗ H). This allows us to write

    Ω = Σ_{k=1}^N λ_k |g_k⟩⟨g_k|,

where λ_1, ..., λ_N ≥ 0 are the eigenvalues of Ω. For 1 ≤ k ≤ N, we can now define the Kraus operator K_k by its matrix with respect to the {|e_i⟩} basis: for all 1 ≤ i, j ≤ n, define

    [K_k]_ij := √λ_k ⟨e_{j,i}|g_k⟩.

We need to check that Σ_{k=1}^N K_k† K_k = I (completeness) and that E(ρ) = Σ_{k=1}^N K_k ρ K_k† for all ρ ∈ L(H).
For completeness, fix some a, b ∈ {1, ..., n}, and using the fact that E is trace-preserving, compute

    [Σ_{k=1}^N K_k† K_k]_ab = Σ_k Σ_{c=1}^n [K_k†]_ac [K_k]_cb
        = Σ_{k,c} [K_k]*_ca [K_k]_cb
        = Σ_{k,c} λ_k ⟨e_{b,c}|g_k⟩⟨g_k|e_{a,c}⟩
        = Σ_c ⟨e_{b,c}| (Σ_k λ_k |g_k⟩⟨g_k|) |e_{a,c}⟩
        = Σ_c ⟨e_{b,c}|Ω|e_{a,c}⟩
        = Σ_c Σ_{i,j=1}^n ⟨e_{b,c}|(E_ij ⊗ E(E_ij))|e_{a,c}⟩
        = Σ_{c,i,j} ⟨e_b|E_ij|e_a⟩⟨e_c|E(E_ij)|e_c⟩
        = Σ_c ⟨e_c|E(E_ba)|e_c⟩
        = tr[E(E_ba)]
        = tr E_ba
        = δ_ab.

From this we get that Σ_{k=1}^N K_k† K_k is the identity matrix. This shows completeness.
Now let ρ ∈ L(H) be arbitrary. Again, we compare matrix elements with respect to the {|e_i⟩} basis. For any 1 ≤ a, b ≤ n, we have

    [Σ_{k=1}^N K_k ρ K_k†]_ab = Σ_k Σ_{c,d=1}^n [K_k]_ac [ρ]_cd [K_k†]_db
        = Σ_k Σ_{c,d} [ρ]_cd [K_k]_ac [K_k]*_bd
        = Σ_{c,d} [ρ]_cd Σ_k λ_k ⟨e_{c,a}|g_k⟩⟨g_k|e_{d,b}⟩
        = Σ_{c,d} [ρ]_cd ⟨e_{c,a}|Ω|e_{d,b}⟩    (just as before)
        = Σ_{c,d} [ρ]_cd Σ_{i,j=1}^n ⟨e_{c,a}|(E_ij ⊗ E(E_ij))|e_{d,b}⟩
        = Σ_{c,d} [ρ]_cd Σ_{i,j} ⟨e_c|E_ij|e_d⟩⟨e_a|E(E_ij)|e_b⟩
        = Σ_{c,d} [ρ]_cd ⟨e_a|E(E_cd)|e_b⟩
        = ⟨e_a| (Σ_{c,d} [ρ]_cd E(E_cd)) |e_b⟩
        = ⟨e_a| E(Σ_{c,d} [ρ]_cd E_cd) |e_b⟩
        = ⟨e_a|E(ρ)|e_b⟩
        = [E(ρ)]_ab.

Thus E(ρ) = Σ_{k=1}^N K_k ρ K_k† as we wanted. ∎
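This direction of the proof is effectively an algorithm: form the operator (I ⊗ E)(|v⟩⟨v|) for the channel, diagonalize it, and read off Kraus operators from the eigenvectors. A NumPy sketch of mine, using the amplitude-damping channel as an assumed test case:

```python
import numpy as np

# An assumed test channel E given by known Kraus operators (amplitude damping).
g = 0.3
K_true = [np.array([[1, 0], [0, np.sqrt(1 - g)]]),
          np.array([[0, np.sqrt(g)], [0, 0]])]
E = lambda rho: sum(k @ rho @ k.conj().T for k in K_true)

# Build Omega = sum_ij E_ij (x) E(E_ij), as in the proof.
n = 2
Omega = np.zeros((n * n, n * n), dtype=complex)
for i in range(n):
    for j in range(n):
        Eij = np.zeros((n, n)); Eij[i, j] = 1
        Omega += np.kron(Eij, E(Eij))

# Eigendecompose Omega and read off [K_k]_ij = sqrt(lambda_k) <e_{j,i}|g_k>.
lam, G = np.linalg.eigh(Omega)                 # columns of G are the |g_k>
K_rec = [np.sqrt(max(l, 0)) * G[:, k].reshape(n, n).T
         for k, l in enumerate(lam)]

# The recovered operators are complete and reproduce the channel.
assert np.allclose(sum(k.conj().T @ k for k in K_rec), np.eye(n))
rho = np.array([[0.5, 0.5], [0.5, 0.5]], dtype=complex)
assert np.allclose(sum(k @ rho @ k.conj().T for k in K_rec), E(rho))
```

Since at most n² eigenvalues are nonzero, this also illustrates the normal form from the previous section: any channel on an n-dimensional space needs at most n² Kraus operators.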
Exercise 24.8 (Optional) Show that the composition of two quantum operations is a quantum operation. That is, let E : L(H) → L(J) and F : L(J) → L(K) be quantum operations. Show that F ∘ E : L(H) → L(K) is a quantum operation, where F ∘ E is defined as (F ∘ E)(ρ) := F(E(ρ)) for all ρ ∈ L(H).

Exercise 24.9 (Challenging, Optional) Show that the tensor product of two quantum operations is a quantum operation. That is, let E : L(H) → L(J) and F : L(K) → L(M) be quantum operations. Show that E ⊗ F : L(H ⊗ K) → L(J ⊗ M) is a quantum operation.
Definition 24.5 defines what are sometimes called complete quantum operations, and a general quantum operation (not necessarily complete) is defined the same way, except that we replace the completeness condition Σ_{j=1}^N K_j† K_j = I_H with the looser condition Σ_{j=1}^N K_j† K_j ≤ I_H. Incomplete quantum operations are used to describe physical processes that may not happen with certainty, e.g., a general measurement that results in some outcome m.
Exercise 24.10 (Challenging, Optional) Show that a linear map E : L(H) → L(J) is a not necessarily complete quantum operation, as described above, if and only if (1) E is completely positive, and (2) for every state (positive operator with unit trace) ρ ∈ L(H), we have 0 ≤ tr[E(ρ)] ≤ 1. The quantity tr[E(ρ)] is interpreted as the probability that E actually occurs. [Hint: Set L := I_H − Σ_{j=1}^N K_j† K_j, where the K_j are the Kraus operators corresponding to E as above. Since L ≥ 0, you can define K_{N+1} := √L and the complete quantum operation E′(ρ) := Σ_{j=1}^{N+1} K_j ρ K_j† for any ρ ∈ L(H). Notice that E′(ρ) = E(ρ) + K_{N+1} ρ K_{N+1}†, and use E′ to prove facts about E.]
Exercise 24.11 (Challenging, Optional) Show that the partial trace map is always a (complete) quantum operation. [Hint: Let tr_H : L(H ⊗ J) → L(J) be a partial trace map. Fix orthonormal bases {|e_1⟩, ..., |e_n⟩} and {|f_1⟩, ..., |f_m⟩} for H and J, respectively, and for each j with 1 ≤ j ≤ n, define the Kraus operator K_j : H ⊗ J → J by

    K_j := ⟨e_j| ⊗ I_J = Σ_{k=1}^m |f_k⟩⟨e_j, f_k|,

where I_J is the identity map on J, and |e_j, f_k⟩ := |e_j⟩ ⊗ |f_k⟩. In other words, for every vector in H ⊗ J of the form |u⟩ ⊗ |v⟩, we have K_j(|u⟩ ⊗ |v⟩) = ⟨e_j|u⟩|v⟩. Show that the K_j satisfy the completeness condition and define tr_H.]
25 Week 12: Distance and fidelity

Distance Measures. First, some basic definitions from probability theory.

Recall that we have been talking about a probability distribution as a finite list of values p = (p_1, p_2, ..., p_k) such that p_j ≥ 0 for all 1 ≤ j ≤ k and Σ_{j=1}^k p_j = 1. Here, the set {1, ..., k} is called the sample space. More generally, any finite or countable set Ω can be used as a sample space, in which case a probability distribution on Ω is a map p : Ω → R such that p(a) ≥ 0 for all a ∈ Ω, and Σ_{a∈Ω} p(a) = 1. Subsets of Ω are called events, and elements of Ω, which we identify with singleton subsets of Ω, are called elementary events. If S ⊆ Ω is some event, then the probability of S (with respect to the probability distribution p, above) is defined as

    Pr_p[S] := Σ_{a∈S} p(a).
We might drop the subscript p if it is clear what probability distribution we are using.

If p and q are two probability distributions over the same sample space, we are interested in measures of the similarity or difference between p and q. We'll discuss two here: the trace distance and the fidelity.

Definition 25.1 Let p and q be two probability distributions on the same sample space Ω. The trace distance (also called the L_1 distance or the Kolmogorov distance) between p and q is defined as

    D(p, q) := (1/2) Σ_{a∈Ω} |p(a) − q(a)|.
It is easy to check that D satisfies the axioms for a metric on the set of all probability distributions on Ω. These are:

1. D(p, q) ≥ 0,

2. D(p, q) = 0 iff p = q,

3. D(p, q) = D(q, p), and

4. D(p, r) ≤ D(p, q) + D(q, r),

for any probability distributions p, q, r on Ω. Here's another way of characterizing the trace distance: for any probability distributions p and q on Ω,

    D(p, q) = max_{S⊆Ω} |Pr_p[S] − Pr_q[S]| = max_{S⊆Ω} (Pr_p[S] − Pr_q[S]).    (91)
Exercise 25.2 Prove Equation (91).
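Equation (91) can be spot-checked by brute force for a small sample space. The following snippet (with example distributions of my own) maximizes over all events S:

```python
import numpy as np
from itertools import combinations

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.4, 0.4, 0.2])

# Definition 25.1: D(p, q) = (1/2) sum_a |p(a) - q(a)|.
D = 0.5 * np.abs(p - q).sum()
assert np.isclose(D, 0.1)

# Equation (91): the same value, as a maximum over all events S.
best = 0.0
for r in range(4):
    for S in combinations(range(3), r):
        S = list(S)
        best = max(best, p[S].sum() - q[S].sum())
assert np.isclose(best, D)
```

The maximizing event is the set of sample points where p exceeds q, which is the key observation in proving (91).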
The trace distance gauges the difference between two distributions p and q. The fidelity, on the other hand, is a measure of their similarity; it is maximized when p = q.

Definition 25.3 Let p and q be two probability distributions on the same sample space Ω. The fidelity of p and q is defined as

    F(p, q) := Σ_{a∈Ω} √(p(a)q(a)).

F(p, q) can be seen as the dot product of two real unit vectors: the vector whose ath entry is √p(a) and the vector whose ath entry is √q(a). Since these two vectors clearly have unit norm, the fidelity is then the cosine of the angle between them. Thus we immediately get 0 ≤ F(p, q) ≤ 1, with F(p, q) = 1 iff p = q.
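A quick numerical illustration of Definition 25.3 and the dot-product interpretation, with example distributions of my own:

```python
import numpy as np

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.4, 0.4, 0.2])

# F(p, q) = sum_a sqrt(p(a) q(a)): the dot product of the unit vectors
# (sqrt(p(a)))_a and (sqrt(q(a)))_a, i.e., the cosine of the angle between them.
F = np.sqrt(p * q).sum()
assert np.isclose(F, np.dot(np.sqrt(p), np.sqrt(q)))
assert 0 <= F < 1                               # strictly below 1 since p != q
assert np.isclose(np.sqrt(p * p).sum(), 1.0)    # F(p, p) = 1
```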
Trace Distance and Fidelity of Operators. We'd like to extend these definitions to quantum states, i.e., operators. A reasonable sanity check on the way we should define such an extension would be to say that if ρ and σ are mixtures of the same set of pairwise orthogonal pure states with (eigenvalue) probability distributions r and s, respectively, then D(ρ, σ) should be equal to D(r, s), and F(ρ, σ) should be equal to F(r, s). Let's see this in more detail. Suppose ρ = Σ_{j=1}^k r_j ψ_j and σ = Σ_{j=1}^k s_j ψ_j, where the pure states ψ_j project onto mutually orthogonal subspaces (equivalently, ψ_i ψ_j = δ_ij ψ_i for any i and j). Now consider the operator |ρ − σ|. We have

    |ρ − σ| = √((ρ − σ)†(ρ − σ)) = √((ρ − σ)²) = [(Σ_{j=1}^k (r_j − s_j)ψ_j)²]^{1/2} = [Σ_{j=1}^k (r_j − s_j)² ψ_j]^{1/2},
because the cross-terms (ψ_i ψ_j for i ≠ j) all vanish when we expand the expression inside the square brackets. Since the ψ_j project onto mutually orthogonal subspaces, we can choose an orthonormal basis in which all the ψ_j are diagonal matrices simultaneously. Permuting the basis vectors if need be, we can assume that each ψ_j (which is a one-dimensional projector) is given by the matrix E_jj. Thus

    Σ_{j=1}^k (r_j − s_j)² ψ_j = Σ_j (r_j − s_j)² E_jj = diag[(r_1 − s_1)², (r_2 − s_2)², ..., (r_k − s_k)², 0, ..., 0].

To take the square root of this matrix, we just take the square root of each diagonal entry, which gives the matrix diag[|r_1 − s_1|, |r_2 − s_2|, ..., |r_k − s_k|, 0, ..., 0], and so this is |ρ − σ| in matrix form. Taking one half of the trace of this gives

    (1/2) tr |ρ − σ| = (1/2) Σ_{j=1}^k |r_j − s_j| = D(r, s).
This suggests that we can now define the trace distance D(ρ, σ) for arbitrary operators ρ and σ as

    D(ρ, σ) := (1/2) tr |ρ − σ| = (1/2) ‖ρ − σ‖_1.

We can do something similar to define the fidelity of two arbitrary positive operators. I won't do the details here, but a reasonable definition is

    F(ρ, σ) := tr √(ρ^{1/2} σ ρ^{1/2}) = ‖√ρ √σ‖_1.    (92)

It can be shown that F(ρ, σ) = F(σ, ρ) and that if ρ and σ are states then 0 ≤ F(ρ, σ) ≤ 1 with F(ρ, σ) = 1 iff ρ = σ.
We do the same sanity check for F as we did for D, above. If ρ and σ are commuting states as before, i.e., ρ = diag(r_1, ..., r_k, 0, ..., 0) and σ = diag(s_1, ..., s_k, 0, ..., 0) with respect to the same orthonormal basis, then we have

    F(ρ, σ) = tr √(diag(r_1 s_1, ..., r_k s_k, 0, ..., 0)) = Σ_{j=1}^k √(r_j s_j) = F(r, s).
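For commuting states the operator definitions reduce to the classical ones, and this is easy to confirm numerically (a sketch of mine with assumed example spectra r and s):

```python
import numpy as np

# Two commuting qubit states with assumed spectra r and s.
r = np.array([0.8, 0.2]); s = np.array([0.6, 0.4])
rho, sigma = np.diag(r), np.diag(s)

# D(rho, sigma) = (1/2) tr |rho - sigma| via the eigenvalues of rho - sigma.
D = 0.5 * np.abs(np.linalg.eigvalsh(rho - sigma)).sum()
assert np.isclose(D, 0.5 * np.abs(r - s).sum())       # equals D(r, s)

# F(rho, sigma) = tr sqrt(rho^(1/2) sigma rho^(1/2)); everything is diagonal here,
# so the square roots can be taken entrywise on the diagonal.
M = np.sqrt(rho) @ sigma @ np.sqrt(rho)               # = diag(r * s)
F = np.sqrt(np.linalg.eigvalsh(M)).sum()
assert np.isclose(F, np.sqrt(r * s).sum())            # equals F(r, s)
```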
Exercise 25.4 Show that ‖AB‖_1 = ‖BA‖_1 for any Hermitean operators A and B. Thus the fidelity function F of (92) is symmetric. [Hint: Use Property 10 of the norm, which says that ‖C‖_1 = ‖C†‖_1 for any operator C.]
Properties of the Trace Distance. The trace distance of operators has an alternate characterization analogous to Equation (91). If A and B are operators, we say that A ≤ B if B − A ≥ 0. We'll show that for any states ρ and σ,

    D(ρ, σ) = max_{projectors P} tr(P(ρ − σ)) = max_{P≥0 & ‖P‖=1} tr(P(ρ − σ)) = max_{0≤P≤I} tr(P(ρ − σ)),    (93)

where the three maxima are taken over all projectors P, all positive operators P of unit operator norm (L∞-norm), and all operators P with 0 ≤ P ≤ I, respectively. Equation (93) follows from the following lemma applied to A = ρ − σ, which is Hermitean with trace zero.

Lemma Let A be a Hermitean operator with tr A = 0, and let λ_1, ..., λ_n ∈ R be the eigenvalues of A (A acts on an n-dimensional space). The following quantities are all equal:

1. (1/2)‖A‖_1,

2. (1/2) tr |A|,

3. Σ_{i : λ_i > 0} λ_i,

4. max_{projectors P} tr(PA),

5. max_{P≥0 & ‖P‖=1} tr(PA),

6. max_{0≤P≤I} tr(PA).
162
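The equalities in Proposition 25.5 can be spot-checked numerically. The following sketch (mine, not part of the notes; it assumes NumPy) builds a random traceless Hermitean $A$ and compares quantities (1)-(3), along with the value achieved in (4) by the projector onto the positive eigenspace:

```python
import numpy as np

rng = np.random.default_rng(0)

# Random traceless Hermitean A on a 4-dimensional space
# (tr A = 0, as the proposition assumes).
B = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
A = (B + B.conj().T) / 2
A -= (np.trace(A).real / 4) * np.eye(4)

lam, V = np.linalg.eigh(A)
half_trace_abs = 0.5 * np.sum(np.abs(lam))   # quantities (1) = (2)
pos_sum = np.sum(lam[lam > 0])               # quantity (3)

# Quantity (4): the projector onto the span of the eigenvectors with
# positive eigenvalues achieves tr(PA) = sum of positive eigenvalues.
Vpos = V[:, lam > 0]
P = Vpos @ Vpos.conj().T
achieved = np.trace(P @ A).real

print(half_trace_abs, pos_sum, achieved)  # all three agree
```

This also illustrates the hint to Exercise 25.6 below: the maximizing projector is the one onto the positive eigenspace of $A$.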
Proof. We'll do these in increasing order of difficulty.

(1) = (2) follows directly from the definition of $\|\cdot\|_1$ (Equation (78)). For (2) = (3), let $p := \sum_{i:\lambda_i > 0} \lambda_i$ and let $q := \sum_{i:\lambda_i < 0} \lambda_i$. Note that $p + q = \sum_{i=1}^n \lambda_i = \operatorname{tr} A = 0$, and so $q = -p$. Also note that the eigenvalues of $|A|$ are $|\lambda_1|, \ldots, |\lambda_n|$. This is easiest to see by taking an eigenbasis for $A$ ($A$ is normal because it is Hermitean) and looking at the matrices for $A$ and $|A|$. So we have
\[
\operatorname{tr}|A| = \sum_{i=1}^n |\lambda_i| = \sum_{i:\lambda_i > 0} \lambda_i - \sum_{i:\lambda_i < 0} \lambda_i = p - q = 2p.
\]
The inequalities (3) $\le$ (4) $\le$ (5) $\le$ (6) are pretty straightforward, and we leave these as exercises.

It remains to show that (6) $\le$ (3). Consider the expression $\max_{0 \le P \le I} \operatorname{tr}(PA)$ of (6). The key insight is to show first that the maximum is achieved by some $P$ that commutes with $A$ (i.e., $PA = AP$). Once that fact is established, the rest is easy: we can pick a common eigenbasis for $P$ and $A$ and look at diagonal matrices.
Suppose that $0 \le P \le I$ and that $P$ does not commute with $A$. We will find an operator $P_\varepsilon$ such that $0 \le P_\varepsilon \le I$ and $\operatorname{tr}(P_\varepsilon A) > \operatorname{tr}(PA)$. Let $C := i(AP - PA)$, which is Hermitean (because $A$ and $P$ are) and nonzero (because $P$ does not commute with $A$). Define the unitary
\[
U_\varepsilon := e^{i\varepsilon C} = I + i\varepsilon C + O(\varepsilon^2).
\]
Then set $P_\varepsilon := U_\varepsilon^\dagger P U_\varepsilon$ for some $\varepsilon > 0$ that we will choose later. It is easy to check that $0 \le P_\varepsilon \le I$. Now we have
\[
\begin{aligned}
\operatorname{tr}(P_\varepsilon A) &= \operatorname{tr}(U_\varepsilon^\dagger P U_\varepsilon A) \\
&= \operatorname{tr}[(I - i\varepsilon C + O(\varepsilon^2))P(I + i\varepsilon C + O(\varepsilon^2))A] \\
&= \operatorname{tr}[PA + i\varepsilon PCA - i\varepsilon CPA + O(\varepsilon^2)] \\
&= \operatorname{tr}(PA) + i\varepsilon[\operatorname{tr}(PCA) - \operatorname{tr}(CPA)] + O(\varepsilon^2) \\
&= \operatorname{tr}(PA) + i\varepsilon[\operatorname{tr}(CAP) - \operatorname{tr}(CPA)] + O(\varepsilon^2) \\
&= \operatorname{tr}(PA) + i\varepsilon \operatorname{tr}[C(AP - PA)] + O(\varepsilon^2) \\
&= \operatorname{tr}(PA) + \varepsilon \operatorname{tr}(C^2) + O(\varepsilon^2).
\end{aligned}
\]
Now $C^2 = C^\dagger C$ because $C$ is Hermitean, so $\operatorname{tr}(C^2) = \langle C, C\rangle > 0$ since $C \ne 0$. Choosing $\varepsilon > 0$ small enough then gives $\operatorname{tr}(P_\varepsilon A) > \operatorname{tr}(PA)$. This shows that the maximum value of $\operatorname{tr}(PA)$ is achieved only when $P$ commutes with $A$, i.e.,
\[
\max_{0 \le P \le I} \operatorname{tr}(PA) = \max_{0 \le P \le I \;\&\; PA = AP} \operatorname{tr}(PA).
\]
(Footnote 21: We are tacitly assuming that the maximum is achieved by some $P$ such that $0 \le P \le I$. This is in fact true, and it follows from concepts in topology that we won't go into here, namely, continuity and compactness.)
Finally suppose that $0 \le P \le I$ and that $P$ commutes with $A$. Pick a common eigenbasis for $P$ and $A$ so that, with respect to this basis, $A = \operatorname{diag}(\lambda_1, \ldots, \lambda_n)$ and $P = \operatorname{diag}(\eta_1, \ldots, \eta_n)$. Since $0 \le P \le I$, we must have $0 \le \eta_1, \ldots, \eta_n \le 1$, but otherwise, we are free to choose the $\eta_j$ arbitrarily (see the hint to Exercise 25.7, below). We now have
\[
\operatorname{tr}(PA) = \sum_{j=1}^n \eta_j\lambda_j,
\]
and clearly this sum is the largest possible when we define
\[
\eta_j := \begin{cases} 1 & \text{if } \lambda_j > 0, \\ 0 & \text{otherwise.} \end{cases}
\]
For this choice of the $\eta_j$, we get $\operatorname{tr}(PA) = \sum_{j:\lambda_j > 0} \lambda_j$, and we've shown that this is the largest possible value for $\operatorname{tr}(PA)$ with $0 \le P \le I$. (Note that the optimal $P$ is a projector. That's a direct way to see that (4) $\ge$ (3).) $\Box$
Exercise 25.6 Prove (3) $\le$ (4) in Proposition 25.5, above. [Hint: Find a projector $P$ such that $\operatorname{tr}(PA) = \sum_{i:\lambda_i > 0} \lambda_i$. This shows that $\sum_{i:\lambda_i > 0} \lambda_i \le \max_{\text{projectors } P} \operatorname{tr}(PA)$. To find $P$, consider the subspace spanned by all the eigenvectors of $A$ with positive eigenvalues.]
Exercise 25.7 Prove (4) $\le$ (5) $\le$ (6) in Proposition 25.5, above. [Hint: The following easy facts are useful for any operator $P$:

- $0 \le P \le I$ if and only if $P$ is normal and all its eigenvalues are in the closed interval $[0, 1] \subseteq \mathbb{R}$ (consider an eigenbasis for $P$).
- Recall that $\|P\|$ is the maximum eigenvalue of $|P|$.
- Recall (Exercise 9.25) that $0 \le P$ iff $|P| = P$, and thus if $0 \le P$ then $\|P\|$ is the largest eigenvalue of $P$ itself.

For (4) $\le$ (5), note that every nonzero projector $P$ satisfies $0 \le P$ and $\|P\| = 1$. You need to treat the case where $P = 0$ separately. (5) $\le$ (6) is straightforward.]
Exercise 25.8 Use Proposition 25.5 to prove Equation (93).
Exercise 25.9 Let $\rho = (I + \vec r\cdot\vec\sigma)/2 = (I + r_xX + r_yY + r_zZ)/2$ and $\sigma = (I + \vec t\cdot\vec\sigma)/2 = (I + t_xX + t_yY + t_zZ)/2$ be single-qubit states, where $\vec r = (r_x, r_y, r_z) \in \mathbb{R}^3$ and $\vec t = (t_x, t_y, t_z) \in \mathbb{R}^3$ are either on or inside the Bloch sphere. Show that
\[
D(\rho, \sigma) = \frac{\|\vec r - \vec t\|}{2} = \frac12\sqrt{(r_x - t_x)^2 + (r_y - t_y)^2 + (r_z - t_z)^2}.
\]
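A hedged numerical check of the formula in Exercise 25.9 (my own sketch, assuming NumPy; a spot check, not a proof):

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def bloch_state(v):
    """Single-qubit state (I + v . (X, Y, Z)) / 2 for |v| <= 1."""
    return (I2 + v[0] * X + v[1] * Y + v[2] * Z) / 2

def trace_distance(rho, sigma):
    return 0.5 * np.sum(np.abs(np.linalg.eigvalsh(rho - sigma)))

r = np.array([0.3, -0.2, 0.4])   # both vectors are inside the Bloch ball
t = np.array([-0.1, 0.5, 0.0])
lhs = trace_distance(bloch_state(r), bloch_state(t))
rhs = np.linalg.norm(r - t) / 2
print(lhs, rhs)  # equal up to rounding
```

The underlying reason is visible in the computation: $\rho - \sigma = (\vec r - \vec t)\cdot\vec\sigma/2$ has eigenvalues $\pm\|\vec r - \vec t\|/2$.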
In Proposition 25.13, below, I'll mention one more interesting property of the trace distance: it can never increase via a quantum operation. This says that all (complete) quantum operations are contractive with respect to the metric $D$. So if no classical information is coming out of an open quantum system, its dynamics tends to cause states to become less distinguishable, not more. This is not necessarily the case with incomplete quantum operations, where some classical information is obtained.
Lemma 25.10 For any Hermitean operator $A$, there exist unique operators $Q$ and $S$ such that

1. $Q \ge 0$ and $S \ge 0$,
2. $QS = SQ = 0$, and
3. $A = Q - S$.
Proof. Let $\mu_1, \ldots, \mu_k$ be the distinct eigenvalues of $A$. By Corollary 9.14, we have a unique decomposition
\[
A = \mu_1 P_1 + \cdots + \mu_k P_k,
\]
where the $P_j$ form a complete set of orthogonal projectors. For existence, define
\[
Q := \sum_{j:\mu_j > 0} \mu_j P_j, \qquad S := -\sum_{j:\mu_j < 0} \mu_j P_j.
\]
It is obvious that $A = Q - S$. Further, all eigenvalues of $Q$ and $S$ are either $0$ or $|\mu_j| > 0$ for various $j$ (the uniqueness part of Corollary 9.14). Hence $Q, S \ge 0$. Finally, $QS = SQ = 0$ because they share no projectors $P_j$ in common.
For uniqueness, suppose some operators $Q'$ and $S'$ also satisfy conditions 1-3. Write $Q'$ and $S'$ uniquely as
\[
Q' = \alpha_1 Q_1 + \cdots + \alpha_\ell Q_\ell, \qquad S' = \beta_1 S_1 + \cdots + \beta_m S_m,
\]
where $\alpha_1, \ldots, \alpha_\ell, \beta_1, \ldots, \beta_m > 0$ (since $Q', S' \ge 0$) and the $Q_i$ and $S_j$ form two sets of (nonzero) orthogonal projectors (not necessarily complete, since we are omitting the possible zero eigenvalue in each sum). Then since $Q'S' = 0$, we have
\[
0 = \operatorname{tr}(Q'S') = \sum_{j=1}^{\ell}\sum_{k=1}^{m} \alpha_j\beta_k \operatorname{tr}(Q_jS_k). \tag{94}
\]
For every $j$ and $k$, we have $\operatorname{tr}(Q_jS_k) = \operatorname{tr}(Q_jS_k^2Q_j) = \operatorname{tr}[(S_kQ_j)^\dagger S_kQ_j] = \langle S_kQ_j, S_kQ_j\rangle \ge 0$ (Hilbert-Schmidt inner product), with equality holding iff $S_kQ_j = 0$. Thus every term on the right-hand side of (94) is nonnegative, and because each $\alpha_j\beta_k > 0$, it must be that $\operatorname{tr}(Q_jS_k) = 0$ (and thus $S_kQ_j = Q_jS_k = 0$) for all $j$ and $k$. Therefore, the combined list $Q_1, \ldots, Q_\ell, S_1, \ldots, S_m$ is a set of orthogonal projectors. (If it is not complete, then add $T := I - \sum_j Q_j - \sum_k S_k$ to the list.) Since $Q' - S' = A$, we have
\[
\alpha_1 Q_1 + \cdots + \alpha_\ell Q_\ell - \beta_1 S_1 - \cdots - \beta_m S_m + 0\,T = \mu_1 P_1 + \cdots + \mu_k P_k.
\]
By the uniqueness of the decompositions (Corollary 9.14), each $(\alpha_j, Q_j)$ term must match some unique $(\mu_i, P_i)$ term where $\mu_i > 0$ and vice versa, and each $(\beta_k, S_k)$ term must match some unique $(\mu_i, P_i)$ term where $\mu_i < 0$ (with $\beta_k = -\mu_i$) and vice versa. It follows that $Q' = Q$ and $S' = S$. $\Box$
The condition that QS = SQ = 0 is equivalent to Q and S having orthogonal support.
Exercise 25.11 Show that if $A$, $Q$, and $S$ are as in Lemma 25.10, then $|A| = |Q - S| = Q + S = \sqrt{Q^2 + S^2}$.
Exercise 25.12 Show that if $A$, $Q$, and $S$ are as in Lemma 25.10, then $\operatorname{tr}|A| = \operatorname{tr}Q + \operatorname{tr}S$. [Hint: Either pick a common eigenbasis for $Q$ and $S$ or use the decomposition in the proof of Lemma 25.10.]
Proposition 25.13 Let $\mathcal{E} : L(H) \to L(J)$ be a (complete, i.e., trace-preserving) quantum operation, and let $\rho$ and $\sigma$ be states in $L(H)$. Then $D(\mathcal{E}(\rho), \mathcal{E}(\sigma)) \le D(\rho, \sigma)$.
Proof. The operator $\rho - \sigma$ satisfies Lemma 25.10, so uniquely write $\rho - \sigma = Q - S$, where $Q, S \ge 0$ and $QS = SQ = 0$. Now we have
\[
\begin{aligned}
D(\rho, \sigma) &= \frac12 \operatorname{tr}|\rho - \sigma| \\
&= \frac12(\operatorname{tr}Q + \operatorname{tr}S) && \text{(Exercise 25.12)} \\
&= \frac12(\operatorname{tr}[\mathcal{E}(Q)] + \operatorname{tr}[\mathcal{E}(S)]) && \text{($\mathcal{E}$ is trace-preserving)} \\
&= \operatorname{tr}[\mathcal{E}(Q)] && \text{(because $\operatorname{tr}(Q - S) = \operatorname{tr}(\rho - \sigma) = 0$)} \\
&\ge \operatorname{tr}[P\mathcal{E}(Q)] && \text{(where projector $P$ maximizes $\operatorname{tr}[P(\mathcal{E}(\rho) - \mathcal{E}(\sigma))]$)} \\
&\ge \operatorname{tr}[P(\mathcal{E}(Q) - \mathcal{E}(S))] \\
&= \operatorname{tr}[P(\mathcal{E}(\rho) - \mathcal{E}(\sigma))] \\
&= D(\mathcal{E}(\rho), \mathcal{E}(\sigma)) && \text{(choice of $P$ and Equation (93))}
\end{aligned}
\]
$\Box$
A particular example of Proposition 25.13 is when $\mathcal{E}$ is a partial trace operator (see Exercise 24.11). The interpretation is that if we ignore part of a system, we lose distinguishability between its states.
Properties of the Fidelity. An important special case of Equation (92) is when $\rho = |\psi\rangle\langle\psi|$ is a pure state. We may prepare a pure state $\rho$, then send it through a noisy quantum channel (quantum operation $\mathcal{E}$), producing a state $\sigma$ at the other end. The fidelity $F(\rho, \sigma)$ is a good measure of how much the state was garbled in the transmission: the higher the fidelity, the less garbling. For $\rho = |\psi\rangle\langle\psi|$ and any state $\sigma$, we have
\[
F(|\psi\rangle\langle\psi|, \sigma) = \operatorname{tr}\sqrt{|\psi\rangle\langle\psi|\,\sigma\,|\psi\rangle\langle\psi|} = \sqrt{\langle\psi|\sigma|\psi\rangle}\,\operatorname{tr}\sqrt{|\psi\rangle\langle\psi|} = \sqrt{\langle\psi|\sigma|\psi\rangle}, \tag{95}
\]
noting that $\sqrt{|\psi\rangle\langle\psi|} = |\psi\rangle\langle\psi|$.
There is a fact about the fidelity analogous to Proposition 25.13 regarding the trace distance. We'll state it without proof.

Proposition 25.14 Suppose $\mathcal{E} : L(H) \to L(J)$ is a complete quantum operation. For any two states $\rho, \sigma \in L(H)$,
\[
F(\mathcal{E}(\rho), \mathcal{E}(\sigma)) \ge F(\rho, \sigma).
\]
Comparing Trace Distance and Fidelity. The trace distance and fidelity are roughly interchangeable as measures of distinctness/similarity. For pure states $\rho$ and $\sigma$ it can be shown that $D(\rho, \sigma) = \sqrt{1 - F(\rho, \sigma)^2}$. For arbitrary states $\rho$ and $\sigma$, it can be shown that
\[
1 - F(\rho, \sigma) \le D(\rho, \sigma) \le \sqrt{1 - F(\rho, \sigma)^2}.
\]
So in most situations, it doesn't really matter which measure is used. The book uses the fidelity measure almost exclusively.
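These inequalities are easy to test numerically. The sketch below (mine, not part of the notes; it assumes NumPy, and `fidelity` uses the $\|\sqrt{\rho}\sqrt{\sigma}\|_1$ form of Equation (92) computed from singular values) checks both bounds on random single-qubit state pairs:

```python
import numpy as np

rng = np.random.default_rng(1)

def rand_state(n=2):
    """Random density matrix: G G^dag normalized to unit trace."""
    G = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    rho = G @ G.conj().T
    return rho / np.trace(rho).real

def psd_sqrt(rho):
    lam, V = np.linalg.eigh(rho)
    return V @ np.diag(np.sqrt(np.clip(lam, 0, None))) @ V.conj().T

def fidelity(rho, sigma):
    # F = || sqrt(rho) sqrt(sigma) ||_1 = sum of singular values (Eq. (92))
    return np.sum(np.linalg.svd(psd_sqrt(rho) @ psd_sqrt(sigma), compute_uv=False))

def trace_distance(rho, sigma):
    return 0.5 * np.sum(np.abs(np.linalg.eigvalsh(rho - sigma)))

for _ in range(50):
    rho, sigma = rand_state(), rand_state()
    F, D = fidelity(rho, sigma), trace_distance(rho, sigma)
    assert 1 - F <= D + 1e-8                              # lower bound
    assert D <= np.sqrt(max(1 - F * F, 0.0)) + 1e-8        # upper bound
print("both inequalities hold on 50 random state pairs")
```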
26 Week 13: Quantum error correction

Quantum Error Correction. In this topic, we'll see ways to reduce the effects of noise in a quantum channel, thereby increasing the fidelity between the input state to the channel and the output state.

First, we'll see a typical scenario where this is done classically. Suppose Alice sends individual bits to Bob across a channel that is noisy in the following sense: any bit $b$ is flipped to the opposite bit $1 - b$ with probability $p$, independently of the other bits. Such a channel is called the binary symmetric channel and is an often-used model of classical noise. We can assume that $p \le 1/2$, because if $p > 1/2$, then Bob would do well to flip each bit he receives, making the effective error probability $1 - p < 1/2$ per bit sent. If $p = 1/2$, then all hope is lost; no information at all can be carried by the bits; Bob receives independently random bits that are completely uncorrelated with those that Alice sent. So we'll assume that $p < 1/2$ from now on.
To reduce the chances of error per bit, Alice and Bob agree on a binary error-correcting code, which is some mapping
\[
0 \mapsto c_0, \qquad 1 \mapsto c_1,
\]
where $c_0$ and $c_1$ are strings over the binary alphabet $\{0, 1\}$ (binary strings) of equal length, called codewords. Instead of sending each bit $b$, Alice sends $c_b$ instead, and Bob decodes what he receives to (hopefully) recover $b$. An obvious error-correcting code is
\[
0 \mapsto 000, \qquad 1 \mapsto 111,
\]
which we'll call the majority-of-3 code. Alice wants to send a bit $b$ (the plaintext or cleartext) to Bob, so she sends $bbb$ across the channel. When Bob receives the possibly garbled string $xyz$ of three bits from Alice, he decodes $xyz$ to get the bit $c$ as follows:
\[
c := \operatorname{majority}(x, y, z).
\]
The bit $b$ was decoded successfully iff $c = b$. What is the failure probability, i.e., the probability that $c \ne b$ due to unrecoverable errors? There will be a failure iff at least two of the three bits were flipped in transit. Since each is flipped with probability $p$ independently of the others, we have
\[
\Pr[\text{failure}] = 3(1 - p)p^2 + p^3 = 3p^2 - 2p^3.
\]
The first term in the middle is the probability that exactly two of the three bits were flipped, and the second term in the middle is the probability that all three bits were flipped. It is easy to see that $\Pr[\text{failure}] < p$ if $p < 1/2$, and so this code reduces the probability of error per plaintext bit compared with no encoding at all. Finally, note that $\Pr[\text{failure}] = O(p^2)$, and so for tiny $p \ll 1$, the failure probability is reduced by a considerable factor.
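The failure-probability calculation can be confirmed with a small simulation. This sketch (mine, pure Python standard library) compares the exact formula $3p^2 - 2p^3$ against a Monte Carlo estimate:

```python
import random

def failure_probability(p):
    """Exact failure probability of the majority-of-3 code: at least
    two of the three independently flipped bits must be wrong."""
    return 3 * (1 - p) * p**2 + p**3   # = 3p^2 - 2p^3

def simulate(p, trials=200_000, seed=42):
    rng = random.Random(seed)
    failures = 0
    for _ in range(trials):
        flips = sum(rng.random() < p for _ in range(3))
        failures += flips >= 2          # majority vote decodes wrongly
    return failures / trials

p = 0.1
print(failure_probability(p))   # about 0.028, well below the raw error rate 0.1
print(simulate(p))              # Monte Carlo estimate, close to 0.028
```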
Figure 10: The three-qubit quantum majority-of-3 code. An arbitrary one-qubit state $|\psi\rangle = \alpha|0\rangle + \beta|1\rangle$ is encoded as $|\psi_L\rangle = \alpha|0_L\rangle + \beta|1_L\rangle = \alpha|000\rangle + \beta|111\rangle$.
The Quantum Bit-Flip Channel. Now we can quantize the scheme above. Suppose Alice sends qubits one at a time to Bob across a noisy quantum channel that we will call the quantum bit-flip channel. In the quantum bit-flip channel, a Pauli $X$ operator is applied to each transmitted qubit with probability $p < 1/2$, independently for each qubit. The corresponding quantum operation is thus
\[
\mathcal{E}(\rho) := (1 - p)\rho + pX\rho X, \tag{96}
\]
whose set of Kraus operators is $\{\sqrt{1 - p}\,I, \sqrt{p}\,X\}$. Suppose that Alice sends some unencoded one-qubit pure state $|\psi\rangle$ through the bit-flip channel $\mathcal{E}$. Bob receives $\tilde\rho := \mathcal{E}(|\psi\rangle\langle\psi|) = (1 - p)|\psi\rangle\langle\psi| + pX|\psi\rangle\langle\psi|X$. The fidelity between Alice's sent state and Bob's received state is, by Equation (95),
\[
F(|\psi\rangle\langle\psi|, \tilde\rho) = \sqrt{\langle\psi|\tilde\rho|\psi\rangle} = \sqrt{(1 - p) + p|\langle\psi|X|\psi\rangle|^2} \ge \sqrt{1 - p},
\]
with equality holding if $|\psi\rangle = |0\rangle$ or $|\psi\rangle = |1\rangle$. So the fidelity without encoding can be as low as $\sqrt{1 - p}$.
Now suppose that Alice and Bob employ a quantum version of the majority-of-3 code. Alice encodes each plaintext qubit she sends to Bob as a three-qubit code state using the map
\[
|0\rangle \mapsto |0_L\rangle := |000\rangle, \qquad |1\rangle \mapsto |1_L\rangle := |111\rangle,
\]
extended to all one-qubit pure states by linearity. Here the subscript $L$ stands for "logical": three physical qubits are being used to encode one logical (uncoded) qubit. Figure 10 shows how Alice encodes a single qubit in state $|\psi\rangle := \alpha|0\rangle + \beta|1\rangle$ as a three-qubit state $|\psi_L\rangle := \alpha|0_L\rangle + \beta|1_L\rangle$. $|\psi_L\rangle$ lies in the code space, i.e., the two-dimensional subspace of the eight-dimensional Hilbert space of three qubits spanned by $|0_L\rangle$ and $|1_L\rangle$.
The three qubits in state $|\psi_L\rangle$ are sent through the channel, and (we assume) each qubit is subjected to the $\mathcal{E}$ of Equation (96) independently of the other two. Thus the channel yields the output state
\[
\rho := (\mathcal{E}\otimes\mathcal{E}\otimes\mathcal{E})(|\psi_L\rangle\langle\psi_L|) = \mathcal{E}^{\otimes 3}(|\psi_L\rangle\langle\psi_L|).
\]
Bob receives $\rho$ and wants to decode it to (hopefully) recover $|\psi\rangle$. Now some issues arise that aren't a problem in the classical case. Most importantly, Bob cannot just measure the physical qubits he receives, since this will destroy the superposition making up $|\psi\rangle$. In fact, Bob's error correction operation cannot give him any classical information about $|\psi\rangle$; any such information would disrupt $|\psi\rangle$. Instead, Bob can measure what kind of error occurred (if any) and correct the error directly without disturbing $|\psi\rangle$. The type of error is called the error syndrome. Bob's decoding is a two-step process: First, Bob will measure the error syndrome, i.e., which bit (if any) was flipped, without gaining any knowledge of what the values of the bit were before and after. Second, knowing which qubit was flipped, Bob applies an $X$ gate to that qubit, and this will allow him to recover $|\psi\rangle$ with high probability.
To measure the error syndrome, Bob makes a four-outcome projective measurement on his three received qubits using the four projectors
\[
\begin{aligned}
P_0 &= |000\rangle\langle000| + |111\rangle\langle111|, \\
P_1 &= |100\rangle\langle100| + |011\rangle\langle011|, \\
P_2 &= |010\rangle\langle010| + |101\rangle\langle101|, \\
P_3 &= |001\rangle\langle001| + |110\rangle\langle110|.
\end{aligned}
\]
Each $P_j$ is a two-dimensional projector. $P_0$ projects onto the code space and corresponds to the outcome of either no qubits flipped or all three qubits flipped. $P_1$ corresponds to the outcome of either the first qubit flipped and the other two left alone, or the other two flipped and the first left alone. $P_2$ and $P_3$ are similar for the second and third qubits, respectively. Note that, whatever the state was before the syndrome measurement, the post-measurement state is in one of the four subspaces projected onto by $P_0, \ldots, P_3$, respectively.
Let $j \in \{0, 1, 2, 3\}$ be the outcome of Bob's syndrome measurement, above. After the measurement, Bob tries to recover $|\psi_L\rangle$ as follows: if $j = 0$, then Bob assumes that no qubits were flipped (which is way more likely than all three being flipped), and so he does nothing; if $1 \le j \le 3$, then Bob assumes that the $j$th qubit was flipped (which is somewhat more likely than the other two being flipped), and so he flips the $j$th qubit back by applying $X$ to it. No matter what qubits were flipped in the channel, Bob has a state in the code space after the correction. If at most one qubit was flipped, then Bob has $|\psi_L\rangle$, and the recovery is successful. If more than one qubit was flipped, then Bob has the state $X_L|\psi_L\rangle$, where $X_L$ is some three-qubit operator that swaps $|0_L\rangle$ with $|1_L\rangle$, and the recovery failed. (Bob doesn't know at this point whether he succeeded or failed.)
Figure 11: Bob's error-recovery circuit for the quantum bit-flip channel. The middle three qubits are what he receives from Alice, and the two outer qubits are ancillae used in the syndrome measurement.
In a moment, we'll see in detail how Bob can perform these steps, but once he has $|\psi_L\rangle$ (and if he really wants to) he can convert $|\psi_L\rangle$ back into $|\psi\rangle$ by applying the inverse of Alice's circuit of Figure 10, followed by two gates labeled "tr," which are what I call trace gates. A trace gate just signifies that a qubit is no longer useful and is to be ignored, i.e., traced out. So mathematically, a trace gate corresponds to a partial trace. Assuming the input state to the circuit is $|\psi_L\rangle$, the first qubit of the output will be in state $|\psi\rangle$. The traced-out qubits will both hold $|0\rangle$ if the input state is in the code space.
A circuit for Bob's syndrome measurement and subsequent correction is shown in Figure 11. Bob receives the three-qubit state in the middle three qubits. His syndrome measurement is split into two binary measurements: He first measures whether the first two of the three qubits from Alice are different. The value $b_1$ measured on the upper ancilla will be 1 iff they are, and 0 otherwise. Similarly, the lower ancilla measurement is 1 iff the second two qubits are different. To correct the state, Bob combines these two Boolean values to determine which qubit value, if any, is different from the other two, and applies a classically controlled $X$ gate to this qubit.
Exercise 26.1 Show mathematically that the syndrome measurement portion of Figure 11 is the same as the projective measurement $\{P_0, P_1, P_2, P_3\}$ described earlier. What values of $b_1b_2$ correspond to which $P_j$?
Bob's entire recovery process in Figure 11 can be described as a quantum operation $\mathcal{R}$ that maps three-qubit states to three-qubit states: For input state $\rho$, we have
\[
\mathcal{R}(\rho) = P_0\rho P_0 + \sum_{j=1}^3 X_jP_j\rho P_jX_j, \tag{97}
\]
where $X_j$ is the Pauli $X$ gate applied to the $j$th qubit. That is,
\[
X_1 = X\otimes I\otimes I, \qquad X_2 = I\otimes X\otimes I, \qquad X_3 = I\otimes I\otimes X.
\]
Thus the state after Bob's recovery is
\[
\rho' := \mathcal{R}(\rho) = \mathcal{R}(\mathcal{E}^{\otimes 3}(|\psi_L\rangle\langle\psi_L|)).
\]
To get a handle on what $\rho'$ is, first notice that $|\psi_L\rangle\langle\psi_L|$ is a linear combination of operators of the form $|a_L\rangle\langle b_L| = |aaa\rangle\langle bbb|$ for $a, b \in \{0, 1\}$. By Equation (96) we have $\mathcal{E}(|a\rangle\langle b|) = (1 - p)|a\rangle\langle b| + p|\bar a\rangle\langle\bar b|$, where $\bar a := 1 - a$, and so
\[
\begin{aligned}
\mathcal{E}^{\otimes 3}(|aaa\rangle\langle bbb|) &= [\mathcal{E}(|a\rangle\langle b|)]^{\otimes 3} = [(1 - p)|a\rangle\langle b| + p|\bar a\rangle\langle\bar b|]^{\otimes 3} \\
&= (1 - p)^3\,|aaa\rangle\langle bbb| \\
&\quad + (1 - p)^2p\,(|\bar aaa\rangle\langle\bar bbb| + |a\bar aa\rangle\langle b\bar bb| + |aa\bar a\rangle\langle bb\bar b|) \\
&\quad + (1 - p)p^2\,(|a\bar a\bar a\rangle\langle b\bar b\bar b| + |\bar aa\bar a\rangle\langle\bar bb\bar b| + |\bar a\bar aa\rangle\langle\bar b\bar bb|) \\
&\quad + p^3\,|\bar a\bar a\bar a\rangle\langle\bar b\bar b\bar b|.
\end{aligned}
\]
(Alternatively, we can expand $\mathcal{E}^{\otimes 3}(\rho)$ for any three-qubit operator $\rho$ to get an operator-sum expression for $\mathcal{E}^{\otimes 3}$:
\[
\begin{aligned}
\mathcal{E}^{\otimes 3}(\rho) &= (1 - p)^3\rho + (1 - p)^2p\,(X_1\rho X_1 + X_2\rho X_2 + X_3\rho X_3) \\
&\quad + (1 - p)p^2\,(X_2X_3\rho X_2X_3 + X_1X_3\rho X_1X_3 + X_1X_2\rho X_1X_2) \\
&\quad + p^3\,X_1X_2X_3\rho X_1X_2X_3,
\end{aligned}
\]
then plug in $|aaa\rangle\langle bbb|$ for $\rho$ to get the same expression for $\mathcal{E}^{\otimes 3}(|aaa\rangle\langle bbb|)$.) Applying the $\mathcal{R}$ of Equation (97) to $\mathcal{E}^{\otimes 3}(|aaa\rangle\langle bbb|)$ above, we get, after much simplification,
\[
\begin{aligned}
\mathcal{R}(\mathcal{E}^{\otimes 3}(|a_L\rangle\langle b_L|)) &= (1 - 3p^2 + 2p^3)\,|aaa\rangle\langle bbb| + (3p^2 - 2p^3)\,|\bar a\bar a\bar a\rangle\langle\bar b\bar b\bar b| \\
&= (1 - 3p^2 + 2p^3)\,|a_L\rangle\langle b_L| + (3p^2 - 2p^3)\,X_1X_2X_3|a_L\rangle\langle b_L|X_1X_2X_3.
\end{aligned}
\]
Exercise 26.2 Verify this last equation. This may be tedious, but it is good practice.
Since this equation holds for all four bowtie operators $|a_L\rangle\langle b_L|$, by linearity, we get
\[
\rho' = (1 - 3p^2 + 2p^3)\,|\psi_L\rangle\langle\psi_L| + (3p^2 - 2p^3)\,X_1X_2X_3|\psi_L\rangle\langle\psi_L|X_1X_2X_3.
\]
The first term represents Bob's successful recovery of $|\psi_L\rangle$, and this occurs with probability $1 - 3p^2 + 2p^3$, which is greater than $1 - p$ if $p < 1/2$. In fact, it is $1 - O(p^2)$, which is significant if $p$ is small. For the fidelity, we get
\[
F(|\psi_L\rangle\langle\psi_L|, \rho') = \sqrt{\langle\psi_L|\rho'|\psi_L\rangle} \ge \sqrt{1 - 3p^2 + 2p^3} > \sqrt{1 - p},
\]
and so the minimum fidelity of $\rho'$ with $|\psi_L\rangle\langle\psi_L|$ is strictly greater than the minimum fidelity of $\rho$ with $|\psi_L\rangle\langle\psi_L|$. So, recovery improves the worst-case fidelity.
The Quantum Phase-Flip Channel. Bit flips are not the only possible errors in a quantum channel. Consider the one-qubit phase-flip channel given by the quantum operation
\[
\mathcal{F}(\rho) := (1 - p)\rho + pZ\rho Z,
\]
which applies a Pauli $Z$ operator to the qubit (thus flipping the relative phase between $|0\rangle$ and $|1\rangle$ by a factor of $-1$) with probability $p < 1/2$.

This kind of channel has no classical analogue, but in a very real sense it is closely analogous to the quantum bit-flip channel: the two channels are unitarily conjugate to each other via the Hadamard $H$ operator. Here's what I mean by that: Since $HX = ZH$ and $XH = HZ$, we have
\[
H(\mathcal{F}(\rho))H = (1 - p)H\rho H + pHZ\rho ZH = (1 - p)H\rho H + pXH\rho HX = \mathcal{E}(H\rho H)
\]
for every one-qubit operator $\rho$. Similarly, $H(\mathcal{E}(\rho))H = \mathcal{F}(H\rho H)$ (footnote 22). So by conjugating everything by $H$ on each qubit, we can reduce the problem of error recovery in the phase-flip channel to that of error recovery in the bit-flip channel.
Compare the following with the previous discussion about the quantum bit-flip channel: If Alice sends a one-qubit pure state $|\psi\rangle$ unencoded across the channel $\mathcal{F}$ to Bob, then Bob receives some $\tilde\rho := \mathcal{F}(|\psi\rangle\langle\psi|)$, and the fidelity is
\[
F(|\psi\rangle\langle\psi|, \tilde\rho) = \sqrt{\langle\psi|\tilde\rho|\psi\rangle} = \sqrt{(1 - p) + p|\langle\psi|Z|\psi\rangle|^2} \ge \sqrt{1 - p},
\]
with equality holding if $|\psi\rangle = H|0\rangle$ or $|\psi\rangle = H|1\rangle$. So the worst-case fidelity is the same as with the bit-flip channel.
Footnote 22: Put more succinctly, $\mathcal{U}\circ\mathcal{F} = \mathcal{E}\circ\mathcal{U}$ and $\mathcal{U}\circ\mathcal{E} = \mathcal{F}\circ\mathcal{U}$, where $\mathcal{U}$ is the one-qubit unitary evolution quantum operation that maps $\rho \mapsto H\rho H$.
Figure 12: The three-qubit code for the phase-flip channel. An arbitrary one-qubit state $|\psi\rangle = \alpha|0\rangle + \beta|1\rangle$ is encoded as $|\psi_L\rangle = \alpha|0_L\rangle + \beta|1_L\rangle = \alpha|{+}{+}{+}\rangle + \beta|{-}{-}{-}\rangle$.
Figure 13: Bob's error recovery procedure for the phase-flip channel.
To get an error-correcting code for the phase-flip channel, we take our majority-of-3 construction for the bit-flip channel and insert Hadamard gates in the right places. Recall that we've defined $|{+}\rangle := H|0\rangle$ and $|{-}\rangle := H|1\rangle$. Alice now encodes her one-qubit pure state $|\psi\rangle = \alpha|0\rangle + \beta|1\rangle$ as
\[
|\psi_L\rangle := H^{\otimes 3}(\alpha|000\rangle + \beta|111\rangle) = \alpha|{+}{+}{+}\rangle + \beta|{-}{-}{-}\rangle = \alpha|0_L\rangle + \beta|1_L\rangle,
\]
where $|0_L\rangle := |{+}{+}{+}\rangle$, $|1_L\rangle := |{-}{-}{-}\rangle$, and $H^{\otimes 3} = H_1H_2H_3$, defined analogously with the $X_1$, $X_2$, and $X_3$ previously. Figure 12 shows the circuit Alice uses to do this.
Note that $Z|{+}\rangle = |{-}\rangle$ and $Z|{-}\rangle = |{+}\rangle$. In other words, $Z$ is represented in the $\{|{+}\rangle, |{-}\rangle\}$ basis by the same matrix as $X$ is in the computational basis. So a phase flip in the channel $\mathcal{F}$ will flip a $+$ to a $-$ and vice versa. This means that we can do the same analysis of the channel $\mathcal{F}$ as we did with $\mathcal{E}$ by substituting the labels $+$ for $0$ and $-$ for $1$. Bob receives the state $\tilde\rho := \mathcal{F}^{\otimes 3}(|\psi_L\rangle\langle\psi_L|)$ from Alice, measures the error syndrome with projectors $Q_0, Q_1, Q_2, Q_3$, where each $Q_j := H^{\otimes 3}P_jH^{\otimes 3}$. If Bob sees that the relative phase of one of the qubits is different from that of the other two, then Bob assumes that the qubit's phase was flipped and applies a $Z$ gate to that qubit. The circuit for doing all this is shown in Figure 13.
The quantum operation corresponding to Bob's whole procedure is given by
\[
\mathcal{S}(\rho) := Q_0\rho Q_0 + \sum_{j=1}^3 Z_jQ_j\rho Q_jZ_j
\]
for any three-qubit operator $\rho$. Notice that
\[
\begin{aligned}
H^{\otimes 3}(\mathcal{S}(\rho))H^{\otimes 3} &= P_0H^{\otimes 3}\rho H^{\otimes 3}P_0 + \sum_{j=1}^3 X_jH^{\otimes 3}Q_j\rho Q_jH^{\otimes 3}X_j \\
&= P_0H^{\otimes 3}\rho H^{\otimes 3}P_0 + \sum_{j=1}^3 X_jP_jH^{\otimes 3}\rho H^{\otimes 3}P_jX_j \\
&= \mathcal{R}(H^{\otimes 3}\rho H^{\otimes 3}).
\end{aligned}
\]
That is, $\mathcal{S}$ is unitarily conjugate to $\mathcal{R}$ via $H^{\otimes 3}$. In a similar fashion, we can get that $H^{\otimes 3}(\mathcal{F}^{\otimes 3}(\rho))H^{\otimes 3} = \mathcal{E}^{\otimes 3}(H^{\otimes 3}\rho H^{\otimes 3})$ for any three-qubit operator $\rho$.
Letting $\rho' := \mathcal{S}(\mathcal{F}^{\otimes 3}(|\psi_L\rangle\langle\psi_L|))$ and stringing these operations together, we have
\[
H^{\otimes 3}\rho'H^{\otimes 3} = H^{\otimes 3}\big(\mathcal{S}(\mathcal{F}^{\otimes 3}(|\psi_L\rangle\langle\psi_L|))\big)H^{\otimes 3} = \mathcal{R}\big(\mathcal{E}^{\otimes 3}(H^{\otimes 3}|\psi_L\rangle\langle\psi_L|H^{\otimes 3})\big),
\]
which is exactly the recovered state from the bit-flip analysis applied to the bit-flip encoding $H^{\otimes 3}|\psi_L\rangle = \alpha|000\rangle + \beta|111\rangle$. Conjugating that result back by $H^{\otimes 3}$, we get, equivalently,
\[
\rho' = (1 - 3p^2 + 2p^3)\,|\psi_L\rangle\langle\psi_L| + (3p^2 - 2p^3)\,Z_1Z_2Z_3|\psi_L\rangle\langle\psi_L|Z_1Z_2Z_3.
\]
Thus we get the same success probability here as with the bit-flip channel, and the fidelity is at worst the same as it was then:
\[
F(|\psi_L\rangle\langle\psi_L|, \rho') = \sqrt{\langle\psi_L|\rho'|\psi_L\rangle} \ge \sqrt{1 - 3p^2 + 2p^3}.
\]
The Shor Code. We can combine the bit-flip and phase-flip error-correcting codes above to correct against both kinds of errors, even on the same qubit. As a bonus, we'll show that the resulting code corrects against arbitrary errors on a single qubit. A typical one-qubit channel that has all three kinds of errors (bit flip, phase flip, and combined bit and phase flip) is called the depolarizing channel, and it maps
\[
\mathcal{D}(\rho) := (1 - p)\rho + \frac{p}{3}(X\rho X + Z\rho Z + ZX\rho XZ) = (1 - p)\rho + \frac{p}{3}(X\rho X + Y\rho Y + Z\rho Z). \tag{98}
\]
This channel leaves the qubit alone with probability $1 - p > 1/2$ and produces each of the three possible errors with the same probability $p/3$.
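A quick check (my sketch, assuming NumPy) that the two forms of Equation (98) agree, because $Y$ equals $ZX$ up to a phase that cancels under conjugation, and that $\mathcal{D}$ is trace-preserving:

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def depolarize(rho, p):
    """D(rho) of Equation (98), second form."""
    return (1 - p) * rho + (p / 3) * (X @ rho @ X + Y @ rho @ Y + Z @ rho @ Z)

rho = np.array([[0.7, 0.2 + 0.1j], [0.2 - 0.1j, 0.3]])  # an arbitrary state
p = 0.3

# First form of (98), with the combined bit-and-phase flip written as ZX . XZ.
alt = (1 - p) * rho + (p / 3) * (X @ rho @ X + Z @ rho @ Z
                                 + Z @ X @ rho @ X @ Z)
out = depolarize(rho, p)
print(np.allclose(out, alt))           # True: the two forms agree
print(np.isclose(np.trace(out), 1.0))  # True: trace preserved
```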
Figure 14: The nine-qubit Shor code. This concatenates the phase-flip and bit-flip codes.
To help correct against all three types of errors, Alice first encodes a single qubit using the three-qubit phase-flip code of Figure 12, then she encodes each of the three qubits using the majority-of-3 code for the bit-flip channel, shown in Figure 10. The resulting encoding circuit, shown in Figure 14, produces the nine-qubit Shor code, named after its inventor, Peter Shor. Such a code is called a concatenated code, in that it combines (concatenates) two or more simpler codes into a single code. Using the Shor code, Alice encodes a single qubit in state $|\psi\rangle = \alpha|0\rangle + \beta|1\rangle$ as the nine-qubit state $|\psi_S\rangle := \alpha|0_S\rangle + \beta|1_S\rangle$, where
\[
|0_S\rangle := \frac{1}{2\sqrt2}(|000\rangle + |111\rangle)^{\otimes 3} = |{+}_L\rangle^{\otimes 3}, \tag{99}
\]
\[
|1_S\rangle := \frac{1}{2\sqrt2}(|000\rangle - |111\rangle)^{\otimes 3} = |{-}_L\rangle^{\otimes 3}, \tag{100}
\]
where we define the three-qubit states $|{+}_L\rangle := (|000\rangle + |111\rangle)/\sqrt2$ and $|{-}_L\rangle := (|000\rangle - |111\rangle)/\sqrt2$. The nine qubits are naturally divided into three subblocks of three qubits each, which I'll call 3-blocks. Alice sends Bob $|\psi_S\rangle\langle\psi_S|$ through a channel (e.g., the depolarizing channel) that may cause one of the three errors on each of the nine qubits with some probability independently of the others. If more than one qubit is affected, then the recovery won't work, and so we hope that the probability of this happening is low.
Bob receives the nine-qubit state $\rho$ sent from Alice, and we'll assume (with high probability) that at most one of the nine qubits endured either a bit flip, phase flip, or both. For example, suppose Alice sends $|0_S\rangle = |{+}_L\rangle^{\otimes 3}$ to Bob. If the first qubit is bit-flipped en route, then Bob receives $(1/\sqrt2)(|100\rangle + |011\rangle)\,|{+}_L\rangle^{\otimes 2}$. If the first qubit is phase-flipped, then Bob gets $(1/\sqrt2)(|000\rangle - |111\rangle)\,|{+}_L\rangle^{\otimes 2} = |{-}_L\rangle|{+}_L\rangle^{\otimes 2}$. (Note that a phase flip in a qubit contributes an overall phase flip in its 3-block; phase flips on two different qubits in the same block would cancel each other.) Finally, if the first qubit is bit-flipped and then phase-flipped, then Bob gets $(1/\sqrt2)(|100\rangle - |011\rangle)\,|{+}_L\rangle^{\otimes 2}$, up to an overall phase.
Figure 15: Recovering from a phase flip with the Shor code. When this circuit starts, Bob assumes that he has already corrected any bit flip in a 3-block if there was one, and so the incoming state is a linear combination of the eight states $|{\pm}_L\rangle|{\pm}_L\rangle|{\pm}_L\rangle$.
To recover, Bob first applies the bit-flip error recovery operation $\mathcal{R}$ of Figure 11 and Equation (97) to each of the three 3-blocks independently. This will correct up to a single bit-flip error within each 3-block. Importantly, this intrablock bit-flip recovery works regardless of whether there was also a phase flip in the 3-block. After Bob corrects bit flips within each 3-block, he must then correct phase flips. He does this by comparing phase differences between adjacent 3-blocks, either finding which 3-block's phase doesn't match the other two and flipping that 3-block's phase back, or else determining that the phases of the 3-blocks are all equal and nothing needs to be done. A circuit that accomplishes this phase-flip recovery portion of the overall recovery is shown in Figure 15. To see that this works, define
\[
|\mathrm{even}\rangle := \frac12(|000\rangle + |011\rangle + |101\rangle + |110\rangle), \qquad
|\mathrm{odd}\rangle := \frac12(|100\rangle + |010\rangle + |001\rangle + |111\rangle).
\]
$|\mathrm{even}\rangle$ is a superposition of all computational basis states of three qubits with an even number of 1s; $|\mathrm{odd}\rangle$ is a superposition of the other computational basis states. It's easy to check that $|\mathrm{even}\rangle = H^{\otimes 3}|{+}_L\rangle$ and that $|\mathrm{odd}\rangle = H^{\otimes 3}|{-}_L\rangle$. Suppose that Alice sends $|1_S\rangle = |{-}_L\rangle|{-}_L\rangle|{-}_L\rangle$ to Bob, and there is a phase flip on some qubit in the first 3-block (it doesn't matter which). So after correcting bit flips, Bob's state is now $|{+}_L\rangle|{-}_L\rangle|{-}_L\rangle$, which feeds into the circuit of Figure 15. Applying the Hadamard gates yields the state $|\mathrm{even}\rangle|\mathrm{odd}\rangle|\mathrm{odd}\rangle$. As a result of the CNOTs, the upper ancilla's bit value will flip an even + odd = odd number of times, and so $b_1 = 1$. The lower ancilla's bit value will flip an odd + odd = even number of times, and so $b_2 = 0$. The next layer of Hadamards converts the state back to $|{+}_L\rangle|{-}_L\rangle|{-}_L\rangle$, and then Bob recovers by applying a $Z$ gate to the first qubit, yielding $|{-}_L\rangle|{-}_L\rangle|{-}_L\rangle = |1_S\rangle$.
Let's look briefly at the quantum operations involved. Let $\mathcal{T}$ be the quantum operation corresponding to Bob's entire recovery procedure for the Shor code.
Exercise 26.3 (Challenging) Give an expression for $\mathcal{T}$ applied to an arbitrary nine-qubit state $\rho$. Make your expression as succinct as possible but still mathematically precise. You may use the following notations for operators without having to expand them:

- Let $R_0, R_1, R_2, R_3$ represent the projectors for the measurement performed in Figure 15, corresponding to the outcomes 00, 10, 11, and 01, respectively, for $b_1b_2$.
- For $j \in \{0, 1, 2\}$, let $P^{(j)}_0, P^{(j)}_1, P^{(j)}_2, P^{(j)}_3$ be the projectors used for the bit-flip syndrome measurement in the $(j + 1)$st 3-block, as described in the bit-flip channel discussion.
- For any single-qubit operator $A$ and $k \in \{1, \ldots, 9\}$, let $A_k$ be the nine-qubit operator that applies $A$ to the $k$th qubit and leaves the other qubits alone.
Suppose the channel between Alice and Bob is the one-qubit depolarizing channel $\mathcal{D}$ of Equation (98). If Alice and Bob use the Shor code, then nine qubits will be transferred per single plaintext qubit. For an arbitrary nine-qubit state $\rho$, the effect of $\mathcal{D}$ on $\rho$ is then
\[
\mathcal{D}^{\otimes 9}(\rho) = (1 - p)^9\rho + (1 - p)^8\,\frac{p}{3}\sum_{j=1}^9(X_j\rho X_j + Y_j\rho Y_j + Z_j\rho Z_j) + O(p^2),
\]
where the terms hidden in the $O(p^2)$ represent errors on two or more qubits, from which Bob may not recover. Bob can recover from any of the single-qubit errors showing in the expression above, however, provided $\rho$ is in the code space of the Shor code. That is, if $\rho = |\psi_S\rangle\langle\psi_S|$ for some one-qubit state $|\psi\rangle$, then we've shown that $\mathcal{T}(X_j\rho X_j) = \mathcal{T}(Y_j\rho Y_j) = \mathcal{T}(Z_j\rho Z_j) = \rho$ for all $1 \le j \le 9$, and thus the final error-corrected state is
\[
\rho'' := \mathcal{T}(\mathcal{D}^{\otimes 9}(\rho)) = \big((1 - p)^9 + 9p(1 - p)^8\big)\rho + O(p^2) = (1 - p)^8(1 + 8p)\rho + O(p^2).
\]
The hidden terms are all of the form $K\rho K^\dagger = K|\psi_S\rangle\langle\psi_S|K^\dagger$ for various operators $K$, each contributing nonnegatively to $\langle\psi_S|\rho''|\psi_S\rangle$, and so the fidelity satisfies
\[
F(|\psi_S\rangle\langle\psi_S|, \rho'') = \sqrt{\langle\psi_S|\rho''|\psi_S\rangle} = \sqrt{(1 - p)^8(1 + 8p) + (\text{nonnegative terms})} \ge \sqrt{(1 - p)^8(1 + 8p)} = (1 - p)^4\sqrt{1 + 8p} = 1 - O(p^2).
\]
The error operation
\[
\mathcal{E}(\rho) := (1 - p)^9\rho + (1 - p)^8\,\frac{p}{3}\sum_{j=1}^9(X_j\rho X_j + Y_j\rho Y_j + Z_j\rho Z_j) \tag{101}
\]
represents the portion of $\mathcal{D}^{\otimes 9}$ from which we know Bob can recover. Note that $\mathcal{E}$ is an incomplete (non-trace-preserving) operation, because we are omitting the terms of $\mathcal{D}^{\otimes 9}$ where more than one qubit is subjected to an error, and from which Bob may not be able to recover. The incompleteness reflects the fact that this happens with nonzero probability.
We'll say that a quantum state $\rho$ is in the code space $C$ iff it is a convex sum of pure states in $C$, i.e., $\rho = \sum_i p_i|\psi_i\rangle\langle\psi_i|$ where each $|\psi_i\rangle \in C$. Equivalently, $\rho$ is in the code space iff $\rho = P\rho P$, where $P$ is the projector onto $C$ (equivalently, $\rho = \rho P$, or equivalently, $\rho = P\rho$, by Exercise 27.1, below).
Exercise 27.1 Prove that the following are equivalent for any projector $P$ and Hermitean operator $A$: (1) $A = PAP$; (2) $A = AP$; (3) $A = PA$. [Hint: No decompositions are needed for any of these; just simple substitutions and taking adjoints.]
The error operation $\mathcal{E}$ can be given in operator-sum form by some Kraus operators $E_1, \ldots, E_N \in L(H)$ such that $\sum_{j=1}^N E_j^\dagger E_j \le I$ ($\mathcal{E}$ is not necessarily complete), and
\[
\mathcal{E}(\rho) = \sum_{j=1}^N E_j\rho E_j^\dagger
\]
for any $\rho \in L(H)$. Suppose that $\mathcal{R} : L(H) \to L(H)$ is some (not necessarily complete) quantum operation representing a recovery procedure. We will say that $\mathcal{R}$ successfully recovers from $\mathcal{E}$ if $(\mathcal{R}\circ\mathcal{E})(\rho) = c\rho$ for any $\rho$ in $C$, where $c$ is a real constant depending on $\mathcal{E}$ and $\mathcal{R}$ and satisfying $0 \le c \le 1$. (If $\mathcal{E}$ and $\mathcal{R}$ are both complete (hence trace-preserving), then we must have $c = 1$.) We will say that $\mathcal{E}$ is recoverable if there exists an $\mathcal{R}$ that successfully recovers from $\mathcal{E}$. The next theorem gives a quantitative criterion for when an error operation is recoverable.
Theorem 27.2 Let $\mathcal{E}$ be an error operation on $L(H)$ given by Kraus operators $E_1, \ldots, E_N \in L(H)$. Fix a code space $C \subseteq H$ and let $P$ be the projector projecting onto $C$. $\mathcal{E}$ is recoverable (with respect to $C$) if and only if there exists an $N\times N$ matrix $M$ such that, for all $1 \le i, j \le N$,
\[
PE_i^\dagger E_jP = [M]_{ij}P. \tag{102}
\]
Further, if such an $M$ exists, then $M \ge 0$, $\operatorname{tr}M$ is the probability that $\mathcal{E}$ occurs given any state in $C$, and a (complete) recovery operation $\mathcal{R}$ exists such that $(\mathcal{R}\circ\mathcal{E})(\rho) = (\operatorname{tr}M)\rho$ for any $\rho$ in $C$.

I call Equation (102) the "peep" condition, because of the left-hand side.
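The peep condition can be checked concretely for the three-qubit bit-flip code. The sketch below (mine, assuming NumPy) takes the identity-plus-single-flip Kraus operators, i.e., the recoverable portion of $\mathcal{E}^{\otimes 3}$ in the spirit of Equation (101), and verifies that each $PE_i^\dagger E_jP$ is a scalar multiple of $P$, with $M$ diagonal and $\operatorname{tr}M = 1 - 3p^2 + 2p^3$:

```python
import numpy as np

I2, X = np.eye(2), np.array([[0., 1.], [1., 0.]])

def kron3(a, b, c):
    return np.kron(np.kron(a, b), c)

def ket(bits):
    v = np.zeros(8)
    v[int(bits, 2)] = 1.0
    return v

# Projector onto the 3-qubit bit-flip code space span{|000>, |111>}.
P = np.outer(ket('000'), ket('000')) + np.outer(ket('111'), ket('111'))

p = 0.1
# Kraus operators of the identity-or-one-flip part of the channel.
E = [np.sqrt((1 - p)**3) * kron3(I2, I2, I2),
     np.sqrt((1 - p)**2 * p) * kron3(X, I2, I2),
     np.sqrt((1 - p)**2 * p) * kron3(I2, X, I2),
     np.sqrt((1 - p)**2 * p) * kron3(I2, I2, X)]

# Peep condition (102): P E_i^dag E_j P must be a scalar multiple of P.
M = np.zeros((4, 4))
for i in range(4):
    for j in range(4):
        peep = P @ E[i].T @ E[j] @ P        # E_i are real, so .T is the adjoint
        M[i, j] = peep[0, 0]                # candidate scalar [M]_ij
        assert np.allclose(peep, M[i, j] * P)

print(np.diag(M))   # only the diagonal survives: distinct flips are orthogonal
print(M.sum())      # tr M = probability that this (incomplete) operation occurs
```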
Proof. We will only be interested in the backwards implication, giving sufficient conditions for $\mathcal{E}$ to be recoverable. So we won't prove the forward implication (the textbook does it).

Suppose that $M$ exists satisfying (102) for all $i, j$. We can assume that $P \ne 0$; otherwise, the theorem is trivial. Taking the adjoint of each side of (102), we have, for all $1 \le i, j \le N$,
\[
[M]_{ij}^*P = PE_j^\dagger E_iP = [M]_{ji}P,
\]
and so $[M]_{ij}^* = [M]_{ji}$ since $P \ne 0$, which means that $M$ is Hermitean. The next thing to do is to simplify (102) by diagonalizing $M$. Since $M$ is normal, there is an $N\times N$ unitary matrix $U$ and scalars $d_1, \ldots, d_N \in \mathbb{R}$ (the eigenvalues of $M$) such that $U^\dagger MU = \operatorname{diag}(d_1, \ldots, d_N)$.
For 1 k N, dene
F
k
:=
N
j=1
[U]
jk
E
j
.
Then
\[
\sum_{k=1}^N F_k^\dagger F_k = \sum_{i,j=1}^N \Bigl(\sum_k [U]_{jk} \overline{[U]_{ik}}\Bigr) E_i^\dagger E_j = \sum_{i,j} \delta_{ji}\, E_i^\dagger E_j = \sum_j E_j^\dagger E_j \le I,
\]
and for any $\rho \in L(H)$,
\[
\sum_{k=1}^N F_k \rho F_k^\dagger = \sum_{i,j=1}^N \Bigl(\sum_k [U]_{ik} \overline{[U]_{jk}}\Bigr) E_i \rho E_j^\dagger = \sum_{i,j} \delta_{ij}\, E_i \rho E_j^\dagger = \sum_j E_j \rho E_j^\dagger = \mathcal{E}(\rho).
\]
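This invariance of the channel under a unitary remixing of its Kraus operators can be checked numerically. A small sketch (my own, assuming numpy), with random Kraus operators and a random unitary:

```python
import numpy as np

# Illustration (my own, not from the notes): remixing Kraus operators by a
# unitary matrix U, F_k = sum_j U[j,k] E_j, leaves the channel unchanged.
rng = np.random.default_rng(0)

# Two random Kraus operators on one qubit, scaled so sum E_j^dag E_j <= I.
E = [rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2)) for _ in range(2)]
s = sum(e.conj().T @ e for e in E)
E = [e / np.sqrt(max(np.linalg.eigvalsh(s))) for e in E]

# A random 2x2 unitary from the QR decomposition of a random complex matrix.
U, _ = np.linalg.qr(rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2)))

# The remixed Kraus operators F_k = sum_j U[j,k] E_j.
F = [sum(U[j, k] * E[j] for j in range(2)) for k in range(2)]

rho = np.array([[0.7, 0.2], [0.2, 0.3]], dtype=complex)  # an arbitrary state
out_E = sum(e @ rho @ e.conj().T for e in E)
out_F = sum(f @ rho @ f.conj().T for f in F)
assert np.allclose(out_E, out_F)   # same channel with either Kraus set
```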
Thus $F_1, \dots, F_N$ are also a set of Kraus operators for $\mathcal{E}$. Now Equation (102) becomes, for
all $1 \le k, \ell \le N$,
\[
P F_k^\dagger F_\ell P = \sum_{i,j=1}^N \overline{[U]_{ik}}\, [U]_{j\ell}\, P E_i^\dagger E_j P = \sum_{i,j} [U^\dagger]_{ki} [M]_{ij} [U]_{j\ell}\, P = [U^\dagger M U]_{k\ell}\, P = d_k \delta_{k\ell} P. \tag{103}
\]
Taking the trace of both sides of (103) with $k = \ell$, we get
\[
d_k = \frac{\operatorname{tr}(P F_k^\dagger F_k P)}{\operatorname{tr} P} = \frac{\operatorname{tr}(F_k P F_k^\dagger)}{\operatorname{tr} P} \ge 0.
\]
This implies $M \ge 0$. Also, for any state $\rho$ in $C$, the probability that $\mathcal{E}$ actually occurs is
given by
\[
\operatorname{tr}[\mathcal{E}(\rho)] = \operatorname{tr}[\mathcal{E}(P\rho P)] = \sum_{k=1}^N \operatorname{tr}(F_k P\rho P F_k^\dagger) = \sum_k \operatorname{tr}(P F_k^\dagger F_k P \rho) = \sum_k d_k \operatorname{tr}(P\rho) = \sum_k d_k \operatorname{tr}\rho = \sum_k d_k = \operatorname{tr} M.
\]
Note that if $d_k = 0$ for some $k$, then $\operatorname{tr}(F_k P F_k^\dagger) = 0$, and so $F_k P = 0$ (a positive
semidefinite operator with zero trace must be zero). This implies that
if $\rho$ is any state in $C$, then $F_k \rho F_k^\dagger = F_k P\rho P F_k^\dagger = 0$, and so this term is dropped from the
operator-sum expression for $\mathcal{E}(\rho)$. Since we only care about the behavior of $\mathcal{E}$ on states in
$C$, we can effectively ignore the cases where $d_k = 0$ and assume instead that all the $d_k$ are
positive.
By the Polar Decomposition (Theorem 2.1 of the Background Material), for each $1 \le k \le N$ there is a unitary $U_k \in L(H)$ such that
\[
F_k P = U_k\, |F_k P| = U_k \sqrt{P F_k^\dagger F_k P} = \sqrt{d_k}\, U_k P. \tag{104}
\]
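The polar decomposition itself is easy to compute from the SVD. A brief sketch (my own illustration, assuming numpy): if $A = W S V^\dagger$ is the SVD, then $A = U|A|$ with $U = WV^\dagger$ and $|A| = V S V^\dagger$.

```python
import numpy as np

# Illustration (my own): polar decomposition A = U |A|, with
# |A| = sqrt(A^dag A), computed from the SVD A = W diag(S) Vh.
rng = np.random.default_rng(1)
A = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))

W, S, Vh = np.linalg.svd(A)
U = W @ Vh                              # the unitary factor
absA = Vh.conj().T @ np.diag(S) @ Vh    # |A| = sqrt(A^dag A), positive semidefinite

assert np.allclose(U @ U.conj().T, np.eye(3))     # U is unitary
assert np.allclose(U @ absA, A)                   # A = U |A|
assert np.allclose(absA @ absA, A.conj().T @ A)   # |A|^2 = A^dag A
```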
$U_k$ rotates $C$ to the subspace $C_k$ that is the image of the projector $P_k$ defined as
\[
P_k := U_k P U_k^\dagger = \frac{F_k P U_k^\dagger}{\sqrt{d_k}}. \tag{105}
\]
The crucial fact that makes $\mathcal{E}$ recoverable is that these $C_k$ subspaces are mutually orthogonal:
\[
P_k P_\ell = P_k^\dagger P_\ell = \frac{U_k P F_k^\dagger F_\ell P U_\ell^\dagger}{\sqrt{d_k}\sqrt{d_\ell}} = \frac{U_k (d_k \delta_{k\ell} P) U_\ell^\dagger}{\sqrt{d_k}\sqrt{d_\ell}} = 0 \tag{106}
\]
if $k \ne \ell$. To help see what's going on, it's worth seeing what happens when $\mathcal{E}$ is applied
to some pure state $|\psi\rangle\langle\psi|$ with $|\psi\rangle \in C$. We have $|\psi\rangle = P|\psi\rangle$, and so by Equation (104) we
have
\[
\mathcal{E}(|\psi\rangle\langle\psi|) = \sum_{k=1}^N F_k |\psi\rangle\langle\psi| F_k^\dagger = \sum_k F_k P|\psi\rangle\langle\psi|P F_k^\dagger = \sum_k d_k\, U_k |\psi\rangle\langle\psi| U_k^\dagger = \sum_k d_k\, |\psi_k\rangle\langle\psi_k|,
\]
where $|\psi_k\rangle := U_k|\psi\rangle \in C_k$ for each $k$. So $\mathcal{E}(|\psi\rangle\langle\psi|)$ is a mixture of pure states $|\psi_k\rangle$ in
the various subspaces $C_k$. We can thus interpret $\mathcal{E}$ as mapping $|\psi\rangle$ to $|\psi_k\rangle \in C_k$ with
probability $d_k$. Since the $C_k$ are mutually orthogonal, the $|\psi_k\rangle$ are pairwise orthogonal.
To recover, we can first measure to which $C_k$ the state $\mathcal{E}(|\psi\rangle\langle\psi|)$ belongs. This projective
measurement projects to one of the states $|\psi_k\rangle = U_k|\psi\rangle$, where $k$ is the outcome of the
measurement. Then to correct the error, we simply apply $U_k^\dagger$ to get $U_k^\dagger |\psi_k\rangle = |\psi\rangle$.
Now we describe $\mathcal{R}$ formally. $\mathcal{R}$ consists of two stages: (1) measure the error syndrome
(i.e., which $C_k$?), and (2) apply the appropriate (unitary) correction $U_k^\dagger$. By
Equation (106), the projectors $P_1, \dots, P_N$ form a set of orthogonal projectors. If this is not
a complete set, i.e., if $\sum_{k=1}^N P_k \ne I$, then we add one more projector $P_{N+1} := I - \sum_{k=1}^N P_k$
to the set to make it complete. Otherwise, we set $P_{N+1} := 0$. The syndrome measurement
is then a projective measurement with the $P_k$. (If the outcome is $N+1$, which signifies
"none of the above," then we really don't know what to do, so we'll give up and define
$U_{N+1} := I$ for completeness. If the state being measured is the result of applying $\mathcal{E}$ to some
state in the code space $C$, however, then outcome $N+1$ will never actually occur.)
So we define, for any $\rho \in L(H)$,
\[
\mathcal{R}(\rho) := \sum_{k=1}^{N+1} U_k^\dagger P_k \rho P_k U_k.
\]
Thus $\mathcal{R}$ has Kraus operators $U_k^\dagger P_k$ for $1 \le k \le N+1$. We first check that $\mathcal{R}$ is complete:
\[
\sum_{k=1}^{N+1} P_k U_k U_k^\dagger P_k = \sum_{k=1}^{N+1} P_k = I.
\]
It remains to check that $\mathcal{R}$ successfully recovers from $\mathcal{E}$ for arbitrary states in $C$, not
just pure states. The following equation will make things easier: for all $1 \le k, \ell \le N$,
\[
U_k^\dagger P_k F_\ell P = U_k^\dagger P_k^\dagger F_\ell P = \frac{U_k^\dagger U_k P F_k^\dagger F_\ell P}{\sqrt{d_k}} = \frac{P F_k^\dagger F_\ell P}{\sqrt{d_k}} = \sqrt{d_k}\, \delta_{k\ell} P, \tag{107}
\]
using Equations (103) and (105). Also, for $1 \le \ell \le N$, we have $P_{N+1} P_\ell = 0$ by orthogonality,
and thus, using Equations (104) and (105),
\[
U_{N+1}^\dagger P_{N+1} F_\ell P = P_{N+1} F_\ell P = \sqrt{d_\ell}\, P_{N+1} U_\ell P = \sqrt{d_\ell}\, P_{N+1} P_\ell U_\ell = 0, \tag{108}
\]
and so Equation (107) holds for $k = N+1$ as well.
So finally, if $\rho$ is in $C$, we have, by Equations (107) and (108),
\[
\mathcal{R}(\mathcal{E}(\rho)) = \mathcal{R}(\mathcal{E}(P\rho P)) = \sum_{k=1}^{N+1} \sum_{\ell=1}^{N} U_k^\dagger P_k F_\ell P \rho P F_\ell^\dagger P_k U_k = \sum_{k,\ell} (U_k^\dagger P_k F_\ell P)\, \rho\, (U_k^\dagger P_k F_\ell P)^\dagger
\]
\[
= \sum_{k,\ell} \bigl(\sqrt{d_k}\, \delta_{k\ell} P\bigr)\, \rho\, \bigl(\sqrt{d_k}\, \delta_{k\ell} P\bigr) = \Bigl(\sum_k d_k\Bigr) P\rho P = (\operatorname{tr} M)\rho. \qquad \Box
\]
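The whole construction can be traced through numerically. The sketch below (my own illustration, not from the notes; it assumes numpy) builds $\mathcal{R}$ for the code space $C = \mathrm{span}\{|000\rangle, |111\rangle\}$ and the channel with Kraus operators $\sqrt{1-p}\,I$ and $\sqrt{p}\,X_1$, for which $M$ is already diagonal (so $F_k = E_k$), and checks that $\mathcal{R}(\mathcal{E}(\rho)) = (\operatorname{tr} M)\rho$ on the code space:

```python
import numpy as np

I2 = np.eye(2)
X = np.array([[0., 1.], [1., 0.]])
X1 = np.kron(X, np.kron(I2, I2))   # Pauli X on the first of three qubits
I8 = np.eye(8)

p = 0.25
F = [np.sqrt(1 - p) * I8, np.sqrt(p) * X1]  # Kraus operators (M already diagonal)
d = [1 - p, p]                              # the eigenvalues d_k
U = [I8, X1]                                # unitaries with F_k P = sqrt(d_k) U_k P

e000 = np.zeros(8); e000[0] = 1
e111 = np.zeros(8); e111[7] = 1
P = np.outer(e000, e000) + np.outer(e111, e111)   # projector onto C

# Subspace projectors P_k = U_k P U_k^dag, plus the leftover P_{N+1}.
Pk = [u @ P @ u.conj().T for u in U]
Pk.append(I8 - sum(Pk))
U.append(I8)                                # U_{N+1} := I

def channel(ops, rho):
    return sum(a @ rho @ a.conj().T for a in ops)

R_ops = [u.conj().T @ pk for u, pk in zip(U, Pk)]  # Kraus operators of R

# Completeness of R, and recovery on an arbitrary state in C.
assert np.allclose(sum(r.conj().T @ r for r in R_ops), I8)
psi = 0.6 * e000 + 0.8 * e111
rho = np.outer(psi, psi)
assert np.allclose(channel(R_ops, channel(F, rho)), sum(d) * rho)  # = (tr M) rho
```

Here $\operatorname{tr} M = (1-p) + p = 1$, so the state is recovered exactly.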
Exercise 27.3 (Challenging) Recall the quantum bit-flip channel for a single qubit:
\[
\mathcal{E}(\rho) := (1-p)\rho + pX\rho X.
\]
Also recall the recoverable portion of the three-qubit bit-flip channel:
\[
\mathcal{E}'(\rho) = (1-p)^3 \rho + (1-p)^2 p \sum_{j=1}^3 X_j \rho X_j.
\]
Show directly that $\mathcal{E}'$, given by the Kraus operators $(1-p)^{3/2} I$, $(1-p)\sqrt{p}\,X_1$, $(1-p)\sqrt{p}\,X_2$, and
$(1-p)\sqrt{p}\,X_3$, satisfies the peep condition (102) of Theorem 27.2, where $C$ is the usual majority-of-3
code space given by the projector $P = |000\rangle\langle 000| + |111\rangle\langle 111|$. What is the matrix $M$?
What are the $P_k$ and $U_k$? Is the $\mathcal{R}$ constructed by the Theorem the same as it was before?
Discretization of Errors. The great thing about the Shor code is that it can recover from
an arbitrary single-qubit error. There are many possible single-qubit errors, since there is a
continuum of possible one-qubit Kraus operators. Yet they are all corrected by the Shor
code, with no additional work. This happy fact follows from the following two general
theorems:
Theorem 27.4 Suppose $C \subseteq H$ is the code space for a quantum code, $P$ is the projector projecting
orthogonally onto $C$, $\mathcal{E} : L(H) \to L(H)$ is a not necessarily complete quantum error operation
with Kraus operators $F_1, \dots, F_N$, and $\mathcal{R} : L(H) \to L(H)$ is a quantum operation with Kraus
operators $R_1, \dots, R_M$ such that, for any $1 \le j \le N$ there exist scalars $d_j \ge 0$ such that
\[
R_k F_j P = \sqrt{d_j}\, \delta_{kj} P \tag{109}
\]
for any $1 \le k \le M$. Suppose also that $\mathcal{G}$ is an error operation whose Kraus operators $G_1, \dots, G_K$
are all linear combinations of $F_1, \dots, F_N$. Then $\mathcal{R}$ successfully recovers from $\mathcal{G}$.
Proof. For all $1 \le \ell \le K$ we have $G_\ell = \sum_{j=1}^N m_{\ell j} F_j$ for some scalars $m_{\ell j}$. Using (109), we
get
\[
R_k G_\ell P = \sum_{j=1}^N m_{\ell j}\, R_k F_j P = \sum_j m_{\ell j} \sqrt{d_j}\, \delta_{kj} P = m_{\ell k} \sqrt{d_k}\, P,
\]
where we set $d_k := 0$ if $N < k \le M$. Then for every state $\rho$ in $C$, we have
\[
\mathcal{R}(\mathcal{G}(\rho)) = \mathcal{R}(\mathcal{G}(P\rho P)) = \sum_{k=1}^M \sum_{\ell=1}^K (R_k G_\ell P)\, \rho\, (R_k G_\ell P)^\dagger = \sum_{k,\ell} |m_{\ell k}|^2 d_k\, P\rho P = c\rho,
\]
where $c := \sum_{k=1}^M \sum_{\ell=1}^K |m_{\ell k}|^2 d_k$. Thus $\mathcal{R}$ successfully recovers from $\mathcal{G}$ given code space $C$. $\Box$
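A concrete instance of this discretization phenomenon: the standard recovery for the 3-qubit bit-flip code, which satisfies (109) for the operators $I, X_1, X_2, X_3$, also corrects a continuous rotation error $e^{-i\theta X_1} = \cos\theta\, I - i\sin\theta\, X_1$, since that unitary is a linear combination of $I$ and $X_1$. A numerical sketch (my own illustration, not from the notes; it assumes numpy):

```python
import numpy as np

I2 = np.eye(2)
X = np.array([[0., 1.], [1., 0.]])

def on_qubit(op, k):          # apply a 1-qubit op to qubit k (0-based) of three
    ops = [I2, I2, I2]; ops[k] = op
    return np.kron(ops[0], np.kron(ops[1], ops[2]))

e000 = np.zeros(8); e000[0] = 1
e111 = np.zeros(8); e111[7] = 1
P = np.outer(e000, e000) + np.outer(e111, e111)   # projector onto the code space

# Syndrome projectors and corrections for the 3-qubit bit-flip code.
Pk = [P] + [on_qubit(X, k) @ P @ on_qubit(X, k) for k in range(3)]
Uk = [np.eye(8)] + [on_qubit(X, k) for k in range(3)]
R_ops = [u.conj().T @ pk for u, pk in zip(Uk, Pk)]

# A "continuous" error: a small unitary rotation on the first qubit, which is
# a linear combination of the correctable operators I and X_1.
theta = 0.3
G = np.cos(theta) * np.eye(8) - 1j * np.sin(theta) * on_qubit(X, 0)

psi = 0.6 * e000 + 0.8j * e111
rho = np.outer(psi, psi.conj())
out = sum(r @ (G @ rho @ G.conj().T) @ r.conj().T for r in R_ops)
assert np.allclose(out, rho)  # recovered exactly, though the error was no pure bit flip
```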
Theorem 27.5 Suppose $C \subseteq H$ is the code space for a quantum code, $P$ is the projector projecting
orthogonally onto $C$, $\mathcal{E} : L(H) \to L(H)$ is a not necessarily complete quantum error operation
with Kraus operators $E_1, \dots, E_N$ that satisfy the peep condition (102), i.e.,
\[
P E_i^\dagger E_j P = [M]_{ij} P
\]
for all $1 \le i, j \le N$, for some matrix $M$. Suppose also that $\mathcal{G}$ is an error operation whose Kraus
operators $G_1, \dots, G_K$ are all linear combinations of $E_1, \dots, E_N$. Then the operation $\mathcal{R}$ constructed
in the proof of Theorem 27.2 to recover from $\mathcal{E}$ also successfully recovers from $\mathcal{G}$, given code space
$C$.
Proof. In the proof of Theorem 27.2 above, we chose new Kraus operators $F_1, \dots, F_N$ for
$\mathcal{E}$, where $F_k := \sum_{j=1}^N [U]_{jk} E_j$ for all $1 \le k \le N$, where $U$ is an $N \times N$ unitary matrix that
diagonalizes $M$ so that there are real numbers $d_1, \dots, d_N \ge 0$ such that $P F_k^\dagger F_\ell P = d_k \delta_{k\ell} P$
for all $1 \le k, \ell \le N$. The $F_k$ are clearly linear combinations of the $E_j$, but the $E_j$ are also
linear combinations of the $F_k$; indeed, it is easily checked that $E_j = \sum_{k=1}^N \overline{[U]_{jk}}\, F_k$, using the
unitarity of $U$. Thus the $G_\ell$ are linear combinations of the $F_k$ as well. Recall that $\mathcal{R}$ has
Kraus operators $U_1^\dagger P_1, \dots, U_N^\dagger P_N, U_{N+1}^\dagger P_{N+1}$.

Setting $R_k := U_k^\dagger P_k$ for all $1 \le k \le N+1$, Equations (107) and (108) say that
\[
R_k F_j P = \sqrt{d_j}\, \delta_{kj} P
\]
for all $1 \le k \le N+1$ and all $1 \le j \le N$. This is exactly the discretization condition of
Equation (109) (with $M = N+1$). Therefore $\mathcal{G}$, the $R_k$, and the $F_j$ together satisfy the
hypotheses of Theorem 27.4, and so $\mathcal{R}$ successfully recovers from $\mathcal{G}$ by that theorem. $\Box$
We can apply either Theorem 27.4 or Theorem 27.5 to the Shor code to show that
Bob's recovery procedure can correct any single-qubit error. The key point is that the four
Pauli operators $I, X, Y, Z$ form a basis for the space of all single-qubit operators, and so a
single-qubit error operation has Kraus operators that are linear combinations of the Pauli
operators. Since Bob can recover from any error of the form $X_j$, $Y_j$, or $Z_j$, for $1 \le j \le 9$, in a
way that satisfies Theorem 27.4, he can recover from any linear combination of these; in
particular, any error on any one of the nine qubits.
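The fact that $I, X, Y, Z$ span all single-qubit operators is easy to verify: the Paulis are orthogonal under the trace inner product, so the coefficients of any $2 \times 2$ matrix can be read off directly. A quick sketch (my own illustration, assuming numpy):

```python
import numpy as np

# Illustration (my own): I, X, Y, Z form a basis for all 2x2 matrices, with
# coefficients recovered via the normalized trace inner product <S, A> = tr(S^dag A)/2.
I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.array([[1, 0], [0, -1]], dtype=complex)
paulis = [I2, X, Y, Z]

rng = np.random.default_rng(2)
A = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))  # arbitrary 1-qubit operator

coeffs = [np.trace(s.conj().T @ A) / 2 for s in paulis]
recon = sum(c * s for c, s in zip(coeffs, paulis))
assert np.allclose(recon, A)   # A = a0 I + a1 X + a2 Y + a3 Z
```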
Exercise 27.6 (Challenging) Show that Bob's recovery operation for the Shor code can
recover from any error on any one of the nine qubits. [Hint: By the preceding discussion,
it only remains to show that Bob's recovery procedure satisfies the discretization condition
of Equation (109) for the recoverable portion of the depolarizing channel given by
Equation (101).]
28 Week 14: Fault tolerance
Fault-Tolerant Quantum Computation. If a qubit is in an encoded state, such as with
the Shor code, then we can repeatedly apply an error-recovery operation to restore
the logic, i.e., the state of the logical qubit, assuming isolated errors in the physical
qubits. Depending on the implementation and frequency of the restore operations, we
can maintain a logical qubit state indefinitely with high probability. There is more to
a quantum computation, however, than simply maintaining qubits. We must apply
quantum gates to them. A not-so-good way to apply a quantum gate is to decode each
qubit involved in the gate, then apply the gate to the unencoded qubits, then re-encode
the qubits. This is bad because qubits spend time unencoded and subject to unrecoverable
errors, defeating the whole purpose of error correction. A better way is to keep all qubits in
an encoded state always, never decoding them, so that we prepare, work with, save, and
measure qubits in their encoded states only. This practice is called fault-tolerant quantum
computation, and it works by replacing each gate of a standard, non-fault-tolerant quantum
circuit with a quantum minicircuit that affects the state of the logical qubits in the same
way as the original gate.
With the Shor code as well as other quantum error-correcting codes, we can implement
several types of quantum gates fault-tolerantly. It can be shown that these codes can
implement a family of gates big enough to provide a basis for any feasible quantum
computation (a so-called universal family of gates). We will not give an exhaustive
treatment here, but will at least show how to implement the CNOT and Pauli gates
explicitly using the Shor code.
Figure 16 shows how to implement the CNOT gate fault-tolerantly using the Shor
code. Each logical qubit is implemented by nine physical qubits.

Exercise 28.1 Verify that the circuit in Figure 16 really implements the CNOT gate with
respect to the Shor code. That is, show that the circuit maps $|a_S\rangle|b_S\rangle$ to $|a_S\rangle|(a \oplus b)_S\rangle$ for
all $a, b \in \{0, 1\}$.
29 Week 14: Entanglement and Bell inequalities
A good philosophical discussion of the EPR paradox can be found online in the Stanford
Encyclopedia of Philosophy (http://plato.stanford.edu/entries/qt-epr/).
[Circuit diagram for Figure 16 appears here.]

Figure 16: Implementing the CNOT gate fault-tolerantly using the Shor code. The double
slashes on the left indicate that each line represents a multi-qubit register (nine qubits in
this case). The circuit maps $|a_S\rangle|b_S\rangle$ to $|a_S\rangle|(a \oplus b)_S\rangle$ for all $a, b \in \{0, 1\}$.
30 Final Exam

Do all problems. Hand in your answers at my office or in my mailbox by 5:00 pm on
Wednesday, May 9. The same ground rules as for the midterm apply here: any resources are
at your disposal except discussion with humans other than me about the exam.

All questions with Roman numerals carry equal weight, but may not be of equal
difficulty.
I) (A linear algebraic inequality) Suppose that $A$ is any operator. Show that $A \ge 0$ if
and only if $\operatorname{tr}(PA) \ge 0$ for all projectors $P$. EXTRA CREDIT: Show that $A \ge 0$ if
and only if $\operatorname{tr}(PA) \le \operatorname{tr} A$ for all projectors $P$. [Hint: The extra credit statement is a
corollary of the previous statement.]
II) (A circuit identity) Look at Bob's phase-error recovery circuit for the Shor code in
Figure 15. Show that the following alternative circuit does exactly the same thing:

[Circuit diagram appears here: two ancilla qubits prepared as $|0\rangle$, Hadamard gates,
controlled-$Z$ gates, and measurement outcomes $b_1$, $b_2$ controlling the correction.]

Find a similar alternative for Bob's phase-error recovery circuit in Figure 13.
III) (The square root of SWAP)

(a) Show that if $V$ is any unitary operator, then there exists a (not necessarily
unique) unitary $U$ such that $U^2 = V$. [Hint: All unitary operators are normal.]

(b) Find a two-qubit unitary $U$ such that $U^2 = \mathrm{SWAP}$. The $U$ that you find should
fix the vectors $|00\rangle$ and $|11\rangle$.

This $U$ is sometimes written as $\sqrt{\mathrm{SWAP}}$. It can be shown that $\sqrt{\mathrm{SWAP}}$, among many
other two-qubit gates, is (by itself) universal for quantum computation. Also, there
is currently some hope of implementing it flexibly using superconducting Josephson
junctions.
IV) (Generalized Pauli gates and the QFT) For $n > 0$, let $X_n$ and $Z_n$ be $n$-qubit unitary
operators such that, for all $x \in \mathbb{Z}_{2^n}$,
\[
X_n|x\rangle = |(x + 1) \bmod 2^n\rangle,
\]
\[
Z_n|x\rangle = e_n(x)|x\rangle,
\]
recalling that $e_n(x) := \exp(2\pi i x/2^n)$. $X_n$ and $Z_n$ are $n$-qubit generalizations of the
Pauli $X$ and $Z$ gates, respectively.

(a) What are $X_n Z_n X_n^\dagger$ and $Z_n X_n Z_n^\dagger$? (Just show how each behaves on $|x\rangle$ for
$x \in \mathbb{Z}_{2^n}$.)

(b) Draw an $n$-qubit quantum circuit that implements $Z_n$ using only single-qubit
conditional phase-shift gates $P(\theta)$ for various $\theta$.

(c) Show that $X_n$ and $Z_n$ are unitarily conjugate via $\mathrm{QFT}_n$.

(d) What are the eigenvalues and eigenvectors of $X_n$?
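The defining equations for $X_n$ and $Z_n$ can be checked numerically for small $n$. The sketch below (my own illustration, assuming numpy) builds both operators for $n = 2$ directly from the definitions:

```python
import numpy as np

# Illustration (my own): the generalized Pauli operators X_n and Z_n for n = 2,
# acting on basis states |x> for x in Z_{2^n}, per the definitions above.
n = 2
N = 2 ** n
Xn = np.zeros((N, N))
for x in range(N):
    Xn[(x + 1) % N, x] = 1                  # X_n |x> = |(x+1) mod 2^n>
Zn = np.diag([np.exp(2j * np.pi * x / N) for x in range(N)])  # Z_n |x> = e_n(x)|x>

ket = lambda x: np.eye(N)[x]
assert np.allclose(Xn @ ket(3), ket(0))          # wraps around mod 2^n
assert np.allclose(Zn @ ket(1), 1j * ket(1))     # e_2(1) = exp(pi i / 2) = i
assert np.allclose(Xn.conj().T @ Xn, np.eye(N))  # both are unitary
assert np.allclose(Zn.conj().T @ Zn, np.eye(N))
```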
V) (The Schmidt Decomposition) You may either take the following on faith or read a
proof of it in the textbook on page 109. (The Schmidt Decomposition is actually just
the Singular Value Decomposition (Background Material, Theorem 2.2) in disguise.)

Theorem 30.1 (Schmidt Decomposition) Let $H$ and $J$ be Hilbert spaces, and let $|\psi\rangle \in H \otimes J$ be any unit vector. There exists an integer $k > 0$, pairwise orthogonal unit vectors
$|e_1\rangle, \dots, |e_k\rangle \in H$ and $|f_1\rangle, \dots, |f_k\rangle \in J$, and positive values $\lambda_1 \ge \dots \ge \lambda_k > 0$ such that
$\sum_{j=1}^k \lambda_j^2 = 1$ and
\[
|\psi\rangle = \sum_{j=1}^k \lambda_j (|e_j\rangle \otimes |f_j\rangle). \tag{110}
\]
The vectors $|e_1\rangle, \dots, |e_k\rangle$ and $|f_1\rangle, \dots, |f_k\rangle$ are known collectively as a Schmidt basis
for $|\psi\rangle$, although they may not span their respective spaces. The $\lambda_j$ are called (the)
Schmidt coefficients for $|\psi\rangle$, and $k$ is called the Schmidt number of $|\psi\rangle$.
(a) Give full Schmidt decompositions for the Bell states $|\Phi^+\rangle := (|00\rangle + |11\rangle)/\sqrt{2}$
and $|\Phi^-\rangle := (|00\rangle - |11\rangle)/\sqrt{2}$.

(b) Suppose the $J$ part of $|\psi\rangle$ is measured via the projective measurement given by the
$k+1$ projectors $I_H \otimes |f_1\rangle\langle f_1|, \dots, I_H \otimes |f_k\rangle\langle f_k|$, and $I_H \otimes \bigl(I_J - \sum_{j=1}^k |f_j\rangle\langle f_j|\bigr)$, where $I_H$ and
$I_J$ are the identity operators in $L(H)$ and $L(J)$, respectively. The last projector
corresponds to the default "none of the above" outcome. In terms of the $\lambda_j$, what
is the probability of each of the $k+1$ outcomes? What is the post-measurement
state for each possible outcome?
(c) It is implicit in the book's discussion on page 109 that $k$, the $\lambda_j$, and the Schmidt
basis are unique, but they never come out and say it explicitly. Explain briefly
why $k$ and $\lambda_1, \dots, \lambda_k$ are uniquely determined by $|\psi\rangle$. [Hint: Consider the
density operator $|\psi\rangle\langle\psi|$ and trace out one of the spaces.]

(d) Show that the Schmidt basis is not necessarily uniquely determined by $|\psi\rangle$. Do this
by finding a Schmidt basis for $|\Phi^+\rangle$ that is different from the one you found
above. (Two Schmidt bases are considered the same if they are identical up to
reordering and phase factors.)
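As the parenthetical remark before Theorem 30.1 suggests, a Schmidt decomposition can be computed numerically with the SVD. A sketch (my own illustration, assuming numpy), for a random two-qubit state:

```python
import numpy as np

# Illustration (my own): Schmidt decomposition via SVD for a random unit
# vector |psi> in H (x) J with dim H = dim J = 2.
rng = np.random.default_rng(3)
psi = rng.normal(size=4) + 1j * rng.normal(size=4)
psi /= np.linalg.norm(psi)

# Reshape |psi> into its 2x2 coefficient matrix C, psi = sum_{a,b} C[a,b] |a>|b|>;
# then C = W diag(lam) Vh yields the Schmidt vectors and coefficients.
C = psi.reshape(2, 2)
W, lam, Vh = np.linalg.svd(C)

assert np.isclose(np.sum(lam ** 2), 1.0)    # squared coefficients sum to 1
recon = sum(lam[j] * np.kron(W[:, j], Vh[j, :]) for j in range(2))
assert np.allclose(recon, psi)              # |psi> = sum_j lam_j |e_j> (x) |f_j>
```

Here the columns of $W$ play the role of the $|e_j\rangle$ and the rows of $V^\dagger$ the role of the $|f_j\rangle$.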
VI) (Logical Pauli gates for the Shor code) Recall the nine-qubit Shor code defined by
Equations (99) and (100).

(a) Show that the operator $Z_1 Z_2 Z_3 Z_4 Z_5 Z_6 Z_7 Z_8 Z_9$ (i.e., a Pauli $Z$ gate applied to
each of the nine qubits) implements the logical Pauli $X$ gate $X_S$, such that
$X_S|0_S\rangle = |1_S\rangle$ and $X_S|1_S\rangle = |0_S\rangle$.

(b) Find an operator that implements the logical Pauli $Z$ gate $Z_S$, such that $Z_S|0_S\rangle = |0_S\rangle$ and $Z_S|1_S\rangle = -|1_S\rangle$.
Index
L_1 distance, 160
L_2 norm, 137
L_p norm, 137
S_3 parameterization, 50
π/8 gate, 67
f-gate, 80
(orthogonal) projection operator, 25
3-blocks, 176
absolute value, 9
adjoint, 9, 16
algebraically closed, 10
ancilla, 71
angular momentum, 31
antisymmetric state, 79
argument, 9
balanced, 82
BB84, 132
Bell basis, 75
Bell states, 75
bilinear, 59
binary digit, 34
binary error-correcting code, 168
binary strings, 168
binary symmetric channel, 168
bit, 34
bit matrices, 89
bit vectors, 89
Bloch sphere, 42
bounded, 28
bra vector, 20
bracket, 20
butterfly network, 99
CNOT, 66
characteristic polynomial, 51
Church-Turing thesis, 7
ciphertext, 131
classical gate, 66
clean, 71
cleartext, 131, 168
code space, 169, 180
codewords, 168
commutator, 163
complementary, 132
complete quantum operations, 158
complete set of orthogonal projectors, 27
completely positive, 154
complex conjugate, 9
computational basis, 64
concatenated code, 176
conditional phase-shift gates, 68
conjugate linear, 12
contractions, 145
contractive, 165
control, 66
controlled U gate, 67
controlled NOT, 66
convex linear combination, 139
coupled-systems representation, 145
dense coding, 78
density matrix, 39
density operator, 39
density operator formalism, 39
depolarizing channel, 175
diagonal, 51
dimension, 89
direct product, 59
discrete Fourier transform, 98
dual vector, 17
eigenbasis, 53
eigenspace, 54
eigenvalue, 51
eigenvalue distribution, 141
eigenvector, 51
elementary events, 159
entangled states, 61
environment, 145
EPR pairs, 76
EPR states, 76
error operation, 180
error syndrome, 170
Euclidean distance, 115
Euclidean norm, 137
Euler angles, 44
Euler totient function, 97
events, 159
Fast Fourier Transform, 99
fault-tolerant quantum computation, 186
fidelity, 160
fields, 89
finite geometric series, 98
full rank, 90
general quantum operation, 158
good, 105
Gram-Schmidt procedure, 17
Grover iterate, 124
Grover's quantum search algorithm, 123
Hadamard gate, 64
Hamming weight, 90
Hermitean, 18
Hermitean form, 12
Hermitean inner product, 12
Hilbert space, 12
Hilbert-Schmidt inner product, 19
Hilbert-Schmidt norm, 137
incomplete quantum operation, 153
incomplete state, 153
inversion f-gate, 82
invertible, 93
kernel, 90
ket vector, 20
key exchange, 131
Kolmogorov distance, 160
Kraus operators, 146, 151
Kronecker delta, 13
Kronecker product, 59
Las Vegas algorithm, 91
Lie bracket, 163
linear map, 15
local operations, 62
magnetic moment, 31
majority-of-3 code, 168
measurement, 29
measurement operators, 151
metric, 115
mixed state, 140
Monte Carlo algorithm, 91
mutually orthogonal, 24
mutually unbiased, 132
norm, 9, 12
normal, 13, 53
nullity, 90
observation, 29
one-time pad, 131
operator distance, 117
operator norm, 116
operator-sum representation, 145
oracle, 80
order, 93
orthogonal, 13
orthogonal complement, 27
orthogonal matrix, 46
orthogonal support, 166
orthonormal basis, 13
orthonormal set, 13
outer product, 59
partial state, 153
partial trace, 144
partial transpose, 155
Pauli spin matrices, 38
perfect secrecy, 131
permutation matrix, 67
perpendicular, 13
phase gate, 67
phase-flip channel, 173
physical system, 28
plaintext, 131, 168
polar decomposition, 118
positive, 55, 154
positive denite, 55
positive operator-valued measure, 138
positive semidenite, 55
possible outcomes, 29
POVM, 138
principal mth root of unity, 99
probability distribution, 159
probability distribution on Ω, 159
probability of S, 159
product basis, 60
projection operators, 25
projective measurement, 29
projector, 25
pure states, 140
quantum bit-flip channel, 169
quantum bits, 34
quantum circuit, 63
quantum Fourier transform, 98, 99
quantum gate, 63
quantum operation, 151
quantum operations, 145
quantum parallelism, 80
quantum query complexity, 126
quantum register, 63
quantum teleportation, 76
quantum Turing machine, 63
qubits, 34
query, 80
query answer, 80
rank, 90
ratio (of a geometric series), 98
recoverable, 180
reduced, 145
reversible, 70
row vector, 17
sample space, 159
Schatten p-norm, 137
Schmidt basis, 189
Schmidt coefficients, 189
Schmidt number, 189
Schur basis, 50
selfadjoint, 18
separable states, 61
Shor code, 176
simplest rational interpolant, 109
singlet state, 79
singular value decomposition, 118
singular values, 137
spectral decomposition, 54
spectrum, 51
spin-0 state, 79
spin-1 states, 79
SRI, 109
start state, 123
state, 28
state space, 28
Stern-Gerlach experiment, 32
strictly positive, 55
SWAP gate, 68
symmetric, 131
symmetric states, 79
target, 66
tensor product, 59, 60
tensor product states, 61
threshold theorem, 8
Toffoli gate, 69
trace, 11
trace distance, 160
trace gates, 171
trace norm, 137
tracing out, 144
triangle inequality, 9, 106, 115
triplet states, 79
unit, 93
unit circle, 9
unit vector, 13
unitarily conjugate, 21
unitary, 18
unitary error, 119
upper triangular, 50
zero vector, 10