
COMSM0214

QUANTUM COMPUTATION

LECTURE NOTES

(1 October 2006 version)

Richard Jozsa

Department of Computer Science

Merchant Venturer’s Building Room 3.22

Useful references:

• John Preskill’s notes for the Caltech course on quantum computation. Available at http://theory.caltech.edu/people/preskill/ph229/

• N. David Mermin’s notes for the Cornell course on quantum computation. Available at http://people.ccmr.cornell.edu/~mermin/qcomp/CS483.html

• S. Loepp and W. Wootters, “Protecting information: from classical error correction to quantum cryptography” (chapters 2, 7). CUP 2006.

• M. Nielsen and I. Chuang, “Quantum computation and quantum information” (chapters 1-6). CUP.

Further books on quantum theory:

• R. I. G. Hughes, “The structure and interpretation of quantum theory”. Harvard University Press (paperback 1992).

• C. J. Isham, “Lectures on quantum theory: mathematical and structural foundations”. Imperial College Press (paperback 1995).

Assessment:

Exam 70%, coursework 30%.


“Joy in looking and comprehending is nature’s most beautiful gift.”

Aphorism by Albert Einstein, 1953.


CONTENTS

1 Introduction
2 Prologue
2.1 Bit strings and vectors
2.2 Computational steps as linear operations
2.3 Complex numbers – summary
3 Qubits and quantum states
3.1 Two-qubit states
3.2 Multi-qubit states
3.3 Lengths and inner products
4 Physical operations on qubits
4.1 Operations on 1 or 2 qubits in n qubits
5 Quantum measurements
5.1 Extended Born rule
6 Quantum interference
7 Quantum non-locality
8 Quantum teleportation
9 Quantum computation – circuit model
9.1 Reversible gate for any Boolean function
9.2 Time complexity – P, BPP, BQP
9.3 Query complexity and promise problems
10 Computation by quantum parallelism, Deutsch algorithm
10.1 Computation by quantum parallelism
10.2 Deutsch algorithm
10.3 DJ algorithm
11 Quantum Fourier transform and periodicities
11.1 QFT mod N
11.2 Periodicity determination
11.3 Efficient implementation of QFT
12 Shor’s quantum factoring algorithm
12.1 Factoring as a periodicity problem
12.2 Computing the period r of $f(k) = a^k \bmod N$
12.3 Getting r from a good c value
12.4 Assessing the complexity of Shor’s algorithm
13 Quantum algorithms for search problems
13.1 Reflections and projections on Dirac notation
13.2 Grover’s quantum searching algorithm
13.3 The iteration operator Q – reflections and rotations
13.4 Some further features of Grover’s algorithm

1 Introduction - what is quantum computation?

Quantum physics differs dramatically from classical physics in its representation of the physical world and the kinds of processes that are allowed by the physical laws. Quantum computation is the study of the possible applications and exploitation of these novel quantum effects in issues of computation, complexity and communication. This subject is a fascinating hybrid of theoretical computer science and quantum physics. It emerged in the mid-1980s and it is currently one of the most active areas of all scientific research internationally. It is a highly significant area of study for a variety of reasons which we collect into three categories.

Fundamental issues: our first category is well summed up by a quote from the physicist Richard Feynman: “Because nature isn’t classical, dammit...”. The issue here is a deep fundamental connection between physics and computation. We ask: what is computation really? It is the “processing” of “information”. But what is “information”? It is always represented in physical degrees of freedom of a physical system (voltage levels, positions of switches etc.) – a computer is always a physical device. Bit values 0 and 1 are just two distinguishable states of some physical system. What is processing? It is physical evolution of the system. Hence the possibilities and limitations of information storage and computation, and how efficiently a computation can be carried out, must all depend on the laws of physics, which characterise the allowable kinds of evolution. They cannot be derived from thought or mathematics alone. In computer science we study computation by first setting up a theoretical computational model such as the Turing machine (TM). It turns out that such “standard” models capture the computational power of classical physics (from which they are, after all, intuitively motivated). But if we start from quantum physics we are led to rather different models which can provide remarkable new modes of computation that are not available in the formalism of standard TMs. In view of Feynman’s quote above, these new models are not merely unrealistic abstract constructs (such as the notion of a non-deterministic TM) but are actually available for implementation in the real world of computer technology, leading us to our second category of reasons for quantum computation:

Technological issues: according to Moore’s law, since 1965 there has been a steady rate of miniaturisation of computer components, by approximately a factor of 4 every 3.5 years. If this trend continues we will reach the subatomic scale by 2015. At this scale classical physics fails completely and quantum effects are dominant – components begin to malfunction in bizarre ways. We could either aim to re-design our components to stamp out the new effects and provide the same functions as before, or else we could embrace the new quantum effects, aiming to exploit them in new kinds of computational functionalities. Our next category of reasons shows that the latter is the way to go!

Theoretical issues: as already mentioned, the mathematical formalism of quantum theory leads to new “non-classical” modes of computation providing remarkable new possibilities. One of the most significant issues in computational complexity theory is the question of the existence of a polynomial time algorithm for a given computational task. In some cases the new quantum computational possibilities are able to bridge this barrier, as we’ll see later in this course. The most famous example is the computational task of integer factorisation. In classical computation there is no known algorithm that runs in polynomial time (in the number of digits), but in 1994 Peter Shor discovered a polynomial time quantum algorithm for factorisation.

The main focus of this course will be to introduce the basic formalism of quantum theory (as far as required for our purposes, and assuming no prior contact). Then we will be able to give a precise meaning to the notion of “quantum computation” and discuss a variety of quantum algorithms (including Shor’s algorithm) to illustrate the power of quantum versus classical computation.

It is issues of computational complexity rather than computability itself that are at the heart of the benefits of quantum versus classical computation: it will be clear from our notion of quantum computation that a quantum computer cannot compute anything that is classically uncomputable (such as deciding the famous halting problem for Turing machines). However we will see examples in which quantum computation can solve computational tasks exponentially faster (i.e. with exponentially fewer steps) than any (known) classical algorithm for the task. This is achieved not by an increase in the clock speed of steps but by exploiting entirely new (quantum) kinds of computational steps that are not available to classical computers but are allowed by quantum physics.

Quantum physics also has remarkable implications for issues of communication, such as the so-called process of quantum teleportation (which we’ll also discuss), and a variety of important cryptographic issues (which we won’t treat in this course), such as the ability to implement provably secure communication.

Quantum computation, being based on a “real” physical theory, is intended to be a realisable technology. To date most of the individual ingredients of quantum algorithms have been demonstrated successfully experimentally, but the construction of a “scalable” quantum computing device to carry out large computations remains well beyond the perceived limits of the current state of the art in quantum technology and laboratory experimentation.

So, when will we have a working quantum computer? Breakthroughs in technology are often sudden and unpredictable, and we quote the physicist N. D. Mermin: “Only a rash person would declare that there will be no useful quantum computers by 2050, but only a rash person would predict that there will be”.

2 Prologue - a curious way to represent classical computation

We begin our approach to the formalism of quantum theory by representing familiar classical computation in an unusual notational way. This approach is intended to make the transition from classical to quantum concepts more transparent.

Conventionally, classical information is represented as a string of bits such as 11001011. A classical computation processes this information in discrete steps by updating it into other such strings. These updates are carried out in a “local” manner: in each step only a few contiguous bits are changed by the application of a so-called Boolean gate. We focus on this Boolean circuit or gate array model of classical computation (in contrast to, say, the Turing machine model) because it allows the simplest generalisation to the notion of quantum computation.

2.1 Bit strings and vectors

Each bit has two values or “states”, viz. 0 or 1, which we’ll write using a curiously asymmetrical bracket notation as $|0\rangle$ and $|1\rangle$ respectively. This is the so-called Dirac “ket” notation that features widely in quantum theory. Furthermore we will take these distinct objects (actually distinguishable states of some classical physical system) to be orthogonal unit vectors in a 2-dimensional space! More general vectors such as $a|0\rangle + b|1\rangle$ (where a and b are numbers) have no significance for classical computation but they will later become very important in the quantum context.

In elementary mathematics vectors are often denoted as underlined symbols such as $\underline{a}$ or $\underline{v}$ etc. In ket notation these become $|a\rangle$ and $|v\rangle$ etc.

Having introduced these orthogonal vectors to represent bits, we can now develop the basic elements of classical computation in terms of the linear algebra of vectors rather than in terms of Boolean operations on the bit values 0, 1 directly. To do this we will need some mathematics of linear algebra, which we will introduce in situ along the way as “linear algebra inserts”. From the viewpoint of just classical computation this linear algebra formulation appears very strange and even perversely pointless! But its significance will soon become apparent when we begin to consider the structure of quantum theory. At present your task is to become familiar with the relevant linear algebra that we develop.

Linear algebra insert 1 (vector basis and components):

A basis for a vector space is a set of vectors $\{|e_1\rangle, \ldots, |e_n\rangle\}$ such that any vector $|v\rangle$ can be written uniquely as a linear combination $|v\rangle = a_1|e_1\rangle + \ldots + a_n|e_n\rangle$. For our applications the coefficients $a_1, \ldots, a_n$ (the components of the vector in the basis) will be complex numbers. The dimension of a vector space is the number of elements in a basis. (This can be shown to be the same for any choice of basis.) Once we have chosen a basis, any vector $|v\rangle$ can be represented as the column vector of its components:

$$|v\rangle = \begin{pmatrix} a_1 \\ \vdots \\ a_n \end{pmatrix}.$$

Strictly speaking this is not an equality, only a representation, but we will abuse notation and frequently write an equality symbol here.

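Example 1 The set $\{|0\rangle, |1\rangle\}$ is a basis for a 2-dimensional space. If $|v\rangle = a|0\rangle + b|1\rangle$ then we write

$$|0\rangle = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \qquad |1\rangle = \begin{pmatrix} 0 \\ 1 \end{pmatrix}, \qquad |v\rangle = \begin{pmatrix} a \\ b \end{pmatrix}.$$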

For two bits we introduce four orthogonal unit basis vectors

$$|00\rangle, \quad |01\rangle, \quad |10\rangle, \quad |11\rangle$$

which we also write as

$$|0\rangle|0\rangle, \quad |0\rangle|1\rangle, \quad |1\rangle|0\rangle, \quad |1\rangle|1\rangle$$

suggesting a kind of “multiplication” of “single bit vectors”. Note that this multiplication is not commutative – order matters! e.g. $|0\rangle|1\rangle$ is different from $|1\rangle|0\rangle$ etc. Mathematically this will be the so-called tensor product of vectors (see insert 2), sometimes denoted with a $\otimes$ symbol, i.e. we write $|0\rangle|0\rangle$ etc. as $|0\rangle \otimes |0\rangle$ etc. if we wish to make the product operation explicit. A general vector in this 4-dimensional space associated to 2-bit strings is

$$|v\rangle = a|0\rangle|0\rangle + b|0\rangle|1\rangle + c|1\rangle|0\rangle + d|1\rangle|1\rangle$$

where a, b, c, d are complex numbers. Similarly for n-bit strings we get a space of dimension $2^n$ with a basis vector $|b_1 \ldots b_n\rangle = |b_1\rangle \ldots |b_n\rangle$ associated to each n-bit string $b_1 \ldots b_n$.

Linear algebra insert 2 (tensor products of vectors):

Let V be a vector space of dimension m with basis $\{|e_1\rangle, \ldots, |e_m\rangle\}$ and let W be a vector space of dimension n with basis $\{|e'_1\rangle, \ldots, |e'_n\rangle\}$. The tensor product of V and W, denoted $V \otimes W$, is a vector space of dimension mn with a basis of mn vectors written formally as $\{|e_i\rangle|e'_j\rangle : i = 1, \ldots, m,\ j = 1, \ldots, n\}$. We also sometimes write $|e_i\rangle|e'_j\rangle$ as $|e_i\rangle \otimes |e'_j\rangle$.

A general vector $|u\rangle$ in $V \otimes W$ can be written

$$|u\rangle = \sum_{i=1}^{m} \sum_{j=1}^{n} c_{ij}\, |e_i\rangle|e'_j\rangle \tag{1}$$

(where the coefficients $c_{ij}$ are complex numbers). If $|v\rangle = \sum_i a_i |e_i\rangle$ and $|w\rangle = \sum_j b_j |e'_j\rangle$ are vectors in V and W respectively, then their tensor product, denoted $|v\rangle|w\rangle$ or $|v\rangle \otimes |w\rangle$, lies in $V \otimes W$ and is defined by

$$|v\rangle|w\rangle = \Big(\sum_i a_i |e_i\rangle\Big) \otimes \Big(\sum_j b_j |e'_j\rangle\Big) = \sum_i \sum_j a_i b_j\, |e_i\rangle|e'_j\rangle,$$

i.e. the components $c_{ij}$ in eq. (1) for such products of vectors from V and W have the special product form $c_{ij} = a_i b_j$. But not all vectors in $V \otimes W$ can be “factorised” in this way! We will say more about this important point later.

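Example 2 Let V, with basis $\{|0\rangle, |1\rangle\}$, be the 2-dimensional vector space associated to a single bit. Then $V \otimes V$ contains vectors of the form $a|0\rangle|0\rangle + b|0\rangle|1\rangle + c|1\rangle|0\rangle + d|1\rangle|1\rangle$, which we write in components as the column vector

$$\begin{pmatrix} a \\ b \\ c \\ d \end{pmatrix}.$$

Thus $V \otimes V$ is the vector space associated to 2-bit strings. If we have two vectors in V:

$$|u\rangle = x_0|0\rangle + x_1|1\rangle = \begin{pmatrix} x_0 \\ x_1 \end{pmatrix}, \qquad |v\rangle = y_0|0\rangle + y_1|1\rangle = \begin{pmatrix} y_0 \\ y_1 \end{pmatrix}$$

then

$$|u\rangle|v\rangle = (x_0|0\rangle + x_1|1\rangle) \otimes (y_0|0\rangle + y_1|1\rangle) = x_0y_0|0\rangle|0\rangle + x_0y_1|0\rangle|1\rangle + x_1y_0|1\rangle|0\rangle + x_1y_1|1\rangle|1\rangle$$

and in components we write

$$\begin{pmatrix} x_0 \\ x_1 \end{pmatrix} \otimes \begin{pmatrix} y_0 \\ y_1 \end{pmatrix} = \begin{pmatrix} x_0y_0 \\ x_0y_1 \\ x_1y_0 \\ x_1y_1 \end{pmatrix}.$$

Similarly for three vectors we get 8 components

$$\begin{pmatrix} x_0 \\ x_1 \end{pmatrix} \otimes \begin{pmatrix} y_0 \\ y_1 \end{pmatrix} \otimes \begin{pmatrix} z_0 \\ z_1 \end{pmatrix} = \begin{pmatrix} x_0y_0z_0 \\ \vdots \\ x_1y_1z_1 \end{pmatrix}$$

and, for example, the basis state $|101\rangle = |1\rangle|0\rangle|1\rangle$ (the sixth in our basis list) has

$$\begin{pmatrix} 0 \\ 1 \end{pmatrix} \otimes \begin{pmatrix} 1 \\ 0 \end{pmatrix} \otimes \begin{pmatrix} 0 \\ 1 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 1 \\ 0 \\ 0 \end{pmatrix}.$$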
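As a small check of the component ordering in Example 2, here is a minimal Python sketch. Plain lists stand in for column vectors, and the helper name `kron_vec` is our own, not from any library. It computes tensor products of vectors and confirms that $|101\rangle$ comes out as the sixth basis vector.

```python
def kron_vec(u, v):
    """Tensor product of two column vectors stored as Python lists.

    The component for the pair (i, j) lands at index i * len(v) + j, which
    reproduces the ordering 00, 01, 10, 11, ... used in the text.
    """
    return [ui * vj for ui in u for vj in v]


ket0 = [1, 0]
ket1 = [0, 1]

# |1>|0>|1> should be the sixth standard basis vector of the 8-dimensional space.
ket101 = kron_vec(kron_vec(ket1, ket0), ket1)
print(ket101)                        # [0, 0, 0, 0, 0, 1, 0, 0]

# Two general single-bit vectors give components of the product form x_i * y_j.
x0, x1, y0, y1 = 2, 3, 5, 7
print(kron_vec([x0, x1], [y0, y1]))  # [10, 14, 15, 21]
```

Because the outer loop runs over the first factor, the first bit is the most significant one, which matches reading $b_1 b_2 b_3$ as a binary number.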

2.2 Computational steps as linear operations on vectors

For updating a bit string in a computational step we will consider reversible operations on n-bit strings (where n is typically 1, 2 or 3). These are mappings from n-bit strings to n-bit strings such that the input string can be uniquely determined from the output string, i.e. the mapping is one-to-one, so it must be a permutation of the set of all bit string values. In conventional classical computing not all common operations are reversible, e.g. the 2-bit operation which updates $(b_1, b_2)$ to $(b_1, b_1 \text{ AND } b_2)$ is not reversible, as 00 and 01 are both mapped to 00. In quantum computation later, reversible operations will play a fundamental role, which is why we focus on them here. Also it is known that universal classical computation can be performed (without any significant loss of efficiency) if we allow only reversible operations (see Preskill’s notes for more details about this point).
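The reversibility criterion above (one-to-one on the finite set of n-bit strings) can be checked by brute force. This is a minimal sketch with our own hypothetical helper names; bit strings are represented as tuples of 0/1.

```python
from itertools import product


def is_reversible(op, n):
    """True if op (a map on n-bit tuples) is one-to-one, i.e. a permutation."""
    inputs = list(product((0, 1), repeat=n))
    outputs = {op(bits) for bits in inputs}
    # op is one-to-one exactly when no two inputs share an output.
    return len(outputs) == len(inputs)


def and_update(bits):
    b1, b2 = bits
    return (b1, b1 & b2)    # (b1, b2) -> (b1, b1 AND b2)


def xor_update(bits):
    b1, b2 = bits
    return (b1, b1 ^ b2)    # (b1, b2) -> (b1, b1 XOR b2)


print(is_reversible(and_update, 2))  # False: 00 and 01 both map to 00
print(is_reversible(xor_update, 2))  # True
```

Replacing AND by XOR (addition modulo 2) keeps the first bit as a record and makes the update invertible.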

For a single bit there are only two reversible operations: the identity operation I, which “does nothing”, $I|0\rangle = |0\rangle$, $I|1\rangle = |1\rangle$, and the NOT operation, or bit flip operation, which we write as X:

$$X|0\rangle = |1\rangle, \qquad X|1\rangle = |0\rangle.$$

We can extend the action of X to general vectors “by linearity”, i.e. by allowing it to “act freely across sums”:

$$X(a|0\rangle + b|1\rangle) = aX|0\rangle + bX|1\rangle = a|1\rangle + b|0\rangle. \tag{2}$$

The general notion of linear operation is described in the insert below.

Note that according to eq. (2) the action of X on general vectors is completely determined by its action on the basis states alone. In terms of components the action of X is given by matrix multiplication:

$$X\begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}\begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} b \\ a \end{pmatrix}.$$

Thus X has a matrix representation

$$X = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}.$$

(As for vectors and components, strictly speaking this is not an equality but rather a representation. However we will frequently abuse notation and write equality here.) Similarly we have the matrix representation:

$$I = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}.$$

Linear algebra insert 3 (linear operations):

A linear operation L on an n-dimensional vector space V is a map $L : V \to V$ such that

$$L(a_1 v_1 + a_2 v_2) = a_1 L(v_1) + a_2 L(v_2)$$

holds for any vectors $v_1, v_2$ and (complex) numbers $a_1, a_2$. If we represent vectors in terms of components then any such L is described as an $n \times n$ matrix, which we denote by $[L]$ (or just L when the context is clear). The column vector of the result $L(v)$ is given by the matrix multiplication of $[L]$ into the column vector of v. Given the operation L, the entries of the matrix $[L]$ are constructed as follows: let $e_1, \ldots, e_n$ be the basis of V. Then the $i$th column of the matrix $[L]$ is the column vector of components of $L(e_i)$, i.e. the action of L on the $i$th basis vector.
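The column-by-column recipe of insert 3 translates directly into code. In this minimal sketch (helper names are our own; matrices are lists of rows), a linear map is supplied as a Python function on component lists, and its matrix is assembled from its action on the basis vectors. Applied to the bit flip of eq. (2) it recovers the matrix of X.

```python
def matrix_from_action(op, n):
    """Matrix (list of rows) of a linear map, built column by column.

    op takes a length-n list (a column vector) and returns a length-n list.
    Column i of the result is op applied to the i-th basis vector e_i.
    """
    columns = []
    for i in range(n):
        e_i = [1 if k == i else 0 for k in range(n)]
        columns.append(op(e_i))
    # Transpose the list of columns into a list of rows.
    return [[columns[j][r] for j in range(n)] for r in range(n)]


def mat_vec(m, v):
    """Multiply the matrix m (list of rows) into the column vector v."""
    return [sum(m_rk * v_k for m_rk, v_k in zip(row, v)) for row in m]


def bit_flip(v):
    a, b = v
    return [b, a]           # X(a|0> + b|1>) = b|0> + a|1>, as in eq. (2)


X = matrix_from_action(bit_flip, 2)
print(X)                    # [[0, 1], [1, 0]]
print(mat_vec(X, [5, 7]))   # [7, 5]
```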

For 2-bit strings we have more reversible operations (indeed there are 4! = 24 permutations of the four values), which now have matrix representations as $4 \times 4$ matrices. We consider a few important examples.

The swap operation S maps $|b_1\rangle|b_2\rangle$ to $|b_2\rangle|b_1\rangle$, i.e. 00 and 11 are left unchanged but 01 and 10 are interchanged. You should verify that the matrix of S is

$$S = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}.$$

The controlled-NOT operation, or $\mathrm{CNOT}_{12}$ operation, acting on two bits $b_1 b_2$ is defined as follows: if $b_1 = 0$ then $b_1 b_2$ is left unchanged, but if $b_1 = 1$ then $b_2$ is flipped (i.e. X is applied to $b_2$). Thus the action of X on $b_2$ is “controlled by” the value of $b_1$. Explicitly we have:

$$\mathrm{CNOT}_{12}: \quad 00 \to 00, \quad 01 \to 01, \quad 10 \to 11, \quad 11 \to 10,$$

i.e. the last two strings are interchanged. We can also write

$$\mathrm{CNOT}_{12}|b_1\rangle|b_2\rangle = |b_1\rangle|b_1 \oplus b_2\rangle$$

where $\oplus$ denotes addition modulo 2. Note that the two bits play asymmetrical roles. The first bit is called the control bit and the second is called the target bit. We write $\mathrm{CNOT}_{12}$ with subscripts to make the asymmetry explicit. For example, we could reverse the roles of the two bits to get $\mathrm{CNOT}_{21}$, acting as follows:

$$\mathrm{CNOT}_{21}: \quad 00 \to 00, \quad 01 \to 11, \quad 10 \to 10, \quad 11 \to 01,$$

in which the second bit is now the control. You can verify that the matrix of $\mathrm{CNOT}_{12}$ is

$$\mathrm{CNOT}_{12} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix}$$

(which has I in the top left corner and X in the bottom right corner).
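The “you should verify” exercises for S, $\mathrm{CNOT}_{12}$ and $\mathrm{CNOT}_{21}$ can be mechanised. This minimal sketch (our own hypothetical helper `perm_matrix`) turns a reversible map on bit tuples into its permutation matrix, using the rule of insert 3 that column j holds the image of the j-th basis vector.

```python
from itertools import product


def perm_matrix(bit_map, n):
    """Matrix of a reversible n-bit operation in the basis |0...0>, ..., |1...1>.

    The string b_1...b_n has index int("b_1...b_n", 2). Following insert 3,
    column j is the basis vector of the image of the j-th string.
    """
    dim = 2 ** n
    m = [[0] * dim for _ in range(dim)]
    for j, bits in enumerate(product((0, 1), repeat=n)):
        i = int("".join(map(str, bit_map(bits))), 2)
        m[i][j] = 1
    return m


def swap(b):
    return (b[1], b[0])


def cnot12(b):
    return (b[0], b[0] ^ b[1])   # first bit is the control


def cnot21(b):
    return (b[0] ^ b[1], b[1])   # second bit is the control


for row in perm_matrix(cnot12, 2):
    print(row)
# [1, 0, 0, 0]
# [0, 1, 0, 0]
# [0, 0, 0, 1]
# [0, 0, 1, 0]
```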

Further examples of 2-bit operations are given by applying 1-bit operations to individual bits of a 2-bit string. This leads to the notion of tensor products of operations.

Suppose we apply X to the first bit of $b_1 b_2$. This operation is denoted as $X \otimes I$ (to indicate also that $b_2$ is left unaffected) and we have:

$$X \otimes I: \quad 00 \to 10, \quad 01 \to 11, \quad 10 \to 00, \quad 11 \to 01.$$

Hence its matrix is

$$X \otimes I = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{pmatrix}$$

(as we easily check by looking at the columns). Note that this matrix has a special structure determined by X and I: look at the $2 \times 2$ blocks in the four corners. Letting 0 denote the $2 \times 2$ matrix with all zero entries, we have

$$\begin{pmatrix} 0 & I \\ I & 0 \end{pmatrix},$$

i.e. a pattern of I’s determined by the entries of X. This matrix for $X \otimes I$ is constructed by starting with the matrix of X and replacing each entry with “the $2 \times 2$ matrix I multiplied by the numerical value of that entry”.

The same prescription holds true for general 1-bit operations A and B acting on the first and second bits respectively to give the 2-bit operation denoted $A \otimes B$. To construct its matrix, start with the matrix of A and replace each entry with that entry multiplied by the matrix of B. More examples will be given later in §4.1. For now you should verify that this prescription gives

$$I \otimes X = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix}$$

corresponding to doing X on the second bit, and

$$X \otimes X = \begin{pmatrix} 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \end{pmatrix}$$

corresponding to doing X on both bits. (In each case look at the structure of the $2 \times 2$ blocks.)
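The block prescription for $A \otimes B$ (the Kronecker product) can be written in a few lines. This is a minimal sketch with matrices as lists of rows; the helper name `kron` is our own. Entry $(i, j)$ of $A \otimes B$ comes from entry $(\lfloor i/p \rfloor, \lfloor j/q \rfloor)$ of A times entry $(i \bmod p, j \bmod q)$ of B, where B is $p \times q$.

```python
def kron(A, B):
    """Kronecker (tensor) product of matrices stored as lists of rows.

    Each entry A[i][j] is replaced by the block A[i][j] * B, which is the
    prescription described in the text.
    """
    p, q = len(B), len(B[0])
    return [[A[i // p][j // q] * B[i % p][j % q]
             for j in range(len(A[0]) * q)]
            for i in range(len(A) * p)]


I = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]

for row in kron(X, I):
    print(row)
# [0, 0, 1, 0]
# [0, 0, 0, 1]
# [1, 0, 0, 0]
# [0, 1, 0, 0]
```

Swapping the arguments to `kron(I, X)` or using `kron(X, X)` reproduces the other two matrices above.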

So far our examples of operations have been restricted to those that “make sense” on classical bits, i.e. they map bit values to bit values, albeit represented curiously as vectors! However we can introduce more general linear operations on vectors that do not preserve the actual classical bit representations $|0\rangle$ and $|1\rangle$. Such operations will later play a fundamental role in the quantum formalism. Two of the most important are the 1-bit operations:

$$Z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}, \qquad H = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}. \tag{3}$$

H is called the Hadamard operation. Note that $Z|1\rangle = -|1\rangle$, which makes no sense in the context of bit values but makes perfectly good sense in the context of vectors! Even more peculiarly we have

$$H|0\rangle = \frac{1}{\sqrt{2}}(|0\rangle + |1\rangle), \qquad H|1\rangle = \frac{1}{\sqrt{2}}(|0\rangle - |1\rangle). \tag{4}$$

Example 3 Although these more general operations themselves make no sense for bit values, they can serve to reveal new relationships between operations that do make sense. For example, we saw that $\mathrm{CNOT}_{12}$ and $\mathrm{CNOT}_{21}$ were different (and classically “valid”) 2-bit operations. We can now verify that

$$\mathrm{CNOT}_{21} = (H \otimes H)\,\mathrm{CNOT}_{12}\,(H \otimes H),$$

i.e. we can reverse the control/target roles of the two bits by applying H to each bit vector both before and after the CNOT action. We can check the validity of this relation on each basis state. For example, starting with $|b_1\rangle|b_2\rangle = |0\rangle|1\rangle$, application of $H \otimes H$ gives

$$\frac{1}{\sqrt{2}}(|0\rangle + |1\rangle) \otimes \frac{1}{\sqrt{2}}(|0\rangle - |1\rangle) = \frac{1}{2}(|00\rangle - |01\rangle + |10\rangle - |11\rangle).$$

Then applying $\mathrm{CNOT}_{12}$ gives

$$\frac{1}{2}(|00\rangle - |01\rangle + |11\rangle - |10\rangle). \tag{5}$$

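Applying $H \otimes H$ again and laboriously collecting and cancelling terms, we get $|1\rangle|1\rangle$, which is indeed $\mathrm{CNOT}_{21}|0\rangle|1\rangle$, as required. An easier way to do the last step above is to note the factorisation of eq. (5) (looking at the first two and last two terms):

$$\frac{1}{2}(|00\rangle - |01\rangle + |11\rangle - |10\rangle) = \frac{1}{2}\big(|0\rangle(|0\rangle - |1\rangle) - |1\rangle(|0\rangle - |1\rangle)\big) = \frac{1}{\sqrt{2}}(|0\rangle - |1\rangle)\,\frac{1}{\sqrt{2}}(|0\rangle - |1\rangle). \tag{6}$$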

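Next note that the Hadamard operation is self-inverse, $H = H^{-1}$, i.e. $HH = I$, since

$$HH = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}\frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}.$$

Hence from eq. (4) we get

$$H\Big(\frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)\Big) = |0\rangle, \qquad H\Big(\frac{1}{\sqrt{2}}(|0\rangle - |1\rangle)\Big) = |1\rangle,$$

so $H \otimes H$ applied to eq. (6) gives $|1\rangle|1\rangle$.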
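Instead of checking the identity of Example 3 basis state by basis state, one can multiply the $4 \times 4$ matrices directly. This minimal sketch uses plain lists of rows and our own helpers `matmul`, `kron` and `close`; a tolerance is needed only because $1/\sqrt{2}$ is a floating-point number.

```python
import math


def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def kron(A, B):
    """Kronecker product: replace each entry A[i][j] by the block A[i][j] * B."""
    p, q = len(B), len(B[0])
    return [[A[i // p][j // q] * B[i % p][j % q] for j in range(len(A[0]) * q)]
            for i in range(len(A) * p)]


def close(A, B, tol=1e-12):
    """Entrywise comparison, allowing for rounding in 1/sqrt(2)."""
    return all(abs(a - b) < tol for ra, rb in zip(A, B) for a, b in zip(ra, rb))


s = 1 / math.sqrt(2)
H = [[s, s], [s, -s]]
I2 = [[1, 0], [0, 1]]
CNOT12 = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
CNOT21 = [[1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0], [0, 1, 0, 0]]

HH = kron(H, H)
print(close(matmul(H, H), I2))                        # True: H is self-inverse
print(close(matmul(matmul(HH, CNOT12), HH), CNOT21))  # True
```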

Example 4 (Pauli matrices). We can supplement the X and Z operations with a further Y operation defined as:

$$Y = ZX = -XZ = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}.$$

Note that $X^2 = Z^2 = I$ whereas $Y^2 = -I$, so we introduce $\sigma_y = -iY$ and we now have all the so-called Pauli operations:

$$\sigma_x = X = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \qquad \sigma_y = -iY = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \qquad \sigma_z = Z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix},$$

which occur frequently in the quantum formalism. They have elegantly simple multiplicative properties:

$$\sigma_x^2 = \sigma_y^2 = \sigma_z^2 = I$$

$$\sigma_x\sigma_y = -\sigma_y\sigma_x = i\sigma_z$$

$$\sigma_y\sigma_z = -\sigma_z\sigma_y = i\sigma_x$$

$$\sigma_z\sigma_x = -\sigma_x\sigma_z = i\sigma_y$$

(noting the cyclic shift of x, y, z labels in the last three lines).
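The Pauli relations can be confirmed with exact complex arithmetic, since every entry is 0, ±1 or ±i. A minimal sketch with our own helpers `matmul` and `scale`; Python writes the imaginary unit i as `1j`.

```python
def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def scale(c, A):
    """Multiply every entry of matrix A by the number c."""
    return [[c * a for a in row] for row in A]


I = [[1, 0], [0, 1]]
sx = [[0, 1], [1, 0]]
sy = [[0, -1j], [1j, 0]]
sz = [[1, 0], [0, -1]]

# Squares are the identity.
assert all(matmul(s, s) == I for s in (sx, sy, sz))

# Cyclic products: sigma_a sigma_b = -sigma_b sigma_a = i sigma_c.
for a, b, c in [(sx, sy, sz), (sy, sz, sx), (sz, sx, sy)]:
    assert matmul(a, b) == scale(1j, c)
    assert matmul(b, a) == scale(-1j, c)
print("Pauli relations hold")
```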

2.3 Complex numbers – summary

In the extension of all the above concepts to quantum physics, complex numbers will feature prominently. Hence we give here a summary of the basic properties of complex numbers that we’ll need.

Complex numbers are obtained by extending the real numbers with a new symbol i formally satisfying $i^2 = -1$. A general complex number a has the form $a = x + iy$, where x and y are real and are called the real part and imaginary part, respectively, of a. If $b = s + it$ is a second complex number then the sum $a + b = (x + s) + i(y + t)$ is formed by collecting the real and imaginary parts. Using $i^2 = -1$ we get the product

$$ab = (x + iy)(s + it) = (xs - yt) + i(xt + ys)$$

(the term $-yt$ arising from $iy \cdot it$). The modulus $|a|$ of a is defined by $|a| = \sqrt{x^2 + y^2}$.

The complex conjugate $\bar{a}$ of a is defined by replacing i by $-i$: $\bar{a} = x - iy$. Hence $a\bar{a} = x^2 + y^2 = |a|^2$, so $a(\bar{a}/|a|^2) = 1$. Thus, for $a \neq 0$, reciprocals are given by

$$\frac{1}{a} = \frac{\bar{a}}{|a|^2} = \frac{x}{x^2 + y^2} - \frac{y}{x^2 + y^2}\, i$$

and general division $b/a$ is given as the product of b with $1/a$.
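Python’s built-in `complex` type follows exactly these rules, so the formulas can be checked numerically. A minimal sketch, with sample values chosen arbitrarily; `abs` gives the modulus and `.conjugate()` the complex conjugate.

```python
a = complex(3, 4)      # a = x + iy with x = 3, y = 4
b = complex(1, -2)     # b = s + it with s = 1, t = -2
x, y, s, t = a.real, a.imag, b.real, b.imag

# Product rule from the text: ab = (xs - yt) + i(xt + ys).
assert a * b == complex(x * s - y * t, x * t + y * s)

print(abs(a))              # 5.0, the modulus sqrt(x^2 + y^2)
print(a.conjugate())       # (3-4j)
print(a * a.conjugate())   # (25+0j), i.e. |a|^2

# Reciprocal 1/a = conj(a) / |a|^2, compared with Python's built-in division.
recip = a.conjugate() / abs(a) ** 2
assert abs(recip - 1 / a) < 1e-12
```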

A complex number $a = x + iy$ can be represented pictorially as a real 2-dimensional vector with components $(x, y)$. The 2-dimensional plane of these vectors is called the complex plane or Argand diagram.

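[Figure: the complex number $a = x + iy$ drawn in the Argand diagram, with horizontal axis x, vertical axis y, modulus r and angle θ.]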

In terms of polar co-ordinates $(r, \theta)$ we have

$$a = x + iy = r(\cos\theta + i\sin\theta)$$

and $r = \sqrt{x^2 + y^2} = |a|$ is the modulus of a. θ is called the phase of a. A complex number of modulus 1 (so lying on the circle of radius 1 in the complex plane) is called a pure phase.

Using properties of the exponential function extended to complex values, it may be shown (and we omit the details) that $e^{i\theta} = \cos\theta + i\sin\theta$, so $a = x + iy$ may be written as $a = re^{i\theta}$, and pure phases are written $e^{i\theta}$.
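The polar form can be explored with the standard `cmath` module, whose `polar` function returns the pair $(r, \theta)$ and whose `exp` accepts complex arguments. A minimal sketch, with the sample values $a = 1 + i$ and $\theta = 0.7$ chosen arbitrarily:

```python
import cmath
import math

a = complex(1, 1)                # a = 1 + i
r, theta = cmath.polar(a)        # modulus r = |a| and phase theta
print(r, theta)                  # sqrt(2) and pi/4, up to rounding

# Rebuild a from its polar form a = r e^{i theta}.
assert abs(r * cmath.exp(1j * theta) - a) < 1e-12

# Euler's formula e^{i phi} = cos(phi) + i sin(phi) at a sample angle.
phi = 0.7
assert abs(cmath.exp(1j * phi) - complex(math.cos(phi), math.sin(phi))) < 1e-12

# A pure phase has modulus 1.
assert abs(abs(cmath.exp(1j * phi)) - 1) < 1e-12
```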

3 Qubits and quantum states

We now begin our discussion of the basic postulates and principles of quantum theory, which will utilise the full scope of our formalism of vectors, matrices and complex numbers. Recall that a bit was really to be thought of as a classical physical system with two chosen distinguishable states labelled 0 and 1. It is a postulate of quantum theory that the states of any physical system are represented by (unit length) vectors with complex components and that physically distinguishable states correspond to orthogonal vectors. (Shortly we will give a precise discussion of the notions of orthogonality, length and inner products for complex vectors.) We emphasise here that, in classical physics, any two different states of a system are in principle distinguishable, but in quantum theory this is no longer the case! We will discuss and elaborate on this important feature later when we consider the formalism of quantum measurements. For now, suffice it to say that if two states $|\psi_1\rangle$ and $|\psi_2\rangle$ are not orthogonal as vectors then no physical process can distinguish them with certainty.

The simplest non-trivial quantum system has states lying in a 2-dimensional vector space, thus allowing only two mutually distinguishable states. Choosing a pair of such orthogonal unit vectors and labelling them $|0\rangle$ and $|1\rangle$, the general state can be written

$$|\psi\rangle = a|0\rangle + b|1\rangle$$

where a and b are complex numbers subject to the unit length condition

$$|a|^2 + |b|^2 = 1.$$

We say that $|\psi\rangle$ is a superposition of states $|0\rangle$ and $|1\rangle$ with amplitudes a and b. The amplitudes are just the components of the vector and we also write

$$|\psi\rangle = \begin{pmatrix} a \\ b \end{pmatrix}.$$

We will discuss the physical interpretation of such superposition states later in §6.

Any quantum system with a 2-dimensional state space and a chosen orthogonal basis $\{|0\rangle, |1\rangle\}$ is called a qubit. The basis states $|0\rangle, |1\rangle$ are called computational basis states or standard basis states. There are many real physical systems that can embody the structure of a qubit, for example the spin of an electron, the polarisation of a photon, or superpositions of two selected energy levels in an atom, but our mathematical formalism allows us to abstract away from having to deal with such explicitly physical considerations.

3.1 Two-qubit states

Moving on to larger systems: in the classical case, systems exist with any finite number of distinguishable states, but we conventionally choose to represent them in terms of bit strings. Similarly in quantum theory systems exist with state spaces of any finite dimension, but we will focus on systems comprising increasing numbers of qubits.

Our second postulate of quantum theory tells us how to combine systems together to obtain larger systems: if system $S_1$ has state space $V_1$ and system $S_2$ has state space $V_2$, then the joint system obtained by taking $S_1$ and $S_2$ together has states given by arbitrary unit vectors in the tensor product space $V_1 \otimes V_2$.

As an explicit example consider two qubits. Let V denote the state space of a single qubit. If $x_0|0\rangle + x_1|1\rangle$ and $y_0|0\rangle + y_1|1\rangle$ are two single-qubit states (i.e. unit vectors in V) then (recalling that $|00\rangle$ denotes $|0\rangle|0\rangle$ or $|0\rangle \otimes |0\rangle$ etc.) we see that

$$(x_0|0\rangle + x_1|1\rangle)(y_0|0\rangle + y_1|1\rangle) = x_0y_0|00\rangle + x_0y_1|01\rangle + x_1y_0|10\rangle + x_1y_1|11\rangle \tag{7}$$

(constructed by juxtaposing single-qubit states) is certainly an example of a two-qubit state, but states of this form do not exhaust all possibilities! In quantum theory, intuitively, “the whole can comprise more than the sum of the parts”: the most general 2-qubit state is

$$|\psi\rangle = a|00\rangle + b|01\rangle + c|10\rangle + d|11\rangle \tag{8}$$

where the coefficients (called amplitudes) are complex numbers satisfying the normalisation (unit length) condition:

$$|a|^2 + |b|^2 + |c|^2 + |d|^2 = 1. \tag{9}$$

In terms of components we write

$$|\psi\rangle = \begin{pmatrix} a \\ b \\ c \\ d \end{pmatrix}.$$
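The product construction of eq. (7) and the normalisation condition (9) can be checked together. This minimal sketch builds the amplitude list $[a, b, c, d]$ from two single-qubit states (chosen arbitrarily for illustration) and confirms that a product of unit vectors is again a unit vector. The helper names `kron_vec` and `norm_sq` are our own.

```python
import math


def kron_vec(u, v):
    """Amplitudes of the product state (7): the entry for |ij> is u[i] * v[j]."""
    return [ui * vj for ui in u for vj in v]


def norm_sq(state):
    """Left-hand side of the normalisation condition (9)."""
    return sum(abs(amp) ** 2 for amp in state)


# Two single-qubit states (unit vectors), chosen arbitrarily for illustration.
u = [0.6, 0.8j]                           # |0.6|^2 + |0.8i|^2 = 1
v = [1 / math.sqrt(2), -1 / math.sqrt(2)]

psi = kron_vec(u, v)                      # [a, b, c, d] in the order 00, 01, 10, 11
print(psi)
print(abs(norm_sq(psi) - 1) < 1e-12)      # True: a product of unit vectors is a unit vector
```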

Not all such states can be factorised into the product of single-qubit states as in eq. (7). Factorisable states of the form eq. (7) are called product states (of two qubits) and states not of this form are called entangled states (of two qubits).

How can we tell if a given state $|\psi\rangle$ in eq. (8) is entangled or not?

Theorem 1 The state $|\psi\rangle = a|00\rangle + b|01\rangle + c|10\rangle + d|11\rangle$ is entangled if and only if $ad - bc \neq 0$.

Proof: (Optional) Looking at eq. (7), $|\psi\rangle$ is unentangled if and only if the four amplitudes can be expressed as products:

$$a = x_0y_0, \quad b = x_0y_1, \quad c = x_1y_0, \quad d = x_1y_1 \quad \text{for some choice of } x_0, x_1, y_0, y_1. \tag{10}$$

To prove our theorem we show the equivalent statement that $|\psi\rangle$ is a product state if and only if $ad - bc = 0$:

(⇒) Suppose eq. (10) holds. Then $ad - bc = x_0y_0x_1y_1 - x_0y_1x_1y_0 = 0$.

(⇐) Suppose $ad - bc = 0$. We’ll need to consider several cases according to whether some of a, b, c, d are zero or not (as we will want to divide by some of them). We give just one case to illustrate the kind of argument involved; the others are all similar or easier. Suppose b and d are non-zero. From $ad = bc$ we get

$$\frac{a}{b} = \frac{c}{d} = \lambda \quad \text{for some } \lambda,$$

so $a = \lambda b$ and $c = \lambda d$. Substituting these into eq. (8) we get

$$|\psi\rangle = \lambda b|00\rangle + b|01\rangle + \lambda d|10\rangle + d|11\rangle = (b|0\rangle + d|1\rangle)(\lambda|0\rangle + |1\rangle),$$

showing that $|\psi\rangle$ has the product state form. But the two factors above are generally not unit vectors, i.e. in general $|b|^2 + |d|^2 \neq 1$ and $|\lambda|^2 + 1 \neq 1$. However from eq. (9) we get

$$|\lambda b|^2 + |b|^2 + |\lambda d|^2 + |d|^2 = 1.$$

Factoring this as $(|b|^2 + |d|^2)(|\lambda|^2 + 1) = 1$ and taking square roots gives $\sqrt{|b|^2 + |d|^2}\,\sqrt{|\lambda|^2 + 1} = 1$, so $|\psi\rangle$ is actually the product of two unit vectors $(b|0\rangle + d|1\rangle)/\sqrt{|b|^2 + |d|^2}$ and $(\lambda|0\rangle + |1\rangle)/\sqrt{|\lambda|^2 + 1}$.

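Example 5 $|\psi\rangle = \frac{1}{2}(|00\rangle + |01\rangle + |10\rangle + |11\rangle)$ has $a = b = c = d = \frac{1}{2}$, so $ad - bc = 0$ and $|\psi\rangle$ must be a product state. We easily verify that if $|v\rangle = \frac{1}{\sqrt 2}(|0\rangle + |1\rangle)$ then $|\psi\rangle = |v\rangle|v\rangle$.

On the other hand $|\psi\rangle = \frac{1}{\sqrt 2}(|00\rangle + |11\rangle)$ has $a = d = \frac{1}{\sqrt 2}$ and $b = c = 0$, so $ad - bc = \frac{1}{2} \neq 0$ and $|\psi\rangle$ is entangled, i.e. $|\psi\rangle$ cannot be written as $|v_1\rangle|v_2\rangle$ for any choice of 1-qubit states $|v_1\rangle$ and $|v_2\rangle$.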
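As a quick numerical companion to Theorem 1 and the case worked in its proof, here is a minimal plain-Python sketch. The function names are our own, amplitudes are listed in the order $a, b, c, d$ for $|00\rangle, |01\rangle, |10\rangle, |11\rangle$, and a small tolerance `tol` (our choice) stands in for an exact test of $ad - bc = 0$, since floating-point arithmetic rarely produces exact zeros.

```python
import math

def is_entangled(a, b, c, d, tol=1e-12):
    """Theorem 1: a|00> + b|01> + c|10> + d|11> is entangled iff ad - bc != 0."""
    return abs(a * d - b * c) > tol

def tensor(u, w):
    """Amplitudes of |u>|w> in the order 00, 01, 10, 11."""
    return [ui * wj for ui in u for wj in w]

def factorise_bd_nonzero(a, b, c, d):
    """The proof's case b != 0, d != 0 (assumes ad = bc and a normalised state).

    Returns the two unit vectors (b|0> + d|1>)/N1 and (lambda|0> + |1>)/N2.
    """
    lam = a / b                                   # equals c / d because ad = bc
    n1 = math.sqrt(abs(b) ** 2 + abs(d) ** 2)
    n2 = math.sqrt(abs(lam) ** 2 + 1)
    return (b / n1, d / n1), (lam / n2, 1 / n2)

# Example 5: (1/2)(|00> + |01> + |10> + |11>) is a product state.
print(is_entangled(0.5, 0.5, 0.5, 0.5))          # False
u, w = factorise_bd_nonzero(0.5, 0.5, 0.5, 0.5)
print(tensor(u, w))                               # approximately [0.5, 0.5, 0.5, 0.5]

# Example 5: (1/sqrt 2)(|00> + |11>) is entangled (ad - bc = 1/2).
r = 1 / math.sqrt(2)
print(is_entangled(r, 0, 0, r))                   # True
```

The factorisation helper covers only the case $b, d \neq 0$ treated in the proof; the other cases would need their own branches.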

3.2 Multi-qubit states

The above generalises to any finite number $n$ of qubits. The $n$-fold tensor product $V \otimes \cdots \otimes V$ ($n$ times) has dimension $2^n$ and is spanned by the computational basis states $|x\rangle = |x_1\rangle|x_2\rangle \ldots |x_n\rangle$, where $x = x_1 x_2 \ldots x_n$ is any $n$-bit string.


The general $n$-qubit state can be written

$$|\psi\rangle = \sum_x a_x |x\rangle \quad \text{where} \quad \sum_x |a_x|^2 = 1,$$

so we have $2^n$ amplitudes (complex numbers) subject to the normalisation condition. We note (for later) the significant fact that as the number of qubits grows linearly, the full state description (given as the full list of amplitudes) grows exponentially in its complexity.

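An $n$-qubit state is called a product state if it is the product of $n$ single-qubit states, $|\psi\rangle = |v_1\rangle|v_2\rangle \ldots |v_n\rangle$, and $|\psi\rangle$ is called entangled if it is not a product state. The generalisation of theorem 1 to $n > 2$ is more complicated than the $n = 2$ case: instead of the single equation $ad - bc = 0$ we need a whole collection of algebraic equations in the amplitudes to fully capture the product state condition. (A parameter count shows why: ignoring normalisation, a product state is described by $n + 1$ free complex parameters while a general state has $2^n$ amplitudes, so at least $2^n - n - 1$ independent equations are needed, which is 1 when $n = 2$.) We omit the details.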
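A simple (if brute-force) product-state test for $n$ qubits can be sketched in Python. It relies on a standard fact not proved in these notes: an $n$-qubit state is a product state iff, for every qubit, the amplitudes arranged as a $2 \times 2^{n-1}$ matrix (rows labelled by the value of that qubit, columns by the other $n - 1$ bits) have rank at most 1, i.e. all $2 \times 2$ minors vanish. For $n = 2$ and the first qubit the single minor is exactly $ad - bc$. The function name and the amplitude ordering (index $i$ holds the amplitude of the $n$-bit binary expansion of $i$, qubit 1 leftmost) are our own conventions; note that the list `amps` must have $2^n$ entries, illustrating the exponential growth mentioned above.

```python
import math
from itertools import product

def is_product_state(amps, n, tol=1e-12):
    """Return True if the n-qubit state with amplitude list amps is a product state.

    amps[i] is the amplitude of |x>, where x is the n-bit binary expansion of i
    (qubit 1 = leftmost bit). For each qubit k we arrange the amplitudes as a
    2 x 2^(n-1) matrix (row = value of bit k) and check that every 2 x 2 minor
    vanishes, i.e. that the matrix has rank at most 1.
    """
    assert len(amps) == 2 ** n, "an n-qubit state has 2^n amplitudes"
    for k in range(n):
        rows = ([], [])
        for bits in product((0, 1), repeat=n):      # bit strings in increasing order
            index = int("".join(map(str, bits)), 2)
            rows[bits[k]].append(amps[index])
        r0, r1 = rows
        for i in range(len(r0)):
            for j in range(i + 1, len(r0)):
                if abs(r0[i] * r1[j] - r0[j] * r1[i]) > tol:
                    return False
    return True

r = 1 / math.sqrt(2)
plus3 = [1 / math.sqrt(8)] * 8             # |v>|v>|v> with |v> = (|0> + |1>)/sqrt 2
ghz = [r, 0, 0, 0, 0, 0, 0, r]             # (|000> + |111>)/sqrt 2
zero_bell = [r, 0, 0, r, 0, 0, 0, 0]       # |0>(|00> + |11>)/sqrt 2

print(is_product_state(plus3, 3))          # True
print(is_product_state(ghz, 3))            # False
print(is_product_state(zero_bell, 3))      # False
```

The last example passes the check for qubit 1 but fails it for qubit 2, which is one way to see why a single equation no longer suffices when $n > 2$.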

3.3 Lengths and inner products

Above we have referred to vectors “of unit length” and vectors being “orthogonal”. We now make these notions formally precise by introducing a notion of inner product for vectors with complex number components. This is a direct generalisation of the familiar notions for real 2 or 3 dimensional vectors of Euclidean geometry, with one slight catch for the beginner: some complex conjugation will be involved (which has no effect in the case of real numbers, but in the complex case guarantees that certain expressions come out to be real numbers).

Linear algebra insert 4 (Inner products, orthogonality, lengths):

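Let

$$\mathbf{a} = \begin{pmatrix} a_1 \\ a_2 \end{pmatrix}, \qquad \mathbf{b} = \begin{pmatrix} b_1 \\ b_2 \end{pmatrix}$$

be two-dimensional complex vectors. The inner product is defined as

$$\mathbf{a}.\mathbf{b} = \overline{a_1}\, b_1 + \overline{a_2}\, b_2 = \begin{pmatrix} \overline{a_1} & \overline{a_2} \end{pmatrix} \begin{pmatrix} b_1 \\ b_2 \end{pmatrix}. \tag{11}$$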

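Note the asymmetrical appearance of complex conjugation (denoted by the over-bar), so $\mathbf{a}.\mathbf{b}$ generally differs from $\mathbf{b}.\mathbf{a}$. In fact $\mathbf{b}.\mathbf{a}$ is the complex conjugate of the complex number $\mathbf{a}.\mathbf{b}$ because

$$\mathbf{b}.\mathbf{a} = \overline{b_1}\, a_1 + \overline{b_2}\, a_2 = \overline{\overline{a_1}\, b_1 + \overline{a_2}\, b_2} = \overline{\mathbf{a}.\mathbf{b}}.$$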

(Here we have used the fact that for any complex numbers $c_1, c_2$ we have $\overline{c_1 + c_2} = \overline{c_1} + \overline{c_2}$, $\overline{c_1 c_2} = \overline{c_1}\,\overline{c_2}$ and $\overline{\overline{c_1}} = c_1$.) Note also that the RHS expression in eq. (11) is a matrix product in which the column vector of $\mathbf{a}$ has been complex conjugated and transposed to form a row vector. For any matrix $M$ the operation of “complex conjugation followed by transposition” will be denoted by a dagger, and transposition alone by a superscript $T$.

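Thus

$$\begin{pmatrix} a_1 \\ a_2 \end{pmatrix}^{\dagger} = \begin{pmatrix} \overline{a_1} & \overline{a_2} \end{pmatrix}, \qquad \begin{pmatrix} a_1 \\ a_2 \end{pmatrix}^{T} = \begin{pmatrix} a_1 & a_2 \end{pmatrix}$$

and, for example, if

$$M = \begin{pmatrix} 1 + i & 2 - 3i \\ 5 & 6 + 4i \end{pmatrix}$$

then

$$M^{\dagger} = \begin{pmatrix} 1 - i & 5 \\ 2 + 3i & 6 - 4i \end{pmatrix} \quad \text{and} \quad M^{T} = \begin{pmatrix} 1 + i & 5 \\ 2 - 3i & 6 + 4i \end{pmatrix}$$

(the rows of $M^{\dagger}$ are the complex conjugated columns of $M$). Returning to our vectors, the length of $\mathbf{a}$ is

$$|\mathbf{a}| = \sqrt{\mathbf{a}.\mathbf{a}} = \sqrt{|a_1|^2 + |a_2|^2}.$$

$\mathbf{a}$ and $\mathbf{b}$ are orthogonal if $\mathbf{a}.\mathbf{b} = 0$, i.e. if $\overline{a_1}\, b_1 + \overline{a_2}\, b_2 = 0$.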

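The above definitions and expressions generalise in the obvious way to $n$ dimensional complex vectors, e.g. if

$$\mathbf{a} = \begin{pmatrix} a_1 \\ \vdots \\ a_n \end{pmatrix}, \qquad \mathbf{b} = \begin{pmatrix} b_1 \\ \vdots \\ b_n \end{pmatrix}$$

then $\mathbf{a}.\mathbf{b} = \sum_i \overline{a_i}\, b_i$, etc. Inner products have a linearity property for vectors in the second slot:

$$\mathbf{a}.(c_1 \mathbf{b}_1 + c_2 \mathbf{b}_2) = c_1\, \mathbf{a}.\mathbf{b}_1 + c_2\, \mathbf{a}.\mathbf{b}_2$$

(where $c_1, c_2$ are complex numbers), as is easily verified from the definition. For the first slot we have an anti-linearity property:

$$(c_1 \mathbf{a}_1 + c_2 \mathbf{a}_2).\mathbf{b} = \overline{c_1}\, \mathbf{a}_1.\mathbf{b} + \overline{c_2}\, \mathbf{a}_2.\mathbf{b}$$

in which the coefficients $c_1, c_2$ must be complex conjugated if separated out as on the RHS.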
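The definitions in this insert translate directly into code. The following is a small plain-Python sketch, assuming vectors are lists of numbers and matrices are lists of rows; `inner`, `length`, `dagger` and `transpose` are illustrative names, not a library API. Note that `inner` conjugates its first argument, matching eq. (11).

```python
import math

def inner(a, b):
    """a.b = sum_i conj(a_i) b_i: the FIRST vector is conjugated, as in eq. (11)."""
    assert len(a) == len(b)
    return sum(x.conjugate() * y for x, y in zip(a, b))

def length(a):
    """|a| = sqrt(a.a); a.a is real and non-negative."""
    return math.sqrt(inner(a, a).real)

def dagger(M):
    """Complex conjugate transpose of M (a list of rows)."""
    return [[M[i][j].conjugate() for i in range(len(M))] for j in range(len(M[0]))]

def transpose(M):
    """Transpose alone (no conjugation)."""
    return [[M[i][j] for i in range(len(M))] for j in range(len(M[0]))]

M = [[1 + 1j, 2 - 3j],
     [5, 6 + 4j]]
print(dagger(M))      # [[(1-1j), 5], [(2+3j), (6-4j)]]
print(transpose(M))   # [[(1+1j), 5], [(2-3j), (6+4j)]]

a = [1 + 2j, 3j]
b = [2, 1 - 1j]
print(inner(a, b), inner(b, a))   # complex conjugates of each other
print(length([3, 4j]))            # 5.0

# Anti-linearity in the first slot: (c1 a + c2 a2).b = conj(c1) a.b + conj(c2) a2.b
c1, c2, a2 = 2j, 1 - 1j, [1, -1j]
lhs = inner([c1 * x + c2 * y for x, y in zip(a, a2)], b)
rhs = c1.conjugate() * inner(a, b) + c2.conjugate() * inner(a2, b)
print(abs(lhs - rhs) < 1e-12)     # True
```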

In our ket notation for vectors we write the column vector

$$\begin{pmatrix} a_0 \\ a_1 \end{pmatrix} \quad \text{as} \quad |a\rangle = a_0|0\rangle + a_1|1\rangle$$

and for inner products we introduce some further corresponding notation: the complex conjugate transpose of the ket $|a\rangle$ is written as a so-called bra vector with the brackets reversed:

$$\langle a| = (a_0|0\rangle + a_1|1\rangle)^{\dagger} = \overline{a_0}\,\langle 0| + \overline{a_1}\,\langle 1| = \begin{pmatrix} \overline{a_0} & \overline{a_1} \end{pmatrix}$$

and the inner product $\mathbf{a}.\mathbf{b}$ of two such ket vectors is denoted as $\langle a|b\rangle$, i.e. we just juxtapose the bra and ket vectors to represent the matrix product of the complex conjugated row vector of $\mathbf{a}$ with the column vector of $\mathbf{b}$.

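If $|a\rangle = a_0|0\rangle + a_1|1\rangle$ and $|b\rangle = b_0|0\rangle + b_1|1\rangle$ we can “multiply out” the compound expression $\langle a|b\rangle$ in the obvious way, as follows:

$$\begin{aligned}
\langle a|b\rangle &= (\overline{a_0}\,\langle 0| + \overline{a_1}\,\langle 1|)(b_0|0\rangle + b_1|1\rangle) \\
&= \overline{a_0}\, b_0 \langle 0|0\rangle + \overline{a_0}\, b_1 \langle 0|1\rangle + \overline{a_1}\, b_0 \langle 1|0\rangle + \overline{a_1}\, b_1 \langle 1|1\rangle \\
&= \overline{a_0}\, b_0 + \overline{a_1}\, b_1 .
\end{aligned}$$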


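In the last equality above we have used the fact that $|0\rangle$ and $|1\rangle$ are orthogonal and have unit length, i.e.

$$\langle 0|0\rangle = \langle 1|1\rangle = 1 \quad \text{(unit length)}, \qquad \langle 0|1\rangle = \langle 1|0\rangle = 0 \quad \text{(orthogonality)},$$

and we get the correct previous expression eq. (11) for $\langle a|b\rangle$.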

A set of pairwise orthogonal vectors, each of unit length, is called an orthonormal set.

Inner products “reduce vectors back down to numbers”, i.e. the inner product of two vectors (always from the same space) is a number, not a vector. We can generalise this idea to situations involving tensor products. If $|b\rangle$ is a vector in the tensor product space $V \otimes W$ and $|a\rangle$ is a vector in $V$, then the notion of inner product can be extended so that $\langle a|b\rangle$ is a vector in $W$, i.e. “the inner product with $|a\rangle$ in $V$ reduces the $V$-part of $|b\rangle$ down to a number, leaving a vector only in $W$”. Similarly, if $|a\rangle$ were instead a vector in $W$, then $\langle a|b\rangle$ would be a vector in $V$. We illustrate how this works in an example.

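Example 6 Let $V$ be the 2 dimensional space of a qubit with orthonormal basis $|0\rangle, |1\rangle$. Consider

$$|A\rangle = x_0|0\rangle + x_1|1\rangle \ \text{in } V \qquad \text{and} \qquad |B\rangle = a|00\rangle + b|01\rangle + c|10\rangle + d|11\rangle \ \text{in } V \otimes V.$$

To emphasise the two positions in $V \otimes V$ we can introduce subscripts:

$$|B\rangle = a|0\rangle_1|0\rangle_2 + b|0\rangle_1|1\rangle_2 + c|1\rangle_1|0\rangle_2 + d|1\rangle_1|1\rangle_2 .$$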

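If $|A\rangle = x_0|0\rangle_1 + x_1|1\rangle_1$ is a vector in the first slot then

$$\langle A|B\rangle = (\overline{x_0}\,{}_1\langle 0| + \overline{x_1}\,{}_1\langle 1|)\,(a|0\rangle_1|0\rangle_2 + b|0\rangle_1|1\rangle_2 + c|1\rangle_1|0\rangle_2 + d|1\rangle_1|1\rangle_2).$$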

Now matching up the correct bra and ket slots, using the orthonormality relations for $|0\rangle, |1\rangle$ and collecting terms, we get finally (as you should verify!)

$$\langle A|B\rangle = (\overline{x_0}\, a + \overline{x_1}\, c)\,|0\rangle_2 + (\overline{x_0}\, b + \overline{x_1}\, d)\,|1\rangle_2$$

as a vector in the second space slot. Similarly, if $|A\rangle$ is taken instead to be a vector $|A\rangle = x_0|0\rangle_2 + x_1|1\rangle_2$ in the second space slot, then $\langle A|B\rangle$ is a vector in the first space, given (after a direct calculation) by

$$(\overline{x_0}\, a + \overline{x_1}\, b)\,|0\rangle_1 + (\overline{x_0}\, c + \overline{x_1}\, d)\,|1\rangle_1,$$

which is different from the previous result.
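Example 6 can be mechanised by viewing the amplitudes of $|B\rangle$ as a $2 \times 2$ table, indexed by the values in slots 1 and 2, and summing the bra's conjugated amplitudes against one index. The sketch below is illustrative only; `partial_inner` and its `slot` argument are hypothetical names, and the numbers in the usage lines are arbitrary.

```python
def partial_inner(A, B, slot):
    """Partial inner product <A|B> of a 1-qubit |A> with a 2-qubit |B>.

    A = (x0, x1) holds the amplitudes of |A>; B = (a, b, c, d) holds those of
    |00>, |01>, |10>, |11>. With slot=1 the bra <A| acts on the first qubit and
    the result is a vector in the second slot; with slot=2 it acts on the second
    qubit and the result lives in the first slot. The bra's amplitudes are
    complex conjugated.
    """
    a, b, c, d = B
    T = [[a, b], [c, d]]                       # T[i][j] = amplitude of |i>_1 |j>_2
    xbar = [A[0].conjugate(), A[1].conjugate()]
    if slot == 1:
        return [xbar[0] * T[0][j] + xbar[1] * T[1][j] for j in range(2)]
    if slot == 2:
        return [xbar[0] * T[i][0] + xbar[1] * T[i][1] for i in range(2)]
    raise ValueError("slot must be 1 or 2")

A = (1j, 2)
B = (1, 2, 3, 4)
print(partial_inner(A, B, 1))   # [(6-1j), (8-2j)]: conj(x0) a + conj(x1) c, conj(x0) b + conj(x1) d
print(partial_inner(A, B, 2))   # [(4-1j), (8-3j)]: conj(x0) a + conj(x1) b, conj(x0) c + conj(x1) d
```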

4 Physical operations on qubits

So far we have been discussing quantum state descriptions, and now we move on to consider dynamics, i.e. the kinds of state changes or physical evolution processes that are allowed in quantum theory. In our previous discussion of classical bit operations we focussed on reversible operations represented as particular kinds of linear transformations on vectors (those that take computational basis vectors to computational basis vectors). The quantum case is a generalisation of this formalism: it is a postulate of quantum theory that any physical evolution of a quantum system is represented by the action of a linear unitary operation on the state space. The mathematical notion of unitarity is explained in insert 5 below. Such an operation on $n$ qubits is called an $n$-qubit gate.

Linear algebra insert 5 (unitary operations and unitary matrices):

A linear operation $L$ on a vector space $V$ is called unitary if it preserves inner products, i.e. $(L\mathbf{v}_1).(L\mathbf{v}_2) = \mathbf{v}_1.\mathbf{v}_2$ for all vectors $\mathbf{v}_1, \mathbf{v}_2$ in $V$. A matrix $U$ is called unitary if its complex conjugate transpose is its inverse, i.e. we require

$$UU^{\dagger} = U^{\dagger}U = I$$

(where $I$ is the identity matrix). It can be shown that a linear operation $L$ is unitary iff its matrix representation $[L]$ (with respect to an orthonormal basis, such as the computational basis) is a unitary matrix. Since any unitary matrix has an inverse matrix, it must also correspond to a reversible operation.

We will sometimes need to check if a given matrix is unitary or not.

Theorem 2 An $n \times n$ matrix of complex numbers is unitary iff its columns form an orthonormal set of vectors.

Proof: (⇐) Suppose the columns of $M$ form an orthonormal set. Recall that the rows of $M^{\dagger}$ are the complex conjugated columns of $M$, and that in the matrix product $M^{\dagger}M$ the $ij$th entry of the result is the $i$th row of $M^{\dagger}$ “summed against” the $j$th column of $M$, i.e. it is the inner product of the $i$th and $j$th columns of $M$. Since these columns are an orthonormal set, the matrix $M^{\dagger}M$ has 1’s on the diagonal and 0’s elsewhere, i.e. $M^{\dagger}M = I$. Since $M$ is square, this says that $M^{\dagger}$ is the inverse of $M$, so also $MM^{\dagger} = I$.

(⇒) Suppose $M$ is unitary, so $M^{\dagger}M = I$. As noted above (i.e. looking at how matrix products are calculated), this means precisely that the columns of $M$ are an orthonormal set.

Quantum physical operations on $n$ qubits are described by $2^n \times 2^n$ unitary matrices.

In our discussion of classical computation we saw that reversible classical operations on $n$-bit strings are just permutations of the set of bit strings. Representing these bit strings as basis vectors, any such permutation $\Pi$ corresponds to a matrix $[\Pi]$ which has a single “one” in each column and all other entries zero. Furthermore, the position of the “one” in any column cannot be duplicated in any other column (otherwise $\Pi$ would map two strings to the same string), i.e. the columns are orthonormal and any such classical reversible operation has a unitary matrix. But there are more general unitary matrices than these, which are all allowed as operations in the quantum formalism. For example

$$X = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$$

is the unitary matrix of the classical bit flip operation, and the matrices in eq. (3) are unitary (1-qubit) operations allowed in the quantum formalism which are not interpretable in the classical formalism.
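Theorem 2 gives a practical test for unitarity: compute all pairwise inner products of columns. Here is a minimal sketch, assuming matrices are given as Python lists of rows and using a floating-point tolerance of our own choosing; the example matrices include a permutation matrix of the kind just described.

```python
import math

def inner(u, v):
    """Inner product with the first vector conjugated."""
    return sum(x.conjugate() * y for x, y in zip(u, v))

def columns(M):
    return [[row[j] for row in M] for j in range(len(M[0]))]

def is_unitary(M, tol=1e-12):
    """Theorem 2: a square matrix is unitary iff its columns are orthonormal."""
    n = len(M)
    if any(len(row) != n for row in M):
        return False                       # only square matrices can be unitary
    cols = columns(M)
    for i in range(n):
        for j in range(n):
            target = 1 if i == j else 0    # entry (i, j) of M^dagger M should equal I
            if abs(inner(cols[i], cols[j]) - target) > tol:
                return False
    return True

s = 1 / math.sqrt(2)
X = [[0, 1], [1, 0]]                       # classical bit flip
H = [[s, s], [s, -s]]                      # unitary but not a permutation
P = [[1, 1], [0, 1]]                       # invertible but not unitary
# Permutation of 2-bit strings 00->00, 01->01, 10->11, 11->10 (one "1" per column).
Pi = [[1, 0, 0, 0],
      [0, 1, 0, 0],
      [0, 0, 0, 1],
      [0, 0, 1, 0]]
print(is_unitary(X), is_unitary(H), is_unitary(P), is_unitary(Pi))  # True True False True
```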

4.1 Operations on 1 or 2 qubits of an n-qubit state

Instead of applying (large) $n$-qubit operations to an $n$-qubit state “globally”, we will generally be interested in manipulating only one or two qubits in any given step. This is the quantum analogue of the classical locality condition: that $n$-qubit states are successively updated by applying operations that act on only a few qubits each.

The appropriate formalism (tensor products of operations) for calculating the effect of such gates on $n$-qubit states was given in §2.2, and we reiterate the essential features here.

Suppose we wish to apply a 1-qubit unitary operation $U$ to the first qubit of the 2-qubit state

$$|\psi\rangle = a|00\rangle + b|01\rangle + c|10\rangle + d|11\rangle .$$

We write this operation at the 2-qubit level as $U \otimes I$ to represent the fact that the second qubit receives the identity operation. Suppose

$$U|0\rangle = x_0|0\rangle + x_1|1\rangle, \qquad U|1\rangle = y_0|0\rangle + y_1|1\rangle,$$

i.e. the matrix of $U$ is

$$[U] = \begin{pmatrix} x_0 & y_0 \\ x_1 & y_1 \end{pmatrix}.$$

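Then we can explicitly compute $U \otimes I|\psi\rangle$ using the linearity property of the operation, as

$$\begin{aligned}
U \otimes I|\psi\rangle &= U \otimes I\,(a|00\rangle + \cdots + d|11\rangle) \\
&= a(U|0\rangle)|0\rangle + \cdots + d(U|1\rangle)|1\rangle \\
&= a(x_0|0\rangle + x_1|1\rangle)|0\rangle + \cdots + d(y_0|0\rangle + y_1|1\rangle)|1\rangle .
\end{aligned}$$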

Collecting terms we get

$$U \otimes I|\psi\rangle = (ax_0 + cy_0)|00\rangle + (bx_0 + dy_0)|01\rangle + (ax_1 + cy_1)|10\rangle + (bx_1 + dy_1)|11\rangle .$$

We can perform the same calculation with matrix multiplications and column vectors as follows. Previously we introduced the tensor product $A \otimes B$ of matrices $A$ and $B$: $A \otimes B$ is constructed by taking each entry $a_{ij}$ of $A$ and replacing it by the full matrix $B$ multiplied by that entry value. Thus $U \otimes I$ has matrix

$$U \otimes I = \begin{pmatrix} x_0 I & y_0 I \\ x_1 I & y_1 I \end{pmatrix} = \begin{pmatrix} x_0 & 0 & y_0 & 0 \\ 0 & x_0 & 0 & y_0 \\ x_1 & 0 & y_1 & 0 \\ 0 & x_1 & 0 & y_1 \end{pmatrix}.$$


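$|\psi\rangle$ has column vector $(a, b, c, d)^T$, so

$$U \otimes I|\psi\rangle = \begin{pmatrix} x_0 & 0 & y_0 & 0 \\ 0 & x_0 & 0 & y_0 \\ x_1 & 0 & y_1 & 0 \\ 0 & x_1 & 0 & y_1 \end{pmatrix} \begin{pmatrix} a \\ b \\ c \\ d \end{pmatrix} = \begin{pmatrix} x_0 a + y_0 c \\ x_0 b + y_0 d \\ x_1 a + y_1 c \\ x_1 b + y_1 d \end{pmatrix},$$

giving the same result as before.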

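Similarly we can calculate $I \otimes U|\psi\rangle$, which is generally different from the above. Indeed the matrix is now

$$I \otimes U = \begin{pmatrix} U & 0 \\ 0 & U \end{pmatrix} = \begin{pmatrix} x_0 & y_0 & 0 & 0 \\ x_1 & y_1 & 0 & 0 \\ 0 & 0 & x_0 & y_0 \\ 0 & 0 & x_1 & y_1 \end{pmatrix}.$$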

More generally, if $U$ and $W$ are two 1-qubit operations, then $U \otimes W$ is the 2-qubit gate corresponding to the instruction “apply $U$ to the first qubit and $W$ to the second qubit”. These two applications of gates to different qubits commute: it is readily seen from the definitions that $U \otimes W = (U \otimes I)(I \otimes W) = (I \otimes W)(U \otimes I)$.
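The tensor product rule for matrices and the claims of this subsection can be checked numerically. In the sketch below `kron`, `matmul`, `matvec` and `max_diff` are our own helpers, and the particular $U$ (with $x_0 = 0.6$, $x_1 = 0.8i$, $y_0 = 0.8i$, $y_1 = 0.6$, which is unitary by Theorem 2), the state and $W$ are arbitrary choices made for illustration.

```python
import math

def kron(A, B):
    """Tensor product A (x) B: each entry A[i][j] is replaced by the block A[i][j] * B."""
    return [[A[i][j] * B[k][l] for j in range(len(A[0])) for l in range(len(B[0]))]
            for i in range(len(A)) for k in range(len(B))]

def matmul(A, B):
    """Ordinary matrix product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def matvec(M, v):
    """Matrix times column vector."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def max_diff(X, Y):
    """Largest entrywise difference between two matrices."""
    return max(abs(p - q) for rx, ry in zip(X, Y) for p, q in zip(rx, ry))

I2 = [[1, 0], [0, 1]]
x0, x1, y0, y1 = 0.6, 0.8j, 0.8j, 0.6          # example U (our choice), unitary
U = [[x0, y0], [x1, y1]]
a, b, c, d = 0.5, 0.5j, -0.5, 0.5              # example state (our choice)

# U (x) I acting on (a, b, c, d) reproduces the collected-terms formula.
result = matvec(kron(U, I2), [a, b, c, d])
expected = [a * x0 + c * y0, b * x0 + d * y0, a * x1 + c * y1, b * x1 + d * y1]
print(max(abs(p - q) for p, q in zip(result, expected)))   # close to 0

# I (x) U is a different matrix from U (x) I.
print(kron(U, I2) != kron(I2, U))                         # True

# Gates on different qubits commute: U (x) W = (U (x) I)(I (x) W) = (I (x) W)(U (x) I).
s = 1 / math.sqrt(2)
W = [[s, s], [s, -s]]                                    # e.g. the Hadamard gate
print(max_diff(kron(U, W), matmul(kron(U, I2), kron(I2, W))))  # close to 0
print(max_diff(kron(U, W), matmul(kron(I2, W), kron(U, I2))))  # close to 0
```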

5 Quantum measurements

In classical physics the state of any given physical system can be fully determined by suitable “measurements” on the system. For example, if we are presented with a bit we can discover its value (0 or 1) while leaving it intact. The corresponding situation in quantum theory is drastically different! We have a mathematical formalism for the process of quantum measurement which differs considerably from the unitary physical evolution process that we’ve been discussing so far. Thus in quantum theory the state of a system can change in two distinct ways: by unitary evolution or by the process of measurement, described below.

Suppose we are presented with a qubit in some (as yet unknown) state $|\psi\rangle = a|0\rangle + b|1\rangle$. According to the postulates of quantum measurement theory (developed below), the only thing we can do to get information about the state is to “perform a test” or “make a measurement” to see if the qubit value is 0 or 1. This test always returns a value 0 or 1, and the answer is given only probabilistically: 0 is seen with probability $|a|^2$ and 1 is seen with probability $|b|^2$. (Recall that $|a|^2 + |b|^2 = 1$ for any qubit state.) For example, if we are given a second copy of the same state $|\psi\rangle$, then the test may output a different answer, being just a second independent sample from the probability distribution. Furthermore, after the test the qubit’s state is no longer $a|0\rangle + b|1\rangle$ but has been “collapsed” to $|0\rangle$ or $|1\rangle$, corresponding to the seen output value. Thus, having performed the test, the qubit’s original state is irrevocably destroyed and we can gain no further information from it about the initial $a, b$ values! This state transformation, from $a|0\rangle + b|1\rangle$ to either $|0\rangle$ or $|1\rangle$, is not a unitary process (e.g. it is not reversible, as the information of $a, b$ is no longer present in the output; also it is a probabilistic process).


Remark: Most quantum theory texts describe a more general notion of quantum measurement, allowing measurement “relative to” any orthonormal basis. But it can be shown that this more general notion is equivalent to our simpler one (i.e. relative to only the $|0\rangle, |1\rangle$ basis), combined with prior actions of unitary operations. In this course we will use only the simpler notion of measurement, i.e. relative only to the computational basis.

In summary, in contrast to classical measurements, quantum measurements have only probabilistic outcomes, they are generally invasive, unavoidably destroying the input state, and they reveal only a rather small amount of information about the (now irrevocably lost) input state’s identity! Note that if the state were not collapsed on measurement (but the output were still probabilistic as above), we could get a lot more information, estimating the probabilities $|a|^2$ and $|b|^2$ to any desired accuracy from statistics of repeated samplings of the distribution.

You may ask: where do all these crazy rules come from!? Well, alas, no one knows! But the really astonishing thing is that this formalism of complex vectors, unitary operations and probabilistic collapses etc. actually works in the real physical world to very accurately describe a huge range of actual physical phenomena!!

The above features of measurement for one qubit generalise to the case of $n$ qubits. If

$$|\psi\rangle = \sum_x a_x |x\rangle \tag{12}$$

is an $n$-qubit state (so $x$ ranges over all $2^n$ $n$-bit strings), then we can measure it in the computational basis, obtaining an $n$-bit string $x$ with probability $\Pr(x) = |a_x|^2$.

The relationship between probabilities of outcomes in a measurement and amplitudes in the superposition is known as the Born rule (after the physicist Max Born). After the measurement the state of the system is no longer $|\psi\rangle$ but is “collapsed” to $|x\rangle$, where $x$ is the seen outcome.
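The Born rule is easy to simulate classically for small $n$ by sampling from the distribution $|a_x|^2$. The following sketch assumes amplitudes are stored as a list indexed by the integer value of $x$; `measure_all` is a hypothetical helper, and Python's `random.choices` supplies the weighted sampling. This only simulates the statistics: it needs the full list of $2^n$ amplitudes, which a real measurement never reveals.

```python
import random

def measure_all(amps, n, rng=random):
    """Measure all n qubits of sum_x a_x |x> in the computational basis.

    Returns (x, post): the outcome x is an n-bit string seen with probability
    |a_x|^2 (Born rule), and post is the amplitude list of the collapsed state |x>.
    """
    probs = [abs(a) ** 2 for a in amps]
    assert abs(sum(probs) - 1) < 1e-9, "amplitudes must be normalised"
    index = rng.choices(range(len(amps)), weights=probs)[0]
    post = [0] * len(amps)
    post[index] = 1
    return format(index, f"0{n}b"), post

# A single qubit 0.6|0> + 0.8|1>: outcome 0 should appear about 36% of the time.
rng = random.Random(2006)      # seeded only so the run is repeatable
outcomes = [measure_all([0.6, 0.8], 1, rng)[0] for _ in range(10000)]
print(outcomes.count("0") / len(outcomes))

# Each sample uses a fresh copy; measuring a collapsed state again just repeats x.
x, post = measure_all([0.6, 0.8], 1, rng)
print(x, measure_all(post, 1, rng)[0] == x)
```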

5.1 The extended Born rule

We will often use a further important generalisation of the Born rule: given a state of many qubits, we measure not all, but only some subset of them. Before describing this extended Born rule in generality we will introduce it via an example.

Example 7 Consider the following 3-qubit state:

$$|\psi\rangle = \frac{3}{10}|000\rangle - \frac{4}{10}|010\rangle + \frac{i}{2}|011\rangle - \frac{1}{2}|100\rangle + \frac{1 + \sqrt{3}\,i}{4}|111\rangle .$$

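Note that the amplitudes are correctly normalised:

$$\left|\frac{3}{10}\right|^2 + \left|-\frac{4}{10}\right|^2 + \left|\frac{i}{2}\right|^2 + \left|-\frac{1}{2}\right|^2 + \left|\frac{1 + \sqrt{3}\,i}{4}\right|^2 = \frac{9}{100} + \frac{16}{100} + \frac{1}{4} + \frac{1}{4} + \frac{4}{16} = 1.$$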


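Suppose we wish to measure just the first qubit in the computational basis. To calculate the effect of this we re-express $|\psi\rangle$ by collecting together all terms having first qubit $|0\rangle$ and $|1\rangle$ respectively (these being the possible measurement outcomes on qubit 1):

$$\begin{aligned}
|\psi\rangle &= |0\rangle\left(\frac{3}{10}|00\rangle - \frac{4}{10}|10\rangle + \frac{i}{2}|11\rangle\right) + |1\rangle\left(-\frac{1}{2}|00\rangle + \frac{1 + \sqrt{3}\,i}{4}|11\rangle\right) \\
&\equiv |0\rangle|\psi_0\rangle + |1\rangle|\psi_1\rangle .
\end{aligned}$$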

Note that the coefficient vectors $|\psi_0\rangle$ and $|\psi_1\rangle$ have lengths less than 1. In fact their squared lengths add to 1. A more compact way of obtaining these coefficient vectors is to use the partial inner product (cf. example 6) on the first qubit of $|\psi\rangle$:

$$|\psi_0\rangle = {}_1\langle 0|\psi\rangle_{123}, \qquad |\psi_1\rangle = {}_1\langle 1|\psi\rangle_{123} .$$

Then the outcome of the measurement is $j = 0$ or $j = 1$ with probability $\Pr(j) = \langle\psi_j|\psi_j\rangle$, given by the squared length of the corresponding coefficient vector. After the measurement, $|\psi\rangle$ is collapsed into the state $|j\rangle|\psi_j\rangle/\sqrt{\langle\psi_j|\psi_j\rangle}$ corresponding to the seen outcome, i.e. we select all terms of $|\psi\rangle$ with the seen outcome in qubit 1 and re-normalise this vector to have length 1. Thus, for example, in the above the probability of getting outcome 1 is

$$\langle\psi_1|\psi_1\rangle = \left|-\frac{1}{2}\right|^2 + \left|\frac{1 + \sqrt{3}\,i}{4}\right|^2 = \frac{1}{2}$$

and if this outcome is obtained the post-measurement state is

$$|1\rangle|\psi_1\rangle \Big/ \frac{1}{\sqrt 2} = -\frac{1}{\sqrt 2}|100\rangle + \frac{1 + \sqrt{3}\,i}{2\sqrt 2}|111\rangle,$$

which is correctly normalised to have length 1.

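Suppose instead that we wish to measure qubits 1 and 2 of $|\psi\rangle$ in the computational basis. In this case we collect together all terms of $|\psi\rangle$ corresponding to the four possible outcomes 00, 01, 10, 11:

$$\begin{aligned}
|\psi\rangle &= |00\rangle\left(\frac{3}{10}|0\rangle\right) + |01\rangle\left(-\frac{4}{10}|0\rangle + \frac{i}{2}|1\rangle\right) + |10\rangle\left(-\frac{1}{2}|0\rangle\right) + |11\rangle\left(\frac{1 + \sqrt{3}\,i}{4}|1\rangle\right) \\
&\equiv |00\rangle|\psi_{00}\rangle + |01\rangle|\psi_{01}\rangle + |10\rangle|\psi_{10}\rangle + |11\rangle|\psi_{11}\rangle .
\end{aligned}$$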

As above, these coefficient vectors can be compactly computed using partial inner products (now on qubits 1 and 2): $|\psi_{ij}\rangle = {}_{12}\langle ij|\psi\rangle_{123}$. Then upon measurement, outcome $ij$ is seen with probability $\Pr(ij) = \langle\psi_{ij}|\psi_{ij}\rangle$, the squared length of the corresponding coefficient vector, and the post-measurement state is “$|ij\rangle|\psi_{ij}\rangle$ re-normalised”, i.e. $|ij\rangle|\psi_{ij}\rangle/\sqrt{\langle\psi_{ij}|\psi_{ij}\rangle}$. For example, in the above, outcome 10 is seen with probability $\langle\psi_{10}|\psi_{10}\rangle = \frac{1}{4}$, and the post-measurement state is then $|10\rangle\left(-\frac{1}{2}|0\rangle\right) \big/ \frac{1}{2} = -|100\rangle$.

We now state the general form of the extended Born rule. Let $|\psi\rangle$ be a state of $m + n$ qubits and suppose we wish to measure $m$ qubits in the computational basis. For notational convenience suppose these are the $m$ leftmost qubits. Any computational basis state $|z\rangle$ of $m + n$ qubits can be written as $|z\rangle = |x\rangle|y\rangle$, where $z$, $x$ and $y$ are respectively $(m+n)$-bit, $m$-bit and $n$-bit strings, with $z$ being the concatenation $z = xy$ of $x$ and $y$. Hence we can write

$$|\psi\rangle = \sum_z a_z |z\rangle = \sum_{x,y} a_{xy} |x\rangle|y\rangle = \sum_x |x\rangle\left(\sum_y a_{xy}|y\rangle\right) \equiv \sum_x |x\rangle|\psi_x\rangle$$


where we have collected all terms corresponding to each possible measurement outcome $x$:

$$|\psi_x\rangle = \sum_y a_{xy}|y\rangle = {}_m\langle x|\psi\rangle_{m+n},$$

where the latter partial inner product is over the measured qubit slots (here the $m$ leftmost qubits). The vectors $|\psi_x\rangle$ have lengths at most 1. Then outcome $x$ is seen with probability

$$\Pr(x) = \langle\psi_x|\psi_x\rangle = \sum_y |a_{xy}|^2$$

and if this outcome occurs then the post-measurement collapsed state is “$|x\rangle|\psi_x\rangle$ re-normalised”, i.e. $|x\rangle|\psi_x\rangle/\sqrt{\langle\psi_x|\psi_x\rangle}$, in which the measured qubits acquire their seen values.
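The extended Born rule amounts to grouping the amplitude list into blocks, one per outcome $x$. Here is a plain-Python sketch under the stated convention that the measured qubits are the $m$ leftmost ones, so that $z = xy$ corresponds to the integer $x \cdot 2^n + y$; `measure_leftmost` is an illustrative name. The usage lines reproduce the numbers of Example 7.

```python
import math

def measure_leftmost(amps, m, n):
    """Extended Born rule: measure the m leftmost qubits of an (m+n)-qubit state.

    amps[z] is the amplitude a_z of |z>, where z = xy is read as an (m+n)-bit
    integer, so z = x * 2**n + y. Returns a dict mapping each outcome x (an m-bit
    string) of non-zero probability to (Pr(x), collapsed state amplitudes).
    """
    results = {}
    for x in range(2 ** m):
        # Coefficient vector |psi_x> = sum_y a_xy |y>, i.e. the partial inner product <x|psi>.
        psi_x = [amps[x * 2 ** n + y] for y in range(2 ** n)]
        pr = sum(abs(a) ** 2 for a in psi_x)          # squared length <psi_x|psi_x>
        if pr > 0:
            post = [0j] * len(amps)
            for y, a in enumerate(psi_x):
                post[x * 2 ** n + y] = a / math.sqrt(pr)   # |x>|psi_x> re-normalised
            results[format(x, f"0{m}b")] = (pr, post)
    return results

# Example 7: amplitudes indexed by the 3-bit strings 000 ... 111.
psi = [0j] * 8
psi[0b000] = 3 / 10
psi[0b010] = -4 / 10
psi[0b011] = 1j / 2
psi[0b100] = -1 / 2
psi[0b111] = (1 + math.sqrt(3) * 1j) / 4

first = measure_leftmost(psi, 1, 2)       # measure qubit 1 only
print(first["1"][0])                      # approximately 0.5
two = measure_leftmost(psi, 2, 1)         # measure qubits 1 and 2
print(two["10"][0], two["10"][1][0b100])  # approximately 0.25 and -1
```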

6 Quantum interference

How should we intuitively think of a superposition of $|0\rangle$ and $|1\rangle$? According to the Born rule, a qubit in a superposition state $a|0\rangle + b|1\rangle$ behaves, for the purposes of measurement, just like a qubit which has been prepared in state $|0\rangle$ or $|1\rangle$ with probabilities $|a|^2$ and $|b|^2$ respectively. However, it is important to emphasise that a superposition state cannot be interpreted as such a probabilistic mixture in more general quantum processes! To see an explicit example, consider the equal superposition state $|\psi_0\rangle = \frac{1}{\sqrt 2}(|0\rangle + |1\rangle)$ and the Hadamard gate

$$H = \frac{1}{\sqrt 2}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix},$$

which maps $|0\rangle$ to $\frac{1}{\sqrt 2}(|0\rangle + |1\rangle)$ and $|1\rangle$ to $\frac{1}{\sqrt 2}(|0\rangle - |1\rangle)$. $H$ is also its own inverse, so $H$ applied to $|\psi_0\rangle$ gives $|0\rangle$. Hence if we prepare a qubit in state $|\psi_0\rangle$, apply $H$ and then measure it, we will see outcome 0 with certainty. On the other hand, if we prepare either $|0\rangle$ or $|1\rangle$ and conduct the same process, then in each case we will see outcomes 0 and 1 with probability half each, i.e. the equal probabilistic mixture of $|0\rangle$ and $|1\rangle$ behaves differently from the equal superposition state.

In an intuitive sense we think of $|\psi_0\rangle = \frac{1}{\sqrt 2}(|0\rangle + |1\rangle)$ as “simultaneously having both $|0\rangle$ and $|1\rangle$ present” rather than just a probabilistic choice of one or the other. When we apply $H$ to $|\psi_0\rangle$, $|0\rangle$ (starting with amplitude $a = 1/\sqrt 2$) is transformed to $|0\rangle$ with amplitude $c = 1/\sqrt 2$ and to $|1\rangle$ with amplitude $c$, whereas $|1\rangle$ (also starting with amplitude $a = 1/\sqrt 2$) is transformed to $|0\rangle$ with amplitude $c$ and to $|1\rangle$ with amplitude $-c$. Thus the total amplitude to go from $|\psi_0\rangle$ to $|0\rangle$ is made up of two “paths”, $a(c) + a(c) = 1$, which add. On the other hand, the total amplitude to go to $|1\rangle$ also has two contributions, but these cancel: $a(c) + a(-c) = 0$. In the first case we say that the two paths interfere constructively, whereas in the latter case we say that they interfere destructively.
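The contrast between the equal superposition and the equal probabilistic mixture can be reproduced in a few lines. This sketch just multiplies out the $2 \times 2$ matrix $H$ and applies the Born rule; the helper names `apply` and `born` are our own.

```python
import math

s = 1 / math.sqrt(2)
H = [[s, s], [s, -s]]

def apply(U, v):
    """Apply a 2x2 matrix to a qubit amplitude vector (v0, v1)."""
    return [U[0][0] * v[0] + U[0][1] * v[1], U[1][0] * v[0] + U[1][1] * v[1]]

def born(v):
    """Outcome probabilities |v0|^2, |v1|^2 for a computational basis measurement."""
    return [abs(x) ** 2 for x in v]

# Superposition (|0> + |1>)/sqrt 2, then H, then measure: the two contributions
# to outcome 1 cancel (destructive interference).
p_superposition = born(apply(H, [s, s]))

# Probabilistic mixture: |0> or |1> with probability 1/2 each, then H, then measure.
p_from_0 = born(apply(H, [1, 0]))
p_from_1 = born(apply(H, [0, 1]))
p_mixture = [0.5 * p + 0.5 * q for p, q in zip(p_from_0, p_from_1)]

print(p_superposition)   # approximately [1, 0]
print(p_mixture)         # approximately [0.5, 0.5]
```

The mixture is handled by averaging probabilities, while the superposition adds amplitudes before squaring; that single difference is the whole effect.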

These notions of “interfering paths” form the basis of Feynman’s sum-over-paths description of quantum mechanics, which we now briefly outline. To illustrate the formalism consider the simple quantum process of applying $H$ twice to $|0\rangle$:


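$$|0\rangle \ \xrightarrow{\text{apply } H}\ \frac{1}{\sqrt 2}|0\rangle + \frac{1}{\sqrt 2}|1\rangle \ \xrightarrow{\text{apply } H}\ \frac{1}{\sqrt 2}\left(\frac{1}{\sqrt 2}|0\rangle + \frac{1}{\sqrt 2}|1\rangle\right) + \frac{1}{\sqrt 2}\left(\frac{1}{\sqrt 2}|0\rangle - \frac{1}{\sqrt 2}|1\rangle\right) = |0\rangle$$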

We think of this as a process of “transitions between basis states $|0\rangle$ and $|1\rangle$ with prescribed amplitudes”, and we depict it as a branching tree, just like a probabilistic process, but the branches are labelled by amplitudes, not probabilities:

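[Figure: amplitude tree for applying $H$ twice to $|0\rangle$. First step: $|0\rangle \to |0\rangle$ and $|0\rangle \to |1\rangle$, each with amplitude $\frac{1}{\sqrt 2}$. Second step: $|0\rangle \to |0\rangle$, $|0\rangle \to |1\rangle$ and $|1\rangle \to |0\rangle$, each with amplitude $\frac{1}{\sqrt 2}$, and $|1\rangle \to |1\rangle$ with amplitude $-\frac{1}{\sqrt 2}$.]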

The rules for accumulating amplitudes are just like those for probabilities:

(i) each path has an amplitude given by the product of numbers along the path;

(ii) each final basis state has an amplitude given by the sum over all paths from the start to it;

(iii) the probability for the transition from initial $|0\rangle$ to a final basis state is the modulus squared of the sum over all paths to it.

This is the Feynman sum-over-paths formulation of quantum processes. It turns out to be an alternative equivalent description of the calculations involved in multiplying gate matrices and applying the Born rule for a measurement on the final state of the process.

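As an illustration, in our example above there are two paths to go from initial $|0\rangle$ to final $|0\rangle$ or final $|1\rangle$. For final $|0\rangle$ the two paths interfere constructively:

$$\left|\frac{1}{\sqrt 2}\cdot\frac{1}{\sqrt 2} + \frac{1}{\sqrt 2}\cdot\frac{1}{\sqrt 2}\right|^2 = 1,$$

whereas for final $|1\rangle$ the two paths interfere totally destructively:

$$\left|\frac{1}{\sqrt 2}\cdot\frac{1}{\sqrt 2} - \frac{1}{\sqrt 2}\cdot\frac{1}{\sqrt 2}\right|^2 = 0,$$

and this transition is forbidden.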

For probabilistic trees we can simulate the whole process by walking through some single path of the tree, so long as we make an appropriate probabilistic choice of direction at each node along the way. But for quantum amplitude trees this is not possible! In the above example there are individual paths with non-zero amplitude from initial $|0\rangle$ to final $|1\rangle$, yet this transition is forbidden. The process “needs to know about” all paths and how they interfere!


Just as in the example above, any quantum process of applying a sequence of unitary gates to an initial starting state of, say, $n$ qubits in state $|0\rangle \ldots |0\rangle$ can be represented as a branching tree of amplitudes. The nodes are now labelled by $n$-bit strings organised in columns, such that each column contains each $n$-bit string once. The quantum state at the $k$th stage of the process corresponds to the nodes in the $k$th column of the tree. The lines connecting column $k$ to column $k + 1$ are labelled by the amplitudes of the $k$th unitary gate acting on the corresponding basis state in the $k$th column. The Feynman sum-over-paths rules then correctly reproduce the calculation of matrix multiplication of the corresponding unitary matrices to determine the amplitudes in the final state.
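Rules (i) to (iii) can be implemented literally by enumerating every path through the tree, and compared with ordinary matrix multiplication. This is a sketch for small examples only (the number of paths grows exponentially with the number of gates); the function names are our own, and matrices are lists of rows with entry `gate[j][i]` taken as the amplitude on the branch $|i\rangle \to |j\rangle$.

```python
import math
from itertools import product

def sum_over_paths(gates, start, dim):
    """Final amplitudes by Feynman's sum over paths.

    gates: non-empty list of dim x dim matrices (lists of rows), applied in order;
    gate[j][i] is the amplitude on the branch |i> -> |j>. start: index of the
    initial basis state. There are dim ** len(gates) paths, so this is only
    practical for tiny examples.
    """
    final = [0j] * dim
    for path in product(range(dim), repeat=len(gates)):
        amp = 1 + 0j
        current = start
        for gate, nxt in zip(gates, path):
            amp *= gate[nxt][current]          # rule (i): multiply along the path
            current = nxt
        final[path[-1]] += amp                 # rule (ii): sum over paths ending here
    return final

def by_matrix_multiplication(gates, start, dim):
    """The same final amplitudes, computed as successive matrix-vector products."""
    v = [0j] * dim
    v[start] = 1
    for gate in gates:
        v = [sum(gate[j][i] * v[i] for i in range(dim)) for j in range(dim)]
    return v

s = 1 / math.sqrt(2)
H = [[s, s], [s, -s]]
amps = sum_over_paths([H, H], 0, 2)
print([abs(a) ** 2 for a in amps])     # rule (iii): approximately [1, 0]
print(by_matrix_multiplication([H, H], 0, 2))
```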

Remark: in physics this phenomenon of interference of different paths for a single particle is well illustrated by the famous double slit experiment (see for example R. P. Feynman, The Feynman Lectures on Physics, volume 3, chapter 1, for a discussion). We set up a screen with two holes in it and, a short distance behind it, erect a second full screen which can detect the impact of a particle (photon or electron etc.) on it at each position. Then we fire single particles at the first screen and register where they impact on the second screen. There are locations $x$ on the second screen with the following property: if we close either of the two holes then the probability for the particle to impact at $x$ is non-zero (i.e. the particle is sometimes registered at $x$), but if we open both holes the probability to impact at $x$ is zero, i.e. the particle is never seen there! This actually happens in real physical experiments! Classically this is inexplicable: the probability to impact at $x$ should be the sum of the probabilities via the two holes. In the quantum formalism we have amplitudes (prior to probabilities) to impact at $x$ via each hole, and these interfere destructively before we square the total value to get the probability, i.e. we use the “square of the sum” rather than the “sum of squares”.

7 Quantum non-locality

One of the strangest and most controversial features of quantum theory (actually a property of entanglement) is that the formalism implies the presence of “instantaneous non-local effects” in spatially distributed physical systems. Einstein was very unhappy about this, referring to it as a “spooky action at a distance”, causing him to doubt the completeness and correctness of quantum mechanics as a physical theory. He first drew attention to this feature (in a form now known as the Einstein-Podolsky-Rosen (EPR) paradox) in a famous paper in 1935, and subsequently never gave up his skepticism of quantum theory. In the 1950’s and 1960’s the essential ingredients were re-cast in conceptually simpler ways, especially in the so-called Bell inequalities (first introduced by J. S. Bell in 1965). Since then the validity of these predicted effects has been verified many times in actual experiments. Thus even if the mathematical formalism of quantum theory turns out not to be our final correct physical theory, these “spooky nonlocal influences” are here to stay as actual physical reality!

We will now describe one way in which this quantum nonlocality is manifested. It also provides a very good instructive example of the extended Born rule in action in conjunction with the application of various unitary quantum gates.


Consider the entangled 2-qubit state

$$|\psi\rangle = \frac{1}{\sqrt 2}(|00\rangle + |11\rangle).$$

We imagine that the two qubits (realised as states of two physical particles) are widely separated in space, held respectively on the left by Alice (A) and on the right by Bob (B). Even though the two particles are widely separated and have no “physical connection”, their joint state $|\psi\rangle$ is not expressible as a product of a qubit state for A with a qubit state for B. Nowadays this kind of state can be routinely manufactured in the laboratory with pairs of photons or electrons etc. that move off in opposite directions from the common source of their creation and interaction to produce $|\psi\rangle$.

Suppose Alice measures her qubit (in the computational basis). According to the extended Born rule she will see outcome 0 or 1 with equal probabilities half, and Bob’s particle will be “instantaneously collapsed” into state either $|0\rangle$ or $|1\rangle$ corresponding to A’s seen outcome. If B now immediately measures his qubit he will see outcome 0 or 1, which is always the same as the outcome A got, i.e. the two outcomes are perfectly correlated. Thus at first sight we may conclude that A’s measurement must have instantaneously physically influenced B’s particle (by determining B’s outcome), but this conclusion here is not justified! Whether or not A performed her measurement, B will see a bit value 0 or 1 with equal probabilities half. The physical fact that A and B’s outcomes are perfectly correlated is in fact not mysterious: it can be explained without invoking any kind of instantaneous action at a distance. We simply say that at the source of creation of $|\psi\rangle$ (when the two particles were together) they were each instilled with an equal but randomly chosen bit value which they carried away and later gave up as the measurement result, i.e. the (correlated) measurement results were already (randomly) chosen at the time of creation of $|\psi\rangle$!

But one may still object: according to the Born rule A’s measurement instantaneously changed the state of B’s particle from preparation 1: “the right half of $|\psi\rangle$” to preparation 2: “a probabilistic choice of $|0\rangle$ or $|1\rangle$ chosen with probabilities half”. However, the mathematical descriptions of these preparations are not physically observable features: given a particle in some quantum state, we are unable to learn the identity of that state by any physical process (in contrast to classical physics states). In fact it can be shown that preparations 1 and 2 above cannot be distinguished by any quantum process at all: they actually give identical probability distributions of outcomes for any measurement whatever. Thus this “instantaneous change of state” in the above example can be claimed to be just a quirk of the mathematical formalism and not a physically real influence!

We now develop a more complicated scenario involving quantum processes with $|\psi\rangle$ which does turn out to be manifestly mysterious, i.e. having physically observable effects that cannot be explained without some “spooky action at a distance”. We will use the results of the following example.

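Example 8 Let $|\psi\rangle$ be the entangled 2-qubit state $|\psi\rangle = \frac{1}{\sqrt 2}(|00\rangle + |11\rangle)$ and consider the single-qubit unitary operation (called “rotation by $\theta$”) given by

$$U(\theta) = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}.$$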

Thus

$$U(\theta)|0\rangle = \cos\theta\,|0\rangle + \sin\theta\,|1\rangle, \qquad U(\theta)|1\rangle = -\sin\theta\,|0\rangle + \cos\theta\,|1\rangle .$$

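If we apply $U(\alpha) \otimes U(\beta)$ to $|\psi\rangle$, a direct calculation (which you should verify) gives

$$\begin{aligned}
|\psi_{\alpha\beta}\rangle &= U(\alpha) \otimes U(\beta)\,|\psi\rangle \\
&= \frac{1}{\sqrt 2}\big(\cos(\alpha - \beta)|00\rangle - \sin(\alpha - \beta)|01\rangle + \sin(\alpha - \beta)|10\rangle + \cos(\alpha - \beta)|11\rangle\big).
\end{aligned}$$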

(Here we have used the trigonometric identities $\cos(\alpha - \beta) = \cos\alpha\cos\beta + \sin\alpha\sin\beta$ and $\sin(\alpha - \beta) = \sin\alpha\cos\beta - \cos\alpha\sin\beta$.) Note that this operation may be implemented locally, by Alice applying $U(\alpha)$ and Bob applying $U(\beta)$.

The Born rule immediately gives (as you should verify):

Fact 1: if we measure either one of the qubits in the computational basis we will get

output 0 or 1 with equal probabilities of half.

Consider now measuring both qubits. We can do this either by applying the Born rule directly for the four possible outcomes 00, 01, 10, 11, or else by measuring one chosen qubit first (applying the extended Born rule) and then applying the Born rule again to the post-measurement state of the remaining qubit. These approaches all give the same probability distribution of 2-bit outcomes. Thus this measurement can be implemented locally, by A and B each separately performing a 1-qubit measurement. We get:

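Fact 2: if we measure both qubits then $\Pr(\text{outcomes differ}) = \sin^2(\alpha-\beta)$, since the amplitudes of $|01\rangle$ and $|10\rangle$ each have squared modulus $\sin^2(\alpha-\beta)/2$. (Note that “outcomes differ” means “outcome is 01 or 10”.)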
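The calculation of Example 8 and Facts 1 and 2 can be checked numerically. The following is a minimal sketch using plain Python lists (no external libraries); the helper names `rotation`, `kron`, `psi_alpha_beta`, `pr_alice_zero` and `pr_differ` are our own illustrative choices, angles are in radians, and the basis order is 00, 01, 10, 11 with Alice’s qubit on the left.

```python
import math

def rotation(theta):
    """U(theta) = [[cos, -sin], [sin, cos]] as a nested list (theta in radians)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def kron(a, b):
    """Tensor (Kronecker) product of two square matrices given as nested lists."""
    m = len(b)
    size = len(a) * m
    return [[a[i // m][j // m] * b[i % m][j % m] for j in range(size)]
            for i in range(size)]

def apply(mat, vec):
    """Matrix-vector product."""
    return [sum(row[j] * vec[j] for j in range(len(vec))) for row in mat]

def psi_alpha_beta(alpha, beta):
    """Amplitudes of U(alpha) (x) U(beta)|psi> in the order 00, 01, 10, 11."""
    r = 1 / math.sqrt(2)
    psi = [r, 0.0, 0.0, r]                      # (|00> + |11>)/sqrt(2)
    return apply(kron(rotation(alpha), rotation(beta)), psi)

def pr_alice_zero(alpha, beta):
    """Fact 1: probability that Alice's qubit gives 0 (outcomes 00 and 01)."""
    amps = psi_alpha_beta(alpha, beta)
    return amps[0] ** 2 + amps[1] ** 2          # all amplitudes are real here

def pr_differ(alpha, beta):
    """Fact 2: probability of outcome 01 or 10."""
    amps = psi_alpha_beta(alpha, beta)
    return amps[1] ** 2 + amps[2] ** 2

a, b = math.radians(30), math.radians(-30)
print(pr_alice_zero(a, b), pr_differ(a, b))    # approx. 0.5 and 0.75
```

Because every amplitude is real in this example, squaring suffices; a general state would need `abs(amp) ** 2`.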

We now describe our “mysterious situation”. Let $M_A(\alpha)$ denote the following operation for Alice: apply $U(\alpha)$ to her qubit and measure it in the computational basis. Similarly $M_B(\beta)$ for Bob. Alice will only use settings $\alpha = 0^\circ$ or $30^\circ$ and Bob will only use settings $\beta = 0^\circ$ or $-30^\circ$.

Suppose A and B have many copies of the state $|\psi\rangle$ and consider a long sequence of A and B doing $M_A(\alpha)$, $M_B(\beta)$, with each choosing one of their allowed settings (for the whole sequence). For long sequences the probabilities will be reflected in the frequencies of occurrence of 0’s and 1’s. By Fact 1 all individual sequences obtained will be uniformly random sequences of 0’s and 1’s, but by Fact 2 A’s and B’s random sequences will display some correlations. We have four cases of A’s and B’s sequences, corresponding to the pairs of settings:

(1) $M_A(0^\circ)$, $M_B(0^\circ)$: $\Pr(\text{differ}) = \sin^2 0^\circ = 0$. A’s and B’s sequences will be the same sequence.

(2) $M_A(0^\circ)$, $M_B(-30^\circ)$: $\Pr(\text{differ}) = \sin^2 30^\circ = 1/4$. The sequences will differ in about 1 in 4 places.

(3) $M_A(30^\circ)$, $M_B(0^\circ)$: $\Pr(\text{differ}) = \sin^2 30^\circ = 1/4$. The sequences will differ in about 1 in 4 places.

(4) $M_A(30^\circ)$, $M_B(-30^\circ)$: $\Pr(\text{differ}) = \sin^2 60^\circ = 3/4$. The sequences will differ in about 3 in 4 places!!
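To see what the four cases mean for finite sequences, the sketch below samples many runs from the Born-rule probabilities of Facts 1 and 2 and reports the fraction of places where A’s and B’s sequences differ. It is an illustration under stated assumptions: the function names are hypothetical, a fixed pseudo-random seed is used, and, importantly, the sampler looks at both settings at once, so it reproduces the quantum statistics without being a local model.

```python
import math
import random

def sample_pair(alpha_deg, beta_deg, rng):
    """One run of M_A(alpha), M_B(beta) on a fresh copy of |psi>, sampled from
    Pr(00) = Pr(11) = cos^2(alpha - beta)/2 and Pr(01) = Pr(10) = sin^2(alpha - beta)/2.
    Note: this sampler uses BOTH settings, so it is not a local model."""
    p_same = math.cos(math.radians(alpha_deg - beta_deg)) ** 2
    x = rng.randrange(2)                          # Fact 1: Alice's bit is uniform
    y = x if rng.random() < p_same else 1 - x     # Fact 2: differ w.p. sin^2
    return x, y

def run(alpha_deg, beta_deg, n=20000, seed=1):
    """Return (fraction of places where the sequences differ, fraction of Alice's 1's)."""
    rng = random.Random(seed)
    pairs = [sample_pair(alpha_deg, beta_deg, rng) for _ in range(n)]
    differ = sum(x != y for x, y in pairs) / n
    alice_ones = sum(x for x, _ in pairs) / n
    return differ, alice_ones

for a, b in [(0, 0), (0, -30), (30, 0), (30, -30)]:
    differ, alice_ones = run(a, b)
    theory = math.sin(math.radians(a - b)) ** 2
    print(f"M_A({a}), M_B({b}): differ {differ:.3f}, theory {theory:.3f}, "
          f"Alice's 1's {alice_ones:.3f}")
```

Each party’s own sequence is uniformly random in every case; only the comparison between the two sequences reveals the settings-dependent correlations.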

Suppose the measurement outcomes at A (respectively B) are not influenced by the mere choice of settings at B (respectively A). This is our “locality assumption” and we now show that it leads to a contradiction. Let S denote the common sequence that would be obtained by A and B in (1) if they had both chosen angle setting zero. By (2) and (3) the sequences for $M_A(30^\circ)$ and $M_B(-30^\circ)$ must each differ in about 1 in 4 places from S, and hence in about 1 in 2 or fewer places from each other (fewer is possible here because they may both differ from S at some of the same places). By the locality assumption this must remain true even if neither A nor B actually chose to use angle zero! The sequences for $M_A(30^\circ)$ and $M_B(-30^\circ)$ must each be consistent with the possibility that the other party chose angle zero, even if they did not so choose! I.e. the locality assumption implies the existence of a sequence S such that the $\pm 30^\circ$ sequences each differ in about 1 in 4 places from S (and hence in at most about 2 in 4 places from each other). But by (4) these sequences actually differ in about 3 in 4 places: measurably more (for long sequences) than the at most 2 in 4 places allowed! We conclude that the choice of settings on one side must have influenced the outcomes on the other side, i.e. the $\pm 30^\circ$-setting sequences cannot be unaffected by the choice of zero versus non-zero angle setting on the other side, and we have our “spooky action at a distance”.
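The contradiction can also be phrased as an inequality on the four “differ” probabilities $d_1,\dots,d_4$ of cases (1) to (4). Under the locality assumption each run can be described by outputs fixed in advance for both of each party’s settings (shared data and any local randomness only produce a probabilistic mixture of such assignments), and for every such assignment the triangle inequality gives $d_4 \le d_1 + d_2 + d_3$. The sketch below, with function names of our own choosing, checks this by brute force over all 16 assignments and compares with the quantum values $0, 1/4, 1/4, 3/4$.

```python
import math
from itertools import product

def differ_indicators(a0, a30, b0, bm30):
    """'Outputs differ' indicators for the setting pairs (1)-(4), for one run of a
    local deterministic strategy: a0, a30 are Alice's outputs for 0 and 30 degrees,
    b0, bm30 are Bob's outputs for 0 and -30 degrees."""
    return (a0 != b0, a0 != bm30, a30 != b0, a30 != bm30)

def max_violation():
    """Largest value of d4 - (d1 + d2 + d3) over all 16 local deterministic strategies."""
    worst = -math.inf
    for a0, a30, b0, bm30 in product((0, 1), repeat=4):
        d1, d2, d3, d4 = differ_indicators(a0, a30, b0, bm30)
        worst = max(worst, d4 - (d1 + d2 + d3))
    return worst

quantum = [math.sin(math.radians(t)) ** 2 for t in (0, 30, 30, 60)]
print(max_violation())                  # 0: no local strategy violates the bound
print(quantum[3] - sum(quantum[:3]))    # about 0.25 > 0: the quantum values violate it
```

Averaging the deterministic bound over any distribution of hidden data keeps it valid, which is why checking the 16 assignments is enough.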

Note that all sequences above are always uniformly random as sequences of 0’s and 1’s, i.e. our non-local action always replaces one uniformly random sequence by another uniformly random sequence, so the participants will not notice any effect locally: the influence of the non-local effects appears only in correlations between A’s and B’s sequences, which can be noticed only if A and B later communicate or come together. Hence although the necessity of non-local effects is manifested, these effects cannot be used to instantaneously communicate or send a message: A’s choice of settings has no effect whatever on any statistics of measurement on B’s side taken by itself (and vice versa).

Remark (Optional but interesting!): we can formalise the notions of non-locality and no-signalling for any physical theory in the following general conceptual framework. We have two participants A and B in spatially separated regions, performing measurements on a physical system that extends over both regions. They are able to make local choices of measurement settings $\alpha$ and $\beta$ respectively (e.g. angle choices), giving measurement outputs x and y respectively. Outcomes are generally probabilistic, with a joint probability distribution $\Pr(xy|\alpha\beta)$, the probability of getting x, y given settings $\alpha, \beta$. This joint probability is prescribed by the laws of some physical theory, or maybe determined from repeated actual experiments. Each participant A or B sees locally the marginal distribution:

$$A:\ \Pr(x|\alpha\beta) = \sum_y \Pr(xy|\alpha\beta), \qquad B:\ \Pr(y|\alpha\beta) = \sum_x \Pr(xy|\alpha\beta).$$

We will be interested in these local distributions and their correlations, asking especially whether the locally observed outcomes can be explained by purely local influences or whether they are influenced by settings or actions on the other side. We allow A and B to have communicated in the past, e.g. they may have met and exchanged information before becoming spatially separated. In particular, to generate future probabilistic outputs such as our x and y, they may have set up some shared probabilistic data in a variable e with distribution $\Pr(e)$, which they can both use as an input to generate later probabilistic outputs. Thus we will have $\Pr(xy|\alpha\beta) = \sum_e \Pr(e)\Pr(xy|\alpha\beta e)$.

No-signalling theories: if the x distribution $\Pr(x|\alpha\beta e)$ depends on the choice of $\beta$ then B can signal to A by choosing different $\beta$ settings, which A can notice from the statistics of her local x outputs. Thus the theory is called no-signalling if

$$\Pr(x|\alpha\beta e) \text{ is independent of } \beta \quad\text{and}\quad \Pr(y|\alpha\beta e) \text{ is independent of } \alpha. \qquad (13)$$

Local vs. non-local theories: the theory is called local if the joint distribution of xy outputs is explainable by putting together independent local effects (including the previously shared e used locally on each side), i.e. we require the joint probability to have the product form

$$\Pr(xy|\alpha\beta e) = \Pr(x|\alpha e)\,\Pr(y|\beta e) \qquad (14)$$

and then also

$$\Pr(xy|\alpha\beta) = \sum_e \Pr(e)\,\Pr(x|\alpha e)\,\Pr(y|\beta e).$$

For locality, there must exist three distributions $\Pr(e)$, $\Pr(x|\alpha e)$ and $\Pr(y|\beta e)$ making these relations hold. Otherwise the theory is called non-local. Note that locality implies no-signalling: e.g. if we sum eq. (14) over y then, using $\sum_y \Pr(y|\beta e) = 1$, we see that $\Pr(x|\alpha\beta e) = \Pr(x|\alpha e)$ is independent of $\beta$.

Our above quantum protocol can be easily recast to show that quantum theory is a

non-local theory i.e. no choices of distributions Pr(e), Pr(x[αe), Pr(y[βe) can correctly

reproduce the quantum distributions Pr(xy[αβ) of the protocol. It can also be shown

that quantum theory is a no-signalling theory which is perhaps surprising once we know

it is non-local! Our quantum protocol above demonstrated no-signalling only for its

particular situation (in fact local outcomes were always uniformly random, independent

of the choice of settings of the other party) but it can be proved to hold generally for any

spatially distributed quantum process whatever.
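The no-signalling property of our particular protocol can be checked directly from its joint distribution. The sketch below tests the marginal conditions at the level of the observed distribution $\Pr(xy|\alpha\beta)$ (i.e. with no shared variable e). For contrast it also includes `toy_signalling`, a hypothetical non-physical box of our own invention whose output on Alice’s side reveals Bob’s setting; all function names are illustrative.

```python
import math

def joint(alpha_deg, beta_deg):
    """Quantum joint distribution Pr(xy | alpha beta) of the protocol (Facts 1 and 2)."""
    d = math.radians(alpha_deg - beta_deg)
    same, diff = math.cos(d) ** 2 / 2, math.sin(d) ** 2 / 2
    return {(0, 0): same, (1, 1): same, (0, 1): diff, (1, 0): diff}

def toy_signalling(alpha_deg, beta_deg):
    """Hypothetical box in which Alice's bit simply reveals whether beta != 0."""
    x = int(beta_deg != 0)
    return {(u, v): (0.5 if u == x else 0.0) for u in (0, 1) for v in (0, 1)}

def is_no_signalling(box, alphas, betas, tol=1e-12):
    """Alice's marginal must not depend on beta, and Bob's must not depend on alpha."""
    for a in alphas:
        for x in (0, 1):
            vals = [sum(box(a, b)[(x, y)] for y in (0, 1)) for b in betas]
            if max(vals) - min(vals) > tol:
                return False
    for b in betas:
        for y in (0, 1):
            vals = [sum(box(a, b)[(x, y)] for x in (0, 1)) for a in alphas]
            if max(vals) - min(vals) > tol:
                return False
    return True

print(is_no_signalling(joint, [0, 30], [0, -30]))           # True
print(is_no_signalling(toy_signalling, [0, 30], [0, -30]))  # False
```

This only checks one distribution numerically; the general statement that every spatially distributed quantum process is no-signalling requires a proof, as noted above.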

8 Quantum teleportation

Consider again our distantly separated participants Alice and Bob who each possess one qubit of the entangled state

$$|\psi\rangle = \frac{1}{\sqrt{2}}(|00\rangle + |11\rangle).$$

Suppose Alice has another qubit in some state $|\alpha\rangle$ and she wants to transfer this qubit state to Bob. How can she achieve this transfer? She may not even know the identity of the state $|\alpha\rangle$, and according to quantum measurement theory she is unable to learn more than a small amount of information about it before totally destroying it! She can place the (physical system embodying the) qubit state in a “box” and physically carry it across to Bob. But is there any other way? What if the space region in between A and B is a hostile and dangerous place?

Quantum teleportation provides an alternative method for state transfer that utilises the entanglement in the state $|\psi\rangle$. As we’ll see precisely in a moment, but speaking intuitively now, the state transfer from A to B is achieved “without the state having to pass through the space in between” in the following sense: the transference is unaffected by any physical process whatever that takes place in the intervening space. Note that this is also a feature of the entanglement of $|\psi\rangle$: in the previous section we argued that there is some kind of “non-local connection” between the two particles, but the entangled state remains entirely unaffected by any physical process occurring in the space in between; it can change only by physical actions on the particles themselves.

Let qubit 1, qubit 2 and qubit 3 denote respectively Alice’s input qubit (in state $|\alpha\rangle$), Alice’s qubit of $|\psi\rangle$ and Bob’s qubit of $|\psi\rangle$. Using subscripts to label the qubits, the starting state can be written $|\alpha\rangle_1|\psi\rangle_{23}$, with 1, 2 in A’s possession and 3 in B’s possession.

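If $|\alpha\rangle = a|0\rangle + b|1\rangle$ then we explicitly have

$$|\alpha\rangle|\psi\rangle = (a|0\rangle + b|1\rangle)\,\frac{1}{\sqrt{2}}(|00\rangle + |11\rangle) = \frac{a}{\sqrt{2}}|000\rangle + \frac{a}{\sqrt{2}}|011\rangle + \frac{b}{\sqrt{2}}|100\rangle + \frac{b}{\sqrt{2}}|111\rangle. \qquad (15)$$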

The protocol for quantum teleportation comprises the following five steps (i) to (v).

(i) Alice applies CNOT to her two qubits 1 and 2 (with 1 being the control and 2 the

target).

(ii) Alice applies H to her qubit 1.

(iii) Alice measures her two qubits (in the computational basis) to obtain a 2-bit string

00, 01, 10 or 11.

The combination of (i), (ii) and (iii) is called “performing a Bell measurement” on the two qubits (named after the physicist J. S. Bell). By calculating the effect of these three operations on eq. (15) we see that each 2-bit string is obtained with equal probability 1/4 (irrespective of the values of a and b, recalling that $|a|^2 + |b|^2 = 1$). Furthermore, after the measurements in (iii) we have the following post-measurement states (as you should calculate), where each outcome is written as the result for qubit 1 followed by the result for qubit 2:

mmt outcome    post-mmt state
00             $|00\rangle\,|\alpha\rangle$
01             $|01\rangle\,X|\alpha\rangle$
10             $|10\rangle\,Z|\alpha\rangle$
11             $|11\rangle\,XZ|\alpha\rangle$

I.e. Bob’s qubit 3 is now disentangled from qubits 1 and 2, and it is in a state that is a fixed transform of $|\alpha\rangle$, the choice of transform depending only on the measurement outcome and not on the identity of $|\alpha\rangle$ (i.e. not on the values of a and b). In fact if qubit 2 gives outcome i and qubit 1 gives outcome j (so the table entry is the string ji), then Bob’s qubit has state $X^i Z^j|\alpha\rangle$.

(iv) Alice sends the 2-bit measurement outcome ij to Bob (i.e. she sends him 2 bits of classical information).

(v) On receiving ij Bob applies the unitary operation $Z^j X^i$ (i.e. the inverse of $X^i Z^j$) to his qubit, which is then guaranteed to be in state $|\alpha\rangle$.

This completes the teleportation of [α` from Alice to Bob. Note that no remnant of any

information about [α` remains with Alice. After stage (iii) she is left with only a 2-bit

string that has always been chosen uniformly at random.
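The whole protocol can be simulated on a 3-qubit state vector. The following is a minimal pure-Python sketch (our own helper names, not from the notes): it prepares eq. (15), applies steps (i) and (ii), splits the state by the outcomes $m_1, m_2$ of measuring qubits 1 and 2, and applies Bob’s correction with $j = m_1$ and $i = m_2$, i.e. first $X^i$ and then $Z^j$. Qubit 1 is the leftmost bit of each basis index.

```python
import math
import random

R = 1 / math.sqrt(2)
H = [[R, R], [R, -R]]

def apply_1q(state, gate, target):
    """Apply a 2x2 gate to qubit `target` (1, 2 or 3; qubit 1 is the leftmost bit)."""
    shift = 3 - target
    new = [0j] * 8
    for idx, amp in enumerate(state):
        bit = (idx >> shift) & 1
        base = idx & ~(1 << shift)
        for out in (0, 1):
            new[base | (out << shift)] += gate[out][bit] * amp
    return new

def cnot(state, control, target):
    """CNOT with the given control and target qubits (numbered 1..3)."""
    new = [0j] * 8
    for idx, amp in enumerate(state):
        dest = idx
        if (idx >> (3 - control)) & 1:
            dest ^= 1 << (3 - target)
        new[dest] += amp
    return new

def bell_measurement_branches(a, b):
    """Steps (i)-(iii) applied to eq. (15). Returns
    {(m1, m2): (probability, Bob's normalised amplitudes [c0, c1])}."""
    state = [0j] * 8
    state[0b000] = state[0b011] = a * R           # a/sqrt(2) (|000> + |011>)
    state[0b100] = state[0b111] = b * R           # b/sqrt(2) (|100> + |111>)
    state = cnot(state, 1, 2)                     # step (i)
    state = apply_1q(state, H, 1)                 # step (ii)
    branches = {}
    for m1 in (0, 1):                             # step (iii): Born rule
        for m2 in (0, 1):
            bob = [state[(m1 << 2) | (m2 << 1) | q3] for q3 in (0, 1)]
            p = sum(abs(c) ** 2 for c in bob)
            branches[(m1, m2)] = (p, [c / math.sqrt(p) for c in bob])
    return branches

def correct(m1, m2, bob):
    """Step (v) with j = m1 (qubit 1) and i = m2 (qubit 2): apply X^i, then Z^j."""
    c0, c1 = bob
    if m2:
        c0, c1 = c1, c0                           # X swaps the amplitudes
    if m1:
        c1 = -c1                                  # Z flips the sign of |1>
    return [c0, c1]

def teleport(a, b, rng):
    """Sample a measurement outcome, then let Bob apply his correction."""
    branches = bell_measurement_branches(a, b)
    outcomes = list(branches)
    weights = [branches[o][0] for o in outcomes]
    m1, m2 = rng.choices(outcomes, weights=weights)[0]
    return (m1, m2), correct(m1, m2, branches[(m1, m2)][1])

a, b = 0.6, 0.8j                                  # |a|^2 + |b|^2 = 1
print(teleport(a, b, random.Random(3)))           # Bob ends with approx. [0.6, 0.8j]
```

Every branch has probability 1/4 whatever a and b are, and the test of each branch against $|\alpha\rangle$ after correction is a convenient way to check the post-measurement table by hand.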

The whole protocol is shown diagrammatically in figure 1. In figure 2 we give an alternative depiction of the protocol as a network of quantum gates. This representation is perhaps more pertinent to computation (rather than communication), using teleportation to transfer qubits between different parts of a quantum memory.

[Figure 1 image: spacetime diagram with SPACE and TIME axes; Alice and Bob; times $t_0$, $t_1$, $t_2$; labels “$|\psi\rangle$ state created”, “A’s input qubit”, “Bell Mmt”, “two bits ij transmitted”, “$Z^jX^i$”, “B’s output qubit”.]

Figure 1: Quantum teleportation. The figure shows a spacetime diagram with the entangled state $|\psi\rangle = \frac{1}{\sqrt{2}}(|00\rangle + |11\rangle)$ created at $t_0$ and subsequently distributed to A and B. At time $t_1$ Alice performs a Bell measurement on the joint state of her input qubit and her qubit from $|\psi\rangle$ and sends the outcome ij to Bob. On reception at time $t_2$, Bob applies $Z^jX^i$ to his particle, which is then guaranteed to be in the same state as Alice’s original input qubit.

[Figure 2 image: quantum network with inputs $|\alpha\rangle$ and $|\psi\rangle$; gates CNOT, H, “measure qubits” producing ij, then $Z^jX^i$; output $|\alpha\rangle$.]

Figure 2: A quantum network for teleportation. The diagram is read from left to right. Horizontal lines represent qubits in a quantum memory. As a result of the sequence of operations shown, the qubit state is transferred from the top line to the bottom line.

We conclude this section with a few further remarks about the teleportation process, for contemplation.

• Unlike “Star Trek” teleportation, the physical system embodying $|\alpha\rangle$ is not transferred from A to B. Only the “information” of the state’s identity is transferred, residing finally in a new physical system, i.e. the system that was initially Bob’s half of $|\psi\rangle$.

• Before A’s measurements in (iii) Bob’s qubit has preparation: “the right half of $|\psi\rangle$”. After A’s measurement Bob’s qubit has preparation: “one of the four states $|\alpha\rangle$, $Z|\alpha\rangle$, $X|\alpha\rangle$ or $XZ|\alpha\rangle$, chosen uniformly at random”. It can be shown that for any measurement process on Bob’s qubit, these two preparations give identical probability distributions of outcomes, so Bob cannot notice any change at all in his qubit’s behaviour as a result of A’s measurements. He can reliably create the qubit state $|\alpha\rangle$ only after receiving the ij message from Alice.

• Figure 1 highlights one of the most enigmatic features of the quantum teleportation process. The question is this: Alice succeeds in transferring the quantum state $|\alpha\rangle$ to Bob by sending him just two bits of classical information. Clearly these two bits are vastly inadequate to specify the state (whose description depends on continuous parameters), so “how does the remaining information get across to Bob?” What carries it? What route does it take? Usually when information is transferred from one location to another, it requires a channel for its transmission! But in figure 1 there is clearly another route connecting Alice to Bob (apart from the channel carrying the two classical bits) and it does indeed carry a qubit: it runs backwards in time from Alice to the creation of the $|\psi\rangle$ state and then forwards in time to Bob. Hence it is tempting to assert that most of the “quantum information” of $|\alpha\rangle$ was propagated along this route, firstly backwards in time and then forwards to Bob! In this view, at times between $t_0$ and $t_1$ this part of the state’s information was already well on its way to Bob even though Alice had not yet performed her measurement! Such statements may appear paradoxical but further consideration shows that, as an interpretation, this view is fully consistent and sound. Whether or not you accept this view as a correct description of what actually happens in the real physical world is only a matter of personal preference!

9 Quantum computation – the circuit model

Classical computation can be represented in many different but essentially equivalent ways. We have the basic Turing machine and circuit (or gate array) models, useful for studying fundamental properties of the notions of computation and complexity, and a variety of programming languages incorporating higher level constructs useful for developing actual practical algorithms to solve computational problems. In the quantum case we will adopt a quantum generalisation of the circuit model; this provides the simplest and most intuitive passage to the notion of a quantum computation. Thus we begin by reviewing the ingredients of the classical circuit model.

Computational tasks:

The input to a computation will always be taken to be a bit string. The input size is

the number of bits in the bit string. For example if the input is 0110101 then the input

size is 7. A computational task is not just a single task such as “is 10101 prime?”

(where we are interpreting the bit string as an integer in binary) but a whole family of

similar tasks such as “given an n-bit string A (for any n), is A prime?” The output of

a computation is also a bit string. If this is a single bit then the computational task is

called a decision problem.

Let $B = \{0, 1\}$ and let $B^n$ denote the set of all n-bit strings. Let $B^*$ denote the set of all n-bit strings, for all n, i.e. $B^* = \bigcup_{n=1}^{\infty} B^n$. A subset of $B^*$ is called a language.

Thus a decision problem corresponds to the recognition of a language, viz. those inputs that give output 1. For example primality testing as above is the decision problem of recognising the language $L \subseteq B^*$, where L is the subset of all bit strings that represent prime numbers in binary. Many computational tasks have outputs that are bit strings of length > 1. For example FACTOR(x) has input bit string x and outputs a bit string y which is a factor of x (interpreting x and y as integers in binary).

Circuit model of classical computation:

For each n the computation with inputs of size n begins with the input string $x = b_1 \ldots b_n$ extended with a number of extra bits all set to 0, viz. $b_1 \ldots b_n 00\ldots0$. These latter bits provide “extra working space” that may be needed in the course of the computation. A computational step is the application of a designated Boolean operation (or Boolean gate) to designated bits, thus updating the total bit string. We restrict the Boolean gates to be AND, OR, NOT and COPY (which maps b to the 2-bit string bb). It can be shown that these operations are universal, i.e. any Boolean function $f : B^m \to B^n$ at all can be constructed by the sequential application of just these simple operations. The output of the computation is the value of some designated subset of bits after the final step.

For each input size n we have a so-called circuit $C_n$, which is a prescribed sequence of computational steps. $C_n$ depends only on n and not on the particular input x of size n. In total we have a circuit family $(C_1, C_2, \ldots, C_n, \ldots)$. We think of $C_n$ as “the computer program” for inputs of size n. For technical reasons (that we will not fully elaborate here) we impose a further condition on the circuit family: it should be a so-called uniform circuit family in the following sense. The descriptions of the circuits $C_n$ should be generated in a suitably simple computational way as a function of n. More precisely there is a (log-space) Turing machine which on input $11\ldots1$ of length n outputs a description of $C_n$. This prevents us from “cheating” by coding the answer to a hard computational problem (or even an uncomputable problem!) into the changing structure of $C_n$ with n.

Circuit model for quantum computation:

The above formalism can be generalised in an obvious way to the quantum setting. For inputs of size n the starting string $b_1\ldots b_n 00\ldots0$ is replaced by a sequence of qubits in the corresponding computational basis state $|b_1\rangle\ldots|b_n\rangle|0\rangle|0\rangle\ldots|0\rangle$. A computational step is the application of a quantum gate, which is a prescribed unitary operation applied to a prescribed choice of qubits. For each input size n we have a quantum circuit $C_n$ which is a prescribed sequence of such steps. The output of the computation is the result of performing a quantum measurement (in the computational basis) on a specified subset of the qubits (this being part of the description of $C_n$).

Remark: More generally we could allow measurements along the way (rather than only at

the end) and allow the choice of subsequent gates to depend on the measurement outcomes.

However it can be shown that this further generality adds nothing extra: any such circuit can

be re-expressed as an equivalent circuit in which measurements are performed at the end only.

A quantum computation or quantum algorithm is defined by a (uniform) family of quantum circuits $(C_1, C_2, \ldots)$.

Each of the circuits can be depicted pictorially as a circuit diagram. Each input qubit is

represented by a horizontal line running across the diagram, which is read from left to

right. The applied quantum gates are represented by labelled boxes in order from left to

right. We illustrate this in an example.

Example 9 Consider a circuit on two input qubits and one ancillary qubit, referred to as qubits 1, 2 and 3 respectively. The initial state is $|x_0\rangle|x_1\rangle|0\rangle$, with the ancilla in state $|0\rangle$. The circuit is defined by the following sequence of instructions: apply H to qubit 1; then apply CNOT to qubits 2,3 with 2 as control; apply CNOT to qubits 1,2 with 1 as control; apply Z to qubit 2; finally apply CNOT to qubits 2,3 with 3 as control and then measure qubit 1 in the computational basis to get an output bit i. The circuit diagram is the following.

[Circuit diagram image: horizontal lines for $|x_0\rangle$, $|x_1\rangle$ and $|0\rangle$, with gates H, Z and three CNOTs, and a “measure” box on the first line giving output i.]

Note that we have used an asymmetrical symbol to depict CNOT, with a “blob” on the control line and an X on the target line. The first H and CNOT, acting on disjoint sets of qubits, commute and can be done in parallel or in the reverse order if desired.
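Example 9 can be run as a small state-vector simulation. The sketch below (pure Python, helper names of our own) applies the listed gates in order, checks that swapping the first H and CNOT gives the same final state, and returns the distribution of the output bit i. Qubit 1 is taken as the leftmost bit of each basis index, which is our convention rather than something fixed by the notes.

```python
import math

R = 1 / math.sqrt(2)
H = [[R, R], [R, -R]]
Z = [[1, 0], [0, -1]]
N_QUBITS = 3

def apply_1q(state, gate, q):
    """Apply a 2x2 gate to qubit q (1..3; qubit 1 is the leftmost bit)."""
    shift = N_QUBITS - q
    new = [0.0] * len(state)
    for idx, amp in enumerate(state):
        bit = (idx >> shift) & 1
        base = idx & ~(1 << shift)
        for out in (0, 1):
            new[base | (out << shift)] += gate[out][bit] * amp
    return new

def cnot(state, control, target):
    """CNOT: flip qubit `target` on basis states where qubit `control` is 1."""
    new = [0.0] * len(state)
    for idx, amp in enumerate(state):
        dest = idx
        if (idx >> (N_QUBITS - control)) & 1:
            dest ^= 1 << (N_QUBITS - target)
        new[dest] += amp
    return new

def example9(x0, x1, h_first=True):
    """Run Example 9 on |x0>|x1>|0>; return the final state and Pr(i) for qubit 1."""
    state = [0.0] * 8
    state[(x0 << 2) | (x1 << 1)] = 1.0
    if h_first:
        state = cnot(apply_1q(state, H, 1), 2, 3)
    else:                                   # first H and CNOT in the reverse order
        state = apply_1q(cnot(state, 2, 3), H, 1)
    state = cnot(state, 1, 2)
    state = apply_1q(state, Z, 2)
    state = cnot(state, 3, 2)               # CNOT on qubits 2,3 with 3 as control
    p1 = sum(amp ** 2 for idx, amp in enumerate(state) if (idx >> 2) & 1)
    return state, {0: 1 - p1, 1: p1}

print(example9(0, 1)[1])
```

Since every gate after the first H either leaves qubit 1 alone or uses it only as a control, this sketch gives i = 0 or 1 with probability 1/2 for every input.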

Figure 2 is also a circuit diagram for the quantum teleportation protocol. In this process the inputs are more general quantum states ($|\alpha\rangle$, $|\psi\rangle$) than just computational basis states, i.e. we can think of teleportation as a “quantum computation with quantum inputs and quantum outputs” (rather than just quantum representations of classical data).

Randomised classical computations:

Recall that the result of a quantum measurement is generally probabilistic, so the output of a quantum computation will generally be a sample from a probability distribution determined by the final quantum state before the measurement. Correspondingly it is useful to extend our model of classical computation to also incorporate probabilistic choices. This is done as follows: for input $b_1\ldots b_n$ we extend the starting string $b_1\ldots b_n 00\ldots0$ to $b_1\ldots b_n r_1\ldots r_k 00\ldots0$, where $r_1\ldots r_k$ is a sequence of bits each of which is set to 0 or 1 uniformly at random. If the computation is repeated with the same input $b_1\ldots b_n$ the random bits will generally be different. The output is now a sample from a probability distribution over all possible output strings, which is generated by the uniformly random choice of $r_1\ldots r_k$. Then in specific computational algorithms we normally require the output to be correct “with suitably high probability”, specified according to some desired criteria. This formalism with random input bits can be used to implement probabilistic choices of gates. For example suppose we wish to apply either AND or OR at some point, each chosen with probability half. Consider the 3-bit gate whose action is as follows: if the first bit is 0 (resp. 1) apply OR (resp. AND) to the last two bits. Then we use this gate with a random input to the first bit.
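The AND/OR construction can be sketched directly. In the illustration below the gate returns the random control bit together with the result; keeping the control bit in the output is our own choice, and all names are hypothetical. `output_distribution` averages exactly over both values of the random bit, while `randomised_step` draws it from a pseudo-random generator.

```python
import random
from collections import Counter

def or_and_gate(r, u, v):
    """3-bit gate from the text: OR of the last two bits if the first bit is 0,
    AND if it is 1. Returns (r, result)."""
    result = (u | v) if r == 0 else (u & v)
    return r, result

def randomised_step(u, v, rng):
    """Feed a uniformly random bit into the first input of the gate."""
    return or_and_gate(rng.randrange(2), u, v)[1]

def output_distribution(u, v):
    """Exact output distribution, averaging over both values of the random bit."""
    counts = Counter(or_and_gate(r, u, v)[1] for r in (0, 1))
    return {bit: c / 2 for bit, c in counts.items()}

print(output_distribution(0, 1))                 # {1: 0.5, 0: 0.5}
rng = random.Random(0)
print(Counter(randomised_step(0, 1, rng) for _ in range(1000)))
```

For inputs where OR and AND agree (00 and 11) the random bit has no effect; for 01 and 10 the output is a fair coin.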

Universal sets of quantum gates:

In classical computation we restrict our circuits to be composed of a small universal set of gates that act on only a few bits each. One such choice is the set {NOT, AND, OR, COPY}. Actually OR may even be deleted from this set since $b_1 \text{ OR } b_2 = \text{NOT}(\text{NOT}(b_1) \text{ AND } \text{NOT}(b_2))$.

Remark: It may be shown that no set of 2-bit reversible gates is universal (see Preskill p241-2), but there are 3-bit reversible gates G that are universal even just by themselves, i.e. any reversible Boolean function may be constructed as a circuit of G’s alone, together with constant extra inputs set to 0 or 1. Two examples of such gates are the Fredkin gate, $F(0b_2b_3) = 0b_2b_3$ and $F(1b_2b_3) = 1b_3b_2$, i.e. a controlled SWAP, controlled by the value of the first bit; and the Toffoli gate, $\text{Toff}(0b_2b_3) = 0b_2b_3$ and $\text{Toff}(1b_2b_3) = 1\,\text{CNOT}(b_2b_3)$, i.e. a controlled-controlled-NOT gate in which NOT is applied to bit 3 iff the first two bits are 1, and Toff is the identity otherwise.
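Both gates can be checked to be permutations of $B^3$, and the Toffoli gate can be seen to reproduce AND, NOT and COPY once some inputs are fixed to constants. The constant-input constructions below are a standard illustration chosen by us, not taken from the notes, and the function names are hypothetical.

```python
from itertools import product

def toffoli(b1, b2, b3):
    """Controlled-controlled-NOT: flip b3 iff b1 = b2 = 1."""
    return b1, b2, b3 ^ (b1 & b2)

def fredkin(b1, b2, b3):
    """Controlled SWAP: swap b2 and b3 iff b1 = 1."""
    return (b1, b3, b2) if b1 else (b1, b2, b3)

def is_reversible(gate):
    """A 3-bit gate is reversible iff it permutes the 8 strings of B^3."""
    return len({gate(*bits) for bits in product((0, 1), repeat=3)}) == 8

# Classical gates obtained from Toffoli with constant extra inputs:
def and_via_toffoli(x, y):
    return toffoli(x, y, 0)[2]          # target starts at 0

def not_via_toffoli(x):
    return toffoli(1, 1, x)[2]          # both controls fixed to 1

def copy_via_toffoli(x):
    return toffoli(1, x, 0)[1:]         # (x, x): fan-out onto a fresh 0 bit

print(is_reversible(toffoli), is_reversible(fredkin))   # True True
```

Both gates are also self-inverse, so each one undoes itself when applied twice.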

In the quantum case all gates are reversible (unitary) by definition and there are similar universality results, but the situation is a little more complicated: quantum gates are parameterised by continuous parameters (in contrast to classical gates, which form a discrete set), so no finite set can generate them all exactly via (arbitrarily large) finite circuits. But many small finite sets of quantum gates are still universal in the sense that they can generate any unitary gate to any prescribed accuracy $\epsilon > 0$. Such approximations (for suitably small $\epsilon$) will suffice for all our purposes, and for clarity of discussion we will generally ignore this issue of approximability and just allow use of any exact gate that we need. (For more details about approximations see Nielsen and Chuang §4.5.) Some examples of universal sets of quantum gates are the following: {CNOT, all 1-qubit gates}, {CNOT, H, T} with

$$T = \begin{pmatrix} 1 & 0 \\ 0 & e^{i\pi/4} \end{pmatrix},$$

and {Toffoli 3-qubit gate, H}. (The last set has only real matrix entries, so it is universal in the weaker sense that it suffices to simulate any quantum computation, rather than to approximate every unitary gate.)
**

9.1 Reversible gate version of any Boolean function

Recall that any n-bit Boolean function or gate $g : B^n \to B^n$ is called reversible if we can uniquely reconstruct the input from the output, i.e. g must be one-to-one as a function on n-bit strings, i.e. g must be a permutation of $B^n$. Most Boolean functions $f : B^m \to B^n$ are not reversible, and it is useful to have a reversible representation of them that can be directly used in the quantum gate formalism.

We introduce an addition operation, denoted $\oplus$, for n-bit strings: if $b = b_1\ldots b_n$ and $c = c_1\ldots c_n$ then $b \oplus c = (b_1 \oplus c_1)\ldots(b_n \oplus c_n)$, i.e. $b \oplus c$ is the n-bit string obtained by adding mod 2 the corresponding bits of b and c, i.e. we “add mod 2 without carry”. For example $011 \oplus 110 = 101$. Note that for any n-bit string b we have $b \oplus b = 0\ldots0$, where $0\ldots0$ denotes the n-bit string of all zeroes.

Now let $f : B^m \to B^n$ be any Boolean function. Consider $\tilde f : B^{m+n} \to B^{m+n}$ defined by

$$\tilde f(b, c) = (b,\ c \oplus f(b)) \quad \text{for any } m\text{-bit string } b \text{ and any } n\text{-bit string } c.$$

Note that $\tilde f$ is easily computable if we can compute f and the (simple) addition operation $\oplus$ on bit strings. Conversely, given $\tilde f$ we can easily recover $f(b)$ for any b by setting $c = 0\ldots0$ and looking at the last n bits of output of $\tilde f$.

Lemma 1 For any f, $\tilde f$ is a reversible operation on m+n bits. Furthermore $\tilde f$ is always self-inverse, i.e. $\tilde f$ applied twice is the identity operation.

Proof: If we apply $\tilde f$ twice to (b, c) we get $(b,\ c \oplus f(b) \oplus f(b))$. But $f(b) \oplus f(b) = 0\ldots0$, so $\tilde f\tilde f(b, c) = (b, c)$. Hence $\tilde f$ is reversible, since from an output $(b', c')$ of $\tilde f$ we can recover the input (b, c) uniquely by simply applying $\tilde f$ again.

If we interpret $\tilde f$ as defining a map on computational basis states of m+n qubits, then its extension by linearity to all states of m+n qubits gives a unitary operation corresponding to f, which we denote as $U_f$.

These ideas extend easily to arbitrary functions between finite sets (not just Boolean functions mapping bit strings to bit strings). Let $f : X \to Y$ be any function where X and Y are finite sets. For definiteness let $X = \mathbb{Z}_M$ and $Y = \mathbb{Z}_N$, where $\mathbb{Z}_M$ is the set $\{0, 1, 2, \ldots, M-1\}$ of integers modulo M. Let $\mathcal{H}$ be a state space of dimension MN with orthonormal basis $|x, y\rangle = |x\rangle|y\rangle$ labelled by the elements of X and Y. We will call $|x\rangle$ and $|y\rangle$ the input and output registers, or first and second registers respectively, for $U_f$. Consider the operation $U_f$ on $\mathcal{H}$ defined by

$$U_f|x\rangle|y\rangle = |x\rangle|y + f(x)\rangle$$

where + denotes addition modulo N in $Y = \mathbb{Z}_N$. Then $U_f$ is reversible with inverse

$$U_f^{-1}|x\rangle|y\rangle = |x\rangle|y - f(x)\rangle$$

where the minus sign denotes subtraction modulo N. (Note that in the Boolean 1-bit case, subtraction modulo 2 is the same as addition!) Hence $U_f$ is one-to-one and, since it maps basis states to basis states, it must be a permutation of these states and hence unitary on the whole space. We adopt $U_f$ as the quantum representation of the function f.

9.2 Time complexity – P, BPP, BQP

In computational complexity theory a fundamental issue is the time complexity of algorithms: how many steps (in the worst case) does the algorithm require for any input of size n? In the (classical or quantum) circuit model the number of steps on inputs of size n is taken to mean the total number of gates in the circuit $C_n$, i.e. the size of the circuit $C_n$. Let T(n) be the size of $C_n$. We are especially interested in the question of whether T(n) is bounded by a polynomial function of n (i.e. is $T(n) < cn^k$ for all large n, for some constants k, c?) or else, does T(n) grow faster than any polynomial (e.g. exponential functions such as $T(n) = 2^n$ or $2^{\sqrt{n}}$ have this property).

Although computations with any T(n) are computable in principle, poly time computations are regarded as tractable or “computable in practice”. If T(n) is not polynomially bounded then the computation is regarded as intractable or “not computable in practice”, as the physical resource of time will, for fairly small n values, exceed sensibly available limits (e.g. running time on any available computer will exceed the age of the universe).

We have the following standard terminology for some classes of algorithms:

P (“poly time”): class of all classical algorithms that run in polynomial time and give the correct answer with certainty (i.e. the algorithms are not probabilistic).

BPP (“bounded error probabilistic poly time”): class of all classical randomised algorithms that run in poly time and give the correct answer with probability at least 2/3 in every case.

EQP (“exact quantum poly time”): class of all quantum algorithms that run in poly time and give the correct answer with certainty in every case.

BQP (“bounded error quantum poly time”): class of all quantum algorithms that run in poly time and give the correct answer with probability at least 2/3 in every case.

The “exact” classes P and EQP, although useful in theoretical considerations, are unrealistically idealised from a practical point of view: in any real physical situation (classical or quantum, that includes continuous parameters) we cannot build an infinitely precise implementation and we must tolerate some level of error. Thus the classes BPP and BQP are generally viewed as mathematical formalisations of “computations that are feasible on a classical computer” and “on a quantum computer” respectively. In view of this it is of great interest to ask: are there problems with algorithms in BQP but not known to have algorithms in BPP? I.e. more colloquially: can quantum computation offer an “exponential speed-up” over classical computation for some computational tasks?

Example 10 The problem FACTOR(N) is the following: given an integer N of n digits, find a non-trivial factor of N (i.e. a factor $\neq 1, N$), and if N is prime return 1. The fastest known classical algorithm runs in time $\exp O(n^{1/3}(\log n)^{2/3})$, i.e. more than exponential in the cube root of the input size. Thus this problem is not known to be in BPP. In 1994 Peter Shor discovered a quantum algorithm for this task with running time $T(n) = O(n^3)$, so the problem is in BQP. We will study this quantum factoring algorithm later in detail.

One final remark about the definition of the classes BPP and BQP: we have required the outputs to be correct with probability 2/3. However it may be shown that “2/3” here may be replaced by any other number $1-\epsilon$ for any $1/2 > \epsilon > 0$ (however small) without changing the contents of the class. That is, if there is a poly time algorithm for a problem that succeeds with probability 2/3, then there are also poly time algorithms that succeed with probability 0.9 or 0.99 or 0.99999 or indeed $1-\epsilon$ for any arbitrarily small fixed $\epsilon > 0$.

This result relies on the following fact, often called the amplification lemma (see Nielsen and Chuang p154 for more details). Suppose we have an algorithm for a decision problem that works correctly with probability 2/3 in all cases. Consider repeating the algorithm K times and taking the majority vote of all K answers as our final answer. It can be shown that this answer is correct with probability at least $1 - 2^{-cK}$ for some constant $c > 0$, approaching 1 exponentially in K. Thus, given any $\epsilon > 0$, this probability will exceed $1-\epsilon$ for some constant K. If the original algorithm had poly running time T(n), then our K-repetition majority vote strategy has running time KT(n), which is still polynomial in n.
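The amplification lemma can be checked numerically. The following Python sketch computes, exactly as a binomial sum, the probability that the majority of K independent runs is correct when each run is correct with probability p = 2/3. The helper name `majority_success` is our own, and the sketch assumes K is odd so that the vote can never be tied. Using `fractions.Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction
from math import comb


def majority_success(p, K):
    """Exact probability that the majority of K independent runs is correct,
    when each run is correct with probability p (K odd, so no ties)."""
    if K % 2 == 0:
        raise ValueError("use an odd number of repetitions K")
    # the majority is correct iff at least K//2 + 1 of the K runs are correct
    return sum(comb(K, j) * p**j * (1 - p) ** (K - j) for j in range(K // 2 + 1, K + 1))


p = Fraction(2, 3)
for K in (1, 5, 11, 21, 41, 101):
    print(K, float(majority_success(p, K)))
```

As K increases, the printed success probabilities should approach 1, with the error shrinking roughly geometrically in K, as the lemma describes.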

9.3 Query complexity and promise problems

In quantum computation and the study of its properties relative to classical computation,

there is another computational scenario that is often considered. This is the concept of

“black box promise problems” with an associated measure of complexity called “query

complexity”.

In this scenario, instead of being given an input bit string of some length n, we are given as input a black box or oracle that computes some function $f : B^m \to B^n$. We can query the black box by giving it inputs, and this is the only access we have to the function and its values. No other use of the box is allowed. In particular we cannot “look inside it” to see its actual operation and learn information about the function f. Thus, at the start, it is unknown exactly which function f is, but there is often an a priori promise on f, i.e. some stated restriction on the possible form of f. Our task is to determine some desired property of f, e.g. some feature of the set of all values of f. We want to achieve this by querying the box the least possible number of times. In our circuits, in addition to our usual gates, we may use the black box as a gate, each use counting as just one step of computation. The query complexity of such an algorithm is simply the number of times that the oracle is used (as a function of its “size” measured by m, n).

For quantum circuits we always use the reversible form $\tilde{f}$ of f, i.e. the unitary operation $U_f$ associated to f. These notions are illustrated in the example below. In addition to the query complexity, we may also be interested in the total time complexity. This counts not merely the number of queries but also the number of gates used to process the answers to the queries.
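For concreteness, here is a minimal NumPy sketch of the unitary $U_f$ for a given f. It assumes the standard reversible form $U_f|x\rangle|y\rangle = |x\rangle|y\oplus f(x)\rangle$ used below, with bit strings encoded as integers. The function name `oracle_unitary` and the example f (the parity of two bits) are hypothetical illustrations, not part of the notes. The sketch shows that $U_f$ is a permutation matrix and therefore unitary (indeed self-inverse).

```python
import numpy as np


def oracle_unitary(f, m, n):
    """Matrix of U_f |x>|y> = |x>|y XOR f(x)> on m input and n output qubits.

    Bit strings are encoded as integers: f maps range(2**m) into range(2**n),
    and the basis index of |x>|y> is x * 2**n + y (input register first).
    """
    dim = 2 ** (m + n)
    U = np.zeros((dim, dim))
    for x in range(2**m):
        for y in range(2**n):
            # column |x>|y> has a single 1, in the row of |x>|y XOR f(x)>
            U[x * 2**n + (y ^ f(x)), x * 2**n + y] = 1.0
    return U


def parity(x):
    """Example f : B^2 -> B^1, the parity of the input bits (our own choice)."""
    return bin(x).count("1") % 2


U = oracle_unitary(parity, m=2, n=1)
```

Each column contains exactly one 1, so $U_f$ merely permutes the computational basis states. This is why any function f, even a non-injective one, yields a unitary in this reversible form.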

Example 11 The following are examples of black box promise problems.

The “balanced versus constant” problem

Input: a black box for a Boolean function $f : B^n \to B$ (one bit output).

Promise: f is either (a) a constant function (f(x) = 0 for all x or f(x) = 1 for all x) or (b) a “balanced” function in the sense that f(x) = 0 for exactly half of the $2^n$ inputs x (and f(x) = 1 for the other half).

Problem: Determine whether f is balanced or constant. We could ask for the answer

to be correct with certainty or merely with some probability, say 0.99 in every case.

Boolean satisfiability

Input: a black box for a Boolean function $f : B^n \to B$.

Promise: no restriction on the form of f.

Problem: determine whether there is an input x such that f(x) = 1.

Search

Input: a black box for a Boolean function $f : B^n \to B$.

Promise: There is a unique x such that f(x) = 1.

Problem: find this special x.

Periodicity

Input: a black box for a function $f : \mathbb{Z}_n \to \mathbb{Z}_n$ (where $\mathbb{Z}_n$ denotes the set of integers mod n).

Promise: f is periodic, i.e. there is a least $r > 0$ such that $f(x+r) = f(x)$ for all x (where + here denotes addition mod n).

Problem: find the period r.

In each case we are interested in how the minimum number of queries grows as a function

of the natural parameter n (for quantum versus classical algorithms).

10 Computation by quantum parallelism. The Deutsch algorithm

10.1 Computation by quantum parallelism

Consider any function $f : X \to Y$ between finite sets of sizes $|X|$ and $|Y|$ respectively, and its corresponding unitary transform $U_f$, which maps $|x\rangle|y\rangle$ to $|x\rangle|y + f(x)\rangle$. If we run $U_f$ on $|x\rangle|0\rangle$ we get $|x\rangle|f(x)\rangle$, and we can read out the value of f(x) by measuring the second register. Now $U_f$ is unitary and linear. Thus if we set the input register to a superposition of all possible x values we get

$$U_f : \frac{1}{\sqrt{|X|}}\sum_{\text{all } x}|x\rangle|0\rangle \;\longrightarrow\; |f\rangle \equiv \frac{1}{\sqrt{|X|}}\sum_{\text{all } x}|x\rangle|f(x)\rangle,$$

i.e. in one run of $U_f$ we obtain a final state which depends on all of the function values.

Such a computation on superposed inputs is called computation by quantum parallelism. By further quantum processing and measurement on the state $|f\rangle$, we can sometimes obtain “global” information about the nature of the function f (e.g. determine some joint properties of all the values) with just one run of $U_f$. These properties may be difficult to get classically without many classical evaluations of f, as each such evaluation reveals only one further value. Hence we can already see a potential quantum benefit over classical computation for the scenario of query complexity and promise problems. Below we will develop some explicit examples.

It is instructive to consider more explicitly the important special case of $X = B^n$, the set of all n-bit strings (and $Y = B^1$ or some other set). How do we actually create the input state of a uniform superposition over all x values that is needed in the above process? Recall that $H|0\rangle = \frac{1}{\sqrt 2}(|0\rangle + |1\rangle)$. So if we apply H to each of n qubits, each in state $|0\rangle$, and multiply out all the state tensor products, we get

$$H\otimes\cdots\otimes H\,(|0\rangle\cdots|0\rangle) = \frac{1}{\sqrt{2^n}}(|0\rangle+|1\rangle)\cdots(|0\rangle+|1\rangle) = \frac{1}{\sqrt{2^n}}\sum_{x_1,x_2,\ldots,x_n=0}^{1}|x_1\rangle|x_2\rangle\cdots|x_n\rangle = \frac{1}{\sqrt{2^n}}\sum_{x\in B^n}|x\rangle.$$

An important feature of this process is that we have created a superposition of exponentially many (viz. $2^n$) terms with only a linear number of elementary operations – we have applied H just n times.
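A small state-vector simulation makes both constructions concrete. The sketch below (plain NumPy; the helper names and the example function are hypothetical) builds $H^{\otimes n}$ as an n-fold Kronecker product and applies it to $|0\ldots0\rangle$. It then applies $U_f$ for a one-bit-output f to produce the state $|f\rangle$. The qubit ordering is an assumption: the first tensor factor is the most significant bit, so the basis index of $|x\rangle|y\rangle$ is 2x + y.

```python
from functools import reduce

import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)


def hadamard_all(n):
    """Matrix of the tensor product of n copies of H (n-fold Kronecker product)."""
    return reduce(np.kron, [H] * n)


def zero_state(n):
    """State vector of |0...0> on n qubits."""
    psi = np.zeros(2**n)
    psi[0] = 1.0
    return psi


def apply_oracle(f, state, n):
    """Apply U_f |x>|y> = |x>|y XOR f(x)> for f : B^n -> B (one output qubit).
    The basis index of |x>|y> is 2*x + y."""
    out = np.zeros_like(state)
    for x in range(2**n):
        for y in range(2):
            out[2 * x + (y ^ f(x))] += state[2 * x + y]
    return out


def f(x):
    """Hypothetical example f : B^3 -> B, the last bit of x."""
    return x % 2


n = 3
uniform = hadamard_all(n) @ zero_state(n)  # every amplitude equals 1/sqrt(2**n)
psi_f = apply_oracle(f, np.kron(uniform, zero_state(1)), n)  # one run of U_f
```

Note that the classical simulation itself stores all $2^n$ amplitudes and is therefore exponentially costly. Only the quantum circuit gets by with n Hadamard gates.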

10.2 Deutsch’s algorithm

Our first (and also historically the first) example of the benefit of computation by quantum parallelism is the query problem invented by D. Deutsch in 1985.

Deutsch’s problem:

There are four functions $f : B \to B$ (one-bit input, one-bit output).

Input: we are given a black box for an unknown one of these f’s.

Problem: We want to know the value of $f(0)\oplus f(1)$. Note that this is equivalent to deciding whether f is constant (i.e. both values are 0 or both 1) or balanced (i.e. one value 0 and one value 1).

A little thought shows that classically two queries are necessary and sufficient. A single query reveals only one of f(0), f(1), and either value of the other one is still possible, so $f(0)\oplus f(1)$ remains completely undetermined. In the quantum scenario we will solve the problem (with certainty) with only one query!

Our quantum black box acts on two qubits:

$$U_f|x\rangle|y\rangle = |x\rangle|y\oplus f(x)\rangle.$$

Deutsch’s quantum algorithm is the following. Set the input register to $\frac{1}{\sqrt2}(|0\rangle+|1\rangle)$ and the output register (surprisingly!) to $\frac{1}{\sqrt2}(|0\rangle-|1\rangle)$. (This is achieved by applying H to $|0\rangle$ and to $|1\rangle$ respectively.) Then run $U_f$ on the resulting state.

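Note that:

$$\frac{1}{\sqrt2}|x\rangle(|0\rangle-|1\rangle) \longrightarrow \frac{1}{\sqrt2}|x\rangle(|f(x)\rangle-|f(x)\oplus1\rangle) = \begin{cases}\frac{1}{\sqrt2}|x\rangle(|0\rangle-|1\rangle) & \text{if } f(x)=0\\[2pt] -\frac{1}{\sqrt2}|x\rangle(|0\rangle-|1\rangle) & \text{if } f(x)=1\end{cases} = \frac{1}{\sqrt2}(-1)^{f(x)}|x\rangle(|0\rangle-|1\rangle) \tag{16}$$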

i.e. we just get a minus sign on $|x\rangle$ if f(x) = 1 and no change if f(x) = 0. Discarding the second register we have $|x\rangle \to (-1)^{f(x)}|x\rangle$. We say that “f(x) has been coded as phase information”.

Now if we run $U_f$ on $\frac12(|0\rangle+|1\rangle)(|0\rangle-|1\rangle)$, then by linearity each x value runs as above. After discarding the second register, we get $\frac{1}{\sqrt2}\sum_x(-1)^{f(x)}|x\rangle$. Explicitly, we get the following orthogonal states:

$$\pm\frac{1}{\sqrt2}(|0\rangle+|1\rangle) \text{ if } f \text{ constant}, \qquad \pm\frac{1}{\sqrt2}(|0\rangle-|1\rangle) \text{ if } f \text{ balanced}.$$

Finally apply H (to get $\pm|0\rangle$ if f constant, and $\pm|1\rangle$ if f balanced) and measure in the computational basis.

Below we give a circuit diagram for the Deutsch algorithm (the top line is the input register and the bottom line is the output register).

[Circuit diagram. Top line: $|0\rangle$, H, $U_f$, H, measure (outcome 0 if f constant, 1 if f balanced). Bottom line: $|0\rangle$, X, H, $U_f$, discard.]

Note that the 2-qubit states immediately before and after $U_f$ are both product states. In both cases the state of the output register is $\frac{1}{\sqrt2}(|0\rangle-|1\rangle)$.
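As a check on the derivation, here is a minimal NumPy simulation of the circuit above for all four functions $f : B \to B$. The helper names are hypothetical. Rather than sampling, the sketch computes the probability of obtaining 1 on the input register, since for Deutsch's algorithm this probability is exactly 0 or 1. The qubit ordering is assumed to be (input, output), i.e. basis index 2x + y.

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
X = np.array([[0, 1], [1, 0]])
I2 = np.eye(2)


def U_f(f):
    """4x4 matrix of U_f |x>|y> = |x>|y XOR f(x)>; basis index is 2*x + y."""
    U = np.zeros((4, 4))
    for x in range(2):
        for y in range(2):
            U[2 * x + (y ^ f(x)), 2 * x + y] = 1.0
    return U


def deutsch(f):
    """Return the measured input-register bit: 0 if f is constant, 1 if balanced."""
    state = np.kron([1.0, 0.0], [1.0, 0.0])  # |0>|0>
    state = np.kron(I2, X) @ state           # X on the output register
    state = np.kron(H, H) @ state            # H on both registers
    state = U_f(f) @ state                   # the single query
    state = np.kron(H, I2) @ state           # final H on the input register
    p1 = np.sum(np.abs(state[2:]) ** 2)      # probability the input register is |1>
    return int(round(p1))


functions = {
    "f(x)=0": lambda x: 0,
    "f(x)=1": lambda x: 1,
    "f(x)=x": lambda x: x,
    "f(x)=1-x": lambda x: 1 - x,
}
results = {name: deutsch(f) for name, f in functions.items()}
```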

10.3 The Deutsch-Jozsa (DJ) algorithm

We can generalise the above idea to obtain a problem with an exponential separation

between the number of queries needed classically and quantumly. Consider the “balanced

versus constant” promise problem of example 11 which we recall here:

The “balanced versus constant” problem

Input: a black box for a Boolean function $f : B^n \to B$ (one bit output).

Promise: f is either (a) a constant function (f(x) = 0 for all x or f(x) = 1 for all x) or (b) a “balanced” function in the sense that f(x) = 0 for exactly half of the $2^n$ inputs x (and f(x) = 1 for the other half).

Problem: Determine (with certainty) whether f is balanced or constant.

(At the end we will discuss the bounded error version of the problem i.e. requiring the

correct solution only with some high probability, 0.999 say).

A little thought (hmm...) shows that classically $2^n/2 + 1$ queries (i.e. exponentially many) are necessary and sufficient to solve the problem with certainty in the worst case. If the first $2^n/2$ queried values all agree, f could still be either constant or balanced, whereas one further query settles the matter. We now show that in the quantum scenario, just one query suffices (with O(n) extra processing steps) in every case!

Our quantum black box is

$$U_f : |x\rangle|y\rangle \mapsto |x\rangle|y\oplus f(x)\rangle,$$

where the input register $|x\rangle$ comprises n qubits and the output register $|y\rangle$ comprises a single qubit. We assume that initially all qubits are in standard state $|0\rangle$. We begin by constructing an equal superposition of all n-bit strings in the input register and the state $\frac{1}{\sqrt2}(|0\rangle-|1\rangle)$ in the output register. This requires (n + 1) Hadamard operations and one X operation (on the output register), resulting in the (n + 1)-qubit state

$$\frac{1}{\sqrt{2^{n+1}}}\sum_{x\in B^n}|x\rangle(|0\rangle-|1\rangle).$$

Next we run $U_f$ (once) on this state. For each individual n-bit string x the calculation of eq. (16) goes through exactly as before (in the previous case x was a single bit, but this makes no difference to the calculation). Hence on the full superposition we get

$$U_f : \frac{1}{\sqrt{2^{n+1}}}\sum_{x\in B^n}|x\rangle(|0\rangle-|1\rangle) \longrightarrow \frac{1}{\sqrt{2^{n+1}}}\sum_{x\in B^n}(-1)^{f(x)}|x\rangle(|0\rangle-|1\rangle).$$

This is a product state of the n-qubit input and single qubit output registers. Discarding the last (output) qubit we get the n-qubit state

$$|f\rangle \equiv \frac{1}{\sqrt{2^n}}\sum_{x\in B^n}(-1)^{f(x)}|x\rangle. \tag{17}$$

What does this state look like when f is constant or balanced?

If f is constant then $|f\rangle = \pm\frac{1}{\sqrt{2^n}}\sum_x|x\rangle$. If we apply $H^{\otimes n} = H\otimes\cdots\otimes H$ we get $\pm|0\ldots0\rangle$, because H is self-inverse and $H^{\otimes n}|0\ldots0\rangle = \frac{1}{\sqrt{2^n}}\sum_x|x\rangle$ (as shown earlier).

If f is balanced then the sum in eq. (17) contains an equal number of plus and minus terms, with minus signs sprinkled in some unknown locations along the $2^n$ terms. But if we take the inner product of $|f\rangle$ with $\sum_x|x\rangle$, we simply add up all the coefficients in $|f\rangle$ (as $\langle x|y\rangle = 0$ if $x\neq y$ and $=1$ if $x=y$). Wherever the minus signs occur, the total sum is always zero, i.e. if f is balanced then $|f\rangle$ is orthogonal to $\sum_x|x\rangle$. Hence if we apply the unitary operation $H^{\otimes n}$ (which preserves inner products), $H^{\otimes n}|f\rangle$ will be orthogonal to $H^{\otimes n}\frac{1}{\sqrt{2^n}}\sum_x|x\rangle = |0\ldots0\rangle$. Hence $H^{\otimes n}|f\rangle$ must have the form

$$\sum_{x\neq 0\ldots0}a_x|x\rangle,$$

having the all-zero term absent.

In view of the above discussion, having constructed $|f\rangle$ (for our given black box) we apply $H^{\otimes n}$ and measure the n qubits in the computational basis. If the result is $0\ldots0$ then f was certainly constant, and if the result is any non-zero string $x_1\ldots x_n \neq 0\ldots0$ then f was certainly balanced. Hence we have solved the problem with one query to f and (3n + 2) further operations: (n + 1) H’s and one X for the input state, n H’s on $|f\rangle$, and n single qubit measurements to get the classical output string.
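The same behaviour can be checked numerically for larger n. The sketch below (NumPy; helper names are hypothetical) works directly with the phase form $|x\rangle \to (-1)^{f(x)}|x\rangle$ derived above. It builds $|f\rangle$ of eq. (17) without simulating the output qubit, applies $H^{\otimes n}$, and returns the probability of the outcome $0\ldots0$. This probability should be 1 for constant f and 0 for balanced f. The particular balanced functions used are our own examples.

```python
from functools import reduce
from itertools import combinations

import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)


def dj_prob_all_zero(f, n):
    """Probability of outcome 0...0 after applying H^{(x)n} to
    |f> = 2^{-n/2} * sum_x (-1)^{f(x)} |x>   (eq. (17))."""
    f_state = np.array([(-1) ** f(x) for x in range(2**n)]) / np.sqrt(2**n)
    out = reduce(np.kron, [H] * n) @ f_state
    return abs(out[0]) ** 2


n = 4
examples = {
    "constant 0": lambda x: 0,
    "constant 1": lambda x: 1,
    "parity (balanced)": lambda x: bin(x).count("1") % 2,
    "first half (balanced)": lambda x: int(x < 2 ** (n - 1)),
}
probs = {name: dj_prob_all_zero(f, n) for name, f in examples.items()}

# all C(4, 2) = 6 balanced functions on n = 2 bits: f(x) = 1 exactly on the set S
balanced_n2 = [dj_prob_all_zero(lambda x, S=S: int(x in S), 2) for S in combinations(range(4), 2)]
```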

The balanced versus constant problem with bounded error: Suppose we tolerate some error, i.e. we require our algorithm to correctly distinguish balanced versus constant functions only with probability $> 1-\epsilon$ for some $\epsilon > 0$. Then the above (single query) algorithm still works (as it has $\epsilon = 0$). However, there is now a classical (randomised) algorithm that solves the problem with only a constant number of queries: the number depends on $\epsilon$ as $O(\log(1/\epsilon))$, for any n and for any fixed $\epsilon > 0$. Thus we lose the all-important exponential gap between classical and quantum query complexities in this bounded error scenario.

The classical algorithm is the following: we pick K x values, each chosen independently and uniformly at random, and evaluate the corresponding f values. If they are all 0 or all 1, output “f is constant”. If we get at least one instance of each of 0 and 1, output “f is balanced”. Clearly the second output must always be correct (as a constant function can never output both values). But the first output (“f is constant”) can be erroneous. Suppose f is a balanced function. Then each random value f(x) is 0 or 1 with probability 1/2 each. So the probability that K random values are all 0 or all 1 is $2/2^K = 1/2^{K-1}$. This is $< \epsilon$ if $1/2^{K-1} < \epsilon$, i.e. $K > 1 + \log_2(1/\epsilon)$. Hence $K = O(\log(1/\epsilon))$ suffices to guarantee error probability $< \epsilon$ in every case, for all n.
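The classical randomised strategy is easy to simulate. In this sketch (hypothetical names; Python's `random` module supplies the randomness), `queries_needed` returns the smallest K with $1/2^{K-1} < \epsilon$, and `classical_bc_test` implements the K-sample rule described above. Estimating its error rate on a balanced function gives a rough empirical check of the $1/2^{K-1}$ bound, independent of n.

```python
import math
import random


def queries_needed(eps):
    """Smallest integer K with 1 / 2**(K - 1) < eps, i.e. K > 1 + log2(1/eps)."""
    return math.floor(1 + math.log2(1 / eps)) + 1


def classical_bc_test(f, n, K, rng):
    """Evaluate f at K independent uniformly random n-bit inputs.
    Answer 'balanced' iff both values 0 and 1 were seen, else 'constant'."""
    seen = {f(rng.randrange(2**n)) for _ in range(K)}
    return "balanced" if len(seen) == 2 else "constant"


def parity(x):
    """A balanced function on n-bit inputs (our own example)."""
    return bin(x).count("1") % 2


rng = random.Random(0)
K = queries_needed(0.01)  # 8, since 1/2**7 < 0.01 <= 1/2**6
trials = 20000
errors = sum(classical_bc_test(parity, 10, K, rng) == "constant" for _ in range(trials))
print(K, errors / trials)  # error rate should be close to 1/2**(K-1)
```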

Remark: So does the above prove conclusively that quantum computation can be exponentially more powerful (in terms of time complexity) than classical computation!? We point out two important shortcomings in this claim. The first weakness is that if we allow any level of error in the result, however small, we lose the exponential separation between classical and quantum algorithm running times (as described in the previous paragraph). We noted previously that the zero error scenario in computation is an unrealistic idealisation, and for realistic computation we should always accept some (suitably small) level of error. However this weakness can be fully addressed: there exist other black box promise problems for which a provable exponential separation exists between classical and quantum query complexity even in the presence of error. (An example is the so-called Simon’s problem, solved by Simon’s quantum algorithm, which we will not discuss in this course.)

A second (more serious) issue is the fact that the DJ problem is only a black box problem (with the black box’s interior workings being inaccessible to us) rather than a straightforward “standard” computational task with a bit string as input and no “hidden” ingredients. To convert it to a standard task we would want a class of Boolean functions $f_n : B^n \to B$ such that the balanced/constant decision is hard classically (e.g. takes exponential time in n) even if we have full access to a description of the function, e.g. a formula for it or a circuit $C_n$ that computes $f_n$. Note that even a constant function can be presented to us in such a perversely complicated way that its trivial action is hard to recognise! Alas, no such (“provably hard”) class of Boolean function descriptions is known.

So, are there any “standard” computational tasks for which we can prove the existence of an exponential speed-up for quantum versus classical computation? No such absolute proofs are known, but the difficulty seems to be largely within the classical theory. Even though many problems have only exponential-time classical algorithms known, they cannot be proven to be hard classically, i.e. we cannot prove that no poly-time algorithm exists (that we have not yet discovered!) – recall that it is unproven that the class NP or even PSPACE is strictly larger than P. However there are problems which are believed to be hard for classical computation (i.e. no classical poly-time algorithm, even with bounded error, is known despite much effort) for which poly-time quantum algorithms do exist. A centrally important such problem is integer factorisation. Below we will describe Shor’s polynomial time quantum algorithm for factorisation after we introduce the quantum Fourier transform, which is at the heart of the workings of Shor’s algorithm.

11 The quantum Fourier transform and periodicities

11.1 Quantum Fourier transform mod N

The quantum Fourier transform (QFT) can be viewed as a kind of generalisation of the Hadamard operation to dimensions N > 2. Later we will be especially interested in $N = 2^n$, i.e. the QFT on an n-qubit space. As a pure mathematical construction it is the same as the so-called discrete Fourier transform, which is widely used in digital signal and image processing. It is a unitary matrix that arises naturally in a wide variety of mathematical situations, so it fits well into the quantum formalism, providing a bridge between a quantum operation and certain mathematical problems. In fact QFT is at the heart of most known quantum algorithms that provide a significant speedup over classical computation.

Let $\mathcal{H}_N$ denote a state space with an orthonormal basis (the computational basis) $|0\rangle, |1\rangle, \ldots, |N-1\rangle$ labelled by $\mathbb{Z}_N$. The quantum Fourier transform (QFT) modulo N, denoted $QFT_N$ (or just QFT when N is clear), is the unitary transform on $\mathcal{H}_N$ defined by:

$$QFT : |x\rangle \mapsto \frac{1}{\sqrt N}\sum_{y=0}^{N-1}\exp\left(2\pi i\frac{xy}{N}\right)|y\rangle \tag{18}$$

Thus the $ab$th matrix entry is

$$[QFT]_{ab} = \frac{1}{\sqrt N}\exp(2\pi i\,ab/N), \qquad a, b = 0, \ldots, N-1$$

(where we are labelling rows and columns from 0 to N − 1 rather than 1 to N!) If $\omega = e^{2\pi i/N}$ is the primitive Nth root of unity, then the matrix elements are all powers of $\omega$ (divided by $\sqrt N$) following a simple pattern:

• The initial row and column always contain only 1’s.

• Each row (or column) is a geometric sequence. The kth row (or column), for $k = 0, \ldots, N-1$, is the sequence of powers of $\omega^k$ (starting with power 0 up to power N − 1).

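Example 12 For N = 2 we have $\omega = -1$ and

$$QFT_2 = \frac{1}{\sqrt2}\begin{pmatrix}1 & 1\\ 1 & -1\end{pmatrix} = H.$$

For N = 4 we have $\omega = i$, so (viewing rows as geometric sequences)

$$QFT_4 = \frac12\begin{pmatrix}1 & 1 & 1 & 1\\ 1 & i & i^2 & i^3\\ 1 & (i^2) & (i^2)^2 & (i^2)^3\\ 1 & (i^3) & (i^3)^2 & (i^3)^3\end{pmatrix} = \frac12\begin{pmatrix}1 & 1 & 1 & 1\\ 1 & i & -1 & -i\\ 1 & -1 & 1 & -1\\ 1 & -i & -1 & i\end{pmatrix}.$$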
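Eq. (18) translates directly into a vectorised matrix construction. The NumPy sketch below (the function name `qft_matrix` is our own) builds $QFT_N$ with rows and columns labelled 0 to N − 1. The result can be compared against the two matrices of Example 12, and it also confirms that $QFT_4$ differs from $H\otimes H$.

```python
import numpy as np


def qft_matrix(N):
    """QFT_N with entries exp(2*pi*i*a*b/N) / sqrt(N), a, b = 0, ..., N-1 (eq. (18))."""
    a = np.arange(N).reshape(-1, 1)  # row labels as a column vector
    b = np.arange(N).reshape(1, -1)  # column labels as a row vector
    return np.exp(2j * np.pi * a * b / N) / np.sqrt(N)


H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
QFT4_example12 = 0.5 * np.array([[1, 1, 1, 1],
                                 [1, 1j, -1, -1j],
                                 [1, -1, 1, -1],
                                 [1, -1j, -1, 1j]])
```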

Remark (optional): Note that $QFT_4$ is different from $H\otimes H$, and generally $QFT_{2^n}$ differs from $H^{\otimes n} = H\otimes\cdots\otimes H$. However there is a more general mathematical formalism, the so-called Fourier transform on an abelian group, which embraces both of these constructs. For example, on a set of 4 elements there are two (non-isomorphic) group structures, viz. $\mathbb{Z}_2\times\mathbb{Z}_2$ and $\mathbb{Z}_4$ (addition of integers mod 4). Then $H\otimes H$ and $QFT_4$ are respectively the “Fourier transform for these two different group structures”. In this course $QFT_N$ will always mean “Fourier transform on the group $\mathbb{Z}_N$”, as defined above in eq. (18).

Many properties of QFT, including the fact that it is unitary, follow from a basic algebraic fact about roots of unity and geometric series. Recall the formula for the sum of any geometric series:

$$1+\alpha+\alpha^2+\cdots+\alpha^{N-1} = \begin{cases}\dfrac{1-\alpha^N}{1-\alpha} & \text{if } \alpha\neq1\\[4pt] N & \text{if } \alpha=1\end{cases}$$

Now consider $\alpha = \omega^K = e^{2\pi iK/N}$ for some chosen integer K. Then $\alpha = 1$ iff K is a multiple of N. Also $\alpha^N = 1$ for every K (since $\alpha^N = \omega^{KN}$ and $\omega^N = 1$). Hence we get

$$1+\omega^K+\omega^{2K}+\cdots+\omega^{(N-1)K} = \begin{cases}N & \text{if } K \text{ is a multiple of } N,\\ 0 & \text{if } K \text{ is not a multiple of } N.\end{cases} \tag{19}$$

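Now to see that QFT is unitary, consider the $ab$th element of the matrix product $QFT^\dagger QFT$. This is the sum of the entries of the ath row of $QFT^\dagger$ multiplied by the corresponding entries of the bth column of QFT:

$$[QFT^\dagger QFT]_{ab} = \sum_{c=0}^{N-1}\overline{[QFT]_{ca}}\,[QFT]_{cb} = \frac1N\sum_{c=0}^{N-1}\omega^{c(b-a)},$$

i.e. 1/N times the geometric series with $\alpha = \omega^{b-a}$. Since $|b-a| < N$, the integer b − a is a multiple of N only when b = a. So using eq. (19) we get 0/N = 0 if $b\neq a$ and N/N = 1 if b = a, i.e. $QFT^\dagger QFT$ is the identity matrix and QFT is unitary.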
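Both eq. (19) and the unitarity argument can be spot-checked numerically. The sketch below (hypothetical helper names, NumPy, with floating-point comparisons up to a tolerance) evaluates the root-of-unity sum for a range of integers K, including negative ones. It then checks that $QFT^\dagger QFT$ is the identity for a few values of N. A numerical check of this kind only illustrates the result for small N; it does not replace the proof.

```python
import numpy as np


def root_of_unity_sum(K, N):
    """1 + w^K + w^(2K) + ... + w^((N-1)K) with w = exp(2*pi*i/N): the left side of eq. (19)."""
    w = np.exp(2j * np.pi / N)
    return sum(w ** (j * K) for j in range(N))


def qft_matrix(N):
    """QFT_N from eq. (18)."""
    a = np.arange(N).reshape(-1, 1)
    return np.exp(2j * np.pi * a * a.T / N) / np.sqrt(N)


def eq19_holds(N):
    """Check eq. (19) for all integers K with |K| <= 2N."""
    return all(
        np.isclose(root_of_unity_sum(K, N), N if K % N == 0 else 0)
        for K in range(-2 * N, 2 * N + 1)
    )


def is_unitary(U):
    """True if U^dagger U equals the identity (up to rounding)."""
    return np.allclose(U.conj().T @ U, np.eye(U.shape[0]))
```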

11.2 Periodicity determination

A fundamental application of the Fourier transform (both classically and quantumly) is

the determination of periodicity exhibited in a function or some other given data. Some

important mathematical problems (such as integer factorisation, as we’ll see later) can

be reduced to problems of periodicity determination.

Suppose we are given (a black box for) a function $f : \mathbb{Z}_N \to Y$ (where typically $Y = \mathbb{Z}_M$ for some M), and it is promised that f is periodic with some period r. That is, there is a smallest positive integer r such that $f(x+r) = f(x)$ for all $x\in\mathbb{Z}_N$ (where + is addition mod N). We will also assume that f is one-to-one in each period, i.e. $f(x_1)\neq f(x_2)$ for all $0\le x_1 < x_2 < r$. We want a method of determining r with some constant level of probability (0.99 say) that is independent of increasing the size of N. It can be shown that $O(\sqrt N)$ queries to f (i.e. a number not bounded by any polynomial in log N) are necessary and sufficient to achieve this in classical computation with a black box for f. In some cases further information may be available about f, e.g. we may have an explicit formula for it. Even then, the periodicity determination may still be hard (we will see an example later), requiring a number of steps that is not bounded by any polynomial in log N. In the quantum scenario we will see that r can always be determined with any constant high level of probability $1-\epsilon$ using only O(log log N) queries and poly(log N) further processing steps, i.e. exponentially faster than any classical method.

Quantum algorithm for periodicity determination

We begin by constructing a uniform superposition $\frac{1}{\sqrt N}\sum_{x=0}^{N-1}|x\rangle$ and making one query to $U_f$ to obtain the state $|f\rangle = \frac{1}{\sqrt N}\sum_{\text{all }x}|x\rangle|f(x)\rangle$. Since f is periodic (with unknown period r), r must divide N exactly, and we set A = N/r, which is the number of periods. If we measure the second register we will see some value $y = f(x_0)$, where $x_0$ is the least x having f(x) = y. Then the first register will be projected into an equal superposition of the A values $x = x_0, x_0+r, x_0+2r, \ldots, x_0+(A-1)r$ for which f(x) = y, i.e. we get

$$|per\rangle = \frac{1}{\sqrt A}\sum_{j=0}^{A-1}|x_0 + jr\rangle.$$

Here $0\le x_0\le r-1$ has been chosen uniformly at random (by the generalised Born rule, since each possible value y of f occurs the same number A of times, i.e. once in each period). If we measure the register of $|per\rangle$ we will see $x_0 + j_0r$, where $j_0$ has been picked uniformly at random too. Thus we have a random period (the $j_0$th period) and a random element in it (determined by $x_0$). Overall we get a random number between 0 and N − 1, giving no information about r at all. Nevertheless the state $|per\rangle$ seems to contain information about r!

The resolution of this problem is to use the Fourier transform. It is known, even in classical image processing, to be able to pick up periodicities in a periodic pattern irrespective of an overall random shift of the pattern (e.g. the $x_0$ in $|per\rangle$). Applying QFT to $|per\rangle$ we get (using eq. (18) with x replaced by $x_0 + jr$, and summing over j):

$$QFT|per\rangle = \frac{1}{\sqrt{NA}}\sum_{j=0}^{A-1}\sum_{y=0}^{N-1}\omega^{(x_0+jr)y}|y\rangle = \frac{1}{\sqrt{NA}}\sum_{y=0}^{N-1}\omega^{x_0y}\left[\sum_{j=0}^{A-1}\omega^{jry}\right]|y\rangle. \tag{20}$$

(In the last equality we have reversed the order of summation and factored out the j-independent $\omega^{x_0y}$ terms.) Which labels y appear here with nonzero amplitude? Look at the square-bracketed coefficient of $|y\rangle$ in eq. (20). It is a geometric series with powers of $\alpha = e^{2\pi iry/N} = (e^{2\pi i/A})^y$, summed from power 0 to power A − 1. According to eq. (19) (now applied with A taking the role of N there), this sum is zero whenever y is not a multiple of A and equals A otherwise. That is, only multiples of A = N/r survive as y values:

$$\sum_{j=0}^{A-1}\omega^{jry} = \begin{cases}A & \text{if } y = kN/r \text{ for } k = 0, \ldots, r-1,\\ 0 & \text{otherwise,}\end{cases}$$

and

$$QFT|per\rangle = \frac{A}{\sqrt{NA}}\sum_{k=0}^{r-1}\omega^{x_0kN/r}|kN/r\rangle = \sqrt{\frac{A}{N}}\sum_{k=0}^{r-1}\omega^{x_0kN/r}|kN/r\rangle.$$

The random shift $x_0$ has been eliminated from the labels and now occurs only in a pure phase $\omega^{x_0kN/r}$ (whose modulus squared is 1), and the periodicity of the ket labels has been “inverted” from r to A = N/r. Since measurement probabilities are squared moduli of the amplitudes, these probabilities are now independent of $x_0$ and depend only on N (known) and r (to be determined). This is represented schematically in the following diagram.

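[Figure: measurement probabilities plotted against labels. (a) For $|per\rangle$: probability r/N at each of the labels $x_0, x_0+r, x_0+2r, \ldots$ (spacing r). (b) For $QFT|per\rangle$: probability 1/r at each of the labels $0, N/r, 2N/r, \ldots$ (spacing N/r).]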
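The disappearance of $x_0$ from the probabilities can be seen in a direct simulation. The sketch below (NumPy) uses helper names and the values N = 24, r = 6 as illustrative assumptions, with r dividing N as required. It builds $|per\rangle$ for every possible shift $x_0$, applies the QFT matrix of eq. (18), and records the resulting measurement probabilities. For each $x_0$ they should be 1/r on the multiples of N/r and 0 elsewhere.

```python
import numpy as np


def qft_matrix(N):
    """QFT_N from eq. (18)."""
    a = np.arange(N).reshape(-1, 1)
    return np.exp(2j * np.pi * a * a.T / N) / np.sqrt(N)


def per_state(N, r, x0):
    """|per> = A^(-1/2) * sum_{j=0}^{A-1} |x0 + j*r>, where A = N/r."""
    if N % r != 0 or not 0 <= x0 < r:
        raise ValueError("need r | N and 0 <= x0 < r")
    A = N // r
    psi = np.zeros(N, dtype=complex)
    psi[x0::r] = 1 / np.sqrt(A)  # the A labels x0, x0 + r, ..., x0 + (A-1) r
    return psi


N, r = 24, 6
Q = qft_matrix(N)
# measurement probabilities of QFT|per>, one row per random shift x0
probs = np.array([np.abs(Q @ per_state(N, r, x0)) ** 2 for x0 in range(r)])
```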

If we now measure the label we will obtain a value c which is a multiple $k_0N/r$ of N/r, where $0\le k_0\le r-1$ has been chosen uniformly at random. Thus $c = k_0N/r$, so

$$\frac{k_0}{r} = \frac{c}{N}.$$

Here c and N are known and $k_0$ is unknown and random, so how do we get r out of this? If (by some good fortune!) $k_0$ was coprime to r, we could cancel c/N down to lowest terms and read off r as the denominator. If $k_0$ is not coprime to r, then this procedure will deliver a denominator $r'$ that is smaller than the correct r, so $f(x)\neq f(x+r')$ for any x (as f is one-to-one in each period). Thus in our process we check the output value $r'$ by evaluating f(0) and $f(r')$, and accept $r'$ as the correct period iff these are equal.
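The classical post-processing of a measured value c is ordinary exact arithmetic. The sketch below uses Python's `fractions.Fraction`, which always reduces c/N to lowest terms, to read off the candidate denominator $r'$, and then applies the two-query check $f(0) = f(r')$. The example function $f(x) = x \bmod 6$ on $\mathbb{Z}_{24}$ (period r = 6, one-to-one in each period) is a hypothetical illustration. Running through all $k_0$ shows that the check accepts exactly those $k_0$ that are coprime to r.

```python
from fractions import Fraction
from math import gcd


def candidate_period(c, N):
    """Cancel c/N to lowest terms and return its denominator r'."""
    return Fraction(c, N).denominator


def accept(f, r_candidate):
    """Two extra queries: accept r' as the period iff f(0) == f(r')."""
    return f(0) == f(r_candidate)


N, r = 24, 6


def f(x):
    """Hypothetical periodic function on Z_N with period r."""
    return x % r


for k0 in range(r):
    c = k0 * N // r  # the value the measurement would give
    r_prime = candidate_period(c, N)
    print(k0, c, r_prime, accept(f, r_prime), gcd(k0, r) == 1)
```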

But $k_0$ was chosen at random, so what is the chance of getting this good fortune of coprimality? We’ll use (without proof) the following theorem from number theory:

Theorem 3 (Coprimality theorem) The number of integers less than r that are coprime to r grows at least as fast as a constant times $r/\log\log r$ with increasing r. Hence if $k_0 < r$ is chosen at random,

$$\text{prob}(k_0 \text{ coprime to } r) \geq \text{const}\cdot\frac{r/\log\log r}{r} = \text{const}\cdot\frac{1}{\log\log r}.$$

Thus if we repeat the whole process $O(\log\log r)$ times, which is at most $O(\log\log N)$ times since $r\le N$, we will obtain a coprime $k_0$ in at least one case with a constant level of probability. Here we have used the following fact from probability theory:

Lemma 2 If a single trial has success probability p and we repeat the trial M times independently, then for any constant $0 < 1-\epsilon < 1$:

$$\text{prob(at least one success in } M \text{ trials)} > 1-\epsilon \quad\text{if}\quad M \geq \frac{-\log\epsilon}{p}$$

(with log denoting the natural logarithm). So to achieve any constant level $1-\epsilon$ of success probability, O(1/p) trials suffice.

Proof of lemma We have that the probability of at least one success in M runs is $1 - \text{prob(all runs fail)} = 1-(1-p)^M$. Then $1-(1-p)^M = 1-\epsilon$ if $M = \frac{-\log\epsilon}{-\log(1-p)}$, and $1-(1-p)^M > 1-\epsilon$ for every larger M. Next use the fact that $p < -\log(1-p)$ for all 0 < p < 1 to see that this M is less than $\frac{-\log\epsilon}{p}$, i.e. $M = O(1/p)$ repetitions suffice.
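Lemma 2 can be illustrated numerically. The sketch below uses hypothetical helper names, and `math.log` is the natural logarithm, matching the proof. It computes the exact smallest number of independent trials M with $(1-p)^M \le \epsilon$ and compares it with the simpler sufficient bound $\lceil -\log\epsilon / p\rceil$ from the lemma. The bound is always at least as large, and both grow like 1/p as p decreases.

```python
import math


def exact_trials(p, eps):
    """Smallest integer M with (1 - p)**M <= eps, i.e. success prob >= 1 - eps."""
    return math.ceil(math.log(eps) / math.log(1 - p))


def lemma_trials(p, eps):
    """The sufficient number of trials from Lemma 2: ceil(-log(eps) / p)."""
    return math.ceil(-math.log(eps) / p)


for p in (0.5, 0.1, 0.01):
    for eps in (0.1, 0.01):
        print(p, eps, exact_trials(p, eps), lemma_trials(p, eps))
```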

In each round we query f three times (once at the start to make $|f\rangle$ and twice more at the end to check the candidate period), so we use O(log log N) queries in all. We also need to apply the “large” unitary gate $QFT_N$ (which grows with N), and we show in the next section that this may be implemented in $O((\log N)^2)$ elementary steps. The remaining operations are all familiar arithmetic operations on integers of size O(N) (such as cancelling c/N down to lowest form). These are all well known to be computable in polynomial time, i.e. poly(log N) steps. Thus we succeed in determining the period with any constant level $1-\epsilon$ of probability, using O(log log N) queries and O(poly(log N)) further computational steps.

11.3 Efficient implementation of QFT

(This subsection is not required for exam purposes).

If $N = 2^n$ is an integer power of 2, then QFT mod N acts on n qubits. For these dimension sizes we will show how to implement QFT with a circuit of polynomial size $O(n^2)$. This is a very special property of QFT – almost all unitary transforms in dimension $2^n$ require circuits of size exponential in n for their implementation. For general N (not a power of 2) we will not describe an exact efficient (i.e. poly(log N) sized) implementation. Instead we generally approximate QFT mod N by QFT mod $2^k$, where $2^k$ is near enough to N to incur only an acceptably small reduction in the success probability of the algorithm.

Our efficient implementation of QFT is really just a translation of the classical fast Fourier transform formalism to the quantum scenario. We begin by showing that the n-qubit state

$$QFT|x\rangle = \frac{1}{\sqrt{2^n}}\sum_y\exp\left(2\pi i\frac{xy}{2^n}\right)|y\rangle$$

is actually a product state of n one-qubit states. We write $0\le x, y\le 2^n-1$ in binary (as n-bit strings of digits):

(Warning: Take care to distinguish arithmetic mod $2^n$ in $\mathbb{Z}_{2^n}$ used here from the bitwise arithmetic of n-bit strings that we used earlier!)

$$x = x_{n-1}2^{n-1} + x_{n-2}2^{n-2} + \cdots + x_1 2 + x_0$$

$$y = y_{n-1}2^{n-1} + y_{n-2}2^{n-2} + \cdots + y_1 2 + y_0$$

In $xy/2^n$ we discard any terms that are whole numbers, since these make no contribution to $\exp(2\pi i xy/2^n)$. A direct calculation gives:

$$\frac{xy}{2^n} \equiv y_{n-1}(.x_0) + y_{n-2}(.x_1x_0) + \cdots + y_0(.x_{n-1}x_{n-2}\ldots x_0) \pmod 1 \tag{21}$$

where the factors in parentheses are binary expansions, e.g.

$$.x_2x_1x_0 = \frac{x_2}{2} + \frac{x_1}{2^2} + \frac{x_0}{2^3}.$$
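Eq. (21) can be verified exhaustively for small n using exact rational arithmetic. In the sketch below (hypothetical helper names; Python's `fractions.Fraction`), `rhs_eq21` evaluates the right-hand side of eq. (21). The term for $y_k$ uses the binary fraction formed from the $n-k$ lowest bits of x, and the check is that the result differs from $xy/2^n$ by a whole number.

```python
from fractions import Fraction


def bits(v, n):
    """Binary digits [v_0, v_1, ..., v_{n-1}] of v (v_0 least significant)."""
    return [(v >> k) & 1 for k in range(n)]


def binary_fraction(digits):
    """Value of .d1 d2 ... dm = d1/2 + d2/4 + ... + dm/2**m (digits left to right)."""
    return sum(Fraction(d, 2 ** (i + 1)) for i, d in enumerate(digits))


def rhs_eq21(x, y, n):
    """y_{n-1}(.x_0) + y_{n-2}(.x_1 x_0) + ... + y_0(.x_{n-1} ... x_0)."""
    xb, yb = bits(x, n), bits(y, n)
    total = Fraction(0)
    for k in range(n):
        m = n - k  # y_k pairs with the m lowest bits of x
        digits = [xb[i] for i in range(m - 1, -1, -1)]  # x_{m-1} ... x_0
        total += yb[k] * binary_fraction(digits)
    return total


def eq21_holds(n):
    """True if xy/2**n and the right side of eq. (21) differ by an integer for all x, y."""
    return all(
        (Fraction(x * y, 2**n) - rhs_eq21(x, y, n)).denominator == 1
        for x in range(2**n)
        for y in range(2**n)
    )
```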

Now

$$\sum_y\exp\left(2\pi i\frac{xy}{2^n}\right)|y\rangle = \sum_{y_0,\ldots,y_{n-1}}\exp\left(2\pi i\frac{xy}{2^n}\right)|y_{n-1}\rangle|y_{n-2}\rangle\cdots|y_0\rangle,$$

and we want to insert the expression for $xy/2^n$ from eq. (21) into the exponential.

Since eq. (21) is a sum over the different $y_i$’s, the exponential will be a product of these terms. Hence the sum $\sum_{y_0,\ldots,y_{n-1}}$ splits up into a product of single index sums $(\sum_{y_0})(\sum_{y_1})\cdots(\sum_{y_{n-1}})$, so we get

$$\sum_y\exp\left(2\pi i\frac{xy}{2^n}\right)|y\rangle = \sum_y\exp\left(2\pi i\frac{xy}{2^n}\right)|y_{n-1}\rangle|y_{n-2}\rangle\cdots|y_0\rangle = \left(|0\rangle + e^{2\pi i(.x_0)}|1\rangle\right)\left(|0\rangle + e^{2\pi i(.x_1x_0)}|1\rangle\right)\cdots\left(|0\rangle + e^{2\pi i(.x_{n-1}\ldots x_0)}|1\rangle\right). \tag{22}$$

Hence $QFT|x\rangle$ is the product of the corresponding 1-qubit states obtained by taking each bracket with a $1/\sqrt2$ normalising factor.
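The product form (22) can also be checked numerically. The sketch below (NumPy; hypothetical helper names) compares $QFT|x\rangle$ computed from eq. (18) with the Kronecker product of the n normalised one-qubit factors. It assumes the ordering $|y\rangle = |y_{n-1}\rangle\cdots|y_0\rangle$, with $y_{n-1}$ as the first (most significant) tensor factor, as in eq. (22).

```python
from functools import reduce

import numpy as np


def qft_column(x, n):
    """QFT|x> for N = 2**n computed directly from eq. (18)."""
    N = 2**n
    y = np.arange(N)
    return np.exp(2j * np.pi * x * y / N) / np.sqrt(N)


def qft_product_form(x, n):
    """Tensor product of the n one-qubit factors of eq. (22): the first factor
    (register y_{n-1}) carries phase .x_0, the last (y_0) carries .x_{n-1}...x_0."""
    factors = []
    for k in range(n - 1, -1, -1):       # registers y_{n-1}, ..., y_0 in this order
        m = n - k                        # factor for y_k uses the m lowest bits of x
        phase = (x % 2**m) / 2**m        # the binary fraction .x_{m-1}...x_0
        factors.append(np.array([1, np.exp(2j * np.pi * phase)]) / np.sqrt(2))
    return reduce(np.kron, factors)
```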

This factorisation is the key to building our QFT circuit. The circuit should map each basis (product) state $|x_{n-1}\rangle\ldots|x_0\rangle$ into the corresponding product state given in eq. (22). Before we start, note that the Hadamard operation can be expressed in our binary fractional notation as

$$H|x\rangle = \frac{1}{\sqrt2}\left(|0\rangle + e^{2\pi i(.x)}|1\rangle\right).$$

Indeed if x = 0 resp. 1, then .x is the binary fraction 0 resp. 1/2, so $e^{2\pi i(.x)}$ is 1 resp. −1, as required.

To see how the QFT circuit actually works, let’s look at the example of N = 8, i.e. n = 3. We want a circuit that transforms $|x_2\rangle|x_1\rangle|x_0\rangle$ to the following states in these three registers (called $y_2, y_1, y_0$ at the output):

• $y_2$ register: $\frac{1}{\sqrt2}\left(|0\rangle + e^{2\pi i(.x_0)}|1\rangle\right)$, produced in STAGE 3.
• $y_1$ register: $\frac{1}{\sqrt2}\left(|0\rangle + e^{2\pi i(.x_1x_0)}|1\rangle\right)$, produced in STAGE 2.
• $y_0$ register: $\frac{1}{\sqrt2}\left(|0\rangle + e^{2\pi i(.x_2x_1x_0)}|1\rangle\right)$, produced in STAGE 1.

STAGE 1: $H|x_2\rangle$ followed by phase shifts of $e^{2\pi i(0.01)}$ and $e^{2\pi i(0.001)}$ controlled by $x_1$ and $x_0$ respectively. These operations depend on $x_0, x_1, x_2$. Do them first and accumulate the result on the $x_2$ line (as the $x_2$ line is no longer needed after this).

STAGE 2: $H|x_1\rangle$ followed by the phase shift $e^{2\pi i(0.0x_0)}$, i.e. a phase shift of $e^{2\pi i(0.01)}$ controlled by the value of $x_0$. These operations depend on $x_1, x_0$ (not $x_2$). Do them second and accumulate the result on the $x_1$ line (as the $x_1$ line is no longer needed after this).

STAGE 3: $H|x_0\rangle$. This operation depends only on $x_0$ (not $x_1, x_2$). Do it last (third) and put the result on the $x_0$ line.

After completion of these three stages, the desired final contents of the $y_0, y_1, y_2$ lines are respectively on the $x_2, x_1, x_0$ lines. Thus finally just reverse the order of the qubits in the string (e.g. by swap operations).

To draw an actual circuit diagram we consider the three stages in turn. In addition to the Hadamard gate H, we’ll introduce the 1-qubit phase gate:

$$R_d = \begin{pmatrix}1 & 0\\ 0 & e^{i\pi/2^d}\end{pmatrix} = \begin{pmatrix}1 & 0\\ 0 & e^{2\pi i(0.00\ldots01)}\end{pmatrix} \tag{23}$$

where the binary digit 1 in the last exponential is (d + 1) places to the right of the dot.

The controlled-$R_d$ gate, denoted C-$R_d$, acts on two qubits and is defined by the following actions:

$$\text{C-}R_d\,|0\rangle|\psi\rangle = |0\rangle|\psi\rangle, \qquad \text{C-}R_d\,|1\rangle|\psi\rangle = |1\rangle R_d|\psi\rangle$$

for any 1-qubit state $|\psi\rangle$. Diagrammatically this will be denoted as an $R_d$ box on the target qubit line, joined to a “blob” (•) on the control qubit line.

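In terms of all these, the circuit for QFT$_8$ is:

[Circuit diagram for QFT$_8$: three lines with inputs $|x_0\rangle, |x_1\rangle, |x_2\rangle$ and outputs $|y_0\rangle, |y_1\rangle, |y_2\rangle$. STAGE 1: $H$ on the $x_2$ line, then $R_1$ on the $x_2$ line controlled by $x_1$, then $R_2$ on the $x_2$ line controlled by $x_0$. STAGE 2: $H$ on the $x_1$ line, then $R_1$ on the $x_1$ line controlled by $x_0$. STAGE 3: $H$ on the $x_0$ line. SWAP: exchange the $x_0$ and $x_2$ lines.]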

For $N = 8 = 2^3$ we use 3 Hadamard gates (one in each stage) and $2 + 1$ controlled phase gates (in stages 1 and 2 respectively). For general $N = 2^n$ we would use $n$ Hadamard gates (one in each of $n$ stages) and $(n-1) + (n-2) + \cdots + 2 + 1 = n(n-1)/2$ controlled phase gates (in stages $1, 2, \ldots, n-1$ respectively). Overall we have $O(n^2) = O((\log N)^2)$ gates for QFT mod $N$. (In this accounting we have ignored the final swap operation to reverse the order of qubits, but this requires only a further $O(n)$ 2-qubit SWAP gates to implement.)
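The stage-by-stage construction can be checked numerically. Below is a minimal state-vector sketch in Python with NumPy (all function names are ours, and we assume qubit $j$ of a basis index holds bit $x_j$). For each line $x_j$, from $x_{n-1}$ down to $x_0$, it applies a Hadamard followed by C-$R_{j-k}$ gates controlled by the lower lines $x_k$, then reverses the qubit order, and compares the resulting unitary with the matrix whose $(y, x)$ entry is $e^{2\pi i xy/N}/\sqrt{N}$. It also counts gates, to match the $n$ and $n(n-1)/2$ tallies above.

```python
import numpy as np

def apply_h(state, j):
    """Apply a Hadamard gate to qubit j (bit j of the basis-state index)."""
    new = np.zeros_like(state)
    for x, amp in enumerate(state):
        x0, x1 = x & ~(1 << j), x | (1 << j)   # indices with bit j set to 0 / 1
        sign = -1 if (x >> j) & 1 else 1        # H|1> = (|0> - |1>)/sqrt(2)
        new[x0] += amp / np.sqrt(2)
        new[x1] += sign * amp / np.sqrt(2)
    return new

def apply_controlled_r(state, control, target, d):
    """C-R_d of eq. (23): phase exp(i*pi/2**d) when control and target bits are both 1."""
    new = state.copy()
    for x in range(len(state)):
        if (x >> control) & 1 and (x >> target) & 1:
            new[x] *= np.exp(1j * np.pi / 2**d)
    return new

def apply_swap(state, a, b):
    """Exchange qubits a and b."""
    new = np.zeros_like(state)
    for x, amp in enumerate(state):
        y = x
        if ((x >> a) & 1) != ((x >> b) & 1):
            y = x ^ ((1 << a) | (1 << b))       # flip both bits
        new[y] = amp
    return new

def qft_circuit(state, n):
    """Stage for line x_{n-1} first, ..., line x_0 last, then reverse the qubit order."""
    counts = {"H": 0, "C-R": 0}
    for j in range(n - 1, -1, -1):
        state = apply_h(state, j)
        counts["H"] += 1
        for k in range(j - 1, -1, -1):          # phases controlled by the lower lines x_k
            state = apply_controlled_r(state, control=k, target=j, d=j - k)
            counts["C-R"] += 1
    for j in range(n // 2):
        state = apply_swap(state, j, n - 1 - j)
    return state, counts

def circuit_matrix(n):
    """Unitary of the circuit: column x is the circuit applied to |x>."""
    N = 2**n
    return np.array([qft_circuit(np.eye(N, dtype=complex)[x], n)[0] for x in range(N)]).T

def qft_matrix(n):
    """Defining matrix: entry (y, x) is exp(2*pi*i*x*y/N) / sqrt(N)."""
    N = 2**n
    x, y = np.meshgrid(np.arange(N), np.arange(N))
    return np.exp(2j * np.pi * x * y / N) / np.sqrt(N)

for n in (1, 2, 3, 4):
    assert np.allclose(circuit_matrix(n), qft_matrix(n))
print(qft_circuit(np.eye(8, dtype=complex)[0], 3)[1])   # {'H': 3, 'C-R': 3}
```

The simulation loops over basis states explicitly for readability, so it is only practical for a handful of qubits.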

12 Shor’s quantum factoring algorithm

We will now describe Shor’s quantum factoring algorithm. Given an integer $N$ with $n = \log N$ digits, this algorithm will output a factor $1 < K < N$ (or output $N$ if $N$ is a prime) with any chosen constant level of probability $1 - \epsilon$, and the algorithm will run in polynomial time $O(n^3)$. Currently the best known classical algorithm (the so-called number field sieve algorithm) runs in time $e^{O(n^{1/3}(\log n)^{2/3})}$, i.e. there is no known polynomial time classical algorithm for this task.

We’ll begin by first describing some pure mathematics (number theory, involving no quantum ingredients at all!) showing how to convert the problem of factoring $N$ into a problem of periodicity determination. Then we’ll use our quantum period finding algorithm to achieve the task of factorisation. We’ll encounter (and deal with) a technical complication: our function will be periodic on the infinite set $\mathbb{Z}$ of all integers, so for computational purposes we need to truncate it down to a finite size $\mathbb{Z}_M$ for some $M$ (suitably large, depending on $N$). Since we do not know the period at the outset, the restricted function will not be exactly periodic on $\mathbb{Z}_M$: the “last” period will generally be incomplete (as $M$ is not generally an exact multiple of the period). But we’ll see that if $M$ is sufficiently large (in fact $M = O(N^2)$ will suffice) then there will be enough complete periods so that the single “corrupted” period has only a negligible effect on our period finding algorithm. We will also always choose $M$ to be a power of 2, to be able to use our explicit circuit for QFT mod $M$ for such $M$’s.

12.1 Factoring as a periodicity problem – some number theory

Let $N$ with $n = \log N$ digits denote the integer that we wish to factorise. We start by choosing $1 < a < N$ at random. Next, using Euclid’s algorithm (which is a poly-time algorithm), we compute the greatest common divisor $b = \gcd(a, N)$. If $b > 1$ we are finished. Thus suppose $b = 1$, i.e. $a$ and $N$ are coprime. We will use:

Theorem 4 (Euler’s theorem): If $a$ and $N$ are coprime then there is a least power $1 < r < N$ such that $a^r \equiv 1 \bmod N$. $r$ is called the order of $a$ mod $N$.

We omit the proof which may be found in most texts on number theory.

Now consider the powers of $a$ as a function of the index, i.e. the modular exponential function:

$$f : \mathbb{Z} \to \mathbb{Z}_N, \qquad f(k) = a^k \bmod N \qquad (24)$$

Clearly $f(k_1 + k_2) = f(k_1)f(k_2)$ (with multiplication mod $N$) and by Euler’s theorem $f(r) = 1$, so $f(k + r) = f(k)$ for all $k$, i.e. $f$ is periodic with period $r$. Also since $r$ is the least positive integer with $f(r) = 1$, we see that $f$ must be one-to-one within each period.
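To see the periodicity concretely, here is a small Python sketch (function names are ours) that evaluates $f$ with Python’s built-in three-argument `pow` and finds the period by stepping through the powers of $a$. This brute-force search takes time exponential in $n$ and is exactly what the quantum period finding algorithm replaces; it is only usable for tiny $N$. The values $a = 7$, $N = 15$ are those of Example 13 below.

```python
from math import gcd

def f(a, k, N):
    """The modular exponential function f(k) = a^k mod N of eq. (24)."""
    return pow(a, k, N)

def order(a, N):
    """Least r >= 1 with a^r = 1 mod N, by stepping through the powers of a (brute force)."""
    if gcd(a, N) != 1:
        raise ValueError("a and N must be coprime")
    r, value = 1, a % N
    while value != 1:
        value = value * a % N
        r += 1
    return r

a, N = 7, 15
r = order(a, N)
print(r, [f(a, k, N) for k in range(2 * r)])   # 4 [1, 7, 4, 13, 1, 7, 4, 13]
assert all(f(a, k + r, N) == f(a, k, N) for k in range(50))   # period r
assert len({f(a, k, N) for k in range(r)}) == r                # one-to-one within a period
```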

Next suppose we can find $r$. (We will use our quantum period finding algorithm for this.) Suppose $r$ comes out to be even. Then

$$a^r - 1 = (a^{r/2} - 1)(a^{r/2} + 1) \equiv 0 \bmod N,$$

i.e.

$$N \text{ exactly divides the product } (a^{r/2} - 1)(a^{r/2} + 1) \qquad (25)$$

(and knowing $r$ we can calculate each of these terms mod $N$ in poly($n$) time).

We know $N$ does not divide $a^{r/2} - 1$ (since $r$ was the least power $x$ such that $a^x - 1$ is divisible by $N$). Thus if $N$ does not divide $a^{r/2} + 1$, i.e. if $a^{r/2} \not\equiv -1 \bmod N$, then in eq. (25) $N$ must partly divide into $a^{r/2} - 1$ and partly into $a^{r/2} + 1$. Hence, using Euclid’s algorithm again, we compute $\gcd(a^{r/2} \pm 1, N)$, which will be nontrivial factors of $N$.

All this works provided $r$ is even and $a^{r/2} \not\equiv -1 \bmod N$. How likely is this, given that $a$ was chosen at random? We quote the following theorem.


Theorem 5 Suppose $N$ is odd and not a power of a prime. If $a < N$ is chosen uniformly at random with $\gcd(a, N) = 1$ then $\mathrm{Prob}(r \text{ is even and } a^{r/2} \not\equiv -1 \bmod N) \ge 1/2$.

For a proof of this result see Preskill’s notes page 307 et seq., Nielsen/Chuang appendix 4.3, or A. Ekert and R. Jozsa, Reviews of Modern Physics, vol. 68, pp. 733-753 (1996), appendix B.

Hence for any $N$ which is odd and not a prime power, we will obtain a factor with probability at least half. Given any candidate factor we can check it (in poly($n$) time) by test division into $N$. Thus repeating the process, say 10 times, we will fail to get a factor only with tiny probability $1/2^{10}$, and we succeed with any probability $1 - \epsilon$ with $\log_2(1/\epsilon)$ repetitions.
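Theorem 5 is quoted without proof, but it is easy to spot-check for small $N$. The Python sketch below (helper names are ours; brute-force order finding, so only for tiny $N$) computes, for a few odd $N$ that are not prime powers, the fraction of admissible $a$ (with $1 < a < N$ and $\gcd(a, N) = 1$) for which $r$ is even and $a^{r/2} \not\equiv -1 \bmod N$. This is an illustration, not a proof.

```python
from math import gcd

def order(a, N):
    """Least r >= 1 with a^r = 1 mod N (brute force; assumes gcd(a, N) = 1)."""
    r, value = 1, a % N
    while value != 1:
        value = value * a % N
        r += 1
    return r

def good_fraction(N):
    """Fraction of 1 < a < N with gcd(a, N) = 1 for which r is even and a^(r/2) != -1 mod N."""
    coprime = [a for a in range(2, N) if gcd(a, N) == 1]
    good = 0
    for a in coprime:
        r = order(a, N)
        if r % 2 == 0 and pow(a, r // 2, N) != N - 1:   # N - 1 represents -1 mod N
            good += 1
    return good / len(coprime)

for N in (15, 21, 33, 35, 39, 45, 105):   # odd and not prime powers
    print(N, round(good_fraction(N), 3))
```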

Example 13 Consider $N = 15$ and choose $a = 7$. Then a direct calculation shows that the function $f(k) = 7^k \bmod 15$ for $k = 0, 1, 2, \ldots$ has values $1, 7, 4, 13, 1, 7, 4, 13, \ldots$, so $r = 4$. Thus $7^4 - 1 = (7^2 - 1)(7^2 + 1) = (48)(50)$ is divisible by 15, and computing $\gcd(15, 48) = 3$ and $\gcd(15, 50) = 5$ gives non-trivial factors of 15.
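The final gcd step can be reproduced in a few lines of Python (a sketch; the helper name is ours). Reducing $a^{r/2}$ mod $N$ before taking the gcd keeps all numbers at about $n$ bits, even though $a^{r/2}$ itself can be enormous. This works because $\gcd(x, N) = \gcd(x \bmod N, N)$. The second call uses the values $N = 39$, $a = 7$, $r = 12$ of Example 14 later in this section.

```python
from math import gcd

def factors_from_period(a, r, N):
    """Given an even period r of a^k mod N, return gcd(a^(r/2) - 1, N) and gcd(a^(r/2) + 1, N)."""
    if r % 2:
        raise ValueError("r must be even")
    h = pow(a, r // 2, N)        # a^(r/2) reduced mod N
    return gcd(h - 1, N), gcd(h + 1, N)

print(factors_from_period(7, 4, 15))    # (3, 5)
print(factors_from_period(7, 12, 39))   # (3, 13)
```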

All of this works if $N$ is neither even nor a prime power. So how do we recognise and treat these latter cases? If $N$ is even (which is easy to recognise!) we immediately have a factor 2 and we are finished. If $N = p^l$ is a prime power then we can identify this case and find $p$ using the following result (which we quote without proof).

Lemma 3 Suppose $N = c^l$ for some integers $c, l \ge 2$. Then there is a classical polynomial time algorithm that outputs $c$.

Running this algorithm on any $N$ will output some number $c'$ and we can check if it divides $N$ or not. If $N$ was a prime power $p^l$ then $c'$ will be $p$.
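Lemma 3 is quoted without proof. One simple polynomial-time approach (our own choice, not necessarily the algorithm the lemma has in mind) tries every exponent $l \le \log_2 N$ and computes the integer $l$th root of $N$ by binary search, using exact integer arithmetic. Trying the largest $l$ first makes the output $p$ itself when $N = p^l$. A Python sketch with hypothetical function names:

```python
def integer_root(N, l):
    """floor(N ** (1/l)) using exact integer binary search (no floating point)."""
    lo, hi = 1, 1 << (N.bit_length() // l + 1)    # hi**l > N
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** l <= N:
            lo = mid
        else:
            hi = mid - 1
    return lo

def perfect_power_base(N):
    """Return c >= 2 with c**l == N for some l >= 2, trying the largest l first; else None."""
    for l in range(N.bit_length(), 1, -1):         # c >= 2 forces l <= log2(N)
        c = integer_root(N, l)
        if c >= 2 and c ** l == N:
            return c
    return None

for N in (3**5, 2**10, 7**4, 15):
    print(N, perfect_power_base(N))
```

There are $O(n)$ candidate exponents and each binary search takes $O(n)$ steps of big-integer powering, so the whole test is polynomial in $n$.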

Summarizing the process so far: given $N$ we proceed as follows.

(i) Is $N$ even? If so, output 2 and stop.

(ii) Run the algorithm of lemma 3, test divide the output and stop if a factor of $N$ is obtained.

(iii) If $N$ is neither even nor a prime power, choose $1 < a < N$ at random and compute $s = \gcd(a, N)$. If $s \ne 1$ output $s$ and stop.

(iv) If $s = 1$ find the period $r$ of $f(k) = a^k \bmod N$. (We will achieve this with any desired level of constant probability $1 - \epsilon$ using the quantum algorithm described in the next section.)

(v) If $r$ is odd, go back to (iii). If $r$ is even compute $t = \gcd(a^{r/2} + 1, N)$, so by definition $t$ is a factor of $N$. If $t \ne 1, N$ output $t$. If $t = 1$ or $N$ go back to (iii) and try again.

According to theorem 5 any run of (iv) and (v) will output a factor with probability at least $1/2$, so $K$ repetitions of looping back to (iii) will all fail only with probability at most $1/2^K$, which can be made as small as we like.
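Steps (i)-(v) can be sketched classically in Python, with brute-force order finding standing in for the quantum step (iv). That stand-in takes time exponential in $n$, so the sketch is only for checking the logic on tiny $N$. It assumes $N$ is composite: for prime $N$ neither step (ii) nor the gcd tests ever succeed, so the loop would not terminate (the “output $N$ if $N$ is prime” case is not handled here). All names are ours.

```python
import random
from math import gcd

def order(a, N):
    """Stand-in for the quantum step (iv): least r >= 1 with a^r = 1 mod N, by brute force."""
    r, value = 1, a % N
    while value != 1:
        value = value * a % N
        r += 1
    return r

def perfect_power_base(N):
    """Step (ii): c >= 2 with c**l == N for some l >= 2 (largest l first), else None."""
    for l in range(N.bit_length(), 1, -1):
        lo, hi = 1, 1 << (N.bit_length() // l + 1)
        while lo < hi:                               # binary search for floor(N**(1/l))
            mid = (lo + hi + 1) // 2
            lo, hi = (mid, hi) if mid ** l <= N else (lo, mid - 1)
        if lo >= 2 and lo ** l == N:
            return lo
    return None

def find_factor(N, rng=random):
    """Steps (i)-(v) for composite N >= 4, returning a nontrivial factor of N."""
    if N % 2 == 0:                                   # (i)
        return 2
    c = perfect_power_base(N)                        # (ii)
    if c is not None:
        return c
    while True:
        a = rng.randrange(2, N)                      # (iii): 1 < a < N
        s = gcd(a, N)
        if s != 1:
            return s
        r = order(a, N)                              # (iv)
        if r % 2 == 1:                               # (v): odd period, try again
            continue
        t = gcd(pow(a, r // 2, N) + 1, N)
        if t not in (1, N):
            return t

print(find_factor(39, random.Random(0)))
```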


12.2 Computing the period of $f(k) = a^k \bmod N$

Let $r$ denote the (as yet unknown) period of $f(k) = a^k \bmod N$ on the infinite domain $\mathbb{Z}$. We will work on the finite domain $D = \{0, 1, \ldots, 2^m - 1\}$ where $2^m$ is the least power of 2 greater than $N^2$ (see later for the reason for this choice). Let $2^m = Br + b$ with $0 \le b < r$, i.e. the domain $D$ contains $B$ full periods and only the initial part up to $b$ of the next period. Using a standard application of computation by quantum parallelism we manufacture the state

$$\frac{1}{\sqrt{2^m}}\sum_{x \in D}|x\rangle|f(x)\rangle$$

and measure the second register to obtain some value $y_0 = f(x_0)$ with $0 \le x_0 < r$. In the first register we get the state

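$$|\mathrm{per}\rangle = \frac{1}{\sqrt{A}}\sum_{k=0}^{A-1}|x_0 + kr\rangle$$

where

$$A = \begin{cases} B + 1 = \lfloor 2^m/r \rfloor + 1 & \text{if } x_0 < b, \\ B = \lfloor 2^m/r \rfloor & \text{if } x_0 \ge b. \end{cases} \qquad (26)$$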

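Let

$$\mathrm{QFT}_{2^m}|\mathrm{per}\rangle = \sum_{c=0}^{2^m-1}\tilde{f}(c)\,|c\rangle.$$

Writing $\omega = e^{2\pi i/2^m}$ we have

$$\tilde{f}(c) = \frac{1}{\sqrt{A}\sqrt{2^m}}\sum_{k=0}^{A-1}\omega^{c(x_0+kr)} = \frac{\omega^{cx_0}}{\sqrt{A}\sqrt{2^m}}\left[\sum_{k=0}^{A-1}\omega^{crk}\right].$$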

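As before (as in eq. (20), where $c$ was called $y$) the square bracket is a geometric series with ratio $\alpha = \omega^{cr}$ and we have

$$[\ldots] = 1 + \alpha + \alpha^2 + \cdots + \alpha^{A-1} = \begin{cases} \dfrac{1 - \alpha^A}{1 - \alpha} & \text{for } \alpha \ne 1, \\ A & \text{for } \alpha = 1. \end{cases}$$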
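The geometric-series formula can be checked numerically. The Python sketch below (function names are ours) evaluates $\tilde{f}(c)$ both directly from its definition and from the closed form, for $r = 12$ and $m = 11$ (the values of Example 14 later on, $N = 39$, $a = 7$) with an arbitrarily chosen $x_0 = 3$. It also checks that the probabilities $|\tilde{f}(c)|^2$ sum to 1, as they must since QFT is unitary. Exponents are reduced mod $2^m$ before exponentiating to keep the floating-point angles small.

```python
import cmath

def omega_power(e, M):
    """omega^e with omega = exp(2 pi i / M), reducing e mod M first."""
    return cmath.exp(2j * cmath.pi * (e % M) / M)

def f_tilde_direct(c, x0, r, A, M):
    """Amplitude of |c> in QFT_M |per>, summing the A terms of the definition."""
    return sum(omega_power(c * (x0 + k * r), M) for k in range(A)) / (A * M) ** 0.5

def f_tilde_closed(c, x0, r, A, M):
    """Same amplitude via the geometric-series formula with ratio alpha = omega^(c r)."""
    if (c * r) % M == 0:                     # alpha = 1: the bracket equals A
        bracket = A
    else:                                    # alpha^A = omega^(c r A)
        bracket = (1 - omega_power(c * r * A, M)) / (1 - omega_power(c * r, M))
    return omega_power(c * x0, M) * bracket / (A * M) ** 0.5

r, m, x0 = 12, 11, 3                  # illustrative values (see lead-in)
M = 2 ** m
B, b = divmod(M, r)                   # 2^m = B r + b
A = B + 1 if x0 < b else B            # eq. (26)

assert all(abs(f_tilde_direct(c, x0, r, A, M) - f_tilde_closed(c, x0, r, A, M)) < 1e-9
           for c in range(0, M, 37))
total = sum(abs(f_tilde_closed(c, x0, r, A, M)) ** 2 for c in range(M))
print(A, round(total, 9))             # total probability over all c should be 1
```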

Let’s look more closely at the ratio $\alpha = e^{2\pi i cr/2^m}$. Previously we had $r$ dividing the denominator $2^m$ exactly and $2^m/r = A$, so if $\alpha \ne 1$ then $\alpha$ was an $A$th root of unity and the geometric series summed to zero in all these cases. The only $c$ values that survived were the exact multiples of $A = 2^m/r$, having $\alpha = 1$. There were $r$ such multiples, each with equal |amplitude| of $1/\sqrt{r}$.

In the present case $r$ does not generally divide $2^m$ exactly, so $\alpha$ is not an $A$th root of unity and we don’t get a lot of “exactly zero” amplitudes for the $|c\rangle$’s! However we aim to show that a measurement on $\mathrm{QFT}|\mathrm{per}\rangle$ will yield an integer $c$-value which is close to a multiple of $2^m/r$ with suitably high probability.

Consider the $r$ multiples of $2^m/r$ (which are now not necessarily integers!):

$$0,\ \frac{2^m}{r},\ 2\left(\frac{2^m}{r}\right),\ \ldots,\ (r-1)\left(\frac{2^m}{r}\right).$$

Each of these is within half of a unique nearest integer. Note that $k(2^m/r)$ can never be exactly halfway between two integers: since $r < N$ and $2^m > N^2$, all factors of 2 in the denominator $r$ can be cancelled against the 2’s in $2^m$, so $k(2^m/r)$ in lowest terms has an odd denominator. Thus we consider $c$ values ($r$ of them) such that

$$\left|c - k\frac{2^m}{r}\right| < \frac{1}{2}, \qquad k = 0, 1, \ldots, r-1. \qquad (27)$$

In the previous case of exact periodicity (where $2^m/r$ was an integer) each of these $c$-values appeared with probability $1/r$ and all other $c$-values had probability zero. Here we will show that although the other $c$-values will generally have non-zero probabilities, the special ones in eq. (27) still have probability at least $\gamma/r$ for a constant $\gamma$.

[Figure 12.2: panels (a) exact periodicity and (b) inexact periodicity, each plotting $|\tilde{f}(c)|$ against $c$, with peaks near the multiples of $2^m/r$.]

Figure 12.2: Schematic depiction of amplitudes in $\mathrm{QFT}|\mathrm{per}\rangle$. (a) Exact periodicity ($r$ divides $2^m$): we have nonzero amplitudes only at exact multiples $c = k2^m/r$. (b) Non-exact periodicity: we have nonzero amplitudes for many $c$-values, but the integers nearest to the multiples $k2^m/r$ still have suitably large amplitudes.

Theorem 6 Suppose we measure the label in $\mathrm{QFT}|\mathrm{per}\rangle$. Let $c_k$ be the unique integer with $|c_k - k2^m/r| < 1/2$. Then $\mathrm{prob}(c_k) > \gamma/r$ where $\gamma \approx 4/\pi^2$.

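Proof: (optional) For any $c$ we have

$$\mathrm{prob}(c) = |\tilde{f}(c)|^2 = \frac{1}{A2^m}\left|\frac{1 - \alpha^A}{1 - \alpha}\right|^2$$

with $\alpha = e^{2\pi i cr/2^m} = e^{2\pi i(cr \bmod 2^m)/2^m}$. For our special $c$-values satisfying eq. (27) we have $|cr - k2^m| < r/2$, so (taking the residue mod $2^m$ in the range $(-2^{m-1}, 2^{m-1}]$)

$$-\frac{r}{2} < cr \bmod 2^m < \frac{r}{2}. \qquad (28)$$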

Write $\alpha = e^{i\theta_c}$ with $\theta_c = 2\pi(cr \bmod 2^m)/2^m$, so $|\theta_c| < \pi r/2^m$. Also from eq. (26) we see that in all cases $A < 2^m/r + 1$, so

$$|A\theta_c| < \frac{\pi r}{2^m}A < \pi\left(1 + \frac{r}{2^m}\right).$$

Write $A\theta_{\max} = \pi(1 + r/2^m)$. Note that for all $c$ satisfying eq. (27)

$$0 \le |A\theta_c/2| < A\theta_{\max}/2 < \pi. \qquad (29)$$

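To estimate prob($c$) we’ll use the algebraic identity (which follows from $|1 - e^{i\phi}| = 2|\sin(\phi/2)|$)

$$\left|\frac{1 - e^{iA\theta}}{1 - e^{i\theta}}\right|^2 = \left(\frac{\sin A\theta/2}{\sin\theta/2}\right)^2.$$

We have

$$\mathrm{prob}(c) = \frac{1}{A2^m}\left(\frac{\sin A\theta_c/2}{\sin\theta_c/2}\right)^2 > \frac{1}{A2^m}\left(\frac{\sin A\theta_c/2}{\theta_c/2}\right)^2 \quad (\text{as } |\sin x| < |x| \text{ for } x \ne 0)$$

$$= \frac{A}{2^m}\left(\frac{\sin A\theta_c/2}{A\theta_c/2}\right)^2 > \frac{A}{2^m}\left(\frac{\sin A\theta_{\max}/2}{A\theta_{\max}/2}\right)^2$$

where the last inequality follows from eq. (29) and the fact that $\frac{\sin x}{x}$ is decreasing on $0 < x < \pi$.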

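Next from eq. (26) we have $A > 2^m/r - 1$, so $\frac{A}{2^m} > \frac{1}{r} - \frac{1}{2^m}$. Introducing $g(x) = \left(\frac{\sin x}{x}\right)^2$ we have

$$\mathrm{prob}(c) > \left(\frac{1}{r} - \frac{1}{2^m}\right)g(A\theta_{\max}/2) = \frac{1}{r}\left(1 - \frac{r}{2^m}\right)g(A\theta_{\max}/2) > \frac{\gamma}{r} \qquad (30)$$

for a constant $\gamma$, noting that $2^m > N^2$ and $r < N$, so $r/2^m \ll 1$ for all large $N$.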

To get a proper lower bound for $\gamma$ is straightforward but a little messy. Here we will just consider the case of very large $N$ and ignore terms of order $r/2^m < 1/N$. We have $A\theta_{\max}/2 = \frac{\pi}{2}(1 + r/2^m) \approx \pi/2$, so $g(\pi/2) = (2/\pi)^2$, and from eq. (30) we get $\mathrm{prob}(c) > \gamma/r$ for $\gamma \approx 4/\pi^2$.
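For a concrete check of Theorem 6, the Python sketch below (our own names) evaluates $\mathrm{prob}(c_k)$ from the closed form used in the proof, for $r = 12$ and $2^m = 2048$ (the values of Example 14 later on). It does this for every possible offset $x_0$, i.e. both values of $A$ in eq. (26), and compares the result with $\gamma/r$ for $\gamma = 4/\pi^2$. The nearest integer to $k2^m/r$ is computed exactly as $\lfloor (2k2^m + r)/(2r) \rfloor$, so no rounding ties arise.

```python
import math

def prob(c, r, A, M):
    """prob(c) = (1/(A M)) (sin(A theta/2) / sin(theta/2))^2, theta = 2 pi (c r mod M) / M."""
    if (c * r) % M == 0:
        return A / M                               # alpha = 1: |A|^2 / (A M)
    theta = 2 * math.pi * ((c * r) % M) / M
    return (math.sin(A * theta / 2) / math.sin(theta / 2)) ** 2 / (A * M)

r, m = 12, 11                                      # illustrative: N = 39, a = 7
M = 2 ** m
gamma = 4 / math.pi ** 2
B, b = divmod(M, r)
for x0 in range(r):                                # every possible measured offset x0
    A = B + 1 if x0 < b else B                     # eq. (26)
    c_values = [(2 * k * M + r) // (2 * r) for k in range(r)]   # nearest integer to k M / r
    probs = [prob(c, r, A, M) for c in c_values]
    assert min(probs) > gamma / r                  # Theorem 6 for every k
print(min(probs) * r, sum(probs))                  # compare min(probs) * r with gamma ~ 0.405
```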

According to this theorem, for each $k = 0, \ldots, r-1$ we will obtain the unique $c$-value satisfying eq. (27) with probability at least $\gamma/r$. We will be especially interested in those $c$’s for which the corresponding $k$ is coprime to $r$, and there are $\Omega(r/\log\log r)$ of these (i.e. at least a constant times $r/\log\log r$). Hence the total probability of obtaining such a “good” $c$-value is $\Omega(1/\log\log r)$, which is $\Omega(1/\log\log N)$, and with $O(\log\log N)$ repetitions we will obtain such a good $c$-value with any desired constant level of probability. To complete the determination of $r$, and hence the description of the quantum factoring algorithm, it remains to show that $r$ can be determined from a (“good”) $c$-value in time poly($\log N$).

12.3 Getting r from a good c value

Suppose we have $c$ satisfying eq. (27), i.e.

$$\left|\frac{c}{2^m} - \frac{k}{r}\right| < \frac{1}{2^{m+1}}. \qquad (31)$$

Recall that $r < N$ and $2^m > N^2$, so

$$\left|\frac{c}{2^m} - \frac{k}{r}\right| < \frac{1}{2N^2} \quad \text{with } r < N \qquad (32)$$

and $c/2^m$ is a known fraction. We claim that there is at most one fraction $k'/r'$ with a denominator $r'$ less than $N$ satisfying eq. (32). Hence for given $c/2^m$, eq. (32) determines $k/r$ uniquely. To prove this claim suppose two distinct fractions $k'/r'$ and $k''/r''$ (with $r', r'' < N$) both lie within $1/(2N^2)$ of $c/2^m$. Then

$$\left|\frac{k'}{r'} - \frac{k''}{r''}\right| = \frac{|k'r'' - r'k''|}{r'r''} \ge \frac{1}{r'r''} > \frac{1}{N^2}. \qquad (33)$$

But $k'/r'$ and $k''/r''$ are both within $1/(2N^2)$ of $c/2^m$, so they must be within $1/N^2$ of each other, contradicting eq. (33). Hence there is at most one $k/r$ with $r < N$ satisfying eq. (32).

This result is the reason why we chose $2^m$ to be greater than $N^2$: it guarantees that the bound $1/2^{m+1}$ on the RHS of eq. (31) is $< 1/(2N^2)$, and then $k/r$ is uniquely determined from $c/2^m$.

Example 14 Suppose we wish to factor $N = 39$ and we have chosen $a = 7$, which is coprime to $N$. Let $r$ be the period of $f(x) = 7^x \bmod 39$. We have $N^2 = 1521$ and $2^{10} < N^2 < 2^{11} = 2048 = 2^m$, so $m = 11$. Suppose the measurement of $\mathrm{QFT}_{2^m}|\mathrm{per}\rangle$ yields $c = 853$. According to our theory, this number has a “reasonable” probability to be within half of a multiple $k2^{11}/r$ of $2^m/r$. If this is actually the case then our theory guarantees that the fraction $k/r$ is uniquely determined, as the unique fraction $k/r$ with denominator $< 39$ that is within $1/2^{m+1} = 1/2^{12}$ of $853/2048$. In this example we can (with a calculator) check all fractions $a/b$ with $a < b < N = 39$ to see which ones (if any) satisfy

$$\left|\frac{a}{b} - \frac{853}{2048}\right| < \frac{1}{2^{12}}. \qquad (34)$$

There are $O(N^2)$ such fractions to try. We find that there is only one, viz. $a/b = 5/12$, that satisfies eq. (34):

$$\left|\frac{5}{12} - \frac{853}{2048}\right| = 0.000163 < \frac{1}{2^{12}} = 0.000244.$$

This result is consistent with $k = 5$ and $r = 12$ and also with $k = 10$ and $r = 24$. But our theory also guarantees that $k$ is coprime to $r$ with “reasonable” probability, which in this case sets $r = 12$. We can then verify that $7^{12}$ is indeed congruent to 1 mod 39 and $7^x$ for all $0 < x < 12$ is not congruent to 1, so $r = 12$ is the correct period.
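The calculator search of Example 14 is easy to mechanise. This brute-force Python sketch (helper name ours; exact rational arithmetic via the standard `fractions` module) enumerates all $a/b$ with $b < N$. That is the $O(N^2)$ approach which the text goes on to replace with continued fractions. Equal fractions such as $5/12$ and $10/24$ collapse to one `Fraction` value.

```python
from fractions import Fraction

def fractions_near(c, m, N):
    """All fractions a/b with 0 <= a < b < N strictly within 1/2^(m+1) of c/2^m (O(N^2) work)."""
    x = Fraction(c, 2 ** m)
    bound = Fraction(1, 2 ** (m + 1))
    return sorted({Fraction(a, b) for b in range(1, N) for a in range(b)
                   if abs(Fraction(a, b) - x) < bound})

print(fractions_near(853, 11, 39))            # [Fraction(5, 12)]

r = 12                                         # verify the period directly
assert pow(7, r, 39) == 1 and all(pow(7, x, 39) != 1 for x in range(1, r))
```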

So far we have that $k/r$ is uniquely determined by $c/2^m$, but how do we actually compute $k/r$ from $c/2^m$? In the above example we were able to try out all candidate fractions $k'/r'$ with denominator less than $N$. But there are generally $O(N^2)$ such fractions to try, so this method of seeking the unique one is not efficient: it requires of order $N^2$ steps, which is exponential in $n = \log N$!

To obtain an efficient (i.e. poly($n$) time) method we invoke the elegant mathematical theory of continued fractions.

Theory of continued fractions

Any rational number $s/t$ (with $s < t$) may be expressed as a so-called continued fraction:

$$\frac{s}{t} = \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cfrac{1}{\cdots + \cfrac{1}{a_l}}}} \qquad (35)$$

where $a_1, \ldots, a_l$ are positive integers. To do this we begin by writing $s/t = 1/(t/s)$. Since $s < t$ we have $t/s = a_1 + s_1/t_1$ with $a_1 \ge 1$ and $s_1 < t_1 = s$, and so

$$\frac{s}{t} = \cfrac{1}{a_1 + \cfrac{s_1}{t_1}}.$$

Then repeating with $s_1/t_1$ we get $t_1/s_1 = a_2 + s_2/t_2$, $t_2 = s_1$, and

$$\frac{s}{t} = \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cfrac{s_2}{t_2}}}.$$

Continuing in this way we get a sequence of integers $a_k$, $s_k$ and $t_k$. Note that $s_k < t_k$ and $t_{k+1}$ is always given by $s_k$. Hence the sequence $t_k$ of denominators is a strictly decreasing sequence of non-negative integers, and hence the process must always terminate, after some number $l$ of iterations, giving the expression in eq. (35).
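The expansion procedure is a short loop. Here is a minimal Python sketch (the function name is ours). For $853/2048$ it reproduces the expansion worked out in Example 15 below.

```python
def continued_fraction(s, t):
    """Terms [a_1, ..., a_l] with s/t = 1/(a_1 + 1/(a_2 + ...)), for integers 0 < s < t."""
    terms = []
    while s != 0:
        a, rem = divmod(t, s)      # t/s = a + rem/s, with 0 <= rem < s
        terms.append(a)
        s, t = rem, s              # next fraction rem/s: t_{k+1} = s_k, so t strictly decreases
    return terms

print(continued_fraction(853, 2048))   # [2, 2, 2, 42, 4]
```

Each pass performs one division with remainder, exactly as in Euclid’s algorithm applied to $t$ and $s$.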

To avoid the cumbersome “fractions of fractions” notation in eq. (35) we will write

$$\cfrac{1}{a_1 + \cfrac{1}{a_2 + \cfrac{1}{\cdots + \cfrac{1}{a_l}}}} = [a_1, a_2, \ldots, a_l]. \qquad (36)$$

For each $k = 1, \ldots, l$ we can truncate the fraction in (36) at the $k$th level to get a sequence of rational numbers

$$\frac{p_1}{q_1} = [a_1] = \frac{1}{a_1}, \quad \frac{p_2}{q_2} = [a_1, a_2] = \cfrac{1}{a_1 + \cfrac{1}{a_2}} = \frac{a_2}{a_1a_2 + 1}, \quad \ldots, \quad \frac{p_k}{q_k} = [a_1, \ldots, a_k], \quad \ldots, \quad \frac{p_l}{q_l} = [a_1, \ldots, a_l] = \frac{s}{t}.$$

$p_k/q_k$ is called the $k$th convergent of the continued fraction of $s/t$.

Continued fractions enjoy the following tantalising properties.

Lemma 4 Let $a_1, \ldots, a_l$ be any positive numbers (not necessarily integers here). Set $p_0 = 0$, $q_0 = 1$, $p_1 = 1$ and $q_1 = a_1$.

(a) Then $[a_1, \ldots, a_k] = p_k/q_k$ where

$$p_k = a_kp_{k-1} + p_{k-2}, \qquad q_k = a_kq_{k-1} + q_{k-2}, \qquad k \ge 2. \qquad (37)$$

Note that if the $a_k$’s are integers then so are the $p_k$’s and $q_k$’s.

(b) $q_kp_{k-1} - p_kq_{k-1} = (-1)^k$ for $k \ge 1$.

(c) If $a_1, \ldots, a_l$ are integers then $\gcd(p_k, q_k) = 1$ for $k \ge 1$.
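Lemma 4 can be checked on a concrete example. The Python sketch below (our own helper names) computes convergents via the recurrence (37) for the terms $[2, 2, 2, 42, 4]$ of $853/2048$ (the expansion of Example 15 below). It compares each convergent with a direct evaluation of the truncated continued fraction and verifies parts (b) and (c).

```python
from fractions import Fraction
from math import gcd

def convergents(terms):
    """[(p_1, q_1), ..., (p_l, q_l)] for [a_1, ..., a_l] via the recurrence (37)."""
    p_prev, q_prev = 0, 1                 # p_0, q_0
    p, q = 1, terms[0]                    # p_1, q_1
    out = [(p, q)]
    for a in terms[1:]:
        p_prev, p = p, a * p + p_prev     # p_k = a_k p_{k-1} + p_{k-2}
        q_prev, q = q, a * q + q_prev     # q_k = a_k q_{k-1} + q_{k-2}
        out.append((p, q))
    return out

def evaluate(terms):
    """Evaluate [a_1, ..., a_k] directly, from the innermost level outwards."""
    value = Fraction(0)
    for a in reversed(terms):
        value = 1 / (a + value)
    return value

terms = [2, 2, 2, 42, 4]
conv = convergents(terms)
print(conv)          # [(1, 2), (2, 5), (5, 12), (212, 509), (853, 2048)]

full = [(0, 1)] + conv                                     # prepend (p_0, q_0)
for k in range(1, len(full)):
    (p_prev, q_prev), (p_k, q_k) = full[k - 1], full[k]
    assert Fraction(p_k, q_k) == evaluate(terms[:k])       # lemma 4(a)
    assert q_k * p_prev - p_k * q_prev == (-1) ** k        # lemma 4(b)
    assert gcd(p_k, q_k) == 1                              # lemma 4(c)
```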

Proof outline (optional):

(a) By induction on $k$. For the base case $k = 2$ direct calculation gives $[a_1, a_2] = a_2/(a_1a_2 + 1)$, and eq. (37) correctly gives $p_2 = a_2$ and $q_2 = a_1a_2 + 1$. Thus suppose eq. (37) holds for length $k$. For length $k + 1$ we have $[a_1, \ldots, a_k, a_{k+1}] = [a_1, \ldots, a_{k-1}, a_k + 1/a_{k+1}]$, where the RHS now has length $k$. Let $\tilde{p}_j/\tilde{q}_j$ be the sequence of convergents of the RHS. Then $\tilde{p}_k/\tilde{q}_k = [a_1, \ldots, a_k, a_{k+1}] = [a_1, \ldots, a_{k-1}, a_k + 1/a_{k+1}]$, and clearly $\tilde{p}_{k-1} = p_{k-1}$, $\tilde{p}_{k-2} = p_{k-2}$, and similarly for the $q$’s. Hence using the recurrence relation eq. (37) at length $k$ (twice) we get:

$$\frac{\tilde{p}_k}{\tilde{q}_k} = \frac{(a_k + 1/a_{k+1})p_{k-1} + p_{k-2}}{(a_k + 1/a_{k+1})q_{k-1} + q_{k-2}} = \frac{p_k + p_{k-1}/a_{k+1}}{q_k + q_{k-1}/a_{k+1}} = \frac{a_{k+1}p_k + p_{k-1}}{a_{k+1}q_k + q_{k-1}},$$

i.e. eq. (37) holds for $k + 1$.

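(b) is proved by induction on $k$ using the recurrence relations of (a) to express the $(k, k-1)$ expression in terms of the same expression with lower values of the subscripts: for $k \ge 2$,

$$q_kp_{k-1} - p_kq_{k-1} = (a_kq_{k-1} + q_{k-2})p_{k-1} - (a_kp_{k-1} + p_{k-2})q_{k-1} = -(q_{k-1}p_{k-2} - p_{k-1}q_{k-2}),$$

and for the base case $q_1p_0 - p_1q_0 = -1$.

(c) follows from (b): if $d$ divides $p_k$ and $q_k$ exactly then by (b), $d$ must divide $\pm 1$, i.e. $d = 1$.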

Theorem 7 Consider the continued fraction $s/t = [a_1, \ldots, a_l]$. Let $p_k/q_k = [a_1, \ldots, a_k]$ be the $k$th convergent for $k = 1, \ldots, l$. If $s$ and $t$ (cancelled to lowest terms) are $m$ bit integers then the length $l$ of the continued fraction is $O(m)$, and this continued fraction together with its convergents can be calculated in time $O(m^3)$.

Proof outline (optional):

We have $a_k \ge 1$ and $p_k, q_k \ge 1$ (for $k \ge 1$), so by the above recurrence relations $p_k$ and $q_k$ must be increasing sequences and $p_k = a_kp_{k-1} + p_{k-2} \ge 2p_{k-2}$. Similarly $q_k \ge 2q_{k-2}$. Hence $p_k$ and $q_k$ each grow at least like $2^{k/2}$ (up to a constant factor). Since $p_l/q_l = s/t$ with $p_l$ and $q_l$ coprime, we have $q_l = t < 2^m$, so we must reach $s/t$ after at most $l = O(m)$ iterations. The computation of each successive $a_k$ involves the division of $O(m)$ bit integers (and splitting off the integer parts). These arithmetic operations can be performed in $O(m^2)$ time, so we can compute all $O(m)$ $a_k$’s in $O(m^3)$ time. Similarly, using the recurrence relation, we can compute all $p_k$’s and $q_k$’s in $O(m^3)$ time too.

Theorem 8 Let $0 < x < 1$ be a rational number and suppose that $p/q$ is a rational number such that

$$\left|x - \frac{p}{q}\right| < \frac{1}{2q^2}.$$

Then $p/q$ is a convergent of the continued fraction of $x$.

Proof (optional):

Let $p/q = [a_1, \ldots, a_n]$ be the CF expansion of $p/q$ with convergents $p_j/q_j$, so $p_n/q_n = p/q$. Introduce $\delta$ defined by

$$x = \frac{p_n}{q_n} + \frac{\delta}{2q_n^2} \qquad (38)$$

so $|\delta| < 1$. We aim to show that the CF of $x$ is an extension of the CF of $p/q$, i.e. we want to construct a rational $\lambda$ so that $x = [a_1, \ldots, a_n, \lambda]$. In view of lemma 4(a) define $\lambda$ by $x = (\lambda p_n + p_{n-1})/(\lambda q_n + q_{n-1})$. Using eq. (38) to replace $x$ we get

$$\lambda = 2\left(\frac{q_np_{n-1} - p_nq_{n-1}}{\delta}\right) - \frac{q_{n-1}}{q_n}.$$

By lemma 4(b), $q_np_{n-1} - p_nq_{n-1} = (-1)^n$. We may assume that this has the same sign as $\delta$, since if it has the opposite sign then from the start we write $p/q = [a_1, \ldots, a_n - 1, 1]$, so the value of $n$ is increased by 1 and the sign is flipped. Thus without loss of generality we can assume that $(q_np_{n-1} - p_nq_{n-1})/\delta$ is positive, and so

$$\lambda = \frac{2}{|\delta|} - \frac{q_{n-1}}{q_n} > 2 - 1 = 1$$

(as $|\delta| < 1$ and $q_{n-1} \le q_n$). Next let $\lambda = b_0 + \lambda'$ where $b_0$ is the integer part and $0 \le \lambda' < 1$, and write $\lambda' = [b_1, \ldots, b_m]$ (an empty list if $\lambda' = 0$). So $x = [a_1, \ldots, a_n, \lambda] = [a_1, \ldots, a_n, b_0, b_1, \ldots, b_m]$, i.e. $p/q$ is a convergent of the CF of $x$ as required. (In the last argument we also used the easily proven fact that the CF expansion of any number is unique, except for the above trick of splitting 1 off from the last term, i.e. if $[a_1, \ldots, a_n] = [b_1, \ldots, b_m]$ and $a_n, b_m \ne 1$ then $m = n$ and $a_i = b_i$.)

Remark: Theorem 8 actually remains true for irrational $x$ too. For an irrational number the continued fraction development does not terminate: we get an infinitely long continued fraction and a corresponding infinite sequence of rational convergents $p_k/q_k$, $k = 1, 2, \ldots$. This sequence provides an efficient method of computing excellent rational approximations to an irrational number, recalling that $q_k$ grows exponentially with $k$ and (by theorem 8) it determines the accuracy of the approximation.

Now let us return to our problem of getting $r$ from the knowledge of $c$ and $2^m$ satisfying eq. (32):

$$\left|\frac{c}{2^m} - \frac{k}{r}\right| < \frac{1}{2N^2} \quad \text{and} \quad r < N.$$

We know that there is (at most) a unique such fraction $k/r$. Since $r < N$ we have $1/(2N^2) < 1/(2r^2)$, so according to theorem 8 this fraction must be a convergent of the continued fraction of $c/2^m$. Since $2^m = O(N^2)$ we have that $c$ and $2^m$ are $O(n)$ bit integers and the computation of all the convergents can be performed in time $O(n^3)$. So we do this computation and finally check through the list of $O(n)$ convergents to find the unique one satisfying eq. (32), and read off $r$ as its denominator.
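Putting the pieces together, the classical post-processing can be sketched in Python (helper names ours; exact arithmetic via the standard `fractions` module). It expands $c/2^m$ as a continued fraction, forms the convergents by the recurrence (37), and returns the first one with denominator below $N$ that satisfies eq. (32).

```python
from fractions import Fraction

def continued_fraction(s, t):
    """CF terms of s/t for integers 0 < s < t (expansion procedure of eq. (35))."""
    terms = []
    while s:
        a, rem = divmod(t, s)
        terms.append(a)
        s, t = rem, s
    return terms

def convergents(terms):
    """Pairs (p_k, q_k) from the recurrence (37)."""
    p_prev, q_prev, p, q = 0, 1, 1, terms[0]
    out = [(p, q)]
    for a in terms[1:]:
        p_prev, p = p, a * p + p_prev
        q_prev, q = q, a * q + q_prev
        out.append((p, q))
    return out

def recover_k_over_r(c, m, N):
    """The unique convergent k/r of c/2^m with r < N satisfying eq. (32), or None."""
    if c == 0:
        return None                        # c = 0 (k = 0) carries no information about r
    x = Fraction(c, 2 ** m)
    for p, q in convergents(continued_fraction(c, 2 ** m)):
        if q < N and abs(x - Fraction(p, q)) < Fraction(1, 2 * N * N):
            return p, q
    return None

print(recover_k_over_r(853, 11, 39))       # (5, 12), as in Example 14
```

If $k$ and $r$ share a factor, the returned denominator is only a divisor of $r$. This is why a “good” $c$-value is needed.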

Example 15 (Continuation of example 14).

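Suppose we have obtained $c = 853$ with $2^m = 2^{11} = 2048$. We develop $853/2048$ as a continued fraction:

$$\frac{853}{2048} = \frac{1}{2048/853}; \quad \frac{2048}{853} = 2 + \frac{342}{853}; \quad \frac{853}{342} = 2 + \frac{169}{342}; \quad \frac{342}{169} = 2 + \frac{4}{169}; \quad \frac{169}{4} = 42 + \frac{1}{4}; \quad \frac{4}{1} = 4 + 0,$$

so

$$\frac{853}{2048} = [2, 2, 2, 42, 4].$$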

The convergents are

$$[2] = \frac{1}{2}; \quad [2, 2] = \frac{2}{5}; \quad [2, 2, 2] = \frac{5}{12}; \quad [2, 2, 2, 42] = \frac{212}{509}; \quad [2, 2, 2, 42, 4] = \frac{853}{2048}.$$

Checking these five fractions we find only $5/12$ as being within $1/2^{12}$ of $853/2048$ and having denominator $< 39$.


12.4 Assessing the complexity of the quantum factoring algorithm

Let us now consider all the parts of the quantum factoring algorithm and assess the time complexity of the whole process. Recall that the best known classical algorithm to factor $N$ with $n = \log N$ digits runs in a time that’s exponential in $n^{1/3}$.

Consider the case where N is neither even nor a prime power and a < N chosen at

random is coprime to N. In this case we must proceed to use the quantum part of the

overall algorithm summarised at the end of section 12.1 i.e. the quantum part (iv), in

addition to some further classical computational steps as well.

We first need to compute the function $f(k) = a^k \bmod N$ (in superposition) over a domain $0 \le k < 2^m$ where $2^m = O(N^2)$, so $m = O(n)$. To compute $a^k$ we use repeated squaring of $a$, $\lfloor\log k\rfloor$ times, giving the powers $a^{2^j} \bmod N$, and then multiply together those powers corresponding to the 1’s in the binary expansion of $k$. This requires $O(\log k) = O(m) = O(n)$ multiplications of integers mod $N$. Each such multiplication can be performed in $O(n^2)$ time (by the standard “long multiplication” algorithm), so the computation of $f(k)$ for any $0 \le k < 2^m$ can be performed in $O(n^3)$ steps. To compute the uniform superposition of all inputs for this computation we need $m = O(n)$ initial Hadamard operations. Thus the state $|f\rangle = \frac{1}{\sqrt{2^m}}\sum_k |k\rangle|f(k)\rangle$ can be computed in $O(n^3)$ steps.
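The repeated-squaring step can be sketched classically in Python. This is a minimal illustration with our own function name; in the algorithm itself the computation is carried out in superposition over $k$. It forms the squares $a^{2^j} \bmod N$ and multiplies in those matching the 1 bits of $k$, counting multiplications to show the $O(\log k)$ bound: at most two per bit of $k$. It assumes $N > 1$.

```python
def mod_exp(a, k, N):
    """Return (a^k mod N, number of multiplications mod N), by square-and-multiply (N > 1)."""
    result, square, mults = 1, a % N, 0
    while k > 0:
        if k & 1:                           # current binary digit of k is 1
            result = result * square % N    # multiply in a^(2^j)
            mults += 1
        square = square * square % N        # a^(2^j) -> a^(2^(j+1))
        mults += 1
        k >>= 1
    return result, mults

value, mults = mod_exp(7, 853, 39)
assert value == pow(7, 853, 39)             # agrees with Python's built-in modular pow
print(value, mults)
```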

Remark: There exist algorithms for integer multiplication that are faster than $O(n^2)$ time, running in time $O(n\log n\log\log n)$, so the above $O(n^3)$ can be improved to $O(n^2\log n\log\log n)$.

Next we perform measurements on the output register of $O(n)$ qubits, i.e. $O(n)$ single qubit measurements. Then we apply QFT mod $2^m$ to obtain the state $\mathrm{QFT}|\mathrm{per}\rangle$. We have seen in section 11.3 that QFT mod $2^m$ may be implemented in $O(m^2) = O(n^2)$ steps.

Remark: There is a further subtle issue here. To implement QFT mod $2^m$ we will need controlled-$R_d$ gates (cf. eq. (23)) with smaller and smaller phases $e^{i\pi/2^d}$ for $d = O(m)$, which potentially involves an implementational cost that grows with $m$. However it can be shown that we can neglect these gates for very small phases, giving an inexact but still suitably good approximation to QFT for the factoring algorithm to work, and still have implementational cost $O(n^2)$.

Next we measure the state $\mathrm{QFT}|\mathrm{per}\rangle$ ($O(n)$ single qubit measurements again) to obtain the value that we called $c$ in section 12.3. Thus to get such a value the number of steps is $O(n^2\log n\log\log n) + O(n) + O(n^2) + O(n) = O(n^2\log n\log\log n)$. To get the period $r$ we need $c$ to be a “good” $c$ value, i.e. $c/2^m$ is close to a multiple $k/r$ of $1/r$ where $k$ is coprime to $r$. To achieve this with a constant level of probability, $O(\log\log N) = O(\log n)$ repetitions of the above process suffice, i.e. $O(n^2(\log n)^2\log\log n)$ steps in all.

Remark: Actually it may be shown that a constant number of repetitions suffices here (instead of $O(\log n)$) to determine $r$. Suppose that in two repetitions we obtain $k_1/r$ and $k_2/r$ with neither $k_1$ nor $k_2$ coprime to $r$. Then we will determine $r_1$ and $r_2$, which are the denominators of $k_1/r$ and $k_2/r$ cancelled to lowest terms, i.e. $r_1$ and $r_2$ will be randomly chosen factors of $r$. Then, according to a further theorem of number theory, if we compute the least common multiple $\tilde{r}$ of $r_1$ and $r_2$ we will have $\tilde{r} = r$ with probability at least $1/4$.

To get $r$ from $c$ we use the (classical) continued fractions algorithm, which requires $O(n^3)$ steps. Finally, to obtain our factor of $N$, we (classically) compute $t = \gcd(a^{r/2} + 1, N)$ using Euclid’s algorithm, which requires $O(n^3)$ steps for $n$ digit integers. If $r$ was odd, or $r$ is even but $t = 1$ or $N$, then we go back to the start. But we saw that the good case “$r$ is even and $t \ne 1, N$” will occur with any fixed constant level of probability $1 - \epsilon$ after a constant number $O(\log 1/\epsilon)$ of such repetitions.

Hence the time complexity of the entire algorithm is $O(n^3)$ (or actually slightly better with optimized algorithms and a more careful, complicated analysis). It is amusing to note that the “bottlenecks” of the algorithm’s performance, i.e. the sections requiring the highest degree polynomial running times, are actually the classical processing sections and not the novel quantum parts!

13 Quantum algorithms for search problems

Searching is a fundamentally important computational task; most important computational problems can be thought of as searching tasks. For example consider the class NP, which we intuitively think of as problems that are “hard to solve” (i.e. no poly time algorithm known) but if a solution (or certificate of a solution) is given then its correctness can be “easily verified” (i.e. in poly time). Typically we are faced with a search over an exponentially large space of candidates seeking a “good” candidate. Given any candidate it is easy to check if it is good or not.

In this section we will consider the following problem. Suppose we are given a large

database with N items and we wish to locate a particular item. We assume that the

database is entirely unstructured or unsorted but given any item we can easily check

whether or not it is the one we seek. Our algorithm should locate the item with some

constant level of probability (half say) independent of the size N. Each access to the

database is called a query and we normally regard it as one computational step.

For classical computation we may argue that $O(N)$ queries will be necessary and sufficient: the good item has completely unknown location; if we examine an item and find it bad, we gain no further information about the location of the good item (beyond the fact that it is not the current one). Hence if we examine $m$ items the probability of seeing the good one is $p = m/N$, so we must have $m$ proportional to $N$ to have $p$ constant.

For quantum computation we will see that $O(\sqrt{N})$ queries are sufficient (and necessary) to locate the good item, i.e. we get a quadratic speedup over classical search. This speedup does not cross the polynomial vs. exponential divide (as it did in the case of the factoring algorithm) but it is still viewed as significant in situations where exhaustive search is the best known classical algorithm. At first sight we might have naively expected an exponential quantum speedup here: suppose $N = 2^n$ and recall that a quantum algorithm can easily access $2^n$ items in superposition (by use of only $n = \log N$ Hadamard operations), so we can look up the “goodness” of all items in superposition, with just one query! We may then hope that we could manipulate the resulting quantum state to efficiently reveal the good item. But the above-quoted result shows that this hope cannot be realised. Intuitively, the good item occurs with only an exponentially small amplitude in the total superposition. If the item were relocated to another place then the corresponding quantum state would differ only by an exponentially small amount in the space of quantum states, and it would thus be very difficult to reliably distinguish by any physical process.

Above we have initiated a consideration of unstructured search. But databases are often structured in a way that can facilitate the search. As an example suppose our $N$ items are labelled by the numbers from 1 to $N$ and we seek the one labelled $k$. Unstructured search (requiring $O(N)$ queries) corresponds to the database containing the numbers in some unknown random order. But if the items are structured by being presented in numerical order, then we can locate $k$ with only $O(\log N)$ queries (in fact exactly $\lceil\log N\rceil$ queries) using a binary search procedure: each query of a middle item eliminates an entire half of the remaining database. This kind of structured search is common in practice, e.g. the lexicographic ordering of names in a large phone book facilitating search for a given person’s number. But suppose we were given a person’s number and asked to determine their name. Then we would be faced with an essentially unstructured search requiring a lot more time!

In the following we will consider quantum algorithms for only unstructured search, in particular Grover’s quantum searching algorithm, which achieves this search in $O(\sqrt{N})$ queries. The issue of understanding which kinds of structure in a database can provide a good benefit for quantum versus classical computation is still largely open and a topic of current research. (One interesting known result is that in the case of a linearly ordered database (such as the phone book above) any quantum algorithm still requires of order $\log N$ queries, but the actual number of queries now is $k\log N$ with $k$ strictly less than 1.)

As a preliminary to our discussion of Grover’s algorithm we introduce some further

features of the Dirac bra-ket notation.

13.1 Reflections and projections in Dirac ket notation

For clarity we will present the discussion here with 2 dimensional (qubit) states although

all the issues generalise readily to any dimension d.

Recall that a ket $|\psi\rangle$ is represented in components as a column vector and the corresponding bra $\langle\psi|$ is just a notation for the conjugate transpose (a row vector):

$$\begin{pmatrix} a_0 \\ a_1 \end{pmatrix}^{\dagger} = \begin{pmatrix} a_0^* & a_1^* \end{pmatrix}.$$

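Here we use the dagger symbol to denote the conjugate transpose of any matrix and the star symbol (rather than the previously used overline) to denote complex conjugation. If

$$|\alpha\rangle = \begin{pmatrix} a_0 \\ a_1 \end{pmatrix}, \qquad |\beta\rangle = \begin{pmatrix} b_0 \\ b_1 \end{pmatrix}$$

are any kets then $\langle\alpha|\beta\rangle$ is the inner product obtained by matrix multiplication

$$\langle\alpha|\beta\rangle = \begin{pmatrix} a_0^* & a_1^* \end{pmatrix} \begin{pmatrix} b_0 \\ b_1 \end{pmatrix} = a_0^* b_0 + a_1^* b_1.$$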
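Thus we can view $\langle\alpha|$ as a mapping from states to complex numbers, mapping any $|\psi\rangle$ to $\langle\alpha|\psi\rangle$. Consider now the construction

$$P_{|\alpha\rangle} = |\alpha\rangle\langle\alpha| = \begin{pmatrix} a_0 \\ a_1 \end{pmatrix} \begin{pmatrix} a_0^* & a_1^* \end{pmatrix} = \begin{pmatrix} a_0 a_0^* & a_0 a_1^* \\ a_1 a_0^* & a_1 a_1^* \end{pmatrix}.$$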

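This is a 2-by-2 matrix, i.e. an operation mapping kets to kets. In Dirac notation we have $P_{|\alpha\rangle}|\beta\rangle = |\alpha\rangle\langle\alpha|\beta\rangle$, i.e. $P_{|\alpha\rangle}|\beta\rangle$ is always proportional to $|\alpha\rangle$ with multiplicative constant given by the inner product $\langle\alpha|\beta\rangle$. More cumbersomely, in terms of components we have

$$\begin{pmatrix} a_0 a_0^* & a_0 a_1^* \\ a_1 a_0^* & a_1 a_1^* \end{pmatrix} \begin{pmatrix} b_0 \\ b_1 \end{pmatrix} = \cdots = \begin{pmatrix} (a_0^* b_0 + a_1^* b_1)\, a_0 \\ (a_0^* b_0 + a_1^* b_1)\, a_1 \end{pmatrix}$$

as expected.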

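Now let $|\alpha^\perp\rangle$ be any chosen normalised vector that is orthogonal to $|\alpha\rangle$, i.e. $\langle\alpha^\perp|\alpha\rangle = 0$. Then, taking $|\alpha\rangle$ to be normalised as well, $\{|\alpha\rangle, |\alpha^\perp\rangle\}$ is an orthonormal basis of the two-dimensional space, so any ket $|\beta\rangle$ can be uniquely written as a sum of components parallel and perpendicular to $|\alpha\rangle$:

$$|\beta\rangle = x\,|\alpha\rangle + y\,|\alpha^\perp\rangle \qquad (39)$$

for some $x$ and $y$, with $|x|^2 + |y|^2 = 1$ when $|\beta\rangle$ is normalised. Then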

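$$P_{|\alpha\rangle}|\beta\rangle = x\,|\alpha\rangle\langle\alpha|\alpha\rangle + y\,|\alpha\rangle\langle\alpha|\alpha^\perp\rangle = x\,|\alpha\rangle,$$

i.e. geometrically $P_{|\alpha\rangle}$ is the operator of orthogonal projection onto the line through $|\alpha\rangle$.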

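Next consider

$$I_{|\alpha\rangle} = I - 2|\alpha\rangle\langle\alpha| = I - 2P_{|\alpha\rangle}$$

(where $I$ is the identity operator). Referring to eq. (39) we have

$$\begin{aligned} I_{|\alpha\rangle}|\beta\rangle &= (I - 2P_{|\alpha\rangle})(x|\alpha\rangle + y|\alpha^\perp\rangle) \\ &= x|\alpha\rangle + y|\alpha^\perp\rangle - 2x|\alpha\rangle \\ &= -x|\alpha\rangle + y|\alpha^\perp\rangle. \end{aligned}$$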

Hence $I_{|\alpha\rangle}$ simply reverses the sign of the component of $|\beta\rangle$ that is parallel to $|\alpha\rangle$, so geometrically we interpret $I_{|\alpha\rangle}$ as a reflection operator, reflecting in the mirror line $|\alpha^\perp\rangle$ perpendicular to $|\alpha\rangle$. Pictorially we have:


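[Figure: the kets $|\alpha\rangle$ and $|\alpha^\perp\rangle$, a ket $|\beta\rangle$, its projection $P_{|\alpha\rangle}|\beta\rangle$ onto the line of $|\alpha\rangle$, and its mirror image $I_{|\alpha\rangle}|\beta\rangle$ in the line $|\alpha^\perp\rangle$.]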

Note that these operations act in a space of complex vectors. But in the special case that all components are real numbers, they are exactly projections and reflections in real 2-dimensional Euclidean geometry.
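As a numerical sanity check of the formulas for $P_{|\alpha\rangle}$ and $I_{|\alpha\rangle}$, here is a minimal Python sketch using plain lists of complex numbers (no linear algebra library is assumed). The particular kets $|\alpha\rangle$, $|\alpha^\perp\rangle$ and the coefficients $x$, $y$ are arbitrary illustrative choices.

```python
def inner(u, v):
    """<u|v> = sum_i conj(u_i) * v_i for 2-component kets stored as lists."""
    return sum(a.conjugate() * b for a, b in zip(u, v))

def outer(u, v):
    """The 2x2 matrix |u><v|, with entries u_i * conj(v_j)."""
    return [[u[i] * v[j].conjugate() for j in range(2)] for i in range(2)]

def apply(m, v):
    """Matrix-vector product m|v>."""
    return [sum(m[i][j] * v[j] for j in range(2)) for i in range(2)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def projector(alpha):
    """P_alpha = |alpha><alpha| (alpha assumed normalised)."""
    return outer(alpha, alpha)

def reflector(alpha):
    """I_alpha = I - 2|alpha><alpha|."""
    p = projector(alpha)
    return [[(1 if i == j else 0) - 2 * p[i][j] for j in range(2)] for i in range(2)]

def flatten(z):
    """Flatten a 2x2 matrix to a list; leave a ket (list of numbers) as it is."""
    return [w for row in z for w in row] if isinstance(z[0], list) else list(z)

def close(u, v, tol=1e-12):
    """Entrywise comparison of two kets or two 2x2 matrices."""
    return all(abs(a - b) < tol for a, b in zip(flatten(u), flatten(v)))

alpha = [0.6 + 0j, 0.8j]          # normalised: 0.36 + 0.64 = 1
alpha_perp = [0.8 + 0j, -0.6j]    # <alpha_perp|alpha> = 0.48 - 0.48 = 0
x, y = 0.6, 0.8j                  # |x|^2 + |y|^2 = 1, as in eq. (39)
beta = [x * a + y * b for a, b in zip(alpha, alpha_perp)]

P, R = projector(alpha), reflector(alpha)
print(close(apply(P, beta), [x * a for a in alpha]))     # P|beta> = x|alpha>
print(close(apply(R, beta),
            [-x * a + y * b for a, b in zip(alpha, alpha_perp)]))  # parallel part flipped
```

Both lines print `True`; the same helpers also confirm that $P_{|\alpha\rangle}^2 = P_{|\alpha\rangle}$ and $I_{|\alpha\rangle}^2 = I$, as expected of a projection and a reflection.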

Let $U = \begin{pmatrix} s & t \\ u & v \end{pmatrix}$ be a unitary operation. Note that the dagger operation (of conjugate transposition) reverses the order of any matrix multiplication:

$$(AB)^\dagger = B^\dagger A^\dagger \quad \text{for any matrices } A \text{ and } B.$$

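Hence if $|\psi\rangle = U|\alpha\rangle$ then $\langle\psi| = (U|\alpha\rangle)^\dagger = |\alpha\rangle^\dagger U^\dagger = \langle\alpha|U^\dagger$. In components we have

$$\langle\psi| = \left[\begin{pmatrix} s & t \\ u & v \end{pmatrix}\begin{pmatrix} a_0 \\ a_1 \end{pmatrix}\right]^\dagger = \begin{pmatrix} a_0^* & a_1^* \end{pmatrix}\begin{pmatrix} s^* & u^* \\ t^* & v^* \end{pmatrix}.$$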

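Thus in Dirac notation, if $|\psi\rangle = U|\alpha\rangle$ then

$$P_{|\psi\rangle} = |\psi\rangle\langle\psi| = U|\alpha\rangle\langle\alpha|U^\dagger = U P_{|\alpha\rangle} U^\dagger \qquad (40)$$

i.e. we have $P_{U|\alpha\rangle} = U P_{|\alpha\rangle} U^\dagger$. Similarly we have $I_{U|\alpha\rangle} = U I_{|\alpha\rangle} U^\dagger$.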
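The conjugation rules $P_{U|\alpha\rangle} = U P_{|\alpha\rangle} U^\dagger$ and $I_{U|\alpha\rangle} = U I_{|\alpha\rangle} U^\dagger$ can be checked numerically in the same spirit. The sketch below assumes a particular parametrised family of $2\times 2$ unitaries (chosen only for illustration) and compares both sides entry by entry.

```python
import cmath
import math

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def dagger(m):
    """Conjugate transpose of a 2x2 matrix."""
    return [[m[j][i].conjugate() for j in range(2)] for i in range(2)]

def apply(m, v):
    return [sum(m[i][j] * v[j] for j in range(2)) for i in range(2)]

def outer(u, v):
    """|u><v| for 2-component kets."""
    return [[u[i] * v[j].conjugate() for j in range(2)] for i in range(2)]

def reflector(a):
    """I_a = I - 2|a><a|."""
    p = outer(a, a)
    return [[(1 if i == j else 0) - 2 * p[i][j] for j in range(2)] for i in range(2)]

def close(a, b, tol=1e-12):
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(2) for j in range(2))

# An (assumed) unitary of the form
# [[cos t, -e^{i phi} sin t], [e^{i lam} sin t, e^{i(phi+lam)} cos t]].
t, phi, lam = 0.7, 1.3, -0.4
U = [[complex(math.cos(t)), -cmath.exp(1j * phi) * math.sin(t)],
     [cmath.exp(1j * lam) * math.sin(t), cmath.exp(1j * (phi + lam)) * math.cos(t)]]
Ud = dagger(U)

alpha = [0.6 + 0j, 0.8j]
psi = apply(U, alpha)                       # |psi> = U|alpha>

print(close(matmul(U, Ud), [[1, 0], [0, 1]]))                              # U is unitary
print(close(outer(psi, psi), matmul(matmul(U, outer(alpha, alpha)), Ud)))  # eq. (40)
print(close(reflector(psi), matmul(matmul(U, reflector(alpha)), Ud)))      # I_{U alpha}
```

All three checks print `True`.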

13.2 Grover’s Quantum Searching Algorithm

We consider the fundamental problem of unstructured search for a unique item: given a search space of size $N$ in which a single (random) entry has been marked, our task is to locate it. More precisely, we wish to devise a procedure that will locate it with some constant level of probability (say $\frac{1}{2}$), independent of the size $N$ of the database. We have argued in the introduction that any classical method will need $\Omega(N)$ queries to solve this problem. We now describe a quantum algorithm, originally due to Lov Grover in 1996, which solves the problem with only $O(\sqrt{N})$ queries. In the quantum context we allow the simultaneous querying of many elements of the search space in superposition, which counts as one query. We will give a simple geometrical interpretation of Grover's algorithm which clarifies its workings.

It will be convenient to take the size $N$ of our search space to be a power of 2, viz. $N = 2^n$. Thus we can label the entries by bit strings (i.e. strings of 0's and 1's) of length $n$. Let $B = \{0, 1\}$ and let $B^n$ denote the set of all $2^n$ $n$-bit strings. Our search problem may then be phrased in terms of a black box promise problem as follows. We will replace the database by a black box which computes an $n$-bit function $f : B^n \to B$. It is promised that $f(x) = 0$ for all $n$-bit strings except exactly one string, denoted $x_0$ (the “marked” position that we seek), for which $f(x_0) = 1$. Our problem is to determine $x_0$. As usual we assume that $f$ is given as a unitary transformation $U_f$ on $n + 1$ qubits defined by

$$U_f|x\rangle|y\rangle = |x\rangle|y \oplus f(x)\rangle \qquad (41)$$

Here the input register $|x\rangle$ consists of $n$ qubits as $x$ ranges over all $n$-bit strings and the output register $|y\rangle$ consists of a single qubit with $y = 0$ or 1. The symbol $\oplus$ denotes addition modulo 2. Pictorially we have

[Figure: a box $U_f$ with inputs $|x\rangle$, $|y\rangle$ and outputs $|x\rangle$, $|y \oplus f(x)\rangle$.]

The assumption that the database is unstructured is formalised here as the standard oracle idealisation that we have no access to the internal workings of $U_f$: it operates as a “black box” on the input and output registers, telling us only whether the queried item is good or not.
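For small $n$ the action (41) can be simulated classically on a list of $2^{n+1}$ amplitudes. In this sketch the amplitude of $|x\rangle|y\rangle$ is stored at index $2x + y$ (an illustrative storage convention, not anything in the notes), and the function `make_oracle` and the example marked item $x_0 = 5$ are hypothetical. The simulation necessarily looks inside $f$, which is exactly what the black-box idealisation forbids the search algorithm itself from doing.

```python
def make_oracle(f, n):
    """Return U_f of eq. (41) acting on a list of 2**(n+1) amplitudes.

    The amplitude of |x>|y> is stored at index 2*x + y, where x is an n-bit
    integer (input register) and y is 0 or 1 (output qubit)."""
    def U_f(state):
        out = [0j] * len(state)
        for x in range(2 ** n):
            for y in (0, 1):
                # |x>|y>  ->  |x>|y XOR f(x)>
                out[2 * x + (y ^ f(x))] += state[2 * x + y]
        return out
    return U_f

def basis(n, x, y):
    """The basis state |x>|y> as an amplitude list."""
    state = [0j] * 2 ** (n + 1)
    state[2 * x + y] = 1
    return state

n, x0 = 3, 5                      # hypothetical marked item x0 = 101

def f(x):
    return 1 if x == x0 else 0

U_f = make_oracle(f, n)
print(U_f(basis(n, 5, 0)) == basis(n, 5, 1))   # marked x: output bit flips
print(U_f(basis(n, 2, 1)) == basis(n, 2, 1))   # unmarked x: unchanged
```

Since $y \oplus f(x) \oplus f(x) = y$, applying $U_f$ twice restores any state, which is one way to see that $U_f$ merely permutes the basis states and is therefore unitary.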

Instead of using $U_f$ we will generally use a closely related operation denoted $I_{x_0}$ on $n$ qubits. It is defined by

$$I_{x_0}|x\rangle = \begin{cases} |x\rangle & \text{if } x \neq x_0 \\ -|x_0\rangle & \text{if } x = x_0 \end{cases} \qquad (42)$$

i.e. $I_{x_0}$ simply inverts the amplitude of the $|x_0\rangle$ component. If $x_0$ is the $n$-bit string $00\ldots0$ then $I_{x_0}$ will be written simply as $I_0$.

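A black box which performs $I_{x_0}$ may be simply constructed from $U_f$ by just setting the output register to $\frac{1}{\sqrt{2}}(|0\rangle - |1\rangle)$. Then the action of $U_f$ leaves the output register in this state and effects $I_{x_0}$ on the input register, since by eq. (41)

$$U_f|x\rangle \tfrac{1}{\sqrt{2}}(|0\rangle - |1\rangle) = (-1)^{f(x)}|x\rangle \tfrac{1}{\sqrt{2}}(|0\rangle - |1\rangle).$$

Pictorially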


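[Figure: the box $U_f$ with inputs $|\psi\rangle$ and $\frac{1}{\sqrt{2}}(|0\rangle - |1\rangle)$, and outputs $I_{x_0}|\psi\rangle$ and $\frac{1}{\sqrt{2}}(|0\rangle - |1\rangle)$.]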

Our searching problem becomes the following: we are given a black box which computes $I_{x_0}$ for some $n$-bit string $x_0$ and we want to determine the value of $x_0$ using the least number of queries to the box.
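The construction of $I_{x_0}$ from $U_f$ (often called phase kickback) can be checked numerically. This Python sketch includes its own copy of a hypothetical `make_oracle` helper (amplitude of $|x\rangle|y\rangle$ stored at index $2x + y$), prepares $|\psi\rangle \otimes \frac{1}{\sqrt{2}}(|0\rangle - |1\rangle)$ for an arbitrarily chosen $|\psi\rangle$ and marked item, and compares the result of one $U_f$ call with $(I_{x_0}|\psi\rangle) \otimes \frac{1}{\sqrt{2}}(|0\rangle - |1\rangle)$.

```python
import math

def make_oracle(f, n):
    """U_f of eq. (41) on a list of 2**(n+1) amplitudes (|x>|y> at index 2*x + y)."""
    def U_f(state):
        out = [0j] * len(state)
        for x in range(2 ** n):
            for y in (0, 1):
                out[2 * x + (y ^ f(x))] += state[2 * x + y]
        return out
    return U_f

def tensor(a, b):
    """Amplitudes of |a>|b> with index 2*x + y, matching make_oracle."""
    return [a[x] * b[y] for x in range(len(a)) for y in range(len(b))]

def I_x0(state, marked):
    """Eq. (42): flip the sign of the |marked> amplitude."""
    return [-amp if x == marked else amp for x, amp in enumerate(state)]

n, x0 = 2, 3                                    # hypothetical marked item x0 = 11

def f(x):
    return 1 if x == x0 else 0

U_f = make_oracle(f, n)
psi = [0.1, 0.7, 0.1, 0.7]                      # an arbitrary normalised 2-qubit state
minus = [1 / math.sqrt(2), -1 / math.sqrt(2)]   # (|0> - |1>)/sqrt(2)

out = U_f(tensor(psi, minus))
expected = tensor(I_x0(psi, x0), minus)
print(all(abs(a - b) < 1e-12 for a, b in zip(out, expected)))   # True
```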

We will work in a space of $n$ qubits with a standard basis $\{|x\rangle\}$ labelled by $n$-bit strings $x$. Let $\mathcal{B}_n$ denote the space of all $n$-qubit states. Let $H_n = H \otimes \cdots \otimes H$ acting on $\mathcal{B}_n$ denote the application of $H$ to each of the $n$ qubits separately.

Grover's quantum searching algorithm operates as follows. Having no initial information about $x_0$ we begin with the state

$$|\psi_0\rangle = H_n|0\ldots0\rangle = \frac{1}{\sqrt{2^n}} \sum_x |x\rangle \qquad (43)$$

which is an equal superposition of all possible $x_0$ values. Consider the compound operator $Q$ defined by

$$Q = -H_n I_0 H_n I_{x_0}. \qquad (44)$$

Note that all amplitudes in $|\psi_0\rangle$ and all matrix elements of $Q$ are real numbers, so to analyse $Q$ we will be justified in using literally the geometrical interpretations of the operators described in section 13.1 (i.e. in terms of real Euclidean geometry).

In the next section we will explain the structure of $Q$ and show that it has a simple geometrical interpretation:

(Q1): In the plane $\mathcal{P}(x_0)$ spanned by (the initially unknown) $|x_0\rangle$ and $|\psi_0\rangle$, $Q$ is rotation through angle $2\alpha$ where $\sin\alpha = \frac{1}{\sqrt{N}}$.

(Q2): In the subspace orthogonal to $\mathcal{P}(x_0)$, $Q = -I$ where $I$ is the identity operation.

Thus by repeatedly applying $Q$ to the starting state $|\psi_0\rangle$ in $\mathcal{P}(x_0)$ we may rotate it around to near $|x_0\rangle$ and then determine $x_0$ with high probability by a measurement in the standard basis. For large $N$, $|x_0\rangle$ and $|\psi_0\rangle$ are almost orthogonal and $2\alpha \approx 2\sin\alpha = \frac{2}{\sqrt{N}}$. We must rotate through roughly $\frac{\pi}{2}$ in steps of $\frac{2}{\sqrt{N}}$, so about $\frac{\pi}{4}\sqrt{N}$ iterations will be needed. Each application of $Q$ uses one evaluation of $I_{x_0}$ and hence of $U_f$, so $O(\sqrt{N})$ evaluations are required, representing a square root speedup over the $O(N)$ evaluations needed for a classical unstructured search. More precisely, we have $\langle x_0|\psi_0\rangle = \frac{1}{\sqrt{N}}$, so the number of iterations needed is the integer nearest to

$$\frac{\arccos(1/\sqrt{N})}{2\arcsin(1/\sqrt{N})}$$

(which is independent of $x_0$).
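The exact iteration count can be compared with the estimate $\frac{\pi}{4}\sqrt{N}$ in a few lines of Python (a sketch; the function name `grover_iterations` is an assumption).

```python
import math

def grover_iterations(N):
    """Integer nearest to arccos(1/sqrt N) / (2 arcsin(1/sqrt N))."""
    s = 1 / math.sqrt(N)
    return round(math.acos(s) / (2 * math.asin(s)))

for n in (2, 4, 10, 20):
    N = 2 ** n
    print(N, grover_iterations(N), round(math.pi / 4 * math.sqrt(N), 2))
```

For $N = 4$ this gives exactly one iteration, and for each size printed here the exact count is within one of $\frac{\pi}{4}\sqrt{N}$ (for example 25 iterations against 25.13 when $N = 1024$).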


A simple striking example is the case of $N = 4$, in which $\sin\alpha = \frac{1}{2}$ and $Q$ is a rotation through $\pi/3$. The initial state is $|\psi_0\rangle = \frac{1}{2}(|0\rangle + |1\rangle + |2\rangle + |3\rangle)$ and for any marked $x_0$ the angle between $|x_0\rangle$ and $|\psi_0\rangle$ is precisely $\pi/3$ too. Hence after one application of $Q$, i.e. just one query, we will learn the position of any single marked item in a set of four with certainty!
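To see this at work, here is a classical state-vector simulation of Grover's algorithm in Python. It is only a sketch: it stores all $N = 2^n$ amplitudes (so it is exponentially expensive, unlike a quantum computer), it implements $H_n$ as the fast Walsh-Hadamard transform, and it applies $I_{x_0}$ by looking at $x_0$ directly, which stands in for the black box. The helper names are assumptions.

```python
import math

def hadamard_n(state):
    """Apply H to each of the n qubits (normalised fast Walsh-Hadamard transform)."""
    out, h, N = list(state), 1, len(state)
    while h < N:
        for i in range(0, N, 2 * h):
            for j in range(i, i + h):
                a, b = out[j], out[j + h]
                out[j], out[j + h] = (a + b) / math.sqrt(2), (a - b) / math.sqrt(2)
        h *= 2
    return out

def flip_sign(state, x):
    """I_x of eq. (42): invert the amplitude of the basis state |x>."""
    out = list(state)
    out[x] = -out[x]
    return out

def grover_iterate(state, x0):
    """Q = -H_n I_0 H_n I_{x0} of eq. (44); the rightmost factor acts first."""
    s = hadamard_n(flip_sign(state, x0))
    s = hadamard_n(flip_sign(s, 0))
    return [-a for a in s]

def grover_search(n, x0, iterations):
    """Probability of measuring x0 after `iterations` applications of Q to |psi_0>."""
    N = 2 ** n
    state = hadamard_n([1.0] + [0.0] * (N - 1))   # |psi_0> = H_n|0...0>, eq. (43)
    for _ in range(iterations):
        state = grover_iterate(state, x0)
    return state[x0] ** 2

# N = 4: a single application of Q finds any marked item with certainty.
print([round(grover_search(2, x0, 1), 12) for x0 in range(4)])   # [1.0, 1.0, 1.0, 1.0]
```

Running `grover_search(10, x0, 25)` for a marked item in a space of $N = 1024$ gives a success probability of about 0.9995, in line with $\sin^2(51\alpha)$ for $\sin\alpha = 1/32$.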

13.3 The Iteration Operator Q – Reflections and Rotations

To explain the structure of $Q$ we begin with some elementary properties of reflections. For any state $|\psi\rangle$ in $\mathcal{B}_n$ consider the operator

$$I_{|\psi\rangle} = I - 2|\psi\rangle\langle\psi| \qquad (45)$$

where $I$ is the identity operator in $\mathcal{B}_n$. Let $H^\perp(|\psi\rangle)$ denote the hyperplane (i.e. the $(N-1)$-dimensional subspace) of all states orthogonal to $|\psi\rangle$. Then any state $|\xi\rangle$ may be uniquely decomposed into components parallel and perpendicular to $|\psi\rangle$:

$$|\xi\rangle = a|\psi\rangle + b|\xi'\rangle \qquad (46)$$

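where $|\xi'\rangle$ is in $H^\perp(|\psi\rangle)$ and we get directly

$$I_{|\psi\rangle}|\xi\rangle = -a|\psi\rangle + b|\xi'\rangle. \qquad (47)$$

Thus $I_{|\psi\rangle}$ simply inverts the parallel component, i.e. it can be thought of geometrically as the operation of reflection in the hyperplane orthogonal to $|\psi\rangle$. (In particular, the operator $I_{x_0}$ of eq. (42) is just $I_{|x_0\rangle}$.) We have the following simple properties of $I_{|\psi\rangle}$.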

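Lemma 5 If $|\xi\rangle$ is any state (not a multiple of $|\psi\rangle$) then $I_{|\psi\rangle}$ preserves the 2-dimensional subspace $\mathcal{S}$ spanned by $|\xi\rangle$ and $|\psi\rangle$.

[Figure: the plane $\mathcal{S}$ containing $|\xi\rangle$ and $|\psi\rangle$, cut by the hyperplane $H^\perp(|\psi\rangle)$.]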

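Proof of lemma 5: Eq. (45) shows that $I_{|\psi\rangle}$ takes $|\psi\rangle$ to $-|\psi\rangle$ and, for any $|\xi\rangle$, it adds a multiple of $|\psi\rangle$ to $|\xi\rangle$ (namely $-2\langle\psi|\xi\rangle\,|\psi\rangle$). Hence any linear combination of $|\xi\rangle$ and $|\psi\rangle$ is mapped to a linear combination of the same two states.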


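Lemma 6 For any unitary operator $U$,

$$U I_{|\psi\rangle} U^\dagger = I_{U|\psi\rangle}.$$

Proof: This is immediate from eq. (45):

$$U I_{|\psi\rangle} U^\dagger = I - 2U|\psi\rangle\langle\psi|U^\dagger = I - 2|U\psi\rangle\langle U\psi| = I_{U|\psi\rangle}$$

where we have used that $U I U^\dagger = U U^\dagger = I$ as $U$ is unitary.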

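Looking back at eq. (44) and noting that $H = H^\dagger$ (so that $H_n = H_n^\dagger$, and hence $H_n I_0 H_n = H_n I_{|0\rangle} H_n^\dagger = I_{H_n|0\rangle}$ by lemma 6, where $|0\rangle$ stands for $|0\ldots0\rangle$), we see that

$$Q = -I_{H_n|0\rangle} I_{|x_0\rangle} \qquad (48)$$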

By lemma 5, both $I_{|x_0\rangle}$ and $I_{H_n|0\rangle}$ preserve the two-dimensional subspace $\mathcal{P}(x_0)$ spanned by $|x_0\rangle$ and $H_n|0\rangle$. Hence by eq. (48), $Q$ preserves $\mathcal{P}(x_0)$ too. Since all matrix elements are real numbers we may restrict attention to the real (rather than the complex) two-dimensional subspace $\mathcal{P}(x_0)$.

We are now in a position to finally identify geometrically what $Q$ actually does. For any vector $v \in \mathbb{R}^2$ let $I_v$ denote the operation of reflection in the line perpendicular to $v$ through the origin in $\mathbb{R}^2$.

Lemma 7 Let $M_1$ and $M_2$ be two mirror lines in the Euclidean plane $\mathbb{R}^2$ intersecting at a point $O$ and let $\theta$ be the angle in the plane from $M_1$ to $M_2$ (cf. figure below). Then the operation of reflection in $M_1$ followed by reflection in $M_2$ is just rotation by angle $2\theta$ about the point $O$.

[Figure: mirror lines $M_1$ and $M_2$ meeting at the point $O$, with angle $\theta$ from $M_1$ to $M_2$.]

Proof of lemma 7: This is immediate, for example, from standard matrix expressions for rotations and reflections in $\mathbb{R}^2$.
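One way to make this proof explicit: reflection in the line through $O$ at angle $\varphi$ to the horizontal has matrix $\begin{pmatrix}\cos 2\varphi & \sin 2\varphi \\ \sin 2\varphi & -\cos 2\varphi\end{pmatrix}$, and multiplying two such matrices (angles $\varphi_1$ then $\varphi_2$) gives the rotation through $2(\varphi_2 - \varphi_1)$. The short Python check below confirms this numerically for one arbitrary choice of angles; it is an illustration, not a proof.

```python
import math

def reflection(phi):
    """Reflection in the line through the origin O at angle phi to the horizontal axis."""
    c, s = math.cos(2 * phi), math.sin(2 * phi)
    return [[c, s], [s, -c]]

def rotation(angle):
    """Anticlockwise rotation about O through the given angle."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s], [s, c]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def close(a, b, tol=1e-12):
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(2) for j in range(2))

phi1, phi2 = 0.3, 1.1          # directions of the mirror lines M1 and M2
theta = phi2 - phi1            # angle from M1 to M2
# "Reflect in M1, then in M2": the first operation is the rightmost factor.
combined = matmul(reflection(phi2), reflection(phi1))
print(close(combined, rotation(2 * theta)))   # True
```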

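Using lemma 7 we see that the action of $I_{H_n|0\rangle} I_{|x_0\rangle} = -Q$ in $\mathcal{P}(x_0)$ is a rotation through $2\beta$ where $\cos\beta = \langle x_0|H_n|0\rangle = \frac{1}{\sqrt{N}}$ (the two mirror lines in $\mathcal{P}(x_0)$ are perpendicular to $|x_0\rangle$ and to $H_n|0\rangle$ respectively, so the angle between them equals the angle $\beta$ between these two states). For large $N$, $\beta \approx \pi/2$ and we have a rotation of almost $\pi$. It would be possible to use this large rotation as the basis of the quantum searching algorithm but we prefer a smaller incremental motion. We could use the operator $(I_{H_n|0\rangle} I_{|x_0\rangle})^2$ but there is another solution, explaining the occurrence of the minus sign in the definition of $Q$: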

Lemma 8 For any 2-dimensional real $v$ we have

$$-I_v = I_{v^\perp}$$

where $v^\perp$ is a unit vector perpendicular to $v$.

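Proof: For any vector $u$ we write $u = av + bv^\perp$ (with $v$ a unit vector). Then $I_v u = -av + bv^\perp$, i.e. $I_v$ just reverses the sign of $a$, so $-I_v u = av - bv^\perp$ reverses the sign of $b$ instead. Thus the action of $-I_v$ is the same as that of $I_{v^\perp}$.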
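Lemma 8 is equivalent to $I_v + I_{v^\perp} = 2I - 2(vv^T + v^\perp v^{\perp T}) = 0$, since $vv^T + v^\perp v^{\perp T} = I$ for an orthonormal pair. A quick numerical check in Python, with an arbitrarily chosen direction for $v$:

```python
import math

def mirror(v):
    """I_v = I - 2 v v^T: reflection in the line perpendicular to the unit vector v."""
    return [[(1 if i == j else 0) - 2 * v[i] * v[j] for j in range(2)] for i in range(2)]

angle = 0.9                                       # arbitrary direction for v
v = [math.cos(angle), math.sin(angle)]
v_perp = [-math.sin(angle), math.cos(angle)]      # unit vector perpendicular to v

minus_I_v = [[-entry for entry in row] for row in mirror(v)]
I_v_perp = mirror(v_perp)
print(all(abs(minus_I_v[i][j] - I_v_perp[i][j]) < 1e-12
          for i in range(2) for j in range(2)))   # True: -I_v = I_{v_perp}
```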

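Hence $Q = -I_{H_n|0\rangle} I_{|x_0\rangle}$ acting in $\mathcal{P}(x_0)$ equals, by lemma 8, the product of reflections $I_{(H_n|0\rangle)^\perp} I_{|x_0\rangle}$, where $(H_n|0\rangle)^\perp$ is the state in $\mathcal{P}(x_0)$ perpendicular to $H_n|0\rangle$. By lemma 7 it is therefore a rotation through $2\alpha$ where $\alpha$ is the angle between $|x_0\rangle$ and this perpendicular state, i.e. $\sin\alpha = \langle x_0|H_n|0\rangle = \frac{1}{\sqrt{N}}$, as claimed in (Q1). To see the effect of $Q$ on states orthogonal to $\mathcal{P}(x_0)$, suppose that $|\xi\rangle \in \mathcal{B}_n$ is orthogonal to both $H_n|0\rangle$ and $|x_0\rangle$. Then from the definitions of $I_{|x_0\rangle}$ and $I_{H_n|0\rangle}$ in eq. (45) we see that $I_{|x_0\rangle}|\xi\rangle = I_{H_n|0\rangle}|\xi\rangle = |\xi\rangle$, so $Q = -I$ in the orthogonal complement to $\mathcal{P}(x_0)$, as claimed in (Q2).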

Thus even though $x_0$ is unknown (but we are given a black box for $I_{x_0}$) we can construct a rotation operator $Q$ in the plane spanned by the fixed starting state $|\psi_0\rangle$ and the unknown $|x_0\rangle$. Furthermore the angle between the starting state and $|x_0\rangle$ is independent of the value of $x_0$ (as the starting state is an equal superposition of all possible $x_0$ values), so the number of iterations is independent of $x_0$ too.
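Both claims (Q1) and (Q2) can be tested with a small state-vector simulation (helper names are hypothetical; $H_n$ is implemented as the fast Walsh-Hadamard transform). Under the rotation picture, after $k$ applications of $Q$ the state makes angle $(2k+1)\alpha$ with the unmarked direction in $\mathcal{P}(x_0)$, so its $|x_0\rangle$ amplitude should be $\sin((2k+1)\alpha)$; and a state orthogonal to $\mathcal{P}(x_0)$ should simply change sign. The choices $n = 4$, $x_0 = 11$ and the test vector are arbitrary.

```python
import math

def hadamard_n(state):
    """Apply H to each of the n qubits (normalised fast Walsh-Hadamard transform)."""
    out, h, N = list(state), 1, len(state)
    while h < N:
        for i in range(0, N, 2 * h):
            for j in range(i, i + h):
                a, b = out[j], out[j + h]
                out[j], out[j + h] = (a + b) / math.sqrt(2), (a - b) / math.sqrt(2)
        h *= 2
    return out

def flip_sign(state, x):
    """I_x: invert the amplitude of the basis state |x>."""
    out = list(state)
    out[x] = -out[x]
    return out

def grover_iterate(state, x0):
    """Q = -H_n I_0 H_n I_{x0}; the rightmost factor acts first."""
    s = hadamard_n(flip_sign(state, x0))
    s = hadamard_n(flip_sign(s, 0))
    return [-a for a in s]

n, x0 = 4, 11
N = 2 ** n
alpha = math.asin(1 / math.sqrt(N))

# (Q1): the x0 amplitude after k steps should be sin((2k+1) alpha).
state = hadamard_n([1.0] + [0.0] * (N - 1))
for k in range(1, 8):
    state = grover_iterate(state, x0)
    print(k, round(state[x0], 6), round(math.sin((2 * k + 1) * alpha), 6))

# (Q2): a state orthogonal to both |psi_0> and |x0> is simply negated.
xi = [0.0] * N
xi[2], xi[7] = 1 / math.sqrt(2), -1 / math.sqrt(2)   # amplitudes sum to 0; no |11> part
print(all(abs(a + b) < 1e-12 for a, b in zip(grover_iterate(xi, x0), xi)))   # True
```

The two printed columns agree, including after the state has rotated past $|x_0\rangle$ and the amplitude starts to fall (and eventually turns negative).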

13.4 Some further features of Grover’s algorithm

Optimality

Grover's algorithm achieves unstructured search for a unique good item with $\frac{\pi}{4}\sqrt{N}$ queries. Is it possible to invent an even more ingenious quantum algorithm that uses fewer queries? Alas, the answer is no:

Theorem 9 Any quantum algorithm that achieves the search for a unique good item in an unstructured database of size $N$ (with a constant level of probability, say half) must use $\Omega(\sqrt{N})$ queries.

More precisely, the order constant can be estimated to give a requirement of at least $\frac{\pi}{4}(1-\epsilon)\sqrt{N}$ queries for any $\epsilon > 0$, so Grover's algorithm is optimal in a tight sense. We will not go through the proof of this result here. One possible argument may be found in Preskill's notes, section 6.5.

Searching with multiple good items


Suppose our search space contains $r \geq 1$ good items and we wish to find any one such item. Consider first the case that $r$ is known. In this case we'll see that our previous algorithm still works; we just need to modify the number of iterations in a way that depends on $r$.

Let the good items be denoted $x_1, \ldots, x_r$, so now $f(x_i) = 1$ for $i = 1, \ldots, r$ and $f(x) = 0$ for all other $x$'s. Using the same construction that gave $I_{x_0}$ from $U_f$ in the case of a single good item, we obtain the operator $I_G$ (where $G$ stands for “good”) with action

$$I_G|x\rangle = \begin{cases} |x\rangle & \text{if } x \notin \{x_1, \ldots, x_r\} \\ -|x\rangle & \text{if } x \in \{x_1, \ldots, x_r\} \end{cases}$$

and the iteration operator (cf. eq. (44)) is

$$Q_G = -H_n I_0 H_n I_G = -I_{|\psi_0\rangle} I_G.$$

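Let

$$|\psi_G\rangle = \frac{1}{\sqrt{r}} \sum_{i=1}^{r} |x_i\rangle$$

be the equal superposition of all good items. We can separate out the good and bad parts of the full equal superposition $|\psi_0\rangle$, writing

$$|\psi_0\rangle = \frac{1}{\sqrt{N}} \sum_{\text{all } x} |x\rangle = \frac{\sqrt{r}}{\sqrt{N}}|\psi_G\rangle + \frac{\sqrt{N-r}}{\sqrt{N}}|\psi_B\rangle \qquad (49)$$

where $|\psi_B\rangle = \frac{1}{\sqrt{N-r}} \sum_{\text{bad } x} |x\rangle$ is the equal superposition of all bad items, and $|\psi_G\rangle$ and $|\psi_B\rangle$ are orthogonal states.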

Theorem 10 Let $\mathcal{P}_G$ be the plane spanned by $|\psi_0\rangle$ and $|\psi_G\rangle$. Then the action of $Q_G$ preserves this plane, and within $\mathcal{P}_G$ this action is rotation through angle $2\alpha$ where

$$\sin\alpha = \langle\psi_0|\psi_G\rangle = \sqrt{\frac{r}{N}}.$$

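Proof: Clearly $I_{|\psi_0\rangle}$ preserves $\mathcal{P}_G$ since acting on any $|\psi\rangle$ it just subtracts a multiple of $|\psi_0\rangle$. For $I_G$ we note that by eq. (49), $\mathcal{P}_G$ can also be characterised as the plane spanned by the orthogonal states $|\psi_G\rangle$ and $|\psi_B\rangle$. Now $I_G|\psi_G\rangle = -|\psi_G\rangle$ and $I_G|\psi_B\rangle = |\psi_B\rangle$, so for any state $|\psi\rangle = a|\psi_G\rangle + b|\psi_B\rangle$ in $\mathcal{P}_G$ the action of $I_G$ is to subtract a multiple of $|\psi_G\rangle$, i.e. the result lies in the plane too. This also shows that within $\mathcal{P}_G$, $I_G$ coincides with the operation $I_{|\psi_G\rangle}$ (cf. eq. (45)). Writing $|\psi_0^\perp\rangle$ for the state in $\mathcal{P}_G$ perpendicular to $|\psi_0\rangle$, lemma 8 then gives

$$Q_G = -I_{|\psi_0\rangle} I_{|\psi_G\rangle} = I_{|\psi_0^\perp\rangle} I_{|\psi_G\rangle}.$$

Hence, exactly as before, $Q_G$ is a rotation through angle $2\alpha$ where $\alpha$ is the angle between $|\psi_0^\perp\rangle$ and $|\psi_G\rangle$, i.e. $\sin\alpha = \langle\psi_0|\psi_G\rangle = \sqrt{r/N}$.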

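Now suppose that we start with $|\psi_0\rangle$ and repeatedly apply $Q_G$. The angle between $|\psi_0\rangle$ and $|\psi_G\rangle$ is $\beta$ where $\cos\beta = \langle\psi_0|\psi_G\rangle = \sqrt{r/N}$. Each application of $Q_G$ is a rotation through $2\alpha$ where $\sin\alpha = \sqrt{r/N}$, so we need (the integer nearest to)

$$\frac{\beta}{2\alpha} = \frac{\arccos\sqrt{r/N}}{2\arcsin\sqrt{r/N}}$$

iterations to move $|\psi_0\rangle$ very close to $|\psi_G\rangle$. If $r \ll N$ then $|\psi_0\rangle$ and $|\psi_G\rangle$ are almost orthogonal ($\beta \approx \pi/2$) and $\alpha \approx \sin\alpha = \sqrt{r/N}$, so we need about $\frac{\pi}{4}\sqrt{N/r}$ iterations.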
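A state-vector simulation illustrates Theorem 10 and the iteration count. This Python sketch uses hypothetical helper names, and the set of good items is an arbitrary example with $r = 4$, $N = 256$; it applies $Q_G$ the prescribed number of times and compares the probability of measuring a good item with $\sin^2((2k+1)\alpha)$.

```python
import math

def hadamard_n(state):
    """Apply H to each of the n qubits (normalised fast Walsh-Hadamard transform)."""
    out, h, N = list(state), 1, len(state)
    while h < N:
        for i in range(0, N, 2 * h):
            for j in range(i, i + h):
                a, b = out[j], out[j + h]
                out[j], out[j + h] = (a + b) / math.sqrt(2), (a - b) / math.sqrt(2)
        h *= 2
    return out

def grover_multi(n, good, iterations):
    """Apply Q_G = -H_n I_0 H_n I_G to |psi_0> repeatedly; return P(measuring a good item)."""
    N = 2 ** n
    state = hadamard_n([1.0] + [0.0] * (N - 1))                        # |psi_0>
    for _ in range(iterations):
        state = [-a if x in good else a for x, a in enumerate(state)]  # I_G
        state = hadamard_n(state)
        state[0] = -state[0]                                           # I_0
        state = [-a for a in hadamard_n(state)]                        # -H_n
    return sum(state[x] ** 2 for x in good)

n = 8
N = 2 ** n
good = {3, 77, 140, 201}                 # hypothetical good items, so r = 4
r = len(good)
alpha = math.asin(math.sqrt(r / N))
k = round(math.acos(math.sqrt(r / N)) / (2 * alpha))   # nearest integer to beta / (2 alpha)
print(k, round(math.pi / 4 * math.sqrt(N / r), 2))
print(grover_multi(n, good, k), math.sin((2 * k + 1) * alpha) ** 2)
```

Here $k = 6$ (close to $\frac{\pi}{4}\sqrt{N/r} \approx 6.28$) and the success probability is about 0.997.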

We can also adapt the algorithm to work in the case that $r$ is unknown. The apparent difficulty is the following: if we start with $|\psi_0\rangle$ and repeatedly apply the operator $Q$ (in either case $r = 1$ or $r > 1$) we just rotate the state round and round in the plane of $|\psi_0\rangle$ and $|\psi_G\rangle$. The trick is to know when to stop, i.e. when the state lines up closely with $|\psi_G\rangle$ in this plane. But if $r$ is unknown then the rotation angle $2\alpha$ of $Q$ is unknown!

To illustrate the way around this problem we'll consider only the case where the unknown $r$ is very small, $r \ll N$. (General $r$ values can be addressed by a more complicated argument along similar lines.) We choose a number $K$ randomly in the range $0 < K < \frac{\pi}{4}\sqrt{N}$, apply $K$ iterations of $Q$, measure the final state and test whether the result is good or not. For $r \ll N$ each iteration is a rotation through the small angle $2\alpha \approx 2\sqrt{r/N}$, i.e. we have chosen a random angle in the range 0 to $\sqrt{r}\,\frac{\pi}{2}$, spanning $\sqrt{r}$ quadrants. Equivalently we can choose one of the $\sqrt{r}$ quadrants at random and then a random angle in it. Now think of $|\psi_0\rangle$ as the $x$-axis direction and $|\psi_G\rangle$ as the $y$-axis direction (recalling that these states are almost orthogonal for $r \ll N$). If the final rotation angle is within $\pm 45^\circ$ of the $y$-axis then the final state $|\psi\rangle$ has $|\langle\psi|\psi_G\rangle|^2 \geq \cos^2 45^\circ = 1/2$, i.e. we have probability at least one half of seeing a good item in our final measurement. Now for every quadrant, half the angles are within $\pm 45^\circ$ of the $y$-axis, so our randomised procedure above, using $O(\sqrt{N})$ queries, will locate a good item with probability at least $1/4$. Repeating the whole procedure a constant number of times, say $M = 10$ times, thus still using $O(\sqrt{N})$ queries, we will fail to locate a good item only with small probability, at most $(3/4)^M = (3/4)^{10} \approx 0.056$.
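The randomised choice of $K$ can be explored without simulating any quantum states by using the closed form $\sin^2((2K+1)\alpha)$ for the success probability after $K$ iterations, which follows from the rotation picture of Theorem 10. The sketch below (hypothetical function names) averages this over all integers $0 < K < \frac{\pi}{4}\sqrt{N}$, i.e. it computes the exact success probability of one run of the randomised procedure, for several values of $r$ that the procedure itself does not know.

```python
import math

def success_probability(N, r, K):
    """P(good outcome) after K iterations: sin^2((2K+1) alpha), with sin alpha = sqrt(r/N)."""
    alpha = math.asin(math.sqrt(r / N))
    return math.sin((2 * K + 1) * alpha) ** 2

def average_over_random_K(N, r):
    """Average success probability when K is uniform over the integers 0 < K < (pi/4) sqrt(N)."""
    K_max = math.ceil(math.pi / 4 * math.sqrt(N)) - 1
    return sum(success_probability(N, r, K) for K in range(1, K_max + 1)) / K_max

N = 2 ** 12
for r in (1, 2, 3, 5, 8, 13):
    p = average_over_random_K(N, r)
    # p is the success probability of one run; (1 - p)**10 is the failure rate for M = 10
    print(r, round(p, 3), round((1 - p) ** 10, 6))
```

For these values of $r$ the average lies between roughly 0.45 and 0.65, comfortably above the guaranteed $1/4$, so the bound in the argument above is conservative.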

This case of unknown $r$ is directly relevant to the consideration of computational tasks in NP, where rather than locating a good item we want instead to know whether a good item exists or not. Consider for example the task SAT: given a Boolean function $f$, does it have a satisfying assignment or not? $f$ will generally have some unknown number $r \geq 0$ of satisfying assignments. We run the above randomised version of Grover's algorithm, say 10 times, checking each output $x$ to see whether $f(x) = 1$ or not. If they all fail we conclude that $f$ is not satisfiable. This conclusion is always correct when $f$ is unsatisfiable, and when $f$ is satisfiable it is wrong with probability at most $(3/4)^{10}$, so the procedure answers correctly with probability at least $1 - (3/4)^{10}$. In this way Grover's algorithm can be applied to any NP problem to provide a quadratic speedup over classical exhaustive search.
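Finally, a toy end-to-end sketch of this SAT procedure, again as a classical simulation in Python. Everything here is illustrative: the three-variable formulas are made-up examples, the helper names are hypothetical, $I_G$ is applied by evaluating the formula directly (standing in for the oracle), and the measurement is sampled with Python's `random` module.

```python
import math
import random

def hadamard_n(state):
    """Apply H to each of the n qubits (normalised fast Walsh-Hadamard transform)."""
    out, h, N = list(state), 1, len(state)
    while h < N:
        for i in range(0, N, 2 * h):
            for j in range(i, i + h):
                a, b = out[j], out[j + h]
                out[j], out[j + h] = (a + b) / math.sqrt(2), (a - b) / math.sqrt(2)
        h *= 2
    return out

def grover_state(f, n, K):
    """Amplitudes after K applications of Q_G = -H_n I_0 H_n I_G to |psi_0>."""
    N = 2 ** n
    state = hadamard_n([1.0] + [0.0] * (N - 1))
    for _ in range(K):
        state = [-a if f(x) else a for x, a in enumerate(state)]   # I_G, via f directly
        state = hadamard_n(state)
        state[0] = -state[0]                                       # I_0
        state = [-a for a in hadamard_n(state)]                    # -H_n
    return state

def randomised_grover_sat(f, n, rng, M=10):
    """Return a satisfying assignment found in M runs, or None ('unsatisfiable')."""
    N = 2 ** n
    K_max = math.ceil(math.pi / 4 * math.sqrt(N)) - 1   # integers 0 < K < (pi/4) sqrt(N)
    for _ in range(M):
        K = rng.randint(1, K_max)
        probs = [a * a for a in grover_state(f, n, K)]
        x = rng.choices(range(N), weights=probs)[0]      # simulated measurement
        if f(x):                                         # classical check of the output
            return x
    return None

# Hypothetical CNF: (x0) and (not x0 or not x1) and (x1 or x2); bit i of x is x_i.
def sat_formula(x):
    x0, x1, x2 = x & 1, (x >> 1) & 1, (x >> 2) & 1
    return bool(x0 and (not x0 or not x1) and (x1 or x2))

def unsat_formula(x):
    x0 = x & 1
    return bool(x0 and not x0)          # (x0) and (not x0): never true

rng = random.Random(2006)
print(randomised_grover_sat(sat_formula, 3, rng))    # 5 (binary 101) unless all M runs fail
print(randomised_grover_sat(unsat_formula, 3, rng))  # None: no run can pass the check
```

When the formula is unsatisfiable no measured $x$ can pass the classical check, so `None` is returned with certainty; in the satisfiable example the only assignment that can be returned is $x = 5$, and with $N = 8$ each run succeeds with probability at least about 0.78.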


Hence the possibilities and limitations of information storage. In view of Feynman’s quote above these new models are not merely unrealistic abstract constructs (such as the notion of non-deterministic TM) but are actually available for implementation in the real world of computer technology. by approximately a factor of 4 every 3. The issue here is a deep fundamental connection between physics and computation. If this trend continues we will reach the subatomic scale by 2015.what is quantum computation? Quantum physics diﬀers dramatically from classical physics in its representation of the physical world and the kinds of processes that are allowed by the physical laws. Fundamental issues: our ﬁrst category is well summed up by a quote from the physicist Richard Feynman: “Because nature isn’t classical. It emerged in the mid-1980’s and it is currently one of the most active areas of all scientiﬁc research internationally. At this scale classical physics fails completely and quantum eﬀects are dominant – components begin to malfunction in bizarre ways. positions of switches etc) – a computer is always a physical device. Bit values 0 and 1 are just two distinguishable states of some physical system. all must depend on the laws of physics which characterise the allowable kinds of evolutions etc. But what is “information”? It is always represented in physical degrees of freedom of a physical system (voltage levels. We could either aim to re-design our components to stamp out the new eﬀects and provide the same functions as before.5 years. Our next category of reasons shows that the latter is the way to go! Theoretical issues: as already mentioned the mathematical formalism of quantum theory leads to new “non-classical” modes of computation providing remarkable new possibilities.COMSM0214 4 1 Introduction . Quantum computation is the study of the possible applications and exploitation of these novel quantum eﬀects in issues of computation.. after all. 
It turns out that such “standard” models capture the computational power of classical physics (from which they are. One of the most signiﬁcant issues in computational complexity theory is the question of the existence of a polynomial time algorithm for a given computational . What is processing? It is physical evolution of the system. It is a highly signiﬁcant area of study for a variety of reasons which we collect into three categories. It cannot be derived from thought/mathematics alone. complexity and communication. We ask.”. computation and how eﬃciently a computation can be carried out. aiming to exploit them in new kinds of computational functionalities. leading us to our second category of reasons for quantum computation: Technological issues: according to Moore’s law. or else we could embrace the new quantum eﬀects. what is computation really? It is “processing” of “information”.. In computer science we study computation by ﬁrst setting up a theoretical computational model such as the Turing machine (TM). dammit. intuitively motivated). This subject is a fascinating hybrid of theoretical computer science and quantum physics. But if we start from quantum physics we are led to rather diﬀerent models which can provide remarkable new modes of computation that are not available in the formalism of standard TMs. since 1965 there has been a steady rate of miniaturisation of computer component.

In classical computation there is no known algorithm that runs in polynomial time (in the number of digits) but in 1994 Peter Shor discovered a polynomial time quantum algorithm for factorisation. So. Mermin: “Only a rash person would declare that there will be no useful quantum computers by 2050. is intended to be a realisable technology. Conventionally classical information is represented as a string of bits such as 11001011. with exponentially fewer steps) than any (known) classical algorithm for the task. 2 Prologue . being based on a “real” physical theory.COMSM0214 5 task. but only a rash person would predict that there will be”. Then we will be able to give a precise meaning to the notion of “quantum computation” and discuss a variety of quantum algorithms (including Shor’s algorithm) to illustrate the power of quantum versus classical computation. A classical computation processes this information in discrete steps by updating it into . as we’ll see later in this course. This approach is intended to make the transition from classical to quantum concepts more transparent. The main focus of this course will be to introduce the basic formalism of quantum theory (as far as required for our purposes.a curious way to represent classical computation We begin our approach to the formalism of quantum theory by representing familiar classical computation in an unusual notational way. Quantum physics also has remarkable implications for issues of communication such as the so-called process of quantum teleportation (which we’ll also discuss) and a variety of important cryptographic issues (which we wont treat in this course) such as the ability to implement provably secure communication. This is achieved not by an increase in clock speed of steps but by exploiting entirely new (quantum) kinds of computational steps that are not available to classical computers but are allowed by quantum physics.e. 
To date most of the individual ingredients of quantum algorithms have been demonstrated successfully experimentally but the construction of a “scalable” quantum computing device to carry out large computations remains well beyond the perceived limits of current state of the art quantum technology and laboratory experimentation. In some cases the new quantum computational possibilities are able to bridge this barrier. D. The most famous example is the computational task of integer factorisation. However we will see examples in which quantum computation can solve computational tasks exponentially faster (i. when will we have a working quantum computer? Breakthroughs in technology are often sudden and unpredictable and we quote the physicist N. and assuming no prior contact). It is issues of computational complexity rather than computability itself that are at the heart of the beneﬁts of quantum versus classical computation: it will be clear from our notion of quantum computation that a quantum computer cannot compute anything that is classically uncomputable (such as deciding the famous halting problem for Turing machines). Quantum computation.

In ket notation these become |a and |v etc. we can now develop the basic elements of classical computation in terms of the linear algebra of vectors rather than in terms of Boolean operations on the bit values 0. Furthermore we will take these distinct objects (actually distinguishable states of some classical physical system) to be orthogonal unit vectors in a 2-dimensional space! More general vectors such as a |0 +b |1 (where a and b are numbers) have no signiﬁcance for classical computation but they will later become very important in the quantum context. . We focus on this Boolean circuit or gate array model of classical computation (in contrast to say. . The dimension of a vector space is the number of elements in a basis. At present your task is to become familiar with the relevant linear algebra that we develop. 2. |en } such that any vector |v can be written uniquely as a linear combination |v = a1 |e1 + . These updates are carried out in a “local” manner: in each step only a few contiguous bits are changed by the application of a so-called Boolean gate. This is the so-called Dirac “ket” notation that features widely in quantum theory. an (the components of the vector in the basis) will be complex numbers. For our applications the coeﬃcients a1 . an Strictly speaking this is not an equality. . To do this we will need some mathematics of linear algebra which we will introduce in situ along the way as “linear algebra inserts”. |v = . . .COMSM0214 6 other such strings. (This can be shown to be the same for any choice of basis). 0 or 1. .1 directly. but we will abuse notation and frequently write an equality symbol here. Linear algebra insert 1 (vector basis and components): A basis for a vector space is a set of vectors {|e1 . . the Turing machine model) because it allows the simplest generalisation to the notion of quantum computation.1 Bit strings and vectors Each bit has two values or “states” viz. 
Once we have chosen a basis any vector |v can be represented as the column vector of its components: a1 . In elementary mathematics vectors are often denoted as underlined symbols such as a or v etc. . . From the viewpoint of just classical computation this linear algebra formulation appears very strange and even perversely pointless! But its signiﬁcance will soon become apparent when we begin to consider the structure of quantum theory. Having introduced these orthogonal vectors to represent bits. . only a representation. + an |en . . . . which we’ll write using a curiously asymmetrical bracket notation as |0 and |1 respectively.

we write |0 |0 etc. . |em } and let W be a vector space of dimension n with basis {|e1 . . . |1 |1 suggesting a kind of “multiplication” of “single bit vectors”. |10 . m. the components cij in eq.e. d are complex numbers. bn = |b1 . |bn associated to each n-bit string b1 . We also sometimes write |ei ej as |ei ⊗ ej . . Mathematically this will be the so-called tensor product of vectors (see insert 2). |01 . . Note that this multiplication is not commutative – order matters! e. But not all vectors in V ⊗ W can be “factorised” in this way! We will say more about this important point later. . . Linear algebra insert 2 (tensor products of vectors): Let V be a vector space of dimension m with basis {|e1 . . . .g. sometimes denoted with a ⊗ symbol i. b. denoted |v |w or |v ⊗|w . c.e. |1 } is a basis for a 2 dimensional space. denoted V ⊗ W . (1) for such products of vectors from V and W . . is |v = a |0 |0 + b |0 |1 + c |1 |0 + d |1 |1 where a. if we wish to make the product operation explicit. . A general vector in this 4 dimensional space associated to 2-bit strings. . . . as |0 ⊗ |0 etc. . |0 |1 . |1 |0 . . . If |v = i ai |ei and |w = j bj ej are vectors in V and W respectively then their tensor product. . |en }. j = 1. The tensor product of V and W .COMSM0214 7 Example 1 The set {|0 . Similarly for n-bit strings we get a space of dimension 2n with a basis vector |b1 . |11 which we also write as |0 |0 . A general vector |u in V ⊗ W can be written m n |u = i=1 j=1 cij |ei ej (1) (where the coeﬃcients cij are complex numbers). is a vector space of dimension mn with a basis of mn vectors written formally as {|ei ej : i = 1. . 0 1 b For two bits we introduce four orthogonal unit basis vectors |00 . n}. have the special product form cij = ai bj . b n . . lies in V ⊗ W and is deﬁned by |v |w = ( i ai |ei ) ⊗ ( j bj e j ) = i j ai bj |ei ej i. . If |v = a |0 + b |1 then we write 1 0 a |0 = |1 = |v = . |0 |1 is diﬀerent from |1 |0 etc.

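Example 2: Let $V$ with basis $\{|0\rangle, |1\rangle\}$ be the 2-dimensional vector space associated to a single bit. Then $V \otimes V$ contains vectors of the form $a|0\rangle|0\rangle + b|0\rangle|1\rangle + c|1\rangle|0\rangle + d|1\rangle|1\rangle$, which we write in components as the column vector $(a, b, c, d)^T$. Thus $V \otimes V$ is the vector space associated to 2-bit strings. If we have two vectors in $V$:
$$|u\rangle = x_0|0\rangle + x_1|1\rangle = \begin{pmatrix}x_0\\x_1\end{pmatrix}, \qquad |v\rangle = y_0|0\rangle + y_1|1\rangle = \begin{pmatrix}y_0\\y_1\end{pmatrix}$$
then
$$|u\rangle|v\rangle = (x_0|0\rangle + x_1|1\rangle) \otimes (y_0|0\rangle + y_1|1\rangle) = x_0y_0|0\rangle|0\rangle + x_0y_1|0\rangle|1\rangle + x_1y_0|1\rangle|0\rangle + x_1y_1|1\rangle|1\rangle$$
and in components we write
$$\begin{pmatrix}x_0\\x_1\end{pmatrix} \otimes \begin{pmatrix}y_0\\y_1\end{pmatrix} = \begin{pmatrix}x_0y_0\\x_0y_1\\x_1y_0\\x_1y_1\end{pmatrix}.$$
Similarly for three vectors we get 8 components,
$$\begin{pmatrix}x_0\\x_1\end{pmatrix} \otimes \begin{pmatrix}y_0\\y_1\end{pmatrix} \otimes \begin{pmatrix}z_0\\z_1\end{pmatrix} = (x_0y_0z_0,\ x_0y_0z_1,\ x_0y_1z_0,\ x_0y_1z_1,\ x_1y_0z_0,\ x_1y_0z_1,\ x_1y_1z_0,\ x_1y_1z_1)^T,$$
and, for example, the basis state $|101\rangle = |1\rangle|0\rangle|1\rangle$ (the sixth in our basis list) has
$$\begin{pmatrix}0\\1\end{pmatrix} \otimes \begin{pmatrix}1\\0\end{pmatrix} \otimes \begin{pmatrix}0\\1\end{pmatrix} = (0, 0, 0, 0, 0, 1, 0, 0)^T.$$

2.2 Computational steps as linear operations on vectors

For updating a bit string in a computational step we will consider reversible operations on $n$-bit strings (where $n$ is typically 1, 2 or 3). These are mappings from $n$-bit strings to $n$-bit strings such that the input string can be uniquely determined from the output string, i.e. the mapping is one-to-one, so it must be a permutation of the set of all bit string values.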
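The claim that $|101\rangle$ is the sixth basis vector can be checked by building basis vectors as repeated tensor products. This is a small sketch under the same assumed lexicographic ordering as above (leftmost bit most significant); `basis_state` is a hypothetical helper name introduced only for illustration.

```python
from functools import reduce


def tensor(v, w):
    """Components of |v>|w>, basis pairs ordered lexicographically."""
    return [a * b for a in v for b in w]


def basis_state(bits):
    """Column vector of |b1 b2 ... bn> = |b1>|b2>...|bn> for a string such as '101'."""
    single = {"0": [1, 0], "1": [0, 1]}
    return reduce(tensor, (single[b] for b in bits))


v = basis_state("101")
print(v)           # [0, 0, 0, 0, 0, 1, 0, 0]
print(v.index(1))  # 5: the sixth position, since 101 is 5 in binary
```

In general the single 1 in $|b_1 \ldots b_n\rangle$ sits at the position whose binary expansion is $b_1 \ldots b_n$ (counting from 0).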

In conventional classical computing not all common operations are reversible, e.g. the 2-bit operation which updates $(b_1, b_2)$ to $(b_1, b_1 \text{ AND } b_2)$ is not reversible, as 00 and 01 are both mapped to 00. However, it is known that universal classical computation can be performed (without any significant loss of efficiency) if we allow only reversible operations (see Preskill's notes for more details about this point). In quantum computation later, reversible operations will play a fundamental role, which is why we focus on them here.

For a single bit there are only two reversible operations: the identity operation $I$, which “does nothing”,
$$I|0\rangle = |0\rangle, \qquad I|1\rangle = |1\rangle,$$
and the NOT, or bit flip, operation, which we write as $X$:
$$X|0\rangle = |1\rangle, \qquad X|1\rangle = |0\rangle.$$
We can extend the action of $X$ to general vectors “by linearity”, i.e. by allowing it to “act freely across sums”:
$$X(a|0\rangle + b|1\rangle) = aX|0\rangle + bX|1\rangle = a|1\rangle + b|0\rangle. \qquad (2)$$
Note that according to eq. (2) the action of $X$ on general vectors is completely determined by its action on the basis states alone. In terms of components the action of $X$ is given by matrix multiplication:
$$X\begin{pmatrix}a\\b\end{pmatrix} = \begin{pmatrix}0&1\\1&0\end{pmatrix}\begin{pmatrix}a\\b\end{pmatrix} = \begin{pmatrix}b\\a\end{pmatrix}.$$
Thus $X$ has the matrix representation
$$X = \begin{pmatrix}0&1\\1&0\end{pmatrix}.$$
(As for vectors and components, strictly speaking this is not an equality but rather a representation. However we will frequently abuse notation and write equality here.) Similarly we have the matrix representation
$$I = \begin{pmatrix}1&0\\0&1\end{pmatrix}.$$
The general notion of linear operation is described in the insert below.

Linear algebra insert 3 (linear operations): A linear operation $L$ on an $n$-dimensional vector space $V$ is a map $L: V \to V$ such that $L(a_1 v_1 + a_2 v_2) = a_1 L(v_1) + a_2 L(v_2)$ holds for any vectors $v_1, v_2$ and (complex) numbers $a_1, a_2$. If we represent vectors in terms of components then any such $L$ is described as an $n \times n$ matrix which we denote by $[L]$ (or just $L$ when the context is clear). Given the operation $L$, the entries of the matrix $[L]$ are constructed as follows. Let $e_1, \ldots, e_n$ be the basis of $V$. Then the $i$th column of the matrix $[L]$ is given by the column vector of components of $L(e_i)$, i.e. the action of $L$ on the $i$th basis vector. For any vector $v$, the column vector of $L(v)$ is then given by the matrix product of $[L]$ with the column vector of $v$.
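The recipe “column $i$ of $[L]$ is the component vector of $L(e_i)$” can be written out directly. The sketch below is only an illustration in plain Python lists: `matrix_from_action` and `apply` are our own hypothetical helper names, and the second example map is made up to show that columns (not rows) hold the images of basis vectors.

```python
def matrix_from_action(action, dim):
    """Build [L]: column i holds the components of L(e_i)."""
    columns = [action(i) for i in range(dim)]
    # transpose the list of columns into the usual list of rows
    return [[columns[j][i] for j in range(dim)] for i in range(dim)]


def apply(M, v):
    """Matrix-vector product: components of L(v) from [L] and v."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]


def x_action(i):
    """Bit flip: X|0> = |1> and X|1> = |0>, given as component vectors."""
    return [[0, 1], [1, 0]][i]


X = matrix_from_action(x_action, 2)
print(X)                 # [[0, 1], [1, 0]]
print(apply(X, [2, 3]))  # [3, 2]: a|0> + b|1> becomes b|0> + a|1>

# A non-symmetric example: L(e_0) = e_0 and L(e_1) = e_0 + 2 e_1
print(matrix_from_action(lambda i: [[1, 0], [1, 2]][i], 2))  # [[1, 1], [0, 2]]
```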

For 2-bit strings we have more reversible operations (indeed there are $4! = 24$ permutations of the four values), which now have matrix representations as $4 \times 4$ matrices. We consider a few important examples.

The swap operation $S$ maps $|b_1\rangle|b_2\rangle$ to $|b_2\rangle|b_1\rangle$, i.e. 00 and 11 are left unchanged but 01 and 10 are interchanged. You should verify that the matrix of $S$ is
$$S = \begin{pmatrix}1&0&0&0\\0&0&1&0\\0&1&0&0\\0&0&0&1\end{pmatrix}.$$

The controlled-NOT operation, or $CNOT_{12}$ operation, acting on two bits $b_1b_2$ is defined as follows: if $b_1 = 0$ then $b_1b_2$ is left unchanged, but if $b_1 = 1$ then $b_2$ is flipped (i.e. $X$ is applied to $b_2$). Thus the action of $X$ on $b_2$ is “controlled by” the value of $b_1$. The first bit is called the control bit and the second is called the target bit. Explicitly we have
$$CNOT_{12}: \quad 00 \to 00, \quad 01 \to 01, \quad 10 \to 11, \quad 11 \to 10,$$
i.e. the last two strings are interchanged. We can also write $CNOT_{12}|b_1\rangle|b_2\rangle = |b_1\rangle|b_1 \oplus b_2\rangle$, where $\oplus$ denotes addition modulo 2. You can verify that the matrix of $CNOT_{12}$ is
$$CNOT_{12} = \begin{pmatrix}1&0&0&0\\0&1&0&0\\0&0&0&1\\0&0&1&0\end{pmatrix}$$
(which has $I$ in the top left corner and $X$ in the lower right corner). Note that the two bits play asymmetrical roles; we write $CNOT_{12}$ with subscripts to make the asymmetry explicit. For example we could reverse the roles of the two bits to get $CNOT_{21}$, in which the second bit is now the control:
$$CNOT_{21}: \quad 00 \to 00, \quad 01 \to 11, \quad 10 \to 10, \quad 11 \to 01.$$

Further examples of 2-bit operations are given by applying 1-bit operations to individual bits of a 2-bit string. This leads to the notion of tensor products of operations.
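Any reversible map on bit strings gives a matrix with a single 1 in each column, placed in the row of the output string. The following sketch builds $S$, $CNOT_{12}$ and $CNOT_{21}$ that way. It assumes 2-bit strings are encoded as integers 0 to 3 with $b_1$ as the high bit, and the helper names (`perm_matrix`, `cnot12`, ...) are our own.

```python
def perm_matrix(f, n):
    """Matrix of a reversible map f on n-bit strings (encoded as ints 0..2^n - 1).

    Column x contains a single 1, in row f(x).
    """
    N = 2 ** n
    M = [[0] * N for _ in range(N)]
    for x in range(N):
        M[f(x)][x] = 1
    return M


def cnot12(x):
    b1, b2 = x >> 1, x & 1         # b1 is the leftmost bit
    return (b1 << 1) | (b1 ^ b2)   # |b1>|b1 XOR b2>


def cnot21(x):
    b1, b2 = x >> 1, x & 1
    return ((b1 ^ b2) << 1) | b2   # the second bit is now the control


def swap(x):
    b1, b2 = x >> 1, x & 1
    return (b2 << 1) | b1


for name, f in [("S", swap), ("CNOT12", cnot12), ("CNOT21", cnot21)]:
    print(name, perm_matrix(f, 2))
```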

Suppose we apply $X$ to the first bit of $b_1b_2$. This operation is denoted $X \otimes I$ (to indicate also that $b_2$ is left unaffected) and we have
$$X \otimes I: \quad 00 \to 10, \quad 01 \to 11, \quad 10 \to 00, \quad 11 \to 01.$$
Hence its matrix is
$$X \otimes I = \begin{pmatrix}0&0&1&0\\0&0&0&1\\1&0&0&0\\0&1&0&0\end{pmatrix}$$
(as we easily check by looking at the columns). Note that this matrix has a special structure determined by $X$ and $I$: look at the $2 \times 2$ blocks in the four corners. Letting $0$ denote the $2 \times 2$ matrix with all zero entries, we have
$$X \otimes I = \begin{pmatrix}0 & I\\ I & 0\end{pmatrix},$$
i.e. a pattern of $I$'s determined by the entries of $X$. This matrix for $X \otimes I$ is constructed by starting with the matrix of $X$ and replacing each entry with “the $2 \times 2$ matrix $I$ multiplied by the numerical value of that entry”. The same prescription holds true for general 1-bit operations $A$ and $B$ acting on the first and second bits respectively to give the 2-bit operation denoted $A \otimes B$: to construct its matrix, start with the matrix of $A$ and replace each entry with that entry multiplied by the matrix of $B$. For now you should verify that this prescription gives
$$I \otimes X = \begin{pmatrix}0&1&0&0\\1&0&0&0\\0&0&0&1\\0&0&1&0\end{pmatrix}$$
corresponding to doing $X$ on the second bit, and
$$X \otimes X = \begin{pmatrix}0&0&0&1\\0&0&1&0\\0&1&0&0\\1&0&0&0\end{pmatrix}$$
corresponding to doing $X$ on both bits. (In each case look at the structure of $2 \times 2$ blocks.) More examples will be given later in §4.1.

So far our examples of operations have been restricted to those that “make sense” on classical bits, i.e. they map bit values to bit values, albeit represented curiously as vectors! However we can introduce more general linear operations on vectors that do not preserve the actual classical bit representations $|0\rangle$ and $|1\rangle$. Such operations will later play a fundamental role in the quantum formalism.
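The block prescription for $A \otimes B$ (replace each entry of $A$ by that entry times $B$) translates into index arithmetic: entry $(i, j)$ of $A \otimes B$ is $A[i / p][j / q] \cdot B[i \bmod p][j \bmod q]$ for a $p \times q$ matrix $B$. This is a minimal sketch using nested lists; `kron` is our own name for it (it mirrors what is usually called the Kronecker product).

```python
def kron(A, B):
    """A (x) B: replace each entry A[i][j] by the block A[i][j] * B."""
    p, q = len(B), len(B[0])
    return [[A[i // p][j // q] * B[i % p][j % q] for j in range(len(A[0]) * q)]
            for i in range(len(A) * p)]


I = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]

print(kron(X, I))  # X on the first bit
print(kron(I, X))  # X on the second bit
print(kron(X, X))  # X on both bits
```

The three outputs reproduce the matrices $X \otimes I$, $I \otimes X$ and $X \otimes X$ given above.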

Two of the most important are the 1-bit operations
$$Z = \begin{pmatrix}1&0\\0&-1\end{pmatrix}, \qquad H = \frac{1}{\sqrt{2}}\begin{pmatrix}1&1\\1&-1\end{pmatrix}. \qquad (3)$$
$H$ is called the Hadamard operation. Note that $Z|1\rangle = -|1\rangle$, which makes no sense in the context of bit values but makes perfectly good sense in the context of vectors! Even more peculiarly we have
$$H|0\rangle = \frac{1}{\sqrt{2}}(|0\rangle + |1\rangle), \qquad H|1\rangle = \frac{1}{\sqrt{2}}(|0\rangle - |1\rangle). \qquad (4)$$

Example 3: Although these more general operations themselves make no sense for bit values, they can serve to reveal new relationships between operations that do make sense. For example we saw that $CNOT_{12}$ and $CNOT_{21}$ were different (and classically “valid”) 2-bit operations. We can now verify that
$$CNOT_{21} = (H \otimes H)\,CNOT_{12}\,(H \otimes H),$$
i.e. we can reverse the control/target roles of the two bits by applying $H$ to each bit vector both before and after the CNOT action. We can check the validity of this relation on each basis state. For example, starting with $|b_1\rangle|b_2\rangle = |0\rangle|1\rangle$, application of $H \otimes H$ gives
$$\frac{1}{\sqrt{2}}(|0\rangle + |1\rangle) \otimes \frac{1}{\sqrt{2}}(|0\rangle - |1\rangle) = \frac{1}{2}(|00\rangle - |01\rangle + |10\rangle - |11\rangle).$$
Then applying $CNOT_{12}$ gives
$$\frac{1}{2}(|00\rangle - |01\rangle + |11\rangle - |10\rangle). \qquad (5)$$
Applying $H \otimes H$ again and laboriously collecting and cancelling terms, we get $|1\rangle|1\rangle$, i.e. $CNOT_{21}|0\rangle|1\rangle$, as required. An easier way to do the last step above is to note the factorisation of eq. (5) (looking at the first two and last two terms):
$$\frac{1}{2}(|00\rangle - |01\rangle + |11\rangle - |10\rangle) = \frac{1}{2}\big(|0\rangle(|0\rangle - |1\rangle) - |1\rangle(|0\rangle - |1\rangle)\big) = \frac{1}{\sqrt{2}}(|0\rangle - |1\rangle)\,\frac{1}{\sqrt{2}}(|0\rangle - |1\rangle). \qquad (6)$$
Next note that the Hadamard operation is self-inverse, $H = H^{-1}$, i.e. $HH = I$, since
$$HH = \frac{1}{\sqrt{2}}\begin{pmatrix}1&1\\1&-1\end{pmatrix}\frac{1}{\sqrt{2}}\begin{pmatrix}1&1\\1&-1\end{pmatrix} = \begin{pmatrix}1&0\\0&1\end{pmatrix}.$$
Hence from eq. (4) we get
$$H\Big(\frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)\Big) = |0\rangle, \qquad H\Big(\frac{1}{\sqrt{2}}(|0\rangle - |1\rangle)\Big) = |1\rangle,$$
so $H \otimes H$ applied to eq. (6) gives $|1\rangle|1\rangle$.
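Instead of checking basis states one at a time, the identity $CNOT_{21} = (H \otimes H)\,CNOT_{12}\,(H \otimes H)$ can be verified as a single matrix equation. The sketch below does this numerically in plain Python; since $1/\sqrt{2}$ is irrational in floating point, matrices are compared up to a small tolerance. All helper names are our own.

```python
import math


def kron(A, B):
    """Tensor product of matrices: each entry A[i][j] is replaced by A[i][j] * B."""
    p, q = len(B), len(B[0])
    return [[A[i // p][j // q] * B[i % p][j % q] for j in range(len(A[0]) * q)]
            for i in range(len(A) * p)]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def apply(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]


def close(A, B, tol=1e-12):
    """Entrywise comparison of two matrices up to floating-point rounding."""
    return all(abs(x - y) < tol for ra, rb in zip(A, B) for x, y in zip(ra, rb))


s = 1 / math.sqrt(2)
H = [[s, s], [s, -s]]
I = [[1, 0], [0, 1]]
CNOT12 = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
CNOT21 = [[1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0], [0, 1, 0, 0]]

HH = kron(H, H)
conjugated = matmul(HH, matmul(CNOT12, HH))  # (H x H) CNOT12 (H x H)

print(close(matmul(H, H), I))     # True: H is self-inverse
print(close(conjugated, CNOT21))  # True
# The worked example: |0>|1> is sent to |1>|1>, up to rounding
print(apply(conjugated, [0, 1, 0, 0]))
```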

Example 4 (Pauli matrices): We can supplement the $X$ and $Z$ operations with a further $Y$ operation defined as
$$Y = ZX = -XZ = \begin{pmatrix}0&1\\-1&0\end{pmatrix}.$$
Note that $X^2 = Z^2 = I$ whereas $Y^2 = -I$, so we introduce $\sigma_y = -iY$ and we now have all the so-called Pauli operations:
$$\sigma_x = X = \begin{pmatrix}0&1\\1&0\end{pmatrix}, \qquad \sigma_y = -iY = \begin{pmatrix}0&-i\\i&0\end{pmatrix}, \qquad \sigma_z = Z = \begin{pmatrix}1&0\\0&-1\end{pmatrix},$$
which occur frequently in the quantum formalism. They have elegantly simple multiplicative properties:
$$\sigma_x^2 = \sigma_y^2 = \sigma_z^2 = I,$$
$$\sigma_x\sigma_y = -\sigma_y\sigma_x = i\sigma_z,$$
$$\sigma_y\sigma_z = -\sigma_z\sigma_y = i\sigma_x,$$
$$\sigma_z\sigma_x = -\sigma_x\sigma_z = i\sigma_y$$
(noting the cyclic shift of the $x, y, z$ labels in the last three lines).

2.3 Complex numbers – summary

In the extension of all the above concepts to quantum physics, complex numbers will feature prominently. Hence we give here a summary of the basic properties of complex numbers that we'll need. Complex numbers are obtained by extending the real numbers with a new symbol $i$ formally satisfying $i^2 = -1$. A general complex number $a$ has the form $a = x + iy$ where $x$ and $y$ are real; they are called the real part and imaginary part of $a$ respectively. If $b = s + it$ is a second complex number then the sum $a + b = (x + s) + i(y + t)$ is formed by collecting the real and imaginary parts. Using $i^2 = -1$ we get the product
$$ab = (x + iy)(s + it) = (xs - yt) + i(xt + ys)$$
(the term $-yt$ arising from $iy \cdot it$). The complex conjugate $\bar{a}$ of $a$ is defined by replacing $i$ by $-i$: $\bar{a} = x - iy$. The modulus $|a|$ of $a$ is defined by $|a| = \sqrt{x^2 + y^2}$. Hence $a\bar{a} = x^2 + y^2 = |a|^2$, so $a(\bar{a}/|a|^2) = 1$. Thus reciprocals are given by
$$\frac{1}{a} = \frac{\bar{a}}{|a|^2} = \frac{x}{x^2 + y^2} - \frac{y}{x^2 + y^2}\,i$$
and general division $b/a$ is given as the product of $b$ with $1/a$. A complex number $a = x + iy$ can be represented pictorially as a real 2-dimensional vector with components $(x, y)$. The 2-dimensional plane of these vectors is called the complex plane or Argand diagram.
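Python's built-in `complex` type (with `1j` for $i$) is enough to check the Pauli relations and the reciprocal formula above exactly, because every entry involved is $0$, $\pm 1$ or $\pm i$. This is a minimal sketch; the helpers `matmul` and `scale` are our own, not from any library.

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def scale(c, A):
    return [[c * x for x in row] for row in A]


I = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
Z = [[1, 0], [0, -1]]
Y = matmul(Z, X)                      # Y = ZX = [[0, 1], [-1, 0]]
assert Y == scale(-1, matmul(X, Z))   # ZX = -XZ
assert matmul(Y, Y) == scale(-1, I)   # Y^2 = -I

sx, sy, sz = X, scale(-1j, Y), Z      # sigma_y = -iY = [[0, -i], [i, 0]]
for sigma in (sx, sy, sz):
    assert matmul(sigma, sigma) == I  # each Pauli matrix squares to I
# sigma_a sigma_b = -sigma_b sigma_a = i sigma_c for cyclic (a, b, c)
for p, q, r in ((sx, sy, sz), (sy, sz, sx), (sz, sx, sy)):
    assert matmul(p, q) == scale(1j, r) == scale(-1, matmul(q, p))

# Complex reciprocal 1/a = conj(a) / |a|^2
a = 3 + 4j
recip = a.conjugate() / abs(a) ** 2
print(recip, 1 / a)                   # both approximately 0.12 - 0.16i
```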

[Figure: the complex number $a = x + iy$ drawn in the complex plane as a vector of length $r$ making angle $\theta$ with the $x$-axis.]

In terms of polar co-ordinates $(r, \theta)$ we have $a = x + iy = r(\cos\theta + i\sin\theta)$, and $r = \sqrt{x^2 + y^2} = |a|$ is the modulus of $a$; $\theta$ is called the phase of $a$. Using properties of the exponential function extended to complex values, it may be shown (and we omit the details) that $e^{i\theta} = \cos\theta + i\sin\theta$, so $a = x + iy$ may be written as $a = re^{i\theta}$. A complex number of modulus 1 (so lying on the circle of radius 1 in the complex plane) is called a pure phase; pure phases are written $e^{i\theta}$.

3 Qubits and quantum states

We now begin our discussion of the basic postulates and principles of quantum theory, which will utilise the full scope of our formalism of vectors, matrices and complex numbers. Recall that a bit was really to be thought of as a classical physical system with two chosen distinguishable states labelled 0 and 1. It is a postulate of quantum theory that the states of any physical system are represented by (unit length) vectors with complex components and that physically distinguishable states correspond to orthogonal vectors. (Soon below we will give a precise discussion of the notions of orthogonality, length and inner products for complex vectors.) We emphasise here that, in classical physics, any two different states of a system are in principle distinguishable, but in quantum theory this is no longer the case! We will discuss and elaborate on this important feature later when we consider the formalism of quantum measurements. For now, suffice it to say that if two states $|\psi_1\rangle$ and $|\psi_2\rangle$ are not orthogonal as vectors then no physical process can distinguish them with certainty.

The simplest non-trivial quantum system has states lying in a 2-dimensional vector space, thus allowing only two mutually distinguishable states. Choosing a pair of such orthogonal unit vectors and labelling them $|0\rangle$ and $|1\rangle$, the general state can be written $|\psi\rangle = a|0\rangle + b|1\rangle$, where $a$ and $b$ are complex numbers subject to the unit length condition $|a|^2 + |b|^2 = 1$.
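The polar form $a = re^{i\theta}$ and the unit length condition for a qubit can both be checked with Python's standard `cmath` module (`cmath.polar`, `cmath.exp`, `cmath.isclose`). The sample number $1 + \sqrt{3}\,i$ and the helper `is_qubit_state` are our own choices for illustration.

```python
import cmath
import math

a = 1 + 1j * math.sqrt(3)
r, theta = cmath.polar(a)   # modulus |a| and phase theta
print(r, theta)             # r is 2 and theta is pi/3, up to rounding
assert math.isclose(r, abs(a))
assert cmath.isclose(r * cmath.exp(1j * theta), a)   # a = r e^{i theta}

# e^{i theta} = cos(theta) + i sin(theta) is a pure phase (modulus 1)
phase = cmath.exp(1j * theta)
assert cmath.isclose(phase, complex(math.cos(theta), math.sin(theta)))
assert math.isclose(abs(phase), 1.0)


def is_qubit_state(amp0, amp1, tol=1e-12):
    """True if a|0> + b|1> satisfies |a|^2 + |b|^2 = 1 (up to tol)."""
    return abs(abs(amp0) ** 2 + abs(amp1) ** 2 - 1) < tol


print(is_qubit_state(0.6, 0.8j))  # True
print(is_qubit_state(1, 1))       # False: this vector has squared length 2
```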

We say that $|\psi\rangle$ is a superposition of the states $|0\rangle$ and $|1\rangle$ with amplitudes $a$ and $b$. The amplitudes are just the components of the vector and we also write $|\psi\rangle = \begin{pmatrix}a\\b\end{pmatrix}$. We will discuss the physical interpretation of such superposition states later in §6. Any quantum system with a 2-dimensional state space and a chosen orthogonal basis $\{|0\rangle, |1\rangle\}$ is called a qubit. The basis states $|0\rangle, |1\rangle$ are called computational basis states or standard basis states. There are many real physical systems that can embody the structure of a qubit, for example the spin of an electron, the polarisation of a photon, superpositions of two selected energy levels in an atom, etc., but our mathematical formalism allows us to abstract away from having to deal with such explicitly physical considerations. In the classical case, systems exist with any finite number of distinguishable states but we conventionally choose to represent them in terms of bit strings. Similarly, in quantum theory systems exist with state spaces of any finite dimension, but we will focus on systems comprising increasing numbers of qubits.

3.1 Two-qubit states

Moving on to larger systems, our second postulate of quantum theory tells us how to combine systems together to obtain larger systems: if system $S_1$ has state space $V_1$ and system $S_2$ has state space $V_2$, then the joint system obtained by taking $S_1$ and $S_2$ together has states given by arbitrary unit vectors in the tensor product space $V_1 \otimes V_2$.

As an explicit example consider two qubits. Let $V$ denote the state space of a single qubit. If $x_0|0\rangle + x_1|1\rangle$ and $y_0|0\rangle + y_1|1\rangle$ are two single-qubit states (i.e. unit vectors in $V$) then (recalling that $|00\rangle$ denotes $|0\rangle|0\rangle$ or $|0\rangle \otimes |0\rangle$, etc.) we see that
$$(x_0|0\rangle + x_1|1\rangle)(y_0|0\rangle + y_1|1\rangle) = x_0y_0|00\rangle + x_0y_1|01\rangle + x_1y_0|10\rangle + x_1y_1|11\rangle \qquad (7)$$
(constructed by juxtaposing single-qubit states) is certainly an example of a two-qubit state, but states of this form do not exhaust all possibilities! Intuitively, in quantum theory “the whole can comprise more than the sum of the parts”: the most general 2-qubit state is
$$|\psi\rangle = a|00\rangle + b|01\rangle + c|10\rangle + d|11\rangle \qquad (8)$$
where the coefficients (called amplitudes) are complex numbers satisfying the normalisation (unit length) condition
$$|a|^2 + |b|^2 + |c|^2 + |d|^2 = 1. \qquad (9)$$
In terms of components we write $|\psi\rangle = (a, b, c, d)^T$.
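A product of two single-qubit states, as in eq. (7), automatically satisfies the normalisation condition (9), because its squared length is the product of the two squared lengths. The sketch below illustrates this with two sample unit vectors of our own choosing; `tensor` and `norm_sq` are hypothetical helper names.

```python
import math


def tensor(v, w):
    """Components of |v>|w>, basis pairs ordered |00>, |01>, |10>, |11>."""
    return [a * b for a in v for b in w]


def norm_sq(v):
    """Squared length sum_k |v_k|^2."""
    return sum(abs(c) ** 2 for c in v)


x = [0.6, 0.8j]                             # x0|0> + x1|1>, a unit vector
y = [1 / math.sqrt(2), -1 / math.sqrt(2)]   # y0|0> + y1|1>, a unit vector
psi = tensor(x, y)   # (a, b, c, d) = (x0 y0, x0 y1, x1 y0, x1 y1) as in eq. (7)
print(psi)
print(norm_sq(psi))  # 1 up to rounding, so eq. (9) holds
```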

Not all such states can be factorised into the product of single-qubit states as in eq. (7). Factorisable states of the form eq. (7) are called product states (of two qubits) and states not of this form are called entangled states (of two qubits). How can we tell if a given state $|\psi\rangle$ in eq. (8) is entangled or not?

Theorem 1: The state $|\psi\rangle = a|00\rangle + b|01\rangle + c|10\rangle + d|11\rangle$ is entangled if and only if $ad - bc \neq 0$.

Example 5: $|\psi\rangle = \frac{1}{2}(|00\rangle + |01\rangle + |10\rangle + |11\rangle)$ has $a = b = c = d = \frac{1}{2}$, so $ad - bc = 0$ and $|\psi\rangle$ must be a product state. We easily verify that if $|v\rangle = \frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)$ then $|\psi\rangle = |v\rangle|v\rangle$. On the other hand $|\psi\rangle = \frac{1}{\sqrt{2}}(|00\rangle + |11\rangle)$ has $a = d = \frac{1}{\sqrt{2}}$ and $b = c = 0$, so $ad - bc \neq 0$ and $|\psi\rangle$ is entangled, i.e. $|\psi\rangle$ cannot be written as $|v_1\rangle|v_2\rangle$ for any choice of 1-qubit states $|v_1\rangle$ and $|v_2\rangle$.

Proof (optional): Looking at eq. (7), $|\psi\rangle$ is unentangled if and only if the four amplitudes can be expressed as products
$$a = x_0y_0, \qquad b = x_0y_1, \qquad c = x_1y_0, \qquad d = x_1y_1 \qquad (10)$$
for some choice of $x_0, x_1, y_0, y_1$. To prove our theorem we show the equivalent statement that $|\psi\rangle$ is a product state if and only if $ad - bc = 0$.

($\Rightarrow$) Suppose eq. (10) holds. Then $ad - bc = x_0y_0x_1y_1 - x_0y_1x_1y_0 = 0$.

($\Leftarrow$) Suppose $ad - bc = 0$. We'll need to consider several cases according to whether some of $a, b, c, d$ are zero or not (as we will want to divide by some of them). We give just one case to illustrate the kind of argument involved; the others are all similar or easier. Suppose $b$ and $d$ are non-zero. From $ad = bc$ we get
$$\frac{a}{b} = \frac{c}{d} = \lambda \quad \text{for some } \lambda,$$
so $a = \lambda b$ and $c = \lambda d$. Substituting these into eq. (8) we get
$$|\psi\rangle = \lambda b|00\rangle + b|01\rangle + \lambda d|10\rangle + d|11\rangle = (b|0\rangle + d|1\rangle)(\lambda|0\rangle + |1\rangle),$$
showing that $|\psi\rangle$ has the product state form. But the two factors above are generally not unit vectors, i.e. we need not have $|b|^2 + |d|^2 = 1$ and $|\lambda|^2 + 1 = 1$. However from eq. (9) we get $|\lambda b|^2 + |b|^2 + |\lambda d|^2 + |d|^2 = 1$. Factoring this gives $(|b|^2 + |d|^2)(|\lambda|^2 + 1) = 1$, and taking square roots gives $\sqrt{|b|^2 + |d|^2}\,\sqrt{|\lambda|^2 + 1} = 1$. So $|\psi\rangle$ is actually the product of the two unit vectors $(b|0\rangle + d|1\rangle)/\sqrt{|b|^2 + |d|^2}$ and $(\lambda|0\rangle + |1\rangle)/\sqrt{|\lambda|^2 + 1}$.

3.2 Multi-qubit states

The above generalises to any finite number $n$ of qubits. The $n$-fold tensor product $V \otimes \cdots \otimes V$ ($n$ times) has dimension $2^n$ and is spanned by the computational basis states $|x\rangle = |x_1\rangle|x_2\rangle \cdots |x_n\rangle$, where $x = x_1x_2 \ldots x_n$ is any $n$-bit string.
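Theorem 1 gives a one-line entanglement test, and the proof case with $b, d \neq 0$ gives an explicit factorisation. The sketch below implements both. The tolerance `tol` is our own addition to cope with floating-point amplitudes, and `factor_case_bd_nonzero` only covers the single case treated in the proof.

```python
import math


def is_entangled(a, b, c, d, tol=1e-12):
    """Theorem 1: a|00> + b|01> + c|10> + d|11> is entangled iff ad - bc != 0."""
    return abs(a * d - b * c) > tol


def factor_case_bd_nonzero(a, b, c, d):
    """Proof case b, d != 0 with ad = bc: unit vectors u, v with |psi> = |u>|v>."""
    lam = a / b                                 # equals c / d when ad = bc
    nu = math.sqrt(abs(b) ** 2 + abs(d) ** 2)
    nv = math.sqrt(abs(lam) ** 2 + 1)
    return [b / nu, d / nu], [lam / nv, 1 / nv]


h = 0.5
print(is_entangled(h, h, h, h))   # False: the product state |v>|v> of Example 5
s = 1 / math.sqrt(2)
print(is_entangled(s, 0, 0, s))   # True: (|00> + |11>)/sqrt(2)

u, v = factor_case_bd_nonzero(h, h, h, h)
print(u, v)                       # both approximately [0.7071, 0.7071]
```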

The general $n$-qubit state can be written
$$|\psi\rangle = \sum_x a_x|x\rangle \qquad \text{where} \qquad \sum_x |a_x|^2 = 1,$$
so we have $2^n$ amplitudes (complex numbers) subject to the normalisation condition. We note (for later) the significant fact that as the number of qubits grows linearly, the full state description (given as the full list of amplitudes) grows exponentially in its complexity. An $n$-qubit state is called a product state if it is the product of $n$ single-qubit states, $|\psi\rangle = |v_1\rangle|v_2\rangle \cdots |v_n\rangle$, and $|\psi\rangle$ is called entangled if it is not a product state. The generalisation of theorem 1 for $n > 2$ is more complicated than the $n = 2$ case. It turns out that we need $(n - 1)$ algebraic equations (like $ad - bc = 0$ for $n = 2$) to fully capture the product state condition. We omit the details.

3.3 Lengths and inner products

Above we have referred to vectors “of unit length” and vectors being “orthogonal”. We now make these notions formally precise by introducing a notion of inner product for vectors with complex number components. This is a direct generalisation of the familiar notions for real 2- or 3-dimensional vectors of Euclidean geometry, with one slight catch for the beginner: some complex conjugation will be involved (which has no effect in the case of real numbers, but in the complex case guarantees that certain expressions come out to be real numbers).

Linear algebra insert 4 (inner products, orthogonality, lengths): Let
$$a = \begin{pmatrix}a_1\\a_2\end{pmatrix}, \qquad b = \begin{pmatrix}b_1\\b_2\end{pmatrix}$$
be two-dimensional complex vectors. The inner product is defined as
$$a \cdot b = \bar{a}_1 b_1 + \bar{a}_2 b_2 = \begin{pmatrix}\bar{a}_1 & \bar{a}_2\end{pmatrix}\begin{pmatrix}b_1\\b_2\end{pmatrix}. \qquad (11)$$
Note the asymmetrical appearance of complex conjugation (denoted by the over-bar), so $a \cdot b$ generally differs from $b \cdot a$. In fact $b \cdot a$ is the complex conjugate of $a \cdot b$, because
$$\overline{b \cdot a} = \overline{\bar{b}_1 a_1 + \bar{b}_2 a_2} = \bar{a}_1 b_1 + \bar{a}_2 b_2 = a \cdot b.$$
(Here we have used the fact that for any complex numbers $c_1, c_2$ we have $\overline{c_1 + c_2} = \bar{c}_1 + \bar{c}_2$, $\overline{c_1 c_2} = \bar{c}_1\bar{c}_2$ and $\bar{\bar{c}}_1 = c_1$.) Note also that the right-hand side of eq. (11) is a matrix product in which the column vector of $a$ has been complex conjugated and transposed to form a row vector. For any matrix $M$ the operation of “complex conjugation followed by transposition” will be denoted by a dagger, $M^\dagger$, and transposition alone by a superscript $T$. Thus
$$\begin{pmatrix}a_1\\a_2\end{pmatrix}^T = \begin{pmatrix}a_1 & a_2\end{pmatrix}, \qquad \begin{pmatrix}a_1\\a_2\end{pmatrix}^\dagger = \begin{pmatrix}\bar{a}_1 & \bar{a}_2\end{pmatrix}.$$
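Eq. (11) is a one-liner once conjugation is applied to the first argument. This sketch (our own helper `inner`, plain Python `complex`) checks that swapping the arguments conjugates the result and that $a \cdot a$ is the real number $|a_1|^2 + |a_2|^2$. The sample vectors are arbitrary.

```python
def inner(a, b):
    """a.b = sum_i conj(a_i) b_i as in eq. (11); conjugation acts on the first slot."""
    return sum(x.conjugate() * y for x, y in zip(a, b))


a = [1 + 1j, 2]
b = [3j, 1 - 1j]
print(inner(a, b), inner(b, a))   # (5+1j) (5-1j): complex conjugates of each other
assert inner(b, a) == inner(a, b).conjugate()
print(inner(a, a))                # (6+0j): |1+i|^2 + |2|^2 = 2 + 4, a real number
```

Python's `int`, `float` and `complex` all provide `.conjugate()`, so mixed-type vectors work without conversion.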

e.b + c2 a2 .b in which the coeﬃcients c1 .(c1 b1 + c2 b2 ) = c1 a.b of two such ket vectors is denoted as a|b i. c2 are complex numbers) as is easily veriﬁed from the deﬁnition. . a= . a and b are orthogonal if a.b2 (where c1 . we just juxtapose the bra and ket vectors to represent the matrix product of the complex conjugated row vector of a with the column vector of b. the length of a is √ |a| = a.b = c1 a1 . as follows: a|b = (a0 0| + a1 1|)(b0 |0 + b1 |1 ) = a0 b0 0|0 + a0 b1 0|1 + a1 b0 1|0 + a1 b1 1|1 = a0 b0 + a1 b1 . Inner products have a linearity property for vectors in the second a. . In our ket notation for vectors we write the column vector a0 a1 as |a = a0 |0 + a1 |1 and for inner products we introduce some further corresponding notation: the complex conjugate transpose of the ket |a is written as a so-called bra vector with the brackets reversed: a| = (a0 |0 + a1 |1 )† = a0 0| + a1 1| = a0 a1 and the inner product a. b= .b1 + c2 a.g. The above deﬁnitions and expressions generalise in the obvious way to n dimensional complex vectors e.e. For the ﬁrst slot we have an anti-linearity property: (c1 a1 + c2 a2 ). if a1 b1 . If |a = a0 |0 + a1 |1 and |b = b0 |0 + b1 |1 we can “multiply out” the compound expression a|b in the obvious way.b = slot: i ai bi etc.COMSM0214 and for example. if a1 b1 + a2 b2 = 0. Returning to our vectors. an bn then a. . . if M= then M† = 1−i 5 2 + 3i 6 − 4i 1 + i 2 − 3i 5 6 + 4i and M T = 1+i 5 2 − 3i 6 + 4i 18 (the rows of M † are the complex conjugated columns of M ). c2 must be complex conjugated if separated out as on the RHS.b = 0 i.a = |a1 |2 + |a2 |2 .

In the last equality above we have used the fact that $|0\rangle$ and $|1\rangle$ are orthogonal and have unit length, i.e.
$$\langle 0|0\rangle = \langle 1|1\rangle = 1 \ \ \text{(unit length)}, \qquad \langle 0|1\rangle = \langle 1|0\rangle = 0 \ \ \text{(orthogonality)},$$
and we get the correct previous expression eq. (11) for $\langle a|b\rangle$. A set of pairwise orthogonal vectors, each of unit length, is called an orthonormal set.

Inner products “reduce vectors back down to numbers”, i.e. the inner product of two vectors (always from the same space) is a number, not a vector. We can generalise this idea to situations involving tensor products. If $|b\rangle$ is a vector in the tensor product space $V \otimes W$ and $|a\rangle$ is a vector in $V$, then the notion of inner product can be extended so that $\langle a|b\rangle$ is a vector in $W$, i.e. “the inner product with $|a\rangle$ in $V$ reduces the $V$-part of $|b\rangle$ down to a number, leaving a vector only in $W$”. Similarly if $|a\rangle$ were instead a vector in $W$ then $\langle a|b\rangle$ would be a vector in $V$. We illustrate how this works in an example.

Example 6: Let $V$ be the 2-dimensional space of a qubit with orthonormal basis $|0\rangle, |1\rangle$. Consider $|A\rangle = x_0|0\rangle + x_1|1\rangle$ in $V$ and $|B\rangle = a|00\rangle + b|01\rangle + c|10\rangle + d|11\rangle$ in $V \otimes V$. To emphasise the two positions in $V \otimes V$ we can introduce subscripts:
$$|B\rangle = a|0\rangle_1|0\rangle_2 + b|0\rangle_1|1\rangle_2 + c|1\rangle_1|0\rangle_2 + d|1\rangle_1|1\rangle_2.$$
If $|A\rangle = x_0|0\rangle_1 + x_1|1\rangle_1$ is a vector in the first slot then
$$\langle A|B\rangle = (\bar{x}_0\,{}_1\langle 0| + \bar{x}_1\,{}_1\langle 1|)(a|0\rangle_1|0\rangle_2 + b|0\rangle_1|1\rangle_2 + c|1\rangle_1|0\rangle_2 + d|1\rangle_1|1\rangle_2).$$
Now matching up the correct bra and ket slots, using the orthonormality relations for $|0\rangle, |1\rangle$ and collecting terms, we get finally (as you should verify!)
$$\langle A|B\rangle = (\bar{x}_0 a + \bar{x}_1 c)|0\rangle_2 + (\bar{x}_0 b + \bar{x}_1 d)|1\rangle_2$$
as a vector in the second space slot. Similarly if $|A\rangle$ is taken instead to be a vector $|A\rangle = x_0|0\rangle_2 + x_1|1\rangle_2$ in the second space slot, then $\langle A|B\rangle$ is a vector in the first space, given (after a direct calculation) by
$$\langle A|B\rangle = (\bar{x}_0 a + \bar{x}_1 b)|0\rangle_1 + (\bar{x}_0 c + \bar{x}_1 d)|1\rangle_1,$$
which is different from the previous result.

4 Physical operations on qubits

So far we have been discussing quantum state descriptions, and now we move on to consider dynamics, i.e. the kinds of state changes or physical evolution processes that are allowed in quantum theory.
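The partial inner product of Example 6 can be computed for a 1-qubit bra against any slot of an $n$-qubit ket by splitting each basis index into the bits to the left of the slot, the slot bit itself, and the bits to its right. This is a sketch under our own conventions (qubit 1 is the leftmost, most significant bit); `partial_inner` is a hypothetical name, not notation from the notes.

```python
def partial_inner(A, B, slot, n):
    """Contract the 1-qubit bra <A| against qubit `slot` (1 = leftmost) of the
    n-qubit ket |B> (a list of 2^n amplitudes); return the (n-1)-qubit ket."""
    out = [0] * (2 ** (n - 1))
    shift = n - slot                      # bit position of that qubit in the index
    for idx, amp in enumerate(B):
        bit = (idx >> shift) & 1          # value of the contracted qubit
        high = idx >> (shift + 1)         # qubits to the left of the slot
        low = idx & ((1 << shift) - 1)    # qubits to the right of the slot
        out[(high << shift) | low] += A[bit].conjugate() * amp
    return out


# Example 6 with sample numbers: x0 = i, x1 = 1 and (a, b, c, d) = (1, 2, 3, 4)
A = [1j, 1]
B = [1, 2, 3, 4]
print(partial_inner(A, B, 1, 2))  # [(3-1j), (4-2j)]: (conj(x0) a + conj(x1) c, conj(x0) b + conj(x1) d)
print(partial_inner(A, B, 2, 2))  # [(2-1j), (4-3j)]: (conj(x0) a + conj(x1) b, conj(x0) c + conj(x1) d)
```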

As noted above (i. v 2 in V . we require U U † = U †U = I (where I is the identity matrix). the columns are orthonormal any any such classical . Since any unitary matrix has an inverse matrix.e.COMSM0214 20 are allowed in quantum theory. We will sometimes need to check if a given matrix is unitary or not.e. Linear algebra insert 5 (unitary operations and unitary matrices): A linear operation L on a vector space V is called unitary if it preserves inner products i. In our discussion of classical computation we saw that reversible classical operations on n-bit strings are just permutations of the set of bit strings.e. Proof: (⇐) Suppose the columns of M form an orthonormal set. it is the inner product of the ith and j th columns of M . Quantum physical operations on n qubits are described by 2n ×2n sized unitary matrices. M † M = I.v 2 for all vectors v 1 . Theorem 2 An n × n matrix of complex numbers is unitary iﬀ its columns form an orthonormal set of vectors. the matrix M † M has 1’s on the diagonal and 0’s elsewhere i. it must also correspond to a reversible operation.e. Representing these bit strings as basis vectors. A matrix U is called unitary if its complex conjugate transpose is its inverse i.e. (⇒) Suppose M is unitary so M † M = I. In our previous discussion of classical bit operations we focussed on reversible operations represented as particular kinds of linear transformations on vectors (those that take computational basis vectors to computational basis vectors). any such permutation Π corresponds to a matrix [Π] which has a single “one” in each column and all other entries are zero.e. Similarly M M † = I. The quantum case is a generalisation of this formalism: it is a postulate of quantum theory that any physical evolution of a quantum system is represented by the action of a linear unitary operation on the state space. looking at how matrix products are calculated) this means precisely that the columns of M are an orthonormal set. (Lv 1 ). 
Such an operation on n qubits is called an n-qubit gate. Since these columns are an orthonormal set.(Lv 2 ) = v 1 . Furthermore the position of the “one” in any column cannot be duplicated in any other column (otherwise Π would map two strings to the same string) i. Recall that the rows of M † are the complex conjugated columns of M and that in the matrix product M † M the ij th entry of the result is the ith row of M † “summed against” the j th column of M i. It can be shown that a linear operation L is unitary iﬀ its matrix representation [L] is a unitary matrix. The mathematical notion of unitarity is explained in insert 5 below.
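Theorem 2 suggests a direct test for unitarity: compute the inner product of every pair of columns and compare with 1 on the diagonal and 0 elsewhere. The sketch below applies it to $H$, to the permutation matrix $CNOT_{12}$, to $\sigma_y$, and to a non-unitary matrix of our own choosing; the function name and tolerance are assumptions for illustration.

```python
import math


def is_unitary(M, tol=1e-12):
    """Theorem 2: M is unitary iff its columns form an orthonormal set."""
    n = len(M)
    cols = [[M[r][c] for r in range(n)] for c in range(n)]
    for i in range(n):
        for j in range(n):
            ip = sum(x.conjugate() * y for x, y in zip(cols[i], cols[j]))
            if abs(ip - (1 if i == j else 0)) > tol:
                return False
    return True


s = 1 / math.sqrt(2)
print(is_unitary([[s, s], [s, -s]]))                                      # True: H
print(is_unitary([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]))  # True: CNOT12
print(is_unitary([[0, -1j], [1j, 0]]))                                    # True: sigma_y
print(is_unitary([[1, 1], [0, 1]]))  # False: its columns are not orthogonal
```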

But there are more general unitary matrices than these, which are all allowed as operations in the quantum formalism. For example
$$X = \begin{pmatrix}0&1\\1&0\end{pmatrix}$$
is the unitary matrix of the classical bit flip operation, and the matrices in eq. (3) are unitary (1-qubit) operations allowed in the quantum formalism which are not interpretable in the classical formalism.

4.1 Operations on 1 or 2 qubits of an n-qubit state

Instead of applying (large) $n$-qubit operations to an $n$-qubit state “globally”, we will generally be interested in manipulating only one or two qubits in any given step. This is the quantum analogue of the classical locality condition: that $n$-qubit states are successively updated by applying operations that act on only a few qubits each. The appropriate formalism (tensor products of operations) for calculating the effect of such gates on $n$-qubit states was given in §2.2 and we re-iterate the essential features here.

Suppose we wish to apply a 1-qubit unitary operation $U$ to the first qubit of the 2-qubit state $|\psi\rangle = a|00\rangle + b|01\rangle + c|10\rangle + d|11\rangle$. We write this operation at the 2-qubit level as $U \otimes I$, to represent the fact that the second qubit receives the identity operation. Suppose
$$U|0\rangle = x_0|0\rangle + x_1|1\rangle, \qquad U|1\rangle = y_0|0\rangle + y_1|1\rangle, \qquad \text{i.e.} \qquad [U] = \begin{pmatrix}x_0 & y_0\\ x_1 & y_1\end{pmatrix}.$$
Then we can explicitly compute $U \otimes I|\psi\rangle$ using the linearity property of the operation:
$$U \otimes I|\psi\rangle = U \otimes I(a|00\rangle + \cdots + d|11\rangle) = a(U|0\rangle)|0\rangle + \cdots + d(U|1\rangle)|1\rangle = a(x_0|0\rangle + x_1|1\rangle)|0\rangle + \cdots + d(y_0|0\rangle + y_1|1\rangle)|1\rangle.$$
Collecting terms we get
$$U \otimes I|\psi\rangle = (ax_0 + cy_0)|00\rangle + (bx_0 + dy_0)|01\rangle + (ax_1 + cy_1)|10\rangle + (bx_1 + dy_1)|11\rangle.$$
We can perform the same calculation with matrix multiplications and column vectors as follows. Previously we introduced the tensor product $A \otimes B$ of matrices $A$ and $B$: $A \otimes B$ is constructed by taking each entry $a_{ij}$ of $A$ and replacing it by the full matrix $B$ multiplied by that entry value. Thus $U \otimes I$ has matrix
$$U \otimes I = \begin{pmatrix}x_0 I & y_0 I\\ x_1 I & y_1 I\end{pmatrix} = \begin{pmatrix}x_0&0&y_0&0\\0&x_0&0&y_0\\x_1&0&y_1&0\\0&x_1&0&y_1\end{pmatrix}.$$
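The two routes to $U \otimes I|\psi\rangle$ (collecting terms by linearity, or multiplying by the $4 \times 4$ matrix) can be compared directly. In this sketch the entries $x_0, x_1, y_0, y_1$ and the amplitudes are arbitrary sample numbers of our own; the identity being checked is purely linear-algebraic, so $U$ need not be unitary here. It also checks that gates on different qubits commute, $U \otimes W = (U \otimes I)(I \otimes W) = (I \otimes W)(U \otimes I)$, for a sample $W$.

```python
def kron(A, B):
    """A (x) B: replace each entry A[i][j] by the block A[i][j] * B."""
    p, q = len(B), len(B[0])
    return [[A[i // p][j // q] * B[i % p][j % q] for j in range(len(A[0]) * q)]
            for i in range(len(A) * p)]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def apply(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]


x0, x1, y0, y1 = 1, 2j, 3, -1   # sample entries of [U] = [[x0, y0], [x1, y1]]
U = [[x0, y0], [x1, y1]]
I = [[1, 0], [0, 1]]
a, b, c, d = 1, 2, 3, 4         # sample amplitudes (normalisation is irrelevant here)
psi = [a, b, c, d]

lhs = apply(kron(U, I), psi)
rhs = [a * x0 + c * y0, b * x0 + d * y0, a * x1 + c * y1, b * x1 + d * y1]
assert lhs == rhs               # matrix route agrees with collecting terms
print(apply(kron(I, U), psi))   # generally different: U now acts on qubit 2

W = [[1, 1j], [0, 2]]           # a second sample 1-qubit matrix
assert matmul(kron(U, I), kron(I, W)) == kron(U, W) == matmul(kron(I, W), kron(U, I))
```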

Since $|\psi\rangle$ has column vector $(a, b, c, d)^T$,
$$U \otimes I|\psi\rangle = \begin{pmatrix}x_0&0&y_0&0\\0&x_0&0&y_0\\x_1&0&y_1&0\\0&x_1&0&y_1\end{pmatrix}\begin{pmatrix}a\\b\\c\\d\end{pmatrix} = \begin{pmatrix}x_0a + y_0c\\ x_0b + y_0d\\ x_1a + y_1c\\ x_1b + y_1d\end{pmatrix},$$
giving the same result as before. Similarly we can calculate $I \otimes U|\psi\rangle$, which is generally different from the above. Indeed the matrix is now
$$I \otimes U = \begin{pmatrix}U & 0\\ 0 & U\end{pmatrix} = \begin{pmatrix}x_0&y_0&0&0\\x_1&y_1&0&0\\0&0&x_0&y_0\\0&0&x_1&y_1\end{pmatrix}.$$
More generally if $U$ and $W$ are two 1-qubit operations then $U \otimes W$ is the 2-qubit gate corresponding to the instruction “apply $U$ to the first qubit and $W$ to the second qubit”. These two applications of gates to different qubits commute: it is readily seen from the definitions that
$$U \otimes W = (U \otimes I)(I \otimes W) = (I \otimes W)(U \otimes I).$$

5 Quantum measurements

In classical physics the state of any given physical system can be fully determined by suitable “measurements” on the system. For example if we are presented with a bit we can discover its value (0 or 1) while leaving it intact. The corresponding situation in quantum theory is drastically different! We have a mathematical formalism for the process of quantum measurement which differs considerably from the unitary physical evolution process that we've been discussing so far. Thus in quantum theory the state of a system can change in two distinct ways: by unitary evolution or by the process of measurement, described below.

Suppose we are presented with a qubit in some (as yet unknown) state $|\psi\rangle = a|0\rangle + b|1\rangle$. According to the postulates of quantum measurement theory (developed below) the only thing we can do to get information about the state is to “perform a test” or “make a measurement” to see if the qubit value is 0 or 1. This test always returns a value 0 or 1, and the answer is given only probabilistically: 0 is seen with probability $|a|^2$ and 1 is seen with probability $|b|^2$. (Recall that $|a|^2 + |b|^2 = 1$ for any qubit state.) Furthermore after the test the qubit's state is no longer $a|0\rangle + b|1\rangle$ but it has been “collapsed” to be $|0\rangle$ or $|1\rangle$, corresponding to the seen output value. Thus, having performed the test, the qubit's state is irrevocably destroyed and we can gain no further information from it about the initial $a, b$ values! This state transformation, from $a|0\rangle + b|1\rangle$ to either $|0\rangle$ or $|1\rangle$, is not a unitary process: e.g. it is not reversible, as the information of $a, b$ is no longer present in the output; also it is a probabilistic process. If we are given a second copy of the same state $|\psi\rangle$ then the test may output a different answer, being just a second independent sample from the probability distribution.
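The measurement rule for one qubit can be simulated classically with a random number generator. In this sketch (our own helper `measure_qubit`, using Python's standard `random` module) each call consumes one fresh copy of $|\psi\rangle$, since measurement collapses the state; the relative frequency of outcome 0 over many copies then approaches $|a|^2$. The sample amplitudes are our own choice.

```python
import random


def measure_qubit(a, b, rng=random):
    """Computational-basis measurement of a|0> + b|1>.

    Returns (outcome, post-measurement state): outcome 0 with probability |a|^2,
    after which the state has collapsed to |0> = (1, 0); otherwise 1 and |1> = (0, 1).
    """
    if rng.random() < abs(a) ** 2:
        return 0, (1, 0)
    return 1, (0, 1)


rng = random.Random(1)    # seeded so that runs are reproducible
a, b = 0.6, 0.8j          # |a|^2 = 0.36 and |b|^2 = 0.64
counts = [0, 0]
for _ in range(10000):
    outcome, _post = measure_qubit(a, b, rng)   # a fresh copy of |psi> each time
    counts[outcome] += 1
print(counts[0] / 10000)  # close to 0.36
```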

Remark: Most quantum theory texts describe a more general notion of quantum measurement, allowing measurement “relative to” any orthonormal basis. In this course we will use only the simpler notion of measurement, i.e. relative only to the computational basis (the $|0\rangle, |1\rangle$ basis). But it can be shown that the more general notion is equivalent to our simpler one combined with prior actions of unitary operations.

In contrast to classical measurements, quantum measurements have only probabilistic outcomes, they are generally invasive, unavoidably destroying the input state, and they reveal only a rather small amount of information about the (now irrevocably lost) input state's identity! Note that if the state were not collapsed on measurement (but the output was still probabilistic as above) we could get a lot more information, estimating the probabilities $|a|^2$ and $|b|^2$ to any desired accuracy from statistics of repeated samplings of the distribution.

You may ask: where do all these crazy rules come from!? Well, alas, no one knows! But the really astonishing thing is that this formalism of complex vectors, unitary operations, probabilistic collapses etc. actually works in the real physical world to very accurately describe a huge range of actual physical phenomena!!

The above features of measurement for one qubit generalise to the case of $n$ qubits. If
$$|\psi\rangle = \sum_x a_x|x\rangle \qquad (12)$$
is an $n$-qubit state (so $x$ ranges over all $2^n$ $n$-bit strings) then we can measure it in the computational basis, obtaining an $n$-bit string $x$ with probability $\Pr(x) = |a_x|^2$. After the measurement the state of the system is no longer $|\psi\rangle$ but it is “collapsed” to $|x\rangle$, where $x$ is the seen outcome. The relationship between probabilities of outcomes in a measurement and amplitudes in the superposition is known as the Born rule (after the physicist Max Born).

5.1 The extended Born rule

We will often use a further important generalisation of the Born rule: given a state of many qubits, we measure not all, but only some subset of them. Before describing this extended Born rule in generality we will introduce it via an example.

Example 7: Consider the following 3-qubit state:
$$|\psi\rangle = \frac{3}{10}|000\rangle - \frac{4}{10}|010\rangle + \frac{i}{2}|011\rangle - \frac{1}{2}|100\rangle + \frac{1 + \sqrt{3}\,i}{4}|111\rangle.$$
Note that the amplitudes are correctly normalised:
$$\Big|\frac{3}{10}\Big|^2 + \Big|{-\frac{4}{10}}\Big|^2 + \Big|\frac{i}{2}\Big|^2 + \Big|{-\frac{1}{2}}\Big|^2 + \Big|\frac{1 + \sqrt{3}\,i}{4}\Big|^2 = \frac{9}{100} + \frac{16}{100} + \frac{1}{4} + \frac{1}{4} + \frac{1}{4} = 1.$$
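The Born rule $\Pr(x) = |a_x|^2$ applied to the state of Example 7 can be sketched with a dictionary keyed by bit strings (omitted strings have amplitude 0). The sampling step uses Python's standard `random.choices`; representing the collapsed state by its bit string is our own simplification.

```python
import math
import random

s3 = math.sqrt(3)
# Example 7: amplitudes a_x indexed by 3-bit strings x
psi = {"000": 3 / 10, "010": -4 / 10, "011": 1j / 2, "100": -1 / 2,
       "111": (1 + 1j * s3) / 4}

probs = {x: abs(amp) ** 2 for x, amp in psi.items()}   # Born rule Pr(x) = |a_x|^2
print(probs)   # 0.09, 0.16, 0.25, 0.25, 0.25 up to rounding
assert math.isclose(sum(probs.values()), 1.0)          # correctly normalised

# Measuring all three qubits: sample x, after which the state collapses to |x>
outcome = random.choices(list(probs), weights=list(probs.values()))[0]
print("seen", outcome, "-> post-measurement state |" + outcome + ">")
```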

Suppose we wish to measure just the first qubit in the computational basis. To calculate the effect of this we re-express $|\psi\rangle$ by collecting together all terms having first qubit $|0\rangle$ and $|1\rangle$ respectively (these being the possible measurement outcomes on qubit 1):
$$|\psi\rangle = |0\rangle\left(\frac{3}{10}|00\rangle - \frac{4}{10}|10\rangle + \frac{i}{2}|11\rangle\right) + |1\rangle\left(-\frac{1}{2}|00\rangle + \frac{1+\sqrt{3}\,i}{4}|11\rangle\right) \equiv |0\rangle|\psi_0\rangle + |1\rangle|\psi_1\rangle.$$
Note that the coefficient vectors $|\psi_0\rangle$ and $|\psi_1\rangle$ have lengths less than 1. In fact their squared lengths add to 1. A more compact way of obtaining these coefficient vectors is to use the partial inner product (cf. example 6) on the first qubit of $|\psi\rangle$:
$$|\psi_0\rangle = {}_1\langle 0|\psi\rangle_{123}, \qquad |\psi_1\rangle = {}_1\langle 1|\psi\rangle_{123}.$$
Then the outcome of the measurement is $j = 0$ or $j = 1$ with probability $\Pr(j) = \langle\psi_j|\psi_j\rangle$, given by the squared length of the corresponding coefficient vector. After the measurement $|\psi\rangle$ is collapsed into the state $|j\rangle|\psi_j\rangle/\sqrt{\langle\psi_j|\psi_j\rangle}$ corresponding to the seen outcome, i.e. we select all terms of $|\psi\rangle$ with the seen outcome in qubit 1 and re-normalise this vector to have length 1. For example, in the above, the probability of getting outcome 1 is
$$\langle\psi_1|\psi_1\rangle = \left|-\frac{1}{2}\right|^2 + \left|\frac{1+\sqrt{3}\,i}{4}\right|^2 = \frac{1}{2}$$
and if this outcome is obtained the post-measurement state is
$$|1\rangle|\psi_1\rangle\Big/\frac{1}{\sqrt{2}} = -\frac{1}{\sqrt{2}}|100\rangle + \frac{1+\sqrt{3}\,i}{2\sqrt{2}}|111\rangle,$$
which is correctly normalised to have length 1.

Suppose instead that we wish to measure qubits 1 and 2 of $|\psi\rangle$ in the computational basis. In this case we collect together all terms of $|\psi\rangle$ corresponding to the four possible outcomes 00, 01, 10, 11:
$$|\psi\rangle = |00\rangle\left(\frac{3}{10}|0\rangle\right) + |01\rangle\left(-\frac{4}{10}|0\rangle + \frac{i}{2}|1\rangle\right) + |10\rangle\left(-\frac{1}{2}|0\rangle\right) + |11\rangle\left(\frac{1+\sqrt{3}\,i}{4}|1\rangle\right) \equiv |00\rangle|\psi_{00}\rangle + |01\rangle|\psi_{01}\rangle + |10\rangle|\psi_{10}\rangle + |11\rangle|\psi_{11}\rangle.$$
As above, these coefficient vectors can be compactly computed using partial inner products (now on qubits 1 and 2): $|\psi_{ij}\rangle = {}_{12}\langle ij|\psi\rangle_{123}$. Then upon measurement, outcome $ij$ is seen with probability $\Pr(ij) = \langle\psi_{ij}|\psi_{ij}\rangle$, the squared length of the corresponding coefficient vector, and the post-measurement state is "$|ij\rangle|\psi_{ij}\rangle$ re-normalised", i.e. $|ij\rangle|\psi_{ij}\rangle/\sqrt{\langle\psi_{ij}|\psi_{ij}\rangle}$. For example, in the above, outcome 10 is seen with probability $\langle\psi_{10}|\psi_{10}\rangle = \frac{1}{4}$ and the post-measurement state is then $|10\rangle\left(-\frac{1}{2}|0\rangle\right)\big/\frac{1}{2} = -|100\rangle$.

We now state the general form of the extended Born rule. Let $|\psi\rangle$ be a state of $m+n$ qubits and suppose we wish to measure $m$ of the qubits in the computational basis. For notational convenience suppose these are the $m$ leftmost qubits. Any computational basis state $|z\rangle$ of $m+n$ qubits can be written as $|z\rangle = |x\rangle|y\rangle$, where $z$, $x$ and $y$ are respectively $(m+n)$-bit, $m$-bit and $n$-bit strings, with $z$ being the concatenation $z = xy$ of $x$ and $y$. Hence we can write
$$|\psi\rangle = \sum_z a_z|z\rangle = \sum_{x,y} a_{xy}|x\rangle|y\rangle \equiv \sum_x |x\rangle\left(\sum_y a_{xy}|y\rangle\right) = \sum_x |x\rangle|\psi_x\rangle.$$
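In code, this decomposition amounts to splitting each basis string $z$ into its first $m$ bits $x$ and its last $n$ bits $y$. The following minimal sketch (standard library only; the dictionary representation and the name `measure_leftmost` are illustrative assumptions, not notation from these notes) groups the amplitudes into the coefficient vectors $|\psi_x\rangle$, computes $\Pr(x) = \langle\psi_x|\psi_x\rangle$ and the renormalised post-measurement state, and reproduces the numbers of Example 7.

```python
import math

# Example 7 as a map from bit strings to amplitudes (unlisted strings have amplitude 0).
psi = {
    "000": 3 / 10,
    "010": -4 / 10,
    "011": 1j / 2,
    "100": -1 / 2,
    "111": (1 + math.sqrt(3) * 1j) / 4,
}

def measure_leftmost(state, m):
    """Extended Born rule for measuring the m leftmost qubits in the computational basis.

    Returns {x: (Pr(x), post-measurement state)}.
    """
    coeff = {}  # x -> {y: a_xy}, i.e. the unnormalised coefficient vector psi_x
    for z, a in state.items():
        x, y = z[:m], z[m:]  # z is the concatenation xy
        coeff.setdefault(x, {})[y] = a
    result = {}
    for x, psi_x in coeff.items():
        prob = sum(abs(a) ** 2 for a in psi_x.values())  # Pr(x): squared length of psi_x
        if prob == 0:
            continue  # outcome x never occurs, so there is nothing to renormalise
        norm = math.sqrt(prob)
        post = {x + y: a / norm for y, a in psi_x.items()}  # |x>|psi_x> renormalised
        result[x] = (prob, post)
    return result

one_qubit = measure_leftmost(psi, 1)   # measure qubit 1
two_qubits = measure_leftmost(psi, 2)  # measure qubits 1 and 2
print(round(one_qubit["1"][0], 6))     # 0.5
print(round(two_qubits["10"][0], 6))   # 0.25
print(two_qubits["10"][1])             # {'100': -1.0}
```

These match the hand calculation: outcome 1 on qubit 1 with probability 1/2, and outcome 10 on qubits 1 and 2 with probability 1/4, leaving $-|100\rangle$.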

In this decomposition, $|\psi_x\rangle = \sum_y a_{xy}|y\rangle = {}_m\langle x|\psi\rangle_{m+n}$ collects all the terms corresponding to measurement outcome $x$; the latter partial inner product is over the measured qubit slots (here the $m$ leftmost qubits). The vectors $|\psi_x\rangle$ have lengths at most 1, and their squared lengths add to 1. Then outcome $x$ is seen with probability
$$\Pr(x) = \langle\psi_x|\psi_x\rangle = \sum_y |a_{xy}|^2$$
and if this outcome occurs then the post-measurement collapsed state is "$|x\rangle|\psi_x\rangle$ renormalised", i.e.
$$\frac{|x\rangle|\psi_x\rangle}{\sqrt{\langle\psi_x|\psi_x\rangle}},$$
in which the measured qubits acquire their seen values.

6 Quantum interference

How should we intuitively think of a superposition of $|0\rangle$ and $|1\rangle$? According to the Born rule, a qubit in a superposition state $a|0\rangle + b|1\rangle$ behaves – for the purposes of measurement – just like a qubit which has been prepared in state $|0\rangle$ or $|1\rangle$ with probabilities $|a|^2$ and $|b|^2$ respectively. However it is important to emphasise that a superposition state cannot be interpreted as such a probabilistic mixture in more general quantum processes! To see an explicit example, consider the equal superposition state $|\psi_0\rangle = \frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)$ and the Hadamard gate
$$H = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix},$$
which maps $|0\rangle$ to $\frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)$ and $|1\rangle$ to $\frac{1}{\sqrt{2}}(|0\rangle - |1\rangle)$. $H$ is also its own inverse, so $H$ applied to $|\psi_0\rangle$ gives $|0\rangle$. Hence if we prepare a qubit in state $|\psi_0\rangle$, apply $H$ and then measure it, we will see outcome 0 with certainty. On the other hand, if we prepare either $|0\rangle$ or $|1\rangle$ and conduct the same process, then in each case we will see outcomes 0 and 1 with probabilities half, i.e. the equal probabilistic mixture of $|0\rangle$ and $|1\rangle$ will behave differently from the equal superposition state. In an intuitive sense we think of $|\psi_0\rangle$ as "simultaneously having both $|0\rangle$ and $|1\rangle$ present" rather than just a probabilistic choice of one or the other.

When we apply $H$ to $|\psi_0\rangle$, $|0\rangle$ (starting with amplitude $a = 1/\sqrt{2}$) is transformed to $|0\rangle$ with amplitude $c = 1/\sqrt{2}$ and to $|1\rangle$ with amplitude $c$, whereas $|1\rangle$ (starting with amplitude $a = 1/\sqrt{2}$ also) is transformed to $|0\rangle$ with amplitude $c$ and to $|1\rangle$ with amplitude $-c$. Thus the total amplitude to go from $|\psi_0\rangle$ to $|0\rangle$ is made up of two "paths", $a(c) + a(c) = 1$, which add. On the other hand the total amplitude to go to $|1\rangle$ also has two contributions, but these cancel: $a(c) + a(-c) = 0$. In the first case we say that the two paths interfere constructively, whereas in the latter case we say that they interfere destructively.

These notions of "interfering paths" form the basis of Feynman's sum-over-paths description of quantum mechanics, which we now briefly outline. To illustrate the formalism consider the simple quantum process of applying $H$ twice to $|0\rangle$:

$$|0\rangle \;\xrightarrow{\;H\;}\; \frac{1}{\sqrt{2}}|0\rangle + \frac{1}{\sqrt{2}}|1\rangle \;\xrightarrow{\;H\;}\; \frac{1}{\sqrt{2}}\left(\frac{1}{\sqrt{2}}|0\rangle + \frac{1}{\sqrt{2}}|1\rangle\right) + \frac{1}{\sqrt{2}}\left(\frac{1}{\sqrt{2}}|0\rangle - \frac{1}{\sqrt{2}}|1\rangle\right) = |0\rangle$$

We think of this as a process of "transitions between basis states $|0\rangle$ and $|1\rangle$ with prescribed amplitudes" and we depict it as a branching tree, just like a probabilistic process, but with the branches labelled by amplitudes, not probabilities:

[Figure: amplitude tree. From the initial $|0\rangle$, branches with amplitude $\frac{1}{\sqrt{2}}$ lead to $|0\rangle$ and to $|1\rangle$. From that $|0\rangle$, branches with amplitude $\frac{1}{\sqrt{2}}$ lead to $|0\rangle$ and to $|1\rangle$; from that $|1\rangle$, branches with amplitudes $\frac{1}{\sqrt{2}}$ and $-\frac{1}{\sqrt{2}}$ lead to $|0\rangle$ and to $|1\rangle$ respectively.]

The rules for accumulating amplitudes are just like those for probabilities:
(i) each path has an amplitude given by the product of numbers along the path;
(ii) each final basis state has an amplitude given by the sum over all paths from the start to it;
(iii) the probability for the transition from the initial $|0\rangle$ to a final basis state is the modulus squared of the sum over all paths to it.
As an illustration, in our example above there are two paths from the initial $|0\rangle$ to the final $|0\rangle$, and two to the final $|1\rangle$. For final $|0\rangle$ the two paths interfere constructively:
$$\left|\frac{1}{\sqrt{2}}\cdot\frac{1}{\sqrt{2}} + \frac{1}{\sqrt{2}}\cdot\frac{1}{\sqrt{2}}\right|^2 = 1,$$
whereas for final $|1\rangle$ the two paths interfere totally destructively:
$$\left|\frac{1}{\sqrt{2}}\cdot\frac{1}{\sqrt{2}} - \frac{1}{\sqrt{2}}\cdot\frac{1}{\sqrt{2}}\right|^2 = 0,$$
and this transition is forbidden.

This is the Feynman sum-over-paths formulation of quantum processes. It turns out to be an alternative equivalent description of the calculations involved in multiplying gate matrices and applying the Born rule for a measurement on the final state of the process. For probabilistic trees we can simulate the whole process by walking through some single path of the tree, so long as we make an appropriate probabilistic choice of direction at each node along the way. But for quantum amplitude trees this is not possible! In the above example there are non-zero single paths from initial $|0\rangle$ to final $|1\rangle$, yet this transition is forbidden. The process "needs to know about" all paths and how they interfere!
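The three rules can be turned directly into a brute-force computation. The sketch below (standard library only; `sum_over_paths`, `mat_vec` and the list-of-lists matrix representation are illustrative assumptions) enumerates every path through the tree for a sequence of $d \times d$ gate matrices, multiplies the amplitudes along each path and sums over paths, so in principle it also covers $n$-qubit processes with $d = 2^n$. For $H$ applied twice to $|0\rangle$ it agrees with ordinary matrix multiplication, and it contrasts the superposition $|\psi_0\rangle$ with the equal probabilistic mixture. The number of paths grows exponentially with the number of gates, so this is meant for illustration only.

```python
import itertools
import math

R = 1 / math.sqrt(2)
H = [[R, R], [R, -R]]  # H[out][inp]: amplitude for the transition |inp> -> |out>

def sum_over_paths(gates, start, final):
    """Amplitude for start -> final through a sequence of d x d gate matrices.

    Rule (i): each path's amplitude is the product of the gate entries along it.
    Rule (ii): the amplitude of the final basis state is the sum over all paths.
    """
    d = len(gates[0])
    total = 0
    # A path is fixed by the intermediate basis state after every gate but the last.
    for middle in itertools.product(range(d), repeat=len(gates) - 1):
        states = (start, *middle, final)
        amp = 1
        for k, gate in enumerate(gates):
            amp *= gate[states[k + 1]][states[k]]  # gate k takes states[k] to states[k+1]
        total += amp
    return total

def mat_vec(gate, vec):
    """Ordinary matrix-vector multiplication, for comparison."""
    return [sum(gate[i][j] * vec[j] for j in range(len(vec))) for i in range(len(gate))]

# Apply H twice to |0>; rule (iii) squares the summed amplitude.
amp0 = sum_over_paths([H, H], start=0, final=0)
amp1 = sum_over_paths([H, H], start=0, final=1)
print(round(abs(amp0) ** 2, 12), round(abs(amp1) ** 2, 12))  # 1.0 0.0
print(mat_vec(H, mat_vec(H, [1, 0])))  # the same two amplitudes, up to rounding

# Equal probabilistic mixture of |0> and |1>, each followed by H, then measured.
mix_p0 = 0.5 * abs(mat_vec(H, [1, 0])[0]) ** 2 + 0.5 * abs(mat_vec(H, [0, 1])[0]) ** 2
print(round(mix_p0, 12))  # 0.5, whereas the superposition gives outcome 0 with certainty
```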

Just as in the example above, any quantum process of applying a sequence of unitary gates to an initial starting state of, say, $n$ qubits in state $|0\rangle \ldots |0\rangle$ can be represented as a branching tree of amplitudes. The nodes are now labelled by $n$-bit strings organised in columns, such that each column contains each $n$-bit string once. The quantum state at the $k$th stage of the process corresponds to the nodes in the $k$th column of the tree. The lines connecting column $k$ to column $k+1$ are labelled by the amplitudes of the $k$th unitary gate acting on the corresponding basis state in the $k$th column. The Feynman sum-over-paths rules then correctly reproduce the calculation of matrix multiplication of the corresponding unitary matrices to determine the amplitudes in the final state.

Remark: in physics this phenomenon of interference of different paths for a single particle is well illustrated by the famous double slit experiment – see for example R. P. Feynman, The Feynman Lectures on Physics, volume 3, chapter 1 for a discussion. We set up a screen with two holes in it, and a short distance behind it, erect a second full screen which can detect the impact of a particle (photon or electron etc.) on it at each position. Then we fire single particles at the first screen and register where they impact on the second screen. There are locations $x$ on the second screen with the following property: if we close either of the two holes then the probability for the particle to impact at $x$ is non-zero (i.e. the particle is sometimes registered at $x$), but if we open both holes the probability to impact at $x$ is zero, i.e. the particle is never seen there! This actually happens in real physical experiments! Classically this is inexplicable: the probability to impact at $x$ should be the sum of the probabilities via the two holes. In the quantum formalism we have amplitudes (prior to probabilities) to impact at $x$ via each hole, and these interfere destructively before we square the total value to get the probability, i.e. we use the "square of the sum" rather than the "sum of squares".

7 Quantum non-locality

One of the strangest and most controversial features of quantum theory (actually a property of entanglement) is that the formalism implies the presence of "instantaneous non-local effects" in spatially distributed physical systems. Einstein was very unhappy about this, referring to it as a "spooky action at a distance", and it caused him to doubt the completeness and correctness of quantum mechanics as a physical theory. He first drew attention to this feature (in a form now known as the Einstein-Podolsky-Rosen (EPR) paradox) in a famous paper in 1935 and subsequently never gave up his skepticism of quantum theory. In the 1950's and 1960's the essential ingredients were re-cast in conceptually simpler ways, especially in the so-called Bell inequalities (first introduced by J. S. Bell in 1965). Since then the validity of these predicted effects has been verified many times in actual experiments. Thus even if the mathematical formalism of quantum theory turns out not to be our final correct physical theory, these "spooky nonlocal influences" are here to stay as actual physical reality!

We will now describe one way in which this quantum nonlocality is manifested. It also provides a very good instructive example of the extended Born rule in action in conjunction with the application of various unitary quantum gates.

Consider the entangled 2-qubit state
$$|\psi\rangle = \frac{1}{\sqrt{2}}(|00\rangle + |11\rangle).$$
We imagine that the two qubits (realised as states of two physical particles) are widely separated in space, held respectively on the left by Alice (A) and on the right by Bob (B). Nowadays this kind of state can be routinely manufactured in the laboratory with pairs of photons or electrons etc. that move off in opposite directions from the common source of their creation and interaction to produce $|\psi\rangle$. Even though the two particles are widely separated and have no "physical connection", their joint state $|\psi\rangle$ is not expressible as a product of a qubit state for A with a qubit state for B.

Suppose Alice measures her qubit (in the computational basis). According to the extended Born rule she will see outcome 0 or 1 with equal probabilities half, and Bob's particle will be "instantaneously collapsed" into state either $|0\rangle$ or $|1\rangle$ corresponding to A's seen outcome. If B now immediately measures his qubit he will see outcome 0 or 1, which is always certainly the same as the outcome A got, i.e. the two outcomes are perfectly correlated. Thus at first sight we may conclude that A's measurement must have instantaneously physically influenced B's particle (by determining B's outcome), but this conclusion here is not justified! Whether or not A performed her measurement, B will see a bit value 0 or 1 with equal probabilities half. The physical fact that A and B's outcomes are perfectly correlated is in fact not mysterious – it can be explained without invoking any kind of instantaneous action at a distance: we simply say that at the source of creation of $|\psi\rangle$ (when the two particles were together) they were each instilled with an equal but randomly chosen bit value, which they carried away and later gave up as the measurement result, i.e. the (correlated) measurement results were already (randomly) chosen at the time of creation of $|\psi\rangle$!

But one may still object: according to the Born rule A's measurement instantaneously changed the state of B's particle from preparation 1: "the right half of $|\psi\rangle$" to preparation 2: "a probabilistic choice of $|0\rangle$ or $|1\rangle$ chosen with probabilities half". However the mathematical descriptions of these preparations are not physically observable features: given a particle in some quantum state, we are unable to learn the identity of that state by any physical process (in contrast to classical physics states). In fact it can be shown that preparations 1 and 2 above cannot be distinguished by any quantum process at all – they actually give identical probability distributions of outcomes for any measurement whatever. Thus this "instantaneous change of state" in the above example can be claimed to be just a quirk of the mathematical formalism and not a physically real influence!

We now develop a more complicated scenario involving quantum processes with $|\psi\rangle$ which does turn out to be manifestly mysterious, i.e. having physically observable effects that cannot be explained without some "spooky action at a distance". We will use the results of the following example.

Example 8 Let $|\psi\rangle$ be the entangled 2-qubit state $|\psi\rangle = \frac{1}{\sqrt{2}}(|00\rangle + |11\rangle)$.

Consider the single-qubit unitary operation (called "rotation by $\theta$") given by
$$U(\theta) = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}.$$
Thus
$$U(\theta)|0\rangle = \cos\theta\,|0\rangle + \sin\theta\,|1\rangle, \qquad U(\theta)|1\rangle = -\sin\theta\,|0\rangle + \cos\theta\,|1\rangle.$$
If we apply $U(\alpha)\otimes U(\beta)$ to $|\psi\rangle$, a direct calculation (which you should verify) gives
$$|\psi_{\alpha\beta}\rangle = U(\alpha)\otimes U(\beta)\,|\psi\rangle = \frac{1}{\sqrt{2}}\Big(\cos(\alpha-\beta)|00\rangle - \sin(\alpha-\beta)|01\rangle + \sin(\alpha-\beta)|10\rangle + \cos(\alpha-\beta)|11\rangle\Big).$$
(Here we have used the trigonometric identities $\cos(\alpha-\beta) = \cos\alpha\cos\beta + \sin\alpha\sin\beta$ and $\sin(\alpha-\beta) = \sin\alpha\cos\beta - \cos\alpha\sin\beta$.) Note that this operation may be implemented locally, by Alice applying $U(\alpha)$ and Bob applying $U(\beta)$. The Born rule immediately gives (as you should verify):

Fact 1: if we measure either one of the qubits in the computational basis we will get output 0 or 1 with equal probabilities of half.

Consider now measuring both qubits. We can do this either by applying the Born rule directly for the four possible outcomes 00, 01, 10, 11, or else by measuring one chosen qubit first (applying the extended Born rule) and then applying the Born rule again to the post-measurement state of the remaining qubit. These approaches all give the same probability distribution of 2-bit outcomes. Thus this measurement can be implemented locally, by A and B each separately performing a 1-qubit measurement. We get:

Fact 2: if we measure both qubits then $\Pr(\text{outcomes differ}) = \sin^2(\alpha-\beta)$. (Note that "outcomes differ" means "outcome is 01 or 10".)

We now describe our "mysterious situation". Let $M_A(\alpha)$ denote the following operation for Alice: apply $U(\alpha)$ to her qubit and measure it in the computational basis. Similarly $M_B(\beta)$ for Bob. Suppose A and B have many $|\psi\rangle$ states and consider a long sequence of A and B doing $M_A(\alpha)$, $M_B(\beta)$, with each choosing one of their allowed settings (for the whole sequence). Alice will only use settings $\alpha = 0^\circ$ or $30^\circ$ and Bob will only use settings $\beta = 0^\circ$ or $-30^\circ$. By Fact 1 all individual sequences obtained will be uniformly random sequences of 0's and 1's, but by Fact 2 A and B's random sequences will display some correlations. For long sequences the frequencies of occurrence of 0's and 1's will reflect the probabilities. We have four cases of A and B's sequences corresponding to pairs of settings:

(1) $M_A(0^\circ)$, $M_B(0^\circ)$: $\Pr(\text{differ}) = \sin^2 0^\circ = 0$. A and B's sequences will be the same sequence.
(2) $M_A(0^\circ)$, $M_B(-30^\circ)$: $\Pr(\text{differ}) = \sin^2 30^\circ = 1/4$. The sequences will differ in about 1 in 4 places.

(3) $M_A(30^\circ)$, $M_B(0^\circ)$: $\Pr(\text{differ}) = \sin^2 30^\circ = 1/4$. The sequences will differ in about 1 in 4 places.
(4) $M_A(30^\circ)$, $M_B(-30^\circ)$: $\Pr(\text{differ}) = \sin^2 60^\circ = 3/4$. The sequences will differ in about 3 in 4 places!!

Suppose the measurement outcomes at A (respectively B) are not influenced by the mere choice of settings at B (respectively A). This is our "locality assumption" and we now show that it leads to a contradiction. Let $S$ denote the common sequence that would be obtained by A and B in (1) if they had both chosen angle setting zero. By the locality assumption this must remain true even if neither A nor B actually chose to use angle zero! The sequences for $M_A(30^\circ)$ and $M_B(-30^\circ)$ must each be consistent with the possibility that the other party chose angle zero, even if they did not so choose! By (2) and (3) the sequences for $M_A(30^\circ)$ and $M_B(-30^\circ)$ must each differ in about 1 in 4 places from $S$, and hence in about 2 in 4 places or fewer from each other (fewer is possible because they may both differ from $S$ at some of the same places). In other words, the locality assumption implies the existence of a sequence $S$ such that the $\pm 30^\circ$ sequences each differ in about 1 in 4 places from $S$, and hence in at most about 2 in 4 places from each other. But by (4) these sequences actually differ in about 3 in 4 places, measurably more (for long sequences) than the at most 2 in 4 places allowed by locality! We conclude that the choice of settings on one side must have influenced the outcomes on the other side, i.e. the $\pm 30^\circ$-setting sequences cannot be unaffected by the choice of zero versus non-zero angle setting on the other side, and we have our "spooky action at a distance".

Hence although the necessity of non-local effects is manifested, these effects cannot be used to instantaneously communicate or send a message – A's choice of settings has no effect whatever on any statistics of measurement on B's side taken by itself (and vice versa). Note that all sequences above are always uniformly random as sequences of 0's and 1's, i.e. our non-local action always replaces one uniformly random sequence by another uniformly random sequence, so the participants will not notice any effect locally: the influence of the non-local effects appears only in correlations between A and B's sequences, which can be noticed only if A and B later communicate or come together.

Remark (Optional but interesting!): we can formalise the notions of non-locality and no-signalling for any physical theory in the following general conceptual framework. We have two participants A and B in spatially separated regions, performing measurements on a physical system that extends over both regions. They are able to make local choices of measurement settings $\alpha$ and $\beta$ respectively (e.g. angle choices), giving measurement outputs $x$ and $y$ respectively. Outcomes are generally probabilistic with a joint probability distribution $\Pr(xy|\alpha\beta)$, i.e. the probability of getting $x, y$ given settings $\alpha, \beta$. This joint probability is prescribed by the laws of some physical theory, or maybe determined from repeated actual experiments. Each participant A or B sees locally the marginal distribution:
$$\text{A: } \Pr(x|\alpha\beta) = \sum_y \Pr(xy|\alpha\beta), \qquad \text{B: } \Pr(y|\alpha\beta) = \sum_x \Pr(xy|\alpha\beta).$$
We will be interested in these local distributions and their correlations, asking especially whether the locally observed outcomes can be explained by purely local influences.
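To make this framework concrete for the protocol above, here is a minimal sketch (standard library only; all function names are illustrative) that computes $\Pr(xy|\alpha\beta)$ from the amplitudes of $|\psi_{\alpha\beta}\rangle$ in Example 8, prints both marginals and $\Pr(\text{differ})$ for the four setting pairs, and then checks the locality argument. It assumes a local model in which, given the shared data $e$, each party's output is a fixed bit for each of their own settings (a standard observation is that randomised local responses can be folded into $e$). For every such assignment the disagreements obey a triangle inequality, which is exactly the "sequence $S$" argument, and the quantum probabilities violate it.

```python
import itertools
import math

def joint_distribution(alpha_deg, beta_deg):
    """Pr(xy | alpha beta) for measuring U(alpha) (x) U(beta) |psi> as in Example 8.

    Uses |psi_ab> = (cos d |00> - sin d |01> + sin d |10> + cos d |11>)/sqrt2, d = alpha - beta.
    """
    d = math.radians(alpha_deg - beta_deg)
    amps = {"00": math.cos(d), "01": -math.sin(d), "10": math.sin(d), "11": math.cos(d)}
    return {xy: a * a / 2 for xy, a in amps.items()}

def pr_differ(alpha_deg, beta_deg):
    """Fact 2: probability that the two outcomes differ (01 or 10)."""
    p = joint_distribution(alpha_deg, beta_deg)
    return p["01"] + p["10"]

# Marginals: both are 1/2 for every pair of settings (Fact 1), so neither party's
# local statistics depend on the other's setting (no-signalling).
for alpha, beta in itertools.product([0, 30], [0, -30]):
    p = joint_distribution(alpha, beta)
    alice_0 = p["00"] + p["01"]  # Pr(x = 0 | alpha beta)
    bob_0 = p["00"] + p["10"]    # Pr(y = 0 | alpha beta)
    print(alpha, beta, round(alice_0, 6), round(bob_0, 6), round(pr_differ(alpha, beta), 6))

# Local deterministic model: A outputs a0 or a30, B outputs b0 or b_30, all fixed by e.
# The triangle inequality gives
#   differ(a30, b_30) <= differ(a30, b0) + differ(a0, b0) + differ(a0, b_30)
# for every assignment, hence also on average over any distribution of e.
worst = max(
    (a30 != b_30) - ((a30 != b0) + (a0 != b0) + (a0 != b_30))
    for a0, a30, b0, b_30 in itertools.product([0, 1], repeat=4)
)
quantum = pr_differ(30, -30) - (pr_differ(30, 0) + pr_differ(0, 0) + pr_differ(0, -30))
print(worst, round(quantum, 6))  # 0 0.25: locally never above 0, quantum 3/4 - 1/2 = 1/4
```

The loop visits the setting pairs in the order of cases (1) to (4), so the printed values of $\Pr(\text{differ})$ are 0, 1/4, 1/4 and 3/4 (up to rounding), with every marginal equal to 1/2.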

The alternative is that the local outcomes are influenced by settings or actions on the other side. We allow A and B to have communicated in the past, e.g. they may have met and exchanged information before becoming spatially separated. In particular, to generate future probabilistic outputs such as our $x$ and $y$, they may have set up some shared probabilistic data in a variable $e$ with distribution $\Pr(e)$, which they can both use as an input to generate later probabilistic outputs. Thus we will have
$$\Pr(xy|\alpha\beta) = \sum_e \Pr(e)\,\Pr(xy|\alpha\beta e). \qquad (13)$$

No-signalling theories: if the $x$ distribution $\Pr(x|\alpha\beta e)$ depends on the choice of $\beta$, then B can signal to A by choosing different $\beta$ settings, which A can notice from the statistics of her local $x$ outputs. Thus the theory is called no-signalling if $\Pr(x|\alpha\beta e)$ is independent of $\beta$ and $\Pr(y|\alpha\beta e)$ is independent of $\alpha$.

Local vs. non-local theories: the theory is called local if the joint distribution of $xy$ outputs is explainable by putting together independent local effects (including the previously shared $e$ used locally on each side), i.e. we require the joint probability to have the product form
$$\Pr(xy|\alpha\beta e) = \Pr(x|\alpha e)\,\Pr(y|\beta e) \qquad (14)$$
and then also $\Pr(xy|\alpha\beta) = \sum_e \Pr(e)\Pr(x|\alpha e)\Pr(y|\beta e)$. Otherwise the theory is called non-local. Note that locality implies no-signalling: e.g. if we sum eq. (14) over $y$, then using $\sum_y \Pr(y|\beta e) = 1$ we see that $\Pr(x|\alpha\beta e) = \Pr(x|\alpha e)$ is independent of $\beta$.

Our above quantum protocol can be easily recast to show that quantum theory is a non-local theory: for locality there must exist three distributions $\Pr(e)$, $\Pr(x|\alpha e)$ and $\Pr(y|\beta e)$ making these relations hold, but no choices of distributions $\Pr(e)$, $\Pr(x|\alpha e)$, $\Pr(y|\beta e)$ can correctly reproduce the quantum distributions $\Pr(xy|\alpha\beta)$ of the protocol. It can also be shown that quantum theory is a no-signalling theory, which is perhaps surprising once we know it is non-local! Our quantum protocol above demonstrated no-signalling only for its particular situation (in fact local outcomes were always uniformly random, independent of the choice of settings of the other party), but it can be proved to hold generally for any spatially distributed quantum process whatever.

8 Quantum teleportation

Consider again our distantly separated participants Alice and Bob, who each possess one qubit of the entangled state
$$|\psi\rangle = \frac{1}{\sqrt{2}}(|00\rangle + |11\rangle).$$
Suppose Alice has another qubit in some state $|\alpha\rangle$ and she wants to transfer this qubit state to Bob. How can she achieve this transfer? She may not even know the identity of the state $|\alpha\rangle$, and according to quantum measurement theory she is unable to learn more than a small amount of information about it before totally destroying it! She can place the (physical system embodying the) qubit state in a "box" and physically carry it across to Bob. But is there any other way? What if the space region in between A and B is a hostile and dangerous place?

Quantum teleportation provides an alternative method for state transfer that utilises the entanglement in the state $|\psi\rangle$. As we'll see precisely in a moment, but speaking intuitively now, the state transfer from A to B is achieved "without the state having to pass through the space in between" in the following sense: the transference is unaffected by any physical process whatever that takes place in the intervening space. Note that this is also a feature of the entanglement of $|\psi\rangle$: in the previous section we argued that there is some kind of "non-local connection" between the two particles, but the entangled state remains entirely unaffected by any physical process occurring in the space in between; it can change only by physical actions on the particles themselves.

Let qubit 1, qubit 2 and qubit 3 denote respectively Alice's input qubit (in state $|\alpha\rangle$), Alice's qubit of $|\psi\rangle$ and Bob's qubit of $|\psi\rangle$. Using subscripts to label the qubits, the starting state can be written $|\alpha\rangle_1|\psi\rangle_{23}$, with qubits 1, 2 in A's possession and 3 in B's possession. If $|\alpha\rangle = a|0\rangle + b|1\rangle$ then we explicitly have
$$|\alpha\rangle|\psi\rangle = (a|0\rangle + b|1\rangle)\frac{1}{\sqrt{2}}(|00\rangle + |11\rangle) = \frac{a}{\sqrt{2}}|000\rangle + \frac{a}{\sqrt{2}}|011\rangle + \frac{b}{\sqrt{2}}|100\rangle + \frac{b}{\sqrt{2}}|111\rangle. \qquad (15)$$

The protocol for quantum teleportation comprises the following five steps (i) to (v).
(i) Alice applies CNOT to her two qubits 1 and 2 (with 1 being the control and 2 the target).
(ii) Alice applies $H$ to her qubit 1.
(iii) Alice measures her two qubits (in the computational basis) to obtain a 2-bit string $ij$ (00, 01, 10 or 11), where $i$ is the outcome for qubit 1 and $j$ the outcome for qubit 2.

The combination of (i), (ii) and (iii) is called "performing a Bell measurement" on the two qubits (named after the physicist J. S. Bell). By calculating the effect of these three operations on eq. (15), we see that each 2-bit string is obtained with equal probability 1/4 (irrespective of the values of $a$ and $b$, recalling that $|a|^2 + |b|^2 = 1$). Furthermore, after the measurements in (iii) we have the following post-measurement states (as you should calculate):

mmt outcome    post-mmt state
00             $|00\rangle|\alpha\rangle$
01             $|01\rangle X|\alpha\rangle$
10             $|10\rangle Z|\alpha\rangle$
11             $|11\rangle XZ|\alpha\rangle$

i.e. Bob's qubit 3 is now disentangled from qubits 1, 2, and it is in a state that is a fixed transform of $|\alpha\rangle$, the choice of transform depending only on the measurement outcome and not on the identity of $|\alpha\rangle$ (i.e. not on the values of $a$, $b$). In fact if the measurement outcome was $ij$ then Bob's qubit is in state $X^jZ^i|\alpha\rangle$.
(iv) Alice sends the 2-bit measurement outcome $ij$ to Bob (i.e. she sends him 2 bits of classical information).
(v) On receiving $ij$, Bob applies the unitary operation $Z^iX^j$ (i.e. the inverse of $X^jZ^i$) to his qubit, which is then guaranteed to be in state $|\alpha\rangle$.

This completes the teleportation of $|\alpha\rangle$ from Alice to Bob. Note that no remnant of any information about $|\alpha\rangle$ remains with Alice: after stage (iii) she is left with only a 2-bit string that has always been chosen uniformly at random. The whole protocol is shown diagrammatically in figure 1.

[Figure 1: Quantum teleportation. The figure shows a spacetime diagram with the entangled state $|\psi\rangle = \frac{1}{\sqrt{2}}(|00\rangle + |11\rangle)$ created at $t_0$ and subsequently distributed to A and B. At time $t_1$ Alice performs a Bell measurement on the joint state of her input qubit and her qubit from $|\psi\rangle$ and sends the outcome $ij$ to Bob. On reception at time $t_2$, Bob applies $Z^iX^j$ to his particle, which is then guaranteed to be in the same state as Alice's original input qubit.]

In figure 2 we give an alternative depiction of the protocol as a network of quantum gates. This representation is perhaps more pertinent to computation (rather than communication), using teleportation to transfer qubits between different parts of a quantum memory.

[Figure 2: A quantum network for teleportation: CNOT and $H$ on the top two lines ($|\alpha\rangle$ and Alice's half of $|\psi\rangle$), measurement of both giving $ij$, and $Z^iX^j$ applied to the bottom line (Bob's half of $|\psi\rangle$), whose output is $|\alpha\rangle$. The diagram is read from left to right. Horizontal lines represent qubits in a quantum memory. As a result of the above sequence of operations the qubit state is transferred from the top line to the bottom line.]

We conclude this section with a few further remarks about the teleportation process, for contemplation.

• Unlike "star-trek" teleportation, the physical system embodying $|\alpha\rangle$ is not transferred from A to B. Only the "information" of the state's identity is transferred, residing finally in a new physical system, i.e. the system that was initially Bob's half of $|\psi\rangle$.

• Before A's measurements in (iii), Bob's qubit has preparation: "the right half of $|\psi\rangle$". After A's measurement, Bob's qubit has preparation: "one of the four states $|\alpha\rangle$, $Z|\alpha\rangle$, $X|\alpha\rangle$ or $XZ|\alpha\rangle$ chosen uniformly at random". It can be shown that for any measurement process on Bob's qubit these two preparations give identical probability distributions of outcomes, so Bob cannot notice any change at all in his qubit's behaviour as a result of A's measurements. He can reliably create the qubit state $|\alpha\rangle$ only after receiving the $ij$ message from Alice.

• Figure 1 highlights one of the most enigmatic features of the quantum teleportation process. The question is this: Alice succeeds in transferring the quantum state $|\alpha\rangle$ to Bob by sending him just two bits of classical information. Clearly these two bits are vastly inadequate to specify the state (whose description depends on continuous parameters), so "how does the remaining information get across to Bob?" What carries it? What route does it take? Usually when information is transferred from one location to another, it requires a channel for its transmission! But in figure 1 there is clearly another route connecting Alice to Bob (apart from the channel carrying the two classical bits) and it does indeed carry a qubit – it runs backwards in time from Alice to the creation of the $|\psi\rangle$ state and then forwards in time to Bob. Hence it is tempting to assert that most of the "quantum information" of $|\alpha\rangle$ was propagated along this route, firstly backwards in time and then forwards to Bob! In this view, at times between $t_0$ and $t_1$ this part of the state's information was already well on its way to Bob, even though Alice had not yet performed her measurement! Such statements may appear paradoxical, but further consideration shows that, as an interpretation, this view is fully consistent and sound. Whether or not you accept this view as a correct description of what actually happens in the real physical world is only a matter of personal preference!
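The five steps can be checked with a small state-vector simulation. The sketch below (standard library only; the helper names and the index convention, index $= 4q_1 + 2q_2 + q_3$ with qubit 1 leftmost, are illustrative assumptions) prepares eq. (15), applies (i) and (ii), and for each outcome $ij$ of (iii) extracts Bob's coefficient vector, renormalises it and applies the correction $Z^iX^j$ of step (v). Rather than sampling a single outcome it loops over all four, so it can confirm that each occurs with probability 1/4 and that Bob always ends with $|\alpha\rangle$.

```python
import cmath
import math
import random

R = 1 / math.sqrt(2)
H = [[R, R], [R, -R]]
X = [[0, 1], [1, 0]]
Z = [[1, 0], [0, -1]]

def apply_1q(state, gate, qubit, n):
    """Apply a 2x2 gate to `qubit` (1 = leftmost) of an n-qubit state vector."""
    shift = n - qubit
    new = [0j] * len(state)
    for idx, amp in enumerate(state):
        bit = (idx >> shift) & 1
        for out in (0, 1):
            new[(idx & ~(1 << shift)) | (out << shift)] += gate[out][bit] * amp
    return new

def apply_cnot(state, control, target, n):
    """Flip `target` in every basis state whose `control` bit is 1."""
    new = [0j] * len(state)
    for idx, amp in enumerate(state):
        if (idx >> (n - control)) & 1:
            idx ^= 1 << (n - target)
        new[idx] += amp
    return new

def teleport(a, b):
    """Steps (i)-(v) for |alpha> = a|0> + b|1>; returns {ij: (Pr(ij), Bob's final state)}."""
    state = [0j] * 8  # eq. (15), index = 4*q1 + 2*q2 + q3
    for idx, amp in [(0b000, a), (0b011, a), (0b100, b), (0b111, b)]:
        state[idx] = amp * R
    state = apply_cnot(state, control=1, target=2, n=3)  # (i)
    state = apply_1q(state, H, qubit=1, n=3)             # (ii)
    results = {}
    for i in (0, 1):
        for j in (0, 1):                                  # (iii) outcome ij
            bob = [state[4 * i + 2 * j], state[4 * i + 2 * j + 1]]
            prob = abs(bob[0]) ** 2 + abs(bob[1]) ** 2    # Pr(ij)
            bob = [c / math.sqrt(prob) for c in bob]      # collapse and renormalise
            if j:                                         # (v) Z^i X^j: X^j acts first
                bob = apply_1q(bob, X, qubit=1, n=1)
            if i:
                bob = apply_1q(bob, Z, qubit=1, n=1)
            results[f"{i}{j}"] = (prob, bob)
    return results

# A random input state |alpha> = cos(t/2)|0> + e^{i p} sin(t/2)|1>.
t, p = random.uniform(0, math.pi), random.uniform(0, 2 * math.pi)
a, b = math.cos(t / 2), cmath.exp(1j * p) * math.sin(t / 2)
for ij, (prob, bob) in teleport(a, b).items():
    overlap = a.conjugate() * bob[0] + b.conjugate() * bob[1]
    print(ij, round(prob, 6), round(abs(overlap) ** 2, 6))  # expect 0.25 and 1.0 for each ij
```

Before the correction, Bob's renormalised vector for outcome 01 is $(b, a)$ and for outcome 10 is $(a, -b)$, which is how one can see that $ij$ leaves $X^jZ^i|\alpha\rangle$ on Bob's side.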


9 Quantum computation – the circuit model

Classical computation can be represented in many different but essentially equivalent ways. We have the basic Turing machine and circuit (or gate array) models, useful for studying fundamental properties of the notions of computation and complexity, and a variety of programming languages incorporating higher level constructs useful for developing actual practical algorithms to solve computational problems. In the quantum case we will adopt a quantum generalisation of the circuit model; this provides the simplest and most intuitive passage to the notion of a quantum computation. Thus we begin by reviewing the ingredients of the classical circuit model.

Computational tasks: The input to a computation will always be taken to be a bit string. The input size is the number of bits in the bit string. For example, if the input is 0110101 then the input size is 7. A computational task is not just a single task such as "is 10101 prime?" (where we are interpreting the bit string as an integer in binary) but a whole family of similar tasks such as "given an $n$-bit string $A$ (for any $n$), is $A$ prime?" The output of a computation is also a bit string. If this is a single bit then the computational task is called a decision problem. Let $B = \{0, 1\}$ and let $B^n$ denote the set of all $n$-bit strings. Let $B^*$ denote the set of all $n$-bit strings, for all $n$, i.e. $B^* = \bigcup_{n=1}^{\infty} B^n$. A subset of $B^*$ is called a language. Thus a decision problem corresponds to the recognition of a language, viz. those inputs that give output 1. For example, primality testing as above is the decision problem of recognising the language $L \subseteq B^*$, where $L$ is the subset of all bit strings that represent prime numbers in binary. Many computational tasks have outputs that are bit strings of length $> 1$. For example FACTOR($x$) has input bit string $x$ and outputs a bit string $y$ which is a factor of $x$ (interpreting $x$ and $y$ as integers in binary).
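As a toy illustration of a decision problem viewed as language recognition, the following sketch (plain Python; the function names are hypothetical, and trial division is used only because it is short, not because it is efficient) decides membership of a bit string in the language $L$ of binary representations of primes. The input size is the length of the string and the output is a single bit.

```python
def is_prime(k):
    """Trial division: fine for small k, but for n-bit inputs the loop can run roughly 2^(n/2) times."""
    if k < 2:
        return False
    d = 2
    while d * d <= k:
        if k % d == 0:
            return False
        d += 1
    return True

def recognise_primes(bits):
    """Decision problem for L: output 1 if the bit string is a prime in binary, else 0."""
    return 1 if is_prime(int(bits, 2)) else 0

x = "0110101"
print(len(x), recognise_primes(x))  # 7 1   (input size 7; 0110101 is 53, a prime)
print(recognise_primes("10101"))    # 0     (10101 is 21 = 3 x 7)
```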
Circuit model of classical computation: For each n, the computation with inputs of size n begins with the input string $x = b_1 \ldots b_n$ extended with a number of extra bits all set to 0, viz. $b_1 \ldots b_n 00 \ldots 0$. These latter bits provide “extra working space” that may be needed in the course of the computation. A computational step is the application of a designated Boolean operation (or Boolean gate) to designated bits, which updates the total bit string. We restrict the Boolean gates to be AND, OR, NOT and COPY (which maps b to the 2-bit string bb). It can be shown that these operations are universal, i.e. any Boolean function $f : B^m \to B^n$ at all can be constructed by the sequential application of just these simple operations. The output of the computation is the value of some designated subset of bits after the final step.

For each input size n we have a so-called circuit $C_n$, which is a prescribed sequence of computational steps. $C_n$ depends only on n and not on the particular input x of size n. In total we have a circuit family $(C_1, C_2, \ldots, C_n, \ldots)$. We think of $C_n$ as “the computer program” for inputs of size n. For technical reasons (that we will not fully elaborate here) we impose a further condition on the circuit family: it should be a so-called uniform circuit family in the following sense. The descriptions of the circuits $C_n$ should be generated in a suitably simple computational way as a function of n. More precisely, there is a (log-space) Turing machine which on input $11 \ldots 1$ of length n outputs a description of $C_n$. This prevents us from “cheating” by coding the answer to a hard computational problem (or even an uncomputable problem!) into the changing structure of $C_n$ with n.

Circuit model for quantum computation: The above formalism can be generalised in an obvious way to the quantum setting. For inputs of size n the starting string $b_1 \ldots b_n 00 \ldots 0$ is replaced by a sequence of qubits in the corresponding computational basis state $|b_1\rangle \ldots |b_n\rangle |0\rangle |0\rangle \ldots |0\rangle$. A computational step is the application of a quantum gate, which is a prescribed unitary operation applied to a prescribed choice of qubits. For each input size n we have a quantum circuit $C_n$, which is a prescribed sequence of such steps. The output of the computation is the result of performing a quantum measurement (in the computational basis) on a specified subset of the qubits (this being part of the description of $C_n$).
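To make the quantum circuit model concrete, here is a minimal state-vector sketch. It assumes a particular convention that is ours and not part of the notes:

- qubits are indexed from 0;
- the first qubit is the most significant bit of the basis-state index;
- gates are given as nested lists with `gate[out][in]`.

The helper names (`basis_state`, `apply_1q`, `apply_cnot`, `measure_probs`) are hypothetical. A circuit is simply a sequence of calls, and the output is the Born-rule distribution of the final computational-basis measurement.

```python
import math

def basis_state(bits):
    """|b_1 ... b_k> as a list of 2**k complex amplitudes (first qubit = most significant bit)."""
    state = [0j] * (2 ** len(bits))
    state[int(''.join(str(b) for b in bits), 2)] = 1 + 0j
    return state

def apply_1q(state, gate, q, k):
    """Apply a 2x2 unitary `gate` (gate[out][in]) to qubit q of a k-qubit state."""
    shift = k - 1 - q
    new = [0j] * len(state)
    for i, amp in enumerate(state):
        b = (i >> shift) & 1                          # current value of qubit q
        for out in (0, 1):
            j = (i & ~(1 << shift)) | (out << shift)  # same label with qubit q set to `out`
            new[j] += gate[out][b] * amp
    return new

def apply_cnot(state, control, target, k):
    """CNOT permutes basis states: flip `target` wherever `control` is 1."""
    cs, ts = k - 1 - control, k - 1 - target
    new = [0j] * len(state)
    for i, amp in enumerate(state):
        j = i ^ (1 << ts) if (i >> cs) & 1 else i
        new[j] += amp
    return new

def measure_probs(state, qubits, k):
    """Born-rule distribution for measuring the listed qubits in the computational basis."""
    probs = {}
    for i, amp in enumerate(state):
        p = abs(amp) ** 2
        if p > 1e-12:
            key = ''.join(str((i >> (k - 1 - q)) & 1) for q in qubits)
            probs[key] = probs.get(key, 0.0) + p
    return probs

S = 1 / math.sqrt(2)
H = [[S, S], [S, -S]]   # Hadamard
X = [[0, 1], [1, 0]]    # NOT
```

For example, consider starting from $|0\rangle|0\rangle$, applying H to the first qubit and then CNOT with the first qubit as control. Measuring both qubits then gives the outcomes 00 and 11, each with probability 1/2 (up to rounding).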

Remark: More generally we could allow measurements along the way (rather than only at the end) and allow the choice of subsequent gates to depend on the measurement outcomes. However it can be shown that this further generality adds nothing extra: any such circuit can be re-expressed as an equivalent circuit in which measurements are performed at the end only.
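The remark can be checked in its simplest instance. Suppose we measure a control qubit and then apply X to a target qubit conditioned on the outcome. The final statistics are the same as if we apply CNOT and measure both qubits at the end. The sketch below illustrates only that one two-qubit case. It is not a proof of the general statement, and the function names are hypothetical. A state is stored as a dictionary from basis labels (c, t) to amplitudes, with c the control and t the target.

```python
import math

def cnot(state):
    """CNOT with qubit 0 (c) as control and qubit 1 (t) as target."""
    out = {}
    for (c, t), amp in state.items():
        out[(c, t ^ c)] = out.get((c, t ^ c), 0) + amp
    return out

def x_target(state):
    """X (NOT) applied to qubit 1."""
    return {(c, t ^ 1): amp for (c, t), amp in state.items()}

def born(state):
    """Outcome probabilities for measuring both qubits in the computational basis."""
    return {k: abs(a) ** 2 for k, a in state.items() if abs(a) ** 2 > 1e-15}

def measure_at_end(state):
    """Apply CNOT, then measure both qubits at the end."""
    return born(cnot(state))

def measure_along_the_way(state):
    """Measure qubit 0 first; on outcome 1 apply X to qubit 1; then measure both."""
    dist = {}
    for outcome in (0, 1):
        branch = {k: a for k, a in state.items() if k[0] == outcome}
        p = sum(abs(a) ** 2 for a in branch.values())          # Born rule for qubit 0
        if p == 0:
            continue
        branch = {k: a / math.sqrt(p) for k, a in branch.items()}  # post-measurement state
        if outcome == 1:
            branch = x_target(branch)                          # classically controlled gate
        for k, q in born(branch).items():
            dist[k] = dist.get(k, 0.0) + p * q
    return dist
```

Both functions assign probability $|a_{c,\,t \oplus c}|^2$ to the outcome (c, t), where $a_{c,t}$ are the initial amplitudes. So the two distributions agree for every initial two-qubit state.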

A quantum computation or quantum algorithm is defined by a (uniform) family of quantum circuits $(C_1, C_2, \ldots)$. Each of the circuits can be depicted pictorially as a circuit diagram. Each input qubit is represented by a horizontal line running across the diagram, which is read from left to right. The applied quantum gates are represented by labelled boxes in order from left to right. We illustrate this in an example.

Example 9 Consider a circuit on two input qubits and one ancillary qubit, referred to as qubits 1, 2 and 3 respectively. The initial state is $|x_0\rangle|x_1\rangle|0\rangle$, with the ancilla in state $|0\rangle$. The circuit is defined by the following sequence of instructions:

- apply H to qubit 1;
- then apply CNOT to qubits 2, 3 with 2 as control;
- apply CNOT to qubits 1, 2 with 1 as control;
- apply Z to qubit 2;
- finally apply CNOT to qubits 2, 3 with 3 as control;
- then measure qubit 1 in the computational basis to get an output bit i.

The circuit diagram is the following.

[Circuit diagram for Example 9: lines $|x_0\rangle$, $|x_1\rangle$, $|0\rangle$; H on line 1, CNOT (control 2, target 3), CNOT (control 1, target 2), Z on line 2, CNOT (control 3, target 2); line 1 measured to give output bit i.]

Note that we have used an asymmetrical symbol to depict CNOT, with a “blob” on the control line and an X on the target line. The first H and CNOT act on disjoint sets of qubits. They therefore commute and can be done in parallel or in the reverse order if desired. Figure 2 is also a circuit diagram, for the quantum teleportation protocol. In this process the inputs are more general quantum states ($|\alpha\rangle$, $|\psi\rangle$) than just computational basis states. So we can think of teleportation as a “quantum computation with quantum inputs and quantum outputs” (rather than just quantum representations of classical data).

Randomised classical computations: Recall that the result of a quantum measurement is generally probabilistic. So the output of a quantum computation will generally be a sample from a probability distribution, determined by the final quantum state before the measurement. Correspondingly, it is useful to extend our model of classical computation to also incorporate probabilistic choices. This is done as follows. For input $b_1 \ldots b_n$ we extend the starting string $b_1 \ldots b_n 00 \ldots 0$ to $b_1 \ldots b_n r_1 \ldots r_k 00 \ldots 0$, where $r_1 \ldots r_k$ is a sequence of bits each of which is set to 0 or 1 uniformly at random. If the computation is repeated with the same input $b_1 \ldots b_n$, the random bits will generally be different. The output is now a sample from a probability distribution over all possible output strings, generated by the uniformly random choice of $r_1 \ldots r_k$. In specific computational algorithms we then normally require the output to be correct “with suitably high probability”, specified according to some desired criteria.

This formalism with random input bits can be used to implement probabilistic choices of gates. For example, suppose we wish to apply either AND or OR at some point, each chosen with probability half. Consider the 3-bit gate whose action is as follows: if the first bit is 0 (resp. 1), apply OR (resp. AND) to the last two bits. We then use this gate with a random input to the first bit.

Universal sets of quantum gates: In classical computation we restrict our circuits to be composed of a small universal set of gates that act on only a few bits each. One such choice is the set {NOT, AND, OR, COPY}. Actually OR may even be deleted from this set, since $b_1 \text{ OR } b_2 = \text{NOT}(\text{NOT}(b_1) \text{ AND } \text{NOT}(b_2))$.

Remark: It may be shown that no set of 2-bit reversible gates is universal (see Preskill p2412). However, there are 3-bit reversible gates G that are universal even just by themselves: any reversible Boolean function may be constructed as a circuit of G's alone, together with constant extra inputs set to 0 or 1. Two examples of such gates are the following.

- The Fredkin gate F is defined by $F(0b_2b_3) = 0b_2b_3$ and $F(1b_2b_3) = 1b_3b_2$. It is a controlled SWAP, controlled by the value of the first bit.
- The Toffoli gate Toff is defined by $\text{Toff}(0b_2b_3) = 0b_2b_3$ and $\text{Toff}(1b_2b_3) = 1\,\text{CNOT}(b_2b_3)$. It is a controlled-controlled-NOT gate: NOT is applied to bit 3 iff the first two bits are 1, and Toff is the identity otherwise.

In the quantum case all gates are reversible (unitary) by definition, and there are similar universality results. The situation is a little more complicated, however. Quantum gates are parameterised by continuous parameters (in contrast to classical gates, which form a discrete set), so no finite set can generate them all exactly via (arbitrarily large) finite circuits. But many small finite sets of quantum gates are still universal, in the sense that they can generate any unitary gate with any prescribed accuracy $\epsilon > 0$. Such approximations (for suitably small $\epsilon$) will suffice for all our purposes. For clarity of discussion we will generally ignore this issue of approximability and just allow use of any exact gate that we need. (For more details about approximations see Nielsen and Chuang §4.5.) Some examples of universal sets of quantum gates are the following:
$$\left\{\text{CNOT},\ H,\ T = \begin{pmatrix} 1 & 0 \\ 0 & e^{i\pi/4} \end{pmatrix}\right\}, \qquad \{\text{Toffoli 3-qubit gate},\ H\}, \qquad \{\text{CNOT},\ \text{all 1-qubit gates}\}.$$

9.1 Reversible gate version of any Boolean function

Recall that an n-bit Boolean function or gate $g : B^n \to B^n$ is called reversible if we can uniquely construct the input from the output. In other words, g must be one-to-one as a function on n-bit strings, i.e. g must be a permutation of $B^n$. Most Boolean functions $f : B^m \to B^n$ are not reversible. It is useful to have a reversible representation of them that can be used directly in the quantum gate formalism.

We introduce an addition operation on n-bit strings, denoted ⊕. If $b = b_1 \ldots b_n$ and $c = c_1 \ldots c_n$ then
$$b \oplus c = (b_1 \oplus c_1) \ldots (b_n \oplus c_n),$$
i.e. $b \oplus c$ is the n-bit string obtained by adding the corresponding bits of b and c mod 2. In other words, we “add mod 2 without carry”. For example $011 \oplus 110 = 101$. Note that for any n-bit string b we have $b \oplus b = 0 \ldots 0$, where $0 \ldots 0$ denotes the n-bit string of all zeroes.

Now let $f : B^m \to B^n$ be any Boolean function. Consider $\tilde f : B^{m+n} \to B^{m+n}$ defined by
$$\tilde f(b, c) = (b,\ c \oplus f(b))$$
for any m-bit string b and any n-bit string c.

Lemma 1 For any f, $\tilde f$ is a reversible operation on m + n bits. Furthermore, $\tilde f$ is always self-inverse, i.e. $\tilde f$ applied twice is the identity operation.

Proof: If we apply $\tilde f$ twice to (b, c) we get $(b,\ c \oplus f(b) \oplus f(b))$. But $f(b) \oplus f(b) = 0 \ldots 0$, so $\tilde f \tilde f(b, c) = (b, c)$. Hence $\tilde f$ is reversible, since from an output $(b', c')$ of $\tilde f$ we can recover the input (b, c) uniquely by simply applying $\tilde f$ again.

Conversely, given $\tilde f$ we can easily recover $f(b)$ for any b. We set $c = 0 \ldots 0$ and look at the last n bits of the output of $\tilde f$. Note that $\tilde f$ is easily computable if we can compute f and the (simple) addition operation ⊕ on bit strings.
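Lemma 1 can be checked exhaustively for small m and n. The sketch below uses hypothetical helper names and represents bits as tuples. It builds $\tilde f$ from f and confirms that $\tilde f$ is a permutation of $B^{m+n}$ and is self-inverse. It also shows that taking f = AND (a non-reversible function $B^2 \to B$) yields exactly the Toffoli gate described above.

```python
from itertools import product

def f_tilde(f, m, n):
    """Return the reversible map f~(b, c) = (b, c XOR f(b)) on (m + n)-bit tuples,
    where f maps an m-tuple of bits to an n-tuple of bits."""
    def g(bits):
        b, c = bits[:m], bits[m:]
        fb = f(b)
        return b + tuple(ci ^ fi for ci, fi in zip(c, fb))
    return g

def is_permutation(g, k):
    """True iff g is one-to-one on all k-bit tuples, i.e. a permutation of B^k."""
    return len({g(x) for x in product((0, 1), repeat=k)}) == 2 ** k

def AND(b):
    """f: B^2 -> B^1; not reversible on its own (three different inputs give 0)."""
    return (b[0] & b[1],)

toffoli = f_tilde(AND, 2, 1)   # f~ for f = AND flips bit 3 iff bits 1 and 2 are both 1
```

Feeding `b + (0,)` into `toffoli` and reading the last bit recovers AND(b). This is the "set c = 0" recipe from the text.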

If we interpret $\tilde f$ as defining a map on computational basis states of m + n qubits, then its extension by linearity to all states of m + n qubits gives a unitary operation corresponding to f, which we denote as $U_f$.

These ideas extend easily to arbitrary functions between finite sets (not just Boolean functions mapping bit strings to bit strings). Let $f : X \to Y$ be any function, where X and Y are finite sets. For definiteness let $X = \mathbb{Z}_M$ and $Y = \mathbb{Z}_N$, where $\mathbb{Z}_M$ is the set $\{0, 1, 2, \ldots, M-1\}$ of integers modulo M. Let $\mathcal{H}$ be a state space of dimension MN with orthonormal basis $|x, y\rangle = |x\rangle|y\rangle$ labelled by the elements of X and Y. Consider the operation $U_f$ on $\mathcal{H}$ defined by
$$U_f|x\rangle|y\rangle = |x\rangle|y + f(x)\rangle,$$
where + denotes addition modulo N in $Y = \mathbb{Z}_N$. Then $U_f$ is reversible, with inverse
$$U_f^{-1}|x\rangle|y\rangle = |x\rangle|y - f(x)\rangle,$$
where the minus sign denotes subtraction modulo N. (Note that in the Boolean 1-bit case, subtraction modulo 2 is the same as addition!) Hence $U_f$ is one-to-one and maps basis states to basis states. It must therefore be a permutation on these states, and hence unitary on the whole space. We adopt $U_f$ as the quantum representation of the function f. We will call $|x\rangle$ and $|y\rangle$ the input and output registers (or the first and second registers, respectively) for $U_f$.

9.2 Time complexity – P, BPP, BQP

In computational complexity theory a fundamental issue is the time complexity of algorithms: how many steps (in the worst case) does the algorithm require for any input of size n? In the (classical or quantum) circuit model, the number of steps on inputs of size n is taken to mean the total number of gates in the circuit $C_n$, i.e. the size of the circuit $C_n$. Let T(n) be the size of $C_n$. We are especially interested in whether T(n) is bounded by a polynomial function of n, i.e. whether $T(n) < c\,n^k$ for all large n for some constants k, c. The alternative is that T(n) grows faster than any polynomial; exponential functions such as $T(n) = 2^n$ or $2^{\sqrt n}$ have this property. Computations with any T(n) are computable in principle. However, poly time computations are regarded as tractable or “computable in practice”. If T(n) is not polynomially bounded, the computation is regarded as intractable or “not computable in practice”. The reason is that the physical resource of time will, even for fairly small n values, exceed sensibly available limits (e.g. running time on any available computer will exceed the age of the universe).

We have the following standard terminology for some classes of algorithms:

P (“poly time”): class of all classical algorithms that run in polynomial time and give the correct answer with certainty (i.e. the algorithms are not probabilistic).

BPP (“bounded error probabilistic poly time”): class of all classical randomised algorithms that run in poly time and give the correct answer with probability at least 2/3 in every case.

EQP (“exact quantum poly time”): class of all quantum algorithms that run in poly time and give the correct answer with certainty in every case.

BQP (“bounded error quantum poly time”): class of all quantum algorithms that run in poly time and give the correct answer with probability at least 2/3 in every case.

The “exact” classes P and EQP are useful in theoretical considerations, but they are unrealistically idealised from a practical point of view. In any real physical situation (classical or quantum) that includes continuous parameters, we cannot build an infinitely precise implementation and we must tolerate some level of error.

One final remark about the definition of the classes BPP and BQP: we have required the outputs to be correct with probability 2/3. However it may be shown that “2/3” here may be replaced by any other number $1 - \epsilon$, for any $1/2 > \epsilon > 0$ (however small), without changing the contents of the class. That is, if there is a poly time algorithm for a problem that succeeds with probability 2/3, then there are also poly time algorithms that succeed with probability 0.9 or 0.99 or 0.99999, or indeed $1 - \epsilon$ for any arbitrarily small fixed $\epsilon > 0$.

This result relies on the following fact, often called the amplification lemma (see Nielsen and Chuang p154 for more details). Suppose we have an algorithm for a decision problem that works correctly with probability 2/3 in all cases. Repeat the algorithm K times and take the majority vote of all K answers as our final answer. It can be shown that this answer is correct with probability $1 - 1/2^{O(K)}$, which approaches 1 exponentially in K. Thus, given any $\epsilon > 0$, this probability will exceed $1 - \epsilon$ for some constant K. If the original algorithm had poly running time T(n), then our K-repetition majority vote strategy has running time KT(n), which is still polynomial in n. Thus the classes BPP resp. BQP are generally viewed as mathematical formalisations of “computations that are feasible on a classical resp. quantum computer”.

In view of this it is of great interest to ask: are there problems with algorithms in BQP but not known to have algorithms in BPP? More colloquially: can quantum computation offer an “exponential speed-up” over classical computation for some computational tasks?

Example 10 The problem FACTOR(N) is the following: given an integer N of n digits, find a non-trivial factor of N (i.e. a factor ≠ 1, N; if N is prime, return 1). The fastest known classical algorithm runs in time $\exp O(n^{1/3} (\log n)^{2/3})$, i.e. more than exponential in the cube root of the input size. Thus this problem is not known to be in BPP. In 1994 Peter Shor discovered a quantum algorithm for this task with running time $T(n) < n^3$, so the problem is in BQP. We will study this quantum factoring algorithm later in detail.

9.3 Query complexity and promise problems

In quantum computation, and in the study of its properties relative to classical computation, there is another computational scenario that is often considered. This is the concept of “black box promise problems”, with an associated measure of complexity called “query complexity”. In this scenario we are not given an input bit string of some length n. Instead we are given as input a black box or oracle that computes some function $f : B^m \to B^n$. We can query the black box by giving it inputs, and this is the only access we have to the function and its values. No other use of the box is allowed. In particular we cannot “look inside it” to see its actual operation and learn information about the function f. Thus, at the start, it is unknown exactly which function f is. But there is often an a priori promise on f, i.e. some stated restriction on the possible form of f. Our task is to determine some desired property of f, e.g. some feature of the set of all values of f. We want to achieve this by querying the box the least possible number of times. In our circuits we may use the black box as a gate, in addition to our usual gates, with each use counting as just one step of computation. For quantum circuits we always use the reversible form $\tilde f$ of f, i.e. the unitary operation $U_f$ associated to f.

The query complexity of such an algorithm is simply the number of times that the oracle is used (as a function of its “size”, measured by m and n). In addition to the query complexity we may also be interested in the total time complexity. This counts the number of gates used to process the answers to the queries as well as the number of queries themselves. We could ask for the answer to be correct with certainty, or merely with some probability (say 0.99) in every case. These notions are illustrated in the example below.

Example 11 The following are examples of black box promise problems.

The “balanced versus constant” problem
Input: a black box for a Boolean function $f : B^n \to B$ (one bit output).
Promise: f is either (a) a constant function ($f(x) = 0$ for all x or $f(x) = 1$ for all x) or (b) a “balanced” function in the sense that $f(x) = 0$ resp. 1 for exactly half of the $2^n$ inputs x.
Problem: Determine whether f is balanced or constant.

Search
Input: a black box for a Boolean function $f : B^n \to B$.
Promise: There is a unique x such that $f(x) = 1$.
Problem: Find this special x.

Periodicity
Input: a black box for a function $f : \mathbb{Z}_n \to \mathbb{Z}_n$ (where $\mathbb{Z}_n$ denotes the set of integers mod n).
Promise: f is periodic, i.e. there is a least r > 0 such that $f(x + r) = f(x)$ for all x (where + denotes addition mod n).
Problem: Find the period r.

Boolean satisfiability
Input: a black box for a Boolean function $f : B^n \to B$.
Promise: no restriction on the form of f.
Problem: Determine whether there is an input x such that $f(x) = 1$.

In each case we are interested in how the minimum number of queries grows as a function of the natural parameter n (for quantum versus classical algorithms).
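The amplification lemma quoted in §9.2 can be explored numerically. Assume K independent runs, each correct with probability p > 1/2, and an odd K so that the majority vote never ties. Then the probability that the majority is correct is a binomial tail sum. The sketch below uses hypothetical function names and exact rational arithmetic via `fractions.Fraction`. It computes that tail and searches for the smallest odd K that pushes the error below a chosen ε.

```python
from fractions import Fraction
from math import comb

def majority_success(p, K):
    """Probability that more than half of K independent runs are correct,
    when each run is correct with probability p (K odd, so there are no ties)."""
    return sum(comb(K, j) * p ** j * (1 - p) ** (K - j) for j in range(K // 2 + 1, K + 1))

def repetitions_needed(p, eps, max_K=1001):
    """Smallest odd K whose majority-vote error probability is below eps (brute-force search)."""
    for K in range(1, max_K + 1, 2):
        if 1 - majority_success(p, K) < eps:
            return K
    raise ValueError("no odd K <= max_K achieves the requested error")
```

With p = 2/3, three repetitions already give success probability 20/27 (about 0.74), and the error keeps shrinking as K grows. K depends only on p and ε, not on the input size n, which is why the running time KT(n) stays polynomial.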

10 Computation by quantum parallelism. The Deutsch algorithm

10.1 Computation by quantum parallelism

Consider any function $f : X \to Y$ between finite sets of sizes |X| and |Y| respectively, and its corresponding unitary transform $U_f$, which maps $|x\rangle|y\rangle$ to $|x\rangle|y + f(x)\rangle$. If we run $U_f$ on $|x\rangle|0\rangle$ we get $|x\rangle|f(x)\rangle$, and we can read out the value of f(x) by measuring the second register. Now $U_f$ is unitary and linear. Thus if we set the input register to a superposition of all possible x values we get
$$U_f : \frac{1}{\sqrt{|X|}} \sum_{\text{all } x} |x\rangle|0\rangle \ \longrightarrow\ |f\rangle \equiv \frac{1}{\sqrt{|X|}} \sum_{\text{all } x} |x\rangle|f(x)\rangle,$$
i.e. in one run of $U_f$ we obtain a final state which depends on all of the function values. Such a computation on superposed inputs is called computation by quantum parallelism. By further quantum processing and measurement on the state $|f\rangle$ we are able to obtain “global” information about the nature of the function f (e.g. determine some joint properties of all the values) with just one run of $U_f$. These properties may be difficult to get classically without many classical evaluations of f, as each such evaluation reveals only one further value. Hence we can already see a potential quantum benefit over classical computation for the scenario of query complexity and promise problems. Below we will develop some explicit examples.

It is instructive to consider more explicitly the important special case of $X = B^n$, the set of all n-bit strings (and $Y = B^1$ or some other set). How do we actually create the input state of a uniform superposition over all x values that is needed in the above process? Recall that $H|0\rangle = \frac{1}{\sqrt2}(|0\rangle + |1\rangle)$. So if we apply H to each of n qubits, each in state $|0\rangle$, and multiply out all the state tensor products, we get
$$H \otimes \cdots \otimes H\,(|0\rangle \ldots |0\rangle) = \frac{1}{\sqrt2}(|0\rangle + |1\rangle) \ldots \frac{1}{\sqrt2}(|0\rangle + |1\rangle) = \frac{1}{\sqrt{2^n}} \sum_{x_1, x_2, \ldots, x_n = 0}^{1} |x_1\rangle|x_2\rangle \ldots |x_n\rangle = \frac{1}{\sqrt{2^n}} \sum_{x \in B^n} |x\rangle.$$
An important feature of this process is that we have created a superposition of exponentially many (viz. $2^n$) terms with only a linear number of elementary operations – we have applied H just n times.

10.2 Deutsch's algorithm

Our first (and also historically the first) example of the benefit of computation by quantum parallelism is the query problem invented by D. Deutsch in 1985.

Deutsch's problem: There are four functions f : 1-bit → 1-bit.
Input: we are given a black box for an unknown one of these f's.

Problem: We want to know the value of $f(0) \oplus f(1)$.

Note that this is equivalent to deciding whether f is constant (i.e. both values are 0 or both are 1) or balanced (i.e. one value is 0 and one value is 1). A little thought shows that classically two queries are necessary and sufficient. In the quantum scenario we will solve the problem (with certainty) with only one query!

Our quantum black box acts on two qubits: $U_f|x\rangle|y\rangle = |x\rangle|y \oplus f(x)\rangle$. Note that:
$$\frac{1}{\sqrt2}|x\rangle(|0\rangle - |1\rangle) \ \xrightarrow{U_f}\ \frac{1}{\sqrt2}|x\rangle(|f(x)\rangle - |f(x) \oplus 1\rangle) = \begin{cases} \frac{1}{\sqrt2}|x\rangle(|0\rangle - |1\rangle) & \text{if } f(x) = 0 \\ -\frac{1}{\sqrt2}|x\rangle(|0\rangle - |1\rangle) & \text{if } f(x) = 1 \end{cases} = (-1)^{f(x)}\,\frac{1}{\sqrt2}|x\rangle(|0\rangle - |1\rangle) \qquad (16)$$
i.e. we just get a minus sign on $|x\rangle$ if f(x) = 1 and no change if f(x) = 0. In both cases the state of the output register is $\frac{1}{\sqrt2}(|0\rangle - |1\rangle)$. Discarding the second register we have $|x\rangle \to (-1)^{f(x)}|x\rangle$. We say that “f(x) has been coded as phase information”.

Deutsch's quantum algorithm is the following. Set the input register to $\frac{1}{\sqrt2}(|0\rangle + |1\rangle)$ and the output register (surprisingly!) to $\frac{1}{\sqrt2}(|0\rangle - |1\rangle)$. (This is achieved by applying H to $|0\rangle$ and $|1\rangle$ respectively.) Then run $U_f$ on the resulting state $\frac12(|0\rangle + |1\rangle)(|0\rangle - |1\rangle)$. Each x value runs as above, and after discarding the second register we get $\frac{1}{\sqrt2}\sum_x (-1)^{f(x)}|x\rangle$. Explicitly, we get the following orthogonal states:
$$\pm\frac{1}{\sqrt2}(|0\rangle + |1\rangle) \ \text{ if } f \text{ constant}, \qquad \pm\frac{1}{\sqrt2}(|0\rangle - |1\rangle) \ \text{ if } f \text{ balanced}.$$
Finally apply H (to get $\pm|0\rangle$ if f is constant and $\pm|1\rangle$ if f is balanced) and measure in the computational basis. Below we give a circuit diagram for the Deutsch algorithm. The top line is the input register and the bottom line is the output register.

[Circuit diagram: top line $|0\rangle$ – H – $U_f$ – H – measure (outcome 0 if f constant, 1 if f balanced); bottom line $|0\rangle$ – X – H – $U_f$ – (discard).]

Note that the 2-qubit states immediately before and after $U_f$ are both product states.
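The four cases of Deutsch's problem can be run through a tiny two-qubit simulation. The sketch below uses hypothetical names. A state is a dictionary from basis labels (x, y) to real amplitudes, with x the input register and y the output register. It follows the circuit above:

1. X on the output register;
2. H on both registers;
3. one application of $U_f$;
4. H on the input register;
5. measurement of the input register.

```python
import math

S = 1 / math.sqrt(2)

def h_on(state, qubit):
    """Apply H to one register (0 = input x, 1 = output y) of a state {(x, y): amplitude}."""
    new = {}
    for bits, amp in state.items():
        for v in (0, 1):
            sign = -1 if bits[qubit] == 1 and v == 1 else 1   # H|1> = (|0> - |1>)/sqrt(2)
            out = list(bits)
            out[qubit] = v
            key = tuple(out)
            new[key] = new.get(key, 0.0) + sign * S * amp
    return new

def u_f(state, f):
    """The black box U_f|x>|y> = |x>|y XOR f(x)>; it permutes basis states, so keys stay distinct."""
    return {(x, y ^ f(x)): amp for (x, y), amp in state.items()}

def deutsch(f):
    """Run the circuit with a single query; return (measured bit, probability of outcome 1)."""
    state = {(0, 1): 1.0}              # |0>|0> followed by X on the output register
    state = h_on(h_on(state, 0), 1)    # (|0> + |1>)(|0> - |1>)/2
    state = u_f(state, f)              # the single query
    state = h_on(state, 0)             # final H on the input register
    p1 = sum(abs(a) ** 2 for (x, _), a in state.items() if x == 1)
    return (1 if p1 > 0.5 else 0), p1  # p1 is 0 or 1 up to rounding in this ideal model

FUNCTIONS = {
    'constant 0': lambda x: 0,
    'constant 1': lambda x: 1,
    'identity': lambda x: x,
    'negation': lambda x: 1 - x,
}
```

Looping over the four functions, the measured bit equals $f(0) \oplus f(1)$ in every case, with probability 1 up to floating-point rounding.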

10.3 The Deutsch-Jozsa (DJ) algorithm

We can generalise the above idea to obtain a problem with an exponential separation between the number of queries needed classically and quantumly. Consider the “balanced versus constant” promise problem of example 11, which we recall here:

The “balanced versus constant” problem
Input: a black box for a Boolean function $f : B^n \to B$ (one bit output).
Promise: f is either (a) a constant function ($f(x) = 0$ for all x or $f(x) = 1$ for all x) or (b) a “balanced” function in the sense that $f(x) = 0$ resp. 1 for exactly half of the $2^n$ inputs x.
Problem: Determine (with certainty) whether f is balanced or constant. (At the end we will discuss the bounded error version of the problem, i.e. requiring the correct solution only with some high probability (0.999 say).)

A little thought (hmm...) shows that classically $2^n/2 + 1$ queries (i.e. exponentially many) are necessary and sufficient to solve the problem with certainty in the worst case. We now show that in the quantum scenario just one query suffices (with O(n) extra processing steps) in every case!

Our quantum black box is $U_f : |x\rangle|y\rangle \mapsto |x\rangle|y \oplus f(x)\rangle$, where the input register $|x\rangle$ comprises n qubits and the output register $|y\rangle$ comprises a single qubit. We assume that initially all qubits are in the standard state $|0\rangle$. We begin by constructing an equal superposition of all n-bit strings in the input register and the state $\frac{1}{\sqrt2}(|0\rangle - |1\rangle)$ in the output register. This results in the (n + 1)-qubit state
$$\frac{1}{\sqrt{2^{n+1}}} \sum_{x \in B^n} |x\rangle(|0\rangle - |1\rangle).$$
This requires (n + 1) Hadamard operations and one X operation (on the output register). This is a product state of the n-qubit input register and the single-qubit output register. Next we run $U_f$ (once) on this state. For each individual n-bit string x, the calculation of eq. (16) goes through exactly as before (in the previous case x was a single bit, but this makes no difference to the calculation). Hence on the full superposition we get
$$U_f : \frac{1}{\sqrt{2^{n+1}}} \sum_{x \in B^n} |x\rangle(|0\rangle - |1\rangle) \ \longrightarrow\ \frac{1}{\sqrt{2^{n+1}}} \sum_{x \in B^n} (-1)^{f(x)}|x\rangle(|0\rangle - |1\rangle).$$
Discarding the last (output) qubit we get the n-qubit state
$$|f\rangle \equiv \frac{1}{\sqrt{2^n}} \sum_{x \in B^n} (-1)^{f(x)}|x\rangle. \qquad (17)$$
What does this state look like when f is constant or balanced? If f is constant then $|f\rangle = \pm\frac{1}{\sqrt{2^n}}\sum_x |x\rangle$. If we apply $H_n = H \otimes \cdots \otimes H$ we get $\pm|0 \ldots 0\rangle$. This is because H is self-inverse and $H_n|0 \ldots 0\rangle = \frac{1}{\sqrt{2^n}}\sum_x |x\rangle$.
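The constant case just derived, and the balanced case treated next, can be checked for small n. The sketch below starts directly from the state $|f\rangle$ of eq. (17), i.e. after the output qubit has been discarded, so it does not simulate the output register. It applies $H_n$ using the standard identity $\langle y|H_n|x\rangle = (-1)^{x \cdot y}/\sqrt{2^n}$, where $x \cdot y$ is the bitwise inner product mod 2. It then returns the probability of the outcome $0 \ldots 0$. Inputs x are the integers $0, \ldots, 2^n - 1$ standing for n-bit strings, and the function names are hypothetical.

```python
import math

def walsh_hadamard(amps, n):
    """Apply H_n (n-fold tensor power of H) to 2**n amplitudes indexed by x,
    using <y|H_n|x> = (-1)^(x.y) / sqrt(2^n) with x.y the bitwise inner product mod 2."""
    N = 2 ** n
    return [sum(amps[x] * (-1) ** bin(x & y).count('1') for x in range(N)) / math.sqrt(N)
            for y in range(N)]

def dj_prob_all_zero(f, n):
    """Probability of outcome 0...0 after applying H_n to |f> of eq. (17)."""
    N = 2 ** n
    f_state = [(-1) ** f(x) / math.sqrt(N) for x in range(N)]   # eq. (17)
    return abs(walsh_hadamard(f_state, n)[0]) ** 2

def dj_decide(f, n):
    """Ideal DJ verdict: 'constant' iff 0...0 is observed (probability 1 or 0 under the promise)."""
    return 'constant' if dj_prob_all_zero(f, n) > 0.5 else 'balanced'
```

The probability of $0 \ldots 0$ comes out as 1 for both constant functions. For balanced functions such as $f(x) = x_n$ (the last bit) or the parity of x, it comes out as 0 up to rounding.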

If f is balanced, then the sum in eq. (17) contains an equal number of plus and minus terms, with minus signs sprinkled in some unknown locations along the $2^n$ terms. Taking the inner product of $|f\rangle$ with $\sum_x |x\rangle$ simply adds up all the coefficients in $|f\rangle$ (as $\langle x|y\rangle = 0$ if $x \ne y$ and $= 1$ if $x = y$). Wherever the minus signs occur, the total sum is always zero. So if f is balanced then $|f\rangle$ is orthogonal to $\sum_x |x\rangle$. Now apply the unitary operation $H_n$, which preserves inner products. Then $H_n|f\rangle$ will be orthogonal to $H_n \frac{1}{\sqrt{2^n}}\sum_x |x\rangle = |0 \ldots 0\rangle$. Hence $H_n|f\rangle$ must have the form $\sum_{x \ne 0 \ldots 0} a_x|x\rangle$, with the all-zero term absent.

In view of the above discussion, having constructed $|f\rangle$ (for our given black box), we apply $H_n$ and measure the n qubits in the computational basis. If the result is $0 \ldots 0$ then f was certainly constant. If the result is any non-zero string $x_1 \ldots x_n \ne 0 \ldots 0$ then f was certainly balanced. Hence we have solved the problem with one query to f and (3n + 2) further operations:

- (n + 1) H's and one X for the input state;
- n H's on $|f\rangle$;
- n single qubit measurements to get the classical output string.

The balanced versus constant problem with bounded error: Suppose we tolerate some error, i.e. we require our algorithm to correctly distinguish balanced versus constant functions only with probability $> 1 - \epsilon$ for some $\epsilon > 0$. Then the above (single query) algorithm still works (as it has $\epsilon = 0$). However, there is now a classical (randomised) algorithm that solves the problem with only a constant number of queries, for any n and for any fixed $\epsilon > 0$. The number of queries depends on $\epsilon$ as $O(\log 1/\epsilon)$.

The classical algorithm is the following. We pick K values of x, each chosen independently and uniformly at random, and evaluate the corresponding f values. If they are all 0 or all 1, we output “f is constant”. If we get at least one instance of each of 0 and 1, we output “f is balanced”. Clearly the second output must always be correct, as a constant function can never output both values. But the first output (“f is constant”) can be erroneous. Suppose f is a balanced function. Then each random value f(x) has probability half to be 0 or 1. So the probability that K random values are all 0 or all 1 is $2/2^K = 1/2^{K-1}$. This is $< \epsilon$ if $1/2^{K-1} < \epsilon$, i.e. $K > 1 + \log_2(1/\epsilon)$. So $K = O(\log 1/\epsilon)$ suffices to guarantee error probability $< \epsilon$ in every case, for all n. Thus we lose the all-interesting exponential gap between classical and quantum query complexities in this bounded error scenario.

Remark: So does the above prove conclusively that quantum computation can be exponentially more powerful (in terms of time complexity) than classical computation!? We point out two important shortcomings in this claim.

The first weakness is that if we allow any level of error in the result, however small, we lose the exponential separation between classical and quantum algorithm running times (as described in the previous paragraph). We noted previously that the zero error scenario in computation is an unrealistic idealisation, and for realistic computation we should always accept some (suitably small) level of error. However this weakness can be fully addressed: there exist other black box promise problems for which a provable exponential separation exists between classical and quantum query complexity even in the presence of error. (An example is the so-called Simon's quantum algorithm, which we will not discuss in this course.)

A second (more serious) issue is the fact that the DJ problem is only a black box problem, with the black box's interior workings being inaccessible to us. It is not a straightforward “standard” computational task with a bit string as input and no “hidden” ingredients. To convert it to a standard task we would want a class of Boolean functions $f_n : B^n \to B$ such that the balanced/constant decision is hard classically (e.g. takes exponential time in n). This should hold even if we have full access to a description of the function, e.g. a formula for it or a circuit $C_n$ that computes $f_n$. Note that even a constant function can be presented to us in such a perversely complicated way that its trivial action is hard to recognise! Alas, no such (“provably hard”) class of Boolean function descriptions is known.

So, are there any “standard” computational tasks for which we can prove the existence of an exponential speed-up for quantum versus classical computation? No such absolute proofs are known. But the difficulty seems to be largely within the classical theory: even though many problems have only exponential-time known classical algorithms, we cannot prove that no poly-time algorithm exists (that we have not yet discovered!) – recall that it is unproven that the class NP, or even PSPACE, is strictly larger than P. However there are problems for which poly-time quantum algorithms do exist and which are believed to be hard for classical computation. That is, no classical poly-time algorithm, even with bounded error, is known despite much effort, although the problems cannot be proven to be hard classically. A centrally important such problem is integer factorisation. Below we will describe Shor's polynomial time quantum algorithm for factorisation, after we introduce the quantum Fourier transform, which is at the heart of the workings of Shor's algorithm.

11 The quantum Fourier transform and periodicities

11.1 Quantum Fourier transform mod N

The quantum Fourier transform (QFT) can be viewed as a kind of generalisation of the Hadamard operation to dimensions N > 2. As a pure mathematical construction it is the same as the so-called discrete Fourier transform, which is widely used in digital signal and image processing. It is a unitary matrix that arises naturally in a wide variety of mathematical situations. It therefore fits well into the quantum formalism, providing a bridge between a quantum operation and certain mathematical problems. In fact QFT is at the heart of most known quantum algorithms that provide a significant speedup over classical computation.

Let $\mathcal{H}_N$ denote a state space with an orthonormal basis (the computational basis) $|0\rangle, |1\rangle, \ldots, |N-1\rangle$ labelled by $\mathbb{Z}_N$. The quantum Fourier transform (QFT) modulo N, denoted $\text{QFT}_N$ (or just QFT when N is clear), is the unitary transform on $\mathcal{H}_N$ defined by:
$$\text{QFT} : |x\rangle \mapsto \frac{1}{\sqrt N}\sum_{y=0}^{N-1} \exp\left(2\pi i\,\frac{xy}{N}\right)|y\rangle \qquad (18)$$
Later we will be especially interested in $N = 2^n$, i.e. the QFT on an n-qubit space.

Thus the ab-th matrix entry is
$$[\text{QFT}]_{ab} = \frac{1}{\sqrt N}\exp(2\pi i\,ab/N), \qquad a, b = 0, \ldots, N-1$$
(where we are labelling rows and columns from 0 to N − 1 rather than 1 to N!). If $\omega = e^{2\pi i/N}$ is the primitive Nth root of unity, then the matrix elements are all powers of ω (divided by $\sqrt N$) following a simple pattern:
• The initial row and column always contain only 1's.
• Each row (or column) is a geometric sequence. The kth row (or column), for k = 0, ..., N − 1, is the sequence of powers of $\omega^k$ (starting with power 0 up to power N − 1).

Example 12 For N = 2 we have ω = −1 and
$$\text{QFT}_2 = \frac{1}{\sqrt2}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} = H.$$
For N = 4 we have ω = i, so (viewing rows as geometric sequences)
$$\text{QFT}_4 = \frac12\begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & i & i^2 & i^3 \\ 1 & (i^2) & (i^2)^2 & (i^2)^3 \\ 1 & (i^3) & (i^3)^2 & (i^3)^3 \end{pmatrix} = \frac12\begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & i & -1 & -i \\ 1 & -1 & 1 & -1 \\ 1 & -i & -1 & i \end{pmatrix}.$$

Many properties of QFT, including the fact that it is unitary, follow from a basic algebraic fact about roots of unity and geometric series. Recall the formula for the sum of any geometric series:
$$1 + \alpha + \alpha^2 + \cdots + \alpha^{N-1} = \begin{cases} \dfrac{1 - \alpha^N}{1 - \alpha} & \text{if } \alpha \ne 1 \\ N & \text{if } \alpha = 1 \end{cases}$$
Now consider $\alpha = \omega^K = e^{2\pi i K/N}$ for some chosen K. Then α = 1 iff K is a multiple of N. Also $\alpha^N = 1$ for every K (since $\alpha^N = \omega^{KN}$ and $\omega^N = 1$). Hence we get
$$1 + \omega^K + \omega^{2K} + \cdots + \omega^{(N-1)K} = \begin{cases} N & \text{if } K \text{ is a multiple of } N \\ 0 & \text{if } K \text{ is not a multiple of } N \end{cases} \qquad (19)$$
Now to see that QFT is unitary, consider the ab-th element of the matrix product $\text{QFT}^\dagger\,\text{QFT}$. This is 1/N times the sum obtained by lining up “the ath row of $\text{QFT}^\dagger$ against the bth column of QFT”, i.e. $\frac1N \sum_{y=0}^{N-1} \omega^{-ay}\omega^{yb}$. The latter sum is just the geometric series with $\alpha = \omega^{b-a}$. Since $|b - a| < N$, the difference b − a is a multiple of N only when b = a. So using eq. (19) we get 0/N = 0 if $b \ne a$, and N/N = 1 if b = a. Hence $\text{QFT}^\dagger\,\text{QFT}$ is the identity matrix and QFT is unitary.

Remark (optional): Note that $\text{QFT}_4$ is different from $H \otimes H$, and generally $\text{QFT}_{2^n}$ differs from $H_n = H \otimes \cdots \otimes H$. However there is a more general mathematical formalism, the so-called Fourier transform on an abelian group, which embraces both of these constructs. For example, on a set of 4 elements there are two (non-isomorphic) group structures, viz. $\mathbb{Z}_2 \times \mathbb{Z}_2$ and $\mathbb{Z}_4$ (addition of integers mod 4). Then $H \otimes H$ and $\text{QFT}_4$ are respectively the “Fourier transforms for these two different group structures”. In this course $\text{QFT}_N$ will always mean “Fourier transform on the group $\mathbb{Z}_N$”.
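Equations (18) and (19) and the unitarity argument can be checked numerically for small N. The sketch below uses the standard library `cmath` and hypothetical function names. It does four things:

- builds $\text{QFT}_N$ from the entry formula;
- evaluates the root-of-unity sum of eq. (19);
- verifies $\text{QFT}^\dagger\,\text{QFT} = I$ up to floating-point tolerance;
- reproduces $\text{QFT}_2 = H$ and the rows of $\text{QFT}_4$ from Example 12.

It uses a dense $N \times N$ matrix intended only for checking small cases. It is not an efficient QFT circuit.

```python
import cmath
import math

def qft_matrix(N):
    """QFT_N as a list of rows: [QFT]_ab = exp(2*pi*i*a*b/N) / sqrt(N), indices 0..N-1.
    Column x is the image of |x> under eq. (18)."""
    return [[cmath.exp(2j * math.pi * a * b / N) / math.sqrt(N) for b in range(N)]
            for a in range(N)]

def root_sum(K, N):
    """1 + w^K + w^(2K) + ... + w^((N-1)K) with w = exp(2*pi*i/N), cf. eq. (19)."""
    return sum(cmath.exp(2j * math.pi * K * j / N) for j in range(N))

def is_unitary(M, tol=1e-9):
    """Check M^dagger M = I: entry (a, b) is sum_y conj(M[y][a]) * M[y][b]."""
    N = len(M)
    return all(
        abs(sum(M[y][a].conjugate() * M[y][b] for y in range(N)) - (1 if a == b else 0)) < tol
        for a in range(N) for b in range(N))

def apply(M, amps):
    """Matrix-vector product: the image of the state with amplitudes `amps`."""
    return [sum(M[r][c] * amps[c] for c in range(len(amps))) for r in range(len(M))]
```

For example, `root_sum(8, 4)` is 4 (8 is a multiple of 4) while `root_sum(3, 4)` is 0 up to rounding. `apply(qft_matrix(4), [0, 1, 0, 0])` gives the column $\frac12(1, i, -1, -i)$.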

11.2 Periodicity determination

A fundamental application of the Fourier transform (both classically and quantumly) is the determination of periodicity exhibited in a function or some other given data. Some important mathematical problems (such as integer factorisation, as we'll see later) can be reduced to problems of periodicity determination.

Suppose we are given (a black box for) a function $f: \mathbb{Z}_N \to Y$ (where typically $Y = \mathbb{Z}_M$ for some $M$) and it is promised that $f$ is periodic with some period $r$, i.e. there is a smallest number $r$ such that $f(x + r) = f(x)$ for all $x \in \mathbb{Z}_N$ (and $+$ is addition mod $N$). Since $f$ is periodic (with unknown period $r$), $r$ must divide $N$ exactly and we set $A = N/r$, which is the number of periods. We will also assume that $f$ is one-to-one in each period, i.e. $f(x_1) \neq f(x_2)$ for all $0 \le x_1 < x_2 < r$. We want a method of determining $r$ with some constant level of probability ($0.99$ say) that is independent of increasing the size of $N$. In some cases further information may be available about $f$, e.g. we may have an explicit formula for it, but the periodicity determination may still be hard (we will see an example later). It can be shown that $O(\sqrt{N})$ queries to $f$ (i.e. a number not bounded by any polynomial in $\log N$) are necessary and sufficient to achieve this in classical computation with a black box for $f$. In the quantum scenario we will see that $r$ can always be determined with any constant high level of probability $1 - \epsilon$ using only $O(\log\log N)$ queries and $\mathrm{poly}(\log N)$ further processing steps, i.e. exponentially faster than any classical method.

Quantum algorithm for periodicity determination

We begin by constructing a uniform superposition $\frac{1}{\sqrt{N}}\sum_{x=0}^{N-1}|x\rangle$ and making one query to $U_f$ to obtain the state
$$|f\rangle = \frac{1}{\sqrt{N}}\sum_{x=0}^{N-1}|x\rangle|f(x)\rangle.$$
If we measure the second register we will see some value $y = f(x_0)$ where $x_0$ is the least $x$ having $f(x) = y$. Then the first register will be projected into an equal superposition of the $A$ values $x = x_0, x_0 + r, x_0 + 2r, \ldots, x_0 + (A-1)r$ for which $f(x) = y$, i.e. we get
$$|\mathrm{per}\rangle = \frac{1}{\sqrt{A}}\sum_{j=0}^{A-1}|x_0 + jr\rangle.$$
Here $0 \le x_0 \le r-1$ has been chosen uniformly at random (by the generalised Born rule, since each possible value $y$ of $f$ occurs the same number $A$ of times, i.e. once in each period). If we measure the register of $|\mathrm{per}\rangle$ we will see $x_0 + j_0 r$ where $j_0$ has been picked uniformly at random too. Thus we have a random period (the $j_0$th period) and a random element in it (determined by $x_0$), i.e. overall we get a random number between $0$ and $N-1$, giving no information about $r$ at all. Nevertheless the state $|\mathrm{per}\rangle$ seems to contain the information of $r$! The resolution of this problem is to use the Fourier transform, which is known even in classical image processing to be able to pick up periodicities in a periodic pattern irrespective of an overall random shift of the pattern (e.g. the $x_0$ in $|\mathrm{per}\rangle$).

Applying QFT to $|\mathrm{per}\rangle$ we get (using eq. (18) with $x$ replaced by $x_0 + jr$, and summing over $j$):
$$\mathrm{QFT}|\mathrm{per}\rangle = \frac{1}{\sqrt{NA}}\sum_{j=0}^{A-1}\sum_{y=0}^{N-1}\omega^{(x_0+jr)y}|y\rangle = \frac{1}{\sqrt{NA}}\sum_{y=0}^{N-1}\omega^{x_0y}\left[\sum_{j=0}^{A-1}\omega^{jry}\right]|y\rangle. \tag{20}$$
(In the last equality we have reversed the order of summation and factored out the $j$-independent $\omega^{x_0y}$ terms.) Which labels $y$ appear here with nonzero amplitude? Look at the square-bracketed coefficient of $|y\rangle$ in eq. (20). It is a geometric series with powers of $\alpha = e^{2\pi i ry/N} = (e^{2\pi i/A})^y$ summed from power $0$ to power $A-1$. According to eq. (19) (now applied with $A$ taking the role of $N$ there) this sum is zero whenever $y$ is not a multiple of $A$ and the sum is $A$ otherwise, i.e. only multiples of $A = N/r$ survive as $y$ values:
$$\sum_{j=0}^{A-1}\omega^{jry} = \begin{cases} A & \text{if } y = kN/r \text{ for } k = 0, \ldots, r-1, \\ 0 & \text{otherwise,}\end{cases}$$
and
$$\mathrm{QFT}|\mathrm{per}\rangle = \sqrt{\frac{A}{N}}\sum_{k=0}^{r-1}\omega^{x_0kN/r}|kN/r\rangle = \frac{1}{\sqrt{r}}\sum_{k=0}^{r-1}\omega^{x_0kN/r}|kN/r\rangle.$$
The random shift $x_0$ has been eliminated from the labels and now occurs only in a pure phase $\omega^{x_0kN/r}$ (whose modulus squared is 1). Since measurement probabilities are squared moduli of the amplitudes, these probabilities are now independent of $x_0$ and depend only on $N$ (known) and $r$ (to be determined), and the periodicity of the ket labels has been "inverted" from $r$ to $A = N/r$. This is represented schematically in the following diagram.

[Figure: measurement probabilities plotted against labels. (a) For $|\mathrm{per}\rangle$: probability $1/A = r/N$ at each of the labels $x_0, x_0 + r, x_0 + 2r, \ldots$, spaced $r$ apart. (b) For $\mathrm{QFT}|\mathrm{per}\rangle$: probability $1/r$ at each of the labels $0, N/r, 2N/r, \ldots$, spaced $N/r$ apart.]

If we now measure the label we will obtain a value $c$ which is a multiple $k_0N/r$ of $N/r$, where $0 \le k_0 \le r-1$ has been chosen uniformly at random. Thus $c = k_0N/r$ so
$$\frac{c}{N} = \frac{k_0}{r}.$$
Here $c$ and $N$ are known and $k_0$ is unknown and random, so how do we get $r$ out of this? If (by some good fortune!) $k_0$ was coprime to $r$ we could cancel $c/N$ down and read off $r$ as the denominator.
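The claim that only multiples of $N/r$ survive, each with probability $1/r$ whatever $x_0$ is, can be checked by brute force. The sketch below (plain Python, illustrative names) evaluates eq. (20) directly for small $N$ and $r$; it is a classical simulation whose cost grows with $N$, so it says nothing about quantum speed.

```python
import cmath

def qft_of_per(N, r, x0):
    """Amplitudes of QFT_N |per>, |per> = (1/sqrt(A)) sum_{j<A} |x0 + j r>, A = N/r (eq. (20))."""
    if N % r != 0 or not 0 <= x0 < r:
        raise ValueError("need r | N and 0 <= x0 < r")
    A = N // r
    omega = cmath.exp(2j * cmath.pi / N)
    amps = []
    for y in range(N):
        # bracketed geometric sum over j of omega^{j r y}, times the phase omega^{x0 y}
        bracket = sum(omega ** ((j * r * y) % N) for j in range(A))
        amps.append(omega ** ((x0 * y) % N) * bracket / (N * A) ** 0.5)
    return amps

def support(amps, tol=1e-9):
    """Labels y measured with non-negligible probability, mapped to those probabilities."""
    return {y: abs(a) ** 2 for y, a in enumerate(amps) if abs(a) ** 2 > tol}
```

With $N = 24$ and $r = 6$, `support(qft_of_per(24, 6, x0))` has keys $0, 4, 8, 12, 16, 20$, each with probability $1/6$, for every $x_0 = 0, \ldots, 5$.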

But $k_0$ was chosen at random, so what is the chance of getting this good fortune of coprimality? We'll use (without proof) the following theorem from number theory:

Theorem 3 (Coprimality theorem) The number of integers less than $r$ that are coprime to $r$ is at least of order $r/\log\log r$ for large $r$.

Hence if $k_0 < r$ is chosen at random, $\mathrm{prob}(k_0 \text{ coprime to } r)$ is at least of order $(r/\log\log r)/r = 1/\log\log r$. Thus if we repeat the whole process $O(\log\log r) < O(\log\log N)$ times we will obtain a coprime $k_0$ in at least one case with a constant level of probability. Here we have used the following fact from probability theory:

Lemma 2 If a single trial has success probability $p$ and we repeat the trial $M$ times independently then for any constant $0 < 1 - \epsilon < 1$:
$$\mathrm{prob}(\text{at least one success in } M \text{ trials}) > 1 - \epsilon \quad \text{if } M = \frac{-\log\epsilon}{p},$$
so to achieve any constant level $1 - \epsilon$ of success probability, $M = O(1/p)$ repetitions suffice.

Proof of lemma: We have that the probability of at least one success in $M$ runs is $1 - \mathrm{prob}(\text{all runs fail}) = 1 - (1-p)^M$. Then $1 - (1-p)^M = 1 - \epsilon$ if $M = \frac{-\log\epsilon}{-\log(1-p)}$. Next use the fact that $p < -\log(1-p)$ for all $0 < p < 1$ to see that $M < \frac{-\log\epsilon}{p}$, i.e. $O(1/p)$ trials suffice.

If $k_0$ is not coprime to $r$ then this procedure will deliver a denominator $r'$ that is smaller than the correct $r$, so $f(x) \neq f(x + r')$ for any $x$ (as $f$ is one-to-one in each period). Thus in our process we check the output value $r'$ by evaluating $f(0)$ and $f(r')$ and accepting $r'$ as the correct period iff these are equal. In each round we query $f$ three times (once at the start to make $|f\rangle$ and twice more at the end to check the output), so we use $O(\log\log N)$ queries in all. We also need to apply the "large" unitary gate $\mathrm{QFT}_N$ (which grows with $N$), and we show in the next section that this may be implemented in $O((\log N)^2)$ elementary steps. The remaining operations are all familiar arithmetic operations on integers of size $O(N)$ (such as cancelling $c/N$ down to lowest form) that are all well known to be computable in polynomial time, i.e. in $\mathrm{poly}(\log N)$ steps. Thus we succeed in determining the period with any constant level $1 - \epsilon$ of probability with $O(\log\log N)$ queries and $\mathrm{poly}(\log N)$ further computational steps.
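Lemma 2 and the coprimality count can be made concrete. This sketch (illustrative helper names, standard library only) compares the exact repetition count $\log\epsilon/\log(1-p)$ from the proof with the simpler bound $-\log\epsilon/p$, and computes the fraction of $k_0 \in \{0, \ldots, r-1\}$ coprime to $r$, which is the per-round success probability $p$ when $k_0$ is uniform.

```python
import math

def exact_repetitions(p, eps):
    """Rounds from the exact formula M = log(eps) / log(1 - p), rounded up."""
    return math.ceil(math.log(eps) / math.log(1 - p))

def lemma2_repetitions(p, eps):
    """The simpler sufficient count M = -log(eps) / p of Lemma 2, rounded up."""
    return math.ceil(-math.log(eps) / p)

def coprime_fraction(r):
    """Fraction of k in {0, ..., r-1} with gcd(k, r) = 1 (chance that a uniform k0 is coprime to r)."""
    return sum(1 for k in range(r) if math.gcd(k, r) == 1) / r
```

For instance with $r = 12$ the coprime fraction is $1/3$, and for $\epsilon = 0.01$ the exact count is 12 rounds versus 14 from the lemma's bound; the bound is always at least the exact count because $p < -\log(1-p)$.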
11.3 Efficient implementation of QFT

(This subsection is not required for exam purposes.)

If $N = 2^n$ is an integer power of 2 then QFT mod $N$ acts on $n$ qubits. For these dimension sizes we will show how to implement QFT with a circuit of polynomial size $O(n^2)$.

This is a very special property of QFT: almost all unitary transforms in dimension $2^n$ require exponential sized ($\mathrm{poly}(2^n)$ sized) circuits for their implementation. Our efficient implementation of QFT is really just a translation of the classical fast Fourier transform formalism to the quantum scenario. For general $N$ (not a power of 2) we do not have an exact efficient (i.e. $\mathrm{poly}(\log N)$ sized) implementation. Instead we generally approximate QFT mod $N$ by QFT mod $2^k$ where $2^k$ is near enough to $N$ to incur only an acceptably small reduction in the success probability of the algorithm.

We begin by showing that the $n$ qubit state
$$\mathrm{QFT}|x\rangle = \frac{1}{\sqrt{2^n}}\sum_{y}\exp\left(2\pi i\frac{xy}{2^n}\right)|y\rangle$$
is actually a product state of $n$ one-qubit states. We write $0 \le x, y \le 2^n - 1$ in binary (as $n$ bit strings of digits):
$$x = x_{n-1}2^{n-1} + x_{n-2}2^{n-2} + \cdots + x_1 2 + x_0,$$
$$y = y_{n-1}2^{n-1} + y_{n-2}2^{n-2} + \cdots + y_1 2 + y_0.$$
(Warning: Take care to distinguish arithmetic mod $2^n$ in $\mathbb{Z}_{2^n}$ used here from the bitwise arithmetic of $n$ bit strings that we used earlier!) In $xy/2^n$ we discard any terms that are whole numbers, since these make no contribution to $\exp(2\pi i xy/2^n)$, and a direct calculation gives:
$$\frac{xy}{2^n} \equiv y_{n-1}(.x_0) + y_{n-2}(.x_1x_0) + \cdots + y_0(.x_{n-1}x_{n-2}\ldots x_0) \tag{21}$$
where the factors in parentheses are binary expansions, e.g.
$$.x_2x_1x_0 = \frac{x_2}{2} + \frac{x_1}{2^2} + \frac{x_0}{2^3}.$$
Now
$$\sum_y \exp\left(2\pi i\frac{xy}{2^n}\right)|y\rangle = \sum_{y_0, \ldots, y_{n-1}}\exp\left(2\pi i\frac{xy}{2^n}\right)|y_{n-1}\rangle|y_{n-2}\rangle\ldots|y_0\rangle$$
and we want to insert the expression for $xy/2^n$ from eq. (21) into the exponential. Since eq. (21) is a sum over the different $y_i$'s, the exponential will be a product of these terms and hence the sum $\sum_{y_0 \ldots y_{n-1}}$ splits up into a product of single index sums $(\sum_{y_0})(\sum_{y_1})\ldots(\sum_{y_{n-1}})$, so we get
$$\sum_y \exp\left(2\pi i\frac{xy}{2^n}\right)|y\rangle = \left(|0\rangle + e^{2\pi i(.x_0)}|1\rangle\right)\left(|0\rangle + e^{2\pi i(.x_1x_0)}|1\rangle\right)\cdots\left(|0\rangle + e^{2\pi i(.x_{n-1}\ldots x_1x_0)}|1\rangle\right). \tag{22}$$
Hence $\mathrm{QFT}|x\rangle$ is the product of the corresponding 1-qubit states obtained by taking each bracket with a $1/\sqrt{2}$ normalising factor. This factorisation is the key to building our QFT circuit: it should map each basis (product) state $|x_{n-1}\rangle\ldots|x_0\rangle$ into the corresponding product state given in eq. (22).
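To see the factorisation (21)-(22) at work, the following sketch compares the column $\mathrm{QFT}|x\rangle$ with the tensor product of the $n$ one-qubit states of eq. (22), for every $x$ on up to four qubits. Basis states are indexed with $y_{n-1}$ as the most significant bit, matching the ordering $|y_{n-1}\rangle\ldots|y_0\rangle$; the helper names are illustrative.

```python
import cmath
from itertools import product

def qft_column(x, n):
    """<y|QFT|x> = exp(2 pi i x y / 2^n) / sqrt(2^n) for y = 0, ..., 2^n - 1."""
    N = 2 ** n
    return [cmath.exp(2j * cmath.pi * x * y / N) / N ** 0.5 for y in range(N)]

def binary_fraction(bits):
    """Value of the binary expansion .b_1 b_2 ... b_k = sum of b_i / 2^i."""
    return sum(b / 2 ** (i + 1) for i, b in enumerate(bits))

def qft_product_form(x, n):
    """Tensor product of the n one-qubit states in eq. (22), each normalised by 1/sqrt(2)."""
    xbits = [(x >> i) & 1 for i in range(n)]          # xbits[i] = x_i
    factors = []                                      # factors[j] acts on register y_{n-1-j}
    for j in range(n):
        # register y_{n-1-j} carries the phase exp(2 pi i (.x_j x_{j-1} ... x_0))
        phase = cmath.exp(2j * cmath.pi * binary_fraction(xbits[j::-1]))
        factors.append((1 / 2 ** 0.5, phase / 2 ** 0.5))
    state = []
    for ybits in product((0, 1), repeat=n):           # (y_{n-1}, ..., y_0), y_{n-1} most significant
        amp = 1
        for j, b in enumerate(ybits):
            amp *= factors[j][b]
        state.append(amp)
    return state
```

Agreement of the two lists to rounding error for all $x$ is exactly the statement that $\mathrm{QFT}|x\rangle$ is the product state of eq. (22).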

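Before we start, note that the Hadamard operation can be expressed in our binary fractional notation as
$$H|x\rangle = \frac{1}{\sqrt{2}}\left(|0\rangle + e^{2\pi i(.x)}|1\rangle\right).$$
Indeed if $x = 0$ resp. $1$ then $.x$ is $0$ resp. $1/2$ as a binary fraction, so $e^{2\pi i(.x)}$ is $1$ resp. $-1$, as required.

To see how the QFT circuit actually works, let's look at the example of $N = 8$, i.e. $n = 3$. We want a circuit that transforms $|x_2\rangle|x_1\rangle|x_0\rangle$ to the following states in these three registers (called $y_2, y_1, y_0$ at the output):
$$\underbrace{\tfrac{1}{\sqrt{2}}\left(|0\rangle + e^{2\pi i(.x_0)}|1\rangle\right)}_{y_2 \text{ register}} \otimes \underbrace{\tfrac{1}{\sqrt{2}}\left(|0\rangle + e^{2\pi i(.x_1x_0)}|1\rangle\right)}_{y_1 \text{ register}} \otimes \underbrace{\tfrac{1}{\sqrt{2}}\left(|0\rangle + e^{2\pi i(.x_2x_1x_0)}|1\rangle\right)}_{y_0 \text{ register}}$$
To draw an actual circuit diagram we consider the three stages in turn.

STAGE 1: $H|x_2\rangle$ followed by phase shifts of $e^{2\pi i(0.01)}$ and $e^{2\pi i(0.001)}$ controlled by $x_1$ and $x_0$ respectively. These operations depend on $x_2, x_1, x_0$. Do them first and accumulate the result on the $x_2$ line (as the $x_2$ line is no longer needed after this).

STAGE 2: $H|x_1\rangle$ followed by a phase shift $e^{2\pi i(0.01)}$ controlled by the $x_0$ value. These operations depend on $x_1, x_0$ (not $x_2$). Do them second and accumulate the result on the $x_1$ line (as the $x_1$ line is no longer needed after this).

STAGE 3: $H|x_0\rangle$. This operation depends only on $x_0$ (not $x_1, x_2$). Do it last (third) and put the result on the $x_0$ line.

After completion of these three stages, the desired final contents of the $y_0, y_1, y_2$ lines are respectively on the $x_2, x_1, x_0$ lines. Thus finally just reverse the order of the qubits in the string (e.g. by swap operations).

In addition to the Hadamard gate $H$ we'll introduce the 1-qubit phase gate
$$R_d = \begin{pmatrix} 1 & 0 \\ 0 & e^{i\pi/2^d}\end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & e^{2\pi i(0.0\ldots01)}\end{pmatrix} \tag{23}$$
where the binary digit 1 in the last exponential is $(d+1)$ places to the right of the dot. The controlled-$R_d$ gate, denoted C-$R_d$, acts on two qubits and is defined by the following actions:
$$\text{C-}R_d|0\rangle|\psi\rangle = |0\rangle|\psi\rangle, \qquad \text{C-}R_d|1\rangle|\psi\rangle = |1\rangle R_d|\psi\rangle$$
for any 1-qubit state $|\psi\rangle$. Diagrammatically this is drawn as a box labelled $R_d$ on the target line, joined by a vertical wire to a "blob" on the control qubit line.

In terms of all these, the circuit for $\mathrm{QFT}_8$ is (gates applied from left to right, with inputs $|x_2\rangle, |x_1\rangle, |x_0\rangle$ on the top, middle and bottom lines):
STAGE 1: $H$ on the $x_2$ line; C-$R_1$ with control $x_1$ and target $x_2$; C-$R_2$ with control $x_0$ and target $x_2$.
STAGE 2: $H$ on the $x_1$ line; C-$R_1$ with control $x_0$ and target $x_1$.
STAGE 3: $H$ on the $x_0$ line.
SWAP: exchange the top and bottom lines, so that the outputs read $|y_2\rangle, |y_1\rangle, |y_0\rangle$ from top to bottom.

For $N = 8 = 2^3$ we use 3 Hadamard gates (one in each stage) and $2 + 1$ controlled phase gates (in stages 1 and 2 respectively). For general $N = 2^n$ we would use $n$ Hadamard gates (one in each of $n$ stages) and $(n-1) + (n-2) + \cdots + 2 + 1 = n(n-1)/2$ controlled phase gates (in stages $1, 2, \ldots, n-1$ respectively). Overall we have $O(n^2) = O((\log N)^2)$ gates for QFT mod $N$. (In this accounting we have ignored the final swap operation to reverse the order of qubits, but this requires only a further $O(n)$ 2-qubit SWAP gates to implement.)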
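The staged circuit can be simulated on a classical state vector to confirm that it reproduces $\mathrm{QFT}_{2^n}$ and uses $n$ Hadamards and $n(n-1)/2$ controlled phase gates. In this sketch (an illustration with assumed conventions, not a circuit library API) line $x_j$ is bit $j$ of the basis-state index, stages run from the top line $x_{n-1}$ downwards, and a final set of swaps reverses the qubit order.

```python
import cmath

def apply_h(state, t):
    """Hadamard on line t (bit t of the basis-state index)."""
    s = 2 ** -0.5
    for i in range(len(state)):
        if not (i >> t) & 1:
            j = i | (1 << t)
            a0, a1 = state[i], state[j]
            state[i], state[j] = s * (a0 + a1), s * (a0 - a1)

def apply_controlled_rd(state, c, t, d):
    """C-R_d of eq. (23): phase exp(i pi / 2^d) when control bit c and target bit t are both 1."""
    phase = cmath.exp(1j * cmath.pi / 2 ** d)
    for i in range(len(state)):
        if (i >> c) & 1 and (i >> t) & 1:
            state[i] *= phase

def apply_swap(state, s, t):
    """Exchange lines s and t (s != t)."""
    for i in range(len(state)):
        if (i >> s) & 1 and not (i >> t) & 1:
            j = i ^ (1 << s) ^ (1 << t)
            state[i], state[j] = state[j], state[i]

def qft_circuit(x, n):
    """Run the staged circuit on |x> = |x_{n-1}>...|x_0>; return the final state and gate counts."""
    state = [0j] * 2 ** n
    state[x] = 1 + 0j
    counts = {"H": 0, "C-R": 0, "SWAP": 0}
    for t in range(n - 1, -1, -1):        # one stage per line, top line x_{n-1} first
        apply_h(state, t)
        counts["H"] += 1
        for c in range(t - 1, -1, -1):    # x_{t-1} controls R_1, x_{t-2} controls R_2, ...
            apply_controlled_rd(state, c, t, t - c)
            counts["C-R"] += 1
    for j in range(n // 2):               # reverse the order of the lines
        apply_swap(state, j, n - 1 - j)
        counts["SWAP"] += 1
    return state, counts
```

For $n = 3$ this applies 3 Hadamards, 3 controlled phase gates and 1 swap, and the amplitude at index $y$ equals $e^{2\pi i xy/8}/\sqrt{8}$.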

12 Shor's quantum factoring algorithm

We will now describe Shor's quantum factoring algorithm. Given an integer $N$ with $n = \log N$ digits this algorithm will output a factor $1 < K < N$ (or output $N$ if $N$ is a prime) with any chosen constant level of probability $1 - \epsilon$, and the algorithm will run in polynomial time $O(n^3)$. Currently the best known classical algorithm (the so-called number field sieve algorithm) runs in time $e^{O(n^{1/3}(\log n)^{2/3})}$, i.e. there is no known polynomial time classical algorithm for this task. We'll begin by first describing some pure mathematics (number theory, involving no quantum ingredients at all!) showing how to convert the problem of factoring $N$ into a problem of periodicity determination. Then we'll use our quantum period finding algorithm to achieve the task of factorisation.

We'll encounter (and deal with) a technical complication: our function will be periodic on the infinite set $\mathbb{Z}$ of all integers, so for computational purposes we need to truncate this down to a finite size $\mathbb{Z}_M$ for some $M$ (suitably large, depending on $N$). Since we do not know the period at the outset, the restricted function will not be exactly periodic on $\mathbb{Z}_M$: the "last" period will generally be incomplete (as $M$ is not generally an exact multiple of the period). But we'll see that if $M$ is sufficiently large (in fact $M = O(N^2)$ will suffice) then there will be enough complete periods so that the single "corrupted" period has only a negligible effect on our period finding algorithm. We will also always choose $M$ to be a power of 2, to be able to use our explicit circuit for QFT mod $M$ for such $M$'s.

12.1 Factoring as a periodicity problem – some number theory

Let $N$, with $n = \log N$ digits, denote the integer that we wish to factorise. We start by choosing $1 < a < N$ at random. Next, using Euclid's algorithm (which is a poly-time algorithm), we compute the greatest common divisor $b = \gcd(a, N)$. If $b > 1$ we are finished. Thus suppose $b = 1$, i.e. $a$ and $N$ are coprime. We will use:

Theorem 4 (Euler's theorem): If $a$ and $N$ are coprime then there is a least power $1 < r < N$ such that $a^r \equiv 1 \bmod N$. $r$ is called the order of $a$ mod $N$.

We omit the proof, which may be found in most texts on number theory. Now consider the powers of $a$ as a function of the index, i.e. the modular exponential function:
$$f: \mathbb{Z} \to \mathbb{Z}_N, \qquad f(k) = a^k \bmod N. \tag{24}$$
Clearly $f(k_1 + k_2) = f(k_1)f(k_2)$ (multiplication mod $N$) and by Euler's theorem $f(r) = 1$, so $f(k + r) = f(k)$ for all $k$, i.e. $f$ is periodic with period $r$. Also since $r$ is the least integer with $f(r) = 1$ we see that $f$ must be one-to-one within each period.

Next suppose we can find $r$. (We will use our quantum period finding algorithm for this.) Suppose $r$ comes out to be even. Then
$$a^r - 1 = (a^{r/2} - 1)(a^{r/2} + 1) \equiv 0 \bmod N, \tag{25}$$
i.e. $N$ exactly divides the product $(a^{r/2} - 1)(a^{r/2} + 1)$ (and knowing $r$ we can calculate each of these terms mod $N$ in $\mathrm{poly}(n)$ time). We know $N$ does not divide $a^{r/2} - 1$ (since $r$ was the least power $x$ such that $a^x - 1$ is divisible by $N$). Thus if $N$ does not divide $a^{r/2} + 1$, i.e. if $a^{r/2} \not\equiv -1 \bmod N$, then in eq. (25) $N$ must partly divide into $a^{r/2} - 1$ and partly into $a^{r/2} + 1$. Hence, using Euclid's algorithm again, we compute $\gcd(a^{r/2} \pm 1, N)$, which will be nontrivial factors of $N$. All this works provided $r$ is even and $a^{r/2} \not\equiv -1 \bmod N$.
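The classical part of this reduction fits in a few lines. In the sketch below the order $r$ is found by brute force, which takes time exponential in $n$ and only stands in (hypothetically) for the quantum period finding step; Python's built-in `pow(a, r // 2, N)` computes $a^{r/2} \bmod N$ by repeated squaring. The function names are illustrative.

```python
from math import gcd

def order_by_brute_force(a, N):
    """Least r >= 1 with a^r = 1 mod N (classical stand-in for period finding; exponential in n)."""
    if gcd(a, N) != 1:
        raise ValueError("a and N must be coprime")
    r, value = 1, a % N
    while value != 1:
        value = (value * a) % N
        r += 1
    return r

def factors_from_order(a, r, N):
    """(gcd(a^{r/2} - 1, N), gcd(a^{r/2} + 1, N)) if r is even and a^{r/2} != -1 mod N, else None."""
    if r % 2 == 1:
        return None
    half = pow(a, r // 2, N)   # a^{r/2} mod N
    if half == N - 1:          # a^{r/2} = -1 mod N: this choice of a fails
        return None
    return gcd(half - 1, N), gcd(half + 1, N)
```

With $N = 15$, $a = 7$ this gives $r = 4$ and the factors $3, 5$; with $a = 14$ we get $r = 2$ but $14 \equiv -1 \bmod 15$, so that choice of $a$ fails and another $a$ must be drawn.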

How likely is this, given that $a$ was chosen at random? We quote the following theorem.

Theorem 5 Suppose $N$ is odd and not a power of a prime. If $a < N$ is chosen uniformly at random with $\gcd(a, N) = 1$ then $\mathrm{Prob}(r \text{ is even and } a^{r/2} \not\equiv -1 \bmod N) \ge 1/2$.

For a proof of this result see Preskill's notes page 307 et seq., Nielsen/Chuang appendix 4.3, or A. Ekert and R. Jozsa, Reviews of Modern Physics, vol 68, p733-753 1996, appendix B.

Hence for any $N$ which is odd and not a prime power, we will obtain a factor with probability at least half. Given any candidate factor we can check it (in $\mathrm{poly}(n)$ time) by test division into $N$. Thus repeating the process, say 10 times, we will fail to get a factor only with tiny probability $1/2^{10}$.

All of this works if $N$ is neither even nor a prime power. So how do we recognise and treat these latter cases? If $N$ is even (which is easy to recognise!) we immediately have a factor 2 and we are finished. If $N = p^l$ is a prime power then we can identify this case and find $p$ using the following result (which we quote without proof).

Lemma 3 Suppose $N = c^l$ for some integers $c, l \ge 2$. Then there is a classical polynomial time algorithm that outputs $c$.

Running this algorithm on any $N$ will output some number $c$ and we can check if it divides $N$ or not. If $N$ was a prime power $p^l$ then $c$ will be $p$ (or possibly a power of $p$, which is still a nontrivial factor of $N$).

Summarising the process so far: given $N$ we proceed as follows.
(i) Is $N$ even? If so, output 2 and stop.
(ii) Run the algorithm of lemma 3, test divide the output and stop if a factor of $N$ is obtained.
(iii) If $N$ is neither even nor a prime power choose $1 < a < N$ at random and compute $s = \gcd(a, N)$. If $s \neq 1$ output $s$ and stop.
(iv) If $s = 1$ find the period $r$ of $f(k) = a^k \bmod N$. (We will achieve this with any desired constant level of probability $1 - \epsilon$ using the quantum algorithm described in the next section.)
(v) If $r$ is odd, go back to (iii). If $r$ is even compute $t = \gcd(a^{r/2} + 1, N)$. If $t \neq 1, N$ output $t$ (by definition $t$ is then a factor of $N$). If $t = 1$ or $N$ go back to (iii) and try again.

According to theorem 5 any run of (iv) and (v) will output a factor with probability at least $1/2$, so $K$ repetitions of looping back to (iii) will all fail only with probability at most $1/2^K$, which can be made as small as we like; we succeed with any probability $1 - \epsilon$ using $\log_2(1/\epsilon)$ repetitions.

Example 13 Consider $N = 15$ and choose $a = 7$. Then a direct calculation shows that the function $f(k) = 7^k \bmod 15$ for $k = 0, 1, 2, \ldots$ has values $1, 7, 4, 13, 1, 7, 4, 13, \ldots$, so $r = 4$. Thus $7^4 - 1 = (7^2 - 1)(7^2 + 1) = (48)(50)$ is divisible by 15, and computing $\gcd(15, 48) = 3$ and $\gcd(15, 50) = 5$ gives the non-trivial factors of 15.
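Steps (i)-(v) can be strung together as follows. This is a sketch under stated assumptions: `find_order` is a brute-force classical stand-in for the quantum step (iv), lemma 3 is realised by one simple method (integer $l$th roots by binary search for each $l \le \log_2 N$), and the input is assumed to be a composite $N > 1$ (for prime $N$ the loop would not terminate). All names are illustrative.

```python
import math
import random

def integer_root(N, l):
    """Largest integer c >= 1 with c**l <= N, by binary search (N >= 1)."""
    lo, hi = 1, 1 << (N.bit_length() // l + 1)   # hi**l > N
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** l <= N:
            lo = mid
        else:
            hi = mid - 1
    return lo

def perfect_power_base(N):
    """Some c > 1 with c**l == N for an integer l >= 2, or None (one way to realise lemma 3)."""
    for l in range(2, N.bit_length() + 1):
        c = integer_root(N, l)
        if c > 1 and c ** l == N:
            return c
    return None

def find_order(a, N):
    """Hypothetical stand-in for step (iv): least r >= 1 with a^r = 1 mod N, by brute force."""
    r, value = 1, a % N
    while value != 1:
        value = value * a % N
        r += 1
    return r

def factor(N, rng=None):
    """Steps (i)-(v) for a composite N > 1; returns a factor t with 1 < t < N."""
    if rng is None:
        rng = random.Random(0)           # fixed seed so runs are reproducible
    if N % 2 == 0:                       # (i)
        return 2
    c = perfect_power_base(N)            # (ii)
    if c is not None:
        return c
    while True:
        a = rng.randrange(2, N)          # (iii): 1 < a < N
        s = math.gcd(a, N)
        if s != 1:
            return s
        r = find_order(a, N)             # (iv)
        if r % 2 == 1:                   # (v): odd r, try another a
            continue
        t = math.gcd(pow(a, r // 2, N) + 1, N)
        if t not in (1, N):
            return t
```

For example `factor(15)` returns 3 or 5 and `factor(27)` returns 3 via step (ii).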

12.2 Computing the period of $f(k) = a^k \bmod N$

Let $r$ denote the (as yet unknown) period of $f(k) = a^k \bmod N$ on the infinite domain $\mathbb{Z}$. We will work on the finite domain $D = \{0, 1, \ldots, 2^m - 1\}$ where $2^m$ is the least power of 2 greater than $N^2$ (see later for the reason for this choice). Let $2^m = Br + b$ with $0 \le b < r$, i.e. the domain $D$ contains $B$ full periods and only the initial part up to $b$ of the next period. Using a standard application of computation by quantum parallelism we manufacture the state $\frac{1}{\sqrt{2^m}}\sum_{x \in D}|x\rangle|f(x)\rangle$ and measure the second register to obtain some value $y_0 = f(x_0)$ with $0 \le x_0 < r$. In the first register we get the state
$$|\mathrm{per}\rangle = \frac{1}{\sqrt{A}}\sum_{k=0}^{A-1}|x_0 + kr\rangle \quad \text{where} \quad A = \begin{cases} B + 1 & \text{if } x_0 < b, \\ B & \text{if } x_0 \ge b. \end{cases} \tag{26}$$
Let $\mathrm{QFT}_{2^m}|\mathrm{per}\rangle = \sum_{c=0}^{2^m-1}\tilde f(c)|c\rangle$. Writing $\omega = e^{2\pi i/2^m}$ we have
$$\tilde f(c) = \frac{1}{\sqrt{A}}\frac{1}{\sqrt{2^m}}\sum_{k=0}^{A-1}\omega^{c(x_0+kr)} = \frac{\omega^{cx_0}}{\sqrt{A\,2^m}}\left[\sum_{k=0}^{A-1}\omega^{crk}\right].$$
As before (as in eq. (20), where $c$ was called $y$) the square bracket is a geometric series with ratio $\alpha = \omega^{cr}$ and we have
$$[\ldots] = 1 + \alpha + \alpha^2 + \cdots + \alpha^{A-1} = \begin{cases} A & \text{if } \alpha = 1, \\ \dfrac{1 - \alpha^A}{1 - \alpha} & \text{if } \alpha \neq 1. \end{cases}$$
Previously we had $r$ dividing the denominator exactly and $2^m/r = A$, so if $\alpha \neq 1$ then $\alpha$ was an $A$th root of unity and the geometric series summed to zero in all these cases. The only $c$ values that survived were the exact multiples of $A = 2^m/r$, having $\alpha = 1$. There were $r$ such multiples, each with equal $|\text{amplitude}|$ of $1/\sqrt{r}$. In the present case $r$ does not generally divide $2^m$ exactly, so $\alpha$ is not an $A$th root of unity and we don't get a lot of "exactly zero" amplitudes for the $|c\rangle$'s! However we aim to show that a measurement on $\mathrm{QFT}|\mathrm{per}\rangle$ will yield an integer $c$-value which is close to a multiple of $2^m/r$ with suitably high probability.

Let's look more closely at the ratio $\alpha = e^{2\pi icr/2^m}$. Consider the $r$ multiples of $2^m/r$ (which are now not necessarily integers!):
$$0, \ \frac{2^m}{r}, \ 2\left(\frac{2^m}{r}\right), \ \ldots, \ (r-1)\left(\frac{2^m}{r}\right).$$
Each of these is within $1/2$ of a unique nearest integer. Note that $k(2^m/r)$ can never be exactly half way between two integers: since $r < N$ and $2^m > N^2$, every factor of 2 in $r$ can be cancelled against the 2's in $2^m$, so $k(2^m/r)$ reduces to a fraction with odd denominator, which cannot be of the form $j + 1/2$.

Thus we consider $c$ values ($r$ of them) such that
$$\left|c - k\frac{2^m}{r}\right| < \frac{1}{2}, \qquad k = 0, 1, \ldots, r-1. \tag{27}$$
In the previous case of exact periodicity (where $2^m/r$ was an integer) each of these $c$-values appeared with probability $1/r$ and all other $c$-values had probability zero. Here we will show that although the other $c$-values will generally have non-zero probabilities, the special ones in eq. (27) still have probability at least $\gamma/r$ for a constant $\gamma$.

[Figure 12.2: Schematic depiction of the amplitudes $|\tilde f(c)|$ in $\mathrm{QFT}|\mathrm{per}\rangle$ plotted against $c$. (a) Exact periodicity ($r$ divides $2^m$): we have nonzero amplitudes only at the exact multiples $c = k2^m/r$, spaced $2^m/r$ apart. (b) Nonexact periodicity: we have nonzero amplitudes for many $c$-values, but the integers nearest to the multiples $k2^m/r$ (spaced approximately $2^m/r$ apart) still have suitably large amplitudes.]

Theorem 6 Suppose we measure the label in $\mathrm{QFT}|\mathrm{per}\rangle$. Let $c_k$ be the unique integer with $|c_k - k2^m/r| < 1/2$. Then $\mathrm{prob}(c_k) > \gamma/r$ where $\gamma \approx 4/\pi^2$.

Proof (optional): For any $c$ with $\alpha \neq 1$ we have
$$\mathrm{prob}(c) = |\tilde f(c)|^2 = \frac{1}{A\,2^m}\left|\frac{1 - \alpha^A}{1 - \alpha}\right|^2$$
with $\alpha = e^{2\pi icr/2^m} = e^{2\pi i(cr \bmod 2^m)/2^m}$, where we take $cr \bmod 2^m$ to lie in the range $(-2^{m-1}, 2^{m-1}]$. For our special $c$-values satisfying eq. (27) we have $|cr - k2^m| < r/2$, so
$$-\frac{r}{2} < cr \bmod 2^m < \frac{r}{2}. \tag{28}$$
Write $\alpha = e^{i\theta_c}$ with $\theta_c = 2\pi(cr \bmod 2^m)/2^m$, so $|\theta_c| < \pi r/2^m$. Also from eq. (26) we see that in all cases $A < 2^m/r + 1$, so
$$|A\theta_c| < \left(\frac{2^m}{r} + 1\right)\frac{\pi r}{2^m} = \pi\left(1 + \frac{r}{2^m}\right).$$

(27) i. (as sin x < x) sin Aθmax /2 Aθmax /2 where the last inequality follows from eq. it remains to show that r can be determined from a (“good”) c-value in time poly(log N ). 12. noting that 2m > N 2 and r < N so r/2m << 1 for all large N . . We will be especially interested in those c’s for which the corresponding k is coprime to r and there are O(r/ log log r) of these. − m 2 r 2 Recall that r < N and 2m > N 2 so c k 1 − < m 2 r 2N 2 with r < N (32) (31) . According to this theorem. for each k = 0. Introducing g(x) = sin x m m r x we have 1 1 1 r γ prob(c) > ( − m )g(Aθmax /2) = (1 − m )g(Aθmax /2) > (30) r 2 r 2 r for a constant γ. sin x x is decreasing on 2 Next from eq. (30) we get prob(c) > γ/r 2 for γ ≈ 4/π 2 . .COMSM0214 Write Aθmax = π(1 + r/2m ). (29) and the fact that 0 < x < π.3 Getting r from a good c value Suppose we have c satisfying eq. We have Aθmax /2 = π (1+r/2m ) ≈ π/2 so g(π/2) = (2/π)2 and from eq. To complete the determination of r and hence the description of the quantum factoring algorithm. Here we will just consider the case of very large N and ignore terms of order r/2m < 1/N . (27) with probability at least γ/r. (26) we have A > 2m /r − 1 so 2A > 1 − 21 . . To estimate prob(c) we’ll use the algebraic identity 1 − eiAθ 1 − eiθ We have Prob(c) = > = > 1 A2m 1 A2m A 2m A 2m sin Aθc /2 sin θc /2 sin Aθc /2 θc /2 sin Aθc /2 Aθc /2 2 2 58 (29) = sin Aθ/2 sin θ/2 2 2 2 2 . To get a proper lower bound for γ is straightforward but a little messy.e. . r − 1 we will obtain the unique c-value satisfying eq. Note that for all c 0 ≤ |Aθc /2| < Aθmax /2 < π. Hence the total probability of obtaining such a “good” c-value is O(1/ log log r) > O(1/ log log N ) and with O(log log N ) repetitions we will obtain such a good c-value with any desired constant level of probability. 1 c k < m+1 .

12.3 Getting $r$ from a good $c$ value

Suppose we have $c$ satisfying eq. (27), i.e.
$$\left|\frac{c}{2^m} - \frac{k}{r}\right| < \frac{1}{2^{m+1}}. \tag{31}$$
Recall that $r < N$ and $2^m > N^2$, so
$$\left|\frac{c}{2^m} - \frac{k}{r}\right| < \frac{1}{2N^2} \quad \text{with } r < N, \tag{32}$$
and $c/2^m$ is a known fraction. We claim that there is at most one fraction $k'/r'$ with a denominator $r'$ less than $N$ satisfying eq. (32). Hence for given $c/2^m$, eq. (32) determines $k/r$ uniquely. To prove this claim suppose $k'/r'$ and $k''/r''$ are two distinct fractions (with $r', r'' < N$) that both lie within $1/(2N^2)$ of $c/2^m$. Then
$$\left|\frac{k'}{r'} - \frac{k''}{r''}\right| = \frac{|k'r'' - r'k''|}{r'r''} \ge \frac{1}{r'r''} > \frac{1}{N^2}. \tag{33}$$
But $k'/r'$ and $k''/r''$ are both within $1/(2N^2)$ of $c/2^m$, so they must be within $1/N^2$ of each other, contradicting eq. (33). Hence there is at most one $k/r$ with $r < N$ satisfying eq. (32). This result is the reason why we chose $2^m$ to be greater than $N^2$: it guarantees that the bound on the RHS of eq. (32) is $< 1/(2N^2)$ and then $k/r$ is uniquely determined from $c/2^m$.

Example 14 Suppose we wish to factor $N = 39$ and we have chosen $a = 7$, which is coprime to $N$. Let $r$ be the period of $f(x) = 7^x \bmod 39$. We have $N^2 = 1521$ and $2^{10} < N^2 < 2^{11} = 2048 = 2^m$, so $m = 11$. Suppose the measurement of $\mathrm{QFT}_{2^m}|\mathrm{per}\rangle$ yields $c = 853$. According to our theory, this number has a "reasonable" probability to be within $1/2$ of a multiple $k2^{11}/r$ of $2^m/r$. If this is actually the case then our theory guarantees that the fraction $k/r$ is uniquely determined. In this example we can (with a calculator) check all fractions $a/b$ with $a < b < N = 39$ to see which ones (if any) satisfy
$$\left|\frac{853}{2048} - \frac{a}{b}\right| < \frac{1}{2^{12}}. \tag{34}$$
We find that there is only one, viz. $a/b = 5/12$, that satisfies eq. (34):
$$\left|\frac{5}{12} - \frac{853}{2048}\right| = 0.000163 < \frac{1}{2^{12}} = 0.000244.$$
This result is consistent with $k = 5$ and $r = 12$ and also with $k = 10$ and $r = 24$. But our theory also guarantees that $k$ is coprime to $r$ with "reasonable" probability, which in this case sets $r = 12$. We can then verify that $7^{12}$ is indeed congruent to $1 \bmod 39$ and $7^x$ for all $x < 12$ is not congruent to 1, so $r = 12$ is the correct period.

So far we have that $k/r$ is uniquely determined by $c/2^m$, but how do we actually compute $k/r$ from $c/2^m$? In the above example we were able to try out all candidate fractions $k'/r'$ with denominator less than $N$. But there are generally $O(N^2)$ such fractions to try, so this method of seeking the unique one is not efficient: it requires of order $N^2$ steps, which is exponential in $n = \log N$!
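The exhaustive search of example 14 is easy to reproduce with exact rational arithmetic using Python's `fractions.Fraction`. The sketch below (illustrative function name) is precisely the $O(N^2)$ method that the continued fraction approach replaces.

```python
from fractions import Fraction

def candidates(c, m, N):
    """All fractions a/b (in lowest terms) with 0 <= a < b < N and |c/2^m - a/b| < 1/2^(m+1)."""
    x = Fraction(c, 2 ** m)
    bound = Fraction(1, 2 ** (m + 1))
    found = set()
    for b in range(1, N):
        for a in range(b):
            if abs(x - Fraction(a, b)) < bound:
                found.add(Fraction(a, b))   # Fraction reduces, so 10/24 and 5/12 coincide
    return sorted(found)
```

`candidates(853, 11, 39)` returns the single fraction $5/12$, and `pow(7, 12, 39) == 1` then confirms the period.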

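To obtain an efficient (i.e. $\mathrm{poly}(n)$ time) method we invoke the elegant mathematical theory of continued fractions.

Theory of continued fractions

Any rational number $s/t$ (with $0 < s < t$) may be expressed as a so-called continued fraction:
$$\frac{s}{t} = \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cfrac{1}{\ddots + \cfrac{1}{a_l}}}} \tag{35}$$
where $a_1, \ldots, a_l$ are positive integers. To do this we begin by writing $s/t = 1/(t/s)$. Since $s < t$ we have $t/s = a_1 + s_1/t_1$ with $a_1 \ge 1$ and $s_1 < t_1 = s$, and so
$$\frac{s}{t} = \cfrac{1}{a_1 + \cfrac{s_1}{t_1}}.$$
Then repeating with $s_1/t_1$ we get $t_1/s_1 = a_2 + s_2/t_2$ with $t_2 = s_1$, and
$$\frac{s}{t} = \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cfrac{s_2}{t_2}}}.$$
Continuing in this way we get a sequence of integers $a_k$, $s_k$ and $t_k$. Note that $s_k < t_k$ and $t_{k+1}$ is always given by $s_k$. Hence the sequence $t_k$ of denominators is a strictly decreasing sequence of non-negative integers and hence the process must always terminate, after some number $l$ of iterations, giving the expression in eq. (35). To avoid the cumbersome "fractions of fractions" notation in eq. (35) we will write
$$\cfrac{1}{a_1 + \cfrac{1}{a_2 + \cfrac{1}{\ddots + \cfrac{1}{a_l}}}} = [a_1, \ldots, a_l]. \tag{36}$$
For each $k = 1, \ldots, l$ we can truncate the fraction in (36) at the $k$th level to get a sequence of rational numbers
$$\frac{p_1}{q_1} = [a_1] = \frac{1}{a_1}, \qquad \frac{p_2}{q_2} = [a_1, a_2] = \cfrac{1}{a_1 + \cfrac{1}{a_2}} = \frac{a_2}{a_1a_2 + 1}, \qquad \ldots, \qquad \frac{p_k}{q_k} = [a_1, \ldots, a_k], \qquad \ldots, \qquad \frac{p_l}{q_l} = [a_1, \ldots, a_l] = \frac{s}{t}.$$
$p_k/q_k$ is called the $k$th convergent of the continued fraction of $s/t$. Continued fractions enjoy the following tantalising properties.

Lemma 4 Let $a_1, \ldots, a_l$ be any positive numbers (not necessarily integers here). Set $p_0 = 0$, $q_0 = 1$, $p_1 = 1$ and $q_1 = a_1$.
(a) Then $[a_1, \ldots, a_k] = p_k/q_k$ where
$$p_k = a_kp_{k-1} + p_{k-2}, \qquad q_k = a_kq_{k-1} + q_{k-2}, \qquad k \ge 2. \tag{37}$$
Note that if the $a_k$'s are integers then so are the $p_k$'s and $q_k$'s.
(b) $q_kp_{k-1} - p_kq_{k-1} = (-1)^k$ for $k \ge 1$.
(c) If $a_1, \ldots, a_l$ are integers then $\gcd(p_k, q_k) = 1$ for $k \ge 1$.

Proof outline (optional): (a) By induction on $k$. For the base case $k = 2$ direct calculation gives $[a_1, a_2] = a_2/(a_1a_2 + 1)$, and eq. (37) correctly gives $p_2 = a_2$ and $q_2 = a_1a_2 + 1$. Thus suppose eq. (37) holds for length $k$. For length $k + 1$ we have
$$[a_1, \ldots, a_k, a_{k+1}] = [a_1, \ldots, a_{k-1}, a_k + 1/a_{k+1}]$$
where the RHS now has length $k$. Let $\tilde p_j/\tilde q_j$ be the sequence of convergents of the RHS. Then $\tilde p_k/\tilde q_k = [a_1, \ldots, a_{k-1}, a_k + 1/a_{k+1}]$ and clearly $\tilde p_{k-1} = p_{k-1}$, $\tilde p_{k-2} = p_{k-2}$, and similarly for the $q$'s. Hence using the recurrence relation eq. (37) at length $k$ (twice) we get:
$$\frac{\tilde p_k}{\tilde q_k} = \frac{(a_k + 1/a_{k+1})p_{k-1} + p_{k-2}}{(a_k + 1/a_{k+1})q_{k-1} + q_{k-2}} = \frac{p_k + p_{k-1}/a_{k+1}}{q_k + q_{k-1}/a_{k+1}} = \frac{a_{k+1}p_k + p_{k-1}}{a_{k+1}q_k + q_{k-1}},$$
i.e. eq. (37) holds for $k + 1$.
(b) is proved by induction on $k$, using the recurrence relations of (a) to express the $(k, k-1)$ expression in terms of the same expression with lower values of the subscripts.
(c) follows from (b): if $d$ divides $p_k$ and $q_k$ exactly then by (b), $d$ must divide $\pm 1$, i.e. $d = 1$.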
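Eqs. (35)-(37) translate directly into code. The sketch below computes $[a_1, \ldots, a_l]$ by the repeated division described above (which is Euclid's algorithm), the convergents by the recurrence (37), and checks lemma 4 (a)-(c) against direct evaluation with exact fractions; the function names are illustrative.

```python
from fractions import Fraction
from math import gcd

def continued_fraction(s, t):
    """[a_1, ..., a_l] with s/t = 1/(a_1 + 1/(a_2 + ... + 1/a_l)), for 0 < s < t."""
    terms = []
    while s:
        a, rem = divmod(t, s)    # t/s = a + rem/s
        terms.append(a)
        s, t = rem, s            # continue with rem/s
    return terms

def convergents(terms):
    """Pairs (p_k, q_k) from eq. (37), starting from p_0 = 0, q_0 = 1, p_1 = 1, q_1 = a_1."""
    p_prev, q_prev, p, q = 0, 1, 1, terms[0]
    result = [(p, q)]
    for a in terms[1:]:
        p, p_prev = a * p + p_prev, p
        q, q_prev = a * q + q_prev, q
        result.append((p, q))
    return result

def evaluate(terms):
    """Directly evaluate [a_1, ..., a_k] with exact fractions, innermost term first."""
    value = Fraction(0)
    for a in reversed(terms):
        value = 1 / (a + value)
    return value

def check_lemma4(terms):
    """Verify lemma 4 (a), (b), (c) for a list of positive integers."""
    conv = [(0, 1)] + convergents(terms)           # prepend (p_0, q_0)
    for k in range(1, len(conv)):
        p, q = conv[k]
        p_prev, q_prev = conv[k - 1]
        if Fraction(p, q) != evaluate(terms[:k]):  # (a)
            return False
        if q * p_prev - p * q_prev != (-1) ** k:   # (b)
            return False
        if gcd(p, q) != 1:                         # (c)
            return False
    return True
```

For instance `continued_fraction(853, 2048)` gives `[2, 2, 2, 42, 4]`, and `check_lemma4` holds for it.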

Theorem 7 Consider the continued fraction $s/t = [a_1, \ldots, a_l]$. Let $p_k/q_k = [a_1, \ldots, a_k]$ be the $k$th convergent for $k = 1, \ldots, l$. If $s$ and $t$ (cancelled to lowest terms) are $m$ bit integers then the length $l$ of the continued fraction is $O(m)$, and this continued fraction together with its convergents can be calculated in time $O(m^3)$.

Proof outline (optional): We have $a_k \ge 1$ and $p_k, q_k \ge 1$ (for $k \ge 1$), so by the above recurrence relations $p_k$ and $q_k$ must be increasing sequences and $p_k = a_kp_{k-1} + p_{k-2} \ge 2p_{k-2}$. Similarly $q_k \ge 2q_{k-2}$. Hence $p_k$ and $q_k$ at least double every two steps, i.e. they grow at least like $2^{k/2}$, so since $p_k$ and $q_k$ are coprime and increasing, we must get $s/t$ after at most $l = O(m)$ iterations. The computation of each successive $a_k$ involves the division of $O(m)$ bit integers (and splitting off the integer parts). These arithmetic operations can be performed in $O(m^2)$ time, so we can compute all $O(m)$ $a_k$'s in $O(m^3)$ time. Similarly, using the recurrence relation we can compute all $p_k$'s and $q_k$'s in $O(m^3)$ time too.
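Theorem 8 Let $0 < x < 1$ be a rational number and suppose that $p/q$ is a rational number such that
$$\left|x - \frac{p}{q}\right| < \frac{1}{2q^2}.$$
Then $p/q$ is a convergent of the continued fraction of $x$.

Proof (optional): Let $p/q = [a_1, \ldots, a_n]$ be the CF expansion of $p/q$ with convergents $p_j/q_j$, so $p_n/q_n = p/q$. We aim to show that the CF of $x$ is an extension of the CF of $p/q$, i.e. we want to construct $\lambda$ rational so that $x = [a_1, \ldots, a_n, \lambda]$. In view of lemma 4(a), define $\lambda$ by $x = (\lambda p_n + p_{n-1})/(\lambda q_n + q_{n-1})$. Introduce $\delta$ defined by
$$x = \frac{p_n}{q_n} + \frac{\delta}{2q_n^2}, \tag{38}$$
so $|\delta| < 1$. Using eq. (38) to replace $x$ we get
$$\lambda = 2\,\frac{q_np_{n-1} - p_nq_{n-1}}{\delta} - \frac{q_{n-1}}{q_n}.$$
By lemma 4(b), $q_np_{n-1} - p_nq_{n-1} = (-1)^n$. We may assume that this is the same as the sign of $\delta$, since if it is the opposite sign then from the start we write $p/q = [a_1, \ldots, a_n - 1, 1]$, so the value of $n$ is increased by 1 and the sign is flipped. Thus without loss of generality we can assume that $(q_np_{n-1} - p_nq_{n-1})/\delta$ is positive, and so
$$\lambda = \frac{2}{|\delta|} - \frac{q_{n-1}}{q_n} > 2 - 1 = 1$$
(as $|\delta| < 1$ and $q_{n-1} \le q_n$). Next let $\lambda = b_0 + \lambda'$ where $b_0$ is the integer part and $0 \le \lambda' < 1$, and (if $\lambda' \neq 0$) write $\lambda' = [b_1, \ldots, b_m]$. So $x = [a_1, \ldots, a_n, \lambda] = [a_1, \ldots, a_n, b_0, b_1, \ldots, b_m]$, i.e. $p/q$ is a convergent of the CF of $x$ as required. (In the last argument we also used the easily proven fact that the CF expansion of any number is unique, except for the above trick of splitting 1 off from the last term, i.e. if $[a_1, \ldots, a_n] = [b_1, \ldots, b_m]$ and $a_n, b_m \neq 1$ then $m = n$ and $a_i = b_i$.)

Remark: Theorem 8 actually remains true for irrational $x$ too. For an irrational number the continued fraction development does not terminate: we get an infinitely long continued fraction and a corresponding infinite sequence of rational convergents $p_k/q_k$, $k = 1, 2, \ldots$. This sequence provides an efficient method of computing excellent rational approximations to an irrational number, recalling that $q_k$ grows exponentially with $k$ and that the $k$th convergent is within $1/q_k^2$ of the number.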
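Theorem 8 can be spot-checked by brute force: for a rational $x$, every $p/q$ with $|x - p/q| < 1/(2q^2)$ should appear among the convergents of $x$. The sketch restricts to $0 < p/q \le 1$ (in the notation $[a_1, \ldots, a_n]$ every convergent is positive) and to small $q$; the names are illustrative, and a returned fraction would be a counterexample.

```python
from fractions import Fraction

def cf_convergents(x):
    """Convergents of 0 < x < 1 (a Fraction), via repeated division and eq. (37)."""
    s, t = x.numerator, x.denominator
    terms = []
    while s:
        a, rem = divmod(t, s)
        terms.append(a)
        s, t = rem, s
    p_prev, q_prev, p, q = 0, 1, 1, terms[0]
    out = [Fraction(p, q)]
    for a in terms[1:]:
        p, p_prev = a * p + p_prev, p
        q, q_prev = a * q + q_prev, q
        out.append(Fraction(p, q))
    return out

def check_theorem8(x, max_q):
    """Return a p/q (q <= max_q) with |x - p/q| < 1/(2 q^2) that is NOT a convergent, or None."""
    conv = set(cf_convergents(x))
    for q in range(1, max_q + 1):
        for p in range(1, q + 1):
            close = abs(x - Fraction(p, q)) < Fraction(1, 2 * q * q)
            if close and Fraction(p, q) not in conv:
                return Fraction(p, q)
    return None
```

For $x = 853/2048$ and denominators up to 60 the check finds no counterexample, and $5/12$ is among the convergents.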

Now let us return to our problem of getting $r$ from the knowledge of $c$ and $2^m$ satisfying eq. (32):
$$\left|\frac{c}{2^m} - \frac{k}{r}\right| < \frac{1}{2N^2} \quad \text{and} \quad r < N.$$
We know that there is (at most) a unique such fraction $k/r$, and according to theorem 8 this fraction must be a convergent of the continued fraction of $c/2^m$ (theorem 8 applies because $r < N$ gives $1/(2N^2) < 1/(2r^2)$). Since $2^m = O(N^2)$, $c$ and $2^m$ are $O(n)$ bit integers and the computation of all the convergents can be performed in time $O(n^3)$. So we do this computation and finally check through the list of $O(n)$ convergents to find the unique one satisfying eq. (32), and read off $r$ as its denominator.
and read oﬀ r as its denominator. 2 5 12 509 2048 Checking these ﬁve fractions we ﬁnd only 5/12 as being within 1/212 of 853/2048 and having denominator < 39. Thus without loss of generality we can assume that (qn pn−1 − pn qn−1 )/δ is positive and so 2 qn−1 λ= − >2−1>1 δ qn (as |δ| < 1 and qn−1 < qn ). Example 15 (Continuation of example 14). 4] = . =2+ . bm ] and an . 2. We may assume that this is the same as the sign of δ since if it is the opposite sign then from the start write p/q = [a1 . . .
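The continued-fraction step of Example 15 can be sketched in Python as follows. This is a minimal sketch, assuming the notes' convention $[a_1, \ldots, a_k] = 1/(a_1 + 1/(a_2 + \cdots + 1/a_k))$ for numbers in $(0,1)$ and the acceptance test of eq. (32) in the form $|c/2^m - k/r| < 1/(2N^2)$ with $r < N$. The function names `cf_digits`, `convergents` and `candidate_period` are hypothetical, chosen only for this illustration.

```python
from fractions import Fraction

# Hypothetical helper names, chosen for this illustration.


def cf_digits(p, q):
    """Continued fraction digits [a1, a2, ...] of p/q for 0 <= p < q,
    with [a1, ..., ak] = 1/(a1 + 1/(a2 + ... + 1/ak)).
    This is Euclid's algorithm: at each step q/p = a + rem/p."""
    digits = []
    while p != 0:
        a, rem = divmod(q, p)
        digits.append(a)
        p, q = rem, p
    return digits


def convergents(digits):
    """Convergents p_k/q_k from the recurrence
    p_k = a_k p_{k-1} + p_{k-2},  q_k = a_k q_{k-1} + q_{k-2},
    started from p_{-1}/q_{-1} = 1/0 and p_0/q_0 = 0/1."""
    p_prev, q_prev = 1, 0   # index k-2
    p_cur, q_cur = 0, 1     # index k-1
    result = []
    for a in digits:
        p_prev, p_cur = p_cur, a * p_cur + p_prev
        q_prev, q_cur = q_cur, a * q_cur + q_prev
        result.append((p_cur, q_cur))
    return result


def candidate_period(c, m, N):
    """Scan the convergents of c/2^m and return the denominator of the one with
    denominator < N lying within 1/(2N^2) of c/2^m, or None if there is none."""
    x = Fraction(c, 2 ** m)
    bound = Fraction(1, 2 * N * N)
    for p, q in convergents(cf_digits(c, 2 ** m)):
        if q < N and abs(x - Fraction(p, q)) < bound:
            return q
    return None


print(cf_digits(853, 2048))               # [2, 2, 2, 42, 4]
print(convergents(cf_digits(853, 2048)))  # [(1, 2), (2, 5), (5, 12), (212, 509), (853, 2048)]
print(candidate_period(853, 11, 39))      # 12
```

Because at most one fraction can satisfy eq. (32), returning the first match is enough; exact `Fraction` arithmetic avoids rounding trouble when $2^m$ is large.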

12.4 Assessing the complexity of the quantum factoring algorithm

Let us now consider all the parts of the quantum factoring algorithm and assess the time complexity of the whole process. Consider the case where $N$ is neither even nor a prime power and $a < N$ chosen at random is coprime to $N$. In this case we must proceed to use the quantum part of the overall algorithm summarised at the end of section 12.3, i.e. the quantum part (iv), in addition to some further classical computational steps as well.

We first need to compute the function $f(k) = a^k \bmod N$ (in superposition) over a domain $0 \le k < 2^m$ where $2^m = O(N^2)$, so $m = O(n)$. To compute the uniform superposition of all inputs for this computation we need $m = O(n)$ initial Hadamard operations. To compute $a^k$ we use repeated squaring of $a$, $\log k$ times; once the exponent is close to $k$ we do a few more multiplications to reach $k$ itself. This requires $O(\log k) = O(m) = O(n)$ multiplications of integers mod $N$. Each such multiplication can be performed in $O(n^2)$ time (by the standard "long multiplication" algorithm), so the computation of $f(k)$ for any $0 \le k < 2^m$ can be performed in $O(n^3)$ steps. Thus the state

$$|f\rangle = \frac{1}{\sqrt{2^m}} \sum_{k=0}^{2^m - 1} |k\rangle|f(k)\rangle$$

can be computed in $O(n^3)$ steps.

Remark: There exist algorithms for integer multiplication that are faster than $O(n^2)$ time, running in time $O(n \log n \log\log n)$, so the above $O(n^3)$ can be improved to $O(n^2 \log n \log\log n)$.

Next we perform measurements on the output register of $O(n)$ qubits, i.e. $O(n)$ single qubit measurements. Then we apply QFT mod $2^m$ to obtain the state QFT$|\mathrm{per}\rangle$. We have seen in section 11.3 that QFT mod $2^m$ may be implemented in $O(m^2) = O(n^2)$ steps.

Remark: There is a further subtle issue here. To implement QFT mod $2^m$ we will need controlled-$R_d$ gates (cf. eq. (23)) with smaller and smaller phases $e^{i\pi/2^d}$ for $d = O(m)$, which potentially involves an implementational cost that grows with $m$. However, it can be shown that we can neglect these gates for very small phases, giving an inexact but still suitably good approximation to QFT for the factoring algorithm to work, and still have implementational cost $O(n^2)$.

Next we measure the state QFT$|\mathrm{per}\rangle$ ($O(n)$ single qubit measurements again) to obtain the value that we called $c$ earlier in section 12. Thus to get such a value the number of steps is

$$O(n^2 \log n \log\log n) + O(n) + O(n^2) + O(n) = O(n^2 \log n \log\log n).$$

To get the period $r$ we need $c$ to be a "good" $c$ value, i.e. $c/2^m$ is close to a multiple $k/r$ of $1/r$ where $k$ is coprime to $r$. To achieve this with a constant level of probability, $O(\log\log N) = O(\log n)$ repetitions of the above process suffice, i.e. $O(n^2 (\log n)^2 \log\log n)$ steps in all.

Remark: Actually it may be shown that a constant number of repetitions suffices here (instead of $O(\log n)$) to determine $r$. Suppose that in two repetitions we obtain $k_1/r$ and $k_2/r$ with neither $k_1$ nor $k_2$ coprime to $r$. Then we will determine $r_1$ and $r_2$, which are the denominators of $k_1/r$ and $k_2/r$ cancelled to lowest terms, i.e. $r_1$ and $r_2$ will be randomly chosen factors of $r$. Then, according to a further theorem of number theory, if we compute the least common multiple $\tilde r$ of $r_1$ and $r_2$ we will have $\tilde r = r$ with probability at least $1/4$.

To get $r$ from $c$ we use the (classical) continued fractions algorithm, which requires $O(n^3)$ steps. Finally, to obtain our factor of $N$ we (classically) compute $t = \gcd(a^{r/2} + 1, N)$ using Euclid's algorithm, which requires $O(n^3)$ steps for $n$-digit integers. If $r$ is odd, or $r$ is even but $t = 1$, then we go back to the start. But we saw that the good case "$r$ is even and $t \neq 1$" will occur with any fixed constant level of probability $1 - \epsilon$ after a constant number $O(\log 1/\epsilon)$ of such repetitions. Hence the time complexity of the entire algorithm is $O(n^3)$ (or actually slightly better with optimized algorithms and a more careful, complicated analysis). Recall that the best known classical algorithm to factor $N$ with $n = \log N$ digits runs in a time that's exponential in $n^{1/3}$. It is amusing to note that the "bottlenecks" of the algorithm's performance, i.e. the sections requiring the highest degree polynomial running times, are actually the classical processing sections and not the novel quantum parts!

13 Quantum algorithms for search problems

Searching is a fundamentally important computational task. Indeed, most important computational problems can be thought of as searching tasks. Typically we are faced with a search over an exponentially large space of candidates seeking a "good" candidate, and given any candidate it is easy to check if it is good or not. For example consider the class NP, which we intuitively think of as problems that are "hard to solve" (i.e. no poly time algorithm known) but for which, if a solution (or certificate of a solution) is given, its correctness can be "easily verified" (i.e. in poly time).

In this section we will consider the following problem. Suppose we are given a large database with $N$ items and we wish to locate a particular item. We assume that the database is entirely unstructured or unsorted, but given any item we can easily check whether or not it is the one we seek. Each access to the database is called a query and we normally regard it as one computational step. Our algorithm should locate the item with some constant level of probability (half, say) independent of the size $N$.

For classical computation we may argue that $O(N)$ queries will be necessary and sufficient: the good item has completely unknown location, and if we examine an item and find it bad, we gain no further information about the location of the good item (beyond the fact that it is not the current one). Hence if we examine $m$ items the probability of seeing the good one is $p = m/N$, so we must have $m = O(N)$ to have $p$ constant.

For quantum computation we will see that $O(\sqrt N)$ queries are sufficient (and necessary) to locate the good item, i.e. we get a quadratic speedup over classical search. This speedup does not cross the polynomial vs. exponential divide (as happened in the case of the factoring algorithm), but it is still viewed as significant in situations where exhaustive search is the best known classical algorithm.

At first sight we might have naively expected an exponential quantum speedup here: suppose $N = 2^n$ and recall that a quantum algorithm can easily access $2^n$ items in superposition (by use of only $n = \log N$ Hadamard operations), so we can look up the "goodness" of all items in superposition, with just one query! We may then hope that we could manipulate the resulting quantum state to efficiently reveal the good item. But the above-quoted result shows that this hope cannot be realised. Intuitively, the good item occurs with only an exponentially small amplitude in the total superposition. If the item were relocated at another place then the corresponding quantum state would differ only by an exponentially small amount in the space of quantum states, and it would thus be very difficult to reliably distinguish by any physical process.

Above we have initiated a consideration of unstructured search. But databases are often structured in a way that can facilitate the search. As an example suppose our $N$ items are labelled by the numbers from 1 to $N$ and we seek the one labelled $k$. Unstructured search (requiring $O(N)$ queries) corresponds to the database containing the numbers in some unknown random order. But if the items are structured by being presented in numerical order, then we can locate $k$ with only $O(\log N)$ queries (in fact exactly $1 \times \log N$ queries) using a binary search procedure: each query of a middle item eliminates an entire half of the remaining database. This kind of structured search is common in practice, e.g. the lexicographic ordering of names in a large phone book facilitates search for a given person's number. But suppose we were given a person's number and asked to determine their name. Then we would be faced with an essentially unstructured search requiring a lot more time! (One interesting known result is that in the case of a linearly ordered database, such as the phone book above, any quantum algorithm still requires $O(\log N)$ queries, but the actual number of queries now is $\gamma \log N$ with a constant $\gamma$ strictly less than 1.) The issue of understanding which kinds of structure in a database can provide a good benefit for quantum versus classical computation is still largely open and a topic of current research.

In the following we will consider quantum algorithms for only unstructured search, in particular Grover's quantum searching algorithm, which achieves this search in $O(\sqrt N)$ queries. As a preliminary to our discussion of Grover's algorithm we introduce some further features of the Dirac bra-ket notation.

13.1 Reflections and projections in Dirac ket notation

For clarity we will present the discussion here with 2 dimensional (qubit) states, although all the issues generalise readily to any dimension $d$. Recall that a ket $|\psi\rangle$ is represented in components as a column vector and the corresponding bra $\langle\psi|$ is just a notation for the conjugate transpose (a row vector):

$$\begin{pmatrix} a_0 \\ a_1 \end{pmatrix}^\dagger = \begin{pmatrix} a_0^* & a_1^* \end{pmatrix}.$$

Here we use the dagger symbol to denote the conjugate transpose of any matrix, and the star symbol (rather than the previously used overline) to denote complex conjugation. If

$$|\alpha\rangle = \begin{pmatrix} a_0 \\ a_1 \end{pmatrix}, \qquad |\beta\rangle = \begin{pmatrix} b_0 \\ b_1 \end{pmatrix}$$

are any kets then $\langle\alpha|\beta\rangle$ is the inner product obtained by matrix multiplication:

$$\langle\alpha|\beta\rangle = \begin{pmatrix} a_0^* & a_1^* \end{pmatrix}\begin{pmatrix} b_0 \\ b_1 \end{pmatrix} = a_0^* b_0 + a_1^* b_1.$$

Thus we can view $\langle\alpha|$ as a mapping from states to complex numbers, mapping any $|\psi\rangle$ to $\langle\alpha|\psi\rangle$. Consider now the construction

$$P_{|\alpha\rangle} = |\alpha\rangle\langle\alpha| = \begin{pmatrix} a_0 \\ a_1 \end{pmatrix}\begin{pmatrix} a_0^* & a_1^* \end{pmatrix} = \begin{pmatrix} a_0 a_0^* & a_0 a_1^* \\ a_1 a_0^* & a_1 a_1^* \end{pmatrix}.$$

This is a 2-by-2 matrix, i.e. an operation mapping kets to kets. In Dirac notation we have $P_{|\alpha\rangle}|\beta\rangle = |\alpha\rangle\langle\alpha|\beta\rangle$, i.e. $P_{|\alpha\rangle}|\beta\rangle$ is always proportional to $|\alpha\rangle$ with multiplicative constant given by the inner product $\langle\alpha|\beta\rangle$. More cumbersomely, in terms of components we have

$$\begin{pmatrix} a_0 a_0^* & a_0 a_1^* \\ a_1 a_0^* & a_1 a_1^* \end{pmatrix}\begin{pmatrix} b_0 \\ b_1 \end{pmatrix} = \begin{pmatrix} (a_0^* b_0 + a_1^* b_1)\, a_0 \\ (a_0^* b_0 + a_1^* b_1)\, a_1 \end{pmatrix}$$

as expected. Now let $|\alpha^\perp\rangle$ be any chosen normalised vector that's orthogonal to $|\alpha\rangle$, i.e. $\langle\alpha^\perp|\alpha\rangle = 0$. Then $\{|\alpha\rangle, |\alpha^\perp\rangle\}$ is an orthonormal basis of the two dimensional space, so any ket $|\beta\rangle$ can be uniquely written as components parallel and perpendicular to $|\alpha\rangle$: $|\beta\rangle = x|\alpha\rangle + y|\alpha^\perp\rangle$ for some $x$ and $y$ with $|x|^2 + |y|^2 = 1$. Then

$$P_{|\alpha\rangle}|\beta\rangle = x|\alpha\rangle\langle\alpha|\alpha\rangle + y|\alpha\rangle\langle\alpha|\alpha^\perp\rangle = x|\alpha\rangle, \tag{39}$$

i.e. geometrically $P_{|\alpha\rangle}$ is the operator of projection onto the line through $|\alpha\rangle$. Next consider $I_{|\alpha\rangle} = I - 2|\alpha\rangle\langle\alpha| = I - 2P_{|\alpha\rangle}$ (where $I$ is the identity operator). Referring to eq. (39) we have

$$I_{|\alpha\rangle}|\beta\rangle = (I - 2P_{|\alpha\rangle})(x|\alpha\rangle + y|\alpha^\perp\rangle) = x|\alpha\rangle + y|\alpha^\perp\rangle - 2x|\alpha\rangle = -x|\alpha\rangle + y|\alpha^\perp\rangle.$$

Hence $I_{|\alpha\rangle}$ simply reverses the sign of the component of $|\beta\rangle$ that's parallel to $|\alpha\rangle$, so geometrically we interpret $I_{|\alpha\rangle}$ as a reflection operator, reflecting in the mirror line $\alpha^\perp$ perpendicular to $|\alpha\rangle$. Pictorially we have:

[Figure: $|\beta\rangle$ together with its projection $P_{|\alpha\rangle}|\beta\rangle$ onto the $|\alpha\rangle$ axis and its reflection $I_{|\alpha\rangle}|\beta\rangle$ in the mirror line $\alpha^\perp$.]

Note that these operations act in a space of complex vectors. But in the special case that all components are real numbers, they are exactly projections and reflections in real 2-dimensional Euclidean geometry.

Note that the dagger operation (of conjugate transposition) reverses the order of any matrix multiplication: $(AB)^\dagger = B^\dagger A^\dagger$ for any matrices $A$ and $B$. Let

$$U = \begin{pmatrix} s & t \\ u & v \end{pmatrix}$$

be a unitary operation. In components, for $|\psi\rangle = U|\alpha\rangle$ we have

$$\langle\psi| = \left(\begin{pmatrix} s & t \\ u & v \end{pmatrix}\begin{pmatrix} a_0 \\ a_1 \end{pmatrix}\right)^\dagger = \begin{pmatrix} a_0^* & a_1^* \end{pmatrix}\begin{pmatrix} s^* & u^* \\ t^* & v^* \end{pmatrix}.$$

Hence if $|\psi\rangle = U|\alpha\rangle$ then $\langle\psi| = (U|\alpha\rangle)^\dagger = |\alpha\rangle^\dagger U^\dagger = \langle\alpha|U^\dagger$. Thus in Dirac notation, if $|\psi\rangle = U|\alpha\rangle$ then $P_{|\psi\rangle} = |\psi\rangle\langle\psi| = U|\alpha\rangle\langle\alpha|U^\dagger = U P_{|\alpha\rangle} U^\dagger$, i.e. we have $P_{U|\alpha\rangle} = U P_{|\alpha\rangle} U^\dagger$. Similarly we have

$$I_{U|\alpha\rangle} = U I_{|\alpha\rangle} U^\dagger. \tag{40}$$

13.2 Grover's Quantum Searching Algorithm

We consider the fundamental problem of unstructured search for a unique item: given a search space of size $N$ in which a single (random) entry has been marked, our task is to locate it. More precisely, we wish to devise a procedure that will locate it with some constant level of probability (say $\frac12$), independent of the size $N$ of the database. We have argued in the introduction that any classical method will need $O(N)$ queries to solve this problem. In the quantum context we allow the simultaneous querying of many elements of the search space in superposition, which counts as one query. We now describe a quantum algorithm, originally due to Lov Grover in 1996, which solves the problem with only $O(\sqrt N)$ queries. We will give a simple geometrical interpretation of Grover's algorithm which clarifies its workings.

It will be convenient to take the size $N$ of our search space to be a power of 2, viz. $N = 2^n$. Thus we can label the entries by bit strings (i.e. strings of 0's and 1's) of length $n$. Let $B = \{0, 1\}$ and let $B^n$ denote the set of all $2^n$ $n$-bit strings. Our search problem may then be phrased in terms of a black box promise problem as follows. We will replace the database by a black box which computes an $n$-bit function $f: B^n \to B$. It is promised that $f(x) = 0$ for all $n$-bit strings except exactly one string, denoted $x_0$ (the "marked" position that we seek), for which $f(x_0) = 1$. Our problem is to determine $x_0$.

As usual we assume that $f$ is given as a unitary transformation $U_f$ on $n + 1$ qubits defined by

$$U_f|x\rangle|y\rangle = |x\rangle|y \oplus f(x)\rangle. \tag{41}$$

Here the input register $|x\rangle$ consists of $n$ qubits as $x$ ranges over all $n$-bit strings, and the output register $|y\rangle$ consists of a single qubit with $y = 0$ or $1$. The symbol $\oplus$ denotes addition modulo 2. Pictorially we have

[Figure: the black box $U_f$ takes $|x\rangle|y\rangle$ to $|x\rangle|y \oplus f(x)\rangle$.]

The assumption that the database is unstructured is formalised here as the standard oracle idealisation that we have no access to the internal workings of $U_f$: it operates as a "black box" on the input and output registers, telling us only if the queried item is good or not.

Instead of using $U_f$ we will generally use a closely related operation denoted $I_{x_0}$ on $n$ qubits. It is defined by

$$I_{x_0}|x\rangle = \begin{cases} |x\rangle & \text{if } x \neq x_0 \\ -|x_0\rangle & \text{if } x = x_0 \end{cases} \tag{42}$$

i.e. $I_{x_0}$ simply inverts the amplitude of the $|x_0\rangle$ component. If $x_0$ is the $n$-bit string $00\ldots0$ then $I_{x_0}$ will be written simply as $I_0$. A black box which performs $I_{x_0}$ may be simply constructed from $U_f$ by just setting the output register to $\frac{1}{\sqrt2}(|0\rangle - |1\rangle)$. Then the action of $U_f$ leaves the output register in this state and effects $I_{x_0}$ on the input register, since $U_f|x\rangle\frac{1}{\sqrt2}(|0\rangle - |1\rangle) = (-1)^{f(x)}|x\rangle\frac{1}{\sqrt2}(|0\rangle - |1\rangle)$. Pictorially,

[Figure: with the output register prepared in $\frac{1}{\sqrt2}(|0\rangle - |1\rangle)$, $U_f$ maps $|\psi\rangle$ on the input register to $I_{x_0}|\psi\rangle$ and leaves the output register unchanged.]

Our searching problem becomes the following: we are given a black box which computes $I_{x_0}$ for some $n$-bit string $x_0$ and we want to determine the value of $x_0$ using the least number of queries to the box.

Grover's quantum searching algorithm operates as follows. We will work in a space of $n$ qubits with a standard basis $\{|x\rangle\}$ labelled by $n$-bit strings $x$. Let $B_n$ denote the space of all $n$-qubit states. Let $H_n = H \otimes \cdots \otimes H$ acting on $B_n$ denote the application of $H$ to each of the $n$ qubits separately. Having no initial information about $x_0$ we begin with the state

$$|\psi_0\rangle = H_n|0\rangle = \frac{1}{\sqrt{2^n}}\sum_x |x\rangle \tag{43}$$

which is an equal superposition of all possible $x_0$ values. Consider the compound operator $Q$ defined by

$$Q = -H_n I_0 H_n I_{x_0}. \tag{44}$$

Note that all amplitudes in $|\psi_0\rangle$ and all matrix elements of $Q$ are real numbers, so to analyse $Q$ we will be justified in using literally the geometrical interpretations of the operators described in section 13.1 (i.e. in terms of real Euclidean geometry). In the next section we will explain the structure of $Q$ and show that it has a simple geometrical interpretation:

(Q1): In the plane $P(x_0)$ spanned by (the initially unknown) $|x_0\rangle$ and $|\psi_0\rangle$, $Q$ is rotation through angle $2\alpha$ where $\sin\alpha = 1/\sqrt N$.

(Q2): In the subspace orthogonal to $P(x_0)$, $Q = -I$ where $I$ is the identity operation.

Thus by repeatedly applying $Q$ to the starting state $|\psi_0\rangle$ in $P(x_0)$ we may rotate it around near to $|x_0\rangle$ and then determine $x_0$ with high probability by a measurement in the standard basis. For large $N$, $|x_0\rangle$ and $|\psi_0\rangle$ are almost orthogonal and $2\alpha \approx 2\sin\alpha = 2/\sqrt N$, so about $\frac{\pi}{4}\sqrt N$ iterations will be needed. More precisely, we have $\langle x_0|\psi_0\rangle = 1/\sqrt N$, so the number of iterations needed is the integer nearest to $\arccos(1/\sqrt N)/(2\arcsin(1/\sqrt N))$ (which is independent of $x_0$). Each application of $Q$ uses one evaluation of $I_{x_0}$ and hence of $U_f$, so $O(\sqrt N)$ evaluations are required, representing a square root speedup over the $O(N)$ evaluations needed for a classical unstructured search.
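A small classical simulation can make this counting concrete. The sketch below works under our own assumptions: real amplitudes are stored in a plain Python list of length $N = 2^n$, and the function names are hypothetical. It applies $Q = -H_n I_0 H_n I_{x_0}$ using the identity $H_n I_0 H_n = I - 2|\psi_0\rangle\langle\psi_0|$, which holds because $H_n$ is self-inverse and $H_n|0\rangle = |\psi_0\rangle$. Hence $-H_n I_0 H_n = 2|\psi_0\rangle\langle\psi_0| - I$, which acts on the amplitudes as "inversion about the mean". It simulates the state vector only; it is not a model of quantum hardware.

```python
import math

# Hypothetical helper names, chosen for this illustration.


def grover_iterations(N):
    """Integer nearest to arccos(1/sqrt N) / (2 arcsin(1/sqrt N))."""
    s = 1 / math.sqrt(N)
    return round(math.acos(s) / (2 * math.asin(s)))


def apply_Q(amps, x0):
    """One application of Q = -H_n I_0 H_n I_{x0} to a real amplitude list."""
    amps = list(amps)
    amps[x0] = -amps[x0]                  # I_{x0}: flip the sign of the |x0> amplitude
    mean = sum(amps) / len(amps)
    return [2 * mean - a for a in amps]   # 2|psi0><psi0| - I: inversion about the mean


def grover_success_probability(n, x0):
    """Start from |psi0> = H_n|0>, apply Q the prescribed number of times,
    and return the probability of measuring x0 in the standard basis."""
    N = 2 ** n
    amps = [1 / math.sqrt(N)] * N         # eq. (43): equal superposition
    for _ in range(grover_iterations(N)):
        amps = apply_Q(amps, x0)
    return amps[x0] ** 2


print(grover_iterations(4), grover_success_probability(2, 3))  # 1 1.0
print(grover_iterations(1024))                                  # 25
```

For $N = 4$ a single iteration already gives success probability 1. For $N = 1024$ the count of 25 iterations is close to $\frac{\pi}{4}\sqrt N \approx 25.1$, and the success probability does not depend on which $x_0$ is marked.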

A simple striking example is the case of $N = 4$, in which $\sin\alpha = \frac12$ and $Q$ is a rotation through $\pi/3$. The initial state is $|\psi_0\rangle = \frac12(|0\rangle + |1\rangle + |2\rangle + |3\rangle)$, and for any marked $x_0$ the angle between $|x_0\rangle$ and $|\psi_0\rangle$ is precisely $\pi/3$ too. Hence after one application of $Q$, i.e. just one query, we will learn the position of any single marked item in a set of four with certainty!

13.3 The Iteration Operator Q: Reflections and Rotations

To explain the structure of $Q$ we begin with some elementary properties of reflections. For any state $|\psi\rangle$ in $B_n$ consider the operator

$$I_{|\psi\rangle} = I - 2|\psi\rangle\langle\psi| \tag{45}$$

where $I$ is the identity operator in $B_n$. Let $H^\perp(|\psi\rangle)$ denote the hyperplane (i.e. the $(N-1)$-dimensional subspace) of all states orthogonal to $|\psi\rangle$. Then any state $|\xi\rangle$ may be uniquely decomposed into components parallel and perpendicular to $|\psi\rangle$:

$$|\xi\rangle = a|\psi\rangle + b|\xi'\rangle \tag{46}$$

where $|\xi'\rangle$ is in $H^\perp(|\psi\rangle)$, and we get directly

$$I_{|\psi\rangle}|\xi\rangle = -a|\psi\rangle + b|\xi'\rangle. \tag{47}$$

Thus $I_{|\psi\rangle}$ simply inverts the parallel component, i.e. it can be thought of geometrically as the operation of reflection in the hyperplane orthogonal to $|\psi\rangle$. We have the following simple properties of $I_{|\psi\rangle}$.

Lemma 5: If $|\xi\rangle$ is any state then $I_{|\psi\rangle}$ preserves the 2-dimensional subspace $S$ spanned by $|\xi\rangle$ and $|\psi\rangle$.

[Figure: the plane $S$ spanned by $|\xi\rangle$ and $|\psi\rangle$, drawn together with the hyperplane $H^\perp(|\psi\rangle)$.]

Proof of lemma 5: Eq. (45) shows that $I_{|\psi\rangle}$ takes $|\psi\rangle$ to $-|\psi\rangle$ and, for any $|\xi\rangle$, adds a multiple of $|\psi\rangle$ to $|\xi\rangle$. Hence any linear combination of $|\xi\rangle$ and $|\psi\rangle$ is mapped to a linear combination of the same two states.
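The reflection $I_{|\psi\rangle}$ of eq. (45), together with the projection $P_{|\psi\rangle}$ of section 13.1, can be checked numerically. The following is a minimal sketch in plain Python with complex amplitude lists. The helper names `inner`, `project`, `reflect` and `close` are hypothetical, and $|\psi\rangle$ is assumed to be normalised. Writing $I_{|\psi\rangle}|\xi\rangle$ as $|\xi\rangle - 2\langle\psi|\xi\rangle|\psi\rangle$ also makes the proof of lemma 5 visible: the output is a linear combination of $|\xi\rangle$ and $|\psi\rangle$.

```python
import math

# Hypothetical helper names, chosen for this illustration.


def inner(a, b):
    """<a|b> = sum_i conj(a_i) b_i (the complex conjugate is taken on the bra side)."""
    return sum(x.conjugate() * y for x, y in zip(a, b))


def project(psi, xi):
    """P_{|psi>}|xi> = |psi><psi|xi>, for normalised |psi>."""
    c = inner(psi, xi)
    return [c * p for p in psi]


def reflect(psi, xi):
    """I_{|psi>}|xi> = (I - 2|psi><psi|)|xi>, eq. (45): subtract 2<psi|xi> times |psi>."""
    return [x - 2 * p for x, p in zip(xi, project(psi, xi))]


def close(u, v, tol=1e-12):
    return all(abs(a - b) < tol for a, b in zip(u, v))


psi = [1 / math.sqrt(2), 1j / math.sqrt(2), 0]   # a normalised state in C^3
perp = [1j, 1, 0]                                # <psi|perp> = 0, so perp lies in H_perp(|psi>)
xi = [1, 2, 3j]                                  # an arbitrary vector

print(close(reflect(psi, psi), [-p for p in psi]))  # True: |psi> -> -|psi>
print(close(reflect(psi, perp), perp))              # True: the hyperplane is left fixed
print(close(reflect(psi, reflect(psi, xi)), xi))    # True: reflecting twice gives the identity
```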

Lemma 6: For any unitary operator $U$,

$$U I_{|\psi\rangle} U^\dagger = I_{U|\psi\rangle}.$$

Proof: This is immediate from eq. (45): $U I_{|\psi\rangle} U^\dagger = I - 2U|\psi\rangle\langle\psi|U^\dagger = I - 2|U\psi\rangle\langle U\psi| = I_{U|\psi\rangle}$, where we have used that $UIU^\dagger = UU^\dagger = I$ as $U$ is unitary.

Looking back at eq. (44) and noting that $H = H^\dagger$, we see that

$$Q = -I_{H_n|0\rangle} I_{|x_0\rangle}. \tag{48}$$

By lemma 5, both $I_{|x_0\rangle}$ and $I_{H_n|0\rangle}$ preserve the two dimensional subspace $P(x_0)$ spanned by $|x_0\rangle$ and $H_n|0\rangle$. Hence by eq. (48), $Q$ preserves $P(x_0)$ too. Since all matrix elements are real numbers we may restrict attention to the real (rather than the complex) two dimensional subspace $P(x_0)$.

Lemma 7: Let $M_1$ and $M_2$ be two mirror lines in the Euclidean plane $\mathbb{R}^2$ intersecting at a point $O$, and let $\theta$ be the angle in the plane from $M_1$ to $M_2$ (cf. figure below). Then the operation of reflection in $M_1$ followed by reflection in $M_2$ is just rotation by angle $2\theta$ about the point $O$.

[Figure: mirror lines $M_1$ and $M_2$ crossing at the point $O$, with angle $\theta$ from $M_1$ to $M_2$.]

Proof of lemma 7: This is immediate, for example, from standard matrix expressions for rotations and reflections in $\mathbb{R}^2$.

For any vector $v \in \mathbb{R}^2$ let $I_v$ denote the operation of reflection in the line through the origin perpendicular to $v$. We are now in a position to finally identify geometrically what $Q$ actually does. Using lemma 7 we see that the action of $I_{H_n|0\rangle} I_{|x_0\rangle} = -Q$ in $P(x_0)$ is a rotation through $2\beta$ where $\cos\beta = \langle x_0|H_n|0\rangle = 1/\sqrt N$. For large $N$, $\beta \approx \pi/2$ and we have a rotation of almost $\pi$. It would be possible to use this large rotation as the basis of the quantum searching algorithm, but we prefer a smaller incremental motion. We could use the operator $(I_{H_n|0\rangle} I_{|x_0\rangle})^2$, but there is another solution, explaining the occurrence of the minus sign in the definition of $Q$:

Lemma 8: For any 2-dimensional real $v$ we have $-I_v = I_{v^\perp}$, where $v^\perp$ is a unit vector perpendicular to $v$.

Proof: For any vector $u$ we write $u = av + bv^\perp$. Then $I_v$ just reverses the sign of $a$, and $-I_v$ reverses the sign of $b$. Thus the action of $-I_v$ is the same as that of $I_{v^\perp}$.

Hence $Q = -I_{H_n|0\rangle} I_{|x_0\rangle}$ acting in $P(x_0)$ is a rotation through $2\alpha$, where $\alpha$ is the angle between $|x_0\rangle$ and a state perpendicular to $H_n|0\rangle$, i.e. $\sin\alpha = \langle x_0|H_n|0\rangle = 1/\sqrt N$, as claimed in (Q1). Thus even though $x_0$ is unknown (but we are given a black box for $I_{x_0}$) we can construct a rotation operator $Q$ in the plane spanned by the fixed starting state $|\psi_0\rangle$ and the unknown $|x_0\rangle$. Furthermore the angle between the starting state and $|x_0\rangle$ is independent of the value of $x_0$ (as the starting state is an equal superposition of all possible $x_0$ values), so the number of iterations is independent of $x_0$ too.

To see the effect of $Q$ on states orthogonal to $P(x_0)$, suppose that $|\xi\rangle \in B_n$ is orthogonal to both $H_n|0\rangle$ and $|x_0\rangle$. Then from the definitions of $I_{|x_0\rangle}$ and $I_{H_n|0\rangle}$ in eq. (45) we see that $I_{|x_0\rangle}|\xi\rangle = I_{H_n|0\rangle}|\xi\rangle = |\xi\rangle$, so $Q = -I$ in the orthogonal complement of $P(x_0)$, as claimed in (Q2).

13.4 Some further features of Grover's algorithm

Optimality

Grover's algorithm achieves unstructured search for a unique good item with $\frac{\pi}{4}\sqrt N$ queries. Is it possible to invent an even more ingenious quantum algorithm that uses fewer queries? Alas the answer is no:

Theorem 9: Any quantum algorithm that achieves the search for a unique good item in an unstructured database of size $N$ (with a constant level of probability, say half) must use $\Omega(\sqrt N)$ queries. More precisely, the order constant can be estimated to give a requirement of at least $\frac{\pi}{4}(1 - \epsilon)\sqrt N$ queries for any $\epsilon > 0$, so Grover's algorithm is optimal in a tight sense.

We will not go through the proof of this result here. One possible argument may be found in Preskill's notes, section 6.5.

Searching with multiple good items

Suppose our search space contains $r \ge 1$ good items and we wish to find any one such item. Let the good items be denoted $x_1, \ldots, x_r$, so now $f(x_i) = 1$ for $i = 1, \ldots, r$ and $f(x) = 0$ for all other $x$'s. Using the same construction that gave $I_{x_0}$ from $U_f$ in the case of a single good item, we obtain the operator $I_G$ (where $G$ stands for "good") with action

$$I_G|x\rangle = \begin{cases} |x\rangle & \text{if } x \neq x_1, \ldots, x_r \\ -|x\rangle & \text{if } x = x_1, \ldots, x_r \end{cases}$$

and the iteration operator (cf. eq. (44)) is $Q_G = -H_n I_0 H_n I_G = -I_{|\psi_0\rangle} I_G$.

Consider first the case that $r$ is known. In this case we'll see that our previous algorithm still works; we just need to modify the number of iterations in a way that depends on $r$. Let

$$|\psi_G\rangle = \frac{1}{\sqrt r}\sum_{i=1}^{r} |x_i\rangle$$

be the equal superposition of all good items.

Theorem 10: Let $P_G$ be the plane spanned by $|\psi_0\rangle$ and $|\psi_G\rangle$. Then the action of $Q_G$ preserves this plane, and within $P_G$ this action is rotation through angle $2\alpha$ where $\sin\alpha = \langle\psi_0|\psi_G\rangle = \sqrt{r/N}$.

Proof: Clearly $I_{|\psi_0\rangle}$ preserves $P_G$, since acting on any $|\psi\rangle$ it just subtracts a multiple of $|\psi_0\rangle$. For $I_G$, we can separate out the good and bad parts of the full equal superposition $|\psi_0\rangle$, writing

$$|\psi_0\rangle = \frac{1}{\sqrt N}\sum_{\text{all } x}|x\rangle = \sqrt{\frac{r}{N}}\,|\psi_G\rangle + \sqrt{\frac{N-r}{N}}\,|\psi_B\rangle \tag{49}$$

where $|\psi_B\rangle = \frac{1}{\sqrt{N-r}}\sum_{\text{bad } x}|x\rangle$ is the equal superposition of all bad items, and $|\psi_G\rangle$ and $|\psi_B\rangle$ are orthogonal states. By eq. (49), $P_G$ can also be characterised as the plane spanned by the orthogonal states $|\psi_G\rangle$ and $|\psi_B\rangle$. Now $I_G|\psi_G\rangle = -|\psi_G\rangle$ and $I_G|\psi_B\rangle = |\psi_B\rangle$, so for any state $|\psi\rangle = a|\psi_G\rangle + b|\psi_B\rangle$ in $P_G$ the action of $I_G$ is to subtract a multiple of $|\psi_G\rangle$, i.e. the result lies in the plane too. This also shows that within $P_G$, $I_G$ coincides with the operation $I_{|\psi_G\rangle}$ (cf. eq. (45)), and $Q_G = -I_{|\psi_0\rangle} I_{|\psi_G\rangle} = I_{|\psi_0^\perp\rangle} I_{|\psi_G\rangle}$. Hence, exactly as before, $Q_G$ is a rotation through angle $2\alpha$, where $\alpha$ is the angle between $|\psi_0^\perp\rangle$ and $|\psi_G\rangle$, i.e. $\sin\alpha = \langle\psi_0|\psi_G\rangle = \sqrt{r/N}$.

Now suppose that we start with $|\psi_0\rangle$ and repeatedly apply $Q_G$. The angle between $|\psi_0\rangle$ and $|\psi_G\rangle$ is $\beta$ where $\cos\beta = \langle\psi_0|\psi_G\rangle = \sqrt{r/N}$. Each application of $Q_G$ is a rotation through $2\alpha$ where $\sin\alpha = \sqrt{r/N}$, so we need

$$\frac{\beta}{2\alpha} = \frac{\arccos\sqrt{r/N}}{2\arcsin\sqrt{r/N}}$$

iterations to move $|\psi_0\rangle$ very close to $|\psi_G\rangle$. If $r \ll N$ then $|\psi_0\rangle$ and $|\psi_G\rangle$ are almost orthogonal ($\beta \approx \pi/2$) and $\alpha \approx \sin\alpha = \sqrt{r/N}$, so we need $\frac{\pi}{4}\sqrt{N/r}$ iterations.

We can also adapt the algorithm to work in the case that $r$ is unknown, using $O(\sqrt N)$ queries. The apparent difficulty is the following: if we start with $|\psi_0\rangle$ and repeatedly apply the operator $Q$ (in either case $r = 1$ or $r > 1$) we just rotate the state round and round in the plane of $|\psi_0\rangle$ and $|\psi_G\rangle$. The trick is to know when to stop, i.e. when the state lines up closely with $|\psi_G\rangle$ in this plane. But if $r$ is unknown then the rotation angle $2\alpha$ of $Q$ is unknown! To illustrate the way around this problem we'll consider only the case where the unknown $r$ is very small, $r \ll N$. (General $r$ values can be addressed by a more complicated argument along similar lines.)

For $r \ll N$ each iteration is a rotation through a small angle $2\alpha \approx 2\sqrt{r/N}$. Now think of $|\psi_0\rangle$ as the $x$-axis direction and $|\psi_G\rangle$ as the $y$-axis direction (recalling that these states are almost orthogonal for $r \ll N$). We choose a number $K$ randomly in the range $0 < K < \frac{\pi}{4}\sqrt N$, apply $K$ iterations of $Q$, measure the final state and test if the result is good or not. Since $2K\sqrt{r/N} < \frac{\pi}{2}\sqrt r$, we have chosen a random angle in the range $0$ to $\sqrt r\,\frac{\pi}{2}$, i.e. spanning $\sqrt r$ quadrants. Equivalently, we can choose one of the $\sqrt r$ quadrants at random and then a random angle in it. If the final rotation angle is within $\pm 45^\circ$ of the $y$ axis then the final state $|\psi\rangle$ has $|\langle\psi|\psi_G\rangle|^2 \ge \cos^2 45^\circ = 1/2$, i.e. we have probability at least half of seeing a good item in our final measurement. Now for every quadrant, half the angles are within $\pm 45^\circ$ of the $y$ axis, so our randomised procedure above will locate a good item with probability at least $1/4$. Repeating the whole procedure a constant number of times, say $M = 10$ times, we will fail to locate a good item only with probability at most $(3/4)^M = (3/4)^{10} \approx 0.056$, thus still using $O(\sqrt N)$ queries.

This case of unknown $r$ is directly relevant to the consideration of computational tasks in NP, where rather than locating a good item we want instead to know whether a good item exists or not. Consider for example the task SAT: given a Boolean function $f$, does it have a satisfying assignment or not? $f$ will generally have some unknown number $r \ge 0$ of satisfying assignments. We run the above randomised version of Grover's algorithm, say 10 times, checking each output $x$ to see if $f(x) = 1$ or not. If they all fail we conclude that $f$ is not satisfiable, which will be correct with high probability, at least $1 - (3/4)^{10}$. In this way Grover's algorithm can be applied to any NP problem to provide a quadratic speedup over classical exhaustive search.
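Theorem 10 and the iteration counts above can be explored with explicit $2 \times 2$ real matrices. This sketch is illustrative only. It writes $Q_G = -I_{|\psi_0\rangle} I_{|\psi_G\rangle}$ in the orthonormal basis $(|\psi_B\rangle, |\psi_G\rangle)$ of $P_G$, using the coordinates of $|\psi_0\rangle$ from eq. (49), and checks that it is the rotation through $2\alpha$ with $\sin\alpha = \sqrt{r/N}$. It then uses the closed form $\sin^2((2K+1)\alpha)$ for the success probability after $K$ iterations, which follows from that rotation picture because $|\psi_0\rangle$ starts at angle $\alpha$ from $|\psi_B\rangle$. The function names are hypothetical. Averaging exactly over every allowed $K$, rather than sampling $K$ at random, is our own choice to keep the output deterministic.

```python
import math

# Hypothetical helper names, chosen for this illustration.


def matmul(A, B):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def reflection_perp(v):
    """I_v = I - 2 v v^T: reflection in the line through the origin perpendicular to unit v."""
    return [[(1.0 if i == j else 0.0) - 2 * v[i] * v[j] for j in range(2)] for i in range(2)]


def rotation(angle):
    return [[math.cos(angle), -math.sin(angle)], [math.sin(angle), math.cos(angle)]]


def Q_G_in_plane(N, r):
    """Q_G = -I_{|psi0>} I_{|psiG>} in the basis (|psiB>, |psiG>) of P_G.
    By eq. (49), |psi0> has coordinates (sqrt((N-r)/N), sqrt(r/N)) in this basis."""
    psi0 = [math.sqrt((N - r) / N), math.sqrt(r / N)]
    minus_I_psi0 = [[-a for a in row] for row in reflection_perp(psi0)]
    return matmul(minus_I_psi0, reflection_perp([0.0, 1.0]))


def known_r_iterations(N, r):
    """Integer nearest to arccos(sqrt(r/N)) / (2 arcsin(sqrt(r/N)))."""
    s = math.sqrt(r / N)
    return round(math.acos(s) / (2 * math.asin(s)))


def success_probability(N, r, K):
    """After K rotations through 2*alpha the state makes angle (2K+1)*alpha with |psiB>,
    so a good item is measured with probability sin^2((2K+1) alpha)."""
    alpha = math.asin(math.sqrt(r / N))
    return math.sin((2 * K + 1) * alpha) ** 2


def average_random_K(N, r):
    """Exact average success probability over K = 1, 2, ... with K < (pi/4) sqrt(N),
    the randomised stopping rule used when r is unknown."""
    limit = math.pi / 4 * math.sqrt(N)
    ks = range(1, math.ceil(limit))
    return sum(success_probability(N, r, K) for K in ks) / len(ks)


N = 2 ** 16
alpha = math.asin(math.sqrt(4 / N))
Q = Q_G_in_plane(N, 4)
R = rotation(2 * alpha)
print(all(abs(Q[i][j] - R[i][j]) < 1e-12 for i in range(2) for j in range(2)))  # True
print(known_r_iterations(N, 4))                    # 100
for r in (1, 2, 4, 16):
    print(r, average_random_K(N, r) > 0.25)        # True for each r
```

With an average success probability of at least 1/4 per run, ten independent runs all fail with probability at most $(3/4)^{10} \approx 0.056$, matching the estimate used for the SAT example.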
