You are on page 1of 19

An introduction to quantum machine learning

Maria Schulda , Ilya Sinayskiya,b and Francesco Petruccionea,b


arXiv:1409.3097v1 [quant-ph] 10 Sep 2014

a
Quantum Research Group, School of Chemistry and Physics, University of
KwaZulu-Natal, Durban, KwaZulu-Natal, 4001, South Africa
b
National Institute for Theoretical Physics (NITheP), KwaZulu-Natal, 4001, South Africa

September 11, 2014

Abstract
Machine learning algorithms learn a desired input-output relation from examples in order to interpret
new inputs. This is important for tasks such as image and speech recognition or strategy optimisation,
with growing applications in the IT industry. In the last couple of years, researchers investigated
if quantum computing can help to improve classical machine learning algorithms. Ideas range from
running computationally costly algorithms or their subroutines efficiently on a quantum computer to
the translation of stochastic methods into the language of quantum theory. This contribution gives a
systematic overview of the emerging field of quantum machine learning. It presents the approaches as well
as technical details in an accessable way, and discusses the potential of a future theory of quantum learning.

Keywords: Quantum machine learning, quantum computing, artificial intelligence, machine learning

1 Introduction mail filters, iris recognition for security systems, the


evaluation of consumer behaviour, assessing risks
Machine learning refers to an area of computer sci- in the financial sector or developing strategies for
ence in which patterns are derived (‘learned’) from computer games. In short, machine learning comes
data with the goal to make sense of previously un- into play wherever we need computers to interpret
known inputs. As part of both artificial intelligence data based on experience. This usually involves huge
and statistics, machine learning algorithms process amounts of previously collected input-output data
large amounts of information for tasks that come pairs, and machine learning algorithms have to be
naturally to the human brain, such as image and very efficient in order to deal with so called big data.
speech recognition, pattern identification or strategy
optimisation. These problems gain significant impor- Since the volume of globally stored data is growing
tance in our digital age, an illustrative example being by around 20% every year (currently ranging in
Google’s PageRank machine learning algorithm for the order of several hundred exabytes [1]), the
search engines that was patented by Larry Page pressure to find innovative approaches to machine
in 19971 and led to the rise of what is today one learning is rising. A promising idea that is currently
of the biggest IT companies in the world. Other investigated by academia as well as in the research
important applications of machine learning are spam labs of leading IT companies exploits the potential
1 See https://www.princeton.edu/ achaney/tmve/wiki100k/ of quantum computing in order to optimise classical
docs/PageRank.html [Last accessed 6/24/2014] machine learning algorithms. In the last decades,

1
physicists already demonstrated the impressive machine learning method quantum approach
power of quantum systems for information process-
ing. In contrast to conventional computers built k-nearest neighbour
Efficient calculation of
on the physical implementation of the two states support vector machines classical distances on a
‘0’ and ‘1’, quantum computers can make use of quantum computer
a qubit’s superposition of two quantum states |0i k-means clustering
and |1i (e.g. encoded in two distinct energy levels neural networks First explorations of
of an atom) in order to follow many different paths quantum models
decision trees
of computation at the same time. But the laws
of quantum mechanics also restrict our access to Bayesian theory Reformulation in the
information stored in quantum systems, and coming language of open
hidden Markov models quantum systems
up with quantum algorithms that outperform their
classical counterparts is very difficult. However,
the toolbox of quantum algorithms is by now fairly Figure 1: Overview of methods in machine learning
established and contains a number of impressive and approaches from a quantum information perspec-
examples that speed up the best known classical tive as presented in this paper.
methods [2]. The technological implementation
of quantum computing is emerging [3], and many
believe that it is only a matter of time until the comprehensive theory of quantum learning, or how
numerous theoretical proposals can be tested on real quantum information can in principle be applied to
machines. On this background, the new research intelligent forms of computing, is only in the very
field of quantum machine learning might offer the first stages of development.
potential to revolutionise future ways of intelligent
data processing. This contribution gives a systematic overview of
the emerging field of quantum machine learning, with
A number of recent academic contributions ex- a focus on methods for pattern classification. After a
plore the idea of using the advantages of quantum brief discussion of the concepts of classical and quan-
computing in order to improve machine learning tum learning in Section 2, the paper is divided into
algorithms. For example, some effort has been put seven sections, each presenting a standard method of
into the development of quantum versions [4, 5, 6] machine learning (namely k-nearest neighbour meth-
of artificial neural networks (which are widely used ods, support vector machines, k-means clustering,
in machine learning), but they are often based on neural networks, decision trees, Bayesian theory and
a more biological perspective and a major break- hidden Markov models) and the various approaches
through has not been accomplished yet [7]. Some to relate each method to quantum physics. This
authors try to develop entire quantum algorithms structure mirrors the still rather fragmented field and
that solve problems of pattern recognition [8, 9, 10]. allows the reader to select specific areas of interest.
Other proposals suggest to simply run subroutines of As summarised in Figure 1, for k-nearest neighbour
classical machine learning algorithms on a quantum methods, support vector machines and k-means clus-
computer, hoping to gain a speed up [11, 12, 13]. An tering, authors are mainly concerned to find efficient
interesting approach is adiabatic quantum machine calculations of classical distances on a potential quan-
learning, which seems especially fit for some classes tum computer, while probabilistic methods such as
of optimisation problems [14, 15, 16]. Stochastic Bayesian theory and hidden Markov models find an
models such as Bayesian decision theory or hidden analogy in the formalism of open quantum systems.
Markov models find an elegant translation into Neural networks and decision trees are still waiting
the language of open quantum systems [17, 18]. for a convincing quantum version, although especially
Despite this growing level of interest in the field, a the former has been a relatively active field of re-

2
search in the last decade. Finally, in Section 4 we
briefly discuss the need for future works on quantum
machine learning that concentrate on how the actual supervised unsupervised
learning part of machine learning methods can be learning learning
improved using the power of quantum information
processing.

2 Classical and quantum learn- reinforcement


ing learning

2.1 Classical machine learning


The theory of machine learning is an important sub- Figure 2: The three types of classical learning. Super-
discipline of both artificial intelligence and statistics, vised learning derives patterns from training data and
and its roots can be traced back to the beginnings of finds application in pattern recognition tasks. Unsu-
artificial neural network and artificial intelligence re- pervised learning infers information from the struc-
search in the 1950’s [19, 20]. In 1959, Arthur Samuel ture of the input and is important for data cluster-
gave his famous definition of machine learning as the ing. Reinforcement learning optimises a strategy due
‘field of study that gives computers the ability to to feedback by a reward function, and usually applies
learn without being explicitly programmed’2 . This to intelligent agents and games.
is in fact misleading, since the algorithm itself does
not adapt in the learning process, but the function it
encodes. In more formal language, this means that
of correct input-output relations and has to infer a
the input-output relation of a computer program is
mapping therefrom. Probably the most important
derived from a set of training data (which is often
task is pattern classification, where vectors of input
very big). Such methods gain importance as com-
data have to be assigned to different classes. This
puters increasingly interact with humans and have to
might sound like a rather technical problem, but is in
become more flexible to adapt to our specific needs.
fact something humans do continuously - for example
A prominent example is a spam mail filter that
when we recognise a face from different angles and
learns from user behaviour and external databases
light conditions as belonging to one and the same
to classify new spam mails correctly. However, this
person, or when we classify signals from our sensory
is only one of many different cases where machine
organs as dangerous or not. We could even go so
learning intersects with our every-day lives.
far and say that pattern classification is the abstract
description of ‘interpreting’ input coming from our
In the theory of machine learning, the term learn- senses. It is no surprise that a big share of machine
ing is usually divided into three types (see Figure learning research tries to imitate this remarkable
2), which help to illustrate the spectrum of the field: ability of human beings with computers, and there
supervised, unsupervised and reinforcement learning. is an entire zoo of algorithms that generalise from
In supervised learning, a computer is given examples large training data sets how to classify new input.
2 It is interesting to note that although quoted in numer-

ous introductions to machine learning, the original reference The second category, unsupervised learning, has
to the machine learning pioneer’s most famous statement is
very difficult to find. Authors either refer to other secondary
not been part of machine learning for a long time, as
publications, or falsely cite Samuel’s seminal paper from 1959 it describes the process of finding patterns in data
[21]. without prior experience or examples. A prominent

3
task is data clustering, or forming subgroups out of a the training data to decide upon its classification. In
given dataset, in order to summarize large amounts this case, learning is not a parameter optimisation
of information by only a few stereotypes. This is problem, but rather a decision function inferred from
for example an important problem in sociological examples. In reinforcement learning, this decision
studies and market research. Note that this task function becomes a full strategy, and learning refers
is closely related to classification, since clustering to the adaptation of the strategy to increase the
means effectively to assign a class to each vector of a chances of future reward.
given set, but without the goal of treating new inputs.
Whatever type and procedure of learning is cho-
Finally, reinforcement learning is the closest sen, optimal machine learning algorithms run with
to what we might associate with the expression minimum resources and have a minimum error rate
‘learning’. Given a framework of rules and goals, related to the task (as indicated by misclassification
an agent (usually a computer program that acts of input, poor division into clusters, little reward of
as a player in a game) gets rewarded or punished a strategy). Challenges lie in the problem of finding
depending on which strategy it uses in order to win. parameters and initial values that lead to an optimal
Each reward reinforces the current strategy, while solution, or to come up with schemes that reduce the
punishment leads to an adaptation of its policy complexity class of the algorithm.3 This is where
[22, 23]. Reinforcement learning is a central mech- quantum computing promises to help.
anism in the development and study of intelligent
agents. However, it will not be in the focus of this
2.2 Quantum machine learning
paper, and it differs in many regards from the other
two types of learning. Investigations into quantum Quantum computing refers to the manipulation of
games and quantum intelligent agents are diverse quantum systems in order to process information.
and numerous (see for example, [24, 25, 26, 27, 28]), The ability of quantum states to be in a superposi-
and shall be treated elsewhere. tion can thereby lead to a substantial speedup of a
computation in terms of complexity, since operations
Even within these categories, the expression can be executed on many states at the same time.
‘learning’ can relate to different procedures. For The basic unit of quantum computation is the qubit,
example, it may refer to a training phase in which |ψi = α |0i + β |1i (with α, β ∈ C and |0i , |1i in the
optimal parameters of an algorithm (e.g. weights, two-dimensional Hilbert space H2 ). The absolute
initial states) are obtained. This is done by pre- squares of the amplitudes are the probability to
senting examples of correct input-output-relations measure the qubit in the 0 or the 1 state, and
to a task, and adapting the parameters to reproduce quantum dynamics always maintain the property of
these examples. The training set is then discarded probability conservation given by |α|2 + |β|2 = 1. In
[29]. An illustrative case close to human learning mathematical language this means that transforma-
is the weight adjustment process in artificial neural tions that map quantum states onto other quantum
networks through backpropagation or deep learning states (so called quantum gates) have to be unitary.
[30, 31]. Training phases are often the most costly Through single qubit quantum gates we are able to
part of a machine learning algorithm and efficient manipulate the basis state, amplitude or phase of
training methods become especially important when a qubit (for example through the so called X-gate,
dealing with so called big data. Besides learning the Z-gate and the Y-gate respectively), or put a
as a parameter optimisation problem, there is a qubit with β = 0 (α = 0) into an equal superposition
large number of machine learning algorithms that 3 The complexity of a problem tells us by what factor the
do not have an explicit learning phase. For example, computational resources needed to solve a problem grow if we
if presented with an unclassified input vector, the increase the input to the problem (e.g. the digits of a number)
k-nearest-neighbour for pattern classification uses by one.

4
|0 1 computer is to use such elementary gates in order
0
qubit states to create a quantum state that has a relatively high
|1 0
1 amplitude for states that represent solutions for the
given problem. A measurement in the computational
X X 01
10
basis then produces such a desired result with a
relatively high probability. Quantum algorithms
H 1 11
Hadamard 2 11
are usually repeated a number of times since the
result is always probabilistic. For a comprehensive
1 0 0 0 introduction into quantum computing, we refer to
XOR 0 1 0 0
0 0 0 1 the standard textbook by Nielsen and Chuang [2].
0 0 1 0

X 1 0 0 0
SWAP 0 0 1 0 In quantum machine learning, quantum algorithms
X 0 1 0 0
0 0 0 1 are developed to solve typical problems of machine
learning using the efficiency of quantum computing.
Measurement This is usually done by adapting classical algorithms
or their expensive subroutines to run on a potential
Figure 3: Representation of qubit states, unitary quantum computer. The expectation is that in
gates and measurements in the quantum circuit the near future, such machines will be commonly
model and in the matrix formalism. available for applications and can help to process
the growing amounts of global information. The
√ √ √ emerging field also includes approaches vice versa,
α = β = 1/ 2 (α = 1/ 2, β = −1/ 2) (the Hadamard
namely well-established methods of machine learning
or H-gate). Multi-qubit gates are often based on
that can help to extend and improve quantum
controlled operations that execute a single qubit
information theory.
operation only if another (ancilla or control qubit) is
in a certain state. One of the most important gates
As mentioned before, there is no comprehensive
is the two qubit XOR-gate, which flips the basis
theory of quantum learning yet. Discussions of ele-
state of the second qubit in case the first qubit is in
ments of such a theory can be found in [32, 33, 34].
state |1i. A two-qubit gate that will be mentioned
Following the remarks above, a theory of quantum
later is the SWAP-gate exchanging the state of two
learning would refer to methods of quantum infor-
qubits with each other.
mation processing that learn input-output relations
from training input, either for the optimisation of
Quantum gates are usually expressed as unitary
system parameters (for example unitary operators,
matrices (see also Figure 3). The matrices operate on
n see [35]) or to find a ‘quantum decision function’ or
2 -dimensional vectors that contain the amplitudes
n ‘quantum strategy’. There are many open questions
of the 2 basis states of a n-dimensional quantum
of how an efficient quantum learning procedure
system. For example, the XOR-gate working on the
√ could look like. For example, how can we efficiently
quantum state |ψi = 1/ 2 (|00i + |11i) would look
implement an optimisation problem (that is usually
like
solved by iterative and dissipative methods such as
gradient descent) on a coherent and thus reversible
     
1 0 0 0 1 1
0 1 0 0 1 0 1 0 quantum computer? How can we translate and
0 0 0 1 · √2 0 = √2 1 ,
     
process important structural information, such as
0 0 1 0 1 0 distance metrics, using quantum states? How do we
formulate a decision strategy in terms of quantum

and produce |ψ 0 i = 1/ 2 (|00i + |10i). The art physics? And the overall question, is there a general
of developing algorithms for a potential quantum way how quantum physics can in principle speed up

5
certain problems of machine learning? could contain preprocessed information on patients
and their correctly diagnosed disease. A machine
An underlying question is also the representa- learning algorithm then has to find the correct
tion of classical data by quantum systems. The disease of a new patient. More precisely, given a
most common approach in quantum computing is training set T = {~v p , cp }p=1,...,N of N n-dimensional
to represent classical information as binary strings feature vectors ~v and their respective class cp , as
(x1 , ...xn ) with xi ∈ {0, 1} for i = 1, ..., n, that well as a new n-dimensional input vector ~x, we have
are directly translated into n-qubit quantum states to find the class cx of vector ~x. Closely related
|x1 ...xn i from a 2n -dimensional Hilbert space with to pattern classification are other tasks such as
basis {|0....00i , |0....01i , ..., |1....11i}, and to read in- pattern completion (adding missing information to
formation out through measurements. However, ex- an incomplete input), associative memory (retrieving
isting machine learning algorithms are often based one of a number of stored memory vectors upon an
on an internal structure of this data, for example the input) or pattern recognition (including finding and
Euclidean distance as a similarity measure between examining the shape of patterns; this term is often
two examples of features. Alternative data represen- used as a synonym to pattern classification).
tations have been proposed by Seth Lloyd and his
co-workers, who encode classical information into the The central problem of unsupervised learning is
norm of a quantum state, hx| xi = |~x|−1 ~x2 , leading clustering data. Given a set of feature vectors {~v p },
to the definition [11, 12] the goal is to assign each vector to one out of k dif-
ferent clusters so that similar inputs share the same
|xi = |~x|− /2 ~x.
1
(1) assignment. Other problems of machine learning con-
cern optimal strategies in terms of an unknown re-
In order to use the strengths of quantum mechan- ward function, given a set of consecutive observations
ics without being confined by classical ideas of data of choices and consequences. As stated above we will
encoding, finding ‘genuinely quantum’ ways of rep- not concentrate on the learning of strategies here.
resenting and extracting information could become
vital for the future of quantum machine learning.
3.1 Quantum versions of k-nearest
neighbour methods
3 Quantum versions of machine
learning algorithms A very popular and simple standard textbook
method for pattern classification is the k-nearest
Before proceeding to the discussion of classical neighbour algorithm. Given a training set T of
machine learning algorithms and their quantum feature vectors with their respective classification
counterparts, we have to take a look on the actual as well as an unclassified input vector ~x, the idea
problems these methods intend to solve, as well is to choose the class cx for the new input that
as introduce the formalism used throughout this appears most often amongst its k nearest neighbours
article. Probably the most important application is (see Figure 4). This is based on the assumption
the task of pattern classification, and there are many that ‘close’ feature vectors encode similar examples,
different classical algorithms tackling this problem. which is true for many applications. Common
Based on a set of training examples consisting of distance measures are thereby the inner product,
feature vectors4 and their respective class attributes, the Euclidian or the Hamming distance5 . Choosing
the computer has to correctly classify an unknown k is not always easy and can influence the result
feature vector. For example, the feature vector significantly. If k is chosen too big we loose the
4 A feature vector has entries that refer to information on a 5 The Hamming distance between two binary strings is the

specific case, in other words a datapoint. number of flips needed to turn one into the other [36].

6
|0 H H
|a

|b

k=5 'k=1' Figure 5: Quantum circuit representation of a swap


test routine.
Figure 4: (Colour online) a: Illustration of the
kNN method of pattern classification. The new vec- transformation sets the ancilla into a superposition

tor (black cross) gets assigned to the class that the 1/ 2(|0i + |1i), followed by a controlled SWAP-gate

majority of its k closest neighbours have (in this case on a and b which swaps the two states under the
it would be the orange circle shape). b: A variation condition that the ancilla is in state |1i. A sec-
is the nearest-centroid method in which the closest ond Hadamard gate on the ancilla results in state
mean vector of a class of vectors defines the classifi- |ψSW i = 12 |0i (|a, bi + |b, ai) + 12 |1i (|a, bi − |b, ai) for
cation of a new input. This can be understood as a which the probability of measuring the ground state
k-nearest neighbour method with preprocessed data is given by
and k = 1. 1 1 2
P (|0anc i) = + |ha| bi| . (2)
2 2
locality information and end up in a simple majority A probability of 1/2 consequently shows that the two
vote over the entire training set, while a very small quantum states |ai and |bi do not overlap at all (in
k leads to noise-biased results. A variation of the other words, they are orthogonal), while a proba-
algorithm suggests not to run it on the training
P set, bility of 1 indicates that they have maximum overlap.
but to calculate the means or centroid 1/Nc p ~v p of
all Nc vectors belonging to one class c beforehand, Based on the swap test, Lloyd, Mohseni and
and to select the class of the nearest centroid (we call Rebentrost [11] recently proposed a way to retrieve
this here the nearest-centroid algorithm). Another the distance between two real-valued n-dimensional
variation weights the influence of the neighbours by vectors ~a and ~b through a quantum measurement.
distance, gaining an independence of the parameter More precisely, the authors calculate the inner prod-
k (the weighted nearest neighbours algorithm [37]). uct of the ancilla of state |ψi = √12 (|0, ai + |1, bi)
Methods such as k-nearest neighbours are obviously
with the state |φi = √1 (|~a| |0i − |~b| |1i) (with
based on a distance metric to evaluate the similarity Z
2
of two feature vectors. Efforts to translate this Z = |~a|2 + |~b|2 ), evaluating |hφ| ψi| as part of a
algorithm into a quantum version therefore focus swap test. This looks complicated, but is first of all
on the efficient evaluation of a classical distance an inexpensive procedure since the states |φi and
through a quantum algorithm. |ψi can be efficiently prepared [11]. The trick lies
in the clever definition of a quantum state given
Aı̈meur, Brassard and Gambs [38] introduce the in Eq. (1), which encodes the classical length of a
idea of using the overlap or fidelity |ha| bi| of two vector ~x into the scalar product of the quantum state
quantum states |ai and |bi as a ‘similarity mea- with itself, hx| xi = |~x|−1 |~x|. With this definition
2
sure’. The fidelity can be obtained through a sim- the identity |~a − ~b|2 = Z |hφ| ψi| holds true. The
ple quantum routine sometimes referred to as a swap classical distance between two vectors ~a and ~b can
test [39] (see Figure 5). Given a quantum state consequently be retrieved through a simple quantum
|a, b, 0anc i containing the two wavefunctions as well swap test of carefully constructed states. Lloyd,
as an ancilla register initially set to 0, a Hadamard Mohseni and Rebentrost use this procedure for a

7
quantum version of the nearest-centroid algorithm. [42] for this purpose. At the centre is his subrou-
With ~a ≡ ~x and ~b ≡ N1c p ~v p , they propose to
P
tine to measure the Hamming distance between two
calculate the classical distancePfrom the new input binary quantum states. He constructs a quantum
to a given centroid, |~x − N1c p ~v p |, through the superposition containing all states of the quantum
above described procedure. The authors claim that training set, and writes the Hamming distance to the
even when considering the operations to construct binary input vector |xi = |x1 ...xn i , xi = {0, 1} into
the quantum states involved, this quantum method the amplitude of each training vector state. This is
is more efficient than the polynomial runtime needed done by the following useful routine based on elemen-
to calculate the same value on a classical computer. tary quantum operations. Given two binary strings
|a1 ...an i and |b1 ...bn i with entries ai , bi ∈ {0, 1}, we
Wiebe, Kapoor and Svore [13] also use a swap test construct the initial state |ψi = |a1 ...an , b1 ...bn i ⊗
√1 (|0i + |1i), consisting of two registers for the
in order to calculate the inner product of two vectors, 2
which is another distance measure between feature qubits of a and b respectively, as well as an extra
vectors. However, they use an alternative repre- 2-dimensional ancilla register in superposition. The
sentation of classical information through quantum inverse Hamming distance between each qubit of the
states. Given n-dimensional classical vectors ~a, ~b first and second register,
with entries aj = |aj |eiαj , bj = |bj |eiβj , j = 1, ..., n 
¯ 0, if |ak i = |bk i ,
as well as an upper bound rmax for the en- dk =
1, else,
tries of the training vectors in T and an upper
bound for the number of zeros in a vector d (the replaces the respective qubit in the second register.
sparsity), the idea is to write the parameters This is done by applying an XORa,b -gate which over-
into amplitudes
q of the quantum states |Ai = writes the second entry bk with 0 if ak = bk and else
|aj |2 −iαj aj
√1 |1i) |1i and with 1, as well as a NOT gate. The result is the state
P
d j |ji ( 1 − rmax2 e |0i + rmax
q
|b |2 bj 1
|Bi = √1d j |ji |1i ( 1 − r2j e−iβj |0i + rmax
P
|1i) |ψ 0 i = a1 ...an , d¯1 ...d¯n ⊗ √ (|0i + |1i).

max
and perform a swap test on |Ai and |Bi. Ac- 2
cording to Eq. (2), the probability of measuring
To write the total Hamming distance d¯H (~a, ~b) first
the swap-test ancilla in the ground state is then
into the phase and then into the amplitude, Trugen-
P (|0ianc ) = 21 + 12 | dr21 2
P
i ai bi | and the inner π
max berger uses thePunitary operator U = exp(−i 2n H)
product of ~
a , ~b can consequently be evaluated 1
with H = 1 ⊗ k ( 2 (σz + 1))dk ⊗ σz working on the
by | i ai bi |2 = d2 rmax 4
P
(2P (|0ianc ) − 1), which is three registers. Note that this adds a negative sign
altogether independent of the dimension n of the in case the ancilla qubit is in |1i. A Hadamard trans-
vector. The authors in fact claim a quadratic formation on the ancilla state, Hanc = 1 ⊗ 1 ⊗ H
speed-up compared to classical algorithms. In the consequently results in
same contribution, Wiebe, Kapoor and Svore also
hπ i
give a scheme for a (weighted) nearest-centroid algo- |ψ 00 i = cos d¯H (~a, ~b) a1 ...an , d¯1 ...d¯n , 0 +

rithm based on the Euclidian distance evaluated by 2n h
well-known algorithms from the toolbox of quantum π ¯ i
dH (~a, ~b) a1 ...an , d¯1 ...d¯n , 1 .

+ sin
information, the amplitude estimation algorithm 2n
[40] and Dürr and Høyer’s find minimum subroutine Measuring the ancilla in |0i leads to a state in which
[41]. the amplitude scales with the Hamming distance
between ~a and ~b. Of course, the power of this
A full quantum pattern recognition algorithm for routine only becomes visible if it is applied to a large
binary features was presented by Trugenberger [9]. superposition of P training states in the first register
p
He expands his quantum associative memory circuit |a1 , ..., an i → p |v i. A clever measurement then

8
retrieves the states close to the input state with a
high probability.

3.2 Quantum computing for support -b


vector machines ||w||
A support vector machine is used for linear dis- v w*v +b
crimination, which is a subcategory of pattern
w ||w||
classification. The task in linear discrimination
problems is to find a hyperplane that is the best Figure 6: A support vector machine finds a hyper-
discrimination between two class regions and serves plane (here a line) with maximum margin to the clos-
as a decision boundary for future classification tasks. est vectors. This image illustrates the geometry of
In a trivial example of one-dimensional data and the optimisation problem based on [29].
only two classes, we would ask which point x lies
exactly between the members of class 1 and 2, so
that all values left of x belong to one class and all formulated using the Langrangian method [22] or in
values right of x to the other. In higher dimensions, dual space [43].
the boundary is given by a hyperplane (see Figure 6
for two dimensions). It seems like a severe restriction Without going into the complex mathematical
that methods of linear discrimination require the details of support vector machines, it is important
problem to be linearly separable, which means that to note that the mathematical formulation of the
there is a hyperplane that divides the datapoints optimisation problem contains a kernel K, a matrix
so that all vectors of either class are on one side of containing the inner product of the feature vectors
the hyperplane (in other words, the regions of each (K)pk = ~vp · ~vk , p, k = 1, ..., N (or the basis vectors
class have to be disjunct). However, a non-separable they are composed of) as entries. Support vector
problem can be mapped onto a linearly separable machines are in fact part of a larger class of so called
problem by increasing the dimensions [22]. kernel methods [29] (for more details see [22]) that
suffer from the fact that calculating kernels can get
A support vector machine tries to find the opti- very expensive in terms of computational resources.
mal separating hyperplane. The best discriminating More precisely, quadratic programming problems
hyperplane has a maximum distance to the closest of this form have a complexity of O((N n)3 ) [29]
datapoints, the so called support vectors. This is where N n is the number of variables involved, and
a mathematical optimisation problem of finding the computational resources therefore grow significantly
−1
maximum margin |w| ~ (~v w
~ + b) between the hyper- with the size of the training data. It is thus crucial
plane and the support vectors [29] (see Figure 6). In for support vector machines to find a method of
the 2-dimensional case, the boundary conditions are evaluating an inner product efficiently. This is where
quantum computing comes into play.
~ vi + b ≥ 1, when ci = 1,
w~
(3)
~ vi + b ≤ −1, when ci = −1,
w~ Rebentrost, Mohseni and Lloyd [12] claim that
in general, the evaluation of an inner product can
for each support vector ~vi from the training data set be done faster on a quantum computer. Given the
and its classification ci ∈ {−1, 1}. This means that quantum state6 |χi = 1/pN P2n |~x | |ii xi , with
χ i=1 i
while finding a maximum margin, the hyperplane
must still separate the training vectors of the two 6 The initial state can be constructed by using a Quantum

classes correctly. This optimisation problem can be Random Access Memory oracle described in [44], accessing a

9
P2n
Nχ = i=1 |~xi |2 . The xi are a 2n -dimensional ba- measure such as the squared Euclidean distance
sis of the training vector space T , so that every train- ((~a − ~b)2 with ~a, ~b ∈ RN ).
p
p
P |v i ican
ing vector be represented as a superposition
|v i = αi x . Similar to the same authors’ dis-
The standard textbook example for clustering is
tance measurement given in Eq. (1), the quantum
the k-means algorithm, in which alternately each
evaluation of a classical inner product relies on the
feature vector or datapoint is assigned to its closest
fact that the quantum states are normalised as
current centroid vector to form a cluster for each

i j ~xi · ~xj centroid, and the centroid vectors get calculated
x x = i j . from the clusters of the previous step (see Figure 7).
|~x ||~x |
Of course, the first iteration requires initial choices
The kernel matrix of the inner products of the basis for the centroid vectors, and a free parameter is the
vectors, K with (K)i,j = ~xi · ~xj , can then be calcu- number k of clusters to be formed. The procedure
lated by taking the partial trace of the corresponding eventually converges to stable centroid positions.
However, these may represent local minima, as
i

density matrix |χihχ| over the states x ,
only the position of the initial centroids defines
n
2
1 X
i j i j K̂ whether a global minima can be reached [46]. Other
trx [|χihχ|] = x x |~x ||~x | |iihj| = . problems of k-means clustering are how to choose
Nχ i,j=1 | tr[K]
the parameter k without prior knowledge of the
{z }
xi ·~
~ xj
data, and how to deal with clusters that are visibly
Rebentrost, Mohseni and Lloyd propose that the not grouped according to distance measures (such
inner product evaluation can not only be used for as concentric circles). Still, k-means works well
the kernel matrix but also when a pattern has to be for many simple applications of reducing many
classified, which invokes the evaluation of the inner datapoints into only a few groups, for example in
product between the above parameter vector w ~ and data compression tasks. A variation of the k-means
the new input (see Eq. 3).7 algorithm is the k-median clustering, in which the
role of the centroid is taken over by the datapoint of
a cluster, that has the smallest total distance to all
3.3 Quantum algorithms for cluster- other points.
ing
Besides versions of quantum clustering that are
Clustering describes the task of dividing a set of
merely inspired by quantum mechanics [47] or use the
unclassified feature vectors into k subsets or clusters. 2
It is the most prominent problem in unsupervised quantum mechanical fidelity Fid(|ψi , |φi) = |hψ| φi|
learning, which does not use training sets or ‘prior as a distance measure for an otherwise classical al-
examples’ for generalisation, but rather extracts gorithm [38], several full quantum routines for
information on structural characteristics of a data clustering have been proposed. For example,
set. Clustering is usually based on a distance Aı̈meur, Brassard, Gilles and Gambs [48] use two
subroutines for a quantum k-median algorithm.
superposition of memory states in O(log(nM)). First, with the help of an oracle that calculates
7 In the same paper, Rebentrost, Mohseni and Lloyd [12]
the distance between two quantum states, the total
also present another quantum support vector machine that
uses the reformulation of the optimisation as a least-squares distance of each state to all other states of one
problem, which appears to be a system of linear equations. cluster is calculated. Based on the find minimum
Following [45], this can be solved by a quantum matrix inver- subroutine in [41], the authors then describe a
sion algorithm, which under some conditions (depending on
the matrix and the output information required) can be more
routine to find the smallest value of this distance
efficient than classical methods. The classification is then pro- function and select the according quantum state as
posed to be done through a swap test. the new median for the cluster. Unfortunately, the

10
abatically
P transform an initial Hamiltonian H0 =
1 − k1 c,c0 |cihc0 |, into a Hamiltonian
X
H1 = |~v p − ~v̄c0 |2 |c0 ihc0 | ⊗ |jihj|,
c0 ,j

encoding the distance between vector ~v p to the cen-


step 1 step 2 troid of the closest cluster, ~v̄c . They give a more
refined version and also mention that the adiabatic
Figure 7: The alternating steps of a k-means algo- method can be applied to solve the optimisation
rithm. Step 1: The clusters (different shapes and problem of finding good initial or ‘seed’ centroid vec-
colours) are defined by attributing each vector to the tors.
closest centroid vector (larger and darker shapes).
Step 2: The centroids of each cluster defined in the
previous cycle are recalculated and define a new clus-
tering. 3.4 Searching for a quantum neural
network model
An artificial neural network is a n-dimensional
oracle is not described in detail, and their quantum graph where the nodes xm are called neurons and
machine learning proposal largely depends on how their connections are weighted by parameters wml
and with what resources it can be implemented. representing synaptic strengths between neurons
(m, l = 1, ..., n). An activation function defines the
In their contribution discussed earlier, Lloyd, value of a neuron depending on the current value of
Mohseni and Rebentrost [11] present an unsu- all other neurons weighted by the parameters wml ,
pervised quantum learning algorithm for k-means and the dynamics of the neural network is given by
clustering that is based on adiabatic quantum successively updating the value of neurons through
computing. Adiabatic quantum computing is an the activation function. An artificial neural network
alternative to the above introduced method of im- can thus be understood as a computational device,
plementing unitary gates, and tries to continuously the input being the initial values of the neurons
adjust the quantum system’s parameters in an and the output either a stable state of the entire
adiabatic process in order to transfer a ground state network or the state of a specific subset of neurons.
which is easy to prepare into a ground state which ‘Programming’ a neural network can be done by
encodes the result of the computation. Although not selecting weight parameters wml and an activation
in focus here, quantum adiabatic computing seems function encoding a certain input-output relation.
to be an interesting candidate for quantum machine The power of artificial neural networks lies in the
learning methods [15]. This is why we want to sketch fact that they can learn their weights from training
the idea of how to use adiabatic quantum computing data, a fact that neuroscientists believe is the basic
for k-means clustering. principle of how our brain processes information [49].

In [11], the goal of each clustering step is For pattern classification we usually consider
to have
P an output quantum superposition |χi = so called feed-forward neural networks in which
1/√Nc v p i, where as usual {|v p i}p=1,...,N is
c,p∈c |ci |~ neurons are arranged in layers, and each layer feeds
the set of N feature vectors or datapoints expressed its values into the next layer. An input is presented
as quantum
states, and |ci is the cluster the sub- to a feed-forward neural network by initialising the
set { v j }j=1,...,Nc is assigned to after the cluster- input layer, and after each layer successively updates
ing step. The authors essentially propose to adi- its nodes the output (for example encoding the

11
In Out computation. A practical implementation is given by
Elizabeth Behrman [54, 55, 56] who uses interact-
ing quantum dots to simulate neural networks with
In Out quantum systems. An interesting approach is also to
use fuzzy feed-forward neural networks inspired by
quantum mechanics [57] to allow for multi-state neu-
In Out rons. Also worth mentioning is the pattern recogni-
tion scheme implemented through adiabatic comput-
Figure 8: Illustration of a feed-forward neural net- ing with liquid-state nuclear magnetic resonance [16].
work with a sigmoid activation function for each neu- Despite this rich body of ideas, there is no quantum
ron. neural network proposal that delivers a fully function-
ing efficient quantum pattern classification method
that the authors know of. However, it is an interest-
classification of the input) can be read out in the ing open challenge to translate the nonlinear activa-
last layer (see Figure 8). tion function into a meaningful quantum mechanical
framework [7], or to find learning schemes based on
Feed-forward neural networks often use sigmoid ac- quantum superposition and parallelism.
tivation functions
3.5 Towards a quantum decision tree
N
!
X
xl = sgm wml xm ; κ ,
m=1
Decision trees are classifiers that are probably the
most intuitive for humans. Depending on the answer
defined by sgm(a; κ) = (1 + e−κa )−1 . If an appropri- to a question on the features, one follows a certain
ate set of weight parameters is given, feed-forward branch leading to the next question until the final
neural networks are able to classify input patterns class is found (see Figure 9). More precisely, a
extremely well. To evoke the desired generalisation, mathematical tree is an undirected graph in which
the network is initialised with training vectors, the any two nodes are connected by exactly one edge.
output is compared to the correct output, and the Decision trees in particular have one starting node,
weights adjusted through gradient descent in order the ‘root’ (a node with outgoing but no incoming
to minimise the classification error. The procedure is edges), and several end points or ‘leaves’ (nodes with
called backpropagation [50]. A challenge for pattern incoming but no outgoing edges). Each node except
classification with neural networks is the computa- from the leaves contains a decision function which
tional cost for the backpropagation algorithm, even decides which branch an input vector follows to the
when we consider improved training methods such next layer, or in other words, which partition on a
as deep learning [30]. set of data is makes. The leaves then represent the
final classification. As in the example in Figure 9,
There are a number of proposals for quantum ver- this procedure could be used to classify an email as
sions of neural networks. However, most of them ‘spam’, ‘no spam’ or ‘unsure’.
consider another class, so called Hopfield networks,
which are powerful for the related task of associa- Decision trees, as all classifiers in machine learn-
tive memory that is derived from neuroscience rather ing, are constructed using a training data set of
than machine learning. A large share of the litera- feature vectors. The art of decision tree design
ture on quantum neural networks tries to find spe- lies in the selection of the decision function in
cific quantum circuits that integrate the mechanisms each node. The most popular method is to find
of neural networks in some way [6, 51, 52, 53], trying the function that splits the given dataset into
to use the power of neural computing for quantum the ‘most organised’ sub-datasets, and this can

12
clear account of how the division of the set at each
node takes place and remain enigmatic in this essen-
Email sender
address book
tial part of the classifying algorithm. They contribute
the interesting idea of using the von Neumann en-
No Yes tropy to design the graph partition. Although the
Email contains first step has been made, the potential of a quantum
Sender manually
indicated word
marked as spam decision tree is still to be established.
combinations

No Yes 3.6 Quantum state classification with


Unsure Spam No spam
Bayesian methods
Stochastic methods such as Bayesian decision theory
Figure 9: A simple example of a decision tree for play an important role in the discipline of machine
the classification of emails. The geometric shapes learning. It can also be used for pattern classifi-
symbolise feature vectors from different classes that cation. The idea is to analyse existing information
are devided according to decision functions along the (represented by the above training data set T ) in or-
tree structure. der to calculate the probability that a new input is
of a certain class. An illustrative example is the risk
class evaluation of a new customer to a bank. This is
be measured in terms of Shannon’s entropy [22]. nothing else than a conditional probability and can
Assume the decision function of a node splits a be calculated using the famous Bayes formula
set of P feature vectors {~v p }, p = 1, ..., N into
M subsets each containing {N1 , ..., NM } vectors p(c)p(~x|c)
respectively (and
PM p(c|~x) = .
i=1 Ni = N ). Without further p(~x)
information, we calculate the probability of any
vector ~v p to be attributed to subset i, i ∈ {1, ..., M } Here, p(c), p(~x) are the probabilities of data being
(in other words to proceed to the ith node of the in class c and of getting input ~x respectively, while
next layer) as ρi = N N , and the entropy caused by
i
p(c|~x) is the conditional probability of assigning c
the decision
PM function or partition is consequently upon getting ~x and p(~x|c) is the class likelihood of
S = − i=1 ρi log(ρi ). For example, in a binary tree getting ~x if we look in class c. Obviously, we assign
where all nodes have two outgoing edges, the best the class with the highest conditional probability (or
partition would split the original set into two subsets ‘Bayes classifier’) p(cl |~x) to an input [22]. Values of
of the same size. Obviously, this is only possible if interest, such as risk functions, can be calculated
one of the features allows for such a split. Depending accordingly. Bayesian theory is an interesting
on the application, an optimal decision tree would be candidate for the translation into quantum physics,
small in the number of nodes, branches and/or levels. since both approaches are probabilistic.

Lu and Brainstein [58] propose a quantum version Opposed to above efforts to improve machine
of the decision tree. Their classifying process follows learning algorithms through quantum computing,
the classical algorithm with the only difference that Bayesian methods can be used for an important task
p
we use quantum feature states |vi = |v1p , ..., vnp i en- in quantum information called quantum state clas-
coding n features into the states of a quantum sys- sification. This problem stems from quantum in-
tem. At each node of the tree, the set of training formation theory itself, and the goal is to use ma-
quantum states is divided into subsets by a measure- chine learning based on Bayesian theory in order to
ment (or as the authors call it, estimating attribute discriminate between two quantum states produced
vi , i = 1, ..., n). Lu and Brainstein do not give a by an unknown or partly unknown source. This is

13
again a classification problem, since we have to learn Hidden Markov models are thus doubly embedded
the discrimination function between two classes c1 , c2 stochastic processes. To use a common application
from examples. The two (unknown) quantum states for pattern recognition as an example [29], consider
are represented by density matrices ρ, σ. The basic a recorded speech. The speech is a realisation of
idea is to use a positive operator-valued measurement a Markov process, a so called Markov chain of
(POVM) with binary outcome corresponding to the successive words. The recording is the observation,
two classes as a Bayesian classifier, in other words, and we shall for now imagine a way to translate the
to learn (or calculate) the measurement on our quan- signal into discrete symbols. A Markov model is
tum states that is able to discriminate them [59]. For defined by the transition probabilities between words
this process we have a training set consisting of ex- in a certain language, and the model can be learned
amples of the two states and their respective classifi- from examples of speeches. A hidden Markov model
cation, T = {(ρ, c1 ), (σ, c2 ), (ρ, c1 ), ...} and the exper- also includes the conditional probabilities that given
imenter is allowed to perform any operation on the a certain signal observation, a certain word has been
training set. Guţă and Kotlowski [59] find an optimal said. Goals of such models are to find the sequence
qubit classification strategy while Sasaki and Carlini of words that is the most likely for a recording, to
[60] are concerned with the related template match- predict the next word or, if only given the recording,
ing problem8 by solving an optimisation problem for to infer the optimal hidden Markov model that
the measurement operator. Sentis et al. [17] give a would encode it. Hidden Markov models play an
variation in which the training data can be stored as important role in many other applications such as
classical information. The proposals are so far of the- DNA analysis and online handwriting recognition
oretical nature and await experimental verification of [29].
the usefulness of this scheme.
Monras, Beige and Wiesner [61] first introduced a
hidden quantum Markov model in 2010. In contrast
3.7 Hidden quantum Markov models to a previous paper [63] in which the observations are
represented by quantum basis states and the observa-
In the last couple of years, hidden Markov models
tion process is given by a von Neumann or projective
were another important method of machine learning
measurement of an evolving quantum system, the au-
that has been investigated from the perspective
thors consider the much more general formalism of
of quantum information [61, 18]. Hidden Markov
open quantum systems (for an introduction to open
models are Markov processes for which the states of
quantum systems, see [64]). The state of a system is
the system are only accessible through observations
given by a density matrix ρ and transitions between
(see Figure 10, for a very readable introduction
states are governed by completely positive trace-
see [62]). In a (first order discrete and static)
nonincreasing superoperators Ai acting on these ma-
Markov model, a system has a countable set of states
trices. These operations can always be represented by
S = {sm }m=1,...,M and the transition between these
a set of Kraus operators [64] {K1i , ..., Kqi } fulfilling the
states are governed by a stochastic process in such
probability conservation condition q Kqi† Kqi ≤ 1,
P
a way that given a set of transition probabilities
{aml }m,l=1,...,M , the system’s state at time t + 1 only X
depends on the previous state at time t. In a hidden ρ0 = Ai ρ = Kki ρKki† .
model, the state of the system is only accessible k
through observations at time t {ot } that can take one
The probability of obtaining state ρs = P (ρs )−1 As ρ
of a set of symbols, and an observation again has a
is given by P (ρs ) = tr[As ρ] [61].
certain probability to be invoked by a specific state.
8 Template matching is the task to assign the most similar The advantage of hidden quantum Markov models
training vector of a training set to an input vector. is that they contain classical hidden Markov models

14
o12 o4 o8 leading to the next state of the system. The state of
the system is again only accessible through observa-
tions that deliver probabilistic information. The goal
S1 is to find a strategy (defining what action to take
upon what observation) that maximises the rewards
S2 given by a reward function. This is a problem of
reinforcement learning by intelligent agents which is
S3 not the focus of this contribution. However, we also
find the striking analogy to Kraus operations on open
quantum systems representing the actions that ma-
t1 t2 t3
nipulate the density matrix or stochastic description
of the system.
Figure 10: (Colour online) A hidden Markov model
is a stochastic process of state transitions. In this
sketch, the three states s1 , s2 , s3 are connected with 4 Conclusion
lines symbolising transition probabilities. A deter-
ministic realisation is a sequence of states, here the This introduction into quantum machine learning
transition s1 → s2 → s1 that give rise to observations gave an overview of existing ideas and approaches
o12 → o4 → o8 . A task for hidden Markov models to quantum machine learning. Our focus was
is to guess the most likely state sequence given an thereby on supervised and unsupervised methods
observation sequence. for pattern classification and clustering tasks, and
it is therefore by no means a complete review.
In summary, there are two main approaches to
and are therefore a generalisation offering richer quantum machine learning. Many authors try to
dynamics than the original process [61]. In future find quantum algorithms that can take the place
there might also be the possibility of ‘calculating’ the of classical machine learning algorithms to solve a
outcomes of classical models via quantum simulation. problem, and show how an improvement in terms of
That would be especially interesting if the quantum complexity can be gained. This is dominantly true
setting could learn models from given examples, a for nearest neighbour, kernel and clustering methods
problem which is nontrivial [62]. Clark et al. [18] in which expensive distance calculations are sped up
add the notion that hidden quantum Markov models by quantum computation. Another approach is to
can be implemented using open quantum systems use the probabilistic description of quantum theory
with instantaneous feedback, in which information in order to describe stochastic processes. In the
obtained from the environment is used to influence case of hidden quantum Markov models, this served
the system. However, a rigorous treatment of this to generalise the model, while Bayesian theory
idea is still outstanding, and the power of hidden was also used for genuinely quantum information
quantum Markov models to solve the problems for tasks like quantum state discrimination. A great
which classical models where developed is yet to be deal of contributions is still in a phase of exploring
shown. possibilities to combine formalisms from quantum
theory and methods of machine learning, as seen in
An interesting sibling of hidden quantum Markov the area of quantum neural networks and quantum
models are quantum observable Markov decision pro- decision trees.
cesses [65] which use a very similar idea. Classical
observable Markov decision processes can be under- As previously remarked, a quantum theory of
stood as hidden Markov models in which before each learning is yet outstanding. Although working on
step an agent takes a decision for a certain action, quantum machine learning algorithms, only very few

15
actually answer the question of how the strength and defining feature of machine learning, the learning process, can be simulated in quantum systems. In particular, learning methods based on parameter optimisation have not yet been accessed from a quantum perspective. Different approaches to quantum computing can be investigated for this purpose. In quantum computing based on unitary quantum gates, the challenge would be to parameterise and gradually adapt the unitary transformations that define the algorithm (a toy example of this idea is sketched below). Several ideas in that direction have already been investigated [66, 67, 35], and important tools could be quantum feedback control [68] or quantum Hamiltonian learning [69].
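As a toy illustration of this idea, and not a scheme taken from the references, the following sketch parameterises a single-qubit unitary as a rotation U(theta) = exp(-i*theta*Y/2) and gradually adapts theta by a finite-difference gradient step, so that U(theta)|0> approaches a chosen target state; the target state, learning rate and number of steps are arbitrary assumptions.

    import numpy as np

    # Pauli-Y generator and the parameterised single-qubit unitary U(theta) = exp(-i*theta*Y/2).
    Y = np.array([[0, -1j], [1j, 0]])
    def U(theta):
        return np.cos(theta / 2) * np.eye(2) - 1j * np.sin(theta / 2) * Y

    ket0 = np.array([1, 0], dtype=complex)
    target = np.array([np.cos(np.pi / 8), np.sin(np.pi / 8)], dtype=complex)  # toy target state

    def loss(theta):
        # Infidelity between U(theta)|0> and the target state.
        overlap = np.vdot(target, U(theta) @ ket0)
        return 1 - abs(overlap) ** 2

    # Gradually adapt the parameter with a finite-difference gradient step.
    theta, lr, eps = 0.0, 0.5, 1e-4
    for step in range(100):
        grad = (loss(theta + eps) - loss(theta - eps)) / (2 * eps)
        theta -= lr * grad

    print("learned theta:", round(theta, 4), "infidelity:", round(loss(theta), 6))

In an actual quantum setting the infidelity would have to be estimated from measurements rather than computed from the state vector, which is where feedback and parameter-inference schemes such as [66, 67] come into play.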
As mentioned before, adiabatic quantum computing might lend itself to learning as an optimisation problem [15]. Other alternatives of quantum computation, such as dissipative [70] and measurement-based quantum computing [71], might also offer an interesting framework for quantum learning. In summary, even though there is still a lot of work to do, quantum machine learning remains a very promising emerging field of research with many potential applications and a great theoretical variety.
Acknowledgements

This work is based upon research supported by the South African Research Chair Initiative of the Department of Science and Technology and the National Research Foundation.

References

[1] Martin Hilbert and Priscila López. The world's technological capacity to store, communicate, and compute information. Science, 332(6025):60–65, 2011.

[2] Michael A Nielsen and Isaac L Chuang. Quantum computation and quantum information. Cambridge University Press, 2010.

[3] I. M. Georgescu, S. Ashhab, and Franco Nori. Quantum simulation. Reviews of Modern Physics, 86:153–185, 2014.

[4] Gerasimos G Rigatos and Spyros G Tzafestas. Neurodynamics and attractors in quantum associative memories. Integrated Computer-Aided Engineering, 14(3):225–242, 2007.

[5] Elizabeth C Behrman and James E Steck. A quantum neural network computes its own relative phase. arXiv preprint arXiv:1301.2808, 2013.

[6] Sanjay Gupta and RKP Zia. Quantum neural networks. Journal of Computer and System Sciences, 63(3):355–383, 2001.

[7] Maria Schuld, Ilya Sinayskiy, and Francesco Petruccione. The quest for a quantum neural network. Quantum Information Processing, DOI 10.1007/s11128-014-0809-8, 2014.

[8] Dan Ventura and Tony Martinez. Quantum associative memory. Information Sciences, 124(1):273–296, 2000.

[9] Carlo A Trugenberger. Quantum pattern recognition. Quantum Information Processing, 1(6):471–493, 2002.

[10] Ralf Schützhold. Pattern recognition on a quantum computer. Physical Review A, 67:062311, 2003.

[11] Seth Lloyd, Masoud Mohseni, and Patrick Rebentrost. Quantum algorithms for supervised and unsupervised machine learning. arXiv preprint arXiv:1307.0411, 2013.

[12] Patrick Rebentrost, Masoud Mohseni, and Seth Lloyd. Quantum support vector machine for big feature and big data classification. arXiv preprint arXiv:1307.0471, 2013.

[13] Nathan Wiebe, Ashish Kapoor, and Krysta Svore. Quantum nearest-neighbor algorithms for machine learning. arXiv preprint arXiv:1401.2142, 2014.
[14] Hartmut Neven, Vasil S Denchev, Geordie Rose, and William G Macready. Training a large scale classifier with the quantum adiabatic algorithm. arXiv preprint arXiv:0912.0779, 2009.

[15] Kristen L Pudenz and Daniel A Lidar. Quantum adiabatic machine learning. Quantum Information Processing, 12(5):2027–2070, 2013.

[16] Rodion Neigovzen, Jorge L Neves, Rudolf Sollacher, and Steffen J Glaser. Quantum pattern recognition with liquid-state nuclear magnetic resonance. Physical Review A, 79(4):042321, 2009.

[17] G Sentís, J Calsamiglia, Ramón Muñoz-Tapia, and E Bagan. Quantum learning without quantum memory. Scientific Reports, 2(708):1–8, 2012.

[18] Lewis A Clark, Wei Huang, Thomas M Barlow, and Almut Beige. Hidden quantum Markov models and open quantum systems with instantaneous feedback. arXiv preprint arXiv:1406.5847, 2014.

[19] Stuart Jonathan Russell, Peter Norvig, John F Canny, Jitendra M Malik, and Douglas D Edwards. Artificial intelligence: A modern approach, volume 3. Prentice Hall, Englewood Cliffs, 2010.

[20] Frank Rosenblatt. The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 65(6):386, 1958.

[21] Arthur L Samuel. Some studies in machine learning using the game of checkers. IBM Journal of Research and Development, 44(1.2):206–226, 2000.

[22] Ethem Alpaydin. Introduction to machine learning. MIT Press, 2004.

[23] Richard O Duda, Peter E Hart, and David G Stork. Pattern classification. John Wiley & Sons, 2012.

[24] Steven E Landsburg. Quantum game theory. Wiley Encyclopedia of Operations Research and Management Science, 2011.

[25] Jens Eisert, Martin Wilkens, and Maciej Lewenstein. Quantum games and quantum strategies. Physical Review Letters, 83(15):3077, 1999.

[26] Hans J Briegel and Gemma De las Cuevas. Projective simulation for artificial intelligence. Scientific Reports, 2, 2012.

[27] Jiangfeng Du, Hui Li, Xiaodong Xu, Mingjun Shi, Jihui Wu, Xianyi Zhou, and Rongdian Han. Experimental realization of quantum games on a quantum computer. Physical Review Letters, 88(13):137902, 2002.

[28] Edward W Piotrowski and Jan Sladkowski. An invitation to quantum game theory. International Journal of Theoretical Physics, 42(5):1089–1099, 2003.

[29] Christopher M Bishop et al. Pattern recognition and machine learning, volume 1. Springer, New York, 2006.

[30] Geoffrey Hinton, Simon Osindero, and Yee-Whye Teh. A fast learning algorithm for deep belief nets. Neural Computation, 18(7):1527–1554, 2006.

[31] David E Rumelhart, Geoffrey E Hinton, and Ronald J Williams. Learning representations by back-propagating errors. Cognitive Modeling, 1988.

[32] Masahide Sasaki and Alberto Carlini. Quantum learning and universal quantum matching machine. Physical Review A, 66(2):022303, 2002.

[33] Esma Aïmeur, Gilles Brassard, and Sébastien Gambs. Quantum speed-up for unsupervised learning. Machine Learning, 90(2):261–287, 2013.

[34] Markus Hunziker, David A Meyer, Jihun Park, James Pommersheim, and Mitch Rothstein. The geometry of quantum learning. arXiv preprint quant-ph/0309059, 2003.
[35] Alessandro Bisio, Giulio Chiribella, Giacomo Mauro D'Ariano, Stefano Facchini, and Paolo Perinotti. Optimal quantum learning of a unitary transformation. Physical Review A, 81(3):032324, 2010.

[36] Richard W Hamming. Error detecting and error correcting codes. Bell System Technical Journal, 29(2):147–160, 1950.

[37] Klaus Hechenbichler and Klaus Schliep. Weighted k-nearest-neighbor techniques and ordinal classification. 2004.

[38] Esma Aïmeur, Gilles Brassard, and Sébastien Gambs. Machine learning in a quantum world. In Advances in Artificial Intelligence, pages 431–442. Springer, 2006.

[39] Harry Buhrman, Richard Cleve, John Watrous, and Ronald De Wolf. Quantum fingerprinting. Physical Review Letters, 87(16):167902, 2001.

[40] Gilles Brassard, Peter Høyer, Michele Mosca, and Alain Tapp. Quantum amplitude amplification and estimation. arXiv preprint quant-ph/0005055, 2000.

[41] Christoph Dürr and Peter Høyer. A quantum algorithm for finding the minimum. arXiv preprint quant-ph/9607014, 1996.

[42] Carlo A Trugenberger. Probabilistic quantum memories. Physical Review Letters, 87:067901, 2001.

[43] Bernhard E Boser, Isabelle M Guyon, and Vladimir N Vapnik. A training algorithm for optimal margin classifiers. Proceedings of the Fifth Annual Workshop on Computational Learning Theory, pages 144–152, 1992.

[44] Vittorio Giovannetti, Seth Lloyd, and Lorenzo Maccone. Quantum random access memory. Physical Review Letters, 100(16):160501, 2008.

[45] Aram W Harrow, Avinatan Hassidim, and Seth Lloyd. Quantum algorithm for linear systems of equations. Physical Review Letters, 103(15):150502, 2009.

[46] Simon Rogers and Mark Girolami. A first course in machine learning. CRC Press, 2012.

[47] David Horn and Assaf Gottlieb. Algorithm for data clustering in pattern recognition problems based on quantum mechanics. Physical Review Letters, 88(1):018702, 2002.

[48] Esma Aïmeur, Gilles Brassard, and Sébastien Gambs. Quantum clustering algorithms. Proceedings of the 24th International Conference on Machine Learning, pages 1–8, 2007.

[49] Peter Dayan and Laurence F Abbott. Theoretical neuroscience, volume 31. MIT Press, Cambridge, MA, 2001.

[50] John A Hertz, Anders S Krogh, and Richard G Palmer. Introduction to the theory of neural computation, volume 1. Westview Press, 1991.

[51] W Oliveira, Adenilton J Silva, Teresa B Ludermir, Amanda Leonel, Wilson R Galindo, and Jefferson CC Pereira. Quantum logical neural networks. 10th Brazilian Symposium on Neural Networks (SBRN'08), pages 147–152, 2008.

[52] Adenilton J da Silva, Wilson R de Oliveira, and Teresa B Ludermir. Classical and superposed learning for quantum weightless neural networks. Neurocomputing, 75(1):52–60, 2012.

[53] Massimo Panella and Giuseppe Martinelli. Neural networks with quantum architecture and quantum learning. International Journal of Circuit Theory and Applications, 39(1):61–77, 2011.

[54] Elizabeth C Behrman, James E Steck, and Steven R Skinner. A spatial quantum neural computer. International Joint Conference on Neural Networks (IJCNN'99), 2:874–877, 1999.

[55] Géza Tóth, Craig S Lent, P Douglas Tougaw, Yuriy Brazhnik, Weiwen Weng, Wolfgang Porod, Ruey-Wen Liu, and Yih-Fang Huang. Quantum cellular neural networks. arXiv preprint cond-mat/0005038, 2000.
[56] Jean Faber and Gilson A Giraldi. Quantum models for artificial neural networks. Electronically available: http://arquivosweb.lncc.br/pdfs/QNN-Review.pdf, 2002.

[57] G. Purushothaman and N.B. Karayiannis. Quantum neural networks (QNNs): Inherently fuzzy feedforward neural networks. IEEE Transactions on Neural Networks, 8(3):679–693, 1997.

[58] Songfeng Lu and Samuel L Braunstein. Quantum decision tree classifier. Quantum Information Processing, 13(3):757–770, 2014.

[59] Mădălin Guţă and Wojciech Kotlowski. Quantum learning: Asymptotically optimal classification of qubit states. New Journal of Physics, 12(12):123032, 2010.

[60] Masahide Sasaki, Alberto Carlini, and Richard Jozsa. Quantum template matching. Physical Review A, 64(2):022317, 2001.

[61] Alex Monras, Almut Beige, and Karoline Wiesner. Hidden quantum Markov models and non-adaptive read-out of many-body states. Applied Mathematical and Computational Sciences, 3:93, 2010.

[62] Lawrence R Rabiner. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77(2):257–286, 1989.

[63] Karoline Wiesner and James P Crutchfield. Computation in finitary stochastic and quantum processes. Physica D: Nonlinear Phenomena, 237(9):1173–1195, 2008.

[64] Heinz-Peter Breuer and Francesco Petruccione. The theory of open quantum systems. Oxford University Press, 2002.

[65] Jennifer Barry, Daniel T Barry, and Scott Aaronson. Quantum POMDPs. arXiv preprint arXiv:1406.2858, 2014.

[66] Søren Gammelmark and Klaus Mølmer. Quantum learning by measurement and feedback. New Journal of Physics, 11(3):033017, 2009.

[67] Søren Gammelmark and Klaus Mølmer. Bayesian parameter inference from continuously monitored quantum systems. Physical Review A, 87(3):032115, 2013.

[68] Alexander Hentschel and Barry C Sanders. Machine learning for precise quantum measurement. Physical Review Letters, 104(6):063603, 2010.

[69] Nathan Wiebe, Christopher Granade, Christopher Ferrie, and David Cory. Quantum Hamiltonian learning using imperfect quantum resources. Physical Review A, 89(4):042314, 2014.

[70] Frank Verstraete, Michael M Wolf, and J Ignacio Cirac. Quantum computation and quantum-state engineering driven by dissipation. Nature Physics, 5(9):633–636, 2009.

[71] HJ Briegel, DE Browne, W Dür, R Raussendorf, and M Van den Nest. Measurement-based quantum computation. Nature Physics, 5(1):19–26, 2009.