
Quantum speedup for twin support vector machine

Zekun Ye(a), Lvzhou Li(a,c,d,*), and Haozhen Situ(b,c)

(a) Institute of Computer Science Theory, School of Data and Computer Science,
Sun Yat-sen University, Guangzhou 510006, China
(b) College of Mathematics and Informatics, South China Agricultural University, Guangzhou 510642, China
(c) Center for Quantum Computing, Pengcheng Laboratory, Shenzhen 518055, China
(d) The Key Laboratory of Machine Intelligence and Advanced Computing (Sun Yat-sen University), Ministry of Education, China
(Dated: February 26, 2019)
In this work, we investigate how to speed up machine learning tasks based on quantum computation. We show that an important classifier in machine learning, the twin support vector machine, can be exponentially speeded up on quantum computers. Specifically, for a training set with m samples represented by n-dimensional vectors, the proposed quantum algorithm can learn two non-parallel hyperplanes in O(log mn) time, and then classify a new sample in O(log n) time by comparing the distances from the sample to the two hyperplanes. Note that the classical algorithm requires polynomial time in both the training and classification procedures.

I. INTRODUCTION

With the advent of the big data era, it is of great importance to extract useful information from massive data, which has made machine learning a popular research field. Machine learning enables agents or computers to learn properties and characteristics from data, or to improve their performance on specific tasks, without being explicitly programmed. There are two main classes of machine learning tasks: supervised learning and unsupervised learning. In supervised learning, the goal is to learn a function or model from the training set for predicting the output of new data samples. In unsupervised learning, one hopes to find hidden structures in unlabeled data. Machine learning can deal with tasks that are common in daily life, such as image and speech recognition, spam filtering, user behavior analysis, and portfolio optimization.

As data sets continue to grow, existing machine learning algorithms may not meet the efficiency requirements of practical applications. Meanwhile, quantum computing might have the potential to speed up these algorithms. As a new computation model, quantum computing has brought new vitality to computer science and is likely to bring about a new technological revolution. Quantum computers, in principle having a powerful parallelism of computation, use effects such as quantum superposition and entanglement to process information in ways that classical computers cannot. However, it is worth pointing out that if one wants to make use of the parallelism of quantum computing to solve a problem, quantum algorithms play a critical role. A quantum algorithm is a stepwise procedure following the rules of quantum mechanics, which might outperform the best known classical algorithms when solving certain problems. This is known as a quantum speedup.

It is a central problem in quantum computing to reveal quantum speedups, or in other words, to explore how to use quantum algorithms to solve problems faster than the best known classical algorithms. Two famous quantum algorithms showing quantum speedups are Shor's algorithm for factoring integers in polynomial time [1] and Grover's algorithm for searching a database of size n with only O(sqrt(n)) queries [2]. Recent progress on quantum algorithms shows that quantum computers have the potential to speed up many problems in AI, especially in machine learning. Indeed, since the basic unit of quantum information, the qubit, can be in a superposition state, quantum computers are well suited to processing high-dimensional data, so that certain machine learning models or algorithms can be nontrivially improved on quantum computers in terms of time or space complexity.

Recently, quantum machine learning has attracted a lot of attention from the academic community; see [3] for a review published in Nature. Broadly speaking, quantum machine learning contains two aspects: (i) exploring how to speed up machine learning tasks based on quantum computing, and (ii) using machine learning as a tool to solve problems in quantum information and even in quantum mechanics. In a narrow sense, 'quantum machine learning' usually refers to item (i), and we also focus on this first aspect in this paper. Some interesting results have been reported in recent years, showing that classical machine learning models and algorithms can be essentially (even exponentially) speeded up on quantum computers [4-13, 18]. For example, there is work on quantum data fitting [4], quantum principal component analysis [5], the quantum support vector machine [6], a quantum algorithm for linear regression [7], quantum generative adversarial networks [12, 13], and so on. There are also works on the experimental realization of quantum machine learning algorithms, e.g. [19, 20]. As an emerging field, quantum machine learning is still in the early stage of development and is worthy of further study.
Support vector machine (SVM) is a machine learning model widely used for classification. The idea of SVM is to find a hyperplane such that the margin between the positive and negative samples is as large as possible [21]. The twin support vector machine (TSVM) is an improved variant of SVM. In contrast, TSVM aims to find two non-parallel hyperplanes such that each class of samples is close to one of the hyperplanes and as far away as possible from the other hyperplane [22]. Compared with the traditional SVM, TSVM transforms one large quadratic programming problem into two small quadratic programming problems, which makes the training process faster. Meanwhile, TSVM also has better generalization ability and more advantages than SVM on many problems, such as preferential classification problems and the problem of automatically discovering two-dimensional projections of the data. TSVM is very useful for pattern classification and has received extensive attention and a wide range of applications, see [23-27]. Furthermore, TSVM was reformulated into the least squares twin support vector machine (LSTSVM) [28]. LSTSVM transforms the inequality constraints of TSVM into equality constraints, which ultimately reduces the problem to two systems of linear equations, and it can classify large datasets for which TSVM would require long training times.

In this paper, we propose a quantum twin support vector machine which can dramatically speed up both the learning and classification processes, exponentially faster than its classical counterparts. The rest of the paper is organized as follows. Section II briefly introduces the classical TSVM. Section III proposes the quantum algorithm for TSVM. Section IV analyzes the algorithm in terms of error and computational complexity. Section V gives the conclusion and an outlook on future research.

II. TWIN SUPPORT VECTOR MACHINE

TSVM is a model for solving binary classification. It aims to find two non-parallel hyperplanes given by

    w_1 · x + b_1 = 0,   (1)
    w_2 · x + b_2 = 0,   (2)

such that the positive class is as close as possible to the first hyperplane and far away from the second hyperplane, while the negative class is as close as possible to the second hyperplane and far away from the first hyperplane [22].

Suppose there are m training samples, among which there are m_1 positive samples and m_2 negative samples, and each training sample is a vector in the real space R^n. The matrices A in R^{m_1 x n} and B in R^{m_2 x n} collect the samples of the positive and negative classes, respectively. Then TSVM solves the two constrained problems

    min_{w_1, b_1}  (1/2) ||A w_1 + b_1 e_1||^2 + c_1 e_2 · xi_2,   (3)
    s.t.  -(B w_1 + b_1 e_2) + xi_2 >= e_2,  xi_2 >= 0,   (4)

    min_{w_2, b_2}  (1/2) ||B w_2 + b_2 e_2||^2 + c_2 e_1 · xi_1,   (5)
    s.t.  (A w_2 + b_2 e_1) + xi_1 >= e_1,  xi_1 >= 0.   (6)

The least squares twin support vector machine (LSTSVM) replaces the above inequality constraints with equality constraints and thereby reduces the problem to solving two systems of linear equations [28]:

    min_{w_1, b_1}  (1/2) ||A w_1 + b_1 e_1||^2 + (c_1/2) ||xi_2||^2,   (7)
    s.t.  -(B w_1 + b_1 e_2) + xi_2 = e_2,   (8)

    min_{w_2, b_2}  (1/2) ||B w_2 + b_2 e_2||^2 + (c_2/2) ||xi_1||^2,   (9)
    s.t.  (A w_2 + b_2 e_1) + xi_1 = e_1,   (10)

where e_1 in R^{m_1} and e_2 in R^{m_2} are all-one column vectors. Substituting the equality constraint (8) into the objective function (7), we have

    L = min_{w_1, b_1}  (1/2) ||A w_1 + b_1 e_1||^2 + (c_1/2) ||B w_1 + b_1 e_2 + e_2||^2.   (11)

Setting the partial derivatives of this function with respect to w_1 and b_1 to zero, we get

    A^T (A w_1 + b_1 e_1) + c_1 B^T (B w_1 + b_1 e_2 + e_2) = 0,   (12)
    e_1^T (A w_1 + b_1 e_1) + c_1 e_2^T (B w_1 + b_1 e_2 + e_2) = 0.   (13)

Writing E = [A e_1], F = [B e_2] and z_1 = [w_1; b_1], equations (12) and (13) stack into (E^T E + c_1 F^T F) z_1 = -c_1 F^T e_2, which is rewritten as

    [w_1; b_1] = -((1/c_1) E^T E + F^T F)^{-1} F^T e_2,   (14)

where E^T and F^T denote the transposes of E and F, respectively. Similarly, substituting (10) into the objective function (9), we have

    [w_2; b_2] = -(E^T E + (1/c_2) F^T F)^{-1} E^T e_1.   (15)

It requires O(m_2 n^2) time to calculate F^T F, O(m_1 n^2) time to calculate E^T E, O(m_2 n) time to calculate F^T e_2, O(m_1 n) time to calculate E^T e_1, and O(n^3) time to invert (1/c_1) E^T E + F^T F and E^T E + (1/c_2) F^T F. So the total running time for solving the two hyperplanes given in (7)-(10) is O((m + n) n^2). In order to classify a new sample x, we need to calculate its distances to the two hyperplanes, |w_1 · x + b_1| / ||w_1|| and |w_2 · x + b_2| / ||w_2||, and then compare them. This step requires O(n) time.
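To make the classical baseline concrete, the following NumPy sketch implements the closed forms (14) and (15) and the O(n)-per-sample distance comparison. It is a minimal illustration under the notation above: the function names and the small ridge term are our own additions (the ridge only keeps the matrix inversion numerically stable), and the sign of Eq. (15) is taken exactly as written in the text.

```python
import numpy as np

def lstsvm_fit(A, B, c1, c2, ridge=1e-8):
    """Solve the two LSTSVM hyperplanes (w1, b1) and (w2, b2) via Eqs. (14)-(15)."""
    m1, m2 = A.shape[0], B.shape[0]
    E = np.hstack([A, np.ones((m1, 1))])          # E = [A e1]
    F = np.hstack([B, np.ones((m2, 1))])          # F = [B e2]
    R = ridge * np.eye(E.shape[1])                # tiny ridge term for numerical stability
    z1 = -np.linalg.solve(E.T @ E / c1 + F.T @ F + R, F.T @ np.ones(m2))  # Eq. (14)
    z2 = -np.linalg.solve(E.T @ E + F.T @ F / c2 + R, E.T @ np.ones(m1))  # Eq. (15)
    return (z1[:-1], z1[-1]), (z2[:-1], z2[-1])

def lstsvm_predict(x, plane1, plane2):
    """Assign a label by comparing the distances of x to the two hyperplanes."""
    (w1, b1), (w2, b2) = plane1, plane2
    d1 = abs(w1 @ x + b1) / np.linalg.norm(w1)
    d2 = abs(w2 @ x + b2) / np.linalg.norm(w2)
    return +1 if d1 < d2 else -1
```

Training in this sketch costs O((m + n) n^2), matching the count above, and classification is O(n) per sample.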
III. QUANTUM ALGORITHM OF TSVM

In this section, a quantum algorithm is proposed to speed up the learning and classification procedures of the classical TSVM algorithm. Our result is described in the following theorem.

Theorem 1: There exists a quantum algorithm that can learn two non-parallel hyperplanes as given in equations (1) and (2) in O(log mn) time from m training samples in R^n, and then classify a new sample x in R^n in O(log n) time.

The algorithm procedure is depicted in Algorithm 1. For clarity, the following notations are introduced:

    |F^T e_2> = F^T e_2 / ||F^T e_2||,   (16)
    |E^T e_1> = E^T e_1 / ||E^T e_1||,   (17)
    |w_1, b_1> = (w_1, b_1) / ||(w_1, b_1)||,   (18)
    |w_2, b_2> = (w_2, b_2) / ||(w_2, b_2)||,   (19)

where (w_i, b_i) denotes the column vector [w_i; b_i] for i = 1, 2.

Algorithm 1: QTSVM learning and classification algorithm.
Input: m_1 positive samples and m_2 negative samples represented by matrices A in R^{m_1 x n} and B in R^{m_2 x n}; a new sample x in R^n.
Procedure:
1. Prepare the input quantum states |F^T e_2> and |E^T e_1>, where E = [A e_1] and F = [B e_2].
2. Perform Hamiltonian simulation of H_1_hat = H_1 / tr H_1 and H_2_hat = H_2 / tr H_2, where H_1 = (1/c_1) E^T E + F^T F and H_2 = E^T E + (1/c_2) F^T F.
3. Use the HHL algorithm as a subroutine to solve the linear equations in Eqs. (12) and (13), and obtain the two hyperplanes in the form of the quantum states |w_1, b_1> and |w_2, b_2>.
4. Prepare the new sample x as a quantum state |x, 1>, then use the SWAP test to find the distances from the new sample to the two hyperplanes and compare them. If x is closer to the first hyperplane, assign it a positive label; otherwise, assign it a negative label.
Output: The label of x.

The procedure is explained in detail below.

A. Preparation of input quantum states

In the quantum setting, we assume that we can prepare quantum states by accessing the corresponding classical data. Methods to prepare general states efficiently have been demonstrated in many papers. In 2002, Grover and Rudolph [14] gave a simple and efficient process for generating a quantum superposition of states which forms a discrete approximation of any efficiently integrable probability density function. It was also shown in [15] that we can generate any prescribed quantum state by implementing a sequence of n controlled rotations to create an n-qubit state. Besides, Soklakov and Schack [16] gave a way to encode a classical probability distribution in a quantum register in time polynomial in the number of qubits.

Specifically, in the quantum recommendation systems [17], Kerenidis and Prakash gave a classical data structure such that an algorithm with quantum access to the data structure can create the quantum state |x> corresponding to a vector x in R^n and the quantum state |A_i> corresponding to each row A_i of a matrix A in R^{m x n}. That is, there exists a quantum algorithm with quantum access to the data structure that can perform the mapping U_x : |0> -> |x> in time O(log n) and the mapping U_A : |i>|0> -> |i>|A_i> for i in [m] in time O(log mn), where |x> = x/||x|| and |A_i> = (A_i/||A_i||)^T.

Since we have this oracle to create quantum states from classical data structures, if we store the matrix F and the vector f with f_i = ||F_i|| in such a classical data structure, then we have the following algorithm to prepare the quantum state |F^T e_2>.

Algorithm 2: Preparation of the input quantum states of QTSVM.
Input: matrix F in R^{m_2 x (n+1)}.
Procedure:
1. Initialize the quantum state to |0>|0>.
2. Perform the mapping U_f to get the state |0> (1/sqrt(N_chi)) sum_{i=0}^{m_2-1} ||F_i|| |i>, where N_chi = sum_i ||F_i||^2.
3. Perform the mapping U_F to get the state |chi> = (1/sqrt(N_chi)) sum_{i=0}^{m_2-1} ||F_i|| |F_i> |i>.
4. Perform the Walsh-Hadamard transformation on the second register of |chi> to get the state (1/sqrt(m_2 N_chi)) sum_{i=0}^{m_2-1} ||F_i|| |F_i> sum_{j=0}^{m_2-1} (-1)^{i·j} |j>.
5. Measure the second register; on outcome |0> the first register is left in (1/sqrt(N_chi)) sum_{i=0}^{m_2-1} ||F_i|| |F_i>, up to normalization.
Output: The quantum state |F^T e_2>, proportional to (1/sqrt(N_chi)) sum_{i=0}^{m_2-1} ||F_i|| |F_i>.

Similarly, we can prepare |E^T e_1>.
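As a sanity check of what Algorithm 2 produces, the short NumPy sketch below classically computes the amplitude vector of |F^T e_2> from Eq. (16) and the probability that the post-selection in step 5 succeeds. This is only a classical emulation of the target state, not of the quantum circuit or of the Kerenidis-Prakash data structure; the success-probability formula is our own derivation and assumes m_2 is a power of two so that the Walsh-Hadamard transform on the second register is well defined.

```python
import numpy as np

def target_state_F_T_e2(F):
    """Amplitudes of |F^T e2> from Eq. (16): the normalized sum of the rows of F,
    which is exactly the state Algorithm 2 post-selects on."""
    v = F.T @ np.ones(F.shape[0])          # F^T e2 = sum_i F_i
    return v / np.linalg.norm(v)

def postselect_probability(F):
    """Probability of measuring |0> on the second register in step 5 of Algorithm 2."""
    m2 = F.shape[0]
    N_chi = np.sum(np.linalg.norm(F, axis=1) ** 2)   # N_chi = sum_i ||F_i||^2
    return np.linalg.norm(F.T @ np.ones(m2)) ** 2 / (m2 * N_chi)
```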
B. Hamiltonian simulation

Denote t_0/T by Delta_t. In this step we wish to obtain e^{-i H_1_hat Delta_t} and e^{-i H_2_hat Delta_t} by Hamiltonian simulation, where H_1_hat and H_2_hat are the normalizations of H_1 and H_2. Taking a copy of |chi> and performing a partial trace over the second register, we get

    tr_2{|chi><chi|} = (1/N_chi) sum_{i=0}^{m_2-1} ||F_i||^2 |F_i><F_i| = K_2 / tr K_2 = K_2_hat,

which needs O(log m_2 n) time. Similarly, we can prepare the density operator K_1_hat in O(log m_1 n) time. Because m > m_1 and m > m_2, the total time consumed is O(log mn). Then, by using the density matrix exponentiation method [5], which performs Hamiltonian simulation efficiently even if the data do not satisfy a sparsity assumption, we can simulate e^{-i K_1_hat Delta_t} and e^{-i K_2_hat Delta_t} with error O(Delta_t^2). Furthermore, by Trotter's formula [29], we have

    e^{-i H_1_hat Delta_t} = e^{-i ((1/c_1) K_1 + K_2) Delta_t / tr H_1}
                           = e^{-i K_1 Delta_t / (c_1 tr H_1)} e^{-i K_2 Delta_t / tr H_1} + O(Delta_t^2)
                           = e^{-i K_1_hat (tr K_1 / (c_1 tr H_1)) Delta_t} e^{-i K_2_hat (tr K_2 / tr H_1) Delta_t} + O(Delta_t^2),

where K_1 = E^T E and K_2 = F^T F. Since tr K_1 / (c_1 tr H_1) and tr K_2 / tr H_1 are constant factors, and tr K_1, tr K_2, tr H_1 can be efficiently estimated [6], e^{-i H_1_hat Delta_t} can be simulated in O(log mn) time with O(Delta_t^2) error. Moreover, e^{-i H_2_hat Delta_t} can be simulated in the same way.
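The splitting above can be checked numerically. The sketch below builds H_1_hat, K_1_hat, K_2_hat and the two trace ratios from E, F and c_1, then compares exp(-i H_1_hat Delta_t) with the product of the two rescaled exponentials; the gap shrinks as O(Delta_t^2). This is a plain classical computation with SciPy's matrix exponential, intended only to illustrate the Trotter step, not an implementation of the density matrix exponentiation protocol of [5]; the function names are our own.

```python
import numpy as np
from scipy.linalg import expm

def normalized_operators(E, F, c1):
    """Return H1_hat, K1_hat, K2_hat and the trace ratios used in the Trotter splitting."""
    K1, K2 = E.T @ E, F.T @ F
    H1 = K1 / c1 + K2                         # H1 = (1/c1) E^T E + F^T F
    r1 = np.trace(K1) / (c1 * np.trace(H1))   # weight of K1_hat inside H1_hat
    r2 = np.trace(K2) / np.trace(H1)          # weight of K2_hat inside H1_hat
    return H1 / np.trace(H1), K1 / np.trace(K1), K2 / np.trace(K2), r1, r2

def trotter_gap(E, F, c1, dt):
    """Spectral-norm gap between exp(-i H1_hat dt) and the split product; it is O(dt^2)."""
    H1h, K1h, K2h, r1, r2 = normalized_operators(E, F, c1)
    exact = expm(-1j * dt * H1h)
    split = expm(-1j * dt * r1 * K1h) @ expm(-1j * dt * r2 * K2h)
    return np.linalg.norm(exact - split, ord=2)
```

For example, halving dt should reduce the value returned by trotter_gap by roughly a factor of four.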
C. Solve two hyperplanes

Now, according to Eqs. (14) and (15), we obtain the two hyperplanes in the form of the quantum states |w_1, b_1> and |w_2, b_2> by calling the HHL algorithm [30]. Note that |w_1, b_1> and |w_2, b_2> are both (n + 1)-dimensional real vectors, with w_1, w_2 in R^n. We measure copies of |w_1, b_1> repeatedly with the measurement operator set {I - |n><n|, |n><n|} in order to estimate the value of ||w_1||^2 (for the normalized state, ||w_1||^2 equals one minus the probability of the outcome |n><n|). Similarly, we can estimate the value of ||w_2||^2. The values of ||w_1||^2 and ||w_2||^2 will be used in classification.

D. Classification

Given a new sample x = (x_0, x_1, ..., x_{n-1}) in R^n, we decide its class by comparing its distances to the two hyperplanes. By calling the oracle for the vector x~ = (x_0, x_1, ..., x_{n-1}, 1), we can construct the state

    |x~> = (1/sqrt(N_x~)) (sum_{i=0}^{n-1} x_i |i> + 1·|n>),   (20)

where N_x~ = sum_{i=0}^{n-1} x_i^2 + 1. By the SWAP test [29] shown in Figure 1, we can estimate the squared inner product I using O(1/eps_I^2) copies of |w_1, b_1> and |x~>, where eps_I is the error accuracy and

    I = |<w_1, b_1 | x~>|^2 = |w_1 · x + b_1|^2 / N_x~,   (21)

with w_1 and b_1 here denoting the components of the normalized state |w_1, b_1>.

[FIG. 1. The SWAP test: an ancilla qubit initialized to |0> passes through a Hadamard gate, controls a SWAP of the registers holding |w_1, b_1> and |x~>, passes through a second Hadamard gate, and is then measured.]

Then we obtain the value |w_1 · x + b_1|^2 / ||w_1||^2, since the value of ||w_1||^2 has been estimated. Similarly, we get the value of |w_2 · x + b_2|^2 / ||w_2||^2. If |w_1 · x + b_1|^2 / ||w_1||^2 < |w_2 · x + b_2|^2 / ||w_2||^2, then the sample is closer to the first hyperplane and it will be labelled as a positive point; otherwise, it will be labelled as a negative one.
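A classical emulation of this decision rule is sketched below: it forms the amplitudes of |x~> from Eq. (20), computes (or, if a shot count is given, samples) the squared overlap of Eq. (21) for each hyperplane state, rescales by the estimated ||w_k||^2, and compares. The shot-sampling branch only mimics the statistics of the SWAP test, whose acceptance probability is (1 + I)/2; the function and argument names are our own, and wb1, wb2 stand in for classical read-outs of the hyperplane states.

```python
import numpy as np

def classify(x, wb1, wb2, shots=None, rng=None):
    """Emulate the Sec. III D rule: return +1 if x is closer to hyperplane 1, else -1.
    wb1 and wb2 are the (n+1)-dimensional vectors (w_k, b_k) read off |w_k, b_k>."""
    x_tilde = np.append(x, 1.0)
    x_tilde = x_tilde / np.linalg.norm(x_tilde)      # amplitudes of |x~>, Eq. (20)
    scores = []
    for wb in (wb1, wb2):
        wb = wb / np.linalg.norm(wb)                 # the state |w_k, b_k> is normalized
        I = float(wb @ x_tilde) ** 2                 # I = |<w_k, b_k | x~>|^2, Eq. (21)
        if shots is not None:                        # optionally sample SWAP-test outcomes
            if rng is None:
                rng = np.random.default_rng()
            p0 = 0.5 * (1.0 + I)                     # P(ancilla measured as 0)
            I = max(0.0, 2.0 * rng.binomial(shots, p0) / shots - 1.0)
        w_norm_sq = float(np.sum(wb[:-1] ** 2))      # ||w_k||^2, estimated in Sec. III C
        scores.append(I / w_norm_sq)                 # proportional to |w_k.x + b_k|^2 / ||w_k||^2
    return +1 if scores[0] < scores[1] else -1
```

Calling classify(x, wb1, wb2) reproduces the exact rule; passing shots=10_000 adds SWAP-test sampling noise of order 1/sqrt(shots).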
IV. TIME COMPLEXITY AND ERROR ANALYSIS

Now we analyze the time complexities O(log mn) and O(log n) stated in Theorem 1.

When solving the first hyperplane, it needs O(log mn) time to prepare |chi> and O(log n) time to perform the Walsh-Hadamard transformation on the second register. Thus, we need O(log mn) time to prepare |F^T e_2>. In the HHL algorithm [30], the errors come from Hamiltonian simulation and phase estimation. We denote the error in Hamiltonian simulation by eps_H and the error in phase estimation by eps. In Hamiltonian simulation, we need to simulate e^{-i tau H_1_hat Delta_t} for tau = 1, 2, ..., T - 1. Since the operator e^{-i H_1_hat Delta_t} can be simulated with O(Delta_t^2) error and the error accumulates linearly, we get eps_H = O(tau Delta_t^2). Since tau < T, we have eps_H = O(T Delta_t^2). Since Delta_t = t_0/T, we get eps_H = O(T Delta_t^2) = O(t_0^2/T) and thus T = O(t_0^2/eps_H). Since it needs O(log mn) time to simulate e^{-i H_1_hat Delta_t}, the total time of Hamiltonian simulation is O(T · log mn) = O(t_0^2 log mn / eps_H). In the HHL algorithm [30], we take t_0 = O(kappa/eps) so that the error of phase estimation is no more than eps, and then O(t_0^2 log mn / eps_H) = O(kappa^2 log mn / (eps_H eps^2)). Finally, we need to repeat the procedure O(kappa) times to achieve a constant success probability, so the total time is O(kappa^3 log mn / (eps_H eps^2)), where kappa, eps_H, eps are constants. Solving the second hyperplane is similar, so the learning time complexity is O(kappa^3 log mn / (eps_H eps^2)).

For classification, it needs O(log n) time to construct the state |x~> and O(log n / eps_s^2) time for the SWAP test, where eps_s is the error precision of the SWAP test. So the total time for classification is O(log n).

V. CONCLUSION

In this work, we have proposed a quantum algorithm for the twin support vector machine. For a training set with m samples chosen from an n-dimensional feature space, our quantum algorithm can learn two non-parallel hyperplanes in O(log mn) time, and then classify a new sample in O(log n) time by comparing the distances from the sample to the two hyperplanes. The advantage of our algorithm comes from the procedures of density matrix exponentiation, the HHL algorithm, and efficient estimation of inner products. Recently, quantum-inspired classical algorithms for solving linear systems have been proposed [31]. However, these algorithms require a low-rank assumption. Thus, when the matrix representation of the data is high-rank or non-singular, our algorithm still shows an exponential speedup over the corresponding classical algorithm. We hope to find more problems in machine learning that show quantum speedups in the future.

* lilvzh@mail.sysu.edu.cn (L. Li).

[1] P. W. Shor, SIAM Journal on Computing 26, 1484 (1997).
[2] L. K. Grover, in Proceedings of the Twenty-Eighth Annual ACM Symposium on Theory of Computing (ACM, 1996), pp. 212-219.
[3] J. Biamonte, P. Wittek, N. Pancotti, P. Rebentrost, N. Wiebe, and S. Lloyd, Nature 549, 195 (2017).
[4] N. Wiebe, D. Braun, and S. Lloyd, Physical Review Letters 109, 050505 (2012).
[5] S. Lloyd, M. Mohseni, and P. Rebentrost, Nature Physics 10, 631 (2014).
[6] P. Rebentrost, M. Mohseni, and S. Lloyd, Physical Review Letters 113, 130503 (2014).
[7] M. Schuld, I. Sinayskiy, and F. Petruccione, Physical Review A 94, 022342 (2016).
[8] V. Dunjko, J. M. Taylor, and H. J. Briegel, Physical Review Letters 117, 130501 (2016).
[9] A. Monràs, G. Sentís, and P. Wittek, Physical Review Letters 118, 190503 (2017).
[10] C.-H. Yu, F. Gao, Q.-L. Wang, and Q.-Y. Wen, Physical Review A 94, 042311 (2016).
[11] B. Duan, J. Yuan, Y. Liu, and D. Li, Physical Review A 96, 032301 (2017).
[12] S. Lloyd and C. Weedbrook, Physical Review Letters 121, 040502 (2018).
[13] H. Situ, Z. He, L. Li, and S. Zheng, arXiv preprint arXiv:1807.01235 (2018).
[14] L. Grover and T. Rudolph, arXiv preprint quant-ph/0208112 (2002).
[15] P. Kaye and M. Mosca, arXiv preprint quant-ph/0407102 (2004).
[16] A. N. Soklakov and R. Schack, Physical Review A 73, 012307 (2006).
[17] I. Kerenidis and A. Prakash, in 8th Innovations in Theoretical Computer Science Conference, ITCS 2017 (Dagstuhl, 2017), Vol. 67, pp. 49:1-49:21.
[18] Y. Du, T. Liu, Y. Li, R. Duan, and D. Tao, in Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, IJCAI-18 (2018), pp. 2093-2099.
[19] Z. Li, X. Liu, N. Xu, and J. Du, Physical Review Letters 114, 140504 (2015).
[20] X.-D. Cai, D. Wu, Z.-E. Su, M.-C. Chen, X.-L. Wang, L. Li, N.-L. Liu, C.-Y. Lu, and J.-W. Pan, Physical Review Letters 114, 110504 (2015).
[21] C. J. Burges, Data Mining and Knowledge Discovery 2, 121 (1998).
[22] R. Khemchandani, S. Chandra, et al., IEEE Transactions on Pattern Analysis and Machine Intelligence 29, 905 (2007).
[23] W. J. Scheirer, A. de Rezende Rocha, A. Sapkota, and T. E. Boult, IEEE Transactions on Pattern Analysis and Machine Intelligence 35, 1757 (2013).
[24] Z. Qi, Y. Tian, and Y. Shi, Pattern Recognition 46, 305 (2013).
[25] X. Chen, J. Yang, Q. Ye, and J. Liang, Pattern Recognition 44, 2643 (2011).
[26] G. R. Naik, D. K. Kumar, et al., IEEE Transactions on Information Technology in Biomedicine 14, 301 (2010).
[27] Y.-H. Shao, C.-H. Zhang, X.-B. Wang, and N.-Y. Deng, IEEE Transactions on Neural Networks 22, 962 (2011).
[28] M. A. Kumar and M. Gopal, Expert Systems with Applications 36, 7535 (2009).
[29] M. A. Nielsen and I. Chuang, Quantum Computation and Quantum Information (2002).
[30] A. W. Harrow, A. Hassidim, and S. Lloyd, Physical Review Letters 103, 150502 (2009).
[31] N. Chia, H. Lin, and C. Wang, arXiv preprint arXiv:1811.04852 (2018).
