
Quantum Algorithm for Unsupervised Anomaly Detection

Ming-Chao Guo,1 Shi-Jie Pan,1 Wen-Min Li,1, ∗ Fei Gao,1, † Su-Juan Qin,1 Xiao-Ling Yu,2 Xuan-Wen Zhang,2 and Qiao-Yan Wen1
1 State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing, 100876, China
2 Comprehensive Research Center of Electronic Information Technology in the MIIT, Shandong, 264200, China
(Dated: April 19, 2023)
Anomaly detection, an important branch of machine learning, plays a critical role in fraud detection, health care, intrusion detection, military surveillance, etc. As one of the most commonly used unsupervised anomaly detection algorithms, the Local Outlier Factor algorithm (LOF algorithm) has been extensively studied. This algorithm contains three steps, i.e., determining the k-distance neighborhood for each data point x, computing the local reachability density of x, and calculating the local outlier factor of x to judge whether x is abnormal. The LOF algorithm is computationally expensive when processing big data sets. Here we present a quantum LOF algorithm consisting of three parts corresponding to the classical algorithm. Specifically, the k-distance neighborhood of x is determined by amplitude estimation and minimum search; the local reachability density of each data point is calculated in parallel based on the quantum multiply-adder; the local outlier factor of each data point is obtained in parallel using amplitude estimation. It is shown that our quantum algorithm achieves exponential speedup on the dimension of the data points and polynomial speedup on the number of data points compared to its classical counterpart. This work demonstrates the advantage of quantum computing in unsupervised anomaly detection.

arXiv:2304.08710v1 [quant-ph] 18 Apr 2023

∗ liwenmin@bupt.edu.cn
† gaof@bupt.edu.cn

I. INTRODUCTION

Quantum computing theoretically demonstrates computational advantages over classical computing in solving certain problems, such as factoring integers [1], searching unstructured databases [2], solving linear and differential equations [3–5], attacking cryptography [6–8], and designing cryptographic protocols for private query [9–11]. Recently, the combination of quantum computing and machine learning, Quantum Machine Learning (QML) [12, 13], has emerged as a promising application of quantum technology. QML has made great strides in data classification [14, 15], clustering [16], linear regression [17–19], association rule mining [20], and dimensionality reduction [21–23].

Anomaly Detection (AD) refers to finding patterns in data that do not conform to expected behavior. It has been extensively used in various fields, such as credit card fraud detection [24], intrusion detection [25], and health care [26]. Over the years, many AD algorithms have been proposed, which can be divided into supervised, semi-supervised, and unsupervised [27]. Among them, unsupervised AD algorithms are more widely applicable as they do not require labeled data [28]. Breunig et al. proposed a density-based anomaly detection algorithm (called the LOF algorithm) [29], one of the most widely used unsupervised AD algorithms. The advantage of this algorithm is that it performs well when dealing with datasets containing patterns with diverse characteristics. However, like other AD algorithms, the LOF algorithm is quite time-consuming when processing big data sets.

In recent years, researchers have combined AD algorithms with quantum techniques and obtained various degrees of speedup. In 2018, Liu et al. proposed a quantum kernel principal component analysis algorithm [30], which achieves an exponential speedup on the dimension of the training data compared to its classical counterpart. In 2019, Liang et al. presented a quantum anomaly detection algorithm with density estimation [31], whose complexity is logarithmic in the dimension and the number of training data. In 2022, Guo et al. proposed a quantum algorithm for anomaly detection [32], which achieves exponential speedup on the number of training data points over its classical counterpart.

In this paper, given the importance of unsupervised anomaly detection, we focus on quantum algorithms for unsupervised anomaly detection. Specifically, we propose a quantum LOF algorithm. The classical LOF algorithm contains three steps: (1) determine the k-distance neighborhood of each data point; (2) compute the reachability distance between each data point and its k-distance neighbors and obtain the local reachability density by computing the mean of these reachability distances; (3) obtain the local outlier factor of each data point by comparing its local reachability density with those of its neighbors. Our quantum algorithm also consists of three parts, corresponding to the three steps of the classical LOF algorithm. In the first one, amplitude estimation [33] and minimum search [34] are utilized to speed up the process of determining the k-distance neighborhood. Given the information of the k-distance neighborhood, the second one accesses this information using QRAM [35, 38] and then calculates the local reachability density of each data point in parallel based on the quantum multiply-adder [36, 37]. The third one executes quantum amplitude estimation to obtain local outlier factors in parallel and performs Grover's algorithm [2] to speed up the search for abnormal data points. In conclusion, our quantum algorithm can achieve exponential speedup on the dimension $n$ of the data points and polynomial speedup on the number $m$ of data points compared to its classical counterpart.

The rest of this article is organized as follows. In Sec. II, we briefly review the classical LOF algorithm. In Sec. III, we present a quantum LOF algorithm and analyze its complexity in detail. The conclusion is given in Sec. IV.

II. REVIEW OF THE LOF ALGORITHM

In this section, we introduce the relevant definitions and the classical LOF algorithm [29].

A. Definitions

For convenience, we begin with the following related definitions [29]:

Definition 1: k-distance: For any positive integer $k$, the k-distance of $x \in D$ ($D$ is an unlabeled dataset), denoted as $k\text{-}d(x)$, is defined as the distance $d(x, y)$ between $x$ and $y \in D$ such that there are at least $k$ data points $y' \in D\setminus\{x\}$ that satisfy $d(x, y') \le d(x, y)$ and at most $k-1$ data points $y' \in D\setminus\{x\}$ that satisfy $d(x, y') < d(x, y)$.

Definition 2: k-distance neighborhood: Given $k\text{-}d(x)$, the k-distance neighborhood of $x$ is expressed as $N_k(x) = \{y \in D\setminus\{x\} \mid d(x, y) \le k\text{-}d(x)\}$. These data points $y$ are called the k-distance neighbors of $x$.

Definition 3: reachability distance: Given the parameter $k$, the reachability distance of data point $x$ with respect to $y$ is defined as $\text{reach-dist}_k(x, y) = \max\{k\text{-}d(y), d(x, y)\}$.

Intuitively, if $x$ is far away from $y$, then the reachability distance between the two is simply their actual distance. However, if they are "sufficiently" close, the actual distance is replaced by the k-distance of $y$. The purpose of introducing reachability distance is to reduce statistical fluctuation in the distance measure.

Definition 4: local reachability density: The local reachability density of $x$ is defined as

$$\mathrm{lrd}_k(x) = \left[\frac{\sum_{y\in N_k(x)} \text{reach-dist}_k(x, y)}{|N_k(x)|}\right]^{-1}. \tag{1}$$

Intuitively, the local reachability density of a data point $x$ is the inverse of the average reachability distance based on the k-distance neighbors of $x$.

Definition 5: local outlier factor: The local outlier factor of $x$ is defined as

$$\mathrm{LOF}_k(x) = \frac{1}{|N_k(x)|}\sum_{y\in N_k(x)} \frac{\mathrm{lrd}_k(y)}{\mathrm{lrd}_k(x)}. \tag{2}$$

The local outlier factor (LOF) of $x$ captures the degree to which we call $x$ an outlier. It is the average of the ratios of the local reachability densities of $x$'s k-distance neighbors to that of $x$. It is easy to see that the lower $x$'s local reachability density and the higher the local reachability densities of $x$'s k-distance neighbors, the higher the LOF value of $x$.

B. LOF algorithm

We are given an unlabeled data set $X = \{x^i\}_{i=1}^{m}$ and a parameter $k$, where $x^i = (x^i_1, x^i_2, \cdots, x^i_n) \in \mathbb{R}^n$. The LOF algorithm contains three steps: the first step is to find the k-distance neighborhood of each data point; the second step is to calculate the reachability distance between each data point and its k-distance neighbors, and then obtain the local reachability density by computing the mean of these reachability distances; the third step is to get the local outlier factor of each data point by comparing its local reachability density with those of the data points in its neighborhood. We identify whether each data point is an anomaly by comparing its local outlier factor with a pre-determined threshold $\delta$. The whole procedure of the LOF algorithm is shown in Algorithm 1.

Algorithm 1 The procedure of LOF algorithm
Input: The data set $X$, parameter $k$ and threshold $\delta$;
(1) Determine the k-distance neighborhood $N_k(x^i)$ of the data point $x^i$;
(2) Calculate the local reachability density $\mathrm{lrd}_k(x^i)$ of the data point $x^i$ by Eq. (1);
(3) Compute the local outlier factor $\mathrm{LOF}_k(x^i)$ of the data point $x^i$ by Eq. (2);
If $\mathrm{LOF}_k(x^i) \ge \delta$, the data point $x^i$ is marked as an anomaly.
Output: Abnormal data points.

The complexity of step (1) is $O(m^2 nk)$, and the complexity of step (2) and step (3) is $O(mk)$. In other words, the LOF algorithm is computationally expensive when processing big data sets.

III. QUANTUM LOF ALGORITHM

In this section, we propose a quantum version of the LOF algorithm and analyze its complexity. Our algorithm consists of three parts corresponding to the classical LOF algorithm. Firstly, we utilize amplitude estimation [33] and minimum search [34] to speed up the process of finding the k-distance neighborhood. Secondly, we use a QRAM data structure [35] to access the information of the k-distance neighborhood and then calculate the local reachability density of each data point in parallel based on the quantum multiply-adder [36, 37]. Finally,
we can obtain local outlier factors in parallel through quantum amplitude estimation and perform Grover's algorithm [2] to search for abnormal data points satisfying $\mathrm{LOF}_k(x^i) \ge \delta$ in Sec. III C. An overview of our algorithm is shown in Algorithm 2.

Assume that the data set $X = \{x^i\}_{i=1}^{m}$ is stored in a QRAM [35, 38], which allows the following mapping to be performed in time $O(\log mn)$:

$$O_X : |i\rangle|j\rangle|0\rangle \to |i\rangle|j\rangle|x^i_j\rangle, \tag{3}$$

where $i = 1, 2, \cdots, m$, $j = 1, 2, \cdots, n$.

As a common tool, QRAM has been used to handle state preparation tasks in most quantum algorithms, especially quantum machine learning algorithms, such as data classification [14, 15], clustering [16], linear regression [17–19], association rule mining [20], and dimensionality reduction [21–23].

Definition 6: Quantum adder [37, 39]: Let $x_1 x_2 \cdots x_n$ be the binary representation of $x$, where $x = x_1 \cdot 2^{n-1} + x_2 \cdot 2^{n-2} + \cdots + x_n \cdot 2^0$, $|x\rangle = |x_1\rangle|x_2\rangle \cdots |x_n\rangle$, and $|y\rangle = |y_1\rangle|y_2\rangle \cdots |y_n\rangle$. The quantum adder can realize the following transformation,

$$|x\rangle|y\rangle|0\rangle \xrightarrow{\text{quantum adder}} |x\rangle|y\rangle|x + y\rangle. \tag{4}$$

The circuit diagram of the quantum adder is shown in Refs. [37, 39]. It consumes $O(n^2)$ controlled rotation gates, so the complexity of the quantum adder is $O(n^2)$.

Definition 7: Quantum multiply-adder [37]: Let $|x\rangle = |x_1\rangle|x_2\rangle \cdots |x_n\rangle$, $|y\rangle = |y_1\rangle|y_2\rangle \cdots |y_n\rangle$ and $|z\rangle = |z_1\rangle|z_2\rangle \cdots |z_{2n}\rangle$. The quantum multiply-adder can realize the following transformation,

$$|x\rangle|y\rangle|z\rangle \xrightarrow{\text{quantum multiply-adder}} |x\rangle|y\rangle|z + x \cdot y\rangle. \tag{5}$$

Its circuit diagram is shown in Ref. [37]. The complexity of the quantum multiply-adder is $O(n^3)$.

A. Quantum process of finding the k-distance neighborhood

The k-distance neighborhood of each data point is determined based on its k-distance (as shown in Definition 1 and Definition 2). The choice of parameter $k$ is crucial to the performance of such algorithms, but how to choose a suitable $k$ is outside the scope of our discussion. Here we assume that the parameter $k$ is given in advance.

1. Step details

We adopt quantum amplitude estimation and quantum minimum search to determine the k-distance neighborhood of $x^i$. The whole procedure is as follows.

(1.1) Prepare the quantum state ($i = 1, 2, \cdots, m$)

$$|i\rangle \frac{1}{\sqrt{(m-1)n}} \sum_{t\neq i}\sum_{j=1}^{n} |t\rangle|j\rangle|0\rangle|0\rangle. \tag{6}$$

(1.2) Apply the oracle $O_X$ to prepare

$$|i\rangle \frac{1}{\sqrt{(m-1)n}} \sum_{t\neq i}\sum_{j=1}^{n} |t\rangle|j\rangle|x^i_j\rangle|x^t_j\rangle. \tag{7}$$

(1.3) Perform the quantum multiply-adder (QMA) [36, 37] to create

$$|i\rangle \frac{1}{\sqrt{(m-1)n}} \sum_{t\neq i}\sum_{j=1}^{n} |t\rangle|j\rangle|x^i_j\rangle|x^t_j\rangle|x^i_j - x^t_j\rangle. \tag{8}$$

(1.4) Append an ancillary qubit and perform a controlled rotation from $|0\rangle$ to $\frac{x^i_j - x^t_j}{C}|0\rangle + \sqrt{1 - \left(\frac{x^i_j - x^t_j}{C}\right)^2}\,|1\rangle$ conditioned on $|x^i_j - x^t_j\rangle$ [3]. Uncompute the fourth, fifth and sixth registers to generate

$$|i\rangle \frac{1}{\sqrt{(m-1)n}} \sum_{t\neq i}\sum_{j=1}^{n} |t\rangle|j\rangle\left(\frac{x^i_j - x^t_j}{C}|0\rangle + \sqrt{1 - \left(\frac{x^i_j - x^t_j}{C}\right)^2}\,|1\rangle\right) := |i\rangle \frac{1}{\sqrt{m-1}} \sum_{t\neq i} |t\rangle|\chi_t\rangle, \tag{9}$$

where $C = \max |x^i_j - x^t_j|$. The state $|\chi_t\rangle$ can be rewritten as $|\chi_t\rangle = \sin\theta_t |\chi^0_t\rangle + \cos\theta_t |\chi^1_t\rangle$, where $\sin\theta_t |\chi^0_t\rangle = \frac{1}{\sqrt{n}}\sum_{j=1}^{n} \frac{x^i_j - x^t_j}{C}|j\rangle|0\rangle$ and $\cos\theta_t |\chi^1_t\rangle = \frac{1}{\sqrt{n}}\sum_{j=1}^{n} \sqrt{1 - \left(\frac{x^i_j - x^t_j}{C}\right)^2}\,|j\rangle|1\rangle$. Thus, we have $\sin^2\theta_t = \frac{1}{n}\sum_{j=1}^{n}\left(\frac{x^i_j - x^t_j}{C}\right)^2$, $\theta_t \in [0, \frac{\pi}{2}]$.

(1.5) Perform quantum amplitude estimation to get

$$|i\rangle \frac{1}{\sqrt{m-1}} \sum_{t\neq i} |t\rangle|\chi_t\rangle\left|\frac{\theta_t}{\pi}\right\rangle, \tag{10}$$

then perform the sine gate on $|\frac{\theta_t}{\pi}\rangle$ [37] and uncompute the redundant register $|\chi_t\rangle$ to create the superposition of the distances between all points and $x^i$:

$$|i\rangle \frac{1}{\sqrt{m-1}} \sum_{t\neq i} |t\rangle|\sin\theta_t\rangle = |i\rangle \frac{1}{\sqrt{m-1}} \sum_{t\neq i} |t\rangle\left|\frac{1}{\sqrt{n}C} d(x^i, x^t)\right\rangle, \tag{11}$$

where $d(x^i, x^t)$ represents the Euclidean distance between $x^i$ and $x^t$. (The detailed procedure can be seen in Ref. [32].)

(1.6) Append an ancilla register, perform the quantum minimum search [34] to get the k-distance of $x^i$ and store it in the ancilla register:

$$|i\rangle \frac{1}{\sqrt{m-1}} \sum_{t\neq i} |t\rangle\left|\frac{1}{\sqrt{n}C} d(x^i, x^t)\right\rangle|k\text{-}d(x^i)\rangle. \tag{12}$$
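As a purely classical point of reference for steps (1.1)-(1.6), the sketch below computes the values those steps are described as writing into registers: $\sin\theta_t$ as the square root of the probability of finding the ancilla of $|\chi_t\rangle$ in $|0\rangle$, and the k-distance as the k-th smallest of these values. It does not simulate amplitude estimation or minimum search; the function names and the toy data are illustrative assumptions.

```python
import math

def normalized_distances(data, i, C):
    """Classical stand-in for steps (1.1)-(1.5): sin(theta_t) for every t != i."""
    n = len(data[i])
    out = {}
    for t, xt in enumerate(data):
        if t == i:
            continue
        # Probability of ancilla |0> in |chi_t>: (1/n) * sum_j ((x^i_j - x^t_j)/C)^2
        p0 = sum(((a - b) / C) ** 2 for a, b in zip(data[i], xt)) / n
        out[t] = math.sqrt(p0)  # sin(theta_t), with theta_t in [0, pi/2]
    return out

def k_distance(dists, k):
    """Classical stand-in for step (1.6): the k-th smallest normalized distance."""
    return sorted(dists.values())[k - 1]

# Made-up data set with m = 4 points in n = 2 dimensions.
data = [(0.0, 0.0), (3.0, 4.0), (1.0, 0.0), (0.0, 2.0)]
# C is taken as the largest coordinate-wise difference over all pairs.
C = max(abs(a - b) for p in data for q in data for a, b in zip(p, q))
d_bar = normalized_distances(data, i=0, C=C)
kd = k_distance(d_bar, k=2)
```

In this toy example each entry of `d_bar` agrees with the Euclidean distance divided by $\sqrt{n}C$, which is the relation stated in Eq. (11).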
Algorithm 2 The procedure of quantum LOF algorithm

Input: Data matrix $X$ stored in a QRAM. Pre-determined threshold $\delta$ and parameter $k$;

Step 1: Determine the k-distance neighborhood $N_k(x^i)$ of the data point $x^i$ to get
$|i\rangle \frac{1}{\sqrt{n_i}} \sum_{x^t\in N_k(x^i)} |t\rangle\left|\frac{1}{\sqrt{n}C} d(x^i, x^t)\right\rangle|k\text{-}d(x^i)\rangle|n_i\rangle$.
Measure the quantum state to obtain the information of $N_k(x^i)$;

Step 2: Calculate the local reachability density $\mathrm{lrd}_k(x^i)$ of the data point $x^i$ to get
$\frac{1}{\sqrt{m}}\sum_{i=1}^{m} |i\rangle\left|[\overline{\mathrm{lrd}}_k(x^i)]^{-1}\right\rangle \frac{1}{\sqrt{n_i}} \sum_{j=1,\,x^t\in N_k(x^i)}^{n_i} |j\rangle|t\rangle\left|[\overline{\mathrm{lrd}}_k(x^t)]^{-1}\right\rangle$;

Step 3: Compute the local outlier factor $\mathrm{LOF}_k(x^i)$ of data point $x^i$ to obtain
$\frac{1}{\sqrt{m}}\sum_{i=1}^{m} |i\rangle|\mathrm{LOF}_k(x^i)\rangle$;
Grover's algorithm is applied to search all indices $i$ of abnormal data points that satisfy $\mathrm{LOF}_k(x^i) \ge \delta$.

Output: Abnormal data points.
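The three steps of Algorithm 2 mirror the classical LOF computation of Definitions 1-5. For comparison, a minimal brute-force classical sketch in plain Python is given below. It is not the quantum procedure; the function name `lof_scores`, the toy data and the threshold value are illustrative assumptions, and the sketch assumes no duplicate points (so no mean reachability distance is zero).

```python
import math

def lof_scores(points, k):
    """Classical LOF following Definitions 1-5 (Euclidean distance, brute force)."""
    m = len(points)
    # Pairwise Euclidean distances d(x, y).
    dist = [[math.dist(p, q) for q in points] for p in points]

    k_dist = []      # k-distance of each point (Definition 1)
    neighbors = []   # k-distance neighborhood N_k(x) (Definition 2)
    for i in range(m):
        others = sorted(dist[i][t] for t in range(m) if t != i)
        kd = others[k - 1]
        k_dist.append(kd)
        neighbors.append([t for t in range(m) if t != i and dist[i][t] <= kd])

    def reach_dist(i, t):
        # Definition 3: reach-dist_k(x, y) = max{k-d(y), d(x, y)}.
        return max(k_dist[t], dist[i][t])

    # Definition 4: lrd is the inverse of the mean reachability distance.
    lrd = [len(neighbors[i]) / sum(reach_dist(i, t) for t in neighbors[i])
           for i in range(m)]

    # Definition 5: LOF is the mean ratio lrd(y) / lrd(x) over neighbors y.
    return [sum(lrd[t] for t in neighbors[i]) / (len(neighbors[i]) * lrd[i])
            for i in range(m)]


# Toy data: a tight 3x3 grid plus one distant point (illustrative only).
data = [(float(a), float(b)) for a in range(3) for b in range(3)] + [(10.0, 10.0)]
scores = lof_scores(data, k=3)
delta = 2.0  # hypothetical threshold
anomalies = [i for i, s in enumerate(scores) if s >= delta]
```

On this toy set only the distant point (index 9) exceeds the threshold, while the grid points have scores close to 1.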

(1.7) Perform Grover's algorithm [2] to create the superposition state of labels for points in the k-distance neighborhood of $x^i$:

$$|i\rangle \frac{1}{\sqrt{n_i}} \sum_{x^t\in N_k(x^i)} |t\rangle\left|\frac{1}{\sqrt{n}C} d(x^i, x^t)\right\rangle|k\text{-}d(x^i)\rangle, \tag{13}$$

where $\frac{1}{\sqrt{n}C} d(x^i, x^t) \le k\text{-}d(x^i)$.

(1.8) Perform quantum counting [33] to get the number of points in the k-distance neighborhood of $x^i$:

$$|i\rangle \frac{1}{\sqrt{n_i}} \sum_{x^t\in N_k(x^i)} |t\rangle\left|\frac{1}{\sqrt{n}C} d(x^i, x^t)\right\rangle|k\text{-}d(x^i)\rangle|n_i\rangle, \tag{14}$$

where $n_i = |N_k(x^i)|$. Then by measuring the state in the computational basis several times, we can obtain the index $t$, the distance $\frac{1}{\sqrt{n}C} d(x^i, x^t)$, and the number $n_i$ of k-distance neighbors of $x^i$ for $i = 1, 2, \cdots, m$.

2. Complexity analysis

In step (1.2), the oracle $O_X$ can be implemented in $O(\log mn)$. In steps (1.3)-(1.4), the QMA and the controlled rotation have complexity $O(1)$ (we assume that the data points are represented by a constant number of qubits), so the complexity of these gates can be omitted [37]. In step (1.5), to ensure that the error is $\varepsilon_1$, the amplitude estimation costs $O[(\log mn)/\varepsilon_1]$ time. (For detailed complexity analysis, see the second paragraph of Sec. 3.3 in Ref. [32].) In step (1.6), the quantum minimum search is performed to find the k-distance of $x^i$, which requires repeating $O(k\sqrt{m})$ operations of steps (1.2)-(1.5), so the complexity of step (1.6) is $O[k\sqrt{m}\log(mn)/\varepsilon_1]$. In step (1.7), the complexity of performing Grover's algorithm is $O[n_i\sqrt{m}\log(mn)/\varepsilon_1]$. As for the quantum counting in step (1.8), let $|n_i - \hat{n}_i| \le \varepsilon_2$; it needs to repeatedly perform $O(\sqrt{(m-1)n_i}/\varepsilon_2)$ operations of steps (1.2)-(1.5) and one operation of step (1.6), so the complexity of the quantum counting is $O[\sqrt{mn_i}\log(mn)/\varepsilon_1\varepsilon_2] + O[k\sqrt{m}\log(mn)/\varepsilon_1]$. We can get that the complexity of step (1.8) is $O[n_i\sqrt{m}\log(mn)/\varepsilon_1\varepsilon_2]$.

Now, we analyze the error of step (1.5). For convenience, we use $\hat{a}$ to denote the estimation of $a$ in the following sections. Let $|\theta_t - \hat{\theta}_t| \le \varepsilon_1$; we can obtain

$$\left|\frac{1}{\sqrt{n}C}\hat{d}(x^i, x^t) - \frac{1}{\sqrt{n}C} d(x^i, x^t)\right| = |\sin\theta_t - \sin\hat{\theta}_t| \le |\theta_t - \hat{\theta}_t| \le \varepsilon_1. \tag{15}$$

For convenience, we use $\bar{d}(x^i, x^t)$ to denote $\frac{1}{\sqrt{n}C} d(x^i, x^t)$ in the following sections.

The operations from step (1.1) to step (1.8) need to be repeated $m$ times to obtain the k-distance neighborhood of all data points, so the overall complexity is $O[\max_i\{n_i\} \cdot m^{3/2}\log(mn)/\varepsilon_1\varepsilon_2]$. The complexity of steps (1.1) to (1.8) is illustrated in Table I.

TABLE I: The time complexity of steps (1.1) to (1.8)
Steps              Complexity
(1.1)-(1.2)        $O(\log mn)$
(1.3)-(1.4)        $O(1)$
(1.5)              $O[\log(mn)/\varepsilon_1]$
(1.6)              $O[k\sqrt{m}\log(mn)/\varepsilon_1]$
(1.7)              $O[n_i\sqrt{m}\log(mn)/\varepsilon_1]$
(1.8)              $O[n_i\sqrt{m}\log(mn)/\varepsilon_1\varepsilon_2]$
Total complexity   $O[\max_i\{n_i\} \cdot m^{3/2}\log(mn)/\varepsilon_1\varepsilon_2]$

[Circuit diagram; stages labeled (1.1)-(1.2), (1.3)-(1.4), (1.5), (1.6)-(1.7).]
FIG. 1: Quantum circuit of step 1. Here "/" denotes a bundle of wires, QA represents a quantum adder, C-R denotes controlled rotation, QMS represents quantum minimum search and QC denotes quantum counting.

B. Quantum process of computing the local reachability density

We have obtained the index, distance and number of neighbors of all data points in the previous process. This information can be represented as a matrix and stored in QRAM. To facilitate quantum access in the subsequent

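sections, we assume that it allows the following mappings

$$\begin{aligned}
U &: |i\rangle|0\rangle|0\rangle \to |i\rangle|n_i\rangle|k\text{-}d(x^i)\rangle, \\
V &: |i\rangle|t\rangle|0\rangle \to |i\rangle|t\rangle|\bar{d}(x^i, x^t)\rangle, \\
G &: |i\rangle|0\rangle \to |i\rangle \frac{1}{\sqrt{n_i}}\sum_{j=1}^{n_i} |j\rangle, \\
W &: |i\rangle \frac{1}{\sqrt{n_i}}\sum_{j=1}^{n_i} |j\rangle|0\rangle \to |i\rangle \frac{1}{\sqrt{n_i}}\sum_{j=1,\,x^t\in N_k(x^i)}^{n_i} |j\rangle|t\rangle
\end{aligned} \tag{16}$$

in complexity $O[\mathrm{polylog}(m\max\{n_i\})]$.

1. Step details

We use QRAM to access the information of the k-distance neighborhood of each point, and then compute the local reachability density of each data point in parallel based on the quantum multiply-adder. The whole procedure is as follows.

(2.1) Prepare the following quantum state

$$\frac{1}{\sqrt{m}}\sum_{i=1}^{m} |i\rangle|0\rangle \bigotimes_{j=1}^{\max\{n_i\}} [|j\rangle|0\rangle]. \tag{17}$$

(2.2) Perform the oracles $U$ and $W$ to prepare the number $n_i$ of k-distance neighbors of $x^i$ and the k-distance of each neighbor $x^t$, then uncompute the redundant registers to create the state

$$\frac{1}{\sqrt{m}}\sum_{i=1}^{m} |i\rangle|n_i\rangle \bigotimes_{x^t\in N_k(x^i)} [|t\rangle|k\text{-}d(x^t)\rangle|0\rangle]. \tag{18}$$

(2.3) Apply the oracle $V$ to create the distance between $x^i$ and $x^t$ and store it in the register:

$$\frac{1}{\sqrt{m}}\sum_{i=1}^{m} |i\rangle|n_i\rangle \bigotimes_{x^t\in N_k(x^i)} [|t\rangle|k\text{-}d(x^t)\rangle|\bar{d}(x^i, x^t)\rangle]. \tag{19}$$

(2.4) Append ancillary registers and perform the controlled operation $U_f$ on $\bigotimes_{x^t\in N_k(x^i)} |k\text{-}d(x^t)\rangle|\bar{d}(x^i, x^t)\rangle$, where $U_f : |a\rangle|b\rangle|0\rangle \to |a\rangle|b\rangle|\max\{a, b\}\rangle$ and the function $f(a, b) = \max\{a, b\}$, to generate the state

$$\frac{1}{\sqrt{m}}\sum_{i=1}^{m} |i\rangle|n_i\rangle \bigotimes_{x^t\in N_k(x^i)} [|t\rangle|k\text{-}d(x^t)\rangle|\bar{d}(x^i, x^t)\rangle|\overline{\text{reach-dist}}_k(x^i, x^t)\rangle], \tag{20}$$

where $\overline{\text{reach-dist}}_k(x^i, x^t) = \max\{k\text{-}d(x^t), \bar{d}(x^i, x^t)\} = \frac{1}{\sqrt{n}C} \cdot \text{reach-dist}_k(x^i, x^t)$.

(2.5) Perform the quantum multiply-adder (QMA) on $\bigotimes_{x^t\in N_k(x^i)} |\overline{\text{reach-dist}}_k(x^i, x^t)\rangle$ and uncompute redundant registers to create

$$\frac{1}{\sqrt{m}}\sum_{i=1}^{m} |i\rangle|n_i\rangle\left|\frac{\sum_{x^t\in N_k(x^i)} \overline{\text{reach-dist}}_k(x^i, x^t)}{n_i}\right\rangle = \frac{1}{\sqrt{m}}\sum_{i=1}^{m} |i\rangle|n_i\rangle\left|[\overline{\mathrm{lrd}}_k(x^i)]^{-1}\right\rangle, \tag{21}$$

where $[\overline{\mathrm{lrd}}_k(x^i)]^{-1} = \frac{1}{\sqrt{n}C}[\mathrm{lrd}_k(x^i)]^{-1}$.

(2.6) Append registers and perform the oracle $G$ to generate

$$\frac{1}{\sqrt{m}}\sum_{i=1}^{m} |i\rangle|n_i\rangle\left|[\overline{\mathrm{lrd}}_k(x^i)]^{-1}\right\rangle \frac{1}{\sqrt{n_i}}\sum_{j=1}^{n_i} |j\rangle. \tag{22}$$

(2.7) Append an ancilla register and execute the oracle $W$ on Eq. (22) to get

$$\frac{1}{\sqrt{m}}\sum_{i=1}^{m} |i\rangle|n_i\rangle\left|[\overline{\mathrm{lrd}}_k(x^i)]^{-1}\right\rangle \frac{1}{\sqrt{n_i}}\sum_{j=1,\,x^t\in N_k(x^i)}^{n_i} |j\rangle|t\rangle. \tag{23}$$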
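Steps (2.4)-(2.5) act on quantities rescaled by $1/(\sqrt{n}C)$. As a classical sketch of what the registers are described as holding, the snippet below applies the maximum of step (2.4) and the averaging of step (2.5) to made-up values and checks that rescaling every distance by $1/(\sqrt{n}C)$ rescales the mean reachability distance by the same factor. The function name and all numbers are illustrative assumptions, and the quantum multiply-adder itself is not simulated.

```python
import math

def mean_reach_dist(k_dists, dists):
    """Mean of max{k-d(x^t), d(x^i, x^t)} over the neighbors x^t of x^i.

    Classically, this is the inverse local reachability density of x^i,
    i.e. the value described for the last register of Eq. (21).
    """
    reach = [max(kd, d) for kd, d in zip(k_dists, dists)]  # max, as in step (2.4)
    return sum(reach) / len(reach)                          # mean, as in step (2.5)

# Made-up raw (unnormalized) values for three neighbors of some point x^i.
raw_k_dists = [1.5, 2.0, 0.5]   # k-distances of the neighbors x^t
raw_dists = [1.0, 2.5, 2.0]     # distances d(x^i, x^t)
n, C = 4, 2.5                   # assumed dimension and bound C
s = 1.0 / (math.sqrt(n) * C)    # common rescaling factor 1/(sqrt(n) C)

inv_lrd_raw = mean_reach_dist(raw_k_dists, raw_dists)
inv_lrd_bar = mean_reach_dist([s * v for v in raw_k_dists],
                              [s * v for v in raw_dists])
```

Because the factor is common to all data points, it cancels in any ratio of two such means.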
(2.8) Perform operations similar to steps (2.1)-(2.3) on $|t\rangle$ to prepare the following quantum state

$$\frac{1}{\sqrt{m}}\sum_{i=1}^{m} |i\rangle|n_i\rangle\left|[\overline{\mathrm{lrd}}_k(x^i)]^{-1}\right\rangle \frac{1}{\sqrt{n_i}}\sum_{j=1,\,x^t\in N_k(x^i)}^{n_i} |j\rangle|t\rangle|n_t\rangle \bigotimes_{x^l\in N_k(x^t)} [|l\rangle|k\text{-}d(x^l)\rangle|\bar{d}(x^l, x^t)\rangle|0\rangle]. \tag{24}$$

(2.9) Repeat the operations of steps (2.4)-(2.5) to obtain the local density of data point $x^t$; we can create the state

$$\frac{1}{\sqrt{m}}\sum_{i=1}^{m} |i\rangle\left|[\overline{\mathrm{lrd}}_k(x^i)]^{-1}\right\rangle \frac{1}{\sqrt{n_i}}\sum_{j=1,\,x^t\in N_k(x^i)}^{n_i} |j\rangle|t\rangle\left|[\overline{\mathrm{lrd}}_k(x^t)]^{-1}\right\rangle. \tag{25}$$

The entire quantum circuit is shown in Fig. 2.

2. Complexity analysis

In step (2.1), to prepare the state in Eq. (17), H gates are invoked with complexity $O(\log m)$. In step (2.2), it takes oracles $U$ and $W$ with complexity $O[\log(m) + \max\{n_i\}]$. The complexity of performing oracle $V$ in step (2.3) is $O[\mathrm{polylog}(m\max\{n_i\})]$. In step (2.5), it takes $\max\{n_i\}$ applications of the quantum multiply-adder to obtain the inverse of the local density of $x^i$, thus the complexity to prepare the state in Eq. (21) is $O[\max\{n_i\}\log(m)]$. The oracles $G$ and $W$ in steps (2.6)-(2.7) are performed with complexity $O[\log(m\max\{n_i\})]$. Similar to steps (2.1)-(2.3), the complexity of step (2.8) is $O[\log(m\max\{n_i\}) + \max\{n_i\}]$. Step (2.9) repeats the operations of steps (2.4)-(2.5) to obtain Eq. (25) with complexity $O[\max\{n_i\}\log(m)]$. The total complexity is $O[\max\{n_i\}\log(m\max\{n_i\})]$. The complexity of steps (2.1) to (2.9) is illustrated in Table II.

TABLE II: The time complexity of steps (2.1) to (2.9)
Steps              Complexity
(2.1)              $O(\log m)$
(2.2)              $O[\log(m) + \max\{n_i\}]$
(2.3)              $O[\mathrm{polylog}(m\max\{n_i\})]$
(2.4)-(2.5)        $O[\max\{n_i\}\log(m)]$
(2.6)-(2.7)        $O[\log(m\max\{n_i\})]$
(2.8)              $O[\log(m\max\{n_i\}) + \max\{n_i\}]$
(2.9)              $O[\max\{n_i\}\log(m)]$
Total complexity   $O[\max\{n_i\}\log(m\max\{n_i\})]$

[Circuit diagram; stages labeled (2.1), (2.2), (2.3)-(2.5), (2.6)-(2.7).]
FIG. 2: Quantum circuit of step 2. Here "/" denotes a bundle of wires, QMA represents a quantum multiply-adder, $|A\rangle = |\bar{d}(x^i, x^t)\rangle^{\otimes x^t\in N_k(x^i)}$, $|B\rangle = |\bar{d}(x^t, x^l)\rangle^{\otimes x^l\in N_k(x^t)}$ and $|C\rangle = |\overline{\text{reach-dist}}_k(x^i, x^t)\rangle^{\otimes x^t\in N_k(x^i)}$.

C. Quantum process of obtaining local outlier factors

We have obtained the local reachability density of each data point, thus we can compute the local outlier factor of each data point in parallel using amplitude estimation.

1. Step details

We calculate local outlier factors in parallel through amplitude estimation and perform Grover's algorithm to search for abnormal data points satisfying $\mathrm{LOF}_k(x^i) \ge \delta$. The whole procedure is as follows.

(3.1) Perform the quantum multiply-adder on Eq. (25) to get

$$\frac{1}{\sqrt{m}}\sum_{i=1}^{m} |i\rangle \frac{1}{\sqrt{n_i}}\sum_{j=1,\,x^t\in N_k(x^i)}^{n_i} |j\rangle|t\rangle\left|\frac{[\overline{\mathrm{lrd}}_k(x^i)]^{-1}}{[\overline{\mathrm{lrd}}_k(x^t)]^{-1}}\right\rangle := \frac{1}{\sqrt{m}}\sum_{i=1}^{m} |i\rangle \frac{1}{\sqrt{n_i}}\sum_{j=1,\,x^t\in N_k(x^i)}^{n_i} |j\rangle|t\rangle|\rho^i_t\rangle, \tag{26}$$

where $\rho^i_t = \frac{[\overline{\mathrm{lrd}}_k(x^i)]^{-1}}{[\overline{\mathrm{lrd}}_k(x^t)]^{-1}} = \frac{\frac{1}{\sqrt{n}C}[\mathrm{lrd}_k(x^i)]^{-1}}{\frac{1}{\sqrt{n}C}[\mathrm{lrd}_k(x^t)]^{-1}} = \frac{\mathrm{lrd}_k(x^t)}{\mathrm{lrd}_k(x^i)}$.

(3.2) Append an ancillary qubit and perform a controlled rotation from $|0\rangle$ to $\sqrt{\frac{\rho^i_t}{E}}|0\rangle + \sqrt{1 - \frac{\rho^i_t}{E}}|1\rangle$ conditioned on $|\rho^i_t\rangle$; we can obtain

$$\frac{1}{\sqrt{m}}\sum_{i=1}^{m} |i\rangle \frac{1}{\sqrt{n_i}}\sum_{j=1,\,x^t\in N_k(x^i)}^{n_i} |j\rangle|t\rangle|\rho^i_t\rangle\left(\sqrt{\frac{\rho^i_t}{E}}|0\rangle + \sqrt{1 - \frac{\rho^i_t}{E}}|1\rangle\right), \tag{27}$$

where $E = \max\left\{\frac{\mathrm{lrd}_k(x^t)}{\mathrm{lrd}_k(x^i)}\right\}$. Similar to step (1.4), $\sin^2(\alpha^t_i) = \frac{1}{n_i}\sum_{x^t\in N_k(x^i)} \frac{\rho^i_t}{E}$ is the probability that $|0\rangle$ is measured, which can be obtained by amplitude estimation.

(3.3) Append an ancilla register, execute quantum amplitude estimation to obtain the amplitude of $|0\rangle$ and store it in the register, i.e.,

$$\frac{1}{\sqrt{m}}\sum_{i=1}^{m} |i\rangle|E \cdot \sin^2(\alpha^t_i)\rangle = \frac{1}{\sqrt{m}}\sum_{i=1}^{m} |i\rangle|\mathrm{LOF}_k(x^i)\rangle. \tag{28}$$

(3.4) Perform Grover's algorithm to search all indices $i$ of abnormal data points that satisfy $\mathrm{LOF}_k(x^i) \ge \delta$.

2. Complexity analysis

The complexity of this step is $O\left(\frac{\sqrt{mT}\max\{n_i\}\log(m\max\{n_i\})}{\varepsilon_3}\right)$, which is mainly derived from the amplitude estimation and Grover's algorithm, where $T$ represents the number of abnormal data points and $\varepsilon_3$ represents the error of step (3.3). The specific analysis is as follows.

In step (3.3), the amplitude estimation block needs $O(1/\varepsilon_3)$ applications of the operators preparing Eq. (27) to achieve error $\varepsilon_3$, where the complexity of performing the operators to obtain Eq. (27) is $O[\max\{n_i\} \cdot \log(m\max\{n_i\})]$. Therefore, the complexity of step (3.3) is $O\left(\frac{\max\{n_i\}\log(m\max\{n_i\})}{\varepsilon_3}\right)$. In step (3.4), the complexity

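of executing Grover's algorithm to obtain all abnormal data points is $O\left(\sqrt{mT} \cdot \frac{\max\{n_i\}\log(m\max\{n_i\})}{\varepsilon_3}\right)$.

The error of the amplitude estimation performed in step (3.3) is

$$\left|\widehat{\mathrm{LOF}}_k(x^i) - \frac{1}{n_i}\sum_{x^t\in N_k(x^i)} \frac{\widehat{\mathrm{lrd}}_k(x^t)}{\widehat{\mathrm{lrd}}_k(x^i)}\right| = E\,|\sin^2(\alpha^t_i) - \sin^2(\hat{\alpha}^t_i)| \le E\,|\hat{\alpha}^t_i - \alpha^t_i| \le E \cdot \varepsilon_3. \tag{29}$$

Now, we analyze the error of $\mathrm{LOF}_k(x^i)$ as follows:

$$\begin{aligned}
&|\widehat{\mathrm{LOF}}_k(x^i) - \mathrm{LOF}_k(x^i)| = \left|\widehat{\mathrm{LOF}}_k(x^i) - \frac{1}{n_i}\sum_{x^t\in N_k(x^i)} \frac{\mathrm{lrd}_k(x^t)}{\mathrm{lrd}_k(x^i)}\right| \\
&= \left|\widehat{\mathrm{LOF}}_k(x^i) - \frac{1}{n_i}\sum_{x^t\in N_k(x^i)} \frac{\widehat{\mathrm{lrd}}_k(x^t)}{\widehat{\mathrm{lrd}}_k(x^i)} + \frac{1}{n_i}\sum_{x^t\in N_k(x^i)} \left(\frac{\widehat{\mathrm{lrd}}_k(x^t)}{\widehat{\mathrm{lrd}}_k(x^i)} - \frac{\mathrm{lrd}_k(x^t)}{\mathrm{lrd}_k(x^i)}\right)\right| \\
&\le \left|\widehat{\mathrm{LOF}}_k(x^i) - \frac{1}{n_i}\sum_{x^t\in N_k(x^i)} \frac{\widehat{\mathrm{lrd}}_k(x^t)}{\widehat{\mathrm{lrd}}_k(x^i)}\right| + \left|\frac{1}{n_i}\sum_{x^t\in N_k(x^i)} \left(\frac{\widehat{\mathrm{lrd}}_k(x^t)}{\widehat{\mathrm{lrd}}_k(x^i)} - \frac{\mathrm{lrd}_k(x^t)}{\mathrm{lrd}_k(x^i)}\right)\right| \\
&\le E \cdot \varepsilon_3 + \frac{1}{n_i}\sum_{x^t\in N_k(x^i)} \left|\frac{\frac{1}{n_i}\sum_{x^t\in N_k(x^i)} \widehat{\overline{\text{reach-dist}}}_k(x^i, x^t)}{\frac{1}{n_t}\sum_{x^l\in N_k(x^t)} \widehat{\overline{\text{reach-dist}}}_k(x^t, x^l)} - \frac{\frac{1}{n_i}\sum_{x^t\in N_k(x^i)} \overline{\text{reach-dist}}_k(x^i, x^t)}{\frac{1}{n_t}\sum_{x^l\in N_k(x^t)} \overline{\text{reach-dist}}_k(x^t, x^l)}\right| \\
&\le E \cdot \varepsilon_3 + \frac{1}{n_i}\sum_{x^t\in N_k(x^i)} \left|\frac{\frac{1}{n_i}\sum_{x^t\in N_k(x^i)} \hat{\bar{d}}(x^i, x^t)}{\frac{1}{n_t}\sum_{x^l\in N_k(x^t)} \hat{\bar{d}}(x^t, x^l)} - \frac{\frac{1}{n_i}\sum_{x^t\in N_k(x^i)} \bar{d}(x^i, x^t)}{\frac{1}{n_t}\sum_{x^l\in N_k(x^t)} \bar{d}(x^t, x^l)}\right| \\
&\le E \cdot \varepsilon_3 + \frac{1}{n_i}\sum_{x^t\in N_k(x^i)} \frac{\frac{1}{n_i}\sum_{x^t\in N_k(x^i)} \frac{1}{n_t}\sum_{x^l\in N_k(x^t)} \left|\hat{\bar{d}}(x^i, x^t)\,\bar{d}(x^t, x^l) - \bar{d}(x^i, x^t)\,\hat{\bar{d}}(x^t, x^l)\right|}{\left(\frac{1}{n_t}\sum_{x^l\in N_k(x^t)} \bar{d}(x^t, x^l)\right) \cdot \left(\frac{1}{n_t}\sum_{x^l\in N_k(x^t)} \hat{\bar{d}}(x^t, x^l)\right)} \\
&\le E \cdot \varepsilon_3 + \frac{1}{n_i}\sum_{x^t\in N_k(x^i)} \frac{\bar{d}(x^t, x^l)\,\varepsilon_1 + \bar{d}(x^i, x^t)\,\varepsilon_1}{\left(\frac{1}{n_t}\sum_{x^l\in N_k(x^t)} \bar{d}(x^t, x^l)\right) \cdot \left(\frac{1}{n_t}\sum_{x^l\in N_k(x^t)} \hat{\bar{d}}(x^t, x^l)\right)} \\
&\le E \cdot \varepsilon_3 + \frac{2\varepsilon_1}{\left(\frac{1}{n_t}\sum_{x^l\in N_k(x^t)} \bar{d}(x^t, x^l)\right) \cdot \left(\frac{1}{n_t}\sum_{x^l\in N_k(x^t)} \hat{\bar{d}}(x^t, x^l)\right)}.
\end{aligned} \tag{30}$$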
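The last two inequalities of Eq. (30) can be made explicit. Write $a$ for the normalized distance $\frac{1}{\sqrt{n}C} d(x^i, x^t)$ and $b$ for $\frac{1}{\sqrt{n}C} d(x^t, x^l)$, with estimates $\hat{a}$, $\hat{b}$ satisfying $|\hat{a} - a| \le \varepsilon_1$ and $|\hat{b} - b| \le \varepsilon_1$ as in Eq. (15). The symbols $a$ and $b$ are shorthand introduced only for this sketch, and the final step assumes $0 \le a, b \le 1$, which holds because each normalized distance equals $\sin\theta_t$ with $\theta_t \in [0, \frac{\pi}{2}]$:

```latex
\begin{aligned}
|\hat{a}\, b - a\, \hat{b}|
  &= |(\hat{a} - a)\, b - a\,(\hat{b} - b)| \\
  &\le b\,|\hat{a} - a| + a\,|\hat{b} - b| \\
  &\le (a + b)\,\varepsilon_1 \le 2\varepsilon_1 .
\end{aligned}
```

Averaging this bound over the neighbors leaves the numerator bounded by $2\varepsilon_1$, which gives the final line of Eq. (30).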

We assume that at least half of the values of $\{\bar{d}(x^t, x^l)\}_{l=1}^{n_t}$ are greater than a constant $\sqrt{P}$, i.e.,

$$\sum_{x^l\in N_k(x^t)} \bar{d}(x^t, x^l) \ge \frac{n_t}{2}\sqrt{P}, \qquad \sum_{x^l\in N_k(x^t)} \hat{\bar{d}}(x^t, x^l) \ge \frac{n_t}{2}\sqrt{P}. \tag{31}$$

The second term of Eq. (30) is as follows:

$$\frac{2\varepsilon_1}{\left(\frac{1}{n_t}\sum_{x^l\in N_k(x^t)} \bar{d}(x^t, x^l)\right) \cdot \left(\frac{1}{n_t}\sum_{x^l\in N_k(x^t)} \hat{\bar{d}}(x^t, x^l)\right)} \le \frac{2\varepsilon_1}{\frac{1}{2}\sqrt{P} \cdot \frac{1}{2}\sqrt{P}} = \frac{8\varepsilon_1}{P}. \tag{32}$$

Therefore, we get $\mathrm{LOF}_k(x^i)$ with error $E\varepsilon_3 + \frac{8\varepsilon_1}{P}$.

D. The total complexity

The quantum algorithm can be divided into three steps and the complexity of each step can be seen in Table III. Putting it all together, the complexity of the quantum LOF algorithm is $O\left[\frac{\max\{n_i\} \cdot m^{3/2}\log(mn)}{\varepsilon_1\varepsilon_2} + \sqrt{mT} \cdot \frac{\max\{n_i\}\log(m\max\{n_i\})}{\varepsilon_3}\right]$.

TABLE III: The time complexity of the three steps of the quantum LOF algorithm
Step      Complexity
Step 1    $O\left[\frac{\max\{n_i\} \cdot m^{3/2}\log(mn)}{\varepsilon_1\varepsilon_2}\right]$
Step 2    $O[\max\{n_i\}\log(m\max\{n_i\})]$
Step 3    $O\left[\sqrt{mT} \cdot \frac{\max\{n_i\}\log(m\max\{n_i\})}{\varepsilon_3}\right]$

If $\max\{n_i\} = O(k)$, $C, E, P = O(1)$, $\varepsilon_1 = \frac{P\varepsilon}{16}$, $\varepsilon_2 = \varepsilon$, $\varepsilon_3 = \frac{\varepsilon}{2E}$ and in general $T \ll m$, we can get $|\widehat{\mathrm{LOF}}_k(x^i) - \mathrm{LOF}_k(x^i)| \le \varepsilon$. The overall runtime will be $O[k \cdot m^{3/2}\log(mn)/\varepsilon^2]$. It is shown that our quantum algorithm achieves polynomial speedup on $m$ and exponential speedup on $n$ compared to its classical counterpart.

IV. CONCLUSION

In the present study, we propose a quantum LOF algorithm. It is shown that our quantum algorithm can achieve exponential speedup on the dimension $n$ of the data points and polynomial speedup on the number $m$ of data points compared to its classical counterpart.

In step 2, we proposed an efficient method to compute the local reachability density of each data point in parallel, which can be reused as a subroutine of other quantum clustering algorithms and quantum dimensionality reduction algorithms. In step (2.3), the reason we can calculate the mean of the reachability distances between each data point and its k-distance neighbors by the quantum multiply-adder is that we have managed to encode the distance information into the computational basis. This idea could also be applied to solve other machine learning problems, such as density estimation and feature learning. We hope the techniques used in our algorithm can inspire more anomaly detection algorithms to get a quantum advantage, especially in unsupervised anomaly detection.

ACKNOWLEDGEMENTS

We thank HaiLing Liu, GuangHui Li and Di Zhang for useful discussions on the subject. This work is supported by the National Natural Science Foundation of China (Grants No.61976024, No.61972048, No.62272056) and supported by the 111 Project B21049.

[1] P. W. Shor, Algorithms for quantum computation: Discrete logarithms and factoring, in Proceedings of the 35th Annual Symposium on Foundations of Computer Science, IEEE, 124-134 (1994).
[2] L. K. Grover, A fast quantum mechanical algorithm for database search, in Proceedings of the Twenty-Eighth Annual ACM Symposium on Theory of Computing, 212-219 (1996).
[3] A. W. Harrow, A. Hassidim, S. Lloyd, Quantum algorithm for linear systems of equations, Phys. Rev. Lett. 103.15, 150502 (2009).
[4] L. C. Wan, C. H. Yu, S. J. Pan, F. Gao, Q. Y. Wen, S. J. Qin, Asymptotic quantum algorithm for the Toeplitz systems, Phys. Rev. A 97.6, 062322 (2018).
[5] H. L. Liu, Y. S. Wu, L. C. Wan, S. J. Pan, F. Gao, S. J. Qin, Q. Y. Wen, Variational quantum algorithm for the Poisson equation, Phys. Rev. A 104.2, 022418 (2021).
[6] Z. Q. Li, B. B. Cai, H. W. Sun, H. L. Liu, L. C. Wan, S. J. Qin, Q. Y. Wen, F. Gao, Novel quantum circuit implementation of Advanced Encryption Standard with low costs, Sci. China Phys. Mech. Astron. 65, 290311 (2022).
[7] H. W. Sun, C. Y. Wei, B. B. Cai, S. J. Qin, Q. Y. Wen, F. Gao, Improved BV-based quantum attack on block ciphers, Quantum Information Processing 22.1, 9 (2022).
[8] B. B. Cai, F. Gao, G. Leander, Quantum attacks on two-round even-mansour, Frontiers in Physics 979 (2022).
[9] C. Y. Wei, X. Q. Cai, T. Y. Wang, S. J. Qin, F. Gao, Q. Y. Wen, Error Tolerance Bound in QKD-Based Quantum Private Query, IEEE Journal on Selected Areas in Communications 38, 517-527 (2020).
[10] F. Gao, S. J. Qin, W. Huang, Q. Y. Wen, Quantum private query: a new kind of practical quantum cryptographic protocols, Sci. China-Phys. Mech. Astron. 62, 070301 (2019).
[11] V. Giovannetti, S. Lloyd, L. Maccone, Quantum Private Queries, Phys. Rev. Lett. 100.23, 230502 (2008).
[12] J. Biamonte, P. Wittek, N. Pancotti, P. Rebentrost, N. Wiebe, S. Lloyd, Quantum machine learning, Nature 549, 195-202 (2017).
[13] J. Allcock, S. Y. Zhang, Quantum machine learning, National Science Review 6.1, 26-28 (2019).
[14] S. Lloyd, M. Mohseni, P. Rebentrost, Quantum algorithms for supervised and unsupervised machine learning, arXiv:1307.0411 (2013).
[15] N. Wiebe, D. Braun, S. Lloyd, Quantum algorithm for data fitting, Phys. Rev. Lett. 109.5, 050505 (2012).
[16] S. C. Morampudi, B. Hsu, S. L. Sondhi, R. Moessner, Clustering in Hilbert space of a quantum optimization problem, Phys. Rev. A 96.4, 042303 (2017).
[17] G. M. Wang, Quantum algorithm for linear regression, Phys. Rev. A 96.1, 012335 (2017).
[18] C. H. Yu, F. Gao, Q. Y. Wen, An improved quantum algorithm for ridge regression, IEEE Transactions on Knowledge and Data Engineering (2019).
[19] C. H. Yu, F. Gao, C. Liu, D. Huynh, M. Reynolds, J. Wang, Quantum algorithm for visual tracking, Phys. Rev. A 99.2, 022301 (2019).
[20] C. H. Yu, F. Gao, Q. L. Wang, Q. Y. Wen, Quantum algorithm for association rules mining, Phys. Rev. A 94.4, 042311 (2016).
[21] S. Lloyd, M. Mohseni, P. Rebentrost, Quantum principal component analysis, Nature Physics 10, 631 (2014).
[22] S. J. Pan, L. C. Wan, H. L. Liu, F. Gao, S. J. Qin, Q. Y. Wen, Improved quantum algorithm for A-optimal projection, Phys. Rev. A 102.5, 052402 (2020).
[23] C. H. Yu, F. Gao, S. Lin, J. Wang, Quantum data compression by principal component analysis, Quantum Information Processing 18.8, 1-20 (2019).
[24] E. Aleskerov, B. Freisleben, B. Rao, Cardwatch: A neural network based database mining system for credit card fraud detection, Proceedings of the IEEE/IAFE 1997 Computational Intelligence for Financial Engineering (CIFEr), IEEE (1997).
[25] V. Kumar, Parallel and distributed computing for cybersecurity, IEEE Distributed Systems Online 6.10 (2005).
[26] S. Clay, L. Parra, P. Sajda, Detection, synthesis and compression in mammographic image analysis with a hierarchical image probability model, Proceedings IEEE Workshop on Mathematical Methods in Biomedical Image Analysis (MMBIA 2001), IEEE (2001).
[27] V. Chandola, A. Banerjee, V. Kumar, A. Valaba, Anomaly detection: A survey, ACM Computing Surveys (CSUR) 41.3, 1-58 (2009).
[28] M. Amer, S. Abdennadher, Comparison of unsupervised anomaly detection techniques, Bachelor's Thesis (2011).
[29] M. Breunig, H. P. Kriegel, R. Ng, J. Sander, LOF: Identifying density based Local Outliers, Proc. of the ACM SIGMOD Conf. on Management of Data (2000).
[30] N. Liu, P. Rebentrost, Quantum machine learning for quantum anomaly detection, Phys. Rev. A 97.4, 042315 (2018).
[31] J. M. Liang, S. Q. Shen, M. Li, L. Li, Quantum anomaly detection with density estimation and multivariate Gaussian distribution, Phys. Rev. A 99.5, 052310 (2019).
[32] M. C. Guo, H. L. Liu, Y. M. Li, W. M. Li, F. Gao, S. J. Qin, Q. Y. Wen, Quantum algorithms for anomaly detection using amplitude estimation, Physica A: Statistical Mechanics and its Applications 604, 127936 (2022).
[33] G. Brassard, P. Hoyer, M. Mosca, Quantum amplitude amplification and estimation, Contemporary Mathematics 305 (2002).
[34] C. Dürr, P. Hoyer, A quantum algorithm for finding the minimum, arXiv preprint quant-ph/9607014 (1996).
[35] V. Giovannetti, S. Lloyd, L. Maccone, Quantum Random Access Memory, Phys. Rev. Lett. 100.16, 160501 (2008).
[36] L. Ruiz-Perez, J. C. Garcia-Escartin, Quantum arithmetic with the quantum Fourier transform, Quantum Information Processing 16.6, 152 (2017).
[37] S. S. Zhou, T. Loke, J. A. Izaac, J. B. Wang, Quantum Fourier transform in computational basis, Quantum Information Processing 16.3, 82 (2017).
[38] V. Giovannetti, S. Lloyd, L. Maccone, Architectures for a quantum random access memory, Phys. Rev. A 78.5, 052310 (2008).
[39] T. G. Draper, Addition on a quantum computer, arXiv preprint quant-ph/0008033 (2000).
