
arXiv:1402.0660v3 [quant-ph] 2 Apr 2014
Quantum Algorithms for Curve Fitting
Guoming Wang
Computer Science Division, University of California, Berkeley, U.S.A.

Abstract
We present quantum algorithms for estimating the best-fit parameters and the quality of least-square
curve fitting. The running times of these algorithms are polynomial in $\log n$, $d$, $\kappa$, $\nu$, $\chi$, $1/\Phi$ and $1/\epsilon$,
where $n$ is the number of data points to be fitted, $d$ is the dimension of feature vectors, $\kappa$ is the condition
number of the design matrix, $\nu$ and $\chi$ are some parameters reflecting the variances of the design matrix
and response vector, $\Phi$ is the fit quality, and $\epsilon$ is the tolerable error. Unlike previous quantum
algorithms for these tasks, our algorithms do not require the design matrix to be sparse, and they
completely determine the fitted curve. They are developed by combining phase estimation and the density
matrix exponentiation technique for dense Hamiltonian simulation.
1 Introduction
Curve fitting [2], also known as regression analysis in statistics, is the process of constructing a (simple
continuous) curve that has the best fit to a series of data points. It can be used to understand the relationships
among two or more variables, and to infer values of a function where no data are available. It also provides an
efficient means of data compression. Therefore, it has found numerous applications in science, engineering
and economics. How to quickly fit a large amount of data into a simple model has become an important problem.
Least squares [6] is one of the most popular methods of curve fitting. This method minimizes the sum
of squared residuals, where a residual is the difference between an observed value and the value predicted
by a model. Formally, we need to find the parameter vector $\theta$ that minimizes the quantity $\|F\theta - y\|^2$, where
$F$ is the design matrix, and $y$ is the response vector. The solution is known as $\hat{\theta} = (F^T F)^{-1} F^T y$. To compute
this solution, the best classical algorithm needs to take $\mathrm{poly}(n, d)$ time, where $n$ is the number of data points
to be fitted, and $d$ is the dimension of feature vectors (namely, $F$ is an $n \times d$ matrix).
Recently, building upon Harrow, Hassidim and Lloyd (HHL)'s quantum algorithm for solving linear
systems [10], Wiebe, Braun and Lloyd (WBL) gave several quantum algorithms for curve fitting in the
case that $F$ is sparse [15]. Specifically, they gave a fast quantum algorithm for estimating the fit quality,
and also a fast quantum algorithm for computing the best-fit parameters $\hat{\theta}$, but in an unconventional sense.
Namely, under the assumption that there exists a high-quality fit (footnote 1) and the state proportional to $y$ can be
prepared in $\mathrm{polylog}(n)$ time, their algorithm produces a quantum state approximately proportional to $\hat{\theta}$ in
$\mathrm{poly}(\log n, \log d, s, \kappa, 1/\epsilon)$ time, where $s$ is the sparsity of $F$, $\kappa$ is the condition number of $F$, and $\epsilon$ is the
tolerable error. Then, by performing some measurements on this state and collecting the outcome statistics,
one can learn certain properties of $\hat{\theta}$. However, since $\hat{\theta}$ is a $d$-dimensional vector, one cannot fully determine $\hat{\theta}$
in $\mathrm{polylog}(d)$ time. Thus, one cannot completely construct the fitted curve in $\mathrm{polylog}(d)$ time. It is also
worth mentioning that their algorithms, along with HHL's, rely on a combination of phase estimation [11, 13]
and the techniques for sparse Hamiltonian simulation [3, 8, 9, 4].

Author email: wgmcreate@berkeley.edu.
Footnote 1: The authors of [15] did not specify this condition explicitly, but it is implied by their assumption $\|F^T y\| = 1$ in the description
of their algorithms.
In this paper, we present several quantum algorithms for estimating the best-fit parameters and the
quality of least-square curve fitting in the general case. Namely, our algorithms work whether or not
$F$ is sparse. The running times of our algorithms are polynomial in $\log n$, $d$, $\kappa$, $\nu$, $\chi$, $1/\Phi$ and $1/\epsilon$,
where $n$ is the number of data points to be fitted, $d$ is the dimension of feature vectors, $\kappa$ is the condition
number of the design matrix, $\nu$ and $\chi$ are some parameters reflecting the variances of the design matrix and
response vector, $\Phi$ is the fit quality (footnote 2), and $\epsilon$ is the tolerable error. Our algorithms run very fast when the
given data are normal, in the sense that $F$ is far from being singular, and the rows of $F$ and $y$ do not vary too
much in their norms. Meanwhile, it is unknown whether classical algorithms can solve this case fast.
Our algorithms differ from WBL's algorithms in several aspects. First, our algorithms produce the
full classical description of $\hat{\theta}$ (not just a quantum state proportional to $\hat{\theta}$), so they completely determine
the fitted curve. On the other hand, our algorithms have running times poly-logarithmic in $n$, but not
poly-logarithmic in $d$ (as stated before, it is impossible to have time complexity poly-logarithmic in $d$ in this
case). Second, our algorithms use a recent technique for dense Hamiltonian simulation, so they can solve data
fitting in the general case. Finally, for estimating the best-fit parameters $\hat{\theta} = (F^T F)^{-1} F^T y$, WBL's algorithm
consists of two stages: stage 1 generates a state proportional to $z := F^T y$, and stage 2 generates a state
proportional to $\hat{\theta} = (F^T F)^{-1} z$. Each stage relies on invoking (a variant of) HHL's algorithm, which in turn
relies on analyzing the singular value decomposition of $F$ or the spectral decomposition of $F^T F$, respectively.
We notice that these two decompositions are essentially the same. Hence, these two stages can be carried
out simultaneously. So our algorithm consists of only one stage, and it generates the state proportional to
$\hat{\theta}$ in one shot. This leads to a saving of running time for estimating $\hat{\theta}$.
Our algorithms are developed by combining phase estimation and a recent technique for dense Hamiltonian
simulation. Specifically, Lloyd, Mohseni and Rebentrost [12] introduced a density matrix exponentiation
technique, which allows us to simulate $e^{i\rho t}$ by consuming multiple copies of the state $\rho$. Then,
by running phase estimation on $e^{i\rho}$, we can analyze the eigenvalues and eigenvectors of $\rho$. They call this
phenomenon "quantum self-analysis". We utilize this phenomenon as follows. First, we prepare a state $\sigma$
proportional to $F F^T$. Then $\sigma$'s eigenvalues are related to the singular values of $F$, and its eigenvectors are
the (left) singular vectors of $F$. Then, the density matrix exponentiation technique allows us to implement $e^{i\sigma t}$
for any $t$. Next, by running phase estimation on $e^{i\sigma}$ starting with the state $|y\rangle$, we effectively break $|y\rangle$ into
several pieces, where each piece is either parallel to one of $F$'s (left) singular vectors, or orthogonal to the
column space of $F$. Then, we can perform different quantum operations on each piece of $|y\rangle$. This, along
with some extra work, enables us to estimate the quality and best-fit parameters of least-square curve fitting.
2 Preliminaries
2.1 Notation
Let $[n] := \{1, 2, \dots, n\}$. For any $z \in \mathbb{R}$, let $\mathrm{sgn}(z) := 1$ if $z \ge 0$, and $-1$ otherwise.
For any vector $\psi$, let $|\psi\rangle := \psi$. Namely, $|\psi\rangle$ and $\psi$ are essentially the same thing, but $|\psi\rangle$ is used
to denote the quantum state corresponding to $\psi$, not its classical description. Furthermore, let $\bar{\psi}$ be the
normalized version of $\psi$, i.e. $\bar{\psi} := \psi / \|\psi\|$.
For any matrix $A$, let $\mathcal{C}(A)$ be the column space of $A$, and let $\Pi_A$ be the projection onto $\mathcal{C}(A)$. The
condition number of $A$, denoted by $\kappa(A)$, is the ratio of $A$'s largest singular value to its smallest singular
value. The operator norm of $A$, denoted by $\|A\|$, is the largest singular value of $A$. The trace norm of $A$,
denoted by $\|A\|_{\mathrm{tr}}$, is the sum of the singular values of $A$. For any real matrix $A$ with full column rank, the
Moore-Penrose pseudoinverse of $A$, denoted by $A^{+}$, is defined as $(A^T A)^{-1} A^T$. Finally, for any two Hermitian
matrices $A$ and $B$, we use $A \succeq B$ to denote that $A - B$ is positive semidefinite.

Footnote 2: The time complexity of the algorithm for estimating $\Phi$ does not depend on $1/\Phi$.
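For any quantum states $\rho_1$ and $\rho_2$, let
$$D(\rho_1, \rho_2) := \frac{1}{2} \|\rho_1 - \rho_2\|_{\mathrm{tr}} \tag{1}$$
be the trace distance between $\rho_1$ and $\rho_2$. For any quantum operations $\mathcal{E}$ and $\mathcal{F}$, let
$$D(\mathcal{E}, \mathcal{F}) := \max_{\rho} D\big((\mathcal{E} \otimes I)(\rho), (\mathcal{F} \otimes I)(\rho)\big) \tag{2}$$
be the distance between $\mathcal{E}$ and $\mathcal{F}$. Here $I$ is the identity operation on an auxiliary system of any
dimension, and $\rho$ can be any state of this extended system. This is a well-defined distance measure for
quantum operations. It satisfies many nice properties, such as $D(\mathcal{E} \otimes I, \mathcal{F} \otimes I) = D(\mathcal{E}, \mathcal{F})$,
$D(\mathcal{E}_1, \mathcal{E}_3) \le D(\mathcal{E}_1, \mathcal{E}_2) + D(\mathcal{E}_2, \mathcal{E}_3)$, $D(\mathcal{E} \circ \mathcal{F}, \mathcal{E}' \circ \mathcal{F}') \le D(\mathcal{E}, \mathcal{E}') + D(\mathcal{F}, \mathcal{F}')$, etc.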
For any quantum operation $\mathcal{E}$ and integer $k$, let $\mathcal{E}^k$ be the $k$-repetition of $\mathcal{E}$, i.e. $\mathcal{E}^k := \mathcal{E} \circ \mathcal{E} \circ \dots \circ \mathcal{E}$,
where the number of $\mathcal{E}$'s on the right-hand side is $k$. Then, we have $D(\mathcal{E}^k, \mathcal{F}^k) \le k \cdot D(\mathcal{E}, \mathcal{F})$.
For any quantum state $\rho$, if we say that "$\rho$ is prepared to accuracy $\delta$", it means that we actually prepare
a state $\rho'$ satisfying $D(\rho, \rho') \le \delta$. For any quantum operation $\mathcal{E}$, if we say that "$\mathcal{E}$ is implemented (or
simulated) to accuracy $\delta$", it means that we actually implement an operation $\mathcal{E}'$ satisfying $D(\mathcal{E}, \mathcal{E}') \le \delta$.
Note that if a quantum circuit consists of $m$ local operations $\mathcal{E}_1, \mathcal{E}_2, \dots, \mathcal{E}_m$, then in order to implement this
circuit to accuracy $\delta$, it is sufficient to implement each $\mathcal{E}_i$ to accuracy $\delta/m$.
We use the symbol $\tilde{O}$ to suppress polylogarithmic factors. Namely, $\tilde{O}(f(n)) = O\big(f(n) (\log f(n))^b\big)$ for
some constant $b$.
2.2 Quantum tools
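We will mainly use three quantum tools. The first tool is amplitude amplification:
Lemma 1 (Amplitude Amplification [5]). Suppose $\mathcal{A}$ is a quantum algorithm such that
$\mathcal{A}|0\rangle = \sqrt{p}\,|1\rangle|\phi_1\rangle + \sqrt{1-p}\,|0\rangle|\phi_0\rangle$, where $p \ge \delta > 0$ and $|\phi_1\rangle$, $|\phi_0\rangle$ are some normalized states.
Then there exists a quantum algorithm $\mathcal{A}'$ such that $\mathcal{A}'$ uses $O(1/\sqrt{\delta})$ applications of $\mathcal{A}$ and $\mathcal{A}^{-1}$, and
$\mathcal{A}'|0\rangle = \sqrt{q}\,|1\rangle|\phi_1\rangle|\varphi_1\rangle + \sqrt{1-q}\,|0\rangle|\phi_0\rangle|\varphi_0\rangle$, where $q \ge 2/3$ and $|\varphi_1\rangle$, $|\varphi_0\rangle$ are some normalized states.
Namely, if $\mathcal{A}$ produces some "good" state $|\phi_1\rangle$ with probability at least $\delta$ (and we can check whether it
does this or not), then we can amplify this probability to at least $2/3$ by calling $\mathcal{A}$ and $\mathcal{A}^{-1}$ only $O(1/\sqrt{\delta})$
times. This implies that: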
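Corollary 1. Suppose $\mathcal{A}$ is a quantum algorithm such that $\mathcal{A}|0\rangle = \sqrt{p}\,|1\rangle|\phi_1\rangle + \sqrt{1-p}\,|0\rangle|\phi_0\rangle$, where
$p \ge \delta > 0$ and $|\phi_1\rangle$, $|\phi_0\rangle$ are some normalized states. Let $\epsilon > 0$. Then there exists a quantum algorithm $\mathcal{A}'$
that uses $O(\log(1/\epsilon)/\sqrt{\delta})$ applications of $\mathcal{A}$ and $\mathcal{A}^{-1}$, and
$\mathcal{A}'|0\rangle = \sqrt{q}\,|1\rangle|\phi_1\rangle|\varphi_1\rangle + \sqrt{1-q}\,|0\rangle|\phi_0\rangle|\varphi_0\rangle$,
where $q \ge 1 - \epsilon$ and $|\varphi_1\rangle$, $|\varphi_0\rangle$ are some normalized states.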
Proof. We run $k = O(\log(1/\epsilon))$ instances of the algorithm in Lemma 1 in parallel. Then by Lemma 1, we
obtain the state
$$\Big(\sqrt{r}\,|1\rangle|\phi_1\rangle|\varphi_1\rangle + \sqrt{1-r}\,|0\rangle|\phi_0\rangle|\varphi_0\rangle\Big)^{\otimes k} = \sum_{i \in \{0,1\}^k} \sqrt{r^{|i|} (1-r)^{k-|i|}}\; |i_1\rangle|\phi_{i_1}\rangle|\varphi_{i_1}\rangle \cdots |i_k\rangle|\phi_{i_k}\rangle|\varphi_{i_k}\rangle, \tag{3}$$
where $r \ge 2/3$, $i = (i_1, \dots, i_k)$, and $|i| = \sum_{j=1}^{k} i_j$ is the Hamming weight of $i$. Note that on the right-hand
side of this equation, there exists only one term that does not contain $|\phi_1\rangle$ (in any position), which is the
one corresponding to $i = (0, \dots, 0)$, and its amplitude is $\sqrt{(1-r)^k} \le \sqrt{\epsilon}$ by our choice of $k$. Now we
perform on the state the following unitary operation: on the state $|i_1\rangle|\phi_{i_1}\rangle|\varphi_{i_1}\rangle \cdots |i_k\rangle|\phi_{i_k}\rangle|\varphi_{i_k}\rangle$, it finds
the smallest $j$ such that $i_j = 1$, and then, unless such $j$ does not exist, it swaps $|i_1\rangle$ with $|i_j\rangle$, and swaps
$|\phi_{i_1}\rangle$ with $|\phi_{i_j}\rangle$. Then, for each term except the one corresponding to $i = (0, \dots, 0)$, the first two registers
after this operation will be in the state $|1\rangle|\phi_1\rangle$. Thus, the whole state after this operation can be written as
$\sqrt{q}\,|1\rangle|\phi_1\rangle|\varphi'_1\rangle + \sqrt{1-q}\,|0\rangle|\phi_0\rangle|\varphi'_0\rangle$ for some $q \ge 1 - \epsilon$ and normalized states $|\varphi'_1\rangle$, $|\varphi'_0\rangle$ (which are the
states of the third to the last registers, conditioned on the first register being 1, 0 respectively).
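The choice of $k$ in the proof of Corollary 1 can be made concrete with a small calculation. The following is a minimal sketch, assuming each parallel instance succeeds with probability exactly $r$ (the proof only needs $r \ge 2/3$); the function names `parallel_copies` and `success_probability` are illustrative and do not appear in the paper.

```python
import math


def parallel_copies(eps: float, r: float = 2.0 / 3.0) -> int:
    """Smallest k with (1 - r)**k <= eps.

    (1 - r)**k is the weight of the single branch i = (0, ..., 0)
    in which every parallel instance fails.
    """
    if not (0.0 < eps < 1.0 and 0.0 < r < 1.0):
        raise ValueError("eps and r must lie in (0, 1)")
    k = max(1, math.ceil(math.log(eps) / math.log(1.0 - r)))
    # Guard against floating-point rounding at the boundary.
    while (1.0 - r) ** k > eps:
        k += 1
    return k


def success_probability(k: int, r: float = 2.0 / 3.0) -> float:
    """Probability that at least one of k independent instances succeeds."""
    return 1.0 - (1.0 - r) ** k


if __name__ == "__main__":
    for eps in (0.1, 0.01, 0.001):
        k = parallel_copies(eps)
        print(eps, k, success_probability(k))
```

The number of copies grows only logarithmically in $1/\epsilon$, which is why the corollary pays a factor $\log(1/\epsilon)$ on top of the $O(1/\sqrt{\delta})$ cost of Lemma 1.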
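The second tool is amplitude estimation:
Lemma 2 (Amplitude Estimation [5], Multiplicative version). Suppose $\mathcal{A}$ is a quantum algorithm such that
$\mathcal{A}|0\rangle = \sqrt{p}\,|1\rangle|\phi_1\rangle + \sqrt{1-p}\,|0\rangle|\phi_0\rangle$, where $p \in (0, 1)$ is unknown, and $|\phi_1\rangle$, $|\phi_0\rangle$ are some normalized
states. Let $\epsilon > 0$. Then there exists a quantum algorithm $\mathcal{A}'$ that uses $O(1/(\epsilon\sqrt{p}))$ applications of $\mathcal{A}$ and
$\mathcal{A}^{-1}$, and $\mathcal{A}'$ produces an estimate $p'$ of $p$ such that $|p - p'| \le \epsilon p$ with probability at least $2/3$.
Lemma 3 (Amplitude Estimation [5], Additive version). Suppose $\mathcal{A}$ is a quantum algorithm such that
$\mathcal{A}|0\rangle = \sqrt{p}\,|1\rangle|\phi_1\rangle + \sqrt{1-p}\,|0\rangle|\phi_0\rangle$, where $p \in (0, 1)$ is unknown, and $|\phi_1\rangle$, $|\phi_0\rangle$ are some normalized
states. Let $\epsilon > 0$. Then there exists a quantum algorithm $\mathcal{A}'$ that uses $O(1/\epsilon)$ applications of $\mathcal{A}$ and $\mathcal{A}^{-1}$,
and $\mathcal{A}'$ produces an estimate $p'$ of $p$ such that $|p - p'| \le \epsilon$ with probability at least $2/3$.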
Our final tool is phase estimation:
Lemma 4 (Phase Estimation [11, 13]). Suppose $U$ is a unitary operation and $|\phi\rangle$ is an eigenstate of $U$ with
eigenvalue $e^{i\lambda}$ for some $\lambda \in [0, 2\pi)$. Let $\epsilon, \delta > 0$. Then there exists a quantum algorithm $\mathcal{A}$ that uses a copy of
$|\phi\rangle$ and $O(\log(1/\epsilon)/\delta)$ controlled applications of $U$ and produces an estimate $\lambda'$ of $\lambda$ such that $|\lambda - \lambda'| \le \delta$
with probability at least $1 - \epsilon$.
In Lemma 4, the parameter $\delta$ is called the precision (or accuracy) of phase estimation, while the
parameter $\epsilon$ is called the error rate of phase estimation. Since the complexity of phase estimation is only
logarithmic in $1/\epsilon$, we will assume that phase estimation always outputs a $\lambda'$ satisfying $|\lambda - \lambda'| \le \delta$.
Although this is not really true, taking the error rate into account only increases the complexities of our
algorithms by some poly-logarithmic factors.
3 Least-square curve fitting
Given a set of $n$ points $\{(x_{i,1}, x_{i,2}, \dots, x_{i,k}, y_i)\}_{i=1}^{n}$ in $\mathbb{R}^{k+1}$, the goal of curve fitting is to find a simple
continuous function that has the best fit to these data points. Formally, let $x_i := (x_{i,1}, x_{i,2}, \dots, x_{i,k})^T$, for $i \in [n]$.
Also, let $f_j : \mathbb{R}^k \to \mathbb{R}$ be some simple function, for $j \in [d]$. Then we want to approximate $y_i$ with a function
of $x_i$ of the form
$$f(x, \theta) := \sum_{j=1}^{d} f_j(x)\, \theta_j, \tag{4}$$
where $\theta := (\theta_1, \theta_2, \dots, \theta_d)^T$ are some parameters (footnote 3). In the least-square approach, we find the optimal
parameters $\hat{\theta}$ by minimizing the sum of squared residuals, i.e.
$$E := \sum_{i=1}^{n} |f(x_i, \theta) - y_i|^2. \tag{5}$$

Footnote 3: The most common case is that each $f_j$ is a monomial of $x$, and hence $f$ is a polynomial of $x$.
Now, let $F$ be the $n \times d$ matrix such that $F_{i,j} = f_j(x_i)$, for $i \in [n]$ and $j \in [d]$. $F$ is called the design
matrix, and $F_i := (f_1(x_i), f_2(x_i), \dots, f_d(x_i))^T$ is called the $i$-th feature vector, for $i \in [n]$. In addition, let
$y = (y_1, y_2, \dots, y_n)^T$. $y$ is called the response vector. Then one can see that
$$E = \|F\theta - y\|^2. \tag{6}$$
Hence, the best-fit parameters $\hat{\theta}$ are given by
$$\hat{\theta} = F^{+} y = (F^T F)^{-1} F^T y. \tag{7}$$
Correspondingly, the fitted values of $y$ are
$$\hat{y} = F\hat{\theta} = F (F^T F)^{-1} F^T y = \Pi_F\, y, \tag{8}$$
and the residuals are
$$\hat{\epsilon} = y - \hat{y} = (I - F (F^T F)^{-1} F^T)\, y = (I - \Pi_F)\, y. \tag{9}$$
Geometrically, $\hat{y}$ is exactly the projection of $y$ onto $\mathcal{C}(F)$. To measure the quality of this fit, we introduce
the quantity
$$\Phi := \frac{\|\hat{y}\|^2}{\|y\|^2}. \tag{10}$$
Namely, $\Phi$ is the squared cosine of the angle between $y$ and $\hat{y}$. The larger $\Phi$ is, the better the fit is. Note that
$\hat{\epsilon} \perp \hat{y}$ and hence
$$\hat{E} := \|\hat{\epsilon}\|^2 = \|F\hat{\theta} - y\|^2 = (1 - \Phi)\, \|y\|^2. \tag{11}$$
We have assumed $\mathrm{rank}(F) = d$ (and hence $F^T F$ is invertible) in the above statement. This is a reasonable
assumption, because otherwise either the $f_j$'s are linearly dependent (e.g. $f_2 = 2 f_1$), or we simply do not
have enough data to do the fitting. In either case, a revision of $F$ is required.
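For orientation, the classical computation behind Eqs. (7), (8), (10) and (11) can be sketched in a few lines. This is a plain normal-equations solver written for illustration only, not one of the quantum algorithms of this paper; the quadratic monomial basis, the sample data, and the helper names (`solve`, `least_squares`) are assumptions chosen for the example.

```python
def transpose(A):
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        if abs(M[p][c]) < 1e-12:
            raise ValueError("F^T F is singular: rank(F) < d")
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def least_squares(xs, ys, basis):
    """Return (theta_hat, y_hat, phi) for the model sum_j theta_j * f_j(x)."""
    F = [[f(x) for f in basis] for x in xs]              # design matrix, F[i][j] = f_j(x_i)
    Ft = transpose(F)
    FtF = matmul(Ft, F)
    Fty = [sum(a * b for a, b in zip(row, ys)) for row in Ft]
    theta = solve(FtF, Fty)                              # Eq. (7)
    y_hat = [sum(Fij * t for Fij, t in zip(row, theta)) for row in F]   # Eq. (8)
    phi = sum(v * v for v in y_hat) / sum(v * v for v in ys)            # Eq. (10)
    return theta, y_hat, phi


# Hypothetical data: a slightly perturbed quadratic, fitted with monomials 1, x, x^2.
basis = [lambda x: 1.0, lambda x: x, lambda x: x * x]
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 3.4, 7.0, 11.6, 17.0]
theta, y_hat, phi = least_squares(xs, ys, basis)

if __name__ == "__main__":
    print("theta_hat =", theta)
    print("fit quality phi =", phi)
```

Forming $F^T F$ explicitly squares the condition number, so a production solver would normally use a QR or SVD factorization instead; the normal equations are used here only because they mirror Eq. (7) directly.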
3.1 Our model
We will study quantum algorithms for estimating the best-fit parameters $\hat{\theta}$ and the fit quality $\Phi$, in the
following model. We assume that $F$ is given as a quantum oracle $O_F$ defined as
$$O_F |i\rangle|j\rangle|0\rangle = |i\rangle|j\rangle|F_{i,j}\rangle, \quad \forall i \in [n],\ j \in [d]. \tag{12}$$
Namely, $O_F$ takes a row index $i$ and column index $j$ as input, and returns the value of $F_{i,j}$. We also
assume that $y$ is given as a quantum oracle $O_y$ defined as
$$O_y |i\rangle|0\rangle = |i\rangle|y_i\rangle, \quad \forall i \in [n]. \tag{13}$$
Namely, $O_y$ takes a row index $i$ as input, and returns the value of $y_i$. An algorithm in this model has access
to $O_F$, $O_y$ as well as their inverses. Its query complexity is defined as the number of calls to $O_F$, $O_y$ and
their inverses. Its time complexity is defined as its query complexity plus the number of additional one- and
two-qubit gates used.
Without loss of generality, throughout this paper we assume that $\mathrm{tr}(F^T F) = \sum_{i=1}^{n} \sum_{j=1}^{d} |F_{i,j}|^2 = 1$ and
$\|y\| = 1$. This can be achieved by scaling the original $F$ and $y$ appropriately. Clearly, this rescaling does not
change the difficulty of estimating $\hat{\theta}$ or $\Phi$.
4 Warmup: Simulating $e^{i F F^T t}$
Before describing our algorithms for curve fitting, we present a quantum algorithm for simulating $e^{i F F^T t}$.
This will become a crucial component of our algorithms.
Let
$$\sigma = F F^T. \tag{14}$$
Then, $\sigma \succeq 0$ and $\mathrm{tr}(\sigma) = \mathrm{tr}(F^T F) = 1$. So we can view $\sigma$ as a density matrix. In what follows, we will first
show how to prepare the quantum state $\sigma$. Then we will show how to simulate $e^{i\sigma t}$ by using multiple copies
of $\sigma$.
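Let
$$|F\rangle = \sum_{i=1}^{n} |i\rangle|F_i\rangle = \sum_{i=1}^{n} \sum_{j=1}^{d} F_{i,j}\, |i\rangle|j\rangle, \tag{15}$$
where $|F_i\rangle = \sum_{j=1}^{d} F_{i,j} |j\rangle$. Then, since $\| |F\rangle \|^2 = \sum_{i=1}^{n} \sum_{j=1}^{d} |F_{i,j}|^2 = 1$, $|F\rangle$ is a normalized quantum state.
Furthermore, the reduced state of $|F\rangle$ on the first subsystem is $\sigma = F F^T$. This implies that, to (approximately)
produce the state $\sigma$, it is sufficient to (approximately) produce the state $|F\rangle$.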
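Lemma 5. Suppose $\nu_- \le \|F_i\| \le \nu_+$ for any $i \in [n]$. Let $\nu = \nu_+/\nu_-$. Then $|F\rangle$ can be prepared to accuracy $\delta > 0$
in $\tilde{O}(\mathrm{polylog}(n) \cdot \nu d \log(1/\delta))$ time.
Proof. Let $U_F$ be the unitary operation defined as
$$U_F |i\rangle|0\rangle = |i\rangle\big|\|F_i\|\big\rangle, \quad \forall i \in [n]. \tag{16}$$
Clearly, $U_F$ can be implemented in $O(d)$ time (since we can query $F_{i,1}, F_{i,2}, \dots, F_{i,d}$ and compute $\|F_i\|$ from
them).
Next, let $V_F$ be the unitary operation defined as
$$V_F |i\rangle|0\rangle = |i\rangle|\bar{F}_i\rangle, \quad \forall i \in [n]. \tag{17}$$
(Recall that by definition $\bar{F}_i = F_i / \|F_i\|$.) We have that:
Claim 1. $V_F$ can be implemented to accuracy $\delta > 0$ in $\tilde{O}(d \cdot \mathrm{polylog}(1/\delta))$ time.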
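Proof. We will describe a quantum circuit that maps $|i\rangle|0\rangle$ to $|i\rangle|\varphi_i\rangle$ such that $\|\varphi_i - \bar{F}_i\| = O(\delta)$, for any
$i \in [n]$. This ensures that this circuit implements $V_F$ to accuracy $O(\delta)$, as desired.
Pick an integer $M = \Theta(d/\delta^2)$. Fix any $i \in [n]$. Let $S_{i,j} = \sum_{l=1}^{j} F_{i,l}^2$, for $j \in [d]$. Note that $S_{i,d} = \|F_i\|^2$.
Also, let $S_{i,0} = 0$. Then, let $M_{i,j} = \lceil M S_{i,j} / S_{i,d} \rceil$, for $0 \le j \le d$. Then let $Z_{i,j} = M_{i,j} - M_{i,j-1}$, for $j \in [d]$.
Finally, let
$$|\varphi_i\rangle = \sum_{j=1}^{d} \varphi_{i,j} |j\rangle = \sum_{j=1}^{d} \mathrm{sgn}(F_{i,j}) \sqrt{\frac{Z_{i,j}}{M}}\, |j\rangle. \tag{18}$$
Then, by construction, we have
$$|\varphi_{i,j} - \bar{F}_{i,j}| = O\!\left(\frac{1}{\sqrt{M}}\right) = O\!\left(\frac{\delta}{\sqrt{d}}\right), \quad \forall j \in [d], \tag{19}$$
where $\bar{F}_{i,j} = F_{i,j} / \|F_i\|$. It follows that $\|\varphi_i - \bar{F}_i\| = O(\delta)$, as claimed.
Now we describe how to map $|i\rangle|0\rangle$ to $|i\rangle|\varphi_i\rangle$. For any $k \in [M]$, let $h_i(k) = (j, t)$ if $M_{i,j-1} < k \le M_{i,j}$
and $k = M_{i,j-1} + t$. Note that for any $k \in [M]$, a unique $(j, t)$ satisfies this condition. So the function $h_i$ is
well-defined. Consider the following procedure: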
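1. We create the state $|i\rangle \big(\bigotimes_{j=1}^{d} |F_{i,j}\rangle\big)$ by using $O(d)$ queries to $O_F$.
2. We compute the $M_{i,j}$'s for $j \in [d]$, obtaining the state
$$|i\rangle \Big(\bigotimes_{j=1}^{d} |F_{i,j}\rangle\Big) \Big(\bigotimes_{j=1}^{d} |M_{i,j}\rangle\Big). \tag{20}$$
3. We append a register in the state $\frac{1}{\sqrt{M}} \sum_{k=1}^{M} |k\rangle$, obtaining the state
$$|i\rangle \Big(\bigotimes_{j=1}^{d} |F_{i,j}\rangle\Big) \Big(\bigotimes_{j=1}^{d} |M_{i,j}\rangle\Big) \Big(\frac{1}{\sqrt{M}} \sum_{k=1}^{M} |k\rangle\Big). \tag{21}$$
4. We compute $h_i(k)$ for each $k \in [M]$, obtaining the state
$$|i\rangle \Big(\bigotimes_{j=1}^{d} |F_{i,j}\rangle\Big) \Big(\bigotimes_{j=1}^{d} |M_{i,j}\rangle\Big) \Big(\frac{1}{\sqrt{M}} \sum_{k=1}^{M} |h_i(k)\rangle\Big) = |i\rangle \Big(\bigotimes_{j=1}^{d} |F_{i,j}\rangle\Big) \Big(\bigotimes_{j=1}^{d} |M_{i,j}\rangle\Big) \Big(\frac{1}{\sqrt{M}} \sum_{j=1}^{d} \sum_{t=1}^{Z_{i,j}} |j, t\rangle\Big). \tag{22}$$
5. We perform the unitary operation that maps $|j\rangle \big(\frac{1}{\sqrt{Z_{i,j}}} \sum_{t=1}^{Z_{i,j}} |t\rangle\big)$ to $|j\rangle|0\rangle$ on the last register, obtaining
the state
$$|i\rangle \Big(\bigotimes_{j=1}^{d} |F_{i,j}\rangle\Big) \Big(\bigotimes_{j=1}^{d} |M_{i,j}\rangle\Big) \Big(\sum_{j=1}^{d} \sqrt{\frac{Z_{i,j}}{M}}\, |j, 0\rangle\Big). \tag{23}$$
6. We multiply the phase of each term by the sign of $F_{i,j}$, obtaining the state
$$|i\rangle \Big(\bigotimes_{j=1}^{d} |F_{i,j}\rangle\Big) \Big(\bigotimes_{j=1}^{d} |M_{i,j}\rangle\Big) \Big(\sum_{j=1}^{d} \mathrm{sgn}(F_{i,j}) \sqrt{\frac{Z_{i,j}}{M}}\, |j, 0\rangle\Big). \tag{24}$$
7. We uncompute the $M_{i,j}$'s by undoing step 2, then uncompute the $F_{i,j}$'s by undoing step 1. Eventually,
we obtain the state
$$|i\rangle|\varphi_i\rangle = |i\rangle \Big(\sum_{j=1}^{d} \mathrm{sgn}(F_{i,j}) \sqrt{\frac{Z_{i,j}}{M}}\, |j\rangle\Big) \tag{25}$$
as desired.
Clearly, this algorithm runs in $\tilde{O}(d \cdot \mathrm{polylog}(M)) = \tilde{O}(d \cdot \mathrm{polylog}(1/\delta))$ time, as claimed.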
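The rounding behind Eqs. (18) and (19) can be checked classically. The sketch below is only an illustration of the arithmetic, not of the quantum circuit: it assumes $M = \lceil d/\delta^2 \rceil$ and $M_{i,j} = \lceil M S_{i,j}/S_{i,d} \rceil$ (the ceiling brackets are a reading of the garbled source), and the function name `discretize_row` is hypothetical.

```python
import math


def discretize_row(row, delta):
    """Return (M, Z, amplitudes) for one feature vector F_i, following Eq. (18)."""
    d = len(row)
    M = math.ceil(d / delta ** 2)
    total = sum(v * v for v in row)                  # S_{i,d} = ||F_i||^2
    if total == 0.0:
        raise ValueError("feature vector must be nonzero")
    prefix, M_prev, Z = 0.0, 0, []
    for j, v in enumerate(row):
        prefix += v * v                              # S_{i,j}
        # Force M_{i,d} = M exactly so that the Z_{i,j} sum to M despite rounding.
        M_j = M if j == d - 1 else min(M, math.ceil(M * prefix / total))
        Z.append(M_j - M_prev)                       # Z_{i,j} = M_{i,j} - M_{i,j-1}
        M_prev = M_j
    # Amplitudes sgn(F_{i,j}) * sqrt(Z_{i,j} / M) of the approximate state.
    amps = [math.copysign(math.sqrt(z / M), v) for z, v in zip(Z, row)]
    return M, Z, amps


if __name__ == "__main__":
    row, delta = [0.3, -0.4, 0.5, 0.1], 0.05
    M, Z, amps = discretize_row(row, delta)
    norm = math.sqrt(sum(v * v for v in row))
    err = math.sqrt(sum((a - v / norm) ** 2 for a, v in zip(amps, row)))
    print("M =", M, "Z =", Z)
    print("distance to normalized row:", err, "target:", delta)
```

Because each $Z_{i,j}/M$ differs from $\bar{F}_{i,j}^2$ by less than $1/M$, each amplitude is off by less than $1/\sqrt{M} \le \delta/\sqrt{d}$, which gives the $O(\delta)$ bound on the whole vector.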
Since the time complexity of implementing $V_F$ is only poly-logarithmic in the inverse accuracy, we from
now on assume that $V_F$ is implemented perfectly (taking the accuracy issue into account only increases the
time complexities of our algorithms by a poly-logarithmic factor).
Now consider the following algorithm for preparing $|F\rangle$:
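1. We prepare the state $\frac{1}{\sqrt{n}} \sum_{i=1}^{n} |i\rangle|0\rangle|0\rangle$, and convert it into the state
$$\frac{1}{\sqrt{n}} \sum_{i=1}^{n} |i\rangle|\bar{F}_i\rangle\big|\|F_i\|\big\rangle \tag{26}$$
by calling $V_F$ and $U_F$ once.
2. We append a qubit in the state $|0\rangle$, and perform the controlled-rotation
$$|z\rangle|0\rangle \to |z\rangle \Big(z \nu_+^{-1} |1\rangle + \sqrt{1 - z^2 \nu_+^{-2}}\, |0\rangle\Big) \tag{27}$$
on the last two registers (recall that $\|F_i\| \le \nu_+$), obtaining the state
$$\frac{1}{\sqrt{n}} \sum_{i=1}^{n} |i\rangle|\bar{F}_i\rangle\big|\|F_i\|\big\rangle \Big(\|F_i\| \nu_+^{-1} |1\rangle + \sqrt{1 - \|F_i\|^2 \nu_+^{-2}}\, |0\rangle\Big). \tag{28}$$
3. We measure the last qubit in the standard basis. Then conditioned on seeing outcome 1, the rest of the
state becomes proportional to
$$\sum_{i=1}^{n} \|F_i\|\, |i\rangle|\bar{F}_i\rangle\big|\|F_i\|\big\rangle = \sum_{i=1}^{n} |i\rangle|F_i\rangle\big|\|F_i\|\big\rangle. \tag{29}$$
Furthermore, since $\|F_i\| \ge \nu_- = \nu_+/\nu$, the probability of seeing outcome 1 is $\Omega(1/\nu^2)$.
4. We uncompute the $\big|\|F_i\|\big\rangle$ by performing the inverse of $U_F$ on the first and third registers, obtaining
the state $|F\rangle = \sum_{i=1}^{n} |i\rangle|F_i\rangle$.
5. The above procedure succeeds only with probability $\Omega(1/\nu^2)$. We use amplitude amplification to
raise this probability to $1 - O(\delta)$. This ensures that we have prepared $|F\rangle$ to accuracy $O(\delta)$. By
Corollary 1, this requires $\tilde{O}(\nu \log(1/\delta))$ repetitions of the above procedure and its inverse.
Clearly, this algorithm has time complexity $\tilde{O}(\mathrm{polylog}(n) \cdot \nu d \log(1/\delta))$, as claimed.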
Lemma 5 immediately implies:
Lemma 6. Suppose $\nu_- \le \|F_i\| \le \nu_+$ for any $i \in [n]$. Let $\nu = \nu_+/\nu_-$. Then $\sigma$ can be prepared to accuracy $\delta > 0$
in $\tilde{O}(\mathrm{polylog}(n) \cdot \nu d \log(1/\delta))$ time.
Now we review the density matrix exponentiation technique of [12]. This technique allows us to simulate
$e^{i\rho t}$ by consuming multiple copies of the state $\rho$:
Lemma 7 (Implicit in [12]). Let $\rho$ be a $D$-dimensional quantum state. Then there exists a quantum algorithm
that simulates $e^{i\rho t}$ to accuracy $O(\delta)$ using $O(t^2/\delta)$ copies of $\rho$ and $\tilde{O}(\log D \cdot t^2/\delta)$ additional two-qubit gates.
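Proof. This algorithm is based on the following observation. Let $\tau$ be any $D$-dimensional state. Then we
have
$$\mathcal{E}_x(\tau) := \mathrm{tr}_1\!\big(e^{iSx} (\rho \otimes \tau) e^{-iSx}\big) = \tau + ix[\rho, \tau] + O(x^2), \tag{30}$$
where $S$ is the swap operator, i.e. $S|i\rangle|j\rangle = |j\rangle|i\rangle$ for any $i, j \in [D]$. Meanwhile, we have
$$\mathcal{F}_x(\tau) := e^{i\rho x}\, \tau\, e^{-i\rho x} = \tau + ix[\rho, \tau] + O(x^2). \tag{31}$$
Therefore,
$$D(\mathcal{E}_x(\tau), \mathcal{F}_x(\tau)) = O(x^2). \tag{32}$$
In fact, one can check that for any state $\tau$ (with a $D$-dimensional subsystem),
$$D\big((\mathcal{E}_x \otimes I)(\tau), (\mathcal{F}_x \otimes I)(\tau)\big) = O(x^2). \tag{33}$$
This implies that
$$D(\mathcal{E}_x, \mathcal{F}_x) = O(x^2). \tag{34}$$
Now we use $n$ repeated applications of $\mathcal{E}_{t/n}$ to simulate $e^{i\rho t}$. Since
$$D(\mathcal{E}_{t/n}^{n}, \mathcal{F}_{t/n}^{n}) = D(\mathcal{E}_{t/n}^{n}, e^{i\rho t}) = O(t^2/n), \tag{35}$$
in order to make $D(\mathcal{E}_{t/n}^{n}, e^{i\rho t}) = O(\delta)$, it is sufficient to set $n = \Theta(t^2/\delta)$. This algorithm consumes $n =
O(t^2/\delta)$ copies of $\rho$. Furthermore, it needs to implement $e^{iSt/n}$ once in each application of $\mathcal{E}_{t/n}$. As shown
in [7], $e^{iSt/n}$ can be implemented in $\tilde{O}(\log D)$ time. Thus, this algorithm uses $O(t^2/\delta)$ copies of $\rho$ and
$\tilde{O}(n \log D) = \tilde{O}(\log D \cdot t^2/\delta)$ additional two-qubit gates.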
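The observation in Eqs. (30) to (32) can be checked numerically for a single qubit. The sketch below is an illustration under stated assumptions: it takes $D = 2$, a diagonal state playing the role of the exponentiated density matrix (called `rho` here), an arbitrary test state `tau`, and the sign convention $e^{iSx}$; all function names are hypothetical. It uses the identity $e^{iSx} = \cos(x) I + i \sin(x) S$, which holds because $S^2 = I$.

```python
import cmath
import math


def kron(A, B):
    n, m = len(A), len(B)
    return [[A[i // m][j // m] * B[i % m][j % m] for j in range(n * m)] for i in range(n * m)]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def dagger(A):
    return [[A[j][i].conjugate() for j in range(len(A))] for i in range(len(A[0]))]


def trace_out_first(X, D):
    """Partial trace over the first D-dimensional subsystem of a D^2 x D^2 matrix."""
    return [[sum(X[a * D + i][a * D + j] for a in range(D)) for j in range(D)] for i in range(D)]


def swap_channel(rho, tau, x):
    """E_x(tau) = tr_1( e^{iSx} (rho (x) tau) e^{-iSx} )."""
    D = len(rho)
    N = D * D
    # Swap operator: S |i>|j> = |j>|i>, with basis index i * D + j.
    S = [[1.0 if (r // D == c % D and r % D == c // D) else 0.0 for c in range(N)] for r in range(N)]
    U = [[math.cos(x) * (r == c) + 1j * math.sin(x) * S[r][c] for c in range(N)] for r in range(N)]
    return trace_out_first(matmul(matmul(U, kron(rho, tau)), dagger(U)), D)


def exact_evolution(rho_diag, tau, x):
    """F_x(tau) = e^{i rho x} tau e^{-i rho x} for a diagonal rho = diag(rho_diag)."""
    D = len(rho_diag)
    return [[cmath.exp(1j * (rho_diag[i] - rho_diag[j]) * x) * tau[i][j] for j in range(D)]
            for i in range(D)]


def trace_distance_2x2(A, B):
    """Trace distance of two qubit states (their difference is Hermitian and traceless)."""
    a = (A[0][0] - B[0][0]).real
    b = A[0][1] - B[0][1]
    return math.sqrt(a * a + abs(b) ** 2)


rho_diag = [0.7, 0.3]
rho = [[0.7, 0.0], [0.0, 0.3]]
tau = [[0.5, 0.5], [0.5, 0.5]]      # the state |+><+|

if __name__ == "__main__":
    for x in (0.2, 0.1, 0.05):
        dist = trace_distance_2x2(swap_channel(rho, tau, x), exact_evolution(rho_diag, tau, x))
        print(x, dist, dist / x ** 2)
```

Halving $x$ should reduce the distance by roughly a factor of four, matching the $O(x^2)$ error per step that leads to the $O(t^2/n)$ total in Eq. (35).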
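The following lemma says that we do not need to prepare $\rho$ exactly in order to simulate $e^{i\rho t}$ well. A
good approximation of $\rho$ would be sufficient.
Lemma 8. Let $\rho$ and $\tilde{\rho}$ be $D$-dimensional quantum states such that $\|\rho - \tilde{\rho}\| = O(\delta/t)$. Then there exists
a quantum algorithm that simulates $e^{i\rho t}$ to accuracy $O(\delta)$ using $O(t^2/\delta)$ copies of $\tilde{\rho}$ and $\tilde{O}(\log D \cdot t^2/\delta)$
additional two-qubit gates.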
Proof. By Lemma 7, there exists a quantum algorithm that simulates $e^{i\tilde{\rho} t}$ to accuracy $\delta$ using $O(t^2/\delta)$
copies of $\tilde{\rho}$ and $\tilde{O}(\log D \cdot t^2/\delta)$ additional two-qubit gates. So it is sufficient to prove that
$$D(e^{i\rho t}, e^{i\tilde{\rho} t}) = O(\delta). \tag{36}$$
Claim 2. Let $A$ and $B$ be any two Hermitian matrices. Then $\|e^{iA} - e^{iB}\| = O(\|A - B\|)$.
Proof. Let $C(x) = e^{iAx} e^{iB(1-x)}$, for $x \in [0, 1]$. Then
$$e^{iA} - e^{iB} = \int_0^1 \frac{dC(x)}{dx}\, dx = \int_0^1 i\, e^{iAx} (A - B) e^{iB(1-x)}\, dx. \tag{37}$$
Thus,
$$\|e^{iA} - e^{iB}\| \le \int_0^1 \big\|e^{iAx} (A - B) e^{iB(1-x)}\big\|\, dx \le \int_0^1 \|A - B\|\, dx = \|A - B\|. \tag{38}$$
Claim 2 implies that
$$\|e^{i\rho t} - e^{i\tilde{\rho} t}\| = O(\|\rho - \tilde{\rho}\|\, t) = O(\delta). \tag{39}$$
It follows that $D(e^{i\rho t}, e^{i\tilde{\rho} t}) = O(\delta)$, as desired.
Combining Lemma 6 and Lemma 8, we obtain:
Lemma 9. Suppose $\nu_- \le \|F_i\| \le \nu_+$ for any $i \in [n]$. Let $\nu = \nu_+/\nu_-$. Then $e^{i\sigma t}$ can be simulated to accuracy
$\delta > 0$ in $\tilde{O}(\mathrm{polylog}(n) \cdot \nu d t^2/\delta)$ time.
Proof. We use the algorithm in Lemma 6 to prepare $\sigma$ to accuracy $O(\delta/t)$. Then we use the algorithm in
Lemma 8 to simulate $e^{i\sigma t}$ to accuracy $O(\delta)$. By Lemma 6 and Lemma 8, this algorithm has time complexity
$\tilde{O}(\mathrm{polylog}(n) \cdot \nu d t^2/\delta)$, as claimed.
5 Quantum algorithms for estimating the best-fit parameters
In this section, we present two quantum algorithms on the estimation of $\hat{\theta} = (F^T F)^{-1} F^T y$. The first algorithm
produces an estimate of $\|\hat{\theta}\|$ (i.e. the norm of $\hat{\theta}$), and the second one produces an estimate of $\bar{\theta} := \hat{\theta} / \|\hat{\theta}\|$
(i.e. the normalized version of $\hat{\theta}$). Then, by multiplying them together, we obtain an estimate of $\hat{\theta} = \|\hat{\theta}\| \cdot \bar{\theta}$.
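Before describing our algorithm, it is beneficial to consider the singular value decomposition of $F$ and
write $\hat{\theta}$ as a linear combination of the (right) singular vectors of $F$. Formally, suppose $F$ has the singular
value decomposition
$$F = \sum_{j=1}^{d} s_j\, |u_j\rangle\langle v_j|, \tag{40}$$
where $s_1 \le s_2 \le \dots \le s_d$ are the singular values of $F$. Then we have
$$\mathrm{tr}(F^T F) = \sum_{j=1}^{d} s_j^2 = 1. \tag{41}$$
Let $\kappa(F) = s_d / s_1$. Then
$$\frac{1}{\kappa\sqrt{d}} \le s_1 \le s_2 \le \dots \le s_d \le \frac{\kappa}{\sqrt{d}}. \tag{42}$$
Meanwhile, we have
$$\sigma = F F^T = \sum_{j=1}^{d} s_j^2\, |u_j\rangle\langle u_j|. \tag{43}$$
This implies that
$$\mathcal{C}(\sigma) = \mathcal{C}(F) = \mathrm{span}(|u_1\rangle, |u_2\rangle, \dots, |u_d\rangle). \tag{44}$$
Therefore, the 1-eigenspace of $e^{i\sigma}$ is exactly $\mathrm{Ker}(F^T)$, the orthogonal complement of $\mathcal{C}(F)$.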
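Now suppose $|\hat{y}\rangle = \sum_{j=1}^{d} \beta_j |u_j\rangle$. Then, by $\|y\| = 1$, we get
$$\Phi = \|\hat{y}\|^2 = \sum_{j=1}^{d} \beta_j^2. \tag{45}$$
Furthermore, we have
$$|\hat{\theta}\rangle = (F^T F)^{-1} F^T |y\rangle = \sum_{j=1}^{d} \beta_j s_j^{-1} |v_j\rangle, \tag{46}$$
which implies that
$$\|\hat{\theta}\|^2 = \sum_{j=1}^{d} \beta_j^2 s_j^{-2}. \tag{47}$$
Then it follows from Eq.(42) and Eq.(45) that
$$\frac{d\,\Phi}{\kappa^2} \le \|\hat{\theta}\|^2 \le d\,\Phi\,\kappa^2. \tag{48}$$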
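The two bounds used above follow from short chains of inequalities, spelled out here as a worked derivation. The symbol names are reconstructions, since the extracted text dropped the Greek letters: $\kappa = s_d/s_1$ is the condition number, $\Phi$ the fit quality, $\beta_j$ the coefficients of $\hat{y}$ in the left singular basis, and $\hat{\theta}$ the best-fit parameter vector, with the singular values in ascending order $s_1 \le \dots \le s_d$.

```latex
% Eq. (42): bounds on the singular values, using \sum_j s_j^2 = 1 and \kappa = s_d / s_1.
\begin{align*}
d\, s_1^2 \;\le\; \sum_{j=1}^{d} s_j^2 = 1 \;\le\; d\, s_d^2
  &\;\Longrightarrow\; s_1 \le \tfrac{1}{\sqrt{d}} \le s_d, \\
s_1 = \frac{s_d}{\kappa} \ge \frac{1}{\kappa\sqrt{d}}, \qquad
s_d = \kappa\, s_1 &\le \frac{\kappa}{\sqrt{d}} .
\end{align*}
% Eq. (48): plug d/\kappa^2 \le s_j^{-2} \le d\kappa^2 into Eq. (47) and use Eq. (45).
\begin{align*}
\frac{d\,\Phi}{\kappa^2}
  = \frac{d}{\kappa^2}\sum_{j=1}^{d}\beta_j^2
  \;\le\; \sum_{j=1}^{d}\beta_j^2 s_j^{-2} = \|\hat{\theta}\|^2
  \;\le\; d\,\kappa^2 \sum_{j=1}^{d}\beta_j^2
  = d\,\Phi\,\kappa^2 .
\end{align*}
```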
The following lemma will also be useful for our algorithms. It gives an upper bound on the time
complexity of preparing the state $|y\rangle$.
Lemma 10. Suppose $\chi_- \le |y_i| \le \chi_+$, for any $i \in [n]$. Let $\chi = \chi_+/\chi_-$. Then $|y\rangle$ can be prepared to accuracy $\delta > 0$
in $\tilde{O}(\mathrm{polylog}(n) \cdot \chi \log(1/\delta))$ time.
Proof. Consider the following algorithm:
1. We prepare the state $\frac{1}{\sqrt{n}} \sum_{i=1}^{n} |i\rangle|0\rangle$ and call $O_y$ once, obtaining the state
$$\frac{1}{\sqrt{n}} \sum_{i=1}^{n} |i\rangle|y_i\rangle. \tag{49}$$
2. We append a qubit in the state $|0\rangle$, and perform the controlled-rotation
$$|z\rangle|0\rangle \to |z\rangle \Big(z \chi_+^{-1} |1\rangle + \sqrt{1 - z^2 \chi_+^{-2}}\, |0\rangle\Big) \tag{50}$$
on the last two registers (recall that $|y_i| \le \chi_+$), obtaining the state
$$\frac{1}{\sqrt{n}} \sum_{i=1}^{n} |i\rangle|y_i\rangle \Big(y_i \chi_+^{-1} |1\rangle + \sqrt{1 - y_i^2 \chi_+^{-2}}\, |0\rangle\Big). \tag{51}$$
3. We measure the last qubit in the standard basis. Then conditioned on seeing outcome 1, the rest of the
state is proportional to
$$\sum_{i=1}^{n} y_i\, |i\rangle|y_i\rangle. \tag{52}$$
Furthermore, since $|y_i| \ge \chi_- = \chi_+/\chi$, the probability of seeing this outcome is $\Omega(1/\chi^2)$.
4. We uncompute the $|y_i\rangle$ by uncalling $O_y$, obtaining the state $|y\rangle = \sum_{i=1}^{n} y_i |i\rangle$.
5. The above procedure succeeds only with probability $\Omega(1/\chi^2)$. We use amplitude amplification to
raise this probability to $1 - O(\delta)$. This ensures that we have prepared $|y\rangle$ to accuracy $O(\delta)$. By
Corollary 1, this requires $\tilde{O}(\chi \log(1/\delta))$ repetitions of steps 1-4 and their inverses.
Clearly, this algorithm has time complexity $\tilde{O}(\mathrm{polylog}(n) \cdot \chi \log(1/\delta))$, as claimed.
Since the time complexity of preparing $|y\rangle$ is only poly-logarithmic in the inverse accuracy, we from
now on assume that $|y\rangle$ is prepared perfectly (taking the accuracy issue into account only increases the time
complexities of our algorithms by some poly-logarithmic factors).
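The success probability in step 3 can be computed classically for a small response vector. This is a minimal sketch, assuming the bounds are taken to be the exact extremes $\chi_+ = \max_i |y_i|$ and $\chi_- = \min_i |y_i|$ after normalizing $y$ (the lemma allows any valid bounds); the function name `postselection_stats` is illustrative.

```python
import math


def postselection_stats(y):
    """Return (chi, p): the ratio chi_+/chi_- and the probability of outcome 1 in step 3."""
    n = len(y)
    norm = math.sqrt(sum(v * v for v in y))
    y = [v / norm for v in y]                     # enforce ||y|| = 1 as in the paper
    chi_plus = max(abs(v) for v in y)
    chi_minus = min(abs(v) for v in y)
    if chi_minus == 0.0:
        raise ValueError("every |y_i| must be positive for chi to be finite")
    chi = chi_plus / chi_minus
    # P(outcome 1) = (1/n) * sum_i (y_i / chi_plus)^2, read off from Eq. (51).
    p = sum((v / chi_plus) ** 2 for v in y) / n
    return chi, p


if __name__ == "__main__":
    chi, p = postselection_stats([1.0, -2.0, 2.0, 1.5])
    print("chi =", chi, "success probability =", p, "lower bound 1/chi^2 =", 1.0 / chi ** 2)
```

When all entries have similar magnitude, $\chi$ is close to 1 and the postselection almost always succeeds; a single very small entry makes $\chi$ large and drives up the number of amplification rounds.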
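5.1 Quantum algorithm for estimating $\|\hat{\theta}\|$
Theorem 1. Suppose $\nu_- \le \|F_i\| \le \nu_+$, for any $i \in [n]$, and $\|F^{+}\| \le 1/a$, $\|F\| \le b$. Moreover, suppose
$\chi_- \le |y_i| \le \chi_+$, for any $i \in [n]$. Let $\nu = \nu_+/\nu_-$, $\chi = \chi_+/\chi_-$ and $\kappa = b/a$. Then $\|\hat{\theta}\|$ can be estimated to a relative error
$\epsilon > 0$ with probability at least $2/3$ in $\tilde{O}\big(\mathrm{polylog}(n) \cdot (\chi + d^3 \kappa^6 \nu / (\epsilon^3 \Phi)) \cdot \kappa / (\epsilon \sqrt{\Phi})\big)$ time.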
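Proof. Consider the following algorithm (for convenience, we assume that phase estimation is perfect in the
following description, and we will take the error of phase estimation into account later):
1. We use the algorithm in Lemma 10 to prepare the state $|y\rangle$.
2. We run phase estimation on $e^{i\sigma}$ starting with $|y\rangle$, obtaining the state
$$\sum_{j=1}^{d} \beta_j |u_j\rangle|s_j^2\rangle + |\hat{\epsilon}\rangle|0\rangle, \tag{53}$$
where $|\hat{\epsilon}\rangle = (I - \Pi_F)|y\rangle$ is the (unnormalized) component of $|y\rangle$ orthogonal to $\mathcal{C}(F)$.
3. We append a qubit in the state $|0\rangle$ and perform the controlled-rotation
$$|z\rangle|0\rangle \to |z\rangle \left(\frac{a}{\sqrt{z}} |1\rangle + \sqrt{1 - \frac{a^2}{z}}\, |0\rangle\right) \tag{54}$$
on the last two registers (note that $s_j \ge a$), obtaining a state proportional to
$$\sum_{j=1}^{d} \beta_j |u_j\rangle|s_j^2\rangle \left(\frac{a}{s_j} |1\rangle + \sqrt{1 - \frac{a^2}{s_j^2}}\, |0\rangle\right) + |\hat{\epsilon}\rangle|0\rangle|0\rangle. \tag{55}$$
4. We measure the last qubit in the standard basis. Then, conditioned on seeing outcome 1, the rest of
the state becomes proportional to
$$\sum_{j=1}^{d} \beta_j s_j^{-1} |u_j\rangle|s_j^2\rangle. \tag{56}$$
Furthermore, the probability of getting outcome 1 is
$$q := a^2 \sum_{j=1}^{d} \beta_j^2 s_j^{-2} = a^2 \|\hat{\theta}\|^2. \tag{57}$$
Since $\sum_{j=1}^{d} \beta_j^2 = \Phi$, and $s_j \le b = \kappa a$, we have $q = \Omega(\Phi/\kappa^2)$.
5. We use amplitude estimation to estimate $q$ to a relative error $O(\epsilon)$ with probability at least $2/3$. Since
$q = \Omega(\Phi/\kappa^2)$, this requires $\tilde{O}(\kappa/(\epsilon\sqrt{\Phi}))$ repetitions of the above procedure and its inverse.
6. Let $q'$ be the estimate of $q$. Then we return $\sqrt{q'}/a$ as the estimate of $\|\hat{\theta}\|$.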
Now we take the error of phase estimation into account, and analyze the time complexity of this
algorithm. In step 2, we do not get the eigenphase $s_j^2$ exactly, but instead get some (random) $\lambda_j \approx s_j^2$ (although
phase estimation singles out the eigenphase 0 perfectly). This implies that we only obtain the states in
steps 2-4 approximately. Since we want to estimate $q$ to a relative error $O(\epsilon)$, we need to make sure
that $|\lambda_j - s_j^2| = O(\epsilon s_j^2)$. Then, by $s_j^2 \ge 1/(d\kappa^2)$, we need to set the precision of phase estimation to be
$O(\epsilon/(d\kappa^2))$. It follows that we need to simulate $e^{i\sigma t}$ to accuracy $O(\epsilon\Phi/\kappa^2)$ (footnote 4) for $t = O(d\kappa^2/\epsilon)$ during
phase estimation. This can be done in $\tilde{O}\big(\mathrm{polylog}(n) \cdot \nu d (d\kappa^2/\epsilon)^2 / (\epsilon\Phi/\kappa^2)\big) = \tilde{O}\big(\mathrm{polylog}(n) \cdot d^3 \kappa^6 \nu / (\epsilon^3 \Phi)\big)$
time by Lemma 9. Meanwhile, $|y\rangle$ can be prepared in $\tilde{O}(\mathrm{polylog}(n) \cdot \chi)$ time by Lemma 10. Therefore,
one iteration of steps 1-3 takes $\tilde{O}\big(\mathrm{polylog}(n) \cdot (\chi + d^3 \kappa^6 \nu / (\epsilon^3 \Phi))\big)$ time. Since amplitude estimation
requires $\tilde{O}(\kappa/(\epsilon\sqrt{\Phi}))$ repetitions of steps 1-4 and their inverses, this algorithm takes
$\tilde{O}\big(\mathrm{polylog}(n) \cdot (\chi + d^3 \kappa^6 \nu / (\epsilon^3 \Phi)) \cdot \kappa / (\epsilon\sqrt{\Phi})\big)$ time, as claimed.
Remark: We can reduce the failure probability of the algorithm in Theorem 1 to arbitrarily small $\delta > 0$
by repeating this algorithm $O(\log(1/\delta))$ times and taking the median of the estimates obtained.

Footnote 4: We want the disturbance caused by the imperfection of simulating $e^{i\sigma t}$ to be $O(\epsilon q)$.
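5.2 Quantum algorithm for estimating $\bar{\theta}$
Suppose $\bar{\theta} = (\bar{\theta}_1, \bar{\theta}_2, \dots, \bar{\theta}_d)^T$. Our algorithm for estimating $\bar{\theta}$ consists of two parts. The first part estimates
$|\bar{\theta}_1|, |\bar{\theta}_2|, \dots, |\bar{\theta}_d|$ (i.e. the norm of each entry). The second part determines $\mathrm{sgn}(\bar{\theta}_1), \mathrm{sgn}(\bar{\theta}_2), \dots, \mathrm{sgn}(\bar{\theta}_d)$
(i.e. the sign of each entry). Both parts depend on the following algorithm for producing the state $|\bar{\theta}\rangle$.
Proposition 1. Suppose $\nu_- \le \|F_i\| \le \nu_+$, for any $i \in [n]$, and $\|F^{+}\| \le 1/a$, $\|F\| \le b$. Moreover, suppose
$\chi_- \le |y_i| \le \chi_+$, for any $i \in [n]$. Let $\nu = \nu_+/\nu_-$, $\chi = \chi_+/\chi_-$ and $\kappa = b/a$. Then $|\bar{\theta}\rangle$ can be prepared to accuracy
$\epsilon > 0$ in $\tilde{O}\big(\mathrm{polylog}(n) \cdot (\chi + d^3 \kappa^6 \nu / (\epsilon^3 \Phi)) \cdot \kappa / \sqrt{\Phi}\big)$ time.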
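Before proving this proposition, let us recall the singular value decomposition of $F$ as shown in Eq.(40).
Although $|v_1\rangle, |v_2\rangle, \dots, |v_d\rangle$ are $d$-dimensional vectors, we from now on consider them as $n$-dimensional
vectors (that is, we embed $\mathbb{R}^d$ into $\mathbb{R}^n$ in the natural way). Now let
$$\rho^{+} = \sum_{j=1}^{d} s_j^2\, |w_j^{+}\rangle\langle w_j^{+}|, \qquad \rho^{-} = \sum_{j=1}^{d} s_j^2\, |w_j^{-}\rangle\langle w_j^{-}|, \tag{58}$$
where
$$|w_j^{+}\rangle = \frac{1}{\sqrt{2}} \big(|0\rangle|v_j\rangle + |1\rangle|u_j\rangle\big), \qquad |w_j^{-}\rangle = \frac{1}{\sqrt{2}} \big(|0\rangle|v_j\rangle - |1\rangle|u_j\rangle\big). \tag{59}$$
Both $\rho^{+}$ and $\rho^{-}$ are $2n$-dimensional quantum states. The following lemma says that they can be prepared
quickly:
Lemma 11. Suppose $\nu_- \le \|F_i\| \le \nu_+$, for any $i \in [n]$. Let $\nu = \nu_+/\nu_-$. Then $\rho^{\pm}$ can be prepared to accuracy
$\delta > 0$ in $\tilde{O}(\mathrm{polylog}(n) \cdot \nu d \log(1/\delta))$ time.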
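Proof. Consider the following algorithm:
1. We use the algorithm in Lemma 5 to prepare the state
$$|F\rangle = \sum_{i=1}^{n} \sum_{j=1}^{d} F_{i,j}\, |i\rangle|j\rangle = \sum_{j=1}^{d} s_j\, |u_j\rangle|v_j\rangle, \tag{60}$$
where in the second step we perform the Schmidt decomposition of $|F\rangle$, which corresponds to the
singular value decomposition of $F$.
2. We append a qubit in the state $|\pm\rangle = \frac{1}{\sqrt{2}} (|0\rangle \pm |1\rangle)$ and an $n$-dimensional register in the state $|0\rangle$, and
perform the unitary operation
$$|i_1\rangle|i_0\rangle|j\rangle|0\rangle \to |i_1\rangle|i_0\rangle|j\rangle|i_j\rangle, \quad \forall i_0, i_1 \in [n],\ j \in \{0, 1\}. \tag{61}$$
Then we obtain the state
$$\frac{1}{\sqrt{2}} \sum_{j=1}^{d} s_j\, |u_j\rangle|v_j\rangle \big(|0\rangle|v_j\rangle \pm |1\rangle|u_j\rangle\big) = \sum_{j=1}^{d} s_j\, |u_j\rangle|v_j\rangle|w_j^{\pm}\rangle. \tag{62}$$
Then the reduced state of this state on the last register is $\rho^{\pm} = \sum_{j=1}^{d} s_j^2\, |w_j^{\pm}\rangle\langle w_j^{\pm}|$, as desired.
By Lemma 5, this algorithm has time complexity $\tilde{O}(\mathrm{polylog}(n) \cdot \nu d \log(1/\delta))$, as claimed.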
Combining Lemma 8 and Lemma 11, we obtain:
Lemma 12. Suppose $\nu_- \le \|F_i\| \le \nu_+$, for any $i \in [n]$. Let $\nu = \nu_+/\nu_-$. Then $e^{i\rho^{\pm} t}$ can be simulated to accuracy
$\delta > 0$ in $\tilde{O}(\mathrm{polylog}(n) \cdot \nu d t^2/\delta)$ time.
Proof. We use the algorithm in Lemma 11 to prepare $\rho^{\pm}$ to accuracy $O(\delta/t)$. Then we use the algorithm in
Lemma 8 to simulate $e^{i\rho^{\pm} t}$ to accuracy $O(\delta)$. It follows from Lemma 11 and Lemma 8 that this algorithm
has time complexity $\tilde{O}(\mathrm{polylog}(n) \cdot \nu d t^2/\delta)$, as claimed.
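Now, let
$$\sigma = \sigma^{+} - \sigma^{-} = \sum_{j=1}^{d} s_j^{2}\left(|w_j^{+}\rangle\langle w_j^{+}| - |w_j^{-}\rangle\langle w_j^{-}|\right). \tag{63}$$
Namely, $\sigma$ has eigenvalues $\pm s_j^{2}$ and eigenvectors $|w_j^{\pm}\rangle$. We can simulate $e^{i\sigma t}$ by composing the simulations of $e^{i\sigma^{+}t}$ and $e^{-i\sigma^{-}t}$:
Lemma 13. Suppose $\alpha \le \|F_i\| \le \beta$, for any $i \in [n]$. Let $\nu = \beta/\alpha$. Then $e^{i\sigma t}$ can be simulated to accuracy $\epsilon > 0$ in $\tilde{O}\left(\mathrm{polylog}(n) \cdot \nu d t^{2}/\epsilon\right)$ time.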
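Proof. We use Suzuki's method [14] for simulating $e^{i(A+B)t}$, where $A$, $B$ are arbitrary Hermitian matrices satisfying $\|A\|, \|B\| \le 1$. This method works as follows. Define a function $S_{2k}(x)$ recursively: let
$$S_2(x) = e^{iAx/2}\, e^{iBx}\, e^{iAx/2}, \tag{64}$$
and let
$$S_{2k}(x) = \left[S_{2k-2}(p_k x)\right]^{2} S_{2k-2}\left((1 - 4p_k)x\right) \left[S_{2k-2}(p_k x)\right]^{2} \tag{65}$$
where $p_k = \left(4 - 4^{1/(2k-1)}\right)^{-1}$ for any $k \ge 2$. Then we have:
Claim 3 ([14]). For any $k \in \mathbb{N}$,
$$\left\| e^{i(A+B)x} - S_{2k}(x) \right\| = O\left(|x|^{2k+1}\right). \tag{66}$$
This implies that for any $k \in \mathbb{N}$,
$$\left\| e^{i(A+B)t} - \left(S_{2k}\left(\frac{t}{n}\right)\right)^{n} \right\| = O\left(\frac{t^{2k+1}}{n^{2k}}\right), \tag{67}$$
where $n$ here denotes the number of segments of the evolution, not the number of data points. To make the right-hand side $O(\epsilon)$, we need to set $n = \Theta\left(t^{1+\frac{1}{2k}} \epsilon^{-\frac{1}{2k}}\right)$. Then, $(S_{2k}(t/n))^{n}$ is the product of $O(n) = O\left(t^{1+\frac{1}{2k}} \epsilon^{-\frac{1}{2k}}\right)$ terms, where each term is of the form $e^{iAt_j}$ or $e^{iBt_j}$ for some $t_j = O(t/n) = O\left(\epsilon^{\frac{1}{2k}} t^{-\frac{1}{2k}}\right)$.
Now we simulate $e^{i\sigma t}$ by setting $A = \sigma^{+}$ and $B = -\sigma^{-}$. We need to implement each term, which is of the form $e^{i\sigma^{+}t_j}$ or $e^{-i\sigma^{-}t_j}$ for $t_j = O\left(\epsilon^{\frac{1}{2k}} t^{-\frac{1}{2k}}\right)$, to accuracy $O(\epsilon/n) = O\left(\epsilon^{1+\frac{1}{2k}} t^{-1-\frac{1}{2k}}\right)$. By Lemma 12, this takes $\tilde{O}\left(\mathrm{polylog}(n) \cdot \nu d\, t^{1-\frac{1}{2k}} / \epsilon^{1-\frac{1}{2k}}\right)$ time. Since there are in total $O\left(t^{1+\frac{1}{2k}} \epsilon^{-\frac{1}{2k}}\right)$ terms, this algorithm has time complexity $\tilde{O}\left(\mathrm{polylog}(n) \cdot \nu d t^{2}/\epsilon\right)$, as claimed. (It is interesting that this complexity is independent of $k$. But a better way to simulate $e^{i\rho t}$ using multiple copies of $\rho$ might change this fact.)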
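The recursion of Eqs.(64)-(65) and the error scaling of Claim 3 can be observed on the smallest nontrivial example. The following sketch is an illustration only, under the assumption that $A$ and $B$ are the Pauli matrices $X$ and $Z$ (both of norm 1), for which exponentials have a closed form; the function names are hypothetical. It measures how the Frobenius-norm error of $S_2$ and $S_4$ shrinks when $x$ is halved.

```python
import math

def exp_pauli(a, b, x):
    """Return exp(i*x*(a*X + b*Z)) for the Pauli matrices X and Z, as a 2x2 nested list."""
    r = math.hypot(a, b)
    if r == 0.0:
        return [[1.0 + 0j, 0j], [0j, 1.0 + 0j]]
    co, si = math.cos(r * x), math.sin(r * x)
    return [[co + 1j * si * b / r, 1j * si * a / r],
            [1j * si * a / r, co - 1j * si * b / r]]

def matmul(p, q):
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def product(*factors):
    out = [[1.0 + 0j, 0j], [0j, 1.0 + 0j]]
    for f in factors:
        out = matmul(out, f)
    return out

def suzuki(k, x):
    """S_{2k}(x) from Eqs.(64)-(65), assuming A = X and B = Z."""
    if k == 1:
        return product(exp_pauli(1, 0, x / 2), exp_pauli(0, 1, x), exp_pauli(1, 0, x / 2))
    p = 1.0 / (4.0 - 4.0 ** (1.0 / (2 * k - 1)))
    side = suzuki(k - 1, p * x)
    middle = suzuki(k - 1, (1.0 - 4.0 * p) * x)
    return product(side, side, middle, side, side)

def error(k, x):
    """Frobenius norm of exp(i(A+B)x) - S_{2k}(x)."""
    exact = exp_pauli(1, 1, x)
    approx = suzuki(k, x)
    return math.sqrt(sum(abs(exact[i][j] - approx[i][j]) ** 2
                         for i in range(2) for j in range(2)))

# Halving x should shrink the error by roughly 2^(2k+1): about 8 for k = 1, about 32 for k = 2.
ratio_s2 = error(1, 0.1) / error(1, 0.05)
ratio_s4 = error(2, 0.1) / error(2, 0.05)
```

In the proof, the same formula is applied with $A = \sigma^{+}$ and $B = -\sigma^{-}$, where each exponential is itself only available approximately through Lemma 12; the toy example above ignores that second source of error.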
Now we have all the ingredients to prove Proposition 1:
Proof of Proposition 1. Suppose $|\hat{y}\rangle = \sum_{j=1}^{d} \beta_j |u_j\rangle$, where $\sum_{j=1}^{d} \beta_j^{2} = \Phi$. Then we have
$$|1\rangle|\hat{y}\rangle = \sum_{j=1}^{d} \beta_j\, |1\rangle|u_j\rangle = \frac{1}{\sqrt{2}} \sum_{j=1}^{d} \beta_j \left(|w_j^{+}\rangle - |w_j^{-}\rangle\right). \tag{68}$$
Consider the following algorithm for preparing $|\bar{\theta}\rangle$ (again, we assume that phase estimation is perfect in the following description, and we will take the error of phase estimation into account later):
1. We prepare the state $|1\rangle|y\rangle$ (see footnote 5) by using the algorithm in Lemma 10.
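2. We run phase estimation on $e^{i\sigma}$ starting with $|1\rangle|y\rangle$, obtaining the state
$$\frac{1}{\sqrt{2}} \sum_{j=1}^{d} \beta_j \left(|w_j^{+}\rangle|s_j^{2}\rangle - |w_j^{-}\rangle|{-s_j^{2}}\rangle\right) + |1\rangle|\phi\rangle|0\rangle, \tag{69}$$
where $|\phi\rangle := |y\rangle - |\hat{y}\rangle$ is the component of $|y\rangle$ orthogonal to the column space of $F$.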
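3. We perform the measurement $\{|0\rangle\langle 0|,\ I - |0\rangle\langle 0|\}$ on the last register. Then, conditioned on seeing the outcome corresponding to $I - |0\rangle\langle 0|$, the state becomes proportional to
$$\frac{1}{\sqrt{2}} \sum_{j=1}^{d} \beta_j \left(|w_j^{+}\rangle|s_j^{2}\rangle - |w_j^{-}\rangle|{-s_j^{2}}\rangle\right). \tag{70}$$
Furthermore, the probability of seeing this outcome is $\Phi = \sum_{j=1}^{d} \beta_j^{2}$.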
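4. We append a qubit in the state $|0\rangle$, and perform the controlled-rotation
$$|z\rangle|0\rangle \to |z\rangle\left(a|z|^{1/2} z^{-1}\,|1\rangle + \sqrt{1 - a^{2}|z| z^{-2}}\;|0\rangle\right), \tag{71}$$
on the last two registers (note that $s_j \ge a$), obtaining a state proportional to
$$\frac{1}{\sqrt{2}} \sum_{j=1}^{d} \beta_j \left[\,|w_j^{+}\rangle|s_j^{2}\rangle\left(a s_j^{-1}|1\rangle + \sqrt{1 - a^{2}s_j^{-2}}\,|0\rangle\right) - |w_j^{-}\rangle|{-s_j^{2}}\rangle\left(-a s_j^{-1}|1\rangle + \sqrt{1 - a^{2}s_j^{-2}}\,|0\rangle\right)\right]. \tag{72}$$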
5. We measure the last qubit in the standard basis. Then, conditioned on seeing outcome 1, the rest of the state is proportional to
$$\frac{1}{\sqrt{2}} \sum_{j=1}^{d} \beta_j \left(s_j^{-1}|w_j^{+}\rangle|s_j^{2}\rangle + s_j^{-1}|w_j^{-}\rangle|{-s_j^{2}}\rangle\right). \tag{73}$$
Furthermore, since $s_j \le b = \kappa a$, the probability of seeing outcome 1 is $\Omega\left(1/\kappa^{2}\right)$.
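6. We uncompute the $|s_j^{2}\rangle$'s and $|{-s_j^{2}}\rangle$'s by undoing phase estimation, obtaining a state proportional to
$$\frac{1}{\sqrt{2}} \sum_{j=1}^{d} \beta_j s_j^{-1} \left(|w_j^{+}\rangle + |w_j^{-}\rangle\right) = \sum_{j=1}^{d} \beta_j s_j^{-1}\, |0\rangle|v_j\rangle = |0\rangle|\hat{\theta}\rangle. \tag{74}$$
The reduced state of this state on the second register is $|\bar{\theta}\rangle$, as desired.
Footnote 5: The dimension of the first register is 2.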
7. The above procedure only succeeds with probability $q := \Omega\left(\Phi/\kappa^{2}\right)$. We use amplitude amplification to raise this probability to $1 - O(\epsilon)$. This ensures that we have prepared $|\bar{\theta}\rangle$ to accuracy $O(\epsilon)$. By Corollary 1, this requires $\tilde{O}\left(\kappa \log(1/\epsilon)/\sqrt{\Phi}\right)$ repetitions of steps 1-6 and their inverses.
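Now we take the error of phase estimation into account, and analyze the time complexity of this algorithm. In step 2, we do not get the eigenphase $\pm s_j^{2}$ exactly, but instead get some (random) $\lambda_j \approx \pm s_j^{2}$ (although phase estimation singles out the eigenphase 0 perfectly). This implies that we only obtain the states in steps 2-6 approximately. Since we want to prepare $|\bar{\theta}\rangle$ to accuracy $O(\epsilon)$, we need to make sure that $|\lambda_j \mp s_j^{2}| = O\left(\epsilon s_j^{2}\right)$. Since $s_j^{2} \ge 1/(d\kappa^{2})$, we need to set the precision of phase estimation to be $O\left(\epsilon/(d\kappa^{2})\right)$. This implies that we need to simulate $e^{i\sigma t}$ to accuracy $O\left(\epsilon\Phi/\kappa^{2}\right)$ (see footnote 6) for $t = O\left(d\kappa^{2}/\epsilon\right)$ during phase estimation. Then by Lemma 13, this takes $\tilde{O}\left(\mathrm{polylog}(n) \cdot \nu d (d\kappa^{2}/\epsilon)^{2}/(\epsilon\Phi/\kappa^{2})\right) = \tilde{O}\left(\mathrm{polylog}(n) \cdot \nu d^{3}\kappa^{6}/(\epsilon^{3}\Phi)\right)$ time. Meanwhile, by Lemma 10, it takes $\tilde{O}(\mathrm{polylog}(n) \cdot \chi)$ time to prepare $|y\rangle$. Thus, one iteration of steps 1-6 takes $\tilde{O}\left(\mathrm{polylog}(n) \cdot (\chi + d^{3}\nu\kappa^{6}/(\epsilon^{3}\Phi))\right)$ time. Since amplitude amplification requires $\tilde{O}\left(\kappa \log(1/\epsilon)/\sqrt{\Phi}\right)$ repetitions of steps 1-6 and their inverses, this algorithm has time complexity $\tilde{O}\left(\mathrm{polylog}(n) \cdot \kappa(\chi + d^{3}\nu\kappa^{6}/(\epsilon^{3}\Phi))/\sqrt{\Phi}\right)$, as claimed.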
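The linear algebra behind steps 1-7 can be emulated classically on a small instance. The sketch below is not the quantum algorithm; it is a check, under the assumption of a made-up instance given directly by its singular value decomposition (all numbers and variable names are hypothetical), that the vector $\sum_j \beta_j s_j^{-1} |v_j\rangle$ of Eq.(74) coincides with the least-squares solution, and that the overall post-selection probability of steps 3 and 5 is at least $\Phi/\kappa^{2}$ when $a$ and $b$ are taken to be the extreme singular values.

```python
import math

def dot(x, y):
    return sum(p * q for p, q in zip(x, y))

# Hypothetical toy instance with n = 3 data points and d = 2 parameters,
# specified directly through its singular value decomposition.
u = [[1 / math.sqrt(2), 1 / math.sqrt(2), 0.0],
     [1 / math.sqrt(2), -1 / math.sqrt(2), 0.0]]   # left singular vectors u_1, u_2
v = [[0.6, 0.8], [-0.8, 0.6]]                      # right singular vectors v_1, v_2
s = [math.sqrt(0.8), math.sqrt(0.2)]               # singular values, squares sum to 1

# Design matrix F = sum_j s_j u_j v_j^T and a unit-norm response vector y.
F = [[sum(s[j] * u[j][i] * v[j][k] for j in range(2)) for k in range(2)] for i in range(3)]
y = [0.5, 0.3, math.sqrt(1 - 0.5 ** 2 - 0.3 ** 2)]

# Spectral route of the proof: beta_j = <u_j|y>, theta_hat = sum_j (beta_j / s_j) v_j.
beta = [dot(u[j], y) for j in range(2)]
theta_spectral = [sum(beta[j] / s[j] * v[j][k] for j in range(2)) for k in range(2)]

# Reference route: solve the 2x2 normal equations (F^T F) theta = F^T y by Cramer's rule.
g = [[sum(F[i][p] * F[i][q] for i in range(3)) for q in range(2)] for p in range(2)]
h = [sum(F[i][p] * y[i] for i in range(3)) for p in range(2)]
det = g[0][0] * g[1][1] - g[0][1] * g[1][0]
theta_normal = [(h[0] * g[1][1] - g[0][1] * h[1]) / det,
                (g[0][0] * h[1] - g[1][0] * h[0]) / det]

# Post-selection statistics of steps 3 and 5 with a = s_min and b = s_max.
a, b = min(s), max(s)
kappa = b / a
phi = sum(bj ** 2 for bj in beta)                              # fit quality Phi
p_success = sum((beta[j] * a / s[j]) ** 2 for j in range(2))   # probability of passing both tests
```

The bound $p_{\text{success}} \ge \Phi/\kappa^{2}$ follows from $a/s_j \ge a/b = 1/\kappa$ for every $j$, which is the reason amplitude amplification needs $\tilde{O}(\kappa/\sqrt{\Phi})$ rounds.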
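We remark that the final state of the algorithm in Proposition 1 is of the form
$$|\Psi\rangle := \sqrt{p}\,|1\rangle|0\rangle|\theta'\rangle|\varphi\rangle + \sqrt{1-p}\left(|1\rangle|1\rangle|\xi\rangle + |0\rangle|0\rangle|\xi_0\rangle + |0\rangle|1\rangle|\xi_1\rangle\right) \tag{75}$$
where $p = 1 - O(\epsilon)$, $\theta' = (\theta'_1, \theta'_2, \dots, \theta'_d)^{T}$ satisfies $\|\theta'\| = 1$ and $|\theta'_j - \bar{\theta}_j| = O(\epsilon)$, and $|\varphi\rangle$ is some normalized state, and $|\xi\rangle$, $|\xi_0\rangle$, $|\xi_1\rangle$ are some unnormalized states (see footnote 7). This fact will be useful for the following algorithms for estimating $|\bar{\theta}_j|$ and $\mathrm{sgn}(\bar{\theta}_j)$.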
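Proposition 2. Suppose $\alpha \le \|F_i\| \le \beta$, for any $i \in [n]$, and $\|F^{+}\| \le 1/a$, $\|F\| \le b$. Moreover, suppose $\gamma \le |y_i| \le \delta$, for any $i \in [n]$. Let $\nu = \beta/\alpha$, $\chi = \delta/\gamma$ and $\kappa = b/a$. Then for any $j \in [d]$, $|\bar{\theta}_j|$ can be estimated up to an additive error $\epsilon > 0$ with probability at least 2/3 in $\tilde{O}\left(\mathrm{polylog}(n) \cdot \kappa(\chi + d^{3}\nu\kappa^{6}/(\epsilon^{3}\Phi))/(\epsilon^{2}\sqrt{\Phi})\right)$ time.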
Proof. Consider the following algorithm:
1. We run the algorithm in Proposition 1 to get the state $|\Psi\rangle$ in Eq.(75).
2. We measure the first three registers of $|\Psi\rangle$ in the standard basis. Then the probability of seeing outcome $(1, 0, j)$ is $q_j := p\,\theta_j'^{2}$.
3. We use amplitude estimation to estimate $q_j$ up to an additive error $O\left(\epsilon^{2}\right)$ with probability at least 2/3. This requires $\tilde{O}\left(1/\epsilon^{2}\right)$ repetitions of the above procedure and its inverse.
4. Let $q'_j$ be the estimate of $q_j$. Then we return $\sqrt{q'_j}$ as the estimate of $|\bar{\theta}_j|$.
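Now we prove the correctness of this algorithm. Since $p = 1 - O(\epsilon)$ and $|\theta'_j - \bar{\theta}_j| = O(\epsilon)$, we have $\left|\sqrt{q_j} - |\bar{\theta}_j|\right| = O(\epsilon)$. Meanwhile, with probability at least 2/3, we have $|q'_j - q_j| = O\left(\epsilon^{2}\right)$, which implies that $\left|\sqrt{q'_j} - \sqrt{q_j}\right| = O(\epsilon)$. Then it follows that $\left|\sqrt{q'_j} - |\bar{\theta}_j|\right| \le \left|\sqrt{q'_j} - \sqrt{q_j}\right| + \left|\sqrt{q_j} - |\bar{\theta}_j|\right| = O(\epsilon)$, as desired.
Footnote 6: We want the disturbance caused by the imperfection of simulating $e^{i\sigma t}$ to be at most $O(\epsilon q)$.
Footnote 7: The dimensions of $|\xi\rangle$, $|\xi_0\rangle$ and $|\xi_1\rangle$ are $d$ times the dimension of $|\varphi\rangle$.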
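Now we analyze the time complexity of this algorithm. By Proposition 1, one iteration of steps 1-2 takes $\tilde{O}\left(\mathrm{polylog}(n) \cdot \kappa(\chi + d^{3}\nu\kappa^{6}/(\epsilon^{3}\Phi))/\sqrt{\Phi}\right)$ time. Since amplitude estimation requires $\tilde{O}\left(1/\epsilon^{2}\right)$ repetitions of steps 1-2 and their inverses, this algorithm takes $\tilde{O}\left(\mathrm{polylog}(n) \cdot \kappa(\chi + d^{3}\nu\kappa^{6}/(\epsilon^{3}\Phi))/(\epsilon^{2}\sqrt{\Phi})\right)$ time, as claimed.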
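The correctness argument passes from an $O(\epsilon^{2})$ error on a probability to an $O(\epsilon)$ error on its square root. A short derivation of the two bounds involved is sketched below; it uses only the notation of this proof ($q_j = p\,\theta_j'^{2}$ is the outcome probability, $q'_j$ its estimate, and $\theta'_j$, $\bar{\theta}_j$ are entries of unit vectors), and relies on the elementary facts $1 - \sqrt{p} \le 1 - p$ for $p \in [0,1]$ and $|\theta'_j| \le 1$.

$$
\begin{aligned}
\left|\sqrt{q'_j} - \sqrt{q_j}\right|^{2} &\le \left|\sqrt{q'_j} - \sqrt{q_j}\right| \left(\sqrt{q'_j} + \sqrt{q_j}\right) = |q'_j - q_j| = O\left(\epsilon^{2}\right), \\
\left|\sqrt{q_j} - |\bar{\theta}_j|\right| &= \left|\sqrt{p}\,|\theta'_j| - |\bar{\theta}_j|\right| \le \left(1 - \sqrt{p}\right)|\theta'_j| + \left||\theta'_j| - |\bar{\theta}_j|\right| \le (1 - p) + |\theta'_j - \bar{\theta}_j| = O(\epsilon).
\end{aligned}
$$

Taking the square root of the first line gives $\left|\sqrt{q'_j} - \sqrt{q_j}\right| = O(\epsilon)$, which explains why amplitude estimation must be run to precision $\epsilon^{2}$ rather than $\epsilon$, and hence the $1/\epsilon^{2}$ factor in the running time.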
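Proposition 3. Suppose $\alpha \le \|F_i\| \le \beta$, for any $i \in [n]$, and $\|F^{+}\| \le 1/a$, $\|F\| \le b$. Moreover, suppose $\gamma \le |y_i| \le \delta$, for any $i \in [n]$. Let $\nu = \beta/\alpha$, $\chi = \delta/\gamma$ and $\kappa = b/a$. Then for any $j \in [d]$, if $|\bar{\theta}_j| \ge \epsilon$, then $\mathrm{sgn}(\bar{\theta}_j)$ can be determined correctly with probability at least 2/3 in $\tilde{O}\left(\mathrm{polylog}(n) \cdot \kappa^{3}(\chi + d^{3}\nu\kappa^{4}/\epsilon^{3})/(\epsilon\Phi^{3/2})\right)$ time.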
Proof. Determining $\mathrm{sgn}(\bar{\theta}_j)$ is more complicated than determining $|\bar{\theta}_j|$. The problem is that the algorithm in Proposition 1 only produces a quantum state $|\bar{\theta}\rangle$, and one cannot know the global phase of a quantum state (see footnote 8). To overcome this problem, we modify the design matrix $F$ and response vector $y$, such that we know $\mathrm{sgn}(\bar{\theta}_{j_0})$ for some $j_0$. Then we only need to decide whether the sign of each other $\bar{\theta}_j$ agrees with that of $\bar{\theta}_{j_0}$.
Formally, let us pick an $n \times d$ matrix $G = (G_{i,j})$ such that:
- $G$'s columns are orthogonal and have norm $1/\sqrt{d}$. Namely, $\sum_{i=1}^{n} G_{i,j} G_{i,j'} = \delta_{j,j'}/d$, for any $j, j' \in [d]$;
- $|G_{i,1}| = 1/\sqrt{nd}$, for any $i \in [n]$;
- There exist constants $c_1$ and $c_2$ such that $c_1/n \le \sum_{j=1}^{d} |G_{i,j}|^{2} \le c_2/n$, for any $i \in [n]$.
Such a matrix exists and can be easily constructed (e.g. using the Hadamard matrix). Furthermore, let $z$ be an $n$-dimensional vector defined as $z = (z_1, z_2, \dots, z_n)^{T} = \|\hat{\theta}\| \, (G_{1,1}, G_{2,1}, \dots, G_{n,1})^{T}$. Namely, $z$ equals the first column of $G$ times $\|\hat{\theta}\|$.
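Now let
$$F' = \frac{1}{\sqrt{2}} \begin{pmatrix} F & 0 \\ 0 & G \end{pmatrix}, \qquad y' = \frac{1}{\sqrt{1 + d^{-1}\|\hat{\theta}\|^{2}}} \begin{pmatrix} y \\ z \end{pmatrix}. \tag{76}$$
Then, we have
$$\hat{\theta}' := \left((F')^{T} F'\right)^{-1} (F')^{T} y' = \frac{\sqrt{2}}{\sqrt{1 + d^{-1}\|\hat{\theta}\|^{2}}} \left(\hat{\theta}_1, \hat{\theta}_2, \dots, \hat{\theta}_d, \|\hat{\theta}\|, 0, \dots, 0\right)^{T}. \tag{77}$$
Namely, $\hat{\theta}'$ is proportional to the concatenation of $\hat{\theta} = (\hat{\theta}_1, \hat{\theta}_2, \dots, \hat{\theta}_d)^{T}$ and the $d$-dimensional vector $(\|\hat{\theta}\|, 0, \dots, 0)^{T}$. Note that we know that the $(d+1)$-th entry of $\hat{\theta}'$ is always positive. Furthermore, let $\bar{\theta}' = (\bar{\theta}'_1, \bar{\theta}'_2, \dots, \bar{\theta}'_{2d})$ be the normalized version of $\hat{\theta}'$. Then one can see that $\bar{\theta}'_j = \bar{\theta}_j/\sqrt{2}$ for $j \in [d]$, $\bar{\theta}'_{d+1} = 1/\sqrt{2}$, and $\bar{\theta}'_j = 0$ for $j > d+1$. To decide $\mathrm{sgn}(\bar{\theta}_j) = \mathrm{sgn}(\bar{\theta}'_j)$ for $j \in [d]$, our strategy is to decide whether $\bar{\theta}'_j + \bar{\theta}'_{d+1}$ is larger than $\bar{\theta}'_{d+1}$ or not.
Footnote 8: However, we can decide, e.g., the relative sign $\mathrm{sgn}(\bar{\theta}_1 \bar{\theta}_2)$, provided that $|\bar{\theta}_1|$ and $|\bar{\theta}_2|$ are big enough.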
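By construction, we have $\mathrm{tr}\left((F')^{T} F'\right) = \|y'\| = 1$. Let $F'_i = (F'_{i,1}, F'_{i,2}, \dots, F'_{i,2d})$, for $i \in [2n]$. Then let $\alpha' = \min_{i \in [2n]} \|F'_i\|$ and $\beta' = \max_{i \in [2n]} \|F'_i\|$. Then we have $\alpha' = \Omega(\alpha)$, $\beta' = O(\beta)$, and hence $\nu' := \beta'/\alpha' = O(\nu)$. Moreover, since all the singular values of $G$ are $1/\sqrt{d}$, the condition number of $F'$ is $\kappa' := \kappa(F') = O(\kappa)$. Finally, let $\gamma' = \min_{i \in [2n]} |y'_i|$ and $\delta' = \max_{i \in [2n]} |y'_i|$. Then we have
$$\gamma' \ge \frac{\min\left(\gamma,\ \|\hat{\theta}\|/\sqrt{nd}\right)}{\sqrt{1 + \|\hat{\theta}\|^{2}/d}}, \qquad \delta' \le \frac{\max\left(\delta,\ \|\hat{\theta}\|/\sqrt{nd}\right)}{\sqrt{1 + \|\hat{\theta}\|^{2}/d}}. \tag{78}$$
Then by Eq.(48), we get $\chi' := \delta'/\gamma' = O\left(\chi\kappa^{2}/\Phi\right)$. Using these facts and Proposition 1, we can prepare a state of the form
$$|\Psi'\rangle := \sqrt{p}\,|1\rangle|0\rangle|\theta''\rangle|\varphi\rangle + \sqrt{1-p}\left(|1\rangle|1\rangle|\xi\rangle + |0\rangle|0\rangle|\xi_0\rangle + |0\rangle|1\rangle|\xi_1\rangle\right) \tag{79}$$
where $p = \Omega(1)$, $\theta'' = (\theta''_1, \theta''_2, \dots, \theta''_{2d})^{T}$ satisfies $\|\theta''\| = 1$ and $|\theta''_j - \bar{\theta}'_j| = O(\epsilon)$, and $|\varphi\rangle$ is some normalized state, and $|\xi\rangle$, $|\xi_0\rangle$, $|\xi_1\rangle$ are some unnormalized states, in $\tilde{O}\left(\mathrm{polylog}(n) \cdot \kappa^{3}(\chi + d^{3}\nu\kappa^{4}/\epsilon^{3})/\Phi^{3/2}\right)$ time.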
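Now suppose we measure the first two registers of the state in Eq.(79) in the standard basis. Then with probability $\Omega(1)$, we obtain the outcome $(1, 0)$. Accordingly, the rest of the state becomes $|\theta''\rangle|\varphi\rangle$. Then, let $|\pm_j\rangle = \frac{1}{\sqrt{2}}(|j\rangle \pm |d+1\rangle)$, for any $j \in [d]$. Then, we measure the state $|\theta''\rangle$ in the basis $\{|+_j\rangle, |-_j\rangle\} \cup \{|i\rangle : i \in [2d],\ i \ne j, d+1\}$, and estimate the probability of seeing the outcome corresponding to $|+_j\rangle$. One can see that this probability is
$$q' := \frac{(\theta''_j + \theta''_{d+1})^{2}}{2} \approx \frac{(\bar{\theta}'_j + \bar{\theta}'_{d+1})^{2}}{2} = q := \frac{(\bar{\theta}_j + 1)^{2}}{4}. \tag{80}$$
In fact, we have $|q' - q| = O(\epsilon)$. Now, if $\bar{\theta}_j \le -\epsilon$, we would have $q \le (1 - \epsilon)/4$; if $\bar{\theta}_j \ge \epsilon$, we would have $q \ge (1 + \epsilon)/4$. Therefore, by estimating $q'$ up to an additive error $O(\epsilon)$, we can distinguish these two cases and hence determine $\mathrm{sgn}(\bar{\theta}_j)$. We can estimate $q'$ using amplitude estimation, and this requires $\tilde{O}(1/\epsilon)$ repetitions of the above procedure and its inverse. Therefore, the time complexity of this algorithm is $\tilde{O}\left(\mathrm{polylog}(n) \cdot \kappa^{3}(\chi + d^{3}\nu\kappa^{4}/\epsilon^{3})/(\epsilon\Phi^{3/2})\right)$, as claimed.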
In the above argument, we have assumed that we know $\|\hat{\theta}\|$ exactly. In fact, we only need to estimate it to a relative error $O(\epsilon)$. This would cause at most $O(\epsilon)$ disturbance to the above argument. By Theorem 1, estimating $\|\hat{\theta}\|$ to a relative error $O(\epsilon)$ takes $\tilde{O}\left(\mathrm{polylog}(n) \cdot \kappa(\chi + d^{3}\nu\kappa^{6}/(\epsilon^{3}\Phi))/(\epsilon\sqrt{\Phi})\right)$ time, which is negligible compared to the other part of this algorithm.
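The auxiliary matrix $G$ and the final sign test can be made concrete with a few lines of code. The sketch below is one possible construction, following the paper's hint to use the Hadamard matrix; it assumes $n$ is a power of two and $d \le n$, and the function names and the sample vector are hypothetical. It builds $G$ from the first $d$ columns of a Sylvester Hadamard matrix, so that the three required properties hold with $c_1 = c_2 = 1$, and then applies the threshold test on $q = (\bar{\theta}_j + 1)^{2}/4$ from Eq.(80) to exact (noise-free) values.

```python
import math

def hadamard(n):
    """Sylvester construction of the n x n Hadamard matrix (n must be a power of two)."""
    h = [[1]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h

def build_g(n, d):
    """First d columns of the Hadamard matrix, scaled so every entry has modulus 1/sqrt(n*d)."""
    h = hadamard(n)
    scale = 1.0 / math.sqrt(n * d)
    return [[scale * h[i][j] for j in range(d)] for i in range(n)]

def sign_from_probability(q_estimate):
    """Decide sgn(theta_bar_j) by comparing q = (theta_bar_j + 1)^2 / 4 with 1/4."""
    return 1 if q_estimate > 0.25 else -1

n, d = 8, 3
G = build_g(n, d)

# Gram matrix of the columns (should be the identity divided by d) and squared row norms.
gram = [[sum(G[i][p] * G[i][q] for i in range(n)) for q in range(d)] for p in range(d)]
row_norms_sq = [sum(x * x for x in G[i]) for i in range(n)]

# Hypothetical normalized parameter vector and the corresponding ideal probabilities.
theta_bar = [0.6, -0.8, 0.0]
q_values = [(t + 1.0) ** 2 / 4.0 for t in theta_bar]
```

The test is only meaningful for entries with $|\bar{\theta}_j| \ge \epsilon$, as in the statement of Proposition 3; for an entry equal to zero, $q$ sits exactly at the threshold $1/4$ and the outcome is arbitrary.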
Remark: We can reduce the failure probability of the algorithms in Proposition 2 and Proposition 3 to an arbitrarily small $\eta > 0$ by repeating them $O(\log(1/\eta))$ times and taking the median or majority of the estimates obtained.
Combining Proposition 2 and Proposition 3, we obtain:
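Theorem 2. Suppose $\alpha \le \|F_i\| \le \beta$, for any $i \in [n]$, and $\|F^{+}\| \le 1/a$, $\|F\| \le b$. Moreover, suppose $\gamma \le |y_i| \le \delta$, for any $i \in [n]$. Let $\nu = \beta/\alpha$, $\chi = \delta/\gamma$ and $\kappa = b/a$. Then there exists a quantum algorithm that produces an estimate $\tilde{\theta}$ of $\bar{\theta}$ such that $\|\tilde{\theta} - \bar{\theta}\| \le \epsilon$ with probability at least 2/3, in $\tilde{O}\left(\mathrm{polylog}(n) \cdot \kappa d^{1.5}\left(\sqrt{d}\chi/\epsilon + \kappa^{2}\chi/\Phi + \nu\kappa^{6}d^{5}/(\epsilon^{4}\Phi)\right)/(\epsilon\sqrt{\Phi})\right)$ time.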
Proof. Consider the following algorithm:
For each $j \in [d]$:
1. We use the algorithm in Proposition 2 to estimate $|\bar{\theta}_j|$ up to an additive error $\epsilon/(8\sqrt{d})$ with probability at least $1 - 1/(6d)$.
2. Let $\mu_j$ be the estimate of $|\bar{\theta}_j|$. Then:
- If $\mu_j \ge \epsilon/(8\sqrt{d})$, then we use the algorithm in Proposition 3 (setting $\epsilon = \epsilon/(8\sqrt{d})$) to determine $\mathrm{sgn}(\bar{\theta}_j)$ with probability at least $1 - 1/(6d)$. Let $\omega_j$ be the output of this algorithm. Then we set $\tilde{\theta}_j = \omega_j \mu_j$;
- Otherwise, we set $\tilde{\theta}_j$ to be $+\mu_j$ or $-\mu_j$ arbitrarily.
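Let $S = \{j \in [d] : \mu_j \ge \epsilon/(8\sqrt{d})\}$. Then, with probability at least 2/3, we have
$$\left|\mu_j - |\bar{\theta}_j|\right| \le \frac{\epsilon}{8\sqrt{d}}, \qquad \forall j \in [d], \tag{81}$$
and
$$\omega_j = \mathrm{sgn}(\bar{\theta}_j), \qquad \forall j \in S. \tag{82}$$
This implies that
$$|\tilde{\theta}_j - \bar{\theta}_j| \le \frac{\epsilon}{8\sqrt{d}}, \qquad \forall j \in S. \tag{83}$$
Meanwhile, we have
$$|\bar{\theta}_j| \le \frac{\epsilon}{4\sqrt{d}}, \qquad \forall j \notin S. \tag{84}$$
Thus,
$$|\tilde{\theta}_j - \bar{\theta}_j| \le \frac{\epsilon}{2\sqrt{d}}, \qquad \forall j \notin S. \tag{85}$$
It follows that
$$\|\tilde{\theta} - \bar{\theta}\|^{2} = \sum_{j \in S} |\tilde{\theta}_j - \bar{\theta}_j|^{2} + \sum_{j \notin S} |\tilde{\theta}_j - \bar{\theta}_j|^{2} \le \frac{\epsilon^{2}}{64} + \frac{\epsilon^{2}}{4} \le \epsilon^{2}. \tag{86}$$
This proves the correctness of this algorithm.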
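Now we analyze the time complexity of this algorithm. For each $j \in [d]$, step 1 takes $\tilde{O}\left(\mathrm{polylog}(n) \cdot \kappa d(\chi + d^{4.5}\nu\kappa^{6}/(\epsilon^{3}\Phi))/(\epsilon^{2}\sqrt{\Phi})\right)$ time by Proposition 2, and step 2 takes $\tilde{O}\left(\mathrm{polylog}(n) \cdot \kappa^{3}\sqrt{d}(\chi + d^{4.5}\nu\kappa^{4}/\epsilon^{3})/(\epsilon\Phi^{3/2})\right)$ time by Proposition 3. Since we need to do steps 1 and 2 for every $j \in [d]$, this algorithm has time complexity $\tilde{O}\left(\mathrm{polylog}(n) \cdot \kappa d^{1.5}\left(\sqrt{d}\chi/\epsilon + \kappa^{2}\chi/\Phi + \nu\kappa^{6}d^{5}/(\epsilon^{4}\Phi)\right)/(\epsilon\sqrt{\Phi})\right)$, as claimed.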
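The classical post-processing in this proof, which combines the magnitude estimates with the sign outcomes, is simple enough to sketch directly. The code below is an illustration under stated assumptions: the function `assemble_estimate`, the callback `sign_oracle` and the sample numbers are hypothetical, the magnitude estimates are perturbed deterministically within the allowed error $\epsilon/(8\sqrt{d})$, and the sign oracle is taken to be always correct (in the quantum algorithm each call succeeds only with probability at least $1 - 1/(6d)$).

```python
import math

def assemble_estimate(magnitudes, sign_oracle, eps):
    """Combine magnitude estimates and sign outcomes as in the proof of Theorem 2.

    magnitudes[j] is an estimate of |theta_bar_j|, assumed accurate to eps / (8 sqrt(d)).
    sign_oracle(j) returns +1 or -1; it is only queried when the magnitude clears the threshold.
    """
    d = len(magnitudes)
    threshold = eps / (8.0 * math.sqrt(d))
    estimate = []
    for j in range(d):
        if magnitudes[j] >= threshold:
            estimate.append(sign_oracle(j) * magnitudes[j])
        else:
            estimate.append(magnitudes[j])   # sign chosen arbitrarily (here: plus)
    return estimate

# Hypothetical normalized parameter vector and target accuracy.
theta_bar = [0.6, -0.8, 0.0]
eps = 0.1
tau = eps / (8.0 * math.sqrt(len(theta_bar)))

# Magnitude estimates with a deterministic error of 0.9 * tau, alternating in sign.
magnitudes = [max(0.0, abs(t) + 0.9 * tau * (-1) ** j) for j, t in enumerate(theta_bar)]

theta_tilde = assemble_estimate(magnitudes, lambda j: 1 if theta_bar[j] > 0 else -1, eps)
error = math.sqrt(sum((a - b) ** 2 for a, b in zip(theta_tilde, theta_bar)))
```

Entries below the threshold never trigger a call to the sign routine, which is what keeps the requirement $|\bar{\theta}_j| \ge \epsilon$ of Proposition 3 from being invoked on negligible coordinates.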
Remark: We can reduce the failure probability of the algorithm in Theorem 2 to an arbitrarily small $\eta > 0$ by making the success probabilities in steps 1 and 2 equal to $1 - O(\eta/d)$. The running time of this algorithm will be increased by a factor of $O(\log(d/\eta))$.
6 Quantum algorithm for estimating the fit quality
So far, we have presented two quantum algorithms on the estimation of the best-fit parameters $\hat{\theta}$. One may notice that they both rely on some knowledge of the fit quality $\Phi$. Now we show that this is without loss of generality, because $\Phi$ can be estimated quickly, as indicated by the following theorem:
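Theorem 3. Suppose $\alpha \le \|F_i\| \le \beta$, for any $i \in [n]$. Moreover, suppose $\gamma \le |y_i| \le \delta$, for any $i \in [n]$. Let $\nu = \beta/\alpha$, $\chi = \delta/\gamma$ and $\kappa = \kappa(F)$. Then $\Phi$ can be estimated up to an additive error $\epsilon > 0$ with probability at least 2/3 in $\tilde{O}\left(\mathrm{polylog}(n) \cdot (\chi + d^{3}\nu\kappa^{4}/\epsilon)/\epsilon\right)$ time.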
Proof of Theorem 3. Our strategy for estimating $\Phi = \|\hat{y}\|^{2} = \|\Pi_F |y\rangle\|^{2}$ is as follows. We prepare the state $|y\rangle$, and perform the projective measurement $\{\Pi_F,\ I - \Pi_F\}$ on it, and estimate the probability of seeing the outcome corresponding to $\Pi_F$. The measurement $\{\Pi_F,\ I - \Pi_F\}$ is implemented by running phase estimation on $e^{i\rho}$ and checking whether the eigenphase is close to 0 or not.
Specifically, suppose $|\hat{y}\rangle = \sum_{j=1}^{d} \beta_j |u_j\rangle$. Consider the following algorithm for estimating $\Phi = \|\hat{y}\|^{2} = \sum_{j=1}^{d} \beta_j^{2}$ (again, we assume that phase estimation is perfect in the following description, and we will take the error of phase estimation into account later):
1. We use the algorithm in Lemma 10 to prepare the state $|y\rangle$.
2. We run phase estimation on $e^{i\rho}$ starting with $|y\rangle$, obtaining the state
$$\sum_{j=1}^{d} \beta_j\, |u_j\rangle|s_j^{2}\rangle + |\phi\rangle|0\rangle, \tag{87}$$
where $|\phi\rangle := |y\rangle - |\hat{y}\rangle$ is the component of $|y\rangle$ orthogonal to the column space of $F$.
3. We perform the measurement $\{|0\rangle\langle 0|,\ I - |0\rangle\langle 0|\}$ on the second register. Then, conditioned on seeing the outcome corresponding to $I - |0\rangle\langle 0|$, the state becomes proportional to
$$\sum_{j=1}^{d} \beta_j\, |u_j\rangle|s_j^{2}\rangle. \tag{88}$$
Furthermore, the probability of seeing this outcome is $q := \sum_{j=1}^{d} \beta_j^{2} = \Phi$.
4. We use amplitude estimation to estimate $q$ up to an additive error $O(\epsilon)$ with probability at least 2/3. This requires $\tilde{O}(1/\epsilon)$ repetitions of the above procedure and its inverse.
Now we take the error of phase estimation into account, and analyze the time complexity of this algorithm. In step 2, we do not get the eigenphase $s_j^{2}$ exactly, but get some (random) $\lambda_j \approx s_j^{2}$ (although phase estimation can single out the eigenphase 0 perfectly). This implies we only obtain the states in steps 2-3 approximately. Since we want to estimate $q$ to a relative error $O(\epsilon)$, we need to make sure that $|\lambda_j - s_j^{2}| \le s_j^{2}/3$ (so that $\lambda_j \ne 0$). Since $s_j^{2} \ge 1/(d\kappa^{2})$, we need to set the precision of phase estimation to be $O\left(1/(d\kappa^{2})\right)$. It follows that we need to simulate $e^{i\rho t}$ to accuracy $O(\epsilon)$ (see footnote 9) for $t = O\left(d\kappa^{2}\right)$ during phase estimation. This can be done in $\tilde{O}\left(\mathrm{polylog}(n) \cdot \nu d (d\kappa^{2})^{2}/\epsilon\right) = \tilde{O}\left(\mathrm{polylog}(n) \cdot \nu d^{3}\kappa^{4}/\epsilon\right)$ time by Lemma 9. Meanwhile, it takes $\tilde{O}(\mathrm{polylog}(n) \cdot \chi)$ time to prepare $|y\rangle$ by Lemma 10. Thus, one iteration of steps 1-3 takes $\tilde{O}\left(\mathrm{polylog}(n) \cdot (\chi + d^{3}\nu\kappa^{4}/\epsilon)\right)$ time. Since amplitude estimation requires $\tilde{O}(1/\epsilon)$ repetitions of steps 1-3 and their inverses, this algorithm takes $\tilde{O}\left(\mathrm{polylog}(n) \cdot (\chi + d^{3}\nu\kappa^{4}/\epsilon)/\epsilon\right)$ time, as claimed.
Remark: We can reduce the failure probability of the algorithm in Theorem 3 to an arbitrarily small $\eta > 0$ by repeating this algorithm $O(\log(1/\eta))$ times and taking the median of the estimates obtained.
Footnote 9: We want the disturbance caused by the imperfection of simulating $e^{i\rho t}$ to be at most $O(\epsilon)$.
7 Conclusion and open problems
To summarize, we have presented several quantum algorithms for estimating the best-fit parameters and the quality of least-square curve fitting. The running times of these algorithms are polynomial in $\log n$, $d$, $\kappa$, $\nu$, $\chi$, $1/\Phi$ and $1/\epsilon$, where $n$ is the number of data points to be fitted, $d$ is the dimension of feature vectors, $\kappa$ is the condition number of the design matrix, $\nu$ and $\chi$ are some parameters reflecting the variances of the design matrix and response vector respectively, $\Phi$ is the fit quality, and $\epsilon$ is the tolerable error. Different from previous quantum algorithms for these tasks, our algorithms do not require the design matrix to be sparse, and they do completely determine the fitted curve. They are developed by combining phase estimation and the density matrix exponentiation technique for nonsparse Hamiltonian simulation.
Our work raises many interesting questions:
First, our work suggests that quantum algorithms might be able to solve curve fitting exponentially faster than classical algorithms. But we do not really prove it. Instead, we show that there exist extremely fast quantum algorithms for curve fitting when $d$, $\kappa$, $\nu$ and $\chi$ are reasonably small. But it is unknown whether classical algorithms can solve this case very fast. It would be interesting to study the biggest possible gap between our algorithms and classical algorithms.
Second, in this paper we focus on giving upper bounds for the quantum complexity of curve fitting. It is also worth studying lower bounds for the quantum complexity of the same problem. In particular, is it possible to generate the state $|\bar{\theta}\rangle$ to accuracy $\epsilon > 0$ in $\mathrm{poly}(\log n, \log d, \kappa, \nu, \chi, 1/\Phi, 1/\epsilon)$ time? Note that this complexity is poly-logarithmic in both $n$ and $d$. If such an algorithm exists, it might have immediate implications to certain physical problems, as suggested by [15]. We suspect that no such algorithm exists. Can this be proved under some complexity assumption, say, BQP $\ne$ PSPACE?
Third, Ambainis [1] has introduced a technique called variable-time amplitude amplification and used it to improve HHL's algorithm for solving linear systems. Since our algorithms for estimating the best-fit parameters have some similarity with HHL's algorithm, is it possible to use his technique to improve our algorithms as well?
Fourth, as one can see, our algorithms crucially depend on the ability to simulate the evolutions of nonsparse Hamiltonians. Here we have used the density matrix exponentiation method of [12]. It would be interesting to know whether this method is optimal. Namely, is there a more efficient way of simulating $e^{i\rho t}$ by consuming multiple copies of $\rho$? If so, it would lead to the improvement of our algorithms.
Finally, it would be interesting to apply our algorithms to solve practical problems in many fields.
Acknowledgement
This work was supported by NSF Grant CCR-0905626 and ARO Grant W911NF-09-1-0440.
References
[1] A. Ambainis. Variable time amplitude amplification and a faster quantum algorithm for solving systems of linear equations. In Proc. 29th STACS, pages 636-647, 2012.
[2] S. L. Arlinghaus. PHB Practical Handbook of Curve Fitting. CRC Press, 1994.
[3] D. W. Berry, G. Ahokas, R. Cleve and B. C. Sanders. Efficient quantum algorithms for simulating sparse Hamiltonians. Communications in Mathematical Physics 270, 359 (2007).
[4] D. W. Berry and A. M. Childs. Black-box Hamiltonian simulation and unitary implementation. Quantum Information and Computation 12, 29 (2012).
[5] G. Brassard, P. Høyer, M. Mosca and A. Tapp. Quantum amplitude amplification and estimation. arXiv:quant-ph/0005055.
[6] O. Bretscher. Linear Algebra With Applications, 3rd ed. Upper Saddle River NJ: Prentice Hall.
[7] A. M. Childs, R. Cleve, E. Deotto, E. Farhi, S. Gutmann and D. A. Spielman. Exponential algorithmic speedup by quantum walk. In Proc. 35th ACM STOC, pages 59-68, 2003.
[8] A. M. Childs. On the relationship between continuous- and discrete-time quantum walk. Communications in Mathematical Physics 294, pages 581-603 (2010).
[9] A. M. Childs and R. Kothari. Simulating sparse Hamiltonians with star decompositions. Lecture Notes in Computer Science 6519, pages 94-103 (2011).
[10] A. W. Harrow, A. Hassidim and S. Lloyd. Quantum algorithm for linear systems of equations. Physical Review Letters 103, 150502 (2009).
[11] A. Y. Kitaev. Quantum measurements and the Abelian Stabilizer Problem. arXiv:quant-ph/9511026.
[12] S. Lloyd, M. Mohseni and P. Rebentrost. Quantum principal component analysis. arXiv:1307.0401.
[13] D. Nagaj, P. Wocjan and Y. Zhang. Fast amplification of QMA. Quantum Information and Computation 9, 1053 (2009).
[14] M. Suzuki. Fractal decomposition of exponential operators with applications to many-body theories and Monte Carlo simulations. Physics Letters A 146(6), pages 319-323, 1990.
[15] N. Wiebe, D. Braun and S. Lloyd. Quantum data fitting. Physical Review Letters 109, 050505 (2012).