https://doi.org/10.1007/s11128-022-03607-5
Received: 12 August 2021 / Accepted: 6 July 2022 / Published online: 26 July 2022
© The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2022
Abstract
In this paper, we present a series of new results about learning of concentrated Boolean
functions in the quantum computing model. Given a Boolean function f on n variables,
its concentration refers to the dominant terms in its Fourier–Walsh spectrum. We
show that a quantum probably approximately correct (PAC) learning model used to learn
a Boolean function characterized by its concentration yields improvements over the
best-known classical method. All of our results are presented within the framework of
query complexity, and therefore, our advantage represents asymptotic improvements
in the number of queries using a quantum approach over its classical counterpart.
Next, we prove a lower bound on the number of quantum queries needed to learn
the function in the distribution-independent setting. Further, we examine the case
of exact learning which is the learning variant without error. Here, we show that the
query complexity grows as 2^{βn} for some 0 < β < 1 and therefore remains intractable
even when quantum approaches are considered. This proof is based on the quantum
information theoretic approach developed by researchers for the restricted case of
k-sparse functions.
256 Page 2 of 23 K. Palem et al.
1 Introduction
Quantum computing has become a major area of research with the hope of finding
solutions to problems that are otherwise intractable using classical computers. Shor’s
celebrated integer factoring algorithm [33] offered great hope in this regard, as did
many results that followed. Another well-known result in the same vein is
Grover’s algorithm [21] for unordered search.
On the other hand, machine learning has clearly become a dominant area for
research and application in computer science. Consequently, researchers have been
studying quantum computing techniques and algorithms in this context. Two impor-
tant paradigms of computational learning theory are probably approximately correct
(PAC) learning [36] and exact learning (see Angluin [6]). These learning paradigms
have attracted a lot of research attention over the past few decades with a rich history
in the context of classical computing models.
Both PAC and exact learning problems are based on learning a target n-bit Boolean
function f drawn from among a given set of Boolean functions C = { f 1 , f 2 , . . . fr }
referred to as a concept class. The process of learning follows the supervised style
where examples of the form {x, f (x)} are used to discern the structure of the candi-
date function f . In exact learning, there is no room for error in characterizing f . In
contrast, in PAC learning the algorithm may output a hypothesis function h whose
error relative to f must, with high probability (Probably), be small
(Approximately Correct). Unless stated otherwise, we will use the phrase machine
learning to refer to PAC or exact learning and any deviations will be explicitly identi-
fied.
Quantum versions of PAC and exact learning have been proposed, where instead
of classical examples (x, f (x)) of the function, the learning algorithm has access to
an oracle that provides quantum examples in the form of the quantum superposed state
Σ_{x∈{0,1}^n} √D(x) |x, f (x)⟩. It is noteworthy that improvements in learning algorithms, such as
in [11] and [7], are mostly reported through query complexity, i.e., the number of queries
(examples) required to learn a function from a given class.
In this paper, we present three new results that characterize improvements that
can be achieved in learning classes of functions associated with the concept of
concentration [29]. Concentration has its roots in the spectra of Boolean functions and,
informally, captures the number of dominant terms in its Fourier–Walsh spectrum. In
this work, we focus on the class of concentrated Boolean functions, and
use 0 ≤ ε < 1 to denote the amount of concentration and M to denote the number of
dominant terms in the spectrum of the functions. More specifically, the M dominant
terms have a combined l2 mass of at least 1 − ε. Note that this class of functions is
larger than the class of junta or k-sparse functions, which were
studied in previous works in this field (see [9, 11]). Our main contributions are as
follows:
1. In our first result, we exhibit a new quantum learning algorithm using uniform
quantum queries with an improvement over the best-known classical algorithm by
Hassanieh et al. [23] in the query complexity metric. We propose a quantum
algorithm that can PAC learn the same class of functions in O(M/ε²) uniform quantum
queries, compared to the query complexity O(M log(2^n/M) n/ε) of Hassanieh's algorithm.
Our algorithm is based on the classical Kushilevitz–Mansour algorithm [26], for
which we give a detailed complexity analysis and prove a tight bound of O(nM³/ε³) on
query complexity, since the exact number was not included in the original paper.
2. Next, we prove a lower bound on the number of quantum queries needed to learn
the class in the distribution-independent case. Here, we show that the number of
queries grows as Ω(M + √M/ε), which matches the upper bound of our quantum
algorithm when ε is considered to be a constant. Note that this bound holds for the
distribution-independent case; when the distribution is uniform (i.e., using uniform
quantum queries, as in the first result), one can possibly arrive at a smaller lower
bound.
3. Finally, we prove that in the context of exact learning, the number of uniform
quantum queries necessary to learn the class of concentrated Boolean functions
grows as Ω(2^n/(M log n)), which is exponential in n. Therefore, an approximate learning
algorithm such as PAC learning is significantly better for learning this class of
functions, even in the quantum domain.
In keeping with the way PAC and exact learning models are defined, all of these
claims apply when the learning schemes in question are accompanied by a probabilistic
guarantee. We use the same guarantee when comparing the classical and quantum
approaches in contribution 1 and the lower bound in contribution 2, and when comparing
the PAC learning upper bound in contribution 1 and the exact learning lower bound in
contribution 3.
In the context of learning, we note that Grover’s algorithm [21] and the Harrow–
Hassidim–Lloyd linear systems solver [22] have been explored with the goal of
improving machine learning systems. For several specific machine learning methods,
improvements ranging from quadratic speedups (such as for reinforcement learning [18])
to exponential speedups (such as for support vector machines [30]) have been reported.
A more extensive survey of proposed quantum machine learning algorithms can be
found in [12].
Multiple results exist for both upper and lower bounds, in both time and query
complexity, in the settings of exact learning [15] and PAC learning [26]. In the classical
context, past results indicate a strong dependence between the sample complexity of
PAC learning and the Vapnik–Chervonenkis (VC) dimension of the concept class [13].
The same correlation can be found in the context of quantum learning, as shown in
[10].
Finally, improvements in the number of queries needed over classical models have
been reported for juntas [11] and in the exact learning of k-sparse functions [7]. A
survey on results in quantum machine learning can be found in [8].
Machine learning has been one of the most active research areas in computer science
for decades. Especially with the development of neural networks and deep
learning since the early 2000s, machine learning has realized major advancements in
numerous computing activities, such as object and face recognition, natural language
processing, and surrogate models for complicated systems.
However, general machine learning (and deep learning in particular) is usually
considered an empirical science: while very powerful in practical applications, most
machine learning algorithms are designed based on empirical practice and are tested
based on their accuracy on data. Only a few learning models establish a theoretical
bound on the accuracy of the learning process. PAC learning and exact learning are
two such models.
In order to learn a target function, learning algorithms in general need access to
information about that function. The usual form of information that machine learning
algorithms use is examples of that function in the form of input–output pairs. We call
such algorithms supervised learning algorithms, and the input–output pairs queries.
Whereas in practical machine learning (we use this term to distinguish it from
theoretical machine learning) the main performance metrics are running time and accuracy, in
theoretical machine learning researchers often also care about another criterion: query
complexity, i.e., the number of queries needed to learn an arbitrary function from the
concept class. There are two intuitive reasons behind this metric:
– There are numerous machine learning tasks where the difficulty lies in finding
examples of the function: tasks where no publicly available data set exists,
domain-specific tasks, and tasks that require substantial manual human
effort to label. In fact, data labeling is becoming an increasingly
popular job in many countries [37] because of the hardness of obtaining data by
other means. A survey on data collection and machine learning by Roh et al. [31]
examines this problem in depth.
– In theoretical machine learning, researchers often also look at the learning process
from an information-theoretic point of view. Using the metric of query complexity,
researchers can reason about the amount of information needed to learn the
concept class, which can give substantial insight into how the class can be learned
using practical learning algorithms.
The learning paradigm where learning algorithms are allowed to access a query
oracle that, when queried, gives examples of the function in the form of input–output
pairs (x, f (x)) is called a query model. There are two types of query oracles in
classical learning theory.
– A membership query oracle EX, when queried with a specific input x, returns the
corresponding output f (x). This query type is called a membership query.
– A uniform random example oracle UEX returns a pair (x, f (x)) where x is drawn
uniformly at random from the set of all possible inputs {−1, 1}^n. Its query is
called a uniform random query.
From the definition, a uniform random query can be simulated by a membership query.
A membership query is considered to be much harder to obtain than a uniform random
query.
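The simulation mentioned above is simple to sketch in code. The following is a minimal illustration (the oracle and function names are our own, chosen for the example): the learner simulates a uniform random example oracle UEX by drawing x uniformly at random itself and then asking the membership oracle EX for the label f(x).

```python
import random

n = 4

def membership_oracle(f, x):
    # EX: the learner chooses x itself and receives f(x).
    return f(x)

def uniform_example_oracle(f, rng):
    # UEX simulated from EX: draw x uniformly at random from {-1,1}^n,
    # then ask the membership oracle for the label f(x).
    x = tuple(rng.choice([-1, 1]) for _ in range(n))
    return x, membership_oracle(f, x)

f = lambda x: x[0] * x[3]   # illustrative target function (our own choice)
rng = random.Random(1)
x, fx = uniform_example_oracle(f, rng)
print(x, fx)                # a uniform random example of f
```

The reverse simulation is not possible in general, which is why membership queries are considered the stronger (and harder to obtain) resource.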
After being around for more than a decade as a computing model of interest,
quantum computing saw a sudden elevation in attention in the 1990s
with the discovery of Shor's efficient quantum algorithm for integer factoring [33]
and Grover's quadratic speedup over the best classical algorithm for unordered search
[21].
Since the paradigm is based on the principles of quantum physics, there are some
fundamental differences from classical computation in the way the computer and
the computation are defined. Moreover, some phenomena that are unique to quantum
mechanics are believed to endow the model with power beyond classical models
of computation. We will briefly highlight some salient features of the model and refer
the interested reader to the classic textbook by Nielsen and Chuang [28] for
details.
While a ‘bit’ in a classical computer can have a value of 0 or 1 at any given
instant, a qubit in a quantum computer is defined as a ray in a two-dimensional Hilbert
space. For example, following the ket notation in quantum physics, if |0⟩ and |1⟩ form
a (computational) basis for the two-dimensional Hilbert space, the state vector of a
‘closed’ qubit can be written as a linear combination of these basis vectors α|0⟩ + β|1⟩,
where α, β ∈ C are called probability amplitudes and |α|² + |β|² = 1. Computation
is carried out on the qubit through unitary transformations on the Hilbert space. A unitary
transformation U is a length-preserving transformation over the Hilbert space: UU† =
U†U = I. A measurement operation (in the computational basis) then yields |0⟩ with
probability |α|² and |1⟩ with probability |β|². Such a linear combination is referred
to as a linear superposition, which gives rise to what is often dubbed in the literature
quantum parallelism. The notion of superposition can be extended to multiple qubits.
A quantum state |φ⟩ of n qubits can be written as the superposition Σ_{i∈{0,1}^n} α_i |i⟩,
where α_i ∈ C and Σ_{i∈{0,1}^n} |α_i|² = 1. The idea of computation through unitary
transformation can be lifted from a single qubit to multiple qubits. For a measurement
in the standard basis, the outcome of measuring state |φ⟩ is i with probability |α_i|².
Last but not least, in addition to superposition, quantum mechanical phenomena
like entanglement (leading to special states with non-local correlations) and inter-
ference in probability amplitudes are harnessed by quantum algorithm designers to
achieve speed-up over classical algorithms.
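The single-qubit picture above can be checked with a small classical simulation. The sketch below (our own illustration, not from the paper) represents a qubit by its amplitude pair, applies the Hadamard transform, and reads off the Born-rule measurement probabilities:

```python
import math

# A single ('closed') qubit as its amplitude pair (alpha, beta) on |0>, |1>.
state = (1 + 0j, 0 + 0j)          # the basis state |0>

def hadamard(st):
    # The Hadamard transform, a unitary on the 2-dimensional Hilbert space.
    a, b = st
    r = 1 / math.sqrt(2)
    return (r * (a + b), r * (a - b))

def measure_probs(st):
    # Born rule: |0> with probability |alpha|^2, |1> with probability |beta|^2.
    a, b = st
    return abs(a) ** 2, abs(b) ** 2

plus = hadamard(state)            # the superposition (|0> + |1>)/sqrt(2)
p0, p1 = measure_probs(plus)
print(round(p0, 6), round(p1, 6)) # -> 0.5 0.5

# Unitarity is length-preserving, so the probabilities always sum to 1.
print(round(p0 + p1, 6))          # -> 1.0
```

The same amplitude-vector bookkeeping extends to n qubits with a vector of 2^n amplitudes, which is exactly why classical simulation becomes infeasible as n grows.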
Given the advancements of both machine learning and quantum computing, their
intersection has received much attention, with many works on this topic
in the last two decades. Two main quantum techniques employed in this line of work
are amplitude amplification and the HHL algorithm for solving well-behaved
systems of linear equations.
Amplitude amplification, a generalization of Grover’s unstructured search algo-
rithm [21], helps speed up learning algorithms that can be translated to an unstructured
search task, such as clustering via minimum spanning tree and k-median [4, 5], k-nearest
neighbors [38], and quantum perceptron and deep learning [39, 40]. The
speedup achieved for those algorithms is usually quadratic, inherited
from the quadratic speedup of amplitude amplification.
The HHL algorithm, developed by Harrow, Hassidim, and Lloyd [22] for solving
well-structured systems of linear equations, brings an exponential speedup to
specific machine learning algorithms. The algorithms benefiting from the HHL algorithm
are those that work with systems of linear equations or matrix operations, such as
principal component analysis [27] and support vector machines [30].
A more detailed survey of the state of the art of practical quantum machine learning
can be found in [12]. It is worth noting that most of these works, like most practical
machine learning results, are heuristic in nature. In addition, most of them rely on
underlying assumptions, such as the sparsity of the matrices or the existence of an
efficient quantum memory, to be efficient. We refer the reader to [12] and [1] to learn
more about this issue. Recently, there has been a line of work relying on the assumption
of a classical memory model with the same functionalities as a quantum one to speed
up classical algorithms (see [19, 34, 35]), thus emphasizing the role of the underlying
assumptions in quantum speedups in machine learning applications.
3.2 Quantum computing and theoretical machine learning, quantum query model
With an increasing number of results in quantum speedups for practical machine
learning, theoretical machine learning has also been receiving increasing attention from
the quantum computing community.
This subfield started when Bshouty and Jackson introduced the quantum PAC model
in [14]. In this quantum PAC model, instead of classical queries, the quantum model
is allowed to access a quantum oracle which, when given the (n + 1)-qubit input state
Σ_{x∈{0,1}^n} √D(x) |x, b⟩ (n qubits for x and one for b), returns the following
(n + 1)-qubit state:

Σ_{x∈{0,1}^n} √D(x) |x, b ⊕ f (x)⟩
When working with Boolean functions, in many cases it is useful to look at the func-
tions in the Fourier basis. In this section, we introduce the basics of Fourier analysis
of Boolean functions, which is an important idea with numerous applications in many
fields of computer science. We refer the reader to [29] for a more rigorous introduction
to this topic.
In the context of Fourier analysis of Boolean functions, it is more convenient to
define a Boolean function with domain {−1, 1}n and image {−1, 1} instead of the usual
domain {0, 1}^n and image {0, 1}. In particular, 0 is mapped to 1, and 1 is mapped to −1
in the conversion. The core idea of the study of Fourier analysis of Boolean functions
is that a Boolean function f : {−1, 1}^n → {−1, 1} can be uniquely written as:

f (x) = Σ_{S⊆[n]} fˆ(S) x^S

where the x^S = Π_{i∈S} x_i are monomials called the Fourier terms, which together
create the Fourier basis, and the fˆ(S) are called the Fourier coefficients. Note that
sometimes we use the subset S to refer to a Fourier term when there is no confusion.
We mention here some properties of the Fourier expansion of Boolean functions
that will be used later in the paper, such as

Σ_{S⊆[n]} fˆ(S)² = 1 (1)
and

E_{x∈{−1,1}^n}[( f (x) − h(x))²] = Σ_{S⊆[n]} ( fˆ(S) − ĥ(S))² (3)
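These identities are easy to verify by brute force on a small function. The sketch below (the choice of majority on three bits is our own illustration) computes every Fourier coefficient of a Boolean function as fˆ(S) = E_x[f(x) x^S] under the uniform distribution and checks Parseval's identity (Eq. 1):

```python
import itertools

n = 3
xs = list(itertools.product([-1, 1], repeat=n))

def f(x):
    # Illustrative Boolean function: majority of three +/-1 bits.
    return 1 if sum(x) > 0 else -1

def chi(S, x):
    # The monomial x^S = prod_{i in S} x_i.
    p = 1
    for i in S:
        p *= x[i]
    return p

subsets = [S for r in range(n + 1) for S in itertools.combinations(range(n), r)]

# Fourier coefficient f_hat(S) = E_x[f(x) x^S] under the uniform distribution.
fhat = {S: sum(f(x) * chi(S, x) for x in xs) / len(xs) for S in subsets}

# Eq. 1 (Parseval): the squared Fourier coefficients of f sum to exactly 1.
parseval = sum(c * c for c in fhat.values())
print(round(parseval, 10))          # -> 1.0
print(fhat[(0,)], fhat[(0, 1, 2)])  # -> 0.5 -0.5 for majority
```

For this majority function the spectrum sits entirely on the three singletons and the full set, which already hints at the notion of concentration used below.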
In this paper, our target concept class is the class of ε-concentrated Boolean functions.
A Boolean function f : {−1, 1}^n → {−1, 1} is called ε-concentrated
on a set M, for some ε < 1 and M a set of subsets of [n], if:

Σ_{S∉M} fˆ(S)² ≤ ε
Note that this class of functions contains, and therefore is more general than,
some classes of functions studied in the previous quantum machine learning literature,
such as:
– the class of log |M|-juntas [11]: a k-junta is a Boolean function that depends on
at most k of its n variables; clearly a k-junta has at most 2^k nonzero Fourier
coefficients.
– the class of |M|-sparse functions [7]: a k-sparse Boolean function has at most
k nonzero Fourier coefficients.
We begin this section with the second step of the learning process, learning the target
function knowing its concentration set, and work backward from there. This process
follows the construction steps of the Kushilevitz–Mansour algorithm [26], together with
our bound and complexity analysis.
Proof First, let us describe a subroutine used to estimate the Fourier coefficient of an
arbitrary Fourier term of a Boolean function:

Lemma 1 Let f : {−1, 1}^n → {−1, 1}. For any S ⊆ [n], accuracy ε > 0 and error
probability δ > 0, one can approximate its Fourier coefficient fˆ(S) using an estimate
f˜(S) that satisfies | f˜(S) − fˆ(S)| ≤ ε with probability at least 1 − δ.

Proof Suppose the uniform queries are {(x1 , f (x1 )), (x2 , f (x2 )), . . . , (xm , f (xm ))}.
Calculate f˜(S) as:

f˜(S) = (1/m) Σ_{i=1}^{m} f (xi) xi^S
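This empirical-mean estimator is straightforward to sketch. In the illustration below (the parity target, dimension, and sample size are our own choices), the estimate of a coefficient that equals 1 is exact, and the estimate of a zero coefficient concentrates around 0 at the usual 1/√m rate:

```python
import random

n = 8
rng = random.Random(0)

def f(x):
    # Hypothetical target: the parity on coordinates 0 and 1, so f_hat({0,1}) = 1.
    return x[0] * x[1]

def estimate_coefficient(S, m, rng):
    # Lemma 1 estimator: f_tilde(S) = (1/m) sum_i f(x_i) x_i^S
    # over m uniform random examples x_i in {-1,1}^n.
    total = 0
    for _ in range(m):
        x = [rng.choice([-1, 1]) for _ in range(n)]
        xS = 1
        for i in S:
            xS *= x[i]
        total += f(x) * xS
    return total / m

est = estimate_coefficient((0, 1), 4000, rng)
print(est)            # -> 1.0 exactly here, since f(x) * x^S = 1 for this parity
est_zero = estimate_coefficient((0,), 4000, rng)
print(abs(est_zero))  # small: this coefficient is 0, deviation ~ 1/sqrt(m)
```

A Chernoff-style bound on the +/-1 summands is what gives the (ε, δ) guarantee of Lemma 1.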
With the help of the above lemma, for each term S ∈ L, we estimate the Fourier
coefficient of S as f˜(S) using Lemma 1 to some accuracy γ that will be determined
later. It is crucial to note that we can reuse the same query results to estimate the
coefficients of all terms in L, since the queries are uniformly random. Define

g(x) = Σ_{S∈L} f˜(S) x^S

and

h(x) = sign(g(x))
as in the Kushilevitz–Mansour algorithm [26]. We will prove that h(x) is the desired
learning result by proving that Pr[h(x) ≠ f (x)] ≤ α = O(ε).
If f (x) ≠ h(x) then | f (x) − g(x)| ≥ 1, so Pr[h(x) ≠ f (x)] ≤ E[( f (x) − g(x))²],
which from Eq. 3 we can bound term by term over the Fourier coefficients.
For S ∉ L, we have fˆ(S)² ≤ ε/M and therefore

Σ_{S∈M\L} fˆ(S)² ≤ M · (ε/M) = ε (4)

For S ∉ M, the condition on the function f guarantees that Σ_{S∉M} fˆ(S)² ≤ ε, thus

Σ_{S∉(M∪L)} fˆ(S)² ≤ ε (5)
Picking γ = √ε/√|L| yields

Σ_{S∈L} ( fˆ(S) − f˜(S))² ≤ ε (6)

Estimating the Fourier coefficients to error γ = √ε/√|L| requires O(|L|/ε) queries. Thus,
the total time taken is O(|L|²/ε). Because each Fourier coefficient is independent, one
can reuse the queries to estimate all the Fourier coefficients. Thus, the total number of
queries needed for learning f is O(|L|/ε). The probability of success of the procedure
is high since the probability of success of estimating the Fourier coefficients is high.
Table 1 Different algorithms for solving the sparse Fourier sampling problem using different types of query

Algorithm | Time | Queries | Query type
Hassanieh2012 [23] | O(M log(2^n/M) n/ε) | O(M log(2^n/M) n/ε) | Uniform random query
Indyk2014 [25] | Õ(2^n n^c) | Õ(Mn) | Uniform random query
Goldreich–Levin [20, 26] | O(nM³/ε³) | O(nM³/ε³) | Membership query
Quantum Fourier sampling | O(M/ε) | O(M/ε) | Quantum uniform random query
Learning of the concentration set of concentrated Boolean functions has been studied
in different forms outside of classical learning, in particular in the context of digital
signal processing (DSP). In classical learning, this is performed using the Goldreich–
Levin theorem in the Kushilevitz–Mansour algorithm. In DSP, the problem of learning
the set of concentrated Fourier terms is referred to as the sparse Fourier transform.
From now on, let us use the term sparse Fourier sampling to describe the generic
problem of picking out the important terms in the Fourier domain.
Definition 1 The sparse Fourier sampling problem: Given f : {−1, 1}^n → {−1, 1},
an ε-concentrated Boolean function on an unknown set of size at most M, sample
a set L of Fourier terms such that L contains all the Fourier terms with coefficients
greater than or equal to a threshold γ .

From Theorem 1, we need the threshold γ = O(√ε/√M) in order to learn the target
function. Using γ = O(√ε/√M), we list in Table 1 algorithms that solve the
sparse Fourier sampling problem using different types of query.
Originally, the Kushilevitz–Mansour algorithm uses the Goldreich–Levin algorithm
[20] to solve the sparse Fourier sampling problem. On the other hand, the algorithm
developed by Hassanieh et al. is the fastest classical algorithm for this purpose (note
that the algorithm by Hassanieh et al. actually performs the sparse Fourier transform,
which is the equivalent of learning; therefore, it automatically solves the sparse Fourier
sampling problem). We will look at and compare those algorithms in more depth in
subsection 5.5.
In this subsection, we look at quantum Fourier sampling for performing this
task. The quantum Fourier sampling (or Fourier sampling) routine, first used in [16]
for learning disjunctive normal form Boolean functions, is a well-known quantum
procedure popularly used in theoretical quantum machine learning algorithms.
Theorem 2 Given f : {−1, 1}^n → {−1, 1}, an ε-concentrated Boolean function
on an unknown set of size at most M, there exists a quantum algorithm that, using
O(M/ε) quantum uniform queries and time, recovers a list L of size O(M/ε) containing
all Fourier terms of f with Fourier coefficients greater than √(ε/M), with high success
probability.
Proof First let us describe the quantum Fourier sampling (QFS) routine: Given access
to a uniform query oracle which, when given the (n + 1)-qubit input state
(1/2^{n/2}) Σ_{x∈{0,1}^n} |x, b⟩ (n qubits for x and one for b), returns
(1/2^{n/2}) Σ_{x∈{0,1}^n} |x, b ⊕ f̄(x)⟩, as described in
section 3.2, the QFS routine returns S ⊆ [n] with probability fˆ(S)² in O(1) time using
one uniform quantum query. Here, f̄(x) denotes f (x) viewed in the computational basis, i.e.,
under the change of basis from {1, −1} to {0, 1}, with 1 mapped to 0 and −1 mapped to 1.
The steps of the QFS routine are as follows:
1. Start with (1/2^{n/2}) Σ_{x∈{0,1}^n} |x, 1⟩. Apply the Hadamard transform to the last qubit to
get (1/2^{n/2}) Σ_{x∈{0,1}^n} |x⟩|−⟩ where |−⟩ = (|0⟩ − |1⟩)/√2.
2. Apply the quantum query oracle (with |−⟩ being the auxiliary qubit) to turn
(1/2^{n/2}) Σ_{x∈{0,1}^n} |x⟩|−⟩ into (1/2^{n/2}) Σ_{x∈{0,1}^n} (−1)^{f̄(x)} |x⟩|−⟩ (note that
|b ⊕ f̄(x)⟩ applied to |−⟩ gives (−1)^{f̄(x)} |−⟩).
Apply the Hadamard gate to the last qubit to send it back to |1⟩, and we have the
resulting state (1/2^{n/2}) Σ_{x∈{0,1}^n} f (x)|x⟩ in the first n qubits.
3. Apply a Hadamard gate to each of the first n qubits and obtain:

(1/2^{n/2}) Σ_x f (x) ((1/2^{n/2}) Σ_S (−1)^{x·S} |S⟩) = (1/2^n) Σ_S (Σ_x (−1)^{x·S} f (x)) |S⟩ = Σ_S fˆ(S)|S⟩

where x·S denotes the dot product. Here, S denotes both a subset of [n] (or
equivalently a number in [0, 2^n − 1]) and the associated bit string. For example,
for n = 5 and S = {1, 2}, we can also say S = 01100 (indexing from zero on the
left) or, as a number, S = 12.
4. Measure the resulting state to obtain the desired result.
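Since the routine ends in the state Σ_S fˆ(S)|S⟩, its behavior can be checked with a small classical simulation of steps 2–4 via the Hadamard (Walsh) transform. The sketch below is our own illustration on a 3-bit majority function:

```python
import itertools

n = 3
xs = list(itertools.product([0, 1], repeat=n))

def f_pm(x01):
    # Target in the +/-1 convention: majority of three bits (our illustration);
    # the basis change maps bit 0 to +1 and bit 1 to -1, as in the text.
    x = [1 if b == 0 else -1 for b in x01]
    return 1 if sum(x) > 0 else -1

# State after step 2: amplitude f(x) / 2^{n/2} on each basis state |x>.
amp = {x: f_pm(x) / 2 ** (n / 2) for x in xs}

# Step 3: Hadamards on all n qubits map |x> to 2^{-n/2} sum_S (-1)^{x.S} |S>.
out = {}
for S in xs:
    out[S] = sum(amp[x] * (-1) ** sum(xi * si for xi, si in zip(x, S))
                 for x in xs) / 2 ** (n / 2)

# The amplitude on |S> equals f_hat(S), so measuring (step 4) returns the
# subset S with probability f_hat(S)^2.
probs = {S: a * a for S, a in out.items()}
print(round(sum(probs.values()), 10))  # -> 1.0, by Parseval
```

For majority, the four terms with fˆ(S) = ±1/2 are each sampled with probability 1/4, so repeated runs quickly surface the whole concentration set.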
Now, our procedure is as follows: query the QFS routine O(M/ε) times,
and output the collection of results.
Consider an arbitrary term S ⊆ [n] in the Fourier basis with Fourier coefficient
fˆ(S) greater than √(ε/M). The promise that f is ε-concentrated on a subset of size at
most M tells us that there are at most 2M terms with Fourier coefficients that are at
least √(ε/M).
Therefore, by using O(M/ε) uniform quantum queries and applying the QFS procedure,
the probability that a term with coefficient at least √(ε/M) does not appear in the
result is less than (say) 1/(20M), and therefore the probability that the procedure succeeds
is at least 9/10 by the union bound (here 9/10 is chosen to represent high success
probability; this can be made arbitrarily close to 1). Since each query can add at most
one new element to L, the size of L is O(M/ε).
Combining results from the previous two subsections, we arrive at our first main
theorem.
Theorem 3 Let C be a concept class such that every f : {−1, 1}^n → {−1, 1} in C has
its Fourier spectrum ε-concentrated on a collection of at most M Fourier terms. Then,
C can be PAC learned using O(M/ε²) quantum uniform queries in O(M²/ε³) time.

Proof This theorem is a straightforward corollary of Theorem 1 and Theorem 2, where
the threshold is γ = O(√ε/√M). When γ = O(√ε/√M), we have |L| = O(M/ε), and
therefore from Theorem 1 we can learn the class of concentrated Boolean functions
in O(M²/ε³) time using O(M/ε²) classical uniform random queries. Combined with
the cost of the subprocedure in Theorem 2, the complexity of our algorithm
is O(M²/ε³) time and O(M/ε²) classical uniform random queries plus O(M/ε) uniform
quantum queries, or O(M/ε²) quantum uniform random queries in total.
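The two phases of this combined algorithm can be sketched end to end on a toy instance. In the sketch below (our own illustration), exact sampling from the fˆ(S)² distribution stands in for the quantum Fourier sampling phase, and the coefficients of the sampled terms are then estimated from classical uniform examples as in Theorem 1. The target is a single parity, so the learned hypothesis is exact:

```python
import itertools, random

n = 4
rng = random.Random(7)
xs = list(itertools.product([-1, 1], repeat=n))

def f(x):
    # Illustrative 1-sparse target: the parity x0*x1, concentrated on one term.
    return x[0] * x[1]

subsets = [S for r in range(n + 1) for S in itertools.combinations(range(n), r)]

def chi(S, x):
    p = 1
    for i in S:
        p *= x[i]
    return p

# True spectrum (enumerated only to drive the stand-in sampler below).
fhat = {S: sum(f(x) * chi(S, x) for x in xs) / len(xs) for S in subsets}

# Phase 1 (stand-in for quantum Fourier sampling): draw terms S w.p. f_hat(S)^2.
terms = list(fhat)
L = set(rng.choices(terms, weights=[fhat[S] ** 2 for S in terms], k=40))

# Phase 2 (Theorem 1): estimate the coefficients of the terms in L from uniform
# random examples, reusing one batch of samples for every term.
m = 2000
samples = [tuple(rng.choice([-1, 1]) for _ in range(n)) for _ in range(m)]
g = {S: sum(f(x) * chi(S, x) for x in samples) / m for S in L}

def h(x):
    # Hypothesis: the sign of the truncated, estimated Fourier expansion.
    return 1 if sum(c * chi(S, x) for S, c in g.items()) >= 0 else -1

err = sum(h(x) != f(x) for x in xs) / len(xs)
print(err)  # -> 0.0 (the single parity is recovered exactly)
```

On a quantum machine, phase 1 would use the QFS circuit rather than this classical enumeration; the point of the sketch is only the high-level structure of the two phases.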
In this section, we explore the classical algorithms that were developed to solve the
problem of learning concentrated Boolean functions. In particular, we derive a tight
bound for the Kushilevitz–Mansour algorithm, and discuss some of the algorithms
that perform the sparse Fourier transform in the context of the discrete Fourier transform,
a problem similar to that of learning the class of concentrated Boolean functions.
We go through the procedure of the Kushilevitz–Mansour algorithm, which internally
uses the Goldreich–Levin theorem, at a high level, together with the algorithm developed
by Hassanieh et al. [23], to give the reader an idea of how these algorithms operate
and to analyze their complexity. A correctness proof will not be included.
The Goldreich–Levin theorem [20] was originally developed for finding a hard-core
predicate for any one-way function in the context of cryptography. A version modified
to work in the Boolean domain is used in the Kushilevitz–Mansour algorithm as
below.

Theorem 4 Goldreich–Levin theorem [20, 26] There exists a randomized algorithm
which, given classical membership query access to a function f : {−1, 1}^n → {−1, 1}
and a threshold γ > 0, recovers a list L of O(1/γ²) Fourier terms such that:
• If | fˆ(S)| ≥ γ then S ∈ L
• If S ∈ L then | fˆ(S)| ≥ γ /2
The algorithm succeeds with high probability, runs in O(n/γ⁶) time, and uses O(n/γ⁶)
membership queries.
Proof (Goldreich–Levin) Let J, J̄ be a partition of [n] and S ⊆ J. We write

W^{S|J̄}[ f ] = Σ_{T⊆J̄} fˆ(S ∪ T)²

for the sum of the squares of the Fourier weights of f on sets whose restriction to J is S.
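The quantity W^{S|J̄}[f] can be computed by direct enumeration for a small function (the function and the partition below are our own illustration). As a sanity check, the buckets W^{S|J̄}[f] over all S ⊆ J partition the Fourier spectrum, so by Eq. 1 they sum to 1:

```python
import itertools

n = 4
J, Jbar = (0, 1), (2, 3)      # a partition of [n] = {0, 1, 2, 3}
xs = list(itertools.product([-1, 1], repeat=n))

def f(x):
    # Illustrative Boolean function (ties broken toward -1 keep it +/-1 valued).
    return 1 if x[0] * x[2] + x[1] > 0 else -1

def chi(S, x):
    p = 1
    for i in S:
        p *= x[i]
    return p

subsets = [S for r in range(n + 1) for S in itertools.combinations(range(n), r)]
fhat = {S: sum(f(x) * chi(S, x) for x in xs) / len(xs) for S in subsets}

def W(S):
    # W^{S|Jbar}[f]: squared Fourier weight on sets whose restriction to J is S.
    return sum(fhat[tuple(sorted(S + T))] ** 2
               for r in range(len(Jbar) + 1)
               for T in itertools.combinations(Jbar, r))

buckets = {S: W(S)
           for r in range(len(J) + 1)
           for S in itertools.combinations(J, r)}
print(round(sum(buckets.values()), 10))  # -> 1.0, by Parseval
```

The Goldreich–Levin algorithm estimates these bucket weights with membership queries and recursively refines only the buckets of weight at least γ², which is what keeps the list of candidate terms small.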
Now, we can arrive at our tight bound for the Kushilevitz–Mansour algorithm. This
bound is desirable, since an exact bound for the algorithm is not explicitly presented
in the original paper.
Table 2 Different algorithms solving the learning of concentrated Boolean functions problem using different
types of query

Algorithm | Time | Queries | Query type
Hassanieh2012 [23] | O(M log(2^n/M) n/ε) | O(M log(2^n/M) n/ε) | Uniform random query
Indyk2014 [25] | Õ(2^n n^c) | Õ(Mn) | Uniform random query
Kushilevitz–Mansour [20, 26] | O(nM³/ε³) | O(nM³/ε³) | Membership query
Our algorithm | O(M²/ε³) | O(M/ε²) | Quantum uniform random query
The algorithm developed by Hassanieh et al. can be applied to solve the problem of
learning concentrated Boolean functions in a trivial way: consider the set of all
input–output pairs (x, f (x)) as the list of numbers x; the output of the DFT on x, x̂,
is then the list of Fourier coefficients of the function multiplied by 2^{n/2}.
In this section, we compare the different algorithms that we have visited for solving
the problem of learning concentrated functions. The Kushilevitz–Mansour algorithm
is the oldest among them, having the worst upper bounds on both time and query
complexity. The Hassanieh algorithm is a highly optimized algorithm that achieves
good bounds on both time and query complexity while using only uniform random
queries. Our new algorithm in Theorem 3 achieves the best query complexity, whereas
its time complexity is better when M = Õ(n²). The algorithm by Indyk et al. [25] is
also worth mentioning, since it achieves a classical query complexity of Õ(Mn), which
is proven to be theoretically optimal. However, the time complexity of Indyk's
algorithm is exponential in n.
Since the Hassanieh algorithm also has the high-level structure of finding the important
Fourier terms and then estimating their weights, an obvious question that arises is
whether we can improve our result using the estimation scheme from the Hassanieh
algorithm. It might be possible, but not straightforward: the Hassanieh algorithm is
an iterative algorithm that interleaves finding the big Fourier terms and estimating them
more precisely in each iteration. Therefore, improving our algorithm using ideas from
the Hassanieh algorithm is a possible direction in which we will consider extending
this work in the future.
Quantum query lower bounds have been studied in the past and are useful in helping
to understand how good quantum algorithms are, as well as in comparing classical and
quantum learning algorithms. In this section, we prove a lower bound on the number
of quantum queries needed to learn the class of ε-concentrated Boolean functions in
the distribution-independent setting. Our lower bound matches the upper bound from
the previous section when ε is considered to be a constant; however, since the previous
section performs learning in the uniform distribution setting, the matching bounds do
not imply optimality.
In this section, we use the Vapnik–Chervonenkis (VC) dimension as a tool to prove
the lower bound for our concept class. In terms of binary classification, a set X of
examples is shattered by a concept class C if for any labeling of the examples in
X there is a function in C that assigns that labeling to those examples. The VC
dimension of a concept class C is the size of the largest finite set of examples that is
shattered by C.
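The shattering definition can be made concrete on a toy concept class (our own illustration, unrelated to the concentrated class studied here): one-dimensional threshold functions, whose VC dimension is 1.

```python
import itertools

# Toy concept class: threshold functions h_t(x) = [x >= t] on points 0..9.
domain = list(range(10))
concepts = [lambda x, t=t: 1 if x >= t else 0 for t in range(11)]

def shatters(concepts, points):
    # X is shattered if every labeling of the points is realized by a concept.
    realized = {tuple(c(p) for p in points) for c in concepts}
    return len(realized) == 2 ** len(points)

def vc_dimension(concepts, domain):
    # Largest k such that some k-subset of the domain is shattered.
    d = 0
    for k in range(1, len(domain) + 1):
        if any(shatters(concepts, pts)
               for pts in itertools.combinations(domain, k)):
            d = k
        else:
            break
    return d

print(vc_dimension(concepts, domain))  # -> 1: thresholds on a line have VC dim 1
```

No pair of points is shattered because no threshold can label a smaller point 1 and a larger point 0, which is exactly the kind of structural limitation the VC dimension captures.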
Intuitively, the higher the VC dimension of a class of functions, the more complicated
the concept class is and therefore the more data is required to learn it. In fact, a
relationship between the VC dimension of a concept class and the lower bound on the
number of quantum queries needed to learn it has been proven in [10].
Theorem 7 (Atici, Servedio [10]) Let C be any concept class of VC dimension d, and
suppose access to a quantum query oracle is given. Then, the sample complexity of
PAC learning C with error ε ≤ 1/10 under an arbitrary distribution is Ω(d + √d/ε).
On the other hand, intuitively, as the number of important Fourier terms of a function
grows, the function becomes more complicated. This suggests a connection
between the concentration size M of the concept class of concentrated functions and
the VC dimension of that class. We can prove the following theorem.
Theorem 8 The concept class of Boolean functions that are ε-concentrated on an
unknown subset of size at most M has VC dimension at least M.
Proof To make the analysis simpler, let us assume that M is a power of 2. Consider
the class of log M-juntas (recall that these are Boolean functions that depend on only
log M variables). Trivially, such functions have at most M nonzero Fourier coefficients,
and are therefore ε-concentrated on a subset of size at most M. Hence the concept class
of Boolean functions that are ε-concentrated on a subset of size at most M contains the
class of log M-juntas.
Consequently, the VC dimension of the class of Boolean functions ε-concentrated
on a subset of size at most M is at least the VC dimension of the class of log M-juntas.
On the other hand, we know that the VC dimension of the class of log M-juntas is at
least M, as proved in [11]. Therefore, the VC dimension of the class of concentrated
Boolean functions on a subset of size at most M is at least M.
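The sparsity fact used in the proof can be checked directly on a small instance. The brute-force Fourier transform below is a sketch under the standard {−1, 1} conventions; the specific junta chosen is our own example:

```python
from itertools import product

def fourier_coefficients(f, n):
    """Brute-force f_hat(S) = E_x[f(x) * chi_S(x)] over x in {-1,1}^n."""
    xs = list(product([-1, 1], repeat=n))
    coeffs = {}
    for S in product([0, 1], repeat=n):   # S as an indicator vector
        acc = 0.0
        for x in xs:
            chi = 1
            for xi, si in zip(x, S):
                if si:
                    chi *= xi
            acc += f(x) * chi
        coeffs[S] = acc / len(xs)
    return coeffs

# A log M-junta with M = 4: depends only on the first 2 of n = 4 variables.
f = lambda x: x[0] * x[1]
coeffs = fourier_coefficients(f, 4)
support = [S for S, c in coeffs.items() if abs(c) > 1e-9]
print(support)   # only subsets of the 2 relevant variables can carry weight
```

Here the support is the single set {1, 2}, well within the M = 4 bound: a function of the first log M variables can only place Fourier mass on the 2^{log M} = M subsets of those variables.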
A direct consequence of the above two theorems is a lower bound on the number
of quantum queries needed to PAC learn the concept class of concentrated
functions.
Theorem 9 Let C be the class of Boolean functions f : {−1, 1}^n → {−1, 1} that are
ε-concentrated on an unknown set of size at most M. Then, the number of quantum
queries necessary to PAC learn C with error α ≤ 1/10 and high success probability
under an arbitrary distribution is Ω(M).
The two most used learning models in theoretical learning are PAC learning and
exact learning. Therefore, the natural question at this point is whether we can
efficiently learn the class of concentrated functions in the exact learning model.
In this section, we answer that question.
Here, we first introduce some essential information-theoretic definitions and
concepts needed to prove our lower bound, followed by a precise formulation
of the lower bound and its proof.
In this subsection, we describe the basic definitions of information theory and quantum
information theory.
A random variable A with probabilities Pr[A = a] = p_a has entropy H(A) =
−Σ_a p_a log(p_a). Intuitively, the entropy of a variable represents the least number of
bits needed to store the information contained in that variable. For a pair of possibly
correlated random variables A and B, the joint entropy H(A, B) is the information
obtained from evaluating A and B simultaneously, and can be computed as
H(A, B) = −Σ_{a,b} Pr(A = a, B = b) log(Pr(A = a, B = b)). The conditional entropy of
A given B is H(A|B) = H(A, B) − H(B), or equivalently E_{b∼B}[H(A|B = b)]. The mutual
information between A and B is I(A : B) = H(A) + H(B) − H(A, B).
For quantum information theory, given a density matrix ρ of n qubits, its eigenvalues
ρ_1, . . . , ρ_{2^n} form a probability distribution P, and the von Neumann entropy of
ρ is S(ρ) = H(P).
7.2 Lower bounds for quantum exact learning of concentrated Boolean functions
Recall that in exact learning, the error α = 0 and the goal is to learn a
function exactly. We will now prove that exact learning of the class of concentrated Boolean
functions is considerably harder than PAC learning, and therefore PAC learning offers
more opportunities for achieving an advantage through quantum computing. We now
show the following.
Theorem 10 Let C be the class of Boolean functions f : {−1, 1}^n → {−1, 1} that are ε-
concentrated on an unknown set of at most M Fourier terms with 0 < ε < 1/2.
Then, the number of uniform quantum examples necessary to exactly learn C with high
success probability is Ω(ε2^n/n + M log M).
Proof Our proof follows the techniques used in the proof of Theorem 9 in the work
of Arunachalam et al. [7]. Let V be the set of distinct subspaces of {−1, 1}^n with
dimension n − log M, and let C′ be the class of functions
C′ = { f_V : V ∈ V }, where f_V(x) = −1 if x ∈ V and f_V(x) = 1 otherwise.
Note that |C′| = |V| and each f_V evaluates to 1 on a 1 − 1/M fraction of its domain.
This class consists of M-sparse Boolean functions, i.e., functions which have at
most M nonzero Fourier coefficients, as proved in [24].
Now, let us take C′ and, for each f_V in this class, among the input values x ∈ V (i.e.,
where f_V(x) = −1), choose a random ε fraction of them and flip f_V(x) (so that f_V(x) = 1).
This creates a new function f″_V. Let us repeat this action for all possible choices of the
randomly chosen ε fraction of x ∈ V. All the newly created functions form a new
class of functions called C″.
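The construction above can be made concrete on a tiny instance. The sketch below works over {0, 1}^n rather than {−1, 1}^n for convenience, and the particular subspace and parameters are our own illustrative choices:

```python
import random
from itertools import product

n, M = 4, 4                 # subspaces of dimension n - log2(M) = 2
eps = 0.25                  # fraction of V's points to flip (0 < eps < 1/2)

# V = span{e1, e2} in F_2^4 (an arbitrary illustrative choice of subspace).
basis = [(1, 0, 0, 0), (0, 1, 0, 0)]
V = {tuple(sum(b[i] * c for b, c in zip(basis, cs)) % 2 for i in range(n))
     for cs in product([0, 1], repeat=len(basis))}

# f_V is -1 exactly on V (a 1/M fraction of the 2^n inputs), +1 elsewhere.
f_V = {x: (-1 if x in V else 1) for x in product([0, 1], repeat=n)}

# f''_V: flip an eps fraction of the points inside V from -1 to +1.
flipped = random.sample(sorted(V), int(eps * len(V)))
f_VV = dict(f_V)
for x in flipped:
    f_VV[x] = 1

print(len(V), sum(1 for x in V if f_VV[x] == 1))   # 4 points in V, 1 flipped
```

Each choice of flipped subset yields a different f″_V, which is exactly where the blow-up in the size of C″ counted below comes from.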
Claim The concept class C″ contains functions that are ε-concentrated on a set of at
most M Fourier terms.
Denote by F the set of Fourier terms of f_V with nonzero coefficients; we know that
|F| ≤ M. Each f″_V differs from f_V on at most an ε/M fraction of the inputs, so by
Parseval's identity
Σ_S (f̂_V(S) − f̂″_V(S))² = E_x[(f_V(x) − f″_V(x))²] ≤ 4ε/M ≤ ε for M ≥ 4.
From the above equation, we have
ε ≥ Σ_{S∈F} (f̂_V(S) − f̂″_V(S))² + Σ_{S∉F} (f̂_V(S) − f̂″_V(S))²
≥ Σ_{S∉F} (f̂_V(S) − f̂″_V(S))² = Σ_{S∉F} f̂″_V(S)²,
since f̂_V(S) = 0 for every S ∉ F. Hence each f″_V is ε-concentrated on the set F of
size at most M.
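The claim can be verified numerically on a small example. Below, a 4-sparse function f (standing in for f_V) has one point flipped to give g (standing in for f″_V); the function and parameters are our own choices, with the Fourier transform computed by brute force:

```python
from itertools import product

def fourier(fvals, n):
    """Brute-force Fourier coefficients f_hat(S) = E_x[f(x) * chi_S(x)]."""
    xs = list(product([-1, 1], repeat=n))
    coeffs = {}
    for S in product([0, 1], repeat=n):
        acc = 0.0
        for x in xs:
            chi = 1
            for xi, si in zip(x, S):
                if si:
                    chi *= xi
            acc += fvals[x] * chi
        coeffs[S] = acc / len(xs)
    return coeffs

n = 4
# f is -1 exactly where x1 = x2 = 1 (a 1/4 fraction of inputs), +1 elsewhere.
f = {x: (-1 if x[0] == 1 and x[1] == 1 else 1) for x in product([-1, 1], repeat=n)}
g = dict(f)
g[(1, 1, 1, 1)] = 1                      # flip one point where f was -1

fh, gh = fourier(f, n), fourier(g, n)
F = {S for S, c in fh.items() if abs(c) > 1e-9}          # Fourier support of f
tail = sum(c * c for S, c in gh.items() if S not in F)   # weight of g outside F
dist = sum((fh[S] - gh[S]) ** 2 for S in fh)             # Parseval distance
print(round(dist, 4), round(tail, 4), tail <= dist)
```

The Fourier weight of g outside f's support (0.1875) is bounded by the Parseval distance E[(f − g)²] = 0.25, matching the chain of inequalities in the claim.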
We will prove our lower bound for this class of functions C″ following a three-step
information-theoretic approach. The proof of this part of the theorem is similar to that
of Theorem 9 in [7] with some modifications to the arguments; here we rewrite the full
proof for the sake of completeness.
Let A be a random variable that is distributed uniformly over C″. Suppose A = f
and let B = B_1, B_2, . . . , B_T be T copies of the uniform quantum example
|ψ_f⟩ = 2^{−n/2} Σ_{x∈{0,1}^n} |x, f(x)⟩. The random variable B is a function of the
random variable A.
Now let us look at the mutual information between A and B, I(A : B). We have
a series of bounds:
1. I(A : B) ≥ Ω(log |C″|)
This is true since B allows one to recover A with high probability and A is a random
variable uniformly distributed over C″.
2. I(A : B) ≤ T · I(A : B_1)
This is true as
I(A : B) = H(B) − H(B|A) = H(B) − Σ_{i=1}^T H(B_i|A)
≤ Σ_{i=1}^T H(B_i) − Σ_{i=1}^T H(B_i|A) = Σ_{i=1}^T I(A : B_i) = T · I(A : B_1)
from the rule of subadditivity of quantum entropy and since all the B_i are identical.
3. I(A : B_1) ≤ O(n/M)
Since AB_1 is a classical-quantum state, we have
I(A : B_1) = S(A) + S(B_1) − S(A, B_1) = S(B_1),
where the first equality comes from the definition, and the second equality uses the
fact that B_1 is a function of (fully dependent on) A, so Pr(A = a, B_1 = b) is 0
for all but one value of b, and hence S(A, B_1) = S(A).
Therefore, it is sufficient to bound S(B_1). The state B_1 is
ρ = (1/|C″|) Σ_{f∈C″} |ψ_f⟩⟨ψ_f|.
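To see why S(B_1) can be so small, note that the states |ψ_f⟩ for f ∈ C″ are nearly identical, so ρ is close to pure. The sketch below computes S(ρ) exactly for a small hypothetical family of near-identical functions, using the fact that the nonzero eigenvalues of an equal mixture of unit vectors are the eigenvalues of their Gram matrix divided by the number of states; the family and sizes are our own illustrative choices:

```python
from math import log2

n = 3
N = 2 ** n          # domain size
m = 4               # number of functions f_j in the hypothetical family

# Family: f_j(x) = -1 if x == j else +1, for j = 0..3. Any two distinct
# states |psi_{f_i}>, |psi_{f_j}> agree on N - 2 of the N inputs, so their
# inner product is c = (N - 2) / N.
c = (N - 2) / N

# rho = (1/m) * sum_j |psi_j><psi_j|. Its nonzero eigenvalues are those of
# the Gram matrix G/m, where G = (1 - c) I + c J:
# one eigenvalue (1 + (m - 1) c) / m and (m - 1) copies of (1 - c) / m.
evals = [(1 + (m - 1) * c) / m] + [(1 - c) / m] * (m - 1)
S = -sum(p * log2(p) for p in evals if p > 0)   # von Neumann entropy S(rho)

print(round(S, 3))   # far below the trivial bound of n + 1 = 4 qubits
```

Four such states with pairwise overlap 3/4 carry under one bit of entropy; as the agreeing fraction approaches 1 (overlap 1 − O(1/M) in the proof), S(ρ) shrinks accordingly.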
Claim The number of distinct functions in the concept class C″ is 2^{Ω(ε2^n/M + n log M)}.
Proof We can prove that the number of distinct subspaces, |V|, is 2^{Θ(n log M)};
this is already proven in Theorem 9 of [7]. We can also prove that the functions f″_V
obtained by modifying f_V are distinct for different f_V. Indeed, for two distinct subspaces
V and V′ of the same dimension, the number of elements they share cannot be greater
than (1/2)|V|. This is true since we can specify a d-dimensional subspace using d linearly
independent vectors, and in the case that at least one vector among those d vectors is
modified, the number of modified elements in the subspace is at least 2^{d−1}. Therefore,
for a subspace V, even after we 'remove' up to an ε < 1/2 fraction of its elements,
we can still uniquely identify the subspace.
However, the main component that makes up the size of C″ is the number of distinct
f″_V one can get from a single f_V. How many ways are there to remove ε|V| elements
from |V| elements? There are C(|V|, ε|V|) ways, which can trivially be seen to be greater
than (1/ε)^{ε|V|}, which is greater than 2^{ε|V|} for 0 < ε < 1/2. Notice that
|V| = 2^n/M, and therefore 2^{ε|V|} is 2^{ε2^n/M}.
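The chain of inequalities in this count can be checked directly; the numbers below are an arbitrary small instance, with a playing the role of |V|:

```python
from math import comb

a, eps = 64, 0.25           # a plays the role of |V| = 2^n / M; eps < 1/2
k = int(eps * a)            # number of removed elements
lhs = comb(a, k)            # C(|V|, eps*|V|) ways to choose the flipped set
mid = (1 / eps) ** k        # (1/eps)^(eps*|V|)
low = 2 ** k                # 2^(eps*|V|)
print(lhs >= mid >= low)    # the claimed chain of inequalities holds
```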
Thus, the number of distinct functions in C″ is 2^{Ω(ε2^n/M + n log M)}.
Therefore, we have that log |C″| = Ω(ε2^n/M + n log M), and hence
T = Ω(ε2^n/n + M log M).
The rapid increase in popularity of both machine learning and quantum computing in
recent years has made quantum machine learning an active topic of research. Quantum
information and concomitant techniques have been applied in the experimental
context of machine learning, with promising improvements. In this work, we study the
problem of quantum learning of Boolean functions and show novel bounds through
the concept of concentration. Our main contribution in this paper is to show that for
any Boolean function, through its concentration measure quantified as the number of
dominant terms in its Fourier spectrum, we are able to characterize the potential for
improvements.
We note that the idea of using the quantum Fourier/Hadamard transform instead of
the Goldreich–Levin theorem to perform the sparse Fourier transform is not new. The
idea was applied in several theoretical quantum machine learning works, such as PAC
learning of DNF [16], juntas [11], and exact learning of k-sparse functions [9]. The
same idea was also applied in the field of cryptography to improve the Goldreich–
Levin theorem, as in the works of Adcock and Cleve [2, 3]. However, our work
is the first to apply this idea to the class of concentrated functions, which is a
superclass of juntas and k-sparse functions. All of our results and those that precede
our work rely on the framework of query complexity to characterize efficiency. On
a side note, in recent years, 'dequantization' has been an active
area of research (see [17, 19, 34, 35]) that focuses on exploiting the advantage of a
quantum-like memory or database oracle. It is noteworthy that these results
use query complexity to characterize improvements, and therefore their measure of
goodness is consistent with our approach.
Funding This material is based upon work supported by Defense Advanced Research Projects Agency
under the Grant No. FA8750-16-2-0004.
Data Availability Data sharing not applicable to this article as no datasets were generated or analyzed during
the current study.
References
1. Aaronson, S.: Read the fine print. Nat. Phys. 11, 291–293 (2015). https://doi.org/10.1038/nphys3272
2. Adcock, M., Cleve, R.: A quantum Goldreich-Levin theorem with cryptographic applications. In: Alt,
H., Ferreira, A. (eds.) STACS 2002, pp. 323–334. Springer, Berlin Heidelberg (2002)
3. Adcock, M., Cleve, R., Iwama, K., Putra, R., Yamashita, S.: Quantum lower bounds for the Goldreich-
Levin problem. Inf. Process. Lett. 97(5), 208–211 (2006)
4. Aïmeur, E., Brassard, G., Gambs, S.: Machine learning in a quantum world. In: Lamontagne, L.,
Marchand, M. (eds.) Advances in artificial intelligence, pp. 431–442. Springer, Berlin (2006)
5. Aïmeur, E., Brassard, G., Gambs, S.: Quantum speed-up for unsupervised learning. Mach. Learn.
90(2), 261–287 (2013). https://doi.org/10.1007/s10994-012-5316-5
6. Angluin, D.: Queries and concept learning. Mach. Learn. 2(4), 319–342 (1987). https://doi.org/10.
1007/BF00116828
7. Arunachalam, S., Chakraborty, S., Lee, T., Wolf, R.: Two new results about quantum exact learning.
In: ICALP (2019)
8. Arunachalam, S., de Wolf, R.: Guest column: a survey of quantum learning theory. SIGACT News
48(2), 41–67 (2017). https://doi.org/10.1145/3106700.3106710
9. Arunachalam, S., de Wolf, R.: Optimal quantum sample complexity of learning algorithms. J. Mach.
Learn. Res. 19, 71:1–71:36 (2018). http://jmlr.org/papers/v19/18-195.html
10. Atici, A., Servedio, R.: Improved bounds on quantum learning algorithms. Quant. Inf. Process. 4,
355–386 (2004). https://doi.org/10.1007/s11128-005-0001-2
11. Atıcı, A., Servedio, R.: Quantum algorithms for learning and testing juntas. Quant. Inf. Process. 6,
323–348 (2007). https://doi.org/10.1007/s11128-007-0061-6
12. Biamonte, J., Wittek, P., Pancotti, N., Rebentrost, P., Wiebe, N., Lloyd, S.: Quantum machine learning.
Nature 549(7671), 195–202 (2017). https://doi.org/10.1038/nature23474
13. Blumer, A., Ehrenfeucht, A., Haussler, D., Warmuth, M.K.: Learnability and the Vapnik–Chervonenkis
dimension. J. ACM 36(4), 929–965 (1989). https://doi.org/10.1145/76359.76371
14. Bshouty, N., Jackson, J.: Learning DNF over the uniform distribution using a quantum example oracle.
SIAM J. Comput. 28, 1136–1153 (1999). https://doi.org/10.1137/S0097539795293123
15. Bshouty, N.H., Cleve, R., Gavaldà, R., Kannan, S., Tamon, C.: Oracles and queries that are sufficient
for exact learning. J. Comput. Syst. Sci. 52(3), 421–433 (1996). https://doi.org/10.1006/jcss.1996.
0032 . https://www.sciencedirect.com/science/article/pii/S002200009690032X
16. Bshouty, N.H., Jackson, J.C.: Learning DNF over the uniform distribution using a quantum example
oracle. SIAM J. Comput. 28(3), 1136–1153 (1998). https://doi.org/10.1137/S0097539795293123
17. Chia, N.H., Lin, H.H., Wang, C.: Quantum-inspired sublinear classical algorithms for solving low-rank
linear systems (2018)
18. Dunjko, V., Taylor, J.M., Briegel, H.J.: Quantum-enhanced machine learning. Phys. Rev. Lett. 117(13),
130501 (2016). https://doi.org/10.1103/physrevlett.117.130501
19. Gilyén, A., Lloyd, S., Tang, E.: Quantum-inspired low-rank stochastic regression with logarithmic
dependence on the dimension. ArXiv arXiv:1811.04909 (2018)
20. Goldreich, O., Levin, L.: A hard-core predicate for all one-way functions. In: Proceedings of the
Twenty-First Annual ACM Symposium on Theory of Computing, STOC '89, pp. 25–32 (1989). https://
doi.org/10.1145/73007.73010
21. Grover, L.K.: A fast quantum mechanical algorithm for database search. In: Proceedings of the Twenty-
Eighth Annual ACM Symposium on Theory of Computing, STOC ’96, p. 212-219. Association for
Computing Machinery, New York, NY, USA (1996). https://doi.org/10.1145/237814.237866
22. Harrow, A.W., Hassidim, A., Lloyd, S.: Quantum algorithm for linear systems of equations. Phys. Rev.
Lett. 103(15), 150502 (2009)
23. Hassanieh, H., Indyk, P., Katabi, D., Price, E.: Nearly optimal sparse Fourier transform (2012)
24. Haviv, I., Regev, O.: The list-decoding size of Fourier-sparse Boolean functions (2015)
25. Indyk, P., Kapralov, M.: Sparse Fourier transform in any constant dimension with nearly-optimal
sample complexity in sublinear time (2014)
26. Kushilevitz, E., Mansour, Y.: Learning decision trees using the Fourier spectrum. SIAM J. Comput.
22(6), 1331–1348 (1993). https://doi.org/10.1137/0222080
27. Lloyd, S., Mohseni, M., Rebentrost, P.: Quantum principal component analysis. Nat. Phys. 10(9),
631–633 (2014). https://doi.org/10.1038/nphys3029
28. Nielsen, M.A., Chuang, I.L.: Quantum Computation and Quantum Information: 10th Anniversary
Edition, 10th edn. Cambridge University Press, USA (2011)
29. O’Donnell, R.: Analysis of Boolean Functions. Cambridge University Press, USA (2014)
30. Rebentrost, P., Mohseni, M., Lloyd, S.: Quantum support vector machine for big data classification.
Phys. Rev. Lett. 113(13), 130503 (2014). https://doi.org/10.1103/physrevlett.113.130503
31. Roh, Y., Heo, G., Whang, S.E.: A survey on data collection for machine learning: a big data – ai
integration perspective (2019)
32. Shalev-Shwartz, S., Ben-David, S.: Understanding machine learning: From theory to algorithms. Cam-
bridge University Press (2014)
33. Shor, P.W.: Polynomial-time algorithms for prime factorization and discrete logarithms on a quantum
computer. SIAM J. Comput. 26(5), 1484–1509 (1997). https://doi.org/10.1137/S0097539795293172
34. Tang, E.: Quantum-inspired classical algorithms for principal component analysis and supervised
clustering (2018)
35. Tang, E.: A quantum-inspired classical algorithm for recommendation systems. In: Proceedings of the
51st Annual ACM SIGACT Symposium on Theory of Computing, STOC 2019, p. 217-228. Association
for Computing Machinery, New York, NY, USA (2019). https://doi.org/10.1145/3313276.3316310
36. Valiant, L.G.: A theory of the learnable. In: Proceedings of the Sixteenth Annual ACM Symposium
on Theory of Computing, STOC ’84, p. 436-445. Association for Computing Machinery, New York,
NY, USA (1984). https://doi.org/10.1145/800057.808710
37. Whalen, J., Wang, Y.: Hottest job in China's hinterlands: teaching AI to tell a truck from a turtle.
The Washington Post (2019). https://www.washingtonpost.com/business/2019/09/26/hottest-job-chinas-
hinterlands-teaching-ai-tell-truck-turtle/
38. Wiebe, N., Kapoor, A., Svore, K.: Quantum algorithms for nearest-neighbor methods for supervised
and unsupervised learning (2014)
39. Wiebe, N., Kapoor, A., Svore, K.M.: Quantum deep learning (2015)
40. Wiebe, N., Kapoor, A., Svore, K.M.: Quantum perceptron models (2016)
41. Zhang, C.: An improved lower bound on query complexity for quantum pac learning. Inf. Process.
Lett. 111(1), 40–45 (2010) https://doi.org/10.1016/j.ipl.2010.10.007. https://www.sciencedirect.com/
science/article/pii/S0020019010003133
Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps
and institutional affiliations.