
Quantum Information Processing (2022) 21:256

https://doi.org/10.1007/s11128-022-03607-5

Quantum learning of concentrated Boolean functions

Krishna Palem1 · Duc Hung Pham1 · M. V. Panduranga Rao2

Received: 12 August 2021 / Accepted: 6 July 2022 / Published online: 26 July 2022
© The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2022

Abstract
In this paper, we present a series of new results about learning of concentrated Boolean
functions in the quantum computing model. Given a Boolean function f on n variables,
its concentration refers to the dominant terms in its Fourier–Walsh spectrum. We
show that a quantum probabilistically approximately correct learning model to learn
a Boolean function characterized by its concentration yields improvements over the
best-known classical method. All of our results are presented within the framework of
query complexity, and therefore, our advantage represents asymptotic improvements
in the number of queries using a quantum approach over its classical counterpart.
Next, we prove a lower bound in the number of quantum queries needed to learn
the function in the distribution-independent settings. Further, we examine the case
of exact learning which is the learning variant without error. Here, we show that the
query complexity grows as 2^(βn) for some 0 < β < 1 and therefore remains intractable
even when quantum approaches are considered. This proof is based on the quantum
information theoretic approach developed by researchers for the restricted case of
k-sparse functions.

Keywords Quantum computing · Machine learning · PAC learning · Concentrated functions

Duc Hung Pham (corresponding author)
hungdpham92@gmail.com
Krishna Palem
kvp1@rice.edu
M. V. Panduranga Rao
mvp@iith.ac.in

1 Department of Computer Science, Rice University, Houston, USA


2 Department of Computer Science and Engineering, IIT Hyderabad, Hyderabad, India


1 Introduction

Quantum computing has become a major area of research with the hope of finding
solutions to problems that are otherwise intractable using classical computers. Shor’s
celebrated integer factoring algorithm [33] offered great hope in this regard, as did
many results that followed. Another notable well-known result in the same vein is
Grover’s algorithm [21] for unordered search.
On the other hand, machine learning has clearly become a dominant area for
research and application in computer science. Consequently, researchers have been
studying quantum computing techniques and algorithms in this context. Two impor-
tant paradigms of computational learning theory are probably approximately correct
(PAC) learning [36] and exact learning (see Angluin [6]). These learning paradigms
have attracted a lot of research attention over the past few decades with a rich history
in the context of classical computing models.
Both PAC and exact learning problems are based on learning a target n-bit Boolean
function f drawn from among a given set of Boolean functions C = { f 1 , f 2 , . . . fr }
referred to as a concept class. The process of learning follows the supervised style
where examples of the form {x, f (x)} are used to discern the structure of the candi-
date function f . In exact learning, there is no room for error in characterizing f . In
contrast, in PAC learning the algorithm can output a hypothesis function h such that
the error margin between h and f should, with a high probability (Probably), be small
(Approximately Correct). Unless stated otherwise, we will use the phrase machine
learning to refer to PAC or exact learning and any deviations will be explicitly identi-
fied.
Quantum versions of PAC and exact learning have been proposed, where instead of classical examples (x, f(x)) of the function, the learning algorithm has access to an oracle that provides quantum examples in the form of the quantum superposed state Σ_x √(D(x)) |x, f(x)⟩. It is noteworthy that improvements in learning algorithms, such as in [11] and [7], are mostly reported through query complexity—the number of queries (examples) required to learn a function from a given class.
In this paper, we present three new results that characterize improvements that can be achieved in learning of classes of functions associated with the concept of concentration [29]. Concentration has its roots in the spectra of Boolean functions and, informally, captures the number of dominant terms in its Fourier–Walsh spectrum. In this work, we focus our results on the class of concentrated Boolean functions, and use 0 ≤ ε < 1 to denote the amount of concentration and M to denote the number of dominant terms in the spectrum of the functions. More specifically, the M dominant terms have a combined ℓ₂ mass of at least 1 − ε. Note that this class of functions is a larger class than the class of junta or k-sparse functions, which were studied in previous works in this field (see [9, 11]). Our main contributions are as follows:

1. In our first result, we exhibit a new quantum learning algorithm using uniform quantum queries with improvement over the best-known classical algorithm by Hassanieh et al. [23] in the query complexity metric. We propose a quantum algorithm that can PAC learn the same class of functions in O(M/ε²) uniform quantum queries, compared to the query complexity O(M log(2^n/M) n/ε) of Hassanieh's algorithm. Our algorithm is based on the classical Kushilevitz–Mansour algorithm [26], of which we give a detailed complexity analysis and for which we prove a tight bound of O(nM³/ε³) on query complexity, since the exact number was not included in the original paper.
2. Next, we prove a lower bound on the number of quantum queries needed to learn the class in the distribution-independent case. Here, we show that the number of queries grows as Ω(M + √M/ε), which matches the upper bound of our quantum algorithm when ε is considered to be a constant. Note that this bound holds for the distribution-independent case; when the distribution is uniform (i.e., using uniform quantum queries, as in the first result), one can possibly arrive at a smaller lower bound.
3. Finally, we prove that in the context of exact learning, the number of uniform quantum queries necessary to learn the class of concentrated Boolean functions grows as Ω(ε 2^n log M / n), which is exponential in n. Therefore, an approximate learning algorithm such as PAC learning is significantly better for learning this class of functions, even in the quantum domain.

In keeping with the way PAC and exact learning models are defined, all of these
claims apply when the learning schemes in question are accompanied by a proba-
bilistic guarantee. We use the same guarantee when comparing classical and quantum approaches in contribution 1 and the lower bound in contribution 2, and when comparing the PAC learning upper bound in contribution 1 with the exact learning lower bound in contribution 3.

1.1 Related work

In the context of learning, we note that Grover’s algorithm [21] and the Harrow–
Hassidim–Lloyd linear systems solver [22] have been explored with the goal of
improving machine learning systems. For several specific machine learning meth-
ods, improvements from quadratic speedups (such as for reinforcement learning [18])
to exponential speedups (such as for support vector machine [30]) have been reported.
A more extensive survey of proposed quantum machine learning algorithms can be found in [12].
Multiple results exist for both upper and lower bounds, in both time and query complexity, in the settings of exact learning [15] and PAC learning [26]. In the classical context, past results indicate a strong dependence between the sample complexity for PAC learning and the Vapnik–Chervonenkis (VC) dimension of the concept class [13].
The same correlation can be found in the context of quantum learning, as shown in
[10].
Finally, improvements in the number of queries needed over classical models have
been reported for juntas [11], or in the exact learning of k-sparse functions [7]. A
survey on results in quantum machine learning can be found in [8].


2 Computational learning theory: PAC learning, exact learning, query complexity

Machine learning has been one of the most active research areas in computer science
for decades. Especially with the rapid development of neural networks and deep learning since the early 2000s, machine learning has realized major advancements in
numerous computing activities, such as object and face recognition, natural language
processing, and surrogate models for complicated systems.
However, general machine learning (and deep learning in particular) is usually
considered an empirical science. The reason behind that lies in the fact that while very
powerful in practical applications, most machine learning algorithms are designed
based on empirical practice, and are tested based on performance accuracy on data.
There are only a few learning models that establish a theoretical bound on
the accuracy of the learning process. PAC learning and exact learning are two such
models.

2.1 PAC learning and exact learning

Introduced by Valiant [36] in the 1980s, building on earlier works in statistics by Vapnik and others, the PAC learning model provides a rigorous mathematical definition
for a learning model and what it means to efficiently learn an arbitrary target function
f drawn from a given class of Boolean functions C = { f 1 , f 2 . . . f |C | } (referred to as
a concept class). The goal is to learn the target function f , i.e., to be able to compute
the Boolean function f on an arbitrary input x ∈ {−1, 1}n . A learning algorithm A
for f is a PAC-α learning algorithm if with high probability it outputs a hypothesis
function h : {−1, 1}^n → {−1, 1} such that h disagrees with f on at most an α fraction of the inputs. Since the model allows some difference (error) between the hypothesis and the actual target, the model is called probably approximately correct (PAC).
More formally, a learning algorithm A is called an (α, δ)-learner for 0 < α, δ < 1/2
if: given a function f in the concept class C, a distribution D on the universe X of
all inputs, and access to E X ( f , D)—a query oracle that outputs examples of f in the
form of (x, f (x)) where x is drawn from D—then with probability at least 1 − δ, the
algorithm A outputs a hypothesis function h such that Pr[h(x) ≠ f(x)] < α.
On the other hand, exact learning, introduced by Angluin [6], is the learning model
where the learning algorithm must learn the function exactly with high probability.
The introduction of the PAC learning model in the 1980s marked the entrance of machine learning into theoretical computer science. A remarkable body of work, on both efficient learning results and hardness results, has been done in this field since then. We refer
readers to [32] for a more comprehensive introduction to this area.
Note that usually in the PAC learning literature, the error is denoted ε. In this paper, we
use α to avoid notational confusion with concentration. In addition, in this paper we
will not argue about the success probability, since error probability of each procedure
can be made exponentially small by repetition.


2.2 Query model, query complexity

In order to learn a target function, learning algorithms in general need to access infor-
mation about that function. The usual form of information that machine learning
algorithms use is examples of that function in the form of input–output pairs. We call such algorithms supervised learning algorithms, and the input–output pairs queries. Whereas in practical machine learning (we use this term to distinguish it from theoretical machine learning) the main performance metrics are running time and accuracy, in theoretical machine learning researchers often also care about another criterion: query complexity, i.e., the number of queries needed to learn an arbitrary function from the concept class. There are two intuitive reasons behind this metric:
– There are numerous machine learning tasks where the difficulty lies in finding the
examples of the function. Examples of this are tasks where publicly available data
set is nonexistent, more domain-specific tasks, and tasks that require substantial manual human effort to label. In fact, data labeling is becoming an increasingly popular job in many countries [37] because of the hardness of obtaining data by other means. A survey on data collection and machine learning by Roh et al. [31]
stresses this problem more in depth.
– In theoretical machine learning, researchers often also look at the learning process
from an information theoretical point of view. Using the metric of query complex-
ity, researchers can reason about the amount of information needed to learn the
concept class, which can give substantial insight on how the class can be learned
using practical learning algorithms.
The learning paradigm where learning algorithms are allowed to access a query
oracle that, when queried, gives examples of the function in the form of pairs of input
and output (x, f (x)) is called a query model. There are two types of queries oracle in
classical learning theory.
– A membership query oracle EX, when queried with a specific input x, returns the corresponding output f(x). This query type is called a membership query.
– A uniform random example oracle UEX returns a pair (x, f(x)) where x is drawn uniformly at random from the set of all possible inputs {−1, 1}^n. Its query is called a uniform random query.
From the definition, a uniform random query can be simulated by a membership query.
A membership query is considered to be much harder to obtain than a uniform random
query.
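To make the two oracle types concrete, the following minimal Python sketch (the helper names are illustrative, not from the paper) builds a membership query oracle and a uniform random example oracle for a Boolean function over {−1, 1}^n, and shows how a uniform random query can be simulated with a membership query.

    import random

    def make_membership_oracle(f):
        # Membership query: the learner chooses x and receives f(x).
        def EX(x):
            return f(x)
        return EX

    def make_uniform_example_oracle(f, n):
        # Uniform random query: the oracle draws x uniformly from {-1, 1}^n
        # and returns the labeled example (x, f(x)).
        def UEX():
            x = tuple(random.choice((-1, 1)) for _ in range(n))
            return x, f(x)
        return UEX

    n = 4
    f = lambda x: x[0] * x[2]          # a toy 2-junta used as the target
    EX = make_membership_oracle(f)
    UEX = make_uniform_example_oracle(f, n)
    print(EX((1, -1, -1, 1)))          # membership query on a chosen input
    print(UEX())                       # uniform random example
    # Simulating a uniform random query with a membership query:
    x = tuple(random.choice((-1, 1)) for _ in range(n))
    print(x, EX(x))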

3 An overview of quantum computing and quantum computing in machine learning

After being around for more than a decade as a computing model of interest, there
was a sudden elevation in the attention that quantum computing received in the 1990s
with the discovery of Shor’s efficient quantum algorithm for integer factoring [33]
and Grover’s quadratic speedup over the classical best algorithm for unordered search
[21].


Since the paradigm is based on the principles of quantum physics, there are some
fundamental differences with classical computation in the way the computer and
the computation is defined. Moreover, some phenomena that are unique to quantum
mechanics are believed to endow the model with a power beyond classical models
of computation. We will briefly highlight some salient features of the model and rec-
ommend the interested reader to the classic textbook by Nielsen and Chuang [28] for
details.
While a ‘bit’ in a classical computer can have a value of 0 or 1 at any given instant, a qubit in a quantum computer is defined as a ray in a two-dimensional Hilbert space. For example, following the ket notation of quantum physics, if |0⟩ and |1⟩ form a (computational) basis for the two-dimensional Hilbert space, the state vector of a ‘closed’ qubit can be written as a linear combination of these basis vectors α|0⟩ + β|1⟩, where α, β ∈ ℂ are called probability amplitudes and |α|² + |β|² = 1. Computation is carried out on the qubit through unitary transformations on the Hilbert space. A unitary transformation U is a length-preserving transformation over the Hilbert space: UU† = U†U = I. A measurement operation (in the computational basis) then yields |0⟩ with probability |α|² and |1⟩ with probability |β|². Such a linear combination is referred to as a linear superposition, which results in what is often dubbed in the literature as quantum parallelism. The notion of superposition can be extended to multiple qubits. A quantum state |φ⟩ of n qubits can be written as the superposition Σ_{i∈{0,1}^n} α_i |i⟩, where α_i ∈ ℂ and Σ_{i∈{0,1}^n} |α_i|² = 1. The idea of computation through unitary transformations can be lifted from a single qubit to multiple qubits. For a measurement in the standard basis, the outcome of measuring state |φ⟩ is i with probability |α_i|².
Last but not the least, in addition to superposition, quantum mechanical phenomena
like entanglement (leading to special states with non-local correlations) and inter-
ference in probability amplitudes are harnessed by quantum algorithm designers to
achieve speed-up over classical algorithms.
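As a small numerical illustration of these definitions, the sketch below (our own example using numpy, not part of the paper) represents an n-qubit state by its 2^n amplitudes and simulates a computational-basis measurement via the Born rule.

    import numpy as np

    def measure(amplitudes, shots=10000, rng=np.random.default_rng(0)):
        # Measurement in the computational basis: outcome i occurs with
        # probability |alpha_i|^2.
        probs = np.abs(amplitudes) ** 2
        assert np.isclose(probs.sum(), 1.0), "state must be normalized"
        return rng.choice(len(amplitudes), size=shots, p=probs)

    # Single qubit alpha|0> + beta|1> with |alpha|^2 = |beta|^2 = 1/2.
    plus = np.array([1, 1]) / np.sqrt(2)
    outcomes = measure(plus)
    print("fraction of |1> outcomes:", (outcomes == 1).mean())   # about 0.5

    # A 2-qubit state (1/sqrt(2))(|00> + |11>): only outcomes 0 and 3 appear.
    bell = np.zeros(4); bell[0] = bell[3] = 1 / np.sqrt(2)
    print(np.bincount(measure(bell), minlength=4) / 10000)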

3.1 Quantum computing and practical machine learning

Given the advancements of both machine learning and quantum computing, their
intersection has received much attention. Many works have been done in this topic
in the last two decades. Two main quantum techniques employed in this line of work are amplitude amplification and the HHL algorithm for solving well-behaved linear systems of equations.
Amplitude amplification, a generalization of Grover’s unstructured search algo-
rithm [21], helps speed up learning algorithms that can be translated to an unstructured
search task such as clustering via minimum spanning tree and k-median [4, 5], k-
nearest neighbors [38], and quantum perceptron and deep learning [39, 40]. The
general speedup that can be achieved for those algorithms is usually quadratic based
on the general speedup of amplitude amplification.
The HHL algorithm for solving well-structured linear systems of equations, developed by Harrow, Hassidim, and Lloyd [22], brings an exponential speedup to specific machine learning algorithms. The algorithms benefiting from the HHL algorithm are those that work with linear equation systems or matrix operations, such as principal component analysis [27] and support vector machines [30].
A more detailed survey of the state of the art of practical quantum machine learning
can be found in [12]. It is worth noting that most of these works, like most practical machine learning results, are heuristic in nature. In addition, most of them rely on
underlying assumptions, such as the sparsity of the matrices or the existence of an
efficient quantum memory, to be efficient. We refer the readers to [12] and [1] to learn
more about this issue. Recently, there has been a line of work relying on the assumption
of a classical memory model with the same functionalities as a quantum one to speed
up classical algorithms (see [19, 34, 35]), thus emphasizing the role of the underlying
assumptions in quantum speedup in machine learning applications.

3.2 Quantum computing and theoretical machine learning, quantum query model

With an increasing number of results in quantum speedup for practical machine learn-
ing, theoretical machine learning has also been receiving increasing quantum interest.
This subfield started when Bshouty and Jackson introduced the quantum PAC model
in [14]. In this quantum PAC model, instead of classical queries, the quantum model is allowed to access a quantum oracle which, when given the (n + 1)-qubit input state Σ_{x∈{0,1}^n} √(D(x)) |x, b⟩ (n qubits for x and one for b), returns the following (n + 1)-qubit state:

$\sum_{x \in \{0,1\}^n} \sqrt{D(x)}\, |x, b \oplus f(x)\rangle,$

where D is a probability distribution over the universe {0, 1}^n.


This is called a quantum query, which is a natural quantum generalization of the
classical membership query. Even though it is not always realistic to assume that
access to such quantum states can be given, there are certain cases where this situation
can be realized, for example, learning the functionality of a black-box quantum oracle
(the black box can be generalized to any quantum systems that given a quantum input
produces a quantum output). The query is called a quantum uniform query when the distribution D(x) is the uniform distribution, i.e., D(x) = 1/2^n for all x.
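As a purely classical illustration of this definition, the following Python sketch (our own construction, using numpy) builds the amplitude vector of the quantum example state Σ_x √(D(x)) |x, f(x)⟩ for a small n, storing the basis state |x, b⟩ at index 2x + b.

    import numpy as np

    def quantum_example_state(f_bits, D):
        # f_bits[x] in {0, 1} is f(x) in the computational basis; D[x] is the
        # query distribution. Basis state |x, b> is stored at index 2*x + b.
        N = len(f_bits)
        state = np.zeros(2 * N)
        for x in range(N):
            state[2 * x + f_bits[x]] = np.sqrt(D[x])
        return state

    n = 3
    N = 2 ** n
    f_bits = [bin(x).count("1") % 2 for x in range(N)]   # toy target: parity of x
    D = np.full(N, 1 / N)                                # uniform quantum query
    psi = quantum_example_state(f_bits, D)
    print(np.isclose(np.linalg.norm(psi), 1.0))          # the state is normalized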
Quantum versions of other theoretical learning models, such as exact learning or
agnostic learning [9], also use quantum queries as described above. The idea of using
quantum queries does not only apply to learning models but also to general algo-
rithms. In the literature, this general paradigm is called quantum query model, of which
Grover’s search is an outstanding example. Similar to the classical query model, quan-
tum query model also cares about query complexity and uses it as an important metric
of efficiency.
Several results exist for both upper and lower bounds, in both time and query complexity, in the settings of PAC learning and exact learning. In the classical context, past results indicate a strong dependence between the query complexity for
PAC learning and the Vapnik Chervonenkis (VC) dimension of the concept class [13].
The same correlation can be found in the context of quantum learning, as shown in


[10]. Negative results on query complexity denying a speedup of quantum models


over classical ones have been shown, where the distribution of the oracle is arbitrary
(see [41] and [9]). On the other hand, where the uniform distribution of the quantum
query oracle is permitted, improvements in the number of queries needed over classical
models have been reported for DNF [14], juntas [11], or in the exact learning of k-
sparse functions [7]. A survey on results in quantum machine learning can be found
in [8].
We note that in the context of quantum queries, we consider uniform superposition
in this paper. This is in contrast with the classical PAC learning model which is
distribution agnostic. We also note that following Arunachalam and de Wolf [9], the
need for uniform superposition is essential since otherwise, we will not be able to obtain
any improvement; following their proof, classical and quantum query complexities are
within a constant factor of each other if arbitrary distributions are permitted.

4 The Fourier analysis of Boolean functions and concentrated Boolean functions

When working with Boolean functions, in many cases it is useful to look at the func-
tions in the Fourier basis. In this section, we introduce the basics of Fourier analysis
of Boolean functions, which is an important idea with numerous applications in many
fields of Computer Science. We refer the reader to [29] for a more rigorous introduction
to this topic.
In the context of Fourier analysis of Boolean functions, it is more convenient to
define a Boolean function with domain {−1, 1}n and image {−1, 1} instead of the usual
domain {0, 1}n and image {0, 1}. In particular, 0 is mapped to 1, and 1 is mapped to -1
in the conversion. The core idea of the study of Fourier analysis of Boolean functions
is that a Boolean function f : {−1, 1}^n → {−1, 1} can be uniquely written as

$f(x) = \sum_{S \subseteq [n]} \hat{f}(S)\, x^S$

where the x^S = Π_{i∈S} x_i are monomials called the Fourier terms, which together create the Fourier basis, and the f̂(S) are called the Fourier coefficients. Note that sometimes we use the subset S to refer to a Fourier term when there is no confusion.
We mention here some properties of the Fourier expansion of Boolean functions that will be used later in the paper, such as

$\sum_{S \subseteq [n]} \hat{f}(S)^2 = 1$    (1)

which is called Parseval's identity,

$\hat{f}(S) = \frac{\sum_{x \in \{-1,1\}^n} f(x)\, x^S}{2^n}$    (2)

and

$\mathbb{E}_{x \in \{-1,1\}^n}\big[(f(x) - h(x))^2\big] = \sum_{S \subseteq [n]} (\hat{f}(S) - \hat{h}(S))^2$    (3)
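The following brute-force Python sketch (our own helper names) computes the Fourier coefficients of a small Boolean function via Eq. (2) and checks Parseval's identity (1).

    from itertools import product

    def fourier_coefficient(f, S, n):
        # Eq. (2): hat{f}(S) = (1/2^n) * sum_x f(x) * x^S, with x^S = prod_{i in S} x_i.
        total = 0
        for x in product((-1, 1), repeat=n):
            chi = 1
            for i in S:
                chi *= x[i]
            total += f(x) * chi
        return total / 2 ** n

    def spectrum(f, n):
        # All 2^n coefficients, indexed by subsets S of {0, ..., n-1}.
        subsets = [tuple(i for i in range(n) if mask >> i & 1) for mask in range(2 ** n)]
        return {S: fourier_coefficient(f, S, n) for S in subsets}

    n = 3
    f = lambda x: 1 if x[0] * x[1] > 0 else -1    # toy target, equal to the monomial x0*x1
    coeffs = spectrum(f, n)
    print(coeffs[(0, 1)])                         # 1.0: all weight on S = {0, 1}
    print(sum(c ** 2 for c in coeffs.values()))   # Parseval: sums to 1.0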

In this paper, our target concept class of learning is the class of concentrated functions. A Boolean function f : {−1, 1}^n → {−1, 1} is called an ε-concentrated function on a set 𝓜, for some ε < 1 and 𝓜 a set of subsets of [n], if

$\sum_{S \notin \mathcal{M}} \hat{f}(S)^2 \leq \epsilon$

Note that this class of functions contains, and therefore is a more general class than,
some classes of functions studied in previous quantum machine learning literature,
such as:

– the class of log|𝓜|-juntas [11]: a k-junta is a Boolean function that depends on at most k of its n variables—clearly a k-junta has at most 2^k nonzero Fourier coefficients.
– the class of |𝓜|-sparse functions [7]: a k-sparse Boolean function has at most k nonzero Fourier coefficients.

Informally, a concentrated Boolean function on a set M of small size is a Boolean


function where all the information about the function is concentrated in a few important
Fourier terms in the Fourier expansion. One can make the intuitive connection from
such functions to known famous instances of quantum speedup. For example, in the
famous Shor’s algorithm for integer factoring, the problem statement states that there
is at least one non-trivial integer factor of the input. One can draw an analogy between a non-trivial integer factor and an important Fourier term, whereas the unimportant Fourier
terms correspond to the non-factor integers. Another example is the HHL algorithm
which works best when the matrix created by the linear equation system is sparse—the
sparsity of the matrix corresponds to the concentration of information idea.
For the rest of the paper, we use the notation M to denote |𝓜|.

5 Learning of concentrated functions

5.1 Overview of learning of concentrated functions

A general and natural approach to learning a target concentrated function is first to


learn the concentration Fourier terms, then evaluate the target function based on these
terms. A learning algorithm following this approach can be broken into two main
steps, (i) learning the concentration, and (ii) learning the target function based on the
information about concentration.


5.2 Learning the function knowing the concentration

We begin this section with the second step of the learning process: learn the target
function knowing its concentration set, and work backward from there. This process
follows the construction steps of Kushilevitz–Mansour algorithm [26], together with
our bound and complexity analysis.

Theorem 1 Let f : {−1, 1}^n → {−1, 1} be an ε-concentrated Boolean function on an unknown set of size at most M. Then, if given a set L of Fourier terms such that all the Fourier terms with coefficient greater than or equal to √(ε/M) are included in L, there is a learning algorithm that can PAC learn the target function, with error α = O(ε) and high success probability, in O(|L|²/ε) time using O(|L|/ε) classical uniform random queries.

Proof First, let us describe a subroutine used to estimate the Fourier coefficients of an
arbitrary Fourier term in a Boolean function:

Lemma 1 Let f : {−1, 1}^n → {−1, 1}. For any S ⊆ [n], accuracy γ > 0 and error probability δ > 0, one can approximate its Fourier coefficient f̂(S) using an estimate f̃(S) that satisfies

$|\hat{f}(S) - \tilde{f}(S)| < \gamma$

with high probability, using O(1/γ²) uniform random queries.

Proof Suppose the uniform queries are {(x₁, f(x₁)), (x₂, f(x₂)), …, (x_m, f(x_m))}. Calculate f̃(S) as

$\tilde{f}(S) = \frac{\sum_{i=1}^{m} f(x_i)\, x_i^S}{m}$

where x_i^S denotes the value of the monomial x^S under the variable assignment x_i.
From Eq. 2, each f(x_i) x_i^S is an independent random variable with expectation f̂(S); therefore, from standard Chernoff bounds, picking m = O(1/γ²) satisfies the requirement.
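A minimal Python sketch of this estimator (assuming a uniform random example oracle as in Sect. 2.2; the names are ours): the empirical mean of f(x_i)·x_i^S over m samples.

    import random

    def estimate_coefficient(uex, S, m):
        # uex() returns a uniform random example (x, f(x)) with x in {-1, 1}^n.
        # The empirical mean of f(x) * x^S over m samples estimates hat{f}(S)
        # to within gamma with high probability when m = O(1/gamma^2).
        total = 0
        for _ in range(m):
            x, fx = uex()
            chi = 1
            for i in S:
                chi *= x[i]
            total += fx * chi
        return total / m

    n = 4
    target = lambda x: x[1] * x[3]            # toy target with hat{f}({1, 3}) = 1
    def uex():
        x = tuple(random.choice((-1, 1)) for _ in range(n))
        return x, target(x)

    print(estimate_coefficient(uex, (1, 3), 4000))   # close to 1
    print(estimate_coefficient(uex, (0,), 4000))     # close to 0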

With the help of the above lemma, for each term S ∈ L, we estimate the Fourier
coefficient of S as f˜(S) using Lemma 1 to some accuracy γ that will be determined
later. It is crucial to note that we can reuse the same query results to estimate the
coefficients of all terms in L, since the queries are uniformly random. Define

$g(x) = \sum_{S \in L} \tilde{f}(S)\, x^S$

and

h(x) = sign(g(x))


as in the Kushilevitz–Mansour algorithm [26]. We will prove that h(x) is the desired learning result by proving that Pr[h(x) ≠ f(x)] ≤ α = O(ε).
If f(x) ≠ h(x) then |f(x) − g(x)| ≥ 1, and hence (f(x) − g(x))² ≥ 1. From Eq. 3, we have:

$\Pr[f(x) \neq h(x)] \leq \mathbb{E}[(f(x) - g(x))^2] = \sum_{S \in L} (\hat{f}(S) - \tilde{f}(S))^2 + \sum_{S \notin L} \hat{f}(S)^2 \quad (\text{as } \tilde{f}(S) = 0 \text{ for } S \notin L)$

$= \sum_{S \in L} (\hat{f}(S) - \tilde{f}(S))^2 + \sum_{S \in \mathcal{M} \setminus L} \hat{f}(S)^2 + \sum_{S \notin (\mathcal{M} \cup L)} \hat{f}(S)^2$


For S ∉ L, we have f̂(S)² ≤ ε/M and therefore

$\sum_{S \in \mathcal{M} \setminus L} \hat{f}(S)^2 \leq M \cdot \frac{\epsilon}{M} = \epsilon$    (4)


For S ∉ 𝓜, the condition on the function f guarantees that Σ_{S∉𝓜} f̂(S)² ≤ ε, thus

$\sum_{S \notin (\mathcal{M} \cup L)} \hat{f}(S)^2 \leq \epsilon$    (5)

For S ∈ L, we have |f̃(S) − f̂(S)| ≤ γ from the estimation procedure in Lemma 1. Therefore,

$\sum_{S \in L} (\hat{f}(S) - \tilde{f}(S))^2 \leq \gamma^2 |L|$

Picking γ = √(ε/|L|) yields

$\sum_{S \in L} (\hat{f}(S) - \tilde{f}(S))^2 \leq \epsilon$    (6)

Therefore, from Eqs. 4, 5 and 6, we have

$\Pr[h(x) \neq f(x)] \leq \sum_{S \in L} (\hat{f}(S) - \tilde{f}(S))^2 + \sum_{S \notin (\mathcal{M} \cup L)} \hat{f}(S)^2 + \sum_{S \in \mathcal{M} \setminus L} \hat{f}(S)^2 \leq 3\epsilon$
/ S∈M\L



Estimating the Fourier coefficients to an error of γ = √(ε/|L|) requires O(|L|/ε) queries per coefficient. Thus, the total time taken is O(|L|²/ε). Because each Fourier coefficient is estimated independently, one can reuse the queries to estimate all the Fourier coefficients. Thus, the total number of queries needed for learning f is O(|L|/ε). The probability of success of the procedure is high since the probability of success of estimating the Fourier coefficients is high.
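Putting the pieces together, here is a minimal Python sketch of this step (assuming the candidate set L has already been found; one batch of uniform samples is reused for all coefficients, and the names are ours): estimate f̃(S) for every S ∈ L and output h = sign(g).

    import random

    def learn_given_terms(uex, L, m):
        # Draw m uniform examples once and reuse them to estimate every
        # coefficient in L (Lemma 1), then return h(x) = sign(g(x)).
        samples = [uex() for _ in range(m)]
        def chi(S, x):
            p = 1
            for i in S:
                p *= x[i]
            return p
        est = {S: sum(fx * chi(S, x) for x, fx in samples) / m for S in L}
        def h(x):
            g = sum(est[S] * chi(S, x) for S in L)
            return 1 if g >= 0 else -1
        return h

    n = 4
    target = lambda x: x[0] * x[2]                   # concentrated on the single term {0, 2}
    def uex():
        x = tuple(random.choice((-1, 1)) for _ in range(n))
        return x, target(x)

    h = learn_given_terms(uex, L=[(0, 2), (1,)], m=2000)
    test = [tuple(random.choice((-1, 1)) for _ in range(n)) for _ in range(500)]
    print(sum(h(x) == target(x) for x in test) / 500)   # close to 1.0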


Table 1 Different algorithms for solving the sparse Fourier sampling problem using different types of query

Algorithm                      Time complexity          Query complexity         Query type
Hassanieh2012 [23]             O(M log(2^n/M) n/ε)      O(M log(2^n/M) n/ε)      Uniform random query
Indyk2014 [25]                 Õ(2^n n^c)               Õ(Mn)                    Uniform random query
Goldreich–Levin [20, 26]       O(nM³/ε³)                O(nM³/ε³)                Membership query
Quantum Fourier sampling       O(M/ε)                   O(M/ε)                   Quantum uniform random query

5.3 Learning the concentration

Learning of the concentration set of concentrated Boolean functions has been studied
in different forms outside of classical learning, in particular in the context of digital
signal processing (DSP). In classical learning, this is performed using the Goldreich–
Levin theorem in the Kushilevitz–Mansour algorithm. In DSP, the problem of learning
the set of concentrated Fourier terms is referred to as the sparse Fourier transform.
From now on, let us use the word sparse Fourier sampling to describe the generic
problem of picking out the important terms in the Fourier domain.
Definition 1 The sparse Fourier sampling problem: Given f : {−1, 1}^n → {−1, 1}, an ε-concentrated Boolean function on an unknown set of size at most M, sample a set L of Fourier terms such that L contains all the Fourier terms with coefficient greater than or equal to a threshold γ.

From Theorem 1, we need the threshold γ = O(√(ε/M)) in order to learn the target function. Using γ = O(√(ε/M)), we list in Table 1 algorithms that solve the sparse Fourier sampling problem using different types of query.
Originally, the Kushilevitz–Mansour algorithm uses the Goldreich–Levin algorithm
[20] to solve the sparse Fourier sampling problem. On the other hand, the algorithm
developed by Hassanieh et al. is the fastest classical algorithm for this purpose (note
that the algorithm by Hassanieh et al. actually performs the sparse Fourier transform,
which is the equivalent of learning; therefore, it automatically solves the sparse Fourier
sampling problem). We will look at and compare those algorithms more in depth in
subsection 5.5.
In this subsection, we look at the quantum Fourier sampling for performing this
task. The quantum Fourier sampling (or Fourier sampling) routine, first used in [16]
for learning disjunctive normal form Boolean function, is a well-known quantum
procedure popularly used in theoretical quantum machine learning algorithms.
Theorem 2 Given f : {−1, 1}^n → {−1, 1}, an ε-concentrated Boolean function on an unknown set of size at most M, there exists a quantum algorithm that, using O(M/ε) quantum uniform queries and time, recovers a list L of size O(M/ε) containing all Fourier terms of f with Fourier coefficient greater than √(ε/M), with high success probability.


Proof First let us describe the quantum Fourier sampling (QFS) routine. Given access to a uniform query oracle that, when given the (n + 1)-qubit input state (1/2^(n/2)) Σ_{x∈{0,1}^n} |x, b⟩ (n qubits for x and one for b), returns (1/2^(n/2)) Σ_{x∈{0,1}^n} |x, b ⊕ f′(x)⟩, as described in Section 3.2, the QFS routine returns S ⊆ [n] with probability f̂(S)² in O(1) time using one uniform quantum query. Here, f′(x) is f(x) viewed in the computational basis, i.e., under the change of basis {1, −1} to {0, 1}, with 1 mapped to 0 and −1 mapped to 1.
The steps of the QFS routine are as follows:
1. Start with (1/2^(n/2)) Σ_{x∈{0,1}^n} |x, 1⟩. Apply the Hadamard transform to the last qubit to get (1/2^(n/2)) Σ_{x∈{0,1}^n} |x⟩|−⟩, where |−⟩ = (|0⟩ − |1⟩)/√2.
2. Apply the quantum query oracle (with |−⟩ being the auxiliary qubit) to turn (1/2^(n/2)) Σ_{x∈{0,1}^n} |x⟩|−⟩ into (1/2^(n/2)) Σ_{x∈{0,1}^n} (−1)^{f′(x)} |x⟩|−⟩ (note that |− ⊕ f′(x)⟩ = (−1)^{f′(x)} |−⟩). Apply the Hadamard gate to the last qubit to send it back to |1⟩, and we have the resulting state (1/2^(n/2)) Σ_x f(x)|x⟩ in the first n qubits.
3. Apply a Hadamard gate to each of the first n qubits and obtain

$\frac{1}{2^{n/2}} \sum_{x} f(x) \left( \frac{1}{2^{n/2}} \sum_{S} (-1)^{x \cdot S} |S\rangle \right) = \frac{1}{2^n} \sum_{S} \left( \sum_{x} (-1)^{x \cdot S} f(x) \right) |S\rangle = \sum_{S} \hat{f}(S) |S\rangle$

where x·S denotes the dot product. Here, S denotes both a subset of [n] (or equivalently a number in [0, 2^n − 1]) and the associated bit string. For example, for n = 5 and S = [1, 2], we can also say S = 5 or S = 01100.
4. Measure the resulting state to obtain the desired result.

Now, our procedure is as follows: query the QFS routine O(M/ε) times, and output the collection of results.
Consider an arbitrary term S ⊆ [n] in the Fourier basis with Fourier coefficient f̂(S) greater than √(ε/M). The promise that f is ε-concentrated on a subset of size at most M tells us that there are at most 2M terms with Fourier coefficients that are at least √(ε/M).
Therefore, by using O(M/ε) uniform quantum queries and applying the QFS procedure, the probability that a term with coefficient at least √(ε/M) does not appear in the result can be made less than (say) 1/(10 · 2M), and therefore the probability that the procedure succeeds is at least 9/10 from the union bound (here 9/10 is chosen to represent high success probability—this can be made arbitrarily close to 1). Since each query can add at most one new element to L, the size of L is O(M/ε).
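For intuition, the following Python sketch classically simulates the QFS routine (our own simulation, not a quantum implementation): it computes the spectrum by brute force and draws S with probability f̂(S)², which is the output distribution of one quantum uniform query followed by the Hadamard transforms and measurement.

    import random
    from itertools import product

    def fourier_spectrum(f, n):
        # hat{f}(S) = (1/2^n) sum_x f(x) * prod_{i in S} x_i, for every S (as a bit mask).
        coeffs = {}
        for mask in range(2 ** n):
            S = [i for i in range(n) if mask >> i & 1]
            total = 0
            for x in product((-1, 1), repeat=n):
                chi = 1
                for i in S:
                    chi *= x[i]
                total += f(x) * chi
            coeffs[mask] = total / 2 ** n
        return coeffs

    def qfs_sample(coeffs, trials):
        # One QFS query returns S with probability hat{f}(S)^2 (Parseval makes
        # these squares a probability distribution); repeat `trials` times.
        masks = list(coeffs)
        weights = [coeffs[m] ** 2 for m in masks]
        return [random.choices(masks, weights=weights)[0] for _ in range(trials)]

    n = 4
    f = lambda x: x[0] * x[1] if x[3] > 0 else x[2]   # a toy concentrated function
    coeffs = fourier_spectrum(f, n)
    print(set(qfs_sample(coeffs, 50)))   # only masks with nonzero coefficients appear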

5.4 Quantum learning of concentrated functions

Combining results from the previous two subsections, we arrive at our first main
theorem.


Theorem 3 Let C be a concept class such that every f : {−1, 1}^n → {−1, 1} in C has its Fourier spectrum ε-concentrated on a collection of at most M Fourier terms. Then, C can be PAC learned using O(M/ε²) quantum uniform queries in O(M²/ε³) time.
Proof This theorem is a straightforward corollary of Theorem 1 and Theorem 2, where the threshold is γ = O(√(ε/M)). When γ = O(√(ε/M)), we have |L| = O(M/ε), and therefore from Theorem 1 we can learn the class of concentrated Boolean functions in O(M²/ε³) time using O(M/ε²) classical uniform random queries. Combined with the cost of the subprocedure in Theorem 2, the combined complexity of our algorithm is O(M²/ε³) time and O(M/ε²) classical uniform random queries + O(M/ε) uniform quantum queries, or O(M/ε²) quantum uniform random queries.

5.5 Comparing different methods for learning concentrated Boolean functions

In this section, we will explore the classical algorithms that were developed to solve the
problem of learning concentrated Boolean functions. In particular, we derive a tight
bound for the Kushilevitz–Mansour algorithm, and discuss some of the algorithms
that perform sparse Fourier transform in the context of discrete Fourier transform, a
problem similar to that of learning the class of concentrated Boolean functions.
We will go through the procedure of the Kushilevitz–Mansour algorithm at a high
level which internally uses the Goldreich–Levin theorem, and the algorithm developed
by Hassanieh et al. [23] to give the readers an idea of how these algorithms operate
and to analyze their complexity. A correctness proof will not be included.

5.5.1 A tight bound of the Kushilevitz–Mansour algorithm: learning the concentration using classical membership queries

Goldreich–Levin Theorem [20] was originally developed for finding a hard-core pred-
icate for any one-way function in the context of cryptography. A modified version of
it to work in the Boolean domain is used in the Kushilevitz–Mansour algorithm as
below.
Theorem 4 (Goldreich–Levin theorem [20, 26]) There exists a randomized algorithm which, given classical membership query access to a function f : {−1, 1}^n → {−1, 1} and a threshold γ > 0, recovers a list L of O(1/γ²) Fourier terms such that:
• If |f̂(S)| ≥ γ then S ∈ L
• If S ∈ L then |f̂(S)| ≥ γ/2
The algorithm succeeds with high probability, runs in O(n/γ⁶) time, and uses O(n/γ⁶) membership queries.
Proof (Goldreich–Levin) Let J, J̄ be a partition of [n] and S ⊆ J. We write W^(S|J̄)[f] = Σ_{T⊆J̄} f̂(S ∪ T)² for the sum of the squares of the Fourier weights of f on sets whose restriction to J is S. A crucial step in the Goldreich–Levin algorithm is that it is provable that

$W^{S|\bar{J}}[f] = \mathbb{E}_{z \sim \{-1,1\}^{\bar{J}}}\, \mathbb{E}_{y, y' \sim \{-1,1\}^{J}} \left[ f(y, z)\,\chi_S(y) \cdot f(y', z)\,\chi_S(y') \right]$

123
Quantum learning of concentrated... Page 15 of 23 256

where y, y′ are independent assignments to the variables in J, and z is an assignment to the variables in J̄. From this equation, f(y, z)χ_S(y) · f(y′, z)χ_S(y′) is a random variable with value in ±1. Therefore, using classical membership queries, we can estimate the value of W^(S|J̄)[f] to an arbitrary error θ using O(1/θ²) examples with high confidence.
The above equation gives us a tool to evaluate the sum of the squares of the weights of the Fourier terms inside a ‘range.’ Using this result, the Goldreich–Levin algorithm proceeds as follows. Initially, all the Fourier terms are put into a single ‘bucket.’
– Select a bucket B containing 2^m terms.
– Split it into two buckets B₁, B₂, each having 2^(m−1) terms.
– Estimate the sum of squares of the Fourier coefficients of the terms in each bucket.
– Discard a bucket if its total Fourier square sum estimate is less than γ²/2.
We will not discuss the correctness of the algorithm but will focus on its complexity. We can estimate the sum of squares of the Fourier terms of each bucket to an error of θ = γ²/4 using O(1/γ⁴) membership queries. From Parseval's identity, there are at most 4/γ² buckets at any point. Each bucket can be split at most n times; therefore, the algorithm's main loop runs O(n/γ²) times. Therefore, in total, the algorithm uses O(n/γ⁶) time and membership queries.
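The quantity estimated in each bucket step can be sketched in Python as follows (a sampling estimator of W^(S|J̄)[f] based on the expectation formula above; the helper names are ours, and membership queries are answered by a plain Python function).

    import random
    from math import prod

    def estimate_bucket_weight(member, n, J, S, samples=5000):
        # Estimates W^{S|J-bar}[f] = E_z E_{y,y'}[ f(y,z) chi_S(y) * f(y',z) chi_S(y') ]
        # by sampling; member(x) is a classical membership query, S is a subset of J.
        Jbar = [i for i in range(n) if i not in J]
        total = 0.0
        for _ in range(samples):
            z = {i: random.choice((-1, 1)) for i in Jbar}
            y = {i: random.choice((-1, 1)) for i in J}
            yp = {i: random.choice((-1, 1)) for i in J}
            x_y = tuple(y[i] if i in y else z[i] for i in range(n))
            x_yp = tuple(yp[i] if i in yp else z[i] for i in range(n))
            chi_y = prod((y[i] for i in S), start=1)
            chi_yp = prod((yp[i] for i in S), start=1)
            total += member(x_y) * chi_y * member(x_yp) * chi_yp
        return total / samples

    n = 4
    f = lambda x: x[0] * x[1]      # all Fourier weight sits on the term {0, 1}
    # Weight of all terms whose restriction to J = {0, 1} equals S = {0, 1}: about 1.
    print(estimate_bucket_weight(f, n, J=[0, 1], S=[0, 1]))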

Now, we can arrive at our tight bound for the Kushilevitz–Mansour algorithm. This
bound is desirable, since an exact bound for the algorithm is not explicitly presented
in the original paper.

Theorem 5 (Kushilevitz–Mansour algorithm) Let C be a concept class such that every f : {−1, 1}^n → {−1, 1} in C has its Fourier spectrum ε-concentrated on a collection of at most M Fourier terms. Then, C can be PAC learned using O(nM³/ε³ + M/ε²) membership queries in time O(nM³/ε³), with error α = O(ε).

Proof As stated, we will focus on the complexity of the algorithm. From Theorem 1, we want to choose the threshold value γ = O(√(ε/M)). Therefore, the Goldreich–Levin algorithm takes O(nM³/ε³) time to run and uses O(nM³/ε³) classical membership queries. Combining this with Theorem 1, the run time of the Kushilevitz–Mansour algorithm is O(nM³/ε³) and the bound on the number of classical membership queries needed is O(nM³/ε³).

5.5.2 The sparse Fourier transform method

Hassanieh et al. in [23] developed an algorithm to solve a problem similar to learning concentrated functions in the context of the discrete Fourier transform, where it was denoted the ‘sparse Fourier transform’ problem. The algorithm uses examples taken randomly from the signal stream, with the promise that the signal is nearly sparse, i.e., the heaviest terms lie in the top k coordinates, and the goal is to output those heaviest coordinates and their corresponding magnitudes.


Table 2 Different algorithms solving the learning of concentrated Boolean functions problem using different types of query

Algorithm                        Time complexity          Query complexity         Query type
Hassanieh2012 [23]               O(M log(2^n/M) n/ε)      O(M log(2^n/M) n/ε)      Uniform random query
Indyk2014 [25]                   Õ(2^n n^c)               Õ(Mn)                    Uniform random query
Kushilevitz–Mansour [20, 26]     O(nM³/ε³)                O(nM³/ε³)                Membership query
Our algorithm                    O(M²/ε³)                 O(M/ε²)                  Quantum uniform random query

Theorem 6 (Hassanieh et al.) Given a list of numbers x = {x₁, …, x_N}, with N = 2^n where n is an integer, and the promise that the result x̂ of the discrete Fourier transform on x is ε-concentrated on M unknown terms. Then, an approximation x̂′ of x̂ such that ‖x̂ − x̂′‖₂ ≤ C min_{M-sparse y} ‖x̂ − y‖₂ can be computed in O(M log(2^n/M) n/ε) time.

The algorithm developed by Hassanieh et al. can be applied to solve the problem of learning concentrated Boolean functions in a trivial way: consider the set of all input–output pairs (x, f(x)) as the list of numbers x; the output of the DFT on x, namely x̂, then gives the Fourier coefficients of the function multiplied by 2^(n/2).

5.5.3 Comparing different algorithms for learning concentrated functions

In this section, we compare the different algorithms we have visited for solving the problem of learning concentrated functions. The Kushilevitz–Mansour algorithm is the oldest among them, having the worst upper bounds on both time and query complexity. The Hassanieh algorithm is a highly optimized algorithm that achieves good bounds on both time and query complexity while using only uniform random queries. Our new algorithm in Theorem 3 achieves the best query complexity, whereas its time complexity is better when M = Õ(n²). The algorithm by Indyk et al. [25] is also worth mentioning, since it achieves a classical query complexity of Õ(Mn), which is proven to be theoretically optimal. However, the time complexity of Indyk's algorithm is exponential in n.
Since the Hassanieh algorithm also has the high-level structure of finding the important Fourier terms and then estimating their weights, an obvious question that arises is whether we can improve our result using the estimation scheme from the Hassanieh algorithm. It might be possible, but not straightforward—the Hassanieh algorithm is an iterative algorithm that interleaves finding the large Fourier terms and estimating them more precisely in each iteration. Therefore, improving our algorithm using ideas from the Hassanieh algorithm is a possible direction in which we will consider extending this work in the future.


6 Lower bounds for quantum PAC learning of concentrated Boolean functions

Quantum query lower bounds have been studied in the past and are useful in helping to understand how good quantum algorithms are, as well as in comparing classical and quantum learning algorithms. In this section, we prove a lower bound on the number of quantum queries needed to learn the class of ε-concentrated Boolean functions in the distribution-independent setting. Our lower bound matches the upper bound from the previous section when ε is considered to be a constant; however, since the previous section performs learning in a uniform-distribution setting, the matching bounds do not indicate optimality.
In this section, we use the Vapnik–Chervonenkis (VC) dimension as a tool to prove the lower bound for our concept class. In terms of binary classification, a set X of examples is shattered by a concept class C if for any labeling of the examples in X, there is a function in C that assigns those labels to those examples. The VC dimension of a concept class C is the size of the largest finite set of examples that is shattered by C.
Intuitively, the higher the VC dimension of a class of functions, the more complicated the concept class is, and therefore more data is required to learn it. In fact, a relationship between the VC dimension of a concept class and a lower bound on the number of quantum queries needed to learn it has been proven in [10].
Theorem 7 (Atici, Servedio [10]) Let C be any concept class of VC dimension d, and suppose access to a quantum query oracle is given. Then, the sample complexity of PAC learning C with error ≤ 1/10 under an arbitrary distribution is Ω(d + √d).
On the other hand, intuitively, as the number of important Fourier terms of a function
becomes bigger, the function becomes more complicated. This suggests a connection
between the concentration size M of the concept class of concentrated functions and
the VC dimension of that class. We can prove the following theorem.
Theorem 8 The concept class of Boolean functions that are ε-concentrated on an unknown subset of size at most M has VC dimension at least M.
Proof To make the analysis simpler, let us assume that M is a power of 2. Let us consider the class of log M-juntas (recall that these are Boolean functions that depend on only log M variables). Trivially, such functions have at most M nonzero Fourier coefficients, and are therefore concentrated on a subset of size M. Hence the concept class of Boolean functions that are ε-concentrated on a subset of size at most M contains the class of log M-juntas.
Consequently, the VC dimension of the class of Boolean functions ε-concentrated on a subset of size at most M is at least the VC dimension of the class of log M-juntas. On the other hand, we know that the VC dimension of the class of log M-juntas is at least M, as proved in [11]. Therefore, the VC dimension of the class of concentrated Boolean functions on a subset of size at most M is at least M.
A direct consequence of the above two theorems is a lower bound on the number of uniform quantum queries needed to PAC learn the concept class of concentrated functions.


Theorem 9 Let C be the class of Boolean functions f : {−1, 1}^n → {−1, 1} that are ε-concentrated on an unknown set of size at most M. Then, the number of quantum queries necessary to PAC learn C with error α ≤ 1/10 and high success probability under an arbitrary distribution is Ω(M).

7 Lower bounds for quantum exact learning of concentrated Boolean functions

The two most used learning models in theoretical learning are PAC learning and exact learning. Therefore, the natural question at this point is whether we can efficiently learn the class of concentrated functions in the exact learning model. In this section, we will answer that question.
We will first introduce some essential information theory definitions and concepts needed to prove our lower bound, followed by a precise formulation of the lower bound and its proof.

7.1 Information theory and quantum computing

In this subsection, we describe the basic definitions of information theory and quantum
information theory.
A random variable A with probabilities Pr[A = a] = p_a has entropy H(A) = −Σ_a p_a log(p_a). Intuitively, the entropy of a variable represents the least number of bits needed to store the information contained in that variable. For a pair of possibly correlated random variables A and B, the joint entropy H(A, B) is the information obtained from evaluating A and B simultaneously, and can be computed as H(A, B) = −Σ_{a,b} Pr(A = a, B = b) log(Pr(A = a, B = b)). The conditional entropy of A given B is H(A|B) = H(A, B) − H(B), or equivalently E_{b∼B}[H(A|B = b)]. The mutual information between A and B is I(A : B) = H(A) + H(B) − H(A, B).
For quantum information theory, given a density matrix ρ of n qubits, its singular values ρ₁, …, ρ_{2^n} form a probability distribution P, and the von Neumann entropy of ρ is S(ρ) = H(P).
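As a small worked example of these definitions, the Python sketch below (our own toy joint distribution) computes H(A), H(B), H(A, B), H(A|B) and I(A : B).

    from math import log2

    def H(probs):
        # Shannon entropy of a distribution given as a list of probabilities.
        return -sum(p * log2(p) for p in probs if p > 0)

    # Toy joint distribution Pr(A = a, B = b) over a, b in {0, 1}.
    joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
    pA = [sum(p for (a, b), p in joint.items() if a == v) for v in (0, 1)]
    pB = [sum(p for (a, b), p in joint.items() if b == v) for v in (0, 1)]

    H_A, H_B = H(pA), H(pB)
    H_AB = H(list(joint.values()))
    print("H(A) =", H_A, "H(B) =", H_B, "H(A,B) =", H_AB)
    print("H(A|B) =", H_AB - H_B)           # conditional entropy
    print("I(A:B) =", H_A + H_B - H_AB)     # mutual information, positive here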

7.2 Lower bounds for quantum exact learning of concentrated Boolean functions

Recall that in exact learning, the error α = 0 and the goal is to learn the function exactly. We will now prove that exact learning of the class of concentrated Boolean functions is remarkably harder than PAC learning, and therefore PAC learning offers more opportunities for achieving an advantage through quantum computing. We now show that

Theorem 10 Let C be the class of Boolean functions f : {−1, 1}^n → {−1, 1} ε-concentrated on an unknown set of size at most M Fourier terms with 0 < ε < 1/2. Then, the number of uniform quantum examples necessary to exactly learn C with high success probability is Ω(ε 2^n log M / n).


Proof Our proof follows the techniques used in the proof of Theorem 9 in the work of Arunachalam et al. [7]. Let 𝒱 be the set of distinct subspaces in {−1, 1}^n with dimension n − log M, and let C′ be the class of functions:

$C' = \{\, f_V : \{-1,1\}^n \to \{-1,1\} \mid f_V(x) = -1 \text{ iff } x \in V,\ V \in \mathcal{V} \,\}$

Note that |C′| = |𝒱| and each f_V evaluates to 1 on a 1 − 1/M fraction of its domain. This is a class of M-sparse Boolean functions, i.e., functions which have at most M nonzero Fourier coefficients, as proved in [24].
Now, let us take C′ and, for each f_V in this class, among the input values x ∈ V (i.e., f_V(x) = −1), choose a random ε fraction of the x and switch f_V(x) (so that f_V(x) = 1). This creates a new function f′_V. Let us repeat this action for all possible choices of a randomly chosen ε fraction of x ∈ V. All the newly created functions form a new class of functions called C″.
Claim The concept class C″ contains functions that are ε-concentrated on a set of at most M Fourier terms.

Proof Examine a function f_V in C′ and let f′_V be an arbitrary function in C″ produced from switching f_V's values. From Eq. 3, we have:

$\sum_{S \subseteq [n]} (\hat{f}_V(S) - \hat{f}'_V(S))^2 = \mathbb{E}_{x \in \{-1,1\}^n}\!\left[(f_V(x) - f'_V(x))^2\right] < \mathbb{E}_{x \in \{-1,1\}^n \mid f_V(x) = -1}\!\left[(f_V(x) - f'_V(x))^2\right] = \epsilon$

Denote by F the set of Fourier terms of f_V with nonzero coefficient; we know that |F| ≤ M. From the above equation, we have

$\epsilon > \sum_{S \in F} (\hat{f}_V(S) - \hat{f}'_V(S))^2 + \sum_{S \notin F} (\hat{f}_V(S) - \hat{f}'_V(S))^2 > \sum_{S \notin F} (\hat{f}_V(S) - \hat{f}'_V(S))^2 = \sum_{S \notin F} \hat{f}'_V(S)^2$

because f̂_V(S) = 0 for S ∉ F. Therefore, f′_V is ε-concentrated on the set F by definition.

We will prove our lower bound for this class of functions C″, following a three-step information-theoretic approach. The proof of this part of the theorem is similar to that of Theorem 9 in [7] with some modifications to the argument; here we rewrite the full proof for the sake of completeness.
Let A be a random variable that is distributed uniformly over C″. Suppose A = f and let B = B₁, B₂, …, B_T be T copies of the uniform quantum example |ψ_f⟩ = (1/2^(n/2)) Σ_{x∈{0,1}^n} |x, f(x)⟩. The random variable B is a function of the random variable A.
Now let us look at the mutual information of A and B, which is I(A : B). We have a series of bounds:


1. I(A : B) ≥ Ω(log|C″|)
This is true since B allows us to recover A with high probability and A is a random variable uniformly distributed over C″.
2. I(A : B) ≤ T · I(A : B₁)
This is true as

$I(A : B) = H(B) - H(B|A) = H(B) - \sum_{i=1}^{T} H(B_i|A) \leq \sum_{i=1}^{T} H(B_i) - \sum_{i=1}^{T} H(B_i|A) = \sum_{i=1}^{T} I(A : B_i) = T \cdot I(A : B_1)$

from the rule of subadditivity of quantum entropy and since all the B_i are identical.
3. I(A : B₁) ≤ O(n/M)
Since AB is a classical-quantum state, we have

$I(A : B_1) = S(A) + S(B_1) - S(A, B_1) = S(B_1)$

where the first equality comes from the definition, and the second equality uses the fact that B₁ is a function of (fully dependent on) A, thus Pr(A = a, B₁ = b) is 0 for all but one value of b, and hence S(A, B₁) = S(A).
Therefore, it is sufficient to bound S(B₁). We have that B₁ is

$\rho = \frac{1}{|C''|} \sum_{f \in C''} |\psi_f\rangle \langle \psi_f|.$

Let σ₀ ≥ σ₁ ≥ … ≥ σ_{2^n−1} be the singular values of ρ. Since ρ is a density matrix, the values σ create a probability distribution which uniquely determines the value of S(ρ). Note that since the inner product between (1/2^(n/2)) Σ_{x∈{0,1}^n} |x, 1⟩ and every |ψ_f⟩ for f ∈ C″ is > 1 − 1/M from the way we construct C″, we have σ₀ > 1 − 1/M. Now let N ∈ {0, 1, …, 2^n − 1} be a random variable with probabilities σ₀ ≥ σ₁ ≥ … ≥ σ_{2^n−1}, and let Z be the indicator of the event N > 0, so that H(N|Z = 0) = 0 and Pr[Z = 0] ≥ 1 − 1/M. Then

$S(\rho) = H(N) = H(Z) + H(N|Z) = H(\sigma_0) + \sigma_0\, H(N|Z=0) + (1 - \sigma_0)\, H(N|Z=1) < H(1/M) + \frac{n+1}{M} \leq O\!\left(\frac{n + \log M}{M}\right) = O(n/M)$

using H(a) ≤ O(a log(1/a)).


These bounds follow from the proof of Theorem 9 of [7]. From the above three bounds, we arrive at T = Ω(M log|C″| / n). Now we only need to bound |C″| to complete our proof.


Claim The number of distinct functions in the concept class C″ is 2^Ω(ε 2^n log M / M).
Proof We can prove that the number of distinct subspaces, |𝒱|, is 2^Ω(n log M). This is already proven in Theorem 9 in [7]. We can also prove that the f′_V obtained by modifying f_V are non-duplicate for different f_V. Indeed, for two distinct subspaces V and V′ of the same dimension, the number of overlapping elements between them cannot be greater than ½|V|. This is true since we can specify a d-dimensional subspace using d linearly independent vectors, and in the case that at least one vector among those d vectors is modified, the number of modified elements in the subspace is at least 2^(d−1). Therefore, for a subspace V, when we choose to ‘remove’ up to an ε < 1/2 fraction of the elements from the subspace, we can still uniquely identify the subspace.
However, the main component that makes up the size of C″ is the number of distinct f′_V one can get from an f_V. How many ways are there to remove ε|V| elements from |V| elements? It is C(|V|, ε|V|), which can be trivially seen to be greater than (1/ε)^(ε|V|), which is greater than 2^(ε|V|) for 0 < ε < 1/2. Notice that |V| = 2^n/M, and therefore 2^(ε|V|) is 2^(ε 2^n / M).
Thus, the number of distinct functions in C″ is 2^Ω(ε 2^n log M / M).
Therefore, we have that log|C″| = Ω(ε 2^n log M / M), and therefore T = Ω(ε 2^n log M / n).

Corollary 1 Let C be the class of Boolean functions f : {−1, 1}^n → {−1, 1} ε-concentrated on an unknown set of size at most M Fourier terms with M polynomial in n and 0 < ε < 1/2. Then, the number of uniform quantum examples necessary to exactly learn C with high success probability is exponential in n when ε is not exponentially small in n.
Hence, to exactly learn the concept class of Boolean functions that are ε-concentrated on an unknown set of size at most M, for polynomially bounded M and ‘reasonable’ ε, we need an exponential number of uniform quantum queries. The intuition behind this result is that the size of the concept class is too big to learn with no room for error. This indicates that for the class of concentrated Boolean functions, an approximate learning scheme with error, like PAC learning, performs significantly better than an exact learning scheme.

8 Conclusion and remarks

The rapid increase in popularity of both machine learning and quantum computing in
recent years has made quantum machine learning an active topic of research. Quan-
tum information and concomitant techniques have been applied in the experimental context of machine learning with promising improvements. In this work, we study the
problem of quantum learning of Boolean functions and show novel bounds through
the concept of concentration. Our main contribution in this paper is to show that for
any Boolean function, through its concentration measure quantified as the number of
dominant terms in its Fourier spectrum, we are able to characterize the potential for
improvements.
We note that the idea of using the quantum Fourier/Hadamard transform instead of the Goldreich–Levin theorem to perform the sparse Fourier transform is not new. The


idea was applied in several theoretical quantum machine learning works such as PAC
learning of DNF [16], juntas [11] and exact learning of k-sparse functions [9]. The
same idea was also applied in the field of cryptography to improve the Goldreich–
Levin theorem such as works done by Adcock and Cleve [2, 3]. However, our work
is the first one that applies this idea to the class of concentrated functions which is a
super class of juntas and k-sparse functions. All of our results and those that precede
our work rely on the framework of query complexity to characterize efficiency. On a side note, in more recent years, ‘dequantization algorithms’ have been an active area of research (see [17, 19, 34, 35]) that focuses on exploiting the advantage of a
quantum-like memory or database oracle. It is noteworthy that these results propose
to use query complexity to characterize improvements and therefore their measure of
goodness is consistent with our approach.
Funding This material is based upon work supported by Defense Advanced Research Projects Agency
under the Grant No. FA8750-16-2-0004.

Data Availability Data sharing not applicable to this article as no datasets were generated or analyzed during
the current study.

References
1. Aaronson, S.: Read the fine print. Nat. Phys. 11, 291–293 (2015). https://doi.org/10.1038/nphys3272
2. Adcock, M., Cleve, R.: A quantum Goldreich-Levin theorem with cryptographic applications. In: Alt,
H., Ferreira, A. (eds.) STACS 2002, pp. 323–334. Springer, Berlin Heidelberg (2002)
3. Adcock, M., Cleve, R., Iwama, K., Putra, R., Yamashita, S.: Quantum lower bounds for the Goldreich-
Levin problem. Inf. Process. Lett. 97(5), 208–211 (2006)
4. Aïmeur, E., Brassard, G., Gambs, S.: Machine learning in a quantum world. In: Lamontagne, L.,
Marchand, M. (eds.) Advances in artificial intelligence, pp. 431–442. Springer, Berlin (2006)
5. Aïmeur, E., Brassard, G., Gambs, S.: Quantum speed-up for unsupervised learning. Mach. Learn.
90(2), 261–287 (2013). https://doi.org/10.1007/s10994-012-5316-5
6. Angluin, D.: Queries and concept learning. Mach. Learn. 2(4), 319–342 (1987). https://doi.org/10.
1007/BF00116828
7. Arunachalam, S., Chakraborty, S., Lee, T., Wolf, R.: Two new results about quantum exact learning.
In: ICALP (2019)
8. Arunachalam, S., de Wolf, R.: Guest column: a survey of quantum learning theory. SIGACT News
48(2), 41–67 (2017). https://doi.org/10.1145/3106700.3106710
9. Arunachalam, S., de Wolf, R.: Optimal quantum sample complexity of learning algorithms. J. Mach.
Learn. Res. 19, 71:1–71:36 (2018). http://jmlr.org/papers/v19/18-195.html
10. Atici, A., Servedio, R.: Improved bounds on quantum learning algorithms. Quant. Inf. Process. 4,
355–386 (2004). https://doi.org/10.1007/s11128-005-0001-2
11. Atıcı, A., Servedio, R.: Quantum algorithms for learning and testing juntas. Quant. Inf. Process. 6,
323–348 (2007). https://doi.org/10.1007/s11128-007-0061-6
12. Biamonte, J., Wittek, P., Pancotti, N., Rebentrost, P., Wiebe, N., Lloyd, S.: Quantum machine learning.
Nature 549(7671), 195–202 (2017). https://doi.org/10.1038/nature23474
13. Blumer, A., Ehrenfeucht, A., Haussler, D., Warmuth, M.K.: Learnability and the Vapnik–Chervonenkis dimension. J. ACM 36(4), 929–965 (1989). https://doi.org/10.1145/76359.76371
14. Bshouty, N., Jackson, J.: Learning DNF over the uniform distribution using a quantum example oracle.
SIAM J. Comput. 28, 1136–1153 (1999). https://doi.org/10.1137/S0097539795293123
15. Bshouty, N.H., Cleve, R., Gavaldà, R., Kannan, S., Tamon, C.: Oracles and queries that are sufficient
for exact learning. J. Comput. Syst. Sci. 52(3), 421–433 (1996). https://doi.org/10.1006/jcss.1996.
0032 . https://www.sciencedirect.com/science/article/pii/S002200009690032X


16. Bshouty, N.H., Jackson, J.C.: Learning DNF over the uniform distribution using a quantum example oracle. SIAM J. Comput. 28(3), 1136–1153 (1998). https://doi.org/10.1137/S0097539795293123
17. Chia, N.H., Lin, H.H., Wang, C.: Quantum-inspired sublinear classical algorithms for solving low-rank
linear systems (2018)
18. Dunjko, V., Taylor, J.M., Briegel, H.J.: Quantum-enhanced machine learning. Phys. Rev. Lett. 117(13),
130501 (2016). https://doi.org/10.1103/physrevlett.117.130501
19. Gilyén, A., Lloyd, S., Tang, E.: Quantum-inspired low-rank stochastic regression with logarithmic
dependence on the dimension. ArXiv arXiv:1811.04909 (2018)
20. Goldreich, O., Levin, L.: A hard-core predicate for all one-way functions. In: Proceedings of the 21st Annual ACM Symposium on Theory of Computing (STOC), pp. 25–32 (1989). https://doi.org/10.1145/73007.73010
21. Grover, L.K.: A fast quantum mechanical algorithm for database search. In: Proceedings of the Twenty-
Eighth Annual ACM Symposium on Theory of Computing, STOC ’96, p. 212-219. Association for
Computing Machinery, New York, NY, USA (1996). https://doi.org/10.1145/237814.237866
22. Harrow, A.W., Hassidim, A., Lloyd, S.: Quantum algorithm for linear systems of equations. Phys. Rev.
Lett. 103(15), 150502 (2009)
23. Hassanieh, H., Indyk, P., Katabi, D., Price, E.: Nearly optimal sparse Fourier transform (2012)
24. Haviv, I., Regev, O.: The list-decoding size of Fourier-sparse Boolean functions (2015)
25. Indyk, P., Kapralov, M.: Sparse fourier transform in any constant dimension with nearly-optimal sample
complexity in sublinear time (2014)
26. Kushilevitz, E., Mansour, Y.: Learning decision trees using the Fourier spectrum. SIAM J. Comput.
22(6), 1331–1348 (1993). https://doi.org/10.1137/0222080
27. Lloyd, S., Mohseni, M., Rebentrost, P.: Quantum principal component analysis. Nat. Phys. 10(9),
631–633 (2014). https://doi.org/10.1038/nphys3029
28. Nielsen, M.A., Chuang, I.L.: Quantum Computation and Quantum Information: 10th Anniversary
Edition, 10th edn. Cambridge University Press, USA (2011)
29. O’Donnell, R.: Analysis of Boolean Functions. Cambridge University Press, USA (2014)
30. Rebentrost, P., Mohseni, M., Lloyd, S.: Quantum support vector machine for big data classification.
Phys. Rev. Lett. 113(13), 130503 (2014). https://doi.org/10.1103/physrevlett.113.130503
31. Roh, Y., Heo, G., Whang, S.E.: A survey on data collection for machine learning: a big data – ai
integration perspective (2019)
32. Shalev-Shwartz, S., Ben-David, S.: Understanding machine learning: From theory to algorithms. Cam-
bridge University Press (2014)
33. Shor, P.W.: Polynomial-time algorithms for prime factorization and discrete logarithms on a quantum
computer. SIAM J. Comput. 26(5), 1484–1509 (1997). https://doi.org/10.1137/S0097539795293172
34. Tang, E.: Quantum-inspired classical algorithms for principal component analysis and supervised
clustering (2018)
35. Tang, E.: A quantum-inspired classical algorithm for recommendation systems. In: Proceedings of the
51st Annual ACM SIGACT Symposium on Theory of Computing, STOC 2019, p. 217-228. Association
for Computing Machinery, New York, NY, USA (2019). https://doi.org/10.1145/3313276.3316310
36. Valiant, L.G.: A theory of the learnable. In: Proceedings of the Sixteenth Annual ACM Symposium
on Theory of Computing, STOC ’84, p. 436-445. Association for Computing Machinery, New York,
NY, USA (1984). https://doi.org/10.1145/800057.808710
37. Whalen, J., Wang, Y.: Hottest job in China's hinterlands: teaching AI to tell a truck from a turtle. The Washington Post (2019). https://www.washingtonpost.com/business/2019/09/26/hottest-job-chinas-hinterlands-teaching-ai-tell-truck-turtle/
38. Wiebe, N., Kapoor, A., Svore, K.: Quantum algorithms for nearest-neighbor methods for supervised
and unsupervised learning (2014)
39. Wiebe, N., Kapoor, A., Svore, K.M.: Quantum deep learning (2015)
40. Wiebe, N., Kapoor, A., Svore, K.M.: Quantum perceptron models (2016)
41. Zhang, C.: An improved lower bound on query complexity for quantum pac learning. Inf. Process.
Lett. 111(1), 40–45 (2010) https://doi.org/10.1016/j.ipl.2010.10.007. https://www.sciencedirect.com/
science/article/pii/S0020019010003133

Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps
and institutional affiliations.

