BQP and the Polynomial Hierarchy

(1)

BQP and the Polynomial Hierarchy

The MIT Faculty has made this article openly available.

Please share

how this access benefits you. Your story matters.

Citation

Aaronson, Scott. “BQP and the polynomial hierarchy.” in

Proceedings of the 42nd ACM Symposium on Theory of Computing.

Cambridge, Massachusetts, USA: ACM, 2010. 141-150.

Publisher

Association for Computing Machinery

Version

Original manuscript

Citable link

http://hdl.handle.net/1721.1/54236

Terms of Use

Creative Commons Attribution-Noncommercial-Share Alike 3.0

(2)

BQP and the Polynomial Hierarchy

Scott Aaronson∗

Abstract

The relationship between BQP and PH has been an open problem since the earliest days of quantum computing. We present evidence that quantum computers can solve problems outside the entire polynomial hierarchy, by relating this question to topics in circuit complexity, pseudorandomness, and Fourier analysis.

First, we show that there exists an oracle relation problem (i.e., a problem with many valid outputs) that is solvable in BQP, but not in PH. This also yields a non-oracle relation problem that is solvable in quantum logarithmic time, but not in AC0.

Second, we show that an oracle decision problem separating BQP from PH would follow from the Generalized Linial-Nisan Conjecture, which we formulate here and which is likely of independent interest. The original Linial-Nisan Conjecture (about pseudorandomness against constant-depth circuits) was recently proved by Braverman, after being open for twenty years.

1 Introduction

A central task of quantum computing theory is to understand how BQP—meaning Bounded-Error Quantum Polynomial-Time, the class of all problems feasible for a quantum computer—fits in with classical complexity classes. In their original 1993 paper defining BQP, Bernstein and Vazirani [13] showed that BPP ⊆ BQP ⊆ P#P_.1 _{Informally, this says that quantum computers are at}

least as fast as classical probabilistic computers and no more than exponentially faster (indeed, they can be simulated using an oracle for counting). Bernstein and Vazirani also gave evidence that BPP 6= BQP, by exhibiting an oracle problem called Recursive Fourier Sampling that requires nΩ(log n) queries on a classical computer but only n queries on a quantum computer.2 The evidence for the power of quantum computers became dramatically stronger a year later, when Shor [37] (building on work of Simon [38]) showed that Factoring and Discrete Logarithm are in BQP. On the other hand, Bennett et al. [11] gave oracle evidence that NP 6⊂ BQP, and while no one regards such evidence as decisive, today it seems extremely unlikely that quantum computers can solve NP-complete problems in polynomial time. A vast body of research, continuing to the present, has sought to map out the detailed boundary between those NP problems that are feasible for quantum computers and those that are not.

However, there is a complementary question that—despite being universally recognized as one of the “grand challenges” of the field—has had essentially zero progress over the last sixteen years:

Is BQP in NP? More generally, is BQP contained anywhere in the polynomial hierar-chy PH = NP_{∪ NP}NP_{∪ NP}NPNP_{∪ · · · ?}

The “default” conjecture is presumably BQP 6⊂ PH, since no one knows what a simulation of BQP in PH would look like. Before this work, however, there was no formal evidence for or against that conjecture. Almost all the problems for which we have quantum algorithms— including Factoring and Discrete Logarithm—are easily seen to be in NP_{∩ coNP.}3 _One

notable exception is Recursive Fourier Sampling, the problem that Bernstein and Vazirani [13] originally used to construct an oracle A relative to which BPPA _{6= BQP}A. One can show, without too much difficulty, that Recursive Fourier Sampling yields oracles A relative to which BQPA 6⊂ NPA and indeed BQPA 6⊂ MAA. However, while it is reasonable to conjecture that Recursive Fourier Sampling (as an oracle problem) is not in PH, it is open even to show that this problem (or any other BQP oracle problem) is not in AM! Recall that AM = NP under

1_{The upper bound was later improved to BQP ⊆ PP by Adleman, DeMarrais, and Huang [4].} 2_{For more about Recursive Fourier Sampling see Aaronson [2].}

3_{Here we exclude BQP-complete problems such as approximating the Jones polynomial [6], which, by the very}

(4)

plausible derandomization assumptions [29]. Thus, until we solve the problem of constructing an oracle A such that BQPA_{6⊂ AM}A, we cannot even claim to have oracle evidence (which is itself, of course, a weak form of evidence) that BQP6⊂ NP.

Before going further, we should clarify that there are two questions here: whether BQP _{⊆ PH} and whether PromiseBQP _{⊆ PromisePH. In the unrelativized world, it is entirely possible that} quantum computers can solve promise problems outside the polynomial hierarchy, but that all

languages in BQP are nevertheless in PH. However, for the specific purpose of constructing an oracle A such that BQPA_{6⊂ PH}A, the two questions are equivalent, basically because one can always “offload” a promise into the construction of the oracle A.4

1.1 Motivation

There are at least four reasons why the BQP versus PH question is so interesting. At a basic level, it is both theoretically and practically important to understand what classical resources are needed to simulate quantum physics. For example, when a quantum system evolves to a given state, is there always a short classical proof that it does so? Can one estimate quantum amplitudes using approximate counting (which would imply BQP⊆ BPPNP_{)? If something like this were true,}

then while the exponential speedup of Shor’s factoring algorithm might stand, quantum computing would nevertheless seem much less different from classical computing than previously thought.

Second, if BQP 6⊂ PH, then many possibilities for new quantum algorithms might open up to us. One often hears the complaint that there are too few quantum algorithms, or that progress on quantum algorithms has slowed since the mid-1990s. In our opinion, the real issue here has nothing to do with quantum computing, and is simply that there are too few natural NP-intermediate problems for which there plausibly could be quantum algorithms! In other words, instead of focussing on Graph Isomorphism and a small number of other NP-intermediate problems, it might be fruitful to look for quantum algorithms solving completely different types of problems— problems that are not necessarily even in PH. In this paper, we will see a new example of such a quantum algorithm, which solves a problem called Fourier Checking.

Third, it is natural to ask whether the P= BQP question is related to that other fundamental? question of complexity theory, P= NP. More concretely, is it possible that quantum computers? could provide exponential speedups even if P = NP? If BQP _{⊆ PH, then certainly the answer to} that question is no (since P = NP =⇒ P = PH). Therefore, if we want evidence that quantum computing could survive a collapse of P and NP, we must also seek evidence that BQP_{6⊂ PH.}

Fourth, a major challenge for quantum computing research is to get better evidence that quantum

computers cannot solve NP-complete problems in polynomial time. As an example, could we show

that if NP_{⊆ BQP, then the polynomial hierarchy collapses? At first glance, this seems like a wild} hope; certainly we have no idea at present how to prove anything of the kind. However, notice

4_{Here is a simple proof: let Π = (Π}

YES,ΠNO) be a promise problem in PromiseBQPA\ PromisePHA, for some

oracle A. Then clearly, every PromisePHA _{machine M fails to solve Π on infinitely many inputs x in Π}

YES∪ ΠNO.

This means that we can produce an infinite sequence of inputs x1, x2, . . .in ΠYES∪ ΠNO, whose lengths n1, n2, . . .

are spaced arbitrarily far apart, such that every PromisePHA _{machine M fails to solve Π on at least one x} i. Now

let B be an oracle that is identical to A, except that for each input length n, it reveals (i) whether n = ni for some

iand (ii) if so, what the corresponding xi is. Also, let L be the unary language that contains 0n if and only if (i)

(5)

that if BQP_{⊆ AM, then the desired implication would follow immediately! For in that case,} NP_{⊆ BQP =⇒ coNP ⊆ BQP}

=⇒ coNP ⊆ AM =⇒ PH = ΣP

2

where the last implication was shown by Boppana, H˚astad, and Zachos [14]. Similar remarks apply to the questions of whether NP ⊆ BQP would imply PH ⊆ BQP, and whether the folklore result NPBPP_{⊆ BPP}NP has the quantum analogue NPBQP_{⊆ BQP}NP. In each of these cases, we find that understanding some other issue in quantum complexity theory requires first coming to grips with whether BQP is contained in some level of the polynomial hierarchy.

1.2 Our Results

This paper presents the first formal evidence for the possibility that BQP6⊂ PH. Perhaps more importantly, it places the relativized BQP versus PH question at the frontier of (classical) circuit lower bounds. The heart of the problem, we will find, is to extend Braverman’s spectacular recent proof [15] of the Linial-Nisan Conjecture, in ways that would reveal a great deal of information about small-depth circuits independent of the implications for quantum computing.

We have two main contributions. First, we achieve an oracle separation between BQP and PH for the case of relation problems. A relation problem is simply a problem where the desired output is an n-bit string (rather than a single bit), and any string from some nonempty set S is acceptable. Relation problems arise often in theoretical computer science; one well-known example is finding a Nash equilibrium (shown to be PPAD-complete by Daskalakis et al. [18]). Within quantum computing, there is considerable precedent for studying relation problems as a warmup to the harder case of decision problems. For example, in 2004 Bar-Yossef, Jayram, and Kerenidis [7] gave a relation problem with quantum one-way communication complexity O (log n) and randomized one-way communication complexity Ω (√n). It took several more years for Gavinsky et al. [23] to achieve the same separation for decision problems, and the proof was much more complicated. The same phenomenon has arisen many times in quantum communication complexity [20, 21, 22, 24, 25], though to our knowledge, this is the first time it has arisen in quantum query complexity.5

Formally, our result is as follows:

Theorem 1 There exists an oracle A relative to which FBQPA _{6⊂ FBPP}PHA, where FBQP and

FBPP are the relation versions of BQP and BPP respectively.6

Interestingly, the oracle A in Theorem 1 can be taken to be a random oracle. By contrast, Aaronson and Ambainis [3] have shown that, under a plausible conjecture about influences in low-degree polynomials, one cannot separate the decision classes BQP and BPP by a random oracle without also separating P from P#Pin the unrelativized world. In other words, the use of relation problems (rather than decision problems) seems essential if one wants a random oracle separation.

5_{Recently, Shepherd and Bremner [36] proposed a beautiful sampling problem that is solvable efficiently on a}

quantum computer but that they conjecture is hard classically. This provides yet another example where it seems easier to find evidence for the power of quantum computers once one moves away from decision problems.

6_{Confusingly, the F stands for “function”; we are simply following the standard naming convention for classes of}

(6)

Underlying Theorem 1 is a new lower bound against AC0 circuits (constant-depth circuits com-posed of AND, OR, and NOT gates). The close connection between AC0 and the polynomial hierarchy that we exploit is not new. In the early 1980s, Furst-Saxe-Sipser [19] and Yao [43] no-ticed that, if we have a PH machine M that computes (say) the Parity of a 2n-bit oracle string, then by simply reinterpreting the existential quantifiers of M as OR gates and the universal quan-tifiers as AND gates, we obtain an AC0 circuit of size 2poly(n)solving the same problem. It follows that, if we can prove a 2ω(polylog n) _{lower bound on the size of AC}0 _{circuits computing Parity, we}

can construct an oracle A relative to which_⊕PA_{6⊂ PH}A. The idea is the same for constructing an A relative to which CA_{6⊂ PH}A_{, where}_{C is any complexity class.}

Indeed, the relation between PH and AC0 is so direct that we get the following as a more-or-less immediate counterpart to Theorem 1:

Theorem 2 In the unrelativized world (with no oracle), there exists a relation problem solvable in BQLOGTIMEbut not in nonuniform AC0.

Here BQLOGTIME is the class of problems solvable by logarithmic-size quantum circuits, with random access to the input string x1. . . xn. It should not be confused with the much larger class

BQNC, of problems solvable by logarithmic-depth quantum circuits.

The relation problem that we use to separate BQP from PH, and BQLOGTIME from AC0, is called Fourier Fishing. The problem can be informally stated as follows. We are given oracle access to n Boolean functions f1, . . . , fn:{0, 1}n→ {−1, 1}, which we think of as chosen uniformly

at random. The task is to output n strings, z1, . . . , zn ∈ {0, 1}n, such that the corresponding

squared Fourier coefficients bf1(z1)2, . . . , bfn(zn)2 are “often much larger than average.” Notice

that if fi is a random Boolean function, then each of its Fourier coefficients bfi(z) follows a

nor-mal distribution—meaning that with overwhelming probability, a constant fraction of the Fourier coefficients will be a constant factor larger than the mean. Furthermore, it is straightforward to create a quantum algorithm that samples each z with probability proportional to bfi(z)2, so that

larger Fourier coefficients are more likely to be sampled than smaller ones.

On the other hand, computing any specific bfi(z) is easily seen to be equivalent to summing 2n

bits. By well-known lower bounds on the size of AC0 circuits computing the Majority function (see H˚astad [40] for example), it follows that, for any fixed z, computing bfi(z) cannot be in PH

as an oracle problem. Unfortunately, this does not directly imply any separation between BQP and PH, since the quantum algorithm does not compute bfi(z) either: it just samples a z with

probability proportional to bfi(z)2. However, we will show that, if there exists a BPPPH machine

M that even approximately simulates the behavior of the quantum algorithm, then one can solve Majority_{by means of a nondeterministic reduction—which uses approximate counting to estimate} Pr [M outputs z], and adds a constant number of layers to the AC0 circuit. The central difficulty is that, if M knew the specific z for which we were interested in estimating bfi(z), then it could

choose adversarially never to output that z. To solve this, we will show that we can “smuggle” a Majority _{instance into the estimation of a random Fourier coefficient b}_f_i_{(z), in such a way that} it is information-theoretically impossible for M to determine which z we care about.

Our second contribution is to define and study a new black-box decision problem, called Fourier Checking_. _{Informally, in this problem we are given oracle access to two Boolean} functions f, g :_{{0, 1}}n_{→ {−1, 1}, and are promised that either}

(7)

(ii) f is uniformly random, while g is extremely well correlated with f ’s Fourier transform over Zn2 (which we call “forrelated”).

The problem is to decide whether (i) or (ii) is the case.

It is not hard to show that Fourier Checking is in BQP: basically, one can prepare a uniform superposition over all x_{∈ {0, 1}}n, then query f , apply a quantum Fourier transform, query g, and check whether one has recovered something close to the uniform superposition. On the other hand, being forrelated seems like an extremely “global” property of f and g: one that would not be apparent from querying any small number of f (x) and g (y) values. And thus, one might conjecture that Fourier Checking (as an oracle problem) is not in PH.

In this paper, we adduce strong evidence for that conjecture. Specifically, we show that for every k_{≤ 2}n/4_{, the forrelated distribution over}_{hf, gi pairs is O k}2_/2n/2_{-almost k-wise independent. By}

this we mean that, if one had 1/2 prior probability that f and g were uniformly random, and 1/2 prior probability that f and g were forrelated, then even conditioned on any k values of f and g, the posterior probability that f and g were forrelated would still be

1 2± O k2 2n/2 .

We conjecture that this almost k-wise independence property is enough, by itself, to imply that an oracle problem is not in PH. We call this the Generalized Linial-Nisan Conjecture.

Without the _{±O k}2/2n/2 error term, our conjecture would be equivalent7 to a famous con-jecture in circuit complexity made by Linial and Nisan [31] in 1990. Their concon-jecture stated that

polylogarithmic independence fools AC0: in other words, every probability distribution over N -bit strings that is uniform on every small subset of bits, is indistinguishable from the truly uniform distribution by AC0 circuits. When we began investigating this topic a year ago, even the original Linial-Nisan Conjecture was still open. Since then, Braverman [15] (building on earlier work by Bazzi [8] and Razborov [33]) has given a beautiful proof of that conjecture. In other words, to construct an oracle relative to which BQP_{6⊂ PH, it now suffices to generalize Braverman’s Theorem} from k-wise independent distributions to almost k-wise independent ones. We believe that this is by far the most promising approach to the BQP versus PH problem.

Alas, generalizing Braverman’s proof is much harder than one might have hoped. To prove the original Linial-Nisan Conjecture, Braverman showed that every AC0 function f : {0, 1}n → {0, 1} can be well-approximated, in the L1-norm, by low-degree sandwiching polynomials: real polynomials

pℓ, pu: Rn→ R, of degree O (polylog n), such that pℓ(x)≤ f (x) ≤ pu(x) for all x∈ {0, 1}n. Since

pℓ and pu trivially have the same expectation on any k-wise independent distribution that they

have on the uniform distribution, one can show that f must have almost the same expectation as well. To generalize Braverman’s result from k-wise independence to almost k-wise independence, we will show that it suffices to construct low-degree sandwiching polynomials that satisfy a certain additional condition. This new condition (which we call “low-fat”) basically says that pℓ and pu

must be representable as linear combinations of terms (that is, products of xi’s and (1− xi)’s), in

such a way that the sum of the absolute values of the coefficients is bounded—thereby preventing “massive cancellations” between positive and negative terms. Unfortunately, while we know two techniques for approximating AC0 functions by low-degree polynomials—that of Linial-Mansour-Nisan [30] and that of Razborov [32] and Smolensky [39]—neither technique provides anything like

(8)

the control over coefficients that we need. To construct low-fat sandwiching polynomials, it seems necessary to reprove the LMN and Razborov-Smolensky theorems in a more “conservative,” less “profligate” way. And such an advance seems likely to lead to breakthroughs in circuit complexity and computational learning theory having nothing to do with quantum computing.

Let us mention three further applications of Fourier Checking:

(1) If the Generalized Linial-Nisan Conjecture holds, then just like with Fourier Fishing, we can “scale down by an exponential,” to obtain a promise problem that is in BQLOGTIME but not in AC0.

(2) Without any assumptions, we can prove the new results that there exist oracles relative to which BQP_{6⊂ BPP}pathand BQP6⊂ SZK. We can also reprove all previous oracle separations

between BQP and classical complexity classes in a unified fashion.

(3) Independent of its applications to complexity classes, Fourier Checking yields the largest gap between classical and quantum query complexity yet known. Its bounded-error random-ized query complexity is Ω(N1/4_{) (and we conjecture Θ(}√_{N )), while its quantum query}

com-plexity is only O (1). Prior to this work, the best known gaps were of the form NΩ(1) versus O (log N ) (for Simon’s and Shor’s algorithms), or Θ (log N ) versus O (1) (for the Bernstein-Vazirani problem).8 _{Buhrman et al. [17] explicitly raised the open problem of finding a}

larger gap. We also get a property (namely, the property of being forrelated) that takes NΩ(1) queries to test classically but only O (1) queries to test quantumly.

To summarize our main conclusions:

Theorem 3 Assuming the Generalized Linial-Nisan Conjecture, there exists an oracle A relative

to which BQPA6⊂ PHA, and there also exists a promise problem in BQLOGTIME\ AC0. Uncondi-tionally, there exists an oracle A relative to which BQPA_{6⊂ BPP}A_path and BQPA_{6⊂ SZK}A.

As a candidate problem, Fourier Checking has at least five advantages over the Recursive Fourier Sampling_{problem of Bernstein and Vazirani [13]. First, it is much simpler to define and} reason about. Second, Fourier Checking has the almost k-wise independence property, which is not shared by Recursive Fourier Sampling, and which immediately connects the former to general questions about pseudorandomness against constant-depth circuits. Third, Fourier Checking _{can yield exponential separations between quantum and classical models, rather than} just quasipolynomial ones. Fourth, one can hope to use Fourier Checking to give an oracle relative to which BQP is not in PH [nc] (or PH with nc alternations) for any fixed c; by contrast, Recursive Fourier Sampling_{is in PH [log n]. Finally, it is at least conceivable that the quantum} algorithm for Fourier Checking is good for something. We leave the challenge of finding an explicit computational problem that “instantiates” Fourier Checking, in the same way that Factoring_{and Discrete Logarithm instantiated Shor’s period-finding problem.}

8_{De Beaudrap, Cleve, and Watrous [10] gave an N}Ω(1) _{versus O (1) separation, but for a non-standard model}

(9)

1.3 In Defense of Oracles

This paper is concerned with finding oracles relative to which BQP outperforms classical complexity classes. As such, it is open to the usual objections: “But don’t oracle results mislead us about the ‘real’ world? What about non-relativizing results like IP = PSPACE [35]?”

In our view, it is most helpful to think of oracle separations, not as strange metamathematical claims, but as lower bounds in a concrete computational model that is natural and well-motivated

in its own right. The model in question is query complexity, where the resource to be minimized

is the number of accesses to a very long input string. When someone gives an oracle A relative to which _CA_{6⊂ D}A_{, what they really mean is simply that they have found a problem that}_{C machines}

can solve using superpolynomially fewer queries than D machines. In other words, C has “cleared the first possible obstacle”—the query complexity obstacle—to having capabilities beyond those of D. Of course, it could be (and sometimes is) that C ⊆ D for other reasons, but if we do not even have a query complexity lower bound, then proving one is in some sense the obvious place to start. Oracle separations have played a role in many of the central developments of both classical and quantum complexity theory. As mentioned earlier, proving query complexity lower bounds for PH machines is essentially equivalent to proving size lower bounds for AC0 circuits—and indeed, the pioneering AC0 lower bounds of the early 1980s were explicitly motivated by the goal of prov-ing oracle separations for PH.9 _{Within quantum computing, oracle results have played an even}

more decisive role: the first evidence for the power of quantum computers came from the oracle separations of Bernstein-Vazirani [13] and Simon [38], and Shor’s algorithm [37] contains an oracle algorithm (for the Period-Finding problem) at its core.

Having said all that, if for some reason one still feels averse to the language of oracles, then (as mentioned before) one is free to scale everything down by an exponential, and to reinterpret a relativized separation between BQP and PH as an unrelativized separation between BQLOGTIME and AC0.

2 Preliminaries

It will be convenient to consider Boolean functions of the form f :{0, 1}n→ {−1, 1}. Throughout this paper, we let N = 2n; we will often view the truth table of a Boolean function as an “input” of size N . Given a Boolean function f :_{{0, 1}}n_{→ {−1, 1}, the Fourier transform of f is defined as}

b f (z) := √1 N X x∈{0,1}n (₋₁₎x·zf (x) . Recall Parseval’s identity: _X

x∈{0,1}n f (x)2= X z∈{0,1}n b f (z)2 = N. 2.1 Problems

We first define the Fourier Fishing problem, in both “distributional” and “promise” versions. In the distributional version, we are given oracle access to n Boolean functions f1, . . . , fn:{0, 1}n → 9_{Yao’s paper [43] was entitled “Separating the polynomial-time hierarchy by oracles”; the Furst-Saxe-Sipser paper}

(10)

{−1, 1}, which are chosen uniformly and independently at random. The task is to output n strings, z1, . . . , zn ∈ {0, 1}n, at least 75% of which satisfy

bfi(zi)

≥ 1 and at least 25% of which satisfy

bfi(zi)

≥ 2. (Note that these thresholds are not arbitrary, but were carefully chosen to produce a separation between the quantum and classical models!)

We now want a version of Fourier Fishing that removes the need to assume the fi’s are

uniformly random, replacing it with a worst-case promise on the fi’s. Call an n-tuplehf1, . . . , fni

of Boolean functions good if

n X i=1 X zi:|fbi(zi)|≥1 b fi(zi)2 ≥ 0.8Nn, n X i=1 X zi:|fbi(zi)|≥2 b fi(zi)2 ≥ 0.26Nn.

(We will show in Lemma 8 that the vast majority of_hf1, . . . , fni are good.) In Promise Fourier

Fishing_{, we are given oracle access to Boolean functions f}₁_{, . . . , f}_n_:_{{0, 1}}n_{→ {−1, 1}, which are} promised to be good. The task, again, is to output strings z1, . . . , zn ∈ {0, 1}n, at least 75% of

which satisfy _bfi(zi)

≥ 1 and at least 25% of which satisfy bfi(zi) ≥ 2.

Next we define a decision problem called Fourier Checking. Here we are given oracle access to two Boolean functions f, g :_{{0, 1}}n_{→ {−1, 1}. We are promised that either}

(i) hf, gi was drawn from the uniform distribution U, which sets every f (x) and g (y) by a fair, independent coin toss.

(ii) hf, gi was drawn from the “forrelated” distribution F, which is defined as follows. First choose a random real vector v = (vx)_x∈{0,1}n ∈ RN, by drawing each entry independently

from a Gaussian distribution with mean 0 and variance 1. Then set f (x) := sgn (vx) and

g (x) := sgn (bvx) for all x. Here

sgn (α) :=

1 if α≥ 0 −1 if α < 0 and bv is the Fourier transform of v over Zn2:

b vy := 1 √ N X x∈{0,1}n (−1)x·yvx.

In other words, f and g individually are still uniformly random, but they are no longer independent: now g is extremely well correlated with the Fourier transform of f (hence “forrelated”).

The problem is to accept ifhf, gi was drawn from F, and to reject if hf, gi was drawn from U. Note that, since F and U overlap slightly, we can only hope to succeed with overwhelming probability over the choice of_{hf, gi, not for every hf, gi pair.}

(11)

We can also define a promise-problem version of Fourier Checking. In Promise Fourier Checking_{, we are promised that the quantity}

p (f, g) := 1 N3   X x,y∈{0,1}n f (x) (−1)x·yg (y)   2

is either at least 0.05 or at most 0.01. The problem is to accept in the former case and reject in the latter case.

2.2 Complexity Classes

See the Complexity Zoo10_{for the definitions of standard complexity classes, such as BQP, AM, and}

PH. When we writeCPH _{(i.e., a complexity class} _{C with an oracle for the polynomial hierarchy),}

we mean ∪k≥1CΣ P k.

We will consider not only decision problems, but also relation problems (also called function

problems). In a relation problem, the output is not a single bit but a poly (n)-bit string y. There

could be many valid y’s for a given instance, and the algorithm’s task is to output any one of them. The definitions of FP and FNP (the relation versions of P and NP) are standard. We now define FBPP and FBQP, the relation versions of BPP and BQP.

Definition 4 FBPP is the class of relations R _{⊆ {0, 1}}∗_{× {0, 1}}∗ for which there exists a proba-bilistic polynomial-time algorithm A that, given any input x _{∈ {0, 1}}n, produces an output y such that

Pr [(x, y)_{∈ R] = 1 − o (1) ,}

where the probability is over A’s internal randomness. (In particular, this implies that for every

x, there exists at least one y such that (x, y)_{∈ R.) FBQP is defined the same way, except that A}

is a quantum algorithm rather than a classical one.

An important point about FBPP and FBQP is that, as far as we know, these classes do not admit amplification. In other words, the value of an algorithm’s success probability might actually matter, not just the fact that the probability is bounded above 1/2. This is why we adopt the convention that an algorithm “succeeds” if it outputs (x, y) _{∈ R with probability 1 − o (1). In} practice, we will give oracle problems for which the FBQP algorithm succeeds with probability 1− 1/ exp (n), while any FBPPPH_{algorithm succeeds with probability at most (say) 0.99. How far}

the constant in this separation can be improved is an open problem.

Another important point is that, while BPPPH= PPH_{(which follows from BPP}_{⊆ Σ}P

2), the class

FBPPPH is strictly larger than FPPH. To see this, consider the relation

R =_{(0n, y) : K (y)_{≥ n} ,}

where we are given n, and asked to output any string of Kolmogorov complexity at least n. Clearly this problem is in FBPP: just output a random 2n-bit string. On the other hand, just as obviously the problem is not in FPPH. This is why we need to construct an oracle A such that FBQPA 6⊂ FBPPPHA: because constructing an oracle A such that FBQPA 6⊂ FPPHA _{is trivial and not even}

related to quantum computing.

(12)

We now discuss some “low-level” complexity classes. AC0 is the class of problems solvable by a nonuniform family of AND/OR/NOT circuits, with depth O (1), size poly (n), and unbounded fanin. When we say “AC0 circuit,” we mean a constant-depth circuit of AND/OR/NOT gates, not necessarily of polynomial size. Any such circuit can be made into a formula (i.e., a circuit of fanout 1) with only a polynomial increase in size. The circuit has depth d if it consists of d alternating layers of AND and OR gates (without loss of generality, the NOT gates can all be pushed to the bottom, and we do not count them towards the depth). For example, a DNF (Disjunctive Normal Form) formula is just an AC0 circuit of depth 2.

We will also be interested in quantum logarithmic time, which can be defined naturally as follows:

Definition 5 BQLOGTIME is the class of languages L _{⊆ {0, 1}}∗ that are decidable, with bounded probability of error, by a LOGTIME-uniform family of quantum circuits {Cn}_n such that each Cn

has O (log n) gates, and can include gates that make random-access queries to the input string

x = x1. . . xn (i.e., that map|ii |zi to |ii |z ⊕ xii for every i ∈ [n]).

One other complexity class that arises in this paper, which is less well known than it should be, is BPPpath. Loosely speaking, BPPpath can be defined as the class of problems that are solvable

in probabilistic polynomial time, given the ability to “postselect” (that is, discard all runs of the computation that do not produce a desired result, even if such runs are the overwhelming majority). Formally:

Definition 6 BPPpath is the class of languages L⊆ {0, 1}∗ for which there exists a BPP machine

M , which can either “succeed” or “fail” and conditioned on succeeding either “accept” or “reject,”

such that for all inputs x: (i) Pr [M (x) succeeds] > 0.

(ii) x_{∈ L =⇒ Pr [M (x) accepts | M (x) succeeds] ≥} 2₃. (iii) x /∈ L =⇒ Pr [M (x) accepts | M (x) succeeds] ≤ 1

3.

BPPpath was defined by Han, Hemaspaandra, and Thierauf [28], who also showed that MA ⊆

BPP_path and PNP

|| ⊆ BPPpath ⊆ BPPNP_|| . Using Fourier Checking, we will construct an oracle

A relative to which BQPA _{6⊂ BPP}A_path. This result might not sound amazing, but (i) it is new, (ii) it does not follow from the “standard” quantum algorithms, such as those of Simon [38] and Shor [37], and (iii) it supersedes almost all previous oracle results placing BQP outside classical complexity classes.11 _{As another illustration of the versatility of Fourier Checking, we use it}

to give an A such that BQPA 6⊂ SZKA, where SZK is Statistical Zero Knowledge. The opposite direction—an A such that SZKA6⊂ BQPA—was shown by Aaronson [1] in 2002.

3 Quantum Algorithms

In this section, we show that Fourier Fishing and Fourier Checking both admit simple quantum algorithms.

11_{The one exception is the result of Green and Pruim [27] that there exists an A relative to which BQP}A

6⊂ PNPA_,

(13)

Figure 1: The Fourier coefficients of a random Boolean function follow a Gaussian distribution, with mean 0 and variance 1. However, larger Fourier coefficients are more likely to be observed by the quantum algorithm.

3.1 Quantum Algorithm for Fourier Fishing

Here is a quantum algorithm, FF-ALG, that solves Fourier Fishing with overwhelming probability in O n2 time and n quantum queries (one to each fi). For i := 1 to n, first prepare the state

1 √ N X x∈{0,1}n fi(x)|xi ,

then apply Hadamard gates to all n qubits, then measure in the computational basis and output the result as zi.

Intuitively, FF-ALG samples the Fourier coefficients of each fiunder a distribution that is skewed

towards larger coefficients; the algorithm’s behavior is illustrated pictorially in Figure 1. We now give a formal analysis. Recall the definition of a “good” tuple _hf1, . . . , fni from Section 2.1.

Assuminghf1, . . . , fni is good, it is easy to analyze FF-ALG’s success probability.

Lemma 7 Assuming _hf1, . . . , fni is good, FF-ALG succeeds with probability 1 − 1/ exp (n).

Proof. Let_hz1, . . . , zni be the algorithm’s output. For each i, let Xi be the event that

bfi(zi) ≥ 1 and let Yi be the event that

bfi(zi)

≥ 2. Also let pi := Pr [Xi] and qi := Pr [Yi], where the

probability is over FF-ALG’s internal (quantum) randomness. Then clearly

pi= 1 N X zi:|fbi(zi)|≥1 b fi(zi)2, qi= 1 N X zi:|fbi(zi)|≥2 b fi(zi)2.

(14)

So by assumption,

p1+· · · + pn≥ 0.8n,

q1+· · · + qn≥ 0.26n.

By a Chernoff/Hoeffding bound, it follows that

Pr [X1+· · · + Xn≥ 0.75n] > 1 − 1 exp (n), Pr [Y1+· · · + Yn≥ 0.25n] > 1 − 1 exp (n).

Hence FF-ALG succeeds with 1− 1/ exp (n) probability by the union bound. We also have the following:

Lemma 8 _hf1, . . . , fni is good with probability 1 − 1/ exp (n), if the fi’s are chosen uniformly at

random.

Proof. Choose f :_{{0, 1}}n_{→ {−1, 1} uniformly at random. Then for each z, the Fourier coefficient} b

f (z) follows a normal distribution, with mean 0 and variance 1. So in the limit of large N ,

E f    X z:_|f (z)b _|≥1 b f (z)2    = X z∈{0,1}n Prh_bf (z)_{≥ 1}iEhf (z)b 2 _| _bf (z)_{≥ 1}i ≈ √2N 2π Z _∞ 1 e−x2/2x2dx ≈ 0.801N. Likewise, E f    X z:|f (z)b |≥2 b f (z)2    ≈ √2N 2π Z _∞ 2 e−x2/2x2dx ≈ 0.261N.

Since the fi’s are chosen independently of one another, it follows by a Chernoff bound that n X i=1 X zi:|fbi(zi)|≥1 b fi(zi)2 ≥ 0.8Nn, n X i=1 X zi:|fbi(zi)|≥2 b fi(zi)2 ≥ 0.26Nn

with probability 1_{− 1/ exp (n) over the choice of hf}1, . . . , fni.

Combining Lemmas 7 and 8, we find that FF-ALG succeeds with probability 1−1/ exp (n), where the probability is over bothhf1, . . . , fni and FF-ALG’s internal randomness.

(15)

3.2 Quantum Algorithm for Fourier Checking

We now turn to Fourier Checking, the problem of deciding whether two Boolean functions f, g are independent or forrelated. Here is a quantum algorithm, FC-ALG, that solves Fourier Check-ing_{with constant error probability using O (1) queries. First prepare a uniform superposition over} all x_{∈ {0, 1}}n. Then query f in superposition, to create the state

1 √ N X x∈{0,1}n f (x)|xi Then apply Hadamard gates to all n qubits, to create the state

1 N

X

x,y∈{0,1}n

f (x) (₋₁₎x·y_{|yi .} Then query g in superposition, to create the state

1 N

X

x,y∈{0,1}n

f (x) (−1)x·yg (y)|yi . Then apply Hadamard gates to all n qubits again, to create the state

1 N3/2

X

x,y,z∈{0,1}n

f (x) (₋₁₎x·yg (y) (₋₁₎y·z_{|zi .}

Finally, measure in the computational basis, and “accept” if and only if the outcome _|0i⊗n is observed. If needed, repeat the whole algorithm O (1) times to boost the success probability.

It is clear that the probability of observing _|0i⊗n (in a single run of FC-ALG) equals

p (f, g) := 1 N3   X x,y∈{0,1}n f (x) (−1)x·yg (y)   2 .

Recall that Promise Fourier Checking was the problem of deciding whether p (f, g) _{≥ 0.05} or p (f, g) ≤ 0.01, promised that one of these is the case. Thus, we immediately get a quantum algorithm to solve Promise Fourier Checking, with constant error probability, using O (1) queries to f and g.

For the distributional version of Fourier Checking, we also need the following theorem.

Theorem 9 If hf, gi is drawn from the uniform distribution U, then E

U[p (f, g)] =

1 N.

If _{hf, gi is drawn from the forrelated distribution F, then}

E

(16)

Proof. The first part follows immediately by symmetry (i.e., the fact that all N = 2nmeasurement outcomes of the quantum algorithm are equally likely).

For the second part, let v∈ RN _{be the vector of independent Gaussians used to generate f and}

g, let w = v/_kvk₂be v scaled to have unit norm, and let H be the n-qubit Hadamard matrix. Also let flat (w) be the unit vector whose xth _{entry is sgn (w}

x) /

√

N = f (x) /√N , and let flat (Hw) be the unit vector whose xth _{entry is sgn (bv}x) /

√

N = g (x) /√N . Then p (f, g) equals

flat (w)T H flat (Hw)2,

or the squared inner product between the vectors flat (w) and H flat (Hw). Note that wT_{· HHw =}

wTw = 1. So the whole problem is to understand the “discretization error” incurred in replacing wT by flat (w)T and HHw by H flat (Hw). By the triangle inequality, the angle between flat (w) and H flat (Hw) is at most the angle between flat (w) and w, plus the angle between w and H flat (Hw). In other words:

arccosflat (w)T H flat (Hw)_{≤ arccos}flat (w)Tw+ arccos wTH flat (Hw). Now, flat (w)T w = X x∈{0,1}n wx·√1 N |wx| wx = √1 N X x∈{0,1}n |wx| = P x∈{0,1}n|vx| √ N_kvk₂ .

Recall that each vx is an independent real Gaussian with mean 0 and variance 1, meaning that

each |vx| is an independent nonnegative random variable with expectation p2/π. So by standard

tail bounds, for all constants ε > 0 we have

Pr v   X x∈{0,1}n |vx| ≤ r 2 π − ε ! N   ≤ 1 exp (N ), Prhkvk22≥ (1 + ε) N i ≤ 1 exp (N ). So by the union bound,

Pr v " flat (w)T w≤ r 2 π − ε # ≤ 1 exp (N ).

Since H is unitary, the same analysis applies to wT_{H flat (Hw). Therefore, for all constants ε > 0,}

with 1− 1/ exp (N) probability we have

arccosflat (w)T w_≤ arccos r

2 π

! + ε,

arccos wTH flat (Hw)_≤ arccos r

2 π

! + ε.

(17)

So setting ε = 0.0001,

arccosflat (w)T H flat (Hw)≤ arccosflat (w)T w+ arccos wTH flat (Hw) ≤ 2 arccos r 2 π ! + 2ε ≤ 1.3

Therefore, with 1_{− 1/ exp (N) probability over hf, gi drawn from F,} flat (w)T H flat (Hw)≥ cos 1.3, in which case p (f, g)≥ (cos 1.3)2≈ 0.072.

Combining Theorem 9 with Markov’s inequality, we immediately get the following:

Corollary 10 Pr hf,gi∼U[p (f, g)≥ 0.01] ≤ 100 N , Pr hf,gi∼D[p (f, g)≥ 0.05] ≥ 1 50.

4 The Classical Complexity of Fourier Fishing

In Section 3.1, we gave a quantum algorithm for Fourier Fishing that made only one query to each fi. By contrast, it is not hard to show that any classical algorithm for Fourier Fishing

requires exponentially many queries to the fi’s—or in complexity terms, that Fourier Fishing is

not in FBPP. In this section, we prove a much stronger result: that Fourier Fishing is not even in FBPPPH. This result does not rely on any unproved conjectures.

4.1 Constant-Depth Circuit Lower Bounds

Our starting point will be the following AC0 lower bound, which can be found in the book of H˚astad [40] for example.

Theorem 11 ([40]) Any depth-d circuit that accepts all n-bit strings of Hamming weight n/2 + 1,

and rejects all strings of Hamming weight n/2, has size exp Ω n1/(d−1).

We now give a corollary of Theorem 11, which (though simple) seems to be new, and might be of independent interest. Consider the following problem, which we call ε-Bias Detection. We are given a string y = y1. . . ym ∈ {0, 1}m, and are promised that each bit yi is 1 with independent

probability p. The task is to decide whether p = 1/2 or p = 1/2 + ε.

Corollary 12 Let U [ε] be the distribution over {0, 1}m where each bit is 1 with independent prob-ability 1/2 + ε. Then any depth-d circuit C such that

Pr_x∼U[ε][C (x)]− Pr_x∼U[0][C (x)] = Ω (1)

(18)

Proof. Suppose such a distinguishing circuit C exists, with depth d and size S, for some ε > 0 (the parameter m is actually irrelevant). Let n = 1/ε, and assume for simplicity that n is an integer. Using C, we will construct a new circuit C′ with depth d′ = d + 3 and size S′= O (nS) + poly (n), which accepts all strings x_{∈ {0, 1}}nof Hamming weight n/2+ 1, and rejects all strings of Hamming weight n/2. By Theorem 11, this will imply that the original circuit C must have had size

S = 1 nexp

Ωn1/(d′−1)_{− poly (n)} = expΩ1/ε1/(d+2).

So fix an input x_{∈ {0, 1}}n, and suppose we choose m bits xi1, . . . , xim from x, with each index ij

chosen uniformly at random with replacement. Call the resulting m-bit string y. Observe that if x had Hamming weight n/2, then y will be distributed according to_{U [0], while if x had Hamming} weight n/2 + 1, then y will be distributed according to_{U [ε]. So by assumption,}

Pr [C (y) _{| |x| = n/2] = α,} Pr [C (y) _{| |x| = n/2 + 1] = α + δ}

for some constants α and δ6= 0 (we can assume δ > 0 without loss of generality).

Now suppose we repeat the above experiment T = kn times, for some constant k = k (α, δ). That is, we create T strings y1, . . . , yT by choosing random bits of x, so that each yi is distributed

independently according toU [0] if |x| = n/2 or U [ε] if |x| = n/2 + 1. We then apply C to each yi.

Let

Z = C (y1) +· · · + C (yT)

be the number of C invocations that accept. Then by a Chernoff bound, if _{|x| = n/2 then} Pr Z > αT +δ 3T < exp (_{−n) ,} while if _{|x| = n/2 + 1 then} Pr Z < αT +2δ 3 T < exp (_{−n) .}

By taking k large enough, we can make both of these probabilities less than 2−n. By the union bound, this implies that there must exist a way to choose y1, . . . , yT so that

|x| = n₂ =_{⇒ Z ≤ αT +}δ 3T, |x| = n 2 + 1 =⇒ Z ≥ αT + 2δ 3 T

for every x with_{|x| ∈ {n/2, n/2 + 1} simultaneously. In forming the circuit C}′_{, we simply hardwire}

that choice.

The last step is to decide whether Z _{≤ αT +} δ₃T or Z _{≥ αT +} 2δ₃ T . This can be done using an AC0 circuit for the Approximate Majority problem (see Viola [41] for example), which has depth 3 and size poly (T ). The end result is a circuit C′to distinguish|x| = n/2 from |x| = n/2+1, which has depth d + 3 and size T S + poly (T ) = O (nS) + poly (n).

(19)

4.2 Secretly Biased Fourier Coefficients

In this section, we prove two lemmas indicating that one can slightly bias one of the Fourier coeffi-cients of a random Boolean function f :_{{0, 1}}n_{→ {−1, 1}, and yet still have f be indistinguishable} from a random Boolean function (so that, in particular, an adversary has no way of knowing which Fourier coefficient was biased). These lemmas will play a key role in our reduction from ε-Bias Detection _{to Fourier Fishing.}

Fix a string s∈ {0, 1}n. Let A [s] be the probability distribution over functions f : {0, 1}n → {−1, 1} where each f (x) is 1 with independent probability 12 + (−1)s·x2√1N, and let B [s] be the

distribution where each f (x) is 1 with independent probability 1₂_{− (−1)}s·x 1

2√N. Then let D [s] = 1

2(A [s] + B [s]) (that is, an equal mixture of A [s] and B [s]).

Now consider the following game involving Alice and Bob, which closely models our problem. First, Alice chooses a string s∈ {0, 1}n uniformly at random. She then draws f according toD [s]. She keeps s secret, but sends the truth table of f to Bob. After examining the entire truth table, Bob can output any string z ∈ {0, 1}n such that _bf (z)_{≥ β. Bob’s goal is to avoid outputting s.} The following lemma says that, regardless of Bob’s strategy, if β is sufficiently greater than 1, then Bob must output s with probability significantly greater than 1/N .

Lemma 13 In the above game, we have

Pr [s = z]_≥ e

β_{+ e}−β

2√eN .

where the probability is over s, f , and Bob’s random choices.

The intuition behind Lemma 13 is simple: f looks to Bob almost exactly like a random Boolean function. Let Vβ be the set of all s∈ {0, 1}n such that

bf (s)_{≥ β. Then when Alice biases f ,} of course she significantly increases the probability that s∈ Vβ. But the biased Fourier coefficient

b

f (s) still gets “lost in the noise”—that is, it is information-theoretically impossible for Bob to tell s apart from the exponentially many other strings z that are in Vβ just by chance. Therefore, the

best Bob can do is basically just to output a random element of Vβ. But since |Vβ| ≪ N, this

strategy will cause Bob to output s with probability _{≫ 1/N.}

Formalizing the above intuition involves a surprisingly simple Bayesian calculation. Basically, all we need to do is fix f together with Bob’s output z (which we know belongs to Vβ), and then

calculate how likely f was to be drawn from _{D [z] rather than D [z}′_{] for some z}′ _{6= z.}

To see the connection to our problem, think of Bob as a putative AC0 circuit for Fourier Fishing_{, and Alice as an AC}0 _{reduction that uses Bob to solve ε-Bias Detection. Bob might}

try not to cooperate, but by choosing s randomly, Alice can force even an adversarial Bob to give

her useful information about the magnitude of bf (s).

Proof of Lemma 13. By Yao’s principle, we can assume without loss of generality that Bob’s strategy is deterministic. For each z, let_{F [z] be the set of all f’s that cause Bob to output z. Then} the first step is to lower-bound Pr_D[z][f ], for some fixed z and f ∈ F [z]. Let Nf[z] be the number

of inputs x _{∈ {0, 1}}n such that f (x) = (₋₁₎z·x. It is not hard to see that Nf[z] = N₂ + √

N bf (z) 2 .

(20)

So Pr D[z][f ] = 1 2 Pr A[z][f ] + PrB[z][f ] = 1 2   Y x∈{0,1}n 1 2 + (−1)z·xf (x) 2√N + Y x∈{0,1}n 1 2 − (−1)z·xf (x) 2√N   = 1 2N +1 1 + √1 N Nf[z] 1₋√1 N N −Nf[z] + 1₋√1 N Nf[z] 1 + √1 N N −Nf[z]! = 1 2N +1 1₋ 1 N N/2  1 + 1/ √ N 1_{− 1/}√N !√N bf (z)/2 + 1− 1/ √ N 1 + 1/√N !√N bf (z)/2  ≥ ₂√1 e2N ef (z)b + e− bf (z) ≥ e β_{+ e}−β 2√e2N .

Here the last line follows from the assumption _bf (z)_{≥ β, together with the fact that e}y + e−y increases monotonically away from y = 0.

Summing over all z and f ,

Pr [s = z] = X z∈{0,1}n X f ∈F[z] Pr [f ]_{· Pr [s = z | f]} = X z∈{0,1}n X f ∈F[z] Pr [f ]·Pr [f | s = z] Pr [s = z] Pr [f ] = 1 N X z∈{0,1}n X f ∈F[z] Pr D[z][f ] ≥ e β_{+ e}−β 2√eN .

Now let _{D = E}s[D [s]] (that is, an equal mixture of all the D [s]’s). We claim that D is

extremely close in variation distance to U, the uniform distribution over all Boolean functions f :_{{0, 1}}n_{→ {−1, 1}.}

Lemma 14 _{kD − Uk = O}1/√N.

Proof. By a calculation from Lemma 13, for all f and s we have

Pr D[s][f ] = 1 2√e2N ef (s)b + e− bf (s) in the limit of large N . Hence

Pr D [f ] = Es Pr D[s][f ] = 1 2√eN 2N X s∈{0,1}n ef (s)b + e− bf (s).

(21)

Clearly Ef[PrD[f ]] = 1/2N. Our goal is to upper-bound the variance Varf[PrD[f ]], which

mea-sures the distance from _{D to the uniform distribution. In the limit of large N, we have} E f Pr D [f ] 2 = 1 4eN2₂2N  X s E f ef (s)b + e− bf (s)2 +X s6=t E f h ef (s)b + e− bf (s) ef (t)b + e− bf (t)i   = 1 4eN2₂2N   P s √1_2π R_∞ −∞e−x 2_/2 (ex_{+ e}−x₎2_dx +P_s6=th√1 2π R∞ −∞e−x 2_/2 (ex+ e−x) dxi2   = 1 4eN2₂2N 2e2+ 2N + 4eN (N _{− 1)} = 1 22N 1 + (e_{− 1)}2 2eN ! . Hence Var f Pr D [f ] = E f Pr D [f ] 2 − E f Pr D [f ] 2 = (e− 1) 2 2eN 22N. So by Cauchy-Schwarz, E f Pr_D [f ]− Pr_U [f ] ≤ s Var f Pr D [f ] = √e− 1 2eN · 1 2N and kD − Uk = 1 2 X f Pr_D [f ]− Pr_U [f ] = O 1 √ N .

An immediate corollary of Lemma 14 is that, if a Fourier Fishing algorithm succeeds with probability p onhf1, . . . , fni drawn from Un, then it also succeeds with probability at least

p_{− kD}n_{− U}n_{k ≥ p − O} n √ N on _hf1, . . . , fni drawn from Dn. 4.3 Putting It All Together

Using the results of Sections 4.1 and 4.2, we are now ready to prove a lower bound on the constant-depth circuit complexity of Fourier Fishing.

Theorem 15 Any depth-d circuit that solves the Fourier Fishing problem, with probability at

least 0.99 over f1, . . . , fn chosen uniformly at random, has size exp Ω N1/(2d+8).

Proof. Let C be a circuit of depth d and size s. Let G be the set of all_hf1, . . . , fni on which C

succeeds: that is, for which it outputs z1, . . . , zn, at least 75% of which satisfy

bfi(zi) ≥ 1 and at least 25% of which satisfy_bfi(zi)

≥ 2. Suppose Pr

(22)

Then by Lemma 14, we also have Pr Dn[hf1, . . . , fni ∈ G] ≥ 0.99 − O n √ N ≥ 0.98 for sufficiently large n.

Using the above fact, we will convert C into a new circuit C′ that solves the ε-Bias Detection problem of Corollary 12, with ε := 1

2√N. This C′ will have depth d′ = d + 2 and size S′ = O (N S).

By Corollary 12, this will imply that C itself must have had size

S = expΩ1/ε1/(d′+2) = expΩN1/(2d+8). Let M = N2_{n, and let R = r}

1. . . rM ∈ {0, 1}M be a string of bits where each rj is 1 with

independent probability p. We want to decide whether p = 1/2 or p = 1/2 + ε—that is, whether R was drawn from_{U [0] or U [ε]. We can do this as follows. First, choose strings s}1, . . . , sn ∈ {0, 1}n,

bits b1, . . . , bn∈ {0, 1}, and an integer k ∈ [n] uniformly at random. Next, define Boolean functions

f1, . . . , fn:{0, 1}n → {−1, 1} using the first Nn bits of R, like so:

fi(x) := (−1)r(i−1)N+x+si·x+bi.

Finally, feed _hf1, . . . , fni as input to C, and consider zk, the kth output of C (discarding the other

n− 1 outputs). We are interested in Pr [zk= sk], where the probability is over R, s1, . . . , sn,

b1, . . . , bn, and k.

If p = 1/2, notice that f1, . . . , fnare independent and uniformly random regardless of s1, . . . , sn.

So C gets no information about sk, and Pr [zk= sk] = 1/N .

On the other hand, if p = 1/2 + ε, then each fi is drawn independently from the distribution

D [s] studied in Lemma 13. So by the lemma, for every i ∈ [n], if bfi(zi)

≥ β then Pr fi [zi = si]≥ eβ + e−β 2√eN . So assuming C succeeds (that is, hf1, . . . , fni ∈ G), we have

Pr f1,...,fn,k [zk= sk]≥ 1 4 e2_{+ e}−2 2√eN + 3 4 − 1 4 e1_{+ e}−1 2√eN ≥ 1.038 N .

So for a random _hf1, . . . , fni drawn according to Dn,

Pr f1,...,fn,k [zk= sk]≥ 0.98 1.038 N ≥ 1.017 N .

Notice that this is bounded above 1/N by a multiplicative constant.

Now we repeat the above experiment N times. That is, for all j := 1 to N , we generate Boolean functions fj1, . . . , fjn :{0, 1}n → {−1, 1} by the same probabilistic procedure as before,

(23)

sj1, . . . , sjn, bj1, . . . , bjn, and kj). We then apply C to each n-tuple hfj1, . . . , fjni. Let zj be the

kth

j string that C outputs when run onhfj1, . . . , fjni. Then by the above, for each j ∈ [N] we have

p = 1 2 =⇒ Pr zj = sjkj = 1 N, p =1 2 + ε =⇒ Pr zj = sjkj ≥ 1.017_N .

Furthermore, these probabilities are independent across the different j’s. So let E be the event that there exists a j _{∈ [N] such that z}j = sjkj. Then if p = 1/2 we have

Pr [E] = 1₋ 1₋ 1 N N ≈ 1 −1_e ≤ 0.633, while if p = 1/2 + ε we have Pr [E]_{≥ 1 −} 1₋1.017 N N ≥ 0.638.

We can now create the circuit C′, which distinguishes R ∈ {0, 1}M drawn from U [0] from R drawn from U [ε] with constant bias. For each j ∈ [N], generate an n-tuple of Boolean functions hfj1, . . . , fjni from R and apply C to it; then check whether there exists a j ∈ [N] such that

zj = sjkj. This checking step can be done by a depth-2 circuit of size O (N n). Therefore, C′ will

have depth d′ = d + 2 and size s′ = O (N s). A technicality is that our choices of the sji’s, bji’s,

and kj’s were made randomly. But by Yao’s principle, there certainly exist sji’s, bji’s, and kj’s

such that Pr U[ε] C′(R)_{− Pr} U[0] C′(R)_{≥ 0.638 − 0.633 = 0.005.} So in forming C′_{, we simply hardwire those choices.}

Combining Theorem 15 with standard diagonalization tricks, we can now prove an oracle sep-aration (in fact, a random oracle sepsep-aration) between the complexity classes FBQP and FBPPPH.

Theorem 16 FBQPA_{6⊂ FBPP}PHA with probability 1 for a random oracle A.

Proof. We interpret A as encoding n random Boolean functions fn1, . . . , fnn :{0, 1}n→ {−1, 1}

for each positive integer n. Let R be the relational problem where we are given 0n _{as input,}

and succeed if and only if we output strings z1, . . . , zn ∈ {0, 1}n, at least 3/4 of which satisfy

bfni(zi)

≥ 1 and at least 1/4 of which satisfy bfni(zi)

≥ 2. Then by Lemmas 7 and 8, there exists an FBQPAmachine M such that for all n,

Pr [M (0n) succeeds]_{≥ 1 −} 1 exp (n),

where the probability is over both A and the quantum randomness. Hence Pr [M (0n) succeeds]_≥ 1_{− 1/ exp (n) on all but finitely many n, with probability 1 over A. Since we can simply hardwire} the answers on the n’s for which M fails, it follows that R∈ FBQPAwith probability 1 over A.

On the other hand, let M be an FBPPPHA machine. Then by the standard conversion between PHand AC0, for every n there exists a probabilistic AC0 circuit CM,n, of size 2poly(n) = 2polylog(N ),

(24)

that takes A as input and simulates M (0n). By Yao’s principle, we can assume without loss of generality that CM,n is deterministic, since the oracle A is already random. Then by Theorem 15,

Pr

A [CM,n succeeds] < 0.99

for all sufficiently large n. By the independence of the fni’s, this is true even if we condition on

CM,1, . . . , CM,n−1 succeeding. So as in the standard random oracle argument of Bennett and Gill

[12], for every fixed M we have Pr

A [CM,1, CM,2, CM,3, . . . all succeed] = 0.

So by the union bound, Pr

A [∃M : CM,1, CM,2, CM,3, . . . all succeed] = 0

as well. It follows that FBQPA6⊂ FBPPPHA _{with probability 1 over A.}

If we “scale down by an exponential,” then we can eliminate the need for the oracle A, and get a relation problem that is solvable in quantum logarithmic time but not in AC0.

Theorem 17 There exists a relation problem in BQLOGTIME_{\ AC}0.

Proof. In our relation problem R, the input (of size M = 2nn) will encode the truth tables of n Boolean functions, f1, . . . , fn : {0, 1}n → {−1, 1}, which are promised to be “good” as defined in

Section 2.1. The task is to solve Promise Fourier Fishing on _hf1, . . . , fni.

By Lemma 7, there exists a quantum algorithm that runs in O (n) = O (log M ) time, making random accesses to the truth tables of f1, . . . , fn, that solves R with probability 1− 1/ exp (n) =

1_{− 1/M}Ω(1)_.

On the other hand, suppose R is in AC0. Then we get a nonuniform circuit family{Cn}_n, of

depth O (1) and size poly (M ) = 2O(n), that solves Fourier Fishing on all tupleshf1, . . . , fni that

are good. Recall that by Lemma 8, a 1_{− 1/ exp (n) fraction of hf}1, . . . , fni’s are good. Therefore

{Cn}_n actually solves Fourier Fishing with probability 1− 1/ exp (n) on hf1, . . . , fni chosen

uniformly at random. But this contradicts Theorem 15.

Hence R _{∈ FBQLOGTIME \ FAC}0 (where FBQLOGTIME and FAC0 are the relation versions of BQLOGTIMEand AC0 respectively).

5 The Classical Complexity of Fourier Checking

Section 4 settled the relativized BQP versus PH question, if we are willing to talk about relation problems. Ultimately, though, we also care about decision problems. So in this section we consider the Fourier Checking problem, of deciding whether two Boolean functions f, g are independent or forrelated. In Section 3.2, we saw that Fourier Checking has quantum query complexity O (1). What is its classical query complexity?12

It is not hard to give a classical algorithm that solves Fourier Checking using O√N= O 2n/2 queries. The algorithm is as follows: for some K = Θ√N, first choose sets X =

12_{So long as we consider the distributional version of Fourier Checking, the deterministic and randomized query}

(25)

{x1, . . . , xK} and Y = {y1, . . . , yK} of n-bit strings uniformly at random. Then query f (xi) and

g (yi) for all i∈ [K]. Finally, compute

Z :=

K

X

i,j=1

f (xi) (−1)xi·yjg (yj) ,

accept if|Z| is greater than some cutoff cK, and reject otherwise. For suitable K and c, one can show that this algorithm accepts a forrelated hf, gi pair with probability at least 2/3, and accepts a random_{hf, gi pair with probability at most 1/3. We omit the details of the analysis, as they are} tedious and not needed elsewhere in the paper.

In the next section, we will show that Fourier Checking has a property called almost k-wise

independence, which immediately implies a lower bound of Ω√4

N= Ω 2n/4 _{on its randomized}

query complexity13 _{(as well as exponential lower bounds on its MA, BPP}

path, and SZK query

complexities). Indeed, we conjecture that almost k-wise independence is enough to imply that Fourier Checking _{is not in PH. We discuss the status of that conjecture in Section 6.}

5.1 Almost k-Wise Independence

Let X = x1. . . xN ∈ {0, 1}N be a string. Then a literal is an expression of the form xi or 1− xi,

and a k-term is a product of k literals (each involving a different xi), which is 1 if the literals all

take on prescribed values and 0 otherwise. We can give an analogous definition for strings over the alphabet{−1, 1}; we simply let the literals have the form 1±xi

2 .

LetU be the uniform distribution over N-bit strings. The following definition will play a major role in this work.

Definition 18 A distribution _{D over N-bit strings is ε-almost k-wise independent if for every} k-term C,

1− ε ≤ PrX∼D[C (X)]

Pr_X∼U[C (X)] ≤ 1 + ε.

(Note that Pr_X∼U[C (X)] is just 2−k_.)

In other words, there should be no assignment to any k input bits, such that conditioning on that assignment gives us much information about whether X was drawn from _{D or U.}

Now let_{F be the forrelated distribution over pairs of Boolean functions f, g : {0, 1}}n_{→ {−1, 1},} where we first choose a vector v = (vx)_x∈{0,1}n ∈ RN of independentN (0, 1) Gaussians, then set

f (x) := sgn (vx) and g (x) := sgn (bvx) for all x∈ {0, 1}n.

Theorem 19 _{F is O}k2/√N-almost k-wise independent for all k_≤√4

N .

Proof. As a first step, we will prove an analogous statement for the real-valued functions F (x) := vx and G (y) := bvy; then we will generalize to the discrete versions f (x) and g (y). LetU′ be the

probability measure over _{hF, Gi that corresponds to case (i) of Fourier Checking: that is, we}

13_{With some more work, we believe that the randomized query complexity lower bound can be improved to}

(26)

choose each F (x) and G (x) value independently from the Gaussian measure _{N (0, 1). Let F}′ be the probability measure over_{hF, Gi that corresponds to case (ii) of Fourier Checking: that is,} we choose each F (x) independently fromN (0, 1), and then set G (y) := bF (y) where

b F (y) = √1 N X x∈{0,1}n (−1)x·yF (x)

is the Fourier transform of F . Observe that since the Fourier transform is unitary, G has the same marginal distribution as F underF′_{: namely, a product of independent}_{N (0, 1) Gaussians.}

Fix inputs x1, . . . , xK ∈ {0, 1}n of F and y1, . . . , yL ∈ {0, 1}n of G, for some K, L ≤ N1/4.

Then given constants a1, . . . , aK, b1, . . . , bL∈ R, let S be the set of all hF, Gi that satisfy the K + L

equations

F (xi) = ai for 1≤ i ≤ K, (1)

G (yj) = bj for 1≤ j ≤ L.

We can think of S as a (2N− K − L)-dimensional affine subspace in R2N_. _{The measure of S,}

under some probability measure µ on R2N_{, is simply}

µ (S) := Z hF,Gi∈S µ (F, G) dhF, Gi . Now let ∆S := a21+· · · + a2K+ b21+· · · + b2L

be the squared distance between S and the origin (that is, the minimum squared 2-norm of any point in S). Then by the spherical symmetry of the Gaussian measure, it is not hard to see that S has measure

U′(S) = e−∆

S/2

√ 2πK+L underU′_{. Our key claim is that}

1− O _{(K + L) ∆} S √ N ≤ F′(S) U′_(S) ≤ 1 + O _{(K + L) ∆} S √ N .

To prove this claim: recall that the probability measure over F induced byF′ _{is just a spherical}

Gaussian _{G on R}N, and that G = bF uniquely determines F and vice versa. So consider the (N _{− K − L)-dimensional affine subspace T of R}N defined by the K + L equations

F (xi) = ai for 1≤ i ≤ K,

b

F (yj) = bj for 1≤ j ≤ L.

ThenF′_{(S) =}_{G (T ): that is, to compute how much measure F}′ _{assigns to S, it suffices to compute}

how much measure_{G assigns to T . We have}

G (T ) = e−∆

T/2

√ 2πK+L

(27)

where ∆T is the squared Euclidean distance between T and the origin. Thus, our problem reduces to minimizing ∆F := X x∈{0,1}n F (x)2

over all F _{∈ T . By a standard fact about quadratic optimization, the optimal F ∈ T will have the} form F (x) = α1E1(x) +· · · + αKEK(x) + β1χ1(x) +· · · + βLχL(x) where Ei(x) := 1 if x = xi 0 otherwise is an indicator function, and

χj(x) :=

(₋₁₎x·yj

√ N

is the y_jth Fourier character evaluated at x. Furthermore, the coefficients {αi}_i∈[K],{βj}_j∈[L] can

be obtained by solving the linear system           1 0 0 _±1/√N _{· · · ±1/}√N 0 . .. 0 ... . .. ... 0 0 1 _±1/√N _{· · · ±1/}√N ±1/√N _{· · · ±1/}√N 1 0 0 .. . . .. ... 0 . .. 0 ±1/√N · · · ±1/√N 0 0 1           | {z } A           α1 .. . αK β1 .. . βL           | {z } u =           a1 .. . aK b1 .. . bL           | {z } w

Here A is simply a matrix of covariances: the top left block records the inner product between each Ei and Ej (and hence is a K× K identity matrix), the bottom right block records the inner

product between each χi and χj (and hence is an L× L identity matrix), and the remaining two

blocks of size K_{× L record the inner product between each E}i and χj.

Thus, to get the vector of coefficients u ∈ RK+L_{, we simply need to calculate A}−1_w. _Let

B = I_{− A. Then by Taylor series expansion,}

A−1 = (I_{− B)}−1= I + B + B2+ B3+_{· · ·}

Notice that every entry of B is at most 1/√N in absolute value. This means that, for all positive integers t, every entry of Bt _{is at most}

(K + L)t−1 Nt/2

in absolute value. Since K + L≪√N , this in turn means that every entry of I− A−1_{has absolute}

value O1/√N. So A−1 _{is exponentially close to the identity matrix. Hence, when we compute}

the vector u = A−1w, we find that

αi= ai+ εi for all 1≤ i ≤ K,

(28)

for some small error terms εi and δj. Specifically, each εi and δj is the inner product of w, a

(K + L)-dimensional vector of length √∆S, with a vector every entry of which has absolute value

O1/√N. By Cauchy-Schwarz, this implies that

|εi| , |δj| = O r (K + L) ∆S N ! for all i, j. So ∆T = min F ∈T X x∈{0,1}n F (x)2 = K X i=1 α2_i + L X j=1 β_j2+ 2 K X i=1 L X j=1 αiβj √ N = K X i=1 (ai+ εi)2+ L X j=1 (bj+ δj)2+ 2 K X i=1 L X j=1 (ai+ εi) (bj+ δj) √ N = ∆S± O (K + L) ∆S √ N + (K + L)2∆S N + (K + L)3∆S N3/2 ! = ∆S 1_{± O} K + L √ N ,

where the fourth line made repeated use of Cauchy-Schwarz, and the fifth line used the fact that K + L_≪√N . Hence F′_(S) U′_(S) = e−∆T/2_/√_2πK+L e−∆S/2/√2πK+L = exp ∆S− ∆T 2 = exp ±O (K + L) ∆S √ N = 1_{± O} (K + L) ∆S √ N

which proves the claim.

To prove the theorem, we now need to generalize to the discrete functions f and g. Here we are given a term C that is a conjunction of K + L inequalities: K of the form F (xi)≤ 0 or F (xi)≥ 0,

and L of the form G (yj) ≤ 0 or G (yj) ≥ 0. If we fix x1, . . . , xK and y1, . . . , yL, we can think

of C as just a convex region of RK+L_. _{Then given an affine subspace S as defined by equation}

(1), we will (abusing notation) write S ∈ C if the vector (α1, . . . , αK, β1, . . . , βL) is in C: that is,

BQP and the Polynomial Hierarchy

BQP and the Polynomial Hierarchy

The MIT Faculty has made this article openly available.

Please share

how this access benefits you. Your story matters.

Citation

Aaronson, Scott. “BQP and the polynomial hierarchy.” in

Proceedings of the 42nd ACM Symposium on Theory of Computing.

Cambridge, Massachusetts, USA: ACM, 2010. 141-150.

Publisher

Association for Computing Machinery

Version

Original manuscript

Citable link

http://hdl.handle.net/1721.1/54236

Terms of Use

Creative Commons Attribution-Noncommercial-Share Alike 3.0

BQP and the Polynomial Hierarchy

Contents

1

Introduction

2

Preliminaries

3

Quantum Algorithms

4

The Classical Complexity of Fourier Fishing

5

The Classical Complexity of Fourier Checking