Trace estimation - Quantum algorithms for machine learning

The algorithm for trace estimation is based on the observation thatT r[A] = _n¹Pn

i=1e^T_i Aei. Classical algorithms approximate this value by using certain probe vectors which can be either Rademacher vectors (i.e. each entry is +1 or−1 with equal probability) or Gaussian distributed. Along with the classical Rademacher and Gaussian estimator, there is another kind of sample vector that has been studied in classical literature, and is more relevant for quantum computers.

In this section we briefly recall the definition of stochastic trace estimator, and recall classical results on unit vector estimation techniques. We will see that even using the unit vector estimator, it is not possible do improve upon the additive error approximation of n.

Definition 68 ((, δ)-approximator of trace). LetA be a symmetric positive semi-definite matrix. A randomized trace estimatorT is an(, δ)-approximator ofTr[A] if

Pr[|T−Tr[A]| ≤Tr[A]]≥1−δ. (7.3)

Definition 69 (Unit vector estimator [15]). A unit vector estimator for a SPD matrix A∈R^n×n is

UM = n M

i=1

z_i^TAzi (7.4)

where thezi’s areM independent uniform random samples from the standard basis{e1, . . . en} of Rⁿ.

It is evident that this estimator does not depend on the off-diagonal elements of A, so the quality of the estimator is affected by the distribution of the elements A_ii. If the diagonal elements are uniform, then only one sample will suffice, but if the distribution is skewed, we need many more samples. It is easy to address this difficulty by mixing the matrix so to make the diagonal element more uniform. Instead of computing the trace of A, we compute the trace ofFAF^T, for a unitary matrixF defined as follows.

Definition 70 (Random mixing matrix). A random mixing matrix is a unitary matrix F=F D∈R^n×n where F is ann×n fixed unitary matrix called seed andD is ann×n matrix where the diagonal is a Rademacher vector, i.e. Dii is±1with equal probability.

In this way, we obtained that F flattens the distribution of all the diagonal of A. We are ready to define a new estimator.

Definition 71 (Mixed unit vector estimator). A mixed unit-vector estimator for a SPD matrixA∈R^n×n is

TM = n M

i=1

z_i^TFAF^Tzi (7.5)

where thez_i’s areM independent uniform random samples from{e1, . . . e_n}

The quality ofFis evaluated on the parameterη=kFk²_max: the smallerηthe better the mixing capacity [15]. It is possible to see that this value is reached in case the matrixF is a Discrete Fourier Transform, where this value is 1/n. There are other possible choices for the seed matrix, but we focus on this one, as performing a QFT in a fault tolerant quantum computer is relatively simple. For reference on the various possible choices of seed matrices, we recommend [15]. In [15, theorem 8.4, remark 8.5] they prove the following result:

Theorem 72(Mixed Unit Vector estimator with DFT matrix [15] ). The mixed unit vector estimatorTM with the DFT matrix as the seed matrixF is an(, δ)-approximator ofTr[A]

forM ≥2⁻²log²(4n²/δ) ln(4/δ).

7.2.1 Quantum trace estimation

The advantage of using the mixed stochastic trace estimator with a quantum computer is the following. Using other stochastic trace estimators (like the Rademacher vectors or the Gaussian vectors), we would incur a penalty in the runtime given by the norm of the probe vectors. Using the mixed unit trace estimator, we can remove this dependence. In the following, albeit we use the block-encoding formalism, we drop the variable of the number

of qubits. That is, for a (α, a, ) block-encoding we will just write (α, ). In the following, we put forward a simple algorithm for trace estimation, based on Definition 69, and then compare its performances with the quantum algorithm based on Definition 71.

Lemma 73 (Quantum trace estimation). Assume to have a (µ(A), a,0) block encoding access to a SPD matrix A ∈ R^n×n. For an > 0, there is a quantum algorithm that estimates Tr[A]with high probability, such that:

• |Tr[A]−Tr[A]| ≤T r[A] in time: O(e _{T r[A]}^µ(A)n)≤O(e ^µ(A)κ )

• |Tr[A]−Tr[A]| ≤nin time: O(e ^µ(A) )

Proof. The algorithm is as follows. We apply the block-encoding ofAto ^√¹_nPn

i=0|ii, and we obtain|ψi= ^√¹_nPn

i(_µ(A)¹ kaik |ai,0i+p

1−γ²|G,1i), where ai is thei-th column of A. Then, again, we use quantum subroutine to estimate the inner product between|ψiand

|φi= ^√¹_nPn

The error in matrix multiplication can be taken as small as possible as it will affect the runtime only polylogarithmically. In order to estimatehψ|φi, we use amplitude estimation with relative error₁on the Hadamard test. This procedure creates a statep

p(0)|G,0i+ p1−p(0)|B,1i, where the probability of measuring |0i on the leftmost qubit is p(0) =

1+hψ|φi

n −

µ(A)hψ|φi| ≤ in time O(e ^µ(A) ) (note that the factor p

p(0) that should appear at the denominator of the runtime of amplitude estimation in the case of a Hadamard test of a SPD matrix is bounded by 1/2). Multiplying this estimate by n, we obtain the absolute error bound. If we want to obtain a relative error , we can use amplitude estimation with precision 1 = _µ(A) ^{T r[A]}_n , which raises the runtime to O(e _{T r[A]}^µ(A)n) ≤ O(e ^µ(A)κ ), as

T r[A] ≤ _nσⁿ

min ≤κ, under the assumption thatkAk₂≤1.

Alas, this algorithm has a dependence onn, the dimension of the space. We could hope to amend this by changing the vector we use in the quadratic form. Along with the matrix A, we now assume to have quantum access to a Rademacher matrix. A Rademacher matrix is a diagonal matrix whose entries are ±1, each with probability 1/2. For a Rademacher matrix D of size n×n, ˜O(n) operations suffice to create quantum access to D. In the following algorithm we also assume to have quantum access to a Rademacher diagonal matrix D. For us, this Rademacher matrix will fulfil the role of the matrix D in the Definition 71 of mixed unit vector estimator.

Lemma 74 (Quantum Mixed unit vector trace estimation). Assume to have quantum access to a SPD matrix A∈R^n×n such that kAk ≤1, and a diagonal Rademacher matrix D∈R^n×n. There is a quantum algorithm that estimatesTr[A]such that|Tr[A]−Tr[A]| ≤ in time: Oe= (n^µ(A)3 )

Proof. We start by combining quantum access to the matrix D and A. Using just one ancilla qubit, we can store the signs on the diagonal ofD on the amplitudes, and create the state

We use this state as input register to get quantum access to matrixA. Afterward, we use quantum access toD using the second register of _kAk¹

i(−1)ⁱka_ik |ii |a_ii. Further on, we use the second register as index forD again. This unitary gives quantum access to the matrixAD=DAD^T: of M numbers sampled uniformly without repetitions from [n]. We create two states,

|ψ1i = |ψ2i = ^√¹

ke^2πijkⁿ |ki. We use quantum access we just created to matrixADto apply matrix multiplication to|ψ1i, and compute the inner product between the resulting state and|ψ₂i. We obtain a scalarT such that:

The error in matrix multiplication can be taken as small as needed as it will affect the runtime only polylogarithmically. Thanks to theorem 72 and Definition 71, we obtain an estimator for the traceT r[A] =nM T whose error we bound with triangle inequality as:

|T r[A]−T r[A]| ≤ |nM T−nM T|+|nM T−T r[A]| ≤] (7.7)

Dans le document Quantum algorithms for machine learning (Page 102-105)