

In the document Quantum algorithms for machine learning (Page 90-93)

6.2 Quantum Expectation-Maximization for GMM

6.2.2 Maximization

Now we need to get a new estimate for the parameters of our model. The idea is the following: at each iteration we recover the new parameters from the quantum algorithms as quantum states, and then, by performing tomography, we update the QRAM that gives us quantum access to the GMM for the next iteration. In the following paragraphs we show how.

Updating mixing weights θ

Lemma 57 (Computing $\theta^{t+1}$). We assume quantum access to a GMM with parameters $\gamma^t$ and let $\delta_\theta > 0$ be a precision parameter. There exists an algorithm that estimates $\bar\theta^{t+1} \in \mathbb{R}^k$ such that $\lVert\bar\theta^{t+1} - \theta^{t+1}\rVert \le \delta_\theta$ in time $T_\theta = \widetilde{O}\left(k^{3.5}\eta^{1.5}\frac{\kappa(\Sigma)\mu(\Sigma)}{\delta_\theta^2}\right)$.

Proof. An estimate of $\theta_j^{t+1}$ can be recovered from the following operations. First, we use lemma 56 (part 1) to compute the responsibilities to error $\epsilon_1$, and then perform the following mapping, which consists of a controlled rotation on an auxiliary qubit:

$$\frac{1}{\sqrt{nk}}\sum_{i=1}^{n}\sum_{j=1}^{k}|i\rangle|j\rangle|\bar r_{ij}\rangle \;\mapsto\; \frac{1}{\sqrt{nk}}\sum_{i=1}^{n}\sum_{j=1}^{k}|i\rangle|j\rangle|\bar r_{ij}\rangle\left(\sqrt{\bar r_{ij}}\,|0\rangle + \sqrt{1-\bar r_{ij}}\,|1\rangle\right)$$

The previous operation has a cost of $T_{R_1,\epsilon_1}$, and the probability of getting $|0\rangle$ is $p(0) = \frac{1}{nk}\sum_{i,j}\bar r_{ij} \approx \frac{1}{k}$. After amplitude amplification on $|0\rangle$ we have the state

$$\left|\sqrt{\theta^{t+1}}\right\rangle := \frac{1}{\sqrt{\sum_{i,j}\bar r_{ij}}}\sum_{i=1}^{n}\sum_{j=1}^{k}\sqrt{\bar r_{ij}}\,|i\rangle|j\rangle .$$

The probability of obtaining outcome $|j\rangle$ if the second register is measured in the standard basis is $p(j) = \frac{1}{n}\sum_{i}\bar r_{ij} = \bar\theta_j^{t+1}$.

An estimate for $\theta_j^{t+1}$ with precision $\epsilon$ can be obtained either by sampling the last register or by performing amplitude estimation; in this way we can estimate each of the values $\theta_j^{t+1}$ for $j \in [k]$. Sampling requires $O(\epsilon^{-2})$ samples to get $\epsilon$ accuracy on $\theta$ (by the Chernoff bounds), but does not incur any dependence on $k$.

As the number of clusters $k$ is relatively small compared to $1/\epsilon$, we choose to do amplitude estimation, estimating each $\theta_j^{t+1}$ for $j \in [k]$ to error $\epsilon/\sqrt{k}$.
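As a classical illustration of the sampling option above, the following Monte Carlo sketch (pure Python, with hypothetical mixing weights) shows that roughly $1/\epsilon^2$ samples of the cluster register suffice for additive error $\epsilon$:

```python
import random

random.seed(7)

# Hypothetical mixing weights theta^{t+1} of a k=3 component GMM.
theta = [0.5, 0.3, 0.2]

def sample_component(weights):
    """Draw j in [k] with probability weights[j] (inverse-CDF sampling)."""
    u = random.random()
    acc = 0.0
    for j, w in enumerate(weights):
        acc += w
        if u < acc:
            return j
    return len(weights) - 1

# Chernoff-style scaling: N ~ 1/eps^2 samples give additive error ~ eps.
eps = 0.01
N = int(1 / eps**2)  # 10_000 samples
counts = [0] * len(theta)
for _ in range(N):
    counts[sample_component(theta)] += 1
estimate = [c / N for c in counts]

for t, e in zip(theta, estimate):
    assert abs(t - e) < 5 * eps  # well within a few standard errors
```

Note that the empirical error per coordinate scales like $\sqrt{\theta_j(1-\theta_j)/N}$, which is why amplitude estimation, with its $1/\epsilon$ scaling, wins when $\epsilon$ is small.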

We now analyze the error in this procedure. The error introduced by the estimation of the responsibilities in lemma 56 is $|\bar\theta_j^{t+1} - \theta_j^{t+1}| = \frac{1}{n}\left|\sum_i (\bar r_{ij} - r_{ij})\right| \le \epsilon_1$ for each $j$. The $\ell_2$-norm error due to amplitude estimation is at most $\epsilon$, as it estimates each coordinate of $\theta_j^{t+1}$ to error $\epsilon/\sqrt{k}$. As the errors sum up additively, we can use the triangle inequality to bound them: the total error is at most $\epsilon + \sqrt{k}\,\epsilon_1$. We choose $\epsilon_1 < \delta_\theta/(2\sqrt{k})$ and $\epsilon < \delta_\theta/2$. With these parameters, the overall running time of the quantum procedure is $T_\theta = O(k^{1.5}\,T_{R_1,\epsilon_1}/\epsilon) = \widetilde{O}\left(k^{3.5}\eta^{1.5}\frac{\kappa(\Sigma)\mu(\Sigma)}{\delta_\theta^2}\right)$.
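Classically, the M-step update that this lemma estimates is just the column average of the responsibility matrix, $\theta_j^{t+1} = \frac{1}{n}\sum_i r_{ij}$. A minimal pure-Python sketch on toy (hypothetical) responsibility values:

```python
# Toy responsibility matrix r[i][j] = r_ij for n=4 points, k=2 clusters.
# Each row sums to 1, as responsibilities must.
r = [
    [0.9, 0.1],
    [0.8, 0.2],
    [0.3, 0.7],
    [0.2, 0.8],
]
n, k = len(r), len(r[0])

# M-step for the mixing weights: theta_j^{t+1} = (1/n) * sum_i r_ij.
theta_next = [sum(r[i][j] for i in range(n)) / n for j in range(k)]

print(theta_next)  # ≈ [0.55, 0.45]
```

Since each row of $r$ sums to 1, the updated weights automatically sum to 1 as well, which is what makes $p(j) = \bar\theta_j^{t+1}$ a valid measurement distribution above.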

Updating centroids µ

We use quantum linear algebra to transform the uniform superposition of the responsibilities of the $j$-th mixture into the new centroid of the $j$-th Gaussian. Let $R_j^t \in \mathbb{R}^n$ be the vector of responsibilities for the Gaussian $j$ at iteration $t$. The following claim relates the vectors $R_j^t$ to the centroids $\mu_j^{t+1}$.

Claim 58. Let $R_j^t \in \mathbb{R}^n$ be the vector of responsibilities of the points for the Gaussian $j$ at time $t$, i.e. $(R_j^t)_i = r_{ij}^t$. Then $\mu_j^{t+1} = \frac{V^T R_j^t}{n\,\theta_j^{t+1}}$.
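The identity in Claim 58 can be checked numerically: the responsibility-weighted mean of the points equals $V^T R_j^t$ rescaled by $n\,\theta_j^{t+1}$. A small pure-Python sketch on a toy dataset (all values hypothetical):

```python
# Toy data matrix V with n=3 rows (points) in d=2 dimensions.
V = [[1.0, 0.0],
     [0.0, 2.0],
     [2.0, 2.0]]
# Responsibilities of the 3 points for a fixed cluster j.
R_j = [0.5, 0.25, 0.25]
n, d = len(V), len(V[0])

# theta_j^{t+1} = (1/n) * sum_i r_ij
theta_j = sum(R_j) / n

# V^T R_j (a vector in R^d)
VtR = [sum(V[i][c] * R_j[i] for i in range(n)) for c in range(d)]

# Claim 58: mu_j^{t+1} = V^T R_j / (n * theta_j)
mu_j = [x / (n * theta_j) for x in VtR]

# Direct weighted-mean definition: sum_i r_ij v_i / sum_i r_ij
mu_direct = [sum(R_j[i] * V[i][c] for i in range(n)) / sum(R_j)
             for c in range(d)]

assert all(abs(a - b) < 1e-12 for a, b in zip(mu_j, mu_direct))
print(mu_j)  # ≈ [1.0, 1.0]
```

The identity holds because $\sum_i r_{ij} = n\,\theta_j^{t+1}$ by definition of the weight update, so the denominator of the weighted mean is exactly $n\,\theta_j^{t+1}$.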

Lemma 59 (Computing $\mu_j^{t+1}$). We assume we have quantum access to a GMM with parameters $\gamma^t$. For a precision parameter $\delta_\mu > 0$, there is a quantum algorithm that calculates $\{\bar\mu_j^{t+1}\}_{j=1}^{k}$ such that, for all $j \in [k]$, $\lVert\bar\mu_j^{t+1} - \mu_j^{t+1}\rVert \le \delta_\mu$.

Proof. A new centroid $\bar\mu_j^{t+1}$ is estimated by first creating an approximation of the state $|R_j^t\rangle$ up to error $\epsilon_1$ in the $\ell_2$-norm using part 2 of lemma 56. Then, we use the quantum linear algebra algorithms of theorem 28 to multiply $R_j^t$ by $V^T$, and obtain a state $|\mu_j^{t+1}\rangle$ along with an estimate of the norm

$$\lVert V^T R_j^t\rVert = n\,\theta_j^{t+1}\lVert\mu_j^{t+1}\rVert$$

with error $\epsilon_3$. The last step of the algorithm consists in estimating the unit vector $|\mu_j^{t+1}\rangle$ with precision $\epsilon_4$, using $\ell_2$ tomography. The tomography depends linearly on $d$, which we expect to be bigger than the precision required by the norm estimation; thus, we assume that the runtime of the norm estimation is absorbed by the runtime of the tomography. We obtain a final runtime of

$$\widetilde{O}\left(\frac{k d}{\epsilon_4^2}\,\kappa(V)\left(\mu(V) + T_{R_2,\epsilon_1}\right)\right).$$

Let’s now analyze the total error in the estimation of the new centroids. In order to satisfy the condition of the robust GMM of definition 52, we want the error on the centroids to be bounded by $\delta_\mu$. For this, Claim 15 helps us choose the parameters such that $\sqrt{\eta}\,(\epsilon_{tom} + \epsilon_{norm}) = \delta_\mu$. Since the error $\epsilon_2$ for the quantum linear algebra appears only as a logarithmic factor in the running time, we can choose $\epsilon_2 \ll \epsilon_4$ without affecting the runtime.

Let $\bar\mu$ be the classical unit vector obtained after the quantum tomography, and $|\hat\mu\rangle$ the state produced by the quantum linear algebra procedure starting with an approximation of $|R_j^t\rangle$. Using the triangle inequality we have $\lVert\,|\mu\rangle - \bar\mu\,\rVert < \epsilon_1 + \epsilon_4 < \delta_\mu/(2\sqrt{\eta})$. The error for the norm estimation procedure can be bounded similarly as $\bigl|\lVert\mu\rVert - \overline{\lVert\mu\rVert}\bigr| \le \bigl|\lVert\mu\rVert - \widehat{\lVert\mu\rVert}\bigr| + \bigl|\widehat{\lVert\mu\rVert} - \overline{\lVert\mu\rVert}\bigr| < \epsilon_3 + \epsilon_1 < \delta_\mu/(2\sqrt{\eta})$. We therefore choose parameters $\epsilon_4 = \epsilon_1 = \epsilon_3 = \delta_\mu/(4\sqrt{\eta})$. Again, as the amplitude estimation step we use for estimating the norms does not depend on $d$, which is expected to dominate the other parameters, we omit the cost of the amplitude estimation step. We obtain the more concise expression for the running time:

$$T_\mu = \widetilde{O}\left(\frac{k d \eta\,\kappa(V)\left(\mu(V) + k^{3.5}\eta^{1.5}\kappa(\Sigma)\mu(\Sigma)\right)}{\delta_\mu^3}\right) \qquad (6.20)$$

Updating covariance matrices Σ

Lemma 60 (Computing $\Sigma_j^{t+1}$). We assume we have quantum access to a GMM with parameters $\gamma^t$. We also have computed estimates $\bar\mu_j^{t+1}$ of all the centroids, such that $\lVert\bar\mu_j^{t+1} - \mu_j^{t+1}\rVert \le \delta_\mu$ for a precision parameter $\delta_\mu > 0$. Then, there exists a quantum algorithm that outputs estimates for the new covariance matrices $\{\bar\Sigma_j^{t+1}\}_{j=1}^{k}$ such that $\lVert\bar\Sigma_j^{t+1} - \Sigma_j^{t+1}\rVert_F \le \sqrt{\eta}\,\delta_\mu$.

Proof. It is simple to check that the update rule of the covariance matrix during the maximization step can be reduced to [110, Exercise 11.2]:

$$\Sigma_j^{t+1} \leftarrow \frac{\sum_{i=1}^{n} r_{ij}\, v_i v_i^T}{n\,\theta_j^{t+1}} - \mu_j^{t+1}(\mu_j^{t+1})^T =: \Sigma_j' - \mu_j^{t+1}(\mu_j^{t+1})^T .$$

First, note that we can use the previously obtained estimates of the centroids to compute the outer product $\bar\mu_j^{t+1}(\bar\mu_j^{t+1})^T$ with error $\delta_\mu\lVert\mu\rVert \le \delta_\mu\sqrt{\eta}$, up to constants: the error in the estimates of the centroids is $\bar\mu = \mu + e$, where $e$ is a vector of norm at most $\delta_\mu$, and therefore $\lVert\bar\mu\bar\mu^T - \mu\mu^T\rVert \le 2\lVert\mu\rVert\lVert e\rVert + \lVert e\rVert^2 \le 2\sqrt{\eta}\,\delta_\mu + \delta_\mu^2$.

Now we discuss the procedure for estimating $\Sigma_j'$. We estimate $|\mathrm{vec}[\Sigma_j']\rangle$ and $\lVert\mathrm{vec}[\Sigma_j']\rVert$. To do so, we start by using quantum access to the norms and part 1 of lemma 56. With them, for a cluster $j$, we create the state $|j\rangle\frac{1}{\sqrt{n}}\sum_i |i\rangle|\bar r_{ij}\rangle$. Then, we use quantum access to the norms to store them in another register: $|j\rangle\frac{1}{\sqrt{n}}\sum_i |i\rangle|\bar r_{ij}\rangle|\lVert v_i\rVert\rangle$. Using an ancilla qubit we can perform a rotation, controlled on the responsibilities and the norms, and obtain the following state:

$$|j\rangle\,\frac{1}{\sqrt{n}}\sum_{i=1}^{n} |i\rangle|\bar r_{ij}\rangle\bigl|\lVert v_i\rVert\bigr\rangle\left(\frac{\bar r_{ij}\lVert v_i\rVert}{\sqrt{\eta}}\,|0\rangle + \gamma\,|1\rangle\right)$$

We undo the unitary that created the responsibilities in the second register and the query on the norms in the third register, and we perform amplitude amplification on the ancilla qubit being $|0\rangle$. The resulting state,

$$\frac{1}{\lVert V_R\rVert}\sum_{i=1}^{n} \bar r_{ij}\lVert v_i\rVert\,|i\rangle,$$

can be obtained in time $O\!\left(T_{R_1,\epsilon_1}\frac{\sqrt{n\eta}}{\lVert V_R\rVert}\right)$, where $\lVert V_R\rVert = \bigl(\sum_{i} \bar r_{ij}^2\lVert v_i\rVert^2\bigr)^{1/2}$.

To this state we can apply the quantum linear algebra subroutine, multiplying the first register by the matrix $V^T$. This leads us to the desired state $|\Sigma_j'\rangle$, along with an estimate of its norm.

As the runtime for the norm estimation, $\frac{\kappa(V)(\mu(V) + T_{R_2,\epsilon_1})\log(1/\epsilon_{mult})}{\epsilon_{norms}}$, does not depend on $d$, we consider it smaller than the runtime for performing tomography. Thus, the runtime for this operation is:

$$O\!\left(\frac{d^2\log d}{\epsilon_{tom}^2}\,\kappa(V)\left(\mu(V) + T_{R_2,\epsilon_1}\right)\log(1/\epsilon_{mult})\right).$$

Let’s analyze the error of this procedure. We want a matrix $\bar\Sigma_j'$ that is $\sqrt{\eta}\,\delta_\mu$-close to $\Sigma_j'$ in Frobenius norm. The error $\epsilon_{mult}$ introduced by the quantum linear algebra for the matrix multiplication can be taken as small as necessary, since $\epsilon_{mult}$ is inside a logarithm. From Claim 15, we just need to fix the errors of tomography and norm estimation such that $\sqrt{\eta}\,(\epsilon_{unit} + \epsilon_{norms}) < \sqrt{\eta}\,\delta_\mu$. The same argument applies to estimating the norm $\lVert\mathrm{vec}[\Sigma_j']\rVert$: here $\epsilon_{norms}$ accounts for the amplitude estimation step used in theorem 28 and $\epsilon_1$ for the error in calling lemma 56. Again, we choose $\epsilon_{tom} = \epsilon_1 = \frac{\delta_\mu}{4\sqrt{\eta}}$.

Since the tomography is more costly than the amplitude estimation step, we can disregard the runtime of the norm estimation step. As this operation is repeated $k$ times for the $k$ different covariance matrices, the total runtime of the whole algorithm is

$$T_\Sigma = \widetilde{O}\left(\frac{k d^2 \eta\,\kappa(V)\left(\mu(V) + \eta^2 k^{3.5}\kappa(\Sigma)\mu(\Sigma)\right)}{\delta_\mu^3}\right).$$

Let us also recall that, for each of the new covariance matrices computed, we use lemma 66 to compute an estimate of their log-determinant; this time can be absorbed in the time $T_\Sigma$.
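The classical covariance update rule used at the start of this proof can be sanity-checked numerically: the responsibility-weighted second moment minus the outer product of the new centroid equals the responsibility-weighted covariance around that centroid. A pure-Python sketch on toy values (all numbers hypothetical):

```python
# Toy data: n=3 points in d=2 dimensions, responsibilities for one cluster j.
V = [[1.0, 0.0],
     [0.0, 2.0],
     [2.0, 2.0]]
R_j = [0.5, 0.25, 0.25]
n, d = len(V), len(V[0])

w = sum(R_j)  # = n * theta_j^{t+1}
mu = [sum(R_j[i] * V[i][c] for i in range(n)) / w for c in range(d)]

# Sigma'_j = (sum_i r_ij v_i v_i^T) / (n theta_j^{t+1})
Sigma_prime = [[sum(R_j[i] * V[i][a] * V[i][b] for i in range(n)) / w
                for b in range(d)] for a in range(d)]

# Update rule: Sigma_j^{t+1} = Sigma'_j - mu mu^T
Sigma_next = [[Sigma_prime[a][b] - mu[a] * mu[b] for b in range(d)]
              for a in range(d)]

# Direct definition: weighted covariance around the new centroid.
Sigma_direct = [[sum(R_j[i] * (V[i][a] - mu[a]) * (V[i][b] - mu[b])
                     for i in range(n)) / w for b in range(d)]
                for a in range(d)]

for a in range(d):
    for b in range(d):
        assert abs(Sigma_next[a][b] - Sigma_direct[a][b]) < 1e-12
```

This is the usual bias identity $\mathrm{Cov}[v] = \mathrm{E}[vv^T] - \mathrm{E}[v]\,\mathrm{E}[v]^T$ under the responsibility-weighted distribution, which is why the quantum procedure only needs to estimate $\Sigma_j'$ and the centroid separately.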
