

In the document Quantum algorithms for machine learning (Page 90-93)

6.2 Quantum Expectation-Maximization for GMM

6.2.2 Maximization

Now we need to get a new estimate for the parameters of our model. The idea is the following: at each iteration we recover the new parameters from the quantum algorithms as quantum states, and then, by performing tomography, we update the QRAM that gives us quantum access to the GMM for the next iteration. In the following paragraphs we show how.

Updating mixing weights θ

Lemma 57 (Computing $\theta^{t+1}$). We assume quantum access to a GMM with parameters $\gamma^t$ and let $\delta_\theta > 0$ be a precision parameter. There exists an algorithm that estimates $\bar\theta^{t+1} \in \mathbb{R}^k$ such that $\lVert\bar\theta^{t+1} - \theta^{t+1}\rVert \le \delta_\theta$ in time $T_\theta = \widetilde{O}\left(k^{3.5}\eta^{1.5}\frac{\kappa(\Sigma)\mu(\Sigma)}{\delta_\theta^2}\right)$.

Proof. An estimate of $\theta_j^{t+1}$ can be recovered from the following operations. First, we use lemma 56 (part 1) to compute the responsibilities to error $\epsilon_1$, and then perform the following mapping, which consists of a controlled rotation on an auxiliary qubit:

$$\frac{1}{\sqrt{nk}}\sum_{i=1}^{n}\sum_{j=1}^{k}|i\rangle|j\rangle|\bar r_{ij}\rangle \;\mapsto\; \frac{1}{\sqrt{nk}}\sum_{i=1}^{n}\sum_{j=1}^{k}|i\rangle|j\rangle|\bar r_{ij}\rangle\left(\sqrt{\bar r_{ij}}\,|0\rangle + \sqrt{1-\bar r_{ij}}\,|1\rangle\right)$$

The previous operation has a cost of $T_{R_1,\epsilon_1}$, and the probability of getting $|0\rangle$ is $p(0) = \frac{1}{nk}\sum_{i,j}\bar r_{ij} \approx \frac{1}{k}$. After amplitude amplification on $|0\rangle$ we have the state

$$\left|\sqrt{\theta^{t+1}}\right\rangle := \frac{1}{\sqrt{\sum_{i,j}\bar r_{ij}}}\sum_{i=1}^{n}\sum_{j=1}^{k}\sqrt{\bar r_{ij}}\,|i\rangle|j\rangle .$$

The probability of obtaining outcome $|j\rangle$ if the second register is measured in the standard basis is $p(j) = \frac{1}{n}\sum_{i}\bar r_{ij} = \bar\theta_j^{t+1}$.

An estimate for $\theta_j^{t+1}$ with precision $\epsilon$ can be obtained either by sampling the last register or by performing amplitude estimation; in this way we can estimate each of the values $\theta_j^{t+1}$ for $j \in [k]$. Sampling requires $O(\epsilon^{-2})$ samples to get $\epsilon$ accuracy on $\theta$ (by the Chernoff bounds), but does not incur any dependence on $k$.

As the number of clusters $k$ is relatively small compared to $1/\epsilon$, we choose to do amplitude estimation, estimating each $\theta_j^{t+1}$ for $j \in [k]$ to error $\epsilon/\sqrt{k}$.
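As a classical illustration of the sampling option above, the following Monte Carlo sketch (pure Python, with hypothetical mixing weights) shows that roughly $1/\epsilon^2$ samples of the cluster register suffice for additive error $\epsilon$:

```python
import random

random.seed(7)

# Hypothetical mixing weights theta^{t+1} of a k=3 component GMM.
theta = [0.5, 0.3, 0.2]

def sample_component(weights):
    """Draw j in [k] with probability weights[j] (inverse-CDF sampling)."""
    u = random.random()
    acc = 0.0
    for j, w in enumerate(weights):
        acc += w
        if u < acc:
            return j
    return len(weights) - 1

# Chernoff-style scaling: N ~ 1/eps^2 samples give additive error ~ eps.
eps = 0.01
N = int(1 / eps**2)  # 10_000 samples
counts = [0] * len(theta)
for _ in range(N):
    counts[sample_component(theta)] += 1
estimate = [c / N for c in counts]

for t, e in zip(theta, estimate):
    assert abs(t - e) < 5 * eps  # well within a few standard errors
```

Note that the empirical error per coordinate scales like $\sqrt{\theta_j(1-\theta_j)/N}$, which is why amplitude estimation, with its $1/\epsilon$ scaling, wins when $\epsilon$ is small.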

We now analyze the error in this procedure. The error introduced by the estimation of the responsibilities in lemma 56 is $|\bar\theta_j^{t+1} - \theta_j^{t+1}| = \frac{1}{n}\left|\sum_i (\bar r_{ij} - r_{ij})\right| \le \epsilon_1$ for each $j$. The $\ell_2$-norm error due to amplitude estimation is at most $\epsilon$, as it estimates each coordinate of $\theta_j^{t+1}$ to error $\epsilon/\sqrt{k}$. As the errors sum up additively, we can use the triangle inequality to bound them: the total error is at most $\epsilon + \sqrt{k}\,\epsilon_1$. We choose $\epsilon_1 < \delta_\theta/(2\sqrt{k})$ and $\epsilon < \delta_\theta/2$. With these parameters, the overall running time of the quantum procedure is $T_\theta = O(k^{1.5}\,T_{R_1,\epsilon_1}/\epsilon) = \widetilde{O}\left(k^{3.5}\eta^{1.5}\frac{\kappa(\Sigma)\mu(\Sigma)}{\delta_\theta^2}\right)$.
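Classically, the M-step update that this lemma estimates is just the column average of the responsibility matrix, $\theta_j^{t+1} = \frac{1}{n}\sum_i r_{ij}$. A minimal pure-Python sketch on toy (hypothetical) responsibility values:

```python
# Toy responsibility matrix r[i][j] = r_ij for n=4 points, k=2 clusters.
# Each row sums to 1, as responsibilities must.
r = [
    [0.9, 0.1],
    [0.8, 0.2],
    [0.3, 0.7],
    [0.2, 0.8],
]
n, k = len(r), len(r[0])

# M-step for the mixing weights: theta_j^{t+1} = (1/n) * sum_i r_ij.
theta_next = [sum(r[i][j] for i in range(n)) / n for j in range(k)]

print(theta_next)  # ≈ [0.55, 0.45]
```

Since each row of $r$ sums to 1, the updated weights automatically sum to 1 as well, which is what makes $p(j) = \bar\theta_j^{t+1}$ a valid measurement distribution above.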

Updating centroids µ

We use quantum linear algebra to transform the uniform superposition of the responsibilities of the $j$-th mixture into the new centroid of the $j$-th Gaussian. Let $R_j^t \in \mathbb{R}^n$ be the vector of responsibilities for the Gaussian $j$ at iteration $t$. The following claim relates the vectors $R_j^t$ to the centroids $\mu_j^{t+1}$.

Claim 58. Let $R_j^t \in \mathbb{R}^n$ be the vector of responsibilities of the points for the Gaussian $j$ at time $t$, i.e. $(R_j^t)_i = r_{ij}^t$. Then $\mu_j^{t+1} = \frac{V^T R_j^t}{n\,\theta_j^{t+1}}$.
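The identity in Claim 58 can be checked numerically: the responsibility-weighted mean of the points equals $V^T R_j^t$ rescaled by $n\,\theta_j^{t+1}$. A small pure-Python sketch on a toy dataset (all values hypothetical):

```python
# Toy data matrix V with n=3 rows (points) in d=2 dimensions.
V = [[1.0, 0.0],
     [0.0, 2.0],
     [2.0, 2.0]]
# Responsibilities of the 3 points for a fixed cluster j.
R_j = [0.5, 0.25, 0.25]
n, d = len(V), len(V[0])

# theta_j^{t+1} = (1/n) * sum_i r_ij
theta_j = sum(R_j) / n

# V^T R_j (a vector in R^d)
VtR = [sum(V[i][c] * R_j[i] for i in range(n)) for c in range(d)]

# Claim 58: mu_j^{t+1} = V^T R_j / (n * theta_j)
mu_j = [x / (n * theta_j) for x in VtR]

# Direct weighted-mean definition: sum_i r_ij v_i / sum_i r_ij
mu_direct = [sum(R_j[i] * V[i][c] for i in range(n)) / sum(R_j)
             for c in range(d)]

assert all(abs(a - b) < 1e-12 for a, b in zip(mu_j, mu_direct))
print(mu_j)  # ≈ [1.0, 1.0]
```

The identity holds because $\sum_i r_{ij} = n\,\theta_j^{t+1}$ by definition of the weight update, so the denominator of the weighted mean is exactly $n\,\theta_j^{t+1}$.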

Lemma 59 (Computing $\mu_j^{t+1}$). We assume we have quantum access to a GMM with parameters $\gamma^t$. For a precision parameter $\delta_\mu > 0$, there is a quantum algorithm that calculates $\{\bar\mu_j^{t+1}\}_{j=1}^{k}$ such that, for all $j \in [k]$, $\lVert\bar\mu_j^{t+1} - \mu_j^{t+1}\rVert \le \delta_\mu$.

Proof. A new centroid $\bar\mu_j^{t+1}$ is estimated by first creating an approximation of the state $|R_j^t\rangle$ up to error $\epsilon_1$ in the $\ell_2$-norm using part 2 of lemma 56. Then, we use the quantum linear algebra algorithms of theorem 28 to multiply $R_j^t$ by $V^T$, and obtain a state $|\mu_j^{t+1}\rangle$ along with an estimate of the norm

$$\lVert V^T R_j^t\rVert = n\,\theta_j^{t+1}\lVert\mu_j^{t+1}\rVert$$

with error $\epsilon_3$. The last step of the algorithm consists in estimating the unit vector $|\mu_j^{t+1}\rangle$ with precision $\epsilon_4$, using $\ell_2$ tomography. The tomography depends linearly on $d$, which we expect to be bigger than the precision required by the norm estimation; thus, we assume that the runtime of the norm estimation is absorbed by the runtime of the tomography. We obtain a final runtime of

$$\widetilde{O}\left(\frac{k d}{\epsilon_4^2}\,\kappa(V)\left(\mu(V) + T_{R_2,\epsilon_1}\right)\right).$$

Let’s now analyze the total error in the estimation of the new centroids. In order to satisfy the condition of the robust GMM of definition 52, we want the error on the centroids to be bounded by $\delta_\mu$. For this, Claim 15 helps us choose the parameters such that $\sqrt{\eta}\,(\epsilon_{tom} + \epsilon_{norm}) = \delta_\mu$. Since the error $\epsilon_2$ for the quantum linear algebra appears only as a logarithmic factor in the running time, we can choose $\epsilon_2 \ll \epsilon_4$ without affecting the runtime.

Let $\bar\mu$ be the classical unit vector obtained after the quantum tomography, and $|\hat\mu\rangle$ the state produced by the quantum linear algebra procedure starting with an approximation of $|R_j^t\rangle$. Using the triangle inequality we have $\lVert\,|\mu\rangle - \bar\mu\,\rVert < \epsilon_1 + \epsilon_4 < \delta_\mu/(2\sqrt{\eta})$. The error for the norm estimation procedure can be bounded similarly as $\bigl|\lVert\mu\rVert - \overline{\lVert\mu\rVert}\bigr| \le \bigl|\lVert\mu\rVert - \widehat{\lVert\mu\rVert}\bigr| + \bigl|\widehat{\lVert\mu\rVert} - \overline{\lVert\mu\rVert}\bigr| < \epsilon_3 + \epsilon_1 < \delta_\mu/(2\sqrt{\eta})$. We therefore choose parameters $\epsilon_4 = \epsilon_1 = \epsilon_3 = \delta_\mu/(4\sqrt{\eta})$. Again, as the amplitude estimation step we use for estimating the norms does not depend on $d$, which is expected to dominate the other parameters, we omit the cost of the amplitude estimation step. We obtain the more concise expression for the running time:

$$T_\mu = \widetilde{O}\left(\frac{k d \eta\,\kappa(V)\left(\mu(V) + k^{3.5}\eta^{1.5}\kappa(\Sigma)\mu(\Sigma)\right)}{\delta_\mu^3}\right) \qquad (6.20)$$

Updating covariance matrices Σ

Lemma 60 (Computing $\Sigma_j^{t+1}$). We assume we have quantum access to a GMM with parameters $\gamma^t$. We also have computed estimates $\bar\mu_j^{t+1}$ of all the centroids, such that $\lVert\bar\mu_j^{t+1} - \mu_j^{t+1}\rVert \le \delta_\mu$ for a precision parameter $\delta_\mu > 0$. Then, there exists a quantum algorithm that outputs estimates for the new covariance matrices $\{\bar\Sigma_j^{t+1}\}_{j=1}^{k}$ such that $\lVert\bar\Sigma_j^{t+1} - \Sigma_j^{t+1}\rVert_F \le \sqrt{\eta}\,\delta_\mu$.

Proof. It is simple to check that the update rule of the covariance matrix during the maximization step can be reduced to [110, Exercise 11.2]:

$$\Sigma_j^{t+1} \leftarrow \frac{\sum_{i=1}^{n} r_{ij}\, v_i v_i^T}{n\,\theta_j^{t+1}} - \mu_j^{t+1}(\mu_j^{t+1})^T =: \Sigma_j' - \mu_j^{t+1}(\mu_j^{t+1})^T .$$

First, note that we can use the previously obtained estimates of the centroids to compute the outer product $\bar\mu_j^{t+1}(\bar\mu_j^{t+1})^T$ with error $\delta_\mu\lVert\mu\rVert \le \delta_\mu\sqrt{\eta}$, up to constants: the error in the estimates of the centroids is $\bar\mu = \mu + e$, where $e$ is a vector of norm at most $\delta_\mu$, and therefore $\lVert\bar\mu\bar\mu^T - \mu\mu^T\rVert \le 2\lVert\mu\rVert\lVert e\rVert + \lVert e\rVert^2 \le 2\sqrt{\eta}\,\delta_\mu + \delta_\mu^2$.

Now we discuss the procedure for estimating $\Sigma_j'$. We estimate $|\mathrm{vec}[\Sigma_j']\rangle$ and $\lVert\mathrm{vec}[\Sigma_j']\rVert$. To do so, we start by using quantum access to the norms and part 1 of lemma 56. With them, for a cluster $j$, we create the state $|j\rangle\frac{1}{\sqrt{n}}\sum_i |i\rangle|\bar r_{ij}\rangle$. Then, we use quantum access to the norms to store them in another register: $|j\rangle\frac{1}{\sqrt{n}}\sum_i |i\rangle|\bar r_{ij}\rangle|\lVert v_i\rVert\rangle$. Using an ancilla qubit we can perform a rotation, controlled on the responsibilities and the norms, and obtain the following state:

$$|j\rangle\,\frac{1}{\sqrt{n}}\sum_{i=1}^{n} |i\rangle|\bar r_{ij}\rangle\bigl|\lVert v_i\rVert\bigr\rangle\left(\frac{\bar r_{ij}\lVert v_i\rVert}{\sqrt{\eta}}\,|0\rangle + \gamma\,|1\rangle\right)$$

We undo the unitary that created the responsibilities in the second register and the query on the norms in the third register, and we perform amplitude amplification on the ancilla qubit being $|0\rangle$. The resulting state,

$$\frac{1}{\lVert V_R\rVert}\sum_{i=1}^{n} \bar r_{ij}\lVert v_i\rVert\,|i\rangle,$$

can be obtained in time $O\!\left(T_{R_1,\epsilon_1}\frac{\sqrt{n\eta}}{\lVert V_R\rVert}\right)$, where $\lVert V_R\rVert = \bigl(\sum_{i} \bar r_{ij}^2\lVert v_i\rVert^2\bigr)^{1/2}$.

To this state we can apply the quantum linear algebra subroutine, multiplying the first register by the matrix $V^T$. This leads us to the desired state $|\Sigma_j'\rangle$, along with an estimate of its norm.

As the runtime for the norm estimation, $\frac{\kappa(V)(\mu(V) + T_{R_2,\epsilon_1})\log(1/\epsilon_{mult})}{\epsilon_{norms}}$, does not depend on $d$, we consider it smaller than the runtime for performing tomography. Thus, the runtime for this operation is:

$$O\!\left(\frac{d^2\log d}{\epsilon_{tom}^2}\,\kappa(V)\left(\mu(V) + T_{R_2,\epsilon_1}\right)\log(1/\epsilon_{mult})\right).$$

Let’s analyze the error of this procedure. We want a matrix $\bar\Sigma_j'$ that is $\sqrt{\eta}\,\delta_\mu$-close to $\Sigma_j'$ in Frobenius norm. The error $\epsilon_{mult}$ introduced by the quantum linear algebra for the matrix multiplication can be taken as small as necessary, since $\epsilon_{mult}$ is inside a logarithm. From Claim 15, we just need to fix the errors of tomography and norm estimation such that $\sqrt{\eta}\,(\epsilon_{unit} + \epsilon_{norms}) < \sqrt{\eta}\,\delta_\mu$. The same argument applies to estimating the norm $\lVert\mathrm{vec}[\Sigma_j']\rVert$: here $\epsilon_{norms}$ accounts for the amplitude estimation step used in theorem 28 and $\epsilon_1$ for the error in calling lemma 56. Again, we choose $\epsilon_{tom} = \epsilon_1 = \frac{\delta_\mu}{4\sqrt{\eta}}$.

Since the tomography is more costly than the amplitude estimation step, we can disregard the runtime of the norm estimation step. As this operation is repeated $k$ times for the $k$ different covariance matrices, the total runtime of the whole algorithm is

$$T_\Sigma = \widetilde{O}\left(\frac{k d^2 \eta\,\kappa(V)\left(\mu(V) + \eta^2 k^{3.5}\kappa(\Sigma)\mu(\Sigma)\right)}{\delta_\mu^3}\right).$$

Let us also recall that, for each of the new covariance matrices computed, we use lemma 66 to compute an estimate of their log-determinant; this time can be absorbed in the time $T_\Sigma$.
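The classical covariance update rule used at the start of this proof can be sanity-checked numerically: the responsibility-weighted second moment minus the outer product of the new centroid equals the responsibility-weighted covariance around that centroid. A pure-Python sketch on toy values (all numbers hypothetical):

```python
# Toy data: n=3 points in d=2 dimensions, responsibilities for one cluster j.
V = [[1.0, 0.0],
     [0.0, 2.0],
     [2.0, 2.0]]
R_j = [0.5, 0.25, 0.25]
n, d = len(V), len(V[0])

w = sum(R_j)  # = n * theta_j^{t+1}
mu = [sum(R_j[i] * V[i][c] for i in range(n)) / w for c in range(d)]

# Sigma'_j = (sum_i r_ij v_i v_i^T) / (n theta_j^{t+1})
Sigma_prime = [[sum(R_j[i] * V[i][a] * V[i][b] for i in range(n)) / w
                for b in range(d)] for a in range(d)]

# Update rule: Sigma_j^{t+1} = Sigma'_j - mu mu^T
Sigma_next = [[Sigma_prime[a][b] - mu[a] * mu[b] for b in range(d)]
              for a in range(d)]

# Direct definition: weighted covariance around the new centroid.
Sigma_direct = [[sum(R_j[i] * (V[i][a] - mu[a]) * (V[i][b] - mu[b])
                     for i in range(n)) / w for b in range(d)]
                for a in range(d)]

for a in range(d):
    for b in range(d):
        assert abs(Sigma_next[a][b] - Sigma_direct[a][b]) < 1e-12
```

This is the usual bias identity $\mathrm{Cov}[v] = \mathrm{E}[vv^T] - \mathrm{E}[v]\,\mathrm{E}[v]^T$ under the responsibility-weighted distribution, which is why the quantum procedure only needs to estimate $\Sigma_j'$ and the centroid separately.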
