
2.5 Additional quantum results


$$O\!\left(\alpha\,\kappa\,(a + T_U)\,(1+c)\,\log^2\!\left(\kappa^{1+c}\right)\right)$$

Nevertheless, the algorithm that emerges from the previous lemma has a quadratic dependence on $\kappa$. To reduce it to a linear dependence on $\kappa$, the authors used variable-time amplitude amplification [7]. Hence, we can restate Theorem 21, with the improved runtimes, as follows.

Theorem 28 (Matrix algebra [37, 70]). Let $M := \sum_i \sigma_i u_i v_i^T \in \mathbb{R}^{d \times d}$ such that $\|M\|_2 = 1$, and let $x \in \mathbb{R}^d$ be a vector for which we have quantum access in time $T_\chi$. There exist quantum algorithms that, with probability at least $1 - 1/\mathrm{poly}(d)$, return:

(i) a state $|z\rangle$ such that $\big\| |z\rangle - |Mx\rangle \big\| \le \epsilon$, in time $\widetilde{O}\big(\kappa(M)(\mu(M) + T_\chi)\log(1/\epsilon)\big)$;

(ii) a state $|z\rangle$ such that $\big\| |z\rangle - |M^{-1}x\rangle \big\| \le \epsilon$, in time $\widetilde{O}\big(\kappa(M)(\mu(M) + T_\chi)\log(1/\epsilon)\big)$;

(iii) a state $|M_{\le\theta,\delta}^{+} M_{\le\theta,\delta}\, x\rangle$, in time $\widetilde{O}\!\left(\dfrac{T_\chi\,\mu(M)\,\|x\|}{\delta\,\theta\,\|M_{\le\theta,\delta}^{+} M_{\le\theta,\delta}\, x\|}\right)$.

One can also get estimates of the norms with multiplicative error $\eta$ by increasing the running time by a factor $1/\eta$.
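To make statement (iii) concrete, here is a minimal classical sketch (NumPy, with a hypothetical small instance built directly from an SVD, so the singular values are known): up to the $\delta$-fuzziness of the threshold, $M_{\le\theta,\delta}^{+} M_{\le\theta,\delta}$ acts as a projector onto the span of the right singular vectors with singular value at most $\theta$.

```python
import numpy as np

# Hypothetical instance with known singular values sigma = (1.0, 0.6, 0.3),
# hence ||M||_2 = 1 as required by Theorem 28.
rng = np.random.default_rng(0)
U, _ = np.linalg.qr(rng.standard_normal((3, 3)))
V, _ = np.linalg.qr(rng.standard_normal((3, 3)))
sigma = np.array([1.0, 0.6, 0.3])
M = U @ np.diag(sigma) @ V.T
x = rng.standard_normal(3)

# Classical emulation of the operator in (iii): M_{<=theta}^+ M_{<=theta}
# is the projector onto the right singular vectors with sigma_i <= theta
# (the quantum routine only resolves the threshold up to delta).
theta = 0.5
P = V[:, sigma <= theta] @ V[:, sigma <= theta].T   # keeps only sigma = 0.3

y = P @ x                        # the vector whose quantum state is prepared
print(y / np.linalg.norm(y))     # |M_{<=theta,delta}^+ M_{<=theta,delta} x>
```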

Another important advantage of the new methods is that they provide easy ways to manipulate sums or products of matrices.

Theorem 29 (Matrix algebra on products of matrices [37, 70]). Let $M_1, M_2 \in \mathbb{R}^{d \times d}$ such that $\|M_1\|_2 = \|M_2\|_2 = 1$, $M = M_1 M_2$, and let $x \in \mathbb{R}^d$ be a vector for which we have quantum access. There exist quantum algorithms that, with probability at least $1 - 1/\mathrm{poly}(d)$, return:

(i) a state $|z\rangle$ such that $\big\| |z\rangle - |Mx\rangle \big\| \le \epsilon$, in time $\widetilde{O}\big(\kappa(M)(\mu(M_1) + \mu(M_2))\log(1/\epsilon)\big)$;

(ii) a state $|z\rangle$ such that $\big\| |z\rangle - |M^{-1}x\rangle \big\| \le \epsilon$, in time $\widetilde{O}\big(\kappa(M)(\mu(M_1) + \mu(M_2))\log(1/\epsilon)\big)$;

(iii) a state $|M_{\le\theta,\delta}^{+} M_{\le\theta,\delta}\, x\rangle$, in time $\widetilde{O}\!\left(\dfrac{(\mu(M_1) + \mu(M_2))\,\|x\|}{\delta\,\theta\,\|M_{\le\theta,\delta}^{+} M_{\le\theta,\delta}\, x\|}\right)$.

One can also get estimates of the norms with multiplicative error $\eta$ by increasing the running time by a factor $1/\eta$.

More generally, applying a matrix $M$ which is the product of $k$ matrices, i.e. $M = M_1 \cdots M_k$, will result in $\kappa(M)\left(\sum_{i=1}^{k} \mu(M_i)\right)\log(1/\epsilon)$ factors in the runtime.
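When quantum access to the factors is available, the product never needs to be formed explicitly. The following NumPy sketch (hypothetical small matrices) checks the classical analogue of this, together with the standard bound $\kappa(M_1 M_2) \le \kappa(M_1)\,\kappa(M_2)$ that controls the $\kappa(M)$ term in Theorem 29.

```python
import numpy as np

# Hypothetical factors with unit spectral norm, as in Theorem 29.
rng = np.random.default_rng(1)
d = 5
M1 = rng.standard_normal((d, d)); M1 /= np.linalg.norm(M1, 2)
M2 = rng.standard_normal((d, d)); M2 /= np.linalg.norm(M2, 2)
x = rng.standard_normal(d)

# Applying M = M1 M2 factor-by-factor matches forming the product.
assert np.allclose((M1 @ M2) @ x, M1 @ (M2 @ x))

# kappa is submultiplicative: kappa(M1 M2) <= kappa(M1) * kappa(M2).
kappa = np.linalg.cond
print(kappa(M1 @ M2) <= kappa(M1) * kappa(M2))   # True
```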

2.5.5 Distance estimations and quadratic form estimation

In this section, we prove two new lemmas that can be used to estimate inner products, distances, and quadratic forms between vectors. Lemma 30 was developed in [93], while the lemma for estimating the value of a quadratic form has been formalized in work under preparation with Changpeng Shao.

Lemma 30 (Distance / Inner Products Estimation [93]). Assume for a matrix $V \in \mathbb{R}^{n \times d}$ and a matrix $C \in \mathbb{R}^{k \times d}$ that the following unitaries $|i\rangle|0\rangle \mapsto |i\rangle|v_i\rangle$ and $|j\rangle|0\rangle \mapsto |j\rangle|c_j\rangle$ can be performed in time $T$, and that the norms of the vectors are known. For any $\Delta > 0$ and $\epsilon_1 > 0$, there exists a quantum algorithm that computes

$|i\rangle|j\rangle|0\rangle \mapsto |i\rangle|j\rangle|\overline{d^2(v_i, c_j)}\rangle$, where $|\overline{d^2(v_i, c_j)} - d^2(v_i, c_j)| \le \epsilon_1$ with probability $\ge 1 - 2\Delta$, or

$|i\rangle|j\rangle|0\rangle \mapsto |i\rangle|j\rangle|\overline{(v_i, c_j)}\rangle$, where $|\overline{(v_i, c_j)} - (v_i, c_j)| \le \epsilon_1$ with probability $\ge 1 - 2\Delta$,

in time $\widetilde{O}\!\left(\dfrac{\|v_i\|\,\|c_j\|\, T \log(1/\Delta)}{\epsilon_1}\right)$.
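To see how the two estimates in the lemma are related, here is a minimal classical sketch (NumPy; the vectors are made up, and the exact sign convention of the ancilla statistic may differ from [93]): the quantum circuit estimates the inner product of the normalized states, and the known norms convert that estimate into a squared distance.

```python
import numpy as np

# Hypothetical vectors standing in for a row v_i of V and a row c_j of C.
v = np.array([1.0, 2.0, 2.0])
c = np.array([2.0, 1.0, 0.0])
nv, nc = np.linalg.norm(v), np.linalg.norm(c)

# The circuit behind Lemma 30 measures an ancilla whose outcome probability
# encodes the inner product of the normalized states |v_i> and |c_j>.
p = (1 - np.dot(v / nv, c / nc)) / 2          # Hadamard-test-style statistic

inner = (1 - 2 * p) * nv * nc                 # estimate of (v_i, c_j)
dist2 = nv**2 + nc**2 - 2 * inner             # estimate of d^2(v_i, c_j)
print(np.isclose(dist2, np.linalg.norm(v - c)**2))   # True
```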

It is relatively simple to extend the previous algorithm to one that computes an estimate of a quadratic form. We will consider the case where we have quantum access to a matrix $A$ and compute the quadratic forms $v^T A v$ and $v^T A^{-1} v$. The extension to the case where we have two different vectors, i.e. $v^T A u$ and $v^T A^{-1} u$, is trivial.

Lemma 31 (Estimation of quadratic forms). Assume to have quantum access to a symmetric positive definite matrix $A \in \mathbb{R}^{n \times n}$ such that $\|A\| \le 1$, and to a matrix $V \in \mathbb{R}^{n \times d}$. For $\epsilon > 0$, there is a quantum algorithm that performs the mapping $|i\rangle|0\rangle \mapsto |i\rangle|\overline{s_i}\rangle$, for $|\overline{s_i} - s_i| \le \epsilon$, where $s_i$ is either:

• $(|v_i\rangle, A|v_i\rangle)$, in time $\widetilde{O}\!\left(\dfrac{\mu(A)}{\epsilon}\right)$

• $(|v_i\rangle, A^{-1}|v_i\rangle)$, in time $\widetilde{O}\!\left(\dfrac{\mu(A)\,\kappa(A)}{\epsilon}\right)$

The algorithm can return an estimate $\overline{(v_i, A v_i)}$ of $(v_i, A v_i)$ such that $\big|\overline{(v_i, A v_i)} - (v_i, A v_i)\big| \le \epsilon$, using quantum access to the norms of the rows of $V$, by increasing the runtime by a factor of $\|v_i\|^2$.

Proof. Let us first analyze the case where we want to compute the quadratic form with $A$, and afterwards the case for $A^{-1}$. Recall that the matrix $A$ can be decomposed in an orthonormal basis $|u_i\rangle$. We can use Theorem 28 to perform the following mapping:

$$|i\rangle\,|v_i\rangle\,|0\rangle \;\mapsto\; \frac{1}{\sqrt{2}}\,|i\rangle\Big(|v_i, 0\rangle|0\rangle + |\widetilde{Av_i}\rangle|1\rangle\Big) \;\mapsto\; \frac{1}{2}\,|i\rangle\Big(\big(|v_i, 0\rangle + |\widetilde{Av_i}\rangle\big)|0\rangle + \big(|v_i, 0\rangle - |\widetilde{Av_i}\rangle\big)|1\rangle\Big) \qquad (2.14)$$

where $|\widetilde{Av_i}\rangle := \|Av_i\|\,|Av_i, 0\rangle + \sqrt{1 - \|Av_i\|^2}\,|G, 1\rangle$ is the output of the matrix multiplication on an extra flag qubit, and the last step is a Hadamard gate on the ancilla. It is simple to check that, for a given register $|i\rangle$, the probability of measuring 0 is:

$$p_i(0) = \frac{1 + \|Av_i\|\,\langle Av_i | v_i \rangle}{2}$$
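As a sanity check, the following NumPy sketch (with a made-up positive definite $A$) verifies that inverting this probability, $2p_i(0) - 1 = \|Av_i\|\,\langle Av_i|v_i\rangle$, recovers exactly $(v_i, A v_i)$ for a unit vector $v_i$, which is the quantity $s_i$ in the lemma.

```python
import numpy as np

# Made-up symmetric positive definite A with ||A|| <= 1, and a unit vector v.
rng = np.random.default_rng(2)
B = rng.standard_normal((4, 4))
A = B @ B.T
A /= np.linalg.norm(A, 2)            # enforce ||A|| <= 1
v = rng.standard_normal(4)
v /= np.linalg.norm(v)               # |v_i> is a unit vector

# Outcome-0 probability of Equation 2.14: (1 + ||Av|| <Av|v>) / 2,
# where <Av|v> is taken between the *normalized* state |Av> and |v>.
Av = A @ v
p0 = (1 + np.linalg.norm(Av) * np.dot(Av / np.linalg.norm(Av), v)) / 2

# Inverting the probability recovers the quadratic form.
print(np.isclose(2 * p0 - 1, v @ A @ v))   # True
```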

We now analyze the case where we want to compute the quadratic form for $A^{-1}$. For a $C = O(1/\kappa(A))$, we create instead the state:

$$|i\rangle\,\frac{1}{\sqrt{2}}\Big(|v_i, 0\rangle|0\rangle + |\widetilde{A^{-1}v_i}\rangle|1\rangle\Big) \qquad (2.15)$$

where $|\widetilde{A^{-1}v_i}\rangle := C\,\|A^{-1}v_i\|\,|A^{-1}v_i, 0\rangle + \sqrt{1 - C^2\|A^{-1}v_i\|^2}\,|G, 1\rangle$. In this case, the probability of measuring 0 in the state of Equation 2.15 is

$$p_i(0) = \frac{1 + C\,\|A^{-1}v_i\|\,\langle A^{-1}v_i | v_i \rangle}{2}.$$
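For completeness, the same inversion for the $A^{-1}$ case reads as follows (a one-line derivation under the conventions above); the $1/C$ factor is what forces the tighter precision $\epsilon/C$ chosen below.

```latex
% Recovering s_i = (v_i, A^{-1} v_i) from the outcome-0 probability of Eq. 2.15,
% using <A^{-1}v_i | v_i> = (v_i, A^{-1}v_i) / ||A^{-1}v_i|| for a unit |v_i>:
\[
  s_i \;=\; \frac{2\,p_i(0) - 1}{C}
      \;=\; \frac{C\,\|A^{-1}v_i\|\,\langle A^{-1}v_i \,|\, v_i\rangle}{C}
      \;=\; (v_i,\, A^{-1} v_i).
\]
```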

For both cases, we are left with the task of coherently estimating the measurement probability in a quantum register and boosting the success probability of this procedure. The unitaries that create the states in Equations 2.14 and 2.15 (i.e. before a measurement on the ancilla qubit) describe a mapping:

$$U_1 : |i\rangle|0\rangle \mapsto |i\rangle\Big(\sqrt{p_i(0)}\,|y_i, 0\rangle + \sqrt{1 - p_i(0)}\,|G_i, 1\rangle\Big).$$

As in [151], the idea is to use amplitude estimation, i.e. Theorem 18, along with the median evaluation lemma, i.e. Lemma 14. We can apply amplitude estimation to obtain a unitary

$$U_2 : |i\rangle|0\rangle \mapsto |i\rangle\Big(\sqrt{\alpha}\,|\overline{p_i(0)}, y_i, 0\rangle + \sqrt{1 - \alpha}\,|G'_i, 1\rangle\Big) \qquad (2.16)$$

and estimate $p_i(0)$ such that $|\overline{p_i(0)} - p_i(0)| < \epsilon$ for the case of $v_i^T A v_i$; we choose a precision $\epsilon/C$ for the case of $v_i^T A^{-1} v_i$ to get the same accuracy. The amplitude estimation theorem, i.e. Theorem 18, fails with probability at most $1 - \frac{8}{\pi^2}$. The runtime of this procedure is given by combining the runtime of creating the state $|\psi_i\rangle$, amplitude estimation, and the median lemma.

Since the error $\epsilon_2$ introduced by the matrix multiplication step is negligible, and assuming quantum access to the vectors takes polylogarithmic time, the final runtime is $O\!\left(\log(1/\delta)\,\mu(A)\log(1/\epsilon_2)/\epsilon\right)$, with an additional factor $\kappa(A)$ for the case of the quadratic form of $A^{-1}$.

Note that if we want to estimate a quadratic form of two unnormalized vectors, we can simply multiply this result by their norms. Note also that the absolute error then becomes relative with respect to the norms, i.e. $\epsilon\,\|v_i\|^2$. If we want to obtain an absolute error $\epsilon'$, as in the case of normalized unit vectors, we have to run amplitude estimation with precision $\epsilon = O(\epsilon'/\|v_i\|^2)$. To conclude, this subroutine succeeds with probability $1 - \gamma$ and requires time $O\!\left(\frac{\mu(A)\log(1/\gamma)\log(1/\epsilon_2)}{\epsilon}\right)$, with an additional factor of $\kappa(A)$ if we consider the quadratic form for $A^{-1}$, and an additional factor of $\|v_i\|^2$ if we consider the non-normalized vectors $v_i$. This concludes the proof of the lemma.
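The boosting step can be illustrated with a small classical simulation (hypothetical numbers, not the quantum routine itself): a single run of amplitude estimation lands within the target precision only with probability about $8/\pi^2 \approx 0.81$, while the median of $O(\log(1/\gamma))$ independent runs fails only with probability $\gamma$.

```python
import numpy as np

# Toy model of the median evaluation lemma: each "run" of amplitude
# estimation returns an estimate within precision eps with prob ~ 8/pi^2.
rng = np.random.default_rng(3)
p_true, eps, p_success = 0.3, 0.01, 8 / np.pi**2

def one_run():
    # Good run: within eps of the truth; bad run: an arbitrary value in [0, 1].
    if rng.random() < p_success:
        return p_true + rng.uniform(-eps, eps)
    return rng.random()

def median_boosted(n_runs):
    return np.median([one_run() for _ in range(n_runs)])

# The median of ~log(1/gamma) runs is within eps except with prob gamma.
trials = [abs(median_boosted(25) - p_true) <= eps for _ in range(1000)]
print(np.mean(trials))   # close to 1.0
```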

Note that this algorithm can be extended by using another index register to query vectors from another matrix $W$, for which we also have quantum access. This extends the capabilities to estimating inner products of the form $|i\rangle\,|j\rangle\,|\overline{w_j^T A v_i}\rangle$.

Chapter 3

Classical machine learning

In this chapter, we review and introduce the parts of classical machine learning studied in the course of this thesis. Special emphasis is put on formalizing the connection between machine learning problems and their linear-algebraic formulations.

3.1 Supervised learning

Supervised (or predictive) machine learning is the part of machine learning that deals with supervised datasets, i.e. data where each sample $x_i$ comes along with a piece of supervised information $y_i$. It helps the intuition to think of the supervised information as coming from a stochastic process that maps vectors $x_i$ to vectors $y_i$. The goal is to model this mapping from the whole input space $X$ to the output space $Y$, given a set of input-output pairs $D = \{(x_i, y_i)\}_{i=0}^{n}$. Usually, the input space is a subset of $\mathbb{R}^d$, and the output space is either $\mathbb{R}$ or a finite set $K$ of small cardinality. For the sake of exposition, it is practical to consider the training set organized into a matrix $X \in \mathbb{R}^{n \times d}$ and a vector $Y \in \mathbb{R}^n$ or $Y \in [K]^n$. The components of a vector $x_i$, i.e. a row of $X$, are called features, attributes, or covariates. The matrix $X$ is called the design matrix, or simply the dataset. The variable $y_i$ is called the response variable. If the response variable is categorical (or nominal), the problem is known as classification, or pattern recognition.

If the response variable is real-valued, we interpret the problem as learning a function $f : \mathbb{R}^d \to \mathbb{R}$, and we call it regression. Different assumptions on the structure of $f$ lead to different machine learning models, and each model can be trained (or fitted) with different algorithms.
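To make the linear-algebraic formulation concrete, here is a minimal sketch (NumPy, with synthetic data) of the regression setting just described: a design matrix $X \in \mathbb{R}^{n \times d}$, a real-valued response vector, and a linear model fitted by least squares.

```python
import numpy as np

# Synthetic supervised dataset: n samples (rows of the design matrix X),
# d features, and a real-valued response y -- the regression setting.
rng = np.random.default_rng(4)
n, d = 100, 3
X = rng.standard_normal((n, d))                 # design matrix
w_true = np.array([1.5, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.standard_normal(n)   # response = f(x) + noise

# Fitting the linear model f(x) = w^T x reduces to a least-squares problem.
w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(w_hat, 2))   # close to w_true
```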
