Conclusions


In this chapter we analyzed some applications of quantum linear algebra to propose the quantum version of a fundamental algorithm for dimensionality reduction in supervised machine learning and statistics. We carefully analyzed its performance on standard datasets and proposed a non-trivial application of the SFA model in a context where linear-algebra-based algorithms are too computationally expensive to be applied.

Algorithm 3: Quantum Frobenius Distance Estimator (QFDE)

Require: Quantum access to the matrix $X_k$ of cluster $k$ and to a test vector $x(0)$. Error parameter $\eta > 0$.

Ensure: An estimate $\overline{F_k(x(0))}$ such that $|\overline{F_k(x(0))} - F_k(x(0))| < \eta$.

1: $s := 0$

2: for $r = O(1/\eta^2)$ do

3: Create the state
$$\frac{1}{\sqrt{N_k}} \left( \sqrt{|T_k|}\, \|x(0)\|\, |0\rangle + \|X_k\|_F\, |1\rangle \right) |0\rangle\, |0\rangle$$

4: Apply the unitary that maps
$$|0\rangle\,|0\rangle \mapsto |0\rangle\, \frac{1}{\sqrt{|T_k|}} \sum_{i \in T_k} |i\rangle \qquad \text{and} \qquad |1\rangle\,|0\rangle \mapsto |1\rangle\, \frac{1}{\|X_k\|_F} \sum_{i \in T_k} \|x(i)\|\, |i\rangle$$
to the first two registers to get
$$\frac{1}{\sqrt{N_k}} \left( |0\rangle \sum_{i \in T_k} \|x(0)\|\, |i\rangle + |1\rangle \sum_{i \in T_k} \|x(i)\|\, |i\rangle \right) |0\rangle$$

5: Apply the unitary that maps $|0\rangle\,|i\rangle\,|0\rangle \mapsto |0\rangle\,|i\rangle\,|x(0)\rangle$ and $|1\rangle\,|i\rangle\,|0\rangle \mapsto |1\rangle\,|i\rangle\,|x(i)\rangle$ to get the state
$$\frac{1}{\sqrt{N_k}} \left( |0\rangle \sum_{i \in T_k} \|x(0)\|\, |i\rangle\, |x(0)\rangle + |1\rangle \sum_{i \in T_k} \|x(i)\|\, |i\rangle\, |x(i)\rangle \right) \quad (4.7)$$

6: Apply a Hadamard to the first register to get
$$\frac{1}{\sqrt{2N_k}}\, |0\rangle \sum_{i \in T_k} \left( \|x(0)\|\, |i\rangle\, |x(0)\rangle + \|x(i)\|\, |i\rangle\, |x(i)\rangle \right) + \frac{1}{\sqrt{2N_k}}\, |1\rangle \sum_{i \in T_k} \left( \|x(0)\|\, |i\rangle\, |x(0)\rangle - \|x(i)\|\, |i\rangle\, |x(i)\rangle \right) \quad (4.8)$$

7: Measure the first register and if the outcome is $|1\rangle$ then $s := s + 1$

8: end for

9: Output $s/r$.
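For intuition, the following classical sketch (not part of the quantum routine) computes the probability of measuring $|1\rangle$ in step 7 and imitates the $O(1/\eta^2)$ repetitions of steps 2-9. It assumes, consistently with the amplitudes of step 3, that $N_k = |T_k|\,\|x(0)\|^2 + \|X_k\|_F^2$ and that $F_k(x(0))$ equals this measurement probability; the function name and interface are illustrative only.

```python
import numpy as np

def qfde_classical_sketch(X_k, x0, eta, rng=None):
    """Classical sketch of what QFDE estimates.

    X_k : (|T_k|, d) array, rows are the cluster points x(i).
    x0  : (d,) test vector x(0).
    eta : target additive error; the number of repetitions is O(1/eta^2).

    Assumption: F_k(x0) = sum_i ||x0 - x(i)||^2 / (2 * N_k), with
    N_k = |T_k| * ||x0||^2 + ||X_k||_F^2, which matches the probability of
    measuring |1> in step 7, so the frequency s/r of step 9 estimates it.
    """
    rng = np.random.default_rng() if rng is None else rng
    T_k = X_k.shape[0]
    N_k = T_k * np.linalg.norm(x0) ** 2 + np.linalg.norm(X_k, 'fro') ** 2

    # Exact probability of outcome |1> (the quantity being estimated).
    p_one = np.sum(np.linalg.norm(X_k - x0, axis=1) ** 2) / (2 * N_k)

    # Steps 2-9: repeat the circuit r = O(1/eta^2) times and count the 1s.
    r = int(np.ceil(1.0 / eta ** 2))
    s = rng.binomial(r, p_one)  # stand-in for r independent measurements
    return s / r, p_one
```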

Chapter 5

Q-means

In this chapter we detail a quantum algorithm for unsupervised learning, which can be seen as the quantum version of the well-known k-means algorithm. This algorithm is one of the simplest, yet most commonly used, clustering algorithms. We first introduce the classical algorithm, then propose a definition of the k-means model that makes it robust to errors in the model. We then explain how to derive a quantum version of the k-means algorithm and show its performance on experimental data.

5.1 The k-means algorithm

The k-means algorithm was introduced in [102], and is extensively used for unsupervised problems. The inputs to the k-means algorithm are vectors $x_i \in \mathbb{R}^d$ for $i \in [n]$. These points must be partitioned into $k$ subsets according to a similarity measure, which in k-means is the Euclidean distance between points. The output of the k-means algorithm is a list of $k$ cluster centers, which are called centroids. The algorithm starts by selecting $k$ initial centroids randomly or using an efficient heuristic like k-means++ [14]. It then alternates between two steps: (i) each data point is assigned the label of the closest centroid; (ii) each centroid is updated to be the average of the data points assigned to the corresponding cluster. These two steps are repeated until convergence, that is, until the change in the centroids during one iteration is sufficiently small.
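As an illustration of the k-means++ seeding of [14] mentioned above, the following numpy sketch (with an illustrative function name) picks each new centroid with probability proportional to its squared distance from the closest centroid chosen so far.

```python
import numpy as np

def kmeans_pp_init(X, k, rng=None):
    """Sketch of k-means++ seeding: the first centroid is uniform at random,
    and each subsequent centroid is drawn with probability proportional to its
    squared distance to the closest centroid selected so far."""
    rng = np.random.default_rng() if rng is None else rng
    n = X.shape[0]
    centroids = [X[rng.integers(n)]]
    for _ in range(k - 1):
        d2 = np.min([np.sum((X - c) ** 2, axis=1) for c in centroids], axis=0)
        centroids.append(X[rng.choice(n, p=d2 / d2.sum())])
    return np.array(centroids)
```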

More precisely, we are given a dataset $X$ of vectors $x_i \in \mathbb{R}^d$ for $i \in [n]$. At step $t$, we denote the $k$ clusters by the sets $C_j^t$ for $j \in [k]$, and each corresponding centroid by the vector $c_j^t$. At each iteration, the data points $x_i$ are assigned to a cluster $C_j^t$ such that $C_1^t \cup C_2^t \cup \cdots \cup C_k^t = V$ and $C_i^t \cap C_l^t = \emptyset$ for $i \neq l$. Let $d(x_i, c_j^t)$ be the Euclidean distance between the vectors $x_i$ and $c_j^t$. The first step of the algorithm assigns to each $x_i$ a label $\ell(x_i)^t$ corresponding to the closest centroid, that is,
$$\ell(x_i)^t = \operatorname{argmin}_{j \in [k]} \, d(x_i, c_j^t).$$

The centroids are then updated as
$$c_j^{t+1} = \frac{1}{|C_j^t|} \sum_{i \in C_j^t} x_i,$$
so that the new centroid is the average of all points that have been assigned to the cluster in this iteration. We say that we have converged if, for a small threshold $\tau$ (which might be data dependent), we have
$$\frac{1}{k} \sum_{j=1}^{k} d(c_j^t, c_j^{t-1}) \leq \tau.$$
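A minimal classical sketch of the loop just described, with the Euclidean assignment step, the averaging step, and the convergence test above; names and the iteration cap are illustrative.

```python
import numpy as np

def kmeans(X, centroids, tau, max_iter=100):
    """Sketch of the k-means loop: assign each point to its closest centroid,
    recompute each centroid as the average of its assigned points, and stop
    when the average centroid movement drops below the threshold tau."""
    k = centroids.shape[0]
    for _ in range(max_iter):
        # Step (i): label each point with the index of the closest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = np.argmin(dists, axis=1)
        # Step (ii): move each centroid to the mean of its assigned points.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Convergence test: (1/k) * sum_j d(c_j^t, c_j^{t-1}) <= tau.
        if np.mean(np.linalg.norm(new_centroids - centroids, axis=1)) <= tau:
            return new_centroids, labels
        centroids = new_centroids
    return centroids, labels
```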

The loss function that this algorithm aims to minimize is the RSS (residual sum of squares), the sum of the squared distances between the points and the centroid of their cluster:

$$\mathrm{RSS} := \sum_{j \in [k]} \sum_{i \in C_j} d(c_j, x_i)^2.$$
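For concreteness, the RSS can be computed directly from the labels and centroids; the snippet below is a small sketch with illustrative names.

```python
import numpy as np

def rss(X, centroids, labels):
    """Residual sum of squares: total squared distance of every point to the
    centroid of the cluster it is assigned to."""
    return float(np.sum((X - centroids[labels]) ** 2))
```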


As the RSS decreases at each iteration of the k-means algorithm, the algorithm converges to a local minimum of the RSS. The number of iterations $T$ needed for convergence depends on the data and the number of clusters. A single iteration has complexity $O(knd)$, since the $n$ vectors of dimension $d$ have to be compared to each of the $k$ centroids.

The centroids obtained at time $t$ are stored in the matrix $C^t \in \mathbb{R}^{k \times d}$, such that the $j$-th row $c_j^t$ for $j \in [k]$ represents the centroid of the cluster $C_j^t$.

From a computational complexity point of view, we recall that it is NP-hard to find a clustering that achieves the global minimum of the RSS. There are classical clustering algorithms based on optimizing different loss functions; however, the k-means algorithm uses the RSS as its objective function. The algorithm can be super-polynomial in the worst case (the number of iterations is $2^{\omega(\sqrt{n})}$ [13]), but the number of iterations is usually small in practice. The k-means algorithm with a suitable initialization heuristic like k-means++ (described later on) finds a clustering such that the value of the RSS objective function is within a multiplicative $O(\log n)$ factor of the minimum value [14].

5.1.1 δ-k-means

We now consider a δ-robust version of k-means in which we introduce some noise. The noise affects the algorithm in both steps of k-means: label assignment and centroid estimation.

• Let $c_i$ be the closest centroid to the data point $x_i$. Then the set of possible labels $L_\delta(x_i)$ for $x_i$ is defined as follows:
$$L_\delta(x_i) = \{c_p : |d^2(c_i, x_i) - d^2(c_p, x_i)| \leq \delta\}.$$
The assignment rule selects arbitrarily a cluster label from the set $L_\delta(x_i)$.

• We add $\delta/2$ noise during the calculation of the centroid. Let $C_j^{t+1}$ be the set of points which have been labeled by $j$ in the previous step. For δ-k-means we pick a centroid $c_j^{t+1}$ with the property that
$$\left\| c_j^{t+1} - \frac{1}{|C_j^{t+1}|} \sum_{i \in C_j^{t+1}} x_i \right\| < \frac{\delta}{2}.$$
One way to see this is to perturb the exact centroid with some noise of magnitude at most $\delta/2$; both noisy rules are sketched right after this list.
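A minimal sketch of the two noisy rules above, assuming an arbitrary (here random) choice from $L_\delta(x_i)$ and a perturbation of the exact average by a vector of norm at most $\delta/2$; function names are illustrative.

```python
import numpy as np

def delta_label(x, centroids, delta, rng):
    """Pick an arbitrary label from L_delta(x): all centroids whose squared
    distance to x is within delta of that of the closest centroid."""
    d2 = np.sum((centroids - x) ** 2, axis=1)
    candidates = np.flatnonzero(np.abs(d2 - d2.min()) <= delta)
    return rng.choice(candidates)

def delta_centroid(points, delta, rng):
    """Exact cluster average perturbed by noise of norm at most delta / 2."""
    noise = rng.normal(size=points.shape[1])
    noise *= rng.uniform(0, delta / 2) / np.linalg.norm(noise)
    return points.mean(axis=0) + noise
```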

Let us add two remarks on δ-k-means. First, for a dataset that is expected to have clusterable data, and for a small δ, the number of vectors on the boundary that risk being misclassified in each step, that is, the vectors for which $|L_\delta(x_i)| > 1$, is typically much smaller than the number of vectors that are close to a unique centroid. Second, we also increase the convergence threshold of the k-means algorithm by $\delta/2$. All in all, δ-k-means is able to find a clustering that is robust when the data points and the centroids are perturbed with some noise of magnitude $O(\delta)$. As we will see in this work, q-means is the quantum equivalent of δ-k-means.
