Prototype-based algorithms

In the document The DART-Europe E-theses Portal (Page 25-29)

2.3 Distance-based clustering algorithms

2.3.3 Prototype-based algorithms

Prototype-based algorithms are another family of clustering algorithms. The principle of these algorithms is based on vector quantization, a data compression process which consists in representing the data with a few representatives called prototypes. Each data point is then linked to its closest prototype in the data space. The main task of these algorithms is therefore to build relevant prototypes and to link the data to them.
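The linking step of vector quantization is very compact to state. Below is a minimal sketch in Python; the function name `quantize` and the choice of the Euclidean distance are our illustration choices, not prescribed by the definition above:

```python
import numpy as np

def quantize(X, prototypes):
    """Vector quantization: link each data point to its closest prototype."""
    # Squared Euclidean distance from every point to every prototype
    dist = ((X[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=2)
    # Index of the closest prototype for each data point
    return dist.argmin(axis=1)
```

All the prototype-based algorithms in this section share this step; they differ in how the prototypes themselves are built.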

A common example of a prototype is the centroid of a high-density area. Depending on the number of prototypes, each of them may represent a cluster, or several of them may need to be regrouped to find the clusters.

2.3.3.1 The K-Means algorithm

The K-Means algorithm is one of the most famous prototype-based clustering algorithms. It is a simple and fast, yet relatively good clustering method. Its principle is the following [96, 122]: suppose that we would like to divide our data into K clusters, the value of K being known in advance. We allocate K cluster prototypes (also called mean-values) to our input space, and we would like to move these prototypes so that each of them becomes the centroid of a cluster. Given a chosen distance measure, the procedure consists in alternating the following two steps until convergence: 1) link each data point to the closest prototype, 2) move each prototype so that it becomes the barycenter of the data points currently linked to it. This procedure is described in Algorithm 3, and an example with 2 clusters on the “Old Faithful” data set is shown in Figure 2.3.

It is convenient at this point to introduce a notation that describes the assignment of data points to clusters. For each data point x_n, let s_{n,i} ∈ {0,1}, with i ∈ [1..K], be a set of binary indicator variables describing to which of the K clusters the data point has been assigned. For instance, if x_n is assigned to cluster c_k, then s_{n,k} = 1 and s_{n,i} = 0 for all i ≠ k. Ultimately, what the K-Means algorithm does is optimize the cost function R̃(µ) given in Equation (2.5).

\tilde{R}(\mu) = \sum_{n=1}^{N} \sum_{i=1}^{K} s_{n,i} \, \lVert x_n - \mu_i \rVert^2 \qquad (2.5)

Because each phase reduces the value of the objective function R̃(µ), convergence of the algorithm is assured. However, it may converge to a local rather than a global minimum of R̃(µ). The convergence properties of the K-Means algorithm have been studied in [96].
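The objective of Equation (2.5) is cheap to evaluate, which makes it a convenient convergence monitor in practice. A minimal sketch in Python (the function name is ours):

```python
import numpy as np

def kmeans_cost(X, mu, assign):
    """Evaluate the K-Means objective (Eq. 2.5): the sum of squared
    distances from each point to the centroid it is assigned to."""
    # mu[assign] picks, for every point, the centroid of its cluster
    return float(((X - mu[assign]) ** 2).sum())
```

Tracking this value across iterations makes the guaranteed monotone decrease, and hence convergence, directly observable.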

Algorithm 3: K-Means Algorithm
    Choose a value for K
    Randomly initialize the K centroids µ_i
    while learning do
        forall the x_n ∈ X do
            Assign x_n to the cluster c_i with the closest centroid:
                s_{n,i} = 1 if i = argmin_j ||x_n − µ_j||², and s_{n,i} = 0 otherwise
        end
        forall the centroids µ_i do
            Move µ_i to the barycenter of its assigned data:
                µ_i = (Σ_n s_{n,i} x_n) / (Σ_n s_{n,i})
        end
        Stop when the assignments no longer change
    end
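The two alternating steps translate almost directly into code. Below is a minimal sketch in Python, assuming the data is a NumPy array; the function name `k_means` and the initialization by sampling K data points are our illustration choices:

```python
import numpy as np

def k_means(X, K, n_iter=100, seed=0):
    """Minimal K-Means: alternate assignment and centroid update."""
    rng = np.random.default_rng(seed)
    # Randomly pick K distinct data points as the initial centroids
    mu = X[rng.choice(len(X), size=K, replace=False)]
    for _ in range(n_iter):
        # Step 1: link each data point to its closest centroid
        dist = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        assign = dist.argmin(axis=1)
        # Step 2: move each centroid to the barycenter of its linked points
        # (an empty cluster keeps its previous centroid)
        new_mu = np.array([X[assign == i].mean(axis=0) if (assign == i).any() else mu[i]
                           for i in range(K)])
        if np.allclose(new_mu, mu):  # converged: centroids stopped moving
            break
        mu = new_mu
    return mu, assign
```

Note that the result depends on the random initialization, which is precisely the non-deterministic behaviour discussed below.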

Several algorithms based on improved or modified versions of the K-Means algorithm have been proposed in the literature [41, 46, 136, 64, 77, 34, 29]. Algorithms based on the K-Means algorithm suffer from several weaknesses. The main one is the need to provide K: it requires knowing in advance how many clusters are to be found. In practice this is rarely the case, because we expect the clustering algorithm to actually discover the clusters. Therefore, when the number of clusters is really unknown, the only solution is to run the algorithm several times with different values of K and to pick the best clustering based on a given quality index (for instance the Silhouette index [113] or the Davies-Bouldin index [36]). This method is costly and may prove ineffective because of the non-deterministic nature of the K-Means algorithm. Adaptations of the K-Means algorithm have been proposed [106] to solve this issue, but they remain only partially satisfying. Second, algorithms based on the K-Means can only find hyper-spherical clusters and will also fail to detect the clusters properly if their sizes are significantly different.
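The run-several-values-of-K procedure only requires a cluster quality index. As an illustration, the Silhouette index mentioned above can be computed directly from pairwise distances; a minimal sketch in Python (the function name is ours):

```python
import numpy as np

def silhouette_index(X, labels):
    """Mean silhouette coefficient: for each point, (b - a) / max(a, b),
    where a is the mean distance to the point's own cluster and b the mean
    distance to the nearest other cluster. Higher is better (range [-1, 1])."""
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    scores = []
    for i in range(len(X)):
        own = (labels == labels[i])
        # Mean intra-cluster distance, excluding the point itself
        a = D[i, own].sum() / max(own.sum() - 1, 1)
        # Mean distance to the closest other cluster
        b = min(D[i, labels == c].mean() for c in np.unique(labels) if c != labels[i])
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))
```

Running K-Means once per candidate K and keeping the K whose labeling maximizes this index is exactly the model-selection loop described above, at the cost of one full clustering per candidate.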

Figure 2.3: Illustration of the K-Means algorithm on the Old Faithful data set

2.3.3.2 Fuzzy C-Means

Since fuzzy clustering allows objects to belong to several clusters simultaneously, with different degrees of membership, it is in many situations more natural than hard clustering. Within this context, the Fuzzy C-Means algorithm [14, 91] is a major extension of the K-Means algorithm that enables fuzzy clusters.

A generalization of the hard partition from the K-Means to the fuzzy case is easily obtained by allowing the s_{n,i} to take real values: s_{n,i} ∈ [0,1], with the properties shown in Equation (2.6). A partition can then be represented by the partition matrix S = (s_{n,i})_{C×N}, where N is the size of the data set and C the number of clusters (equivalent to K in K-Means).

s_{n,i} \in [0,1] \;\; \forall n, i, \qquad \sum_{i=1}^{C} s_{n,i} = 1 \;\; \forall n \qquad (2.6)

The objective function optimized by the Fuzzy C-Means algorithm is given in Equation (2.7), where m is a fuzziness parameter (sometimes called a fuzzifier) which determines the fuzziness of the resulting clusters. For m = 1 the resulting partition is hard, and when m → ∞ the partition becomes completely fuzzy (s_{n,i} → 1/C). When the Fuzzy C-Means relies on a Euclidean distance, m = 2 is the most common choice.

\tilde{R}(\mu) = \sum_{n=1}^{N} \sum_{i=1}^{C} (s_{n,i})^m \, \lVert x_n - \mu_i \rVert^2 \qquad (2.7)

In Algorithm 4 below, we show the Fuzzy C-Means algorithm. As one can see, it is fairly similar to the K-Means algorithm.

Algorithm 4: Fuzzy C-Means Algorithm
    Choose a value for C and for m
    Randomly initialize the s⁰_{n,i} so that the conditions in (2.6) are respected
    r = 0
    while the algorithm has not converged do
        Update the centroids:
            µ_i = (Σ_n (s^r_{n,i})^m x_n) / (Σ_n (s^r_{n,i})^m)
        Update the memberships:
            s^{r+1}_{n,i} = 1 / Σ_{j=1}^{C} (||x_n − µ_i|| / ||x_n − µ_j||)^{2/(m−1)}
        r = r + 1
    end
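The Fuzzy C-Means updates can be sketched compactly in code. Below is a minimal sketch in Python; the centroid and membership formulas are the standard alternating updates that minimize Equation (2.7) for a Euclidean distance, and the function name is ours:

```python
import numpy as np

def fuzzy_c_means(X, C, m=2.0, n_iter=100, seed=0):
    """Minimal Fuzzy C-Means: alternate centroid and membership updates."""
    rng = np.random.default_rng(seed)
    # Random memberships, each row normalized to sum to 1 (conditions of Eq. 2.6)
    S = rng.random((len(X), C))
    S /= S.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        W = S ** m
        # Centroids: weighted barycenters, mu_i = sum_n (s_ni)^m x_n / sum_n (s_ni)^m
        mu = (W.T @ X) / W.sum(axis=0)[:, None]
        # Squared distances to every centroid (kept > 0 to avoid division by zero)
        d = np.maximum(((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2), 1e-12)
        # Memberships: s_ni = 1 / sum_j (d_ni / d_nj)^(1/(m-1)) on squared distances
        S = 1.0 / ((d[:, :, None] / d[:, None, :]) ** (1.0 / (m - 1))).sum(axis=2)
    return mu, S
```

A hard clustering can always be recovered afterwards by assigning each point to the cluster with the highest membership.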

2.3.3.3 Affinity Propagation

The Affinity Propagation algorithm (AP) [48] is another recent and popular clustering algorithm that represents the clusters in the form of prototypes. The main idea of this method is to use a data similarity graph (containing the pairwise similarities between all the data) with the goal of determining which data elements can best be used to represent the other data locally.

Let X = {x_1, ..., x_N} be a data set, and s(x_i, x_j) a similarity measure. Let R = (r_{i,k})_{N×N} be the responsibility matrix, where each r_{i,k} quantifies how well-suited x_k is to be a representative for x_i relative to the other candidates. And let A = (a_{i,k})_{N×N} be the availability matrix, which quantifies how appropriate it would be for x_i to pick x_k as a representative when taking into account the other points' preference to also pick x_k as their representative. The algorithm then works as shown in Algorithm 5: first it builds the matrices A and R. Then it extracts the most accurate representative for each data point, and picks a few of them to become prototypes based on which data points are their own best representative. Finally, each data point is linked to one of the final prototypes.

The main advantage of the Affinity Propagation algorithm is that, unlike the K-Means or Fuzzy C-Means algorithms, the number of clusters need not be provided. The algorithm finds by itself a certain number of prototypes based on the initial similarity graph.

One drawback of this algorithm is its O(N²) complexity and the need to store matrices of the same size. These requirements make the Affinity Propagation algorithm unsuitable for large data sets.

Algorithm 5: Affinity Propagation Algorithm
    Build the similarity graph S
    Initialize A and R to all zeroes
    while the algorithm did not converge do
        Update R:
            r_{i,k} = s_{i,k} − max_{k′≠k} (a_{i,k′} + s_{i,k′})
        Update A:
            ∀ i ≠ k, a_{i,k} = min(0, r_{k,k} + Σ_{i′∉{i,k}} max(0, r_{i′,k}))
            a_{k,k} = Σ_{i′≠k} max(0, r_{i′,k})
    end
    Initialize the prototypes: C = ∅
    Find the prototypes:
    forall the x_i ∈ X do
        Find k* = argmax_k (a_{i,k} + r_{i,k})
        if k* == i then C = C ∪ {i}
    end
    Link each data point to a prototype:
    forall the x_i ∈ X do
        Find c = argmax_{k∈C} (a_{i,k} + r_{i,k})
        Link x_i to x_c as its prototype
    end
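The message-passing updates of Algorithm 5 vectorize naturally as matrix operations. Below is a minimal sketch in Python; the damping factor is a standard addition (not shown in Algorithm 5) used to avoid oscillations, and the function name is ours:

```python
import numpy as np

def affinity_propagation(S, damping=0.5, n_iter=200):
    """Minimal Affinity Propagation on a similarity matrix S
    (the diagonal of S holds the self-preferences)."""
    N = S.shape[0]
    R = np.zeros((N, N))
    A = np.zeros((N, N))
    idx = np.arange(N)
    for _ in range(n_iter):
        # Responsibility: r_ik = s_ik - max_{k' != k} (a_ik' + s_ik')
        AS = A + S
        best = AS.argmax(axis=1)
        first = AS[idx, best]
        AS[idx, best] = -np.inf          # mask the max to get the second max
        second = AS.max(axis=1)
        R_new = S - first[:, None]
        R_new[idx, best] = S[idx, best] - second
        R = damping * R + (1 - damping) * R_new
        # Availability: a_ik = min(0, r_kk + sum_{i' not in {i,k}} max(0, r_i'k))
        #               a_kk = sum_{i' != k} max(0, r_i'k)
        Rp = np.maximum(R, 0)
        Rp[idx, idx] = R[idx, idx]       # keep r_kk itself in the column sums
        A_new = Rp.sum(axis=0)[None, :] - Rp
        diag = A_new[idx, idx].copy()
        A_new = np.minimum(A_new, 0)
        A_new[idx, idx] = diag           # the diagonal is not clipped at 0
        A = damping * A + (1 - damping) * A_new
    # A point is a prototype (exemplar) when it is its own best representative
    exemplars = np.flatnonzero((A + R).argmax(axis=1) == idx)
    # Link each point to the prototype maximizing a_ik + r_ik
    labels = exemplars[np.argmax((A + R)[:, exemplars], axis=1)]
    return exemplars, labels
```

The diagonal of S acts as the preference parameter: higher preferences yield more prototypes, which is how AP decides the number of clusters by itself; a common default is the median of the similarities.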
