**2.3 Distance-based clustering algorithms**

**2.3.3 Prototype-based algorithms**

Prototype-based algorithms are another family of clustering algorithms. Their principle is based on vector quantization, a data compression process that consists in representing the data with a few representatives called prototypes. Each data point is then linked to its closest prototype in the data space. The main task of these algorithms is therefore to build relevant prototypes and to link the data points to them.

A common example of a prototype is the centroid of a high-density area. Depending on the number of prototypes, each of them may represent a cluster, or several of them may need to be regrouped to find the clusters.
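As a minimal illustration of the quantization step described above, linking data points to their closest prototype can be sketched in pure Python (function and variable names are our own, for illustration only):

```python
import math

def assign_to_prototypes(data, prototypes):
    """Vector quantization step: link each data point to its closest prototype."""
    return [min(range(len(prototypes)),
                key=lambda i: math.dist(p, prototypes[i]))
            for p in data]
```

Each point receives the index of the nearest prototype; every clustering algorithm in this section builds on some variant of this step.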

**2.3.3.1 The K-Means algorithm**

The K-Means algorithm is one of the most famous prototype-based clustering algorithms. It is a simple and fast, yet relatively good clustering method. Its principle is the following [96, 122]: suppose that we would like to divide our data into *K* clusters, the value of *K* being known in advance. We allocate *K* cluster prototypes (also called mean-values) to our input space, and we would like to move these prototypes so that each of them becomes the centroid of a cluster. Given a chosen distance measure, the procedure consists in alternating the following two steps until convergence: 1) link each data point to the closest prototype, 2) move each prototype so that it becomes the barycenter of the data points to which it is linked. This procedure is described in Algorithm 3, and an example with 2 clusters on the “Old Faithful” data set is shown in Figure 2.3.

It is convenient at this point to introduce a notation that describes the assignment of data points to clusters. For each data point $x_n$, let $s_{n,i} \in \{0,1\}$, $i \in [1..K]$, be a set of binary indicator variables. The $s_{n,i}$ are used to describe to which one of the $K$ clusters a data point has been assigned. For instance, if $x_n$ is assigned to cluster $c_k$, then $s_{n,k} = 1$ and $\forall i \neq k,\ s_{n,i} = 0$. Ultimately, what the K-Means algorithm does is optimize a cost function $\tilde{R}(\mu)$, as given in Equation (2.5).

$$\tilde{R}(\mu) = \sum_{n=1}^{N} \sum_{i=1}^{K} s_{n,i}\, \lVert x_n - \mu_i \rVert^2 \tag{2.5}$$

Because each phase reduces the value of the objective function $\tilde{R}(\mu)$, convergence of the algorithm is assured. However, it may converge to a local rather than a global minimum of $\tilde{R}(\mu)$. The convergence properties of the K-Means algorithm have been studied in [96].

Algorithm 3: K-Means Algorithm
Choose a value for $K$
Randomly initialize the $K$ centroids $\mu_i$
while Learning do
&nbsp;&nbsp;&nbsp;&nbsp;forall the $x_n \in X$ do
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Assign $x_n$ to the cluster $c_i$ with the closest centroid: $s_{n,i} = 1$ if $i = \operatorname{argmin}_j \lVert x_n - \mu_j \rVert^2$, and $s_{n,i} = 0$ otherwise
&nbsp;&nbsp;&nbsp;&nbsp;end
&nbsp;&nbsp;&nbsp;&nbsp;forall the centroids $\mu_i$ do
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Update $\mu_i$ as the barycenter of the data points assigned to it: $\mu_i = \frac{\sum_n s_{n,i}\, x_n}{\sum_n s_{n,i}}$
&nbsp;&nbsp;&nbsp;&nbsp;end
end
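The two alternating steps of the K-Means procedure can be sketched in pure Python (a minimal illustration; function and variable names are our own, not from [96, 122]):

```python
import random

def kmeans(data, k, n_iter=100, seed=0):
    """Minimal K-Means: alternate assignment and barycenter updates."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(data, k)]
    clusters = [[] for _ in range(k)]
    for _ in range(n_iter):
        # Step 1: link each data point to its closest prototype.
        clusters = [[] for _ in range(k)]
        for p in data:
            i = min(range(k),
                    key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])))
            clusters[i].append(p)
        # Step 2: move each prototype to the barycenter of its points.
        new_centroids = []
        for i, pts in enumerate(clusters):
            if pts:
                new_centroids.append([sum(vals) / len(pts) for vals in zip(*pts)])
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # assignments are stable: converged
        centroids = new_centroids
    return centroids, clusters
```

On two well-separated groups of points, the alternation typically stabilizes within a few iterations, illustrating the convergence behavior discussed above.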

Several algorithms based on improved or modified versions of the K-Means algorithm have been proposed in the literature [41, 46, 136, 64, 77, 34, 29]. Algorithms based on the K-Means algorithm suffer from several weaknesses. The main one is the need to provide a value for *K*: it requires knowing in advance how many clusters are to be found. In practice this is rarely the case, because we expect the clustering algorithm to actually discover the clusters. Therefore, the only solution when the number of clusters is really unknown is to run the algorithm several times with different values of *K* and to pick the best clustering based on a given quality index (for instance the Silhouette index [113] or the Davies-Bouldin index [36]). This method is costly and may prove ineffective because of the non-deterministic nature of the K-Means algorithm. Adaptations of the K-Means algorithm have been proposed [106] to solve this issue, but they remain only partially satisfying. Second, algorithms based on the K-Means can only find hyper-spherical clusters, and will also fail to detect the clusters properly if their sizes are significantly different.
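As an illustration of such a quality index, the Silhouette index [113] can be computed with a short pure-Python sketch (names are our own): for each point, $a$ is its mean distance to its own cluster and $b$ its mean distance to the nearest other cluster.

```python
import math

def silhouette(data, labels):
    """Mean Silhouette coefficient: s = (b - a) / max(a, b) for each point."""
    clusters = {}
    for p, l in zip(data, labels):
        clusters.setdefault(l, []).append(p)
    scores = []
    for p, l in zip(data, labels):
        own = [q for q in clusters[l] if q is not p]
        if not own:
            scores.append(0.0)  # convention for singleton clusters
            continue
        a = sum(math.dist(p, q) for q in own) / len(own)       # intra-cluster
        b = min(sum(math.dist(p, q) for q in pts) / len(pts)   # nearest other cluster
                for l2, pts in clusters.items() if l2 != l)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)
```

Running K-Means for several values of *K* and keeping the partition with the highest mean silhouette is the model-selection procedure described above.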

Figure 2.3: Illustration of the K-Means algorithm on the Old Faithful data set

**2.3.3.2 Fuzzy C-Means**

Since fuzzy clustering allows the objects to belong to several clusters simultaneously with different degrees of membership, in many situations it is more natural than hard clustering. Within this context, the Fuzzy C-Means algorithm [14, 91] is a major extension of the K-Means algorithm that enables fuzzy clusters.

A generalization of the hard partition from the K-Means to the fuzzy case is easily possible by allowing the $s_{n,i}$ to take real values: $s_{n,i} \in [0,1]$, with the properties shown in Equation (2.6). A partition can then be represented by the partition matrix $S = (s_{n,i})_{C \times N}$, where $N$ is the size of the data set and $C$ is the number of clusters (equivalent to the $K$ in K-Means).

$$s_{n,i} \in [0,1]\ \ \forall n, \qquad \sum_{i=1}^{C} s_{n,i} = 1 \tag{2.6}$$

The objective function optimized by the Fuzzy C-Means algorithm is given in Equation (2.7), where $m$ is a fuzziness parameter (sometimes called a fuzzifier) that determines the fuzziness of the resulting clusters. For $m = 1$ the resulting partition will be hard, and when $m \to \infty$ the partition will be completely fuzzy ($s_{n,i} \to \frac{1}{C}$). When the Fuzzy C-Means relies on a Euclidean distance, $m = 2$ is the most common choice.

$$\tilde{R}(\mu) = \sum_{n=1}^{N} \sum_{i=1}^{C} (s_{n,i})^m\, \lVert x_n - \mu_i \rVert^2 \tag{2.7}$$
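As a minimal illustration, the objective of Equation (2.7) can be computed directly (a pure-Python sketch; the function name is our own):

```python
def fuzzy_objective(data, centroids, s, m=2.0):
    """Equation (2.7): sum over n and i of (s[n][i])^m * ||x_n - mu_i||^2."""
    return sum((s[n][i] ** m)
               * sum((a - b) ** 2 for a, b in zip(data[n], centroids[i]))
               for n in range(len(data))
               for i in range(len(centroids)))
```

With $m = 1$ and hard memberships this reduces to the K-Means cost of Equation (2.5), which is what makes Fuzzy C-Means a generalization of K-Means.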

In Algorithm 4 below, we show the Fuzzy C-Means algorithm. As one can see, it is fairly similar to the K-Means algorithm.

Algorithm 4: Fuzzy C-Means Algorithm
Choose a value for $C$ and for $m$
Randomly initialize the $s^{0}_{n,i}$ so that the conditions in (2.6) are respected
$r = 0$
while not converged do
&nbsp;&nbsp;&nbsp;&nbsp;Update the centroids: $\mu_i = \frac{\sum_n (s^{r}_{n,i})^m\, x_n}{\sum_n (s^{r}_{n,i})^m}$
&nbsp;&nbsp;&nbsp;&nbsp;Update the memberships: $s^{r+1}_{n,i} = \left( \sum_{j=1}^{C} \left( \frac{\lVert x_n - \mu_i \rVert}{\lVert x_n - \mu_j \rVert} \right)^{\frac{2}{m-1}} \right)^{-1}$
&nbsp;&nbsp;&nbsp;&nbsp;$r = r + 1$
end
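A minimal pure-Python sketch of the Fuzzy C-Means iteration, using the standard centroid and membership updates (function and variable names are our own):

```python
import math
import random

def fuzzy_cmeans(data, c, m=2.0, n_iter=100, seed=0):
    """Minimal Fuzzy C-Means: alternate centroid and membership updates."""
    rng = random.Random(seed)
    n, dim = len(data), len(data[0])
    # Random memberships, each row normalized to sum to 1 (Equation 2.6).
    s = [[rng.random() for _ in range(c)] for _ in range(n)]
    s = [[v / sum(row) for v in row] for row in s]
    centroids = [[0.0] * dim for _ in range(c)]
    for _ in range(n_iter):
        # Centroids: barycenters weighted by s^m.
        for i in range(c):
            w = [s[k][i] ** m for k in range(n)]
            tot = sum(w)
            centroids[i] = [sum(wk * data[k][d] for k, wk in enumerate(w)) / tot
                            for d in range(dim)]
        # Memberships: inverse distance ratios with exponent 2/(m-1).
        for k in range(n):
            dists = [max(math.dist(data[k], mu), 1e-12) for mu in centroids]
            for i in range(c):
                s[k][i] = 1.0 / sum((dists[i] / dists[j]) ** (2.0 / (m - 1.0))
                                    for j in range(c))
    return centroids, s
```

The sketch runs a fixed number of iterations rather than testing convergence, which keeps it short; in practice one stops when the membership matrix changes less than a small threshold.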

**2.3.3.3 The Affinity Propagation algorithm**

The Affinity Propagation algorithm (AP) [48] is another recent and popular clustering algorithm that relies on representing the clusters in the form of prototypes. The main idea of this method is to use a data similarity graph (containing the pairwise similarities between all the data points) with the goal of determining which data elements can best be used to represent the other data locally.

Let $X = \{x_1, ..., x_N\}$ be a data set, and $s(x_i, x_j)$ a similarity measure. Let $R = (r_{i,k})_{N \times N}$ be a responsibility matrix, where each $r_{i,k}$ quantifies how well-suited $x_k$ is to be a good representative for $x_i$ relative to the other candidates. And let $A = (a_{i,k})_{N \times N}$ be the availability matrix, which quantifies how appropriate it would be for $x_i$ to pick $x_k$ as a representative when taking into account the other points' preferences to also pick $x_k$ as their representative. The algorithm then works as shown in Algorithm 5: first it builds the matrices $A$ and $R$; then it extracts the most accurate representative for each data point, and picks a few of them to become prototypes based on which data points are their own best representative; finally, each data point is linked to one of the final prototypes.

The main advantage of the Affinity Propagation algorithm is that, unlike the K-Means algorithm or the Fuzzy C-Means algorithm, the number of clusters need not be provided. The algorithm will find by itself a certain number of prototypes based on the initial similarity graph.

One drawback of this algorithm is its $O(N^2)$ complexity and the need to store matrices of the same size. These requirements make the Affinity Propagation algorithm unsuitable for large data sets.

Algorithm 5: Affinity Propagation Algorithm
Build the similarity graph $S$
Initialize $A$ and $R$ to all zeroes
while the algorithm did not converge do
&nbsp;&nbsp;&nbsp;&nbsp;Update $R$: $r_{i,k} = s_{i,k} - \max_{k' \neq k} \left( a_{i,k'} + s_{i,k'} \right)$
&nbsp;&nbsp;&nbsp;&nbsp;Update $A$: $\forall i \neq k,\ a_{i,k} = \min\left( 0,\ r_{k,k} + \sum_{i' \notin \{i,k\}} \max(0, r_{i',k}) \right)$ and $a_{k,k} = \sum_{i' \neq k} \max(0, r_{i',k})$
end
Initialize the prototypes: $C = \emptyset$
Find the prototypes:
forall the $x_i \in X$ do
&nbsp;&nbsp;&nbsp;&nbsp;Find $k^{*} = \operatorname{argmax}_k (a_{i,k} + r_{i,k})$
&nbsp;&nbsp;&nbsp;&nbsp;if $k^{*} == i$ then $C = C \cup \{k^{*}\}$
end
Link each data point to a prototype:
forall the $x_i \in X$ do
&nbsp;&nbsp;&nbsp;&nbsp;Find $c = \operatorname{argmax}_{k \in C} (a_{i,k} + r_{i,k})$
&nbsp;&nbsp;&nbsp;&nbsp;Link $x_i$ to $x_c$ as its prototype
end
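A minimal pure-Python sketch of the message-passing loop is given below. The negative squared Euclidean distance as similarity and the median off-diagonal similarity as the self-preference are common but illustrative choices, as is the damping factor used to stabilize the updates; all names are our own.

```python
import math

def similarity_matrix(data, preference=None):
    """Pairwise similarities: negative squared Euclidean distance.
    The diagonal holds the 'preference' (here: median off-diagonal similarity)."""
    n = len(data)
    S = [[-math.dist(p, q) ** 2 for q in data] for p in data]
    off = sorted(S[i][j] for i in range(n) for j in range(n) if i != j)
    pref = preference if preference is not None else off[len(off) // 2]
    for i in range(n):
        S[i][i] = pref
    return S

def affinity_propagation(S, n_iter=200, damping=0.5):
    """Alternate responsibility/availability updates, then give each point
    its representative: argmax_k (a[i][k] + r[i][k])."""
    n = len(S)
    R = [[0.0] * n for _ in range(n)]
    A = [[0.0] * n for _ in range(n)]
    for _ in range(n_iter):
        # r(i,k) = s(i,k) - max_{k' != k} (a(i,k') + s(i,k'))
        for i in range(n):
            for k in range(n):
                best = max(A[i][kp] + S[i][kp] for kp in range(n) if kp != k)
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - best)
        # a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k))) for i != k
        # a(k,k) = sum_{i' != k} max(0, r(i',k))
        for k in range(n):
            pos = [max(0.0, R[ip][k]) for ip in range(n)]
            for i in range(n):
                if i == k:
                    val = sum(pos) - pos[k]
                else:
                    val = min(0.0, R[k][k] + sum(pos) - pos[i] - pos[k])
                A[i][k] = damping * A[i][k] + (1 - damping) * val
    return [max(range(n), key=lambda k: A[i][k] + R[i][k]) for i in range(n)]
```

The returned list gives, for each data point, the index of its representative; points that are their own representative are the prototypes. The two nested loops over the $N \times N$ matrices make the quadratic cost discussed above directly visible.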