
3.6 Cluster validity for relational models

Methods for validation of clusters found from relational data are scarce. Validity for partitions based on object data X was discussed in Section 2.5. When $X \subset \Re^p$ is transformed into relational data R using the techniques in Section 3.2, subsequent processing with a relational algorithm leads to a crisp or fuzzy c-partition of X. In this case many of the indices in Section 2.5 can be used to validate U, since X is available.

When the data are relational to begin with, validation methods that explicitly require numerical data (for example, all direct indices) are not applicable to the question of cluster validity. When a relational algorithm produces fuzzy partitions of O, indirect indices such as the partition coefficient and partition entropy can be used for validation.
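For readers who want to compute these two indirect indices directly, here is a minimal sketch (in Python/numpy, our choice of language; the c × n layout of U is our assumption) using the standard definitions from Section 2.5:

```python
import numpy as np

def partition_coefficient(U):
    """V_PC(U) = (1/n) * sum over i,k of u_ik^2.
    U is a c x n fuzzy partition matrix whose columns sum to 1.
    V_PC = 1 for a crisp U; V_PC = 1/c at maximum fuzziness."""
    return float(np.sum(U ** 2)) / U.shape[1]

def partition_entropy(U, eps=1e-12):
    """V_PE(U) = -(1/n) * sum over i,k of u_ik * ln(u_ik).
    V_PE = 0 for a crisp U; V_PE = ln(c) at maximum fuzziness."""
    return float(-np.sum(U * np.log(U + eps))) / U.shape[1]

# Example: a fairly crisp 2 x 4 partition of n = 4 objects.
U = np.array([[0.9, 0.8, 0.2, 0.1],
              [0.1, 0.2, 0.8, 0.9]])
print(partition_coefficient(U), partition_entropy(U))
```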


Jain and Dubes (1988) cover several statistical hypothesis tests for validation of entire hierarchies (dendrograms) of clusters that are obtained from methods such as the linkage algorithms and Zadeh's decomposition of the transitive closure. Another validation strategy for linkage type algorithms is associated with the size of jumps taken by the set distance δ that controls merger or splitting (Hartigan, 1975). For example, when the procedure begins with δ = 0 at c = n and terminates at the maximum value of δ at c = 1, the usual strategy is to look for the largest jump in Δδ = δ(c−1) − δ(c). This is taken as an indicator that c is the most natural choice for the best number of clusters, on the presumption that the SAHN method works hardest to merge clusters that cause the biggest jump. In Figure 3.4, for example, successive jumps in the single linkage merger distances are 0.50, 0.50, 0.85 and 0.21. The largest jump (0.85, from c = 3 to c = 2) identifies the crisp partition X = {1, 2} ∪ {3, 4} ∪ {5, 6, 7, 8, 9}, c = 3 clusters at δ_min = 1.00, as the most natural one. The configuration of the data in Figure 3.2 seems to confirm this visually, although a case can be made that c = 2 is just as natural.
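A sketch of the largest jump heuristic (our own illustration, not code from the text); it assumes the merger distances are supplied in the order the SAHN algorithm produced them:

```python
def best_c_by_largest_jump(merge_distances):
    """merge_distances[i] is the set distance delta at which the (i+1)-th
    merge occurs, so after that merge there are c = n - (i + 1) clusters.
    Returns the c whose partition survives the largest jump in delta."""
    n = len(merge_distances) + 1  # a full SAHN run makes n - 1 merges
    jumps = [merge_distances[i + 1] - merge_distances[i]
             for i in range(len(merge_distances) - 1)]
    i_star = max(range(len(jumps)), key=jumps.__getitem__)
    return n - (i_star + 1)

# Hypothetical merger distances (not the actual Figure 3.4 values) whose
# successive jumps are 0.50, 0.50, 0.85 and 0.21, as quoted in the text.
deltas = [0.15, 0.65, 1.15, 2.00, 2.21]
print(best_c_by_largest_jump(deltas))  # -> 3, matching the text's reading
```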

The sequence of jumps for the complete linkage solution shown in Figure 3.4 is 0.50, 0.50, 0.75, 0.45 and 1.9, indicating that the clusters associated with c = 2 are the best choice, which is of course different from the solution offered by single linkage. One problem with this method is that the biggest jump can be severely influenced by the presence of a few outliers.

Zadeh's algorithm decomposes a fuzzy relation into crisp partitions with different values of c at values of α corresponding to α-cuts of R^∞. The scalars {α} are sometimes regarded as a rough indication of the validity of each hard clustering of O. In Example 3.7, we might assert that each object belongs by itself (c = 5) with confidence α = 1.

But this is always true, and leads to the wrong conclusion - i.e., that the best clustering is one with all singletons. Continuing this reasoning, the pair of objects joined at the next level would belong together with confidence α = 0.9, and so on. Since α is just the edge weight of the strongest adjacency link between each pair of nodes in R^∞, the word confidence as used here has no statistical connotation, and this use of the values of α is pretty misleading, for they do not portray "better and better" partitions as α gets closer and closer to 1. Indeed, you might argue that the confidence in a partition by R should be inversely proportional to α, and we would not object. In view of Theorem M, we know that single linkage generates the same clusters as Theorem Z for fuzzy similarity relations. Consequently, the largest jump method can also be used for validation of clusters such as those associated with the dendrogram in Figure 3.5. In this figure, the successive jumps in α are 0.10, 0.10, and 0.40, indicating a strong preference for c = 3 clusters.
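To make the decomposition concrete, here is a short sketch (our illustration) that computes the ∨-∧ (max-min) transitive closure of a fuzzy similarity relation and extracts the crisp partition at a given α-cut:

```python
import numpy as np

def transitive_closure(R, tol=1e-9):
    """Max-min transitive closure: iterate R <- max(R, R o R)
    until the relation stabilizes."""
    R = np.asarray(R, dtype=float)
    while True:
        # max-min composition: (R o R)[i, j] = max_k min(R[i,k], R[k,j])
        comp = np.max(np.minimum(R[:, :, None], R[None, :, :]), axis=1)
        R_next = np.maximum(R, comp)
        if np.allclose(R_next, R, atol=tol):
            return R_next
        R = R_next

def alpha_cut_partition(R_closed, alpha):
    """Connected components of the crisp relation [R_closed >= alpha];
    for a transitively closed R these are the equivalence classes,
    i.e. the clusters of the crisp partition at level alpha."""
    A = R_closed >= alpha
    n = len(A)
    labels, current = [-1] * n, 0
    for i in range(n):
        if labels[i] == -1:
            stack = [i]
            while stack:
                j = stack.pop()
                if labels[j] == -1:
                    labels[j] = current
                    stack.extend(k for k in range(n) if A[j, k])
            current += 1
    return labels

# Tiny example: objects 1 and 2 merge at alpha = 0.9, object 3 stays apart.
R = np.array([[1.0, 0.9, 0.4],
              [0.9, 1.0, 0.4],
              [0.4, 0.4, 1.0]])
print(alpha_cut_partition(transitive_closure(R), alpha=0.9))  # [0, 0, 1]
```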

When R is clustered with convex decomposition, since $\sum_{k=1}^{p} c_k = 1$, each coefficient $c_k$ indicates the "percentage" of $R_k$ needed for the convex factorization of R. In terms of cluster validity then, $c_k$ can be loosely interpreted as an indicator of the relative merit of the associated c-partition induced on O by $R_k$. In Example 3.8, this leads to interpreting the 3-partition $U_{0.4}$ as the most valid choice, and the two partitions with $c_k = 0.3$ at different values of c are regarded as less, but equally, valid.
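The interpretation in Example 3.8 amounts to nothing more than ranking the induced partitions by their convex weights; a trivial sketch (the coefficient values 0.4, 0.3, 0.3 follow the text, everything else is hypothetical):

```python
# Hypothetical record of a convex decomposition R = sum_k c_k * R_k, where
# each R_k induces a crisp partition of O and the weights c_k sum to 1.
weights = {"U_0.4 (c = 3)": 0.4, "U_a": 0.3, "U_b": 0.3}
assert abs(sum(weights.values()) - 1.0) < 1e-12

# Rank partitions by the share of R each factor accounts for; the largest
# weight is read as the most valid partition, equal weights as equally valid.
for name, c_k in sorted(weights.items(), key=lambda kv: -kv[1]):
    print(f"{name}: c_k = {c_k}")
```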

3.7 Comments and bibliography

The SAHN, transitive closure and convex decomposition techniques produce hard partitions from certain crisp or fuzzy relations. FNM, AP, RFCM and NERFCM all produce fuzzy partitions from particular classes of fuzzy relations by minimizing a relational objective function with alternating optimization. Kaufman and Rousseeuw (1990) discuss a method called FANNY that is closely related to RFCM. Sen and Dave (1998) show that using the method of Lagrange multipliers with the RFCM objective function in (3.25a) leads to the RFCM algorithm without making the assumption in (3.25b), but the derivation does not ensure that all of the memberships will be non-negative, because Lagrange multipliers only enforce equality constraints. Following Kaufman and Rousseeuw's derivation of FANNY, these authors have very recently obtained an even stronger result using Kuhn-Tucker theory that proves that the memberships will satisfy the required non-negativity condition. This result will be published in a forthcoming paper. The equations obtained in their Kuhn-Tucker approach are slightly different from the ones given in our description of the RFCM algorithm.

Fuzzy partitions enable the user to quantitatively distinguish between objects which are strongly associated with particular clusters and those that have only a marginal (borderline) association with several classes. The AP algorithm assumes the existence of prototypical objects which should be good representatives of different clusters (but does not give a method for finding them, although the object with maximum typicality would be an obvious candidate), and has the fewest restrictions on the relations it can process. Runkler and Bezdek (1998b) give a relational version of the alternating cluster estimation method called RACE that explicitly finds prototypical object indices (and hence, prototypical objects too), even though the data are known only in relational form. RACE finds a fuzzy partition of the objects too, and, like its object data counterpart (ACE, Runkler and Bezdek, 1998a), is not objective function driven.


Using NERFCM-AO is in one sense like extracting clusters from a crisp relation by first computing its ∨-∧ transitive closure as in Example 3.2. Both methods group the n objects underlying the original relation by clustering in a transformed relational data matrix. For NERFCM-AO it is difficult to estimate how closely clusters in $R_\beta$ might resemble those extracted from R by some other method. Intuitively, if the required spread is not too large, structure inherent in R should be mirrored by that in $R_\beta$. Hathaway and Bezdek (1994c) discuss a crisp version of this model called, naturally, non-Euclidean relational hard c-means (NERHCM).
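The β-spread transformation that underlies NERFCM-AO shifts every off-diagonal entry of R by β while leaving self-dissimilarities at zero. A minimal sketch of just this shift (the adaptive choice of β inside NERFCM-AO, which grows β only when negative relational distances appear, is not shown):

```python
import numpy as np

def beta_spread(R, beta):
    """R_beta = R + beta * (1 1^T - I): add beta to every off-diagonal
    dissimilarity while keeping the zero diagonal intact."""
    n = R.shape[0]
    return R + beta * (np.ones((n, n)) - np.eye(n))
```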

The usefulness of relational clustering methods is limited by several things. First is the matter of their computational complexity. O(n²) run times are fine if n = 150, as in the Iris data. But in large relational databases n may be orders of magnitude larger, and CPU time becomes an important factor. On the other hand, some information retrieval problems, for example, are cast naturally in the form of relational clustering, and there may be little choice but to use one of these schemes. Another limitation of the methods in this chapter is that they are explicitly limited to square relations, while a real application may have rectangular relational data.

Delgado et al. (1995) propose the use of hierarchical clustering algorithms such as single linkage for cluster validity. In their view the failure of cluster validity functionals such as the ones discussed in Chapter 2 can be ameliorated by pre-clustering data with a SAHN algorithm, and using the results to limit the range of c and provide good initializations for the "real" clustering to follow (presumably by a non-crisp clustering algorithm). Having said this, they then address the topic of Section 3.6 - how to validate relational clusters - by proposing several validity measures for their SAHN algorithms.

In other words, they end up with the same problem in one domain that they are trying to avoid in another! They give several numerical examples of their methods, including one with the Iris data.

Sato et al. (1997) propose three relational clustering models they call additive clustering models. In the language of our book these correspond to crisp (ordered additive), fuzzy (simple additive) and possibilistic (overlapping additive) clustering schemes. All three methods are regarded as relatives of the crisp relational model of Shepard and Arabie (1979), extended using fuzzy data, fuzzy dissimilarity and multicriteria clustering. The basic objective function for Sato et al.'s three additive models is

$$\min_{(U,\,a)} \; K_{\text{ACM}}(U,a) \;=\; \frac{\displaystyle\sum_{j=1}^{n}\sum_{k=1}^{n}\Bigl(r_{kj} \,-\, a\sum_{i=1}^{c} u_{ik}\,u_{ij}\Bigr)^{2}}{\displaystyle\sum_{j=1}^{n}\sum_{k=1}^{n}\bigl(r_{kj} \,-\, \bar{r}\,\bigr)^{2}}\;, \quad\text{where} \qquad (3.37a)$$

$$\bar{r} \;=\; \frac{\displaystyle\sum_{j=1}^{n}\sum_{k=1}^{n} r_{kj}}{n(n-1)}\;. \qquad (3.37b)$$
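Under our reconstruction of (3.37) above, the objective is straightforward to evaluate; a sketch (U stored as a c × n array, and diagonal pairs j = k excluded to match the n(n−1) normalizer - both our assumptions):

```python
import numpy as np

def k_additive(U, a, R):
    """Evaluate (3.37a): fit r_kj by a * sum_i u_ik * u_ij, normalized by
    the total variation of R about r_bar from (3.37b)."""
    n = R.shape[0]
    fit = a * (U.T @ U)                # [k, j] entry = sum_i u_ik * u_ij
    r_bar = R.sum() / (n * (n - 1))    # (3.37b); zero diagonal assumed
    off = ~np.eye(n, dtype=bool)       # exclude j = k pairs
    return ((R - fit)[off] ** 2).sum() / ((R - r_bar)[off] ** 2).sum()
```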

Sato et al. build three models (one each for U ∈ M_hcn, M_fcn, M_pcn) based on variations of (3.37) that are used with ratio, interval and ordinal relational data. The relational data matrix in their models is not restricted to inner product norm distance relations or even symmetric relations. Sato et al. also discuss a "generalized fuzzy clustering model" for relational data that uses an aggregation operator in the objective function

$$\min_{U} \; K_{T}(U) \;=\; \sum_{j=1}^{n}\sum_{k=1}^{n}\Bigl(r_{jk} \,-\, \sum_{i=1}^{c} T\bigl(u_{ij},\,u_{ik}\bigr)\Bigr)^{2}\;, \qquad (3.38)$$

where T is any T-norm. Three choices are discussed and exemplified in Sato et al.: the minimum, product and Hamacher T-norms (Klir and Yuan, 1995). Also given are methods for optimizing (3.38) in each of the three cases. They give several examples of clustering with each of these models using small relational data sets, but like their discussion of TFCM (Section 2.6), no numerical comparisons to other relational clustering models are offered, so we are again at a loss to make any assessment of the utility of these models. However, the work presented in this little book considerably extends the body of fuzzy clustering algorithms available for relational data, so if this is the type of data you have, by all means consider trying one or more of these approaches.
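A sketch of (3.38) with the three T-norms just mentioned (this follows our reconstruction of the equation above; Sato et al.'s exact formulation may differ in detail):

```python
import numpy as np

def t_min(a, b):            # minimum T-norm
    return np.minimum(a, b)

def t_product(a, b):        # product T-norm
    return a * b

def t_hamacher(a, b):       # Hamacher product T-norm
    denom = a + b - a * b
    safe = np.where(denom == 0, 1.0, denom)    # define T(0, 0) = 0
    return np.where(denom == 0, 0.0, a * b / safe)

def k_T(U, R, tnorm):
    """Evaluate (3.38): fit r_jk by the aggregate sum_i T(u_ij, u_ik).
    U is a c x n membership matrix, R an n x n relational matrix."""
    c, n = U.shape
    # agg[j, k] = sum over i of T(u_ij, u_ik), built by broadcasting
    agg = sum(tnorm(U[i][:, None], U[i][None, :]) for i in range(c))
    return float(((R - agg) ** 2).sum())
```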