
B. Semi-supervised clustering models

Objective functions such as J_1 and J_m that minimize sums of squared errors are well known for their propensity to find solutions that "balance" the number of members in each cluster. This illustrates the sometimes confusing and always frustrating fact that lower values of J_m do NOT necessarily point to better partitions of X.

Semi-supervised c-means clustering models attempt to overcome this limitation. In this category are models due to Pedrycz (1985), Hirota and Iwama (1988) and Bensaid et al. (1996a). They are applicable in domains where users may have a small set of labeled data that can be used to supervise clustering of the remaining data (this is often the case, for example, in medical image segmentation).

Algorithms in this category are clustering algorithms that use a finite design set X^d ⊂ ℝ^p of labeled (crisp or otherwise) data to help partition a finite set X^u ⊂ ℝ^p of unlabeled data. These algorithms terminate without the capability to label additional points in ℝ^p; that is, they do not build classifier functions. X^d is used to guide FCM-AO to a good c-partition of X^u.

Let X = X^d ∪ X^u, |X^d| = n_d, |X^u| = n_u, |X| = n_d + n_u = n. Without loss of generality we assume that the labeled data are the first n_d points in X,

$$
X = \underbrace{\{x_1^d, x_2^d, \ldots, x_{n_d}^d\}}_{\text{labeled}} \cup \underbrace{\{x_1^u, x_2^u, \ldots, x_{n_u}^u\}}_{\text{unlabeled}} = X^d \cup X^u . \qquad (2.11)
$$

Pedrycz (1985) defined the pointer b_k = 1 if x_k is labeled, and b_k = 0 otherwise. Then he defined the matrix F = [f_ik] with the given label vectors in the appropriate columns and zero vectors elsewhere.
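For concreteness, here is a minimal Python sketch of how b and F might be assembled; the function name, argument layout, and the crisp example labels are our own illustrative choices, not part of Pedrycz's formulation.

```python
import numpy as np

def build_supervision(n, c, labeled):
    """Assemble Pedrycz's pointer vector b and label matrix F (a sketch).

    `labeled` maps the index k of each labeled point x_k to its
    length-c (crisp or fuzzy) label vector f_k.  b_k = 1 marks a
    labeled column; unlabeled columns of F stay zero.
    """
    b = np.zeros(n)
    F = np.zeros((c, n))
    for k, f_k in labeled.items():
        b[k] = 1.0
        F[:, k] = f_k
    return b, F

# Example: n = 6 points, c = 2 clusters, first two points labeled crisply
b, F = build_supervision(6, 2, {0: [1.0, 0.0], 1: [0.0, 1.0]})
```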

Pedrycz modified J_m at (2.5) to the new functional

$$
J_m(U, V) = \alpha \sum_{i=1}^{c} \sum_{k=1}^{n} \left( u_{ik} - b_k f_{ik} \right)^m D_{ik}^2 + \sum_{i=1}^{c} \sum_{k=1}^{n} u_{ik}^m D_{ik}^2 , \qquad (2.12)
$$

where α > 0 and U in M_fcn is a c × n matrix to be found by minimizing (2.12). Under the same assumptions as in Table 2.2, Pedrycz derived first order necessary conditions for J_m by differentiating (2.12) with respect to U and V in the usual fashion.

The formula for V remains (2.7b), while (2.7a) is replaced by the more complicated expression

$$
u_{ik} = \frac{1}{1 + \alpha^{1/(m-1)}} \left[ \frac{1 + \alpha^{1/(m-1)} \left( 1 - b_k \sum_{j=1}^{c} f_{jk} \right)}{\sum_{j=1}^{c} \left( D_{ik} / D_{jk} \right)^{2/(m-1)}} + \alpha^{1/(m-1)} b_k f_{ik} \right] . \qquad (2.13)
$$

Replacing (2.7a) with (2.13) yields the semi-supervised FCM-AO of Pedrycz, which we call ssfcm-AO. J_m at (2.12) includes a new term whose minimization "forces" U to follow F for the patterns that are already labeled. The weight factor α is used to balance unequal cluster populations. Notice especially that U is a new partition of all of X, so at termination the supervising labels are replaced by the computed labels.
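A direct NumPy transcription of the update (2.13) might look as follows. This is a sketch; the function name and the guard against zero distances are our additions. Note that for unlabeled columns (b_k = 0) the formula collapses to the ordinary FCM update (2.7a), and for a labeled column with a crisp f_k the memberships still sum to 1.

```python
import numpy as np

def ssfcm_memberships(D, F, b, alpha, m=2.0):
    """Update U via (2.13) for Pedrycz's ssfcm-AO (a sketch).

    D : (c, n) array of distances D_ik = ||x_k - v_i||
    F : (c, n) supervising label matrix (zero columns where b_k = 0)
    b : (n,)  pointer vector, b_k = 1 for labeled points
    """
    a = alpha ** (1.0 / (m - 1.0))
    D = np.fmax(D, np.finfo(float).eps)          # avoid the singular D_ik = 0 case
    # S[i, k] = sum_j (D_ik / D_jk)^{2/(m-1)}, as in ordinary FCM
    S = ((D[:, None, :] / D[None, :, :]) ** (2.0 / (m - 1.0))).sum(axis=1)
    top = 1.0 + a * (1.0 - b * F.sum(axis=0))    # shape (n,)
    return (top / S + a * b * F) / (1.0 + a)
```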

The approach to semi-supervision taken by Bensaid et al. (1996a) is to heuristically alter the equations for FCM-AO given in (2.7). Their premise is that the supervising data are labeled correctly, so the n_d labels (crisp or fuzzy) in U^d should be fixed. They use AO scheme (2.4c), and take the initial matrix as U_0 = [U^d | U^u_0], where only U^u_0 is initialized. The terminal matrix has the form U_T = [U^d | U^u_T].

In ordinary HCM-AO or FCM-AO, once U_t is determined, the next step is to compute the cluster centers {v_i,t} using all n columns of U_t. However, since the last n_u columns of U_0 are user-initialized, these authors compute the first set of cluster centers using only the n_d columns in U^d. This is justified by their belief that using only the labeled data to find the initial cluster centers makes them "well-seeded". Consequently, they calculate

$$
v_{i,0} = \sum_{k=1}^{n_d} \left( u_{ik}^d \right)^m x_k^d \Big/ \sum_{k=1}^{n_d} \left( u_{ik}^d \right)^m , \quad 1 \le i \le c . \qquad (2.14)
$$

Next, AO c-means ordinarily calculates U_t using the {v_i,t-1} to update all n columns of U_t. However, Bensaid et al. use the functional form at (2.7a) and update only the n_u columns in U^u by calculating, for 1 ≤ i ≤ c and 1 ≤ k ≤ n_u,

$$
u_{ik,t}^u = \left[ \sum_{j=1}^{c} \left( \frac{\| x_k^u - v_{i,t-1} \|}{\| x_k^u - v_{j,t-1} \|} \right)^{2/(m-1)} \right]^{-1} , \quad t = 1, \ldots, T . \qquad (2.15)
$$

The cluster centers are then allowed to migrate in feature space by using all n columns of U_t to recompute the {v_i,t} after the first pass. To counter the possible effect of unequal cluster populations, the few samples that are labeled are weighted more heavily than their unlabeled counterparts. This is done by introducing non-negative weights w = (w_1, w_2, ..., w_{n_d}) as follows:

$$
v_{i,t} = \frac{\sum_{k=1}^{n_d} w_k \left( u_{ik,t}^d \right)^m x_k^d + \sum_{k=1}^{n_u} \left( u_{ik,t}^u \right)^m x_k^u}{\sum_{k=1}^{n_d} w_k \left( u_{ik,t}^d \right)^m + \sum_{k=1}^{n_u} \left( u_{ik,t}^u \right)^m} , \quad 1 \le i \le c;\ t = 1, \ldots, T . \qquad (2.16)
$$

x_k^d is replicated w_k times by this weighting scheme. Equations (2.14)-(2.16) comprise the basis of the semi-supervised FCM (ssFCM) algorithm of Bensaid et al. (1996a). The major difference between ssFCM and ssfcm-AO is that Pedrycz's scheme is an attempt to solve a new (different from (2.5)) optimization problem, whereas Bensaid et al.'s method is a heuristically defined approach based on (2.5) that is not a true optimization problem (and hence does not bear the designation AO).
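Assembled into a loop, a minimal NumPy sketch of ssFCM under our reading of (2.14)-(2.16) could look like this. The names, the iteration cap T, and the termination test on successive centers are our own choices, not prescribed by Bensaid et al.

```python
import numpy as np

def ssfcm_bensaid(Xd, Ud, Xu, w, m=2.0, T=100, eps=1e-4):
    """A sketch of ssFCM (Bensaid et al., 1996a), eqs. (2.14)-(2.16).

    Xd : (n_d, p) labeled data     Ud : (c, n_d) fixed label matrix
    Xu : (n_u, p) unlabeled data   w  : (n_d,)  supervision weights
    """
    Udm = Ud ** m
    # (2.14): seed the centers from the labeled columns only
    V = (Udm @ Xd) / Udm.sum(axis=1, keepdims=True)
    for _ in range(T):
        # (2.15): update memberships of the unlabeled columns only
        D = np.linalg.norm(Xu[None, :, :] - V[:, None, :], axis=2)  # (c, n_u)
        D = np.fmax(D, np.finfo(float).eps)       # guard the singular case
        Uu = 1.0 / ((D[:, None, :] / D[None, :, :]) ** (2.0 / (m - 1.0))).sum(axis=1)
        Uum = Uu ** m
        # (2.16): recompute centers from all n columns, labeled ones weighted by w
        num = (w * Udm) @ Xd + Uum @ Xu
        den = (w * Udm).sum(axis=1, keepdims=True) + Uum.sum(axis=1, keepdims=True)
        V_new = num / den
        if np.linalg.norm(V_new - V) < eps:       # our stopping rule
            V = V_new
            break
        V = V_new
    return np.hstack([Ud, Uu]), V
```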

Each point in X^d can be given a different weight in ssFCM. The vector of weights w in (2.16) is analogous to the factor α in (2.13): it is chosen by the user to induce the labeled data to drive the clustering

algorithm towards a solution that avoids the problem of population balancing that is illustrated in our next example.

Example 2.3 Figure 2.3(a) shows the results of processing, with FCM-AO, a data set given in Bensaid et al. (1996a) called X_43, which has c = 2 visually apparent clusters. Cluster 1 (X_1) on the left has 40 points, while cluster 2 (X_2) on the right has only 3.

Figure 2.3(a) A hardened FCM-AO partition of X_43

Data very similar to these appear on p. 220 of Duda and Hart (1973), where they were used to illustrate the tendency of J_1 to split large clusters. Figure 2.3(a) is essentially the same as Figure 6.13(a) in Duda and Hart, except that our figure is a crisp partition of X_43 obtained by hardening the terminal partition of a run of FCM-AO.

The basic parameters used were the Euclidean norm for both J_m and the termination error E_t, m = 2 and ε = 0.0001. The terminal cluster centers are indicated by the symbol (*). Notice how the large number of points in X_1 draws v_2 far to the left of its visually desirable position. Here, unequal cluster sizes cause J_m to identify a visually disagreeable solution. This exemplifies our caveat about mathematical models:

J_m prefers this partition to the one we would choose because its measure of similarity and method of aggregation are only a crude and very limited representation of what we do in order to see the obvious structure in the data.

View 2.3(b) shows a partition obtained by hardening the terminal ssFCM (Bensaid et al., 1996a) partition of X_43 found using the same basic parameters as for FCM-AO. As shown in Figure 2.3(b), ssFCM used four points from X_1 and 1 point from X_2 as the supervising labeled data (so n_d = 5), and equal weights w_k = 6 for each supervising point. Apparently ssFCM overcomes the problem illustrated in Figure 2.3(a). The three points in X_2 are isolated from the large cluster in the hardened partition, and v_2 occupies a position that is visually correct. However, the supervising points for the left cluster were well chosen in the sense that they, like v_1, occupy the visually apparent center of X_1. Loosely speaking, the weight w_k = 6 essentially gives these four points 6 times as much influence as any unlabeled individual during iteration of ssFCM.
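To get a feel for this behavior, one can rerun the comparison on a synthetic stand-in for X_43 using the ssfcm_bensaid sketch given earlier. The coordinates, spreads, and random seed below are illustrative choices of ours, not the published data:

```python
import numpy as np

rng = np.random.default_rng(0)
# Illustrative stand-in for X_43: 40 points on the left, 3 on the right
X1 = rng.normal(loc=[0.0, 0.0], scale=0.8, size=(40, 2))
X2 = rng.normal(loc=[6.0, 0.0], scale=0.2, size=(3, 2))

# Supervise with 4 points from X1 and 1 from X2 (n_d = 5), each with w_k = 6
Xd = np.vstack([X1[:4], X2[:1]])
Ud = np.array([[1, 1, 1, 1, 0],
               [0, 0, 0, 0, 1]], dtype=float)
Xu = np.vstack([X1[4:], X2[1:]])
w = np.full(5, 6.0)

U, V = ssfcm_bensaid(Xd, Ud, Xu, w)
print(V)   # the centers should sit near (0, 0) and (6, 0)
```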

Figure 2.3(b) A hardened ssFCM partition of X_43 (supervising points marked)

View 2.3(c) shows a partition obtained by hardening the terminal ssfcm-AO (Pedrycz) partition of X_43 found using the same basic parameters and supervising points as for ssFCM, with the scale factor α in (2.12) set at α = 200. Figures 2.3(a) and 2.3(c) are very similar, and based on the difference between views 2.3(b) and 2.3(c), it appears that ssFCM is superior to ssfcm-AO. However, this may be due more to a fortuitous choice of supervising points and weights than to any inherent superiority in ssFCM. Some insight into this aspect of comparing various methods is gained by simply altering the points used for supervision of ssfcm-AO, with all other parameters fixed.

Figure 2.3(c) A hardened ssfcm-AO partition of X_43 (supervising points marked)

Figure 2.3(d) Another hardened ssfcm-AO partition of X_43 (supervising points marked)

Figure 2.3(d) shows the Pedrycz (1985) ssfcm-AO result when the six points shown there are used to supervise it. For this alternate choice of supervision, the algorithm of Pedrycz produces a hardened partition that is quite similar to the one shown in view 2.3(b); only 2 of the 43 points are mislabeled in this crisp partition of the data. We have no doubt that some combination of supervising points and algorithmic parameters would enable ssfcm-AO to produce the same partition that ssFCM does here. This emphasizes an important point: most clustering methods will produce almost identical results if you have enough time to find the parameters that yield the same solution. For p > 3, the luxury of choosing supervising data that are "just right" is not available even when some of the data are labeled.

See Bensaid et al. (1996a) for further discussion about this problem.