A CLUSTERING MODEL WITH RÉNYI ENTROPY REGULARIZATION


COSTIN CIPRIAN POPESCU

A fuzzy clustering model (fcm) whose regularization function is given by the Rényi entropy is discussed. We explore theoretically the cluster pattern in this mathematical model, together with some considerations about the cross-sectional Coppi-D’Urso approach and the Rényi measure of entropy.

AMS 2000 Subject Classification: 94A17.

Key words: fuzzy clustering, regularization, Rényi entropy.

1. INTRODUCTION

Fuzzy clustering is one of the most important research directions in the field of statistics. In recent decades, many outstanding results have been published in the literature (Bezdek, 1981; Coppi and D’Urso, 2003, 2006). The goal of clustering algorithms is to divide a set of observations or data into a finite number of clusters on the basis of a proper criterion. A classical approach is based on the minimization of

$$
F(U,C) = \sum_{i=1}^{n} \sum_{k=1}^{c} u_{ik}^{m}\, d^{2}(x_i, c_k),
$$

where $U$ is the matrix of membership functions, $C$ the matrix of cluster centroids $c_k$, $u_{ik}$ the membership function of the vector $x_i \in \mathbb{R}^p$ to cluster $k$, $X$ the data matrix and $d$ a certain distance in $\mathbb{R}^p$ (Bezdek, 1981; Leski, 2003). In this approach, the number of clusters $c$ and $m \in [1,\infty)$ (the fuzziness coefficient or weighting exponent) are known. This method is usually called fuzzy c-means (Bezdek, 1981). To improve the clustering procedure, in the sense of reducing the influence of noise and outliers, new methods and algorithms have been introduced. A method may be considered robust if it is reasonably accurate under the assumed model, small deviations from the model hypotheses do not substantially impair its performance, and larger deviations do not invalidate the model (Leski, 2003).
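As a small illustration of the classical fuzzy c-means criterion described above, the following sketch evaluates the objective for given memberships and centroids. It assumes NumPy and a Euclidean distance $d$; the function name `fcm_objective` and the toy data are hypothetical and are not taken from the cited works.

```python
import numpy as np

def fcm_objective(X, centroids, U, m=2.0):
    """Return sum_i sum_k u_ik^m * d^2(x_i, c_k) for Euclidean d (an assumption)."""
    # Squared Euclidean distances between every point and every centroid: shape (n, c)
    diff = X[:, None, :] - centroids[None, :, :]
    d2 = np.sum(diff ** 2, axis=2)
    return float(np.sum((U ** m) * d2))

# Toy data: three points on a line, two centroids, one point shared equally.
X = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 0.0]])
centroids = np.array([[0.0, 0.0], [10.0, 0.0]])
U = np.array([[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]])
value = fcm_objective(X, centroids, U, m=2.0)
```

Only the middle point contributes here, since the other two coincide with the centroid to which they fully belong.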

MATH. REPORTS 11 (61), 1 (2009), 59–65

In recent decades, the general tendency has been to abandon the fuzziness coefficient (Miyamoto and Mukaidono, 1997; Coppi and D’Urso, 2006) and to introduce a regularization term, namely the fuzzy entropy (which accounts for the uncertainty given by the membership degrees $u_{ik}$). An illustrative approach for these new models was developed by Coppi and D’Urso (2006) and called the cross-sectional fuzzy clustering model, or fcm. The regularization term was taken to be

$$
p \sum_{i=1}^{I} \sum_{c=1}^{C} u_{ic} \log u_{ic}
$$

(a fuzzy variant of Shannon entropy), with $p$ called the “degree of fuzzy entropy”. More generally, the concept of “entropy” plays an important role in information theory. In this paper, the cross-sectional model is extended by using Rényi’s entropy of order $\alpha$.
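To make the relation between the two entropies concrete, the sketch below compares the Rényi entropy of order $\alpha$, $\frac{1}{1-\alpha}\log\sum_c u_c^{\alpha}$, with the Shannon entropy $-\sum_c u_c \log u_c$ for a single membership vector. It relies on the standard fact that the Rényi entropy tends to the Shannon entropy as $\alpha \to 1$. NumPy, the function names and the sample vector are assumptions made for illustration only.

```python
import numpy as np

def renyi_entropy(u, alpha):
    """Renyi entropy of order alpha (alpha > 0, alpha != 1) of a probability vector."""
    u = np.asarray(u, dtype=float)
    if alpha <= 0 or alpha == 1:
        raise ValueError("alpha must be positive and different from 1")
    return float(np.log(np.sum(u ** alpha)) / (1.0 - alpha))

def shannon_entropy(u):
    """Shannon entropy -sum u log u, skipping zero entries."""
    u = np.asarray(u, dtype=float)
    nz = u[u > 0]
    return float(-np.sum(nz * np.log(nz)))

u = [0.7, 0.2, 0.1]
h_shannon = shannon_entropy(u)
h_near_one = renyi_entropy(u, 1.0 + 1e-6)  # alpha close to 1
h_two = renyi_entropy(u, 2.0)              # order 2
```

In this example `h_near_one` is numerically close to `h_shannon`, while `h_two` is smaller, in line with the Rényi entropy being non-increasing in $\alpha$.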

2. A FUZZY CLUSTERING MODEL WITH RÉNYI REGULARIZATION

In what follows, Coppi and D’Urso’s cross-sectional fcm [3] is generalized by means of the Rényi entropy of order $\alpha$. This kind of model deals with three types of information (see also [2]). We say that the set of initial data $X = \{x_{ijt} : i = 1,\dots,I;\ j = 1,\dots,J;\ t = 1,\dots,T\}$ is a “time data array” (units × variables × times) if $x_{ijt}$ is the $j$th LR$_1$ fuzzy variable observed on the $i$th unit at time $t$ [3].

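Since here $w_t^{(1)}$ is an “instantaneous” weight [3], the cross-sectional fcm with Rényi regularization is equivalent to a weighted minimization problem, namely

$$
\min f\left(u_{ic}, w^{(1)}, h_{ct}^{(1)}\right)
$$

subject to $\sum_{c=1}^{C} u_{ic} = 1$ ($u_{ic} \ge 0$) and $\sum_{t=1}^{T} w_t^{(1)} = 1$ ($w_t^{(1)} \ge 0$), where

$$
\begin{aligned}
f\left(u_{ic}, w^{(1)}, h_{ct}^{(1)}\right)
&= \sum_{i=1}^{I} \sum_{c=1}^{C} u_{ic} \sum_{t=1}^{T} \left(w_t^{(1)} d_{ict}^{(1)}\right)^2
 + p\,\frac{1}{1-\alpha} \sum_{i=1}^{I} \left[\log\left(\sum_{c=1}^{C} u_{ic}^{\alpha}\right)\right] \\
&= \sum_{i=1}^{I} \sum_{c=1}^{C} u_{ic} \sum_{t=1}^{T} \left(w_t^{(1)} \left\| x_{it} - h_{ct}^{(1)} \right\|\right)^2
 + p\,\frac{1}{1-\alpha} \sum_{i=1}^{I} \left[\log\left(\sum_{c=1}^{C} u_{ic}^{\alpha}\right)\right].
\end{aligned}
$$

Here, $h_{ct}^{(1)}$ gives the centroid of the $c$th cluster at time $t$, and $u_i = (u_{i1}, \dots, u_{ic}, \dots, u_{iC})^{T}$, $w^{(1)} = \left(w_1^{(1)}, \dots, w_t^{(1)}, \dots, w_T^{(1)}\right)^{T}$.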
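The objective above can be evaluated numerically as in the following sketch. It is only an illustration under simplifying assumptions: the observations $x_{it}$ are treated as crisp vectors in $\mathbb{R}^J$ (the paper works with LR fuzzy data), the distance is Euclidean, and the array layout and the name `renyi_objective` are hypothetical.

```python
import numpy as np

def renyi_objective(X, H, U, w, p, alpha):
    """Evaluate f for X: (I, T, J) data, H: (C, T, J) centroids, U: (I, C), w: (T,)."""
    # d[i, c, t] = ||x_it - h_ct|| (Euclidean norm over the J variables)
    d = np.linalg.norm(X[:, None, :, :] - H[None, :, :, :], axis=3)
    # Fit term: sum_i sum_c u_ic sum_t (w_t d_ict)^2
    fit = np.sum(U[:, :, None] * (w[None, None, :] * d) ** 2)
    # Renyi regularization: p / (1 - alpha) * sum_i log(sum_c u_ic^alpha)
    reg = p / (1.0 - alpha) * np.sum(np.log(np.sum(U ** alpha, axis=1)))
    return float(fit + reg)

# One unit, two clusters, one time point, one variable.
X = np.array([[[0.0]]])
H = np.array([[[1.0]], [[3.0]]])
U = np.array([[0.5, 0.5]])
w = np.array([1.0])
value = renyi_objective(X, H, U, w, p=1.0, alpha=2.0)
```

For this toy input the fit term equals 5 and the regularization term equals $\log 2$.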

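First, we search for $u_i$, $i = 1, \dots, I$. Fix the values $w_t^{(1)}$, $h_{ct}^{(1)}$. For a given unit $i'$, the Lagrangian function is given by

$$
L_p(u_{i'}, \lambda) = \sum_{c=1}^{C} u_{i'c} \sum_{t=1}^{T} \left(w_t^{(1)} d_{i'ct}^{(1)}\right)^2
 + p\,\frac{1}{1-\alpha} \log\left(\sum_{c=1}^{C} u_{i'c}^{\alpha}\right)
 - \lambda \left(\sum_{c=1}^{C} u_{i'c} - 1\right).
$$

It follows from

$$
\frac{\partial L_p(u_{i'}, \lambda)}{\partial u_{i'c'}} = 0
\quad\text{and}\quad
\frac{\partial L_p(u_{i'}, \lambda)}{\partial \lambda} = 0
$$

that

$$
\sum_{t=1}^{T} \left(w_t^{(1)} d_{i'c't}^{(1)}\right)^2
 + p\,\frac{1}{1-\alpha}\, \frac{\alpha\, u_{i'c'}^{\alpha-1}}{\sum_{c=1}^{C} u_{i'c}^{\alpha}} - \lambda = 0,
\qquad
\sum_{c=1}^{C} u_{i'c} = 1.
$$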

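Thus,

$$
\sum_{t=1}^{T} \left(w_t^{(1)} d_{i'c't}^{(1)}\right)^2
 + p\,\frac{\alpha}{1-\alpha}\, \frac{u_{i'c'}^{\alpha-1}}{\sum_{c=1}^{C} u_{i'c}^{\alpha}} - \lambda = 0
$$

$$
\Longrightarrow\;
\left[\sum_{t=1}^{T} \left(w_t^{(1)} d_{i'c't}^{(1)}\right)^2\right] \left[\sum_{c=1}^{C} u_{i'c}^{\alpha}\right]
 + p\,\frac{\alpha}{1-\alpha}\, u_{i'c'}^{\alpha-1}
 - \lambda \left[\sum_{c=1}^{C} u_{i'c}^{\alpha}\right] = 0
$$

$$
\Longrightarrow\;
\left( u_{i'c'}^{\alpha} + \sum_{\substack{c=1 \\ c \ne c'}}^{C} u_{i'c}^{\alpha} \right)
\left[ -\lambda + \sum_{t=1}^{T} \left(w_t^{(1)} d_{i'c't}^{(1)}\right)^2 \right]
 + p\,\frac{\alpha}{1-\alpha}\, u_{i'c'}^{\alpha-1} = 0
$$

$$
\Longrightarrow\;
u_{i'c'}^{\alpha} \left[ -\lambda + \sum_{t=1}^{T} \left(w_t^{(1)} d_{i'c't}^{(1)}\right)^2 \right]
 + p\,\frac{\alpha}{1-\alpha}\, u_{i'c'}^{\alpha-1}
 + \left( \sum_{\substack{c=1 \\ c \ne c'}}^{C} u_{i'c}^{\alpha} \right)
\left[ -\lambda + \sum_{t=1}^{T} \left(w_t^{(1)} d_{i'c't}^{(1)}\right)^2 \right] = 0.
$$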

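The terms which contain the membership degrees are now replaced by their first-order Taylor polynomials about $u = 1$, that is, $u^{\alpha} \approx 1 + \alpha u - \alpha$ and $u^{\alpha-1} \approx 1 + (\alpha-1)u - (\alpha-1)$. We then have

$$
(1 + \alpha u_{i'c'} - \alpha) \left[ -\lambda + \sum_{t=1}^{T} \left(w_t^{(1)} d_{i'c't}^{(1)}\right)^2 \right]
 + p\,\frac{\alpha}{1-\alpha} \left[ 1 + (\alpha-1) u_{i'c'} - (\alpha-1) \right]
 + \left( \sum_{\substack{c=1 \\ c \ne c'}}^{C} (1 + \alpha u_{i'c} - \alpha) \right)
\left[ -\lambda + \sum_{t=1}^{T} \left(w_t^{(1)} d_{i'c't}^{(1)}\right)^2 \right] = 0
$$

$$
\Longrightarrow\;
\left[ -\lambda + \sum_{t=1}^{T} \left(w_t^{(1)} d_{i'c't}^{(1)}\right)^2 \right] (1-\alpha)
 + \alpha u_{i'c'} \left[ -\lambda + \sum_{t=1}^{T} \left(w_t^{(1)} d_{i'c't}^{(1)}\right)^2 \right]
 + \frac{p\alpha(2-\alpha)}{1-\alpha} - p\alpha u_{i'c'}
 + \left( \sum_{\substack{c=1 \\ c \ne c'}}^{C} (1-\alpha) + \alpha \sum_{\substack{c=1 \\ c \ne c'}}^{C} u_{i'c} \right)
\left[ -\lambda + \sum_{t=1}^{T} \left(w_t^{(1)} d_{i'c't}^{(1)}\right)^2 \right] = 0
$$

$$
\Longrightarrow\;
\left[ -\lambda + \sum_{t=1}^{T} \left(w_t^{(1)} d_{i'c't}^{(1)}\right)^2 \right] (1-\alpha)
 + \alpha u_{i'c'} \left[ -\lambda + \sum_{t=1}^{T} \left(w_t^{(1)} d_{i'c't}^{(1)}\right)^2 \right]
 + \frac{p\alpha(2-\alpha)}{1-\alpha} - p\alpha u_{i'c'}
 + \left[ (C-1)(1-\alpha) + \alpha (1 - u_{i'c'}) \right]
\left[ -\lambda + \sum_{t=1}^{T} \left(w_t^{(1)} d_{i'c't}^{(1)}\right)^2 \right] = 0
$$

$$
\Longrightarrow\;
\left[ -\lambda + \sum_{t=1}^{T} \left(w_t^{(1)} d_{i'c't}^{(1)}\right)^2 \right] (1-\alpha)
 + \alpha u_{i'c'} \left[ -\lambda + \sum_{t=1}^{T} \left(w_t^{(1)} d_{i'c't}^{(1)}\right)^2 \right]
 + \frac{p\alpha(2-\alpha)}{1-\alpha} - p\alpha u_{i'c'}
 + \left[ (C-1)(1-\alpha) + \alpha \right]
\left[ -\lambda + \sum_{t=1}^{T} \left(w_t^{(1)} d_{i'c't}^{(1)}\right)^2 \right]
 - \alpha u_{i'c'} \left[ -\lambda + \sum_{t=1}^{T} \left(w_t^{(1)} d_{i'c't}^{(1)}\right)^2 \right] = 0
$$

$$
\Longrightarrow\;
\left[ -\lambda + \sum_{t=1}^{T} \left(w_t^{(1)} d_{i'c't}^{(1)}\right)^2 \right] (1-\alpha)
 + \frac{p\alpha(2-\alpha)}{1-\alpha} - p\alpha u_{i'c'}
 + \left[ (C-1)(1-\alpha) + \alpha \right]
\left[ -\lambda + \sum_{t=1}^{T} \left(w_t^{(1)} d_{i'c't}^{(1)}\right)^2 \right] = 0.
$$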

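With the notation

$$
D_{i'c'} := -\lambda + \sum_{t=1}^{T} \left(w_t^{(1)} d_{i'c't}^{(1)}\right)^2,
$$

we obtain

$$
D_{i'c'} (1-\alpha) + \frac{p\alpha(2-\alpha)}{1-\alpha} - p\alpha u_{i'c'} + \left[ (C-1)(1-\alpha) + \alpha \right] D_{i'c'} = 0
$$

$$
\Longrightarrow\; p\alpha u_{i'c'} = \left[ (C-1)(1-\alpha) + \alpha + (1-\alpha) \right] D_{i'c'} + \frac{p\alpha(2-\alpha)}{1-\alpha}
$$

$$
\Longrightarrow\; u_{i'c'} = \frac{2-\alpha}{1-\alpha} + \frac{1}{p\alpha} \left[ (C-1)(1-\alpha) + 1 \right] D_{i'c'}.
$$

Since $\sum_{c'=1}^{C} u_{i'c'} = 1$, we have

$$
\sum_{c'=1}^{C} \left\{ \frac{2-\alpha}{1-\alpha} + \frac{1}{p\alpha} \left[ (C-1)(1-\alpha) + 1 \right] D_{i'c'} \right\} = 1
$$

$$
\Longrightarrow\; \frac{C(2-\alpha)}{1-\alpha} + \frac{1}{p\alpha} \left[ (C-1)(1-\alpha) + 1 \right]
\sum_{c'=1}^{C} \left[ -\lambda + \sum_{t=1}^{T} \left(w_t^{(1)} d_{i'c't}^{(1)}\right)^2 \right] = 1
$$

$$
\Longrightarrow\; \frac{C(2-\alpha)}{1-\alpha} + \frac{1}{p\alpha} \left[ (C-1)(1-\alpha) + 1 \right]
\left[ \sum_{c'=1}^{C} \sum_{t=1}^{T} \left(w_t^{(1)} d_{i'c't}^{(1)}\right)^2 \right]
 - \frac{C}{p\alpha} \left[ (C-1)(1-\alpha) + 1 \right] \lambda = 1
$$

$$
\Longrightarrow\; \lambda = \frac{p\alpha(2-\alpha)}{(1-\alpha) \left[ (C-1)(1-\alpha) + 1 \right]}
 + \frac{1}{C} \left[ \sum_{c=1}^{C} \sum_{t=1}^{T} \left(w_t^{(1)} d_{i'ct}^{(1)}\right)^2 \right]
 - \frac{p\alpha}{C \left[ (C-1)(1-\alpha) + 1 \right]},
$$

where, in the last line, the summation index has been renamed $c$ to avoid a clash with the fixed index $c'$. Thus,

$$
\begin{aligned}
D_{i'c'} &= \left[ \sum_{t=1}^{T} \left(w_t^{(1)} d_{i'c't}^{(1)}\right)^2 \right]
 - \frac{p\alpha(2-\alpha)}{(1-\alpha) \left[ (C-1)(1-\alpha) + 1 \right]}
 - \frac{1}{C} \left[ \sum_{c=1}^{C} \sum_{t=1}^{T} \left(w_t^{(1)} d_{i'ct}^{(1)}\right)^2 \right]
 + \frac{p\alpha}{C \left[ (C-1)(1-\alpha) + 1 \right]} \\
&= \sum_{t=1}^{T} \left[ \left(w_t^{(1)} d_{i'c't}^{(1)}\right)^2 - \frac{1}{C} \sum_{c=1}^{C} \left(w_t^{(1)} d_{i'ct}^{(1)}\right)^2 \right]
 + \frac{p\alpha (1 - \alpha - 2C + \alpha C)}{C (1-\alpha) \left[ (C-1)(1-\alpha) + 1 \right]}
\end{aligned}
$$

and

$$
\begin{aligned}
u_{i'c'} &= \frac{2-\alpha}{1-\alpha} + \frac{1}{p\alpha} \left[ (C-1)(1-\alpha) + 1 \right] D_{i'c'} \\
&= \frac{1}{C} + \frac{1}{p\alpha} \left[ (C-1)(1-\alpha) + 1 \right]
\sum_{t=1}^{T} \left[ \left(w_t^{(1)} d_{i'c't}^{(1)}\right)^2 - \frac{1}{C} \sum_{c=1}^{C} \left(w_t^{(1)} d_{i'ct}^{(1)}\right)^2 \right].
\end{aligned}
$$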
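The closed-form membership expression obtained above can be turned into a short NumPy routine. This is a minimal sketch, assuming that the distances are already available as an array `d` of shape `(I, C, T)`; the function name `membership_update` and the sample values are hypothetical.

```python
import numpy as np

def membership_update(d, w, p, alpha):
    """Linearized membership formula: d is (I, C, T), w is (T,); returns U of shape (I, C)."""
    C = d.shape[1]
    K = (C - 1) * (1.0 - alpha) + 1.0
    # s[i, c] = sum_t (w_t * d_ict)^2
    s = np.sum((w[None, None, :] * d) ** 2, axis=2)
    # Subtract, for each unit, the average of s over the C clusters
    centered = s - s.mean(axis=1, keepdims=True)
    return 1.0 / C + K / (p * alpha) * centered

# One unit, two clusters, one time point.
d_small = np.array([[[1.0], [2.0]]])
U_small = membership_update(d_small, np.array([1.0]), p=10.0, alpha=0.5)

# Random check that every row sums to one.
rng = np.random.default_rng(0)
d_rand = rng.uniform(0.5, 2.0, size=(5, 3, 4))
U_rand = membership_update(d_rand, np.full(4, 0.25), p=20.0, alpha=0.5)
```

Because the correction term is centered over the clusters, each row sums to one by construction. The linearization does not by itself keep the entries in $[0,1]$, and for $p > 0$ and $0 < \alpha < 1$ the correction grows with the weighted squared distance, so it may be worth checking the sign convention against [3] before relying on the values.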

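In order to find a solution for $w_t^{(1)}$, consider the Lagrangian function

$$
L_p\left(w^{(1)}, \rho\right) = \sum_{i=1}^{I} \sum_{c=1}^{C} u_{ic} \sum_{t=1}^{T} \left(w_t^{(1)} d_{ict}^{(1)}\right)^2
 + p\,\frac{1}{1-\alpha} \sum_{i=1}^{I} \left[ \log\left( \sum_{c=1}^{C} u_{ic}^{\alpha} \right) \right]
 - \rho \left( \sum_{t=1}^{T} w_t^{(1)} - 1 \right).
$$

From

$$
\frac{\partial L_p\left(w^{(1)}, \rho\right)}{\partial w_{t'}^{(1)}} = 0
\quad\text{and}\quad
\frac{\partial L_p\left(w^{(1)}, \rho\right)}{\partial \rho} = 0
$$

we obtain

$$
2 w_{t'}^{(1)} \sum_{i=1}^{I} \sum_{c=1}^{C} u_{ic} \left(d_{ict'}^{(1)}\right)^2 - \rho = 0
\quad\text{and}\quad
\sum_{t=1}^{T} w_t^{(1)} - 1 = 0.
$$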

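Then

$$
2 w_{t'}^{(1)} \sum_{i=1}^{I} \sum_{c=1}^{C} u_{ic} \left(d_{ict'}^{(1)}\right)^2 - \rho = 0
\;\Longrightarrow\;
w_{t'}^{(1)} = \frac{\rho}{2 \sum_{i=1}^{I} \sum_{c=1}^{C} u_{ic} \left(d_{ict'}^{(1)}\right)^2}
 = \frac{\rho}{2} \cdot \frac{1}{\sum_{i=1}^{I} \sum_{c=1}^{C} u_{ic} \left(d_{ict'}^{(1)}\right)^2};
$$

$$
\sum_{t=1}^{T} w_t^{(1)} = 1
\;\Longrightarrow\; \sum_{t'=1}^{T} w_{t'}^{(1)} = 1
\;\Longrightarrow\; \sum_{t'=1}^{T} \frac{\rho}{2} \cdot \frac{1}{\sum_{i=1}^{I} \sum_{c=1}^{C} u_{ic} \left(d_{ict'}^{(1)}\right)^2} = 1
$$

$$
\Longrightarrow\; \frac{\rho}{2} \sum_{t'=1}^{T} \frac{1}{\sum_{i=1}^{I} \sum_{c=1}^{C} u_{ic} \left(d_{ict'}^{(1)}\right)^2} = 1
$$

$$
\Longrightarrow\; \frac{\rho}{2}
 = \frac{1}{\sum_{t'=1}^{T} \dfrac{1}{\sum_{i=1}^{I} \sum_{c=1}^{C} u_{ic} \left(d_{ict'}^{(1)}\right)^2}}
 = \left[ \sum_{t'=1}^{T} \frac{1}{\sum_{i=1}^{I} \sum_{c=1}^{C} u_{ic} \left(d_{ict'}^{(1)}\right)^2} \right]^{-1}.
$$

Thus,

$$
\begin{aligned}
w_t^{(1)} &= \frac{\rho}{2} \cdot \frac{1}{\sum_{i=1}^{I} \sum_{c=1}^{C} u_{ic} \left(d_{ict}^{(1)}\right)^2} \\
&= \left[ \sum_{i=1}^{I} \sum_{c=1}^{C} u_{ic} \left(d_{ict}^{(1)}\right)^2 \right]^{-1}
\left[ \sum_{t'=1}^{T} \frac{1}{\sum_{i=1}^{I} \sum_{c=1}^{C} u_{ic} \left(d_{ict'}^{(1)}\right)^2} \right]^{-1} \\
&= \frac{1}{\displaystyle \sum_{t'=1}^{T} \frac{\sum_{i=1}^{I} \sum_{c=1}^{C} u_{ic} \left(d_{ict}^{(1)}\right)^2}{\sum_{i=1}^{I} \sum_{c=1}^{C} u_{ic} \left(d_{ict'}^{(1)}\right)^2}}.
\end{aligned}
$$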
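The weight formula just derived says that each $w_t^{(1)}$ is proportional to the reciprocal of the membership-weighted squared distance at time $t$. A minimal NumPy sketch follows; the array layout (`d` of shape `(I, C, T)`, `U` of shape `(I, C)`) and the name `weight_update` are assumptions, and the aggregated distances are taken to be strictly positive.

```python
import numpy as np

def weight_update(d, U):
    """Time weights w_t from distances d: (I, C, T) and memberships U: (I, C)."""
    # a[t] = sum_i sum_c u_ic * d_ict^2 (assumed strictly positive)
    a = np.einsum("ic,ict->t", U, d ** 2)
    inv = 1.0 / a
    # Normalize the reciprocals so that the weights sum to one
    return inv / inv.sum()

# One unit, one cluster, two time points with distances 1 and 2.
d_small = np.array([[[1.0, 2.0]]])
U_small = np.array([[1.0]])
w_small = weight_update(d_small, U_small)
```

Here the aggregated squared distances are 1 and 4, so the time point with the smaller dispersion receives the larger weight.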

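A solution for $h_{ct}^{(1)}$ is obtained by solving the problem $\min g\left(h_{ct}^{(1)}\right)$, where

$$
g\left(h_{ct}^{(1)}\right) = \sum_{i=1}^{I} \sum_{c=1}^{C} u_{ic} \sum_{t=1}^{T} \left(w_t^{(1)} \left\| x_{it} - h_{ct}^{(1)} \right\|\right)^2
 + p\,\frac{1}{1-\alpha} \sum_{i=1}^{I} \left[ \log\left( \sum_{c=1}^{C} u_{ic}^{\alpha} \right) \right].
$$

Setting the gradient of $g$ with respect to $h_{ct}^{(1)}$ to zero, the factor $\left(w_t^{(1)}\right)^2$ cancels. Hence

$$
h_{ct}^{(1)} = \frac{\sum_{i=1}^{I} u_{ic}\, x_{it}}{\sum_{i=1}^{I} u_{ic}}.
$$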
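The centroid formula is a membership-weighted mean of the observations at each time point, which the following sketch computes in one step. It assumes crisp observations stored in an array `X` of shape `(I, T, J)` and memberships `U` of shape `(I, C)` with strictly positive column sums; the name `centroid_update` is hypothetical.

```python
import numpy as np

def centroid_update(X, U):
    """Centroids h_ct as weighted means: X is (I, T, J), U is (I, C); returns (C, T, J)."""
    # Numerator: sum_i u_ic * x_it for every cluster c and time t
    num = np.einsum("ic,itj->ctj", U, X)
    # Denominator: sum_i u_ic for every cluster c
    den = U.sum(axis=0)
    return num / den[:, None, None]

# Two units observed at one time point on one variable.
X = np.array([[[0.0]], [[2.0]]])
U = np.array([[1.0, 0.0], [0.5, 0.5]])
H = centroid_update(X, U)
```

The first centroid is pulled towards the unit that belongs to it fully, while the second coincides with the only unit that has positive membership in it.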


Note that the solution for the membership degrees under Rényi regularization differs substantially, even in its theoretical form, from the one obtained with the Shannon regularization [3], whereas the calculations for $w_t^{(1)}$ and $h_{ct}^{(1)}$ are similar to those in [3].

It is important to notice that the solutions depend on each other, so an iterative procedure is needed to compute them jointly.

The final calculation is carried out by an iterative algorithm in four main steps [3]. In concrete applications (clustering methods with various types of entropy already have a wide range of practical uses), once a nearness degree $\xi$ between the members of the same cluster has been established, the numerical computation requires software, for instance Matlab, for which some routines already exist in the literature.
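Since the three solutions depend on each other, one natural way to combine them is an alternating loop. The sketch below is only an assumed scheme, not the four-step algorithm of [3], which is not reproduced in this text. It further assumes crisp data of shape `(I, T, J)`, Euclidean distances, a fixed number of iterations, and a clipping and renormalization of the linearized memberships that is not part of the paper. All function names are hypothetical.

```python
import numpy as np

def distances(X, H):
    # d[i, c, t] = ||x_it - h_ct|| (Euclidean, crisp data: an assumption)
    return np.linalg.norm(X[:, None, :, :] - H[None, :, :, :], axis=3)

def update_memberships(d, w, p, alpha, eps=1e-6):
    C = d.shape[1]
    K = (C - 1) * (1.0 - alpha) + 1.0
    s = np.sum((w[None, None, :] * d) ** 2, axis=2)
    U = 1.0 / C + K / (p * alpha) * (s - s.mean(axis=1, keepdims=True))
    # Assumption (not in the paper): clip and renormalize so rows stay valid memberships
    U = np.clip(U, eps, None)
    return U / U.sum(axis=1, keepdims=True)

def update_weights(d, U, eps=1e-12):
    # eps guards against a zero denominator (an implementation choice)
    a = np.einsum("ic,ict->t", U, d ** 2) + eps
    inv = 1.0 / a
    return inv / inv.sum()

def update_centroids(X, U):
    return np.einsum("ic,itj->ctj", U, X) / U.sum(axis=0)[:, None, None]

def fit(X, C, p, alpha, n_iter=20, seed=0):
    """Alternate the three closed-form updates for a fixed number of iterations."""
    rng = np.random.default_rng(seed)
    I, T, _ = X.shape
    U = rng.dirichlet(np.ones(C), size=I)  # random initial memberships
    w = np.full(T, 1.0 / T)                # uniform initial time weights
    for _ in range(n_iter):
        H = update_centroids(X, U)
        d = distances(X, H)
        w = update_weights(d, U)
        U = update_memberships(d, w, p, alpha)
    return U, w, H

rng = np.random.default_rng(1)
X = rng.normal(size=(12, 3, 2))
U, w, H = fit(X, C=2, p=50.0, alpha=0.5)
```

No convergence guarantee is claimed for this loop; in practice one would probably replace the fixed iteration count by a stopping rule on the change in the memberships.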

REFERENCES

[1] J.C. Bezdek, Pattern Recognition with Fuzzy Objective Function Algorithms. Plenum Press, New York, 1981.

[2] R. Coppi and P. D’Urso, Three way fuzzy clustering models for LR fuzzy time trajectories. Comput. Statist. Data Anal. 43 (2003), 149–177.

[3] R. Coppi and P. D’Urso, Fuzzy unsupervised classification of multivariate time trajectories with the Shannon entropy regularization. Comput. Statist. Data Anal. 50 (2006), 1452–1477.

[4] C. Lepădatu and E. Niţulescu, Information energy and information temperature for molecular systems. Acta Chim. Slovaca 50 (2003), 539–546.

[5] J. Leski, Towards a robust fuzzy clustering. Theme: Data analysis. Fuzzy Sets and Systems 137 (2003), 215–233.

[6] S. Miyamoto and M. Mukaidono, Fuzzy c-means as a regularization and maximum entropy approach. In: Proc. Seventh Internat. Fuzzy Systems Assoc. World Congress, Vol. II, pp. 86–92. Prague, 1997.

[7] V. Preda, Teoria deciziilor statistice. Ed. Academiei Române, Bucureşti, 1992.

[8] V. Preda, Optimality conditions for a nondifferentiable multiobjective problem. Revue Roumaine Math. Pures Appl. 51 (2006), 485–496.

[9] A. Rényi, On measures of information and entropy. In: Proc. 4th Berkeley Sympos. Math. Statist. Probab., Vol. 1, pp. 547–561. Univ. California Press, Berkeley, CA, 1961.

[10] C.E. Shannon, A mathematical theory of communication. Bell System Technical Journal 27 (1948), 379–423 and 623–656.

[11] L.A. Zadeh, Outline of a new approach to the analysis of complex systems and decision processes. IEEE Trans. Systems Man Cybern. 3 (1973), 28–44.

Received 8 May 2008

Academy of Economic Studies
Department of Mathematics
Calea Dorobanţilor 15-17
010552 Bucharest, Romania
cippx@yahoo.com