
A CLUSTERING MODEL WITH RÉNYI ENTROPY REGULARIZATION


COSTIN CIPRIAN POPESCU

A fuzzy clustering model (fcm) with regularization function given by Rényi entropy is discussed. We explore theoretically the cluster pattern in this mathematical model, together with some considerations about the cross-sectional Coppi-D'Urso approach and the Rényi measure of entropy.

AMS 2000 Subject Classification: 94A17.

Key words: fuzzy clustering, regularization, Rényi entropy.

1. INTRODUCTION

Fuzzy clustering is one of the most important research directions in the field of statistics. In the last decades, many outstanding results were published in the literature (Bezdek, 1981; Coppi and D'Urso, 2003, 2006). The goal of clustering algorithms is to divide a set of observations or data into a finite number of clusters on the basis of a proper criterion. A classical approach is based on the minimization of
$$F(U,C)=\sum_{i=1}^{n}\sum_{k=1}^{c} u_{ik}^{m}\, d^{2}(x_i,c_k),$$
where $U$ is the matrix of membership functions, $C$ the matrix of cluster centroids, $u_{ik}$ the membership function of the vector $x_i\in\mathbb{R}^p$ to cluster $k$, $X$ the data matrix and $d$ a certain distance in $\mathbb{R}^p$ (Bezdek, 1981; Leski, 2003). In this approach, the number of clusters and $m\in[1,\infty)$ (the fuzziness coefficient or weighting exponent) are known. This method is usually called fuzzy c-means (Bezdek, 1981). For ameliorating the clustering procedure, in the sense of reducing the influence of noise and outliers, new methods and algorithms were introduced. A method may be considered robust if it gives a good accuracy of the model, and small deviations from the model hypotheses do not substantially impair the performance, while larger deviations do not invalidate the model (Leski, 2003).
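For concreteness, a minimal numerical illustration of this classical criterion follows (this code is not part of the original paper; the function name, the array layout and the use of the squared Euclidean distance are assumptions made only for illustration):

    import numpy as np

    def fcm_objective(X, U, C, m=2.0):
        # X: (n, p) data, U: (n, c) membership degrees, C: (c, p) cluster centroids
        # classical fuzzy c-means criterion: sum_i sum_k u_ik^m * d^2(x_i, c_k)
        d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)  # squared Euclidean distances
        return float((U ** m * d2).sum())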

In the last decades, the general tendency has been to renounce the use of the fuzziness coefficient (Miyamoto and Mukaidono, 1997; Coppi and D'Urso, 2006) and to introduce instead a regularization term, namely the fuzzy entropy (which accounts for the uncertainty carried by the membership degrees $u_{ik}$). An illustrative approach to these new models was developed by Coppi and D'Urso (2006).


It was called the cross-sectional fuzzy clustering model (fcm). As regularization term they took
$$p\sum_{i=1}^{I}\sum_{c=1}^{C} u_{ic}\log u_{ic}$$
(a fuzzy variant of Shannon entropy), with $p$ called the "degree of fuzzy entropy". More generally, the concept of "entropy" plays an important role in information theory. In this paper, the cross-sectional model is extended by using Rényi's entropy of order $\alpha$.
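For reference, and not reproduced from the paper itself, recall the standard definition of the Rényi entropy of order $\alpha$ for a membership vector $u_i=(u_{i1},\dots,u_{iC})$ with $\sum_{c=1}^{C}u_{ic}=1$:
$$H_\alpha(u_i)=\frac{1}{1-\alpha}\log\Big(\sum_{c=1}^{C}u_{ic}^{\alpha}\Big),\qquad \alpha>0,\ \alpha\neq 1,$$
with $H_\alpha(u_i)\to-\sum_{c=1}^{C}u_{ic}\log u_{ic}$ (the Shannon entropy) as $\alpha\to 1$. Multiplied by $p$ and summed over the units, this is the quantity that appears as regularization term in the model of the next section.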

2. A FUZZY CLUSTERING MODEL WITH RÉNYI REGULARIZATION

In what follows, Coppi and D'Urso's cross-sectional fcm [3] will be generalized by means of the Rényi entropy of order $\alpha$. This kind of model deals with three types of information (see also [2]). We say that the set of initial data $X=\{x_{ijt},\ i=1,\dots,I;\ j=1,\dots,J;\ t=1,\dots,T\}$ is a "time data array" (units $\times$ variables $\times$ times) if $x_{ijt}$ is the $j$th LR1 fuzzy variable observed on the $i$th unit at time $t$ [3].
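A minimal sketch of how such a three-way array might be stored numerically (the array name and sizes are hypothetical, and crisp values are used instead of LR1 fuzzy variables, purely for illustration):

    import numpy as np

    I, J, T = 50, 4, 12          # units, variables, time occasions (hypothetical sizes)
    X = np.random.rand(I, J, T)  # X[i, j, t] plays the role of x_ijt
    x_it = X[0, :, 3]            # vector observed on unit i = 0 at time t = 3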

Since this time $w_t^{(1)}$ is an "instantaneous" weight [3], the cross-sectional fcm with Rényi regularization is equivalent to a weighted minimization problem, namely,
$$\min f\big(u_{ic}, w^{(1)}, h_{ct}^{(1)}\big)$$
subject to $\sum_{c=1}^{C} u_{ic}=1$ ($u_{ic}\ge 0$) and $\sum_{t=1}^{T} w_t^{(1)}=1$ ($w_t^{(1)}\ge 0$), where
$$f\big(u_{ic}, w^{(1)}, h_{ct}^{(1)}\big)=\sum_{i=1}^{I}\sum_{c=1}^{C} u_{ic}\sum_{t=1}^{T}\big(w_t^{(1)} d_{ict}^{(1)}\big)^2 + p\,\frac{1}{1-\alpha}\sum_{i=1}^{I}\Big[\log\Big(\sum_{c=1}^{C} u_{ic}^{\alpha}\Big)\Big]=$$
$$=\sum_{i=1}^{I}\sum_{c=1}^{C} u_{ic}\sum_{t=1}^{T}\big(w_t^{(1)}\,\big\|x_{it}-h_{ct}^{(1)}\big\|\big)^2 + p\,\frac{1}{1-\alpha}\sum_{i=1}^{I}\Big[\log\Big(\sum_{c=1}^{C} u_{ic}^{\alpha}\Big)\Big].$$
Here, $h_{ct}^{(1)}$ gives the centroid of the $c$th cluster at time $t$, $u_i=(u_{i1},\dots,u_{ic},\dots,u_{iC})^{T}$ and $w^{(1)}=\big(w_1^{(1)},\dots,w_t^{(1)},\dots,w_T^{(1)}\big)^{T}$.
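A minimal numerical sketch of this objective function (not taken from the paper; the variable names, the use of crisp data vectors instead of LR1 fuzzy data and the Euclidean distance are assumptions):

    import numpy as np

    def renyi_fcm_objective(X, U, W, H, p=1.0, alpha=2.0):
        # X: (I, T, J) data vectors x_it, U: (I, C) membership degrees,
        # W: (T,) instantaneous weights w_t^(1), H: (C, T, J) centroids h_ct^(1)
        d = np.linalg.norm(X[:, None, :, :] - H[None, :, :, :], axis=3)   # d_ict^(1), shape (I, C, T)
        fit = (U[:, :, None] * (W[None, None, :] * d) ** 2).sum()         # weighted fitting term
        reg = p / (1.0 - alpha) * np.log((U ** alpha).sum(axis=1)).sum()  # Renyi regularization term
        return float(fit + reg)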

First, we search for $u_i$, $i=1,\dots,I$. Fix the values $w_t^{(1)}$, $h_{ct}^{(1)}$. The Lagrangian function is given by
$$L_p(u_{i_0},\lambda)=\sum_{c=1}^{C} u_{i_0 c}\sum_{t=1}^{T}\big(w_t^{(1)} d_{i_0 ct}^{(1)}\big)^2 + p\,\frac{1}{1-\alpha}\log\Big(\sum_{c=1}^{C} u_{i_0 c}^{\alpha}\Big) - \lambda\Big(\sum_{c=1}^{C} u_{i_0 c} - 1\Big).$$
It follows from
$$\frac{\partial L_p(u_{i_0},\lambda)}{\partial u_{i_0 c_0}}=0 \quad\text{and}\quad \frac{\partial L_p(u_{i_0},\lambda)}{\partial\lambda}=0$$

that
$$\sum_{t=1}^{T}\big(w_t^{(1)} d_{i_0 c_0 t}^{(1)}\big)^2 + p\,\frac{1}{1-\alpha}\,\frac{\alpha\, u_{i_0 c_0}^{\alpha-1}}{\sum_{c=1}^{C} u_{i_0 c}^{\alpha}} - \lambda = 0, \qquad \sum_{c=1}^{C} u_{i_0 c} = 1.$$
Thus,
$$\sum_{t=1}^{T}\big(w_t^{(1)} d_{i_0 c_0 t}^{(1)}\big)^2 + p\,\frac{\alpha}{1-\alpha}\,\frac{u_{i_0 c_0}^{\alpha-1}}{\sum_{c=1}^{C} u_{i_0 c}^{\alpha}} - \lambda = 0$$
$$\Longrightarrow \Big[\sum_{t=1}^{T}\big(w_t^{(1)} d_{i_0 c_0 t}^{(1)}\big)^2\Big]\Big[\sum_{c=1}^{C} u_{i_0 c}^{\alpha}\Big] + p\,\frac{\alpha}{1-\alpha}\,u_{i_0 c_0}^{\alpha-1} - \lambda\Big[\sum_{c=1}^{C} u_{i_0 c}^{\alpha}\Big] = 0$$
$$\Longrightarrow \Big(u_{i_0 c_0}^{\alpha} + \sum_{\substack{c=1\\ c\neq c_0}}^{C} u_{i_0 c}^{\alpha}\Big)\Big[-\lambda + \sum_{t=1}^{T}\big(w_t^{(1)} d_{i_0 c_0 t}^{(1)}\big)^2\Big] + p\,\frac{\alpha}{1-\alpha}\,u_{i_0 c_0}^{\alpha-1} = 0$$
$$\Longrightarrow u_{i_0 c_0}^{\alpha}\Big[-\lambda + \sum_{t=1}^{T}\big(w_t^{(1)} d_{i_0 c_0 t}^{(1)}\big)^2\Big] + p\,\frac{\alpha}{1-\alpha}\,u_{i_0 c_0}^{\alpha-1} + \sum_{\substack{c=1\\ c\neq c_0}}^{C} u_{i_0 c}^{\alpha}\Big[-\lambda + \sum_{t=1}^{T}\big(w_t^{(1)} d_{i_0 c_0 t}^{(1)}\big)^2\Big] = 0.$$

The terms which contain the membership degrees are now replaced by their first-order Taylor polynomials around $1$, that is, $u^{\alpha}\approx 1+\alpha(u-1)$ and $u^{\alpha-1}\approx 1+(\alpha-1)(u-1)$. We then have

$$(1+\alpha u_{i_0 c_0}-\alpha)\Big[-\lambda+\sum_{t=1}^{T}\big(w_t^{(1)} d_{i_0 c_0 t}^{(1)}\big)^2\Big] + p\,\frac{\alpha}{1-\alpha}\big[1+(\alpha-1)u_{i_0 c_0}-(\alpha-1)\big] + \sum_{\substack{c=1\\ c\neq c_0}}^{C}(1+\alpha u_{i_0 c}-\alpha)\Big[-\lambda+\sum_{t=1}^{T}\big(w_t^{(1)} d_{i_0 c_0 t}^{(1)}\big)^2\Big] = 0$$
$$\Longrightarrow \Big[-\lambda+\sum_{t=1}^{T}\big(w_t^{(1)} d_{i_0 c_0 t}^{(1)}\big)^2\Big](1-\alpha) + \alpha u_{i_0 c_0}\Big[-\lambda+\sum_{t=1}^{T}\big(w_t^{(1)} d_{i_0 c_0 t}^{(1)}\big)^2\Big] + p\,\frac{\alpha(2-\alpha)}{1-\alpha} - p\alpha u_{i_0 c_0} + \Big[\sum_{\substack{c=1\\ c\neq c_0}}^{C}(1-\alpha) + \alpha\sum_{\substack{c=1\\ c\neq c_0}}^{C} u_{i_0 c}\Big]\Big[-\lambda+\sum_{t=1}^{T}\big(w_t^{(1)} d_{i_0 c_0 t}^{(1)}\big)^2\Big] = 0$$
$$\Longrightarrow \Big[-\lambda+\sum_{t=1}^{T}\big(w_t^{(1)} d_{i_0 c_0 t}^{(1)}\big)^2\Big](1-\alpha) + \alpha u_{i_0 c_0}\Big[-\lambda+\sum_{t=1}^{T}\big(w_t^{(1)} d_{i_0 c_0 t}^{(1)}\big)^2\Big] + p\,\frac{\alpha(2-\alpha)}{1-\alpha} - p\alpha u_{i_0 c_0} + \big[(C-1)(1-\alpha)+\alpha(1-u_{i_0 c_0})\big]\Big[-\lambda+\sum_{t=1}^{T}\big(w_t^{(1)} d_{i_0 c_0 t}^{(1)}\big)^2\Big] = 0$$
$$\Longrightarrow \Big[-\lambda+\sum_{t=1}^{T}\big(w_t^{(1)} d_{i_0 c_0 t}^{(1)}\big)^2\Big](1-\alpha) + \alpha u_{i_0 c_0}\Big[-\lambda+\sum_{t=1}^{T}\big(w_t^{(1)} d_{i_0 c_0 t}^{(1)}\big)^2\Big] + p\,\frac{\alpha(2-\alpha)}{1-\alpha} - p\alpha u_{i_0 c_0} + \big[(C-1)(1-\alpha)+\alpha\big]\Big[-\lambda+\sum_{t=1}^{T}\big(w_t^{(1)} d_{i_0 c_0 t}^{(1)}\big)^2\Big] - \alpha u_{i_0 c_0}\Big[-\lambda+\sum_{t=1}^{T}\big(w_t^{(1)} d_{i_0 c_0 t}^{(1)}\big)^2\Big] = 0$$
$$\Longrightarrow \Big[-\lambda+\sum_{t=1}^{T}\big(w_t^{(1)} d_{i_0 c_0 t}^{(1)}\big)^2\Big](1-\alpha) + p\,\frac{\alpha(2-\alpha)}{1-\alpha} - p\alpha u_{i_0 c_0} + \big[(C-1)(1-\alpha)+\alpha\big]\Big[-\lambda+\sum_{t=1}^{T}\big(w_t^{(1)} d_{i_0 c_0 t}^{(1)}\big)^2\Big] = 0.$$

With the notation
$$-\lambda+\sum_{t=1}^{T}\big(w_t^{(1)} d_{i_0 c_0 t}^{(1)}\big)^2 \overset{\text{not}}{=} D_{i_0 c_0},$$
we obtain
$$D_{i_0 c_0}(1-\alpha) + p\,\frac{\alpha(2-\alpha)}{1-\alpha} - p\alpha u_{i_0 c_0} + \big[(C-1)(1-\alpha)+\alpha\big]D_{i_0 c_0} = 0$$
$$\Longrightarrow p\alpha u_{i_0 c_0} = \big[(C-1)(1-\alpha)+\alpha+(1-\alpha)\big]D_{i_0 c_0} + p\,\frac{\alpha(2-\alpha)}{1-\alpha}$$
$$\Longrightarrow u_{i_0 c_0} = \frac{2-\alpha}{1-\alpha} + \frac{1}{p\alpha}\big[(C-1)(1-\alpha)+1\big]D_{i_0 c_0}.$$
Since $\sum_{c_0=1}^{C} u_{i_0 c_0} = 1$, we have
$$\sum_{c_0=1}^{C}\Big\{\frac{2-\alpha}{1-\alpha} + \frac{1}{p\alpha}\big[(C-1)(1-\alpha)+1\big]D_{i_0 c_0}\Big\} = 1$$
$$\Longrightarrow \frac{C(2-\alpha)}{1-\alpha} + \frac{1}{p\alpha}\big[(C-1)(1-\alpha)+1\big]\sum_{c_0=1}^{C}\Big[-\lambda+\sum_{t=1}^{T}\big(w_t^{(1)} d_{i_0 c_0 t}^{(1)}\big)^2\Big] = 1$$
$$\Longrightarrow \frac{C(2-\alpha)}{1-\alpha} + \frac{1}{p\alpha}\big[(C-1)(1-\alpha)+1\big]\Big[\sum_{c_0=1}^{C}\sum_{t=1}^{T}\big(w_t^{(1)} d_{i_0 c_0 t}^{(1)}\big)^2\Big] - \frac{C}{p\alpha}\big[(C-1)(1-\alpha)+1\big]\lambda = 1$$
$$\Longrightarrow \lambda = \frac{p\alpha(2-\alpha)}{(1-\alpha)\big[(C-1)(1-\alpha)+1\big]} + \frac{1}{C}\Big[\sum_{c_0=1}^{C}\sum_{t=1}^{T}\big(w_t^{(1)} d_{i_0 c_0 t}^{(1)}\big)^2\Big] - \frac{p\alpha}{C\big[(C-1)(1-\alpha)+1\big]}.$$
Thus,

$$D_{i_0 c_0} = \Big[\sum_{t=1}^{T}\big(w_t^{(1)} d_{i_0 c_0 t}^{(1)}\big)^2\Big] - \frac{p\alpha(2-\alpha)}{(1-\alpha)\big[(C-1)(1-\alpha)+1\big]} - \frac{1}{C}\Big[\sum_{c_0=1}^{C}\sum_{t=1}^{T}\big(w_t^{(1)} d_{i_0 c_0 t}^{(1)}\big)^2\Big] + \frac{p\alpha}{C\big[(C-1)(1-\alpha)+1\big]} =$$
$$= \sum_{t=1}^{T}\Big[\big(w_t^{(1)} d_{i_0 c_0 t}^{(1)}\big)^2 - \frac{1}{C}\sum_{c_0=1}^{C}\big(w_t^{(1)} d_{i_0 c_0 t}^{(1)}\big)^2\Big] + \frac{p\alpha(1-\alpha-2C+\alpha C)}{C(1-\alpha)\big[(C-1)(1-\alpha)+1\big]}$$
and
$$u_{i_0 c_0} = \frac{2-\alpha}{1-\alpha} + \frac{1}{p\alpha}\big[(C-1)(1-\alpha)+1\big]D_{i_0 c_0} =$$
$$= \frac{1}{C} + \frac{1}{p\alpha}\big[(C-1)(1-\alpha)+1\big]\sum_{t=1}^{T}\Big[\big(w_t^{(1)} d_{i_0 c_0 t}^{(1)}\big)^2 - \frac{1}{C}\sum_{c_0=1}^{C}\big(w_t^{(1)} d_{i_0 c_0 t}^{(1)}\big)^2\Big].$$
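A minimal sketch of this closed-form membership update (not from the paper; the array names are assumptions, and the formula is applied exactly as derived above, with no safeguard for values that may fall outside $[0,1]$):

    import numpy as np

    def update_memberships(D2W, p=1.0, alpha=2.0):
        # D2W[i, c, t] = (w_t^(1) * d_ict^(1))**2, shape (I, C, T)
        I, C, T = D2W.shape
        centered = D2W - D2W.mean(axis=1, keepdims=True)   # (w_t d)^2 minus its average over the clusters
        K = ((C - 1) * (1.0 - alpha) + 1.0) / (p * alpha)  # coefficient [(C-1)(1-alpha)+1]/(p*alpha)
        return 1.0 / C + K * centered.sum(axis=2)          # u_ic, shape (I, C); each row sums to 1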

In order to find a solution for $w_t^{(1)}$, consider the Lagrangian function
$$L_p\big(w^{(1)},\rho\big) = \sum_{i=1}^{I}\sum_{c=1}^{C} u_{ic}\sum_{t=1}^{T}\big(w_t^{(1)} d_{ict}^{(1)}\big)^2 + p\,\frac{1}{1-\alpha}\sum_{i=1}^{I}\Big[\log\Big(\sum_{c=1}^{C} u_{ic}^{\alpha}\Big)\Big] - \rho\Big(\sum_{t=1}^{T} w_t^{(1)} - 1\Big).$$
From
$$\frac{\partial L_p\big(w^{(1)},\rho\big)}{\partial w_{t_0}^{(1)}} = 0 \quad\text{and}\quad \frac{\partial L_p\big(w^{(1)},\rho\big)}{\partial\rho} = 0$$
we obtain
$$2w_{t_0}^{(1)}\sum_{i=1}^{I}\sum_{c=1}^{C} u_{ic}\big(d_{ict_0}^{(1)}\big)^2 - \rho = 0 \quad\text{and}\quad \sum_{t=1}^{T} w_t^{(1)} - 1 = 0.$$
Then
$$2w_{t_0}^{(1)}\sum_{i=1}^{I}\sum_{c=1}^{C} u_{ic}\big(d_{ict_0}^{(1)}\big)^2 - \rho = 0 \Longrightarrow w_{t_0}^{(1)} = \frac{\rho}{2}\,\frac{1}{\displaystyle\sum_{i=1}^{I}\sum_{c=1}^{C} u_{ic}\big(d_{ict_0}^{(1)}\big)^2};$$
$$\sum_{t=1}^{T} w_t^{(1)} = 1 \Longrightarrow \sum_{t_0=1}^{T} w_{t_0}^{(1)} = 1 \Longrightarrow \sum_{t_0=1}^{T}\frac{\rho}{2}\,\frac{1}{\displaystyle\sum_{i=1}^{I}\sum_{c=1}^{C} u_{ic}\big(d_{ict_0}^{(1)}\big)^2} = 1$$
$$\Longrightarrow \frac{\rho}{2}\sum_{t_0=1}^{T}\frac{1}{\displaystyle\sum_{i=1}^{I}\sum_{c=1}^{C} u_{ic}\big(d_{ict_0}^{(1)}\big)^2} = 1 \Longrightarrow \frac{\rho}{2} = \frac{1}{\displaystyle\sum_{t_0=1}^{T}\frac{1}{\sum_{i=1}^{I}\sum_{c=1}^{C} u_{ic}\big(d_{ict_0}^{(1)}\big)^2}} = \Bigg[\sum_{t_0=1}^{T}\frac{1}{\sum_{i=1}^{I}\sum_{c=1}^{C} u_{ic}\big(d_{ict_0}^{(1)}\big)^2}\Bigg]^{-1}.$$
Thus,
$$w_t^{(1)} = \frac{\rho}{2}\,\frac{1}{\displaystyle\sum_{i=1}^{I}\sum_{c=1}^{C} u_{ic}\big(d_{ict}^{(1)}\big)^2} = \Bigg[\sum_{i=1}^{I}\sum_{c=1}^{C} u_{ic}\big(d_{ict}^{(1)}\big)^2\Bigg]^{-1}\Bigg[\sum_{t_0=1}^{T}\frac{1}{\sum_{i=1}^{I}\sum_{c=1}^{C} u_{ic}\big(d_{ict_0}^{(1)}\big)^2}\Bigg]^{-1} = \frac{1}{\displaystyle\sum_{t_0=1}^{T}\frac{\sum_{i=1}^{I}\sum_{c=1}^{C} u_{ic}\big(d_{ict}^{(1)}\big)^2}{\sum_{i=1}^{I}\sum_{c=1}^{C} u_{ic}\big(d_{ict_0}^{(1)}\big)^2}}.$$
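A minimal sketch of this time-weight update (again not from the paper; the array names are assumptions):

    import numpy as np

    def update_time_weights(U, D2):
        # U: (I, C) membership degrees, D2[i, c, t] = (d_ict^(1))**2, shape (I, C, T)
        a = (U[:, :, None] * D2).sum(axis=(0, 1))  # a_t = sum_i sum_c u_ic (d_ict)^2
        return (1.0 / a) / (1.0 / a).sum()         # w_t proportional to 1/a_t, normalized to sum to 1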

A solution for $h_{ct}^{(1)}$ is obtained by solving the problem $\min g\big(h_{ct}^{(1)}\big)$, where
$$g\big(h_{ct}^{(1)}\big) = \sum_{i=1}^{I}\sum_{c=1}^{C} u_{ic}\sum_{t=1}^{T}\big(w_t^{(1)}\,\big\|x_{it}-h_{ct}^{(1)}\big\|\big)^2 + p\,\frac{1}{1-\alpha}\sum_{i=1}^{I}\Big[\log\Big(\sum_{c=1}^{C} u_{ic}^{\alpha}\Big)\Big].$$
Setting the partial derivative of $g$ with respect to $h_{ct}^{(1)}$ equal to zero (the regularization term does not depend on the centroids, and the factor $\big(w_t^{(1)}\big)^2$ cancels) yields
$$h_{ct}^{(1)} = \frac{\sum_{i=1}^{I} u_{ic}\,x_{it}}{\sum_{i=1}^{I} u_{ic}}.$$


Remark that, even theoretically, the solution for the membership degrees obtained with the Rényi entropy is quite different from the one obtained with the Shannon regularization [3], but the calculations for $w_t^{(1)}$ and $h_{ct}^{(1)}$ are similar to those in [3].

It is important to notice that the solutions depend on each other. It is thus necessary to resort to a certain procedure for a complete approach.

The final calculation is given by an iterative algorithm in four main steps [3]. In concrete situations, which require numerical computation (clustering methods with various types of entropy already have a wide range of practical applications), after establishing a nearness degree $\xi$ between the members of the same cluster, suitable software (for instance Matlab, since some routines already exist in the field's literature) is required.
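The four-step algorithm itself is not reproduced in this paper, so the following is only a plausible alternating scheme built from the closed-form updates derived above (a sketch under stated assumptions: crisp data vectors instead of LR1 fuzzy data, random Dirichlet initialization, a simple stopping rule, and no safeguard for membership values falling outside $[0,1]$):

    import numpy as np

    def renyi_fcm(X, C, p=1.0, alpha=2.0, n_iter=100, tol=1e-6, seed=0):
        # X: (I, T, J) data vectors x_it; C: number of clusters
        rng = np.random.default_rng(seed)
        I, T, J = X.shape
        U = rng.dirichlet(np.ones(C), size=I)  # initial membership degrees, rows sum to 1
        W = np.full(T, 1.0 / T)                # uniform initial time weights
        for _ in range(n_iter):
            U_old = U.copy()
            # centroids: h_ct = sum_i u_ic x_it / sum_i u_ic
            H = (U.T[:, :, None, None] * X[None, :, :, :]).sum(axis=1) / U.sum(axis=0)[:, None, None]
            # squared distances: d_ict^2 = ||x_it - h_ct||^2
            D2 = ((X[:, None, :, :] - H[None, :, :, :]) ** 2).sum(axis=3)
            # time weights: w_t proportional to 1 / (sum_i sum_c u_ic d_ict^2)
            a = (U[:, :, None] * D2).sum(axis=(0, 1))
            W = (1.0 / a) / (1.0 / a).sum()
            # membership degrees: closed-form Taylor-based update derived above
            centered = (W[None, None, :] ** 2) * D2
            centered = centered - centered.mean(axis=1, keepdims=True)
            K = ((C - 1) * (1.0 - alpha) + 1.0) / (p * alpha)
            U = 1.0 / C + K * centered.sum(axis=2)
            if np.abs(U - U_old).max() < tol:
                break
        return U, W, H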

REFERENCES

[1] J.C. Bezdek, Pattern Recognition with Fuzzy Objective Function Algorithms. Plenum Press, New York, 1981.

[2] R. Coppi and P. D'Urso, Three-way fuzzy clustering models for LR fuzzy time trajectories. Comput. Statist. Data Anal. 43 (2003), 149–177.

[3] R. Coppi and P. D'Urso, Fuzzy unsupervised classification of multivariate time trajectories with the Shannon entropy regularization. Comput. Statist. Data Anal. 50 (2006), 1452–1477.

[4] C. Lepădatu and E. Nițulescu, Information energy and information temperature for molecular systems. Acta Chim. Slovaca 50 (2003), 539–546.

[5] J. Leski, Towards a robust fuzzy clustering. Theme: Data analysis. Fuzzy Sets and Systems 137 (2003), 215–233.

[6] S. Miyamoto and M. Mukaidono, Fuzzy c-means as a regularization and maximum entropy approach. In: Proc. Seventh Internat. Fuzzy Systems Assoc. World Congress, Vol. II, pp. 86–92. Prague, 1997.

[7] V. Preda, Teoria deciziilor statistice. Ed. Academiei Române, București, 1992.

[8] V. Preda, Optimality conditions for a nondifferentiable multiobjective problem. Revue Roumaine Math. Pures Appl. 51 (2006), 485–496.

[9] A. Rényi, On measures of information and entropy. In: Proc. 4th Berkeley Sympos. Math. Statist. Probab., Vol. 1, pp. 547–561. Univ. California Press, Berkeley, CA, 1961.

[10] C.E. Shannon, A mathematical theory of communication. Bell System Technical Journal 27 (1948), 379–423 and 623–656.

[11] L.A. Zadeh, Outline of a new approach to the analysis of complex systems and decision processes. IEEE Trans. Systems Man Cybern. 3 (1973), 28–44.

Received 8 May 2008

Academy of Economic Studies
Department of Mathematics
Calea Dorobanților 15-17
010552 Bucharest, Romania
cippx@yahoo.com
