WITH RÉNYI ENTROPY REGULARIZATION
COSTIN CIPRIAN POPESCU
A fuzzy clustering model (fcm) with regularization function given by Rényi entropy is discussed. We explore theoretically the cluster pattern in this mathematical model, together with some considerations about the cross-sectional Coppi-D’Urso approach and the Rényi measure of entropy.
AMS 2000 Subject Classification: 94A17.
Key words: fuzzy clustering, regularization, Rényi entropy.
1. INTRODUCTION
Fuzzy clustering is one of the most important research directions in the field of statistics. In recent decades, many outstanding results have been published in the literature (Bezdek, 1981; Coppi and D’Urso, 2003, 2006). The goal of clustering algorithms is to divide a set of observations or data into a finite number of clusters on the basis of a proper criterion. A classical approach is based on the minimization of

$$F(U,C)=\sum_{i=1}^{n}\sum_{k=1}^{c}u_{ik}^{m}\,d^{2}(x_i,c_k),$$

where $U$ is the matrix of membership functions, $C$ the matrix of cluster centroids, $u_{ik}$ the membership function of the vector $x_i\in\mathbb{R}^p$ to cluster $k$, $X$ the data matrix and $d$ a certain distance in $\mathbb{R}^p$ (Bezdek, 1981; Leski, 2003). In this approach, the number of clusters $c$ and $m\in[1,\infty)$ (the fuzziness coefficient or weighting exponent) are known. This method is usually called fuzzy c-means (Bezdek, 1981). To improve the clustering procedure, in the sense of reducing the influence of noise and outliers, new methods and algorithms were introduced. A method may be considered robust if it gives a good accuracy of the model, if small deviations from the model hypotheses do not substantially impair the performance, and if larger deviations do not invalidate the model (Leski, 2003).
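As an illustration of the fuzzy c-means objective, the following minimal Python sketch evaluates $F$ for given memberships and centroids. The function name `fcm_objective`, the toy data and the choice of the squared Euclidean distance for $d$ are assumptions made for this example only.

```python
def fcm_objective(X, centroids, U, m=2.0):
    """Evaluate F = sum_i sum_k u_ik^m * d^2(x_i, c_k).

    X:         list of n points (each a list of p floats)
    centroids: list of c centroids (each a list of p floats)
    U:         n x c membership matrix, rows summing to 1
    m:         fuzziness coefficient, m >= 1
    d is taken to be the Euclidean distance (an assumption of this sketch).
    """
    total = 0.0
    for x, u_row in zip(X, U):
        for c, u in zip(centroids, u_row):
            d2 = sum((a - b) ** 2 for a, b in zip(x, c))
            total += (u ** m) * d2
    return total


# Hypothetical toy data: three points in R^2, two clusters, crisp memberships.
X = [[0.0, 0.0], [1.0, 0.0], [4.0, 4.0]]
centroids = [[0.5, 0.0], [4.0, 4.0]]
U = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
print(fcm_objective(X, centroids, U))  # prints 0.5
```

With crisp memberships the value reduces to the usual within-cluster sum of squares, here 0.25 + 0.25 + 0.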
MATH. REPORTS 11(61), 1 (2009), 59–65

In recent decades, the general tendency has been to abandon the fuzziness coefficient (Miyamoto and Mukaidono, 1997; Coppi and D’Urso, 2006) and to introduce a regularization term, namely the fuzzy entropy (which accounts for the uncertainty given by the membership degrees $u_{ik}$). An illustrative approach for these new models was developed by Coppi and D’Urso (2006) and called the cross-sectional fuzzy clustering model, or fcm. The regularization term was taken to be

$$p\sum_{i=1}^{I}\sum_{c=1}^{C}u_{ic}\log u_{ic}$$

(a fuzzy variant of Shannon entropy), with $p$ called the “degree of fuzzy entropy”. More generally, the concept of entropy plays an important role in information theory. In this paper, the cross-sectional model is extended by using Rényi’s entropy of order $\alpha$.
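For orientation, the Rényi entropy of order $\alpha$ of a membership vector $u$ is $\frac{1}{1-\alpha}\log\sum_c u_c^{\alpha}$ ($\alpha>0$, $\alpha\neq 1$), and it approaches the Shannon entropy $-\sum_c u_c\log u_c$ as $\alpha\to 1$. The short Python sketch below illustrates this numerically; the function names and the sample vector are hypothetical and serve only as an example.

```python
import math


def renyi_entropy(u, alpha):
    """Renyi entropy of order alpha: log(sum_c u_c^alpha) / (1 - alpha)."""
    if alpha <= 0 or alpha == 1:
        raise ValueError("alpha must be positive and different from 1")
    return math.log(sum(v ** alpha for v in u if v > 0)) / (1.0 - alpha)


def shannon_entropy(u):
    """Shannon entropy -sum_c u_c log u_c (terms with u_c = 0 contribute 0)."""
    return -sum(v * math.log(v) for v in u if v > 0)


# Hypothetical membership vector of one unit over three clusters.
u = [0.7, 0.2, 0.1]
for alpha in (0.5, 0.9, 0.999, 2.0):
    print(alpha, renyi_entropy(u, alpha))
print("Shannon", shannon_entropy(u))
```

For a uniform vector both entropies equal $\log C$, whatever the order $\alpha$.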
2. A FUZZY CLUSTERING MODEL WITH RÉNYI REGULARIZATION
In what follows, the Coppi-D’Urso cross-sectional fcm [3] will be generalized by means of the Rényi entropy of order $\alpha$. This kind of model deals with three types of information (see also [2]). We say that the set of initial data $X=\{x_{ijt} : i=1,\dots,I;\ j=1,\dots,J;\ t=1,\dots,T\}$ is a “time data array” (units × variables × times) if $x_{ijt}$ is the $j$th LR1 fuzzy variable observed on the $i$th unit at time $t$ [3].
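Since, in this case, $w_t^{(1)}$ is an “instantaneous” weight [3], the cross-sectional fcm with Rényi regularization is equivalent to a weighted minimization problem, namely

$$\min f\bigl(u_{ic},w^{(1)},h_{ct}^{(1)}\bigr)$$

subject to

$$\sum_{c=1}^{C}u_{ic}=1\ \ (u_{ic}\ge 0)\quad\text{and}\quad\sum_{t=1}^{T}w_t^{(1)}=1\ \ (w_t^{(1)}\ge 0),$$

where

$$\begin{aligned}
f\bigl(u_{ic},w^{(1)},h_{ct}^{(1)}\bigr)
&=\sum_{i=1}^{I}\sum_{c=1}^{C}u_{ic}\sum_{t=1}^{T}\bigl(w_t^{(1)}d_{ict}^{(1)}\bigr)^{2}
+p\,\frac{1}{1-\alpha}\sum_{i=1}^{I}\Bigl[\log\Bigl(\sum_{c=1}^{C}u_{ic}^{\alpha}\Bigr)\Bigr]\\
&=\sum_{i=1}^{I}\sum_{c=1}^{C}u_{ic}\sum_{t=1}^{T}\bigl(w_t^{(1)}\bigl\|x_{it}-h_{ct}^{(1)}\bigr\|\bigr)^{2}
+p\,\frac{1}{1-\alpha}\sum_{i=1}^{I}\Bigl[\log\Bigl(\sum_{c=1}^{C}u_{ic}^{\alpha}\Bigr)\Bigr].
\end{aligned}$$

Here, $h_{ct}^{(1)}$ gives the centroid of the $c$th cluster at time $t$, and $u_i=(u_{i1},\dots,u_{ic},\dots,u_{iC})^{\top}$, $w^{(1)}=\bigl(w_1^{(1)},\dots,w_t^{(1)},\dots,w_T^{(1)}\bigr)^{\top}$.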
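To make the structure of the objective concrete, here is a minimal Python sketch that evaluates $f$ from precomputed distances $d_{ict}^{(1)}$. The function name, the nested-list layout `d[i][c][t]` and the toy values are assumptions made for illustration.

```python
import math


def renyi_fcm_objective(d, U, w, p, alpha):
    """Evaluate f = sum_i sum_c u_ic sum_t (w_t d_ict)^2
                    + p/(1-alpha) * sum_i log(sum_c u_ic^alpha).

    d[i][c][t]: distance between unit i and centroid c at time t
    U[i][c]:    membership degrees (rows sum to 1)
    w[t]:       time weights (sum to 1)
    """
    I, C, T = len(U), len(U[0]), len(w)
    fit = sum(
        U[i][c] * sum((w[t] * d[i][c][t]) ** 2 for t in range(T))
        for i in range(I)
        for c in range(C)
    )
    reg = sum(math.log(sum(U[i][c] ** alpha for c in range(C))) for i in range(I))
    return fit + p / (1.0 - alpha) * reg


# Hypothetical toy case: 2 units, 2 clusters, 2 times, all distances equal to 1.
d = [[[1.0, 1.0], [1.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]]]
U = [[0.5, 0.5], [0.5, 0.5]]
w = [0.5, 0.5]
value = renyi_fcm_objective(d, U, w, p=1.0, alpha=2.0)
print(value)
```

In this toy case the fit term equals 1 and the regularization term equals $2\log 2$, so the printed value is $1+2\log 2$.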
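First, we search for $u_i$, $i=1,\dots,I$. Fix the values $w_t^{(1)}$, $h_{ct}^{(1)}$. The Lagrangian function is given by

$$L_p(u_{i'},\lambda)=\sum_{c=1}^{C}u_{i'c}\sum_{t=1}^{T}\bigl(w_t^{(1)}d_{i'ct}^{(1)}\bigr)^{2}
+p\,\frac{1}{1-\alpha}\log\Bigl(\sum_{c=1}^{C}u_{i'c}^{\alpha}\Bigr)
-\lambda\Bigl(\sum_{c=1}^{C}u_{i'c}-1\Bigr).$$

It follows from

$$\frac{\partial L_p(u_{i'},\lambda)}{\partial u_{i'c'}}=0\quad\text{and}\quad\frac{\partial L_p(u_{i'},\lambda)}{\partial\lambda}=0$$

that

$$\sum_{t=1}^{T}\bigl(w_t^{(1)}d_{i'c't}^{(1)}\bigr)^{2}
+p\,\frac{1}{1-\alpha}\,\frac{\alpha u_{i'c'}^{\alpha-1}}{\sum_{c=1}^{C}u_{i'c}^{\alpha}}-\lambda=0,
\qquad\sum_{c=1}^{C}u_{i'c}=1.$$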
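The derivative of the Rényi term that appears in the stationarity condition can be checked numerically. The sketch below compares the analytic expression $p\,\frac{\alpha}{1-\alpha}\,u_{c'}^{\alpha-1}/\sum_c u_c^{\alpha}$ with a central finite difference; the function names, step size and sample values are assumptions of the example.

```python
import math


def renyi_term(u, p, alpha):
    """Regularization term for one unit: p/(1-alpha) * log(sum_c u_c^alpha)."""
    return p / (1.0 - alpha) * math.log(sum(v ** alpha for v in u))


def renyi_term_grad(u, p, alpha, c):
    """Analytic partial derivative of renyi_term with respect to u[c]."""
    return p * alpha / (1.0 - alpha) * u[c] ** (alpha - 1.0) / sum(v ** alpha for v in u)


def finite_difference(u, p, alpha, c, h=1e-6):
    """Central finite-difference approximation of the same partial derivative."""
    up = list(u)
    dn = list(u)
    up[c] += h
    dn[c] -= h
    return (renyi_term(up, p, alpha) - renyi_term(dn, p, alpha)) / (2.0 * h)


# Hypothetical membership vector and parameters.
u = [0.6, 0.3, 0.1]
p, alpha = 2.0, 0.5
for c in range(len(u)):
    print(c, renyi_term_grad(u, p, alpha, c), finite_difference(u, p, alpha, c))
```

The two columns should agree to several decimal places, which supports the form of the second term in the condition above.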
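Thus,

$$\begin{aligned}
&\sum_{t=1}^{T}\bigl(w_t^{(1)}d_{i'c't}^{(1)}\bigr)^{2}
+p\,\frac{\alpha}{1-\alpha}\,\frac{u_{i'c'}^{\alpha-1}}{\sum_{c=1}^{C}u_{i'c}^{\alpha}}-\lambda=0\\
\Longrightarrow\ &\Bigl[\sum_{t=1}^{T}\bigl(w_t^{(1)}d_{i'c't}^{(1)}\bigr)^{2}\Bigr]\Bigl[\sum_{c=1}^{C}u_{i'c}^{\alpha}\Bigr]
+p\,\frac{\alpha}{1-\alpha}\,u_{i'c'}^{\alpha-1}
-\lambda\Bigl[\sum_{c=1}^{C}u_{i'c}^{\alpha}\Bigr]=0\\
\Longrightarrow\ &\Bigl(u_{i'c'}^{\alpha}+\sum_{\substack{c=1\\ c\neq c'}}^{C}u_{i'c}^{\alpha}\Bigr)
\Bigl[-\lambda+\sum_{t=1}^{T}\bigl(w_t^{(1)}d_{i'c't}^{(1)}\bigr)^{2}\Bigr]
+p\,\frac{\alpha}{1-\alpha}\,u_{i'c'}^{\alpha-1}=0\\
\Longrightarrow\ &u_{i'c'}^{\alpha}\Bigl[-\lambda+\sum_{t=1}^{T}\bigl(w_t^{(1)}d_{i'c't}^{(1)}\bigr)^{2}\Bigr]
+p\,\frac{\alpha}{1-\alpha}\,u_{i'c'}^{\alpha-1}
+\sum_{\substack{c=1\\ c\neq c'}}^{C}u_{i'c}^{\alpha}\Bigl[-\lambda+\sum_{t=1}^{T}\bigl(w_t^{(1)}d_{i'c't}^{(1)}\bigr)^{2}\Bigr]=0.
\end{aligned}$$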
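The terms which contain the membership degrees will be replaced by their first-order Taylor polynomials about the point 1, that is, $u^{\alpha}\approx 1+\alpha u-\alpha$ and $u^{\alpha-1}\approx 1+(\alpha-1)u-(\alpha-1)$. We then have

$$\begin{aligned}
&(1+\alpha u_{i'c'}-\alpha)\Bigl[-\lambda+\sum_{t=1}^{T}\bigl(w_t^{(1)}d_{i'c't}^{(1)}\bigr)^{2}\Bigr]
+p\,\frac{\alpha}{1-\alpha}\bigl[1+(\alpha-1)u_{i'c'}-(\alpha-1)\bigr]\\
&\qquad+\sum_{\substack{c=1\\ c\neq c'}}^{C}(1+\alpha u_{i'c}-\alpha)\Bigl[-\lambda+\sum_{t=1}^{T}\bigl(w_t^{(1)}d_{i'c't}^{(1)}\bigr)^{2}\Bigr]=0\\[4pt]
\Longrightarrow\ &\Bigl[-\lambda+\sum_{t=1}^{T}\bigl(w_t^{(1)}d_{i'c't}^{(1)}\bigr)^{2}\Bigr](1-\alpha)
+\alpha u_{i'c'}\Bigl[-\lambda+\sum_{t=1}^{T}\bigl(w_t^{(1)}d_{i'c't}^{(1)}\bigr)^{2}\Bigr]
+p\,\frac{\alpha(2-\alpha)}{1-\alpha}-p\alpha u_{i'c'}\\
&\qquad+\Bigl[\sum_{\substack{c=1\\ c\neq c'}}^{C}(1-\alpha)+\alpha\sum_{\substack{c=1\\ c\neq c'}}^{C}u_{i'c}\Bigr]
\Bigl[-\lambda+\sum_{t=1}^{T}\bigl(w_t^{(1)}d_{i'c't}^{(1)}\bigr)^{2}\Bigr]=0\\[4pt]
\Longrightarrow\ &\Bigl[-\lambda+\sum_{t=1}^{T}\bigl(w_t^{(1)}d_{i'c't}^{(1)}\bigr)^{2}\Bigr](1-\alpha)
+\alpha u_{i'c'}\Bigl[-\lambda+\sum_{t=1}^{T}\bigl(w_t^{(1)}d_{i'c't}^{(1)}\bigr)^{2}\Bigr]
+p\,\frac{\alpha(2-\alpha)}{1-\alpha}-p\alpha u_{i'c'}\\
&\qquad+\bigl[(C-1)(1-\alpha)+\alpha(1-u_{i'c'})\bigr]
\Bigl[-\lambda+\sum_{t=1}^{T}\bigl(w_t^{(1)}d_{i'c't}^{(1)}\bigr)^{2}\Bigr]=0\\[4pt]
\Longrightarrow\ &\Bigl[-\lambda+\sum_{t=1}^{T}\bigl(w_t^{(1)}d_{i'c't}^{(1)}\bigr)^{2}\Bigr](1-\alpha)
+\alpha u_{i'c'}\Bigl[-\lambda+\sum_{t=1}^{T}\bigl(w_t^{(1)}d_{i'c't}^{(1)}\bigr)^{2}\Bigr]
+p\,\frac{\alpha(2-\alpha)}{1-\alpha}-p\alpha u_{i'c'}\\
&\qquad+\bigl[(C-1)(1-\alpha)+\alpha\bigr]\Bigl[-\lambda+\sum_{t=1}^{T}\bigl(w_t^{(1)}d_{i'c't}^{(1)}\bigr)^{2}\Bigr]
-\alpha u_{i'c'}\Bigl[-\lambda+\sum_{t=1}^{T}\bigl(w_t^{(1)}d_{i'c't}^{(1)}\bigr)^{2}\Bigr]=0\\[4pt]
\Longrightarrow\ &\Bigl[-\lambda+\sum_{t=1}^{T}\bigl(w_t^{(1)}d_{i'c't}^{(1)}\bigr)^{2}\Bigr](1-\alpha)
+p\,\frac{\alpha(2-\alpha)}{1-\alpha}-p\alpha u_{i'c'}
+\bigl[(C-1)(1-\alpha)+\alpha\bigr]\Bigl[-\lambda+\sum_{t=1}^{T}\bigl(w_t^{(1)}d_{i'c't}^{(1)}\bigr)^{2}\Bigr]=0.
\end{aligned}$$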
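The linearization used in the derivation above replaces $u^{\alpha}$ by $1+\alpha u-\alpha$, which is the first-order Taylor polynomial about $u=1$. The following Python sketch, with a hypothetical helper `taylor1_power` and an arbitrary choice of $\alpha$, shows how the approximation error grows as $u$ moves away from 1.

```python
def taylor1_power(u, exponent):
    """First-order Taylor polynomial of u**exponent about u = 1."""
    return 1.0 + exponent * (u - 1.0)


alpha = 0.5  # arbitrary order chosen for the illustration
for u in (1.0, 0.9, 0.5, 0.1):
    exact = u ** alpha
    approx = taylor1_power(u, alpha)
    print(f"u={u:.1f}  exact={exact:.4f}  linear={approx:.4f}  error={abs(exact - approx):.4f}")
```

Since membership degrees lie in $[0,1]$ and need not be close to 1, the resulting closed form is best read as an approximation of the stationary point.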
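With the notation

$$-\lambda+\sum_{t=1}^{T}\bigl(w_t^{(1)}d_{i'c't}^{(1)}\bigr)^{2}=D_{i'c'},$$

we obtain

$$\begin{aligned}
&D_{i'c'}(1-\alpha)+p\,\frac{\alpha(2-\alpha)}{1-\alpha}-p\alpha u_{i'c'}+\bigl[(C-1)(1-\alpha)+\alpha\bigr]D_{i'c'}=0\\
\Longrightarrow\ &p\alpha u_{i'c'}=\bigl[(C-1)(1-\alpha)+\alpha+(1-\alpha)\bigr]D_{i'c'}+p\,\frac{\alpha(2-\alpha)}{1-\alpha}\\
\Longrightarrow\ &u_{i'c'}=\frac{2-\alpha}{1-\alpha}+\frac{1}{p\alpha}\bigl[(C-1)(1-\alpha)+1\bigr]D_{i'c'}.
\end{aligned}$$

Since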
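$$\sum_{c'=1}^{C}u_{i'c'}=1,$$

we have

$$\begin{aligned}
&\sum_{c'=1}^{C}\Bigl\{\frac{2-\alpha}{1-\alpha}+\frac{1}{p\alpha}\bigl[(C-1)(1-\alpha)+1\bigr]D_{i'c'}\Bigr\}=1\\
\Longrightarrow\ &\frac{C(2-\alpha)}{1-\alpha}+\frac{1}{p\alpha}\bigl[(C-1)(1-\alpha)+1\bigr]
\sum_{c'=1}^{C}\Bigl[-\lambda+\sum_{t=1}^{T}\bigl(w_t^{(1)}d_{i'c't}^{(1)}\bigr)^{2}\Bigr]=1\\
\Longrightarrow\ &\frac{C(2-\alpha)}{1-\alpha}+\frac{1}{p\alpha}\bigl[(C-1)(1-\alpha)+1\bigr]
\Bigl[\sum_{c'=1}^{C}\sum_{t=1}^{T}\bigl(w_t^{(1)}d_{i'c't}^{(1)}\bigr)^{2}\Bigr]
-\frac{C}{p\alpha}\bigl[(C-1)(1-\alpha)+1\bigr]\lambda=1\\
\Longrightarrow\ &\lambda=\frac{p\alpha(2-\alpha)}{(1-\alpha)\bigl[(C-1)(1-\alpha)+1\bigr]}
+\frac{1}{C}\Bigl[\sum_{c'=1}^{C}\sum_{t=1}^{T}\bigl(w_t^{(1)}d_{i'c't}^{(1)}\bigr)^{2}\Bigr]
-\frac{p\alpha}{C\bigl[(C-1)(1-\alpha)+1\bigr]}.
\end{aligned}$$

Thus,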
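$$\begin{aligned}
D_{i'c'}&=\Bigl[\sum_{t=1}^{T}\bigl(w_t^{(1)}d_{i'c't}^{(1)}\bigr)^{2}\Bigr]
-\frac{p\alpha(2-\alpha)}{(1-\alpha)\bigl[(C-1)(1-\alpha)+1\bigr]}
-\frac{1}{C}\Bigl[\sum_{c=1}^{C}\sum_{t=1}^{T}\bigl(w_t^{(1)}d_{i'ct}^{(1)}\bigr)^{2}\Bigr]
+\frac{p\alpha}{C\bigl[(C-1)(1-\alpha)+1\bigr]}\\
&=\sum_{t=1}^{T}\Bigl[\bigl(w_t^{(1)}d_{i'c't}^{(1)}\bigr)^{2}-\frac{1}{C}\sum_{c=1}^{C}\bigl(w_t^{(1)}d_{i'ct}^{(1)}\bigr)^{2}\Bigr]
+\frac{p\alpha(1-\alpha-2C+\alpha C)}{C(1-\alpha)\bigl[(C-1)(1-\alpha)+1\bigr]}
\end{aligned}$$

and

$$\begin{aligned}
u_{i'c'}&=\frac{2-\alpha}{1-\alpha}+\frac{1}{p\alpha}\bigl[(C-1)(1-\alpha)+1\bigr]D_{i'c'}\\
&=\frac{1}{C}+\frac{1}{p\alpha}\bigl[(C-1)(1-\alpha)+1\bigr]
\sum_{t=1}^{T}\Bigl[\bigl(w_t^{(1)}d_{i'c't}^{(1)}\bigr)^{2}-\frac{1}{C}\sum_{c=1}^{C}\bigl(w_t^{(1)}d_{i'ct}^{(1)}\bigr)^{2}\Bigr].
\end{aligned}$$

Here the inner sums run over all clusters $c=1,\dots,C$, while $c'$ denotes the fixed cluster under consideration.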
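In order to find a solution for $w_t^{(1)}$, consider the Lagrangian function

$$L_p\bigl(w^{(1)},\rho\bigr)=\sum_{i=1}^{I}\sum_{c=1}^{C}u_{ic}\sum_{t=1}^{T}\bigl(w_t^{(1)}d_{ict}^{(1)}\bigr)^{2}
+p\,\frac{1}{1-\alpha}\sum_{i=1}^{I}\Bigl[\log\Bigl(\sum_{c=1}^{C}u_{ic}^{\alpha}\Bigr)\Bigr]
-\rho\Bigl(\sum_{t=1}^{T}w_t^{(1)}-1\Bigr).$$

From

$$\frac{\partial L_p\bigl(w^{(1)},\rho\bigr)}{\partial w_{t'}^{(1)}}=0\quad\text{and}\quad\frac{\partial L_p\bigl(w^{(1)},\rho\bigr)}{\partial\rho}=0$$

we obtain

$$2w_{t'}^{(1)}\sum_{i=1}^{I}\sum_{c=1}^{C}u_{ic}\bigl(d_{ict'}^{(1)}\bigr)^{2}-\rho=0\quad\text{and}\quad\sum_{t=1}^{T}w_t^{(1)}-1=0.$$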
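Then

$$\begin{aligned}
&2w_{t'}^{(1)}\sum_{i=1}^{I}\sum_{c=1}^{C}u_{ic}\bigl(d_{ict'}^{(1)}\bigr)^{2}-\rho=0\\
\Longrightarrow\ &w_{t'}^{(1)}=\frac{\rho}{2\sum_{i=1}^{I}\sum_{c=1}^{C}u_{ic}\bigl(d_{ict'}^{(1)}\bigr)^{2}}
=\frac{\rho}{2}\,\frac{1}{\sum_{i=1}^{I}\sum_{c=1}^{C}u_{ic}\bigl(d_{ict'}^{(1)}\bigr)^{2}};\\[4pt]
&\sum_{t=1}^{T}w_t^{(1)}=1\ \Longrightarrow\ \sum_{t'=1}^{T}w_{t'}^{(1)}=1
\ \Longrightarrow\ \sum_{t'=1}^{T}\frac{\rho}{2}\,\frac{1}{\sum_{i=1}^{I}\sum_{c=1}^{C}u_{ic}\bigl(d_{ict'}^{(1)}\bigr)^{2}}=1\\
\Longrightarrow\ &\frac{\rho}{2}\sum_{t'=1}^{T}\frac{1}{\sum_{i=1}^{I}\sum_{c=1}^{C}u_{ic}\bigl(d_{ict'}^{(1)}\bigr)^{2}}=1\\
\Longrightarrow\ &\frac{\rho}{2}=\frac{1}{\displaystyle\sum_{t'=1}^{T}\frac{1}{\sum_{i=1}^{I}\sum_{c=1}^{C}u_{ic}\bigl(d_{ict'}^{(1)}\bigr)^{2}}}
=\Biggl[\sum_{t'=1}^{T}\frac{1}{\sum_{i=1}^{I}\sum_{c=1}^{C}u_{ic}\bigl(d_{ict'}^{(1)}\bigr)^{2}}\Biggr]^{-1}.
\end{aligned}$$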
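Thus,

$$\begin{aligned}
w_t^{(1)}&=\frac{\rho}{2}\,\frac{1}{\sum_{i=1}^{I}\sum_{c=1}^{C}u_{ic}\bigl(d_{ict}^{(1)}\bigr)^{2}}\\
&=\Biggl[\sum_{i=1}^{I}\sum_{c=1}^{C}u_{ic}\bigl(d_{ict}^{(1)}\bigr)^{2}\Biggr]^{-1}
\Biggl[\sum_{t'=1}^{T}\frac{1}{\sum_{i=1}^{I}\sum_{c=1}^{C}u_{ic}\bigl(d_{ict'}^{(1)}\bigr)^{2}}\Biggr]^{-1}\\
&=\frac{1}{\displaystyle\sum_{t'=1}^{T}\frac{\sum_{i=1}^{I}\sum_{c=1}^{C}u_{ic}\bigl(d_{ict}^{(1)}\bigr)^{2}}{\sum_{i=1}^{I}\sum_{c=1}^{C}u_{ic}\bigl(d_{ict'}^{(1)}\bigr)^{2}}}.
\end{aligned}$$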
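The weight formula above says that each time weight is inversely proportional to the membership-weighted squared dispersion at that time. A minimal Python sketch follows; the function name, the layout `d[i][c][t]` and the toy numbers are assumptions, and the dispersion at every time is assumed to be strictly positive.

```python
def update_time_weights(d, U):
    """w_t = 1 / sum_{t'} (A_t / A_{t'}), with A_t = sum_i sum_c u_ic d_ict^2.

    d[i][c][t]: distance between unit i and centroid c at time t
    U[i][c]:    membership degrees
    Assumes A_t > 0 for every t.
    """
    I, C, T = len(d), len(d[0]), len(d[0][0])
    A = [
        sum(U[i][c] * d[i][c][t] ** 2 for i in range(I) for c in range(C))
        for t in range(T)
    ]
    return [1.0 / sum(A[t] / A[s] for s in range(T)) for t in range(T)]


# Hypothetical toy case: one unit, two clusters, distances 1 at t=1 and 2 at t=2.
d = [[[1.0, 2.0], [1.0, 2.0]]]
U = [[0.5, 0.5]]
w = update_time_weights(d, U)
print(w)
```

Here $A_1=1$ and $A_2=4$, so the time with the smaller dispersion receives the larger weight (0.8 against 0.2), and the weights sum to 1.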
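A solution for $h_{ct}^{(1)}$ is obtained by solving the problem $\min g\bigl(h_{ct}^{(1)}\bigr)$, where

$$g\bigl(h_{ct}^{(1)}\bigr)=\sum_{i=1}^{I}\sum_{c=1}^{C}u_{ic}\sum_{t=1}^{T}\bigl(w_t^{(1)}\bigl\|x_{it}-h_{ct}^{(1)}\bigr\|\bigr)^{2}
+p\,\frac{1}{1-\alpha}\sum_{i=1}^{I}\Bigl[\log\Bigl(\sum_{c=1}^{C}u_{ic}^{\alpha}\Bigr)\Bigr].$$

Hence

$$h_{ct}^{(1)}=\frac{\sum_{i=1}^{I}u_{ic}\,x_{it}}{\sum_{i=1}^{I}u_{ic}}.$$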
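The centroid formula is a membership-weighted mean of the observations at each time. The Python sketch below treats each $x_{it}$ as a plain vector of $J$ real numbers, which is a simplifying assumption (the paper works with LR fuzzy variables); the function name and toy data are likewise hypothetical, and the total membership of each cluster is assumed to be nonzero.

```python
def update_centroids(X, U):
    """h_ct = sum_i u_ic x_it / sum_i u_ic.

    X[i][t]: observation of unit i at time t (list of J floats)
    U[i][c]: membership degrees
    Returns H with H[c][t] a list of J floats.
    """
    I, T, J = len(X), len(X[0]), len(X[0][0])
    C = len(U[0])
    H = []
    for c in range(C):
        mass = sum(U[i][c] for i in range(I))  # assumed nonzero
        H.append(
            [
                [sum(U[i][c] * X[i][t][j] for i in range(I)) / mass for j in range(J)]
                for t in range(T)
            ]
        )
    return H


# Hypothetical toy data: 2 units, 2 times, 1 variable.
X = [[[0.0], [1.0]], [[2.0], [3.0]]]
H_crisp = update_centroids(X, [[1.0, 0.0], [0.0, 1.0]])
H_even = update_centroids(X, [[0.5, 0.5], [0.5, 0.5]])
print(H_crisp)
print(H_even)
```

With crisp memberships each centroid coincides with the trajectory of its single member; with equal memberships both centroids equal the overall mean trajectory.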
Note that, even at the theoretical level, the solution for the membership degrees obtained with the Rényi entropy differs markedly from the one obtained with the Shannon regularization [3], whereas the calculations for $w_t^{(1)}$ and $h_{ct}^{(1)}$ are similar to those in [3].
It is important to notice that the solutions depend on each other, so an iterative procedure is needed to obtain a complete solution.
The final calculation is given by an iterative algorithm in four main steps [3]. In concrete numerical situations (clustering methods with various types of entropy already have a wide range of practical applications), once a nearness degree $\xi$ between the members of the same cluster has been established, software is required (for instance Matlab, since some routines already exist in the literature of the field).
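Since the three solutions depend on each other, one natural way to combine them is to alternate the updates until the memberships stabilize. The Python sketch below is only an illustration under stated assumptions: it is not the four-step algorithm of [3], the order of the updates, the stopping rule (`tol`, `max_iter`) and the Euclidean distance are choices made here, and each $x_{it}$ is treated as a plain real vector. The toy data and the value of $p$ are hypothetical and chosen so that the linearized memberships stay positive.

```python
def alternate_updates(X, U, p, alpha, tol=1e-8, max_iter=100):
    """Alternate the closed-form updates for centroids, weights and memberships.

    X[i][t]: observation of unit i at time t (list of J floats)
    U[i][c]: initial membership degrees (rows sum to 1)
    Returns the last memberships, weights and centroids computed.
    """
    I, T, J, C = len(X), len(X[0]), len(X[0][0]), len(U[0])
    K = (C - 1) * (1.0 - alpha) + 1.0
    w, H = [1.0 / T] * T, []
    for _ in range(max_iter):
        # Centroids: membership-weighted means at each time.
        H = []
        for c in range(C):
            mass = sum(U[i][c] for i in range(I))
            H.append([[sum(U[i][c] * X[i][t][j] for i in range(I)) / mass
                       for j in range(J)] for t in range(T)])
        # Squared Euclidean distances d2[i][c][t] (assumed choice of distance).
        d2 = [[[sum((X[i][t][j] - H[c][t][j]) ** 2 for j in range(J))
                for t in range(T)] for c in range(C)] for i in range(I)]
        # Time weights: inversely proportional to the weighted dispersion A_t.
        A = [sum(U[i][c] * d2[i][c][t] for i in range(I) for c in range(C))
             for t in range(T)]
        w = [1.0 / sum(A[t] / A[s] for s in range(T)) for t in range(T)]
        # Memberships: linearized closed form, not clipped to [0, 1].
        U_new = []
        for i in range(I):
            s = [sum(w[t] ** 2 * d2[i][c][t] for t in range(T)) for c in range(C)]
            mean_s = sum(s) / C
            U_new.append([1.0 / C + K / (p * alpha) * (s_c - mean_s) for s_c in s])
        change = max(abs(U_new[i][c] - U[i][c]) for i in range(I) for c in range(C))
        U = U_new
        if change < tol:
            break
    return U, w, H


# Hypothetical toy data: 4 units, 2 times, 1 variable, 2 clusters.
X = [[[0.0], [0.1]], [[0.2], [0.0]], [[1.0], [1.1]], [[0.9], [1.0]]]
U0 = [[0.6, 0.4], [0.6, 0.4], [0.4, 0.6], [0.4, 0.6]]
U_fit, w_fit, H_fit = alternate_updates(X, U0, p=10.0, alpha=0.5)
print(U_fit)
print(w_fit)
```

The sketch makes no claim of convergence; it only preserves the two constraints (rows of the membership matrix and the time weights each sum to 1) at every pass.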
REFERENCES
[1] J.C. Bezdek, Pattern Recognition with Fuzzy Objective Function Algorithms. Plenum Press, New York, 1981.
[2] R. Coppi and P. D’Urso, Three way fuzzy clustering models for LR fuzzy time trajectories. Comput. Statist. Data Anal. 43 (2003), 149–177.
[3] R. Coppi and P. D’Urso, Fuzzy unsupervised classification of multivariate time trajectories with the Shannon entropy regularization. Comput. Statist. Data Anal. 50 (2006), 1452–1477.
[4] C. Lepădatu and E. Nițulescu, Information energy and information temperature for molecular systems. Acta Chim. Slovaca 50 (2003), 539–546.
[5] J. Leski, Towards a robust fuzzy clustering. Theme: Data analysis. Fuzzy Sets and Systems 137 (2003), 215–233.
[6] S. Miyamoto and M. Mukaidono, Fuzzy c-means as a regularization and maximum entropy approach. In: Proc. Seventh Internat. Fuzzy Systems Assoc. World Congress, Vol. II, pp. 86–92. Prague, 1997.
[7] V. Preda, Teoria deciziilor statistice. Ed. Academiei Române, București, 1992.
[8] V. Preda, Optimality conditions for a nondifferentiable multiobjective problem. Revue Roumaine Math. Pures Appl. 51 (2006), 485–496.
[9] A. Rényi, On measures of information and entropy. In: Proc. 4th Berkeley Sympos. Math. Statist. Probab., Vol. 1, pp. 547–561. Univ. California Press, Berkeley, CA, 1961.
[10] C.E. Shannon, A mathematical theory of communication. Bell System Technical Journal 27 (1948), 379–423 and 623–656.
[11] L.A. Zadeh, Outline of a new approach to the analysis of complex systems and decision processes. IEEE Trans. Systems Man Cybern. 3 (1973), 28–44.
Received 8 May 2008
Academy of Economic Studies
Department of Mathematics
Calea Dorobanților 15-17
010552 Bucharest, Romania
cippx@yahoo.com