Robustness of classification based on clustering

Ch. Ruwet

Department of Mathematics, University of Liège
Outline

1. Introduction
2. Statistical functionals
3. Error Rate
4. Influence functions
5. Asymptotic Loss
6. Breakdown point
7. Some improvements
Classification

[Figure: scatter plot of Analyse against Algèbre exam scores (axes 0–15), with observations labeled "Succès" (pass) or "Echec" (fail).]
Clustering

[Figure: the same Algèbre/Analyse scores (axes 0–15), now without group labels, to be clustered.]
The k-means clustering method

$X_n = \{x_1, \dots, x_n\}$, a dataset in $p$ dimensions;
Aim of clustering: group similar observations into $k$ clusters $C_1, \dots, C_k$;
The $k$-means algorithm constructs the clusters so as to minimize the within-cluster sum of squared distances;
The cluster centers $(T_n^1, \dots, T_n^k)$ are solutions of
$$\min_{\{t_1,\dots,t_k\} \subset \mathbb{R}^p} \sum_{i=1}^{n} \inf_{1 \le j \le k} \|x_i - t_j\|^2;$$
The classification rule:
$$x \in C_n^j \iff \|x - T_n^j\| = \min_{1 \le i \le k} \|x - T_n^i\|;$$
Focusing on $k = 2$ groups:
$$C_n^1 = \left\{ x \in \mathbb{R}^p : (T_n^1 - T_n^2)^t x - \tfrac{1}{2}\left(\|T_n^1\|^2 - \|T_n^2\|^2\right) > 0 \right\}.$$
Result of 2-means

[Figure: 2-means clustering of the Algèbre/Analyse data (axes 0–20), showing the two clusters $C_1$ and $C_2$ and their centers $T_1$ and $T_2$.]
Introduction of contamination

[Figure: the Algèbre/Analyse data (axes 0–20) with contaminating observations added.]
Result of 2-means with contamination

[Figure: 2-means clustering of the contaminated Algèbre/Analyse data (axes 0–20); the contamination distorts the clusters.]
The generalized 2-means clustering method

The cluster centers $(T_n^1, T_n^2)$ are the solution of
$$\min_{\{t_1, t_2\} \subset \mathbb{R}^p} \sum_{i=1}^{n} \Omega\!\left( \inf_{1 \le j \le 2} \|x_i - t_j\| \right)$$
for an increasing penalty function $\Omega : \mathbb{R}^+ \to \mathbb{R}^+$ such that $\Omega(0) = 0$.

Classical penalty functions:
$\Omega(x) = x^2$ → 2-means method;
$\Omega(x) = x$ → 2-medoids method.

The classification rule:
$$x \in C_n^1 \iff \Omega(\|x - T_n^1\|) \le \Omega(\|x - T_n^2\|) \iff \|x - T_n^1\| \le \|x - T_n^2\|.$$
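The alternating minimization behind this criterion can be sketched in a few lines. The code below is a minimal illustration, not the algorithm of the talk: it restricts candidate centers to observed data points (a medoid-style update), which sidesteps the lack of a closed-form minimizer of the penalized sum for an arbitrary Ω; the function name and the deterministic initialization are choices of this sketch.

```python
import numpy as np

def generalized_2means(X, penalty, n_iter=50):
    """Alternating-minimization sketch of the generalized 2-means criterion
        min_{t1,t2} sum_i penalty( min_j ||x_i - t_j|| ),
    with centers restricted to data points (medoid-style update)."""
    # Deterministic start: the first point and the point farthest from it.
    i1 = np.linalg.norm(X - X[0], axis=1).argmax()
    centers = X[[0, i1]].copy()
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Assignment step: each observation goes to its nearest center.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Update step: within each cluster, pick the member minimizing the
        # summed penalty of distances to the other members.
        new_centers = centers.copy()
        for j in range(2):
            members = X[labels == j]
            if len(members) == 0:
                continue
            pair_d = np.linalg.norm(
                members[:, None, :] - members[None, :, :], axis=2
            )
            new_centers[j] = members[penalty(pair_d).sum(axis=0).argmin()]
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels
```

With `penalty=lambda r: r ** 2` this mimics the 2-means method (up to the restriction of centers to data points); with `penalty=lambda r: r` it is the 2-medoids method.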
Classical penalty functions

[Figure: $\Omega(x)$ for $x \in [0, 5]$, comparing the 2-means penalty $x^2$ with the 2-medoids penalty $x$.]
Result of 2-medoids

[Figure: 2-medoids clustering of the Algèbre/Analyse data (axes 0–20).]
Result of 2-medoids with contamination

[Figure: 2-medoids clustering of the contaminated Algèbre/Analyse data (axes 0–20); the clusters are much less affected than with 2-means.]
Statistical functionals

The empirical distribution $F_n$ is replaced by a cumulative distribution $F \in \mathcal{F}$;
A statistical functional $T : \mathcal{F} \to \mathbb{R}^l : F \mapsto T(F)$ is such that $T(F_n) = T_n$.

[Figure: the empirical cdf $F_n$ against the model cdf $F$.]
Classification setting

Suppose $X \sim F$ arises from groups $G_1$ and $G_2$ with $\pi_i(F) = \mathrm{P}_F[X \in G_i]$. Then $F$ is a mixture of two distributions,
$$F = \pi_1(F)\, F_1 + \pi_2(F)\, F_2,$$
with $\pi_1 + \pi_2 = 1$ and where $F_1$ and $F_2$ are the conditional distributions under $G_1$ and $G_2$, with densities $f_1$ and $f_2$.
Univariate mixture distribution

[Figure: density of $F$, a univariate two-component mixture with weights 0.4 and 0.6.]
The generalized 2-means statistical functionals

The cluster centers $(T_1(F), T_2(F))$ are the solution of
$$\min_{\{t_1, t_2\} \subset \mathbb{R}^p} \int \Omega\!\left( \inf_{1 \le j \le 2} \|x - t_j\| \right) dF(x)$$
for a suitable increasing penalty function $\Omega$;
The classification rule is
$$R_F : x \mapsto j \iff \|x - T_j(F)\| = \min_{1 \le i \le 2} \|x - T_i(F)\|;$$
The clusters are
$$C_1(F) = \{ x \in \mathbb{R}^p : A(F)^t x + b(F) > 0 \}, \qquad C_2(F) = \mathbb{R}^p \setminus C_1(F),$$
with $A(F) = T_1(F) - T_2(F)$ and $b(F) = -\tfrac{1}{2}\left( \|T_1(F)\|^2 - \|T_2(F)\|^2 \right)$.
Mixture of spherically symmetric distributions

$X \sim F_{\mu,\sigma^2}$ if its density is
$$f_{\mu,\sigma^2}(x) = \frac{K}{\sigma^p}\, g\!\left( \frac{(x-\mu)^t (x-\mu)}{\sigma^2} \right)$$
where $g$ is a non-increasing generator function and $K$ is a constant ensuring that the density integrates to one.

Examples:
Multivariate Normal distribution: $g(r) = \exp(-r/2)$;
Multivariate Student distribution: $g(r) = (1 + r/\nu)^{-(\nu+p)/2}$.

Model (M):
$$\text{(M)} \qquad F_M = \pi_1 F_{-\mu,\sigma^2} + \pi_2 F_{\mu,\sigma^2} \quad \text{with } \mu = \mu_1 e_1 \text{ and } \mu_1 > 0.$$
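Draws from model (M) are easy to simulate in the Gaussian case, which corresponds to the generator $g(r) = \exp(-r/2)$. A small sketch (the function name and parameter defaults are illustrative, not from the talk):

```python
import numpy as np

def sample_model_M(n, pi1, mu1, sigma=1.0, p=2, seed=0):
    """Draw n points from model (M): a two-component mixture of spherical
    Gaussians centered at -mu and +mu, with mu = mu1 * e1.
    Returns the sample and a boolean mask (True = component at -mu)."""
    rng = np.random.default_rng(seed)
    groups = rng.random(n) < pi1          # component indicator, P = pi1
    mu = np.zeros(p)
    mu[0] = mu1                           # mean shift along the first axis
    signs = np.where(groups, -1.0, 1.0)
    X = signs[:, None] * mu[None, :] + rng.normal(0.0, sigma, (n, p))
    return X, groups
```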
Representation

[Figure: contour plot of the mixture $F_M$ in the $(x_1, x_2)$ plane.]
Position of $T_1(F_M)$ and $T_2(F_M)$

2-means:
Proposition (Kurata and Qiu, 2011). Under the model distribution (M), the 2-means centers lie on the first axis.

Generalized 2-means:
Conjecture (Ruwet and Haesbroeck, 2011). Under the model distribution (M), the generalized 2-means centers lie on the first axis.
Error rate of the 2-means

[Figure: 2-means classification of the Algèbre/Analyse data (axes 0–20) against the true labels "Succès" (pass) and "Echec" (fail); the misclassified points determine the error rate.]
Error rate

Training sample drawn from $F$: estimation of the rule;
Test sample drawn from $F_m$: evaluation of the rule;
In ideal circumstances, $F = F_m$.

Probability of misclassifying data coming from $F_m$:
$$\mathrm{ER}(F, F_m) = \pi_1(F_m)\, \mathrm{P}_{F_m}[R_F(X) \ne 1 \mid G_1] + \pi_2(F_m)\, \mathrm{P}_{F_m}[R_F(X) \ne 2 \mid G_2] = \sum_{j=1}^{2} \pi_j(F_m)\, \mathrm{P}_{F_m}[R_F(X) \ne j \mid G_j].$$
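The quantity $\mathrm{ER}(F, F_m)$ can be checked by simulation. The sketch below, for an assumed univariate model $F = F_m = 0.5\,N(-\mu_1, 1) + 0.5\,N(\mu_1, 1)$ with $\mu_1 = 1.5$ (choices of this illustration), estimates the 2-means rule on a clean training sample and evaluates the error rate by Monte Carlo; for this symmetric mixture the Bayes error is $\Phi(-\mu_1) \approx 0.067$, and the estimated rule should come close to it.

```python
import numpy as np

def two_means_centers(x, n_iter=100):
    # Lloyd's algorithm for k = 2 on univariate data,
    # deterministically initialized at the sample extremes.
    centers = np.array([x.min(), x.max()])
    for _ in range(n_iter):
        labels = np.abs(x[:, None] - centers[None, :]).argmin(axis=1)
        new = np.array([x[labels == j].mean() for j in range(2)])
        if np.allclose(new, centers):
            break
        centers = new
    return np.sort(centers)

def error_rate(centers, mu1, n_test=200_000, seed=0):
    # Monte Carlo ER(F, F_m): balanced test sample from
    # F_m = 0.5 N(-mu1, 1) + 0.5 N(mu1, 1); rule = nearest center.
    rng = np.random.default_rng(seed)
    g = rng.integers(0, 2, n_test)                  # true group labels
    x = rng.normal((2 * g - 1) * mu1, 1.0)
    pred = (np.abs(x - centers[1]) < np.abs(x - centers[0])).astype(int)
    return np.mean(pred != g)

rng = np.random.default_rng(1)
mu1 = 1.5
train = rng.normal((2 * rng.integers(0, 2, 1000) - 1) * mu1, 1.0)
er = error_rate(two_means_centers(train), mu1)
# For this symmetric normal mixture the Bayes error is Phi(-mu1) ~ 0.0668;
# the 2-means rule estimated on a clean sample comes close to it.
```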
Optimality in classification

A classification rule is optimal if the corresponding error rate is minimal;
The optimal classification rule is the Bayes rule:
$$x \in C_1(F) \iff \pi_1(F)\, f_1(x) > \pi_2(F)\, f_2(x)$$
(Anderson, 1958).

Proposition (Ruwet and Haesbroeck, 2011). The 2-means procedure is optimal under the model
$$F_O = 0.5\, F_{-\mu,\sigma^2} + 0.5\, F_{\mu,\sigma^2} \quad \text{with } \mu = \mu_1 e_1 \text{ and } \mu_1 > 0.$$
Under the Conjecture, the generalized 2-means procedures are also optimal under $F_O$.
Contaminated distribution

A contaminated distribution is defined by
$$F_\varepsilon = (1 - \varepsilon)\, F + \varepsilon\, G,$$
where $G$ is an arbitrary distribution function.
To see the influence of one singular point $x$, take $G = \Delta_x$, leading to the contaminated distribution $F_{\varepsilon,x} = (1 - \varepsilon)\, F + \varepsilon\, \Delta_x$.
Contaminated mixture

[Figure: density of $F_{\varepsilon,x}$ with $\varepsilon = 0.05$: the two mixture components reweighted to $(1-0.05) \cdot 0.6$ and $(1-0.05) \cdot 0.4$, plus a point mass of $0.05$ at $x$.]
Error rate under contamination

Contaminated training sample drawn from $F_\varepsilon$: estimation of the rule;
Test sample drawn from $F_m$: evaluation of the rule;
$$\mathrm{ER}(F_\varepsilon, F_m) = \sum_{j=1}^{2} \pi_j(F_m)\, \mathrm{P}_{F_m}[R_{F_\varepsilon}(X) \ne j \mid G_j].$$
Definition and properties of the first order influence function

Hampel et al. (1986): for any statistical functional $T$ and any distribution $F$,
$$\mathrm{IF}(x; T, F) = \lim_{\varepsilon \to 0} \frac{T((1-\varepsilon)F + \varepsilon \Delta_x) - T(F)}{\varepsilon} = \left. \frac{\partial}{\partial \varepsilon}\, T((1-\varepsilon)F + \varepsilon \Delta_x) \right|_{\varepsilon=0}$$
(under conditions of existence);
$$\mathrm{E}_F[\mathrm{IF}(X; T, F)] = 0;$$
First order Taylor expansion of $T$ at $F$:
$$T(F_{\varepsilon,x}) \approx T(F) + \varepsilon\, \mathrm{IF}(x; T, F).$$
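The limit in this definition can be illustrated numerically with the simplest functional, the mean, whose influence function $\mathrm{IF}(x; T, F) = x - \mathrm{E}_F[X]$ is classical. In the sketch below, $F$ is approximated by a large sample, so the contaminated functional $T((1-\varepsilon)F + \varepsilon\Delta_x)$ is just a weighted mean; all numerical choices here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
sample = rng.normal(2.0, 1.0, 100_000)        # stand-in for F (mean ~ 2)

def T_eps(x, eps):
    # Mean of the contaminated distribution (1 - eps) F + eps * Delta_x
    return (1.0 - eps) * sample.mean() + eps * x

x, eps = 7.0, 1e-6
# Finite-difference version of the definition of IF(x; T, F):
if_numeric = (T_eps(x, eps) - T_eps(x, 0.0)) / eps
if_exact = x - sample.mean()                  # known IF of the mean
```

Since the mean is linear in the distribution, the finite difference reproduces the influence function here without any limit error; for nonlinear functionals the quotient only converges as $\varepsilon \to 0$.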
First order influence function of the error rate

Now the training sample is distributed as $F_{\varepsilon,x}$. If the model distribution is $F_O$,
$$\mathrm{ER}(F_{\varepsilon,x}, F_O) \approx \mathrm{ER}(F_O, F_O) + \varepsilon\, \mathrm{IF}(x; \mathrm{ER}, F_O)$$
while, by optimality,
$$\mathrm{ER}(F_{\varepsilon,x}, F_O) \ge \mathrm{ER}(F_O, F_O).$$
Combining this with $\mathrm{E}_{F_O}[\mathrm{IF}(X; \mathrm{ER}, F_O)] = 0$ yields $\mathrm{IF}(x; \mathrm{ER}, F_O) \equiv 0$.
Definition of the second order influence function

For any statistical functional $T$ and any distribution $F$,
$$\mathrm{IF2}(x; T, F) = \left. \frac{\partial^2}{\partial \varepsilon^2}\, T((1-\varepsilon)F + \varepsilon \Delta_x) \right|_{\varepsilon=0}$$
(under conditions of existence).
Second order Taylor expansion of $\mathrm{ER}$ at $F_O$, where the first order term vanishes:
$$\mathrm{ER}(F_{\varepsilon,x}, F_O) \approx \mathrm{ER}(F_O, F_O) + \frac{\varepsilon^2}{2}\, \mathrm{IF2}(x; \mathrm{ER}, F_O)$$
for $\varepsilon$ small enough.
First order influence function of the error rate under $F_M$

Proposition (Ruwet and Haesbroeck, 2011). Under $F_M$, the first order influence function of the error rate of the generalized 2-means classification procedure is given by
$$\mathrm{IF}(x; \mathrm{ER}, F_M) = \frac{\pi_2 - \pi_1}{2}\, f_{\mu,\sigma^2}(0) \left[ \mathrm{IF}(x; T_1, F_M) + \mathrm{IF}(x; T_2, F_M) \right]^t e_1$$
for all $x$ such that $A(F_M)^t x + b(F_M) \ne 0$.

This influence function is bounded as soon as the influence functions of the generalized 2-means centers (see next slide) are bounded.
The influence function is also available for any model distribution $F$.
First order influence function of the generalized 2-means centers

Proposition (García-Escudero and Gordaliza, 1999). The influence function of the generalized 2-means centers $T_1$ and $T_2$ is given by
$$\begin{pmatrix} \mathrm{IF}(x; T_1, F_m) \\ \mathrm{IF}(x; T_2, F_m) \end{pmatrix} = M^{-1} \begin{pmatrix} \omega_1(x) \\ \omega_2(x) \end{pmatrix}$$
where
$$\omega_i(x) = -\left. \mathrm{grad}_y\, \Omega(\|y\|) \right|_{y = x - T_i(F_m)} \mathrm{I}(x \in C_i(F_m))$$
and where the matrix $M$ depends only on the distribution $F_m$.

This influence function is bounded as soon as $M^{-1}$ exists and the gradient of $\Omega$ is bounded.
Univariate second order influence function of the error rate under $F_O$

Proposition (Ruwet and Haesbroeck, 2011). Under $F_O$, the univariate second order influence function of the error rate of the generalized 2-means classification procedure is given by
$$\mathrm{IF2}(x; \mathrm{ER}, F_O) = -\frac{1}{4}\, f'_{-\mu,\sigma^2}(0) \left[ \mathrm{IF}(x; T_1, F_O) + \mathrm{IF}(x; T_2, F_O) \right]^2$$
for all $x$ such that $A(F_O)\, x + b(F_O) \ne 0$.
The influence function is also available for multivariate distributions.
Graph of $\mathrm{IF}(x; \mathrm{ER}, F_M)$

[Figure: $\mathrm{IF}(x; \mathrm{ER}, F_M)$ for $x \in [-4, 4]$, comparing 2-means and 2-medoids.]
Graph of $\mathrm{IF}(x; \mathrm{ER}, F_M)$

[Figure: $\mathrm{IF}(x; \mathrm{ER}, F_M)$ for the 2-means with $\pi_1 = 0.2, 0.4, 0.6, 0.8$.]
Graph of $\mathrm{IF}(x; \mathrm{ER}, F_M)$

[Figure: $\mathrm{IF}(x; \mathrm{ER}, F_M)$ for the 2-medoids with $\Delta = 1, 3, 5$.]
Graph of $\mathrm{IF2}(x; \mathrm{ER}, F_O)$

[Figure: $\mathrm{IF2}(x; \mathrm{ER}, F_O)$ for $x \in [-4, 4]$, comparing 2-means and 2-medoids.]
Asymptotic loss

Under optimality ($F_O$), a measure of the expected increase in error rate when estimating the optimal clustering rule from a finite sample with empirical cdf $F_n$ is
$$\text{A-Loss} = \lim_{n \to \infty} n\, \mathrm{E}_{F_O}[\mathrm{ER}(F_n, F_O) - \mathrm{ER}(F_O, F_O)].$$
As in Croux et al. (2008):
Proposition. Under some regularity conditions on the cluster-center estimators,
$$\text{A-Loss} = \frac{1}{2}\, \mathrm{E}_{F_O}[\mathrm{IF2}(X; \mathrm{ER}, F_O)].$$
Asymptotic loss

Proposition (Ruwet and Haesbroeck, 2011). Under an optimal mixture of normal distributions $F_N$ with $\mu = (\Delta/2)\, e_1$, the asymptotic loss of the generalized 2-means procedure is given by
$$\text{A-Loss} = \frac{\Delta}{16 \sigma^3 \tau^2}\, f_{0,1}\!\left( \frac{\Delta}{2\sigma} \right) \Big\{ \tau^2 \left[ \mathrm{ASV}(T_{21}) + \mathrm{ASV}(T_{11}) + 2\, \mathrm{ASC}(T_{11}, T_{21}) \right] + \sigma^2 \left[ \mathrm{ASV}(T_{12}) + \mathrm{ASV}(T_{22}) - 2\, \mathrm{ASC}(T_{12}, T_{22}) \right] \Big\}$$
where ASV and ASC stand for the asymptotic variance and covariance of their arguments (at the model distribution).
Graph of the asymptotic loss

[Figure: A-Loss as a function of $\Delta \in [0, 8]$ for 2-means and 2-medoids.]
Asymptotic relative classification efficiencies

A measure of the price one pays in error rate, in exchange for protection against outliers, when using a robust procedure instead of the classical one is
$$\mathrm{ARCE}(\text{Robust}, \text{Classical}) = \frac{\text{A-Loss}(\text{Classical})}{\text{A-Loss}(\text{Robust})}.$$
More generally, the ARCE of a method (Method 1) w.r.t. another one (Method 2) is given by
$$\mathrm{ARCE}(\text{Method 1}, \text{Method 2}) = \frac{\text{A-Loss}(\text{Method 2})}{\text{A-Loss}(\text{Method 1})}.$$
ARCE of 2-medoids w.r.t. 2-means

[Figure: ARCE(2-medoids, 2-means) as a function of $\Delta \in [0, 8]$, ranging roughly between 0.60 and 0.70.]
ARCE of classification procedures w.r.t. 2-means

[Figure: ARCE of the Fisher and logistic classification rules w.r.t. 2-means, as a function of $\Delta$.]
Intuitive definition

The breakdown point (BDP) is the minimal fraction of outliers that needs to be added (addition BDP) or replaced (replacement BDP) in order to destroy the estimator completely, i.e. to drive the estimate
to infinity (Hampel, 1971);
to the bounds of the support of the estimator (He and Simpson, 1992);
to a finite set, while it could lie in an infinite set without contamination (Genton and Lucas, 2003).
BDP of the error rate of the generalized 2-means

Let an observation $x$ of the training sample ($F_{\varepsilon,x}$) tend to infinity:
⇒ it becomes the center of a cluster containing this observation only, even if $\Omega(x) = x$ (García-Escudero and Gordaliza, 1999);
⇒ one entire group of the test sample ($F$) is misclassified while the other is well classified;
⇒ $\mathrm{ER}(F_\varepsilon, F) = \pi_1$ or $\mathrm{ER}(F_\varepsilon, F) = \pi_2$ for any sample;
⇒ ER has broken down in the sense of Genton and Lucas (2003);
⇒ the BDP of the ER is $1/n$, which tends to zero as $n$ tends to infinity.
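This mechanism can be reproduced with ordinary 2-means ($\Omega(x) = x^2$): a single observation sent far away captures one of the two centers, so every genuine observation ends up in the other cluster. A minimal simulation (the initialization and sample sizes are choices of this sketch):

```python
import numpy as np

def two_means(X, n_iter=100):
    # Lloyd's algorithm for k = 2 (penalty r^2), deterministically
    # initialized at the first point and the point farthest from it.
    i1 = np.linalg.norm(X - X[0], axis=1).argmax()
    centers = X[[0, i1]].astype(float)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        labels = np.linalg.norm(
            X[:, None] - centers[None], axis=2
        ).argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) for j in range(2)])
        if np.allclose(new, centers):
            break
        centers = new
    return centers, labels

rng = np.random.default_rng(0)
clean = np.vstack([rng.normal(-2, 1, (100, 2)), rng.normal(2, 1, (100, 2))])
X = np.vstack([clean, [[1e6, 1e6]]])          # one extreme outlier
centers, labels = two_means(X)
# The outlier captures one center entirely: all 200 genuine observations
# land in the other cluster, so one whole group would be misclassified.
```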
The trimming approach

Idea: delete extreme observations!
Problem: how can we detect extreme observations?
Solution: impartial trimming.

$k$, the fixed number of clusters;
$\alpha \in [0, 1[$, the trimming size;
$X_n = \{x_1, \dots, x_n\} \subset \mathbb{R}^p$, a dataset that is not concentrated on $k$ points after removing a mass equal to $\alpha$;
Optimization over partitions $R = \{R_1, \dots, R_k\}$ of $\{1, \dots, n\}$ with $\lceil n(1-\alpha) \rceil$ observations.
The generalized trimmed k-means method (Cuesta-Albertos et al., 1997)

The cluster centers $(T_n^1, \dots, T_n^k)$ are solutions of the double minimization problem
$$\min_{R}\; \min_{\{t_1,\dots,t_k\} \subset \mathbb{R}^p} \sum_{x_i \in R} \Omega\!\left( \inf_{1 \le j \le k} \|x_i - t_j\| \right);$$
The classification rule:
$$x \in C_n^j \iff \|x - T_n^j\| = \min_{1 \le i \le k} \|x - T_n^i\|.$$
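A Lloyd-style sketch of impartial trimming for $k = 2$ with $\Omega(x) = x^2$ (the initialization and the restriction to the squared-distance penalty are simplifications of this illustration): at each iteration the $\lceil n\alpha \rceil$ observations farthest from their nearest center are discarded before the centers are re-estimated.

```python
import numpy as np

def trimmed_2means(X, alpha, n_iter=100):
    """Impartially trimmed 2-means sketch: repeat (assign, trim the
    ceil(n*alpha) farthest observations, update centers) to convergence."""
    n = len(X)
    keep = n - int(np.ceil(n * alpha))
    # Crude but outlier-resistant start: the points at the lower and
    # upper quartile positions of the first coordinate.
    order = np.argsort(X[:, 0])
    centers = X[order[[n // 4, 3 * n // 4]]].astype(float)
    trimmed = np.zeros(n, dtype=bool)
    labels = np.zeros(n, dtype=int)
    for _ in range(n_iter):
        d = np.linalg.norm(X[:, None] - centers[None], axis=2)
        labels = d.argmin(axis=1)
        nearest = d.min(axis=1)
        # Keep the `keep` observations closest to their nearest center.
        cutoff = np.partition(nearest, keep - 1)[keep - 1]
        trimmed = nearest > cutoff
        new = np.array([
            X[(labels == j) & ~trimmed].mean(axis=0)
            if ((labels == j) & ~trimmed).any() else centers[j]
            for j in range(2)
        ])
        if np.allclose(new, centers):
            break
        centers = new
    return centers, labels, trimmed
```

On data with a small fraction of far-away contamination and $\alpha$ at least that fraction, the trimming step removes the outliers, so the centers stay near the genuine groups.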
Properties of the generalized trimmed 2-means

García-Escudero and Gordaliza (1999):
bounded IF whatever $\Omega$;
better breakdown behavior.

Ruwet and Haesbroeck (unpublished result): if the Conjecture also holds for the generalized trimmed 2-means, this procedure is optimal under the model $F_O$.
Problem of all "k-means type" procedures

[Figure: a simulated dataset and its "k-means type" estimation, which groups the data incorrectly because spherical clusters of equal weight are implicitly assumed.]
The TCLUST procedure (García-Escudero et al., 2008)

Optimization also over the scatter matrices $S_n^j$ and the weights $p_n^j$ such that $\sum_{j=1}^{k} p_j = 1$;
Maximization of
$$\sum_{j=1}^{k} \sum_{i \in R_j} \log\left[ p_j\, \varphi(x_i; T_j, S_j) \right]$$
where $\varphi$ is the pdf of the Gaussian distribution;
Eigenvalue-ratio restriction:
$$\frac{M_n}{m_n} = \frac{\max_{j=1,\dots,k}\, \max_{l=1,\dots,p}\, \lambda_l(S_j)}{\min_{j=1,\dots,k}\, \min_{l=1,\dots,p}\, \lambda_l(S_j)} \le c$$
for a constant $c \ge 1$, where the $\lambda_l(S_j)$ are the eigenvalues of $S_j$, $l = 1, \dots, p$ and $j = 1, \dots, k$.
Improvement with the TCLUST procedure

[Figure: the simulated dataset alongside its estimation by TCLUST, which recovers the groups correctly.]